This article explains the python GIL pretty well.
https://www.obytes.com/blog/2018/explaining-python-interpreter-lock/
In Python 2, a thread can at most perform 100 instructions/ticks before releasing the GIL. This setting can be changed with sys.setcheckinterval(). Then the OS will decide which thread will pick up the GIL next, possibly the same thread again based on the criteria the OS uses. The problem here is that the low-priority threads may keep waiting forever.
In Python 3, a thread can at most run for 5ms before releasing the GIL. Then with some system-level signals the same thread won’t pick up the GIL again immediately after. Sounds more fair. However since an I/O call will lead to voluntarily releasing the GIL, a thread with many I/O calls will keep losing the GIL to other CPU bound threads each time by 5ms (at most, say there is a long-running thread). So worst case scenario a thread with 1000 I/O calls will have an overhead of 1000 * 5ms = 5 sec. That’s quite something since you can save that 5 sec by running a non-threaded version. Ironic it is.
The root of the problem here is the GIL, or the fact that CPython is single-threaded for its memory management. Jython, IronPython, etc took away the GIL using transactional memory, but they don’t support C extensions.
Hopefully someday we will have CPython without the GIL.