Threads in Python

Is there a general tutorial or good resource on how to use threads in Python?
When should threads be used, how are they effective, and what is some general background on threads specific to Python?

Threads should be used when you want two things to run at once, or want something to run in the background without slowing down the main process.
My recommendation is to only use threads if you have to. They generally add complexity to a program.
The main documentation for threading is here: http://docs.python.org/library/threading.html
Some examples are here:
http://www.devshed.com/c/a/Python/Basic-Threading-in-Python/1/
http://linuxgazette.net/107/pai.html
http://www.wellho.net/solutions/python-python-threads-a-first-example.html

One thing to remember before spending time and effort on writing a multi-threaded Python application is that there is a Global Interpreter Lock (GIL), so only one thread can actually execute Python code at a time.
This makes threading unsuitable for trying to take advantage of multiple cores or CPUs. You may get some speedup from multiplexing other resources (network, disk, ...), but it's never been particularly noticeable in my experience.
In general, I only use threads when there are several logically separate tasks happening at once, and yet I want them all in the same VM. A thread pulling data from the web and putting it on a Queue, while another thread pops from the Queue and writes to a database, something like that.
With Python 2.6, there is the new multiprocessing module which is pretty cool - it's got a very similar interface to the threading module, but actually spawns new OS processes, sidestepping the GIL.

There is a fantastic PDF, Tutorial on Threads Programming with Python, by Norman Matloff and Francis Hsu of the University of California, Davis.
Threads should be avoided whenever possible. They add a lot of complexity, synchronization issues, and hard-to-debug problems. However, some problems require them (e.g. GUI programming), but I would encourage you to look for a single-threaded solution if you can.

There are several tutorials here.

Does Python support multithreading? Can it speed up execution time?

I'm slightly confused about whether multithreading works in Python or not.
I know there have been a lot of questions about this and I've read many of them, but I'm still confused. I know from my own experience, and have seen others post their own answers and examples here on Stack Overflow, that multithreading is indeed possible in Python. So why is it that everyone keeps saying that Python is locked by the GIL and that only one thread can run at a time? It clearly does work. Or is there some distinction I'm not getting here?
Many posters/respondents also keep mentioning that threading is limited because it does not make use of multiple cores. But I would say threads are still useful, because they work simultaneously and thus get the combined workload done faster. I mean, why would there even be a Python thread module otherwise?
Update:
Thanks for all the answers so far. The way I understand it is that multithreading will only run in parallel for some I/O tasks, but can only run one thread at a time for CPU-bound tasks, even on multiple cores.
I'm not entirely sure what this means for me in practical terms, so I'll just give an example of the kind of task I'd like to multithread. For instance, let's say I want to loop through a very long list of strings and do some basic string operations on each list item. If I split up the list, send each sublist to be processed by my loop/string code in a new thread, and send the results back in a queue, will these workloads run roughly at the same time? Most importantly, will this theoretically speed up the time it takes to run the script?
Another example might be rendering and saving four different pictures using PIL in four different threads: would this be faster than processing the pictures one after the other? I guess this speed component is what I'm really wondering about, rather than what the correct terminology is.
I also know about the multiprocessing module but my main interest right now is for small-to-medium task loads (10-30 secs) and so I think multithreading will be more appropriate because subprocesses can be slow to initiate.
The GIL does not prevent threading. All the GIL does is make sure only one thread is executing Python code at a time; control still switches between threads.
What the GIL prevents then, is making use of more than one CPU core or separate CPUs to run threads in parallel.
This only applies to Python code. C extensions can and do release the GIL to allow multiple threads of C code and one Python thread to run across multiple cores. This extends to I/O controlled by the kernel, such as select() calls for socket reads and writes, making Python handle network events reasonably efficiently in a multi-threaded multi-core setup.
What many server deployments then do, is run more than one Python process, to let the OS handle the scheduling between processes to utilize your CPU cores to the max. You can also use the multiprocessing library to handle parallel processing across multiple processes from one codebase and parent process, if that suits your use cases.
Note that the GIL is only applicable to the CPython implementation; Jython and IronPython use a different threading implementation (the native Java VM and .NET common runtime threads respectively).
To address your update directly: any task that tries to get a speed boost from parallel execution using pure Python code will not see a speed-up, as threaded Python code is locked to one thread executing at a time. If you mix in C extensions and I/O, however (such as PIL or numpy operations), then the C code can run in parallel with one active Python thread.
Python threading is great for creating a responsive GUI, or for handling multiple short web requests where I/O is the bottleneck more than the Python code. It is not suitable for parallelizing computationally intensive Python code; stick to the multiprocessing module for such tasks, or delegate to a dedicated external library.
Yes. :)
You have the low-level thread module and the higher-level threading module. But if you simply want to use multicore machines, the multiprocessing module is the way to go.
Quote from the docs:
In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
Threading is allowed in Python; the only problem is that the GIL will make sure that just one thread is executed at a time (no parallelism).
So basically, if you want to multi-thread code to speed up calculation, it won't speed it up, as just one thread is executed at a time; but if you use threads to interact with a database, for example, it will.
I feel for the poster, because the answer is invariably "it depends what you want to do". However, parallel speed-up in Python has always been terrible in my experience, even for multiprocessing.
For example, check out this tutorial (the second-to-top result in Google): https://www.machinelearningplus.com/python/parallel-processing-python/
I put timings around this code and increased the number of processes (2, 4, 8, 16) for the pool map function, and got the following bad timings:
serial 70.8921644706279
parallel 93.49704207479954 tasks 2
parallel 56.02441442012787 tasks 4
parallel 51.026168536394835 tasks 8
parallel 39.18044807203114 tasks 16
code (lightly cleaned up; howmany_within_range_rowonly is defined in the linked tutorial):

import multiprocessing as mp
import numpy as np
import time

# increase array size at the start
# my compute node has 40 CPUs so I've got plenty to spare here
arr = np.random.randint(0, 10, size=[2000000, 600])

.... more code ....

tasks = [2, 4, 8, 16]
for task in tasks:
    tic = time.perf_counter()
    pool = mp.Pool(task)
    results = pool.map(howmany_within_range_rowonly, [row for row in data])
    pool.close()
    toc = time.perf_counter()
    time1 = toc - tic
    print(f"parallel {time1} tasks {task}")

Task queue processing in Python

The task is:
I have a task queue stored in a db, and it grows. I need to solve the tasks with a Python script when I have the resources for it. I see two ways:
1. A Python script working all the time. But I don't like it (reason: possible memory leaks).
2. A Python script called by cron that does a little part of the task. But I need to solve the problem of keeping only one active script in memory (to prevent the count of active scripts from growing).
What is the best solution to implement this in Python? Any ideas to solve this problem at all?
You can use a lockfile to prevent multiple scripts from running out of cron. See the answers to an earlier question, "Python: module for creating PID-based lockfile". This is really just good practice in general for anything that you need to make sure won't have multiple instances running, actually, so you should look into it even if you do have the script running constantly, which I do suggest.
For most things, it shouldn't be too hard to avoid memory leaks, but if you're having a lot of trouble with it (I sometimes do with complex third-party web frameworks, for example), I would suggest instead writing the script with a small, carefully-designed main loop that monitors the database for new jobs, and then uses the multiprocessing module to fork off new processes to complete each task.
When a task is complete, the child process can exit, immediately freeing any memory that isn't properly garbage collected, and the main loop should be simple enough that you can avoid any memory leaks.
This also offers the advantage that you can run multiple tasks in parallel if your system has more than one CPU core, or if your tasks spend a lot of time waiting for I/O.
This is a bit of a vague question. One thing you should remember is that it is very difficult to leak memory in Python, because of the automatic garbage collection. croning a Python script to handle the queue isn't very nice, although it would work fine.
I would use method 1; if you need more power you could make a small Python process that monitors the DB queue and starts new processes to handle the tasks.
I'd suggest using Celery, an asynchronous task queuing system which I use myself.
It may seem a bit heavy for your use case, but it makes it easy to expand later by adding more worker resources if/when needed.

Threading in a PyQt application: Use Qt threads or Python threads?

I'm writing a GUI application that regularly retrieves data through a web connection. Since this retrieval takes a while, this causes the UI to be unresponsive during the retrieval process (it cannot be split into smaller parts). This is why I'd like to outsource the web connection to a separate worker thread.
[Yes, I know, now I have two problems.]
Anyway, the application uses PyQt4, so I'd like to know what the better choice is: Use Qt's threads or use the Python threading module? What are advantages / disadvantages of each? Or do you have a totally different suggestion?
Edit (re bounty): While the solution in my particular case will probably be using a non-blocking network request like Jeff Ober and Lukáš Lalinský suggested (so basically leaving the concurrency problems to the networking implementation), I'd still like a more in-depth answer to the general question:
What are advantages and disadvantages of using PyQt4's (i.e. Qt's) threads over native Python threads (from the threading module)?
Edit 2: Thanks to all for your answers. Although there's no 100% agreement, there seems to be widespread consensus that the answer is "use Qt", since the advantage of that is integration with the rest of the library, while causing no real disadvantages.
For anyone looking to choose between the two threading implementations, I highly recommend they read all the answers provided here, including the PyQt mailing list thread that abbot links to.
There were several answers I considered for the bounty; in the end I chose abbot's for the very relevant external reference; it was, however, a close call.
Thanks again.
This was discussed not too long ago on the PyQt mailing list. Quoting Giovanni Bajo's comments on the subject:
It's mostly the same. The main difference is that QThreads are better integrated with Qt (asynchronous signals/slots, event loop, etc.). Also, you can't use Qt from a Python thread (you can't, for instance, post an event to the main thread through QApplication.postEvent): you need a QThread for that to work.
A general rule of thumb might be to use QThreads if you're going to interact somehow with Qt, and use Python threads otherwise.
And an earlier comment on this subject from PyQt's author: "they are both wrappers around the same native thread implementations". And both implementations use the GIL in the same way.
Python's threads will be simpler and safer, and since this is an I/O-based application, they can bypass the GIL while blocked on I/O. That said, have you considered non-blocking I/O using Twisted or non-blocking sockets/select?
EDIT: more on threads
Python threads
Python's threads are system threads. However, Python uses a global interpreter lock (GIL) to ensure that the interpreter only ever executes one thread's bytecode at a time, switching threads after a fixed number of bytecode instructions. Luckily, Python releases the GIL during input/output operations, making threads useful for simulating non-blocking I/O.
Important caveat: This can be misleading, since the number of byte-code instructions does not correspond to the number of lines in a program. Even a single assignment may not be atomic in Python, so a mutex lock is necessary for any block of code that must be executed atomically, even with the GIL.
Qt threads
When Python hands off control to a third-party compiled module, it releases the GIL. It becomes the responsibility of the module to ensure atomicity where required. When control is passed back, Python will use the GIL. This can make using third-party libraries in conjunction with threads confusing. It is even more difficult to use an external threading library, because it adds uncertainty as to where and when control is in the hands of the module vs the interpreter.
Qt threads operate with the GIL released. Qt threads are able to execute Qt library code (and other compiled module code that does not acquire the GIL) concurrently. However, the Python code executed within the context of a Qt thread still acquires the GIL, and now you have to manage two sets of logic for locking your code.
In the end, both Qt threads and Python threads are wrappers around system threads. Python threads are marginally safer to use, since those parts that are not written in Python (implicitly using the GIL) use the GIL in any case (although the caveat above still applies).
Non-blocking I/O
Threads add extraordinary complexity to your application, especially when dealing with the already complex interaction between the Python interpreter and compiled module code. While many find event-based programming difficult to follow, event-based, non-blocking I/O is often much less difficult to reason about than threads.
With asynchronous I/O, you can always be sure that, for each open descriptor, the path of execution is consistent and orderly. There are, obviously, issues that must be addressed, such as what to do when code depending on one open channel further depends on the results of code to be called when another open channel returns data.
One nice solution for event-based, non-blocking I/O is the new Diesel library. It is restricted to Linux at the moment, but it is extraordinarily fast and quite elegant.
It is also worth your time to learn pyevent, a wrapper around the wonderful libevent library, which provides a basic framework for event-based programming using the fastest available method for your system (determined at compile time).
The advantage of QThread is that it's integrated with the rest of the Qt library. That is, thread-aware methods in Qt will need to know in which thread they run, and to move objects between threads, you will need to use QThread. Another useful feature is running your own event loop in a thread.
If you are accessing a HTTP server, you should consider QNetworkAccessManager.
I asked myself the same question when I was working on PyTalk.
If you are using Qt, you need to use QThread to be able to use the Qt framework, and especially the signal/slot system.
With the signal/slot engine, you will be able to talk from one thread to another and with every part of your project.
Moreover, there is no real performance question about this choice, since both are C++ bindings.
Here is my experience of PyQt and thread.
I encourage you to use QThread.
Jeff has some good points. Only the main thread can do any GUI updates. If you do need to update the GUI from within a thread, Qt 4's queued connection signals make it easy to send data across threads, and they will automatically be used if you're using QThread; I'm not sure if they will be if you're using Python threads, although it's easy to add a parameter to connect().
I can't really recommend either, but I can try describing differences between CPython and Qt threads.
First of all, CPython threads do not run concurrently, at least not Python code. Yes, they do create system threads for each Python thread; however, only the thread currently holding the Global Interpreter Lock is allowed to run (C extensions and FFI code might bypass it, but Python bytecode is not executed while a thread doesn't hold the GIL).
On the other hand, we have Qt threads, which are basically a common layer over system threads, don't have a Global Interpreter Lock, and thus are capable of running concurrently. I'm not sure how PyQt deals with it; however, unless your Qt threads call Python code, they should be able to run concurrently (bar various extra locks that might be implemented in various structures).
For extra fine-tuning, you can modify the number of bytecode instructions that are interpreted before switching ownership of the GIL (sys.setcheckinterval in Python 2). Lower values mean more context switching (and possibly higher responsiveness), but lower performance per individual thread (context switches have their cost; if you try switching every few instructions, it doesn't help speed).
Hope it helps with your problems :)
I can't comment on the exact differences between Python and PyQt threads, but I've been doing what you're attempting to do using QThread, QNetworkAccessManager and making sure to call QApplication.processEvents() while the thread is alive. If GUI responsiveness is really the issue you're trying to solve, the latter will help.

Python Global Interpreter Lock (GIL) workaround on multi-core systems using taskset on Linux?

So I just finished watching this talk on the Python Global Interpreter Lock (GIL) http://blip.tv/file/2232410.
The gist of it is that the GIL is a pretty good design for single-core systems (Python essentially leaves the thread handling/scheduling up to the operating system), but that it can seriously backfire on multi-core systems, where you end up with I/O-intensive threads being heavily blocked by CPU-intensive threads, the expense of context switching, the ctrl-C problem [*], and so on.
So, since the GIL limits us to basically executing a Python program on one CPU, my thought is: why not accept this and simply use taskset on Linux to set the affinity of the program to a certain core/CPU on the system (especially in a situation with multiple Python apps running on a multi-core system)?
So ultimately my question is this: has anyone tried using taskset on Linux with Python applications (especially when running multiple applications on a Linux system, so that multiple cores can be used, with one or two Python applications bound to a specific core), and if so, what were the results? Is it worth doing? Does it make things worse for certain workloads? I plan to do this and test it out (basically see if the program takes more or less time to run), but would love to hear from others about your experiences.
Addition: David Beazley (the guy giving the talk in the linked video) pointed out that some C/C++ extensions manually release the GIL, and if these extensions are optimized for multi-core (e.g. scientific or numeric data analysis), then rather than getting the benefits of multi-core for number crunching, the extension would be effectively crippled, in that it is limited to a single core (thus potentially slowing your program down significantly). On the other hand, if you aren't using extensions such as this, that drawback shouldn't apply.
The reason I am not using the multiprocessing module is that (in this case) part of the program is heavily network I/O bound (HTTP requests), so having a pool of worker threads is a GREAT way to squeeze performance out of a box: a thread fires off an HTTP request and then, while it's waiting on I/O, gives up the GIL so another thread can do its thing. That part of the program can easily run 100+ threads without hurting the CPU much, and lets me actually use the network bandwidth that is available. As for Stackless Python etc., I'm not overly interested in rewriting the program or replacing my Python stack (availability would also be a concern).
[*] Only the main thread can receive signals so if you send a ctrl-C the Python interpreter basically tries to get the main thread to run so it can handle the signal, but since it doesn't directly control which thread is run (this is left to the operating system) it basically tells the OS to keep switching threads until it eventually hits the main thread (which if you are unlucky may take a while).
Another solution is the multiprocessing module: http://docs.python.org/library/multiprocessing.html
Note 1: This is not a limitation of the Python language, but of CPython implementation.
Note 2: With regard to affinity, your OS shouldn't have a problem doing that itself.
I have never heard of anyone using taskset for a performance gain with Python. Doesn't mean it can't happen in your case, but definitely publish your results so others can critique your benchmarking methods and provide validation.
Personally though, I would decouple your I/O threads from the CPU bound threads using a message queue. That way your front end is now completely network I/O bound (some with HTTP interface, some with message queue interface) and ideal for your threading situation. Then the CPU intense processes can either use multiprocessing or just be individual processes waiting for work to arrive on the message queue.
In the longer term you might also want to consider replacing your threaded I/O front-end with Twisted or something like eventlets because, even if they won't help performance, they should improve scalability. Your back-end is now already scalable, because you can run your message queue over any number of machines+CPUs as needed.
An interesting solution is the experiment reported by Ryan Kelly on his blog: http://www.rfk.id.au/blog/entry/a-gil-adventure-threading2/
The results seem very satisfactory.
I've found the following rule of thumb sufficient over the years: if the workers are dependent on some shared state, I use one multiprocessing process per core (CPU-bound), and a fixed pool of worker threads per core (I/O-bound). The OS will take care of assigning the different Python processes to the cores.
The Python GIL is per Python interpreter. That means the only way to avoid problems with it while doing multiprocessing is simply starting multiple interpreters (i.e. using separate processes instead of threads for concurrency) and then using some other IPC primitive for communication between the processes (such as sockets). That being said, the GIL is not a problem when using threads with blocking I/O calls.
The main problem of the GIL, as mentioned earlier, is that you can't execute two different Python code threads at the same time. A thread blocking on a blocking I/O call is blocked, and hence not executing Python code. This means it is not holding the GIL. If you have two CPU-intensive tasks in separate Python threads, that's where the GIL kills multithreading in Python (only in the CPython implementation, as pointed out earlier), because the GIL stops CPU #1 from executing a Python thread while CPU #0 is busy executing the other Python thread.
Until such time as the GIL is removed from Python, co-routines may be used in place of threads. I have it on good authority that this strategy has been implemented by two successful start-ups, using greenlets in at least one case.
This is a pretty old question, but since every time I search for information related to Python and performance on multi-core systems this post is on the result list, I wouldn't let it pass without sharing my thoughts.
You can use the multiprocessing module, which, rather than creating threads for each task, creates another CPython interpreter process to run your code.
It would make your application take advantage of multicore systems.
The only problem that I see with this approach is that you will have a considerable overhead from creating an entire new process stack in memory. (http://en.wikipedia.org/wiki/Thread_(computing)#How_threads_differ_from_processes)
Python Multiprocessing module:
http://docs.python.org/dev/library/multiprocessing.html
"The reason I am not using the multiprocessing module is that (in this case) part of the program is heavily network I/O bound (HTTP requests) so having a pool of worker threads is a GREAT way to squeeze performance out of a box..."
About this, I guess you can also have a pool of processes: http://docs.python.org/dev/library/multiprocessing.html#using-a-pool-of-workers
Regards,
Leo
