Parallelize my Python program

I have a Python program that reads a line from an input file, does some manipulation, and writes it to an output file. I have a quad-core machine, and I want to utilize all of the cores. I can think of three alternatives to do this:
1. Creating n Python processes, each handling 1/n of the total records.
2. Creating n threads in a single Python process, one thread per input record, each thread processing one record.
3. Creating a pool of n threads in a single Python process, each executing an input record.
I have never used Python's multiprocessing capabilities, so can anyone tell me which method is the best option?

The reference implementation of the Python interpreter (CPython) holds the infamous "Global Interpreter Lock" (GIL), effectively allowing only one thread to execute Python code at a time. As a result, multithreading is very limited in Python -- unless your heavy lifting gets done in C extensions that release the GIL.
The simplest way to overcome this limitation is to use the multiprocessing module instead. It has a similar API to threading and is pretty straightforward to use. In your case, you could use it like this (assuming that the manipulation is the hard part):
import multiprocessing

def process_line(line):
    # This function is executed in your worker processes. Manipulate the
    # line and return the results.
    return manipulate(line)

if __name__ == '__main__':
    with open('input.txt') as fin, open('output.txt', 'w') as fout:
        # This creates a pool of N worker processes, where N is the number
        # of CPUs in your machine.
        pool = multiprocessing.Pool()
        # Let the workers do the manipulation and write the results to
        # the output file:
        for manipulated_line in pool.imap(process_line, fin):
            fout.write(manipulated_line)
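A note on that sketch: imap preserves input order and streams results, so memory use stays bounded even for big files. If the per-line work is very cheap, passing a chunksize (the 100 below is a tuning guess, not a measured value) reduces inter-process communication overhead:
for manipulated_line in pool.imap(process_line, fin, chunksize=100):
    fout.write(manipulated_line)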

Number one is the right answer.
First of all, it is easier to create and manage multiple processes than multiple threads. You can use the multiprocessing module or something like pyro to take care of the details. Secondly, threading needs to deal with Python's global interpreter lock, which makes it more complicated even if you are an expert at threading with Java or C#. And most importantly, performance on multicore machines is harder to predict than you might think. If you haven't implemented and measured two different ways of doing things, your intuition as to which way is fastest is probably wrong.
By the way, if you really are an expert at Java or C# threading, then you probably should go with threading instead, but use Jython or IronPython rather than CPython.
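To make option one concrete, here is a minimal sketch under the assumptions of the question (the chunk worker and its upper-casing body are hypothetical stand-ins for the real manipulation):
import multiprocessing

def process_chunk(lines):
    # Hypothetical worker: replace the body with your real per-line manipulation.
    return [line.upper() for line in lines]

if __name__ == '__main__':
    n = multiprocessing.cpu_count()
    with open('input.txt') as fin:
        lines = fin.readlines()
    # Split the records into n contiguous chunks, one per worker process.
    size = max(1, (len(lines) + n - 1) // n)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with multiprocessing.Pool(n) as pool:
        chunk_results = pool.map(process_chunk, chunks)
    with open('output.txt', 'w') as fout:
        for chunk in chunk_results:
            fout.writelines(chunk)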

Reading the same file from several processes concurrently is tricky. Is it possible to split the file beforehand?
While CPython has the GIL, neither Jython nor IronPython has that limitation.
Also make sure that a simple single process doesn't already max out your disk I/O. You will have a hard time gaining anything if it does.
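One rough way to check that (a minimal sketch; input.txt stands in for your real input): time a read-only pass with no manipulation. If it takes nearly as long as the full program, you are I/O bound and extra processes won't help.
import time

tic = time.perf_counter()
with open('input.txt') as fin:
    for line in fin:
        pass  # read only; no manipulation
print(f"pure read pass: {time.perf_counter() - tic:.2f}s")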

Use several workers to execute python code

I'm executing Python code on several files. Since the files are all very big, and since one call processes only one file, it takes very long until the final file is processed. Hence, here is my question: is it possible to use several workers which treat the files in parallel?
Is this a possible invocation?:
import annotation as annot # this is a .py-file
import multiprocessing
pool = multiprocessing.Pool(processes=4)
pool.map(annot, "")
The .py-file uses for-loops (etc.) to get all files by itself.
The problem is: if I have a look at all the processes (with 'top'), I only see 1 process which is working with the .py-file. So... I suspect that I shouldn't use multiprocessing like this... should I?
Thanks for any help! :)
Yes. Use multiprocessing.Pool.
import multiprocessing
pool = multiprocessing.Pool(processes=<pool size>)
result = pool.map(<your function>, <file list>)
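The reason only one process shows up in 'top' is that a module was passed to pool.map with an empty string as the iterable, so there was essentially nothing to parallelize. The mapped callable must be a function taking one file path, and the parent process has to discover the files. A minimal sketch (process_file and the input_dir pattern are assumptions, not your actual code):
import glob
import multiprocessing

def process_file(path):
    # Hypothetical worker: run your annotation logic on one file.
    with open(path) as f:
        data = f.read()
    # ... annotate `data` here ...
    return path

if __name__ == '__main__':
    files = glob.glob('input_dir/*.txt')  # assumed location of the big files
    with multiprocessing.Pool(processes=4) as pool:
        for done in pool.map(process_file, files):
            print('finished', done)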
My answer is not purely a python answer though I think it's the best approach given your problem.
This will only work on Unix systems (OS X/Linux/etc.).
I do stuff like this all the time, and I am in love with GNU Parallel. See this also for an introduction by the GNU Parallel developer. You will likely have to install it, but it's worth it.
Here's a simple example. Say you have a python script called processFiles.py:
#!/usr/bin/python
#
# Script to print out a file name
#
import sys

fileName = sys.argv[1]  # first command-line argument (argv[0] is the script itself)
print(fileName)
To make this file executable:
chmod +x processFiles.py
And say all your large files are in largeFileDir. Then to run all the files in parallel with four processors (-P4), run this at the command line:
$ parallel -P4 ./processFiles.py ::: largeFileDir/*
This will output
file1
file3
file7
file2
...
They may not be in order because each job runs independently in parallel. To adapt this to your process, insert your file processing script instead of just stupidly printing the file to screen.
This is preferable to threading in your case because each file processing job will get its own instance of the Python interpreter. Since each file is processed independently (or so it sounds) threading is overkill. In my experience this is the most efficient way to parallelize a process like you describe.
There is something called the Global Interpreter Lock that I don't understand very well, but has caused me headaches when trying to use python built-ins to hyperthread. Which is why I say if you don't need to thread, don't. Instead do as I've recommended and start up independent python processes.
There are many options:
multiple threads
multiple processes
"green threads"; I personally like Eventlet
Then there are more "Enterprise" solutions that are even able to run workers on multiple servers, e.g. Celery; for more, search for "distributed task queue python".
In all cases, your scenario will become more complex, and sometimes you will not gain much, e.g. if your processing is limited by I/O operations (reading the data) and not by computation and processing (see the sketch below).
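Not in the list above, but closely related: the standard library's concurrent.futures puts the first two options behind one API, so you can switch between threads and processes with a one-word change. A minimal sketch (handle is a stand-in for your real work):
from concurrent.futures import ProcessPoolExecutor  # or ThreadPoolExecutor

def handle(item):
    return item * 2  # stand-in for your real processing

if __name__ == '__main__':
    # Swap in ThreadPoolExecutor for I/O-bound work; the API is identical.
    with ProcessPoolExecutor(max_workers=4) as ex:
        results = list(ex.map(handle, range(100)))
    print(results[:5])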
Yes, this is possible. You should investigate the threading module and the multiprocessing module. Both will allow you to execute Python code concurrently. One note with the threading module, though: because of the way Python is implemented (Google "python GIL" if you're interested in the details), only one thread will execute at a time, even if you have multiple CPU cores. This is different from the threading implementation in other languages, where each thread will run at the same time, each using a different core. Because of this limitation, in cases where you want to do CPU-intensive operations concurrently, you'll get better performance with the multiprocessing module.

Does Python support multithreading? Can it speed up execution time?

I'm slightly confused about whether multithreading works in Python or not.
I know there have been a lot of questions about this and I've read many of them, but I'm still confused. I know from my own experience, and have seen others post their own answers and examples here on StackOverflow, that multithreading is indeed possible in Python. So why is it that everyone keeps saying that Python is locked by the GIL and that only one thread can run at a time? It clearly does work. Or is there some distinction I'm not getting here?
Many posters/respondents also keep mentioning that threading is limited because it does not make use of multiple cores. But I would say they are still useful because they do work simultaneously and thus get the combined workload done faster. I mean why would there even be a Python thread module otherwise?
Update:
Thanks for all the answers so far. The way I understand it is that multithreading will only run in parallel for some I/O tasks, but for CPU-bound work only one thread can run at a time, regardless of cores.
I'm not entirely sure what this means for me in practical terms, so I'll just give an example of the kind of task I'd like to multithread. For instance, let's say I want to loop through a very long list of strings and I want to do some basic string operations on each list item. If I split up the list, send each sublist to be processed by my loop/string code in a new thread, and send the results back in a queue, will these workloads run roughly at the same time? Most importantly will this theoretically speed up the time it takes to run the script?
Another example might be if I can render and save four different pictures using PIL in four different threads, and have this be faster than processing the pictures one by one after each other? I guess this speed-component is what I'm really wondering about rather than what the correct terminology is.
I also know about the multiprocessing module but my main interest right now is for small-to-medium task loads (10-30 secs) and so I think multithreading will be more appropriate because subprocesses can be slow to initiate.
The GIL does not prevent threading. All the GIL does is make sure only one thread is executing Python code at a time; control still switches between threads.
What the GIL prevents then, is making use of more than one CPU core or separate CPUs to run threads in parallel.
This only applies to Python code. C extensions can and do release the GIL to allow multiple threads of C code and one Python thread to run across multiple cores. This extends to I/O controlled by the kernel, such as select() calls for socket reads and writes, making Python handle network events reasonably efficiently in a multi-threaded multi-core setup.
What many server deployments then do, is run more than one Python process, to let the OS handle the scheduling between processes to utilize your CPU cores to the max. You can also use the multiprocessing library to handle parallel processing across multiple processes from one codebase and parent process, if that suits your use cases.
Note that the GIL is only applicable to the CPython implementation; Jython and IronPython use a different threading implementation (the native Java VM and .NET common runtime threads respectively).
To address your update directly: any task that tries to get a speed boost from parallel execution using pure Python code will not see a speed-up, as threaded Python code is locked to one thread executing at a time. If you mix in C extensions or I/O, however (such as PIL or numpy operations), any C code can run in parallel with one active Python thread.
Python threading is great for creating a responsive GUI, or for handling multiple short web requests where I/O is the bottleneck more than the Python code. It is not suitable for parallelizing computationally intensive Python code; stick to the multiprocessing module for such tasks, or delegate to a dedicated external library.
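To make that concrete, a minimal sketch (the busy-loop workload and the pool sizes are arbitrary choices): on CPython the thread pool should take roughly as long as serial execution for this pure-Python task, while the process pool can actually use four cores.
import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def cpu_task(n):
    # Pure-Python busy work; holds the GIL the whole time it runs.
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    work = [2000000] * 8
    for label, make_pool in (('threads', ThreadPool), ('processes', Pool)):
        tic = time.perf_counter()
        with make_pool(4) as pool:
            pool.map(cpu_task, work)
        print(label, round(time.perf_counter() - tic, 2))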
Yes. :)
You have the low-level thread module and the higher-level threading module. But if you simply want to use multicore machines, the multiprocessing module is the way to go.
Quote from the docs:
In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
Threading is allowed in Python; the only problem is that the GIL will make sure that just one thread is executed at a time (no parallelism).
So basically, if you want to multi-thread the code to speed up calculation, it won't speed it up, as just one thread is executed at a time; but if you use threads to interact with a database, for example, it will help.
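A minimal sketch of that I/O case (the sleep is a stand-in for a database round-trip): ten one-second 'queries' finish in about one second with ten threads, because each thread releases the GIL while it waits.
import time
from concurrent.futures import ThreadPoolExecutor

def fake_query(i):
    time.sleep(1)  # stand-in for waiting on a database
    return i

tic = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as ex:
    results = list(ex.map(fake_query, range(10)))
print(f"10 'queries' in {time.perf_counter() - tic:.1f}s")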
I feel for the poster because the answer is invariably "it depends what you want to do". However parallel speed up in python has always been terrible in my experience even for multiprocessing.
For example check this tutorial out (second to top result in google): https://www.machinelearningplus.com/python/parallel-processing-python/
I put timings around this code and increased the number of processes (2, 4, 8, 16) for the pool map function and got the following bad timings:
serial 70.8921644706279
parallel 93.49704207479954 tasks 2
parallel 56.02441442012787 tasks 4
parallel 51.026168536394835 tasks 8
parallel 39.18044807203114 tasks 16
code:
import multiprocessing as mp
import time
import numpy as np

# increase array size at the start
# my compute node has 40 CPUs so I've got plenty to spare here
arr = np.random.randint(0, 10, size=[2000000, 600])
.... more code ....
tasks = [2, 4, 8, 16]
for task in tasks:
    tic = time.perf_counter()
    pool = mp.Pool(task)
    results = pool.map(howmany_within_range_rowonly, [row for row in data])
    pool.close()
    toc = time.perf_counter()
    time1 = toc - tic
    print(f"parallel {time1} tasks {task}")

How to program to have all processors on your machine used?

I am running a single-threaded python program that performs massive data processing on my windows box. My machine has 8 processors. When I monitor the CPU usage in performance tab under Windows Task Manager, it shows that I am using only a very small fraction of the processing power available to me. Only one processor is being used to the fullest and all the rest are almost idle. What should I do to ensure that all my processors are used? Is multithreading a solution?
Multithreading in CPython cannot make use of extra processors or cores.
You should spawn new processes instead of new threads.
This tool is by far the simplest among all that I have come across:
parallel python
Overview:
PP is a python module which provides a mechanism for parallel execution of python code on SMP (systems with multiple processors or cores) and clusters (computers connected via network).
It is light, easy to install and integrates with other python software.
PP is an open source and cross-platform module written in pure python.
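A minimal sketch of pp usage, written from memory of its documented Server()/submit() API; treat the exact signatures as assumptions and check the pp docs before relying on this:
import pp

def busy(n):
    return sum(i * i for i in range(n))

job_server = pp.Server()  # defaults to the number of detected CPUs/cores
jobs = [job_server.submit(busy, (2000000,)) for _ in range(8)]
results = [job() for job in jobs]  # calling a job waits for and returns its result
print(sum(results))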
Multithreading is required if you want to stay within a single process, but it is not necessarily a solution; processor affinity can restrict the process to a subset of the available cores even if you have more than enough threads to use them all.
As an addition to what Jon said, if you're using the standard Python interpreter you should understand the limitations with respect to multi-threading. If your threads are pure-Python and aren't making system calls, they can't run concurrently on multiple processors due to the Global Interpreter Lock, so the benefits of multi-threading are minimal. In this case, perhaps the recommendation would be to go with multiple processes instead, or to switch to another Python implementation such as Jython or IronPython, which do not have a Global Interpreter Lock.
You can get that if your program is of the type that would benefit from Python's multiprocessing module.
multiprocessing uses multiple Python processes, which avoids problems with the GIL, so it's possible to use all of those cores with Python code. It has an easy parallel map and is the basis for more complex schemes.
It is similar to parallel python but is limited to the local machine, is included with Python 2.6 and higher, and is metaphorically similar to Python's threading.
Assuming your task is parallelizable, then yes, threading is certainly a solution. In particular, if you have a lot of data items to process but they can all be handled independently then it should be relatively straightforward to parallelize.
Using multiple processes instead of multiple threads might be another solution - you haven't told us enough about the problem to say, really.
Do this.
Break your task in to steps or stages. Each step reads something, does part of the overall calculation and writes something.
"""Some Step."""
import json
for some_line in sys.stdin:
object= json.loads( some_line )
# process the object
json.dump( result, sys.stdout )
Something like that ought to do fine.
If you have multiple objects that must be communicated, make a simple dictionary of the objects.
results = { 'a': a, 'b': b }
Connect them in a pipeline, like this.
python step1.py | python step2.py | python step3.py >output_file.dat
If you can break things into 8 or more steps, you will use 8 or more cores. And, BTW, this will be blazingly fast for very little real work.

multiprocess or threading in python?

I have a Python application that grabs a collection of data, and for each piece of data in that collection it performs a task. The task takes some time to complete as there is a delay involved. Because of this delay, I don't want each piece of data to perform the task one after another; I want them all to happen in parallel. Should I be using multiprocessing or threading for this operation?
I attempted to use threading but had some trouble; often some of the tasks would never actually fire.
If you are truly compute bound, using the multiprocessing module is probably the lightest weight solution (in terms of both memory consumption and implementation difficulty.)
If you are I/O bound, using the threading module will usually give you good results. Make sure that you use thread safe storage (like the Queue) to hand data to your threads. Or else hand them a single piece of data that is unique to them when they are spawned.
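A minimal sketch of that pattern (the half-second sleep is a stand-in for the delay the question describes, and the sizes are arbitrary): a Queue feeds a fixed set of threads, and one None sentinel per thread shuts them down cleanly.
import queue
import threading
import time

def do_task(item):
    time.sleep(0.5)  # stand-in for the per-item delay
    return item

def worker(q, results):
    while True:
        item = q.get()
        if item is None:  # sentinel: no more work for this thread
            break
        results.append(do_task(item))  # list.append is thread-safe in CPython

q = queue.Queue()
results = []
threads = [threading.Thread(target=worker, args=(q, results)) for _ in range(8)]
for t in threads:
    t.start()
for item in range(20):  # the "collection of data"
    q.put(item)
for _ in threads:
    q.put(None)  # one sentinel per worker thread
for t in threads:
    t.join()
print(len(results), 'tasks done')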
PyPy is focused on performance. It has a number of features that can help with compute-bound processing. They also have support for Software Transactional Memory, although that is not yet production quality. The promise is that you can use simpler parallel or concurrent mechanisms than multiprocessing (which has some awkward requirements.)
Stackless Python is also a nice idea. Stackless has portability issues, as indicated in another answer here. Unladen Swallow was promising, but is now defunct. Pyston is another (unfinished) Python implementation focusing on speed. It is taking an approach different to PyPy, which may yield better (or just different) speedups.
Tasks run sequentially, but they give the illusion of running in parallel. Tasks are good for file or connection I/O, and because they are lightweight.
Multiprocessing with Pool may be the right solution for you, because processes run in parallel, so they are very good for intensive computing; each process runs on one CPU (or core).
Setting up multiprocessing is very easy:
from multiprocessing import Pool

def worker(input_item):
    output = do_some_work(input_item)  # do_some_work is your per-item task
    return output

pool = Pool()  # makes one process per CPU (or core); use Pool(4) to force 4 processes
list_of_results = pool.map(worker, input_list)  # launches all the work automatically
For small collections of data, simply create subprocesses with subprocess.Popen.
Each subprocess can simply get its piece of data from stdin or from command-line arguments, do its processing, and simply write the result to an output file.
When the subprocesses have all finished (or timed out), you simply merge the output files.
Very simple.
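A minimal sketch of that approach (worker.py and its argument convention are hypothetical): start one subprocess per piece of data, wait for them all, then merge the outputs.
import subprocess

# Hypothetical: worker.py takes an input chunk name and an output file name.
procs = [
    subprocess.Popen(['python', 'worker.py', f'chunk-{i}.txt', f'part-{i}.out'])
    for i in range(4)
]
for p in procs:
    p.wait()  # block until every subprocess has finished
# Now merge part-0.out .. part-3.out into the final output file.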
You might consider looking into Stackless Python. If you have control over the function that takes a long time, you can just throw some stackless.schedule()s in there (yielding to the next coroutine), or else you can set Stackless to preemptive multitasking.
In Stackless, you don't have threads, but tasklets or greenlets which are essentially very lightweight threads. It works great in the sense that there's a pretty good framework with very little setup to get multitasking going.
However, Stackless hinders portability because you have to replace a few of the standard Python libraries -- Stackless removes reliance on the C stack. It's very portable if the next user also has Stackless installed, but that will rarely be the case.
Using CPython's threading model will not give you any performance improvement, because the threads are not actually executed in parallel due to the global interpreter lock (which exists in part because of the way garbage collection is handled). Multiprocessing would allow parallel execution. Obviously in this case you have to have multiple cores available to farm out your parallel jobs to.
There is much more information available in this related question.
If you can easily partition and separate the data you have, it sounds like you should just do that partitioning externally, and feed them to several processes of your program. (i.e. several processes instead of threads)
IronPython has real multithreading, unlike CPython and its GIL. So depending on what you're doing, it may be worth looking at. But it sounds like your use case is better suited to the multiprocessing module.
To the guy who recommends Stackless Python: I'm not an expert on it, but it seems to me that he's talking about software "multithreading", which is actually not parallel at all (it still runs in one physical thread, so it cannot scale to multiple cores). It's merely an alternative way to structure an asynchronous (but still single-threaded, non-parallel) application.
You may want to look at Twisted. It is designed for asynchronous network tasks.
