How to program to have all processors on your machine used? - python

I am running a single-threaded python program that performs massive data processing on my windows box. My machine has 8 processors. When I monitor the CPU usage in performance tab under Windows Task Manager, it shows that I am using only a very small fraction of the processing power available to me. Only one processor is being used to the fullest and all the rest are almost idle. What should I do to ensure that all my processors are used? Is multithreading a solution?

multithreading cannot make use of extra processors or cores.
You should spawn new processes instead of new threads.
This tool is by far the simplest among all that I have come across:
parallel python
Overview:
PP is a python module which provides mechanism for parallel
execution of python code on SMP
(systems with multiple processors or
cores) and clusters (computers
connected via network).
It is light, easy to install and integrate with other python software.
PP is an open source and cross-platform module written in pure
python

Multithreading is required for a single process, but it is not necessarily a solution; processor affinity can restrict it to a subset of available cores even if you have more than enough threads to use all.

As an addition to what Jon said, if you're using the standard Python interpreter you should understand the limitations with respect to multi-threading. If your threads are pure-python and aren't making system calls, they can't run concurrently on multiple processors due to the Global Interpreter Lock so the benefits to multi-threading are minimal. In this case, perhaps the recommendation would be to go with multiple processes instead or to switch to another Python implementation such as JPython or IronPython, which do not have a Global Interpreter Lock.

you can get that if your program is of the type that would benefit using python's multiprocessing module
multiprocessing uses multiple python process which avoids problems with the GIL so it's possible to use all of those cores with python code it has a easy threaded map and the basis for more complex schemes
it is similar to parallel python but is limited to the local machine and is included with python 2.6 and higher and is metaphorically similar to python's threading

Assuming your task is parallelizable, then yes, threading is certainly a solution. In particular, if you have a lot of data items to process but they can all be handled independently then it should be relatively straightforward to parallelize.
Using multiple processes instead of multiple threads might be another solution - you haven't told us enough about the problem to say, really.

Do this.
Break your task in to steps or stages. Each step reads something, does part of the overall calculation and writes something.
"""Some Step."""
import json
for some_line in sys.stdin:
object= json.loads( some_line )
# process the object
json.dump( result, sys.stdout )
Something like that ought to do fine.
If you have multiple objects that must be communicated, make a simple dictionary of the objects.
results = { 'a': a, 'b': b }
Connect them in a pipeline, like this.
python step1.py | python step2.py | python step3.py >output_file.dat
If you can break things into 8 or more steps, you will use 8 or more cores. And, BTW, this will be blazingly fast for very little real work.

Related

Why python does (not) use more CPUs? [duplicate]

I'm slightly confused about whether multithreading works in Python or not.
I know there has been a lot of questions about this and I've read many of them, but I'm still confused. I know from my own experience and have seen others post their own answers and examples here on StackOverflow that multithreading is indeed possible in Python. So why is it that everyone keep saying that Python is locked by the GIL and that only one thread can run at a time? It clearly does work. Or is there some distinction I'm not getting here?
Many posters/respondents also keep mentioning that threading is limited because it does not make use of multiple cores. But I would say they are still useful because they do work simultaneously and thus get the combined workload done faster. I mean why would there even be a Python thread module otherwise?
Update:
Thanks for all the answers so far. The way I understand it is that multithreading will only run in parallel for some IO tasks, but can only run one at a time for CPU-bound multiple core tasks.
I'm not entirely sure what this means for me in practical terms, so I'll just give an example of the kind of task I'd like to multithread. For instance, let's say I want to loop through a very long list of strings and I want to do some basic string operations on each list item. If I split up the list, send each sublist to be processed by my loop/string code in a new thread, and send the results back in a queue, will these workloads run roughly at the same time? Most importantly will this theoretically speed up the time it takes to run the script?
Another example might be if I can render and save four different pictures using PIL in four different threads, and have this be faster than processing the pictures one by one after each other? I guess this speed-component is what I'm really wondering about rather than what the correct terminology is.
I also know about the multiprocessing module but my main interest right now is for small-to-medium task loads (10-30 secs) and so I think multithreading will be more appropriate because subprocesses can be slow to initiate.
The GIL does not prevent threading. All the GIL does is make sure only one thread is executing Python code at a time; control still switches between threads.
What the GIL prevents then, is making use of more than one CPU core or separate CPUs to run threads in parallel.
This only applies to Python code. C extensions can and do release the GIL to allow multiple threads of C code and one Python thread to run across multiple cores. This extends to I/O controlled by the kernel, such as select() calls for socket reads and writes, making Python handle network events reasonably efficiently in a multi-threaded multi-core setup.
What many server deployments then do, is run more than one Python process, to let the OS handle the scheduling between processes to utilize your CPU cores to the max. You can also use the multiprocessing library to handle parallel processing across multiple processes from one codebase and parent process, if that suits your use cases.
Note that the GIL is only applicable to the CPython implementation; Jython and IronPython use a different threading implementation (the native Java VM and .NET common runtime threads respectively).
To address your update directly: Any task that tries to get a speed boost from parallel execution, using pure Python code, will not see a speed-up as threaded Python code is locked to one thread executing at a time. If you mix in C extensions and I/O, however (such as PIL or numpy operations) and any C code can run in parallel with one active Python thread.
Python threading is great for creating a responsive GUI, or for handling multiple short web requests where I/O is the bottleneck more than the Python code. It is not suitable for parallelizing computationally intensive Python code, stick to the multiprocessing module for such tasks or delegate to a dedicated external library.
Yes. :)
You have the low level thread module and the higher level threading module. But it you simply want to use multicore machines, the multiprocessing module is the way to go.
Quote from the docs:
In CPython, due to the Global Interpreter Lock, only one thread can
execute Python code at once (even though certain performance-oriented
libraries might overcome this limitation). If you want your
application to make better use of the computational resources of
multi-core machines, you are advised to use multiprocessing. However,
threading is still an appropriate model if you want to run multiple
I/O-bound tasks simultaneously.
Threading is Allowed in Python, the only problem is that the GIL will make sure that just one thread is executed at a time (no parallelism).
So basically if you want to multi-thread the code to speed up calculation it won't speed it up as just one thread is executed at a time, but if you use it to interact with a database for example it will.
I feel for the poster because the answer is invariably "it depends what you want to do". However parallel speed up in python has always been terrible in my experience even for multiprocessing.
For example check this tutorial out (second to top result in google): https://www.machinelearningplus.com/python/parallel-processing-python/
I put timings around this code and increased the number of processes (2,4,8,16) for the pool map function and got the following bad timings:
serial 70.8921644706279
parallel 93.49704207479954 tasks 2
parallel 56.02441442012787 tasks 4
parallel 51.026168536394835 tasks 8
parallel 39.18044807203114 tasks 16
code:
# increase array size at the start
# my compute node has 40 CPUs so I've got plenty to spare here
arr = np.random.randint(0, 10, size=[2000000, 600])
.... more code ....
tasks = [2,4,8,16]
for task in tasks:
tic = time.perf_counter()
pool = mp.Pool(task)
results = pool.map(howmany_within_range_rowonly, [row for row in data])
pool.close()
toc = time.perf_counter()
time1 = toc - tic
print(f"parallel {time1} tasks {task}")

Does Python support multithreading? Can it speed up execution time?

I'm slightly confused about whether multithreading works in Python or not.
I know there has been a lot of questions about this and I've read many of them, but I'm still confused. I know from my own experience and have seen others post their own answers and examples here on StackOverflow that multithreading is indeed possible in Python. So why is it that everyone keep saying that Python is locked by the GIL and that only one thread can run at a time? It clearly does work. Or is there some distinction I'm not getting here?
Many posters/respondents also keep mentioning that threading is limited because it does not make use of multiple cores. But I would say they are still useful because they do work simultaneously and thus get the combined workload done faster. I mean why would there even be a Python thread module otherwise?
Update:
Thanks for all the answers so far. The way I understand it is that multithreading will only run in parallel for some IO tasks, but can only run one at a time for CPU-bound multiple core tasks.
I'm not entirely sure what this means for me in practical terms, so I'll just give an example of the kind of task I'd like to multithread. For instance, let's say I want to loop through a very long list of strings and I want to do some basic string operations on each list item. If I split up the list, send each sublist to be processed by my loop/string code in a new thread, and send the results back in a queue, will these workloads run roughly at the same time? Most importantly will this theoretically speed up the time it takes to run the script?
Another example might be if I can render and save four different pictures using PIL in four different threads, and have this be faster than processing the pictures one by one after each other? I guess this speed-component is what I'm really wondering about rather than what the correct terminology is.
I also know about the multiprocessing module but my main interest right now is for small-to-medium task loads (10-30 secs) and so I think multithreading will be more appropriate because subprocesses can be slow to initiate.
The GIL does not prevent threading. All the GIL does is make sure only one thread is executing Python code at a time; control still switches between threads.
What the GIL prevents then, is making use of more than one CPU core or separate CPUs to run threads in parallel.
This only applies to Python code. C extensions can and do release the GIL to allow multiple threads of C code and one Python thread to run across multiple cores. This extends to I/O controlled by the kernel, such as select() calls for socket reads and writes, making Python handle network events reasonably efficiently in a multi-threaded multi-core setup.
What many server deployments then do, is run more than one Python process, to let the OS handle the scheduling between processes to utilize your CPU cores to the max. You can also use the multiprocessing library to handle parallel processing across multiple processes from one codebase and parent process, if that suits your use cases.
Note that the GIL is only applicable to the CPython implementation; Jython and IronPython use a different threading implementation (the native Java VM and .NET common runtime threads respectively).
To address your update directly: Any task that tries to get a speed boost from parallel execution, using pure Python code, will not see a speed-up as threaded Python code is locked to one thread executing at a time. If you mix in C extensions and I/O, however (such as PIL or numpy operations) and any C code can run in parallel with one active Python thread.
Python threading is great for creating a responsive GUI, or for handling multiple short web requests where I/O is the bottleneck more than the Python code. It is not suitable for parallelizing computationally intensive Python code, stick to the multiprocessing module for such tasks or delegate to a dedicated external library.
Yes. :)
You have the low level thread module and the higher level threading module. But it you simply want to use multicore machines, the multiprocessing module is the way to go.
Quote from the docs:
In CPython, due to the Global Interpreter Lock, only one thread can
execute Python code at once (even though certain performance-oriented
libraries might overcome this limitation). If you want your
application to make better use of the computational resources of
multi-core machines, you are advised to use multiprocessing. However,
threading is still an appropriate model if you want to run multiple
I/O-bound tasks simultaneously.
Threading is Allowed in Python, the only problem is that the GIL will make sure that just one thread is executed at a time (no parallelism).
So basically if you want to multi-thread the code to speed up calculation it won't speed it up as just one thread is executed at a time, but if you use it to interact with a database for example it will.
I feel for the poster because the answer is invariably "it depends what you want to do". However parallel speed up in python has always been terrible in my experience even for multiprocessing.
For example check this tutorial out (second to top result in google): https://www.machinelearningplus.com/python/parallel-processing-python/
I put timings around this code and increased the number of processes (2,4,8,16) for the pool map function and got the following bad timings:
serial 70.8921644706279
parallel 93.49704207479954 tasks 2
parallel 56.02441442012787 tasks 4
parallel 51.026168536394835 tasks 8
parallel 39.18044807203114 tasks 16
code:
# increase array size at the start
# my compute node has 40 CPUs so I've got plenty to spare here
arr = np.random.randint(0, 10, size=[2000000, 600])
.... more code ....
tasks = [2,4,8,16]
for task in tasks:
tic = time.perf_counter()
pool = mp.Pool(task)
results = pool.map(howmany_within_range_rowonly, [row for row in data])
pool.close()
toc = time.perf_counter()
time1 = toc - tic
print(f"parallel {time1} tasks {task}")

Concurrency Testing For A Web Service Using Python

I have a web service that is required to handle significant concurrent utilization and volume and I need to test it. Since the service is fairly specialized, it does not lend itself well to a typical testing framework. The test would need to simulate multiple clients concurrently posting to a URL, parsing the resulting Http response, checking that a database has been appropriately updated and making sure certain emails have been correctly sent/received.
The current opinion at my company is that I should write this framework using Python. I have never used Python with multiple threads before and as I was doing my research I came across the Global Interpreter Lock which seems to be the basis of most of Python's concurrency handling. It seems to me that the GIL would prevent Python from being able to achieve true concurrency even on a multi-processor machine. Is this true? Does this scenario change if I use a compiler to compile Python to native code? Am I just barking up the wrong tree entirely and is Python the wrong tool for this job?
The Global Interpreter Lock prevents threads simultaneously executing Python code. This doesn't change when Python is compiled to bytecode, because the bytecode is still run by the Python interpreter, which will enforce the GIL. threading works by switching threads every sys.getcheckinterval() bytecodes.
This doesn't apply to multiprocessing, because it creates multiple Python processes instead of threads. You can have as many of those as your system will support, running truly concurrently.
So yes, you can do this with Python, either with threading or multiprocessing.
you can use python's multiprocessing library to achieve this.
http://docs.python.org/library/multiprocessing.html
Assuming general network conditions, as long you have sufficient system resources Python's regular threading module will allow you to simulate concurrent workload at an higher rate than any a real workload.

Why does my Python program average only 33% CPU per process? How can I make Python use all available CPU?

I use Python 2.5.4. My computer: CPU AMD Phenom X3 720BE, Mainboard 780G, 4GB RAM, Windows 7 32 bit.
I use Python threading but can not make every python.exe process consume 100% CPU. Why are they using only about 33-34% on average?.
I wish to direct all available computer resources toward these large calculations so as to complete them as quickly as possible.
EDIT:
Thanks everybody. Now I'm using Parallel Python and everything works well. My CPU now always at 100%. Thanks all!
It appears that you have a 3-core CPU. If you want to use more than one CPU core in native Python code, you have to spawn multiple processes. (Two or more Python threads cannot run concurrently on different CPUs)
As R. Pate said, Python's multiprocessing module is one way. However, I would suggest looking at Parallel Python instead. It takes care of distributing tasks and message-passing. You can even run tasks on many separate computers with little change to your code.
Using it is quite simple:
import pp
def parallel_function(arg):
return arg
job_server = pp.Server() 
# Define your jobs
job1 = job_server.submit(parallel_function, ("foo",))
job2 = job_server.submit(parallel_function, ("bar",))
# Compute and retrieve answers for the jobs.
print job1()
print job2()
Try the multiprocessing module, as Python, while it has real, native threads, will restrict their concurrent use while the GIL is held. Another alternative, and something you should look at if you need real speed, is writing a C extension module and calling functions in it from Python. You can release the GIL in those C functions.
Also see David Beazley's Mindblowing GIL.
Global Interpreter Lock
The reasons of employing such a lock include:
* increased speed of single-threaded programs (no necessity to acquire or release locks
on all data structures separately)
* easy integration of C libraries that usually are not thread-safe.
Applications written in languages with
a GIL have to use separate processes
(i.e. interpreters) to achieve full
concurrency, as each interpreter has
its own GIL.
From CPU usage it looks like you're still running on a single core. Try running a trivial calculation with 3 or more threads with same threading code and see if it utilizes all cores. If it doesn't, something might be wrong with your threading code.
What about Stackless Python?
You bottleneck is probably somewhere else, like the hard-drive (paging), or memory access.
You should perform some Operating System and Python monitoring to determine where the bottle neck is.
Here is some info for windows 7:
Performance Monitor: You can use Windows Performance Monitor to examine how programs you run affect your computer’s performance, both in real time and by collecting log data for later analysis. (Control Panel-> All Control Panel Items->Performance Information and Tools-> Advanced Tools- > View Performance Monitor)
Resource Monitor: Windows Resource Monitor is a system tool that allows you to view information about the use of hardware (CPU, memory, disk, and network) and software (file handles and modules) resources in real time. You can use Resource Monitor to start, stop, suspend, and resume processes and services. (Control Panel-> All Control Panel Items->Performance Information and Tools-> Advanced Tools- > View Resource Monitor)
I solved the problems that led me to this post by running a second script manually. This post helped me run multiple python scripts at the same time.
I managed to execute in the newly-opened terminal window typing a command there. Not as convenient as shift + enter but does the job.

multiprocess or threading in python?

I have a python application that grabs a collection of data and for each piece of data in that collection it performs a task. The task takes some time to complete as there is a delay involved. Because of this delay, I don't want each piece of data to perform the task subsequently, I want them to all happen in parallel. Should I be using multiprocess? or threading for this operation?
I attempted to use threading but had some trouble, often some of the tasks would never actually fire.
If you are truly compute bound, using the multiprocessing module is probably the lightest weight solution (in terms of both memory consumption and implementation difficulty.)
If you are I/O bound, using the threading module will usually give you good results. Make sure that you use thread safe storage (like the Queue) to hand data to your threads. Or else hand them a single piece of data that is unique to them when they are spawned.
PyPy is focused on performance. It has a number of features that can help with compute-bound processing. They also have support for Software Transactional Memory, although that is not yet production quality. The promise is that you can use simpler parallel or concurrent mechanisms than multiprocessing (which has some awkward requirements.)
Stackless Python is also a nice idea. Stackless has portability issues as indicated above. Unladen Swallow was promising, but is now defunct. Pyston is another (unfinished) Python implementation focusing on speed. It is taking an approach different to PyPy, which may yield better (or just different) speedups.
Tasks runs like sequentially but you have the illusion that are run in parallel. Tasks are good when you use for file or connection I/O and because are lightweights.
Multiprocess with Pool may be the right solution for you because processes runs in parallel so are very good with intensive computing because each process run in one CPU (or core).
Setup multiprocess may be very easy:
from multiprocessing import Pool
def worker(input_item):
output = do_some_work()
return output
pool = Pool() # it make one process for each CPU (or core) of your PC. Use "Pool(4)" to force to use 4 processes, for example.
list_of_results = pool.map(worker, input_list) # Launch all automatically
For small collections of data, simply create subprocesses with subprocess.Popen.
Each subprocess can simply get it's piece of data from stdin or from command-line arguments, do it's processing, and simply write the result to an output file.
When the subprocesses have all finished (or timed out), you simply merge the output files.
Very simple.
You might consider looking into Stackless Python. If you have control over the function that takes a long time, you can just throw some stackless.schedule()s in there (saying yield to the next coroutine), or else you can set Stackless to preemptive multitasking.
In Stackless, you don't have threads, but tasklets or greenlets which are essentially very lightweight threads. It works great in the sense that there's a pretty good framework with very little setup to get multitasking going.
However, Stackless hinders portability because you have to replace a few of the standard Python libraries -- Stackless removes reliance on the C stack. It's very portable if the next user also has Stackless installed, but that will rarely be the case.
Using CPython's threading model will not give you any performance improvement, because the threads are not actually executed in parallel, due to the way garbage collection is handled. Multiprocess would allow parallel execution. Obviously in this case you have to have multiple cores available to farm out your parallel jobs to.
There is much more information available in this related question.
If you can easily partition and separate the data you have, it sounds like you should just do that partitioning externally, and feed them to several processes of your program. (i.e. several processes instead of threads)
IronPython has real multithreading, unlike CPython and it's GIL. So depending on what you're doing it may be worth looking at. But it sounds like your use case is better suited to the multiprocessing module.
To the guy who recommends stackless python, I'm not an expert on it, but it seems to me that he's talking about software "multithreading", which is actually not parallel at all (still runs in one physical thread, so cannot scale to multiple cores.) It's merely an alternative way to structure asynchronous (but still single-threaded, non-parallel) application.
You may want to look at Twisted. It is designed for asynchronous network tasks.

Categories

Resources