I'm considering using Python's multiprocessing package for messaging between local python programs.
This seems like the right way to go IF:
The programs will always run locally on the same machine (and same OS instance)
The programs' implementation will remain in Python
Speed is important
Is it possible in case the python processes were run independently by the user, i.e. one did not spawn the other?
How?
The docs seem to give examples only of cases where one spawns the other.
See Listeners and Clients (the multiprocessing.connection module): it supports connections between processes that were started independently.
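For example, here is a minimal sketch of two independently started scripts talking over multiprocessing.connection (the address, authkey and message contents are made up for illustration):

# server.py -- run this first, in its own terminal
from multiprocessing.connection import Listener

# the address and authkey are arbitrary examples; both sides must agree on them
with Listener(('localhost', 6000), authkey=b'secret') as listener:
    with listener.accept() as conn:
        print('connection accepted from', listener.last_accepted)
        conn.send({'msg': 'hello from the server'})
        print('client replied:', conn.recv())

# client.py -- run this independently, e.g. from another terminal
from multiprocessing.connection import Client

with Client(('localhost', 6000), authkey=b'secret') as conn:
    print('server says:', conn.recv())
    conn.send({'msg': 'hello from the client'})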
The programs will always run locally on the same machine (and same OS instance)
Multiprocessing offers both local and remote concurrency, so running on the same machine is well supported.
The programs' implementation will remain in Python
Yes and no. You could wrap another command in a Python function; this will work, for example:
from multiprocessing import Process
import subprocess

def f(name):
    subprocess.call(["ls", "-l"])

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
Speed is important
That depends on a number of factors:
how much overhead will cause the co-ordination between processes?
how many cores does your CPU have?
how much disk I/O is required by each process? Do they work on the same physical disk?
...
Is it possible in case the python processes were run independently by the user, i.e. one did not spawn the other?
I'm not an expert on the subject, but I once implemented something similar by using files to exchange data (basically, each process monitored the other's output file as its input source, and vice versa).
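A rough sketch of that file-based approach (file names, polling interval and message format are all made up for illustration; a real implementation would need locking or atomic renames to avoid reading half-written data):

import time

INBOX = 'other_program_output.txt'   # hypothetical: file the other program writes
OUTBOX = 'my_output.txt'             # hypothetical: file the other program watches

def handle(message):
    print('received:', message)
    with open(OUTBOX, 'a') as f:     # append a reply for the other side to pick up
        f.write('ack: %s\n' % message)

def poll_inbox():
    seen = 0
    while True:
        try:
            with open(INBOX) as f:
                lines = f.readlines()
        except FileNotFoundError:
            lines = []
        for line in lines[seen:]:    # only act on lines we have not processed yet
            handle(line.rstrip('\n'))
        seen = len(lines)
        time.sleep(1.0)

if __name__ == '__main__':
    poll_inbox()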
HTH!
Related
I was learning about multi-processing and multi-threading.
From what I understand, threads run on the same core, so I was wondering: if I create multiple processes inside a child thread, will they be limited to that single core too?
I'm using python, so this is a question about that specific language but I would like to know if it is the same thing with other languages?
I'm not a Python expert, but I expect this works as in other languages, because scheduling is an OS feature in general.
Process
A process is executed by the OS and owns at least one thread that gets executed; in general, this is your program. You can start more threads inside your process to do heavy calculations or whatever else you have to do, but they belong to the process.
Thread
One or more threads are owned by a process and execution will be distributed across all cores.
Now to your question
When you create a given number of threads, these threads should in general be distributed across all your cores. They're not limited to the core that is executing the Python interpreter.
Even when you create a subprocess from your Python code, that process can and will run on other cores.
You can read more about the general concept here:
Preemptive multitasking
There are some libraries in different languages that abstract a thread into something often called a Task (or similar).
For these special cases it's possible that they just run inside the thread they were created in.
For example, in the .NET world there is a Thread and a Task. People often misuse the term thread when they're talking about a Task, which in general runs inside the thread it was created in.
Every program is represented through one process. A process is the execution context one or multiple threads operate in. All threads in one process share the same tranche of virtual memory assigned to the process.
Python (referring to CPython; Jython and IronPython, for example, have no GIL) is special because it has the global interpreter lock (GIL), which prevents threaded Python code from running on multiple cores in parallel. Only code that releases the GIL can run truly in parallel (I/O operations and some C extensions like numpy). That's why you have to use the multiprocessing module for CPU-bound Python code you need to run in parallel. Processes started with the multiprocessing module each run their own Python interpreter instance, so your code can run truly in parallel.
Note that even a single threaded python-application can run on different cores, not in parallel but sequentially, in case the OS re-schedules execution to another core after a context switch took place.
Back to your question:
if I create multiple processes inside a child thread will they be limited to that single core too?
You don't create processes inside a thread; you spawn new independent Python processes with the same limitations as the original Python process, and which cores the threads of the new processes execute on is up to the OS (...as long as you don't manipulate the core affinity of a process, but let's not go there).
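As a small sketch of that point (the worker function and counts are made up for illustration), processes started from a child thread are ordinary OS processes, and the OS is free to schedule them on any core:

import os
import threading
import multiprocessing

def cpu_work(n):
    # CPU-bound busy work; each call runs in its own process,
    # so the OS can place it on any available core
    total = sum(i * i for i in range(n))
    print('pid %d finished, total=%d' % (os.getpid(), total))

def launch_from_thread():
    procs = [multiprocessing.Process(target=cpu_work, args=(10**7,))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

if __name__ == '__main__':
    # the processes are created inside a child thread, not the main thread
    t = threading.Thread(target=launch_from_thread)
    t.start()
    t.join()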
I have created an application. In this application I use the multiprocessing library. In that application I spin up two processes (instances of the same class) to consume data from Kafka and put it into a Python Queue.
This is the library I used:
Python multiprocessing
Q1. Is it concurrency or is it parallelism?
Q2. Is it multithreading or is it multiprocessing?
Q3. How does Python map processes to CPUs? (Does this question even make sense?)
I understand that in order to speak about multithreading I need to use separate/multiple CPUs (so separate threads are mapped to separate CPU threads).
I understand that in order to speak about multiprocessing I need to use a separate memory space for each process? Is that correct?
I assume if I spin two processes within one Application instance => we talk about concurrency.
If I spin multiple instances of the above application, then we would talk about parallelism? (multiple CPUs, separate memory spaces used)?
I see that Python library defines it as follows: Python multiprocessing library
The multiprocessing package offers both local and remote concurrency
...
Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine.
...
A prime example of this is the Pool object which offers a convenient means of parallelizing the execution of a function across multiple input values, distributing the input data across processes (data parallelism).
First, separate threads are not necessarily mapped to separate CPUs. That's up to the OS, and in Python, due to the GIL, only one thread in a process executes Python bytecode at any given time anyway.
1) It's both concurrency, in that the order of execution is not set, and parallelism, since the multiprocessing package can run on multiple processors, bypassing the GIL limitations.
2) The threading package is another story entirely, so this is definitely multiprocessing.
3) I may be speaking out of line, but IMO Python does NOT map processes to CPUs; it leaves this detail to the OS.
Q1: It is at least concurrency, and it can be parallelism as well (terms intended as defined in the answer to this question). Clearly, if you only have one processor, true parallelism cannot be achieved, because only one process can use the CPU at a time. In that case, however, the multiprocessing library still allows you to define multiple tasks that run in separate processes. It is then the OS's scheduler that decides which process runs when.
Q2: Multiprocessing (...which is kind of implied by the library name). Due to the Global Interpreter Lock present in most Python interpreter implementations, parallelism with threads is impossible. Multiprocessing offers a threading-like interface that makes use of processes under the hood.
Q3: It doesn't. Python spawns processes; the OS scheduler decides what runs where and when. There are some ways to execute processes on specific CPUs, but this is not the default behaviour of multiprocessing (and I'm not aware of any way to force the library to pin processes to CPUs).
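A small illustrative sketch (the worker function and pool size are arbitrary): Pool workers are separate OS processes with their own PIDs and memory spaces, and the OS decides which core each one runs on:

import os
import multiprocessing

def work(x):
    # each worker is a separate process with its own memory space
    return (x * x, os.getpid())

if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(work, range(8))
    print('results:', [r[0] for r in results])
    print('worker pids:', sorted({r[1] for r in results}))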
I'm executing Python code on several files. Since the files are all very big and each call processes one file, it takes very long until the final file is processed. Hence, here is my question: is it possible to use several workers which process the files in parallel?
Is this a possible invocation?:
import annotation as annot # this is a .py-file
import multiprocessing
pool = multiprocessing.Pool(processes=4)
pool.map(annot, "")
The .py-file uses for-loops (etc.) to get all files by itself.
The problem is: if I look at all the processes (with 'top'), I only see one process working on the .py file. So... I suspect that I shouldn't use multiprocessing like this... should I?
Thanks for any help! :)
Yes. Use multiprocessing.Pool.
import multiprocessing
pool = multiprocessing.Pool(processes=<pool size>)
result = pool.map(<your function>, <file list>)
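A more concrete sketch of that for this question (process_one_file, annot.process and the file pattern are assumptions; the point is that map needs a callable plus an iterable of per-file arguments, not a module object and an empty string):

import glob
import multiprocessing

import annotation as annot  # the asker's .py module

def process_one_file(path):
    # hypothetical wrapper: hand a single file to the annotation code
    return annot.process(path)  # assumed entry point; adapt to your module

if __name__ == '__main__':
    files = glob.glob('data/*.txt')  # assumed location/pattern of the input files
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(process_one_file, files)
    print('processed', len(results), 'files')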
My answer is not purely a python answer though I think it's the best approach given your problem.
This will only work on Unix systems (OS X/Linux/etc.).
I do stuff like this all the time, and I am in love with GNU Parallel. See this also for an introduction by the GNU Parallel developer. You will likely have to install it, but it's worth it.
Here's a simple example. Say you have a python script called processFiles.py:
#!/usr/bin/python
#
# Script to print out file name
#
import sys

fileName = sys.argv[1]  # first command line argument (argv[0] is the script itself)
print( fileName )       # adapt for python 2.7 if you need to
To make this file executable:
chmod +x processFiles.py
And say all your large files are in largeFileDir. Then to run all the files in parallel with four processors (-P4), run this at the command line:
$ parallel -P4 processFiles.py ::: $(ls largeFileDir/*)
This will output
file1
file3
file7
file2
...
They may not be in order because each thread is operating independently in parallel. To adapt this to your process, insert your file processing script instead of just stupidly printing the file to screen.
This is preferable to threading in your case because each file processing job will get its own instance of the Python interpreter. Since each file is processed independently (or so it sounds) threading is overkill. In my experience this is the most efficient way to parallelize a process like you describe.
There is something called the Global Interpreter Lock that I don't understand very well, but it has caused me headaches when trying to use Python's built-in threading to run work in parallel. Which is why I say: if you don't need to thread, don't. Instead do as I've recommended and start up independent Python processes.
There are many options.
multiple threads
multiple processes
"green threads", I personally like Eventlet
Then there are more "enterprise" solutions which can even run workers on multiple servers, e.g. Celery; for more, search for "distributed task queue python".
In all cases, your scenario will become more complex, and sometimes you will not gain much, e.g. if your processing is limited by I/O operations (reading the data) and not by computation and processing.
Yes, this is possible. You should investigate the threading module and the multiprocessing module. Both will allow you to execute Python code concurrently. One note with the threading module, though: because of the way Python is implemented (Google "python GIL" if you're interested in the details), only one thread will execute at a time, even if you have multiple CPU cores. This is different from the threading implementation in other languages, where each thread will run at the same time, each using a different core. Because of this limitation, in cases where you want to do CPU-intensive operations concurrently, you'll get better performance with the multiprocessing module.
We have about 500GB of images in various directories we need to process. Each image is about 4MB in size and we have a python script to process each image one at a time (it reads metadata and stores it in a database). Each directory can take 1-4 hours to process depending on size.
We have at our disposal a 2.2Ghz quad core processor and 16GB of RAM on a GNU/Linux OS. The current script is utilizing only one processor. What's the best way to take advantage of the other cores and RAM to process images faster? Will starting multiple Python processes to run the script take advantage of the other cores?
Another option is to use something like Gearman or Beanstalk to farm out the work to other machines. I've taken a look at the multiprocessing library but not sure how I can utilize it.
Will starting multiple Python processes to run the script take advantage of the other cores?
Yes, it will, if the task is CPU-bound. This is probably the easiest option. However, don't spawn a single process per file or per directory; consider using a tool such as parallel(1) and let it spawn something like two processes per core.
Another option is to use something like Gearman or Beanstalk to farm out the work to other machines.
That might work. Also, have a look at the Python binding for ZeroMQ, it makes distributed processing pretty easy.
I've taken a look at the multiprocessing library but not sure how I can utilize it.
Define a function, say process, that reads the images in a single directory, connects to the database and stores the metadata. Let it return a boolean indicating success or failure. Let directories be the list of directories to process. Then
import multiprocessing
pool = multiprocessing.Pool(multiprocessing.cpu_count())
success = all(pool.imap_unordered(process, directories))
will process all the directories in parallel. You can also do the parallelism at the file-level if you want; that needs just a bit more tinkering.
Note that this will stop at the first failure; making it fault-tolerant takes a bit more work.
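A rough sketch of what such a process function might look like (the metadata extraction and database schema are placeholders; also note that several processes writing to one SQLite file can hit locking, so a server-based database may be a better fit):

import os
import sqlite3

def process(directory):
    # hypothetical per-directory worker: read each image's metadata and store it;
    # assumes an images(path, size) table already exists in metadata.db
    try:
        conn = sqlite3.connect('metadata.db')
        with conn:  # commits on success, rolls back on error
            for name in os.listdir(directory):
                path = os.path.join(directory, name)
                size = os.path.getsize(path)  # stand-in for real metadata extraction
                conn.execute('INSERT INTO images (path, size) VALUES (?, ?)',
                             (path, size))
        conn.close()
        return True
    except Exception:
        return False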
Starting independent Python processes is ideal. There will be no lock contention between the processes, and the OS will schedule them to run concurrently.
You may want to experiment to see what the ideal number of instances is - it may be more or less than the number of cores. There will be contention for the disk and cache memory, but on the other hand you may get one process to run while another is waiting for I/O.
You can use a multiprocessing Pool to create processes to increase performance. Let's say you have a function handle_file for processing an image. If you use plain iteration, it can only use at most 100% of one of your cores. To utilize multiple cores, the multiprocessing Pool creates subprocesses for you and distributes your tasks to them. Here is an example:
import os
import multiprocessing

def handle_file(path):
    print('Do something to handle file ...', path)

def run_multiprocess():
    tasks = []
    for filename in os.listdir('.'):
        tasks.append(filename)
        print('Create task', filename)
    pool = multiprocessing.Pool(8)
    result = all(list(pool.imap_unordered(handle_file, tasks)))
    print('Finished, result=', result)

def run_one_process():
    for filename in os.listdir('.'):
        handle_file(filename)

if __name__ == '__main__':
    # run_one_process()  # single-core version, shown for comparison
    run_multiprocess()
The run_one_process function is the single-core way to process the data: simple, but slow. On the other hand, run_multiprocess creates 8 worker processes and distributes the tasks to them. It would be about 8 times faster if you have 8 cores. I suggest you set the worker number to double the number of your cores, or to exactly the number of your cores; you can try both and see which configuration is faster.
For advanced distributed computing, you can use ZeroMQ as larsmans mentioned. It's hard to understand at first. But once you understand it, you can design a very efficient distributed system to process your data. In your case, I think one REQ with multiple REP would be good enough.
Hope this would be helpful.
See the answer to this question.
If the app can process ranges of input data, then you can launch 4 instances of the app with different ranges of input data to process, and then combine the results after they are all done.
Even though that question looks to be Windows-specific, it applies to single-threaded programs on all operating systems.
WARNING: Beware that this process will be I/O bound and too much concurrent access to your hard drive will actually cause the processes as a group to execute slower than sequential processing because of contention for the I/O resource.
If you are reading a large number of files and saving metadata to a database, your program does not need more cores.
Your process is likely I/O-bound, not CPU-bound. Using Twisted with proper deferreds and callbacks would likely outperform any solution that sought to enlist 4 cores.
I think in this scenario it would make perfect sense to use Celery.
I'm new to Python and making some headway with threading - I'm doing some music file conversion and want to be able to utilize the multiple cores on my machine (one active conversion thread per core).
import threading
import subprocess

class EncodeThread(threading.Thread):
    # this is hacked together a bit, but should give you an idea
    def run(self):
        decode = subprocess.Popen(["flac", "--decode", "--stdout", self.src],
                                  stdout=subprocess.PIPE)
        encode = subprocess.Popen(["lame", "--quiet", "-", self.dest],
                                  stdin=decode.stdout)
        encode.communicate()

# some other code puts these threads with various src/dest pairs in a list

for proc in threads:  # `threads` is my list of `threading.Thread` objects
    proc.start()
Everything works, all the files get encoded, bravo! ... however, all the processes spawn immediately, yet I only want to run two at a time (one for each core). As soon as one is finished, I want it to move on to the next one on the list until it is finished, then continue with the program.
How do I do this?
(I've looked at the thread pool and queue functions but I can't find a simple answer.)
Edit: maybe I should add that each of my threads is using subprocess.Popen to run a separate command line decoder (flac) piped to stdout which is fed into a command line encoder (lame/mp3).
If you want to limit the number of parallel threads, use a semaphore:
threadLimiter = threading.BoundedSemaphore(maximumNumberOfThreads)

class EncodeThread(threading.Thread):
    def run(self):
        threadLimiter.acquire()
        try:
            <your code here>
        finally:
            threadLimiter.release()
Start all threads at once. All but maximumNumberOfThreads will wait in threadLimiter.acquire() and a waiting thread will only continue once another thread goes through threadLimiter.release().
"Each of my threads is using subprocess.Popen to run a separate command line [process]".
Why have a bunch of threads manage a bunch of processes? That's exactly what the OS does for you. Why micro-manage what the OS already manages?
Rather than fool around with threads overseeing processes, just fork off processes. Your process table probably can't handle 2000 processes, but it can handle a few dozen (maybe a few hundred) pretty easily.
You want to have more work queued up than your CPUs can possibly handle. The real question is one of memory -- not processes or threads. If the sum of all the active data for all the processes exceeds physical memory, then data has to be swapped, and that will slow you down.
If your processes have a fairly small memory footprint, you can have lots and lots running. If your processes have a large memory footprint, you can't have very many running.
If you're using the default "cpython" version then this won't help you, because only one thread can execute at a time; look up Global Interpreter Lock. Instead, I'd suggest looking at the multiprocessing module in Python 2.6 -- it makes parallel programming a cinch. You can create a Pool object with 2*num_threads processes, and give it a bunch of tasks to do. It will execute up to 2*num_threads tasks at a time, until all are done.
At work I have recently migrated a bunch of Python XML tools (a differ, xpath grepper, and bulk xslt transformer) to use this, and have had very nice results with two processes per processor.
It looks to me that what you want is a pool of some sort, and in that pool you would like to have n threads, where n == the number of processors on your system. You would then have another thread whose only job was to feed jobs into a queue which the worker threads could pick up and process as they became free (so for a dual core machine, you'd have three threads but the main thread would be doing very little).
As you are new to Python though, I'll assume you don't know about the GIL and its side-effects with regard to threading. If you read the article I linked you will soon understand why traditional multithreading solutions are not always the best in the Python world. Instead you should consider using the multiprocessing module (new in Python 2.6; in 2.5 you can use this backport) to achieve the same effect. It side-steps the issue of the GIL by using multiple processes as if they were threads within the same application. There are some restrictions about how you share data (you are working in different memory spaces) but actually this is no bad thing: they just encourage good practice such as minimising the contact points between threads (or processes in this case).
In your case you are probably interested in using a pool as specified here.
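A rough sketch of how that pool could look for this encoding job (the encode function and the jobs list are illustrative; the flac/lame command lines are taken from the question):

import subprocess
import multiprocessing

def encode(job):
    # decode flac to stdout and pipe it into lame, as in the question
    src, dest = job
    decode = subprocess.Popen(["flac", "--decode", "--stdout", src],
                              stdout=subprocess.PIPE)
    encode_proc = subprocess.Popen(["lame", "--quiet", "-", dest],
                                   stdin=decode.stdout)
    decode.stdout.close()  # let lame see end-of-stream when flac exits
    encode_proc.communicate()
    return dest

if __name__ == '__main__':
    jobs = [("song1.flac", "song1.mp3"),   # hypothetical src/dest pairs
            ("song2.flac", "song2.mp3")]
    with multiprocessing.Pool(processes=2) as pool:  # two conversions at a time
        for finished in pool.imap_unordered(encode, jobs):
            print('done:', finished)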
Short answer: don't use threads.
For a working example, you can look at something I've recently tossed together at work. It's a little wrapper around ssh which runs a configurable number of Popen() subprocesses. I've posted it at: Bitbucket: classh (Cluster Admin's ssh Wrapper).
As noted, I don't use threads; I just spawn off the children, loop over them calling their .poll() methods, check for timeouts (also configurable), and replenish the pool as I gather the results. I've played with different sleep() values, and in the past I've written a version (before the subprocess module was added to Python) which used the signal module (SIGCHLD and SIGALRM) and the os.fork() and os.execve() functions --- with my own pipe and file descriptor plumbing, etc.
In my case I'm incrementally printing results as I gather them ... and remembering all of them to summarize at the end (when all the jobs have completed or been killed for exceeding the timeout).
I ran that, as posted, on a list of 25,000 internal hosts (many of which are down, retired, located internationally, not accessible to my test account etc). It completed the job in just over two hours and had no issues. (There were about 60 of them that were timeouts due to systems in degenerate/thrashing states -- proving that my timeout handling works correctly).
So I know this model works reliably. Running 100 concurrent ssh processes with this code doesn't seem to cause any noticeable impact. (It's a moderately old FreeBSD box). I used to run the old (pre-subprocess) version with 100 concurrent processes on my old 512MB laptop without problems, too.
(BTW: I plan to clean this up and add features to it; feel free to contribute or to clone off your own branch of it; that's what Bitbucket.org is for).
I am not an expert in this, but I have read something about "Lock"s. This article might help you out
Hope this helps
I would like to add something, just as a reference for others looking to do something similar, but who might have coded things differently from the OP. This question was the first one I came across when searching, and the chosen answer pointed me in the right direction. Just trying to give something back.
import threading
import time

maximumNumberOfThreads = 2
threadLimiter = threading.BoundedSemaphore(maximumNumberOfThreads)

def simulateThread(a, b):
    threadLimiter.acquire()
    try:
        # do some stuff
        c = a + b
        print('a + b = ', c)
        time.sleep(3)
    except NameError:  # or some other type of error
        print('some error')
    finally:
        # release exactly once, whether or not an error occurred
        # (releasing in both the except and finally blocks would raise
        # ValueError on a BoundedSemaphore)
        threadLimiter.release()

threads = []
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
for i in range(len(sample)):
    thread = threading.Thread(target=simulateThread, args=(sample[i], 2))
    thread.daemon = True
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()
This basically follows what you will find on this site:
https://www.kite.com/python/docs/threading.BoundedSemaphore