I have two simple functions (each loops over a range) that can run independently, with no dependency between them. I tried running them with both the Python multiprocessing module and the threading module.
When I compared the output, the multiprocessing version takes about one second longer than the multithreaded one.
I have read that multithreading is not very efficient because of the Global Interpreter Lock.
Based on the above:
1. Is it best to use multiprocessing when there is no dependency between the two tasks?
2. How do I work out the number of processes/threads I can run on my machine for maximum efficiency?
3. Is there a way to calculate the efficiency gained by using multithreading?
The multithreading version:
import threading
import time

class Thread1(threading.Thread):
    def __init__(self, threadindicator):
        threading.Thread.__init__(self)
        self.threadind = threadindicator

    def run(self):
        starttime = time.time()
        if self.threadind == 'A':
            process1()
        else:
            process2()
        endtime = time.time()
        print 'Thread', self.threadind, 'complete : Time Taken = ', endtime - starttime

def process1():
    for i in range(100000):
        for j in range(10000):
            pass

def process2():
    for i in range(1000):
        for j in range(1000):
            pass

def main():
    print 'Main Thread'
    starttime = time.time()
    thread1 = Thread1('A')
    thread2 = Thread1('B')
    thread1.start()
    thread2.start()
    threads = [thread1, thread2]
    for t in threads:
        t.join()
    endtime = time.time()
    print 'Main Thread Complete , Total Time Taken = ', endtime - starttime

if __name__ == '__main__':
    main()
The multiprocessing version:
from multiprocessing import Process
import platform
import time

def process1():
    # print 'process_1 processor =', platform.processor()
    starttime = time.time()
    for i in range(100000):
        for j in range(10000):
            pass
    endtime = time.time()
    print 'Process 1 complete : Time Taken = ', endtime - starttime

def process2():
    # print 'process_2 processor =', platform.processor()
    starttime = time.time()
    for i in range(1000):
        for j in range(1000):
            pass
    endtime = time.time()
    print 'Process 2 complete : Time Taken = ', endtime - starttime

def main():
    print 'Main Process start'
    starttime = time.time()
    processlist = []
    p1 = Process(target=process1)
    p1.start()
    processlist.append(p1)
    p2 = Process(target=process2)
    p2.start()
    processlist.append(p2)
    for p in processlist:
        p.join()
    endtime = time.time()
    print 'Main Process Complete - Total time taken = ', endtime - starttime

if __name__ == '__main__':
    main()
If you have two CPUs available on your machine, you have two processes which don't have to communicate, and you want to use both of them to make your program faster, you should use the multiprocessing module, rather than the threading module.
The Global Interpreter Lock (GIL) prevents the Python interpreter from making efficient use of more than one CPU by using multiple threads, because only one thread can be executing Python bytecode at a time. Therefore, multithreading won't improve the overall runtime of your application unless you have calls that are blocking (e.g. waiting for IO) or that release the GIL (e.g. numpy will do this for some expensive calls) for extended periods of time. However, the multiprocessing library creates separate subprocesses, and therefore several copies of the interpreter, so it can make efficient use of multiple CPUs.
However, in the example you gave, one process finishes very quickly (less than 0.1 seconds on my machine) and the other takes around 18 seconds; the exact numbers will vary with your hardware. In that case, nearly all the work happens in one process, so you're effectively using only one CPU anyway, and the extra overhead of spawning processes rather than threads is probably what makes the process-based version slower.
If you make both processes do the 18 second nested loops, you should see that the multiprocessing code goes much faster (assuming your machine actually has more than one CPU). On my machine, I saw the multiprocessing code finish in around 18.5 seconds, and the multithreaded code finish in 71.5 seconds. I'm not sure why the multithreaded one took longer than around 36 seconds, but my guess is the GIL is causing some sort of thread contention issue which is slowing down both threads from executing.
As for your second question: assuming there's no other load on the system, use a number of processes equal to the number of CPUs on your system. You can discover this with lscpu on Linux, sysctl hw.ncpu on macOS, or by running dxdiag from the Run dialog on Windows (there are probably other ways, but this is how I always do it).
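Both checks can also be done from Python itself, which avoids the platform-specific tools (a small sketch):

```python
import multiprocessing
import os

# Number of logical CPUs visible to the OS; a reasonable default for the
# size of a worker pool on an otherwise idle machine.
print(multiprocessing.cpu_count())
print(os.cpu_count())  # same idea in the standard library (Python 3.4+)
```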
For the third question, the simplest way to measure how much the extra processes help is to time the total runtime of your program, using time.time() as you were, or the time utility on Linux (e.g. time python myprog.py). The ideal speedup equals the number of processes, so a 4-process program running on 4 CPUs should be at most 4x faster than the same program with one process, assuming you get maximum benefit from the extra processes; if they aren't helping much, the speedup will be less than 4x.
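A minimal sketch of that measurement; the work function and sizes are made up for illustration, and with this toy amount of work the pool startup cost can dominate, so scale n up to see a real speedup:

```python
import time
from multiprocessing import Pool

def busy(n):
    # CPU-bound placeholder: sum of squares up to n
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    work = [200000] * 8  # eight equally sized chunks of work

    t0 = time.time()
    serial = [busy(n) for n in work]
    t_serial = time.time() - t0

    t0 = time.time()
    with Pool(processes=4) as pool:
        parallel = pool.map(busy, work)
    t_parallel = time.time() - t0

    assert serial == parallel  # same answers, different wall-clock time
    print('speedup: %.2fx' % (t_serial / t_parallel))
```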
Related
I want to benchmark 'multiprocessing' a file against processing it sequentially.
Basically, the file is read line by line (it consists of 100 lines), the first character of each line is read, and it is added to a list if it is not already there.
import multiprocessing as mp
import time

database_layout = []

def get_first_characters(string):
    global database_layout
    if string[0:1] not in database_layout:
        database_layout.append(string[0:1])

if __name__ == '__main__':
    # Part one: one process per line
    start_time = time.time()
    bestand_first_read = open('random.txt', 'r', encoding="latin-1")
    for line in bestand_first_read:
        p = mp.Process(target=get_first_characters, args=(line,))
        p.start()
    print(str(len(database_layout)))
    print("Finished part one: " + str(time.time() - start_time))
    bestand_first_read.close()

    # Part two: sequential
    database_layout_two = []
    start_time = time.time()
    bestand_first_read_two = open('random.txt', 'r', encoding="latin-1")
    for linetwo in bestand_first_read_two:
        if linetwo[0:1] not in database_layout_two:
            database_layout_two.append(linetwo[0:1])
    print(str(len(database_layout_two)))
    print("Finished part two: " + str(time.time() - start_time))
But when I execute this program I get the following result:
python test.py
0
Finished part one: 17.105965852737427
10
Finished part two: 0.0
Two problems arise at this point.
1) Why does the multiprocessing version take much longer (about 17 s) than the sequential version (about 0 s)?
2) Why does the list database_layout never get filled? (It is the same code.)
EDIT
The same example, reworked to use Pools.
import multiprocessing as mp
import timeit

def get_first_characters(string):
    return string

if __name__ == '__main__':
    database_layout = []
    start = timeit.default_timer()
    nr = 0
    with mp.Pool(processes=4) as pool:
        for i in range(99999):
            nr += 1
            database_layout.append(pool.starmap(get_first_characters, [(str(i),)]))
    stop = timeit.default_timer()
    print("Pools: %s " % (stop - start))

    database_layout = []
    start = timeit.default_timer()
    for i in range(99999):
        database_layout.append(get_first_characters(str(i)))
    stop = timeit.default_timer()
    print("Regular: %s " % (stop - start))
After running the above example, the following output is shown.
Pools: 22.058468394726148
Regular: 0.051738489109649066
This shows that, in this case, working with Pools is about 440 times slower than sequential processing. Any clue why?
Multiprocessing starts one process for each line of your input, which means you pay the overhead of starting a new Python interpreter for every line of a (possibly very long) file. That accounts for the long time it takes to go through the file.
However, there are other issues with your code. While there is no synchronisation issue due to fighting for the file (since all reads are done in the main process, where the line iteration is going on), you have misunderstood how multiprocessing works.
First of all, your global variable is not global across processes. Unlike threads, processes do not normally share memory; you have to use an explicit interface to share objects (which is also why shared objects must be picklable). When each process is spawned, the new interpreter instance starts by loading your file, which creates a fresh database_layout variable. Each child therefore starts with an empty list and ends with a single-element list of its own. To actually share the list, you might want to use a Manager (also see sharing state between processes in the docs).
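As an illustration, here is a minimal sketch of the Manager approach; the sample lines and function name are invented for the example:

```python
from multiprocessing import Manager, Process

def collect_first_char(line, shared_list, lock):
    first = line[0:1]
    with lock:  # guard the check-then-append against races between processes
        if first not in shared_list:
            shared_list.append(first)

if __name__ == '__main__':
    lines = ['apple', 'avocado', 'banana', 'cherry']
    with Manager() as manager:
        shared = manager.list()   # a list proxy visible to all children
        lock = manager.Lock()
        procs = [Process(target=collect_first_char, args=(line, shared, lock))
                 for line in lines]
        for p in procs:
            p.start()
        for p in procs:
            p.join()  # unlike the original code, wait before reading the result
        result = sorted(shared)
    print(result)  # ['a', 'b', 'c']
```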
Also because of the huge overhead of opening new interpreters, your script performance may benefit from using a pool of workers, since this will open just a few processes for sharing the work. Remember that resource contention will impact performance if opening more processes than you have CPU cores.
The second problem, besides the issue of sharing your variable, is that your code does not wait for the processing to finish. Hence, even if the state was shared, your processing might not have finished when you check the length of database_layout. Again, using a pool might help with that.
PS: unless you want to preserve the insertion order, you might get even faster by using a set, though I'm not sure the Manager supports it.
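For illustration, a sketch of the set approach when everything stays in the main process (the sample lines are made up); this sidesteps sharing entirely:

```python
# Distinct first characters via a set comprehension; order is not preserved,
# but membership tests are O(1) instead of scanning a list.
lines = ['alpha', 'beta', 'apple', 'gamma', 'bravo']
first_chars = {line[0:1] for line in lines}
print(sorted(first_chars))  # ['a', 'b', 'g']
```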
EDIT after the OP EDIT: Your pool code still does most of the processing in the main process, looping and handing single arguments to the workers. You're also submitting each element to the pool individually and appending the result, which effectively uses only one worker at a time (remember that map and starmap wait until the work finishes before returning). Watching it in Process Explorer while the code runs confirms this: the main process is still doing all the hard work (22% on a quad-core machine means one core is maxed out). What you need to do is pass the whole iterable to map() in a single call, minimizing the overhead (especially the switching between the Python and C sides):
import multiprocessing as mp
import timeit

def get_first_characters(number):
    return str(number)[0]

if __name__ == '__main__':
    start = timeit.default_timer()
    with mp.Pool(processes=4) as pool:
        database_layout1 = pool.map(get_first_characters, range(99999))
    stop = timeit.default_timer()
    print("Pools: %s " % (stop - start))

    database_layout2 = []
    start = timeit.default_timer()
    for i in range(99999):
        database_layout2.append(get_first_characters(i))
    stop = timeit.default_timer()
    print("Regular: %s " % (stop - start))

    assert database_layout1 == database_layout2
This got me from this:
Pools: 14.169268206710512
Regular: 0.056271265139002935
To this:
Pools: 0.35610273658926417
Regular: 0.07681461930314981
It's still slower than the single-processing one, but that's mainly because of the message-passing overhead for a very simple function. If your function is more complex it'll make more sense.
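If the per-item cost still matters, map()'s chunksize argument batches many items into each inter-process message; a sketch, reusing the same kind of worker:

```python
import multiprocessing as mp

def first_char(number):
    return str(number)[0]

if __name__ == '__main__':
    with mp.Pool(processes=4) as pool:
        # chunksize=5000 sends items in batches of 5000, so the pickling
        # and IPC cost is paid per batch rather than per item.
        result = pool.map(first_char, range(99999), chunksize=5000)
    print(len(result), result[:3])  # 99999 ['0', '1', '2']
```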
I have a function slow_function that takes about 200 seconds to process a single job_title; it reads from and writes to a global variable.
There is no performance improvement with this code, although it returns the same results. Am I missing something?
Code running five job categories in parallel:
import time
from threading import Thread

threads = []
start = time.time()
for job_title in self.job_titles:
    t = Thread(target=self.slow_function, args=(job_title,))
    threads.append(t)

# Start all threads
for x in threads:
    x.start()

# Wait for all of them to finish
for x in threads:
    x.join()

end = time.time()
print "New time taken for all jobs:", end - start
You should extract slow_function from the class method, because local context cannot be shared between processes. After that you can use this code:
import time
from multiprocessing import Pool

start = time.time()
pool = Pool()
results = pool.map(slow_function, self.job_titles)
for r in results:
    # update your `global` variables here
    pass
end = time.time()
print "New time taken for all jobs:", end - start
You need the multiprocessing module (https://docs.python.org/2/library/multiprocessing.html), since the threading module is limited by the GIL (https://docs.python.org/2/glossary.html#term-global-interpreter-lock).
But note that you cannot use global variables to exchange data between the spawned processes; see https://docs.python.org/2/library/multiprocessing.html#exchanging-objects-between-processes
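As an illustration, a minimal sketch using a multiprocessing.Queue to pass results back instead of a global (the job titles and the work done are placeholders):

```python
from multiprocessing import Process, Queue

def slow_function(job_title, out_queue):
    # ... the real 200-second work would go here; toy result for illustration
    out_queue.put((job_title, len(job_title)))

if __name__ == '__main__':
    job_titles = ['engineer', 'analyst', 'designer']
    q = Queue()
    procs = [Process(target=slow_function, args=(t, q)) for t in job_titles]
    for p in procs:
        p.start()
    # Drain one result per process before joining, so a full pipe
    # can never block a child from exiting.
    results = dict(q.get() for _ in procs)
    for p in procs:
        p.join()
    print(results['analyst'])  # 7
```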
I have searched and cannot find an answer to this question elsewhere. Hopefully I haven't missed something.
I am trying to use Python multiprocessing to essentially batch run some proprietary models in parallel. I have, say, 200 simulations, and I want to batch run them ~10-20 at a time. My problem is that the proprietary software crashes if two models happen to start at the same / similar time. I need to introduce a delay between processes spawned by multiprocessing so that each new model run waits a little bit before starting.
So far, my solution has been to introduce a random time delay at the start of the child process before it fires off the model run. However, this only reduces the probability of any two runs starting at the same time, so I still run into problems when processing a large number of models. I therefore think the delay needs to be built into the multiprocessing part of the code, but I haven't been able to find any documentation or examples of this.
Edit: I am using Python 2.7
This is my code so far:
from time import sleep
import numpy as np
import subprocess
import multiprocessing

def runmodels(arg):
    # interim solution: a random delay reduces (but does not eliminate)
    # the chance that two runs start at the same time
    sleep(float(np.random.rand()) * 120)
    subprocess.call(arg)  # this line actually fires off the model run

if __name__ == '__main__':
    arguments = []  # big list of runs in here
    count = 12
    pool = multiprocessing.Pool(processes=count)
    r = pool.imap_unordered(runmodels, arguments)
    pool.close()
    pool.join()
multiprocessing.Pool() already limits the number of processes running concurrently.
You could use a lock, to separate the starting time of the processes (not tested):
import threading
import multiprocessing

def init(lock):
    global starting
    starting = lock

def run_model(arg):
    starting.acquire()  # no other process can get it until it is released
    threading.Timer(1, starting.release).start()  # release in a second
    # ... start your simulation here

if __name__ == "__main__":
    arguments = ...
    pool = multiprocessing.Pool(processes=12,
                                initializer=init,
                                initargs=[multiprocessing.Lock()])
    for _ in pool.imap_unordered(run_model, arguments):
        pass
One way to do this is with a thread and a semaphore:
from time import sleep
import subprocess
import threading

def runmodels(arg):
    subprocess.call(arg)
    sGlobal.release()  # release for the next launch

if __name__ == '__main__':
    threads = []
    sGlobal = threading.Semaphore(12)  # at most 12 threads at once
    arguments = []  # big list of runs in here
    for arg in arguments:
        sGlobal.acquire()  # blocks if 12 threads are already running
        t = threading.Thread(target=runmodels, args=(arg,))
        threads.append(t)
        t.start()
        sleep(1)  # stagger the start times by one second
    for t in threads:
        t.join()
The answer suggested by jfs caused problems for me as a result of starting a new thread with threading.Timer. If the worker just so happens to finish before the timer does, the timer is killed and the lock is never released.
I propose an alternative approach, in which each successive worker waits until enough time has passed since the start of the previous one. This has the same desired effect without relying on a separate timer thread.
import multiprocessing as mp
import time

def init(shared_val):
    global start_time
    start_time = shared_val

def run_model(arg):
    with start_time.get_lock():
        wait_time = max(0, start_time.value - time.time())
        time.sleep(wait_time)
        start_time.value = time.time() + 1.0  # specify the interval here
    # ... start your simulation here

if __name__ == "__main__":
    arguments = ...
    pool = mp.Pool(processes=12,
                   initializer=init, initargs=[mp.Value('d')])
    for _ in pool.imap_unordered(run_model, arguments):
        pass
I am trying to implement multiprocessing with Python. It works when pooling very quick tasks, but it freezes when pooling longer tasks. See my example below:
from multiprocessing import Pool
import time

def iter_count(addition):
    print "starting ", addition
    for i in range(1, 99999999 + addition):
        if i == 99999999:
            print "completed ", addition
            break

if __name__ == '__main__':
    print "starting pooling "
    pool = Pool(processes=2)
    time_start = time.time()
    possibleFactors = range(1, 3)
    try:
        pool.map(iter_count, possibleFactors)
    except:
        print "exception"
    pool.close()
    pool.join()
    # iter_count(1)
    # iter_count(2)
    time_end = time.time()
    print "total loading time is : ", round(time_end - time_start, 4), " seconds"
In this example, if I use smaller numbers in the for loop (something like 9999999) it works, but with 99999999 it freezes. I tried running the two calls (iter_count(1) and iter_count(2)) in sequence, and that takes about 28 seconds, so it is not really a big task; yet when I pool them it freezes. I know there are some known bugs around multiprocessing in Python, but in my case the same code works for smaller subtasks and freezes for bigger ones.
You're using some version of Python 2 - we can tell because of how print is spelled.
So range(1,99999999+addition) is creating a gigantic list, with at least 100 million integers. And you're doing that in 2 worker processes simultaneously. I bet your disk is grinding itself to dust while the OS swaps out everything it can ;-)
Change range to xrange and see what happens. I bet it will work fine then.
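For what it's worth, the difference is easy to demonstrate; in Python 3 range already behaves like xrange (a small sketch):

```python
import sys

# range in Python 3 (like xrange in Python 2) is lazy: it stores only
# start, stop, and step, so its memory footprint is constant no matter
# how many numbers it spans.
big = range(99999999)
small = range(10)
print(sys.getsizeof(big) == sys.getsizeof(small))  # True
print(big[99999998])  # 99999998: indexing works without building the list
```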
I am trying to run a Python application that executes actions at specified intervals. The code below constantly consumes 100% of the CPU:
import time

def action_print():
    print "hello there"

interval = 5
next_run = 0
while True:
    while next_run > time.time():
        pass  # busy-wait: this spins the CPU at 100%
    next_run = time.time() + interval
    action_print()
I would like to avoid putting the process to sleep, as there will be more actions to execute at various intervals.
Please advise.
If you know when the next run will be, you can simply use time.sleep:
import time

interval = 5
next_run = 0
while True:
    time.sleep(max(0, next_run - time.time()))
    next_run = time.time() + interval
    action_print()
If you want other threads to be able to interrupt you, use an event like this:
import time, threading

interval = 5
next_run = 0
interruptEvent = threading.Event()
while True:
    interruptEvent.wait(max(0, next_run - time.time()))
    interruptEvent.clear()
    next_run = time.time() + interval
    action_print()
Another thread can now call interruptEvent.set() to wake up yours.
In many cases, you will also want to use a Lock to avoid race conditions on shared data. Make sure to clear the event while you hold the lock.
You should also be aware that under cpython, only one thread can execute Python code. Therefore, if your program is CPU-bound over multiple threads and you're using cpython or pypy, you should substitute threading with multiprocessing.
Presumably you do not want to write time.sleep(interval), but replacing pass with time.sleep(0.1) will free up almost all of your CPU while still allowing flexibility in the while predicate.
Alternatively you could use a thread for each event you are scheduling and call time.sleep(interval) in each; the sleeping threads themselves cost almost nothing.
Bottom line: your while/pass loop spins as fast as it can, consuming all of your CPU.
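Since the question mentions running more actions at various intervals, the standard-library sched module covers exactly that without busy-waiting; a sketch with made-up intervals and a demo cutoff:

```python
import sched
import time

scheduler = sched.scheduler(time.time, time.sleep)
counts = {'fast': 0, 'slow': 0}

def tick(name, interval, limit):
    counts[name] += 1
    if counts[name] < limit:  # re-arm until this action hits its demo limit
        scheduler.enter(interval, 1, tick, (name, interval, limit))

# Two independent actions on different intervals
scheduler.enter(0.05, 1, tick, ('fast', 0.05, 4))
scheduler.enter(0.12, 1, tick, ('slow', 0.12, 2))
scheduler.run()  # sleeps between events instead of spinning the CPU
print(counts)  # {'fast': 4, 'slow': 2}
```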