Python Multiprocessing - execution time increased, what am I doing wrong?

I'm doing a simple multiprocessing test and something seems off. I'm running this on an i5-6200U (2.3 GHz, with Turbo Boost).
from multiprocessing import Process, Queue
import time

def multiply(a, b, que):    # add a queue argument so the worker can hand back its result
    que.put(a * b)          # put the return value into the queue

if __name__ == '__main__':
    queue1 = Queue()        # create a queue object
    jobs = []
    start_time = time.time()

    ##### PARALLEL ####################################
    for i in range(0, 400):
        p = Process(target=multiply, args=(5, i, queue1))
        jobs.append(p)
        p.start()
    for j in jobs:
        j.join()
    print("PARALLEL %s seconds ---" % (time.time() - start_time))

    ##### SERIAL ################################
    start_time = time.time()
    for i in range(0, 400):
        multiply(5, i, queue1)
    print("SERIAL %s seconds ---" % (time.time() - start_time))
Output:
PARALLEL 22.12951421737671 seconds ---
SERIAL 0.004009723663330078 seconds ---
Help is much appreciated.

Here's a brief example of (silly) code that gets a nice speedup. As already covered in comments, it doesn't create an absurd number of processes, and the work done per remote function invocation is high compared to interprocess communication overheads.
import multiprocessing as mp
import time

def factor(n):
    for i in range(n):
        pass
    return n

if __name__ == "__main__":
    ns = range(100000, 110000)

    s = time.time()
    p = mp.Pool(4)
    got = p.map(factor, ns)
    print(time.time() - s)
    assert got == list(ns)

    s = time.time()
    got = [factor(n) for n in ns]
    print(time.time() - s)
    assert got == list(ns)
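For comparison, here is a minimal sketch (not from the answer above) of how the question's multiply example could be reworked around a Pool. functools.partial fixes a=5 and chunksize batches many tiny tasks into a few messages; even so, work this small is dominated by process startup and IPC, so the plain loop will likely still win. The point is only the pattern: few processes, batched work.

import multiprocessing as mp
from functools import partial
import time

def multiply(a, b):
    # no queue needed: Pool.map collects the return values for us
    return a * b

if __name__ == "__main__":
    start = time.time()
    with mp.Pool(4) as pool:
        # chunksize=100 ships the 400 tiny tasks in a handful of batches
        par = pool.map(partial(multiply, 5), range(400), chunksize=100)
    print("POOL   %s seconds ---" % (time.time() - start))

    start = time.time()
    ser = [multiply(5, i) for i in range(400)]
    print("SERIAL %s seconds ---" % (time.time() - start))
    assert par == ser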

Related

Python Multiprocessing Queue and Pool slower than normal loop

I am trying to implement multiprocessing in a Python program where I need to run some CPU intensive code. In my test code the multiprocessing Queue and the multiprocessing Pool are both slower than a normal loop with no multiprocessing. During the Pool section of my code, I can see that the CPU usage is maxed out. However, it is still slower than the normal loop! Is there an issue with my code?
import time
from multiprocessing import Process
from multiprocessing import Queue
from multiprocessing import Pool
import random

def run_sims(iterations):
    sim_list = []
    for i in range(iterations):
        sim_list.append(random.uniform(0, 1))
    print(iterations, "count", sum(sim_list)/len(sim_list))
    return (sum(sim_list)/len(sim_list))

def worker(queue):
    i = 0
    while not queue.empty():
        task = queue.get()
        run_sims(task)
        i = i + 1

if __name__ == '__main__':
    queue = Queue()
    iterations_list = [30000000, 30000000, 30000000, 30000000, 30000000]
    it_len = len(iterations_list)

    ## Queue ##
    print("#STARTING QUEUE#")
    start_t = time.perf_counter()
    for i in range(it_len):
        iterations = iterations_list[i]
        queue.put(iterations)
    process = Process(target=worker, args=(queue, ))
    process.start()
    process.join()
    end_t = time.perf_counter()
    print("Queue time: ", end_t - start_t)

    ## Pool ##
    print("#STARTING POOL#")
    start_t = time.perf_counter()
    with Pool() as pool:
        results = pool.imap_unordered(run_sims, iterations_list)
        for res in results:
            res
    end_t = time.perf_counter()
    print("Pool time: ", end_t - start_t)

    ## No Multiprocessing - Normal Loop
    print("#STARTING NORMAL LOOP#")
    start_t = time.perf_counter()
    for i in iterations_list:
        run_sims(i)
    end_t = time.perf_counter()
    print("Normal time: ", end_t - start_t)
I've tried the above code but the multiprocessing sections are slower than the normal loop:
Queue Time: 59 seconds
Pool Time: 83 seconds
Normal Loop Time: 55 seconds
My expectation is that Queue and Pool would be significantly faster than the normal loop.
Added processes to the queue code so that it will perform about the same as the pool. On my machine, queue and pool were both significantly faster than the sequential loop. I have 4 cores and 8 logical CPUs. Since this is a CPU bound task, performance will vary depending on the number of available CPUs and what other work is going on on the machine.
This script keeps the number of workers below the CPU count. If these were network bound tasks, a larger pool could potentially perform faster. Disk bound tasks would likely not benefit from a larger pool.
import time
from multiprocessing import Process
from multiprocessing import Queue
from multiprocessing import Pool
from multiprocessing import cpu_count
import random

def run_sims(iterations):
    sim_list = []
    for i in range(iterations):
        sim_list.append(random.uniform(0, 1))
    print(iterations, "count", sum(sim_list)/len(sim_list))
    return (sum(sim_list)/len(sim_list))

def worker(queue):
    i = 0
    while not queue.empty():
        task = queue.get()
        run_sims(task)
        i = i + 1

if __name__ == '__main__':
    iteration_count = 5
    queue = Queue()
    iterations_list = [30000000] * iteration_count
    it_len = len(iterations_list)

    # guess a parallel execution size. CPU bound, and we want some
    # room for other processes.
    pool_size = max(min(cpu_count() - 2, len(iterations_list)), 2)
    print("Pool size", pool_size)

    ## Queue ##
    print("#STARTING QUEUE#")
    start_t = time.perf_counter()
    for iterations in iterations_list:
        queue.put(iterations)
    processes = []
    for i in range(pool_size):
        processes.append(Process(target=worker, args=(queue, )))
        processes[-1].start()
    for process in processes:
        process.join()
    end_t = time.perf_counter()
    print("Queue time: ", end_t - start_t)

    ## Pool ##
    print("#STARTING POOL#")
    start_t = time.perf_counter()
    with Pool(pool_size) as pool:
        results = pool.imap_unordered(run_sims, iterations_list)
        for res in results:
            res
    end_t = time.perf_counter()
    print("Pool time: ", end_t - start_t)

    ## No Multiprocessing - Normal Loop
    print("#STARTING NORMAL LOOP#")
    start_t = time.perf_counter()
    for i in iterations_list:
        run_sims(i)
    end_t = time.perf_counter()
    print("Normal time: ", end_t - start_t)

Why is training this xgboost model in a subprocess not terminating?

Given the following program, running my_function in a subprocess via run_process_timeout_wrapper leads to a timeout (over 160 s), while running it "normally" takes less than a second.
from multiprocessing import Process, Queue
from queue import Empty
import multiprocessing
import time
import numpy as np
import xgboost

def run_process_timeout_wrapper(function, args, timeout):

    def foo(n, out_q):
        res = function(*n)
        out_q.put(res)  # to get the result back from the child process

    result_q = Queue()
    p = Process(target=foo, args=(args, result_q))
    p.start()
    try:
        x = result_q.get(timeout=timeout)
    except Empty:
        p.terminate()
        raise multiprocessing.TimeoutError("Timed out after waiting for {}s".format(timeout))
    p.terminate()
    return x

def my_function(fun):
    print("Started")
    t1 = time.time()
    pol = xgboost.XGBRegressor()
    pol.fit(np.random.rand(50, 1500), np.random.rand(50, 1))
    print("Took ", time.time() - t1)
    pol.predict(np.random.rand(2, 1500))
    return 5

if __name__ == '__main__':
    t1 = time.time()
    pol = xgboost.XGBRegressor()
    pol.fit(np.random.rand(50, 150000), np.random.rand(50, 1))
    print("Took ", time.time() - t1)

    my_function(None)

    t1 = time.time()
    res = run_process_timeout_wrapper(my_function, (None,), 160)
    print("Res ", res, " Time ", time.time() - t1)
I am running this on Linux. Since it has come up, I have also added a print at the beginning of my_function, showing that the function is at least reached.
Gathered from this issue, I found that forking a multi-threaded application is problematic. One possible solution is to add
import multiprocessing as mp

if __name__ == "__main__":
    mp.set_start_method('spawn')
However, this may lead to other issues.
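If changing the global start method feels too invasive, a narrower option is to build only this wrapper on a spawn context. The sketch below is illustrative rather than part of the original answer (the helper name _call_and_put is made up); note that under spawn the target must be a module-level, picklable function, which is why the nested foo is hoisted out.

import multiprocessing as mp
from queue import Empty

def _call_and_put(function, args, out_q):
    # module-level so the spawn start method can pickle it
    out_q.put(function(*args))

def run_process_timeout_wrapper(function, args, timeout):
    # a spawn context confines the start-method change to this wrapper
    ctx = mp.get_context("spawn")
    result_q = ctx.Queue()
    p = ctx.Process(target=_call_and_put, args=(function, args, result_q))
    p.start()
    try:
        result = result_q.get(timeout=timeout)
    except Empty:
        p.terminate()
        raise mp.TimeoutError("Timed out after waiting for {}s".format(timeout))
    finally:
        p.join()
    return result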

Why Multiprocessing cost the same time as normal way

I am new to Python and to multiprocessing/multithreading. Here is my question: I tried to use the multiprocessing module in Python. I followed the guide and created two separate processes, placed a function in each, ran them, and recorded the time, but the elapsed time did not become any smaller. I was wondering why. Here is my code:
import multi processing
import time

start = time.time()

def mathwork():
    print(sum(j * j for j range(10 ** 7)))

if__name__ ==‘__main__’:
    process1 = multiprocessing.Process(name = ‘process1’,target = mathwork)
    process2 = multiprocessing.Process(name = ‘process2’,target = mathwork)
    process1.start()
    process2.start()
    end = time.time()
    print(end-start)
I'm going to assume that the code you posted was messed with by some text editor.
I'll answer your question using the example below:
import multiprocessing
import time

start = time.time()

def mathwork():
    print(sum(j * j for j in range(10 ** 7)))

if __name__ == '__main__':
    process1 = multiprocessing.Process(name='process1', target=mathwork)
    process2 = multiprocessing.Process(name='process2', target=mathwork)
    process1.start()
    process2.start()
    end = time.time()
    print(end - start)
The reason your code takes just as long to complete, no matter what the processes are doing, is that you aren't waiting for those processes to finish before printing out the time.
To wait for your processes to finish you have to call the join function on them, which gives the following snippet:
if __name__ == '__main__':
    process1 = multiprocessing.Process(name='process1', target=mathwork)
    process2 = multiprocessing.Process(name='process2', target=mathwork)
    process1.start()
    process2.start()
    process1.join()
    process2.join()
    end = time.time()
    print(end - start)
You'll notice that the time is now larger when you're running the processes, because your code is now waiting for them to finish and return.
As an interesting aside (now found out to be due to this quirk between Windows and Unix), if your print statement were outside the __name__ == '__main__' check, you would print a time for each process you ran, because each process loads the file again to get the function definition.
With this method I get:
4.772554874420166 # single execution ( 2 functions in main )
2.486908197402954 # multi processing ( threads for each function )
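A minimal sketch of that quirk (an illustration added here, not from the original answer): on Windows the spawn start method re-imports the module in every child process, so anything at module level, including the timing print if it were unguarded, runs once per process, while code under the __main__ check runs only in the parent.

import multiprocessing
import time

start = time.time()
print("module level: runs in the parent and, under spawn, again in each child")

def mathwork():
    sum(j * j for j in range(10 ** 7))

if __name__ == '__main__':
    p1 = multiprocessing.Process(target=mathwork)
    p2 = multiprocessing.Process(target=mathwork)
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print("guarded: runs only in the parent, %.2fs" % (time.time() - start))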
You measure the time it takes to start the processes, not the time it takes to run them. Wait for the processes to finish by calling join, like this:
import multiprocessing
import time

def mathwork():
    sum(j * j for j in range(10 ** 7))

if __name__ == '__main__':
    start = time.time()
    process1 = multiprocessing.Process(name='process1', target=mathwork)
    process2 = multiprocessing.Process(name='process2', target=mathwork)
    process1.start()
    process2.start()
    process1.join()
    process2.join()
    print('multiprocessing: %s' % (time.time() - start))

    start = time.time()
    mathwork()
    mathwork()
    print('one process: %s' % (time.time() - start))
On my system, the output is:
multiprocessing: 0.9190812110900879
one process: 1.8888437747955322
Showing that indeed, multiprocessing makes this computation go twice as fast.
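As a follow-up sketch (not part of the answer above), the same comparison scales more conveniently with a Pool once there are more than a couple of calls; mathwork takes a dummy argument here only so it fits Pool.map:

import multiprocessing
import time

def mathwork(_):
    return sum(j * j for j in range(10 ** 7))

if __name__ == '__main__':
    start = time.time()
    with multiprocessing.Pool(2) as pool:
        pool.map(mathwork, range(2))
    print('pool of 2 workers: %s' % (time.time() - start))

    start = time.time()
    for i in range(2):
        mathwork(i)
    print('one process: %s' % (time.time() - start))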

Multiprocessing Queue.get() hangs

I'm trying to implement basic multiprocessing and I've run into an issue. The python script is attached below.
import time, sys, random, threading
from multiprocessing import Process
from Queue import Queue
from FrequencyAnalysis import FrequencyStore, AnalyzeFrequency

append_queue = Queue(10)
database = FrequencyStore()

def add_to_append_queue(_list):
    append_queue.put(_list)

def process_append_queue():
    while True:
        item = append_queue.get()
        database.append(item)
        print("Appended to database in %.4f seconds" % database.append_time)
        append_queue.task_done()
    return

def main():
    database.load_db()
    print("Database loaded in %.4f seconds" % database.load_time)
    append_queue_process = Process(target=process_append_queue)
    append_queue_process.daemon = True
    append_queue_process.start()
    #t = threading.Thread(target=process_append_queue)
    #t.daemon = True
    #t.start()

    while True:
        path = raw_input("file: ")
        if path == "exit":
            break
        a = AnalyzeFrequency(path)
        a.analyze()
        print("Analyzed file in %.4f seconds" % a._time)
        add_to_append_queue(a.get_results())

    append_queue.join()
    #append_queue_process.join()
    database.save_db()
    print("Database saved in %.4f seconds" % database.save_time)
    sys.exit(0)

if __name__=="__main__":
    main()
The AnalyzeFrequency analyzes the frequencies of words in a file and get_results() returns a sorted list of said words and frequencies. The list is very large, perhaps 10000 items.
This list is then passed to the add_to_append_queue method, which adds it to a queue. process_append_queue takes the items one by one and adds the frequencies to a "database". This operation takes a bit longer than the actual analysis in main(), so I am trying to use a separate process for this method. When I try to do this with the threading module, everything works perfectly fine, no errors. When I try to use Process, the script hangs at item = append_queue.get().
Could someone please explain what is happening here, and perhaps direct me toward a fix?
All answers appreciated!
UPDATE
The pickle error was my fault, it was just a typo. Now I am using the Queue class within multiprocessing but the append_queue.get() method still hangs.
NEW CODE
import time, sys, random
from multiprocessing import Process, Queue
from FrequencyAnalysis import FrequencyStore, AnalyzeFrequency

append_queue = Queue()
database = FrequencyStore()

def add_to_append_queue(_list):
    append_queue.put(_list)

def process_append_queue():
    while True:
        database.append(append_queue.get())
        print("Appended to database in %.4f seconds" % database.append_time)
    return

def main():
    database.load_db()
    print("Database loaded in %.4f seconds" % database.load_time)
    append_queue_process = Process(target=process_append_queue)
    append_queue_process.daemon = True
    append_queue_process.start()
    #t = threading.Thread(target=process_append_queue)
    #t.daemon = True
    #t.start()

    while True:
        path = raw_input("file: ")
        if path == "exit":
            break
        a = AnalyzeFrequency(path)
        a.analyze()
        print("Analyzed file in %.4f seconds" % a._time)
        add_to_append_queue(a.get_results())

    #append_queue.join()
    #append_queue_process.join()
    print str(append_queue.qsize())
    database.save_db()
    print("Database saved in %.4f seconds" % database.save_time)
    sys.exit(0)

if __name__=="__main__":
    main()
UPDATE 2
This is the database code:
class FrequencyStore:

    def __init__(self):
        self.sorter = Sorter()
        self.db = {}
        self.load_time = -1
        self.save_time = -1
        self.append_time = -1
        self.sort_time = -1

    def load_db(self):
        start_time = time.time()
        try:
            file = open("results.txt", 'r')
        except:
            raise IOError
        self.db = {}
        for line in file:
            word, count = line.strip("\n").split("=")
            self.db[word] = int(count)
        file.close()
        self.load_time = time.time() - start_time

    def save_db(self):
        start_time = time.time()
        _db = []
        for key in self.db:
            _db.append([key, self.db[key]])
        _db = self.sort(_db)
        try:
            file = open("results.txt", 'w')
        except:
            raise IOError
        file.truncate(0)
        for x in _db:
            file.write(x[0] + "=" + str(x[1]) + "\n")
        file.close()
        self.save_time = time.time() - start_time

    def create_sorted_db(self):
        _temp_db = []
        for key in self.db:
            _temp_db.append([key, self.db[key]])
        _temp_db = self.sort(_temp_db)
        _temp_db.reverse()
        return _temp_db

    def get_db(self):
        return self.db

    def sort(self, _list):
        start_time = time.time()
        _list = self.sorter.mergesort(_list)
        _list.reverse()
        self.sort_time = time.time() - start_time
        return _list

    def append(self, _list):
        start_time = time.time()
        for x in _list:
            if x[0] not in self.db:
                self.db[x[0]] = x[1]
            else:
                self.db[x[0]] += x[1]
        self.append_time = time.time() - start_time
Comments suggest you're trying to run this on Windows. As I said in a comment,
If you're running this on Windows, it can't work - Windows doesn't
have fork(), so each process gets its own Queue and they have nothing
to do with each other. The entire module is imported "from scratch" by
each process on Windows. You'll need to create the Queue in main(),
and pass it as an argument to the worker function.
Here's fleshing out what you need to do to make it portable, although I removed all the database stuff because it's irrelevant to the problems you've described so far. I also removed the daemon fiddling, because that's usually just a lazy way to avoid shutting down things cleanly, and often as not will come back to bite you later:
def process_append_queue(append_queue):
    while True:
        x = append_queue.get()
        if x is None:
            break
        print("processed %d" % x)
    print("worker done")

def main():
    import multiprocessing as mp
    append_queue = mp.Queue(10)
    append_queue_process = mp.Process(target=process_append_queue, args=(append_queue,))
    append_queue_process.start()

    for i in range(100):
        append_queue.put(i)
    append_queue.put(None)  # tell worker we're done
    append_queue_process.join()

if __name__=="__main__":
    main()
The output is the "obvious" stuff:
processed 0
processed 1
processed 2
processed 3
processed 4
...
processed 96
processed 97
processed 98
processed 99
worker done
Note: because Windows doesn't (can't) fork(), it's impossible for worker processes to inherit any Python object on Windows. Each process runs the entire program from its start. That's why your original program couldn't work: each process created its own Queue, wholly unrelated to the Queue in the other process. In the approach shown above, only the main process creates a Queue, and the main process passes it (as an argument) to the worker process.
queue.Queue is thread-safe, but doesn't work across processes. This is quite easy to fix, though. Instead of:
from multiprocessing import Process
from Queue import Queue
You want:
from multiprocessing import Process, Queue
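One caveat worth adding (a note not in the original answer): the original script also calls append_queue.task_done() and append_queue.join(), which a plain multiprocessing.Queue does not provide. multiprocessing.JoinableQueue does, so a drop-in sketch for that part would be:

from multiprocessing import Process, JoinableQueue

append_queue = JoinableQueue(10)  # supports task_done() and join() across processes

The point from the other answer still applies on Windows, though: create the queue inside main() and pass it to the worker as an argument rather than relying on a module-level global.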

Real-time-ability python multiprocessing (Queue and Pipe)

I am a little bit confused testing the multiprocessing module.
Let's simulate a digital timer. The code would look like:
from datetime import datetime
import time

start = datetime.now()
while True:
    now = datetime.now()
    delta = now - start
    s = delta.seconds + delta.microseconds/1E6
    print s
    time.sleep(1)
Which returns correctly:
8e-06
1.001072
2.00221
3.003353
4.004416
...
Now I want to read the clock from my virtual external digital clock device using a pipe:
from datetime import datetime
from multiprocessing import Process, Pipe
import time

def ask_timer(conn):
    start = datetime.now()
    while True:
        now = datetime.now()
        delta = now - start
        s = delta.seconds + delta.microseconds/1E6
        conn.send(s)

parent_conn, child_conn = Pipe()
p = Process(target=ask_timer, args=(child_conn,))
p.start()

while True:
    print parent_conn.recv()
    time.sleep(1)
It returns:
2.9e-05
6.7e-05
7.7e-05
8.3e-05
8.9e-05
9.4e-05
0.0001
...
Here the timer doesn't seem to run continuously in the background. The implementation with a Queue looks like:
from datetime import datetime
from multiprocessing import Process, Queue
import time

def ask_timer(q):
    start = datetime.now()
    while True:
        now = datetime.now()
        delta = now - start
        s = delta.seconds + delta.microseconds/1E6
        q.put(s)
        #conn.close()

q = Queue()
p = Process(target=ask_timer, args=(q,))
p.start()

while True:
    print q.get()
    time.sleep(1)
which behaves the same as the pipe version. Is this just a misconception on my part about Python multiprocessing? How can I read a value in real time from a running parallel process?
Everything is working correctly. The child process executes the ask_timer() function completely independently from your main process. There is no time.sleep() in that function, so it just sends (or puts into the queue) deltas in an infinite loop, at intervals of roughly 10 ms.
Once a second your main process asks the child process for data and gets it; that data is simply one of those early, tiny intervals.
The problem is that you're putting far more data into the pipe/queue than you're taking out of it, so when you ask, you get old data. To verify this you can print the queue size in the loop (qsize() won't work on OS X):
from datetime import datetime
from multiprocessing import Process, Queue
import time

def ask_timer(q):
    start = datetime.now()
    while True:
        now = datetime.now()
        delta = now - start
        s = delta.seconds + delta.microseconds / 1E6
        q.put(s)

q = Queue()
p = Process(target=ask_timer, args=(q,))
p.start()

while True:
    print q.get()
    print q.qsize()
    time.sleep(1)
The queue size will grow really fast.
Apparently you can use shared memory to read the current value from the child process.
from multiprocessing import Process, Value
from datetime import datetime
import time
from ctypes import c_double

def ask_timer(v):
    start = datetime.now()
    while True:
        now = datetime.now()
        delta = now - start
        s = delta.seconds + delta.microseconds / 1E6
        v.value = s

val = Value(c_double, 0.0)
p = Process(target=ask_timer, args=(val,))
p.start()

while True:
    print(val.value)
    time.sleep(1)
