Real-time ability of Python multiprocessing (Queue and Pipe)

I am a little bit confused testing the multiprocessing module.
Let's simulate a digital timer. The code would look like:
start = datetime.now()
while True:
    now = datetime.now()
    delta = now - start
    s = delta.seconds + delta.microseconds / 1E6
    print s
    time.sleep(1)
Which returns correctly:
8e-06
1.001072
2.00221
3.003353
4.004416
...
Now I want to read the clock from my virtual external digital clock device using a pipe:
def ask_timer(conn):
    start = datetime.now()
    while True:
        now = datetime.now()
        delta = now - start
        s = delta.seconds + delta.microseconds / 1E6
        conn.send(s)

parent_conn, child_conn = Pipe()
p = Process(target=ask_timer, args=(child_conn,))
p.start()

while True:
    print parent_conn.recv()
    time.sleep(1)
It returns:
2.9e-05
6.7e-05
7.7e-05
8.3e-05
8.9e-05
9.4e-05
0.0001
...
Here the timer doesn't seem to run permanently in the background. The implementation with Queue looks like:
def ask_timer(q):
    start = datetime.now()
    while True:
        now = datetime.now()
        delta = now - start
        s = delta.seconds + delta.microseconds / 1E6
        q.put(s)
        #conn.close()

q = Queue()
p = Process(target=ask_timer, args=(q,))
p.start()

while True:
    print q.get()
    time.sleep(1)
which behaves the same as the pipe version. Is this just a misconception of mine about Python's multiprocessing? How can I read a value in real time from a running parallel process?

Everything is working correctly. The child process executes the ask_timer() function completely independently from your main process. There is no time.sleep() in that function, so it sends or puts deltas into the queue in an infinite loop, at intervals of roughly 10 ms.
Once a second your main process asks the child process for data and gets the oldest unread value, which is one of those tiny early intervals.
The problem is that you are putting far more data into the pipe/queue than you are taking out of it, so when you ask you get stale data. To verify this you can print the queue size in the loop (note that qsize() won't work on OS X):
def ask_timer(q):
    start = datetime.now()
    while True:
        now = datetime.now()
        delta = now - start
        s = delta.seconds + delta.microseconds / 1E6
        q.put(s)

q = Queue()
p = Process(target=ask_timer, args=(q,))
p.start()

while True:
    print q.get()
    print q.qsize()
    time.sleep(1)
The queue size will grow really fast.
Apparently, you can use shared memory to read the current value from the child process:
from multiprocessing import Process, Value
from datetime import datetime
import time
from ctypes import c_double

def ask_timer(v):
    start = datetime.now()
    while True:
        now = datetime.now()
        delta = now - start
        s = delta.seconds + delta.microseconds / 1E6
        v.value = s

val = Value(c_double, 0.0)
p = Process(target=ask_timer, args=(val,))
p.start()

while True:
    print(val.value)
    time.sleep(1)
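As a usage note (not from the answer above): Value(c_double, 0.0) creates a lock-protected wrapper by default, so plain reads and writes of .value are already synchronized between processes; only read-modify-write updates such as += need the lock taken explicitly. A tiny illustrative sketch:

from multiprocessing import Value
from ctypes import c_double

val = Value(c_double, 0.0)   # synchronized wrapper (lock=True by default)

# += is a read followed by a write, so it is not atomic; guard it explicitly
with val.get_lock():
    val.value += 1.0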

Related

Why is training this xgboost model in a subprocess not terminating?

Given the following program, running my_function in a subprocess using run_process_timeout_wrapper leads to a timeout (over 160 s), while running it "normally" takes less than a second.
from multiprocessing import Process, Queue
from queue import Empty
import multiprocessing
import time
import numpy as np
import xgboost

def run_process_timeout_wrapper(function, args, timeout):

    def foo(n, out_q):
        res = function(*n)
        out_q.put(res)  # to get result back from thread target

    result_q = Queue()

    p = Process(target=foo, args=(args, result_q))
    p.start()

    try:
        x = result_q.get(timeout=timeout)
    except Empty as e:
        p.terminate()
        raise multiprocessing.TimeoutError("Timed out after waiting for {}s".format(timeout))

    p.terminate()
    return x

def my_function(fun):
    print("Started")
    t1 = time.time()
    pol = xgboost.XGBRegressor()
    pol.fit(np.random.rand(5,1500), np.random.rand(50,1))
    print("Took ", time.time() - t1)
    pol.predict(np.random.rand(2,1500))
    return 5

if __name__ == '__main__':
    t1 = time.time()
    pol = xgboost.XGBRegressor()
    pol.fit(np.random.rand(50,150000), np.random.rand(50,1))
    print("Took ", time.time() - t1)

    my_function(None)

    t1 = time.time()
    res = run_process_timeout_wrapper(my_function, (None,), 160)
    print("Res ", res, " Time ", time.time() - t1)
I am running this on Linux. Since it has come up, I have also added a print in the beginning of my_function showing that this function is at least reached.
Gathered from this issue I found that forking a multi-threaded application is problematic. One possible solution is to add
if __name__ == "__main__":
    mp.set_start_method('spawn')
However, this may lead to other issues.
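For illustration, here is a self-contained sketch of a spawn-based timeout wrapper (run_with_timeout, _worker and heavy_task are made-up names, not from the question). Note one of those "other issues": under 'spawn' the process target must be picklable, so the nested foo inside run_process_timeout_wrapper would have to become a module-level function:

import multiprocessing as mp
import queue
import time

def heavy_task(x):
    # placeholder for the real work (e.g. fitting a model)
    time.sleep(0.1)
    return x * x

def _worker(function, args, out_q):
    # must live at module level: under 'spawn' the target gets pickled
    out_q.put(function(*args))

def run_with_timeout(function, args, timeout):
    out_q = mp.Queue()
    p = mp.Process(target=_worker, args=(function, args, out_q))
    p.start()
    try:
        return out_q.get(timeout=timeout)
    except queue.Empty:
        raise mp.TimeoutError("timed out after {}s".format(timeout))
    finally:
        p.terminate()

if __name__ == '__main__':
    # must be called once, before any Process or Queue is created;
    # 'spawn' starts a fresh interpreter instead of forking, so the child
    # does not inherit thread/lock state from multi-threaded libraries
    mp.set_start_method('spawn')
    print(run_with_timeout(heavy_task, (3,), timeout=10))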

Why does multiprocessing take the same time as the normal way?

I am new to Python and to multiprocessing/multithreading. Here is my question: I tried to use the multiprocessing module in Python. I followed the guide, created two separate processes, placed a function into each process, ran them and recorded the time, and then found that the time it took did not decrease. I was wondering why; here is my code:
import multi processing
import time

start = time.time()

def mathwork():
    print(sum(j * j for j range(10 ** 7)))

if__name__ ==‘__main__’:
    process1 = multiprocessing.Process(name = ‘process1’,target = mathwork)
    process2 = multiprocessing.Process(name = ‘process2’,target = mathwork)
    process1.start()
    process2.start()
    end = time.time()
    print(end-start)
I'm going to assume that the code you posted was messed with by some text editor.
I'll answer your question using the example below:
import multiprocessing
import time

start = time.time()

def mathwork():
    print(sum(j * j for j in range(10 ** 7)))

if __name__ == '__main__':
    process1 = multiprocessing.Process(name='process1', target=mathwork)
    process2 = multiprocessing.Process(name='process2', target=mathwork)
    process1.start()
    process2.start()
    end = time.time()
    print(end - start)
The reason your code takes just as long to complete, no matter what the processes are doing, is that you aren't waiting for them to finish before printing out the time.
To wait for your processes to finish you have to call join() on them, which gives the following snippet:
if __name__ == '__main__':
    process1 = multiprocessing.Process(name='process1', target=mathwork)
    process2 = multiprocessing.Process(name='process2', target=mathwork)
    process1.start()
    process2.start()
    process1.join()
    process2.join()
    end = time.time()
    print(end - start)
You'll notice that the measured time is now larger, because your code waits for the processes to finish and return.
As an interesting aside (now found to be due to this quirk between Windows and Unix), if the print statement were outside the __name__ == '__main__' check, you would print a time for each process you ran, because on Windows each child re-imports the file to get the function definition.
With this method I get:
4.772554874420166 # single execution ( 2 functions in main )
2.486908197402954 # multi processing ( threads for each function )
You measure the time it takes to start the processes, not the time it takes to run them. Wait for the processes to finish by calling join, like this:
import multiprocessing
import time

def mathwork():
    sum(j * j for j in range(10 ** 7))

if __name__ == '__main__':
    start = time.time()
    process1 = multiprocessing.Process(name='process1', target=mathwork)
    process2 = multiprocessing.Process(name='process2', target=mathwork)
    process1.start()
    process2.start()
    process1.join()
    process2.join()
    print('multiprocessing: %s' % (time.time() - start))

    start = time.time()
    mathwork()
    mathwork()
    print('one process: %s' % (time.time() - start))
On my system, the output is:
multiprocessing: 0.9190812110900879
one process: 1.8888437747955322
Showing that indeed, multiprocessing makes this computation go twice as fast.
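For more than a couple of work items, the same measurement can be done with a process pool, which handles the start/join bookkeeping; a rough sketch (mathwork is adapted to take a dummy argument so it works with map, and the four items are arbitrary):

import multiprocessing
import time

def mathwork(_):
    return sum(j * j for j in range(10 ** 7))

if __name__ == '__main__':
    start = time.time()
    pool = multiprocessing.Pool()   # one worker process per CPU by default
    pool.map(mathwork, range(4))    # blocks until every item is finished
    pool.close()
    pool.join()
    print('pool: %s' % (time.time() - start))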

Wait for all python processes to finish

I'm starting several python processes with downloads in a loop that calls this piece of code:
startTimeMillis = int(round(time.time() * 1000))

for i in range(10):
    p = multiprocessing.Process(target=performCurl, args=("http://www.google.com/d", i, ))
    p.start()

endTimeMillis = int(round(time.time() * 1000))
totalTimeSeconds = (endTimeMillis - startTimeMillis)
print "The whole process took ", str(totalTimeSeconds)
I want to check the time it takes for all the processes to finish, so how would I make the last part of the code wait for all the processes?
Use p.join() to wait for a process to terminate
all_processes = [multiprocessing.Process(...) for ... in ...]

for p in all_processes:
    p.start()

for p in all_processes:
    p.join()
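Applied to the loop from the question, that could look roughly like this (performCurl here is only a stand-in for the questioner's download function):

import multiprocessing
import time

def performCurl(url, i):
    # stand-in for the real download function
    time.sleep(1)

if __name__ == '__main__':
    startTimeMillis = int(round(time.time() * 1000))

    all_processes = [
        multiprocessing.Process(target=performCurl, args=("http://www.google.com/d", i))
        for i in range(10)
    ]
    for p in all_processes:
        p.start()
    for p in all_processes:
        p.join()   # block until every download process has finished

    endTimeMillis = int(round(time.time() * 1000))
    print("The whole process took", endTimeMillis - startTimeMillis, "ms")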

Kill threads after a given time in Python

I have Python code with threads, and I need the following behaviour: if the threads are not finished after, for example, 1 hour, finish all threads and exit the script; if the hour has not elapsed, wait until all my threads finish.
I tried a daemon thread that sleeps for the hour and then calls sys.exit(), but it doesn't work for me: the script always waits for the sleeping thread, then waits until the worker threads finish, and the sys.exit() has no effect.
import socket, threading, time, sys
from sys import argv
import os

acc_time = 0
transactions_ps = 5

ins = open(sys.argv[1], 'r')
msisdn_list = []
for line in ins:
    msisdn_list.append(line.strip('\n'))
    # print line
ins.close()

def worker(msisdn_list):
    semaphore.acquire()
    global transactions_ps
    print " ***** ", threading.currentThread().getName(), "launched"
    count = 1
    acc_time = 0
    print "len: ", len(msisdn_list)
    for i in msisdn_list:
        try:
            init = time.time()
            time.sleep(2)
            print "sleeping...", i
            time.sleep(4)
            final = time.time()
            acc_time = acc_time + final - init
            print acc_time
        except IOError:
            print "Connection failed", sys.exc_info()[0]
    print "Stopping ", threading.currentThread().getName()
    semaphore.release()

def kill_process(secs_to_die):
    time.sleep(secs_to_die)
    sys.exit()

seconds_to_die = 3600
thread_kill = threading.Thread(target=kill_process, args=(seconds_to_die,))
thread_kill.start()

max_con = 5
semaphore = threading.BoundedSemaphore(max_con)
for i in range(0, 28, transactions_ps):
    w = threading.Thread(target=worker, args=(msisdn_list[i:i+transactions_ps-1],))
    w.setDaemon(True)
    w.start()
How can I do this?
A minimal change to your code that would fix the issue is to use threading.Barrier:
barrier = Barrier(number_of_threads, timeout=3600)
# create (number_of_threads - 1) threads, pass them barrier
# each thread calls barrier.wait() on exit
barrier.wait() # after number_of_threads .wait() calls or on timeout it returns
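Filled out into a runnable sketch (the sleep in worker is just a placeholder for the real per-thread work):

import threading
import time

number_of_threads = 5    # 4 worker threads plus the main thread
barrier = threading.Barrier(number_of_threads, timeout=3600)

def worker(n):
    time.sleep(n)                       # placeholder for the real work
    try:
        barrier.wait()                  # signal "I'm done"
    except threading.BrokenBarrierError:
        pass                            # barrier timed out or was aborted

for n in range(number_of_threads - 1):
    threading.Thread(target=worker, args=(n,), daemon=True).start()

try:
    barrier.wait()                      # returns once all threads arrive, or raises on timeout
except threading.BrokenBarrierError:
    raise SystemExit("timeout")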
A simpler alternative is to use multiprocessing.dummy.Pool that creates daemon threads:
from multiprocessing.dummy import Pool  # use threads

start = timer()
endtime = start + 3600
for result in pool.imap_unordered(work, args):
    if timer() > endtime:
        exit("timeout")
The code doesn't time out until a work item is done, i.e., it expects that processing a single item from the list doesn't take long.
Complete example:
#!/usr/bin/env python3
import logging
import multiprocessing as mp
from multiprocessing.dummy import Pool
from time import monotonic as timer, sleep

info = mp.get_logger().info

def work(i):
    info("start %d", i)
    sleep(1)
    info("end %d", i)

seconds_to_die = 3600
max_con = 5

mp.log_to_stderr().setLevel(logging.INFO)  # enable logging
pool = Pool(max_con)  # no more than max_con at a time
start = timer()
endtime = start + seconds_to_die
for _ in pool.imap_unordered(work, range(10000)):
    if timer() > endtime:
        exit("timeout")
You may refer to this implementation of KThread:
http://python.todaysummary.com/q_python_45717.html

Process stops working while queue is not empty

I'm trying to write a script in Python to convert URLs into their corresponding IPs. Since the URL file is huge (nearly 10 GB), I'm trying to use the multiprocessing lib.
I create one process to write output to a file and a set of processes to convert URLs.
Here is my code:
import multiprocessing as mp
import socket
import time

num_processes = mp.cpu_count()
sentinel = None

def url2ip(inqueue, output):
    v_url = inqueue.get()
    print 'v_url '+v_url
    try:
        v_ip = socket.gethostbyname(v_url)
        output_string = v_url+'|||'+v_ip+'\n'
    except:
        output_string = v_url+'|||-1'+'\n'
    print 'output_string '+output_string
    output.put(output_string)
    print output.full()

def handle_output(output):
    f_ip = open("outputfile", "a")
    while True:
        output_v = output.get()
        if output_v:
            print 'output_v '+output_v
            f_ip.write(output_v)
        else:
            break
    f_ip.close()

if __name__ == '__main__':
    output = mp.Queue()
    inqueue = mp.Queue()
    jobs = []

    proc = mp.Process(target=handle_output, args=(output, ))
    proc.start()

    print 'run in %d processes' % num_processes
    for i in range(num_processes):
        p = mp.Process(target=url2ip, args=(inqueue, output))
        jobs.append(p)
        p.start()

    for line in open('inputfile','r'):
        print 'ori '+line.strip()
        inqueue.put(line.strip())

    for i in range(num_processes):
        # Send the sentinel to tell the workers to end
        inqueue.put(sentinel)

    for p in jobs:
        p.join()

    output.put(None)
    proc.join()
However, it does not work. It produces several outputs (4 out of 10 URLs in the test file), but then it just stops while the queues are not empty (I did check queue.empty()).
Could anyone suggest what's wrong? Thanks.
Your workers exit after processing a single URL each; they need to loop internally until they get the sentinel. However, you should probably just look at multiprocessing.Pool instead, as that does the bookkeeping for you.
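A sketch of the worker loop with that fix (only url2ip changes; everything else in the question's code stays as it is):

def url2ip(inqueue, output):
    # keep pulling URLs until the sentinel (None) arrives
    while True:
        v_url = inqueue.get()
        if v_url is None:          # sentinel: no more work
            break
        try:
            v_ip = socket.gethostbyname(v_url)
            output_string = v_url + '|||' + v_ip + '\n'
        except socket.error:
            output_string = v_url + '|||-1' + '\n'
        output.put(output_string)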
