I'm starting several Python processes that perform downloads, in a loop that calls this piece of code:
startTimeMillis = int(round(time.time() * 1000))
for i in range(10):
    p = multiprocessing.Process(target=performCurl, args=("http://www.google.com/d", i))
    p.start()
endTimeMillis = int(round(time.time() * 1000))
totalTimeMillis = endTimeMillis - startTimeMillis
print("The whole process took " + str(totalTimeMillis) + " ms")
I want to measure the time it takes for all the processes to finish, so how do I make the last part of the code wait for all of them?
Use p.join() to wait for a process to terminate. Start all the processes first, then join each one:
all_processes = [multiprocessing.Process(...) for ... in ...]
for p in all_processes:
    p.start()
for p in all_processes:
    p.join()
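As a minimal sketch of the full pattern, the timing code from the question can be combined with the start-then-join loops. Here fake_download is a hypothetical stand-in for performCurl (it only sleeps), and run_and_time is an illustrative helper name, not something from the original code.

```python
import multiprocessing
import time


def fake_download(url, index):
    # Hypothetical stand-in for performCurl: sleeps instead of downloading.
    time.sleep(0.2)


def run_and_time(n_processes):
    """Start n_processes workers, wait for all of them, return elapsed seconds."""
    start = time.time()
    processes = [
        multiprocessing.Process(target=fake_download,
                                args=("http://www.google.com/d", i))
        for i in range(n_processes)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()  # blocks until this particular process has exited
    return time.time() - start


if __name__ == "__main__":
    print("The whole process took %.2f seconds" % run_and_time(10))
```

Because every process is started before the first join(), the workers run concurrently and the elapsed time should be close to that of the slowest worker rather than the sum of all of them.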
Using the code below I start two processes: write_process writes to a queue and read_process reads from it:
import time
from multiprocessing import Process, Queue, Pool

class QueueFun():
    def writing_queue(self, work_tasks):
        while True:
            print("Writing to queue")
            work_tasks.put(1)
            time.sleep(1)

    def read_queue(self, work_tasks):
        while True:
            print('Reading from queue')
            work_tasks.get()
            time.sleep(2)

if __name__ == '__main__':
    q = QueueFun()
    work_tasks = Queue()
    write_process = Process(target=q.writing_queue,
                            args=(work_tasks,))
    write_process.start()
    read_process = Process(target=q.read_queue,
                           args=(work_tasks,))
    read_process.start()
    write_process.join()
    read_process.join()
Running the above code prints:
Writing to queue
Reading from queue
Writing to queue
Reading from queue
Writing to queue
Writing to queue
Reading from queue
Writing to queue
How do I start N processes to read from the queue?
I tried starting 3 processes using the code below, but only 1 process is started. Is this because the .join() prevents the second process from starting?
for i in range(0, 3):
    read_process = Process(target=q.read_queue,
                           args=(work_tasks,))
    print('Starting read_process', i)
    read_process.start()
    read_process.join()
I also considered using a Pool as described in https://docs.python.org/2/library/multiprocessing.html but this seems relevant only for transforming an existing collection:
print(pool.map(f, range(10)))
How do I start n processes where each one processes a shared queue?
You can put the processes in a list, start them all, and join them outside the creation loop:
if __name__ == '__main__':
    q = QueueFun()
    work_tasks = Queue()
    write_process = Process(target=q.writing_queue,
                            args=(work_tasks,))
    write_process.start()
    processes = []
    for i in range(0, 5):
        processes.append(Process(target=q.read_queue,
                                 args=(work_tasks,)))
    for p in processes:
        p.start()
    write_process.join()
    for p in processes:
        p.join()
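Note that the readers above loop forever, so the join() calls never return. One possible way to let them finish, sketched here under the assumption that the amount of work is finite, is to send one sentinel value per reader. The names SENTINEL, reader and run_readers are illustrative and not part of the original code.

```python
from multiprocessing import Process, Queue

SENTINEL = None  # assumed marker meaning "no more work"


def reader(work_tasks, results):
    # Consume items until the sentinel arrives, then report the count.
    handled = 0
    while True:
        item = work_tasks.get()
        if item is SENTINEL:
            break
        handled += 1
    results.put(handled)


def run_readers(n_readers, n_items):
    work_tasks, results = Queue(), Queue()
    readers = [Process(target=reader, args=(work_tasks, results))
               for _ in range(n_readers)]
    for p in readers:
        p.start()
    for item in range(n_items):
        work_tasks.put(item)
    for _ in readers:
        work_tasks.put(SENTINEL)  # one sentinel per reader
    # Collect the results before joining so no reader is left blocked
    # trying to flush its queued result.
    counts = [results.get() for _ in readers]
    for p in readers:
        p.join()
    return counts


if __name__ == "__main__":
    print(run_readers(5, 100))
```

Each reader takes exactly one sentinel and then exits, so every join() returns once the queue has been drained.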
Example:
I have installed a sensor in a car that sends data continuously. I have to process (fuse) this continuous data, but new data keeps arriving while the processing step is still running. How can I store the data that arrives during processing so that it can be handled afterwards?
Sample code:
buffer1 = []
buffer2 = []

def process_function(buffer):
    pass  # processing

while True:
    # data is received continuously
    buffer1.append(data)
    if len(buffer1) > 0:
        process_function(buffer1)
    buffer2.append(data)
(While process_function works on buffer1, the incoming data should be stored in buffer2, so that once process_function has finished with buffer1 it can process buffer2, and so on.)
You could use a multiprocessing Queue and two processes, one for the producer and one for the consumer:
import time
from multiprocessing import Process, Queue

def collection_sensor_values(mp_queue):
    fake_value = 0
    while True:
        mp_queue.put(f"SENSOR_DATA_{fake_value}")
        fake_value += 1
        time.sleep(2)

def process_function(mp_queue):
    while True:
        sensor_reading = mp_queue.get(block=True)
        print(f"Received sensor reading: {sensor_reading}")

if __name__ == "__main__":
    q = Queue()
    sensor_collector_process = Process(target=collection_sensor_values, args=(q,))
    readings_process = Process(target=process_function, args=(q,))
    all_procs = [sensor_collector_process, readings_process]
    for p in all_procs:
        p.start()
    for p in all_procs:
        # wait for each process in turn; both loop forever, so this blocks until one is stopped
        if p.is_alive():
            p.join()
    for p in all_procs:
        if p.is_alive():
            p.terminate()
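If the fusion step works better on batches than on single readings (closer to the buffer1/buffer2 idea in the question), the consumer could collect everything that has accumulated in the queue before processing. This is only a sketch: drain_batch, max_items and handle_batch are hypothetical names, and get_nowait() on a multiprocessing Queue can report empty while an item is still in transit, so a batch may occasionally be smaller than expected.

```python
import queue  # multiprocessing.Queue raises queue.Empty as well


def drain_batch(q, max_items=1000):
    """Block until one reading is available, then collect any others already queued."""
    batch = [q.get()]  # wait for at least one reading
    while len(batch) < max_items:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break  # nothing else is waiting right now
    return batch


def process_function(mp_queue, handle_batch):
    # Hypothetical consumer loop: handle_batch is whatever fusion step is needed.
    while True:
        handle_batch(drain_batch(mp_queue))
```

While handle_batch is running, new readings simply wait in the queue, which plays the role of the second buffer.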
I am new to Python and to multiprocessing and multithreading. I tried to use the multiprocessing module: following the guide, I created two separate processes, placed a function into each, ran them and recorded the time. I found that the time taken did not decrease, and I was wondering why. Here is my code:
import multi processing
import time
start = time.time()

def mathwork():
    print(sum(j * j for j range(10 ** 7)))

if__name__ ==‘__main__’:
    process1 = multiprocessing.Process(name = ‘process1’,target = mathwork)
    process2 = multiprocessing.Process(name = ‘process2’,target = mathwork)
    process1.start()
    process2.start()
    end = time.time()
    print(end-start)
I'm going to assume that the code you posted was messed with by some text editor.
I'll answer your question using the example below:
import multiprocessing
import time
start = time.time()

def mathwork():
    print(sum(j * j for j in range(10 ** 7)))

if __name__ == '__main__':
    process1 = multiprocessing.Process(name='process1', target=mathwork)
    process2 = multiprocessing.Process(name='process2', target=mathwork)
    process1.start()
    process2.start()
    end = time.time()
    print(end - start)
The reason your code takes just as long to complete, no matter what the processes are doing, is that you aren't waiting for them to complete before printing the time.
To wait for your processes to finish, call join on each of them, which gives the following snippet:
if __name__ == '__main__':
    process1 = multiprocessing.Process(name='process1', target=mathwork)
    process2 = multiprocessing.Process(name='process2', target=mathwork)
    process1.start()
    process2.start()
    process1.join()
    process2.join()
    end = time.time()
    print(end - start)
You'll notice that the time is now larger when you're running the processes, because your code is now waiting for them to finish and return.
As an interesting aside (this turns out to be due to a difference in how Windows and Unix start child processes): if your print statement were outside the __name__ == '__main__' check, you would print a time for each process you ran, because on Windows each child process loads the file again to get the function definition.
With this method I get:
4.772554874420166 # single execution ( 2 functions in main )
2.486908197402954 # multi processing ( a process for each function )
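The aside above about Windows and Unix comes down to the process start method. A small check like the following can show which one is in use; describe_start_method is a hypothetical helper name, and the re-import rule it encodes is a simplification (with "fork" the child inherits the parent's memory, while "spawn" and "forkserver" import the main module again in the child).

```python
import multiprocessing


def describe_start_method():
    # "fork" copies the parent process; the other methods start a fresh
    # interpreter that imports the main module again.
    method = multiprocessing.get_start_method()
    reimports_main = method != "fork"
    return method, reimports_main


if __name__ == "__main__":
    method, reimports_main = describe_start_method()
    print("start method:", method)
    print("child re-imports the main module:", reimports_main)
```

When the main module is re-imported, any module-level code outside the __name__ == '__main__' check runs once per child, which is why the timing print would appear several times.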
You measure the time it takes to start the processes, not the time it takes to run them. Wait for the processes to finish by calling join, like this:
import multiprocessing
import time

def mathwork():
    sum(j * j for j in range(10 ** 7))

if __name__ == '__main__':
    start = time.time()
    process1 = multiprocessing.Process(name='process1', target=mathwork)
    process2 = multiprocessing.Process(name='process2', target=mathwork)
    process1.start()
    process2.start()
    process1.join()
    process2.join()
    print('multiprocessing: %s' % (time.time() - start))
    start = time.time()
    mathwork()
    mathwork()
    print('one process: %s' % (time.time() - start))
On my system, the output is:
multiprocessing: 0.9190812110900879
one process: 1.8888437747955322
This shows that multiprocessing does indeed make this computation about twice as fast.
I am a little bit confused testing the multiprocessing module.
Let's simulate a digital timer. The code would look like:
from datetime import datetime
import time

start = datetime.now()
while True:
    now = datetime.now()
    delta = now - start
    s = delta.seconds + delta.microseconds / 1E6
    print(s)
    time.sleep(1)
Which returns correctly:
8e-06
1.001072
2.00221
3.003353
4.004416
...
Now I want to read the clock from my virtual external digital clock device using a pipe:
from multiprocessing import Process, Pipe

def ask_timer(conn):
    start = datetime.now()
    while True:
        now = datetime.now()
        delta = now - start
        s = delta.seconds + delta.microseconds / 1E6
        conn.send(s)

parent_conn, child_conn = Pipe()
p = Process(target=ask_timer, args=(child_conn,))
p.start()
while True:
    print(parent_conn.recv())
    time.sleep(1)
It returns:
2.9e-05
6.7e-05
7.7e-05
8.3e-05
8.9e-05
9.4e-05
0.0001
...
Here the timer doesn't seem to run permanently in the background. The implementation with a Queue looks like:
from multiprocessing import Process, Queue

def ask_timer(q):
    start = datetime.now()
    while True:
        now = datetime.now()
        delta = now - start
        s = delta.seconds + delta.microseconds / 1E6
        q.put(s)
        #conn.close()

q = Queue()
p = Process(target=ask_timer, args=(q,))
p.start()
while True:
    print(q.get())
    time.sleep(1)
This behaves the same as the pipe version. Is this just my misconception of multiprocessing in Python? How can I ask a running parallel process for a value in real time?
Everything is working correctly. The child process executes the ask_timer() function completely independently of your main process. There is no time.sleep() in this function, so it sends deltas into the pipe or queue in an infinite loop, at intervals on the order of 10 microseconds judging by the output above.
Once a second your main process asks the child process for data and gets it. That data is one of those small intervals.
The problem is that you are putting much more data into the pipe/queue than you are taking from it, so you get old data when you ask. To test that, you can print the queue size in the loop (this won't work on macOS, where qsize() raises NotImplementedError):
from multiprocessing import Process, Queue
from datetime import datetime
import time

def ask_timer(q):
    start = datetime.now()
    while True:
        now = datetime.now()
        delta = now - start
        s = delta.seconds + delta.microseconds / 1E6
        q.put(s)

q = Queue()
p = Process(target=ask_timer, args=(q,))
p.start()
while True:
    print(q.get())
    print(q.qsize())
    time.sleep(1)
The queue size will grow really fast.
Alternatively, you can use shared memory to read the current value from the child process:
from multiprocessing import Process, Value
from datetime import datetime
import time
from ctypes import c_double

def ask_timer(v):
    start = datetime.now()
    while True:
        now = datetime.now()
        delta = now - start
        s = delta.seconds + delta.microseconds / 1E6
        v.value = s

val = Value(c_double, 0.0)
p = Process(target=ask_timer, args=(val,))
p.start()
while True:
    print(val.value)
    time.sleep(1)
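Another option, offered here as a sketch rather than as part of the answer above, is request/reply over a Pipe: the child computes the elapsed time only when the parent asks for it, so nothing stale piles up. The message strings "now" and "stop" and the function names timer_server and ask are assumptions made for this example.

```python
import time
from multiprocessing import Pipe, Process


def timer_server(conn):
    # Reply to each request with the seconds elapsed since this process started.
    start = time.monotonic()
    while True:
        request = conn.recv()  # blocks until the parent asks
        if request == "stop":
            break
        conn.send(time.monotonic() - start)
    conn.close()


def ask(conn):
    # Send one request and wait for the matching reply.
    conn.send("now")
    return conn.recv()


if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=timer_server, args=(child_conn,))
    p.start()
    for _ in range(3):
        print(ask(parent_conn))
        time.sleep(0.5)
    parent_conn.send("stop")
    p.join()
```

The trade-off compared with the shared Value is one round trip per reading, in exchange for being able to return any picklable object rather than a single number.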
I am trying to write a Python script that converts each URL to its corresponding IP address. Since the URL file is huge (nearly 10 GB), I'm trying to use the multiprocessing library.
I create one process to write the output to a file and a set of processes to convert the URLs.
Here is my code:
import multiprocessing as mp
import socket
import time

num_processes = mp.cpu_count()
sentinel = None

def url2ip(inqueue, output):
    v_url = inqueue.get()
    print('v_url ' + v_url)
    try:
        v_ip = socket.gethostbyname(v_url)
        output_string = v_url + '|||' + v_ip + '\n'
    except:
        output_string = v_url + '|||-1' + '\n'
    print('output_string ' + output_string)
    output.put(output_string)
    print(output.full())

def handle_output(output):
    f_ip = open("outputfile", "a")
    while True:
        output_v = output.get()
        if output_v:
            print('output_v ' + output_v)
            f_ip.write(output_v)
        else:
            break
    f_ip.close()

if __name__ == '__main__':
    output = mp.Queue()
    inqueue = mp.Queue()
    jobs = []
    proc = mp.Process(target=handle_output, args=(output, ))
    proc.start()
    print('run in %d processes' % num_processes)
    for i in range(num_processes):
        p = mp.Process(target=url2ip, args=(inqueue, output))
        jobs.append(p)
        p.start()
    for line in open('inputfile', 'r'):
        print('ori ' + line.strip())
        inqueue.put(line.strip())
    for i in range(num_processes):
        # Send the sentinel to tell the workers to end
        inqueue.put(sentinel)
    for p in jobs:
        p.join()
    output.put(None)
    proc.join()
However, it did not work. It did produce several outputs (4 out of 10 URLs in the test file), but then it suddenly stops while the queues are not empty (I did check queue.empty()).
Could anyone suggest what's wrong? Thanks.
Your workers exit after processing a single URL each; they need to loop internally until they get the sentinel. However, you should probably just look at multiprocessing.Pool instead, as that does the bookkeeping for you.
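A minimal sketch of the first suggestion, assuming the same None sentinel as in the question: the worker loops until it receives the sentinel instead of returning after one URL. The narrower except clause is a choice made for this sketch, not something from the original code.

```python
import socket

SENTINEL = None  # same marker the question's main block puts on the queue


def url2ip(inqueue, output):
    # Keep pulling URLs until the sentinel arrives.
    while True:
        v_url = inqueue.get()
        if v_url is SENTINEL:
            break
        try:
            v_ip = socket.gethostbyname(v_url)
        except (socket.error, UnicodeError):
            # Lookup failed or the name could not be encoded:
            # keep the question's '-1' marker.
            v_ip = "-1"
        output.put(v_url + "|||" + v_ip + "\n")
```

The main block from the question already sends one sentinel per worker, so it should not need to change for this version of the worker.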