python parallel calculations in a while loop

I have been looking around for some time, but haven't had any luck finding an example that solves my problem. I have added an example from my code. As one can see, this is slow, and the two functions could be run separately.
My aim is to print the latest parameter values every second, while the slow calculations run in the background. The latest value is shown, and whenever a calculation finishes, the value is updated.
Can anybody recommend a better way to do it? An example would be really helpful.
Thanks a lot.
import time

def ProcessA(parA):
    # imitate slow process
    time.sleep(5)
    parA += 2
    return parA

def ProcessB(parB):
    # imitate slow process
    time.sleep(10)
    parB += 5
    return parB

# start from here
i, parA, parB = 1, 0, 0
while True:  # endless loop
    print(i)
    print(parA)
    print(parB)
    time.sleep(1)
    i += 1
    # update parameter A
    parA = ProcessA(parA)
    # update parameter B
    parB = ProcessB(parB)

I imagine this should do it for you. It has the benefit of letting you add extra parallel functions, up to a total equal to the number of cores you have. Edits are welcome.
# import the time module
import time
# import the appropriate multiprocessing functions
from multiprocessing import Pool

# define your functions
# whatever your slow function is
def slowFunction(x):
    return someFunction(x)

# printingFunction: keep printing the current value until the new result is ready
def printingFunction(new, current, timeDelay):
    while not new.ready():
        print(current)
        time.sleep(timeDelay)

# set the initial value that will be printed.
# Depending on your function this may take some time.
CurrentValue = slowFunction(someTemporallyDynamicVariable)
# establish your pool
pool = Pool()
while True:  # endless loop
    # an asynchronous call; this will continue to run in
    # the background while your printing operates.
    NewValue = pool.apply_async(slowFunction, (someTemporallyDynamicVariable,))
    # print the current value until the async result is ready
    printingFunction(NewValue, CurrentValue, 1)
    CurrentValue = NewValue.get()
# close your pool
pool.close()
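Alternatively, on Python 3 the same idea can be written with concurrent.futures, which keeps the pool management simpler. This is only a rough sketch built around the question's ProcessA and ProcessB:
import time
from concurrent.futures import ProcessPoolExecutor

def ProcessA(parA):
    # imitate slow process (from the question)
    time.sleep(5)
    return parA + 2

def ProcessB(parB):
    # imitate slow process (from the question)
    time.sleep(10)
    return parB + 5

if __name__ == '__main__':
    i, parA, parB = 1, 0, 0
    with ProcessPoolExecutor(max_workers=2) as executor:
        futA = executor.submit(ProcessA, parA)
        futB = executor.submit(ProcessB, parB)
        while True:  # print the latest values once per second
            print(i)
            print(parA)
            print(parB)
            time.sleep(1)
            i += 1
            # when a background calculation finishes, take its result
            # and immediately start the next round with the new value
            if futA.done():
                parA = futA.result()
                futA = executor.submit(ProcessA, parA)
            if futB.done():
                parB = futB.result()
                futB = executor.submit(ProcessB, parB)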

Related

random.random() generates same number in multiprocessing

I'm working on an optimization problem, and you can see a simplified version of my code posted below (the original code is too complicated for asking such a question, and I hope my simplified version captures it as closely as possible).
My purpose:
use the function foo inside the function optimization, but foo can take a very long time in some hard cases. So I use multiprocessing to set a time limit on the execution of the function (proc.join(iter_time); the method is from an answer to this question: How to limit execution time of a function call?).
My problem:
In the while loop, the values generated for extra are the same every time.
The list lst's length is always 1, which means that every iteration of the while loop starts from an empty list.
My guess: a possible reason could be that each time I create a process the random seed starts counting from the beginning, and each time the process is terminated, some garbage collection mechanism cleans up the memory the process used, so the list is cleared.
My questions:
Does anyone know the real reason for these problems?
If I don't use multiprocessing, is there any other way to achieve my purpose while generating different random numbers? By the way, I have tried func_timeout, but it has other problems that I cannot handle...
import multiprocessing
import random
import time

random.seed(123)
lst = []  # a global list for logging data

def foo(epoch):
    ...
    extra = random.random()
    lst.append(epoch + extra)
    ...

def optimization(loop_time, iter_time):
    start = time.time()
    epoch = 0
    while time.time() <= start + loop_time:
        proc = multiprocessing.Process(target=foo, args=(epoch,))
        proc.start()
        proc.join(iter_time)
        if proc.is_alive():  # if the process is not terminated within the time limit
            print("Time out!")
            proc.terminate()

if __name__ == '__main__':
    optimization(300, 2)
You need to use shared memory if you want to share variables across processes, because child processes do not share their memory space with the parent. The simplest way to do that here is to use managed lists, and to delete the line where you set the seed; that line is what causes the same number to be generated, because every child process starts from the same seed. To get different random numbers, either don't set a seed, or pass a different seed to each process:
import time, random
from multiprocessing import Manager, Process

def foo(epoch, lst):
    extra = random.random()
    lst.append(epoch + extra)

def optimization(loop_time, iter_time, lst):
    start = time.time()
    epoch = 0
    while time.time() <= start + loop_time:
        proc = Process(target=foo, args=(epoch, lst))
        proc.start()
        proc.join(iter_time)
        if proc.is_alive():  # if the process is not terminated within the time limit
            print("Time out!")
            proc.terminate()
    print(lst)

if __name__ == '__main__':
    manager = Manager()
    lst = manager.list()
    optimization(10, 2, lst)
Output
[0.2035898948744943, 0.07617925389396074, 0.6416754412198231, 0.6712193790613651, 0.419777147554235, 0.732982735576982, 0.7137712131028766, 0.22875414425414997, 0.3181113880578589, 0.5613367673646847, 0.8699685474084119, 0.9005359611195111, 0.23695341111251134, 0.05994288664062197, 0.2306562314450149, 0.15575356275408125, 0.07435292814989103, 0.8542361251850187, 0.13139055891993145, 0.5015152768477814, 0.19864873743952582, 0.2313646288041601, 0.28992667535697736, 0.6265055915510219, 0.7265797043535446, 0.9202923318284002, 0.6321511834038631, 0.6728367262605407, 0.6586979597202935, 0.1309226720786667, 0.563889613032526, 0.389358766191921, 0.37260564565714316, 0.24684684162272597, 0.5982042933298861, 0.896663326233504, 0.7884030244369596, 0.6202229004466849, 0.4417549843477827, 0.37304274232635715, 0.5442716244427301, 0.9915536257041505, 0.46278512685707873, 0.4868394190894778, 0.2133187095154937]
Keep in mind that using managers will affect the performance of your code. As an alternative, you could also use multiprocessing.Array, which is faster than managers but less flexible in what data it can store, or Queues.
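If you would rather keep a seed for reproducibility, a rough sketch of passing a different seed to each process (the seed values here are arbitrary examples) could look like this:
import random
from multiprocessing import Manager, Process

def foo(epoch, seed, lst):
    random.seed(seed)                 # each worker gets its own seed
    lst.append(epoch + random.random())

if __name__ == '__main__':
    manager = Manager()
    lst = manager.list()
    procs = []
    for epoch in range(5):
        # per-process seed: any distinct value per worker will do
        proc = Process(target=foo, args=(epoch, 123 + epoch, lst))
        proc.start()
        procs.append(proc)
    for proc in procs:
        proc.join()
    print(list(lst))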

Multiprocessing Running Slower than a Single Process

I'm attempting to use multiprocessing to run many simulations across multiple processes; however, the code I have written only uses 1 of the processes as far as I can tell.
Updated
I've gotten all the processes to work (I think) thanks to @PaulBecotte; however, the multiprocessing version seems to run significantly slower than its non-multiprocessing counterpart.
For instance, not including the function and class declarations/implementations and imports, I have:
def monty_hall_sim(num_trial, player_type='AlwaysSwitchPlayer'):
    if player_type == 'NeverSwitchPlayer':
        player = NeverSwitchPlayer('Never Switch Player')
    else:
        player = AlwaysSwitchPlayer('Always Switch Player')
    return (MontyHallGame().play_game(player) for trial in xrange(num_trial))

def do_work(in_queue, out_queue):
    while True:
        try:
            f, args = in_queue.get()
            ret = f(*args)
            for result in ret:
                out_queue.put(result)
        except:
            break

def main():
    logging.getLogger().setLevel(logging.ERROR)
    always_switch_input_queue = multiprocessing.Queue()
    always_switch_output_queue = multiprocessing.Queue()
    total_sims = 20
    num_processes = 5
    process_sims = total_sims/num_processes
    with Timer(timer_name='Always Switch Timer'):
        for i in xrange(num_processes):
            always_switch_input_queue.put((monty_hall_sim, (process_sims, 'AlwaysSwitchPlayer')))
        procs = [multiprocessing.Process(target=do_work, args=(always_switch_input_queue, always_switch_output_queue)) for i in range(num_processes)]
        for proc in procs:
            proc.start()
        always_switch_res = []
        while len(always_switch_res) != total_sims:
            always_switch_res.append(always_switch_output_queue.get())
        always_switch_success = float(always_switch_res.count(True))/float(len(always_switch_res))
        print '\tLength of Always Switch Result List: {alw_sw_len}'.format(alw_sw_len=len(always_switch_res))
        print '\tThe success average of switching doors was: {alw_sw_prob}'.format(alw_sw_prob=always_switch_success)
which yields:
Time Elapsed: 1.32399988174 seconds
Length: 20
The success average: 0.6
However, I am attempting to use this for total_sims = 10,000,000 over num_processes = 5, and doing so has taken significantly longer than using 1 process (1 process returned in ~3 minutes). The non-multiprocessing counterpart I'm comparing it to is:
def main():
    logging.getLogger().setLevel(logging.ERROR)
    with Timer(timer_name='Always Switch Monty Hall Timer'):
        always_switch_res = [MontyHallGame().play_game(AlwaysSwitchPlayer('Monty Hall')) for x in xrange(10000000)]
        always_switch_success = float(always_switch_res.count(True))/float(len(always_switch_res))
        print '\n\tThe success average of not switching doors was: {not_switching}' \
              '\n\tThe success average of switching doors was: {switching}'.format(not_switching=never_switch_success,
                                                                                   switching=always_switch_success)
You could try to import "Process" under some if statements.
EDIT: you changed some stuff, so let me try to explain a bit better.
Each message you put into the input queue will cause the monty_hall_sim function to get called and send num_trial messages to the output queue.
So your original implementation was right: to get 20 output messages, send in 5 input messages.
However, your function is slightly wrong.
for trial in xrange(num_trial):
    res = MontyHallGame().play_game(player)
    yield res
This will turn the function into a generator that provides a new value on each next() call, which is great! The problem is here:
while True:
    try:
        f, args = in_queue.get(timeout=1)
        ret = f(*args)
        out_queue.put(ret.next())
    except:
        break
Here, on each pass through the loop you create a NEW generator with a NEW message. The old one is thrown away. So each input message only adds a single output message to the queue before you throw the generator away and get another one. The correct way to write this is:
while True:
    try:
        f, args = in_queue.get(timeout=1)
        ret = f(*args)
        for result in ret:
            out_queue.put(result)
    except:
        break
Doing it this way will continue to yield output messages from the generator until it finishes (after yielding 4 messages in this case).
I was able to get my code to run significantly faster by changing monty_hall_sim's return to a list comprehension, having do_work add the whole lists to the output queue, and then extending the results list in main with the lists returned from the output queue. This made it run in ~13 seconds.
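A rough sketch of that rework, keeping the question's Python 2 style and its MontyHallGame/player classes (not shown here), might look like:
def monty_hall_sim(num_trial, player_type='AlwaysSwitchPlayer'):
    # NeverSwitchPlayer, AlwaysSwitchPlayer and MontyHallGame are the question's classes
    if player_type == 'NeverSwitchPlayer':
        player = NeverSwitchPlayer('Never Switch Player')
    else:
        player = AlwaysSwitchPlayer('Always Switch Player')
    # return a concrete list instead of a generator
    return [MontyHallGame().play_game(player) for trial in xrange(num_trial)]

def do_work(in_queue, out_queue):
    while True:
        try:
            f, args = in_queue.get(timeout=1)
            out_queue.put(f(*args))  # one whole list per input message
        except:
            break

# in main(), collect lists instead of single results:
# always_switch_res = []
# while len(always_switch_res) != total_sims:
#     always_switch_res.extend(always_switch_output_queue.get())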

Set the number of executions per second using Python's multiprocessing

I wrote a script in Python 3.6 that initially used a for loop to call an API, then put all the results into a pandas dataframe and wrote them to a SQL database (approximately 9,000 calls are made to that API every time the script runs).
Realising the calls inside the for loop were processed one-by-one, I decided to use the multiprocessing module to speed things up.
Therefore, I created a module level function called parallel_requests and now I call that instead of having the for loop:
list_of_lists = multiprocessing.Pool(processes=4).starmap(parallel_requests, zip(....))
Side note: I use starmap instead of map only because my parallel_requests function takes multiple arguments which I need to zip.
The good: this approach works and is much faster.
The bad: this approach works but is too fast. By using 4 processes (I tried that because I have 4 cores), parallel_requests is getting executed too fast. More than 15 calls per second are made to the API, and I'm getting blocked by the API itself.
In fact, it only works if I use 1 or 2 processes, otherwise it's too damn fast.
Essentially what I want is to keep using 4 processes, but also to limit the execution of my parallel_requests function to only 15 times per second overall.
Is there any parameter of multiprocessing.Pool that would help with this, or it's more complicated than that?
For this case I'd use a leaky bucket. You can have one process that fills a queue at the prescribed rate, with a maximum size that indicates how many requests you can "bank" if you don't make them at the maximum rate; the worker processes then just need to get from the queue before doing their work.
import time

def make_api_request(this, that, rate_queue):
    rate_queue.get()
    print("DEBUG: doing some work at {}".format(time.time()))
    return this * that

def throttler(rate_queue, interval):
    try:
        while True:
            if not rate_queue.full():  # avoid blocking
                rate_queue.put(0)
            time.sleep(interval)
    except BrokenPipeError:
        # main process is done
        return

if __name__ == '__main__':
    from multiprocessing import Pool, Manager, Process
    from itertools import repeat
    rq = Manager().Queue(maxsize=15)  # conservative; no banking
    pool = Pool(4)
    Process(target=throttler, args=(rq, 1/15.)).start()
    pool.starmap(make_api_request, zip(range(100), range(100, 200), repeat(rq)))
I'll look at the ideas posted here, but in the meantime I've just used a simple approach of opening and closing a Pool of 4 processes for every 15 requests and appending all the results to a list_of_lists.
Admittedly this is not the best approach, since it takes time/resources to open and close a Pool, but it was the handiest solution for now.
# define a generator for use below
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

list_of_lists = []
for current_chunk in chunks(all_data, 15):  # 15 is the API's limit of requests per second
    pool = multiprocessing.Pool(processes=4)
    res = pool.starmap(parallel_requests, zip(current_chunk, [to_symbol]*len(current_chunk), [query]*len(current_chunk), [start]*len(current_chunk), [stop]*len(current_chunk)))
    sleep(1)  # sleep for 1 second after every 15 API requests
    list_of_lists.extend(res)
    pool.close()

flatten_list = [item for sublist in list_of_lists for item in sublist]  # use this to construct a pandas dataframe
PS: This solution is really not all that fast due to the repeated opening/closing of pools. Thanks to Nathan Vērzemnieks for suggesting opening just one pool; it's much faster, and your processor won't look like it's running a stress test.
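For reference, a sketch of that single-pool variant, reusing the same names as in the snippet above (chunks, all_data, to_symbol, query, start, stop), might be:
import multiprocessing
from time import sleep

pool = multiprocessing.Pool(processes=4)    # open the pool once
list_of_lists = []
for current_chunk in chunks(all_data, 15):  # 15 is the API's limit of requests per second
    res = pool.starmap(parallel_requests,
                       zip(current_chunk,
                           [to_symbol] * len(current_chunk),
                           [query] * len(current_chunk),
                           [start] * len(current_chunk),
                           [stop] * len(current_chunk)))
    sleep(1)                                # stay under the per-second limit
    list_of_lists.extend(res)
pool.close()
pool.join()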
One way to do this is to use a Queue, which can share details about API-call timestamps with the other processes.
Below is an example of how this could work. It takes the oldest entry in the queue, and if that entry is younger than one second, sleep is called for the duration of the difference.
from multiprocessing import Pool, Manager, queues
from random import randint
import time

MAX_CONNECTIONS = 10
PROCESS_COUNT = 4

def api_request(a, b):
    time.sleep(randint(1, 9) * 0.03)  # simulate request
    return a, b, time.time()

def parallel_requests(a, b, the_queue):
    try:
        oldest = the_queue.get()
        time_difference = time.time() - oldest
    except queues.Empty:
        time_difference = float("-inf")
    if 0 < time_difference < 1:
        time.sleep(1 - time_difference)
    else:
        time_difference = 0
    print("Current time: ", time.time(), "...after sleeping:", time_difference)
    the_queue.put(time.time())
    return api_request(a, b)

if __name__ == "__main__":
    m = Manager()
    q = m.Queue(maxsize=MAX_CONNECTIONS)
    for _ in range(0, MAX_CONNECTIONS):  # fill the queue with zeroes
        q.put(0)
    p = Pool(PROCESS_COUNT)
    # create example data
    data_length = 100
    data1 = range(0, data_length)            # just some dummy data
    data2 = range(100, data_length + 100)    # just some dummy data
    queue_iterable = [q] * (data_length + 1)  # required for the starmap function
    list_of_lists = p.starmap(parallel_requests, zip(data1, data2, queue_iterable))
    print(list_of_lists)

Set a timer for running a process, pause the timer under certain conditions

I've got this program:
import multiprocessing
import time

def timer(sleepTime):
    time.sleep(sleepTime)
    fooProcess.terminate()
    fooProcess.join()  # line said to "cleanup", not sure if it is required, refer to goo.gl/Qes6KX

def foo():
    i = 0
    while 1:
        print(i)
        time.sleep(1)
        i += 1
        if i == 4:
            # pause timerProcess for X seconds
            pass

fooProcess = multiprocessing.Process(target=foo, name="Foo", args=())
timer()
fooProcess.start()
And as you can see in the comment, under certain conditions (in this example i has to be 4) the timer has to stop for a certain X time, while foo() keeps working.
Now, how do I implement this?
N.B.: this code is just an example, the point is that I want to pause a process under certain conditions for a certain amount of time.
I think you're going about this wrong for game design. Games always (no exceptions come to mind) use a primary event loop controlled in software.
Each time through the loop you check the time and fire off all the necessary events based on how much time has elapsed. At the end of the loop you sleep only as long as necessary before you get to the next timer, event, refresh, AI check, or other state change.
This gives you the best performance regarding lag, consistency, predictability, and other timing features that matter in games.
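In outline, such a loop might look like the following sketch; handle_events and update_state are illustrative placeholder stubs, not a real game API:
import time

def handle_events(now):
    pass  # placeholder: fire everything whose time has come

def update_state(now):
    pass  # placeholder: AI checks, refreshes, other state changes

TICK = 1.0 / 60          # target length of one pass through the loop
next_tick = time.time()
while True:              # the primary event loop
    now = time.time()
    handle_events(now)
    update_state(now)
    next_tick += TICK
    # sleep only for whatever is left of this tick
    time.sleep(max(0.0, next_tick - time.time()))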
Roughly (a sketch follows below):
1. Get the current timestamp at the start (time.time(), I presume).
2. Sleep with Event.wait(timeout=...).
3. Wake up on an Event or on a timeout.
4. If on an Event: get a timestamp, subtract the initial one, and subtract the result from the timer; wait until foo() stops; repeat Event.wait(timeout=[result from 4.]).
5. If on a timeout: exit.
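A minimal sketch of that idea on Python 3, adapted to the question's foo(); the pause length of 5 seconds and the overall 10-second timer are arbitrary example values:
import multiprocessing, time

def timer(sleep_time, pause_event, pause_for, worker):
    remaining = sleep_time
    while remaining > 0:
        start = time.time()
        # sleep, but wake up early if the worker signals a pause
        if pause_event.wait(timeout=remaining):
            remaining -= time.time() - start  # keep whatever time was left
            pause_event.clear()
            time.sleep(pause_for)             # the timer is "paused"; foo keeps running
        else:
            break                             # plain timeout: the timer expired
    worker.terminate()

def foo(pause_event):
    i = 0
    while True:
        print(i)
        time.sleep(1)
        i += 1
        if i == 4:
            pause_event.set()                 # ask the timer to pause for a while

if __name__ == '__main__':
    pause = multiprocessing.Event()
    fooProcess = multiprocessing.Process(target=foo, name="Foo", args=(pause,))
    fooProcess.start()
    timer(10, pause, 5, fooProcess)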
Here is an example of how I understand what your program should do:
import threading, time, datetime

ACTIVE = True

def main():
    while ACTIVE:
        print "im working"
        time.sleep(.3)

def run(thread, timeout):
    global ACTIVE
    thread.start()
    time.sleep(timeout)
    ACTIVE = False
    thread.join()

proc = threading.Thread(target=main)
print datetime.datetime.now()
run(proc, 2)  # run for 2 seconds
print datetime.datetime.now()
In main() it does a periodic task, here printing something. In the run() method you can say how long main should keep doing the task.
This code produces the following output:
2014-05-25 17:10:54.390000
im working
im working
im working
im working
im working
im working
im working
2014-05-25 17:10:56.495000
Please correct me if I've understood you wrong.
I would use multiprocessing.Pipe for signaling, combined with select for timing:
#!/usr/bin/env python
import multiprocessing
import select
import time

def timer(sleeptime, pipe):
    start = time.time()
    while time.time() < start + sleeptime:
        n = select.select([pipe], [], [], 1)  # sleep in 1s intervals
        for conn in n[0]:
            val = conn.recv()
            print 'got', val
            start += float(val)

def foo(pipe):
    i = 0
    while True:
        print i
        i += 1
        time.sleep(1)
        if i % 7 == 0:
            pipe.send(5)

if __name__ == '__main__':
    mainpipe, foopipe = multiprocessing.Pipe()
    fooProcess = multiprocessing.Process(target=foo, name="Foo", args=(foopipe,))
    fooProcess.start()
    timer(10, mainpipe)
    fooProcess.terminate()
    # since we terminated, mainpipe and foopipe are corrupt
    del mainpipe, foopipe
    # ...
    print 'Done'
I'm assuming that you want some condition in the foo process to extend the timer. In the sample I have set up, every time foo hits a multiple of 7 it extends the timer by 5 seconds while the timer initially counts down 10 seconds. At the end of the timer we terminate the process - foo won't finish nicely at all, and the pipes will get corrupted, but you can be certain that it'll die. Otherwise you can send a signal back along mainpipe that foo can listen for and exit nicely while you join.

How do I ensure that a Python while-loop takes a particular amount of time to run?

I'm reading serial data with a while loop. However, I have no control over the sample rate.
The code itself seems to take 0.2s to run, so I know I won't be able to go any faster than that. But I would like to be able to control precisely how much slower I sample.
I feel like I could do it using sleep, but the problem is that at different points the loop itself may take longer to read (depending on precisely what is being transmitted over the serial connection), so the code would have to make up the balance.
For example, let's say I want to sample every 1s, and the loop takes anywhere from 0.2s to 0.3s to run. My code needs to be smart enough to sleep for 0.8s (if the loop takes 0.2s) or 0.7s (if the loop takes 0.3s).
import serial
import csv
import time

# open serial stream
while True:
    # read and print a line
    sample_value = ser.readline()
    sample_time = time.time() - zero
    sample_line = str(sample_time) + ',' + str(sample_value)
    outfile.write(sample_line)
    print 'time: ', sample_time, ', value: ', sample_value
Just measure the time running your code takes every iteration of the loop, and sleep accordingly:
import time

while True:
    now = time.time()            # get the time
    do_something()               # do your stuff
    elapsed = time.time() - now  # how long was it running?
    time.sleep(1. - elapsed)     # sleep accordingly so the full iteration takes 1 second
Of course it's not 100% perfect (it may be off by a millisecond or so from time to time), but I guess it's good enough.
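One caveat: if an iteration ever takes longer than the target interval, the sleep argument goes negative and raises a ValueError. A small guarded variation of the same idea, with do_something() as the same placeholder as above:
import time

def do_something():
    pass  # placeholder for your per-iteration work

INTERVAL = 1.0                                # desired loop period in seconds
while True:
    now = time.time()
    do_something()
    elapsed = time.time() - now
    time.sleep(max(0.0, INTERVAL - elapsed))  # never hand sleep() a negative value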
Another nice approach is using twisted's LoopingCall:
from twisted.internet import task
from twisted.internet import reactor
def do_something():
pass # do your work here
task.LoopingCall(do_something).start(1.0)
reactor.run()
A rather elegant method if you're working on UNIX: use the signal library.
The code:
import signal

def _handle_timeout(signum, frame):
    print "timeout hit"  # Do nothing here

def second(count):
    signal.signal(signal.SIGALRM, _handle_timeout)
    signal.alarm(1)
    try:
        count += 1  # put your function here
        signal.pause()
    finally:
        signal.alarm(0)
        return count

if __name__ == '__main__':
    count = 0
    count = second(count)
    count = second(count)
    count = second(count)
    count = second(count)
    count = second(count)
    print count
And the timing:
georgesl@cleese:~/Bureau$ time python timer.py
5

real    0m5.081s
user    0m0.068s
sys     0m0.004s
Two caveats though: it only works on *nix, and it is not multithread-safe.
At the beginning of the loop check if the appropriate amount of time has passed. If it has not, sleep.
# Set up initial conditions for sample_time outside the loop
sample_period = ???
next_min_time = 0
while True:
    sample_time = time.time() - zero
    if sample_time < next_min_time:
        time.sleep(next_min_time - sample_time)
        continue
    # read and print a line
    sample_value = ser.readline()
    sample_line = str(sample_time) + ',' + str(sample_value)
    outfile.write(sample_line)
    print 'time: {}, value: {}'.format(sample_time, sample_value)
    next_min_time = sample_time + sample_period
