I'm trying to run three functions (each can take up to 1 second to execute) every second. I'd then like to store the output from each function, and write them to separate files.
At the moment I'm using Timers for my delay handling. (I could subclass Thread, but that's getting a bit complicated for this simple script)
import time
from threading import Timer

def main():
    for i in range(3):
        set_up_function(i)
        t = Timer(1, run_function, [i])
        t.start()
    time.sleep(100) # Without this, main thread exits

def run_function(i):
    # Schedule the next run first, so the 1-second cadence does not drift
    t = Timer(1, run_function, [i])
    t.start()
    print function_with_delay(i)
What's the best way to handle the output from function_with_delay? Append the result to a global list for each function?
Then I could put something like this at the end of my main function:
...
while True:
    time.sleep(30) # or in a try/except with a loop of 1 second sleeps so I can interrupt
    for i in range(3):
        save_to_disk(data[i])
Thoughts?
Edit: Added my own answer as a possibility
I believe the Python Queue module is designed for precisely this sort of scenario. You could do something like this, for example:
import Queue
import threading

def main():
    q = Queue.Queue()
    for i in range(3):
        t = threading.Timer(1, run_function, [q, i])
        t.start()

    while True:
        item = q.get()
        save_to_disk(item)
        q.task_done()

def run_function(q, i):
    t = threading.Timer(1, run_function, [q, i])
    t.start()
    q.put(function_with_delay(i))
I would say store a list of lists (bool, str), where bool is whether the function has finished running and str is the output. Each function locks the list with a mutex to append output (or if you don't care about thread safety omit this). Then, have a simple polling loop checking if all the bool values are True, and if so then do your save_to_disk calls.
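A minimal sketch of that idea, written for Python 3. The names results, results_lock, worker and wait_for_all are made up for illustration, and function_with_delay is a hypothetical stand-in for the real function:

```python
import threading
import time

NUM_WORKERS = 3  # assumption: three functions, as in the question

# One [finished, output] slot per function, guarded by a single lock.
results = [[False, None] for _ in range(NUM_WORKERS)]
results_lock = threading.Lock()


def function_with_delay(i):
    """Hypothetical stand-in for the real slow function."""
    time.sleep(0.1)
    return "output from function %d" % i


def worker(i):
    output = function_with_delay(i)
    with results_lock:
        results[i][0] = True
        results[i][1] = output


def wait_for_all(poll_interval=0.05):
    """Poll until every slot is marked finished, then return the outputs."""
    while True:
        with results_lock:
            if all(done for done, _ in results):
                return [output for _, output in results]
        time.sleep(poll_interval)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()

outputs = wait_for_all()
for line in outputs:
    print(line)  # a save_to_disk(line) call would go here
```

Polling keeps the code simple but wakes up more often than needed; a blocking queue avoids that.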
Another alternative would be to implement a class (taken from this answer) that uses threading.Lock(). This has the advantage of being able to wait on the ItemStore, and save_to_disk can use getAll, rather than polling the queue. (More efficient for large data sets?)
This is particularly suited to writing at a set time interval (e.g. every 30 seconds), rather than once per second.
class ItemStore(object):
    def __init__(self):
        self.lock = threading.Lock()
        self.items = []

    def add(self, item):
        with self.lock:
            self.items.append(item)

    def getAll(self):
        with self.lock:
            items, self.items = self.items, []
        return items
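A possible usage sketch in Python 3, with the class repeated so the snippet runs on its own. The producer function and the item tuples are invented for illustration; in the real script the timer callbacks would call add and the periodic writer would call getAll:

```python
import threading


class ItemStore(object):
    """Same class as above, repeated so this sketch is self-contained."""

    def __init__(self):
        self.lock = threading.Lock()
        self.items = []

    def add(self, item):
        with self.lock:
            self.items.append(item)

    def getAll(self):
        with self.lock:
            items, self.items = self.items, []
        return items


def producer(store, worker_id, count):
    """Hypothetical producer: adds `count` items tagged with its id."""
    for n in range(count):
        store.add((worker_id, n))


store = ItemStore()
threads = [threading.Thread(target=producer, args=(store, i, 5)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

batch = store.getAll()     # everything collected so far, in one call
leftover = store.getAll()  # the store is empty again after a drain
print(len(batch), len(leftover))
```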
Related
So what I want to do is run the same function multiple times simultaneously while getting a result as return and storing it in an array or list. It goes like this:
def base_func(matrix,arg1,arg2):
    result = []
    for row in range(matrix.shape[0]):
        #perform necessary operation on row and return a certain value to store it into result
        x = func(matrix[row],arg1,arg2)
        result.append(x)
    return np.array(result)
I tried using threading in python. My implementation goes:
def base_func(matrix,arg1,arg2):
    result = []
    threads = []
    for row in range(matrix.shape[0]):
        t = threading.Thread(target=func,args=(matrix[row],arg1,arg2,))
        threads.append(t)
        t.start()
    for t in threads:
        res = t.join()
        result.append(res)
    return np.array(result)
This doesn't seem to work and just returns None from the threads.
From what I read in the documentation of threading.join(), it says:
As join() always returns None, you must call is_alive() after join() to decide whether a timeout happened – if the thread is still alive, the join() call timed out.
You will always get None from these lines of your code:
res = t.join()
result.append(res)
This post describes a problem similar to yours; please follow it for a solution. You might want to use the concurrent.futures module, as explained in this answer.
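As a rough sketch of the concurrent.futures route (Python 3): the func below is a hypothetical per-row operation, and plain nested lists stand in for the NumPy matrix so the snippet has no third-party dependency. Each submit() call returns a Future whose result() gives the function's return value, which is what join() cannot do:

```python
from concurrent.futures import ThreadPoolExecutor


def func(row, arg1, arg2):
    """Hypothetical per-row operation; replace with the real one."""
    return sum(row) * arg1 + arg2


def base_func(matrix, arg1, arg2):
    with ThreadPoolExecutor() as executor:
        # submit() schedules the call and returns a Future immediately
        futures = [executor.submit(func, row, arg1, arg2) for row in matrix]
        # result() blocks until that call has finished and returns its value;
        # iterating in submission order keeps the rows in order
        return [f.result() for f in futures]


matrix = [[0, 1], [2, 3], [4, 5]]
print(base_func(matrix, 2, 1))
```

With a NumPy matrix, the returned list could be wrapped in np.array(...) as in the original function.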
I'm attempting to use multiprocessing to run many simulations across multiple processes; however, the code I have written only uses 1 of the processes as far as I can tell.
Updated
I've gotten all the processes to work (I think) thanks to @PaulBecotte; however, the multiprocessing version seems to run significantly slower than its non-multiprocessing counterpart.
For instance, not including the function and class declarations/implementations and imports, I have:
def monty_hall_sim(num_trial, player_type='AlwaysSwitchPlayer'):
    if player_type == 'NeverSwitchPlayer':
        player = NeverSwitchPlayer('Never Switch Player')
    else:
        player = AlwaysSwitchPlayer('Always Switch Player')

    return (MontyHallGame().play_game(player) for trial in xrange(num_trial))

def do_work(in_queue, out_queue):
    while True:
        try:
            f, args = in_queue.get()
            ret = f(*args)
            for result in ret:
                out_queue.put(result)
        except:
            break

def main():
    logging.getLogger().setLevel(logging.ERROR)

    always_switch_input_queue = multiprocessing.Queue()
    always_switch_output_queue = multiprocessing.Queue()

    total_sims = 20
    num_processes = 5
    process_sims = total_sims/num_processes

    with Timer(timer_name='Always Switch Timer'):
        for i in xrange(num_processes):
            always_switch_input_queue.put((monty_hall_sim, (process_sims, 'AlwaysSwitchPlayer')))

        procs = [multiprocessing.Process(target=do_work, args=(always_switch_input_queue, always_switch_output_queue)) for i in range(num_processes)]

        for proc in procs:
            proc.start()

        always_switch_res = []
        while len(always_switch_res) != total_sims:
            always_switch_res.append(always_switch_output_queue.get())

        always_switch_success = float(always_switch_res.count(True))/float(len(always_switch_res))

    print '\tLength of Always Switch Result List: {alw_sw_len}'.format(alw_sw_len=len(always_switch_res))
    print '\tThe success average of switching doors was: {alw_sw_prob}'.format(alw_sw_prob=always_switch_success)
which yields:
Time Elapsed: 1.32399988174 seconds
Length: 20
The success average: 0.6
However, I am attempting to use this for total_sims = 10,000,000 over num_processes = 5, and doing so has taken significantly longer than using 1 process (1 process returned in ~3 minutes). The non-multiprocessing counterpart I'm comparing it to is:
def main():
    logging.getLogger().setLevel(logging.ERROR)

    with Timer(timer_name='Always Switch Monty Hall Timer'):
        always_switch_res = [MontyHallGame().play_game(AlwaysSwitchPlayer('Monty Hall')) for x in xrange(10000000)]
        always_switch_success = float(always_switch_res.count(True))/float(len(always_switch_res))

    print '\n\tThe success average of not switching doors was: {not_switching}' \
          '\n\tThe success average of switching doors was: {switching}'.format(not_switching=never_switch_success,
                                                                               switching=always_switch_success)
You could try importing "process" under some if statements.
EDIT- you changed some stuff, let me try and explain a bit better.
Each message you put into the input queue will cause the monty_hall_sim function to get called and send num_trial messages to the output queue.
So your original implementation was right- to get 20 output messages, send in 5 input messages.
However, your function is slightly wrong.
for trial in xrange(num_trial):
    res = MontyHallGame().play_game(player)
    yield res
This will turn the function into a generator that will provide a new value on each next() call- great! The problem is here
while True:
    try:
        f, args = in_queue.get(timeout=1)
        ret = f(*args)
        out_queue.put(ret.next())
    except:
        break
Here, on each pass through the loop you create a NEW generator with a NEW message. The old one is thrown away. So here, each input message only adds a single output message to the queue before you throw it away and get another one. The correct way to write this is-
while True:
    try:
        f, args = in_queue.get(timeout=1)
        ret = f(*args)
        for result in ret:
            # put the value the loop already pulled; calling ret.next() here
            # would skip every other result
            out_queue.put(result)
    except:
        break
Doing it this way will continue to yield output messages from the generator until it finishes (after yielding 4 messages in this case)
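A small runnable sketch of that loop in Python 3, where ret.next() is spelled next(ret). Threads and queue.Queue are used here only as stand-ins for the processes and multiprocessing.Queue, and simulate is a hypothetical generator in place of monty_hall_sim:

```python
import queue
import threading


def simulate(num_trials):
    """Hypothetical generator standing in for monty_hall_sim."""
    for trial in range(num_trials):
        yield trial % 2 == 0


def do_work(in_queue, out_queue):
    while True:
        try:
            f, args = in_queue.get(timeout=0.2)
        except queue.Empty:
            break  # no more work queued
        for result in f(*args):  # drain the whole generator
            out_queue.put(result)


in_q, out_q = queue.Queue(), queue.Queue()
for _ in range(5):
    in_q.put((simulate, (4,)))  # 5 input messages of 4 trials each

workers = [threading.Thread(target=do_work, args=(in_q, out_q)) for _ in range(5)]
for w in workers:
    w.start()
for w in workers:
    w.join()

# All workers have finished, so qsize() is stable here
results = [out_q.get() for _ in range(out_q.qsize())]
print(len(results))
```

Catching only queue.Empty, rather than using a bare except, keeps real errors inside the simulation from being silently swallowed.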
I was able to get my code to run significantly faster by changing monty_hall_sim's return to a list comprehension, having do_work add the lists to the output queue, and then extend the results list of main with the lists returned by the output queue. Made it run in ~13 seconds.
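The batching idea might look roughly like this in Python 3. This is only a sketch: threads and queue.Queue stand in for the processes and multiprocessing.Queue (which has the same put/get interface), and simulate_batch with its 2/3 win probability is a hypothetical replacement for the real simulation:

```python
import queue
import random
import threading


def simulate_batch(num_trials, seed):
    """Hypothetical stand-in for monty_hall_sim that returns a list."""
    rng = random.Random(seed)
    return [rng.random() < 2.0 / 3.0 for _ in range(num_trials)]


def do_work(in_queue, out_queue):
    while True:
        try:
            f, args = in_queue.get_nowait()
        except queue.Empty:
            break
        out_queue.put(f(*args))  # one message per task, not one per trial


total_sims, num_workers = 1000, 5
in_q, out_q = queue.Queue(), queue.Queue()
for seed in range(num_workers):
    in_q.put((simulate_batch, (total_sims // num_workers, seed)))

workers = [threading.Thread(target=do_work, args=(in_q, out_q)) for _ in range(num_workers)]
for w in workers:
    w.start()
for w in workers:
    w.join()

results = []
for _ in range(num_workers):
    results.extend(out_q.get())  # each message is a whole list of outcomes
print(len(results), float(results.count(True)) / len(results))
```

With real processes, every put() pickles its argument and sends it through a pipe, so sending five lists instead of ten million single values removes most of that overhead.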
For a web-scraping analysis I need two loops that run permanently: one returns a list of websites, updated every x minutes, while the other analyses the sites (old and new ones) every y seconds. The code below illustrates what I am trying to do, but it doesn't work. (The code has been edited to incorporate answers and my own research.)
from multiprocessing import Process
import time, random
from threading import Lock
from collections import deque

class MyQueue(object):
    def __init__(self):
        self.items = deque()
        self.lock = Lock()

    def put(self, item):
        with self.lock:
            self.items.append(item)

    # Example pointed at in [this][1] answer
    def get(self):
        with self.lock:
            return self.items.popleft()

def a(queue):
    while True:
        x=[random.randint(0,10), random.randint(0,10), random.randint(0,10)]
        print 'send', x
        queue.put(x)
        time.sleep(10)

def b(queue):
    try:
        while queue:
            x = queue.get()
            print 'receive', x
            for i in x:
                print i
                time.sleep(2)
    except IndexError:
        print queue.get()

if __name__ == '__main__':
    q = MyQueue()
    p1 = Process(target=a, args=(q,))
    p2 = Process(target=b, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
So, this is my first Python project after an online introduction course, and I am struggling big time. I understand now that the functions don't truly run in parallel, as b does not start until a is finished (I used this answer and tinkered with the timer and while True). EDIT: Even after using the approach given in the answer, I think this is still the case, as queue.get() throws an IndexError saying the deque is empty. I can only explain that by process a not finishing, because when I print queue.get() immediately after .put(x), it is not empty.
I eventually want an output like this:
send [3,4,6]
3
4
6
3
4
send [3,8,6,5] #the code above gives always 3 entries, but in my project
3 #the length varies
8
6
5
3
8
6
.
.
What do I need in order to have two truly parallel loops, where one returns an updated list every x minutes that the other loop uses as the basis for its analysis? Is Process really the right tool here?
And where can I find good information about designing my program?
I did something a little like this a while ago. I think using the Process is the correct approach, but if you want to pass data between processes then you should probably use a Queue.
https://docs.python.org/2/library/multiprocessing.html#exchanging-objects-between-processes
Create the queue first and pass it into both processes. One can write to it, the other can read from it.
One issue I remember is that the reading process will block on the queue until something is pushed to it, so you may need to push a special 'terminate' message of some kind to the queue when process 1 is done so process 2 knows to stop.
EDIT: Simple example. This doesn't include a clean way to stop the processes, but it shows how you can start 2 new processes and pass data from one to the other. Function b polls the queue once per second with a non-blocking get(False), so it keeps printing the most recent list until a sends a new one.
from multiprocessing import Process, Queue
import time, random

def a(queue):
    while True:
        x=[random.randint(0,10), random.randint(0,10), random.randint(0,10)]
        print 'send', x
        queue.put(x)
        time.sleep(5)

def b(queue):
    x = []
    while True:
        time.sleep(1)
        try:
            x = queue.get(False)
            print 'receive', x
        except:
            pass
        for i in x:
            print i

if __name__ == '__main__':
    q = Queue()
    p1 = Process(target=a, args=(q,))
    p2 = Process(target=b, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
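The 'terminate' message mentioned earlier could be sketched like this in Python 3. The choice of None as the marker and the batches argument are assumptions made for illustration; here the parent process plays the role of b so that the received data is directly available:

```python
from multiprocessing import Process, Queue

# None is the 'terminate' message here (an assumption). Unlike a bare
# object(), it is still the same object after being pickled and sent
# to another process, so an identity check works on the receiving side.
SENTINEL = None


def a(queue, batches):
    for x in batches:
        queue.put(x)
    queue.put(SENTINEL)  # tell the reader that nothing more is coming


def b(queue):
    received = []
    while True:
        x = queue.get()  # blocks until a message arrives
        if x is SENTINEL:
            break
        received.append(x)
    return received


if __name__ == '__main__':
    q = Queue()
    p1 = Process(target=a, args=(q, [[3, 4, 6], [3, 8, 6, 5]]))
    p1.start()
    print(b(q))  # returns once the sentinel has been read
    p1.join()
```

With several readers on one queue, each reader would need its own sentinel, or the reader could put the sentinel back before exiting.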
I am about to start on an endeavour with Python. The goal is to multithread different tasks and use queues to communicate between tasks. For the sake of clarity I would like to be able to pass a queue to a sub-function, thus sending information to the queue from there. So something like this:
from queue import Queue
from threading import Thread
import copy

# Object that signals shutdown
_sentinel = object()

# increment function
def increment(i, out_q):
    i += 1
    print(i)
    out_q.put(i)
    return

# A thread that produces data
def producer(out_q):
    i = 0
    while True:
        # Produce some data
        increment( i , out_q)

        if i > 5:
            out_q.put(_sentinel)
            break

# A thread that consumes data
def consumer(in_q):
    while True:
        # Get some data
        data = in_q.get()
        # Process the data

        # Check for termination
        if data is _sentinel:
            in_q.put(_sentinel)
            break

# Create the shared queue and launch both threads
q = Queue()
t1 = Thread(target=consumer, args=(q,))
t2 = Thread(target=producer, args=(q,))
t1.start()
t2.start()

# Wait for all produced items to be consumed
q.join()
Currently the output is a row of 1's (increment always receives 0), where I would like it to be the numbers 1 to 6. I have read about the difficulty of passing references in Python, but would like to clarify whether this is just not possible in Python or whether I am looking at the issue wrongly.
The problem has nothing to do with the way the queues are passed; you're doing that right. The issue is actually related to how you're trying to increment i. Because variables in Python are passed by assignment, you have to return the incremented value of i back to the caller for the change you made inside increment to have any effect. Otherwise, you just rebind the local variable i inside increment, and that i gets thrown away when increment completes.
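A tiny illustration of the difference between rebinding a name and mutating an object (the function names rebind and mutate are made up for this sketch):

```python
def rebind(i):
    i += 1  # ints are immutable: this only rebinds the local name i
    return i


def mutate(values):
    values.append(1)  # lists are mutable: the caller sees this change


counter = 0
rebind(counter)            # return value discarded
after_discard = counter    # the caller's counter is unchanged
counter = rebind(counter)  # assigning the result is what updates it

shared = []
mutate(shared)             # no return value needed for a mutation
print(after_discard, counter, shared)
```

This is why the queue argument works as expected (put mutates the shared queue object) while the integer does not.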
You can also simplify your consumer function a bit by using the iter built-in function, along with a for loop, to consume from the queue until _sentinel is reached, rather than a while True loop:
from queue import Queue
from threading import Thread
import copy

# Object that signals shutdown
_sentinel = object()

# increment function
def increment(i):
    i += 1
    return i

# A thread that produces data
def producer(out_q):
    i = 0
    while True:
        # Produce some data
        i = increment( i )
        print(i)
        out_q.put(i)
        if i > 5:
            out_q.put(_sentinel)
            break

# A thread that consumes data
def consumer(in_q):
    for data in iter(in_q.get, _sentinel):
        # Process the data
        pass

# Create the shared queue and launch both threads
q = Queue()
t1 = Thread(target=consumer, args=(q,))
t2 = Thread(target=producer, args=(q,))
t1.start()
t2.start()
Output:
1
2
3
4
5
6
I need to implement something like this
def turnOn(self):
    self.isTurnedOn = True
    while self.isTurnedOn:
        updateThread = threading.Thread(target=self.updateNeighborsList, args=())
        updateThread.daemon = True
        updateThread.start()
        time.sleep(1)

def updateNeighborsList(self):
    self.neighbors=[]
    for candidate in points:
        distance = math.sqrt((candidate.X-self.X)**2 + (candidate.Y-self.Y)**2)
        if distance <= maxDistance and candidate!=self and candidate.isTurnedOn:
            self.neighbors.append(candidate)
    print self.neighbors
    print points
This is a class member function from which the updateNeighborsList function should be called every second for as long as self.isTurnedOn is True.
When I create an object of the class and call turnOn, none of the following statements are executed: it takes control and gets stuck in that while loop. But I need many objects of this class.
What is the correct way to do this kind of thing?
I think you'd be better off creating a single Thread when turnOn is called, and have the looping happen inside that thread:
def turnOn(self):
    self.isTurnedOn = True
    self.updateThread = threading.Thread(target=self.updateNeighborsList, args=())
    self.updateThread.daemon = True
    self.updateThread.start()

def updateNeighborsList(self):
    while self.isTurnedOn:
        self.neighbors=[]
        for candidate in points:
            distance = math.sqrt((candidate.X-self.X)**2 + (candidate.Y-self.Y)**2)
            if distance <= maxDistance and candidate!=self and candidate.isTurnedOn:
                self.neighbors.append(candidate)
        print self.neighbors
        print points
        time.sleep(1)
Note, though, that doing mathematical calculations inside a thread will not improve performance at all in CPython, because of the Global Interpreter Lock. To utilize multiple cores in parallel, you'll need to use the multiprocessing module instead. However, if you're just trying to prevent your main thread from blocking, feel free to stick with threads. Just know that only one thread will ever be executing Python bytecode at a time.
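One possible variation, sketched in Python 3: replace the boolean flag and time.sleep(1) with a threading.Event, so that turning the object off wakes the loop immediately instead of after the current sleep. This swaps in a different mechanism from the code above; the Node class, its method names and the updates counter are hypothetical and only illustrate the pattern:

```python
import threading
import time


class Node(object):
    """Hypothetical stand-in for the class in the question."""

    def __init__(self, interval=1.0):
        self.interval = interval
        self.updates = 0
        self._stop = threading.Event()
        self._thread = None

    def turn_on(self):
        self._stop.clear()
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True
        self._thread.start()  # returns immediately; the loop runs in the thread

    def turn_off(self):
        self._stop.set()      # wakes the loop right away
        self._thread.join()

    def _run(self):
        # wait() returns False on timeout and True once the event is set
        while not self._stop.wait(self.interval):
            self.updates += 1  # the neighbour update would go here


node = Node(interval=0.05)
node.turn_on()
time.sleep(0.3)  # the main thread is free to do other work meanwhile
node.turn_off()
print(node.updates > 0)
```

Because each object owns one background thread, many such objects can be switched on without blocking the caller.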