My code:
import time
from multiprocessing.pool import ThreadPool
from concurrent.futures import ThreadPoolExecutor

def print_function(tests):
    while True:
        print(tests)
        time.sleep(2)

executor = ThreadPoolExecutor(max_workers=2)
for i in range(5):
    a = executor.submit(print_function(i))
Output:
0 0 0 0 0 0 0 0...
But I want the output to be 012345, 012345, 012345...
How can I do this?
In the line
    a = executor.submit(print_function(i))
                        ^^^^^^^^^^^^^^^^^
you are calling the function already. Since it has a while True, it will never finish and thus submit() will never be reached.
The solution is to pass the function as a reference and the argument separately:
a = executor.submit(print_function, i)
However, note that you will still not get the output you want (012345), because a) range(5) stops at 4, b) you start only 2 workers, and since print_function never returns, only the first two tasks ever run, and c) the operating system decides which thread runs when, so the order of the output will be seemingly random (more like 310254).
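A minimal sketch of the corrected call. It assumes a finite variant of print_function (the name print_n_times, the repeat count and the delay are made up for illustration) so that the script can terminate:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def print_n_times(value, repeats=2, delay=0.01):
    """Hypothetical finite stand-in for print_function."""
    for _ in range(repeats):
        print(value)
        time.sleep(delay)
    return value

# The context manager waits for all submitted tasks on exit.
with ThreadPoolExecutor(max_workers=2) as executor:
    # Pass the callable and its argument separately; submit() does the calling.
    futures = [executor.submit(print_n_times, i) for i in range(6)]

# Iterating the list keeps submission order, whatever order the tasks ran in.
results = [f.result() for f in futures]
```

The printed values may interleave differently on every run, but results is always in submission order because the futures are read back from the list.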
Related
I've read an article about multithreading in Python in which synchronization is used to solve a race condition. I ran the example code below to reproduce the race condition:
import threading

# global variable x
x = 0

def increment():
    """
    function to increment global variable x
    """
    global x
    x += 1

def thread_task():
    """
    task for thread
    calls increment function 100000 times.
    """
    for _ in range(100000):
        increment()

def main_task():
    global x
    # setting global variable x as 0
    x = 0
    # creating threads
    t1 = threading.Thread(target=thread_task)
    t2 = threading.Thread(target=thread_task)
    # start threads
    t1.start()
    t2.start()
    # wait until threads finish their job
    t1.join()
    t2.join()

if __name__ == "__main__":
    for i in range(10):
        main_task()
        print("Iteration {0}: x = {1}".format(i,x))
It returns the same result as the article when I use Python 2.7.15, but not when I use Python 3.6.9 (every iteration returns the same result, 200000).
I wonder whether the new GIL implementation (since Python 3.2) handles the race condition. If it does, why do Lock and Mutex still exist in Python > 3.2? If it doesn't, why is there no conflict when multiple threads modify a shared resource as in the example above?
I have been struggling with these questions for days while trying to understand how Python really works under the hood.
The change you are referring to was to replace check interval with switch interval. This meant that rather than switching threads every 100 byte codes it would do so every 5 milliseconds.
Ref: https://pymotw.com/3/sys/threads.html https://mail.python.org/pipermail/python-dev/2009-October/093321.html
So if your code ran fast enough, it would never experience a thread switch, and it might appear to you that the operations were atomic when in fact they are not. The race condition did not appear because there was no actual interleaving of threads. x += 1 is actually four byte codes:
>>> dis.dis(sync.increment)
 11           0 LOAD_GLOBAL              0 (x)
              3 LOAD_CONST               1 (1)
              6 INPLACE_ADD
              7 STORE_GLOBAL             0 (x)
             10 LOAD_CONST               2 (None)
             13 RETURN_VALUE
A thread switch in the interpreter can occur between any two bytecodes.
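The exact listing depends on the interpreter version, so it can be worth checking what your own Python generates. A small sketch, assuming a module-level increment like the one in the question:

```python
import dis

x = 0

def increment():
    global x
    x += 1

# Collect the opcode names instead of relying on a fixed listing,
# since opcode names and offsets differ between Python versions.
opnames = [ins.opname for ins in dis.get_instructions(increment)]
print(opnames)
```

Whatever the version, the load of x and the store to x are separate instructions, which is where another thread can slip in.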
Consider the following script: in 2.7 it always prints 200000, because the check interval is set so high that each thread runs to completion before the next one starts. The same can be constructed with the switch interval.
import sys
import threading

print(sys.getcheckinterval())
sys.setcheckinterval(1000000)

# global variable x
x = 0

def increment():
    """
    function to increment global variable x
    """
    global x
    x += 1

def thread_task():
    """
    task for thread
    calls increment function 100000 times.
    """
    for _ in range(100000):
        increment()

def main_task():
    global x
    # setting global variable x as 0
    x = 0
    # creating threads
    t1 = threading.Thread(target=thread_task)
    t2 = threading.Thread(target=thread_task)
    # start threads
    t1.start()
    t2.start()
    # wait until threads finish their job
    t1.join()
    t2.join()

if __name__ == "__main__":
    for i in range(10):
        main_task()
        print("Iteration {0}: x = {1}".format(i,x))
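For Python 3, a comparable experiment can be sketched with sys.setswitchinterval. The 1-second interval and the run_once helper are assumptions chosen for illustration; with an interval this long each thread will usually finish before a switch is forced, but that is not a guarantee.

```python
import sys
import threading

x = 0

def thread_task(n):
    global x
    for _ in range(n):
        x += 1

def run_once(n=100000):
    """Run two incrementing threads and return the final counter."""
    global x
    x = 0
    threads = [threading.Thread(target=thread_task, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x

old_interval = sys.getswitchinterval()   # 0.005 s unless it was changed
sys.setswitchinterval(1.0)               # far longer than one thread needs
try:
    total = run_once()
finally:
    sys.setswitchinterval(old_interval)  # always restore the setting
print(total)
```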
The GIL protects individual byte code instructions. In contrast, a race condition is an incorrect ordering of instructions, which means multiple byte code instructions. As a result, the GIL cannot protect against race conditions outside of the Python VM itself.
However, by their very nature race conditions do not always trigger. Certain GIL strategies are more or less likely to trigger certain race conditions. A thread shorter than the GIL window is never interrupted, and one longer than the GIL window is always interrupted.
Your increment function has 6 byte code instructions, as has the inner loop calling it. Of these, 4 instructions must finish at once, meaning there are 3 possible switching points that corrupt the result. Your entire thread_task function takes about 0.015s to 0.020s (on my system).
With the old GIL switching every 100 instructions, the loop is guaranteed to be interrupted every 8.3 calls, or roughly 12,000 times over the 100000 calls. With the new GIL switching every 5ms, the loop is interrupted only 3 times.
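This is also why Lock still exists: it makes the result correct regardless of when switches happen. A minimal sketch, with the lock name x_lock and the smaller iteration count chosen for illustration:

```python
import threading

x = 0
x_lock = threading.Lock()

def increment():
    global x
    # Load, add and store now happen as one unit with respect to other threads.
    with x_lock:
        x += 1

def thread_task(n):
    for _ in range(n):
        increment()

threads = [threading.Thread(target=thread_task, args=(50000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(x)  # prints 100000
```

The lock costs some speed, but the total no longer depends on the interpreter version or on the switch interval.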
In a for loop, I call a function twice with different argument sets (argSet1, argSet2) that change on each iteration. I want to parallelize this, since one set of arguments makes the function run fast and the other makes it run slowly. Note that I do not want two for loops for this. I also have another requirement: each call performs some parallel operations itself, so because of my limited computational resources I do not want more than one call with argSet1 or more than one call with argSet2 running at any time. Keeping exactly one call with each argument set running will help me use the CPU cores as much as possible. Here's how I do it normally, without parallelization:
def myFunc(arg1, arg2):
    if arg1:
        print ('do something that does not take too long')
    else:
        print ('do something that takes long')

for i in range(10):
    argSet1 = arg1Storage[i]
    argSet2 = arg2Storage[i]
    # each argument set is a tuple (arg1, arg2)
    myFunc(*argSet1)
    myFunc(*argSet2)
This definitely does not take advantage of the computational resources that I have. Here's my attempt to parallelize the operations:
from multiprocessing import Process

def myFunc(arg1, arg2):
    if arg1:
        print ('do something that does not take too long')
    else:
        print ('do something that takes long')

for i in range(10):
    argSet1 = arg1Storage[i]
    argSet2 = arg2Storage[i]
    p1 = Process(target=myFunc, args=argSet1)
    p1.start()
    p2 = Process(target=myFunc, args=argSet2)
    p2.start()
However, this way all 10 calls for each argument set are started at once, and things become extremely slow. Given my limited knowledge of multiprocessing, I tried to improve things by adding p1.join() and p2.join() at the end of the for loop, but this is still slow, because p1 finishes much sooner and everything waits until p2 is done. I also thought about using multiprocessing.Value to communicate with the functions, but then I would have to add a while loop inside the function for each call, which slows everything down again. Can someone offer a practical solution?
Since I built this answer in patches, scroll down for the best solution to this problem.
You need to specify exactly how you want things to run. As far as I can tell, you want at most two processes running, but also at least two. Also, you do not want the heavy calls to hold up the fast ones. One simple, non-optimal way to do this is:
from multiprocessing import Process

def func(counter,somearg):
    j = 0
    for i in range(counter): j+=i
    print(somearg)

def loop(counter,arglist):
    for i in range(10):
        func(counter,arglist[i])

heavy = Process(target=loop,args=[1000000,['heavy'+str(i) for i in range(10)]])
light = Process(target=loop,args=[500000,['light'+str(i) for i in range(10)]])
heavy.start()
light.start()
heavy.join()
light.join()
The output here is (for one example run):
light0
heavy0
light1
light2
heavy1
light3
light4
heavy2
light5
light6
heavy3
light7
light8
heavy4
light9
heavy5
heavy6
heavy7
heavy8
heavy9
You can see the last part is sub-optimal: it is a sequence of heavy runs, which means one process is running instead of two.
An easy way to optimize this is to estimate how much longer the heavy calls take. If they are twice as slow, as here, just run 7 iterations of heavy first, join the light process, and then start a second heavy process for the remaining 3.
Another way is to run the heavy calls in two processes from the start, so at first you have 3 processes until the fast process ends, and then you continue with 2.
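The 7/3 split can be checked with a little arithmetic. The sketch below is a simplified cost model, not a measurement: it assumes every heavy call costs heavy_cost, every light call costs light_cost, and the second heavy process starts only after all light calls are done. The function names are made up for illustration.

```python
def finish_time(first_batch, n_calls, heavy_cost, light_cost):
    """Wall time when the first heavy process runs `first_batch` heavy calls
    and a second heavy process runs the rest after all light calls finish."""
    heavy1_done = first_batch * heavy_cost
    heavy2_done = n_calls * light_cost + (n_calls - first_batch) * heavy_cost
    return max(heavy1_done, heavy2_done)

def best_split(n_calls, heavy_cost, light_cost):
    """Return (first_batch, finish_time) with the smallest finish time."""
    return min(
        ((k, finish_time(k, n_calls, heavy_cost, light_cost))
         for k in range(n_calls + 1)),
        key=lambda pair: pair[1],
    )

print(best_split(10, 1.0, 0.5))  # prints (7, 8.0)
```

With 10 calls and heavy calls twice as slow, a first batch of 7 finishes in 8 time units under this model, compared with 10 units when a single process runs all the heavy calls.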
The main point is moving the heavy and light calls into separate processes, so the fast calls complete quickly one after the other while the slow ones are worked on. Once the fast process ends, it's up to you how elaborate you want the continuation to be, but I think for now estimating how to break up the heavy calls is good enough. This is my example:
from multiprocessing import Process

def func(counter,somearg):
    j = 0
    for i in range(counter): j+=i
    print(somearg)

def loop(counter,amount,arglist):
    for i in range(amount):
        func(counter,arglist[i])

heavy1 = Process(target=loop,args=[1000000,7,['heavy1'+str(i) for i in range(7)]])
light = Process(target=loop,args=[500000,10,['light'+str(i) for i in range(10)]])
heavy2 = Process(target=loop,args=[1000000,3,['heavy2'+str(i) for i in range(7,10)]])
heavy1.start()
light.start()
light.join()
heavy2.start()
heavy1.join()
heavy2.join()
with output:
light0
heavy10
light1
light2
heavy11
light3
light4
heavy12
light5
light6
heavy13
light7
light8
heavy14
light9
heavy15
heavy27
heavy16
heavy28
heavy29
Much better utilization. You can of course make this more advanced by sharing a queue for the slow calls, so that when the fast calls are done their worker can join in on the slow queue. For only two different calls this may be overkill (though it is not much harder with a queue). The best solution:
from multiprocessing import Queue,Process
import queue

def func(index,counter,somearg):
    j = 0
    for i in range(counter): j+=i
    print("Worker",index,':',somearg)

def worker(index):
    try:
        while True:
            func,args = q.get(block=False)
            func(index,*args)
    except queue.Empty: pass

q = Queue()
for i in range(10):
    q.put((func,(500000,'light'+str(i))))
    q.put((func,(1000000,'heavy'+str(i))))

nworkers = 2
workers = []
for i in range(nworkers):
    workers.append(Process(target=worker,args=(i,)))
    workers[-1].start()
q.close()
for w in workers:
    w.join()
This is the best and most scalable solution for what you want. Output:
Worker 0 : light0
Worker 0 : light1
Worker 1 : heavy0
Worker 1 : light2
Worker 0 : heavy1
Worker 0 : light3
Worker 1 : heavy2
Worker 1 : light4
Worker 0 : heavy3
Worker 0 : light5
Worker 1 : heavy4
Worker 1 : light6
Worker 0 : heavy5
Worker 0 : light7
Worker 1 : heavy6
Worker 1 : light8
Worker 0 : heavy7
Worker 0 : light9
Worker 1 : heavy8
Worker 0 : heavy9
You might want to use a multiprocessing.Pool of processes and map your myFunc into it, like so:
from multiprocessing import Pool
import time

def myFunc(arg1, arg2):
    if arg1:
        print ('do something that does not take too long')
        time.sleep(0.01)
    else:
        print ('do something that takes long')
        time.sleep(1)

def wrap(args):
    return myFunc(*args)

if __name__ == "__main__":
    p = Pool()
    argStorage = [(True, False), (False, True)] * 12
    p.map(wrap, argStorage)
I added a wrap function, since the function passed to p.map must accept a single argument. You could just as well adapt myFunc to accept a tuple, if that's possible in your case.
My sample argStorage consists of 24 items, of which 12 take 1 second to process and 12 are done in 10 ms. In total, this script runs in 3-4 seconds (I have 4 cores).
One possible implementation could be as follows:
import concurrent.futures
import math

list_of_args = [arg1, arg2]

def my_func(arg):
    ....
    print ('do something that takes long')

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for arg, result in zip(list_of_args, executor.map(my_func, list_of_args)):
            print('my_func({0}) => {1}'.format(arg, result))
executor.map works like the built-in map function: it calls the provided function once for each item of an iterable, passing the item as the argument.
I wrote a script in Python 3.6 initially using a for loop which called an API, then putting all results into a pandas dataframe and writing them to a SQL database. (approximately 9,000 calls are made to that API every time the script runs).
Realising the calls inside the for loop were processed one-by-one, I decided to use the multiprocessing module to speed things up.
Therefore, I created a module level function called parallel_requests and now I call that instead of having the for loop:
list_of_lists = multiprocessing.Pool(processes=4).starmap(parallel_requests, zip(....))
Side note: I use starmap instead of map only because my parallel_requests function takes multiple arguments which I need to zip.
The good: this approach works and is much faster.
The bad: this approach works but is too fast. By using 4 processes (I tried that because I have 4 cores), parallel_requests is getting executed too fast. More than 15 calls per second are made to the API, and I'm getting blocked by the API itself.
In fact, it only works if I use 1 or 2 processes, otherwise it's too damn fast.
Essentially what I want is to keep using 4 processes, but also to limit the execution of my parallel_requests function to only 15 times per second overall.
Is there any parameter of multiprocessing.Pool that would help with this, or it's more complicated than that?
For this case I'd use a leaky bucket. You can have one process that fills a queue at the prescribed rate, with a maximum size that indicates how many requests you can "bank" if you don't make them at the maximum rate; the worker processes then just need to get from the queue before doing their work.
import time

def make_api_request(this, that, rate_queue):
    rate_queue.get()
    print("DEBUG: doing some work at {}".format(time.time()))
    return this * that

def throttler(rate_queue, interval):
    try:
        while True:
            if not rate_queue.full():  # avoid blocking
                rate_queue.put(0)
            time.sleep(interval)
    except BrokenPipeError:
        # main process is done
        return

if __name__ == '__main__':
    from multiprocessing import Pool, Manager, Process
    from itertools import repeat
    rq = Manager().Queue(maxsize=15)  # conservative; no banking
    pool = Pool(4)
    Process(target=throttler, args=(rq, 1/15.)).start()
    pool.starmap(make_api_request, zip(range(100), range(100, 200), repeat(rq)))
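Since API calls are I/O-bound, the same leaky-bucket idea can also be sketched with threads, which avoids the Manager process. This is an illustrative variant under that assumption, not the code above: the stop event, the stamps list and the 0.05 s interval are made up for the demo.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def throttler(rate_queue, interval, stop):
    """Add one token per interval until asked to stop."""
    while not stop.is_set():
        try:
            rate_queue.put_nowait(0)   # skip this tick if the bucket is full
        except queue.Full:
            pass
        time.sleep(interval)

def make_api_request(this, that, rate_queue, stamps):
    rate_queue.get()                   # block until a token is available
    stamps.append(time.monotonic())    # record when the "request" went out
    return this * that

interval = 0.05                        # i.e. at most 20 requests per second
rq = queue.Queue(maxsize=1)            # bank at most a single token
stop = threading.Event()
stamps = []
threading.Thread(target=throttler, args=(rq, interval, stop), daemon=True).start()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda a, b: make_api_request(a, b, rq, stamps),
                            range(6), range(100, 106)))
stop.set()                             # let the throttler thread exit
```

Four workers are available, yet the six requests are spread over several intervals because each one has to wait for its token.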
I'll look at the ideas posted here, but in the meantime I've just used a simple approach of opening and closing a Pool of 4 processes for every 15 requests and appending all the results in a list_of_lists.
Admittedly, not the best approach, since it takes time/resources to open/close a Pool, but it was the most handy solution for now.
# define a generator for use below
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

list_of_lists = []
for current_chunk in chunks(all_data, 15):  # 15 is the API's limit of requests per second
    pool = multiprocessing.Pool(processes=4)
    res = pool.starmap(parallel_requests, zip(current_chunk, [to_symbol]*len(current_chunk), [query]*len(current_chunk), [start]*len(current_chunk), [stop]*len(current_chunk)) )
    sleep(1)  # Sleep for 1 second after every 15 API requests
    list_of_lists.extend(res)
    pool.close()
flatten_list = [item for sublist in list_of_lists for item in sublist]  # use this to construct a `pandas` dataframe
PS: This solution is really not at all that fast due to the multiple opening/closing of pools. Thanks Nathan Vērzemnieks for suggesting to open just one pool, it's much faster, plus your processor won't look like it's running a stress test.
One way to do it is to use a Queue, which can share the timestamps of the API calls with the other processes.
Below is an example of how this could work. It takes the oldest entry in the queue, and if that entry is less than one second old, it sleeps for the remainder of the second.
from multiprocessing import Pool, Manager, queues
from random import randint
import time

MAX_CONNECTIONS = 10
PROCESS_COUNT = 4

def api_request(a, b):
    time.sleep(randint(1, 9) * 0.03)  # simulate request
    return a, b, time.time()

def parallel_requests(a, b, the_queue):
    try:
        oldest = the_queue.get()
        time_difference = time.time() - oldest
    except queues.Empty:
        time_difference = float("-inf")
    if 0 < time_difference < 1:
        time.sleep(1-time_difference)
    else:
        time_difference = 0
    print("Current time: ", time.time(), "...after sleeping:", time_difference)
    the_queue.put(time.time())
    return api_request(a, b)

if __name__ == "__main__":
    m = Manager()
    q = m.Queue(maxsize=MAX_CONNECTIONS)
    for _ in range(0, MAX_CONNECTIONS):  # Fill the queue with zeroes
        q.put(0)
    p = Pool(PROCESS_COUNT)
    # Create example data
    data_length = 100
    data1 = range(0, data_length)  # Just some dummy-data
    data2 = range(100, data_length+100)  # Just some dummy-data
    queue_iterable = [q] * (data_length+1)  # required for starmap -function
    list_of_lists = p.starmap(parallel_requests, zip(data1, data2, queue_iterable))
    print(list_of_lists)
I have been looking around for some time, but haven't had luck finding an example that could solve my problem. I have added an example from my code. As one can notice this is slow and the 2 functions could be done separately.
My aim is to print every second the latest parameter values. At the same time the slow processes can be calculated in the background. The latest value is shown and when any process is ready the value is updated.
Can anybody recommend a better way to do it? An example would be really helpful.
Thanks a lot.
import time

def ProcessA(parA):
    # imitate slow process
    time.sleep(5)
    parA += 2
    return parA

def ProcessB(parB):
    # imitate slow process
    time.sleep(10)
    parB += 5
    return parB

# start from here
i, parA, parB = 1, 0, 0
while True:  # endless loop
    print(i)
    print(parA)
    print(parB)
    time.sleep(1)
    i += 1
    # update parameter A
    parA = ProcessA(parA)
    # update parameter B
    parB = ProcessB(parB)
I imagine this should do it for you. This has the benefit that you can add extra parallel functions, up to a total equal to the number of cores you have. Edits are welcome.
# import time module
import time
# import the appropriate multiprocessing functions
from multiprocessing import Pool

# define your functions
# whatever your slow function is
def slowFunction(x):
    return someFunction(x)

# printingFunction: print the current value until the pending result is ready
def printingFunction(pending, current, timeDelay):
    while not pending.ready():
        print(current)
        time.sleep(timeDelay)

# set the initial value that will be printed.
# Depending on your function this may take some time.
CurrentValue = slowFunction(someTemporallyDynamicVairable)
# establish your pool
pool = Pool()
while True:  # endless loop
    # an asynchronous call: pass the function and its arguments separately,
    # so it runs in the background while your printing operates.
    NewValue = pool.apply_async(slowFunction, (someTemporallyDynamicVairable,))
    printingFunction(NewValue, CurrentValue, 1)
    CurrentValue = NewValue.get()
# close your pool
pool.close()
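For the two-parameter case in the question, one possible shape is to poll futures once per tick. The sketch below is a scaled-down illustration: it uses threads because the stand-in functions only sleep (for CPU-bound work a ProcessPoolExecutor would be the natural swap), and the run helper, its parameters and the shortened delays are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def process_a(par_a, delay):
    time.sleep(delay)      # imitate a slow process
    return par_a + 2

def process_b(par_b, delay):
    time.sleep(delay)      # imitate a slow process
    return par_b + 5

def run(ticks, tick_seconds, delay_a, delay_b):
    """Print the latest values once per tick while updates run in the background."""
    par_a, par_b = 0, 0
    history = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_a = pool.submit(process_a, par_a, delay_a)
        fut_b = pool.submit(process_b, par_b, delay_b)
        for i in range(1, ticks + 1):
            # Pick up finished results without blocking, then resubmit.
            if fut_a.done():
                par_a = fut_a.result()
                fut_a = pool.submit(process_a, par_a, delay_a)
            if fut_b.done():
                par_b = fut_b.result()
                fut_b = pool.submit(process_b, par_b, delay_b)
            print(i, par_a, par_b)
            history.append((par_a, par_b))
            time.sleep(tick_seconds)
    return history

# Scaled-down timings; the question uses 1 s ticks and 5 s / 10 s delays.
history = run(ticks=10, tick_seconds=0.05, delay_a=0.01, delay_b=0.02)
```

The loop never waits for a slow call: it only checks done(), so the printing keeps its rhythm and each parameter is updated as soon as its own computation finishes.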
My problem is that whenever I use thr.result(), the program acts as if it's running on one thread. When I don't use thr.result(), it uses x threads.
So if I remove my if statement, it runs on 10 threads; if I leave it in, it acts as if it's on 1 thread.
import requests
from concurrent.futures import ThreadPoolExecutor

def search(query):
    r = requests.get("https://www.google.com/search?q=" + query)
    return r.status_code

pool = ThreadPoolExecutor(max_workers=10)
for i in range(50):
    thr = pool.submit(search, "stocks")
    print(i)
    if thr.result() != 404:
        print("Ran")
pool.shutdown(wait=True)
That's because result will wait for the future to complete:
Return the value returned by the call. If the call hasn’t yet completed then this method will wait up to timeout seconds. If the call hasn’t completed in timeout seconds, then a concurrent.futures.TimeoutError will be raised. timeout can be an int or float. If timeout is not specified or None, there is no limit to the wait time.
When you call result inside the loop, you submit a task, wait for it to complete, and only then submit the next one, so only one task can be running at a time.
Update: You can either store the returned futures in a list and iterate over them once you have submitted all the tasks, or use map:
from concurrent.futures import ThreadPoolExecutor
import time

def square(x):
    time.sleep(0.3)
    return x * x

print(time.time())
with ThreadPoolExecutor(max_workers=3) as pool:
    for res in pool.map(square, range(10)):
        print(res)
print(time.time())
Output:
1485845609.983702
0
1
4
9
16
25
36
49
64
81
1485845611.1942203
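The first option, storing the futures in a list, might look like the sketch below. To keep it runnable without network access, search here is a hypothetical stand-in that sleeps and returns a fixed status code instead of calling requests.get.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def search(query):
    """Hypothetical stand-in for the HTTP request; returns a fake status code."""
    time.sleep(0.05)
    return 200

with ThreadPoolExecutor(max_workers=10) as pool:
    start = time.monotonic()
    # Submit everything first; nothing blocks here.
    futures = [pool.submit(search, "stocks") for _ in range(20)]
    # Only now wait, handling each result as soon as it is ready.
    ran = sum(1 for fut in as_completed(futures) if fut.result() != 404)
    elapsed = time.monotonic() - start

print(ran, "tasks ran in", round(elapsed, 2), "seconds")
```

Because result() is called only after all tasks have been submitted, the 20 tasks overlap and the total time is well below the 1 second that running them one after another would take.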