The code has been vastly simplified, but should serve to illustrate my question.
S = ('A1RT', 'BDF7', 'CP09')
for s in S:
    if is_valid(s):  # very slow!
        process(s)
I have a collection of strings obtained from a site-scrape. (Strings will be retrieved from site-scrapes periodically.) Each of these strings needs to be validated, over the network, against a third party. The validation process can be slow at times, which is problematic. Due to the iterative nature of the above code, it may take some time before the last string is validated and processed.
Is there a proper way to parallelize the above logic in Python? To be frank, I'm not very familiar with concurrency / parallel-processing concepts, but it would seem as though they may be useful in this circumstance. Thoughts?
The concurrent.futures module is a great way to start work on "embarrassingly parallel" problems, and can very easily be switched between using either multiple processes or multiple threads within a single process.
In your case, it sounds like the "hard work" is being done on other machines over the network, and your main program will spend most of its time waiting for them to deliver results. If so, threads should work fine. Here's a complete, executable toy example:
import concurrent.futures as cf

def is_valid(s):
    import random
    import time
    time.sleep(random.random() * 10)
    return random.choice([False, True])

NUM_WORKERS = 10  # number of threads you want to run

strings = list("abcdefghijklmnopqrstuvwxyz")

with cf.ThreadPoolExecutor(max_workers=NUM_WORKERS) as executor:
    # map a future object to the string passed to is_valid
    futures = {executor.submit(is_valid, s): s for s in strings}
    # `as_completed()` returns results in the order threads
    # complete work, _not_ necessarily in the order the work
    # was passed out
    for future in cf.as_completed(futures):
        result = future.result()
        print(futures[future], result)
And here's sample output from one run:
g False
i True
j True
b True
f True
e True
k False
h True
c True
l False
m False
a False
s False
v True
q True
p True
d True
n False
t False
z True
o True
y False
r False
w False
u True
x False
concurrent.futures handles all the headaches of starting threads, parceling out work for them to do, and noticing when threads deliver results.
As written above, up to 10 (NUM_WORKERS) is_valid() invocations can be active simultaneously. as_completed() returns a future object as soon as its result is ready to retrieve, and the executor automatically hands the thread that computed the result another string for is_valid() to chew on.
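To tie this back to your original snippet, here is a minimal sketch of the same pattern using your own is_valid() and process() (both assumed to exist as in your question), handling each string as soon as its validation finishes:

import concurrent.futures as cf

S = ('A1RT', 'BDF7', 'CP09')
NUM_WORKERS = 10

with cf.ThreadPoolExecutor(max_workers=NUM_WORKERS) as executor:
    # validate all strings concurrently, remembering which future maps to which string
    futures = {executor.submit(is_valid, s): s for s in S}
    for future in cf.as_completed(futures):
        s = futures[future]
        if future.result():   # is_valid(s) came back True
            process(s)        # handle the string as soon as its validation is done

If process() is itself slow, you could submit it to the executor as well instead of calling it inline.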
Related
I'm working on an optimization problem, and you can see a simplified version of my code posted below (the original code is too complicated to post here, and I hope my simplified code reproduces the original one as closely as possible).
My purpose:
use the function foo inside the function optimization, but foo can take a very long time in some hard cases. So I use multiprocessing to set a time limit on the execution of the function (proc.join(iter_time); the method comes from an answer to this question: How to limit execution time of a function call?).
My problem:
In the while loop, the value generated for extra is the same every time.
The length of the list lst is always 1, which means every iteration of the while loop starts from an empty list.
My guess: a possible reason could be that each time I create a process the random seed starts counting from the beginning, and each time the process is terminated there may be some garbage-collection mechanism that cleans up the memory the process used, so the list gets cleared.
My question
Does anyone know the real reason for these problems?
If not multiprocessing, is there any other way I can achieve my purpose while generating different random numbers? By the way, I have tried func_timeout, but it has other problems that I cannot handle...
import multiprocessing
import random
import time

random.seed(123)
lst = []  # a global list for logging data

def foo(epoch):
    ...
    extra = random.random()
    lst.append(epoch + extra)
    ...

def optimization(loop_time, iter_time):
    start = time.time()
    epoch = 0
    while time.time() <= start + loop_time:
        proc = multiprocessing.Process(target=foo, args=(epoch,))
        proc.start()
        proc.join(iter_time)
        if proc.is_alive():  # if the process is not terminated within time limit
            print("Time out!")
            proc.terminate()

if __name__ == '__main__':
    optimization(300, 2)
You need to use shared memory if you want to share variables across processes, because child processes do not share their memory space with the parent. The simplest way to do that here is to use a managed list and to delete the line where you set the random seed. The seed is what causes the same number to be generated every time, because each child process starts from the same seed. To get different random numbers, either don't set a seed, or pass a different seed to each process:
import time, random
from multiprocessing import Manager, Process

def foo(epoch, lst):
    extra = random.random()
    lst.append(epoch + extra)

def optimization(loop_time, iter_time, lst):
    start = time.time()
    epoch = 0
    while time.time() <= start + loop_time:
        proc = Process(target=foo, args=(epoch, lst))
        proc.start()
        proc.join(iter_time)
        if proc.is_alive():  # if the process is not terminated within time limit
            print("Time out!")
            proc.terminate()
    print(lst)

if __name__ == '__main__':
    manager = Manager()
    lst = manager.list()
    optimization(10, 2, lst)
Output
[0.2035898948744943, 0.07617925389396074, 0.6416754412198231, 0.6712193790613651, 0.419777147554235, 0.732982735576982, 0.7137712131028766, 0.22875414425414997, 0.3181113880578589, 0.5613367673646847, 0.8699685474084119, 0.9005359611195111, 0.23695341111251134, 0.05994288664062197, 0.2306562314450149, 0.15575356275408125, 0.07435292814989103, 0.8542361251850187, 0.13139055891993145, 0.5015152768477814, 0.19864873743952582, 0.2313646288041601, 0.28992667535697736, 0.6265055915510219, 0.7265797043535446, 0.9202923318284002, 0.6321511834038631, 0.6728367262605407, 0.6586979597202935, 0.1309226720786667, 0.563889613032526, 0.389358766191921, 0.37260564565714316, 0.24684684162272597, 0.5982042933298861, 0.896663326233504, 0.7884030244369596, 0.6202229004466849, 0.4417549843477827, 0.37304274232635715, 0.5442716244427301, 0.9915536257041505, 0.46278512685707873, 0.4868394190894778, 0.2133187095154937]
Keep in mind that using managers will affect the performance of your code. As an alternative, you could use multiprocessing.Array, which is faster than a manager but less flexible in what data it can store, or a multiprocessing.Queue.
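If you want reproducible but different numbers in each child, the simplest route is the one mentioned above: pass a distinct seed to each process and seed inside foo. A minimal sketch (the seed scheme 123 + epoch is just an illustrative assumption):

import random
from multiprocessing import Manager, Process

def foo(epoch, seed, lst):
    random.seed(seed)                 # each child seeds its own random module
    lst.append(epoch + random.random())

if __name__ == '__main__':
    manager = Manager()
    lst = manager.list()
    procs = []
    for epoch in range(5):
        # derive a distinct, reproducible seed per process
        p = Process(target=foo, args=(epoch, 123 + epoch, lst))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()
    print(list(lst))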
I want a while loop that only runs when a certain condition is met. For example, I need to keep checking condition A, if listposts != 0 and listposts != listView:, to see whether there is a new record or not. If it finds a new record it should run function B and then stop until the condition is met again.
I'm new to programming and I tried this code, but it's still looping endlessly.
while True:
    if listposts != 0 and listposts != listView:
        Condition = True
        while Condition == True:
            function B()
            Condition = False
What I want to achieve is that the loop stops after one pass and waits until the condition is met before looping again.
From the behavior you expect, you need three things:
a condition-test (as function) returning either True or False
a loop that calls the condition-test regularly
a conditional call of function B() when condition is met (or condition-test function returns True)
# list_posts can change and is used as parameter
# listView is a global variable (could also be defined as parameter)
# returns either True or False
def condition_applies(list_posts):
    return list_posts != 0 and list_posts != listView

# Note: method names are by convention lower-case
def B():
    print("B was called")

# don't wait ... just loop and test until when?
# until running will become False
running = True
while running:
    if condition_applies(listposts):
        print("condition met, calling function B ..")
        B()
    # define an exit-condition to stop at some time
    running = True
Warning: This will be an endless-loop!
So you need at some point in time to set running = False.
Otherwise the loop will continue indefinitely, checking whether the condition applies.
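As a purely illustrative sketch of such an exit condition, you could stop after a fixed number of polls and sleep briefly between checks so the loop does not spin at full speed (the 1-second interval and 60-poll limit are arbitrary assumptions):

import time

running = True
polls = 0
while running:
    if condition_applies(listposts):
        print("condition met, calling function B ..")
        B()
    time.sleep(1)     # poll once per second instead of busy-waiting
    polls += 1
    if polls >= 60:   # arbitrary exit condition: give up after about a minute
        running = False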
To me it seems that you have a producer/consumer like situation.
IMHO your loop is ok. The principle applied here is called polling. Polling keeps looking for new items by constantly asking.
A more CPU-friendly way of implementing this (using less CPU) requires synchronization. A synchronization object such as a mutex or semaphore is signaled when a new element is available for processing. The processing thread can then be suspended (e.g. via WaitForSingleObject() on Windows), freeing the CPU for other things to do. When the object is signaled, the operating system notices that the thread should wake up and lets it run again.
Queue.get() and Queue.put() are methods that have this synchronization built in.
Unfortunately, I see many developers using Sleep() instead, which is not the right tool for the job.
Here's a producer/consumer example:
from threading import Thread
from time import sleep
import random
import queue

q = queue.Queue(10)  # Shared resource to work on. Synchronizes itself
producer_stopped = False
consumer_stopped = False

def producer():
    while not producer_stopped:
        try:
            item = random.randint(1, 10)
            q.put(item, timeout=1.0)
            print(f'Producing {str(item)}. Size: {str(q.qsize())}')
        except queue.Full:
            print("Consumer is too slow. Consider using more consumers.")

def consumer():
    while not consumer_stopped:
        try:
            item = q.get(timeout=1.0)
            print(f'Consuming {str(item)}. Size: {str(q.qsize())}')
        except queue.Empty:
            if not consumer_stopped:
                print("Producer is too slow. Consider using more producers.")

if __name__ == '__main__':
    producer_stopped = False
    p = Thread(target=producer)
    p.start()

    consumer_stopped = False
    c = Thread(target=consumer)
    c.start()

    sleep(2)  # run demo for 2 seconds. This is not for synchronization!

    producer_stopped = True
    p.join()

    consumer_stopped = True
    c.join()
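A side note on the stop flags: module-level booleans work here, but a threading.Event expresses the same intent more explicitly and can be waited on directly. A minimal sketch of the same shutdown pattern (not a drop-in replacement for the example above):

from threading import Thread, Event
import queue
import random

q = queue.Queue(10)
stop = Event()          # set() once to ask both threads to finish

def producer():
    while not stop.is_set():
        try:
            q.put(random.randint(1, 10), timeout=1.0)
        except queue.Full:
            pass

def consumer():
    while not stop.is_set() or not q.empty():
        try:
            print('consumed', q.get(timeout=1.0))
        except queue.Empty:
            pass

if __name__ == '__main__':
    threads = [Thread(target=producer), Thread(target=consumer)]
    for t in threads:
        t.start()
    stop.wait(2)        # let the demo run for about 2 seconds
    stop.set()
    for t in threads:
        t.join()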
I am trying to calculate whether a given number is prime or not with the formula :
(n-1)! mod n =? (n-1)
I must calculate the factorial with different threads and make them work simultaneously, check whether they have all finished, and if so join them. That way I can calculate the factorial across several threads and then take the modulo. However, even though my code works fine with small prime numbers, it takes too long to execute when the number gets big. I looked through my code and couldn't really find an alternative that would bring the execution time down. Here is my code:
import threading
import time

# GLOBAL VARIABLE
result = 1

# worker class in order to multiply on threads
class Worker:
    # initiating the worker class
    def __init__(self):
        super().__init__()
        self.jobs = []

    # the function that does the actual multiplying
    def multiplier(self, beg, end):
        global result
        for i in range(beg, end + 1):
            result *= i
            #print("\tresult updated with *{}:".format(i),result)
        #print("Calculating from {} to {}".format(beg,end)," : ",result)

    # appending threads to the object
    def append_job(self, job):
        self.jobs.append(job)

    # function that is to see the threads
    def see_jobs(self):
        return self.jobs

    # initiating the threads
    def initiate(self):
        for j in self.jobs:
            j.start()

    # finalizing and joining the threads
    def finalize(self):
        for j in self.jobs:
            j.join()

    # controlling the threads by blocking them until all threads are asleep
    def work(self):
        while True:
            if 0 == len([t for t in self.jobs if t.is_alive()]):
                self.finalize()
                break

# this is the function to split the factorial into several threads
def splitUp(n, t):
    # defining the remainder and the whole
    remainder, whole = (n - 1) % t, (n - 1) // t
    # deciding the tuple count
    tuple_count = whole if remainder == 0 else whole + 1
    # empty result list
    result = []
    # iterating
    beginning = 1
    end = (n - 1) // t
    for i in range(1, tuple_count + 1):
        if i == tuple_count:
            result.append((beginning, n - 1))  # if we are at the end, just append all to end
        else:
            result.append((beginning, end * i))
            beginning = end * i + 1
    return result

if __name__ == "__main__":
    threads = 64
    number = 743

    splitted = splitUp(number, threads)
    worker = Worker()
    #print(worker.see_jobs())

    s = time.time()
    # creating the threads
    for arg in splitted:
        thread = threading.Thread(target=worker.multiplier(arg[0], arg[1]))
        worker.append_job(thread)
    worker.initiate()
    worker.work()
    e = time.time()

    print("result found with {} threads in {} secs\n".format(threads, e - s))

    if result % number == number - 1:
        print("PRIME")
    else:
        print("NOT PRIME")
"""
-------------------- REPORT ------------------------
result found with 2 threads in 6.162530899047852 secs
result found with 4 threads in 0.29897499084472656 secs
result found with 16 threads in 0.009003162384033203 secs
result found with 32 threads in 0.0060007572174072266 secs
result found with 64 threads in 0.0029952526092529297 secs
note that: these results may differ from machine to machine
-------------------------------------------------------
"""
Thanks in advance.
First and foremost, you have a critical error in your code that you haven't reported or tried to trace:
======================== RESTART: ========================
result found with 2 threads in 5.800899267196655 secs
NOT PRIME
>>>
======================== RESTART: ========================
result found with 64 threads in 0.002002716064453125 secs
PRIME
>>>
As the old saying goes, "if the program doesn't work, it doesn't matter how fast it is".
The only test case you've given is 743; if you want help to diagnose the logic error, please determine the minimal test and parallelism that causes the error, and post a separate question.
I suspect that it's with your multiplier function, as you're working with an ill-advised global variable in parallel processing, and your multiply operation is not thread-safe.
In assembly terms, you have an unguarded region:
LOAD result
MUL i
STORE result
If this is interleaved with the same work from another thread, the result is virtually guaranteed to be wrong. You have to make this a critical region.
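In Python terms, the usual fix is a threading.Lock around the shared read-modify-write. A minimal sketch of just the multiplier part (result_lock is a name introduced here, not from your code); accumulating a per-thread partial product first also keeps the locked section tiny:

import threading

result = 1
result_lock = threading.Lock()

def multiplier(beg, end):
    global result
    partial = 1
    for i in range(beg, end + 1):   # do the bulk of the work without holding the lock
        partial *= i
    with result_lock:               # guard the shared read-modify-write
        result *= partial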
Once you fix that, you still have your speed problem. Factorial is the poster-child for recursion acceleration. I see two obvious accelerants:
Instead of that horridly slow multiplication loop, use functools.reduce to blast through your multiplication series (a sketch follows this list).
If you're going to loop the program with a series of inputs, then short-cut most of the calculations with memoization. The example on the linked page benefits greatly from multiple-recursion; since factorial is linear, you'd need repeated application to take advantage of the technique.
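A rough, single-threaded sketch of the first point (functools.reduce works everywhere; math.prod is the Python 3.8+ shortcut):

import functools
import math
import operator

n = 743

# functools.reduce over the range replaces the manual multiplication loop
fact = functools.reduce(operator.mul, range(1, n), 1)   # (n-1)!

# on Python 3.8+, math.prod does the same thing
assert fact == math.prod(range(1, n))

print("PRIME" if fact % n == n - 1 else "NOT PRIME")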
So what I want to do is run the same function multiple times simultaneously, getting a result back from each call and storing the results in an array or list. It goes like this:
def base_func(matrix, arg1, arg2):
    result = []
    for row in range(matrix.shape[0]):
        # perform necessary operation on row and return a certain value to store it into result
        x = func(matrix[row], arg1, arg2)
        result.append(x)
    return np.array(result)
I tried using threading in python. My implementation goes:
def base_func(matrix, arg1, arg2):
    result = []
    threads = []
    for row in range(matrix.shape[0]):
        t = threading.Thread(target=func, args=(matrix[row], arg1, arg2,))
        threads.append(t)
        t.start()
    for t in threads:
        res = t.join()
        result.append(res)
    return np.array(result)
This doesn't seem to work and just returns None from the threads.
From what I read in the documentation for Thread.join(), it says:
As join() always returns None, you must call is_alive() after join() to decide whether a timeout happened – if the thread is still alive, the join() call timed out.
You will always get None from these lines of your code:
res = t.join()
result.append(res)
This post mentions a similar problem to yours; please follow it for your solution. You might want to use the concurrent.futures module, as explained in this answer.
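A minimal sketch of that idea, keeping the shape of your base_func (func, matrix, arg1 and arg2 are assumed to be whatever they are in your real code):

import concurrent.futures
import numpy as np

def base_func(matrix, arg1, arg2):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # executor.map returns results in input order, so they line up with the rows;
        # a lambda is fine here because these are threads, not separate processes
        results = list(executor.map(lambda row: func(row, arg1, arg2), matrix))
    return np.array(results)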
In a for loop, I am calling a function twice, but with different argument sets (argSet1, argSet2) that change on each iteration of the for loop. I want to parallelize this operation, since one set of the arguments causes the called function to run faster and the other set causes a slow run. Note that I do not want to have two for loops for this operation. I also have another requirement: each of these functions will execute some parallel operations, and therefore I do not want to have the function with either argSet1 or argSet2 running more than once at a time, because of the limited computational resources I have. Making sure the function is running with both argument sets will help me utilize the CPU cores as much as possible. Here's how I normally do it without parallelization:
def myFunc(arg1, arg2):
    if arg1:
        print('do something that does not take too long')
    else:
        print('do something that takes long')

for i in range(10):
    argSet1 = arg1Storage[i]
    argSet2 = arg2Storage[i]
    myFunc(*argSet1)
    myFunc(*argSet2)
This definitely does not take advantage of the computational resources I have. Here's my attempt to parallelize the operations:
from multiprocessing import Process

def myFunc(arg1, arg2):
    if arg1:
        print('do something that does not take too long')
    else:
        print('do something that takes long')

for i in range(10):
    argSet1 = arg1Storage[i]
    argSet2 = arg2Storage[i]
    p1 = Process(target=myFunc, args=argSet1)
    p1.start()
    p2 = Process(target=myFunc, args=argSet2)
    p2.start()
However, this way each function with its respective arguments will be launched 10 times over, and things become extremely slow. Given my limited knowledge of multiprocessing, I tried to improve things a bit by adding p1.join() and p2.join() at the end of the for loop, but this still causes a slowdown, since p1 finishes much faster and everything waits until p2 is done. I also thought about using multiprocessing.Value to communicate with the functions, but then I would have to add a while loop inside the function for each of the function calls, which slows everything down again. I wonder if someone can offer a practical solution?
Since I built this answer in patches, scroll down for the best solution to this problem
You need to specify exactly how you want things to run. As far as I can tell, you want at most two processes running, but also at least two. Also, you do not want the heavy call to hold up the fast ones. One simple, non-optimal way to run it is:
from multiprocessing import Process

def func(counter, somearg):
    j = 0
    for i in range(counter): j += i
    print(somearg)

def loop(counter, arglist):
    for i in range(10):
        func(counter, arglist[i])

heavy = Process(target=loop, args=[1000000, ['heavy' + str(i) for i in range(10)]])
light = Process(target=loop, args=[500000, ['light' + str(i) for i in range(10)]])

heavy.start()
light.start()
heavy.join()
light.join()
The output here is (for one example run):
light0
heavy0
light1
light2
heavy1
light3
light4
heavy2
light5
light6
heavy3
light7
light8
heavy4
light9
heavy5
heavy6
heavy7
heavy8
heavy9
You can see the last part is sub-optimal, since you have a sequence of heavy runs - which means there is one process instead of two.
An easy way to optimize this is possible if you can estimate how much longer the heavy process runs. If it's twice as slow, as here, just run 7 iterations of heavy first, join the light process, and have it run the additional 3.
Another way is to run the heavy process in pairs, so at first you have 3 processes until the fast one ends, and then continue with 2.
The main point is separating the heavy and light calls into entirely different processes - so while the fast calls complete one after the other quickly, you can work on your slow stuff. Once the fast one ends, it's up to you how elaborate you want to get, but I think for now estimating how to break up the heavy calls is good enough. Here it is for my example:
from multiprocessing import Process

def func(counter, somearg):
    j = 0
    for i in range(counter): j += i
    print(somearg)

def loop(counter, amount, arglist):
    for i in range(amount):
        func(counter, arglist[i])

heavy1 = Process(target=loop, args=[1000000, 7, ['heavy1' + str(i) for i in range(7)]])
light = Process(target=loop, args=[500000, 10, ['light' + str(i) for i in range(10)]])
heavy2 = Process(target=loop, args=[1000000, 3, ['heavy2' + str(i) for i in range(7, 10)]])

heavy1.start()
light.start()
light.join()
heavy2.start()
heavy1.join()
heavy2.join()
with output:
light0
heavy10
light1
light2
heavy11
light3
light4
heavy12
light5
light6
heavy13
light7
light8
heavy14
light9
heavy15
heavy27
heavy16
heavy28
heavy29
Much better utilization. You can of course make this more advanced by sharing a queue for the slow process runs, so that when the fast ones are done they can join as workers on the slow queue, but for only two different calls this may be overkill (though it isn't much harder using the queue). The best solution:
from multiprocessing import Queue, Process
import queue

def func(index, counter, somearg):
    j = 0
    for i in range(counter): j += i
    print("Worker", index, ':', somearg)

def worker(index):
    try:
        while True:
            func, args = q.get(block=False)
            func(index, *args)
    except queue.Empty:
        pass

q = Queue()
for i in range(10):
    q.put((func, (500000, 'light' + str(i))))
    q.put((func, (1000000, 'heavy' + str(i))))

nworkers = 2
workers = []
for i in range(nworkers):
    workers.append(Process(target=worker, args=(i,)))
    workers[-1].start()

q.close()
for worker in workers:
    worker.join()
This is the best and most scalable solution for what you want. Output:
Worker 0 : light0
Worker 0 : light1
Worker 1 : heavy0
Worker 1 : light2
Worker 0 : heavy1
Worker 0 : light3
Worker 1 : heavy2
Worker 1 : light4
Worker 0 : heavy3
Worker 0 : light5
Worker 1 : heavy4
Worker 1 : light6
Worker 0 : heavy5
Worker 0 : light7
Worker 1 : heavy6
Worker 1 : light8
Worker 0 : heavy7
Worker 0 : light9
Worker 1 : heavy8
Worker 0 : heavy9
You might want to use a multiprocessing.Pool of processes and map your myFunc into it, like so:
from multiprocessing import Pool
import time

def myFunc(arg1, arg2):
    if arg1:
        print('do something that does not take too long')
        time.sleep(0.01)
    else:
        print('do something that takes long')
        time.sleep(1)

def wrap(args):
    return myFunc(*args)

if __name__ == "__main__":
    p = Pool()
    argStorage = [(True, False), (False, True)] * 12
    p.map(wrap, argStorage)
I added a wrap function, since the function passed to p.map must accept a single argument. You could just as well adapt myFunc to accept a tuple, if that's possible in your case.
My sample argStorage consists of 24 items: 12 of them take 1 second to process, and 12 are done in 10 ms. In total, this script runs in 3-4 seconds (I have 4 cores).
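If you'd rather skip the wrap function, Pool.starmap unpacks each argument tuple for you. A minimal sketch under the same assumptions as above:

from multiprocessing import Pool

if __name__ == "__main__":
    argStorage = [(True, False), (False, True)] * 12
    with Pool() as p:
        # starmap unpacks each (arg1, arg2) tuple into myFunc(arg1, arg2)
        p.starmap(myFunc, argStorage)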
One possible implementation could be as follows:
import concurrent.futures

list_of_args = [arg1, arg2]

def my_func(arg):
    ...
    print('do something that takes long')

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for arg, result in zip(list_of_args, executor.map(my_func, list_of_args)):
            print('my_func({0}) => {1}'.format(arg, result))

if __name__ == '__main__':
    main()
executor.map is like the built-in map function: it makes multiple calls to the provided function, passing each of the items in an iterable to that function and collecting the results.
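As a small illustration of how executor.map behaves (this toy slow_square function is mine, not from the question): results come back in the order of the inputs, regardless of which worker finishes first, which is why the zip with list_of_args above lines up.

import concurrent.futures
import time

def slow_square(x):
    time.sleep(0.1 * x)   # simulate uneven work: larger inputs take longer
    return x * x

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # prints [1, 4, 9, 16] - input order is preserved
        print(list(executor.map(slow_square, [1, 2, 3, 4])))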