I'm trying to run parallel GET requests using multiprocessing.dummy, with progress reporting.
import time
from multiprocessing.dummy import Pool
from functools import partial

class Test(object):
    def __init__(self):
        self.count = 0
        self.threads = 10

    def callback(self, total, x):
        self.count += 1
        if self.count % 100 == 0:
            print("Working ({}/{}) cases processed.".format(self.count, total))

    def do_async(self):
        thread_pool = Pool(self.threads)
        input_list = link
        callback = partial(self.callback, len(link))
        tasks = [thread_pool.apply_async(get_data, (x,), callback=callback) for x in input_list]
        return (task.get() for task in tasks)

start = time.time()
t = Test()
results = t.do_async()
end = time.time()
The result: the operation takes the same amount of time as the non-parallel requests.
CPython is effectively single-threaded because of the Global Interpreter Lock (GIL): only one thread can execute Python bytecode at a time, even when multiple CPU cores are available. multiprocessing.dummy is just a thread-based wrapper around the multiprocessing API, which is why you are not seeing a speed-up.
To get the benefit of having multiple CPUs, you must use multiprocessing itself. However, there are overheads based on the cost of sending and receiving the input and output data of the sub-process. If the cost of this is greater than the amount of work done by the sub-process then using multiprocessing can actually slow your program down. So in your example, multiprocessing would likely not give you a speed increase. This is especially true as most of the work in the callback involves printing to standard out, which all the processes in the pool must synchronise over to prevent garbage being printed out.
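For comparison, here is a minimal sketch of the same pipeline with a process-based multiprocessing.Pool, assuming get_data and link are defined at module level as in the question (they must be, since arguments and results are pickled between processes):

from multiprocessing import Pool  # process-based pool instead of multiprocessing.dummy

def run_with_processes(link, processes=10):
    # get_data runs in separate worker processes, so it must be importable at
    # module level and its arguments/results must be picklable.
    with Pool(processes) as pool:
        tasks = [pool.apply_async(get_data, (url,)) for url in link]
        return [task.get() for task in tasks]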
I found a solution using concurrent.futures:
import concurrent.futures as futures
import datetime
import sys
import time

results = []
print("start", datetime.datetime.now().isoformat())
start = time.time()
with futures.ThreadPoolExecutor(max_workers=100) as executor:
    fs = [executor.submit(get_data, url) for url in link]
    for i, f in enumerate(futures.as_completed(fs)):
        results.append(f.result())
        if i % 100 == 0:
            sys.stdout.write("line nr: {} / {} \r".format(i, len(link)))
Related
I'm trying to scan all the CloudWatch log groups (nearly 10k log groups) in my AWS account and check them for subscription filters. Since Lambda has an execution time limit of 15 minutes, I'm using multiprocessing to complete the scan within that limit. Here is my code. When I execute it, I get a timeout error.
import time
import concurrent.futures
import boto3
from multiprocessing import Process, Pipe

logs = boto3.client('logs')

def describe_log_groups():
    paginator = logs.get_paginator('describe_log_groups')
    for page in paginator.paginate():
        for log_groups in page['logGroups']:
            yield log_groups

def describe_subscription_filter(loggroupname, conn):
    print('In Subscription Filters')
    response = logs.describe_subscription_filters(logGroupName=loggroupname)['subscriptionFilters']
    if len(response) != 0:
        for log in response:
            print(log['destinationArn'])
            conn.send([log['destinationArn']])
    conn.close()

def lambda_handler(event, context):
    t1 = time.perf_counter()
    evlaute_loggroups = []
    processes = []
    parent_connections = []
    loggroups_list = describe_log_groups()
    for loggroup in loggroups_list:
        parent_conn, child_conn = Pipe()
        parent_connections.append(parent_conn)
        print(parent_connections)
        print(loggroup['logGroupName'])
        process = Process(target=describe_subscription_filter, args=(loggroup['logGroupName'], child_conn,))
        processes.append(process)
    for process in processes:
        process.start()
    for process in processes:
        process.join()
    for parent_connection in parent_connections:
        print(parent_connection.recv()[0])
    print('done')
    t2 = time.perf_counter()
    print(f'Finished in {t2-t1} seconds')
I also have a doubt: can we scan a huge number of log groups in Lambda using multiprocessing at all?
Using multiprocessing in Lambda is not going to help much. The computational power of your function is tied to its RAM allocation.
If you want your function to run faster, you have to give it more RAM. With 1792 MB of RAM your function gets an allocation of 1 vCPU. This means that even with the maximum amount of RAM (3008 MB) you will not get 2 vCPUs. Since 1 vCPU can be considered equivalent to 1 hyper-thread on a physical CPU core, your Lambda function is basically limited to one thread.
You can consider the following options:
check execution time with more RAM,
simplify your code: instead of having one large function, have a few smaller functions that can be orchestrated, for instance with Step Functions (a rough sketch of this idea follows the list),
move from Lambda to another service, e.g. ECS.
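A hedged sketch of the "smaller functions" idea: a coordinator Lambda lists the log groups and fans each batch out to a worker Lambda. It uses direct asynchronous invocation rather than Step Functions, and the worker function name check-subscription-filters is hypothetical:

import json
import boto3

logs = boto3.client('logs')
lam = boto3.client('lambda')

def chunks(seq, size):
    # split the list of log group names into fixed-size batches
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def coordinator_handler(event, context):
    names = []
    for page in logs.get_paginator('describe_log_groups').paginate():
        names.extend(group['logGroupName'] for group in page['logGroups'])
    for batch in chunks(names, 500):
        # fire-and-forget: each worker Lambda checks one batch for subscription filters
        lam.invoke(
            FunctionName='check-subscription-filters',  # hypothetical worker function
            InvocationType='Event',
            Payload=json.dumps({'logGroupNames': batch}),
        )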
I'm new to Python threading and I'm experimenting with it:
When I run something in threads (whenever I print outputs), it never seems to be running in parallel. Also, my functions take the same time as before I started using the concurrent.futures library (ThreadPoolExecutor).
I have to calculate the gains of some attributes over a dataset (I cannot use libraries). Since I have about 1024 attributes and the function was taking about a minute to execute (and I have to use it in a for loop), I decided to split the array of attributes into 10 sub-arrays (just as an example) and run the function gain(attribute) separately for each sub-array. So I did the following (omitting some unnecessary code):
def calculate_gains(self):
    splited_attributes = np.array_split(self.attributes, 10)
    result = {}
    for atts in splited_attributes:
        with concurrent.futures.ThreadPoolExecutor() as executor:
            future = executor.submit(self.calculate_gains_helper, atts)
            return_value = future.result()
            self.gains = {**self.gains, **return_value}
Here's the calculate_gains_helper:
def calculate_gains_helper(self, attributes):
    inter_result = {}
    for attribute in attributes:
        inter_result[attribute] = self.gain(attribute)
    return inter_result
Am I doing something wrong? I read some other older posts but I couldn't get any info.
Thanks a lot for any help!
Python threads do not run in parallel (at least in the CPython implementation) because of the GIL. Use processes and ProcessPoolExecutor to get real parallelism:
with concurrent.futures.ProcessPoolExecutor() as executor:
    ...
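Applied to the code above, that might look roughly like the sketch below. It assumes the instance (and therefore self.gain and self.attributes) is picklable, because a process pool serialises the bound method and its arguments:

def calculate_gains(self):
    splited_attributes = np.array_split(self.attributes, 10)
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # submit every chunk up front, then merge the results as they arrive
        for return_value in executor.map(self.calculate_gains_helper, splited_attributes):
            self.gains = {**self.gains, **return_value}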
You submit and then wait for each work item serially, so all the threads do is slow everything down. I can't guarantee this will speed things up much, because you are still dealing with the Python GIL, which keeps Python-level code from running in parallel, but here goes.
I've created a thread pool and pushed everything possible into the worker, including the slicing of self.attributes.
def calculate_gains(self):
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        result_list = executor.map(self.calculate_gains_helper,
                                   ((i, i+10) for i in range(0, len(self.attributes), 10)))
        for return_value in result_list:
            self.gains = {**self.gains, **return_value}

def calculate_gains_helper(self, start_end):
    start, end = start_end
    inter_result = {}
    for attribute in self.attributes[start:end]:
        inter_result[attribute] = self.gain(attribute)
    return inter_result
I have compared multiprocessing and threading in Python, and multiprocessing turned out to be slower than threading. I calculate a distance using editdistance. My code looks like this:
import time
from multiprocessing import Pool

import editdistance

def calc_dist(kw, trie_word):
    dists = []
    while len(trie_word) != 0:
        w = trie_word.pop()
        dist = editdistance.eval(kw, w)
        dists.append((w, dist))
    return dists

if __name__ == "__main__":
    word_list = [str(i) for i in range(1, 10000001)]
    key_word = '2'
    print("calc")
    s = time.time()
    with Pool(processes=4) as pool:
        result = pool.apply_async(calc_dist, (key_word, word_list))
        print(len(result.get()))
        print("time", time.time() - s)
Using threading:
import threading

class DistThread(threading.Thread):
    def __init__(self, func, args):
        super(DistThread, self).__init__()
        self.func = func
        self.args = args
        self.dists = None

    def run(self):
        self.dists = self.func(*self.args)

    def join(self):
        super().join()  # wait for the thread to finish, then return its result
        return self.dists
On my computer the multiprocessing version takes about 118 s, while the threaded one takes about 36 s. What is wrong with it?
A couple of issues:
A significant amount of time will be spent serialising the data so it can be sent to the other process, while threads share the same address space so pointers can be used.
Your current code is only using one process to do all the calculations with multiprocessing. You need to separate your array into "chunks" somehow so that it can be processed by multiple workers.
e.g.:
import time
from multiprocessing import Pool
import editdistance

def calc_one(trie_word):
    return editdistance.eval(key_word, trie_word)

if __name__ == "__main__":
    word_list = [str(i) for i in range(1, 10000001)]
    key_word = '2'
    print("calc")

    s = time.time()
    with Pool(processes=4) as pool:
        result = pool.map(calc_one, word_list, chunksize=10000)
    print(len(result))
    print("time", time.time() - s)

    s = time.time()
    result = list(calc_one(w) for w in word_list)
    print(len(result))
    print("time", time.time() - s)
This relies on key_word being a global variable. For me, the version using multiple processes takes ~5.3 seconds while the second version takes ~16.9 seconds. Not 4 times as quick, as the data still needs to be sent back and forth, but pretty good.
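If relying on the global key_word feels fragile (for example on Windows, where worker processes are spawned rather than forked), a hedged variant is to hand the keyword to each worker once through the pool initializer:

import time
from multiprocessing import Pool
import editdistance

def init_worker(kw):
    # runs once in each worker process and stores the keyword there
    global key_word
    key_word = kw

def calc_one(trie_word):
    return editdistance.eval(key_word, trie_word)

if __name__ == "__main__":
    word_list = [str(i) for i in range(1, 10000001)]
    s = time.time()
    with Pool(processes=4, initializer=init_worker, initargs=('2',)) as pool:
        result = pool.map(calc_one, word_list, chunksize=10000)
    print(len(result), time.time() - s)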
I had a similar experience with threading and multiprocessing in Python when consuming CSVs containing a large amount of data. I had a small look into this and found that multiprocessing spawns multiple processes to perform tasks, which can be slower than a single threaded process, since threading runs in one place. There is a more definitive answer here: Multiprocessing vs Threading Python.
Pasting the answer from that link in case the link disappears:
The threading module uses threads, the multiprocessing module uses processes. The difference is that threads run in the same memory space, while processes have separate memory. This makes it a bit harder to share objects between processes with multiprocessing. Since threads use the same memory, precautions have to be taken or two threads will write to the same memory at the same time. This is what the global interpreter lock is for.
Spawning processes is a bit slower than spawning threads. Once they are running, there is not much difference.
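A small illustration of that memory difference (not from the quoted answer, just a sketch): a thread mutates the module's counter, while a child process only mutates its own copy.

import threading
import multiprocessing

counter = 0

def bump():
    global counter
    counter += 1

if __name__ == "__main__":
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    print(counter)  # 1 -- the thread shares this process's memory

    counter = 0
    p = multiprocessing.Process(target=bump)
    p.start()
    p.join()
    print(counter)  # still 0 -- the child process changed only its own copy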
I have searched and cannot find an answer to this question elsewhere. Hopefully I haven't missed something.
I am trying to use Python multiprocessing to essentially batch run some proprietary models in parallel. I have, say, 200 simulations, and I want to batch run them ~10-20 at a time. My problem is that the proprietary software crashes if two models happen to start at the same / similar time. I need to introduce a delay between processes spawned by multiprocessing so that each new model run waits a little bit before starting.
So far, my solution has been to introduce a random time delay at the start of the child process before it fires off the model run. However, this only reduces the probability of any two runs starting at the same time, so I still run into problems when processing a large number of models. I therefore think the time delay needs to be built into the multiprocessing part of the code, but I haven't been able to find any documentation or examples of this.
Edit: I am using Python 2.7
This is my code so far:
from time import sleep
import numpy as np
import subprocess
import multiprocessing
def runmodels(arg):
sleep(np.random.rand(1,1)*120) # this is my interim solution to reduce the probability that any two runs start at the same time, but it isn't a guaranteed solution
subprocess.call(arg) # this line actually fires off the model run
if __name__ == '__main__':
arguments = [big list of runs in here
]
count = 12
pool = multiprocessing.Pool(processes = count)
r = pool.imap_unordered(runmodels, arguments)
pool.close()
pool.join()
multiprocessing.Pool() already limits the number of processes running concurrently.
You could use a lock to separate the start times of the processes (not tested):
import threading
import multiprocessing

def init(lock):
    global starting
    starting = lock

def run_model(arg):
    starting.acquire()  # no other process can get it until it is released
    threading.Timer(1, starting.release).start()  # release in a second
    # ... start your simulation here

if __name__ == "__main__":
    arguments = ...
    pool = multiprocessing.Pool(processes=12,
                                initializer=init, initargs=[multiprocessing.Lock()])
    for _ in pool.imap_unordered(run_model, arguments):
        pass
One way to do this is with threads and a semaphore:
from time import sleep
import subprocess
import threading

def runmodels(arg):
    subprocess.call(arg)
    sGlobal.release()  # release for next launch

if __name__ == '__main__':
    threads = []
    global sGlobal
    sGlobal = threading.Semaphore(12)  # semaphore for max 12 threads
    arguments = [big list of runs in here
                 ]
    for arg in arguments:
        sGlobal.acquire()  # block if more than 12 threads
        t = threading.Thread(target=runmodels, args=(arg,))
        threads.append(t)
        t.start()
        sleep(1)
    for t in threads:
        t.join()
The answer suggested by jfs caused problems for me as a result of starting a new thread with threading.Timer. If the worker just so happens to finish before the timer does, the timer is killed and the lock is never released.
I propose an alternative route, in which each successive worker waits until enough time has passed since the start of the previous one. This seems to have the same desired effect, but without having to rely on an extra timer thread.
import multiprocessing as mp
import time

def init(shared_val):
    global start_time
    start_time = shared_val

def run_model(arg):
    with start_time.get_lock():
        wait_time = max(0, start_time.value - time.time())
        time.sleep(wait_time)
        start_time.value = time.time() + 1.0  # specify interval here
    # ... start your simulation here

if __name__ == "__main__":
    arguments = ...
    pool = mp.Pool(processes=12,
                   initializer=init, initargs=[mp.Value('d')])
    for _ in pool.imap_unordered(run_model, arguments):
        pass
So I thought I'd finally post: what is the proper way to manage Process workers? I've tried to use a Pool, but I noticed I could not get the return value of each completed process. I tried to use a callback, but that didn't work as expected either. Should I just be managing them myself with active_children()?
My Pool code:
from multiprocessing import *
import time
import random

SOME_LIST = []

def myfunc():
    a = random.randint(0, 3)
    time.sleep(a)
    return a

def cb(retval):
    SOME_LIST.append(retval)

print("Starting...")
p = Pool(processes=8)
p.apply_async(myfunc, callback=cb)
p.close()
p.join()
print("Stopping...")
print(SOME_LIST)
I expect a list of values, but all I get is the result of the last worker job to complete:
$ python multi.py
Starting...
Stopping...
[3]
Note: the answer should not use the threading module; here is the reason why:
In CPython, due to the Global Interpreter Lock, only one thread can
execute Python code at once (even though certain performance-oriented
libraries might overcome this limitation). If you want your
application to make better use of the computational resources of
multi-core machines, you are advised to use multiprocessing.
You're misunderstanding the way apply_async works. It doesn't call the function you pass to it in every process in the Pool. It just calls the function one time, in one of the worker processes. So the results you're seeing are to be expected. You have a couple of options to get the behavior you want:
from multiprocessing import Pool
import time
import random

SOME_LIST = []

def myfunc():
    a = random.randint(0, 3)
    time.sleep(a)
    return a

def cb(retval):
    SOME_LIST.append(retval)

print("Starting...")
p = Pool(processes=8)
for _ in range(p._processes):
    p.apply_async(myfunc, callback=cb)
p.close()
p.join()
print("Stopping...")
print(SOME_LIST)
Or
from multiprocessing import Pool
import time
import random

def myfunc(_):
    # the value passed in by map() is ignored; each call just sleeps and returns a random number
    a = random.randint(0, 3)
    time.sleep(a)
    return a

print("Starting...")
p = Pool(processes=8)
SOME_LIST = p.map(myfunc, range(p._processes))
p.close()
p.join()
print("Stopping...")
print(SOME_LIST)
Note that you could also call apply_async or map for more than the number of processes in the pool. The idea of the Pool is that it guarantees exactly num_processes processes will be running for the entire lifetime of the Pool, no matter how many tasks you submit. So if you create a Pool(8) and call apply_async once, one of your eight workers will get a task, and the other seven will be idle. If you create a Pool(8) and call apply_async 80 times, the 80 tasks will get distributed to your eight workers, with no more than eight of the tasks actually being processed at once.
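A quick way to see that distribution for yourself (an illustrative sketch, not part of the original answer): submit 80 tasks to a Pool(8) and record which worker process handled each one.

from multiprocessing import Pool
import os

def work(i):
    # report which worker process handled task i
    return i, os.getpid()

if __name__ == "__main__":
    with Pool(processes=8) as p:
        results = p.map(work, range(80))       # 80 tasks for 8 workers
    print(len(results))                        # 80 results come back
    print(len({pid for _, pid in results}))    # at most 8 distinct worker pids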