Python multiprocessing - starting next process after several seconds - python

I am a beginner in Python, so I would appreciate clear and easy explanations.
In my Python script, I have a function that spawns several threads to do an I/O-bound task (what it really does is make several Azure requests concurrently using the Azure Python SDK). I also have a list of time differences like [1 second, 3 seconds, 10 seconds, 5 seconds, ..., 7 seconds], so that I execute the function again after each time difference.
Let's say I want to execute the function and then execute it again after 5 seconds. The first execution can take much more than 5 seconds to finish, as it has to wait for the requests it makes to complete. So I want to run each execution of the function in a different process, so that different executions do not block each other (even if they wouldn't block each other without separate processes, I just didn't want threads from different executions to be mixed).
My code looks like this:
import multiprocessing as mp
from time import sleep

def function(num_threads):
    """
    This function makes num_threads threads to make num_threads requests
    """

# Time to wait in seconds between each execution of the function
times = [1, 10, 7, 3, 13, 19]
# List of number of requests to make for each execution of the function
num_threads_list = [1, 2, 3, 4, 5, 6]

processes = []
for i in range(len(times)):
    p = mp.Process(target=function, args=[num_threads_list[i]])
    p.start()
    processes.append(p)
    sleep(times[i])

for process in processes:
    process.join()
The question I have:
The length of the list "times" is very big in my real script (it is 1000). Considering the time differences in "times", I guess there are at most about 5 executions of the function running concurrently. I wonder if each process terminates when it is done executing the function, so that there are actually at most about 5 processes alive at any time, or whether the processes remain, so that there will eventually be 1000 processes, which sounds very weird given the number of CPU cores of my computer.
Please tell me if you think there is a better way to do what I explained above.
Thank you!

The main problem I distill from your question is having a large number of processes running simultaneously.
You can prevent that by maintaining a list of processes with a maximum length. Something like this:
import multiprocessing as mp
from time import sleep
from random import randint

def function(num_threads):
    """
    This function makes num_threads threads to make num_threads requests
    """
    sleep(randint(3, 7))

# Time to wait in seconds between each execution of the function
times = [1, 10, 7, 3, 13, 19]
# List of number of requests to make for each execution of the function
num_threads_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
process_data_list = []
max_processes = 4

# =======================================================================================
def main():
    times_index = 0
    while times_index < len(times):
        # cleanup stopped processes -------------------------------
        cleanup_done = False
        while not cleanup_done:
            cleanup_done = True
            # search stopped processes
            for i, process_data in enumerate(process_data_list):
                if not process_data[1].is_alive():
                    print(f'process {process_data[0]} finished')
                    # remove from processes
                    p = process_data_list.pop(i)
                    del p
                    # start new search
                    cleanup_done = False
                    break
        # try start new process ---------------------------------
        if len(process_data_list) < max_processes:
            process = mp.Process(target=function, args=[num_threads_list[times_index]])
            process.start()
            process_data_list.append([times_index, process])
            print(f'process {times_index} started')
            times_index += 1
        else:
            sleep(0.1)
    # wait for all processes to finish --------------------------------
    while process_data_list:
        for i, process_data in enumerate(process_data_list):
            if not process_data[1].is_alive():
                print(f'process {process_data[0]} finished')
                # remove from processes
                p = process_data_list.pop(i)
                del p
                # start new search
                break
    print('ALL DONE !!!!!!')

# =======================================================================================
if __name__ == '__main__':
    main()
It runs at most max_processes processes at once, as you can see in the result.
process 0 started
process 1 started
process 2 started
process 3 started
process 3 finished
process 4 started
process 1 finished
process 5 started
process 0 finished
process 2 finished
process 5 finished
process 4 finished
ALL DONE !!!!!!
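For reference, a more compact way to cap the number of worker processes (a sketch, not part of the original answer) is concurrent.futures.ProcessPoolExecutor, which reuses a fixed set of workers instead of ever creating 1000 processes; the sleeps between submissions mimic the time differences from the question:

import concurrent.futures
from time import sleep

def function(num_threads):
    """Placeholder for the real request-making function."""
    sleep(3)

times = [1, 10, 7, 3, 13, 19]          # seconds to wait between submissions
num_threads_list = [1, 2, 3, 4, 5, 6]  # argument for each execution

if __name__ == '__main__':
    # At most 5 worker processes exist at any time; finished workers are reused.
    with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
        futures = []
        for wait_time, n in zip(times, num_threads_list):
            futures.append(executor.submit(function, n))
            sleep(wait_time)
        # Block until every submitted execution has finished.
        for f in concurrent.futures.as_completed(futures):
            f.result()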

You could also use a timer to do the job, as in the following code.
I deliberately gave process n°2 a 15-second delay so that you can see it effectively ends last, once its time has elapsed.
This code sample has two main functions.
The first one, your_process_here(), as its name says, is where your own code goes.
The second one is a manager which organizes the slicing of the processes so as not to overload the system.
Parameters
max_process : total number of processes being executed by the script
simultp : maximum number of simultaneous processes
timegl : time guideline which defines the waiting time for each process, measured from the parent's start time. So the waiting time is at least the time defined in the guideline (which refers to the parent's start time).
In other words, once its guideline time has elapsed, a process starts as soon as possible, taking into account the maximum number of simultaneous processes allowed.
In this example
max_process = 6
simultp = 3
timegl = [1, 15, 1, 0.22, 6, 0.5] (just for illustration; an increasing series would be more logical there)
Result in the shell
simultaneously launched processes : 3
process n°2 is active and will wait 14.99 seconds more before treatment function starts
process n°1 is active and will wait 0.98 seconds more before treatment function starts
process n°3 is active and will wait 0.98 seconds more before treatment function starts
---- process n°1 ended ----
---- process n°3 ended ----
simultaneously launched processes : 3
process n°5 is active and will wait 2.88 seconds more before treatment function starts
process n°4 is active and will start now
---- process n°4 ended ----
---- process n°5 ended ----
simultaneously launched processes : 2
process n°6 is active and will start now
---- process n°6 ended ----
---- process n°2 ended ----
Code
import multiprocessing as mp
from threading import Timer
import time

def your_process_here(starttime, pnum, timegl):
    # Delay since the parent thread starts
    delay_since_pstart = time.time() - starttime
    # Time to sleep in order to follow the time guideline as closely as possible
    diff = timegl[pnum-1] - delay_since_pstart
    if diff > 0:  # if time elapsed since parent starts < guideline time
        print('process n°{0} is active and will wait {1} seconds more before treatment function starts'\
              .format(pnum, round(diff, 2)))
        time.sleep(diff)  # wait for X more seconds
    else:
        print('process n°{0} is active and will start now'.format(pnum))
    ########################################################
    ## PUT THE CODE AFTER SLEEP() TO START CODE WITH A DELAY
    ## if pnum == 1:
    ##     function1()
    ## elif pnum == 2:
    ##     function2()
    ## ...
    print('---- process n°{0} ended ----'.format(pnum))

def process_manager(max_process, simultp, timegl, starttime=0, pnum=1, launchp=[]):
    # While the number of simultaneous current processes is less than simultp and
    # the historical number of processes is less than max_process
    while len(mp.active_children()) < simultp and len(launchp) < max_process:
        # Incrementation of the process number
        pnum = len(launchp) + 1
        # Start a new process
        mp.Process(target=your_process_here, args=(starttime, pnum, timegl)).start()
        # Historical record of all launched unique processes
        launchp = list(set(launchp + mp.active_children()))
    # ...
    ####### THESE 2 FOLLOWING LINES ARE TO DELETE IN OPERATIONAL CODE ############
    print('simultaneously launched processes : ', len(mp.active_children()))
    time.sleep(3)  # optional: a break of 3 seconds before the next slice of processes is treated
    ##############################################################################
    if pnum < max_process:
        delay_repeat = 0.1  # 100 ms
        # If all the processes have not been launched, renew the operation
        Timer(delay_repeat, process_manager, (max_process, simultp, timegl, starttime, pnum, launchp)).start()

if __name__ == '__main__':
    max_process = 6  # maximum number of processes
    simultp = 3  # maximum number of simultaneous processes, to save resources
    timegl = [1, 15, 1, 0.22, 6, 0.5]  # Time guideline
    starttime = time.time()
    process_manager(max_process, simultp, timegl, starttime)

Related

Understanding the speed difference in threading

This is the script both threading functions call:
def searchBadWireless( hub ):
    host = f'xxx.xxx.xxx.{hub}'
    results = {}
    try:
        netConnect = ConnectHandler( device_type=platform, ip=host, username=cisco_username, password=cisco_password, )
        output = netConnect.send_command( 'sh int status | i 298|299' )
        netConnect.disconnect()
        results[ int( hub ) ] = output
    except:
        print( f'{host} - Failed to connect' )
Now the first threading function I have completes in around 7 seconds:
def threadingProcess( execFunction ):
    switchList = getSwitchIPs()
    start = perf_counter()
    threads = []
    for ip in switchList:
        thread = threading.Thread(target=execFunction, args=( [ip[ 0 ]] ) )
        threads.append( thread )
    for t in threads:
        t.start()
    for c in threads:
        c.join()
    finish = perf_counter()
    print(f"It took {finish-start} second(s) to finish.")
But the second one I have runs at around 32 seconds:
def newThreadProcess():
    switchList = getSwitchIPs()
    start = perf_counter()
    with ThreadPoolExecutor() as executor:
        results = executor.map(searchBadWireless, switchList)
        # for result in results:
        #     print(result)
    finish = perf_counter()
    print(f"It took {finish-start} second(s) to finish.")
From what I have read online, the better approach is the second function, but why does it take so much longer to complete than the first? Is there a way of speeding it up to be as fast as the first function?
The first function is faster for the simple reason that all threads are started immediately. If you have N work items, you are launching N threads in parallel. If your machine can handle that load, it will be fast.
For the second function, ThreadPoolExecutor limits the number of threads by default by using a pool of threads. To specify the pool size, you need to set the max_workers argument to the target number of threads.
Doc:
Changed in version 3.5: If max_workers is None or not given, it will default to the number of processors on the machine, multiplied by 5, assuming that ThreadPoolExecutor is often used to overlap I/O instead of CPU work and the number of workers should be higher than the number of workers for ProcessPoolExecutor.
So it seems that the host has a low number of CPUs, thus limiting the number of threads in the pool. Theoretically, if max_workers were equal to N (the number of work items), the throughput of both functions would be the same.
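For instance, a minimal sketch of the second function with an explicit pool size (getSwitchIPs and searchBadWireless are the poster's helpers, assumed to exist):

from concurrent.futures import ThreadPoolExecutor
from time import perf_counter

def newThreadProcess():
    switchList = getSwitchIPs()
    start = perf_counter()
    # One worker per work item, matching the first function's behaviour
    with ThreadPoolExecutor(max_workers=len(switchList)) as executor:
        list(executor.map(searchBadWireless, switchList))
    finish = perf_counter()
    print(f"It took {finish-start} second(s) to finish.")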

How do I run these simultaneously?

I'm relatively new to Python so I don't know how difficult or easy this is to solve, but I'm trying to make a function that can measure time without blocking other code from executing while doing so. Here's what I have:
import time

def tick(secs):
    start = time.time()
    while True:
        end = time.time()
        elapsed = end - start
        if elapsed >= secs:
            return elapsed
            break

input("what time is it?: ")
print(f"one: {round(tick(1))}")
print(f"two: {round(tick(2))}")
print(f"three: {round(tick(3))}")
print(f"four: {round(tick(4))}")
print(f"five: {round(tick(5))}")
The input blocks the timer from starting until it gets input, and the tick() calls after it don't run simultaneously: they run one at a time, i.e. wait 1 second, then wait 2 seconds, instead of all running together (to be clear, I want all timers that are started to run at the same time as the others, so the 5-second timer would start at the same time the 1-second one does). Thank you for your time and please let me know if you have a solution for this.
Not exactly sure what you are asking for, but how does this look:
import time
start=time.time()
input("what time is it?: ")
time.sleep(1)
print(time.time()-start)
time.sleep(2)
print(time.time()-start)
time.sleep(3)
print(time.time()-start)
time.sleep(4)
print(time.time()-start)
time.sleep(5)
print(time.time()-start)
The ticks are not running simultaneously because you are first waiting for 1 second, then waiting for 2, then for 3, etc.
A simple thing to do is have a list of time intervals you want to "pause" at, in your case [1, 2, 3, 4, 5], sorted numerically. You then keep track of the current index by checking elapsed >= secs, and if that succeeds you increment the index by one. Here's a sketch:
import time

def tick(tocks: list):
    """Tocks is a list of the time intervals at which you want to be
    notified when they are reached; all of them run in parallel.
    """
    tocks = sorted(tocks)
    current = 0  # we are at index 0, which is the lowest interval
    start = time.time()
    while current < len(tocks):  # while we have not reached the last interval
        end = time.time()
        elapsed = end - start
        if elapsed >= tocks[current]:  # if the current interval has passed, check for the next
            print(f"Tock: {tocks[current]}")
            current += 1
This function can then be called like this
tick([1, 2, 3, 4, 5])
This will handle all the 1- to 5-second timers in a single run. Here is the output:
Tock: 1 # at 1 sec
Tock: 2 # at 2 sec
Tock: 3 # at 3 sec
Tock: 4 # .....
Tock: 5
You can imagine that this may have some minor flaws if you choose really close numbers like [0.5, 0.50001, 0.50002]: the differences are so small that they may not actually elapse between checks.
You could also try multithreading, as has been noted, but it would be a very CPU-intensive task (imagine wanting to count to 100 and having to open 100 threads) for something very simple.
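For completeness, here is a minimal sketch of the threading alternative mentioned above, using threading.Timer so every timer counts down independently without busy-waiting (the notify helper is illustrative, not from the question):

import threading

def notify(secs):
    print(f"Tock: {secs}")

def tick(tocks):
    # Schedule one Timer per interval; they all run concurrently.
    for secs in tocks:
        threading.Timer(secs, notify, args=(secs,)).start()

tick([1, 2, 3, 4, 5])
input("what time is it?: ")  # no longer blocks the timers from firing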

Run process after process in a queue using python

I have a queue of 500 processes that I want to run through a Python script, and I want to run N processes in parallel at a time.
What my python script does so far:
It runs N processes in parallel, waits for all of them to terminate, then runs the next N files.
What I need to do:
When one of the N processes is finished, another process from the queue is automatically started, without waiting for the rest of the processes to terminate.
Note: I do not know how much time each process will take, so I can't schedule a process to run at a particular time.
Following is the code that I have.
I am currently using subprocess.Popen, but I'm not limited to its use.
for i in range(0, len(queue), N):
    batch = []
    for _ in range(int(jobs)):
        batch.append(queue.pop(0))
    for process in batch:
        p = subprocess.Popen([process])
        ps.append(p)
    for p in ps:
        p.communicate()
I believe this should work:
import subprocess
import time

def check_for_done(l):
    for i, p in enumerate(l):
        if p.poll() is not None:
            return True, i
    return False, False

processes = list()
N = 5
queue = list()
for process in queue:
    p = subprocess.Popen(process)
    processes.append(p)
    if len(processes) == N:
        wait = True
        while wait:
            done, num = check_for_done(processes)
            if done:
                processes.pop(num)
                wait = False
            else:
                time.sleep(0.5)  # set this so the CPU does not go crazy
So you have an active process list, and the check_for_done function loops through it; poll() returns None if a subprocess is not finished and a return code if it is. So when something other than None is returned, the process is done (without knowing whether it was successful or not). You then remove that process from the list, allowing the loop to add another one.
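Note that queue in the snippet above is an empty placeholder for your actual list of commands, and the loop does not wait for the final batch of processes; one way to finish up (my addition, not part of the original answer) could be:

# After the for loop: wait for the processes that are still running.
while processes:
    done, num = check_for_done(processes)
    if done:
        processes.pop(num)
    else:
        time.sleep(0.5)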
Assuming Python 3, you could make use of ThreadPoolExecutor from concurrent.futures, like:
$ cat run.py
import time
from subprocess import Popen, PIPE
from concurrent.futures import ThreadPoolExecutor

def exec_(cmd):
    proc = Popen(cmd, stdout=PIPE, stderr=PIPE)
    stdout, stderr = proc.communicate()
    #print(stdout, stderr)

def main():
    with ThreadPoolExecutor(max_workers=4) as executor:
        # to demonstrate it will take a batch of 4 jobs at the same time
        cmds = [['sleep', '4'] for i in range(10)]
        start = time.time()
        futures = executor.map(exec_, cmds)
        for future in futures:
            pass
    end = time.time()
    print(f'Took {end-start} seconds')

if __name__ == '__main__':
    main()
This will process 4 tasks at a time, and since the number of tasks is 10, it should only take around 4 + 4 + 4 = 12 seconds:
The first 4 seconds for the first 4 tasks
The second 4 seconds for the next 4 tasks
And the final 4 seconds for the 2 remaining tasks
Output:
$ python run.py
Took 12.005989074707031 seconds

Python Max concurrency thread alive

I would like to understand how to use Python threading and queues. My goal is to have 40 threads always alive. This is my code:
for iteration in iterations:  # main iteration
    dance = 1
    if threads >= len(MAX_VALUE_ITERATION):
        threads = len(MAX_VALUE_ITERATION) - 1  # adjust the number of threads because in this iteration I have just x subvalues
    else:
        threads = threads_saved  # recover the settings or the passed argument
    while dance <= 5:  # iterate from 1 to 5
        request = 0
        for lol in MAX_LOL:  # lol iterate
            for thread_n in range(threads):  # MAX threads
                t = threading.Thread(target=do_something)
                t.setDaemon(True)
                t.start()
                request += 1
            main_thread = threading.currentThread()
            for t in threading.enumerate():
                if t is main_thread:
                    continue
                if request < len(MAX_LOL)-1 and settings.errors_count <= MAX_ERR_COUNT:
                    t.join()
        dance += 1
The code you see here was cleaned up because the full version would be long for you to debug, so I tried to simplify it a little bit.
As you can see there are many iterations: I start from a DB query and fetch the result into the list (iterations);
next I adjust the max number of threads allowed;
then I iterate again from 1 to 5 (it's an argument passed to the small thread);
then inside the value fetched from the query iteration there is a JSON that contains another list I need to iterate over again;
and then finally I start the threads and join them.
The script opens X threads and then, when they (all or almost all) finish, it opens more threads, but my goal is to keep X threads alive at all times: once one thread finishes, another should spawn, and so on, so the maximum number of threads is always in use.
I hope you can help me.
Thanks!
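A common pattern for the goal described above is a fixed pool of worker threads consuming tasks from a queue.Queue: a new item is picked up as soon as a worker frees up, so exactly NUM_WORKERS threads stay alive. A minimal sketch (do_something and the work items are placeholders, not the poster's code):

import threading
import queue

NUM_WORKERS = 40  # keep this many threads alive

def do_something(item):
    pass  # placeholder for the real per-item work

def worker(tasks):
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work
            break
        try:
            do_something(item)
        finally:
            tasks.task_done()

tasks = queue.Queue()
workers = [threading.Thread(target=worker, args=(tasks,), daemon=True)
           for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()

for item in range(1000):  # enqueue all work items
    tasks.put(item)

tasks.join()       # wait until every item has been processed
for _ in workers:  # tell the workers to exit
    tasks.put(None)
for w in workers:
    w.join()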

Python: subprocess memory leak

I want to run a serial program on multiple cores at the same time, and I need to do that multiple times (in a loop).
I use subprocess.Popen to distribute the jobs on the processors, limiting the number of jobs to the number of available processors. I add the jobs to a list, then check with poll() whether they are done; when a job is done I remove it from the list and continue the submission until the total number of jobs is completed.
I have been looking on the web and found a couple of interesting scripts to do that, and came up with my adapted version:
nextProc = 0
processes = []
while (len(processes) < limitProc):  # Here I assume that limitProc < ncores
    input = filelist[nextProc]+'.in'  # filelist: list of input files
    output = filelist[nextProc]+'.out'  # list of output files
    cwd = pathlist[nextProc]  # list of paths
    processes.append(subprocess.Popen(['myProgram','-i',input,'-screen',output],cwd=cwd,bufsize=-1))
    nextProc += 1
    time.sleep(wait)

while (len(processes) > 0):  # Loop until all processes are done
    time.sleep(wait)
    for i in xrange(len(processes)-1, -1, -1):  # Remove processes done (traverse backward)
        if processes[i].poll() is not None:
            del processes[i]
            time.sleep(wait)
    while (len(processes) < limitProc) and (nextProc < maxProcesses):  # Submit new processes
        output = filelist[nextProc]+'.out'
        input = filelist[nextProc]+'.in'
        cwd = pathlist[nextProc]
        processes.append(subprocess.Popen(['myProgram','-i',input,'-screen',output],cwd=cwd,bufsize=-1))
        nextProc += 1
        time.sleep(wait)

print 'Jobs Done'
I run this script in a loop and the problem is that the execution time increases from one step to another. Here is the graph: http://i62.tinypic.com/2lk8f41.png
myProgram's execution time is constant.
I'd be so glad if someone could explain to me what is causing this leak.
Thanks a lot,
Begbi
