How to control the number of open processes in Python

My main goal is to open 30 child processes from the parent process and then open an unknown number of new processes from each of those 30 child processes. I am going to call Redis for some location data from those new child processes, and I am not sure how many times I will have to call; it could be 100 or more than 1000. When I call more than 1000 times, I exceed the open-file limit. The error is:
OSError: [Errno 24] Too many open files
I don't want to manually increase the limit on the production server. I want to put a throttle on the process creation, so that it never has more than 1000 connections open.
Here is my template code:
import multiprocessing
import time
from multiprocessing.dummy import Pool
from random import randint


class MultiProcessing():
    def second_calculation(self, index1, index2):
        random = randint(1, 10)
        time.sleep(random)
        print("Slept for: {} seconds".format(random))
        print("Call done: index: {} | index2: {}".format(index1, index2))

    def calculation(self, index):
        child_process = list()
        random = randint(1, 5)
        time.sleep(random)
        print("Slept for : {} seconds".format(random))
        counter = 0
        # start 1500 grandchild processes at once (this is what hits the limit)
        for i in range(0, 1500):
            counter += 1
            new_child_process = multiprocessing.Process(target=self.second_calculation, args=(index, counter))
            child_process.append(new_child_process)
            new_child_process.start()
        for process in child_process:
            process.join()
        print("Request done: {}".format(index))


if __name__ == '__main__':
    index = 0
    parent_process = list()
    m = MultiProcessing()
    for i in range(0, 30):
        index += 1
        print("Index: {}".format(index))
        new_process = multiprocessing.Process(target=m.calculation, args=(index,))
        parent_process.append(new_process)
        new_process.start()
    for process in parent_process:
        process.join()
Thank you.
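One possible way to add such a throttle, sketched under the assumption that the Redis calls are I/O-bound and can therefore run in threads: replace the per-call Process with a fixed-size pool from multiprocessing.dummy, which the template code already imports. MAX_OPEN, the n_calls and max_open parameters, and the body of second_calculation are placeholders for illustration, not part of the original code.

```python
from multiprocessing.dummy import Pool  # thread-backed pool with the Pool API

MAX_OPEN = 100  # assumed cap, chosen to stay well below the 1000 limit


def second_calculation(args):
    index1, index2 = args
    # Placeholder for the real Redis lookup.
    return "index: {} | index2: {}".format(index1, index2)


def calculation(index, n_calls=1500, max_open=MAX_OPEN):
    # At most max_open calls are in flight at once; the rest wait in the
    # pool's internal queue until a worker thread is free.
    jobs = [(index, i) for i in range(1, n_calls + 1)]
    with Pool(max_open) as pool:
        return pool.map(second_calculation, jobs)


if __name__ == "__main__":
    results = calculation(1)
    print("calls done:", len(results))
```

With 30 parent processes each running calculation(), the total number of simultaneous calls is bounded by 30 * max_open, so max_open would need to be chosen with that product in mind.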


Why am I getting an error during parallel processing?

I am passing the key and value of a dictionary for parallel processing
if __name__ == "__main__":
    DATASETS = {
        "Dataset_1": data_preprocess.dataset_1,
        "Dataset_2": data_preprocess.dataset_2,}
    pool = mp.Pool(8)
    pool.starmap(main, zip(DATASETS.keys(), DATASETS.values()))
    # As I am not joining any result and I am directly saving the output
    # in a CSV file from the main function, I did not use pool.join()
The main function
def main(dataset_name, generate_dataset):
    REGRESSORS = {
        "LinReg": LinearRegression(),
        "Lasso": Lasso(),}
    ROOT = Path(__file__).resolve().parent
    dataset_name = dataset_name
    generate_dataset = generate_dataset
    dfs = []
    for reg_name, regressor in REGRESSORS.items():
        df = function_calling(...)  # arguments not shown
        dfs.append(df)
    df = pd.concat(dfs, axis=0, ignore_index=True)
    filename = dataset_name + "_result.csv"
    outfile = str(PATH) + "/" + filename
    df.to_csv(outfile)
I am getting an error AssertionError: daemonic processes are not allowed to have children.
Could you tell me why I am getting the error? How can I resolve this?
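The worker processes of a multiprocessing Pool are daemonic, and a daemonic process is not allowed to start child processes, so this error appears when something called from main tries to create its own processes or pool. One way around it is to create your own Process instances, which are non-daemonic by default: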
import multiprocessing as mp


def main(dataset_name, generate_dataset):
    print(dataset_name, generate_dataset, flush=True)
    ... # etc.


if __name__ == "__main__":
    DATASETS = {
        "Dataset_1": 1,
        "Dataset_2": 2,}
    processes = [mp.Process(target=main, args=(k, v)) for k, v in DATASETS.items()]
    for process in processes:
        process.start()
    # wait for termination:
    for process in processes:
        process.join()

Prints:
Dataset_1 1
Dataset_2 2
The drawback is this: suppose you have 8 CPU cores and DATASETS has 100 key/value pairs. You would be creating 100 processes. If these processes are CPU-intensive, no more than 8 of them can be doing productive work at any moment, yet you incur the CPU and memory overhead of having created all of them. As long as the number of processes you create is not much greater than the number of CPU cores you have, and your function main does not need to return a value to the main process, this should be OK.
There is also a way of implementing your own multiprocessing pool with these Process instances and a Queue instance, but that's a bit more complicated:
import multiprocessing as mp


def main(dataset_name, generate_dataset):
    print(dataset_name, generate_dataset, flush=True)
    ... # etc.


def worker(queue):
    while True:
        arg = queue.get()
        if arg is None:
            # signal to terminate
            break
        # unpack
        dataset_name, generate_dataset = arg
        main(dataset_name, generate_dataset)


if __name__ == "__main__":
    DATASETS = {
        "Dataset_1": 1,
        "Dataset_2": 2,}
    queue = mp.Queue()
    items = list(DATASETS.items())
    for k, v in items:
        # put the arguments on the queue
        queue.put((k, v))
    # number of processors we will be using:
    n_processors = min(mp.cpu_count(), len(items))
    for _ in range(n_processors):
        # special value to tell a worker there is no more work: one for each process
        queue.put(None)
    processes = [mp.Process(target=worker, args=(queue,)) for _ in range(n_processors)]
    for process in processes:
        process.start()
    for process in processes:
        process.join()

How to create a continuous stream of Python's concurrent.futures.ProcessPoolExecutor.submits()?

I am able to submit batches of concurrent.futures.ProcessPoolExecutor.submit() calls, where each batch may contain several submit() calls. However, I noticed that if each batch consumes a significant amount of RAM, RAM is used inefficiently: all futures in a batch must complete before the next batch of submit() calls can be made.
How does one create a continuous stream of Python's concurrent.futures.ProcessPoolExecutor.submit() until some condition is satisfied?
Test Script:
#!/usr/bin/env python3
import numpy as np
from numpy.random import default_rng, SeedSequence
import concurrent.futures as cf
from itertools import count


def dojob( process, iterations, samples, rg ):
    # Do some tasks
    result = []
    for i in range( iterations ):
        a = rg.standard_normal( samples )
        b = rg.integers( -3, 3, samples )
        mean = np.mean( a + b )
        result.append( ( i, mean ) )
    return { process : result }


if __name__ == '__main__':
    cpus = 2
    iterations = 10000
    samples = 1000

    # Setup NumPy Random Generator
    ss = SeedSequence( 1234567890 )
    child_seeds = ss.spawn( cpus )
    rg_streams = [ default_rng(s) for s in child_seeds ]

    # Perform concurrent analysis by batches
    counter = count( start=0, step=1 )

    # Serial Run of dojob
    process = next( counter )
    for cpu in range( cpus ):
        process = next( counter )
        rg = rg_streams[ cpu ]
        rdict = dojob( process, iterations, samples, rg )
    print( 'rdict', rdict )

    # Concurrent Run of dojob
    futures = []
    results = []
    with cf.ProcessPoolExecutor( max_workers=cpus ) as executor:
        while True:
            for cpu in range( cpus ):
                process = next( counter )
                rg = rg_streams[ cpu ]
                futures.append( executor.submit( dojob, process, iterations, samples, rg ) )
            for future in cf.as_completed( futures ):
                # Do some post processing
                r = future.result()
                for k, v in r.items():
                    if len( results ) < 5000:
                        results.append( np.std( v ) )
                        print( k, len(results) )
            if len(results) <= 100: #Put a huge number to simulate continuous streaming
                futures = []
                child_seeds = child_seeds[0].spawn( cpus )
                rg_streams = [ default_rng(s) for s in child_seeds ]
            else:
                break
    print( '\n*** Concurrent Analyses Ended ***' )
To expand on my comment, how about something like this, using the completion callback and a threading.Condition? I took the liberty of adding a progress indicator too.
EDIT: I refactored this into a neat function to which you pass your desired concurrency and queue depth, as well as a function that generates new jobs and another function that processes a result and lets the executor know whether you've had enough.
import concurrent.futures as cf
import threading
import time
from itertools import count
import numpy as np
from numpy.random import SeedSequence, default_rng


def dojob(process, iterations, samples, rg):
    # Do some tasks
    result = []
    for i in range(iterations):
        a = rg.standard_normal(samples)
        b = rg.integers(-3, 3, samples)
        mean = np.mean(a + b)
        result.append((i, mean))
    return {process: result}


def execute_concurrently(cpus, max_queue_length, get_job_fn, process_result_fn):
    running_futures = set()
    jobs_complete = 0
    job_cond = threading.Condition()
    all_complete_event = threading.Event()

    def on_complete(future):
        # runs when a future finishes: record the result and wake the main loop
        nonlocal jobs_complete
        if process_result_fn(future.result()):
            all_complete_event.set()
        running_futures.discard(future)
        jobs_complete += 1
        with job_cond:
            job_cond.notify_all()

    time_since_last_status = 0
    start_time = time.time()
    with cf.ProcessPoolExecutor(cpus) as executor:
        while True:
            # keep the queue topped up to max_queue_length
            while len(running_futures) < max_queue_length:
                fn, args = get_job_fn()
                fut = executor.submit(fn, *args)
                fut.add_done_callback(on_complete)
                running_futures.add(fut)
            # sleep until some future completes
            with job_cond:
                job_cond.wait()
            if all_complete_event.is_set():
                break
            if time.time() - time_since_last_status > 1.0:
                rps = jobs_complete / (time.time() - start_time)
                print(
                    f"{len(running_futures)} running futures on {cpus} CPUs, "
                    f"{jobs_complete} complete. RPS: {rps:.2f}"
                )
                time_since_last_status = time.time()


def main():
    ss = SeedSequence(1234567890)
    counter = count(start=0, step=1)
    iterations = 10000
    samples = 1000
    results = []

    def get_job():
        seed = ss.spawn(1)[0]
        rg = default_rng(seed)
        process = next(counter)
        return dojob, (process, iterations, samples, rg)

    def process_result(result):
        for k, v in result.items():
            results.append(np.std(v))
        if len(results) >= 10000:
            return True  # signal we're complete

    execute_concurrently(
        cpus=2,              # illustrative value
        max_queue_length=4,  # illustrative value
        get_job_fn=get_job,
        process_result_fn=process_result,
    )


if __name__ == "__main__":
    main()
The answer posted by @AKX works. Kudos to him. After testing it, I would like to recommend two amendments that I believe are worth considering and implementing.
Amendment 1: To cancel the Python script prematurely, Ctrl+C has to be used. Unfortunately, doing that does not terminate the concurrent.futures.ProcessPoolExecutor() processes that are executing dojob(). The issue becomes more pronounced when dojob() takes a long time to complete; this can be simulated by making the sample size in the script large (e.g. samples = 100000). The leftover processes can be seen by running the terminal command ps -ef | grep python. Also, if dojob() consumes a significant amount of RAM, the memory used by these processes is not released until they are killed manually (e.g. kill -9 [PID]). To address these issues, the following amendment is needed.
with job_cond:
    job_cond.wait()
should be changed to:
with job_cond:
    try:
        job_cond.wait()
    except KeyboardInterrupt:
        # Cancel running futures
        for future in running_futures:
            _ = future.cancel()
        # Ensure concurrent.futures.executor jobs really do finish.
        _ = cf.wait(running_futures, timeout=None)
So when Ctrl+C is needed, press it once first. Then give the futures in running_futures some time to be cancelled; this can take from a few to several seconds, depending on the resource requirements of dojob(). You can watch the CPU activity in your task manager or system monitor drop to zero, or hear the CPU cooling fan slow down. Note that the RAM used is not released yet. After that, press Ctrl+C again; this should allow a clean exit of all the concurrent processes, and the RAM they used is released as well.
Amendment 2: Presently, the inner while-loop submits jobs continuously, as fast as the "MainThread" allows. Realistically, there is no benefit in submitting more jobs than there are CPUs in the pool; doing so only consumes CPU resources from the "MainThread" of the main process unnecessarily. To regulate the continuous job submission, a new submit_job threading.Event() object can be used.
Firstly, define such an object and set its value to True with:
submit_job = threading.Event()
submit_job.set()
Next, at the end of the inner while-loop add this condition and .wait() method:
with cf.ProcessPoolExecutor(cpus) as executor:
    while True:
        while len(running_futures) < max_queue_length:
            fn, args = get_job_fn()
            fut = executor.submit(fn, *args)
            fut.add_done_callback(on_complete)
            running_futures.add(fut)
            if len(running_futures) >= cpus:  # Add this line
                submit_job.clear()            # Add this line
            submit_job.wait()                 # Add this line
Finally change the on_complete(future) callback to:
def on_complete(future):
    nonlocal jobs_complete
    if process_result_fn(future.result()):
        all_complete_event.set()
    running_futures.discard(future)
    if len(running_futures) < cpus:  # add this conditional setting
        submit_job.set()             # add this conditional setting
    jobs_complete += 1
    with job_cond:
        job_cond.notify_all()
There is a library called Pypeln that does this beautifully. It allows for streaming tasks between stages, and each stage can be run in a process, thread, or asyncio pool, depending on what is optimum for your use case.
Sample code:
import pypeln as pl
import time
from random import random
def slow_add1(x):
time.sleep(random()) # <= some slow computation
return x + 1
def slow_gt3(x):
time.sleep(random()) # <= some slow computation
return x > 3
data = range(10) # [0, 1, 2, ..., 9]
stage = pl.process.map(slow_add1, data, workers=3, maxsize=4)
stage = pl.process.filter(slow_gt3, stage, workers=2)
data = list(stage) # e.g. [5, 6, 9, 4, 8, 10, 7]
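Another option that avoids callbacks is to keep a fixed-size window of pending futures and refill it whenever concurrent.futures.wait() reports a completion. The sketch below is only an illustration: stream_jobs and max_in_flight are hypothetical names, and get_job_fn / process_result_fn are assumed to behave like the functions of the same name in the callback-based answer (return (fn, args), and return True once enough results are in).

```python
import concurrent.futures as cf


def stream_jobs(executor, get_job_fn, process_result_fn, max_in_flight):
    """Keep up to max_in_flight futures pending until process_result_fn returns True."""
    pending = set()
    done_count = 0
    stop = False
    while not stop:
        # Top up the window of in-flight futures.
        while len(pending) < max_in_flight:
            fn, args = get_job_fn()
            pending.add(executor.submit(fn, *args))
        # Block until at least one future finishes; the window is refilled on the next pass.
        done, pending = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
        for fut in done:
            done_count += 1
            if process_result_fn(fut.result()):
                stop = True
    # Cancel jobs that have not started; running ones finish when the executor shuts down.
    for fut in pending:
        fut.cancel()
    return done_count
```

Because the function only relies on the Executor interface, it can be passed a ProcessPoolExecutor for CPU-bound jobs such as dojob(), or a ThreadPoolExecutor for quick experiments.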

Fastest way to submit tasks with Celery?

I'm trying to submit around 150 million jobs to celery using the following code:
from celery import chain
from .task_receiver import do_work,handle_results,get_url

urls = '/home/ubuntu/celery_main/urls'

if __name__ == '__main__':
    fh = open(urls,'r')
    alldat = fh.readlines()
    for line in alldat:
        try:
            # line[:-1] strips the trailing newline
            result = chain(get_url.s(line[:-1]),do_work.s(line[:-1])).apply_async()
        except Exception:
            print ("failed to submit job")
        print('task submitted ' + str(line[:-1]))
Would it be faster to split the file into chunks and run multiple instances of this code? Or what can I do? I'm using memcached as the backend, rabbitmq as the broker.
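import multiprocessing
from celery import chain
from .task_receiver import do_work,handle_results,get_url

urls = '/home/ubuntu/celery_main/urls'
num_workers = 200

def worker(urls,id):
    """worker function"""
    for url in urls:
        print ("%s - %s" % (id,url))
        result = chain(get_url.s(url),do_work.s(url)).apply_async()

if __name__ == '__main__':
    fh = open(urls,'r')
    alldat = fh.readlines()
    jobs = []
    stack = []
    id = 0
    for i in alldat:
        if (len(stack) < len(alldat) / num_workers):
            stack.append(i[:-1])
        else:
            # chunk is full: hand it to a new process
            id = id + 1
            p = multiprocessing.Process(target=worker, args=(stack,id,))
            jobs.append(p)
            p.start()
            stack = [i[:-1]]  # start the next chunk with the current line
    if stack:
        # submit the final, partially filled chunk
        id = id + 1
        p = multiprocessing.Process(target=worker, args=(stack,id,))
        jobs.append(p)
        p.start()
    for j in jobs:
        j.join()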
If I understand your problem correctly:
1. you have a list of 150M urls
2. you want to run get_url() then do_work() on each of the urls
so you have two issues:
1. going over the 150M urls
2. queuing the tasks
Regarding the main for loop in your code: yes, you could make it faster by using multithreading, especially on a multicore CPU. Your master thread could read the file and pass chunks of it to sub-threads that create the Celery tasks.
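A minimal sketch of that chunk-and-dispatch idea, assuming the submission itself is passed in as a callable. The names chunked, submit_chunk, submit_all, chunk_size and n_threads are made up for illustration; in the real script submit_one would wrap the chain(get_url.s(url), do_work.s(url)).apply_async() call from the question.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def submit_chunk(urls, submit_one):
    # Runs in a sub-thread: submit every url of one chunk.
    for url in urls:
        submit_one(url)
    return len(urls)


def submit_all(lines, submit_one, chunk_size=1000, n_threads=8):
    urls = (line.rstrip("\n") for line in lines)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Note: Executor.map consumes the chunk generator eagerly,
        # so all chunks are queued up front.
        counts = pool.map(lambda c: submit_chunk(c, submit_one), chunked(urls, chunk_size))
        return sum(counts)


if __name__ == "__main__":
    submitted = []
    total = submit_all(["a\n", "b\n", "c\n"], submitted.append, chunk_size=2)
    print(total, sorted(submitted))
```

Whether this is actually faster depends on how much of the submission time is spent waiting on the broker; that would need to be measured.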
Now let's imagine you have 1 worker receiving these tasks. The code will generate 150M new tasks that are pushed to the queue. Each chain consists of get_url() followed by do_work(), and the next chain runs only when do_work() finishes.
If get_url() takes a short time and do_work() takes a long time, it will be a series of quick-task, slow-task, and the total time:
t_total_per_worker = (t_get_url_average + t_do_work_average) * 150M
If you have n workers:
t_total = t_total_per_worker / n
t_total = (t_get_url_average + t_do_work_average) * 150M / n
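To get a feel for the numbers, the formula can be evaluated directly. The timings and worker count below are assumptions chosen purely for illustration, not measurements, and estimate_total_seconds is a hypothetical helper name.

```python
def estimate_total_seconds(t_get_url, t_do_work, n_tasks, n_workers):
    """Total wall-clock time if each worker runs its chains one after another."""
    per_worker_total = (t_get_url + t_do_work) * n_tasks
    return per_worker_total / n_workers


# Assumed timings: 0.1 s per get_url(), 0.9 s per do_work(), 100 workers.
seconds = estimate_total_seconds(0.1, 0.9, 150_000_000, 100)
print("about {:.1f} days".format(seconds / 86400))
```

The estimate ignores broker overhead and submission time, so it is a lower bound at best.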
Now if get_url() is time critical while do_work() is not, then, if you can, you should run all 150M get_url() first and when that is done run all 150M do_work(), but that may require changes to your process design.
That is what I would do. Maybe others have better ideas!?

How to send a message to process zero, from all other processes?

I'm using mpi4py, with Python 2, and was following this example Parallel programming Research
Why does this line fail, and what fixes it?
searchResult = comm.recv(source=tempRank)
My code below appears to work fine until it reaches the line above. I put print statements above and below this line, so I'm pretty sure this is the problem.
My expectation was ... processor zero will receive a message from each processor, but it does not. The program seems to just hang and do nothing. Here is the program.
import time
from random import randint
from random import shuffle
from mpi4py import MPI
import sys

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = MPI.COMM_WORLD.Get_size()
name = MPI.Get_processor_name()
starttime = time.time()

if rank == 0:
    someNumber = 0
    data = list(range(1,8))
    chunks = [ [] for _ in range(size) ]
    for i,chunk in enumerate(data):
        chunks[i % size].append(chunk)
else:
    someNumber = None
    chunks = None

# scatter data to all processors
data = comm.scatter(chunks, root=0)
# give another variable to each processor
someNumber = comm.bcast(someNumber,root=0)
print('im rank=',rank,', my data=', data, ' searching for someNumber = ',someNumber)
searchResult = False
for num in data:
    if someNumber == num:
        print('found the someNumber')
        searchResult = True
if searchResult == False:
    print('someNumber not found')
# Now, at this point, I want all processors (including processor 0)
# to send processor 0 a message
# attempting to send process 0 a message from all other processes
# (does/can processor 0 send itself a message?)
if rank == 0:
    print('this line prints one time, and program hangs')
    for tempRank in range(1, size):
        searchResult = comm.recv(source=tempRank)
        print('this line never prints, so whats wrong with previous line?')
if rank == 0:
    if searchResult == True:
        print('found the someNumber, everyone stop searching .. how to make all processes stop?')
        print('elapsedtime = {}'.format(time.time()-starttime))
    else:
        print('no one found the someNumber')
        print('elapsedtime = {}'.format(time.time()-starttime))

Python Multiprocessing Process causes Parent to idle

My question is very similar to this question here, except the solution with catching didn't quite work for me.
Problem: I'm using multiprocessing to handle a file in parallel. It works about 97% of the time. However, sometimes the parent process idles forever and CPU usage shows 0.
Here is a simplified version of my code
from PIL import Image
import imageio
from multiprocessing import Process, Manager
def split_ranges(min_n, max_n, chunks=4):
    chunksize = ((max_n - min_n) / chunks) + 1
    return [range(x, min(max_n-1, x+chunksize)) for x in range(min_n, max_n, chunksize)]

def handle_file(file_list, vid, main_array):
    for index in file_list:
        try:
            #Do Stuff
            valid_frame = Image.fromarray(vid.get_data(index))
            main_array[index] = 1
        except:
            # mark frames that could not be read
            main_array[index] = 0

def main(file_path):
    mp_manager = Manager()
    vid = imageio.get_reader(file_path, 'ffmpeg')
    num_frames = vid._meta['nframes'] - 1
    list_collector = mp_manager.list(range(num_frames)) #initialize a list as the size of number of frames in the video
    total_list = split_ranges(10, min(200, num_frames), 4) #some arbitrary numbers between 0 and num_frames of video
    processes = []
    file_readers = []
    for split_list in total_list:
        video = imageio.get_reader(file_path, 'ffmpeg')
        proc = Process(target=handle_file, args=(split_list, video, list_collector))
        print "Started Process" #Always gets printed
        proc.daemon = False
        proc.start()
        processes.append(proc)
        file_readers.append(video)
    for i, proc in enumerate(processes):
        proc.join()
        print "Join Process " + str(i) #Doesn't get printed
        fd = file_readers[i]
        fd.close()
    return list_collector
The issue is that I can see the processes starting and I can see that all of the items are being handled. However, sometimes, the processes don't rejoin. When I check back, only the parent process is there but it's idling as if it's waiting for something. None of the child processes are there, but I don't think join is called because my print statement doesn't show up.
My hypothesis is that this happens to videos with a lot of broken frames. However, it's a bit hard to reproduce this error because it rarely occurs.
EDIT: Code should be valid now. Trying to find a file that can reproduce this error.

