Dynamically add/remove threads to the worker pool in celery

Dynamically add/remove threads to the worker pool in celery - python

How do I add more threads (and remove threads) to the current multiprocessing pool, from within a task (i.e. celeryd was run with CELERYD_CONCURRENCY = 10 but I want to change it on-the-fly to CELERYD_CONCURRENCY = 15)?
There is a function called celery.concurrency.processes.TaskPool.Pool.grow but I have no idea how to call that from a running task or whether it is the correct function to do that.

Read the source:
https://github.com/ask/celery/blob/master/celery/concurrency/processes/__init__.py
there's both grow() and shrink(), although the latter seems a tad fishy.
you'd need to keep a reference to the pool somewhere, if you have only one pool, keep it global.
caveat poster: if multiprocessing actually means running multiple separate processes, you might already be in a child process when you try to shrink or grow, and obviously that won't work.

Related

Make two competing functions and kill the slow one

In python, I have to fetch crypto data from binance every minute and do some calculations. For fetching data I have two functions func_a() and func_b(). They both do the same thing but in wildly different manner. Sometimes func_a is faster and sometimes func_b is faster. I want to run both the functions in parallel, if any of the function returns result to me faster, I want to kill the other one and move on (because they both are going to bring the same result).
How can I achieve this in python? Please mind that I do not want to replace these functions or their mechanics.

Python threads aren't very suitable for this purpose for two reasons:
The Python GIL means that if you spawn two CPU-bound threads, each of the two threads will run at half its normal speed (because only one thread is actually running at any given instant; the other is waiting to acquire the interpreter lock)
There is no reliable way to unilaterally kill a thread, because if you do that, any resources it had allocated will be leaked, causing major problems.
If you really want to be able to cancel a function-in-progress, then, you have two options:
Modify the function to periodically check a "please_quit" boolean variable (or whatever) and return immediately if that boolean's state has changed to True. Then your main thread can set the please_quit variable and then call join() on the thread, and rest assured that the thread will quit ASAP. (This does require that you have the ability to modify the function's implementation)
Spawn child processes instead of child threads. A child process takes more resources to launch, but it can run truly in parallel (since it has its own separate Python interpreter) and it is safe (usually) to unilaterally kill it, because the OS will automatically clean up all of the process's held resources when the process is killed.

multiprocessing in python - what gets inherited by forkserver process from parent process?

I am trying to use forkserver and I encountered NameError: name 'xxx' is not defined in worker processes.
I am using Python 3.6.4, but the documentation should be the same, from https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods it says that:
The fork server process is single threaded so it is safe for it to use os.fork(). No unnecessary resources are inherited.
Also, it says:
Better to inherit than pickle/unpickle
When using the spawn or forkserver start methods many types from multiprocessing need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.
So apparently a key object that my worker process needs to work on did not get inherited by the server process and then passing to workers, why did that happen? I wonder what exactly gets inherited by forkserver process from parent process?
Here is what my code looks like:
import multiprocessing
import (a bunch of other modules)
def worker_func(nameList):
global largeObject
for item in nameList:
# get some info from largeObject using item as index
# do some calculation
return [item, info]
if __name__ == '__main__':
result = []
largeObject # This is my large object, it's read-only and no modification will be made to it.
nameList # Here is a list variable that I will need to get info for each item in it from the largeObject
ctx_in_main = multiprocessing.get_context('forkserver')
print('Start parallel, using forking/spawning/?:', ctx_in_main.get_context())
cores = ctx_in_main.cpu_count()
with ctx_in_main.Pool(processes=4) as pool:
for x in pool.imap_unordered(worker_func, nameList):
result.append(x)
Thank you!
Best,

Theory
Below is an excerpt from Bojan Nikolic blog
Modern Python versions (on Linux) provide three ways of starting the separate processes:
Fork()-ing the parent processes and continuing with the same processes image in both parent and child. This method is fast, but potentially unreliable when parent state is complex
Spawning the child processes, i.e., fork()-ing and then execv to replace the process image with a new Python process. This method is reliable but slow, as the processes image is reloaded afresh.
The forkserver mechanism, which consists of a separate Python server with that has a relatively simple state and which is fork()-ed when a new processes is needed. This method combines the speed of Fork()-ing with good reliability (because the parent being forked is in a simple state).
Forkserver
The third method, forkserver, is illustrated below. Note that children retain a copy of the forkserver state. This state is intended to be relatively simple, but it is possible to adjust this through the multiprocess API through the set_forkserver_preload() method.
Practice
Thus, if you want simething to be inherited by child processes from the parent, this must be specified in the forkserver state by means of set_forkserver_preload(modules_names), which set list of module names to try to load in forkserver process. I give an example below:
# inherited.py
large_obj = {"one": 1, "two": 2, "three": 3}
# main.py
import multiprocessing
import os
from time import sleep
from inherited import large_obj
def worker_func(key: str):
print(os.getpid(), id(large_obj))
sleep(1)
return large_obj[key]
if __name__ == '__main__':
result = []
ctx_in_main = multiprocessing.get_context('forkserver')
ctx_in_main.set_forkserver_preload(['inherited'])
cores = ctx_in_main.cpu_count()
with ctx_in_main.Pool(processes=cores) as pool:
for x in pool.imap(worker_func, ["one", "two", "three"]):
result.append(x)
for res in result:
print(res)
Output:
# The PIDs are different but the address is always the same
PID=18603, obj id=139913466185024
PID=18604, obj id=139913466185024
PID=18605, obj id=139913466185024
And if we don't use preloading
...
ctx_in_main = multiprocessing.get_context('forkserver')
# ctx_in_main.set_forkserver_preload(['inherited'])
cores = ctx_in_main.cpu_count()
...
# The PIDs are different, the addresses are different too
# (but sometimes they can coincide)
PID=19046, obj id=140011789067776
PID=19047, obj id=140011789030976
PID=19048, obj id=140011789030912

So after an inspiring discussion with Alex I think I have sufficient info to address my question: what exactly gets inherited by forkserver process from parent process?
Basically when the server process starts, it will import your main module and everything before if __name__ == '__main__' will be executed. That's why my code don't work, because large_object is nowhere to be found in server process and in all those worker processes that fork from the server process.
Alex's solution works because large_object now gets imported to both main and server process so every worker forked from server will also gets large_object. If combined with set_forkserver_preload(modules_names) all workers might even get the same large_object from what I saw. The reason for using forkserver is explicitly explained in Python documentations and in Bojan's blog:
When the program starts and selects the forkserver start method, a server process is started. From then on, whenever a new process is needed, the parent process connects to the server and requests that it fork a new process. The fork server process is single threaded so it is safe for it to use os.fork(). No unnecessary resources are inherited.
The forkserver mechanism, which consists of a separate Python server with that has a relatively simple state and which is fork()-ed when a new processes is needed. This method combines the speed of Fork()-ing with good reliability (because the parent being forked is in a simple state).
So it's more on the safe side of concern here.
On a side note, if you use fork as the starting method though, you don't need to import anything since all child process gets a copy of parents process memory (or a reference if the system use COW-copy-on-write, please correct me if I am wrong). In this case using global large_object will get you access to large_object in worker_func directly.
The forkserver might not be a suitable approach for me because the issue I am facing is memory overhead. All the operations that gets me large_object in the first place are memory-consuming, so I don't want any unnecessary resources in my worker processes.
If I put all those calculations directly into inherited.py as Alex suggested, it will be executed twice (once when I imported the module in main and once when the server imports it; maybe even more when worker processes were born?), this is suitable if I just want a single-threaded safe process that workers can fork from. But since I am trying to get workers to not inherit unnecessary resources and only get large_object, this won't work.
And putting those calculations in __main__ in inherited.py won't work either since now none of the processes will execute them, including main and server.
So, as a conclusion, if the goal here is to get workers to inherit minimal resources, I am better off breaking my code into 2, do calculation.py first, pickle the large_object, exit the interpreter, and start a fresh one to load the pickled large_object. Then I can just go nuts with either fork or forkserver.

Parallel python loss of data

I have a python function that creates and stores a object instance in a global list and this function is called by a thread. While the thread runs the lists is filled up as it should be, but when the thread exits the list is empty and I have no idea why. Any help would be appreciated.
simulationResults = []
def run(width1, height1, seed1, prob1):
global simulationResults
instance = Life(width1, height1, seed1, prob1)
instance.run()
simulationResults.append(instance)
this is called in my main by:
for i in range(1, nsims + 1):
simulations.append(multiprocessing.Process(target=run, args=(width, height, seed, prob)))
simulations[(len(simulations) - 1)].start()
for i in simulations:
i.join()

multiprocessing is based on processes, not threads. The important difference: Each process has a separate memory space, while threads share a common memory space. When first created, a process may (depending on OS, spawn method, etc.) be able to read the same values the parent process has, but if it writes to them, only the local values are changed, not the parent's copy. Only threads can rely on being able to access an arbitrary single shared global variable and have it behave as expected.
I'd suggest looking at either multiprocessing.Pool and its various methods to dispatch tasks and retrieve their results later, or if you must use raw Processes, look at the various ways to exchange data between processes; you can't just assign to a global variable, because globals stop being shared when the new Process is forked/spawned.

In your code you are creating new processes rather than threads. When the process is created the new process will have deep copies of the variables in the main process, but they are independent from each other. I think for your case it makes sense to use processes rather than threads because It would allow you to utilise multiple cores as opposed to thread that will be limited to a single core due to GIL.
You will have to use interprocess communication techniques to communicate between processes. But since in your case the processes are not persistent daemons, it would make sense to write the simulationResults into a different unique file by each process and read them back from the main process.

Lock threads in Python for a task

I have a program that uses threads to start another thread once a certain threshold is reached. Right now the second thread is being started multiple times. I implemented a lock but I don't think I did it right.
for i in range(max_threads):
t1 = Thread(target=grab_queue)
t1.start()
in grab_queue, I have:
...
rows.append(resultJson)
if len(rows.value()) >= 250:
with Lock():
row_thread = Thread(target=insert_rows, kwargs={'rows': rows.value()})
row_thread.start()
rows.reset()
Which starts another thread to process the list of rows. I would like to make sure that as soon as it hits the if condition, the other threads wont run in order to make sure that extra threads to process the list of rows aren't started.

Your lock is covering the wrong portion of the code. You have a race condition between the check for the size of rows, and the portion of the code where you reset the rows. Given that the lock is taken only after the size check, two threads could easily both decide that the array has grown too large, and only then would the lock kick in to serialize the resetting of the array. "Serialize" in this case means that the task would still be performed twice, once by each thread, but it would happen in succession rather than in parallel.
The correct code could look like this:
rows.append(resultJson)
with grow_lock:
if len(rows.value()) >= 250:
row_thread = Thread(target=insert_rows, kwargs={'rows': rows.value()})
row_thread.start()
rows.reset()
There is another issue with the code as shown in the question: if Lock() refers to threading.Lock, it is creating and locking a new lock on each invocation, and in each thread! A lock protects a resource shared among threads, and to perform that function, the lock must itself be shared. To fix the problem, instantiate the lock once and pass it to the thread's target function.
Taking a step back, your code implements a custom thread pool. Getting that right and covering all the corner cases takes a lot of work, testing, and debugging. There are production-tested modules specialized for that purpose, such as the multiprocessing module shipped with Python (which supports both process and thread pools), and it is a good idea to get acquainted with them before reimplementing their functionality. See, for example, this article for an accessible introduction to multiprocessing-based thread pools.

Should I create a new Pool object every time or reuse a single one?

I'm trying to understand the best practices with Python's multiprocessing.Pool object.
In my program I use Pool.imap very frequently. Normally every time I start tasks in parallel I create a new pool object and then close it after I'm done.
I recently encountered a hang where the number of tasks submitted to the pool was less than the number of processes. What was odd was that it only occurred in my test pipeline which had a bunch of things run before it. Running the test as a standalone did not cause the hand. I assume it has to do with making multiple pools.
I'd really like to find some resources to help me understand the best practices in using Python's multiprocessing. Specifically I'm currently trying to understand the implications of making several pool objects versus using only one.

When you create a Pool of worker processes, new processes are spawned from the parent one. This is a very fast operation but it has its cost.
Therefore, as long as you don't have a very good reason, for example the Pool breaks due to one worker dying unexpectedly, it's better to always use the same Pool instance.
The reason for the hang is hard to tell without inspecting the code. You might not have clean the previous instances properly (call close()/stop() and then always call join()). You might have sent too big data through the Pool channel which usually ends up with a deadlock and so on.
Surely a pool does not break if you submit less tasks than workers. The pool is designed exactly to de-couple the number of tasks from the number of workers.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.