I recently learned that Python's multiprocessing Pool requires you to call:
pool.close()
pool.join()
when you're finished in order to free the memory used for state in those processes. Otherwise, they persist and your computer will fill up with python jobs; they won't use the CPU, but will hog memory.
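For reference, this is the pattern I mean (a minimal sketch; work is just a placeholder function):
from multiprocessing import Pool

def work(x):      # placeholder for the real job
    return x * x

pool = Pool(processes=4)
results = pool.map(work, range(10))
pool.close()      # stop accepting new jobs
pool.join()       # wait for the workers to exit and free their memory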
My question is this:
I'm now using Celery for parallelization (instead of Pool: I'm operating within a Django WSGI app, and Pool makes it difficult to prevent all users from forking jobs at once, which would crash the server).
I've noticed a very similar phenomenon with celery: my 6 celery processes running in the background start to gobble up memory. Is this normal or is there the equivalent of calling close() and join() that I can use to tell a celery task that I've retrieved the result and don't need it anymore?
Thanks a lot for the help.
Oliver
According to the documentation, there's no need to close workers; they will be closed automatically. But you can force-close them with kill, or discard all waiting tasks:
from celery.task.control import discard_all
discard_all()  # throws away all waiting task messages
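Note that celery.task.control is the older API; on newer Celery versions the equivalent, as far as I know, is the app's control interface:
from celery import Celery

app = Celery('proj', broker='redis://localhost:6379/0')  # hypothetical app and broker URL
app.control.purge()  # drop every message still waiting in the queue (not running tasks)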
Here are some examples from developers.
In my quest to make a parallelised Python program (written on Linux) truly platform-independent, I was looking for Python packages that parallelise seamlessly on Windows. So I found joblib, which looked like a godsend because it worked on Windows without hundreds of annoying pickling errors.
Then I ran into a problem that the processes spawned by joblib would continue to exist even if there were no parallel jobs running. This is a problem because in my code, there are multiple uses of os.chdir(). If there are processes running after the jobs end, there is no way to do os.chdir() on them, so it is not possible to delete the temporary folder where the parallel processes were working. This issue has been noted in this previous post: Joblib Parallel doesn't terminate processes
After some digging, I figured out that the problem comes from the backend, loky, which reuses the pool of processes and therefore keeps them alive.
So I used loky directly, calling loky.ProcessPoolExecutor() instead of the loky.get_reusable_executor() that joblib uses. The ProcessPoolExecutor from loky seems to be a reimplementation of Python's concurrent.futures and uses similar semantics.
This works fine; however, one Python process seems to stick around even after shutting down the ProcessPoolExecutor.
Minimal example with interactive python on Windows (use interactive shell because otherwise all the processes will terminate on exit):
>>> import loky
>>> import time
>>> def f():  # just some dummy function
...     time.sleep(10)
...     return 0
...
>>> pool = loky.ProcessPoolExecutor(3)
At this point, there is only one python process running (please note the PID in task manager).
Then, submit a job to the process pool (which returns a Future object).
>>> task = pool.submit(f)
>>> task
<Future at 0x11ad6f37640 state=running>
There are 5 processes now. One is the main process (PID=16508). There are three worker processes in the pool. But there is another extra process. I am really not sure what it is doing there.
After getting results, shutting the pool down removes the three worker processes. But not the one extra process.
>>> task.result()
0
>>> pool.shutdown() # you can add kill_workers=True, but does not change the result
The new python process with PID=16904 is still running. I have tried looking through the source code of loky, but I cannot figure out where that additional process is being created (and why it is necessary). Is there any way to tell this loky process to shut down? (I do not want to resort to os.kill or some other drastic way of terminating the process, e.g. with SIGTERM; I want to do it programmatically if I can.)
I am working on a Django application which uses Celery for distributed async processing. Now I have been tasked with integrating a process that was originally written with concurrent.futures. So my question is: can this concurrent.futures job work inside the Celery task queue? Would it cause any problems? If so, what would be the best way forward? The process written earlier is resource-intensive, since it is able to avoid the GIL, and it is very fast because of that. On top of that, the process uses a concurrent.futures.ProcessPoolExecutor and, inside it, another few (<5) concurrent.futures.ThreadPoolExecutor jobs.
So now the real question is: should we extract all the core functions of the process and rewrite them as Celery app tasks, or keep the original code and run it as one big piece of code within the Celery queue?
As per the design of the system, a user can submit several such Celery tasks, each containing the concurrent.futures code.
Any help will be appreciated.
Your library should work without modification. There's no harm in having threaded code running within Celery, unless you are mixing in gevent with non-gevent compatible code for example.
Reasons to break the code up would be for resource management (reduce memory/CPU overhead). With threading, the thing you want to monitor is CPU load. Once your concurrency causes enough load (e.g. threads doing CPU intensive work), the OS will start swapping between threads, and your processing gets slower, not faster.
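As a rough sketch of the "keep the original code and run it inside one task" option (all names here are made up, and the pool sizes are just illustrative):
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from celery import shared_task

def fetch_part(item):        # placeholder for an I/O-bound sub-step
    return item

def crunch(chunk):           # placeholder for the CPU-bound step, runs in its own process
    with ThreadPoolExecutor(max_workers=4) as io_pool:
        return list(io_pool.map(fetch_part, chunk))

@shared_task
def heavy_job(chunks):
    # the whole concurrent.futures pipeline runs inside one Celery task
    with ProcessPoolExecutor(max_workers=4) as cpu_pool:
        return list(cpu_pool.map(crunch, chunks))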
I was learning about multi-processing and multi-threading.
From what I understand, threads run on the same core, so I was wondering if I create multiple processes inside a child thread will they be limited to that single core too?
I'm using python, so this is a question about that specific language but I would like to know if it is the same thing with other languages?
I'm not a Python expert, but I expect this is like in other languages, because it's an OS feature in general.
Process
A process is executed by the OS and owns one thread which will be executed. This is, in general, your program. You can start more threads inside your process to do some heavy calculations or whatever you have to do.
But they belong to the process.
Thread
One or more threads are owned by a process and execution will be distributed across all cores.
Now to your question
When you create a given number of threads, these threads should in general be distributed across all your cores. They're not limited to the core that's executing the Python interpreter.
Even when you create a subprocess from your Python code, that process can and should run on other cores.
You can read more about the general concept here:
Preemptive multitasking
There are some libraries in different languages that abstract a thread into something often called a Task or similar.
For these special cases it's possible that they only run inside the thread they were created in.
For example, in the .NET world there's a Thread and a Task. People often misuse the term thread when they're talking about a Task, which in general runs inside the thread it was created in.
Every program is represented by one process. A process is the execution context that one or multiple threads operate in. All threads in one process share the same tranche of virtual memory assigned to the process.
Python (referring to CPython; Jython and IronPython, for example, have no GIL) is special because it has the global interpreter lock (GIL), which prevents threaded Python code from running on multiple cores in parallel. Only code that releases the GIL can operate truly in parallel (I/O operations and some C extensions like numpy). That's why you have to use the multiprocessing module for CPU-bound Python code you need to run in parallel. Processes started with the multiprocessing module each run their own Python interpreter instance, so that code can run truly in parallel.
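A quick way to see the effect of the GIL is to run the same CPU-bound function once with threads and once with processes (a minimal sketch; exact timings will vary, but the thread pool will not scale on CPython):
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def burn(n):  # CPU-bound loop that never releases the GIL
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    for pool_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.perf_counter()
        with pool_cls(max_workers=4) as pool:
            list(pool.map(burn, [5_000_000] * 4))
        print(pool_cls.__name__, time.perf_counter() - start)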
Note that even a single-threaded Python application can run on different cores (not in parallel, but sequentially), in case the OS reschedules execution to another core after a context switch.
Back to your question:
if I create multiple processes inside a child thread will they be limited to that single core too?
You don't create processes inside a thread; you spawn new independent Python processes with the same limitations as the original Python process, and which cores the threads of the new processes execute on is up to the OS (...as long as you don't manipulate the core affinity of a process, but let's not go there).
I've installed Nginx + uWSGI + Django on a VDS with 3 CPU cores. uWSGI is configured for 6 processes and 5 threads per process. Now I want to tell uWSGI to use processes for load balancing until all processes are busy, and then to use threads if needed. It seems uWSGI prefers threads, and I have not found any config option to change this behaviour. The first process takes over 100% CPU time, the second one takes about 20%, and the other processes are mostly unused.
Our site receives 40 r/s. Actually, even having 3 processes without threads is usually enough to handle all requests. But request processing hangs from time to time for various reasons, like locked shared resources, etc. In such cases we are effectively one process down. Users don't like to wait and click the link again and again. As a result, all processes hang and all users have to wait.
I'd add even more threads to make the server more robust. But the problem is probably the Python GIL. Threads won't use all CPU cores. So multiple processes work much better for load balancing. But threads may help a lot in case of locked shared resources and I/O wait delays. A process may do much work while one of its threads is locked.
I don't want to decrease time limits unless there is no other solution. It is possible to solve this problem with threads in theory, and I don't want to show error messages to users or make them wait on every request unless there is no other choice.
So, the solution is:
Upgrade uWSGI to a recent stable version (as roberto suggested).
Use the --thunder-lock option.
Now I'm running with 50 threads per process and all requests are distributed between processes equally.
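In config-file form this looks roughly like the following (a sketch based on my setup; the numbers are the ones I ended up with, adjust to taste):
[uwsgi]
# excerpt of a uwsgi.ini, only the options relevant here
processes = 6
threads = 50
thunder-lock = true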
Every process is effectively a thread, as threads are execution contexts of the same process.
For that reason there is nothing like "a process executes it instead of a thread". Even without threads your process has 1 execution context (a thread). What I would investigate is why you get (perceived) poor performance when using multiple threads per process. Are you sure you are using a stable (with solid threading support) uWSGI release? (1.4.x or 1.9.x)
Have you thought about dynamically spawning more processes when the server is overloaded? Check the uWSGI cheaper modes; there are various algorithms available. Maybe one will fit your situation.
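For example, a cheaper setup could look something like this (the option names are taken from the uWSGI docs as I understand them, and the numbers are purely illustrative):
[uwsgi]
# hypothetical cheaper-mode excerpt: keep 3 workers when idle, grow up to 12 under load
workers = 12
cheaper = 3
cheaper-initial = 6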
The GIL is not a problem for you, as from what you describe the problem is the lack of threads for managing new requests (even if, from your numbers, it looks like you may have too much lock contention on something else).
Basically I have a lot of tasks (in batches of about 1000) and the execution times of these tasks can vary widely (from less than a second to 10 minutes). I know that if a task has been executing for more than a minute I can kill it. These tasks are steps in the optimization of some data mining model (but are independent of each other) and spend most of their time inside some C extension function, so they would not cooperate if I tried to kill them gracefully.
Is there a distributed task queue that fits that scheme? AFAIK, Celery only allows aborting tasks that are willing to cooperate. But I might be wrong.
I recently asked a similar question about killing hanging functions in pure Python: Kill hanging function in Python in multithreaded enviorment.
I guess I could subclass the Celery task so it spawns a new process and then executes its payload, aborting execution if it takes too long, but then I would be killed by the overhead of initializing a new interpreter.
Celery supports time limits. You can use time limits to kill long-running tasks. Besides killing tasks outright, you can use soft limits, which let you handle the SoftTimeLimitExceeded exception inside the task and terminate it cleanly.
from celery.task import task
from celery.exceptions import SoftTimeLimitExceeded

@task(soft_time_limit=60)  # raise SoftTimeLimitExceeded inside the task after 60 seconds
def mytask():
    try:
        do_work()
    except SoftTimeLimitExceeded:
        clean_up_in_a_hurry()
Pistil allows multiple process management, including killing uncooperative tasks.
But:
it's beta software, even if it powers gunicorn, which is reliable
I don't know how it handles 1000 processes
Communication between processes is not included yet, so you'll have to set up your own, using for example zeromq
Another possibility is to use the timer signal so that it raises an exception after 36000 seconds. But signals are not triggered if something is holding the GIL, which your C program might do.
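A minimal sketch of that approach (Unix only; do_work stands in for the real payload, and 60 seconds is just an illustrative limit):
import signal

def on_alarm(signum, frame):
    raise TimeoutError('task exceeded its time budget')

signal.signal(signal.SIGALRM, on_alarm)
signal.alarm(60)        # ask the OS to deliver SIGALRM in 60 seconds
try:
    do_work()           # placeholder; a C extension holding the GIL can delay the handler
finally:
    signal.alarm(0)     # cancel the pending alarm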
When you revoke a celery task, you can provide it with an optional terminate=True keyword.
task.revoke(terminate=True)
It doesn't exactly fit your requirements since it's not done by the process itself, but you should be able to either extend the task class so it can kill itself, or have a recurring cleanup task or process killing off tasks that have not completed on time.
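A rough sketch of such a cleanup helper (kill_if_overdue and the bookkeeping of task ids and start times are made up; you would record those yourself):
import time
from celery.result import AsyncResult

def kill_if_overdue(task_id, started_at, limit=60):
    # revoke and terminate a task we recorded as started more than `limit` seconds ago
    if time.time() - started_at > limit:
        AsyncResult(task_id).revoke(terminate=True)  # sends SIGTERM to the worker child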