Preventing management commands from running more than one at a time - python

I'm designing a long running process, triggered by a Django management command, that needs to run on a fairly frequent basis. This process is supposed to run every 5 min via a cron job, but I want to prevent it from running a second instance of the process in the rare case that the first takes longer than 5 min.
I've thought about creating a touch file that gets created when the management process starts and is removed when the process ends. A second management command process would then check to make sure the touch file didn't exist before running. But that seems like a problem if a process dies abruptly without properly removing the touch file. It seems like there's got to be a better way to do that check.
Does anyone know any good tools or patterns to help solve this type of issue?

For this reason I prefer to have a long-running process that gets its work off of a shared queue. By long-running I mean that its lifetime is longer than a single unit of work. The process is then controlled by some daemon service such as supervisord which can take over control of restarting the process when it crashes. This delegates the work appropriately to something that knows how to manage process lifecycles and frees you from having to worry about the nitty gritty of posix processes in the scope of your script.
If you have a queue, you also have the luxury of being able to spin up multiple processes that can each take jobs off of the queue and process them, but that sounds like it's out of scope of your problem.
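As a rough sketch of that shape (assuming redis-py and a Redis list named "jobs" as the shared queue; the queue name and the job handling are placeholders):

    # worker.py -- minimal long-running worker sketch.
    # Assumes jobs are pushed onto the Redis list "jobs" as JSON strings.
    import json
    import redis

    r = redis.Redis()

    def handle(job):
        # Hypothetical placeholder for the actual unit of work.
        print("processing", job)

    while True:
        # BLPOP blocks until an item is available, so the worker idles cheaply between jobs.
        _, raw = r.blpop("jobs")
        handle(json.loads(raw))

Under supervisord you would register this script as a program with autorestart enabled, so a crash simply means the worker is relaunched and carries on from wherever the queue is.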

Related

How to enforce only one running instance of a process in python Django framework?

I have a Django management command that should be called whenever an input file is received, but this command is not safe for parallel calls. So an input file should be processed only when no other file is being processed.
One solution that I have is to use a lock file. Basically, create a lock file at the start of the process and delete it at the end.
I'm worried that if the process crashes the lock file won't be deleted and consequently none of the other files would be processed until we manually remove that lock file.
The solution doesn't need to be specific for Django or even python, but what is the best practice to enforce that only one instance of this process is being run?
As KlausD mentions in his comment, the canonical (and language-agnostic) solution is to use a lock file containing the pid of the running process, so the code responsible for the lock acquisition can check if the process is still running.
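A minimal sketch of that idea (the lock path is arbitrary, and the liveness check only verifies that some process with that pid still exists; see the edit below about pid reuse):

    import os

    LOCKFILE = "/tmp/myjob.pid"  # arbitrary path, adjust to your setup

    def already_running():
        try:
            with open(LOCKFILE) as f:
                pid = int(f.read().strip())
        except (FileNotFoundError, ValueError):
            return False
        try:
            os.kill(pid, 0)  # signal 0 sends nothing, it only checks the process exists
            return True
        except ProcessLookupError:
            return False  # stale lock left behind by a crashed run

    if already_running():
        raise SystemExit("another instance is running")
    with open(LOCKFILE, "w") as f:
        f.write(str(os.getpid()))
    try:
        pass  # ... do the actual work here ...
    finally:
        os.remove(LOCKFILE)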
An alternative solution, if you use redis in your project, is to store the lock in redis with a TTL that's a bit longer than the worst-case runtime of the task. This makes sure the lock will eventually be freed no matter what, and it also lets you easily share the lock between multiple servers if needed.
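A sketch of the Redis variant, using SET with NX and EX (the key name and the 900-second TTL are placeholders; the TTL should exceed your worst-case runtime):

    import redis

    r = redis.Redis()

    # NX: only set the key if it does not already exist; EX: expire it after N seconds.
    if not r.set("myjob:lock", "1", nx=True, ex=900):
        raise SystemExit("another instance holds the lock")
    try:
        pass  # ... do the actual work here ...
    finally:
        r.delete("myjob:lock")  # release early on success; the TTL covers crashes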
EDIT:
is it possible that the process crashes and another process pick up the same pid?
Yes, of course, and it's even rather likely (to put it mildly) on a server running for months without a reboot, and even more so if the server runs a lot of short-lived processes. You will not only have to check whether there's a running process matching this pid, but also get the process stats to inspect the start time, the command line, the parent, etc., and decide how likely it is to be the same process or a new one.
Note that this is nothing new - most process monitoring tools face the same problem, so you may want to check how they solved it (gunicorn might be a good starting point here).
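With psutil, for instance, you could record the process creation time next to the pid when taking the lock and compare both when validating it (a rough sketch; what you compare and with what tolerance is up to you):

    import psutil

    def lock_is_stale(pid, recorded_create_time):
        try:
            proc = psutil.Process(pid)
        except psutil.NoSuchProcess:
            return True  # no such pid anymore: the lock is definitely stale
        # A different creation time means the pid has been recycled by another process.
        return abs(proc.create_time() - recorded_create_time) > 1.0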

Which is better to send notifications, a python loop/timer, or as a cron job?

I've built a notification system with a Raspberry Pi which checks a database every two minutes and, if any new entries are found, sends out emails. I have it working two ways:
1. A Python script starts at boot and runs forever, with a timer built into the loop. Every two minutes, the DB is checked and emails are sent.
2. A Python script is set to check the DB and send emails, and a cron job runs this script every two minutes.
Which would be the better choice, and why?
Your first option, even if you use a sleep, implements a kind of busy-waiting strategy (https://en.wikipedia.org/wiki/Busy_waiting). This strategy uses more CPU/memory than your second option (the cron approach), because your process keeps its footprint in memory even while it is doing nothing. With the cron approach, on the other hand, your process only exists while it is doing useful work.
Imagine implementing this kind of approach for many programs running on your machine: a lot of memory would be consumed by processes in waiting states, and it would also have an impact (memory/CPU usage) on your OS's scheduling algorithm, since it would have more processes in the queue to manage.
Therefore, I would absolutely recommend the cron/scheduling approach. In any case, your cron daemon will be running in the background whether you add the entry to the crontab or not, so why not add it?
Last but not least, imagine your busy-waiting process is killed for any reason: with the first option you would need to restart it manually, and you might lose a couple of monitoring entries.
Hope it helps you.

Find reason for sleeping python process

I have a unittest that does a bunch of stuff in several different threads. When I stop everything in the tearDown method, somehow something is still running. And by running I mean sleeping. I ran the top command on the python process (Ubuntu 12.04), which told me that the process was sleeping.
Now I have tried using pdb to figure out what is going on, e.g. by putting set_trace() at the end of tearDown. But that tells me nothing. I suspect this is because some other thread has started sleeping earlier and is therefore not accessed anymore at this point.
Is there any tool or method I can use to track down the cause of my non-stopping process?
EDIT
Using ps -Tp <#Process> -o wchan I now know that 4 threads are still running, of which three waiting on futex_wait_queue_me and one on unix_stream_data_wait. Since I had a subprocess previously, which I killed with os.kill(pid, signal.SIGKILL), I suspect that the Pipe connection is somehow still waiting for that process. Perhaps the fast mutexes are waiting for that as well.
Is there anyway I could further reduce the search space?
If you are working under Linux then you should be able to use 'ps -eLf' to get a list of all active processes and threads. Assuming you have given your threads good names at creation, it should be easy to see what is still running.
I believe under Windows you can get a tool to do something similar - see http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx
N.B. I have not used the Windows tool myself.
Also, from within Python you can use the psutil package (https://pypi.python.org/pypi/psutil/) to get similar information.
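For pure-Python threads you can also combine threading.enumerate() with sys._current_frames() to print where each surviving thread is currently blocked (a debugging sketch you could call from tearDown when the process refuses to exit):

    import sys
    import threading
    import traceback

    def dump_threads():
        frames = sys._current_frames()  # maps OS thread id -> current stack frame
        for t in threading.enumerate():
            print("Thread %s (daemon=%s, ident=%s):" % (t.name, t.daemon, t.ident))
            frame = frames.get(t.ident)
            if frame is not None:
                traceback.print_stack(frame)

psutil.Process().threads() gives the equivalent OS-level view (thread ids and CPU times) for the current process.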

Python Celery task to restart celery worker

In celery, is there a simple way to create a (series of) task(s) that I could use to automagically restart a worker?
The goal is to have my deployment automagically restart all the child celery workers every time it gets a new source from github. So I could then send out a restartWorkers() task to my management celery instance on that machine that would kill (actually stopwait) all the celery worker processes on that machine, and restart them with the new modules.
The plan is for each machine to have:
Management node [Queues: Management, machine-specific] - Responsible for managing the rest of the workers on the machine, bringing up new nodes and killing old ones as necessary
Worker nodes [Queues: git revision specific, worker specific, machine specific] - Actually responsible for doing the work.
It looks like the code I need is somewhere in dist_packages/celery/bin/celeryd_multi.py, but the source is rather opaque for starting workers, and I can't tell how it's supposed to work or where it's actually starting the nodes. (It looks like shutdown_nodes is the correct code to be calling for killing the processes, and I'm slowly debugging my way through it to figure out what my arguments should be)
Is there a function/functions restart_nodes(self, nodes) somewhere that I could call or am I going to be running shell scripts from within python?
Also, is there a simpler way to reload the source into Python than killing and restarting the processes? If I knew that reloading the module actually worked (experiments say it doesn't; changes to functions do not percolate until I restart the process), I'd just do that instead of the indirection with management nodes.
EDIT:
I can now shut down, thanks to broadcast (thank you mihael; if I had more rep, I'd upvote). Any way to broadcast a restart? There's pool_restart, but that doesn't kill the node, which means it won't update the source.
I've been looking into some of the behind-the-scenes source in celery.bin.celeryd:WorkerCommand().run(), but there's some weird stuff going on before and after the run call, so I can't just call that function and be done, because it crashes. It makes zero sense to call a shell command from a Python script to run another Python script, and I can't believe I'm the first one to want to do this.
You can try to use the broadcast functionality of Celery.
Here you can see some good examples: https://github.com/mher/flower/blob/master/flower/api/control.py
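A rough sketch of what a broadcast shutdown can look like (the broker URL and node names are placeholders, and this assumes something like supervisord is responsible for bringing the workers back up against the new code):

    from celery import Celery

    # Placeholder broker URL -- use your project's existing Celery app instead.
    app = Celery("proj", broker="amqp://guest@localhost//")

    # Ask the workers on a given machine to stop once their current tasks finish.
    app.control.broadcast(
        "shutdown",
        destination=["worker1@myhost", "worker2@myhost"],  # hypothetical node names
    )

The workers exit after processing the shutdown command; whatever supervises them then restarts them with the updated source, which is how the "restart" part effectively happens.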

Tasks queue process in python

The task is:
I have a task queue stored in a DB, and it grows. I need to process the tasks with a Python script whenever I have the resources for it. I see two ways:
1. A Python script that runs all the time. But I don't like that (a possible memory leak is the reason).
2. A Python script called by cron that does a small part of the work each run. But then I need to ensure only one active script is running at a time (to keep the number of active scripts from growing).
What is the best solution to implement this in Python? Any ideas on how to solve this problem at all?
You can use a lockfile to prevent multiple instances of the script from running out of cron. See the answers to an earlier question, "Python: module for creating PID-based lockfile". This is really just good practice in general for anything that needs to be guaranteed not to have multiple instances running, so you should look into it even if you do have the script running constantly, which is what I suggest.
For most things, it shouldn't be too hard to avoid memory leaks, but if you're having a lot of trouble with it (I sometimes do with complex third-party web frameworks, for example), I would suggest instead writing the script with a small, carefully-designed main loop that monitors the database for new jobs, and then uses the multiprocessing module to fork off new processes to complete each task.
When a task is complete, the child process can exit, immediately freeing any memory that isn't properly garbage collected, and the main loop should be simple enough that you can avoid any memory leaks.
This also offers the advantage that you can run multiple tasks in parallel if your system has more than one CPU core, or if your tasks spend a lot of time waiting for I/O.
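A sketch of that loop shape (fetch_next_job and run_job are hypothetical stand-ins for your DB polling and task code):

    import time
    from multiprocessing import Process

    def fetch_next_job():
        """Hypothetical placeholder: return one pending job from the DB, or None."""
        return None

    def run_job(job):
        """Hypothetical placeholder: do the actual work.

        Because it runs in a child process, all of its memory is returned
        to the OS when the process exits.
        """

    def main_loop():
        while True:
            job = fetch_next_job()
            if job is None:
                time.sleep(10)  # nothing pending, poll again shortly
                continue
            p = Process(target=run_job, args=(job,))
            p.start()
            p.join()  # or keep several Process objects around to run tasks in parallel

    if __name__ == "__main__":
        main_loop()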
This is a bit of a vague question. One thing you should remember is that it is very difficult to leak memory in Python because of the automatic garbage collection. Running the script from cron to handle the queue isn't very nice, although it would work fine.
I would use method 1; if you need more power you could make a small Python process that monitors the DB queue and starts new processes to handle the tasks.
I'd suggest using Celery, an asynchronous task queuing system which I use myself.
It may seem a bit heavy for your use case, but it makes it easy to expand later by adding more worker resources if/when needed.
