I have APScheduler running in Django and it appears to work ... okay. In my project's __init__.py, I initialize the scheduler:
import atexit
from apscheduler.scheduler import Scheduler  # APScheduler 2.x import path
from django.conf import settings

scheduler = Scheduler(daemon=True)
print("\n\n\n\n\n\n\n\nstarting scheduler")
scheduler.configure({'apscheduler.jobstores.file.class': settings.APSCHEDULER['jobstores.file.class']})
scheduler.start()
# Shut the scheduler down cleanly when the process exits
atexit.register(lambda: scheduler.shutdown(wait=False))
The first problem is that the print statement shows this code is executed twice. Secondly, I'd like to reference the scheduler from other applications, but I have no idea how to do that. If I create another scheduler instance, I believe it gets its own thread pool rather than sharing the one created here.
How do I get one and only one instance of APScheduler running?
How do I reference that instance in other apps?
That depends on how you ended up with two scheduler instances in the first place. Are you starting APScheduler in a worker thread/process? If you have more than one such worker, you're going to get multiple scheduler instances. You have to prevent the scheduler from being started more than once, either by running it in a separate process if possible, or by adding some condition to the scheduler startup.
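If the double execution comes from Django's development server, whose autoreloader imports the project twice, one common guard is to check the RUN_MAIN environment variable before starting. A minimal sketch, assuming that is indeed the cause (it only applies under the autoreloading dev server):

import os
from apscheduler.scheduler import Scheduler  # APScheduler 2.x

scheduler = None

def start_scheduler():
    global scheduler
    # The autoreloader sets RUN_MAIN to 'true' only in the child process
    # that actually serves requests, so start the scheduler only there.
    if os.environ.get('RUN_MAIN') != 'true':
        return
    if scheduler is None:
        scheduler = Scheduler(daemon=True)
        scheduler.start()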
You don't. Variables are local to each process. The best you can do is to build some kind of remote execution system, either using some kind of a ReST service or some remote control system like execnet or rpyc.
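As a rough illustration of the second option, here is a minimal rpyc sketch; the service name, port, and exposed method are all hypothetical. The idea is that the one process owning the scheduler serves requests, and other processes connect to it instead of creating their own scheduler:

import rpyc
from rpyc.utils.server import ThreadedServer

class SchedulerService(rpyc.Service):
    # Methods prefixed with 'exposed_' are callable by rpyc clients.
    def exposed_get_job_names(self):
        return [str(job) for job in scheduler.get_jobs()]

if __name__ == '__main__':
    # Run this in the same process that owns the scheduler instance
    ThreadedServer(SchedulerService, port=18861).start()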
I'm writing an app for the Raspberry Pi. The app has to run periodic tasks and is also connected to a main server over socket.io to receive commands. I chose APScheduler for the periodic tasks because it lets me control task intervals dynamically. I used socketIO_client to get cron statements from the server and apply them to running tasks. Up to this point it works like a charm. Yet I need some more functionality.
Between periodic task runs, I want to run tasks triggered by socket.io server events. On this site I found a similar problem in this question and applied the answer. Normally APScheduler is smart enough not to start a task before the previous run has finished, by setting coalesce=True and/or max_instances=1. But with the job.func() method, the job starts even though the previous run hasn't finished yet.
Basically, what I want is to run a function periodically and also be able to run it between intervals on server events. If the job has started, whether by cron or by a server event, any new trigger should be skipped until it finishes. Is there any way to do that?
Sorry, that is not currently possible natively with APScheduler. You'll have to create two jobs and share a lock object or something among them that will make sure they don't run simultaneously.
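A minimal sketch of that idea: both the cron job and the event-triggered job call the same guarded wrapper, and a run is simply skipped when the lock is already held (do_work() is a placeholder for the actual task body):

import threading

run_lock = threading.Lock()

def guarded_task():
    # Skip this run entirely if another invocation is still in progress
    if not run_lock.acquire(blocking=False):
        return
    try:
        do_work()  # placeholder for the real task
    finally:
        run_lock.release()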
I'm using APScheduler with Python 2.7.6, with a BlockingScheduler for my scheduled jobs and SQLAlchemy as the persistent job store.
I want to schedule jobs and guarantee that they finish (i.e. the function reaches its last line). Everything works fine, but I see that when a job starts, it is removed from the database, even if the job has not finished the entire method.
Note: Obviously, I developed jobs that do not have state and can be re-executed on subsequent program runs. That should not be an issue for this question.
What is the best way to persist a job until the complete function/method is executed using APScheduler?
I had a similar problem and was able to resolve it by using BackgroundScheduler instead of BlockingScheduler.
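For reference, a minimal sketch of that setup under APScheduler 3.x (the database URL and do_work() are placeholders):

from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore

scheduler = BackgroundScheduler(
    # Jobs are persisted here and survive process restarts
    jobstores={'default': SQLAlchemyJobStore(url='sqlite:///jobs.sqlite')}
)
scheduler.start()
scheduler.add_job(do_work, 'interval', minutes=5,
                  id='my_job', replace_existing=True)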
Is there any Celery functionality or preferred way of executing periodic background tasks locally when using a single worker? Sort of like a background thread, but scheduled and handled by Celery?
celery.beat doesn't seem suitable, as its tasks appear to go to any available consumer (so they could run on any server). That's the kind of scheduling I'm after, but I need a task that always runs locally on each server running this worker (the task does some cleanup and stats relating to the main task the worker handles).
I may be going about this the wrong way, but I'm confined to implementing this within a celery worker daemon.
You could use a custom remote control command and use the broadcast function on a cron to run cleanup or whatever else might be required.
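A sketch of that approach on Celery 4+; the command name and local_cleanup() are hypothetical. The control_command decorator registers a remote control command that executes inside every worker that receives the broadcast:

from celery.worker.control import control_command

@control_command()
def run_local_cleanup(state):
    # Executed inside each worker process that receives the broadcast
    local_cleanup()  # placeholder for the actual cleanup/stats logic
    return {'ok': 'cleanup done'}

From a cron job (or a beat schedule entry) you would then fan it out with app.control.broadcast('run_local_cleanup').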
One possible method I thought of, though not ideal, is to patch the celery.worker.heartbeat Heart() class.
Since we already use heartbeats, the class allows a simple modification: add another self.timer.call_repeatedly() entry to its start() method, or add an extra self.eventer.on_enabled.add() entry in __init__ that references a new method which also uses self.timer.call_repeatedly() to perform the periodic task.
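A rough, untested monkey-patch sketch of that idea; it relies on internal Celery APIs that may change between versions, and local_cleanup() is a placeholder:

from celery.worker import heartbeat

_original_start = heartbeat.Heart.start

def patched_start(self):
    _original_start(self)
    # Reuse the worker's internal timer to run our own local periodic task
    self.timer.call_repeatedly(60.0, local_cleanup)

heartbeat.Heart.start = patched_start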
I'm trying to figure out the best way to keep a ZeroMQ listener running forever in my Django app.
I'm setting up a zmq server app in my Django project that acts as an internal API for other applications on our network (no need to go through HTTP/requests since these apps are internal). I want the zmq listener inside my Django project to always be alive.
I want the zmq listener in my Django project so I have access to all of the project's models (for querying) and other Django context.
I'm currently thinking:
Set up a Django management command that runs the listener and keeps it alive forever (i.e. an infinite loop inside the zmq listener code), or
use a Celery worker to keep the zmq listener alive. But I'm not exactly sure how to get a Celery worker to restart a task only if it's not already running; all the Celery docs are about frequency/delayed execution. Or maybe I should let Celery purge the task at a given interval and restart it anyway.
Any tips, advice on performance implications or alternate approaches?
Setting up a management command is a fine way to do this, especially if you're running on your own hardware.
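A minimal sketch of such a command; the file location, socket type, and endpoint are placeholders. Because it runs inside the project, it has full access to Django's models:

# myapp/management/commands/zmq_listener.py  (hypothetical location)
import zmq
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = 'Run the ZeroMQ listener forever'

    def handle(self, *args, **options):
        context = zmq.Context()
        socket = context.socket(zmq.REP)
        socket.bind('tcp://*:5555')  # placeholder endpoint
        while True:
            request = socket.recv_json()
            # ... query Django models and build a reply here ...
            socket.send_json({'status': 'ok'})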
If you're running in a cloud, where a machine may disappear along with your process, then the latter is a better option. This is how I've done it:
Set up a periodic task that runs every N seconds (you need celerybeat running somewhere).
When the task spawns, it first checks a shared network resource (redis, zookeeper, or a db) to see if another process has an active/valid lease. If one exists, abort.
If there's no valid lease, obtain your lease (beware of concurrency here!) and start your infinite loop, making sure you periodically renew the lease (see the sketch after this list).
Add instrumentation so that you know who and where the process is running.
Start celery workers on multiple boxes, consuming from the same queue your periodic task is designated for.
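A sketch of the first three steps using Redis for the lease; the key name, TTL, and handle_one_message() are all placeholders:

import socket

import redis
from celery import shared_task

r = redis.Redis()
LEASE_KEY = 'zmq-listener-lease'  # placeholder key
LEASE_TTL = 30                    # seconds

@shared_task
def ensure_listener():
    # SET ... NX EX is atomic: it succeeds only if no valid lease exists
    if not r.set(LEASE_KEY, socket.gethostname(), nx=True, ex=LEASE_TTL):
        return  # another worker holds an active lease; abort
    while True:
        handle_one_message()            # placeholder zmq work
        r.expire(LEASE_KEY, LEASE_TTL)  # renew the lease each iteration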
The second solution is more complex and harder to get right; so if you can, a singleton is great and consider using something like supervisord to ensure the process gets restarted if it faults for some reason.
In celery, is there a simple way to create a (series of) task(s) that I could use to automagically restart a worker?
The goal is to have my deployment automagically restart all the child celery workers every time it gets a new source from github. So I could then send out a restartWorkers() task to my management celery instance on that machine that would kill (actually stopwait) all the celery worker processes on that machine, and restart them with the new modules.
The plan is for each machine to have:
Management node [Queues: Management, machine-specific] - Responsible for managing the rest of the workers on the machine, bringing up new nodes and killing old ones as necessary
Worker nodes [Queues: git revision specific, worker specific, machine specific] - Actually responsible for doing the work.
It looks like the code I need is somewhere in dist-packages/celery/bin/celeryd_multi.py, but the source is rather opaque about starting workers, and I can't tell how it's supposed to work or where it actually starts the nodes. (It looks like shutdown_nodes is the correct code to call for killing the processes, and I'm slowly debugging my way through it to figure out what my arguments should be.)
Is there a function restart_nodes(self, nodes) somewhere that I could call, or am I going to be running shell scripts from within Python?
Also, is there a simpler way to reload the source into Python than killing and restarting the processes? If I knew that reloading the module actually worked (experiments say it doesn't: changes to functions do not percolate until I restart the process), I'd just do that instead of the indirection with management nodes.
EDIT:
I can now shut down, thanks to broadcast (thank you, mihael; if I had more rep, I'd upvote). Is there any way to broadcast a restart? There's pool_restart, but that doesn't kill the node, which means it won't update the source.
I've been looking into some of the behind-the-scenes source in celery.bin.celeryd:WorkerCommand().run(), but there's some weird stuff going on before and after the run call, so I can't just call that function and be done with it because it crashes. It makes no sense to call a shell command from a Python script to run another Python script, and I can't believe I'm the first one who wants to do this.
You can try to use broadcast functionality of Celery.
Here you can see some good examples: https://github.com/mher/flower/blob/master/flower/api/control.py
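For the restart goal specifically, one sketch (the broker URL and worker name are placeholders): broadcast the built-in shutdown command and let a process supervisor such as supervisord bring the workers back up with the freshly deployed source:

from celery import Celery

app = Celery(broker='amqp://guest@localhost//')  # placeholder broker URL

# Politely stop the targeted workers; each finishes its current task first.
app.control.broadcast('shutdown', destination=['worker1@myhost'])

# With no destination, the command goes to every worker on the broker:
# app.control.broadcast('shutdown')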