APScheduler using cron and instant triggers together - python

I'm writing an app for the Raspberry Pi. The app has to run periodic tasks and is also connected to a main server over socket.io to receive commands from it. I chose APScheduler for the periodic tasks because it lets me change task intervals dynamically. I use socketIO_client to receive cron statements from the server and apply them to the running tasks. Up to this point it works like a charm, but I need some more functionality.
Between periodic runs, I want to run tasks in response to socket.io server events. I found a similar problem in another question on this site and applied its answer. Normally APScheduler is smart enough not to start a task before the previous run has finished, if you set coalesce to True and/or max_instances to 1. But with the job.func() method, the job starts even though the previous run hasn't finished yet.
Basically, I want to run a function periodically and also be able to run it between intervals on server events. If the job has been started by either cron or a server event, any new run should be skipped until it finishes. Is there any way to do that?

Sorry, that is not currently possible natively with APScheduler. You'll have to create two jobs and share a lock object or something among them that will make sure they don't run simultaneously.

Related

Manual job execution outside of schedule in APScheduler

I have a job which is scheduled using the cron scheduler in APScheduler to run a function of some sort for my Flask app.
I'd like to also be able to manually run this function, without interrupting the schedule that I also have set up.
For example, say the task is set to run once per day, I'd also like to run it manually whenever a user does a particular thing.
It is important that two instances of the job not be run at the same time (which is why I'm not simply calling the function itself) - so I'm trying to come up with a solution using APScheduler to prevent a scenario where the manual trigger is performed while the scheduled run is busy.
This is effectively a duplicate of this question: APScheduler how to trigger job now
Lars Blumberg's answer was the one that solved it for me. I used this line:
scheduler_object.get_job(job_id="my_job_id").modify(next_run_time=datetime.datetime.now())
This ensures that the particular job will run immediately, and maintain the previous schedule. If the scheduled job is already running, this will not trigger the job now (desired behaviour for me)...unless you have set max_instances to more than 1. Similarly, if you manually execute the job and it is running when the scheduled run is triggered, it will also not execute unless max_instances is greater than 1.
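The same call can be wrapped in a small helper that also copes with an unknown job id. This is only a sketch: trigger_now is a hypothetical name, and it assumes a scheduler object exposing get_job() and a job exposing modify(), as in the line quoted above.

```python
from datetime import datetime


def trigger_now(scheduler, job_id):
    """Ask the scheduler to run an existing job as soon as possible.

    Returns True if the job was found and rescheduled, False otherwise.
    """
    job = scheduler.get_job(job_id)
    if job is None:  # no job with this id is registered
        return False
    # Only the next run time changes; the job's own trigger is untouched,
    # so the regular schedule continues afterwards.
    job.modify(next_run_time=datetime.now())
    return True
```

A caller such as a Flask view could then use trigger_now(scheduler_object, "my_job_id") and report to the user whether the job was found.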

pika connection times out during execution of long task (3+ minutes)

I have a process in which I need to assign long running tasks amongst a pool of workers, in python. So far I have been using RabbitMQ to queue the tasks (input is a nodejs frontend); a python worker subscribes to the queue, obtains a task and executes it. Each task takes several minutes minimum.
After an update this process started breaking, and I eventually discovered this was due to RabbitMQ version 3.6.10 having changed the way it handles timeouts. I now believe I need to rethink my method of assigning tasks, but I want to make sure I do it the right way.
Until now I only had one worker (the task is to control a sequence of actions in a VM - I couldn't afford a new Windows license for a while, so until recently I had no practical way of testing parallel task execution); I suspect if I'd had two before I would have noticed this sooner. The worker attaches to a VM using libvirt to control it. The way my code is written currently implies that I would run one instance of the script per VM that I wish to control.
I suspect that part of my problem is the use of BlockingConnection - I think I need a way for the worker to disconnect from the queue when it has received and validated a task (this part takes less than 1 sec), then reconnect once it has completed the actions, but I haven't figured out how to do this yet. Is this correct? If so, how should I do this, and if not, what should I do instead?
One other idea I've had is that instead of running a script per VM I could have a global control script that on receiving a task would spin off a thread which would handle the task. This would solve the problem of the connection timing out during task execution, but the timeout would just have moved to a different stage: I would potentially receive tasks while there were no idle VMs, and I would have to come up with a way to make the script await an available VM without breaking the RabbitMQ connection.
My current code can be seen here:
https://github.com/scherma/antfarm/blob/master/src/runmanager/runmanager.py#L342
Any thoughts folks?

How to persist scheduled jobs using APScheduler until they finish completely?

I'm using APScheduler with Python 2.7.6. I'm using BlockingScheduler to store scheduled jobs and SQLAlchemy as persistent database.
I want to schedule jobs and guarantee that they finish (the function reaches its last line). Everything is working fine, but I see that when a job is started, it's removed from the database, even if the job did not finish the entire method.
Note: Obviously, I developed jobs that have no state and can be re-executed in later program executions. This should not be an issue for this question.
What is the best way to persist a job until the complete function/method is executed using APScheduler?
I had a similar problem, and was able to resolve it by using BackgroundScheduler instead of BlockingScheduler.

Making a zmq server run forever in Django?

I'm trying to figure out the best way to keep a ZeroMQ listener running forever in my Django app.
I'm setting up a zmq server app in my Django project that acts as internal API to other applications in our network (no need to go through http/requests stuff since these apps are internal). I want the zmq listener inside of my django project to always be alive.
I want the zmq listener in my Django project so I have access to all of the projects models (for querying) and other django context things.
I'm currently thinking of either:
1. Setting up a Django management command that will run the listener and keep it alive forever (aka an infinite loop inside the zmq listener code), or
2. Using a celery worker to always keep the zmq listener alive. But I'm not exactly sure how to get a celery worker to restart a task only if it's not running. All the celery docs are about frequency/delayed running. Or maybe I should let celery purge the task at a given interval and restart it anyway.
Any tips, advice on performance implications or alternate approaches?
Setting up a management command is a fine way to do this, especially if you're running on your own hardware.
If you're running in a cloud, where a machine may disappear along with your process, then the latter is a better option. This is how I've done it:
1. Set up a periodic task that runs every N seconds (you need celerybeat running somewhere).
2. When the task spawns, it first checks a shared network resource (redis, zookeeper, or a db) to see if another process has an active/valid lease. If one exists, abort.
3. If there's no valid lease, obtain your lease (beware of concurrency here!) and start your infinite loop, making sure you periodically renew the lease.
4. Add instrumentation so that you know who is running the process and where.
5. Start celery workers on multiple boxes, consuming from the same queue your periodic task is designated for.
The second solution is more complex and harder to get right, so if you can, run a singleton and consider using something like supervisord to ensure the process gets restarted if it fails for some reason.
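The lease check and renewal described above can be sketched as follows. This is an illustration only: LeaseStore is a hypothetical in-memory stand-in for the shared resource (redis, zookeeper, or a db), listener_task and handle_once are made-up names, and the check-then-set here is not atomic across processes, so a real store would need an atomic operation for the acquire step.

```python
import time


class LeaseStore:
    """In-memory stand-in for a shared store such as redis or a database."""

    def __init__(self):
        self._leases = {}  # key -> (owner, expiry timestamp)

    def acquire(self, key, owner, ttl, now=None):
        """Take or renew the lease; True if `owner` holds it afterwards."""
        now = time.time() if now is None else now
        holder = self._leases.get(key)
        if holder is not None:
            current_owner, expires_at = holder
            # Someone else holds a lease that has not expired yet: abort.
            if current_owner != owner and expires_at > now:
                return False
        self._leases[key] = (owner, now + ttl)
        return True


def listener_task(store, owner, handle_once, ttl=30, max_iterations=None):
    """Body of the periodic task: loop only while holding the lease."""
    if not store.acquire("zmq-listener", owner, ttl):
        return 0  # another worker is already listening
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        handle_once()  # e.g. receive and process one message
        iterations += 1
        # Renew the lease after each unit of work; stop if it was lost.
        if not store.acquire("zmq-listener", owner, ttl):
            break
    return iterations
```

The max_iterations parameter exists only so the loop can be exercised in a test; in the periodic task it would be left as None so the loop runs until the lease is lost or the process dies.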

Where to put message queue consumer in Django?

I'm using Carrot for a message queue in a Django project and followed the tutorial, and it works fine. But the example runs in the console, and I'm wondering how I apply this in Django. The publisher class I'm calling from one of my models in models.py, so that's OK. But I have no idea where to put the consumer class.
Since it just sits there with .wait(), I don't know at what point or where I need to instantiate it so that it's always running and listening for messages!
Thanks!
The consumer is simply a long running script in the example you cite from the tutorial. It pops a message from the queue, does something, then calls wait and essentially goes to sleep until another message comes in.
This script could just be running at the console under your account or configured as a unix daemon or a win32 service. In production, you'd want to make sure that if it dies, it can be restarted, etc (a daemon or service would be more appropriate here).
Or you could take out the wait call and run it under the windows scheduler or as a cron job. So it processes the queue every n minutes or something and exits. It really depends on your application requirements, how fast your queue is filling up, etc.
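The "process the queue and exit" variant might look roughly like this. It is a sketch that uses the standard library's queue.Queue as a stand-in for the real message queue, not Carrot's API; drain and handle are hypothetical names.

```python
import queue


def drain(q, handle):
    """Process every message currently queued, then return the count."""
    processed = 0
    while True:
        try:
            message = q.get_nowait()  # do not block when the queue is empty
        except queue.Empty:
            return processed  # nothing left: exit until the next cron run
        handle(message)
        processed += 1
```

A script run from cron or the Windows scheduler would call drain once and exit, whereas the daemon version would block waiting for the next message.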
Does that make sense or have I totally missed what you were asking?
If what you are doing is processing tasks, please check out celery: http://github.com/ask/celery/
