How to persist scheduled jobs using APScheduler until they finish completely?

I'm using APScheduler with Python 2.7.6, with a BlockingScheduler to run scheduled jobs and SQLAlchemy as the persistent job store.
I want to schedule jobs and guarantee that they finish (the function reaches its last line). Everything is working fine, except that when a job is started it is removed from the database, even if the job has not finished executing the entire method.
Note: my jobs are stateless and can safely be re-executed on later runs of the program, so that is not the issue here.
What is the best way to persist a job until the complete function/method is executed using APScheduler?

I had a similar problem and was able to resolve it by using BackgroundScheduler instead of BlockingScheduler.
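For reference, a minimal sketch of that setup, assuming APScheduler 3.x; my_task, the job id and the SQLite URL are placeholders:

from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore

def my_task():
    pass  # placeholder for the real work

jobstores = {'default': SQLAlchemyJobStore(url='sqlite:///jobs.sqlite')}
scheduler = BackgroundScheduler(jobstores=jobstores)

# Interval and cron jobs stay in the job store between runs; replace_existing
# avoids adding a duplicate job each time the program restarts.
scheduler.add_job(my_task, 'interval', minutes=10, id='my_task',
                  replace_existing=True)
scheduler.start()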

Related

How to check if celery task is already running before running it again with beat?

I have a periodic task scheduled to run every 10 minutes. Sometimes this task completes in 2-3 minutes, sometimes it takes 20 minutes.
Is there any way, using Celery Beat, to avoid starting the task if the previous run hasn't completed yet? I don't see an option for this in the interval settings.
No, Celery Beat knows nothing about the running tasks.
One way to achieve what you are trying to do is to link the task to itself. apply_async(), for example, has the optional parameters link and link_error, which can be used to provide a signature (it can be a single task, too) to run if the task finishes successfully (link) or unsuccessfully (link_error).
What I use is the following: I schedule the task to run frequently (say, every 5 minutes) and use a distributed lock to make sure I always have only one instance of the task running (a sketch of this pattern follows below).
Finally, a reminder: you can always implement your own scheduler and use it in your beat configuration. I thought about doing this in the past for exactly the same thing you want, but decided that the solution I already have is good enough for me.
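A minimal sketch of the lock pattern mentioned above, assuming Redis as the lock backend; the Redis URL, lock name, timeout and do_the_actual_work() are placeholders:

import redis
from celery import shared_task

redis_client = redis.Redis.from_url('redis://localhost:6379/0')

@shared_task
def my_periodic_task():
    # The timeout makes the lock expire eventually if a worker dies mid-run.
    lock = redis_client.lock('my_periodic_task_lock', timeout=1800)
    if not lock.acquire(blocking=False):
        return  # another instance is still running, skip this beat tick
    try:
        do_the_actual_work()  # placeholder for the real work
    finally:
        lock.release()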
You can try this
It provides you with a singleton base class for your tasks.
I use Celery with Django models and implemented a boolean has_task_running at the model level. Then, with Celery signals, I set the flag to True when the before_task_publish signal is triggered and back to False when the task terminates. Not simple, but flexible.
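A rough sketch of that flag, assuming a hypothetical Django model TaskState with a has_task_running boolean and a task named myapp.tasks.my_task:

from celery.signals import before_task_publish, task_postrun
from myapp.models import TaskState  # hypothetical model

@before_task_publish.connect
def mark_running(sender=None, **kwargs):
    # sender is the name of the task being published
    if sender == 'myapp.tasks.my_task':
        TaskState.objects.update_or_create(
            task_name=sender, defaults={'has_task_running': True})

@task_postrun.connect
def mark_finished(sender=None, task=None, **kwargs):
    # task_postrun fires when the worker finishes the task, whatever the outcome
    if task is not None and task.name == 'myapp.tasks.my_task':
        TaskState.objects.filter(task_name=task.name).update(has_task_running=False)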

Manual job execution outside of schedule in APScheduler

I have a job which is scheduled using the cron scheduler in APScheduler to run a function of some sort for my Flask app.
I'd like to also be able to manually run this function, without interrupting the schedule that I also have set up.
For example, say the task is set to run once per day, I'd also like to run it manually whenever a user does a particular thing.
It is important that two instances of the job never run at the same time (which is why I'm not simply calling the function itself), so I'm trying to come up with a solution using APScheduler that prevents the manual trigger from running while the scheduled run is busy.
This is effectively a duplicate of this question: APScheduler how to trigger job now
Lars Blumberg's answer was the one that solved it for me. I used this line:
scheduler_object.get_job(job_id="my_job_id").modify(next_run_time=datetime.datetime.now())
This makes the particular job run immediately while maintaining its previous schedule. If the scheduled job is already running, this will not trigger the job now (the desired behaviour for me)...unless you have set max_instances to more than 1. Similarly, if you manually execute the job and it is still running when the scheduled run is triggered, that run will also not execute unless max_instances is greater than 1.
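A minimal sketch of this approach, assuming APScheduler 3.x with a BackgroundScheduler; daily_task, the cron schedule and the job id are placeholders:

import datetime
from apscheduler.schedulers.background import BackgroundScheduler

def daily_task():
    pass  # placeholder for the real work

scheduler = BackgroundScheduler()
scheduler.add_job(daily_task, 'cron', hour=3, id='my_job_id', max_instances=1)
scheduler.start()

def run_now():
    # Pull the next run forward to "now"; the cron schedule itself is unchanged,
    # and max_instances=1 means this is skipped if the job is already running.
    scheduler.get_job('my_job_id').modify(next_run_time=datetime.datetime.now())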

pika connection times out during execution of long task (3+ minutes)

I have a process in which I need to distribute long-running tasks among a pool of workers, in Python. So far I have been using RabbitMQ to queue the tasks (input comes from a Node.js frontend); a Python worker subscribes to the queue, obtains a task and executes it. Each task takes several minutes minimum.
After an update this process started breaking, and I eventually discovered this was due to RabbitMQ version 3.6.10 having changed the way it handles timeouts. I now believe I need to rethink my method of assigning tasks, but I want to make sure I do it the right way.
Until now I only had one worker (the task is to control a sequence of actions in a VM - I couldn't afford a new Windows license for a while, so until recently I had no practical way of testing parallel task execution); I suspect if I'd had two before I would have noticed this sooner. The worker attaches to a VM using libvirt to control it. The way my code is written currently implies that I would run one instance of the script per VM that I wish to control.
I suspect that part of my problem is the use of BlockingConnection - I think I need a way for the worker to disconnect from the queue when it has received and validated a task (this part takes less than 1 sec), then reconnect once it has completed the actions, but I haven't figured out how to do this yet. Is this correct? If so, how should I do this, and if not, what should I do instead?
One other idea I've had is that instead of running a script per VM I could have a global control script that on receiving a task would spin off a thread which would handle the task. This would solve the problem of the connection timing out during task execution, but the timeout would just have moved to a different stage: I would potentially receive tasks while there were no idle VMs, and I would have to come up with a way to make the script await an available VM without breaking the RabbitMQ connection.
My current code can be seen here:
https://github.com/scherma/antfarm/blob/master/src/runmanager/runmanager.py#L342
Any thoughts folks?
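One common pattern for keeping a BlockingConnection alive during long work (a sketch assuming pika 1.x, not the code from the linked repository) is to run the task in a worker thread so the consuming thread keeps servicing heartbeats, then acknowledge from the connection's thread via add_callback_threadsafe; the queue name and do_long_task are placeholders:

import functools
import threading
import pika

def do_long_task(body):
    pass  # placeholder for the multi-minute VM work

def ack_message(channel, delivery_tag):
    # Only ack if the channel is still open; otherwise the delivery is requeued.
    if channel.is_open:
        channel.basic_ack(delivery_tag)

def on_message(channel, method, properties, body, connection):
    def work():
        do_long_task(body)
        # basic_ack is not thread-safe, so hand it back to the connection's thread.
        connection.add_callback_threadsafe(
            functools.partial(ack_message, channel, method.delivery_tag))
    threading.Thread(target=work, daemon=True).start()

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.basic_qos(prefetch_count=1)
channel.basic_consume(queue='tasks',
                      on_message_callback=functools.partial(on_message, connection=connection))
channel.start_consuming()  # heartbeats keep flowing while the worker thread runs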

APScheduler using cron and instant triggers together

I'm writing an app for the Raspberry Pi. The app has to run periodic tasks and is also connected to the main server over socket.io to get commands from it. I chose APScheduler to run the periodic tasks because it gives me the ability to control task intervals dynamically. I use socketIO_client to get cron statements from the server and apply them to the running tasks. Up to this point it works like a charm. Yet I need some more functionality.
Between periodic task runs, I want to run tasks in response to socket.io server events. On this site I found a similar problem in this question and applied the answer. Normally APScheduler is smart enough not to run a task before the previous one has finished, by setting coalesce=True and/or max_instances=1. But with the job.func() method, the job starts even though the previous one hasn't finished yet.
Basically, what I want is to run a function periodically and also be able to run it between intervals in response to server events. If the job has been started, either by cron or by a server event, new runs should be skipped until it finishes. Is there any way to do that?
Sorry, that is not currently possible natively with APScheduler. You'll have to create two jobs and share a lock object or something among them that will make sure they don't run simultaneously.
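A minimal sketch of that shared-lock idea, assuming APScheduler 3.x and a BackgroundScheduler; periodic_work, the cron schedule and the event handler are placeholders:

import threading
from apscheduler.schedulers.background import BackgroundScheduler

run_lock = threading.Lock()

def periodic_work():
    pass  # placeholder for the real task

def guarded_run():
    # Skip this run entirely if another instance (cron or event-driven) is active.
    if not run_lock.acquire(blocking=False):
        return
    try:
        periodic_work()
    finally:
        run_lock.release()

scheduler = BackgroundScheduler()
scheduler.add_job(guarded_run, 'cron', minute='*/15', id='periodic')
scheduler.start()

def on_server_event(*args):
    # socket.io handler: schedule a one-off run of the same guarded function.
    scheduler.add_job(guarded_run, id='manual', replace_existing=True)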

apscheduler - multiple instances

I have APScheduler running in Django and it appears to work... okay. In my project's __init__.py, I initialize the scheduler:
import atexit
from django.conf import settings
from apscheduler.scheduler import Scheduler  # assuming the legacy APScheduler 2.x class

scheduler = Scheduler(daemon=True)
print("\n\n\n\n\n\n\n\nstarting scheduler")
scheduler.configure({'apscheduler.jobstores.file.class': settings.APSCHEDULER['jobstores.file.class']})
scheduler.start()
# shut the scheduler down cleanly when the process exits
atexit.register(lambda: scheduler.shutdown(wait=False))
The first problem with this is that the print shows this code is executed twice. Secondly, in other applications, I'd like to reference the scheduler, but haven't a clue how to do that. If I get another instance of a scheduler, I believe it is a separate threadpool and not the one created here.
how do I get one and only one instance of apscheduler running?
how do I reference that instance in other apps?
That depends on how you ended up with two scheduler instances in the first place. Are you starting APScheduler in a worker thread/process? If you have more than one such worker, you're going to get multiple instances of the scheduler. So you have to find a way to prevent the scheduler from being started more than once, either by running it in a different process if possible, or by adding some condition to the scheduler startup.
You don't. Variables are local to each process. The best you can do is to build some kind of remote execution system, either using a REST service or a remote control system like execnet or rpyc.
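If the double startup comes from Django's development autoreloader (an assumption; runserver imports the project in two processes), one common guard is to start the scheduler from a single module that other apps in the same process can import. The module name and the RUN_MAIN check below are illustrative only; across separate worker processes the answers above still apply.

# myproject/scheduler.py (hypothetical module)
import os
from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()

def start():
    # Under `manage.py runserver`, RUN_MAIN is set only in the child process
    # that actually serves requests, so the reloader's parent process skips this.
    if os.environ.get('RUN_MAIN') == 'true' and not scheduler.running:
        scheduler.start()

# Other apps in the same process can then do:
# from myproject.scheduler import scheduler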
