Scheduling a method to run at specific times

I'm making an app in Python to send texts via Twilio. I'm using Flask and it's hosted on Google App Engine. I have a list of messages that need to be sent at a specific date and time, by calling my message function. What's a simple way to go about creating this? I'm relatively new to all this.
I tried apscheduler, but it only worked locally and not on App Engine. I've read about cron jobs, but can't find anything about specific dates/times or how to pass args when the job runs.

As Fabio mentioned in the comments, you could create a cron task that runs every 10 minutes (or every minute). The task would look in a folder for messages to send. If the filenames in that folder start with the date and time, you could do something like this:
folder content:
201707092205_<#message_id>
sketch for sending the message (FOLDER and twilioapi.sendmessage are placeholders for your own folder path and send function):
import os
from datetime import datetime

# matches only files stamped with the current minute, so the job has to run every minute
instant_when_the_script_is_run = datetime.now().strftime("%Y%m%d%H%M")
for name in os.listdir(FOLDER):
    if name.startswith(instant_when_the_script_is_run):
        path = os.path.join(FOLDER, name)
        with open(path) as fh:
            destination = fh.readline().strip()  # first line: the destination number
            message = fh.read()  # rest of the file: the message body
        twilioapi.sendmessage(destination, message)
        os.remove(path)  # the removal could be done by another script to leave some traces

This is where Google App Engine comes in handy. You can use App Engine cron jobs: create a cron.yaml file in your project. In this file you can define all kinds of schedules, from every day to one day a week at a particular time. The following is an example cron.yaml file:
cron:
- description: "daily summary job"
  url: /tasks/summary
  schedule: every 24 hours
- description: "monday morning mailout"
  url: /mail/weekly
  schedule: every monday 09:00
  timezone: Australia/NSW
- description: "new daily summary job"
  url: /tasks/summary
  schedule: every 24 hours
  target: beta
Cron schedules are specified using a simple English-like format.
every 12 hours
every 5 minutes from 10:00 to 14:00
every day 00:00
every monday 09:00
2nd,third mon,wed,thu of march 17:00
1st monday of sep,oct,nov 17:00
1 of jan,april,july,oct 00:00
For a fuller explanation of the schedule format, please refer to the App Engine cron documentation.

Another option is to use a taskqueue task where you specify the time that the task should be run using the eta option (estimated time of arrival).
The task will sit in the queue until its time of execution arrives, and then GAE will cause the task to be launched to do whatever processing you need such as sending a text message.
The tasks may not be executed at the precise time you specify but in my experience it is generally quite close. Certainly far more accurate than running a CRON job every 10 minutes.
This will also be much more efficient than using a CRON job because a CRON job will cause a request to your app every 10 minutes but the task will only execute when needed. If you have a low volume app this may help you stay within the free quota.
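A rough sketch of the eta approach, assuming the first-generation Python runtime where google.appengine.api.taskqueue is available. The /tasks/send_sms handler URL and the parameter names are made up for illustration; the handler behind that URL would call your own message function. As far as I know a naive datetime passed as eta is treated as UTC, so check that against your own time zone handling.

```python
from datetime import datetime


def build_task_kwargs(send_at, to_number, body):
    """Build the keyword arguments for taskqueue.add() for one scheduled text."""
    return {
        "url": "/tasks/send_sms",  # hypothetical handler that sends the text
        "params": {"to": to_number, "body": body},  # delivered to the handler as POST data
        "eta": send_at,  # earliest time at which the task should run
    }


def enqueue_text(send_at, to_number, body):
    """Queue the task on App Engine; elsewhere just return what would be queued."""
    kwargs = build_task_kwargs(send_at, to_number, body)
    try:
        from google.appengine.api import taskqueue  # only importable on App Engine
    except ImportError:
        return kwargs
    taskqueue.add(**kwargs)
    return kwargs
```

You would call enqueue_text(datetime(2017, 7, 9, 22, 5), number, text) once per message when the message list is created, instead of polling for due messages.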

Related

Python scheduled task to run every 14 days

I am currently working on a program that needs to run every 14 days. I have looked into Schedule which works fine, but I have a few doubts about how to go about this.
I will create a service which will handle the execution of the python program itself on a CentOS 7 system.
The issue here is that every 14 days I will run a function that generates a lot of email addresses and sends them to a support entity. I am afraid that if something unintended happens and the program restarts, the support entity will get spammed with emails outside the time frame in which they should receive emails.
As far as I can tell, Schedule does not have any way of determining if the program has restarted, and therefore a reboot of either the system or the service will cause this behaviour.
Would it be a correct solution to write a date to a text file after each completed function run, and then check that text file once a day to determine whether the function should run or not? This method would survive a service and/or system reboot, but is it a "correct" way of doing it?
****UPDATE**** Having the cron job run on specific days of the month (for example the 1st and 15th) is not sufficient. This could cause gaps in the data which the program processes. The script makes a call which pulls data from 14 days back, and this is the maximum number of days supported by the script (licensing and stuff, can't be changed so not that important except that it is a limitation). So it needs to run on, let's say, odd or even week numbers (to get 14 days).
Any ideas on how to accomplish this given this new information?

You should look into the use of cron (or google it yourself if you don't like the link).
I suggest creating a simple Python script that is called by cron roughly every 14 days. The crontab entry could look like one of the following:
# this will run at 00:01 on the 1st, 16th and 31st of every month (*/15 steps from day 1)
1 0 */15 * * /path/to/python/script.py
# this will run at 00:01 on the 1st and 15th of every month
1 0 1,15 * * /path/to/python/script.py
You still could make your script write some sort of result (with maybe a timestamp) to a file, so that you could easily check that file to see if it ran correctly (or log some error info).
# this will run at 00:01 on the 1st and 15th of every month
1 0 1,15 * * /path/to/python/script.py >> /path/to/logfile.log 2>&1
EDIT
You can also configure cron to run every Monday (or another day) if the 1st and 15th of every month are not sufficient. The script could then check a log file to see whether it ran the previous Monday, to ensure it only executes your business logic every 2 weeks.
# this will run at 00:01 once a week on Mondays
1 0 * * 1 /path/to/python/script.py >> /path/to/logfile.log 2>&1
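The "check whether it already ran" idea could look roughly like this. It is only a sketch: the state file path and the function names are invented, and the date of the last completed run is stored as ISO text rather than parsed out of a log. Because the date is written only after the business logic has finished, a service or system restart does not trigger an extra mailout.

```python
import os
from datetime import date, timedelta

STATE_FILE = "/path/to/last_run.txt"  # hypothetical location of the state file


def should_run(today, last_run, interval_days=14):
    """True if there was no earlier run or at least interval_days have passed."""
    return last_run is None or today - last_run >= timedelta(days=interval_days)


def read_last_run(path):
    """Return the date stored in the state file, or None if it does not exist yet."""
    if not os.path.exists(path):
        return None
    with open(path) as fh:
        return date.fromisoformat(fh.read().strip())


def write_last_run(path, day):
    """Record the day of the last completed run."""
    with open(path, "w") as fh:
        fh.write(day.isoformat())


def main(path=STATE_FILE, today=None):
    today = today or date.today()
    if not should_run(today, read_last_run(path)):
        return False
    # ... pull the last 14 days of data and send the emails here ...
    write_last_run(path, today)  # only reached when the work above succeeded
    return True
```

With the weekly Monday crontab entry calling main(), the script does its work on one Monday, exits immediately on the next, and works again on the Monday after that.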

What does the landing time mean in airflow?

There is a section called "landing time" in the DAG view on the web console of airflow.
An example screen shot taken from airbnb's blog:
But what does it mean? There is no definition in the documents or in their repository.

Since the existing answer here wasn't totally clear, and this is the top hit for "airflow landing time" I went to the chat archives and found the original answer being referenced here:
Maxime Beauchemin @mistercrunch Jun 09 2016 11:12
it's the number of hours after the time the scheduling period ended
take a schedule_interval='@daily' run for 2016-01-01 that finishes at 2016-01-02 03:52:00
landing time is 3:52
https://gitter.im/apache/incubator-airflow/archives/2016/06/09
It seems the Y axis is in hours, and the negative landing times are a result of running jobs manually so they finish hours before they "should have finished" based on the schedule.

I directly asked the author, Maxime. His answer was that landing_time is when the job completes minus when the job should have started (for Airflow, that's the end of the scheduled period).
source:
http://gitter.im/apache/incubator-airflow
It is a good place to get help, and Maxime is very nice and helpful. But the answers are not persistent.

For me it's easier to understand landing_time using an example.
So let's say we have a DAG scheduled to run daily at 0 0 * * *. This DAG has 2 tasks that execute sequentially:
first_task >> second_task
The first_task starts at 00:00:10 and finishes 5 minutes later, at 00:05:10.
The landing_time for first_task will be 5 minutes and 10 seconds.
The second_task starts execution at 00:07:00 and finishes 2 minutes later, at 00:09:00. The landing_time for second_task will be 9 minutes.
So we just subtract the time the DAG run was scheduled to start (the end of its schedule period, i.e. execution_date plus one schedule interval) from the task's end time.
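The arithmetic in this example, and in the 03:52 example quoted from the chat archive, can be written out in plain Python. This is an illustration only, not Airflow's own code: the function and argument names are made up, and it assumes the run is due at execution_date plus one schedule interval.

```python
from datetime import datetime, timedelta


def landing_time(execution_date, schedule_interval, end_date):
    """Time between the end of the schedule period and the task finishing."""
    period_end = execution_date + schedule_interval  # when the run was due to start
    return end_date - period_end


daily = timedelta(days=1)
run_for = datetime(2016, 1, 1)  # the daily run covering 2016-01-01

first_task = landing_time(run_for, daily, datetime(2016, 1, 2, 0, 5, 10))
second_task = landing_time(run_for, daily, datetime(2016, 1, 2, 0, 9, 0))
chat_example = landing_time(run_for, daily, datetime(2016, 1, 2, 3, 52, 0))
```

A manually triggered run that finishes before the period has ended gives a negative timedelta, which matches the negative values seen on the chart.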
Thanks to @Kalinde Pride for commenting and pointing me to the only source of truth, the Airflow code base.
I usually use landing_time as a metric for the performance of the whole Airflow system. For example, an increase in landing_times for the first tasks suggests that the scheduler is under heavy load or that we should adjust task parallelization (through airflow.cfg).

Landing Times: Total time spent including retries.

Create a scheduled job and run it periodically

I have a Flask web service application with some daily, weekly and monthly events. I want to store these events and calculate their start times, for example for an order with a count of two and a weekly period:
the first payment is today and the other one is next week.
I want to store the repeated times and then send a notification at each start time.
What is the best solution?

I have used Windows Task Scheduler to schedule a .bat file. The .bat file contained some short code to run the Python script.
This way the script is not idling in the background when you are not using it.
As for storing data in between runs, I would save it to a file.

Cron Job on Google App Engine to Speed up pages

I want to schedule a cron job on Google App Engine to view my 5 main pages every 10 minutes or so to keep a current instance up and running and to increase page speed for users. I understand all of the basic syntax for creating a cron job but I am curious what the python would look like for that. Do I simply need to make 5 different cron jobs and have each one fetch a URL?

To answer your specific question, such a cron.yaml could look like this:
cron:
- description: five minute run
  url: /refresh
  schedule: every 5 minutes
where /refresh is a handler you've written in your app that is then called every N minutes automatically.
E.G. myapplication.appspot.com/refresh
There's no need to refresh a specific page or more than one. Just having the handler called will keep your app alive.
But as others have noted, this is a bit much to keep an app permanently warm.
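As for what the Python behind /refresh might look like, here is a minimal sketch written as a bare WSGI callable so it does not depend on any particular framework (in Flask or webapp2 it would be an ordinary route). The only App Engine specific part is the X-Appengine-Cron header that App Engine adds to cron requests; rejecting requests without it is my own choice, not something the setup requires.

```python
def application(environ, start_response):
    """Tiny WSGI app exposing /refresh for the cron job to hit."""
    path = environ.get("PATH_INFO", "")
    # App Engine adds "X-Appengine-Cron: true" to requests issued by cron.
    from_cron = environ.get("HTTP_X_APPENGINE_CRON") == "true"

    if path == "/refresh" and from_cron:
        status, body = "200 OK", b"refreshed"
    elif path == "/refresh":
        status, body = "403 Forbidden", b"cron only"
    else:
        status, body = "404 Not Found", b"not found"

    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]
```

The handler does not need to do any work; receiving the request is what keeps an instance loaded.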

You don't have to resort to this. You can pay to have App Engine keep a certain number of frontends running constantly. They're referred to as "resident" instances.
https://developers.google.com/appengine/docs/adminconsole/instances

I don't know about App Engine but, in generic Python, all you need is urllib.urlopen() (urllib.request.urlopen() in Python 3). I'd probably just have a single script that pulls all 5 pages in order - I can't really think of a reason to make them separate.
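A single script along those lines might look like this in Python 3. The page URLs are placeholders, and the opener parameter exists only so the function can be exercised without network access.

```python
from urllib.request import urlopen

# Hypothetical list of the 5 main pages.
PAGES = [
    "https://myapplication.appspot.com/",
    "https://myapplication.appspot.com/about",
    "https://myapplication.appspot.com/products",
    "https://myapplication.appspot.com/blog",
    "https://myapplication.appspot.com/contact",
]


def fetch_all(urls, opener=urlopen, timeout=10):
    """Request each URL in order; map it to its HTTP status or the error raised."""
    results = {}
    for url in urls:
        try:
            with opener(url, timeout=timeout) as response:
                results[url] = response.status
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            results[url] = exc
    return results


if __name__ == "__main__":
    for url, outcome in fetch_all(PAGES).items():
        print(url, outcome)
```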

https://cloud.google.com/appengine/docs/standard/python/config/appref#automatic_scaling_min_instances
This seems like the proper way to solve the issue of your single low-traffic auto-scaled instance now. Basically just add this to your app.yaml:
automatic_scaling:
  min_instances: 1
...Then add the warmup handler to your app (just so you're not throwing a 400 error every time GAE attempts to warm up your app):
https://cloud.google.com/appengine/docs/standard/python3/configuring-warmup-requests
Don't waste your time with pinging, this has the exact same effect & cost.

A cron job can fetch only one URL.
I see 2 ways:
1. You can add a cron job for every page.
2. You can add one cron job that adds a task for each page.

Python or Bash commands to determine time since a cron job string would have triggered

I'm writing a Django app (although parts can be Bash) that stores the cron job strings of many other machines. It needs to calculate the amount of time since each cron job would last have triggered on its machine. Is there a Python library for converting cron-style strings to another Python-friendly scheduler format that has a function for determining when the job should last have triggered?
For example:
a machine has a cron job at "0 8 * * 1-5" (every weekday at 8am local time to that server). Assuming my Django app was in the same time zone, and the current time was 10:15 AM on a Tuesday, then my app would need to be able to calculate 2 hours and 15 minutes as the answer.

Celery is the package that's usually used with Django for job scheduling. It has a module for parsing cron specs. It might be of use.
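If pulling in a library is not an option, the calculation can also be sketched with the standard library alone by stepping backwards one minute at a time until the cron fields match. This is a simplified matcher written for illustration, not Celery's parser: it handles only numeric fields with *, lists, ranges and steps, requires all five fields to match (real cron treats day-of-month and day-of-week specially when both are restricted), and ignores time zones. The third-party croniter package offers a get_prev() method that should do the same job more completely.

```python
from datetime import datetime, timedelta

# (low, high) bounds for minute, hour, day-of-month, month, day-of-week (0 = Sunday).
FIELD_BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field, low, high):
    """Expand one cron field such as '*', '5', '1-5', '1,15' or '*/15' into a set."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        values.update(range(start, end + 1, int(step) if step else 1))
    return values


def parse_spec(spec):
    """Split a 5-field cron string into five sets of allowed values."""
    fields = spec.split()
    if len(fields) != 5:
        raise ValueError("expected 5 cron fields, got %d" % len(fields))
    return [expand_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_BOUNDS)]


def time_since_last_trigger(spec, now, max_days=366):
    """Return now minus the most recent matching minute, or None if none is found."""
    minutes, hours, days, months, weekdays = parse_spec(spec)
    moment = now.replace(second=0, microsecond=0)
    for _ in range(max_days * 24 * 60):
        cron_weekday = (moment.weekday() + 1) % 7  # Python: Monday=0, cron: Sunday=0
        if (moment.minute in minutes and moment.hour in hours
                and moment.day in days and moment.month in months
                and cron_weekday in weekdays):
            return now - moment
        moment -= timedelta(minutes=1)
    return None


# Tuesday 2017-07-11 at 10:15 with the weekday 8am job from the question.
elapsed = time_since_last_trigger("0 8 * * 1-5", datetime(2017, 7, 11, 10, 15))
```

Here elapsed is a timedelta of 2 hours and 15 minutes, the answer expected in the question.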
