What does the landing time mean in airflow? - python

There is a section called "landing time" in the DAG view on the web console of airflow.
An example screen shot taken from airbnb's blog:
But what does it mean? There is no definition in the documents or in their repository.

Since the existing answer here wasn't totally clear, and this is the top hit for "airflow landing time" I went to the chat archives and found the original answer being referenced here:
Maxime Beauchemin @mistercrunch Jun 09 2016 11:12
it's the number of hours after the time the scheduling period ended
take a schedule_interval='@daily' run for 2016-01-01 that finishes at 2016-01-02 03:52:00
landing time is 3:52
https://gitter.im/apache/incubator-airflow/archives/2016/06/09
It seems the Y axis is in hours, and the negative landing times are a result of running jobs manually so they finish hours before they "should have finished" based on the schedule.
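The arithmetic in the quoted example can be sketched in plain Python. This is only an illustration of the definition given in the chat, not Airflow's own code; the helper name landing_time and the 21:00 manual-run finish time are made up for the example.

```python
from datetime import datetime, timedelta


def landing_time(period_end: datetime, end_time: datetime) -> timedelta:
    """Time between the end of the schedule period and the task's finish."""
    return end_time - period_end


# Daily run for 2016-01-01: its schedule period ends at 2016-01-02 00:00.
period_end = datetime(2016, 1, 1) + timedelta(days=1)

# Scheduled run that finishes at 03:52 the next day: landing time is 3:52.
scheduled = landing_time(period_end, datetime(2016, 1, 2, 3, 52))
print(scheduled)

# Hypothetical manual run that finished before the period ended:
# the landing time comes out negative.
manual = landing_time(period_end, datetime(2016, 1, 1, 21, 0))
print(manual.total_seconds() / 3600)  # in hours, like the chart's Y axis
```

Dividing total_seconds() by 3600 gives the value in hours, which appears to be the unit the chart uses.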

I directly asked the author Maxime. His answer was landing_time is when the job completes minus when the job should have started (for airflow, it's the end of the scheduled period).
source:
http://gitter.im/apache/incubator-airflow
It is a good place to get help, and Maxime is very nice and helpful. But the answers there are not persistent.

For me it's easier to understand landing_time with an example.
So let's say we have a dag scheduled to run daily at 0 0 * * *. This dag has 2 tasks that execute sequentially:
first_task >> second_task
The first_task starts at 00:00:10 and finishes 5 minutes later, at 00:05:10.
The landing_time for first_task is 5 minutes and 10 seconds.
The second_task starts at 00:07:00 and finishes 2 minutes later, at 00:09:00. The landing_time for second_task is 9 minutes.
So we just subtract from the task's end_time the time the DAG run was due to start (the end of its schedule period, 00:00 here).
Thanks to @Kalinde Pride for commenting and pointing me to the only source of truth, the airflow code base.
I usually use landing_time as a metric for the performance of the whole airflow system. For example, an increase in the landing_times of the first tasks of a DAG seems to mean that the scheduler is under heavy load, or that we should adjust task parallelization (through airflow.cfg).

Landing Times: Total time spent including retries.

Related

Airflow DAG Schedule Meaning

What does the below airflow dag schedule mean?
schedule: "12 0-4,14-23 * * *"
Thanks,
cha
I want to schedule an airflow dag to run hourly, but not between midnight and 7 in the morning. Also, I want to pass more resources during the last run of the day, so I am trying to figure out how to do that in airflow. I usually schedule once a day at a certain hour. I want to understand how to schedule multiple runs per day.
It's a cron expression. There are several tools on the internet to explain a cron expression in human-readable language. For example https://crontab.guru/#12_0-4,14-23___*:
"At minute 12 past every hour from 0 through 4 and every hour from 14 through 23."
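To see which hours the expression covers without an online tool, here is a minimal sketch that expands a cron field made only of numbers, commas and ranges. The helper expand_field is hypothetical and deliberately does not handle `*`, steps or names. The second expression, "0 7-23 * * *", is my assumption of what an hourly schedule that skips midnight to 7 could look like; it is not taken from the question.

```python
from typing import List


def expand_field(field: str) -> List[int]:
    """Expand a cron field of comma-separated values and ranges, e.g. '0-4,14-23'."""
    values = set()
    for part in field.split(","):
        if "-" in part:
            start, end = part.split("-")
            values.update(range(int(start), int(end) + 1))  # ranges are inclusive
        else:
            values.add(int(part))
    return sorted(values)


# The schedule from the question: minute field, hour field, then the rest.
minute, hour, *_ = "12 0-4,14-23 * * *".split()
print(expand_field(minute))  # minutes of the hour on which the DAG runs
print(expand_field(hour))    # hours of the day on which the DAG runs

# Assumed alternative: on the hour, from 07:00 through 23:00.
alt_minute, alt_hour, *_ = "0 7-23 * * *".split()
print(expand_field(alt_hour))
```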

Airflow: Retry up to a specific time

I need to create an Airflow job that absolutely must finish before 9h.
I currently have a job that starts at 7h, with retries=8 at 15-minute intervals (8*15m=2h). Unfortunately, the job itself takes time to run, so the retries extend past 9h, which is the hard deadline, before the task finally fails.
How can I make it retry every 15 minutes but fail once it is past 9h, so that a human can take a look at the issue?
Thanks for your help
You could use the execution_timeout argument when creating the task to control how long it'll run before timing out. So if you run your task at 7AM, and want it to end at 9AM, then set the timeout to 2 hours. Below is info from Airflow documentation
aggregate_db_message_job = BashOperator(
    task_id='aggregate_db_message_job',
    execution_timeout=timedelta(hours=2),
    pool='ep_data_pipeline_db_msg_agg',
    bash_command=aggregate_db_message_job_cmd,
    dag=dag)
aggregate_db_message_job.set_upstream(wait_for_empty_queue)
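As far as I understand, execution_timeout limits how long a single attempt may run rather than the total time across retries, so it may also be worth working out how many retries actually fit before the deadline. The sketch below is plain arithmetic, not an Airflow API: max_retries is a hypothetical helper, and the 10-minute attempt duration is an assumed value (for instance, whatever you set as execution_timeout).

```python
from datetime import datetime, timedelta


def max_retries(start, deadline, attempt_duration, retry_delay):
    """Largest number of retries whose final attempt still ends by the deadline.

    Assumes every attempt takes at most attempt_duration and each retry
    waits retry_delay after the previous attempt ends.
    """
    budget = deadline - start - attempt_duration  # time left after the first attempt
    if budget < timedelta(0):
        raise ValueError("even the first attempt cannot finish before the deadline")
    return budget // (attempt_duration + retry_delay)


start = datetime(2024, 1, 1, 7, 0)     # job starts at 7h (date is arbitrary)
deadline = datetime(2024, 1, 1, 9, 0)  # hard deadline at 9h

# Assumed: each attempt is capped at 10 minutes, retries wait 15 minutes.
retries = max_retries(start, deadline, timedelta(minutes=10), timedelta(minutes=15))
print(retries)
```

The result could then be passed as the task's retries value, with the attempt cap passed as execution_timeout. With a zero attempt duration the formula gives back the 8 retries from the question, which shows why the run time of each attempt is what pushes the last retry past 9h.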

Scheduling a method to run at specific times

I'm making an app in python to send texts via Twilio. I'm using flask and it's hosted on Google App Engine. I have a list of messages that need to be sent at a specific date and time, by calling my message function. What's a simple way to go about creating this? I'm relatively new to all this.
I tried apscheduler, but it only worked on my local machine and not on App Engine. I've read about cron jobs, but can't find anything about specific dates/times or how to pass args when the job runs.
As mentioned in the comments by Fabio, you could make a cron task that runs every 10 minutes (or every minute). The task would look in a folder for messages to send. If the filenames in that folder start with the date and time at which the message should go out, you could do something like this:
folder content:
201707092205_<#message_id>
pseudo-code for sending the message:
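import os
from datetime import datetime

# folder is the directory that holds the pending message files
# Timestamp prefix for the current minute, e.g. 201707092205
instant_when_the_script_is_run = datetime.now().strftime("%Y%m%d%H%M")
for name in os.listdir(folder):
    if name.startswith(instant_when_the_script_is_run):
        path = os.path.join(folder, name)
        with open(path) as fh:
            destination = fh.readline().strip()  # the first line is the destination
            message = fh.read()  # the rest of the file is the message
        twilioapi.sendmessage(destination, message)  # placeholder for the real Twilio call
        os.remove(path)  # the removal could be done in another script to leave some traces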
This is where Google App Engine comes in handy. You can use cron jobs from App Engine. Create a cron.yaml file in your project. In this file you can define all kinds of schedules, from every day to one day a week at a particular time. The following is an example cron.yaml file:
cron:
- description: "daily summary job"
  url: /tasks/summary
  schedule: every 24 hours
- description: "monday morning mailout"
  url: /mail/weekly
  schedule: every monday 09:00
  timezone: Australia/NSW
- description: "new daily summary job"
  url: /tasks/summary
  schedule: every 24 hours
  target: beta
Cron schedules are specified using a simple English-like format.
every 12 hours
every 5 minutes from 10:00 to 14:00
every day 00:00
every monday 09:00
2nd,third mon,wed,thu of march 17:00
1st monday of sep,oct,nov 17:00
1 of jan,april,july,oct 00:00
For a more detailed explanation of the schedule format, please refer to the documentation.
Another option is to use a taskqueue task where you specify the time that the task should be run using the eta option (estimated time of arrival).
The task will sit in the queue until its time of execution arrives, and then GAE will cause the task to be launched to do whatever processing you need such as sending a text message.
The tasks may not be executed at the precise time you specify but in my experience it is generally quite close. Certainly far more accurate than running a CRON job every 10 minutes.
This will also be much more efficient than using a CRON job because a CRON job will cause a request to your app every 10 minutes but the task will only execute when needed. If you have a low volume app this may help you stay within the free quota.

Google App Engine Python Cron Job

I wanted to run my cron job as 'schedule: every saturday every 2 minutes from 01:00 to 3:00', and it won't allow this format. Is it possible to set a cron job to target another cron job? Or is my schedule possible just not in the correct format?
Unfortunately, you cannot combine the weekday option with the interval.
You could add a switch in the request handler of your cron job that just exits if the current weekday is not Saturday, while the cron job itself is scheduled "every 2 minutes from 01:00 to 03:00". But that means your handler will be called 360 times per week (60 times on each of the other six days) to do nothing, and will only do something the other 60 times.
Alternatively, you could use an "every saturday 01:00" cron job (as dispatcher) that creates 60 push tasks (as workers) with a countdown or ETA, spread between 01:00 and 03:00. However, I don't think the exact execution time is guaranteed.
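For the dispatcher approach, the 60 run times can be generated with plain datetime arithmetic. This is only a sketch of the time calculation: spread_etas is a hypothetical helper, the date is an arbitrary Saturday, and how each value is handed to the task queue (for example as the ETA of a push task) is left out.

```python
from datetime import datetime, timedelta


def spread_etas(start, end, step):
    """Return run times from start (inclusive) to end (exclusive), step apart."""
    etas = []
    current = start
    while current < end:
        etas.append(current)
        current += step
    return etas


saturday = datetime(2024, 1, 6)  # a Saturday, chosen only for illustration
etas = spread_etas(
    saturday.replace(hour=1),  # 01:00
    saturday.replace(hour=3),  # 03:00, not included
    timedelta(minutes=2),
)
print(len(etas), etas[0].time(), etas[-1].time())
```

The dispatcher would loop over etas and enqueue one worker task per entry.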

How to set Google App Engine cron job using different interval in different period of time?

How can I configure a cron job to run every 5 minutes between 9:00 and 20:00,
but every 10 minutes during the rest of the day?
I would recommend just using every 5 minutes synchronized in the cron.yaml, and then just terminate immediately in the handler if the exact time is not to your liking (hour before 9 or after 20 and minute // 5 is odd, for example). GAE's cron is not very sophisticated, but running a trivial handler which just gets the time, checks whether that's OK, and terminates immediately otherwise, is pretty simple and cheap (and the 70 or so "extra hits per day", each with a trivial amount of resource consumption, will hardly make a difference to your app's overall resource consumption anyway).
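The check described above could look roughly like this. It is a sketch that assumes the cron job fires exactly on the 5-minute marks (the "synchronized" option); should_run is a hypothetical helper that the handler would call first and return immediately if it yields False.

```python
from datetime import datetime


def should_run(now: datetime) -> bool:
    """Every 5 minutes from 09:00 up to 20:00, every 10 minutes otherwise."""
    if 9 <= now.hour < 20:
        return True  # inside the busy window every 5-minute hit counts
    # Outside the window, skip the hits where minute // 5 is odd (05, 15, 25, ...).
    return (now.minute // 5) % 2 == 0


print(should_run(datetime(2024, 1, 1, 9, 5)))   # inside the window
print(should_run(datetime(2024, 1, 1, 8, 5)))   # outside, odd 5-minute slot
print(should_run(datetime(2024, 1, 1, 8, 10)))  # outside, even 5-minute slot
```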
The new cron API can now do this. Please check the documentation at: https://cloud.google.com/appengine/docs/python/config/cron#Python_app_yaml_The_schedule_format
every 12 hours
every 5 minutes from 10:00 to 14:00
every day 00:00
every monday 09:00
2nd,third mon,wed,thu of march 17:00
1st monday of sep,oct,nov 17:00
1 of jan,april,july,oct 00:00
