Re-running Failed SubDAGs


I've been playing around with SubDAGs. A big problem I've faced is that whenever something within the SubDAG fails and I re-run it by hitting Clear, only the cleared task re-runs; its success does not propagate to the downstream tasks in the SubDAG and get them running.
How do I re-run a failed task in a SubDAG such that the downstream tasks will flow correctly? Right now, I have to literally re-run every task in the SubDAG that is downstream of the failed task.
I think I followed the best practices for SubDAGs: the SubDAG inherits the parent DAG's properties wherever possible (including schedule_interval), and I don't turn the SubDAG on in the UI; the parent DAG is on and triggers it instead.

This is a bit of a workaround, but if you have named your tasks (their task_id values) consistently, you can try backfilling from the Airflow CLI (command-line interface):
airflow backfill -t TASK_REGEX ... dag_id
where TASK_REGEX matches the naming pattern of the task you want to re-run and its dependencies.
(Remember to add the rest of the command-line options, such as --start_date.)
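To get a feel for which tasks a given TASK_REGEX would pick up before running the backfill, the pattern can be tried against a list of task IDs in plain Python. This is only a rough sketch: the task IDs and the select_tasks helper are hypothetical, and it assumes a search-style match (the pattern may match anywhere in the task ID), which may not be exactly how your Airflow version applies the regex.

```python
import re

# Hypothetical task IDs inside a SubDAG; real names depend on your DAG.
task_ids = ["extract_users", "transform_users", "load_users", "load_orders", "notify"]


def select_tasks(task_regex, ids):
    """Return the task IDs in which the pattern matches anywhere (assumed semantics)."""
    pattern = re.compile(task_regex)
    return [task_id for task_id in ids if pattern.search(task_id)]


# Anchoring with ^ restricts the match to IDs that start with "load_".
selected = select_tasks(r"^load_", task_ids)
print(selected)
```

Consistent prefixes or suffixes in task_id values are what make a short pattern like this usable; with ad hoc names the regex quickly becomes a long alternation.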

Related

How do you mark DAG run as success when majority of tasks succeed, and only a few fail?

I have a DAG that runs hundreds of tasks. For some of these tasks, failures are handled elsewhere, so it is OK if they fail. However, Airflow marks the whole DAG run as a failure.
What I want to do is as follows: count the tasks and, if more than a certain percentage succeed, mark the DAG run as a success.

You can achieve this by setting an Airflow trigger rule on a downstream task:
all_done: all parents are done with their execution
op = DummyOperator(task_id='join', dag=dag, trigger_rule='all_done')
See https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#trigger-rules
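The percentage check from the question could then live inside such a join task. The sketch below shows only the threshold logic in plain Python; the enough_succeeded helper, the task names, and the way the states are collected are all assumptions for illustration, not part of Airflow's API.

```python
def enough_succeeded(states, min_success_ratio=0.9):
    """Return True if the share of 'success' states reaches the threshold.

    `states` maps a task ID to its final state string. How this mapping is
    gathered (e.g. from the metadata database) is left out on purpose.
    """
    if not states:
        return False  # nothing ran, so there is nothing to call a success
    succeeded = sum(1 for state in states.values() if state == "success")
    return succeeded / len(states) >= min_success_ratio


# Hypothetical final states of four upstream tasks: 3 of 4 succeeded.
states = {"a": "success", "b": "success", "c": "failed", "d": "success"}
print(enough_succeeded(states, 0.75))
```

A join task could raise an exception when this returns False, so that the run fails only when too many upstream tasks failed.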

What happens if run same dag multiple times while already running?

What happens if the same DAG is triggered concurrently (or such that the run times overlap)?
I'm asking because I recently triggered a DAG manually, and it was still running when its actual scheduled run time passed. At that point, from the perspective of the web-server UI, it began running again from the beginning (and I could no longer track the previous instance). Is this just a case of that "run instance" overloading the dag_id, or is the job literally restarting (i.e. the previous processes are killed)?

As I understand it, this depends on how the run was triggered and whether the DAG has a schedule. If the run is based on the schedule defined in the DAG (say, a task that runs daily) and it is incomplete or still working when you click rerun, then that instance of the task is rerun, i.e. the one for today. The same applies for any other schedule frequency.
If you want to rerun other instances, you need to delete them from the previous jobs, as described by #lars-haughseth in a different question (airflow-re-run-dag-from-beginning-with-new-schedule).
If you trigger a DAG run manually, it gets the trigger's execution timestamp, and the run is displayed separately from the scheduled runs, as described in the external-triggers documentation:
Note that DAG Runs can also be created manually through the CLI while running an airflow trigger_dag command, where you can define a specific run_id. The DAG Runs created externally to the scheduler get associated to the trigger’s timestamp, and will be displayed in the UI alongside scheduled DAG runs.
In your case it sounds like the latter. Hope that helps.

How to avoid running previously successful tasks in Airflow?

I have multiple tasks that pass data objects to each other. In some tasks, if a condition is not met, I raise an exception, which causes that task to fail. When the next DAG run is triggered, the tasks that already succeeded run once again. I'm looking for a way to avoid re-running the previously successful tasks and to resume from the failed task in the next DAG run.

As mentioned, every DAG has its set of tasks that are executed on every run. To avoid re-running previously successful tasks, you could check an external variable via Airflow XComs or Airflow Variables, or you could query the metadata database for the status of previous runs. You could also store a variable in something like Redis or a similar external database.
Using that variable, you can skip the work inside a task and let it be marked successful right away, until the run reaches the task that still needs to be completed.
Of course, you need to be mindful of potential race conditions if the DAG run times can overlap.
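def task_1(**kwargs):
    # external_variable and perform_task() are placeholders: the flag would be
    # read from an XCom, an Airflow Variable, or an external store such as Redis.
    if external_variable:
        # The work was already done in a previous run, so skip it.
        pass
    else:
        perform_task()
    # Returning normally lets Airflow mark the task as successful.
    return True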
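The same idea can be sketched as a self-contained example. Here a plain Python set stands in for the external store (XCom, Airflow Variable, or Redis), and run_once is a hypothetical helper, not an Airflow function. Note that the task ID is recorded only after the action finishes, so a task that raises is retried on the next run.

```python
def run_once(task_id, action, completed_store):
    """Run `action` unless `task_id` is already recorded as completed."""
    if task_id in completed_store:
        return "skipped"
    action()  # if this raises, the task is not recorded and will run again
    completed_store.add(task_id)
    return "ran"


completed = set()  # stand-in for the external store
calls = []

first = run_once("task_1", lambda: calls.append("task_1"), completed)
second = run_once("task_1", lambda: calls.append("task_1"), completed)
print(first, second, calls)
```

In a real DAG the store has to outlive a single run and should be keyed so that overlapping runs do not read each other's flags, which is the race condition mentioned above.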

airflow dag failed... but all tasks succeeded

I am extremely confused by something in our airflow ui. In the tree view (and the graph view), a dag is indicated to have failed. However, all of its member tasks appear to have succeeded. You can see it here below (third from the end):
Does anyone know how this is possible, what it means, or how one would investigate it?

I have experienced the same: all tasks complete successfully, but the DAG run fails. I did not find anything in any of the logs.
In my case, the cause was the DAG's dagrun_timeout setting, which was set too low for my tasks, which ran for more than 30 minutes:
dag = DAG(...,
          dagrun_timeout=timedelta(minutes=30),
          ...)
I am on Airflow version 1.10.1.
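To illustrate why every task can succeed while the run as a whole still exceeds its limit, here is a small arithmetic sketch. It assumes the tasks run one after another, so the run's duration is roughly the sum of the task durations; run_exceeds_timeout and the durations are hypothetical and this is not how Airflow itself evaluates the timeout.

```python
from datetime import timedelta


def run_exceeds_timeout(task_durations, dagrun_timeout):
    """Return True if sequential tasks together run longer than the timeout."""
    total = sum(task_durations, timedelta())  # start the sum from a zero timedelta
    return total > dagrun_timeout


# Two tasks that each finish fine on their own: 20 + 15 = 35 minutes in total.
durations = [timedelta(minutes=20), timedelta(minutes=15)]
print(run_exceeds_timeout(durations, timedelta(minutes=30)))
```

If this is the cause, raising dagrun_timeout above the expected total run time is the straightforward fix.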

How to stop/kill Airflow tasks from the UI

How can I stop/kill a running task on Airflow UI? I am using LocalExecutor.
Even if I use CeleryExecutor, how can I kill/stop the running task?

In the DAGs screen you can see the running tasks:
Under 'Recent Tasks', click the running icon. Airflow will automatically run a search filtered by Dag Id and State equal to 'running' and show the results on the Task Instances screen (you can also reach it manually via Browse > Task Instances).
There you can select the listed tasks and set them to another state or delete them.
Note that if the DAG is currently running, the Airflow scheduler will restart the tasks you delete. So either stop the DAG first by changing its state, or stop the scheduler (if you are running in a test environment).

Set the task to the failed state:
Click the task.
Set the task to the "Failed" state.
All subsequent tasks (if there are any) will also be marked as failed.

Simply setting the task to the failed state will stop the running task, as the log shows:
[2019-09-17 23:53:28,040] {logging_mixin.py:82} INFO - [2019-09-17 23:53:28,039] {jobs.py:2695} WARNING - State of this instance has been externally set to failed. Taking the poison pill.
[2019-09-17 23:53:28,041] {helpers.py:240} INFO - Sending Signals.SIGTERM to GPID 20977

From the Airflow Gitter (#villasv):
"Not gracefully, no. You can stop a dag (unmark as running) and clear the tasks states or even delete them in the UI. The actual running tasks in the executor won't stop, but might be killed if the executor realizes that it's not in the database anymore."

As mentioned by Pablo and Jorge, pausing the DAG will not stop a task whose execution has already started. However, there is a way to stop a running task from the UI, although it's a bit hacky.
When the task is in the running state, you can click CLEAR. This calls job.kill(); the task is set to shut_down and moved to up_for_retry immediately, so it is stopped.
Clearly Airflow did not intend for you to clear tasks in the running state; however, since Airflow does not disable it either, you can use it as I suggested. Airflow meant CLEAR to be used with failed, up_for_retry, etc. Maybe in the future the community will take this bug(?) and implement it as proper functionality with a "shut down task" button.

Airflow version: v2.2.2 and above
Go to the Airflow home page and search for your DAG.
Under the "Runs" status column, click the running status (green circle).
You can choose a specific DAG run or use the check box above to select all of them.
Then, from the "Action" button, choose the relevant action.
