I am learning Airflow and, as a practice exercise, I'm trying to create a table in Redshift through an Airflow DAG on MWAA. I created the connection to Redshift in the UI (specifying host, port, etc.) and ran the following DAG, but it fails at the "sql_query" task. Any idea how I can solve this problem or what could be causing it?
import os
from datetime import timedelta
from airflow import DAG
from airflow.models import Variable
from airflow.models.baseoperator import chain
from airflow.operators.dummy import DummyOperator
from airflow.providers.amazon.aws.operators.redshift import RedshiftSQLOperator
from airflow.utils.dates import days_ago
"owner": "username",
"depends_on_past": False,
"retries": 0,
"email_on_failure": False,
"email_on_retry": False,
"redshift_conn_id": "redshift_default",
with DAG(
dag_id= "new_table_dag",
) as dag:
begin = DummyOperator(task_id="begin")
end = DummyOperator(task_id="end")
sql_query = RedshiftSQLOperator(
sql= "CREATE TABLE schema_name.table_a AS (SELECT * FROM table_b)")
chain(begin,sql_query, end)
I've created a module named dag_template_module.py that returns a DAG built from specified arguments. I want to use this definition for multiple DAGs that do the same thing but read from different sources (and thus take different parameters). A simplified version of dag_template_module.py:
from airflow.decorators import dag, task
from airflow.operators.bash import BashOperator
def dag_template(
dag_id: str,
echo_message_1: str,
echo_message_2: str
schedule_interval="0 6 2 * *"
def dag_example():
echo_1 = BashOperator(
bash_command=f'echo {echo_message_1}'
echo_2 = BashOperator(
bash_command=f'echo {echo_message_2}'
echo_1 >> echo_2
dag = dag_example()
return dag
Now I've created a hello_world_dag.py that imports dag_template() function from dag_template_module.py and uses it to create a DAG:
from dag_template import dag_template
hello_world_dag = dag_template(
I expected this DAG to be discovered by the Airflow UI, but that's not the case.
I've also tried using globals() in hello_world_dag.py according to documentation but that also doesn't work for me:
from dag_template import dag_template
hello_world_dag = 'hello_word_dag'
globals()[hello_world_dag] = dag_template(dag_id='hello_world_dag',
A couple of things:
The DAG you are attempting to create is missing the start_date parameter.
There is a nuance to how Airflow determines which Python files might contain a DAG definition: it looks for "dag" and "airflow" in the file contents. hello_world_dag.py is missing one of these keywords, so the DagFileProcessor won't attempt to parse the file and therefore never calls the dag_template() function.
Adding these small tweaks, and running with Airflow 2.5.0:
from pendulum import datetime
from airflow.decorators import dag
from airflow.operators.bash import BashOperator
def dag_template(dag_id: str, echo_message_1: str, echo_message_2: str):
    @dag(dag_id, start_date=datetime(2023, 1, 22), schedule=None)
    def dag_example():
        echo_1 = BashOperator(task_id="echo_1", bash_command=f"echo {echo_message_1}")
        echo_2 = BashOperator(task_id="echo_2", bash_command=f"echo {echo_message_2}")
        echo_1 >> echo_2

    return dag_example()
# airflow dag <- Make sure these words appear _somewhere_ in the file.
from dag_template_module import dag_template
dag_template(dag_id="dag_example", echo_message_1="Hello", echo_message_2="World")
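The keyword check described above can be illustrated with a small sketch. This is a simplified stand-in written for this answer, not Airflow's actual implementation: the function name might_contain_dag and the two sample file contents are assumptions made for illustration, and the real check operates on the file's bytes during DAG folder scanning.

```python
def might_contain_dag(source: str) -> bool:
    """Return True if the text mentions both 'dag' and 'airflow' (case-insensitive).

    Simplified illustration of the discovery heuristic, not Airflow's own code.
    """
    lowered = source.lower()
    return all(keyword in lowered for keyword in ("dag", "airflow"))


# File contents without the marker comment: "dag" appears, "airflow" never does.
without_marker = (
    "from dag_template_module import dag_template\n"
    'dag_template(dag_id="dag_example", echo_message_1="Hello", echo_message_2="World")\n'
)

# Same file with the marker comment added at the top.
with_marker = "# airflow dag\n" + without_marker

print(might_contain_dag(without_marker))  # False
print(might_contain_dag(with_marker))     # True
```

Note that the file only imports from dag_template_module, so the word "airflow" is the missing one; any comment or import that mentions it is enough for the file to be considered.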
I have been on Airflow 1.10.14 for a long time, and now I'm trying to upgrade to Airflow 2.4.3 (latest?). I have built this DAG in the new format in hopes of picking up the syntax and understanding how the new format works. Below is my DAG:
from airflow.decorators import dag, task
from airflow.models import Variable
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.microsoft.mssql.operators.mssql import MsSqlOperator
from airflow.operators.bash import BashOperator
from datetime import datetime
import glob
path = '~/airflow/staging/gcs/offrs2/'
clear_Staging_Folders = """
rm -rf {}OFFRS2/LEADS*.*
start_date=datetime(2022, 11, 1),
tags=['offrs2', 'LEADS']
def taskflow():
CLEAR_STAGING = BashOperator(
BQ_Output = BigQueryInsertJobOperator(
"query": {
"query": '~/airflow/sql/Leads/Leads_Export.sql',
"useLegacySql": False
Prep_MSSQL = MsSqlOperator(
mssql_conn_id = 'db.offrs.com',
sql='truncate table offrs_staging..LEADS;'
def Load_Staging_Table():
for files in glob.glob(path + 'LEADS*.csv'):
CLEAR_STAGING >> BQ_Output >> Load_Staging_Table()
dag = taskflow()
When I deploy this, I get the error below:
Broken DAG: [/home/airflow/airflow/dags/BQ_OFFRS2_Leads.py] Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/baseoperator.py", line 376, in apply_defaults
task_group = TaskGroupContext.get_current_task_group(dag)
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/utils/task_group.py", line 490, in get_current_task_group
return dag.task_group
AttributeError: 'function' object has no attribute 'task_group'
As I look at my code, I don't have a specified task_group.
Where am I going wrong here?
Thank you!
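You forgot to remove dag=dag from CLEAR_STAGING. Inside the decorated function, the name dag refers to the decorator imported from airflow.decorators, which is a plain function, hence the "'function' object has no attribute 'task_group'" error. When you use the decorator, tasks created inside the function are attached to the DAG automatically, so remove dag=dag.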
CLEAR_STAGING = BashOperator(
# dag=dag <== Remove this
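The error message itself can be reproduced without Airflow. The sketch below uses a hypothetical stand-in for the dag decorator (it is not the real airflow.decorators.dag) to show that reading task_group from a function object raises the same AttributeError seen in the traceback.

```python
def dag(*decorator_args, **decorator_kwargs):
    """Hypothetical stand-in for the decorator imported from airflow.decorators."""
    def wrapper(func):
        return func
    return wrapper


try:
    # Roughly what happens when an operator is given dag=dag and Airflow
    # then tries to read the task group from what it assumes is a DAG object.
    dag.task_group
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

Because the name dag is bound to a function at that point, the attribute lookup fails; dropping the dag=dag argument avoids the lookup on the wrong object.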
I've installed Airflow on Docker and I'm trying to create my first DAG, but when I use the statement from airflow import DAG and try to execute it, I get an import error. The file isn't named airflow.py, to avoid import problems. I also can't use from airflow.operators.python_operator import PythonOperator; it says that airflow.operators.python_operator could not be resolved.
Here's the code that I've used to create my first DAG:
import airflow
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
default_args ={
'owner': 'eike',
'depends_on_past': False,
'start-date': airflow.utils.dates.days_ago(2),
'email': ['eike@gmail.com.br'],
'email_on_failure': False,
'email_on_retry': False,
'retries': 2,
'retry_delay': timedelta(minutes=3),
dag = DAG(
default_args = default_args,
description = 'Realização da anonimzação do banco de dados propesq',
schedule_interval = timedelta(None),
catchup = False,
[Screenshot: code of the DAG in VS Code]
[Screenshot: Airflow home page with DAG import error]
I installed Airflow in Docker, pointing it to local folders, with my DAGs configured under the local path "C:\Users\Rod\airflow-docker".
So far so good. I can run my DAGs without any problems.
The problem arises when I try to run a script via a BashOperator task: it returns an error. What am I doing wrong?
The error:
Broken DAG: [/opt/airflow/dags/invetory_sap.py] Traceback (most recent call last):
File "", line 219, in _call_with_frames_removed
File "/opt/airflow/dags/invetory_sap.py", line 34, in etl_invetory_sap
NameError: name 'etl_invetory_sap' is not defined
The DAG:
from airflow import DAG
from airflow.operators.python import PythonOperator, BranchPythonOperator
from airflow.operators.python import PythonOperator
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta
seven_days_ago = datetime.combine(datetime.today() - timedelta(7),
args = {
'owner': 'airflow',
'depends_on_past': False,
'start_date': seven_days_ago,
'retries': 1,
'retry_delay': timedelta(minutes=5),
with DAG("invetory_sap",
schedule_interval='30 * * * *',
catchup=False) as dag:
etl_inventory_sap = BashOperator(
bash_command='python /opt/airflow/plugins/ler_txt_convert_todataframe_v5.py'
Spelling error. You declared it as "etl_inventory_sap" and then wrote "etl_invetory_sap" on the next line. Put back the n and you should be fine.
I am new to Airflow and I followed the tutorial on the official page (https://airflow.readthedocs.io/en/stable/tutorial.html) and added a subdag to the tutorial dag.
When I zoom into the subdag on the web-UI and click on code, the code of the main-dag is shown. Also when I click on details of the subdag the filename of the main-dag is displayed, like on the screenshot.
Screenshot of wrong filepath:
My file structure:
├── subdags
│ ├── hellosubdag.py
│ ├── __init__.py
├── tutorial.py
My main-dag code:
from datetime import timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.subdag_operator import SubDagOperator
from airflow.utils.dates import days_ago
from subdags.hellosubdag import sub_dag
default_args = {
'owner': 'airflow',
'depends_on_past': False,
'start_date': days_ago(2),
'retries': 1,
'retry_delay': timedelta(minutes=5),
parentdag = DAG(
description='A simple tutorial DAG',
subdag_execute = SubDagOperator(
subdag=sub_dag('tutorial', 'subdag-exe', default_args['start_date'], timedelta(days=1)),
And the subdag simply prints a string.
My company was using Airflow 1.10.3 before updating to 1.10.9, and I've been told that this used to work before the update.
I can't find any changelog or documentation regarding this issue. Was this feature removed at some point, or am I doing something wrong?