I am learning Airflow and, as a practice exercise, I'm trying to create a table in Redshift through an Airflow DAG on MWAA. I created the connection to Redshift in the UI (specifying host, port, etc.) and ran the following DAG, but it fails at the "sql_query" task. Any idea how I can solve this problem or what might be causing it?
Script:
import os
from datetime import timedelta
from airflow import DAG
from airflow.models import Variable
from airflow.models.baseoperator import chain
from airflow.operators.dummy import DummyOperator
from airflow.providers.amazon.aws.operators.redshift import RedshiftSQLOperator
from airflow.utils.dates import days_ago
DEFAULT_ARGS = {
    "owner": "username",
    "depends_on_past": False,
    "retries": 0,
    "email_on_failure": False,
    "email_on_retry": False,
    "redshift_conn_id": "redshift_default",
}
with DAG(
    dag_id="new_table_dag",
    description="",
    default_args=DEFAULT_ARGS,
    dagrun_timeout=timedelta(minutes=15),
    start_date=days_ago(1),
    schedule_interval=None,
    tags=[""],
) as dag:
    begin = DummyOperator(task_id="begin")
    end = DummyOperator(task_id="end")
    sql_query = RedshiftSQLOperator(
        task_id="sql_query",
        sql="CREATE TABLE schema_name.table_a AS (SELECT * FROM table_b)",
    )
    chain(begin, sql_query, end)
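For reference, a minimal sketch of the same task with the connection id passed directly on the operator instead of through default_args (same connection id and SQL as above). If the task still fails like this, the task log should show whether the problem is the connection itself (host, port, security groups) or the SQL:

from airflow.providers.amazon.aws.operators.redshift import RedshiftSQLOperator

sql_query = RedshiftSQLOperator(
    task_id="sql_query",
    redshift_conn_id="redshift_default",  # must match the connection id created in the MWAA UI
    sql="CREATE TABLE schema_name.table_a AS (SELECT * FROM table_b)",
)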
I've created a module named dag_template_module.py that returns a DAG built from the specified arguments. I want to use this definition for multiple DAGs that do the same thing but read from different sources (and thus take different parameters). A simplified version of dag_template_module.py:
from airflow.decorators import dag, task
from airflow.operators.bash import BashOperator
def dag_template(
    dag_id: str,
    echo_message_1: str,
    echo_message_2: str
):
    @dag(
        dag_id=dag_id,
        schedule_interval="0 6 2 * *"
    )
    def dag_example():
        echo_1 = BashOperator(
            task_id='echo_1',
            bash_command=f'echo {echo_message_1}'
        )
        echo_2 = BashOperator(
            task_id='echo_2',
            bash_command=f'echo {echo_message_2}'
        )
        echo_1 >> echo_2

    dag = dag_example()
    return dag
Now I've created a hello_world_dag.py that imports dag_template() function from dag_template_module.py and uses it to create a DAG:
from dag_template import dag_template
hello_world_dag = dag_template(
    dag_id='hello_world_dag',
    echo_message_1='Hello',
    echo_message_2='World'
)
I expected this DAG to be discovered by the Airflow UI, but that's not the case.
I've also tried using globals() in hello_world_dag.py, as suggested in the documentation, but that doesn't work for me either:
from dag_template import dag_template
hello_world_dag = 'hello_word_dag'
globals()[hello_world_dag] = dag_template(dag_id='hello_world_dag',
                                          echo_message_1='Hello',
                                          echo_message_2='World')
A couple of things:
The DAG you are attempting to create is missing the start_date param.
There is a nuance to how Airflow determines which Python files might contain a DAG definition: it looks for the strings "dag" and "airflow" in the file contents. hello_world_dag.py is missing these keywords, so the DagFileProcessor won't attempt to parse the file and, therefore, never calls the dag_template() function.
Adding these small tweaks and running on Airflow 2.5.0:
dag_template_module.py
from pendulum import datetime
from airflow.decorators import dag
from airflow.operators.bash import BashOperator
def dag_template(dag_id: str, echo_message_1: str, echo_message_2: str):
    @dag(dag_id, start_date=datetime(2023, 1, 22), schedule=None)
    def dag_example():
        echo_1 = BashOperator(task_id="echo_1", bash_command=f"echo {echo_message_1}")
        echo_2 = BashOperator(task_id="echo_2", bash_command=f"echo {echo_message_2}")
        echo_1 >> echo_2

    return dag_example()
hello_world_dag.py
# airflow dag  <- Make sure these words appear _somewhere_ in the file.
from dag_template_module import dag_template
dag_template(dag_id="dag_example", echo_message_1="Hello", echo_message_2="World")
I have been on Airflow 1.10.14 for a long time, and now I'm trying to upgrade to Airflow 2.4.3 (the latest?). I have built this DAG in the new format in hopes of picking up the new syntax and understanding how it works. Below is my DAG:
from airflow.decorators import dag, task
from airflow.models import Variable
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.microsoft.mssql.operators.mssql import MsSqlOperator
from airflow.operators.bash import BashOperator
from datetime import datetime
import glob
path = '~/airflow/staging/gcs/offrs2/'
clear_Staging_Folders = """
rm -rf {}OFFRS2/LEADS*.*
""".format(Variable.get("temp_directory"))
@dag(
    schedule_interval='@daily',
    start_date=datetime(2022, 11, 1),
    catchup=False,
    tags=['offrs2', 'LEADS']
)
def taskflow():
    CLEAR_STAGING = BashOperator(
        task_id='Clear_Folders',
        bash_command=clear_Staging_Folders,
        dag=dag,
    )

    BQ_Output = BigQueryInsertJobOperator(
        task_id='BQ_Output',
        configuration={
            "query": {
                "query": '~/airflow/sql/Leads/Leads_Export.sql',
                "useLegacySql": False
            }
        }
    )

    Prep_MSSQL = MsSqlOperator(
        task_id='Prep_DB3_Table',
        mssql_conn_id='db.offrs.com',
        sql='truncate table offrs_staging..LEADS;'
    )

    @task
    def Load_Staging_Table():
        for files in glob.glob(path + 'LEADS*.csv'):
            print(files)

    CLEAR_STAGING >> BQ_Output >> Load_Staging_Table()

dag = taskflow()
When I deploy this, I get the error below:
Broken DAG: [/home/airflow/airflow/dags/BQ_OFFRS2_Leads.py] Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/baseoperator.py", line 376, in apply_defaults
task_group = TaskGroupContext.get_current_task_group(dag)
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/utils/task_group.py", line 490, in get_current_task_group
return dag.task_group
AttributeError: 'function' object has no attribute 'task_group'
As I look at my code, I don't have a specified task_group.
Where am I going wrong here?
Thank you!
The dag you pass to CLEAR_STAGING is not a DAG object: at that point the name dag still refers to the @dag decorator imported from airflow.decorators (a plain function), which is why the error complains that a 'function' object has no attribute 'task_group'. When you use the decorator, remove dag=dag.
CLEAR_STAGING = BashOperator(
    task_id='Clear_Folders',
    bash_command=clear_Staging_Folders,
    # dag=dag  <== Remove this
)
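For illustration, a minimal sketch of the same taskflow structure with the fix applied (trimmed to two of the tasks; the bash command is a placeholder):

from datetime import datetime

from airflow.decorators import dag, task
from airflow.operators.bash import BashOperator


@dag(schedule_interval='@daily', start_date=datetime(2022, 11, 1), catchup=False)
def taskflow():
    # No dag=... argument here: operators created inside the decorated
    # function are attached to the DAG automatically.
    clear_staging = BashOperator(
        task_id='Clear_Folders',
        bash_command='echo "clear staging folders"',  # placeholder command
    )

    @task
    def load_staging_table():
        print("loading staging table")

    clear_staging >> load_staging_table()


dag = taskflow()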
I've installed Airflow on Docker and I'm trying to create my first DAG, but when I use from airflow import DAG and try to execute it, I get an import error. The file isn't named airflow.py, so that shouldn't be the cause of the import problem. I also can't do from airflow.operators.python_operator import PythonOperator; it says that airflow.operators.python_operator could not be resolved.
Here's the code that i've used to create my first DAG:
import airflow
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
default_args = {
    'owner': 'eike',
    'depends_on_past': False,
    'start-date': airflow.utils.dates.days_ago(2),
    'email': ['eike@gmail.com.br'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 2,
    'retry_delay': timedelta(minutes=3),
}

dag = DAG(
    'anonimização',
    default_args=default_args,
    description='Realização da anonimzação do banco de dados propesq',
    schedule_interval=timedelta(None),
    catchup=False,
)
Screenshot of the DAG code in VS Code:
Screenshot of the Airflow home page with the DAG import error:
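Two things stand out in the snippet above, assuming the Docker image runs Airflow 2.x (the version isn't shown): timedelta is used but never imported, and airflow.operators.python_operator is the old Airflow 1.10 module path, which Airflow 2 deprecates in favour of airflow.operators.python. A minimal sketch of the imports under that assumption:

from datetime import timedelta

import airflow
from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x module path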
I installed Airflow on Docker, pointing it to the local folder where I keep my DAGs (real file path: "C:\Users\Rod\airflow-docker").
So far so good: I can run my DAGs without any problems.
The problem comes when I try to run a script via a BashOperator task. It returns an error. What am I doing wrong?
the error:
Broken DAG: [/opt/airflow/dags/invetory_sap.py] Traceback (most recent call last):
  File "", line 219, in _call_with_frames_removed
  File "/opt/airflow/dags/invetory_sap.py", line 34, in etl_invetory_sap
NameError: name 'etl_invetory_sap' is not defined
The DAG:
from airflow import DAG
from airflow.operators.python import PythonOperator, BranchPythonOperator
from airflow.operators.python import PythonOperator
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta
seven_days_ago = datetime.combine(datetime.today() - timedelta(7),
                                  datetime.min.time())

args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': seven_days_ago,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

with DAG("invetory_sap",
         default_args=args,
         schedule_interval='30 * * * *',
         dagrun_timeout=timedelta(minutes=60),
         catchup=False) as dag:

    etl_inventory_sap = BashOperator(
        task_id='etl_invetory_sap',
        bash_command='python /opt/airflow/plugins/ler_txt_convert_todataframe_v5.py'
    )

    etl_invetory_sap
Spelling error: you declared the operator as "etl_inventory_sap" and then wrote "etl_invetory_sap" on the next line. Put the n back and you should be fine.
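That is, the last line inside the with DAG block just needs to reference the name that was actually declared:

    etl_inventory_sap = BashOperator(
        task_id='etl_invetory_sap',
        bash_command='python /opt/airflow/plugins/ler_txt_convert_todataframe_v5.py'
    )

    etl_inventory_sap  # matches the variable name declared above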
I am new to Airflow. I followed the tutorial on the official page (https://airflow.readthedocs.io/en/stable/tutorial.html) and added a subdag to the tutorial DAG.
When I zoom into the subdag in the web UI and click on Code, the code of the main DAG is shown. Also, when I click on Details for the subdag, the filename of the main DAG is displayed, as in the screenshot.
Screenshot of wrong filepath:
My file structure:
dags/
├── subdags
│ ├── hellosubdag.py
│ ├── __init__.py
├── tutorial.py
My main-dag code:
from datetime import timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.subdag_operator import SubDagOperator
from airflow.utils.dates import days_ago
from subdags.hellosubdag import sub_dag
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(2),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

parentdag = DAG(
    dag_id='tutorial',
    default_args=default_args,
    description='A simple tutorial DAG',
    schedule_interval=timedelta(days=1),
)

subdag_execute = SubDagOperator(
    task_id='subdag-exe',
    subdag=sub_dag('tutorial', 'subdag-exe', default_args['start_date'], timedelta(days=1)),
    dag=parentdag,
)
And the subdag simply prints a string.
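For context, the factory function in subdags/hellosubdag.py presumably looks something like the sketch below (a reconstruction that matches the call signature used in the parent DAG, not the exact file; the echo task stands in for the string-printing step):

from airflow import DAG
from airflow.operators.bash_operator import BashOperator


def sub_dag(parent_dag_name, child_dag_name, start_date, schedule_interval):
    # SubDAG dag_ids must follow the "<parent_dag_id>.<task_id>" convention.
    dag = DAG(
        dag_id='%s.%s' % (parent_dag_name, child_dag_name),
        schedule_interval=schedule_interval,
        start_date=start_date,
    )

    BashOperator(
        task_id='hello',
        bash_command='echo "hello from the subdag"',
        dag=dag,
    )

    return dag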
My company was using Airflow 1.10.3 before updating to 1.10.9, and I've been told this used to work before the update.
I can't find any changelog or documentation regarding this issue. Was this feature removed at some point, or am I doing something wrong?