Airflow subdag code view shows the code of the main DAG - Python

I am new to Airflow. I followed the tutorial on the official page (https://airflow.readthedocs.io/en/stable/tutorial.html) and added a subdag to the tutorial DAG.
When I zoom into the subdag in the web UI and click on Code, the code of the main DAG is shown. Likewise, when I click on Details for the subdag, the filename of the main DAG is displayed, as in the screenshot.
Screenshot of the wrong file path:
My file structure:
dags/
├── subdags
│   ├── hellosubdag.py
│   ├── __init__.py
├── tutorial.py
My main-dag code:
from datetime import timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.subdag_operator import SubDagOperator
from airflow.utils.dates import days_ago

from subdags.hellosubdag import sub_dag

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(2),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

parentdag = DAG(
    dag_id='tutorial',
    default_args=default_args,
    description='A simple tutorial DAG',
    schedule_interval=timedelta(days=1),
)

subdag_execute = SubDagOperator(
    task_id='subdag-exe',
    subdag=sub_dag('tutorial', 'subdag-exe', default_args['start_date'], timedelta(days=1)),
    dag=parentdag,
)
And the subdag simply prints a string.
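For context, here is a minimal sketch of what subdags/hellosubdag.py might look like. Only the sub_dag factory name and its signature are taken from the import and the SubDagOperator call above; the operator and message inside it are assumptions. Note that a subdag's dag_id has to follow the <parent_dag_id>.<task_id> convention:

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def sub_dag(parent_dag_id, task_id, start_date, schedule_interval):
    # Subdag ids must be named "<parent_dag_id>.<task_id>".
    dag = DAG(
        dag_id=f'{parent_dag_id}.{task_id}',
        start_date=start_date,
        schedule_interval=schedule_interval,
    )

    PythonOperator(
        task_id='hello',
        python_callable=lambda: print('hello from the subdag'),
        dag=dag,
    )

    return dag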
My company was using Airflow 1.10.3 before updating to 1.10.9, and I've been told this used to work before the update.
I can't find any changelog or documentation regarding this issue. Was this feature removed at some point, or am I doing something wrong?

Related

Problems connecting Redshift to Airflow (MWAA)

I am learning Airflow and, as a practice exercise, I'm trying to create a table in Redshift through an Airflow DAG on MWAA. I created the connection to Redshift in the UI (specifying host, port, etc.) and ran the following DAG, but it fails at the "sql_query" task. Any idea how I can solve this problem, or what could be causing it?
Script:
import os
from datetime import timedelta
from airflow import DAG
from airflow.models import Variable
from airflow.models.baseoperator import chain
from airflow.operators.dummy import DummyOperator
from airflow.providers.amazon.aws.operators.redshift import RedshiftSQLOperator
from airflow.utils.dates import days_ago
DEFAULT_ARGS = {
    "owner": "username",
    "depends_on_past": False,
    "retries": 0,
    "email_on_failure": False,
    "email_on_retry": False,
    "redshift_conn_id": "redshift_default",
}

with DAG(
    dag_id="new_table_dag",
    description="",
    default_args=DEFAULT_ARGS,
    dagrun_timeout=timedelta(minutes=15),
    start_date=days_ago(1),
    schedule_interval=None,
    tags=[""],
) as dag:
    begin = DummyOperator(task_id="begin")
    end = DummyOperator(task_id="end")

    sql_query = RedshiftSQLOperator(
        task_id="sql_query",
        sql="CREATE TABLE schema_name.table_a AS (SELECT * FROM table_b)",
    )

    chain(begin, sql_query, end)
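One thing worth checking while debugging (a sketch, not a confirmed fix) is to pass the connection id directly to the operator instead of only through default_args, so it is explicit which connection the failing task resolves. The connection id redshift_default is the one from the DAG above:

sql_query = RedshiftSQLOperator(
    task_id="sql_query",
    redshift_conn_id="redshift_default",  # set explicitly instead of relying on DEFAULT_ARGS
    sql="CREATE TABLE schema_name.table_a AS (SELECT * FROM table_b)",
)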

ImportError: cannot import name 'DAG' from 'airflow' (unknown location)

I've installed Airflow on Docker and I'm trying to create my first DAG, but when I use from airflow import DAG and try to execute it, I get an import error. The file isn't named airflow.py, so that shouldn't be causing import problems. I also can't use from airflow.operators.python_operator import PythonOperator; it says that airflow.operators.python_operator could not be resolved.
Here's the code that I've used to create my first DAG:
import airflow
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from datetime import timedelta

default_args = {
    'owner': 'eike',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(2),
    'email': ['eike@gmail.com.br'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 2,
    'retry_delay': timedelta(minutes=3),
}

dag = DAG(
    'anonimização',
    default_args=default_args,
    description='Anonymization of the propesq database',
    schedule_interval=None,
    catchup=False,
)
Screenshot: code of the DAG in VS Code
Screenshot: Airflow home page with the DAG import error
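For what it's worth, if the container is running Airflow 2.x, the PythonOperator import path moved to airflow.operators.python, which could explain why the editor cannot resolve airflow.operators.python_operator. A minimal sketch with Airflow 2-style imports (the task id and callable here are made up for illustration):

from datetime import timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2 path; python_operator is the old 1.10 path
from airflow.utils.dates import days_ago


def anonymize():
    # Placeholder for the real anonymization logic.
    print("anonymizing...")


dag = DAG(
    'anonimização',
    description='Anonymization of the propesq database',
    start_date=days_ago(2),
    schedule_interval=None,
    catchup=False,
)

anonymize_task = PythonOperator(
    task_id='anonymize',
    python_callable=anonymize,
    retries=2,
    retry_delay=timedelta(minutes=3),
    dag=dag,
)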

Poke for files with a specified extension in the server directory using the Airflow SFTPSensor

My use case is quite simple:
When a file is dropped into the SFTP server directory, an SFTPSensor task should pick up files with the specified .txt extension and process their content.
path="/test_dir/sample.txt" works.
My requirement is to pick up dynamically named files that have only the specified extension (text files).
path="/test_dir/*.txt" does not work; in this case the file is never poked.
#Sample Code
from airflow.models import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.sftp.sensors.sftp import SFTPSensor
from airflow.providers.ssh.hooks.ssh import SSHHook
from datetime import datetime
default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2022, 4, 16),
}

with DAG(
    'sftp_sensor_test',
    schedule_interval=None,
    default_args=default_args,
) as dag:
    waiting_for_file = SFTPSensor(
        task_id="check_for_file",
        sftp_conn_id="sftp_default",
        path="/test_dir/*.txt",  # NOTE: poking for files with the .txt extension
        mode="reschedule",
        poke_interval=30,
    )

    waiting_for_file
To achieve what you want, I think you should use the file_pattern argument, as follows:
waiting_for_file = SFTPSensor(
    task_id="check_for_file",
    sftp_conn_id="sftp_default",
    path="test_dir",
    file_pattern="*.txt",
    mode="reschedule",
    poke_interval=30,
)
However, there is currently a bug in this feature → https://github.com/apache/airflow/issues/28121
Until it is solved, you can easily create a local, fixed version of the sensor in your project by following the explanations in the issue.
Here is the file with the current fix: https://github.com/RishuGuru/airflow/blob/ac0457a51b885459bc5ae527878a50feb5dcadfa/airflow/providers/sftp/sensors/sftp.py
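As a rough stop-gap in the meantime, here is a sketch of a PythonSensor that lists the directory over SFTP and matches the pattern itself. The connection id and directory are the ones from the question; it assumes your sftp provider version exposes SFTPHook.list_directory and accepts ssh_conn_id (older versions use ftp_conn_id instead). It belongs inside the same with DAG(...) as dag: block as above:

import fnmatch

from airflow.providers.sftp.hooks.sftp import SFTPHook
from airflow.sensors.python import PythonSensor


def _txt_file_arrived():
    # List the remote directory and succeed once at least one .txt file is present.
    hook = SFTPHook(ssh_conn_id="sftp_default")
    files = hook.list_directory("/test_dir")
    return any(fnmatch.fnmatch(name, "*.txt") for name in files)


waiting_for_file = PythonSensor(
    task_id="check_for_file",
    python_callable=_txt_file_arrived,
    mode="reschedule",
    poke_interval=30,
)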

Deploying a converted Oozie DAG to Google Cloud Composer (Airflow): No module named 'o2a'

I'm using Google's oozie-to-airflow converter to convert some Oozie workflows that run on AWS EMR. I managed to get a first version, but when I try to upload the DAG, Airflow throws an error:
Broken DAG: No module named 'o2a'
I have tried to deploy the PyPI package o2a, both using the command
gcloud composer environments update composer-name --update-pypi-packages-from-file requirements.txt --location location
and from the Google Cloud console. Both failed.
requirements.txt
o2a==1.0.1
Here is the code:
from airflow import models
from airflow.operators.subdag_operator import SubDagOperator
from airflow.utils import dates
from o2a.o2a_libs import functions
from airflow.models import Variable
import subdag_validation
import subdag_generate_reports
CONFIG = {}
JOB_PROPS = {}

dag_config = Variable.get("coordinator", deserialize_json=True)
cdrPeriod = dag_config["cdrPeriod"]

TASK_MAP = {"validation": ["validation"], "generate_reports": ["generate_reports"]}

TEMPLATE_ENV = {**CONFIG, **JOB_PROPS, "functions": functions, "task_map": TASK_MAP}

with models.DAG(
    "workflow_coordinator",
    schedule_interval=None,  # Change to suit your needs
    start_date=dates.days_ago(0),  # Change to suit your needs
    user_defined_macros=TEMPLATE_ENV,
) as dag:
    validation = SubDagOperator(
        task_id="validation",
        trigger_rule="one_success",
        subdag=subdag_validation.sub_dag(dag.dag_id, "validation", dag.start_date, dag.schedule_interval),
    )

    generate_reports = SubDagOperator(
        task_id="generate_reports",
        trigger_rule="one_success",
        subdag=subdag_generate_reports.sub_dag(
            dag.dag_id,
            "generate_reports",
            dag.start_date,
            dag.schedule_interval,
            {"cdrPeriod": "{{cdrPeriod}}"},
        ),
    )

    validation.set_downstream(generate_reports)
There is a section in the o2a docs that covers how to deploy the o2a libraries:
https://github.com/GoogleCloudPlatform/oozie-to-airflow#the-o2a-libraries
That install initially failed because of another dependency, lark-parser. Installing it through Composer's PyPI package manager did the trick.
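In other words, the packages installed into the Composer environment end up being roughly the following (only o2a==1.0.1 is pinned here; pin lark-parser to whatever version your o2a release expects):
requirements.txt
o2a==1.0.1
lark-parser
followed by the same update command as above:
gcloud composer environments update composer-name --update-pypi-packages-from-file requirements.txt --location location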

Airflow is unable to import a custom Python package

I want to call a script from a custom Python project through Airflow.
My directory structure is:
/home/user/
├── airflow/
│   ├── dags/
│   │   └── my_dag.py
│   └── .venv_airflow (virtual environment for airflow)
├── my_project/
│   ├── .venv (virtual environment for my_project)
│   └── folderA/
│       ├── __init__.py
│       └── folderB/
│           ├── call_me.py (has a line "from my_project.folderA.folderB import import_me")
│           └── import_me.py
My dag file looks like:
from airflow import DAG
import datetime as dt
from airflow.operators.bash_operator import BashOperator
default_args = {
    'owner': 'arpita',
    'start_date': dt.datetime(2019, 11, 20),
    'retries': 1,
    'retry_delay': dt.timedelta(minutes=5),
    'depends_on_past': False,
    'email': ['example@abc.com'],
    'email_on_failure': True,
    'email_on_retry': True,
}

with DAG('sample',
         default_args=default_args,
         schedule_interval='30 * * * *',
         ) as dag:

    enter_project = BashOperator(task_id='enter_project',
                                 bash_command='cd /home/user/my_project',
                                 retries=2)

    setup_environment = BashOperator(task_id='setup_environment',
                                     bash_command='source /home/user/my_project/.venv/bin/activate',
                                     retries=2)

    call_script = BashOperator(task_id='call_script',
                               bash_command='python -m my_project.folderA.folderB.call_me',
                               retries=2)

    enter_project >> setup_environment >> call_script
But I am getting this error
[2019-11-22 11:56:49,311] {bash_operator.py:115} INFO - Running command: python -m my_project.folderA.folderB.call_me
[2019-11-22 11:56:49,315] {bash_operator.py:124} INFO - Output:
[2019-11-22 11:56:49,349] {bash_operator.py:128} INFO - /home/user/airflow/.venv/bin/python: Error while finding spec for 'my_project.folderA.folderB.call_me' (ImportError: No module named 'my_project')
The project and the script work outside Airflow. Inside Airflow, other packages like pandas and tensorflow are imported fine, but not my custom package. I tried inserting the path with sys.path.insert, but that doesn't work either. Thank you for reading :)
Your bash commands run in three separate BashOperators, but each task executes its command in its own shell, so the cd and the virtualenv activation from one task don't carry over to the next. They should run in a single BashOperator:
call_script = BashOperator(
    task_id='call_script',
    bash_command='cd /home/user/my_project;'
                 'source /home/user/my_project/.venv/bin/activate;'
                 'python -m my_project.folderA.folderB.call_me',
    retries=2)
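A small variant of the same idea, chaining the steps with && so the task fails immediately if any of them fails (same paths as in the question):

call_script = BashOperator(
    task_id='call_script',
    bash_command=(
        'cd /home/user/my_project && '
        'source /home/user/my_project/.venv/bin/activate && '
        'python -m my_project.folderA.folderB.call_me'
    ),
    retries=2,
)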
