How to import external scripts in an Airflow DAG with Python? - python

I have the following structure:
I try to import the script from some files in the inbound_layer like so:
import calc
However, I get the following error message in the Airflow web UI:
Any idea?

For an Airflow DAG, when you import your own module, you need to make sure of two things:
Where is the module? You need to find the root path of your Airflow folder. For example, on my dev box, the folders are:
~/projects/data/airflow/teams/team_name/projects/default/dags/dag_names/dag_files.py
The root is airflow, so if I put my module my_module in
~/projects/data/airflow/teams/team_name/common
Then I need to use
from teams.team_name.common import my_module
In your case, if the root is the parent folder of bi, and you put the calc script in bi/inbound_layer/test.py, then you can use:
from bi.inbound_layer.test import calc
You must also make sure you have __init__.py files in the directory structure for the imports to work. Put an empty __init__.py file in each folder on the path; it marks the directory as a Python package. In your case, you can run touch __init__.py (CLI) inside the bi and inbound_layer folders to create the empty __init__.py files.
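As a sketch, the same marker files can be created from Python with pathlib (the bi and inbound_layer folder names are taken from the question):

```python
from pathlib import Path

# Create empty __init__.py marker files so Python treats these
# directories as importable packages.
for folder in ("bi", "bi/inbound_layer"):
    pkg = Path(folder)
    pkg.mkdir(parents=True, exist_ok=True)
    (pkg / "__init__.py").touch()
```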

Airflow adds the dags/, plugins/, and config/ directories in the Airflow home to PYTHONPATH by default, so you can, for example, create a folder common under the dags folder and create a file there (scriptFileName). Assuming that script has some class (GetJobDoneClass) you want to import in your DAG, you can do it like this:
from common.scriptFileName import GetJobDoneClass
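A runnable sketch of why this works; the dags/ layout is simulated in a temp directory, scriptFileName and GetJobDoneClass are the placeholder names from above, and at runtime Airflow performs the sys.path step for you:

```python
import sys
import tempfile
from pathlib import Path

# Build dags/common/scriptFileName.py in a temp dir to stand in for
# the real dags folder.
dags = Path(tempfile.mkdtemp(), "dags")
(dags / "common").mkdir(parents=True)
(dags / "common" / "__init__.py").touch()
(dags / "common" / "scriptFileName.py").write_text(
    "class GetJobDoneClass:\n"
    "    def run(self):\n"
    "        return 'job done'\n"
)

sys.path.insert(0, str(dags))  # Airflow does this for dags/ automatically

from common.scriptFileName import GetJobDoneClass
print(GetJobDoneClass().run())
```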

I needed to insert the following snippet at the top of ren.py:
import sys, os
from airflow.models import Variable
DAGBAGS_DIR = Variable.get('DAGBAGS_DIR')
sys.path.append(DAGBAGS_DIR + '/bi/inbound_layer/')
This way I make the packages in that folder available.

Related

Import Error Attempted relative import when importing from another folder

It might be a pretty simple error, but I can't manage to solve it. The issue is that I have two folders, folder1 and utils. Inside folder1, I am trying to import a function from the utils.py script in the utils folder. Both folders have their respective __init__ files. However, I get this import error and I don't know why.
Folder1
    Script1.py
utils
    utils.py
The way I try to import the package is:
from .utils.utils import *
Is there anything I am doing wrong?
I had written a utility script which I wanted to use from the rest of my scripts. This is how I import it:
sys.path.append(path_to_script)
import my_util as mu
df = mu.get_data()
We need to add the path to sys.path, and then we can access the script as a module.
path_to_script is the absolute path of the directory where your script/files are.
If the script is in the same directory, then we can simply import it as:
import my_util as mu
df = mu.get_data()
In this case, we do not need to add the path to sys.path. It is simply like importing another file inside a Python application.
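A minimal end-to-end sketch of the first variant; my_util and get_data are the hypothetical names from the answer above, and the script's directory is simulated with a temp folder:

```python
import sys
import tempfile
from pathlib import Path

# Simulate a utility script living in some other directory.
path_to_script = tempfile.mkdtemp()
Path(path_to_script, "my_util.py").write_text(
    "def get_data():\n    return [1, 2, 3]\n"
)

sys.path.append(path_to_script)  # make that directory importable
import my_util as mu

df = mu.get_data()
print(df)  # -> [1, 2, 3]
```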

import local packages inside PyFlink

I'm trying to write a local package in a PyFlink project, but I can only import it via a relative path, like:
from .package import func
Can I use absolute paths in packages inside a PyFlink project imported via env.add_python_file('/path_to_project')?
For using absolute paths, there is an answer on the Apache Flink mailing list (https://lists.apache.org/list.html?user#flink.apache.org); the full answer is there.
For an abstract directory structure like:
flink_app/
data_service/
filesystem.py
validator/
validator.py
common/
constants.py
main.py <- entry job
When submitting the PyFlink job, you can specify the Python files and the entry main module with the --pyFiles and --pyModule options, like:
$ ./bin/flink run --pyModule flink_app.main --pyFiles ${WORKSPACE}/flink_app
In this way, all files under the directory will be added to the PYTHONPATH of both the local client and the remote Python UDF worker.

How to execute multiple commands in powershell using python?

I have the following code:
import os
os.system("cd new_folder")
Now, this code changes the directory to the "new_folder" folder, but I want the directory change to persist so that I can run another line of code, such as the following:
import os
os.system(cd new_folder)
os.system(md good_folder)
What I intend to do in the above is after I move inside the "new_folder" folder directory, I make another folder named "good_folder" inside the "new_folder" folder.
But I can't seem to find a way to make the directory stay in the same place... how can this be done?
You can run multiple commands in a single os.system call by separating them with semicolons (note that on Windows, os.system runs cmd.exe, where the command separator is && rather than ;). In your case it would look like:
import os
os.system("cd someFolder; mkdir newFolder")
Refer to this thread for further explanation.
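The reason the original attempt fails is that each os.system() call spawns its own shell, so a cd in one call never affects the next. A sketch of the combined-command approach, using && as the separator (which works in both cmd.exe and POSIX shells); the folder names are made up, with a temp directory standing in for new_folder:

```python
import os
import tempfile

base = tempfile.mkdtemp()  # stand-in for new_folder
# One shell runs both commands, so the cd is still in effect for mkdir.
os.system(f'cd "{base}" && mkdir good_folder')

print(os.path.isdir(os.path.join(base, "good_folder")))
```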

Can not import class from another folder

I have a similar issue to this one: Can't get Python to import from a different folder. The solution there doesn't solve my issue.
I'm working with the Airflow lib. The lib updated one of its operators, and since I cannot upgrade my Airflow version at this time, I want to manually download the operator's .py file and use it in my code directly.
airflow
-dags
--mydag.py
-AddedOperators
--sagemaker_tuning_operator.py
--__init__.py (empty file)
The sagemaker_tuning_operator.py is this file:
https://github.com/apache/airflow/blob/master/airflow/contrib/operators/sagemaker_tuning_operator.py
It contains the class SageMakerTuningOperator
In my mydag.py I do:
from AddedOperators.sagemaker_tuning_operator import SageMakerTuningOperator
When Airflow try to parse mydag.py I get:
No module named AddedOperators.sagemaker_tuning_operator
Check whether your project directory is in your system path. You can check it as follows:
import sys
print(sys.path)
Note that when running a Python script, sys.path doesn't care what your current working directory is. It only cares about the path to the script. For example, if my shell is currently at the Airflow/ folder and I run python ./dags/mydag.py, then sys.path includes Airflow/dags/ but NOT Airflow/.
If your project directory is not in sys.path, you can do the following:
Include it dynamically.
import sys
sys.path.insert(0, 'path/to/your/')
# import your module now
Alternatively, import all required modules in some file in your project root folder, like app.py, and then call them from there.

How to make my Python unit tests to import the tested modules if they are in sister folders?

I am still getting my head around the import statement. If I have two folders at the same level:
src
test
How can I make the .py files in test import the modules in src?
Is there a better solution (like putting one folder inside the other)?
The code you want, for using src/module_name.py, is:
from src import module_name
and this works when the root directory is on your PYTHONPATH, e.g. when you run from the root directory.
Your directory structure is what I use, but with the project name instead of src. I got this structure from J Calderone's blog.
Try this out:
import sys
import os
sys.path.append(os.path.join('..', 'src'))
import module_in_src_folder
edited to support any platform
I have exactly the same situation as the OP with all the python projects I write:
Project Folder
src
test
All modules, whether in src or test or subfolders of these, always use the form of import that Mark shows in his answer:
from src import module_name
What I have done is write a module that sits in Project Folder and recursively discovers all the test modules within the test folder, then gets unittest to run all those tests. As Python is running in Project Folder, modules are imported relative to the working directory.
This means that the tests are just like any other client that wants to import modules from src.
