Why does docker-compose give "No module named" for my Airflow operator? - python

I am working on tests for a docker-airflow Postgres ETL pipeline. My project structure currently looks like this:
docker-airflow
|
├── Dockerfile
├── __init__.py
├── dags
│   ├── __init__.py
│   ├── pandas_etl.py
│   └── tuto.py
├── docker-compose.yml
└── operators
    ├── __init__.py
    └── pandas_etl_over_postgres_operator.py
When importing my pandas_etl_over_postgres_operator.py into the pandas_etl.py dag, I am getting an error that the module is not found.
The pandas_etl.py import code is:
from operators.pandas_etl_over_postgres_operator import PandasETLOverPostgresOperator
I have tried the following two alternatives; they give the same error:
from .operators.pandas_etl_over_postgres_operator import PandasETLOverPostgresOperator
and
from ..operators.pandas_etl_over_postgres_operator import PandasETLOverPostgresOperator
The import works fine locally but fails when I build and run using docker-compose.

Please note that for Airflow, by default, the [core] dags_folder setting has the value /usr/local/airflow/dags, meaning that Airflow will look for DAGs at the path /usr/local/airflow/dags.
As a result, all your DAG code should be inside that folder, so here are a few things you need to change for your code to work:
In the docker-compose.yml file:
- ./dags:/usr/local/airflow/dags/dags
- ./logs:/usr/local/airflow/dags/logs
- ./operators:/usr/local/airflow/dags/operators
In the pandas_etl.py file:
from operators.pandas_etl_over_postgres_operator import PandasETLOverPostgresOperator
Hope it helps!
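To check which layout the container can actually import, a small diagnostic can help. As far as I know, Airflow adds its dags_folder to sys.path when it parses DAG files, which is what makes the mount above work. The helper below (`where_is` is a hypothetical name, not an Airflow API) only uses importlib.util.find_spec to report the file a module would be loaded from, or None. Calling it temporarily from the DAG file, or from a Python shell started in /usr/local/airflow/dags inside the container, should show whether `operators` resolves there. Treat it as a sketch, not as part of the answer.

```python
import importlib.util
import sys
from typing import Optional


def where_is(module_name: str) -> Optional[str]:
    """Return the file `module_name` would be loaded from, or None if it is not importable.

    For a dotted name, find_spec() imports the parent package first and raises
    ModuleNotFoundError when that parent is missing, so that case counts as "not found".
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        return None
    if spec is None:
        return None
    # Namespace packages (directories without __init__.py) have no single origin file.
    return spec.origin or "<namespace package>"


if __name__ == "__main__":
    # Show the import roots, then check the operator module from the question.
    for entry in sys.path:
        print("sys.path entry:", entry)
    print(where_is("operators.pandas_etl_over_postgres_operator"))
```

A result of None means no directory on sys.path contains an importable `operators` package at that moment.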

I think it is because you need to put your operator inside a plugins directory and mount that into the container. Have a look at the puckel/docker-airflow documentation regarding plugins. You can also see, and change, where your particular Airflow instance looks for plugins by checking the [core] plugins_folder setting in your airflow.cfg configuration file.

Related

Importing a file to a Django project - errors when running manage.py due to location of script

I know the source of my error, and I can fix it in a somewhat hacky way, but I want to know the best-practice way of solving it, especially as my hacky way runs into issues when running things from the command line and throws errors in my IDE.
I have a Django project; the folder tree looks like this (irrelevant parts removed):
├── manage.py
├── simulator
│   ├── events.py
│   ├── parser.py
│   ├── parser_test.py
│   ├── patch.py
│   ├── simulator.py
├── VSPOMs
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
└── VSPOMsApp
    ├── admin.py
    ├── apps.py
    ├── models.py
    ├── tests.py
    ├── urls.py
    └── views.py
When I run manage.py, it runs in the . (project root) directory, whereas views.py is in the VSPOMsApp directory. Suppose I have the following imports in views.py:
from ..simulator.patch import Patch
from ..simulator.simulator import Simulator
The IDE doesn't flag an error, which is correct: it looks in the parent directory for a folder called simulator containing a file called patch. However, when I run manage.py, I get this error:
ImportError: attempted relative import beyond top-level package
As mentioned, this is because manage.py does not run in the same directory as views.py. I can fix this by importing from
simulator.patch
But this is (a) hacky and (b) runs into other, worse errors.
In patch.py I have to import from events; however, because manage.py runs from a different directory, for this to work in Django the code has to be:
from simulator.events import ColonisationEvent, ExtinctionEvent
This is not only wrong (events is in the same directory as patch), but it also doesn't work when I need to run things from the command line for testing. Any idea how to fix this? I've tried reading up on relative imports, but I'm really not sure what the best solution is. Adding an __init__.py file to simulator didn't help either.
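No answer is included for this question here, but one conventional approach, sketched below under the assumption that manage.py sits at the project root, is to treat that root as the only import root. views.py uses the absolute `from simulator.patch import Patch`, which is a common way to import a top-level package sitting next to manage.py; modules inside simulator import each other with package-relative imports such as `from .events import ...`; and command-line tests run as `python -m simulator.parser_test` from the root instead of `python simulator/parser_test.py`. The stub files are hypothetical stand-ins that leave Django out.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical, trimmed-down copy of the question's layout (no Django needed for the import mechanics).
FILES = {
    "simulator/__init__.py": "",
    "simulator/events.py": "class ColonisationEvent: pass\nclass ExtinctionEvent: pass\n",
    # Package-relative import: events.py lives next to patch.py.
    "simulator/patch.py": (
        "from .events import ColonisationEvent, ExtinctionEvent\n"
        "class Patch: pass\n"
    ),
    # Also package-relative, so it only works when run as part of the package (python -m).
    "simulator/parser_test.py": "from .patch import Patch\nprint('parser_test ok')\n",
    # Stand-in for VSPOMsApp/views.py: absolute import rooted at the manage.py directory.
    "VSPOMsApp/__init__.py": "",
    "VSPOMsApp/views.py": "from simulator.patch import Patch\nprint('views ok')\n",
}


def run_module(project_root: Path, module: str) -> str:
    """Run `python -m module` from the project root and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-m", module],
        cwd=project_root,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel_path, body in FILES.items():
        target = root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(body)
    # `python -m` puts the current directory (where manage.py would live) on sys.path,
    # just as running manage.py from that directory puts its own directory there.
    outputs = [run_module(root, "simulator.parser_test"), run_module(root, "VSPOMsApp.views")]

print(outputs)
```

Running parser_test.py directly as a script would fail on the relative import, because a script run that way has no parent package; that is why the `-m` form matters for command-line testing.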

Pylint disagrees with VSCode and python in imports

I can't find a way to write my code so that both pylint and the execution of the code (within VSCode or from the command line) work.
There are some similar questions but none seems to apply to my project structure with a src directory under which there will be multiple packages. Here's the simplified project structure:
.
├── README.md
├── src
│   ├── rssita
│   │   ├── __init__.py
│   │   ├── feeds.py
│   │   ├── rssita.py
│   │   └── termcolors.py
│   └── zanotherpackage
│       ├── __init__.py
│       └── anothermodule.py
└── tests
    ├── __init__.py
    └── test_feeds.py
From what I understand, rssita is one of my packages (because of the __init__.py file), with some modules under it; among them, rssita.py contains the following imports:
from feeds import RSS_FEEDS
from termcolors import PC
The rssita.py code as shown above runs well from both within VSCode and from command line python ( python src/rssita/rssita.py ) from the project root, but at the same time pylint (both from within VSCode and from the command line (pylint src or pylint src/rssita)) flags the two imports as not found.
If I modify the code as follows:
from rssita.feeds import RSS_FEEDS
from rssita.termcolors import PC
pylint will then be happy but the code will not run anymore since it would not find the imports.
What's the cleanest fix for this?
As far as I'm concerned, pylint is right and your setup / PYTHONPATH is screwed up: in Python 3, all imports are absolute by default, so
from feeds import RSS_FEEDS
from termcolors import PC
should look for top-level modules called feeds and termcolors, which I don't think exist.
python src/rssita/rssita.py
That really isn't the correct invocation: it sets up a really weird import path (the script's own directory, src/rssita, goes first on sys.path) in order to run a random script.
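The reason the original imports work at runtime but not under pylint is that running a file as a script puts that file's own directory first on sys.path. The sketch below recreates a minimal src/rssita layout in a temporary directory (the file contents are made up) and checks that `python src/rssita/rssita.py`, run from the project root, sees src/rssita as sys.path[0], which is what lets the bare `from feeds import RSS_FEEDS` succeed.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "src" / "rssita"
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "feeds.py").write_text("RSS_FEEDS = ['example']\n")
    # The question's style: a bare `from feeds import ...`, plus a report of sys.path[0].
    (pkg_dir / "rssita.py").write_text(
        "import sys\n"
        "from feeds import RSS_FEEDS\n"
        "print(sys.path[0])\n"
    )
    # Equivalent of `python src/rssita/rssita.py` from the project root.
    result = subprocess.run(
        [sys.executable, str(Path("src") / "rssita" / "rssita.py")],
        cwd=tmp,
        capture_output=True,
        text=True,
        check=True,
    )
    # Joining handles both absolute and relative sys.path[0]; resolve() handles symlinked temp dirs.
    first_entry = (Path(tmp) / result.stdout.strip()).resolve()
    script_dir_first = first_entry == pkg_dir.resolve()

print("script directory is sys.path[0]:", script_dir_first)
```

pylint analyses rssita.py as part of the rssita package instead, so it has no reason to look for a top-level module named feeds.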
The correct imports should be package-relative:
from .feeds import RSS_FEEDS
from .termcolors import PC
Furthermore, if you intend to run a package, it should either be a runnable package with a __main__.py:
python -m rssita
or you should run the submodule as a module:
python -m rssita.rssita
Because you're using a src layout, you'll either need to create a pyproject.toml so you can use an editable install, or you'll have to set PYTHONPATH=src before you run the command. This ensures the packages are visible at the top level of the PYTHONPATH, and thus correctly importable. Though I'm not a specialist in the interaction of src layouts and runnable packages, so there may be better solutions.
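The fix described above can be sketched end to end: package-relative imports, a __main__.py so that `python -m rssita` works, and PYTHONPATH pointing at src. The module bodies below are hypothetical stand-ins for the question's files, and the PYTHONPATH route is used instead of an editable install only because it needs no packaging metadata.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Trimmed-down src layout from the question, with the imports rewritten as package-relative.
FILES = {
    "src/rssita/__init__.py": "",
    "src/rssita/feeds.py": "RSS_FEEDS = ['feed-a', 'feed-b']\n",
    "src/rssita/termcolors.py": "PC = '>'\n",
    "src/rssita/rssita.py": (
        "from .feeds import RSS_FEEDS\n"
        "from .termcolors import PC\n"
        "\n"
        "def main():\n"
        "    print(PC, len(RSS_FEEDS))\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    main()\n"
    ),
    # Makes `python -m rssita` work by delegating to rssita.rssita.main().
    "src/rssita/__main__.py": "from .rssita import main\n\nmain()\n",
}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel_path, body in FILES.items():
        target = root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(body)
    # Same effect as `PYTHONPATH=src python -m ...` run from the project root.
    env = {**os.environ, "PYTHONPATH": str(root / "src")}
    outputs = [
        subprocess.run(
            [sys.executable, "-m", module],
            cwd=root, env=env, capture_output=True, text=True, check=True,
        ).stdout.strip()
        for module in ("rssita", "rssita.rssita")
    ]

print(outputs)
```

Both invocations import the modules through the rssita package, so the leading-dot imports resolve the same way in each case.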

Importing files from different folder, "__init__.py" does not work

I have seen many responses suggesting adding an __init__.py file to the subdirectory of submodules in order to import them as a Python package, but I can't get it working for my project. My project directory structure looks like this:
helm-2022
├── model
│   ├── __init__.py
│   ├── model.ipynb
│   └── torchModelSummary.py
├── preprocess
│   └── preprocess.py
└── utils
    ├── __init__.py
    └── vis_utils.py
I want to import functions from vis_utils.py in the notebook model.ipynb (under the model folder) and in preprocess.py (under the preprocess folder). I have already added an empty __init__.py under the utils folder. When I tried to import in model.ipynb using from utils import vis_utils, I still got No module named 'utils'. I have also tried to import by including the top directory, from helm-2022.utils import vis_utils, but that gives me a syntax error because of the hyphen. I don't have permission to change the top directory name, so removing the hyphen is not an option. Thank you so much in advance.
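No answer is shown for this question, so the following is only one common workaround, sketched under assumptions. Jupyter usually starts the kernel in the notebook's own directory (model/), which is why `utils` is not found. Putting the helm-2022 directory itself on sys.path makes `utils` a top-level package, so the hyphen never has to appear in an import statement. `add_project_root` is a hypothetical helper that walks up from a starting directory until it finds the utils folder.

```python
import sys
import tempfile
from pathlib import Path


def add_project_root(start: Path, marker: str = "utils") -> Path:
    """Walk up from `start` to the first directory containing `marker/`, and put it on sys.path.

    Adding the *root* directory (helm-2022) to sys.path makes `utils` a top-level
    package, so the hyphen in the root's name never appears in an import.
    """
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            if str(candidate) not in sys.path:
                sys.path.insert(0, str(candidate))
            return candidate
    raise FileNotFoundError(f"no directory above {start} contains {marker!r}")


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical copy of the question's layout.
    root = Path(tmp) / "helm-2022"
    (root / "model").mkdir(parents=True)
    (root / "utils").mkdir()
    (root / "utils" / "__init__.py").write_text("")
    (root / "utils" / "vis_utils.py").write_text("def describe():\n    return 'vis_utils loaded'\n")

    # In the notebook this would be add_project_root(Path.cwd()), assuming the kernel starts in model/.
    found_root = add_project_root(root / "model")
    from utils import vis_utils  # resolved through the helm-2022 entry just added to sys.path

    message = vis_utils.describe()

print(found_root.name, message)
```

In preprocess.py the starting point would be Path(__file__).resolve().parent rather than the notebook's working directory.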

Relative Repo Path in __init__.py?

I'll share a problem that we are kinda having at work. We have a code repo and there are files spread all around (currently trying to get everyone to clean up). The issue is that with every file that gets created we are having to make a RELATIVE_REPO_PATH (RRP) in order to import other files in our repo. The repo is hosted using mercurial, and all users have a clone of the repo on their local machines. They each have the ability to push and pull updates as needed. This means that the RRP cannot be hardcoded in anywhere. Below I'll show an example of our structure, and also show what we currently do to get a RRP.
An example of our structure.
.
├── __init__.py
├── misc_functions
│   ├── __init__.py
│   ├── myTest.py
│   └── __pycache__
│       └── __init__.cpython-37.pyc
└── Test
    ├── __init__.py
    ├── pythonfile1 (copy).py
    └── pythonfile1.py
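Here is what we currently do at work:
import os
import sys
if not hasattr(sys, 'frozen'):
    # Running from source: the repo root is the parent of this file's directory.
    RELATIVE_REPO_PATH = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
    if __name__ == '__main__':
        # Run as a script: put the repo root on sys.path and drop the script's own directory (sys.path[0]).
        sys.path.append(RELATIVE_REPO_PATH)
        sys.path.pop(0)
else:
    # Frozen executable: use the executable's directory instead.
    RELATIVE_REPO_PATH = os.path.dirname(sys.executable)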
As you can see, depending on how deep you are from the root of the repo, you need to add more or fewer os.path.dirname calls. We have tried doing this with while loops; however, a lot of our files import each other and it's kind of a mess.
Is there any way to just add a simple snippet of code to the top-level __init__.py, or to all the __init__.py files, that would eliminate the constant copying and pasting of this code?
Just to clarify, everything we do works fine. We want to know if there is a way to set up a global RRP in the __init__.py that will work anywhere there is an __init__.py.
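One way to avoid counting os.path.dirname calls is to search upward for a marker that only exists at the repository root. The sketch below assumes the Mercurial clone's .hg directory is that marker and keeps the existing sys.frozen branch; `find_repo_root` and `relative_repo_path` are hypothetical names. It still has to live in each entry script, or in a module those scripts can already import, so it shortens the snippet rather than removing the bootstrapping problem entirely.

```python
import os
import sys
import tempfile
from pathlib import Path


def find_repo_root(start: str, marker: str = ".hg") -> Path:
    """Return the nearest directory at or above `start` that contains `marker`.

    Replaces the fixed chain of os.path.dirname() calls, so the same code
    works at any depth inside the clone.
    """
    here = Path(start).resolve()
    for candidate in (here, *here.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker} directory above {here}")


def relative_repo_path(module_file: str) -> str:
    """Frozen builds keep the original sys.executable rule; source runs search for .hg."""
    if getattr(sys, "frozen", False):
        return os.path.dirname(sys.executable)
    return str(find_repo_root(os.path.dirname(module_file)))


# Demo with a throwaway directory tree standing in for a clone.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "repo"
    (repo / ".hg").mkdir(parents=True)
    deep_file = repo / "Test" / "nested" / "pythonfile1.py"
    deep_file.parent.mkdir(parents=True)
    deep_file.write_text("")
    # In a real file this would be relative_repo_path(__file__).
    matches = Path(relative_repo_path(str(deep_file))) == repo.resolve()

print(matches)
```

The frozen branch is kept because a bundled executable typically runs outside the clone, where no .hg directory exists.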

PyQt5 Project Structure and PyInstaller ModuleNotFoundError

I'm in the process of restructuring my PyQt5 application to follow more established conventions. It now looks like this:
MyProj
├── my_qt_tool
│   ├── __init__.py
│   ├── class1.py
│   ├── my_qt_tool.py
│   ├── wizard1.py
│   ├── resources
│   │   └── templates
│   │       └── tool.conf.template
│   └── ui
│       ├── __init__.py
│       ├── mainwindow.py
│       ├── mainwindow.ui
│       ├── wizard_01_start.py
│       ├── wizard_01_start.ui
│       └── ...
├── my_qt_tool.spec # for PyInstaller
├── bin
│   └── generate_ui_code.py # for compiling Qt *.ui to *.py
├── dist
│   └── my_qt_tool
├── environment.yml # conda environment requirements.
├── LICENSE
└── README.md
So MyProj is the top-level git repo, my_qt_tool is the package of my application, with a subpackage for UI specific code, my_qt_tool.py contains the "main" code which runs the GUI, class1.py handles business logic and wizard1.py is just some extra class for a GUI wizard.
Q1: Is this project structure canonical? Is the main function where it should be? Should *.ui files be separated to resources?
Now, after some haggling with imports, I added my_qt_tool as a source directory in PyCharm to make the imports work, and created a run configuration for my_qt_tool.py with the working directory MyProj/my_qt_tool.
Q2: Technically, I want the working dir to be MyProj, but then I would have to reference resources/templates/tool.conf.template with my_qt_tool/resources.., which seems yucky... or is this the way to do it?
Now the imports in my_qt_tool look like this:
from class1 import DataModel
from ui.mainwindow import Ui_MainWindow
...
so no relative imports or the like, because everything is in the same package, right? (Again: to make this work, I had to add my_qt_tool as source directory in my PyCharm project settings...)
Q3: Okay, now the thing that doesn't work. Running PyInstaller on the spec file, which is pretty much stock with Analysis(['my_qt_tool/my_qt_tool.py'], ..., the resulting binary fails to start with the error message: ModuleNotFoundError: No Module named 'class1'. How can I fix this up?
Q1
If the project is going to get larger, you may create module-specific folders, with each module holding its .py and GUI files. Structure them as MVC project folders.
For the MVC folder structure, see https://www.tutorialsteacher.com/mvc/mvc-folder-structure, and here is how a model-view architecture can be implemented: https://www.learnpyqt.com/courses/model-views/modelview-architecture/.
Q2
Read resources/templates/tool.conf.template when the application is bootstrapped instead of referencing it statically. This could be done in generate_ui_code.py to load all configs as part of the app reference.
so no relative imports or the like, because everything is in the same package, right? (Again: to make this work, I had to add my_qt_tool as source directory in my PyCharm project settings...)
There is no need to add my_qt_tool as a source directory if the app is bootstrapped properly.
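For Q2, one way to stop depending on the working directory at all is to resolve resources relative to the package's own files. The sketch below assumes a hypothetical my_qt_tool/paths.py module with a `resource_path` helper built on `Path(__file__)`; the temporary directory only exists so the example runs on its own.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module my_qt_tool/paths.py: resolves resources relative to the
# package directory, so the working directory no longer matters.
PATHS_MODULE = '''\
from pathlib import Path

PACKAGE_DIR = Path(__file__).resolve().parent


def resource_path(*parts):
    """Absolute path to a file under my_qt_tool/resources/."""
    return PACKAGE_DIR.joinpath("resources", *parts)
'''

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "my_qt_tool"
    (pkg / "resources" / "templates").mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "paths.py").write_text(PATHS_MODULE)
    (pkg / "resources" / "templates" / "tool.conf.template").write_text("key = value\n")

    sys.path.insert(0, tmp)  # stands in for launching from MyProj/
    paths = importlib.import_module("my_qt_tool.paths")
    template = paths.resource_path("templates", "tool.conf.template")
    template_text = template.read_text()

print(template.name, template_text.strip())
```

For a PyInstaller build, the template still has to be bundled (for example through the spec's datas), placed under the same relative directory so the __file__-based lookup can find it; that is worth verifying against your own build.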
Q3
Add these two lines at the top of the spec file:
import sys
sys.setrecursionlimit(5000)
If you still encounter the class1-related issue, try importing it in the script PyInstaller is called on, which in your case is my_qt_tool.py.
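For Q3, an alternative that the answer does not propose is to make my_qt_tool an ordinary package with package-relative imports and start it from a small launcher script placed next to it in MyProj. Python then treats MyProj as the import root (the launcher's directory becomes sys.path[0]), so `class1` is resolved as my_qt_tool.class1, and PyInstaller, which also searches the entry script's directory, should follow the same imports. The sketch uses hypothetical stub modules without Qt, and run_my_qt_tool.py is a made-up name; the spec's Analysis would then point at that launcher instead of my_qt_tool/my_qt_tool.py. This has not been tested against the poster's spec file.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stubs mirroring the question's package (no Qt needed to show the import mechanics).
FILES = {
    "my_qt_tool/__init__.py": "",
    "my_qt_tool/class1.py": "class DataModel:\n    name = 'DataModel'\n",
    "my_qt_tool/ui/__init__.py": "",
    "my_qt_tool/ui/mainwindow.py": "class Ui_MainWindow:\n    pass\n",
    # Package-relative imports instead of `from class1 import ...`.
    "my_qt_tool/my_qt_tool.py": (
        "from .class1 import DataModel\n"
        "from .ui.mainwindow import Ui_MainWindow\n"
        "\n"
        "def main():\n"
        "    print(DataModel.name, Ui_MainWindow.__name__)\n"
    ),
    # Launcher outside the package: its directory (MyProj) becomes sys.path[0].
    "run_my_qt_tool.py": "from my_qt_tool.my_qt_tool import main\n\nmain()\n",
}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel_path, body in FILES.items():
        target = root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(body)
    output = subprocess.run(
        [sys.executable, str(root / "run_my_qt_tool.py")],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

print(output)
```

With this layout, the PyCharm source-directory setting from the question should no longer be needed, since the launcher already makes MyProj the import root.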
Fix the PyInstaller issue first, and then consider refactoring the folder structure along model-view conventions.
Here are some fairly large example projects:
https://wiki.python.org/moin/PyQt/SomeExistingApplications
https://github.com/topics/pyqt5-desktop-application
