We have a Python project structured as follows; airflow is a new addition:
├── python
│ ├── airflow
│ │ ├── airflow.cfg
│ │ ├── config
│ │ ├── dags
│ │ ├── logs
│ │ ├── requirements.txt
│ │ └── webserver_config.py
│ ├── shared_utils
│ │ ├── auth
│ │ ├── datadog
│ │ ├── drivers
│ │ ├── entities
│ │ ├── formatter
│ │ ├── helpers
│ │ └── system
...
We have several other packages at the same level as shared_utils; some are common libraries and some are standalone backend services.
We want to keep the airflow part independent while still benefiting from the common libraries. The python folder is in PYTHONPATH, and python/airflow is in PYTHONPATH as well (currently airflow doesn't import any code from the other packages).
I am wondering how I can call code from shared_utils in my airflow DAGs, or how I should organize the project structure to make that possible?
UPDATE:
It seems there is no conflict when I put both python and python/airflow in PYTHONPATH. After adding the requirements from shared_utils to airflow's requirements.txt, it works as expected.
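For reference, once both paths are on PYTHONPATH, the import in a DAG file is plain. A quick sketch (the helpers submodule exists in the tree above, but the helper name itself is an assumption):

# inside a DAG file under python/airflow/dags/
# shared_utils resolves because python/ is on PYTHONPATH;
# some_helper is a hypothetical function name
from shared_utils.helpers import some_helper

def build_payload():
    return some_helper()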
I have a project with this layout:
|
|-- dags/
|---- dag.py
|-- logs/
|-- plugins/
|---- __init__.py
|---- core.py
|-- airflow.cfg
I keep the core functionality in core.py.
When I want to use the code from core.py in dag.py, I do the following:
from core import <some function>
Note:
This is my airflow.cfg file; it registers the plugins folder so the PythonVirtualenvOperator can find the code in plugins.
[core]
dags_folder = {AIRFLOW_HOME}/dags
plugins_folder = {AIRFLOW_HOME}/plugins
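For completeness, a dag.py along those lines might look like the sketch below (some_function stands in for whatever core.py actually exposes, and the operator import path matches Airflow 2.x):

# dags/dag.py (sketch) -- some_function is a placeholder for a callable
# defined in plugins/core.py; the import resolves because plugins_folder
# is added to the path
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from core import some_function

with DAG(dag_id="core_example",
         start_date=datetime(2021, 1, 1),
         schedule_interval=None) as dag:
    PythonOperator(task_id="run_core", python_callable=some_function)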
TL;DR:
So for your case, I imagine you can do something like this in airflow.cfg:
plugins_folder = {AIRFLOW_HOME}/shared_utils
You can just move your shared_utils into a new folder my_package under the python folder, then add the my_package path to your PYTHONPATH:
# on your host: single quotes keep $PYTHONPATH from expanding at echo time
echo 'export PYTHONPATH="/path/to/python/my_package:$PYTHONPATH"' >> ~/.profile
# in the airflow Docker image
ENV PYTHONPATH="/path/to/python/my_package"
Now you can import from your package in any Python console:
from shared_utils.auth import module_x
I have a directory structure that looks like this:
.
├── bitbucket-pipelines.yml
├── MANIFEST.in
├── pylintrc
├── setup.cfg
├── setup.py
├── src
│ ├── bin
│ │ ├── __init__.py
│ │ └── project.py
│ ├── __init__.py
│ └── ml_project
│ ├── configurations
│ │ └── precommit
│ ├── core
│ │ ├── command
│ │ │ ├── abs_command.py
│ │ │ ├── __init__.py
│ │ │ ├── no_command.py
│ │ │ ├── precommit.py
│ │ │ ├── project_utils.py
│ │ │ ├── setupsrc.py
│ │ │ └── setuptox.py
│ │ ├── configurations
│ │ │ └── precommit
│ │ └── __init__.py
│ └── __init__.py
└── tox.ini
When I do the packaging for the project, my requirement is basically to copy the .gitlint and .pre-commit-config.yaml files into the configurations/precommit folder of my ml_project package. configurations is just a normal directory, not a Python package, as it contains no .py files.
A small edit: the .gitlint and .pre-commit-config.yaml files are at the same level as setup.py.
My setup.py looks like this:
"""Setup script."""
import io
import re
import os
import shutil
from setuptools import setup
PROJECT_NAME = "ml_project"
CONFIGURATIONS_DIR_NAME = "configurations"
FULL_CONFIG_DIR = os.path.join("src", PROJECT_NAME, CONFIGURATIONS_DIR_NAME)
def get_version() -> str:
"""Return the version stored in `ml_project/__init__.py:__version__`."""
# see https://github.com/pallets/flask/blob/master/setup.py
with io.open("src/ml_project/__init__.py", "rt", encoding="utf8") as init_file:
return re.search(r'__version__ = "(.*?)"', init_file.read()).group(1)
def add_config_files_for_package(source_dir: str = None) -> None:
if not source_dir:
source_dir = os.path.dirname(os.path.abspath(__file__))
config_files = {"precommit": [".gitlint", ".pre-commit-config.yaml"]}
for config in config_files:
config_dir = os.path.join(source_dir, FULL_CONFIG_DIR, config)
for file in config_files[config]:
shutil.copyfile(
os.path.join(source_dir, file), os.path.join(config_dir, file)
)
add_config_files_for_package()
setup(version=get_version())
So I am using the add_config_files_for_package function to do the copying when I run python setup.py sdist.
I have a MANIFEST.in file which looks like this:
include .gitlint
include .pre-commit-config.yaml
graft src/ml_project
And finally, below is my setup.cfg:
[options]
package_dir =
    =src
packages = find:
include_package_data = true
install_requires =
    click
    pre-commit
    pyyaml
    gitlint

[options.packages.find]
where = src

[options.entry_points]
console_scripts =
    project = bin.project:main

[options.extras_require]
tests =
    pytest
    pytest-mock
    pyfakefs
    pyyaml
    configparser
linting =
    pylint
testdocs =
    pydocstyle
pre-commit =
    pre-commit

[semantic_release]
version_variable = ml_project/__init__.py:__version__
This runs fine, but my question is: is there a better, more standard way of doing this? Ideally without writing that function in the first place at all?
Thanks for any pointers in advance.
As mentioned in the comments, it could be a good idea to place these files in the src/ml_project/configurations/precommit directory and create symbolic links to these files at the root of the project. Symbolic links should play well with git, but some platforms (Windows for example) don't have good support for them.
Alternatively, copying these files could be just another step in the build process (perhaps via a custom setuptools command), triggered from a Makefile (or any similar tool) and from the CI/CD toolchain (bitbucket-pipelines.yml in this case).
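A sketch of that custom-command variant, reusing the paths from the question; treat it as one possible approach rather than the standard one:

# setup.py (sketch) -- copy the config files as part of `sdist` instead of
# at import time; paths follow the question's layout
import os
import shutil

from setuptools import setup
from setuptools.command.sdist import sdist as _sdist


class SdistWithConfigs(_sdist):
    def run(self):
        here = os.path.dirname(os.path.abspath(__file__))
        target = os.path.join(here, "src", "ml_project", "configurations", "precommit")
        for name in (".gitlint", ".pre-commit-config.yaml"):
            shutil.copyfile(os.path.join(here, name), os.path.join(target, name))
        _sdist.run(self)  # then build the sdist as usual


setup(cmdclass={"sdist": SdistWithConfigs})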
So I'm trying to use this GitHub repository. I put it in my site-packages folder and tested this example, but I got the error cannot import name 'market_candles'. What could be causing this problem? I have already made sure that TA-Lib, Pandas and Matplotlib are installed, so where could the problem be? I'm looking at the __init__.py and it seems fine.
Cloning the repository directly into your site-packages will result in nested pyttrex folders. Copy only the pyttrex/pyttrex directory into site-packages:
pyttrex
├── LICENSE
├── README.md
├── pyttrex
│ ├── ADX.py
│ ├── __init__.py
│ ├── average_n
│ ├── average_true_range.py
│ ├── backtest.py
│ └── test.py
└── tgnotifier.py
I have a project with a directory structure like the one below:
.
├── Pipfile
├── Pipfile.lock
├── module
│ ├── __init__.py
│ ├── helpers
│ │ ├── __init__.py
│ │ ├── __pycache__
│ │ │ └── __init__.cpython-36.pyc
│ │ ├── dynamo.py
│ │ └── logger.py
│ └── test.py
Relevant code:
logger.py

import click
import sys
from tabulate import tabulate


def formatter(string, *rest):
    return string.format(*rest)


def info(*rest):
    """Write text in blue color."""
    click.echo(click.style('☵ ' + formatter(*rest), fg='blue'))
test.py
import helpers
helpers.logger.info('Trying')
When I try to run it using the command
python3 module/test.py
I get this error:
Traceback (most recent call last):
File "module/test.py", line 4, in <module>
helpers.logger.info('Trying')
AttributeError: module 'helpers' has no attribute 'logger'
I have tried restructuring the code, putting the helpers directory outside, at the same level as the module directory. But it still didn't work, which it shouldn't have, from what I have read. I tried researching a bit about __init__.py and the Python module system. The more I read, the more confusing it gets. But from whatever I learned, I created another sample project, with the following structure:
.
└── test
├── __init__.py
├── helpers
│ ├── __init__.py
│ ├── __pycache__
│ │ ├── __init__.cpython-36.pyc
│ │ └── quote.cpython-36.pyc
│ └── quote.py
├── index.py
├── logger
│ ├── __init__.py
│ ├── __pycache__
│ │ ├── __init__.cpython-36.pyc
│ │ └── info.cpython-36.pyc
│ └── info.py
The code is the same as in the first project.
Here, when I do
python3 test/index.py
it works as expected. The only difference between the two projects:
In the first project, I used pipenv to install dependencies and create a virtual environment.
Using your initial layout (with logger as a submodule of the helpers package), you'd need to explicitly import logger in helpers/__init__.py to expose it as an attribute of the helpers package:
# helpers/__init__.py
from . import logger
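With that line in place, the original test.py works unchanged:

import helpers

helpers.logger.info('Trying')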
logger is a module, not an attribute, and helpers.logger evaluates logger as an attribute. Instead you should do the following:
from helpers import logger
logger.info('Trying')
I want to create a PyBuilder project with unit tests and packages. As an example, I modified the simple Python app example, moving "helloworld" into a package "hello".
My first instinct was to match the package structure between the "main" and "unittest" sources:
+---src
+---main
| \---python
| \---hello
| helloworld.py
| __init__.py
|
\---unittest
\---python
\---hello
helloworld_tests.py
__init__.py
This does not work because of the conflicting "hello" package:
BUILD FAILED - 'module' object has no attribute 'helloworld_tests'
I see that PyBuilder itself just skips the top-level pybuilder package in its unit tests, but that won't work if there are multiple top-level packages.
My second guess would be to create an extra top-level package for the unit tests:
\---unittest
\---python
\---tests
| __init__.py
\---hello
helloworld_tests.py
__init__.py
Is there a better solution or an established convention for organizing Python tests in packages?
Probably nothing really new for the OP, but I just wanted to collect all options that I could come up with in one place:
1) Just append _tests to names of top-level packages
The easiest way to mirror the structure of src/main/python in src/test/python almost 1:1 is to simply append _tests to the names of the top-level packages. For example, if I have only one top-level package rootPkg, then I can add the corresponding rootPkg_tests to the test/ subdirectory:
src
├── main
│ ├── python
│ │ └── rootPkg
│ │ ├── __init__.py
│ │ ├── pkgA
│ │ │ ├── __init__.py
│ │ │ └── modA.py
│ │ └── pkgB
│ │ ├── __init__.py
│ │ └── modB.py
│ └── scripts
│ └── entryPointScript.py
└── test
└── python
└── rootPkg_tests
├── __init__.py
├── pkgA
│ ├── __init__.py
│ └── modA_tests.py
└── pkgB
├── __init__.py
└── modB_tests.py
This seems to work nicely with PyBuilder 0.11.15 and the unittest plugin (notice that I've deviated from PyBuilder's convention and put the tests in test instead of unittest; you probably shouldn't do this if you intend to use multiple testing frameworks).
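For reference, a build.py for this layout might look roughly like the sketch below (property names are from PyBuilder 0.11.x; double-check them against your version):

# build.py (sketch) -- point the unittest plugin at test/ instead of the
# default unittest/ directory
from pybuilder.core import use_plugin, init

use_plugin("python.core")
use_plugin("python.unittest")

default_task = "publish"


@init
def initialize(project):
    project.set_property("dir_source_unittest_python", "src/test/python")
    project.set_property("unittest_module_glob", "*_tests")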
2) If there is only one package: do it like PyBuilder
PyBuilder is itself built with PyBuilder. This is what its source directory looks like (drastically reduced, unnecessary details omitted):
src
├── main
│ ├── python
│ │ └── pybuilder
│ │ ├── __init__.py
│ │ ├── cli.py
│ │ ├── core.py
│ │ └── plugins
│ │ ├── __init__.py
│ │ ├── core_plugin.py
│ │ └── exec_plugin.py
│ └── scripts
│ └── pyb
└── unittest
└── python
├── cli_tests.py
├── core_tests.py
├── plugins
│ ├── exec_plugin_tests.py
│ ├── __init__.py
│ ├── python
│ │ ├── core_plugin_tests.py
│ │ ├── __init__.py
If I understand it correctly, the tree in unittest mirrors the tree in src, but the directory for the top-level package pybuilder is omitted. That's what you described in your question as the first workaround. The drawback is that it doesn't really work if there are multiple top-level packages.
3) Add one additional tests top-level package
That's what you proposed as a workaround: mirror the tree in main, but wrap everything in an additional tests package. This works with many top-level packages in /src/main/python and prevents any package name collisions.
I'm not aware of any established convention here.
I have the following file structure:
ihe/
├── dcmt
│ ├── actions
│ ├── calendar_observer
│ ├── cms
│ ├── consumption
│ ├── data_mining
│ ├── dcmt
│ ├── dcmt_db
│ ├── dcmt_db.bak.bak
│ ├── dcmt_db.sqlite
│ ├── devices
│ ├── d.py
│ ├── gadgets
│ ├── history
│ ├── houses
│ ├── hwc_settings
│ ├── __init__.py
│ ├── __init__.pyc
│ ├── manage.py
│ ├── notifications
│ ├── profitable
│ ├── rules
│ └── schedule
├── hwc
│ ├── configuration
│ ├── daemons
│ ├── database
│ ├── __init__.py
│ ├── __init__.pyc
│ ├── utils
│ └── wrapper
├── __init__.py
├── __init__.pyc
dcmt is a Django project; hwc is pure Python. However, in hwc/daemons, for instance, there is a runme.py script, and in that script I want to be able to import the models from the Django project. As I understand it, I have to set the correct Python path and then somehow configure the Django settings. My question is: how do I best do this so that I only have to do it once for the whole hwc module?
Your project structure seems a bit confused.
It's probably not a good idea to have a Django project inside another package hierarchy. A lot of the import paths assume your project is a top-level package, and the only reason you're probably not running into issues already is that Python 2.x still supports implicit relative imports (which have been removed in 3.x). This makes references to packages very ambiguous and can cause weird bugs.
From what I can see your settings package is actually called (fully-qualified) ihe.dcmt.hwc_settings. If ihe is in your Python path (check the value of sys.path in the script you're trying to run), that (i.e. the fully-qualified path) is probably what DJANGO_SETTINGS_MODULE should point at.
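Concretely, a standalone script can configure the settings module before importing any models. A rough sketch (the settings path follows the layout above; django.setup() exists only on Django 1.7+):

# hwc/daemons/runme.py (sketch) -- configure Django before importing models;
# the settings module path is taken from the question's layout
import os

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "ihe.dcmt.hwc_settings")

import django
django.setup()  # Django >= 1.7; on older versions the env var alone suffices

from dcmt.devices import models  # hypothetical: project imports now work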
If you want to hook into Django's functionality in your scripts, you might want to look into the documentation for writing custom manage.py commands. This would let you write Django-related scripts more consistently and save you from having to reference and initialise Django's settings correctly yourself.
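A minimal command skeleton, for illustration (the app and command names below are placeholders, as is the runme.main entry point):

# dcmt/devices/management/commands/run_hwc.py (sketch) -- the app
# ("devices") and the command name ("run_hwc") are placeholders
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Run an hwc daemon with Django already configured"

    def handle(self, *args, **options):
        # Django settings are fully initialised here, so hwc code can
        # import the Django models freely
        from hwc.daemons import runme  # hypothetical entry point
        runme.main()

You would then invoke it with python manage.py run_hwc.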