I have a directory structure which looks like below:
.
├── bitbucket-pipelines.yml
├── MANIFEST.in
├── pylintrc
├── setup.cfg
├── setup.py
├── src
│   ├── bin
│   │   ├── __init__.py
│   │   └── project.py
│   ├── __init__.py
│   └── ml_project
│       ├── configurations
│       │   └── precommit
│       ├── core
│       │   ├── command
│       │   │   ├── abs_command.py
│       │   │   ├── __init__.py
│       │   │   ├── no_command.py
│       │   │   ├── precommit.py
│       │   │   ├── project_utils.py
│       │   │   ├── setupsrc.py
│       │   │   └── setuptox.py
│       │   ├── configurations
│       │   │   └── precommit
│       │   └── __init__.py
│       └── __init__.py
└── tox.ini
When I do the packaging for the project, my requirement is basically to copy the .gitlint and .pre-commit-config.yaml files into the configurations/precommit folder of my ml_project package. configurations is just a normal directory and not a Python package, as it does not contain any .py files.
A small edit: the .gitlint and .pre-commit-config.yaml files are at the same level as setup.py.
My setup.py looks like below:
"""Setup script."""
import io
import re
import os
import shutil
from setuptools import setup
PROJECT_NAME = "ml_project"
CONFIGURATIONS_DIR_NAME = "configurations"
FULL_CONFIG_DIR = os.path.join("src", PROJECT_NAME, CONFIGURATIONS_DIR_NAME)
def get_version() -> str:
"""Return the version stored in `ml_project/__init__.py:__version__`."""
# see https://github.com/pallets/flask/blob/master/setup.py
with io.open("src/ml_project/__init__.py", "rt", encoding="utf8") as init_file:
return re.search(r'__version__ = "(.*?)"', init_file.read()).group(1)
def add_config_files_for_package(source_dir: str = None) -> None:
if not source_dir:
source_dir = os.path.dirname(os.path.abspath(__file__))
config_files = {"precommit": [".gitlint", ".pre-commit-config.yaml"]}
for config in config_files:
config_dir = os.path.join(source_dir, FULL_CONFIG_DIR, config)
for file in config_files[config]:
shutil.copyfile(
os.path.join(source_dir, file), os.path.join(config_dir, file)
)
add_config_files_for_package()
setup(version=get_version())
So I am using the add_config_files_for_package function to do the copying when I run python setup.py sdist.
I have a MANIFEST.in file which looks like below:
include .gitlint
include .pre-commit-config.yaml
graft src/ml_project
And finally, below is my setup.cfg:
[options]
package_dir =
    =src
packages = find:
include_package_data = true
install_requires =
    click
    pre-commit
    pyyaml
    gitlint

[options.packages.find]
where = src

[options.entry_points]
console_scripts =
    project = bin.project:main

[options.extras_require]
tests =
    pytest
    pytest-mock
    pyfakefs
    pyyaml
    configparser
linting =
    pylint
testdocs =
    pydocstyle
pre-commit =
    pre-commit

[semantic_release]
version_variable = ml_project/__init__.py:__version__
This runs fine, but my question is: is there a better and more standard way of doing this, ideally without writing the function in the first place at all?
Thanks for any pointers in advance.
As mentioned in the comments, it could be a good idea to place these files in the src/ml_project/configurations/precommit directory and create symbolic links to them at the root of the project. Symbolic links should play well with git, but some platforms (Windows, for example) don't have good support for them.
Or copying these files could be just another step in the build process (possibly via a custom setuptools command, sketched below), triggered from a Makefile (for example, or any other similar tool) and from the CI/CD toolchain (bitbucket-pipelines.yml in this case).
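Here is a minimal sketch of the custom-command idea. It assumes the add_config_files_for_package and get_version helpers from the setup.py above are still defined; the copy then happens only when an sdist is actually built:

from setuptools import setup
from setuptools.command.sdist import sdist as _sdist


class SdistWithConfigs(_sdist):
    """sdist variant that first copies the repo-level config files into the package."""

    def run(self):
        add_config_files_for_package()  # the helper defined earlier in setup.py
        _sdist.run(self)


setup(version=get_version(), cmdclass={"sdist": SdistWithConfigs})

With this in place, the unconditional add_config_files_for_package() call at import time can be dropped, since the copy is tied to the build step.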
Related
We have a Python project structure as follows (airflow is a new addition):
├── python
│   ├── airflow
│   │   ├── airflow.cfg
│   │   ├── config
│   │   ├── dags
│   │   ├── logs
│   │   ├── requirements.txt
│   │   └── webserver_config.py
│   ├── shared_utils
│   │   ├── auth
│   │   ├── datadog
│   │   ├── drivers
│   │   ├── entities
│   │   ├── formatter
│   │   ├── helpers
│   │   └── system
...
We have several other packages at the same level as shared_utils; some are common libraries and some are standalone backend services.
We want to keep the airflow part independent while still benefiting from the common libraries. We have the python folder in PYTHONPATH, and python/airflow is in PYTHONPATH as well (currently airflow doesn't import any code from other packages).
I am wondering how I can call code from shared_utils in my airflow DAGs, or how I should organize the project structure to make that possible?
UPDATE:
It seems there is no conflict when I set both python and python/airflow in PYTHONPATH; after adding the requirements from shared_utils to airflow, it works as expected (see the sketch below).
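A hypothetical illustration of such an import from a DAG file (module_x is a placeholder name, as in the answer further below; the layout matches the tree above):

# in python/airflow/dags/some_dag.py, with python/ on PYTHONPATH
from shared_utils.auth import module_x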
I have a project with this layout:
|
|-- dags/
|---- dag.py
|-- logs/
|-- plugins/
|---- __init__.py
|---- core.py
|-- airflow.cfg
And then I keep the core stuff in core.py.
When I want to use the code in the core.py file then I will in the dag.py do the following:
from core import <some function>
Note:
This is my airflow.cfg file; it registers the plugins folder so the PythonVirtualenvOperator can find the code in plugins.
[core]
dags_folder = {AIRFLOW_HOME}/dags
plugins_folder = {AIRFLOW_HOME}/plugins
TL;DR:
So for your case, I would imagine you can do something like this in the airflow.cfg:
plugins_folder = {AIRFLOW_HOME}/shared_utils
You can just move your shared_utils to a new folder my_package in the python folder, then add the my_package path to your PYTHONPATH:
# in your host
echo 'export PYTHONPATH="/path/to/python/my_package:$PYTHONPATH"' >> ~/.profile
# in airflow docker image
ENV PYTHONPATH="/path/to/python/my_package"
Now you can import from your package in any Python console:
from shared_utils.auth import module_x
I have several related projects that I think will be a good fit for Python's namespace packages. I'm currently running Python 3.8 and have created the following directory structure for testing.
├── namespace-package-test.package1
│   ├── LICENSE.txt
│   ├── README.md
│   ├── setup.cfg
│   ├── setup.py
│   ├── src
│   │   └── pkg1
│   │       ├── cli
│   │       │   ├── __init__.py
│   │       │   └── pkg1_cli.py
│   │       └── __init__.py
│   └── tests
├── namespace-package-test.package2
│   ├── AUTHORS.rst
│   ├── CHANGELOG.rst
│   ├── LICENSE.txt
│   ├── README.md
│   ├── setup.cfg
│   ├── setup.py
│   ├── src
│   │   └── pkg2
│   │       ├── cli
│   │       │   ├── __init__.py
│   │       │   └── pkg2_cli.py
│   │       └── __init__.py
│   └── tests
The entire project is on a private Bitbucket (cloud) server at:
git@bitbucket.org:<my-company>/namespace-package-test.git
I would like to install, locally, only package 1. I've tried every iteration I can imagine of the following, but nothing seems to get me there. I either get a repository not found error or a setup.py not found error.
pip install git+ssh://git@bitbucket.org:<my-company>/namespace-package-test.package1.git
Is this possible?
Is my project structure correct for what I am doing?
What should the pip install command look like?
Bonus, what if I only want to install a specific spec using pipx?
pipx install "namespace-package-test.package1[cli] # git+ssh://git#bitbucket.org:<my-company>/namespace-package-test.package1.git"
I think I figured it out ... for posterity's sake:
Pip install (into a virtual environment):
pip install git+ssh://git@bitbucket.org/<company name>/namespace-package-test.git/#subdirectory=namespace-package-test.package1
pipx install, with a spec:
pipx install "namespace-package-test.package1[cli] @ git+ssh://git@bitbucket.org/<company name>/namespace-package-test.git/#subdirectory=namespace-package-test.package1"
I want to run my built CLI like other CLI tools, e.g. kubectl, redis, etc. Currently, I run my CLI as python3 cli.py subarg --args; instead, I want to run invdb subarg --args, where invdb is the Python package.
The structure of the project repository is:
.
├── CHALLENGE.md
├── Pipfile
├── Pipfile.lock
├── README.md
├── __pycache__
│   └── config.cpython-38.pyc
├── data_platform_challenge_darwin
├── data_platform_challenge_linux
├── data_platform_challenge_windows
├── discussion_answers_rough_work
├── dist
│   ├── invdb-0.0.1.tar.gz
│   └── invdb-tesla-kebab-mai-haddi-0.0.1.tar.gz
├── example.json
├── invdb
│   ├── __init__.py
│   ├── analysis.py
│   ├── cleanup.py
│   ├── cli.py
│   ├── config.py
│   ├── etl.py
│   ├── groups.py
│   ├── initialize_db.py
│   └── nodes.py
├── invdb.egg-info
│   ├── PKG-INFO
│   ├── SOURCES.txt
│   ├── dependency_links.txt
│   └── top_level.txt
├── setup.py
├── test.db
└── tests
setuptools (or is it distutils? The line is so blurry) provides an entry_points.console_scripts option that can do this for you when installing your package. I will provide an example repository at the bottom of my summary.
Construct a project tree like so:
# /my_package/__init__.py  (can be empty)

# /my_package/my_module.py
def main():
    print("We did it!")

# /pyproject.toml
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
# this is required for Python to recognize setuptools as the build backend

# /setup.cfg
[metadata]
name = sample_module
version = 0.0.1
author = Adam Smith
description = console_script example

[bdist_wheel]
universal = true

[options]
packages = my_package
python_requires = >=2.7

[options.entry_points]
console_scripts =
    sample_module = my_package.my_module:main
then run the following at the shell:
$ python3 -m pip install .
(ed. this will install the file locally. To build a wheel (to install elsewhere) try pep517)
If you get a warning about the installed script not being on your PATH, you should consider adding it. Otherwise, just run your new script:
$ sample_module
We did it!
GitLab: nottheeconomist/console_script_example
Since you already have a setup.py, consider adding the following entry to your setuptools.setup call:
# ...
setuptools.setup(
    # ...
    entry_points={
        'console_scripts': ['sample_module=my_package.my_module:main']
    }
)
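Applied to the repository in the question, a minimal sketch might look like this (it assumes invdb/cli.py exposes a main() callable, which the post doesn't show):

# setup.py -- sketch for the invdb layout above
from setuptools import setup, find_packages

setup(
    name="invdb",
    version="0.0.1",
    packages=find_packages(exclude=["tests"]),
    entry_points={
        "console_scripts": [
            "invdb = invdb.cli:main",  # `invdb` on the PATH calls invdb.cli:main()
        ]
    },
)

After pip install ., running invdb subarg --args should then work like any other CLI tool.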
I wrote a small tool (package) that reuses an existing namespace, pki.server. I named my package pki.server.healthcheck. The old namespace did not use setuptools to install its package, while my package uses it.
Contents of setup.py:
from setuptools import setup

setup(
    name='pkihealthcheck',
    version='0.1',
    packages=[
        'pki.server.healthcheck.core',
        'pki.server.healthcheck.meta',
    ],
    entry_points={
        # creates bin/pki-healthcheck
        'console_scripts': [
            'pki-healthcheck = pki.server.healthcheck.core.main:main'
        ]
    },
    classifiers=[
        'Programming Language :: Python :: 3.6',
    ],
    python_requires='!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*',
    setup_requires=['pytest-runner'],
    tests_require=['pytest'],
)
The installation tree (from scenario 1 below) looks like:
# tree /usr/lib/python3.8/site-packages/pki/
├── __init__.py          <---- Has methods and classes
├── cli
│   ├── __init__.py      <---- Has methods and classes
│   ├── <some files>
├── server
│   ├── cli
│   │   ├── __init__.py  <---- Has methods and classes
│   │   ├── <Some files>
│   ├── deployment
│   │   ├── __init__.py  <---- Has methods and classes
│   │   ├── <some files>
│   │   └── scriptlets
│   │       ├── __init__.py  <---- Has methods and classes
│   │       ├── <some files>
│   ├── healthcheck
│   │   ├── core
│   │   │   ├── __init__.py  <---- EMPTY
│   │   │   └── main.py
│   │   └── pki
│   │       ├── __init__.py  <---- EMPTY
│   │       ├── certs.py
│   │       └── plugin.py
│   └── instance.py      <---- Has class PKIInstance
└── <snip>
# tree /usr/lib/python3.8/site-packages/pkihealthcheck-0.1-py3.8.egg-info/
├── PKG-INFO
├── SOURCES.txt
├── dependency_links.txt
├── entry_points.txt
└── top_level.txt
I read the official documentation and experimented with all 3 suggested methods. I saw the following results:
Scenario 1: Native namespace packages
At first, everything seemed smooth. But:
# This used to work before my package gets installed
>>> import pki.server
>>> instance = pki.server.instance.PKIInstance("pki-tomcat")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'pki.server' has no attribute 'instance'
Now, only this works:
>>> import pki.server.instance
>>> instance = pki.server.instance.PKIInstance("pki-tomcat")
>>> instance
pki-tomcat
Scenario 2: pkgutil-style namespace packages
I am restricted from using this method, as my other __init__.py files contain classes and functions.
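For reference, the pkgutil-style recipe would require each namespace-level __init__.py to contain (ideally only) the following line, which is what the restriction above refers to:

# in pki/__init__.py and pki/server/__init__.py
__path__ = __import__('pkgutil').extend_path(__path__, __name__)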
Scenario 3: pkg_resources-style namespace packages
Though this method is not recommended, I went ahead and experimented with it by adding namespace_packages=['pki.server.healthcheck'] to my setup.py. This made all pki.* modules invisible.
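For reference, the pkg_resources-style recipe (a sketch of the documented approach, not the poster's exact code) pairs a declaration in each namespace __init__.py with a namespace_packages argument in setup.py:

# in each namespace-level __init__.py
__import__('pkg_resources').declare_namespace(__name__)

# in setup.py
setup(
    # ...
    namespace_packages=['pki', 'pki.server', 'pki.server.healthcheck'],
)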
So I am convinced that Scenario 1 is the closest to what I'm trying to achieve. I was reading an old post to understand more about how import works in Python.
My question is: Why does a perfectly working snippet break after I install my package?
Your __init__.py files need to import the files. You have two options, absolute and relative imports:
Relative Imports
pki/__init__.py:
from . import server
pki/server/__init__.py:
from . import instance
Absolute Imports
pki/__init__.py:
import pki.server
pki/server/__init__.py:
import pki.server.instance
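With those imports in place, the original snippet from the question should work again, since importing pki.server now binds instance as an attribute of the pki.server module:

>>> import pki.server   # pki/server/__init__.py now runs `from . import instance`
>>> pki.server.instance.PKIInstance("pki-tomcat")
pki-tomcat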
I have a multiplatform project in which I am required to ship a few third-party executables/data files which are not part of Python. In the source tree I keep them under a data directory, and the main script calls the executable using this line:
trd_prt_exe = os.path.join("tools", "syslinux", "bin", "executable_name")
It works perfectly fine while testing/developing from source. The problem comes when I distribute it using setup.py. After installing the application using setup.py, I get this error:
for path, subdirs, files in os.walk(os.path.join("tools"))
File "/usr/lib/python2.7/os.py", line 276, in walk
names = listdir(top)
TypeError: coercing to Unicode: need string or buffer, NoneType found
Clearly, Python could not find my executables under the data directory.
How can we access these executables/data files both during development and after distribution?
Update I
I could have included it earlier but simply forgot. Here is my complete project structure:
[sundar@arch multibootusb-7.0.0]$ tree
.
├── data
│   ├── multibootusb.desktop
│   └── multibootusb.png
├── LICENSE.txt
├── multibootusb
├── PKG-INFO
├── README.txt
├── scripts
│   ├── admin.py
│   ├── detect_iso.py
│   ├── __init__.py
│   ├── install_distro.py
│   ├── install_syslinux.py
│   ├── isodump.py
│   ├── multibootusb_ui.py
│   ├── qemu.py
│   ├── uninstall_distro.py
│   ├── update_cfg.py
│   └── var.py
├── setup.py
└── tools
    ├── checking.gif
    ├── mbr.bin
    ├── multibootusb
    │   ├── chain.c32
    │   ├── extlinux.cfg
    │   ├── grub.exe
    │   ├── memdisk
    │   ├── menu.c32
    │   ├── menu.lst
    │   ├── syslinux.cfg
    │   └── vesamenu.c32
    ├── multibootusb.png
    ├── syslinux
    │   └── bin
    │       ├── syslinux3
    │       ├── syslinux4
    │       ├── syslinux5
    │       └── syslinux6
    └── version.txt
Here is what I have in setup.py:
from distutils.core import setup
import os

mbusb_version = open(os.path.join("tools", "version.txt"), 'r').read().strip()

setup(
    name='multibootusb',
    version=mbusb_version,
    packages=['scripts'],
    scripts=['multibootusb'],
    platforms=['Linux'],
    url='http://multibootusb.org/',
    license='General Public License (GPL)',
    author='Sundar',
    author_email='feedback.multibootusb@gmail.com',
    description='Create multi boot Live linux on a USB disk...',
    long_description='The multibootusb is an advanced cross-platform application for installing/uninstalling Linux operating systems on to USB flash drives.',
    data_files=[
        ("/usr/share/applications", ["data/multibootusb.desktop"]),
        ('/usr/share/pixmaps', ["data/multibootusb.png"]),
        ('multibootusb/tools', ["tools/checking.gif"]),
        ('multibootusb/tools', ["tools/mbr.bin"]),
        ('multibootusb/tools', ["tools/version.txt"]),
        ('multibootusb/tools/multibootusb', ["tools/multibootusb/chain.c32"]),
        ('multibootusb/tools/multibootusb', ["tools/multibootusb/extlinux.cfg"]),
        ('multibootusb/tools/multibootusb', ["tools/multibootusb/grub.exe"]),
        ('multibootusb/tools/multibootusb', ["tools/multibootusb/memdisk"]),
        ('multibootusb/tools/multibootusb', ["tools/multibootusb/menu.c32"]),
        ('multibootusb/tools/multibootusb', ["tools/multibootusb/menu.lst"]),
        ('multibootusb/tools/multibootusb', ["tools/multibootusb/syslinux.cfg"]),
        ('multibootusb/tools/multibootusb', ["tools/multibootusb/vesamenu.c32"]),
        ('multibootusb/tools/syslinux/bin', ["tools/syslinux/bin/syslinux3"]),
        ('multibootusb/tools/syslinux/bin', ["tools/syslinux/bin/syslinux4"]),
        ('multibootusb/tools/syslinux/bin', ["tools/syslinux/bin/syslinux5"]),
        ('multibootusb/tools/syslinux/bin', ["tools/syslinux/bin/syslinux6"]),
    ]
    # ('multibootusb/tools', ["tools/multibootusb.png"])]
)
The problem I found is that the main executable script multibootusb ends up in /usr/bin/multibootusb, but the other data/third-party executables are under /usr/multibootusb/, and the other modules/scripts required by the main program are under /usr/lib/python2.7/site-packages/scripts. Therefore, the main program is unable to locate my third-party data/executables.
How do I overcome this issue? Where am I going wrong?
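For illustration only, here is a sketch of one way to resolve the paths at runtime, probing both layouts described above. It assumes data_files entries with relative paths land under sys.prefix (so /usr/multibootusb/... with a /usr prefix), and the placement in scripts/ is hypothetical:

# e.g. in scripts/var.py (hypothetical placement)
import os
import sys

def resource_path(*parts):
    """Return the first existing candidate path for a bundled data file."""
    here = os.path.dirname(os.path.abspath(__file__))        # .../scripts
    candidates = [
        os.path.join(os.path.dirname(here), *parts),         # running from the source tree
        os.path.join(sys.prefix, "multibootusb", *parts),    # installed via setup.py
    ]
    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate
    raise IOError("cannot locate resource: " + os.path.join(*parts))

# usage, replacing the relative lookup from the question:
trd_prt_exe = resource_path("tools", "syslinux", "bin", "executable_name")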