I've been searching the net for quite some time now, but I can't seem to wrap my head around how to distribute my Python scripts to my end users.
I've been running my scripts from the command line with this command: python samplemodule.py "args1"
This is also the way I want my users to run it on their end. But my worry is that these modules have dependencies on other libraries and modules.
My scripts work when they are all in the project's root directory, but everything crumbles when I try to package them and move them into subdirectories.
For example, I can no longer run my scripts, since importing a module from the data subdirectory now raises an error.
This is my project structure.
MyProject
├── formatter
│   ├── __init__.py
│   ├── __main__.py
│   ├── formatter.py
│   ├── addfilename.py
│   ├── addscrapertype.py
│   ├── ...
│   └── data
│       ├── __init__.py
│       └── helper.py
├── csv_formatter.py
└── setup.py
The csv_formatter.py file is just a wrapper that calls formatter.main.
Update: I was able to generate a tar.gz package, but the package wasn't callable when installed on my machine.
This is the setup.py:
import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="formatter",
    version="1.0.1",
    author="My Name",
    author_email="sample#email.com",
    description="A package for cleaning and reformatting csv data",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/RhaEL012/Python-Scripts",
    packages=["formatter"],
    include_package_data=True,
    package_data={
        # If any package contains *.txt or *.rst files, include them:
        "": ["*.csv", "*.rst", "*.txt"],
    },
    entry_points={
        "console_scripts": [
            "formatter=formatter.formatter:main"
        ]
    },
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    python_requires='>=3.6',
    install_requires=[
        "pandas"
    ]
)
Now, after installing the package on the machine, I wasn't able to call the module, and it results in an error:
Z:\>addfilename "C:\Users\Username\Desktop\Python Scripts\"
Update: I tried installing the setup.py in a virtual environment just to see where the error was coming from.
I installed it and got the following error: FileNotFoundError: [Errno 2] No such file or directory: 'README.md'
I tried including README.md in MANIFEST.in, but still no luck.
So I replaced the long description with a plain string just to see if the install would proceed.
The install proceeded, but then I ran into another error saying that the package directory 'formatter' does not exist.
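A common way around the README.md error is to resolve the path relative to setup.py itself rather than the current working directory, and fall back gracefully if the file wasn't shipped. This is a sketch, not the asker's actual code; the fallback string is illustrative.

```python
import os

# Resolve README.md relative to setup.py itself, not the current working
# directory, so the file is found no matter where pip builds the package.
here = os.path.abspath(os.path.dirname(__file__))

try:
    with open(os.path.join(here, "README.md")) as fh:
        long_description = fh.read()
except FileNotFoundError:
    # Fallback text (illustrative) in case README.md wasn't included
    long_description = "A package for cleaning and reformatting csv data"
```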
As I am not able to look into your specific files, I will just explain how I usually tackle this issue.
This is how I usually set up command line interface (CLI) tools. The project folder looks like:
Projectname
├── modulename
│ ├── __init__.py # this one is empty in this example
│ ├── cli
│ │ ├── __init__.py # this is the __init__.py that I refer to hereafter
│ ├── other_subfolder_with_scripts
├── setup.py
Where all functionality is within the modulename folder and subfolders.
In my __init__.py I have:

def main():
    # perform the things that need to be done
    # all imports are also done within the function call
    print('doing the stuff I should be doing')

but I think you can also import what you want into __init__.py and still reference it in the same manner in setup.py.
In setup.py we have:
import setuptools

setuptools.setup(
    name='modulename',
    version='0.0.0',
    author='author_name',
    packages=setuptools.find_packages(),
    entry_points={
        'console_scripts': ['do_main_thing=modulename.cli:main']  # this directly refers to a function available in __init__.py
    },
)
Now install the package with pip install "path to where setup.py is". Once it is installed, you can call:
do_main_thing
>>> doing the stuff I should be doing
For the documentation I use: https://setuptools.readthedocs.io/en/latest/.
My recommendation is to start with this and slowly add the functionality you want, then solve your problems step by step, like adding a README.md, etc.
I disagree with the other answer. You shouldn't run scripts in __init__.py but in __main__.py instead.
Projectfolder
├── formatter
│ ├── __init__.py
│ ├── cli
│ │ ├── __init__.py # Import your class module here
│ │ ├── __main__.py # Call your class module here, using __name__ == "__main__"
│ │ ├── your_class_module.py
├── setup.py
If you don't want to supply a readme, just remove that code and enter a description manually.
I use https://setuptools.readthedocs.io/en/latest/setuptools.html#find-namespace-packages instead of manually setting the packages.
You can now install your package by just running pip install ./ as you have been doing before.
After you've done that, run python -m formatter.cli arguments. It runs the __main__.py file you've created in the cli folder (or whatever you've called it).
An important note about packaging modules is that you need to use relative imports. You'd use from .your_class_module import YourClassModule in that __init__.py for example. If you want to import something from an adjacent folder you need two dots, from ..helpers import HelperClass.
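To see those relative imports in action, here is a self-contained sketch that recreates the layout above in a temporary folder and imports through it (the class name and file contents are illustrative):

```python
import importlib
import os
import sys
import tempfile

# Recreate the layout above in a temp dir: formatter/cli/ with a class module.
root = tempfile.mkdtemp()
cli_dir = os.path.join(root, "formatter", "cli")
os.makedirs(cli_dir)

open(os.path.join(root, "formatter", "__init__.py"), "w").close()

with open(os.path.join(cli_dir, "your_class_module.py"), "w") as f:
    f.write("class YourClassModule:\n    greeting = 'hello'\n")

# cli/__init__.py uses a single leading dot: a relative import from the
# same package.
with open(os.path.join(cli_dir, "__init__.py"), "w") as f:
    f.write("from .your_class_module import YourClassModule\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

from formatter.cli import YourClassModule
print(YourClassModule.greeting)  # prints "hello"
```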
I'm not sure if this is helpful, but usually I package my python scripts using the wheel package:
pip install wheel
python setup.py sdist bdist_wheel
After those two commands, a .whl package is created in a 'dist' folder, which you can then either upload to PyPI and download/install from there, or install offline with pip install ${PackageName}.whl
Here's a useful user guide just in case there is something else that I didn't explain:
https://packaging.python.org/tutorials/packaging-projects/
The Setup
OS: Ubuntu 20.04
Python: 3.8.5 | pip: 20.0.2 | venv
Repo
.
├── build
├── dist
├── source.egg-info
├── source
├── readme.md
├── requirements.txt
├── setup.py
└── venv
source dir
.
├── config
├── examples
├── script.py
├── __init__.py
├── tests
└── utils
The important directories within the source directory are config, which contains a few .env and .json files; and utils, which is a package that contains a sub-package called config.
Running script.py, which references config and imports modules from utils, is how the CLI app is started. Ideally when it is run, it should load a bunch of environment variables, create some command aliases and display the application's prompt. (After which the user can start working within that shell.)
I created a wheel to install this application. The setup.py contains an entry point as follows:
entry_points={
'console_scripts': [
'script=source.script:main'
]
}
The Problem
I pip installed the wheel in a test directory with its own virtual environment. When I go to the corresponding site-packages directory and run python script.py, the CLI loads properly with the information about the aliases, etc. However, when I simply run script (the entry point) from the root directory of the environment, the shell loads but I don't see any of the messages about the aliases, and some of the functionality that depends on the utils package isn't available either.
What could I be doing wrong? How can I make the command work as if it was running with all the necessary packages available?
Other information that may be useful
site-packages has copies of config and utils
config is included in the package as part of the package_data parameter in setup.py as ['./config/*.env', './config/*.json']
All import statements begin from source, i.e. from source.utils.config import etc.
which script gives me the location as venv/bin/script, but that bin directory does not have the packages. (Which is expected, I think.)
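A likely culprit is config paths resolved against the current working directory: that works when you cd into site-packages and run python script.py, but not when the installed entry point runs from elsewhere. A minimal sketch of anchoring paths to the module instead; the file name and parser are illustrative, not the project's actual code:

```python
import os

# Anchor data paths to this module's location instead of the current
# working directory. "config/app.env" is an illustrative file name.
PACKAGE_DIR = os.path.dirname(os.path.abspath(__file__))
CONFIG_PATH = os.path.join(PACKAGE_DIR, "config", "app.env")

def load_env_file(path=CONFIG_PATH):
    """Parse KEY=VALUE lines from a .env-style file, skipping comments."""
    env = {}
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    env[key.strip()] = value.strip()
    return env
```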
I want my package's version number to live in a single place where everything that needs it can refer to it.
I found several suggestions in this Python guide to Single Sourcing the Package Version and decided to try #4, storing it in a simple text file in my project root named VERSION.
Here's a shortened version of my project's directory tree (you can see the full project on GitHub):
.
├── MANIFEST.in
├── README.md
├── setup.py
├── VERSION
├── src/
│ └── fluidspaces/
│ ├── __init__.py
│ ├── __main__.py
│ ├── i3_commands.py
│ ├── rofi_commands.py
│ ├── workspace.py
│ └── workspaces.py
└── tests/
├── test_workspace.py
└── test_workspaces.py
Since VERSION and setup.py are siblings, it's very easy to read the version file inside the setup script and do whatever I want with it.
But VERSION and src/fluidspaces/__main__.py aren't siblings and the main module doesn't know the project root's path, so I can't use this approach.
The guide had this reminder:
Warning: With this approach you must make sure that the VERSION file is included in all your source and binary distributions (e.g. add include VERSION to your MANIFEST.in).
That seemed reasonable - instead of package modules needing the project root path, the version file could be copied into the package at build time for easy access - but I added that line to the manifest and the version file still doesn't seem to be showing up in the build anywhere.
To build, I'm running pip install -U . from the project root and inside a virtualenv. Here are the folders that get created in <virtualenv>/lib/python3.6/site-packages as a result:
fluidspaces/
├── i3_commands.py
├── __init__.py
├── __main__.py
├── __pycache__/ # contents snipped
├── rofi_commands.py
├── workspace.py
└── workspaces.py
fluidspaces-0.1.0-py3.6.egg-info/
├── dependency_links.txt
├── entry_points.txt
├── installed-files.txt
├── PKG-INFO
├── SOURCES.txt
└── top_level.txt
More of my configuration files:
MANIFEST.in:
include README.md
include VERSION
graft src
prune tests
setup.py:
#!/usr/bin/env python3
from setuptools import setup, find_packages
def readme():
'''Get long description from readme file'''
with open('README.md') as f:
return f.read()
def version():
'''Get version from version file'''
with open('VERSION') as f:
return f.read().strip()
setup(
name='fluidspaces',
version=version(),
description='Navigate i3wm named containers',
long_description=readme(),
author='Peter Henry',
author_email='me#peterhenry.net',
url='https://github.com/mosbasik/fluidspaces',
license='MIT',
classifiers=[
'Development Status :: 3 - Alpha',
'Programming Language :: Python :: 3.6',
],
packages=find_packages('src'),
include_package_data=True,
package_dir={'': 'src'},
package_data={'': ['VERSION']},
setup_requires=[
'pytest-runner',
],
tests_require=[
'pytest',
],
entry_points={
'console_scripts': [
'fluidspaces = fluidspaces.__main__:main',
],
},
python_requires='~=3.6',
)
I found this SO question Any python function to get “data_files” root directory? that makes me think the pkg_resources library is the answer to my problems, but I've not been able to figure out how to use it in my situation.
I've been having trouble because most examples I've found have python packages directly in the project root instead of isolated in a src/ directory. I'm using a src/ directory because of recommendations like these:
PyTest: Good Practices: Tests Outside Application Code
Ionel Cristian Mărieș - Packaging a Python Library
Hynek Schlawack - Testing and Packaging
Other knobs I've found and tried twisting a little are the package_data, include_package_data, and data_files kwargs for setup(). I don't know how relevant they are. It seems like there's some interplay between things declared with these and things declared in the manifest, but I'm not sure about the details.
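For what it's worth, the pkg_resources idea can be sketched like this; the caveat is that it resolves resources relative to a package, so VERSION would have to live inside src/fluidspaces/ rather than at the project root. pkg_resources ships with setuptools, and the stdlib pkgutil.get_data does the same job here, so the sketch falls back to it:

```python
def get_version(package="fluidspaces"):
    """Read a VERSION file that lives *inside* the given package."""
    try:
        import pkg_resources  # part of setuptools
        data = pkg_resources.resource_string(package, "VERSION")
    except ImportError:
        import pkgutil  # stdlib equivalent for this use case
        data = pkgutil.get_data(package, "VERSION")
    return data.decode().strip()
```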
Chatted with some people in the #python IRC channel on Freenode about this issue. I learned:
pkg_resources was probably how I should do what I was asking for, but it would require putting the version file in the package directory instead of the project root.
In setup.py I could read in such a version file from the package directory without importing the package itself (a no-no for a few reasons) but it would require hard-coding the path from the root to the package, which I wanted to avoid.
Eventually I decided to use the setuptools_scm package to get version information from my git tags instead of from a file in my repo (someone else was doing that with their package and their arguments were convincing).
As a result, I got my version number in setup.py very easily:
setup.py:
from setuptools import setup, find_packages


def readme():
    '''Get long description from readme file'''
    with open('README.md') as f:
        return f.read()


setup(
    name='fluidspaces',
    use_scm_version=True,  # use this instead of version
    description='Navigate i3wm named containers',
    long_description=readme(),
    author='Peter Henry',
    author_email='me#peterhenry.net',
    url='https://github.com/mosbasik/fluidspaces',
    license='MIT',
    classifiers=[
        'Development Status :: 3 - Alpha',
        'Programming Language :: Python :: 3.6',
    ],
    packages=find_packages('src'),
    package_dir={'': 'src'},
    setup_requires=[
        'pytest-runner',
        'setuptools_scm',  # require package for setup
    ],
    tests_require=[
        'pytest',
    ],
    entry_points={
        'console_scripts': [
            'fluidspaces = fluidspaces.__main__:main',
        ],
    },
    python_requires='~=3.6',
)
but I ended up having to hard-code a path indicating where the project root is with respect to the package code, which is kind of what I had been trying to avoid. I think this issue on the setuptools_scm GitHub repo might be why this is necessary.
src/fluidspaces/__main__.py:
import argparse

from setuptools_scm import get_version  # import this function


def main(args=None):
    # set up command line argument parsing
    parser = argparse.ArgumentParser()
    parser.add_argument('-V', '--version',
                        action='version',
                        version=get_version(root='../..', relative_to=__file__))  # and call it here
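An alternative that avoids the hard-coded root path entirely, assuming the package is installed when it runs, is to ask the installed distribution for its version via importlib.metadata (Python 3.8+). This is a sketch, not what the fluidspaces project actually does:

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+

def installed_version(dist_name="fluidspaces"):
    """Version recorded in the installed distribution's metadata."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # e.g. running from a source checkout without installing
        return "unknown (not installed)"
```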
For folks still looking for the answer to this, below is my attempt at following option #4 of the guide to Single Sourcing the Package Version. It's worth noting WHY you might choose this solution when there are other, simpler ones. As the link notes, this approach is useful when you have external tools that might also want to easily check the version (e.g. CI/CD tools).
File tree
myproj
├── MANIFEST.in
├── myproj
│ ├── VERSION
│ └── __init__.py
└── setup.py
myproj/VERSION
1.4.2
MANIFEST.in
include myproj/VERSION
setup.py
with open('myproj/VERSION') as version_file:
    version = version_file.read().strip()

setup(
    ...
    version=version,
    ...
    include_package_data=True,  # needed for the VERSION file
    ...
)
myproj/__init__.py
import pkgutil
__name__ = 'myproj'
__version__ = pkgutil.get_data(__name__, 'VERSION').decode()
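To sanity-check that mechanism without installing anything, here's a sketch that recreates the same pkgutil.get_data pattern in a throwaway package built on the fly (demo_pkg is illustrative, and .strip() is added to drop the trailing newline from the VERSION file):

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package with a VERSION data file next to __init__.py.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "demo_pkg")
os.makedirs(pkg)

with open(os.path.join(pkg, "VERSION"), "w") as f:
    f.write("1.4.2\n")
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write(
        "import pkgutil\n"
        "__version__ = pkgutil.get_data(__name__, 'VERSION').decode().strip()\n"
    )

sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_pkg
print(demo_pkg.__version__)  # prints "1.4.2"
```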
It's worth noting that setting configuration in setup.cfg is a nice, clean alternative to including everything in the setup.py setup function. Instead of reading version in setup.py, and then including in the function, you could do the following:
setup.cfg
[metadata]
name = my_package
version = attr: myproj.VERSION
In the full example I chose to leave everything in setup.py, both for the ease of one less file and because I was uncertain whether potential whitespace around the version in the VERSION file would be stripped by the cfg solution.
I know roughly similar questions have already been asked, but I can't seem to find the solution to my particular problem (or my error!).
I am building a small Python package for myself, so I can use several functions without caring about folders and paths. For some of these functions (e.g., for interpolation), I need additional files that should also be copied when the package is installed. I can't get this to work no matter what I try. I am also puzzled about how to access these files without explicitly specifying their paths once installed.
Here is the structure of my package
my_package
├── setup.py
├── README.rst
├── MANIFEST.in
├── my_package
│ ├── __init__.py
│ └── some_stuff.py
├── tables
│ ├── my_table.txt
my_table.txt is the additional file that I need to install, so I have set my MANIFEST.in to
include README.rst
recursive-include tables *
And my setup.py looks like this (including the include_package_data=True statement)
from setuptools import setup

setup(name='my_package',
      version='0.1',
      description='Something',
      url='http://something.com',
      author='me',
      author_email='an_email',
      license='MIT',
      packages=['my_package'],
      include_package_data=True,
      zip_safe=False)
However, after running python setup.py install, I can't find my_table.txt anywhere. What am I doing wrong? Where/how are these files copied? And after installing the package, how would you get the path of my_table.txt without explicitly writing it?
Thanks a lot!
I took the time to try your code/structure.
As it is, with packages=['my_package'], it only installs the contents of my_package (the subfolder).
You could use find_packages in your setup.py; I made it work with your structure.
from setuptools import setup, find_packages

setup(name='my_package',
      version='0.1',
      description='Something',
      url='http://something.com',
      author='me',
      author_email='an_email',
      license='MIT',
      packages=find_packages(),
      include_package_data=True,
      zip_safe=False)
You can read more on "find_packages" here:
https://pythonhosted.org/setuptools/setuptools.html#using-find-packages
Hope this helps.
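Once the data file is installed inside the package, you can compute its path relative to the package itself instead of hard-coding it. A sketch, assuming the tables folder ends up inside the installed my_package directory (i.e. my_package/tables/my_table.txt):

```python
import os

def table_path():
    """Absolute path of my_table.txt, resolved relative to this module."""
    package_dir = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(package_dir, "tables", "my_table.txt")
```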
I have the following project structure I would like to package:
├── doc
│ └── source
├── src
│ ├── core
│ │ ├── config
│ │ │ └── log.tmpl
│ │ └── job
│ ├── scripts
│ └── test
└── tools
I would like to package core under src but exclude test. Here is what I tried unsuccessfully:
setup(name='core',
      version=version,
      package_dir={'': 'src'},  # our packages live under src, but src is not a package itself
      packages=find_packages("src", exclude=["test"]),  # I also tried exclude=["src/test"]
      install_requires=['xmltodict==0.9.0',
                        'pymongo==2.7.2',
                        'ftputil==3.1',
                        'psutil==2.1.1',
                        'suds==0.4',
                        ],
      include_package_data=True,
      )
I know I can exclude test using the MANIFEST.in file, but I would be happy if you could show me how to do this with setup and find_packages.
Update:
After some more playing around, I realized that building the package with python setup.py install does what I expected (that is, it excludes test). However, issuing python setup.py sdist causes everything to be included (that is, it ignores my exclude directive). I don't know whether it is a bug or a feature, but there is still the possibility of excluding files from the sdist using MANIFEST.in.
find_packages("src", exclude=["test"]) works.
The trick is to remove stale files such as core.egg-info directory. In your case you need to remove src/core.egg-info.
Here's setup.py I've used:
from setuptools import setup, find_packages

setup(name='core',
      version='0.1',
      package_dir={'': 'src'},
      packages=find_packages("src", exclude=["test"]),  # <- test is excluded
      # packages=find_packages("src"),                  # <- test is included
      author='J.R. Hacker',
      author_email='jr#example.com',
      url='http://stackoverflow.com/q/26545668/4279',
      package_data={'core': ['config/*.tmpl']},
      )
To create distributions, run:
$ python setup.py sdist bdist bdist_wheel
To enable the latter command, first run: pip install wheel.
I've inspected the created files. They do not contain test, but they do contain the core/__init__.py and core/config/log.tmpl files.
In your MANIFEST.in at project root, add
prune src/test/
then build package with python setup.py sdist
I would probably just use wildcards, as described in the find_packages documentation. I tend to use *test* or *tests*, since we save only test files with the word test in their names. Simple and easy ^-^.
setup(name='core',
      version=version,
      package_dir={'': 'src'},  # our packages live under src, but src is not a package itself
      packages=find_packages("src", exclude=['*tests*']),  # I just use a wildcard. Works perfectly ^-^
      install_requires=['xmltodict==0.9.0',
                        'pymongo==2.7.2',
                        'ftputil==3.1',
                        'psutil==2.1.1',
                        'suds==0.4',
                        ],
      include_package_data=True,
      )
FYI:
I would also recommend adding the following to .gitignore:
build
dist
pybueno.egg-info
And move the build-and-publish steps (to PyPI or your private repository) into CI/CD to keep the whole setup clean and neat.
Assuming that your folder is called tests and not test, it should work with the following code:
setup(name='core',
      version=version,
      package_dir={'': 'src'},  # our packages live under src, but src is not a package itself
      packages=find_packages('src', exclude=['tests']),
      install_requires=['xmltodict==0.9.0',
                        'pymongo==2.7.2',
                        'ftputil==3.1',
                        'psutil==2.1.1',
                        'suds==0.4',
                        ],
      include_package_data=True,
      )
I am trying to create a Python package, and I have a directory structure like this:
mypkg/
├── __init__.py
├── module1
│ ├── x.py
│ ├── y.py
│ └── z.txt
└── module2
├── a.py
└── b.py
Then I added all the files to MANIFEST.in, and when I checked the created archive, it had all the files.
However, when I do python setup.py install, in dist-packages/mypkg/module1 I see only the Python files and not z.txt.
I have z.txt in both MANIFEST.in and setup.py:
setup(
    packages=[
        'mypkg',
        'mypkg.module1',
        'mypkg.module2',
    ],
    package_data={
        'mypkg': ['module1/z.txt']
    },
    include_package_data=True,
    ...
)
I tried adding the file as data_files as well, but that created a directory in /usr/local. I want to keep it inside the source code directory, since the code uses that data.
I have read the posts listed below, but I keep getting confused about the right way to keep z.txt in the right location after setup.py install.
MANIFEST.in ignored on "python setup.py install" - no data files installed?
Installing data files into site-packages with setup.py
http://blog.codekills.net/2011/07/15/lies,-more-lies-and-python-packaging-documentation-on--package_data-/
Try using setuptools instead of distutils.
Update: It got fixed when I started using setuptools instead of distutils.core. I think the problem was distutils not agreeing with the manifest, while setuptools worked without any changes to the code. I recommend using setuptools in the future. See the setuptools developers' guide for details.