MANIFEST.in, package_data, and data_files clarification? - python

I am trying to create a Python package, and I have a directory structure like this:
mypkg/
├── __init__.py
├── module1
│   ├── x.py
│   ├── y.py
│   └── z.txt
└── module2
├── a.py
└── b.py
Then I added all the files in MANIFEST.in and when I check the created archive, it had all the files.
When I do python setup.py install in the dist-packages/mypkg/module1. I see only the Python files and not z.txt.
I have z.txt in both MANIFEST.in and setup.py:
setup (
packages = [
'mypkg',
'mypkg.module1',
'mypkg.module2',
],
package_data = {
'mypkg': ['module1/z.txt']
},
include_package_data = True,
...
)
I tried adding the file as data_files as well but that created a directory in /usr/local. I want to keep it inside the source code directory as the code uses that data.
I have read the posts listed below but I keep getting confused about what is the right way to keep z.txt in the right location after setup.py install.
MANIFEST.in ignored on "python setup.py install" - no data files installed?
Installing data files into site-packages with setup.py
http://blog.codekills.net/2011/07/15/lies,-more-lies-and-python-packaging-documentation-on--package_data-/

Try using setuptools instead of distutils.

Update: It got fixed when I started using setuptools instead of distutils.core. I think it was some problem with distutils not agreeing with manifest while setuptools worked without any changes in the code. I recommend using setuptools in the future. Using the link here : setup tools- developers guide

Related

How to properly package set of callable python scripts or modules

I've been searching the net for quite some time now but I can't seem to wrap my head around on how can I distribute my python scripts for my end user.
I've been using my scripts on my command line using this command python samplemodule.py "args1"
And this is also the way I want my user to also use it on their end with their command line. But my worry is that this certain modules have dependencies on other library or modules.
My scripts are working when they are all in the Project's root directory, but everything crumbles when I try to package them and put them in sub directories.
An example of this is I can't now run my scripts since its having an error when I'm importing a module from the data subdirectory.
This is my project structure.
MyProject
\formatter
__init__.py
__main__.py
formatter.py
addfilename.py
addscrapertype.py
...\data
__init__.py
helper.py
csv_formatter.py
setup.py
The csv_formatter.py file is just a wrapper to call the formatter.main.
Update: I was now able to generate a tar.gz package but the package wasn't callable when installed on my machine.
This is the setup.py:
import setuptools
with open("README.md", "r") as fh:
long_description = fh.read()
setuptools.setup(
name="formatter",
version="1.0.1",
author="My Name",
author_email="sample#email.com",
description="A package for cleaning and reformatting csv data",
long_description=long_description,
long_description_content_type="text/markdown",
url="https://github.com/RhaEL012/Python-Scripts",
packages=["formatter"],
include_package_data=True,
package_data={
# If any package contains *.txt or *.rst files, include them:
"": ["*.csv", "*.rst", "*.txt"],
},
entry_points={
"console_scripts": [
"formatter=formatter.formatter:main"
]
},
classifiers=[
"Programming Language :: Python :: 3",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
],
python_requires='>=3.6',
install_requires=[
"pandas"
]
)
Now, after installing the package on the machine I wasn't able to call the module and it results in an error:
Z:\>addfilename "C:\Users\Username\Desktop\Python Scripts\"
Update: I try to install the setup.py in a virtual environment just to see where the error is coming from.
I install it then I get the following error: FileNotFoundError: [Errno 2] no such file or directory: 'README.md'
I try to include the README.md in the MANIFEST.in but still no luck.
So I try to make it a string just to see if the install will proceed.
The install proceed but then again, I encounter an error that says that the package directory 'formatter' does not exist
As I am not able to look into your specific files I will just explain how I usually tackle this issue.
This is the manner how I usually setup the command line interface (cli) tools. The project folder looks like:
Projectname
├── modulename
│   ├── __init__.py # this one is empty in this example
│   ├── cli
│   │   ├── __init__.py # this is the __init__.py that I refer to hereafter
│   ├── other_subfolder_with_scripts
├── setup.py
Where all functionality is within the modulename folder and subfolders.
In my __init__.py I have:
def main():
# perform the things that need to be done
# also all imports are within the function call
print('doing the stuff I should be doing')
but I think you can also import what you want into the __init__.py and still reference to it in the manner I do in setup.py.
In setup.py we have:
import setuptools
setuptools.setup(
name='modulename',
version='0.0.0',
author='author_name',
packages=setuptools.find_packages(),
entry_points={
'console_scripts': ['do_main_thing=modulename.cli:main'] # so this directly refers to a function available in __init__.py
},
)
Now install the package with pip install "path to where setup.py is" Then if it is installed you can call:
do_main_thing
>>> doing the stuff I should be doing
For the documentation I use: https://setuptools.readthedocs.io/en/latest/.
My recommendation is to start with this and slowly add the functionality that you want. Then step by step solve your problems, like adding a README.md etc.
I disagree with the other answer. You shouldn't run scripts in __init__.py but in __main__.py instead.
Projectfolder
├── formatter
│ ├── __init__.py
│ ├── cli
│ │ ├── __init__.py # Import your class module here
│ │ ├── __main__.py # Call your class module here, using __name__ == "__main__"
│ │ ├── your_class_module.py
├── setup.py
If you don't want to supply a readme, just remove that code and enter a description manually.
I use https://setuptools.readthedocs.io/en/latest/setuptools.html#find-namespace-packages instead of manually setting the packages.
You can now install your package by just running pip install ./ like you have been doing before.
After you've done that run: python -m formatter.cli arguments. It runs the __main__.py file you've created in the CLI folder (or whatever you've called it).
An important note about packaging modules is that you need to use relative imports. You'd use from .your_class_module import YourClassModule in that __init__.py for example. If you want to import something from an adjacent folder you need two dots, from ..helpers import HelperClass.
I'm not sure if this is helpful, but usually I package my python scripts using the wheel package:
pip install wheel
python setup.py sdist bdist_wheel
After those two commands a whl package is created in a 'dist' folder which you can then either upload to PyPi and download/install from there, or you can install it offline with the "pip install ${PackageName}.py"
Here's A useful user guide just in case there is something else that I didn't explain:
https://packaging.python.org/tutorials/packaging-projects/

How to access text file in project root from python package under `src/` directory

I want my package's version number to live in a single place where everything that needs it can refer to it.
I found several suggestions in this Python guide to Single Sourcing the Package Version and decided to try #4, storing it in a simple text file in my project root named VERSION.
Here's a shortened version of my project's directory tree (you can see the the full project on GitHub):
.
├── MANIFEST.in
├── README.md
├── setup.py
├── VERSION
├── src/
│   └── fluidspaces/
│   ├── __init__.py
│   ├── __main__.py
│   ├── i3_commands.py
│   ├── rofi_commands.py
│   ├── workspace.py
│   └── workspaces.py
└── tests/
   ├── test_workspace.py
   └── test_workspaces.py
Since VERSION and setup.py are siblings, it's very easy to read the version file inside the setup script and do whatever I want with it.
But VERSION and src/fluidspaces/__main__.py aren't siblings and the main module doesn't know the project root's path, so I can't use this approach.
The guide had this reminder:
Warning: With this approach you must make sure that the VERSION file is included in all your source and binary distributions (e.g. add include VERSION to your MANIFEST.in).
That seemed reasonable - instead of package modules needing the project root path, the version file could be copied into the package at build time for easy access - but I added that line to the manifest and the version file still doesn't seem to be showing up in the build anywhere.
To build, I'm running pip install -U . from the project root and inside a virtualenv. Here are the folders that get created in <virtualenv>/lib/python3.6/site-packages as a result:
fluidspaces/
├── i3_commands.py
├── __init__.py
├── __main__.py
├── __pycache__/ # contents snipped
├── rofi_commands.py
├── workspace.py
└── workspaces.py
fluidspaces-0.1.0-py3.6.egg-info/
├── dependency_links.txt
├── entry_points.txt
├── installed-files.txt
├── PKG-INFO
├── SOURCES.txt
└── top_level.txt
More of my configuration files:
MANIFEST.in:
include README.md
include VERSION
graft src
prune tests
setup.py:
#!/usr/bin/env python3
from setuptools import setup, find_packages
def readme():
'''Get long description from readme file'''
with open('README.md') as f:
return f.read()
def version():
'''Get version from version file'''
with open('VERSION') as f:
return f.read().strip()
setup(
name='fluidspaces',
version=version(),
description='Navigate i3wm named containers',
long_description=readme(),
author='Peter Henry',
author_email='me#peterhenry.net',
url='https://github.com/mosbasik/fluidspaces',
license='MIT',
classifiers=[
'Development Status :: 3 - Alpha',
'Programming Language :: Python :: 3.6',
],
packages=find_packages('src'),
include_package_data=True,
package_dir={'': 'src'},
package_data={'': ['VERSION']},
setup_requires=[
'pytest-runner',
],
tests_require=[
'pytest',
],
entry_points={
'console_scripts': [
'fluidspaces = fluidspaces.__main__:main',
],
},
python_requires='~=3.6',
)
I found this SO question Any python function to get “data_files” root directory? that makes me think the pkg_resources library is the answer to my problems, but I've not been able to figure out how to use it in my situation.
I've been having trouble because most examples I've found have python packages directly in the project root instead of isolated in a src/ directory. I'm using a src/ directory because of recommendations like these:
PyTest: Good Practices: Tests Outside Application Code
Ionel Cristian Mărieș - Packaging a Python Library
Hynek Schlawack - Testing and Packaging
Other knobs I've found and tried twisting a little are the package_data, include_package_data, and data_files kwargs for setup(). Don't know how relevent they are. Seems like there's some interplay between things declared with these and things declared in the manifest, but I'm not sure about the details.
Chatted with some people in the #python IRC channel on Freenode about this issue. I learned:
pkg_resources was probably how I should to do what I was asking for, but it would require putting the version file in the package directory instead of the project root.
In setup.py I could read in such a version file from the package directory without importing the package itself (a no-no for a few reasons) but it would require hard-coding the path from the root to the package, which I wanted to avoid.
Eventually I decided to use the setuptools_scm package to get version information from my git tags instead of from a file in my repo (someone else was doing that with their package and their arguments were convincing).
As a result, I got my version number in setup.py very easily:
setup.py:
from setuptools import setup, find_packages
def readme():
'''Get long description from readme file'''
with open('README.md') as f:
return f.read()
setup(
name='fluidspaces',
use_scm_version=True, # use this instead of version
description='Navigate i3wm named containers',
long_description=readme(),
author='Peter Henry',
author_email='me#peterhenry.net',
url='https://github.com/mosbasik/fluidspaces',
license='MIT',
classifiers=[
'Development Status :: 3 - Alpha',
'Programming Language :: Python :: 3.6',
],
packages=find_packages('src'),
package_dir={'': 'src'},
setup_requires=[
'pytest-runner',
'setuptools_scm', # require package for setup
],
tests_require=[
'pytest',
],
entry_points={
'console_scripts': [
'fluidspaces = fluidspaces.__main__:main',
],
},
python_requires='~=3.6',
)
but I ended up having to have a hard-coded path indicating what the project root should be with respect to the package code, which is kind of what I had been avoiding before. I think this issue on the setuptools_scm GitHub repo might be why this is is necessary.
src/fluidspaces/__main__.py:
import argparse
from setuptools_scm import get_version # import this function
def main(args=None):
# set up command line argument parsing
parser = argparse.ArgumentParser()
parser.add_argument('-V', '--version',
action='version',
version=get_version(root='../..', relative_to=__file__)) # and call it here
For folks still looking for the answer to this, below is my attempt at following variety #4 of the guide to Single Sourcing the Package Version. It's worth noting WHY you might choose this solutions when there are other simpler ones. As the link notes, this approach is useful when you have external tools that might also want to easily check the version (e.g. CI/CD tools).
File tree
myproj
├── MANIFEST.in
├── myproj
│   ├── VERSION
│   └── __init__.py
└── setup.py
myproj/VERSION
1.4.2
MANIFEST.in
include myproj/VERSION
setup.py
with open('myproj/VERSION') as version_file:
version = version_file.read().strip()
setup(
...
version=version,
...
include_package_data=True, # needed for the VERSION file
...
)
myproj/__init__.py
import pkgutil
__name__ = 'myproj'
__version__ = pkgutil.get_data(__name__, 'VERSION').decode()
It's worth noting that setting configuration in setup.cfg is a nice, clean alternative to including everything in the setup.py setup function. Instead of reading version in setup.py, and then including in the function, you could do the following:
setup.cfg
[metadata]
name = my_package
version = attr: myproj.VERSION
In the full example I chose to leave everything in setup.py for the ease of one less file and uncertainty about whether or not potential whitespace around the version in the VERSION file would be stripped by the cfg solution.

Python packaging: exclude directory from bdist_wheel

I have the following project structure:
.
├── docs
├── examples
├── MANIFEST.in
├── README.rst
├── setup.cfg
├── setup.py
└── myproject
I want to bundle my project into a wheel. For this, I use the following setup.py:
#!/usr/bin/env python
from setuptools import setup, find_packages
setup(name='myproject',
version='1.0',
description='Great project'
long_description=open('README.rst').read(),
author='Myself'
packages=find_packages(exclude=['tests','test','examples'])
)
When running python setup.py bdist_wheel, the examples directory is included in the wheel. How do I prevent this?
According to
Excluding a top-level directory from a setuptools package
I would expect that examples is excluded.
I solved the issue by using a suffixed star, examples*, i.e.:
find_packages(exclude=['*tests','examples*'])
(Note that I am writing '*tests' with a leading star,because I have test packages within each code package, as in myproject.mypackage.tests. Somehow the suffixed star seems to not be necessary if there is already a prefixed one)

setup.py sdist exclude packages in subdirectory

I have the following project structure I would like to package:
├── doc
│   └── source
├── src
│   ├── core
│   │   ├── config
│   │   │   └── log.tmpl
│   │   └── job
│   ├── scripts
│   └── test
└── tools
I would like to package core under src but exclude test. Here is what I tried unsuccessfully:
setup(name='core',
version=version,
package_dir = {'': 'src'}, # Our packages live under src but src is not a package itself
packages = find_packages("src", exclude=["test"]), # I also tried exclude=["src/test"]
install_requires=['xmltodict==0.9.0',
'pymongo==2.7.2',
'ftputil==3.1',
'psutil==2.1.1',
'suds==0.4',
],
include_package_data=True,
)
I know I can exclude test using the MANIFEST.in file, but I would be happy if you could show me how to do this with setup and find_packages.
Update:
After some more playing around, I realized that building the package with python setup.py install does what I expected (that is, it excludes test). However, issuing python setup.py sdist causes everything to be included (that is, it ignores my exclude directive). I don't know whether it is a bug or a feature, but there is still the possibility of excluding files in sdist using MANIFEST.in.
find_packages("src", exclude=["test"]) works.
The trick is to remove stale files such as core.egg-info directory. In your case you need to remove src/core.egg-info.
Here's setup.py I've used:
from setuptools import setup, find_packages
setup(name='core',
version='0.1',
package_dir={'':'src'},
packages=find_packages("src", exclude=["test"]), # <- test is excluded
####packages=find_packages("src"), # <- test is included
author='J.R. Hacker',
author_email='jr#example.com',
url='http://stackoverflow.com/q/26545668/4279',
package_data={'core': ['config/*.tmpl']},
)
To create distributives, run:
$ python setup.py sdist bdist bdist_wheel
To enable the latter command, run: pip install wheel.
I've inspected created files. They do not contain test but contain core/__init__.py, core/config/log.tmpl files.
In your MANIFEST.in at project root, add
prune src/test/
then build package with python setup.py sdist
I probably just use wild cards as defined in the find_packages documentation. *test* or *tests* is something I tend to use as we save only test filenames with the word test. Simple and easy ^-^.
setup(name='core',
version=version,
package_dir = {'': 'src'}, # Our packages live under src but src is not a package itself
packages = find_packages("src", exclude=['*tests*']), # I just use wild card. Works perfect ^-^
install_requires=['xmltodict==0.9.0',
'pymongo==2.7.2',
'ftputil==3.1',
'psutil==2.1.1',
'suds==0.4',
],
include_package_data=True,
)
FYI:
I would also recommend adding following into .gitignore.
build
dist
pybueno.egg-info
And move build and pushing package to pypi or your private repository bit into CI/CD to make whole setup look clean and neat.
Assuming that your folder is called tests and not test, it should work with the following code:
setup(name='core',
version=version,
package_dir = {'': 'src'}, # Our packages live under src but src is not a package itself
packages = find_packages('src', exclude=['tests'])
install_requires=['xmltodict==0.9.0',
'pymongo==2.7.2',
'ftputil==3.1',
'psutil==2.1.1',
'suds==0.4',
],
include_package_data=True,
)

Confused about the package_dir and packages settings in setup.py

Here is my project directory structure, which includes the project folder, plus
a "framework" folder containing packages and modules shared amongst several projects
which resides at the same level in the hierarchy as the project folders:
Framework/
package1/
__init__.py
mod1.py
mod2.py
package2/
__init__.py
moda.py
modb.py
My_Project/
src/
main_package/
__init__.py
main_module.py
setup.py
README.txt
Here is a partial listing of the contents of my setup.py file:
from distutils.core import setup
setup(packages=[
'package1',
'package2.moda',
'main_package'
],
package_dir={
'package1': '../Framework/package1',
'package2.moda': '../Framework/package2',
'main_package': 'src/main_package'
})
Here are the issues:
No dist or build directories are created
Manifest file is created, but all modules in package2 are listed, not just the moda.py module
The build terminates with an error:
README.txt: Incorrect function
I don't know if I have a single issue (possibly related to my directory structure) or if I have multiple issues but I've read everything I can find on distribution of Python applications, and I'm stumped.
If I understand correctly, the paths in package_dir should stop at the parent directory of the directories which are Python packages. In other words, try this:
package_dir={'package1': '../Framework',
'package2': '../Framework',
'main_package': 'src'})
I've had a similar problem, which was solved through the specification of the root folder and of the packages inside that root.
My package has the following structure:
.
├── LICENSE
├── README.md
├── setup.py
└── src
└── common
├── __init__.py
├── persistence.py
├── schemas.py
└── utils.py
The setup.py contains the package_dir and packages line:
package_dir={"myutils": "src"},
packages=['myutils.common'],
After running the python setup.py bdist_wheel and installing the .whl file, the package can be called using:
import myutils.common

Categories

Resources