Python namespace package clear folder structure

I would like to set up a python namespace package containing several connected packages which need to be installable independently unless dependencies are explicitly specified. Existing solutions, however, seem more or less messy to me.
One of the packages contains most of the problem logic, for example, and the others contain auxiliary functionality such as plotting and data export. The logic package needs to stay slim and cannot import more than numpy, whereas the other packages can utilise more complex packages like pandas and matplotlib. I would like to set up a package structure which looks like the resulting namespace of a namespace package, but without unnecessary folder nesting; something like this:
namespace
├── logic
│   ├── __init__.py
│   ├── functions.py
│   └── setup.py  # requires numpy
├── datastructure
│   ├── __init__.py
│   ├── functions.py
│   └── setup.py  # requires namespace.logic and pandas
├── plotting
│   ├── __init__.py
│   ├── functions.py
│   └── setup.py  # requires namespace.logic, namespace.datastructure and matplotlib
└── setup.py  # should install every package in namespace
I figured that this looks like a conventional package with modules, but I have not yet found a way to set it up as a package while maintaining the option to install only specific modules. I therefore assumed a namespace package should offer that option, but I cannot quite get it to work with pip.
At the moment I would be required to have two more directory levels, like this:
namespace
├── NamespaceLogic            # don't want this
│   ├── namespace             # don't want this
│   │   └── logic
│   │       └── __init__.py
│   └── setup.py
├── NamespaceDatastructure    # don't want this
│   ├── namespace             # don't want this
│   │   └── datastructure
│   │       └── __init__.py
│   └── setup.py
├── NamespacePlotting         # don't want this
│   ├── namespace             # don't want this
│   │   └── plotting
│   │       └── __init__.py
│   └── setup.py
└── setup.py
My problem is similar to this question: Python pip install sub-package from own package. However, I would like to avoid having too many subfolders, as this poses the risk of exceeding the path length restrictions of my system (and it confuses everyone else). How do I need to configure the different setup.py files in order to be able to run
pip install namespace  # installs namespace.logic, namespace.datastructure, namespace.plotting
pip install namespace.logic  # installs only namespace.logic and works in an environment with numpy which does not have pandas or matplotlib

You can use the package_dir option of setuptools to your advantage to get rid of the empty folders for the namespace packages:
NmspcPing
├── ping
│   └── __init__.py
└── setup.py
import setuptools

setuptools.setup(
    name='NmspcPing',
    version='0.0.0.dev0',
    packages=['nmspc.ping'],
    package_dir={'nmspc.ping': 'ping'},
)
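If each sub-package is published as its own distribution this way, the top-level setup.py from the question can be a thin meta-distribution that contains no code of its own and only declares dependencies on the sub-distributions. A minimal sketch; the distribution names below are hypothetical and must match whatever names the sub-packages are published under:

```python
# Hypothetical top-level setup.py: a "meta" distribution that only
# pulls in the individually installable sub-distributions.
import setuptools

setuptools.setup(
    name='namespace',
    version='0.0.0.dev0',
    packages=[],  # no code of its own
    install_requires=[
        'namespace-logic',          # provides namespace.logic, needs numpy
        'namespace-datastructure',  # provides namespace.datastructure
        'namespace-plotting',       # provides namespace.plotting
    ],
)
```

With this in place, `pip install namespace` installs everything, while installing a single sub-distribution keeps the environment slim.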
Something like the following would also be feasible, but depending on how the projects are built or installed, the setup.py files might be included as part of the packages as well (which is probably unwanted):
.
├── ping
│   ├── __init__.py
│   └── setup.py
├── pong
│   ├── __init__.py
│   └── setup.py
└── setup.py
If the path length restriction is an issue, then using shorter package names might be a better bet, because in many cases the packages are installed with all their directory levels anyway (unless they stay zipped), even if you skip those levels in your source code repository.
Honestly, I would be surprised if a path length restriction issue actually occurred, and I believe it would still happen on things you don't have control over anyway (third-party packages: numpy, pandas and matplotlib probably have lots of nested sub-packages as well).
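As an aside, the namespace mechanics themselves can be demonstrated without any packaging at all. With PEP 420 native namespace packages, two independent source trees can contribute sub-packages to the same namespace as long as neither ships a namespace-level __init__.py. The sketch below builds two throwaway trees in a temporary directory (all names are illustrative):

```python
import os
import sys
import tempfile

# Two independent source trees, each contributing one sub-package to the
# shared "nmspc" namespace. Note: there is no nmspc/__init__.py in either
# tree -- that absence is what makes nmspc a PEP 420 namespace package.
root = tempfile.mkdtemp()
for dist, mod in [("NmspcPing", "ping"), ("NmspcPong", "pong")]:
    pkg = os.path.join(root, dist, "nmspc", mod)
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write(f"WHO = {mod!r}\n")
    sys.path.append(os.path.join(root, dist))

import nmspc.ping
import nmspc.pong  # both halves are importable despite living apart

print(nmspc.ping.WHO, nmspc.pong.WHO)
```

Installing each tree as its own distribution reproduces the same layout in site-packages, which is why `pip install` of one sub-package does not drag in the others.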

Related

Have several packages share the same file placed in a different folder - and how to correctly build the package?

I currently have several packages with this structure (showing only 2 here):
├── package1
│   ├── setup.py
│   └── package1
│       ├── __init__.py
│       ├── main.py
│       └── abstract_classes.py
├── package2
│   ├── setup.py
│   └── package2
│       ├── __init__.py
│       ├── main.py
│       └── abstract_classes.py
Each package contains the file abstract_classes.py, which holds abstract classes that the concrete classes of each package inherit from. This file must be identical across packages. However, since I also need to ship the packages individually to PyPI, I ended up keeping local copies of this file in each package folder, which is particularly annoying because whenever I change it I need to make sure to copy the file into all the other directories.
Is there a way to keep only one copy of abstract_classes.py, for example in the top directory, and then add some instructions (maybe in the setup.py file?) to make sure that abstract_classes.py is retrieved when the package is built and shipped to PyPI? Or do you think it would be better to create a separate package just for the abstract_classes.py file?

python package metadata best practice [duplicate]

A typical directory tree of a python project might look like this.
.
├── src
│   ├── __init__.py
│   ├── main.py
│   ├── module1
│   │   ├── __init__.py
│   │   └── foo.py
│   └── module2
│       ├── __init__.py
│       └── bar.py
├── setup.py
└── venv
    └── ...
setup.py contains package metadata like name, version, description, etc.
In some cases, it is useful to have these values inside the application code. For example with FastAPI, you can provide them to the constructor of the API object so that they are shown in the auto-generated docs. Or with Click, you can add a version option.
To avoid duplication, these values should be defined only once and used in both places. However, I have never found a good way to share these values between setup.py and application code.
Importing one from the other does not seem to work, because they are not part of the same package structure.
What is best practice in this case?
In run-time code (not setup.py), use importlib.metadata (or its back-port importlib-metadata). The only thing you need to duplicate is the name of the project (the name of the "distribution package").
For example with a project named MyLibrary:
import importlib.metadata
PROJECT_NAME = 'MyLibrary'
_DISTRIBUTION_METADATA = importlib.metadata.metadata(PROJECT_NAME)
SUMMARY = _DISTRIBUTION_METADATA['Summary']
VERSION = _DISTRIBUTION_METADATA['Version']
Aside: If it is not possible to hard-code the name of the distribution package, there are ways to find it: https://stackoverflow.com/a/63849982
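When the code may also run from a plain source checkout (where the distribution is not installed yet), it is common to wrap the lookup in a fallback. A minimal sketch; the distribution name and fallback string below are illustrative:

```python
import importlib.metadata

def get_version(dist_name: str, default: str = "0.0.0+unknown") -> str:
    """Return the installed version of a distribution, or a fallback
    when the project is running from an uninstalled source checkout."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return default

print(get_version("surely-not-installed-demo"))  # falls back to the default
```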

Trouble organizing Python library for import to behave in expected manner

I've had a lot of trouble figuring out a key point about how the import mechanism works, and how this relates to organizing packages.
Suppose I've written two or more unrelated, reusable libraries. (I'll use "library" informally as a collection of code and resources, including tests and possibly data, as opposed to a "package" in the formal Python sense.) Here are two imaginary libraries in a parent directory called "my_libraries":
my_libraries/
├── audio_studio
│   ├── src
│   │   ├── distortion.py
│   │   ├── filter.py
│   │   └── reverb.py
│   └── test
│       └── test_audio.py
└── picasso_graphics
    ├── src
    │   ├── brushes.py
    │   ├── colors.py
    │   └── easel.py
    └── test
        └── test_picasso.py
I'm hoping to accomplish all three of the following, all of which seem to me to be normal practice or expectation:
1. MAIN LIBRARY CODE IN SUBDIRECTORY
For neatness of library organization, I want to put the library's core code in a subdirectory such as "src" rather than at the top-level directory. (My point here isn't to debate whether "src" in particular is a good naming approach; I've read multiple pages pro and con. Some people appear to prefer the form foo/foo, but I think I'd have the same problem I'm describing with that too.)
2. ADD TO $PYTHONPATH JUST ONCE
I'd like to be able to add "my_libraries" to $PYTHONPATH or sys.path just once. If I add a new library to "my_libraries", it's automatically discoverable by my scripts.
3. NORMAL-LOOKING import STATEMENTS
I'd like to be able import from these libraries into other projects in a normal-looking way, without mentioning the "src" directory:
import picasso_graphics.brushes
OR
from picasso_graphics import brushes
HOW TO DO THIS?
Despite much reading and experimentation, I haven't been able to find a solution that satisfies all three of these criteria. The closest I've gotten is to create a picasso_graphics/__init__.py file containing the following:
import os
import sys

base_dir = os.path.dirname(__file__)
src_dir = os.path.join(base_dir, "src")
sys.path.insert(0, src_dir)
This almost does what I want, but I have to break up the imports into two statements, so that the __init__.py file executes with the first import:
import picasso_graphics
import brushes
Am I making a wrong assumption here about what's possible? Is there a solution which satisfies all three of these criteria?
What you want, Sean, is most likely what is called a namespace project; use a tool called pyscaffold to help with writing the boilerplate. Each project should have a setup.cfg with all your project dependencies. Once you have that, create a virtual environment for your project, then install each in editable mode (inside the environment):
pip install -e audio-studio
pip install -e picasso-graphics
Installing your project into your virtual environment will cause your imports to behave as you want -- between projects.
This is a bit of overhead to get started, I know, but these are skills you want to have sooner or later. setup.cfg, virtual environments, and pip install -e is a magical pattern that just makes things work where other approaches will drive you mad.
Below is a simple example project I created using pyscaffold. Notice there is a package below src and that src does not have an __init__.py. That is a deliberate decision by the pyscaffold folks to help ease import confusion; you should likely adopt it.
my_libraries/
├── audio-studio
│   ├── AUTHORS.rst
│   ├── CHANGELOG.rst
│   ├── LICENSE.txt
│   ├── README.rst
│   ├── requirements.txt
│   ├── setup.cfg
│   ├── setup.py
│   ├── src
│   │   └── audio_studio
│   │       ├── __init__.py
│   │       └── skeleton.py
│   └── tests
│       ├── conftest.py
│       └── test_skeleton.py
└── picasso-graphics
    ├── AUTHORS.rst
    ├── CHANGELOG.rst
    ├── LICENSE.txt
    ├── README.rst
    ├── requirements.txt
    ├── setup.cfg
    ├── setup.py
    ├── src
    │   └── picasso_graphics
    │       ├── __init__.py
    │       └── skeleton.py
    └── tests
        ├── conftest.py
        └── test_skeleton.py

Python Package Folder Structure

I have been researching how to build the folder structure for a custom python package. There were several attempts, but none of them seemed to be generally applicable; in particular, the usage (or not) of the __init__.py file(s).
I have a package that consists of several sub-packages, each responsible for parsing files of a certain kind. Therefore I currently adopted this structure:
Parsers/
├── __init__.py
├── ExternalPackages
│   ├── __init__.py
│   ├── package1
│   └── package2
├── FileType1_Parsers/
│   ├── __init__.py
│   ├── parsers1.py
│   └── containers1.py
└── FileType2_Parsers/
    ├── __init__.py
    ├── parsers2.py
    └── containers2.py
But it seems not very pythonic that, when I import this package and want to use a certain class of a module, I have to type something like
from Parsers.FileType1_Parsers.parsers1 import example_class
Is there any convention on how to structure such packages, or any rules on how to avoid such long import lines?
You can add the following line to Parsers/__init__.py
from .FileType1_Parsers.parsers1 import example_class
Then you can import example_class by
from Parsers import example_class
This is a common practice in large packages.
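A self-contained demonstration of the re-export trick; the package is built on the fly in a temporary directory here purely so the snippet runs as-is, and the names mirror the question:

```python
import os
import sys
import tempfile

# Build a tiny "Parsers" package on disk to show the re-export trick.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "Parsers", "FileType1_Parsers")
os.makedirs(pkg)

with open(os.path.join(pkg, "parsers1.py"), "w") as f:
    f.write("class example_class:\n    kind = 'FileType1'\n")
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("")
# Re-export the class at the top level of the package:
with open(os.path.join(root, "Parsers", "__init__.py"), "w") as f:
    f.write("from .FileType1_Parsers.parsers1 import example_class\n")

sys.path.insert(0, root)
from Parsers import example_class  # the short import path now works
print(example_class.kind)
```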
You can modify sys.path at run time so that it contains a directory for each module you'll be using. For example, for package1, issue the following statements:
>>> import sys
>>> sys.path.append(r"[package directory path]\Parsers\FileType1_Parsers\package1")
You can do this for any other modules in the package as well. Now you can just use this command:
from package1 import example_class
Hope this helps!

What import system should I use when I want to run my application both 'from source' and from installing with setuptools?

Consider this application:
.
├── LICENSE
├── MANIFEST.in
├── program
│   ├── apple.py
│   ├── __init__.py
│   ├── __main__.py
│   ├── nonfruit.py
│   ├── pear.py
│   ├── strawberry.py
│   └── vegetables
│       ├── carrot.py
│       ├── __init__.py
│       └── lettuce.py
├── README.md
├── setup.cfg
└── setup.py
__main__.py is the file that users should run to use my program. I am distributing my program via PyPI, so I want it to be installable via pip as well. Because of that, I created a setup.py file with an entry point:
entry_points={
    'console_scripts': ['pg=program.__main__:main']
}
The problem I'm facing is that there are several imports in my program, and these result in the situation that my program does run 'locally' (by executing python ./__main__.py) but not from the installation (by running pg); or, depending on the way I import it, the other way around.
__main__.py imports nonfruit.py:
from nonfruit import Nonfruit
nonfruit.py imports vegetables/carrot.py:
import vegetables.carrot
ca = vegetables.carrot.Carrot()
I would like to hear some advice on structuring my project regarding imports, so that it runs both locally and from a setuptools installation. For example, should I use absolute imports or relative imports? And should I use from X import Y or import X.Y?
I found a solution on Jan-Philip Gehrcke's website.
The instructions are written for use with Python 3, but I applied them to Python 2.7 with success. The problem I was having originated from my directory becoming a package. Jan advises creating one file to run it from source (bootstrap-runner.py) and one file to run it from the installation (bootstrap/__main__.py). Furthermore, he advises using explicit relative imports:
from .X import Y
This will probably be a good guideline in the next applications I'm writing.
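The effect of explicit relative imports can be shown end to end. The sketch below builds a throwaway package mirroring the layout in the question (module contents are invented for the demo) and runs it with python -m, which is what makes the relative imports resolve; running __main__.py directly as a script would fail with an ImportError:

```python
import os
import subprocess
import sys
import tempfile

# A throwaway package using explicit relative imports throughout.
files = {
    "program/__init__.py": "",
    "program/nonfruit.py": "class Nonfruit:\n    name = 'tofu'\n",
    "program/vegetables/__init__.py": "",
    "program/vegetables/carrot.py": "class Carrot:\n    color = 'orange'\n",
    "program/__main__.py": (
        "from .nonfruit import Nonfruit\n"
        "from .vegetables.carrot import Carrot\n"
        "def main():\n"
        "    print(Nonfruit().name, Carrot().color)\n"
        "if __name__ == '__main__':\n"
        "    main()\n"
    ),
}

with tempfile.TemporaryDirectory() as root:
    for path, text in files.items():
        full = os.path.join(root, path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "w") as f:
            f.write(text)
    # "-m" runs "program" as a package, so __package__ is set and the
    # relative imports work; the same imports also work after a
    # setuptools install, because the entry point imports the package.
    result = subprocess.run(
        [sys.executable, "-m", "program"],
        cwd=root, capture_output=True, text=True,
    )
print(result.stdout.strip())
```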
