The way to make namespace packages in Python

From Namespace Packages in distribute, I know I can make use of namespace packages to separate a big Python package into several smaller ones. It is really awesome. The document also mentions:
Note, by the way, that your project’s source tree must include the
namespace packages’ __init__.py files (and the __init__.py of any
parent packages), in a normal Python package layout. These __init__.py
files must contain the line:
__import__('pkg_resources').declare_namespace(__name__)
This code ensures that the namespace package machinery is operating
and that the current package is registered as a namespace package.
I'm wondering whether there are any benefits to keeping the hierarchy of directories matching the hierarchy of packages, or whether this is just a technical requirement of the namespace packages feature of distribute/setuptools.
For example, I would like to provide a sub-package foo.bar, so I have to build the following hierarchy of folders and prepare an __init__.py to make setup.py treat it as a namespace package:
~foo.bar/
~foo.bar/setup.py
~foo.bar/foo/__init__.py <= one-lined file dedicated to namespace packages
~foo.bar/foo/bar/__init__.py
~foo.bar/foo/bar/foobar.py
I'm not familiar with namespace packages, but it looks to me that 1) the foo/bar nesting and 2) the (nearly) one-line __init__.py are routine boilerplate. They do provide a hint of "this is a namespace package", but I think we already have that information in setup.py?
edit:
As illustrated in the following block, can I have a namespace package without that nested directory and the one-line __init__.py in my working directory? That is, can we ask setup.py to generate those automatically by just adding one line, namespace_packages = ['foo']?
~foo.bar/
~foo.bar/setup.py
~foo.bar/src/__init__.py <= for bar package
~foo.bar/src/foobar.py

A namespace package mainly has an effect when it comes time to import a sub-package. Basically, here's what happens when importing foo.bar:
the importer scans through sys.path looking for something that looks like foo.
when it finds something, it will look inside of the discovered foo for bar.
if bar is not found:
if foo is a normal package, an ImportError is raised, indicating that foo.bar doesn't exist.
if foo is a namespace package, the importer goes back to looking through sys.path for the next match of foo. the ImportError is only raised if all paths have been exhausted.
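To see that lookup concretely, here is a minimal sketch. It uses PEP 420 implicit namespace packages (Python 3.3+, no __init__.py needed) rather than the setuptools declare_namespace mechanism, and the directory names are invented for illustration:

# Layout on disk (note: no __init__.py under either foo/, so foo is a namespace package):
#   /tmp/dist_a/foo/baz.py
#   /tmp/dist_b/foo/bar.py
import sys
sys.path.extend(['/tmp/dist_a', '/tmp/dist_b'])

import foo.bar    # not in dist_a, so the importer keeps scanning and finds it in dist_b
import foo.baz    # found in dist_a; both sub-modules live under the single name foo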
So that's what it does, but that doesn't explain why you might want it. Suppose you designed a big, useful library (foo), but as part of that you also developed a small but very useful utility (foo.bar) that other Python programmers find useful, even when they don't have a use for the bigger library.
You could distribute them together as one big blob of a package (as you designed it) even though most of the people using it only ever import the sub-module. Your users would find this terribly inconvenient because they'd have to download the whole thing (all 200MB of it!) even though they are only really interested in a 10 line utility class. If you have an open license, you'll probably find that several people end up forking it and now there are a half dozen diverging versions of your utility module.
You could rewrite your whole library so that the utility lives outside the foo namespace (just bar instead of foo.bar). You'd be able to distribute the utility separately, and some of your users would be happy, but that's a lot of work, especially considering that there actually are lots of users using the whole library, and they would have to rewrite their programs to use the new name.
So what you really want is a way to install foo.bar on its own, but happily coexist with foo when that's desired too.
A namespace package allows exactly this, two totally independent installations of a foo package can coexist. setuptools will recognize that the two packages are designed to live next to each other and politely shift the folders/files in such a way that both are on the path and appear as foo, one containing foo.bar and the other containing the rest of foo.
You'll have two different setup.py scripts, one for each. The foo/__init__.py in both packages has to indicate that it is a namespace package so the importer knows to continue regardless of which package is discovered first.
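As a rough sketch (the distribution names here are invented), each of the two source trees would contain its own foo/__init__.py with the one-line declaration, and each setup.py would declare the shared namespace to setuptools:

# foo/__init__.py, identical in both source trees:
__import__('pkg_resources').declare_namespace(__name__)

# setup.py of the distribution that provides only foo.bar:
from setuptools import setup, find_packages

setup(
    name='foo.bar',                 # the other distribution might be named 'foo.core'
    packages=find_packages(),
    namespace_packages=['foo'],
)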

Related

Why does python packaging documentation say __init__.py file of package should be empty?

Based on the documentation for Packaging Python Projects, __init__.py should be empty. I want to know why, because I'm placing certain objects in the __init__.py file that are used in every module in the package. On checking a bunch of __init__.py files in my local environment for standard packages like importlib, multiprocessing etc., all of them have a bunch of code in them.
The sole purpose of __init__.py is to indicate that the folder containing it is a package, so that Python treats the folder as a package; that's why it is recommended to leave it empty.
Consider following hierarchy:
foo
__init__.py
bar.py
When you use from foo import bar or import foo.bar, the Python interpreter will look for __init__.py in the foo folder; if it finds it, the bar module will be imported, otherwise it won't. However, this behavior has changed over time, and Python may be able to import the modules/packages successfully even if __init__.py is missing, but remember the Zen of Python: explicit is better than implicit, so it's always safer to have it.
But if you need some package-level variables to be defined, you can do that inside the __init__.py file, and all the modules inside the package will be able to use them.
And in fact, if you look at PEP 257, it mentions that __init__.py can also contain the docstring for package-level documentation.
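For example (the package name foo and the constant DEFAULT_TIMEOUT are just placeholders), a package-level __init__.py might carry a docstring and a shared constant:

# foo/__init__.py
"""foo: a small example package.

Package-level documentation can live here, per PEP 257.
"""
DEFAULT_TIMEOUT = 30    # available to every module as foo.DEFAULT_TIMEOUT

# foo/bar.py
from foo import DEFAULT_TIMEOUT

def fetch():
    print("using timeout", DEFAULT_TIMEOUT)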
You're taking that statement as more general than it's meant. You're reading a statement from a tutorial, where they walk you through creating a simple example project. That particular example project's __init__.py should be empty, simply because the example doesn't need to do anything in __init__.py.
Most projects' __init__.py files will not be empty. Taking a few examples from popular packages, such as numpy, requests, flask, sortedcontainers, or the stdlib asyncio, none of these example __init__.py files are empty. They may perform package initialization, import things from submodules into the main package namespace, or include metadata like __all__, __version__, or a package docstring. The example project is just simplified to the point where it doesn't have any of that.
To my knowledge, there are three things you need to be aware of when you create a non-empty __init__ file:
it might be more difficult to follow the code. If you instantiate a = B() in __init__ it's even worse. I know developers who don't like it only for this reason
on package import contents of __init__ are evaluated. Sometimes it might be computation heavy or simply not needed.
namespace conflicts. You can't really instantiate bar in init and have a bar.py file in your package.
I like importing package contents in __init__ as otherwise in bigger projects import statements become ugly. Overall it's not a good or bad practice. This advice applies only to the project in this particular example.
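For instance, a package that re-exports from its __init__.py (the module and class names below are made up) lets callers write shorter imports:

# mypkg/__init__.py
from mypkg.client import Client        # re-export for convenience
from mypkg.errors import ClientError

__all__ = ['Client', 'ClientError']

# callers can then write
from mypkg import Client
# instead of
from mypkg.client import Client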
In some cases you don't have any shared components in your package. Suppose you are defining a little package for calculating some algorithms; then you don't need any shared components in your __init__.py.

How do I structure my Python project to allow named modules to be imported from sub directories

This is my directory structure:
Projects
+ Project_1
+ Project_2
- Project_3
- Lib1
__init__.py # empty
moduleA.py
- Tests
__init__.py # empty
foo_tests.py
bar_tests.py
setpath.py
__init__.py # empty
foo.py
bar.py
Goals:
Have an organized project structure
Be able to independently run each .py file when necessary
Be able to reference/import both sibling and cousin modules
Keep all import/from statements at the beginning of each file.
I achieved #1 by using the above structure.
I've mostly achieved 2, 3, and 4 by doing the following (as recommended by this excellent guide)
In any package that needs to access parent or cousin modules (such as the Tests directory above) I include a file called setpath.py which has the following code:
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('...'))
Then, in each module that needs parent/cousin access, such as foo_tests.py, I can write a nice clean list of imports like so:
import setpath # Annoyingly, PyCharm warns me that this is an unused import statement
import foo
Inside setpath.py, the second and third inserts are not strictly necessary for this example, but are included as a troubleshooting step.
My problem is that this only works for imports that reference the module name directly, and not for imports that reference the package. For example, inside bar_tests.py, neither of the two statements below work when running bar_tests.py directly.
import setpath
import Project_3.foo # Error
from Project_3 import foo # Error
I receive the error "ImportError: No module named 'Project_3'".
What is odd is that I can run the file directly from within PyCharm and it works fine. I know that PyCharm is doing some behind the scenes magic with the Python Path variable to make everything work, but I can't figure out what it is. As PyCharm simply runs python.exe and sets some environmental variables, it should be possible to clone this behavior from within a Python script itself.
For reasons not really germane to this question, I have to reference bar using the Project_3 qualifier.
I'm open to any solution that accomplishes the above while still meeting my earlier goals. I'm also open to an alternate directory structure if there is one that works better. I've read the Python doc on imports and packages but am still at a loss. I think one possible avenue might be manually setting the __path__ variable, but I'm not sure which one needs to be changed or what to set it to.
Those types of questions qualify as "primarily opinion based", so let me share my opinion on how I would do it.
First "be able to independently run each .py file when necessary": either the file is an module, so it should not be called directly, or it is standalone executable, then it should import its dependencies starting from top level (you may avoid it in code or rather move it to common place, by using setup.py entry_points, but then your former executable effectively converts to a module). And yes, it is one of weak points of Python modules model, that causes misunderstandings.
Second, use virtualenv (or venv in Python 3) and put each of your Project_x into a separate one. This way the project's name won't be part of the Python module path.
Third, the link you've provided mentions setup.py – you may make use of it. Put your custom code into Project_x/src/mylib1, create src/mylib1/setup.py, and finally put your modules into src/mylib1/mylib1/module.py. Then you can install your code with pip like any other package (or with pip install -e, so you can work on the code directly without reinstalling it, though that unfortunately has some limitations).
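A rough sketch of that layout and workflow (mylib1 is just the placeholder name from above; the metadata values are made up):

# Project_x/src/mylib1/setup.py
from setuptools import setup, find_packages

setup(
    name='mylib1',
    version='0.1',
    packages=find_packages(),   # picks up the mylib1/ package next to this file
)

# then install it, editable, into the active virtualenv:
#   pip install -e Project_x/src/mylib1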
And finally, as you've confirmed in a comment already ;). The problem with your current model was that in sys.path.insert(0, os.path.abspath('...')) you'd mistakenly used Python's module notation, which is incorrect for filesystem paths and should be replaced with '../..' to work as expected.
I think your goals are not reasonable. Specifically, goal number 2 is a problem:
Be able to independently run each .py file when necessary
This doesn't work well for modules in a package. At least, not if you're running the .py files naively (e.g. with python foo_tests.py on the command line). When you run the files that way, Python can't tell where the package hierarchy should start.
There are two alternatives that can work. The first option is to run your scripts from the top level folder (e.g. projects) using the -m flag to the interpreter to give it a dotted path to the main module, and using explicit relative imports to get the sibling and cousin modules. So rather than running python foo_tests.py directly, run python -m project_3.tests.foo_tests from the projects folder (or python -m tests.foo_tests from within project_3 perhaps), and have foo_tests.py use from .. import foo.
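A minimal sketch of that first option (folder names are lower-cased here, as in the paragraph above; adjust to your actual Project_3/Tests casing):

# projects/project_3/tests/foo_tests.py
from .. import foo            # explicit relative import of foo from the parent package

if __name__ == '__main__':
    print('imported', foo.__name__)

# invoked from the projects folder as:
#   python -m project_3.tests.foo_tests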
The other (less good) option is to add a top-level folder to your Python installation's module search path on a system-wide basis (e.g. add the projects folder to the PYTHONPATH environment variable), and then use absolute imports for all your modules (e.g. import project_3.foo). This is effectively what your setpath module does, but doing it system wide as part of your system's configuration, rather than at run time, is much cleaner. It also avoids the multiple names that setpath will allow you to use to import a module (e.g. try import foo_tests, then tests.foo_tests, and you'll get two separate copies of the same module).
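That duplicate-module pitfall is easy to see; here is a quick sketch (module names follow the lowercase example above and assume setpath has put both the tests folder and its parent on sys.path):

import foo_tests
import tests.foo_tests
print(foo_tests is tests.foo_tests)   # False: the same file is loaded twice under two different names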

Module naming convention (avoid collisions)

I'm learning Python and I have already created a few ad-hoc utility modules that I use for whatever. I don't intend to install them anywhere; having them simply lying around and copying them wherever needed is OK for me for now.
So I typically just create a file named like mymodule.py and import mymodule from a script in the same directory. For names I use the lowercase alphabet (i.e. no _s) only. So now, after having seen a couple of "real" Python modules and realized that the convention is mostly the same, I'm starting to wonder about clashes.
Is there a convention for naming own ad-hoc modules, so that one can avoid clashes with core or "pip" modules (even future ones that are yet to be added)?
Similar convention exists in Perl community (especially because of CPAN), where all such modules should start with Local::, as in Local::MyCrazyModule.
Note: There is a similar question here on SO, but that one does not seem to ask specifically about modules, but rather about variable names clashing with modules.
The easiest way to avoid conflicts with pip-installable modules is to create a dummy package with your name and publish it to PyPI. Then you can use this top-level namespace for all your modules, published or not.
For more information about naming conventions you can read: http://www.python.org/dev/peps/pep-0423/

Namespace packages with a core part?

This question follows up The way to make namespace packages in Python and How do I create a namespace package in Python?.
Note PEP 420, and the distribute docs, which state:
You must NOT include any other code and data in a namespace package’s __init__.py. Even though it may appear to work during development, or when projects are installed as .egg files, it will not work when the projects are installed using “system” packaging tools – in such cases the __init__.py files will not be installed, let alone executed.
This all seems to make it impossible to have a "main library" package with independently distributed extension sub-packages. What I want is to be able to:
define a core library package, to be used like this:
import mylibrary
mylibrary.some_function()
allow library extensions, packaged and distributed separately, to be used like this:
import mylibrary.myextension
mylibrary.myextension.some_other_function()
I would've expected to be able to do this with namespace packages, but it seems not to be the case, based on the questions and links above. Can this be done at all?
It is indeed not possible to have code in a top level __init__.py for a PEP 420 namespace package.
If I were you, I'd either:
create 2 packages, one called mylibrary (a normal package) which contains your actual library code, and the other called mylibrary_plugins which is a namespace package.
or, create mylibrary.lib, which is a normal package and contains your code, and mylibrary.plugins, which is a namespace package.
Personally I'd use option 1.
The rationale section of PEP 420 explains why __init__.py cannot contain any code.
Strictly speaking, you can have variables under mylibrary; you just won't be able to define them there. You can, for instance, do this:
# mylibrary/core.py
import mylibrary

def some_function():
    pass

mylibrary.some_function = some_function
and your users can use it like:
import mylibrary.core
mylibrary.some_function()
That is to say, mylibrary.core monkey-patches mylibrary so that, other than the import, it looks as though some_function is defined in mylibrary rather than in a sub-package.

Python path issue while importing my own modules. What is the best way to go about it?

I've created python modules but they are in different directories.
/xml/xmlcreator.py
/tasklist/tasks.py
Here, tasks.py is trying to import xmlcreator, but both are in different paths. One way to do it is to include xmlcreator.py in the PYTHONPATH. But, considering that I'll be publishing the code, this doesn't seem like the right way to go about it, as suggested here. So how do I include xmlcreator, or rather any module written by me that might live in various directories and sub-directories?
Are you going to publish both modules separately or together in one package?
If the former, then you'll probably want to have your users install your xml module (I'd call it something else :) so that it is, by default, already on Python's path, and declare it as a dependency of the tasklist module.
If both are distributed as a bundle, then relative imports seem to be the best option, since you can control where the paths are relative to each other.
The best way is to create subpackages in a single top-level package that you define. You then ship these together in one package. If you are using setuptools/Distribute and you want to distribute them separately, then you may also define a "namespace package" that the packages will be installed in. You don't need to use any ugly sys.path hacks.
Make a directory tree like this:
mypackage/__init__.py
mypackage/xml/__init__.py
mypackage/xml/xmlcreator.py
mypackage/tasklist/__init__.py
mypackage/tasklist/tasks.py
The __init__.py files may be empty. They define the directory to be a package that Python will search in.
Except that if you want to use namespace packages, the mypackage/__init__.py should contain:
__import__('pkg_resources').declare_namespace(__name__)
And your setup.py file should contain:
...
namespace_packages=["mypackage"],
...
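For completeness, a minimal full setup.py for the layout above might look roughly like this (version and other metadata are placeholders):

from setuptools import setup, find_packages

setup(
    name='mypackage',
    version='0.1',
    packages=find_packages(),
    namespace_packages=['mypackage'],
    install_requires=['setuptools'],   # pkg_resources is used in __init__.py
)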
Then in your code:
from mypackage.xml import xmlcreator
from mypackage.tasklist import tasks
Will get them anywhere you need them. You only need to make one name globally unique in this case, the mypackage name.
For developing the code you can put the package in "develop mode", by doing
python setup.py develop --user
This will set up the local python environment to look for your package in your workspace.
When I start a new Python project, I immediately write its setup.py and declare my Python modules/packages, so that then I just do:
python setup.py develop
and everything gets magically added to my PYTHONPATH. If you do it from a virtualenv it's even better, since you don't need to install it system-wide.
Here's more about it:
http://packages.python.org/distribute/setuptools.html#development-mode
