How to modularize a Python application

How to modularize a Python application - python

I've got a number of scripts that use common definitions. How do I split them in multiple files? Furthermore, the application can not be installed in any way in my scenario; it must be possible to have an arbitrary number of versions concurrently running and it must work without superuser rights. Solutions I've come up with are:
Duplicate code in every
script. Messy, and probably the worst
scheme.
Put all scripts and common
code in a single directory, and
use from . import to load them.
The downside of this approach is that
I'd like to put my libraries in other
directory than the applications.
Put common
code in its own directory, write a __init__.py that imports all submodules and finally use from . import to load them.
Keeps code organized, but it's a little bit of overhead to maintain __init__.py and qualify names.
Add the library directory to
sys.path and
import. I tend to
this, but I'm not sure whether
fiddling with sys.path
is nice code.
Load using
execfile
(exec in Python 3).
Combines the advantages of the
previous two approaches: Only one
line per module needed, and I can use
a dedicated. On the other hand, this
evades the python module concept and
polutes the global namespace.
Write and install a module using
distutils. This
installs the library for all python
scripts and needs superuser rights
and impacts other applications and is hence not applicable in my case.
What is the best method?

Adding to sys.path (usually using site.addsitedir) is quite common and not particularly frowned upon. Certainly you will want your common working shared stuff to be in modules somewhere convenient.
If you are using Python 2.6+ there's already a user-level modules folder you can use without having to add to sys.path or PYTHONPATH. It's ~/.local/lib/python2.6/site-packages on Unix-likes - see PEP 370 for more information.

You can set the PYTHONPATH environment variable to the directory where your library files are located. This adds that path to the library search path and you can use a normal import to import them.

If you have multiple environments which have various combinations of dependencies, a good solution is to use virtualenv to create sandboxed Python environments, each with their own set of installed packages. Each environment will function in the same way as a system-wide Python site-packages setup, but no superuser rights are required to create local environments.
Google has plenty of info, but this looks like a pretty good starting point.

Another alternative to manually adding the path to sys.path is to use the environment variable PYTHONPATH.
Also, distutils allows you to specify a custom installation directory using
python setup.py install --home=/my/dir
However, neither of these may be practical if you need to have multiple versions running simultaneously with the same module names. In that case you're probably best off modifying sys.path.

I've used the third approach (add the directories to sys.path) for more than one project, and I think it's a valid approach.

Related

Is it possible to have users not pip install modules and instead include the modules used in a different folder and then import that?

I want to know if I can create a python script with a folder in the same directory with all the assets of a python module, so when someone wants to use it, they would not have to pip install module, because it would import from the directory.

Yes, you can, but it doesn't mean that you should.
First, ask yourself who is suposed to use that code.
If you plan to give it to consumers, it would be a good idea to use a tool like py2exe and create executable file which would include all modules and not allow for code to be changed.
If you plan to share it with another developer, you might want to look into virtual environments and requirements.txt file.
There are multiple reasons why sharing modules is bad idea:
It is harder to update modules later, at least without upgrading whole project.
It uses more space on version control, which can create issues on huge projects with hundreds of modules and branches
It might be illegal as some licenses specifically forbid including their code in your source code.
The pip install of some module might do different things depending on operating system version or installed packages. The modules on your machine might be suboptimal on someone else's machine, and in some instances might not even work.
And probably more that I can't think of right now.
The only situation where I saw this being unavoidable was when the module didn't support python implementation the application was running on. The module was changed, and its source was put under lib folder with the rest of the libraries.

I think you can add the directory with python modules into PYTHONPATH. Then people want to use those modules just need has this envvar set.
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH

Some way to create a cross-platform, self-contained, cloud-synchronized python library of modules for personal use? [duplicate]

I need to ship a collection of Python programs that use multiple packages stored in a local Library directory: the goal is to avoid having users install packages before using my programs (the packages are shipped in the Library directory). What is the best way of importing the packages contained in Library?
I tried three methods, but none of them appears perfect: is there a simpler and robust method? or is one of these methods the best one can do?
In the first method, the Library folder is simply added to the library path:
import sys
import os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'Library'))
import package_from_Library
The Library folder is put at the beginning so that the packages shipped with my programs have priority over the same modules installed by the user (this way I am sure that they have the correct version to work with my programs). This method also works when the Library folder is not in the current directory, which is good. However, this approach has drawbacks. Each and every one of my programs adds a copy of the same path to sys.path, which is a waste. In addition, all programs must contain the same three path-modifying lines, which goes against the Don't Repeat Yourself principle.
An improvement over the above problems consists in trying to add the Library path only once, by doing it in an imported module:
# In module add_Library_path:
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'Library'))
and then to use, in each of my programs:
import add_Library_path
import package_from_Library
This way, thanks to the caching mechanism of CPython, the module add_Library_path is only run once, and the Library path is added only once to sys.path. However, a drawback of this approach is that import add_Library_path has an invisible side effect, and that the order of the imports matters: this makes the code less legible, and more fragile. Also, this forces my distribution of programs to inlude an add_Library_path.py program that users will not use.
Python modules from Library can also be imported by making it a package (empty __init__.py file stored inside), which allows one to do:
from Library import module_from_Library
However, this breaks for packages in Library, as they might do something like from xlutils.filter import …, which breaks because xlutils is not found in sys.path. So, this method works, but only when including modules in Library, not packages.
All these methods have some drawback.
Is there a better way of shipping programs with a collection of packages (that they use) stored in a local Library directory? or is one of the methods above (method 1?) the best one can do?
PS: In my case, all the packages from Library are pure Python packages, but a more general solution that works for any operating system is best.
PPS: The goal is that the user be able to use my programs without having to install anything (beyond copying the directory I ship them regularly), like in the examples above.
PPPS: More precisely, the goal is to have the flexibility of easily updating both my collection of programs and their associated third-party packages from Library by having my users do a simple copy of a directory containing my programs and the Library folder of "hidden" third-party packages. (I do frequent updates, so I prefer not forcing the users to update their Python distribution too.)

Messing around with sys.path() leads to pain... The modern package template and Distribute contain a vast array of information and were in part set up to solve your problem.
What I would do is to set up setup.py to install all your packages to a specific site-packages location or if you could do it to the system's site-packages. In the former case, the local site-packages would then be added to the PYTHONPATH of the system/user. In the latter case, nothing needs to changes
You could use the batch file to set the python path as well. Or change the python executable to point to a shell script that contains a modified PYTHONPATH and then executes the python interpreter. The latter of course, means that you have to have access to the user's machine, which you do not. However, if your users only run scripts and do not import your own libraries, you could use your own wrapper for scripts:
#!/path/to/my/python
And the /path/to/my/python script would be something like:
#!/bin/sh
PYTHONPATH=/whatever/lib/path:$PYTHONPATH /usr/bin/python $*

I think you should have a look at path import hooks which allow to modify the behaviour of python when searching for modules.
For example you could try to do something like kde's scriptengine does for python plugins[1].
It adds a special token to sys.path(like "<plasmaXXXXXX>" with XXXXXX being a random number just to avoid name collisions) and then when python try to import modules and can't find them in the other paths, it will call your importer which can deal with it.
A simpler alternative is to have a main script used as launcher which simply adds the path to sys.path and execute the target file(so that you can safely avoid putting the sys.path.append(...) line on every file).
Yet an other alternative, that works on python2.6+, would be to install the library under the per-user site-packages directory.
[1] You can find the source code under /usr/share/kde4/apps/plasma_scriptengine_python in a linux installation with kde.

Best practice for handling path/executables in project scripts in Python (e.g. something like Django's manage.py, or fabric)

I do a lot of work on different projects (I'm a scientist) in a fairly standardised directory structure. e.g.:
project
/analyses/
/lib
/doc
/results
/bin
I put all my various utility scripts in /bin/ because cleanliness is next to godliness. However, I have to hard code paths (e.g. ../../x/y/z) and then I have to run things within ./bin/ or they break.
I've used Django and that has /manage.py which runs various django-things and automatically handles the path. I've also used fabric to run various user defined functions.
Question: How do I do something similar? and what's the best way? I can easily write something in /manage.py to inject the root dir into sys.path etc, but then I'd like to be able to do "./manage.py foo" which would run /bin/foo.py. Or is it possible to get fabric to call executables from a certain directory?
Basically - I want something easy and low maintenance. I want to be able to drop an executable script/file/whatever into ./bin/ and not have to deal with path issues or import issues.
What is the best way to do this?

Keep Execution at TLD
In general, try to keep your runtime at top-level. This will straighten out your imports tremendously.
If you have to do a lot of import addressing with relative imports, there's probably a
better way.
Modifying The Path
Other poster's have mentioned the PYTHONPATH. That's a great way to do it permanently in your shell.
If you don't want to/aren't able to manipulate the PYTHONPATH project path directly you can use sys.path to get yourself out of relative import hell.
Using sys.path.append
sys.path is just a list internally. You can append to it to add stuff to into your path.
Say I'm in /bin and there's a library markdown in lib/. You can append a relative paths with sys.path to import what you want.
import sys
sys.path.append('../lib')
import markdown
print markdown.markdown("""
Hello world!
------------
""")
Word to the wise: Don't get too crazy with your sys.path additions. Keep your schema simple to avoid yourself a lot confusion.
Overly eager imports can sometimes lead to cases where a python module needs to import itself, at which point execution will halt!
Using Packages and __init__.py
Another great trick is creating python packages by adding __init__.py files. __init__.py gets loaded before any other modules in the directory, so it's a great way to add imports across the entire directory. This makes it an ideal spot to add sys.path hackery.
You don't even need to necessarily add anything to the file. It's sufficient to just do touch __init__.py at the console to make a directory a package.
See this SO post for a more concrete example.

In a shell script that you source (not run) in your current shell you set the following environment variables:
PATH=$PATH:$PROJECTDIR/bin
PYTHONPATH=$PROJECTDIR/lib
Then you put your Python modules and package tree in your projects ./lib directory. Python automatically adds the PYTHONPATH environment variable to sys.path.
Then you can run any top-level script from the shell without specifying the path, and any imports from your library modules are looked for in the lib directory.
I recommend very simple top-level scripts, such as:
#!/usr/bin/python
import sys
import mytool
mytool.main(sys.argv)
Then you never have to change that, you just edit the module code, and also benefit from the byte-code caching.

You can easily achieve your goals by creating a mini package that hosts each one of your projects. Use paste scripts to create a simple project skeleton. And to make it executable, just install it via setup.py develop. Now your bin scripts just need to import the entry point to this package and execute it.

Importing Python modules from a distant directory

What's the shortest way to import a module from a distant, relative directory?
We've been using this code which isn't too bad except your current working directory has to be the same as the directory as this code's or the relative path breaks, which can be error prone and confusing to users.
import sys
sys.path.append('../../../Path/To/Shared/Code')
This code (I think) fixes that problem but is a lot more to type.
import os,sys
sys.path.append(os.path.realpath(os.path.join(os.path.dirname(__file__), '../../../Path/To/Shared/Code')))
Is there a shorter way to append the absolute path? The brevity matters because this is going to have to be typed/appear in a lot of our files. (We can't factor it out because then it would be in the shared code and we couldn't get to it. Chicken & egg, bootstrapping, etc.)
Plus it bothers me that we keep blindly appending to sys.path but that would be even more code. I sure wish something in the standard library could help with this.
This will typically appear in script files which are run from the command line. We're running Python 2.6.2.
Edit:
The reason we're using relative paths is that we typically have multiple, independent copies of the codebase on our computers. It's important that each copy of the codebase use its own copy of the shared code. So any solution which supports only a single code base (e.g., 'Put it in site-packages.') won't work for us.
Any suggestions? Thank you!

Since you don't want to install it in site-packages, you should use buildout or virtualenv to create isolated development environments. That solves the problem, and means you don't have to fiddle with sys.path anymore (in fact, because Buildout does exactly that for you).

You've explained in a comment why you don't want to install "a single site-packages directory", but what about putting in site-packages a single, tiny module, say jebootstrap.py:
import os, sys
def relative_dir(apath):
return os.path.realpath(
os.path.join(os.path.dirname(apath),
'../../../Path/To/Shared/Code'))
def addpack(apath):
relative = relative_dir(apath)
if relative not in sys.path:
sys.path.append(relative)
Now everywhere in your code you can just have
import jebootstrap
jebootsrap.addpack(__file__)
and all the rest of your shared codebase can remain independent per-installation.

Any reason you wouldn't want to make your own shared-code dir under site-packages? Then you could just import import shared.code.module...

You have several ways to handle imports, all documented in the Python language manual.
See http://docs.python.org/library/site.html and http://docs.python.org/reference/simple_stmts.html#the-import-statement
Put it in site-packages and have multiple Python installations. You select the installation using the ordinary PATH environment variable.
Put the directory in your PYTHONPATH environment variable. This is a per-individual-person setting, so you can manage to have multiple versions of the codebase this way.
Put the directory in .pth files in your site-packages. You select the installation using the ordinary PATH environment variable.

PYTHONPATH vs. sys.path

Another developer and I disagree about whether PYTHONPATH or sys.path should be used to allow Python to find a Python package in a user (e.g., development) directory.
We have a Python project with a typical directory structure:
Project
setup.py
package
__init__.py
lib.py
script.py
In script.py, we need to do import package.lib. When the package is installed in site-packages, script.py can find package.lib.
When working from a user directory, however, something else needs to be done. My solution is to set my PYTHONPATH to include "~/Project". Another developer wants to put this line of code in the beginning of script.py:
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
So that Python can find the local copy of package.lib.
I think this is a bad idea, as this line is only useful for developers or people running from a local copy, but I can't give a good reason why it is a bad idea.
Should we use PYTOHNPATH, sys.path, or is either fine?

If the only reason to modify the path is for developers working from their working tree, then you should use an installation tool to set up your environment for you. virtualenv is very popular, and if you are using setuptools, you can simply run setup.py develop to semi-install the working tree in your current Python installation.

I hate PYTHONPATH. I find it brittle and annoying to set on a per-user basis (especially for daemon users) and keep track of as project folders move around. I would much rather set sys.path in the invoke scripts for standalone projects.
However sys.path.append isn't the way to do it. You can easily get duplicates, and it doesn't sort out .pth files. Better (and more readable): site.addsitedir.
And script.py wouldn't normally be the more appropriate place to do it, as it's inside the package you want to make available on the path. Library modules should certainly not be touching sys.path themselves. Instead, you'd normally have a hashbanged-script outside the package that you use to instantiate and run the app, and it's in this trivial wrapper script you'd put deployment details like sys.path-frobbing.

In general I would consider setting up of an environment variable (like PYTHONPATH)
to be a bad practice. While this might be fine for a one off debugging but using this as
a regular practice might not be a good idea.
Usage of environment variable leads to situations like "it works for me" when some one
else reports problems in the code base. Also one might carry the same practice with the
test environment as well, leading to situations like the tests running fine for a
particular developer but probably failing when some one launches the tests.

Along with the many other reasons mentioned already, you could also point outh that hard-coding
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
is brittle because it presumes the location of script.py -- it will only work if script.py is located in Project/package. It will break if a user decides to move/copy/symlink script.py (almost) anywhere else.

Neither hacking PYTHONPATH nor sys.path is a good idea due to the before mentioned reasons. And for linking the current project into the site-packages folder there is actually a better way than python setup.py develop, as explained here:
pip install --editable path/to/project
If you don't already have a setup.py in your project's root folder, this one is good enough to start with:
from setuptools import setup
setup('project')

I think, that in this case using PYTHONPATH is a better thing, mostly because it doesn't introduce (questionable) unneccessary code.
After all, if you think of it, your user doesn't need that sys.path thing, because your package will get installed into site-packages, because you will be using a packaging system.
If the user chooses to run from a "local copy", as you call it, then I've observed, that the usual practice is to state, that the package needs to be added to PYTHONPATH manually, if used outside the site-packages.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to modularize a Python application - python

You can set the PYTHONPATH environment variable to the directory where your library files are located. This adds that path to the library search path and you can use a normal import to import them.

I've used the third approach (add the directories to sys.path) for more than one project, and I think it's a valid approach.

Related

Is it possible to have users not pip install modules and instead include the modules used in a different folder and then import that?

Some way to create a cross-platform, self-contained, cloud-synchronized python library of modules for personal use? [duplicate]

Best practice for handling path/executables in project scripts in Python (e.g. something like Django's manage.py, or fabric)

Importing Python modules from a distant directory

PYTHONPATH vs. sys.path

Categories

Resources