I'm writing two scripts which are intended to be run from the command line. Let's call them foo.py and bar.py. In addition, I have a utility module called util which is to be shared by both of these scripts. How do I structure my code so that foo and bar can both have a simple line like import util? Currently, my directory structure is like this:
MyProject
\-- foo
    \-- foo.py
    \-- foo_util.py
\-- bar
    \-- bar.py
    \-- bar_util.py
\-- util
    \-- util.py
Within foo, I want to be able to write import foo_util and import util, but I don't want to be able to write import bar, because they are independent programs which have no reason to use each other's code.
Update: A slightly modified version of chepner's solution, I've found something that seems to work for me. My project is now structured like this:
MyProject
\-- __init__.py
\-- foo.py
\-- bar.py
\-- MyProject
    \-- __init__.py
    \-- foo
        \-- __init__.py
        \-- foo_impl.py
    \-- bar
        \-- __init__.py
        \-- bar_impl.py
    \-- shared
        \-- __init__.py
        \-- util.py
foo.py can write import MyProject.foo as foo and bar.py can do something similar, and both foo and bar can do import MyProject.shared
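For illustration, the top-level foo.py can then stay a thin launcher. A minimal sketch, assuming (hypothetically) that MyProject/foo re-exports a main() function from foo_impl:

# foo.py (top level) -- thin launcher for the foo program.
# Assumes MyProject/foo/__init__.py does "from .foo_impl import main";
# both main() and that re-export are hypothetical, not from the original.
import sys

import MyProject.foo as foo

if __name__ == '__main__':
    sys.exit(foo.main(sys.argv[1:]))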
I would suggest the following layout: foo.py and bar.py, as scripts, can be placed anywhere. You should install the following packages in a known place:
<known location, such as /usr/lib/python/site-packages>
\-- foo
    \-- util.py
\-- bar
    \-- util.py
\-- util.py
Then, use import foo.util, import bar.util, and import util to access the individual modules where and when necessary.
As abarnert pointed out, you can't hide bar/util.py from foo, nor should you care about doing so.
Put util in site-packages. That's where any modules that you want to be able to import into multiple projects are supposed to go.
Given your current structure, import util does not import your util module, because the util directory is not on Python's import path. To fix that, add an __init__.py to the util directory, which makes it a valid Python package.
In order to import your util, you can then either use a relative import (which would violate your second wish and would require even your app to be a package) or set PYTHONPATH when running your script:
PYTHONPATH="../util" python foo.py
If you want to use util as a library, it belongs in a site-packages directory, which is implicitly on Python's module search path.
I think what you're missing here is that building an installable setuptools-based package solves your problem without actually requiring you to install anything system-wide.
First, this kind of thing is exactly what virtualenv was created for. (Or, if you're on 3.4+, the stdlib's venv.) You create a new virtual environment. Within that environment, you pip install . your util library, and now it's in that environment's site-packages. And then you can run foo and bar and whatever else you want inside that environment, and they can all import util—but nothing has changed in your main system environment.
Even if you don't want to use virtualenv for some reason (but you really should…), if you build your setup right, it will allow you to run everything in "development" mode, which does a fake install to a directory under your source tree and sets up all the links to make it work. This can be a bit fussy when you're trying to install multiple separate scripts as well as modules that they share, but it does work.
A full tutorial on how to lay out distributions, make setuptools auto-generate the wrapper scripts and install them to bin, etc. is way too big for an answer here, and the PyPA people have already written one. There are some complicated bits, so you'll probably get stuck at some point, but when you do, you'll have another good question to ask on SO. :)
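As a taste of what that setup involves, here is a minimal sketch of a setup.py for the util library; the distribution name myproject-util is made up, and this assumes util/ is a package directory with an __init__.py:

# setup.py -- minimal sketch for the shared util library.
# The name 'myproject-util' is a placeholder, not from the original.
from setuptools import setup

setup(
    name='myproject-util',
    version='0.1',
    packages=['util'],  # expects util/__init__.py next to this file
)

Inside a virtualenv, pip install -e . (or the older python setup.py develop) then makes import util work for foo, bar, and anything else in that environment without touching the system install.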
Related
I wanted to ask if there is a way to import modules/functions from a folder in a project that sits alongside another folder.
For example, let's say I have two files:
project/src/training.py
project/lib/functions.py
Now both these folders have an __init__.py file in them. If I want to import functions.py into training.py, it doesn't seem to be detected. I'm trying to use from lib.functions import *. I know this works from the upper level of the folder structure, where I can call both files from a script, but is there a way to do it for files in folders above or alongside?
Fundamentally, the best way of doing this depends on exactly how the two modules are related. There's two possibilities:
The modules are part of one cohesive unit that is intended to be used as a single whole, rather than as separate pieces. (This is the most common situation, and it should be your default if you're not sure.)
The functions.py module is a dependency for training.py but is intended to be used separately.
Cohesive unit
If the modules are one cohesive unit, then splitting them across src/ and lib/ folders is not the standard way of structuring a project in Python.
If you need multiple modules in the same project, the standard way of structuring the folders is to include all the modules in a single package, like so:
project/
    trainingproject/
        __init__.py
        training.py
        functions.py
    other/
        ...
    project/
        ...
    folders/
        ...
The __init__.py file causes Python to recognize the trainingproject/ directory as a single unit called a package. Using a package enables the use of relative imports:
training.py
from . import functions
# The rest of training.py code
Assuming your current directory is project, you can then invoke training.py as a module:
python -m trainingproject.training
Separate units
If your modules are actually separate packages, then the simplest idiomatic solutions during development is to modify the PYTHONPATH environment variable:
sh-derivative syntax:
# All the extra PYTHONPATH references on the right are to append if it already has values
# Using $PWD ensures that the path in the environment variable is absolute.
PYTHONPATH=$PYTHONPATH${PYTHONPATH:+:}$PWD/lib/
python ./src/training.py
PowerShell syntax:
$env:PYTHONPATH = $(if($env:PYTHONPATH) {$env:PYTHONPATH + ';'}) + (Resolve-Path ./lib)
python ./src/training.py
(This is possible in Command Prompt, too, but I'm omitting that since PowerShell is preferred.)
In your module, you would just do a normal import statement:
training.py
import functions
# Rest of training.py code
Doing this will work when you deploy your code to production as well if you copy all the files over and set up the correct paths, but you might want to consider putting functions.py in a wheel and then installing it with pip. That will eliminate the need to set up PYTHONPATH by installing functions.py to site-packages, which will make the import statement just work out of the box. That will also make it easier to distribute functions.py for use with other scripts independent of training.py. I'm not going to cover how to create a wheel here since that is beyond the scope of this question, but here's an introduction.
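If you do go the wheel route, a minimal sketch of a setup.py for shipping the single functions.py module might look like the following; the distribution name functions-lib is invented for illustration:

# setup.py -- sketch for packaging functions.py on its own.
# 'functions-lib' is a made-up name; py_modules expects functions.py
# to sit next to this file.
from setuptools import setup

setup(
    name='functions-lib',
    version='0.1',
    py_modules=['functions'],
)

Running pip wheel . in that directory then produces a wheel that can be pip-installed anywhere, after which import functions works with no PYTHONPATH setup at all.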
Yes, it’s as simple as writing the entire path from the working directory:
from project.src.training import *
Or
from project.lib.functions import *
I agree with what polymath stated above. If you were also wondering how to run these specific scripts or functions once they are imported, use your_function_name(parameters); to run a script that you have imported from the same directory, use exec(open('script_name.py').read()). I would recommend writing functions instead of using exec, however, because exec can be hard to use correctly.
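For example (the file and function names here are placeholders):

# Preferred: import the function and call it directly.
from functions import my_function  # hypothetical module and function
result = my_function(42)

# Possible but fragile: execute another script's source in place.
exec(open('script_name.py').read())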
This is my directory structure:
Projects
+ Project_1
+ Project_2
- Project_3
    - Lib1
        __init__.py  # empty
        moduleA.py
    - Tests
        __init__.py  # empty
        foo_tests.py
        bar_tests.py
        setpath.py
    __init__.py  # empty
    foo.py
    bar.py
Goals:
Have an organized project structure
Be able to independently run each .py file when necessary
Be able to reference/import both sibling and cousin modules
Keep all import/from statements at the beginning of each file.
I achieved #1 by using the above structure.
I've mostly achieved 2, 3, and 4 by doing the following (as recommended by this excellent guide)
In any package that needs to access parent or cousin modules (such as the Tests directory above) I include a file called setpath.py which has the following code:
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('...'))
Then, in each module that needs parent/cousin access, such as foo_tests.py, I can write a nice clean list of imports like so:
import setpath # Annoyingly, PyCharm warns me that this is an unused import statement
import foo
Inside setpath.py, the second and third inserts are not strictly necessary for this example, but are included as a troubleshooting step.
My problem is that this only works for imports that reference the module name directly, and not for imports that reference the package. For example, inside bar_tests.py, neither of the two statements below work when running bar_tests.py directly.
import setpath
import Project_3.foo # Error
from Project_3 import foo # Error
I receive the error "ImportError: No module named 'Project_3'".
What is odd is that I can run the file directly from within PyCharm and it works fine. I know that PyCharm is doing some behind-the-scenes magic with the Python path to make everything work, but I can't figure out what it is. As PyCharm simply runs python.exe and sets some environment variables, it should be possible to clone this behavior from within a Python script itself.
For reasons not really germane to this question, I have to reference bar using the Project_3 qualifier.
I'm open to any solution that accomplishes the above while still meeting my earlier goals. I'm also open to an alternate directory structure if there is one that works better. I've read the Python doc on imports and packages but am still at a loss. I think one possible avenue might be manually setting the __path__ variable, but I'm not sure which one needs to be changed or what to set it to.
Those types of questions qualify as "primarily opinion based", so let me share my opinion on how I would do it.
First "be able to independently run each .py file when necessary": either the file is an module, so it should not be called directly, or it is standalone executable, then it should import its dependencies starting from top level (you may avoid it in code or rather move it to common place, by using setup.py entry_points, but then your former executable effectively converts to a module). And yes, it is one of weak points of Python modules model, that causes misunderstandings.
Second, use virtualenv (or venv in Python 3) and put each of your Project_x projects into a separate one. That way the project's name won't be part of the Python module path.
Third, the link you've provided mentions setup.py, and you can make use of it: put your custom code into Project_x/src/mylib1, create src/mylib1/setup.py, and put your modules into src/mylib1/mylib1/module.py. Then you can install your code with pip like any other package (or use pip install -e so you can work on the code directly without reinstalling it, though that unfortunately has some limitations).
And finally, as you've already confirmed in a comment ;) the problem with your current setup was that in sys.path.insert(0, os.path.abspath('...')) you had mistakenly used Python's module notation, which is incorrect for filesystem paths and should be replaced with '../..' to work as expected.
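For the record, the corrected setpath.py then becomes:

# setpath.py -- corrected: filesystem notation, not module notation.
import os
import sys

sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('../..'))  # was '...', which is wrong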
I think your goals are not reasonable. Specifically, goal number 2 is a problem:
Be able to independently run each .py file when necessary
This doesn't work well for modules in a package. At least, not if you're running the .py files naively (e.g. with python foo_tests.py on the command line). When you run the files that way, Python can't tell where the package hierarchy should start.
There are two alternatives that can work. The first option is to run your scripts from the top-level folder (e.g. projects) using the -m flag to the interpreter to give it a dotted path to the main module, and to use explicit relative imports to get the sibling and cousin modules. So rather than running python foo_tests.py directly, run python -m project_3.tests.foo_tests from the projects folder (or python -m tests.foo_tests from within project_3 perhaps), and have foo_tests.py use from .. import foo.
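Concretely, that first option might look like this sketch (using the lowercase package names from the example invocation above; the test body is a placeholder):

# project_3/tests/foo_tests.py
# Run from the projects folder with:
#   python -m project_3.tests.foo_tests
from .. import foo  # explicit relative import of the cousin module

def run_tests():
    print(foo)  # placeholder: call into foo's real test targets here

if __name__ == '__main__':
    run_tests()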
The other (less good) option is to add a top-level folder to your Python installation's module search path on a system-wide basis (e.g. add the projects folder to the PYTHONPATH environment variable), and then use absolute imports for all your modules (e.g. import project_3.foo). This is effectively what your setpath module does, but doing it system-wide as part of your system's configuration, rather than at run time, is much cleaner. It also avoids the multiple names that setpath allows you to use to import a module (e.g. try import foo_tests and import tests.foo_tests; you'll get two separate copies of the same module).
I have a Flask app that uses functions from custom modules.
My File hierarchy is like so:
__init__.py
ec2/__init__.py
citrixlb/__init__.py
So far in the root __init__.py I have a from ec2 import * clause to load my module.
Now I'm adding a new 'feature' called citrixlb.
Both of the __init__.py files in citrixlb and ec2 use some of the same functions to do their task.
I was thinking of doing something like:
__init__.py
common/__init__.py
ec2/__init__.py
citrixlb/__init__.py
If I do the above, and move all common functions to common/__init__.py, how would ec2/__init__.py and citrixlb/__init__.py get access to the functions in common/__init__.py?
The reason is that
I would like to keep the root __init__.py as sparse as possible
I wish to be able to run the __init__.py in citrixlb and ec2 as standalone scripts.
I also wish to be able to continue to add functionality by adding newdir/__init__.py
If I do the above, and move all common functions to common/__init__.py, how would ec2/__init__.py and citrixlb/__init__.py get access to the functions in common/__init__.py?
This is exactly what explicit relative imports were designed for:
from .. import common
Or, if you insist on using import *:
from ..common import *
You can do this with an absolute import instead. Assuming your top-level package is named mything:
from mything import common
from mything.common import *
But in this case, I think you're better with the relative version. It's not just more concise and easier to read, it's more robust (if you rename mything, or reorganize its structure, or embed this whole package inside a larger package…). But you may want to read the rationales for the two different features in PEP 328 to decide which one seems more compelling to you here.
One thing:
I wish to be able to run the __init__.py in citrixlb and ec2 as standalone scripts.
That, you can't do. Running modules inside a package as a top-level script is not supposed to work. Sometimes you get away with it. Once you're importing from a sibling or a parent, you definitely will not get away with it.
The right way to do it is either:
python -m mything.ec2 instead of python mything/ec2/__init__.py
Write a trivial ec2 script at the top level, that just does something like from mything.ec2 import main; main().
The latter is a common enough pattern that, if you're building a setuptools distribution, it can build the ec2 script for you automatically, and automatically make it still work even if ec2 ends up in /usr/local/bin while the mything package is in your site-packages. See console_scripts for more details.
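For reference, the console_scripts declaration is a one-liner in setup.py; a minimal sketch, assuming mything/ec2/__init__.py defines a main() function:

# setup.py -- sketch of a console_scripts entry point.
# Assumes a main() exists in mything.ec2; adjust names to taste.
from setuptools import setup, find_packages

setup(
    name='mything',
    version='0.1',
    packages=find_packages(),
    entry_points={
        'console_scripts': [
            'ec2 = mything.ec2:main',  # installs an ec2 wrapper script
        ],
    },
)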
I currently have a project called "peewee" which consists of a single python file, peewee.py. There is also a module "tests.py" containing unit tests. This has been great, people that want to use the library can just grab a single file and run with it.
I've lately wanted to add some extras, but am not sure how to do this to make the namespacing right. If you look in the root of my project, it is something like:
peewee.py
tests.py
I want to add the following:
extras/__init__.py
extras/foo.py
extras/bar.py
And this is the tricky part. I want to have it such that folks using the single file can still do this, but if you want the extras you can have them, too. I want the extras to be namespaced such that:
from peewee.extras import foo
from peewee.extras.bar import Baz
My setup.py looks a bit like this:
setup(
    name='peewee',
    packages=['extras'],
    py_modules=['peewee'],
    # ... etc ...
)
But this doesn't quite work. Any help would be greatly appreciated!
Setting Up a Package
As #ThomasK said, the easiest way to do this would be with a package. If you name your package peewee, then you can edit the top-level __init__.py file to allow users to continue to use your package in the same way they have previously.
First, directory structure for your package and subfolders:
peewee/
    __init__.py
    peewee.py
    extras/
        __init__.py
        foo.py
        bar.py
The __init__.py file
Next, you need to add a few lines to the top-level __init__.py.
You could go for a quick-and-dirty method and just include:
from peewee.peewee import *
which would put everything in peewee.py in the top-level namespace of your package. Or, you could take the more traditional alternative and explicitly import only those methods that should be at the top level.
from peewee.peewee import function1, class1, ...
and, for backwards compatibility, you could explicitly set the __all__ attribute of your module to include only peewee
__all__ = ['peewee']
which will let people continue to use from peewee import * if they really need to.
Writing a setup.py file
Finally, you'll have to set up some install scripts and such too. Zed Shaw's Learn Python The Hard Way exercise 46 has a simple and clear project skeleton that you should use.
The most important part is the setup.py file. The example page isn't too long, and Zed's put a lot of work into making a really great book, so I'm not going to repost it here (though the entire book is available for free). You can also read the longer instructions/documentation for writing a setup.py file for distutils; however, LPTHW will give you something that will do everything you want quickly and easily.
Total package directory structure
Note that your final directory structure will actually be a bit bigger (the name of peewee-pkg doesn't matter; bin is optional; the names of the subfolders matter):
peewee-pkg/
    setup.py
    bin
    peewee/
        __init__.py
        peewee.py
        extras/
            __init__.py
            foo.py
            bar.py
Installing and using
After that, you could actually post your package to PyPi if you like, or you can distribute it to users directly. All they would need to do is run:
python setup.py install
and everything will be available to them.
Importing post-install
Finally, if you do specific imports in the peewee/__init__.py file as described earlier, your users will still be able to do:
from peewee import function1, class1, ...
But now they can also use import peewee.extras to get to the extras functions (or import peewee.extras.foo as foo, or from peewee.extras.foo import extra_awesome), etc. And, as you asked in your question, they will also be able to do:
from peewee.extras import foo
And then access foo's functions as if the file were in the current directory.
Useful note for developing a package
On your computer, you should run:
python setup.py develop
which will install the package to your path just like using python setup.py install; however, develop tells python to recheck the file every time it uses the module, so that every time you make changes, they will be immediately available to the system for testing.
I do a lot of work on different projects (I'm a scientist) in a fairly standardised directory structure. e.g.:
project
    /analyses/
    /lib
    /doc
    /results
    /bin
I put all my various utility scripts in /bin/ because cleanliness is next to godliness. However, I have to hard code paths (e.g. ../../x/y/z) and then I have to run things within ./bin/ or they break.
I've used Django and that has /manage.py which runs various django-things and automatically handles the path. I've also used fabric to run various user defined functions.
Question: how do I do something similar, and what's the best way? I can easily write something in /manage.py to inject the root dir into sys.path etc., but then I'd like to be able to do "./manage.py foo", which would run /bin/foo.py. Or is it possible to get fabric to call executables from a certain directory?
Basically - I want something easy and low maintenance. I want to be able to drop an executable script/file/whatever into ./bin/ and not have to deal with path issues or import issues.
What is the best way to do this?
Keep Execution at TLD
In general, try to keep execution at the top level of your project. This will straighten out your imports tremendously.
If you have to do a lot of import addressing with relative imports, there's probably a better way.
Modifying The Path
Other posters have mentioned PYTHONPATH. That's a great way to set it up permanently in your shell.
If you don't want to, or aren't able to, manipulate PYTHONPATH directly, you can use sys.path to get yourself out of relative-import hell.
Using sys.path.append
sys.path is just a list internally. You can append to it to add stuff to your path.
Say I'm in /bin and there's a library markdown in /lib. You can append a relative path to sys.path to import what you want.
import sys
sys.path.append('../lib')
import markdown
print(markdown.markdown("""
Hello world!
------------
"""))
Word to the wise: don't get too crazy with your sys.path additions. Keep your scheme simple to avoid a lot of confusion.
Overly eager imports can sometimes lead to circular imports, where a Python module ends up importing itself, at which point execution will halt!
Using Packages and __init__.py
Another great trick is creating Python packages by adding __init__.py files. __init__.py gets loaded before any other modules in the directory, so it's a great way to add imports across the entire directory. This makes it an ideal spot for sys.path hackery.
You don't necessarily need to add anything to the file; it's sufficient to just run touch __init__.py at the console to make a directory a package.
See this SO post for a more concrete example.
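For instance, a sketch of that trick for the layout above (paths are resolved relative to the file itself, so the scripts keep working no matter which directory you launch them from):

# bin/__init__.py -- sketch: put ../lib on sys.path for everything
# that imports the bin package.
import os
import sys

_here = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.join(_here, '..', 'lib'))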
In a shell script that you source (not run) in your current shell, set the following environment variables:
PATH=$PATH:$PROJECTDIR/bin
PYTHONPATH=$PROJECTDIR/lib
Then you put your Python modules and package tree in your project's ./lib directory. Python automatically adds the PYTHONPATH environment variable to sys.path.
Then you can run any top-level script from the shell without specifying the path, and any imports from your library modules are looked for in the lib directory.
I recommend very simple top-level scripts, such as:
#!/usr/bin/python
import sys
import mytool
mytool.main(sys.argv)
Then you never have to change that script; you just edit the module code, and you also benefit from byte-code caching.
You can easily achieve your goals by creating a mini package that hosts each one of your projects. Use Paste Script to create a simple project skeleton, and to make it executable, just install it via setup.py develop. Now your bin scripts only need to import the entry point of this package and execute it.
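A sketch of what one of those bin scripts might reduce to (the package and function names are made up; this assumes the skeleton was installed with setup.py develop):

#!/usr/bin/env python
# bin/foo -- thin wrapper around a hypothetical myproject.foo.main.
import sys

from myproject.foo import main

if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))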