Add path to a Python package to sys.path

I have a case where I need to add the path of a Python package itself to sys.path (instead of its parent directory), but still refer to the package normally by name.
Maybe that's weird, but let me illustrate what I need and maybe you guys know how to achieve it.
I have all kinds of experimental folders, modules, etc. inside a path like /home/me/python.
Now I don't want to add that folder to my sys.path (PYTHONPATH), since there are experimental modules whose names could clash with something useful.
But inside /home/me/python I want to have a folder like pyutils. So I want to add /home/me/python/pyutils to PYTHONPATH, but still be able to refer to the package by its name, pyutils, as if I had added /home/me/python to the path.

One helpful fact is that adding something to the Python path is different from importing it into your interpreter. You can structure your modules and submodules such that names will not clash.
Look here regarding how to create modules. I read the documents, but I think module layout takes a little learning-by-doing: create your modules, import them into scripts, and see if the importing is awkward or requires too much qualification.
Separately, consider the Python import system. When you import something you can use the "import ... as" feature to give it a different name as you import, and thereby prevent naming clashes.
You seem to have already understood how you can change the path using sys.path, as documented here.
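A quick sketch combining the two ideas above (the textutils module name is hypothetical):
import sys
sys.path.append('/home/me/python/pyutils')  # path from the question

# a module at /home/me/python/pyutils/textutils.py is now importable directly;
# "import ... as" gives it a clash-free local name
import textutils as pyutils_text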

You have a number of options:
Make a new directory pyutilsdir, place pyutils in pyutilsdir, and then add pyutilsdir to PYTHONPATH (see the sketch after this list).
Move the experimental code outside of /home/me/python and add python to your PYTHONPATH.
Rename the experimental modules so their names do not clash with other modules, then add python to PYTHONPATH.
Use a version control system like git or hg to make the experimental modules available or unavailable as desired. You could have a master branch without the experimental modules and a feature branch that includes them. With git, for example, you could switch between the two with
git checkout [master|feature]
The contents of /home/me/python/pyutils (the git repo directory) would change depending on which commit is checked out. Thus, using version control, you can keep the experimental modules in pyutils but only make them present when you check out the feature branch.
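A minimal sketch of the first option (pyutilsdir is the new wrapper directory; mymodule is illustrative):
import sys
sys.path.append('/home/me/python/pyutilsdir')  # pyutilsdir contains only pyutils/

from pyutils import mymodule  # pyutils is now importable by name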

I'll answer my own question, since I got the idea while writing it, and maybe someone will need it.
I added a symlink from that folder into my site-packages folder, like this:
ln -s /home/me/python/pyutils /path/to/site-packages/pyutils
Then, since the path already contains the /path/to/site-packages folder, and there is now a pyutils folder in it with an __init__.py, I can just import like:
from pyutils import mymodule
And the rest of /home/me/python is not on the PYTHONPATH.
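A quick way to check that the symlinked package resolves where you expect:
import pyutils
print(pyutils.__file__)  # should point into /path/to/site-packages/pyutils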

Related

How do I structure my Python project to allow named modules to be imported from subdirectories

This is my directory structure:
Projects
+ Project_1
+ Project_2
- Project_3
    - Lib1
        __init__.py  # empty
        moduleA.py
    - Tests
        __init__.py  # empty
        foo_tests.py
        bar_tests.py
        setpath.py
    __init__.py  # empty
    foo.py
    bar.py
Goals:
Have an organized project structure
Be able to independently run each .py file when necessary
Be able to reference/import both sibling and cousin modules
Keep all import/from statements at the beginning of each file.
I achieved #1 by using the above structure.
I've mostly achieved #2, #3, and #4 by doing the following (as recommended by this excellent guide):
In any package that needs to access parent or cousin modules (such as the Tests directory above) I include a file called setpath.py which has the following code:
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('...'))
Then, in each module that needs parent/cousin access, such as foo_tests.py, I can write a nice clean list of imports like so:
import setpath # Annoyingly, PyCharm warns me that this is an unused import statement
import foo.py
Inside setpath.py, the second and third inserts are not strictly necessary for this example, but are included as a troubleshooting step.
My problem is that this only works for imports that reference the module name directly, and not for imports that reference the package. For example, inside bar_tests.py, neither of the two statements below work when running bar_tests.py directly.
import setpath
import Project_3.foo.py # Error
from Project_3 import foo # Error
I receive the error "ImportError: No module named 'Project_3'".
What is odd is that I can run the file directly from within PyCharm and it works fine. I know that PyCharm is doing some behind-the-scenes magic with the Python path variable to make everything work, but I can't figure out what it is. As PyCharm simply runs python.exe and sets some environment variables, it should be possible to clone this behavior from within a Python script itself.
For reasons not really germane to this question, I have to reference bar using the Project_3 qualifier.
I'm open to any solution that accomplishes the above while still meeting my earlier goals. I'm also open to an alternate directory structure if there is one that works better. I've read the Python doc on imports and packages but am still at a loss. I think one possible avenue might be manually setting the __path__ variable, but I'm not sure which one needs to be changed or what to set it to.
Those types of questions qualify as "primarily opinion based", so let me share my opinion on how I would do it.
First, "be able to independently run each .py file when necessary": either the file is a module, in which case it should not be called directly, or it is a standalone executable, in which case it should import its dependencies starting from the top level (you may avoid this in code, or rather move it to a common place, by using setup.py entry_points, but then your former executable effectively becomes a module). And yes, this is one of the weak points of Python's module model, and it causes misunderstandings.
Second, use virtualenv (or venv in Python 3) and put each of your Project_x into a separate one. This way the project's name won't be part of the Python module path.
Third, the link you've provided mentions setup.py – you can make use of it. Put your custom code into Project_x/src/mylib1, create src/mylib1/setup.py, and finally put your modules into src/mylib1/mylib1/module.py. Then you can install your code via pip like any other package (or with pip install -e, so you can work on the code directly without reinstalling it, though that unfortunately has some limitations).
And finally, as you've already confirmed in a comment ;) – the problem with your current model was that in sys.path.insert(0, os.path.abspath('...')) you had mistakenly used Python's module notation, which is incorrect for filesystem paths; it should be '../..' to work as expected.
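For clarity, here is the corrected setpath.py (the same three inserts from the question, with '...' replaced by '../..'):
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('../..'))  # was '...': module-dot notation, not a valid relative path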
I think your goals are not reasonable. Specifically, goal number 2 is a problem:
Be able to independently run each .py file when necessary
This doesn't work well for modules in a package. At least, not if you're running the .py files naively (e.g. with python foo_tests.py on the command line). When you run the files that way, Python can't tell where the package hierarchy should start.
There are two alternatives that can work. The first option is to run your scripts from the top-level folder (e.g. Projects) using the -m flag to the interpreter to give it a dotted path to the main module, and to use explicit relative imports to get the sibling and cousin modules. So rather than running python foo_tests.py directly, run python -m Project_3.Tests.foo_tests from the Projects folder (or python -m Tests.foo_tests from within Project_3 perhaps), and have foo_tests.py use from .. import foo.
The other (less good) option is to add a top-level folder to your Python installation's module search path on a system-wide basis (e.g. add the Projects folder to the PYTHONPATH environment variable), and then use absolute imports for all your modules (e.g. import Project_3.foo). This is effectively what your setpath module does, but doing it system wide as part of your system's configuration, rather than at run time, is much cleaner. It also avoids the multiple names that setpath allows you to use to import a module (e.g. try import foo_tests and import Tests.foo_tests and you'll get two separate copies of the same module).
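As a concrete sketch of the first option (names taken from the question's tree; the test body is a placeholder):
# Project_3/Tests/foo_tests.py
# run from the Projects folder as: python -m Project_3.Tests.foo_tests
from .. import foo  # explicit relative import of the cousin module

def run_tests():
    print(foo)  # placeholder; real tests would exercise foo's functions

if __name__ == '__main__':
    run_tests()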

Python importing only modules within package

I am creating a Python package with multiple modules. I want to make sure that when I import modules within the package that they are importing only from the package and not something outside the package that has the same name.
Is the correct way of doing this to use relative imports? Will this interfere when I move my package to a different location on my machine (or it gets installed wherever on a customer's machine)?
Modern relative imports (here's a reference) are package-relative and package-specific, so as long as the internal structure of your package does not change you can move the package as a whole around wherever you want.
While Joran Beasley's answer should work as well (though it does not seem necessary in those older versions of Python where absolute imports aren't the default, since the old style of importing checked the package's directory first), I personally don't really like modifying the import path like that when you don't have to, especially if you need to load some of the other packages or modules that your modules or packages now shadow.
A warning, however: relative imports require that the module in question is loaded as part of a package, or at least has its __name__ set to indicate a location in a package. Relative imports won't work for a module when __name__ == '__main__'. So if you're writing a simple/casual script that uses another module in the same directory as it (and you want to make sure the script refers to the proper directory even when the current working directory is elsewhere), you could do something like import os, sys; sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) (with thanks to https://stackoverflow.com/a/1432949/138772 for the idea). As noted in S.Lott's answer to the same question, this probably isn't something you'd want to do professionally or as part of a team project, but for something personal where you're just doing some menial task automation it should be fine.
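A minimal sketch of package-relative imports (the package and module names are hypothetical):
# mypkg/consumer.py
from . import helper           # always resolves inside mypkg
from .helper import do_thing   # never picks up an unrelated top-level 'helper'

These lines work only when the module is imported as mypkg.consumer; run the file directly (so that __name__ == '__main__') and the relative imports fail.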
sys.path tells Python where to look for imports.
add
import sys
sys.path.insert(0,".")
to the top of your main Python script. This will ensure local packages are found BEFORE other packages on the path (although tbh I think this happens automagically).
if you really want to import only packages in your folder do
import sys
sys.path = ["."]
however I do not recommend this at all as it will probably break lots of your stuff ...
most IDEs (Eclipse/PyCharm/etc.) provide mechanisms to set up the environment a project uses, including its paths
really, though, the best option is not to name your packages the same as builtin packages or 3rd-party modules installed on your system
failing that, distributing your code as a correctly bundled package should more than suffice

Python path issue while importing my own modules. What is the best way to go about it?

I've created python modules but they are in different directories.
/xml/xmlcreator.py
/tasklist/tasks.py
Here, tasks.py is trying to import xmlcreator, but both are in different paths. One way to do it is to include xmlcreator.py in the PYTHONPATH. But, considering that I'll be publishing the code, this doesn't seem the right way to go about it, as suggested here. So how do I include xmlcreator, or rather any module written by me, that might live in various directories and subdirectories?
Are you going to publish both modules separately or together in one package?
If the former, then you'll probably want to have your users install your xml module (I'd call it something else :) so that it is, by default, already on Python's path, and declare it as a dependency of the tasklist module.
If both are distributed as a bundle, then relative imports seem to be the best option, since you can control where the paths are relative to each other.
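For instance, if both modules are shipped under a single top-level package (as in the layout the next answer shows), tasks.py could reach its cousin via a relative import; a sketch:
# mypackage/tasklist/tasks.py
from ..xml import xmlcreator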
The best way is to create subpackages in a single top-level package that you define. You then ship these together in one package. If you are using setuptools/Distribute and you want to distribute them separately, then you may also define a "namespace package" that the packages will be installed in. You don't need to use any ugly sys.path hacks.
Make a directory tree like this:
mypackage/__init__.py
mypackage/xml/__init__.py
mypackage/xml/xmlcreator.py
mypackage/tasklist/__init__.py
mypackage/tasklist/tasks.py
The __init__.py files may be empty. They define the directory to be a package that Python will search in.
The exception is if you want to use namespace packages, in which case mypackage/__init__.py should contain:
__import__('pkg_resources').declare_namespace(__name__)
And your setup.py file should contain:
...
namespace_packages=["mypackage"],
...
Then in your code:
from mypackage.xml import xmlcreator
from mypackage.tasklist import tasks
will get them anywhere you need them. You only need to make one name globally unique in this case: the mypackage name.
For developing the code you can put the package in "develop mode", by doing
python setup.py develop --user
This will set up the local python environment to look for your package in your workspace.
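For reference, a minimal setup.py for such a layout might look like this sketch (the version and metadata are placeholders):
from setuptools import setup, find_packages

setup(
    name='mypackage',
    version='0.1',  # placeholder
    packages=find_packages(),
)

For the namespace-package variant, you would also add the namespace_packages=["mypackage"] line shown above.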
When I start a new Python project, I immediately write its setup.py and declare my Python modules/packages, so that then I just do:
python setup.py develop
and everything gets magically added to my PYTHONPATH. If you do it from a virtualenv it's even better, since you don't need to install it system-wide.
Here's more about it:
http://packages.python.org/distribute/setuptools.html#development-mode

ImportError - Using Python Packages at the same level

I have two Python packages where one needs to be imported by the other. The directory structure is like follows:
workspace/
management/
__init__.py
handle_management.py
other_management.py
utils/
__init__.py
utils_dict.py
I'm trying to import functionality from the utils project in the handle_management.py file:
import utils.utils_dict
Error I'm getting when trying to run handle_management.py:
ImportError: No module named utils.utils_dict
I've read a lot about how to resolve this problem and I can't seem to find a solution that works.
I started with Import a module from a relative path - I tried the applicable solutions but none worked.
Is the only solution to make workspace/ available via site-packages? If so, what is the best way to do this?
EDIT:
I've tried to add the /home/rico/workspace/ to the PYTHONPATH - no luck.
EDIT 2:
I was able to successfully use klobucar's solution but I don't think it will work as a general solution since this utility is going to be used by several other developers. I know I can use some Python generalizations to determine the relative path for each user. I just feel like there is a more elegant solution.
Ultimately this script will run via cron to execute unit testing on several Python projects. This is also going to be available to each developer to ensure integration testing during their development.
I need to be able to make this more general.
EDIT 3:
I'm sorry, but I don't really like any of these solutions for what I'm trying to accomplish. I appreciate the input, and I'm using them as a temporary fix. As a complete fix I'm going to look into adding a script, available in the site-packages directory, that will add to the PYTHONPATH. This is something that is needed across several machines and several developers.
Once I build my complete solution I'll post back here with what I did and mark it as a solution.
EDIT 4:
I feel as though I didn't do a good job expressing my needs with this question. The answers below addressed this question well - but not my actual needs. As a result I have restructured my question in another post. I considered editing this one, but then the answers below (which would be very helpful for others) wouldn't be meaningful to the change and would seem out of place and irrelevant.
For the revised version of this question please see Unable to import Python package universally
You have 2 solutions:
Either put workspace in your PYTHONPATH:
import sys
sys.path.append('/path/to/workspace')
from utils import utils_dict
(Note that if you're running a script inside workspace that imports handle_management, workspace is most probably already in your PYTHONPATH, and you wouldn't need to do that; but it seems it's not the case HERE.)
Or, make "workspace" a package by adding an empty (or not) __init__.py file in the workspace directory. Then:
from ..utils import utils_dict
I would prefer the second, because you would have a problem if there's another module called "utils" in your PYTHONPATH.
Apart from that, import utils.utils_dict.py is wrong: you must not include the file extension ".py". You are not importing the file, you are importing the module (or the package, if it's a folder), so you don't want the path to the file, you need its name.
What you need to do is add workspace to your import path. I would make a wrapper that does this for you in workspace or just put workspace in you PYTHONPATH as an environment variable.
import sys
# Add the workspace folder path to the sys.path list
sys.path.append('/path/to/workspace/')
# with workspace/ itself on the path, the packages under it are imported by name
from utils import utils_dict
Put "workspace/" in your PYTHONPATH to make the packages underneath available to you when it searches.
This can be done from your shell profile if you are on *nix, or via environment variables on Windows.
For instance, on OSX you might add to your ~/.profile (if workspace is in your home directory):
export PYTHONPATH=$HOME/workspace:$PYTHONPATH
Another option is to use virtualenv to make your project area its own contained environment.

Best practice for handling path/executables in project scripts in Python (e.g. something like Django's manage.py, or fabric)

I do a lot of work on different projects (I'm a scientist) in a fairly standardised directory structure. e.g.:
project
/analyses/
/lib
/doc
/results
/bin
I put all my various utility scripts in /bin/ because cleanliness is next to godliness. However, I have to hard-code paths (e.g. ../../x/y/z) and then I have to run things from within ./bin/ or they break.
I've used Django and that has /manage.py which runs various django-things and automatically handles the path. I've also used fabric to run various user defined functions.
Question: how do I do something similar, and what's the best way? I can easily write something in /manage.py to inject the root dir into sys.path etc., but then I'd like to be able to do "./manage.py foo" which would run /bin/foo.py. Or is it possible to get fabric to call executables from a certain directory?
Basically - I want something easy and low maintenance. I want to be able to drop an executable script/file/whatever into ./bin/ and not have to deal with path issues or import issues.
What is the best way to do this?
Keep Execution at TLD
In general, try to keep your runtime at top-level. This will straighten out your imports tremendously.
If you have to do a lot of import addressing with relative imports, there's probably a better way.
Modifying The Path
Other posters have mentioned the PYTHONPATH. That's a great way to do it permanently in your shell.
If you don't want to (or aren't able to) manipulate PYTHONPATH directly, you can use sys.path to get yourself out of relative-import hell.
Using sys.path.append
sys.path is just a list internally. You can append to it to add stuff into your path.
Say I'm in /bin and there's a library markdown in lib/. You can append a relative path to sys.path to import what you want.
import sys
sys.path.append('../lib')  # relative to bin/, so run this from inside bin/
import markdown

print(markdown.markdown("""
Hello world!
------------
"""))
Word to the wise: don't get too crazy with your sys.path additions. Keep your schema simple to avoid a lot of confusion.
Overly eager imports can sometimes lead to cases where a Python module ends up importing itself, at which point execution will halt!
Using Packages and __init__.py
Another great trick is creating Python packages by adding __init__.py files. A package's __init__.py is loaded before any of the modules inside it, so it's a great place to add imports that apply across the entire directory. This makes it an ideal spot for sys.path hackery.
You don't even need to necessarily add anything to the file. It's sufficient to just do touch __init__.py at the console to make a directory a package.
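As a sketch, an __init__.py that does this kind of path setup for the bin/ layout from the question (the relative location of lib/ is an assumption):
# bin/__init__.py: make the project's lib/ importable for scripts in bin/
import os
import sys

_LIB = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', 'lib'))
if _LIB not in sys.path:
    sys.path.insert(0, _LIB)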
See this SO post for a more concrete example.
In a shell script that you source (not run) in your current shell, you set the following environment variables:
export PATH=$PATH:$PROJECTDIR/bin
export PYTHONPATH=$PROJECTDIR/lib
(The export matters: variables that are merely assigned are not visible to the python process you launch afterwards.)
Then you put your Python modules and package tree in your project's ./lib directory. Python automatically adds the PYTHONPATH environment variable to sys.path.
Then you can run any top-level script from the shell without specifying the path, and any imports from your library modules are looked for in the lib directory.
I recommend very simple top-level scripts, such as:
#!/usr/bin/python
import sys
import mytool
mytool.main(sys.argv)
Then you never have to change that, you just edit the module code, and also benefit from the byte-code caching.
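The matching module can stay equally small; a sketch of mytool.py with the main(argv) signature used above:
# mytool.py
def main(argv):
    # real work goes here; argv is passed straight from sys.argv
    print('args:', argv[1:])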
You can easily achieve your goals by creating a mini package that hosts each one of your projects. Use Paste Script to create a simple project skeleton, and to make it executable, just install it via setup.py develop. Then your bin scripts only need to import the entry point of this package and execute it.
