How are python module paths translated to filesystem paths?

This may seem like a simple question, but I haven't found an answer that explains the behavior I'm seeing. Hard to provide a simple repro case but I basically have a package structure like this:
a.b.c
a.b.utils
I have one project that has files in a.b.c. (let's call this aux_project) and another that has files in a.b.d, a.b.utils, etc (call it main_project). I'm trying to import a.b.utils inside pytest tests in the first project, using tests_require. This does not work because a.b is for some reason sourced from inside aux_project/a/b/__init__.pyc instead of the virtualenv and it shadows the other package (i.e. this a.b only has a c in it, not d or utils). This happens ONLY in the test context. In ipython I can load all packages fine, and they are correctly loaded from virtualenv.
What's weirder is that if I simply delete the actual directory, the tests do load the pycs from virtualenv and everything works (I need that directory, though)
python==2.7.9
What is going on?

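OK, the problem was simply that the current working directory is prepended to sys.path (the module search path), so the local a/b shadowed the installed one. Calling sys.path.pop(1) (index 0 is the tests dir, prepended by pytest) resolved the behavior.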
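To make the shadowing described above concrete, here is a minimal sketch, assuming Python 3. The package name shadowdemo, the helper make_pkg, and the two temporary directories are hypothetical stand-ins for the project checkout and the virtualenv's site-packages; they are not from the original question.

```python
import importlib
import os
import sys
import tempfile


def make_pkg(root, name, modules):
    """Create a regular package `name` under `root` with the given modules."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    for mod in modules:
        with open(os.path.join(pkg_dir, mod + ".py"), "w") as fh:
            fh.write("WHERE = %r\n" % root)
    return pkg_dir


# Hypothetical stand-ins: local_root plays the checkout (cwd),
# venv_root plays the virtualenv's site-packages.
local_root = tempfile.mkdtemp()
venv_root = tempfile.mkdtemp()
make_pkg(local_root, "shadowdemo", ["c"])
make_pkg(venv_root, "shadowdemo", ["c", "utils"])

# The local directory comes first, just like the cwd in the question.
sys.path[:0] = [local_root, venv_root]
importlib.invalidate_caches()

import shadowdemo

# The package is bound to the first sys.path entry that contains it.
found_in_local = shadowdemo.__file__.startswith(local_root)

# Submodules are then looked up only inside that one package directory.
try:
    importlib.import_module("shadowdemo.utils")
    utils_importable = True
except ImportError:
    utils_importable = False

print(found_in_local, utils_importable)
```

Once a regular package has been found in one sys.path entry, later entries are not consulted for its submodules, which is why removing the local directory (or its sys.path entry) makes the installed copy visible again.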

Related

How do I structure my Python project to allow named modules to be imported from sub directories

This is my directory structure:
Projects
    + Project_1
    + Project_2
    - Project_3
        - Lib1
            __init__.py # empty
            moduleA.py
        - Tests
            __init__.py # empty
            foo_tests.py
            bar_tests.py
            setpath.py
        __init__.py # empty
        foo.py
        bar.py
Goals:
1. Have an organized project structure
2. Be able to independently run each .py file when necessary
3. Be able to reference/import both sibling and cousin modules
4. Keep all import/from statements at the beginning of each file.
I achieved #1 by using the above structure.
I've mostly achieved 2, 3, and 4 by doing the following (as recommended by this excellent guide)
In any package that needs to access parent or cousin modules (such as the Tests directory above) I include a file called setpath.py which has the following code:
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('...'))
Then, in each module that needs parent/cousin access, such as foo_tests.py, I can write a nice clean list of imports like so:
import setpath # Annoyingly, PyCharm warns me that this is an unused import statement
import foo
Inside setpath.py, the second and third inserts are not strictly necessary for this example, but are included as a troubleshooting step.
My problem is that this only works for imports that reference the module name directly, and not for imports that reference the package. For example, inside bar_tests.py, neither of the two statements below work when running bar_tests.py directly.
import setpath
import Project_3.foo.py # Error
from Project_3 import foo # Error
I receive the error "ImportError: No module named 'Project_3'".
What is odd is that I can run the file directly from within PyCharm and it works fine. I know that PyCharm is doing some behind the scenes magic with the Python Path variable to make everything work, but I can't figure out what it is. As PyCharm simply runs python.exe and sets some environmental variables, it should be possible to clone this behavior from within a Python script itself.
For reasons not really germane to this question, I have to reference bar using the Project_3 qualifier.
I'm open to any solution that accomplishes the above while still meeting my earlier goals. I'm also open to an alternate directory structure if there is one that works better. I've read the Python doc on imports and packages but am still at a loss. I think one possible avenue might be manually setting the __path__ variable, but I'm not sure which one needs to be changed or what to set it to.

Those types of questions qualify as "primarily opinion based", so let me share my opinion how I would do it.
First, "be able to independently run each .py file when necessary": either the file is a module, in which case it should not be called directly, or it is a standalone executable, in which case it should import its dependencies starting from the top level. (You can avoid that in the code, or rather move it to a common place, by using setup.py entry_points, but then your former executable effectively becomes a module.) And yes, this is one of the weak points of Python's module model, and it causes misunderstandings.
Second, use virtualenv (or venv in Python 3) and put each of your Project_x directories into a separate one. That way the project's name won't be part of the Python module path.
Third, the link you've provided mentions setup.py, and you can make use of it. Put your custom code into Project_x/src/mylib1, create src/mylib1/setup.py, and put your modules into src/mylib1/mylib1/module.py. Then you can install your code with pip like any other package (or with pip install -e, so you can work on the code directly without reinstalling it, though that unfortunately has some limitations).
And finally, as you've already confirmed in a comment ;), the problem with your current model was that in sys.path.insert(0, os.path.abspath('...')) you mistakenly used Python's module notation, which is incorrect for filesystem paths and should be replaced with '../..' to work as expected.
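The difference between '...' and '../..' can be checked directly. This is a small sketch, not part of the original answer; it only relies on os.path treating '...' as an ordinary directory name.

```python
import os

base = os.getcwd()

# '...' has no special meaning on the filesystem: it is just a name.
three_dots = os.path.abspath('...')

# '../..' climbs two levels, which is what the question intended.
grandparent = os.path.abspath(os.path.join('..', '..'))

print(three_dots)
print(grandparent)
```

Note that both results depend on the current working directory. Building the path from os.path.abspath(__file__) instead makes it independent of where the script is launched from.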

I think your goals are not reasonable. Specifically, goal number 2 is a problem:
Be able to independently run each .py file when necessary
This doesn't work well for modules in a package. At least, not if you're running the .py files naively (e.g. with python foo_tests.py on the command line). When you run the files that way, Python can't tell where the package hierarchy should start.
There are two alternatives that can work. The first option is to run your scripts from the top-level folder (e.g. Projects) using the -m flag to the interpreter to give it a dotted path to the main module, and to use explicit relative imports to get the sibling and cousin modules. So rather than running python foo_tests.py directly, run python -m Project_3.Tests.foo_tests from the Projects folder and have foo_tests.py use from .. import foo. (Running python -m Tests.foo_tests from within Project_3 can also work, but then Tests is the top-level package, so foo has to be imported with a plain import foo rather than a relative import.)
The other (less good) option is to add a top-level folder to your Python installation's module search path on a system-wide basis (e.g. add the Projects folder to the PYTHONPATH environment variable), and then use absolute imports for all your modules (e.g. import Project_3.foo). This is effectively what your setpath module does, but doing it system-wide as part of your system's configuration, rather than at run time, is much cleaner. It also avoids the multiple names that setpath allows you to use to import a module (e.g. try import foo_tests and import Tests.foo_tests and you'll get two separate copies of the same module).
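The "two separate copies" effect mentioned above is easy to reproduce. The following is a sketch assuming Python 3; the names dupdemo_tests and dupdemo_mod are hypothetical and stand in for the Tests package and foo_tests module.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package: <root>/dupdemo_tests/dupdemo_mod.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "dupdemo_tests")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "dupdemo_mod.py"), "w") as fh:
    fh.write("STATE = []\n")

# Put both the parent folder and the package folder itself on sys.path,
# which is what stacking several sys.path.insert() calls tends to do.
sys.path[:0] = [root, pkg_dir]
importlib.invalidate_caches()

as_top_level = importlib.import_module("dupdemo_mod")
as_submodule = importlib.import_module("dupdemo_tests.dupdemo_mod")

# Same file on disk, but two independent module objects.
same_file = os.path.samefile(as_top_level.__file__, as_submodule.__file__)
distinct_objects = as_top_level is not as_submodule

# State changed through one name is invisible through the other.
as_top_level.STATE.append("changed")
print(same_file, distinct_objects, as_submodule.STATE)
```

Module-level state (registries, caches, class identities) silently diverges between the two copies, which is the main practical reason to keep exactly one import root.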

why python contains libs and lib directories? [duplicate]

For me, it's located at C:\Python33\libs.
For reference - this is not the same folder as C:\Python33\Lib - note the capitalization and lack of an 's'.
On one computer I was working on, I simply dropped a .py file into the libs folder and could import and use it like a library / module (sorry, I don't really know terminology very well), regardless of where the project I was working on is.
However, in trying to duplicate this on another machine, this doesn't work. Attempting to import simply gives a "no module named X" error.
So, clearly I'm misunderstanding the purpose of the libs folder, and how it differs from the Lib folder.
So, what exactly is the difference?

If you compare libs/ vs. Lib/ you'll notice that the latter is full of *.py files and the former has *.lib files. Further investigation with a text editor will show that *.py files are human-readable (I hope) and the *.lib files are not.
And that's really the difference. If you want to know more, the .lib files are static-link libraries, used for building .dlls, C extensions, and all that good stuff. Head on down the rabbit hole if that interests you.
On to the meat of your question: are you supposed to be able to drop modules in there and be able to import them? Not really. That is a side effect of that folder being included in your path. From the Modules docs:
When a module named spam is imported, the interpreter first searches
for a built-in module with that name. If not found, it then searches
for a file named spam.py in a list of directories given by the
variable sys.path. sys.path is initialized from these locations:
the directory containing the input script (or the current directory).
PYTHONPATH (a list of directory names, with the same syntax as the shell variable PATH).
the installation-dependent default.
Various installation methods will modify %PATH% or %PYTHONPATH% so I can't tell you exactly where to look; on my windows box, the python installer modified %PATH% for me, so you should probably look there first. Notably, my path does not include Python33/libs/ so I would not expect it to be there by default.
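To see what actually ended up on the search path of a given installation, and where a module would be loaded from, something like the following sketch can help. It assumes Python 3 (importlib.util.find_spec is not available on older interpreters), and the module name no_such_module_hopefully is made up for the example.

```python
import importlib.util
import sys

# Print the search path in lookup order.
for index, entry in enumerate(sys.path):
    print(index, repr(entry))

# Ask the import system where a module would come from, without importing it.
spec = importlib.util.find_spec("json")
print("json would be loaded from:", spec.origin)

# A name that is not importable yields None instead of raising.
missing = importlib.util.find_spec("no_such_module_hopefully")
print("missing module spec:", missing)
```

Running this on both machines and comparing the sys.path listings should show whether the libs folder is on the path on one of them and not the other.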

Just looking on mine (Windows 7) /libs appears to be the native code libraries (*.lib) vs the straight python libraries in /Lib. The readme also mentions a configuration flag:
--with-libs='libs': Add 'libs' to the LIBS that the python interpreter
is linked against.
Which may or may not be set on different installs/platforms.
This isn't really an answer; hopefully someone with a firmer knowledge of it will explain further - was just a bit too much info to squeeze into a comment.

Best practice for handling path/executables in project scripts in Python (e.g. something like Django's manage.py, or fabric)

I do a lot of work on different projects (I'm a scientist) in a fairly standardised directory structure. e.g.:
project
/analyses/
/lib
/doc
/results
/bin
I put all my various utility scripts in /bin/ because cleanliness is next to godliness. However, I have to hard code paths (e.g. ../../x/y/z) and then I have to run things within ./bin/ or they break.
I've used Django and that has /manage.py which runs various django-things and automatically handles the path. I've also used fabric to run various user defined functions.
Question: How do I do something similar? and what's the best way? I can easily write something in /manage.py to inject the root dir into sys.path etc, but then I'd like to be able to do "./manage.py foo" which would run /bin/foo.py. Or is it possible to get fabric to call executables from a certain directory?
Basically - I want something easy and low maintenance. I want to be able to drop an executable script/file/whatever into ./bin/ and not have to deal with path issues or import issues.
What is the best way to do this?

Keep Execution at TLD
In general, try to keep your runtime at top-level. This will straighten out your imports tremendously.
If you have to do a lot of import addressing with relative imports, there's probably a better way.
Modifying The Path
Other posters have mentioned PYTHONPATH. That's a great way to do it permanently in your shell.
If you don't want to (or aren't able to) manipulate PYTHONPATH directly, you can use sys.path to get yourself out of relative import hell.
Using sys.path.append
sys.path is just a list internally. You can append to it to add directories to your search path.
Say I'm in /bin and there's a library markdown in lib/. You can append a relative path to sys.path to import what you want.
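import sys

# Relative entries are resolved against the current working directory,
# so this only works when the script is run from inside bin/.
sys.path.append('../lib')

import markdown

# print() with a single argument behaves the same on Python 2 and 3.
print(markdown.markdown("""
Hello world!
------------
"""))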
Word to the wise: don't get too crazy with your sys.path additions. Keep your scheme simple to save yourself a lot of confusion.
Overly eager imports can sometimes lead to circular imports, where a Python module ends up needing to import itself, at which point execution will halt!
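Since the question complains about scripts breaking when they are not run from inside ./bin/, a variation worth considering is to anchor the path on the script's own location instead of the working directory. This is only a sketch: the helper name add_sibling_dir and the /proj/bin/tool.py path are hypothetical, and in a real script you would pass __file__.

```python
import os
import sys


def add_sibling_dir(script_path, name="lib"):
    """Put <project>/<name> on sys.path, where <project> is the parent of
    the directory containing script_path (e.g. <project>/bin/tool.py)."""
    project_root = os.path.dirname(os.path.dirname(os.path.abspath(script_path)))
    target = os.path.join(project_root, name)
    if target not in sys.path:  # avoid piling up duplicates
        sys.path.insert(0, target)
    return target


# In a real script this would be add_sibling_dir(__file__);
# a made-up path is used here so the example runs anywhere.
lib_dir = add_sibling_dir(os.path.join(os.sep, "proj", "bin", "tool.py"))
print(lib_dir)
```

With this, the script can be started from any directory, because nothing depends on the current working directory any more.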
Using Packages and __init__.py
Another great trick is creating python packages by adding __init__.py files. __init__.py gets loaded before any other modules in the directory, so it's a great way to add imports across the entire directory. This makes it an ideal spot to add sys.path hackery.
You don't even need to necessarily add anything to the file. It's sufficient to just do touch __init__.py at the console to make a directory a package.
See this SO post for a more concrete example.

In a shell script that you source (not run) in your current shell you set the following environment variables:
export PATH=$PATH:$PROJECTDIR/bin
export PYTHONPATH=$PROJECTDIR/lib
Then you put your Python modules and package tree in your project's ./lib directory. Python automatically adds the directories listed in the PYTHONPATH environment variable to sys.path.
Then you can run any top-level script from the shell without specifying the path, and any imports from your library modules are looked for in the lib directory.
I recommend very simple top-level scripts, such as:
#!/usr/bin/python
import sys
import mytool
mytool.main(sys.argv)
Then you never have to change that, you just edit the module code, and also benefit from the byte-code caching.
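For the "./manage.py foo runs bin/foo.py" idea from the question, one possible shape is a small dispatcher built on the standard library's runpy module. This is a hedged sketch rather than how Django or fabric do it; run_tool, the bin/ and lib/ layout, and the hello.py script are assumptions made for the example.

```python
import os
import runpy
import sys
import tempfile


def run_tool(project_root, tool_name, argv=()):
    """Run <project_root>/bin/<tool_name>.py with <project_root>/lib importable."""
    lib_dir = os.path.join(project_root, "lib")
    script = os.path.join(project_root, "bin", tool_name + ".py")
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
    old_argv = sys.argv
    sys.argv = [script] + list(argv)  # let the tool see its own arguments
    try:
        # run_path executes the file as __main__ and returns its globals.
        return runpy.run_path(script, run_name="__main__")
    finally:
        sys.argv = old_argv


# Demo with a throwaway project containing one hypothetical tool.
project = tempfile.mkdtemp()
os.makedirs(os.path.join(project, "bin"))
os.makedirs(os.path.join(project, "lib"))
with open(os.path.join(project, "bin", "hello.py"), "w") as fh:
    fh.write("import sys\nANSWER = 6 * 7\nARGS = sys.argv[1:]\n")

result = run_tool(project, "hello", ["--fast"])
print(result["ANSWER"], result["ARGS"])
```

A manage.py at the project root could then call run_tool(os.path.dirname(os.path.abspath(__file__)), sys.argv[1], sys.argv[2:]), so dropping a new script into bin/ needs no further wiring.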

You can easily achieve your goals by creating a mini package that hosts each one of your projects. Use paste scripts to create a simple project skeleton. And to make it executable, just install it via setup.py develop. Now your bin scripts just need to import the entry point to this package and execute it.

PYTHONPATH vs. sys.path

Another developer and I disagree about whether PYTHONPATH or sys.path should be used to allow Python to find a Python package in a user (e.g., development) directory.
We have a Python project with a typical directory structure:
Project
    setup.py
    package
        __init__.py
        lib.py
        script.py
In script.py, we need to do import package.lib. When the package is installed in site-packages, script.py can find package.lib.
When working from a user directory, however, something else needs to be done. My solution is to set my PYTHONPATH to include "~/Project". Another developer wants to put this line of code in the beginning of script.py:
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
So that Python can find the local copy of package.lib.
I think this is a bad idea, as this line is only useful for developers or people running from a local copy, but I can't give a good reason why it is a bad idea.
Should we use PYTHONPATH, sys.path, or is either fine?

If the only reason to modify the path is for developers working from their working tree, then you should use an installation tool to set up your environment for you. virtualenv is very popular, and if you are using setuptools, you can simply run setup.py develop to semi-install the working tree in your current Python installation.

I hate PYTHONPATH. I find it brittle and annoying to set on a per-user basis (especially for daemon users) and keep track of as project folders move around. I would much rather set sys.path in the invoke scripts for standalone projects.
However sys.path.append isn't the way to do it. You can easily get duplicates, and it doesn't sort out .pth files. Better (and more readable): site.addsitedir.
And script.py wouldn't normally be the more appropriate place to do it, as it's inside the package you want to make available on the path. Library modules should certainly not be touching sys.path themselves. Instead, you'd normally have a hashbanged-script outside the package that you use to instantiate and run the app, and it's in this trivial wrapper script you'd put deployment details like sys.path-frobbing.
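As a rough illustration of the site.addsitedir suggestion, the sketch below shows the two behaviours mentioned: .pth files in the directory are processed, and a directory already on sys.path is not appended again. The temporary directory, the vendored subfolder, and demo.pth are made up for the example.

```python
import os
import site
import sys
import tempfile

# A throwaway "site" directory with one subfolder listed in a .pth file.
site_dir = tempfile.mkdtemp()
vendored = os.path.join(site_dir, "vendored")
os.makedirs(vendored)
with open(os.path.join(site_dir, "demo.pth"), "w") as fh:
    fh.write("vendored\n")  # relative to the directory holding the .pth file

# Adds site_dir to sys.path and processes any .pth files inside it.
site.addsitedir(site_dir)

# Calling it again does not create a duplicate entry.
site.addsitedir(site_dir)

print(site_dir in sys.path, vendored in sys.path, sys.path.count(site_dir))
```

In the wrapper script described above, the argument would be the project directory rather than a temporary one.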

In general I would consider setting an environment variable (like PYTHONPATH) to be bad practice. While this might be fine for one-off debugging, using it as a regular practice might not be a good idea.
Relying on an environment variable leads to "it works for me" situations when someone else reports problems in the code base. One might also carry the same practice over to the test environment, leading to situations where the tests run fine for a particular developer but fail when someone else launches them.

Along with the many other reasons mentioned already, you could also point out that hard-coding
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
is brittle because it presumes the location of script.py -- it will only work if script.py is located in Project/package. It will break if a user decides to move/copy/symlink script.py (almost) anywhere else.

Neither hacking PYTHONPATH nor sys.path is a good idea, for the aforementioned reasons. And for linking the current project into the site-packages folder there is actually a better way than python setup.py develop, as explained here:
pip install --editable path/to/project
If you don't already have a setup.py in your project's root folder, this one is good enough to start with:
from setuptools import setup
setup(name='project')

I think that in this case using PYTHONPATH is the better choice, mostly because it doesn't introduce (questionable) unnecessary code.
After all, if you think about it, your user doesn't need that sys.path thing, because your package will get installed into site-packages, because you will be using a packaging system.
If the user chooses to run from a "local copy", as you call it, then the usual practice I've observed is to state that the package needs to be added to PYTHONPATH manually if used outside site-packages.

Python module search path problem

I am trying to work on a dev environment but am finding problems in that python seems to be using modules from the site-packages directory. I want it to be using the modules from my dev directory.
sys.path returns a bunch of dirs, like this
['', '/usr/lib/python26.zip', '/usr/lib/python2.6', '/usr/lib/python2.6/plat-linux2', '/usr/lib/python2.6/lib-tk', '/usr/lib/python2.6/lib-old', '/usr/lib/python2.6/lib-dynload', '/usr/lib/python2.6/site-packages' etc
This is good, it's using the current directory as the first place of lookup (at least this is how I understand it to be).
OK, now if I create, say, a file called commands.py in the current directory, things work as I would expect them to.
>>> import commands
>>> commands.__file__
'commands.pyc'
I then exit out of the python shell, and start another one. I then do this.
>>> import foo.bar.commands
Now, what I'm expecting it to do is go down from the current directory to ./foo/bar/ and get me the commands module from there. What I get though is this
>>> foo.bar.commands.__file__
'/usr/lib/python2.6/site-packages/foo/bar/commands.pyc'
Even though from my current directory there is a ./foo/bar/commands.py
Using imp.find_module() and imp.load_module() I can load the local module properly. What's actually interesting (although I don't really know what it means) is the last line that is printed out in this sequence
>>> import foo.bar.commands
>>> foo.bar.commands.__file__
'/usr/lib/python2.6/site-packages/foo/bar/commands.pyc'
>>> foo.bar.__file__
'/usr/lib/python2.6/site-packages/foo/bar/__init__.pyc'
>>> foo.__file__
'./foo/__init__.pyc'
So if it can find foo/__init__.pyc in the local dir, why can't it find the other files in the local dir?
Cheers

You mention that there's a foo directory under your current directory, but you don't tell us whether foo/__init__.py exists (even possibly empty): if it doesn't, this tells Python that foo is not a package. Similarly for foo/bar/__init__.py -- if that file doesn't exist, even if foo/__init__.py does, then foo.bar is not a package.
You can play around a little by placing .pth files and/or setting __path__ explicitly in your packages, but the basic, simple rule is to just place an __init__.py in every directory that you want Python to recognize as a package. The contents of that file are "the body" of the package itself, so if you import foo and foo is a directory with a foo/__init__.py file, then that's what you're importing (in any case, the package's body executes the first time you import anything from the package or any subpackage thereof).
If that is not the problem, it looks like some other import (or explicit sys.path manipulation) may be messing you up. Running python with a -v flag makes imports highly visible, which can help. Another good technique is to place an
import pdb; pdb.set_trace()
just before the import that you think is misbehaving, and examining sys.path, sys.modules (and possibly other advanced structures such as import hooks) at that point - is there a sys.modules['foo'] already defined, for example? Interactively trying the functions from standard library module imp that locate modules on your behalf given a path may also prove instructive.
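The same inspection can be scripted. The sketch below walks a dotted name one level at a time and records where each level was loaded from, which is the information the question gathered by hand with __file__. It assumes Python 3 and uses importlib rather than the older imp module; trace_import is a hypothetical helper, and json.decoder is used only because it exists in every standard installation.

```python
import importlib
import sys


def trace_import(dotted_name):
    """Import each prefix of dotted_name and report (name, file, package path)."""
    rows = []
    parts = dotted_name.split(".")
    for i in range(1, len(parts) + 1):
        name = ".".join(parts[:i])
        module = importlib.import_module(name)
        rows.append((
            name,
            getattr(module, "__file__", None),
            # Packages have __path__: the directories searched for submodules.
            list(getattr(module, "__path__", [])),
        ))
    return rows


rows = trace_import("json.decoder")
for name, filename, search_path in rows:
    print(name, filename, search_path)

# Anything already cached here is returned without consulting sys.path again.
print("json" in sys.modules)
```

For a package, the third column shows which directories will be searched for its submodules. If that list points somewhere other than the directory the package itself was loaded from, something has modified the package's __path__.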

What is foo doing in /usr/lib/python2.6/site-packages?
It sounds like you have created foo in your local directory but that is not necessarily the one you are importing.
Try getting rid of the foo/bar in site-packages
Make sure your directory structure looks like this
/foo/__init__.py
    /bar/__init__.py
        /commands.py
Also, it is a good idea to not reuse python standard library names for your own modules -- can you call your commands.py something else?
