I realize this may be considered a duplicate question to some of the other questions out there, but I've spent over an hour now reading through various pages and documentation and still don't understand what's going on here.
I'm trying to work with python files in multiple directories; I have essentially this:
myproject/
    __init__.py
    some_file.py
    some_data.dat
    tests/
        __init__.py
        test_some_file.py
test_some_file.py, as the name implies, is run from the command line; it is intended to test the code contained in some_file.py, and needs to import it. However, I can't seem to do so.
I've tried:
from myproject import some_file
and also
from .. import some_file
I did manage to make it run using sys.path, but that doesn't seem to be the correct way to do things based on what I've read.
Secondly, when I did make it run, using sys.path, I got an error that it couldn't find some_data.dat which is used by some_file.py.
This is a perennial question from Python programmers. The issue is that Python doesn't play nicely with scripts that are inside of packages. The situation has improved a bit over the last few releases, but it still doesn't do the right thing a lot of the time.
I think the best answer is to restrict where you run your test_some_file.py from, and use the Python interpreter's -m parameter. That is, change into the directory above myproject, and then run python -m myproject.tests.test_some_file. This is the only way that will work without messing around with sys.path.
This will allow either of your import lines to work correctly. PEP 8 currently recommends using absolute imports always, so the first version is probably better than the relative version using ...
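For illustration, a minimal sketch of the absolute-import version under the layout above (the print line is just a hypothetical smoke test, not part of the original question):

# myproject/tests/test_some_file.py
from myproject import some_file

print(some_file)  # hypothetical check that the import resolved

Then, from the directory containing myproject/, run python -m myproject.tests.test_some_file.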
For cases like yours, I add the directory of some_file.py to sys.path (temporarily).
Code:
import os
import sys

# sys.argv[0] is the path of the running script (tests/test_some_file.py);
# the inner dirname() strips the file name (up one level, to tests/), and
# the outer dirname() goes up a second level, to the myproject directory
dirname = os.path.dirname(os.path.dirname(sys.argv[0]))
sys.path.append(dirname)

import some_file
I have a specific problem which might require a general solution. I am currently learning Apache Thrift, following this guide. I followed all the steps, but I am getting an import error: Cannot import module UserManager. So the question is: how does Python's import lookup take place? Which directory is checked first? How does it move upwards?
How does sys.path.append('') work?
I found an answer for this here and followed the same steps, but I am still facing the same issue. Any ideas why? Is there anything more I should post that would help you debug this?
Help is appreciated.
On Windows, Python looks up modules from the Lib folder in the default Python path, for example "C:\Python34\Lib\". You can add your own Python libraries in a custom folder there, but the name must be a valid Python identifier (something like "my_lib"; a hyphenated name such as "my-lib" cannot be imported), and you need a file to tell Python that it can import from there. This file is called __init__.py, and it can be totally empty. The directory structure should look like this:
my_lib
    __init__.py
    myfolder
        __init__.py
        mymodule.py
(This is how every Python package works. For example urllib.request lives at "%PYTHONPATH%\Lib\urllib\request.py".)
You can import from the "mymodule.py" file by typing
import my_lib.myfolder.mymodule
and then using
my_lib.myfolder.mymodule.myfunction
or you can use
from my_lib.myfolder import mymodule
and then just call your function as mymodule.myfunction.
You can now use sys.path.append to append the path you pass into the function to the list of folders Python searches for modules (note that this is not permanent). If the path to your modules should be static, consider putting them in the Lib folder. If the path is relative to the file being executed, you can look up that file's path and append to sys.path relative to it, but I recommend using relative imports instead.
If you consider doing that, I recommend reading the docs: https://docs.python.org/3/reference/import.html#submodules
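As a rough sketch of the relative approach mentioned above (assuming the my_lib package from the layout sits next to the executing script; all names here are illustrative):

import os
import sys

# directory of the currently executing file, i.e. the folder containing my_lib/
here = os.path.dirname(os.path.abspath(__file__))
sys.path.append(here)

from my_lib.myfolder import mymodule  # hypothetical module from the layout above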
If I got you right, you're using Python 3.3 from Blender but are trying to include the 3.2 standard library. This is bound to give you a flurry of issues; you should not do that. Find another way. It's likely that Blender offers a way to use the 3.3 standard library (which is 99% compatible with 3.2). Pure-Python third-party libraries can, of course, be included by fiddling with sys.path.
The specific issue you're seeing now is likely caused by the version difference. As people have pointed out in the comments, Python 3.3 doesn't find the _tkinter extension module. Although it is present (as it works from Python 3.2), it is most likely in a .so file with an ABI tag that is incompatible with Blender's Python 3.3, hence it won't even look at it (much like a module.txt is not considered for import module). This is a good thing. Extension modules are highly version-specific, slight ABI mismatches (such as between 3.2 and 3.3, or two 3.3 compiled with different options) can cause pretty much any kind of error, from crashes to memory leaks to silent data corruption or even something completely different.
You can verify whether this is the case via import _tkinter; print(_tkinter.__file__) in the 3.2 shell. Alternatively, _tkinter may live in a different directory entirely; adding that directory still won't fix the real issue outlined above.
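A slightly expanded version of that check (the printed path is an illustrative example, not a guaranteed location):

# run inside the Python 3.2 shell where Tkinter works
import _tkinter
print(_tkinter.__file__)
# e.g. /usr/lib/python3.2/lib-dynload/_tkinter.cpython-32mu.so
# the ABI tag in the file name ("cpython-32mu") is what an incompatible
# 3.3 interpreter refuses to load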
For any new readers still having issues, try the following. This is cleaner than using sys.path.append if your app directory is structured so that the .py files containing functions to import sit underneath the script that imports them. Let me illustrate.
Script that imports files: main.py
Function files named like: func1.py
main.py
/functionfolder
    __init__.py
    func1.py
    func2.py
The import code in your main.py file should look as follows:
from functionfolder import func1
from functionfolder import func2
As Agilix correctly stated, you must have an __init__.py file in your "functionfolder" (see directory illustration above).
In addition, this solved my issue with Pylance not resolving the import, and showing me a nagging error constantly. After a rabbit-hole of sifting through GitHub issues, and trying too many comparatively complicated proposed solutions, this ever-so-simple solution worked for me.
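To round this out, a minimal sketch of main.py, assuming func1.py and func2.py each define a function called run (a hypothetical name, not from the original question):

# main.py
from functionfolder import func1
from functionfolder import func2

func1.run()  # run() is a hypothetical function defined in func1.py
func2.run()  # likewise for func2.py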
You may try calling sys.path.append('/path/to/lib/python') before any import statements.
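A minimal sketch of what that looks like in practice (the path and module name are placeholders):

import sys
sys.path.append('/path/to/lib/python')  # must happen before the import below

import mymodule  # hypothetical module living in /path/to/lib/python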
I just created an __init__.py file inside my new folder, so the directory is initialised as a package, and it worked (:
This question already has answers here:
Relative imports for the billionth time
(12 answers)
Closed 5 years ago.
I don't think any language causes as much headache as Python in something as simple as importing other source files. So here is the question:
Do my module imports need to depend on how the code is supposed to run?
I have the following directory structure:
./__init__.py
./config.py
./kmer
./kmer/__init__.py
./kmer/__main__.py
./kmer/__pycache__
./kmer/__pycache__/__init__.cpython-36.pyc
./kmer/__pycache__/__main__.cpython-36.pyc
./kmer/__pycache__/bed.cpython-36.pyc
./kmer/__pycache__/config.cpython-36.pyc
./kmer/__pycache__/reference.cpython-36.pyc
./kmer/__pycache__/sets.cpython-36.pyc
./kmer/bed.py
./kmer/config.py
./kmer/reference.py
./kmer/sets.py
I wish to import a module inside the kmer package from another module inside the same package. Simple?
So I add this at the top of bed.py:
import reference
import config
import sets
Now things work just fine if I run python bed.py from the kmer directory. Things are also fine if I go up one directory and call python kmer/bed.py. It seems that Python searches for imported modules relative to the given file.
Again, running it as python -m bed from the kmer directory works fine, but going up one directory and running python -m kmer.bed results in module errors. Here it seems that Python looks for modules in the interpreter's current directory, which could be anywhere on the file system, so imports relative to it shouldn't work.
This basically means that the imports should depend on how the code is going to be run, which makes no sense. I'd appreciate some explanation of how this is supposed to work.
I've looked at quite a lot of resources, including the answer to this question, which although very detailed (one of the best descriptions I've found, actually) doesn't solve my problem and also doesn't provide proper examples.
Update: I believe this question is more focused on a different aspect of relative imports, namely how different methods of running the code affect the imports, than the one it is marked as a duplicate of. That's why I mentioned the other question in the first place. Hence I don't think this should be a duplicate.
When writing a package, you must consistently treat it as a package. First, that means using relative imports within the package: write from . import reference in bed.py.
It also means the answer for how to access it is always the same: put the directory containing kmer onto sys.path (usually via PYTHONPATH). This is true even if running some module in the package as the script, so it must be python -m kmer.bed.
Unfortunately, it's a bad idea to run a module inside a package as a script: the module's file is still eligible to be imported again under its proper name (rather than the __main__ used the first time). The only robust solution is to write a __main__.py and run the package as the script, avoiding overloading a public module name.
Finally, don't pay much attention to the "feature" of Python putting the script's directory on sys.path. This is useless other than for toy problems where you don't care about name collisions, since any script whose directory would be useful on sys.path is necessarily a module (except in case of tricks like making its name not be an identifier). Moreover, if you want your library to be available to scripts installed elsewhere, it needs to get on sys.path some other way anyway.
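A minimal sketch of that __main__.py approach, assuming bed.py wraps its script logic in a main() function (a hypothetical name):

# kmer/__main__.py
from . import bed

bed.main()  # hypothetical entry function defined in bed.py

Invoked as python -m kmer from the directory containing kmer/, the package runs as the script while bed keeps its proper module name.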
This is my directory structure:
Projects
    + Project_1
    + Project_2
    - Project_3
        - Lib1
            __init__.py  # empty
            moduleA.py
        - Tests
            __init__.py  # empty
            foo_tests.py
            bar_tests.py
            setpath.py
        __init__.py  # empty
        foo.py
        bar.py
Goals:
Have an organized project structure
Be able to independently run each .py file when necessary
Be able to reference/import both sibling and cousin modules
Keep all import/from statements at the beginning of each file.
I achieved #1 by using the above structure.
I've mostly achieved #2, #3, and #4 by doing the following (as recommended by this excellent guide):
In any package that needs to access parent or cousin modules (such as the Tests directory above) I include a file called setpath.py which has the following code:
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('...'))
Then, in each module that needs parent/cousin access, such as foo_tests.py, I can write a nice clean list of imports like so:
import setpath # Annoyingly, PyCharm warns me that this is an unused import statement
import foo
Inside setpath.py, the second and third inserts are not strictly necessary for this example, but are included as a troubleshooting step.
My problem is that this only works for imports that reference the module name directly, and not for imports that reference the package. For example, inside bar_tests.py, neither of the two statements below works when running bar_tests.py directly.
import setpath
import Project_3.foo # Error
from Project_3 import foo # Error
I receive the error "ImportError: No module named 'Project_3'".
What is odd is that I can run the file directly from within PyCharm and it works fine. I know that PyCharm is doing some behind-the-scenes magic with the Python path variable to make everything work, but I can't figure out what it is. As PyCharm simply runs python.exe and sets some environment variables, it should be possible to clone this behavior from within a Python script itself.
For reasons not really germane to this question, I have to reference bar using the Project_3 qualifier.
I'm open to any solution that accomplishes the above while still meeting my earlier goals. I'm also open to an alternate directory structure if there is one that works better. I've read the Python doc on imports and packages but am still at a loss. I think one possible avenue might be manually setting the __path__ variable, but I'm not sure which one needs to be changed or what to set it to.
Those types of questions qualify as "primarily opinion based", so let me share my opinion on how I would do it.
First, "be able to independently run each .py file when necessary": either the file is a module, and then it should not be called directly, or it is a standalone executable, and then it should import its dependencies starting from the top level (you may avoid this in code, or rather move it to a common place, by using setup.py entry_points, but then your former executable effectively becomes a module). And yes, this is one of the weak points of Python's module model that causes misunderstandings.
Second, use virtualenv (or venv in Python 3) and put each of your Project_x into a separate one. This way the project's name won't be part of the Python module path.
Third, the link you've provided mentions setup.py – you may make use of it. Put your custom code into Project_x/src/mylib1, create src/mylib1/setup.py, and finally put your modules into src/mylib1/mylib1/module.py. Then you may install your code with pip like any other package (or pip -e so you can work on the code directly without reinstalling it, though that unfortunately has some limitations).
And finally, as you've confirmed in a comment already ;). The problem with your current model was that in sys.path.insert(0, os.path.abspath('...')) you had mistakenly used Python module notation, which is incorrect for file system paths and should be replaced with '../..' to work as expected.
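For concreteness, the corrected setpath.py:

# setpath.py
import os
import sys

sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('../..'))  # was '...', which is module notation, not a path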
I think your goals are not reasonable. Specifically, goal number 2 is a problem:
Be able to independently run each .py file when necessary
This doesn't work well for modules in a package. At least, not if you're running the .py files naively (e.g. with python foo_tests.py on the command line). When you run the files that way, Python can't tell where the package hierarchy should start.
There are two alternatives that can work. The first option is to run your scripts from the top-level folder (e.g. Projects) using the -m flag to the interpreter to give it a dotted path to the main module, and to use explicit relative imports to get the sibling and cousin modules. So rather than running python foo_tests.py directly, run python -m Project_3.Tests.foo_tests from the Projects folder (or python -m Tests.foo_tests from within Project_3, perhaps), and have foo_tests.py use from .. import foo.
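Sketched against the directory layout above, the first option looks like this:

# Projects/Project_3/Tests/foo_tests.py
from .. import foo  # explicit relative import reaching the Project_3 package

# run from the Projects folder:
#     python -m Project_3.Tests.foo_tests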
The other (less good) option is to add a top-level folder to your Python installation's module search path on a system-wide basis (e.g. add the Projects folder to the PYTHONPATH environment variable), and then use absolute imports for all your modules (e.g. import Project_3.foo). This is effectively what your setpath module does, but doing it system-wide as part of your system's configuration, rather than at run time, is much cleaner. It also avoids the multiple names that setpath allows you to use to import a module (e.g. try import foo_tests and import Tests.foo_tests, and you'll get two separate copies of the same module).
I have two Python packages where one needs to be imported by the other. The directory structure is like follows:
workspace/
    management/
        __init__.py
        handle_management.py
        other_management.py
    utils/
        __init__.py
        utils_dict.py
I'm trying to import functionality from the utils package in the handle_management.py file:
import utils.utils_dict
Error I'm getting when trying to run handle_management.py:
ImportError: No module named utils.utils_dict
I've read a lot about how to resolve this problem and I can't seem to find a solution that works.
I started with Import a module from a relative path - I tried the applicable solutions but none worked.
Is the only solution to make workspace/ available via site-packages? If so, what is the best way to do this?
EDIT:
I've tried adding /home/rico/workspace/ to the PYTHONPATH - no luck.
EDIT 2:
I was able to successfully use klobucar's solution, but I don't think it will work as a general solution since this utility is going to be used by several other developers. I know I can use some Python generalizations to determine the relative path for each user; I just feel like there is a more elegant solution.
Ultimately this script will run via cron to execute unit testing on several Python projects. This is also going to be available to each developer to ensure integration testing during their development.
I need to be able to make this more general.
EDIT 3:
I'm sorry, but I don't really like any of these solutions for what I'm trying to accomplish. I appreciate the input, and I'm using them as a temporary fix. As a complete fix I'm going to look into adding a script, available in the site-packages directory, that will add to the PYTHONPATH. This is something that is needed across several machines and several developers.
Once I build my complete solution I'll post back here with what I did and mark it as a solution.
EDIT 4:
I feel as though I didn't do a good job expressing my needs with this question. The answers below addressed this question well - but not my actual needs. As a result I have restructured my question in another post. I considered editing this one, but then the answers below (which would be very helpful for others) wouldn't be meaningful to the change and would seem out of place and irrelevant.
For the revised version of this question please see Unable to import Python package universally
You have 2 solutions:
Either put workspace in your PYTHONPATH:
import sys
sys.path.append('/path/to/workspace')
from utils import utils_dict
(Note that if you're running a script inside workspace that imports handle_management, workspace is most probably already in your PYTHONPATH, and you wouldn't need to do that; but it seems that's not the case here.)
Or, make "workspace" a package by adding an empty (or not) __init__.py file in the workspace directory. Then:
from ..utils import utils_dict
I would prefer the second, because you would have a problem if there's another module called "utils" in your PYTHONPATH.
Apart from that, you are importing incorrectly here: import utils.utils_dict.py. You don't need to include the file extension ".py". You are not importing the file, you are importing the module (or package, if it's a folder), so you don't want the path to that file, you need its name.
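In other words, a minimal illustration:

# wrong: the ".py" extension is not part of the module name
# import utils.utils_dict.py

# right: import the module by its dotted name
import utils.utils_dict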
What you need to do is add workspace to your import path. I would make a wrapper that does this for you in workspace or just put workspace in you PYTHONPATH as an environment variable.
import sys
# Add the workspace folder path to the sys.path list
sys.path.append('/path/to/workspace/')

# with workspace/ itself on the path, the package inside it is importable:
from utils import utils_dict
Put "workspace/" in your PYTHONPATH to make the packages underneath available to you when it searches.
This can be done from your shell profile (if you are on *nix) or via environment variables on Windows.
For instance, on OS X you might add the following to your ~/.profile (if workspace is in your home directory):
export PYTHONPATH=$HOME/workspace:$PYTHONPATH
Another option is to use virtualenv to make your project area its own contained environment.
I do a lot of work on different projects (I'm a scientist) in a fairly standardised directory structure. e.g.:
project
    /analyses/
    /lib
    /doc
    /results
    /bin
I put all my various utility scripts in /bin/ because cleanliness is next to godliness. However, I have to hard-code paths (e.g. ../../x/y/z), and then I have to run things from within ./bin/ or they break.
I've used Django and that has /manage.py which runs various django-things and automatically handles the path. I've also used fabric to run various user defined functions.
Question: How do I do something similar, and what's the best way? I can easily write something in /manage.py to inject the root dir into sys.path etc., but then I'd like to be able to run "./manage.py foo", which would run /bin/foo.py. Or is it possible to get fabric to call executables from a certain directory?
Basically - I want something easy and low maintenance. I want to be able to drop an executable script/file/whatever into ./bin/ and not have to deal with path issues or import issues.
What is the best way to do this?
Keep Execution at TLD
In general, try to keep your runtime at the top level. This will straighten out your imports tremendously. If you have to do a lot of import addressing with relative imports, there's probably a better way.
Modifying The Path
Other posters have mentioned PYTHONPATH. That's a great way to set it permanently in your shell.
If you don't want to, or aren't able to, manipulate PYTHONPATH directly, you can use sys.path to get yourself out of relative-import hell.
Using sys.path.append
sys.path is just a list internally; you can append to it to add entries to your path.
Say I'm in /bin and there's a library markdown in lib/. You can append a relative path with sys.path to import what you want.
import sys
sys.path.append('../lib')

import markdown

print(markdown.markdown("""
Hello world!
------------
"""))
Word to the wise: don't get too crazy with your sys.path additions. Keep your scheme simple to avoid a lot of confusion.
Overly eager imports can sometimes lead to cases where a Python module needs to import itself, at which point execution will halt!
Using Packages and __init__.py
Another great trick is creating Python packages by adding __init__.py files. __init__.py gets loaded before any other modules in the directory, so it's a great way to add imports across the entire directory. This makes it an ideal spot for sys.path hackery.
You don't even necessarily need to add anything to the file; it's sufficient to just do touch __init__.py at the console to make a directory a package.
See this SO post for a more concrete example.
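As a quick sketch of that idea, assuming the bin/ and lib/ layout from the question (the exact layout is an assumption):

# bin/__init__.py
import os
import sys

# put the sibling lib/ directory on the module search path for
# everything that lives in bin/
_here = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.abspath(os.path.join(_here, '..', 'lib')))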
In a shell script that you source (not run) in your current shell you set the following environment variables:
PATH=$PATH:$PROJECTDIR/bin
PYTHONPATH=$PROJECTDIR/lib
Then you put your Python modules and package tree in your project's ./lib directory. Python automatically adds the directories in the PYTHONPATH environment variable to sys.path.
Then you can run any top-level script from the shell without specifying the path, and any imports from your library modules are looked for in the lib directory.
I recommend very simple top-level scripts, such as:
#!/usr/bin/python
import sys
import mytool
mytool.main(sys.argv)
Then you never have to change that; you just edit the module code, and you also benefit from byte-code caching.
You can easily achieve your goals by creating a mini package that hosts each one of your projects. Use Paste scripts to create a simple project skeleton, and to make it executable, install it via setup.py develop. Now your bin scripts just need to import the entry point to this package and execute it.
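A minimal, hypothetical setup.py along those lines (the package and function names are placeholders, not from the question):

# setup.py
from setuptools import setup, find_packages

setup(
    name='myproject',
    version='0.1',
    packages=find_packages(),
    entry_points={
        'console_scripts': [
            # installs a "manage" command that calls myproject.cli:main()
            'manage = myproject.cli:main',
        ],
    },
)

Installed with python setup.py develop (or the modern equivalent, pip install -e .), the manage command lands on your PATH while the code stays editable in place.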