Dynamically import a package from a directory - python

I am making a server that listens on a socket. Whenever a new request comes in, the server spawns a new instance of handle_request. Each instance of handle_request.py imports the relevant handler from request_handlers.
server.py
handle_request.py
request_handlers
    |_ handler_beta
    |_ handler_1
    |_ handler_2
While handlerX is a module, request_handlers is not a package. The modules are self-contained and reloaded on each request. The modules may be added, modified or dropped while the program is running.
Question: What is the way to import a module from an arbitrary directory?
Doing my homework, I saw that most questions deal with packages; one is even titled "python: import a module from a folder". Hence I believe this question is distinct. The architecture has been simplified, and yes, I am considering pre-forking with reload on file modification.

Create __init__.py in the directory. That makes it a package. If you're scanning the directory for .py files, you'll probably then want to skip __init__.py.
Then, you can import them with __import__('request_handlers.' + module, fromlist=['']) (the fromlist is important, otherwise you'll get request_handlers rather than the appropriate module).
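A minimal sketch of that approach, assuming request_handlers sits next to the server and is importable; load_handlers is a hypothetical helper, and importlib.reload covers the per-request reloading the question mentions:
import os
import importlib
def load_handlers():
    handlers = {}
    for filename in os.listdir('request_handlers'):
        if not filename.endswith('.py') or filename == '__init__.py':
            continue  # skip __init__.py and non-Python files
        name = filename[:-3]
        # fromlist=[''] makes __import__ return the submodule itself,
        # not the top-level request_handlers package
        module = __import__('request_handlers.' + name, fromlist=[''])
        handlers[name] = importlib.reload(module)  # pick up on-disk edits
    return handlers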
One way of doing it without an __init__.py would be to put the request_handlers directory in sys.path, but that then makes name clashes with other modules possible. Another way would be with execfile. You can research that more if you want to. I'd do it (and have done it before) the package/__import__ way.
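For reference, the sys.path variant is only a couple of lines (a sketch; handler_1 is one of the modules from the question's layout):
import os
import sys
sys.path.insert(0, os.path.abspath('request_handlers'))
handler = __import__('handler_1')  # imported as a top-level module, so name clashes are possible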

As Chris said: add an __init__.py to your folder. You can use the __import__() function if you do not know the names beforehand or if you do not want to have their names fixed in your code.

Related

How do I structure my Python project to allow named modules to be imported from sub directories

This is my directory structure:
Projects
+ Project_1
+ Project_2
- Project_3
    - Lib1
        __init__.py # empty
        moduleA.py
    - Tests
        __init__.py # empty
        foo_tests.py
        bar_tests.py
        setpath.py
    __init__.py # empty
    foo.py
    bar.py
Goals:
Have an organized project structure
Be able to independently run each .py file when necessary
Be able to reference/import both sibling and cousin modules
Keep all import/from statements at the beginning of each file.
I achieved #1 by using the above structure
I've mostly achieved 2, 3, and 4 by doing the following (as recommended by this excellent guide)
In any package that needs to access parent or cousin modules (such as the Tests directory above) I include a file called setpath.py which has the following code:
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('...'))
Then, in each module that needs parent/cousin access, such as foo_tests.py, I can write a nice clean list of imports like so:
import setpath # Annoyingly, PyCharm warns me that this is an unused import statement
import foo
Inside setpath.py, the second and third inserts are not strictly necessary for this example, but are included as a troubleshooting step.
My problem is that this only works for imports that reference the module name directly, and not for imports that reference the package. For example, inside bar_tests.py, neither of the two statements below work when running bar_tests.py directly.
import setpath
import Project_3.foo # Error
from Project_3 import foo # Error
I receive the error "ImportError: No module named 'Project_3'".
What is odd is that I can run the file directly from within PyCharm and it works fine. I know that PyCharm is doing some behind-the-scenes magic with the Python path variable to make everything work, but I can't figure out what it is. As PyCharm simply runs python.exe and sets some environment variables, it should be possible to clone this behavior from within a Python script itself.
For reasons not really germane to this question, I have to reference bar using the Project_3 qualifier.
I'm open to any solution that accomplishes the above while still meeting my earlier goals. I'm also open to an alternate directory structure if there is one that works better. I've read the Python doc on imports and packages but am still at a loss. I think one possible avenue might be manually setting the __path__ variable, but I'm not sure which one needs to be changed or what to set it to.
Those types of questions qualify as "primarily opinion based", so let me share my opinion on how I would do it.
First "be able to independently run each .py file when necessary": either the file is an module, so it should not be called directly, or it is standalone executable, then it should import its dependencies starting from top level (you may avoid it in code or rather move it to common place, by using setup.py entry_points, but then your former executable effectively converts to a module). And yes, it is one of weak points of Python modules model, that causes misunderstandings.
Second, use virtualenv (or venv in Python 3) and put each of your Project_x into a separate one. That way the project's name won't be part of the Python module path.
Third, the link you've provided mentions setup.py – you can make use of it. Put your custom code into Project_x/src/mylib1, create src/mylib1/setup.py, and finally put your modules into src/mylib1/mylib1/module.py. Then you can install your code with pip like any other package (or with pip install -e so you can work on the code directly without reinstalling it, though that unfortunately has some limitations).
And finally, as you've already confirmed in a comment ;) the problem with your current model was that in sys.path.insert(0, os.path.abspath('...')) you mistakenly used Python's module notation, which is incorrect for filesystem paths; it should be replaced with '../..' to work as expected.
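In other words, the corrected setpath.py reads:
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('../..'))  # was '...', which is not a valid filesystem path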
I think your goals are not reasonable. Specifically, goal number 2 is a problem:
Be able to independently run each .py file when necessary
This doesn't work well for modules in a package. At least, not if you're running the .py files naively (e.g. with python foo_tests.py on the command line). When you run the files that way, Python can't tell where the package hierarchy should start.
There are two alternatives that can work. The first option is to run your scripts from the top level folder (e.g. projects) using the -m flag to the interpreter to give it a dotted path to the main module, and using explicit relative imports to get the sibling and cousin modules. So rather than running python foo_tests.py directly, run python -m project_3.tests.foo_tests from the projects folder (or python -m tests.foo_tests from within project_3 perhaps), and have foo_tests.py use from .. import foo.
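For example, under the layout above, foo_tests.py would contain
from .. import foo  # explicit relative import of Project_3/foo.py
and would be run from the Projects folder as
python -m Project_3.Tests.foo_tests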
The other (less good) option is to add a top-level folder to your Python installation's module search path on a system-wide basis (e.g. add the projects folder to the PYTHONPATH environment variable), and then use absolute imports for all your modules (e.g. import project3.foo). This is effectively what your setpath module does, but doing it system wide as part of your system's configuration, rather than at run time, is much cleaner. It also avoids the multiple names that setpath allows you to use to import a module (e.g. try import foo_tests and import tests.foo_tests; you'll get two separate copies of the same module).

Pythonic way for managing and using user-created shared libraries

tl;dr I have a directory of common files outside of my various project directories. What is the pythonic way of using/importing these common files inside my projects, and for building them into an output directory.
Background:
I'm in school and taking a data structures class that uses Python as the language. I'm learning the language as I take the class but am having some issues trying to maintain a shared code base.
In all of the other languages I've used, both compiled and interpreted, there has been a fairly intuitive way of being able to keep shared modules separate from the code that is using them so that updating a shared module doesn't require updates to the calling code.
This is how I initially had my directory structure organized.
/.../Projects
    Assignment_1
        __init__.py
        classA.py
        classB.py
    Assignment_2
        __init__.py
        classC.py
        (etc)
After realizing that much of the functionality of classA and classB would be required later on, I reorganized to this:
/.../Projects
    Common
        Sorters
            __init__.py
            BubbleSort.py
            MergeSort.py
        __init__.py
        SimpleProfiler.py
    Assignment_1
        __init__.py
        main.py
    Assignment_2
        __init__.py
        main.py
My issue is that I can't find a good way of importing things like SimpleProfiler or MergeSort from main.py. Right now I'm manually copying all of the Common files into each assignment, which is bad.
I understand that one possible solution is to update the path to include the common folder from within each main.py file, but I've also read that this is very hacky and isn't encouraged.
Another Stackoverflow answer to a similar question suggested that the user structure everything under one large project. I tried this but still couldn't import modules from one sibling into another sibling.
My other issue is how to package everything together when submitting the assignment. In other languages it was easy to implement a build script that would scan the main project for any imports, then copy (flatten) those imported files into a single output directory which I could then compress and submit for grading. I'm using PyCharm, but can't seem to find a way to reference the imports as part of the build process. Is there any kind of script for this? Whatever the solution is, I need to be able to submit the project in such a way that all the instructor has to do is call a single python file (such as main.py)
This issue isn't unique to a school setting, but seems universal to most programming projects. So, what is the pythonic way of managing a shared code base and for building that shared code into a final project?
[Disclaimer: I think it is better to use the PYTHONPATH environment variable]
I think of two very similar alternatives:
/.../Projects
    Common
        Sorters
            __init__.py
            BubbleSort.py
            MergeSort.py
        __init__.py
        SimpleProfiler.py
    assignment_1.py
    assignment_2.py
From assignment_1.py you can then use the following import: from Common.Sorters.BubbleSort import bubble_sort. This works because, by default, Python puts the invoked script's directory on the module search path. This assumes that you are invoking the assignment_* scripts directly.
The other alternative would be:
/.../Projects
    Common
        Sorters
            __init__.py
            BubbleSort.py
            MergeSort.py
        __init__.py
        SimpleProfiler.py
    Assignment_1
        __init__.py
        __main__.py
    Assignment_2
        __init__.py
        __main__.py
And invoking the assignments like so: python -m Assignment_1 (from the Projects folder). By default, "executing" a module like that will run its __main__.py code. (This is not a rigorous explanation, although the official one is a bit short.)
It works for the same reason as before: the Python interpreter puts the current directory on the module search path.
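For instance, a minimal Assignment_1/__main__.py might be (a sketch; bubble_sort is assumed to be the function defined in BubbleSort.py):
from Common.Sorters.BubbleSort import bubble_sort
print(bubble_sort([3, 1, 2]))
Invoked as python -m Assignment_1 from the Projects folder.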
Try setting the PYTHONPATH environment variable to your directory.
Python searches for imported modules in sys.path; the first entry in sys.path is the directory of the invoked script, and directories listed in PYTHONPATH are searched next.
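You can inspect the effective search order from inside Python:
import sys
print(sys.path)  # the script's directory comes first; PYTHONPATH entries follow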
On the minimum end, make a PyCharm run configuration that sets your PYTHONPATH before executing and includes the other directory. That way you don't need a sys.path call in your code.
Closer to the "perfect" end, make your other directory into a Python package with a setup.py. Then, using the interpreter from your project, do "python path/to/other/dir/setup.py develop" to bring your separately-developed package into the consuming project.
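A minimal setup.py for the separately-developed package might look like this sketch (the distribution name is illustrative):
from setuptools import setup, find_packages
setup(
    name='my-shared-lib',  # illustrative name
    version='0.1',
    packages=find_packages(),
)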

What is the argument for Python to seemingly frown on importing from different directories?

This might be a more broad question, and more related to understanding Python's nature and probably good programming practices in general.
I have a file, called util.py. It has a lot of different small functions I've collected over the past few months that are useful when doing various machine learning tasks.
My thinking is this: I'd like to continue adding important functions to this script as I go. As such, I will want to use import util often, now and in the future, in many unrelated projects.
But Python seems to feel like I should only be able to access the code in this file if it lives in my current directory, even if the functions in this file are useful for scripts in different directories. I sense some reason behind the way that works that I don't fully grasp; to me, it seems like I'll be forced to make unnecessary copies often.
If I should have to create a new copy of util.py every time I'm working from within a new directory, on a different project, it won't be long until I have many different versions/iterations of this file scattered all over my hard drive, in various states. I don't desire this degree of modularity in my programming -- for the sake of simplicity, repeatability, and clarity, I want only one file in only one location, accessible to many projects.
The question in a nutshell: What is the argument for Python to seemingly frown on importing from different directories?
If your util.py file contains functions you're using in a lot of different projects, then it's actually a library, and you should package it as such so you can install it in any Python environment with a single line (python setup.py install), and update it if required (Python's packaging ecosystem has several features to track and update library versions).
An added benefit is that right now, if you're doing what the other answers suggested, you have to remember to manually put util.py on your PYTHONPATH (the "dirty" way). If you try to run one of your programs and you haven't done that, you'll get a cryptic ImportError that doesn't explain much: is it a missing dependency? A typo in the program?
Now think about what happens if someone other than you tries to run the program(s) and gets those error messages.
If you have a library, on the other hand, trying to set up your program will either complain in clear, understandable language that the library is missing or out of date, or (if you've taken the appropriate steps) automatically download and install it so things are ready to roll.
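For a single-file library like util.py, the corresponding setup.py is only a few lines (a sketch; the distribution name is illustrative):
from setuptools import setup
setup(
    name='ml-utils',       # illustrative distribution name
    version='0.1',
    py_modules=['util'],   # ship the single util.py module
)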
On a related topic, having a file/module/namespace called "util" is a sign of bad design. What are these utilities for? It's the programming equivalent of a "miscellaneous" folder: eventually, everything will end up in it and you'll have no way to know what it contains other than opening it and reading it all.
Another way is adding the directory/you/want/to/import/from to the path from within the scripts that need it.
You should have a file __init__.py in the same folder where utils.py lives, to tell Python to treat the folder as a package. The file __init__.py may be empty or not; you can define other things in there.
Example:
/home/marcos/python/proj1/
    __init__.py
    utils.py
/home/marcos/school_projects/final_assignment/
    my_script.py
And then inside my_script.py
import sys
sys.path.append('/home/marcos/python/')
from proj1 import utils
MAX_HEIGHT = utils.SOME_CONSTANT
a_value = utils.some_function()
First, define an environment variable. If you are using bash, for example, then put the following in the appropriate startup file:
export PYTHONPATH=/path/to/my/python/utilities
Now, put your util.py and any of your other common modules or packages in that directory. Now you can import util from anywhere and python will find it.

Importing a class and calling a method

I am using Eclipse for Python programming.
In my project, I have a file: main.py. This file is located in the root of the project files hierarchy. In the root itself, I created a folder with the name Classes in which I have a class file named: PositionWindow.py. This file contains a class PositionWindow and the class itself contains a function named: Center().
In main.py, I want to import this class [PositionWindow] and later call that function Center in the appropriate place.
I am not able to import that class correctly in main.py, nor do I understand how to call that function later.
You seem to be programming in Java, still. I understand that you used Java for a long time, but this is no longer Java. This is Python...
Python doesn't have directories. It has packages.
Python doesn't have class files. It has modules.
You can have multiple classes in a module.
You can have multiple modules in a package.
I suggest you read at least the basic Python tutorial (especially the part about packages and modules) so you can learn Python, instead of trying to guess the language.
About the structure of your project, there's this article which is pretty good, and shows you how to do it.
shameless copy paste:
Filesystem structure of a Python project
by Jp Calderone
Do:
- Name the directory something related to your project. For example, if your project is named "Twisted", name the top-level directory for its source files Twisted. When you do releases, you should include a version number suffix: Twisted-2.5.
- Create a directory Twisted/bin and put your executables there, if you have any. Don't give them a .py extension, even if they are Python source files. Don't put any code in them except an import of and call to a main function defined somewhere else in your projects.
- If your project is expressible as a single Python source file, then put it into the directory and name it something related to your project. For example, Twisted/twisted.py. If you need multiple source files, create a package instead (Twisted/twisted/, with an empty Twisted/twisted/__init__.py) and place your source files in it. For example, Twisted/twisted/internet.py.
- Put your unit tests in a sub-package of your package (note - this means that the single Python source file option above was a trick - you always need at least one other file for your unit tests). For example, Twisted/twisted/test/. Of course, make it a package with Twisted/twisted/test/__init__.py. Place tests in files like Twisted/twisted/test/test_internet.py.
- Add Twisted/README and Twisted/setup.py to explain and install your software, respectively, if you're feeling nice.
Don't:
- Put your source in a directory called src or lib. This makes it hard to run without installing.
- Put your tests outside of your Python package. This makes it hard to run the tests against an installed version.
- Create a package that only has an __init__.py and then put all your code into __init__.py. Just make a module instead of a package, it's simpler.
- Try to come up with magical hacks to make Python able to import your module or package without having the user add the directory containing it to their import path (either via PYTHONPATH or some other mechanism). You will not correctly handle all cases and users will get angry at you when your software doesn't work in their environment.
Instead of creating a "folder" in the root of your project, create a "package". Simply create a blank file called __init__.py and you should be able to import your module in main.py.
import Classes.PositionWindow
p = Classes.PositionWindow.PositionWindow()
p.Center()
However, you should read up on modules and packages, because your structure indicates that your approach may be flawed. First, a class doesn't have to be in a separate .py file like it does in Java. Further, your packages/modules/functions/methods should all be in lower case; only class names should be capitalized.
So you have this file layout:
/main.py
/Classes/PositionWindow.py (with Center inside it)
You have two choices:
Add "Classes" to your Python Path, allowing you to import PositionWindow.py directly.
Make "Classes" a package (possibly with a better name).
To add the Classes folder to your Python Path, set PYTHONPATH as an environment variable to include it. This works like your shell's PATH -- when you import PositionWindow, it will look through all the directories in your Python Path to find it.
Alternatively, if you add a blank file:
Classes/__init__.py
You can import the package and its contents like so in main.py:
import Classes.PositionWindow
w = Classes.PositionWindow.PositionWindow()  # Center() is a method, so instantiate first
x = w.Center()

Refactoring python module configuration to avoid relative imports

This is related to a previous question of mine.
I understand how to store and read configuration files. There are choices such as ConfigParser and ConfigObj.
Consider this structure for a hypothetical 'eggs' module:
eggs/
    common/
        __init__.py
        config.py
    foo/
        __init__.py
        a.py
'eggs.foo.a' needs some configuration information. What I am currently doing is, in 'a', import eggs.common.config. One problem with this is that if 'a' is moved to a deeper level in the module tree, the relative imports break. Absolute imports don't, but they require your module to be on your PYTHONPATH.
A possible alternative to the above absolute import is a relative import. Thus, in 'a',
from ..common import config
Without debating the merits of relative vs absolute imports, I was wondering about other possible solutions?
edit- Removed the VCS context
"imports ... require your module to be on your PYTHONPATH"
Right.
So, what's wrong with setting PYTHONPATH?
The require statement from pkg_resources may be what you need.
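That would look something like this (a sketch; the distribution name and version are illustrative and assume the eggs package has been installed):
from pkg_resources import require
require('eggs>=1.0')  # resolves the installed distribution and adds it to sys.path
import eggs.foo.a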
As I understand it from this and your previous questions, you only need one path to be in sys.path. If we are talking about git as the VCS (mentioned in the previous question), then only one branch is checked out at any time (a single working directory), and you can switch and merge branches as frequently as you like.
I'm thinking of something along the lines of a more 'push-based' kind of solution. Instead of importing the shared objects (be they for configuration, or utility functions of some sort), have the top-level __init__ export them, and have each intermediate __init__ import them from the layer above and immediately re-export them.
I'm not sure if I've got the python terminology right, please correct me if I'm wrong.
This way, any module that needs to use the shared object (which in the context of this example represents configuration information) simply imports it from the __init__ at its own level.
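A sketch of that re-export chain, using the eggs layout above:
# eggs/__init__.py
from eggs.common import config
# eggs/foo/__init__.py -- re-export from the layer above
from eggs import config
# eggs/foo/a.py -- import from its own level
from eggs.foo import config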
Does this sound sensible/feasible?
You can trick the import mechanism by adding each subdirectory to eggs/__init__.py:
import os
# extend the package's search path so submodules resolve at the top level
__path__.append(os.path.join(__path__[0], "common"))
__path__.append(os.path.join(__path__[0], "foo"))
Then you simply import all modules from the eggs namespace; e.g. import eggs.bar (provided you have the file eggs/foo/bar.py).
Note that foo and common should not be packages in this case - in other words, they should not contain an __init__.py file.
This solution completely solves the issue of eventually moving files around; however it flattens the namespace and therefore it may not be as good, especially in big projects - personally, I prefer full name resolution.
