I have a Python package with an __init__.py that imports some things to be exposed as the package API.
# __init__.py
from .mymodule import MyClass
# ...
I also want to be able to use the package as a command-line application, as in python -m mypackage, so I have a __main__.py file for that purpose:
# __main__.py
if __name__ == '__main__':
from .main import main
main()
So far so good. The problem is that, when the package is run as a program like this, I want to be able to do some stuff before importing any of the submodules - namely changing some environment variables before some third-party dependencies are loaded.
I do not know how to do this, at least not in a reasonable way. Ideally, I would like to do something like:
# __init__.py
def thePackageIsRunningAsAnApplication():
# ???
def prepareEnvironment():
# ...
if thePackageIsRunningAsAnApplication():
prepareEnvironment()
from .mymodule import MyClass
# ...
The problem is I don't think thePackageIsRunningAsAnApplication() can be implemented. The usual __name__ == '__main__' does not work here, because the main module being run is __main__.py, not __init__.py. In fact, I would prefer to define and run prepareEnvironment within __main__.py, but I don't know how to get that to run before the inner modules are loaded by __init__.py.
I might (not sure, actually) work around it by lazily loading dependencies on my module, or somehow delaying the internal module loading or something, but I would prefer to avoid doing something like that just for this.
EDIT: Thinking more about it, lazy loading probably would not work either. In the example, MyClass is, well, a class, not a submodule, so I cannot lazily load it. Moreover, MyClass happens to inherit from a class from that third-party dependency I was mentioning, so I cannot even define it without loading it.
It might make sense to add a separate entry point for running your code as a script, rather than using __main__.py, which as you've noticed, can only be run after the package's __init__.py is fully loaded.
A simple script like run_mypackage.py located at the top level could contain the environment variable tweaking code, and then could import and run the package afterwards.
def prepare_environment():
...
if __name__ == "__main__":
prepare_environment() # adjust the environment firstt
from mypackage.main import main # then load the package afterwards
main()
Not 100% sure what you want to do pre-import wise.
Afaik there is no preprocessor step before imports.
In other languages than python there usually are compiler flags which can be read before imports are done but i don't know if python has those to.
I possible solution to do stuff pre-import wise could be to have a seperate package that is imported before your other submodules (ofcourse you would need some kind of switch to not have it run when you call/use the package the usual external way).
At import of the package the whole package is run (if there is no name == 'main' part) which can be used to do stuff before the other modules are importet.
But if you mean solving some environment issues when your programm is called on the different way i think there is no way for having multi-importpath resolution without ambiguity.
There are relative import paths if your package is locally saved but i don't think they can be used that way.
Would be some interesting experiment though.
Related
I would like to run setup() in setup.py only if the module is actually run. So I want to do something like:
from setuptools import setup
def somefunction():
...
if __name__ == "__main__":
setup(...)
(I want to use some of the defined functions in the file, which are used in the setup() call, in another python script that is used during a documentation build process)
Is this possible? Disallowed? Discouraged? Why?
I could not find any documentation about this, but oddly all examples do NOT use the test for "main", so I wonder if there is anything that makes using this problematic.
The guards around the setup() call are not commonly seen in practice because this file is not usually imported. It is well-known to be an installer script, intended to be executed directly.
However, you may add the guards for the reason mentioned ("I want to use some of the defined functions in the file") and everything within distutils/setuptools should still work fine. It is somewhat unusual for the setup.py script to have library functions defined within, so you may ask yourself whether there is a better home for such functions, rather than writing them directly in the installer script itself.
Is this possible?
Yes.
Disallowed?
No.
Discouraged?
Somewhat opinion-based. Personally, I would say yes: discourage that.
Why?
The job of setup.py is to install your code from a source distribution, period. It is not typically the home for other random tasks such as documentation of build process. This file will not even be included in a wheel distribution, which is probably the more typical way to deploy Python code these days.
As a final note: if and when you move toward modern Python packaging practices using pyproject.toml, with a declarative build-system, there will be no setup.py script. Then you are going to have to find a new home for such helper functions anyway.
Shouldn't be a problem. I would recommend always using the if __name__ == "__main__": condition even if there the module is never imported. What I would not recommend on the other hand tough, is to share code between the setup script and the code of the project itself.
You can go a different route, and add all your functions of interest to a separate module that's normally importable by the rest of your program. I wouldn't recommend putting package functions into setup.py because the setup file is meant to live outside your package. You may need a utility function to import the module by path in setup.py. I generally use something that looks like this:
def import_file(name, location):
"""
Imports the specified python file as a module, without explicitly
registering it to `sys.modules`.
"""
if sys.version_info[0] == 2:
# Python 2.7-
from imp import load_source
mod = load_source(name, location)
elif sys.version_info < (3, 5, 0):
# Python 3.4-
from importlib.machinery import SourceFileLoader
mod = SourceFileLoader(name, location).load_module()
else:
# Python 3.5+
from importlib.util import spec_from_file_location, module_from_spec
spec = spec_from_file_location(name, location)
mod = module_from_spec(spec)
spec.loader.exec_module(mod)
return mod
I wanted to ask if there is a way to import modules/functions from a folder in a project that is adjacent to another folder.
For example lets say I have two files:
project/src/training.py
project/lib/functions.py
Now both these folders have the __init__.py file in them. If I wanted to import functions.py into training.py, it doesn't seem to detect. I'm trying to use from lib.functions import * .I know this works from the upper level of the folder structure, where I can call both files from a script, but is there a way to do it files in above/sideways folders?
Fundamentally, the best way of doing this depends on exactly how the two modules are related. There's two possibilities:
The modules are part of one cohesive unit that is intended to be used as a single whole, rather than as separate pieces. (This is the most common situation, and it should be your default if you're not sure.)
The functions.py module is a dependency for training.py but is intended to be used separately.
Cohesive unit
If the modules are one cohesive unit, this is not the standard way of structuring a project in Python.
If you need multiple modules in the same project, the standard way of structuring the folders is to include all the modules in a single package, like so:
project/
trainingproject/
__init__.py
training.py
functions.py
other/
...
project/
...
folders/
...
The __init__.py file causes Python to recognize the trainproject/ directory as a single unit called a package. Using a package enables to use of relative imports:
training.py
from . import functions
# The rest of training.py code
Assuming your current directory is project, you can then invoke training.py as a module:
python -m trainingproject.training
Separate units
If your modules are actually separate packages, then the simplest idiomatic solutions during development is to modify the PYTHONPATH environment variable:
sh-derviative syntax:
# All the extra PYTHONPATH references on the right are to append if it already has values
# Using $PWD ensures that the path in the environment variable is absolute.
PYTHONPATH=$PYTHONPATH${PYTHONPATH:+:}$PWD/lib/
python ./src/training.py
PowerShell syntax:
$env:PYTHONPATH = $(if($env:PYTHONPATH) {$env:PYTHONPATH + ';'}) + (Resolve-Path ./lib)
python ./src/training.py
(This is possible in Command Prompt, too, but I'm omitting that since PowerShell is preferred.)
In your module, you would just do a normal import statement:
training.py
import functions
# Rest of training.py code
Doing this will work when you deploy your code to production as well if you copy all the files over and set up the correct paths, but you might want to consider putting functions.py in a wheel and then installing it with pip. That will eliminate the need to set up PYTHONPATH by installing functions.py to site-packages, which will make the import statement just work out of the box. That will also make it easier to distribute functions.py for use with other scripts independent of training.py. I'm not going to cover how to create a wheel here since that is beyond the scope of this question, but here's an introduction.
Yes, it’s as simple as writing the entire path from the working directory:
from project.src.training import *
Or
from project.lib.functions import *
I agree with what polymath stated above. If you were also wondering how to run these specific scripts or functions once they are imported, use: your_function_name(parameters), and to run a script that you have imported from the same directory, etc, use: exec(‘script_name.py). I would recommend making functions instead of using the exec command however, because it can be a bit hard to use correctly.
This is my directory structure:
Projects
+ Project_1
+ Project_2
- Project_3
- Lib1
__init__.py # empty
moduleA.py
- Tests
__init__.py # empty
foo_tests.py
bar_tests.py
setpath.py
__init__.py # empty
foo.py
bar.py
Goals:
Have an organized project structure
Be able to independently run each .py file when necessary
Be able to reference/import both sibling and cousin modules
Keep all import/from statements at the beginning of each file.
I Achieved #1 by using the above structure
I've mostly achieved 2, 3, and 4 by doing the following (as recommended by this excellent guide)
In any package that needs to access parent or cousin modules (such as the Tests directory above) I include a file called setpath.py which has the following code:
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('...'))
Then, in each module that needs parent/cousin access, such as foo_tests.py, I can write a nice clean list of imports like so:
import setpath # Annoyingly, PyCharm warns me that this is an unused import statement
import foo.py
Inside setpath.py, the second and third inserts are not strictly necessary for this example, but are included as a troubleshooting step.
My problem is that this only works for imports that reference the module name directly, and not for imports that reference the package. For example, inside bar_tests.py, neither of the two statements below work when running bar_tests.py directly.
import setpath
import Project_3.foo.py # Error
from Project_3 import foo # Error
I receive the error "ImportError: No module named 'Project_3'".
What is odd is that I can run the file directly from within PyCharm and it works fine. I know that PyCharm is doing some behind the scenes magic with the Python Path variable to make everything work, but I can't figure out what it is. As PyCharm simply runs python.exe and sets some environmental variables, it should be possible to clone this behavior from within a Python script itself.
For reasons not really germane to this question, I have to reference bar using the Project_3 qualifier.
I'm open to any solution that accomplishes the above while still meeting my earlier goals. I'm also open to an alternate directory structure if there is one that works better. I've read the Python doc on imports and packages but am still at a loss. I think one possible avenue might be manually setting the __path__ variable, but I'm not sure which one needs to be changed or what to set it to.
Those types of questions qualify as "primarily opinion based", so let me share my opinion how I would do it.
First "be able to independently run each .py file when necessary": either the file is an module, so it should not be called directly, or it is standalone executable, then it should import its dependencies starting from top level (you may avoid it in code or rather move it to common place, by using setup.py entry_points, but then your former executable effectively converts to a module). And yes, it is one of weak points of Python modules model, that causes misunderstandings.
Second, use virtualenv (or venv in Python3) and put each of your Project_x into separate one. This way project's name won't be part of Python module's path.
Third, link you've provided mentions setup.py – you may make use of it. Put your custom code into Project_x/src/mylib1, create src/mylib1/setup.py and finally your modules into src/mylib1/mylib1/module.py. Then you may install your code by pip as any other package (or pip -e so you may work on the code directly without reinstalling it, though it unfortunately has some limitations).
And finally, as you've confirmed in comment already ;). Problem with your current model was that in sys.path.insert(0, os.path.abspath('...')) you'd mistakenly used Python module's notation, which in incorrect for system paths and should be replaced with '../..' to work as expected.
I think your goals are not reasonable. Specifically, goal number 2 is a problem:
Be able to independently run each .py file when neccessary
This doesn't work well for modules in a package. At least, not if you're running the .py files naively (e.g. with python foo_tests.py on the command line). When you run the files that way, Python can't tell where the package hierarchy should start.
There are two alternatives that can work. The first option is to run your scripts from the top level folder (e.g. projects) using the -m flag to the interpreter to give it a dotted path to the main module, and using explicit relative imports to get the sibling and cousin modules. So rather than running python foo_tests.py directly, run python -m project_3.tests.foo_tests from the projects folder (or python -m tests.foo_tests from within project_3 perhaps), and have have foo_tests.py use from .. import foo.
The other (less good) option is to add a top-level folder to your Python installation's module search path on a system wide basis (e.g. add the projects folder to the PYTHON_PATH environment variable), and then use absolute imports for all your modules (e.g. import project3.foo). This is effectively what your setpath module does, but doing it system wide as part of your system's configuration, rather than at run time, it's much cleaner. It also avoids the multiple names that setpath will allow to you use to import a module (e.g. try import foo_tests, tests.foo_tests and you'll get two separate copies of the same module).
I have a flask app that uses functions from custom modules.
My File hierarchy is like so:
__init__.py
ec2/__init__.py
citrixlb/__init__.py
So far in the root __init__.py I have a from ec2 import * clause to load my module.
Now I'm adding a new 'feature' called citrixlb.
Both the of the __init__.py files in citrixlb and ec2 use some of the same functions to do their task.
I was thinking of doing something like:
__init__.py
common/__init__.py
ec2/__init__.py
citrixlb/__init__.py
If I do the above,and move all common functions to common/__init__.py, how would ec2/__init__.py and citrixlb/__init__.py get access to the functions
in common/__init__.py?
The reason is that
I would like to keep the root __init__.py as sparse as possible
I wish to be able to run the __init__.py in citrixlb and ec2 as
standalone scripts.
I also wish to be able to continue to add functionality by adding newdir/__init__.py
If I do the above,and move all common functions to common/__init__.py, how would ec2/__init__.py and citrixlb/__init__.py get access to the functions in common/__init__.py?
This is exactly what explicit relative imports were designed for:
from .. import common
Or, if you insist on using import *:
from ..common import *
You can do this with absolute import instead. Assuming your top-level package is named mything:
from mything import common
from mything.common import *
But in this case, I think you're better with the relative version. It's not just more concise and easier to read, it's more robust (if you rename mything, or reorganize its structure, or embed this whole package inside a larger package…). But you may want to read the rationales for the two different features in PEP 328 to decide which one seems more compelling to you here.
One thing:
I wish to be able to run the __init__.py in citrixlb and ec2 as standalone scripts.
That, you can't do. Running modules inside a package as a top-level script is not supposed to work. Sometimes you get away with it. Once you're importing from a sibling or a parent, you definitely will not get away with it.
The right way to do it is either:
python -m mything.ec2 instead of python mything/ec2/__init__.py
Write a trivial ec2 script at the top level, that just does something like from mything.ec2 import main; main().
The latter is a common enough pattern that, if you're building a setuptools distribution, it can build the ec2 script for you automatically. And automatically make it still work even ec2 ends up in /usr/local/bin while the mything package is in your site-packages. See console_scripts for more details.
I currently have a project called "peewee" which consists of a single python file, peewee.py. There is also a module "tests.py" containing unit tests. This has been great, people that want to use the library can just grab a single file and run with it.
I've lately wanted to add some extras, but am not sure how to do this to make the namespacing right. If you look in the root of my project, it is something like:
peewee.py
tests.py
I want to add the following:
extras/__init__.py
extras/foo.py
extras/bar.py
And this is the tricky part. I want to have it such that folks using the single file can still do this, but if you want the extras you can have them, too. I want the extras to be namespaced such that:
from peewee.extras import foo
from peewee.extras.bar import Baz
My setup.py looks a bit like this:
setup(
name='peewee',
packages=['extras'],
py_modules=['peewee'],
# ... etc ...
)
But this doesn't quite work. Any help would be greatly appreciated!
Setting Up a Package
As #ThomasK said, the easiest way to do this would be with a package. If you name your package peewee, then you can edit the top-level __init__.py file to allow users to continue to use your package in the same way they have previously.
First, directory structure for your package and subfolders:
peewee/
__init__.py
peewee.py
extras/
__init__.py
foo.py
bar.py
The __init__.py file
Next, you need to add a few lines to the top-level __init__.py.
You could go for a quick-and-dirty method and just include:
from peewee.peewee import *
which would put everything in peewee.py in the top-level namespace of your package. Or, you could take the more traditional alternative and explicitly import only those methods that should be at the top level.
from peewee.peewee import funtion1, class1,...
and, for backwards compatibility, you could explicitly set the __all__ attribute of your module to include only peewee
__all__ = ['peewee']
which will let people continue to use from peewee import * if they really need to.
Writing a setup.py file
Finally, you'll have to set up some install scripts and such too. Zed Shaw's Learn Python The Hard Way exercise 46 has a simple and clear project skeleton that you should use.
The most important part is the setup.py file. The example page isn't too long and Zed's put a lot of work into making a really great book, so I'm not going to repost it here (though the entire book is available for free). You can also read the longer instructions/documentation for writing a setup.py file for distutils, however LPTHW will give you something that will do everything you want quickly and easily.
Total package directory structure
Note that your final directory structure will actually be a bit bigger (the name of peewee-pkg doesn't matter, bin is optional--the names of the subfolders matter)
peewee-pkg/
setup.py
bin
peewee/
__init__.py
peewee.py
extras/
__init__.py
foo.py
bar.py
Installing and using
After that, you could actually post your package to PyPi if you like, or you can distribute it to users directly. All they would need to do is run:
python setup.py install
and everything will be available to them.
Importing post-install
Finally, if you do specific imports in the peewee/__init__.py file as described earlier, your users will still be able to do:
from peewee import function1, class1, ...
But now they can also use import peewee.extras to get to the extras functions (or import peewee.extras.foo as foo or from pewee.extras.foo import extra_awesome), etc. And, as you asked in your question, they will also be able to do:
from pewee.extras import foo
And then access foo's functions as if the file were in the current directory.
Useful note for developing a package
On your computer, you should run:
python setup.py develop
which will install the package to your path just like using python setup.py install; however, develop tells python to recheck the file every time it uses the module, so that every time you make changes, they will be immediately available to the system for testing.