I would like to run setup() in setup.py only if the module is actually run. So I want to do something like:
from setuptools import setup

def somefunction():
    ...

if __name__ == "__main__":
    setup(...)
(I want to use some of the functions defined in the file, which are used in the setup() call, from another Python script that runs during a documentation build.)
Is this possible? Disallowed? Discouraged? Why?
I could not find any documentation about this, but oddly all examples do NOT use the test for "main", so I wonder if there is anything that makes using this problematic.
The guards around the setup() call are not commonly seen in practice because this file is not usually imported. It is well-known to be an installer script, intended to be executed directly.
However, you may add the guards for the reason mentioned ("I want to use some of the defined functions in the file") and everything within distutils/setuptools should still work fine. It is somewhat unusual for the setup.py script to have library functions defined within, so you may ask yourself whether there is a better home for such functions, rather than writing them directly in the installer script itself.
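For illustration, a guarded setup.py along those lines might look like this (a minimal sketch; the helper name and the metadata values are placeholders, not from the question):

from setuptools import setup

def read_long_description():
    # A helper that a docs build script could import from this file
    # without triggering setup().
    with open("README.rst") as f:
        return f.read()

if __name__ == "__main__":
    setup(
        name="mypackage",  # placeholder metadata
        version="1.0",
        long_description=read_long_description(),
    )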
Is this possible?
Yes.
Disallowed?
No.
Discouraged?
Somewhat opinion-based. Personally, I would say yes: discourage that.
Why?
The job of setup.py is to install your code from a source distribution, period. It is not typically the home for other unrelated tasks such as a documentation build process. This file will not even be included in a wheel distribution, which is probably the more typical way to deploy Python code these days.
As a final note: if and when you move toward modern Python packaging practices using pyproject.toml, with a declarative build-system, there will be no setup.py script. Then you are going to have to find a new home for such helper functions anyway.
Shouldn't be a problem. I would recommend always using the if __name__ == "__main__": guard, even if the module is never imported. What I would not recommend, though, is sharing code between the setup script and the code of the project itself.
You can go a different route, and add all your functions of interest to a separate module that's normally importable by the rest of your program. I wouldn't recommend putting package functions into setup.py because the setup file is meant to live outside your package. You may need a utility function to import the module by path in setup.py. I generally use something that looks like this:
import sys

def import_file(name, location):
    """
    Imports the specified python file as a module, without explicitly
    registering it to `sys.modules`.
    """
    if sys.version_info[0] == 2:
        # Python 2.7-
        from imp import load_source
        mod = load_source(name, location)
    elif sys.version_info < (3, 5, 0):
        # Python 3.4-
        from importlib.machinery import SourceFileLoader
        mod = SourceFileLoader(name, location).load_module()
    else:
        # Python 3.5+
        from importlib.util import spec_from_file_location, module_from_spec
        spec = spec_from_file_location(name, location)
        mod = module_from_spec(spec)
        spec.loader.exec_module(mod)
    return mod
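Usage might look like this (the module and file names here are made up for the example):

helpers = import_file("setup_helpers", "/path/to/setup_helpers.py")
helpers.somefunction()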
I have a Python package with an __init__.py that imports some things to be exposed as the package API.
# __init__.py
from .mymodule import MyClass
# ...
I also want to be able to use the package as a command-line application, as in python -m mypackage, so I have a __main__.py file for that purpose:
# __main__.py
if __name__ == '__main__':
    from .main import main
    main()
So far so good. The problem is that, when the package is run as a program like this, I want to be able to do some stuff before importing any of the submodules - namely changing some environment variables before some third-party dependencies are loaded.
I do not know how to do this, at least not in a reasonable way. Ideally, I would like to do something like:
# __init__.py
def thePackageIsRunningAsAnApplication():
    # ???
    ...

def prepareEnvironment():
    # ...
    ...

if thePackageIsRunningAsAnApplication():
    prepareEnvironment()

from .mymodule import MyClass
# ...
The problem is I don't think thePackageIsRunningAsAnApplication() can be implemented. The usual __name__ == '__main__' does not work here, because the main module being run is __main__.py, not __init__.py. In fact, I would prefer to define and run prepareEnvironment within __main__.py, but I don't know how to get that to run before the inner modules are loaded by __init__.py.
I might (not sure, actually) work around it by lazily loading dependencies on my module, or somehow delaying the internal module loading or something, but I would prefer to avoid doing something like that just for this.
EDIT: Thinking more about it, lazy loading probably would not work either. In the example, MyClass is, well, a class, not a submodule, so I cannot lazily load it. Moreover, MyClass happens to inherit from a class from that third-party dependency I was mentioning, so I cannot even define it without loading it.
It might make sense to add a separate entry point for running your code as a script, rather than using __main__.py, which, as you've noticed, can only run after the package's __init__.py is fully loaded.
A simple script like run_mypackage.py located at the top level could contain the environment variable tweaking code, and then could import and run the package afterwards.
def prepare_environment():
    ...

if __name__ == "__main__":
    prepare_environment()  # adjust the environment first
    from mypackage.main import main  # then load the package afterwards
    main()
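For example, prepare_environment might set the relevant variables before anything from the package (and hence the third-party dependency) gets imported; the variable name below is invented for illustration:

import os

def prepare_environment():
    # Must run before the third-party dependency is imported anywhere.
    os.environ["THIRD_PARTY_LOG_LEVEL"] = "ERROR"  # hypothetical variable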
Not 100% sure what you want to do pre-import.
As far as I know there is no preprocessor step before imports in Python. Languages other than Python usually have compiler flags which can be read before imports are done, but I don't know of an equivalent in Python.
A possible solution for doing things pre-import could be to have a separate package that is imported before your other submodules (of course you would need some kind of switch so it does not run when the package is used the usual, external way).
When the package is imported, its top-level code is run (there is no __name__ == 'main' guard for that), which can be used to do things before the other modules are imported.
But if you mean solving environment issues for when your program is called in a different way, I think there is no way to have multi-import-path resolution without ambiguity.
There are relative import paths if your package is saved locally, but I don't think they can be used that way.
Would be an interesting experiment though.
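A rough sketch of that switch idea (every name here is invented, not from the question): a small bootstrap module imported at the very top of the package's __init__.py, gated by an environment variable that only the application entry point sets:

# mypackage/_bootstrap.py (hypothetical module)
import os

# The launcher (e.g. __main__.py) sets MYPACKAGE_AS_APP before importing
# the package; library users never set it, so this stays a no-op for them.
if os.environ.get("MYPACKAGE_AS_APP") == "1":
    os.environ.setdefault("THIRD_PARTY_OPTION", "value")  # hypothetical variable

The package's __init__.py would then start with from . import _bootstrap before importing any submodules.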
This is my directory structure:
Projects
+ Project_1
+ Project_2
- Project_3
    - Lib1
        __init__.py  # empty
        moduleA.py
    - Tests
        __init__.py  # empty
        foo_tests.py
        bar_tests.py
        setpath.py
    __init__.py  # empty
    foo.py
    bar.py
Goals:
Have an organized project structure
Be able to independently run each .py file when necessary
Be able to reference/import both sibling and cousin modules
Keep all import/from statements at the beginning of each file.
I achieved #1 by using the above structure.
I've mostly achieved #2, #3, and #4 by doing the following (as recommended by this excellent guide):
In any package that needs to access parent or cousin modules (such as the Tests directory above) I include a file called setpath.py which has the following code:
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('...'))
Then, in each module that needs parent/cousin access, such as foo_tests.py, I can write a nice clean list of imports like so:
import setpath # Annoyingly, PyCharm warns me that this is an unused import statement
import foo
Inside setpath.py, the second and third inserts are not strictly necessary for this example, but are included as a troubleshooting step.
My problem is that this only works for imports that reference the module name directly, and not for imports that reference the package. For example, inside bar_tests.py, neither of the two statements below work when running bar_tests.py directly.
import setpath
import Project_3.foo # Error
from Project_3 import foo # Error
I receive the error "ImportError: No module named 'Project_3'".
What is odd is that I can run the file directly from within PyCharm and it works fine. I know that PyCharm is doing some behind the scenes magic with the Python Path variable to make everything work, but I can't figure out what it is. As PyCharm simply runs python.exe and sets some environmental variables, it should be possible to clone this behavior from within a Python script itself.
For reasons not really germane to this question, I have to reference bar using the Project_3 qualifier.
I'm open to any solution that accomplishes the above while still meeting my earlier goals. I'm also open to an alternate directory structure if there is one that works better. I've read the Python doc on imports and packages but am still at a loss. I think one possible avenue might be manually setting the __path__ variable, but I'm not sure which one needs to be changed or what to set it to.
Those types of questions qualify as "primarily opinion based", so let me share my opinion of how I would do it.
First, "be able to independently run each .py file when necessary": either the file is a module, in which case it should not be called directly, or it is a standalone executable, in which case it should import its dependencies starting from the top level (you may avoid this in code, or rather move it to a common place, by using setup.py entry_points, but then your former executable effectively becomes a module). And yes, this is one of the weak points of Python's module model, and it causes misunderstandings.
Second, use virtualenv (or venv in Python 3) and put each of your Project_x into a separate one. This way the project's name won't be part of the Python module path.
Third, the link you've provided mentions setup.py – you may make use of it. Put your custom code into Project_x/src/mylib1, create src/mylib1/setup.py, and finally put your modules into src/mylib1/mylib1/module.py. Then you may install your code with pip like any other package (or with pip install -e so you may work on the code directly without reinstalling it, though that unfortunately has some limitations).
And finally, as you've confirmed in a comment already ;). The problem with your current model was that in sys.path.insert(0, os.path.abspath('...')) you mistakenly used Python's module notation, which is incorrect for system paths and should be replaced with '../..' to work as expected.
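With that fix applied, setpath.py would read:

import os
import sys

sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('../..'))  # system path '../..', not the module-style '...'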
I think your goals are not reasonable. Specifically, goal number 2 is a problem:
Be able to independently run each .py file when necessary
This doesn't work well for modules in a package. At least, not if you're running the .py files naively (e.g. with python foo_tests.py on the command line). When you run the files that way, Python can't tell where the package hierarchy should start.
There are two alternatives that can work. The first option is to run your scripts from the top level folder (e.g. Projects) using the -m flag to the interpreter to give it a dotted path to the main module, and using explicit relative imports to get the sibling and cousin modules. So rather than running python foo_tests.py directly, run python -m Project_3.Tests.foo_tests from the Projects folder (or python -m Tests.foo_tests from within Project_3 perhaps), and have foo_tests.py use from .. import foo.
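Under that scheme, foo_tests.py might look something like this (a sketch; the body is invented):

# Project_3/Tests/foo_tests.py
# Run from the Projects folder as:  python -m Project_3.Tests.foo_tests
from .. import foo  # explicit relative import of a module in the parent package

if __name__ == "__main__":
    print("imported", foo.__name__)  # placeholder for the real tests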
The other (less good) option is to add a top-level folder to your Python installation's module search path on a system-wide basis (e.g. add the Projects folder to the PYTHONPATH environment variable), and then use absolute imports for all your modules (e.g. import Project_3.foo). This is effectively what your setpath module does, but doing it system-wide as part of your system's configuration, rather than at run time, is much cleaner. It also avoids the multiple names that setpath allows you to use to import a module (e.g. try import foo_tests and import Tests.foo_tests and you'll get two separate copies of the same module).
I have a flask app that uses functions from custom modules.
My file hierarchy is like so:
__init__.py
ec2/__init__.py
citrixlb/__init__.py
So far in the root __init__.py I have a from ec2 import * clause to load my module.
Now I'm adding a new 'feature' called citrixlb.
Both of the __init__.py files in citrixlb and ec2 use some of the same functions to do their task.
I was thinking of doing something like:
__init__.py
common/__init__.py
ec2/__init__.py
citrixlb/__init__.py
If I do the above, and move all common functions to common/__init__.py, how would ec2/__init__.py and citrixlb/__init__.py get access to the functions in common/__init__.py?
The reasons are that:
I would like to keep the root __init__.py as sparse as possible
I wish to be able to run the __init__.py in citrixlb and ec2 as standalone scripts.
I also wish to be able to continue to add functionality by adding newdir/__init__.py
If I do the above,and move all common functions to common/__init__.py, how would ec2/__init__.py and citrixlb/__init__.py get access to the functions in common/__init__.py?
This is exactly what explicit relative imports were designed for:
from .. import common
Or, if you insist on using import *:
from ..common import *
You can do this with an absolute import instead. Assuming your top-level package is named mything:
from mything import common
from mything.common import *
But in this case, I think you're better with the relative version. It's not just more concise and easier to read, it's more robust (if you rename mything, or reorganize its structure, or embed this whole package inside a larger package…). But you may want to read the rationales for the two different features in PEP 328 to decide which one seems more compelling to you here.
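Concretely, the layout from the question would look something like this (the helper name is invented for the example):

# common/__init__.py -- the shared functions live here
def shared_helper():
    print("used by both ec2 and citrixlb")

# ec2/__init__.py -- pulls shared code from the sibling subpackage
from ..common import shared_helper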
One thing:
I wish to be able to run the __init__.py in citrixlb and ec2 as standalone scripts.
That, you can't do. Running modules inside a package as a top-level script is not supposed to work. Sometimes you get away with it. Once you're importing from a sibling or a parent, you definitely will not get away with it.
The right way to do it is either:
python -m mything.ec2 instead of python mything/ec2/__init__.py
Write a trivial ec2 script at the top level, that just does something like from mything.ec2 import main; main().
The latter is a common enough pattern that, if you're building a setuptools distribution, it can build the ec2 script for you automatically, and automatically make it still work even if ec2 ends up in /usr/local/bin while the mything package is in your site-packages. See console_scripts for more details.
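For reference, the console_scripts declaration would look roughly like this in setup.py (a sketch assuming the package is named mything and mything/ec2/__init__.py defines main()):

from setuptools import setup, find_packages

setup(
    name="mything",
    version="1.0",  # placeholder
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            "ec2 = mything.ec2:main",  # generates an extensionless ec2 script
        ],
    },
)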
I am creating a Python package with multiple modules. I want to make sure that when I import modules within the package that they are importing only from the package and not something outside the package that has the same name.
Is the correct way of doing this is to use relative imports? Will this interfere when I move my package to a different location on my machine (or gets installed wherever on a customer's machine)?
Modern relative imports (here's a reference) are package-relative and package-specific, so as long as the internal structure of your package does not change you can move the package as a whole around wherever you want.
While Joran Beasley's answer should work as well (though does not seem necessary in those older versions of Python where absolute imports aren't the default, as the old style of importing checked within the package's directory first), I personally don't really like modifying the import path like that when you don't have to, especially if you need to load some of those other packages or modules that your modules or packages now shadow.
A warning, however: these do require that the module in question is loaded as part of a package, or at least has its __name__ set to indicate a location in a package. Relative imports won't work for a module when __name__ == '__main__', so if you're writing a simple/casual script that uses another module in the same directory (and you want to make sure the script refers to the proper directory, since things won't work right if the current working directory is not the script's), you could do something like import os, sys; sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) (with thanks to https://stackoverflow.com/a/1432949/138772 for the idea). As noted in S.Lott's answer to the same question, this probably isn't something you'd want to do professionally or as part of a team project, but for something personal where you're just doing some menial task automation or the like it should be fine.
sys.path tells Python where to look for imports.
Add

import sys
sys.path.insert(0, ".")

to the top of your main Python script; this will ensure local packages are imported BEFORE builtin packages (although tbh I think this happens automagically).
If you really want to import only packages in your folder, do

import sys
sys.path = ["."]

However, I do not recommend this at all, as it will probably break lots of your stuff ...
Most IDEs (eclipse/pycharm/etc) provide mechanisms to set up the environment a project uses, including its paths.
Really, the best option is not to name packages the same as builtin packages or 3rd party modules that are installed on your system.
Failing that, distributing your code as a correctly bundled package should more than suffice.
In a Python system for which I develop, we usually have this module structure.
mymodule/
mymodule/mymodule/feature.py
mymodule/test/feature.py
This allows our little testing framework to easily import test/feature.py and run unit tests. However, we now have the need for some shell scripts (which are written in Python):
mymodule/
mymodule/scripts/yetanotherfeature.py
mymodule/test/yetanotherfeature.py
yetanotherfeature.py is installed by the module's Debian package into /usr/bin. But we obviously don't want the .py extension there. So, in order for the test framework to still be able to import the module, I have to do this symbolic link thingie:
mymodule/
mymodule/scripts/yetanotherfeature
mymodule/scripts/yetanotherfeature.py # -> mymodule/scripts/yetanotherfeature
mymodule/test/yetanotherfeature.py
Is it possible to import a module by filename in Python, or can you think of a more elegant solution for this?
The imp module is used for this:
daniel@purplehaze:/tmp/test$ cat mymodule
print "woho!"
daniel@purplehaze:/tmp/test$ cat test.py
import imp
imp.load_source("apanapansson", "mymodule")
daniel@purplehaze:/tmp/test$ python test.py
woho!
daniel@purplehaze:/tmp/test$
You could most likely use some trickery with import hooks, though I wouldn't recommend it. On the other hand, I would probably do it the other way around: have your .py scripts somewhere, and make .py-less symbolic links to the .py files. That way your library could be anywhere, you can run the tests from within by importing it normally (since it has the .py extension), and /usr/bin/yetanotherfeature points to it, so you can run it without the .py.
Edit: Nevermind this (at least the hooks part), the import imp solution looks very good to me :)
Check out the imp module:
http://docs.python.org/library/imp.html
This will allow you to load a module by filename. But I think your symbolic link is a more elegant solution.
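(On modern Python 3, where imp is deprecated and eventually removed, importlib can do the same job; passing an explicit loader means the missing .py extension doesn't matter:)

from importlib.machinery import SourceFileLoader
from importlib.util import spec_from_file_location, module_from_spec

# Load the extensionless file "mymodule" under the name used above.
loader = SourceFileLoader("apanapansson", "mymodule")
spec = spec_from_file_location("apanapansson", "mymodule", loader=loader)
mod = module_from_spec(spec)
spec.loader.exec_module(mod)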
Another option would be to use setuptools:
"...there’s no easy way to have a script’s filename match local conventions on both Windows and POSIX platforms. For another, you often have to create a separate file just for the “main” script, when your actual “main” is a function in a module somewhere... setuptools fixes all of these problems by automatically generating scripts for you with the correct extension, and on Windows it will even create an .exe file..."
https://pythonhosted.org/setuptools/setuptools.html#automatic-script-creation