Users should install our Python package via pip, or clone it from a GitHub repo and install it from source. Users should not run import Foo from within the source tree directory, for a number of reasons, e.g. the C extensions are missing (numpy has the same issue: read here). So we want to check whether the user is running import Foo from within the source tree, but how do we do this cleanly, efficiently, and robustly, with support for both Python 3 and 2?
Edit: Note that the source tree here is defined as the place where the code is downloaded to (e.g. via git or from the source archive), in contrast with the installation directory where the code is installed to.
We considered the following:
Check for setup.py, or another file like PKG-INFO, which should only be present in the source. It's not that elegant, and checking for the presence of a file is not very cheap, given that this check will happen every time someone imports Foo. Also, there is nothing to stop someone from putting a setup.py outside the source tree, in their lib/python3.X/site-packages/ directory or similar.
Parse the contents of setup.py for the package name, but this also adds overhead and is not that clean to parse.
Create a dummy flag file that is only present in the source tree.
Some clever, but likely overcomplicated and error-prone, ideas like modifying Foo/__init__.py during installation to note that we are now outside of the source tree.
Since you mention numpy in your comments, and wanting to do it the way they do but not fully understanding how, I figured I would break that down and see if you could implement a similar process.
__init__.py
The error you are seeking starts here, which is what you linked to in your comments and answers, so you already know that. It's just attempting an import of __config__.py and failing if it isn't there or can't be imported.
try:
    from numpy.__config__ import show as show_config
except ImportError:
    msg = """Error importing numpy: you should not try to import numpy from
    its source directory; please exit the numpy source tree, and relaunch
    your python interpreter from there."""
    raise ImportError(msg)
So where does the __config__.py file come from then and how does that help? Let's follow below...
setup.py
When the package is installed, setup is run, and it in turn performs some configuration actions. This is essentially what ensures that the package is properly installed rather than being run from the download directory (which I think is what you want to ensure).
The key here is this line:
config.make_config_py() # installs __config__.py
misc_util.py
That is imported from distutils/misc_util.py which we can follow all the way down to here.
def make_config_py(self, name='__config__'):
    """Generate package __config__.py file containing system_info
    information used during building the package.

    This file is installed to the
    package installation directory.
    """
    self.py_modules.append((self.name, name, generate_config_py))
That is then run here, which writes out the __config__.py file with some system information and the show() function.
Summary
The import of __config__.py is attempted and fails if setup.py wasn't run, which generates the error you want to raise; running setup.py is what triggers that file to be properly created. That ensures not only that a file check is being done, but also that the file only exists in the installation directory. There is still the overhead of importing an additional file on every import, but no matter what you do you're adding some amount of overhead by making this check in the first place.
Suggestions
I think that you could implement a much lighter weight version of what numpy is doing while accomplishing the same thing.
Drop the distutils helper and create the checked file from within your setup.py as part of the standard install. It would only exist in the installation directory after installation, and never elsewhere, unless a user faked it (in which case they could probably get around just about anything you try).
As an alternative (without knowing your application and what your setup file is doing), possibly you have a function that is normally imported anyway, one that isn't key to running the application but is good to have available (in numpy's case the functions report information about the installation, like version()). Instead of keeping those functions where you put them now, you make them part of this generated file. Then you are at least loading something that you would otherwise load anyway from somewhere else.
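For illustration, here is a minimal sketch of that lighter-weight idea. This is a setuptools-based approximation rather than what numpy actually does, and the package name Foo and the generated module name _install_check.py are placeholders. First, setup.py writes the marker module into the build tree, so it only ever exists in a built/installed copy and never in a fresh source checkout:

import os
from setuptools import setup
from setuptools.command.build_py import build_py

class BuildPy(build_py):
    def run(self):
        build_py.run(self)
        # Write the marker module into the build tree; the install step copies it
        # from there into the installation directory.
        target = os.path.join(self.build_lib, "Foo", "_install_check.py")
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w") as f:
            f.write("# generated at build time\nINSTALLED = True\n")

setup(name="Foo", packages=["Foo"], cmdclass={"build_py": BuildPy})

Then Foo/__init__.py attempts the import, exactly like numpy does with __config__.py, and raises a helpful error if the marker is missing:

try:
    from Foo._install_check import INSTALLED  # only exists in an installed copy
except ImportError:
    raise ImportError(
        "Error importing Foo: you should not import Foo from within its "
        "source tree; please exit the source directory and relaunch Python."
    )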
Using this method you are either importing something no matter what, which has some overhead, or raising the error. As far as ways to raise an error because the user isn't working out of the installation directory go, I think it's a pretty clean and straightforward one. No matter what method you use, you have some overhead from using it, so I would focus on keeping the check low-overhead, simple, and unlikely to cause errors.
I wouldn't do something that is complicated like parsing the setup file or modifying necessary files like __init__.py somewhere. I think you are right that those methods would be more error prone.
Checking whether setup.py exists could work, but I would consider it less clean than attempting an import, which is already optimized as a standard Python mechanism. They accomplish similar things, but I think implementing it numpy-style is going to be more straightforward.
Related
I have an .so file called tissue-classifier.cpython-37m-x86_64-linux-gnu.so from an external library that I want to import so that I can extend it in one of my local classes. Since I am extending a class, I need to import it as an extension type using cimport and I am wondering if this is even possible. If I use a normal import statement then I will be left with a Python compiled version which cannot be used to extend a cdef class in my current file.
When I try to cimport the tissue_classifier file, it gives me the error that the tissue_classifier.pxd file could not be found, which makes sense since it is in .so format. Sorry if this is a dumb question; I just haven't been able to figure this out for a while.
No, a *.so-file cannot be cimported.
If one has a C/C++ background, then the pyx/pxd/so business is probably easiest to understand using the following model:
The resulting extension (*.so file) corresponds to the final artifact in the C/C++ world, which could be an executable, a shared object (*.so), or a library/object-file collection. If you just run the resulting program, it is all you need. For example, you can use (and probably do use) a CPython interpreter without having built it or having its source code. By analogy, if you have a binary extension (*.so) you can import and use it without having to build it (or even having the corresponding pyx files or a compiler on your machine) - that is what is provided by a wheel.
*.pyx corresponds to c/cpp files, which contain the definitions of the functionality. These files are needed if you want to build the resulting artifact from source. In the C/C++ world this build process would be triggered by make or similar. pyx files are needed if you install the package via python setup.py install - which corresponds to calling make.
*.pxd corresponds to headers (h/hpp files): it describes the functionality in the resulting so files, so it can be reused. For example, just having a CPython interpreter isn't enough to build extensions - one has to install the dev version so that the includes (Python.h & Co.) are also present on the machine.
So what can be done?
First possibility:
If the authors of the package consider the *.pxd files part of the public API, then they can put the corresponding pxd files next to the *.so files in the installation, so the C interface of the module can be used/extended.
If they don't put the pxd file into the installation, chances are high that this C interface is an implementation detail and you should not be using it, as it may change without notice in the future.
However, it is possible to take the risk and copy the necessary pxd files into the installation by hand, first making sure that it is the right pxd version (i.e. the same one with which the so files in the installation were built).
Second possibility:
The easiest way to ensure that the right pxd version is used is to build the package from source, i.e.
downloading the right version from GitHub (master or the latest release)
calling python setup.py install or whatever the README tells you to do
Now, instead of copying the pxd files into the installation, one can point the build at the downloaded package source, either via the include_path argument of the cythonize function or by adding it to sys.path.
Alternatively, as @BeforeFlight has pointed out in the comments, one could use python setup.py develop (or pip install -e on the same folder so it can be uninstalled), and because it creates a link instead of copying data, the pxd files will be found.
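For completeness, a minimal sketch of the cythonize-based variant mentioned above; the extension name, the pyx file, and the path to the downloaded source are placeholders rather than anything from a real project:

from setuptools import Extension, setup
from Cython.Build import cythonize

ext = Extension("my_wrapper", sources=["my_wrapper.pyx"])

setup(
    ext_modules=cythonize(
        [ext],
        # directory of the downloaded package source, so its *.pxd files are found
        include_path=["/path/to/downloaded/package/source"],
    )
)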
The above solutions will help to build the module; distributing it, however, is a completely different story.
I'm cleaning up packaging for a python project I didn't create. Currently, it does some explicitly unsupported magic to get its dependencies from a requirements.txt file. The file looks like it may have been generated by pip freeze; there are fixed versions for everything, and many apparently-extraneous packages listed. I am pretty sure some of these aren't real dependencies, but I don't know which ones.
Given just the source tree, how would I figure out, from scratch, what dependencies ought to be included in install_requires?
As a first stab, I'm grepping for non-stdlib import statements. I hope there's a better way.
There's no way to do this perfectly, because Python is too flexible.
But it's usually possible to do it well enough.
You can start with the stdlib's modulefinder.
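For example, here is a minimal sketch of using it; the script name your_script.py is a placeholder:

from modulefinder import ModuleFinder

finder = ModuleFinder()
finder.run_script("your_script.py")

# Every module the script would load, stdlib included.
for name in sorted(finder.modules):
    print(name)

# Imports that could not be resolved (often a hint at missing third-party deps).
print("Not found:", list(finder.badmodules))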
Beyond that, a number of projects—mostly projects designed for building binary executables, installers, etc. for Python apps—have come up with heuristics that go even farther.
These usually work. And, when they fail, you usually immediately spot it on your first test. Even if they aren't sufficient, they're at the very least good sample code. Here are a few off the top of my head:
cx_Freeze
py2exe
py2app
pyInstaller
In case you're wondering why it's impossible:
Even forgetting about the problem of dependencies in C extension modules, Python is just too flexible to catch all the ways you could import a module via static analysis.
Sure, you'd have to be dealing with code written by someone crazy enough to use explicitly unsupported magic for no good reason… but if you were, there's nothing to stop someone from writing this instead of import lxml:1
with open('picture.jpg', encoding='cp500') as f:
    getattr(sys.modules[11], codecs.encode('vzcbeg_zbqhyr', 'rot13'))(f.read().strip())
In reality, things aren't going to be that bad. But they could easily be too bad for rg import to be sufficient.
You could try to detect all the imports dynamically with a simple import hook, but that's only guaranteed to work if you can exercise 100% of the code paths.
1. Of course this only works if importlib was the 12th module loaded, and if picture.jpg is not a JPEG image but a textfile whose contents are, in EBCDIC, lxml\n
I've had great results with pipreqs, which will automatically generate a requirements.txt file from your source code.
pipreqs /home/project/location
Successfully saved requirements file in /home/project/location/requirements.txt
I wrote a tool, realreq, specifically for this issue.
You can install it using pip: python3 -m pip install realreq. Using it is as easy as:
realreq -s /path/to/your/source
It will then gather the dependencies actually used in your source code.
I mean, the most effective way would honestly be to go through the code line by line and determine what packages may not be needed, what packages need updates, etc. I know Python 2 and 3 both have ModuleFinder, which finds all the modules a script needs to successfully compile and run, but I've never used it before, so I'm not sure how effective it is, especially for what you're doing. However, if you're interested, I'll attach the link below.
https://docs.python.org/3/library/modulefinder.html
So recently I found out about the NEAT algorithm and wanted to give it a try using NEAT-Python (not sure if this is even the correct source :| ). I created my virtual environment, activated it, and installed neat-python using pip in the VE. When I then tried to run one of the examples from their GitHub page, it threw an error like this:
ImportError: No module named visualize
So I checked my source files, and the installed neat-python actually doesn't include the visualize.py script; however, it is in their GitHub repository. I then tried to add it myself by downloading just the visualize.py script, dragging it into my VE, and adding it to all the text files that NEAT brought with it, like installed-files.txt, etc. However, it still threw the same error.
I'm still fairly new to VE and GitHub so please don't be too hard on me :] thanks in advance.
-Jorge
I think you could simply copy visualize.py into the same directory as the script you are running.
If you wanted it in your lib/site-packages directory so you could import it with the neat module:
Copy visualize.py into lib/site-packages/neat/ and modify __init__.py to add the line import neat.visualize as visualize. Delete the __pycache__ directory. Make sure you have the required modules installed: NumPy, GraphViz, and Matplotlib. When you've done the above, you should be able to import neat and access neat.visualize.
I don't recommend doing this though for several reasons:
Say you wanted to update your neat module. Your visualize.py file is technically not part of the module. So it wouldn't be updated along with your neat module.
The visualize.py file seems to be written in the context of the examples, as opposed to being for general use with the module, so contextually it doesn't belong there.
At some point in the future, you might also forget that this wasn't a part of the module, but your code acts as if it was part of the API. So your code will break in some other neat installation.
This might be a more broad question, and more related to understanding Python's nature and probably good programming practices in general.
I have a file, called util.py. It has a lot of different small functions I've collected over the past few months that are useful when doing various machine learning tasks.
My thinking is this: I'd like to continue adding important functions to this script as I go. As such, I will want to use import util often, now and in the future, in many unrelated projects.
But Python seems to feel like I should only be able to access the code in this file if it lives in my current directory, even if the functions in this file are useful for scripts in different directories. I sense some reason behind the way that works that I don't fully grasp; to me, it seems like I'll be forced to make unnecessary copies often.
If I have to create a new copy of util.py every time I'm working from within a new directory, on a different project, it won't be long until I have many different versions/iterations of this file, scattered all over my hard drive, in various states. I don't desire this degree of modularity in my programming -- for the sake of simplicity, repeatability, and clarity, I want only one file in only one location, accessible to many projects.
The question in a nutshell: What is the argument for Python to seemingly frown on importing from different directories?
If your util.py file contains functions you're using in a lot of different projects, then it's actually a library, and you should package it as such so you can install it in any Python environment with a single line (python setup.py install), and update it if required (Python's packaging ecosystem has several features to track and update library versions).
An added benefit is that right now, if you're doing what the other answers suggested, you have to remember to manually put util.py on your PYTHONPATH (the "dirty" way). If you try to run one of your programs and you haven't done that, you'll get a cryptic ImportError that doesn't explain much: is it a missing dependency? A typo in the program?
Now think about what happens if someone other than you tries to run the program(s) and gets those error messages.
If you have a library, on the other hand, trying to set up your program will either complain in clear, understandable language that the library is missing or out of date, or (if you've taken the appropriate steps) automatically download and install it so things are ready to roll.
On a related topic, having a file/module/namespace called "util" is a sign of bad design. What are these utilities for? It's the programming equivalent of a "miscellaneous" folder: eventually, everything will end up in it and you'll have no way to know what it contains other than opening it and reading it all.
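To make the packaging suggestion concrete, here is a minimal sketch of a setup.py for util.py; the project name mytools is a placeholder (and ideally you'd also pick a more descriptive module name than util, per the point above):

from setuptools import setup

setup(
    name="mytools",        # placeholder project name
    version="0.1.0",
    py_modules=["util"],   # ships util.py as a top-level importable module
)

With this file next to util.py, running python setup.py install (or pip install .) makes import util work from any project, without copying the file around.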
Another way is to add the directory/you/want/to/import/from to the path from within the scripts that need it.
You should have a file __init__.py in the same folder where utils.py lives, to tell Python to treat the folder as a package. The __init__.py file may be empty or not; you can define other things in there.
Example:
/home/marcos/python/proj1/
    __init__.py
    utils.py
/home/marcos/school_projects/final_assignment/
    my_script.py
And then inside my_script.py
import sys
sys.path.append('/home/marcos/python/')
from proj1 import utils
MAX_HEIGHT = utils.SOME_CONSTANT
a_value = utils.some_function()
First, define an environment variable. If you are using bash, for example, then put the following in the appropriate startup file:
export PYTHONPATH=/path/to/my/python/utilities
Now put your util.py and any of your other common modules or packages in that directory. You can then import util from anywhere, and Python will find it.
I need to ship a collection of Python programs that use multiple packages stored in a local Library directory: the goal is to avoid having users install packages before using my programs (the packages are shipped in the Library directory). What is the best way of importing the packages contained in Library?
I tried three methods, but none of them appears perfect: is there a simpler and robust method? or is one of these methods the best one can do?
In the first method, the Library folder is simply added to the library path:
import sys
import os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'Library'))
import package_from_Library
The Library folder is put at the beginning so that the packages shipped with my programs have priority over the same modules installed by the user (this way I am sure that they have the correct version to work with my programs). This method also works when the Library folder is not in the current directory, which is good. However, this approach has drawbacks. Each and every one of my programs adds a copy of the same path to sys.path, which is a waste. In addition, all programs must contain the same three path-modifying lines, which goes against the Don't Repeat Yourself principle.
An improvement on the above problems consists of adding the Library path only once, by doing it in an imported module:
# In module add_Library_path:
import os
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'Library'))
and then to use, in each of my programs:
import add_Library_path
import package_from_Library
This way, thanks to the module caching mechanism of CPython, the module add_Library_path is only run once, and the Library path is added only once to sys.path. However, a drawback of this approach is that import add_Library_path has an invisible side effect and that the order of the imports matters: this makes the code less legible and more fragile. Also, this forces my distribution of programs to include an add_Library_path.py module that users will not use.
Python modules from Library can also be imported by making it a package (empty __init__.py file stored inside), which allows one to do:
from Library import module_from_Library
However, this breaks for packages in Library, as they might do something like from xlutils.filter import …, which fails because xlutils is not found in sys.path. So, this method works, but only for modules in Library, not packages.
All these methods have some drawback.
Is there a better way of shipping programs with a collection of packages (that they use) stored in a local Library directory? or is one of the methods above (method 1?) the best one can do?
PS: In my case, all the packages from Library are pure Python packages, but a more general solution that works for any operating system is best.
PPS: The goal is that the user be able to use my programs without having to install anything (beyond copying the directory I ship them regularly), like in the examples above.
PPPS: More precisely, the goal is to have the flexibility of easily updating both my collection of programs and their associated third-party packages from Library by having my users do a simple copy of a directory containing my programs and the Library folder of "hidden" third-party packages. (I do frequent updates, so I prefer not forcing the users to update their Python distribution too.)
Messing around with sys.path leads to pain... The modern package template and Distribute contain a vast array of information and were in part set up to solve your problem.
What I would do is set up setup.py to install all your packages to a specific site-packages location, or, if you can, to the system's site-packages. In the former case, the local site-packages would then be added to the PYTHONPATH of the system/user. In the latter case, nothing needs to change.
You could use a batch file to set the Python path as well, or change the Python executable to point to a shell script that sets a modified PYTHONPATH and then executes the Python interpreter. The latter, of course, means that you have to have access to the user's machine, which you do not. However, if your users only run scripts and do not import your own libraries, you could use your own wrapper for scripts:
#!/path/to/my/python
And the /path/to/my/python script would be something like:
#!/bin/sh
PYTHONPATH=/whatever/lib/path:$PYTHONPATH /usr/bin/python $*
I think you should have a look at path import hooks, which allow you to modify the behaviour of Python when searching for modules.
For example, you could try to do something like KDE's script engine does for Python plugins [1].
It adds a special token to sys.path (like "<plasmaXXXXXX>", with XXXXXX being a random number just to avoid name collisions), and then, when Python tries to import modules and can't find them in the other paths, it calls your importer, which can deal with it.
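As a rough illustration (this is not the KDE code; the token name and the Library location are made up for the example), a path-entry hook keyed on a magic token could look like this:

import importlib.machinery
import os
import sys

TOKEN = "<mylibrary>"
LIBRARY_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "Library")

def library_hook(path_entry):
    # Only handle our magic token; any other sys.path entry is not ours.
    if path_entry != TOKEN:
        raise ImportError
    # Delegate the actual lookup to a standard FileFinder rooted at Library/.
    return importlib.machinery.FileFinder(
        LIBRARY_DIR,
        (importlib.machinery.SourceFileLoader, [".py"]),
    )

sys.path_hooks.append(library_hook)
sys.path.append(TOKEN)  # pure-Python modules and packages under Library/ become importable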
A simpler alternative is to have a main script used as a launcher, which simply adds the path to sys.path and executes the target file (so that you can safely avoid putting the sys.path.append(...) line in every file).
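A minimal sketch of such a launcher (the target name my_program.py is a placeholder):

import os
import runpy
import sys

here = os.path.dirname(os.path.abspath(__file__))
# Put the shipped Library first so it wins over any user-installed versions.
sys.path.insert(0, os.path.join(here, "Library"))

# Run the real program as if it had been invoked directly.
runpy.run_path(os.path.join(here, "my_program.py"), run_name="__main__")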
Yet another alternative, which works on Python 2.6+, would be to install the library under the per-user site-packages directory.
[1] You can find the source code under /usr/share/kde4/apps/plasma_scriptengine_python in a Linux installation with KDE.