I have a personal python library consisting of several modules of scientific programs that I use. These live in a directory with the structure:
root/__init__.py
root/module1/__init__.py
root/module1/someprog.py
root/module1/ (...)
root/module2/__init__.py
root/module2/someprog2.py
root/module2/somecython.pyx
root/module2/somecython.so
root/module2/somefortran.f
root/module2/somefortran.so
(...)
I am constantly making changes to these programs and adding new files. With my current setup at work, I share the same directory between several machines of different architectures. What I want is a way to use these packages from python on the different architectures. If the packages were all pure python, this would be no problem. But the issue is that I have several compiled binaries (as shown in the example) from Cython and from f2py.
Is there a clever way to repackage these binaries so that python on each system imports only the relevant binaries? I'd like to keep the code organised in the same directory.
Obviously the simplest way would be to duplicate the directory or create another directory of symlinks. But this would mean that when new files are created, I'd have to update the symlinks manually.
Has anyone bumped into a similar problem, or can suggest a more pythonic approach to this organisation problem?
Probably you should use setuptools/distribute. You can then define a setup.py that compiles all files for your current platform, copies them to an appropriate directory, and makes sure they are available on your sys.path.
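For example, a minimal setup.py sketch along those lines, assuming the layout from the question and building only the Cython extension (the distribution name and version are placeholders; the f2py-built Fortran extension would need its own build step, e.g. via numpy.distutils):

# A minimal sketch of a setup.py for the layout above; the package and file
# names come from the question, the distribution name/version are placeholders.
from setuptools import setup, find_packages, Extension
from Cython.Build import cythonize  # Cython must be available at build time

extensions = [
    Extension("root.module2.somecython", ["root/module2/somecython.pyx"]),
]

setup(
    name="root",        # placeholder distribution name
    version="0.1",      # placeholder version
    packages=find_packages(),
    ext_modules=cythonize(extensions),
)

Running python setup.py build_ext (optionally with --inplace) or install on each machine then produces extension binaries for that machine's architecture.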
You would do the following when you compile Python from source.
Pass the --exec-prefix flag with the architecture-dependent directory to ./configure.
For more info, ./configure --help will give you the following:
Installation directories:
  --prefix=PREFIX         install architecture-independent files in PREFIX
                          [/usr/local]
  --exec-prefix=EPREFIX   install architecture-dependent files in EPREFIX
                          [PREFIX]
Hope this helps :)
There is unfortunately no way to do this. A python package must reside entirely in one directory. PEP 382 proposed support for namespace packages that could be split across different directories, but it was rejected. (And in any case, those would be special packages.)
Given that python packages have to be in a single directory, it is not possible to mix compiled extension modules for different architectures. There are two ways to mitigate this problem:
Keep binary extensions in a separate directory, and have all the python packages in a common directory that can be shared between architectures. The separate directory of binary extensions can then be selected for each architecture with PYTHONPATH (a sketch of this follows after these options).
Keep a common directory with all the python files and extensions for different architectures. For each architecture, create a new directory with the package name. Then symlink all the python files and binaries in each of these directories. This will still allow a single place where the code lives, at the expense of having to create new symlinks for each new file.
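As a sketch of the first option, the shared code can add an architecture-specific directory to sys.path at startup; the directory layout and names below are assumptions, not part of the question:

# A minimal sketch of option 1; the directory layout and names are assumptions.
import os
import platform
import sys

SHARED_ROOT = os.path.expanduser("~/pylib")  # pure-python packages, shared by all machines
BINARY_ROOT = os.path.join(SHARED_ROOT, "_binaries",
                           "%s-%s" % (platform.system(), platform.machine()))

for path in (SHARED_ROOT, BINARY_ROOT):
    if path not in sys.path:
        sys.path.insert(0, path)

# On one machine BINARY_ROOT might be ~/pylib/_binaries/Linux-x86_64; compiled
# extensions placed there are the ones imported on that machine.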
The option suggested by Thorsten Krans is unfortunately not viable for this problem. Using distutils/setuptools/distribute still requires all the python source files to be installed in a directory for each architecture, negating the advantage of having them in a single directory. (This is not a finished package, but always a work in progress.)
Is there a way to convert a python package, i.e. a folder of python files, into a single file that can be copied and then directly imported into a python script without needing to run any extra shell commands? I know it is possible to zip all of the files and then unzip them from python when they are needed, but I'm hoping that there is a more elegant solution.
It's not totally clear what the question is. I could interpret it two ways.
If you are looking to manage the symbols from many modules in a more organized way:
You'll want to put an __init__.py file in your directory and make it a package. In it you can define the symbols for your package and create a graceful import packagename behavior. See the documentation on packages for details.
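For instance, a minimal __init__.py might just re-export the symbols you want at the package level (the submodule and function names here are hypothetical):

# packagename/__init__.py : a minimal sketch; submodule and symbol names are hypothetical
from .analysis import run_analysis
from .plotting import make_plot

__all__ = ["run_analysis", "make_plot"]

After that, import packagename gives callers packagename.run_analysis and packagename.make_plot without them needing to know the internal module layout.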
If you are looking to make your code portable to another environment:
One way or the other, the package needs to be accessible in whatever environment it is run in. That means it either needs to be installed in the python environment (likely using pip), copied into a location that is in a subdirectory relative to the running code, or in a directory that is listed in the PYTHONPATH environment variable.
The most straightforward way to package up code and make it portable is to use setuptools to create a portable package that can be installed into any python environment. The manual page for Packaging Projects gives the details of how to go about building a package archive, and optionally uploading it to PyPI for public distribution. If it is for private use, the resulting archive can be passed around without uploading it to the public repository.
I am writing a program in python to be sent to other people, who are running the same python version; however, there are some 3rd-party modules that need to be installed to use it.
Is there a way to compile it into a .pyc (I only say .pyc because it's a python compiled file) that has all the dependent modules inside it as well?
So they can run the program without needing to install the modules separately?
Edit:
Sorry if it wasn't clear, but I am aware of things such as cx_freeze etc. What I'm trying to get is just a single python file.
So they can just type "python myapp.py" and then it will run. No installation of anything. As if all the module code were in my .py file.
If you are on python 2.3 or later and your dependencies are pure python:
If you don't want to go the setuptools or distutils route, you can provide a zip file with the .pyc files for your code and all of its dependencies. You will have to do a little work to make any complex pathing inside the zip file available (if the dependencies are just lying around at the root of the zip, this is not necessary). Then just add the zip location to your path and it should work just as if the dependency files had been installed.
If your dependencies include .pyd files or other binary dependencies, you'll probably have to fall back on distutils.
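As a sketch of the zip approach, assuming the dependencies sit at the root of a hypothetical deps.zip next to your script:

# A minimal sketch; "deps.zip" and "somedependency" are hypothetical names.
import os
import sys

HERE = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, os.path.join(HERE, "deps.zip"))

import somedependency  # now loaded from inside the zip (zipimport, python 2.3+)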
You can simply include .pyc files for the libraries required, but no, a .pyc cannot work as a container for multiple files (unless you collect all the source into one .py file and then compile it).
It sounds like what you're after is the ability for your end users to run one command, e.g. install my_custom_package_and_all_required_dependencies, and have it assemble everything it needs.
This is a perfect use case for distutils, with which you can make manifests for your own code that link out to external dependencies. If your 3rd party modules are available publicly in a standard format (they should be, and if they're not, it's pretty easy to package them yourself), then this approach has the benefit of allowing you to very easily change what versions of 3rd party libraries your code runs against (see this section of the above linked doc). If you're dead set on packaging others' code with your own, you can always include the required files in the .egg you create with distutils.
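As a sketch of declaring those external dependencies, setuptools (rather than plain distutils) lets you list them with version constraints in install_requires; the project name, dependencies, and version pins below are hypothetical:

# A minimal sketch; the project name, dependencies, and version pins are hypothetical.
from setuptools import setup, find_packages

setup(
    name="myapp",
    version="1.0",
    packages=find_packages(),
    install_requires=[
        "requests>=2.0,<3.0",
        "numpy>=1.20",
    ],
)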
Two options:
build a package that will install the dependencies for them (I don't recommend this if the only dependencies are python packages that are installed with pip)
Use virtual environments. You use an existing python on their system, but the python modules are installed into the virtualenv (a sketch of this follows below).
or I suppose you could just punt, and create a shell script that installs them, and tell them to run it once before they run your stuff.
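As a sketch of the virtualenv option, the standard-library venv module (Python 3.3+; with_pip requires 3.4+) covers the same use case as the third-party virtualenv tool; the environment directory name is hypothetical:

# A minimal sketch using the standard-library venv module; "appenv" is a
# hypothetical environment directory name.
import venv

venv.create("appenv", with_pip=True)
# The user then installs the dependencies into it and runs the app with it, e.g.
#   appenv/bin/pip install somedependency
#   appenv/bin/python myapp.py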
For one of my projects at work I needed to create a standalone python installation (from source). However, the complete directory takes ~90MB of disk space, not much, but too much to be replicated over and over.
Which files can I remove from the custom python installation directory?
There is a large "test" folder (./lib/python2.7/test), everything is precompiled (but 99% of modules will not be used in this project), libpython2.7.a is placed twice (./lib and .lib/python2.7/config), etc.
freeze.py should help you - it's part of the standard installation.
See: Python: Where is freeze.py?
and: http://wiki.python.org/moin/Freeze
and the README: http://svn.python.org/projects/python/trunk/Tools/freeze/README
It tries to only include what is required.
If I place my project in /usr/bin/
will my python interpreter generate bytecode? If so, where does it put it, given that I do not have write permission in that folder? Does it cache it in a temp file?
If not, is there a performance loss for me putting the project there?
I have packaged this up as a .deb file that is installed from my Ubuntu PPA, so the obvious place to install the project is in /usr/bin/,
but if I don't generate bytecode by putting it there, what should I do? Can I give the project write permission if it installs on another person's machine? That would seem to be a security risk.
There are surely lots of python projects installed in Ubuntu ( and obviously other distros ) how do they deal with this?
Thanks
Regarding the script in /usr/bin, if you execute your script as a user that doesn't have permissions to write in /usr/bin, then the .pyc files won't be created and, as far as I know, there isn't any other caching mechanism.
This means that your file will be byte-compiled by the interpreter every time, so yes, there will be a performance loss. However, that loss is probably not noticeable. Note that when a source file is updated, the compiled file is updated automatically without the user noticing it (at least most of the time).
The common practice I've seen in Ubuntu is to use small scripts in /usr/bin, without even the .py extension. Those scripts are byte compiled very fast, so you don't need to worry about that. They just import a library and call some kind of library.main.Application().run() method, and that's all.
Note that the library is installed in a different path and that all library files are byte-compiled for the different python versions. If that's not the case in your package, then you have to review your setup.py and your debian files, since that's not the way it should be.
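A minimal sketch of such a launcher, with hypothetical library and class names:

#!/usr/bin/python
# Installed as e.g. /usr/bin/myapp (no .py extension); "myapp_lib" and
# "Application" are hypothetical names. The real library lives in site-packages
# and is byte-compiled when the package is built/installed.
import sys

from myapp_lib.main import Application

if __name__ == "__main__":
    sys.exit(Application().run())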
.pyc/.pyo files are not generated for scripts that are run directly. Python modules placed where Python modules are normally expected and packaged up have the .pyc/.pyo files generated at either build time or install time, and so aren't the end user's problem.
I've got a number of scripts that use common definitions. How do I split them into multiple files? Furthermore, the application cannot be installed in any way in my scenario; it must be possible to have an arbitrary number of versions concurrently running, and it must work without superuser rights. Solutions I've come up with are:
- Duplicate code in every script. Messy, and probably the worst scheme.
- Put all scripts and common code in a single directory, and use from . import to load them. The downside of this approach is that I'd like to put my libraries in a different directory than the applications.
- Put common code in its own directory, write an __init__.py that imports all submodules, and finally use from . import to load them. Keeps code organized, but it's a little bit of overhead to maintain __init__.py and qualify names.
- Add the library directory to sys.path and import. I tend towards this, but I'm not sure whether fiddling with sys.path is nice code.
- Load using execfile (exec in Python 3). Combines the advantages of the previous two approaches: only one line per module is needed, and I can use a dedicated directory. On the other hand, this evades the python module concept and pollutes the global namespace.
- Write and install a module using distutils. This installs the library for all python scripts, needs superuser rights, and impacts other applications, and is hence not applicable in my case.
What is the best method?
Adding to sys.path (usually using site.addsitedir) is quite common and not particularly frowned upon. Certainly you will want your shared code to live in modules somewhere convenient.
If you are using Python 2.6+ there's already a user-level modules folder you can use without having to add to sys.path or PYTHONPATH. It's ~/.local/lib/python2.6/site-packages on Unix-likes - see PEP 370 for more information.
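A minimal sketch of the addsitedir approach, with hypothetical directory and module names:

# A minimal sketch; the directory and module names are hypothetical.
import os
import site

site.addsitedir(os.path.expanduser("~/myproject/lib"))  # also processes any .pth files there

import commondefs  # resolved from the directory added above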
You can set the PYTHONPATH environment variable to the directory where your library files are located. This adds that path to the library search path and you can use a normal import to import them.
If you have multiple environments which have various combinations of dependencies, a good solution is to use virtualenv to create sandboxed Python environments, each with their own set of installed packages. Each environment will function in the same way as a system-wide Python site-packages setup, but no superuser rights are required to create local environments.
Google has plenty of info, but this looks like a pretty good starting point.
Another alternative to manually adding the path to sys.path is to use the environment variable PYTHONPATH.
Also, distutils allows you to specify a custom installation directory using
python setup.py install --home=/my/dir
However, neither of these may be practical if you need to have multiple versions running simultaneously with the same module names. In that case you're probably best off modifying sys.path.
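For the concurrent-versions case, one sketch is to have each versioned checkout carry its own library directory and add it relative to the running script; the "lib" layout and module name below are hypothetical:

# A minimal sketch; the "lib" layout and "commondefs" module name are hypothetical.
import os
import sys

HERE = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, os.path.join(HERE, "lib"))  # this version's private library directory

import commondefs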
I've used the third approach (add the directories to sys.path) for more than one project, and I think it's a valid approach.