Python's "make install" still references the source files - python

For testing my libraries on multiple Python versions I have a single virtual environment that I install them into, and reference them with their complete name/version (i.e. python3.7). Recently I noticed that sys.path is still referencing the source library instead of the copied library (i.e. /source/python/... instead of /source/virtualenv/lib/python3.7/...)
I've tried make install instead of make altinstall 1, I've searched for answers -- so far nothing has helped.
How do I fix this?
1 PSA: If you use make (alt)install and you don't want to clobber your system Python, make sure and use
./configure --prefix /path/to/install_to/here

TL;DR Remove the pyvenv.cfg in the virtualenv root directory.
The issue is the interaction with the virtualenv, and not make. Somewhere in Python's startup it checks to see if it is running in a virtualenv and, if it is, uses the libraries from it's original installation (and I had created the virtualenv from my source copy).
The solution is to remove the pyvenv.cfg file in the root of the virtualenv. This will completely isolate the virtualenv from the system (so no sharing of site-packages nor dist-packages), but is exactly what I wanted for my purposes.

Related

Python, site-packages partially on network file system?

CentOS 7 system, Python 2.7. The OS's installed python has directory
/usr/lib/python2.7/site-packages
and that is where a
python setup.py install
command would install a package. On a computing cluster I would like to install some packages so that they are referenced from that directory but which actually reside in an NFS served directory here:
/usr/common/lib/python2.7/site-packages
That is, I do not want to have to run setup.py on each of the cluster nodes to do a local install on each, duplicating the package on every machine. The packages already installed locally must not be affected, some of those are used by the OS's commands. Also the local ones must work even if the network is down for some reason. I am not trying to set up a virtual environment, I am only trying to place a common set of packages in a different directory in such a way that the OS supplied python sees them.
It isn't clear to me what is the best way to do this. It seems like such a common problem that there must be a standard or preferred way of doing this, and if possible, that is the method I would like to use.
This command
/usr/bin/python setup.py install --prefix=/usr/common
would probably install into the target directory. However the "python" command on the cluster nodes will not know this package is present, and there is no "network" python program that corresponds to the shared site-packages.
After the network install one could make symlinks from the local to the shared directory for each of the files created. That would be acceptable, assuming that is sufficient.
It looks like the PYTHONPATH environmental variable might also work here, although I'm unclear about what it expects
for "path" (full path to site-packages, or just the /usr/common part.)
EDIT: This does seem to work as needed, at least for the test case. The software package in question was installed using --prefix, as above. PYTHONPATH was not previously defined.
export PYTHONPATH=/usr/common/lib/python2.7/site-packages
python $PATH_TO_START_SCRIPT/start.py
ran correctly.
Thanks.

Moving Anaconda installation from one user account to another

I apologize if this is not the correct site for this. If it is not, please let me know.
Here's some background on what I am attempting. We are working on a series of chat bots that will go into production. Each of them will run on a environment in Anaconda. However, our setup uses tensorflow, which uses gcc to be compiled, and compliance has banned compilers from production. In addition, compliance rules also frown on us using pip or conda install in production.
As a way to get around this, I'm trying to tar the Anaconda 3 folder and move it into prod, with all dependencies already compiled and installed. However, the accounts between environments have different names, so this requires me to go into the bin folder (at the very least; I'm sure I will need to change them in the lib and pckg folders as well) and use sed -i to rename the hard coded paths to point from \home\<dev account>\anaconda to \home\<prod account>\anaconda, and while this seems to work, its also a good way to mangle my installation.
My questions are as follows:
Is there any good way to transfer anaconda from one user to another, without having to use sed -i on these paths? I've already read that Anaconda itself does not support this, but I would like your input.
Is there any way for me to install anaconda in dev so the scripts in it are either hard coded to use the production account name in their paths, or to use ~.
If I must continue to use sed, is there anything critical I should be aware of? For example, when I use grep <dev account> *, I will some files listed as binary file matches. DO I need to do anything special to change these?
And once again, I am well aware that I should just create a new Anaconda installation on the production machine, but that is simply not an option.
Edit:
So far, I've changed the conda.sh and conda.csh files in /etc, as well as the conda, activate, and deactivate files in the root bin. As such, I'm able to activate and deactivate my environment on the new user account. Also, I've changed the files in the bin folder under the bot environment. Right now, I'm trying to train the bot to test if this works, but it keeps failing and stating that a custom action does not exist in the the list. I don't think that is related to this, though.
Edit2:
I've confirmed that the error I was getting was not related to this. In order to get the bot to work properly with a ported version of Anaconda, all I had to change was the the conda.sh and conda.csh files in /etc so their paths to python use ~, do the same for the activate and deactivate files in /bin, and change the shebang line in the conda file in /bin to use the actual account name. This leaves every other file in /bin and lib still using the old account name in their shebang lines and other variable that use the path, and yet the bots work as expected. By all rights, I don't think this should work, but it does.
Anaconda is touchy about path names. They're obviously inserted into scripts, but they may be inserted into binaries as well. Some approaches that come to mind are:
Use Docker images in production. When building the image:
Install compilers as needed.
Build your stuff.
Uninstall the compilers and other stuff not needed at runtime.
Squash the image into a single layer.
This makes sure that the uninstalled stuff is actually gone.
Install Anaconda into the directory \home\<prod account>\anaconda on the development or build systems as well. Even though accounts are different, there should be a way to create a user-writeable directory in the same location.
Even better: Install Anaconda into a directory \opt\anaconda in all environments. Or some other directory that does not contain a username.
If you cannot get a directory outside of the user home, negotiate for a symlink or junction (mklink.exe /d or /j) at a fixed path \opt\anaconda that points into the user home.
If necessary, play it from the QA angle: Differing directory paths in production, as compared to all other environments, introduce a risk for bugs that can only be detected and reproduced in production. The QA or operations team should mandate that all applications use fixed paths everywhere, rather than make an exception for yours ;-)
Build inside a Docker container using directory \home\<prod account>\anaconda, then export an archive and run on the production system without Docker.
It's generally a good idea to build inside a reproducible Docker environment, even if you can get a fixed path without an account name in it.
Bundle your whole application as a pre-compiled Anaconda package, so that it can be installed without compilers.
That doesn't really address your problem though, because even conda install is frowned upon in production. But it could simplify building Docker images without squashing.
I've been building Anaconda environments inside Docker and running them on bare metal in production, too. But we've always made sure that the paths are identical across environments. I found mangling the paths too scary to even try. Life has become much simpler when we switched to Docker images everywhere. But if you have to keep using sed... Good Luck :-)
This is probably what you need : pip2pi.
This only works for pip compatible packages.
As I understand you need to move your whole setup as previously compiled as .tar.gz file, then here are a few things you could try:
Create a requirements.txt. These packages can help :
a. pipreqs
$ pipreqs /home/project/location
Successfully saved requirements file in /home/project/location/requirements.txt
b. snakefood.
Then, install pip2pi
$ pip install pip2pi
$ pip2tgz packages/ foo==1.2
...
$ ls packages/
foo-1.2.tar.gz
bar-0.8.tar.gz
pip2tgz passes package arguments directly to pip, so packages can be specified in any format that pip recognises:
$ cat requirements.txt
foo==1.2
http://example.com/baz-0.3.tar.gz
$ pip2tgz packages/ -r requirements.txt bam-2.3/
...
$ ls packages/
foo-1.2.tar.gz
bar-0.8.tar.gz
baz-0.3.tar.gz
bam-2.3.tar.gz
After getting all .tar.gz files, .tar.gz files can be turned into PyPI-compatible "simple" package index using the dir2pi command:
$ ls packages/
bar-0.8.tar.gz
baz-0.3.tar.gz
foo-1.2.tar.gz
$ dir2pi packages/
$ find packages/
packages/
packages/bar-0.8.tar.gz
packages/baz-0.3.tar.gz
packages/foo-1.2.tar.gz
packages/simple
packages/simple/bar
packages/simple/bar/bar-0.8.tar.gz
packages/simple/baz
packages/simple/baz/baz-0.3.tar.gz
packages/simple/foo
packages/simple/foo/foo-1.2.tar.gz
but they may be inserted into binaries as well
I can confirm that some packages have hard-coded the absolute path (including username) into the compiled binary. But if you restrict usernames to have the same length, you can apply sed on both binary and text files to make almost everything work as perfect.
On the other hand, if you copy the entire folder and use sed to replace usernames on only text files, you can run most of the installed packages. However, operations involving run-time compilation might fail, one example is installing a new package that requires compilation during installation.

How to add QuantLib to a virtualenv (ubuntu)

I am using pydev and a virtualenv (which has already been set up successfully). How do you add quantlib (and for that matter any python wrapper plus its C++ native library) to a virtualenv?
I successfully built quantlib and the quantlib-SWIG from source as described here. I notice that after the boost build, //usr/local/lib contains libQuantLib.* files which are probably the native libs.
I then tried copying libQuantLib.* to my virtualenv/lib/python2.7/site-packages, as described here but eclipse still complains about unresolved imports (at this point I am also externally referencing //usr/local/lib/QuantLib-SWIG-1.4/Python/build/lib.linux-x86_64-2.7/QuantLib folder). I am not sure if I had this correctly working.
I have seen this solution, but I really want everything contained in the virtualenv - both the python wrapper and C++ libraries, so everything is resolved when I set the project's pydev interpreter as my virtualenv.
I am unsure what best practices are here.
I'm not familiar with the way the virtualenv is set up. However: from the fact that your Python modules are in virtualenv/lib/python2.7/site-packages, I'd guess that the native libraries should go in virtualenv/lib. The correct way to have everything set up there, though, would be to tell the build machinery where you want the library; in your case (and assuming my guess above is correct) you'd do it by building QuantLib with:
./configure --prefix=/path/to/virtualenv
make
make install
where /path/to/virtualenv is the path to your virtualenv, including the virtualenv folder (but not lib). This will put header files and native libraries in the correct place in the virtualenv. After this, build QuantLib-SWIG using the QuantLib libraries you just installed: I think the easiest way is to do it from within the virtualenv (that is, using the Python interpreter inside it). Activate the env, enter the QuantLib-SWIG/Python directory, and run:
export PATH=/path/to/virtualenv/bin:$PATH
python setup.py build
python setup.py install
where setting PATH as above might be needed to find the correct quantlib-config script. (By the way, you should end up with just a QuantLib Python module in site-packages, not the whole build/lib.linux-x86_64-2.7 thing you have now.)

Why use Pythons 'virtualenv' on Linux when one has 'chroot' (and union/overlay filesystems)?

First of all let me state that I am a proponent of generic software (in general ;-). I am no expert on Python, but it seems that the 'virtualenv' utility solves pretty much the same problem 'chroot' can help to solve - bootstrapping a directory tree that can be passed as root, thus effectively protecting the real directory tree, if needed.
Since I am no expert in Python as already mentioned, I wonder - what problem can virtualenv solve that chroot cannot? I mean, can't I just set up a nice fake root tree (possibly using union mounting), chroot into it, and do pip install a package I want in my new environment, and then play around within the bounds of my new environment, running python scripts and what not?
Am I missing something here?
Update:
Can't one install packages/modules locally in whatever application directory, I mean, without root privileges and subsequently without overwriting or adding files to /usr/lib or /usr/local/lib? It appears that this is what virtualenv does, however I think it has to symlink or otherwise provide a python interpreter for each environment one creates, does it not?
bootstrapping a directory tree that can be passed as root
That's not what virtualenv does, except (to some degree) for Python packages. It provides a place where these can be installed without replacing the rest of the filesystem. It also works without root privileges and it's portable as it needs no kernel support, unlike chroot, which (I presume) won't work on Windows.
Can't one install packages/modules locally in whatever application directory
Yes, but virtualenv does one more thing, which is that it disables (by default at least) the system's Python package directories. That means you can test whether your package correctly installs all of its dependencies (you might have forgotten to list one because it's already installed on your system) and it allows installing different versions in case you need either newer or older versions. The ability to install older versions should not be overlooked because sometimes new versions of packages introduce bugs.

Is there a way to "version" my python distribution?

I'm working by myself right now, but am looking at ways to scale my operation.
I'd like to find an easy way to version my Python distribution, so that I can recreate it very easily. Is there a tool to do this? Or can I add /usr/local/lib/python2.7/site-packages/ (or whatever) to an svn repo? This doesn't solve the problems with PATHs, but I can always write a script to alter the path. Ideally, the solution would be to build my Python env in a VM, and then hand copies of the VM out.
How have other people solved this?
virtualenv + requirements.txt are your friend.
You can create several virtual python installs for your projects, everything containing exactly those library versions you need (Tip: pip freeze spits out a requirements.txt with the exact library versions).
Find a good reference to virtualenv here: http://simononsoftware.com/virtualenv-tutorial/ (it's from this question Comprehensive beginner's virtualenv tutorial?).
Alternatively, if you just want to distribute your code together with libraries, PyInstaller is worth a try. You can package everything together in a static executable - you don't even have to install the software afterwards.
You want to use virtualenv. It lets you create an application(s) specific directory for installed packages. You can also use pip to generate and build a requirements.txt
For the same goal, i.e. having the exact same Python distribution as my colleagues, I tried to create a virtual environment in a network drive, so that everybody of us would be able to use it, without anybody making his local copy.
The idea was to share the same packages installed in a shared folder.
Outcome: Python run so unbearably slow that it could not be used. Also installing a package was very very sluggish.
So it looks there is no other way than using virtualenv and a requirements file. (Even if unfortunately often it does not always work smoothly on Windows and it requires manual installation of some packages and dependencies, at least at this time of writing.)

Categories

Resources