Make virtualenv share already existing site-packages? - python

My layout is as follows:
I have various different python projects under ~/projects, each with the following structure:
~/projects/$project_name/env #This is the virtualenv
~/projects/$project_name/scripts #This is where the code actually lives
~/projects/$project_name/scripts/requirements.txt #This helps keep track of this project's dependencies
Now, this setup works great as it does the following:
Each project has its own dependencies in its corresponding env
I can easily redeploy a project somewhere else by cloning the scripts directory, creating a new virtualenv, and doing pip install -r requirements.txt
The main downside of this setup is that I have multiple copies of the same packages in multiple virtual environments. I regularly end up with a couple of hundred megs for each virtual environment.
My question is:
Is there a way to share packages between multiple virtualenvs?
Things I've tried and do not work:
virtualenv --system-site-packages. This makes the system-wide packages available in the virtualenv, but:
it makes it impossible to get a list of specific dependencies
I can't have multiple versions of the same dependency installed (e.g. pandas 0.16 vs pandas 0.15) which I need, as different projects have different needs.
virtualenv --extra-search-dir=/path/to/dist only works for pip, AFAICT, so not good for me.

Scratch my earlier comment; maybe I do know an answer. It appears that Anaconda's package management system does use links to a shared package cache, so that would basically be a virtualenv, but with the feature you want. See here: How to free disk space taken up by (ana)conda?
That said, there's a large initial hard-disk cost to using conda, so investigate a bit more and decide whether it will work for you.
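To make that concrete, a rough sketch of what this looks like with conda (environment names and version numbers are just examples):
conda create -n project_a pandas=0.15
conda create -n project_b pandas=0.16
# Packages that are identical across environments (numpy, python itself, ...) are
# hard-linked from conda's shared pkgs cache instead of being copied into each env.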

Related

How to have python libraries already installed in python project?

I am working on a python project that requires a few libraries. This project will further be shared with other people.
The problem I have is that I can't use the usual pip install 'library' as part of my code because the project could be shared with offline computers and the work proxy could block the download.
So what I first thought of was installing .whl files and running pip install 'my_file.whl' but this is limited since some .whl files work on some computers but not on others, so this couldn't be the solution of my problem.
I tried sharing my project with someone else, and I had an error with a .whl file working on one computer but not the other.
What I am looking for is to have all the libraries I need to be already downloaded before sharing my project. So that when the project is shared, the peers can launch it without needing to download the libraries.
Is this possible, or is there something else that can solve my problem?
There are different approaches to the issue here, depending on what the constraints are:
1. Defined Online Dependencies
It is a good practice to define the dependencies of your project (not only when shared). Python offers different methods for this.
In this scenario every developer has access to a PyPI repository via the network, usually the official mirrors on the internet. New packages are pulled from there whenever there are changes.
Repository (internet) access is only needed when pulling new packages.
Below are the most common ones:
1.1 requirements.txt
The requirements.txt is a plain text list of required packages and versions, e.g.
# requirements.txt
matplotlib==3.6.2
numpy==1.23.5
scipy==1.9.3
When you check this in along with your source code, users can freely decide how to install it. The simplest (though least isolated) way is to install it into the base Python environment via
pip install -r requirements.txt
You can even generate such a file automatically with pipreqs if you have lost track. The result is usually very good; however, a manual cleanup afterwards is recommended.
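For example (assuming your sources live in ./scripts):
pip install pipreqs
pipreqs ./scripts    # scans the imports and writes ./scripts/requirements.txt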
Benefits:
Package dependency is clear
Installation is a one line task
Downsides:
Possible conflicts with multiple projects
No guarantee that everyone ends up with the exact same versions if flexible version specifiers are allowed (the default)
1.2 Pipenv
There is a nice and almost complete answer on Pipenv. The Pipenv documentation itself is also very good.
In a nutshell: Pipenv allows you to have virtual environments. Thus, version conflicts between different projects are gone for good. Also, the Pipfile used to define such an environment allows separation of production and development dependencies.
Users now only need to run the following commands in the folder with the source code:
pip install pipenv # only needed first time
pipenv install
And then, to activate the virtual environment:
pipenv shell
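For reference, a minimal Pipfile might look like this (package names and versions are placeholders):
# Pipfile
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
numpy = "==1.23.5"

[dev-packages]
pytest = "*"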
Benefits:
Separation between projects
Separation of development/testing and production packages
Everyone uses the exact same version of the packages
Configuration is flexible but easy
Downsides:
Users need to activate the environment
1.3 conda environment
If you are using anaconda, a conda environment definition can be also shared as a configuration file. See this SO answer for details.
This scenario is like the Pipenv one, but with Anaconda as the package manager. It is recommended not to mix pip and conda.
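A minimal environment.yml for this approach might look like the following (name and versions are placeholders); it is recreated on another machine with conda env create -f environment.yml:
# environment.yml
name: myproject
channels:
  - defaults
dependencies:
  - python=3.10
  - numpy=1.23.5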
1.4 setup.py
When you are implementing a library anyway, you will want to have a look at how to configure the dependencies via the setup.py file.
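As a rough sketch (project name and dependencies are placeholders), the dependencies go into install_requires, and pip install . then pulls them in automatically:
# setup.py
from setuptools import setup, find_packages

setup(
    name="myproject",
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "numpy>=1.23",
    ],
)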
2. Defined local dependencies
In this scenario the developers do not have access to the internet (e.g. they are "air-gapped" in a special network where they cannot communicate with the outside world). In this case all the scenarios from 1. can still be used, but now we need to set up our own mirror/proxy. There are good guides (and even complete off-the-shelf software) out there, depending on the scenario (above) you want to use. Examples are:
Local Pypi mirror [Commercial solution]
Anaconda behind company proxy
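Once such a mirror or proxy is in place, clients simply point pip at it, for example (the URL is a placeholder):
pip install -r requirements.txt --index-url https://pypi.internal.example/simple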
Benefits:
Users don't need internet access
Packages on the local proxy can be trusted (cannot be corrupted / deleted anymore)
The clean and flexible scenarios from above can be used for setup
Downsides:
Network connection to the proxy is still required
Maintenance of the proxy is extra effort
3. Turn key environments
Last, but not least, there are solutions to share the complete and installed environment between users/computers.
3.1 Copy virtual-env folders
If (and only if) all users (are forced to) use an identical setup (OS, install paths, user paths, libraries, locales, ...), then one can copy the virtual environments for pipenv (1.2) or conda (1.3) between PCs.
These "pre-compiled" environments are very fragile, as a small change can cause the setup to malfunction. So this is really not recommended.
Benefits:
Can be shared between users without network (e.g. USB stick)
Downsides:
Very fragile
3.2 Virtualisation
The cleanest way to support this is some kind of virtualisation technique (virtual machine, docker container, etc.).
Install python and the dependencies needed and share the complete container.
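For the Docker variant, a minimal sketch of such a container definition could be (file and script names are placeholders):
# Dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "main.py"]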
Benefits:
Users can just use the provided container
Downsides:
Complex setup
Complex maintenance
Virtualisation layer needed
Code and environment may become convoluted
Note: This answer is compiled from the summary of (mostly my) comments

Structure for developmental Python packages

I'm starting a brand new project. I'm splitting code of different functions (e.g., utils, database, main application) into their own respective packages such that other new projects can just add them as dependencies and import them in the future. Packages may have cross-dependencies (e.g., database depends on utils).
I know that I can build each component as a Python package and use pip to manage the dependencies. However, as I'm starting from scratch, I will be making active changes to all packages at the same time. It seems to me that packaging them as "proper packages" would be quite inefficient. I envisage that I will need to add a new function in, say, utils, increment the version, use it in database, then realise that I need another new function that belongs in utils, increment the version again, etc.
What would be the best way to structure the project in this scenario? I'm using conda, Python 3.10 and VSCode if that matters.
I suggest the package approach you are thinking about. The key method you're missing is to make editable installs locally for all your packages.
pip install -e . in the package root directory
Editable installs reflect changes to the source right away in your environment
Since you are using conda, you probably want conda develop ., as this answer suggests
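A minimal sketch of that workflow, assuming the packages live in sibling directories named utils and database, each with its own setup.py or pyproject.toml:
pip install -e ./utils
pip install -e ./database    # may depend on utils; both are importable immediately
# Changes in either source tree are picked up without re-installing or bumping versions.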

How to manage python project that depends on multiple versions of a shared library?

I am on macOS, using brew, pyenv, and virtualenv.
I have a Python project that depends on bokeh and gdal (both python packages were installed with pip inside a virtual environment). Both bokeh and gdal depend on a system version of libopenssl, but they depend on different versions (1.0 and 1.1).
I have had this project working at various points in the past, with some combination of libraries (using pip for all python packages and brew for system packages) but when I change python versions and environments (using pyenv) to work on other projects, and then come back to this project, it no longer works. Usually something along these lines with a problem finding a shared library for openssl:
$ ./my_python_program.py
...
ImportError: dlopen(/Users/userBob/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/_ssl.cpython-37m-darwin.so, 2):
Library not loaded: /usr/local/opt/openssl@1.1/lib/libssl.1.1.dylib
Referenced from: /Users/userBob/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/_ssl.cpython-37m-darwin.so
Reason: image not found
I feel like I am eventually able to get things to work by trying random combinations of installing and uninstalling various package versions using pip and brew. But this is a fragile and inefficient way to maintain my projects.
In general what is the best way to handle this kind of situation? Do I need to simply record the exact brew and pip install/uninstall commands to get it working? Am I missing the concept of version "pinning"? Are there additional options with brew and pyenv that I am missing that might make this process easier?
I'm not sure this is the best way to do it, but I can tell you what I do usually.
First of all, I'm using Anaconda.
When I'm on a project, I switch to the relevant virtual environment.
Before switching away, when I commit/push my modifications, I also create an export file of my environment, as you can find described there.
I also track this file with git; this way, if I make any modification to the environment while working, it's stored in the .yml file.
This way, I can reinstall all dependencies needed for the project if I format my machine or get a new one, etc. And the reference for every dependency I need is stored in the cloud with my sources. So in case I start getting weird behaviour, I just restore my environment with this reference file from the time when it was working.
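The commands involved are roughly:
conda env export > environment.yml    # capture the current environment
git add environment.yml               # keep the reference file with the sources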
I'm not switching between projects often enough to justify automating this process, but I'm sure that's feasible if you want to.

is it ok to use files inside pip packages without installing them

I am working on a project and I have cloned a repository from github.
After the first run I realized that the project I cloned has some dependencies, and they were listed in a requirements.txt file.
I know I have to install these packages, but I don't want to, because I am on a Windows development environment and after finishing my project I am going to publish it to my Ubuntu production environment, and I don't want the hassle of a double installation.
I have two options:
Using a virtualenv and installing those packages inside it
Downloading the packages and use them the direct way using import foldername
I want to avoid the first option because I have less control over my project and the problem gets bigger and bigger if, for example, I am inside another project's virtualenv and want to run my project's main.py from its own virtualenv, and so on. Also, moving the virtualenv from Windows (bat files) to Linux (bash/sh files) seems ugly to me and leads to approaches I would rather avoid.
The second option is my choice. For example, I need to use the future package. The scenario would be: download the package using pip download future, extract the tar.gz file, and inside the src folder I can see the future package folder. Then I use it with import future_package.src.future without touching anything else.
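Concretely, that workflow would look roughly like this (the target directory is a placeholder):
pip download future -d vendored/    # fetch the archive without installing it (add --no-binary :all: to force the sdist)
tar -xzf vendored/future-*.tar.gz -C vendored/
# then import from the extracted source tree, adjusting sys.path as needed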
Aside from os.path problems (assume I take care of those):
Is this good practice?
I am not running setup.py, so nothing gets installed. Can that cause problems?
Is there a better approach that involves less work (like the second one), or is my first approach the better one?
UPDATE 1: I extracted the future and certifi packages, which were part of my project's requirements, used them directly, and it works in this particular case.

Is there a way to "version" my python distribution?

I'm working by myself right now, but am looking at ways to scale my operation.
I'd like to find an easy way to version my Python distribution, so that I can recreate it very easily. Is there a tool to do this? Or can I add /usr/local/lib/python2.7/site-packages/ (or whatever) to an svn repo? This doesn't solve the problems with PATHs, but I can always write a script to alter the path. Ideally, the solution would be to build my Python env in a VM, and then hand copies of the VM out.
How have other people solved this?
virtualenv + requirements.txt are your friend.
You can create several virtual Python installs for your projects, each containing exactly the library versions you need (tip: pip freeze spits out a requirements.txt with the exact library versions).
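In practice the round trip looks something like this:
pip freeze > requirements.txt    # in the environment that already works
virtualenv env                   # on the target machine
source env/bin/activate
pip install -r requirements.txt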
Find a good reference to virtualenv here: http://simononsoftware.com/virtualenv-tutorial/ (it's from this question Comprehensive beginner's virtualenv tutorial?).
Alternatively, if you just want to distribute your code together with libraries, PyInstaller is worth a try. You can package everything together in a static executable - you don't even have to install the software afterwards.
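A typical invocation is (the script name is a placeholder):
pyinstaller --onefile my_script.py    # produces a single self-contained executable in dist/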
You want to use virtualenv. It lets you create an application(s) specific directory for installed packages. You can also use pip to generate and build a requirements.txt
For the same goal, i.e. having the exact same Python distribution as my colleagues, I tried to create a virtual environment on a network drive, so that all of us would be able to use it without anybody making a local copy.
The idea was to share the same packages installed in a shared folder.
Outcome: Python ran so unbearably slowly that it could not be used. Installing a package was also very sluggish.
So it looks like there is no other way than using virtualenv and a requirements file. (Even if, unfortunately, it does not always work smoothly on Windows and requires manual installation of some packages and dependencies, at least at the time of writing.)
