How to handle python dependencies throughout the project?

How to handle python dependencies throughout the project? - python

Lets say a developer is working on a project when he realizes he needs to use some package.
He uses pip to install it. Now, after installing it, would a the developer write it down as a dependency in the requirements file / setup.py?
What does that same dev do if he forgot to write down all the dependencies of the project (or if he didn't know better since he hasn't been doing it long)?
What I'm asking is what's the workflow when working with external packages from the PyPi?

The command:
pip freeze > requirements.txt
will copy all of the dependencies currently in your python environment into requirements.txt. http://pip.readthedocs.org/en/latest/reference/pip_freeze.html

It depends on the project.
If you're working on a library, you'll want to put your dependencies in setup.py so that if you're putting the library on PyPi, people will be able to install it, and its dependencies automatically.
If you're working on an application in Python (possibly web application), a requirements.txt file will be easier for deploying. You can copy all your code to where you need it, set up a virtual environment with virtualenv or pyvenv, and then do pip install -r requirements.txt. (You should be doing this for development as well so that you don't have a mess of libraries globally).
It's certainly easier to write the packages you're installing to your requirements.txt as soon as you've installed them than trying to figure out which ones you need at the end. What I do so that I never forget is I write the packages to the file first and then install with pip install -r.
pip freeze helps if you've forgotten what you've installed, but you should always read the file it created to make sure that you actually need everything that's in there. If you're using virtualenv it'll give better results than if you're installing all packages globally.

Related

How to ensure that needed packages are always installed/available

I'm working on a script in python that relies on several different packages and libraries. When this script is transferred to another machine, the packages it needs in order to run are sometimes not present or are older versions that do not have the same functionality and cause the script to fail.
I was considering using a virtual environment, but I can't find a way to have the script use the specific environment I design as it's default, and in order to use the environment a user must manually activate it from the command line.
I've also looked into trying to check the versions of the packages installed on the machine, and if they are not sufficient then updating them from the script as described here:
Installing python module within code
Is there any easier/surefire way to make sure that the needed packages will always be available regardless of where it's run?

The normal approach is to create an installation script and have that manage your dependencies. Then when you move your project to a new environment your installer will check that all dependencies are present.
I recommend you check out setuptools: https://setuptools.readthedocs.io/en/latest/
If you don't want to install dependencies whenever you need to use your script somewhere new, then you could package your script into a Docker container.

If the problem is ensuring the required packages are available in a new environment or virtual environment, you could use pip and generate a requirements.txt and check it in version control or use a tool to do that for you, like pipenv.
If you would prefer to generate a requirements.txt by hand, you should:
Install your depencencies using pip
Type pip freeze > requirements.txt to generate a requirements.txt file
Check requirements.txt in you source management software
When you need to setup a new environment, use pip install -m requirements.txt

The solution that I've been using has been to include a custom library (folder with all of my desired packages) in the folder with my script, and I simply import them from there:
from Customlib import pkg1, pkg2,...
As long as the custom library and script stay together in the same folder, it will always have access to the right packages and the correct versions of those packages.
I'm not sure how robust this solution actually is or what possible bugs may arise from this if it is passed from machine to machine, but for now this seems to work.

What is the use case for `pip install -e`?

When I need to work on one of my pet projects, I simply clone the repository as usual (git clone <url>), edit what I need, run the tests, update the setup.py version, commit, push, build the packages and upload them to PyPI.
What is the advantage of using pip install -e? Should I be using it? How would it improve my workflow?

I find pip install -e extremely useful when simultaneously developing a product and a dependency, which I do a lot.
Example:
You build websites using Django for numerous clients, and have also developed an in-house Django app called locations which you reuse across many projects, so you make it available on pip and version it.
When you work on a project, you install the requirements as usual, which installs locations into site packages.
But you soon discover that locations could do with some improvements.
So you grab a copy of the locations repository and start making changes. Of course, you need to test these changes in the context of a Django project.
Simply go into your project and type:
pip install -e /path/to/locations/repo
This will overwrite the directory in site-packages with a symbolic link to the locations repository, meaning any changes to code in there will automatically be reflected - just reload the page (so long as you're using the development server).
The symbolic link looks at the current files in the directory, meaning you can switch branches to see changes or try different things etc...
The alternative would be to create a new version, push it to pip, and hope you've not forgotten anything. If you have many such in-house apps, this quickly becomes untenable.

For those who don't have time:
If you install your project with an -e flag (e.g. pip install -e mynumpy) and use it in your code (e.g. from mynumpy import some_function), when you make any change to some_function, you should be able to use the updated function without reinstalling it.

pip install -e is how setuptools dependencies are handled via pip.
What you typically do is to install the dependencies:
git clone URL
cd project
run pip install -e . or pip install -e .[dev]*
And now all the dependencies should be installed.
*[dev] is the name of the requirements group from setup.py
Other than setuptools (egg) there is also a wheel system of python installation.
Both these systems are based on promise that no building and compilation is performed.

Creating "virtualenv" for an existing project

I have a python on which I've been working on. Now I've realized that I need a virtual environment for it. How can I create it for an existing project? If I do this:
virtualenv venv
will it work fine? Or do I have to re-create my project, create virtualenv and then copy the existing files to it?

You can just create an virtual enviroment with virtualenv venv and start it with venv/bin/activate.
You will need to reinstall all dependencies using pip, but the rest should just work fine.

The key thing is creating requirements.txt.
Create a virtualenv as normal. Do not activate it yet.
Now you need to install the required packages. If you do not readily remember it, ask pip:
pip freeze > requirements.txt
Now edit requirements.txt so that only the packages you know you installed are included. Note that the list will include all dependencies for all installed packages. Remove them, unless you want to explicitly pin their versions, and know what you're doing.
Now activate the virtualenv (the normal source path/to/virtualenv/bin/activate).
Install the dependencies you've collected:
pip install -r requirements.txt
The dependencies will be installed into your virtualenv.
The same way you'll be able to re-create the same env on your deployment target.

If you are using from windows then follow the following procedure:
Step 1: Go to your root directory of existing python project
Step 2: Create virtual environment with virtualenv venv
Step 4: Go to /Scripts and type this command activate
then if you would like to install all required library , pip3 install -r requirements.txt

There is something that I would like to add to this question. Because, newbees always have a problem and even once in a while, I do some mistake like that.
If you do not have requirements.txt for an already existing python-project, then you are doomed. Save at-least 2-3 hours of the day to recover the requirements.txt for an already existing python-project.
The best way to see it through and only if you are lucky, remember from that python-project, the most import package. By most-important, I mean the package that has highest-dependency.
Once, you have located this highest-dependency package, install it via pip. This will install all the dependency of the highest dependency package.
Now, see if it works. If it does, Voila !!
If it doesn't you will have get where the conflicts are and start to resolve them one-by-one.
I was working on such a situation recently, where there was no requirements.txt. But I knew, the highest dependency was this deep-learning package called Sentence-Transformers and I installed it and with minor conflicts, resolved everything.
Best-of-luck !! Let me know if it ever helped anyone !!

Reinstall python with modules

I will format my pc and i would like to somehow collect all the python modules that i have currently and package them (zip or rar etc) / or create an index file of them, so that when i'm done formatting the pc i can reinstall them all in one go, either by using the package/or by using the index created to pip install them all in a batch.
Is there any python module that allows to do that?

Use pip
pip freeze > requirements.txt
This will save the names of all your installed python modules to a file called requirements.txt.
Then when you want to install them again run the following command.
pip install -r requirements.txt
Using a package manager like this is good practice to get into, especially if you use a code repository, so you dont upload all the dependencies to the repo.
If you are not already doing so, its a good idea to use a virtual environment for your python projects.
This will create a unique python environment for each of your projects, keeping each project self contained.

Yes - pip. Using pip freeze will give you the list of all the installed modules. Next all you need to do is to install all that modules by running pip install -r your_file_with_modules_list.

When would the -e, --editable option be useful with pip install?

When would the -e, or --editable option be useful with pip install?
For some projects the last line in requirements.txt is -e .. What does it do exactly?

As the man page says it:
-e,--editable <path/url>
Install a project in editable mode (i.e. setuptools "develop mode") from a local project path or a VCS url.
So you would use this when trying to install a package locally, most often in the case when you are developing it on your system. It will just link the package to the original location, basically meaning any changes to the original package would reflect directly in your environment.
Some nuggets around the same here and here.
An example run can be:
pip install -e .
or
pip install -e ~/ultimate-utils/ultimate-utils-proj-src/
note the second is the full path to where the setup.py would be at.

Concrete example of using --editable in development
If you play with this test package as in:
cd ~
git clone https://github.com/cirosantilli/vcdvcd
cd vcdvcd
git checkout 5dd4205c37ed0244ecaf443d8106fadb2f9cfbb8
python -m pip install --editable . --user
it outputs:
Obtaining file:///home/ciro/bak/git/vcdvcd
Installing collected packages: vcdvcd
Attempting uninstall: vcdvcd
Found existing installation: vcdvcd 1.0.6
Can't uninstall 'vcdvcd'. No files were found to uninstall.
Running setup.py develop for vcdvcd
Successfully installed vcdvcd-1.0.6
The Can't uninstall 'vcdvcd' is normal: it tried to uninstall any existing vcdvcd to then replace them with the "symlink-like mechanism" that is produced in the following steps, but failed because there were no previous installations.
Then it generates a file:
~/.local/lib/python3.8/site-packages/vcdvcd.egg-link
which contains:
/home/ciro/vcdvcd
.
and acts as a "symlink" to the Python interpreter.
So now, if I make any changes to the git source code under /home/ciro/vcdvcd, it reflects automatically on importers who can from any directory do:
python -c 'import vcdvcd'
Note however that at my pip version at least, binary files installed with --editable, such as the vcdcat script provided by that package via scripts= on setup.py, do not get symlinked, just copied to:
~/.local/bin/vcdcat
just like for regular installs, and therefore updates to the git repository won't directly affect them.
By comparison, a regular non --editable install from the git source:
python -m pip uninstall vcdvcd
python -m pip install --user .
produces a copy of the installed files under:
~/.local/lib/python3.8/site-packages/vcdvcd
Uninstall of an editable package as done above requires a new enough pip as mentioned at: How to uninstall editable packages with pip (installed with -e)
Tested in Python 3.8, pip 20.0.2, Ubuntu 20.04.
Recommendation: develop directly in-tree whenever possible
The editable setup is useful when you are testing your patch to a package through another project.
If however you can fully test your change in-tree, just do that instead of generating an editable install which is more complex.
E.g., the vcdvcd package above is setup in a way that you can just cd into the source and do ./vcdcat without pip installing the package itself (in general, you might need to install dependencies from requirements.txt though), and the import vcdvcd that that executable does (or possibly your own custom test) just finds the package correctly in the same directory it lives in.

From Working in "development" mode:
Although not required, it’s common to locally install your project in
“editable” or “develop” mode while you’re working on it. This allows
your project to be both installed and editable in project form.
Assuming you’re in the root of your project directory, then run:
pip install -e .
Although somewhat cryptic, -e is short for
--editable, and . refers to the current working directory, so together, it means to install the current directory (i.e. your
project) in editable mode.
Some additional insights into the internals of setuptools and distutils from “Development Mode”:
Under normal circumstances, the distutils assume that you are going to
build a distribution of your project, not use it in its “raw” or
“unbuilt” form. If you were to use the distutils that way, you would
have to rebuild and reinstall your project every time you made a
change to it during development.
Another problem that sometimes comes up with the distutils is that you
may need to do development on two related projects at the same time.
You may need to put both projects’ packages in the same directory to
run them, but need to keep them separate for revision control
purposes. How can you do this?
Setuptools allows you to deploy your projects for use in a common
directory or staging area, but without copying any files. Thus, you
can edit each project’s code in its checkout directory, and only need
to run build commands when you change a project’s C extensions or
similarly compiled files. You can even deploy a project into another
project’s checkout directory, if that’s your preferred way of working
(as opposed to using a common independent staging area or the
site-packages directory).
To do this, use the setup.py develop command. It works very similarly
to setup.py install, except that it doesn’t actually install anything.
Instead, it creates a special .egg-link file in the deployment
directory, that links to your project’s source code. And, if your
deployment directory is Python’s site-packages directory, it will also
update the easy-install.pth file to include your project’s source
code, thereby making it available on sys.path for all programs using
that Python installation.

It is important to note that pip uninstall can not uninstall a module that has been installed with pip install -e. So if you go down this route, be prepared for things to get very messy if you ever need to uninstall. A partial solution is to (1) reinstall, keeping a record of files created, as in sudo python3 -m setup.py install --record installed_files.txt, and then (2) manually delete all the files listed, as in e.g. sudo rm -r /usr/local/lib/python3.7/dist-packages/tdc7201-0.1a2-py3.7.egg/ (for release 0.1a2 of module tdc7201). This does not 100% clean everything up however; even after you've done it, importing the (removed!) local library may succeed, and attempting to install the same version from a remote server may fail to do anything (because it thinks your (deleted!) local version is already up to date).

As suggested in previous answers, there is no symlinks that are getting created.
How does '-e' option work? -> It just updates the file "PYTHONDIR/site-packages/easy-install.pth" with the project path specified in the 'command pip install -e'.
So each time python search for a package it will check this directory as well => any changes to the files in this directory is instantly reflected.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.