I ran into a problem when trying to figure out how to specify dependencies for a Python application which uses pyproject.toml.
What is the best practice when it comes to installing applications? A couple of approaches I am considering:
1. Have requirements.txt with resolved and pinned dependencies, and declare dynamic dependencies in pyproject.toml which reference this requirements file.
2. Pin only the top-level dependencies in pyproject.toml, without any requirements file.
3. Have requirements.txt as in the first case, again referenced as dynamic dependencies in pyproject.toml, but also have optional dev dependencies containing the unpinned top-level dependencies.
What I am looking for is reproducibility, but also minimal manual work when developing the app. The third approach makes the most sense to me, as it would only require a pip freeze before pushing my new changes (a sketch of it is shown below). I have used pyproject.toml only for developing Python libraries so far (well, this is actually the first app I am creating in Python in general).
I have been playing around with Poetry, but would be more interested in how to do it with a minimal setup, that is, pip + pyproject.toml.
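To make the third option concrete, this is roughly what I have in mind, as a sketch with setuptools as the build backend (the project name, the requirements-dev.txt file and the version bounds are just placeholders, and the dynamic metadata feature needs a reasonably recent setuptools):
# pyproject.toml (sketch)
[build-system]
requires = ["setuptools>=64"]
build-backend = "setuptools.build_meta"

[project]
name = "my-app"  # placeholder
version = "0.1.0"
dynamic = ["dependencies", "optional-dependencies"]

[tool.setuptools.dynamic]
dependencies = { file = ["requirements.txt"] }

[tool.setuptools.dynamic.optional-dependencies]
dev = { file = ["requirements-dev.txt"] }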
I am working on a python project that requires a few libraries. This project will further be shared with other people.
The problem I have is that I can't use the usual pip install 'library' as part of my code because the project could be shared with offline computers and the work proxy could block the download.
So what I first thought of was installing .whl files and running pip install 'my_file.whl', but this is limited since some .whl files work on some computers but not on others, so this couldn't be the solution to my problem.
I tried sharing my project with another computer and I had an error with a .whl file working on one machine but not the other.
What I am looking for is to have all the libraries I need to be already downloaded before sharing my project. So that when the project is shared, the peers can launch it without needing to download the libraries.
Is this possible, or is there something else that can solve my problem?
There are different approaches to the issue here, depending on what the constraints are:
1. Defined Online Dependencies
It is a good practice to define the dependencies of your project (not only when shared). Python offers different methods for this.
In this scenario every developer has access to a PyPI repository via the network, usually the official main mirrors (i.e. via the internet). New packages need to be pulled individually from there whenever there are changes.
Repository (internet) access is only needed when pulling new packages.
Below are the most common ones:
1.1 requirements.txt
The requirements.txt is a plain text list of required packages and versions, e.g.
# requirements.txt
matplotlib==3.6.2
numpy==1.23.5
scipy==1.9.3
When you check this in along with your source code, users can freely decide how to install it. The simplest (and most convoluted) way is to install it into the base Python environment via
pip install -r requirements.txt
If you have lost track, you can even generate such a file automatically with pipreqs. The result is usually very good. However, a manual cleanup afterwards is recommended.
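If you want to give pipreqs a try, the basic usage is simply (it writes a requirements.txt into the given folder):
pip install pipreqs
pipreqs /path/to/your/project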
Benefits:
Package dependency is clear
Installation is a one line task
Downsides:
Possible conflicts with multiple projects
No guarantee that everyone has the exact same versions if version flexibility is allowed (the default)
1.2 Pipenv
There is a nice and almost complete answer about Pipenv. Also the Pipenv documentation itself is very good.
In a nutshell: Pipenv allows you to have virtual environments. Thus, version conflicts between different projects are gone for good. Also, the Pipfile used to define such an environment allows separation of production and development dependencies.
Users now only need to run the following commands in the folder with the source code:
pip install pipenv # only needed first time
pipenv install
And then, to activate the virtual environment:
pipenv shell
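For illustration, a minimal Pipfile could look roughly like this (package names and versions are only examples):
# Pipfile (sketch)
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
requests = "==2.28.1"

[dev-packages]
pytest = "*"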
Benefits:
Separation between projects
Separation of development/testing and production packages
Everyone uses the exact same version of the packages
Configuration is flexible but easy
Downsides:
Users need to activate the environment
1.3 conda environment
If you are using Anaconda, a conda environment definition can also be shared as a configuration file. See this SO answer for details.
This scenario is like the Pipenv one, but with Anaconda as the package manager. It is recommended not to mix pip and conda.
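The typical round trip looks roughly like this (the environment name comes from the exported file):
conda env export > environment.yml   # on the machine where the environment works
conda env create -f environment.yml  # on the other machine
conda activate <env-name>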
1.4 setup.py
If you are implementing a library anyway, you will want to have a look at how to configure the dependencies via the setup.py file.
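For completeness, a minimal sketch (name and dependency are placeholders; libraries typically use loose version ranges):
# setup.py (sketch)
from setuptools import setup, find_packages

setup(
    name="my_library",            # placeholder
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "numpy>=1.20,<2",         # loose range, as is usual for libraries
    ],
)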
2. Defined local dependencies
In this scenario the developers do not have access to the internet (e.g. they are "air-gapped" in a special network where they cannot communicate with the outside world). In this case all the scenarios from 1. can still be used, but now we need to set up our own mirror/proxy. There are good guides (and even complete off-the-shelf software) out there, depending on the scenario (above) you want to use. Examples are:
Local Pypi mirror [Commercial solution]
Anaconda behind company proxy
Benefits:
Users don't need internet access
Packages on the local proxy can be trusted (cannot be corrupted / deleted anymore)
The clean and flexible scenarios from above can be used for setup
Downsides:
Network connection to the proxy is still required
Maintenance of the proxy is extra effort
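If running a mirror/proxy is overkill, a lightweight variant is to pre-download the packages once on a machine with internet access and install from that folder on the offline machines (this only works reliably if the machines have a compatible platform and Python version):
# on the machine with internet access
pip download -r requirements.txt -d ./packages
# on the offline machine (folder copied over e.g. via USB stick)
pip install --no-index --find-links ./packages -r requirements.txt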
3. Turn-key environments
Last, but not least, there are solutions to share the complete and installed environment between users/computers.
3.1 Copy virtual-env folders
If (and only if) all users (are forced to) use an identical setup (OS, install paths, user paths, libraries, locales, ...), then one can copy the virtual environments for pipenv (1.2) or conda (1.3) between PCs.
These "pre-compiled" environments are very fragile, as a small change can cause the setup to malfunction. So this is really not recommended.
Benefits:
Can be shared between users without network (e.g. USB stick)
Downsides:
Very fragile
3.2 Virtualisation
The cleanest way to support this is some kind of virtualisation technique (virtual machine, docker container, etc.).
Install python and the dependencies needed and share the complete container.
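With Docker, for example, a minimal sketch could look like this (base image and entry point are placeholders):
# Dockerfile (sketch)
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "main.py"]   # placeholder entry point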
Benefits:
Users can just use the provided container
Downsides:
Complex setup
Complex maintenance
Virtualisation layer needed
Code and environment may become convoluted
Note: This answer is compiled from the summary of (mostly my) comments
I was trying to understand the purpose of use_develop and from the docs, I found this:
Install the current package in development mode with develop mode. For pip this uses -e option, so should be avoided if you’ve specified a custom install_command that does not support -e.
I don't understand what "development mode" means. Is this a python concept or it's specific to tox? Either way what does it mean?
Development mode, or editable installs, is a Python concept, or more specifically a Python packaging concept.
Usually, when you package a Python application or library, the source files are packaged into a "container", either a wheel or a source distribution.
This is a good thing to distribute a package, but not for developing, as then the source files are no longer accessible.
An editable install is a concept where, instead of "moving/copying" the files into the package container, the files are just symlinked (at least that is one way to do it).
So when you edit the source files, the installed package is updated immediately as well.
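In plain pip terms, an editable install of the current package is simply:
pip install -e .   # "editable"/development-mode install from the project root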
For tox this also means that the files in the root of the source tree are importable by Python, not only the ones in the package.
This might be convenient, but there is one huge caveat: if you misconfigure the packaging setup, the tests run by tox may be green, yet it is entirely possible that you forgot to include the source files in the package you deliver to your users.
Is there any easy way to delete no-more-using packages from requirements file?
I wrote a bash script for this task, but it doesn't work as I expected, because some packages are not used under their PyPI project names. For example, the dj-database-url package is used in the code as dj_database_url.
My project has many packages in its own requirements file, so searching for them one by one is too messy, error-prone and takes too much time. As far as I have searched, IDEs don't have this feature yet.
You can use Code Inspection in PyCharm.
Delete the contents of your requirements.txt but keep the empty file.
Load your project in PyCharm and go to Code -> Inspect code....
Choose Whole project option in dialog and click OK.
In the inspection results panel locate the Package requirements section under Python (note that this section will be shown only if there is any requirements.txt or setup.py file).
The section will contain one of the following messages:
Package requirement '<package>' is not satisfied if there is any package that is listed in requirements.txt but not used in any .py file.
Package '<package>' is not listed in project requirements if there is any package that is used in .py files, but not listed in requirements.txt.
You are interested in the second inspection.
You can add all used packages to requirements.txt by right-clicking the Package requirements section and selecting Apply Fix 'Add requirements '<package>' to requirements.txt'. Note that it will show only one package name, but it will actually add all used packages to requirements.txt if invoked on the section.
If you want, you can add them one by one, just right click the inspection corresponding to certain package and choose Apply Fix 'Add requirements '<package>' to requirements.txt', repeat for each inspection of this kind.
After that you can create a clean virtual environment and install the packages from the new requirements.txt.
Also note that PyCharm has import optimisation feature, see Optimize imports.... It can be useful to use this feature before any other steps listed above.
The best bet is to use a (fresh) Python venv/virtual-env with no packages, or only those you definitely know you need. Test your package, installing missing packages with pip as you hit problems, which should be quite quick for most software, then use the pip freeze command to list the packages you really need. Better yet, you could use pip wheel to create a wheel containing those packages.
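A rough sketch of that workflow on Linux/macOS (the package name is a placeholder):
python -m venv fresh_env
source fresh_env/bin/activate    # on Windows: fresh_env\Scripts\activate
# run your code/tests, installing packages as you hit ImportErrors, e.g.:
pip install requests             # placeholder package name
pip freeze > requirements.txt    # the packages you really need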
The other approach would be to:
Use pylint to check each file for unused imports and delete them (you should be doing this anyway),
Run your tests to make sure that it was right,
Use a tool like snakefood or snakefood3 to generate your new list of dependencies
Note that for any dependency checking to work well it is advisable to avoid conditional imports and imports within functions.
Also note that, to be sure you have everything, it is a good idea to build a new venv/virtual-env, install from your dependencies list and then re-test your code.
You can find obsolete dependencies by using deptry, a command line utility that checks for various issues with a project's dependencies, such as obsolete, missing or transitive dependencies.
Add it to your project with
pip install deptry
and then run
deptry .
Example output:
-----------------------------------------------------
The project contains obsolete dependencies:
Flask
scikit-learn
scipy
Consider removing them from your projects dependencies. If a package is used for development purposes, you should add
it to your development dependencies instead.
-----------------------------------------------------
Note that for the best results, you should be using a virtual environment for your project, see e.g. here.
Disclaimer: I am the author of deptry.
In PyCharm go to Tools -> Sync Python Requirements. There's a 'Remove unused requirements' checkbox.
I've used pip-check-reqs with success.
With the command pip-extra-reqs your_directory it will check for all unused dependencies in your_directory.
Install it with pip install pip-check-reqs.
Imagine I have:
Library Z
Library Y, which depends on Library Z
Application A, which depends on Library Y
To fully test out changes to Library Z, I'd like to run the tests of Application A with any Development releases of Library Z.
To do this I can set up Library Z to publish packages to some package index for development releases under the versioning scheme {major}.{minor}.{micro}.dev{build}, then have Library Y specify its dependency range for Library Z as >={major},<{major+1}, for instance, and use pip install --pre ... on Application A to ensure the development releases of Library Z are picked up.
This all works fine, until we have > 1 maintainer of Library Z making changes, likely in different git branches, and effectively competing on the {build} number. I am wondering how folks have solved this problem.
This problem potentially gets worse if in Application A you are also in a situation where > 1 maintainer are making changes and not everyone wants to ingest the development release, so the --pre flag needs to be passed optionally and ideally scoped to just the dependency in question (possible with poetry via the more granular allow-prereleases flag, see the docs here).
Editable installs are likely considered out of scope; this setup is a trivial case. In reality this dependency graph could be deeper, and it is often paired with Docker to make it commercially viable when paired with C dependencies, which makes hooking up volume mounts very hard. Also, the user developing Library Z may be different from the person testing Application A.
Whilst I used pip in the examples here, in reality our system uses poetry (and pip in places).
[Given the information exchanged in the comments...] Then version control branches should do. If you agree that a certain branch in lib Z will have the new features app A needs for the purposes of build W, just keeping that branch updated won't require any changes to the configuration of W, and no package building: at this point you are down to being able to switch the requirement list (and the pointers to each branch) in the build process of A.
I don't know how Poetry could handle this, but using a setup.py file it should be trivial: just read different text files listing the requirements (with pointers to specific git branches or tags), based on an environment variable or other configuration you can easily change for the build. Since setup.py is a plain Python code file, one can just use if statements along with os.environ to check environment variables, and read the contents of requirements_setup_W.txt to feed the install_requires parameter of the call to setup.
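A sketch of that idea (the file names and the environment variable are placeholders):
# setup.py (sketch)
import os
from setuptools import setup, find_packages

# pick a requirements file based on an environment variable, e.g. BUILD_FLAVOR=W
flavor = os.environ.get("BUILD_FLAVOR", "default")
with open(f"requirements_setup_{flavor}.txt") as f:
    # lines may point at specific git branches/tags,
    # e.g. "libZ @ git+https://example.com/libZ.git@feature-branch"
    requirements = [
        line.strip() for line in f
        if line.strip() and not line.startswith("#")
    ]

setup(
    name="application_a",   # placeholder
    version="1.0.0",
    packages=find_packages(),
    install_requires=requirements,
)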
Either way, you are saying each package might have more than one state of "pre-release", which would be interesting for different builds. I think managing that through branches in version control would be better than uploading several differing packages to a repository. Or maybe change the uploaded package name for each purpose, so you could have a package named libZ_setupW on the local PyPI (and rotate the requirements by code in setup.py). (Again, the main pain point of Poetry is trashing executable code in the expectation that all build needs can be represented by config files: they can't. I just looked around, and there seems to be no "poetry pre-build hook" which could be used to rename (or dynamically rewrite) your pyproject.toml according to the desired setup. One could add a script to do just that, but it would need to be called manually before calling poetry install.)
Imagine a project MyLibrary which used to have its own requirements.txt file specifying all the versions needed by each of the dependencies...
lib_a==0.1
lib_b==0.11
lib_c==0.1.1
lib_d==0.1.2
lib_e==0.1.8
And a project ChildProject which happens to have the same kind of setup, with its own requirements.txt file and everything.
ChildProject uses MyLibrary as it needs some common functionality from it. The problem with these two is that ChildProject has a library which is also specified in MyLibrary, but with a different version, which causes a conflict and makes the build fail.
What I've done to get rid of the problem is to erase the pinned dependencies in MyLibrary and specify the minimum and maximum versions for each of the libraries in the install_requires property within the setup() call...
setup(
setup_requires=['pbr', 'pytest-runner'],
install_requires=[
'lib_a>=0,<1',
'lib_b>=0,<2',
'lib_c>=0,<3',
'lib_d>=0,<4',
'lib_e>=0,<5'
],
pbr=True,
)
And here is where I get lost...
Should I remove requirements.txt in MyLibrary and leave all the versioning to the child projects using it?
If so, how do I know that ChildProject is specifying all of the needed dependencies? What if I forget to specify lib_a in ChildProject?
Does the latest version that complies with the install_requires constraints get automatically installed, or how does it work? (I ask this because, AFAIK, install_requires just specifies the constraints but doesn't actually include any library in the project.)
General suggestions for managing dependency versions:
Libraries don't pin versions (i.e. install_requires either has no versions at all, or only loose restrictions, e.g. <4). That's what you have already.
Applications can do whatever is needed. In reality, it's highly recommended to pin your dependencies to an exact version (and better yet, provide hashes, to save yourself from forged libs). The reason for this: you cannot guarantee that 3rd-party libraries follow semver, which means that having >2,<3 in your requirements.txt may lead to a broken build/deployment, because a 3rd-party lib released 2.5, which turns out to be backward-incompatible with 2.4. So you must do your best to avoid breaking builds just by re-building at a different time. In other words, your build should be idempotent with respect to the PyPI state.
In general, you pin versions to some state, test your application, and commit/save/build/however you deliver. Some time later, you revise the versions (e.g. to update a framework or apply a security patch), update them in requirements.txt, and test your app with the new dependency state; if there are no conflicts/broken parts, you "freeze" that state with pinned versions and build/deploy/etc. This kind of loop gives you room to occasionally update your requirements to stay up to date, while at the same time having code that will not be broken just by re-installing the dependencies.
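With plain pip, that loop can be as simple as (the package name is a placeholder):
pip install -r requirements.txt   # reproduce the pinned state
pip install --upgrade some_lib    # later: revise a dependency (or edit requirements.txt by hand)
# run your test suite against the new state
pip freeze > requirements.txt     # "freeze" the verified state again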
If you're looking for easier dependency management with versions, I suggest taking a look at pipenv.