Cleaning up requirements.txt - python

I was playing with Python and Machine Learning. During the experimental phase, I added more and more stuff to the requirements.txt file.
Now that I know what code I want to keep, I have deleted those experiments which are not helpful. Thus, some requirements have become obsolete. I'm now looking for a way to clean up my requirements.txt file.
Initially I thought I could just clear the file entirely, go through the import statements, and let PyCharm re-add the packages. However, that's not a good idea, because PyCharm would just add the latest version of each library, and I need some libraries pinned to specific versions.
Is there a good way to clean up the requirements? I'm thinking of an action similar to "Optimize imports" (PyCharm) or "remove unused variable" or "Remove unused libraries" (Resharper).

I think I found a PyPI package that could be useful: pip_check_reqs.
It looks like there is a tool in it, pip-extra-reqs, that would find anything listed in requirements.txt that is not imported by sample.
I guess in this example "sample" is your Python module:
pip-extra-reqs --ignore-file=sample/tests/* sample
I would give that package a try.

Related

Is there a way to package tests in Python?

In my CI/CD environment I have multiple projects that use mostly the same tests, with a bit of variation. Since all of them are mostly the same, and different projects/builds just use them a bit differently, I am looking for a way (if there is one) to package the tests themselves and pass them around between projects. EDIT: Packaging the tested code is not possible.
The ultimate usage will be something like this:
pip install <test-package>
pytest -m <some-mark-depending-on-build/project> --<additional-variables>
Is there a way to do this?
EDIT: If there is, please point me toward a solution.
Thanks in advance.
Keeping it here for reference.
The way to do this is to create a test package that can be run as a Python module, from a main entry point (see the sketch below).
After researching and some testing, I've concluded that in my case this would create more code to maintain than I would otherwise properly reuse.
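For reference, here is a minimal sketch of what that could look like, assuming a hypothetical shared test package named shared_tests whose __main__.py simply delegates to pytest (the --pyargs option tells pytest to collect tests from an importable package rather than a filesystem path):

# shared_tests/__main__.py
import sys

import pytest

if __name__ == "__main__":
    # Forward any extra arguments (e.g. -m <some-mark>) to pytest and run
    # the tests that ship inside this installed package.
    sys.exit(pytest.main([*sys.argv[1:], "--pyargs", "shared_tests"]))

After pip install <test-package>, each project would then run something like python -m shared_tests -m <some-mark-depending-on-build/project>.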

How do I determine which requirements are actually needed in setup.py?

I'm cleaning up packaging for a python project I didn't create. Currently, it does some explicitly unsupported magic to get its dependencies from a requirements.txt file. The file looks like it may have been generated by pip freeze; there are fixed versions for everything, and many apparently-extraneous packages listed. I am pretty sure some of these aren't real dependencies, but I don't know which ones.
Given just the source tree, how would I figure out, from scratch, what dependencies ought to be included in install_requires?
As a first stab, I'm grepping for non-stdlib import statements. I hope there's a better way.
There's no way to do this perfectly, because Python is too flexible.
But it's usually possible to do it well enough.
You can start with the stdlib's modulefinder.
Beyond that, a number of projects—mostly projects designed for building binary executables, installers, etc. for Python apps—have come up with heuristics that go even farther.
These usually work. And, when they fail, you usually immediately spot it on your first test. Even if they aren't sufficient, they're at the very least good sample code. Here are a few off the top of my head:
cx_Freeze
py2exe
py2app
pyInstaller
In case you're wondering why it's impossible:
Even forgetting about the problem of dependencies in C extension modules, Python is just too flexible to catch all the ways you could import a module via static analysis.
Sure, you'd have to be dealing with code written by someone crazy enough to use explicitly unsupported magic for no good reason… but if you were, there's nothing to stop someone from writing this instead of import lxml: [1]
with open('picture.jpg', encoding='cp500') as f:
    getattr(sys.modules[11], codecs.encode('vzcbeg_zbqhyr', 'rot13'))(f.read().strip())
In reality, things aren't going to be that bad. But they could easily be too bad for rg import to be sufficient.
You could try to detect all the imports dynamically with a simple import hook, but that's only guaranteed to work if you can exercise 100% of the code paths.
[1] Of course this only works if importlib was the 12th module loaded, and if picture.jpg is not a JPEG image but a text file whose contents are, in EBCDIC, lxml\n
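To illustrate the "simple import hook" idea, here is a minimal sketch that records the top-level name of every module imported while the code runs; it only sees imports on the code paths you actually exercise (e.g. by running the project's test suite inside the try block):

import builtins

imported = set()
_real_import = builtins.__import__

def logging_import(name, *args, **kwargs):
    # Record the top-level package name, then defer to the real import.
    imported.add(name.split(".")[0])
    return _real_import(name, *args, **kwargs)

builtins.__import__ = logging_import
try:
    pass  # ... exercise the project here, e.g. run its test suite ...
finally:
    builtins.__import__ = _real_import

print(sorted(imported))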
I've had great results with pipreqs, which will automatically generate a requirements.txt file from your source code.
pipreqs /home/project/location
Successfully saved requirements file in /home/project/location/requirements.txt
I wrote a tool, realreq, specifically for this issue.
You can install it using pip: python3 -m pip install realreq. Using it is as easy as:
realreq -s /path/to/your/source
It will then gather the dependencies actually used in your source code.
I mean, the most effective way would honestly be to go through the code line by line and determine what packages may not be needed, what packages need updates, etc. I know Python 2 and 3 both have ModuleFinder which finds all the modules a script needs to successfully compile and run, but I've never used it before, so not sure how effective it is, especially for what you're doing. However, if you're interested, I'll attach the link below.
https://docs.python.org/3/library/modulefinder.html
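For what it's worth, here is a minimal sketch of using ModuleFinder, assuming app.py is a hypothetical entry-point script for the project; filtering out stdlib and project-local modules is left to you:

from modulefinder import ModuleFinder

finder = ModuleFinder()
finder.run_script("app.py")

# finder.modules maps module names to Module objects for everything found.
for name, module in sorted(finder.modules.items()):
    print(name, module.__file__)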

Utilizing the Dependency-Graph of pip

I want to write a visualization of the dependency graph of all Python packages installed with pip. My problem is that the code is poorly documented, and I'm unable to find where the graph is stored in the source code.
I hope someone has enough knowledge about the pip source code to help me out.
Also, I'm new to Python and not sure whether I should just make my adjustments in the existing source code or write a module for it, although I'm leaning more towards the latter.
Edit: I can get all installed modules via pip freeze, but that gives me only one list without the dependencies, so I have to find a way to extract the dependencies from that list.
Yes, pip's code is quite unreadable if you're not used to it. I don't recall it having anything like that, and I would not use its internals anyway. You may find yourself better served by distlib, which has a module just for that: https://distlib.readthedocs.org/en/latest/depgraph.html
Here's what I found during my search:
pip doesn't use a dependency graph at all internally (as of version 1.3.x).
So one solution is to do the following:
Install setuptools, if you haven't already. It brings a module named pkg_resources.
This module has all the tools to see all installed distributions (not only the ones installed with pip) in your desired dists directory. You can then read out the metadata (including requirements/dependencies) with methods that are likewise included in pkg_resources.
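A minimal sketch of that approach, using pkg_resources to build a simple adjacency list of package -> dependencies (names only; version specifiers are dropped here):

import pkg_resources

# Map each installed distribution to the names of the distributions it requires.
graph = {
    dist.project_name: [req.project_name for req in dist.requires()]
    for dist in pkg_resources.working_set
}

for name, deps in sorted(graph.items()):
    print(name, "->", deps)

From there you can feed the graph into whatever visualization you like.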

Reading setup.py via API to test requirements

I'm working on a plugin system and was thinking of simply using a setup.py file for each 'plugin' since it's already an existing dependency. The thing is, I need a way to test requirements.
Is there already an existing API in place for this, or would it make more sense just to roll a custom system and check it manually?
setup.py is a script, and you can't generally parse it to figure out requirements, especially since some setup scripts will change the requirements depending on the Python version used to run them.
There is an upcoming standard that will fix this: PEP 345. At this point in time very few packages make use of it, though. For more information on this topic you can look at the distutils-sig list archives, where this topic has come up several times.
Have you looked at egg entry points? They basically implement a plugin system that you can use directly. This Stack Overflow question has some information that might be interesting.
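As a sketch of the entry-point approach, assuming a hypothetical group name myapp.plugins: each plugin declares an entry point in its own setup.py, and the host application discovers and loads the installed plugins at runtime with pkg_resources; pip then takes care of each plugin's own install_requires at install time.

# In each plugin's setup.py:
# setup(
#     ...,
#     entry_points={
#         "myapp.plugins": ["hello = myapp_hello.plugin:register"],
#     },
# )

# In the host application:
import pkg_resources

for entry_point in pkg_resources.iter_entry_points(group="myapp.plugins"):
    register = entry_point.load()  # imports the plugin module, returns the object
    register()                     # call whatever the plugin exposes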

Using git to manage virtualenv state: will this cause problems?

I currently have git and virtualenv set up in a way which exactly suits my needs and, so far, hasn't caused any problems. However, I'm aware that my setup is non-standard, and I'm wondering if anyone more familiar with virtualenv's internals can point out if, and where, it's likely to go wrong.
My setup
My virtualenv is inside my git repository, but git is set to ignore the bin and include directories and everything in lib except for the site-packages directory.
More precisely, my .gitignore file looks like this:
*.pyc
# Ignore all the virtualenv stuff except the actual packages
# themselves
/bin
/include
/lib/python*/*
!/lib/python*/site-packages
# Ignore easyinstall and setuptools
/lib/python*/site-packages/easy-install.pth
/lib/python*/site-packages/setuptools.pth
/lib/python*/site-packages/setuptools-*
/lib/python*/site-packages/pip-*
With this arrangement I -- and anyone else working on a checkout of the project -- can use virtualenv and pip as normal but with the following advantages:
If anyone updates or installs a package and pushes their changes, anyone else who pulls those changes automatically gets the update: they don't need to notice that a requirements.txt file has changed or do any post-receive hook magic.
There are no network dependencies: all the code to make the application work lives in the git repository.
I'm aware that this only works with pure-Python packages, but that's all I'm concerned with at the moment.
Does anyone know of any other problems with this approach that I should be aware of?
This is an interesting question. I think the other two answers (thus far) raise good specific points. Clearly you've thought this through and have arrived at a solution you like, but I'll note that there does seem to be a philosophical split here among virtualenv users.
One camp, to which I'd guess you belong, feels that the local VE is part of the project (i.e. it should be under version control). The other feels that the VE should essentially be treated as a development artifact -- that requirements.txt should be part of the project repo, but that you should be able to blow away and recreate the VE as needed.
I just mention this because when I first saw this distinction made, it helped shape my thinking about virtualenv. (I'm in the second camp, FWIW, because it seems simpler and cleaner to me, but that's not to say that being in the first camp is wrong for your particular project.)
If you have any -e items in your requirements.txt -- in other words, any editable dependencies, as described in the requirements file format, that use the git version control system -- you will most likely run into issues when your src directory is committed. If you just add /src to your .gitignore then you can avoid this problem, but often the installation just adds a pointer in site-packages to this location, so you need it!
In /lib/python2.7/site-packages in my virtualenv I have absolute paths, for example in Django.egg-link I have /Users/henry/.virtualenv/mowapp/src/django (this is a Mac).
Check to see if you have any absolute paths checked into git; I think that is a major problem with this approach.
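A quick sketch of that check, assuming the layout from the question (adjust the site-packages path to your virtualenv): scan the committed .egg-link and .pth files for lines that look like absolute paths:

import pathlib

site_packages = pathlib.Path("lib/python2.7/site-packages")
for path in site_packages.rglob("*"):
    if path.suffix in {".egg-link", ".pth"}:
        for line in path.read_text().splitlines():
            # Absolute paths committed to the repo will break other checkouts.
            if line.startswith("/"):
                print(path, ":", line)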
