Behavior change in pip wheel - python

I have a Python application that I am upgrading from Python 3.6 to 3.9 (some of our dependencies don't support 3.10 yet). For... complicated reasons we use a single source tree to build various different configurations with different subsets of files, so part of setup.py is devoted to figuring out which files aren't used and deleting them.

With Python 3.6.9 and Pip 20.3.3, I can just run pip wheel --no-deps . and it works fine: pip creates a working directory in $TMPDIR, copies the source tree, and only works from the copy. However, with Python 3.9.14 and Pip 22.0.4, something has changed and Pip now operates on the current directory itself, meaning it deletes the "original" copies of files instead. It's not a huge problem (I can use Git to restore everything to the last commit, or manually create and work from a temporary directory), but it would be simpler to re-enable the old behavior.

Nothing in the documentation for pip wheel says anything about using (or not using) a temporary directory, and I can't find anything about when or why this behavior changed. Is it possible to make newer versions of Pip use the old behavior?

It turns out to have been a deliberate change in Pip's behavior. Pip 21.1 introduced the --use-feature=in-tree-build option, and Pip 21.3 made in-tree builds the default while providing --use-deprecated=out-of-tree-build as a temporary workaround to restore the old behavior; that workaround was removed in Pip 22.1b1.
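For pip 21.3 through 22.0.x, the old behavior can still be requested explicitly with that deprecated flag; on 22.1 and later the flag is gone and the command below is rejected:

pip wheel --no-deps --use-deprecated=out-of-tree-build .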
Options we're considering for our workflow:
use the exclude_package_data argument to setup() (a rough sketch of this is shown below the list)
change our automated build system to run git checkout -- . afterwards automatically
change our automated build system to copy the necessary files to a temporary directory itself, basically moving that functionality up a stage
figure out a better way to accomplish what we currently accomplish by omitting files
Which might be better for you, oh traveller on the dunes of time, I cannot say.
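As a rough illustration of the exclude_package_data route, something like the following keeps unwanted files out of the built wheel without ever deleting them from the working tree. The package name and patterns are placeholders, and whether this fully replaces the existing delete logic depends on how the build variants are selected:

# setup.py (sketch): "myapp" and the patterns below are placeholders
from setuptools import setup, find_packages

setup(
    name="myapp",
    version="1.0",
    packages=find_packages(),
    include_package_data=True,
    # Files matching these patterns are omitted from the built distribution,
    # but the source tree itself is left untouched.
    exclude_package_data={"myapp": ["variant_b/*", "*.unused"]},
)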

Related

setuptools "eager_resources" to executable directory

I maintain a Python utility that allows bpy to be installable as a Python module. Due to the size of the source code, and the length of time it takes to download the libraries, I have chosen to provide this module as a wheel.
Unfortunately, platform differences and Blender's runtime expectations make supporting this tricky at times.
Currently, one of my big goals is to get the Blender addon scripts directory to install into the correct location. The directory (simply named after the Blender API version) has to exist in the same directory as the Python executable.
Unfortunately, the way setuptools works (or at least the way I have it configured), the 2.79 directory is not always placed as a sibling to the Python executable. It fails on Windows platforms outside of virtual environments.
However, I noticed in setuptools documentation that you can specify eager_resources that supposedly guarantees the location of extracted files.
https://setuptools.readthedocs.io/en/latest/setuptools.html#automatic-resource-extraction
https://setuptools.readthedocs.io/en/latest/pkg_resources.html#resource-extraction
There was a lot of hand waving and jargon in the documentation, and 0 examples. I'm really confused as to how to structure my setup.py file in order to guarantee the resource extraction. Currently, I just label the whole 2.79 directory as "scripts" in my setuptools Extension and ship it.
Is there a way to write my setup.py and package my module so as to guarantee the 2.79 directory's location is the same as the currently running python executable when someone runs
py -3.6.8-32 -m pip install bpy
Besides simply "hacking it in"? I was considering writing an install_requires module that would simply move it if possible, but that means meddling with the user's file system and is kind of hacky. However, it's the route I am going to go if this proves impossible.
Here is the original issue for anyone interested.
https://github.com/TylerGubala/blenderpy/issues/13
My build process is identical to the process described in my answer here
https://stackoverflow.com/a/51575996/6767685
Maybe try the data_files option of distutils/setuptools.
You could start by adding data_files=[('mydata', ['setup.py'],)], to your setuptools.setup function call. Build a wheel, then install it and see if you can find mydata/setup.py somewhere in your sys.prefix.
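As a concrete starting point, that experiment could look like this in setup.py; "mydata" and the file list are only there to observe where things land after installing the wheel:

# setup.py (sketch): "mydata" is an arbitrary target directory for the test
from setuptools import setup

setup(
    name="bpy",
    version="2.79",
    # With a wheel installed into a virtual environment, relative data_files
    # paths end up under sys.prefix, e.g. <venv>/mydata/setup.py.
    data_files=[("mydata", ["setup.py"])],
)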
In your case the difficult part will be to compute the actual target directory (mydata in this example). It will depend on the platform (Linux, Windows, etc.), if it's in a virtual environment or not, if it's a global or local install (not actually feasible with wheels currently, see update below) and so on.
Finally of course, check that everything gets removed cleanly on uninstall. It's a bit unnecessary when working with virtual environments, but very important in case of a global installation.
Update
Looks like your use case requires a custom step at install time of your package (since the location of the Python interpreter binary relative to sys.prefix cannot be known in advance). This cannot currently be done with wheels. You have seen it yourself in this discussion.
Knowing this, my recommendation would be to follow the advice from Jan Vlcinsky in the comment on his answer to this question:
Post install script after installing a wheel.
Add an extra setuptools console entry point to your package (let's call it bpyconfigure); a sketch of the setup.py wiring is shown after this list.
Instruct the users of your package to run it immediately after installing your package (pip install bpy && bpyconfigure).
The purpose of bpyconfigure should be clearly stated (in the documentation and maybe also as a notice shown in the console right after starting bpyconfigure) since it would write into locations of the file system where pip install does not usually write.
bpyconfigure should figure out where the Python interpreter is and where to write the extra data.
The extra data to write should be packaged as package_data, so that it can be found with pkg_resources.
Of course bpyconfigure --uninstall should be available as well!
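Here is a sketch of how the entry point and the packaged data could be wired up in setup.py; the bpy_post_install module and its main() function are hypothetical names:

# setup.py (sketch): "bpy_post_install" and main() are made-up names
from setuptools import setup, find_packages

setup(
    name="bpy",
    packages=find_packages(),
    # Ship the 2.79 scripts directory inside the package so pkg_resources
    # can locate it after install (assuming it lives in bpy_post_install/2.79).
    package_data={"bpy_post_install": ["2.79/*"]},
    entry_points={
        "console_scripts": [
            # Installed as a 'bpyconfigure' command next to the Python scripts.
            "bpyconfigure = bpy_post_install.main:main",
        ],
    },
)

bpyconfigure itself can then use sys.executable to locate the running interpreter and copy the packaged 2.79 directory next to it (and remove it again for --uninstall).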

Edit package installed by pip

I'm trying to edit a package that I installed via pip, called py_mysql2pgsql (I had an error when converting my db from MySQL to PostgreSQL, just like this).
However, when I go to the folder /user/local/lib/python2.7/dist-packages/py_mysql2pgsql-0.1.5.egg-info, I cannot find the source code for the package. I only find PKG-INFO and text files.
How can I find the actual source code for a package (or in particular, this package)?
Thanks
TL;DR:
Modifying in place is dangerous. Modify the source and then install it from your modified version.
Details
pip is a tool for managing the installation of packages. You should not modify files created during package installation. At best, doing so means pip will believe a particular version of the package is installed when it isn't, which would not interact well with the upgrade function. I suspect pip would just overwrite your customizations, discarding them forever, but I haven't confirmed that. The other possibility is that it checks whether files have changed and throws an error if so (I don't think that's likely). It also misleads other users of the system: they see that you have a package installed, but you don't actually have the version indicated; you have a customized version. That is likely to cause confusion if they try to install the unmodified version somewhere else or expect some particular behavior from the installed version.
If you want to modify the package, the right thing to do is modify the source code and either build a new, custom package or just install it from source. py-mysql2pgsql provides instructions for performing a source install:
> git clone git://github.com/philipsoutham/py-mysql2pgsql.git
> cd py-mysql2pgsql
> python setup.py install
You can clone the source, modify it, and then install it without using pip. You could alternatively build your own customized version of the package if you need to redistribute it internally. This project uses setuptools for building its packages, so you only need to familiarize yourself with setuptools to make use of its setup.py file. Make sure that installing it this way doesn't create any misleading entries in pip's package list. If it does, either find a way to make the entry clearer or find an alternative install method.
Since you've discovered a bug in the software, I also highly recommend forking it on Github and submitting a pull request once you have it fixed. If you do so, you can use the above installation instructions just by changing the repository URL to your fork. If you don't fork it, at least file an issue and describe the changes that fix it.
Alternatives:
You could copy all the source code into your project, modify it there, and then distribute the modified version with the rest of your code. (Make sure you don't violate the license if you do so.)
You might be able to solve your problem at runtime. Monkey-patching the module is a little risky if other people on your team might not expect the change in behavior, but it can be used to modify the module's behavior globally. You could also write some additional code that wraps the buggy code: it takes the input, calls the buggy code, and either prevents or handles the bug (e.g., modifying the input to make it work, or catching an exception and handling it).
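To make the runtime option concrete, here is a generic sketch; somepackage, somemodule and buggy_function stand in for whatever part of py_mysql2pgsql is misbehaving:

# Sketch only: "somepackage.somemodule.buggy_function" is a placeholder.
from somepackage import somemodule

_original = somemodule.buggy_function

def patched_function(*args, **kwargs):
    # Fix up the input to avoid the bug, then delegate to the original code.
    args = tuple(a.strip() if isinstance(a, str) else a for a in args)
    return _original(*args, **kwargs)

# Monkey-patch: callers that look the function up on the module at call time
# now get the patched version instead.
somemodule.buggy_function = patched_function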
Just print out the __file__ attribute of the module:
>>> import numpy
>>> numpy.__file__
'/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/numpy/__init__.py'
Obviously the path and specific package will be different for you, but this is a pretty foolproof way of tracking down the source file of any module in Python.
You can patch pip packages quite easily with the patch command.
When you do this, it's important that you specify exact version numbers for the packages that you patch.
I recommend using Pipenv: it creates a lock file in which the versions of all dependencies and sub-dependencies are pinned, so the same versions of packages are always installed. It also manages your virtual env and makes it convenient to use the method described here.
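For example, a Pipfile entry pinning the patched package to an exact version might look like this (the version number is illustrative):

# Pipfile (excerpt): pin the package you patch to an exact version
[packages]
requests = "==2.27.1"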
The first argument to the patch command is the file you want to patch. So that should be the module of the pip package, which is probably inside a virtualenv.
If you use Pipenv, you can get the virtual env path with pipenv --venv, so then you could patch the requests package like this:
patch $(pipenv --venv)/lib/python3.6/site-packages/requests/api.py < requests-api.patch
The requests-api.patch file is a diff file, which could look like this:
--- requests/api.py 2022-05-03 21:55:06.712305946 +0200
+++ requests/api_new.py 2022-05-03 21:54:57.002368710 +0200
@@ -54,6 +54,8 @@
<Response [200]>
"""
+ print(f"Executing {method} request at {url}")
+
# By using the 'with' statement we are sure the session is closed, thus we
# avoid leaving sockets open which can trigger a ResourceWarning in some
# cases, and look like a memory leak in others.
You can make the patch file like this:
diff -u requests/api.py requests/api_new.py > requests-api.patch
Where requests/api_new.py would be the new, updated version of requests/api.py.
The -u flag to the diff command gives a unified diff format, which can be used to patch files later with the patch command.
So this method can be used in an automated process. Just make sure that you have specified an exact version number for the module that you patch: you don't want it to upgrade unexpectedly, because then you might have to update the patch file. Keep in mind that if you ever upgrade the module manually, you should check whether the patch file needs to be recreated, and recreate it if necessary. That is only needed when the file you are patching has changed in the new version of the package.
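As a loose sketch of such an automated step, a small shell script could install the locked dependencies and then apply the patch; the paths mirror the example above and would need adjusting for your environment:

#!/bin/sh
# Sketch: install locked dependencies, then apply the local patch.
pipenv install --deploy
patch "$(pipenv --venv)/lib/python3.6/site-packages/requests/api.py" < requests-api.patch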
py_mysql2pgsql package is hosted on PyPI: https://pypi.python.org/pypi/py-mysql2pgsql
If you want the code for that specific version, just download the source tarball from PyPI (py-mysql2pgsql-0.1.5.tar.gz)
Development is hosted on GitHub: https://github.com/philipsoutham/py-mysql2pgsql

Easy-install live python libraries/scripts

I have a number of Python "script suites" (as I call them) which I would like to make easy to install for my colleagues. I have looked into pip, and that seems really nice, but with that workflow (as I understand it) I would have to publish a static version and update it on every change.
As it happens I am going to be adding and changing a lot of stuff in my script suites along the way, and whenever someone installs it, I would like them to get the newest version. With pip, that means that on every commit to my repository, I will also have to re-submit a package to the PyPI index. That's a lot of unnecessary work.
Is there any way to provide an easy cross-platform installation (via pip or otherwise) which pulls the files directly from my github repo?
I'm not sure if I understand your problem entirely, but you might want to use pip's editable installs[1]
Here's a brief, artificial example; let's suppose you want to use git as your VCS.
git clone url_to_myrepo.git path/to/local_repository
pip install [--user] -e path/to/local_repository
The installation of the package will reflect the state of your local repository. Therefore there is no need to reinstall the package with pip when the remote repository gets updated. Whenever you pull changes to your local repository, the installation will be up-to-date as well.
[1] http://pip.readthedocs.org/en/latest/reference/pip_install.html#editable-installs

Locally modify package from pip

I've locally installed a Python package via pip in a virtualenv. I'd like to modify it (not monkey-patch or subclass it, but deeply modify it) and keep it in my source control system, referenced without installing. Maybe later I'd like to package it again, so I'd like to keep all the files needed for creating the package, not only the Python sources.
Should I just copy it to my project folder and uninstall it from the virtualenv?
Two points. One, are the changes you're planning to make useful for anyone else? If so, you might consider cloning the source repo, making your changes, and submitting a PR. Even if it's not immediately merged, you can use setup.py to create a local package and install that in your virtualenv.
And two, are you planning to use these changes in just one project, or across many projects? If it's just for one project, throwing it in your repo and deeply modifying it is probably fine (although you need to confirm the license allows you to do so). If you can foresee using it in multiple projects, you're probably better off creating a separate repo for it and packaging it via setup.py.

Is there a way to "version" my python distribution?

I'm working by myself right now, but am looking at ways to scale my operation.
I'd like to find an easy way to version my Python distribution, so that I can recreate it very easily. Is there a tool to do this? Or can I add /usr/local/lib/python2.7/site-packages/ (or whatever) to an svn repo? This doesn't solve the problems with PATHs, but I can always write a script to alter the path. Ideally, the solution would be to build my Python env in a VM, and then hand copies of the VM out.
How have other people solved this?
virtualenv + requirements.txt are your friends.
You can create several virtual Python installs for your projects, each containing exactly the library versions you need. (Tip: pip freeze spits out a requirements.txt with the exact library versions.)
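A typical round trip looks like this; "somepackage" and the directory names are placeholders:

# Create an isolated environment and record the exact package versions.
virtualenv venv
. venv/bin/activate
pip install somepackage
pip freeze > requirements.txt

# Later, or on another machine: recreate the same environment.
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt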
Find a good reference to virtualenv here: http://simononsoftware.com/virtualenv-tutorial/ (it's from this question Comprehensive beginner's virtualenv tutorial?).
Alternatively, if you just want to distribute your code together with libraries, PyInstaller is worth a try. You can package everything together in a static executable - you don't even have to install the software afterwards.
You want to use virtualenv. It lets you create application-specific directories for installed packages. You can also use pip to generate a requirements.txt.
With the same goal, i.e. having the exact same Python distribution as my colleagues, I tried to create a virtual environment on a network drive, so that all of us would be able to use it without anybody making a local copy.
The idea was to share the same packages installed in a shared folder.
Outcome: Python ran so unbearably slowly that it could not be used. Installing a package was also very sluggish.
So it looks like there is no other way than using virtualenv and a requirements file. (Even if, unfortunately, it does not always work smoothly on Windows and requires manual installation of some packages and dependencies, at least at the time of writing.)
