Easy-install live Python libraries/scripts

I have a number of Python "script suites" (as I call them) which I would like to make easy to install for my colleagues. I have looked into pip, and that seems really nice, but under that model (as I understand it) I would have to publish a static version and update it on every change.
As it happens, I am going to be adding and changing a lot of stuff in my script suites along the way, and whenever someone installs them, I would like them to get the newest version. With pip, that means that on every commit to my repository, I would also have to re-upload the package to the PyPI index. That's a lot of unnecessary work.
Is there any way to provide an easy cross-platform installation (via pip or otherwise) which pulls the files directly from my github repo?

I'm not sure if I understand your problem entirely, but you might want to use pip's editable installs [1].
Here's a brief, artificial example, supposing you use git as your VCS:
git clone url_to_myrepo.git path/to/local_repository
pip install [--user] -e path/to/local_repository
The installation of the package will reflect the state of your local repository. Therefore there is no need to reinstall the package with pip when the remote repository gets updated. Whenever you pull changes to your local repository, the installation will be up-to-date as well.
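After that, staying current is just a matter of pulling; no pip command is needed (the path here is the one from the example above):
cd path/to/local_repository
git pull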
[1] http://pip.readthedocs.org/en/latest/reference/pip_install.html#editable-installs

Related

Behavior change in pip wheel

I have a Python application that I am upgrading from Python 3.6 to 3.9 (some of our dependencies don't support 3.10 yet). For... complicated reasons we use a single source tree to build various different configurations with different subsets of files, so part of setup.py is devoted to figuring out which files aren't used and deleting them.
With Python 3.6.9 and Pip 20.3.3, I can just run pip wheel --no-deps . and it works fine, as pip creates a working directory in $TMPDIR, copies the source tree, and only works from the copy. However, using Python 3.9.14 and Pip 22.0.4, something has changed and Pip now operates on the current directory itself, meaning it deletes the "original" copies of files instead.
It's not a huge problem; I can just use Git to restore everything to the last commit (or manually create and work from a temporary directory), but it would be simpler to enable the old behavior. Nothing in the documentation for pip wheel says anything about using (or not using) a temporary directory, and I can't find anything about when or why this behavior might have changed. Is it possible to make newer versions of Pip enable this behavior?
It turns out to have been a deliberate change in Pip's behavior. Pip 21.1 introduced the --use-feature=in-tree-build option, and Pip 21.3 made in-tree builds the default while adding --use-deprecated=out-of-tree-build as a temporary workaround to enable the old behavior; that option went from deprecated to removed in Pip 22.1b1.
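For the pip versions where that workaround still existed (21.3 up to, but not including, 22.1), restoring the old behavior looked like this:
pip wheel --no-deps --use-deprecated=out-of-tree-build .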
Options we're considering for our workflow:
use the exclude_package_data argument to setup()
change our automated build system to run git checkout -- . afterwards automatically
change our automated build system to copy the necessary files to a temporary directory itself, essentially moving that functionality a stage higher in the pipeline (see the sketch after this list)
figure out a better way to accomplish what we currently accomplish by omitting files
Which might be better for you, oh traveller on the dunes of time, I cannot say.
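A rough sketch of the temporary-directory option, assuming a POSIX shell and that everything the build needs is committed:
tmpdir=$(mktemp -d)
git archive HEAD | tar -x -C "$tmpdir"      # export a clean copy of the tree
pip wheel --no-deps --wheel-dir "$PWD" "$tmpdir"
rm -rf "$tmpdir"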

Does the Python Package Manager **pip** have something like yarn link?

I want to fork a Python (pip) dependency that I am using and make some edits to it, and I don't want to risk a pip update/upgrade erasing my changes.
In the JavaScript world, an easy way to do what I want is the yarn link command.
Is there a command similar to yarn link when using Python/pip?
So, I found out how to do this. Instead of doing a normal pip install, you can do the following:
Check out a repo of the forked package
Then, run this command
pip install -e /path/to/the/package/on/local/file/system
This creates an editable install of the package in the folder of your choosing, so you can develop and make changes and see the effect of the changes immediately.
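For the record, the full yarn link / yarn unlink analogue looks roughly like this (somepackage and its URL are hypothetical):
git clone https://github.com/you/somepackage.git
pip install -e ./somepackage
# ... develop against your fork; when done, "unlink":
pip uninstall somepackage
pip install somepackage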
I'm sure seasoned python developers already know this. But I'm not in python everyday. I've been wanting to know how to do this for a long time. Finally figured it out. I hope this helps someone else!

Edit package installed by pip

I'm trying to edit a package that I installed via pip, called py_mysql2pgsql (I had an error when converting my db from MySQL to PostgreSQL, just like this one).
However, when I go to the folder /usr/local/lib/python2.7/dist-packages/py_mysql2pgsql-0.1.5.egg-info, I cannot find the source code for the package. I only find PKG-INFO and text files.
How can I find the actual source code for a package (or in particular, this package)?
Thanks
TL;DR:
Modifying in place is dangerous. Modify the source and then install it from your modified version.
Details
pip is a tool for managing the installation of packages. You should not modify files created during package installation. At best, doing so means pip will believe a particular version of the package is installed when it isn't, which does not interact well with the upgrade function. I suspect pip would simply overwrite your customizations, discarding them forever, but I haven't confirmed this. (The other possibility is that it checks whether files have changed and throws an error if so, though I don't think that's likely.) It also misleads other users of the system: they see that you have a package installed, but you don't actually have the version indicated; you have a customized version. This is likely to cause confusion if they try to install the unmodified version somewhere else, or if they expect some particular behavior from the version installed.
If you want to modify the source code, the right thing to do is exactly that: modify the source, then either build a new, custom package or just install from source. py-mysql2pgsql provides instructions for performing a source install:
> git clone git://github.com/philipsoutham/py-mysql2pgsql.git
> cd py-mysql2pgsql
> python setup.py install
You can clone the source, modify it, and then install without using pip. You could alternatively build your own customized version of the package if you need to redistribute it internally. This project uses setuptools for building its packages, so you only need to familiarize yourself with setuptools to make use of their setup.py file. Make sure that installing it this way doesn't create any misleading entries in pip's package list. If it does, either find a way to make sure the entry is more clear or find an alternative install method.
Since you've discovered a bug in the software, I also highly recommend forking it on GitHub and submitting a pull request once you have it fixed. If you do so, you can use the above installation instructions just by changing the repository URL to your fork, as shown below. If you don't fork it, at least file an issue and describe the changes that fix it.
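If you do fork it, the install steps are the same with your fork's URL substituted (YOUR_USERNAME is a placeholder):
> git clone git://github.com/YOUR_USERNAME/py-mysql2pgsql.git
> cd py-mysql2pgsql
> python setup.py install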
Alternatives:
You could copy all the source code into your project, modify it there, and then distribute the modified version with the rest of your code. (Make sure you don't violate the license if you do so.)
You might be able to solve your problem at runtime. Monkey-patching the module is a little risky if other people on your team might not expect the change in behavior, but it can be done to modify the module's behavior globally. You could also write some additional code that wraps the buggy code: it takes input, calls the buggy code, and either prevents or handles the bug (e.g., modifying the input to make it work, or catching an exception and handling it).
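Here is a minimal sketch of the wrapper idea; some_module and convert are hypothetical placeholders, not names from py-mysql2pgsql:
import some_module  # hypothetical module containing the buggy function

_original = some_module.convert  # keep a reference to the buggy original

def patched_convert(value):
    # hypothetical workaround: suppose the bug is triggered by None input
    if value is None:
        value = ""
    return _original(value)

# every caller that looks the function up via the module now gets the fix
some_module.convert = patched_convert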
Just print out the .__file__ attribute of the module:
>>> import numpy
>>> numpy.__file__
'/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/numpy/__init__.py'
Obviously the path and specific package will be different for you, but this is a pretty foolproof way of tracking down the source file of any module in Python.
You can patch pip packages quite easily with the patch command.
When you do this, it's important that you specify exact version numbers for the packages that you patch.
I recommend using Pipenv: it creates a lock file in which the versions of all dependencies and sub-dependencies are locked, so the same versions of packages are always installed. It also manages your virtual env, and makes it convenient to use the method described here.
The first argument to the patch command is the file you want to patch. So that should be the module of the pip package, which is probably inside a virtualenv.
If you use Pipenv, you can get the virtual env path with pipenv --venv, so then you could patch the requests package like this:
patch $(pipenv --venv)/lib/python3.6/site-packages/requests/api.py < requests-api.patch
The requests-api.patch file is a diff file, which could look like:
--- requests/api.py 2022-05-03 21:55:06.712305946 +0200
+++ requests/api_new.py 2022-05-03 21:54:57.002368710 +0200
@@ -54,6 +54,8 @@
<Response [200]>
"""
+ print(f"Executing {method} request at {url}")
+
# By using the 'with' statement we are sure the session is closed, thus we
# avoid leaving sockets open which can trigger a ResourceWarning in some
# cases, and look like a memory leak in others.
You can make the patch file like this:
diff -u requests/api.py requests/api_new.py > requests-api.patch
Where requests/api_new.py would be the new, updated version of requests/api.py.
The -u flag to the diff command gives a unified diff format, which can be used to patch files later with the patch command.
So this method can be used in an automated process. Just make sure that you specify exact version numbers for the module that you patch: you don't want the module to upgrade unexpectedly, because you might have to update the patch file. Keep in mind that if you ever upgrade the module manually, you should also check whether the patch file needs to be recreated, and recreate it if necessary. That is only necessary when the file you are patching has changed in the new version of the package.
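If you automate this, it may also be worth verifying that the patch still applies cleanly before running it for real; GNU patch's --dry-run flag does that without modifying anything (paths as in the example above):
patch --dry-run $(pipenv --venv)/lib/python3.6/site-packages/requests/api.py < requests-api.patch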
The py_mysql2pgsql package is hosted on PyPI: https://pypi.python.org/pypi/py-mysql2pgsql
If you want the code for that specific version, just download the source tarball from PyPI (py-mysql2pgsql-0.1.5.tar.gz)
Development is hosted on GitHub: https://github.com/philipsoutham/py-mysql2pgsql

Use setuptools to install from location

I have a framework for a site that I want to use in multiple projects, but I don't want to submit my framework to PyPI. Is there any way I can tell my setup.py to install the framework from a specific location?
Here is my current setup.py
from setuptools import setup

setup(
    name='Website',
    version='0.2.1',
    install_requires=[
        'boto>=2.6',
        'fabric>=1.4',
        'lepl>=5.1',
        'pygeoip>=0.2.4',
        'pylibmc>=1.2.3',
        'pymongo>=2.2',
        'pyyaml>=3.1',
        'requests>=0.12',
        'slimit>=0.7.4',
        'thrift>=0.8.0',
        'tornado>=2.3',
    ],
)
Those are actually all of my framework's dependencies, so if I could include the framework somehow, I could list only it here.
It looks like all of your requirements are public (on PyPI), and you don't need specific versions of them, just "new enough". In 2016, when you can count on everyone having a recent-ish version of pip, there's really nothing to do. If you just pip install . from the source directory or pip install git+https://url/to/package or similar, it will just pull the latest versions of the dependencies off the net. The fact that your package isn't on PyPI won't stop pip from finding its dependencies there.
Or, if you want to stash them all locally, you can set up a local PyPI index. Although in that case, it probably would be simpler to push your package to that same local index, and install it from there.
If you need anything more complicated, a requirements file can take care of that for you.
In particular, if you need to distribute the package to other people in your organization who may not have your team's local index set up, or for some reason you can't set up a local index in the first place, you can put all the necessary information in the requirements file--or, if it's more appropriate, on the command line used to install your package (which even works if you're stuck with easy_install or ancient versions of pip).
The documentation gives full details, and this blog post explains it very nicely, but the short version is this:
If you have a local PyPI index, provide --extra-index-url=http://my.server/path/to/my/pypi/.
If you've got an HTTP server that you can drop the packages on, and you can enable the "auto index directory contents" option in your server, just provide --find-links=http://my.server/path/to/my/packages/.
If you want to use local files (or SMB/AFP/etc. file sharing), create a trivial HTML file with nothing but links to all local packages, and provide --find-links=file:///path/to/my/index.html.
Again, these can go on the command line of a "to install this package, run this" (or a curl | sh install script), but usually you just want to put them in a requirements file. If so, make sure to use only one value per option (e.g., if you want to add two extra indexes, add two --extra-index-url params) and put each one on its own line.
A requirements file also lets you specify specific versions of each package, so you can be sure people are deploying with the same code you developed and tested with, which is often useful in these kinds of situations.
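For instance, a requirements file for the setup above might combine a local index with pinned versions (the URL and the version numbers are placeholders):
--extra-index-url=http://my.server/path/to/my/pypi/
Website==0.2.1
boto==2.6.0
fabric==1.4.3
tornado==2.3.2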

Django and VirtualEnv Development/Deployment Best Practices

Just curious how people are deploying their Django projects in combination with virtualenv.
More specifically, how do you keep your production virtualenvs synced correctly with your development machine?
I use git for SCM, but I don't have my virtualenv inside the git repo - should I, or is it best to use pip freeze and then re-create the environment on the server using the freeze output? (If you do this, could you please describe the steps - I am finding very little good documentation on the unfreezing process - is something like pip install -r freeze_output.txt possible?)
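To answer the last part directly: yes, that is exactly how it works. A minimal round trip looks like this:
pip freeze > requirements.txt     # on the development machine
pip install -r requirements.txt   # inside the production virtualenv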
I just set something like this up at work using pip, Fabric and git. The flow is basically like this, and borrows heavily from this script:
In our source tree, we maintain a requirements.txt file. We'll maintain this manually.
When we do a new release, the Fabric script creates an archive based on whatever treeish we pass it.
Fabric will find the SHA for what we're deploying with git log -1 --format=format:%h TREEISH. That gives us SHA_OF_THE_RELEASE
Fabric will get the last SHA for our requirements file with git log -1 --format=format:%h SHA_OF_THE_RELEASE requirements.txt. This spits out the short version of the hash, like 1d02afc which is the SHA for that file for this particular release.
The Fabric script will then look into a directory where our virtualenvs are stored on the remote host(s).
If there is not a directory named 1d02afc, a new virtualenv is created and setup with pip install -E /path/to/venv/1d02afc -r /path/to/requirements.txt
If there is an existing path/to/venv/1d02afc, nothing is done
The little magic part of this is passing whatever tree-ish you want to git, and having it do the packaging (from Fabric). By using git archive my-branch, git archive 1d02afc or whatever else, I'm guaranteed to get the right packages installed on my remote machines.
I went this route since I really didn't want to have extra virtualenvs floating around if the packages hadn't changed between releases. I also don't like the idea of having the actual packages I depend on in my own source tree.
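A condensed sketch of that flow, using Fabric 1.x's API; VENV_ROOT and the requirements path are hypothetical, and the pip install -E syntax is the old form quoted in the steps above, so treat this as illustrative only:
from fabric.api import local, run
from fabric.contrib.files import exists

VENV_ROOT = '/srv/venvs'  # hypothetical location for the venvs on the remote host

def deploy(treeish='master'):
    # SHA of the release we're deploying
    release_sha = local('git log -1 --format=format:%h ' + treeish, capture=True)
    # SHA of the last commit that touched requirements.txt as of that release
    req_sha = local('git log -1 --format=format:%h ' + release_sha + ' requirements.txt',
                    capture=True)
    venv_path = VENV_ROOT + '/' + req_sha
    if not exists(venv_path):
        # old pip syntax from the answer above; newer pips would create the
        # virtualenv first and run venv_path + '/bin/pip install -r ...' instead
        run('pip install -E ' + venv_path + ' -r /path/to/requirements.txt')
    # if the venv already exists, there is nothing to do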
I use this bootstrap.py: http://github.com/ccnmtl/ccnmtldjango/blob/master/ccnmtldjango/template/bootstrap.py
which expects a directory called 'requirements' that looks something like this: http://github.com/ccnmtl/ccnmtldjango/tree/master/ccnmtldjango/template/requirements/
There's an apps.txt, a libs.txt (which apps.txt includes--I just like to keep Django apps separate from other Python modules) and a src directory which contains the actual tarballs.
When ./bootstrap.py is run, it creates the virtualenv (wiping a previous one if it exists) and installs everything from requirements/apps.txt into it. I never install anything into the virtualenv otherwise. If I want to include a new library, I put the tarball into requirements/src/, add a line to one of the text files, and re-run ./bootstrap.py.
bootstrap.py and requirements get checked into version control (along with a copy of pip.py, so I don't even need pip installed system-wide anywhere). The virtualenv itself isn't. The scripts I have that push out to production run ./bootstrap.py on the production server each time I push. (bootstrap.py also goes to some lengths to ensure that it sticks to Python 2.5, since that's what we have on the production servers (Ubuntu Hardy), and my dev machine (Ubuntu Karmic) defaults to Python 2.6 if you're not careful.)
