Edit package installed by pip - python

I'm trying to edit a package that I installed via pip, called py_mysql2pgsql (I hit an error when converting my db from MySQL to PostgreSQL, just like this).
However, when I go to the folder /usr/local/lib/python2.7/dist-packages/py_mysql2pgsql-0.1.5.egg-info, I cannot find the source code for the package. I only find PKG-INFO and some text files.
How can I find the actual source code for a package (or in particular, this package)?
Thanks

TL;DR:
Modifying in place is dangerous. Modify the source and then install it from your modified version.
Details
pip is a tool for managing the installation of packages. You should not modify files created during package installation. At best, doing so means pip will believe a particular version of the package is installed when it isn't, which will not interact well with upgrades. I suspect pip would simply overwrite your customizations, discarding them forever, but I haven't confirmed that. (The other possibility is that it checks whether files have changed and throws an error if so, but I don't think that's likely.) It also misleads other users of the system: they see that a package is installed, but what's actually there is not the version indicated; it's your customized version. That's likely to cause confusion if they install the unmodified version somewhere else or expect particular behavior from the version indicated.
If you want to modify the source code, the right thing to do is modify it and then either build a new, custom package or just install from source. py-mysql2pgsql provides instructions for performing a source install:
> git clone git://github.com/philipsoutham/py-mysql2pgsql.git
> cd py-mysql2pgsql
> python setup.py install
You can clone the source, modify it, and then install it without using pip. You could alternatively build your own customized version of the package if you need to redistribute it internally. The project uses setuptools for building its packages, so you only need to familiarize yourself with setuptools to make use of its setup.py file. Just make sure that installing this way doesn't create any misleading entries in pip's package list; if it does, either find a way to make the entry clearer or find an alternative install method.
Since you've discovered a bug in the software, I also highly recommend forking it on GitHub and submitting a pull request once you have it fixed. If you do so, you can use the above installation instructions just by changing the repository URL to your fork. If you don't fork it, at least file an issue describing the changes that fix it.
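If you would rather keep pip's records consistent, pip can also install straight from a Git repository; a minimal sketch, substituting your fork's URL for the upstream one shown here:
> pip install git+https://github.com/philipsoutham/py-mysql2pgsql.git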
Alternatives:
You could copy all the source code into your project, modify it there, and then distribute the modified version with the rest of your code. (Make sure you don't violate the license if you do so.)
You might be able to solve your problem at runtime. Monkey-patching the module is a little risky if other people on your team don't expect the change in behavior, but it works for globally modifying the module's behavior. You could also write some wrapper code around the buggy code: it takes the input, calls the buggy code, and either prevents or handles the bug (e.g., by modifying the input to make it work, or by catching an exception and handling it).
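As a minimal sketch of monkey-patching, here using json.dumps as a stand-in for the buggy function (the behavior change is purely illustrative):
import json

# Keep a reference to the original so the wrapper can delegate to it.
_original_dumps = json.dumps

def patched_dumps(obj, **kwargs):
    kwargs.setdefault('sort_keys', True)  # illustrative "fix" applied around the original
    return _original_dumps(obj, **kwargs)

# Monkey-patch: any later lookup of json.dumps now finds the wrapper.
json.dumps = patched_dumps

print(json.dumps({'b': 2, 'a': 1}))  # {"a": 1, "b": 2}
Note that code which did from json import dumps before the patch ran still holds the original function, which is one reason this technique can surprise teammates.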

Just print out the module's __file__ attribute:
>>> import numpy
>>> numpy.__file__
'/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/numpy/__init__.py'
Obviously the path and specific package will be different for you, but this is a pretty foolproof way of tracking down the source file of any module in Python.

You can patch pip packages quite easily with the patch command.
When you do this, it's important that you specify exact version numbers for the packages that you patch.
I recommend using Pipenv: it creates a lock file in which the versions of all dependencies and sub-dependencies are pinned, so the same versions of packages are always installed. It also manages your virtual env and makes it convenient to use the method described here.
The first argument to the patch command is the file you want to patch. So that should be the module of the pip package, which is probably inside a virtualenv.
If you use Pipenv, you can get the virtual env path with pipenv --venv, so then you could patch the requests package like this:
patch $(pipenv --venv)/lib/python3.6/site-packages/requests/api.py < requests-api.patch
The requests-api.patch file is a diff file, which could look like:
--- requests/api.py 2022-05-03 21:55:06.712305946 +0200
+++ requests/api_new.py 2022-05-03 21:54:57.002368710 +0200
@@ -54,6 +54,8 @@
<Response [200]>
"""
+ print(f"Executing {method} request at {url}")
+
# By using the 'with' statement we are sure the session is closed, thus we
# avoid leaving sockets open which can trigger a ResourceWarning in some
# cases, and look like a memory leak in others.
You can make the patch file like this:
diff -u requests/api.py requests/api_new.py > requests-api.patch
Where requests/api_new.py would be the new, updated version of requests/api.py.
The -u flag to the diff command gives a unified diff format, which can be used to patch files later with the patch command.
So this method can be used in an automated process. Just make sure that you have specified an exact version number for the module that you patch: you don't want the module to upgrade unexpectedly, because you might have to update the patch file. Keep in mind that if you ever manually upgrade the module, you should also check whether the patch file needs to be recreated, and recreate it if necessary. That is only necessary when the file that you are patching has changed in the new version of the package.
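For example, with Pipenv the pin lives in your Pipfile; the version shown here is illustrative:
[packages]
requests = "==2.27.1"
With an exact pin like this, Pipenv will never silently move you to a version of requests that your patch file no longer applies to.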

The py-mysql2pgsql package is hosted on PyPI: https://pypi.python.org/pypi/py-mysql2pgsql
If you want the code for that specific version, just download the source tarball from PyPI (py-mysql2pgsql-0.1.5.tar.gz).
Development is hosted on GitHub: https://github.com/philipsoutham/py-mysql2pgsql

Related

Behavior change in pip wheel

I have a Python application that I am upgrading from Python 3.6 to 3.9 (some of our dependencies don't support 3.10 yet). For... complicated reasons we use a single source tree to build various different configurations with different subsets of files, so part of setup.py is devoted to figuring out which files aren't used and deleting them. With Python 3.6.9 and Pip 20.3.3, I can just run pip wheel --no-deps . and it works fine, as pip creates a working directory in $TMPDIR, copies the source tree, and only works from the copy. However, using Python 3.9.14 and Pip 22.0.4, it seems something has changed and Pip now operates on the current directory itself, meaning it deletes the "original" copies of files instead. Not a huge problem, I can just use Git to restore everything to the last commit (or manually create and work from a temporary directory), but it would be simpler to enable the old behavior. Nothing in the documentation for pip wheel says anything about using (or not using) a temporary directory, and I can't find anything about when or why this behavior might have changed. Is it possible to make newer versions of Pip enable the old behavior?
It turns out to have been due to a change in Pip's behavior. Pip 21.1 introduced the --use-feature=in-tree-build option, and Pip 21.3 made in-tree builds the default while introducing the deprecated option --use-deprecated=out-of-tree-build as a temporary workaround to enable the old behavior; that option went from deprecated to removed in Pip 22.1b1.
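In command form, given the version ranges above (the second invocation only works on Pip 21.3 through 22.0.x):
$ pip wheel --no-deps .                                     # in-tree build, the default since 21.3
$ pip wheel --no-deps --use-deprecated=out-of-tree-build .  # old behavior; removed in 22.1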
Options we're considering for our workflow:
use the exclude_package_data argument to setup() (see the sketch below)
change our automated build system to automatically run git checkout -- . afterwards
change our automated build system to copy the necessary files to a temporary directory itself, basically moving that functionality up a stage
figure out a better way to accomplish what we currently accomplish by omitting files
Which might be better for you, oh traveller on the dunes of time, I cannot say.
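For the first option, a minimal sketch of exclude_package_data, with a hypothetical package name and glob patterns:
from setuptools import setup, find_packages

setup(
    name='myapp',  # hypothetical project name
    packages=find_packages(),
    include_package_data=True,
    # Keep these files in the source tree but leave them out of the
    # built distribution (patterns are illustrative).
    exclude_package_data={'myapp': ['configs/unused/*.cfg', 'templates/experimental/*.html']},
)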

How to properly check if a module is installed, and if not, attempt to install it

I am working in Python 3.6+ and want to check if a few different modules are installed from within my script. If not, I want to attempt to install them with a few caveats:
1) The proper way to do this, if I remember reading it correctly, is to look into 'packaging and versioning', possibly with setuptools; I'm not really sure. There is a DigitalOcean page on this that is hard to follow. It discusses the topic, but I keep running into an issue with documents in this area: they all assume the project will be uploaded to PyPI for use with pip. I specifically do not want this. I want to distribute this directly to individuals, by hand, and maybe in the future have it available in a closed, not-open-to-everyone GitHub repo.
Currently in my script I'm using a try and except: I try to import these modules, and if they don't exist I run this exception handler, which I don't know if it works:
try:
    import colorama
except ImportError:
    from pip._internal import main as pip
    pip(['install', 'colorama'])  # 'colorama' must be quoted: the name isn't defined yet
    import colorama
print('colorama imported successfully')
And for what it's worth, I have no idea what pip(['install', 'colorama']) is doing.
The packaging aspect seems to include imported modules with your code. How does one perform this? Instead of checking if colorama is installed and then attempting to launch a subprocess to install it, how do I just include the entire thing, assuming this is the 'right' way to do it?
One thing that's usually done to avoid this problem is to build your program in a virtual environment which you know contains the correct packages, and then either
package the entire virtual environment with your project as a unit, or
write a requirements.txt file that lists all the packages (and versions) that are expected to be installed before the user runs the program (you'd install everything on that list by doing pip install -r requirements.txt on the command line before running the program with python script_name.py)
Ideally, you'd then have your script fail if the required dependencies aren't there, and have the user install them manually to fix the issue.
Here's Python 3's documentation on virtual environments.
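A minimal sketch of that fail-fast approach (the module list is illustrative):
import sys

REQUIRED = ['colorama']  # modules the script needs

missing = []
for name in REQUIRED:
    try:
        __import__(name)
    except ImportError:
        missing.append(name)

if missing:
    # Fail with instructions instead of installing behind the user's back.
    sys.exit('Missing required packages: %s\n'
             'Run: pip install -r requirements.txt' % ', '.join(missing))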
What you're doing now is unconventional; if it's working, it's working, but it's not great practice. The biggest reason is that your script, in its current state, installs software on the user's machine without their consent: the user did not tell the program to install this software and was not told that it was necessary, but the program installs it anyway. In your case this may be harmless, but in general it's something to stay away from, because it can get into really shady territory.

Easy-install live python libraries/scripts

I have a number of Python "script suites" (as I call them) which I would like to make easy to install for my colleagues. I have looked into pip, and that seems really nice, but under that regime (as I understand it) I would have to submit a static version and update it on every change.
As it happens I am going to be adding and changing a lot of stuff in my script suites along the way, and whenever someone installs it, I would like them to get the newest version. With pip, that means that on every commit to my repository, I will also have to re-submit a package to the PyPI index. That's a lot of unnecessary work.
Is there any way to provide an easy cross-platform installation (via pip or otherwise) which pulls the files directly from my github repo?
I'm not sure if I understand your problem entirely, but you might want to use pip's editable installs[1]
Here's a brief, artificial example, supposing you use Git as your VCS:
git clone url_to_myrepo.git path/to/local_repository
pip install [--user] -e path/to/local_repository
The installation of the package will reflect the state of your local repository. Therefore there is no need to reinstall the package with pip when the remote repository gets updated. Whenever you pull changes to your local repository, the installation will be up-to-date as well.
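If your colleagues should track the remote repository directly rather than a local clone, pip can also do an editable install straight from a Git URL; the repository URL and egg name here are placeholders:
pip install [--user] -e git+https://github.com/yourname/myrepo.git#egg=my_package
Re-running the same command later updates the checkout pip created to your latest commit.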
[1] http://pip.readthedocs.org/en/latest/reference/pip_install.html#editable-installs

How to import libraries from source code in Python?

I am trying to write a Python script that I could easily export to friends without dependency problems, but I am not sure how to do so. Specifically, my script relies on code from BeautifulSoup, but rather than forcing friends to install BeautifulSoup, I would rather just package the source for BeautifulSoup into a Libraries/ folder in my project files and call functions from there. However, I can then no longer simply "import bs4". What is the correct way of going about this?
Thanks!
A common approach is to ship a requirements file with a project, specifying which version of which library is required. This file is (by convention) often named requirements.txt and looks something like this:
MyApp
BeautifulSoup==3.2.1
SomeOtherLib==0.9.4
YetAnother>=0.2
(The fictional file above says: I need exactly BeautifulSoup 3.2.1, exactly SomeOtherLib 0.9.4, and any version of YetAnother greater than or equal to 0.2.)
Then the user of this project can simply take your library, (create a virtualenv) and run
$ pip install -r requirements.txt
which will then fetch all the libraries and make them available either system-wide or project-wide (if a virtualenv is used). Here's a random Python project off GitHub that has a requirements file:
https://github.com/laoqiu/pypress
https://github.com/laoqiu/pypress/blob/master/requirements.txt
The nice thing about this approach is that you'll get your transitive dependencies resolved automatically. Also, if you use virtualenv, you'll get a clean separation of your projects and avoid library version collisions.
You must add Libraries/ (converted to an absolute path first) to sys.path before attempting to import anything under it.
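A minimal sketch of that approach, assuming the script sits at the project root with Libraries/ next to it:
import os
import sys

# Make the bundled Libraries/ folder importable before importing from it.
LIB_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'Libraries')
sys.path.insert(0, LIB_DIR)

import bs4  # now resolved from Libraries/bs4 instead of site-packages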

Use setuptools to install from location

I have a framework for a site that I want to use in multiple projects, but I don't want to submit my framework to PyPI. Is there any way I can tell my setup.py to install the framework from a specific location?
Here is my current setup.py
from setuptools import setup
setup(
    name='Website',
    version='0.2.1',
    install_requires=[
        'boto>=2.6',
        'fabric>=1.4',
        'lepl>=5.1',
        'pygeoip>=0.2.4',
        'pylibmc>=1.2.3',
        'pymongo>=2.2',
        'pyyaml>=3.1',
        'requests>=0.12',
        'slimit>=0.7.4',
        'thrift>=0.8.0',
        'tornado>=2.3',
    ],
)
Those dependencies are actually all the dependencies for my framework, so if I could include the framework somehow, it could be the only thing listed here.
It looks like all of your requirements are public (on PyPI), and you don't need specific versions of them, just "new enough". In 2016, when you can count on everyone having a recent-ish version of pip, there's really nothing to do. If you just pip install . from the source directory or pip install git+https://url/to/package or similar, it will just pull the latest versions of the dependencies off the net. The fact that your package isn't on PyPI won't stop pip from finding its dependencies there.
Or, if you want to stash them all locally, you can set up a local PyPI index. Although in that case, it probably would be simpler to push your package to that same local index, and install it from there.
If you need anything more complicated, a requirements file can take care of that for you.
In particular, if you need to distribute the package to other people in your organization who may not have your team's local index set up, or for some reason you can't set up a local index in the first place, you can put all the necessary information in the requirements file, or, if it's more appropriate, on the command line used to install your package (which even works if you're stuck with easy_install or ancient versions of pip).
The documentation gives full details, and this blog post explains it very nicely, but the short version is this:
If you have a local PyPI index, provide --extra-index-url=http://my.server/path/to/my/pypi/.
If you've got an HTTP server that you can drop the packages on, and you can enable the "auto index directory contents" option in your server, just provide --find-links=http://my.server/path/to/my/packages/.
If you want to use local files (or SMB/AFP/etc. file sharing), create a trivial HTML file with nothing but links to all local packages, and provide --find-links=file:///path/to/my/index.html.
Again, these can go on the command line of your "to install this package, run this" instructions (or in a curl | sh install script), but usually you just want to put them in a requirements file. If so, make sure to use only one value per option (e.g., if you want to add two extra indexes, add two --extra-index-url params) and put each one on its own line.
A requirements file also lets you specify specific versions of each package, so you can be sure people are deploying with the same code you developed and tested with, which is often useful in these kinds of situations.
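Putting it together, such a requirements file might look like this (the URLs are placeholders and the pinned versions are illustrative):
--extra-index-url=http://my.server/path/to/my/pypi/
--find-links=http://my.server/path/to/my/packages/
Website==0.2.1
boto==2.8.0
tornado==2.4.1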
