How to import libraries from source code in Python?

I am trying to write a Python script that I can easily share with friends without dependency problems, but I'm not sure how to do so. Specifically, my script relies on code from BeautifulSoup, but rather than forcing friends to install BeautifulSoup, I would rather just package the source for BeautifulSoup into a Libraries/ folder in my project files and call functions from there. However, I can then no longer simply "import bs4". What is the correct way of going about this?
Thanks!

A common approach is to ship a requirements file with the project, specifying which version of which library is required. This file is (by convention) often named requirements.txt and looks something like this:
MyApp
BeautifulSoup==3.2.1
SomeOtherLib==0.9.4
YetAnother>=0.2
(The fictional file above says: I need exactly BeautifulSoup 3.2.1, exactly SomeOtherLib 0.9.4, and any version of YetAnother greater than or equal to 0.2.)
Then the user of this project can simply take your library, (optionally) create a virtualenv, and run
$ pip install -r requirements.txt
which will then fetch all the libraries and make them available either system-wide or project-wide (if a virtualenv is used). Here's a random Python project off GitHub that has a requirements file:
https://github.com/laoqiu/pypress
https://github.com/laoqiu/pypress/blob/master/requirements.txt
The nice thing about this approach is that you'll get your transitive dependencies resolved automatically. Also, if you use virtualenv, you'll get a clean separation of your projects and avoid library version collisions.

You must add Libraries/ (converted to an absolute path first) to sys.path before attempting to import anything under it.
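For example, a minimal sketch, assuming the unpacked bs4 source tree sits directly under a Libraries/ folder next to your script:
import os
import sys

# Resolve Libraries/ relative to this script and put it at the front
# of the module search path before importing anything from it.
LIB_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "Libraries")
sys.path.insert(0, LIB_DIR)

import bs4  # now found in Libraries/bs4/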

Related

Python code checker for modules included in requirements.txt but unused? [duplicate]

Is there any easy way to delete no-longer-used packages from a requirements file?
I wrote a bash script for this task, but it doesn't work as I expected, because some packages are not imported under their PyPI project names. For example, the
dj-database-url
package is imported as
dj_database_url
My project has many packages in its requirements file, so searching for them one by one is messy, error-prone, and takes too much time. As far as I've searched, IDEs don't have this feature yet.
You can use Code Inspection in PyCharm.
Delete the contents of your requirements.txt but keep the empty file.
Load your project in PyCharm and go to Code -> Inspect Code....
Choose Whole project option in dialog and click OK.
In the inspection results panel, locate the Package requirements section under Python (note that this section will be shown only if there is a requirements.txt or setup.py file).
The section will contain one of the following messages:
Package requirement '<package>' is not satisfied if there is any package that is listed in requirements.txt but not used in any .py file.
Package '<package>' is not listed in project requirements if there is any package that is used in .py files, but not listed in requirements.txt.
You are interested in the second inspection.
You can add all used packages to requirements.txt by right-clicking the Package requirements section and selecting Apply Fix 'Add requirements '<package>' to requirements.txt'. Note that it will show only one package name, but it will actually add all used packages to requirements.txt when invoked on the section.
If you want, you can add them one by one: just right-click the inspection corresponding to a certain package, choose Apply Fix 'Add requirements '<package>' to requirements.txt', and repeat for each inspection of this kind.
After that you can create a clean virtual environment and install the packages from the new requirements.txt.
Also note that PyCharm has an import optimisation feature; see Optimize imports.... It can be useful to run it before any of the other steps listed above.
The best bet is to use a (fresh) Python venv/virtualenv with no packages, or only those you definitely know you need. Test your package, installing missing packages with pip as you hit problems (this should be quite quick for most software), and then use the pip freeze command to list the packages you really need. Better still, you could use pip wheel to create a wheel containing those packages.
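A sketch of that workflow, assuming a POSIX shell (the installed package names are purely illustrative):
$ python -m venv fresh-env
$ . fresh-env/bin/activate
$ pip install flask requests    # whatever you find you actually need
$ pip freeze > requirements.txt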
The other approach would be to:
Use pylint to check each file for unused imports and delete them (you should be doing this anyway); see the example command after this list,
Run your tests to make sure that it was right,
Use a tool like snakefood or snakefood3 to generate your new list of dependencies
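For the pylint step, a command along these lines reports only the unused imports (the package path is illustrative):
$ pylint --disable=all --enable=unused-import mypackage/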
Note that for any dependency checking to work well, it is advisable to avoid conditional imports and imports within functions.
Also note that, to be sure you have everything, it is a good idea to build a new venv/virtualenv, install from your dependencies list, and then re-test your code.
You can find obsolete dependencies by using deptry, a command line utility that checks for various issues with a project's dependencies, such as obsolete, missing or transitive dependencies.
Add it to your project with
pip install deptry
and then run
deptry .
Example output:
-----------------------------------------------------
The project contains obsolete dependencies:
Flask
scikit-learn
scipy
Consider removing them from your projects dependencies. If a package is used for development purposes, you should add
it to your development dependencies instead.
-----------------------------------------------------
Note that for the best results, you should be using a virtual environment for your project, see e.g. here.
Disclaimer: I am the author of deptry.
In pycharm go to Tools -> Sync Python Requirements. There's a 'Remove unused requirements' checkbox.
I've used pip-check-reqs with success.
With the command pip-extra-reqs your_directory it will check for all unused dependencies in your_directory.
Install it with pip install pip-check-reqs.

Difference between adding path to PYTHONPATH and installing your own module

I'm working on a python project that contains a number of routines I use repeatedly. Instead of rewriting code all the time, I just want to update my package and import it; however, it's nowhere near done and is constantly changing. I host the package on a repo so that colleagues on various machines (UNIX + Windows) can pull it into their local repos and use it.
It sounds like I have two options: either I keep reinstalling the package after every change, or I just add the folder to my system's path. If I change the package, does it need to be reinstalled? I'm using this blog post as inspiration, but the author there doesn't stress the issue of a continuously changing package structure, so I'm not sure how to deal with this.
Also, if I wanted to split the project into multiple files and bundle it as a package, at what level in the directory structure does PYTHONPATH need to point? To the main project directory, or the sample/ directory?
README.rst
LICENSE
setup.py
requirements.txt
sample/__init__.py
sample/core.py
sample/helpers.py
docs/conf.py
docs/index.rst
tests/test_basic.py
tests/test_advanced.py
In this example, I want to be able to just import the package itself and call the modules within it like this:
import sample
arg = sample.helpers.foo()
out = sample.core.bar(arg)
print(out)
where helpers contains a function called foo and core contains a function called bar.
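Note that for a bare import sample to expose sample.helpers and sample.core, the package's __init__.py has to import the submodules; a minimal sketch:
# sample/__init__.py
# Re-export the submodules so that `import sample` alone
# makes sample.core and sample.helpers available.
from . import core, helpers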
PYTHONPATH is a valid way of doing this, but in my (personal) opinion it's more useful if you have a completely separate place where you keep your Python packages, like /opt/pythonpkgs or so.
For projects where I want it to be installed and also I have to keep developing, I use develop instead of install in setup.py:
When installing the package, don't do:
python setup.py install
Rather, do:
python setup.py develop
What this does is create a symlink/shortcut (I believe it's called an egg-link in Python) in the Python libs directory (where the packages are installed) that points to your module's directory. Hence, since it's only a shortcut/symlink/egg-link, whenever you change a Python file the change is reflected immediately the next time you import that file.
Note: Using this, if you delete the repository/directory you ran this command from, the package will cease to exist (as it's only a shortcut).
The equivalent in pip is -e (for editable):
pip install -e .
Instead of:
pip install .
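A quick way to confirm the editable install took effect (using the sample package from the question above):
$ python -c "import sample; print(sample.__file__)"
The printed path should point into your working tree, not into site-packages.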

Edit package installed by pip

I'm trying to edit a package that I installed via pip, called py_mysql2pgsql (I had an error when converting my db from MySQL to Postgres, just like this).
However, when I go to the folder /usr/local/lib/python2.7/dist-packages/py_mysql2pgsql-0.1.5.egg-info, I cannot find the source code for the package. I only find PKG-INFO and text files.
How can I find the actual source code for a package (or in particular, this package)?
Thanks
TL;DR:
Modifying in place is dangerous. Modify the source and then install it from your modified version.
Details
pip is a tool for managing the installation of packages. You should not modify files created during package installation. At best, doing so would mean pip believes a particular version of the package is installed when it isn't. This would not interact well with the upgrade function; I suspect pip would just overwrite your customizations, discarding them forever, but I haven't confirmed. The other possibility is that it checks whether files have changed and throws an error if so (I don't think that's likely). It also misleads other users of the system: they see that you have a package installed, but you don't actually have the version indicated; you have a customized version. This is likely to result in confusion if they try to install the unmodified version somewhere else, or if they expect some particular behavior from the installed version.
If you want to modify the source code, the right thing to do is modify the source code and either build a new, custom package or just install from source. py-mysql2pgsql provides instructions for performing a source install:
> git clone git://github.com/philipsoutham/py-mysql2pgsql.git
> cd py-mysql2pgsql
> python setup.py install
You can clone the source, modify it, and then install without using pip. You could alternatively build your own customized version of the package if you need to redistribute it internally. This project uses setuptools for building its packages, so you only need to familiarize yourself with setuptools to make use of their setup.py file. Make sure that installing it this way doesn't create any misleading entries in pip's package list. If it does, either find a way to make sure the entry is more clear or find an alternative install method.
Since you've discovered a bug in the software, I also highly recommend forking it on Github and submitting a pull request once you have it fixed. If you do so, you can use the above installation instructions just by changing the repository URL to your fork. If you don't fork it, at least file an issue and describe the changes that fix it.
Alternatives:
You could copy all the source code into your project, modify it there, and then distribute the modified version with the rest of your code. (Make sure you don't violate the license if you do so.)
You might be able to solve your problem at runtime. Monkey-patching the module is a little risky if other people on your team might not expect the change in behavior, but it can be done to globally modify the module's behavior. You could also create some additional code that wraps the buggy code: it takes the input, calls the buggy code, and either prevents or handles the bug (e.g., by modifying the input to make it work, or by catching an exception and handling it).
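To illustrate the monkey-patching idea, a minimal sketch (the module and function names here are hypothetical, not py_mysql2pgsql's actual API):
import buggy_module  # hypothetical module containing the bug

_original = buggy_module.convert_row  # keep a reference to the real function

def _patched_convert_row(row):
    # Work around the bug by sanitizing the input before delegating
    # to the original implementation.
    if row is None:
        row = {}
    return _original(row)

buggy_module.convert_row = _patched_convert_row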
just print out the .__file__ attribute of the module:
>>> import numpy
>>> numpy.__file__
'/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/numpy/__init__.py'
Obviously the path and the specific package will be different for you, but this is a pretty foolproof way of tracking down the source file of any module in Python.
You can patch pip packages quite easily with the patch command.
When you do this, it's important that you specify exact version numbers for the packages that you patch.
I recommend using Pipenv. It creates a lock file in which the versions of all dependencies and sub-dependencies are locked, so that the same versions of packages are always installed. It also manages your virtual env, and it makes it convenient to use the method described here.
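For illustration, an exact pin in a Pipfile might look like this (the version number is purely illustrative):
[packages]
requests = "==2.27.1"   # pinned exactly so the patch keeps applying cleanly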
The first argument to the patch command is the file you want to patch. So that should be the module of the pip package, which is probably inside a virtualenv.
If you use Pipenv, you can get the virtual env path with pipenv --venv, so then you could patch the requests package like this:
patch $(pipenv --venv)/lib/python3.6/site-packages/requests/api.py < requests-api.patch
The requests-api.patch file is a diff file, which could look like this:
--- requests/api.py 2022-05-03 21:55:06.712305946 +0200
+++ requests/api_new.py 2022-05-03 21:54:57.002368710 +0200
@@ -54,6 +54,8 @@
<Response [200]>
"""
+ print(f"Executing {method} request at {url}")
+
# By using the 'with' statement we are sure the session is closed, thus we
# avoid leaving sockets open which can trigger a ResourceWarning in some
# cases, and look like a memory leak in others.
You can make the patch file like this:
diff -u requests/api.py requests/api_new.py > requests-api.patch
Where requests/api_new.py would be the new, updated version of requests/api.py.
The -u flag to the diff command gives a unified diff format, which can be used to patch files later with the patch command.
So this method can be used in an automated process. Just make sure that you have specified exact version numbers for the module that you patch; you don't want the module to upgrade unexpectedly, because then you might have to update the patch file. Keep in mind that if you ever manually upgrade the module, you should check whether the patch file needs to be recreated, and recreate it if necessary. That is only needed when the file you are patching has changed in the new version of the package.
The py_mysql2pgsql package is hosted on PyPI: https://pypi.python.org/pypi/py-mysql2pgsql
If you want the code for that specific version, just download the source tarball from PyPI (py-mysql2pgsql-0.1.5.tar.gz)
Development is hosted on GitHub: https://github.com/philipsoutham/py-mysql2pgsql

how to check what packages are used to include in a requirements.txt file?

I'm starting to dive into Python but I'm a bit confused about how the requirements.txt file works. How do I know what to include in it?
For example, in the current project I'm working on I only installed Flask. So do I just add only Flask to that file? Or are there other packages that I don't know about? If so, is there a way to find out (e.g. display a full list)?
You could run pip to get the list of requirements for your project.
pip freeze > requirements.txt
You could just "grep" the Python source files in your project for "import " to get an exhaustive list of packages you use. Remove the obvious ones that are part of the standard library, like datetime or whatever, and the rest are what you might include in requirements.txt.
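For example, something along these lines collects the import lines (a rough sketch using GNU grep; you would still filter out standard-library and local modules by hand):
$ grep -rhE "^(import|from) " --include="*.py" . | sort -u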
I don't know of a more "automatic" way to do it; another way might be to set up a clean virtualenv or other sandboxed install of Python with no extra packages, and try installing your software in there using only your requirements.txt.

Packaging single Python module with dependencies

I have a single Python 3 .py module with a few dependencies (lockfile, python-daemon). Is there a simple way to package this with its dependencies so that users do not need to download and install the other modules? An all-included install is what I am trying to do.
I tried looking at setuptools, distribute, and distutils and ended up even more confused than when I started.
The simplest way I often see used is to put all your dependencies in a single file (usually named requirements.txt) and then ask the user to run the following command:
pip install -r requirements.txt
And here is an example for the content of the file (https://github.com/cenkalti/pypi-notifier/blob/master/requirements.txt):
Flask==0.10.1
Flask-Cache==0.12
Flask-SQLAlchemy==1.0
Flask-Script==0.5.3
GitHub-Flask==0.3.4
Jinja2==2.7
MarkupSafe==0.18
SQLAlchemy==0.8.2
...
cx_Freeze should do what you're looking for.
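A minimal cx_Freeze setup.py sketch, assuming your single module is named mytool.py (all names here are illustrative):
from cx_Freeze import setup, Executable

setup(
    name="mytool",
    version="0.1",
    description="Single-module tool frozen together with its dependencies",
    executables=[Executable("mytool.py")],
)
Running python setup.py build should then produce a build/ directory containing a standalone executable plus the dependencies it needs.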
You could quite easily do this with something simple like a .zip file containing all the files; as long as all the files are extracted to the same directory, they should all work! The downfall is if there are lots of dependencies for the modules, i.e. they have extra folders you would need to find.
I also think a fair number of people/companies write their own packaging systems so that all the modules are in one .py file that opens in the console and exports everything to its correct place. This would require a fair amount of work, though, so you may want to try to find one prebuilt. I've gone down this route and it didn't prove too taxing until I had to unzip .zips with files in...
As another solution you could try PyExe (I think it's called that) to export everything to a single .exe file (Windows only, though).
I personally haven't used setuptools, distribute or distutils, so I can't comment on those, unfortunately.
One other thing to bear in mind is the licence for each module; some may not allow redistribution, so check first!
py2exe is fine, but will limit you to Windows.
The best way to do this without limiting your audience, while following generally accepted best practices, is to create a requirements.txt and a setup.py file and then upload your project to GitHub. See https://github.com/sourcegraph/python-deps as a reference. The requirements.txt lists the dependencies in a simple, easy-to-read format, and you specify the commands and library modules your project installs using the scripts and py_modules options in setup.py.
Assuming your git repository is at github.com/foo/bar, your users can then do pip install git+https://github.com/foo/bar.
Read the official Python packaging tutorial.
You create a Python package from your module with setup.py
In setup.py you can declare what other Python packages must be installed as dependencies
Furthermore, you can pin application-specific dependency versions with requirements.txt.
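A minimal setup.py sketch for the single-module case described in the question (the module name mytool is hypothetical; the lockfile and python-daemon dependencies come from the question):
from setuptools import setup

setup(
    name="mytool",
    version="0.1.0",
    py_modules=["mytool"],   # a single-module project, no package directory
    install_requires=[
        "lockfile",
        "python-daemon",
    ],
)
With this in place, pip install . (or pip install git+https://... as shown above) pulls in the declared dependencies automatically.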
