Structure for developmental Python packages

Structure for developmental Python packages - python

I'm starting a brand new project. I'm splitting code of different functions (e.g., utils, database, main application) into their own respective packages such that other new projects can just add them as dependencies and import them in the future. Packages may have cross-dependencies (e.g., database depends on utils).
I know that I can build each component as a Python package and use pip to manage the dependencies. However, as I'm starting from scratch, I will be making active changes to all packages at the same time. It seems to me packaging them as a "proper package" would be quite inefficient. I envisage that I will need to add a new function in say utils, increment the version, use it database, then realise that I need another new function that belongs in utils, increment version again etc.
What would be the best way to structure the project in this scenario? I'm using conda, Python 3.10 and VSCode if that matters.

I suggest the package approach you are thinking about. The key method you're missing is to make editable installs locally for all your packages.
pip install -e . in package root directory
Editable installs will reflect their changes right away in your environment
Since you are using conda you probably want conda develop . like this answer suggests

Related

Python code checker for modules included in requirements.txt but unused? [duplicate]

Is there any easy way to delete no-more-using packages from requirements file?
I wrote a bash script for this task but, it doesn't work as I expected. Because, some packages are not used following their PyPI project names. For example;
dj-database-url
package is used as
dj_database_url
My project has many packages in its own requirements file, so, searching them one-by-one is too messy, error-prone and takes too much time. As I searched, IDEs don't have this property, yet.

You can use Code Inspection in PyCharm.
Delete the contents of your requirements.txt but keep the empty file.
Load your project in,
PyCharm go to Code -> Inspect code....
Choose Whole project option in dialog and click OK.
In inspection results panel locate Package requirements section under Python (note that this section will be showed only if there is any requirements.txt or setup.py file).
The section will contain one of the following messages:
Package requirement '<package>' is not satisfied if there is any package that is listed in requirements.txt but not used in any .py file.
Package '<package>' is not listed in project requirements if there is any package that is used in .py files, but not listed in requirements.txt.
You are interested in the second inspection.
You can add all used packages to requirements.txt by right clicking the Package requirements section and selecting Apply Fix 'Add requirements '<package>' to requirements.txt'. Note that it will show only one package name, but it will actually add all used packages to requirements.txt if called for section.
If you want, you can add them one by one, just right click the inspection corresponding to certain package and choose Apply Fix 'Add requirements '<package>' to requirements.txt', repeat for each inspection of this kind.
After that you can create clean virtual environment and install packages from new requirements.txt.
Also note that PyCharm has import optimisation feature, see Optimize imports.... It can be useful to use this feature before any other steps listed above.

The best bet is to use a (fresh) python venv/virtual-env with no packages, or only those you definitely know you need, test your package - installing missing packages with pip as you hit problems which should be quite quick for most software then use the pip freeze command to list the packages you really need. Better you you could use pip wheel to create a wheel with the packages in.
The other approach would be to:
Use pylint to check each file for unused imports and delete them, (you should be doing this anyway),
Run your tests to make sure that it was right,
Use a tool like snakefood or snakefood3 to generate your new list of dependencies
Note that for any dependency checking to work well it is advisable to avoid conditional import and import within functions.
Also note that to be sure you have everything then it is a good idea to build a new venv/virtual-env and install from your dependencies list then re-test your code.

You can find obsolete dependencies by using deptry, a command line utility that checks for various issues with a project's dependencies, such as obsolete, missing or transitive dependencies.
Add it to your project with
pip install deptry
and then run
deptry .
Example output:
-----------------------------------------------------
The project contains obsolete dependencies:
Flask
scikit-learn
scipy
Consider removing them from your projects dependencies. If a package is used for development purposes, you should add
it to your development dependencies instead.
-----------------------------------------------------
Note that for the best results, you should be using a virtual environment for your project, see e.g. here.
Disclaimer: I am the author of deptry.

In pycharm go to Tools -> Sync Python Requirements. There's a 'Remove unused requirements' checkbox.

I've used with success pip-check-reqs.
With command pip-extra-reqs your_directory it will check for all unused dependencies in your_directory
Install it with pip install pip-check-reqs.

How to manage python project that depends on multiple versions of a shared library?

I am on macOS, using brew, pyenv, and virtualenv.
I have a Python project that depends on bokeh and gdal (both python packages were installed with pip inside a virtual environment). Both bokeh and gdal depend on a system version of libopenssl, but they depend on different versions (1.0 and 1.1).
I have had this project working at various points in the past, with some combination of libraries (using pip for all python packages and brew for system packages) but when I change python versions and environments (using pyenv) to work on other projects, and then come back to this project, it no longer works. Usually something along these lines with a problem finding a shared library for openssl:
$ ./my_python_program.py
...
ImportError: dlopen(/Users/userBob/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/_ssl.cpython-37m-darwin.so, 2):
Library not loaded: /usr/local/opt/openssl#1.1/lib/libssl.1.1.dylib
Referenced from: /Users/userBob/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/_ssl.cpython-37m-darwin.so
Reason: image not found
I feel like I am eventually able to get things to work by trying random combinations of installing and uninstalling various package versions using pip and brew. But this is a fragile and inefficient way to maintain my projects.
In general what is the best way to handle this kind of situation? Do I need to simply record the exact brew and pip install/uninstall commands to get it working? Am I missing the concept of version "pinning"? Are there additional options with brew and pyenv that I am missing that might make this process easier?

I'm not sure this is the best way to do it, but I can tell you what I do usually.
First of all, I'm using Anaconda.
When I'm on a project, I switch to the relevant virtual environment.
Before switching out, when I commit/push my modifications, I also create an export file of my environment like you can find it there.
I also track this file with git, this way, if I make any modification when working on the environment, it's stored in the .yml file.
This way, I can reinstall all dependencies needed for the project if I format my machine or get a new one, etc. And the reference for every dependency I need is stored in the cloud with my sources. So in case I start getting weird behaviour, I just restore my environment with this reference file from the time when it was working.
I'm not switching between projects fast enough for me to justify automatizing this process, but I'm sure that's feasible if you want to.

Make virtualenv share already existing site-packages?

My layout is as follows:
I have various different python projects under ~/projects, each with the following structure:
~/projects/$project_name/env #This is the virtualenv
~/projects/$project_name/scripts #This is where the code actually lives
~/projects/$project_name/scripts/requirements.txt #This helps keep track of this project's dependencies
Now, this setup works great as it does the following:
Each project has it's own dependencies in its corresponding env
I can easily redeploy this project somewhere else by cloning the scripts file, creating a new virtualenv and doing pip install -r requirements.txt
The main downside of this setup is that I have multiple copies of the same packages in multiple virtual environments. I regularly end up with a couple of hundred megs for each virtual environment.
My question is:
Is there a way to share packages between multiple virtualenvs?
Things I've tried and do not work:
virtualenv --system-site-packages. This makes the system-wise packages available in the virtualenv but:
it makes it impossible to get a list of specific dependencies
I can't have multiple versions of the same dependency installed (e.g. pandas 0.16 vs pandas 0.15) which I need, as different projects have different needs.
virtualenv --extra-search-dir=/path/to/dist only works for pip, AFAICT, so not good for me.

Scrap the comment, maybe I do know an answer. It appears as though Anaconda's package management system does use symlinks. So that would basically be a virtualenv but with the feature you want. See here: How to free disk space taken up by (ana)conda?
That said, there's a large initial harddisk cost to using Conda, so investigate a bit more and decide if it will work for you.

Locally modify package from pip

I've locally installed via pip Python package in virtualenv. I'd like to modify it (not monkey patch or subclass, but deeply modify) and keep it in my source control system referencing without installing. Maybe later I'd like to package it again so I'd like to keep all files for creating package, not only python sources.
Should I just copy it to my project folder and deinstall from virtualenv?

Two points. One, are the changes you're planning to make useful for anyone else? If the first, you might consider cloning the source repo, making your changes and submitting a PR. Even if it's not immediately merged, you can make use of setup.py to create a local package and install that in your virtualenv.
And two, are you planning to use these changes for just one project, or on many projects? If it's just for one project, throwing it in your repo and deeply modifying it is probably an ok thing, (although you need to confirm you're allowed to do so by the license). If you can foresee using this in multiple projects, you're probably better off creating a repo for it, and packaging it via setup.py.

Is there a way to "version" my python distribution?

I'm working by myself right now, but am looking at ways to scale my operation.
I'd like to find an easy way to version my Python distribution, so that I can recreate it very easily. Is there a tool to do this? Or can I add /usr/local/lib/python2.7/site-packages/ (or whatever) to an svn repo? This doesn't solve the problems with PATHs, but I can always write a script to alter the path. Ideally, the solution would be to build my Python env in a VM, and then hand copies of the VM out.
How have other people solved this?

virtualenv + requirements.txt are your friend.
You can create several virtual python installs for your projects, everything containing exactly those library versions you need (Tip: pip freeze spits out a requirements.txt with the exact library versions).
Find a good reference to virtualenv here: http://simononsoftware.com/virtualenv-tutorial/ (it's from this question Comprehensive beginner's virtualenv tutorial?).
Alternatively, if you just want to distribute your code together with libraries, PyInstaller is worth a try. You can package everything together in a static executable - you don't even have to install the software afterwards.

You want to use virtualenv. It lets you create an application(s) specific directory for installed packages. You can also use pip to generate and build a requirements.txt

For the same goal, i.e. having the exact same Python distribution as my colleagues, I tried to create a virtual environment in a network drive, so that everybody of us would be able to use it, without anybody making his local copy.
The idea was to share the same packages installed in a shared folder.
Outcome: Python run so unbearably slow that it could not be used. Also installing a package was very very sluggish.
So it looks there is no other way than using virtualenv and a requirements file. (Even if unfortunately often it does not always work smoothly on Windows and it requires manual installation of some packages and dependencies, at least at this time of writing.)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.