How to maintain python application with dependencies, including my own custom libs?

How to maintain python application with dependencies, including my own custom libs? - python

I'm using Python to develop few company-specific applications. There is a custom shared module ("library") that describes some data and algorithms and there are dozens of Python scripts that work with this library. There's quite a lot of these files, so they are organized in subfolders
myproject
apps
main_apps
app1.py
app2.py
...
utils
util1.py
util2.py
...
library
__init__.py
submodule1
__init__.py
file1.py
...
submodule2
...
Users want to run these scripts by simply going, say, to myproject\utils and launching "py util2.py some_params". Many of these users are developers, so quite often they want to edit a library and immediately re-run scripts with updated code. There are also some 3rd party libraries used by this project and I want to make sure that everyone is using the same versions of these libs.
Now, there are two key problems I encountered:
how to reference (library) from (apps)?
how to manage 3rd party dependencies?
The first problem is well-familiar to many Python developers and was asked on SO for many times: it's quite difficult to instruct Python to import package from "....\library". I tested several different approaches, but it seems that python is reluctant to search for packages anywhere, but in standard libraries locations or the folder of the script itself.
Relative import doesn't work since script is not a part of a library (and even if it was, this still doesn't work when script is executed directly unless it's placed in the "root" project folder which I'd like to avoid)
Placing .pth file (as one might think from reading this document) to script folder apparently doesn't have any effect
Of course direct meddling with sys.path work, but boilerplate code like this one in each and every one of the script files looks quite terrible
import sys, os.path
here = os.path.dirname(os.path.realpath(__file__))
module_root = os.path.abspath(os.path.join(here, '../..'))
sys.path.append(python_root)
import my_library
I realize that this happens because Python wants my library to be properly "installed" and that's indeed would be the only right way to go had this library was developed separately from the scripts that use it. But unfortunately it's not the case and I think that re-doing "installation" of library each time it's changed is going to be quite inconvenient and prone to errors.
The second problem is straightforward. Someone adds a new 3rd party module to our app/lib and everyone else start seeing import problems once they update their apps. Several branches of development, different moments when user does pip install, few rollbacks - and everyone eventually ends using different versions of 3rd party modules. In my case things are additionally complicated by the fact that many devs work a lot with older Python 2.x code while I'd like to move on to Python 3.x
While looking for a possible solution for my problems, I found a truly excellent virtual environments feature in Python. Things looked quite bright:
Create a venv for myproject
Distribute a Requirements.txt file as part of app and provide a script that populates venv accordingly
Symlink my own library to venv site_packages folder so it'll be always detected by Python
This solution looked quite natural & robust. I'm explicitly setting my own environment for my project and place whatever I need into this venv, including my own lib that I can still edit on the fly. And it indeed work. But calling activate.bat to make this python environment active and another batch file to deactivate it is a mess, especially on Windows platform. Boilerplate code that is editing sys.path looks terrible, but at least it doesn't interfere with UX like this potential fix do.
So there's a question that I want to ask.
Is there a way to bind particular python venv to particular folders so python launcher will automatically use this venv for scripts from these folders?
Is there a better alternative way to handle this situation that I'm missing?
Environment for my project is Python 3.6 running on Windows 10.

I think that I finally found a reasonable answer. It's enough to just add shebang line pointing to python interpreter in venv, e.g.
#!../../venv/Scripts/python
The full project structure will look like this
myproject
apps
main_apps
app1.py (with shebang)
app2.py (with shebang)
...
utils
util1.py (with shebang)
util2.py (with shebang)
...
library
__init__.py
submodule1
__init__.py
file1.py
...
submodule2
...
venv
(python interpreter, 3rd party modules)
(symlink to library)
requirements.txt
init_environment.bat
and things work like this:
venv is a virtual python environment with everything that project needs
init_environment.bat is a script that populates venv according to requirements.txt and places a symlink to my library into venv site-modules
all scripts start with shebang line pointing (with relative path) to venv interpreter
There's a full custom environment with all the libs including my own and scripts that use it will all have very natural imports. Python launcher will also automatically pick Python 3.6 as interpreter & load the relevant modules whenever any user-facing script in my project is launched from console or windows explorer.
Cons:
Relative shebang won't work if a script is called from other folder
User will still have to manually run init_environment.bat to update virtual environment according to requirements.txt
init_environment scrip on Windows require elevated privileges to make a symlink (but hopefully that strange MS decision will be fixed with upcoming Win10 update in April'17)
However I can live with these limitations. Hope that this will help others looking for similar problems.
Would be still nice to still hear other options (as answers) and critics (as comments) too.

Related

Is it possible to have users not pip install modules and instead include the modules used in a different folder and then import that?

I want to know if I can create a python script with a folder in the same directory with all the assets of a python module, so when someone wants to use it, they would not have to pip install module, because it would import from the directory.

Yes, you can, but it doesn't mean that you should.
First, ask yourself who is suposed to use that code.
If you plan to give it to consumers, it would be a good idea to use a tool like py2exe and create executable file which would include all modules and not allow for code to be changed.
If you plan to share it with another developer, you might want to look into virtual environments and requirements.txt file.
There are multiple reasons why sharing modules is bad idea:
It is harder to update modules later, at least without upgrading whole project.
It uses more space on version control, which can create issues on huge projects with hundreds of modules and branches
It might be illegal as some licenses specifically forbid including their code in your source code.
The pip install of some module might do different things depending on operating system version or installed packages. The modules on your machine might be suboptimal on someone else's machine, and in some instances might not even work.
And probably more that I can't think of right now.
The only situation where I saw this being unavoidable was when the module didn't support python implementation the application was running on. The module was changed, and its source was put under lib folder with the rest of the libraries.

I think you can add the directory with python modules into PYTHONPATH. Then people want to use those modules just need has this envvar set.
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH

How to use a utils across projects and ensure it is updated

I have made a few Python projects for work that all revolve around extracting data, performing Pandas manipulations, and exporting to Excel. Obviously, there are common functions I've been reusing. I've saved these into utils.py, and I copy paste utils.py into each new project.
Whenever I change utils.py, I need to ensure that I change it in my other project, which is an error-prone process
What would you suggest?
Currently, I create a new directory for each project, so
/PyCharm Projects
--/CollegeBoard
----/venv
----/CollegeBoard.py
----/Utils.py
----/Paths.py
--/BoxTracking
----/venv
----/BoxTracking.py
----/Utils.py
----/Paths.py
I'm wondering if this is the most effective way to structure/version control my work. Since I have many imports in common, too, would a directory like this be better?
/Projects
--/Reporting
----/venv
----/Collegeboard
------/Collegeboard.py
------/paths.py
----/BoxTracking
------/BoxTracking.py
------/paths.py
----/Utils.py
I would appreciate any related resources.

Instead of putting a copy of utils.py into each of your projects, make utils.py into a package with it's own dedicated repository/folder somewhere. I'd recommend renaming it to something less generic, such as "zhous_utils".
In that dedicated repository for zhous_utils, you can create a setup.py file and you can use that setup.py file to install the current version of the zhous_utils into your python install. That way you can import zhous_utils into any other python script on your PC, just like you would import pandas or any other package you've installed to your computer.
Check out this stackoverflow thread: What is setup.py?
When you understand setup.py, then you will understand how to make and install your own packages so that you can import those installed packages to any python script on your PC. That way all source code for zhous_utils is centralized to just one folder on your PC, which you can update whenever you want and re-install the package.
Now, of course, there are some potential challenges/downsides to this. If you install zhous_utils to your computer and then import and use zhous_utils in one of your other projects, then you've just made zhous_utils into a dependency of that project. That means that if you want to share that project with other people and let them work on it as well or use it in some way, then they will need to install zhous_utils. Just be aware of that. This won't be an issue if you're the only one interacting/developing the source code of the projects you intend to import zhous_utils into.

How to import some set of Python modules globally?

In my particular setting, I have a set of python modules that include auxiliary functions used in many different other modules. I putted them into a LIBS folder and I have other folder at the same path level those are including other modules that are doing certain jobs by using the help of these LIBS modules. Presently, I do this for all the modules to import LIBS modules.
import sys
sys.path.insert(0, '../LIBS')
import lib_module1
import lib_module2
....
As the project getting larger, this starts to be pain in the neck. I need to write down a large set of import statements for these auxiliary LIBS modules for each new module.
Is there any way to automatically import all these LIBS modules for the other modules that are in the folders living at the same path lelvel with LIBS folder?

For this, you can use
__init__.py
Kindly refer Modules and Stackoverflow.

Indeed there is! Start treating your LIBS modules as "real" modules (or packages) that are installed into the system like any other.
This means you will have to write a setup.py script to install your code. Generally this is done inside your development directory, then your module is installed with:
$ sudo python setup.py install
This will install your module under the site-packages subdirectory of wherever Python libraries are stored on your system.
I suggest starting by copying someone else's working setup.py and supporting files, then modifying to suit your packages. For example, here is my quoter module.
Fair warning: This is a pretty big step. Not only will you learn to deploy your module locally, you can also publish it on PyPI if you wish. The step of moving to true packages will encourage you to write more and more standard documentation, to develop and run more tests, to adopt more rigorous version specifications, to more clearly identify and define code dependencies, and take many other "professionalization" steps. These all pay dividends in better, more reliable, more portable, more easily deployed code--but I'd be lying if I didn't admit the learning curve can be steep at times.

Installing (and accessing) a Python module from a local directory

I'm trying to make an application in Python that will be run from a USB drive on a variety of different computers. Let's just assume for a second that the client computer has Python installed and put that to one side.
My application uses cherrypy to launch a local web server, but it's safe to assume that this won't be already installed on the computer it's run on.
What would be the best of addressing this problem as transparently as possible?
It should be usable by a non-technical user and they shouldn't have to worry about installing any dependencies themselves.
At the moment, I've packaged a copy of the cherrypy source with my app and if the program detects that it isn't already installed, it runs the setup.py but with the --user flag, avoiding any permissions issues.
This doesn't seem like great practice.
Is it possible to just include the built package with my app? I've tried doing this with something like the following:
/myapp/libs/cherrypy
I copied the cherrypy directory from the CherryPy archive. This is pre-build, and so probably won't work. I also tried an alternative where I run setup.py (which created a build directory) and copied those files instead.
/myapp/somefile.py
import myapp.libs.cherrypy as cherrypy
In theory, this should work... But it doesn't. When I try to run the server (using cherrpy.quickstart()) it runs two instances of the server, which obviously starts causing problems.
So, the question(s): First off, am I going about this completely the wrong way? How can I make a third-party package available to my app?

This is due to PYTHONPATH issues. I would recommend using virtual envs and pip as standard when working with packages you've imported or obatined externally.
Some great notes here: https://python-guide.readthedocs.org/en/latest/
If you wnat to import your own code. I'd set your PYTHONPATH (in the case below the dev_folder) to a root development directory and follow this structure...
dev_folder \
- project_name \
- main_script.py
- helper.py
- libary1 \
- __init__.py
- lib1.py
- libary2 \
- __init__.py
- lib2.py
You'd obviously come up with better names for the library folders/packages :-)
Hope this helps.

PYTHONPATH vs. sys.path

Another developer and I disagree about whether PYTHONPATH or sys.path should be used to allow Python to find a Python package in a user (e.g., development) directory.
We have a Python project with a typical directory structure:
Project
setup.py
package
__init__.py
lib.py
script.py
In script.py, we need to do import package.lib. When the package is installed in site-packages, script.py can find package.lib.
When working from a user directory, however, something else needs to be done. My solution is to set my PYTHONPATH to include "~/Project". Another developer wants to put this line of code in the beginning of script.py:
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
So that Python can find the local copy of package.lib.
I think this is a bad idea, as this line is only useful for developers or people running from a local copy, but I can't give a good reason why it is a bad idea.
Should we use PYTOHNPATH, sys.path, or is either fine?

If the only reason to modify the path is for developers working from their working tree, then you should use an installation tool to set up your environment for you. virtualenv is very popular, and if you are using setuptools, you can simply run setup.py develop to semi-install the working tree in your current Python installation.

I hate PYTHONPATH. I find it brittle and annoying to set on a per-user basis (especially for daemon users) and keep track of as project folders move around. I would much rather set sys.path in the invoke scripts for standalone projects.
However sys.path.append isn't the way to do it. You can easily get duplicates, and it doesn't sort out .pth files. Better (and more readable): site.addsitedir.
And script.py wouldn't normally be the more appropriate place to do it, as it's inside the package you want to make available on the path. Library modules should certainly not be touching sys.path themselves. Instead, you'd normally have a hashbanged-script outside the package that you use to instantiate and run the app, and it's in this trivial wrapper script you'd put deployment details like sys.path-frobbing.

In general I would consider setting up of an environment variable (like PYTHONPATH)
to be a bad practice. While this might be fine for a one off debugging but using this as
a regular practice might not be a good idea.
Usage of environment variable leads to situations like "it works for me" when some one
else reports problems in the code base. Also one might carry the same practice with the
test environment as well, leading to situations like the tests running fine for a
particular developer but probably failing when some one launches the tests.

Along with the many other reasons mentioned already, you could also point outh that hard-coding
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
is brittle because it presumes the location of script.py -- it will only work if script.py is located in Project/package. It will break if a user decides to move/copy/symlink script.py (almost) anywhere else.

Neither hacking PYTHONPATH nor sys.path is a good idea due to the before mentioned reasons. And for linking the current project into the site-packages folder there is actually a better way than python setup.py develop, as explained here:
pip install --editable path/to/project
If you don't already have a setup.py in your project's root folder, this one is good enough to start with:
from setuptools import setup
setup('project')

I think, that in this case using PYTHONPATH is a better thing, mostly because it doesn't introduce (questionable) unneccessary code.
After all, if you think of it, your user doesn't need that sys.path thing, because your package will get installed into site-packages, because you will be using a packaging system.
If the user chooses to run from a "local copy", as you call it, then I've observed, that the usual practice is to state, that the package needs to be added to PYTHONPATH manually, if used outside the site-packages.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.