Python egg development environment setup

I inherited a Python project that has been packaged as an egg. After checking it out from SVN, I see package content like:
__init__.py
scripts/
ptools/
setup.py
...
Here, ptools/ holds the source of various modules. scripts/ is a bunch of end-user tools that make use of the modules provided by "ptools". The package has been installed on this shared host environment through "easy_install", but I want to modify both scripts/ and ptools/ and test them out without having to go through the "make an egg, and easy_install" cycle, which would affect everyone else.
However, I am lost on what environment changes will make scripts/ stop searching the default installed .egg when invoked from my development tree, and use the "local" modules in ptools/ instead ... any ideas?
Update: I should have added that I tried the PYTHONPATH approach by putting the dev-tree module path there, but when I tried to verify it with "import sys; print sys.path", there was no change in the module search path, which baffles me.
thanks
Oliver

I think I have found the solution to my problem, and it has been answered in the following post: "setup.py develop" seems to be the perfect solution.
PYTHONPATH vs. sys.path

You can use the PYTHONPATH environment variable to customize the locations Python searches for modules.
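The update above (PYTHONPATH set, but sys.path unchanged) usually comes down to when the variable is set: Python reads PYTHONPATH once at interpreter startup, so it must be exported in the shell before the script runs. A small sketch of this behavior, using a hypothetical /tmp/devtree path:

```python
import os
import subprocess
import sys

# PYTHONPATH is consulted only at interpreter startup; setting it from
# inside a running process does not alter that process's sys.path.
os.environ["PYTHONPATH"] = "/tmp/devtree"  # too late for *this* interpreter
print("/tmp/devtree" in sys.path)

# A child interpreter launched afterwards inherits the variable and
# does put the entry on its sys.path:
out = subprocess.run(
    [sys.executable, "-c", "import sys; print('/tmp/devtree' in sys.path)"],
    capture_output=True, text=True, env=os.environ.copy(),
)
print(out.stdout.strip())  # True
```

So on the shared host, exporting PYTHONPATH in the shell (or a wrapper script) before invoking the tools in scripts/ is what lets the dev-tree ptools/ be found; though, as noted elsewhere in this thread, installed eggs can still be resolved ahead of PYTHONPATH entries.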

Related

What is the best practice for imports when developing a Python package?

I am trying to build a Python package that contains sub-modules and sub-packages ("libraries").
I was looking everywhere for the right way to do it, but amazingly I find it very complicated. I also went through multiple threads on Stack Overflow, of course.
The problem is as follows:
In order to import a module or a package from another directory, it seems to me that there are 2 options:
a. Adding the absolute path to sys.path.
b. Installing the package with the setuptools.setup function in a setup.py file, in the main directory of the package - which installs the package into the site-packages directory of the specific Python version that is in use.
Option a seems too clumsy for me. Option b is great, however I find it impractical because I am currently working on and editing the package's source code - and the changes are, of course, not reflected in the installed copy of the package. In addition, the installed directory of the package is not tracked by Git, and needless to say I use Git in the original directory.
To conclude the question:
What is the best practice to import modules and sub-packages freely and nicely from within sub-directories of a Python package that is currently under construction?
I feel I am missing something but couldn't find a decent solution so far.
Thanks!
This is a great question, and I wish more people would think along these lines. Making a module importable and ultimately installable is absolutely necessary before it can be easily used by others.
On sys.path munging
Before I answer I will say I do use sys.path munging when I do initial development on a file outside of an existing package structure. I have an editor snippet that constructs code like this:
import sys, os
sys.path.append(os.path.expanduser('~/path/to/parent'))
from module_of_interest import * # NOQA
Given the path to the current file I use:
import ubelt as ub
fpath = ub.Path('/home/username/path/to/parent/module_of_interest.py')
modpath, modname = ub.split_modpath(fpath, check=False)
modpath = ub.Path(modpath).shrinkuser() # abstract home directory
To construct the necessary parts the snippet will insert into the file so I can interact with it from within IPython. I find that taking the little bit of extra time to remove the reference to my explicit home folder, so that the code still works as long as users have the same relative path structure with respect to the home directory, makes this slightly more portable.
Proper Python Package Management
That being said, sys.path munging is not a sustainable solution. Ultimately you want your package to be managed by a Python package manager. I know a lot of people use poetry, but I like plain old pip, so I will describe that process, but know this isn't the only way to do it.
To do this we need to go over some basics.
Basics
You must know what Python environment you are working in. Ideally this is a virtual environment managed with pyenv (or conda or mamba or poetry ...). But it's also possible to do this in your global system Python environment, although that is not recommended. I like working in a single default Python virtual environment that is always activated in my .bashrc. It's always easy to switch to a new one or blow it away / start fresh.
You need to consider two root paths: the root of your repository, which I will call your repo path, and the root of your package, the package path or module path, which should be a folder with the name of the top-level Python package. You will use this name to import it. This package path must live inside the repo path. Some repos, like xdoctest, like to put the module path in a src directory. Others, like ubelt, like to have the module path at the top level of the repository. I think the second case is conceptually easier for new package creators / maintainers, so let's go with that.
Setting up the repo path
So now, you are in an activated Python virtual environment, and you have designated a path where you will check out the repo. I like to clone repos in $HOME/code, so perhaps the repo path is $HOME/code/my_project.
In this repo path you should have your root package path. Let's say your package is named mypymod. Any directory that contains an __init__.py file is conceptually a Python module, where the contents of __init__.py are what you get when you import that directory name. The only difference between a directory module and a normal file module is that a directory module/package can have submodules or subpackages.
For example if you are in the my_project repo, i.e. when you ls you see mypymod, and you have a file structure that looks something like this...
+ my_project
    + mypymod
        + __init__.py
        + submod1.py
        + subpkg
            + __init__.py
            + submod2.py
you can import the following modules:
import mypymod
import mypymod.submod1
import mypymod.subpkg
import mypymod.subpkg.submod2
If you ensured that your current working directory was always the repo root, or you put the repo root into sys.path, then this would be all you need. Being visible in sys.path or the CWD is all that is needed for another module to see your module.
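As a runnable sketch of the layout above (built in a temporary directory so nothing real is touched), putting the repo root on sys.path is enough to import the package and its submodules:

```python
import pathlib
import sys
import tempfile

# Build the example my_project/mypymod tree in a scratch directory.
repo = pathlib.Path(tempfile.mkdtemp()) / "my_project"
pkg = repo / "mypymod"
(pkg / "subpkg").mkdir(parents=True)
(pkg / "__init__.py").write_text("")
(pkg / "submod1.py").write_text("VALUE = 1\n")
(pkg / "subpkg" / "__init__.py").write_text("")
(pkg / "subpkg" / "submod2.py").write_text("VALUE = 2\n")

# Make the repo root visible, exactly as described above.
sys.path.insert(0, str(repo))

import mypymod
import mypymod.submod1
import mypymod.subpkg.submod2

print(mypymod.subpkg.submod2.VALUE)  # 2
```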
The package manifest: setup.py / pyproject.toml
Now the trick is: how do you ensure your other packages / scripts can always see this module? That is where the package manager comes in. For this we will need a setup.py or the newer pyproject.toml variant. I'll describe the older setup.py way of doing things.
All you need to do is put the setup.py in your repo root. Note: it does not go in your package directory. There are plenty of resources for how to write a setup.py, so I won't describe it in much detail, but basically all you need is to populate it with enough information so it knows the name of the package, its location, and its version.
from setuptools import setup, find_packages
setup(
    name='mypymod',
    version='0.1.0',
    packages=find_packages(include=['mypymod', 'mypymod.*']),
    install_requires=[],
)
So your package structure will look like this:
+ my_project
    + setup.py
    + mypymod
        + __init__.py
        + submod1.py
        + subpkg
            + __init__.py
            + submod2.py
There are plenty of other things you can specify; I recommend looking at ubelt and xdoctest as examples. I'll note they contain a non-standard way of parsing requirements out of requirements.txt or requirements/*.txt files, which I think is generally better than the standard way people handle requirements. But I digress.
Given something that pip or some other package manager (e.g. pipx, poetry) recognizes as a package manifest (a file that describes the contents of your package), you can now install it. If you are still developing the package you can install it in editable mode, so instead of the package being copied into your site-packages, only a symbolic link is made, and any changes in your code are reflected each time you invoke Python (or immediately if you have autoreload on with IPython).
With pip it is as simple as running pip install -e <path-to-repo-root>, which is typically done by navigating into the repo and running pip install -e ..
Congrats, you now have a package you can reference from anywhere.
Making the most of your package
The python -m invocation
Now that you have a package, you can reference it as if it were installed via pip from PyPI. There are a few tricks for using it effectively. The first is running scripts.
You don't need to specify a path to a file to run it as a script in Python. It is possible to run a script as __main__ using only its module name. This is done with the -m argument to Python. For instance you can run python -m mypymod.submod1, which will invoke $HOME/code/my_project/mypymod/submod1.py as the main module (i.e. its __name__ attribute will be set to "__main__").
Furthermore if you want to do this with a directory module you can make a special file called __main__.py in that directory, and that is the script that will be executed. For instance if we modify our package structure
+ my_project
    + setup.py
    + mypymod
        + __init__.py
        + __main__.py
        + submod1.py
        + subpkg
            + __init__.py
            + __main__.py
            + submod2.py
Now python -m mypymod will execute $HOME/code/my_project/mypymod/__main__.py and python -m mypymod.subpkg will execute $HOME/code/my_project/mypymod/subpkg/__main__.py. This is a very handy way to make a module double as both an importable package and a command line executable (e.g. xdoctest does this).
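A minimal sketch of what such a __main__.py might contain (the main function and its behavior here are illustrative, not from any real package):

```python
# mypymod/__main__.py (hypothetical): executed by `python -m mypymod`.
import sys


def main(argv=None):
    # Fall back to the command-line arguments when none are passed in.
    argv = sys.argv[1:] if argv is None else argv
    print(f"mypymod invoked with {len(argv)} argument(s)")
    return 0


if __name__ == "__main__":
    main()
```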
Easier imports
One pain point you might notice is that in the above code if you run:
import mypymod
mypymod.submod1
You will get an AttributeError, because by default a package doesn't know about its submodules until they are imported. You need to populate the __init__.py to expose any attributes you desire to be accessible at the top level. You could populate mypymod/__init__.py with:
from mypymod import submod1
And now the above code would work.
This has a tradeoff though. The more things you make accessible immediately, the longer it takes to import the module, and with big packages it can get fairly cumbersome. Also you have to manually write the code to expose what you want, which is a pain if you want everything.
If you take a look at ubelt's __init__.py you will see it has a good deal of code to explicitly make every function in every submodule accessible at the top level. I've written yet another library called mkinit that automates this process, and it also has the option of using the lazy_loader library to mitigate the performance impact of exposing all attributes at the top level. I find the mkinit tool very helpful when writing large nested packages.
Summary
To summarize the above content:
Make sure you are working in a Python virtualenv (I recommend pyenv)
Identify your "package path" inside of your "repo path".
Put an __init__.py in every directory you want to be a Python package or subpackage.
Optionally, use mkinit to autogenerate the content of your __init__.py files.
Put a setup.py / pyproject.toml in the root of your "repo path".
Use pip install -e . to install the package in editable mode while you develop it.
Use python -m to invoke module names as scripts.
Hope this helps.

How to maintain python application with dependencies, including my own custom libs?

I'm using Python to develop few company-specific applications. There is a custom shared module ("library") that describes some data and algorithms and there are dozens of Python scripts that work with this library. There's quite a lot of these files, so they are organized in subfolders
myproject
    apps
        main_apps
            app1.py
            app2.py
            ...
        utils
            util1.py
            util2.py
            ...
    library
        __init__.py
        submodule1
            __init__.py
            file1.py
            ...
        submodule2
            ...
Users want to run these scripts by simply going, say, to myproject\utils and launching "py util2.py some_params". Many of these users are developers, so quite often they want to edit a library and immediately re-run scripts with updated code. There are also some 3rd party libraries used by this project and I want to make sure that everyone is using the same versions of these libs.
Now, there are two key problems I encountered:
how to reference (library) from (apps)?
how to manage 3rd party dependencies?
The first problem is well-familiar to many Python developers and has been asked on SO many times: it's quite difficult to instruct Python to import a package from "....\library". I tested several different approaches, but it seems that Python is reluctant to search for packages anywhere but in the standard library locations or the folder of the script itself.
Relative import doesn't work, since the script is not part of the library (and even if it were, this still doesn't work when the script is executed directly, unless it's placed in the "root" project folder, which I'd like to avoid)
Placing a .pth file (as one might think from reading this document) in the script folder apparently doesn't have any effect
Of course direct meddling with sys.path works, but boilerplate code like the following in each and every script file looks quite terrible:
import sys, os.path
here = os.path.dirname(os.path.realpath(__file__))
module_root = os.path.abspath(os.path.join(here, '../..'))
sys.path.append(module_root)
import my_library
I realize that this happens because Python wants my library to be properly "installed", and that would indeed be the only right way to go had this library been developed separately from the scripts that use it. But unfortunately that's not the case, and I think that re-doing the "installation" of the library each time it's changed is going to be quite inconvenient and prone to errors.
The second problem is straightforward. Someone adds a new 3rd party module to our app/lib and everyone else starts seeing import problems once they update their apps. Several branches of development, different moments when users run pip install, a few rollbacks, and everyone eventually ends up using different versions of 3rd party modules. In my case things are additionally complicated by the fact that many devs work a lot with older Python 2.x code, while I'd like to move on to Python 3.x.
While looking for a possible solution to my problems, I found the truly excellent virtual environments feature in Python. Things looked quite bright:
Create a venv for myproject
Distribute a requirements.txt file as part of the app and provide a script that populates the venv accordingly
Symlink my own library into the venv site-packages folder so it will always be detected by Python
This solution looked quite natural & robust. I'm explicitly setting up my own environment for my project and placing whatever I need into this venv, including my own lib, which I can still edit on the fly. And it does indeed work. But calling activate.bat to make this Python environment active and another batch file to deactivate it is a mess, especially on the Windows platform. The boilerplate code that edits sys.path looks terrible, but at least it doesn't interfere with UX like this potential fix does.
So there's a question that I want to ask.
Is there a way to bind particular python venv to particular folders so python launcher will automatically use this venv for scripts from these folders?
Is there a better alternative way to handle this situation that I'm missing?
Environment for my project is Python 3.6 running on Windows 10.
I think that I finally found a reasonable answer. It's enough to just add a shebang line pointing to the Python interpreter in the venv, e.g.
#!../../venv/Scripts/python
The full project structure will look like this
myproject
    apps
        main_apps
            app1.py (with shebang)
            app2.py (with shebang)
            ...
        utils
            util1.py (with shebang)
            util2.py (with shebang)
            ...
    library
        __init__.py
        submodule1
            __init__.py
            file1.py
            ...
        submodule2
            ...
    venv
        (python interpreter, 3rd party modules)
        (symlink to library)
    requirements.txt
    init_environment.bat
and things work like this:
venv is a virtual python environment with everything that project needs
init_environment.bat is a script that populates venv according to requirements.txt and places a symlink to my library into the venv site-packages
all scripts start with a shebang line pointing (with a relative path) to the venv interpreter
There's a full custom environment with all the libs, including my own, and the scripts that use it all have very natural imports. The Python launcher will also automatically pick Python 3.6 as the interpreter and load the relevant modules whenever any user-facing script in my project is launched from the console or Windows Explorer.
Cons:
The relative shebang won't work if a script is called from another folder
Users will still have to manually run init_environment.bat to update the virtual environment according to requirements.txt
The init_environment script on Windows requires elevated privileges to make a symlink (but hopefully that strange MS decision will be fixed with the upcoming Win10 update in April '17)
However I can live with these limitations. Hope that this will help others looking for similar problems.
It would still be nice to hear other options (as answers) and critiques (as comments) too.

Absolute imports and managing both a "live" and "development" version of a package

As my limited brain has come to understand it after much reading, relative imports bad, absolute imports good. My question is, how can one effectively manage a "live" and "development" version of a package? That is, if I use absolute imports, my live code and development code are going to be looking at the same thing.
Example
/admin/project1/__init__.py
/admin/project1/scripts/__init__.py
/admin/project1/scripts/main1.py
/admin/project1/scripts/main2.py
/admin/project1/modules/__init__.py
/admin/project1/modules/helper1.py
with "/admin" on my PYTHONPATH, the contents of project1 all use absolute imports. For example:
main1.py
import project1.modules.helper1
But I want to copy the contents of project1 to another location, and use that copy for development and testing. Because everything is absolute, and because "/admin" is on PYTHONPATH, my copied version is still going to reference the live code. I could add my new location to PYTHONPATH and change the names of all files by hand (i.e. add "dev" to the end of everything), do my changes/work, then when I'm ready to go live, once again by hand, remove "dev" from everything. This will work, but it is a huge hassle and prone to error.
Surely there must be some better way of handling "live" and "development" versions of a Python project.
You want to use virtualenv (or something like it).
$ virtualenv mydev
$ source mydev/bin/activate
This creates a local Python installation in the mydev directory and modifies several key environment variables to use mydev instead of the default Python directories.
Now, your PYTHONPATH looks in mydev first for any imports, and anything you install (using pip, setup.py, etc) will go in mydev. When you are finished using the mydev virtual environment, run
$ deactivate
to restore your PYTHONPATH to its previous value. mydev remains, so you can always reactivate it later.
@chepner's virtualenv suggestion is a good one. Another option, assuming your project is not installed on the machine as a Python egg, is to just add your development path to the front of PYTHONPATH. Python will find your development project1 before the regular one and everyone is happy. Eggs can spoil the fun because they tend to get resolved before the PYTHONPATH paths.

Developing and using the same Python on the same computer

I'm developing a Python utility module to help with file downloads, archives, etc. I have a project set up in a virtual environment along with my unit tests. When I want to use this module on the same computer (essentially as "Production"), I move the files to the mymodule directory at ~/dev/modules/mymodule.
I keep all 3rd-party modules under ~/dev/modules/contrib. This contrib path is on my PYTHONPATH, but mymodule is NOT because I've noticed that if mymodule is on my PYTHONPATH, my unit tests cannot distinguish between the "Development" version and the "Production" version. But now if I want to use this common utility module, I have to manually add it to the PYTHONPATH.
This works, but I'm sure there's a better, more automated way.
What is the best way to have a Development and Production module on the same computer? For instance, is there a way to set PYTHONPATH dynamically?
You can add to or modify Python's search paths via sys.path; just make sure that the first entry is the current directory ".", because some third-party modules rely on importing from the directory of the current module.
More information on python paths:
http://djangotricks.blogspot.com/2008/09/note-on-python-paths.html
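The sys.path tweak described above can be sketched like this, with /opt/mymodules as a hypothetical stand-in for a real development path:

```python
import sys

dev_path = "/opt/mymodules"  # hypothetical development module location

# Insert after sys.path[0] so the script's own directory (or ".") stays
# first, as recommended above; the guard avoids duplicate entries.
if dev_path not in sys.path:
    sys.path.insert(1, dev_path)

print(sys.path[1])  # /opt/mymodules
```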
I'm guessing by virtual environment you mean the virtualenv package?
http://pypi.python.org/pypi/virtualenv
What I'd try (and apologies if I've not understood the question right) is:
Keep the source somewhere that isn't referenced by PYTHONPATH (e.g. ~/projects/myproject)
Write a simple setuptools or distutils script for installing it (see Python distutils - does anyone know how to use it?)
Use the virtualenv package to create a dev virtual environment with the --no-site-packages option - this way your "dev" version won't see any packages installed in the default python installation.
(Also make sure your PYTHONPATH doesn't have any of your source directories)
Then, for testing:
Activate dev virtual environment
Run install script, (usually something like python setup.py build install). Your package ends up in /path/to/dev_virtualenv/lib/python2.x/site-packages/
Test, break, fix, repeat
And, for production:
Make sure dev virtualenv isn't activated
Run install script
All good to go, the "dev" version is hidden away in a virtual environment that production can't see...
...And there's no (direct) messing around with PYTHONPATH
That said, I write this with the confidence of someone who's not actually tried using virtualenv in anger, and the hope that I've vaguely understood your question... ;)
You could set the PYTHONPATH as a global environment variable pointing to your Production code, and then in any shell in which you want to use the Development code, change the PYTHONPATH to point to that code.
(Is that too simplistic? Have I missed something?)

PYTHONPATH vs. sys.path

Another developer and I disagree about whether PYTHONPATH or sys.path should be used to allow Python to find a Python package in a user (e.g., development) directory.
We have a Python project with a typical directory structure:
Project
    setup.py
    package
        __init__.py
        lib.py
        script.py
In script.py, we need to do import package.lib. When the package is installed in site-packages, script.py can find package.lib.
When working from a user directory, however, something else needs to be done. My solution is to set my PYTHONPATH to include "~/Project". Another developer wants to put this line of code in the beginning of script.py:
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
So that Python can find the local copy of package.lib.
I think this is a bad idea, as this line is only useful for developers or people running from a local copy, but I can't give a good reason why it is a bad idea.
Should we use PYTHONPATH, sys.path, or is either fine?
If the only reason to modify the path is for developers working from their working tree, then you should use an installation tool to set up your environment for you. virtualenv is very popular, and if you are using setuptools, you can simply run setup.py develop to semi-install the working tree in your current Python installation.
I hate PYTHONPATH. I find it brittle and annoying to set on a per-user basis (especially for daemon users) and to keep track of as project folders move around. I would much rather set sys.path in the invoking scripts for standalone projects.
However sys.path.append isn't the way to do it. You can easily get duplicates, and it doesn't sort out .pth files. Better (and more readable): site.addsitedir.
And script.py wouldn't normally be the appropriate place to do it, as it's inside the package you want to make available on the path. Library modules should certainly not be touching sys.path themselves. Instead, you'd normally have a hashbanged script outside the package that you use to instantiate and run the app, and it's in this trivial wrapper script that you'd put deployment details like sys.path-frobbing.
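A small sketch of the site.addsitedir suggestion, using a throwaway temporary directory in place of the real project path:

```python
import site
import sys
import tempfile

project_dir = tempfile.mkdtemp()  # stands in for the user/deployment dir

site.addsitedir(project_dir)  # appends to sys.path and processes .pth files
site.addsitedir(project_dir)  # a repeated call does not add a duplicate
print(sys.path.count(project_dir))  # 1

sys.path.append(project_dir)  # whereas a naive append happily duplicates
print(sys.path.count(project_dir))  # 2
```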
In general I would consider setting an environment variable (like PYTHONPATH) to be a bad practice. While this might be fine for one-off debugging, using it as a regular practice might not be a good idea.
Usage of an environment variable leads to situations like "it works for me" when someone else reports problems in the code base. One might also carry the same practice over to the test environment, leading to situations where the tests run fine for a particular developer but fail when someone else launches them.
Along with the many other reasons mentioned already, you could also point out that hard-coding
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
is brittle because it presumes the location of script.py -- it will only work if script.py is located in Project/package. It will break if a user decides to move/copy/symlink script.py (almost) anywhere else.
Neither hacking PYTHONPATH nor sys.path is a good idea, for the aforementioned reasons. And for linking the current project into the site-packages folder there is actually a better way than python setup.py develop, as explained here:
pip install --editable path/to/project
If you don't already have a setup.py in your project's root folder, this one is good enough to start with:
from setuptools import setup
setup(name='project')
I think that in this case using PYTHONPATH is the better option, mostly because it doesn't introduce (questionable) unnecessary code.
After all, if you think about it, your user doesn't need that sys.path thing, because your package will get installed into site-packages, because you will be using a packaging system.
If the user chooses to run from a "local copy", as you call it, then I've observed that the usual practice is to state that the package needs to be added to PYTHONPATH manually if used outside site-packages.
