I am working on a Django project that uses multiple applications (Python modules). Most of those modules are maintained by other people in their own git repositories. I use the git submodule command to import them into my project under the 'apps' directory like so:
mysite/
mysite/apps
mysite/apps/django-extensions
mysite/apps/django-celery
mysite/apps/django-comments
mysite/apps/myapp
...etc
Most of those submodules (take django-extensions for example) have a subfolder containing the actual python module: mysite/apps/django-extensions/django_extensions
This means I can't simply set my python path to include mysite/apps--I have to set it to include mysite/apps/django-extensions so it can import the django_extensions subfolder.
It gets annoying typing:
PYTHONPATH=mysite/apps/django-extensions:mysite/apps/django-celery... python manage.py runserver
Is there an easier way I should be laying out my repo? An easier process?
Just for fun, I tried a PYTHONPATH of mysite/apps/*, but that didn't work.
This is the wrong way to do it. Don't install other people's third-party code inside your own project tree. Instead, create a virtualenv and install the code directly using pip.
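For instance, the submodule checkouts could be replaced by a requirements.txt installed into that virtualenv with pip; a rough sketch (the package names come from the question, while the git URL and branch are placeholders for an app you keep your own tweaks in):
# requirements.txt (illustrative)
django-extensions
django-celery
# an app that needs local 'tweaks' can point at your own fork/branch instead:
git+https://github.com/yourname/django-comments.git@my-tweaks#egg=django-comments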
After coming up blank on the internet, I hacked this solution together. It's straightforward and works well enough:
# At the top of settings.py
import sys, os

git_sub_modules = '/path/to/dir/containing/submodules'  # Relative paths ok too
for name in os.listdir(git_sub_modules):
    path = os.path.join(git_sub_modules, name)
    if path not in sys.path:
        sys.path.append(path)
time passes
UPDATE: It's much easier to use a virtualenv and/or something like dokku for deploying apps. I no longer use this. Although it is still a pain to check out 3rd party apps that need 'tweaks' and use them in the project.
You could tuck those paths into a dependencies.pth file, and only have that single .pth file on your path. There are examples in your site-packages / dist-packages.
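For illustration, such a dependencies.pth could contain nothing more than one path per line (the paths below are hypothetical, and the file has to live in a site directory, e.g. your virtualenv's site-packages, for Python to read it):
# dependencies.pth -- blank lines and lines starting with '#' are skipped
/path/to/mysite/apps/django-extensions
/path/to/mysite/apps/django-celery
/path/to/mysite/apps/django-comments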
Could you try checking out just the wanted part of the repository? So if they have the actual code inside of what you're checking out, don't check out the extra part.
So instead of getting django-extensions, get django-extensions/django_extensions.
Edit: I believe this is what you could do above.
Also, I believe you could get away with adding an __init__.py in the outer django-extensions directory, but then you're going to have to add an extra django-extensions to your imports as well (__init__.py tells Python the directory is a package). While I think this might work, I would recommend going with my first suggestion.
Related
I am trying to build a Python package that contains sub-modules and sub-packages ("libraries").
I have been looking everywhere for the right way to do it, but surprisingly I find it very complicated. I have also gone through multiple Stack Overflow threads, of course.
The problem is as follows:
In order to import a module or a package from another directory, it seems to me that there are 2 options:
a. Adding the absolute path to sys.path.
b. Installing the package with the setuptools.setup function in a setup.py file in the main directory of the package, which installs the package into the site-packages directory of the specific Python version that is in use.
Option a seems too clumsy to me. Option b is great; however, I find it impractical because I am currently working on and editing the package's source code, and the changes are of course not reflected in the installed copy of the package. In addition, the installed copy of the package is not tracked by Git, and needless to say I use Git in the original directory.
To conclude the question:
What is the best practice to import modules and sub-packages freely and nicely from within sub-directories of a Python package that is currently under construction?
I feel I am missing something but couldn't find a decent solution so far.
Thanks!
This is a great question, and I wish more people would think along these lines. Making a module importable and ultimately installable is absolutely necessary before it can be easily used by others.
On sys.path munging
Before I answer I will say I do use sys.path munging when I do initial development on a file outside of an existing package structure. I have an editor snippet that constructs code like this:
import sys, os
sys.path.append(os.path.expanduser('~/path/to/parent'))
from module_of_interest import * # NOQA
Given the path to the current file I use:
import ubelt as ub
fpath = ub.Path('/home/username/path/to/parent/module_of_interest.py')
modpath, modname = ub.split_modpath(fpath, check=False)
modpath = ub.Path(modpath).shrinkuser() # abstract home directory
To construct the necessary parts the snippet inserts into the file, so I can interact with it from within IPython. I find that taking the little bit of extra time to abstract away my explicit home folder, so the code still works as long as users have the same relative path structure with respect to their home directory, makes this slightly more portable.
Proper Python Package Management
That being said, sys.path munging is not a sustainable solution. Ultimately you want your package to be managed by a Python package manager. I know a lot of people use poetry, but I like plain old pip, so that is the process I'll describe; just know it isn't the only way to do it.
To do this we need to go over some basics.
Basics
You must know what Python environment you are working in. Ideally this is a virtual environment managed with pyenv (or conda or mamba or poetry ...). But it's also possible to do this in your global system Python environment, although that is not recommended. I like working in a single default Python virtual environment that is always activated in my .bashrc. It's always easy to switch to a new one or blow it away and start fresh.
You need to consider two root paths: the root of your repository, which I will call your repo path, and the root of your package, the package path or module path, which should be a folder with the name of the top-level Python package. You will use this name to import it. The package path must live inside the repo path. Some repos, like xdoctest, put the module path in a src directory. Others, like ubelt, keep the module path at the top level of the repository. I think the second case is conceptually easier for new package creators / maintainers, so let's go with that.
Setting up the repo path
So now you are in an activated Python virtual environment, and we have designated a path where we will check out the repo. I like to clone repos into $HOME/code, so perhaps the repo path is $HOME/code/my_project.
In this repo path you should have your root package path. Let's say your package is named mypymod. Any directory that contains an __init__.py file is conceptually a Python module, where the contents of __init__.py are what you get when you import that directory name. The only difference between a directory module (a package) and a normal file module is that a package can have submodules or subpackages.
For example if you are in the my_project repo, i.e. when you ls you see mypymod, and you have a file structure that looks something like this...
+ my_project
+ mypymod
+ __init__.py
+ submod1.py
+ subpkg
+ __init__.py
+ submod2.py
you can import the following modules:
import mypymod
import mypymod.submod1
import mypymod.subpkg
import mypymod.subpkg.submod2
If you ensured that your current working directory was always the repo root, or you put the repo root onto sys.path, then this would be all you need. Being visible in sys.path or the CWD is all that is needed for another module to see your module.
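As a quick sanity check, a minimal sketch (assuming the repo was cloned to ~/code/my_project as in the example above):
import sys, os

# make the repo root visible, then the package and its subpackages import normally
sys.path.insert(0, os.path.expanduser('~/code/my_project'))
import mypymod.subpkg.submod2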
The package manifest: setup.py / pyproject.toml
Now the trick is: how do you ensure your other packages / scripts can always see this module? That is where the package manager comes in. For this we will need a setup.py or the newer pyproject.toml variant. I'll describe the older setup.py way of doing things.
All you need to do is put the setup.py in your repo root. Note: it does not go in your package directory. There are plenty of resources for how to write a setup.py, so I won't describe it in much detail, but basically all you need is to populate it with enough information that it knows the package's name, location, and version.
from setuptools import setup, find_packages

setup(
    name='mypymod',
    version='0.1.0',
    packages=find_packages(include=['mypymod', 'mypymod.*']),
    install_requires=[],
)
So your package structure will look like this:
+ my_project
+ setup.py
+ mypymod
+ __init__.py
+ submod1.py
+ subpkg
+ __init__.py
+ submod2.py
There are plenty of other things you can specify; I recommend looking at ubelt and xdoctest as examples. I'll note that they contain a non-standard way of parsing requirements out of a requirements.txt or requirements/*.txt files, which I think is generally better than the standard way people handle requirements. But I digress.
Given something that pip or some other package manager (e.g. pipx, poetry) recognizes as a package manifest - a file that describes the contents of your package - you can now install it. If you are still developing it, you can install it in editable mode: instead of the package being copied into your site-packages, only a symbolic link is made, so any changes in your code are reflected each time you invoke Python (or immediately, if you have autoreload on in IPython).
With pip it is as simple as running pip install -e <path-to-repo-root>, which is typically done by navigating into the repo and running pip install -e ..
Congrats, you now have a package you can reference from anywhere.
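A quick way to see that the editable install worked, from any directory with the same virtual environment active:
import mypymod

# __file__ resolves back into the repo checkout rather than a copy in site-packages
print(mypymod.__file__)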
Making the most of your package
The python -m invocation
Now that you have a package you can reference as if it was installed via pip from pypi. There are a few tricks for using it effectively. The first is running scripts.
You don't need to specify a path to a file to run it as a script in Python. It is possible to run a script as __main__ using only its module name. This is done with the -m argument to Python. For instance, you can run python -m mypymod.submod1, which will invoke $HOME/code/my_project/mypymod/submod1.py as the main module (i.e. its __name__ attribute will be set to "__main__").
Furthermore, if you want to do this with a directory module, you can make a special file called __main__.py in that directory, and that is the script that will be executed. For instance, if we modify our package structure:
+ my_project
+ setup.py
+ mypymod
+ __init__.py
+ __main__.py
+ submod1.py
+ subpkg
+ __init__.py
+ __main__.py
+ submod2.py
Now python -m mypymod will execute $HOME/code/my_project/mypymod/__main__.py and python -m mypymod.subpkg will execute $HOME/code/my_project/mypymod/subpkg/__main__.py. This is a very handy way to make a module double as both an importable package and a command line executable (e.g. xdoctest does this).
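To make this concrete, a minimal sketch of what mypymod/__main__.py could contain (the body is purely illustrative):
# mypymod/__main__.py
import sys

def main(argv=None):
    # fall back to the command line arguments when none are passed in
    argv = sys.argv[1:] if argv is None else argv
    print('mypymod called with:', argv)

if __name__ == '__main__':
    main()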
Easier imports
One pain point you might notice is that in the above code if you run:
import mypymod
mypymod.submod1
You will get an error because by default a package doesn't know about its submodules until they are imported. You need to populate the __init__.py to expose any attributes you desire to be accessible at the top-level. You could populate the mypymod/__init__.py with:
from mypymod import submod1
And now the above code would work.
This has a tradeoff, though. The more things you make accessible immediately, the longer it takes to import the module, and with big packages it can get fairly cumbersome. Also, you have to manually write the code to expose what you want, which is a pain if you want everything.
If you take a look at ubelt's __init__.py you will see it has a good deal of code to explicitly make every function in every submodule accessible at the top level. I've written yet another library called mkinit that automates this process, and it also has the option of using the lazy_loader library to mitigate the performance impact of exposing all attributes at the top level. I find the mkinit tool very helpful when writing large nested packages.
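If you prefer not to pull in extra tooling, a hand-rolled sketch of the same lazy idea using plain Python's module-level __getattr__ (PEP 562, Python 3.7+) might look like this, with the submodule names taken from the example layout above:
# mypymod/__init__.py -- lazy alternative to eagerly importing every submodule
import importlib

_submodules = {'submod1', 'subpkg'}

def __getattr__(name):
    # import a submodule only the first time someone asks for it
    if name in _submodules:
        return importlib.import_module('.' + name, __name__)
    raise AttributeError(f'module {__name__!r} has no attribute {name!r}')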
Summary
To summarize the above content:
Make sure you are working in a Python virtualenv (I recommend pyenv)
Identify your "package path" inside of your "repo path".
Put an __init__.py in every directory you want to be a Python package or subpackage.
Optionally, use mkinit to autogenerate the content of your __init__.py files.
Put a setup.py / pyproject.toml in the root of your "repo path".
Use pip install -e . to install the package in editable mode while you develop it.
Use python -m to invoke module names as scripts.
Hope this helps.
I'm using Python to develop a few company-specific applications. There is a custom shared module ("library") that describes some data and algorithms, and there are dozens of Python scripts that work with this library. There are quite a lot of these files, so they are organized in subfolders:
myproject
apps
main_apps
app1.py
app2.py
...
utils
util1.py
util2.py
...
library
__init__.py
submodule1
__init__.py
file1.py
...
submodule2
...
Users want to run these scripts by simply going, say, to myproject\utils and launching "py util2.py some_params". Many of these users are developers, so quite often they want to edit the library and immediately re-run the scripts with the updated code. There are also some 3rd party libraries used by this project, and I want to make sure that everyone is using the same versions of these libs.
Now, there are two key problems I encountered:
how to reference (library) from (apps)?
how to manage 3rd party dependencies?
The first problem is well familiar to many Python developers and has been asked on SO many times: it's quite difficult to instruct Python to import a package from "....\library". I tested several different approaches, but it seems that Python is reluctant to search for packages anywhere but the standard library locations or the folder of the script itself.
Relative imports don't work since the script is not part of the library (and even if it were, this still doesn't work when the script is executed directly, unless it's placed in the "root" project folder, which I'd like to avoid)
Placing a .pth file (as one might think from reading this document) in the script folder apparently doesn't have any effect
Of course, direct meddling with sys.path works, but boilerplate code like this in each and every one of the script files looks quite terrible:
import sys, os.path
here = os.path.dirname(os.path.realpath(__file__))
module_root = os.path.abspath(os.path.join(here, '../..'))
sys.path.append(module_root)
import my_library
I realize that this happens because Python wants my library to be properly "installed", and that would indeed be the only right way to go had this library been developed separately from the scripts that use it. But unfortunately that's not the case, and I think that re-doing the "installation" of the library each time it changes is going to be quite inconvenient and error-prone.
The second problem is straightforward. Someone adds a new 3rd party module to our app/lib, and everyone else starts seeing import problems once they update their apps. Several branches of development, different moments when users run pip install, a few rollbacks - and everyone eventually ends up using different versions of 3rd party modules. In my case things are additionally complicated by the fact that many devs still work a lot with older Python 2.x code, while I'd like to move on to Python 3.x.
While looking for a possible solution to my problems, I found Python's truly excellent virtual environments feature. Things looked quite bright:
Create a venv for myproject
Distribute a requirements.txt file as part of the app and provide a script that populates the venv accordingly
Symlink my own library into the venv site-packages folder so it will always be detected by Python
This solution looked quite natural and robust. I'm explicitly setting up my own environment for my project and placing whatever I need into this venv, including my own lib, which I can still edit on the fly. And it does indeed work. But calling activate.bat to make this Python environment active, and another batch file to deactivate it, is a mess, especially on the Windows platform. Boilerplate code that edits sys.path looks terrible, but at least it doesn't interfere with the UX the way this potential fix does.
So there's a question that I want to ask.
Is there a way to bind a particular Python venv to particular folders so the Python launcher will automatically use this venv for scripts from these folders?
Is there a better alternative way to handle this situation that I'm missing?
Environment for my project is Python 3.6 running on Windows 10.
I think that I finally found a reasonable answer. It's enough to just add a shebang line pointing to the Python interpreter in the venv, e.g.
#!../../venv/Scripts/python
The full project structure will look like this
myproject
apps
main_apps
app1.py (with shebang)
app2.py (with shebang)
...
utils
util1.py (with shebang)
util2.py (with shebang)
...
library
__init__.py
submodule1
__init__.py
file1.py
...
submodule2
...
venv
(python interpreter, 3rd party modules)
(symlink to library)
requirements.txt
init_environment.bat
and things work like this:
venv is a virtual python environment with everything that project needs
init_environment.bat is a script that populates the venv according to requirements.txt and places a symlink to my library into the venv site-packages
all scripts start with shebang line pointing (with relative path) to venv interpreter
There's a full custom environment with all the libs, including my own, and the scripts that use it all have very natural imports. The Python launcher will also automatically pick the venv's Python 3.6 interpreter and load the relevant modules whenever any user-facing script in my project is launched from the console or Windows Explorer.
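For illustration, the top of a script such as apps/utils/util1.py then looks roughly like this (the imported names follow the question's layout; note, as in the Cons below, that the relative shebang only resolves correctly when the script is launched from its own folder):
#!../../venv/Scripts/python
# util1.py -- the venv interpreter already sees the symlinked library and the
# pinned 3rd party modules, so no sys.path manipulation is needed here
import library
from library import submodule1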
Cons:
The relative shebang won't work if a script is called from another folder
User will still have to manually run init_environment.bat to update virtual environment according to requirements.txt
The init_environment script on Windows requires elevated privileges to make a symlink (but hopefully that strange MS decision will be fixed with the upcoming Win10 update in April '17)
However, I can live with these limitations. Hope this helps others facing similar problems.
It would still be nice to hear other options (as answers) and criticism (as comments) too.
I have two Python packages where one needs to be imported by the other. The directory structure is like follows:
workspace/
management/
__init__.py
handle_management.py
other_management.py
utils/
__init__.py
utils_dict.py
I'm trying to import functionality from the utils project in the handle_management.py file:
import utils.utils_dict
Error I'm getting when trying to run handle_management.py:
ImportError: No module named utils.utils_dict
I've read a lot about how to resolve this problem and I can't seem to find a solution that works.
I started with Import a module from a relative path - I tried the applicable solutions but none worked.
Is the only solution to make workspace/ available via site-packages? If so, what is the best way to do this?
EDIT:
I've tried to add the /home/rico/workspace/ to the PYTHONPATH - no luck.
EDIT 2:
I was able to successfully use klobucar's solution but I don't think it will work as a general solution since this utility is going to be used by several other developers. I know I can use some Python generalizations to determine the relative path for each user. I just feel like there is a more elegant solution.
Ultimately this script will run via cron to execute unit testing on several Python projects. This is also going to be available to each developer to ensure integration testing during their development.
I need to be able to make this more general.
EDIT 3:
I'm sorry, but I don't really like any of these solutions for what I'm trying to accomplish. I appreciate the input - and I'm using them as a temporary fix. As a complete fix I'm going to look into adding a script available in the site_packages directory that will add to the PYTHONPATH. This is something that is needed across several machines and several developers.
Once I build my complete solution I'll post back here with what I did and mark it as a solution.
EDIT 4:
I feel as though I didn't do a good job expressing my needs with this question. The answers below addressed this question well - but not my actual needs. As a result I have restructured my question in another post. I considered editing this one, but then the answers below (which would be very helpful for others) wouldn't be meaningful to the change and would seem out of place and irrelevant.
For the revised version of this question please see Unable to import Python package universally
You have 2 solutions:
Either put workspace in your PYTHONPATH:
import sys
sys.path.append('/path/to/workspace')
from utils import utils_dict
(Note that if you're running a script inside workspace, that is importing handle_management, most probably workspace is already in your PYTHONPATH, and you wouldn't need to do that, but it seems it's not the case HERE).
Or, make "workspace" a package by adding an empty (or not) __init__.py file in the workspace directory. Then:
from ..utils import utils_dict
I would prefer the second, because you would have a problem if there's another module called "utils" in your PYTHONPATH.
Apart from that, you are importing it wrong here: import utils.utils_dict.py. You don't need to include the file extension ".py". You are not importing the file, you are importing the module (or package if it's a folder), so you don't want the path to that file, you need its name.
What you need to do is add workspace to your import path. I would make a wrapper that does this for you in workspace, or just put workspace in your PYTHONPATH as an environment variable.
import sys
# Add the workspace folder path to the sys.path list
sys.path.append('/path/to/workspace/')
from utils import utils_dict
Put "workspace/" in your PYTHONPATH to make the packages underneath available to you when it searches.
This can be done from your shell profile (if you are on *nix), or via environment variables on Windows.
For instance, on OSX you might add this to your ~/.profile (if workspace is in your home directory):
export PYTHONPATH=$HOME/workspace:$PYTHONPATH
Another option is to use virtualenv to make your project area its own contained environment.
I do a lot of work on different projects (I'm a scientist) in a fairly standardised directory structure. e.g.:
project
/analyses/
/lib
/doc
/results
/bin
I put all my various utility scripts in /bin/ because cleanliness is next to godliness. However, I have to hard code paths (e.g. ../../x/y/z) and then I have to run things within ./bin/ or they break.
I've used Django and that has /manage.py which runs various django-things and automatically handles the path. I've also used fabric to run various user defined functions.
Question: How do I do something similar, and what's the best way? I can easily write something in /manage.py to inject the root dir into sys.path etc., but then I'd like to be able to do "./manage.py foo" which would run /bin/foo.py. Or is it possible to get fabric to call executables from a certain directory?
Basically - I want something easy and low maintenance. I want to be able to drop an executable script/file/whatever into ./bin/ and not have to deal with path issues or import issues.
What is the best way to do this?
Keep Execution at TLD
In general, try to keep your runtime at top-level. This will straighten out your imports tremendously.
If you have to do a lot of import addressing with relative imports, there's probably a better way.
Modifying The Path
Other posters have mentioned PYTHONPATH. That's a great way to do it permanently in your shell.
If you don't want to, or aren't able to, manipulate the PYTHONPATH directly, you can use sys.path to get yourself out of relative import hell.
Using sys.path.append
sys.path is just a list internally. You can append to it to add things to your path.
Say I'm in /bin and there's a library markdown in lib/. You can append a relative path to sys.path to import what you want.
import sys
sys.path.append('../lib')
import markdown
print(markdown.markdown("""
Hello world!
------------
"""))
Word to the wise: don't get too crazy with your sys.path additions. Keep your scheme simple to avoid a lot of confusion.
Overly eager imports can sometimes lead to cases where a Python module ends up importing itself (for example, when a file on your path shadows the name of the library you meant to import), at which point execution will halt!
Using Packages and __init__.py
Another great trick is creating python packages by adding __init__.py files. __init__.py gets loaded before any other modules in the directory, so it's a great way to add imports across the entire directory. This makes it an ideal spot to add sys.path hackery.
You don't even need to necessarily add anything to the file. It's sufficient to just do touch __init__.py at the console to make a directory a package.
See this SO post for a more concrete example.
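As a rough sketch of that idea applied to the layout above (treat it as illustrative rather than a drop-in solution):
# bin/__init__.py -- loaded when the bin package is imported, so every module in
# bin/ can see the sibling lib/ directory without doing its own path hackery
import os
import sys

_lib = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', 'lib'))
if _lib not in sys.path:
    sys.path.append(_lib)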
In a shell script that you source (not run) in your current shell you set the following environment variables:
PATH=$PATH:$PROJECTDIR/bin
PYTHONPATH=$PROJECTDIR/lib
Then you put your Python modules and package tree in your project's ./lib directory. Python automatically adds the PYTHONPATH environment variable to sys.path.
Then you can run any top-level script from the shell without specifying the path, and any imports from your library modules are looked for in the lib directory.
I recommend very simple top-level scripts, such as:
#!/usr/bin/python
import sys
import mytool
mytool.main(sys.argv)
Then you never have to change that, you just edit the module code, and also benefit from the byte-code caching.
You can easily achieve your goals by creating a mini package that hosts each one of your projects. Use Paste Script to create a simple project skeleton, and to make it executable, just install it via setup.py develop. Now your bin scripts only need to import the entry point of this package and execute it.
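For example, a bin script can then be reduced to a thin wrapper along these lines (the package and entry-point names are hypothetical, standing in for whatever your skeleton actually exposes):
#!/usr/bin/env python
# bin/foo -- thin launcher; the real logic lives in the installed package
from myproject.cli import main

if __name__ == '__main__':
    main()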
Another developer and I disagree about whether PYTHONPATH or sys.path should be used to allow Python to find a Python package in a user (e.g., development) directory.
We have a Python project with a typical directory structure:
Project
setup.py
package
__init__.py
lib.py
script.py
In script.py, we need to do import package.lib. When the package is installed in site-packages, script.py can find package.lib.
When working from a user directory, however, something else needs to be done. My solution is to set my PYTHONPATH to include "~/Project". Another developer wants to put this line of code in the beginning of script.py:
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
So that Python can find the local copy of package.lib.
I think this is a bad idea, as this line is only useful for developers or people running from a local copy, but I can't give a good reason why it is a bad idea.
Should we use PYTHONPATH, sys.path, or is either fine?
If the only reason to modify the path is for developers working from their working tree, then you should use an installation tool to set up your environment for you. virtualenv is very popular, and if you are using setuptools, you can simply run setup.py develop to semi-install the working tree in your current Python installation.
I hate PYTHONPATH. I find it brittle and annoying to set on a per-user basis (especially for daemon users) and keep track of as project folders move around. I would much rather set sys.path in the invoke scripts for standalone projects.
However sys.path.append isn't the way to do it. You can easily get duplicates, and it doesn't sort out .pth files. Better (and more readable): site.addsitedir.
And script.py wouldn't normally be the most appropriate place to do it, as it's inside the package you want to make available on the path. Library modules should certainly not be touching sys.path themselves. Instead, you'd normally have a hashbanged script outside the package that you use to instantiate and run the app, and it's in this trivial wrapper script that you'd put deployment details like sys.path-frobbing.
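A minimal sketch of such a wrapper, using the site.addsitedir suggestion above (the script name is invented for illustration, and it is assumed to live in Project/ next to setup.py):
#!/usr/bin/env python
# run_app.py -- trivial launcher kept outside the package itself
import os
import site

# addsitedir skips directories already on sys.path and also processes .pth files there
site.addsitedir(os.path.dirname(os.path.abspath(__file__)))

import package.lib  # the package under Project/ is now importable; start the app here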
In general I would consider setting an environment variable (like PYTHONPATH) to be a bad practice. While this might be fine for one-off debugging, using it as a regular practice might not be a good idea.
Using an environment variable leads to "it works for me" situations when someone else reports problems in the code base. One might also carry the same practice over to the test environment, leading to situations where the tests run fine for a particular developer but fail when someone else launches them.
Along with the many other reasons mentioned already, you could also point out that hard-coding
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
is brittle because it presumes the location of script.py -- it will only work if script.py is located in Project/package. It will break if a user decides to move/copy/symlink script.py (almost) anywhere else.
Neither hacking PYTHONPATH nor sys.path is a good idea, for the aforementioned reasons. And for linking the current project into the site-packages folder there is actually a better way than python setup.py develop, as explained here:
pip install --editable path/to/project
If you don't already have a setup.py in your project's root folder, this one is good enough to start with:
from setuptools import setup

setup(name='project', packages=['package'])
I think that in this case using PYTHONPATH is the better option, mostly because it doesn't introduce (questionable) unnecessary code.
After all, if you think about it, your user doesn't need that sys.path thing, because your package will get installed into site-packages, since you will be using a packaging system.
If the user chooses to run from a "local copy", as you call it, then I've observed that the usual practice is to state that the package needs to be added to PYTHONPATH manually if used outside of site-packages.