I'm working on a python project that contains a number of routines I use repeatedly. Instead of rewriting code all the time, I just want to update my package and import it; however, it's nowhere near done and is constantly changing. I host the package on a repo so that colleagues on various machines (UNIX + Windows) can pull it into their local repos and use it.
It sounds like I have two options: either I can keep reinstalling the package after every change, or I can just add the package directory to my system's path. If I change the package, does it need to be reinstalled? I'm using this blog post as inspiration, but the author there doesn't address the issue of a continuously changing package structure, so I'm not sure how to deal with this.
Also, if I wanted to split the project into multiple files and bundle it as a package, at what level in the directory structure does PYTHONPATH need to point? To the main project directory, or to the sample/ directory?
README.rst
LICENSE
setup.py
requirements.txt
sample/__init__.py
sample/core.py
sample/helpers.py
docs/conf.py
docs/index.rst
tests/test_basic.py
tests/test_advanced.py
In this example, I want to be able to just import the package itself and call the modules within it like this:
import sample
arg = sample.helpers.foo()
out = sample.core.bar(arg)
Where helpers contains a function called foo and core contains a function called bar.
PYTHONPATH is a valid way of doing this, but in my (personal) opinion it's more useful if you have a whole different place where you keep your python packages, like /opt/pythonpkgs or so.
For projects that I want installed but also have to keep developing, I use develop instead of install with setup.py:
When installing the package, don't do:
python setup.py install
Rather, do:
python setup.py develop
What this does is create a symlink/shortcut (called an egg-link in python) in the python libs directory (where the packages are installed) that points to your module's directory. Hence, since it's a shortcut/symlink/egg-link, whenever you change a python file, the change is immediately reflected the next time you import that file.
Note: using this, if you delete the repository/directory you ran this command from, the package will cease to exist (as it's only a shortcut).
The equivalent in pip is -e (for editable):
pip install -e .
Instead of:
pip install .
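A quick way to confirm that the editable install is picking up your working copy (the package name mypkg below is just a placeholder for your own) is to check where Python resolves the module from:
import mypkg  # placeholder name for your package

# With an editable install this path should point into your repo checkout,
# not into site-packages.
print(mypkg.__file__)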
Related
I am trying to build a Python package that contains sub-modules and sub-packages ("libraries").
I have looked everywhere for the right way to do it, but amazingly I find it very complicated. I have also gone through multiple Stack Overflow threads, of course.
The problem is as follows:
In order to import a module or a package from another directory, it seems to me that there are 2 options:
a. Adding the absolute path to sys.path.
b. Installing the package with the setuptools.setup function in a setup.py file, in the main directory of the package - which installs the package into the site-packages directory of the specific Python version that is in use.
Option a seems too clumsy to me. Option b is great; however, I find it impractical because I am currently working on and editing the package's source code - and the changes are not reflected in the installed copy of the package, of course. In addition, the installed directory of the package is not tracked by Git, and needless to say I use Git in the original directory.
To conclude the question:
What is the best practice to import modules and sub-packages freely and nicely from within sub-directories of a Python package that is currently under construction?
I feel I am missing something but couldn't find a decent solution so far.
Thanks!
This is a great question, and I wish more people would think along these lines. Making a module importable and ultimately installable is absolutely necessary before it can be easily used by others.
On sys.path munging
Before I answer I will say I do use sys.path munging when I do initial development on a file outside of an existing package structure. I have an editor snippet that constructs code like this:
import sys, os
sys.path.append(os.path.expanduser('~/path/to/parent'))
from module_of_interest import * # NOQA
Given the path to the current file I use:
import ubelt as ub
fpath = ub.Path('/home/username/path/to/parent/module_of_interest.py')
modpath, modname = ub.split_modpath(fpath, check=False)
modpath = ub.Path(modpath).shrinkuser() # abstract home directory
To construct the necessary parts that the snippet will insert into the file, so I can interact with it from within IPython. I find that taking the little bit of extra time to remove the reference to my explicit home folder (so the code still works as long as users have the same relative path structure with respect to the home directory) makes this slightly more portable.
Proper Python Package Management
That being said, sys.path munging is not a sustainable solution. Ultimately you want your package to be managed by a python package manager. I know a lot of people use poetry, but I like plain old pip, so I will describe that process, but know this isn't the only way to do it.
To do this we need to go over some basics.
Basics
You must know what Python environment you are working in. Ideally this is a virtual environment managed with pyenv (or conda or mamba or poetry ...). But it's also possible to do this in your global system Python environment, although that is not recommended. I like working in a single default Python virtual environment that is always activated in my .bashrc. It's always easy to switch to a new one or blow it away / start fresh.
You need to consider two root paths: the root of your repository, which I will call your repo path, and the root of your package, the package path or module path, which should be a folder with the name of the top-level Python package. You will use this name to import it. This package path must live inside the repo path. Some repos, like xdoctest, put the module path in a src directory. Others, like ubelt, keep the module path at the top level of the repository. I think the second case is conceptually easier for new package creators / maintainers, so let's go with that.
Setting up the repo path
So now, you are in an activated Python virtual environment, and we have designated a path where we will check out the repo. I like to clone repos in $HOME/code, so perhaps the repo path is $HOME/code/my_project.
In this repo path you should have your root package path. Let's say your package is named mypymod. Any directory that contains an __init__.py file is conceptually a python module, where the contents of __init__.py are what you get when you import that directory name. The only difference between a directory module and a normal file module is that a directory module/package can have submodules or subpackages.
For example, if you are in the my_project repo (i.e. when you ls you see mypymod) and you have a file structure that looks something like this...
+ my_project
+ mypymod
+ __init__.py
+ submod1.py
+ subpkg
+ __init__.py
+ submod2.py
you can import the following modules:
import mypymod
import mypymod.submod1
import mypymod.subpkg
import mypymod.subpkg.submod2
If you ensured that your current working directory was always the repo root, or you put the repo root into sys.path, then this would be all you need. Being visible via sys.path or the CWD is all that is needed for another module to see your module.
The package manifest: setup.py / pyproject.toml
Now the trick is: how do you ensure your other packages / scripts can always see this module? That is where the package manager comes in. For this we will need a setup.py or the newer pyproject.toml variant. I'll describe the older setup.py way of doing things.
All you need to do is put the setup.py in your repo root. Note: it does not go in your package directory. There are plenty of resources for how to write a setup.py, so I won't describe it in much detail, but basically all you need is to populate it with enough information so it knows about the name of the package, its location, and its version.
from setuptools import setup, find_packages

setup(
    name='mypymod',
    version='0.1.0',
    packages=find_packages(include=['mypymod', 'mypymod.*']),
    install_requires=[],
)
So your package structure will look like this:
+ my_project
+ setup.py
+ mypymod
+ __init__.py
+ submod1.py
+ subpkg
+ __init__.py
+ submod2.py
There are plenty of other things you can specify; I recommend looking at ubelt and xdoctest as examples. I'll note they contain a non-standard way of parsing requirements out of requirements.txt or requirements/*.txt files, which I think is generally better than the standard way people handle requirements. But I digress.
Given something that pip or some other package manager (e.g. pipx, poetry) recognizes as a package manifest (a file that describes the contents of your package), you can now install it. If you are still developing it, you can install it in editable mode: instead of the package being copied into your site-packages, only a symbolic link is made, so any changes in your code are reflected each time you invoke Python (or immediately if you have autoreload on with IPython).
With pip it is as simple as running pip install -e <path-to-repo-root>, which is typically done by navigating into the repo and running pip install -e ..
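If you use IPython, the autoreload extension mentioned above is enabled with two magic commands inside the session (assuming IPython is installed):
%load_ext autoreload
%autoreload 2   # re-import changed modules before executing each line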
Congrats, you now have a package you can reference from anywhere.
Making the most of your package
The python -m invocation
Now you have a package you can reference as if it were installed via pip from PyPI. There are a few tricks for using it effectively. The first is running scripts.
You don't need to specify a path to a file to run it as a script in Python. It is possible to run a script as __main__ using only its module name. This is done with the -m argument to Python. For instance, you can run python -m mypymod.submod1, which will invoke $HOME/code/my_project/mypymod/submod1.py as the main module (i.e. its __name__ attribute will be set to "__main__").
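For python -m to be useful, submod1.py should put its script behavior behind the standard __name__ guard; a minimal sketch (the main function here is just illustrative) looks like:
# mypymod/submod1.py
def main():
    # placeholder for whatever the script actually does
    print('running submod1 as a script')

if __name__ == '__main__':
    main()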
Furthermore if you want to do this with a directory module you can make a special file called __main__.py in that directory, and that is the script that will be executed. For instance if we modify our package structure
+ my_project
+ setup.py
+ mypymod
+ __init__.py
+ __main__.py
+ submod1.py
+ subpkg
+ __init__.py
+ __main__.py
+ submod2.py
Now python -m mypymod will execute $HOME/code/my_project/mypymod/__main__.py and python -m mypymod.subpkg will execute $HOME/code/my_project/mypymod/subpkg/__main__.py. This is a very handy way to make a module double as both an importable package and a command-line executable (e.g. xdoctest does this).
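A minimal __main__.py can simply delegate to a function defined elsewhere in the package; here is a sketch that reuses the hypothetical main function from submod1:
# mypymod/__main__.py
from mypymod.submod1 import main

if __name__ == '__main__':
    main()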
Easier imports
One pain point you might notice is that in the above code if you run:
import mypymod
mypymod.submod1
You will get an error because by default a package doesn't know about its submodules until they are imported. You need to populate the __init__.py to expose any attributes you desire to be accessible at the top-level. You could populate the mypymod/__init__.py with:
from mypymod import submod1
And now the above code would work.
This has a tradeoff though. The more things you make accessible immediately, the more time it takes to import the module, and with big packages it can get fairly cumbersome. Also, you have to manually write the code to expose what you want, which is a pain if you want everything.
If you take a look at ubelt's __init__.py, you will see it has a good deal of code to explicitly make every function in every submodule accessible at the top level. I've written yet another library called mkinit that automates this process, and it also has the option of using the lazy_loader library to mitigate the performance impact of exposing all attributes at the top level. I find the mkinit tool very helpful when writing large nested packages.
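If you want to keep imports cheap without pulling in extra tooling, plain Python can also defer submodule imports with a module-level __getattr__ (PEP 562). This is only a hand-written sketch of the general lazy pattern, not the actual output of mkinit or lazy_loader:
# mypymod/__init__.py -- lazy submodule access (sketch)
import importlib

_submodules = {'submod1', 'subpkg'}

def __getattr__(name):
    # Imported on first attribute access, e.g. mypymod.submod1
    if name in _submodules:
        return importlib.import_module(f'{__name__}.{name}')
    raise AttributeError(f'module {__name__!r} has no attribute {name!r}')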
Summary
To summarize the above content:
Make sure you are working in a Python virtualenv (I recommend pyenv)
Identify your "package path" inside of your "repo path".
Put an __init__.py in every directory you want to be a Python package or subpackage.
Optionally, use mkinit to autogenerate the content of your __init__.py files.
Put a setup.py / pyproject.toml in the root of your "repo path".
Use pip install -e . to install the package in editable mode while you develop it.
Use python -m to invoke module names as scripts.
Hope this helps.
Is there a way to convert a python package, i.e. a folder of python files, into a single file that can be copied and then directly imported into a python script without needing to run any extra shell commands? I know it is possible to zip all of the files and then unzip them from python when they are needed, but I'm hoping that there is a more elegant solution.
It's not totally clear what the question is. I could interpret it two ways.
If you are looking to manage the symbols from many modules in a more organized way:
You'll want to put an __init__.py file in your directory and make it a package. In it you can define the symbols for your package, and create a graceful import packagename behavior. Details on packages.
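For example, a package __init__.py that defines the public symbols might look like this (the submodule and function names are just illustrative):
# packagename/__init__.py
from packagename.core import bar      # hypothetical submodule/function
from packagename.helpers import foo   # hypothetical submodule/function

__all__ = ['foo', 'bar']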
If you are looking to make your code portable to another environment:
One way or the other, the package needs to be accessible in whatever environment it is run in. That means it either needs to be installed in the python environment (likely using pip), copied into a location that is in a subdirectory relative to the running code, or in a directory that is listed in the PYTHONPATH environment variable.
The most straightforward way to package up code and make it portable is to use setuptools to create a portable package that can be installed into any python environment. The manual page for Packaging Projects gives the details of how to go about building a package archive, and optionally uploading to PyPi for public distribution. If it is for private use, the resulting archive can be passed around without uploading it to the public repository.
I'm new to Python, so I think my question is very fundamental and has been asked a few times before, but I cannot really find anything (maybe because I do not really know how to search for this problem).
I installed a module in Python (reportlab). Now I want to modify a python script in that module, but it seems that the python interpreter does not notice the updates to the script. Ironically, the import is successful although Python actually should not find that package, because I deleted it before. Does Python use something like a cache or any other storage for the modules? How can I edit modules and use the updated scripts?
From what you are saying, you downloaded a package and installed it using either a local pip or setup.py. When you do so, it copies all the files into your python package directory. So after an install, you can delete the source folder because python is not looking there.
If you want to be able to modify or edit something and see the changes, you have to install it in editable mode. Inside the main folder, do:
python setup.py develop
or
pip install -e .
This will create a symbolic link to your python package sources. You will be able to modify the sources.
Careful: for the changes to take effect, you have to restart your python interpreter. You cannot just import the module again, or anything like that.
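To answer the cache part of the question: yes, Python keeps every imported module in sys.modules, which is why a second import statement does nothing. Restarting the interpreter is the reliable fix; importlib.reload can force a single module to be re-executed, but it does not recursively reload that module's own dependencies:
import importlib
import reportlab  # already cached in sys.modules after the first import

importlib.reload(reportlab)  # re-executes the module's top-level code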
I have several packages that I need to install in a directory like this:
../site-packages/mynamespace/packages
Mainly, this is for historical reasons so imports don't break.
There are several such packages and we would need to be able to pick and choose which package we need based on the needs of the project.
So, this all needs to happen in the requirements.txt file, with something like this in requirements.txt:
requests
celery
beautifulsoup4
path/to/mypackage1
path/to/mypackage3
path/to/mypackage5
(for example)
And then I need
path/to/mypackage1
path/to/mypackage3
path/to/mypackage5
to be installed to:
../site-packages/mynamespace/packages
Accordingly, mypackage1, mypackage3, and mypackage5 would all be available from
from mynamespace.packages import mypackage1 #(and 3 and 5)
So far, what I've tried is creating a site.cfg file to go along with setup.py, but I think pip might be generating its own install values and bypassing the site.cfg. It looks like this:
[install]
install-base=$HOME
install-purelib=Lib\site-packages\mynamespace\packages
Also, each environment runs in a virtual environment.
I thought of having two separate requirements files to run, but this isn't workable. I need to have all the packages in a single requirements.txt file that is invoked simply by pip install -r requirements.txt
I thought of having them install directly to site-packages and then adding the imports to mynamespace.packages.__init__.py, but this also isn't workable. The packages need to physically be located under site-packages/mynamespace/packages.
Furthermore, --target will not work for me because (a) it takes an absolute path only (relative paths work, but they are resolved from wherever the user is currently located), and (b) it violates the above requirement of a single requirements.txt file (otherwise requests, etc. would all go to the other directory).
I'm thinking there must be some way to hack site.cfg.
Ideally, if I could add --target=relative_path within the requirements.txt, that would solve my problem, but I don't think this is possible. Only a few options are allowed within a requirements file (-e being one), but not -t. Even so, it would have to be relative to the site-packages directory used for the install, and not relative to where the user is located.
Also, this needs to work in all operating systems.
Thank you
I am trying to write a python script that I could easily export to friends without dependency problems, but am not sure how to do so. Specifically, my script relies on code from BeautifulSoup, but rather than forcing friends to install BeautifulSoup, I would rather just package the src for BeautifulSoup into a Libraries/ folder in my project files and call functions from there. However, I can no longer simply "import bs4." What is the correct way of going about this?
Thanks!
A common approach is to ship a requirements file with a project, specifying which version of which library is required. This file is (by convention) often named requirements.txt and looks something like this:
MyApp
BeautifulSoup==3.2.1
SomeOtherLib==0.9.4
YetAnother>=0.2
(The fictional file above says: I need exactly BeautifulSoup 3.2.1, SomeOtherLib 0.9.4 and any version of YetAnother greater than or equal to 0.2.)
Then the user of this project can simply take your library, (optionally) create a virtualenv, and run
$ pip install -r requirements.txt
which will then fetch all the libraries and make them available either system-wide or project-wide (if a virtualenv is used). Here's a random python project off GitHub that has a requirements file:
https://github.com/laoqiu/pypress
https://github.com/laoqiu/pypress/blob/master/requirements.txt
The nice thing about this approach is that you'll get your transitive dependencies resolved automatically. Also, if you use virtualenv, you'll get a clean separation of your projects and avoid library version collisions.
You must add Libraries/ (converted to an absolute path first) to sys.path before attempting to import anything under it.
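A minimal sketch of that approach, assuming Libraries/ sits next to the script itself (adjust the relative path if your layout differs):
import os
import sys

# Resolve Libraries/ relative to this file, not the current working directory.
_here = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, os.path.join(_here, 'Libraries'))

import bs4  # now resolved from Libraries/bs4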