Best practice for hacking on a 3rd-party python module

Best practice for hacking on a 3rd-party python module - python

I often find myself wanting to use a 3rd party python module in my own project, but I know that I will also need to make changes to the 3rd party module that I want to push upstream. What is the best practice of file layout/installation to achieve this?
Most python modules are laid out with root dir containing a "setup.py" to compile/install the module. The problem is, every time I make changes to the module source I need to re-run the full install step in order to use those changes in my project. For large modules, like scipy this can take some time.
Alternatively, I can hack on the installed version of the python module, but then I have to manually move those changes back to the source version of the module in order to generate patches etc.
I know about virtualenv and PYTHONPATH but they are ways of installing a module to a different location.
So far, I have manually created symlinks, but that is messy.

If the 3rd party project is using setuptools or distribute, you can do python setup.py develop instead of install. This will create the appropriate sym-links in the site-packages dir for you.

Related

Is it possible to have users not pip install modules and instead include the modules used in a different folder and then import that?

I want to know if I can create a python script with a folder in the same directory with all the assets of a python module, so when someone wants to use it, they would not have to pip install module, because it would import from the directory.

Yes, you can, but it doesn't mean that you should.
First, ask yourself who is suposed to use that code.
If you plan to give it to consumers, it would be a good idea to use a tool like py2exe and create executable file which would include all modules and not allow for code to be changed.
If you plan to share it with another developer, you might want to look into virtual environments and requirements.txt file.
There are multiple reasons why sharing modules is bad idea:
It is harder to update modules later, at least without upgrading whole project.
It uses more space on version control, which can create issues on huge projects with hundreds of modules and branches
It might be illegal as some licenses specifically forbid including their code in your source code.
The pip install of some module might do different things depending on operating system version or installed packages. The modules on your machine might be suboptimal on someone else's machine, and in some instances might not even work.
And probably more that I can't think of right now.
The only situation where I saw this being unavoidable was when the module didn't support python implementation the application was running on. The module was changed, and its source was put under lib folder with the rest of the libraries.

I think you can add the directory with python modules into PYTHONPATH. Then people want to use those modules just need has this envvar set.
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH

How to import some set of Python modules globally?

In my particular setting, I have a set of python modules that include auxiliary functions used in many different other modules. I putted them into a LIBS folder and I have other folder at the same path level those are including other modules that are doing certain jobs by using the help of these LIBS modules. Presently, I do this for all the modules to import LIBS modules.
import sys
sys.path.insert(0, '../LIBS')
import lib_module1
import lib_module2
....
As the project getting larger, this starts to be pain in the neck. I need to write down a large set of import statements for these auxiliary LIBS modules for each new module.
Is there any way to automatically import all these LIBS modules for the other modules that are in the folders living at the same path lelvel with LIBS folder?

For this, you can use
__init__.py
Kindly refer Modules and Stackoverflow.

Indeed there is! Start treating your LIBS modules as "real" modules (or packages) that are installed into the system like any other.
This means you will have to write a setup.py script to install your code. Generally this is done inside your development directory, then your module is installed with:
$ sudo python setup.py install
This will install your module under the site-packages subdirectory of wherever Python libraries are stored on your system.
I suggest starting by copying someone else's working setup.py and supporting files, then modifying to suit your packages. For example, here is my quoter module.
Fair warning: This is a pretty big step. Not only will you learn to deploy your module locally, you can also publish it on PyPI if you wish. The step of moving to true packages will encourage you to write more and more standard documentation, to develop and run more tests, to adopt more rigorous version specifications, to more clearly identify and define code dependencies, and take many other "professionalization" steps. These all pay dividends in better, more reliable, more portable, more easily deployed code--but I'd be lying if I didn't admit the learning curve can be steep at times.

Python / Git / Module structure best practice

We have a lot small projects that share common utility "projects"
Example:
utility project math contains function add
project A and project B both need math.add
project A has nothing to do with project B
so is it a good idea to have 3 git repositories (project_A,project_B and math) and clone them locally as
/SOMWHERE/workspace/project_A
/SOMWHERE/workspace/math
and have in /SOMWHERE/workspace/project_A/__init__.py something like
import sys
sys.path.append('../math')
import math
math.add()
I have read Structuring Your Project but that doesn't handle SCM and sharing modules.
So to sum up my question: is
sys.path.append('../math')
import math
good practice or is there a more "pythonic" way of doing that?

Submodules are a suboptimal way of sharing modules like you said in your comments. A better way would be to use the tools offered by your language of choice, i.e Python.
First, create virtualenvs to isolate every project python environment. Use pip to install packages and store dependencies in a requirements.txt file.
Then, you can create a specific package for each of your utils library using distutils and share it on Pypi.
If you don't want to release your packages into the wild, you can also host your own Pypi server.
Using this setup, you will be able to use different versions of your libraries and work on them without breaking compatibility with older code bases. You will also avoid using submodules, that are difficult to use with git.

all of what you describe (3 projects) sounds fine except that you shouldn't mess around with sys.path. instead, set the PYTHONPATH environment variable.
also, if you were not aware of distutils i am guessing you may be new to python development, and may not know about virtualenv. you should use that too (it allows you to develope against a "clean" python version that has no packages, or only the packages you install for that env).

Python compile all modules into a single python file

I am writing a program in python to be sent to other people, who are running the same python version, however these some 3rd party modules that need to be installed to use it.
Is there a way to compile into a .pyc (I only say pyc because its a python compiled file) that has the all the dependant modules inside it as well?
So they can run the programme without needing to install the modules separately?
Edit:
Sorry if it wasnt clear, but I am aware of things such as cx_freeze etc but what im trying to is just a single python file.
So they can just type "python myapp.py" and then it will run. No installation of anything. As if all the module codes are in my .py file.

If you are on python 2.3 or later and your dependencies are pure python:
If you don't want to go the setuptools or distutiles routes, you can provide a zip file with the pycs for your code and all of its dependencies. You will have to do a little work to make any complex pathing inside the zip file available (if the dependencies are just lying around at the root of the zip this is not necessary. Then just add the zip location to your path and it should work just as if the dependencies files has been installed.
If your dependencies include .pyds or other binary dependencies you'll probably have to fall back on distutils.

You can simply include .pyc files for the libraries required, but no - .pyc cannot work as a container for multiple files (unless you will collect all the source into one .py file and then compile it).

It sounds like what you're after is the ability for your end users to run one command, e.g. install my_custom_package_and_all_required_dependencies, and have it assemble everything it needs.
This is a perfect use case for distutils, with which you can make manifests for your own code that link out to external dependencies. If your 3rd party modules are available publicly in a standard format (they should be, and if they're not, it's pretty easy to package them yourself), then this approach has the benefit of allowing you to very easily change what versions of 3rd party libraries your code runs against (see this section of the above linked doc). If you're dead set on packaging others' code with your own, you can always include the required files in the .egg you create with distutils.

Two options:
build a package that will install the dependencies for them (I don't recommend this if the only dependencies are python packages that are installed with pip)
Use virtual environments. You use an existing python on their system but python modules are installed into the virtualenv.
or I suppose you could just punt, and create a shell script that installs them, and tell them to run it once before they run your stuff.

Python path issue while importing my own modules. What is the best way to go about it?

I've created python modules but they are in different directories.
/xml/xmlcreator.py
/tasklist/tasks.py
Here, tasks.py is trying to import xmlcreator but both are in different paths. One way to do it is include xmlcreator.py in the Pythonpath. But, considering that I'll be publishing the code, this doesn't seem the right way to go about it as suggested here. Thus, how do I include xmlcreator or rather any module that might be written by me which would be in various directories and sub directories?

Are you going to publish both modules separately or together in one package?
If the former, then you'll probably want to have your users install your xml module (I'd call it something else :) so that it is, by default, already on Python's path, and declare it as a dependency of the tasklist module.
If both are distributed as a bundle, then relative imports seem to be the best option, since you can control where the paths are relative to each other.

The best way is to create subpackages in a single top-level package that you define. You then ship these together in one package. If you are using setuptools/Distribute and you want to distribute them separately then you may also define a "namspace package" that the packages will be installed in. You don't need to use any ugly sys.path hacks.
Make a directory tree like this:
mypackage/__init__.py
mypackage/xml/__init__.py
mypackage/xml/xmlcreator.py
mypackage/tasklist/__init__.py
mypackage/tasklist/tasks.py
The __init__.py files may be empty. They define the directory to be a package that Python will search in.
Except if you want to use namespace packages the mypackage/__init__.py should contains:
__import__('pkg_resources').declare_namespace(__name__)
And your setup.py file contain:
...
namespace_packages=["mypackage"],
...
Then in your code:
from mypackage.xml import xmlcreator
from mypackage.tasklist import tasks
Will get them anywhere you need them. You only need to make one name globally unique in this case, the mypackage name.
For developing the code you can put the package in "develop mode", by doing
python setup.py develop --user
This will set up the local python environment to look for your package in your workspace.

When I start a new Python project, I immediately write its setup.py and declare my Python modules/packages, so that then I just do:
python setup.py develop
and everything gets magically added to my PYTHONPATH. If you do it from a virtualenv it's even better, since you don't need to install it system-wide.
Here's more about it:
http://packages.python.org/distribute/setuptools.html#development-mode

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Best practice for hacking on a 3rd-party python module - python

If the 3rd party project is using setuptools or distribute, you can do python setup.py develop instead of install. This will create the appropriate sym-links in the site-packages dir for you.

Related

Is it possible to have users not pip install modules and instead include the modules used in a different folder and then import that?

How to import some set of Python modules globally?

Python / Git / Module structure best practice

Python compile all modules into a single python file

Python path issue while importing my own modules. What is the best way to go about it?

Categories

Resources