Python / Git / Module structure best practice

We have a lot of small projects that share common utility "projects"
Example:
utility project math contains function add
project A and project B both need math.add
project A has nothing to do with project B
So is it a good idea to have 3 git repositories (project_A, project_B, and math) and clone them locally as
/SOMEWHERE/workspace/project_A
/SOMEWHERE/workspace/math
and have in /SOMEWHERE/workspace/project_A/__init__.py something like
import sys
sys.path.append('../math')  # note: relative entries are resolved against the current working directory, not this file
import math
math.add()
I have read Structuring Your Project, but that doesn't cover SCM or sharing modules.
So to sum up my question: is
sys.path.append('../math')
import math
good practice or is there a more "pythonic" way of doing that?

Submodules are a suboptimal way of sharing modules, as you said in your comments. A better way is to use the tools offered by your language of choice, i.e. Python.
First, create virtualenvs to isolate each project's Python environment. Use pip to install packages, and store dependencies in a requirements.txt file.
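A typical workflow looks something like this (a sketch; the environment name and example dependency are just conventions, not anything from the question):
$ pip install virtualenv
$ cd project_A/
$ virtualenv venv                  # create an isolated environment for this project
$ source venv/bin/activate         # on Windows: venv\Scripts\activate
$ pip install requests             # install whatever the project needs
$ pip freeze > requirements.txt    # record the exact dependency versions
$ pip install -r requirements.txt  # later, reproduce the environment elsewhere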
Then, you can create a dedicated package for each of your utility libraries using distutils and share it on PyPI.
If you don't want to release your packages into the wild, you can also host your own PyPI server.
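A minimal setup.py for the math utilities might look like the sketch below. Note that a module named math collides with the standard library's math module (in CPython the builtin wins), so a distinct name such as math_utils (a placeholder used here) is safer:
# setup.py -- a minimal distutils sketch; name and version are placeholders
from distutils.core import setup

setup(
    name='math-utils',
    version='0.1.0',
    description='Shared math helpers (add, etc.)',
    py_modules=['math_utils'],   # expects math_utils.py next to setup.py
)
Build a source distribution with python setup.py sdist, then install it (or publish it to your PyPI server) so project_A and project_B can simply import math_utils.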
Using this setup, you will be able to use different versions of your libraries and work on them without breaking compatibility with older code bases. You will also avoid submodules, which are difficult to use with git.

All of what you describe (3 projects) sounds fine, except that you shouldn't mess around with sys.path. Instead, set the PYTHONPATH environment variable.
Also, if you were not aware of distutils, I am guessing you may be new to Python development and may not know about virtualenv. You should use that too (it allows you to develop against a "clean" Python environment that has no packages, or only the packages you install for that env).
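For example (a sketch; math_utils is a placeholder for a renamed shared module, since a module named math collides with the standard library):
$ export PYTHONPATH=/SOMEWHERE/workspace/math   # or add this line to your shell profile
$ python -c "import math_utils; print(math_utils.add(1, 2))"   # assumes the repo contains math_utils.py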

Related

Is it possible to have users not pip install modules and instead include the modules used in a different folder and then import that?

I want to know if I can create a Python script with a folder in the same directory containing all the assets of a Python module, so that when someone wants to use the script, they would not have to pip install the module; it would be imported from the directory instead.
Yes, you can, but that doesn't mean you should.
First, ask yourself who is supposed to use that code.
If you plan to give it to end users, it would be a good idea to use a tool like py2exe to create an executable file that bundles all modules and doesn't allow the code to be changed (see the sketch after these suggestions).
If you plan to share it with another developer, you might want to look into virtual environments and a requirements.txt file.
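For the end-user case, classic py2exe usage is a setup.py that lists your entry script; this is a sketch assuming a hypothetical app.py entry point:
# setup.py -- classic py2exe sketch; app.py is a placeholder entry script
from distutils.core import setup
import py2exe  # registers the 'py2exe' command with distutils

setup(console=['app.py'])
Running python setup.py py2exe then produces a dist/ folder containing a standalone executable.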
There are multiple reasons why bundling modules this way is a bad idea:
It is harder to update modules later, at least without upgrading the whole project.
It uses more space in version control, which can create issues on huge projects with hundreds of modules and branches.
It might be illegal as some licenses specifically forbid including their code in your source code.
The pip install of some module might do different things depending on operating system version or installed packages. The modules on your machine might be suboptimal on someone else's machine, and in some instances might not even work.
And probably more that I can't think of right now.
The only situation where I saw this being unavoidable was when the module didn't support the Python implementation the application was running on. The module was changed, and its source was put under a lib folder with the rest of the libraries.
I think you can add the directory with the Python modules to PYTHONPATH. Then people who want to use those modules just need to have this environment variable set.
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH

How to use a utils across projects and ensure it is updated

I have made a few Python projects for work that all revolve around extracting data, performing Pandas manipulations, and exporting to Excel. Obviously, there are common functions I've been reusing. I've saved these into utils.py, and I copy-paste utils.py into each new project.
Whenever I change utils.py, I need to ensure that I change it in my other projects as well, which is an error-prone process.
What would you suggest?
Currently, I create a new directory for each project, like so:
/PyCharm Projects
--/CollegeBoard
----/venv
----/CollegeBoard.py
----/Utils.py
----/Paths.py
--/BoxTracking
----/venv
----/BoxTracking.py
----/Utils.py
----/Paths.py
I'm wondering if this is the most effective way to structure and version-control my work. Since I have many imports in common too, would a directory structure like this be better?
/Projects
--/Reporting
----/venv
----/Collegeboard
------/Collegeboard.py
------/paths.py
----/BoxTracking
------/BoxTracking.py
------/paths.py
----/Utils.py
I would appreciate any related resources.
Instead of putting a copy of utils.py into each of your projects, make utils.py into a package with its own dedicated repository/folder somewhere. I'd recommend renaming it to something less generic, such as "zhous_utils".
In that dedicated repository for zhous_utils, you can create a setup.py file and use it to install the current version of zhous_utils into your Python install. That way you can import zhous_utils into any other Python script on your PC, just like you would import pandas or any other package you've installed.
Check out this stackoverflow thread: What is setup.py?
Once you understand setup.py, you will understand how to make and install your own packages, so that you can import them into any Python script on your PC. That way, all source code for zhous_utils is centralized in just one folder on your PC, which you can update whenever you want and re-install.
Now, of course, there are some potential challenges/downsides to this. If you install zhous_utils to your computer and then import and use it in one of your other projects, you've just made zhous_utils a dependency of that project. That means that if you want to share that project with other people, they will need to install zhous_utils as well. Just be aware of that. This won't be an issue if you're the only one developing the projects you intend to import zhous_utils into.
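If others do need the package, one common approach is to point the consuming project's requirements.txt at the repository so pip can install it directly (a sketch; the URL and tag are placeholders):
# requirements.txt of a project that depends on zhous_utils
pandas
git+https://example.com/zhou/zhous_utils.git@v0.1.0#egg=zhous_utils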

How to import some set of Python modules globally?

In my particular setting, I have a set of Python modules that contain auxiliary functions used by many other modules. I put them into a LIBS folder, and I have other folders at the same path level containing modules that do certain jobs with the help of these LIBS modules. Presently, I do this in every module to import the LIBS modules:
import sys
sys.path.insert(0, '../LIBS')
import lib_module1
import lib_module2
....
As the project gets larger, this starts to be a pain in the neck. I need to write a large set of import statements for these auxiliary LIBS modules in each new module.
Is there any way to automatically import all these LIBS modules for the other modules living in folders at the same path level as the LIBS folder?
For this, you can use
__init__.py
Kindly refer to Modules and Stack Overflow.
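Concretely, giving the LIBS folder an __init__.py that re-exports its modules turns it into a package, so a single import brings everything in (a sketch; some_function is a placeholder):
# LIBS/__init__.py
from . import lib_module1
from . import lib_module2

# in any other module, once LIBS' parent directory is on sys.path:
import LIBS
LIBS.lib_module1.some_function()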
Indeed there is! Start treating your LIBS modules as "real" modules (or packages) that are installed into the system like any other.
This means you will have to write a setup.py script to install your code. Generally this is done inside your development directory, then your module is installed with:
$ sudo python setup.py install
This will install your module under the site-packages subdirectory of wherever Python libraries are stored on your system.
I suggest starting by copying someone else's working setup.py and supporting files, then modifying to suit your packages. For example, here is my quoter module.
Fair warning: This is a pretty big step. Not only will you learn to deploy your module locally, you can also publish it on PyPI if you wish. The step of moving to true packages will encourage you to write more and more standard documentation, to develop and run more tests, to adopt more rigorous version specifications, to more clearly identify and define code dependencies, and take many other "professionalization" steps. These all pay dividends in better, more reliable, more portable, more easily deployed code--but I'd be lying if I didn't admit the learning curve can be steep at times.

Best practice for hacking on a 3rd-party python module

I often find myself wanting to use a 3rd party python module in my own project, but I know that I will also need to make changes to the 3rd party module that I want to push upstream. What is the best practice of file layout/installation to achieve this?
Most Python modules are laid out with a root dir containing a "setup.py" to compile/install the module. The problem is, every time I make changes to the module source I need to re-run the full install step in order to use those changes in my project. For large modules, like scipy, this can take some time.
Alternatively, I can hack on the installed version of the python module, but then I have to manually move those changes back to the source version of the module in order to generate patches etc.
I know about virtualenv and PYTHONPATH but they are ways of installing a module to a different location.
So far, I have manually created symlinks, but that is messy.
If the 3rd party project is using setuptools or distribute, you can do python setup.py develop instead of install. This will create the appropriate sym-links in the site-packages dir for you.
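For reference, the two equivalent invocations look like this (run from the checkout of the third-party source):
$ python setup.py develop   # setuptools: link this source tree into site-packages
$ pip install -e .          # the same idea via pip (an "editable" install)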

How to install python modules on a per project basis, rather than system wide?

I am trying to define a process for migrating django projects from my development server to my production server using git, and it's driving me crazy that distutils installs python modules system-wide. I've read the documentation but unless I'm missing something it seems to be mostly about how to change the installation directory. I need to be able to use different versions of the same module in different projects running on the same server, and deploy projects from git without having to download and install dependencies.
tl;dr: I need to know how to install python modules, using distutils, into my project's source tree for version control without compromising other projects using different versions of the same module.
I'm new to python, so I apologize in advance if this is common knowledge.
Besides the already-mentioned virtualenv, which is a good option but has the potential drawback of requiring a third-party module, distutils itself has options to install modules into arbitrary locations. In particular, there is the home scheme, which allows you to "build and maintain a personal stash of Python modules". It's described in the Python documentation set here.
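For example, the home scheme boils down to one install flag plus a matching PYTHONPATH entry (a sketch; the vendor path is a placeholder):
$ python setup.py install --home=/path/to/project/vendor
$ export PYTHONPATH=/path/to/project/vendor/lib/python   # where the home scheme puts pure Python modules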
Perhaps you are looking for virtualenv. It will allow you to install packages into a separate virtual Python "root".
For completeness' sake, virtualenvwrapper makes everyday work with virtualenv a lot quicker and simpler once you are working on multiple projects and/or on multiple development platforms at the same time.
If you are looking for something akin to npm or yarn of the JavaScript world, or composer of the PHP world, then you may want to look at pipenv (not to be confused with pip). Here's a guide to get you started.
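A minimal pipenv session looks like this (a sketch; requests is just an example dependency and myproject a placeholder directory):
$ pip install pipenv
$ cd myproject/
$ pipenv install requests   # creates Pipfile and Pipfile.lock, plus a virtualenv
$ pipenv shell              # activate the project's environment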
Alternatively there is also Poetry, which some people say is even better, but I haven't used it yet.
