How to use a utils module across projects and ensure it stays updated - python

I have made a few Python projects for work that all revolve around extracting data, performing Pandas manipulations, and exporting to Excel. Obviously, there are common functions I've been reusing. I've saved these into utils.py, and I copy-paste utils.py into each new project.
Whenever I change utils.py, I need to make the same change in my other projects as well, which is an error-prone process.
What would you suggest?
Currently, I create a new directory for each project, so
/PyCharm Projects
--/CollegeBoard
----/venv
----/CollegeBoard.py
----/Utils.py
----/Paths.py
--/BoxTracking
----/venv
----/BoxTracking.py
----/Utils.py
----/Paths.py
I'm wondering if this is the most effective way to structure/version control my work. Since I have many imports in common, too, would a directory like this be better?
/Projects
--/Reporting
----/venv
----/Collegeboard
------/Collegeboard.py
------/paths.py
----/BoxTracking
------/BoxTracking.py
------/paths.py
----/Utils.py
I would appreciate any related resources.

Instead of putting a copy of utils.py into each of your projects, make utils.py into a package with its own dedicated repository/folder somewhere. I'd recommend renaming it to something less generic, such as "zhous_utils".
In that dedicated repository for zhous_utils, you can create a setup.py file and use it to install the current version of zhous_utils into your Python install. That way you can import zhous_utils into any other Python script on your PC, just like you would import pandas or any other package you've installed.
Check out this Stack Overflow thread: What is setup.py?
When you understand setup.py, then you will understand how to make and install your own packages so that you can import those installed packages into any Python script on your PC. That way all source code for zhous_utils is centralized in just one folder on your PC; update it whenever you want and re-install the package.
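For reference, a minimal setup.py might look something like this (the version number and dependency list here are illustrative, not prescriptive):

# setup.py - minimal sketch for packaging zhous_utils
from setuptools import setup, find_packages

setup(
    name='zhous_utils',
    version='0.1.0',                          # illustrative version
    packages=find_packages(),
    install_requires=['pandas', 'openpyxl'],  # assuming the helpers wrap pandas/Excel
)

With that file in place, running pip install . from the repository folder installs the package; pip install -e . installs it in editable mode, so edits to the source are picked up without re-installing.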
Now, of course, there are some potential challenges/downsides to this. If you install zhous_utils to your computer and then import and use zhous_utils in one of your other projects, then you've just made zhous_utils into a dependency of that project. That means that if you want to share that project with other people and let them work on it as well or use it in some way, then they will need to install zhous_utils. Just be aware of that. This won't be an issue if you're the only one interacting/developing the source code of the projects you intend to import zhous_utils into.

Related

Is it possible to have users not pip install modules and instead include the modules used in a different folder and then import that?

I want to know if I can create a Python script with a folder in the same directory containing all the assets of a Python module, so that when someone wants to use it, they would not have to pip install the module, because it would be imported from that directory.
Yes, you can, but it doesn't mean that you should.
First, ask yourself who is supposed to use that code.
If you plan to give it to consumers, it would be a good idea to use a tool like py2exe and create an executable file which would include all modules and not allow the code to be changed.
If you plan to share it with another developer, you might want to look into virtual environments and a requirements.txt file.
There are multiple reasons why bundling modules is a bad idea:
It is harder to update modules later, at least without upgrading the whole project.
It uses more space in version control, which can create issues on huge projects with hundreds of modules and branches.
It might be illegal, as some licenses specifically forbid including their code in your source code.
The pip install of some modules might do different things depending on the operating system version or installed packages. The modules on your machine might be suboptimal on someone else's machine, and in some instances might not even work.
And there are probably more that I can't think of right now.
The only situation where I saw this being unavoidable was when a module didn't support the Python implementation the application was running on. The module was changed, and its source was put under a lib folder with the rest of the libraries.
I think you can add the directory with Python modules to PYTHONPATH. Then people who want to use those modules just need to have this environment variable set.
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH
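A quick sketch of that approach (the directory and module name below are hypothetical):

# Suppose the shared modules live in /opt/shared/libs.
# Set the variable once per shell before launching Python:
#   export PYTHONPATH=/opt/shared/libs     (Unix shells)
#   set PYTHONPATH=C:\shared\libs          (Windows cmd)
import sys
print(sys.path)        # the PYTHONPATH entries appear near the front
import shared_helpers  # hypothetical module, now found via PYTHONPATH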

How to maintain python application with dependencies, including my own custom libs?

I'm using Python to develop a few company-specific applications. There is a custom shared module ("library") that describes some data and algorithms, and there are dozens of Python scripts that work with this library. There are quite a lot of these files, so they are organized in subfolders
myproject
  apps
    main_apps
      app1.py
      app2.py
      ...
    utils
      util1.py
      util2.py
      ...
  library
    __init__.py
    submodule1
      __init__.py
      file1.py
      ...
    submodule2
      ...
Users want to run these scripts by simply going, say, to myproject\utils and launching "py util2.py some_params". Many of these users are developers, so quite often they want to edit the library and immediately re-run the scripts with updated code. There are also some 3rd-party libraries used by this project, and I want to make sure that everyone is using the same versions of these libs.
Now, there are two key problems I encountered:
how to reference (library) from (apps)?
how to manage 3rd party dependencies?
The first problem is well-familiar to many Python developers and has been asked on SO many times: it's quite difficult to instruct Python to import a package from "....\library". I tested several different approaches, but it seems that Python is reluctant to search for packages anywhere but in standard library locations or the folder of the script itself.
A relative import doesn't work since the script is not part of the library (and even if it were, it still wouldn't work when the script is executed directly, unless it's placed in the "root" project folder, which I'd like to avoid)
Placing a .pth file (as one might think would work from reading this document) in the script folder apparently doesn't have any effect
Of course, directly meddling with sys.path works, but boilerplate like the following in each and every script file looks quite terrible:
import sys, os.path
# Compute the project root (two levels up from this script) and put it on sys.path
here = os.path.dirname(os.path.realpath(__file__))
module_root = os.path.abspath(os.path.join(here, '../..'))
sys.path.append(module_root)
import my_library
I realize that this happens because Python wants my library to be properly "installed", and that would indeed be the only right way to go had this library been developed separately from the scripts that use it. But unfortunately that's not the case, and I think that re-doing the "installation" of the library each time it changes is going to be quite inconvenient and error-prone.
The second problem is straightforward. Someone adds a new 3rd-party module to our app/lib and everyone else starts seeing import problems once they update their apps. Several branches of development, different moments when users run pip install, a few rollbacks - and everyone eventually ends up using different versions of 3rd-party modules. In my case things are additionally complicated by the fact that many devs still work a lot with older Python 2.x code while I'd like to move on to Python 3.x.
While looking for a possible solution to my problems, I found the truly excellent virtual environments feature of Python. Things looked quite bright:
Create a venv for myproject
Distribute a requirements.txt file as part of the app and provide a script that populates the venv accordingly
Symlink my own library into the venv site-packages folder so it'll always be detected by Python
This solution looked quite natural & robust. I'm explicitly setting up my own environment for my project and placing whatever I need into this venv, including my own lib, which I can still edit on the fly. And it does indeed work. But calling activate.bat to make this Python environment active and another batch file to deactivate it is a mess, especially on Windows. The boilerplate code that edits sys.path looks terrible, but at least it doesn't interfere with UX the way this potential fix does.
So there's a question that I want to ask.
Is there a way to bind particular python venv to particular folders so python launcher will automatically use this venv for scripts from these folders?
Is there a better alternative way to handle this situation that I'm missing?
Environment for my project is Python 3.6 running on Windows 10.
I think that I finally found a reasonable answer. It's enough to just add a shebang line pointing to the Python interpreter in the venv, e.g.
#!../../venv/Scripts/python
The full project structure will look like this
myproject
  apps
    main_apps
      app1.py (with shebang)
      app2.py (with shebang)
      ...
    utils
      util1.py (with shebang)
      util2.py (with shebang)
      ...
  library
    __init__.py
    submodule1
      __init__.py
      file1.py
      ...
    submodule2
      ...
  venv
    (python interpreter, 3rd party modules)
    (symlink to library)
  requirements.txt
  init_environment.bat
and things work like this:
venv is a virtual Python environment with everything the project needs
init_environment.bat is a script that populates the venv according to requirements.txt and places a symlink to my library into the venv site-packages (sketched just below this list)
all scripts start with a shebang line pointing (via a relative path) to the venv interpreter
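A hypothetical sketch of what init_environment.bat might contain (assuming it is run from the myproject root; the exact paths are assumptions):

@echo off
rem Install the pinned 3rd-party packages into the venv
venv\Scripts\pip install -r requirements.txt
rem Symlink the shared library into the venv's site-packages
rem (mklink needs elevated privileges on current Win10 builds)
mklink /D venv\Lib\site-packages\library ..\..\..\library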
There's a full custom environment with all the libs, including my own, and the scripts that use it all have very natural imports. The Python launcher will also automatically pick Python 3.6 as the interpreter & load the relevant modules whenever any user-facing script in my project is launched from the console or Windows Explorer.
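For concreteness, a sketch of what one of the utils scripts might look like under this scheme (the module names are taken from the tree above; the body is hypothetical):

#!../../venv/Scripts/python
# The py launcher reads the relative shebang above and runs this script
# with the venv's interpreter, so the import below resolves against the
# venv's site-packages, where "library" is symlinked.
from library.submodule1 import file1

print(file1.some_function())  # hypothetical call into the shared library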
Cons:
A relative shebang won't work if a script is called from another folder
Users will still have to manually run init_environment.bat to update the virtual environment according to requirements.txt
The init_environment script on Windows requires elevated privileges to make a symlink (but hopefully that strange MS decision will be fixed with the upcoming Win10 update in April '17)
However, I can live with these limitations. Hope this helps others facing similar problems.
It would still be nice to hear other options (as answers) and critiques (as comments) too.

import openpyxl in django

I am quite new to Python and Django. I have a problem with integrating a Python package (openpyxl) into my Django app. I'd like to use the methods of these files in my views.py file.
My problem is first that I don't know where's the best place to put the openpyxl folder containing all the files in my file hierarchy.
My hierarchy looks like this:
http://imgur.com/t4iOX98
Is it well placed? Should I put it outside the international folder? inside the carte_interactive folder?
And my biggest problem is inside the __init__.py of openpyxl. I get errors on lines like this one:
from openpyxl.xml import LXML
where there is no resolved reference to LXML, even though it is actually defined in openpyxl's xml module.
Is it my bad file placement that caused this? Or is it Django? Or is it openpyxl's fault? Does anyone have an idea?
You can see openpyxl's source files here, where I downloaded them:
https://bitbucket.org/openpyxl/openpyxl/src
If you need any more details, please ask!
Thanks in advance!
I applaud your enthusiasm for wanting to learn Django while being new to Python. That said, the way you have things set up right now will make your life unnecessarily difficult to manage.
I would first recommend reading up on best practices for setting up a Django project. Just doing a quick google search for "Django project layout best practices" will give you a lot of resources, but they'll all essentially tell you to do what's in the SO answer above.
The second very basic thing is using pip to install and use other Python packages. This is especially important for a Django project, where you often have a lot of dependencies outside of Django. Pip is a program that installs additional Python packages. They get installed into a location on your PYTHONPATH, which is just a list of filepaths on disk where Python will look for additional packages. If you're on a *NIX system, this is usually somewhere like /usr/lib/python2.7/. Once a package is on your Python path, any piece of your code can use it via the Python import system. Essentially, all the import system does is look through each location on your PYTHONPATH for the library you're trying to import.
Finally, with regard specifically to lxml, you will want to install it via apt or some other package installer (e.g., on Ubuntu, apt install python-lxml).
To keep track of all your external Python dependencies, put them in a file named "requirements.txt" in the top-level directory. This is a pretty standard thing to do for Django projects, so don't worry about shipping code with ALL dependencies inside the project.
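For example, a requirements.txt might contain lines like these (the packages and versions are purely illustrative):

Django==1.10.5
openpyxl==2.4.1

and every developer then installs the exact same set with:

$ pip install -r requirements.txt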
Thanks to all of you! I'm using JetBrains PyCharm, and when I wrote import openpyxl, it gave me the choice to install the package. I suppose it does so with pip, which would certainly have worked the same way. And I put the package in requirements.txt, so that other users would only have to install this requirement!
It works now! And thanks for the link on the best practices. I'll read that!

How to import a set of Python modules globally?

In my particular setting, I have a set of Python modules that contain auxiliary functions used by many different other modules. I put them into a LIBS folder, and I have other folders at the same path level containing modules that do certain jobs with the help of these LIBS modules. Presently, I do this in all the modules to import the LIBS modules:
import sys
sys.path.insert(0, '../LIBS')
import lib_module1
import lib_module2
....
As the project gets larger, this starts to be a pain in the neck. I need to write a large set of import statements for these auxiliary LIBS modules in each new module.
Is there any way to automatically import all these LIBS modules for the other modules living in folders at the same path level as the LIBS folder?
For this, you can use
__init__.py
Kindly refer to the Python Modules documentation and Stack Overflow.
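That suggestion in sketch form, using the module names from the question (this assumes LIBS is turned into a package and is findable on sys.path or PYTHONPATH):

# LIBS/__init__.py
# Re-export the helper modules so a single "import LIBS" exposes
# them all as LIBS.lib_module1, LIBS.lib_module2, ...
from . import lib_module1
from . import lib_module2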
Indeed there is! Start treating your LIBS modules as "real" modules (or packages) that are installed into the system like any other.
This means you will have to write a setup.py script to install your code. Generally this is done inside your development directory, then your module is installed with:
$ sudo python setup.py install
This will install your module under the site-packages subdirectory of wherever Python libraries are stored on your system.
I suggest starting by copying someone else's working setup.py and supporting files, then modifying to suit your packages. For example, here is my quoter module.
Fair warning: This is a pretty big step. Not only will you learn to deploy your module locally, you can also publish it on PyPI if you wish. The step of moving to true packages will encourage you to write more and more standard documentation, to develop and run more tests, to adopt more rigorous version specifications, to more clearly identify and define code dependencies, and take many other "professionalization" steps. These all pay dividends in better, more reliable, more portable, more easily deployed code--but I'd be lying if I didn't admit the learning curve can be steep at times.

Python / Git / Module structure best practice

We have a lot of small projects that share common utility "projects".
Example:
utility project math contains function add
project A and project B both need math.add
project A has nothing to do with project B
so is it a good idea to have 3 git repositories (project_A, project_B and math) and clone them locally as
/SOMWHERE/workspace/project_A
/SOMWHERE/workspace/math
and have in /SOMWHERE/workspace/project_A/__init__.py something like
import sys
# Make the sibling checkout visible on the module search path
sys.path.append('../math')
# Note: the name "math" collides with Python's built-in math module,
# which takes precedence, so this import would not actually pick up
# the local utility package; a less generic name is needed in practice.
import math
math.add()
I have read Structuring Your Project, but that doesn't cover SCM and sharing modules.
So to sum up my question: is
sys.path.append('../math')
import math
good practice or is there a more "pythonic" way of doing that?
Submodules are a suboptimal way of sharing modules, as you said in your comments. A better way would be to use the tools offered by your language of choice, i.e. Python.
First, create virtualenvs to isolate each project's Python environment. Use pip to install packages, and store dependencies in a requirements.txt file.
Then, you can create a specific package for each of your utils libraries using distutils and share it on PyPI.
If you don't want to release your packages into the wild, you can also host your own PyPI server.
Using this setup, you will be able to use different versions of your libraries and work on them without breaking compatibility with older code bases. You will also avoid using submodules, which are difficult to use with git.
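Note that pip can also install a package directly from a git repository, which suits the one-repo-per-utility layout described in the question (the URL below is hypothetical):

$ pip install git+ssh://git@yourserver/workspace/math_utils.git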
All of what you describe (3 projects) sounds fine, except that you shouldn't mess around with sys.path. Instead, set the PYTHONPATH environment variable.
Also, if you were not aware of distutils, I am guessing you may be new to Python development and may not know about virtualenv. You should use that too (it allows you to develop against a "clean" Python version that has no packages, or only the packages you install for that env).
