Best practice for reusing python code [closed] - python

I have written a Python library app (which contains several *.py files), and several of my Python projects need to reuse the code in that library. What's the recommended best practice for reusing Python code? Currently I have thought of three options:
Copy and paste. This is far from best practice; it violates the DRY principle (don't repeat yourself).
Add the folder of the library app to the environment variable PYTHONPATH: export PYTHONPATH=/path/to/library/app. Then every project on the same computer can reference the code in the library app.
Add the folder of the library app to sys.path in Python code: sys.path.append('/path/to/library/app')
Among the three options above, which one do you prefer? What advantages does it have compared to the other two? Do you have any other, better options? It would be much appreciated if someone with years of Python development experience could answer this question.

Allow me to propose a fourth alternative: take the time to learn how to package your library and install it in your site-packages; it's easier than one may think and I'm convinced it's time well spent. This is a very good starting point: https://packaging.python.org/en/latest/
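As a minimal sketch of what that involves (the package name mylib and its layout are assumptions for the example, not something from the question), a setup.py for the library could look like this:
from setuptools import setup, find_packages

setup(
    name='mylib',              # hypothetical name for the library app
    version='0.1.0',
    packages=find_packages(),  # picks up mylib/ and any subpackages containing __init__.py
)
The layout assumed here is a mylib/ package directory holding your *.py files plus an __init__.py, with setup.py sitting next to it at the project root.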

Of your three options, PYTHONPATH is the way to go. Copy & paste is clearly out, and adding code to your other projects to modify sys.path simply pollutes those files with knowledge about their environment.
A fourth option is to create a true installable package from your common code and install it into your Python installation. Then you can simply import those modules like any other third-party code.

If it's a shared library, you should package it up and add it to site-packages; then you won't have to worry about setting anything up. This is the best option.
If you don't want to use site-packages, then use PYTHONPATH. That's why it exists, and it is the way to do what you want.
You might want to look into using site.addsitedir, since sys.path.append does not prevent duplicates. It will also allow you to leverage .pth files.
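A rough sketch of what that looks like in code (mylib is just a placeholder for one of the modules in your library app):
import site

# Unlike sys.path.append, addsitedir skips paths it already knows about
# and also processes any *.pth files found in the directory.
site.addsitedir('/path/to/library/app')

import mylib  # hypothetical module living in /path/to/library/app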
Dynamically setting/adding things to PYTHONPATH via sys.path achieves the same result, but it can complicate things if you choose to reuse your new library. It also causes issues when your paths change: you have to change code rather than an environment variable. Unless it is completely necessary, you should not set up your Python path dynamically.
Copy & paste is not a reuse pattern. You aren't reusing anything; you are duplicating code and increasing the maintenance burden.

The first way is, as you noted yourself, hardly acceptable as it has countless well-known problems.
The other two have their own problems. For starters, they require manual work when files move (particularly bad if you mix it with the application logic, i.e. put it in *.py files instead of leaving it to the computer it runs on) and require either fixed installation locations (absolute paths) or at least a certain directory structure (relative paths). IMHO, these ways are only acceptable if the applications aren't ever going to move anywhere else. As soon as that becomes a requirement, you should give up and use the existing solution, since the one reason not to use it (that it's overkill for small, local scripts) no longer applies:
Make the common parts, which you apparently already treat as a free-standing library (good!), a fully fledged project in its own right, with a setup.py that allows installing it and adding it to PYTHONPATH in a cross-platform way with a single command. You don't need to actually publish it on PyPI, but doing this makes publishing easier if you change your mind in the future. Should you do so and also publish some of your projects on PyPI, you will also have made installing the project in question and its dependencies easier for every potential user.
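For example, once the setup.py exists, a single command run from the library's root installs it; pip's editable mode is handy while the library is still changing (both commands are standard, nothing project-specific is assumed here):
python setup.py install    # plain install into site-packages
pip install -e .           # or: editable/development install that keeps tracking your working copy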

Related

Knowing existing functions/modules etc

Newbie Python Learner here,
A question for programmers,
English is not my main language, so it may be hard to explain the question I wish to convey.
How do you programmers know which modules exist and which don't?
Say you are writing a script/program etc.
There are modules/functions etc. you may need to create or use in order to write your program, or perhaps to finish it faster. How will you know whether a module/function that could help you already exists? What prevents you from wasting your time writing an entire module/function that already exists without you knowing?
If you're asking how to find Python packages that can help you, in general a quick Google search or two will show you packages other people have used for problems similar to yours.
Experience is the short answer for this.
A more detailed explanation is understanding the scope of your needs. If you believe that the problem you are faced with is a common one, chances are that someone has come up with a solution for it and put it into a module. You become aware of modules by running into a challenge and then searching for how others have solved it. You will most likely run into others who have come across the same thing and have used others' modules to solve it.
The more specific your problem is, the less likely it is that a module already exists for it. For example, plotting data is a very common need, which is why the Matplotlib module is known by most Python programmers. Searching the PyPI website will show you a lot of modules that can come in handy later.
Good Luck and have fun looking at all the oddly specific modules out there!
The main sources of info are
docs.python.org (The Tutorial also introduces important modules in the standard lib)
pypi.org
StackOverflow (of course)
Google
You can be almost sure that basic functionality is provided by the standard library; pypi.org then allows you to search by several criteria.

What is the argument for Python to seemingly frown on importing from different directories?

This might be a more broad question, and more related to understanding Python's nature and probably good programming practices in general.
I have a file, called util.py. It has a lot of different small functions I've collected over the past few months that are useful when doing various machine learning tasks.
My thinking is this: I'd like to continue adding important functions to this script as I go, and so I will want to import util often, now and in the future, in many unrelated projects.
But Python seems to feel like I should only be able to access the code in this file if it lives in my current directory, even if the functions in this file are useful for scripts in different directories. I sense some reason behind the way that works that I don't fully grasp; to me, it seems like I'll be forced to make unnecessary copies often.
If I have to create a new copy of util.py every time I work from within a new directory, on a different project, it won't be long until I have many different versions/iterations of this file scattered all over my hard drive, in various states. I don't desire this degree of modularity in my programming -- for the sake of simplicity, repeatability, and clarity, I want only one file, in only one location, accessible to many projects.
The question in a nutshell: What is the argument for Python to seemingly frown on importing from different directories?
If your util.py file contains functions you're using in a lot of different projects, then it's actually a library, and you should package it as such so you can install it in any Python environment with a single line (python setup.py install), and update it if required (Python's packaging ecosystem has several features to track and update library versions).
An added benefit is that right now, if you're doing what the other answers suggest, you have to remember to manually put util.py on your PYTHONPATH (the "dirty" way). If you try to run one of your programs and you haven't done that, you'll get a cryptic ImportError that doesn't explain much: is it a missing dependency? A typo in the program?
Now think about what happens if someone other than you tries to run the program(s) and gets those error messages.
If you have a library, on the other hand, trying to set up your program will either complain in clear, understandable language that the library is missing or out of date, or (if you've taken the appropriate steps) automatically download and install it so things are ready to roll.
On a related topic, having a file/module/namespace called "util" is a sign of bad design. What are these utilities for? It's the programming equivalent of a "miscellaneous" folder: eventually, everything will end up in it and you'll have no way to know what it contains other than opening it and reading it all.
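As a purely illustrative sketch of that advice (all names invented for the example), splitting a grab-bag util.py into a small package with descriptively named modules might look like:
mltools/                  # repository root
    setup.py
    mltools/
        __init__.py
        plotting.py       # chart and figure helpers
        preprocessing.py  # feature scaling and cleaning
        io_helpers.py     # loading and saving datasets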
Another way is adding the /directory/you/want/to/import/from to the path from within the scripts that need it.
You should have a file __init__.py in the same folder where utils.py lives, to tell Python to treat the folder as a package. The file __init__.py may be empty or not; you can define other things in there.
Example:
/home/marcos/python/proj1/
    __init__.py
    utils.py
/home/marcos/school_projects/final_assignment/
    my_script.py
And then inside my_script.py
import sys
sys.path.append('/home/marcos/python/')  # make the folder containing proj1 importable

from proj1 import utils

MAX_HEIGHT = utils.SOME_CONSTANT
a_value = utils.some_function()
First, define an environment variable. If you are using bash, for example, then put the following in the appropriate startup file:
export PYTHONPATH=/path/to/my/python/utilities
Now, put your util.py and any of your other common modules or packages in that directory. Now you can import util from anywhere and python will find it.
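For instance, assuming util.py defines a function called some_helper (a placeholder name, not something from the question), any script on the machine can then do:
import util

result = util.some_helper()  # some_helper is a hypothetical function in util.py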

Best Practices in Handling Modules Prerequisites

Recently I started working on a personal project on my notebook that, all going well, will be placed on a server elsewhere. The problem is that I make use of modules. Some were installed with apt-get, others with easy_install, and one or two were placed directly under a subdirectory since I changed them a bit. My question is: is there a way to move all those things together? Moreover, I don't want any of those modules being updated, since that may break something. How do I handle that?
Finally, I'm pretty sure that I've done things the wrong way since the beginning. How do you guys work to avoid those problems?
Have a look at virtualenv. Virtualenv is a tool to create isolated Python environments.
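A minimal sketch of that workflow, assuming a Unix-like shell (the package name and version are placeholders):
virtualenv env                    # create an isolated environment in ./env
source env/bin/activate           # use it for this shell session
pip install SomePackage==1.2.3    # pin exact versions so nothing updates behind your back
pip freeze > requirements.txt     # record every installed version
pip install -r requirements.txt   # reproduce the same set on the server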

Script to install and compile Python, Django, Virtualenv, Mercurial, Git, LessCSS, etc... on Dreamhost

The Story
After cleaning up my Dreamhost shared server's home folder from all the cruft accumulated over time, I decided to start afresh and compile/reinstall Python.
All tutorials and snippets I found seemed overly simplistic, assuming (or ignoring) a bunch of dependencies needed by Python to compile all modules correctly. So, starting from http://andrew.io/weblog/2010/02/installing-python-2-6-virtualenv-and-virtualenvwrapper-on-dreamhost/ (so far the best guide I found), I decided to write a set-and-forget Bash script to automate this painful process, including along the way a bunch of other things I am planning to use.
The Script
I am hosting the script on http://bitbucket.org/tmslnz/python-dreamhost-batch/src/
The TODOs
So far it runs fine, and does all it needs to do in about 900 seconds, giving me at the end of the process a fully functional Python / Mercurial / etc... setup without even needing to log out and back in.
I thought this might be of use for others too, but there are a few things that I think it's missing, and I am not quite sure how to go about them, what the best way to do it is, or whether this just doesn't make any sense at all.
Check for errors and break
Check for minor version bumps of the packages and give warnings
Check for known dependencies
Use arguments to install only some of the packages instead of commenting out lines
Organise the code in a manner that's easy to update
Optionally make the installers and compiling silent, with error logging to file
Failproof .bashrc modification to prevent breaking SSH logins and having to log back in via FTP to fix it
EDIT: The implied question is: can anyone, more bashful than me, offer general advice on the worthiness of the above points or highlight any problems they see with this approach? (see my answer to Ry4an's comment below)
The Gist
I am no UNIX or Bash or compiler expert, and this has been built iteratively, by trial and error. It is somehow going towards apt-get (well, 1% of it...), but since Dreamhost and others obviously cannot give root access on shared servers, this looks to me like a potentially very useful workaround; particularly so with some community work involved.
One way to streamline this would be to make it work with one of: capistrano/fabric, puppet/chef, jhbuild, or buildout+minitage (and a lot of cmmi tasks). There are some opportunities for factoring in common code, especially with something more high-level than bash. You will run into bootstrapping issues, however, so maybe leave good enough alone.
If you want to look into userland package managers, there is autopackage (bootstraps well), nix (quickstart), and stow (simple but helps with isolation).
Honestly, I would just build packages with a name prefix for all of the pieces and have them install under /opt so that they're out of the way. That way it only takes the download time and a bit of install time to do.

Organizing Python projects with shared packages

What is the best way to organize and develop a project composed of many small scripts sharing one (or more) larger Python libraries?
We have a bunch of programs in our repository that all use the same libraries stored in the same repository. So in other words, a layout like
trunk/
    libs/
        python/
        utilities/
    projects/
        projA/
        projB/
When the official runs of our programs are done, we want to record what version of the code was used. For our C++ executables, things are simple because as long as the working copy is clean at compile time, everything is fine. (And since we get the version number programmatically, it must be a working copy, not an export.) For Python scripts, things are more complicated.
The problem is that often one project (e.g. projA) will be running while projB needs to be updated. This could cause the working copy revision to appear mixed to projA during runtime. (The code takes hours to run, and can be used as input for processes that take days to run, hence the strong traceability goal.)
My current workaround is, if necessary, to check out another copy of the trunk to a different location and run from there. But then I need to remember to change my PYTHONPATH to point to the second version of libs/python, not the one in the first tree.
There's not likely to be a perfect answer. But there must be a better way.
Should we be using subversion keywords to store the revision number, which would allow the data user to export files? Should we be using virtualenv? Should we be going more towards a packaging and installation mechanism? Setuptools is the standard, but I've read mixed things about it, and it seems designed for non-developer end users (of which we have none).
The much better solution involves not storing all your projects and their shared dependencies in the same repository.
Use one repository for each project, and externals for the shared libraries.
Make use of tags in the shared library repositories, so consumer projects may use exactly the version they need in their external.
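As a hedged example (the repository URL is invented), pinning projA to a tagged version of the shared libraries could look like:
svn propset svn:externals 'libs https://svn.example.com/shared-libs/tags/1.2.0' projA
svn update projA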
Edit: (just copying this from my comment) use virtualenv if you need to provide isolated runtime environments for the different apps on the same server. Then each environment can contain a unique version of the library it needs.
If I'm understanding your question properly, then you definitely want virtualenv. Add in some virtualenvwrapper goodness to make it that much better.
