I have a question about Python packaging.
My project is a library, and I would like to provide a demo of how to use it, in the form of an application with a GUI.
I'm thinking of doing the following, using install_requires from setuptools:
1 package for the library
1 package for the demo with GUI (auto-installed when the first package is, or that installs the library when the demo is installed)
Is that overkill, or is it a decent way to approach it?
Basically, I don't want to bloat my library with an entire demo application.
edit:
I probably need to give a bit more context to explain the situation better.
Basically, my library is an API to represent data and interact with it. But since I'm only offering the representation part, it doesn't actually do anything when you interact with it.
The idea behind the demo is to create an entire environment, in the form of an application, that will provide contextual data to represent -> represent it -> then connect it back to the environment, so that when the user interacts with the representation it actually does something, such as modifying the data.
So in that demo, most of the content is context-based and irrelevant to the library, and that's what I'm referring to when I say clutter/bloat. It's only there to provide context to the user and to have a finished product that actually does something when they test it.
Also I realize that over time I could grow that demo even more to show more possibilities of implementation without having to change the library.
That’s why I want to separate it.
So basically, I want to always install the latest compatible version of the demo when the library is installed, and also potentially be able to update the demo separately to the latest version if needed.
I hope this makes sense :)
Cheers,
If a package provides some kind of optional feature that isn't part of the core functionality, it is usually gated behind extras. This only addresses the dependencies your project has; the demo code itself would always be part of your library. That's usually OK though, because dependencies are where the really big chunks of bandwidth get wasted.
Just to have an example to go on, the popular pyparsing package offers in its version 3 an extra called diagrams. The code to make diagrams work is part of the codebase and will be downloaded when installing pyparsing even without the extra. But its ~400 lines of code are nicely split off the core so that it can't possibly interfere with it. It's not clutter if it can safely be ignored.
The additional packages needed to make it work are a different story. On my machine, running pip install pyparsing==3.0.0a2 takes up 648kB of space. Running pip install pyparsing[diagrams]==3.0.0a2 takes up 1872kB in total, nearly three times as much. And this isn't even an extreme example; it just happened to be the last thing I installed that had an extra.
Summed up, if your optional feature doesn't introduce any additional dependencies, it's best practice to just include the feature. If it does introduce dependencies (e.g. some gui_lib_for_demo), split them off through extras, and maybe do something like this in your demo code:
try:
    import gui_lib_for_demo
except ImportError:
    print("If you want to run the demo, please install this library with "
          "`pip install my_lib[demo]` instead.")
# actual demo code starts here
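For completeness, the demo extra itself would be declared in the library's setup.py. A minimal sketch, reusing the placeholder names my_lib and gui_lib_for_demo from above:

# setup.py -- minimal sketch; the names are placeholders
from setuptools import setup, find_packages

setup(
    name="my_lib",
    version="0.1.0",
    packages=find_packages(),
    extras_require={
        # `pip install my_lib[demo]` also pulls in the GUI dependency;
        # plain `pip install my_lib` leaves it out
        "demo": ["gui_lib_for_demo"],
    },
)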
So far, I have only seen requirements.txt entries like this: Django==2.0. Now I have come across this style of writing: Django>=1.8,<2.1.99.
Can you explain to me what it means?
requirements.txt is a file where one specifies dependencies. For example, your program here depends on Django (you probably do not want to implement Django yourself).
If one only writes a custom application, and does not plan to export it (for example as a library) to other programmers, one can pin the version of the library, for example Django==2.0.1. Then you can always assume (given pip manages to install the correct package) that your environment will have the correct version, and thus that if you follow the corresponding documentation, no problems will (well, should) arise.
If you however implement a library, for example mygreatdjangolibrary, then you probably do not want to pin the version: it would mean that everybody who wants to use your library would have to install Django==2.0.1. Imagine that they want a feature that is only available in django-2.1; then they cannot - given they follow the dependencies strictly - use it: your library requires 2.0.1. This is of course not manageable.
So typically in a library, one aims to give the user as much freedom as possible. It would be ideal if your library worked regardless of the Django version the user installed.
Unfortunately, this would result in a lot of trouble for the library developer. Imagine that you have to take into account that a user can run anything from Django-1.1 up to django-2.1. Through the years, several features have been introduced that the library then cannot use, since the programmer should be conservative and take into account that these features might not exist in the Django version the user installed.
It becomes even worse since Django went through some refactoring: some features have later been removed, so we cannot simply program against django-1.1 and hope that everything works out.
So in that case, it makes sense to specify a range of versions we support. For example, we can read the documentation of django-2.0, look at the release notes to see if something relevant changed in django-2.1, and let tox run the tests we write against both versions. We can then specify a range like Django>=2.0,<2.1.99.
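In a setuptools-based library, such a range goes into install_requires. A minimal sketch, reusing the mygreatdjangolibrary example:

# setup.py -- minimal sketch for a library that supports a range of Django versions
from setuptools import setup

setup(
    name="mygreatdjangolibrary",
    version="0.1.0",
    install_requires=[
        # any Django 2.0.x or 2.1.x, the versions we test with tox
        "Django>=2.0,<2.1.99",
    ],
)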
This is also important if you depend on several libraries that each have a common requirement. Say, for example, you want to install a library liba and a library libb; both depend on Django, but the two have different ranges, for example:
liba:
    Django>=1.10, <2.1
libb:
    Django>=1.9, <1.11
This thus means that we can only install a Django version that satisfies both ranges, i.e. >=1.10 and <1.11.
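You can check such an intersection yourself with the packaging library (the same machinery pip uses to parse version specifiers); a small sketch:

# pip install packaging
from packaging.specifiers import SpecifierSet
from packaging.version import Version

liba_range = SpecifierSet(">=1.10,<2.1")
libb_range = SpecifierSet(">=1.9,<1.11")
combined = liba_range & libb_range  # a version must satisfy both ranges

for candidate in ["1.9", "1.10.8", "1.11", "2.0"]:
    print(candidate, Version(candidate) in combined)
# only the 1.10.x candidate satisfies both constraints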
The above easily gets even more complex. Since liba and libb of course have versions as well, for example:
liba-0.1:
    Django>=1.10, <2.1
liba-0.2:
    Django>=1.11, <2.1
liba-0.3:
    Django>=1.11, <2.2
libb-0.1:
    Django>=1.5, <1.8
libb-0.2:
    Django>=1.10, <2.0
So if we now want to install any liba and any libb, we need to find a version of liba and a version of libb that "allow" us to install some Django version. That is not trivial: if we were to pick libb-0.1, for example, there is no version of liba that supports an "overlapping" Django version.
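We can make that search concrete with a brute-force check over the example versions above; a small sketch using the packaging library again:

# brute force over the example above (version data copied from the text)
from itertools import product
from packaging.specifiers import SpecifierSet

liba = {"0.1": ">=1.10,<2.1", "0.2": ">=1.11,<2.1", "0.3": ">=1.11,<2.2"}
libb = {"0.1": ">=1.5,<1.8", "0.2": ">=1.10,<2.0"}
django = ["1.5", "1.7", "1.10", "1.11", "2.0", "2.1"]

for (va, ra), (vb, rb) in product(liba.items(), libb.items()):
    allowed = SpecifierSet(ra) & SpecifierSet(rb)
    ok = [d for d in django if d in allowed]
    print(f"liba-{va} + libb-{vb}:", ok or "no compatible Django")
# every combination involving libb-0.1 prints "no compatible Django"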
To the best of my knowledge, pip currently has no dependency resolution algorithm. It looks at the specification, each time aims to pick the most recent version that satisfies the constraints, and recursively installs the dependencies of these packages.
Therefore it is up to the user to make sure that (sub)dependencies do not conflict: if we were to specify liba libb==0.1, then pip would probably install Django-2.1, and only then find out that libb cannot work with it.
There are some dependency resolution programs, but the problem turns out to be quite hard (it is NP-hard, if I recall correctly). So that means that for a given dependency tree, it can take years to find a valid configuration.
So I am working on a shared computer. It is a workhorse for computations across the department. The problem we have run into is controlling versions of imported modules. Take, for example, Biopython. Some people require an older version of Biopython - 1.58 - and yet others require the latest Biopython - 1.61. How would I have both versions of the module installed side by side, and how does one specifically access a particular version? I ask because sometimes these APIs change and break old scripts for other people (or they expect certain functionality that is no longer there).
I understand that one could install the module locally (i.e. per user) and specifically direct Python to that module. Is there another way to handle this? Or would everyone have to export a PYTHONPATH before use?
I'm not sure if it's possible to change the active installed versions of a given module. Given my understanding of how imports and site-packages work, I'm leaning towards no.
Have you considered using virtualenv, though?
With virtualenv, you could create multiple shared environments -- one for Biopython 1.58, another for 1.61, another for whatever other special situations you need. They don't need to be locked down to a particular user, so while it would take more space than you desired, it could take less space than everyone having their own Python environment.
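A minimal sketch of such a shared setup (the paths are just an example):

# one shared environment per Biopython version
virtualenv /shared/envs/biopython-1.58
/shared/envs/biopython-1.58/bin/pip install biopython==1.58
virtualenv /shared/envs/biopython-1.61
/shared/envs/biopython-1.61/bin/pip install biopython==1.61
# users then activate whichever environment their script needs:
source /shared/envs/biopython-1.58/bin/activate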
It sounds like you're doing scientific computing. You should use Anaconda, and make particular note of the conda tool, documented here.
Conda uses hard links whenever possible to avoid copies of the same files. It also manages non-python binary modules in a much better way than virtualenv (virtualenv chokes on VTK, for example).
Python 3.x is looking ever more tempting, with cleaned-up syntax (I like it, others may not), new features, and what looks like a gradual progression towards more speed and better multithreading.
But Python 3.x is still held back by a lack of 3rd-party support. Important packages like Django, Twisted, etc. have not been ported. It's hard to get an overview of where the bottlenecks in the migration are, how far it has come, and whether it's progressing at all. The migration dependencies are also hard to map. Also, projects are probably waiting for Python 3.x to offer some major improvement over 2.x that would justify the effort of porting.
Ideally, there would be a site for tracking this migration overall, with (links to) migration plans and dependencies shown so that people willing to help the migration globally could coordinate their efforts and help specific projects. Perhaps also linking to projects' bug tracking systems for relevant migration-related bugs.
But perhaps I'm just not looking hard enough. Does someone know of any efforts to track global migration to Python 3.x?
(By "global", I mean the universe of open source projects built on Python.)
Update:
There's a poll right now on the Python home page which asks about packages you'd like to see ported to Python 3.x.
George Brandl has made a script that generates a graph of the number of packages supporting Python 3.
The link on the CheeseShop front page shows the packages in question: http://pypi.python.org/pypi?%3aaction=browse&c=533&show=all
There is also a (pretty crummy) list of unported packages, ordered by how many packages depend on them: http://onpython3yet.com/ Why do I say it's crummy? Because it is compiled entirely without manual fixing up, resulting in things like listing Python itself as a package. This is to a large extent because people don't know that the "Dependencies" listing isn't a place to list any sort of random dependency; it should be used to list the packages that should be auto-installed when you use easy_install/pip. For example, in the Django world they don't know that, so you see things like "django-saddle" depending on Django and Python, and hence not being easy_installable.
That said, the list is interesting, and we see that PIL really should get ported.
Now, this is not anything "global"; it's just the packages on PyPI, and as such they tend to be mostly Python modules, not separate applications. But I think the general trend is visible there anyway.
The Python Package Index (PyPI) allows you to search for Python 3rd-party modules that support Python 3.x. It even has a Python 3 packages link which lists them all.
But that doesn't track individual projects' progress on Python 3 support. It just tells you which projects have achieved it.
Something I'd be interested to see is a graph of the total number/percentage of Python 3 packages in PyPI over time (from Python 3 release until present). I don't know if anyone has tracked this, or if the PyPI administrators have enough history data to produce such graphs.
What is the best way to organize and develop a project composed of many small scripts sharing one (or more) larger Python libraries?
We have a bunch of programs in our repository that all use the same libraries stored in the same repository. So in other words, a layout like
trunk/
    libs/
        python/
        utilities/
    projects/
        projA/
        projB/
When the official runs of our programs are done, we want to record what version of the code was used. For our C++ executables, things are simple because as long as the working copy is clean at compile time, everything is fine. (And since we get the version number programmatically, it must be a working copy, not an export.) For Python scripts, things are more complicated.
The problem is that often one project (e.g. projA) will be running while projB needs to be updated. This could cause the working-copy revision to appear mixed to projA at runtime. (The code takes hours to run, and can be used as input for processes that take days to run, hence the strong traceability goal.)
My current workaround is, if necessary, to check out another copy of the trunk to a different location and run from there. But then I need to remember to change my PYTHONPATH to point to the second version of libs/python, not the one in the first tree.
There's not likely to be a perfect answer. But there must be a better way.
Should we be using Subversion keywords to store the revision number, which would allow the data user to export files? Should we be using virtualenv? Should we be moving further towards a packaging and installation mechanism? Setuptools is the standard, but I've read mixed things about it, and it seems designed for non-developer end users (of which we have none).
A much better solution is to not store all your projects and their shared dependencies in the same repository.
Use one repository for each project, and externals for the shared libraries.
Make use of tags in the shared library repositories, so consumer projects may use exactly the version they need in their external.
Edit: (just copying this from my comment) use virtualenv if you need to provide isolated runtime environments for the different apps on the same server. Then each environment can contain a unique version of the library it needs.
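For example, pinning the shared library at a tag through an external could look like this (the repository URL and tag are illustrative):

# run inside the consumer project's working copy
svn propset svn:externals 'libs/python https://svn.example.com/shared-libs/tags/1.2.0/python' .
svn commit -m "Pin shared Python libs at tag 1.2.0"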
If I'm understanding your question properly, then you definitely want virtualenv. Add in some virtualenvwrapper goodness to make it that much better.
One issue that comes up during Pinax development is dealing with development versions of external apps. I am trying to come up with a solution that doesn't involve bringing in the version control systems, the reason being that I'd rather not have to install all the possible version control systems on my system (or force that upon contributors) and deal with the problems that might arise during environment creation.
Take this situation (knowing how Pinax works will be beneficial to understanding):
We are beginning development on a new version of Pinax. The previous version has a pip requirements file with explicit versions set. A bug comes in for an external app that we'd like to get resolved. To get that bug fix into Pinax, the current process is to simply make a minor release of the app, assuming we have control of it. For apps we don't control, we just deal with the release cycle of the app author, or force them to make releases ;-) I am not too fond of constantly making minor releases for bug fixes, as in some cases I'd like to be working on new features for the apps as well. Of course, branching the older version is what we do, and then we backport as needed.
I'd love to hear some thoughts on this.
Could you handle this using the ==dev version specifier? If the distribution's page on PyPI includes a link to a .tgz of the current dev version (such as both GitHub and Bitbucket provide automatically), and you append #egg=project_name-dev to the link, both easy_install and pip will use that .tgz if ==dev is requested.
This doesn't allow you to pin to anything more specific than "most recent tip/head", but in a lot of cases that might be good enough?
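As a sketch, the two pieces could look like this (the project name and URL are illustrative):

# link published on the project's PyPI page (or reachable via --find-links):
#   http://github.com/someuser/someproject/tarball/master#egg=someproject-dev
# then, in the pip requirements file:
someproject==dev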
I meant to mention that the solution I had considered before asking was to put up a Pinax PyPI and make development releases on it. We could put up an instance of chishop. We are already using pip's --find-links to point at pypi.pinaxproject.com for packages we've had to release ourselves.
Most open source distributors (Debian, Ubuntu, MacPorts, et al.) use some sort of patch management mechanism. So: import the base source code for each package as released, as a tarball or as an SCM snapshot. Then manage any necessary modifications on top of it using a patch manager, like quilt or Mercurial Queues. Then bundle up each external package, with any applied patches, in a consistent format. Or have URLs to the base packages and URLs to the individual patches, and have them applied during installation. That's essentially what MacPorts does.
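A sketch of that workflow with quilt, applied on top of an unpacked release tarball (the file and patch names are illustrative):

tar xzf externalapp-1.0.tar.gz
cd externalapp-1.0
quilt new fix-issue-123.patch   # start a new patch on the stack
quilt add externalapp/models.py # register the file before editing it
# ... edit the file ...
quilt refresh                   # record the edits into the patch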
EDIT: To take it one step further, you could then version control the set of patches across all of the external packages and make that available as a unit. That's quite easy to do with Mercurial Queues. Then you've simplified the problem to just publishing one set of patches using one SCM system, with the patches applied locally as above or available for developers to pull and apply to their copies of the base release packages.
EDIT: I am not sure I am reading your question correctly so the following may not answer your question directly.
Something I've considered, but haven't tested, is using pip's freeze bundle feature. Perhaps using that and distributing the bundle with Pinax would work? My only concern would be how different OSes are handled. For example, I've never used pip on Windows, so I wouldn't know how a bundle would behave there.
The full idea I hope to try is creating a paver script that controls management of the bundles, making it easy for users to upgrade to newer versions. This would require a bit of scaffolding though.
One other option may be for you to keep a mirror of the apps you don't control, in a consistent VCS, and then distribute your mirrored versions. This would remove the need for "everyone" to have many different programs installed.
Other than that, it seems the only real solution is what you guys are doing; there isn't a hassle-free way that I've been able to find.