distutils, distutils2, pip and requirements [closed] - python

I am diving into the world of packaging Python applications and have managed to get into a state of confusion where my head starts to spin from all the concepts and options I am supposed to deal with.
Question:
What do I need to accomplish? Deploy my Python project from source located on a git server. The deployment tool should get and install all the dependencies, most of which are available via pip and one of which needs to be fetched and installed via Git. The final result should be installable via pip, so I can do something like:
[~] git clone git://some/path/project.git
[~] pip install project/
Context:
Currently I am trying to get Distutils2 to do what I want, but it seems that a setup.py made using the 'generate-setup' command doesn't play along with pip.
I wanted to use Distutils2 since it is supposed to be the most future-proof of them all. But the documentation for all these tools is just horrible (accurate info mixed with outdated and inaccurate information), in a way that makes a guy question his sanity.
So what should I do? Stick to distutils and setup.py? Or do I need to take a look at something like Buildout?
Could the kind answerer please lay out what I am supposed to do with each particular tool (e.g.: deploy your code using Distutils2, install dependencies using pip, and for git dependencies write a script and glue everything together by doing XYZ).
Edit: I am using Distutils2 1.0a4, which seems incompatible with the docs.
Edit2: Reformatted the question to make it clearer what my question is really about.
Edit3: This is my fourth attempt at cracking Python's packaging and distribution toolchain. I am not trying to get other people to do my work for me; however, for a rookie it is pretty much impossible to work out what a particular tool is supposed to do, where it starts and where another ends, especially because of the functional overlap between the tools. I am not located in Silicon Valley, encircled by sages who could initiate me into the secrets, and the publicly available documentation is of no use.
Final Edit:
Although I wasn't really thinking about replacing virtualenv with Buildout when I started this question, while doing my research I realized something I always knew but which had never come to me with total clarity: there are many ways to go about Python packaging and deployment automation, and there are many tools which can help you get the job done. However, while there is significant functional overlap between the tools, the toolchain is ever evolving and thus far there is no clear "standard best practice". The distribution toolchain arms race is still in full heat and no clear victor has emerged yet. This may be confusing to us noobs, who expect that most of the shit in Python just works. What I was after (distutils/setuptools + pip + virtualenv in a Buildout fashion, or even semi-integrated with Buildout) is certainly doable, but it just doesn't make much sense -- not because it's not possible, but because nobody does it. If you need this level of sophistication, then you need to commit. Personally I have decided to leave virtualenv behind (for this project) and embrace Buildout.

Take a look at buildout; together with a buildout plugin called mr.developer you can put together a deployment system that'll do all that you ask for.
There are plenty of examples and presentations on buildout configurations on the web, here are a few to get you started:
An introductory presentation on buildout: http://www.slideshare.net/djay/buildout-how-to-maintain-big-app-stacks-without-losing-your-mind
Includes a YouTube video of the presentation, so you can listen along.
Excellent blog post on how to use buildout to develop a Django application.
Includes details on how buildout and setup.py interplay.
The configuration for planet.plone.org: https://github.com/plone/planet.plone.org/blob/master/buildout.cfg
This builds a Venus RSS aggregator with configuration, style, Apache config and cronjobs, pulling in eggs as needed.
The Plone core development buildout: https://github.com/plone/buildout.coredev
Complex buildout that pulls in all the sources needed to develop the Plone CMS; this is a complex beast but it shows off what you can do with mr.developer.
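To give a feel for how the pieces fit together, here is a minimal sketch of a buildout.cfg that uses mr.developer to check out a git-hosted dependency. The package names and the URL below are placeholders, not taken from your project:

[buildout]
extensions = mr.developer
develop = .
sources = sources
auto-checkout = *
parts = app

[sources]
# the dependency that is only available from git (name and URL are made up)
somelib = git git://some/path/somelib.git

[app]
recipe = zc.recipe.egg
eggs =
    project
    somelib

Running bootstrap.py and then bin/buildout would check out somelib from git, develop your own package in place, and resolve the remaining eggs from PyPI.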

It doesn't need to be difficult: install Jenkins and use pip's requirements.txt files to define the packages that your project needs. After that you can configure a build in Jenkins to perform various tasks, including installing the required packages. It can obtain the source code from your repository and install + build the whole project.
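For the git-hosted dependency mentioned in the question, a requirements.txt can mix ordinary PyPI packages with pip's VCS URL syntax. A sketch, where the package names and URL are placeholders:

# requirements.txt
Django==1.4
lxml>=2.3
# pip understands VCS URLs; the #egg= part names the project
git+git://some/path/dependency.git#egg=dependency

A Jenkins build step (or a developer) can then run pip install -r requirements.txt followed by pip install project/ in the checkout, matching the workflow from the question.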

Related

What's going on with Go applications on PyPI?

Rather by accident, I found myself in a situation in a previous role where the previous admin had apparently installed "Python bindings" for InfluxDB and Docker-Compose, and magically both applications were available on the systems, while I was sure that they were written in Go.
I had a few issues with that:
It's incomprehensible to me what is happening here. There should be some Go binary belonging to the application, but I can't find it by name. I doubt that docker-compose and influxdb have been rewritten in Python just to have one more option available, especially since static docker-compose binaries at least are available on GitHub for direct download. It doesn't make a lot of sense to me.
Undermining security guidelines set by the organization and best practices for systems administration.
Dependency Confusion
Links to the packages on PyPI:
https://pypi.org/project/influxdb/
https://pypi.org/project/docker-compose/
I haven't looked into Python wheels and packaging beyond Debian packaging before; I just got curious again and want to get to the bottom of this strange usage pattern.
Docker-Compose refers to https://github.com/docker/compose, a project consisting of 95.5% Go code according to GitHub, which isn't really helpful since the source package and the wheel package on PyPI look completely different, and at first sight I'm overwhelmed by the number of Python files. InfluxDB seems to be a better example, but I would really appreciate help from a Python developer or package maintainer explaining to me what is happening there. Thanks.
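One generic way to see what a PyPI package actually puts on the system, without digging through its source tree, is to ask pip for its metadata and file list after installing it:

$ pip show -f docker-compose
$ pip show -f influxdb

The file list includes any console scripts generated from the package's entry points as well as the Python modules it installs, which makes it easier to tell whether a given package is a set of bindings, a client library, or a complete application implemented in Python.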
Edit 2022-09-10:
From the show notes of Security Now 887: https://www.grc.com/sn/sn-887-notes.pdf
a researcher at Checkmarx noted in a technical report they published last week that “A worrying feature in pip/PyPI allows code to automatically run when developers are merely downloading a package.” He added that the feature is alarming because “a great deal of the malicious packages we are finding in the wild use this feature of code execution upon installation to achieve higher infection rates.”
With my preexisting misconception about some PyPI packages like docker-compose, that sounded alarming to me.
The following article mentions that compiled libraries from C, Rust, Go and others can be bundled in packages, but not whole applications "hidden" as artifacts, which is what I had assumed: https://realpython.com/python-wheels/
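The "code runs on download/install" behaviour quoted above is not an exotic feature: a source distribution's setup.py is ordinary Python that the build/install machinery executes. A trivial, benign illustration (the package name is made up):

# setup.py
from setuptools import setup

# Any statement in this file runs on the machine that builds or installs the sdist.
print("hello from setup.py")

setup(
    name="example-package",  # hypothetical name
    version="0.1",
)

Pre-built wheels, by contrast, are essentially just unpacked, which is the distinction the Real Python article above is drawing.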

What is best practice for working on a Python library package? [closed]

Basically we have a Python library with modules and functions that we use in many of our programs. Currently, we check out the SVN repository directly into C:\Python27\Lib so that the library is on the Python path. When someone makes modifications to the library, everyone updates to get those modifications.
Some of our programs are frozen (using cx-Freeze) and delivered, so we have to keep track of the library version used in the deliveries, but cx-Freeze automatically packages the modules imported in the code.
I don't think it is a good idea to rely on people to verify that they have no uncommitted local changes in the library or that they are up to date before freezing any program importing it.
The only version tracking we have is the commit number of the library repository, which is not linked anywhere to the program delivery version, and which should not be used as a delivery version of the library in my opinion.
I was thinking about using a setup.py to build a distribution of a specific version of that library and then indicating that version in a requirements.txt file in the project folder of the program importing it, but then it becomes complicated if we want to make modifications to that library, because we would have to build and install a distribution each time we want to test it. It is not that complicated, but I fear someone will freeze a program with a test version of that library and we are back where we started...
I kept looking for a best practice for this specific case but found nothing. Any ideas?
Ultimately you're going to have to trust your users to follow whatever development process you establish. You can create tools to make that easier, but you'll always end up having to extend some trust.
Things that have been helpful to a number of people include:
All frozen/shipped builds of an executable are built on a central machine by something like BuildBot or Jenkins, not by individual developers. That gives you a central point for making sure that builds are shipped from clean checkouts.
Provide scripts that do the build and error out if there are uncommitted changes.
Where possible, it is valuable to be able to point PYTHONPATH at your distribution's source tree and have things work, even if there is a setup.py that can build the distribution. That makes testing easier. As always, make sure that your tools for building shipped versions check for this and fail if it happens.
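As an example of the point above about erroring out on uncommitted changes, a small pre-freeze guard might look like this (a sketch only; the library path is an assumption about your layout):

# check_clean.py -- refuse to build if the shared library checkout is dirty
import subprocess
import sys

LIBRARY_PATH = r"C:\Python27\Lib\ourlibrary"  # hypothetical location of the SVN checkout

# 'svn status' prints one line per modified or unversioned file; empty output means clean
status = subprocess.check_output(["svn", "status", LIBRARY_PATH]).decode("utf-8", "replace")
if status.strip():
    sys.stderr.write("Refusing to freeze: uncommitted changes in %s\n%s" % (LIBRARY_PATH, status))
    sys.exit(1)

print("Working copy is clean; safe to run the cx_Freeze build.")

Your central build job (BuildBot/Jenkins) can run this before invoking cx_Freeze, so a dirty checkout fails the build instead of silently being shipped.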
I don't personally think that a distribution has a lot of value over a clean tagged subversion checkout for a library included in closed-source applications.
You can take either approach, but I think you will find that the key is in having good automation for whichever approach you choose, not in the answer to distribution vs. subversion checkout.

What are the use cases for a Python distribution?

I'm developing a distribution for the Python package I'm writing so I can post it on PyPI. It's my first time working with distutils, setuptools, distribute, pip, setup.py and all that, and I'm struggling a bit with a learning curve that's quite a bit steeper than I anticipated :)
I was having a little trouble getting some of my test data files to be included in the tarball by specifying them in the data_files parameter in setup.py, until I came across a different post here that pointed me toward the MANIFEST.in file. Just then I snapped to the notion that what you include in the tarball/zip (using MANIFEST.in) and what gets installed in a user's Python environment when they do easy_install or whatever (based on what you specify in setup.py) are two very different things; in general there is a lot more in the tarball than actually gets installed.
This immediately triggered a code smell for me and the realization that there must be more than one use case for a distribution; I had been fixated on the only one I've really participated in: using easy_install or pip to install a library. And then I realized I was developing a work product where I had only a partial understanding of the end users I was developing for.
So my question is this: "What are the use cases for a Python distribution other than installing it in one's Python environment? Who else am I serving with this distribution and what do they care most about?"
Here are some of the working issues I haven't figured out yet that bear on the answer:
Is it a sensible thing to include everything that's under source control (git) in the source distribution? In the age of GitHub, does anyone download a source distribution to get access to the full project source? Or should I just post a link to my GitHub repo? Won't including everything bloat the distribution and make it take longer to download for folks who just want to install it?
I'm going to host the documentation on readthedocs.org. Does it make any sense for me to include HTML versions of the docs in the source distribution?
Does anyone use python setup.py test to run tests on a source distribution? If so, what role are they in and what situation are they in? I don't know if I should bother with making that work and, if I do, who to make it work for.
Some things that you might want to include in the source distribution but maybe not install (a MANIFEST.in sketch covering these follows at the end of this answer) include:
the package's license
a test suite
the documentation (possibly a processed form like HTML in addition to the source)
possibly any additional scripts used to build the source distribution
Quite often this will be the majority or all of what you are managing in version control and possibly a few generated files.
The main reason why you would do this when those files are available online or through version control is so that people know they have the version of the docs or tests that matches the code they're running.
If you only host the most recent version of the docs online, then they might not be useful to someone who has to use an older version for some reason. And the test suite on the tip in version control may not be compatible with the version of the code in the source distribution (e.g. if it tests features added since then). To get the right version of the docs or tests, they would need to comb through version control looking for a tag that corresponds to the source distribution (assuming the developers bothered tagging the tree). Having the files available in the source distribution avoids this problem.
As for people wanting to run the test suite, I have a number of my Python modules packaged in various Linux distributions and occasionally get bug reports related to test failures in their environments. I've also used the test suites of other people's modules when I encounter a bug and want to check whether the external code is behaving as the author expects in my environment.
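As a concrete illustration of the tarball-versus-install split, a MANIFEST.in along the following lines (a sketch assuming a conventional docs/ and tests/ layout) carries those files in the source distribution without setup.py installing any of them:

include LICENSE
include README.rst
recursive-include docs *
recursive-include tests *

What actually lands in a user's site-packages is still controlled solely by the packages, py_modules and package_data arguments given to setup() in setup.py.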

Where is there a good discussion of using python packages?

Is there a good discussion of setup.py and Python packages from the package user perspective? And maybe also from the package developer viewpoint?
The Hitchhiker's Guide to Packaging is a good read. Covers everything from creating your own packaging to finding packages, installation, virtualenvs, etc.
Strictly speaking, the valid (though subjective) answer for your question would be "yes".
But I think you need the links. Well, from the package developer viewpoint, I had the very same question. In the Python world, package management appears to be a bit of a mess at the moment. Here are some resources that I found useful:
First, James Bennett's (Django dev) ‘On packaging’
Ian Bicking's ‘Corrections To “On Packaging”’ -- a very useful read that finally made clear to me the complicated relationship between distribute and setuptools, as well as many other things
Also, Building and Distributing Packages with Distribute
setup.py files from different projects
(Though I also think it would be better if you told us what exactly you want to learn.)

Are there any other good alternatives to zc.buildout and/or virtualenv for installing non-python dependencies?

I am a member of a team that is about to launch a beta of a python (Django specifically) based web site and accompanying suite of backend tools. The team itself has doubled in size from 2 to 4 over the past few weeks and we expect continued growth for the next couple of months at least. One issue that has started to plague us is getting everyone up to speed in terms of getting their development environment configured and having all the right eggs installed, etc.
I'm looking for ways to simplify this process and make it less error prone. Both zc.buildout and virtualenv look like they would be good tools for addressing this problem but both seem to concentrate primarily on the python-specific issues. We have a couple of small subprojects in other languages (Java and Ruby specifically) as well as numerous python extensions that have to be compiled natively (lxml, MySQL drivers, etc). In fact, one of the biggest thorns in our side has been getting some of these extensions compiled against appropriate versions of the shared libraries so as to avoid segfaults, malloc errors and all sorts of similar issues. It doesn't help that out of 4 people we have 4 different development environments -- 1 leopard on ppc, 1 leopard on intel, 1 ubuntu and 1 windows.
Ultimately, what would be ideal would be something that works roughly like this, from the DOS/Unix prompt:
$ git clone [repository url]
...
$ python setup-env.py
...
that then does what zc.buildout/virtualenv does (copy/symlink the Python interpreter, provide a clean space to install eggs), then installs all required eggs, including any native shared-library dependencies, installs the Ruby project, the Java project, etc.
Obviously this would be useful for both getting development environments up as well as deploying on staging/production servers.
Ideally I would like for the tool that accomplishes this to be written in/extensible via python, since that is (and always will be) the lingua franca of our team, but I am open to solutions in other languages.
So, my question then is: does anyone have any suggestions for better alternatives or any experiences they can share using one of these solutions to handle larger/broader install bases?
Setuptools may be capable of more of what you're looking for than you realize -- if you need a custom version of lxml to work correctly on Mac OS X, for instance, you can put a URL to an appropriate egg inside your setup.py and have setuptools download and install it inside your developers' environments as necessary; it can also be told to download and install a specific version of a dependency from revision control.
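The mechanism this answer alludes to is setuptools' dependency_links. A sketch of a setup.py using it, where all names, versions and URLs are placeholders (and note that recent pip releases no longer honour dependency_links):

# setup.py
from setuptools import setup, find_packages

setup(
    name="ourproject",
    version="0.1",
    packages=find_packages(),
    install_requires=[
        "lxml==2.3.4",       # a pinned PyPI dependency
        "customlib==1.2",    # not on PyPI; found via dependency_links below
    ],
    # extra URLs setuptools may search when resolving install_requires
    dependency_links=[
        "http://packages.example.com/eggs/customlib-1.2-py2.6.egg",
    ],
)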
That said, I'd lean towards using a scriptably generated virtual environment. It's pretty straightforward to build a kickstart file which installs whichever packages you depend on and then boot virtual machines (or production hardware!) against it, with Puppet or similar software doing the rest of the administration (adding users, setting up services [where does your database come from?], etc). This comes in particularly handy when your production environment includes multiple machines -- just script the generation of multiple VMs within their handy little sandboxed subnet (I use libvirt+kvm for this; while kvm isn't available on all the platforms you have developers working on, qemu certainly is, or you can do as I do and have a small number of beefy VM hosts shared by multiple developers).
This gets you out of the headaches of supporting N platforms -- you only have a single virtual platform to support -- and means that your deployment process, as defined by the kickstart file and puppet code used for setup, is source-controlled and run through your QA and review processes just like everything else.
I always create a develop.py file at the top level of the project, along with a packages directory containing all of the .tar.gz files from PyPI that I want to install, and an unpacked copy of virtualenv that is ready to run right from that file. All of this goes into version control. Every developer can simply check out the trunk, run develop.py, and a few moments later will have a virtual environment ready to use that includes all of our dependencies at exactly the versions the other developers are using. And it works even if PyPI is down, which is very helpful at this point in that service's history.
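A sketch of what such a develop.py can look like; the paths and file layout here are assumptions, not the author's actual script:

# develop.py -- bootstrap a local virtualenv from checked-in package archives
import os
import subprocess
import sys

HERE = os.path.dirname(os.path.abspath(__file__))
ENV = os.path.join(HERE, "env")
PACKAGES = os.path.join(HERE, "packages")   # the checked-in .tar.gz files from PyPI

# create the environment with the bundled, unpacked copy of virtualenv
subprocess.check_call([sys.executable,
                       os.path.join(HERE, "virtualenv", "virtualenv.py"), ENV])

# install every pinned archive from the local directory, never contacting PyPI
pip = os.path.join(ENV, "Scripts" if os.name == "nt" else "bin", "pip")
subprocess.check_call([pip, "install", "--no-index",
                       "--find-links", PACKAGES,
                       "-r", os.path.join(HERE, "requirements.txt")])

Because everything the script needs lives in version control, it keeps working even when PyPI is unreachable, which is exactly the property described above.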
Basically, you're looking for a cross-platform software/package installer (along the lines of apt-get/yum/etc.). I'm not sure something like that exists.
An alternative might be specifying the list of packages that need to be installed via the OS-specific package management system such as Fink or DarwinPorts for Mac OS X and having a script that sets up the build environment for the in-house code?
I have continued to research this issue since I posted the question. It looks like there are some attempts to address some of the needs I outlined, e.g. Minitage and Puppet which take different approaches but both may accomplish what I want -- although Minitage does not explicitly state that it supports Windows. Lacking any better options I will try to make either one of these or just extensive customized use of zc.buildout work for our needs, but I still feel like there must be better options out there.
You might consider creating virtual machine appliances with whatever production OS you are running, and all of the software dependencies pre-built. Code can be edited either remotely, or with a shared folder. It worked pretty well for me in a past life that had a fairly complicated development environment.
Puppet doesn't (easily) support the Win32 world either. If you're looking for a deployment mechanism and not just a "dev setup" tool, you might consider looking into ControlTier (http://open.controltier.com/), which has an open-source cross-platform solution.
Beyond that you're looking at "enterprise" software such as BladeLogic or OpsWare, and typically an outrageous price tag for the functionality offered (my opinion, obviously).
A lot of folks have been aggressively using a combination of Puppet and Capistrano (even non-rails developers) for deployment automation tools to pretty good effect. Downside, again, is that it's expecting a somewhat homogeneous environment.
