Is there a good discussion of setup.py and Python packages from the package user's perspective? And perhaps from the package developer's viewpoint as well?
The Hitchhiker's Guide to Packaging is a good read. It covers everything from creating your own packages to finding packages, installation, virtualenvs, etc.
Strictly speaking, the valid (though subjective) answer for your question would be "yes".
But I think you need the links. Well, from the package developer's viewpoint, I had the very same question. In the Python world, package management appears to be a bit of a mess at the moment. Here are some resources that I found useful:
First, James Bennett's (Django dev) ‘On packaging’
Ian Bicking's ‘Corrections To “On Packaging”’: a very useful read; it finally made clear to me the complicated relationship between distribute and setuptools, as well as many other things
Also, Building and Distributing Packages with Distribute
setup.py files from different projects
(Though I also think it would be better if you told us what exactly you want to learn.)
Rather by accident, I found myself in a situation in a previous role where the previous admin had apparently installed "Python bindings" for InfluxDB and Docker-Compose, and magically both applications were available on the systems, while I was sure that they were written in Go.
I had a few issues with that:
It's incomprehensible what happens here: there should be some Go binary belonging to the application, but I can't find one by that name. I doubt that docker-compose and InfluxDB have been rewritten in Python just to have one more option available, especially while static docker-compose binaries are available on GitHub for direct download. It doesn't make a lot of sense to me.
Undermining security guidelines set by the organization and best practices for systems administration.
Dependency Confusion
Links to the packages on PyPI:
https://pypi.org/project/influxdb/
https://pypi.org/project/docker-compose/
I hadn't looked into Python wheels and packaging before, beyond Debian packaging; I just got curious again and wanted to get to the bottom of this strange usage pattern.
Docker-Compose refers to https://github.com/docker/compose, a project consisting of 95.5% Go code according to GitHub, which isn't really helpful, since the source package and wheel package on PyPI look completely different and at first sight I'm overwhelmed by the amount of Python files. InfluxDB seems to be a better example, but I would really appreciate help from a Python developer or package maintainer explaining to me what is happening there. Thanks.
Edit 2022-09-10:
From the show notes of Security Now 887: https://www.grc.com/sn/sn-887-notes.pdf
a researcher at Checkmarx noted in a technical report they published last week that “A worrying feature in pip/PyPI allows code to automatically run when developers are merely downloading a package.” He added that the feature is alarming because “a great deal of the malicious packages we are finding in the wild use this feature of code execution upon installation to achieve higher infection rates.”
With my preexisting misconception about some PyPI packages like docker-compose, that sounded alarming to me.
The following article mentions that compiled libraries from C, Rust, Go and others can be bundled in wheel packages, but not entire applications "hidden" as artifacts, as I had assumed: https://realpython.com/python-wheels/
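If I understand the wheel format correctly, the missing piece is that a pure-Python package can declare a console_scripts entry point, and pip then generates a small executable wrapper on PATH at install time, which would explain how a command appears without any Go binary. A minimal sketch of such a setup.py (all names here are made up, not taken from the actual docker-compose or influxdb packages):

# setup.py -- hypothetical package exposing a command-line tool
from setuptools import setup, find_packages

setup(
    name="sometool",
    version="1.0",
    packages=find_packages(),
    entry_points={
        # pip creates a "sometool" wrapper script on PATH that
        # imports sometool.cli and calls its main() function
        "console_scripts": [
            "sometool = sometool.cli:main",
        ],
    },
)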
I've googled and googled, but have found almost nothing in the way of discussions or best practices in managing larger enterprise codebases in Python. Here, I'm simply soliciting any and all pointers to such information. Here's some background and some of the questions I'm looking to answer.
We're long-time Java developers who have solved similar problems to those mentioned below largely using well-established Java best practices, as well as Maven, Ant and a Sonatype Nexus repo.
I'm talking internal software only here. We're not looking to distribute anything Python-based. We've got multiple development groups using Python, each developing sharable utility code libraries, final web applications and stand-alone tools, all in pure Python. Each group has its own Github source repository.
How do we manage our shareable code, both within a group and across groups? Do we create eggs (or something similar) and distribute and install them into the Python system? If so, would we store them in our Nexus repo like our Java jars, or is there a more Python-specific method of internal package distribution? Or do we just share raw code, checking out sources from multiple Github repos?
If we share raw code, how do we manage getting the Python search path right as we bring together code from multiple repositories?
How do we manage package namespaces when we want our packages to all live in a com.ourcompany base namespace? It seems like Python isn't too happy when you bring together source trees with overlapping namespaces.
How do we manage third party package versioning? I've never seen easy_install or pip passed a version number. How do we lock down third party package versions?
Do tools exist to aid in Python code reviews, CI, regression testing, etc.?
We're relative newbies to Python code, so some of these questions may have fairly obvious answers. Still, I find it surprising that I can't find more information on managing larger Python codebases.
What issues will we encounter that I haven't thought to ask about, or don't yet know enough to even know to ask about?
Any valuable pointers will be greatly appreciated.
Well, I won't even try to answer all those (excellent) questions, but here are a few opinionated pointers which will hopefully help (from someone who works in both worlds, though more in Java).
Packaging
If so, would we store them in our Nexus repo like our Java jars, or is there a more Python-specific method of internal package distribution? Or, do we just share raw code, checking out sources from multiple Github repos?
Packaging in Python is historically a bit of a mess, IMHO, though it feels like it's improving. Distutils is the major/native tool here; I've not used it much, and it feels slightly scary in places. In general, also check the recommended tools.
Pip has all but won the mindshare war, especially for installing 3rd-party libraries. I've not solved the local library problem myself (maybe someone else reading has), but if I were to, I'd probably opt for Pip with local/network-disk repos, e.g. by installing from wheels.
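A rough sketch of how that local-wheel approach could look (the shared path and package name are hypothetical):

[~] pip wheel --wheel-dir=/mnt/shared/wheels ./our-utils
[~] pip install --no-index --find-links=/mnt/shared/wheels our-utils

The --no-index flag stops pip from falling back to PyPI, which is usually what you want for internal-only packages.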
Another option (which can cause all sorts of hassles itself) is to package in your OS's native packager, be it Debian-style apt or by creating RPMs, etc. Of course, Windows not so much.
Versioning etc
How do we manage third party package versioning? I've never seen easy_install or pip passed a version number.
Pip
Pip definitely supports version specifiers. Turns out Easy Install does too. I suppose many people / smaller projects opt for latest-and-greatest, which of course isn't always as "appropriate" in the enterprise...
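For example (the package names and versions here are purely illustrative):

[~] pip install "Django==1.4.2"
[~] pip install "requests>=0.14,<1.0"
[~] pip install -r requirements.txt

Checking a requirements.txt with pinned versions into the repo is the usual way to lock third party versions down across a team.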
Virtualenv
No discussion of versioning and Python would be complete without a Python 2/3 reference, but I'm sure you're aware of all this already.
More important, then, would be to mention virtualenv. It truly frees you from the mess you can get into testing multiple versions, bearing in mind especially that your (*NIX) operating systems typically rely heavily on Python themselves. It's a big subject, so have a look at the docs.
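The day-to-day workflow is only a couple of commands (the directory name is arbitrary):

[~] virtualenv env
[~] source env/bin/activate
(env) [~] pip install -r requirements.txt

Everything installed while the environment is active lands inside env/ rather than in the system Python, so each project gets its own isolated set of packages.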
Developer Tooling
Do tools exist to aid in Python code reviews, CI, regression testing,
etc.?
Code Review
Very much so. Most code review tools are multi-language (it's mostly a text-diffing issue really), so just pick your favourite enterprise-friendly one, be it Crucible, GitHub's built-in reviews, Barkeep, Gerrit, or whatever.
CI
For CI you have almost as many options again. Running Python apps is usually less involved than Java ones, so most CI systems, though often Java-focused, support Python. (FWIW, we use drone.io for Quod Libet.) Jenkins should have no problem doing this, and it seems people have done so with TeamCity.
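As a sketch, a Jenkins free-style job for a Python project often boils down to an "Execute shell" build step along these lines (the project layout here is assumed, not prescribed):

virtualenv env
. env/bin/activate
pip install -r requirements.txt
python -m unittest discover tests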
However, the "original" or "most Pythonic" option is probably Buildbot, though I've not used it personally. It looks a lot newer than I remember, and it has quite a lot of support in the Python community, I think...
Testing
For testing, though not quite as mature as JUnit / TestNG, check out the de facto, JUnit-like unittest module in the standard library, but also (nicer?) alternatives like nose.
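To give a flavour, a minimal unittest test case looks like this (the test content is obviously illustrative):

import unittest

class TestArithmetic(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)

if __name__ == "__main__":
    unittest.main()

A big part of nose's appeal is that it discovers and runs such tests automatically, without the __main__ boilerplate.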
For higher-level (BDD) testing, try something like Lettuce, which, as the name implies, is heavily inspired by Cucumber, or maybe Behave. I've not tried them, but the common opinion is that they're less mature than Cucumber / JBehave / Concordion / RSpec etc.
I am diving into the world of packaging Python applications and managed to get into this state of confusion where my head starts to spin due to all the concepts and options I am supposed to deal with.
Question:
What do I need to accomplish? Deploy my Python project from source located on a git server. The deployment tool should get and install all the dependencies, most of which are available via pip, and one of which needs to be fetched and installed via Git. The final result should be installable via pip, so I can do something like:
[~] git clone git://some/path/project.git
[~] pip install project/
Context:
Currently I am trying to get Distutils2 to do what I want, but it seems that a setup.py made using the 'generate-setup' command doesn't play along with pip.
I wanted to use Distutils2 since it is supposed to be the most future-proof of them all. But the various documentation on all the tools is just horrible (accurate info mixed with outdated and inaccurate information) in a way that makes a guy question his sanity.
So what should I do? Stick to distutils and setup.py? Or do I need to take a look at something the likes of Buildout?
Could the kind answerer please lay out what I am supposed to do with each particular tool (e.g.: deploy your code using Distutils2, install dependencies using pip, for git dependencies write a script, and glue everything together doing XYZ).
Edit: I am using Distutils2 1.0a4, which seems incompatible with the docs.
Edit2: Reformatted the question to make it clearer what my question is really about.
Edit3: This is my fourth attempt at cracking Python's packaging and distribution toolchain. I am not trying to get other people to do my work for me; however, for a rookie it is pretty much impossible to work out what a particular tool is supposed to do, where it starts and where another ends, especially because of the functional overlap between the tools. I am not located in Silicon Valley, encircled by sages who could initiate me into the secrets, and the publicly available documentation is of no use.
Final Edit:
Although I wasn't really thinking about replacing virtualenv with Buildout when starting this question, while doing my research I realized something I always knew, but which had never come to me in total clarity: there are many ways to approach Python packaging and deployment automation, and there are many tools which can help you get the stuff done. However, while there is significant functional overlap between the tools, the toolchain is ever evolving and thus far there is no clear "standard best practice". The distribution toolchain arms race is still in full heat and no clear victor has emerged yet. This may be confusing to us noobs, who expect that most stuff in Python just works. What I was after (distutils/setuptools + pip + virtualenv in a Buildout fashion, or even semi-integrated with Buildout) certainly is doable, but it just doesn't make much sense: not because it's not possible, but because nobody does it. If you need this level of sophistication, then you need to commit. Personally I have decided to leave virtualenv behind (for this project) and embrace Buildout.
Take a look at buildout; together with a buildout plugin called mr.developer you can put together a deployment system that'll do all that you ask for.
There are plenty of examples and presentations on buildout configurations on the web, here are a few to get you started:
An introductory presentation on buildout: http://www.slideshare.net/djay/buildout-how-to-maintain-big-app-stacks-without-losing-your-mind
Includes a YouTube video of the presentation, so you can listen along.
Excellent blog post on how to use buildout to develop a Django application.
Includes details on how buildout and setup.py interplay.
The configuration for planet.plone.org: https://github.com/plone/planet.plone.org/blob/master/buildout.cfg
This builds a Venus RSS aggregator with configuration, style, Apache config and cronjobs, pulling in eggs as needed.
The Plone core development buildout: https://github.com/plone/buildout.coredev
Complex buildout that pulls in all the sources needed to develop the Plone CMS; this is a complex beast but it shows off what you can do with mr.developer.
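To make the shape of this concrete, here is a minimal buildout.cfg sketch using mr.developer; the project and repository names are invented, and a real configuration would pick recipes to match your deployment:

[buildout]
extensions = mr.developer
parts = app
auto-checkout = *

[sources]
# mr.developer clones this into src/ and installs it as a development egg
mydependency = git git://some/path/dependency.git

[app]
recipe = zc.recipe.egg
eggs =
    myproject
    mydependency

Running bin/buildout then checks out the Git dependency, resolves the eggs, and generates the scripts in bin/.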
It doesn't need to be difficult: install Jenkins and use pip's requirements.txt files to define the packages that your project needs. After that you can configure a build in Jenkins to perform various tasks, including installing the required packages. It can obtain the source code from your repository and install + build the whole project.
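A sketch of such a requirements.txt, including a Git-hosted dependency of the kind the question describes (names and versions are illustrative):

# requirements.txt
Django==1.3.1
requests==0.13.2
git+git://some/path/dependency.git#egg=dependency

pip understands the git+ URL scheme natively, so the single Git dependency needs no special handling; pip install -r requirements.txt fetches and installs everything.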
My program requires specific versions of several python packages. I don't want to have to require the user to specifically install the specific version, so I feel that the best solution is to simply install the package within the source repository, and to distribute it along with my package.
What is the simplest way to do this?
(Please be detailed - I'm familiar with pip and easy_install, but they don't seem to do this, at least not by default).
Go for virtualenv. Life will be much easier. MUCH easier. Basically, it allows you to create specific Python environments as needed.
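Combined with pip, it also covers the version-pinning part of your question; a sketch of the workflow (names and versions are illustrative):

[~] virtualenv myprogram-env
[~] source myprogram-env/bin/activate
(myprogram-env) [~] pip install somepackage==1.2.3
(myprogram-env) [~] pip freeze > requirements.txt

Your users can then recreate the exact same environment with pip install -r requirements.txt, instead of you bundling the packages into your source repository.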
There are indeed two ways to get this done.
I usually use buildout (see this post by Jacob from Django: http://jacobian.org/writing/django-apps-with-buildout/) and have everything from Django upward installed locally in the project's environment, with pydev and django support. It's very easy, since I have projects that use the latest versions of open source software and others that use specific versions of the same packages.
Another alternative is, as Charlie says, virtualenv, which is designed to do just that. Many people recommend it; I've never used it myself, as I'm happy with buildout.
I am new at writing APIs in Python, or in any language for that matter. I was hoping to get pointers on how I can create an API that can be installed using the setup.py method and used in other Python projects, something similar to the twitterapi.
I have already created and coded all the methods I want to include in the API. I just need to know how to implement the installation so others can use my code to leverage ideas they may have, or whether I need to format the code a certain way to facilitate installation.
I learn best with examples or tutorials.
Thanks so much.
It's worth noting that this part of python is undergoing some changes right now. It's all a bit messy. The most current overview I know of is the Hitchhiker's Guide to Packaging: http://guide.python-distribute.org/
The current state of packaging section is important: http://guide.python-distribute.org/introduction.html#current-state-of-packaging
The Python packaging world is a mess (like poswald said). Here's a brief overview along with a bunch of pointers. Your basic problem (using setup.py etc.) is solved by reading the distutils guide which msw has mentioned in his comment.
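To show how little is needed for your case, here is a minimal setup.py sketch using plain distutils; the name, version and package directory are placeholders for your own project:

# setup.py, sitting next to a "mylib" package directory
from distutils.core import setup

setup(
    name="mylib",
    version="0.1",
    description="My API library",
    packages=["mylib"],
)

With that in place, python setup.py sdist builds a distributable tarball, and anyone can run python setup.py install from the unpacked source to make import mylib work in their own projects.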
Now for the dirt. The basic infrastructure of the distribution modules in the Python standard library is distutils, referred to above. It's limited in some ways, and so a series of extensions called setuptools was written on top of it. Setuptools, along with actually increasing the functionality, provided a command-line installer called easy_install.
Setuptools maintenance was not too great, and so it was forked and a more active branch called "distribute" was set up; it is the preferred alternative right now. In addition to this, a replacement for easy_install named pip was created, which was more modular and useful.
Now there's a huge project underway which attempts to fold all the changes from distribute and friends into a unified library that will go into the stdlib. It's tentatively called "distutils2".