So I've been using Perl for several years now and I'm starting to dabble a little in Python. Is there a sort of CPAN but for Python? What's the normal way to manage modules in Python? Any direction would be greatly appreciated. FWIW I use Linux so Windows-only solutions aren't really useful to me.
The repository formerly known as Cheese Shop.
PyPI
The Python Package Index is a repository of software for the Python programming language. There are currently 9140 packages here. To contact the PyPI admins, please use the Get help or Bug reports links.
Also, take a look at
SIG for Python Resource Cataloguing
This SIG exists in order to discuss and build a catalog of Python resources. The SIG charter is:
The Python Catalog SIG (Special Interest Group) aims at producing a master index of Python software and other resources. It will begin by figuring out what the requirements are, converging on a design for the data schema, and producing an implementation. ("Implementation" will almost certainly include mean a set of CGI scripts for browsing the catalog, and may also contain a standard library module for automatically fetching and installing modules, if the SIG decides that's a worthwhile feature.)
Check out Python eggs / easy_install
Related
I've googled and googled, but have found almost nothing in the way of discussions or best practices in managing larger enterprise codebases in Python. Here, I'm simply soliciting any and all pointers to such information. Here's some background and some of the questions I'm looking to answer.
We're long-time Java developers, who have solved similar problems to those mentioned below largely using well established Java best practices, as well as Maven, Ant and a Sonotype Nexus repo.
I'm talking internal software only here. We're not looking to distribute anything Python-based. We've got multiple development groups using Python, each developing sharable utility code libraries, final web applications and stand-alone tools, all in pure Python. Each group has its own Github source repository.
How do we manage our shareable code, both within a group and across groups? Do we create eggs (or something similar) and distribute and install them into the Python system? If so, would we store them in our Nexus repo like our Java jars, or is there a more Python-specific method if internal package distribution? Or, do we just share raw code, checking out sources from multiple Github repos?
If we share raw code, how do we manage getting the Python searchpath right as we bring together code from multiple repositories?
How do we manage package namespaces when we want our packages to all live in a com.ourcompany base namespace? It seems like python isn't too happy when you bring together source trees with overlapping namespaces.
How do we manage third party package versioning? I've never seen easy_install or pip passed a version number. How do we lock down third party package versions?
Do tools exist to aid in Python code reviews, CI, regression testing, etc.?
We're relative newbies to Python code, so some of these questions may have fairly obvious answers. Still, I find it surprising that I can't find more information on managing larger Python codebases.
What issues will we encounter that I haven't thought to ask about, or don't yet know enough to even know to ask about?
Any valuable pointers will be greatly appreciated.
Well, I won't even try to answer all those (excellent) questions, but here are a few opinionated pointers which will hopefully help (as someone who works in both worlds, though more Java).
Packaging
If so, would we store them in our Nexus repo like our Java jars, or is
there a more Python-specific method if internal package distribution?
Or, do we just share raw code, checking out sources from multiple
Github repos?
Packaging in Python is historically a bit of a mess IMHO, though it feels like it's improving. Distutils is the major / native tool here - I've not used it much, feels slightly scary in places. In general, also check recommended tools.
Pip has all but won the war of mindshare, especially when installing 3rd party libraries. I've not solved the local library problem myself, (maybe someone else reading has), but if I were, I'd probably opt for Pip with local/network-disk repos e.g. by installing from wheels.
Another option (which can cause all sorts of hassles itself) is to package in your OS's native packager, be it Debian-style apt or by creating RPMs, etc. Of course, Windows not so much.
Versioning etc
How do we manage third party package versioning? I've never seen
easy_install or pip passed a version number.
Pip
Pip definitely supports version specifiers. Turns out Easy Install does too. I suppose many people / smaller projects opt for latest-and-greatest, which of course isn't always as "appropriate" in the enterprise...
Virtualenv
No discussion of versioning and Python would miss a Python2/3 reference, but I'm sure you're aware of all this already.
More important then would be to mention virtualenv. It truly frees you from the mess you can get in to testing multiple versions, bearing in mind especially that your (*NIX) operating systems typcially rely heavily on Python themselves. It's a big subject so have a look at the docs.
Developer Tooling
Do tools exist to aid in Python code reviews, CI, regression testing,
etc.?
Code Review
Very much so. Most code review tools are multi-language (it's just a formatting issue really), so just pick your favourite enterprise-friendly one, be it Crucible, Github's one (Barkeep?), Gerrit, or whatever.
CI
For CI you have almost as many options again. Running python apps is usually less involved than Java ones, so most CI systems, though often Java-focused, support Python. (FWIW, we use drone.io for Quod Libet). Jenkins should have no problem doing this, and it seems people have done so with TeamCity.
However, the "original" or "most Pythonic" is probably Buildbot, but I've not used it personally. Looks a lot newer than I remember, and it had quite a lot of support in the Python community I think...
Testing
For testing, though not quite as mature as JUnit / TestNG, check out the de-facto / JUnit-like unit testing unittest, but also (nicer?) alternatives like nose.py.
For higher level (BDD) testing, try something like Lettuce - as the name implies heavily inspired by Cucumber, or maybe Behave. I've not tried them, but common opinion is they're less mature than Cucumber / JBehave / Concordion / Rspec etc.
in the early nineties I bought the tawk (Thompson awk) compiler and developed since than a lot of programs for my companies. The compiler produces fast reliable code and has a lot of useful extensions for the Windows environment.
Until now it worked in the W95, W2K and XP without problems but now that I have to move to W7 / 2008 Server I am in doubt if it is wise to try to continue with this although excellent but outdated and no more supported product.
My questions to you :
What can you recommend for real-world business applications (all of them run in batch mode - no GUI) ?
Has someone made a bigger transition (manual reprogramming) from xxx (here: awk) to Python ?
What Python implementation should I use ? I need fast file I/O and extensive random access to 100.000+ dictionary elements for 1.5 Mio monthly transactions
Which is the most stable version ? 2.7.x ? 3.1.x ?
Does 3.1 support Windows Automation ? I have to drive the Excel API through COM and need access to MS-SQL
And : is Python really the choice for this kind of task ?
Thank you for your honorable answersMeiki
Python is a good choice for these types of tasks. You should use Python 2.7.2 and since you are on Windows, you may want to use the Activestate Python distribution http://www.activestate.com/activepython/downloads which is standard Python bundled with a number of additional useful libraries and an easy to use package manager named PyPm.
Also, you should have a look at the slide presentations here http://www.dabeaz.com/generators/ and here http://www.dabeaz.com/generators-uk/index.html because Python generators are a powerful way to handle the same types of batch processing that AWK is used for.
As for Windows automation, the Activestate distro for Windows includes this, or you can download and install pywin separately if you are using the Python.org distro. I've used Python and COM to extract data from Word documents, Excel spreadsheets, Outlook mailboxes and Lotus Notes databases among other things.
If you want to stick with the awk style of doing things, you can write some Python helper functions so that your Python programs don't look so foreign to awk eyes. In fact, pyawk.py may already be all that you need http://pyawk.sourceforge.net/ You can download it here http://sourceforge.net/projects/pyawk/files/pyawk/pyawk-0.4/ however be warned that Python has evolved a lot since it was last updated.
Without question this is the best way to add Tawk to Django/python. It solved all my needs.
https://github.com/CleitonDeLima/django-tawkto
Python 3.x is looking ever more tempting with cleaned up syntax (I like it, others may not) new features and what looks like a gradual progression towards more speed and better multithreading.
But Python 3.x is still held back by lack of 3rd party support. Important packages like Django, Twisted, etc. are not ported. It's hard to get an overview of where the bottlebecks in the migration are, how far it has come, and if it's progressing at all. The migration dependencies are also hard to map. Also, projects are probably waiting for Python 3.x to offer some major improvement over 2.x that would justify the effort of porting.
Ideally, there would be a site for tracking this migration overall, with (links to) migration plans and dependencies shown so that people willing to help the migration globally could coordinate their efforts and help specific projects. Perhaps also linking to projects' bug tracking systems for relevant migration-related bugs.
But perhaps I'm just not looking hard enough. Does someone know of any efforts to track global migration to Python 3.x?
(By "global", I mean the universe of open source projects built on Python.)
Update:
There's a poll right now on the Python home page which asks about packages you'd like to see ported to Python 3.x.
George Brandl has made a script that generates a graph with the amount of packages supporting Python 3:
The Link on the CheeseShop front page shows the packages in question: http://pypi.python.org/pypi?%3aaction=browse&c=533&show=all
There is also (a pretty crummy) list of unported packages ordered by how many depends on it: http://onpython3yet.com/ Why do I say it's crummy? Well, because it is done entirely without manual fixing up, resulting in things like listing Python as a package. This is to a large extent because people don't know that the "Dependencies" listing isn't a place to just list any sort of random dependencies, it should be used to list the packages that should be auto installed when you use easy_install/PIP. But for example in the Django world, they don't know that so you see things like "django-saddle" depending on Django and Python, and hence not being easy_installable.
That said, the list is interesting, and we see that PIL really should get ported.
Now this is not anything "global" it's just the packages on PyPI, and as such tend to be mostly Python modules, not separate applications. But I think the trend in general is visible there anyway.
The Python Package Index (PyPI) allows you to search for Python 3rd-party modules that support Python 3.x. It even has a Python 3 packages link which lists them all.
But that doesn't track individual projects' progress on Python 3 support. It just tells you which projects have achieved it.
Something I'd be interested to see is a graph of the total number/percentage of Python 3 packages in PyPI over time (from Python 3 release until present). I don't know if anyone has tracked this, or if the PyPI administrators have enough history data to produce such graphs.
I am new at writing APIs in python, in any language for that matter. I was hoping to get pointers on how i can create an API that can be installed using setup.py method and used in other python projects. Something similar to the twitterapi.
I have already created and coded all the methods i want to include in the API. I just need to know how to implement the installation so other can use my code to leverage ideas they may have. Or if i need to format the code a certain way to facilitate installation.
I learn best with examples or tutorials.
Thanks so much.
It's worth noting that this part of python is undergoing some changes right now. It's all a bit messy. The most current overview I know of is the Hitchhiker's Guide to Packaging: http://guide.python-distribute.org/
The current state of packaging section is important: http://guide.python-distribute.org/introduction.html#current-state-of-packaging
The python packaging world is a mess (like poswald said). Here's a brief overview along with a bunch of pointers. Your basic problem (using setup.py etc.) is solved by reading the distutils guide which msw has mentioned in his comment.
Now for the dirt. The basic infrastructure of the distribution modules which is in the Python standard library is distutils referred to above. It's limited in some ways and so a series of extensions was written on top of it called setuptools. Setuptools along with actually increasing the functionality provided a command line "installer" called "easy_install".
Setuptools maintenance was not too great and so it was forked and a more active branch called "distribute" was setup and it is the preferred alternative right now. In addition to this, a replacement for easy_install named pip was created which was more modular and useful.
Now there's a huge project going which attempts to fold in all changes from distribute and stuff into a unified library that will go into the stdlib. It's tentatively called "distutils2".
I found http://www.iseriespython.com/, which is a version of Python for the iSeries apparently including some system specific data access classes. I am keen to try this out, but will have to get approval at work to do so. My questions are:
Does the port work well, or are there limits to what the interpreter can handle compared with standard Python implementations?
Does the iSeries database access layer work well, creating usable objects from table definitions?
From what I have seen so far, it works pretty well. Note that I'm using iSeries Python 2.3.3. The fact that strings are natively EBCDIC can be a problem; it's definitely one of the reasons many third-party packages won't work as-is, even if they are pure Python. (In some cases they can be tweaked and massaged into working with judicious use of encoding and decoding.) Supposedly 2.5 uses ASCII natively, which would in principle improve compatibility, but I have no way to test this because I'm on a too-old version of OS/400.
Partly because of EBCDIC and partly because OS/400 and the QSYS file system are neither Unix-like nor Windows-like, there are some pieces of the standard library that are not implemented or are imperfectly implemented. How badly this would affect you depends on what you're trying to do.
On the plus side, the iSeries-specific features work quite well. It's very easy to work with physical files as well as stream files. Calling CL or RPG programs from Python is fairly painless. On balance, I find iSeries Python to be highly usable and very worthwhile.
Update (2012): A lot of work has gone into iSeries Python since this question was asked. Version 2.7 is now available, meaning it's up-to-date as far as 2.x versions go. A few participants of the forum are reasonably active and provide amazing support. One of them has gotten Django working on the i. As expected, the move to native ASCII strings solves a lot of the EBCDIC problems and greatly increases compatibility with third-party packages. I enthusiastically recommend iSeries Python 2.7 for anyone on V5R3 or later. (I still strongly recommend iSeries Python 2.3.3 for those who are on earlier versions of the operating system.)
Update (2021): Unfortunately, iSeriesPython is no longer maintained, and the old website and forum are gone. You can still get the software from its SourceForge repository, and it is still an amazingly useful and worthwhile asset for those who are stuck on old (pre-7.2) versions of the operating system. For those who are on 7.2 or newer, there is a Python for PASE from IBM, which should be considered the preferred way to run Python on the midrange platform. This version of Python is part of a growing ecosystem of open source software on IBM i.
It sounds like it is would work as expected. Support for other libraries might be pretty limited, though.
Timothy Prickett talks about some Python ports for the iSeries in this article:
http://www.itjungle.com/tfh/tfh041706-story02.html
Also, some discussion popped up in the Python mailing archives:
http://mail.python.org/pipermail/python-list/2004-January/245276.html
iSeriesPython is working very well.
We are usning it since 2005 (or earlier) in our Development and Production Environments as an utility language, for generating of COBOL source code, generating of PCML interfaces, sending SMS, validating/correcting some data ... etc.
With iSeriesPython you can access the iSeries database at 2 ways: using File400 and/or db2 module. You can execute OS/400 commands and you can work with both QSYS.LIB members and IFS stream files.
IMHO, iSeries Python is very powerful tool, more better than REXX included with iSeries.
Try it!
I got permission to install iSeries Python on a box about 3 years ago. I found that it worked pretty much as advertised. I contacted the developer and he was very good about answering questions. However, before I could think about using it in production, I had to approach the developer regarding a support contract. That really isn't his gig, so he said no and we scrapped the idea. The main limitation I found is that it is several releases behind Python on other platforms.
I have also had very good experience with Jython on the iSeries. Java is completely supported on the iSeries. Theoretically, everything you can do in RPG on the iSeries, you can do in Java, which means you can do it in Jython. I was sending email from an AS/400 (old name for iSeries) via JPython (old name for Jython) and smtplib.py in 1999 or 2000.
Another place to look is on the mailing list MIDRANGE-L or search the archives for the list at midrange.com. I know they have talked about this a while back.