Twisted + SQLAlchemy and the best way to do it

Twisted + SQLAlchemy and the best way to do it - python

So I'm writing yet another Twisted based daemon. It'll have an xmlrpc interface as usual so I can easily communicate with it and have other processes interchange data with it as needed.
This daemon needs to access a database. We've been using SQL Alchemy in place of hard coding SQL strings for our latest projects - those mostly done for web apps in Pylons.
We'd like to do the same for this app and re-use library code that makes use of SQL Alchemy. So what to do? Well of course since that library was written for use in a Pylons app it's all the straight-forward blocking style code that everyone is accustomed to and all of the non-blocking is magically handled by Pylons via threading, thread locals, scoped sessions and so on.
So now for Twisted I guess I'm a bit stuck. I could:
Just write the sql I need directly if it's minimal and use the dbapi pool in twisted to do runInteractions etc when I need to hit the db.
Use the objects and inherently blocking methods in our library and block now and then in my Twisted daemon. Bah.
Use sAsync which was last updated in 2008 and kind of reuse the models we have defined already but not really and this doesn't address that the library code needs to work in Pylons too. Does that even work with the latest version SQL Alchemy? Who knows. That project looked great though - why was it apparently abandoned?
Spawn a separate subprocess and have it deal with the library code and all it's blocking, the results being returned back to my daemon when ready as objects marshalled via YAML over xmlrpc.
Use deferToThread and then expunge the objects returned having made sure to do eager loads so that I have all my stuff that I might need. Seems kind of ugha to me.
I'm also stuck using Python 2.5.4 atm so no 2.6 yet and I don't think I can just do an import from future to get access to the cool new multiprocessing module stuff in there. That's OK though I guess as we've got dealing with interprocess communication down pretty well.
So I'm leaning towards option 4 mostly as that would avoid the mortal sin of logic duplication with option 1 while also staying the heck away from threads.
My first attempt though will be option 2 to just get the thing going and then separate out the calls to the library code perhaps into a separate process if it looks like there's a good chance that something might take a bit too long to block on. Sad. Maybe a combination of Stackless Python and Twisted would be interesting here.
Any better ideas?

In the intervening couple of years, Alex Gaynor created https://github.com/alex/alchimia which may be a better central repository for doing integration with SQLAlchemy and Twisted.

Firstly, I can unfortunately only second your opinion that twisted and
SQLAlchemy don't play along very well. I have worked some with both
and would be somewhat afraid of the complexity that would arise from
putting them together.
All the database integration layers that I know of to date use
twisteds threading integration layer, and if you want to avoid that at
all costs you are pretty much stuck with point 4 in your list.
On the other hand, I have seen examples of database connecting code
using deferToThread() and friends that worked very well.
Anyway, some pointers if you'd be ready to consider other frameworks
than SQLAlchemy:
The DivMod guys have been doing some tentative work on twisted -
database integration based on the Storm ORM (google for "storm orm").
See this link for an example:
http://divmod.readthedocs.org/en/latest/products/nevow/storm-approach.html
Also, head over to DivMod's site and have a look at the sources of
their Axiom db layer (probably not of any use to you directly since
it's Sqlite only, but it's principles might be useful).

There's a storm branch that you can use with twisted directly (internally it does the defer to thread stuff) on launchpad https://code.launchpad.net/~therve/storm/twisted-integration. I've used it nicely.
Sadly sqlalchemy is significantly more complex in implementation to audit for async usage. If you really want to use it, i'd recommend an out of process approach with a storage rpc layer.
alternatively if your feeling adventurous and using postgresql, the latest pyscopg2 supports true async usage (https://launchpad.net/txpostgres), and the storm source is pretty simple to hack on ;-)
incidentally the storm you tried last year may not have had the C-extension on by default (it is now in the latest releases.) which might account for your speed issues.

Perhaps twistar is what you're looking for. It's a native active record (aka ORM) implementation for twisted, working on top of twisted.enterprise.adbapi.
http://findingscience.com/twistar/

Related

Python - Multiprocessing and database entries

I'm working on a framework for Digital Forensic Investigators to use to compare files with each other for my Master's capstone project. However, I ran into a bit of a snag...
I'm trying to implement multiprocessing on the comparisons since using a single core seems to be really slow. The trouble I'm having, however, is when the code goes to enter information into an SQLite database. It will occasionally get a "Database is locked" error when two cores complete at nearly the same time.
So, simple side of my question, is it unsafe to operate database functions within a multiprocessing environment due to the errors I'm encountering? If not, is there a method of going about this that is safe and won't result in random errors?
Thanks!

Your problem is that you are trying to have multiple writers access a toy database -- i.e. sqlite -- which is stored in a single file. Using Lock might help, but it's going to kill your multiprocess throughput because of all the waiting-for-the-lock time. In essence, the lock choke point will serialize your program.
Setting up either MySQL or Postgres on almost any platform is straightforward, and there are several excellent Python modules for accessing them. Using one of those will completely eliminate this problem.
Update for an extended response to comment:
I always ask clients / students, "What problem are you trying to solve?" I'm assuming that you are not trying to create a database system, simply to use one. SQLite3 is fine for a well-defined set of problems, but multiprocess access is not one of them. I could veer off into asking what aspect of your project requires multiprocess access, but I'll assume that you have already determined that this is needed. I don't know either your programming skills or your understanding of how a database works, so forgive me if the following is a bit basic.
Normally you need a database (my preference is Postgres), and a Python module that understands all of the fiddly details of how to talk to that database. Then you need to know what it is you want the DBMS to do for you. The Good News is that you are hardly the first to go down this path.
The Postgres Wiki is full of good stuff. See their page on Python Drivers. Psycopg2 is the category leader and runs on Win/Linux/Mac. Also check out PyPi, the Python Package Index, for many well-written extensions.
If you want to stay more object-oriented, as opposed to writing straight SQL, you might want to look at an ORM like SQLAlchemy. This is another category leader that is well-maintained and widely deployed.
The value of using an ORM is that you can (mostly) keep your head in ObjectLand, where most of your problem lives, and not get tangled up in the cognitive dissonance created by object-oriented programming vs. relational database management, which are two very different views of the world of data.
If you need more help, email me. My address is in my profile.

You can make use of Lock. Take a look at https://docs.python.org/2/library/multiprocessing.html#synchronization-between-processes

Bad practice to have ORMs with NoSQL stores?

I use Redis (redis-py) inside my Python platform. Recently it was suggested that I switch to an ORM.
E.g.: python-stdnet, rom or redisco
Is use of ORMs considered bad practice in the NoSQL world?

Ultimately the question boils down to at what layer do you want to write code.
Do you want to write code that manipulates data structures in a remote database, or do you want to write higher-level code that uses the abstractions built on top of those data structures? You can think of it as a similar question about relational databases as do you want to write SQL, or do you want to write higher-level code?
Personally, despite using rom myself for a variety of tasks (I am the author), I also directly manipulate Redis in the same projects where it makes sense.

Comments pointing out that the R in ORM is for relational are technically correct. That doesn't mean there aren't valid uses and reasons for libraries that abstract redis away.
There are some great libraries that make interfacing with a redis feel much nicer and more idiomatic to the language you are using. For ruby libraries like ohm or redis-native_hash (disclosure: I wrote that one) do just that. For python there are tools like redisco and surely others. These make persisting objects to redis very simple and make working with redis feel much more ruby-ish or python-ish.
Here are a few more benefits from using even the most basic abstraction, like a very thin wrapper you might write and keep in your application:
Switching redis clients will be easier. Maybe you'll never do this, but if you did, changing your calls to redis in one place (your wrapper) is much simpler than changing them everywhere you use redis.
Implementing things you might need for scaling, like sharding or connection pooling, is likely going to be easier if your calls are made through some abstraction.
Replacing redis with some other key/value store or data structure server would be simpler if an abstraction is in place.
I'm not advocating using an object mapping library or building your own abstraction, just pointing out there are valid reasons why you would. Its up to you to evaluate your needs and pick what works best for you. There is nothing wrong with calling redis directly either.

Which way to go with twisted and web-programming?

So, I programmed this twisted application a few months ago, which I now would like to extend with a web-based user interface for configuration.
The Twisted website recommends Nevow, but I am not really sure if this is a good choice. Their website is down for a while it seems and their launchpad page hadn't seen any update in half a year. Is this project dead?
Additionally I have seen discussion of moving parts of Nevow into twisted.web on the twisted-web mailinglist. So, is it still recommended for new developments?
Another idea was using Django. I would need user authentication and permissions anyway in the config-interface, and I am quite familiar with it. (I have never worked with Nevow or twisted.web)
But it seems quite difficult to interface both worlds, all I could find were examples of running Django with WSGI in Twisted.
Are there any other possibilities to have a slick looking user interface on top of twisted?

First, let me address the perception that Nevow is dead. The launchpad project containing the code for Nevow (and the rest of the Divmod projects) is divmod.org on launchpad. A hardware failure has badly impacted the project's public presence, but it's still there, and other things (like the wiki and the tickets) are in the process of being recovered. There isn't a lot of active maintenance work going on right now, but that's mostly because it's good enough for most of its users; there are lots of people who depend on Nevow and would be very upset if it stopped working. Those people have the skills and experience necessary to continue maintaining it. So, while it's not being actively promoted right now, I think it's unlikely that it's going to go away.
My long-term hope for Nevow would be as follows. (I'd say "plan", but since I haven't been actively involved with its maintenance lately, this is really up to those who are.) First, I'd like to extract its templating facilities and move them into twisted.web. The clean, non-deprecated API for Nevow is mostly covered by nevow.page.Element and the various loaders. Twisted itself wants to generate HTML in a few places and these facilities could be useful. Then we should throw out the "appserver" and resource-model parts of Nevow. Those are mostly just a random collection of bugfixes or alterations for twisted.web, most of which were present in some form in twisted.web2 and will therefore either be rolled back into twisted.web anyway, or have already been applied there. Finally there's the question of Athena. While two-way communication is one of Twisted's strengths, Athena is itself a gigantic, sprawling JavaScript codebase and should probably remain its own project.
Third, on to the main question, given this information, what should you do now?
Generally speaking, I'd say, "use nevow". The project has some warts, it needs more documentation and its API needs to be trimmed to eliminate some old and broken stuff, but it's still quite useful and very much alive. To make up for the slightly sparse documentation, you can join the #divmod or #twisted.web channels on Freenode to get help with it. If you help out by contributing patches where you can, you will find that you'll get a lot of enthusiastic help there. When you ignore the deprecated parts Nevow has a pretty small, sane, twisted friendly API. The consequence of the plan for Nevow's evolution that I outlined above are actually pretty minimal. If it even happens at all, what it means for you is, in 1-5 years, when you upgrade to a new version of Twisted, you'll get a couple of deprecation warnings, change some import lines in your code from from nevow.page import ...; from nevow.loaders import ... to some hypothetical new thing like from twisted.web.page.element import ...; from twisted.web.page.templates import ..., or somesuch. Most of the API past that point should remain the same, and definitely the high-level concepts shouldn't change much.
The main advantage that you get from using Nevow is that it's async-friendly and can render pages in your main thread without blocking things. Plus, you can then get really easy COMET for free with Athena.
You can also use Django. This is not quite as async-friendly but it obviously does have a broader base of support. However, "not as async friendly" doesn't mean "hard to use". You can run it in twisted.web via WSGIResource, and simply use blockingCallFromThread in your Django application to invoke any Twisted API that returns a Deferred, which should be powerful enough to do just about anything you want. If you have a more specific question about how to instantiate Twisted web resources to combine Twisted Web and Django, you should probably ask it in its own Stack Overflow question.

Nevow is still a good choice if you want support for Deferreds in the templating system you use (it's not dead). It also has a few advantages over plain Twisted Web when it comes to complicated URL dispatch. However, it is basically just a templating system. Twisted Web is the real web server. So either way, you're going to use Twisted Web. In fact, even if you use Django in Twisted Web's WSGI container, you're still going to use Twisted Web. So learning things about Twisted Web isn't going to hurt you.
If you're going to be generating any amount of HTML, then you very much want to use an HTML templating library. By this point no one should be constructing HTML using primitive string operations. So if you want to use one of the other Python HTML templating libraries out there - Cheetah, Quixote, etc - instead of Nevow, that's great! You're just going to use the templating library to get a string to write out in response to an HTTP request. Twisted Web doesn't care where the string came from.
And if you do want to do something with Django (or another WSGI-based system), then you can certainly deploy this in your Twisted process using Twisted Web's WSGI support. And you can still interact between the WSGI applications and the rest of your Twisted code, as long as you exercise a little care - WSGI applications run in a thread pool, and Twisted APIs are not thread-safe, you have to invoke them with reactor.callFromThread or one of the small number of similar APIs (in particular, blockingCallFromThread is sometimes a useful higher-level tool to use).

At this point Nevow is definitively dead. As illustration of how dead it is, there is a bug that prevents installation of Nevow using pip, which was fixed on trunk in 2009, but it isn't in any release because there has been no release since then.
twisted.web and in particular twisted.web.template cover pretty much all of what was useful in Nevow, and should be used for any new project that was considering using Nevow.

Architecting from scratch in Python: what to use?

I'm lucky enough to have full control over the architecture of my company's app, and I've decided to scrap our prototype written in Ruby/Rails and start afresh in Python. This is for a few reasons: I want to learn Python, I prefer the syntax and I've basically said "F**k it, let's do it."
So, baring in mind this is going to be a pretty intensive app, I'd like to hear your opinions on the following:
Generic web frameworks
ORM/Database Layer (perhaps to work with MongoDB)
RESTful API w/ oAuth/xAuth authentication
Testing/BDD support
Message queue (I'd like to keep this in Python if possible)
The API is going to need to interface with a Clojure app to handle some internal data stuff, and interface with the message queue, so if it's not Python it'd be great to have some libraries to it.
TDD/BDD is very important to me, so the more tested, the better!
It'll be really interesting to read your thoughts on this. Much appreciated.
My best,
Jamie

Frameworks
OK, so I'm a little biased here as I currently make extensive use of Django and organise the Django User Group in London so bear that in mind when reading the following.
Start with Django because it's a great gateway drug. Lots of documentation and literature, a very active community of people to talk to and lots of example code around the web.
That's a completely non-technical reason. Pylons is probably purer in terms of Python philosophy (being much more a collection of discrete bits and pieces) but lots of the technical stuff is personal preference, at least until you get into Python more. Compare the very active Django tag on Stack Overflow with that of pylons or turbogears though and I'd argue getting started is simply easier with Django irrespective of anything to do with code.
Personally I default to Django, but find that an increasing amount of time I actually opt for writing using simpler micro frameworks (think Sinatra rather than Rails). Lots of things to choose from (good list here, http://fewagainstmany.com/blog/python-micro-frameworks-are-all-the-rage). I tend to use MNML (because I wrote parts of it and it's tiny) but others are actively developed. I tend to do this for small, stupid web services which are then strung together with a Django project in the middle serving people.
Worth noting here is appengine. You have to work within it's limitations and it's not designed for everything but it's a great way to just play with Python and get something up and working quickly. It makes a great testbed for learning and experimentation.
Mongo/ORM
On the MongoDB front you'll likely want to look at the basic python mongo library ( http://api.mongodb.org/python/ ) first to see if it has everything you need. If you really do want something a little more ORM like then mongoengine (http://hmarr.com/mongoengine/) might be what you're looking for. A bunch of folks are also working on making Django specifically integrate more seamlessly with nosql backends. Some of that is for future Django releases, but django-norel ( http://www.allbuttonspressed.com/projects/django-nonrel) has code now.
For relational data SQLAlchemy ( http://www.sqlalchemy.org/) is good if you want something standalone. Django's ORM is also excellent if you're using Django.
API
The most official Oauth library is python-oauth2 ( http://github.com/simplegeo/python-oauth2), which handily has a Django example as part of it's docs.
Piston ( http://bitbucket.org/jespern/django-piston/wiki/Home) is a Django app which provides lots of tools for building APIs. It has the advantage of being pretty active and well maintained and in production all over the place. Other projects exist too, including Dagny ( http://zacharyvoase.github.com/dagny/) which is an early attempt to create something akin to RESTful resources in Rails.
In reality any Python framework (or even just raw WSGI code) should be reasonably good for this sort of task.
Testing
Python has unittest as part of it's standard library, and unittest2 is in python 2.7 (but backported to previous versions too http://pypi.python.org/pypi/unittest2/0.1.4). Some people also like Nose ( http://code.google.com/p/python-nose/), which is an alternative test runner with some additional features. Twill ( http://twill.idyll.org/) is also nice, it's a "a simple scripting language for Web browsing", so handy for some functional testing. Freshen ( http://github.com/rlisagor/freshen) is a port of cucumber to Python. I haven't yet gotten round to using this in anger, but a quick look now suggests it's much better than when I last looked.
I actually also use Ruby for high level testing of Python apps and apis because I love the combination of celerity and cucumber. But I'm weird and get funny looks from other Python people for this.
Message Queues
For a message queue, whatever language I'm using, I now always use RabbitMQ. I've had some success with stompserver in the past but Rabbit is awesome. Don't worry that it's not itself written in Python, neither is PostgresSQL, Nginx or MongoDB - all for good reason. What you care about are the libraries available. What you're looking for here is py-amqplib ( http://barryp.org/software/py-amqplib/) which is a low level library for talking amqp (the protocol for talking to rabbit as well as other message queues). I've also used Carrot ( http://github.com/ask/carrot/), which is easier to get started with and provides a nicer API. Think bunny in Ruby if you're familiar with that.
Environment
Whatever bits and pieces you decide to use from the Python ecosystem I'd recommend getting to who pip and virtualenv ( http://clemesha.org/blog/2009/jul/05/modern-python-hacker-tools-virtualenv-fabric-pip/ - note that fabric is also cool, but not essential and these docs are out of date on that tool). Think about using Ruby without gem, bundler or rvm and you'll be in the right direction.

Ok, you might be making a mistake, the same one I made when I started with python.
Before you decide on a thing like django, which is an excellent, yet atypical python web framework, spend an night cuddled up with:
This, is a good start. Make sure you do A little Werkzeug watching , Then check out
some classic WebOb. Maybe, if you feel the fire in the blood, and you might, wsgi is a bit flawed, but only to the gods, check out Flask
I'm not saying use it, Django is beautiful too, but if you don't know python, and you go through django, you run the risk of learning a framework.
WSGI is super straightforward. You'll find out about Paste, and Pastescript, and Pylons.
Then, make your decision. It'll be much easier learning stuff doing bare bones wsgi or Flask, stuff like variable assignment, using the interpreter, style concerns, testing, on 3 files for a couple of nights, instead of django. Take 2 nights. Then you'll see the great similarity between python web frameworks, instead of the differences. Hell, you might even roll with Flask.
Just some advice, I did the same thing with ruby, going in through Rails, and... well, strong words were said.
Language, then basic wsgi and testing, then pick your framework and roll

I'm new to python myself, and plan to get more in depth with it this year. I've had a few false starts at this, but always professional needs bring me back to PHP. The few times I've done some development, I've had really good experiences with web2py as a python framework. It's quite well done, and complete in features, while still being extremely lightweight. The database layer seems to be very flexible and mature.
As for TDD/BDD and the rest of your questions, I don't have any experience with python options, but would be interested to hear what others say.

I am using Twisted Framework based Nevow library for python based web app.
All your criteria fit into this single framework.

Suggestion Needed - Networking in Python - A good idea?

I am considering programming the network related features of my application in Python instead of the C/C++ API. The intended use of networking is to pass text messages between two instances of my application, similar to a game passing player positions as often as possible over the network.
Although the python socket modules seems sufficient and mature, I want to check if there are limitations of the python module which can be a problem at a later stage of the development.
What do you think of the python socket module :
Is it reliable and fast enough for production quality software ?
Are there any known limitations which can be a problem if my app. needs more complex networking other than regular client-server messaging ?
Thanks in advance,
Paul

Check out Twisted, a Python engine for Networking. Has built-in support for TCP, UDP, SSL/TLS, multicast, Unix sockets, a large number of protocols (including HTTP, NNTP, IMAP, SSH, IRC, FTP, and others)

Python is a mature language that can do almost anything that you can do in C/C++ (even direct memory access if you really want to hurt yourself).
You'll find that you can write beautiful code in it in a very short time, that this code is readable from the start and that it will stay readable (you will still know what it does even after returning one year later).
The drawback of Python is that your code will be somewhat slow. "Somewhat" as in "might be too slow for certain cases". So the usual approach is to write as much as possible in Python because it will make your app maintainable. Eventually, you might run into speed issues. That would be the time to consider to rewrite a part of your app in C.
The main advantages of this approach are:
You already have a running application. Translating the code from Python to C is much more simple than write it from scratch.
You already have a running application. After the translation of a small part of Python to C, you just have to test that small part and you can use the rest of the app (that didn't change) to do it.
You don't pay a price upfront. If Python is fast enough for you, you'll never have to do the optional optimization.
Python is much, much more powerful than C. Every line of Python can do the same as 100 or even 1000 lines of C.

To answer #1, I know that among other things, EVE Online (the MMO) uses a variant of Python for their server code.

The python that EVE online uses is StacklessPython (http://www.stackless.com/), and as far as i understand they use it for how it implements threading through using tasklets and whatnot. But since python itself can handle stuff like MMO with 40k people online i think it can do anything.
This bad answer and not really an answer to your question, rather addition to previous answer.
Alan.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.