I am working on a large backend system for a web service that does real-time and historical tracking.
I am highly experienced in Python and intend to use it with SQLAlchemy (MySQL) to develop the backend.
I don't have any major experience developing robust and sustainable backend systems, and I was wondering if you could point me to some documentation or books about backend design patterns. Basically, I need to feed data to a database by querying different services (over HTML / SOAP / JSON) in real time, and to keep a history of that data.
Thanks!
Can you define "backend" more precisely? Normally, in web dev, I follow an MVC-ish structure where my "front-end" (HTML/CSS/JS and the code dealing with displaying them) is loosely coupled with my "backend" model (business objects and data persistence, i.e. the database).
I like Django's Model/View/Template approach:
http://docs.djangoproject.com/en/dev/faq/general/#django-appears-to-be-a-mvc-framework-but-you-call-the-controller-the-view-and-the-view-the-template-how-come-you-don-t-use-the-standard-names
But you haven't really defined what you mean by "backend", so it's hard to give advice on design patterns. You said you are experienced in Python; have you ever developed a database-driven web application before?
Update:
Based on your comment, I won't be able to help much, as I don't have much experience with "backends" like that. However, since you are pulling in resources from the web, your per-request latency is going to be pretty high. So, to increase overall throughput, you are going to want something that can run multiple threads or processes with fairly high concurrency. I suggest you check out the answers on this thread (and search for similar ones):
Concurrent downloads - Python
Specifically, I found the example of the recursive web crawler, and the example following it, to be a very good start on your solution:
http://eventlet.net/doc/examples.html#recursive-web-crawler
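In case that link moves, the core of the pattern is tiny. A minimal sketch (the URLs and pool size are placeholders):

    import eventlet
    eventlet.monkey_patch()  # make sockets / urllib cooperative

    from urllib.request import urlopen

    # placeholder source URLs -- replace with the services you poll
    SOURCES = ["http://example.com/a", "http://example.com/b"]

    def fetch(url):
        return url, urlopen(url).read()

    pool = eventlet.GreenPool(size=20)   # at most 20 concurrent fetches
    for url, body in pool.imap(fetch, SOURCES):
        print(url, len(body))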
As for taking that idea and then turning it into a robust/continuous process, that's going to depend a lot on your platform and how well you do error handling. Basically:
run it in a loop and make sure you handle any error that can possibly be thrown
have some kind of process monitor watching your worker process to kill/restart it if it hangs or dies
make sure you have a monitoring solution to notify you if it stops working (Nagios, etc.)
One of the best ways to keep things "robust" is to make them as simple (not simplistic) as possible. If all you are doing is pulling in info from the web, parsing it in some way, and then storing it in a DB, then keep the process that simple. Don't add unnecessary complexity in an effort to make it more robust. If you end up with a 200-line script that does what you want, great!
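To make that concrete, here's a hedged sketch of such a worker in miniature; fetch_from_services, parse, and store_in_db are placeholders for your own logic:

    import logging
    import time

    logging.basicConfig(filename="worker.log", level=logging.INFO)

    def run_forever(poll_interval=60):
        # the whole job: pull, parse, store -- and survive anything
        while True:
            try:
                raw = fetch_from_services()   # your HTTP/SOAP/JSON calls
                records = parse(raw)          # normalize into rows
                store_in_db(records)          # append history rows
            except Exception:
                # catch everything so one bad poll can't kill the process
                logging.exception("poll failed; will retry")
            time.sleep(poll_interval)

A process supervisor and an external monitor (points 2 and 3 above) then cover the cases this loop can't.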
Use Apache, Django and Piston.
Use REST as the protocol.
Write as little code as possible.
Django models, forms, and admin interface.
Piston wrappers for your resources.
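For a feel of how little code that is, a rough sketch of a read-only Piston resource; the Measurement model and the URL pattern are invented, so check the Piston docs for the exact API:

    # handlers.py
    from piston.handler import BaseHandler
    from myapp.models import Measurement  # hypothetical Django model

    class MeasurementHandler(BaseHandler):
        allowed_methods = ('GET',)
        model = Measurement

        def read(self, request, id=None):
            if id is not None:
                return Measurement.objects.get(pk=id)
            return Measurement.objects.all()

    # urls.py
    from django.conf.urls.defaults import patterns, url
    from piston.resource import Resource

    urlpatterns = patterns('',
        url(r'^api/measurements/(?P<id>\d+)?$', Resource(MeasurementHandler)),
    )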
I'm working on a framework for Digital Forensic Investigators to use to compare files with each other for my Master's capstone project. However, I ran into a bit of a snag...
I'm trying to implement multiprocessing on the comparisons since using a single core seems to be really slow. The trouble I'm having, however, is when the code goes to enter information into an SQLite database. It will occasionally get a "Database is locked" error when two cores complete at nearly the same time.
So, the simple side of my question: is it unsafe to perform database operations in a multiprocessing environment, given the errors I'm encountering? If not, is there a method of going about this that is safe and won't result in random errors?
Thanks!
Your problem is that you are trying to have multiple writers access a toy database -- i.e. sqlite -- which is stored in a single file. Using Lock might help, but it's going to kill your multiprocess throughput because of all the waiting-for-the-lock time. In essence, the lock choke point will serialize your program.
Setting up either MySQL or Postgres on almost any platform is straightforward, and there are several excellent Python modules for accessing them. Using one of those will completely eliminate this problem.
Update for an extended response to comment:
I always ask clients / students, "What problem are you trying to solve?" I'm assuming that you are not trying to create a database system, simply to use one. SQLite3 is fine for a well-defined set of problems, but multiprocess access is not one of them. I could veer off into asking what aspect of your project requires multiprocess access, but I'll assume that you have already determined that this is needed. I don't know either your programming skills or your understanding of how a database works, so forgive me if the following is a bit basic.
Normally you need a database (my preference is Postgres), and a Python module that understands all of the fiddly details of how to talk to that database. Then you need to know what it is you want the DBMS to do for you. The Good News is that you are hardly the first to go down this path.
The Postgres Wiki is full of good stuff. See their page on Python Drivers. Psycopg2 is the category leader and runs on Win/Linux/Mac. Also check out PyPi, the Python Package Index, for many well-written extensions.
If you want to stay more object-oriented, as opposed to writing straight SQL, you might want to look at an ORM like SQLAlchemy. This is another category leader that is well-maintained and widely deployed.
The value of using an ORM is that you can (mostly) keep your head in ObjectLand, where most of your problem lives, and not get tangled up in the cognitive dissonance created by object-oriented programming vs. relational database management, which are two very different views of the world of data.
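For illustration, a minimal SQLAlchemy sketch along those lines. The table, columns, and connection URL are invented, and it assumes a local Postgres database plus psycopg2:

    from sqlalchemy import create_engine, Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class Evidence(Base):
        __tablename__ = 'evidence'
        id = Column(Integer, primary_key=True)
        sha1 = Column(String(40))

    engine = create_engine('postgresql://user:secret@localhost/forensics')
    Base.metadata.create_all(engine)

    Session = sessionmaker(bind=engine)
    session = Session()
    session.add(Evidence(sha1='da39a3ee5e6b4b0d3255bfef95601890afd80709'))
    session.commit()  # Postgres handles concurrent writers; no "database is locked"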
If you need more help, email me. My address is in my profile.
You can make use of multiprocessing.Lock. Take a look at https://docs.python.org/2/library/multiprocessing.html#synchronization-between-processes
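For example, a minimal sketch of serializing SQLite writes with a shared multiprocessing.Lock; the table and file names are invented, and the comparisons table is assumed to already exist:

    import multiprocessing
    import sqlite3

    def save_results(lock, db_path, rows):
        # hold the shared lock for the whole write, so only one
        # process touches the file at a time
        with lock:
            conn = sqlite3.connect(db_path)
            conn.executemany(
                "INSERT INTO comparisons (file_a, file_b, score) VALUES (?, ?, ?)",
                rows)
            conn.commit()
            conn.close()

    if __name__ == "__main__":
        lock = multiprocessing.Lock()
        work = [[("a.bin", "b.bin", 0.87)], [("a.bin", "c.bin", 0.12)]]
        procs = [multiprocessing.Process(target=save_results,
                                         args=(lock, "results.db", rows))
                 for rows in work]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

As the other answer notes, though, the waiting this introduces can effectively serialize your program.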
I'm currently about to start designing a new application.
The application will allow a user to insert some data and will provide data analysis (with reports as well). I know that's not much to go on, but the heavy data processing will be done in a post-processing step, so it's not really relevant to the front-end.
I'd like to start down the right path, so that I'll be prepared when the need comes to scale to handle more users.
I'm thinking about PostgreSQL to store the data, because I've already used it and I like it. (Even if a NoSQL store would be a reasonable choice, since not all the data needs to be relational, I like the Postgres support and community, and I feel better knowing there's a big community out there to help me.) MySQL (InnoDB) is also a good choice; to be honest, I don't have a real reason to choose one over the other (is it maybe easier to create shards with MySQL?).
I know several programming languages but my strengths are Python, C/C++, Javascript.
I'm not sure if I should choose a sync or async approach for this task (I could scale out by running more sync applications behind a load balancer).
I've already developed another big project that taught me a lot about concurrency, but there every choice was influenced by the skills of the rest of the team (mostly the sysadmins'), so we used Python (Django) + uWSGI + nginx.
For this project (since it's totally different from the other one -- that was an e-commerce site, this is a SaaS) I was also considering node.js; it would be a good opportunity to try it out in a serious project.
The heaviest data processing would be done by post-processing jobs, so the front-end (the user website) would be mostly I/O-bound (a point in favor of an async environment).
What would you suggest?
ps. I must also keep in mind that, first of all, the project has to start, so I can't only think about every possible design; I should start writing code ASAP :-)
My current thoughts are:
- start with something you know
- keep it as simple as possible
- track everything to find bottlenecks
- scale out
So it wouldn't really matter whether I deploy sync or async, but I know async generally performs better, and anything that could help me get better results (and therefore lower costs) is worth evaluating.
I'm curious to hear about your experiences (with other technologies too)...
I'm becoming paranoid about scalability, and I fear it could lead me to a wrong design (it's also the first time I'm designing alone for a commercial purpose = FUD).
If you need some more info, please let me know and I'll try to give you an answer.
Thanks.
A good resource for all of this is http://highscalability.com/. Lots of interesting case studies about handling big web loads.
You didn't mention it but you might want to think about hosting it in the cloud (Azure, Amazon, etc). Makes scaling the hardware a little easier and it's especially nice if your demand fluctuates.
Here are some basic guidelines:
Use asynchronous processing as much as possible, or at least design things so they can be converted to async later.
Design processes so that they can be segregated onto different servers. This goes hand in hand with the point above. Say you have a webapp with some intensive processing step. If that step is async, the main web server can queue the job and be done with it; a separate server then picks up the job and processes it (see the sketch after this list). That way your main web servers are not affected. If you are resource-constrained, you can still run the background process on the same server, until you have enough clients to justify spinning it off to a different one.
Design for load balancing. If your app uses sessions, factor in how (or whether) you will replicate them. You don't have to: you can pin a user to a particular server and forward all subsequent requests there. But you still have to design for it.
Have the ability to route load to different servers based on some predefined criteria. For example, since your app is a SaaS app, you could decide that certain clients go to Environment1 and certain others go to Environment2. A lot of SaaS players do this -- Salesforce, for example.
You don't necessarily have to do all of this from the get-go, but having the ability will go a long way toward scaling your app when the time comes.
Also, remember that these approaches are not mutually exclusive. You should design your app for all of them, but only implement each one when required.
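To illustrate the queue-plus-separate-worker idea from the list above, a minimal sketch using Redis as the queue; redis-py, a local Redis server, and the run_report function are all assumptions:

    import json
    import redis  # assumes redis-py and a Redis server on localhost

    r = redis.Redis()

    # web-server side: queue the intensive job and return immediately
    def enqueue_report(client_id, params):
        r.lpush("jobs", json.dumps({"client": client_id, "params": params}))

    # worker side (same box today, a dedicated server tomorrow)
    def work_forever():
        while True:
            _queue, raw = r.brpop("jobs")   # blocks until a job arrives
            job = json.loads(raw)
            run_report(job)                 # placeholder for the heavy work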
Take a look at the book The Art of Scalability
This book was written by guys that worked with eBay & Paypal.
Take a look at this excellent presentation on scalability patterns and approaches.
So, I programmed this twisted application a few months ago, which I now would like to extend with a web-based user interface for configuration.
The Twisted website recommends Nevow, but I am not really sure if this is a good choice. Their website seems to have been down for a while, and their Launchpad page hasn't seen any updates in half a year. Is this project dead?
Additionally, I have seen discussion on the twisted-web mailing list about moving parts of Nevow into twisted.web. So, is it still recommended for new developments?
Another idea was using Django. I would need user authentication and permissions anyway in the config-interface, and I am quite familiar with it. (I have never worked with Nevow or twisted.web)
But it seems quite difficult to interface both worlds, all I could find were examples of running Django with WSGI in Twisted.
Are there any other possibilities to have a slick looking user interface on top of twisted?
First, let me address the perception that Nevow is dead. The launchpad project containing the code for Nevow (and the rest of the Divmod projects) is divmod.org on launchpad. A hardware failure has badly impacted the project's public presence, but it's still there, and other things (like the wiki and the tickets) are in the process of being recovered. There isn't a lot of active maintenance work going on right now, but that's mostly because it's good enough for most of its users; there are lots of people who depend on Nevow and would be very upset if it stopped working. Those people have the skills and experience necessary to continue maintaining it. So, while it's not being actively promoted right now, I think it's unlikely that it's going to go away.
Second, my long-term hope for Nevow would be as follows. (I'd say "plan", but since I haven't been actively involved with its maintenance lately, this is really up to those who are.) First, I'd like to extract its templating facilities and move them into twisted.web. The clean, non-deprecated API for Nevow is mostly covered by nevow.page.Element and the various loaders. Twisted itself wants to generate HTML in a few places, and these facilities could be useful. Then we should throw out the "appserver" and resource-model parts of Nevow. Those are mostly just a random collection of bugfixes or alterations for twisted.web, most of which were present in some form in twisted.web2 and will therefore either be rolled back into twisted.web anyway, or have already been applied there. Finally, there's the question of Athena. While two-way communication is one of Twisted's strengths, Athena is itself a gigantic, sprawling JavaScript codebase and should probably remain its own project.
Third, on to the main question: given this information, what should you do now?
Generally speaking, I'd say, "use Nevow". The project has some warts: it needs more documentation, and its API needs to be trimmed to eliminate some old and broken stuff, but it's still quite useful and very much alive. To make up for the slightly sparse documentation, you can join the #divmod or #twisted.web channels on Freenode to get help. If you contribute patches where you can, you will find that you get a lot of enthusiastic help there. When you ignore the deprecated parts, Nevow has a pretty small, sane, Twisted-friendly API. The consequences of the evolution plan I outlined above are actually pretty minimal. If it even happens at all, what it means for you is that in 1-5 years, when you upgrade to a new version of Twisted, you'll get a couple of deprecation warnings and change some import lines in your code from from nevow.page import ...; from nevow.loaders import ... to some hypothetical new thing like from twisted.web.page.element import ...; from twisted.web.page.templates import ..., or some such. Most of the API past that point should remain the same, and the high-level concepts definitely shouldn't change much.
The main advantage that you get from using Nevow is that it's async-friendly and can render pages in your main thread without blocking things. Plus, you can then get really easy COMET for free with Athena.
You can also use Django. This is not quite as async-friendly but it obviously does have a broader base of support. However, "not as async friendly" doesn't mean "hard to use". You can run it in twisted.web via WSGIResource, and simply use blockingCallFromThread in your Django application to invoke any Twisted API that returns a Deferred, which should be powerful enough to do just about anything you want. If you have a more specific question about how to instantiate Twisted web resources to combine Twisted Web and Django, you should probably ask it in its own Stack Overflow question.
Nevow is still a good choice if you want support for Deferreds in the templating system you use (it's not dead). It also has a few advantages over plain Twisted Web when it comes to complicated URL dispatch. However, it is basically just a templating system. Twisted Web is the real web server. So either way, you're going to use Twisted Web. In fact, even if you use Django in Twisted Web's WSGI container, you're still going to use Twisted Web. So learning things about Twisted Web isn't going to hurt you.
If you're going to be generating any amount of HTML, then you very much want to use an HTML templating library. By this point no one should be constructing HTML using primitive string operations. So if you want to use one of the other Python HTML templating libraries out there - Cheetah, Quixote, etc - instead of Nevow, that's great! You're just going to use the templating library to get a string to write out in response to an HTTP request. Twisted Web doesn't care where the string came from.
And if you do want to do something with Django (or another WSGI-based system), then you can certainly deploy it in your Twisted process using Twisted Web's WSGI support. You can still interact between the WSGI applications and the rest of your Twisted code, as long as you exercise a little care: WSGI applications run in a thread pool, and Twisted APIs are not thread-safe, so you have to invoke them with reactor.callFromThread or one of the small number of similar APIs (in particular, blockingCallFromThread is sometimes a useful higher-level tool).
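To make that concrete, a hedged sketch of a Django view, running under Twisted Web's WSGI resource, that calls a Deferred-returning Twisted API; the URL and view are invented:

    # views.py -- runs in a WSGI worker thread inside the Twisted process
    from django.http import HttpResponse
    from twisted.internet import reactor
    from twisted.internet.threads import blockingCallFromThread
    from twisted.web.client import getPage  # returns a Deferred

    def status(request):
        # Twisted APIs are not thread-safe, so hop over to the reactor
        # thread and block this worker thread until the Deferred fires.
        body = blockingCallFromThread(
            reactor, getPage, "http://127.0.0.1:8081/internal/status")
        return HttpResponse(body, content_type="text/plain")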
At this point Nevow is definitively dead. As illustration of how dead it is, there is a bug that prevents installation of Nevow using pip, which was fixed on trunk in 2009, but it isn't in any release because there has been no release since then.
twisted.web and in particular twisted.web.template cover pretty much all of what was useful in Nevow, and should be used for any new project that was considering using Nevow.
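For a taste of it, a minimal twisted.web.template sketch (the element and its template are invented):

    from twisted.web.template import Element, XMLString, flattenString, renderer

    class StatusElement(Element):
        loader = XMLString(
            '<div xmlns:t="http://twistedmatrix.com/ns/twisted.web.template/0.1">'
            'Uptime: <span t:render="uptime" /></div>')

        @renderer
        def uptime(self, request, tag):
            return tag("42 hours")  # placeholder value

    # flattenString returns a Deferred that fires with the rendered markup
    d = flattenString(None, StatusElement())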
I'm super new to programming and I've been using appengine to help me learn python and general coding. I'm getting better quickly and I'm loving it all the way :)
Appengine was awesome for allowing me to just dive into writing my app and getting something live that works (see http://www.7bks.com/). But I'm realising that the longer I continue to learn on appengine the more I'm constraining myself and locking myself into a single system.
I'd like to move to developing on full django (since django looks super cool!). What are my next steps? To give you a feel for my level of knowledge:
I'm not a unix user
I'm not familiar with command line controls (I still use appengine/python completely via the appengine SDK)
I've never programmed in anything other than python, anywhere other than appengine
I know the word SQL, but don't know what MySQL is really or how to use it.
So, specifically:
What are the skills I need to learn to get up and running with full django/python?
If I'm going to host somewhere else I suppose I'll need to learn some sysadmin type skills (maybe even unix?). Is there anywhere that offers easy hosting (like appengine) but that supports django?
I hear such great things about Heroku that I'm considering switching to RoR and going there
I appreciate that I'm likely not quite ready to move away from appengine just yet but I'm a fiercely passionate learner (http://www.7bks.com/blog/179001) and would love it if I knew all the steps I needed to learn so I could set about learning them. At the moment, I don't even know what the steps are I need to learn!
Thank you very much. Sorry this isn't a specific programming question but I've looked around and haven't found a good how-to for someone of my level of experience and I think others would appreciate a good roadmap for the things we need to learn to get up and running.
Thanks,
Tom
PS - if anyone is in London and fancies showing me the ropes in person that would be super awesome :)
First up, you could benefit from doing some RoR work, since learning a new language is valuable in itself. However, I don't know if that would be entirely beneficial to you right now, while you are still learning. I'd stick with Python and Django (or App Engine) for the moment, until you've grasped some of the more advanced concepts. Then, by all means, learning new languages will be fantastic.
As for moving to Django from App Engine: there isn't a whole lot that's different. The way you define models is similar, though the field types used in the definitions differ. As you mentioned, hosting is another consideration.
There should be plenty of hosting options (mod_wsgi is what you're after) based on Apache. Django in particular has seen quite a bit of popularity, and hosting usually springs up for popular frameworks.
I don't think you'll need to know too much sysadmin stuff. This will all depend on the kind of hosting you can find. Same goes for the database. Hosting providers usually offer databases preconfigured so you shouldn't need to worry about that too much.
Django, along with many other frameworks, provides an ORM (Object-Relational Mapper), which abstracts away having to write SQL: you call methods on objects and access their properties instead. I'd still advise learning a little SQL, to understand at a bare minimum what's happening underneath.
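A tiny sketch of what that looks like in practice; the model and its fields are invented for illustration:

    # models.py
    from django.db import models

    class Book(models.Model):
        title = models.CharField(max_length=200)
        read_on = models.DateField(null=True)

    # Instead of writing SQL such as
    #   SELECT * FROM myapp_book WHERE read_on IS NOT NULL ORDER BY read_on DESC;
    # you call methods on the model:
    finished = Book.objects.filter(read_on__isnull=False).order_by("-read_on")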
The Django tutorial is excellent! If you decide to go the Django route, I'd highly recommend working through the entire thing. A development server comes bundled, so you can try out your work instantly without worrying about a provider. Once you have something you want to share with the world, you can worry about hosting then.
I started off using Windows for Django development and it was quite easy. The amount of command line work you need to do is minimal. Really. Not something you need to worry about, as the tutorial covers all 4 or so commands you need to know.
Django hosting provides links to hosting providers, though I'm not sure how up to date that list is.
Getting started in Django is pretty simple. Once you want to host it, there's a bit more work involved - but that can come later. The friction is minimal. Follow the tutorial, it will take you through running the server, setting up the database (a free one comes bundled), and coding your first app.
What makes you think you're being locked into a single platform? Did you know that Google's App Engine SDK is open source? There are also universities and other organizations working on platforms that will run the App Engine SDK outside the context of Google, and Amazon EC2 is capable of running the App Engine SDK in a limited capacity. I'd say lock-in is perhaps not the right word to use.
Additionally, I believe AppEngine is going to continue to improve as time goes on. Google is the leader of the Internet; they've done great things and will continue to do so. I believe that anyone who sticks with their platform as a service will experience these great benefits in the years to come.
If your reasoning for moving is purely academic, I'd suggest starting a new project. Moving off of AppEngine's SDK is similar to switching from one framework to another on an already-built application. Like with any framework or platform, there are dependencies that must be dealt with in order to successfully migrate the app from platform A to platform B.
django-nonrel makes it possible to run Django on Google App Engine: http://www.allbuttonspressed.com/projects/django-nonrel
Besides that, there are a couple of cloud offerings, such as Djangy: https://www.djangy.com/
With both options you can focus on Django and Python programming, and you don't have to care about the sysadmin stuff.
On the django homepage there is a very good tutorial to get started with django development: http://docs.djangoproject.com/en/1.2/intro/tutorial01/
What are the skills I need to learn to get up and running with full django/python?
The question can't easily be answered because you haven't described the app. You have to actually write down the technology stack -- in detail -- or you'll never know what skills you need.
The skill list mostly comes from your technology choices. So write down your technology choices. (That's part of configuration management, an important skill you'll need if you move away from GAE.)
Since you've chosen to talk about yourself and not your technology choices, I can only guess what technologies you're using and what skills you'll need.
Here's a common technology stack.
Technically, the OS doesn't matter. Most hosting environments use open source GNU/Linux because the licensing is inexpensive. You, too, can do this. You can start with VMWare and download a nice Linux distro. Or, you can buy a very cheap PC and install Linux directly from a DVD image that you can download and burn.
My company demands that I use the Windows PC they give me, so I develop in Windows and test in VMWare Linux (Fedora 14, actually).
To learn Linux, start with a download and install. Then find a tutorial. Then stop using Windows and learn by doing. Flipping back and forth between Windows and Linux is difficult. I can do it because I don't know Windows very well. I treat Windows as a hyper-complex IDE with all kinds of non-standard, non-POSIX quirks that I try to ignore.
RDBMS. Python comes with SQLite. For a lot of applications, it works fine: web sites tend to be heavy on queries and light on updates/inserts, so SQLite holds up well. MySQL is also nice; it's easy to install and runs on Windows as well as other OSes.
The good thing about Django is you need to know very little SQL. Very little.
However, you do need to know a tiny bit about the "Data Control Language" (Grant, Revoke and Create User) to work with MySQL. You won't create a lot of users. But you do need to create a few to get things running. Also, as your database matures, you'll often need to know a little bit about the "Data Definition Language" (Drop Table).
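As a hedged example, the one-time setup for a Django project's MySQL database might look like this from Python; all names and passwords are placeholders:

    import MySQLdb  # the MySQL-python driver

    conn = MySQLdb.connect(user="root", passwd="root-password")
    cur = conn.cursor()
    cur.execute("CREATE DATABASE mysite CHARACTER SET utf8")
    cur.execute("CREATE USER 'django'@'localhost' IDENTIFIED BY 'secret'")
    cur.execute("GRANT ALL PRIVILEGES ON mysite.* TO 'django'@'localhost'")
    cur.execute("FLUSH PRIVILEGES")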
You will need to know how to backup and restore your database. That's absolutely critical.
So find Database Administrator tutorials to get started.
If your application really uses a lot of sophisticated data, you'll need to buy real books on database design so you can understand how the Django models really work. You don't need to become a SQL guru, but it does help to know what's really going on inside the database.
Application Server. We use Apache with the mod_wsgi module. There are numerous choices. Hosting services vary in what they require and what they permit. Some have Apache, mod_wsgi and Django pre-configured. Some don't. Some do not permit tinkering with the Apache configuration. Some do. You probably don't need to know much about this, because you can probably find a hosting service that will handle the details.
Apache tutorials are all over the place. mod_wsgi is very simple; once you understand how Apache works, mod_wsgi is obvious.
Since you have stuff working, presumably you know about HTML and CSS. Those are important skills, but you probably already have them.
Since you have stuff working, you also might know a lot about Configuration Management and how to control change. This isn't obvious and many people do it wrong. If you don't know about CM, you should find some books or articles on configuration management and change control.
Since you have stuff working, you also might know a lot about quality assurance, unit testing and related topics. If you don't have a complete suite of unit tests, you should probably start learning about unit testing before you start any serious coding for the next release of your product.
Bug Tracking, Problem Reporting, Feature Requests and other management skills are also essential. I can't tell if you have them or don't have them. Or what tools you're using for this. If you're working by yourself, you don't need a lot of formality. However, these are critical skills even if you're a one-person developer. Sticky-Notes on your workstation can work. What's important is the skills to manage bugs and features.
Hey Tom,
I suggest that the reasonable evaluation you can make is to carefully list the advantages and disadvantages of each choice.
The path I don't regret taking: a physical rack server (2006-07), then virtual hosting (2008), and now GAE (2009-present). The rate at which new features get added to GAE, and its cost-effectiveness, are more reasons to stay. I agree that more of what Django can do is needed; in my case, form preview and form validation were difficult or nearly impossible to set up with GAE.
I tried RoR and soon concluded that it requires more code to do what GAE can do with less.
Also, with GAE you have absolutely no hardware of your own that can break. If you move to a rack server or to virtual hosting (there are places where you can get 5 GB of hosting for free), you have no plan for when you outgrow those 5 GB, and you may need to migrate yet again, which you don't want.
MySQL has been around for over 10 years and is quite a different kind of system. It's possible to save blobs in MySQL, but don't you think GAE's Blobstore is much better?
If you choose to migrate to a solution with MySQL, you can export your data from GAE and import it into MySQL with a tool such as AppRocket.
Kind regards/Niklas R
I'm writing a reasonably complex web application. The Python backend runs an algorithm whose state depends on data stored in several interrelated database tables which does not change often, plus user specific data which does change often. The algorithm's per-user state undergoes many small changes as a user works with the application. This algorithm is used often during each user's work to make certain important decisions.
For performance reasons, re-initializing the state on every request from the (semi-normalized) database data quickly becomes infeasible. It would be highly preferable, for example, to cache the state's Python object in some way so that it can simply be used and/or updated whenever necessary. However, since this is a web application, there are several processes serving requests, so using a global variable is out of the question.
I've tried serializing the relevant object (via pickle) and saving the serialized data to the DB, and am now experimenting with caching the serialized data via memcached. However, this still has the significant overhead of serializing and deserializing the object often.
I've looked at shared memory solutions but the only relevant thing I've found is POSH. However POSH doesn't seem to be widely used and I don't feel easy integrating such an experimental component into my application.
I need some advice! This is my first shot at developing a web application, so I'm hoping this is a common enough issue that there are well-known solutions to such problems. At this point solutions which assume the Python back-end is running on a single server would be sufficient, but extra points for solutions which scale to multiple servers as well :)
Notes:
I have this application working, currently live and with active users. I started out without doing any premature optimization, and then optimized as needed. I've done the measuring and testing to make sure the above-mentioned issue is the actual bottleneck. I'm pretty sure I could squeeze more performance out of the current setup, but I wanted to ask if there's a better way.
The setup itself is still a work in progress; assume that the system's architecture can be whatever suites your solution.
Be cautious of premature optimization.
Addition: The "Python backend runs an algorithm whose state..." is the session in the web framework. That's it. Let the Django framework maintain session state in cache. Period.
"The algorithm's per-user state undergoes many small changes as a user works with the application." Most web frameworks offer a cached session object. Often it is very high performance. See Django's session documentation for this.
Advice. [Revised]
It appears you have something that works. Leverage it to learn your framework, learn the tools, and learn what knobs you can turn without breaking a sweat -- specifically, using session state.
Second, fiddle with caching, session management, and things that are easy to adjust, and see if you have enough speed. Find out whether MySQL socket or named pipe is faster by trying them out. These are the no-programming optimizations.
Third, measure performance to find your actual bottleneck. Be prepared to provide (and defend) measurements that are fine-grained enough to be useful and stable enough to provide a meaningful comparison of alternatives.
For example, show the performance difference between persistent sessions and cached sessions.
I think the multiprocessing framework has something applicable here -- namely, the shared ctypes module.
Multiprocessing is fairly new to Python, so it might have some oddities. I am not quite sure whether this solution works with processes not spawned via multiprocessing.
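A minimal sketch of the shared-ctypes idea: a counter living in shared memory, visible to all worker processes, with no pickling involved:

    from multiprocessing import Process, Value

    def bump(counter):
        # Value carries its own lock; use it to make the increment atomic
        with counter.get_lock():
            counter.value += 1

    if __name__ == "__main__":
        counter = Value('i', 0)  # a shared 32-bit int in shared memory
        workers = [Process(target=bump, args=(counter,)) for _ in range(4)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print(counter.value)  # 4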
I think you can give ZODB a shot.
"A major feature of ZODB is transparency. You do not need to write any code to explicitly read or write your objects to or from a database. You just put your persistent objects into a container that works just like a Python dictionary. Everything inside this dictionary is saved in the database. This dictionary is said to be the "root" of the database. It's like a magic bag; any Python object that you put inside it becomes persistent."
Initially it was an integral part of Zope, but a standalone package is available these days.
It has the following limitation:
"Actually there are a few restrictions on what you can store in the ZODB. You can store any objects that can be "pickled" into a standard, cross-platform serial format. Objects like lists, dictionaries, and numbers can be pickled. Objects like files, sockets, and Python code objects, cannot be stored in the database because they cannot be pickled."
I have read about it, but haven't given it a shot myself.
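Still, the basic usage is small enough to sketch (the storage file name and the stored object are placeholders):

    from ZODB import FileStorage, DB
    import transaction

    storage = FileStorage.FileStorage('app-state.fs')
    db = DB(storage)
    conn = db.open()
    root = conn.root()                 # the "magic bag" dictionary

    root['user_state'] = {'user1': {'score': 42}}  # any picklable object
    transaction.commit()               # persisted; no explicit SQL anywhere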
Another possibility is an in-memory SQLite DB. Being in-memory, it may speed things up a bit, but you would still have to do all the serialization work.
Note: an in-memory DB is expensive in terms of resources.
Here is a link: http://www.zope.org/Documentation/Articles/ZODB1
First of all, your approach is not common web development practice. Even when multithreading is used, web applications are designed to be able to run in multi-process environments, for both scalability and easier deployment.
If you just need to initialize a large object once and do not need to change it later, you can do it easily by using a global variable that is initialized while your WSGI application is being created, or when the module containing the object is loaded. Multi-processing will do fine for you in that case.
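A minimal sketch of that module-level initialization; the pickle file and the decide function are placeholders:

    # wsgi_app.py
    import pickle

    # Built once per worker process, when the module is imported.  Safe to
    # share between requests because it is never modified afterwards.
    ALGORITHM_TABLES = pickle.load(open('tables.pickle', 'rb'))

    def application(environ, start_response):
        result = decide(ALGORITHM_TABLES, environ)  # hypothetical read-only use
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [result]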
If you need to change the object and access it from every thread, you need to be sure your object is thread-safe; use locks to ensure that. And use a single server context, that is, a single process. Any multithreaded Python server will serve you well; FastCGI is also a good choice for this kind of design.
But if multiple threads are accessing and changing your object, the locks may have a really bad effect on your performance gain, and will likely make all the benefits go away.
This is Durus, a persistent object system for applications written in the Python programming language. Durus offers an easy way to use and maintain a consistent collection of object instances used by one or more processes. Access and change of persistent instances is managed through a cached Connection instance, which includes commit() and abort() methods so that changes are transactional.
http://www.mems-exchange.org/software/durus/
I've used it before in some research code, where I wanted to persist the results of certain computations. I eventually switched to pytables as it met my needs better.
Another option is to review the requirement for state: if serialization is the bottleneck, it sounds like the object is very large. Do you really need an object that large?
I know that in Stack Overflow podcast 27 the Reddit guys discuss what they use for state, so that may be useful to listen to.