Reload single module in cherrypy? - python

Is it possible to use the Python reload command (or similar) on a single module in a standalone CherryPy web app? I have a CherryPy-based web application that is often in continuous use. From time to time I'll make an "important" change that only affects one module. I would like to be able to reload just that module immediately, without affecting the rest of the web application. A full restart is, admittedly, fast, but it still means several seconds of downtime that I would prefer to avoid if possible.

Reloading modules is very, very hard to do in a sane way. It leads to the potential of stale objects in your code with impossible-to-interrogate state and subtle bugs. It's not something you want to do.
What real web applications tend to do is to have a server that stays alive in front of their application, such as Apache with mod_proxy, to serve as a reverse proxy. You start your new app server, change your reverse proxy's routing, and only then kill the old app server.
No downtime. No insane, undebuggable code.
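To make the "stale objects" point concrete, here is a minimal sketch using only the standard library. The module name hotmod_demo and the Handler class are made up for the demonstration; the sketch writes a throwaway module to a temporary directory, reloads it after an edit, and shows that an object created before the reload still belongs to the old class.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_stale_objects():
    """Reload a throwaway module and report what happens to old objects."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # avoid a cached .pyc masking the edit
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "hotmod_demo.py"  # hypothetical module name
        source.write_text("class Handler:\n    version = 1\n")
        sys.path.insert(0, tmp)
        try:
            import hotmod_demo
            old_instance = hotmod_demo.Handler()  # created before the reload

            # Simulate deploying an "important" change to that one module.
            source.write_text("class Handler:\n    version = 2  # edited\n")
            importlib.invalidate_caches()
            importlib.reload(hotmod_demo)

            return {
                "new_class_version": hotmod_demo.Handler.version,
                "old_instance_version": old_instance.version,
                "old_is_instance_of_new": isinstance(
                    old_instance, hotmod_demo.Handler
                ),
            }
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("hotmod_demo", None)
            sys.dont_write_bytecode = old_flag
```

The old instance keeps running the old code and no longer passes an isinstance check against the reloaded class, which is the kind of subtle bug that a full restart behind a reverse proxy avoids.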

Related

Testing a Flask application with requests

Ever since I read "An untested application is broken" in the Flask documentation about testing, I have been working down my list of things to make for some of my applications.
I currently have a Flask web app. When I write a new route, I just write a requests.get('https://api.github.com/user', auth=('user', 'pass')), post, put, etc. to test the route.
Is this a decent alternative? Or should I try to write tests the way Flask's documentation describes, and if so, why?
Fundamentally it's the same concept: you are running functional tests, as they do. However, you have a prerequisite, a live application running somewhere (if I got it right). They create a fake application (aka a mock) so you can test it without it being live, e.g. when you want to run tests in a CI environment.
In my opinion it's a better alternative than a live system. Your current approach consumes more resources on your local machine, since you have to run the whole system to test something (i.e. at least a DB and the application itself). Their approach doesn't: the fake instance does not need real data, so it needs no connection to a DB or any other external dependency.
I suggest you switch to their way of testing; in the end you will like it.
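As a rough illustration of testing without a live server, the sketch below calls a WSGI application in-process using only the standard library. The app function and the call_app helper are hypothetical names invented for this example; with Flask, app.test_client() plays this role for you.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Hypothetical WSGI app with a single /ping route."""
    if environ["PATH_INFO"] == "/ping":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"pong"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def call_app(wsgi_app, path, method="GET"):
    """Invoke a WSGI app directly: no socket, no running server."""
    environ = {}
    setup_testing_defaults(environ)  # fill in the required WSGI keys
    environ["PATH_INFO"] = path
    environ["REQUEST_METHOD"] = method

    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], body
```

A test can then simply assert on the result, for example that call_app(app, "/ping") gives a 200 status and the body b"pong", and it runs the same way on a laptop or in CI.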

Fastest, simplest way to handle long-running upstream requests for Django

I'm using Django with Uwsgi. We have 8 processes running, and I have no real indication that our code is particularly thread safe, as it was never designed with threads in mind.
Recently, we added the ability to get live rates from vendors of a service through their various APIs and display them at once for the user. The problem is that these APIs are built on old web services technologies, and because of their response times, the time needed before all the rates from vendors are acquired (or the request gives up) can be up to 10 seconds.
This presents a problem. We have a pretty decent amount of traffic on our site, and the customers need to look at these rates pretty often. With only 8 processes, it's quite easy to see how the server can get tied up waiting on these upstream requests. Especially when other optimizations need to be made to make the site baseline faster anyway (we're working on that).
We made a separate library (which should be mostly threadsafe, and if not, should be converted to it easily enough) for the rates requesting, and we can separate out its configuration. So I was thinking of making a separate service with its own threads, perhaps in Twisted, and having the browser contact that service for JSON instead of having it run in the main Django server.
Is this solution a good one? Can you think of a better or simpler way to do it? Should I use something other than Twisted, and if so, why?
If you want to use your code in-process with Django, you can simply call out to your Twisted code using Crochet, which can automatically manage the creation, running, and shutdown of the reactor within whatever WSGI implementation you choose (presuming that it behaves like a regular Python process, at least).
Obviously it might be less complex to just run within the Twisted WSGI container :-).
It might also be worth looking at treq to issue your service client requests; your new "thread safe" library will still have the disadvantage of tying up an entire thread for each blocking client, which is a non-trivial amount of memory and additional concurrency overhead, whereas with Twisted you will only need to worry about a couple of objects.
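To show the idea of waiting on many slow upstream calls without dedicating a thread to each one, here is a small sketch. It uses the standard library's asyncio as a stand-in for Twisted's reactor (it is not Twisted or treq code), and fetch_rate with its delay and rate arguments is a made-up placeholder for a real vendor request.

```python
import asyncio


async def fetch_rate(vendor, delay, rate):
    """Hypothetical vendor call; the sleep stands in for a slow upstream API."""
    await asyncio.sleep(delay)
    return vendor, rate


async def gather_rates(vendors, timeout):
    """Query all vendors concurrently and give up on any that exceed timeout."""
    tasks = [
        asyncio.create_task(fetch_rate(name, delay, rate))
        for name, delay, rate in vendors
    ]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()  # give up on vendors that are too slow
    return dict(task.result() for task in done)
```

For example, asyncio.run(gather_rates([("a", 0.01, 1.5), ("slow", 5, 9.9)], timeout=0.5)) returns only the rate from "a", and all the waiting happens on a single thread.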

Learning Python, Python web Application stay active between requests

I am a PHP programmer learning Python whenever I get a chance.
I read that Python web applications stay active between requests.
Meaning that data stays in memory and is available between requests, right?
I am wondering how that works.
In php we place a cookie with a unique token, and save data in sessions.
Sessions are arrays, saved on disk or database.
Between requests, the session functions restore the correct session array based on the cookie with the unique token. That means each browser gets its own unique session, and the session has a preset expiration time. If the user is inactive and the expiration gets triggered, the session gets purged. A new session has to be created when the user comes back.
My understanding is Python doesn't need this, because the application stays active between requests.
Doesn't each request get a unique thread in Python?
How does it distinguish between requests, who the requester is?
Is there a handling method to separate vars between users and application?
Let's say I have a dict saved: is this dict globally available to all requests from any browser, or only to that one browser?
When and how does the memory get cleared, if everything stays in memory? What if the app runs for a couple of years without a restart? There must be some kind of expiration setting or memory handling?
One commenter says it depends on the web app. So I am using Bottle.py to learn.
I would assume the answer would depend on which web application framework you are using within python. Some of them have session management pieces in them that track a user across requests. But if you just had a basic port listener that responded with http, you would have to build any cookie support or session management yourself.
The other big difference is that in PHP, you have a module installed on the server that the actual HTTP server delegates to in order to generate a response. PHP doesn't handle the routing or the actual serving of the responses, whereas Python can be both the server and the resource that generates the response. It depends on how Python is installed/accessed on the machine where the server is running. So in that sense you can do whatever you want within a Python web application.
If you are interested, you should look at some available python web frameworks.
Edit: I see you mentioned bottle.py, and out of the box, it does not provide authentication and session management because it's a micro framework for fast prototyping and not necessarily suitable for a large scale application (although not impossible, just a lot of work).
Yes and no. If you check out this question you get an idea how it could work for a Django application.
However, the way you state it, it will not work. Defining a dict in one request without passing it somewhere that makes it accessible to the following requests will obviously not make it available in further requests. So, yes, you have the option to do this, but it's not the case out of the box!
I was able to persist an object in Python between requests using Twisted's web server. I have not checked for myself whether it persists across browsers, but I have a feeling it does. Here's a code snippet from the documentation:
Twisted includes an event-driven web server. Here's a sample web application; notice how the resource object persists in memory, rather than being recreated on each request:
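from twisted.web import server, resource
from twisted.internet import reactor


class HelloResource(resource.Resource):
    isLeaf = True
    numberRequests = 0  # starts on the class; the increment below stores it on the instance

    def render_GET(self, request):
        self.numberRequests += 1
        request.setHeader(b"content-type", b"text/plain")
        # Twisted on Python 3 expects the response body as bytes
        return ("I am request #" + str(self.numberRequests) + "\n").encode("utf-8")


# A single HelloResource instance handles every request on port 8080
reactor.listenTCP(8080, server.Site(HelloResource()))
reactor.run()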
First of all you should understand the difference between local and global variables in python, and also how thread local storage works.
This is a (very) short explanation:
Global variables are those declared at module scope and are shared by all threads. They live as long as the process is running, unless explicitly removed.
Local variables are those declared inside a function and instantiated for each call of that function. They are deleted when the function returns, unless they are still referenced somewhere else.
Thread local storage enables defining global variables that are specific to the current thread. They live as long as the current thread is running, unless explicitly removed.
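A small sketch of the difference between a module-level global and thread local storage, using only the standard library. The names shared_counter, per_thread and handle are invented for the illustration and are not part of bottle.py.

```python
import threading

shared_counter = {"hits": 0}    # module scope: one copy for all threads
shared_lock = threading.Lock()
per_thread = threading.local()  # each thread sees its own attributes


def handle(name, seen):
    per_thread.user = name           # visible only to the current thread
    with shared_lock:
        shared_counter["hits"] += 1  # visible to every thread
    seen[name] = per_thread.user


def run_demo():
    shared_counter["hits"] = 0
    seen = {}
    threads = [
        threading.Thread(target=handle, args=(name, seen))
        for name in ("alice", "bob", "carol")
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # The calling thread never set per_thread.user, so it has no such attribute.
    caller_has_user = hasattr(per_thread, "user")
    return shared_counter["hits"], seen, caller_has_user
```

Every thread updates the same counter, while each one reads back only the user it stored itself.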
And now I'll try to answer your original questions (the answers are specific to bottle.py, but it is the most common implementation in python web servers)
Doesn't each request get a unique thread in Python?
Each concurrent request will have a separate thread; future requests might reuse the previous threads.
How does it distinguish between requests, who the requester is?
bottle.py uses thread local storage to access the current request
Is there a handling method to separate vars between users and application?
Sounds like you are looking for a session. If so, there is no standard way of doing it, because different implementations have different advantages and disadvantages. For example, this is a bottle.py middleware for sessions.
Lets say I have a dict saved, is this dict globally available between all requests from any browser, or only to that one browser. When and how does the memory get cleared. If everything stays in the memory. What if the app is running for a couple years without a restart. There must be some kind of expiration setting or memory handling?
Exactly, there must be an expiration setting. Since you are using a custom dict you need a timer that checks each entry in the dict for expiration.
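As a minimal sketch of such an expiring dict: ExpiringStore and its methods are hypothetical names, not part of bottle.py, and the class is not thread-safe as written, so a real server would guard it with a lock. The purge method is what a periodic timer (threading.Timer, for example) would call.

```python
import time


class ExpiringStore:
    """Dict-like store whose entries disappear after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._items = {}     # key -> (expires_at, value)

    def set(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it lazily on access
            return default
        return value

    def purge(self):
        """Remove every expired entry; returns how many were dropped."""
        now = self._clock()
        expired = [key for key, (exp, _) in self._items.items() if now >= exp]
        for key in expired:
            del self._items[key]
        return len(expired)
```

Checking on access keeps reads correct, while the periodic purge keeps memory from growing when keys are never read again.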

Python web application: How to keep state

I wrote a WSGI compatible web application using web.py that loads a few dozen MB data into memory during startup.
It works quite well with the web.py integrated server.
However, using Apache 2 + mod_wsgi, every single request reloads the data, essentially starting the program again. Due to the loading time of several seconds, this is unbearable.
Is it inherent to mod_wsgi or can it be configured? What are my alternatives?
"Is it inherent to mod_wsgi?" No. It's inherent in HTTP
Since you didn't post your mod_wsgi configuration, it's impossible to say what you did wrong.
I can only guess that you didn't use daemon mode.
See http://code.google.com/p/modwsgi/wiki/ConfigurationGuidelines#Defining_Process_Groups for more information on daemon mode.
This may not be the best solution. It may be better (far, far better) to use a proper database. Without actual code examples, and more details, this is all just random guessing.
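For what it's worth, here is a sketch of loading the data once per process rather than once per request. It only helps if the process itself survives between requests (for example under daemon mode); _load_data, stats and the returned dict are placeholders invented for this example.

```python
import threading

_data = None
_data_lock = threading.Lock()
stats = {"loads": 0}  # only here to show how often the load runs


def _load_data():
    """Placeholder for the expensive startup load (several seconds in the question)."""
    stats["loads"] += 1
    return {"answer": 42}


def get_data():
    """Load the data on first use, then reuse it for every later request."""
    global _data
    if _data is None:
        with _data_lock:
            if _data is None:  # re-check: another thread may have loaded it
                _data = _load_data()
    return _data


def application(environ, start_response):
    """WSGI entry point that reads from the per-process cache."""
    body = str(get_data()["answer"]).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

Each server process still pays the loading cost once, so several worker processes mean several copies of the data in memory.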

A good multithreaded python webserver?

I am looking for a Python web server which is multithreaded instead of being multi-process (as in the case of mod_python for Apache). I want it to be multithreaded because I want to have an in-memory object cache that will be used by various HTTP threads. My web server does a lot of expensive work and computes some large arrays which need to be cached in memory for future use to avoid recomputing them. This is not possible in a multi-process web server environment. Storing this information in memcache is also not a good idea, as the arrays are large and storing them in memcache would mean deserializing the data coming from memcache, on top of the additional overhead of IPC.
I implemented a simple web server using BaseHTTPServer. It gives good performance, but it gets stuck after a few hours. I need a more mature web server. Is it possible to configure Apache to use mod_python under a thread model so that I can do some object caching?
CherryPy. Features, as listed from the website:
A fast, HTTP/1.1-compliant, WSGI thread-pooled webserver. Typically, CherryPy itself takes only 1-2ms per page!
Support for any other WSGI-enabled webserver or adapter, including Apache, IIS, lighttpd, mod_python, FastCGI, SCGI, and mod_wsgi
Easy to run multiple HTTP servers (e.g. on multiple ports) at once
A powerful configuration system for developers and deployers alike
A flexible plugin system
Built-in tools for caching, encoding, sessions, authorization, static content, and many more
A native mod_python adapter
A complete test suite
Swappable and customizable...everything.
Built-in profiling, coverage, and testing support.
Consider reconsidering your design. Maintaining that much state in your webserver is probably a bad idea. Multi-process is a much better way to go for stability.
Is there another way to share state between separate processes? What about a service? Database? Index?
It seems unlikely that maintaining a huge array of data in memory and relying on a single multi-threaded process to serve all your requests is the best design or architecture for your app.
Twisted can serve as such a web server. While not multithreaded itself, there is a (not yet released) multithreaded WSGI container present in the current trunk. You can check out the SVN repository and then run:
twistd web --wsgi=your.wsgi.application
It's hard to give a definitive answer without knowing what kind of site you are working on and what kind of load you are expecting. Sub-second performance may be a serious requirement or it may not. If you really need to save that last millisecond then you absolutely need to keep your arrays in memory. However, as others have suggested, it is more than likely that you don't and could get by with something else.
Your usage pattern of the data in the array may affect what kinds of choices you make. You probably don't need access to the entire set of data from the array all at once, so you could break your data up into smaller chunks and put those chunks in the cache instead of the one big lump. Depending on how often your array data needs to get updated, you might make a choice between memcached, a local db (Berkeley DB, SQLite, a small MySQL installation, etc.) or a remote db. I'd say memcached for fairly frequent updates, a local db for something updated roughly hourly, and a remote db for something updated roughly daily.
One thing to consider also is what happens after a cache miss. If 50 clients all of a sudden get a cache miss and all of them at the same time decide to start regenerating those expensive arrays, your box(es) will quickly be reduced to 8086's. So you have to take into consideration how you will handle that. Many articles out there cover how to recover from cache misses. Hope this is helpful.
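One common way to handle the "50 clients miss at once" case inside a single multithreaded process is a per-key lock, so that only the first thread regenerates the value and the rest wait for it. The sketch below is one possible shape of that idea; SingleFlightCache, expensive and demo are names made up for the illustration.

```python
import threading
import time


class SingleFlightCache:
    """In-memory cache where each missing key is computed by one thread only."""

    def __init__(self):
        self._values = {}
        self._key_locks = {}
        self._guard = threading.Lock()  # protects the dict of per-key locks

    def get(self, key, compute):
        if key in self._values:
            return self._values[key]
        with self._guard:
            lock = self._key_locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have filled it while we waited.
            if key not in self._values:
                self._values[key] = compute()
            return self._values[key]


def demo(num_clients=20):
    """Simulate many clients missing the cache at the same moment."""
    calls = []
    results = []

    def expensive():
        calls.append(1)
        time.sleep(0.05)  # stand-in for regenerating a large array
        return [1, 2, 3]

    cache = SingleFlightCache()
    threads = [
        threading.Thread(target=lambda: results.append(cache.get("arr", expensive)))
        for _ in range(num_clients)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(calls), results
```

With separate processes or machines the same idea needs a shared lock, which is one more argument for deciding early where the cache should live.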
Not multithreaded, but twisted might serve your needs.
You could instead use a distributed cache that is accessible from each process, memcached being the example that springs to mind.
web.py has made me happy in the past. Consider checking it out.
But it does sound like an architectural redesign might be the proper, though more expensive, solution.
Perhaps you have a problem with your implementation in Python using BaseHTTPServer. There's no reason for it to "get stuck", and implementing a simple threaded server using BaseHTTPServer and threading shouldn't be difficult.
Also, see http://pymotw.com/2/BaseHTTPServer/index.html#module-BaseHTTPServer about implementing a simple multi-threaded server with HTTPServer and ThreadingMixIn
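A rough sketch of that approach follows. It is written for Python 3, where BaseHTTPServer lives in http.server and ThreadingMixIn in socketserver; the cache dict, the CacheHandler class and the demo helper are assumptions made for the example, with a per-path counter standing in for the expensive arrays.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn

cache = {}                     # shared by all request threads in this process
cache_lock = threading.Lock()


class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """Handle each request in its own thread."""
    daemon_threads = True


class CacheHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        with cache_lock:
            cache[self.path] = cache.get(self.path, 0) + 1
            count = cache[self.path]
        body = f"{self.path} served {count} time(s)\n".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def demo():
    """Start the server on a free port, hit it three times, then stop it."""
    server = ThreadedHTTPServer(("127.0.0.1", 0), CacheHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    replies = []
    try:
        for _ in range(3):
            conn = HTTPConnection("127.0.0.1", port, timeout=5)
            conn.request("GET", "/array")
            replies.append(conn.getresponse().read().decode("utf-8"))
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return replies
```

Because every handler thread runs in the same process, they all see the same cache dict, which is the property the question is after.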
I use CherryPy both personally and professionally, and I'm extremely happy with it. I even do the kinds of thing you're describing, such as having global object caches, running other threads in the background, etc. And it integrates well with Apache; simply run CherryPy as a standalone server bound to localhost, then use Apache's mod_proxy and mod_rewrite to have Apache transparently forward your requests to CherryPy.
The CherryPy website is http://cherrypy.org/
I actually had the same issue recently. Namely: we wrote a simple server using BaseHTTPServer and found that the fact that it's not multi-threaded was a big drawback.
My solution was to port the server to Pylons (http://pylonshq.com/). The port was fairly easy and one benefit was it's very easy to create a GUI using Pylons so I was able to throw a status page on top of what's basically a daemon process.
I would summarize Pylons this way:
it's similar to Ruby on Rails in that it aims to be very easy to deploy web apps
its default templating language, Mako, is very nice to work with
it uses a system of routing urls that's very convenient
for us performance is not an issue, so I can't guarantee that Pylons would perform adequately for your needs
you can use it with Apache & lighttpd, though I've not tried this
We also run an app with Twisted and are happy with it. Twisted has good performance, but I find Twisted's single-threaded/defer-to-thread programming model fairly complicated. It has lots of advantages, but would not be my choice for a simple app.
Good luck.
Just to point out something different from the usual suspects...
Some years ago, while I was using Zope 2.x, I read about Medusa, as it was the web server used for the platform. They advertised it as working well under heavy load, and it can provide you with the functionality you asked for.
