I am developing an application with the bottlepy framework. I am using the standard library WSGIRefServer() to run a development server. It is a single threaded server.
Now when going into production, I will want to move to a multi-threaded production server, and there are many choices. Let's say I choose CherryPy.
Now, in my code, I am initializing a single wsgi application. Other than that, I am also initializing other things...
Memcached connection
Mako templates
MongoDB connection
Since standard library wsgiref is a single threaded server, and I am creating only a single wsgi app (wsgi callable), everything works just fine.
What I want to know is that when I move to the multi-threaded server, how will my wsgi app, initialization code, connections to different server, etc. behave.
Will a multi-threaded server create a separate instance of wsgi app for every thread. And will a new thread be spawned for each new request (which then means a new wsgi app for each request)?
Will my connections to memcached, mongoDB, etc, be shared across threads or not. What else will be shared between threads
Please explain the request-response cycle for a threaded server
In general your application is using wsgi compliant framework and you shouldn't be afraid of multi-threaded / single-threaded server side. It's meant to work transparent and has to react same way despite of what kind of server is it, as long as it is wsgi compliant.
Every code block before bottle.run() will be run only once. As so, every connection (database, memcached) will be instantiated only once and shared.
When you call bottle.run() bottlepy starts wsgi server for you. Every request to that server fires some wsgi callable inside bottlepy framework. You are not really interested if it is single or multi -threaded environment, as long as you don't do something strange.
For strange i mean for instance synchronizing something through global variables. (Exception here is global request object for which bottlepy ensures that it contains proper request in proper context).
And in response to first question on the list: request may be computed in newly spawned thread or thread from the pool of threads (CherryPy is thread-pooled)
Related
So we used to run our Pyramid server with Apache in production. But we are moving to Docker containerization for easier prod deployments etc, and we want to adhere to the philosophy of "one process per container"..so instead of running Apache in the container + 4 python procs, we just want 1 python proc.
So my question is - is there a way to run a Pyramid server in production directly? Without using WSGI+Apache?
https://www.digitalocean.com/community/tutorials/how-to-use-the-pyramid-framework-to-build-your-python-web-app-on-ubuntu
My understanding is that pserve is for development only?
Create an application.py file and fill it with the following contents:
from wsgiref.simple_server import make_server
from pyramid.config import Configurator
from pyramid.response import Response
def hello_world(request):
return Response('<h1>Hello world!</h1>')
if __name__ == '__main__':
config = Configurator()
config.add_view(hello_world)
app = config.make_wsgi_app()
server = make_server('0.0.0.0', 8080, app)
server.serve_forever()
Will the above work as a production-grade server?
The latest official recommendation is one concern per container. From the Docker docs (emphasis my own):
Each container should have only one concern. Decoupling applications
into multiple containers makes it easier to scale horizontally and
reuse containers. For instance, a web application stack might consist
of three separate containers, each with its own unique image, to
manage the web application, database, and an in-memory cache in a
decoupled manner.
Limiting each container to one process is a good rule of thumb, but it
is not a hard and fast rule. For example, not only can containers be
spawned with an init process, some programs might spawn additional
processes of their own accord. For instance, Celery can spawn multiple
worker processes, and Apache can create one process per request.
In your case, your web application server is a single concern. Running Apache+WSGI is totally fine. Don't worry about the processes—That's Apache's job.
For a better understanding of the "one concern" rule, this post is a good overview of what problems its trying to solve.
You can use Waitress, which, according to their documentation, is
... meant to be a production-quality pure-Python WSGI server with very
acceptable performance. It has no dependencies except ones which live
in the Python standard library.
Waitress is a part of the Pylons Project just like Pyramid is.
It looks like Bjoern is a solid choice when it comes to running Python directly, where the Python server has WSGI bindings:
https://www.appdynamics.com/blog/engineering/a-performance-analysis-of-python-wsgi-servers-part-2/
https://github.com/jonashaag/bjoern
I'm using wsgi apache and flask to run my application. I'm use from yourapplication import app as application to start my application. That works so far fine. The problem is, with every request a new instance of my application is created. That leads to the unfortunate situation that my flask application creates a new database connection but only closes it after about 15 min. Since my server allows only 16 open DB connections the server starts to block requests very soon. BTW: This is not happening when I run flask without apache/wsgi since it opens only one connection and serves all requests as I want.
What I want: I want to run only one flask instance which then servers all requests.
The WSGIApplicationGroup directive may be what you're looking for as long as you have the wsgi app running in daemon mode (otherwise I believe apache's default behavior is to use prefork which spins up a process to handle each individual request):
The WSGIApplicationGroup directive can be used to specify which application group a WSGI application or set of WSGI applications belongs to. All WSGI applications within the same application group will execute within the context of the same Python sub interpreter of the process handling the request.
You have to provide an argument to the directive that specifies a name for the application group. There's a few expanding variables: %{GLOBAL}, %{SERVER}, %{RESOURCE} and %{ENV:variable}; or you can specify your own explicit name. %{GLOBAL} is special in that it expands to the empty string, which has the following behavior:
The application group name will be set to the empty string.
Any WSGI applications in the global application group will always be executed within the context of the first interpreter created by Python when it is initialised. Forcing a WSGI application to run within the first interpreter can be necessary when a third party C extension module for Python has used the simplified threading API for manipulation of the Python GIL and thus will not run correctly within any additional sub interpreters created by Python.
I would recommend specifying something other than %{GLOBAL}.
For every process you have mod_wsgi spawn, everything will be executed in the same environment. Then you can simply control the number of database connections based on the number of processes you want mod_wsgi to spawn.
I have a web app written with Bottle framework. It have a global somedict list accessed by multiple HTTP query.
After some researching, I find that the Bottle framework only support 1 thread in 1 process mode to run my app(I don't believe it is true, perhaps migrating it to other frameworks like Flask is a good idea.).
1 To enable multi-threading, I find WSGI solution but it does not support multiple processs(1 threads for each process) accessing global variable like somedict in my app, because process will re-init the list every time a query gets handled. How can I handle this issue?
2 Is there any other solutions except WSGI that solve the problem to enable this app to serve multiple HTTP query at once?
from bottle import request, route
import threading
somedict = {}
somedict_lock = threading.Lock()
#route("/read")
def read():
with somedict_lock:
return somedict
#route("/write", method="POST")
def write():
with somedict_lock:
somedict[request.forms.get("key1")] = request.forms.get("value1")
somedict[request.forms.get("key2")] = request.forms.get("value2")
It's best to serve a WSGI app via a server like gunicorn or waitress, which will handle your concurrency needs, but almost no matter what you do for concurrency your global queue in memory will not work the way you want it to. You need to use an external memory store like memcached, redis, etc. Static data is one thing, but mutable state should never be shared between web app processes. That's contrary to Python web server idioms and the typical execution model of Python web apps.
I'm not saying it's literally impossible to do in Python, but it's not the way Python solves this problem.
You can process incoming requests asynchronously, currently Celery seems very suitable for running asynchronous tasks. Read how Celery can do this.
I'm from PHP/Apache background. With PHP/Apache setup PHP interpreter is loaded by apache module and then on every page request new worker is created that executes entire script. I've been working with WSGI applications in Python recently and it seems that Apache (mod_wsgi) server loads the entire application and keeps it alive. Then based on incoming requests executes pieces of code. If my understanding is correct it would explain why some of my objects won't execute __del__?
Edit:
In my case I have a wrapper class around python mysql module. There's only one instance used in entire application. It's responsibilities are to run queries and reuse mysql connection throughout the code when possible, but I have to make sure connection will be closed when everything is processed. In some parts I'm using multiprocessing where I tell the object not to reuse connection for spawn child processes. Initially I thought I can use __del__ to implement closing mysql connection but noticed It would be never called. I did end up using flask teardown function to make sure connection is closed after each request but was wondering if there are any other options to handle this nicely.
preface: I would like to separate these problems into smaller questions, but apparently, I am missing some pieces of the puzzle and it seems impossible to me.
I developed my cherrypy application using cherrypy's built in WSGI server. I naively assumed that when the time comes, I will be able to use created WSGI Application class and deploy it using any WSGI compliant server.
I used this blog post to create my own (but very similar) cherrypy Plugin and Tool to connect to database using SQLAlchemy during http requests.
I expected that any server will somehow work like cherrypy's built in server:
main process will spawn X threads to satisfy X concurrent requests
my engine Plugin will create SQLalchemy engine with connection pool = X (so any request will have its connection)
on request arrival, my Tool will supply sql alchemy connection from pool
This flow does not match with uWSGI (as long as I understand it).
I assign my application.py in uWSGI configuration. This file looks something like this:
cherrypy.tools.db = DbConnectorTool()
cherrypy.engine.dbengine = DbEnginePlugin(cherrypy.engine, settings.database)
cherrypy.config.update({
'engine.dbengine.on': True
})
from myapp.application import Application
root = Application(settings)
application = cherrypy.Application(root, script_name='', config=settings)
I was using this application.py to mount my application into cherrypy's built in server when I was developing and testing it.
The problems are that uWSGI does not create any threads itself and my SQLAlchemy plugin is not working with it, because no cherrypy.engine is created.
Does uWSGI support threading in the meaning of using threads to serve multiple concurrent requests? Can I start these threads in my application.py? Will uWSGI understand it and use these threads for concurrent requests? And how can this be done? I think cherrypy can be used somehow, or not?
And what about my SQLAlchemy Plugin, how can I start cherrypy.engine when using only WSGI cherrypy.Application?
Any help or information that could help me will be appreciated.
Edit:
My uWSGI configuration:
<uwsgi>
<socket>127.0.0.1:9001</socket>
<master/>
<daemonize>/var/log/uwsgi/app.log</daemonize>
<logdate/>
<threads/>
<pidfile>/home/web/uwsgi.pid</pidfile>
<uid>uwsgi</uid>
<gid>uwsgi</gid>
<workers>2</workers>
<harakiri>90</harakiri>
<harakiri-verbose/>
<home>/home/web/</home>
<pythonpath>/home/web/instance</pythonpath>
<module>core.application</module>
<no-orphans/>
<touch-reload>/home/web/uwsgi-reload-web</touch-reload>
</uwsgi>
uWSGI uses worker processes, not threads. It's worth noting that it means that the globals are not shared between all requests any more. You can use SharedArea for global data.
The processes are forked by default, so make sure you're ok with that or adjust settings (see Things to know).
Get Cherrypy's WSGI application with cherrypy.tree.mount(root, config=settings) call.
If your DB plugin does not have threading / shared data issues, chances are it will work. Like you say, you may need cherrypy.engine.start(), but definitely not cherrypy.engine.block(), since your main thread is now uWSGI worker.
You should post your uWSGI config, otherwise it will be hard to understand what is going on.
By the way to spawn additional threads (per worker) you simply need to add --threads N