I'm trying to build a Twisted/Django mashup that will let me control various client connections managed by a Twisted server via Django's admin interface. Meaning, I want to be able to login to Django's admin and see what protocols are currently in use, any details specific to each connection (e.g. if the server is connected to freenode via IRC, it should list all the channels currently connected to), and allow me to disconnect or connect new clients by modifying or creating database records.
What would be the best way to do this? There are lots of posts out there about combining Django with Twisted, but I haven't found any prior art for doing quite what I've outlined. All the Twisted examples I've seen use hardcoded connection parameters, which makes it difficult for me to imagine how I would dynamically running reactor.connectTCP(...) or loseConnection(...) when signalled by a record in the database.
My strategy is to create a custom ClientFactory that solely polls the Django/managed database every N seconds for any commands, and to modify/create/delete connections as appropriate, reflecting the new status in the database when complete.
Does this seem feasible? Is there a better approach? Does anyone know of any existing projects that implement similar functionality?
Polling the database is lame, but unfortunately, databases rarely have good tools (and certainly there are no database-portable tools) for monitoring changes. So your approach might be okay.
However, if your app is in Django and you're not supporting random changes to the database from other (non-Django) clients, and your WSGI container is Twisted, then you can do this very simply by doing callFromThread(connectTCP, ...).
I've been working on yet another way of combing django and twisted. Fell free to give it a try: https://github.com/kowalski/featdjango.
The way it works, is slightly different that the others. It starts a twisted application and http site. The requests done to django are processed inside a special thread pool. What makes it special, is that that these threads can wait on Deferred, which makes it easy to combine synchronous django application code with asynchronous twisted code.
The reason I came up with structure like this, is that my application needs to perform a lot of http requests from inside the django views. Instead of performing them one by one I can delegate all of them at once to "the main application thread" which runs twisted and wait for them. The similarity to your problem is, that I also have an asynchronous component, which is a singleton and I access it from django views.
So this is, for example, this is how you would initiate the twisted component and later to get the reference from the view.
import threading
from django.conf import settings
_initiate_lock = threading.Lock()
def get_component():
global _initiate_lock
if not hasattr(settings, 'YOUR_CLIENT')
_initiate_lock.acquire()
try:
# other thread might have did our job while we
# were waiting for the lock
if not hasattr(settings, 'YOUR_CLIENT'):
client = YourComponent(**whatever)
threading.current_thread().wait_for_deferred(
client.initiate)
settings.YOUR_CLIENT = client
finally:
_initiate_lock.release()
return settings.YOUR_CLIENT
The code above, initiates my client and calls the initiate method on it. This method is asynchronous and returns a Deferred. I do all the necessary setup in there. The django thread will wait for it to finish before processing to next line.
This is how I do it, because I only access it from the request handler. You probably would want to initiate your component at startup, to call ListenTCP|SSL. Than your django request handlers could get the data about the connections just accessing some public methods on the your client. These methods could even return Deferred, in which case you should use .wait_for_defer() to call them.
Related
I have a Flask microservice which serves user requests by an endpoint (say): /getdata
The data can be fetched in one of the two ways 1) cache or 2) from database directly - if the cache is in the process of being updated
Another service updates the database (thus making the cache stale). Once the service is done updating the database, it publishes a message to the rabbitmq stating: "update done"
Back to the microservice: I'd like it to have two threads:
Thread 1: runs the app.run()
Thread 2: subscribes to the queue - where "update done" messages are published
Given the two threads, I don't want the /getdata to be fetching database from the cache when it's being updated. At the same time, I don't want to update the cache when data is being fetched from the endpoint.
Here's one solution I can think of:
1) Have a threading.Lock() as a "global"
2) /getdata checks if the lock is available; if so, it will acquire, fetch data from cache and release the lock. If the lock is unavailable, it will fetch the data from the database directly, thereby incurring a performance hit - but still getting the "latest" data
3) RabbitMQ "subscriber" checks the state of the lock; if so, it acquires the lock , updates the cache from the database and releases the lock. If not, it adds the request to a local "queue", and waits for say one minute before trying to acquire the lock again. When it does, it will pop the first item from queue and update the cache from the database.
My questions:
Given the multitude of libraries and options in Python/Flask - is
there a library that allows me to do task like this in a "safe" way
(I am using pika for rabbitmq access)
Is it possible to launch the flask app.run() via one thread and the
queue subscriber via another (i.e. in if __name__ == "main":
)
How do I declare a "global" threading.Lock() which can coordinate
the two threads?
Notes:
I expect that in the worse case the lock won't be acquired for more than one minute.
Pika is not thread safe. You should avoid sharing the connection object across Flask's contexts. Writing your own Flask plugin wouldn't take that much boilerplate though. It would be very similar to the documentation example plugin. Otherwise, you could do a quick search with flask pika on a search engine and you'll find some existing plugins for this purpose. I have not tried them and they don't seem really popular, but maybe you should give them a go?
I don't see why it wouldn't be possible. Flask knows how to deal with this. However, I reckon it would severly degrade performances. Moreover, you might hit some corner-cases if the plugins you use are not perfectly written.
Just like you would declare any lock for threading. Nothing much. You put it at the module level (not in Flask's context) so that it is global, that's it.
That being said, I think you shouldn't proceed this way. You should rather run the update-job in a different process from the Web Server (using Flask CLI or whatever if you need to re-use some functions). It will be better performance-wise, it's easier to reason about, it's more loosely coupled.
Also, you should avoid running into locking headaches as long as possible. Believe me, it's a real source of problems. It's a nightmare to test properly, to debug, to maintain and quite risky when it comes to real-production use-cases. And if you really, really need a lock, don't hold it for one minute, it's way too long.
I don't know your exact requirements, but there surely is a solution that is OK and that does not involve such complexity.
In one of the views in my django application, I need to perform a relatively lengthy network IO operation. The problem is other requests must wait for this request to be completed even though they have nothing to do with it.
I did some research and stumbled upon Celery but as I understand, it is used to perform background tasks independent of the request. (so I can not use the result of the task for the response to the request)
Is there a way to process views asynchronously in django so while the network request is pending other requests can be processed?
Edit: What I forgot to mention is that my application is a web service using django rest framework. So the result of a view is a json response not a page that I can later modify using AJAX.
The usual solution here is to offload the task to celery, and return a "please wait" response in your view. If you want, you can then use an Ajax call to periodically hit a view that will report whether the response is ready, and redirect when it is.
You want to maintain that HTTP connection for an extended period of time but still allow other requests to be managed, right? There's no simple solution to this problem. Also, any solution will be a level away from Django as it depends on how you process requests.
I don't know what you're currently using, so I can only tell you how I handled this in the past... I was using uwsgi to provide the WSGI interface between my python application and nginx. In uwsgi I used the asynchronous functions to suspend my long running connection when there was time to wait on the IO connections. The methods allow you to ask it to suspend things until there is something to read or write and then allow other connections to be serviced.
The above mentioned async calls use "green threads". It's much lighter weight then regular threads and you have control over when you move from thread to thread.
I am not saying that it is a good solution for your scenario[1], but the simple answer is using the following pattern:
async_result = some_task.delay(arg1)
result = async_result.get()
Check documentation for the get method. And instead of using the delay method you can use anything that returns an AsyncResult (like the apply_async method
[1] Why it may be a bad idea? Having an ongoing connection waiting a lot is bad for Django (it is not ready for long-lived connections), may conflict with the proxy configuration (if there is a reverse proxy somewhere) and may be identified as a timeout from the browser. So... it seems a Bad Idea[TM] to use this pattern for a Django Rest Framework view.
I am a PHP programmer learning Python, when ever I get a chance.
I read that Python web Application stay active between requests.
Meaning that data stays in memory and is available between requests, right?
I am wondering how that works.
In php we place a cookie with a unique token, and save data in sessions.
Sessions are arrays, saved on disk or database.
Between requests the session functions, restore the correct session array based on the cookie with the unique token. That means each browser gets it's own unique session, and the session has a preset expiration time. If the user is inactive and the expiration get's triggered then the session gets purged. A new session has to be created when the user comes back.
My understanding is Python doesn't need this, because the application stays active between requests.
Doesn't each request get a unique thread in Python?
How does it distinguish between requests, who the requester is?
Is there a handling method to separate vars between users and application?
Lets say I have a dict saved, is this dict globally available between all requests from any browser, or only to that one browser.
When and how does the memory get cleared. If everything stays in the memory. What if the app is running for a couple years without a restart. There must be some kind of expiration setting or memory handling?
One commenter says it depends on the web app. So I am using Bottle.py to learn.
I would assume the answer would depend on which web application framework you are using within python. Some of them have session management pieces in them that track a user across requests. But if you just had a basic port listener that responded with http, you would have to build any cookie support or session management yourself.
The other big difference is that in php, you have a module installed on the server that the actual http server delegates to in order to generate a response. PHP doesn't handle the routing or actual serving of the responses. Where as python can actually be the server and the resource for generating the response. It depends on how python is installed/accessed on the machine where the server is running. So in that sense you can do whatever you want within a python web application.
If you are interested, you should look at some available python web frameworks.
Edit: I see you mentioned bottle.py, and out of the box, it does not provide authentication and session management because it's a micro framework for fast prototyping and not necessarily suitable for a large scale application (although not impossible, just a lot of work).
Yes and no. If you check out this question you get an idea how it could work for a Django application.
However, the way you state it, it will not work. Defining a dict in one request without passing it somewhere in order to make it accessible in following request will obviously not make it available in further requests. So, yes, you have the options do this but its not the casue out of the box!
I was able to persist an object in Python between requests before using Twisted's web server. I have not tried seeing for myself if it persists across browsers though but I have a feeling it does. Here's a code snippet from the documentation:
Twisted includes an event-driven web server. Here's a sample web application; notice how the resource object persists in memory, rather than being recreated on each request:
from twisted.web import server, resource
from twisted.internet import reactor
class HelloResource(resource.Resource):
isLeaf = True
numberRequests = 0
def render_GET(self, request):
self.numberRequests += 1
request.setHeader("content-type", "text/plain")
return "I am request #" + str(self.numberRequests) + "\n"
reactor.listenTCP(8080, server.Site(HelloResource()))
reactor.run()
First of all you should understand the difference between local and global variables in python, and also how thread local storage works.
This is a (very) short explanation:
global variables are those declared at module scope and are shared by all threads. They live as long as the process is running, unless explicitly removed
local variables are those declared inside a function and instantiated for each call of that function. They are deleted when the function is over unless it is still referenced somewhere else.
thread local stoarage enables defining global variables that are specific to the current thread. The live as tong as the current thread is running, unless explicitly removed.
And now I'll try to answer your original questions (the answers are specific to bottle.py, but it is the most common implementation in python web servers)
Doesn't each request get a unique thread in Python?
Each concurrent will have a separate thread, future requests might reuse the previous threads.
How does it distinguish between requests, who the requester is?
bottle.py uses thread local storage to access the current request
Is there a handling method to separate vars between users and application?
Sounds like you are looking for a session. If so, there is no standard way of doing it, because different implementation have advantages and disadvantages. For example this is a bottle.py middleware for sessions.
Lets say I have a dict saved, is this dict globally available between
all requests from any browser, or only to that one browser. When and
how does the memory get cleared.
If everything stays in the memory. What if the app is running for a
couple years without a restart. There must be some kind of expiration
setting or memory handling?
Exactly, there must be an expiration setting. Since you are using a custom dict you need a timer that checks each entry in the dict for expiration.
I have a django app which is used for managing registrations to a survey.
There are fixed number of slots and I want to "reserve" slots for users when they sign up.
In one of my views, I get the next available slot and reserve it (or redirect the user if there are no slots available.)
I want to protect against the case where two user's signing up at the same time get registered for the same slot because the the method "get_next_available_slot" returned the same slot for both users.
For this I am trying to understand the use of processes and threads with Django's views.
1) Is each request handled in a separate thread and will using python threading module's LOCK() take care of exclusive access?
2) I am running apache on RHEL with modwsgi. How do I configure Apache/modwsgi to ensure a more easy and simple solution to handle the above situation?
I am not expecting a huge load on the web application at all. So I would like a simpler solution instead of a high performance one.
You should not make assumptions about thread/process setup of django application, because it depends on web server you're using and how django is integrated to it. Therefore, interprocess communication methods should not rely on these details to be reliable. One good solution is using built-in cache library for locks and shared data.
Here's a good example of cache lock ensuring only once instance of celery task is run at a time. You can apply it to regular requests as well.
You shouldn't be worrying about this kind of stuff.
These slots are stored in a database right? The database should handle all the locking mechanisms for you, just make sure you run everything under a transaction and you will be fine.
I am developing web app on flask, python, sqlalchemy and postgresql.
My question is here regarding concurrency handling in this app.
How I wrote the app :
I take the example of adding user in database. I post the form and a view is called. I process all the form data and then call add_user(*arg) which uses sqlalchemy code to insert user in database and returns on successful execution and I return the response from the view.
What I assumed:
Ok now I assumed that my web server (which I have not decided yet) will either spawn a thread or a process if two users are trying to signup at the same time and will handle all the concurreny requirements.
Do i need to write threaded code here? By threaded code I mean that before writing I acquire a lock and after write release it.
I am pretty new to web development and multithreading/multiprocessing programing and would like some guidance on how write web app which can handle concurrency well.
Writing concurrency handling from start is right or this thought should come when a large number of concurrent users are using the webapp. Even If it should be done later I would like some pointers about it.
Basically I have no idea about concurrency part of webapp development. If you can point to resources from where I can learn more about it would be really helpful.
Flask will execute each request in a separate thread or even in separate processes. The number of threads and processes to spawn is determined by the WSGI server (for example, Apache with mod_wsgi).
If you use SQLAlchemy ScopedSessions, the session is perfectly thread-safe. You must not share ORM-controlled objects across threads (but in the large majority of cases, you won't let your objects live longer than a request anyway so this is usually not a concern).
In other words, as long as you don't intend to share state between requests other than through the database or cookies, you don't need to worry about concurrency issues. You don't need to create a lock for writing to the database.
If you create your own long-lived objects within your application, which you most likely don't need to do, and if those objects communicate or share state with request handling code, then you must take appropriate precautions to avoid synchronization issues (race conditions, deadlocks, use of libraries that are not thread-safe, etc.)