standard way to handle user session in tornado - python

So, in order to avoid the "no one best answer" problem, I'm going to ask, not for the best way, but the standard or most common way to handle sessions when using the Tornado framework. That is, if we're not using 3rd party authentication (OAuth, etc.), but rather we have want to have our own Users table with secure cookies in the browser but most of the session info stored on the server, what is the most common way of doing this? I have seen some people using Redis, some people using their normal database (MySQL or Postgres or whatever), some people using memcached.
The application I'm working on won't have millions of users at a time, or probably even thousands. It will need to eventually get some moderately complex authorization scheme, though. What I'm looking for is to make sure we don't do something "weird" that goes down a different path than the general Tornado community, since authentication and authorization, while it is something we need, isn't something that is at the core of our product and so isn't where we should be differentiating ourselves. So, we're looking for what most people (who use Tornado) are doing in this respect, hence I think it's a question with (in theory) an objectively true answer.
The ideal answer would point to example code, of course.

Here's how it seems other micro frameworks handle sessions (CherryPy, Flask for example):
Create a table holding session_id and whatever other fields you'll want to track on a per session basis. Some frameworks will allow you to just store this info in a file on a per user basis, or will just store things directly in memory. If your application is small enough, you may consider those options as well, but a database should be simpler to implement on your own.
When a request is received (RequestHandler initialize() function I think?) and there is no session_id cookie, set a secure session-id using a random generator. I don't have much experience with Tornado, but it looks like setting a secure cookie should be useful for this. Store that session_id and associated info in your session table. Note that EVERY user will have a session, even those not logged in. When a user logs in, you'll want to attach their status as logged in (and their username/user_id, etc) to their session.
In your RequestHandler initialize function, if there is a session_id cookie, read in what ever session info you need from the DB and perhaps create your own Session object to populate and store as a member variable of that request handler.
Keep in mind sessions should expire after a certain amount of inactivity, so you'll want to check for that as well. If you want a "remember me" type log in situation, you'll have to use a secure cookie to signal that (read up on this at OWASP to make sure it's as secure as possible, thought again it looks like Tornado's secure_cookie might help with that), and upon receiving a timed out session you can re-authenticate a new user by creating a new session and transferring whatever associated info into it from the old one.

Tornado designed to be stateless and don't have session support out of the box.
Use secure cookies to store sensitive information like user_id.
Use standard cookies to store not critical information.
For storing large objects - use standard scheme - MySQL + memcache.

The key issue with sessions is not where to store them, is to how to expire them intelligently. Regardless of where sessions are stored, as long as the number of stored sessions is reasonable (i.e. only active sessions plus some surplus are stored), all this data is going to fit in RAM and be served fast. If there is a lot of old junk you may expect unpredictable delays (the need to hit the disk to load the session).

There isn't anything built directly into Tornado for this purpose. As others have commented already, Tornado is designed to be a very fast async framework. It is lean by design. However, it is possible to hook in your own session management capability. You need to add a preamble section to each handler that would create or grab a session container. You will need to store the session ID in a cookie. If you are not strictly HTTPS then you will want to use a secure cookie. The session persistence can be any technology of your choosing such as Redis, Postgres, MySQL, a file store, etc...
There is a Github project that provides session management for Tornado. Even if you decide not to use it, it can provide insight into how to structure your own session management. The Github project is called dustdevil. Full disclosure - we created this several years ago but find it very easy to use and have it in active use today.

Related

Caching data on backend side with Django (scalable applications)

Suppose I need to build a web application where each client will be simulating their trading strategy using historical stock data. The data will be provided by a 3rd party vendor over the internet: for example, fetching historical data for a single stock based on stock ticker through HTTP call. Also, I am planning to use Django as a back-end framework.
Here is my question:
I would like to be able to prefetch and cache the data on the server side, so that each client's request would not need to do an HTTP call again, but get it from the shared resource. I guess, storing it in database, like SQL could be one solution. However, is there a way to use memory shared between clients in Django on the backend side? Any pointer or suggestion would be very helpful. Thanks.
This sounds like a fine thing to store in a share cache, like memcache or redis (or, yes, even an SQL database-backed cache).
You should read https://docs.djangoproject.com/en/dev/topics/cache/; this can explain how you can store the result of your HTTP call under a cache key and then retrieve it. The caching works the same regardless of what backend (memcache, redis, local memory, SQL DB) you use, so you can test this out with the local-memory cache or DB cache and, if you like it, move to a better solution like memcache.
There are many caching strategies you could use here, but a nice place to start rather then storing the data in a SQL database you could store the data in something like Memcached https://docs.djangoproject.com/en/dev/topics/cache/#memcached. Without more information I couldn't get more specific than that.

Django Passing data between views

I was wondering what is the 'best' way of passing data between views. Is it better to create invisible fields and pass it using POST or should I encode it in my URLS? Or is there a better/easier way of doing this? Sorry if this question is stupid, I'm pretty new to web programming :)
Thanks
There are different ways to pass data between views. Actually this is not much different that the problem of passing data between 2 different scripts & of course some concepts of inter-process communication come in as well. Some things that come to mind are -
GET request - First request hits view1->send data to browser -> browser redirects to view2
POST request - (as you suggested) Same flow as above but is suitable when more data is involved
Django session variables - This is the simplest to implement
Client-side cookies - Can be used but there is limitations of how much data can be stored.
Shared memory at web server level- Tricky but can be done.
REST API's - If you can have a stand-alone server, then that server can REST API's to invoke views.
Message queues - Again if a stand-alone server is possible maybe even message queues would work. i.e. first view (API) takes requests and pushes it to a queue and some other process can pop messages off and hit your second view (another API). This would decouple first and second view API's and possibly manage load better.
Cache - Maybe a cache like memcached can act as mediator. But then if one is going this route, its better to use Django sessions as it hides a whole lot of implementation details but if scale is a concern, memcached or redis are good options.
Persistent storage - store data in some persistent storage mechanism like mysql. This decouples your request taking part (probably a client facing API) from processing part by having a DB in the middle.
NoSql storages - if speed of writes are in other order of hundreds of thousands per sec, then MySql performance would become bottleneck (there are ways to get around by tweaking mysql config but its not easy). Then considering NoSql DB's could be an alternative. e.g: dynamoDB, Redis, HBase etc.
Stream Processing - like Storm or AWS Kinesis could be an option if your use-case is real-time computation. In fact you could use AWS Lambda in the middle as a server-less compute module which would read off and call your second view API.
Write data into a file - then the next view can read from that file (real ugly). This probably should never ever be done but putting this point here as something that should not be done.
Cant think of any more. Will update if i get any. Hope this helps in someway.

Performing session management in python

I have created an application in python. And I am using extjs for the front-end.
Once a valid user logs in, I want to use the username of the logged in user for further transactions.
I wanted urgent help on how session management is to be done in python.
Thanks in advance.
You don't do "session management in Python". You do "session management in a framework, which may be implemented in Python".
A session can be implemented in many ways, but normally is implemented via Cookies. For example, Django (a Python framework), writes a cookie with a value called session containing a given string (say aabbccddeeff12345). For every request, Django checks whether the cookie exists, and if it does, it maps it to a user.
This is far from trivial. However, if you use Django (or most robust web frameworks in any language), this is completely transparent.
If you don't know what a cookie is, or are unfamiliar with web security, I don't think this is a problem that you should tackle yourself.
Use a framework, and understand it.
If you don't use a framework or if your framework doesn't provide session management, Beaker is a popular library do this.
Examples at https://beaker.groovie.org/sessions.html#using show how to store a username in the session object.

Using sessions in Django

I'm using sessions in Django to store login user information as well as some other information. I've been reading through the Django session website and still have a few questions.
From the Django website:
By default, Django stores sessions in
your database (using the model
django.contrib.sessions.models.Session).
Though this is convenient, in some
setups it’s faster to store session
data elsewhere, so Django can be
configured to store session data on
your filesystem or in your cache.
Also:
For persistent, cached data, set
SESSION_ENGINE to
django.contrib.sessions.backends.cached_db.
This uses a write-through cache –
every write to the cache will also be
written to the database. Session reads
only use the database if the data is
not already in the cache.
Is there a good rule of thumb for which one to use? cached_db seems like it would always be a better choice because best case, the data is in the cache, and worst case it's in the database where it would be anyway. The one downside is I have to setup memcached.
By default, SESSION_EXPIRE_AT_BROWSER_CLOSE is set
to False, which means session cookies
will be stored in users' browsers for
as long as SESSION_COOKIE_AGE. Use
this if you don't want people to have
to log in every time they open a
browser.
Is it possible to have both, the session expire at the browser close AND give an age?
If value is an integer, the session
will expire after that many seconds of
inactivity. For example, calling
request.session.set_expiry(300) would
make the session expire in 5 minutes.
What is considered "inactivity"?
If you're using the database backend, note that session data can
accumulate in the django_session
database table and Django does not
provide automatic purging. Therefore,
it's your job to purge expired
sessions on a regular basis.
So that means, even if the session is expired there are still records in my database. Where exactly would one put code to "purge the db"? I feel like you would need a seperate thread to just go through the db every once in awhile (Every hour?) and delete any expired sessions.
Is there a good rule of thumb for which one to use?
No.
Cached_db seems like it would always be a better choice ...
That's fine.
In some cases, there a many Django (and Apache) processes querying a common database. mod_wsgi allows a lot of scalability this way. The cache doesn't help much because the sessions are distributed randomly among the Apache (and Django) processes.
Is it possible to have both, the session expire at the browser close AND give an age?
Don't see why not.
What is considered "inactivity"?
I assume you're kidding. "activity" is -- well -- activity. You know. Stuff happening in Django. A GET or POST request that Django can see. What else could it be?
Where exactly would one put code to "purge the db"?
Put it in crontab or something similar.
I feel like you would need a seperate thread to just go through the db every once in awhile (Every hour?)
Forget threads (please). It's a separate process. Once a day is fine. How many sessions do you think you'll have?

Best practice: How to persist simple data without a database in django?

I'm building a website that doesn't require a database because a REST API "is the database". (Except you don't want to be putting site-specific things in there, since the API is used by mostly mobile clients)
However there's a few things that normally would be put in a database, for example the "jobs" page. You have master list view, and the detail views for each job, and it should be easy to add new job entries. (not necessarily via a CMS, but that would be awesome)
e.g. example.com/careers/ and example.com/careers/77/
I could just hardcode this stuff in templates, but that's no DRY- you have to update the master template and the detail template every time.
What do you guys think? Maybe a YAML file? Or any better ideas?
Thx
Why not still keep it in a database? Your remote REST store is all well and funky, but if you've got local data, there's nothing (unless there's spec saying so) to stop you storing some stuff in a local db. Doesn't have to be anything v glamorous - could be sqlite, or you could have some fun with redis, etc.
You could use the Memcachedb via the Django cache interface.
For example:
Set the cache backend as memcached in your django settings, but install/use memcachedb instead.
Django can't tell the difference between the two because the provide the same interface (at least last time I checked).
Memcachedb is persistent, safe for multithreaded django servers, and won't lose data during server restarts, but it's just a key value store. not a complete database.
Some alternatives built into the Python library are listed in the Data Persistence chapter of the documentation. Still another is JSON.

Categories

Resources