Best practice: How to persist simple data without a database in django?

Best practice: How to persist simple data without a database in django? - python

I'm building a website that doesn't require a database because a REST API "is the database". (Except you don't want to be putting site-specific things in there, since the API is used by mostly mobile clients)
However there's a few things that normally would be put in a database, for example the "jobs" page. You have master list view, and the detail views for each job, and it should be easy to add new job entries. (not necessarily via a CMS, but that would be awesome)
e.g. example.com/careers/ and example.com/careers/77/
I could just hardcode this stuff in templates, but that's no DRY- you have to update the master template and the detail template every time.
What do you guys think? Maybe a YAML file? Or any better ideas?
Thx

Why not still keep it in a database? Your remote REST store is all well and funky, but if you've got local data, there's nothing (unless there's spec saying so) to stop you storing some stuff in a local db. Doesn't have to be anything v glamorous - could be sqlite, or you could have some fun with redis, etc.

You could use the Memcachedb via the Django cache interface.
For example:
Set the cache backend as memcached in your django settings, but install/use memcachedb instead.
Django can't tell the difference between the two because the provide the same interface (at least last time I checked).
Memcachedb is persistent, safe for multithreaded django servers, and won't lose data during server restarts, but it's just a key value store. not a complete database.

Some alternatives built into the Python library are listed in the Data Persistence chapter of the documentation. Still another is JSON.

Related

Caching data on backend side with Django (scalable applications)

Suppose I need to build a web application where each client will be simulating their trading strategy using historical stock data. The data will be provided by a 3rd party vendor over the internet: for example, fetching historical data for a single stock based on stock ticker through HTTP call. Also, I am planning to use Django as a back-end framework.
Here is my question:
I would like to be able to prefetch and cache the data on the server side, so that each client's request would not need to do an HTTP call again, but get it from the shared resource. I guess, storing it in database, like SQL could be one solution. However, is there a way to use memory shared between clients in Django on the backend side? Any pointer or suggestion would be very helpful. Thanks.

This sounds like a fine thing to store in a share cache, like memcache or redis (or, yes, even an SQL database-backed cache).
You should read https://docs.djangoproject.com/en/dev/topics/cache/; this can explain how you can store the result of your HTTP call under a cache key and then retrieve it. The caching works the same regardless of what backend (memcache, redis, local memory, SQL DB) you use, so you can test this out with the local-memory cache or DB cache and, if you like it, move to a better solution like memcache.

There are many caching strategies you could use here, but a nice place to start rather then storing the data in a SQL database you could store the data in something like Memcached https://docs.djangoproject.com/en/dev/topics/cache/#memcached. Without more information I couldn't get more specific than that.

standard way to handle user session in tornado

So, in order to avoid the "no one best answer" problem, I'm going to ask, not for the best way, but the standard or most common way to handle sessions when using the Tornado framework. That is, if we're not using 3rd party authentication (OAuth, etc.), but rather we have want to have our own Users table with secure cookies in the browser but most of the session info stored on the server, what is the most common way of doing this? I have seen some people using Redis, some people using their normal database (MySQL or Postgres or whatever), some people using memcached.
The application I'm working on won't have millions of users at a time, or probably even thousands. It will need to eventually get some moderately complex authorization scheme, though. What I'm looking for is to make sure we don't do something "weird" that goes down a different path than the general Tornado community, since authentication and authorization, while it is something we need, isn't something that is at the core of our product and so isn't where we should be differentiating ourselves. So, we're looking for what most people (who use Tornado) are doing in this respect, hence I think it's a question with (in theory) an objectively true answer.
The ideal answer would point to example code, of course.

Here's how it seems other micro frameworks handle sessions (CherryPy, Flask for example):
Create a table holding session_id and whatever other fields you'll want to track on a per session basis. Some frameworks will allow you to just store this info in a file on a per user basis, or will just store things directly in memory. If your application is small enough, you may consider those options as well, but a database should be simpler to implement on your own.
When a request is received (RequestHandler initialize() function I think?) and there is no session_id cookie, set a secure session-id using a random generator. I don't have much experience with Tornado, but it looks like setting a secure cookie should be useful for this. Store that session_id and associated info in your session table. Note that EVERY user will have a session, even those not logged in. When a user logs in, you'll want to attach their status as logged in (and their username/user_id, etc) to their session.
In your RequestHandler initialize function, if there is a session_id cookie, read in what ever session info you need from the DB and perhaps create your own Session object to populate and store as a member variable of that request handler.
Keep in mind sessions should expire after a certain amount of inactivity, so you'll want to check for that as well. If you want a "remember me" type log in situation, you'll have to use a secure cookie to signal that (read up on this at OWASP to make sure it's as secure as possible, thought again it looks like Tornado's secure_cookie might help with that), and upon receiving a timed out session you can re-authenticate a new user by creating a new session and transferring whatever associated info into it from the old one.

Tornado designed to be stateless and don't have session support out of the box.
Use secure cookies to store sensitive information like user_id.
Use standard cookies to store not critical information.
For storing large objects - use standard scheme - MySQL + memcache.

The key issue with sessions is not where to store them, is to how to expire them intelligently. Regardless of where sessions are stored, as long as the number of stored sessions is reasonable (i.e. only active sessions plus some surplus are stored), all this data is going to fit in RAM and be served fast. If there is a lot of old junk you may expect unpredictable delays (the need to hit the disk to load the session).

There isn't anything built directly into Tornado for this purpose. As others have commented already, Tornado is designed to be a very fast async framework. It is lean by design. However, it is possible to hook in your own session management capability. You need to add a preamble section to each handler that would create or grab a session container. You will need to store the session ID in a cookie. If you are not strictly HTTPS then you will want to use a secure cookie. The session persistence can be any technology of your choosing such as Redis, Postgres, MySQL, a file store, etc...
There is a Github project that provides session management for Tornado. Even if you decide not to use it, it can provide insight into how to structure your own session management. The Github project is called dustdevil. Full disclosure - we created this several years ago but find it very easy to use and have it in active use today.

Django Passing data between views

I was wondering what is the 'best' way of passing data between views. Is it better to create invisible fields and pass it using POST or should I encode it in my URLS? Or is there a better/easier way of doing this? Sorry if this question is stupid, I'm pretty new to web programming :)
Thanks

There are different ways to pass data between views. Actually this is not much different that the problem of passing data between 2 different scripts & of course some concepts of inter-process communication come in as well. Some things that come to mind are -
GET request - First request hits view1->send data to browser -> browser redirects to view2
POST request - (as you suggested) Same flow as above but is suitable when more data is involved
Django session variables - This is the simplest to implement
Client-side cookies - Can be used but there is limitations of how much data can be stored.
Shared memory at web server level- Tricky but can be done.
REST API's - If you can have a stand-alone server, then that server can REST API's to invoke views.
Message queues - Again if a stand-alone server is possible maybe even message queues would work. i.e. first view (API) takes requests and pushes it to a queue and some other process can pop messages off and hit your second view (another API). This would decouple first and second view API's and possibly manage load better.
Cache - Maybe a cache like memcached can act as mediator. But then if one is going this route, its better to use Django sessions as it hides a whole lot of implementation details but if scale is a concern, memcached or redis are good options.
Persistent storage - store data in some persistent storage mechanism like mysql. This decouples your request taking part (probably a client facing API) from processing part by having a DB in the middle.
NoSql storages - if speed of writes are in other order of hundreds of thousands per sec, then MySql performance would become bottleneck (there are ways to get around by tweaking mysql config but its not easy). Then considering NoSql DB's could be an alternative. e.g: dynamoDB, Redis, HBase etc.
Stream Processing - like Storm or AWS Kinesis could be an option if your use-case is real-time computation. In fact you could use AWS Lambda in the middle as a server-less compute module which would read off and call your second view API.
Write data into a file - then the next view can read from that file (real ugly). This probably should never ever be done but putting this point here as something that should not be done.
Cant think of any more. Will update if i get any. Hope this helps in someway.

How to make process wide editable variable (storage) in django?

I want create project wide accessible storage for project/application settings.
What i want to achieve: - Each app has it's own app specific settings stored in db - When you spawn django wsgi process each settings are stored in memory storage and are available project wide - Whenever you change any setting value in db there is a call to regenerate storage from db
So it works very close to cache but I can't use cache mechanism because it's serializing data. I can also use memcache for that purpose but i want to develop generic solution (not always you have access to memcache).
If anyone have any ideas to solve my problem i would be really gratefully for sharing.

Before giving any specific advice to you, you need to be aware of the limitations of these systems.
ISSUES
Architectural Differences between Django and PHP/other popular language.
PHP re-reads and re-evaluates the entire code tree every time a page is accessed. This allows PHP to re-read settings from DB/cache/whatever on every request.
Initialisation
Many django apps initialise their own internal cache of settings and parameters and perform validation on these parameters at module import time (server start) rather than on every request. This means you would need a server restart anyway when modifying any settings for non db-settings enabled apps, so why not just change settings.py and restart server?
Multi-site/Multi-instance and in-memory settings
In Memory changes are strictly discouraged by Django docs because they will only affect a single server instance. In the case of multiple sites (contrib.sites), only a single site will recieve updates to shared settings. In the case of instanced servers (ep.io/gondor) any changes will only be made to a local instance, not every instance running for your site. Tread carefully
PERFORMANCE
In Memory Changes
Changing settings values while the server is running is strictly discouraged by django docs for the reasons outlined above. However there is no performance hit with this option. USE ONLY WITHIN THE CONFINES OF SPECIFIC APPS MADE BY YOU AND SINGLE SERVER/INSTANCE.
Cache (redis-cache/memcached)
This is the intermediate speed option. Reasonably fast lookup of settings which can be deserialised into complex python structures easily - great for dict-configs. IMPORTANT is that values are shared among Sites/Instances safely and are updated atomically.
DB (SLOOOOW)
Grabbing one setting at a time from DB will be very very slow unless you hack in connection pooling. grabbing all settings at once is faster but increases db transfer on every single request. Settings synched between Sites/Instances safely. Use only for 1 or 2 settings and it would be reasonable to use.
CONCLUSION
Storing configuration values in database/cache/mem can be done, but you have to be aware of the caveats, and the potential performance hit. Creating a generic replacement for settings.py will not work, however creating a support app that stores settings for other apps could be a viable solution, as long as the other apps accept that settings must be reloaded every request.

Is there a production-safe way to measure time spent in Production w/Python?

I want to be able to instrument Python applications so that I know:
Page generation time.
Percentage of time spent in external requests (mysql, api calls).
Number of mysql queries, what the MySQL queries were.
I want this data from production (not offline profiling) - because the time spent in various places will be different under load.
In PHP I can do this with XHProf or instrumentation-for-php. In Ruby on Rails/.NET/Java, I can do this with New Relic.
Is there such a package recommended for Python or django?

Yes, it's perfectly possible. E.g. use some magic switch in URL, like "?profile-me" which triggers profiling in Django middleware.
There are a number of snippets on the Internet, like this one: http://djangosnippets.org/snippets/70/ or modules like this one: http://code.google.com/p/django-profiling/ - but I haven't used any of them so I cannot recommend anything.
Anyway, the approach they take is similar to what I do - i.e. use Python Hotshot profiler module in a middleware that wraps your view. For the MySQL part, you can just use connection.queries form Django.
The nice thing about Hotshot is that its output can be browsed using Kcachegrind like here: http://www.rkblog.rk.edu.pl/w/p/django-profiling-hotshot-and-kcachegrind/

New Relic now had a package for Python, including Django through mod_wsgi.
https://support.newrelic.com/help/kb/python

django-prometheus is a good choice for handling production workloads, especially in a container environment like Kubernetes. Out of the box, it has middleware for tracking request latencies and counts (by view method), as well as Database and cache access times. It wouldn't be a good solution for tracking which queries are actually executing, but that's where a logging solution like ELK would come into play. If it helps, I've written a post which walks through how to add custom metrics to a Django application.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.