Django: Horizontal scaling with a single database


I have a single Django instance that is about to hit its throughput limits. I'd like to add a second instance and start scaling horizontally.
I understand that database read replicas require some minimal Django configuration, but when using only a single database, is there anything I need to do, or anything I should be careful of, when adding a second instance?
For the record, I use render.com (it's similar to Heroku), and their scaling solution just gives us a slider and automatically adds or removes instances. Is there any configuration I need to do with Django + Gunicorn + Uvicorn? The instances will automatically sit behind their load balancer as well.
For reference my stack is:
Django + DRF
Postgres
Redis for cache and broker
Django-q for async
Cloudflare

You can enable autoscaling on Render and it will automatically scale your instances up (and down) based on your application's average CPU and/or memory utilization across all instances. You do not need to change your Django app.
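One thing worth checking when going from one instance to several is the number of Postgres connections, because every instance opens its own. Below is a back-of-the-envelope sketch. It assumes each Gunicorn worker thread holds at most one connection to the single `default` database (Django keeps connections per thread), and it uses Postgres's default `max_connections` of 100. The worker, thread, and "other clients" (for example Django-q workers) counts are placeholders for your own values, so treat the result as an estimate only:

```python
# Back-of-the-envelope estimate of Postgres connections when scaling out.
# Assumption: each Gunicorn worker thread holds at most one connection
# to the single "default" database alias.

POSTGRES_DEFAULT_MAX_CONNECTIONS = 100  # Postgres default; your plan may differ


def web_connections(instances: int, workers: int, threads: int = 1) -> int:
    """Connections the web tier can hold open at the same time."""
    return instances * workers * threads


def fits(instances: int, workers: int, threads: int = 1,
         other_clients: int = 0,
         max_connections: int = POSTGRES_DEFAULT_MAX_CONNECTIONS) -> bool:
    """True if web connections plus other clients (e.g. Django-q) stay under the limit."""
    total = web_connections(instances, workers, threads) + other_clients
    return total <= max_connections


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, "instance(s):", web_connections(n, workers=4, threads=2), "connections")
```

Because the limit is shared by every instance, the autoscaler's maximum instance count is the number to plug in, not the current one.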

Related

How do I configure multiple django applications to use the same database?

As I configure my Django applications to use containers I'm left with a problem as I try to separate my processes into smaller images. How can I break logic into smaller components that run in their own containers, yet access the same database?
I realize that in a "true" microservices environment I'd want a different database for each service. However, consider a situation where I have a bit of logic that reads data from a database, and produces a CSV file.
I'd like to break out that into a separate type of image, that contains only that logic and gets a special disk mount to write the file to. The rest of the applications remain stateless.
So I'm left needing two Django applications: one that reads and writes data to its database, and another that I can use to spin up and run reports against that same database.
Option 1 is to keep using the same application, where the models are already defined. I spin up one container for processing, and another container of the same image for reporting. While this would work, it seems like it would be a better pattern to isolate the behavior in a specific application.
Option 2: ... What is my option two?

Do you use a Docker network in your environment?
Suggestion 1 (simple, but it fixes your situation): duplicate models.py in both applications, and connect to the database using the database container's name as the host.
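A sketch of what "connect using the container name" can look like in each application's settings. It assumes both containers share a Docker network on which the Postgres container is reachable as `db`. The environment variable names and default values are hypothetical:

```python
import os

# Both the main app and the reporting app point at the same Postgres
# container. On a shared Docker network the container/service name
# ("db" here, a hypothetical name) resolves as a hostname.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("POSTGRES_DB", "app"),
        "USER": os.environ.get("POSTGRES_USER", "app"),
        "PASSWORD": os.environ.get("POSTGRES_PASSWORD", ""),
        "HOST": os.environ.get("POSTGRES_HOST", "db"),
        "PORT": os.environ.get("POSTGRES_PORT", "5432"),
    }
}
```

In the reporting application, the duplicated models can set `managed = False` in their `Meta` (with `db_table` naming the existing table). Its migrations then never try to create or alter tables owned by the main application.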
If you need to call a function between containers, you can use API calls, with DRF (Django REST Framework) or plain Django with JsonResponse.
Do not forget to create tokens to protect these API calls.
Best Regards
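A minimal client-side sketch of such a token-protected call from the reporting container. It assumes the main app exposes a hypothetical DRF endpoint `/api/report-data/` that uses DRF's `TokenAuthentication` (which expects an `Authorization: Token <key>` header). It also assumes the main app is reachable on the Docker network as `web:8000`, another hypothetical name:

```python
import json
import urllib.request

# Hypothetical address of the main app on the shared Docker network.
BASE_URL = "http://web:8000"


def build_report_request(token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for the (hypothetical) report endpoint."""
    req = urllib.request.Request(f"{base_url}/api/report-data/", method="GET")
    # DRF's TokenAuthentication reads "Authorization: Token <key>".
    req.add_header("Authorization", f"Token {token}")
    req.add_header("Accept", "application/json")
    return req


def fetch_report_data(token: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """Call the endpoint and decode the JSON body (performs real network I/O)."""
    with urllib.request.urlopen(build_report_request(token, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```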

Django Passing data between views

I was wondering what the 'best' way of passing data between views is. Is it better to create hidden fields and pass the data using POST, or should I encode it in my URLs? Or is there a better/easier way of doing this? Sorry if this question is stupid, I'm pretty new to web programming :)
Thanks

There are different ways to pass data between views. Actually, this is not much different from the problem of passing data between two different scripts, and of course some concepts of inter-process communication come in as well. Some things that come to mind:
GET request - The first request hits view1 -> view1 sends the data to the browser -> the browser redirects to view2.
POST request - (as you suggested) Same flow as above, but suitable when more data is involved.
Django session variables - This is the simplest to implement.
Client-side cookies - Can be used, but there are limits on how much data can be stored.
Shared memory at the web server level - Tricky, but it can be done.
REST APIs - If you can have a stand-alone server, then that server can call REST APIs to invoke views.
Message queues - Again, if a stand-alone server is possible, message queues might work too: the first view (API) takes requests and pushes them onto a queue, and some other process pops messages off and hits your second view (another API). This decouples the first and second view APIs and may manage load better.
Cache - A cache like memcached can act as a mediator. But if you go this route, it's better to use Django sessions, since they hide a lot of implementation details; if scale is a concern, memcached or Redis are good options.
Persistent storage - Store data in a persistent storage mechanism like MySQL. This decouples the request-taking part (probably a client-facing API) from the processing part by putting a DB in the middle.
NoSQL storage - If write rates are on the order of hundreds of thousands per second, MySQL performance would become a bottleneck (there are ways around this by tweaking the MySQL config, but it's not easy). Then NoSQL databases could be an alternative, e.g. DynamoDB, Redis, HBase.
Stream processing - Something like Storm or AWS Kinesis could be an option if your use case is real-time computation. In fact, you could use AWS Lambda in the middle as a serverless compute module that reads off the stream and calls your second view API.
Write data into a file - Then the next view can read from that file (really ugly). This should probably never be done; it is listed here only as something to avoid.
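For the first option (passing data in the URL on a redirect), the encoding and decoding is ordinary query-string handling. A framework-free sketch using only the standard library follows. In a Django view you would return `redirect(url)` with the built URL and read the values back from `request.GET` in the second view. The path `/view2/` and the field names are hypothetical:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_redirect_url(path: str, data: dict) -> str:
    """view1 side: append the data to the target URL as a query string."""
    return f"{path}?{urlencode(data)}"


def read_query(url: str) -> dict:
    """view2 side: decode the query string, keeping the first value per key."""
    parsed = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in parsed.items()}


url = build_redirect_url("/view2/", {"q": "red shoes", "page": 2})
print(url)
print(read_query(url))
```

Everything in the URL is visible to the user and can be edited by them, and values always come back as strings. Anything sensitive or large belongs in the session instead.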
I can't think of any more. I'll update this if I find any. Hope this helps in some way.

How to make process wide editable variable (storage) in django?

I want to create project-wide accessible storage for project/application settings.
What I want to achieve:
- Each app has its own app-specific settings stored in the DB.
- When a Django WSGI process is spawned, all settings are loaded into in-memory storage and are available project-wide.
- Whenever a setting value changes in the DB, a call regenerates the storage from the DB.
So it works very much like a cache, but I can't use the cache mechanism because it serializes data. I could also use memcache for that purpose, but I want to develop a generic solution (you don't always have access to memcache).
If anyone has any ideas to solve my problem, I would be really grateful if you shared them.

Before giving any specific advice to you, you need to be aware of the limitations of these systems.
ISSUES
Architectural Differences between Django and PHP/other popular languages
PHP re-reads and re-evaluates the entire code tree every time a page is accessed. This allows PHP to re-read settings from DB/cache/whatever on every request.
Initialisation
Many Django apps initialise their own internal cache of settings and parameters and validate these parameters at module import time (server start) rather than on every request. This means you would need a server restart anyway when modifying settings for apps that don't support DB-backed settings, so why not just change settings.py and restart the server?
Multi-site/Multi-instance and in-memory settings
In-memory changes are strongly discouraged by the Django docs because they only affect a single server instance. In the case of multiple sites (contrib.sites), only a single site will receive updates to shared settings. In the case of instanced servers (ep.io/gondor), any changes will only be made to a local instance, not to every instance running for your site. Tread carefully.
PERFORMANCE
In Memory Changes
Changing settings values while the server is running is strongly discouraged by the Django docs for the reasons outlined above. However, there is no performance hit with this option. USE ONLY WITHIN THE CONFINES OF SPECIFIC APPS MADE BY YOU, AND ONLY ON A SINGLE SERVER/INSTANCE.
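To make the single-instance limitation concrete, here is a minimal sketch of the in-memory option: a process-wide store rebuilt from a loader callable. In a real project the loader would query your settings model; here it is a hypothetical plain function. Note that `reload()` only refreshes the copy in the process that calls it, which is exactly why this approach breaks down with several workers or instances:

```python
import threading
from typing import Any, Callable, Dict


class SettingsStore:
    """Process-local settings cache, rebuilt wholesale from a loader."""

    def __init__(self, loader: Callable[[], Dict[str, Any]]):
        self._loader = loader
        self._lock = threading.Lock()
        self._data: Dict[str, Any] = {}
        self.reload()

    def reload(self) -> None:
        # Build the new dict first, then swap it in under the lock so readers
        # never see a half-populated mapping.
        fresh = dict(self._loader())
        with self._lock:
            self._data = fresh

    def get(self, key: str, default: Any = None) -> Any:
        with self._lock:
            return self._data.get(key, default)
```

Calling `reload()` from a Django `post_save` signal handler on the settings model would keep the process that saved the change fresh, but every other process would keep its stale copy until it reloads.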
Cache (redis-cache/memcached)
This is the intermediate-speed option. Lookups of settings are reasonably fast, and the values can be deserialised into complex Python structures easily, which is great for dict-configs. IMPORTANT: values are shared among sites/instances safely and are updated atomically.
DB (SLOOOOW)
Grabbing one setting at a time from the DB will be very, very slow unless you hack in connection pooling. Grabbing all settings at once is faster but increases DB transfer on every single request. Settings are synced between sites/instances safely. It is reasonable to use only for 1 or 2 settings.
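The "grab all settings at once" variant is a single query per request instead of one query per setting. A self-contained sketch using `sqlite3` in place of the Django ORM follows. The table `app_settings` and its `app`/`name`/`value` columns are hypothetical (with a hypothetical Django model `Setting`, the equivalent would be something like `dict(Setting.objects.values_list("name", "value"))`):

```python
import sqlite3
from collections import defaultdict


def load_all_settings(conn: sqlite3.Connection) -> dict:
    """Fetch every app's settings with one query, grouped by app name."""
    grouped = defaultdict(dict)
    for app, name, value in conn.execute("SELECT app, name, value FROM app_settings"):
        grouped[app][name] = value
    return dict(grouped)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE app_settings (app TEXT, name TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO app_settings VALUES (?, ?, ?)",
        [("blog", "per_page", "10"), ("blog", "comments", "on"), ("shop", "currency", "EUR")],
    )
    print(load_all_settings(conn))
```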
CONCLUSION
Storing configuration values in the database/cache/memory can be done, but you have to be aware of the caveats and the potential performance hit. Creating a generic replacement for settings.py will not work. However, creating a support app that stores settings for other apps could be a viable solution, as long as the other apps accept that settings must be reloaded on every request.

sqlalchemy identity map question

The identity map and unit of work patterns are part of the reason SQLAlchemy is much more attractive than django.db. However, I am not sure how the identity map would work, or whether it works at all, when an application is configured as WSGI and the ORM is accessed directly through API calls instead of through a shared service. I would imagine that Apache would create a new thread with its own Python instance for each request. Each instance would therefore have its own instance of the SQLAlchemy classes and would not be able to make use of the identity map. Is this correct?

I think you misunderstood the identity map pattern.
From http://martinfowler.com/eaaCatalog/identityMap.html:
An Identity Map keeps a record of all objects that have been read from the database in a single business transaction.
Records are kept in the identity map for a single business transaction. This means that no matter how your web server is configured, you probably will not hold them for longer than a request (or store them in the session).
Normally, you will not have many users taking part in a single business transaction. Anyway, you probably don't want your users to share objects, as they might end up doing contradictory things.
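To make Fowler's definition concrete: an identity map is a per-transaction dictionary keyed by (class, primary key), so a second lookup for the same row returns the same object instead of a fresh copy. Below is a toy sketch of the pattern. It is not SQLAlchemy's implementation, and `User` and `fake_load` are hypothetical stand-ins for a mapped class and a database query:

```python
from typing import Any, Callable, Dict, Tuple


class UnitOfWork:
    """One business transaction with its own identity map."""

    def __init__(self, load_row: Callable[[type, int], Any]):
        self._load_row = load_row
        self._identity_map: Dict[Tuple[type, int], Any] = {}

    def get(self, cls: type, pk: int) -> Any:
        key = (cls, pk)
        if key not in self._identity_map:
            # Only hit the "database" the first time this row is requested.
            self._identity_map[key] = self._load_row(cls, pk)
        return self._identity_map[key]


class User:
    def __init__(self, pk: int):
        self.pk = pk


def fake_load(cls: type, pk: int) -> Any:
    """Hypothetical loader standing in for a SELECT by primary key."""
    return cls(pk)
```

Two lookups inside one unit of work yield the same object, while a second unit of work (another request) gets its own copy. This mirrors how `Session.get()` behaves within a single SQLAlchemy session in recent versions.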

So this all depends on how you set up your SQLAlchemy connection. Normally, you arrange for each WSGI request to have its own thread-local session. This session knows about everything that happens within it: items added, changed, etc. However, each thread is not aware of the others. In this way, the loading and preconfiguring of the models and mappings is shared at startup time, but each request can operate independently of the others.
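SQLAlchemy's `scoped_session` provides this by default through a thread-local registry. The underlying idea can be sketched with `threading.local`. Here `FakeSession` is a hypothetical stand-in for a real session object:

```python
import threading


class FakeSession:
    """Hypothetical stand-in for a per-request ORM session."""


_local = threading.local()


def get_session() -> FakeSession:
    """Return this thread's session, creating it on first use."""
    if not hasattr(_local, "session"):
        _local.session = FakeSession()
    return _local.session


def remove_session() -> None:
    """Discard this thread's session, e.g. at the end of a request."""
    if hasattr(_local, "session"):
        del _local.session
```

With SQLAlchemy itself, this corresponds to `Session = scoped_session(sessionmaker(bind=engine))`, with `Session.remove()` called at the end of each request. The engine and mappings are set up once per process and shared, while sessions are not.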

Using sessions in Django

I'm using sessions in Django to store logged-in user information as well as some other information. I've been reading through the Django session documentation and still have a few questions.
From the Django website:
By default, Django stores sessions in your database (using the model django.contrib.sessions.models.Session). Though this is convenient, in some setups it’s faster to store session data elsewhere, so Django can be configured to store session data on your filesystem or in your cache.
Also:
For persistent, cached data, set SESSION_ENGINE to django.contrib.sessions.backends.cached_db. This uses a write-through cache – every write to the cache will also be written to the database. Session reads only use the database if the data is not already in the cache.
Is there a good rule of thumb for which one to use? cached_db seems like it would always be a better choice because, in the best case, the data is in the cache, and in the worst case, it's in the database where it would be anyway. The one downside is that I have to set up memcached.
By default, SESSION_EXPIRE_AT_BROWSER_CLOSE is set to False, which means session cookies will be stored in users' browsers for as long as SESSION_COOKIE_AGE. Use this if you don't want people to have to log in every time they open a browser.
Is it possible to have both, the session expire at the browser close AND give an age?
If value is an integer, the session will expire after that many seconds of inactivity. For example, calling request.session.set_expiry(300) would make the session expire in 5 minutes.
What is considered "inactivity"?
If you're using the database backend, note that session data can accumulate in the django_session database table and Django does not provide automatic purging. Therefore, it's your job to purge expired sessions on a regular basis.
So that means that even if a session has expired, its record is still in my database. Where exactly would one put code to "purge the db"? I feel like you would need a separate thread to just go through the db every once in a while (every hour?) and delete any expired sessions.

Is there a good rule of thumb for which one to use?
No.
Cached_db seems like it would always be a better choice ...
That's fine.
In some cases, there are many Django (and Apache) processes querying a common database. mod_wsgi allows a lot of scalability this way. The cache doesn't help much because the sessions are distributed randomly among the Apache (and Django) processes.
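For reference, enabling the write-through backend takes two settings: the session engine, and a cache for it to write through to. The config fragment below assumes a memcached server on the local machine at the default port 11211 and the `pymemcache` client. The `PyMemcacheCache` backend requires Django 3.2 or later:

```python
# settings.py (fragment)
SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    }
}
```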
Is it possible to have both, the session expire at the browser close AND give an age?
Don't see why not.
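One way to get both behaviours, sketched as a settings fragment for the database-backed engines. With `SESSION_EXPIRE_AT_BROWSER_CLOSE = True`, the cookie has no expiry and disappears when the browser closes. The expiry date stored with the session on the server is still computed from `SESSION_COOKIE_AGE`, counted from when the session was last saved, so whichever comes first ends the session. The one-hour value is only an example:

```python
# settings.py (fragment)
SESSION_EXPIRE_AT_BROWSER_CLOSE = True  # cookie is dropped when the browser closes
SESSION_COOKIE_AGE = 60 * 60            # server-side expiry: last save + 1 hour
```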
What is considered "inactivity"?
I assume you're kidding. "Activity" is, well, activity. You know. Stuff happening in Django. A GET or POST request that Django can see. What else could it be?
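Put in code terms, `set_expiry(300)` stores an age, and the expiry date is recomputed from the moment the session is saved. By default, Django only saves the session on requests that modify it. Setting `SESSION_SAVE_EVERY_REQUEST = True` makes every request reset the clock. Below is a framework-free sketch of that sliding-window rule. The function names are illustrative, not Django's:

```python
from datetime import datetime, timedelta


def expires_at(last_saved: datetime, max_inactive_seconds: int) -> datetime:
    """The expiry moves forward every time the session is saved."""
    return last_saved + timedelta(seconds=max_inactive_seconds)


def is_expired(last_saved: datetime, now: datetime, max_inactive_seconds: int = 300) -> bool:
    """A session is valid only while its expiry date is still in the future."""
    return now >= expires_at(last_saved, max_inactive_seconds)
```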
Where exactly would one put code to "purge the db"?
Put it in crontab or something similar.
I feel like you would need a separate thread to just go through the db every once in a while (every hour?)
Forget threads (please). It's a separate process. Once a day is fine. How many sessions do you think you'll have?
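Django ships a `clearsessions` management command that deletes expired sessions for backends that support it, so the cron job can simply run `python manage.py clearsessions`, e.g. once a day. For the database backend, this amounts to deleting rows whose `expire_date` is in the past. The sketch below runs that query against an in-memory `sqlite3` copy of the `django_session` table. It uses a simplified schema with dates stored as UTC text, which is an assumption made for the example:

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional


def to_utc_text(dt: datetime) -> str:
    """Fixed-width UTC timestamps sort chronologically as plain strings."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def purge_expired(conn: sqlite3.Connection, now: Optional[datetime] = None) -> int:
    """Delete sessions whose expire_date is in the past; return the row count."""
    now = now or datetime.now(timezone.utc)
    cur = conn.execute(
        "DELETE FROM django_session WHERE expire_date < ?", (to_utc_text(now),)
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE django_session "
        "(session_key TEXT PRIMARY KEY, session_data TEXT, expire_date TEXT)"
    )
    now = datetime.now(timezone.utc)
    conn.executemany(
        "INSERT INTO django_session VALUES (?, ?, ?)",
        [
            ("expired", "", to_utc_text(now - timedelta(days=1))),
            ("active", "", to_utc_text(now + timedelta(days=1))),
        ],
    )
    print(purge_expired(conn, now), "expired session(s) removed")
```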
