I want to create project-wide accessible storage for project/application settings.
What I want to achieve:
- Each app has its own app-specific settings stored in the db.
- When you spawn the Django WSGI process, all settings are loaded into an in-memory storage and are available project-wide.
- Whenever you change any setting value in the db, there is a call to regenerate the storage from the db.
So it works very much like a cache, but I can't use the cache mechanism because it serializes data. I could also use memcache for that purpose, but I want to develop a generic solution (you don't always have access to memcache).
If anyone has any ideas for solving my problem, I would be really grateful if you shared them.
Before I give any specific advice, you need to be aware of the limitations of these systems.
ISSUES
Architectural differences between Django and PHP/other popular languages.
PHP re-reads and re-evaluates the entire code tree every time a page is accessed. This allows PHP to re-read settings from DB/cache/whatever on every request.
Initialisation
Many Django apps initialise their own internal cache of settings and parameters, and perform validation on these parameters at module import time (server start) rather than on every request. This means you would need a server restart anyway when modifying any settings for apps that aren't db-settings enabled, so why not just change settings.py and restart the server?
Multi-site/Multi-instance and in-memory settings
In-memory changes are strictly discouraged by the Django docs because they will only affect a single server instance. In the case of multiple sites (contrib.sites), only a single site will receive updates to shared settings. In the case of instanced servers (ep.io/gondor), any changes will only be made to a local instance, not to every instance running your site. Tread carefully.
PERFORMANCE
In Memory Changes
Changing settings values while the server is running is strictly discouraged by the Django docs for the reasons outlined above. However, there is no performance hit with this option. USE ONLY WITHIN THE CONFINES OF SPECIFIC APPS MADE BY YOU AND ON A SINGLE SERVER/INSTANCE.
Cache (redis-cache/memcached)
This is the intermediate-speed option. Reasonably fast lookup of settings, which can be deserialised into complex Python structures easily - great for dict-configs. IMPORTANT: values are shared among Sites/Instances safely and are updated atomically.
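For example, a small read-through helper over Django's cache framework might look like this (the cache key and the loader function are illustrative names, not anything your project already has):

    # Read-through lookup of app settings via Django's cache framework.
    from django.core.cache import cache

    SETTINGS_CACHE_KEY = "project:settings"   # hypothetical key name

    def get_app_settings():
        settings_dict = cache.get(SETTINGS_CACHE_KEY)
        if settings_dict is None:
            settings_dict = load_settings_from_db()  # one DB hit, see the DB section below
            cache.set(SETTINGS_CACHE_KEY, settings_dict, timeout=None)  # timeout=None caches forever
        return settings_dict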
DB (SLOOOOW)
Grabbing one setting at a time from the DB will be very, very slow unless you hack in connection pooling. Grabbing all settings at once is faster, but increases db transfer on every single request. Settings are synched between Sites/Instances safely. Use this only if 1 or 2 settings are all you need; then it is reasonable.
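As a sketch of the "all at once" variant, a single query can pull every row into a dict (the Setting model here is the hypothetical one sketched under CONCLUSION):

    # One query for all settings instead of one query per key.
    def load_settings_from_db():
        return dict(Setting.objects.values_list("key", "value"))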
CONCLUSION
Storing configuration values in a database/cache/memory can be done, but you have to be aware of the caveats and the potential performance hit. Creating a generic replacement for settings.py will not work; however, creating a support app that stores settings for other apps could be a viable solution, as long as the other apps accept that settings must be reloaded on every request.
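As a rough sketch of such a support app (all names are illustrative), a Setting model plus a signal handler gives you the "regenerate storage from db on change" behaviour the question asks for:

    # settings_store/models.py (hypothetical app): DB-backed settings with
    # cache regeneration whenever a Setting row is saved or deleted.
    from django.core.cache import cache
    from django.db import models
    from django.db.models.signals import post_delete, post_save
    from django.dispatch import receiver

    SETTINGS_CACHE_KEY = "project:settings"

    class Setting(models.Model):
        key = models.CharField(max_length=100, unique=True)
        value = models.TextField()

    @receiver([post_save, post_delete], sender=Setting)
    def regenerate_settings_cache(sender, **kwargs):
        cache.set(
            SETTINGS_CACHE_KEY,
            dict(Setting.objects.values_list("key", "value")),
            timeout=None,
        )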
Related
I have a single Django instance about to hit its limits in terms of throughput. I'd like to make a second instance and start scaling horizontally.
I understand that when dealing with database read replicas there is some minimal Django configuration necessary, but in the case of only using a single database: is there anything I need to do, or anything I should be careful of, when adding a second instance?
For the record, I use render.com (it's similar to Heroku), and their scaling solution just gives us a slider and will automatically move the instance count up or down. Is there any sort of configuration I need to do with Django + gunicorn + uvicorn? It will automatically sit behind their load balancer as well.
For reference my stack is:
Django + DRF
Postgres
Redis for cache and broker
Django-q for async
Cloudflare
You can enable autoscaling on Render and it will automatically scale your instances up (and down) based on your application's average CPU and/or memory utilization across all instances. You do not need to change your Django app.
I want additional attributes of the request to be logged in the access log of the aiohttp server.
For example, I have middleware that adds a user attribute to each request, and I want to store this value inside the extra attribute of access log records. The documentation suggests overriding aiohttp.helpers.AccessLogger, which indeed seems to be a good start, but what do I do next, and where do I put an instance of my custom logger? I looked through the code and it looks like this is not possible at the application creation stage, but rather at application run. However, I'm running the application in different ways, so it's not that convenient to modify the startup in several places (for example, locally I'm using aiohttp-devtools runserver, and gunicorn for deployment).
So what would the correct approach be here?
(I'd also like to do the same for the error log, but that seems even more complex, so for now I'm just using another middleware that catches errors and creates the log records I need.)
Taking into account that a development logging config is usually very different from a production one, keeping two different approaches for gunicorn and aiohttp-devtools is totally fine.
For the dev server you probably want to log everything to the console, while staging and production write logs differently.
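For example, on a recent aiohttp (3.x) something like the following could work; AbstractAccessLogger supersedes the helpers.AccessLogger the question mentions, and the "user" key is just an assumption about what your middleware stores on the request:

    # Custom access logger that pulls a value your middleware stored on the request.
    from aiohttp import web
    from aiohttp.abc import AbstractAccessLogger

    class UserAccessLogger(AbstractAccessLogger):
        def log(self, request, response, time):
            user = request.get("user", "-")   # assumes middleware did request["user"] = ...
            self.logger.info(
                "%s %s -> %s in %.3fs",
                request.method, request.path, response.status, time,
                extra={"user": user},          # available to your logging formatters/filters
            )

    app = web.Application()
    web.run_app(app, access_log_class=UserAccessLogger)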
The identity map and unit of work patterns are part of the reason SQLAlchemy is much more attractive than django.db. However, I am not sure how the identity map would work, or whether it works at all, when an application is configured as WSGI and the ORM is accessed directly through API calls instead of through a shared service. I would imagine that Apache would create a new thread with its own Python instance for each request. Each instance would therefore have its own instance of the SQLAlchemy classes and not be able to make use of the identity map. Is this correct?
I think you misunderstood the identity map pattern.
From http://martinfowler.com/eaaCatalog/identityMap.html:
An Identity Map keeps a record of all objects that have been read from the database in a single business transaction.
Records are kept in the identity map for a single business transaction. This means that no matter how your web server is configured, you probably will not hold them for longer than a request (or store them in the session).
Normally, you will not have many users taking part in a single business transaction. Anyway, you probably don't want your users to share objects, as they might end up doing things that are contradictory.
So this all depends on how you set up your SQLAlchemy connection. Normally what you do is arrange for each WSGI request to have its own thread-local session. This session will know about everything going on within it, items added/changed/etc. However, each thread is not aware of the others. In this way the loading/preconfiguring of the models and mappings is shared at startup time, but each request can operate independently of the others.
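A bare sketch of that wiring with SQLAlchemy's scoped_session (the database URL and the middleware class are illustrative, not something your stack already defines):

    # Thread-local session per request: each request gets its own identity map /
    # unit of work, torn down when the request finishes.
    from sqlalchemy import create_engine
    from sqlalchemy.orm import scoped_session, sessionmaker

    engine = create_engine("sqlite:///example.db")        # placeholder database URL
    Session = scoped_session(sessionmaker(bind=engine))

    class SessionPerRequestMiddleware:
        """WSGI middleware that discards the request's session afterwards."""
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            try:
                return self.app(environ, start_response)
            finally:
                # Drops the thread-local session and its identity map.
                # (For streaming responses you would defer this until the
                # response iterable is closed.)
                Session.remove()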
I'm using sessions in Django to store logged-in user information as well as some other information. I've been reading through the Django session documentation and still have a few questions.
From the Django website:
By default, Django stores sessions in your database (using the model django.contrib.sessions.models.Session). Though this is convenient, in some setups it’s faster to store session data elsewhere, so Django can be configured to store session data on your filesystem or in your cache.
Also:
For persistent, cached data, set SESSION_ENGINE to django.contrib.sessions.backends.cached_db. This uses a write-through cache – every write to the cache will also be written to the database. Session reads only use the database if the data is not already in the cache.
Is there a good rule of thumb for which one to use? cached_db seems like it would always be a better choice because, best case, the data is in the cache, and worst case it's in the database where it would be anyway. The one downside is that I have to set up memcached.
By default, SESSION_EXPIRE_AT_BROWSER_CLOSE is set to False, which means session cookies will be stored in users' browsers for as long as SESSION_COOKIE_AGE. Use this if you don't want people to have to log in every time they open a browser.
Is it possible to have both: the session expiring at browser close AND being given an age?
If value is an integer, the session will expire after that many seconds of inactivity. For example, calling request.session.set_expiry(300) would make the session expire in 5 minutes.
What is considered "inactivity"?
If you're using the database backend, note that session data can accumulate in the django_session database table and Django does not provide automatic purging. Therefore, it's your job to purge expired sessions on a regular basis.
So that means, even if a session is expired, there are still records in my database. Where exactly would one put code to "purge the db"? I feel like you would need a separate thread to just go through the db every once in a while (every hour?) and delete any expired sessions.
Is there a good rule of thumb for which one to use?
No.
Cached_db seems like it would always be a better choice ...
That's fine.
In some cases, there are many Django (and Apache) processes querying a common database. mod_wsgi allows a lot of scalability this way. The cache doesn't help much because the sessions are distributed randomly among the Apache (and Django) processes.
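If you do go with cached_db anyway, the wiring is small; a sketch (the backend class and address depend on your Django version and deployment, they are not prescribed here):

    # settings.py: write-through session cache backed by memcached
    CACHES = {
        "default": {
            "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
            "LOCATION": "127.0.0.1:11211",   # local memcached default; adjust for your setup
        }
    }
    SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"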
Is it possible to have both: the session expiring at browser close AND being given an age?
Don't see why not.
What is considered "inactivity"?
I assume you're kidding. "activity" is -- well -- activity. You know. Stuff happening in Django. A GET or POST request that Django can see. What else could it be?
Where exactly would one put code to "purge the db"?
Put it in crontab or something similar.
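For example, a crontab entry along these lines (paths are placeholders; newer Django ships a clearsessions management command for this, older releases call it cleanup):

    # Purge expired sessions once a night.
    0 3 * * * /path/to/venv/bin/python /path/to/project/manage.py clearsessions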
I feel like you would need a separate thread to just go through the db every once in a while (every hour?)
Forget threads (please). It's a separate process. Once a day is fine. How many sessions do you think you'll have?
I'm building a website that doesn't require a database because a REST API "is the database". (Except you don't want to be putting site-specific things in there, since the API is used mostly by mobile clients.)
However, there are a few things that would normally be put in a database, for example the "jobs" page. You have a master list view and detail views for each job, and it should be easy to add new job entries (not necessarily via a CMS, but that would be awesome).
e.g. example.com/careers/ and example.com/careers/77/
I could just hardcode this stuff in templates, but that's not DRY - you have to update the master template and the detail template every time.
What do you guys think? Maybe a YAML file? Or any better ideas?
Thx
Why not still keep it in a database? Your remote REST store is all well and funky, but if you've got local data, there's nothing (unless there's a spec saying otherwise) to stop you storing some stuff in a local db. It doesn't have to be anything very glamorous - it could be SQLite, or you could have some fun with Redis, etc.
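A sketch of how small that can be (the Job model and field names are made up for illustration):

    # settings.py: throwaway local database for the non-API content
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": "local.sqlite3",
        }
    }

    # careers/models.py: one row per job posting, driving both list and detail views
    from django.db import models

    class Job(models.Model):
        title = models.CharField(max_length=200)
        description = models.TextField()

        def __str__(self):
            return self.title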
You could use Memcachedb via the Django cache interface.
For example:
Set the cache backend to memcached in your Django settings, but install/use memcachedb instead.
Django can't tell the difference between the two because they provide the same interface (at least the last time I checked).
Memcachedb is persistent, safe for multithreaded Django servers, and won't lose data during server restarts, but it's just a key-value store, not a complete database.
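Reads and writes then go through Django's normal cache API; the keys and values below are made up:

    from django.core.cache import cache

    # timeout=None asks the backend to keep the entry indefinitely
    cache.set("careers:77", {"title": "Backend Developer", "location": "Remote"}, timeout=None)
    job = cache.get("careers:77")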
Some alternatives built into the Python standard library are listed in the Data Persistence chapter of the documentation. Still another option is JSON.