How can I use different databases for different models - python

I have a model called Requests which I want to save in a different database than the default Django database.
The reason for this is that the table is going to record every request for analytics, so it is going to get populated very heavily. As I am taking database backups hourly, I don't want to increase the db size just for that table.
So I was thinking of putting it in a separate DB so that I don't back it up as often.
The docs describe it like this:
https://docs.djangoproject.com/en/dev/topics/db/multi-db/
def db_for_read(self, model, **hints):
    """
    Reads go to a randomly-chosen slave.
    """
    return random.choice(['slave1', 'slave2'])

def db_for_write(self, model, **hints):
    """
    Writes always go to master.
    """
    return 'master'
Now I am not sure how I can check that if my model is Requests then database A is chosen, and database B otherwise.

Models are just classes, so check whether you have the right class. This example should work for you:
from analytics.models import Requests

def db_for_read(self, model, **hints):
    """
    Reads for Requests go to database_A; everything else goes to database_B.
    """
    if model is Requests:
        return 'database_A'
    else:
        return 'database_B'

def db_for_write(self, model, **hints):
    """
    Writes for Requests go to database_A; everything else goes to database_B.
    """
    if model is Requests:
        return 'database_A'
    else:
        return 'database_B'
If you wish, though, you can also use one of several other techniques, such as checking model.__name__ or looking at model._meta.
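For example, here is a hedged sketch of a router class that keys off model._meta.app_label instead of the model class itself (it assumes the Requests model lives in an app labelled analytics):

class AnalyticsRouter(object):
    def db_for_read(self, model, **hints):
        # route the whole analytics app, not just one model
        if model._meta.app_label == 'analytics':
            return 'database_A'
        return 'database_B'

    def db_for_write(self, model, **hints):
        if model._meta.app_label == 'analytics':
            return 'database_A'
        return 'database_B'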
One note, though: the requests should not have foreign keys connecting them to models in other databases. But you probably already know that.
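For completeness (this is not part of the original answer), the router class also has to be registered in settings.py; a minimal sketch with assumed aliases, engines and module path:

DATABASES = {
    'default': {'ENGINE': 'django.db.backends.sqlite3', 'NAME': 'default.sqlite3'},
    'database_A': {'ENGINE': 'django.db.backends.sqlite3', 'NAME': 'analytics.sqlite3'},        # assumed alias for the Requests table
    'database_B': {'ENGINE': 'django.db.backends.sqlite3', 'NAME': 'everything_else.sqlite3'},  # assumed alias for everything else
}

DATABASE_ROUTERS = ['myproject.routers.AnalyticsRouter']  # hypothetical dotted path to the router class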

Related

Indexing SQLAlchemy models on ElasticSearch

I'm trying to use SQLAlchemy signals to keep an Elasticsearch index updated to reflect certain models in my DB.
The problem I'm having is that to build the ES documents I access some of the models' relationships, which means there are several points in the signal cycle where the relationships cannot be accessed. I was creating the ES document from the model instance in the listeners for after_(insert|update|delete), but those are issued while the connection is in a flushing state, and it seems like relationships return None at that point, so the ES document cannot be built.
I changed it so I just save a queue of the commands to be emitted, keeping a reference to the model instead, and the plan was to issue them on commit, but the before_commit signal is issued before flushing, and when after_commit is issued the DB cannot be accessed anymore, so again the relationships are inaccessible.
It seems to me like the right moment to create the documents and save them is in the after_flush_postexec signal, but I have a feeling that between that and when the commit is actually issued there might be a rollback, and then the ES index would no longer reflect the DB.
I'm not sure what the best way to do this is.
Here's the code I'm working with right now. ChargesIndexer.upsert takes a SQLAlchemy ORM model and uses the Elasticsearch API to insert/update a document built from it; .delete does the same except, of course, it deletes the document, and it only depends on the model's id.
from sqlalchemy import event


class ChargeListener(object):
    ops = []

    @event.listens_for(Charge, 'after_delete', propagate=True)
    def after_delete(mapper, connection, target):
        ChargeListener.add_delete(target)

    @event.listens_for(Charge, 'after_insert', propagate=True)
    def after_insert(mapper, connection, target):
        ChargeListener.add_insert(target)

    @event.listens_for(Charge, 'after_update', propagate=True)
    def after_update(mapper, connection, target):
        ChargeListener.add_insert(target)

    @classmethod
    def execute_ops(cls):
        charges_indexer = ChargesIndexer()
        for op, charge in ChargeListener.ops:
            if op == 'insert':
                charges_indexer.upsert(charge)
            elif op == 'delete':
                charges_indexer.delete(charge)
        ChargeListener.reset()

    @event.listens_for(sess, 'after_flush_postexec')
    def after_flush_postexec(session, flush_context):
        ChargeListener.execute_ops()

    @event.listens_for(sess, 'after_soft_rollback')
    def after_soft_rollback(session, previous_transaction):
        ChargeListener.reset()

    @classmethod
    def add_insert(cls, charge):
        cls.ops.append(('insert', charge))

    @classmethod
    def add_delete(cls, charge):
        cls.ops.append(('delete', charge))

    @classmethod
    def reset(cls):
        cls.ops = []
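One possible variation, not from the original post and only a hedged sketch: keep queuing instances in the mapper events, turn them into documents in after_flush_postexec (when, as noted above, relationships are loadable), but only push them to Elasticsearch in after_commit, clearing everything on rollback. build_es_doc and the two indexer methods used here are assumptions, not the real ChargesIndexer API.

from sqlalchemy import event

pending_models = []   # ('insert'|'delete', Charge instance), collected during the flush
pending_docs = []     # ('upsert', document) / ('delete', id), ready to ship after commit

@event.listens_for(Charge, 'after_insert', propagate=True)
@event.listens_for(Charge, 'after_update', propagate=True)
def queue_upsert(mapper, connection, target):
    pending_models.append(('insert', target))

@event.listens_for(Charge, 'after_delete', propagate=True)
def queue_delete(mapper, connection, target):
    pending_models.append(('delete', target))

@event.listens_for(sess, 'after_flush_postexec')
def build_docs(session, flush_context):
    # per the question, relationships are loadable again at this point,
    # so turn the queued instances into plain documents now
    for op, charge in pending_models:
        if op == 'insert':
            pending_docs.append(('upsert', build_es_doc(charge)))  # build_es_doc: hypothetical serializer
        else:
            pending_docs.append(('delete', charge.id))
    del pending_models[:]

@event.listens_for(sess, 'after_commit')
def push_docs(session):
    # only talk to Elasticsearch once the transaction has really committed,
    # so a late rollback can no longer leave the index out of sync
    indexer = ChargesIndexer()
    for op, payload in pending_docs:
        if op == 'upsert':
            indexer.upsert_document(payload)  # assumed variant of .upsert taking a pre-built document
        else:
            indexer.delete_id(payload)        # assumed variant of .delete taking just an id
    del pending_docs[:]

@event.listens_for(sess, 'after_rollback')
def drop_queues(session):
    del pending_models[:]
    del pending_docs[:]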

Using RedShift as an additional Django Database

I have two databases defined: default, which is a regular MySQL backend, and redshift (using a postgres backend). I would like to use RedShift as a read-only database that is just used for django-sql-explorer.
Here is the router I have created in my_project/common/routers.py:
class CustomRouter(object):
    def db_for_read(self, model, **hints):
        return 'default'

    def db_for_write(self, model, **hints):
        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        db_list = ('default', )
        if obj1._state.db in db_list and obj2._state.db in db_list:
            return True
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return db == 'default'
And my settings.py references it like so:
DATABASE_ROUTERS = ['my_project.common.routers.CustomRouter', ]
The problem occurs when invoking makemigrations: Django throws an error indicating that it is trying to create django_* tables in RedShift (and obviously failing, because the postgres type serial is not supported by RedShift):
...
raise MigrationSchemaMissing("Unable to create the django_migrations table (%s)" % exc)
django.db.migrations.exceptions.MigrationSchemaMissing: Unable to create the django_migrations table (Column "django_migrations.id" has unsupported type "serial".)
So my question is two-fold:
Is it possible to completely disable Django Management for a database, but still use the ORM?
Barring Read-Only Replicas, why has Django not considered it an acceptable use case to support read-only databases?
Related Questions
- Column 'django_migrations.id' has unsupported type 'serial' (with Amazon Redshift)
I just discovered that this is the result of a bug. It's been addressed in a few PRs, most notably: https://github.com/django/django/pull/7194
So, to answer my own questions:
No, it's not currently possible. The best solution is to use a custom database router in combination with a read-only DB account, and have allow_migrate() return False in the router (a minimal sketch follows below).
The best solution is to upgrade to Django >= 1.10.4 and not use a custom database router, which avoids the bug. However, there is a caveat if you have any other databases defined, such as a read replica.
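A minimal sketch of the router approach from point 1 (the alias name redshift and the class name are assumptions):

class RedshiftRouter(object):
    """Keep the ORM usable against the 'redshift' alias (via .using('redshift')),
    but never write to it and never migrate it."""

    def db_for_read(self, model, **hints):
        return None  # no opinion: reads fall through to 'default' unless .using('redshift') is given

    def db_for_write(self, model, **hints):
        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # never create django_* (or any other) tables on Redshift
        return db != 'redshift'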

Creating an object with more than 1 foreign key in a Django project with multiple databases causing error

In my Django project, I have 2 databases. Everything works perfectly except when I try to create a model instance with 2 foreign keys. The database router locks up and gives me Cannot assign "<FKObject: fk_object2>": the current database router prevents this relation., even though both foreign keys come from the same database and I have yet to save the object in question. Code below:
1 fk_object1 = FKObject.objects.using("database2").get(pk=1)
2 fk_object2 = FKObject.objects.using("database2").get(pk=2)
3
4 object = Object()
5 object.fk_object1 = fk_object1
6 object.fk_object2 = fk_object2
7 object.save(using="database2")
The problem arises on line 6, before the object is even saved into the database, so I'm assuming that Django somehow ties Object() to database1 even though it hasn't been specified yet.
Does anyone know how to deal with this?
So I ended up finding a workaround, as follows:
As it turns out, my suspicions were only partially true. Calling Model() does not cause Django to assume it should use the default database, but setting a foreign key does. That explains why my code errored out at line 6 and not at line 5: by that point, Django already assumes that you're using the default database, and since fk_object2 was fetched from database2, it errors out for fear of creating an inter-database relation.
To get around this, I used threading.current_thread(), like so:
import threading
from threading import current_thread

from django.core.management.base import BaseCommand


class Command(BaseCommand):
    current_thread().db_name = "database2"

    def handle(self, *args, **kwargs):
        pass  # Do work here


class DatabaseRouter(object):
    db_thread = threading.current_thread()

    def db_for_read(self, model, **hints):
        try:
            print("Using {}".format(self.db_thread.db_name))
            return self.db_thread.db_name
        except AttributeError:
            return "default"

    def db_for_write(self, model, **hints):
        try:
            print("Using {}".format(self.db_thread.db_name))
            return self.db_thread.db_name
        except AttributeError:
            return "default"
This way, my 2nd database is used every time, thereby avoiding any possible relation inconsistencies.
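A hedged variation on the same idea (the helper name is an assumption, not part of the original answer): wrapping the thread-local pinning in a context manager so the attribute is always cleaned up afterwards:

from contextlib import contextmanager
from threading import current_thread


@contextmanager
def pinned_db(db_name):
    # temporarily pin the router above to db_name for the current thread
    current_thread().db_name = db_name
    try:
        yield
    finally:
        del current_thread().db_name


# usage: everything inside the block is routed to database2
with pinned_db("database2"):
    fk_object1 = FKObject.objects.get(pk=1)
    fk_object2 = FKObject.objects.get(pk=2)
    obj = Object()
    obj.fk_object1 = fk_object1
    obj.fk_object2 = fk_object2
    obj.save()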

Django Db routing

I am trying to run my Django application with two dbs (1 master, 1 read replica). My problem is that if I try to read right after a write, the code explodes. For example:
p = Product.objects.create()
Product.objects.get(id=p.id)
OR if the user is redirected to the Product's details page.
The code runs faster than the read replica can catch up, so if the read operation uses the replica, the code crashes because the data hasn't been replicated in time.
Is there any way to avoid this? For example, having the db to read from chosen per request instead of per operation?
My router is identical to the one in Django's documentation:
import random

class PrimaryReplicaRouter(object):
    def db_for_read(self, model, **hints):
        """
        Reads go to a randomly-chosen replica.
        """
        return random.choice(['replica1', 'replica2'])

    def db_for_write(self, model, **hints):
        """
        Writes always go to primary.
        """
        return 'primary'

    def allow_relation(self, obj1, obj2, **hints):
        """
        Relations between objects are allowed if both objects are
        in the primary/replica pool.
        """
        db_list = ('primary', 'replica1', 'replica2')
        if obj1._state.db in db_list and obj2._state.db in db_list:
            return True
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        """
        All non-auth models end up in this pool.
        """
        return True
Solved it with:
class Model(models.Model):
    objects = models.Manager()    # objects only accesses the primary
    sobjects = ReplicasManager()  # sobjects accesses either the primary or the replicas

    class Meta:
        abstract = True           # so Django doesn't create a table for this model

Make every model extend this one instead of models.Model, and then use objects or sobjects depending on whether I want to access only the primary or either the primary or the replicas (a sketch of such a manager follows below).
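A hedged sketch of what that ReplicasManager might look like, using the alias names from the router above (this is an assumption, not the poster's actual code):

import random

from django.db import models


class ReplicasManager(models.Manager):
    def get_queryset(self):
        # spread reads across the primary and both replicas
        db = random.choice(['primary', 'replica1', 'replica2'])
        return super(ReplicasManager, self).get_queryset().using(db)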
Depending on the size of the data and the application I'd tackle this with either of the following methods:
Database pinning:
Extend your database router to allow pinning functions to specific databases. For example:
from customrouter.pinning import use_master

@use_master
def save_and_fetch_foo():
    ...
A good example of that can be seen in django-multidb-router.
Of course you could just use this package as well.
Use a model manager to route queries to specific databases.
class MyManager(models.Manager):
    def get_queryset(self):
        qs = CustomQuerySet(self.model)
        if self._db is not None:
            qs = qs.using(self._db)
        return qs
Write a middleware that routes your requests to the primary or a replica automatically.
It's basically the same as the pinning method, except that you don't have to specify per view which requests should hit the primary.
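A hedged sketch of that middleware idea (new-style Django middleware and the alias names from the question's router are assumed). Note that this pins reads within a single request; pinning across the redirect that follows a write needs a cookie-based approach like django-multidb-router's:

import random
import threading

_local = threading.local()


class PinToPrimaryMiddleware(object):
    """Pin anything that isn't a plain read request to the primary."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        _local.use_primary = request.method not in ('GET', 'HEAD')
        try:
            return self.get_response(request)
        finally:
            _local.use_primary = False


class PinnedPrimaryReplicaRouter(object):
    def db_for_read(self, model, **hints):
        if getattr(_local, 'use_primary', False):
            return 'primary'
        return random.choice(['replica1', 'replica2'])

    def db_for_write(self, model, **hints):
        return 'primary'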
In a primary/replica configuration, new data takes a few milliseconds to replicate to the other replica servers/databases,
so whenever you try to read right after a write you won't get the correct result.
Instead of reading from a replica, you can read from the primary immediately after a write by adding .using('primary') to your get query.
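For example, with the models and router from the question:

p = Product.objects.create()                        # routed to 'primary' by db_for_write
p = Product.objects.using('primary').get(id=p.id)   # read from the primary, bypassing the replicas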

What is best practice to verify the user in google app engine (gae) for every request? Or how to avoid DB access?

I am working on a google app engine (gae) project in python which has the following structure:
class LoginHandler(webapp2.RequestHandler):
    def get(self):
        ...  # check User -> DB access
    def post(self):
        ...  # check User -> DB access

class SignupHandler(webapp2.RequestHandler):
    def get(self):
        ...  # check User -> DB access
    def post(self):
        ...  # check User -> DB access

class Site1Handler(webapp2.RequestHandler):
    def get(self):
        ...  # check User -> DB access
    def post(self):
        ...  # check User -> DB access

class Site2Handler(webapp2.RequestHandler):
    def get(self):
        ...  # check User -> DB access
    def post(self):
        ...  # check User -> DB access

class ...

application = webapp2.WSGIApplication([('/login', LoginHandler),
                                       ('/signup', SignupHandler),
                                       ('/site1', Site1Handler),
                                       ('/site2', Site2Handler),
                                       ...,
                                       ],
                                      debug=True)
Every user who wants to use this application has to be logged in.
Therefore, on the login site and the signup site a cookie value with a user_id is set.
So let's imagine this app has 100 URLs and the corresponding 100 Site...Handlers() implemented.
Then for every get()/post() call I first get the user_id from the cookie and check in the database whether this user exists and is valid.
So if the user clicks through 20 sites, the app accesses the db 20 times just to validate the user.
I am sure there is a better way, and I would be glad if someone could show me how to do this.
I have already seen someone inherit his own handler from webapp2.RequestHandler,
which would then look like this:
class MyHandler(webapp2.RequestHandler):
    def initialize(self, *a, **kw):
        webapp2.RequestHandler.initialize(self, *a, **kw)
        uid = self.request.cookies.get('user_id')
        self.user = uid and User.all().filter('userid =', uid).get()

class LoginHandler(MyHandler):
    def get(self):
        ...  # if self.user is valid -> OK
    def post(self):
        ...  # if self.user is valid -> OK
...
And this is where it gets confusing for me.
Consider two or more people accessing the application concurrently. Will User1 then see data belonging to User2 because self.user was initialized with data from User2?
I also considered using a global variable to save the current user, but the same problem arises there if two users access the app concurrently.
I also found the webapp2.registry functionality, which seemed to me to be the same as a global dictionary, and here again there is the problem of two or more users accessing the app at the same time.
Could someone please show me how to do this right? I am very new to GAE and happy for every hint in the right direction.
(Maybe Memcached is the solution, but I am more interested in a review of this "check if the user is valid" pattern. So what would be the best practice for doing this?)
Assuming that you are using NDB and validating your user by getting a User object via a key/id, it will be automatically cached in memcache as well as in the current instance's local memory, so your route handlers won't be calling the Datastore with every single request; this is all done automatically, no extra coding required. If you are validating / getting the user object via a query instead, the result won't be automatically cached, but you can always cache it manually: verify the user against the cache first, and only query the Datastore if the cache entry doesn't exist, caching the result for the next request.
See more here.
If you are using webapp2's sessions with signed/secure cookies, then the data in those cookies, including the fact that the user is validated (which you previously set when validating the user for the first time), can be trusted, as long as you use a long and randomly generated secret_key that is kept secret. Thus, just like with the cache, you first check whether the user is marked as validated in the cookie and, if not, you ask the Datastore and save the result in the session cookie for the next request. See more here.
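A hedged sketch of that session-based check, using the standard webapp2_extras.sessions dispatch pattern (the secret key value and handler names are placeholders):

import webapp2
from webapp2_extras import sessions

# pass this to webapp2.WSGIApplication(..., config=config)
config = {'webapp2_extras.sessions': {'secret_key': 'replace-with-a-long-random-secret'}}


class SessionHandler(webapp2.RequestHandler):
    def dispatch(self):
        # load the session before the handler runs and save it afterwards
        self.session_store = sessions.get_store(request=self.request)
        try:
            super(SessionHandler, self).dispatch()
        finally:
            self.session_store.save_sessions(self.response)

    @webapp2.cached_property
    def session(self):
        return self.session_store.get_session()


class Site1Handler(SessionHandler):
    def get(self):
        if not self.session.get('user_id'):
            return self.redirect('/login')
        ...  # the user was validated earlier in the session; no Datastore hit here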
Either way, you don't have to repeat your validation code in every single handler as you are doing in your example. One way of fixing it would be to use decorators, which would make reusing your validation as simple as placing @login_required before your get method. See more info here and take a look at the webapp2_extras.appengine.users file to get an idea of how to write your own, similar decorator.
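For instance, here is a minimal sketch of such a decorator built on the question's own cookie/User lookup (only an illustration, not the webapp2_extras implementation):

def login_required(handler_method):
    def wrapper(self, *args, **kwargs):
        # same cookie/User lookup as in the question's MyHandler
        uid = self.request.cookies.get('user_id')
        self.user = uid and User.all().filter('userid =', uid).get()
        if not self.user:
            return self.redirect('/login')
        return handler_method(self, *args, **kwargs)
    return wrapper


class Site1Handler(webapp2.RequestHandler):
    @login_required
    def get(self):
        ...  # self.user is set and valid here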
