I am trying to run my Django application with two databases (one primary, one read replica). My problem is that if I read right after a write, the read fails. For example:
p = Product.objects.create()
Product.objects.get(id=p.id)
or when the user is redirected to the product's detail page right after the create.
The code runs faster than the replication lag, so if the read goes to a replica it fails, because the new row hasn't been replicated yet.
Is there any way to avoid this? For example, could the database used for reads be chosen per request instead of per operation?
My router is identical to the one in Django's documentation:
import random

class PrimaryReplicaRouter(object):
    def db_for_read(self, model, **hints):
        """
        Reads go to a randomly-chosen replica.
        """
        return random.choice(['replica1', 'replica2'])

    def db_for_write(self, model, **hints):
        """
        Writes always go to primary.
        """
        return 'primary'

    def allow_relation(self, obj1, obj2, **hints):
        """
        Relations between objects are allowed if both objects are
        in the primary/replica pool.
        """
        db_list = ('primary', 'replica1', 'replica2')
        if obj1._state.db in db_list and obj2._state.db in db_list:
            return True
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        """
        All non-auth models end up in this pool.
        """
        return True
I solved it with:
class Model(models.Model):
    objects = models.Manager()      # objects only accesses the primary
    sobjects = ReplicasManager()    # sobjects accesses either the primary or the replicas

    class Meta:
        abstract = True             # so Django doesn't create a table for it
Make every model extend this one instead of models.Model, then use objects or sobjects depending on whether I want to hit only the primary or allow the replicas as well.
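ReplicasManager isn't a Django class; a minimal sketch of what it could look like, assuming the primary/replica1/replica2 aliases from the router above, might be:
import random
from django.db import models

class ReplicasManager(models.Manager):
    def get_queryset(self):
        # Spread reads across the primary and the replicas; the alias names
        # are assumptions and must match the DATABASES setting.
        db = random.choice(['primary', 'replica1', 'replica2'])
        return super(ReplicasManager, self).get_queryset().using(db)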
Depending on the size of the data and the application, I'd tackle this with one of the following methods:
Database pinning:
Extend your database router to allow pinning functions to specific databases. For example:
from customrouter.pinning import use_master

@use_master
def save_and_fetch_foo():
    ...
A good example of that can be seen in django-multidb-router.
Of course you could just use this package as well.
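If you'd rather roll this yourself instead of pulling in django-multidb-router, a minimal sketch of such a pinning helper (the customrouter.pinning module named above is an assumption, not an existing package) could keep a thread-local flag that the router consults:
# customrouter/pinning.py -- a minimal sketch, not django-multidb-router's implementation
import threading
from functools import wraps

_locals = threading.local()

def pin(value=True):
    _locals.pinned = value

def is_pinned():
    return getattr(_locals, 'pinned', False)

def use_master(func):
    # Pin every query made inside the decorated function to the primary.
    @wraps(func)
    def wrapper(*args, **kwargs):
        pin(True)
        try:
            return func(*args, **kwargs)
        finally:
            pin(False)
    return wrapper
The router's db_for_read would then return 'primary' whenever is_pinned() is True and pick a replica otherwise.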
Use a model manager to route queries to specific databases.
class MyManager(models.Manager):
    def get_queryset(self):
        qs = CustomQuerySet(self.model)
        if self._db is not None:
            qs = qs.using(self._db)
        return qs
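Attached to a model (here the Product model from the question, with a hypothetical name field), queries can then be sent to a specific connection through the standard db_manager() hook:
from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=100)   # hypothetical field

    objects = MyManager()   # the manager shown above

# Route this particular lookup to a replica; 'replica1' must match a DATABASES alias.
products = Product.objects.db_manager('replica1').filter(name__startswith='foo')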
Write a middleware that'd route your requests to master/slave automatically.
Basically the same as the pinning method, except you wouldn't have to specify explicitly when to run queries against the primary; for example, the middleware could pin every non-GET request to it.
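A rough sketch of such a middleware, building on the hypothetical pinning helpers sketched above, could pin every non-GET request to the primary:
# middleware.py -- sketch only, building on the hypothetical pinning helpers above
from customrouter.pinning import pin

class PrimaryPinningMiddleware(object):
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Writes normally arrive as POST/PUT/PATCH/DELETE, so pin those to the primary.
        pin(request.method not in ('GET', 'HEAD'))
        try:
            return self.get_response(request)
        finally:
            pin(False)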
In a primary/replica configuration, new data takes a few milliseconds to replicate to the other replica servers/databases, so a read issued immediately after a write won't return the correct result.
Instead of reading from a replica, you can read from the primary immediately after the write by adding using('primary') to your get query.
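For the example from the question, that would look something like this (assuming the primary's alias in DATABASES is 'primary'):
p = Product.objects.create()
# Read back from the primary explicitly, so replication lag doesn't matter
Product.objects.using('primary').get(id=p.id)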
I'm trying to create an API endpoint in my Django project to retrieve data for my frontend.
I'm using two databases in the project: the first one is a SQLite database, the second one is a MongoDB database, and the data I need to retrieve is in MongoDB.
Here is my model:
class tst(models.Model):
    _id = models.CharField(max_length=100)
    ticker = models.FloatField()

    def save(self, *args, **kwargs):  # ALL the signature
        super(tst, self).save(using='dbtwo')
Here is my view:
class tstList(generics.ListCreateAPIView):
    queryset = tst.objects.all()
    serializer_class = tstSerializer
And the url:
path('tst/', views.tstList.as_view()),
Everything looks alright here, but when I try to open the API from my browser, I keep getting the following error:
OperationalError at /tst/
no such table: main_tst
I think this happens because it tries to look for the table tst in the first SQLite database, instead of looking for it in the MongoDB one. Is there any way to solve this? I thought that adding using='dbtwo' would do it, but it isn't the right solution.
Any advice is appreciated!
You need to define the database that you are using in the queryset for your API view:
class tstList(generics.ListCreateAPIView):
    queryset = tst.objects.using('dbtwo').all()
    serializer_class = tstSerializer
Even better, if the model will only ever use the other database, you can set up a router so that you do not have to call using() every time:
class MyRouter:
    def db_for_read(self, model, **hints):
        if model == tst:
            return 'dbtwo'

    def db_for_write(self, model, **hints):
        if model == tst:
            return 'dbtwo'

# In your settings
DATABASE_ROUTERS = ['path.to.MyRouter']
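If you also run migrations against both connections, you may additionally want an allow_migrate method so the tst table is only created where it belongs; a sketch of such a method for MyRouter (the model-name check is an assumption, 'dbtwo' matches the question's alias):
# Additional method on MyRouter (sketch): keep tst out of the default SQLite database.
def allow_migrate(self, db, app_label, model_name=None, **hints):
    if model_name == 'tst':
        return db == 'dbtwo'
    return None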
My Flask app centers around modifying models based on SQLAlchemy. Hence, I find Flask-Admin a great plugin, because it maps my SQLAlchemy models to forms, with the views already defined and a customizable interface that is tried and tested.
I understand that Flask-Admin is intended to be a plugin for administrators managing their site's data. However, I don't see why I can't use it as a framework for my users to CRUD their data as well.
To do this, I have written the following:
class AuthorizationRequiredView(BaseView):
    def get_roles(self):
        raise NotImplementedError("Override AuthorizationRequiredView.get_roles not set.")
    def is_accessible(self):
        if not is_authenticated():
            return False
        if not current_user.has_role(*self.get_roles()):
            return False
        return True

    def inaccessible_callback(self, name, **kwargs):
        if not is_authenticated():
            return current_app.user_manager.unauthenticated_view_function()
        if not current_user.has_role(*self.get_roles()):
            return current_app.user_manager.unauthorized_view_function()
class InstructionModelView(DefaultModelView, AuthorizationRequiredView):
    def get_roles(self):
        return ["admin", "member"]

    def get_query(self):
        """Jails the user to only see their instructions."""
        base = super(InstructionModelView, self).get_query()
        if current_user.has_role('admin'):
            return base
        else:
            return base.filter(Instruction.user_id == current_user.id)
    @expose('/edit/', methods=('GET', 'POST'))
    def edit_view(self):
        if not current_user.has_role('admin'):
            instruction_id = request.args.get('id', None)
            if instruction_id:
                m = self.get_one(instruction_id)
                if m.user_id != current_user.id:
                    return current_app.user_manager.unauthorized_view_function()
        return super(InstructionModelView, self).edit_view()
    @expose('/delete/', methods=('POST',))
    def delete_view(self):
        return_url = get_redirect_target() or self.get_url('.index_view')
        if not self.can_delete:
            return redirect(return_url)
        form = self.delete_form()
        if self.validate_form(form):
            # id is InputRequired()
            id = form.id.data
            model = self.get_one(id)
            if model is None:
                flash(gettext('Record does not exist.'), 'error')
                return redirect(return_url)
            # Denial: NOT admin AND NOT user_id match -- check before deleting
            if not current_user.has_role('admin') \
                    and model.user_id != current_user.id:
                return current_app.user_manager.unauthorized_view_function()
            # message is flashed from within delete_model if it fails
            if self.delete_model(model):
                flash(gettext('Record was successfully deleted.'), 'success')
                return redirect(return_url)
        else:
            flash_errors(form, message='Failed to delete record. %(error)s')
        return redirect(return_url)
Note: I am using Flask-User which is built on top of Flask-Login.
The code above works. However, it is difficult to abstract into a base class for the other models for which I would like to implement access control over CRUD operations and the index/edit/detail/delete views.
Mainly, the problems are:
the API method is_accessible does not provide the primary key of the model instance. This key is needed because relationships between users and entities are almost always stored via relationships or directly in the model table (i.e. a user_id column in your model table).
some views, such as delete_view, do not expose the instance id in a way that can be retrieved easily. In delete_view, I had to copy the entire function just to add one extra check that the record belongs to the right user.
Surely someone has thought about these problems.
How should I go about rewriting this to something that is more DRY and maintainable?
I have two databases defined: default, which is a regular MySQL backend, and redshift (using a postgres backend). I would like to use Redshift as a read-only database that is just used for django-sql-explorer.
Here is the router I have created in my_project/common/routers.py:
class CustomRouter(object):
    def db_for_read(self, model, **hints):
        return 'default'

    def db_for_write(self, model, **hints):
        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        db_list = ('default', )
        if obj1._state.db in db_list and obj2._state.db in db_list:
            return True
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return db == 'default'
And my settings.py references it like so:
DATABASE_ROUTERS = ['my_project.common.routers.CustomRouter', ]
The problem occurs when invoking makemigrations: Django throws an error indicating that it is trying to create the django_* tables in Redshift (and obviously failing, because the postgres type serial is not supported by Redshift):
...
raise MigrationSchemaMissing("Unable to create the django_migrations table (%s)" % exc)
django.db.migrations.exceptions.MigrationSchemaMissing: Unable to create the django_migrations table (Column "django_migrations.id" has unsupported type "serial".)
So my question is two-fold:
Is it possible to completely disable Django Management for a database, but still use the ORM?
Barring Read-Only Replicas, why has Django not considered it an acceptable use case to support read-only databases?
Related questions:
- Column 'django_migrations.id' has unsupported type 'serial' [with Amazon Redshift]
I just discovered that this is the result of a bug. It's been addressed in a few PRs, most notably: https://github.com/django/django/pull/7194
So, to answer my own questions:
No, it's not currently possible. The best workaround is to use a custom database router in combination with a read-only DB account and have allow_migrate() return False in the router (see the sketch below).
The best solution is to upgrade to Django >= 1.10.4 and not use a custom database router, which avoids the bug. However, this comes with a caveat if you have any other databases defined, such as a read replica.
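A sketch of the router from option 1, assuming the redshift alias from the question is the read-only connection:
# my_project/common/routers.py -- sketch of option 1
class CustomRouter(object):
    def db_for_read(self, model, **hints):
        # Normal ORM traffic stays on the default database; Redshift is only
        # queried explicitly (e.g. by django-sql-explorer or .using('redshift')).
        return 'default'

    def db_for_write(self, model, **hints):
        return 'default'

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Never create or migrate tables on the Redshift connection.
        return db != 'redshift'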
I have a model called Requests which I want to save in a different database than the default Django database.
The reason for this is that the table is going to record every request for analytics and is going to get populated very heavily. As I am taking database backups hourly, I don't want to increase the database size just for that table.
So I was thinking of putting it in a separate database so that it isn't backed up as often.
The docs describe it like this:
https://docs.djangoproject.com/en/dev/topics/db/multi-db/
def db_for_read(self, model, **hints):
    """
    Reads go to a randomly-chosen slave.
    """
    return random.choice(['slave1', 'slave2'])

def db_for_write(self, model, **hints):
    """
    Writes always go to master.
    """
    return 'master'
Now I am not sure how I can check that if my model is Requests it chooses database A, and database B otherwise.
Models are just classes, so check whether you have the right class. This example should work for you:
from analytics.models import Requests

def db_for_read(self, model, **hints):
    """
    Reads go to the default database, unless it is about requests.
    """
    if model is Requests:
        return 'database_A'
    else:
        return 'database_B'

def db_for_write(self, model, **hints):
    """
    Writes go to the default database, unless it is about requests.
    """
    if model is Requests:
        return 'database_A'
    else:
        return 'database_B'
If you wish, though, you can also use other techniques (such as checking model.__name__ or looking at model._meta; see the sketch below).
One note, though: the requests should not have foreign keys connecting them to models in other databases. But you probably already know that.
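For instance, a variant keyed on the app label via model._meta (assuming the Requests model lives in an app named analytics) could look like this:
class AnalyticsRouter(object):
    """Sketch: route everything in the 'analytics' app to database_A."""

    def db_for_read(self, model, **hints):
        if model._meta.app_label == 'analytics':
            return 'database_A'
        return 'database_B'

    def db_for_write(self, model, **hints):
        if model._meta.app_label == 'analytics':
            return 'database_A'
        return 'database_B'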
Is there any plugin or third-party backend to manage Redis connections in Django, so the methods in view.py don't have to explicitly connect to Redis for every request?
If not, how would you start implementing one? A new plugin? A new backend? A new Django middleware?
Thank you.
I think the emerging standard for non-relational databases is django-nonrel. I don't know if django-nonrel is production-ready or if it supports Redis, but they have a guide on writing a custom NoSQL backend.
Unfortunately, I don't think writing Redis support for standard Django is as easy as writing a DatabaseBackend. There's a lot in the Django model mechanics and workflow that simply assumes an ACID database. What about syncdb? And what about QuerySets?
However, you may try a poor man's approach using models.Manager and a lot of tweaking on your model. For example:
# helper
def fill_model_instance(instance, values):
    """ Fills a model instance with the values from the dict values """
    attributes = filter(lambda x: not x.startswith('_'), instance.__dict__.keys())
    for a in attributes:
        try:
            setattr(instance, a, values[a.upper()])
            del values[a.upper()]
        except:
            pass
    for v in values.keys():
        setattr(instance, v, values[v])
    return instance


class AuthorManager(models.Manager):
    # You may try to use the default methods.
    # But it should be freaking hard...
    def get_query_set(self):
        raise NotImplementedError("Maybe you can write a non-relational QuerySet()!")

    def latest(self, *args, **kwargs):
        # redis latest query
        pass

    def filter(self, *args, **kwargs):
        # redis filter query
        pass

    # Custom methods that you may use instead of rewriting
    # the default ones.
    def open_connection(self):
        # Open a redis connection
        pass

    def search_author(self, *args, **kwargs):
        self.open_connection()
        # Write your query. I don't know how this shiny non-sql works.
        # Assume it returns a dict for every matched author.
        authors_list = [{'name': 'Leibniz', 'email': 'iinventedcalculus@gmail.com'},
                        {'name': 'Kurt Godel', 'email': 'self.consistent.error@gmail.com'}]
        return [fill_model_instance(Author(), author) for author in authors_list]


class Author(models.Model):
    name = models.CharField(max_length=255)
    email = models.EmailField(max_length=255)

    objects = AuthorManager()  # attach the custom manager

    def save(self):
        raise NotImplementedError("TODO: write a redis save")

    def delete(self):
        raise NotImplementedError("TODO: write a redis delete")

    class Meta:
        managed = False
Please note that I've only made a sketch of how you can tweak the Django models. I have not tested or run this code. I suggest you first investigate django-nonrel.
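If the sketch were fleshed out, usage could look roughly like this (hypothetical, since search_author's actual Redis query is left unimplemented above):
# Hypothetical usage of the sketched manager -- untested, like the sketch itself
authors = Author.objects.search_author(name='Leibniz')
for author in authors:
    print("%s <%s>" % (author.name, author.email))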