I've used Django REST Framework to expose an API that is only used by another service to POST new data. It basically just takes JSON and inserts it into the DB. That's all.
It's quite a high-volume data source (sometimes more than 100 records/second), so I need to tune it a bit.
So I was logging the (PostgreSQL) queries that are run, and I see that every POST results in three queries:
2019-10-01 11:09:03.320 CEST [23983] postgres#thedb LOG: statement: SET TIME ZONE 'UTC'
2019-10-01 11:09:03.322 CEST [23983] postgres#thedb LOG: statement: SELECT (1) AS "a" FROM "thetable" WHERE "thetable"."id" = 'a7f74e5c-7cad-4983-a909-49857481239b'::uuid LIMIT 1
2019-10-01 11:09:03.363 CEST [23983] postgres#thedb LOG: statement: INSERT INTO "thetable" ("id", "version", "timestamp", "sensor", [and 10 more fields...]) VALUES ('a7f74e5c-7cad-4983-a909-49857481239b'::uuid, '1', '2019-10-01T11:09:03.313690+02:00'::timestamptz, 'ABC123', [and 10 more fields...])
I tuned the DB for INSERTs to be fast, but SELECTs are slow. So I would like to remove the SELECT from the system. I added this line to the Serializer:
id = serializers.UUIDField(validators=[])
But it still does a SELECT. Does anybody know how I can prevent the SELECT from happening?
For complete info, the full Serializer now looks like this:
import logging

from rest_framework import serializers

from .models import TheData

log = logging.getLogger(__name__)


class TheDataSerializer(serializers.HyperlinkedModelSerializer):
    class Meta:
        model = TheData
        fields = [
            'id',
            'version',
            'timestamp',
            'sensor',
            # [and 10 more fields...]
        ]


class TheDataDetailSerializer(serializers.ModelSerializer):
    id = serializers.UUIDField(validators=[])

    class Meta:
        model = TheData
        fields = '__all__'
Edit
And as requested by frankie567, the ViewSet:
class TheDataViewSet(DetailSerializerMixin, viewsets.ModelViewSet):
    serializer_class = serializers.TheDataSerializer
    serializer_detail_class = serializers.TheDataDetailSerializer
    queryset = TheData.objects.all().order_by('timestamp')
    http_method_names = ['post', 'list', 'get']
    filter_backends = [DjangoFilterBackend]
    filter_class = TheDataFilter
    pagination_class = TheDataPager

    def get_serializer(self, *args, **kwargs):
        """The incoming data is in the `data` subfield, so I take it from there
        and put those items in the root to store them in the DB."""
        request_body = kwargs.get("data")
        if request_body:
            new_request_body = request_body.get("data", {})
            new_request_body["details"] = request_body.get("details", None)
            request_body = new_request_body
            kwargs["data"] = request_body
        serializer_class = self.get_serializer_class()
        kwargs['context'] = self.get_serializer_context()
        return serializer_class(*args, **kwargs)
After some digging, I was able to see where this behaviour comes from. If you look at the Django REST Framework source code:
if getattr(model_field, 'unique', False):
    unique_error_message = model_field.error_messages.get('unique', None)
    if unique_error_message:
        unique_error_message = unique_error_message % {
            'model_name': model_field.model._meta.verbose_name,
            'field_label': model_field.verbose_name
        }
    validator = UniqueValidator(
        queryset=model_field.model._default_manager,
        message=unique_error_message)
    validator_kwarg.append(validator)
We see that if unique is True (which it is in your case, since I guess you defined your UUID field as the primary key), DRF automatically adds a UniqueValidator. This validator performs a SELECT query to check that the value doesn't already exist.
It is appended to the validators you define in the field's validators parameter, which is why what you did has no effect.
So, how do we circumvent this?
First attempt
class TheDataDetailSerializer(serializers.ModelSerializer):
    # ... your code

    def get_fields(self):
        fields = super().get_fields()
        fields['id'].validators.pop()
        return fields
Basically, we remove the validators of the id field after they have been generated. There are surely more clever ways to do this. It seems to me though that DRF may be too opinionated on this matter.
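A variant that may be a bit cleaner (an assumption on my part, not tested against your setup) is to clear the validators through Meta.extra_kwargs, which ModelSerializer merges into the kwargs of the fields it generates:
class TheDataDetailSerializer(serializers.ModelSerializer):
    class Meta:
        model = TheData
        fields = '__all__'
        # Assumption: an empty validators list here replaces the auto-generated
        # UniqueValidator on the 'id' field.
        extra_kwargs = {'id': {'validators': []}}
If you try this, drop the explicitly declared id field, since extra_kwargs only applies to fields the serializer builds itself (and recent DRF versions refuse the combination anyway).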
Second attempt
class TheDataDetailSerializer(serializers.ModelSerializer):
    # ... your code

    def build_standard_field(self, field_name, model_field):
        field_class, field_kwargs = super().build_standard_field(field_name, model_field)
        if field_name == 'id':
            field_kwargs['validators'] = []
        return field_class, field_kwargs
When generating the field arguments, set an empty validators list if we are generating the id field.
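To confirm the SELECT is really gone, you can capture the queries a POST triggers. A minimal sketch using Django's CaptureQueriesContext and the DRF test client; the URL and payload are placeholders you'd adapt to your project:
from django.db import connection
from django.test.utils import CaptureQueriesContext
from rest_framework.test import APIClient

client = APIClient()
payload = {"data": {"id": "...", "version": "1"}, "details": None}  # placeholder body
with CaptureQueriesContext(connection) as ctx:
    client.post('/thedata/', payload, format='json')  # placeholder URL

# The goal: an INSERT and no SELECT ... LIMIT 1 against "thetable".
print([q['sql'] for q in ctx.captured_queries])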
Related
I do have a rather simple FilterSet that I want to use on a queryset with annotations, but the issue is that it always returns an empty result for some reason.
This is the filter I'm having issues with:
class BaseGroupFilter(django_filters.FilterSet):
    joined = django_filters.BooleanFilter(lookup_expr='exact')

    class Meta:
        model = Group
        fields = dict(id=['exact'],
                      name=['exact', 'icontains'],
                      direct_join=['exact'])
And this is the service:
def group_list(*, fetched_by: User, filters=None):
    filters = filters or {}
    joined_groups = Group.objects.filter(id=OuterRef('pk'), groupuser__user__in=[fetched_by])
    qs = _group_get_visible_for(user=fetched_by).annotate(joined=Exists(joined_groups)).all()
    return BaseGroupFilter(filters, qs).qs
I'd like to retrieve a model's objects via a search form but add another column for a search score. I'm unsure how to achieve this using django-tables2 and django-filter.
In the future, I'd like the user to be able to use django-filter to help filter the search result. I can access the form variables from PeopleSearchListView, but perhaps it's a better approach to integrate a Django form for form handling?
My thought so far is to handle the GET request in get_queryset() and then modify the queryset before it's sent to PeopleTable, but adding another column to the queryset does not seem like a standard approach.
tables.py
class PeopleTable(tables.Table):
    score = tables.Column()

    class Meta:
        model = People
        template_name = 'app/bootstrap4.html'
        exclude = ('id',)
        sequence = ('score', '...')
views.py
class PeopleFilter(django_filters.FilterSet):
    class Meta:
        model = People
        exclude = ('id',)


class PeopleSearchListView(SingleTableMixin, FilterView):
    table_class = PeopleTable
    model = People
    template_name = 'app/people.html'
    filterset_class = PeopleFilter

    def get_queryset(self):
        p = self.request.GET.get('check_this')
        qs = People.objects.all()
        ####
        # Run code to score users against "check_this".
        # The scoring code I'm using is complex, so below is a simpler
        # example.
        # Modify queryset using output of scoring code?
        ####
        for person in qs:
            if person.first_name == 'Phil' and p == 'Hey!':
                score = 1
            else:
                score = 0
        return qs
urls.py
urlpatterns = [
    ...
    path('search/', PeopleSearchListView.as_view(), name='search_test'),
    ...
]
models.py
class People(models.Model):
    first_name = models.CharField(max_length=200)
    last_name = models.CharField(max_length=200)
Edit:
The scoring algorithm is a bit more complex than the above example. It requires a full pass over all of the rows in the People table to generate a score matrix, before finally comparing each scored row with the search query. It's not a one-off score. For example:
def get_queryset(self):
    qs = People.objects.all()
    all = []
    for person in qs:
        all.append(person.name)
    # Do something complex with all,
    # e.g., measure cosine distance between every person,
    # and finally compare to the GET request
    scores = measure_cosine(all, self.request.GET.get('check_this'))
# We now have the scores for each person.
So you can add extra columns when you initialise the table. I've got a couple of tables which do this based on events in the system:
def __init__(self, *args, **kwargs):
    """
    Override the init method in order to add dynamic columns as
    we need to declare one column per existent event on the system.
    """
    extra_columns = []
    events = Event.objects.filter(
        enabled=True,
    ).values(
        'pk', 'title', 'city'
    )
    for event in events:
        extra_columns.append((
            event['city'],
            MyColumn(event_pk=event['pk'])
        ))
    if extra_columns:
        kwargs.update({
            'extra_columns': extra_columns
        })
    super().__init__(*args, **kwargs)
So you could add your score column in a similar way when a score has been provided, perhaps passing your scores into the table from the view so you can detect that they're present, add the column, and then use the data when rendering it.
extra_columns doesn't appear to be in the django-tables2 docs, but you can find the code here: https://github.com/jieter/django-tables2/blob/master/django_tables2/tables.py#L251
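A minimal sketch of that idea, assuming the view computes a scores dict keyed by person pk and passes it in (the scores kwarg and the render_score hook are mine, not part of the original tables):
import django_tables2 as tables

class PeopleTable(tables.Table):
    class Meta:
        model = People
        exclude = ('id',)

    def __init__(self, *args, scores=None, **kwargs):
        # 'scores' is a hypothetical mapping of person pk -> score built in the view.
        self.scores = scores or {}
        if self.scores:
            kwargs.setdefault('extra_columns', []).append(
                ('score', tables.Column(empty_values=(), orderable=False))
            )
        super().__init__(*args, **kwargs)

    def render_score(self, record):
        # Look up the precomputed score for this row (records are model instances here).
        return self.scores.get(record.pk, 0)
The view would then pass scores in when building the table, e.g. via get_table_kwargs when using SingleTableMixin.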
When you define a new column for django-tables2 which is not included in the table data or queryset, you should provide a render method to calculate its value.
You don't have to override get_queryset if complex filtering, preprocessing or a join is required.
In your table class:
class PeopleTable(tables.Table):
    score = tables.Column(accessor="first_name")

    class Meta:
        model = People

    def render_score(self, record):
        # 'q' stands for the search term from the question's example.
        return 1 if record["first_name"] == "Phil" and q == "Hey!" else 0
In your view you can override get_context_data to provide complex data, as well as special filtering or aggregates:
def get_context_data(self, **kwargs):
    context = super().get_context_data(**kwargs)
    context["filter"] = self.filter
    aggs = {
        "score": Function("..."),
        "other": Sum("..."),
    }
    _data = (
        People.objects.filter(**params)
        .values(*values)
        .annotate(**aggs)
        .order_by(*values)
        .distinct()
    )
    df = pandas.DataFrame(_data)
    df = df....
    chart_data = df.to_json()
    data = df.to_dict()...
    self.table = PeopleTable(data)
    context["table"] = self.table
    context['chart_data'] = chart_data
    return context
Hi, I have a resource named employees which has 10 columns. How can I create an /employees/summary/ endpoint which only returns 5 columns but has all the features of the main /employees/ endpoint, such as filters and ordering? What I have tried to do is modify the result from get_list(), but that's turning out to be hard.
class EmployeeResource(ModelResource):
    class Meta:
        queryset = Employees.objects.all()
        resource_name = 'employees'
        allowed_methods = ['get', 'post', 'put', 'patch']

    def prepend_urls(self):
        return [
            url(r"^(?P<resource_name>%s)/summary%s$" % (self._meta.resource_name, trailing_slash()),
                self.wrap_view('summary'), name="summary"),
        ]

    def summary(self, request, **kwargs):
        result = EmployeeResource().get_list(request)
        ### LIMIT COLUMNS TO 5 ###
        return result
Found a solution by pasting in the code of the get_list method from resources.py and adapting it inside the summary view:
def summary(self, request, **kwargs):
    base_bundle = self.build_bundle(request=request)
    objects = self.obj_get_list(bundle=base_bundle, **self.remove_api_resource_names(kwargs))
    sorted_objects = self.apply_sorting(objects, options=request.GET)
    paginator = self._meta.paginator_class(request.GET, sorted_objects,
                                           resource_uri=self.get_resource_uri(),
                                           limit=self._meta.limit,
                                           max_limit=self._meta.max_limit,
                                           collection_name=self._meta.collection_name)
    to_be_serialized = paginator.page()

    # Dehydrate the bundles in preparation for serialization.
    bundles = [
        self.full_dehydrate(self.build_bundle(obj=obj, request=request), for_list=True)
        for obj in to_be_serialized[self._meta.collection_name]
    ]

    whitelist_columns = ['id', 'name', 'dept']
    for bundle in bundles:
        bundle.data = {k: v for k, v in bundle.data.items() if k in whitelist_columns}

    to_be_serialized[self._meta.collection_name] = bundles
    to_be_serialized = self.alter_list_data_to_serialize(request, to_be_serialized)
    return self.create_response(request, to_be_serialized)
Not the best method, as I am not using the built-in excludes or fields Meta options, but it works. If a detail-level implementation is needed, add a prepend_urls entry and adapt the code from get_detail; your URL regex needs to handle primary keys.
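For comparison, a sketch of the Meta-based route, under the assumption that a second, read-only resource reusing the same queryset is acceptable (the name employees-summary is hypothetical); Tastypie's fields whitelist then does the trimming, and filtering/ordering options can be declared on it as usual:
class EmployeeSummaryResource(ModelResource):
    class Meta:
        queryset = Employees.objects.all()
        resource_name = 'employees-summary'  # hypothetical name
        allowed_methods = ['get']
        fields = ['id', 'name', 'dept']  # Tastypie only dehydrates these
        # copy the filtering/ordering options from EmployeeResource here
You would register it in your Api alongside the main resource.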
I have two related models like this:
Form:
    name
    fields
    date_deadline

FormEntry:
    form = ForeignKey(Form)
    data
I want to prevent adding a new entry after the submission deadline. I wrote a validation in the serializer like this:
class FormEntrySerializer(serializers.ModelSerializer):
    def validate(self, data):
        from datetime import datetime
        form = data.get('form')
        if form.date_deadline and \
                datetime.date(datetime.today()) > form.date_deadline:
            message = 'Entries can\'t be added after submission deadline.'
            raise serializers.ValidationError(message)
        return data

    class Meta:
        model = FormEntry
        fields = (
            'id', 'form', 'data',
        )
It works, but now I also can't update a form entry after the submission deadline. I want to apply this validation only to POST requests (i.e. new insertions).
Also, I'm not sure this is the best way to do it. Maybe I should use permissions.
How do I do it?
You can check whether an instance exists:
class FormEntrySerializer(serializers.ModelSerializer):
    def validate(self, data):
        from datetime import datetime
        form = data.get('form')
        if not self.instance and form.date_deadline and \
                datetime.date(datetime.today()) > form.date_deadline:
            message = 'Entries can\'t be added after submission deadline.'
            raise serializers.ValidationError(message)
        return data

    class Meta:
        model = FormEntry
        fields = (
            'id', 'form', 'data',
        )
If the instance doesn't exist then it's being created; otherwise it's being updated.
Check the docs.
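Since you mention permissions: a minimal sketch of that alternative (the class name and the assumption that the form pk arrives in request.data['form'] are mine), which only blocks creations:
from datetime import date
from rest_framework import permissions
from .models import Form  # assuming the Form model from the question

class DeadlineNotPassed(permissions.BasePermission):
    message = "Entries can't be added after submission deadline."

    def has_permission(self, request, view):
        if request.method != 'POST':
            return True  # updates and reads are unaffected
        form = Form.objects.filter(pk=request.data.get('form')).first()
        if form and form.date_deadline:
            return date.today() <= form.date_deadline
        return True
You would add it to the view's permission_classes; the serializer check above is simpler if you only need it in one place.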
I get a "maximum recursion depth exceeded" error if I run the code below:
from tastypie import fields, utils
from tastypie.resources import ModelResource
from core.models import Project, Client
class ClientResource(ModelResource):
    projects = fields.ToManyField(
        'api.resources.ProjectResource', 'project_set', full=True
    )

    class Meta:
        queryset = Client.objects.all()
        resource_name = 'client'


class ProjectResource(ModelResource):
    client = fields.ForeignKey(ClientResource, 'client', full=True)

    class Meta:
        queryset = Project.objects.all()
        resource_name = 'project'

# curl http://localhost:8000/api/client/?format=json
# or
# curl http://localhost:8000/api/project/?format=json
If I set full=False on one of the relations, it works. I do understand why this is happening, but I need both relations to bring data, not just the "resource_uri". Is there a Tastypie way to do it? I managed to solve the problem by creating a serialization method on my Project model, but it is far from elegant. Thanks.
You would have to override the full_dehydrate method on at least one resource to skip dehydrating the related resource that is causing the recursion.
Alternatively, you can define two types of resources that use the same model, one with full=True and another with full=False.
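A rough sketch of that second option, reusing the imports from the question (the ProjectNestedResource name is mine): the nested variant drops the back-reference to the client, so the cycle never starts.
class ProjectNestedResource(ModelResource):
    # Hypothetical "light" variant used only for nesting: no client field,
    # so dehydrating it cannot recurse back into ClientResource.
    class Meta:
        queryset = Project.objects.all()
        resource_name = 'project_nested'


class ClientResource(ModelResource):
    projects = fields.ToManyField(ProjectNestedResource, 'project_set', full=True)

    class Meta:
        queryset = Client.objects.all()
        resource_name = 'client'


class ProjectResource(ModelResource):
    client = fields.ForeignKey(ClientResource, 'client', full=True)

    class Meta:
        queryset = Project.objects.all()
        resource_name = 'project'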
Thanks @astevanovic for pointing in the right direction.
I found that overriding the dehydrate method to process only some specified fields is a bit less tedious than overriding the full_dehydrate method to skip fields.
In the pursuit of reusability, I came up with the following code snippets. I hope they are useful to some:
from tastypie.bundle import Bundle
from tastypie.fields import RelatedField, ToOneField, ToManyField
from tastypie.resources import ModelResource


class BeeModelResource(ModelResource):
    def dehydrate(self, bundle):
        bundle = super(BeeModelResource, self).dehydrate(bundle)
        bundle = self.dehydrate_partial(bundle)
        return bundle

    def dehydrate_partial(self, bundle):
        # Resources without partial_fields in their Meta are left untouched.
        partial_fields = getattr(self._meta, 'partial_fields', None)
        if not partial_fields:
            return bundle
        for field_name, resource_field in self.fields.items():
            if not isinstance(resource_field, RelatedField):
                continue
            if resource_field.full:  # already dehydrated
                continue
            if field_name not in partial_fields:
                continue
            if isinstance(resource_field, ToOneField):
                fk_object = getattr(bundle.obj, resource_field.attribute)
                fk_bundle = Bundle(obj=fk_object, request=bundle.request)
                fk_resource = resource_field.get_related_resource(fk_object)
                bundle.data[field_name] = fk_resource.dehydrate_selected_fields(
                    fk_bundle, partial_fields[field_name]).data
            elif isinstance(resource_field, ToManyField):
                data = []
                fk_objects = getattr(bundle.obj, resource_field.attribute)
                for fk_object in fk_objects.all():
                    fk_bundle = Bundle(obj=fk_object, request=bundle.request)
                    fk_resource = resource_field.get_related_resource(fk_object)
                    fk_bundle = fk_resource.dehydrate_selected_fields(
                        fk_bundle, partial_fields[field_name])
                    data.append(fk_bundle.data)
                bundle.data[field_name] = data
        return bundle

    def dehydrate_selected_fields(self, bundle, selected_field_names):
        # Dehydrate each field.
        for field_name, field_object in self.fields.items():
            # A touch leaky but it makes URI resolution work.
            # (borrowed from tastypie.resources.full_dehydrate)
            if field_name in selected_field_names and not self.is_special_fields(field_name):
                if getattr(field_object, 'dehydrated_type', None) == 'related':
                    field_object.api_name = self._meta.api_name
                    field_object.resource_name = self._meta.resource_name
                bundle.data[field_name] = field_object.dehydrate(bundle)
        bundle.data['resource_uri'] = self.get_resource_uri(bundle.obj)
        bundle.data['id'] = bundle.obj.pk
        return bundle

    @staticmethod
    def is_special_fields(field_name):
        return field_name in ['resource_uri']
With @sigmus' example, the resources need three modifications:
both resources use BeeModelResource as their super class (or add dehydrate_partial to one resource and dehydrate_selected_fields to the other)
unset full=True on one of the resources
add partial_fields to the Meta of the resource where you unset it
class ClientResource(BeeModelResource):  # make BeeModelResource the super class
    projects = fields.ToManyField(
        'api.resources.ProjectResource', 'project_set'
    )  # remove full=True

    class Meta:
        queryset = Client.objects.all()
        resource_name = 'client'
        partial_fields = {'projects': ['memo', 'title']}  # add partial_fields


class ProjectResource(BeeModelResource):  # make BeeModelResource the super class
    client = fields.ForeignKey(ClientResource, 'client', full=True)

    class Meta:
        queryset = Project.objects.all()
        resource_name = 'project'
Dead simple solution: set the use_in = 'list' kwarg on both relationship fields!
The docs: http://django-tastypie.readthedocs.org/en/latest/fields.html#use-in
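Applied to the resources from the question, that looks roughly like this (a sketch: each relation is then only rendered in top-level list views, and nested dehydration counts as a detail view, so the loop is broken):
class ClientResource(ModelResource):
    projects = fields.ToManyField(
        'api.resources.ProjectResource', 'project_set', full=True, use_in='list'
    )

    class Meta:
        queryset = Client.objects.all()
        resource_name = 'client'


class ProjectResource(ModelResource):
    client = fields.ForeignKey(ClientResource, 'client', full=True, use_in='list')

    class Meta:
        queryset = Project.objects.all()
        resource_name = 'project'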