Store Subtitles in a Database - python

I'm working on a project that uses AI to recognise the speech of an audio file. The output of this AI is a huge JSON object with tons of values. I'll remove some keys, and the final structure will look as follows.
{
    text: "<recognised text>",
    language: "<detected language>",
    segments: [
        {startTimestamp: "00:00:00", endTimestamp: "00:00:10", text: "<some text>"},
        {startTimestamp: "00:00:10", endTimestamp: "00:00:17", text: "<some text>"},
        {startTimestamp: "00:00:17", endTimestamp: "00:00:26", text: "<some text>"},
        { ... },
        { ... }
    ]
}
Now, I wish to store this trimmed object in a SQL database because I want to be able to edit it manually. I'll create a React application to edit segments, delete segments, etc. I also want the React application to autosave the data every 5 seconds using an AJAX call.
Now, I don't understand how I should store this object in the SQL database. Initially, I thought I would store the whole object as a string in a database. Whenever the object changes, the React application sends a JSON object, the backend sanitises it, and then replaces the old stringified object in the database with the new sanitised string. This makes updates and deletions easy, but searching could become a problem. I'm wondering whether there is a better approach.
Could someone guide me on this?
Tech Stack
Frontend - React
Backend - Django 3.2.15
Database - PostgreSQL
Thank you

Now, I don't understand how I should store this object in the SQL database. Initially, I thought I would store the whole object as a string in a database.
If the data has a clear structure, you should not store it as a JSON blob in a relational database. Relational databases have some support for JSON nowadays, but it is still not very effective: normally you cannot efficiently filter, aggregate, and manipulate the data, nor can you check referential integrity.
You can work with two models that look like:
from django.db import models
from django.db.models import F, Q
class Subtitle(models.Model):
    text = models.CharField(max_length=128)
    language = models.CharField(max_length=128)
class Segment(models.Model):
    startTimestamp = models.DurationField()
    endTimestamp = models.DurationField()
    subtitle = models.ForeignKey(
        Subtitle, on_delete=models.CASCADE, related_name='segments'
    )
    text = models.CharField(max_length=512)
    class Meta:
        ordering = ('subtitle', 'startTimestamp', 'endTimestamp')
        constraints = [
            # Only accept rows whose start is strictly before their end.
            models.CheckConstraint(
                check=Q(startTimestamp__lt=F('endTimestamp')),
                name='start_before_end',
            )
        ]
This also guarantees, at the database level, that startTimestamp is before endTimestamp and that these fields store durations (and not "foo", for example).
You can convert from and to JSON with Django REST framework serializers:
from rest_framework import serializers
class SegmentSerializer(serializers.ModelSerializer):
    class Meta:
        model = Segment
        fields = ['startTimestamp', 'endTimestamp', 'text']
class SubtitleSerializer(serializers.ModelSerializer):
    segments = SegmentSerializer(many=True)
    class Meta:
        model = Subtitle
        fields = ['text', 'language', 'segments']
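The segments in the question carry "HH:MM:SS" strings, while a DurationField holds datetime.timedelta values. The following is a minimal plain-Python sketch of what that conversion and the start_before_end rule amount to. The helper names parse_timestamp, format_timestamp, and check_segment are hypothetical and not part of Django or the answer above.

```python
from datetime import timedelta


def parse_timestamp(value: str) -> timedelta:
    """Turn an "HH:MM:SS" string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_timestamp(value: timedelta) -> str:
    """Turn a timedelta back into a zero-padded "HH:MM:SS" string."""
    total = int(value.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def check_segment(segment: dict) -> None:
    """Mirror the start_before_end rule in plain Python (hypothetical helper)."""
    start = parse_timestamp(segment["startTimestamp"])
    end = parse_timestamp(segment["endTimestamp"])
    if not start < end:
        raise ValueError("startTimestamp must be before endTimestamp")
```

A check like this could run in the backend before saving, so that an invalid segment is rejected with a readable message instead of a database integrity error.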

Related

Python flask Graphene: Mapping fields with API response

I'm building a GraphQL API using Flask and Graphene.
My JSON file data looks like the following.
{
    "address": {
        "streetAddress": "301",
        "#city": "Los Angeles",
        "state": "CA"
    }
}
And my Graphene schema looks as follows.
import json
from graphene import Field, ObjectType, String
class Address(ObjectType):
    streetAddress = String()
    city = String()
    state = String()
    class Meta:
        exclude_fields = ('#city',)
class Common(ObjectType):
    data = Field(Address)
    def resolve_data(self, info):
        data = open("address.json", "r")
        data_mod = json.loads(data.read())["address"]
        return data_mod
So I am trying to map the value of the #city JSON key to my schema field called city.
I saw an article which mentioned that the Meta class can be used to exclude the original field name, like this.
    class Meta:
        exclude_fields = ('#city',)
Still, it didn't work. I am using a common schema with a single resolver to feed the JSON data into the Address schema fields. Can someone suggest a way to map this kind of field to a Graphene schema field?
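One possible approach, offered as a sketch rather than a confirmed Graphene feature: rename the awkward keys before the resolver returns the dict, so that the city field finds a matching key. This assumes the schema reads field values from the returned dict by key. KEY_ALIASES, normalize_keys, and load_address are hypothetical names introduced for illustration.

```python
import json

# Hypothetical mapping from raw JSON keys to schema field names.
KEY_ALIASES = {"#city": "city"}


def normalize_keys(raw: dict, aliases: dict = KEY_ALIASES) -> dict:
    """Return a copy of raw with aliased keys renamed; other keys are kept."""
    return {aliases.get(key, key): value for key, value in raw.items()}


def load_address(text: str) -> dict:
    """Parse the JSON text and return the address with schema-friendly keys."""
    return normalize_keys(json.loads(text)["address"])


sample = '{"address": {"streetAddress": "301", "#city": "Los Angeles", "state": "CA"}}'
print(load_address(sample))
```

With this in place, resolve_data could return load_address(...) applied to the file contents instead of the raw dict.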

django email list field

I want to create a field that holds a list of emails. My model looks like this:
class Alerts(models.Model):
    emails = MultiEmailField()
    events = models.OneToOneField(Event, on_delete=models.CASCADE)
All good, but when this model is saved in the DB it looks like this:
{
    "id": 11,
    "emails": "['ht@ht.com, bb@bb.com']",
    "events": 13
}
The list under the 'emails' key is represented as a string, "['ht@ht.com, bb@bb.com']", not as a list, ['ht@ht.com, bb@bb.com']. I tried different methods, parsers and approaches, but without success. Is there a way to save a list in the DB or not? If not, is there a way to get a list back in the response instead of a string? That would be fine too. (The DB is MySQL.)
You can use ListCharField from django-mysql lib.
from django.db import models
from django_mysql.models import ListCharField
class Alerts(models.Model):
    emails = ListCharField(
        base_field=models.EmailField(...),
        ...
    )
    events = models.OneToOneField(Event, on_delete=models.CASCADE)
You can just split the string to get the emails in a list like this:
email_list = emails.replace("['", "").replace("']", "").split(", ")
Django 3.1 added models.JSONField, which can be used with all supported DB backends. You can use it to store a list:
class Alerts(models.Model):
    emails = models.JSONField()
Building on Dacx's answer and on https://stackoverflow.com/a/1894296/5731101, how about:
import ast
class Alerts(models.Model):
    raw_emails = MultiEmailField()
    events = models.OneToOneField(Event, on_delete=models.CASCADE)
    @property
    def emails(self):
        return ast.literal_eval(self.raw_emails)
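Note that the stored value in the question is a list literal whose single element contains both addresses, so evaluating it yields one string rather than two. A small sketch, using a hypothetical helper named parse_email_list, that evaluates the literal and then splits each element on commas:

```python
import ast


def parse_email_list(raw: str) -> list:
    """Turn a stringified list such as "['a@x.com, b@y.com']" into separate addresses."""
    items = ast.literal_eval(raw)  # evaluates the list literal without running code
    emails = []
    for item in items:
        # A single element may itself hold several comma-separated addresses.
        emails.extend(part.strip() for part in item.split(",") if part.strip())
    return emails


print(parse_email_list("['ht@ht.com, bb@bb.com']"))
```

The same helper also handles a properly separated literal such as "['a@x.com', 'b@y.com']".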

Django : Migration of polymorphic models back to a single base class

Let's suppose I have a polymorphic model and I want to get rid of it.
class AnswerBase(models.Model):
    question = models.ForeignKey(Question, related_name="answers")
    response = models.ForeignKey(Response, related_name="answers")
class AnswerText(AnswerBase):
    body = models.TextField(blank=True, null=True)
class AnswerInteger(AnswerBase):
    body = models.IntegerField(blank=True, null=True)
When I want to get all the answers I can never access "body" or I need to try to get the instance of a sub-class by trial and error.
# Query set of answerBase, no access to body
AnswerBase.objects.all()
question = Question.objects.get(pk=1)
# Query set of answerBase, no access to body (even with django-polymorphic)
question.answers.all()
I don't want to use django-polymorphic for performance reasons, because it does not seem to work for ForeignKey relations, and because I don't want my model to be too complicated. So I want this polymorphic architecture to become this simplified one:
class Answer(models.Model):
    question = models.ForeignKey(Question, related_name="answers")
    response = models.ForeignKey(Response, related_name="answers")
    body = models.TextField(blank=True, null=True)
The migrations cannot be created automatically, since that would delete all older answers in the database. I've read the Schema Editor documentation, but there does not seem to be a built-in way to migrate a model into one that already exists. So I want to create my own operation that saves each AnswerText and AnswerInteger as an Answer and then deletes AnswerText and AnswerInteger. I'm hoping I won't have to write SQL directly, but maybe that's the only solution? My migration file looks like this. I created an Operation called MigrateAnswer:
from django.db import migrations, models
from myapp.migrations import MigrateAnswer
class Migration(migrations.Migration):
    operations = [
        migrations.RenameModel("AnswerBase", "Answer"),
        migrations.AddField(
            model_name='answer',
            name='body',
            field=models.TextField(blank=True, null=True),
        ),
        MigrateAnswer("AnswerInteger"),
        MigrateAnswer("AnswerText"),
        migrations.DeleteModel(name='AnswerInteger',),
        migrations.DeleteModel(name='AnswerText',),
    ]
So what I want to do in MigrateAnswer is migrate the values from an old model (AnswerInteger and AnswerText) to the base class (now named Answer, previously AnswerBase). Here's my operation class:
from django.db.migrations.operations.base import Operation
class MigrateAnswer(Operation):
    reversible = False
    def __init__(self, model_name):
        self.old_name = model_name
    def database_forwards(self, app_label, schema_editor, from_state,
                          to_state):
        new_model = to_state.apps.get_model(app_label, "Answer")
        old_model = from_state.apps.get_model(app_label, self.old_name)
        for field in old_model._meta.local_fields:
            # loop on "question", "response" and "body"
            # schema_editor.alter_field() Alter a field on a single model
            # schema_editor.add_field() + remove_field() Does not permit
            # to migrate the value from the old field to the new one
            pass
So my question is: is it possible to do this without using "execute" (i.e. without writing SQL)? If so, what should I do in the for loop of my Operation?
Thanks in advance !
There is no need to write an Operations class; data migrations can be done simply with a RunPython call, as the docs show.
Within that function you can use perfectly normal model instance methods; since you know the fields you want to move the data for, there is no need to get them via meta lookups.
However you will need to temporarily call the new body field a different name, so it doesn't conflict with the old fields on the subclasses; you can rename it back at the end and delete the child classes because the value will be in the base class.
def migrate_answers(apps, schema_editor):
    classes = []
    classes_str = ['AnswerText', 'AnswerInteger']
    for class_name in classes_str:
        classes.append(apps.get_model('survey', class_name))
    for class_ in classes:
        for answer in class_.objects.all():
            answer.new_body = answer.body
            answer.save()
operations = [
    migrations.AddField(
        model_name='answerbase',
        name='new_body',
        field=models.TextField(blank=True, null=True),
    ),
    migrations.RunPython(migrate_answers),
    migrations.DeleteModel(name='AnswerInteger',),
    migrations.DeleteModel(name='AnswerText',),
    migrations.RenameField('AnswerBase', 'new_body', 'body'),
    migrations.RenameModel("AnswerBase", "Answer"),
]
You could create an empty migration for the app in which you want to make these modifications and use the migrations.RunPython class to execute custom Python functions.
Inside these functions you have access to your models and to the Django ORM, so you can do the data manipulation in pure Python, with no raw SQL.

How to post to a Django REST Framework API with Related Models

I have two related models (Events + Locations) with a serializer shown below:
class Locations(models.Model):
    title = models.CharField(max_length=250)
    address = models.CharField(max_length=250)
class Events(models.Model):
    title = models.CharField(max_length=250)
    locations = models.ForeignKey(Locations, related_name='events')
class EventsSerializer(serializers.ModelSerializer):
    class Meta:
        model = Events
        depth = 1
I set the depth to 1 in the serializer so I can get the information from the Locations model instead of a single id. When doing this, however, I can't post to events with the location info. I can only perform a post with the title attribute. If I remove the depth option in the serializer, I can perform the post with both the title and location id.
I tried to create a second serializer (EventsSerializerB) without the depth field, with the intention of using the first one as a read-only response. However, when I created a second serializer and viewset and added it to the router, it automatically overrode the original viewset.
Is it possible for me to create a serializer that outputs the related model fields, and allows you to post directly to the single model?
// EDIT - Here's what I'm trying to post
$scope.doClick = function (event) {
    var test_data = {
        title: 'Event Test',
        content: 'Some test content here',
        location: 2,
        date: '2014-12-16T11:00:00Z'
    }
    // $resource.save() doesn't work?
    $http.post('/api/events/', test_data).
        success(function(data, status, headers, config) {
            console.log('sucess', status);
        }).
        error(function(data, status, headers, config) {
            console.log('error', status);
        });
}
So when the serializers are flat, I can post all of these fields. The location field is the id of a location from the related Locations table. When they are nested, I can't include the location field in the test data.
By setting the depth option on the serializer, you are telling it to make any relation nested instead of flat. For the most part, nested serializers should be considered read-only by default, as they are buggy in Django REST Framework 2.4 and there are better ways to handle them in 3.0.
It sounds like you want a nested representation when reading, but a flat representation when writing. While this isn't recommended, as it means GET requests don't match PUT requests, it is possible to do this in a way that makes everyone happy.
In Django REST Framework 3.0, you can try the following to get what you want:
from rest_framework import serializers, viewsets
class LocationsSerializer(serializers.ModelSerializer):
    class Meta:
        model = Locations
        fields = ('title', 'address', )
class EventsSerializer(serializers.ModelSerializer):
    locations = LocationsSerializer(read_only=True)
    class Meta:
        model = Events
        fields = ('locations', )
class EventViewSet(viewsets.ModelViewSet):
    queryset = Events.objects.all()
    serializer_class = EventsSerializer
    def perform_create(self, serializer):
        # The request carries a raw id, so set the foreign key's _id attribute.
        serializer.save(locations_id=self.request.data['locations'])
    def perform_update(self, serializer):
        serializer.save(locations_id=self.request.data['locations'])
A new LocationsSerializer was created, which will handle the read-only nested representation of the Locations object. By overriding perform_create and perform_update, we can pass in the location id that was passed in with the request body, so the location can still be updated.
Also, you should avoid having model names being plurals. It's confusing when Events.locations is a single location, even though Locations.events is a list of events for the location. Event.location and Location.events reads a bit more clearly, the Django admin will display them reasonably, and your fellow developers will be able to easily understand how the relations are set up.

Is it possible to add fields not present in the structure on the fly?

I was trying out mongokit and I'm having a problem. I thought it would be possible to add fields not present in the schema on the fly, but apparently I can't save the document.
The code is as below:
from mongokit import *
connection = Connection()
@connection.register
class Test(Document):
    structure = {'title': unicode, 'body': unicode}
On the python shell:
test = connection.testdb.testcol.Test()
test['foo'] = u'bar'
test['title'] = u'my title'
test['body'] = u'my body'
test.save()
This gives me a
StructureError: unknown fields ['foo'] in Test
I have an application where, while I have a core of fields that are always present, I can't predict beforehand what new fields will be necessary. Basically, in this case, it's up to the client to insert whatever fields it finds necessary. I'll just receive whatever it sends, do my thing, and store it in MongoDB.
But there is still a core of fields that are common to all documents, so it would be nice to type and validate them.
Is there a way to solve this with mongokit?
According to the MongoKit structure documentation you can have optional fields if you use the Schemaless Structure feature.
As of version 0.7, MongoKit allows you to save partially structured documents.
So if you set up your class like this, it should work:
from mongokit import *
class Test(Document):
    use_schemaless = True
    structure = {'title': unicode, 'body': unicode}
    required_fields = [ 'title', 'body' ]
That will require title and body but should allow any other fields to be present. According to the docs:
MongoKit will raise an exception only if required fields are missing
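To illustrate the idea of a partially structured document, here is a plain-Python sketch. It is not MongoKit's implementation and does not use its API; validate_partial is a hypothetical function, and it is written for Python 3 with str in place of unicode. It checks the declared core fields and lets any extra fields through.

```python
def validate_partial(doc: dict, structure: dict, required: list) -> None:
    """Check only the declared core fields; undeclared fields pass through."""
    missing = [name for name in required if name not in doc]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    for name, expected_type in structure.items():
        # Type-check a declared field only when it is present.
        if name in doc and not isinstance(doc[name], expected_type):
            raise TypeError(f"{name} must be of type {expected_type.__name__}")


structure = {"title": str, "body": str}
required = ["title", "body"]

# The extra "foo" field is accepted because it is not part of the structure.
validate_partial({"title": "my title", "body": "my body", "foo": "bar"}, structure, required)
```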
