python mongoengine EmbeddedDocumentListField and aggregate

python mongoengine EmbeddedDocumentListField and aggregate - python

I wanna now if MongoEngine aggregate() functionality only works on QuerySet?
well, this is the project I'm working on
#imports
class Status(Enum):
ACTIVE = "active"
CLOSED = "closed"
class OrderType(Enum):
BUY = "buy"
SELL = "sell"
class Order(EmbeddedDocument):
type = EnumField(OrderType, required=True)
vol = IntField(required=True)
time = DateTimeField(required=True, default=datetime.now())
class Stock(EmbeddedDocument):
name = StringField(max_length=20, required=True)
orders = EmbeddedDocumentListField(Order, required=True)
status = EnumField(Status, required=True, default=Status.ACTIVE)
amount = IntField(required=True, default=0)
pnl = FloatField()
class Portfolio(Document):
account = IntField(required=True, unique=True)
stocks = EmbeddedDocumentListField(Stock, required=True)
balance = FloatField()
equity = FloatField()
I want to be able to make Stock.amount field be an aggregation of Order.vol which, as can be seen, is an EmbeddedDocumentListField of stock. I already know about mongoengine.signals and seen examples on aggregate(), but all of them use aggregate() on QuerySet returned by .objects(). So what is the tweak for it? Many Thanks.

Currently I come up with this:
def handler(event):
def decorator(fn):
def apply(cls):
event.connect(fn, sender=cls)
return cls
fn.apply = apply
return fn
return decorator
#handler(signals.post_init)
def update_amount(sender, document):
orders = document.orders
amnt = 0
for order in orders:
if order.type == OrderType.Buy:
amnt += order.vol
else:
amnt -= order.vol
document.amount = amnt
and then I added decorator #update_amount.apply to my Stock class. but it would work nicer with aggregate if it's possible

Related

Django convert Date of Birth model field to Age

I have a Django class to convert the date of birth (dob) field in my model to age and annotate the result to a queryset.
class CalculateAge(Case):
def __init__(self, model, field, condition=None, then=None, **lookups):
today = date.today()
obj = model.objects.first()
field_object = model._meta.get_field(field)
field_value = field_object.value_from_object(obj)
bornyear = field_value.year
bornmonth = field_value.month
bornday = field_value.day
# something is wrong with the next two lines
age = [today.year - bornyear - ((today.month, today.day) < (bornmonth, bornday))]
return super().__init__(*age, output_field=IntegerField())
however when I try to pass the result to my queryset
queryset = Person.objects.all().annotate(age=CalculateAge(Person, 'dob')
I get the error
Positional arguments must all be When objects.
How can I get this to work?

There is an easier way. Just add a function on your model to get the age like this:
class ModelName(models.Model):
birth_date = models.DateField()
#other fields
def get_age(self):
age = datetime.date.today()-self.birth_date
return int((age).days/365.25)

Django - How to join two querysets with different key values (but from same model)

I am trying to join two querysets in Django with different key values, but from the same model, is there any way to do so?
Here is my code:
models.py
class CustomerInformation(models.Model):
status = (
('lead', 'Lead'),
('client', 'Client'),
)
customer_id = models.AutoField(primary_key=True)
customer_name = models.CharField(max_length=100)
status = models.CharField(max_length=100, choices=status, default='lead')
conversion_date = models.DateField(null=True, blank=True)
created_date = models.DateField(default=timezone.localdate)
def save(self, *args, **kwargs):
if self.customer_id:
if self.status != CustomerInformation.objects.get(customer_id=self.customer_id).status and self.status == 'client':
self.conversion_date = timezone.now()
elif self.status != CustomerInformation.objects.get(customer_id=self.customer_id).status and self.status == 'lead':
self.conversion_date = None
super(CustomerInformation, self).save(*args, **kwargs)
here is my filtering
start = date.today() + relativedelta(days=-30)
client_qs = CustomerInformation.objects.filter(conversion_date__gte=start).values(date=F('conversion_date')).annotate(client_count=Count('date'))
lead_qs = CustomerInformation.objects.filter(created_date__gte=start).values(date=F('created_date')).annotate(lead_count=Count('date'))
Basically what I am trying to achieve is to get the count of CustomerInformation instances created in the past 30 days (by annotating the count of the field 'created_date'). Also, I want to get the count of CustomerInformation instances that have converted to 'client' status within the past 30 days (by annotating the count of the field 'conversion_date'). Is there any way for me to do so and receive them in a single queryset, preferably with a single date field?
For example, my desired output would be
[ {'date': '170620', 'lead_count': 2, 'client_count': 1}, {'date': '180620', 'lead_count': 1, 'client_count': 0 }, ... ]
All help is appreciated, thanks all!

I think you can achieve the same by doing:
combined_qs = CustomerInformation.objects.filter(created_date__gte=start, conversion_date__gte=start).annotate(lead_count=Count('created_date'), client_count=Count('conversion_date')
Note that the above will use AND the filter. Roughly that means it will only return customer information that have both created_date and conversion_date greater than date.( I might be wrong but I think that's what you want in this case).
Otherwise you can use the Q object for a more complex query.
from django.db.models import Q
combined_qs = CustomerInformation.objects.filter(Q(created_date__gte=start)|Q(conversion_date__gte=start)).annotate(lead_count=Count('created_date'), client_count=Count('conversion_date')
The official doc for using the Q objects is here
I will say try both and compare the results you get.
Hope that helps.

How to update queryset value in django?

I have written a python script in my project. I want to update the value of a field.
Here are my modes
class News_Channel(models.Model):
name = models.TextField(blank=False)
info = models.TextField(blank=False)
image = models.FileField()
website = models.TextField()
total_star = models.PositiveIntegerField(default=0)
total_user = models.IntegerField()
class Meta:
ordering = ["-id"]
def __str__(self):
return self.name
class Count(models.Model):
userId = models.ForeignKey(User, on_delete=models.CASCADE)
channelId = models.ForeignKey(News_Channel, on_delete=models.CASCADE)
rate = models.PositiveIntegerField(default=0)
def __str__(self):
return self.channelId.name
class Meta:
ordering = ["-id"]
This is my python script:
from feed.models import Count, News_Channel
def run():
for i in range(1, 11):
news_channel = Count.objects.filter(channelId=i)
total_rate = 0
for rate in news_channel:
total_rate += rate.rate
print(total_rate)
object = News_Channel.objects.filter(id=i)
print(total_rate)
print("before",object[0].total_star,total_rate)
object[0].total_star = total_rate
print("after", object[0].total_star)
object.update()
After counting the total_rate from the Count table I want to update the total star value in News_Channel table. I am failing to do so and get the data before the update and after the update as zero. Although total_rate has value.

The problem
The reason why this fails is because here object is a QuerySet of News_Channels, yeah that QuerySet might contain exactly one News_Channel, but that is irrelevant.
If you then use object[0] you make a query to the database to fetch the first element and deserialize it into a News_Channel object. Then you set the total_star of that object, but you never save that object. You only call .update() on the entire queryset, resulting in another independent query.
You can fix this with:
objects = News_Channel.objects.filter(id=i)
object = objects[0]
object.total_star = total_rate
object.save()
Or given you do not need any validation, you can boost performance with:
News_Channel.objects.filter(id=i).update(total_star=total_rate)
Updating all News_Channels
If you want to update all News_Channels, you actually better use a Subquery here:
from django.db.models import OuterRef, Sum, Subquery
subq = Subquery(
Count.objects.filter(
channelId=OuterRef('id')
).annotate(
total_rate=Sum('rate')
).order_by('channelId').values('total_rate')[:1]
)
News_Channel.objects.update(total_star=subq)

The reason is that object in your case is a queryset, and after you attempt to update object[0], you don't store the results in the db, and don't refresh the queryset. To get it to work you should pass the field you want to update into the update method.
So, try this:
def run():
for i in range(1, 11):
news_channel = Count.objects.filter(channelId=i)
total_rate = 0
for rate in news_channel:
total_rate += rate.rate
print(total_rate)
object = News_Channel.objects.filter(id=i)
print(total_rate)
print("before",object[0].total_star,total_rate)
object.update(total_star=total_rate)
print("after", object[0].total_star)

News_Channel.total_star can be calculated by using aggregation
news_channel_obj.count_set.aggregate(total_star=Sum('rate'))['total_star']
You can then either use this in your script:
object.total_star = object.count_set.aggregate(total_star=Sum('rate'))['total_star']
Or if you do not need to cache this value because performance is not an issue, you can remove the total_star field and add it as a property on the News_Channel model
#property
def total_star(self):
return self.count_set.aggregate(total_star=Sum('rate'))['total_star']

Create error message datefield

I want to create an error message for following form:
class ExaminationCreateForm(forms.ModelForm):
class Meta:
model = Examination
fields = ['patient', 'number_of_examination', 'date_of_examination']
Models:
class Patient(models.Model):
patientID = models.CharField(max_length=200, unique=True, help_text='Insert PatientID')
birth_date = models.DateField(auto_now=False, auto_now_add=False, help_text='YYYY-MM-DD')
gender = models.CharField(max_length=200,choices=Gender_Choice, default='UNDEFINED')
class Examination(models.Model):
number_of_examination = models.IntegerField(choices=EXA_Choices)
patient = models.ForeignKey(Patient, on_delete=models.CASCADE)
date_of_examination = models.DateField(auto_now=False, auto_now_add=False, help_text='YYYY-MM-DD')
Every Patient has 2 Examinations (number of examination = Choices 1 or 2) and the error message should be activated when the date of the second examination < date of the first examination. Something like this:
Solution: `
def clean_date_of_examination(self):
new_exam = self.cleaned_data.get('date_of_examination')
try:
old_exam = Examination.objects.get(patient=self.cleaned_data.get('patient'))
except Examination.DoesNotExist:
return new_exam
if old_exam:
if old_exam.date_of_examination > new_exam:
raise forms.ValidationError("Second examination should take place after first examination")
return new_exam`

def clean_date_of_examination(self):
new_exam = self.cleaned_data.get('date_of_examination')
old_exam = Examination.objects.get(patient = self.cleaned_data.get('Patient'))
if old_exam:
if old_exam.date_of_examination > new_exam.date_of_examination:
raise forms.ValidationError("Second examination should take place after first examination")
return data

def clean_date_of_examination(self):
# Where 'data' is used?
date_of_exam = self.cleaned_data['date_of_examination']
try:
pat1 = Patient.object.get(examination__number_of_examination=1, date_of_examination=date_of_exam)
except Patiens.DoesNotExist:
# Patient 1 with given query doesn't exist. Handle it!
try:
pat2 = Patient.object.get(examination__number_of_examination=2, date_of_examination=date_of_exam)
except Patiens.DoesNotExist:
# Patient 2 with given query doesn't exist. Handle it!
if pat2.date_of_examination < pat1.date_of_examination:
raise forms.ValidationError("Second examination should take place after first examination")`
return data`

Django - Getting/Saving large objects takes a lot of time

I'm trying to get a few million of items from a model, and parsing them. However, somehow it spends a lot of time trying to get the data saved.
These are the current models that I have:
class mapt(models.Model):
s = models.IntegerField(primary_key=True)
name = models.CharField(max_length=2000)
def __unicode__(self):
return str(self.s)
class datt(models.Model):
s = models.IntegerField(primary_key=True)
setid = models.IntegerField()
var = models.IntegerField()
val = models.IntegerField()
def __unicode(self):
return str(self.s)
class sett(models.Model):
setid = models.IntegerField(primary_key=True)
block = models.IntegerField()
username = models.IntegerField()
ts = models.IntegerField()
def __unicode__(self):
return str(self.setid)
class data_parsed(models.Model):
setid = models.IntegerField(max_length=2000, primary_key=True)
block = models.CharField(max_length=2000)
username = models.CharField(max_length=2000)
data = models.CharField(max_length=200000)
time = models.IntegerField()
def __unicode__(self):
return str(self.setid)
The s parameter for the datt model should actually act as a foreign key to mapt's s parameter. Furthermore, sett's setid field should act as a foreign key to setid's setid.
Lastly, data_parsed's setid is a foreign key to sett's models.
The algorithm is currently written this way:
def database_rebuild(start_data_parsed):
listSetID = []
#Part 1
for items in sett.objects.filter(setid__gte=start_data_parsed):
listSetID.append(items.setid)
uniqueSetID = listSetID
#Part 2
for items in uniqueSetID:
try:
SetID = items
settObject = sett.objects.get(setid=SetID)
UserName = mapt.objects.get(pk=settObject.username).name
TS = pk=settObject.ts
BlockName = mapt.objects.get(pk=settObject.block).name
DataPairs_1 = []
DataPairs_2 = []
DataPairs_1_Data = []
DataPairs_2_Data = []
for data in datt.objects.filter(setid__exact=SetID):
DataPairs_1.append(data.var)
DataPairs_2.append(data.val)
for Data in DataPairs_1:
DataPairs_1_Data.append(mapt.objects.get(pk=Data).name)
for Data in DataPairs_2:
DataPairs_2_Data.append(mapt.objects.get(pk=Data).name)
assert (len(DataPairs_1) == len(DataPairs_2)), "Length not equal"
#Part 3
Serialize = []
for idx, val in enumerate(DataPairs_1_Data):
Serialize.append(str(DataPairs_1_Data[idx]) + ":PARSEABLE:" + str(DataPairs_2_Data[idx]) + ":PARSEABLENEXT:")
Serialize_Text = ""
for Data in Serialize:
Serialize_Text += Data
Data = Serialize_Text
p = data_parsed(SetID, BlockName, UserName, Data, TS)
p.save()
except AssertionError, e:
print "Error:" + str(e.args)
print "Possibly DataPairs does not have equal length"
except Exception as e:
print "Error:" + str(sys.exc_info()[0])
print "Stack:" + str(e.args)
Basically, what it does is that:
Finds all sett objects that is greater than a number
Gets the UserName, TS, and BlockName, then get all the fields in datt field that correspond to a var and val field maps to the mapt 's' field. Var and Val is basically NAME_OF_FIELD:VALUE type of relationship.
Serialize all the var and val parameters so that I could get all the parameters from var and val that is spread across the mapt table in a row in data_parsed.
The current solution does everything I would like to, however, on a Intel Core i5-4300U CPU # 1.90Ghz, it parses around 15000 rows of data daily on a celery periodic worker. I have around 3355566 rows of data at my sett table, and it will take around ~23 days to parse them all.
Is there a way to speed up the process?
============================Updated============================
New Models:
class mapt(models.Model):
s = models.IntegerField(primary_key=True)
name = models.CharField(max_length=2000)
def __unicode__(self):
return str(self.s)
class sett(models.Model):
setid = models.IntegerField(primary_key=True)
block = models.ForeignKey(mapt, related_name='sett_block')
username = models.ForeignKey(mapt, related_name='sett_username')
ts = models.IntegerField()
def __unicode__(self):
return str(self.setid)
# class sett(models.Model):
# setid = models.IntegerField(primary_key=True)
# block = models.IntegerField()
# username = models.IntegerField()
# ts = models.IntegerField()
# def __unicode__(self):
# return str(self.setid)
class datt(models.Model):
s = models.IntegerField(primary_key=True)
setid = models.ForeignKey(sett, related_name='datt_setid')
var = models.ForeignKey(mapt, related_name='datt_var')
val = models.ForeignKey(mapt, related_name='datt_val')
def __unicode(self):
return str(self.s)
# class datt(models.Model):
# s = models.IntegerField(primary_key=True)
# setid = models.IntegerField()
# var = models.IntegerField()
# val = models.IntegerField()
# def __unicode(self):
# return str(self.s)
class data_parsed(models.Model):
setid = models.ForeignKey(sett, related_name='data_parsed_setid', primary_key=True)
block = models.CharField(max_length=2000)
username = models.CharField(max_length=2000)
data = models.CharField(max_length=2000000)
time = models.IntegerField()
def __unicode__(self):
return str(self.setid)
New Parsing:
def database_rebuild(start_data_parsed, end_data_parsed):
for items in sett.objects.filter(setid__gte=start_data_parsed, setid__lte=end_data_parsed):
try:
UserName = mapt.objects.get(pk=items.username_id).name
TS = pk=items.ts
BlockName = mapt.objects.get(pk=items.block_id).name
DataPairs_1 = []
DataPairs_2 = []
DataPairs_1_Data = []
DataPairs_2_Data = []
for data in datt.objects.filter(setid_id__exact=items.setid):
DataPairs_1.append(data.var_id)
DataPairs_2.append(data.val_id)
for Data in DataPairs_1:
DataPairs_1_Data.append(mapt.objects.get(pk=Data).name)
for Data in DataPairs_2:
DataPairs_2_Data.append(mapt.objects.get(pk=Data).name)
assert (len(DataPairs_1) == len(DataPairs_2)), "Length not equal"
Serialize = []
for idx, val in enumerate(DataPairs_1_Data):
Serialize.append(str(DataPairs_1_Data[idx]) + ":PARSEABLE:" + str(DataPairs_2_Data[idx]))
Data = ":PARSEABLENEXT:".join(Serialize)
p = data_parsed(items.setid, BlockName, UserName, Data, TS)
p.save()
except AssertionError, e:
print "Error:" + str(e.args)
print "Possibly DataPairs does not have equal length"
except Exception as e:
print "Error:" + str(sys.exc_info()[0])
print "Stack:" + str(e.args)

Defining lists by appending repeadedly is very slow. Use list comprehensions or even just the list() constructor.
In python you should not join a list of strings using for loops and +=, you should use join().
But that is not the primary bottleneck here. You have a lot of objects.get()s which each takes a database roundtrip. If you didn't have milions of rows in the mapt table, you should probably just make a dictionary mapping mapt primary keys to mapt objects.
Had you defined your foreign keys as foreign keys the django orm could help you do much of this in like five queries in total. That is, instead of SomeModel.objects.get(id=some_instance.some_fk_id) you can do some_instance.some_fk (which will only hit the databse the first time you do it for each instance). You can then even get rid of the foreign key query if some_instance had been initialized as some_instance = SomeOtherModel.objects.select_related('some_fk').get(id=id_of_some_instance).
Perhaps changing the models without changing the database will work.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

python mongoengine EmbeddedDocumentListField and aggregate - python

Related

Django convert Date of Birth model field to Age

Django - How to join two querysets with different key values (but from same model)

How to update queryset value in django?

Create error message datefield

Django - Getting/Saving large objects takes a lot of time

Categories

Resources