Django aggregation Many To Many into list of dict - python

It's been hours since I tried to perform this operation but I couldn't figure it out.
Let's say I have a Django project with two classes like these:
from django.db import models
class Person(models.Model):
name=models.CharField()
address=models.ManyToManyField(to=Address)
class Address(models.Model):
city=models.CharField()
zip=models.IntegerField()
So it's just a simple Person having multiple addresses.
Then I create some objects:
addr1=Address.objects.create(city='first', zip=12345)
addr2=Address.objects.create(city='second', zip=34555)
addr3=Address.objects.create(city='third', zip=5435)
person1=Person.objects.create(name='person_one')
person1.address.set([addr1,addr2])
person2=Person.objects.create(name='person_two')
person2.address.set([addr1,addr2,addr3])
Now it comes the hard part, I want to make a single query that will return something like that:
result = [
{
'name': 'person_one',
'addresses': [
{
'city':'first',
'zip': 12345
},
{
'city': 'second',
'zip': 34555
}
]
},
{
'name': 'person_two',
'addresses': [
{
'city':'first',
'zip': 12345
},
{
'city': 'second',
'zip': 34555
},
{
'city': 'third',
'zip': 5435
}
]
}
]
The best i could get was using ArrayAgg and JSONBAgg operators for Django (I'm on POSTGRESQL BY THE WAY):
from django.contrib.postgres.aggregates import JSONBAgg, ArrayAgg
result = Person.objects.values(
'name',
addresses=JSONBAgg('city')
)
But that's not enough, I can't pull a lit of dictionaries out of the query directly as I would like to do, just a list of values or something useless using:
addresses=JSONBAgg(('city','zip'))
which returns a dictionari with random keys and the strings I passed as input as values.
Can someone help me out?
Thanks

If you use postgres, you can do this:
subquery = Address.objects.filter(person_id=OuterRef("pk")).annotate(
data=JSONObject(city=F("city"), zip=F("zip"))
).values_list("data")
persons = Persons.objects.annotate(addresses=ArraySubquery(subquery))

Your requirement: To make an aggregation of customized JSON objects after group_by (values) in Django.
Currently, to my knowledge, Django is not providing any function to aggregate manually created JSON objects. There are a couple of ways to solve this. Firstly, make a customized function which is quite laborious. However, there is another approach that is pretty much easy, using both aggregate functions (ArrayAgg or JSONBAgg) and RawSQL together.
from django.contrib.postgres.aggregates import JSONBAgg, ArrayAgg
result = Person.objects.values('name').annotate(addresses=JSONBAgg(RawSQL("json_build_object('city', city, 'zip', zip)", ())))
I hope it would help you.

person.address already holds a queryset of addresses. From there you can use list-comprehension / model_from_dict to get the values you want.

Related

How to group mongo docs which have the same key:value and return the response in the form of an array?

So what i am basically trying to do is groups a set of mongo docs having the same key:value pair and return them in the form of a list of list.
EX:
{"client":"abp","product":"a"},{"client":"aaj","product":"b"},{"client":"abp","product":"c"}
Output:
{"result": [ [{"client":"abp","product":"a"},{"client":"abp","product":"c"}], [{"client":"aaj","product":"b"}] ] }
Mongo query or any other logic in python would help. Thanks in advance.
I would group by client and then create and array of product using $push. $push allows you to insert each grouped object in an array.
db.yourcollection.aggregate([
{
$group: {
_id: '$client',
products: {$push: {client: '$client', product: '$product'}}
}
}])
from operator import itemgetter
from itertools import groupby
x=[{"client":"abp","product":"a"},{"client":"aaj","product":"b"},{"client":"abp","product":"c"}]
x.sort(key=itemgetter('client'),reverse=True)
d=[list(g) for (k,g) in groupby(x,itemgetter('client'))]
final = {}
final['result']=d
Output:
{'result': [[{'client': 'abp', 'product': 'a'},
{'client': 'abp', 'product': 'c'}],
[{'client': 'aaj', 'product': 'b'}]]

Mongodb aggregate query with condition

I have to perform aggregate on mongodb in python and unable to do so.
Below is the structure of mongodb document extracted:
{'Category': 'Male',
'details' :[{'name':'Sachin','height': 6},
{'name':'Rohit','height': 5.6},
{'name':'Virat','height': 5}
]
}
I want to return the height where name is Sachin by the aggregate function. Basically my idea is to extract data by $match apply condition and aggregate at the same time with aggregate function. This can be easily done by doing in 3 steps with if statements but i'm looking to do in 1 aggregate function.
Please note: there is not fixed length of 'details' value.
Let me know if any more explanation is needed.
You can do a $filter to achieve
db.collection.aggregate([
{
$project: {
details: {
$filter: {
input: "$details",
cond: {
$eq: [
"$$this.name",
"Sachin"
]
}
}
}
}
}
])
Working Mongo playground
If you use in find, but you need to be aware of positional operator
db.collection.find({
"details.name": "Sachin"
},
{
"details.$": 1
})
Working Mongo playground
If you need to make it as object, you can simply use $arrayElemAr with $ifNull

How to pick apart array data

Trying to output just the employee data(empfirst, emplast, empsalary, emproles) to a bottle project. I Just want the value not the keys. How would I go about this? It feels like i've tried everything but can't get at the data I need!
My query
emp_curs = connection.coll.find({},{"_id": False,"employee.empFirst":True})
dept_list = list(emp_curs)```
(just playing with the first name for now until its working)
My loop
```% for d in emp_list:
% for i in d:
<tr>
<td>{{d[i]}}</td>
<td>{{d[i]}}</td>
<td>{{d[i]}}</td>
<td>{{d[i]}}</td>
</tr>
%end
%end```
thats the closest i've gotten :\
Looking to take all the data and place in a table.
Sorry, here is the whole data file!
Sorry, here's some sample data
[
{
"deptCode": "ACCT",
"deptName": "Accounting",
"deptBudget": 200000,
"employee": [
{
"empFirst": "Marsha",
"empLast": "Bonavoochi",
"empSalary": 59000
},
{
"empFirst": "Roberto",
"empLast": "Acostaletti",
"empSalary": 85000,
"empRoles": [
"Manager"
]
},
{
"empFirst": "Dini",
"empLast": "Cappelletti",
"empSalary": 50500
}
]
}
]
It looks like you are stopping just one layer early within your nested list of dictionaries. This should get you all the applicable values for the employee data:
for department in department_list:
for employee in department["employee"]:
for value in employee.values():
print(value) # or whatever operation you want, adding to the table in your case
Looks like you have adding to the table working as you want, so that should work for you. Based on the structure of your sample data, I'm assuming there will be multiple departments to pull this data from (hence me starting with department_list).

Django queryset: Build a dictionary from a queryset, with common elements

I'm trying to construct a dictionary from my database, that will separate my data into values with common time stamps.
data_point:
time: <timestamp>
value: integer
I have 66k data points, out of which around 7k share timestamps (meaning the measurement was taken at the same time.
I need to make a dict that would look like:
{
"data_array": [
{
"time": "2018-05-11T10:34:43.826Z",
"values": [
13560465,
87856595,
78629348
]
},
{
"time": "2018-05-11T10:34:43.882Z",
"values": [
13560689,
78237945,
92378456
]
}
]
}
There are other keys in the dictionary, but I'm just having a bit of a struggle with this particular key.
The idea is, look at my data queryset, and group up objects that share a timestamp, then add a key "time" to my dict, with the value being the timestamp, and an array "values" with the value being a list of those data.value objects
I'm not experienced enough to build this without looping a lot and probably being very innefficient. Some kind of "while timestamp doesn't change: append value to list", though I'm not sure how to go about that either.
Ideally, if I can do this with queries (should be faster, right?) I would prefer that.
Why not use collections.defaultdict?
from collections import defaultdict
data = defaultdict(list)
# qs is your queryset
for time, value in qs.values_list('time', 'value'):
data[time].append(value)
In that case data looks like:
{
'time_1': [
value_1_1,
value_1_2,
...
],
'time_2': [
value_2_1,
value_2_2,
...
],
....
}
at this point you can build any output format you want

Django complex query based on dicts

Tldr of Problem
Frontend is a form that requires a complex lookup with ranges and stuff across several models, given in a dict. Best way to do it?
Explanation
From the view, I receive a dict of the following form (After being processed by something else):
{'h_index': {"min": 10,"max":20},
'rank' : "supreme_overlord",
'total_citations': {"min": 10,"max":400},
'year_began': {"min": 2000},
'year_end': {"max": 3000},
}
The keys are column names from different models (Right now, 2 separate models, Researcher and ResearchMetrics), and the values are the range / exact value that I want to query.
Example (Above)
Belonging to model Researcher :
rank
year_began
year_end
Belonging to model ResearchMetrics
total_citations
h_index
Researcher has a One to Many relationship with ResearchMetrics
Researcher has a Many to Many relationship with Journals (not mentioned in question)
Ideally: I want to show the researchers who fulfill all the criteria above in a list of list format.
Researcher ID, name, rank, year_began, year_end, total_citations, h_index
[[123, "Thomas", "professor", 2000, 2012, 15, 20],
[ 343 ... ]]
What's the best way to go about solving this problem? (Including changes to form, etc?) I'm not very familiar with the whole form query model thing.
Thank you for your help!
To dynamically perform a query you pass a dict with items 'fieldname__lookuptype': value as **kwargs to Model.objects.filter.
So to filter for rank, year_began and year_end in your example above, you would do this:
How exactly you do the transformation depends on how variable this incoming dictionary is. An example could be something like this:
filter_in = {
'h_index': {"min": 10,"max":20},
'rank' : "supreme_overlord",
'total_citations': {"min": 10,"max":400},
'year_began': {"min": 2000},
'year_end': {"max": 3000},
}
LOOKUP_MAPPING = {
'min': 'gt',
'max': 'lt'
}
filter_kwargs = {}
for field in RESEARCHER_FIELDS:
if not field in filter_in:
continue
filter = filter_in[field]
if isinstance(filter, dict):
for filter_type, value in filter.items():
lookup_type = LOOKUP_MAPPING[filter_type]
lookup = '%s__%s' % (field, lookup_type)
filter_dict[lookup] = value
else:
filter_dict[field] = filter
This results in a dictionary like this:
{
'rank': 'supreme_overlord',
'year_began__gt': 2000,
'year_end__lt': 3000
}
Use it like this:
qs = Researcher.objects.filter(**filter_kwargs)
Regarding the fields total_citations and h_index from ResearchMetrics, I assume you want to aggregate the values. So in your example above you want either a sum or an average.
The principle is the same:
from django.db.models import Sum
METRICS_FIELDS = ['total_citations', 'h_index']
annotate_kwargs = {}
for field in METRICS_FIELDS:
if not field in filter_in:
continue
annotated_field = '%s_sum' % field
annotate_kwargs[annotated_field] = Sum('researchmetric__%s' % field)
filter = filter_in[field]
if isinstance(filter, dict):
for filter_type, value in filter.items():
lookup_type = LOOKUP_MAPPING[filter_type]
lookup = '%s__%s' % (annotated_field, lookup_type)
filter_dict[lookup] = value
else:
filter_kwargs[field] = filter
Now your filter_kwargs look like this:
{
'h_index_sum__gt': 10,
'h_index_sum__lt': 20,
'rank': 'supreme_overlord',
'total_citations_sum__gt': 10,
'total_citations_sum__lt': 400,
'year_began__gt': 2000,
'year_end__lt': 3000
}
And your annotate_kwargs look like this:
{
'h_index_sum': Sum('reasearchmetric__h_index')),
'total_citations_sum': Sum('reasearchmetric__total_citations'))
}
So your final call looks like this:
Researcher.objects.annotate(**annotate_kwargs).filter(**filter_kwargs)
There are some assumptions in my answer, but I hope you get the general idea.
There is one important point: make sure you properly validate the input to make sure that only the field can be filtered that you want the user to filter. In my approach, this is ensured by hard coding the field names in RESEARCHER_FIELDS and METRICS_FIELDS.

Categories

Resources