I have a function that computes a kind of yearly sum over data in the database (it's not a simple sum: each amount is multiplied by an increasing number). It is calculated in a view, and I need to pass the result to the template. I store it in a dictionary: portfolio_dict[year] += amount
{'2013': Decimal('92.96892879384746351465539182'), '2012': Decimal('71.48765907571338816005401399')}
But I need some extra data to send as well. Let's say:
date:date
amount:Decimal
year:string
I know it sounds kind of stupid to have a year and date as well. I use year as index. How do I pass these data to template/add date to my current dictionary?
Until now, I always had a Model and passed a list of its instances to the template. But now I don't need to store this data in the database, so I don't want to create a model.
Where do I create a new class in Django if I don't want it to be in the database?
Or should I use collections or data structures?
Only django.db.models.Model instances are stored in the database (and only if you explicitly ask for it). Everything else is just plain old Python, and you can create and use your own classes as you see fit.
But anyway: if all you need is a year-indexed collection of (date, amount) items, then a dict of dicts is enough:
{
'2013': {
'amount': Decimal('92.96892879384746351465539182'),
'date': datetime.date(2013, 10, 25)
},
# etc
}
Or if you need more than one (amount, date) per year, a dict with lists or dicts:
{
'2013': [
{
'amount': Decimal('92.96892879384746351465539182'),
'date': datetime.date(2013, 10, 25)
},
{
'amount': Decimal('29.9689287'),
'date': datetime.date(2013, 10, 21)
},
],
# etc
}
In fact the proper structure depends on how you're going to use the data.
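If a small class reads better than nested dicts, a plain Python class works just as well, since nothing is saved unless it is a Model. The following is a minimal sketch, not the asker's actual code: the PortfolioEntry name, its fields, and the sample rows are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class PortfolioEntry:
    """Plain Python object; it is never saved because it is not a Model."""
    year: str
    day: date
    amount: Decimal


def build_portfolio(rows):
    """Group (day, amount) pairs into {year: [PortfolioEntry, ...]}."""
    portfolio = {}
    for day, amount in rows:
        year = str(day.year)
        portfolio.setdefault(year, []).append(PortfolioEntry(year, day, amount))
    return portfolio


# Hypothetical sample data standing in for the values computed in the view.
rows = [
    (date(2013, 10, 25), Decimal("92.97")),
    (date(2013, 10, 21), Decimal("29.97")),
    (date(2012, 3, 1), Decimal("71.49")),
]
portfolio = build_portfolio(rows)
```

In the view, portfolio can go into the template context like any other value, and the template can loop over it with {% for year, entries in portfolio.items %}.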
Related
It's been hours since I tried to perform this operation but I couldn't figure it out.
Let's say I have a Django project with two classes like these:
from django.db import models
class Person(models.Model):
    name = models.CharField()
    # 'Address' is passed as a string because the class is defined further down
    address = models.ManyToManyField(to='Address')
class Address(models.Model):
    city = models.CharField()
    zip = models.IntegerField()
So it's just a simple Person having multiple addresses.
Then I create some objects:
addr1=Address.objects.create(city='first', zip=12345)
addr2=Address.objects.create(city='second', zip=34555)
addr3=Address.objects.create(city='third', zip=5435)
person1=Person.objects.create(name='person_one')
person1.address.set([addr1,addr2])
person2=Person.objects.create(name='person_two')
person2.address.set([addr1,addr2,addr3])
Now comes the hard part: I want to make a single query that returns something like this:
result = [
{
'name': 'person_one',
'addresses': [
{
'city':'first',
'zip': 12345
},
{
'city': 'second',
'zip': 34555
}
]
},
{
'name': 'person_two',
'addresses': [
{
'city':'first',
'zip': 12345
},
{
'city': 'second',
'zip': 34555
},
{
'city': 'third',
'zip': 5435
}
]
}
]
The best I could get was using the ArrayAgg and JSONBAgg aggregates for Django (I'm on PostgreSQL, by the way):
from django.contrib.postgres.aggregates import JSONBAgg, ArrayAgg
result = Person.objects.values(
'name',
addresses=JSONBAgg('city')
)
But that's not enough. I can't pull a list of dictionaries out of the query directly as I would like to, just a list of values, or something useless when using:
addresses=JSONBAgg(('city','zip'))
which returns a dictionary with random keys and the strings I passed as input as values.
Can someone help me out?
Thanks
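If you use Postgres, you can do this (JSONObject requires Django 3.2+, ArraySubquery requires Django 4.0+):
from django.contrib.postgres.expressions import ArraySubquery
from django.db.models import F, OuterRef
from django.db.models.functions import JSONObject
# "person" is the default reverse name of the Person.address many-to-many field
subquery = Address.objects.filter(person=OuterRef("pk")).annotate(
    data=JSONObject(city=F("city"), zip=F("zip"))
).values_list("data")
persons = Person.objects.annotate(addresses=ArraySubquery(subquery))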
Your requirement: aggregate custom JSON objects after a group by (values) in Django.
To my knowledge, Django does not currently provide a function to aggregate manually built JSON objects. There are a couple of ways to solve this. One is to write a custom function, which is quite laborious. A much easier approach is to combine an aggregate function (ArrayAgg or JSONBAgg) with RawSQL.
from django.contrib.postgres.aggregates import JSONBAgg, ArrayAgg
from django.db.models.expressions import RawSQL
result = Person.objects.values('name').annotate(addresses=JSONBAgg(RawSQL("json_build_object('city', city, 'zip', zip)", ())))
I hope it would help you.
person.address already gives you a queryset of addresses (via person.address.all()). From there you can use a list comprehension or django.forms.models.model_to_dict to get the values you want.
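The nesting can also be done in plain Python after a flat query. The sketch below assumes the rows come from something like Person.objects.values("name", "address__city", "address__zip"), which yields one dict per (person, address) pair; here a hard-coded list stands in for that queryset, and nest_addresses is a hypothetical helper name.

```python
from collections import defaultdict

# Stand-in for list(Person.objects.values("name", "address__city", "address__zip")).
rows = [
    {"name": "person_one", "address__city": "first", "address__zip": 12345},
    {"name": "person_one", "address__city": "second", "address__zip": 34555},
    {"name": "person_two", "address__city": "first", "address__zip": 12345},
    {"name": "person_two", "address__city": "second", "address__zip": 34555},
    {"name": "person_two", "address__city": "third", "address__zip": 5435},
]


def nest_addresses(rows):
    """Turn flat (person, address) rows into one dict per person."""
    grouped = defaultdict(list)
    for row in rows:
        # Grouping by name keeps the sketch short; real code should group
        # by primary key so that two people with the same name stay separate.
        grouped[row["name"]].append(
            {"city": row["address__city"], "zip": row["address__zip"]}
        )
    return [
        {"name": name, "addresses": addresses}
        for name, addresses in grouped.items()
    ]


result = nest_addresses(rows)
```

This costs one query plus a single pass over the rows, at the price of doing the grouping in Python rather than in the database.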
I'm trying to create an event manager in which a dictionary stores the events like this
my_dict = {'2020':
{'9': {'8': ['School ']},
'11': {'13': ['Doctors ']},
'8': {'31': ['Interview']}
},
'2021': {}}
The outer key is the year, the middle key is the month, and the innermost key is the day, which maps to a list of events.
I'm trying to sort it so that the months are in order, and then sort it again so that the days are in order. Thanks in advance.
Use-case
DevOrangeCrush wishes to sort on keys in a nested dictionary where the nesting occurs on multiple levels
Solution
Normalize the data so that the dates match ISO8601 format, for easier sorting
In plain English, this means make sure you always use two digits for month and date, and always use four digits for year
Re-normalize the original dictionary data structure into a single list of dictionaries, where each dictionary represents a row, and the list represents an outer containing table
this is known as an Array of Hashes in perl-speak
this is known as a list of objects in JSON-speak
Once your data is restructured you are solving a much more well-known, well-documented, and more obvious problem, how to sort a simple list of dictionaries (which is already documented in the See also section of this answer).
Example
import pprint
## original data is formatted as a nested dictionary, which is clumsy
my_dict = {'2020':
{'9': {'8': ['School ']}, '11':
{'13': ['Doctors ']},'8':
{'31': ['Interview']}}, '2021': {}
}
## we want the data formatted as a standard table (aka list of dictionary)
## this is the most common format for this kind of data as you would see in
## databases and spreadsheets
mydata_table = []
for year in my_dict:
    for month in my_dict[year]:
        for day in my_dict[year][month]:
            ## zero-pad month and day so the date string sorts correctly
            mm = '{0:02d}'.format(int(month))
            dd = '{0:02d}'.format(int(day))
            mydata_table.append({
                'year': year,
                'month': mm,
                'day': dd,
                'task_list': my_dict[year][month][day],
                'date': '{0}-{1}-{2}'.format(year, mm, dd),
            })
## output result is now easily sorted and there is no data loss
## you will have to modify this if you want to deal with years that
## do not have any associated task_list data
pprint.pprint(mydata_table)
'''
## now we have something that can be sorted using well-known python idioms
## and easily manipulated using data-table semantics
## (search, sort, filter-by, group-by, select, project ... etc)
[
{'date': '2020-09-08','day': '08',
'month': '09','task_list': ['School '],'year': '2020'},
{'date': '2020-11-13','day': '13',
'month': '11','task_list': ['Doctors '],'year': '2020'},
{'date': '2020-08-31','day': '31',
'month': '08','task_list': ['Interview'],'year': '2020'},
]
'''
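With the table in this shape, the sort itself is a one-liner. A minimal sketch, assuming the three rows printed above (repeated here so the snippet runs on its own):

```python
from operator import itemgetter

mydata_table = [
    {'date': '2020-09-08', 'day': '08', 'month': '09',
     'task_list': ['School '], 'year': '2020'},
    {'date': '2020-11-13', 'day': '13', 'month': '11',
     'task_list': ['Doctors '], 'year': '2020'},
    {'date': '2020-08-31', 'day': '31', 'month': '08',
     'task_list': ['Interview'], 'year': '2020'},
]

# Zero-padded ISO 8601 strings sort chronologically as plain text.
by_date = sorted(mydata_table, key=itemgetter('date'))

# Equivalent sort on several keys at once (year, then month, then day).
by_parts = sorted(mydata_table, key=itemgetter('year', 'month', 'day'))
```

Both sorts rely on the zero padding: without it, month '11' would sort before month '8'.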
See also
How to sort a python list-of-dictionary
How to sort objects by multiple keys
Why you should use ISO8601 date format
ISO8601 vs timestamp
To get sorted events data, you can do something like this:
from collections import OrderedDict

def sort_events(my_dict):
    new_events_data = dict()
    for year, month_data in my_dict.items():
        new_month_data = dict()
        for month, day_data in month_data.items():
            # sort the days numerically, not as strings
            sorted_day_data = sorted(day_data.items(), key=lambda kv: int(kv[0]))
            new_month_data[month] = OrderedDict(sorted_day_data)
        # sort the months numerically, not as strings
        sorted_months_data = sorted(new_month_data.items(), key=lambda kv: int(kv[0]))
        new_events_data[year] = OrderedDict(sorted_months_data)
    return new_events_data
Output:
{'2020': OrderedDict([('8', OrderedDict([('31', ['Interview'])])),
('9', OrderedDict([('8', ['School '])])),
('11', OrderedDict([('13', ['Doctors '])]))]),
'2021': OrderedDict()}
A plain dict is not sorted by key (since Python 3.7 it only preserves insertion order). You could build an OrderedDict, but if you simply need to iterate in sorted order, do it like this:
for year in sorted(map(int, my_dict)):
    year_dict = my_dict[str(year)]
    for month in sorted(map(int, year_dict)):
        month_dict = year_dict[str(month)]
        for day in sorted(map(int, month_dict)):
            events = month_dict[str(day)]
            for event in events:
                print(year, month, day, event)
The conversion to int ensures the right ordering between the numbers. Without it, the keys are compared as strings and you'll get 1, 10, 11, .., 2, 20, 21.
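The difference is easy to see on a handful of keys. A small sketch with made-up month keys:

```python
keys = ['9', '11', '8', '10', '2']

# Compared as strings, '10' sorts before '2' because '1' < '2'.
as_strings = sorted(keys)

# Compared as integers, the order is the numeric one.
as_numbers = sorted(keys, key=int)

print(as_strings)  # ['10', '11', '2', '8', '9']
print(as_numbers)  # ['2', '8', '9', '10', '11']
```

Passing key=int leaves the keys as strings, so they can still be used to index the original dictionary without converting back.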
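A dictionary in Python is not sorted by key. Before Python 3.7 it did not guarantee any order at all, so you might want to try the OrderedDict class from the collections module, which remembers the order of insertion (from Python 3.7 on, a plain dict does too).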
Of course you would have to sort and reinsert the elements whenever you insert a new element which should be placed before any of the existing elements.
If you care about order maybe a different data structure works better. For example a list of lists.
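One way to follow that suggestion is a flat list of (date, event) tuples that is kept sorted on every insert. This is only a sketch: the add_event helper is hypothetical, and it swaps the nested string keys for datetime.date objects, which compare chronologically.

```python
import bisect
from datetime import date

events = []  # list of (date, description) tuples, always kept sorted


def add_event(events, day, description):
    """Insert the event at the position that keeps the list sorted."""
    bisect.insort(events, (day, description))


add_event(events, date(2020, 9, 8), 'School')
add_event(events, date(2020, 11, 13), 'Doctors')
add_event(events, date(2020, 8, 31), 'Interview')

for day, description in events:
    print(day.isoformat(), description)
```

Because tuples compare element by element, events on the same day fall back to alphabetical order of the description.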
I'm trying to construct a dictionary from my database, that will separate my data into values with common time stamps.
data_point:
time: <timestamp>
value: integer
I have 66k data points, of which around 7k share timestamps (meaning the measurements were taken at the same time).
I need to make a dict that would look like:
{
"data_array": [
{
"time": "2018-05-11T10:34:43.826Z",
"values": [
13560465,
87856595,
78629348
]
},
{
"time": "2018-05-11T10:34:43.882Z",
"values": [
13560689,
78237945,
92378456
]
}
]
}
There are other keys in the dictionary, but I'm just having a bit of a struggle with this particular key.
The idea is, look at my data queryset, and group up objects that share a timestamp, then add a key "time" to my dict, with the value being the timestamp, and an array "values" with the value being a list of those data.value objects
I'm not experienced enough to build this without looping a lot and probably being very inefficient. Something like "while the timestamp doesn't change: append value to list", though I'm not sure how to go about that either.
Ideally, if I can do this with queries (should be faster, right?) I would prefer that.
Why not use collections.defaultdict?
from collections import defaultdict
data = defaultdict(list)
# qs is your queryset
for time, value in qs.values_list('time', 'value'):
    data[time].append(value)
In that case data looks like:
{
'time_1': [
value_1_1,
value_1_2,
...
],
'time_2': [
value_2_1,
value_2_2,
...
],
....
}
at this point you can build any output format you want
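For the data_array layout from the question, the grouped dict can be reshaped in one comprehension. A sketch under assumptions: the rows list stands in for qs.values_list('time', 'value'), the timestamps are naive datetimes in UTC, and the trailing "Z" is appended by hand.

```python
from collections import defaultdict
from datetime import datetime

# Stand-in for qs.values_list('time', 'value').
rows = [
    (datetime(2018, 5, 11, 10, 34, 43, 826000), 13560465),
    (datetime(2018, 5, 11, 10, 34, 43, 882000), 13560689),
    (datetime(2018, 5, 11, 10, 34, 43, 826000), 87856595),
]

# Group the values by timestamp.
data = defaultdict(list)
for time, value in rows:
    data[time].append(value)

# Reshape into the structure asked for, ordered by timestamp.
payload = {
    "data_array": [
        {"time": time.isoformat(timespec="milliseconds") + "Z", "values": values}
        for time, values in sorted(data.items())
    ]
}
```

The grouping is a single pass over the rows, so 66k data points need one query and one loop.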
Tldr of Problem
Frontend is a form that requires a complex lookup with ranges and stuff across several models, given in a dict. Best way to do it?
Explanation
From the view, I receive a dict of the following form (After being processed by something else):
{'h_index': {"min": 10,"max":20},
'rank' : "supreme_overlord",
'total_citations': {"min": 10,"max":400},
'year_began': {"min": 2000},
'year_end': {"max": 3000},
}
The keys are column names from different models (Right now, 2 separate models, Researcher and ResearchMetrics), and the values are the range / exact value that I want to query.
Example (Above)
Belonging to model Researcher :
rank
year_began
year_end
Belonging to model ResearchMetrics
total_citations
h_index
Researcher has a One to Many relationship with ResearchMetrics
Researcher has a Many to Many relationship with Journals (not mentioned in question)
Ideally: I want to show the researchers who fulfill all the criteria above in a list of list format.
Researcher ID, name, rank, year_began, year_end, total_citations, h_index
[[123, "Thomas", "professor", 2000, 2012, 15, 20],
[ 343 ... ]]
What's the best way to go about solving this problem? (Including changes to form, etc?) I'm not very familiar with the whole form query model thing.
Thank you for your help!
To dynamically perform a query you pass a dict with items 'fieldname__lookuptype': value as **kwargs to Model.objects.filter.
So to filter for rank, year_began and year_end in your example above, you would call Researcher.objects.filter(rank='supreme_overlord', year_began__gt=2000, year_end__lt=3000).
How exactly you do the transformation depends on how variable this incoming dictionary is. An example could be something like this:
filter_in = {
'h_index': {"min": 10,"max":20},
'rank' : "supreme_overlord",
'total_citations': {"min": 10,"max":400},
'year_began': {"min": 2000},
'year_end': {"max": 3000},
}
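# use 'gte' / 'lte' instead if the bounds should be inclusive
LOOKUP_MAPPING = {
    'min': 'gt',
    'max': 'lt'
}
# fields on Researcher that the user is allowed to filter on
RESEARCHER_FIELDS = ['rank', 'year_began', 'year_end']
filter_kwargs = {}
for field in RESEARCHER_FIELDS:
    if field not in filter_in:
        continue
    field_filter = filter_in[field]
    if isinstance(field_filter, dict):
        # range filter: one lookup per bound
        for filter_type, value in field_filter.items():
            lookup_type = LOOKUP_MAPPING[filter_type]
            lookup = '%s__%s' % (field, lookup_type)
            filter_kwargs[lookup] = value
    else:
        # exact match
        filter_kwargs[field] = field_filter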
This results in a dictionary like this:
{
'rank': 'supreme_overlord',
'year_began__gt': 2000,
'year_end__lt': 3000
}
Use it like this:
qs = Researcher.objects.filter(**filter_kwargs)
Regarding the fields total_citations and h_index from ResearchMetrics, I assume you want to aggregate the values. So in your example above you want either a sum or an average.
The principle is the same:
from django.db.models import Sum
METRICS_FIELDS = ['total_citations', 'h_index']
annotate_kwargs = {}
for field in METRICS_FIELDS:
    if field not in filter_in:
        continue
    annotated_field = '%s_sum' % field
    # 'researchmetric' must match your reverse relation name
    # (the default for a model named ResearchMetrics is 'researchmetrics')
    annotate_kwargs[annotated_field] = Sum('researchmetric__%s' % field)
    field_filter = filter_in[field]
    if isinstance(field_filter, dict):
        for filter_type, value in field_filter.items():
            lookup_type = LOOKUP_MAPPING[filter_type]
            lookup = '%s__%s' % (annotated_field, lookup_type)
            filter_kwargs[lookup] = value
    else:
        # exact match against the aggregated value
        filter_kwargs[annotated_field] = field_filter
Now your filter_kwargs look like this:
{
'h_index_sum__gt': 10,
'h_index_sum__lt': 20,
'rank': 'supreme_overlord',
'total_citations_sum__gt': 10,
'total_citations_sum__lt': 400,
'year_began__gt': 2000,
'year_end__lt': 3000
}
And your annotate_kwargs look like this:
{
'h_index_sum': Sum('researchmetric__h_index'),
'total_citations_sum': Sum('researchmetric__total_citations')
}
So your final call looks like this:
Researcher.objects.annotate(**annotate_kwargs).filter(**filter_kwargs)
There are some assumptions in my answer, but I hope you get the general idea.
There is one important point: validate the input properly so that users can filter only on the fields you intend to expose. In my approach, this is ensured by hard-coding the field names in RESEARCHER_FIELDS and METRICS_FIELDS.
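The whitelist idea can be wrapped in a small function that needs no database to try out. This is a sketch rather than the code above: build_filter_kwargs and ALLOWED_FIELDS are hypothetical names, and unlike the loops above, which silently skip unknown keys, this variant rejects them with an error.

```python
LOOKUP_MAPPING = {'min': 'gt', 'max': 'lt'}
ALLOWED_FIELDS = {'rank', 'year_began', 'year_end'}


def build_filter_kwargs(filter_in, allowed_fields=ALLOWED_FIELDS):
    """Translate the incoming dict into keyword arguments for filter()."""
    kwargs = {}
    for field, condition in filter_in.items():
        # Refuse anything that is not explicitly whitelisted.
        if field not in allowed_fields:
            raise ValueError('filtering on %r is not allowed' % field)
        if isinstance(condition, dict):
            for bound, value in condition.items():
                if bound not in LOOKUP_MAPPING:
                    raise ValueError('unknown bound %r for %r' % (bound, field))
                kwargs['%s__%s' % (field, LOOKUP_MAPPING[bound])] = value
        else:
            kwargs[field] = condition
    return kwargs


filter_kwargs = build_filter_kwargs({
    'rank': 'supreme_overlord',
    'year_began': {'min': 2000},
    'year_end': {'max': 3000},
})
```

The resulting dict can be passed on as Researcher.objects.filter(**filter_kwargs). Raising on unknown keys makes typos in the form visible instead of quietly returning unfiltered results.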
I am recording users' daily usage of my platform.
The documents in MongoDB are structured like this:
_id: X
day1:{
loginCount = 4
someDict { x:y, z:m }
}
day2:{
loginCount = 5
someDict { a:b, c:d }
}
Then I need to get the last two days of stats for user X.
How can I get the values for days more recent than two days ago (for example with the $gte operator)?
Ok, if you insist on this schema, try this:
{
  _id: Usemongokeyhere,
  userid: X,
  days: [
    {day: ISODate("2013-08-12T00:00:00Z"),
     loginCount: 10,
     // more stuff
    },
    {day: ISODate("2013-08-13T00:00:00Z"),
     loginCount: 11,
     // more stuff
    },
  ]
},
// more users
Then you can query like:
db.items.find(
{"days.day":{$gte:ISODate("2013-08-30T00:00:00.000Z"),
$lt: ISODate("2013-08-31T00:00:00.000Z")
}
}
)
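From Python, the same kind of filter can be built as a plain dict and handed to a driver such as pymongo. The sketch below is an assumption-laden illustration: last_days_query is a hypothetical helper, the field names follow the schema above, and it uses $elemMatch so that both bounds must hold for the same array element (without it, different elements of days may satisfy each bound separately).

```python
from datetime import datetime, timedelta, timezone


def last_days_query(user_id, days=2, now=None):
    """Build a filter for documents having an entry in the last `days` days."""
    now = now or datetime.now(timezone.utc)
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = today - timedelta(days=days)
    end = today + timedelta(days=1)
    return {
        "userid": user_id,
        "days": {"$elemMatch": {"day": {"$gte": start, "$lt": end}}},
    }


query = last_days_query("X", days=2,
                        now=datetime(2013, 8, 31, 15, 30, tzinfo=timezone.utc))
# With pymongo this would be used as: db.items.find(query)
```

Note that find still returns the whole matching document, including older entries in days; trimming the array itself would need a projection or an aggregation step, which is not shown here.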
Unless the question changes, I am answering based on this schema:
_id: X
day1:{
loginCount:4
someDict:{ x:y, z:m }
}
day2:{
loginCount:5
someDict:{ a:b, c:d }
}
Answer:
last 2 day's user stats which belongs to user X.
With this structure you cannot select days on the MongoDB side with operators like $gte, because a query for user X returns the whole document, which contains every day. Using dynamic values as keys is, in my opinion, bad practice. You can restrict the returned fields with a projection, e.g. db.collection.find({_id:X},{day1:1,day2:1})
However, you have to know what the keys are, and I am not sure how you store day1 and day2 as keys: ISO date or timestamp? Depending on that, you can build the projection by writing yesterday and the day before as a date string or timestamp to get the required information.