Group By each date in Django 2 - python

I have a model called Log which has a datetime field created_at.
What I want to do is to calculate the number of Logs for each date.
I tried to do this:
from django.db.models import Count
from django.db.models.functions import TruncDate
Log.objects.annotate(date=TruncDate('created_at')).values('date').annotate(count=Count('id'))
This is giving me the following:
{'date': datetime.date(2018, 1, 17), 'count': 1}, {'date': datetime.date(2018, 1, 17), 'count': 1}, {'date': datetime.date(2018, 1, 17), 'count': 2}
That is, the date is not unique.
The result I want would be this:
{'date': datetime.date(2018, 1, 17), 'count': 4}, {'date': datetime.date(2018, 1, 18), 'count': 2}
How could I approach this problem?

If you set a default ordering on your Log model (via Meta.ordering), the ordering field gets added to the GROUP BY clause of your query, which can cause this problem. Try clearing the ordering:
Log.objects.order_by().annotate(date=TruncDate('created_at')).values('date').annotate(count=Count('id'))
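For reference, the grouping that the corrected query performs can be sketched in plain Python; the sample timestamps below are invented for illustration:

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical created_at values standing in for Log rows.
created_at_values = [
    datetime(2018, 1, 17, 9, 0),
    datetime(2018, 1, 17, 12, 30),
    datetime(2018, 1, 17, 14, 5),
    datetime(2018, 1, 17, 23, 59),
    datetime(2018, 1, 18, 8, 15),
    datetime(2018, 1, 18, 16, 45),
]

# Truncate each timestamp to its date and count occurrences,
# mirroring TruncDate + Count in the ORM query.
per_day = Counter(ts.date() for ts in created_at_values)
rows = [{'date': d, 'count': c} for d, c in sorted(per_day.items())]
```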

You can use the distinct() method to get unique values in Django.
Ex:
Log.objects.annotate(date=TruncDate('created_at')).values('date').distinct()
Note that this returns only the distinct dates; it does not count the rows per date.

How to calculate cumulative sum with django ORM?

I'm trying to group_by() data based on dates, and for every day I want to calculate the Count on that day as well as the running total so far.
Sample output I'm getting:
[
    {
        "dates": "2022-11-07",
        "count": 1
    },
    {
        "dates": "2022-11-08",
        "count": 3
    },
    {
        "dates": "2022-11-09",
        "count": 33
    }
]
Sample output I'm trying to achieve:
[
    {
        "dates": "2022-11-07",
        "count": 1,
        "cumulative_count": 1
    },
    {
        "dates": "2022-11-08",
        "count": 3,
        "cumulative_count": 4
    },
    {
        "dates": "2022-11-09",
        "count": 33,
        "cumulative_count": 37
    }
]
Here's my query:
(self.serializer_class.Meta.model.objects.all()
    .annotate(dates=TruncDate("date__date"))
    .values("dates")
    .order_by("dates")
    .annotate(count=Count("channel", distinct=True))
    .values("count", "dates"))
How can I extend this query to get a cumulative sum as well?
I tried to solve your problem like this:
models.py
class Demo(models.Model):
    count = models.IntegerField()
    dates = models.DateField()
serializers.py
class DemoSerializer(serializers.ModelSerializer):
    class Meta:
        model = Demo
        fields = "__all__"
views.py
class DemoAPI(APIView):
    def get(self, request, pk=None, format=None):
        data = Demo.objects.all()
        cumulative_count = 0
        # Plain Django ORM queryset
        print('--------- Default Queryset Response ---------')
        for i in data:
            del i.__dict__['_state']
            print(i.__dict__)
        # Add a cumulative_count key to each row
        for i in data:
            cumulative_count += i.__dict__['count']
            i.__dict__['cumulative_count'] = cumulative_count
        # Updated queryset with cumulative_count
        print('--------- Updated Queryset Response ---------')
        for i in data:
            # del i.__dict__['_state']
            print(i.__dict__)
Output before deleting the _state key from the queryset:
#--------- Default Queryset Response ---------
{'_state': <django.db.models.base.ModelState object at 0x000001A07002A680>, 'id': 1, 'count': 1, 'dates': datetime.date(2022, 11, 7)}
{'_state': <django.db.models.base.ModelState object at 0x000001A07002A5C0>, 'id': 2, 'count': 3, 'dates': datetime.date(2022, 11, 8)}
{'_state': <django.db.models.base.ModelState object at 0x000001A07002A7A0>, 'id': 3, 'count': 33, 'dates': datetime.date(2022, 11, 9)}
#--------- Updated Queryset Response ---------
{'_state': <django.db.models.base.ModelState object at 0x000002DAB66E0AC0>, 'id': 1, 'count': 1, 'dates': datetime.date(2022, 11, 7), 'cumulative_count': 1}
{'_state': <django.db.models.base.ModelState object at 0x000002DAB66E0C10>, 'id': 2, 'count': 3, 'dates': datetime.date(2022, 11, 8), 'cumulative_count': 4}
{'_state': <django.db.models.base.ModelState object at 0x000002DAB66E0D60>, 'id': 3, 'count': 33, 'dates': datetime.date(2022, 11, 9), 'cumulative_count': 37}
Output after deleting the _state key and adding the cumulative_count key:
#--------- Default Queryset Response ---------
{'id': 1, 'count': 1, 'dates': datetime.date(2022, 11, 7)}
{'id': 2, 'count': 3, 'dates': datetime.date(2022, 11, 8)}
{'id': 3, 'count': 33, 'dates': datetime.date(2022, 11, 9)}
#--------- Updated Queryset Response ---------
{'id': 1, 'count': 1, 'dates': datetime.date(2022, 11, 7), 'cumulative_count': 1}
{'id': 2, 'count': 3, 'dates': datetime.date(2022, 11, 8), 'cumulative_count': 4}
{'id': 3, 'count': 33, 'dates': datetime.date(2022, 11, 9), 'cumulative_count': 37}
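A lighter alternative, sketched here rather than taken from the answer above, is to compute the running total with itertools.accumulate over the aggregated rows instead of mutating __dict__ (the sample rows mirror the output above):

```python
from itertools import accumulate

# Aggregated per-day rows, as the date/count query would return them.
rows = [
    {"dates": "2022-11-07", "count": 1},
    {"dates": "2022-11-08", "count": 3},
    {"dates": "2022-11-09", "count": 33},
]

# Running totals of the per-day counts, paired back onto the rows.
running = accumulate(r["count"] for r in rows)
result = [{**r, "cumulative_count": total} for r, total in zip(rows, running)]
```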

Django group by month with possible zeros

I am creating a chart for data analytics, so I need to group the count by month for the whole year.
My model:
class Application(models.Model):
    reference
    created_at
From the above, I need the count of applications for each month of the current year. With the current query I get data for the months that have it, but nothing for the months where no data is available:
My query:
queryset = (Application.objects.filter(user=user)
    .annotate(month=TruncMonth('created_at'))
    .values('month')
    .annotate(_applications=Count('id'))
    .order_by('month'))
For example, if I have data only for January and February, the query above returns just those months; I need the result to contain 0 for all months without data. If March has no data, the result should be 0 for that month. How can I do this?
You can manually build your dataset from the query results:
queryset = Application.objects.filter(user=user).annotate(
    month=TruncMonth('created_at')).values('month').annotate(
    _applications=Count('id')).order_by('month')
applications_by_month = {
    m['month'].month: m['_applications'] for m in queryset
}
dataset = []
year = 2021
for month in range(1, 13):
    dataset.append({
        "month": datetime.date(year=year, month=month, day=1),
        "applications": applications_by_month.get(month, 0)
    })
print(dataset)
Output
[{'month': datetime.date(2021, 1, 1), 'applications': 0},
{'month': datetime.date(2021, 2, 1), 'applications': 0},
{'month': datetime.date(2021, 3, 1), 'applications': 1},
{'month': datetime.date(2021, 4, 1), 'applications': 0},
{'month': datetime.date(2021, 5, 1), 'applications': 0},
{'month': datetime.date(2021, 6, 1), 'applications': 1},
{'month': datetime.date(2021, 7, 1), 'applications': 0},
{'month': datetime.date(2021, 8, 1), 'applications': 0},
{'month': datetime.date(2021, 9, 1), 'applications': 0},
{'month': datetime.date(2021, 10, 1), 'applications': 0},
{'month': datetime.date(2021, 11, 1), 'applications': 0},
{'month': datetime.date(2021, 12, 1), 'applications': 0}]
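The fill-in step can be exercised without Django: given a sparse {month_number: count} mapping like the one built from the queryset (the sample counts below are made up), expand it to a full twelve-month series:

```python
import datetime

# Sparse result, as produced from the aggregation queryset.
applications_by_month = {3: 1, 6: 1}

year = 2021
dataset = [
    {
        "month": datetime.date(year=year, month=month, day=1),
        "applications": applications_by_month.get(month, 0),  # 0 for missing months
    }
    for month in range(1, 13)
]
```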

Extract values from array in python

I'm having some trouble accessing a value inside a list that contains dictionaries, which in turn hold another list of dictionaries.
It looks like this:
[{'name': 'Alex',
  'number_of_toys': [{'classification': 3, 'count': 383},
                     {'classification': 1, 'count': 29},
                     {'classification': 0, 'count': 61}],
  'total_toys': 473},
 {'name': 'John',
  'number_of_toys': [{'classification': 3, 'count': 8461},
                     {'classification': 0, 'count': 3825},
                     {'classification': 1, 'count': 1319}],
  'total_toys': 13605}]
I want to access the 'count' number for each 'classification'. For example, for the 'name' Alex, if 'classification' is 3, the code should return the 'count' of 383, and so on for the other classifications and names.
Thanks for your help!
Not sure exactly what your question asks, but if it's just a mapping exercise this will get you on the right track.
def get_toys(personDict):
    person_toys = personDict.get('number_of_toys')
    return [(toys.get('classification'), toys.get('count')) for toys in person_toys]

def get_person_toys(database):
    return [(personDict.get('name'), get_toys(personDict)) for personDict in database]
The result is:
[('Alex', [(3, 383), (1, 29), (0, 61)]), ('John', [(3, 8461), (0, 3825), (1, 1319)])]
This isn't as elegant as the previous answer because it doesn't iterate over the values, but if you want to select specific elements, this is one way to do that:
data = [{'name': 'Alex',
         'number_of_toys': [{'classification': 3, 'count': 383},
                            {'classification': 1, 'count': 29},
                            {'classification': 0, 'count': 61}],
         'total_toys': 473},
        {'name': 'John',
         'number_of_toys': [{'classification': 3, 'count': 8461},
                            {'classification': 0, 'count': 3825},
                            {'classification': 1, 'count': 1319}],
         'total_toys': 13605}]
import pandas as pd
df = pd.DataFrame(data)
print(df.loc[0]['name'])
print(df.loc[0]['number_of_toys'][0]['classification'])
print(df.loc[0]['number_of_toys'][0]['count'])
which gives:
Alex
3
383
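If the goal is a single lookup rather than a full mapping, a small helper (the name get_count is hypothetical) can walk the structure directly; it assumes the list-of-dicts shape shown in the question:

```python
def get_count(data, name, classification):
    """Return the 'count' for the given name and classification, or None."""
    for person in data:
        if person['name'] == name:
            for toy in person['number_of_toys']:
                if toy['classification'] == classification:
                    return toy['count']
    return None

data = [{'name': 'Alex',
         'number_of_toys': [{'classification': 3, 'count': 383},
                            {'classification': 1, 'count': 29},
                            {'classification': 0, 'count': 61}],
         'total_toys': 473},
        {'name': 'John',
         'number_of_toys': [{'classification': 3, 'count': 8461},
                            {'classification': 0, 'count': 3825},
                            {'classification': 1, 'count': 1319}],
         'total_toys': 13605}]
```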

create nested dictionaries in python3

I would like to create a nested dict in Python 3. I have the following list (from a SQL query):
[('madonna', 'Portland', 'Oregon', '0.70', '+5551234', 'music', datetime.date(2016, 9, 8), datetime.date(2016, 9, 1)), ('jackson', 'Laredo', 'Texas', '2.03', '+555345', 'none', datetime.date(2016, 5, 23), datetime.date(2016, 5, 16)), ('bohlen', 'P', 'P', '2.27', '+555987', 'PhD Student', datetime.date(2016, 9, 7))]
I would like to have the following output:
{madonna:{city:Portland, State:Oregon, Index: 0.70, Phone:+5551234, art:music, exp-date:2016, 9, 8, arrival-date:datetime.date(2016, 5, 23)},jackson:{city: Laredo, State:Texas........etc...}}
Can somebody show me an easy to understand code?
I tried:
from collections import defaultdict
usercheck = defaultdict(list)
for accname, div_ort, standort, raum, telefon, position, exp, dep in cur.fetchall():
    usercheck(accname).append[..]
but this doesn't work, and I can't think any further myself.
You can use a dict comprehension to dynamically create a dictionary based on the elements of a list:
import datetime

sql_list = [
    ('madonna', 'Portland', 'Oregon', '0.70', '+5551234', 'music', datetime.date(2016, 9, 8), datetime.date(2016, 9, 1)),
    ('jackson', 'Laredo', 'Texas', '2.03', '+555345', 'none', datetime.date(2016, 5, 23), datetime.date(2016, 5, 16)),
    ('bohlen', 'P', 'P', '2.27', '+555987', 'PhD Student', datetime.date(2016, 9, 7))
]
sql_dict = {
    element[0]: {
        'city': element[1],
        'state': element[2],
        'index': element[3],
        'phone': element[4],
        'art': element[5],
    } for element in sql_list
}
Keep in mind that every item in the dictionary needs to have a key and a value, and in your example you have a few values with no key.
If you have a list of the columns, you can use the zip function:
from collections import defaultdict
import datetime

# list of columns returned from your database query
columns = ["city", "state", "index", "phone", "art", "exp-date", "arrival-date"]
usercheck = defaultdict(list)
for row in cur.fetchall():
    usercheck[row[0]] = defaultdict(list, zip(columns, row[1:]))
print(usercheck)
This will output a dictionary like:
defaultdict(<class 'list'>, {'madonna': defaultdict(<class 'list'>, {'city': 'Portland', 'art': 'music', 'index': '0.70', 'phone': '+5551234', 'state': 'Oregon', 'arrival-date': datetime.date(2016, 9, 1), 'exp-date': datetime.date(2016, 9, 8)}), 'jackson': defaultdict(<class 'list'>, {'city': 'Laredo', 'art': 'none', 'index': '2.03', 'phone': '+555345', 'state': 'Texas', 'arrival-date': datetime.date(2016, 5, 16), 'exp-date': datetime.date(2016, 5, 23)}), 'bohlen': defaultdict(<class 'list'>, {'city': 'P', 'art': 'PhD Student', 'index': '2.27', 'phone': '+555987', 'state': 'P', 'exp-date': datetime.date(2016, 9, 7)})})
Note that bohlen's row is one field short, so zip stops early and the 'arrival-date' key is simply absent for that entry.
When using defaultdict, the argument specifies the default value type in the dictionary.
from collections import defaultdict
usercheck = defaultdict(dict)
for accname, div_ort, standort, raum, telefon, position, exp, dep in cur.fetchall():
    usercheck[accname]['city'] = div_ort
    usercheck[accname]['state'] = standort
    ...
The keys in the dictionary are referenced using [key], not (key).
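The zip approach can be tried standalone with plain dicts on the sample rows, no database cursor required; note how a short row simply loses its trailing keys:

```python
import datetime

columns = ["city", "state", "index", "phone", "art", "exp-date", "arrival-date"]
rows = [
    ('madonna', 'Portland', 'Oregon', '0.70', '+5551234', 'music',
     datetime.date(2016, 9, 8), datetime.date(2016, 9, 1)),
    ('bohlen', 'P', 'P', '2.27', '+555987', 'PhD Student',
     datetime.date(2016, 9, 7)),  # one field short: zip stops early
]

# Pair each field after the name with its column name.
usercheck = {row[0]: dict(zip(columns, row[1:])) for row in rows}
```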

Group by and aggregate the values of a list of dictionaries in Python

I'm trying to write a function, in an elegant way, that will group a list of dictionaries and aggregate (sum) the values of like-keys.
Example:
my_dataset = [
    {
        'date': datetime.date(2013, 1, 1),
        'id': 99,
        'value1': 10,
        'value2': 10
    },
    {
        'date': datetime.date(2013, 1, 1),
        'id': 98,
        'value1': 10,
        'value2': 10
    },
    {
        'date': datetime.date(2013, 1, 2),
        'id': 99,
        'value1': 10,
        'value2': 10
    }
]
group_and_sum_dataset(my_dataset, 'date', ['value1', 'value2'])
"""
Should return:
[
    {
        'date': datetime.date(2013, 1, 1),
        'value1': 20,
        'value2': 20
    },
    {
        'date': datetime.date(2013, 1, 2),
        'value1': 10,
        'value2': 10
    }
]
"""
I've tried doing this using itertools for the groupby and summing each like-key value pair, but I'm missing something here. Here's what my function currently looks like:
def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):
    keyfunc = operator.itemgetter(group_by_key)
    dataset.sort(key=keyfunc)
    new_dataset = []
    for key, index in itertools.groupby(dataset, keyfunc):
        d = {group_by_key: key}
        # Bug: the groupby iterator `index` is exhausted after the first key
        # in sum_value_keys, so every later key sums an empty sequence.
        d.update({k: sum([item[k] for item in index]) for k in sum_value_keys})
        new_dataset.append(d)
    return new_dataset
You can use collections.Counter and collections.defaultdict.
Using a dict this can be done in O(N), while sorting first requires O(N log N) time.
from collections import defaultdict, Counter

def solve(dataset, group_by_key, sum_value_keys):
    dic = defaultdict(Counter)
    for item in dataset:
        key = item[group_by_key]
        vals = {k: item[k] for k in sum_value_keys}
        dic[key].update(vals)
    return dic
...
>>> d = solve(my_dataset, 'date', ['value1', 'value2'])
>>> d
defaultdict(<class 'collections.Counter'>,
{
datetime.date(2013, 1, 2): Counter({'value2': 10, 'value1': 10}),
datetime.date(2013, 1, 1): Counter({'value2': 20, 'value1': 20})
})
The advantage of Counter is that it automatically sums the values of matching keys:
Example:
>>> c = Counter(**{'value1': 10, 'value2': 5})
>>> c.update({'value1': 7, 'value2': 3})
>>> c
Counter({'value1': 17, 'value2': 8})
Thanks, I forgot about Counter. I still wanted to maintain the output format and sorting of my returned dataset, so here's what my final function looks like:
def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):
    container = defaultdict(Counter)
    for item in dataset:
        key = item[group_by_key]
        values = {k: item[k] for k in sum_value_keys}
        container[key].update(values)
    new_dataset = [
        dict([(group_by_key, key)] + list(counts.items()))
        for key, counts in container.items()
    ]
    new_dataset.sort(key=lambda item: item[group_by_key])
    return new_dataset
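As a quick sanity check, here is a self-contained copy of that final function (with the imports it needs) run against the question's dataset:

```python
import datetime
from collections import defaultdict, Counter

def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):
    container = defaultdict(Counter)
    for item in dataset:
        container[item[group_by_key]].update(
            {k: item[k] for k in sum_value_keys})
    new_dataset = [
        dict([(group_by_key, key)] + list(counts.items()))
        for key, counts in container.items()
    ]
    new_dataset.sort(key=lambda item: item[group_by_key])
    return new_dataset

my_dataset = [
    {'date': datetime.date(2013, 1, 1), 'id': 99, 'value1': 10, 'value2': 10},
    {'date': datetime.date(2013, 1, 1), 'id': 98, 'value1': 10, 'value2': 10},
    {'date': datetime.date(2013, 1, 2), 'id': 99, 'value1': 10, 'value2': 10},
]
result = group_and_sum_dataset(my_dataset, 'date', ['value1', 'value2'])
```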
Here's an approach using more_itertools where you simply focus on how to construct output.
Given
import datetime
import collections as ct
import more_itertools as mit

dataset = [
    {"date": datetime.date(2013, 1, 1), "id": 99, "value1": 10, "value2": 10},
    {"date": datetime.date(2013, 1, 1), "id": 98, "value1": 10, "value2": 10},
    {"date": datetime.date(2013, 1, 2), "id": 99, "value1": 10, "value2": 10}
]
Code
# Step 1: Build helper functions
kfunc = lambda d: d["date"]
vfunc = lambda d: {k:v for k, v in d.items() if k.startswith("val")}
rfunc = lambda lst: sum((ct.Counter(d) for d in lst), ct.Counter())
# Step 2: Build a dict
reduced = mit.map_reduce(dataset, keyfunc=kfunc, valuefunc=vfunc, reducefunc=rfunc)
reduced
Output
defaultdict(None,
{datetime.date(2013, 1, 1): Counter({'value1': 20, 'value2': 20}),
datetime.date(2013, 1, 2): Counter({'value1': 10, 'value2': 10})})
The items are grouped by date and pertinent values are reduced as Counters.
Details
Steps
Build helper functions to customize construction of keys, values, and reduced values in the final defaultdict. Here we want to:
group by date (kfunc)
build dicts keeping the "value*" parameters (vfunc)
aggregate the dicts (rfunc) by converting them to collections.Counters and summing them. See an equivalent rfunc below+.
Then pass the helper functions to more_itertools.map_reduce.
Simple Groupby
... say in that example you wanted to group by id and date?
No problem.
>>> kfunc2 = lambda d: (d["date"], d["id"])
>>> mit.map_reduce(dataset, keyfunc=kfunc2, valuefunc=vfunc, reducefunc=rfunc)
defaultdict(None,
{(datetime.date(2013, 1, 1),
99): Counter({'value1': 10, 'value2': 10}),
(datetime.date(2013, 1, 1),
98): Counter({'value1': 10, 'value2': 10}),
(datetime.date(2013, 1, 2),
99): Counter({'value1': 10, 'value2': 10})})
Customized Output
While the resulting data structure clearly and concisely presents the outcome, the OP's expected output can be rebuilt as a simple list of dicts:
>>> [{**dict(date=k), **v} for k, v in reduced.items()]
[{'date': datetime.date(2013, 1, 1), 'value1': 20, 'value2': 20},
{'date': datetime.date(2013, 1, 2), 'value1': 10, 'value2': 10}]
For more on map_reduce, see the docs. Install via > pip install more_itertools.
+An equivalent reducing function:
import typing

def rfunc(lst: typing.List[dict]) -> ct.Counter:
    """Return reduced mappings from map-reduce values."""
    c = ct.Counter()
    for d in lst:
        c += ct.Counter(d)
    return c
