python nested dict in array group sum - python

I have this data structure in Python:
result = {
"data": [
{
"2015-08-27": {
"clicks": 10,
"views":20
}
},
{
"2015-08-28": {
"clicks": 6,
}
}
]
}
How can I add the elements of each dictionary? The output should be :
{
"clicks":16, # 10 + 6
"views":20
}
I am looking for a Pythonic solution for this. Any solutions using Counter are welcome but I am not able to implement it.
I have tried this but I get an error:
counters = []
for i in result:
for k,v in i.items():
counters.append(Counter(v))
sum(counters)

Your code was quite close to a workable solution, and we can make it work with a few important changes. The most important change is that we need to iterate over the "data" item in result.
from collections import Counter
result = {
"data": [
{
"2015-08-27": {
"clicks": 10,
"views":20
}
},
{
"2015-08-28": {
"clicks": 6,
}
}
]
}
counts = Counter()
for d in result['data']:
for k, v in d.items():
counts.update(v)
print(counts)
output
Counter({'views': 20, 'clicks': 16})
We can simplify that a little because we don't need the keys.
counts = Counter()
for d in result['data']:
for v in d.values():
counts.update(v)
The code you posted makes a list of Counters and then tries to sum them. I guess that's also a valid strategy, but unfortunately the sum built-in doesn't know how to add Counters together. But we can do it using functools.reduce.
from functools import reduce
counters = []
for d in result['data']:
for v in d.values():
counters.append(Counter(v))
print(reduce(Counter.__add__, counters))
However, I suspect that the first version will be faster, especially if there are lots of dicts to add together. Also, this version consumes more RAM, since it keeps a list of all the Counters.
Actually we can use sum to add the Counters together, we just have to give it an empty Counter as the start value.
print(sum(counters, Counter()))
We can combine this into a one-liner, eliminating the list by using a generator expression instead:
from collections import Counter
result = {
"data": [
{
"2015-08-27": {
"clicks": 10,
"views":20
}
},
{
"2015-08-28": {
"clicks": 6,
}
}
]
}
totals = sum((Counter(v) for i in result['data'] for v in i.values()), Counter())
print(totals)
output
Counter({'views': 20, 'clicks': 16})

This is not the best solution as I am sure that there are libraries that can get you there in a less verbose way but it is one you can easily read.
res = {}
for x in my_dict['data']:
for y in x:
for t in x[y]:
res.setdefault(t, 0)
res[t] += x[y][t]
print(res) # {'views': 20, 'clicks': 16}

Related

Fastest way to get values from dictionary to another dictionary in python without using if?

I need to find a way to get values from one dictionary to another, bases on key name match without using two loops \ if statement.
Main goal is to make it run more efficiently since it's a part of a larger code and run on multiple threads.
If you can keep the dictionary structure it would help
The second dict is initialized with values 0 in advanced
dict_1 = {
"school": {
"main": ["first_key"]
},
"specific": {
"students": [
{
"name": "sam",
"age": 13
},
{
"name": "dan",
"age": 9
},
{
"name": "david",
"age": 20
},
{
"name": "mike",
"age": 5
}
],
"total_students": 4
}
}
dict_2 = {'sam': 0, 'mike': 0, 'david': 0, 'dan': 0, 'total_students': 0}
for i in dict_1["specific"]['students']:
for x in dict_2:
if x == i["name"]:
dict_2[x] = i["age"]
dict_2["total_students"] = dict_1["specific"]["total_students"]
print(dict_2)
Is there more elegant way of doing it?
You don't need two loops at all! You don't even need to initialize dict_2 in advance. Simply loop over dict_1["specific"]["students"] and assign the ages to dict_2 without an if.
for student in dict_1["specific"]["students"]:
student_name = student["name"]
student_age = student["age"]
dict_2[student_name] = student_age
You could also write this as a comprehension:
dict_2 = {student["name"]: student["age"] for student in dict_1["specific"]["students"]}
Both these give the following dict_2:
{'sam': 13, 'dan': 9, 'david': 20, 'mike': 5}
Then you can set dict_2["total_students"] like you already do, but outside any loops.
dict_2["total_students"] = dict_1["specific"]["total_students"]
If you only want to assign ages for students already in dict_2, you need the if. However, you can still do this with a single loop instead of two. :
for student in dict_1["specific"]["students"]:
student_name = student["name"]
if student_name not in dict_2:
continue # skip the rest of this loop
student_age = student["age"]
dict_2[student_name] = student_age
Both these approaches use a single loop, so they're going to be faster than your two-loop approach.
for i in dict_1['specific']['students']:
dict_2[i['name']]=i['age']
dict_2['total_students']=dict_1['specific']['total_students']
Runs in O(N) time and O(N) space.
Yours currently runs in O(N^2) time and O(N) space

How to set specific attributes of a dictionary to []

Let us imagine the following dictionary
dictionary = {
"key1": {
"value": [1, 3, 5],
},
"key2": {
"value": [1, 2, -1],
},
}
Is it possible to set all the "values" to [] without iterating over the dictionary keys? I want something like dictionary[]["value"]=[] such that all "value" attributes are set to []. But that doesn't work.
Because you need to avoid iteration, here is a little hacky way of solving the case.
Convert dictionary to string, replace and then back to dictionary:
import re, ast
dictionary = {
"key1": {
"value": [1, 3, 5],
},
"key2": {
"value": [1, 2, -1],
},
}
print(ast.literal_eval(re.sub(r'\[.*?\]', '[]', str(dictionary))))
# {'key1': {'value': []}, 'key2': {'value': []}}
I'm going to take a different tack here. Your question is a little misinformed. The implication is that it's "better" to avoid iterating dictionary keys. As mentioned, you can iterate over dictionary values. But, since internally Python stores dictionaries via two arrays, iteration is unavoidable.
Returning to your core question:
I want something like dictionary[]["value"]=[] such that all "value"
attributes are set to [].
Just use collections.defaultdict:
from collections import defaultdict
d = {k: defaultdict(list) for k in dictionary}
print(d['key1']['value']) # []
print(d['key2']['value']) # []
For the dictionary structure you have defined, this will certainly be more efficient than string conversion via repr + regex substitution.
If you insist on explicitly setting keys, you can avoid defaultdict at the cost of an inner dictionary comprehension:
d = {k: {i: [] for i in v} for k, v in dictionary.items()}
{'key1': {'value': []}, 'key2': {'value': []}}

Remove duplicate of a dictionary from list

How can i remove duplicate of the key "name"
[
{
'items':[
{
'$oid':'5a192d0590866ecc5c1f1683'
}
],
'image':'image12',
'_id':{
'$oid':'5a106f7490866e25ddf70cef'
},
'name':'Amala',
'store':{
'$oid':'5a0a10ad90866e5abae59470'
}
},
{
'items':[
{
'$oid':'5a192d2890866ecc5c1f1684'
}
],
'image':'fourth shit',
'_id':{
'$oid':'5a106fa190866e25ddf70cf0'
},
'name':'Amala',
'store':{
'$oid':'5a0a10ad90866e5abae59470'
}
}
]
I want to marge together dictionary with the same key "name"
Here is what i have tried
b = []
for q in data:
if len(data) == 0:
b.append(q)
else:
for y in b:
if q['name'] != y['name']:
b.append(q)
but after trying this the b list doesn't return unique dictionary that i wanted
You loop through the assembled list and if you find a dict with a different name, you add the current dict. The logic should be different: only add it if you don't find one with the same name!
That being said, you should maintain a set of seen names. That will make the check more performant:
b, seen = [], set()
for q in data:
if q['name'] not in seen:
b.append(q)
seen.add(q['name'])

How to join two lists of dictionaries in Python?

I have the following simple data structures:
teams = [ { 'league_id': 1, 'name': 'Kings' }, { 'league_id': 1, 'name': 'Sharkls' }, { 'league_id': 2, 'name': 'Reign' }, { 'league_id': 2, 'name': 'Heat' } ]
leagues = [ { 'league_id': 1, 'name': 'League 1' }, { 'league_id': 2, 'name': 'League 2' } ]
And I have the following dict comprehension:
league_teams = { x['league_id']: [ t['name']
for t in teams if t['league_id'] == x ['league_id'] ]
for x in leagues }
Which yields:
{1: ['Kings', 'Sharkls'], 2: ['Reign', 'Heat']}
Is there a simpler way using itertools or something to get that dict? This feels a little cumbersome.
Here's an adaptation of Moinuddin Quadri's O(n+m) solution that catches the "empty league" case, and which incidentally does not require any modules to be imported. The dict output does double-duty as his league_ids set, and since it's pre-initialized, it does not need to be a collections.defaultdict:
output = { league['league_id']:[] for league in leagues }
for team in teams:
if team['league_id'] in output:
output[team['league_id']].append(team['name'])
print(output)
The output is:
{1: ['Kings', 'Sharkls'], 2: ['Reign', 'Heat']}
You do not need itertools here, instead collections.defaultdict is better choice. Complexity of your solution is O(n*m) whereas with defaultdict, it will be O(n+m).
You can achieve what you want like:
from collections import defaultdict
# create set to store `league_id` in `leagues`. Set holds unique
# values and also searching in set is faster than in normal list
leagues_id = set([item['league_id'] for item in leagues])
my_dict = defaultdict(list)
for item in teams:
if item['league_id'] in leagues_id:
my_dict[item['league_id']].append(item['name'])
where at the end my_dict will hold the value:
{1: ['Kings', 'Sharkls'], 2: ['Reign', 'Heat']}
Edit: In case you also want entry in my_dict for the league_id present in leagues, but not in teams, you need to explictly make entries like:
for leagues_id in leagues_ids:
_ = my_dict[leagues_id] # Will create empty list for such ids
Checking t['league_id'] == x['league_id'] looks not necessary.
You can simplify with:
import collections
league_teams = collections.defaultdict(list)
for t in teams:
league_teams[t['league_id']].append(t['name'])
If you really want itertools for that:
import itertools
league_teams = {k: [t['name'] for t in g]
for k, g in itertools.groupby(teams, key=lambda t: t['league_id'])}
But it will only work if the teams list is sorted.

Converting a python dictionary to a JSON - like object

I know this is going to sound like I just need to use json.loads from the title. But I don't think that's the case here. Or at least I should be able to do this without any libraries (plus I want to understand this and not just solve it with a library method).
What I have currently is a dictionary where the keys are words and the values are total counts for those words:
myDict = { "word1": 12, "word2": 18, "word3": 4, "word4": 45 }
and so on...
what I want is for it to become something like the following (so that I can insert it into a scraperwiki datastore):
myNewDict = {"entry": "word1", "count": 12, "entry": "word2", "count": 18, "entry": "word3", "count": 4, "entry": "word4", "count": 45}
I figured I could just loop over myDict and insert each key/value after my new chosen keys "entry" and "count" (like so):
for k, v in myDict.iteritems():
myNewDict = { "entry": k, "count": v }
but with this, myNewDict is only saving the last result (so only giving me myNewDict={"entry":"word4", "total":45}
what am I missing here?
What you need is a list:
entries = []
for k, v in myDict.iteritems():
entries.append({ "entry": k, "count": v })
Or even better with list comprehensions:
entries = [{'entry': k, 'count': v} for k, v in myDict.iteritems()]
In more details, you were overriding myDict at each iteration of the loop, creating a new dict every time. In case you need it, you could can add key/values to a dict like this :
myDict['key'] = ...
.. but used inside a loop, this would override the previous value associated to 'key', hence the use of a list. In the same manner, if you type:
myNewDict = {"entry": "word1", "count": 12, "entry": "word2", "count": 18, "entry": "word3", "count": 4, "entry": "word4", "count": 45}
you actually get {'count': 45, 'entry': 'word4'} !
Note: I don't know what's the expected format of the output data but JSON-wise, a list of dicts should be correct (in JSON keys are unique too I believe).
While it's not clear 100% clear what output you want, if you want just a string in the format you've outlined above, you could modify your loop to be of the form:
myCustomFormat = '{'
for k, v in myDict.iteritems():
myCustomFormat += '"entry": {0}, "count": {1},'.format(k, v)
# Use slice to trim off trailing comma
myCustomFormat = myCustomFormat[:-1] + '}'
That being said, this probably isn't what you want. As others have pointed out, the duplicative nature of the "keys" will make this somewhat difficult to parse.

Categories

Resources