Python, collect data from an array of dicts

I'm new to Python, and I have this structure retrieved from a DB:
data=[
{'Value': '0.2', 'id': 1},
{'Value': '1.2', 'id': 1},
{'Value': '33.34', 'id': 2},
{'Value': '44.3', 'id': 3},
{'Value': '33.23', 'id': 3},
{'Value': '21.1', 'id': 4},
{'Value': '5.33', 'id': 4},
{'Value': '33.3', 'id': 5},
{'Value': '12.2', 'id': 5},
{'Value': '1.22', 'id': 5},
{'Value': '1.23', 'id': 6}
]
I know that I can get the id of a record with:
data[i]['id']
but I need to group the records by ID in a suitable data structure, in order to get the average value for every ID.
What is the best choice for this?
I'm thinking of building a new dict for every set of IDs, but the number of IDs can grow, and I can't figure out how to tackle this problem. If someone can give me some ideas, I would be very grateful.

Assuming your data is sorted by ID as it appears in your data variable, you can try using itertools.groupby, which can be instructed to group by id. You can then create a new dictionary that has keys equal to the id numbers and values equal to the means:
In [1]: from itertools import groupby
In [2]: data=[
...: {'Value': '0.2', 'id': 1},
...: {'Value': '1.2', 'id': 1},
...: {'Value': '33.34', 'id': 2},
...: {'Value': '44.3', 'id': 3},
...: {'Value': '33.23', 'id': 3},
...: {'Value': '21.1', 'id': 4},
...: {'Value': '5.33', 'id': 4},
...: {'Value': '33.3', 'id': 5},
...: {'Value': '12.2', 'id': 5},
...: {'Value': '1.22', 'id': 5},
...: {'Value': '1.23', 'id': 6}
...: ]
In [3]: means = {}
In [4]: for k, g in groupby(data, key=lambda x: x['id']):
...:     g = list(g)
...:     means[k] = sum(float(x['Value']) for x in g) / len(g)
...:
...:
In [5]: means
Out[5]:
{1: 0.69999999999999996,
2: 33.340000000000003,
3: 38.765000000000001,
4: 13.215,
5: 15.573333333333332,
6: 1.23}
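groupby only merges consecutive records that share a key, so if the rows ever come back from the DB in a different order, the same id ends up split across several groups and only the last group's mean survives in the dict. A minimal sketch that sorts before grouping, using the question's records shuffled for illustration and operator.itemgetter as the key function:

```python
from itertools import groupby
from operator import itemgetter

# The question's records, deliberately shuffled so they are no longer grouped by id
data = [
    {'Value': '33.3', 'id': 5},
    {'Value': '0.2', 'id': 1},
    {'Value': '44.3', 'id': 3},
    {'Value': '1.2', 'id': 1},
    {'Value': '12.2', 'id': 5},
    {'Value': '33.34', 'id': 2},
    {'Value': '33.23', 'id': 3},
    {'Value': '1.23', 'id': 6},
    {'Value': '21.1', 'id': 4},
    {'Value': '1.22', 'id': 5},
    {'Value': '5.33', 'id': 4},
]

by_id = itemgetter('id')
means = {}
# Sort by id first so that every id forms exactly one consecutive run
for k, g in groupby(sorted(data, key=by_id), key=by_id):
    values = [float(x['Value']) for x in g]
    means[k] = sum(values) / len(values)

print(means)
```

Sorting costs O(n log n), which is usually negligible next to fetching the rows.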

(Updated: after DSM's comment.)
You could reshape the data like this:
from collections import defaultdict
data=[
{'Value': '0.2', 'id': 1},
{'Value': '1.2', 'id': 1},
{'Value': '33.34', 'id': 2},
{'Value': '44.3', 'id': 3},
{'Value': '33.23', 'id': 3},
{'Value': '21.1', 'id': 4},
{'Value': '5.33', 'id': 4},
{'Value': '33.3', 'id': 5},
{'Value': '12.2', 'id': 5},
{'Value': '1.22', 'id': 5},
{'Value': '1.23', 'id': 6}
]
newdata = defaultdict(list)
for r in data:
    newdata[r['id']].append(float(r['Value']))
This would yield:
In [2]: newdata
Out[2]: defaultdict(<type 'list'>, {1: [0.2, 1.2], 2: [33.34], 3: [44.3, 33.23], 4: [21.1, 5.33], 5: [33.3, 12.2, 1.22], 6: [1.23]})
(Update 2)
Calculating the means is now simple with a dictionary comprehension:
mean = {k: sum(values) / len(values) for k, values in newdata.items()}
Which gives:
In [4]: mean
Out[4]: {1: 0.7, 2: 33.34, 3: 38.765, 4: 13.215, 5: 15.573333333333332, 6: 1.23}

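If you have numpy, you could use it for this easily, applying numpy.mean to the values of each id:
import numpy
ids = set(x['id'] for x in data)
means = {i: numpy.mean([float(x['Value']) for x in data if x['id'] == i]) for i in ids}
Otherwise, it would be as simple as:
from __future__ import division  # only needed on Python 2.7
means = {}
for i in set(x['id'] for x in data):
    values = [float(x['Value']) for x in data if x['id'] == i]
    means[i] = sum(values) / len(values)
print(means)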

You can simply create a list of IDs after all have been collected:
id_list = [element['id'] for element in data]
From there you can calculate whatever you want.
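For instance, collections.Counter turns that id list into the number of records per id, which is exactly the denominator each average needs. A minimal sketch, assuming the data list from the question (the totals/averages names are only illustrative):

```python
from collections import Counter

data = [
    {'Value': '0.2', 'id': 1}, {'Value': '1.2', 'id': 1},
    {'Value': '33.34', 'id': 2},
    {'Value': '44.3', 'id': 3}, {'Value': '33.23', 'id': 3},
    {'Value': '21.1', 'id': 4}, {'Value': '5.33', 'id': 4},
    {'Value': '33.3', 'id': 5}, {'Value': '12.2', 'id': 5}, {'Value': '1.22', 'id': 5},
    {'Value': '1.23', 'id': 6},
]

id_list = [element['id'] for element in data]

# Number of records per id
counts = Counter(id_list)

# Running total of the values per id
totals = {}
for element in data:
    totals[element['id']] = totals.get(element['id'], 0.0) + float(element['Value'])

# Average = total / number of records for that id
averages = {i: totals[i] / counts[i] for i in counts}
print(averages)
```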

Related

How to select only specific key-value pairs from a list of dictionaries?

This is my example:
dictlist = [{'Name': 'James', 'city': 'paris','type': 'A' },
{'Name': 'James','city': 'Porto','type': 'B'},
{'Name': 'Christian','city': 'LA','type': 'A'}]
I want to filter specific keys and values.
For example:
desiredKey = ['Name', 'type']
desiredoutput = [{'Name': 'James', 'type': 'A' },
{'Name': 'James', 'type': 'B'},
{'Name': 'Christian','type': 'A'}]
I tried this, but it doesn't work:
keys = dictlist[0].keys()
output= [d for d in dictlist if d.keys in desiredKey]
You can try something like this:
In [1]: dictlist = [{'Name': 'James', 'city': 'paris','type': 'A' },
...: {'Name': 'James','city': 'Porto','type': 'B'},
...: {'Name': 'Christian','city': 'LA','type': 'A'}]
In [2]: keys = ["Name","type"]
In [3]: res = []
In [5]: for dict1 in dictlist:
...:     result = dict((k, dict1[k]) for k in keys if k in dict1)
...:     res.append(result)
...:
In [6]: res
Out[6]:
[{'Name': 'James', 'type': 'A'},
{'Name': 'James', 'type': 'B'},
{'Name': 'Christian', 'type': 'A'}]
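On Python 2.7 and later, the same loop can be collapsed into a dict comprehension inside a list comprehension. A minimal sketch using the question's dictlist:

```python
dictlist = [{'Name': 'James', 'city': 'paris', 'type': 'A'},
            {'Name': 'James', 'city': 'Porto', 'type': 'B'},
            {'Name': 'Christian', 'city': 'LA', 'type': 'A'}]
keys = ["Name", "type"]

# Keep only the wanted keys; a key missing from a dict is skipped rather than raising
res = [{k: d[k] for k in keys if k in d} for d in dictlist]
print(res)
```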
It's a bit more complicated than this answer, but you can also use zip and operator.itemgetter.
In [42]: from operator import itemgetter
In [43]: list_of_dicts = [
...: {"a":1, "b":1, "c":1, "d":1},
...: {"a":2, "b":2, "c":2, "d":2},
...: {"a":3, "b":3, "c":3, "d":3},
...: {"a":4, "b":4, "c":4, "d":4}
...: ]
In [44]: allowed_keys = ("a", "c")
In [45]: filter_func = itemgetter(*allowed_keys)
In [46]: list_of_filtered_dicts = [
...: dict(zip(allowed_keys, filter_func(d)))
...: for d in list_of_dicts
...: ]
In [47]: list_of_filtered_dicts
Out[47]: [{'a': 1, 'c': 1}, {'a': 2, 'c': 2}, {'a': 3, 'c': 3}, {'a': 4, 'c': 4}]
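One catch with the itemgetter approach: when allowed_keys holds a single key, itemgetter returns the bare value instead of a 1-tuple, so zip either fails (for an int) or silently pairs the key with the first character (for a string). A minimal sketch that guards against this; filter_keys is a hypothetical helper name, it assumes at least one key, and it still raises KeyError if a dict lacks one of them:

```python
from operator import itemgetter

list_of_dicts = [
    {"a": 1, "b": 1, "c": 1, "d": 1},
    {"a": 2, "b": 2, "c": 2, "d": 2},
]

def filter_keys(dicts, allowed_keys):
    """Return new dicts containing only allowed_keys (every key must be present)."""
    getter = itemgetter(*allowed_keys)
    result = []
    for d in dicts:
        values = getter(d)
        # With exactly one key, itemgetter returns the value itself, not a tuple
        if len(allowed_keys) == 1:
            values = (values,)
        result.append(dict(zip(allowed_keys, values)))
    return result

print(filter_keys(list_of_dicts, ("a", "c")))
print(filter_keys(list_of_dicts, ("a",)))
```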

Python list of dictionaries - adding the dicts with same key names [duplicate]

I have a python list like this:
user = [
{'name': 'ozzy', 'quantity': 5},
{'name': 'frank', 'quantity': 4},
{'name': 'ozzy', 'quantity': 3},
{'name': 'frank', 'quantity': 2},
{'name': 'james', 'quantity': 7},
]
I am trying to write code that merges the dictionaries with the same name, adding up their quantities. The final list should be:
user = [
{'name': 'ozzy', 'quantity': 8},
{'name': 'frank', 'quantity': 6},
{'name': 'james', 'quantity': 7}
]
I have tried a few things, but I am struggling to get the right code. The code I have written below only partly adds the values (my actual list is much longer; I have just included a small portion for reference).
newList = []
quan = 0
for i in range(0,len(user)):
    originator = user[i]['name']
    for j in range(i+1,len(user)):
        if originator == user[j]['name']:
            quan = user[i]['quantity'] + user[j]['quantity']
            newList.append({'name': originator, 'Quantity': quan})
Can you please help me get the code right?
Just count the items in a collections.Counter, and expand back to list of dicts if needed:
user = [
{'name': 'ozzy', 'quantity': 5},
{'name': 'frank', 'quantity': 4},
{'name': 'ozzy', 'quantity': 3},
{'name': 'frank', 'quantity': 2},
{'name': 'james', 'quantity': 7},
]
import collections
d = collections.Counter()
for u in user:
    d[u['name']] += u['quantity']
print(dict(d))
newlist = [{'name' : k, 'quantity' : v} for k,v in d.items()]
print(newlist)
This outputs the Counter as a dict first, which is already sufficient:
{'frank': 6, 'ozzy': 8, 'james': 7}
and the reformatted output using list of dicts:
[{'name': 'frank', 'quantity': 6}, {'name': 'ozzy', 'quantity': 8}, {'name': 'james', 'quantity': 7}]
The solution is also straightforward with a standard dictionary. No need for Counter or OrderedDict here:
user = [
{'name': 'ozzy', 'quantity': 5},
{'name': 'frank', 'quantity': 4},
{'name': 'ozzy', 'quantity': 3},
{'name': 'frank', 'quantity': 2},
{'name': 'james', 'quantity': 7},
]
dic = {}
for item in user:
    n, q = item['name'], item['quantity']
    dic[n] = dic.get(n, 0) + q
print(dic)
user = [{'name':n, 'quantity':q} for n,q in dic.items()]
print(user)
Result:
{'ozzy': 8, 'frank': 6, 'james': 7}
[{'name': 'ozzy', 'quantity': 8}, {'name': 'frank', 'quantity': 6}, {'name': 'james', 'quantity': 7}]
I would suggest changing the shape of the output to a plain dictionary so that it is actually usable. Consider something like this:
user = [
{'name': 'ozzy', 'quantity': 5},
{'name': 'frank', 'quantity': 4},
{'name': 'ozzy', 'quantity': 3},
{'name': 'frank', 'quantity': 2},
{'name': 'james', 'quantity': 7},
]
data = {}
for i in user:
    if i['name'] in data:
        data[i['name']] += i['quantity']
    else:
        data[i['name']] = i['quantity']
print(data)
{'frank': 6, 'james': 7, 'ozzy': 8}
If you need to maintain the original relative order:
from collections import OrderedDict
user = [
{'name': 'ozzy', 'quantity': 5},
{'name': 'frank', 'quantity': 4},
{'name': 'ozzy', 'quantity': 3},
{'name': 'frank', 'quantity': 2},
{'name': 'james', 'quantity': 7},
]
d = OrderedDict()
for item in user:
    d[item['name']] = d.get(item['name'], 0) + item['quantity']
newlist = [{'name' : k, 'quantity' : v} for k, v in d.items()]
print(newlist)
Output:
[{'name': 'ozzy', 'quantity': 8}, {'name': 'frank', 'quantity': 6}, {'name': 'james', 'quantity': 7}]
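On Python 3.7 and later, a plain dict is guaranteed to preserve insertion order, so the same ordered result can be had without OrderedDict. A minimal sketch, assuming Python 3.7+:

```python
user = [
    {'name': 'ozzy', 'quantity': 5},
    {'name': 'frank', 'quantity': 4},
    {'name': 'ozzy', 'quantity': 3},
    {'name': 'frank', 'quantity': 2},
    {'name': 'james', 'quantity': 7},
]

# A plain dict keeps names in the order they are first seen (Python 3.7+)
totals = {}
for item in user:
    totals[item['name']] = totals.get(item['name'], 0) + item['quantity']

newlist = [{'name': k, 'quantity': v} for k, v in totals.items()]
print(newlist)
```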
user = [
{'name': 'ozzy', 'quantity': 5},
{'name': 'frank', 'quantity': 4},
{'name': 'ozzy', 'quantity': 3},
{'name': 'frank', 'quantity': 2},
{'name': 'james', 'quantity': 7},
]
reference_dict = {}
for item in user:
    reference_dict[item['name']] = reference_dict.get(item['name'], 0) + item['quantity']
# Creating new list from reference dict
updated_user = [{'name': k, 'quantity': v} for k, v in reference_dict.items()]
print(updated_user)

What Is a Pythonic Way to Build a Dict of Dictionary-Lists by Attribute?

I'm looking for a Pythonic way to convert a list of dicts which looks like this:
res = [{'type': 1, 'name': 'Nick'}, {'type': 2, 'name': 'Helma'}, ...]
To a dict like this:
{1: [{'type': 1, 'name': 'Nick'}, ...], 2: [{'type': 2, 'name': 'Helma'}, ...]}
Currently I do this with code like this (based on this question):
from collections import defaultdict
d = defaultdict(list)
for v in res:
    d[v["type"]].append(v)
Is this a Pythonic way to build dict of lists of objects by attribute?
I agree with the commentators that here, list comprehension will lack, well, comprehension.
Having said that, here's how it can go:
import itertools
a = [{'type': 1, 'name': 'Nick'}, {'type': 2, 'name': 'Helma'}, {'type': 1, 'name': 'Moshe'}]
by_type = lambda a: a['type']
>>> dict([(k, list(g)) for (k, g) in itertools.groupby(sorted(a, key=by_type), key=by_type)])
{1: [{'name': 'Nick', 'type': 1}, {'name': 'Moshe', 'type': 1}], ...}
The code first sorts by 'type', then uses itertools.groupby to group by the exact same criteria.
I stopped understanding this code 15 seconds after I finished writing it :-)
You could do it with a dictionary comprehension, which wouldn't be as illegible or incomprehensible as the comments suggest (IMHO):
# A collection of name and type dictionaries
res = [{'type': 1, 'name': 'Nick'},
{'type': 2, 'name': 'Helma'},
{'type': 3, 'name': 'Steve'},
{'type': 1, 'name': 'Billy'},
{'type': 3, 'name': 'George'},
{'type': 4, 'name': 'Sylvie'},
{'type': 2, 'name': 'Wilfred'},
{'type': 1, 'name': 'Jim'}]
# Creating a dictionary by type
res_new = {
item['type']: [each for each in res
if each['type'] == item['type']]
for item in res
}
>>> res_new
{1: [{'name': 'Nick', 'type': 1},
{'name': 'Billy', 'type': 1},
{'name': 'Jim', 'type': 1}],
2: [{'name': 'Helma', 'type': 2},
{'name': 'Wilfred', 'type': 2}],
3: [{'name': 'Steve', 'type': 3},
{'name': 'George', 'type': 3}],
4: [{'name': 'Sylvie', 'type': 4}]}
Unless I missed something, this should give you the result you're looking for.
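The comprehension above rescans res once for every item, so its work grows quadratically with the length of the list. A minimal single-pass sketch that produces the same grouping with plain dict.setdefault (the same idea as the question's defaultdict loop, without the import):

```python
res = [{'type': 1, 'name': 'Nick'},
       {'type': 2, 'name': 'Helma'},
       {'type': 3, 'name': 'Steve'},
       {'type': 1, 'name': 'Billy'},
       {'type': 3, 'name': 'George'},
       {'type': 4, 'name': 'Sylvie'},
       {'type': 2, 'name': 'Wilfred'},
       {'type': 1, 'name': 'Jim'}]

res_new = {}
for item in res:
    # setdefault creates an empty list the first time a type is seen
    res_new.setdefault(item['type'], []).append(item)

print(res_new)
```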

Count the number of items from a dictionary inside another dictionary in Python

What is the best (most elegant) way to count the number of {'id': n} items?
'childs': {
1: [{'id': 1}, {'id': 2}, {'id': 3}],
2: [{'id': 4}],
3: [{'id': 5}, {'id': 6},]
}
>>> 6
You should reply to the comments so that you can get a proper answer.
In the meantime, with d being the dict, I'd go with:
sum(len(x) for x in d['childs'].values())
If you don't want to use for, you can do:
sum(map(len, d['childs'].values()))
Or, more convoluted:
from functools import reduce
reduce(lambda x, y: x + len(y), d['childs'].values(), 0)
But really the first version is how you would do it. It's classic Python.
As far as I understand your problem, the main task is to count the total number of occurrences of the key "id", whatever its value n. Here the dictionary name is "Childs".
count = 0
for key, value in Childs.items():
    for item in value:
        if "id" in item:
            count += 1
print(count)
Hope it helps!
cat = {1: [{'id': 1}, {'id': 2}, {'id': 3}], 2: [{'id': 4}], 3: [{'id': 5}, {'id': 6},]}
sum(len(x) for x in cat.values())
myDict = {'childs': {
1: [{'id': 1}, {'id': 2}, {'id': 3}],
2: [{'id': 4}],
3: [{'id': 5}, {'id': 6},]
}}
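def checkIfItemIsValid(item):
    # Placeholder check: replace with whatever rule decides that an item counts
    return 'id' in item
count = 0
for key in myDict["childs"]:
    for item in myDict["childs"][key]:
        if checkIfItemIsValid(item):
            count += 1
print(count)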
d = {
1: [{'id': 1}, {'id': 2}, {'id': 3}],
2: [{'id': 4}],
3: [{'id': 5}, {'id': 6},]
}
sum(len(i) for i in d.values())
# 6
But this assumes that every member of that child value list is actually an item you want to count.
Here is one other way without using for:
sum(map(len, d.values()))
In case some items may not have the 'id' key and you are only trying to count those which have that key:
sum(1 for i in d.values() for obj in i if 'id' in obj)
or
len([obj for i in d.values() for obj in i if 'id' in obj])

Given a list of dictionaries, how can I eliminate duplicates of one key, and sort by another

I'm working with a list of dict objects that looks like this (the order of the objects differs):
[
{'name': 'Foo', 'score': 1},
{'name': 'Bar', 'score': 2},
{'name': 'Foo', 'score': 3},
{'name': 'Bar', 'score': 3},
{'name': 'Foo', 'score': 2},
{'name': 'Baz', 'score': 2},
{'name': 'Baz', 'score': 1},
{'name': 'Bar', 'score': 1}
]
What I want to do is remove duplicate names, keeping only the one of each name that has the highest 'score'. The results from the above list would be:
[
{'name': 'Baz', 'score': 2},
{'name': 'Foo', 'score': 3},
{'name': 'Bar', 'score': 3}
]
I'm not sure which pattern to use here (aside from a seemingly idiotic loop that keeps checking whether the current dict's 'name' is already in the list, and then whether its 'score' is higher than the existing one's 'score').
One way to do that is:
import collections
data = collections.defaultdict(list)
for i in my_list:
    data[i['name']].append(i['score'])
output = [{'name': i, 'score': max(j)} for i,j in data.items()]
so output will be:
[{'score': 2, 'name': 'Baz'},
{'score': 3, 'name': 'Foo'},
{'score': 3, 'name': 'Bar'}]
There's no need for defaultdicts or sets here. You can just use dirt simple dicts and lists.
Summarize the best running score in a dictionary and convert the result back into a list:
>>> s = [
...     {'name': 'Foo', 'score': 1},
...     {'name': 'Bar', 'score': 2},
...     {'name': 'Foo', 'score': 3},
...     {'name': 'Bar', 'score': 3},
...     {'name': 'Foo', 'score': 2},
...     {'name': 'Baz', 'score': 2},
...     {'name': 'Baz', 'score': 1},
...     {'name': 'Bar', 'score': 1}
... ]
>>> d = {}
>>> for entry in s:
...     name, score = entry['name'], entry['score']
...     d[name] = max(d.get(name, 0), score)
...
>>> [{'name': name, 'score': score} for name, score in d.items()]
[{'score': 2, 'name': 'Baz'}, {'score': 3, 'name': 'Foo'}, {'score': 3, 'name': 'Bar'}]
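Note that d.get(name, 0) treats an unseen name as having a score of 0, so if scores can be negative, a name whose best score is below zero would come out as 0. A minimal sketch that only compares against a previously stored score, using made-up sample data with negative scores:

```python
s = [
    {'name': 'Foo', 'score': -3},
    {'name': 'Foo', 'score': -1},
    {'name': 'Bar', 'score': 2},
]

d = {}
for entry in s:
    name, score = entry['name'], entry['score']
    # Store the score if the name is new, or if it beats the stored one
    if name not in d or score > d[name]:
        d[name] = score

result = [{'name': name, 'score': score} for name, score in d.items()]
print(result)
```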
Just for fun, here is a purely functional approach:
>>> list(map(dict, dict(sorted(map(sorted, map(dict.items, s)))).items()))
[{'score': 3, 'name': 'Bar'}, {'score': 2, 'name': 'Baz'}, {'score': 3, 'name': 'Foo'}]
Sorting is half the battle.
import itertools
import operator
scores = [
{'name': 'Foo', 'score': 1},
{'name': 'Bar', 'score': 2},
{'name': 'Foo', 'score': 3},
{'name': 'Bar', 'score': 3},
{'name': 'Foo', 'score': 2},
{'name': 'Baz', 'score': 2},
{'name': 'Baz', 'score': 1},
{'name': 'Bar', 'score': 1}
]
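result = []
# Sort by name, then score, both descending, so the first entry for each name has its highest score
sl = sorted(scores, key=operator.itemgetter('name', 'score'),
            reverse=True)
name = object()  # sentinel that never equals a real name
for el in sl:
    if el['name'] == name:
        continue
    name = el['name']
    result.append(el)
print(result)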
This is the simplest way I can think of:
names = set(d['name'] for d in my_dicts)
new_dicts = []
for name in names:
    d = dict(name=name)
    d['score'] = max(x['score'] for x in my_dicts if x['name'] == name)
    new_dicts.append(d)
#new_dicts
[{'score': 2, 'name': 'Baz'},
{'score': 3, 'name': 'Foo'},
{'score': 3, 'name': 'Bar'}]
Personally, I prefer not to import modules when the problem is too small.
In case you haven't heard of groupby, this is a nice use of it:
from itertools import groupby
data=[
{'name': 'Foo', 'score': 1},
{'name': 'Bar', 'score': 2},
{'name': 'Foo', 'score': 3},
{'name': 'Bar', 'score': 3},
{'name': 'Foo', 'score': 2},
{'name': 'Baz', 'score': 2},
{'name': 'Baz', 'score': 1},
{'name': 'Bar', 'score': 1}
]
keyfunc = lambda d: d['name']
data.sort(key=keyfunc)
ans = []
for k, g in groupby(data, keyfunc):
    ans.append({'name': k, 'score': max(d['score'] for d in g)})
print(ans)
[{'name': 'Bar', 'score': 3}, {'name': 'Baz', 'score': 2}, {'name': 'Foo', 'score': 3}]
I think I can come up with a one-liner here:
result = list(dict((x['name'], x) for x in sorted(data, key=lambda x: x['score'])).values())
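That one-liner relies on later assignments to the same dict key overwriting earlier ones, so feeding the records in ascending score order leaves the highest score per name. The question title also asks to sort by another key; a minimal sketch that deduplicates the same way and then sorts the survivors, assuming the wanted order is score descending with ties broken by name:

```python
data = [
    {'name': 'Foo', 'score': 1},
    {'name': 'Bar', 'score': 2},
    {'name': 'Foo', 'score': 3},
    {'name': 'Bar', 'score': 3},
    {'name': 'Foo', 'score': 2},
    {'name': 'Baz', 'score': 2},
    {'name': 'Baz', 'score': 1},
    {'name': 'Bar', 'score': 1},
]

# Later entries overwrite earlier ones, so ascending score order keeps each name's best
best = {x['name']: x for x in sorted(data, key=lambda x: x['score'])}

# Highest score first; equal scores are ordered alphabetically by name
result = sorted(best.values(), key=lambda x: (-x['score'], x['name']))
print(result)
```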
