Using default dictionaries problem (Python)

I have a slightly weird input of data that is in this format:
data = {'sensor1': {'units': 'x', 'values': [{'time': '17:00', 'value': 10},
                                             {'time': '17:10', 'value': 12},
                                             {'time': '17:20', 'value': 7}, ...]},
        'sensor2': {'units': 'x', 'values': [{'time': '17:00', 'value': 9},
                                             {'time': '17:20', 'value': 11}, ...]}
        }
And I want to collect the output to look like:
{'17:00': [10,9], '17:10': [12,], '17:20': [7,11], ... }
So the keys are the unique timestamps (ordered) and the values are lists of each sensor's value, in the order the sensors appear in the original dictionary. If a sensor has no value for a timestamp, it is just left as an empty element ''. I know I might need to use defaultdict, but I've not had any success.
e.g.
s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = defaultdict(list)
for k, v in s:
    d[k].append(v)
sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
d = defaultdict(default_factory=list)
values_list = data.values()
for item in values_list:
    for k, v in item['values']:
        d[k].append(v)
result = sorted(d.items())
Encounters key error as each item in values_list is not a tuple but a dict.
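Two separate things go wrong in the attempt above, and a short sketch makes both visible. First, defaultdict takes the factory as its first positional argument; a keyword argument is stored as an ordinary key, so no factory is set. Second, unpacking a dict with `k, v = ...` yields its keys, not a (time, value) pair. The `reading` dict below is illustrative sample data, not part of the original input.

```python
from collections import defaultdict

# Keyword arguments become initial dict contents, not the factory
d = defaultdict(default_factory=list)
print(d.default_factory)   # None
print(list(d))             # ['default_factory']

# Unpacking a dict iterates over its keys
reading = {'time': '17:00', 'value': 10}   # illustrative sample
k, v = reading
print(k, v)                # time value

# With no factory set, looking up a missing key raises KeyError
try:
    d[k].append(v)
except KeyError as exc:
    print('KeyError:', exc)
```

Writing `defaultdict(list)` and reading `pair['time']` / `pair['value']` explicitly avoids both problems.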

You can also use dict in this way:
data = {'sensor1': {'units': 'x', 'values': [{'time': '17:00', 'value': 10},
{'time': '17:10', 'value': 12},
{'time': '17:20', 'value': 7},
]},
'sensor2': {'units': 'x', 'values': [{'time': '17:00', 'value': 9},
{'time': '17:20', 'value': 11},
]}
}
d = {}
for item in data.values():
    for pair in item['values']:
        if pair["time"] in d:
            d[pair["time"]].append(pair["value"])
        else:
            d[pair["time"]] = [pair["value"]]
result = sorted(d.items())
print(result)
Output:
[('17:00', [10, 9]), ('17:10', [12]), ('17:20', [7, 11])]
Using defaultdict (see the defaultdict example with list in the Python documentation):
from collections import defaultdict
data = {'sensor1': {'units': 'x', 'values': [{'time': '17:00', 'value': 10},
{'time': '17:10', 'value': 12},
{'time': '17:20', 'value': 7},
]},
'sensor2': {'units': 'x', 'values': [{'time': '17:00', 'value': 9},
{'time': '17:20', 'value': 11},
]}
}
d = defaultdict(list)
for item in data.values():
    for pair in item['values']:
        d[pair["time"]].append(pair["value"])
result = sorted(d.items())
print(result)
Output:
[('17:00', [10, 9]), ('17:10', [12]), ('17:20', [7, 11])]
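Neither version above leaves a placeholder when a sensor has no reading for a timestamp, which the question asked for. A minimal sketch of one way to do that, assuming the missing slot should hold an empty string '' and that each sensor has at most one reading per timestamp:

```python
data = {'sensor1': {'units': 'x', 'values': [{'time': '17:00', 'value': 10},
                                             {'time': '17:10', 'value': 12},
                                             {'time': '17:20', 'value': 7}]},
        'sensor2': {'units': 'x', 'values': [{'time': '17:00', 'value': 9},
                                             {'time': '17:20', 'value': 11}]}}

# Every timestamp seen in any sensor, in sorted order
times = sorted({r['time'] for s in data.values() for r in s['values']})

result = {t: [] for t in times}
for sensor in data.values():
    # Per-sensor lookup table: time -> value
    readings = {r['time']: r['value'] for r in sensor['values']}
    for t in times:
        # '' marks a timestamp this sensor did not report
        result[t].append(readings.get(t, ''))

print(result)
# {'17:00': [10, 9], '17:10': [12, ''], '17:20': [7, 11]}
```

Every list then has one slot per sensor, so a value's position identifies which sensor it came from.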

Related

How to combine all related dictionaries in a python list [duplicate]

Python newb...
I have a list of dicts that I am trying to organize into the same month & year:
[{'date':'2008-04-23','value':'1'},
{'date':'2008-04-01','value':'8'},
{'date':'2008-04-05','value':'3'},
{'date':'2009-04-19','value':'5'},
{'date':'2009-04-21','value':'8'},
{'date':'2010-09-09','value':'3'},
{'date':'2010-09-10','value':'4'},
]
What I'm trying to get is a list of dicts like this:
[{'date':2008-04-01,'value':'12'},
{'date':2009-04-01,'value':'13'},
{'date':2010-09-01,'value':'7'},
]
Here's my code, which is just printing an empty list:
from datetime import datetime
myList = [{'date':'2008-04-23','value':'1'}, {'date':'2008-04-01','value':'8'}, {'date':'2008-04-05','value':'3'}, {'date':'2009-04-19','value':'5'}, {'date':'2009-04-21','value':'8'},{'date':'2010-09-09','value':'3'},
{'date':'2010-09-10','value':'4'},
]
newList = []
newDict = {}
for cnt in range(len(myList)):
    for k,v in myList[cnt].iteritems():
        if k == 'date':
            d = datetime.strptime(v,'%Y-%m-%d').date()
            for elem in newList:
                if elem['date'] != d:
                    newList.append({'date':d,'value':myList[cnt]['value']})
                else:
                    newList[cnt]['value'] += myList[cnt]['value']
print newList
First, I would sort the data [1]:
>>> lst = [{'date':'2008-04-23','value':'1'},
... {'date':'2008-04-01','value':'8'},
... {'date':'2008-04-05','value':'3'},
... {'date':'2009-04-19','value':'5'},
... {'date':'2009-04-21','value':'8'},
... {'date':'2010-09-09','value':'3'},
... {'date':'2010-09-10','value':'4'},
... ]
>>> lst.sort(key=lambda x:x['date'][:7])
>>> lst
[{'date': '2008-04-23', 'value': '1'}, {'date': '2008-04-01', 'value': '8'}, {'date': '2008-04-05', 'value': '3'}, {'date': '2009-04-19', 'value': '5'}, {'date': '2009-04-21', 'value': '8'}, {'date': '2010-09-09', 'value': '3'}, {'date': '2010-09-10', 'value': '4'}]
Then, I would use itertools.groupby to do the grouping:
>>> from itertools import groupby
>>> for k,v in groupby(lst,key=lambda x:x['date'][:7]):
...     print(k, list(v))
...
2008-04 [{'date': '2008-04-23', 'value': '1'}, {'date': '2008-04-01', 'value': '8'}, {'date': '2008-04-05', 'value': '3'}]
2009-04 [{'date': '2009-04-19', 'value': '5'}, {'date': '2009-04-21', 'value': '8'}]
2010-09 [{'date': '2010-09-09', 'value': '3'}, {'date': '2010-09-10', 'value': '4'}]
>>>
Now, to get the output you wanted:
>>> for k,v in groupby(lst,key=lambda x:x['date'][:7]):
...     print({'date':k+'-01','value':sum(int(d['value']) for d in v)})
...
{'date': '2008-04-01', 'value': 12}
{'date': '2009-04-01', 'value': 13}
{'date': '2010-09-01', 'value': 7}
[1] Your data actually already appears to be sorted in this regard, so you might be able to skip this step.
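The sort matters because itertools.groupby only merges consecutive items that share a key. A small sketch with made-up, deliberately unsorted dates (the `dates` list and `month` helper are illustrative, not from the question):

```python
from itertools import groupby

dates = ['2008-04-23', '2009-04-19', '2008-04-01']  # hypothetical, unsorted

def month(s):
    # 'YYYY-MM-DD' -> 'YYYY-MM'
    return s[:7]

# Without sorting, 2008-04 shows up as two separate groups
unsorted_groups = [(k, list(g)) for k, g in groupby(dates, key=month)]

# After sorting by the same key, each month forms a single group
sorted_groups = [(k, list(g))
                 for k, g in groupby(sorted(dates, key=month), key=month)]

print(unsorted_groups)
print(sorted_groups)
```

If the input is not already ordered by month, skipping the sort would produce duplicate month entries rather than an error, so the mistake is easy to miss.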
Use itertools.groupby:
data = [{'date':'2008-04-23','value':'1'},
{'date':'2008-04-01','value':'8'},
{'date':'2008-04-05','value':'3'},
{'date':'2009-04-19','value':'5'},
{'date':'2009-04-21','value':'8'},
{'date':'2010-09-09','value':'3'},
{'date':'2010-09-10','value':'4'},
]
import itertools
# 'YYYY-MM-DD' -> 'YYYY-MM'
month_of = lambda datum: datum['date'].rsplit('-', 1)[0]
data.sort(key=month_of)
result = [{
    'date': month + '-01',
    'value': sum(int(item['value']) for item in group)
} for month, group in itertools.groupby(data, key=month_of)]
print(result)
# [{'date': '2008-04-01', 'value': 12},
# {'date': '2009-04-01', 'value': 13},
# {'date': '2010-09-01', 'value': 7}]
The accepted answer is correct, but its time complexity is O(n lg n) because of the sorting. Here's an (amortized) O(n) solution.
>>> L=[{'date':'2008-04-23','value':'1'},
... {'date':'2008-04-01','value':'8'},
... {'date':'2008-04-05','value':'3'},
... {'date':'2009-04-19','value':'5'},
... {'date':'2009-04-21','value':'8'},
... {'date':'2010-09-09','value':'3'},
... {'date':'2010-09-10','value':'4'},
... ]
This is what a Counter is made for:
>>> import collections
>>> value_by_month = collections.Counter()
>>> for d in L:
...     value_by_month[d['date'][:7]+'-01'] += int(d['value'])
...
>>> value_by_month
Counter({'2009-04-01': 13, '2008-04-01': 12, '2010-09-01': 7})
And if your output has to be a dict object:
>>> dict(value_by_month)
{'2008-04-01': 12, '2009-04-01': 13, '2010-09-01': 7}
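If the list-of-dicts shape from the question is needed, the Counter can be expanded back. A minimal sketch, assuming the result should be ordered by date (the sort only touches the distinct months, not every input row):

```python
import collections

L = [{'date': '2008-04-23', 'value': '1'},
     {'date': '2008-04-01', 'value': '8'},
     {'date': '2008-04-05', 'value': '3'},
     {'date': '2009-04-19', 'value': '5'},
     {'date': '2009-04-21', 'value': '8'},
     {'date': '2010-09-09', 'value': '3'},
     {'date': '2010-09-10', 'value': '4'}]

value_by_month = collections.Counter()
for d in L:
    value_by_month[d['date'][:7] + '-01'] += int(d['value'])

# ISO dates sort correctly as plain strings
result = [{'date': date, 'value': total}
          for date, total in sorted(value_by_month.items())]
print(result)
# [{'date': '2008-04-01', 'value': 12}, {'date': '2009-04-01', 'value': 13}, {'date': '2010-09-01', 'value': 7}]
```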
Bonus: if you want to avoid imports.
First, create a dict month -> list of values. The function setdefault is handy for building this type of dict:
>>> values_by_month = {}
>>> for d in L:
...     values_by_month.setdefault(d['date'][:7], []).append(int(d['value']))
...
>>> values_by_month
{'2008-04': [1, 8, 3], '2009-04': [5, 8], '2010-09': [3, 4]}
Second, sum the values by month and set the date to first day:
>>> [{'date':m+'-01', 'value':sum(vs)} for m, vs in values_by_month.items()]
[{'date': '2008-04-01', 'value': 12}, {'date': '2009-04-01', 'value': 13}, {'date': '2010-09-01', 'value': 7}]

Fast way to check whether number in list is in given range [duplicate]

I have a list of dictionaries in the following way:
list1 = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
{'some_id': 2, 'lower_range': 8, 'upper_range': 12},
{'some_id': 3, 'lower_range': 13, 'upper_range': 16}]
A second list contains some integers:
list2 = [{'value': 4, 'data': 'A'},
{'value': 8, 'data': 'B'},
{'value': 9, 'data': 'C'},
{'value': 15, 'data': 'D'}]
I now want to join 'some_id' and 'data' such that 'value' is between 'lower_range' and 'upper_range' in a new list. I.e., I want the output to be
list3 = [{'some_id': 1, 'data': 'A'},
{'some_id': 2, 'data': 'B'},
{'some_id': 2, 'data': 'C'},
{'some_id': 3, 'data': 'D'}]
One way to do this would be
list3 = []
for i in list1:
    for j in list2:
        if (j['value'] >= i['lower_range'] and
                j['value'] <= i['upper_range']):
            list3.append({'some_id': i['some_id'], 'data': j['data']})
However, this seems highly inefficient. Is there some faster way?
There is a special premise that the ranges do not overlap.
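So for each value we can find the only possible candidate by searching for the range with the largest lower_range that does not exceed the value, and then checking the value against that range's upper_range.
Binary search can reduce complexity from O(n*n) to O(n log n).
In Python 3, we can use the bisect module.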
list1 = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
{'some_id': 2, 'lower_range': 8, 'upper_range': 12},
{'some_id': 3, 'lower_range': 13, 'upper_range': 16}]
list2 = [{'value': 4, 'data': 'A'},
{'value': 8, 'data': 'B'},
{'value': 9, 'data': 'C'},
{'value': 15, 'data': 'D'}]
list3 = []
list1.sort(key = lambda r: r['lower_range'])
lower_ranges = [r['lower_range'] for r in list1]
from bisect import bisect_right
for record in list2:
    # index of the last range whose lower_range <= value
    position = bisect_right(lower_ranges, record['value']) - 1
    if position < 0:
        continue  # value is below every range
    candidate = list1[position]
    if record['value'] <= candidate['upper_range']:
        list3.append({'some_id': candidate['some_id'], 'data': record['data']})
print(list3)
Output (manually indented):
[{'some_id': 1, 'data': 'A'},
{'some_id': 2, 'data': 'B'},
{'some_id': 2, 'data': 'C'},
{'some_id': 3, 'data': 'D'}]
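To see why the `- 1` and the upper_range check are both needed, here is a small sketch of what bisect_right returns for the lower bounds used above. The probe values 2 and 20 are added for illustration and are not in the question's data.

```python
from bisect import bisect_right

lower_ranges = [3, 8, 13]   # sorted lower bounds from list1

# Candidate index for each probe value
positions = {v: bisect_right(lower_ranges, v) - 1 for v in (2, 4, 8, 15, 20)}
print(positions)
# {2: -1, 4: 0, 8: 1, 15: 2, 20: 2}
```

A result of -1 means the value lies below every range. A value such as 20 still gets candidate index 2, which is why the code must compare against that range's upper_range (16) before accepting the match.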
This is a bit verbose but should be more efficient (O(nlogn) < O(n^2)) due to sorting (you can also sort in-place with list.sort):
#!/usr/bin/env python
from operator import itemgetter
list1 = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
{'some_id': 2, 'lower_range': 8, 'upper_range': 12},
{'some_id': 3, 'lower_range': 13, 'upper_range': 16}]
list2 = [{'value': 4, 'data': 'A'},
{'value': 8, 'data': 'B'},
{'value': 9, 'data': 'C'},
{'value': 15, 'data': 'D'}]
# sort before merging so we iterate less (O(nlogn))
list1 = sorted(list1, key=itemgetter('lower_range'))
list2 = sorted(list2, key=itemgetter('value'))
it1 = iter(list1)
it2 = iter(list2)
# merge lists that we know are sorted (simple merging algorithm - O(n))
list3 = []
try:
    curr_range = next(it1)
    curr_val = next(it2)
    while True:
        value = curr_val['value']
        if curr_range['lower_range'] <= value <= curr_range['upper_range']:
            # got a match, add it and check if there are more values
            list3.append({'some_id': curr_range['some_id'],
                          'data': curr_val['data']})
            curr_val = next(it2)
        elif value < curr_range['lower_range']:
            # no match, skip to next value
            curr_val = next(it2)
        else:
            # value is above this range, try the next one
            curr_range = next(it1)
except StopIteration:
    # one of the two lists is exhausted
    pass
print(list3)
Gives:
[{'data': 'A', 'some_id': 1},
{'data': 'B', 'some_id': 2},
{'data': 'C', 'some_id': 2},
{'data': 'D', 'some_id': 3}]
You could create a dict that maps values to ids like {3: 1, 4: 1, 5: 1, ..., 8: 2, 9: 2, ...}, which would let you find each dict's id in constant O(1) time:
# create a dict that maps values to ids
value_to_id_dict = {}
for dic in list1:
    id_ = dic['some_id']
    for value in range(dic['lower_range'], dic['upper_range']+1):
        value_to_id_dict[value] = id_
# look up each dict's id in the dict we just created
list3 = []
for dic in list2:
    new_dic = {'data': dic['data'],
               'some_id': value_to_id_dict[dic['value']]}
    list3.append(new_dic)
# result:
# [{'data': 'A', 'some_id': 1},
# {'data': 'B', 'some_id': 2},
# {'data': 'C', 'some_id': 2},
# {'data': 'D', 'some_id': 3}]
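The lookup above raises KeyError for a value that falls in no range. A possible variation, sketched here with dict.get and one extra out-of-range record ('E', added purely for illustration), skips such values instead. Note that this approach only works for integer values, and the table grows with the total width of the ranges.

```python
list1 = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
         {'some_id': 2, 'lower_range': 8, 'upper_range': 12},
         {'some_id': 3, 'lower_range': 13, 'upper_range': 16}]
list2 = [{'value': 4, 'data': 'A'},
         {'value': 8, 'data': 'B'},
         {'value': 9, 'data': 'C'},
         {'value': 15, 'data': 'D'},
         {'value': 20, 'data': 'E'}]   # hypothetical record outside every range

# Map every integer covered by a range to that range's id
value_to_id = {}
for rng in list1:
    for value in range(rng['lower_range'], rng['upper_range'] + 1):
        value_to_id[value] = rng['some_id']

list3 = []
for record in list2:
    some_id = value_to_id.get(record['value'])   # None when no range matches
    if some_id is not None:
        list3.append({'some_id': some_id, 'data': record['data']})

print(list3)
```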

Python list of dictionaries - adding the dicts with same key names [duplicate]

I have a python list like this:
user = [
{'name': 'ozzy', 'quantity': 5},
{'name': 'frank', 'quantity': 4},
{'name': 'ozzy', 'quantity': 3},
{'name': 'frank', 'quantity': 2},
{'name': 'james', 'quantity': 7},
]
I am trying to write code that joins the dictionaries with the same name and adds up their quantities. The final list should be:
user = [
{'name': 'ozzy', 'quantity': 8},
{'name': 'frank', 'quantity': 6},
{'name': 'james', 'quantity': 7}
]
I have tried a few things but I am struggling to get the right code. The code I have written below is somewhat adding the values (actually my list is much longer, I have just added a small portion for reference).
newList = []
quan = 0
for i in range(0,len(user)):
    originator = user[i]['name']
    for j in range(i+1,len(user)):
        if originator == user[j]['name']:
            quan = user[i]['quantity'] + user[j]['quantity']
            newList.append({'name': originator, 'Quantity': quan})
Can you please help me get the correct code?
Just count the items in a collections.Counter, and expand back to list of dicts if needed:
user = [
{'name': 'ozzy', 'quantity': 5},
{'name': 'frank', 'quantity': 4},
{'name': 'ozzy', 'quantity': 3},
{'name': 'frank', 'quantity': 2},
{'name': 'james', 'quantity': 7},
]
import collections
d = collections.Counter()
for u in user:
    d[u['name']] += u['quantity']
print(dict(d))
newlist = [{'name' : k, 'quantity' : v} for k,v in d.items()]
print(newlist)
outputs Counter dict first, which is already sufficient:
{'frank': 6, 'ozzy': 8, 'james': 7}
and the reformatted output using list of dicts:
[{'name': 'frank', 'quantity': 6}, {'name': 'ozzy', 'quantity': 8}, {'name': 'james', 'quantity': 7}]
The solution is also straightforward with a standard dictionary. No need for Counter or OrderedDict here:
user = [
{'name': 'ozzy', 'quantity': 5},
{'name': 'frank', 'quantity': 4},
{'name': 'ozzy', 'quantity': 3},
{'name': 'frank', 'quantity': 2},
{'name': 'james', 'quantity': 7},
]
dic = {}
for item in user:
    # read the fields by key rather than relying on the order of item.values()
    n, q = item['name'], item['quantity']
    dic[n] = dic.get(n,0) + q
print(dic)
user = [{'name':n, 'quantity':q} for n,q in dic.items()]
print(user)
Result:
{'ozzy': 8, 'frank': 6, 'james': 7}
[{'name': 'ozzy', 'quantity': 8}, {'name': 'frank', 'quantity': 6}, {'name': 'james', 'quantity': 7}]
I would suggest changing the way the output dictionary looks so that it is actually useable. Consider something like this
user = [
{'name': 'ozzy', 'quantity': 5},
{'name': 'frank', 'quantity': 4},
{'name': 'ozzy', 'quantity': 3},
{'name': 'frank', 'quantity': 2},
{'name': 'james', 'quantity': 7},
]
data = {}
for i in user:
    if i['name'] in data:
        data[i['name']] += i['quantity']
    else:
        data[i['name']] = i['quantity']
print(data)
{'frank': 6, 'james': 7, 'ozzy': 8}
If you need to maintain the original relative order:
from collections import OrderedDict
user = [
{'name': 'ozzy', 'quantity': 5},
{'name': 'frank', 'quantity': 4},
{'name': 'ozzy', 'quantity': 3},
{'name': 'frank', 'quantity': 2},
{'name': 'james', 'quantity': 7},
]
d = OrderedDict()
for item in user:
    d[item['name']] = d.get(item['name'], 0) + item['quantity']
newlist = [{'name' : k, 'quantity' : v} for k, v in d.items()]
print(newlist)
Output:
[{'name': 'ozzy', 'quantity': 8}, {'name': 'frank', 'quantity': 6}, {'name': 'james', 'quantity': 7}]
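On Python 3.7 and later, a plain dict also preserves insertion order, so the same first-seen ordering can be had without OrderedDict. A minimal sketch under that version assumption:

```python
user = [
    {'name': 'ozzy', 'quantity': 5},
    {'name': 'frank', 'quantity': 4},
    {'name': 'ozzy', 'quantity': 3},
    {'name': 'frank', 'quantity': 2},
    {'name': 'james', 'quantity': 7},
]

totals = {}
for item in user:
    # A key keeps the position of its first insertion; later updates do not move it
    totals[item['name']] = totals.get(item['name'], 0) + item['quantity']

newlist = [{'name': k, 'quantity': v} for k, v in totals.items()]
print(newlist)
# [{'name': 'ozzy', 'quantity': 8}, {'name': 'frank', 'quantity': 6}, {'name': 'james', 'quantity': 7}]
```

OrderedDict remains the safer choice if the code must also run on older interpreters.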
user = [
{'name': 'ozzy', 'quantity': 8},
{'name': 'frank', 'quantity': 6},
{'name': 'james', 'quantity': 7}
]
reference_dict = {}
for item in user:
    reference_dict[item['name']] = reference_dict.get(item['name'], 0) + item['quantity']
# Creating new list from reference dict
updated_user = [{'name': k, 'quantity': v} for k, v in reference_dict.items()]
print(updated_user)

Merge dicts from a list of dicts based on some key/value pair

I have a list of dicts, shown below. I want to merge some of the dicts into one based on some key/value pair.
[
{'key': 16, 'value': 3, 'user': 3, 'id': 7},
{'key': 17, 'value': 4, 'user': 3, 'id': 7},
{'key': 17, 'value': 5, 'user': 578, 'id': 7},
{'key': 52, 'value': 1, 'user': 3, 'id': 48},
{'key': 46, 'value': 2, 'user': 578, 'id': 48}
]
Now, as you can see, dicts 1 & 2 have the same values for the user & id keys, so it is possible to merge these two dicts like this:
[
{'key': [16,17], 'value': [3,4], 'user': 3, 'id': 7},
{'key': [17], 'value': [5], 'user': 578, 'id': 7},
{'key': [52], 'value': [1], 'user': 3, 'id': 48},
{'key': [46], 'value': [2], 'user': 578, 'id': 48}
]
That is, the user & id values must be unique together. What would be an efficient way to merge them (if possible)?
The following function will convert the list of dictionaries to the new format:
def convert(d):
    res = {}
    for x in d:
        key = (x['user'], x['id'])
        if key in res:
            res[key]['key'].append(x['key'])
            res[key]['value'].append(x['value'])
        else:
            # first dict for this (user, id): wrap its fields in lists
            x['key'] = [x['key']]
            x['value'] = [x['value']]
            res[key] = x
    return list(res.values())
Note that it mutates the original dictionaries, and the order of the dictionaries in the result may be arbitrary. When applied to the input, it produces the following result:
[
{'id': 7, 'value': [5], 'key': [17], 'user': 578},
{'id': 7, 'value': [3, 4], 'key': [16, 17], 'user': 3},
{'id': 48, 'value': [1], 'key': [52], 'user': 3},
{'id': 48, 'value': [2], 'key': [46], 'user': 578}
]
Let dicts be your original list of dictionaries. The idea is to map each unique combination of user and id to a dict (via defaultdict(dict)) whose 'key' and 'value' entries are lists. The final result is the list of values from that dictionary.
from collections import defaultdict
tmp = defaultdict(dict)
for info in dicts:
    tmp[(info['user'], info['id'])].setdefault('key', []).append(info['key'])
    tmp[(info['user'], info['id'])].setdefault('value', []).append(info['value'])
for (user, id_), d in tmp.items(): # python2: use iteritems
    d.update(dict(user=user, id=id_))
result = list(tmp.values()) # python2: tmp.values() already gives a list
del tmp
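A possible variation on the same idea, sketched here rather than taken from the answers: give the defaultdict a factory that builds the two empty lists up front, which removes the setdefault calls. The input list is the one from the question, and the originals are not mutated.

```python
from collections import defaultdict

dicts = [
    {'key': 16, 'value': 3, 'user': 3, 'id': 7},
    {'key': 17, 'value': 4, 'user': 3, 'id': 7},
    {'key': 17, 'value': 5, 'user': 578, 'id': 7},
    {'key': 52, 'value': 1, 'user': 3, 'id': 48},
    {'key': 46, 'value': 2, 'user': 578, 'id': 48},
]

# Each new (user, id) pair starts with its own pair of empty lists
groups = defaultdict(lambda: {'key': [], 'value': []})
for info in dicts:
    group = groups[(info['user'], info['id'])]
    group['key'].append(info['key'])
    group['value'].append(info['value'])

# Attach user and id to each group to get the requested shape
result = [dict(group, user=user, id=id_)
          for (user, id_), group in groups.items()]
print(result)
```

Every entry ends up with list-valued 'key' and 'value' fields, including groups that contain a single dict.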
You can use the following aggregate function:
def aggregate(lst):
    new = {}
    # group the dicts by their (user, id) pair
    for d in lst:
        new.setdefault((d['user'], d['id']), []).append(d)
    for k, d in new.items():
        if len(d) > 1:
            keys, values = zip(*[(sub['key'], sub['value']) for sub in d])
            user, id_ = k
            yield {'key': keys, 'value': values, 'user': user, 'id': id_}
        else:
            yield d[0]
print(list(aggregate(lst)))
[{'id': 7, 'value': 5, 'key': 17, 'user': 578},
{'id': 7, 'value': (3, 4), 'key': (16, 17), 'user': 3},
{'id': 48, 'value': 1, 'key': 52, 'user': 3},
{'id': 48, 'value': 2, 'key': 46, 'user': 578}]

Python, collect data from an array of dicts

I'm new to Python and I have this structure, obtained from a DB:
data=[
{'Value': '0.2', 'id': 1},
{'Value': '1.2', 'id': 1},
{'Value': '33.34', 'id': 2},
{'Value': '44.3', 'id': 3},
{'Value': '33.23', 'id': 3},
{'Value': '21.1', 'id': 4},
{'Value': '5.33', 'id': 4},
{'Value': '33.3', 'id': 5},
{'Value': '12.2', 'id': 5},
{'Value': '1.22', 'id': 5},
{'Value': '1.23', 'id': 6}
]
I know that I can get the id of a record with:
data[i]['id']
but I need to collect the records by ID in a proper data structure, in order to get the average value for every ID.
What is the best choice for this?
I'm thinking of building a new dict for every ID set, but the IDs can grow in number, and I can't figure out how to tackle this problem. If someone can give me some ideas I would be very grateful.
Assuming your data is sorted by ID as it appears in your data variable, you can try using itertools.groupby, which can be instructed to group by id. You can then create a new dictionary that has keys equal to the id numbers and values equal to the means:
In [1]: from itertools import groupby
In [2]: data=[
...: {'Value': '0.2', 'id': 1},
...: {'Value': '1.2', 'id': 1},
...: {'Value': '33.34', 'id': 2},
...: {'Value': '44.3', 'id': 3},
...: {'Value': '33.23', 'id': 3},
...: {'Value': '21.1', 'id': 4},
...: {'Value': '5.33', 'id': 4},
...: {'Value': '33.3', 'id': 5},
...: {'Value': '12.2', 'id': 5},
...: {'Value': '1.22', 'id': 5},
...: {'Value': '1.23', 'id': 6}
...: ]
In [3]: means = {}
In [4]: for k, g in groupby(data, key=lambda x: x['id']):
   ...:     g = list(g)
   ...:     means[k] = sum(float(x['Value']) for x in g) / len(g)
...:
...:
In [5]: means
Out[5]:
{1: 0.69999999999999996,
2: 33.340000000000003,
3: 38.765000000000001,
4: 13.215,
5: 15.573333333333332,
6: 1.23}
(Updated: after DSM's comment.)
You could reshape the data like this:
from collections import defaultdict
data=[
{'Value': '0.2', 'id': 1},
{'Value': '1.2', 'id': 1},
{'Value': '33.34', 'id': 2},
{'Value': '44.3', 'id': 3},
{'Value': '33.23', 'id': 3},
{'Value': '21.1', 'id': 4},
{'Value': '5.33', 'id': 4},
{'Value': '33.3', 'id': 5},
{'Value': '12.2', 'id': 5},
{'Value': '1.22', 'id': 5},
{'Value': '1.23', 'id': 6}
]
newdata = defaultdict(list)
for r in data:
    newdata[r['id']].append(float(r['Value']))
This would yield:
In [2]: newdata
Out[2]: defaultdict(<type 'list'>, {1: [0.2, 1.2], 2: [33.34], 3: [44.3, 33.23], 4: [21.1, 5.33], 5: [33.3, 12.2, 1.22], 6: [1.23]})
(Update 2)
Calculating the means is now simple with a dictionary comprehension:
mean = {id: sum(values) / len(values) for id, values in newdata.items()}
Which gives:
In [4]: mean
Out[4]: {1: 0.7, 2: 33.34, 3: 38.765, 4: 13.215, 5: 15.573333333333332, 6: 1.23}
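On Python 3.4 and later, the standard library's statistics.mean can replace the manual sum/len. A self-contained sketch combining the reshaping and averaging steps above (the names `by_id` and `means` are chosen here for illustration):

```python
from collections import defaultdict
from statistics import mean

data = [
    {'Value': '0.2', 'id': 1},
    {'Value': '1.2', 'id': 1},
    {'Value': '33.34', 'id': 2},
    {'Value': '44.3', 'id': 3},
    {'Value': '33.23', 'id': 3},
    {'Value': '21.1', 'id': 4},
    {'Value': '5.33', 'id': 4},
    {'Value': '33.3', 'id': 5},
    {'Value': '12.2', 'id': 5},
    {'Value': '1.22', 'id': 5},
    {'Value': '1.23', 'id': 6},
]

# Group the numeric values by id; new ids need no special handling
by_id = defaultdict(list)
for record in data:
    by_id[record['id']].append(float(record['Value']))

# One mean per id
means = {id_: mean(values) for id_, values in by_id.items()}
print(means)
```

Unlike the groupby version, this does not depend on the records arriving sorted by id.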
If you have numpy, you could use it for this easily:
import numpy
numpy.mean([x['id'] for x in data])
Otherwise, it would be as simple as:
from __future__ import division # if python2.7
ids = [x['id'] for x in data]
print(sum(ids)/len(ids))
You can simply create a list of IDs after all have been collected:
id_list = [element['id'] for element in data]
From there you can calculate whatever you want.
