I want to extract, for each code, all elements that have the maximum running date from a list of dictionaries.
Here is what I got so far:
import datetime
from itertools import groupby
commission_list = [
{'code': 'COMMISSION_CODE1', 'runningdt': datetime.datetime.strptime('2016-04-12', '%Y-%m-%d'), 'value': 150},
{'code': 'COMMISSION_CODE1', 'runningdt': datetime.datetime.strptime('2016-04-12', '%Y-%m-%d'), 'value': 450},
{'code': 'COMMISSION_CODE1', 'runningdt': datetime.datetime.strptime('2016-04-16', '%Y-%m-%d'), 'value': 140},
{'code': 'COMMISSION_CODE1', 'runningdt': datetime.datetime.strptime('2016-04-17', '%Y-%m-%d'), 'value': 120},
{'code': 'COMMISSION_CODE1', 'runningdt': datetime.datetime.strptime('2016-04-17', '%Y-%m-%d'), 'value': 220},
{'code': 'COMMISSION_CODE2', 'runningdt': datetime.datetime.strptime('2016-04-11', '%Y-%m-%d'), 'value': 150},
{'code': 'COMMISSION_CODE2', 'runningdt': datetime.datetime.strptime('2016-04-15', '%Y-%m-%d'), 'value': 140},
{'code': 'COMMISSION_CODE2', 'runningdt': datetime.datetime.strptime('2016-04-16', '%Y-%m-%d'), 'value': 160},
{'code': 'COMMISSION_CODE2', 'runningdt': datetime.datetime.strptime('2016-04-19', '%Y-%m-%d'), 'value': 210},
{'code': 'COMMISSION_CODE3', 'runningdt': datetime.datetime.strptime('2016-04-16', '%Y-%m-%d'), 'value': 330},
{'code': 'COMMISSION_CODE3', 'runningdt': datetime.datetime.strptime('2016-04-20', '%Y-%m-%d'), 'value': 310},
{'code': 'COMMISSION_CODE3', 'runningdt': datetime.datetime.strptime('2016-04-20', '%Y-%m-%d'), 'value': 410},
]
latest_run_commissions = []
for key, commission_group in groupby(commission_list, lambda x: x['code']):
    tem = list(commission_group)
    the_last_com = (max(tem, key=lambda x: x['runningdt']))
    filtered_objs = filter(lambda f: f['runningdt'] == the_last_com['runningdt'], tem)
    for o in filtered_objs:
        latest_run_commissions.append(o)
for f in latest_run_commissions:
    print(f)
print(" ")
Are there any more effective and efficient ways out there? Your advice or suggestions will be much appreciated and welcomed.
You can use itemgetter from the operator module to do this efficiently.
In [16]: from operator import itemgetter
In [17]: sorted_data = sorted(commission_list, key=itemgetter('code'))
In [18]: for g, data in groupby(sorted_data, key=itemgetter('code')):
   ....:     print(max(data, key=itemgetter('runningdt')))
   ....:
{'runningdt': datetime.datetime(2016, 4, 17, 0, 0), 'code': 'COMMISSION_CODE1', 'value': 120}
{'runningdt': datetime.datetime(2016, 4, 19, 0, 0), 'code': 'COMMISSION_CODE2', 'value': 210}
{'runningdt': datetime.datetime(2016, 4, 20, 0, 0), 'code': 'COMMISSION_CODE3', 'value': 310}
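Note that max() returns a single record per group, while the question asks for every record that shares the latest runningdt. A minimal sketch of one way to keep all of them is shown here; the helper name latest_per_code and the shortened sample data are illustrative assumptions, not part of the original posts.

```python
import datetime
from itertools import groupby
from operator import itemgetter

def latest_per_code(records):
    """Return every record whose 'runningdt' is the latest one for its 'code'."""
    result = []
    by_code = sorted(records, key=itemgetter('code'))  # groupby needs sorted input
    for _, group in groupby(by_code, key=itemgetter('code')):
        group = list(group)  # materialise: the group is scanned twice
        latest = max(r['runningdt'] for r in group)
        result.extend(r for r in group if r['runningdt'] == latest)
    return result

# Shortened, hypothetical sample data in the same shape as commission_list.
records = [
    {'code': 'A', 'runningdt': datetime.datetime(2016, 4, 12), 'value': 150},
    {'code': 'A', 'runningdt': datetime.datetime(2016, 4, 17), 'value': 120},
    {'code': 'A', 'runningdt': datetime.datetime(2016, 4, 17), 'value': 220},
    {'code': 'B', 'runningdt': datetime.datetime(2016, 4, 19), 'value': 210},
]
latest = latest_per_code(records)
print([r['value'] for r in latest])
```

Both records dated 2016-04-17 are kept for code 'A', which a plain max() per group would not do.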
I have data coming from API like below:
> {'Message': {'Success': True, 'ErrorMessage': ''},
> 'StoresAttributes': [{'StoreCode': '1004',
> 'Categories': [{'Code': 'Lctn',
> 'Attribute': {'Code': 'Long', 'Value': '16.99390523395146'}},
> {'Code': 'Lctn',
> 'Attribute': {'Code': 'Lat', 'Value': '52.56718450856377'}},
> {'Code': 'Offr', 'Attribute': {'Code': 'Bake', 'Value': 'True'}},
> {'Code': 'Pay', 'Attribute': {'Code': 'SCO', 'Value': 'True'}}]},
> {'StoreCode': '1005',
> 'Categories': [{'Code': 'Lctn',
> 'Attribute': {'Code': 'Long', 'Value': '14.2339250'}},
> {'Code': 'Lctn', 'Attribute': {'Code': 'Lat', 'Value': '53.8996090'}},
> {'Code': 'Offr', 'Attribute': {'Code': 'Bake', 'Value': 'True'}},
> {'Code': 'Pay', 'Attribute': {'Code': 'SCO', 'Value': 'True'}},
> {'Code': 'Offr', 'Attribute': {'Code': 'Bchi', 'Value': 'True'}}]},
I want to build a DataFrame from it. I have tried a loop and the pd.DataFrame() function, but neither worked properly.
What I want to achieve is a df with the following columns:
StoreCode: 1004,
Long: 16,99,
Lat: 52,56,
Bake: True.
Can anyone please help?
Below is a screenshot of my result from json_normalize:
[screenshot: error]
You can use json_normalize then pivot:
import pandas as pd
data = {'Message': {'Success': True, 'ErrorMessage': ''}, 'StoresAttributes': [{'StoreCode': '1004', 'Categories': [{'Code': 'Lctn', 'Attribute': {'Code': 'Long', 'Value': '16.99390523395146'}}, {'Code': 'Lctn', 'Attribute': {'Code': 'Lat', 'Value': '52.56718450856377'}}, {'Code': 'Offr', 'Attribute': {'Code': 'Bake', 'Value': 'True'}}, {'Code': 'Pay', 'Attribute': {'Code': 'SCO', 'Value': 'True'}}]}, {'StoreCode': '1005', 'Categories': [{'Code': 'Lctn', 'Attribute': {'Code': 'Long', 'Value': '14.2339250'}}, {'Code': 'Lctn', 'Attribute': {'Code': 'Lat', 'Value': '53.8996090'}}, {'Code': 'Offr', 'Attribute': {'Code': 'Bake', 'Value': 'True'}}, {'Code': 'Pay', 'Attribute': {'Code': 'SCO', 'Value': 'True'}}, {'Code': 'Offr', 'Attribute': {'Code': 'Bchi', 'Value': 'True'}}]}]}
df = pd.json_normalize(data['StoresAttributes'], meta='StoreCode', record_path='Categories')
df.pivot(columns='Attribute.Code', values='Attribute.Value', index='StoreCode')
Output:
Attribute.Code Bake Bchi Lat Long SCO
StoreCode
1004 True NaN 52.56718450856377 16.99390523395146 True
1005 True True 53.8996090 14.2339250 True
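To make the reshaping explicit, here is a plain-Python sketch of roughly what json_normalize followed by pivot achieves: one row per store, one column per attribute code. The function name stores_to_rows and the shortened payload are assumptions for illustration only.

```python
def stores_to_rows(payload):
    """Flatten 'StoresAttributes' into one dict per store (attribute code -> value)."""
    rows = []
    for store in payload['StoresAttributes']:
        row = {'StoreCode': store['StoreCode']}
        for category in store['Categories']:
            attribute = category['Attribute']
            # If an attribute code repeats within a store, the last value wins.
            row[attribute['Code']] = attribute['Value']
        rows.append(row)
    return rows

# Shortened, hypothetical payload in the same shape as the API response.
payload = {
    'StoresAttributes': [
        {'StoreCode': '1004', 'Categories': [
            {'Code': 'Lctn', 'Attribute': {'Code': 'Long', 'Value': '16.99'}},
            {'Code': 'Offr', 'Attribute': {'Code': 'Bake', 'Value': 'True'}},
        ]},
        {'StoreCode': '1005', 'Categories': [
            {'Code': 'Lctn', 'Attribute': {'Code': 'Long', 'Value': '14.23'}},
        ]},
    ]
}
rows = stores_to_rows(payload)
print(rows)
```

A list of dicts like rows can be passed to pd.DataFrame(rows); attributes missing for a store should then show up as NaN, as in the pivot output above.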
You can use the json_normalize() function like this:
data = [
{"id": 1, "name": {"first": "Coleen", "last": "Volk"}},
{"name": {"given": "Mark", "family": "Regner"}},
{"id": 2, "name": "Faye Raker"},
]
pd.json_normalize(data)
Output:
id name.first name.last name.given name.family name
0 1.0 Coleen Volk NaN NaN NaN
1 NaN NaN NaN Mark Regner NaN
2 2.0 NaN NaN NaN NaN Faye Raker
See the pandas documentation for json_normalize() for more details.
I have a slightly weird input of data that is in this format:
data = { 'sensor1': {'units': 'x', 'values': [{'time': '17:00', 'value': 10},
                                              {'time': '17:10', 'value': 12},
                                              {'time': '17:20', 'value': 7}, ...]},
         'sensor2': {'units': 'x', 'values': [{'time': '17:00', 'value': 9},
                                              {'time': '17:20', 'value': 11}, ...]}
}
And I want to collect the output to look like:
{'17:00': [10,9], '17:10': [12,], '17:20': [7,11], ... }
So the keys are the unique timestamps (ordered) and the values are a list of the values of each sensor, in order they come in the original dictionary. If there is no value for the timestamp in one sensor, it is just left as an empty element ''. I know I might need to use defaultdict but I've not had any success.
e.g.
from collections import defaultdict
s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = defaultdict(list)
for k, v in s:
    d[k].append(v)
sorted(d.items())
# [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
d = defaultdict(default_factory=list)
values_list = data.values()
for item in values_list:
    for k, v in item['values']:
        d[k].append(v)
result = sorted(d.items())
This raises a KeyError, as each item in values_list is a dict rather than a tuple.
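A note on the KeyError: two things appear to combine here. Passing default_factory as a keyword argument does not set the factory (keyword arguments become initial items, as with dict()), and unpacking a dict yields its keys rather than its values. The short sketch below demonstrates both; the variable names wrong and right are illustrative.

```python
from collections import defaultdict

# Keyword arguments are treated as initial dictionary items,
# so this defaultdict has no default factory at all.
wrong = defaultdict(default_factory=list)
print(wrong.default_factory)  # None
print(list(wrong))            # ['default_factory']

# Unpacking a two-key dict gives its keys, not its values.
k, v = {'time': '17:00', 'value': 10}
print(k, v)                   # time value

# The factory has to be the first positional argument.
right = defaultdict(list)
right['17:00'].append(10)
print(dict(right))            # {'17:00': [10]}
```

With no factory set, d['time'] on a missing key raises KeyError just like a plain dict would.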
You can also use dict in this way:
data = {'sensor1': {'units': 'x', 'values': [{'time': '17:00', 'value': 10},
                                             {'time': '17:10', 'value': 12},
                                             {'time': '17:20', 'value': 7},
                                             ]},
        'sensor2': {'units': 'x', 'values': [{'time': '17:00', 'value': 9},
                                             {'time': '17:20', 'value': 11},
                                             ]}
        }
d = {}
for item in data.values():
    for pair in item['values']:
        if pair["time"] in d:
            d[pair["time"]].append(pair["value"])
        else:
            d[pair["time"]] = [pair["value"]]
result = sorted(d.items())
print(result)
Output:
[('17:00', [10, 9]), ('17:10', [12]), ('17:20', [7, 11])]
Using defaultdict (see the defaultdict example with list in the Python documentation):
from collections import defaultdict
data = {'sensor1': {'units': 'x', 'values': [{'time': '17:00', 'value': 10},
                                             {'time': '17:10', 'value': 12},
                                             {'time': '17:20', 'value': 7},
                                             ]},
        'sensor2': {'units': 'x', 'values': [{'time': '17:00', 'value': 9},
                                             {'time': '17:20', 'value': 11},
                                             ]}
        }
d = defaultdict(list)
for item in data.values():
    for pair in item['values']:
        d[pair["time"]].append(pair["value"])
result = sorted(d.items())
print(result)
Output:
[('17:00', [10, 9]), ('17:10', [12]), ('17:20', [7, 11])]
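The question also asks for an empty element '' where a sensor has no reading at a timestamp, which the outputs above do not include. A possible sketch follows, assuming zero-padded 'HH:MM' strings (so that string sorting matches time order) and Python 3.7+ dict ordering; the function name align_by_time and the missing parameter are hypothetical.

```python
def align_by_time(data, missing=''):
    """Map each timestamp to one value per sensor, padding gaps with `missing`."""
    sensors = list(data)  # sensor order as given in the input dict
    times = sorted({pair['time']
                    for sensor in data.values()
                    for pair in sensor['values']})
    # Start with a fully padded row per timestamp, then fill in known readings.
    table = {t: [missing] * len(sensors) for t in times}
    for column, name in enumerate(sensors):
        for pair in data[name]['values']:
            table[pair['time']][column] = pair['value']
    return table

data = {
    'sensor1': {'units': 'x', 'values': [{'time': '17:00', 'value': 10},
                                         {'time': '17:10', 'value': 12},
                                         {'time': '17:20', 'value': 7}]},
    'sensor2': {'units': 'x', 'values': [{'time': '17:00', 'value': 9},
                                         {'time': '17:20', 'value': 11}]},
}
aligned = align_by_time(data)
print(aligned)
```

Each list keeps one slot per sensor, so the position of a value always identifies which sensor it came from.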
Is there a way to speed this up? I am currently using two for loops (to account for duplicates). Any suggestions on speeding this up?
d1 = [{'key': 't1', 'val': 1}, {'key': 't2', 'val': 2}, {'key': 't3', 'val': 3}, {'key': 't4', 'val': 4}, {'key': 't5', 'val': 5}, {'key': 't6', 'val': 6}, {'key': 't7', 'val': 7}, {'key': 't8', 'val': 8}, {'key': 't9', 'val': 9}, {'key': 't10', 'val': 10}, {'key': 't11', 'val': 11}, {'key': 't12', 'val': 12}, {'key': 't13', 'val': 13}, {'key': 't14', 'val': 14}, {'key': 't15', 'val': 15}, {'key': 't16', 'val': 16}, {'key': 't17', 'val': 17}, {'key': 't18', 'val': 18}, {'key': 't19', 'val': 19}]
d2 = [{'key': 't1', 'val': 'newval1'}, {'key': 't1', 'val': 'newval11'}, {'key': 't2', 'val': 'newval2'}, {'key': 't3', 'val': 'newval3'}, {'key': 't6', 'val': 'newval6'}, {'key': 't7', 'val': 'newval7'}, {'key': 't8', 'val': 'newval8'}, {'key': 't9', 'val': 'newval9'}, {'key': 't10', 'val': 'newval10'}, {'key': 't11', 'val': 'newval11'}, {'key': 't12', 'val': 'newval12'}, {'key': 't13', 'val': 'newval13'}, {'key': 't14', 'val': 'newval14'}, {'key': 't15', 'val': 'newval15'}, {'key': 't16', 'val': 'newval16'}, {'key': 't17', 'val': 'newval17'}, {'key': 't18', 'val': 'newval18'}, {'key': 't19', 'val': 'newval19'}]
>>> for x in d1:
...     for y in d2:
...         if x['key'] == y['key']:
...             print(x['key'], x['val'], y['val'])
...
('t1', 1, 'newval1')
('t1', 1, 'newval11')
('t2', 2, 'newval2')
('t3', 3, 'newval3')
('t6', 6, 'newval6')
('t7', 7, 'newval7')
('t8', 8, 'newval8')
('t9', 9, 'newval9')
('t10', 10, 'newval10')
('t11', 11, 'newval11')
('t12', 12, 'newval12')
('t13', 13, 'newval13')
('t14', 14, 'newval14')
('t15', 15, 'newval15')
('t16', 16, 'newval16')
('t17', 17, 'newval17')
('t18', 18, 'newval18')
('t19', 19, 'newval19')
With this structure, no. They are two lists, so you need to iterate over them to find what you are looking for.
But you can store this data as a dictionary of dictionaries indexed by key. (Since you confirmed that d1 always has unique keys, just turn d1 into a dictionary of dictionaries.)
d1 = {d['key']: {'val': d['val']} for d in d1}
This way you can iterate over d2 (a single for loop) and pick the relevant value from d1.
for d in d2:
    key, value = d['key'], d['val']
    print(key, d1[key]['val'], value)
Here is the full code:
d1 = [{'key': 't1', 'val': 1}, {'key': 't2', 'val': 2}, {'key': 't3', 'val': 3},
{'key': 't4', 'val': 4}, {'key': 't5', 'val': 5}, {'key': 't6', 'val': 6},
{'key': 't7', 'val': 7}, {'key': 't8', 'val': 8}, {'key': 't9', 'val': 9},
{'key': 't10', 'val': 10}, {'key': 't11', 'val': 11},
{'key': 't12', 'val': 12}, {'key': 't13', 'val': 13},
{'key': 't14', 'val': 14}, {'key': 't15', 'val': 15},
{'key': 't16', 'val': 16}, {'key': 't17', 'val': 17},
{'key': 't18', 'val': 18}, {'key': 't19', 'val': 19}]
d2 = [{'key': 't1', 'val': 'newval1'}, {'key': 't1', 'val': 'newval11'},
{'key': 't2', 'val': 'newval2'}, {'key': 't3', 'val': 'newval3'},
{'key': 't6', 'val': 'newval6'}, {'key': 't7', 'val': 'newval7'},
{'key': 't8', 'val': 'newval8'}, {'key': 't9', 'val': 'newval9'},
{'key': 't10', 'val': 'newval10'}, {'key': 't11', 'val': 'newval11'},
{'key': 't12', 'val': 'newval12'}, {'key': 't13', 'val': 'newval13'},
{'key': 't14', 'val': 'newval14'}, {'key': 't15', 'val': 'newval15'},
{'key': 't16', 'val': 'newval16'}, {'key': 't17', 'val': 'newval17'},
{'key': 't18', 'val': 'newval18'}, {'key': 't19', 'val': 'newval19'}]
d1 = {d['key']: {'val': d['val']} for d in d1}
for d in d2:
    key, value = d['key'], d['val']
    print(key, d1[key]['val'], value)
Output:
t1 1 newval1
t1 1 newval11
t2 2 newval2
t3 3 newval3
t6 6 newval6
t7 7 newval7
t8 8 newval8
t9 9 newval9
t10 10 newval10
t11 11 newval11
t12 12 newval12
t13 13 newval13
t14 14 newval14
t15 15 newval15
t16 16 newval16
t17 17 newval17
t18 18 newval18
t19 19 newval19
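The lookup above assumes every key in d2 also exists in d1 and that d1 has unique keys; otherwise d1[key] raises KeyError or earlier duplicates are overwritten. If neither assumption holds, one possible sketch is to index the first list into lists of values. The function name join_on_key and the small sample lists are illustrative, not from the original posts.

```python
from collections import defaultdict

def join_on_key(left, right):
    """Pair every row of `right` with all rows of `left` sharing the same 'key'."""
    index = defaultdict(list)
    for row in left:
        index[row['key']].append(row['val'])  # duplicates in `left` are all kept
    return [(row['key'], left_val, row['val'])
            for row in right
            for left_val in index.get(row['key'], ())]  # unmatched keys are skipped

left = [{'key': 't1', 'val': 1}, {'key': 't2', 'val': 2}, {'key': 't4', 'val': 4}]
right = [{'key': 't1', 'val': 'newval1'}, {'key': 't1', 'val': 'newval11'},
         {'key': 't2', 'val': 'newval2'}, {'key': 't9', 'val': 'newval9'}]
pairs = join_on_key(left, right)
for pair in pairs:
    print(*pair)
```

Building the index is one pass over the first list and the join is one pass over the second, instead of comparing every pair of rows.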
I have seen Python: remove dictionary from list and Splitting a list of dictionaries into several lists of dictionaries - but this question is slightly different.
Consider this working example (same in Python 2 or 3):
#!/usr/bin/env python
from __future__ import print_function
origarr = [
{ 'name': 'test01', 'type': 0, 'value': 42 },
{ 'name': 'test02', 'type': 0, 'value': 142 },
{ 'name': 'test03', 'type': 2, 'value': 242 },
{ 'name': 'test04', 'type': 2, 'value': 342 },
{ 'name': 'test05', 'type': 3, 'value': 42 },
]
print("origarr: {}".format(origarr))
lastdictelem = origarr.pop()
print("\nlastdictelem: {}".format(lastdictelem))
print("after pop, origarr: {}".format(origarr))
namestofilter = [ 'test01', 'test02' ]
newarr = []
for iname in namestofilter:
    # find the object having the name iname
    foundidx = -1
    for ix, idict in enumerate(origarr):
        if idict.get('name') == iname:
            foundidx = ix
            break
    if foundidx > -1:
        # remove dict object via pop at index, save removed object
        remdict = origarr.pop(foundidx)
        # add removed object to newarr:
        newarr.append(remdict)
print("\nafter namestofilter:")
print("newarr: {}".format(newarr))
print("origarr: {}".format(origarr))
Basically, mylist.pop() removes the last element from mylist as an object (here a dict), and returns it - then I can trivially insert it in a new array/list; this is illustrated by the first printout of this script:
$ python2 test.py
origarr: [{'name': 'test01', 'type': 0, 'value': 42}, {'name': 'test02', 'type': 0, 'value': 142}, {'name': 'test03', 'type': 2, 'value': 242}, {'name': 'test04', 'type': 2, 'value': 342}, {'name': 'test05', 'type': 3, 'value': 42}]
lastdictelem: {'name': 'test05', 'type': 3, 'value': 42}
after pop, origarr: [{'name': 'test01', 'type': 0, 'value': 42}, {'name': 'test02', 'type': 0, 'value': 142}, {'name': 'test03', 'type': 2, 'value': 242}, {'name': 'test04', 'type': 2, 'value': 342}]
Now, what I would like to do is define an array with values for the name key in a dict (say, namestofilter = [ 'test01', 'test02' ]), and have those dicts removed from the original array/list and put into a new array/list (as .pop() would do with a single element and an object reference).
Since pop removes the item at a specific index and returns it, the above code does exactly that - and works:
...
after namestofilter:
newarr: [{'name': 'test01', 'type': 0, 'value': 42}, {'name': 'test02', 'type': 0, 'value': 142}]
origarr: [{'name': 'test03', 'type': 2, 'value': 242}, {'name': 'test04', 'type': 2, 'value': 342}]
... but I was wondering - is there a more compact way of doing that, other than "manually" for-looping through the two arrays, and calling .pop()/.append() individually (as done in the example)?
I'm not sure there is a more compact way to do it; probably not.
But you can simplify the code a little and also avoid spending O(n) on each .pop:
origarr = [
{ 'name': 'test01', 'type': 0, 'value': 42 },
{ 'name': 'test02', 'type': 0, 'value': 142 },
{ 'name': 'test03', 'type': 2, 'value': 242 },
{ 'name': 'test04', 'type': 2, 'value': 342 },
{ 'name': 'test05', 'type': 3, 'value': 42 },
]
namestofilter = set([ 'test01', 'test02' ])  # could be a list as in question
print("origarr: {}".format(origarr))
lastdictelem = origarr.pop()
print("\nlastdictelem: {}".format(lastdictelem))
print("after pop, origarr: {}".format(origarr))
shift = 0
newarr = []
for ix, idict in enumerate(origarr):
    if idict['name'] in namestofilter:
        shift += 1
        newarr.append(idict)
        continue
    origarr[ix-shift] = origarr[ix]
if shift:  # guard: origarr[:-0] would be an empty list
    origarr = origarr[:-shift] # perhaps it is a slicing O(n) copy overhead
print("\nafter namestofilter:")
print("newarr: {}".format(newarr))
print("origarr: {}".format(origarr))
Output:
origarr: [{'name': 'test01', 'type': 0, 'value': 42}, {'name': 'test02', 'type': 0, 'value': 142}, {'name': 'test03', 'type': 2, 'value': 242}, {'name': 'test04', 'type': 2, 'value': 342}, {'name': 'test05', 'type': 3, 'value': 42}]
lastdictelem: {'name': 'test05', 'type': 3, 'value': 42}
after pop, origarr: [{'name': 'test01', 'type': 0, 'value': 42}, {'name': 'test02', 'type': 0, 'value': 142}, {'name': 'test03', 'type': 2, 'value': 242}, {'name': 'test04', 'type': 2, 'value': 342}]
after namestofilter:
newarr: [{'name': 'test01', 'type': 0, 'value': 42}, {'name': 'test02', 'type': 0, 'value': 142}]
origarr: [{'name': 'test03', 'type': 2, 'value': 242}, {'name': 'test04', 'type': 2, 'value': 342}]
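If compactness matters more than avoiding a second pass, a partition with two list comprehensions is another option. This is only a sketch: the helper name extract_by_name is hypothetical, and unlike the question's loop it returns the removed dicts in list order rather than in the order of the names.

```python
def extract_by_name(items, names):
    """Remove dicts whose 'name' is in `names` from `items` in place; return them."""
    wanted = set(names)  # set membership checks are O(1) on average
    removed = [d for d in items if d.get('name') in wanted]
    # Slice assignment mutates the existing list, so other references see the change.
    items[:] = [d for d in items if d.get('name') not in wanted]
    return removed

origarr = [
    {'name': 'test01', 'type': 0, 'value': 42},
    {'name': 'test02', 'type': 0, 'value': 142},
    {'name': 'test03', 'type': 2, 'value': 242},
    {'name': 'test04', 'type': 2, 'value': 342},
]
first = origarr[0]
newarr = extract_by_name(origarr, ['test01', 'test02'])
print("newarr: {}".format(newarr))
print("origarr: {}".format(origarr))
```

The removed entries are the same dict objects as before, not copies, which matches what .pop() would hand back.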
I have this list:
list1 = [
{'currency': 'USD', 'value': 10},
{'currency': 'USD', 'value': 12},
{'currency': 'EUR', 'value': 11},
{'currency': 'EUR', 'value': 15},
{'currency': 'EUR', 'value': 17},
{'currency': 'GBP', 'value': 13},
]
How do I combine the dictionaries so I get this list from list1?
list2 = [
{'currency': 'USD', 'value': 22},
{'currency': 'EUR', 'value': 43},
{'currency': 'GBP', 'value': 13},
]
You can use Counter to calculate the sums:
from collections import Counter
c = Counter()
for d in list1:
    cur, v = d.get('currency'), d.get('value')
    c.update({cur: v})
print(c)
Counter({'EUR': 43, 'USD': 22, 'GBP': 13})
and then generate the output:
list2 = [{'currency': cur, 'value': v} for cur, v in c.items()]
print(list2)
[{'currency': 'USD', 'value': 22}, {'currency': 'GBP', 'value': 13}, {'currency': 'EUR', 'value': 43}]
Use a dictionary:
d = {}
for x in list1:
    c, v = x["currency"], x["value"]
    d[c] = d.get(c, 0) + v
# {'EUR': 43, 'GBP': 13, 'USD': 22}
Then either use that dict directly (I would recommend that) or turn it back into your list-of-dicts format with a list comprehension:
>>> [{"currency": k, "value": v} for k, v in d.items()]
[{'currency': 'USD', 'value': 22},
{'currency': 'EUR', 'value': 43},
{'currency': 'GBP', 'value': 13}]
Using collections.defaultdict.
Demo:
import collections
d = collections.defaultdict(int)
list1 = [
{'currency': 'USD', 'value': 10},
{'currency': 'USD', 'value': 12},
{'currency': 'EUR', 'value': 11},
{'currency': 'EUR', 'value': 15},
{'currency': 'EUR', 'value': 17},
{'currency': 'GBP', 'value': 13},
]
for i in list1:
    d[i['currency']] += i["value"]
print( [{'currency': k, 'value': v} for k,v in d.items()] )
Output:
[{'currency': 'USD', 'value': 22}, {'currency': 'GBP', 'value': 13}, {'currency': 'EUR', 'value': 43}]
This should do it:
list1 = [
{'currency': 'USD', 'value': 10},
{'currency': 'USD', 'value': 12},
{'currency': 'EUR', 'value': 11},
{'currency': 'EUR', 'value': 15},
{'currency': 'EUR', 'value': 17},
{'currency': 'GBP', 'value': 13},
]
valuePairs = {}
for d in list1:
    curr = d['currency']
    val = d['value']
    if curr in valuePairs:
        valuePairs[curr] += val
    else:
        valuePairs[curr] = val
solution = [{'currency': k, 'value': v} for k, v in valuePairs.items()]
You could also do:
import itertools as it
list1 = [
{'currency': 'USD', 'value': 10},
{'currency': 'USD', 'value': 12},
{'currency': 'EUR', 'value': 11},
{'currency': 'EUR', 'value': 15},
{'currency': 'EUR', 'value': 17},
{'currency': 'GBP', 'value': 13},
]
kfunc = lambda x: x['currency']
groups = it.groupby(sorted(list1, key=kfunc), kfunc)
result = [{'currency':k, 'value':sum(x['value'] for x in g)} for k, g in groups]
print(result)
You can also one-line it:
[{"currency": k, "value": sum([d["value"] for d in list1 if d["currency"] == k])} for k in list(set([d["currency"] for d in list1]))]
list(set([d["currency"] for d in list1])) gives you the unique list of currencies.
You then sum the values of each unique currency to make the new dict.
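One caveat with the one-liner: it rescans list1 once per distinct currency, and because it iterates over a set the order of the result is arbitrary. The sketch below, using a shortened copy of list1, compares it with a single-pass version and sorts both so the comparison does not depend on order; the names rescanned and single_pass are illustrative.

```python
from operator import itemgetter

list1 = [
    {'currency': 'USD', 'value': 10},
    {'currency': 'USD', 'value': 12},
    {'currency': 'EUR', 'value': 11},
    {'currency': 'GBP', 'value': 13},
]

# One-liner style: list1 is scanned again for every distinct currency.
rescanned = [{'currency': k,
              'value': sum(d['value'] for d in list1 if d['currency'] == k)}
             for k in {d['currency'] for d in list1}]

# Single pass: each row is visited exactly once.
totals = {}
for d in list1:
    totals[d['currency']] = totals.get(d['currency'], 0) + d['value']
single_pass = [{'currency': k, 'value': v} for k, v in totals.items()]

# Sort both results so the comparison ignores ordering differences.
by_currency = itemgetter('currency')
same = sorted(rescanned, key=by_currency) == sorted(single_pass, key=by_currency)
print(same)
```

For a handful of currencies the difference is negligible; it only starts to matter when the number of distinct keys grows with the size of the list.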