pandas include grouped value in to dict convert - python

here is my data
data = [
{'shape': 'circle', 'width': 10, 'height': 8},
{'shape': 'circle', 'width': 7, 'height': 2},
{'shape': 'square', 'width': 4, 'height': 6}
]
I am using pandas to aggregate min and max height on each group,
my final result should be:
[
{'shape': 'circle', 'min': 2, 'max': 8},
{'shape': 'square', 'min': 6, 'max': 6}
]
here is what I tried:
df = pd.DataFrame(data)
my_dict = df.groupby('shape').height.agg(['min', 'max']).to_dict('records')
but this results in records without the 'shape' column:
[
{'min': 2, 'max': 8},
{'min': 6, 'max': 6}
]
how can I include the grouped by column?

The group is set as index, try to reset it:
df.groupby('shape').height.agg(['min', 'max']).reset_index().to_dict('records')
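As a minimal sketch of why this works, using the data from the question: after `groupby('shape')` the group labels live in the index, and `to_dict('records')` only serializes columns, so `reset_index()` has to turn the index back into a column first. The names `agg` and `records` are only illustrative.

```python
import pandas as pd

data = [
    {'shape': 'circle', 'width': 10, 'height': 8},
    {'shape': 'circle', 'width': 7, 'height': 2},
    {'shape': 'square', 'width': 4, 'height': 6},
]
df = pd.DataFrame(data)

# After the aggregation, 'shape' is the index, not a column.
agg = df.groupby('shape').height.agg(['min', 'max'])
print(list(agg.columns))  # ['min', 'max']

# reset_index() moves 'shape' back into the columns, so it appears in each record.
records = agg.reset_index().to_dict('records')
print(records)
```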

Related

Using default dictionaries problem (python)

I have a slightly weird input of data that is in this format:
data = {'sensor1': {'units': 'x', 'values': [{'time': '17:00', 'value': 10},
                                             {'time': '17:10', 'value': 12},
                                             {'time': '17:20', 'value': 7}, ...]},
        'sensor2': {'units': 'x', 'values': [{'time': '17:00', 'value': 9},
                                             {'time': '17:20', 'value': 11}, ...]}
       }
And I want to collect the output to look like:
{'17:00': [10,9], '17:10': [12,], '17:20': [7,11], ... }
So the keys are the unique timestamps (ordered) and the values are lists of the values from each sensor, in the order the sensors appear in the original dictionary. If a sensor has no value for a timestamp, that position is left as an empty element ''. I know I might need to use defaultdict, but I've not had any success.
e.g.
s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = defaultdict(list)
for k, v in s:
    d[k].append(v)
sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
d = defaultdict(default_factory=list)
values_list = data.values()
for item in values_list:
    for k, v in item['values']:
        d[k].append(v)
result = sorted(d.items())
This raises a KeyError, as each item in values_list is not a tuple but a dict.
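A note on the attempt above, as a small sketch: `defaultdict` takes its factory as the first positional argument. Passing it as the keyword `default_factory=list` does not set the factory; the keyword is treated as an ordinary initial key, so missing keys still raise `KeyError`. The names `bad`, `good`, and `raised` are only illustrative.

```python
from collections import defaultdict

# Keyword form: the factory stays None and 'default_factory' becomes a regular key.
bad = defaultdict(default_factory=list)
print(bad.default_factory)  # None
print(list(bad.keys()))     # ['default_factory']

try:
    bad['17:00'].append(10)
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# Positional form: missing keys are created with an empty list.
good = defaultdict(list)
good['17:00'].append(10)
print(dict(good))  # {'17:00': [10]}
```

Separately, unpacking `for k, v in item['values']` over two-key dicts yields the key names ('time', 'value') rather than the stored values, so the loop has to index each dict by key instead.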
You can also use dict in this way:
data = {'sensor1': {'units': 'x', 'values': [{'time': '17:00', 'value': 10},
{'time': '17:10', 'value': 12},
{'time': '17:20', 'value': 7},
]},
'sensor2': {'units': 'x', 'values': [{'time': '17:00', 'value': 9},
{'time': '17:20', 'value': 11},
]}
}
d = {}
for item in data.values():
    for pair in item['values']:
        if pair["time"] in d:
            d[pair["time"]].append(pair["value"])
        else:
            d[pair["time"]] = [pair["value"]]
result = sorted(d.items())
print(result)
Output:
[('17:00', [10, 9]), ('17:10', [12]), ('17:20', [7, 11])]
Using defaultdict (see the defaultdict example with list in the Python documentation):
from collections import defaultdict
data = {'sensor1': {'units': 'x', 'values': [{'time': '17:00', 'value': 10},
{'time': '17:10', 'value': 12},
{'time': '17:20', 'value': 7},
]},
'sensor2': {'units': 'x', 'values': [{'time': '17:00', 'value': 9},
{'time': '17:20', 'value': 11},
]}
}
d = defaultdict(list)
for item in data.values():
    for pair in item['values']:
        d[pair["time"]].append(pair["value"])
result = sorted(d.items())
print(result)
Output:
[('17:00', [10, 9]), ('17:10', [12]), ('17:20', [7, 11])]
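Neither version above keeps a placeholder when a sensor has no reading for a timestamp, which the question asked for. A possible sketch, assuming the timestamps are zero-padded 'HH:MM' strings (so plain string sorting orders them correctly) and that '' is the wanted placeholder; `lookups` and `times` are illustrative names:

```python
data = {
    'sensor1': {'units': 'x', 'values': [{'time': '17:00', 'value': 10},
                                         {'time': '17:10', 'value': 12},
                                         {'time': '17:20', 'value': 7}]},
    'sensor2': {'units': 'x', 'values': [{'time': '17:00', 'value': 9},
                                         {'time': '17:20', 'value': 11}]},
}

# One time -> value lookup per sensor, in the order the sensors appear in data.
lookups = [{p['time']: p['value'] for p in sensor['values']}
           for sensor in data.values()]

# Every timestamp seen in any sensor, sorted.
times = sorted({t for lookup in lookups for t in lookup})

# For each timestamp, take each sensor's value, or '' if that sensor has none.
result = {t: [lookup.get(t, '') for lookup in lookups] for t in times}
print(result)
# {'17:00': [10, 9], '17:10': [12, ''], '17:20': [7, 11]}
```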

pandas groupby to list of dicts

this is my data:
data = [
{'shape': 'circle', 'width': 10, 'height': 8},
{'shape': 'circle', 'width': 7, 'height': 2},
{'shape': 'square', 'width': 4, 'height': 6}
]
I am trying to group by shapes that will hold the x, y
my final output should be a dict in the following format:
{
'circle': [
{'x': 10, 'y': 8},
{'x': 7, 'y': 2}
],
'square': [
{'x': 4, 'y': 6}
],
}
here is what I tried, which does not work
df = pd.DataFrame(data)
df = df.rename({'width': 'x', 'height': 'y'}, axis='columns')
df.groupby('shape').apply(
lambda s: s.do_dict()).to_dict()
What is the correct way to do it? Also, is there a way to do it without renaming the columns first, something like:
df.groupby('shape').apply(
lambda s: {'x': s['width'], 'y': s['height']}).to_dict()
I could not do without renaming the column but something like this?
(df.rename(columns={'width': 'x', 'height': 'y'})
   .groupby('shape')
   .apply(lambda s: s[['x', 'y']].to_dict(orient='records'))
   .to_dict())
It can be done with a dict comprehension:
res = {i: df[df['shape'] == i][['x', 'y']].to_dict(orient='records') for i in set(df['shape'])}
>>> print(res)
{'circle': [{'x': 10, 'y': 8}, {'x': 7, 'y': 2}], 'square': [{'x': 4, 'y': 6}]}
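For the second part of the question (doing it without renaming the columns), one possible sketch is to iterate over the groups directly and build the 'x'/'y' dicts by hand. This is an alternative to the answers above, not a built-in pandas feature; `tolist()` is used so the values are plain Python ints.

```python
import pandas as pd

data = [
    {'shape': 'circle', 'width': 10, 'height': 8},
    {'shape': 'circle', 'width': 7, 'height': 2},
    {'shape': 'square', 'width': 4, 'height': 6},
]
df = pd.DataFrame(data)

# Iterating a groupby yields (group label, sub-DataFrame) pairs.
res = {
    shape: [{'x': w, 'y': h}
            for w, h in zip(group['width'].tolist(), group['height'].tolist())]
    for shape, group in df.groupby('shape')
}
print(res)
```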

Python: mimic the effect of groupby with list/dict comprehension

I need to transform a list of dict:
original = [
{'type': 'a', 'length': 34, 'width': 74},
{'type': 'a', 'length': 15, 'width': 22},
{'type': 'b', 'length': 53, 'width': 54},
{'type': 'b', 'length': 11, 'width': 45},
]
into a dict with the value of type key as the key:
expected = {
'a': [
{'type': 'a', 'length': 34, 'width': 74},
{'type': 'a', 'length': 15, 'width': 22},
],
'b': [
{'type': 'b', 'length': 53, 'width': 54},
{'type': 'b', 'length': 11, 'width': 45},
],
}
This can be achieved with itertools.groupby or by iterating through the list manually, but is there any way to do it with just list/dict comprehension?
You could do something like this:
{t: [i for i in original if i['type'] == t] for t in {i['type'] for i in original}}
But it's both difficult to read and has a worst-case runtime complexity of O(n²), where n is the number of items in the list. Using itertools.groupby on a sorted list is both faster and easier to read.
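A sketch of the `itertools.groupby` version mentioned above, using the list from the question. `groupby` only merges consecutive items, so the list is sorted by the same key first. A single-pass `setdefault` loop is included as a further alternative that needs no sorting; it is not part of the original answer.

```python
from itertools import groupby
from operator import itemgetter

original = [
    {'type': 'a', 'length': 34, 'width': 74},
    {'type': 'a', 'length': 15, 'width': 22},
    {'type': 'b', 'length': 53, 'width': 54},
    {'type': 'b', 'length': 11, 'width': 45},
]

by_type = itemgetter('type')

# Sort by the grouping key, then collect each run of equal keys into a list.
grouped = {key: list(items)
           for key, items in groupby(sorted(original, key=by_type), key=by_type)}

# Alternative: one pass over the list, no sorting required.
grouped_single_pass = {}
for item in original:
    grouped_single_pass.setdefault(item['type'], []).append(item)

print(grouped == grouped_single_pass)  # True
```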

Pythonic sort a list of dictionaries in a tricky order

I have a list of ids sorted in a proper order:
ids = [1, 2, 4, 6, 5, 0, 3]
I also have a list of dictionaries, sorted in some random way:
rez = [{'val': 7, 'id': 1}, {'val': 8, 'id': 2}, {'val': 2, 'id': 3}, {'val': 0, 'id': 4}, {'val': -1, 'id': 5}, {'val': -4, 'id': 6}, {'val': 9, 'id': 0}]
My intention is to sort rez list in a way that corresponds to ids:
rez = [{'val': 7, 'id': 1}, {'val': 8, 'id': 2}, {'val': 0, 'id': 4}, {'val': -4, 'id': 6}, {'val': -1, 'id': 5}, {'val': 9, 'id': 0}, {'val': 2, 'id': 3}]
I tried:
rez.sort(key = lambda x: ids.index(x['id']))
However that way is too slow for me, as len(ids) > 150K, and each dict actually had a lot of keys (some values there are strings). Any suggestion how to do it in the most pythonic, but still fastest way?
You don't need to sort because ids specifies the entire ordering of the result. You just need to pick the correct elements by their ids:
rez_dict = {d['id']:d for d in rez}
rez_ordered = [rez_dict[id] for id in ids]
Which gives:
>>> rez_ordered
[{'id': 1, 'val': 7}, {'id': 2, 'val': 8}, {'id': 4, 'val': 0}, {'id': 6, 'val': -4}, {'id': 5, 'val': -1}, {'id': 0, 'val': 9}, {'id': 3, 'val': 2}]
This should be faster than sorting because it can be done in linear time on average, while sorting is O(n log n).
Note that this assumes that there will be one entry per id, as in your example.
I think you are on the right track. If you need to speed it up, because your list is too long and you are having quadratic complexity, you can turn the list into a dictionary first, mapping the ids to their respective indices.
indices = {id_: pos for pos, id_ in enumerate(ids)}
rez.sort(key = lambda x: indices[x['id']])
This way, indices is {0: 5, 1: 0, 2: 1, 3: 6, 4: 2, 5: 4, 6: 3}, and rez is
[{'id': 1, 'val': 7},
{'id': 2, 'val': 8},
{'id': 4, 'val': 0},
{'id': 6, 'val': -4},
{'id': 5, 'val': -1},
{'id': 0, 'val': 9},
{'id': 3, 'val': 2}]
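The two approaches above can be checked against each other with a small sketch on the data from the question. Both replace the repeated `ids.index(...)` scan with a dictionary lookup; `by_id`, `picked`, `position`, and `sorted_rez` are illustrative names, and the first approach still assumes exactly one entry per id.

```python
ids = [1, 2, 4, 6, 5, 0, 3]
rez = [{'val': 7, 'id': 1}, {'val': 8, 'id': 2}, {'val': 2, 'id': 3},
       {'val': 0, 'id': 4}, {'val': -1, 'id': 5}, {'val': -4, 'id': 6},
       {'val': 9, 'id': 0}]

# Approach 1: index the dicts by id, then pick them in the order given by ids.
by_id = {d['id']: d for d in rez}
picked = [by_id[i] for i in ids]

# Approach 2: map each id to its position and sort with that as the key.
position = {id_: pos for pos, id_ in enumerate(ids)}
sorted_rez = sorted(rez, key=lambda d: position[d['id']])

print(picked == sorted_rez)      # True
print([d['id'] for d in picked])  # [1, 2, 4, 6, 5, 0, 3]
```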

Python dictionaries, find similarities

I have a python dictionary with a thousand items. Each item is, itself, a dictionary. I'm looking for a clean and elegant way to parse through each item, and find & create templates.
Here's a simplified example of the individual dictionaries' structure:
{'id': 1,
'template': None,
'height': 80,
'width': 120,
'length': 75,
'weight': 100}
From this, I want to pass through once, and if, 500 of the 1000 share the same height and width, determine that, so I can build a template off that data, and assign the template id to 'template'. I can build a gigantic reference hash, but I'm hoping there's a cleaner more elegant way to accomplish this.
The actual data includes closer to 30 keys, of which a small subset need to be excluded from the template checking.
Given dict of dicts items:
import itertools as it
for (height, width), itemIter in it.groupby(items.values(), lambda x: (x['height'], x['width'])):
    # list(itemIter) holds the consecutive items with dimensions (height, width)
    group = list(itemIter)
@eumiro had an excellent core idea, namely using itertools.groupby() to arrange the items with common values together in batches. However, besides neglecting to sort things first using the same key function, as @Jochen Ritzel pointed out (and as is also mentioned in the documentation), he didn't address several other things you mentioned wanting to do.
Below is a more complete and somewhat longer answer, written for Python 2. It determines the templates and assigns them in one pass through the dict-of-dicts. To do this, after first creating a sorted list of items, it uses groupby() to batch them, and if there are enough items in a group, it creates a template and assigns its ID to each member.
inventory = {
    'item1': {'id': 1, 'template': None, 'height': 80, 'width': 120, 'length': 75, 'weight': 100},
    'item2': {'id': 2, 'template': None, 'height': 30, 'width': 40, 'length': 20, 'weight': 20},
    'item3': {'id': 3, 'template': None, 'height': 80, 'width': 100, 'length': 96, 'weight': 150},
    'item4': {'id': 4, 'template': None, 'height': 30, 'width': 40, 'length': 60, 'weight': 75},
    'item5': {'id': 5, 'template': None, 'height': 80, 'width': 100, 'length': 36, 'weight': 33}
}
import itertools as itools
def print_inventory():
    print 'inventory:'
    for key in sorted(inventory.iterkeys()):
        print ' {}: {}'.format(key, inventory[key])
print "-- BEFORE --"
print_inventory()
THRESHOLD = 2
ALLKEYS = ['template', 'height', 'width', 'length', 'weight']
EXCLUDEDKEYS = ['template', 'length', 'weight']
INCLUDEDKEYS = [key for key in ALLKEYS if key not in EXCLUDEDKEYS]
# determines which keys make up a template
sortby = lambda item, keys=INCLUDEDKEYS: tuple(item[key] for key in keys)
templates = {}
templateID = 0
sortedinventory = sorted(inventory.itervalues(), key=sortby)
for templatetuple, similariter in itools.groupby(sortedinventory, sortby):
    similaritems = list(similariter)
    if len(similaritems) >= THRESHOLD:
        # create and assign a template
        templateID += 1
        templates[templateID] = templatetuple # tuple of values of INCLUDEDKEYS
        for item in similaritems:
            item['template'] = templateID
print
print "-- AFTER --"
print_inventory()
print
print 'templates:', templates
print
When I run it, the following is the output:
-- BEFORE --
inventory:
item1: {'weight': 100, 'height': 80, 'width': 120, 'length': 75, 'template': None, 'id': 1}
item2: {'weight': 20, 'height': 30, 'width': 40, 'length': 20, 'template': None, 'id': 2}
item3: {'weight': 150, 'height': 80, 'width': 100, 'length': 96, 'template': None, 'id': 3}
item4: {'weight': 75, 'height': 30, 'width': 40, 'length': 60, 'template': None, 'id': 4}
item5: {'weight': 33, 'height': 80, 'width': 100, 'length': 36, 'template': None, 'id': 5}
-- AFTER --
inventory:
item1: {'weight': 100, 'height': 80, 'width': 120, 'length': 75, 'template': None, 'id': 1}
item2: {'weight': 20, 'height': 30, 'width': 40, 'length': 20, 'template': 1, 'id': 2}
item3: {'weight': 150, 'height': 80, 'width': 100, 'length': 96, 'template': 2, 'id': 3}
item4: {'weight': 75, 'height': 30, 'width': 40, 'length': 60, 'template': 1, 'id': 4}
item5: {'weight': 33, 'height': 80, 'width': 100, 'length': 36, 'template': 2, 'id': 5}
templates: {1: (30, 40), 2: (80, 100)}
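The listing above uses Python 2 syntax (print statements, iterkeys, itervalues). A condensed Python 3 sketch of the same approach, with the same sample inventory and threshold, might look like this. The names `INCLUDED_KEYS`, `template_key`, and `template_id` are illustrative renamings, and the printing helper is left out.

```python
from itertools import groupby

inventory = {
    'item1': {'id': 1, 'template': None, 'height': 80, 'width': 120, 'length': 75, 'weight': 100},
    'item2': {'id': 2, 'template': None, 'height': 30, 'width': 40, 'length': 20, 'weight': 20},
    'item3': {'id': 3, 'template': None, 'height': 80, 'width': 100, 'length': 96, 'weight': 150},
    'item4': {'id': 4, 'template': None, 'height': 30, 'width': 40, 'length': 60, 'weight': 75},
    'item5': {'id': 5, 'template': None, 'height': 80, 'width': 100, 'length': 36, 'weight': 33},
}

THRESHOLD = 2                        # minimum group size that earns a template
INCLUDED_KEYS = ['height', 'width']  # keys that define a template


def template_key(item):
    """Return the tuple of values that decides which template an item matches."""
    return tuple(item[key] for key in INCLUDED_KEYS)


templates = {}
template_id = 0

# groupby only batches consecutive items, so sort by the same key first.
ordered = sorted(inventory.values(), key=template_key)
for key, group in groupby(ordered, key=template_key):
    members = list(group)
    if len(members) >= THRESHOLD:
        template_id += 1
        templates[template_id] = key
        for item in members:
            item['template'] = template_id  # mutates the dicts inside inventory

print(templates)  # {1: (30, 40), 2: (80, 100)}
```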
