pandas groupby to list of dicts

pandas groupby to list of dicts - python

this is my data:
data = [
{'shape': 'circle', 'width': 10, 'height': 8},
{'shape': 'circle', 'width': 7, 'height': 2},
{'shape': 'square', 'width': 4, 'height': 6}
]
I am trying to group by shapes that will hold the x, y
my final output should be a dict in the following format:
{
'circle': [
{'x': 10, 'y': 8},
{'x': 7, 'y': 2}
],
'square': [
{'x': 4, 'y': 6}
],
}
here is what I tried, which does not work
df = pd.DataFrame(data)
df = df.rename({'width': 'x', 'height': 'y'}, axis='columns')
df.groupby('shape').apply(
lambda s: s.do_dict()).to_dict()
what is the correct way to do it? also is there a way to do it with out renaming the columns before, something like:
df.groupby('shape').apply(
lambda s: {'x': s['width'], 'y': s['height']}).to_dict()

I could not do without renaming the column but something like this?
(df.rename(columns={'width': 'x', 'height': 'y'})
.groupby('shape')
.apply(lambda s: s[['x', 'y']].to_dict(orient='records'))
.to_dict())

It can be done with a dict comprehension:
res = {i:df[df['shape']==i][['x', 'y']].to_dict(orient='records') for i in set(df['shape'])}
>>>print(res)
{'circle': [{'x': 10, 'y': 8}, {'x': 7, 'y': 2}], 'square': [{'x': 4, 'y': 6}]}

Related

Pandas: Pivot multi-index, with one 'shared' column

I have a pandas dataframe that can be represented like:
test_dict = {('a', 1) : {'shared':0,'x':1, 'y':2, 'z':3},
('a', 2) : {'shared':1,'x':2, 'y':4, 'z':6},
('b', 1) : {'shared':0,'x':10, 'y':20, 'z':30},
('b', 2) : {'shared':1,'x':100, 'y':200, 'z':300}}
example = pd.DataFrame.from_dict(test_dict).T
I am trying to figure out a way to turn this into a dataframe that looks like this dictionary representation:
res_dict = {1 : {'shared':0,'a':{'x':1, 'y':2, 'z':3}, 'b':{'x':10, 'y':20, 'z':30}},
2 : {'shared':1,'a':{'x':2, 'y':4, 'z':6},'b':{'x':100, 'y':200, 'z':300}}}
Any suggestions appreciated!
Thanks

A possible solution, which uses only dataframe manipulations and then converts to dictionary:
xyz = ['x', 'y', 'z']
out = (example.assign(xyz=example[xyz].apply(list, axis=1)).reset_index()
.pivot(index='level_0', columns=['level_1', 'shared'], values='xyz')
.applymap(lambda x: dict(zip(xyz, x))))
out.columns = out.columns.rename(None, level=0)
out.index = out.index.rename(None)
(pd.concat([out.droplevel(1, axis=1),
out.columns.to_frame().reset_index(drop=True).iloc[:,1]
.to_frame().T.set_axis(out.columns.get_level_values(0), axis=1)])
.iloc[np.arange(-1, len(out))].to_dict())
Output:
{
1: {
'shared': 0,
'a': {'x': 1, 'y': 2, 'z': 3},
'b': {'x': 10, 'y': 20, 'z': 30}
},
2: {
'shared': 1,
'a': {'x': 2, 'y': 4, 'z': 6},
'b': {'x': 100, 'y': 200, 'z': 300}
}
}

pandas include grouped value in to dict convert

here is my data
data = [
{'shape': 'circle', 'width': 10, 'height': 8},
{'shape': 'circle', 'width': 7, 'height': 2},
{'shape': 'square', 'width': 4, 'height': 6}
]
I am using pandas to aggregate min and max height on each group,
my final result should be:
[
{'shape': 'circle', 'min': 2, max: 8},
{'shape': 'square', 'min': 6, max: 6}
]
here is what I tried:
df = pd.DataFrame(data)
my_dict = df.groupby('shape').height.agg(['min', 'max']).to_dict('records')
but this results a record without the 'shape' column:
[
{'min': 2, 'max': 8},
{'min': 6, 'max': 6}
]
how can I include the grouped by column?

The group is set as index, try to reset it:
df.groupby('shape').height.agg(['min', 'max']).reset_index().to_dict('records')

Cartesian (cross) products and np.unique()

Depending on your few on my approach this is either a question about using np.unique() on awkward1 arrays or a call for a better approach:
Let a and b be two awkward1 arrays of the same outer length (number of events) but different inner lengths. For example:
a = [[1, 2], [3] , [] , [4, 5, 6]]
b = [[7] , [3, 5], [6], [8, 9]]
Let f: (x, y) -> z be a function that acts on two numbers x and y and results in the number z. For example:
f(x, y):= y - x
The idea is to compare every element in a with every element in b via f for each event and filter out the matches of a and b pairs that survive some cut applied to f. For example:
f(x, y) < 4
My approach for this is:
a = ak.from_iter(a)
b = ak.from_iter(b)
c = ak.cartesian({'x':a, 'y':b})
#c= [[{'x': 1, 'y': 7}, {'x': 2, 'y': 7}], [{'x': 3, 'y': 3}, {'x': 3, 'y': 5}], [], [{'x': 4, 'y': 8}, {'x': 4, 'y': 9}, {'x': 5, 'y': 8}, {'x': 5, 'y': 9}, {'x': 6, 'y': 8}, {'x': 6, 'y': 9}]]
i = ak.argcartesian({'x':a, 'y':b})
#i= [[{'x': 0, 'y': 0}, {'x': 1, 'y': 0}], [{'x': 0, 'y': 0}, {'x': 0, 'y': 1}], [], [{'x': 0, 'y': 0}, {'x': 0, 'y': 1}, {'x': 1, 'y': 0}, {'x': 1, 'y': 1}, {'x': 2, 'y': 0}, {'x': 2, 'y': 1}]]
diff = c['y'] - c['x']
#diff= [[6, 5], [0, 2], [], [4, 5, 3, 4, 2, 3]]
cut = diff < 4
#cut= [[False, False], [True, True], [], [False, False, True, False, True, True]]
new = c[cut]
#new= [[], [{'x': 3, 'y': 3}, {'x': 3, 'y': 5}], [], [{'x': 5, 'y': 8}, {'x': 6, 'y': 8}, {'x': 6, 'y': 9}]]
new_i = i[cut]
#new_i= [[], [{'x': 0, 'y': 0}, {'x': 0, 'y': 1}], [], [{'x': 1, 'y': 0}, {'x': 2, 'y': 0}, {'x': 2, 'y': 1}]]
It is possible that pairs with the same element from a but different elements from b survive the cut. (e.g. {'x': 3, 'y': 3} and {'x': 3, 'y': 5})
My goal is to group those pairs with the same element from a together and therefore reshape the new array into:
new = [[], [{'x': 3, 'y': [3, 5]}], [], [{'x': 5, 'y': 8}, {'x': 6, 'y': [8, 9]}]]
My only idea how to achieve this is to create a list of the indexes from a that are still present after the cut by using new_i:
i = new_i['x']
#i= [[], [0, 0], [], [1, 2, 2]]
However, I need a unique version of this list to make every index appear only once. This could be achieved with np.unique() in NumPy. But doesn't work in awkward1:
np.unique(i)
<__array_function__ internals> in unique(*args, **kwargs)
TypeError: no implementation found for 'numpy.unique' on types that implement __array_function__: [<class 'awkward1.highlevel.Array'>]
My question:
Is their a np.unique() equivalent in awkward1 and/or would you recommend a different approach to my problem?

Okay, I still don't know how to use np.unique() on my arrays, but I found a solution for my own problem:
In my previous approach I used the following code to pair up booth arrays.
c = ak.cartesian({'x':a, 'y':b})
#c= [[{'x': 1, 'y': 7}, {'x': 2, 'y': 7}], [{'x': 3, 'y': 3}, {'x': 3, 'y': 5}], [], [{'x': 4, 'y': 8}, {'x': 4, 'y': 9}, {'x': 5, 'y': 8}, {'x': 5, 'y': 9}, {'x': 6, 'y': 8}, {'x': 6, 'y': 9}]]
However, with the nested = True parameter from ak.cartesian() I get a list grouped by the elements of a:
c = ak.cartesian({'x':a, 'y':b}, axis = 1, nested = True)
#c= [[[{'x': 1, 'y': 7}], [{'x': 2, 'y': 7}]], [[{'x': 3, 'y': 3}, {'x': 3, 'y': 5}]], [], [[{'x': 4, 'y': 8}, {'x': 4, 'y': 9}], [{'x': 5, 'y': 8}, {'x': 5, 'y': 9}], [{'x': 6, 'y': 8}, {'x': 6, 'y': 9}]]]
After the cut I end up with:
new = c[cut]
#new= [[[], []], [[{'x': 3, 'y': 3}, {'x': 3, 'y': 5}]], [], [[], [{'x': 5, 'y': 8}], [{'x': 6, 'y': 8}, {'x': 6, 'y': 9}]]]
I extract the y values and reduce the most inner layer of the nested lists of new to only one element:
y = new['y']
#y= [[[], []], [[3, 5]], [], [[], [8], [8, 9]]]
new = ak.firsts(new, axis = 2)
#new= [[None, None], [{'x': 3, 'y': 3}], [], [None, {'x': 5, 'y': 8}, {'x': 6, 'y': 8}]]
(I tried to use ak.firsts() with axis = -1 but it seems to be not implemented yet.)
Now every most inner entry in new belongs to exactly one element from a. By replacing the current y of new with the previously extracted y I end up with my desired result:
new['y'] = y
#new= [[None, None], [{'x': 3, 'y': [3, 5]}], [], [None, {'x': 5, 'y': [8]}, {'x': 6, 'y': [8, 9]}]]
Anyway, should you know a better solution, I'd be pleased to hear it.

Scatter plot line to out of order data

I am tracking the movements of an avian animal. I have detection points on an xy plot. I want to connect the previous detected point to the next detection, regardless of direction. This will assist with removing extraneous detections.
Data Sample:
Sample input
The goal is to have a line from the previous data point to the next point.
Sample output
Unsuccessful method 1:
plt.figure('Frame',figsize=(16,12))
plt.imshow(frame)
plt.plot(x, y, '-ro', 'd',markersize=2.5, color='orange')
Method 1 output
Unsuccessful method 2:
plt.plot(np.sort(x), y[np.argsort(x)], '-bo', ms = 2)
Method 2 output

I used your sample data and make a plot with method 1 (but with pandas) and the output was as you expected. I don't understand why you have an unsuccessful result.
data = [{'frame': 1, 'x': 5, 'y': 15},
{'frame': 4, 'x': 10, 'y': 15},
{'frame': 5, 'x': 15, 'y': 15},
{'frame': 6, 'x': 20, 'y': 15},
{'frame': 7, 'x': 23, 'y': 20},
{'frame': 8, 'x': 25, 'y': 25},
{'frame': 11, 'x': 20, 'y': 23},
{'frame': 15, 'x': 15, 'y': 20},
{'frame': 18, 'x': 8, 'y': 18},
{'frame': 19, 'x': 8, 'y': 10},
{'frame': 20, 'x': 12, 'y': 7}]
df = pd.DataFrame(data).sort_values('frame')
df.plot(x='x', y='y')

Select highest value from python list of dicts

In a list of list of dicts:
A = [
[{'x': 1, 'y': 0}, {'x': 2, 'y': 3}, {'x': 3, 'y': 4}, {'x': 4, 'y': 7}],
[{'x': 1, 'y': 0}, {'x': 2, 'y': 2}, {'x': 3, 'y': 13}, {'x': 4, 'y': 0}],
[{'x': 1, 'y': 20}, {'x': 2, 'y': 4}, {'x': 3, 'y': 0}, {'x': 4, 'y': 8}]
]
I need to retrieve the highest 'y' values from each of the list of dicts...so the resulting list would contain:
Z = [(4, 7), (3,13), (1,20)]
In A, the 'x' is the key of each dict while 'y' is the value of each dict.
Any ideas? Thank you.

max accept optional key parameter.
A = [
[{'x': 1, 'y': 0}, {'x': 2, 'y': 3}, {'x': 3, 'y': 4}, {'x': 4, 'y': 7}],
[{'x': 1, 'y': 0}, {'x': 2, 'y': 2}, {'x': 3, 'y': 13}, {'x': 4, 'y': 0}],
[{'x': 1, 'y': 20}, {'x': 2, 'y': 4}, {'x': 3, 'y': 0}, {'x': 4, 'y': 8}]
]
Z = []
for a in A:
d = max(a, key=lambda d: d['y'])
Z.append((d['x'], d['y']))
print Z
UPDATE
suggested by – J.F. Sebastian:
from operator import itemgetter
Z = [itemgetter(*'xy')(max(lst, key=itemgetter('y'))) for lst in A]

I'd use itemgetter and max's key argument:
from operator import itemgetter
pair_getter = itemgetter('x', 'y')
[pair_getter(max(d, key=itemgetter('y'))) for d in A]

[max(((d['x'], d['y']) for d in l), key=lambda t: t[1]) for l in A]

The solution to your stated problem has been given, but I suggest changing your underlying data structure. Tuples are much faster for small elements such as a point. You may retain the clarity of a dictionary by using namedtuple if you so desire.
>>> from collections import namedtuple
>>> A = [
[{'x': 1, 'y': 0}, {'x': 2, 'y': 3}, {'x': 3, 'y': 4}, {'x': 4, 'y': 7}],
[{'x': 1, 'y': 0}, {'x': 2, 'y': 2}, {'x': 3, 'y': 13}, {'x': 4, 'y': 0}],
[{'x': 1, 'y': 20}, {'x': 2, 'y': 4}, {'x': 3, 'y': 0}, {'x': 4, 'y': 8}]
]
Making a Point namedtuple is simple
>>> Point = namedtuple('Point', 'x y')
This is what an instance looks like
>>> Point(x=1, y=0) # Point(1, 0) also works
Point(x=1, y=0)
A would then look like this
>>> A = [[Point(**y) for y in x] for x in A]
>>> A
[[Point(x=1, y=0), Point(x=2, y=3), Point(x=3, y=4), Point(x=4, y=7)],
[Point(x=1, y=0), Point(x=2, y=2), Point(x=3, y=13), Point(x=4, y=0)],
[Point(x=1, y=20), Point(x=2, y=4), Point(x=3, y=0), Point(x=4, y=8)]]
Now working like this is much easier:
>>> from operator import attrgetter
>>> [max(row, key=attrgetter('y')) for row in A]
[Point(x=4, y=7), Point(x=3, y=13), Point(x=1, y=20)]
To retain the speed advantages of tuples it's better to access by index:
>>> from operator import itemgetter
>>> [max(row, key=itemgetter(2)) for row in A]
[Point(x=4, y=7), Point(x=3, y=13), Point(x=1, y=20)]

result=[]
for item in a:
new = sorted(item, key=lambda k: k['y'],reverse=True)
result.append((new[0]['x'],new[0]['y']))
print(result)
Note-The is not the efficient way to do this but this is one of the ways to get the required result.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

pandas groupby to list of dicts - python

I could not do without renaming the column but something like this? (df.rename(columns={'width': 'x', 'height': 'y'}) .groupby('shape') .apply(lambda s: s[['x', 'y']].to_dict(orient='records')) .to_dict())

It can be done with a dict comprehension: res = {i:df[df['shape']==i][['x', 'y']].to_dict(orient='records') for i in set(df['shape'])} >>>print(res) {'circle': [{'x': 10, 'y': 8}, {'x': 7, 'y': 2}], 'square': [{'x': 4, 'y': 6}]}

Related

Pandas: Pivot multi-index, with one 'shared' column

pandas include grouped value in to dict convert

Cartesian (cross) products and np.unique()

Scatter plot line to out of order data

Select highest value from python list of dicts

Categories

Resources