Pandas: Pivot multi-index, with one 'shared' column

I have a pandas dataframe that can be represented like this:
import pandas as pd
test_dict = {('a', 1): {'shared': 0, 'x': 1, 'y': 2, 'z': 3},
             ('a', 2): {'shared': 1, 'x': 2, 'y': 4, 'z': 6},
             ('b', 1): {'shared': 0, 'x': 10, 'y': 20, 'z': 30},
             ('b', 2): {'shared': 1, 'x': 100, 'y': 200, 'z': 300}}
example = pd.DataFrame.from_dict(test_dict).T
I am trying to figure out a way to turn this into a dataframe that looks like this dictionary representation:
res_dict = {1: {'shared': 0, 'a': {'x': 1, 'y': 2, 'z': 3}, 'b': {'x': 10, 'y': 20, 'z': 30}},
            2: {'shared': 1, 'a': {'x': 2, 'y': 4, 'z': 6}, 'b': {'x': 100, 'y': 200, 'z': 300}}}
Any suggestions appreciated!
Thanks

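A possible solution, which uses only dataframe manipulations and then converts the result to a dictionary:
import numpy as np
import pandas as pd
xyz = ['x', 'y', 'z']
# Pack x, y, z into one list per row, pivot on the two index levels,
# then turn each list back into a dict.
# (On pandas >= 2.1, DataFrame.map replaces the deprecated applymap.)
out = (example.assign(xyz=example[xyz].apply(list, axis=1)).reset_index()
       .pivot(index='level_0', columns=['level_1', 'shared'], values='xyz')
       .applymap(lambda x: dict(zip(xyz, x))))
out.columns = out.columns.rename(None, level=0)
out.index = out.index.rename(None)
# Append the 'shared' values as an extra row, move that row to the top,
# and convert the frame to a nested dict.
(pd.concat([out.droplevel(1, axis=1),
            out.columns.to_frame().reset_index(drop=True).iloc[:, 1]
            .to_frame().T.set_axis(out.columns.get_level_values(0), axis=1)])
 .iloc[np.arange(-1, len(out))].to_dict())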
Output:
{
    1: {
        'shared': 0,
        'a': {'x': 1, 'y': 2, 'z': 3},
        'b': {'x': 10, 'y': 20, 'z': 30}
    },
    2: {
        'shared': 1,
        'a': {'x': 2, 'y': 4, 'z': 6},
        'b': {'x': 100, 'y': 200, 'z': 300}
    }
}
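If the data is already available as the plain dictionary from the question, the same regrouping can be sketched without pandas. This is only an illustrative alternative, not the dataframe approach above: `regroup` and its `shared_key` parameter are hypothetical names, and the sketch assumes the shared value is identical for every group under the same second-level key.

```python
test_dict = {('a', 1): {'shared': 0, 'x': 1, 'y': 2, 'z': 3},
             ('a', 2): {'shared': 1, 'x': 2, 'y': 4, 'z': 6},
             ('b', 1): {'shared': 0, 'x': 10, 'y': 20, 'z': 30},
             ('b', 2): {'shared': 1, 'x': 100, 'y': 200, 'z': 300}}


def regroup(flat, shared_key='shared'):
    """Regroup {(group, key): values} into {key: {shared_key: ..., group: {...}}}.

    Hypothetical helper: assumes the shared value is the same for every
    group that appears under a given key.
    """
    out = {}
    for (group, key), values in flat.items():
        # Create the entry for this key on first sight, seeded with the shared value.
        entry = out.setdefault(key, {shared_key: values[shared_key]})
        # Store everything except the shared column under the group name.
        entry[group] = {k: v for k, v in values.items() if k != shared_key}
    return out


res_dict = regroup(test_dict)
print(res_dict)
```

Because it never builds a dataframe, this version is only worth considering when the dictionary is the starting point; if the data already lives in pandas, the pivot-based answer avoids a round trip.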

Related

Sum values in nested dictionary in python [duplicate]

I have a list of dictionaries:
data = [
    {"2010": {'A': 2, 'B': 3, 'C': 5}},
    {"2011": {'A': 1, 'B': 2}},
    {"2010": {'A': 1, 'B': 2}}
]
I'd like to sum the values where the keys are the same. The result I expect looks like this:
res = {"2010": {'A': 3, 'B': 5, 'C': 5},
       "2011": {'A': 1, 'B': 2}}
How can I do this easily?
So you have a list of dictionaries as input and you want to create an output dictionary which contains the sums:
from collections import defaultdict
data = [{'2010': {'A': 2, 'B': 3, 'C': 5}}, {'2011': {'A': 1, 'B': 2}}, {'2010': {'A': 1, 'B': 2}}]
def sum_values(data):
    out = {}
    for i in data:
        for k in i.keys():
            if k not in out:
                out[k] = defaultdict(int)
            for k1, v1 in i[k].items():
                out[k][k1] += v1
    return out
sum_values(data)
{'2010': defaultdict(<class 'int'>, {'A': 3, 'B': 5, 'C': 5}), '2011': defaultdict(<class 'int'>, {'A': 1, 'B': 2})}
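Try this:
data = [
    {"2010": {'A': 2, 'B': 3, 'C': 5}},
    {"2011": {'A': 1, 'B': 2}},
    {"2010": {'A': 1, 'B': 2}}
]
res = {}
for d in data:
    # Each dict holds a single year; dict items aren't subscriptable,
    # so convert them to a list first.
    year, value = list(d.items())[0]
    if year not in res:
        res[year] = dict(value)  # copy, so the input data is not modified
    else:
        # Year already seen: add each value to the running total.
        for key, number in value.items():
            res[year][key] = res[year].get(key, 0) + number
res is equal to
{
    '2010': {'A': 3, 'B': 5, 'C': 5},
    '2011': {'A': 1, 'B': 2}
}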
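Another way to express the same summation is with `collections.Counter`, whose `update` method adds counts instead of overwriting them. The following is a minimal sketch under the assumption that all inner values are numbers; `sum_nested` is a hypothetical name, not something from the answers above.

```python
from collections import Counter

data = [
    {"2010": {'A': 2, 'B': 3, 'C': 5}},
    {"2011": {'A': 1, 'B': 2}},
    {"2010": {'A': 1, 'B': 2}},
]


def sum_nested(records):
    """Sum the inner values of all records that share the same outer key."""
    totals = {}
    for record in records:
        for year, counts in record.items():
            # Counter.update adds the given counts to the existing ones.
            totals.setdefault(year, Counter()).update(counts)
    # Convert back to plain dicts so the result prints like the expected output.
    return {year: dict(counter) for year, counter in totals.items()}


res = sum_nested(data)
print(res)
```

Converting the counters back to plain dicts at the end is optional; it only avoids the `Counter(...)` wrapper in the printed result.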

pandas groupby to list of dicts

This is my data:
data = [
    {'shape': 'circle', 'width': 10, 'height': 8},
    {'shape': 'circle', 'width': 7, 'height': 2},
    {'shape': 'square', 'width': 4, 'height': 6}
]
I am trying to group the rows by shape, with each group holding the x and y values.
My final output should be a dict in the following format:
{
    'circle': [
        {'x': 10, 'y': 8},
        {'x': 7, 'y': 2}
    ],
    'square': [
        {'x': 4, 'y': 6}
    ],
}
Here is what I tried, which does not work:
df = pd.DataFrame(data)
df = df.rename({'width': 'x', 'height': 'y'}, axis='columns')
df.groupby('shape').apply(
    lambda s: s.do_dict()).to_dict()
What is the correct way to do it? Also, is there a way to do it without renaming the columns first, something like:
df.groupby('shape').apply(
    lambda s: {'x': s['width'], 'y': s['height']}).to_dict()
I could not do it without renaming the columns, but how about something like this?
(df.rename(columns={'width': 'x', 'height': 'y'})
   .groupby('shape')
   .apply(lambda s: s[['x', 'y']].to_dict(orient='records'))
   .to_dict())
It can be done with a dict comprehension (this assumes df already has the renamed 'x' and 'y' columns):
res = {i: df[df['shape'] == i][['x', 'y']].to_dict(orient='records') for i in set(df['shape'])}
>>> print(res)
{'circle': [{'x': 10, 'y': 8}, {'x': 7, 'y': 2}], 'square': [{'x': 4, 'y': 6}]}
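On the question of avoiding the rename: when the input is the plain list of dicts shown in the question, the grouping can also be sketched without a dataframe, mapping `width` and `height` to `x` and `y` while building the result. This is an illustrative alternative rather than a pandas solution, and `group_shapes` is a hypothetical name.

```python
from collections import defaultdict

data = [
    {'shape': 'circle', 'width': 10, 'height': 8},
    {'shape': 'circle', 'width': 7, 'height': 2},
    {'shape': 'square', 'width': 4, 'height': 6},
]


def group_shapes(records):
    """Group records by 'shape', exposing width/height as x/y."""
    grouped = defaultdict(list)
    for rec in records:
        # The key mapping happens here, so no separate rename step is needed.
        grouped[rec['shape']].append({'x': rec['width'], 'y': rec['height']})
    return dict(grouped)


res = group_shapes(data)
print(res)
```

The groups keep the order in which each shape first appears in the input, and the values stay plain Python ints rather than NumPy integers.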

python recursively sort all nested iterable

How can I recursively sort all nested iterables in an iterable?
For example:
d = {
    'e': [{'y': 'y'}, {'x': [{'2': 2, '1': 1}]}],
    'x': ['c', 'b', 'a'],
    'z': {
        'a': [3, 1, 2],
        'd': [{'y': [6, 5, 1]}, {'w': 1}],
        'c': {'2': 2, '3': 3, '4': 4}
    },
    'w': {1: 1, 2: 2, 3: 3}
}
I want the output to look like this:
{'e': [{'x': [{'1': 1, '2': 2}]}, {'y': 'y'}],
 'w': {1: 1, 2: 2, 3: 3},
 'x': ['a', 'b', 'c'],
 'z': {'a': [1, 2, 3],
       'c': {'2': 2, '3': 3, '4': 4},
       'd': [{'w': 1}, {'y': [1, 5, 6]}]}}
from pprint import pprint
d = {
    'e': [{'y': 'y'}, {'x': [{'2': 2, '1': 1}]}],
    'x': ['c', 'b', 'a'],
    'z': {
        'a': [3, 1, 2],
        'd': [{'y': [6, 5, 1]}, {'w': 1}],
        'c': {'2': 2, '3': 3, '4': 4}
    },
    'w': {1: 1, 2: 2, 3: 3}
}
def rec_sort(iterable):
    """Recursively sort all nested lists, in place."""
    def sort_dict_key(x):
        # Dicts are not orderable, so compare them by their sorted keys.
        if isinstance(x, dict):
            return sorted(x.keys(), key=sort_dict_key)
        return x
    if isinstance(iterable, dict):
        # Key order is left as is (pprint sorts keys when printing);
        # only the values are sorted.
        for k, v in iterable.items():
            iterable[k] = rec_sort(v)
    elif isinstance(iterable, list):
        iterable.sort(key=sort_dict_key)
        for pos, item in enumerate(iterable):
            iterable[pos] = rec_sort(item)
    return iterable
pprint(rec_sort(d))
You can use recursion. Dicts cannot be compared with < in Python 3, so lists of dicts are ordered by their sorted keys:
import json
d = {'x': ['c', 'b', 'a'], 'z': {'a': [3, 1, 2], 'c': {'3': 3, '2': 2, '4': 4}, 'd': [{'y': [6, 5, 1]}, {'w': 1}]}, 'e': [{'y': 'y'}, {'x': [{'1': 1, '2': 2}]}], 'w': {1: 1, 2: 2, 3: 3}}
def sort_key(x):
    # Compare dicts by their sorted keys; everything else compares as itself.
    return sorted(x) if isinstance(x, dict) else x
def sort_nested(c):
    if isinstance(c, dict):
        return {a: sort_nested(b) for a, b in c.items()}
    if isinstance(c, list):
        return sorted((sort_nested(i) for i in c), key=sort_key)
    return c
print(json.dumps(sort_nested(d), indent=4))
Output:
{
    "x": [
        "a",
        "b",
        "c"
    ],
    "z": {
        "a": [
            1,
            2,
            3
        ],
        "c": {
            "3": 3,
            "2": 2,
            "4": 4
        },
        "d": [
            {
                "w": 1
            },
            {
                "y": [
                    1,
                    5,
                    6
                ]
            }
        ]
    },
    "e": [
        {
            "x": [
                {
                    "1": 1,
                    "2": 2
                }
            ]
        },
        {
            "y": "y"
        }
    ],
    "w": {
        "1": 1,
        "2": 2,
        "3": 3
    }
}

Python How to Extract data from Nested Dict

I have output from Python NetworkX code:
flow_value, flow_dict = nx.maximum_flow(T, 'O', 'T')
print(flow_dict)
# Output as follows:
# {'O': {'A': 4, 'B': 6, 'C': 4}, 'A': {'B': 1, 'D': 3}, 'B': {'C': 0, 'E': 3, 'D': 4}, 'C': {'E': 4}, 'E': {'D': 1, 'T': 6}, 'D': {'T': 8}, 'T': {}}
I want to extract all the data in a form that looks like:
# ('O','A',4),('O','B',6),('O','C',4),('A','B',1),......,('D','T',8)
Is there a way to traverse the nested dict and get the data I need?
I tried this and it works. It uses some type checking to capture only string values:
def retrieve_all_strings_from_dict(nested_dict, keys_to_ignore=None):
    values = []
    # Normalize keys_to_ignore to a list. The original called an undefined
    # to_list helper here; wrapping a single string key in a list is an
    # assumption about what that helper did.
    if not keys_to_ignore:
        keys_to_ignore = []
    elif isinstance(keys_to_ignore, str):
        keys_to_ignore = [keys_to_ignore]
    else:
        keys_to_ignore = list(keys_to_ignore)
    if not isinstance(nested_dict, dict):
        return values
    # Walk the nested dicts breadth-first; the list grows while it is iterated.
    dict_stack = [nested_dict]
    for dict_var in dict_stack:
        data_list = [v for k, v in dict_var.items()
                     if isinstance(v, str) and k not in keys_to_ignore]
        additional_dicts = [v for k, v in dict_var.items() if isinstance(v, dict)]
        dict_stack.extend(additional_dicts)
        values.extend(data_list)
    return values
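Since the flow dictionary in the question is exactly two levels deep (source node, then target node mapped to a flow value), the requested triples can also be produced with a nested comprehension. The sketch below copies the printed `flow_dict` from the question as a literal so that it runs without NetworkX; the name `edges` is just illustrative.

```python
# Literal copy of the flow_dict printed in the question.
flow_dict = {'O': {'A': 4, 'B': 6, 'C': 4}, 'A': {'B': 1, 'D': 3},
             'B': {'C': 0, 'E': 3, 'D': 4}, 'C': {'E': 4},
             'E': {'D': 1, 'T': 6}, 'D': {'T': 8}, 'T': {}}

# One (source, target, flow) tuple per inner entry; nodes with no
# outgoing flow (such as 'T') contribute nothing.
edges = [(u, v, flow)
         for u, targets in flow_dict.items()
         for v, flow in targets.items()]

print(edges)
```

This keeps zero-flow entries such as `('B', 'C', 0)`; adding `if flow` to the comprehension would drop them if only edges that carry flow are wanted.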

Select highest value from python list of dicts

In a list of lists of dicts:
A = [
    [{'x': 1, 'y': 0}, {'x': 2, 'y': 3}, {'x': 3, 'y': 4}, {'x': 4, 'y': 7}],
    [{'x': 1, 'y': 0}, {'x': 2, 'y': 2}, {'x': 3, 'y': 13}, {'x': 4, 'y': 0}],
    [{'x': 1, 'y': 20}, {'x': 2, 'y': 4}, {'x': 3, 'y': 0}, {'x': 4, 'y': 8}]
]
I need to retrieve the dict with the highest 'y' value from each inner list, so the resulting list would contain:
Z = [(4, 7), (3, 13), (1, 20)]
In A, 'x' acts as the key of each dict while 'y' acts as its value.
Any ideas? Thank you.
max accepts an optional key parameter.
A = [
    [{'x': 1, 'y': 0}, {'x': 2, 'y': 3}, {'x': 3, 'y': 4}, {'x': 4, 'y': 7}],
    [{'x': 1, 'y': 0}, {'x': 2, 'y': 2}, {'x': 3, 'y': 13}, {'x': 4, 'y': 0}],
    [{'x': 1, 'y': 20}, {'x': 2, 'y': 4}, {'x': 3, 'y': 0}, {'x': 4, 'y': 8}]
]
Z = []
for a in A:
    # Pick the dict with the largest 'y' in this inner list.
    d = max(a, key=lambda d: d['y'])
    Z.append((d['x'], d['y']))
print(Z)
UPDATE
Suggested by J.F. Sebastian:
from operator import itemgetter
Z = [itemgetter(*'xy')(max(lst, key=itemgetter('y'))) for lst in A]
I'd use itemgetter and max's key argument:
from operator import itemgetter
pair_getter = itemgetter('x', 'y')
[pair_getter(max(d, key=itemgetter('y'))) for d in A]
[max(((d['x'], d['y']) for d in l), key=lambda t: t[1]) for l in A]
The solution to your stated problem has been given, but I suggest changing your underlying data structure. Tuples are generally lighter and faster than dicts for small records such as a point. You can retain the clarity of a dictionary by using namedtuple if you so desire.
>>> from collections import namedtuple
>>> A = [
...     [{'x': 1, 'y': 0}, {'x': 2, 'y': 3}, {'x': 3, 'y': 4}, {'x': 4, 'y': 7}],
...     [{'x': 1, 'y': 0}, {'x': 2, 'y': 2}, {'x': 3, 'y': 13}, {'x': 4, 'y': 0}],
...     [{'x': 1, 'y': 20}, {'x': 2, 'y': 4}, {'x': 3, 'y': 0}, {'x': 4, 'y': 8}]
... ]
Making a Point namedtuple is simple
>>> Point = namedtuple('Point', 'x y')
This is what an instance looks like
>>> Point(x=1, y=0) # Point(1, 0) also works
Point(x=1, y=0)
A would then look like this
>>> A = [[Point(**y) for y in x] for x in A]
>>> A
[[Point(x=1, y=0), Point(x=2, y=3), Point(x=3, y=4), Point(x=4, y=7)],
 [Point(x=1, y=0), Point(x=2, y=2), Point(x=3, y=13), Point(x=4, y=0)],
 [Point(x=1, y=20), Point(x=2, y=4), Point(x=3, y=0), Point(x=4, y=8)]]
Now working like this is much easier:
>>> from operator import attrgetter
>>> [max(row, key=attrgetter('y')) for row in A]
[Point(x=4, y=7), Point(x=3, y=13), Point(x=1, y=20)]
To retain the speed advantages of tuples, it's better to access by index (y is the second field, so its index is 1):
>>> from operator import itemgetter
>>> [max(row, key=itemgetter(1)) for row in A]
[Point(x=4, y=7), Point(x=3, y=13), Point(x=1, y=20)]
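result = []
for item in A:
    # Sort each inner list by 'y' in descending order and take the first element.
    new = sorted(item, key=lambda k: k['y'], reverse=True)
    result.append((new[0]['x'], new[0]['y']))
print(result)
Note: this is not the most efficient way to do it (sorting is O(n log n), while max is O(n)), but it is one way to get the required result.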
