Hello I have a dataframe with the following structure:
Index A B C D E
3 foo bar 0 1 [{'A': 1, 'B': 'text', 'C': 3}, {'A': 2, 'B': 'other', 'C': 2}]
4 foo bar 0 1 [{'A': 3, 'B': 'foo', 'C': 4}, {'A': 6, 'B': 'bar', 'C': 8}]
With loc I can select the index and the E column:
df2 = df.loc[:, ['E']]
Index E
3 [{'A': 1, 'B': 'text', 'C': 3}, {'A': 2, 'B': 'other', 'C': 2}]
4 [{'A': 3, 'B': 'foo', 'C': 4}, {'A': 6, 'B': 'bar', 'C': 8}]
But what I need is this structure
Index A B C
3 1 text 3
3 2 other 2
4 3 foo 4
4 6 bar 8
I think that iterating over the rows, extracting the array, and creating another df for each would work, but I hope a more efficient solution can be found.
Thanks in advance.
You can use explode to flatten your column, then build a dataframe from the result:
out = pd.DataFrame(df['E'].explode().tolist())
print(out)
# Output
A B C
0 1 text 3
1 2 other 2
2 3 foo 4
3 6 bar 8
To preserve the index:
out = df['E'].explode()
out = pd.DataFrame(out.tolist(), index=out.index)
print(out)
# Output
A B C
3 1 text 3
3 2 other 2
4 3 foo 4
4 6 bar 8
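For reference, here is the index-preserving version as a self-contained sketch (the frame below is a reconstruction of the example data):

```python
import pandas as pd

# Reconstruction of the example frame; column E holds lists of dicts
df = pd.DataFrame(
    {
        "A": ["foo", "foo"],
        "B": ["bar", "bar"],
        "C": [0, 0],
        "D": [1, 1],
        "E": [
            [{"A": 1, "B": "text", "C": 3}, {"A": 2, "B": "other", "C": 2}],
            [{"A": 3, "B": "foo", "C": 4}, {"A": 6, "B": "bar", "C": 8}],
        ],
    },
    index=[3, 4],
)

# Flatten each list into one row per dict, keeping the original index
exploded = df["E"].explode()
out = pd.DataFrame(exploded.tolist(), index=exploded.index)
print(out)
```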
I found many solutions for the single-value case, but not for a pandas Series.
I would like to change this
col1
0 [{'a':2, 'b':3, 'c':9}]
1 [{'a':1, 'b':0, 'c':8}]
2 [{'a':4, 'b': 5, 'c':12}]
3 [{'a':3, 'b':6, 'c':11}]
into
col1
0 {'a':2, 'b':3, 'c':9}
1 {'a':1, 'b':0, 'c':8}
2 {'a':4, 'b': 5, 'c':12}
3 {'a':3, 'b':6, 'c':11}
Thanks
If each value is a one-element list, select the first element by indexing to remove the []:
df['col'] = df['col'].str[0]
If the values are string representations of lists of dicts, parse them first with ast.literal_eval:
import ast
df['col'] = df['col'].apply(ast.literal_eval).str[0]
print (df)
col
0 {'a': 2, 'b': 3, 'c': 9}
1 {'a': 1, 'b': 0, 'c': 8}
2 {'a': 4, 'b': 5, 'c': 12}
3 {'a': 3, 'b': 6, 'c': 11}
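Both cases as a runnable sketch (the frames below are made-up reproductions of the example data):

```python
import ast

import pandas as pd

# Case 1 (sketch): the values are actual one-element lists of dicts
df1 = pd.DataFrame({"col1": [[{"a": 2, "b": 3, "c": 9}], [{"a": 1, "b": 0, "c": 8}]]})
df1["col1"] = df1["col1"].str[0]  # take the single element out of each list

# Case 2 (sketch): the values are string reprs of those lists, so parse first
df2 = pd.DataFrame({"col1": ["[{'a': 4, 'b': 5, 'c': 12}]", "[{'a': 3, 'b': 6, 'c': 11}]"]})
df2["col1"] = df2["col1"].apply(ast.literal_eval).str[0]

print(df1)
print(df2)
```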
I have an object like
l = [
{'id': 1, 'name': 'a', 'obj2': [{'a': 3, 'b': 6}, {'a':4, 'b': 5}], 'obj': [{'x': 6, 'y': 'p'}, {'x': 10, 'y': 'q', 'z': 'qqq'}]},
{'id': 2, 'name': 'b', 'obj': [{'x': 10, 'y': 'r'}], 'obj2': [{'a': 9, 'i': 's'}]}
]
and I want to make it a dataframe like:
id name a i b x y z
1 a 3 6 6 p
1 a 3 6 10 q qqq
1 a 4 5 6 p
1 a 4 5 10 q qqq
2 b 9 s 10 r
Inside l, all records have the same keys, but I may get a different l with different key names and a different number of list-valued objects inside l[0].
Any help is much appreciated.
This is a perfect use case for pd.json_normalize:
l = [{'id': 1, 'name': 'a', 'obj': [{'x': 6, 'y': 'p'}, {'x': 10, 'y': 'q', 'z': 'qqq'}]},
{'id': 2, 'name': 'b', 'obj': [{'x': 10, 'y': 'r'}]}]
df = pd.json_normalize(l, 'obj', ['id', 'name'])
print(df)
# Output:
x y z id name
0 6 p NaN 1 a
1 10 q qqq 1 a
2 10 r NaN 2 b
Update:
I want to use the same code for every object with this structure, but the id, name, and obj keys may be named differently.
This assumes the list-valued key is the last one in each dict:
keys = list(l[0].keys())
df = pd.json_normalize(l, keys[-1], keys[:-1])
print(df)
# Output:
x y z id name
0 6 p NaN 1 a
1 10 q qqq 1 a
2 10 r NaN 2 b
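The original l also contains a second list column (obj2), which the desired output cross-joins with obj. One way to sketch that, using explode on each list column in turn and then normalizing the dicts (row order follows the explode order, which matches the desired output here):

```python
import pandas as pd

l = [
    {'id': 1, 'name': 'a',
     'obj2': [{'a': 3, 'b': 6}, {'a': 4, 'b': 5}],
     'obj': [{'x': 6, 'y': 'p'}, {'x': 10, 'y': 'q', 'z': 'qqq'}]},
    {'id': 2, 'name': 'b',
     'obj': [{'x': 10, 'y': 'r'}],
     'obj2': [{'a': 9, 'i': 's'}]},
]

df = pd.DataFrame(l)
# Exploding each list column in turn cross-joins the two lists per record
df = df.explode('obj2').explode('obj')
# Expand the dict columns into flat columns alongside the scalar fields
out = pd.concat(
    [df[['id', 'name']].reset_index(drop=True),
     pd.json_normalize(df['obj2'].tolist()),
     pd.json_normalize(df['obj'].tolist())],
    axis=1,
)
print(out)
```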
I am trying to go from this dataframe:
run property low high abs1perc0 in1out0 weight
0 bob a 5 9 1 1 2
1 bob s 5 9 1 1 2
2 bob d 1 10 0 1 2
3 tom a 1 2 1 1 2
4 tom s 2 3 1 1 2
5 tom d 8 9 0 1 2
to dictionaries named after a concatenation of the individual 'run' names and the column names (except property). The 'property' values have to become the keys and the data the values, i.e.:
boblow = {'a':5, 's':5, 'd':1}
bobhigh = {'a':9, 's':9, 'd':10}
bobabs1perc0 = {'a':1, 's':1, 'd':0}
...
tomlow = {'a':1, 's':2, 'd':8}
...
This would have to happen to huge dfs, and I can't wrap my head around how to do it other than by hand. I started making a list of concatenated names from the individual values of the 'run' column, but I'm certain someone here has a much faster and smarter way of doing it.
Thanks a Bunch!!
I recommend saving the output as a dict of dicts, and keeping the tuple keys rather than merging each into a single name; after reshaping the df, to_dict still works:
d=df.set_index(['run','property']).stack().unstack(1).to_dict('index')
{('bob', 'low'): {'a': 5, 'd': 1, 's': 5}, ('bob', 'high'): {'a': 9, 'd': 10, 's': 9}, ('bob', 'abs1perc0'): {'a': 1, 'd': 0, 's': 1}, ('bob', 'in1out0'): {'a': 1, 'd': 1, 's': 1}, ('bob', 'weight'): {'a': 2, 'd': 2, 's': 2}, ('tom', 'low'): {'a': 1, 'd': 8, 's': 2}, ('tom', 'high'): {'a': 2, 'd': 9, 's': 3}, ('tom', 'abs1perc0'): {'a': 1, 'd': 0, 's': 1}, ('tom', 'in1out0'): {'a': 1, 'd': 1, 's': 1}, ('tom', 'weight'): {'a': 2, 'd': 2, 's': 2}}
d[('bob','low')]
{'a': 5, 'd': 1, 's': 5}
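If flat names like boblow really are required, the tuple keys can be joined afterwards. A sketch on a cut-down version of the example frame (only the low and high columns, for brevity):

```python
import pandas as pd

df = pd.DataFrame({
    'run': ['bob', 'bob', 'bob', 'tom', 'tom', 'tom'],
    'property': ['a', 's', 'd', 'a', 's', 'd'],
    'low': [5, 5, 1, 1, 2, 8],
    'high': [9, 9, 10, 2, 3, 9],
})

# Reshape to a dict keyed by (run, column) tuples, as in the answer above
d = df.set_index(['run', 'property']).stack().unstack(1).to_dict('index')

# Join each tuple key into one concatenated name like 'boblow'
flat = {f'{run}{col}': v for (run, col), v in d.items()}
print(flat['boblow'])
```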
I am converting a data frame to a list of dictionaries (one per row); however, due to the number of columns and the number of observations in my data frame, I run out of memory using my current approach:
df = pd.DataFrame(np.random.randn(10, 3), columns=['a', 'b', 'c'])
df.T.to_dict().values()
Is there a more efficient way I can do this?
Is this what you want?
In [9]: df.to_dict('records')
Out[9]:
[{'a': 1.3720225964856179,
'b': -1.1530341240730422,
'c': -0.18791193632296455},
{'a': 1.3283240103713496, 'b': 3.6614598433626959, 'c': -0.46395170547460196},
{'a': -1.4960282310010959,
'b': 0.25156344524211743,
'c': -1.3664311385849288},
{'a': -0.11601714495988308,
'b': -0.73400546410732148,
'c': 0.9131316189984563},
{'a': 0.27404065198912386,
'b': -3.1246509560345261,
'c': 0.67227710572588184},
{'a': 1.3390654954886572, 'b': -0.80535280826120292, 'c': -1.78092490531724},
{'a': -0.13911682611874573,
'b': 1.6846890792762916,
'c': 0.22985191293512194},
{'a': -0.22058925847227495,
'b': -0.29342906413451442,
'c': -1.1181888670510167},
{'a': 3.2190577575509951, 'b': 0.59152576294942738, 'c': -1.3474566325216308},
{'a': -0.53486658456919434, 'b': 0.14390073779727405, 'c': 1.2214292373636}]
data:
In [10]: df
Out[10]:
a b c
0 1.372023 -1.153034 -0.187912
1 1.328324 3.661460 -0.463952
2 -1.496028 0.251563 -1.366431
3 -0.116017 -0.734005 0.913132
4 0.274041 -3.124651 0.672277
5 1.339065 -0.805353 -1.780925
6 -0.139117 1.684689 0.229852
7 -0.220589 -0.293429 -1.118189
8 3.219058 0.591526 -1.347457
9 -0.534867 0.143901 1.221429
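If memory is the real constraint, note that to_dict('records') still materializes the whole list at once. A sketch of a lazy alternative that yields one dict at a time (iter_records is a made-up helper name, not a pandas API):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)), columns=['a', 'b', 'c'])

def iter_records(frame):
    """Yield one row dict at a time instead of building the full list."""
    cols = frame.columns
    for row in frame.itertuples(index=False, name=None):
        yield dict(zip(cols, row))

records = iter_records(df)
first = next(records)  # only one row is materialized here
print(first)
```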
I am trying to convert a dataframe to dictionary:
xtest_cat = xtest_cat.T.to_dict().values()
but it gives a warning :
Warning: DataFrame columns are not unique, some columns will be omitted
I checked the columns names of the dataframe(xtest_cat) :
len(list(xtest_cat.columns.values))
len(set(list(xtest_cat.columns.values)))
they are all unique.
Can anyone help me out ?
The warning means your index is not unique (after T, the index becomes the columns). You can use reset_index to create a unique index:
xtest_cat = pd.DataFrame({'A':[1,2,3],
'B':[4,5,6],
'C':[7,8,9]})
xtest_cat.index = [0,1,1]
print (xtest_cat)
A B C
0 1 4 7
1 2 5 8
1 3 6 9
print (xtest_cat.index.is_unique)
False
xtest_cat.reset_index(drop=True, inplace=True)
print (xtest_cat)
A B C
0 1 4 7
1 2 5 8
2 3 6 9
xtest_cat = xtest_cat.T.to_dict().values()
print (xtest_cat)
dict_values([{'B': 4, 'C': 7, 'A': 1}, {'B': 5, 'C': 8, 'A': 2}, {'B': 6, 'C': 9, 'A': 3}])
You can also omit the T and pass orient='index':
xtest_cat = xtest_cat.to_dict(orient='index').values()
print (xtest_cat)
dict_values([{'B': 4, 'C': 7, 'A': 1}, {'B': 5, 'C': 8, 'A': 2}, {'B': 6, 'C': 9, 'A': 3}])
orient='records' is better, because it returns the list directly:
xtest_cat = xtest_cat.to_dict(orient='records')
print (xtest_cat)
[{'B': 4, 'C': 7, 'A': 1}, {'B': 5, 'C': 8, 'A': 2}, {'B': 6, 'C': 9, 'A': 3}]
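As a side note, orient='records' ignores the index labels entirely, so it sidesteps the non-unique-index problem even without reset_index; a small sketch:

```python
import pandas as pd

# Duplicate index labels, as in the scenario that triggered the warning
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
                  index=[0, 1, 1])

# orient='records' keeps all rows regardless of the index
records = df.to_dict(orient='records')
print(records)
```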