Python: Pandas: making a dictionary from dataframe

I am trying to convert a dataframe to dictionary:
xtest_cat = xtest_cat.T.to_dict().values()
but it gives a warning:
Warning: DataFrame columns are not unique, some columns will be omitted
I checked the column names of the dataframe (xtest_cat):
len(list(xtest_cat.columns.values))
len(set(list(xtest_cat.columns.values)))
They are all unique.
Can anyone help me out?

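The transpose turns the row index into the columns, so a duplicated index produces duplicate columns and triggers the warning. You can use reset_index to create a unique index: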
xtest_cat = pd.DataFrame({'A':[1,2,3],
'B':[4,5,6],
'C':[7,8,9]})
xtest_cat.index = [0,1,1]
print (xtest_cat)
A B C
0 1 4 7
1 2 5 8
1 3 6 9
print (xtest_cat.index.is_unique)
False
xtest_cat.reset_index(drop=True, inplace=True)
print (xtest_cat)
A B C
0 1 4 7
1 2 5 8
2 3 6 9
d = xtest_cat.T.to_dict().values()
print (d)
dict_values([{'B': 4, 'C': 7, 'A': 1}, {'B': 5, 'C': 8, 'A': 2}, {'B': 6, 'C': 9, 'A': 3}])
You can also omit T and pass the parameter orient='index':
d = xtest_cat.to_dict(orient='index').values()
print (d)
dict_values([{'B': 4, 'C': 7, 'A': 1}, {'B': 5, 'C': 8, 'A': 2}, {'B': 6, 'C': 9, 'A': 3}])
orient='records' is better, because it returns a list of dicts directly:
d = xtest_cat.to_dict(orient='records')
print (d)
[{'B': 4, 'C': 7, 'A': 1}, {'B': 5, 'C': 8, 'A': 2}, {'B': 6, 'C': 9, 'A': 3}]
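The situation in the question can be reproduced with a small frame. The sketch below uses made-up data and assumes only that pandas is installed. It shows that unique column names do not rule out a duplicated row index, and that orient='records' ignores the index, so it returns one dict per row either way.

```python
import pandas as pd

# Hypothetical frame: unique column names, but index label 1 appears twice.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[0, 1, 1])

columns_unique = df.columns.is_unique  # the check made in the question
index_unique = df.index.is_unique      # the check that actually matters here

# orient='records' drops the index, so no row is lost.
records = df.to_dict(orient='records')
print(columns_unique, index_unique)
print(records)
```

Checking df.index.is_unique before transposing is a quick way to tell whether the warning will appear.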

Related

Convert list of dictionaries with one element to single dictionary in pandas DataFrame

I found many different solutions for doing so for single case but not for pandas Series.
I would like to change this
col1
0 [{'a':2, 'b':3, 'c':9}]
1 [{'a':1, 'b':0, 'c':8}]
2 [{'a':4, 'b': 5, 'c':12}]
3 [{'a':3, 'b':6, 'c':11}]
into
col1
0 {'a':2, 'b':3, 'c':9}
1 {'a':1, 'b':0, 'c':8}
2 {'a':4, 'b': 5, 'c':12}
3 {'a':3, 'b':6, 'c':11}
Thanks
If each list has exactly one element, select it by indexing to remove the []:
df['col'] = df['col'].str[0]
If the values are string representations of such lists, parse them first:
import ast
df['col'] = df['col'].apply(ast.literal_eval).str[0]
print (df)
col
0 {'a': 2, 'b': 3, 'c': 9}
1 {'a': 1, 'b': 0, 'c': 8}
2 {'a': 4, 'b': 5, 'c': 12}
3 {'a': 3, 'b': 6, 'c': 11}
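Both cases can be tried on a small frame. This is a minimal sketch with made-up data; the column names col and col_text are placeholders chosen for illustration.

```python
import ast

import pandas as pd

# Hypothetical data: one column holds real one-element lists,
# the other holds the same lists stored as strings.
df = pd.DataFrame({
    'col': [[{'a': 2, 'b': 3}], [{'a': 1, 'b': 0}]],
    'col_text': ["[{'a': 2, 'b': 3}]", "[{'a': 1, 'b': 0}]"],
})

# .str[0] takes the first element of each list.
unwrapped = df['col'].str[0]

# Strings must be parsed into Python objects before indexing.
parsed = df['col_text'].apply(ast.literal_eval).str[0]

print(unwrapped.tolist())
print(parsed.tolist())
```

ast.literal_eval only evaluates Python literals, which makes it a safer choice than eval for text like this.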

Can you use a string from an array to equate two variables?

import numpy as np
A = np.array(['B'])
B = 5
C = A[0]
I would like C = 5 if that is possible.
An approach like this sounds like it would be far better:
arr = np.array([['A', 'B', 'C'],['D', 'E', 'F'],['G', 'H', 'I']])
values = {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5, 'F': 6, 'G': 7, 'H': 8, 'I': 9}
print(values['C'])
values['C'] = values[arr[0][1]]
print(values['C'])
Output:
3
2
You can then view each row of the array together with its values by doing:
np.array(list(map(lambda row: {x:values[x] for x in row}, arr)))
Output:
array([{'A': 1, 'B': 2, 'C': 2},
{'D': 4, 'E': 5, 'F': 6},
{'G': 7, 'H': 8, 'I': 9}], dtype=object)
Alternatively, with a plain list and a dictionary:
A = ['B']
B = {'B':5}
C = B[A[0]]
Printing C then gives 5.
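If only the numbers are needed, the same dictionary lookup can be applied to every element of the array. This is a sketch with made-up names and values; names, values and resolved are hypothetical identifiers.

```python
import numpy as np

# Hypothetical array of variable names and a dictionary holding their values.
names = np.array([['A', 'B'], ['C', 'D']])
values = {'A': 1, 'B': 2, 'C': 3, 'D': 4}

# Look up each name in the dictionary, keeping the array's shape.
resolved = np.array([[values[name] for name in row] for row in names])
print(resolved)
```

Keeping the values in a dictionary avoids resolving variable names from strings at run time.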

python pandas extract column of array to rows and columns

Hello, I have a dataframe with the following structure:
Index A B C D E
3 foo bar 0 1 [{'A': 1, 'B': 'text', 'C': 3}, {'A': 2, 'B': 'other', 'C': 2}]
4 foo bar 0 1 [{'A': 3, 'B': 'foo', 'C': 4}, {'A': 6, 'B': 'bar', 'C': 8}]
With loc I get the Index and the E column
df2 = df.loc[:, ['E']]
Index E
3 [{'A': 1, 'B': 'text', 'C': 3}, {'A': 2, 'B': 'other', 'C': 2}]
4 [{'A': 3, 'B': 'foo', 'C': 4}, {'A': 6, 'B': 'bar', 'C': 8}]
But what I need is this structure
Index A B C
3 1 text 3
3 2 other 2
4 3 foo 4
4 6 bar 8
I think that iterating over the rows, extracting the array and creating another df for each will work, but I hope that a more efficient solution can be found.
Thanks in advance.
You can use explode to flatten your column then create a dataframe:
out = pd.DataFrame(df['E'].explode().tolist())
print(out)
# Output
A B C
0 1 text 3
1 2 other 2
2 3 foo 4
3 6 bar 8
To preserve the index:
out = df['E'].explode()
out = pd.DataFrame(out.tolist(), index=out.index)
print(out)
# Output
A B C
3 1 text 3
3 2 other 2
4 3 foo 4
4 6 bar 8
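For a version that runs on its own, the sketch below rebuilds only the E column from the question (the other columns are left out for brevity) and applies the index-preserving variant.

```python
import pandas as pd

# Column E from the question, with the original index labels 3 and 4.
df = pd.DataFrame(
    {'E': [
        [{'A': 1, 'B': 'text', 'C': 3}, {'A': 2, 'B': 'other', 'C': 2}],
        [{'A': 3, 'B': 'foo', 'C': 4}, {'A': 6, 'B': 'bar', 'C': 8}],
    ]},
    index=[3, 4],
)

# explode() gives one row per list element and repeats the index label.
exploded = df['E'].explode()

# Each element is a dict, so a list of them becomes one row per dict.
out = pd.DataFrame(exploded.tolist(), index=exploded.index)
print(out)
```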

Fastest way to generate dictionaries from a pandas df without to_dict

I am trying to go from this dataframe:
run property low high abs1perc0 in1out0 weight
0 bob a 5 9 1 1 2
1 bob s 5 9 1 1 2
2 bob d 1 10 0 1 2
3 tom a 1 2 1 1 2
4 tom s 2 3 1 1 2
5 tom d 8 9 0 1 2
to dictionaries that are named after a concatenation of the individual 'run' names and the column names (except property). Property has to become the key and the data has to become the values i.e:
boblow = {'a':5, 's':5, 'd':1}
bobhigh = {'a':9, 's':9, 'd':10}
bobabs1perc0 = {'a':1, 's':1, 'd':0}
...
tomlow = {'a':1, 's':2, 'd':8}
...
This would have to happen to huge dfs and I can't wrap my head around how to do it other than by hand. I started making a list of concatenated names of individual values of the 'run' column, but I'm certain someone here has a much faster and smarter way of doing it.
Thanks a Bunch!!
I recommend saving the output into a dict of dicts rather than separate variables, and keeping the (run, column) tuple as the key instead of merging it into one string. After reshaping your df, to_dict still works:
d=df.set_index(['run','property']).stack().unstack(1).to_dict('index')
{('bob', 'low'): {'a': 5, 'd': 1, 's': 5}, ('bob', 'high'): {'a': 9, 'd': 10, 's': 9}, ('bob', 'abs1perc0'): {'a': 1, 'd': 0, 's': 1}, ('bob', 'in1out0'): {'a': 1, 'd': 1, 's': 1}, ('bob', 'weight'): {'a': 2, 'd': 2, 's': 2}, ('tom', 'low'): {'a': 1, 'd': 8, 's': 2}, ('tom', 'high'): {'a': 2, 'd': 9, 's': 3}, ('tom', 'abs1perc0'): {'a': 1, 'd': 0, 's': 1}, ('tom', 'in1out0'): {'a': 1, 'd': 1, 's': 1}, ('tom', 'weight'): {'a': 2, 'd': 2, 's': 2}}
d[('bob','low')]
{'a': 5, 'd': 1, 's': 5}
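If the concatenated names from the question are still wanted, they can be used as string keys of a single dictionary. The sketch below is an alternative that uses melt and groupby rather than the stack/unstack chain above. It runs on a shortened version of the question's data, and the names long_df and named are placeholders.

```python
import pandas as pd

# Shortened version of the frame from the question.
df = pd.DataFrame({
    'run': ['bob', 'bob', 'tom', 'tom'],
    'property': ['a', 's', 'a', 's'],
    'low': [5, 5, 1, 2],
    'high': [9, 9, 2, 3],
})

# Long format: one row per (run, property, column) combination.
long_df = df.melt(id_vars=['run', 'property'],
                  var_name='column', value_name='value')

# One inner dict per (run, column) pair, keyed by the concatenated name.
named = {
    f"{run}{column}": dict(zip(group['property'], group['value'].tolist()))
    for (run, column), group in long_df.groupby(['run', 'column'])
}
print(named['boblow'])
```

A lookup such as named['tomhigh'] then replaces a separate variable called tomhigh.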

How to convert a nested dictionary to pandas dataframe?

I have a dictionary "my_dict" in this format:
{'l1':{'c1': {'a': 0, 'b': 1, 'c': 2},
'c2': {'a': 3, 'b': 4, 'c': 5}},
'l2':{'c1': {'a': 0, 'b': 1, 'c': 2},
'c2': {'a': 3, 'b': 4, 'c': 5}}
}
Currently, I am using pd.DataFrame.from_dict(my_dict, orient='index') and get a df like this:
c2 c1
l1 {u'a': 3, u'c': 5, u'b': 4} {u'a': 0, u'c': 2, u'b': 1}
l2 {u'a': 3, u'c': 5, u'b': 4} {u'a': 0, u'c': 2, u'b': 1}
However, what I want is both l1/l2 and c1/c2 as indexes and a/b/c as columns.
Something like this:
a b c
l1 c1 0 1 2
c2 3 4 5
l2 c1 0 1 2
c2 3 4 5
What's the best way to do this?
Consider a dictionary comprehension to build a dictionary with tuple keys. Then, use pandas' MultiIndex.from_tuples. Below, ast is used to rebuild your original dictionary from a string (ignore this step on your end).
import pandas as pd
import ast
origDict = ast.literal_eval("""
{'l1':{'c1': {'a': 0, 'b': 1, 'c': 2},
'c2': {'a': 3, 'b': 4, 'c': 5}},
'l2':{'c1': {'a': 0, 'b': 1, 'c': 2},
'c2': {'a': 3, 'b': 4, 'c': 5}}
}""")
# DICTIONARY COMPREHENSION
newdict = {(k1, k2): v2 for k1, v1 in origDict.items()
           for k2, v2 in v1.items()}
print(newdict)
# {('l1', 'c2'): {'c': 5, 'a': 3, 'b': 4},
# ('l2', 'c1'): {'c': 2, 'a': 0, 'b': 1},
# ('l1', 'c1'): {'c': 2, 'a': 0, 'b': 1},
# ('l2', 'c2'): {'c': 5, 'a': 3, 'b': 4}}
# DATA FRAME ASSIGNMENT
df = pd.DataFrame([newdict[i] for i in sorted(newdict)],
index=pd.MultiIndex.from_tuples([i for i in sorted(newdict.keys())]))
print(df)
# a b c
# l1 c1 0 1 2
# c2 3 4 5
# l2 c1 0 1 2
# c2 3 4 5
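A shorter variant of the same idea passes the tuple-keyed dictionary straight to DataFrame.from_dict with orient='index'. This is a sketch on the question's data and relies on pandas building a MultiIndex from tuple keys; the names my_dict and flat are placeholders.

```python
import pandas as pd

my_dict = {
    'l1': {'c1': {'a': 0, 'b': 1, 'c': 2}, 'c2': {'a': 3, 'b': 4, 'c': 5}},
    'l2': {'c1': {'a': 0, 'b': 1, 'c': 2}, 'c2': {'a': 3, 'b': 4, 'c': 5}},
}

# Flatten the two outer levels into tuple keys.
flat = {(k1, k2): v2 for k1, v1 in my_dict.items() for k2, v2 in v1.items()}

# Tuple keys become the levels of the row index.
df = pd.DataFrame.from_dict(flat, orient='index')
print(df)
```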
