Convert a dict of dict into a DataFrame - python

I have a dict of dict like this:
data = {'1':{'a':10, 'b':30}, '2':{'a':20, 'b':60}}
And I want to convert it into this DataFrame:
1 2
'data' {'a':10, 'b':30} {'a':20, 'b':60}
But using pandas.DataFrame(data, index=['data']) gives:
1 2
data NaN NaN
and using pandas.DataFrame(data) gives:
1 2
a 10 20
b 30 60
So how can I get a DataFrame whose values are dicts?

It's a strange thing to want to do, but you have to convert each value to a list with a single element, which is your dict:
In [42]:
import pandas as pd
data = {'1':{'a':10, 'b':30}, '2':{'a':20, 'b':60}}
# wrap each inner dict in a one-element list so it becomes a single cell
for key in data:
    data[key] = [data[key]]
pd.DataFrame(data, index=['data'])
Out[42]:
1 2
data {'a': 10, 'b': 30} {'a': 20, 'b': 60}
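The NaN result in the question happens because pd.DataFrame(data) treats each inner dict as a column indexed by the inner keys ('a', 'b'). Passing index=['data'] then asks for a row label that does not exist. Below is a minimal sketch of the same fix that leaves data unmodified. It builds the one-element lists in a dict comprehension instead of rewriting the dict in place:

```python
import pandas as pd

data = {'1': {'a': 10, 'b': 30}, '2': {'a': 20, 'b': 60}}

# Wrap each inner dict in a one-element list so pandas stores the dict
# itself as a single cell instead of expanding it into rows 'a' and 'b'.
df = pd.DataFrame({key: [value] for key, value in data.items()}, index=['data'])

print(df)
print(df.loc['data', '1'])  # the cell holds the original dict object
```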

Related

How to add "ordered data" with the apply method in pandas (without a for-loop)

ID   A   B   C   D    Orderd
No1  8   9   5   2    D:2 C:5 A:8 B:9
No2  3   1   7   9    B:1 A:3 C:7 D:9
No3  29  34  5   294  C:5 A:29 B:34 D:294
I would like to add an "Orderd" column built from columns A, B, C and D.
With a for loop, I can do it like this:
for n in range(len(df)):
    df['Orderd'][n] = df.T.sort_values(by=n,ascending=True)[n].to_string()
However, this is too slow. I would like to do the same thing with the df.apply method to make it faster.
You can use apply directly on your DataFrame, passing axis=1:
import pandas as pd
columns = ["ID","A","B","C","D"]
data = [["No1",8,9,5,2],
["No2",3,1,7,9],
["No3",29,34,5,294]]
df = pd.DataFrame(data=data, columns=columns)
df = df.set_index("ID") # keep the string IDs out of the sorted rows (mixing str and int raises an error)
df["Orderd"] = df.apply(lambda x: x.sort_values().to_dict(), axis=1)
Output:
A B C D Orderd
ID
No1 8 9 5 2 {'D': 2, 'C': 5, 'A': 8, 'B': 9}
No2 3 1 7 9 {'B': 1, 'A': 3, 'C': 7, 'D': 9}
No3 29 34 5 294 {'C': 5, 'A': 29, 'B': 34, 'D': 294}
I managed to do it like this:
df['Ordered'] = df.apply(lambda row: ' '.join([':'.join(s) for s in dict(row[1:].sort_values().astype('str')).items()]), axis=1)
Basically, I take all values in the row except the first one (the ID), which gives a Series. I sort it and convert the values to strings. Then I convert the Series to a dict and iterate over its items: a list comprehension joins each letter-value pair with a colon, and ' '.join then joins the pairs with a space.
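The question asks for the exact "D:2 C:5 A:8 B:9" string format. The first answer produces dicts, and the second selects the value columns by position. The sketch below sorts only columns A to D by name and formats the pairs directly, assuming ID is the index as in the first answer. Note that apply with axis=1 still calls the function once per row in Python. It mainly avoids the repeated transposes and chained indexing of the loop and is not truly vectorized.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [8, 3, 29], "B": [9, 1, 34], "C": [5, 7, 5], "D": [2, 9, 294]},
    index=pd.Index(["No1", "No2", "No3"], name="ID"),
)

# For each row, sort the A-D values and render them as "label:value" pairs.
df["Orderd"] = df[["A", "B", "C", "D"]].apply(
    lambda row: " ".join(f"{col}:{val}" for col, val in row.sort_values().items()),
    axis=1,
)
print(df)
```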

How to create a dataframe from a nested dictionary using pandas?

I have the following nested dictionary:
dict1 = {'a': 1,'b': 2,'remaining': {'c': 3,'d': 4}}
I want to create a DataFrame using pandas that is equivalent to the following:
df = pd.DataFrame(columns=list('abcd'))
df.loc[0] = [1,2,3,4]
You could pop the 'remaining' dict and use it to update dict1, then wrap each value in a list (a one-element vector) so that pandas builds a single row:
nested = dict1.pop('remaining')
dict1.update(nested)
pd.DataFrame({k: [v] for k, v in dict1.items()})
a b c d
0 1 2 3 4
You can use pandas.json_normalize:
dict1 = {'a': 1,'b': 2,'remaining': {'c': 3,'d': 4}}
df = pd.json_normalize(dict1)
df.columns = list('abcd')
Result:
a b c d
0 1 2 3 4
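Hard-coding df.columns = list('abcd') depends on the column order that json_normalize produces. The sketch below instead strips the parent prefix from each flattened name (json_normalize joins nested keys with '.' by default, for example "remaining.c"). It assumes leaf keys do not collide across different nested dicts.

```python
import pandas as pd

dict1 = {'a': 1, 'b': 2, 'remaining': {'c': 3, 'd': 4}}

# json_normalize flattens nested keys as "remaining.c", "remaining.d";
# keep only the part after the last '.' as the column name.
df = pd.json_normalize(dict1).rename(columns=lambda name: name.split('.')[-1])
print(df)
```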

How to flatten lists of dictionaries in multiple columns of a pandas DataFrame

I have a dataframe and each record stores a list of dictionaries like this:
row prodect_id recommend_info
0 XQ002 [{"recommend_key":"XXX567","recommend_point":50},
{"recommend_key":"XXX236","recommend_point":20},
{"recommend_key":"XXX090","recommend_point":35}]
1 XQ003 [{"recommend_key":"XXX089","recommend_point":30},
{"recommend_key":"XXX567","recommend_point":20}]
I would like to flatten the lists of dictionaries so that it looks like this:
row prodect_id recommend_info_recommend_key recommend_info_recommend_point
0 XQ002 XXX567 50
1 XQ002 XXX236 20
2 XQ002 XXX090 35
3 XQ003 XXX089 30
4 XQ003 XXX567 20
I know how to convert a single list of dictionaries to a DataFrame, like this:
d = [{"recommend_key":"XXX089","recommend_point":30},
{"recommend_key":"XXX567","recommend_point":20}]
df = pd.DataFrame(d)
row recommend_key recommend_point
0 XXX089 30
1 XXX567 20
But I don't know how to do this for a DataFrame where one column, or several columns, store lists of dicts:
row col_a col_b col_c
0 B001 [{"a":"b"},{"a":"c"}] [{"y":11},{"a":"c"}]
1 D009 [{"c":"o"},{"g":"c"}] [{"y":11},{"a":"c"},{"l":"c"}]
2 G068 [{"c":"b"},{"a":"c"}] [{"a":56},{"d":"c"}]
3 C004 [{"d":"a"},{"b":"c"}] [{"c":22},{"a":"c"},{"b":"c"}]
4 F011 [{"h":"u"},{"d":"c"}] [{"h":27},{"d":"c"}]
Try:
pd.concat([df.explode('recommend_info').drop(['recommend_info'], axis=1),
df.explode('recommend_info')['recommend_info'].apply(pd.Series)],
axis=1)
You can repeat the same step for every column that holds lists of dicts.
Here is an example:
>>> df = pd.DataFrame({'a': [[{3: 4, 5: 6}, {3:8, 5: 1}],
... [{3:2, 5:4}, {3: 8, 5: 10}]],
... 'b': ['X', "Y"]})
>>> df
a b
0 [{3: 4, 5: 6}, {3: 8, 5: 1}] X
1 [{3: 2, 5: 4}, {3: 8, 5: 10}] Y
>>> df = pd.concat([df.explode('a').drop(['a'], axis=1),
... df.explode('a')['a'].apply(pd.Series)],
... axis=1)
>>> df
b 3 5
0 X 4 6
0 X 8 1
1 Y 2 4
1 Y 8 10
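The answer above names the expanded columns after the dict keys and repeats the original index. The sketch below reproduces the layout asked for in the question: prefixed column names and a fresh 0..n-1 index. It swaps apply(pd.Series) for pd.json_normalize on the exploded values. It assumes every list is non-empty, because explode turns an empty list into NaN, which json_normalize may not handle.

```python
import pandas as pd

df = pd.DataFrame({
    "prodect_id": ["XQ002", "XQ003"],
    "recommend_info": [
        [{"recommend_key": "XXX567", "recommend_point": 50},
         {"recommend_key": "XXX236", "recommend_point": 20},
         {"recommend_key": "XXX090", "recommend_point": 35}],
        [{"recommend_key": "XXX089", "recommend_point": 30},
         {"recommend_key": "XXX567", "recommend_point": 20}],
    ],
})

# One row per dict, with a new 0..n-1 index so the concat below aligns.
exploded = df.explode("recommend_info", ignore_index=True)

# Turn the dicts into columns and prefix them with the source column name.
flat = pd.json_normalize(exploded["recommend_info"].tolist()).add_prefix("recommend_info_")

result = pd.concat([exploded.drop(columns="recommend_info"), flat], axis=1)
print(result)
```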
I had a DataFrame that contained several columns. One of the columns contained a list holding a single dictionary. I needed the dictionary to be exploded and then appended to the same row it came from. Riccardo's answer mostly worked for me. I have generalized it a bit below:
import pandas as pd

def explode_column_from_list_dict(df_in, column_name_to_explode):
    # explode once: one row per list element, original index repeated
    exploded = df_in.explode(column_name_to_explode)
    df = pd.concat(
        [
            exploded.drop([column_name_to_explode], axis=1),
            # expand each dict into its own columns
            exploded[column_name_to_explode].apply(pd.Series),
        ],
        axis=1,
    )
    return df

Creating Pandas DataFrame from list or dict always returns empty DF

I'm trying to create a pandas DataFrame from a dictionary. The dictionary keys are strings and the values are one or more lists. I'm having a strange issue in which the pd.DataFrame() command consistently returns an empty DataFrame even when I pass it a non-empty object like a list or dict.
My code is similar to the following:
myDictionary = {"ID1":[1,2,3], "ID2":[10,11,12],[2,34,11],"ID3":[8,3,12]}
df = pd.DataFrame(myDictionary, columns = ["A","B","C"])
So I want to create a DF that looks like this:
A B C
ID1 1 2 3
ID2 10 11 12
ID2 2 34 11
ID3 8 3 12
When I check the contents of df, I get "Empty DataFrame", and if I iterate over its contents, I get just the column names and none of the data in myDictionary! I have checked the documentation and this should be a straightforward command:
pd.DataFrame(dict, columns)
This doesn't get me the result I'm looking for and I'm baffled why. Anyone have any ideas? Thank you!
In this situation I would recommend storing your list of lists as strings. Later, if you need to edit or analyze any of them, you can use a parser to interpret the columns.
Below is working code that lets you keep your list of lists in the DataFrame:
myDictionary = {"ID1":'[1,2,3]', "ID2":'[10,11,12],[2,34,11]',"ID3":'[8,3,12]'}
df = pd.DataFrame(myDictionary, columns = ["ID1","ID2","ID3"], index = [0])
df.rename(columns ={'ID1' : 'A', 'ID2': 'B', 'ID3': 'C'}, inplace = True)
df.head(3)
By always converting the lists to strings, you will be able to combine them much more easily, regardless of how many lists need to be combined.
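The answer leaves the parser unspecified. One option, used here as an assumption, is Python's ast.literal_eval, which safely evaluates literals such as lists and tuples. A minimal sketch that reads the stored strings back into Python objects:

```python
import ast
import pandas as pd

myDictionary = {"ID1": '[1,2,3]', "ID2": '[10,11,12],[2,34,11]', "ID3": '[8,3,12]'}
df = pd.DataFrame(myDictionary, index=[0]).rename(columns={'ID1': 'A', 'ID2': 'B', 'ID3': 'C'})

# Parse each stored string back into Python objects.
# '[10,11,12],[2,34,11]' evaluates to a tuple of two lists.
parsed = df.loc[0].map(ast.literal_eval)
print(parsed['B'])
```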
Try the example below to see why df is empty:
myDictionary = {"ID1":[1,2,3], "ID2":[10,11,12],"ID3":[8,3,12], 'A':[0, 0, 0]}
df = pd.DataFrame(myDictionary, columns = ["A","B","C"])
And what you want is:
myDictionary = {"ID1":[1,2,3], "ID2":[10,11,12],"ID3":[8,3,12]}
df = pd.DataFrame(myDictionary).rename(columns={'ID1':'A', 'ID2':'B', 'ID3':'C'})
You are passing the names "ID1", "ID2", and "ID3" into pd.DataFrame as the column names and then telling pandas to use only columns A, B, C. Since there are no columns A, B, C in the data, pandas returns an empty DataFrame. Use the code below to make the DataFrame:
import pandas as pd
myDictionary = {"ID1": [1, 2, 3], "ID2": [10, 11, 12], "ID3": [8, 3, 12]}
df = pd.DataFrame(myDictionary, columns=["ID1", "ID2", "ID3"])
print(df)
Output:
ID1 ID2 ID3
0 1 10 8
1 2 11 3
2 3 12 12
Moreover, this:
"ID2":[10,11,12],[2,34,11]
is invalid: you are either trying to give one key two values, or you forgot to add a key for the values [2,34,11]. Python will raise a SyntaxError on this dictionary literal unless you remove that list or give it its own key.
Firstly, the [2,34,11] list is missing a key (column name). Give it a name!
The reason for your error is that when you use the following command:
df = pd.DataFrame(myDictionary, columns = ["A","B","C"])
It creates a dataframe based on your dictionary. But then you are saying that you only want columns from your dictionary that are labelled 'A', 'B', 'C', which your dictionary doesn't have.
Try instead:
df = pd.DataFrame(myDictionary, columns = ["ID1","ID2","ID3"])
df.rename(columns ={'ID1' : 'A', 'ID2': 'B', 'ID3': 'C'}, inplace = True)
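The desired output in the question has ID1, ID2, ID3 as row labels and A, B, C as columns. The answers above keep the IDs as columns instead. The sketch below uses DataFrame.from_dict with orient='index', which reads each key as a row. It leaves out the second ID2 list, which needs one of the restructurings discussed in this thread.

```python
import pandas as pd

# The duplicated "ID2" row is left out, since a dict cannot hold it twice.
myDictionary = {"ID1": [1, 2, 3], "ID2": [10, 11, 12], "ID3": [8, 3, 12]}

# orient="index" turns each key into a row label and each list into that row;
# columns names the positions within the lists.
df = pd.DataFrame.from_dict(myDictionary, orient="index", columns=["A", "B", "C"])
print(df)
```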
You cannot build a DataFrame with two rows sharing the same label, like these from your example, directly from a dictionary:
ID2 10 11 12
ID2 2 34 11
That is because every key in a dictionary has to be unique. Your desired DataFrame would correspond to the dictionary below, which cannot hold both lists (the second "ID2" entry silently overwrites the first):
{"ID2":[10,11,12],"ID2":[2,34,11]}
So my suggestion is to change your dictionary design and then follow one of the many answers about converting a dictionary to a DataFrame.
Here is one possible approach.
Start with the dictionary, with both ID2 lists nested under one key:
myDictionary = {"ID1":[1,2,3], "ID2":[[10,11,12],[2,34,11]],"ID3":[8,3,12]}
Build a dictionary d from the entries whose values are nested lists, such that (a) its keys are unique, using a suffix to keep the keys of d unique, and (b) each value is one sub-list from the nested list.
To do this, iterate over the dictionary and:
check whether the value contains a sublist;
if so, add each of its sub-lists to a separate dictionary d;
use a suffix to distinguish identical keys, since the key ID2 can't be repeated in a dictionary;
each suffixed key holds one of the sub-lists from the nested list;
collect the keys of the original dictionary (myDictionary) whose values are nested lists in a list named nested_keys.
d = {}
nested_keys = []
for k, v in myDictionary.items():
    if any(isinstance(i, list) for i in v):
        for m, s in enumerate(v):
            d[k + '_' + str(m + 1)] = s
        nested_keys.append(k)
print(d)
{'ID2_1': [10, 11, 12], 'ID2_2': [2, 34, 11]}
Using nested_keys, build a second dictionary that contains only the entries whose values are not nested lists (see this SO post for how to do this):
myDictionary = {key: myDictionary[key] for key in myDictionary if key not in nested_keys}
print(myDictionary)
{'ID1': [1, 2, 3], 'ID3': [8, 3, 12]}
Combine the 2 dictionaries above into a single dictionary
myDictionary = {**d, **myDictionary}
print(myDictionary)
{'ID2_1': [10, 11, 12], 'ID2_2': [2, 34, 11], 'ID1': [1, 2, 3], 'ID3': [8, 3, 12]}
Convert the combined dictionary into a DataFrame and drop the suffix that was added earlier
df = pd.DataFrame(list(myDictionary.values()), index=myDictionary.keys(),
columns=list('ABC'))
df.reset_index(inplace=True)
df = df.replace(r"_[0-9]", "", regex=True)
df.sort_values(by='index', inplace=True)
print(df)
index A B C
2 ID1 1 2 3
0 ID2 10 11 12
1 ID2 2 34 11
3 ID3 8 3 12
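A DataFrame index is allowed to contain duplicate labels, so the suffixes are not strictly required. The sketch below is a shorter variant of the same idea. It collects one row per flat list or sub-list together with its original key, then passes the keys directly as the index.

```python
import pandas as pd

myDictionary = {"ID1": [1, 2, 3], "ID2": [[10, 11, 12], [2, 34, 11]], "ID3": [8, 3, 12]}

labels, rows = [], []
for key, value in myDictionary.items():
    # A value holding sub-lists contributes one row per sub-list;
    # a flat list contributes a single row.
    sublists = value if any(isinstance(item, list) for item in value) else [value]
    for sub in sublists:
        labels.append(key)
        rows.append(sub)

# The index may repeat, so "ID2" appears twice without a suffix.
df = pd.DataFrame(rows, index=labels, columns=list("ABC"))
print(df)
```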

python dataframe to dictionary, key value issue

I am creating a DataFrame and then converting it to a dictionary as below:
data = {'ID': [1,2,3,4,5],
'A':['1','2','1','3','2'],
'B':[4,6,8,2,4]}
frame = pd.DataFrame(data)
dict_obj = dict(frame[['A','B']].groupby('A').median().sort_values(by='B'))
My problem is that I want column A as keys and column B as values, but somehow I am getting a weird dictionary:
dict_obj
{'B': A
3 2
2 5
1 6
Name: B, dtype: int64}
I want the dictionary object to be:
{1:6,2:5,3:2}
Could someone help please?
Use the pd.Series.to_dict method:
frame.groupby('A').B.median().to_dict()
{'1': 6, '2': 5, '3': 2}
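The keys in that output are strings ('1', '2', '3') because column A holds strings, while the question asks for {1:6,2:5,3:2}. The sketch below assumes integer keys are what is wanted and casts A to int before grouping. The median values may come back as floats (for example 6.0), and these still compare equal to the integers in the expected dict.

```python
import pandas as pd

data = {'ID': [1, 2, 3, 4, 5],
        'A': ['1', '2', '1', '3', '2'],
        'B': [4, 6, 8, 2, 4]}
frame = pd.DataFrame(data)

# Column A holds strings, so cast it to int before grouping
# if the dictionary keys should be integers rather than '1', '2', '3'.
medians = frame.astype({'A': int}).groupby('A')['B'].median()

# int(key) turns the index labels into plain Python ints.
dict_obj = {int(key): value for key, value in medians.items()}
print(dict_obj)
```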
