I'm trying to create a pandas DataFrame out of a dictionary. The dictionary keys are strings and each value is one or more lists. I'm having a strange issue in which the pd.DataFrame() command consistently returns an empty DataFrame even when I pass it a non-empty object like a list or dict.
My code is similar to the following:
myDictionary = {"ID1":[1,2,3], "ID2":[10,11,12],[2,34,11],"ID3":[8,3,12]}
df = pd.DataFrame(myDictionary, columns = ["A","B","C"])
So I want to create a DF that looks like this:
A B C
ID1 1 2 3
ID2 10 11 12
ID2 2 34 11
ID3 8 3 12
When I check the contents of df, I get "Empty DataFrame", and if I iterate over its contents, I get just the column names and none of the data in myDictionary! I have checked the documentation and this should be a straightforward command:
pd.DataFrame(dict, columns)
This doesn't get me the result I'm looking for and I'm baffled why. Anyone have any ideas? Thank you!
What I would recommend doing in this situation is interpreting your list of lists as strings. Later if you need to edit or analyze any of these you can use a parser to interpret the columns.
See below working code that allows you to keep your list of lists in the dataframe.
myDictionary = {"ID1":'[1,2,3]', "ID2":'[10,11,12],[2,34,11]',"ID3":'[8,3,12]'}
df = pd.DataFrame(myDictionary, columns = ["ID1","ID2","ID3"], index = [0])
df.rename(columns ={'ID1' : 'A', 'ID2': 'B', 'ID3': 'C'}, inplace = True)
df.head(3)
By always converting the lists to strings you will be able to combine them much more easily, regardless of how many lists need to be combined.
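As a rough sketch of the "parser" step mentioned above, the standard-library ast.literal_eval can turn such a string back into Python objects. This assumes every string is a valid Python literal; the variable names are only for illustration.

```python
import ast

import pandas as pd

# One row of stringified lists, as in the answer above
my_dictionary = {"ID1": "[1,2,3]", "ID2": "[10,11,12],[2,34,11]", "ID3": "[8,3,12]"}
df = pd.DataFrame(my_dictionary, index=[0])

# Parse a cell back into Python objects when the values are needed
single = ast.literal_eval(df.loc[0, "ID1"])  # a list
double = ast.literal_eval(df.loc[0, "ID2"])  # a tuple of two lists

print(single)
print(double)
```

Note that a cell holding two comma-separated lists parses to a tuple of lists, so code that consumes the cells has to handle both shapes.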
Try the example below to figure out why df is empty:
myDictionary = {"ID1":[1,2,3], "ID2":[10,11,12],"ID3":[8,3,12], 'A':[0, 0, 0]}
df = pd.DataFrame(myDictionary, columns = ["A","B","C"])
And what you want is:
myDictionary = {"ID1":[1,2,3], "ID2":[10,11,12],"ID3":[8,3,12]}
df = pd.DataFrame(myDictionary).rename(columns={'ID1':'A', 'ID2':'B', 'ID3':'C'})
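If the IDs are meant to be row labels, as in the layout shown in the question, one option is DataFrame.from_dict with orient="index". This is a minimal sketch that assumes every value is a flat list of the same length, so it does not cover the repeated ID2 row.

```python
import pandas as pd

my_dictionary = {"ID1": [1, 2, 3], "ID2": [10, 11, 12], "ID3": [8, 3, 12]}

# orient="index" turns each key into a row label; columns names the positions
df = pd.DataFrame.from_dict(my_dictionary, orient="index", columns=["A", "B", "C"])
print(df)
```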
You are passing in the names "ID1", "ID2", and "ID3" into pd.DataFrame as the column names and then telling pandas to use columns A, B, C. Since there are no columns A, B, C pandas returns an empty DataFrame. Use the code below to make the DataFrame:
import pandas as pd
myDictionary = {"ID1": [1, 2, 3], "ID2": [10, 11, 12], "ID3": [8, 3, 12]}
df = pd.DataFrame(myDictionary, columns=["ID1", "ID2", "ID3"])
print(df)
Output:
ID1 ID2 ID3
0 1 10 8
1 2 11 3
2 3 12 12
And moreover this:
"ID2":[10,11,12],[2,34,11]
Is incorrect, since you are either trying to pass two values for one key in a dictionary, or forgot to make a key for the values [2,34,11]. Python should raise a SyntaxError on that dictionary literal unless you remove that list or give it a key.
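The syntax problem can be checked without running the literal, by asking Python to parse it as a string. This is only a sketch: parses is a hypothetical helper, and the nested-list form is one guess at what the asker meant.

```python
import ast

bad_literal = '{"ID1":[1,2,3], "ID2":[10,11,12],[2,34,11],"ID3":[8,3,12]}'
good_literal = '{"ID1":[1,2,3], "ID2":[[10,11,12],[2,34,11]],"ID3":[8,3,12]}'


def parses(source):
    """Return True if source is a syntactically valid Python expression."""
    try:
        ast.parse(source, mode="eval")
        return True
    except SyntaxError:
        return False


print(parses(bad_literal))
print(parses(good_literal))
```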
Firstly the [2,34,11] list is missing a column name. GIVE IT A NAME!
The reason for your error is that when you use the following command:
df = pd.DataFrame(myDictionary, columns = ["A","B","C"])
It creates a dataframe based on your dictionary. But then you are saying that you only want columns from your dictionary that are labelled 'A', 'B', 'C', which your dictionary doesn't have.
Try instead:
df = pd.DataFrame(myDictionary, columns = ["ID1","ID2","ID3"])
df.rename(columns ={'ID1' : 'A', 'ID2': 'B', 'ID3': 'C'}, inplace = True)
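The selection behaviour described above can be checked with a short sketch. It assumes the dictionary without the stray [2,34,11] list, since the original literal does not parse.

```python
import pandas as pd

my_dictionary = {"ID1": [1, 2, 3], "ID2": [10, 11, 12], "ID3": [8, 3, 12]}

# No key matches "A", "B" or "C", so nothing is selected
empty = pd.DataFrame(my_dictionary, columns=["A", "B", "C"])
print(empty.empty, list(empty.columns))

# Selecting by the real keys and renaming afterwards keeps the data
df = pd.DataFrame(my_dictionary, columns=["ID1", "ID2", "ID3"])
df = df.rename(columns={"ID1": "A", "ID2": "B", "ID3": "C"})
print(df.shape)
```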
You cannot build a DataFrame directly from a dictionary when two rows share the same label, as in your example:
ID2 10 11 12
ID2 2 34 11
The limitation comes from the dictionary itself: every key in a dictionary has to be unique, but your DataFrame would need a dictionary like the one below, which cannot hold both entries:
{"ID2":[10,11,12],"ID2":[2,34,11]}
So my suggestion is to change your dictionary design and then follow one of the many answers about converting a dictionary to a DataFrame.
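To see what happens with a repeated key, here is a small sketch in plain Python. The nested-list layout is just one possible redesign, not the only one.

```python
# Python accepts a repeated key in a dict literal but keeps only the last value
duplicated = {"ID2": [10, 11, 12], "ID2": [2, 34, 11]}
print(duplicated)

# One possible redesign: a single key whose value is a list of rows
nested = {"ID2": [[10, 11, 12], [2, 34, 11]]}
print(len(nested["ID2"]))
```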
Here is one possible approach
Dictionary
myDictionary = {"ID1":[1,2,3], "ID2":[[10,11,12],[2,34,11]],"ID3":[8,3,12]}
Build a second dictionary d from the entries whose values are nested lists, so that (a) its keys are unique, using a suffix to tell them apart, and (b) its values are the individual sub-lists of each nested list.
To do this, iterate over the dictionary and:
check whether the value contains a sub-list
if so, add one key:value pair per sub-list to the separate dictionary d
use a suffix to separate identical keys, since the key ID2 can't be repeated in a dictionary
each suffixed key holds one of the sub-lists from the nested list
collect the keys of myDictionary whose values are nested lists in a list named nested_keys
d = {}
nested_keys = []
for k, v in myDictionary.items():
    # a value is "nested" if any of its items is itself a list
    if any(isinstance(i, list) for i in v):
        for m, s in enumerate(v):
            d[k + '_' + str(m + 1)] = s
        nested_keys.append(k)
print(d)
{'ID2_1': [10, 11, 12], 'ID2_2': [2, 34, 11]}
Using nested_keys (the list of keys whose values are nested lists), build a second dictionary that contains only the values that are not nested lists. A dictionary comprehension that filters on the key does this:
myDictionary = {key: myDictionary[key] for key in myDictionary if key not in nested_keys}
print(myDictionary)
{'ID1': [1, 2, 3], 'ID3': [8, 3, 12]}
Combine the 2 dictionaries above into a single dictionary
myDictionary = {**d, **myDictionary}
print(myDictionary)
{'ID2_1': [10, 11, 12], 'ID2_2': [2, 34, 11], 'ID1': [1, 2, 3], 'ID3': [8, 3, 12]}
Convert the combined dictionary into a DataFrame and drop the suffix that was added earlier
df = pd.DataFrame(list(myDictionary.values()), index=myDictionary.keys(),
columns=list('ABC'))
df.reset_index(inplace=True)
df = df.replace(r"_[0-9]", "", regex=True)
df.sort_values(by='index', inplace=True)
print(df)
index A B C
2 ID1 1 2 3
0 ID2 10 11 12
1 ID2 2 34 11
3 ID3 8 3 12
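The same result can probably be reached in fewer steps by collecting row labels and rows in two parallel lists. This is a sketch under the same assumption as above (a value is either a flat list or a list of lists); labels and rows are illustrative names.

```python
import pandas as pd

my_dictionary = {"ID1": [1, 2, 3], "ID2": [[10, 11, 12], [2, 34, 11]], "ID3": [8, 3, 12]}

labels, rows = [], []
for key, value in my_dictionary.items():
    # Treat a flat list as a single row and a nested list as several rows
    is_nested = any(isinstance(item, list) for item in value)
    for row in (value if is_nested else [value]):
        labels.append(key)
        rows.append(row)

df = pd.DataFrame(rows, index=labels, columns=list("ABC"))
print(df)
```

Here the IDs stay in the index, so ID2 simply appears twice as a row label and no suffix or sort is needed.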
The outcome of this case:
import numpy as _np
import pandas as _pd
df = _pd.DataFrame({'a':['1','2','3']})
df['b'] = _np.nan
for index in df.index:
    df.loc[index, 'b'] = [{'a':1}]
is:
a b
0 1 {'a': 1}
1 2 [{'a': 1}]
2 3 [{'a': 1}]
The outcome of this case:
df = _pd.DataFrame({'a':[1,2,3]})
df['b'] = _np.nan
for index in df.index:
    df.loc[index, 'b'] = [{'a':1}]
is:
a b
0 1 {'a': 1}
1 2 {'a': 1}
2 3 {'a': 1}
Why?
_pd.__version__
'0.23.4'
Edit: I am adding the version number because this might be a bug.
I think this is caused by the dtype conversion that happens when you assign an object to a float column. The first assignment has to convert the whole column from float to object; once the column is object, the assignments at index 1 and 2 store the value with the right type. Casting the column to object before the loop makes every row behave the same way:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a':['1','2','3']})
df['b'] = np.nan
df['b'] = df['b'].astype(object)  # make the column object dtype before assigning
for index in df.index:
    df.loc[index, 'b'] = [{'a':1}]
    print(df.loc[index, 'b'], index)
[{'a': 1}] 0
[{'a': 1}] 1
[{'a': 1}] 2
df
a b
0 1 [{'a': 1}]
1 2 [{'a': 1}]
2 3 [{'a': 1}]
Also, I think this may be related to https://github.com/pandas-dev/pandas/issues/11617
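Another way to sidestep the float-to-object conversion, sketched here as an assumption rather than the accepted fix, is to build the column from Python objects in one go so that it has object dtype from the start.

```python
import pandas as pd

df = pd.DataFrame({"a": ["1", "2", "3"]})

# Build the column from Python objects directly, so it is object dtype from the start
df["b"] = [[{"a": 1}] for _ in range(len(df))]

print(df["b"].dtype)
print(df)
```

The list comprehension gives every row its own list, so changing one cell later does not affect the others.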
I am using Python.
I have a dictionary of Dataframes. Each dataframe has a name in the dictionary and I can reference it correctly no problem.
I am trying to take that name and add it as a column across every row. I am having a rough time doing this.
You can simply assign the name string to a new column for each DataFrame:
import pandas as pd
frames = {
    'foo': pd.DataFrame({'a': [1, 2], 'b': [3, 4]}),
    'bar': pd.DataFrame({'a': [9, 8], 'b': [7, 6]})
}
for name, df in frames.items():
    df['name'] = name
    print(df, '\n')
Gives:
a b name
0 1 3 foo
1 2 4 foo
a b name
0 9 7 bar
1 8 6 bar
You can iterate through your dictionary and do the following:
for key in d.keys():  # d is the dictionary of DataFrames
    d[key]['new_col'] = key  # key is the name string to add to each DataFrame
You can use a dictionary comprehension and assign:
frames = {k: v.assign(name=k) for k, v in frames.items()}
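The practical difference between the approaches is that assign returns new DataFrames while column assignment modifies the existing ones. A small sketch, using labelled as an illustrative name so the original dictionary is kept for comparison:

```python
import pandas as pd

frames = {
    "foo": pd.DataFrame({"a": [1, 2], "b": [3, 4]}),
    "bar": pd.DataFrame({"a": [9, 8], "b": [7, 6]}),
}

# assign() returns a new DataFrame, so the originals are left untouched
labelled = {name: frame.assign(name=name) for name, frame in frames.items()}

print("name" in frames["foo"].columns)
print(labelled["bar"]["name"].tolist())
```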
I have a dict of dict like this:
data = {'1':{'a':10, 'b':30}, '2':{'a':20, 'b':60}}
And I want to convert it into this DataFrame:
1 2
'data' {'a':10, 'b':30} {'a':20, 'b':60}
But pandas.DataFrame(data, index=['data']) gives:
1 2
data NaN NaN
And pandas.DataFrame(data) gives:
1 2
a 10 20
b 30 60
So how can I get a DataFrame whose values are dicts?
Strange thing to want to do but you have to convert the values to a list with a single data element which is your dict:
In [42]:
data = {'1':{'a':10, 'b':30}, '2':{'a':20, 'b':60}}
for key in data:
    data[key] = [data[key]]
pd.DataFrame(data, index=['data'])
Out[42]:
1 2
data {'a': 10, 'b': 30} {'a': 20, 'b': 60}
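If the original dict should not be modified, the same wrapping can be done with a dictionary comprehension. This is a sketch; wrapped is just an illustrative name. The NaN result in the question arises because pandas treats the inner keys 'a' and 'b' as row labels, and 'data' is not among them.

```python
import pandas as pd

data = {"1": {"a": 10, "b": 30}, "2": {"a": 20, "b": 60}}

# Wrap each inner dict in a one-element list so pandas stores it as a single cell
wrapped = {key: [value] for key, value in data.items()}
df = pd.DataFrame(wrapped, index=["data"])

print(df.loc["data", "1"])
```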