How to create a dataframe from a nested dictionary using pandas?


I have the following nested dictionary:
dict1 = {'a': 1,'b': 2,'remaining': {'c': 3,'d': 4}}
I want to create a dataframe using pandas in order to achieve the following
df = pd.DataFrame(columns=list('abcd'))
df.loc[0] = [1,2,3,4]

You could pop the 'remaining' dict, merge its items into dict1, and then wrap each value in a list (a one-element vector) so that pandas builds a one-row DataFrame:
nested = dict1.pop('remaining')
dict1.update(nested)
pd.DataFrame({k: [v] for k, v in dict1.items()})
   a  b  c  d
0  1  2  3  4
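A sketch of the same idea that leaves dict1 unmodified and also handles deeper nesting. It assumes every leaf key is unique across all levels (a later duplicate silently overwrites an earlier one), and the helper name `flatten` is hypothetical, not a pandas function:

```python
import pandas as pd

def flatten(d, out=None):
    """Hypothetical helper: copy the leaf key/value pairs of a nested dict into one flat dict."""
    if out is None:
        out = {}
    for key, value in d.items():
        if isinstance(value, dict):
            flatten(value, out)  # recurse into sub-dicts, keeping only their leaf keys
        else:
            out[key] = value
    return out

dict1 = {'a': 1, 'b': 2, 'remaining': {'c': 3, 'd': 4}}
flat = flatten(dict1)        # dict1 itself is not modified
df = pd.DataFrame([flat])    # a list holding one dict gives one row
print(df)
```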

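You can use pandas.json_normalize, which flattens nested keys into dotted column names (remaining.c, remaining.d), and then rename the columns: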
dict1 = {'a': 1,'b': 2,'remaining': {'c': 3,'d': 4}}
df = pd.json_normalize(dict1)
df.columns = list('abcd')
Result:
   a  b  c  d
0  1  2  3  4
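pd.json_normalize is available at the top level from pandas 1.0 (older versions expose it as pandas.io.json.json_normalize). Instead of hard-coding list('abcd'), which depends on the exact column order, a sketch can strip the dotted prefixes from the generated names, assuming the leaf names do not collide:

```python
import pandas as pd

dict1 = {'a': 1, 'b': 2, 'remaining': {'c': 3, 'd': 4}}
df = pd.json_normalize(dict1)  # columns: a, b, remaining.c, remaining.d

# keep only the part after the last '.' in each column name
df.columns = [col.split('.')[-1] for col in df.columns]
print(df)
```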

Related

Convert dictionary with nested lists to a specific pandas df structure

How can I convert the dictionary to the "Desired output" below?
I have a dictionary with a nested list as values:
dictionary = {'Person1': ['a', 1, 'b', 2], 'Person2': ['c', 3, 'd', 4]} #(the dict is longer than this, this is just an example)
Desired output:
name       letter  value
'Person1'  'a'     1
'Person1'  'b'     2
'Person2'  'c'     3
'Person2'  'd'     4
Current code:
import pandas as pd
df = pd.DataFrame.from_dict(dictionary)

Convert the dictionary to a list of tuples: in a list comprehension, zip the even-position items (letters) with the odd-position items (values) of each list, prepend the key, and pass the result to the DataFrame constructor:
L = [(k, *x) for k, v in dictionary.items() for x in zip(v[::2], v[1::2])]
df = pd.DataFrame(L, columns=['name','letter','value'])
print (df)
      name  letter  value
0  Person1       a      1
1  Person1       b      2
2  Person2       c      3
3  Person2       d      4
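An alternative sketch of the same pairing that avoids building two sliced copies of each list: zip(it, it) draws two consecutive items from one iterator per step. Like the slicing version, it assumes each list has an even length, because zip silently drops a trailing unpaired item:

```python
import pandas as pd

dictionary = {'Person1': ['a', 1, 'b', 2], 'Person2': ['c', 3, 'd', 4]}

rows = []
for name, values in dictionary.items():
    it = iter(values)
    # both arguments are the same iterator, so each step yields (letter, value)
    for letter, value in zip(it, it):
        rows.append((name, letter, value))

df = pd.DataFrame(rows, columns=['name', 'letter', 'value'])
print(df)
```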

Create DataFrame containing dicts with values according to another DataFrame

I have a pandas dataframe
a = pd.DataFrame([2,4,3,6])
and want to create a corresponding second dataframe containing dicts with the same numeric entries as the first one:
                0
0  {'example': 2}
1  {'example': 4}
2  {'example': 3}
3  {'example': 6}
I tried the following but it doesn't work (b doesn't change at all with the second operation):
b = pd.DataFrame([[{'example':0}]] * 4)
b.loc[:][0]['example'] = a

You can create a new df from a list comprehension in which each element is a one-item list containing the dict.
df = pd.DataFrame([[{'example': x}] for x in a.iloc[:, 0]])
Output:
                0
0  {'example': 2}
1  {'example': 4}
2  {'example': 3}
3  {'example': 6}
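A sketch of the same result with Series.map. Note also that [[{'example':0}]] * 4 in the attempt repeats references to a single dict, so every row of b would share one object; creating the dict inside the mapped function gives each row its own:

```python
import pandas as pd

a = pd.DataFrame([2, 4, 3, 6])

# build a fresh dict for every number, then turn the Series back into a one-column frame
b = a[0].map(lambda x: {'example': x}).to_frame()
print(b)
```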

Why does the content of a dataframe affect setting?

The outcome of this case:
df = _pd.DataFrame({'a':['1','2','3']})
df['b'] = _np.nan
for index in df.index:
    df.loc[index, 'b'] = [{'a':1}]
is:
   a           b
0  1    {'a': 1}
1  2  [{'a': 1}]
2  3  [{'a': 1}]
The outcome of this case:
df = _pd.DataFrame({'a':[1,2,3]})
df['b'] = _np.nan
for index in df.index:
    df.loc[index, 'b'] = [{'a':1}]
is:
   a         b
0  1  {'a': 1}
1  2  {'a': 1}
2  3  {'a': 1}
Why?
_pd.__version__
'0.23.4'
Edit: I added the pandas version number because this might be a bug.

I think this is caused by type conversion when you assign an object to a float column. df['b'] = np.nan creates a float64 column, so the first assignment (index 0) has to convert the whole column from float to object. After that the column is already object dtype, so the assignments at index 1 and 2 store the value with the right type. Casting the column to object before the loop gives the same result for every row:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a':['1','2','3']})
df['b'] = np.nan
df['b'] = df['b'].astype(object)  # make the column object dtype before assigning lists
for index in df.index:
    df.loc[index, 'b'] = [{'a':1}]
    print(df.loc[index, 'b'], index)
[{'a': 1}] 0
[{'a': 1}] 1
[{'a': 1}] 2
df
   a           b
0  1  [{'a': 1}]
1  2  [{'a': 1}]
2  3  [{'a': 1}]
Also, I think this may be related to https://github.com/pandas-dev/pandas/issues/11617
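A sketch that sidesteps per-row assignment entirely, assuming the goal is one list per row: build an object-dtype Series and assign the whole column at once, so no single-cell assignment has to upcast the float column created by np.nan:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['1', '2', '3']})
df['b'] = np.nan
print(df['b'].dtype)  # float64, because of the NaN fill

# one list per row, stored in an object Series aligned on df's index
df['b'] = pd.Series([[{'a': 1}] for _ in df.index], index=df.index, dtype=object)
print(df['b'].dtype)  # object
print(df)
```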

Looping dictionary through column using Pandas

I have a data frame with a column called "Input", consisting of various numbers.
I created a dictionary that looks like this
sampleDict = {
"a" : ["123","456"],
"b" : ["789","272"]
}
I am attempting to loop through column "Input" against this dictionary. If any of the values in the dictionary are found (123, 789, etc), I would like to create a new column in my data frame that signifies where it was found.
For example, I would like to create a column called "found" whose value is "a" when 456 is found in "Input" and "b" when 789 is found.
I tried the following code but my logic seems to be off:
for key in sampleDict:
    for p_key in df['Input']:
        if code in p_key:
            if code in sampleDict[key]:
                df = print(code)
print(df)

Flatten the lists into a dictionary that maps each value to its key and then use map; this works only if all values across the lists are unique:
d = {k: oldk for oldk, oldv in sampleDict.items() for k in oldv}
print (d)
{'123': 'a', '456': 'a', '789': 'b', '272': 'b'}
df = pd.DataFrame({'Input':['789','456','100']})
df['found'] = df['Input'].map(d)
print (df)
   Input  found
0    789      b
1    456      a
2    100    NaN
If a value can appear in more than one list, aggregate first, e.g. join the matching keys into one string, and then map by the resulting Series:
sampleDict = {
"a" : ["123","456", "789"],
"b" : ["789","272"]
}
df1 = pd.DataFrame([(k, oldk) for oldk, oldv in sampleDict.items() for k in oldv],
columns=['a','b'])
s = df1.groupby('a')['b'].apply(', '.join)
print (s)
a
123       a
272       b
456       a
789    a, b
Name: b, dtype: object
df = pd.DataFrame({'Input':['789','456','100']})
df['found'] = df['Input'].map(s)
print (df)
   Input  found
0    789   a, b
1    456      a
2    100    NaN
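A more compact sketch of the duplicate-tolerant lookup, assuming pandas 0.25 or later for Series.explode: explode the lists into (key, code) pairs, invert them, and join the keys per code:

```python
import pandas as pd

sampleDict = {
    "a": ["123", "456", "789"],
    "b": ["789", "272"],
}

# one row per (key, code) pair: index = key, values = code
pairs = pd.Series(sampleDict).explode()
# invert: index = code, values = key; then join every key that shares a code
lookup = pd.Series(pairs.index, index=pairs.values).groupby(level=0).agg(', '.join)

df = pd.DataFrame({'Input': ['789', '456', '100']})
df['found'] = df['Input'].map(lookup)
print(df)
```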

You can use collections.defaultdict to build a mapping from each list value to its key(s). The data is taken from @jezrael's answer.
from collections import defaultdict
d = defaultdict(list)
for k, v in sampleDict.items():
    for w in v:
        d[w].append(k)
print(d)
defaultdict(list,
{'123': ['a'], '272': ['b'], '456': ['a'], '789': ['a', 'b']})
Then use pd.Series.map to map inputs to keys in a new series:
df = pd.DataFrame({'Input':['789','456','100']})
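# convert to a plain dict: Series.map uses the default of a dict subclass that
# defines __missing__ (like defaultdict), so '100' would become [] instead of NaN
df['found'] = df['Input'].map(dict(d))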
print(df)
   Input   found
0    789  [a, b]
1    456     [a]
2    100     NaN

Create a boolean mask with a list comprehension, convert each list to a NumPy array, and use the mask to select the values that also appear in the search list:
import numpy as np

sampleDict = {
    "a" : ["123","456"],
    "b" : ["789","272"]
}
search=['789','456','100']
#https://www.techbeamers.com/program-python-list-contains-elements/
#https://stackoverflow.com/questions/10274774/python-elegant-and-efficient-ways-to-mask-a-list
for key, item in sampleDict.items():
    print(item)
    # boolean mask: True where the list value also appears in search
    mask = [x in search for x in item]
    arr = np.array(item)
    print(arr[mask])
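np.isin builds the same boolean mask in one vectorized call. A sketch that collects the matches per key instead of printing them (the name found is hypothetical):

```python
import numpy as np

sampleDict = {
    "a": ["123", "456"],
    "b": ["789", "272"],
}
search = ['789', '456', '100']

found = {}
for key, item in sampleDict.items():
    arr = np.array(item)
    # np.isin returns True where an element of arr occurs anywhere in search
    found[key] = arr[np.isin(arr, search)].tolist()

print(found)  # {'a': ['456'], 'b': ['789']}
```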

Convert a dict of dict into a DataFrame

I have a dict of dict like this:
data = {'1':{'a':10, 'b':30}, '2':{'a':20, 'b':60}}
And I want to convert it into this DataFrame:
                       1                 2
'data'  {'a':10, 'b':30}  {'a':20, 'b':60}
but pandas.DataFrame(data, index=['data']) gives:
        1    2
data  NaN  NaN
and pandas.DataFrame(data) gives:
    1   2
a  10  20
b  30  60
So how can I get a DataFrame whose values are dicts?

It is a strange thing to want, but you have to wrap each value in a single-element list containing your dict:
In [42]:
data = {'1':{'a':10, 'b':30}, '2':{'a':20, 'b':60}}
for key in data:
    data[key] = [data[key]]
pd.DataFrame(data, index=['data'])
Out[42]:
                       1                   2
data  {'a': 10, 'b': 30}  {'a': 20, 'b': 60}
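A sketch of the same wrapping done with a dict comprehension, so the original data dict is not modified in place:

```python
import pandas as pd

data = {'1': {'a': 10, 'b': 30}, '2': {'a': 20, 'b': 60}}

# wrap each inner dict in a one-element list; data itself stays unchanged
df = pd.DataFrame({k: [v] for k, v in data.items()}, index=['data'])
print(df)
```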
