Looping through Pandas dict within dataframe


I have a dataframe with a column whose rows each contain a dict.
I would like to extract those dicts and turn them into dataframes so I can merge them together.
What's the best way to do this?
Something like:
for row in dataframe['column']:
    dataframe_loop = pd.DataFrame(row, columns=['A','B'])
    dataframe_result = pd.concat([dataframe_result, dataframe_loop])

import pandas as pd
d = {'col': pd.Series([{'a':1}, {'b':2}, {'c':3}])}
df = pd.DataFrame(d)
>>> print(df)
        col
0  {'a': 1}
1  {'b': 2}
2  {'c': 3}
res = {}
for _, row in df.iterrows():
    res.update(row['col'])
>>>print(res)
{'b': 2, 'a': 1, 'c': 3}

If your column contains dicts and you want to make a dataframe out of those dicts, you can just convert the column to a list of dicts and make that into a dataframe directly:
pd.DataFrame(dataframe['column'].tolist())
The dictionary keys will become columns. If you want other behavior, you'll need to specify that.
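As a minimal sketch of the approach above, assuming a toy dataframe (the `id` column and the dict keys `A` and `B` are made up for illustration), the expanded columns can also be joined back onto the rest of the original rows:

```python
import pandas as pd

# Hypothetical data: one ordinary column plus a column of dicts.
dataframe = pd.DataFrame({
    'id': [10, 20, 30],
    'column': [{'A': 1, 'B': 2}, {'A': 3, 'B': 4}, {'A': 5}],
})

# Dict keys become columns; keys missing from a dict become NaN.
expanded = pd.DataFrame(dataframe['column'].tolist(), index=dataframe.index)

# Put the new columns next to the remaining original ones.
result = dataframe.drop(columns='column').join(expanded)
print(result)
```

Passing `index=dataframe.index` keeps the rows aligned even if the original index is not 0..n-1.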

I don't know what the dicts in dataframe.column look like. If they look like the dictionaries below, I think you can use pandas.concat to concatenate them.
import pandas as pd
# create a dummy dataframe
dataframe = pd.DataFrame({'column': [{'A': [1, 2, 3], 'B': [4, 5, 6]},
                                     {'A': [7, 8, 9], 'B': [10, 11, 12]},
                                     {'A': [13, 14, 15], 'B': [16, 17, 18]}]})
#print(dataframe)
res = pd.concat([pd.DataFrame(row, columns=['A', 'B']) for row in dataframe.column], ignore_index=True)
print(res)

Related

Get a row of a pandas DataFrame as a dict-like object with correct types

As suggested in Get a row of data in pandas as a dict, one can extract a row from a pandas DataFrame using loc:
df1 = pd.DataFrame([{"a":1.0,"b":2,"c":3}]).set_index("c")
df1.loc[3].to_dict()
returns {'a': 1.0, 'b': 2.0} - alas, this is wrong because the b value should be 2 and not 2.0.
How do I extract a row with correct types?

One idea is to use [[]] to select a one-row DataFrame, which is a bit overcomplicated in my opinion:
d = df1.loc[[3]].to_dict(orient='records')[0]
print(d)
{'a': 1.0, 'b': 2}
The problem is that df1.loc[3] creates a Series, and a Series has a single dtype, so the integer columns are upcast to float to match the float column.
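The difference between the two selections can be checked directly. This is only a small sketch reusing the `df1` from the question; the names `as_series` and `as_frame` are introduced here for illustration:

```python
import pandas as pd

df1 = pd.DataFrame([{"a": 1.0, "b": 2, "c": 3}]).set_index("c")

# Selecting with a scalar label returns a Series with one dtype (float64),
# so the integer in column "b" comes back as a float.
as_series = df1.loc[3].to_dict()

# Selecting with a list of labels returns a one-row DataFrame,
# which keeps a separate dtype for each column.
as_frame = df1.loc[[3]].to_dict(orient="records")[0]

print(type(as_series["b"]).__name__)
print(type(as_frame["b"]).__name__)
```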

How to add pandas columns as key-value pairs to an existing Python dictionary

I have an existing dictionary, and I would like to append values from a pandas dataframe; one column would be the keys, and the other column would be the values.
How do I go about doing this?

import pandas as pd
# existing dictionary
mydict = {'hello': 42}
# a pandas dataframe;
d = {'name': ['a', 'b'], 'val': [3, 4]}
df = pd.DataFrame(data=d)
# update dict with df columns (bracket access avoids clashes with DataFrame attributes)
mydict.update(zip(df['name'].tolist(), df['val'].tolist()))
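An equivalent formulation, sketched here with the same toy data, is to index the value column by the key column and convert that Series to a dict. If the key column contains duplicates, later rows overwrite earlier ones, just as with zip:

```python
import pandas as pd

mydict = {'hello': 42}
df = pd.DataFrame({'name': ['a', 'b'], 'val': [3, 4]})

# Index the values by the key column, then convert to a plain dict.
mydict.update(df.set_index('name')['val'].to_dict())
print(mydict)
```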

Creating a dataframe from dictionary with arbitrary length values (using recycled keys as column value)

I am struggling with converting a dictionary into a dataframe.
There are already a lot of answers showing how to do it in the "wide format" like https://stackoverflow.com/a/52819186/6912069 but I would like to do something different, preferably not using loops.
Consider the following example:
I have a dictionary like this one
d_test = {'A': [1, 2], 'B': [3]}
and I'd like to get a dataframe like
index  id  values
0      A   1
1      A   2
2      B   3
The index can be a normal consecutive integer column. By recycling I mean turning 'A'=[1, 2] into two rows having A in the id column and the values in the values column. This way I would have a "long format" dataframe of the dictionary items.
It seems to be a very basic thing to do, but I was wondering if there is an elegant pythonic way to achieve this. Many thanks for your help.

I would create two lists, one from the keys and the other from the values of the dictionary. Once the lists are built, you can pass them to the DataFrame constructor.
import pandas as pd
dic = {'A': [1, 2], 'B': [3], 'D': [4, 5, 6]}
keys = []
values = []
for key, value in dic.items():
    for v in value:
        keys.append(key)
        values.append(v)
df = pd.DataFrame(
    {'id': keys,
     'values': values,
     })
print(df)
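Since the question asks for a solution without explicit loops, here is a possible alternative sketch based on `Series.explode` (available in pandas 0.25 and later). It uses the `d_test` dictionary from the question; `long_df` is a name chosen for illustration. Note that the resulting `values` column has object dtype.

```python
import pandas as pd

d_test = {'A': [1, 2], 'B': [3]}

# A Series built from the dict holds one list per key; explode() then
# emits one row per list element and repeats the key in the index.
long_df = (
    pd.Series(d_test)
    .explode()
    .rename_axis('id')
    .reset_index(name='values')
)
print(long_df)
```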

How to generate multiple pandas dataframe from ordereddict?

I have an Ordered Dictionary, where the keys are the worksheet names, and the values contain the worksheet items. Thus, the question: How do I use each of the keys and convert to an individual dataframe?
import pandas as pd
powerbipath = 'PowerBI_Ingestion.xlsx'
# sheet_name=None returns a dict mapping worksheet names to DataFrames
dfs = pd.read_excel(powerbipath, sheet_name=None)
new_list1 = []
for idx, eachdf in enumerate(dfs):
    eachdf = dfs[eachdf]
    new_list1.append(eachdf)
    eachdf = pd.DataFrame(new_list1[idx])
Examples I have seen only show how to convert from an ordered dictionary to 1 pandas dataframe. I want to convert to multiple dataframes. Thus, if there are 5 keys, there will be 5 dataframes.

You may want to do something like this (assuming your dictionary looks like 'd'):
d = {'first': [1, 2], 'second': [3, 4]}
for i in d:
    df = pd.DataFrame(d.get(i), columns=[i])
    print(df)
Output looks like:
   first
0      1
1      2
   second
0       3
1       4

Here is a basic answer using one of these ideas:
keys = df["key_column"].unique()
df_array = {}
for k in keys:
    df_array[k] = df[df['key_column'] == k]
There might be a more efficient way to do it, though.
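One candidate for a more compact version of the loop above is `groupby`, which splits the frame once instead of filtering it once per key. A minimal sketch with invented data (only the column name `key_column` comes from the answer above):

```python
import pandas as pd

# Hypothetical data; 'key_column' matches the name used above.
df = pd.DataFrame({'key_column': ['x', 'y', 'x'], 'value': [1, 2, 3]})

# groupby yields (key, sub-dataframe) pairs, one per distinct key.
df_by_key = {key: group for key, group in df.groupby('key_column')}

print(df_by_key['x'])
```

Each value in `df_by_key` is an ordinary DataFrame that keeps the original row labels.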

Pandas DataFrame automatically takes the wrong values as index

I tried to create DataFrames from a JSON file.
I have a list named "Series_participants" containing part of this JSON file. My list looks like this when I print it:
participantId                1
championId                   76
stats                        {'item0': 3265, 'item2': 3143, 'totalUnitsHeal...
teamId                       100
timeline                     {'participantId': 1, 'csDiffPerMinDeltas': {'1...
spell1Id                     4
spell2Id                     12
highestAchievedSeasonTier    SILVER
dtype: object
<class 'list'>
After that, I tried to convert this list to a DataFrame like this:
pd.DataFrame(Series_participants)
But pandas uses the values of "stats" and "timeline" as the index of the DataFrame. I expected an automatic range index (0, ..., n).
EDIT 1:
   participantId  championId  stats  teamId  timeline  spell1Id  spell2Id  highestAchievedSeasonTier
0  1              76          3265   100     NaN       4         12        SILVER
I want a dataframe whose "stats" and "timeline" columns contain the dicts of their values, as in the Series display.
What is my error?
EDIT 2:
I tried to create the DataFrame manually, but pandas did not take my choices into account and again used the keys of the "stats" entry of the Series as the index.
Here is my code:
for j in range(0,len(df.participants[0])):
    for i in range(0,len(df.participants[0][0])):
        Series_participants = pd.Series(df.participants[0][i])
        test = {'participantId':Series_participants.values[0],
                'championId':Series_participants.values[1],
                'stats':Series_participants.values[2],
                'teamId':Series_participants.values[3],
                'timeline':Series_participants.values[4],
                'spell1Id':Series_participants.values[5],
                'spell2Id':Series_participants.values[6],
                'highestAchievedSeasonTier':Series_participants.values[7]}
        if j == 0:
            df_participants = pd.DataFrame(test)
        else:
            df_participants.append(test, ignore_index=True)
The double loop is there to parse every "participant" in my JSON file.
LAST EDIT:
I achieved what I wanted with the following code:
for i in range(0,len(df.participants[0])):
    Series_participants = pd.Series(df.participants[0][i])
    df_test = pd.DataFrame(data=[Series_participants.values], columns=['participantId','championId','stats','teamId','timeline','spell1Id','spell2Id','highestAchievedSeasonTier'])
    if i == 0:
        df_participants = pd.DataFrame(df_test)
    else:
        # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
        df_participants = pd.concat([df_participants, df_test], ignore_index=True)
print(df_participants)
Thanks to all for your help!

For efficiency, you should try and manipulate your data as you construct your dataframe rather than as a separate step.
However, to split apart your dictionary keys and values you can use a combination of numpy.repeat and itertools.chain. Here's a minimal example:
df = pd.DataFrame({'A': [1, 2],
                   'B': [{'key1': 'val0', 'key2': 'val9'},
                         {'key1': 'val1', 'key2': 'val2'}],
                   'C': [{'key3': 'val10', 'key4': 'val8'},
                         {'key3': 'val3', 'key4': 'val4'}]})
import numpy as np
from itertools import chain
chainer = chain.from_iterable
lens = df['B'].map(len)
res = pd.DataFrame({'A': np.repeat(df['A'], lens),
                    'B': list(chainer(df['B'].map(lambda x: x.values())))})
res.index = chainer(df['B'].map(lambda x: x.keys()))
print(res)
      A     B
key1  1  val0
key2  1  val9
key1  2  val1
key2  2  val2

If you try to input lists, series or arrays containing dicts into the object constructor, it doesn't recognise what you're trying to do. One way around this is manually setting:
df.at['a', 'b'] = {'x':value}
Note, the above will only work if the columns and indexes are already created in your DataFrame.

Updated per comments: Pandas data frames can hold dictionaries, but it is not recommended.
Pandas interprets this as a request for one index entry per dictionary key, and then broadcasts the single-item columns across those entries.
So, to help with what you are trying to do, I would recommend reading in your dictionaries' items as columns, which is what data frames are typically used for and are very good at.
Example Error due to pandas trying to read in the dictionary by key, value pair:
df = pd.DataFrame(columns= ['a', 'b'], index=['a', 'b'])
df.loc['a','a'] = {'apple': 2}
returns
ValueError: Incompatible indexer with Series
Per jpp in the comments below (When using the constructor method):
"They can hold arbitrary types, e.g.
df.iat[0, 0] = {'apple': 2}
However, it's not recommended to use Pandas in this way."
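One way to get the 0..n index the question expects, sketched under the assumption that each participant is available as a plain dict (the values below are invented and only a few of the question's fields are shown), is to pass a list of records to the constructor rather than a single dict or Series:

```python
import pandas as pd

# Hypothetical records shaped like the participants in the question.
participants = [
    {'participantId': 1, 'championId': 76, 'stats': {'item0': 3265, 'item2': 3143}},
    {'participantId': 2, 'championId': 40, 'stats': {'item0': 3020, 'item2': 3117}},
]

# A list of dicts is read as one row per dict, with a 0..n-1 index;
# the nested 'stats' dicts are stored as-is in their cells.
df_participants = pd.DataFrame(participants)
print(df_participants)
```

This also avoids growing the dataframe row by row inside a loop.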
