Turning a list of dictionaries into a DataFrame - python

If you have a list of dictionaries like this:
listofdict = [{'value1': [1, 2, 3, 4, 5]}, {'value2': [5, 4, 3, 2, 1]}, {'value3': ['a', 'b', 'c', 'd', 'e']}]
How can I turn it into a DataFrame where value1, value2 and value3 are the column names and the lists are the columns?
I tried:
df = pd.DataFrame(listofdict)
But it makes one row per dictionary, crams each whole list into a single cell, and fills the remaining cells with NaN.

Here is another way:
df = pd.DataFrame({k:v for i in listofdict for k,v in i.items()})
Output:
value1 value2 value3
0 1 5 a
1 2 4 b
2 3 3 c
3 4 2 d
4 5 1 e
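A minimal sketch of a different route, assuming only that each dictionary in listofdict holds equal-length lists: build a small DataFrame per dictionary and concatenate them side by side with pd.concat(axis=1). One design difference worth knowing: building a DataFrame from a single merged dict raises ValueError when the lists differ in length, whereas concatenation aligns on the row index and pads the shorter columns with NaN.

```python
import pandas as pd

listofdict = [{'value1': [1, 2, 3, 4, 5]},
              {'value2': [5, 4, 3, 2, 1]},
              {'value3': ['a', 'b', 'c', 'd', 'e']}]

# One small frame per dictionary, glued together column-wise;
# rows are aligned on the default 0..n-1 index
df = pd.concat([pd.DataFrame(d) for d in listofdict], axis=1)
print(df)

# Unequal lengths: the shorter column is padded with NaN instead of raising
ragged = [{'x': [1, 2, 3]}, {'y': [10, 20]}]
ragged_df = pd.concat([pd.DataFrame(d) for d in ragged], axis=1)
print(ragged_df)
```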

DataFrame expects a single dictionary with column names as keys, so you need to merge all these dictionaries into one, like {'value1': [1, 2, 3, 4, 5], 'value2': [5, 4, 3, 2, 1], ... }.
You can try:
import pandas as pd
listofdict = [{'value1':[1,2,3,4,5]}, {'value2':[5,4,3,2,1]},{'value3':['a','b','c','d','e']}]
dicofdics = {}
for dct in listofdict:
    dicofdics.update(dct)  # copy this dict's key/list pairs into the merged dict
df = pd.DataFrame(dicofdics)
df
value1 value2 value3
0 1 5 a
1 2 4 b
2 3 3 c
3 4 2 d
4 5 1 e
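To see why the original pd.DataFrame(listofdict) call misbehaves, here is a small sketch using the question's listofdict: when given a list, DataFrame reads each dictionary as one record (row), so three dictionaries give three rows, each with a whole list in one cell and NaN elsewhere. Merging into one {column name: list} mapping first gives the intended five-row frame; the dict comprehension below is just one way to merge, with later keys winning as with dict.update.

```python
import pandas as pd

listofdict = [{'value1': [1, 2, 3, 4, 5]},
              {'value2': [5, 4, 3, 2, 1]},
              {'value3': ['a', 'b', 'c', 'd', 'e']}]

# A list of dicts is read as a list of records: one dict -> one row
as_rows = pd.DataFrame(listofdict)
print(as_rows.shape)  # (3, 3)

# Merge into a single {column name: list} mapping, then build the frame
merged = {k: v for d in listofdict for k, v in d.items()}
as_columns = pd.DataFrame(merged)
print(as_columns.shape)  # (5, 3)
```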

Related

How to Convert a dataframe into nested dictionary in the following format

2 3 4 loc_id
0 b b c 1
1 b b c 6
2 b a b 8
3 b b c 10
4 b a b 11
Can someone help me convert the above DataFrame into the following dictionary? The outer keys should be the column names, and each inner dictionary should map the values found in that column to the list of loc_id values of the rows where they appear:
{2:{'b':[1,6,8,10,11]},3:{'b':[1,6,10],'a':[8,11]},4:{'c':[1,6,10],'b':[8,11]}}
Use DataFrame.melt, then GroupBy.agg with list to get a Series of lists with a (variable, value) MultiIndex, and then build the nested dictionary from it:
s = df.melt('loc_id').groupby(['variable','value'])['loc_id'].agg(list)
d = {level: s.xs(level).to_dict() for level in s.index.levels[0]}
print (d)
{'2': {'b': [1, 6, 8, 10, 11]},
'3': {'a': [8, 11], 'b': [1, 6, 10]},
'4': {'b': [8, 11], 'c': [1, 6, 10]}}
Or create a dictionary of Series (one per column, indexed by loc_id) and aggregate each group's index into a list:
d = {k: v.groupby(v).agg(lambda x: list(x.index)).to_dict()
for k, v in df.set_index('loc_id').to_dict('series').items()}
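If the pandas chaining is hard to follow, the same nested dictionary can also be built in plain Python. This is a sketch, not either answer's method: it assumes the column labels are the strings '2', '3', '4' (as they would be after reading a CSV, consistent with the string keys in the printed output), and nested_groups is a hypothetical helper name.

```python
from collections import defaultdict

import pandas as pd

# The question's frame; string column labels are an assumption (e.g. read from a CSV)
df = pd.DataFrame({
    '2': ['b', 'b', 'b', 'b', 'b'],
    '3': ['b', 'b', 'a', 'b', 'a'],
    '4': ['c', 'c', 'b', 'c', 'b'],
    'loc_id': [1, 6, 8, 10, 11],
})


def nested_groups(frame, id_col):
    """Hypothetical helper: {column: {value: [id_col values of rows holding it]}}."""
    result = {}
    for col in frame.columns.drop(id_col):
        groups = defaultdict(list)
        # Walk the column and the id column in parallel, grouping ids by value
        for value, row_id in zip(frame[col], frame[id_col]):
            groups[value].append(row_id)
        result[col] = dict(groups)
    return result


d = nested_groups(df, 'loc_id')
print(d)
```

Inner dictionaries here keep first-appearance order ('b' before 'a' for column '3'), whereas the groupby answer sorts them; the contents are the same.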

Pandas DataFrame filter by multiple column criteria and multiple intervals

I have checked several answers but had no luck so far.
My dataset is like this:
df = pd.DataFrame({
'Location':['A', 'A', 'A', 'B', 'C', 'C'],
'Place':[1, 2, 3, 4, 2, 3],
'Value1':[1, 1, 2, 3, 4, 5],
'Value2':[1, 1, 2, 3, 4, 5]
}, columns = ['Location','Place','Value1','Value2'])
Location Place Value1 Value2
A 1 1 1
A 2 1 1
A 3 2 2
B 4 3 3
C 2 4 4
C 3 5 5
and I have a list of intervals:
A: [0, 1]
A: [3, 5]
B: [1, 3]
C: [1, 4]
C: [6, 10]
Now I want to keep only the rows whose Place falls within one of the intervals listed for that row's Location. So the desired output is:
Location Place Value1 Value2
A 1 1 1
A 3 2 2
C 2 4 4
C 3 5 5
I know that I can chain multiple between conditions with |, but I have a really long list of intervals, so entering the conditions manually is not feasible. I also considered a for loop that slices the data by location first, but I think there could be a more efficient way.
Thank you for your help.
Edit: Currently the list of intervals is just strings like this:
A 0 1
A 3 5
B 1 3
C 1 4
C 6 10
but I would like to parse them into a list of dicts. Suggestions for a better structure are also welcome!
First define the DataFrame df and the filter DataFrame dff:
import pandas as pd
df = pd.DataFrame({
'Location':['A', 'A', 'A', 'B', 'C', 'C'],
'Place':[1, 2, 3, 4, 2, 3],
'Value1':[1, 1, 2, 3, 4, 5],
'Value2':[1, 1, 2, 3, 4, 5]
}, columns = ['Location','Place','Value1','Value2'])
dff = pd.DataFrame({'Location':['A','A','B','C','C'],
'fPlace':[[0,1], [3, 5], [1, 3], [1, 4], [6, 10]]})
dff[['p1', 'p2']] = pd.DataFrame(dff["fPlace"].to_list())
Now dff is:
Location fPlace p1 p2
0 A [0, 1] 0 1
1 A [3, 5] 3 5
2 B [1, 3] 1 3
3 C [1, 4] 1 4
4 C [6, 10] 6 10
Here fPlace has been split into the lower and upper bounds p1 and p2, which are the filters to apply to Place. Next:
df.merge(dff).query('Place >= p1 and Place <= p2').drop(columns = ['fPlace','p1','p2'])
result:
Location Place Value1 Value2
0 A 1 1 1
5 A 3 2 2
7 C 2 4 4
9 C 3 5 5
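A variation on the merge approach, as a sketch: merge discards the original row labels (hence 0, 5, 7, 9 above), and a row that falls inside two overlapping intervals would be returned twice. Carrying the original labels through the merge and selecting with isin avoids both. The p1/p2 columns are written out directly here instead of being split from fPlace; everything else uses the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    'Location': ['A', 'A', 'A', 'B', 'C', 'C'],
    'Place': [1, 2, 3, 4, 2, 3],
    'Value1': [1, 1, 2, 3, 4, 5],
    'Value2': [1, 1, 2, 3, 4, 5],
})
# Interval bounds written out as columns (instead of splitting fPlace)
dff = pd.DataFrame({
    'Location': ['A', 'A', 'B', 'C', 'C'],
    'p1': [0, 3, 1, 1, 6],
    'p2': [1, 5, 3, 4, 10],
})

# reset_index() moves the original row labels into an 'index' column,
# so they survive the merge (one row per matching Location/interval pair)
merged = df.reset_index().merge(dff, on='Location')

# Series.between is inclusive on both ends by default; bounds can be Series
in_range = merged['Place'].between(merged['p1'], merged['p2'])

# Keep each original row once, even if it matched several intervals
keep = merged.loc[in_range, 'index']
result = df[df.index.isin(keep)]
print(result)
```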
Prerequisites:
# presumed setup for your intervals:
intervals = {
"A": [
[0, 1],
[3, 5],
],
"B": [
[1, 3],
],
"C": [
[1, 4],
[6, 10],
],
}
Actual solution:
x = df["Location"].map(intervals).explode().str
l, r = x[0], x[1]
res = df["Place"].loc[l.index].between(l, r)
res = res.loc[res].index.unique()
res = df.loc[res]
Outputs:
>>> res
Location Place Value1 Value2
0 A 1 1 1
2 A 3 2 2
4 C 2 4 4
5 C 3 5 5
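Neither answer shows how to get from the raw strings in the question's edit to a usable structure. A minimal sketch, assuming each line has the form "Location low high" separated by whitespace with integer bounds: collect the bounds into the Location-to-list-of-[low, high] dictionary that the second answer calls intervals. parse_intervals is a hypothetical helper name.

```python
from collections import defaultdict

# The raw interval lines from the question's edit
raw = """\
A 0 1
A 3 5
B 1 3
C 1 4
C 6 10
"""


def parse_intervals(text):
    """Hypothetical helper: 'A 0 1' lines -> {'A': [[0, 1], ...], ...}."""
    intervals = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        location, low, high = line.split()
        intervals[location].append([int(low), int(high)])
    return dict(intervals)


intervals = parse_intervals(raw)
print(intervals)
# {'A': [[0, 1], [3, 5]], 'B': [[1, 3]], 'C': [[1, 4], [6, 10]]}
```

A plain dict of lists like this plugs straight into df["Location"].map(intervals), and it is easy to turn into the dff frame of the first answer as well.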

Duplicating rows with a certain value in a column

I have to duplicate rows that have a certain value in a column and replace the value with another value.
For instance, I have this data:
import pandas as pd
df = pd.DataFrame({'Date': [1, 2, 3, 4], 'B': [1, 2, 3, 2], 'C': ['A','B','C','D']})
Now I want to duplicate the rows that have 2 in column 'B' and change the 2 to 4 in the duplicates, giving:
df = pd.DataFrame({'Date': [1, 2, 2, 3, 4, 4], 'B': [1, 2, 4, 3, 2, 4], 'C': ['A','B','B','C','D','D']})
Please help me on this one. Thank you.
You can use pd.concat (DataFrame.append was removed in pandas 2.0) to append the rows where B == 2, which you can select with a boolean mask, while reassigning B to 4 using assign. If order matters, you can then sort by C (to reproduce your desired frame):
>>> pd.concat([df, df[df.B.eq(2)].assign(B=4)]).sort_values('C', kind='stable')
Date B C
0 1 1 A
1 2 2 B
1 2 4 B
2 3 3 C
3 4 2 D
3 4 4 D
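Sorting by C only reproduces the desired order because C happens to be sorted already. A sketch of a more general variant, using the question's df: the copies keep their original row labels, so a stable sort on the index places each copy directly after its original, whatever the other columns contain. duplicate_with_value is a hypothetical helper name.

```python
import pandas as pd


def duplicate_with_value(frame, col, old, new):
    """Hypothetical helper: duplicate rows where frame[col] == old, with col set to new."""
    # Copies of the matching rows, with `col` overwritten; labels are kept
    copies = frame[frame[col] == old].assign(**{col: new})
    # Stable sort on the (now duplicated) index puts each copy after its original
    return pd.concat([frame, copies]).sort_index(kind='stable')


df = pd.DataFrame({'Date': [1, 2, 3, 4], 'B': [1, 2, 3, 2], 'C': ['A', 'B', 'C', 'D']})
out = duplicate_with_value(df, 'B', 2, 4)
print(out)
```

Call reset_index(drop=True) on the result if you want a fresh 0..n-1 index like the desired frame in the question.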

How to convert rows into list using pandas python?

product,count,value1,value2,value3
A,10,5,3,2
B,8,2,2,4
This is my DataFrame. I need output in the following format:
product,count,values
A,10,[5,3,2]
B,8,[2,2,4]
Here's one way:
In [27]: df['values'] = df[['value1', 'value2', 'value3']].values.tolist()
In [28]: df
Out[28]:
product count value1 value2 value3 values
0 A 10 5 3 2 [5, 3, 2]
1 B 8 2 2 4 [2, 2, 4]
In [29]: df.drop(['value1', 'value2', 'value3'], axis=1)
Out[29]:
product count values
0 A 10 [5, 3, 2]
1 B 8 [2, 2, 4]
Details:
In [35]: df = pd.DataFrame([['A', 10, 5, 3, 2], ['B', 8, 2, 2, 4]],
....: columns=['product', 'count', 'value1', 'value2', 'value3'])
In [36]: df
Out[36]:
product count value1 value2 value3
0 A 10 5 3 2
1 B 8 2 2 4
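A sketch of the same idea in one expression, assuming the columns to collect all contain "value" in their names (selected with DataFrame.filter(like=...)). to_numpy().tolist() produces one Python list per row; note that if the selected columns had mixed dtypes, NumPy would first convert them to a common dtype (for example, ints mixed with floats become floats).

```python
import pandas as pd

df = pd.DataFrame([['A', 10, 5, 3, 2], ['B', 8, 2, 2, 4]],
                  columns=['product', 'count', 'value1', 'value2', 'value3'])

# Columns whose name contains 'value' (value1, value2, value3)
value_cols = df.filter(like='value').columns

# Drop the separate columns and add one column holding a list per row
out = df.drop(columns=value_cols).assign(
    values=df[value_cols].to_numpy().tolist())
print(out)
```

Because the new column is named values, read it with out['values'] rather than out.values, which is the DataFrame's NumPy-array attribute.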

Nested dictionary to multiindex dataframe where dictionary keys are column labels

Say I have a dictionary that looks like this:
dictionary = {'A' : {'a': [1,2,3,4,5],
'b': [6,7,8,9,1]},
'B' : {'a': [2,3,4,5,6],
'b': [7,8,9,1,2]}}
and I want a dataframe that looks something like this:
A B
a b a b
0 1 6 2 7
1 2 7 3 8
2 3 8 4 9
3 4 9 5 1
4 5 1 6 2
Is there a convenient way to do this? If I try:
In [99]: DataFrame(dictionary)
Out[99]:
A B
a [1, 2, 3, 4, 5] [2, 3, 4, 5, 6]
b [6, 7, 8, 9, 1] [7, 8, 9, 1, 2]
I get a DataFrame where each element is a list. What I need is a MultiIndex where each level corresponds to the keys in the nested dict, and the rows correspond to the elements of each list, as shown above. I think I could work out a very crude solution, but I'm hoping there is something simpler.
Pandas wants the MultiIndex values as tuples, not nested dicts. The simplest thing is to convert your dictionary to the right format before trying to pass it to DataFrame:
>>> reform = {(outerKey, innerKey): values for outerKey, innerDict in dictionary.items() for innerKey, values in innerDict.items()}
>>> reform
{('A', 'a'): [1, 2, 3, 4, 5],
('A', 'b'): [6, 7, 8, 9, 1],
('B', 'a'): [2, 3, 4, 5, 6],
('B', 'b'): [7, 8, 9, 1, 2]}
>>> pandas.DataFrame(reform)
A B
a b a b
0 1 6 2 7
1 2 7 3 8
2 3 8 4 9
3 4 9 5 1
4 5 1 6 2
[5 rows x 4 columns]
You're looking for the functionality in .stack:
df = pandas.DataFrame.from_dict(dictionary, orient="index").stack().to_frame()
# break out the lists into columns, then transpose so the (outer, inner) keys become the columns
df = pandas.DataFrame(df[0].values.tolist(), index=df.index).T
dict_of_df = {k: pd.DataFrame(v) for k,v in dictionary.items()}
df = pd.concat(dict_of_df, axis=1)
Note that column order is not preserved on Python < 3.6, where plain dicts are unordered.
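As a quick check that the tuple-keyed dictionary and the pd.concat route build the same frame, here is a small sketch using the question's dictionary. DataFrame.equals tests whether two frames have the same shape and elements.

```python
import pandas as pd

dictionary = {'A': {'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 1]},
              'B': {'a': [2, 3, 4, 5, 6], 'b': [7, 8, 9, 1, 2]}}

# Route 1: flatten to {(outer, inner): list}; tuple keys become a two-level column MultiIndex
reform = {(outer, inner): values
          for outer, inner_dict in dictionary.items()
          for inner, values in inner_dict.items()}
via_tuples = pd.DataFrame(reform)

# Route 2: one frame per outer key; concat uses the dict keys as the outer column level
via_concat = pd.concat({k: pd.DataFrame(v) for k, v in dictionary.items()}, axis=1)

print(via_tuples.equals(via_concat))
```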
This recursive function should work:
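def reform_dict(dictionary, t=tuple(), reform=None):
    # Use None, not {}, as the default: a mutable default dict is shared
    # between calls and would keep accumulating keys from earlier calls
    if reform is None:
        reform = {}
    for key, val in dictionary.items():
        t = t + (key,)
        if isinstance(val, dict):
            reform_dict(val, t, reform)  # recurse, extending the key path
        else:
            reform.update({t: val})      # leaf: store under the full tuple path
        t = t[:-1]
    return reform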
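A sketch of the same recursive idea written without a shared accumulator, so it is safe to call repeatedly, plus a three-level usage example. flatten_keys is a hypothetical name and the nested data is made up for illustration; the point is that the tuple paths turn into a MultiIndex with one level per nesting depth.

```python
import pandas as pd


def flatten_keys(nested, prefix=()):
    """Map each leaf of a nested dict to the tuple of keys on its path."""
    flat = {}
    for key, val in nested.items():
        path = prefix + (key,)
        if isinstance(val, dict):
            flat.update(flatten_keys(val, path))  # recurse into sub-dicts
        else:
            flat[path] = val                      # leaf value, e.g. a list
    return flat


# Made-up three-level example
nested = {'X': {'A': {'a': [1, 2], 'b': [3, 4]}},
          'Y': {'B': {'a': [5, 6]}}}
df = pd.DataFrame(flatten_keys(nested))
print(df)
```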
If the lists in the dictionary are not all the same length, you can adapt BrenBarn's method:
>>> dictionary = {'A' : {'a': [1,2,3,4,5],
'b': [6,7,8,9,1]},
'B' : {'a': [2,3,4,5,6],
'b': [7,8,9,1]}}
>>> reform = {(outerKey, innerKey): values for outerKey, innerDict in dictionary.items() for innerKey, values in innerDict.items()}
>>> reform
{('A', 'a'): [1, 2, 3, 4, 5],
('A', 'b'): [6, 7, 8, 9, 1],
('B', 'a'): [2, 3, 4, 5, 6],
('B', 'b'): [7, 8, 9, 1]}
>>> df = pandas.DataFrame.from_dict(reform, orient='index').transpose()
>>> df.columns = pandas.MultiIndex.from_tuples(df.columns)
>>> df
A B
a b a b
0 1 6 2 7
1 2 7 3 8
2 3 8 4 9
3 4 9 5 1
4 5 1 6 NaN
[5 rows x 4 columns]
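Another way to handle lists of different lengths, as a sketch on the same uneven dictionary: wrap each list in a pd.Series. The DataFrame constructor aligns a dict of Series on the union of their indexes, so shorter lists are padded with NaN, and the tuple keys still become a two-level column MultiIndex, with no transpose needed.

```python
import pandas as pd

dictionary = {'A': {'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 1]},
              'B': {'a': [2, 3, 4, 5, 6], 'b': [7, 8, 9, 1]}}

# Each list becomes a Series indexed 0..len-1; the constructor aligns them
# on the union of those indexes and fills the gaps with NaN
df = pd.DataFrame({(outer, inner): pd.Series(values)
                   for outer, inner_dict in dictionary.items()
                   for inner, values in inner_dict.items()})
print(df)
```

Only the padded column is converted to float (to hold NaN); the full-length columns keep their integer dtype.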
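This solution works for a larger DataFrame: it keeps the first column as ('', 'myIndex') and splits the remaining columns roughly in half under the top-level labels 'A' and 'B':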
cols = df.columns
int_cols = len(cols)
col_subset_1 = [cols[x] for x in range(1,int(int_cols/2)+1)]
col_subset_2 = [cols[x] for x in range(int(int_cols/2)+1, int_cols)]
col_subset_1_label = list(zip(['A']*len(col_subset_1), col_subset_1))
col_subset_2_label = list(zip(['B']*len(col_subset_2), col_subset_2))
df.columns = pd.MultiIndex.from_tuples([('','myIndex'),*col_subset_1_label,*col_subset_2_label])
OUTPUT
A B
myIndex a b c d
0 0.159710 1.472925 0.619508 -0.476738 0.866238
1 -0.665062 0.609273 -0.089719 0.730012 0.751615
2 0.215350 -0.403239 1.801829 -2.052797 -1.026114
3 -0.609692 1.163072 -1.007984 -0.324902 -1.624007
4 0.791321 -0.060026 -1.328531 -0.498092 0.559837
5 0.247412 -0.841714 0.354314 0.506985 0.425254
6 0.443535 1.037502 -0.433115 0.601754 -1.405284
7 -0.433744 1.514892 1.963495 -2.353169 1.285580
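A compact sketch of the same relabelling, assuming (as the answer does) that the first column is 'myIndex' and the rest are split into an 'A' part and a 'B' part at the same split point. The five-column frame of zeros is made up for illustration. pd.MultiIndex.from_arrays takes the outer and inner labels as two parallel lists.

```python
import numpy as np
import pandas as pd

# Made-up frame: an index-like column followed by four data columns
df = pd.DataFrame(np.zeros((3, 5)), columns=['myIndex', 'a', 'b', 'c', 'd'])

cols = list(df.columns)
n = len(cols)
split = n // 2 + 1  # same split point as int(int_cols/2) + 1 in the answer
# Outer labels: '' for myIndex, then 'A' up to the split, 'B' after it
outer = [''] + ['A'] * (split - 1) + ['B'] * (n - split)
df.columns = pd.MultiIndex.from_arrays([outer, cols])
print(df.columns.tolist())
```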
