Pandas Dataframe multiple rows with same index - python

I have a dictionary that looks like this:
d = {
    "A": [1, 2, 3],
    "B": [4]
}

To create a pandas DataFrame from it, I use:

output_df = pd.DataFrame.from_dict(d, orient='index')
Output:

-  1  2  3
A  1  2  3
B  4
What I want:

-  1
A  1
A  2
A  3
B  4
Thanks for your help! :)

Try:

df.stack().swaplevel(0, 1)

1  A    1.0
2  A    2.0
3  A    3.0
1  B    4.0
dtype: float64

df.stack().swaplevel(0, 1).reset_index(level=[1], name='a').reset_index(drop=True)

  level_1    a
0       A  1.0
1       A  2.0
2       A  3.0
3       B  4.0
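If you are on pandas 0.25 or newer, Series.explode is a shorter route to the same shape; a minimal sketch, assuming the dictionary from the question:

import pandas as pd

d = {"A": [1, 2, 3], "B": [4]}

# pd.Series(d) gives one row per key, each holding a whole list;
# explode then repeats the index label once per list element
out = pd.Series(d).explode().to_frame(name=0)
print(out)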

Related

pandas shift does not work for subset of columns and rows

I have the following DataFrame:

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [3, 2, 1, 2, 2], "c": [2, 1, 0, 2, 1]})

   a  b  c
0  1  3  2
1  2  2  1
2  3  1  0
3  4  2  2
4  5  2  1
and I want to shift columns a and b at indexes 0 to 2, i.e. my desired result is:

     a    b  c
0  NaN  NaN  2
1    1    3  1
2    2    2  0
3    4    1  2
4    5    2  1
If I do

df[["a", "b"]][0:3] = df[["a", "b"]][0:3].shift(1)

and look at df, it appears not to have changed. However, if I select only the rows or only the columns, it works.

Single column, subset of rows:

df["a"][0:3] = df["a"][0:3].shift(1)

Output:

     a  b  c
0  NaN  3  2
1  1.0  2  1
2  2.0  1  0
3  4.0  2  2
4  5.0  2  1
Likewise, if I select a list of columns but all rows, it works as expected too:

df[["a", "b"]] = df[["a", "b"]].shift(1)

Output:

     a    b  c
0  NaN  NaN  2
1  1.0  3.0  1
2  2.0  2.0  0
3  3.0  1.0  2
4  4.0  2.0  1

Why does df[["a", "b"]][0:3] = df[["a", "b"]][0:3].shift(1) not work as expected? Am I missing something?
The problem is the double selection (chained indexing): first the columns, then the rows, so the assignment updates a temporary copy rather than df itself. See also the pandas documentation section on why evaluation order matters.
A possible solution is a single selection with DataFrame.loc, using index labels and column names:
df.loc[0:2, ["a", "b"]] = df.loc[0:2, ["a", "b"]].shift(1)
print(df)

     a    b  c
0  NaN  NaN  2
1  1.0  3.0  1
2  2.0  2.0  0
3  4.0  2.0  2
4  5.0  2.0  1
If the index is not the default RangeIndex and it is necessary to select the first 2 rows by position:

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [3, 2, 1, 2, 2], "c": [2, 1, 0, 2, 1]},
                  index=list('abcde'))
df.loc[df.index[0:2], ["a", "b"]] = df.loc[df.index[0:2], ["a", "b"]].shift(1)
print(df)

     a    b  c
a  NaN  NaN  2
b  1.0  3.0  1
c  3.0  1.0  0
d  4.0  2.0  2
e  5.0  2.0  1
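A purely positional sketch of the same single-step selection is also possible with DataFrame.iloc; here Index.get_indexer translates the column labels into integer positions:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [3, 2, 1, 2, 2], "c": [2, 1, 0, 2, 1]},
                  index=list('abcde'))

# iloc is end-exclusive, so 0:2 selects the first two rows by position
cols = df.columns.get_indexer(["a", "b"])
df.iloc[0:2, cols] = df.iloc[0:2, cols].shift(1)
print(df)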

pandas switch rows with columns and preserve data

How can I swap rows and columns in the data below so that all data is preserved?
Test data:
import pandas as pd

data_dic = {
    "x": ['a', 'b', 'a', 'b', 'b'],
    "y": [1, 2, 3, 4, 5]
}
df = pd.DataFrame(data_dic)

   x  y
0  a  1
1  b  2
2  a  3
3  b  4
4  b  5
Expected Output:

     a  b
0    1  2
1    3  4
2  NaN  5
Use GroupBy.cumcount with DataFrame.pivot:

df = df.assign(g=df.groupby('x').cumcount()).pivot(index='g', columns='x', values='y')

Or DataFrame.set_index with Series.unstack:

df = df.set_index([df.groupby('x').cumcount(), 'x'])['y'].unstack()
print(df)

x    a    b
g
0  1.0  2.0
1  3.0  4.0
2  NaN  5.0
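To see why the counter works, a small sketch (same data) that prints the intermediate g column before pivoting:

import pandas as pd

df = pd.DataFrame({"x": ['a', 'b', 'a', 'b', 'b'], "y": [1, 2, 3, 4, 5]})

# cumcount numbers the occurrences within each x group:
# x = a b a b b  ->  g = 0 0 1 1 2
g = df.groupby('x').cumcount()
print(df.assign(g=g))

# g becomes the new row index, x the columns, y the values
print(df.assign(g=g).pivot(index='g', columns='x', values='y'))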

Pandas: Find the max value in one column containing lists

I have a dataframe like this:

fly_frame:
               day place
0  [1, 2, 3, 4, 5]     A
1     [1, 2, 3, 4]     B
2           [1, 2]     C
3     [1, 2, 3, 4]     D

I want to find the max value of each list in the day column. For example:

fly_frame:
   day place
0    5     A
1    4     B
2    2     C
3    4     D

What should I do?
Thanks for your help.
df.day.apply(max)

#0    5
#1    4
#2    2
#3    4
Use apply with max:

# if the values are strings rather than lists:
# import ast
# print(type(df.loc[0, 'day']))
# <class 'str'>
# df['day'] = df['day'].apply(ast.literal_eval)

print(type(df.loc[0, 'day']))
<class 'list'>

df['day'] = df['day'].apply(max)

Or a list comprehension:

df['day'] = [max(x) for x in df['day']]
print(df)

   day place
0    5     A
1    4     B
2    2     C
3    4     D
Try a combination of pd.concat() and df.apply():

import numpy as np
import pandas as pd

fly_frame = pd.DataFrame({'day': [[1, 2, 3, 4, 5], [1, 2, 3, 4], [1, 2], [1, 2, 3, 4]],
                          'place': ['A', 'B', 'C', 'D']})
df = pd.concat([fly_frame['day'].apply(max), fly_frame.drop('day', axis=1)], axis=1)
print(df)

   day place
0    5     A
1    4     B
2    2     C
3    4     D
Edit

You can also use df.join():

fly_frame.drop('day', axis=1).join(fly_frame['day'].apply(np.max))

  place  day
0     A    5
1     B    4
2     C    2
3     D    4
I suggest bringing your dataframe into a better format first.

>>> df
               day place
0  [1, 2, 3, 4, 5]     A
1     [1, 2, 3, 4]     B
2           [1, 2]     C
3     [1, 2, 3, 4]     D
>>>
>>> df = pd.concat([df.pop('day').apply(pd.Series), df], axis=1)
>>> df
     0    1    2    3    4 place
0  1.0  2.0  3.0  4.0  5.0     A
1  1.0  2.0  3.0  4.0  NaN     B
2  1.0  2.0  NaN  NaN  NaN     C
3  1.0  2.0  3.0  4.0  NaN     D
Now everything is easier, for example computing the maximum of the numeric values along the columns.

>>> df.max(axis=1, numeric_only=True)
0    5.0
1    4.0
2    2.0
3    4.0
dtype: float64
Edit: renaming the index might also be useful to you.

>>> df.max(axis=1, numeric_only=True).rename(df['place'])
A    5.0
B    4.0
C    2.0
D    4.0
dtype: float64
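On pandas 0.25 or newer, Series.explode offers another route that avoids reshaping into wide columns; a minimal sketch under that assumption:

import pandas as pd

fly_frame = pd.DataFrame({'day': [[1, 2, 3, 4, 5], [1, 2, 3, 4], [1, 2], [1, 2, 3, 4]],
                          'place': ['A', 'B', 'C', 'D']})

# explode keeps the original row label on every list element,
# so grouping by the index (level=0) yields one max per original row
fly_frame['day'] = fly_frame['day'].explode().groupby(level=0).max()
print(fly_frame)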

How to split a pandas column of type dict in columns?

I have a pandas dataframe. One of its columns holds dict objects. The following dataframe is a toy example of the real one:

DF = pd.DataFrame({'id': [1, 2, 3],
                   'col1': [{'a': 1, 'b': 2, 'c': 3},
                            {'a': 3, 'b': 4, 'c': 5},
                            {'a': None, 'b': 5, 'c': 6}]})

I would like to split col1 into columns: one column per dictionary key. All the rows have the same keys. After the splitting, the dataframe should look like:

id     a  b  c
 1     1  2  3
 2     3  4  5
 3  None  5  6
NOTE: I got the dict column from a jsonb column in PostgreSQL.
Input:

df = pd.DataFrame({'id': [1, 2, 3],
                   'col1': [{'a': 1, 'b': 2, 'c': 3},
                            {'a': 3, 'b': 4, 'c': 5},
                            {'a': None, 'b': 5, 'c': 6}]})

df.set_index('id').col1.apply(pd.Series)

Output:

      a    b    c
id
1   1.0  2.0  3.0
2   3.0  4.0  5.0
3   NaN  5.0  6.0
Try:

df = pd.DataFrame(DF['col1'].tolist())
df['id'] = DF['id']

Then print(df) gives:

     a  b  c  id
0  1.0  2  3   1
1  3.0  4  5   2
2  NaN  5  6   3

To put 'id' at the front instead, use df.insert:

df = pd.DataFrame(DF['col1'].tolist())
df.insert(0, 'id', DF['id'])
print(df)

Output:

   id    a  b  c
0   1  1.0  2  3
1   2  3.0  4  5
2   3  NaN  5  6
I think you need:

df = pd.concat([DF.drop(['col1'], axis=1), DF['col1'].apply(pd.Series)], axis=1)

Output:

   id    a    b    c
0   1  1.0  2.0  3.0
1   2  3.0  4.0  5.0
2   3  NaN  5.0  6.0
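Since the column originally came from JSON, pd.json_normalize (available in pandas 1.0+) is another sketch worth knowing:

import pandas as pd

DF = pd.DataFrame({'id': [1, 2, 3],
                   'col1': [{'a': 1, 'b': 2, 'c': 3},
                            {'a': 3, 'b': 4, 'c': 5},
                            {'a': None, 'b': 5, 'c': 6}]})

# json_normalize builds one column per dict key;
# join re-attaches id by the shared default index
out = DF[['id']].join(pd.json_normalize(DF['col1'].tolist()))
print(out)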

Value Counts of Column Slice to Contain All Possible Unique Values in Column

I have a df that looks like this:

group  val
A      1
A      1
A      2
B      1
B      2
B      3
I want to get the value_counts for each group separately, and I want all possible values of val to be represented in each group's result:

> df[df['group']=='A']['val'].value_counts()
1    2
2    1
3    NaN
Name: val, dtype: int64

But it currently looks like this:

> df[df['group']=='A']['val'].value_counts()
1    2
2    1
Name: val, dtype: int64

Does anyone know a way to show value_counts with all possible values represented?
In [185]: df.groupby('group')['val'].value_counts().unstack('group')
Out[185]:
group    A    B
val
1      2.0  1.0
2      1.0  1.0
3      NaN  1.0

In [186]: df.groupby('group')['val'].value_counts().unstack('group')['A']
Out[186]:
val
1    2.0
2    1.0
3    NaN
Name: A, dtype: float64
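If zeros are preferable to NaN for the missing combinations, unstack also accepts a fill_value; a small sketch of that variant:

import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'val':   [  1,   1,   2,   1,   2,   3]})

# fill_value replaces the missing (group, val) counts with 0
counts = df.groupby('group')['val'].value_counts().unstack('group', fill_value=0)
print(counts['A'])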
This works:

from io import StringIO
import pandas as pd

data = StringIO("""group,val
A,1
A,1
A,2
B,1
B,2
B,3""")

df = pd.read_csv(data)
print(df, '\n')

res_idx = pd.MultiIndex.from_product([df['group'].unique(), df['val'].unique()])
res = pd.concat([pd.DataFrame(index=res_idx),
                 df.groupby('group').apply(lambda x: x['val'].value_counts())],
                axis=1)
print(res)

Produces:

  group  val
0     A    1
1     A    1
2     A    2
3     B    1
4     B    2
5     B    3

     val
A 1  2.0
  2  1.0
  3  NaN
B 1  1.0
  2  1.0
  3  1.0
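The same idea takes fewer steps if the counts are reindexed directly against every (group, val) pair; a sketch:

import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'val':   [  1,   1,   2,   1,   2,   3]})

# build every (group, val) combination, then align the counts with it
full = pd.MultiIndex.from_product([df['group'].unique(), df['val'].unique()],
                                  names=['group', 'val'])
print(df.groupby('group')['val'].value_counts().reindex(full))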
