split dataframe and create a new dataframe - python

I have a dataframe df that I split into 3 dataframes of the same size. Now I want to build one new dataframe from these 3 pieces: each piece should be transposed into one column of the new dataframe, so the result has 3 columns.
Here is the simple code:
In [4]: np.array_split(df, 3)
Out[4]:
[ A B C D
0 foo one -0.174067 -0.608579
1 bar one -0.860386 -1.210518
2 foo two 0.614102 1.689837,
A B C D
3 bar three -0.284792 -1.071160
4 foo two 0.843610 0.803712
5 bar two -1.514722 0.870861,
A B C D
6 foo one 0.131529 -0.968151
7 foo three -1.002946 -0.257468
8 foo three -1.002946 -0.257468]

UPDATE
Sliced and transposed
In [2]: df
Out[2]:
a b c
0 9 9 7
1 1 7 6
2 5 9 1
3 7 4 0
4 5 2 3
5 2 4 6
6 6 3 6
7 0 2 7
8 9 1 4
9 2 9 3
In [3]: dfs = [pd.DataFrame(a).T for a in np.array_split(df, 3)]
In [4]: dfs[0]
Out[4]:
0 1 2 3
a 9 1 5 7
b 9 7 9 4
c 7 6 1 0
OLD version
One option would be to use this:
In [114]: dfs = [pd.DataFrame(a) for a in np.array_split(df, 3)]
In [115]: dfs[0]
Out[115]:
a b c
0 NaN 7.0 0
1 0.0 NaN 4
2 2.0 NaN 4
3 1.0 7.0 0
In [116]: df
Out[116]:
a b c
0 NaN 7.0 0
1 0.0 NaN 4
2 2.0 NaN 4
3 1.0 7.0 0
4 1.0 3.0 9
5 7.0 4.0 9
6 2.0 6.0 9
7 9.0 6.0 4
8 3.0 0.0 9
9 9.0 0.0 1
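If the goal is a single dataframe that holds the three pieces side by side, one possible reading of the question can be sketched as follows. The sample data, the part1/part2/part3 labels and the choice to split by row position are assumptions made for illustration, not something taken from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 9 rows, so it splits evenly into 3 pieces.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo", "foo"],
    "C": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# Split the row positions (not the DataFrame itself) into 3 chunks.
chunks = np.array_split(np.arange(len(df)), 3)

# Take each chunk and reset its index so the pieces line up row by row.
parts = [df.iloc[idx].reset_index(drop=True) for idx in chunks]

# Place the pieces side by side; `keys` labels each piece (labels are made up).
combined = pd.concat(parts, axis=1, keys=["part1", "part2", "part3"])
print(combined)
```

Each top-level column label then holds one of the three pieces, for example combined["part2"] returns the second piece.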

Related

How to add unbalanced List into a dataFrame in Python?

Here is My dataframe and List
X Y Z X1
1 2 3 3
2 7 2 6
3 10 5 4
4 3 7 9
5 3 3 4
list1=[3,5,6]
list2=[4,3,7,4]
I want to append the lists to the data frame as extra rows in columns X and X1. I tried the code below, but it raises an error.
#Expected Output
X Y Z X1
1 2 3 3
2 7 2 6
3 10 5 4
4 3 7 9
5 3 3 4
3 4
5 3
6 7
4
#here is my code
list1 = [3,5, 6]
df_length = len(df1)
df1.loc[df_length] = list1
Please help me to solve this problem.
Thanks in advance.
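Use pd.concat() to extend each series (X and X1), then build the output df with pd.concat() along axis=1. (Series.append() was removed in pandas 2.0, so pd.concat() is used for both steps.)
# The concatenated series lose their names, so they show up as columns 0 and 1
s1 = pd.concat([df.X, pd.Series(list1)]).reset_index(drop=True)
s2 = pd.concat([df.X1, pd.Series(list2)]).reset_index(drop=True)
df = pd.concat([s1, df.Y, df.Z, s2], axis=1).rename(columns={0: 'X', 1: 'X1'})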
df
X Y Z X1
0 1.0 2.0 3.0 3
1 2.0 7.0 2.0 6
2 3.0 10.0 5.0 4
3 4.0 3.0 7.0 9
4 5.0 3.0 3.0 4
5 3.0 NaN NaN 4
6 5.0 NaN NaN 3
7 6.0 NaN NaN 7
8 NaN NaN NaN 4
'''
X Y Z X1
1 2 3 3
2 7 2 6
3 10 5 4
4 3 7 9
5 3 3 4
'''
list1=[3,5,6]
list2=[4,3,7,4]
ls_empty=[]
import pandas as pd
import numpy as np
df = pd.read_clipboard()
df1 = pd.DataFrame([list1, ls_empty, ls_empty, list2])
df1 = df1.T
df1.columns = df.columns
df2 = pd.concat([df, df1]).replace(np.nan, '', regex=True).reset_index(drop=True).astype({'X1': int})
print(df2)
Output:
X Y Z X1
0 1 2 3 3
1 2 7 2 6
2 3 10 5 4
3 4 3 7 9
4 5 3 3 4
5 3 4
6 5 3
7 6 7
8 4
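A self-contained variant of the same idea, as a minimal sketch: build a small frame from the two lists (a dict of Series pads the shorter one with NaN) and concatenate it below the original. The frame is typed in by hand here instead of being read from the clipboard, and the names extra and result are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "X": [1, 2, 3, 4, 5],
    "Y": [2, 7, 10, 3, 3],
    "Z": [3, 2, 5, 7, 3],
    "X1": [3, 6, 4, 9, 4],
})
list1 = [3, 5, 6]
list2 = [4, 3, 7, 4]

# A dict of Series aligns on the index, so the shorter list is padded with NaN.
extra = pd.DataFrame({"X": pd.Series(list1), "X1": pd.Series(list2)})

# Stack the new rows under the original ones; Y and Z are missing in `extra`
# and therefore become NaN. Selecting df.columns keeps the original order.
result = pd.concat([df, extra], ignore_index=True)[df.columns]
print(result)
```

The missing cells stay NaN, which keeps the columns numeric; replace them with empty strings only for display.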

How to fill NaN in one column depending from values two different columns

I have a dataframe with three columns. Two of them are group and subgroup, and the third one is a value. There are some NaN values in the value column, and I need to fill them with the median of the matching group and subgroup.
I made a pivot table with a double index and the median of the target column, but I don't understand how to get these values back into the original dataframe.
import pandas as pd
df=pd.DataFrame(data=[
[1,1,'A',1],
[2,1,'A',3],
[3,3,'B',8],
[4,2,'C',1],
[5,3,'A',3],
[6,2,'C',6],
[7,1,'B',2],
[8,1,'C',3],
[9,2,'A',7],
[10,3,'C',4],
[11,2,'B',6],
[12,1,'A'],
[13,1,'C'],
[14,2,'B'],
[15,3,'A']],columns=['id','group','subgroup','value'])
print(df)
id group subgroup value
0 1 1 A 1
1 2 1 A 3
2 3 3 B 8
3 4 2 C 1
4 5 3 A 3
5 6 2 C 6
6 7 1 B 2
7 8 1 C 3
8 9 2 A 7
9 10 3 C 4
10 11 2 B 6
11 12 1 A NaN
12 13 1 C NaN
13 14 2 B NaN
14 15 3 A NaN
df_struct=df.pivot_table(index=['group','subgroup'],values='value',aggfunc='median')
print(df_struct)
value
group subgroup
1 A 2.0
B 2.0
C 3.0
2 A 7.0
B 6.0
C 3.5
3 A 3.0
B 8.0
C 4.0
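I would be thankful for any help.
Use groupby(...).transform('median') to get the per-group median aligned with the original rows, then pass it to fillna. Example on a shortened frame: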
id group subgroup value
0 1 1 A 1.0
1 2 1 A NaN # < Value with nan
2 3 3 B 8.0
3 4 2 C 1.0
4 5 3 A 3.0
5 6 2 C 6.0
6 7 1 B 2.0
7 8 1 C 3.0
8 9 2 A 7.0
9 10 3 C 4.0
10 11 2 B 6.0
df['value'] = df['value'].fillna(df.groupby(['group', 'subgroup'])['value'].transform('median'))
print(df)
Output:
id group subgroup value
0 1 1 A 1.0
1 2 1 A 1.0
2 3 3 B 8.0
3 4 2 C 1.0
4 5 3 A 3.0
5 6 2 C 6.0
6 7 1 B 2.0
7 8 1 C 3.0
8 9 2 A 7.0
9 10 3 C 4.0
10 11 2 B 6.0
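Applied to the full data from the question, the same approach can be sketched like this. The missing values are written as np.nan explicitly here, which is an assumption about how the original frame was built.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[
        [1, 1, "A", 1], [2, 1, "A", 3], [3, 3, "B", 8], [4, 2, "C", 1],
        [5, 3, "A", 3], [6, 2, "C", 6], [7, 1, "B", 2], [8, 1, "C", 3],
        [9, 2, "A", 7], [10, 3, "C", 4], [11, 2, "B", 6],
        [12, 1, "A", np.nan], [13, 1, "C", np.nan],
        [14, 2, "B", np.nan], [15, 3, "A", np.nan],
    ],
    columns=["id", "group", "subgroup", "value"],
)

# transform('median') returns one value per original row: the median of
# that row's (group, subgroup) bucket, computed while ignoring NaN.
group_median = df.groupby(["group", "subgroup"])["value"].transform("median")

# Only the NaN cells are replaced; existing values are left untouched.
df["value"] = df["value"].fillna(group_median)
print(df)
```

Because transform keeps the original index, no merge with the pivot table is needed.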

Insert columns with constants in pandas

I have a row of pandas dataframe, i.e.
x p y q z
---------
1 4 2 5 3
I want to append only some columns ('x','y','z') of it to another dataframe as new columns with names 'a','b','c'.
Before:
A B
---
7 8
9 6
8 5
After
A B a b c
---------
7 8 1 2 3
9 6 1 2 3
8 5 1 2 3
Try this:
df1=pd.DataFrame({'x':[1],'y':[2],'z':[3]})
df2=pd.DataFrame({'A':[7,9,8],'B':[8,6,5]})
print(pd.concat([df2,df1],axis=1).ffill().rename(columns={'x':'a','y':'b','z':'c'}))
A B a b c
0 7 8 1.0 2.0 3.0
1 9 6 1.0 2.0 3.0
2 8 5 1.0 2.0 3.0
Use assign with a Series built from the first row of df1:
cols = ['x','y','z']
new_cols = ['a','b','c']
df = df2.assign(**pd.Series(df1[cols].iloc[0].values, index=new_cols))
print (df)
A B a b c
0 7 8 1 2 3
1 9 6 1 2 3
2 8 5 1 2 3
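For the original single-row frame, which also contains the unwanted columns p and q, a minimal sketch with a plain dict could look like this. The names row, target, mapping and constants are made up for the example.

```python
import pandas as pd

# The single-row source frame from the question, including p and q.
row = pd.DataFrame({"x": [1], "p": [4], "y": [2], "q": [5], "z": [3]})
target = pd.DataFrame({"A": [7, 9, 8], "B": [8, 6, 5]})

# Old column name -> new column name; columns not listed are ignored.
mapping = {"x": "a", "y": "b", "z": "c"}

# Pull one scalar per selected column and broadcast it via assign.
constants = {new: row[old].iloc[0] for old, new in mapping.items()}
result = target.assign(**constants)
print(result)
```

Since each value is a scalar, assign repeats it for every row and the integer dtype is preserved.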

How can I merge two dataframes of dissimilar size and preserve their column order?

Consider the dataframes
A:
g N a
1 3 5
2 4 6
and B:
g N a e
3 3 4 7
4 9 1 8
Is there some way to merge these such that the resultant dataframe is:
g N a e
1 3 5 NaN
2 4 6 NaN
3 3 4 7
4 9 1 8
In other words, is there some way to preserve the column order rather than re-sort lexicographically?
Use reindex (reindex_axis was removed in pandas 1.0):
pd.concat([A,B]).reindex(columns=B.columns)
Output:
g N a e
0 1 3 5 NaN
1 2 4 6 NaN
0 3 3 4 7.0
1 4 9 1 8.0
When merging, specify sort=False.
In [1251]: A.merge(B, how='outer', sort=False)
Out[1251]:
g N a e
0 1 3 5 NaN
1 2 4 6 NaN
2 3 3 4 7.0
3 4 9 1 8.0
The following should do the trick: pd.concat([a, b])[b.columns]
Full test code:
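import pandas as pd
from io import StringIO
# sep=r"\s+" splits on runs of whitespace (r"\s*" can also match the empty string)
a = pd.read_csv(StringIO("""
g N a
1 3 5
2 4 6
"""), sep=r"\s+")
b = pd.read_csv(StringIO("""
g N a e
3 3 4 7
4 9 1 8
"""), sep=r"\s+")
# Concatenate the rows, then select the columns in b's order
pd.concat([a, b])[b.columns]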
This produces:
g N a e
0 1 3 5 NaN
1 2 4 6 NaN
0 3 3 4 7.0
1 4 9 1 8.0
You might also want to reset the index:
pd.concat([a, b])[b.columns].reset_index(drop=True)
... in order to remove index duplicates. This gives:
g N a e
0 1 3 5 NaN
1 2 4 6 NaN
2 3 3 4 7.0
3 4 9 1 8.0
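The two steps can also be combined, as a small sketch that builds the frames directly instead of parsing text. Here ignore_index=True takes the place of the separate reset_index call.

```python
import pandas as pd

A = pd.DataFrame({"g": [1, 2], "N": [3, 4], "a": [5, 6]})
B = pd.DataFrame({"g": [3, 4], "N": [3, 9], "a": [4, 1], "e": [7, 8]})

# Stack the rows with a fresh 0..n-1 index, then select the columns in
# B's order so the layout does not depend on how concat orders them.
merged = pd.concat([A, B], ignore_index=True)[B.columns]
print(merged)
```

Column e is missing in A, so its first two rows are NaN and the column becomes float.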

pandas dataframe sum of shift(x) for x in range(1, n)

I have a dataframe like the one below and want to add a new column that is the sum of shift(1) through shift(n) of an existing column. For example, let n = 2:
df = pd.DataFrame(numpy.random.randint(0, 10, (10, 2)), columns=['a','b'])
a b
0 0 3
1 7 0
2 6 6
3 6 0
4 5 0
5 0 7
6 8 0
7 8 7
8 4 4
9 2 2
df['c'] = df['b'].shift(1) + df['b'].shift(2)
a b c
0 0 3 NaN
1 7 0 NaN
2 6 6 3.0
3 6 0 6.0
4 5 0 6.0
5 0 7 0.0
6 8 0 7.0
7 8 7 7.0
8 4 4 7.0
9 2 2 11.0
In this manner, column c gets the sum of the previous n values from column b.
Other than a loop, is there a better way to accomplish this for a large n?
You can use the rolling() method with a window of 2:
df['c'] = df.b.rolling(window = 2).sum().shift()
df
a b c
0 0 3 NaN
1 7 0 NaN
2 6 6 3.0
3 6 0 6.0
4 5 0 6.0
5 0 7 0.0
6 8 0 7.0
7 8 7 7.0
8 4 4 7.0
9 2 2 11.0
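For a general n, the window simply becomes n. The sketch below wraps this in a helper (the name sum_of_previous is made up) and compares it against the explicit sum of shifts, using column b from the example above.

```python
import numpy as np
import pandas as pd

def sum_of_previous(series, n):
    """Sum of the n values before each row (NaN until n values exist)."""
    # rolling(n).sum() covers the current row and the n-1 rows before it;
    # shifting by one moves the window so it ends at the previous row.
    return series.rolling(window=n).sum().shift(1)

b = pd.Series([3, 0, 6, 0, 0, 7, 0, 7, 4, 2])

n = 3
fast = sum_of_previous(b, n)

# Reference result: the explicit sum of shift(1) ... shift(n).
slow = sum(b.shift(k) for k in range(1, n + 1))

print(np.allclose(fast, slow, equal_nan=True))
```

With n = 2 the helper reproduces column c shown above.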
