I have a one-row pandas dataframe, i.e.
x p y q z
---------
1 4 2 5 3
I want to append only some columns ('x','y','z') of it to another dataframe as new columns with names 'a','b','c'.
Before:
A B
---
7 8
9 6
8 5
After
A B a b c
---------
7 8 1 2 3
9 6 1 2 3
8 5 1 2 3
Try this:
import pandas as pd
df1=pd.DataFrame({'x':[1],'y':[2],'z':[3]})
df2=pd.DataFrame({'A':[7,9,8],'B':[8,6,5]})
print(pd.concat([df2,df1],axis=1).ffill().rename(columns={'x':'a','y':'b','z':'c'}))
A B a b c
0 7 8 1.0 2.0 3.0
1 9 6 1.0 2.0 3.0
2 8 5 1.0 2.0 3.0
Use assign with a Series built from the first row of df1:
cols = ['x','y','z']
new_cols = ['a','b','c']
df = df2.assign(**pd.Series(df1[cols].iloc[0].values, index=new_cols))
print (df)
A B a b c
0 7 8 1 2 3
1 9 6 1 2 3
2 8 5 1 2 3
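Both answers above start from a df1 that already holds only 'x', 'y' and 'z'. As a minimal sketch, assuming the source row still carries the extra 'p' and 'q' columns from the question, the selection and renaming can be driven by one mapping (the names row, target and mapping are illustrative, not from the original posts):

```python
import pandas as pd

# One-row source frame from the question, including the unwanted columns.
row = pd.DataFrame({"x": [1], "p": [4], "y": [2], "q": [5], "z": [3]})
target = pd.DataFrame({"A": [7, 9, 8], "B": [8, 6, 5]})

# Hypothetical mapping: source column -> new column name.
mapping = {"x": "a", "y": "b", "z": "c"}

# Take the first row, keep only the mapped columns, rename the labels.
values = row.iloc[0][list(mapping)].rename(mapping)

# assign broadcasts each scalar down the whole target frame.
result = target.assign(**values.to_dict())
print(result)
```

Because the values are passed as scalars, the new columns keep their integer dtype instead of becoming floats as in the concat + ffill approach.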
Related
I have a pandas data frame and I want to move the "F" column to after the "B" column. Is there a way to do that?
   A  B    C  D  E  F
0  7  1       8  1  6
1  8  2    5  8  5  8
2  9  3    6  8     5
3  1  8    1  3  4
4  6  8    2  5  0  9
5     2  N/A  1  3  8
df2
A B F C D E
0 7 1 6 8 1
1 8 2 8 5 8 5
2 9 3 5 6 8
3 1 4 8 1 3
4 6 8 9 2 5 0
5 2 8 N/A 1 3
So it should finally look like df2.
Thanks in advance.
You can try df.insert + df.pop after getting the location of B with get_loc:
df.insert(df.columns.get_loc("B")+1,"F",df.pop("F"))
print(df)
A B F C D E
0 7.0 1 6.0 NaN 8 1.0
1 8.0 2 8.0 5.0 8 5.0
2 9.0 3 5.0 6.0 8 NaN
3 1.0 8 NaN 1.0 3 4.0
4 6.0 8 9.0 2.0 5 0.0
5 NaN 2 8.0 NaN 1 3.0
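The insert + pop idea can be wrapped in a small helper. This is only a sketch: move_column_after is a hypothetical name, and the sample frame is rebuilt from the printed output above on the assumption that the blanks are NaN.

```python
import numpy as np
import pandas as pd

def move_column_after(df, col, after):
    """Return a copy of df with `col` placed directly after `after`."""
    out = df.copy()
    series = out.pop(col)  # remove first, so get_loc sees the final layout
    out.insert(out.columns.get_loc(after) + 1, col, series)
    return out

df = pd.DataFrame({
    "A": [7, 8, 9, 1, 6, np.nan],
    "B": [1, 2, 3, 8, 8, 2],
    "C": [np.nan, 5, 6, 1, 2, np.nan],
    "D": [8, 8, 8, 3, 5, 1],
    "E": [1, 5, np.nan, 4, 0, 3],
    "F": [6, 8, 5, np.nan, 9, 8],
})

result = move_column_after(df, "F", "B")
print(list(result.columns))
```

Popping the column before calling get_loc means the helper also works when the moved column currently sits to the left of the target column.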
Another minimalist (and very specific!) approach:
df = df[list('ABFCDE')]
Here is a very simple one-line answer, with a little more explanation of the answer from #warped.
You can do that after adding the 'n' column to your df, as follows.
import pandas as pd
df = pd.DataFrame({'l':['a','b','c','d'], 'v':[1,2,1,2]})
df['n'] = 0
df
l v n
0 a 1 0
1 b 2 0
2 c 1 0
3 d 2 0
# here you can add the below code and it should work.
df = df[list('nlv')]
df
n l v
0 0 a 1
1 0 b 2
2 0 c 1
3 0 d 2
However, if your column names are words rather than single letters, list('...') will not work, because it splits the string into single characters. Pass the column names as a list instead.
import pandas as pd
df = pd.DataFrame({'Upper':['a','b','c','d'], 'Lower':[1,2,1,2]})
df['Net'] = 0
df['Mid'] = 2
df['Zsore'] = 2
df
Upper Lower Net Mid Zsore
0 a 1 0 2 2
1 b 2 0 2 2
2 c 1 0 2 2
3 d 2 0 2 2
# here you can add below line and it should work
df = df[['Mid', 'Upper', 'Lower', 'Net', 'Zsore']]
df
Mid Upper Lower Net Zsore
0 2 a 1 0 2
1 2 b 2 0 2
2 2 c 1 0 2
3 2 d 2 0 2
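Typing out every column name becomes tedious on wide frames. A possible variation, sketched here with hypothetical names front and rest, lists only the columns to move forward and keeps the others in their existing order:

```python
import pandas as pd

df = pd.DataFrame({"Upper": ["a", "b", "c", "d"], "Lower": [1, 2, 1, 2]})
df["Net"] = 0
df["Mid"] = 2

# Columns to bring to the front; everything else keeps its current order.
front = ["Mid"]
rest = [c for c in df.columns if c not in front]

reordered = df[front + rest]
print(reordered)
```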
I need to create a new dataframe from an existing one by selecting multiple columns and appending those column values to a single new column, with the index of the source column as a second new column.
So, let's say I have this as a dataframe:
A B C D E F
0 1 2 3 4 0
0 7 8 9 1 0
0 4 5 2 4 0
Transform into this by selecting columns B through E:
A index_value
1 1
7 1
4 1
2 2
8 2
5 2
3 3
9 3
2 3
4 4
1 4
4 4
So, for the new dataframe, column A would be all of the values from columns B through E in the old dataframe, and column index_value would correspond to the index value [starting from zero] of the selected columns.
I've been scratching my head for hours. Any help would be appreciated, thanks!
Python3, Using pandas & numpy libraries.
Another way:
A B C D E F
0 0 1 2 3 4 0
1 0 7 8 9 1 0
2 0 4 5 2 4 0
# Select columns to include
start_column ='B'
end_column ='E'
index_column_name ='A'
#re-stack the dataframe
df = df.loc[:,start_column:end_column].stack().sort_index(level=1).reset_index(level=0, drop=True).to_frame()
#Create the "index_value" column
df['index_value'] =pd.Categorical(df.index).codes+1
df.rename(columns={0:index_column_name}, inplace=True)
df.set_index(index_column_name, inplace=True)
df
index_value
A
1 1
7 1
4 1
2 2
8 2
5 2
3 3
9 3
2 3
4 4
1 4
4 4
This is just melt
df.columns = range(df.shape[1])
s = df.melt().loc[lambda x : x.value!=0]
s
variable value
3 1 1
4 1 7
5 1 4
6 2 2
7 2 8
8 2 5
9 3 3
10 3 9
11 3 2
12 4 4
13 4 1
14 4 4
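Try using:
df = pd.melt(df[['B', 'C', 'D', 'E']])
# Or: df = df[['B', 'C', 'D', 'E']].melt()
# Replace the column names with consecutive group numbers.
# Note: this trick relies on every column contributing exactly three rows.
df['variable'] = df['variable'].shift().eq(df['variable'].shift(-1)).cumsum().shift(-1).ffill()
print(df)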
Output:
variable value
0 1.0 1
1 1.0 7
2 1.0 4
3 2.0 2
4 2.0 8
5 2.0 5
6 3.0 3
7 3.0 9
8 3.0 2
9 4.0 4
10 4.0 1
11 4.0 4
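The melt answers above depend either on the unwanted columns being all zeros or on each column having exactly three rows. A sketch that avoids both assumptions selects B through E by label and looks up each column's position in the original frame (the intermediate names selected and long_df are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "A": [0, 0, 0], "B": [1, 7, 4], "C": [2, 8, 5],
    "D": [3, 9, 2], "E": [4, 1, 4], "F": [0, 0, 0],
})

# Label-based slice: both endpoints are included.
selected = df.loc[:, "B":"E"]

# Stack the selected columns on top of each other, column by column.
long_df = selected.melt(var_name="column", value_name="A")

# Zero-based position of each source column in the original frame (A=0, B=1, ...).
long_df["index_value"] = long_df["column"].map(df.columns.get_loc)

result = long_df[["A", "index_value"]]
print(result)
```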
How do I count the number of unique strings in a rolling window of a pandas dataframe?
a = pd.DataFrame(['a','b','a','a','b','c','d','e','e','e','e'])
a.rolling(3).apply(lambda x: len(np.unique(x)))
Output, same as original dataframe:
0
0 a
1 b
2 a
3 a
4 b
5 c
6 d
7 e
8 e
9 e
10 e
Expected:
0
0 1
1 2
2 2
3 2
4 2
5 3
6 3
7 3
8 2
9 1
10 1
I think you first need to convert the values to numeric, either with factorize or with rank. The min_periods parameter is also needed to avoid NaN at the start of the column:
a[0] = pd.factorize(a[0])[0]
print (a)
0
0 0
1 1
2 0
3 0
4 1
5 2
6 3
7 4
8 4
9 4
10 4
b = a.rolling(3, min_periods=1).apply(lambda x: len(np.unique(x))).astype(int)
print (b)
0
0 1
1 2
2 2
3 2
4 2
5 3
6 3
7 3
8 2
9 1
10 1
Or:
a[0] = a[0].rank(method='dense')
print (a)
0
0 1.0
1 2.0
2 1.0
3 1.0
4 2.0
5 3.0
6 4.0
7 5.0
8 5.0
9 5.0
10 5.0
b = a.rolling(3, min_periods=1).apply(lambda x: len(np.unique(x))).astype(int)
print (b)
0
0 1
1 2
2 2
3 2
4 2
5 3
6 3
7 3
8 2
9 1
10 1
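For reuse, the factorize step and the rolling count can be combined into one function. This is a sketch under the same approach as above; rolling_nunique is a hypothetical helper name, and note that factorize codes missing values as -1, so NaN would be counted as a value of its own.

```python
import numpy as np
import pandas as pd

def rolling_nunique(series, window):
    """Count distinct values in each trailing window of a (string) Series."""
    # rolling only handles numeric data, so encode the strings as integer codes.
    codes = pd.Series(pd.factorize(series)[0], index=series.index)
    counts = codes.rolling(window, min_periods=1).apply(
        lambda x: len(np.unique(x)), raw=True
    )
    return counts.astype(int)

s = pd.Series(['a', 'b', 'a', 'a', 'b', 'c', 'd', 'e', 'e', 'e', 'e'])
result = rolling_nunique(s, 3)
print(result)
```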
Consider the dataframes
A:
g N a
1 3 5
2 4 6
and B:
g N a e
3 3 4 7
4 9 1 8
Is there some way to merge these such that the resultant dataframe is:
g N a e
1 3 5 NaN
2 4 6 NaN
3 3 4 7
4 9 1 8
In other words, is there some way to preserve the column order rather than re-sort lexicographically?
Use reindex (reindex_axis was deprecated and has been removed from current pandas):
pd.concat([A,B]).reindex(columns=B.columns)
Output:
g N a e
0 1 3 5 NaN
1 2 4 6 NaN
0 3 3 4 7.0
1 4 9 1 8.0
When merging, specify sort=False.
In [1251]: A.merge(B, how='outer', sort=False)
Out[1251]:
g N a e
0 1 3 5 NaN
1 2 4 6 NaN
2 3 3 4 7.0
3 4 9 1 8.0
The following should do the trick: pd.concat([a, b])[b.columns]
Full test code:
import pandas as pd
from io import StringIO
a = pd.read_csv(StringIO("""
g N a
1 3 5
2 4 6
"""), sep=r"\s+")
b = pd.read_csv(StringIO("""
g N a e
3 3 4 7
4 9 1 8
"""), sep=r"\s+")
pd.concat([a, b])[b.columns]
This produces:
g N a e
0 1 3 5 NaN
1 2 4 6 NaN
0 3 3 4 7.0
1 4 9 1 8.0
You might also want to reset the index:
pd.concat([a, b])[b.columns].reset_index(drop=True)
... in order to remove index duplicates. This gives:
g N a e
0 1 3 5 NaN
1 2 4 6 NaN
2 3 3 4 7.0
3 4 9 1 8.0
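Indexing with b.columns works here because B happens to contain every column. A slightly more general sketch, assuming you want A's columns first followed by any columns only B has, builds the order explicitly (ordered is an illustrative name):

```python
import pandas as pd

A = pd.DataFrame({"g": [1, 2], "N": [3, 4], "a": [5, 6]})
B = pd.DataFrame({"g": [3, 4], "N": [3, 9], "a": [4, 1], "e": [7, 8]})

# Columns of A in their original order, then the columns only B has.
ordered = list(A.columns) + [c for c in B.columns if c not in A.columns]

# ignore_index=True renumbers the rows, so no duplicate index labels remain.
result = pd.concat([A, B], ignore_index=True).reindex(columns=ordered)
print(result)
```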
the simple code
I have a dataframe df that I split into 3 dataframes of the same size. However, I want to build one dataframe from these 3 pieces. The columns of the new dataframe should be the transposed pieces, i.e. there will be 3 columns.
In [4]: np.array_split(df, 3)
Out[4]:
[ A B C D
0 foo one -0.174067 -0.608579
1 bar one -0.860386 -1.210518
2 foo two 0.614102 1.689837,
A B C D
3 bar three -0.284792 -1.071160
4 foo two 0.843610 0.803712
5 bar two -1.514722 0.870861,
A B C D
6 foo one 0.131529 -0.968151
7 foo three -1.002946 -0.257468
8 foo three -1.002946 -0.257468]
UPDATE
Sliced and transposed
In [2]: df
Out[2]:
a b c
0 9 9 7
1 1 7 6
2 5 9 1
3 7 4 0
4 5 2 3
5 2 4 6
6 6 3 6
7 0 2 7
8 9 1 4
9 2 9 3
In [3]: dfs = [pd.DataFrame(a).T for a in np.array_split(df, 3)]
In [4]: dfs[0]
Out[4]:
0 1 2 3
a 9 1 5 7
b 9 7 9 4
c 7 6 1 0
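The slice-and-transpose step can also be written without passing the DataFrame itself to np.array_split. The sketch below splits the row positions instead and slices with iloc; the sample values are copied from the df printed above, and chunks and pieces are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [9, 1, 5, 7, 5, 2, 6, 0, 9, 2],
    "b": [9, 7, 9, 4, 2, 4, 3, 2, 1, 9],
    "c": [7, 6, 1, 0, 3, 6, 6, 7, 4, 3],
})

# Split the row positions into 3 nearly equal groups, then slice by position.
chunks = [df.iloc[idx] for idx in np.array_split(np.arange(len(df)), 3)]

# Transpose each chunk: the original columns become the row labels.
pieces = [chunk.T for chunk in chunks]
print(pieces[0])
```

With 10 rows the groups have 4, 3 and 3 rows, so the transposed pieces have 4, 3 and 3 columns respectively.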
OLD version
One option would be to use this:
In [114]: dfs = [pd.DataFrame(a) for a in np.array_split(df, 3)]
In [115]: dfs[0]
Out[115]:
a b c
0 NaN 7.0 0
1 0.0 NaN 4
2 2.0 NaN 4
3 1.0 7.0 0
In [116]: df
Out[116]:
a b c
0 NaN 7.0 0
1 0.0 NaN 4
2 2.0 NaN 4
3 1.0 7.0 0
4 1.0 3.0 9
5 7.0 4.0 9
6 2.0 6.0 9
7 9.0 6.0 4
8 3.0 0.0 9
9 9.0 0.0 1