Insert columns with constants in pandas

I have a single-row pandas DataFrame, i.e.
x p y q z
---------
1 4 2 5 3
I want to append only some columns ('x','y','z') of it to another dataframe as new columns with names 'a','b','c'.
Before:
A B
---
7 8
9 6
8 5
After
A B a b c
---------
7 8 1 2 3
9 6 1 2 3
8 5 1 2 3

Try this (concatenate side by side, forward-fill the single row down, then rename):
df1 = pd.DataFrame({'x': [1], 'y': [2], 'z': [3]})
df2 = pd.DataFrame({'A': [7, 9, 8], 'B': [8, 6, 5]})
print(pd.concat([df2, df1], axis=1).ffill().rename(columns={'x': 'a', 'y': 'b', 'z': 'c'}))
A B a b c
0 7 8 1.0 2.0 3.0
1 9 6 1.0 2.0 3.0
2 8 5 1.0 2.0 3.0

Use assign with a Series built from the first row of df1:
cols = ['x','y','z']
new_cols = ['a','b','c']
df = df2.assign(**pd.Series(df1[cols].iloc[0].values, index=new_cols))
print (df)
A B a b c
0 7 8 1 2 3
1 9 6 1 2 3
2 8 5 1 2 3
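
A minimal sketch of the same idea with a plain dict instead of a Series, assuming the source frame has exactly one row. The names mapping, row, constants and result are illustrative and not from the original answers.

import pandas as pd

# Single-row source frame and the target frame from the question.
df1 = pd.DataFrame({'x': [1], 'p': [4], 'y': [2], 'q': [5], 'z': [3]})
df2 = pd.DataFrame({'A': [7, 9, 8], 'B': [8, 6, 5]})

# Hypothetical mapping: old column name in df1 -> new column name in df2.
mapping = {'x': 'a', 'y': 'b', 'z': 'c'}

# Take the first (only) row and pick the wanted scalars out of it.
row = df1.iloc[0]
constants = {new: row[old] for old, new in mapping.items()}

# assign broadcasts each scalar down the whole column and returns a new frame.
result = df2.assign(**constants)
print(result)

Because the values are passed as scalars, no NaN is introduced along the way and the new columns keep their integer dtype.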

Related

how do I insert a column at a specific column index in pandas data frame? (Change column order in pandas data frame)

I have a pandas data frame and I want to move the "F" column to after the "B" column. Is there a way to do that?
A B C D E F
0 7 1 8 1 6
1 8 2 5 8 5 8
2 9 3 6 8 5
3 1 8 1 3 4
4 6 8 2 5 0 9
5 2 N/A 1 3 8
df2
A B F C D E
0 7 1 6 8 1
1 8 2 8 5 8 5
2 9 3 5 6 8
3 1 4 8 1 3
4 6 8 9 2 5 0
5 2 8 N/A 1 3
So it should finally look like df2.
Thanks in advance.

You can use df.insert together with df.pop, after finding the position of B with get_loc:
df.insert(df.columns.get_loc("B")+1,"F",df.pop("F"))
print(df)
A B F C D E
0 7.0 1 6.0 NaN 8 1.0
1 8.0 2 8.0 5.0 8 5.0
2 9.0 3 5.0 6.0 8 NaN
3 1.0 8 NaN 1.0 3 4.0
4 6.0 8 9.0 2.0 5 0.0
5 NaN 2 8.0 NaN 1 3.0
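
The insert + pop idea can be wrapped in a small helper. This is only a sketch: move_column_after is a hypothetical name, the sample data is made up, and it assumes column labels are unique so that get_loc returns a single integer.

import pandas as pd

def move_column_after(df, col, after):
    """Return a copy of df with column `col` placed right after `after`."""
    out = df.copy()
    # Pop first, so the position of `after` is computed on the remaining columns.
    series = out.pop(col)
    out.insert(out.columns.get_loc(after) + 1, col, series)
    return out

# Made-up data with columns A..F.
df = pd.DataFrame({c: [i, i + 1] for i, c in enumerate('ABCDEF')})
moved = move_column_after(df, 'F', 'B')
print(moved.columns.tolist())

Popping before calling get_loc keeps the helper correct even when the moved column sits to the left of the anchor column.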

Another minimalist (and very specific!) approach:
df = df[list('ABFCDE')]

Here is a very simple, one-line answer. It gives a little more explanation of the answer from @warped.
You can do this after you have added the 'n' column to your df, as follows.
import pandas as pd
df = pd.DataFrame({'l':['a','b','c','d'], 'v':[1,2,1,2]})
df['n'] = 0
df
l v n
0 a 1 0
1 b 2 0
2 c 1 0
3 d 2 0
# here you can add the below code and it should work.
df = df[list('nlv')]
df
n l v
0 0 a 1
1 0 b 2
2 0 c 1
3 0 d 2
However, list('nlv') only works because every column name is a single character (the string is split into letters). If your column names are whole words, pass the names themselves as a tuple or list instead:
import pandas as pd
df = pd.DataFrame({'Upper':['a','b','c','d'], 'Lower':[1,2,1,2]})
df['Net'] = 0
df['Mid'] = 2
df['Zsore'] = 2
df
Upper Lower Net Mid Zsore
0 a 1 0 2 2
1 b 2 0 2 2
2 c 1 0 2 2
3 d 2 0 2 2
# here you can add below line and it should work
df = df[list(('Mid','Upper', 'Lower', 'Net','Zsore'))]
df
Mid Upper Lower Net Zsore
0 2 a 1 0 2
1 2 b 2 0 2
2 2 c 1 0 2
3 2 d 2 0 2
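
If only a few columns need to move to the front, the full ordering does not have to be typed out. A small sketch, assuming unique column names; front and ordered are illustrative names.

import pandas as pd

df = pd.DataFrame({'Upper': list('abcd'), 'Lower': [1, 2, 1, 2], 'Net': 0, 'Mid': 2})

# Columns that should come first; everything else keeps its current order.
front = ['Mid']
ordered = front + [c for c in df.columns if c not in front]

df = df[ordered]
print(df.columns.tolist())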

Indexing new dataframes into new columns with pandas

I need to create a new dataframe from an existing one by selecting multiple columns and appending those columns' values into a single new column, with the corresponding column index as a second new column.
So, let's say I have this as a dataframe:
A B C D E F
0 1 2 3 4 0
0 7 8 9 1 0
0 4 5 2 4 0
Transform into this by selecting columns B through E:
A index_value
1 1
7 1
4 1
2 2
8 2
5 2
3 3
9 3
2 3
4 4
1 4
4 4
So, for the new dataframe, column A would be all of the values from columns B through E in the old dataframe, and column index_value would correspond to the index value [starting from zero] of the selected columns.
I've been scratching my head for hours. Any help would be appreciated, thanks!
Python3, Using pandas & numpy libraries.

Another way:
A B C D E F
0 0 1 2 3 4 0
1 0 7 8 9 1 0
2 0 4 5 2 4 0
# Select columns to include
start_column = 'B'
end_column = 'E'
index_column_name = 'A'
# Re-stack the dataframe: one row per (column, original row), ordered by column
df = df.loc[:, start_column:end_column].stack().sort_index(level=1).reset_index(level=0, drop=True).to_frame()
#Create the "index_value" column
df['index_value'] =pd.Categorical(df.index).codes+1
df.rename(columns={0:index_column_name}, inplace=True)
df.set_index(index_column_name, inplace=True)
df
index_value
A
1 1
7 1
4 1
2 2
8 2
5 2
3 3
9 3
2 3
4 4
1 4
4 4

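This is just melt. Note that filtering on value != 0 is what drops the all-zero columns A and F here; it would also drop any genuine zeros in B through E: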
df.columns = range(df.shape[1])
s = df.melt().loc[lambda x : x.value!=0]
s
variable value
3 1 1
4 1 7
5 1 4
6 2 2
7 2 8
8 2 5
9 3 3
10 3 9
11 3 2
12 4 4
13 4 1
14 4 4

Try using:
df = pd.melt(df[['B', 'C', 'D', 'E']])
# Or: df = df[['B', 'C', 'D', 'E']].melt()
# Number the runs of identical column names 1, 2, 3, ... and store the result back
df['variable'] = df['variable'].shift().eq(df['variable'].shift(-1)).cumsum().shift(-1).ffill()
print(df)
Output:
variable value
0 1.0 1
1 1.0 7
2 1.0 4
3 2.0 2
4 2.0 8
5 2.0 5
6 3.0 3
7 3.0 9
8 3.0 2
9 4.0 4
10 4.0 1
11 4.0 4
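
A sketch of the melt approach that selects B through E explicitly and looks up each column's zero-based position in the original frame, so no filtering on values is needed. The names cols, positions, long and result are illustrative.

import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0], 'B': [1, 7, 4], 'C': [2, 8, 5],
                   'D': [3, 9, 2], 'E': [4, 1, 4], 'F': [0, 0, 0]})

cols = ['B', 'C', 'D', 'E']

# Zero-based position of each selected column in the original frame (B -> 1, ...).
positions = {c: df.columns.get_loc(c) for c in cols}

# melt stacks the selected columns on top of each other, column by column.
long = df[cols].melt(var_name='column', value_name='A')
long['index_value'] = long['column'].map(positions)

result = long[['A', 'index_value']]
print(result)

Mapping through a dict keeps index_value an integer column, unlike the shift/cumsum chain, which ends up as float.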

Count distinct strings in rolling window using pandas

How do I count the number of unique strings in a rolling window of a pandas dataframe?
a = pd.DataFrame(['a','b','a','a','b','c','d','e','e','e','e'])
a.rolling(3).apply(lambda x: len(np.unique(x)))
Output, same as original dataframe:
0
0 a
1 b
2 a
3 a
4 b
5 c
6 d
7 e
8 e
9 e
10 e
Expected:
0
0 1
1 2
2 2
3 2
4 2
5 3
6 3
7 3
8 2
9 1
10 1

I think you first need to convert the values to numeric, either with factorize or with rank, because rolling works on numeric data. The min_periods parameter is also necessary to avoid NaN at the start of the column:
a[0] = pd.factorize(a[0])[0]
print (a)
0
0 0
1 1
2 0
3 0
4 1
5 2
6 3
7 4
8 4
9 4
10 4
b = a.rolling(3, min_periods=1).apply(lambda x: len(np.unique(x))).astype(int)
print (b)
0
0 1
1 2
2 2
3 2
4 2
5 3
6 3
7 3
8 2
9 1
10 1
Or:
a[0] = a[0].rank(method='dense')
0
0 1.0
1 2.0
2 1.0
3 1.0
4 2.0
5 3.0
6 4.0
7 5.0
8 5.0
9 5.0
10 5.0
b = a.rolling(3, min_periods=1).apply(lambda x: len(np.unique(x))).astype(int)
print (b)
0
0 1
1 2
2 2
3 2
4 2
5 3
6 3
7 3
8 2
9 1
10 1
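
Put together as one runnable sketch on a Series rather than a one-column DataFrame. It assumes there are no missing values (factorize would encode them as -1 and they would count as one more distinct value); codes and distinct are illustrative names.

import numpy as np
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'a', 'b', 'c', 'd', 'e', 'e', 'e', 'e'])

# Encode each distinct string as an integer so that rolling can handle it.
codes = pd.Series(pd.factorize(s)[0], index=s.index)

# raw=True hands each window to the lambda as a NumPy array.
distinct = (codes.rolling(3, min_periods=1)
                 .apply(lambda w: len(np.unique(w)), raw=True)
                 .astype(int))
print(distinct.tolist())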

How can I merge two dataframes of dissimilar size and preserve their column order?

Consider the dataframes
A:
g N a
1 3 5
2 4 6
and B:
g N a e
3 3 4 7
4 9 1 8
Is there some way to merge these such that the resultant dataframe is:
g N a e
1 3 5 NaN
2 4 6 NaN
3 3 4 7
4 9 1 8
In other words, is there some way to preserve the column order rather than re-sort lexicographically?

Use reindex (reindex_axis was deprecated in pandas 0.21 and removed in 1.0):
pd.concat([A,B]).reindex(columns=B.columns)
Output:
g N a e
0 1 3 5 NaN
1 2 4 6 NaN
0 3 3 4 7.0
1 4 9 1 8.0
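
A self-contained sketch of the concat + reindex approach, with ignore_index=True added so the result gets a fresh 0..n-1 index instead of the duplicated 0, 1, 0, 1. The frames are rebuilt from the question; merged is an illustrative name.

import pandas as pd

A = pd.DataFrame({'g': [1, 2], 'N': [3, 4], 'a': [5, 6]})
B = pd.DataFrame({'g': [3, 4], 'N': [3, 9], 'a': [4, 1], 'e': [7, 8]})

# Stack the rows, then force the column order of B (the wider frame).
merged = pd.concat([A, B], ignore_index=True).reindex(columns=B.columns)
print(merged)

Column e becomes float because the rows coming from A have no value for it and are filled with NaN.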

When merging, specify sort=False.
In [1251]: A.merge(B, how='outer', sort=False)
Out[1251]:
g N a e
0 1 3 5 NaN
1 2 4 6 NaN
2 3 3 4 7.0
3 4 9 1 8.0

The following should do the trick: pd.concat([a, b])[b.columns]
Full test code:
import pandas as pd
from io import StringIO
a = pd.read_csv(StringIO("""
g N a
1 3 5
2 4 6
"""), sep=r"\s+")
b = pd.read_csv(StringIO("""
g N a e
3 3 4 7
4 9 1 8
"""), sep=r"\s+")
pd.concat([a, b])[b.columns]
This produces:
g N a e
0 1 3 5 NaN
1 2 4 6 NaN
0 3 3 4 7.0
1 4 9 1 8.0
You might also want to reset the index:
pd.concat([a, b])[b.columns].reset_index(drop=True)
... in order to remove index duplicates. This gives:
g N a e
0 1 3 5 NaN
1 2 4 6 NaN
2 3 3 4 7.0
3 4 9 1 8.0

split dataframe and create a new dataframe

I have a dataframe df that I divided into 3 dataframes of the same size. Now I want to create one dataframe from these 3 pieces. The columns of the new dataframe should be the transposes of these 3 dataframes, i.e. there will be 3 columns.
In [4]: np.array_split(df, 3)
Out[4]:
[ A B C D
0 foo one -0.174067 -0.608579
1 bar one -0.860386 -1.210518
2 foo two 0.614102 1.689837,
A B C D
3 bar three -0.284792 -1.071160
4 foo two 0.843610 0.803712
5 bar two -1.514722 0.870861,
A B C D
6 foo one 0.131529 -0.968151
7 foo three -1.002946 -0.257468
8 foo three -1.002946 -0.257468]

UPDATE
Sliced and transposed
In [2]: df
Out[2]:
a b c
0 9 9 7
1 1 7 6
2 5 9 1
3 7 4 0
4 5 2 3
5 2 4 6
6 6 3 6
7 0 2 7
8 9 1 4
9 2 9 3
In [3]: dfs = [pd.DataFrame(a).T for a in np.array_split(df, 3)]
In [4]: dfs[0]
Out[4]:
0 1 2 3
a 9 1 5 7
b 9 7 9 4
c 7 6 1 0
OLD version
One option would be to use this:
In [114]: dfs = [pd.DataFrame(a) for a in np.array_split(df, 3)]
In [115]: dfs[0]
Out[115]:
a b c
0 NaN 7.0 0
1 0.0 NaN 4
2 2.0 NaN 4
3 1.0 7.0 0
In [116]: df
Out[116]:
a b c
0 NaN 7.0 0
1 0.0 NaN 4
2 2.0 NaN 4
3 1.0 7.0 0
4 1.0 3.0 9
5 7.0 4.0 9
6 2.0 6.0 9
7 9.0 6.0 4
8 3.0 0.0 9
9 9.0 0.0 1
