I have the following dataframe:
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
id time col_id col_a col_b col_c
0 1 1 ffp 1 -1 10
1 1 2 ffp 2 -2 20
2 1 3 ffp 3 -3 30
3 2 1 hie 4 -4 40
4 2 2 hie 5 -5 50
5 2 3 ttt 6 -6 60
I would like to create a new column in foo, which takes the value of either col_a, col_b, or col_c, depending on the value of col_id.
I am doing the following:
import numpy as np

foo['col'] = np.where(foo.col_id == "ffp", foo.col_a,
             np.where(foo.col_id == "hie", foo.col_b, foo.col_c))
which gives
id time col_id col_a col_b col_c col
0 1 1 ffp 1 -1 10 1
1 1 2 ffp 2 -2 20 2
2 1 3 ffp 3 -3 30 3
3 2 1 hie 4 -4 40 -4
4 2 2 hie 5 -5 50 -5
5 2 3 ttt 6 -6 60 60
Since I have a lot of columns, I was wondering if there is a cleaner way to do that, for example using a dictionary:
dict_cols_matching = {"ffp" : "col_a", "hie": "col_b", "ttt": "col_c"}
Any ideas?
You can map the dictionary onto col_id, then perform an indexing lookup:
import numpy as np
# factorize the mapped column names into integer codes + the unique names
idx, cols = pd.factorize(foo['col_id'].map(dict_cols_matching))
# reindex orders the columns to match, then each row picks its own column
foo['col'] = foo.reindex(cols, axis=1).to_numpy()[np.arange(len(foo)), idx]
Output:
id time col_id col_a col_b col_c col
0 1 1 ffp 1 -1 10 1
1 1 2 ffp 2 -2 20 2
2 1 3 ffp 3 -3 30 3
3 2 1 hie 4 -4 40 -4
4 2 2 hie 5 -5 50 -5
5 2 3 ttt 6 -6 60 60
Alternatively, np.select maps a list of conditions to a list of choices:
foo['col'] = np.select([foo.col_id.eq("ffp"), foo.col_id.eq("hie"), foo.col_id.eq("ttt")],
                       [foo.col_a, foo.col_b, foo.col_c])
id time col_id col_a col_b col_c col
0 1 1 ffp 1 -1 10 1
1 1 2 ffp 2 -2 20 2
2 1 3 ffp 3 -3 30 3
3 2 1 hie 4 -4 40 -4
4 2 2 hie 5 -5 50 -5
5 2 3 ttt 6 -6 60 60
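Since the question asks for a dictionary-based approach, the condition and choice lists for np.select can also be built directly from dict_cols_matching, so adding a new mapping only means adding a dictionary entry. A sketch using the question's data:

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame({'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

# one condition and one choice per dictionary entry
conditions = [foo.col_id.eq(key) for key in dict_cols_matching]
choices = [foo[col] for col in dict_cols_matching.values()]
foo['col'] = np.select(conditions, choices)
```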
You can use a lambda function to select the column based on col_id, but this method depends on the order of the columns: the +3 offset skips the id, time, and col_id columns, so adjust it if the order changes.
import pandas as pd
import numpy as np

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
idSet = np.unique(foo['col_id'].to_numpy()).tolist()
# positional access via .iloc; plain x[int] on a labeled Series is deprecated
foo['col'] = foo.apply(lambda x: x.iloc[idSet.index(x.col_id) + 3], axis=1)
display(foo)
Output
id time col_id col_a col_b col_c col
0 1 1 ffp 1 -1 10 1
1 1 2 ffp 2 -2 20 2
2 1 3 ffp 3 -3 30 3
3 2 1 hie 4 -4 40 -4
4 2 2 hie 5 -5 50 -5
5 2 3 ttt 6 -6 60 60
You might use reset_index in combination with a row-wise apply:
foo[["col_id"]].reset_index().apply(lambda u: foo.loc[u["index"], dict_cols_matching[u["col_id"]]], axis=1)
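A variant of the same idea can skip the reset_index entirely and let each row look up its own column through the dictionary (a sketch; being row-wise, it will not be faster than the vectorized answers above):

```python
import pandas as pd

foo = pd.DataFrame({'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

# each row selects the value from the column its col_id maps to
foo['col'] = foo.apply(lambda row: row[dict_cols_matching[row['col_id']]], axis=1)
```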
I have a table like this:
id a b
1 -10 -5
2 -5 0
3 0 5
4 5 10
I want to replace the first value of a with -9999999 and the last value of b with 9999999, like this:
id a b
1 -9999999 -5
2 -5 0
3 0 5
4 5 9999999
Is there any possible way to do that?
If id is a column, use DataFrame.iat to set the first row / second column and the last row / last column by position:
df.iat[0, 1] = -9999999
df.iat[-1, -1] = 9999999
print (df)
id a b
0 1 -9999999 -5
1 2 -5 0
2 3 0 5
3 4 5 9999999
If id is the index, set the first column instead:
df.iat[0, 0] = -9999999
df.iat[-1, -1] = 9999999
print (df)
a b
id
1 -9999999 -5
2 -5 0
3 0 5
4 5 9999999
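If you prefer label-based access over integer positions, the same assignments can be written with DataFrame.loc (a sketch assuming id is a regular column, as in the first case):

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'a': [-10, -5, 0, 5],
                   'b': [-5, 0, 5, 10]})

# first row of column 'a' and last row of column 'b', addressed by label
df.loc[df.index[0], 'a'] = -9999999
df.loc[df.index[-1], 'b'] = 9999999
```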
Let's say I have a dataframe df:
df = pd.DataFrame({'col1': [1,1,2,2,2], 'col2': ['A','B','A','B','C'], 'value': [2,4,6,8,10]})
col1 col2 value
0 1 A 2
1 1 B 4
2 2 A 6
3 2 B 8
4 2 C 10
I'm looking for a way to create any missing rows among the possible combinations of col1 and col2 with existing values, and fill the missing rows with zeros.
The desired result would be:
col1 col2 value
0 1 A 2
1 1 B 4
2 2 A 6
3 2 B 8
4 2 C 10
5 1 C 0 <- Missing the "1-C" combination, so create it w/ value = 0
I've looked into using stack and unstack to make this work, but I'm not sure that's exactly what I need.
Thanks in advance
Use pivot, then stack (pivot's arguments are keyword-only in recent pandas):
df.pivot(index='col1', columns='col2', values='value').fillna(0).stack().to_frame('values').reset_index()
Out[564]:
col1 col2 values
0 1 A 2.0
1 1 B 4.0
2 1 C 0.0
3 2 A 6.0
4 2 B 8.0
5 2 C 10.0
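The .0 values appear because fillna runs after pivot has introduced NaN, which upcasts the int column to float; casting back restores the integers. A sketch of the same chain with the cast added:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 2, 2, 2],
                   'col2': ['A', 'B', 'A', 'B', 'C'],
                   'value': [2, 4, 6, 8, 10]})

out = (df.pivot(index='col1', columns='col2', values='value')
         .fillna(0)
         .stack()
         .astype(int)               # undo the NaN-driven upcast to float
         .to_frame('values')
         .reset_index())
```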
Another way: unstack with fill_value=0, then stack and reset_index:
df.set_index(['col1','col2']).unstack(fill_value=0).stack().reset_index()
Out[311]:
col1 col2 value
0 1 A 2
1 1 B 4
2 1 C 0
3 2 A 6
4 2 B 8
5 2 C 10
You could use reindex + MultiIndex.from_product:
index = pd.MultiIndex.from_product([df.col1.unique(), df.col2.unique()])
result = df.set_index(['col1', 'col2']).reindex(index, fill_value=0).reset_index()
print(result)
Output
col1 col2 value
0 1 A 2
1 1 B 4
2 1 C 0
3 2 A 6
4 2 B 8
5 2 C 10
I have a dataframe with a column populated with groups of 1s and 0s. How can I assign each group a consecutive number beginning from 1?
I have tried a for loop across rows, but I need a column operation for fast performance.
d = {'col1': [1,1,1,0,0,1,1,0,0,0,1,1]}
df1 = pd.DataFrame(data=d)
df1
col1
0 1
1 1
2 1
3 0
4 0
5 1
6 1
7 0
8 0
9 0
10 1
11 1
I need the following output:
col1 col2
0 1 1
1 1 1
2 1 1
3 0 2
4 0 2
5 1 3
6 1 3
7 0 4
8 0 4
9 0 4
10 1 5
11 1 5
You can compare each value with its shifted neighbour for inequality, then take the cumulative sum with Series.cumsum:
df1['col2'] = df1['col1'].ne(df1['col1'].shift()).cumsum()
print (df1)
col1 col2
0 1 1
1 1 1
2 1 1
3 0 2
4 0 2
5 1 3
6 1 3
7 0 4
8 0 4
9 0 4
10 1 5
11 1 5
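The generated run labels are handy as group keys; for example (an illustration beyond the question itself), the length of each run of identical values:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1]})
df1['col2'] = df1['col1'].ne(df1['col1'].shift()).cumsum()

# group by the run id to get, e.g., the length of each run
run_lengths = df1.groupby('col2')['col1'].size()
```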
I have a Pandas data frame like so:
d = {'doc': [0, 0, 0, 1, 1], 'sent': [0, 1, 2, 0, 1],
     'col1': [5, 6, 1, 6, 5], 'col2': [4, 3, 2, 1, 1], 'col3': [8, 2, 9, 6, 5]}
df = pd.DataFrame(data=d)
Which looks like:
doc sent col1 col2 col3
0 0 0 5 4 8
1 0 1 6 3 2
2 0 2 1 2 9
3 1 0 6 1 6
4 1 1 5 1 5
I'd like to bind the previous row and the next row to each column like so (the "doc" and "sent" columns act as indices, so nothing comes before the first or after the last row of a doc, as seen below):
doc sent col1 col2 col3 p_col1 p_col2 p_col3 n_col1 n_col2 n_col3
0 0 0 5 4 8 0 0 0 6 3 2
1 0 1 6 3 2 5 4 8 1 2 9
2 0 2 1 2 9 6 3 2 6 1 6
3 1 0 6 1 6 0 0 0 5 1 5
4 1 1 5 1 5 6 1 6 0 0 0
Use pd.DataFrame.shift to get the previous/next rows, pd.concat to merge the dataframes, and fillna to set the nulls to zero.
The nulls upcast the ints to floats, since numpy integer arrays cannot hold null values; the columns are cast back to int after the nulls are replaced with 0.
cs = ['col1', 'col2', 'col3']
g = df.groupby('doc')
pd.concat([
df,
g[cs].shift(-1).add_prefix('n'),
g[cs].shift().add_prefix('p')
], axis=1).fillna(0).astype(int)
outputs:
doc sent col1 col2 col3 ncol1 ncol2 ncol3 pcol1 pcol2 pcol3
0 0 0 5 4 8 6 3 2 0 0 0
1 0 1 6 3 2 1 2 9 5 4 8
2 0 2 1 2 9 0 0 0 6 3 2
3 1 0 6 1 6 5 1 5 0 0 0
4 1 1 5 1 5 0 0 0 6 1 6