pandas DataFrame - find max between offset columns - python

Suppose I have a pandas dataframe given by
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5,2))
df
0 1
0 0.264053 -1.225456
1 0.805492 -1.072943
2 0.142433 -0.469905
3 0.758322 0.804881
4 -0.281493 0.602433
I want to return a Series object with 4 rows, containing max(df[0,0], df[1,1]), max(df[1,0], df[2,1]), max(df[2,0], df[3,1]), max(df[3,0], df[4,1]). More generally, what is the best way to compare the max of column 0 and column 1 offset by n rows?
Thanks.

You want to apply max to rows after having shifted the first column.
pd.concat([df.iloc[:, 0].shift(), df.iloc[:, 1]], axis=1).apply(max, axis=1).dropna()
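For the general case of an offset of n rows, the same idea can be wrapped in a small helper. This is only a sketch: the name `offset_max` and its parameters are made up for illustration and are not part of pandas. It uses `DataFrame.max(axis=1, skipna=False)` instead of `apply(max, axis=1)`, so rows without a shifted partner stay NaN and are then dropped.

```python
import pandas as pd


def offset_max(df, n=1, first=0, second=1):
    """Row-wise max of column `first` shifted down by n rows and column `second`.

    Hypothetical helper for illustration only, not a pandas API.
    `first` and `second` are positional column indexes.
    """
    pair = pd.concat([df.iloc[:, first].shift(n), df.iloc[:, second]], axis=1)
    # skipna=False keeps NaN for the first n rows, which have no shifted partner
    return pair.max(axis=1, skipna=False).dropna()


# Small made-up frame so the result is easy to check by hand
df = pd.DataFrame({0: [1.0, 5.0, 2.0, 8.0, 0.0],
                   1: [4.0, 0.0, 9.0, 1.0, 3.0]})
result = offset_max(df, n=1)
```

With n=1 this compares row i of column 0 with row i+1 of column 1, which is the pairing asked for in the question.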

Related

New column in dataset based on the previous value of the item

I have this dataset
In [4]: df = pd.DataFrame({'A':[1, 2, 3, 4, 5]})
In [5]: df
Out[5]:
A
0 1
1 2
2 3
3 4
4 5
I want to add a new column to the dataset based on the previous value of the item, like this:
A New Column
1
2 1
3 2
4 3
5 4
I tried to use apply with iloc, but it didn't work.
Can you help?
Thank you
With the sample data you have shown, you can use the shift function to create the new column. It moves every element of the given column down one row, leaving a NaN in the first element.
import pandas as pd
df['New_Col'] = df['A'].shift()
OR
If you would like to fill the NaN with zero, chain fillna(0); the approach is otherwise the same as above.
import pandas as pd
df['New_Col'] = df['A'].shift().fillna(0)
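Note that the NaN introduced by shift() turns the new column into floats. If an integer column is wanted, `Series.shift` also accepts a `fill_value` argument (pandas 0.24 and later). A small sketch using the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})

# shift(1) moves every value down one row; fill_value replaces the
# leading gap directly, so no NaN appears and the dtype stays integer
df['New_Col'] = df['A'].shift(1, fill_value=0)
```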

Fetch max value column with rows condition

I want to fetch the max value according to 2 columns in a pandas dataframe. I managed to do this according to 1 column but not 2.
For 1 column:
import numpy as np
import pandas as pd
df = pd.DataFrame({"name": list("ABABCD"), "value": np.arange(6)})
maxes = df.groupby(["name"]).agg("max")
df["maxvalue"]=df["name"].apply(lambda x: maxes.loc[x])
>>> df
name value maxvalue
0 A 0 2
1 B 1 3
2 A 2 2
3 B 3 3
4 C 4 4
5 D 5 5
For 2 columns, I've tried this but it doesn't work:
import numpy as np
import pandas as pd
df = pd.DataFrame({"name": list("ABABCD"),"name2": list("MNOMNO"), "value": np.arange(6)})
maxes = df.groupby(["name","name2"]).agg("max")
df["maxvalue"]=df[["name","name2"]].apply(lambda x: maxes.loc[x])
How can this be done for multiple columns?
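Use transform instead of agg. It returns a result aligned with the original rows, so it works the same way for one or two grouping columns. Selecting the value column first makes it return a Series that can be assigned directly. For two columns:
df["maxvalue"] = df.groupby(["name", "name2"])["value"].transform("max")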
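In the question's sample every (name, name2) pair is unique, so the group max simply equals the value itself. To see transform doing some work, here is a sketch with made-up data that contains repeated pairs (the values are invented for illustration):

```python
import pandas as pd

# Made-up data with repeated (name, name2) pairs
df = pd.DataFrame({"name": list("AABBA"),
                   "name2": list("MMNNO"),
                   "value": [3, 7, 1, 4, 5]})

# transform("max") broadcasts each group's max back onto that group's rows,
# so the result has the same length and index as df
df["maxvalue"] = df.groupby(["name", "name2"])["value"].transform("max")
```

Rows in the (A, M) group both get 7, rows in the (B, N) group both get 4, and the single (A, O) row keeps 5.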

Pandas sort all rows

Is there a way to sort each row of a pandas data frame?
I don't care about columns names or row indexes, I just want a table with the values of each row sorted from highest to lowest.
You can use np.sort with axis=1 on the numpy data:
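import numpy as np
import pandas as pd
# sample data
np.random.seed(1)
df = pd.DataFrame(np.random.randint(1,10, (2,4)))
# sort each row ascending, then reverse the column order for descending
pd.DataFrame(np.sort(df.values, axis=1)[:,::-1],
             index=df.index,
             columns=df.columns)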
Output:
0 1 2 3
0 9 6 6 1
1 8 7 2 1
If you want to override your original dataframe:
df[:] = np.sort(df.values, axis=1)[:,::-1]
Update
np.sort(df)[:,::-1] works as well: df is converted to a numpy array, and axis=-1 (the last axis) is the default.

How to multiply across axis 1 in pandas dataframe?

I have a dataframe of numbers and would like to multiply each observation row wise or along axis = 1 and output the answer in another column. As an example:
import pandas as pd
import numpy as np
arr = np.array([2, 3, 4])
df = pd.DataFrame(arr).transpose()
df
What I would like is a column that has value 24 from multiplying column 0 by column 1 by column 2.
I tried the df.mul(axis = 1) but that didn't work.
I'm sure this is easy but all I find is multiplying each column by a constant.
This is what prod does:
df.prod(axis=1)
Out[69]:
0 24
dtype: int32
Try something like this:
import numpy
def multiplyFunction(row):
    return numpy.prod(row)
df['result'] = df.apply(multiplyFunction, axis=1)
df.head()
Result
0 1 2 result
0 2 3 4 24
Let me know if this helps.
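The two answers can also be combined: the row-wise product from prod can be assigned straight to a new column, without apply. A minimal sketch using the question's sample frame (the column name result follows the second answer):

```python
import numpy as np
import pandas as pd

arr = np.array([2, 3, 4])
df = pd.DataFrame(arr).transpose()

# The right-hand side is evaluated before the column is added,
# so the new column is not included in its own product
df['result'] = df.prod(axis=1)
```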

Finding the maximum of a row in a dataframe and returning its column name in pandas

I have a data set of football players. I need to find the maximum of Penalties or Volleys for each player and add columns at the end that give the maximum value and whether it came from Penalties or Volleys. I tried the following code:
import pandas as pd
import numpy as np
df=pd.read_excel(r'C:\Users\shrey\Desktop\FullData.xlsx')
for j,i in df.iterrows():
    data=i[['Penalties','Volleys']]
    i['max']=np.max(data)
    i['max_attr']=i.idxmax()
But this gives me an error - reduction operation 'argmax' not allowed for this dtype
How should I go about with it?
You don't need to iterate rows here. Instead, you can use pd.DataFrame.max and pd.DataFrame.idxmax to perform vectorised calculations:
cols = ['Penalties', 'Volleys']
df['max'] = df[cols].max(1)
df['max_attr'] = df[cols].idxmax(1)
Here's a demo:
df = pd.DataFrame([[2, 3], [5, 1]], columns=['Penalties', 'Volleys'])
cols = ['Penalties', 'Volleys']
df['max'] = df[cols].max(1)
df['max_attr'] = df[cols].idxmax(1)
print(df)
Penalties Volleys max max_attr
0 2 3 3 Volleys
1 5 1 5 Penalties
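As for the error itself: it most likely comes from calling idxmax on the whole row, which in a real data set probably mixes text and numbers and is therefore object dtype. That is an assumption, since the spreadsheet is not shown. The sketch below uses an invented Name column to show that restricting the calculation to the two numeric columns sidesteps the problem:

```python
import pandas as pd

# Hypothetical sample; the 'Name' column and all values are invented
df = pd.DataFrame({'Name': ['Player A', 'Player B'],
                   'Penalties': [2, 5],
                   'Volleys': [3, 1]})

cols = ['Penalties', 'Volleys']
# Only the numeric columns are passed to max/idxmax,
# so the string column never takes part in the comparison
df['max'] = df[cols].max(axis=1)
df['max_attr'] = df[cols].idxmax(axis=1)
```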
