Pandas - find second largest value in each row [duplicate] - python

This question already has answers here:
How do I obtain the second highest value in a row?
(3 answers)
Closed 10 months ago.
Good morning! I have a three-column DataFrame and need to find the second largest value in each row:
DATA=pd.DataFrame({"A":[10,11,4,5],"B":[23,8,3,4],"C":[12,7,11,9]})
A B C
0 10 23 12
1 11 8 7
2 4 3 11
3 5 4 9
I tried using nlargest, but it seems to be column-based, and I can't find a pandas solution for this problem. Thank you in advance!

import pandas as pd
df=pd.DataFrame({"A":[10,11,4,5],"B":[23,8,3,4],"C":[12,7,11,9]})
# find the second largest value for each row
df['largest2'] = df.apply(lambda x: x.nlargest(2).iloc[1], axis=1)
print(df.head())
result:
A B C largest2
0 10 23 12 12
1 11 8 7 8
2 4 3 11 4
3 5 4 9 5
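A vectorized alternative, sketched here with NumPy rather than a row-by-row apply: sorting every row once avoids calling nlargest for each row, which may help on larger frames. This assumes all three columns are numeric.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 11, 4, 5], "B": [23, 8, 3, 4], "C": [12, 7, 11, 9]})

# np.sort with axis=1 sorts each row independently (ascending), so the
# second-to-last position holds that row's second largest value.
# Ties are kept as separate entries, which matches nlargest(2).iloc[1].
values = df[["A", "B", "C"]].to_numpy()
df["largest2"] = np.sort(values, axis=1)[:, -2]
print(df)
```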

In A Python List
mylist = [1, 2, 8, 3, 12]
print(sorted(mylist, reverse=True)[1])
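For a plain list, a sketch using the standard library's heapq.nlargest instead of sorting the whole list. Whether duplicates count as separate values is an assumption you need to decide; both variants are shown, and the `dupes` list is made-up sample data.

```python
import heapq

mylist = [1, 2, 8, 3, 12]

# heapq.nlargest(2, ...) returns the two largest items in descending order,
# so the last of them is the second largest.
second = heapq.nlargest(2, mylist)[-1]
print(second)

# With duplicates, decide whether repeats count. Deduplicating with set()
# gives the second largest *distinct* value.
dupes = [1, 12, 8, 12]
second_with_repeats = heapq.nlargest(2, dupes)[-1]   # repeats count
second_distinct = heapq.nlargest(2, set(dupes))[-1]  # repeats ignored
print(second_with_repeats, second_distinct)
```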
In A Python Pandas List
import pandas as pd
df=pd.DataFrame({"A":[10,11,4,5],"B":[23,8,3,4],"C":[12,7,11,9]})
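# sorted() puts the two largest values in ascending order, so [0] is the
# second largest value in each column (note: per column, not per row)
print(sorted(df['A'].nlargest(2))[0])
print(sorted(df['B'].nlargest(2))[0])
print(sorted(df['C'].nlargest(2))[0])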
In A Python Pandas List mk.2
import pandas as pd
df=pd.DataFrame({"A":[10,11,4,5],"B":[23,8,3,4],"C":[12,7,11,9]})
num_of_rows = len(df.index)
second_highest = num_of_rows - 2
print(sorted(df['A'].nlargest(num_of_rows))[second_highest])
print(sorted(df['B'].nlargest(num_of_rows))[second_highest])
print(sorted(df['C'].nlargest(num_of_rows))[second_highest])
In A Python Pandas List mk.3
import pandas as pd
df=pd.DataFrame({"A":[10,11,4,5],"B":[23,8,3,4],"C":[12,7,11,9]})
col_names = df.columns
num_of_rows = len(df.index)
second_highest = num_of_rows - 2
for col_name in col_names:
    print(sorted(df[col_name].nlargest(num_of_rows))[second_highest])
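The list-based versions in this answer compute the second largest value per column rather than per row. A loop-free sketch of the same per-column result, assuming the same df:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 11, 4, 5], "B": [23, 8, 3, 4], "C": [12, 7, 11, 9]})

# DataFrame.apply defaults to axis=0, so the lambda receives one column at a time.
# nlargest(2) returns the two largest values in descending order;
# iloc[-1] picks the second of them.
second_per_column = df.apply(lambda col: col.nlargest(2).iloc[-1])
print(second_per_column)
```

Switching to `axis=1` in the same call would give the per-row answer the question asks for.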
In A Python Pandas List mk.4
import pandas as pd
df=pd.DataFrame({"A":[10,11,4,5],"B":[23,8,3,4],"C":[12,7,11,9]})
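top_n = len(df.columns)
# For each row, list the column names ordered from largest to smallest value;
# column 1 of the result names the column holding the second largest value
pd.DataFrame({n: df.T[col].nlargest(top_n).index.tolist()
              for n, col in enumerate(df.T)}).T
# Alternatively, keep only the two largest values in each row (others become NaN)
df.apply(pd.Series.nlargest, axis=1, n=2)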
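One way to turn that last nlargest output into the requested value (a sketch, not part of the original answer): once only the two largest values per row are kept, the smaller of them is the row's second largest.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 11, 4, 5], "B": [23, 8, 3, 4], "C": [12, 7, 11, 9]})

# Keep only the two largest values in each row; the remaining cells become NaN.
top2 = df.apply(pd.Series.nlargest, axis=1, n=2)

# min() skips NaN by default, so it returns the smaller of the two kept values.
# The result is float because the intermediate frame contains NaN.
second_largest = top2.min(axis=1)
print(second_largest)
```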

Related

How to rotate two-dimensional array values using NumPy in Python

Hi all, I need to rotate a two-dimensional array as shown in the given picture. If we rotate one set of the array, the change should be reflected in all of them. If you find a solution, please help me solve the issue.
input:
output:
Thank you.
I have tried slicing to rotate the values, but it doesn't give the correct values:
import pandas as pd
df = pd.read_csv("/content/pipe2.csv")
df1= df.iloc[6:10]+df.iloc[13:20]
df1
You can use numpy.roll and the DataFrame constructor:
import numpy as np
import pandas as pd
N = -2  # a negative shift rotates the columns to the left
out = pd.DataFrame(np.roll(df, N, axis=1),
                   columns=df.columns, index=df.index)
Example output:
0 1 2 3 4 5 6
0 3 4 5 6 7 1 2
Used input:
0 1 2 3 4 5 6
0 1 2 3 4 5 6 7
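Since the question mentions trying slicing, here is a sketch showing that the same left rotation can be written with positional slicing and pd.concat. The one-row frame mirrors the "Used input" example; `k` is just a helper name introduced here for the equivalent positive shift.

```python
import numpy as np
import pandas as pd

# One-row input with column labels 0..6, like the "Used input" example.
df = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7]])
N = -2  # negative shift rotates to the left, as in the answer

rolled = pd.DataFrame(np.roll(df, N, axis=1), columns=df.columns, index=df.index)

# Slicing equivalent: a left rotation by k moves columns k.. to the front,
# followed by columns 0..k-1. iloc slices by position, not by label.
k = -N % df.shape[1]
sliced = pd.concat([df.iloc[:, k:], df.iloc[:, :k]], axis=1)
sliced.columns = df.columns  # restore the original column labels

print(rolled)
print(np.array_equal(sliced.to_numpy(), rolled.to_numpy()))
```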
Use this:
import pandas as pd
df = pd.read_csv("/content/pipe2.csv")
df1=pd.DataFrame(data=df)
df1_transposed = df1.transpose()
df1_transposed

Adding a new column where some values are manipulated [duplicate]

This question already has answers here:
Pandas conditional creation of a series/dataframe column
(13 answers)
Closed 3 years ago.
I have a dataframe where, say, the first column is filled with dates and the second column is filled with ages. I want to add a third column that looks at the AGE column and multiplies the value by 2 if the value in that row is < 20; otherwise it should just keep the age from that row. The lambda function below multiplies every age by 2.
def fun(df):
    change = df.loc[:, "AGE"].apply(lambda x: x * 2 if x < 20 else x)
    df.insert(2, "NEW_AGE", change)
    return df
Use pandas.Series.where:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(15, 25), columns=['AGE'])
df['AGE'].where(df['AGE'] >= 20, df['AGE'] * 2)
Output:
0 30
1 32
2 34
3 36
4 38
5 20
6 21
7 22
8 23
9 24
Name: AGE, dtype: int64
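Applied to the question's actual goal (a third column inserted next to AGE), a sketch might look like this. The DATE values and ages are made-up sample data, and the np.where line is an equivalent alternative rather than something the answer above uses.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a date column and an AGE column.
df = pd.DataFrame({
    "DATE": pd.date_range("2020-01-01", periods=4),
    "AGE": [15, 25, 19, 20],
})

# Series.where keeps values where the condition is True and substitutes
# the second argument elsewhere, so ages under 20 are doubled.
df.insert(2, "NEW_AGE", df["AGE"].where(df["AGE"] >= 20, df["AGE"] * 2))

# numpy.where states the same rule with the condition the other way round.
df["NEW_AGE_NP"] = np.where(df["AGE"] < 20, df["AGE"] * 2, df["AGE"])
print(df)
```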

How do I stop a for loop that sums values in a column from returning multiple identical values? [duplicate]

This question already has answers here:
How do I Pandas group-by to get sum?
(11 answers)
Closed 4 years ago.
Suppose I have the following data frame:
import pandas as pd
df = pd.DataFrame()
df['ID'] = 1, 1, 1, 2, 2, 3, 3
df['a'] = 3, 5, 6, 3, 8, 1, 2
I want to create a for loop that loops over ID and returns the sum of 'a' for that ID. So far I have this:
for i in df['ID']:
    print(i, df.loc[df['ID'] == i, 'a'].sum())
However this returns multiples of the same value like so:
1 14
1 14
1 14
2 11
2 11
3 3
3 3
How do I edit my loop so that once it has returned the value for 'ID' == 1 it moves on to the next ID value rather than just down to the next row?
I'm looking to get the following:
1 14
2 11
3 3
Thanks in advance!
This is much better suited to groupby rather than looping (as are many pandas dataframe problems):
>>> df.groupby('ID')['a'].sum()
ID
1 14
2 11
3 3
Name: a, dtype: int64
However, to explain where your loop went wrong: it visits every row, so each ID is repeated once per row. Loop through the unique values of df['ID'] instead:
for i in df['ID'].unique():
    print(i, df.loc[df['ID'] == i, 'a'].sum())
1 14
2 11
3 3
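If you do want an explicit loop, a sketch that iterates over the groups themselves, so each ID is visited exactly once and the filtering is done by groupby:

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 1, 2, 2, 3, 3], "a": [3, 5, 6, 3, 8, 1, 2]})

# Iterating over a GroupBy object yields (key, sub-DataFrame) pairs,
# one per distinct ID, so each sum is computed and printed once.
sums = {}
for group_id, group in df.groupby("ID"):
    sums[group_id] = group["a"].sum()
    print(group_id, sums[group_id])
```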

Resample Pandas Dataframe based on defined value

I'm trying to redistribute the total of the 'Num' column into rows capped at a threshold of 10 (with any remainder in the last row) and reindex the dataframe based on this aggregation.
import pandas as pd
import numpy
df = pd.DataFrame({'Num':[2,12,4,25,5]})
----------------------------------------
Num
0 2
1 12
2 4
3 25
4 5
How can I re-index the Pandas Dataframe so it looks like this?
Num
0 10
1 10
2 10
3 10
4 8
Thanks!
Seems like you need
df = pd.DataFrame({'Num':[2,12,4,25,5]})
s=df.Num.sum()
df.iloc[:s//10,0]=10
df.iloc[-1,0]=10 if s%10==0 else s%10
df
Out[369]:
Num
0 10
1 10
2 10
3 10
4 8
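The answer above writes into the existing rows, so it assumes the number of chunks of 10 (plus the remainder) equals the original row count, which happens to hold here (48 gives four 10s and an 8). A sketch of a hypothetical helper, `redistribute`, that builds a new DataFrame instead so the row count can differ:

```python
import pandas as pd


def redistribute(df, column, cap=10):
    """Hypothetical helper: split a column's total into rows of `cap`,
    putting any remainder in a final row, and return a new DataFrame."""
    full_rows, remainder = divmod(int(df[column].sum()), cap)
    values = [cap] * full_rows + ([remainder] if remainder else [])
    return pd.DataFrame({column: values})


df = pd.DataFrame({"Num": [2, 12, 4, 25, 5]})
print(redistribute(df, "Num"))
```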

Convert values in a column to a single row Python

I have a pandas dataframe that looks like this:
Area1 Area2
1 2
1 4
1 5
1 9
2 8
2 16
2 4
2 1
3 8
3 9
How can I convert the 'Area2' column so that it becomes a list of values for each 'Area1' value?
So the output I would want is:
Area1 Area2
1 2, 4, 5, 9
2 8, 16, 4, 1
3 8, 9
I have done this in R previously:
df %>% group_by(Area1) %>% summarise(Area2= toString(sort(unique(Area2))))
I have been trying out groupby() and agg() but have had no success.
Could someone explain what I can use once I have grouped the data using df.groupby('Area1')
Many thanks in advance for any suggestions.
You can group by 'Area1' and apply list:
import pandas as pd
df=pd.read_csv("test.csv")
df.groupby('Area1')['Area2'].apply(list)
The R snippet does string concatenation.
The following line keeps the original type of Area2.
import pandas as pd
df.groupby('Area1').Area2.apply(pd.Series.tolist).reset_index()
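To mirror the R snippet more closely (unique values, sorted, joined into one string), a sketch along these lines should work. Sorting changes the order compared with the output requested in the question, so drop sorted() if you want to keep the original order of appearance. The frame is rebuilt from the table in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Area1": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "Area2": [2, 4, 5, 9, 8, 16, 4, 1, 8, 9],
})

# Equivalent of toString(sort(unique(Area2))): unique values, sorted
# numerically, then joined with ", " into one string per Area1 group.
out = (
    df.groupby("Area1")["Area2"]
      .apply(lambda s: ", ".join(map(str, sorted(s.unique()))))
      .reset_index()
)
print(out)
```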
