Selecting max within partition for pandas dataframe [duplicate] - python

This question already has answers here:
Python pandas - filter rows after groupby
(4 answers)
Closed 8 years ago.
I have a pandas dataframe. My goal is to select only those rows where column C has the largest value within group B. For example, when B is "one" the maximum value of C is 311, so I would like the row where C = 311 and B = "one."
import pandas as pd
import numpy as np
df2 = pd.DataFrame({'A': pd.Categorical(["test1", "test2", "test3", "test4"]),
                    'B': pd.Categorical(["one", "one", "two", "two"]),
                    'C': np.array([311, 42, 31, 41]),
                    'D': np.array([9, 8, 7, 6])})
df2.groupby('C').max()
This attempt groups by the value column itself rather than by B. The output should be:
test1 one 311 9
test4 two 41 6

You can use idxmax(), which returns the indices of the max values:
maxes = df2.groupby('B')['C'].idxmax()
df2.loc[maxes]
Output:
       A    B    C  D
0  test1  one  311  9
3  test4  two   41  6
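If you prefer to avoid the index lookup, an equivalent approach is to sort by C descending and keep the first row seen for each value of B. A sketch using the same sample data:

```python
import pandas as pd

df2 = pd.DataFrame({'A': ["test1", "test2", "test3", "test4"],
                    'B': ["one", "one", "two", "two"],
                    'C': [311, 42, 31, 41],
                    'D': [9, 8, 7, 6]})

# Sort so the largest C per group comes first, keep the first row per B,
# then restore the original row order.
out = df2.sort_values('C', ascending=False).drop_duplicates('B').sort_index()
print(out)
```

This returns the same two rows (test1/one/311 and test4/two/41) as the idxmax approach.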

Related

How to change pandas table data arrangement? [duplicate]

This question already has answers here:
How do I melt a pandas dataframe?
(3 answers)
Closed 7 days ago.
I would like to change the arrangement of this table:
import pandas as pd
original_dict = {
    "group A": [10, 9, 11],
    "group B": [23, 42, 56]
}
original_df = pd.DataFrame(original_dict)
original_df
Here is the desired output:
  Group Type  Value
0    group A     10
1    group A      9
2    group A     11
3    group B     23
4    group B     42
5    group B     56
Thank you!
You can use the pandas melt function:
https://pandas.pydata.org/docs/reference/api/pandas.melt.html
df = pd.melt(original_df)
df.columns=['Group Type', 'Value']
df
  Group Type  Value
0    group A     10
1    group A      9
2    group A     11
3    group B     23
4    group B     42
5    group B     56
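Note that melt can name the output columns in the same call via its var_name and value_name parameters, which avoids the separate rename step:

```python
import pandas as pd

original_df = pd.DataFrame({"group A": [10, 9, 11],
                            "group B": [23, 42, 56]})

# var_name / value_name set the column labels directly.
df = pd.melt(original_df, var_name='Group Type', value_name='Value')
print(df)
```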

Copy Row(s) from One DataFrame to Another with Regex [duplicate]

This question already has answers here:
How to test if a string contains one of the substrings in a list, in pandas?
(4 answers)
Closed 5 months ago.
I am trying to extract specific rows from a dataframe where values in a column contain a designated string. For example, the current dataframe looks like:
df1=
Location Value Name Type
Up 10 Test A X
Up 12 Test B Y
Down 11 Prod 1 Y
Left 8 Test C Y
Down 15 Prod 2 Y
Right 30 Prod 3 X
And I am trying to build a new dataframe with all rows that have "Test" in the 'Name' column.
df2=
Location Value Name Type
Up 10 Test A X
Up 12 Test B Y
Left 8 Test C Y
Is there a way to do this with regex or match?
Try:
df_out = df[df["Name"].str.contains("Test")]
print(df_out)
Prints:
Location Value Name Type
0 Up 10 Test A X
1 Up 12 Test B Y
3 Left 8 Test C Y
Alternatively, with a list comprehension: df2 = df1.loc[['Test' in name for name in df1.Name]]
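Since str.contains interprets its argument as a regular expression, the same idea extends to matching any of several substrings at once. A sketch (the terms list here is purely illustrative):

```python
import re
import pandas as pd

df1 = pd.DataFrame({'Location': ['Up', 'Up', 'Down', 'Left', 'Down', 'Right'],
                    'Value': [10, 12, 11, 8, 15, 30],
                    'Name': ['Test A', 'Test B', 'Prod 1', 'Test C', 'Prod 2', 'Prod 3'],
                    'Type': ['X', 'Y', 'Y', 'Y', 'Y', 'X']})

# Join the substrings with '|' to form an alternation pattern;
# re.escape guards against regex metacharacters in the search terms.
terms = ['Test', 'Prod 1']  # hypothetical list of substrings
pattern = '|'.join(map(re.escape, terms))
df2 = df1[df1['Name'].str.contains(pattern)]
print(df2)
```

For a literal (non-regex) match of a single string, passing regex=False to str.contains also works.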

How to implement the excel function IF(H3>I3,C2,0) in pandas

In column J, I would like to compute the value given by the Excel formula IF(H3>I3,C2,0): if H is greater than I in a row, take the value of C from the row above, otherwise 0. The occurrence count runs from the bottom up, so the first occurrence is the latest one and the next is the second occurrence.
Here is the solution:
import pandas as pd
import numpy as np

# Suppose we have this DataFrame:
df = pd.DataFrame({'A': [55, 23, 11, 100, 9], 'B': [12, 72, 35, 4, 100]})

# We want a 'Result' column holding the value of 'A' wherever it is
# greater than or equal to 'B', and 0 otherwise -- the same logic as
# the Excel IF:
df['Result'] = np.where(df['A'] >= df['B'], df['A'], 0)
then if you try to print DataFrame:
df
result:
     A    B  Result
0   55   12      55
1   23   72       0
2   11   35       0
3  100    4     100
4    9  100       0
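If the Excel formula had more than two branches (nested IFs), np.select generalises np.where to several conditions. A sketch on the same data; the second branch here is purely illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [55, 23, 11, 100, 9], 'B': [12, 72, 35, 4, 100]})

# Conditions are checked in order; the first that matches wins,
# and `default` covers the rest -- like nested IFs in Excel.
conditions = [df['A'] >= df['B'], df['A'] > 0]
choices = [df['A'], df['B'] // 2]
df['Result'] = np.select(conditions, choices, default=0)
print(df)
```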

reshape data to split one column into multiple columns based on delimiter in pandas or otherwise in python [duplicate]

This question already has answers here:
How to unnest (explode) a column in a pandas DataFrame, into multiple rows
(16 answers)
Closed 3 years ago.
I have the following dataframe
df_in = pd.DataFrame({
    'State': ['C', 'B', 'D', 'A', 'C', 'B'],
    'Contact': ['alpha a. theta| beta', 'beta| alpha a. theta| delta', 'Theta',
                'gamma| delta', 'alpha|Eta| gamma| delta', 'beta'],
    'Timestamp': [911583000000, 912020000000, 912449000000,
                  912742000000, 913863000000, 915644000000]})
How do I transform it so that the second column which has pipe separated data is broken out into different rows as follows:
df_out = pd.DataFrame({
    'State': ['C', 'C', 'B', 'B', 'B', 'D', 'A', 'A', 'C', 'C', 'C', 'C', 'B'],
    'Contact': ['alpha a. theta', 'beta', 'beta', 'alpha a. theta', 'delta',
                'Theta', 'gamma', 'delta', 'alpha', 'Eta', 'gamma', 'delta', 'beta'],
    'Timestamp': [911583000000, 911583000000, 912020000000, 912020000000,
                  912020000000, 912449000000, 912742000000, 912742000000,
                  913863000000, 913863000000, 913863000000, 913863000000,
                  915644000000]})
print(df_in)
print(df_out)
I can use pd.melt but for that I already need to have the 'Contact' column broken out into multiple columns and not have all the contacts in one column separated by a delimiter.
You could split the column, then merge the result back on the index:
df_in.Contact.str.split('|', expand=True).stack().reset_index()\
    .merge(df_in.reset_index(), left_on='level_0', right_on='index')\
    .drop(['level_0', 'level_1', 'index', 'Contact'], axis=1)
Out:
0 State Timestamp
0 alpha a. theta C 911583000000
1 beta C 911583000000
2 beta B 912020000000
3 alpha a. theta B 912020000000
4 delta B 912020000000
5 Theta D 912449000000
6 gamma A 912742000000
7 delta A 912742000000
8 alpha C 913863000000
9 Eta C 913863000000
10 gamma C 913863000000
11 delta C 913863000000
12 beta B 915644000000
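On pandas 0.25 or later, DataFrame.explode does the same job more directly. A sketch that also strips the stray whitespace around the split values:

```python
import pandas as pd

df_in = pd.DataFrame({
    'State': ['C', 'B', 'D', 'A', 'C', 'B'],
    'Contact': ['alpha a. theta| beta', 'beta| alpha a. theta| delta', 'Theta',
                'gamma| delta', 'alpha|Eta| gamma| delta', 'beta'],
    'Timestamp': [911583000000, 912020000000, 912449000000,
                  912742000000, 913863000000, 915644000000]})

# Split on '|' into lists, explode each list into its own row,
# then trim the whitespace left around each contact name.
out = (df_in.assign(Contact=df_in['Contact'].str.split('|'))
            .explode('Contact'))
out['Contact'] = out['Contact'].str.strip()
out = out.reset_index(drop=True)
print(out)
```

This also keeps the 'Contact' column name, instead of the stacked column ending up labelled 0.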

Splitting multiple columns on a delimiter into rows in pandas dataframe [duplicate]

This question already has answers here:
pandas: records with lists to separate rows
(3 answers)
Closed 4 years ago.
I have a pandas dataframe as shown here:
id pos value sent
1 a/b/c test/test2/test3 21
2 d/a test/test5 21
I would like to split (i.e. explode) df['pos'] and df['value'] so that the dataframe looks like this:
id pos value sent
1 a test 21
1 b test2 21
1 c test3 21
2 d test 21
2 a test5 21
It doesn't work if I split each column and then concat them à la
pos = df['pos'].str.split('/', expand=True).stack().str.strip().reset_index(level=1, drop=True)
value = df['value'].str.split('/', expand=True).stack().str.strip().reset_index(level=1, drop=True)
df1 = pd.concat([pos, value], axis=1, keys=['pos', 'value'])
Any ideas? I'd really appreciate it.
EDIT:
I tried using this solution here : https://stackoverflow.com/a/40449726/4219498
But I get the following error:
TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'
I suppose this is a numpy related issue although I'm not sure how this happens. I'm using Python 2.7.14
I tend to avoid the stack magic in favour of building a new dataframe from scratch. This is usually also more efficient. Below is one way.
import numpy as np
from itertools import chain

# Number of parts each row splits into, used to repeat the scalar columns.
lens = list(map(len, df['pos'].str.split('/')))
res = pd.DataFrame({'id': np.repeat(df['id'], lens),
                    'pos': list(chain.from_iterable(df['pos'].str.split('/'))),
                    'value': list(chain.from_iterable(df['value'].str.split('/'))),
                    'sent': np.repeat(df['sent'], lens)})
print(res)
id pos sent value
0 1 a 21 test
0 1 b 21 test2
0 1 c 21 test3
1 2 d 21 test
1 2 a 21 test5
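On newer pandas (1.3 or later), DataFrame.explode accepts a list of columns, which handles this multi-column case directly as long as the lists in each row have matching lengths:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2],
                   'pos': ['a/b/c', 'd/a'],
                   'value': ['test/test2/test3', 'test/test5'],
                   'sent': [21, 21]})

# Split both columns into lists, then explode them together so the
# i-th part of 'pos' stays paired with the i-th part of 'value'.
out = (df.assign(pos=df['pos'].str.split('/'),
                 value=df['value'].str.split('/'))
         .explode(['pos', 'value'])
         .reset_index(drop=True))
print(out)
```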
