I have a pandas data frame with racing results.
Place BibNum Time
0 1 2 5:50
1 2 4 8:09
2 3 7 10:27
3 4 3 11:12
4 5 1 12:13
...
34 1 5 2:03
35 2 9 4:35
36 3 7 5:36
What I would like to know is how can I get a count of how many times the BibNum showed up where the Place was 1, 2, 3 etc?
I know that I can do a "value_counts" but that is for how many times it shows up in a single column. I also looked into using numpy "where" but that is using a conditional like greater than or less than.
IIUC, this is what you need:
out = df.groupby(['Place','BibNum']).size()
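For illustration, here is a minimal sketch on a small made-up frame (the values are hypothetical, not the asker's data). It shows the groupby/size count from above and, as an optional extra step that is not part of the original answer, uses unstack to lay the counts out as a Place-by-BibNum table.

```python
import pandas as pd

# Hypothetical results from three races (not the asker's data)
df = pd.DataFrame({
    'Place':  [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'BibNum': [2, 4, 7, 5, 9, 7, 2, 7, 4],
})

# One entry per (Place, BibNum) pair, holding the number of occurrences
out = df.groupby(['Place', 'BibNum']).size()

# Same counts as a table: rows are Place, columns are BibNum,
# with 0 where a pair never occurs
table = out.unstack(fill_value=0)
print(table)
```

A single count can then be looked up with something like out.loc[(1, 2)] or table.loc[1, 2].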
I have a pandas dataframe that I need to sort (ascending) by two columns, with the output being a "middle ground" between the two columns.
An example is shown below. When I use sort_values, it sorts by the first column and considers the second one only for duplicate values. I, however, need to get the row that has the combination of the lowest values for both columns (which is the 3rd one in the output below).
test = pd.DataFrame({'file':[1,2,3,4,5,6], 'rmse':[66,41,43,39,40,42], 'var':[44,177,201,321,349,379]})
test.sort_values(by=['rmse', 'var'], ascending=[True, True])
Output :
file rmse var
3 4 39 321 <--- First row given by `sort_values`
4 5 40 349
1 2 41 177 <--- Row that I need
5 6 42 379
2 3 43 201
0 1 66 44
I'm not sure how to phrase my question properly in English so please tell me if I need to make my question more clear.
IIUC, let's use rank, mean, and argsort:
test.iloc[test[['var', 'rmse']].rank().mean(axis=1).argsort()]
Output:
file rmse var
1 2 41 177
3 4 39 321
0 1 66 44
4 5 40 349
2 3 43 201
5 6 42 379
Details: rank the values in each column, then average the ranks for each row, and sort the mean ranks to determine the row order.
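The steps can also be spelled out one at a time, which makes the intermediate ranks visible. This is only a sketch of the same idea: the names ranks, mean_rank and ordered are introduced here for illustration, and a stable sort_values call is swapped in for argsort so that rows with equal mean rank keep their original order.

```python
import pandas as pd

test = pd.DataFrame({'file': [1, 2, 3, 4, 5, 6],
                     'rmse': [66, 41, 43, 39, 40, 42],
                     'var': [44, 177, 201, 321, 349, 379]})

# Rank each column separately (1 = smallest value in that column)
ranks = test[['rmse', 'var']].rank()

# Average the two ranks per row; a low mean rank means "low in both columns"
mean_rank = ranks.mean(axis=1)

# Stable sort so that rows with the same mean rank keep their original order
ordered = test.loc[mean_rank.sort_values(kind='mergesort').index]
print(ordered.assign(mean_rank=mean_rank))
```

Note that some rows share the same mean rank here, so their relative order depends only on how ties are broken.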
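I've tried all the options of df.sort_values, but instead of that you can try a for loop like this. Note that it sorts each column independently, so the values in a row no longer belong to the same file: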
import pandas as pd
test = pd.DataFrame({'file':[1,2,3,4,5,6], 'rmse':[66,41,43,39,40,42], 'var':[44,177,201,321,349,379]})
# Sort every column on its own
for i in test:
    test[i] = sorted(test[i])
print(test)
Output :
file rmse var
0 1 39 44
1 2 40 177
2 3 41 201
3 4 42 321
4 5 43 349
5 6 66 379
I have two dataframes like
df1
sub_id Weight
1 56
2 67
3 81
5 73
9 59
df2
sub_id Text
1 He is normal.
1 person is healthy.
1 has strong immune power.
3 She is over weight.
3 person is small.
9 Looks good.
5 Not well.
5 Need to be tested.
By combining these two data frames, I need to get the result below.
(When a sub_id appears multiple times in the second df, I need to pick the first text and combine it with the first df.)
merge_df
sub_id Weight Text
1 56 He is normal.
2 67 NaN
3 81 She is over weight.
5 73 Not well.
9 59 Looks good.
Can anyone help me out?
Thanks in advance.
Here you go:
print(pd.merge(df1, df2.drop_duplicates(subset='sub_id'),
on='sub_id',
how='outer'))
Output
sub_id Weight Text
0 1 56 He is normal.
1 2 67 NaN
2 3 81 She is over weight.
3 5 73 Not well.
4 9 59 Looks good.
To keep the last duplicate instead, pass the parameter keep='last':
print(pd.merge(df1, df2.drop_duplicates(subset='sub_id', keep='last'),
on='sub_id',
how='outer'))
Output
sub_id Weight Text
0 1 56 has strong immune power.
1 2 67 NaN
2 3 81 person is small.
3 5 73 Need to be tested.
4 9 59 Looks good.
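For anyone who wants to run this end to end, here is a self-contained sketch with df1 and df2 rebuilt from the question. One deliberate difference from the answer above: how='left' is used instead of how='outer', on the assumption that only the sub_ids present in df1 should appear in the result. With this data both give the same rows.

```python
import pandas as pd

df1 = pd.DataFrame({'sub_id': [1, 2, 3, 5, 9],
                    'Weight': [56, 67, 81, 73, 59]})
df2 = pd.DataFrame({'sub_id': [1, 1, 1, 3, 3, 9, 5, 5],
                    'Text': ['He is normal.', 'person is healthy.',
                             'has strong immune power.', 'She is over weight.',
                             'person is small.', 'Looks good.',
                             'Not well.', 'Need to be tested.']})

# Keep only the first Text for each sub_id
first_text = df2.drop_duplicates(subset='sub_id')

# Left merge: every row of df1 is kept, and sub_ids without a Text get NaN
merged = df1.merge(first_text, on='sub_id', how='left')
print(merged)
```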
I am new to Pandas.
My dataset:
df
A B
10 1
15 2
65 3
54 2
51 2
96 1
I am trying to add a new column C that holds the median of the values in A that belong to the same group, as defined by column B.
Expected result:
df
A B C
10 1 53
15 2 51
65 3 65
54 2 51
51 2 51
96 1 53
What I've tried:
df_final['C'] = df_final.groupby('B')['A'].transform('median')
I do get an answer, but because the DataFrame is big, I am unsure whether my code performs correctly. Could someone tell me if this is the right way to achieve this?
You can use:
df_final['C'] = df_final.groupby('B')['A'].transform('median')
As provided in the comments.
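If you want some reassurance on a large frame, one possible sanity check is to compute the per-group medians separately and map them back onto the rows. The sketch below does this on the small example from the question; the names group_medians and expected are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 15, 65, 54, 51, 96],
                   'B': [1, 2, 3, 2, 2, 1]})

# Broadcast each group's median of A back to every row of that group
df['C'] = df.groupby('B')['A'].transform('median')

# Independent check: one median per group, then looked up by each row's B value
group_medians = df.groupby('B')['A'].median()
expected = df['B'].map(group_medians)

# True if both approaches agree on every row
print((df['C'] == expected).all())
```

The same comparison can be run on the full DataFrame, or on a handful of groups picked by hand.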
I have a pandas data frame in which one of the columns contains real values. I would like to add a new column containing integers that indicate the rank of each real number within that column. For example, 1 would mean that the value is the largest one, 2 would mean the second largest, and so on.
DataFrame has a rank method:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': np.random.randint(0, 100, 10)})
# Rank column 'a' so that the largest value gets rank 1
df['rank'] = df['a'].rank(ascending=False)
a rank
0 16 8
1 91 1
2 58 4
3 36 6
4 15 9
5 69 3
6 35 7
7 78 2
8 48 5
9 5 10
Make sure you check out the optional method keyword, which sets the behavior in case of equal values.
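As a rough sketch of how the method keyword changes the result, the example below uses a small hand-made column with one tie (the data is hypothetical). It also casts the ranks to integers, which is safe for method='min' and method='dense' because they never produce fractional ranks, unlike the default method='average'.

```python
import pandas as pd

# Hypothetical data with a tie: 91.2 appears twice
df = pd.DataFrame({'a': [16.5, 91.2, 58.0, 91.2, 15.1]})

# 'min': tied values share the lowest rank, and the next rank is skipped
df['rank_min'] = df['a'].rank(ascending=False, method='min').astype(int)

# 'dense': tied values share a rank, and no rank is skipped afterwards
df['rank_dense'] = df['a'].rank(ascending=False, method='dense').astype(int)

print(df)
```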