I have a dataframe with 2 columns:
count percent grpno.
0 14.78 1
1 0.00 2
2 8.80 3
3 9.60 4
4 55.90 4
5 0.00 2
6 0.00 6
7 0.00 5
8 6.90 1
9 59.00 4
I need to get the max of column 'count percent', grouped by column 'grpno.'. I tried doing this with
geostat.groupby(['grpno.'], sort=False)['count percent'].max()
but I get this output:
grpno.
1 14.78
2 0.00
3 8.80
4 59.00
6 0.00
5 0.00
Name: count percent, dtype: float64
But I need the output to be a DataFrame with the columns 'grpno.' and 'MaxOfcount percent' (i.e. the max column renamed). Can anyone help with this? Thanks
res = df.groupby('grpno.')['count percent'].max().reset_index()
res.columns = ['grpno.', 'MaxOfcount percent']
grpno. MaxOfcount percent
0 1 14.78
1 2 0.00
2 3 8.80
3 4 59.00
4 5 0.00
5 6 0.00
You could also do it in one line:
res = df.groupby('grpno.', as_index=False)['count percent'].max().rename(columns={'count percent': 'MaxOfcount percent'})
You could use groupby with the argument as_index=False:
In [119]: df.groupby(['grpno.'], as_index=False)[['count percent']].max()
Out[119]:
grpno. count percent
0 1 14.78
1 2 0.00
2 3 8.80
3 4 59.00
4 5 0.00
5 6 0.00
df1 = df.groupby(['grpno.'], as_index=False)[['count percent']].max()
df1.columns = df1.columns[:-1].tolist() + ['MaxOfcount percent']
In [130]: df1
Out[130]:
grpno. MaxOfcount percent
0 1 14.78
1 2 0.00
2 3 8.80
3 4 59.00
4 5 0.00
5 6 0.00
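Another option is named aggregation, which computes the max and names the output column in a single step. This is a minimal sketch, assuming pandas 0.25 or later (where named aggregation was introduced) and rebuilding df from the table in the question; the `**{...}` unpacking is only needed because the column names contain spaces and dots.

```python
import pandas as pd

# Data rebuilt from the question's table
df = pd.DataFrame({
    "count percent": [14.78, 0.00, 8.80, 9.60, 55.90, 0.00, 0.00, 0.00, 6.90, 59.00],
    "grpno.": [1, 2, 3, 4, 4, 2, 6, 5, 1, 4],
})

# Named aggregation: new column name -> (source column, aggregation function)
res = (df.groupby("grpno.")
         .agg(**{"MaxOfcount percent": ("count percent", "max")})
         .reset_index())  # move 'grpno.' from the index back into a column
print(res)
```

Passing sort=False to groupby, as in the question, would keep the groups in order of first appearance (1, 2, 3, 4, 6, 5) instead of sorting them.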
In the df below, I want to reorder the values of column 'cdf_X' based on columns 'A' and 'X'. Columns 'X' and 'cdf_X' are paired, so if a value in 'X' appears in column 'A', its 'cdf_X' value should be placed at that row of column 'A' in a new column 'cdf_A'. (Values don't occur twice in column 'cdf_A'.)
Example: 'X'=3 at index 0 -> cdf_X=0.05 at index 0 -> '3' appears in column 'A' at index 4 -> cdf_A at index 4 = cdf_X at index 0
Initial df:
A X cdf_X
0 7 3 0.05
1 4 4 0.15
2 11 7 0.27
3 9 9 0.45
4 3 11 0.69
5 13 13 1.00
Desired df:
A X cdf_X cdf_A
0 7 3 0.05 0.27
1 4 4 0.15 0.15
2 11 7 0.27 0.69
3 9 9 0.45 0.45
4 3 11 0.69 0.05
5 13 13 1.00 1.00
What I tried:
import pandas as pd
df = pd.DataFrame({"A": [7,4,11,9,3,13],
"cdf_X": [0.05,0.15,0.27,0.45,0.69,1.00],
"X": [3,4,7,9,11,13]})
df.loc[:, 'cdf_A'] = df['cdf_X'].where(df['A'] == df['X'])
print(df)
Check with map:
df['cdf_A'] = df.A.map(df.set_index('X')['cdf_X'])
I think you need replace:
df['cdf_A'] = df.A.replace(df.set_index('X').cdf_X)
Out[989]:
A X cdf_X cdf_A
0 7 3 0.05 0.27
1 4 4 0.15 0.15
2 11 7 0.27 0.69
3 9 9 0.45 0.45
4 3 11 0.69 0.05
5 13 13 1.00 1.00
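A self-contained version of the map approach, built from the question's own construction code. The idea is to turn X into the index of a lookup Series whose values are cdf_X, then look up every value of A in it. This sketch assumes the values in X are unique (map raises on a duplicated index). One practical difference from replace: with map, a value of A that never appears in X becomes NaN, whereas replace would leave the original A value in place.

```python
import pandas as pd

df = pd.DataFrame({"A": [7, 4, 11, 9, 3, 13],
                   "X": [3, 4, 7, 9, 11, 13],
                   "cdf_X": [0.05, 0.15, 0.27, 0.45, 0.69, 1.00]})

# Lookup table: index = value of X, value = its paired cdf_X
lookup = df.set_index("X")["cdf_X"]

# For each value in A, fetch the cdf_X of the row where X equals it
df["cdf_A"] = df["A"].map(lookup)
print(df)

# A value with no match in X (99 here is made up for illustration) maps to NaN
unmatched = pd.Series([7, 99]).map(lookup)
print(unmatched)
```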
I have a DataFrame of several hundred thousand rows, in the following format:
time_elapsed cycle
0 0.00 1
1 0.50 1
2 1.00 1
3 1.30 1
4 1.50 1
5 0.00 2
6 0.75 2
7 1.50 2
8 3.00 2
I want to create a third column that gives, for each row, its time_elapsed as a percentage of the total duration of its cycle (a cycle runs until the next time_elapsed = 0), to give something like:
time_elapsed cycle percentage
0 0.00 1 0
1 0.50 1 33
2 1.00 1 67
3 1.30 1 87
4 1.50 1 100
5 0.00 2 0
6 0.75 2 25
7 1.50 2 50
8 3.00 2 100
I'm not fussed about the number of decimal places; I've just left them out here for simplicity.
I started down this route, but I keep getting errors:
data['percentage'] = data['time_elapsed'].sub(data.groupby(['cycle'])['time_elapsed'].transform(lambda x: x*100/data['time_elapsed'].max()))
I think it's the lambda function causing errors, but I'm not sure what I should do to change it. Any help is much appreciated :)
Use Series.div for division instead of Series.sub (which subtracts). The solution then simplifies: get only the max per group with transform('max'), multiply by 100 with Series.mul, round with Series.round if necessary, and finally convert to integers with Series.astype:
s = data.groupby(['cycle'])['time_elapsed'].transform('max')
data['percentage'] = data['time_elapsed'].div(s).mul(100).round().astype(int)
print (data)
time_elapsed cycle percentage
0 0.00 1 0
1 0.50 1 33
2 1.00 1 67
3 1.30 1 87
4 1.50 1 100
5 0.00 2 0
6 0.75 2 25
7 1.50 2 50
8 3.00 2 100
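The question defines a cycle as lasting "until the next time_elapsed = 0". If the cycle column were ever missing or unreliable, one way to rebuild it (an assumption on my part, not something the answer above relies on) is a running count of the rows where time_elapsed resets to 0. The sketch below does that and then applies the same transform('max') approach; because transform returns a Series aligned to the original index, the division is fully vectorized, which matters for several hundred thousand rows.

```python
import pandas as pd

data = pd.DataFrame({
    "time_elapsed": [0.00, 0.50, 1.00, 1.30, 1.50, 0.00, 0.75, 1.50, 3.00],
    "cycle": [1, 1, 1, 1, 1, 2, 2, 2, 2],
})

# Assumption: every cycle starts with a row where time_elapsed == 0,
# so the cumulative count of those rows is a cycle id (1, 1, ..., 2, 2, ...)
cycle_id = data["time_elapsed"].eq(0).cumsum()

# Max time of each row's own cycle, broadcast back to every row of that cycle
cycle_max = data.groupby(cycle_id)["time_elapsed"].transform("max")

data["percentage"] = data["time_elapsed"].div(cycle_max).mul(100).round().astype(int)
print(data)
```

Note that a cycle consisting only of a 0 row would give 0/0 = NaN, and astype(int) would then fail; such rows would need a fillna(0) before the conversion.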
Hello, I have a problem that I haven't been able to implement a solution for.
I have the following two DataFrames:
>>> df1
A B date
1 1 01-2016
2 1 02-2017
1 2 03-2017
2 2 04-2020
>>> df2
A B 01-2016 02-2017 03-2017 04-2020
1 1 0.10 0.22 0.55 0.77
2 1 0.20 0.12 0.99 0.125
1 2 0.13 0.15 0.15 0.245
2 2 0.33 0.1 0.888 0.64
What I want is the following DataFrame:
>>> df3
A B date value
1 1 01-2016 0.10
2 1 02-2017 0.12
1 2 03-2017 0.15
2 2 04-2020 0.64
I already tried the following:
summarize_dates = self.summarize_specific_column(data=df1, column='date')
for date in summarize_dates:
left_on = np.append(left_on, date)
right_on = np.append(right_on, merge_columns.upper())
result = pd.merge(left=df2, right=df1,
left_on=left_on, right_on=right_on,
how='right')
print(result)
This does not work. Can you help me and suggest a simpler implementation? Many thanks in advance!
You can melt df2 into long format and then merge it with df1 using the default 'inner' merge:
df3 = df1.merge(df2.melt(id_vars = ['A', 'B'], var_name='date'))
A B date value
0 1 1 01-2016 0.10
1 2 1 02-2017 0.12
2 1 2 03-2017 0.15
3 2 2 04-2020 0.64
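Using lookup (note: DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so this only runs on older versions):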
df1['value']=df2.set_index(['A','B']).lookup(df1.set_index(['A','B']).index,df1.date)
df1
Out[228]:
A B date value
0 1 1 01-2016 0.10
1 2 1 02-2017 0.12
2 1 2 03-2017 0.15
3 2 2 04-2020 0.64
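Since DataFrame.lookup no longer exists in pandas 2.x, here is a sketch of the same row/column lookup done with positional indexing: get_indexer finds the row position of each (A, B) pair and the column position of each date, and NumPy fancy indexing picks one cell per pair. It assumes the (A, B) pairs in df2 are unique and that every pair and date in df1 exists in df2 (get_indexer returns -1 for a miss, which is why the guard is there).

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 1, 2], "B": [1, 1, 2, 2],
                    "date": ["01-2016", "02-2017", "03-2017", "04-2020"]})
df2 = pd.DataFrame({"A": [1, 2, 1, 2], "B": [1, 1, 2, 2],
                    "01-2016": [0.10, 0.20, 0.13, 0.33],
                    "02-2017": [0.22, 0.12, 0.15, 0.1],
                    "03-2017": [0.55, 0.99, 0.15, 0.888],
                    "04-2020": [0.77, 0.125, 0.245, 0.64]})

wide = df2.set_index(["A", "B"])

# Row position in df2 of each (A, B) pair from df1
rows = wide.index.get_indexer(pd.MultiIndex.from_frame(df1[["A", "B"]]))
# Column position in df2 of each date from df1
cols = wide.columns.get_indexer(df1["date"])

# -1 means "not found"; indexing with it would silently pick the last row/column
if (rows < 0).any() or (cols < 0).any():
    raise KeyError("some (A, B) pairs or dates in df1 are missing from df2")

# One value per (row, column) pair
df1["value"] = wide.to_numpy()[rows, cols]
print(df1)
```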
I have one DataFrame, geomerge. I need to group by the column 'grpno.' and select the first value of column 'MaxOfcount percent' and the first value of column 'state code', while also displaying 'grpno.'. I have to rename them as 'FirstOfMaxOfState count percent' and 'FirstOfstate code' (and 'grpno.' as 'state pool number').
My input dataframe:
count percent grpno. state code MaxOfcount percent
0 14.78 1 CA 14.78
1 0.00 2 CA 0.00
2 0.00 2 FL 0.00
3 8.80 3 CA 8.80
4 0.00 6 NC 0.00
5 0.00 5 NC 0.00
6 59.00 4 MA 59.00
My desired output DataFrame:
FirstOfMaxOfState count percent state pool number FirstOfstate code
0 14.78 1 CA
1 0.00 2 CA
2 8.80 3 CA
3 59.00 4 MA
4 0.00 5 NC
5 0.00 6 NC
Can anyone help with this?
Drop the unneeded column, group by grpno, take the first value, and flatten the multi-index:
df2 = df.drop(columns='count percent').groupby('grpno.').take([0]).reset_index(0)
Rename the columns:
mapping = {'state code':'FirstOfstate code' ,
'grpno.': 'state pool number',
'MaxOfcount percent': 'FirstOfMaxOfState count percent'}
df2 = df2.rename(columns=mapping)
Result:
>>> df2
state pool number FirstOfMaxOfState count percent FirstOfstate code
0 1 14.78 CA
1 2 0.00 CA
3 3 8.80 CA
6 4 59.00 MA
5 5 0.00 NC
4 6 0.00 NC
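A possibly simpler alternative is GroupBy.first with as_index=False, which keeps 'grpno.' as a regular column, so there is no index to flatten. This sketch rebuilds geomerge from the input table in the question. One behavioural difference to be aware of: first() returns the first non-null value per column within each group, whereas take([0]) returns the literal first row, so the two can differ if the data contains NaN.

```python
import pandas as pd

# Input rebuilt from the question's table
geomerge = pd.DataFrame({
    "count percent": [14.78, 0.00, 0.00, 8.80, 0.00, 0.00, 59.00],
    "grpno.": [1, 2, 2, 3, 6, 5, 4],
    "state code": ["CA", "CA", "FL", "CA", "NC", "NC", "MA"],
    "MaxOfcount percent": [14.78, 0.00, 0.00, 8.80, 0.00, 0.00, 59.00],
})

mapping = {"grpno.": "state pool number",
           "MaxOfcount percent": "FirstOfMaxOfState count percent",
           "state code": "FirstOfstate code"}

# Groups come out sorted by 'grpno.' because groupby sorts keys by default
out = (geomerge.groupby("grpno.", as_index=False)[["MaxOfcount percent", "state code"]]
       .first()
       .rename(columns=mapping))
print(out)
```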