Finding mean from two different variables [duplicate] - python

This question already has answers here:
Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
(9 answers)
Pandas Groupby: Count and mean combined
(6 answers)
Closed 2 years ago.
I have a dataset like this:
df = pd.DataFrame({"type" :["A","B","C","A","B","B"], "value": [40,25,33,22,45,62]})
I want to find the mean for each individual type, i.e., type A has a mean of 31.
I did it by subsetting:
df_a = df.loc[df['type']=="A"]
df_a['value'].mean()
I want to do it in a single line.
Thanks in advance.

A possible solution might be:
df.groupby('type')['value'].mean()
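As a minimal, self-contained sketch of the grouped mean, using the sample data from the question (the variable name means is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"type": ["A", "B", "C", "A", "B", "B"],
                   "value": [40, 25, 33, 22, 45, 62]})

# Mean of "value" within each "type"; the result is a Series indexed by type.
means = df.groupby("type")["value"].mean()
print(means)
```

A single group's mean can then be read with means["A"], which replaces the manual subsetting step.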

Related

how to group several columns into one column [duplicate]

This question already has answers here:
How do I transpose dataframe in pandas without index?
(3 answers)
Closed 11 months ago.
I am trying to analyze Chinese GDP by province. I want to make a line chart that shows how GDP changes over time, but I cannot group the data.
I want to pivot the table, but it is not working the way I want.
Instead, I want it to look like this
It looks like you want to switch the x and y axes. Use transpose, which you can call through the .T attribute.
transposed_df = df_data.T
print(transposed_df)
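The original table is not shown, so the following is only a sketch under an assumed layout: one row per province and one column per year. The province labels and numbers are placeholders, not real GDP figures.

```python
import pandas as pd

# Hypothetical layout: one row per province, one column per year.
# The numbers are placeholders, not real GDP figures.
df_data = pd.DataFrame(
    {"2018": [1.0, 2.0], "2019": [1.5, 2.5], "2020": [2.0, 3.0]},
    index=["Province A", "Province B"],
)

# After transposing, years form the index and each province is a column.
transposed_df = df_data.T
print(transposed_df)
```

With the years on the index, calling transposed_df.plot() (which needs matplotlib installed) would draw one line per province.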

Trying to use transform in pandas but it is giving me some error [duplicate]

This question already has answers here:
Converting string column from DataFrame to float for .sum()
(4 answers)
Change column type in pandas
(16 answers)
Closed 1 year ago.
I am trying to get the sum of two numbers using groupby and transform in the pandas library, but it is giving me a garbage value. Can someone guide me on how to solve this?
My data looks like this:
SKU Fees
45241 6.91
45241 6.91
55732 119.05
55732 137.98
I have tried this code:
df['total_fees'] = df.groupby(['sku'])['Fees'].transform('sum')
What I am getting is this:
SKU Fees total_fees
45241 6.91 6.91.6.91
45241 6.91 6.91.6.91
55732 119.05 119.05.137.98
55732 137.98 119.05.137.98
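The Fees column appears to hold strings, so the sum concatenates them instead of adding them. Convert the column to float first:
df['Fees'] = df['Fees'].astype(float)
# Computes one sum per sku
df.groupby(['sku'])['Fees'].sum()
# Computes the same sum, but 'transform' repeats it on every row of the group
df.groupby(['sku'])['Fees'].transform('sum')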
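A small runnable sketch of that fix, assuming the Fees values arrived as strings (for example after reading a file); the sample rows are the ones from the question:

```python
import pandas as pd

# Assumption: Fees is stored as strings, which is why 'sum' concatenated them.
df = pd.DataFrame({"sku": [45241, 45241, 55732, 55732],
                   "Fees": ["6.91", "6.91", "119.05", "137.98"]})

# Convert to a numeric dtype before aggregating.
df["Fees"] = df["Fees"].astype(float)

# transform('sum') returns one value per original row, so it fits in a new column.
df["total_fees"] = df.groupby("sku")["Fees"].transform("sum")
print(df)
```

If some entries might not parse as numbers, pd.to_numeric(df["Fees"], errors="coerce") is an alternative that turns them into NaN instead of raising.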

How to reverse a series in pandas by Python? [duplicate]

This question already has an answer here:
Reversing the order of values in a single column of a Dataframe
(1 answer)
Closed 1 year ago.
I need to reverse a series for correct plotting.
So I wrote the code below:
dataFrame["close"] = dataFrame["close"][::-1]
But nothing changes. Why?
You are reversing only that one column, and the assignment realigns the reversed Series on the index, so the order stays the same. Reverse the entire DataFrame instead, like this:
dataFrame.reindex(index=dataFrame.index[::-1])
or
dataFrame.iloc[::-1]
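The behaviour can be checked on a tiny frame. This is a sketch with made-up values; the close_reversed column name is hypothetical and only there to show a way of reversing a single column's values.

```python
import pandas as pd

dataFrame = pd.DataFrame({"close": [1, 2, 3]})

# Assignment aligns on index labels, so the reversed Series falls back into place.
dataFrame["close"] = dataFrame["close"][::-1]
unchanged = dataFrame["close"].tolist()

# Reversing the rows of the whole frame does change the order (index becomes 2, 1, 0).
reversed_df = dataFrame.iloc[::-1]
reversed_close = reversed_df["close"].tolist()

# To reverse only the values of one column, drop the index first with to_numpy().
dataFrame["close_reversed"] = dataFrame["close"].to_numpy()[::-1]
```

Note that iloc[::-1] and reindex both return a new object, so the result has to be assigned to a name to be kept.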

How do I find "n" maximum values for each month in a pandas dataframe? [duplicate]

This question already has answers here:
Pandas get topmost n records within each group
(6 answers)
Closed 3 years ago.
Given a pandas dataframe with company purchases across various months in a year, how do I find the "N" highest each month?
I currently have:
df.groupby(df['Transaction Date'].dt.strftime('%B'))['Amount'].max()
This returns the highest value for each month, but I would like to see the four highest values.
Am I getting close here, or is there a more efficient approach? Thanks in advance.
Use sort_values, then tail:
yourdf=df.sort_values('Amount').groupby(df['Transaction Date'].dt.strftime('%B'))['Amount'].tail(4)
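A runnable sketch of the sort-then-tail approach on made-up purchases; the column names follow the question, while the data, the month and n variables, and the nlargest alternative are illustrative additions:

```python
import pandas as pd

# Hypothetical purchases; only the column names come from the question.
df = pd.DataFrame({
    "Transaction Date": pd.to_datetime(
        ["2020-01-05", "2020-01-12", "2020-01-20", "2020-02-03", "2020-02-17"]),
    "Amount": [50, 10, 30, 70, 20],
})

month = df["Transaction Date"].dt.strftime("%B")
n = 2  # the question asks for 4; 2 keeps the sample small

# Sort ascending, then keep the last n rows of each month.
top_n = df.sort_values("Amount").groupby(month)["Amount"].tail(n)

# Alternative: nlargest per group, which returns a Series indexed by (month, row).
top_n_alt = df.groupby(month)["Amount"].nlargest(n)
```

Grouping by month name alone merges the same month from different years, so data that spans several years may need the year in the grouping key as well.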

Does Pandas have notin function to filter rows from data frame from a given list [duplicate]

This question already has answers here:
Find the list values not in pandas dataframe data
(3 answers)
Closed 5 years ago.
I have a data frame and I need to get the rows which are "not in" a given list.
I know that to get the rows that are in the list we can use isin(list), so my question is whether there is an opposite "notin" function.
You can put ~ in front of the condition to negate it.
~df['Col1'].isin(list)
df['Col1'].isin(list) returns a Series of True/False values; ~ flips each boolean, so you get True where Col1 is not in the list.
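To get the rows themselves rather than just the mask, the negated condition can be used to index the frame. A minimal sketch with made-up values (the name excluded is hypothetical and avoids shadowing the built-in list):

```python
import pandas as pd

df = pd.DataFrame({"Col1": ["a", "b", "c", "d"]})
excluded = ["b", "d"]  # hypothetical list of values to filter out

# isin gives a boolean mask; ~ inverts it, and indexing with the mask keeps the matching rows.
not_in = df[~df["Col1"].isin(excluded)]
print(not_in)
```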
