Aggregate count in Pandas - python

I want to apply a groupby on a pandas dataframe. I want to group by three columns and calculate the count for each group. I used the following code:
data.groupby(['post_product_list','cust_visid','date_time']).count()
but it didn't seem to work.

Use size() instead, which returns one count per group:

data.groupby(['post_product_list','cust_visid','date_time']).size()
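To illustrate the difference, here is a minimal sketch with made-up data (the column values are hypothetical stand-ins for the original dataframe): count() counts non-null values in every remaining column, while size() returns a single count per group as a Series.

```python
import pandas as pd

# Hypothetical stand-in for the original data
data = pd.DataFrame({
    "post_product_list": ["a", "a", "b"],
    "cust_visid": [1, 1, 2],
    "date_time": ["t1", "t1", "t2"],
})

# One row per unique (post_product_list, cust_visid, date_time) combination,
# with the number of rows in that group
counts = data.groupby(["post_product_list", "cust_visid", "date_time"]).size()
```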

Related

How does pandas know to only group and show these 3 columns?

I use the groupby method to group data by month. The output is exactly what I wanted.
What I want to understand is how the result displays only 3 columns (Quantity Ordered, Price Each and Sales) and excludes the other columns shown in the dataset after I use the groupby method. Is it because the other data isn't numeric? Is it because I used the sum method along with the groupby method?
Since sum is a numeric function, pandas would only apply it to the columns that are numeric. This is described in the documentation as Automatic exclusion of “nuisance” columns.
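A small sketch of that behavior, with hypothetical column names. Note that older pandas versions silently dropped non-numeric columns on sum(); recent versions expect this to be spelled out with numeric_only=True, which is used here so the example runs on current releases.

```python
import pandas as pd

# Toy frame: one numeric value column and one string column
df = pd.DataFrame({
    "Month": [1, 1, 2],
    "Quantity Ordered": [2, 3, 5],
    "Product": ["x", "y", "x"],  # non-numeric, so it is excluded from the sum
})

# Only the numeric column survives the aggregation
result = df.groupby("Month").sum(numeric_only=True)
```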

Is there any advice on how to tweak my code to return the correct table as a dataframe?

I am trying to write a function that takes a dataframe, groups it by a column, and then orders the groups from largest to smallest using the average of a second column. I am trying to return a dataframe. I am using both seaborn and pandas.
This is what I have so far:
def table(df, columnone, columntwo):
    dfnew = df.groupby([columnone])[columntwo].nlargest()
    return dfnew
I am not very sure what I am missing or what I should be looking for. I am pretty new with python and any help would be appreciated.
I think you are looking for this:
def table(df, columnone, columntwo):
    return df.groupby([columnone])\
             .mean()\
             .sort_values(by=[columntwo], ascending=False)
Here groupby creates the groups, mean averages the values in the other columns, and sort_values sorts the dataframe produced by the aggregation.
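A quick usage sketch with made-up data (the "group"/"value" column names are hypothetical). numeric_only=True is added to mean() so the example also runs on recent pandas versions, where non-numeric columns are no longer dropped silently.

```python
import pandas as pd

def table(df, columnone, columntwo):
    return (df.groupby([columnone])
              .mean(numeric_only=True)
              .sort_values(by=[columntwo], ascending=False))

# Hypothetical sample data
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1, 3, 10, 20],
})

# Groups ordered by descending mean of "value": b (15.0) before a (2.0)
out = table(df, "group", "value")
```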

Python Pandas Groupby to count unique records in a single column

I have a df with a single column containing repeated values. I want to display the unique values of that column along with their counts. I know it involves some sort of groupby, but I could not get it to work; please help.
Try:
df.groupby("PdDistrict").size()
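A minimal sketch of that one-liner, using the "PdDistrict" column name from the answer with made-up values: size() yields one row per unique value with its count.

```python
import pandas as pd

# Hypothetical single-column frame with repeating values
df = pd.DataFrame({"PdDistrict": ["NORTH", "SOUTH", "NORTH", "NORTH"]})

# One count per unique district
counts = df.groupby("PdDistrict").size()
```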

Count values in a column depending on the value of another column

I have this issue: I need to sum all the values in a dataframe column based on the value in another column. Specifically, I have a column "App" and a column "n_of_Installs".
What I need is to sum all "n_of_Installs" for each App.
I tried this code: dataframe.groupby('App').sum()['n_of_Installs'] but it doesn't work.
The following line will do what you want:
dataframe.groupby('App')['n_of_installs'].sum() # returns a pandas Series
Note that the code above returns a pandas Series. If you want a pandas DataFrame instead, use the as_index=False option of groupby:
dataframe.groupby('App', as_index=False)['n_of_installs'].sum() # returns a pandas DataFrame
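To show the difference between the two forms, here is a sketch with made-up data (note the answer uses the lowercase column name 'n_of_installs', which is assumed here):

```python
import pandas as pd

# Hypothetical data
dataframe = pd.DataFrame({
    "App": ["A", "A", "B"],
    "n_of_installs": [10, 5, 7],
})

# Series indexed by App
as_series = dataframe.groupby("App")["n_of_installs"].sum()

# DataFrame with App kept as a regular column
as_frame = dataframe.groupby("App", as_index=False)["n_of_installs"].sum()
```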

How to do sorting after groupby and aggregation on a Pandas Dataframe

I have a Pandas Dataframe and I'm doing a groupby on two columns with a couple of aggregate functions on a third column. Here is how my code looks:
df2 = df[[X, Y, Z]].groupby([X,Y]).agg([np.mean, np.max, np.min]).reset_index()
It computes the aggregate functions on the column Z.
I need to sort by, let's say, the min column (i.e. sort_values('min')), but it keeps complaining that the 'min' column does not exist. How can I do that?
Since you are generating a pd.MultiIndex, you must use a tuple in sort_values.
Try:
df2.sort_values(('Z','amin'))
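A runnable sketch with made-up data. String aggregation names are used here, which label the columns ('Z', 'min') etc.; passing np.min instead produces the 'amin' label seen in the answer, at least in older pandas/NumPy versions.

```python
import pandas as pd

# Hypothetical frame: X and Y are group keys, Z is the value column
df = pd.DataFrame({
    "X": ["a", "a", "b"],
    "Y": [1, 2, 2],
    "Z": [3.0, 1.0, 2.0],
})

df2 = (df[["X", "Y", "Z"]]
       .groupby(["X", "Y"])
       .agg(["mean", "max", "min"])
       .reset_index())

# The columns form a MultiIndex, so sort_values needs a tuple
sorted_df = df2.sort_values(("Z", "min"))
```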
