Count values in a column depending on the value of another column - python

I have this issue: I need to sum all the values in a dataframe column based on the value in another column. Specifically, I have a column "App" and a column "n_of_Installs".
What I need is the total "n_of_Installs" for each App.
I tried this code: dataframe.groupby('App').sum()['n_of_Installs'] but it doesn't work.

The following line will do what you want:
dataframe.groupby('App')['n_of_Installs'].sum() # returns a pandas Series
Note that the code above returns a pandas Series. If you want a pandas DataFrame instead, use the as_index=False option of groupby:
dataframe.groupby('App', as_index=False)['n_of_Installs'].sum() # returns a pandas DataFrame
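For illustration, here is a minimal, self-contained sketch; the App names and install counts are made up:
import pandas as pd

# Hypothetical sample data
dataframe = pd.DataFrame({
    'App': ['A', 'A', 'B'],
    'n_of_Installs': [100, 50, 200],
})

totals = dataframe.groupby('App')['n_of_Installs'].sum()
print(totals)      # Series: A -> 150, B -> 200

totals_df = dataframe.groupby('App', as_index=False)['n_of_Installs'].sum()
print(totals_df)   # DataFrame with columns 'App' and 'n_of_Installs'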

Related

Python Pandas Groupby to count unique records in a single column

I have a df with a single column containing rows of repeating data. I want to display a pivot table of the unique values of that column along with their counts. I know it should be some sort of groupby, but I could not get it to work. Please help.
Try:
df.groupby("PdDistrict").size()

Collapsing values of a Pandas column based on Non-NA value of other column

I have data like this in a CSV file which I am importing into a pandas df.
I want to collapse the values of the Type column by concatenating its strings into one sentence and keeping it in the first row next to the Date value, while keeping all other rows and values the same.
As shown below.
You can try ffill + transform:
df1 = df.copy()
# Fill Number and Date down so every row knows which group it belongs to
df1[['Number', 'Date']] = df1[['Number', 'Date']].ffill()
# Replace missing Type values with empty strings so they can be joined
df1.Type = df1.Type.fillna('')
# Concatenate all Type strings within each (Number, Date) group
s = df1.groupby(['Number', 'Date']).Type.transform(' '.join)
# Keep the joined sentence only on the first row of each group (where Date is present)
df.loc[df.Date.notnull(), 'Type'] = s
df.loc[df.Date.isnull(), 'Type'] = ''
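A quick sketch of how this behaves on hypothetical data (the original question's sample data is not shown here, so the values below are made up):
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Number': [1, np.nan, np.nan, 2],
    'Date': ['2020-01-01', np.nan, np.nan, '2020-01-02'],
    'Type': ['foo', 'bar', 'baz', 'qux'],
})

# After running the snippet above, the Type column becomes:
# 'foo bar baz', '', '', 'qux'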

Pandas/Python Filtering a DF for column value

I was looking for a way to filter a df by a value in a column, both in a groupby and in another instance when selecting a column of that df.
For example:
How can I plot this df's column_betas as below, but only when a different column (called column_value) has a value of 2?
df['column_betas'] # ( when a different column called `column_value` is 2)
And for the line below, when I am running a groupby on the City column, but only when the column_value column equals 2?
df.groupby(['City']).quantile(.5)
I am trying to avoid creating additional dfs that filter for a certain value of column_value, and instead apply that filter directly when selecting that column or doing the groupby.
This command gets df['column_betas'] where the value column is 2:
df[df["value"]==2]["column_betas"]
and this command does the groupby only on rows that have a value of 2 in the value column:
df[df["value"]==2].groupby(["City"])
Substitute df with
df[df['column_value']==2]
So df['column_betas'] becomes df[df['column_value']==2]['column_betas']
and df.groupby(['City']).quantile(.5) becomes df[df['column_value']==2].groupby(['City']).quantile(.5)
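A minimal sketch of the boolean-mask approach, with made-up column names and values:
import pandas as pd

df = pd.DataFrame({
    'City': ['NY', 'NY', 'LA', 'LA'],
    'column_value': [2, 1, 2, 2],
    'column_betas': [0.5, 0.7, 0.9, 1.1],
})

# Select column_betas only where column_value is 2
betas = df[df['column_value'] == 2]['column_betas']

# Group by City using only the filtered rows
medians = df[df['column_value'] == 2].groupby(['City']).quantile(.5)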

How to do sorting after groupby and aggregation on a Pandas Dataframe

I have a Pandas DataFrame and I'm doing a groupby on two columns, with a couple of aggregate functions on a third column. Here is how my code looks:
df2 = df[[X, Y, Z]].groupby([X, Y]).agg([np.mean, np.max, np.min]).reset_index()
It computes the aggregate functions on the column Z.
I need to sort by, let's say, the min column (i.e. sort_values('min')), but it keeps complaining that the 'min' column does not exist. How can I do that?
Since you are generating a pd.MultiIndex in the columns, you must use a tuple in sort_values.
Try:
df2.sort_values(('Z','amin'))
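A small sketch, assuming columns named X, Y and Z; note that the aggregated column label may be 'amin' when passing np.min, or 'min' when using the string aggregator, depending on your pandas version:
import pandas as pd

df = pd.DataFrame({
    'X': ['a', 'a', 'b', 'b'],
    'Y': [1, 1, 2, 2],
    'Z': [10, 30, 20, 5],
})

df2 = df[['X', 'Y', 'Z']].groupby(['X', 'Y']).agg(['mean', 'max', 'min']).reset_index()
# Columns are now a MultiIndex: ('X', ''), ('Y', ''), ('Z', 'mean'), ('Z', 'max'), ('Z', 'min')
df2 = df2.sort_values(('Z', 'min'))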

Add pandas Series to a DataFrame, preserving index

I have been having some problems adding the contents of a pandas Series to a pandas DataFrame. I start with an empty DataFrame, initialised with several columns (corresponding to consecutive dates).
I would like to then sequentially fill the DataFrame using different pandas Series, each one corresponding to a different date. However, each Series has a (potentially) different index.
I would like the resulting DataFrame to have an index that is essentially the union of each of the Series indices.
I have been doing this so far:
for date in dates:
    df[date] = series_for_date
However, my df index corresponds to that of the first Series, so any data in successive Series corresponding to an index 'key' not present in the first Series is lost.
Any help would be much appreciated!
Ben
If I understand correctly, you can use concat:
pd.concat([series1,series2,series3],axis=1)
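A minimal sketch showing that concat keeps the union of the indices; the series names, indices and values here are made up:
import pandas as pd

series1 = pd.Series([1, 2], index=['a', 'b'], name='2020-01-01')
series2 = pd.Series([3, 4], index=['b', 'c'], name='2020-01-02')

df = pd.concat([series1, series2], axis=1)
# Index is the union ['a', 'b', 'c']; missing entries become NaN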
