How to find and add frequency column for ID? - python

I am a beginner at Python, so bear with me!
My dataset comes from Excel, and I was wondering how to find and add a frequency column for my ID.
I first grouped by ID and Date with:
dfcount = dfxyz.groupby(["ID", "Date"])
and then found the mean by doing:
dfcount1 = dfcount.mean()
The output I got was:
What I am trying to do is get the frequency number beside it like this:
I did not know how to copy Python code, so I uploaded pictures. Sorry! Any help with code to count the frequency of each ID AFTER computing the mean of the grouped columns is appreciated.
Thank you in advance!

You can use groupby with cumcount:
df['Freq']=(df.groupby(level=0).cumcount()+1).values
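Since the original data was only posted as pictures, here is a minimal sketch on hypothetical data (the column names ID, Date and Value are assumptions). After the mean, ID is level 0 of the MultiIndex, so grouping by level=0 numbers the rows within each ID. It is not clear from the question whether "frequency" means a running number (1, 2, 3, ...) or the total number of rows per ID, so both are shown.

```python
import pandas as pd

# Hypothetical data standing in for the Excel sheet (ID, Date, Value are assumed names)
dfxyz = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3],
    "Date": ["2019-01-01", "2019-01-01", "2019-01-02",
             "2019-01-01", "2019-01-03", "2019-01-02"],
    "Value": [10, 20, 30, 40, 50, 60],
})

# Mean per (ID, Date); the result has a MultiIndex with ID as level 0
dfcount1 = dfxyz.groupby(["ID", "Date"]).mean()

# Running position of each row within its ID: 1, 2, 3, ...
dfcount1["Freq"] = dfcount1.groupby(level=0).cumcount() + 1

# Total number of (ID, Date) rows per ID, repeated on every row of that ID
dfcount1["Total"] = dfcount1.groupby(level=0)["Value"].transform("size")

print(dfcount1)
```

Here cumcount returns a Series with the same index as dfcount1, so it aligns on assignment even without .values.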

You can use this:
df['column_name'].value_counts()
value_counts returns a Series containing the counts of unique values.
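value_counts gives one row per unique value, not one row per original row. A minimal sketch, on hypothetical data with an assumed ID column, of mapping those counts back onto every row to get a frequency column:

```python
import pandas as pd

# Hypothetical data; "ID" is assumed to be the column being counted
df = pd.DataFrame({"ID": ["a", "b", "a", "c", "a", "b"]})

counts = df["ID"].value_counts()      # Series indexed by ID: a -> 3, b -> 2, c -> 1
df["Freq"] = df["ID"].map(counts)     # look up each row's ID in the counts
print(df)
```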

Related

Pandas DataFrame Statistics Tracking

I am looking to output a count of the number of times the values in one column of my dataframe are greater than 11. I have tried df2['LOC6'].value_counts(), but I do not think it is applicable in this situation. LOC6 is the name of the column.
How about this?
(df2['LOC6'] > 11).sum()
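The comparison produces a boolean Series, and summing it counts the True values because True is treated as 1. A small sketch with hypothetical LOC6 values; the mean of the same mask gives the fraction of rows above the threshold:

```python
import pandas as pd

# Hypothetical values for the LOC6 column
df2 = pd.DataFrame({"LOC6": [5, 12, 11, 20, 3, 15]})

mask = df2["LOC6"] > 11       # True where the value is strictly greater than 11
count = int(mask.sum())       # number of True values (11 itself is not counted)
share = mask.mean()           # fraction of rows above 11
print(count, share)
```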

Python 3D Dataframe: Sort Values by Two Columns and get the mean

I have searched Stack Overflow for a solution for a while now and haven't found one yet, so hopefully you can help me out:
I have a dataframe with 3 columns ['Attrition', 'JobRole', 'MonthlyIncome']
I sorted the dataframe by its values for Attrition (Yes/No) and its different JobRoles, and I want to get the mean MonthlyIncome for each combination,
e.g. for Attrition == 'Yes' & JobRole == 'Healthcare' -> 'MonthlyIncome' = x
avg_inc=df[['Attrition', 'MonthlyIncome', 'JobRole']].sort_values(['Attrition', 'JobRole'])
[1]: https://i.stack.imgur.com/8cvYy.png
I hope someone can help me out. Thanks in advance!
Did you want something like this?
Data
df=pd.DataFrame({'Attrition':['No','No','No','Yes','No','Yes','No','Yes'],'MonthlyIncome':[34567,7890,11234,56789,67890,65345,45782,97802], 'JobRole':['NS','DR','HD','DR','NS','HR','NS','HR']})
Groupby and calculate mean
df['Mean_MonthlyIncome']=df.groupby(['JobRole','Attrition'])['MonthlyIncome'].transform('mean')
Or
df.groupby(['JobRole','Attrition'])['MonthlyIncome'].mean()
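No separate sort is needed first: groupby sorts the group keys by default. A sketch using the answer's sample data that looks up one combination from the grouped means and, optionally, reshapes them into a JobRole by Attrition table with unstack (pairs that never occur become NaN):

```python
import pandas as pd

df = pd.DataFrame({
    "Attrition": ["No", "No", "No", "Yes", "No", "Yes", "No", "Yes"],
    "MonthlyIncome": [34567, 7890, 11234, 56789, 67890, 65345, 45782, 97802],
    "JobRole": ["NS", "DR", "HD", "DR", "NS", "HR", "NS", "HR"],
})

# One mean per (JobRole, Attrition) pair, indexed by both keys
means = df.groupby(["JobRole", "Attrition"])["MonthlyIncome"].mean()

# Look up a single combination, e.g. JobRole == 'HR' and Attrition == 'Yes'
hr_yes = means.loc[("HR", "Yes")]

# Rows are JobRoles, columns are Attrition values
table = means.unstack("Attrition")
print(hr_yes)
print(table)
```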

Pandas - select lowest value to date

I'm new to Pandas.
I've got a dataframe where I want to group by user and then find each user's lowest score in the speed column up to and including the current date.
So I can't just use df.groupby(['user'])['speed'].transform('min'), as this would give the minimum of all values, not just of the rows from the first one up to the current one.
What can I use to get what I need?
Without seeing your dataset it's hard to help you directly. The problem does boil down to the following. You need to select the range of data you want to work with (so select rows for the date range and columns for the user/speed).
That would look something like x = df.loc["2-4-2018":"2-4-2019", ['user', 'speed']] (a label slice, assuming the dataframe is indexed by a sorted date).
From there you could do a simple x['speed'].min() for the value or x['speed'].idxmin() for the index of the value.
I haven't worked with DataFrames for a while, but what you're looking for is how to slice DataFrames.
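Slicing a date range gives one minimum for that range, not a per-row "lowest so far" value. A different technique that matches the question more directly is a grouped cumulative minimum, cummin. A minimal sketch on hypothetical data (the user, date and speed column names are assumptions); the rows must be in chronological order within each user for this to work:

```python
import pandas as pd

# Hypothetical data; "user", "date" and "speed" are assumed column names
df = pd.DataFrame({
    "user": ["a", "a", "b", "a", "b", "b"],
    "date": pd.to_datetime(["2018-04-02", "2018-05-01", "2018-04-03",
                            "2018-06-10", "2018-07-01", "2018-08-15"]),
    "speed": [12.0, 9.5, 20.0, 11.0, 18.5, 19.0],
})

# cummin walks rows in their current order, so sort chronologically per user
df = df.sort_values(["user", "date"])

# Lowest speed seen so far for each user, including the current row
df["min_to_date"] = df.groupby("user")["speed"].cummin()
print(df)
```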

Python column with order number based on conditions

I have a table shown as below:
Customers buy items on different dates. Each customer has a different number, and each item has a different ID.
I want a separate column that says, for each ID, whether it is the first, second, third, etc. item bought by the given customer.
I was trying:
df['item_order'] = np.where(df['Customer']==df['Customer'].shift(),
df.item_order.shift()+1, 0)
But I only get 0 for the first item and 1 for the second, third, etc.
You can try something like the below code using pandas
df[['ID','Customer','Date']].groupby(['ID','Customer']).agg('count')
Let me know if this is the output that you are expecting.
Thanks for the help, everybody; the solution is the rank method.
Here is the solution to my issue:
df['rank'] = df.sort_values('Customer').groupby('Customer').Date.rank(method='first')
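A small sketch of the rank approach on hypothetical purchases (column names follow the question), alongside an equivalent alternative: sort by date, then number the rows within each customer with cumcount. method='first' breaks ties by row order, so two purchases on the same date still get distinct numbers:

```python
import pandas as pd

# Hypothetical purchases; ID, Customer and Date follow the question's column names
df = pd.DataFrame({
    "ID": [101, 102, 103, 104, 105],
    "Customer": [1, 2, 1, 1, 2],
    "Date": pd.to_datetime(["2019-03-01", "2019-03-02", "2019-01-15",
                            "2019-05-20", "2019-02-10"]),
})

# 1 = first item the customer bought, 2 = second, ...
df["rank"] = df.groupby("Customer")["Date"].rank(method="first").astype(int)

# Alternative: chronological order, then a running count per customer
df["order"] = df.sort_values("Date").groupby("Customer").cumcount() + 1
print(df)
```

Both assignments align on the original index, so the result lands on the correct rows even though the second one is computed on a sorted copy.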

Generating a list of values from a pandas DataFrame column for a range of values in another column

For a list of daily maximum temperature values from 5 to 27 degrees Celsius, I want to calculate the corresponding maximum ozone concentration from the following pandas DataFrame:
I can do this with the following code, changing the 5 to 6, 7, etc. each time:
df_c=df_b[df_b['Tmax']==5]
df_c.O3max.max()
Then I have to copy and paste the output values into an Excel spreadsheet. I'm sure there must be a much more Pythonic way of doing this, such as a list comprehension. Ideally I would like to generate a list of values from the column O3max. Please give me some suggestions.
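Use pd.Series.map with another pd.Series. Because several rows can share the same Tmax, reduce the lookup Series to one maximum per temperature first (mapping through a Series with a duplicated index raises an error):
pd.Series(list_of_temps).map(df_b.groupby('Tmax')['O3max'].max())
Or, to get a DataFrame:
result_df = pd.DataFrame(dict(temps=list_of_temps))
result_df['O3max'] = result_df.temps.map(df_b.groupby('Tmax')['O3max'].max())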
I had another play around and think the following piece of code seems to do the job:
df_c=df_b.groupby(['Tmax'])['O3max'].max()
I would appreciate any thoughts on whether this is correct.
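The groupby approach gives one maximum per observed temperature, which is what the loop was computing one value at a time. A sketch on hypothetical data that also reindexes to the full 5 to 27 range (temperatures with no rows become NaN) and converts the result to a plain list:

```python
import pandas as pd

# Hypothetical daily data: Tmax in whole degrees Celsius, O3max the daily ozone maximum
df_b = pd.DataFrame({
    "Tmax": [5, 5, 6, 7, 7, 7],
    "O3max": [30.0, 34.0, 31.0, 40.0, 38.0, 45.0],
})

# Largest O3max observed at each Tmax
o3_by_temp = df_b.groupby("Tmax")["O3max"].max()

# One entry per temperature from 5 to 27 inclusive; missing temperatures -> NaN
o3_by_temp = o3_by_temp.reindex(range(5, 28))

values = o3_by_temp.tolist()
print(values)
```

If the values still need to end up in a spreadsheet, Series.to_excel can write them directly (it requires an Excel writer package such as openpyxl to be installed).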
