This question already has answers here:
Add a sequential counter column on groups to a pandas dataframe
(4 answers)
Closed 4 years ago.
I would like to index my dataframe so that within each group the counter runs from 0 up to the number of observations in that group, i.e. from:
pd.DataFrame([["John","Car"],["John","House"],["Sam","Skate"],["Sam","Disco"],["Sam","Space"]])
I would like to have :
pd.DataFrame([["John","Car",0],["John","House",1],["Sam","Skate",0],["Sam","Disco",1],["Sam","Space",2]])
Thanks
You're looking for the cumulative count function, GroupBy.cumcount:
df = pd.DataFrame([["John","Car"],["John","House"],["Sam","Skate"],["Sam","Disco"],["Sam","Space"]])
df.groupby(0).cumcount()
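Applied to the sample frame from the question, cumcount numbers the rows within each group starting at 0, which is exactly the desired counter column:

```python
import pandas as pd

df = pd.DataFrame([["John", "Car"], ["John", "House"],
                   ["Sam", "Skate"], ["Sam", "Disco"], ["Sam", "Space"]])

# Number the rows within each group of column 0, starting from 0
df[2] = df.groupby(0).cumcount()
print(df)
#       0      1  2
# 0  John    Car  0
# 1  John  House  1
# 2   Sam  Skate  0
# 3   Sam  Disco  1
# 4   Sam  Space  2
```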
Use:
df.groupby(0)[0].apply(lambda x:x.duplicated().cumsum())
This question already has answers here:
Pandas get topmost n records within each group
(6 answers)
Closed 11 months ago.
How do you get the top n rows of each group in a python pandas dataframe?
n=10
top10each = df.groupby('Category').apply(lambda group: group.head(n)).reset_index(drop=True)
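Note that pandas also offers GroupBy.head directly, which keeps the first n rows of every group without the apply. A minimal sketch on made-up data (the column name Category comes from the answer above; the values are invented):

```python
import pandas as pd

df = pd.DataFrame({"Category": ["A", "A", "A", "B", "B"],
                   "value": [1, 2, 3, 4, 5]})

n = 2
# GroupBy.head(n) keeps the first n rows of each group,
# preserving the original row order and index
top_n_each = df.groupby("Category").head(n)
print(top_n_each)
#   Category  value
# 0        A      1
# 1        A      2
# 3        B      4
# 4        B      5
```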
This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 2 years ago.
I have a dataframe, table_revenue.
How can I reshape it, grouped by 'station_id', so that each cell holds the price aggregated by date, with one row per 'station_id' and one column per date?
It seems you need pivot_table():
output = table_revenue.pivot_table(index='station_id', columns='endAt', values='price', aggfunc='sum', fill_value=0)
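A self-contained sketch of the pivot on invented data (the column names station_id, endAt and price come from the question; the values are made up):

```python
import pandas as pd

# Hypothetical revenue records mimicking the question's table
table_revenue = pd.DataFrame({
    "station_id": [1, 1, 2, 2, 2],
    "endAt": ["2019-01-01", "2019-01-02", "2019-01-01",
              "2019-01-01", "2019-01-02"],
    "price": [10, 20, 5, 7, 3],
})

# One row per station_id, one column per date, prices summed per cell;
# fill_value=0 replaces missing station/date combinations
output = table_revenue.pivot_table(index="station_id", columns="endAt",
                                   values="price", aggfunc="sum", fill_value=0)
print(output)
# endAt       2019-01-01  2019-01-02
# station_id
# 1                   10          20
# 2                   12           3
```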
This question already has answers here:
Keep other columns when doing groupby
(5 answers)
Closed 3 years ago.
I have a data frame in which some rows share the same ID but have different starcounter values.
For each ID I need to keep only the row with the minimum starcounter and delete the rest.
Thank you in advance.
What you need is:
df2 = df.sort_values('starcounter').drop_duplicates(['ID'], keep='first')
Here's a one-liner to do this:
df.loc[df.groupby('ID')['starcounter'].idxmin()]
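Both answers select the same rows (up to ordering): the first sorts and keeps the first occurrence per ID, the second uses idxmin to fetch the index label of each group's minimum. A quick check on made-up data:

```python
import pandas as pd

# Hypothetical data: repeated IDs with different starcounter values
df = pd.DataFrame({"ID": ["a", "a", "b", "b", "b"],
                   "starcounter": [3, 1, 5, 2, 4]})

# Approach 1: sort by starcounter, then keep the first row per ID
df2 = df.sort_values("starcounter").drop_duplicates(["ID"], keep="first")

# Approach 2: idxmin returns each group's index label at the minimum
df3 = df.loc[df.groupby("ID")["starcounter"].idxmin()]

print(df3)
#   ID  starcounter
# 1  a            1
# 3  b            2
```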
This question already has answers here:
Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
(9 answers)
Closed 3 years ago.
I want to merge duplicate rows, adding a new column 'count' that holds the number of occurrences of each row.
Rows in the result can be in any order.
You can use:
df["count"] = 1
df = df.groupby(["user_id", "item_id", "total"])["count"].count().reset_index()
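Equivalently, GroupBy.size counts rows per group without the helper column. A sketch on invented data (the column names user_id, item_id and total come from the question):

```python
import pandas as pd

# Hypothetical orders with one duplicated row
df = pd.DataFrame({"user_id": [1, 1, 2],
                   "item_id": [10, 10, 20],
                   "total": [5.0, 5.0, 9.0]})

# size() counts rows per unique (user_id, item_id, total) combination;
# reset_index(name=...) turns the result back into a flat frame
counts = (df.groupby(["user_id", "item_id", "total"])
            .size()
            .reset_index(name="count"))
print(counts)
#    user_id  item_id  total  count
# 0        1       10    5.0      2
# 1        2       20    9.0      1
```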
This question already has answers here:
How to take column-slices of dataframe in pandas
(11 answers)
Closed 6 years ago.
I have a dataframe with 85 columns and roughly 10,000 rows.
The first column is Shrt_Desc and the last is Refuse_Pct.
The new data frame I want has to contain Shrt_Desc, skip some columns, and then include the contiguous block of columns from Fiber_TD_(g) to Refuse_Pct.
I use:
dfi_3 = food_info.loc[:, ['Shrt_Desc', 'Fiber_TD_(g)':'Refuse_Pct']]
but it gives a syntax error.
Any ideas how can I achieve this?
Thank you.
Borrowing the main idea from this answer, and using .loc since .ix is deprecated:
pd.concat([food_info['Shrt_Desc'], food_info.loc[:, 'Fiber_TD_(g)':]], axis=1)
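A self-contained check of the concat approach on a toy frame (only the column names Shrt_Desc, Fiber_TD_(g) and Refuse_Pct come from the question; the extra column and all values are invented):

```python
import pandas as pd

# Toy stand-in for food_info: one leading column, a column to skip,
# then the tail range Fiber_TD_(g) .. Refuse_Pct
food_info = pd.DataFrame({"Shrt_Desc": ["rice", "corn"],
                          "skip_me": [0, 1],
                          "Fiber_TD_(g)": [1.2, 2.4],
                          "Refuse_Pct": [0.0, 10.0]})

# .loc cannot mix a list and a slice in a single selector, so
# concatenate the single column with the label-based slice
# (label slices include both endpoints)
dfi_3 = pd.concat([food_info[["Shrt_Desc"]],
                   food_info.loc[:, "Fiber_TD_(g)":"Refuse_Pct"]], axis=1)
print(list(dfi_3.columns))
# ['Shrt_Desc', 'Fiber_TD_(g)', 'Refuse_Pct']
```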