I am new to Python and I want to access some rows for an already grouped dataframe (used groupby).
However, I am unable to select the row I want and would like your help.
The code I used for the groupby is shown below:
language_conversion = house_ads.groupby(['date_served', 'language_preferred']).agg(
    {'user_id': 'nunique', 'converted': 'sum'})
language_conversion
The result is a dataframe with a two-level index (date_served, language_preferred).
For example, I want to access the number of Spanish-speaking users who received house ads using:
language_conversion[('user_id','Spanish')]
gives me KeyError: ('user_id', 'Spanish'). I get the same error when I try to create a new column.
Thanks for your help
Use this:
language_conversion.loc[(slice(None), 'Spanish'), 'user_id']
You can see the indices (in this case, tuples of length 2) using language_conversion.index.
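For instance, on a frame grouped like the one above, the index looks something like this (the values here are invented for illustration):

language_conversion.index
# MultiIndex([('2018-01-01', 'English'),
#             ('2018-01-01', 'Spanish')],
#            names=['date_served', 'language_preferred'])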
You should use this:
language_conversion.loc[(slice(None),'Spanish'), 'user_id']
slice(None) here selects all rows in the date level of the index. If you have one particular date in mind, just replace slice(None) with that specific date.
The error you are getting is because you put the column label before the index labels, which is not the correct way of doing it: with .loc on a MultiIndexed dataframe, the row indexer (the tuple of index-level values) comes first, then the column label. See the pandas documentation on MultiIndex / advanced indexing to learn more.
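For illustration, here is a minimal runnable sketch of the whole pattern; the data is invented and only the column names come from the question:

import pandas as pd

# Invented stand-in for the question's house_ads data
house_ads = pd.DataFrame({
    'date_served': ['2018-01-01', '2018-01-01', '2018-01-02'],
    'language_preferred': ['Spanish', 'English', 'Spanish'],
    'user_id': ['a1', 'b2', 'c3'],
    'converted': [1, 0, 1],
})

language_conversion = house_ads.groupby(
    ['date_served', 'language_preferred']).agg(
    {'user_id': 'nunique', 'converted': 'sum'})

# Row indexer first (a tuple covering both index levels), then the column:
print(language_conversion.loc[(slice(None), 'Spanish'), 'user_id'])

# A specific date instead of slice(None):
print(language_conversion.loc[('2018-01-01', 'Spanish'), 'user_id'])

# Creating a new column works the same way once the selection is right:
language_conversion['conversion_rate'] = (
    language_conversion['converted'] / language_conversion['user_id'])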
Trying to find all URLs with ResponseCode 200 using pandas, by grouping the dataframe.
Below is my code, which gives the error message below:
ValueError: must supply a tuple to get_group with multiple grouping keys
url_response_grouped = log_df.groupby(['URL','ResponseCode'])
url_response_grouped.ngroups
url_response_grouped.groups.keys()
url_response_grouped.get_group('URL','200')
Well, now I see: you don't really need to use groupby() with two columns to see all the URLs with ResponseCode 200. You just have to do:
url_response_grouped = log_df.groupby('ResponseCode')
And then:
url_response_grouped.get_group(200)
The following code won't work: with multiple grouping keys, get_group expects a single tuple, and in any case your URL values are actual URLs, none of which is the literal string 'URL'.
url_response_grouped.get_group('URL','200')
Although the problem is already solved, I still want to provide a solution to the tuple issue. If you do want to group by multiple columns and get a specific group, do:
grouped = randomfiles.groupby(['name', 'age'])
grouped.get_group(('Alice', 13))
Hope it will help someone in the future!
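A tiny self-contained sketch of the same idea (the data is invented for illustration):

import pandas as pd

randomfiles = pd.DataFrame({
    'name': ['Alice', 'Alice', 'Bob'],
    'age': [13, 13, 40],
    'score': [90, 85, 70],
})

grouped = randomfiles.groupby(['name', 'age'])
# With multiple grouping keys, get_group needs one tuple, not separate arguments:
print(grouped.get_group(('Alice', 13)))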
I'm trying to select a set of rows in a specific timeframe. The dataframe has two indexes, one of which is made of datetimes (created with pd.to_datetime). When I try to select certain rows using df_pivot.loc[slice(None), '2021'] I get KeyError: '2021'. Looking up rows by year should be possible with datetimes, right? What am I doing wrong?
Problem solved: I used reset_index() and then set_index('Datetime') to make it easier to navigate.
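For anyone hitting the same wall, a rough sketch of that fix; the column names and data are invented. Note that partial string indexing like .loc['2021'] works once the DatetimeIndex is the sole index:

import pandas as pd

df_pivot = pd.DataFrame({
    'Datetime': pd.to_datetime(['2020-12-31', '2021-03-01', '2021-07-15']),
    'sensor': ['a', 'b', 'a'],
    'value': [1.0, 2.0, 3.0],
}).set_index(['sensor', 'Datetime'])

# Flatten the MultiIndex, then index by the datetime column alone
df_flat = df_pivot.reset_index().set_index('Datetime')

# Partial string indexing now selects the whole year
print(df_flat.loc['2021'])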
I want to get a count of how many rows within a column contain a partial string, based on an imported dataframe. In the sample data below, I want to group by Trans_type and then count how many rows contain a matching value.
So I would expect to see:
First, is this possible generically, without passing a list of each type's expected brands? If not, how could I pass, say, Car a list like .str.contains(['Audi', 'BMW'])?
Thanks for any help!
Try this one:
df.groupby([df["Trans_type"], df["Brand"].str.extract("([a-zA-Z]+)", expand=False)]).count()
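If the goal is specifically "how many rows per Trans_type contain one of the expected brands", here is a hedged sketch; the column names come from the question, while the data and the brand list are invented:

import pandas as pd

df = pd.DataFrame({
    'Trans_type': ['Car', 'Car', 'Truck'],
    'Brand': ['Audi A4', 'Ford Focus', 'Volvo FH'],
})

# str.contains takes a regex pattern, so join the brand list with '|'
brands = ['Audi', 'BMW']
pattern = '|'.join(brands)

# True/False per row, summed per group = count of matching rows
counts = df['Brand'].str.contains(pattern).groupby(df['Trans_type']).sum()
print(counts)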
I am very new to pandas but making progress...
I have the following dataframe:
I want to do a count of the number of events that have happened by Month/Year, which I believe would produce something like the below:
I have tried the following, based on an article I found:
group = df.groupby(['MonthYear', 'EventID']).count()
frequency = group['EventID'].groupby(level=0, group_keys=False)
print(frequency)
I then get an error (using VS Code) that states:
unable to open 'hashtable_class_helper.pxi'
I have had this before and it is usually when I have used the wrong case for my column names but I have verified they are correct.
Where am I going wrong?
You can use:
frequency = df.groupby('MonthYear')['EventID'].value_counts()
See the documentation for more details.
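On a toy frame (data invented), that one-liner produces counts per MonthYear:

import pandas as pd

df = pd.DataFrame({
    'MonthYear': ['2020-01', '2020-01', '2020-02'],
    'EventID': ['E1', 'E1', 'E2'],
})

frequency = df.groupby('MonthYear')['EventID'].value_counts()
print(frequency)
# MonthYear  EventID
# 2020-01    E1         2
# 2020-02    E2         1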
You could try aggregation on top of groupby:
df.groupby('MonthYear').agg({'EventID': 'count'})
I am trying to delete the first two rows from my dataframe df and am using the answer suggested in this post. However, I get the error AttributeError: Cannot access callable attribute 'ix' of 'DataFrameGroupBy' objects, try using the 'apply' method, and I don't know how to do this with the apply method. I've shown the relevant lines of code below:
df = df.groupby('months_to_maturity')
df = df.ix[2:]
Edit: Sorry, when I said I want to delete the first two rows, I should have said I want to delete the first two rows associated with each months_to_maturity value.
Thank You
That is what tail(-2) will do. However, groupby.tail in older pandas versions does not accept a negative value, so it needs a tweak:
df.groupby('months_to_maturity').apply(lambda x: x.tail(-2))
This will give you the desired dataframe, but its index is now a MultiIndex.
If you just want to drop those rows from df, use drop like this:
df.drop(df.groupby('months_to_maturity').head(2).index)
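A small sketch showing both variants side by side (data invented):

import pandas as pd

df = pd.DataFrame({
    'months_to_maturity': [1, 1, 1, 2, 2, 2],
    'price': [10, 11, 12, 20, 21, 22],
})

# Variant 1: keep everything after the first two rows of each group
# (the result has a MultiIndex)
kept = df.groupby('months_to_maturity').apply(lambda x: x.tail(-2))
print(kept)

# Variant 2: drop the first two rows of each group, keeping df's own index
dropped = df.drop(df.groupby('months_to_maturity').head(2).index)
print(dropped)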