I am new to Python and I want to access some rows for an already grouped dataframe (used groupby).
However, I am unable to select the row I want and would like your help.
The code I used for the groupby is shown below:
language_conversion = house_ads.groupby(['date_served', 'language_preferred']).agg(
    {'user_id': 'nunique', 'converted': 'sum'})
language_conversion
The result is a dataframe with a two-level index (date_served, language_preferred).
For example, I want to access the number of Spanish-speaking users who received house ads using:
language_conversion[('user_id','Spanish')]
gives me KeyError: ('user_id', 'Spanish'). I get the same error when I try to create a new column.
Thanks for your help
Use this:
language_conversion.loc[(slice(None), 'Spanish'), 'user_id']
You can see the indices (in this case, tuples of length 2) using language_conversion.index.
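For instance, on a frame grouped like the one above, the index looks something like this (the values here are invented for illustration):

language_conversion.index
# MultiIndex([('2018-01-01', 'English'),
#             ('2018-01-01', 'Spanish')],
#            names=['date_served', 'language_preferred'])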
You should use this:
language_conversion.loc[(slice(None),'Spanish'), 'user_id']
slice(None) here selects all rows in the date level of the index. If you have one particular date in mind, just replace slice(None) with that specific date.
The error you are getting is because you put the column label before the index labels, which is not the correct way of doing it: with .loc on a MultiIndexed dataframe, the row indexer (the tuple of index-level values) comes first, then the column label. See the pandas documentation on MultiIndex / advanced indexing to learn more.
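For illustration, here is a minimal runnable sketch of the whole pattern; the data is invented and only the column names come from the question:

import pandas as pd

# Invented stand-in for the question's house_ads data
house_ads = pd.DataFrame({
    'date_served': ['2018-01-01', '2018-01-01', '2018-01-02'],
    'language_preferred': ['Spanish', 'English', 'Spanish'],
    'user_id': ['a1', 'b2', 'c3'],
    'converted': [1, 0, 1],
})

language_conversion = house_ads.groupby(
    ['date_served', 'language_preferred']).agg(
    {'user_id': 'nunique', 'converted': 'sum'})

# Row indexer first (a tuple covering both index levels), then the column:
print(language_conversion.loc[(slice(None), 'Spanish'), 'user_id'])

# A specific date instead of slice(None):
print(language_conversion.loc[('2018-01-01', 'Spanish'), 'user_id'])

# Creating a new column works the same way once the selection is right:
language_conversion['conversion_rate'] = (
    language_conversion['converted'] / language_conversion['user_id'])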
Trying to find all URLs with ResponseCode 200 using pandas, by grouping the dataframe.
Below is my code, which gives the error message below:
ValueError: must supply a tuple to get_group with multiple grouping keys
url_response_grouped = log_df.groupby(['URL','ResponseCode'])
url_response_grouped.ngroups
url_response_grouped.groups.keys()
url_response_grouped.get_group('URL','200')
Well, now I see: you don't really need to use groupby() with two columns to see all the URLs with ResponseCode 200. You just have to do:
url_response_grouped = log_df.groupby('ResponseCode')
And then:
url_response_grouped.get_group(200)
The following code won't work: with multiple grouping keys, get_group expects a single tuple, and in any case your URL values are actual URLs, none of which is the literal string 'URL'.
url_response_grouped.get_group('URL','200')
Although the problem is already solved, I still want to provide a solution to the tuple issue. If you do want to group by multiple columns and get a specific group, do:
grouped = randomfiles.groupby(['name', 'age'])
grouped.get_group(('Alice', 13))
Hope it will help someone in the future!
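A tiny self-contained sketch of the same idea (the data is invented for illustration):

import pandas as pd

randomfiles = pd.DataFrame({
    'name': ['Alice', 'Alice', 'Bob'],
    'age': [13, 13, 40],
    'score': [90, 85, 70],
})

grouped = randomfiles.groupby(['name', 'age'])
# With multiple grouping keys, get_group needs one tuple, not separate arguments:
print(grouped.get_group(('Alice', 13)))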
I'm trying to select a set of rows in a specific timeframe. The dataframe has two indexes, one of which is made of datetimes (created with pd.to_datetime). When I try to select certain rows using df_pivot.loc[slice(None), '2021'] I get KeyError: '2021'. Looking up rows by year should be possible with datetimes, right? What am I doing wrong?
Problem solved: I used reset_index() and then set_index('Datetime') to make it easier to navigate.
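For anyone hitting the same wall, a rough sketch of that fix; the column names and data are invented. Note that partial string indexing like .loc['2021'] works once the DatetimeIndex is the sole index:

import pandas as pd

df_pivot = pd.DataFrame({
    'Datetime': pd.to_datetime(['2020-12-31', '2021-03-01', '2021-07-15']),
    'sensor': ['a', 'b', 'a'],
    'value': [1.0, 2.0, 3.0],
}).set_index(['sensor', 'Datetime'])

# Flatten the MultiIndex, then index by the datetime column alone
df_flat = df_pivot.reset_index().set_index('Datetime')

# Partial string indexing now selects the whole year
print(df_flat.loc['2021'])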
I want to get a count of how many rows within a column contain a partial string, based on an imported dataframe. In the sample data below, I want to group by Trans_type and then count how many rows contain a matching value.
So I would expect to see:
First, is this possible generically, without passing a list of each type's expected brands? If not, how could I pass, say, Car a list like .str.contains(['Audi', 'BMW'])?
Thanks for any help!
Try this one:
df.groupby([df["Trans_type"], df["Brand"].str.extract("([a-zA-Z]+)", expand=False)]).count()
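If the goal is specifically "how many rows per Trans_type contain one of the expected brands", here is a hedged sketch; the column names come from the question, while the data and the brand list are invented:

import pandas as pd

df = pd.DataFrame({
    'Trans_type': ['Car', 'Car', 'Truck'],
    'Brand': ['Audi A4', 'Ford Focus', 'Volvo FH'],
})

# str.contains takes a regex pattern, so join the brand list with '|'
brands = ['Audi', 'BMW']
pattern = '|'.join(brands)

# True/False per row, summed per group = count of matching rows
counts = df['Brand'].str.contains(pattern).groupby(df['Trans_type']).sum()
print(counts)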
I am very new to pandas but making progress...
I have the following dataframe:
I want to do a count of the number of events that have happened by Month/Year, which I believe would produce something like the below:
I have tried the following, based on an article I found:
group = df.groupby(['MonthYear', 'EventID']).count()
frequency = group['EventID'].groupby(level=0, group_keys=False)
print(frequency)
I then get an error (using VS Code) that states:
unable to open 'hashtable_class_helper.pxi'
I have had this before and it is usually when I have used the wrong case for my column names but I have verified they are correct.
Where am I going wrong?
You can use:
frequency = df.groupby('MonthYear')['EventID'].value_counts()
See the documentation for more details.
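On a toy frame (data invented), that one-liner produces counts per MonthYear:

import pandas as pd

df = pd.DataFrame({
    'MonthYear': ['2020-01', '2020-01', '2020-02'],
    'EventID': ['E1', 'E1', 'E2'],
})

frequency = df.groupby('MonthYear')['EventID'].value_counts()
print(frequency)
# MonthYear  EventID
# 2020-01    E1         2
# 2020-02    E2         1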
You could try aggregation on top of groupby:
df.groupby('MonthYear').agg({'EventID': 'count'})
I am trying to delete the first two rows from my dataframe df and am using the answer suggested in this post. However, I get the error AttributeError: Cannot access callable attribute 'ix' of 'DataFrameGroupBy' objects, try using the 'apply' method, and I don't know how to do this with the apply method. I've shown the relevant lines of code below:
df = df.groupby('months_to_maturity')
df = df.ix[2:]
Edit: Sorry, when I said I want to delete the first two rows, I should have said I want to delete the first two rows associated with each months_to_maturity value.
Thank You
That is what tail(-2) will do. However, groupby.tail in older pandas versions does not accept a negative value, so it needs a tweak:
df.groupby('months_to_maturity').apply(lambda x: x.tail(-2))
This will give you the desired dataframe, but its index is now a MultiIndex.
If you just want to drop those rows from df, use drop like this:
df.drop(df.groupby('months_to_maturity').head(2).index)
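A small sketch showing both variants side by side (data invented):

import pandas as pd

df = pd.DataFrame({
    'months_to_maturity': [1, 1, 1, 2, 2, 2],
    'price': [10, 11, 12, 20, 21, 22],
})

# Variant 1: keep everything after the first two rows of each group
# (the result has a MultiIndex)
kept = df.groupby('months_to_maturity').apply(lambda x: x.tail(-2))
print(kept)

# Variant 2: drop the first two rows of each group, keeping df's own index
dropped = df.drop(df.groupby('months_to_maturity').head(2).index)
print(dropped)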