groupby and extract data [duplicate] - python

This question already has answers here:
Pandas groupby: How to get a union of strings
(8 answers)
Closed 3 years ago.
I'm new to pandas and I was able to create a dataframe from a CSV file. I was also able to sort it.
What I am struggling with now is the following; I give an image of a pandas data frame as an example.
First column is the index,
Second column is a group number
Third column is what happened.
Based on the second column, I want to extract the third-column values for each unique group number in the same data frame.
I highlight a few examples: for the group number 9, return the sequence
[60,61,70,51]
For the number 6, get back the sequence
[65,55,56]
For the number 8, get back the single element 8.
How can groupby be used to do this extraction?
Thanks a lot
Regards
Alex

Starting from the answers to this question, we can use the following code to get the desired result.
import pandas as pd

dataframe = pd.DataFrame({'index': [0, 1, 2, 3, 4], 'groupNumber': [9, 9, 9, 9, 9], 'value': [12, 13, 14, 15, 16]})
# Collect each group's 'value' entries into a list.
grouped = dataframe.groupby('groupNumber')['value'].apply(list)
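This makes grouped a Series indexed by group number whose values are lists, so each group's sequence can be looked up directly; a small usage sketch with the toy data above:
grouped.loc[9]
# [12, 13, 14, 15, 16]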

Related

how to group several columns into one column [duplicate]

This question already has answers here:
How do I transpose dataframe in pandas without index?
(3 answers)
Closed 11 months ago.
I am trying to analyze Chinese GDP by province. I want to make a line chart that shows GDP changing over time, but I cannot group the data.
I want to pivot the table, but it is not working the way I want.
But I want to make it like this:
It looks like you want to swap the x and y axes. Use transpose, which you can call with .T:
# Rows and columns switch places after transposing.
transposed_df = df_data.T
print(transposed_df)
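As a minimal sketch of the full workflow, assuming a hypothetical df_data with one row per province and one column per year, transposing puts the years on the index so each province becomes its own line over time:

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical GDP table (made-up numbers): rows are provinces, columns are years.
df_data = pd.DataFrame({'2018': [9.7, 9.3], '2019': [10.8, 10.0], '2020': [11.1, 10.3]},
                       index=['Guangdong', 'Jiangsu'])

# Transpose so years form the index, then plot one line per province.
df_data.T.plot(kind='line')
plt.show()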

How to find an index with maximum number of rows in Pandas? [duplicate]

This question already has answers here:
The first three max value in a column in python
(1 answer)
Count and Sort with Pandas
(5 answers)
Closed 3 years ago.
I am doing an online course which has a problem like "Find the name of the state with the maximum number of counties". The problem dataframe is shown in the image below:
Problem Dataframe
Now, I have given the dataframe two new indexes (hierarchical indexing), and after that it looks like the image below:
Modified Dataframe
I have used this code to get the modified dataframe:
def answer_five():
    new_df = census_df[census_df['SUMLEV'] == 50]
    new_df = new_df.set_index(['STNAME', 'CTYNAME'])
    return new_df

answer_five()
What I want to do now is find the name of the state with the largest number of counties, i.e. find the index value with the maximum number of rows. How can I do that?
I know this can be done with something like the groupby() method, but I'm not familiar with that method yet, so I don't want to use it. Can anyone help? I have searched for this but failed. Sorry if the problem is rudimentary. Thanks in advance.
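For reference, one way to do this without groupby() is to count how many rows each state contributes to the first index level and take the largest; a minimal sketch, assuming the modified dataframe returned by answer_five() above:

new_df = answer_five()
# Count rows per state via the 'STNAME' index level, then take the state with the largest count.
state_with_most_counties = new_df.index.get_level_values('STNAME').value_counts().idxmax()
print(state_with_most_counties)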

Filter dataframe by first order per customer [duplicate]

This question already has answers here:
Groupby first two earliest dates, then average time between first two dates - pandas
(3 answers)
Closed 3 years ago.
I would like some help to solve the following problem using Pandas in Python.
I have a dataframe of customer transactions, in random order, which contains the following columns along with their datatypes:
user_id object;
transaction_date datetime64[ns];
account_creation_date datetime64[ns];
transaction_id object;
I need to find a dataframe that contains all the first (chronological) transactions for every customer. The final dataframe should contain the same columns as the original one.
So far I have tried to use some "group by" together with aggregate functions, but I cannot seem to get the first transaction in chronological order rather than the first in order of appearance.
Any thoughts?
This will get you the earliest observation per customer:
df_first = df.sort_values('transaction_date').groupby('user_id').head(1)
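An equivalent approach, sketched here under the assumption that transaction_date has no missing values, is to pick the row with the earliest transaction_date per user_id via idxmin, which also keeps every original column:

# Row labels of each customer's earliest transaction, then select those rows.
df_first = df.loc[df.groupby('user_id')['transaction_date'].idxmin()]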

Count values in dataframe column that include specific parameters [duplicate]

This question already has answers here:
How to get the number of times a piece if word is inside a particular column in pandas?
(2 answers)
Closed 3 years ago.
I have a dataframe with ~150k rows:
Dataframe: Information about Salaries and Employees
I need to count specific values in the Job Title column of the dataframe, but it has to be a count of the values that include 'chief' somewhere within the job title.
I tried bringing up all the unique job titles with value_counts, but there are still too many for me to count.
print("%s employees have 'chief' in their job title." % salaries['JobTitle'].value_counts())
How can I create the specific condition I need to count the values correctly?
salaries['JobTitle'].str.contains('chief').sum()
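Note that str.contains is case-sensitive by default; if the job titles are capitalized differently (for example 'CHIEF'), pass case=False:

# Count job titles containing 'chief' in any capitalization.
count = salaries['JobTitle'].str.contains('chief', case=False).sum()
print("%s employees have 'chief' in their job title." % count)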

How to filter dataset to contain only specific keywords? [duplicate]

This question already has answers here:
Use a list of values to select rows from a Pandas dataframe
(8 answers)
Filter dataframe rows if value in column is in a set list of values [duplicate]
(7 answers)
Closed 4 years ago.
I have a dataset which contains multiple countries.
How can I filter it so that it contains only specific countries?
For example, right now it contains the UK, Belgium, France, etc.
I would like to filter it so that it shows only France and Belgium.
So far I have tried this:
dataset = dataset.loc[dataset.Country == "France"].copy()
dataset.head()
and it works, because it filters only the data for France, but if I add Belgium
dataset = dataset.loc[dataset.Country == "France","Belgium"].copy()
dataset.head()
It doesn't work any more.
I get the following error:
'the label [Belgium] is not in the [columns]'
Any help will be highly appreciated.
What you tried failed because loc treats "Belgium" as a column label to look for, and no such column exists. If you want to filter against multiple values, use isin:
dataset = dataset[dataset['Country'].isin(["France", "Belgium"])].copy()
When you use loc, the parameter after the comma is treated as a label to look for on the column axis.
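If you prefer to keep using loc, put the boolean isin mask in the row position and leave the column position alone; a short sketch with the same dataset:

mask = dataset['Country'].isin(["France", "Belgium"])
# The mask selects rows; anything after the comma in loc would select columns.
dataset = dataset.loc[mask].copy()
dataset.head()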
