This question already has answers here:
How to get the number of times a piece of word is inside a particular column in pandas?
(2 answers)
Closed 3 years ago.
I have a dataframe with ~150k rows:
Dataframe: Information about Salaries and Employees
I need to count specific values in the Job Title column of the dataframe, but it has to be a count of the values that include 'chief' somewhere within the job title.
I tried bringing up all the unique job titles with value_counts, but there are still too many for me to count:
print("%s employees have 'chief' in their job title." % salaries['JobTitle'].value_counts())
How can I create the specific condition I need to count the values correctly?
salaries['JobTitle'].str.contains('chief').sum()
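Note that `str.contains` is case-sensitive by default, and returns NaN for missing titles. A minimal sketch (the toy data below stands in for the salaries dataframe from the question; the column name `JobTitle` is taken from it):

```python
import pandas as pd

# Toy data standing in for the salaries dataframe from the question.
salaries = pd.DataFrame({
    'JobTitle': ['CHIEF OF POLICE', 'Deputy Chief', 'Clerk', None]
})

# case=False matches 'chief' regardless of capitalization;
# na=False treats missing titles as non-matches instead of NaN.
n_chief = salaries['JobTitle'].str.contains('chief', case=False, na=False).sum()
print("%s employees have 'chief' in their job title." % n_chief)
```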
This question already has answers here:
pyspark: grouby and then get max value of each group
(2 answers)
Closed 1 year ago.
I have a DataFrame like the one shown (I am working with tables in the form of an RDD).
I would like to get a table with the maximum order value for each country, along with the customer number of that customer.
I have no idea how to construct the map function. Is there a better way?
With PySpark:
df.groupBy('customernumber', 'city').max('sum_of_orders')
With Pandas:
df.groupby(['customernumber', 'city'])['sum_of_orders'].max()
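Note that the question also asks for the customer number belonging to the maximum order; the `max` aggregations above return only the order value. A hedged sketch using `idxmax` to recover the full row (the column names and toy values below are assumptions based on the question):

```python
import pandas as pd

# Hypothetical data with the column names from the question.
df = pd.DataFrame({
    'customernumber': [1, 2, 3, 4],
    'city': ['Oslo', 'Oslo', 'Bergen', 'Bergen'],
    'sum_of_orders': [100, 250, 80, 40],
})

# idxmax gives the row label of the largest order per city,
# so .loc recovers the whole row, customer number included.
top = df.loc[df.groupby('city')['sum_of_orders'].idxmax()]
```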
This question already has answers here:
Count the frequency that a value occurs in a dataframe column
(15 answers)
Closed 1 year ago.
Let's say my data frame has a column 'genus' containing the values 'seed', 'flame', 'turtle', and 'shellfish', with 5 rows of each.
How can I get Python to count each category and print a total for each one? I want a function that does this automatically for every unique category in the column 'genus', without having to parse out each unique category by hand.
I would want to see something like:
seed: 5
flame: 5
turtle: 5
shellfish: 5
(yes, this is a Pokemon dataset 😂)
This should work,
df['genus'].value_counts()
or
df.genus.value_counts()
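A minimal, runnable sketch (the values mirror the question; the data itself is made up):

```python
import pandas as pd

# Hypothetical Pokemon-style data mirroring the question.
df = pd.DataFrame({'genus': ['seed', 'flame', 'turtle', 'shellfish'] * 5})

# value_counts returns a Series indexed by category with the count of each.
counts = df['genus'].value_counts()
print(counts)
```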
This question already has answers here:
Pandas get topmost n records within each group
(6 answers)
Closed 3 years ago.
Given a pandas dataframe with company purchases across various months in a year, how do I find the "N" highest each month?
Currently have:
df.groupby(df['Transaction Date'].dt.strftime('%B'))['Amount'].max()
This returns the highest value for each month, but I would like to see the highest four values.
Am I getting close here or is there a more efficient approach? Thanks in advance
With sort_values then tail:
yourdf = df.sort_values('Amount').groupby(df['Transaction Date'].dt.strftime('%B'))['Amount'].tail(4)
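An equivalent sketch using `nlargest`, which states the intent directly (the column names come from the question; the data is synthetic):

```python
import pandas as pd

# Small synthetic purchases table with the column names from the question.
df = pd.DataFrame({
    'Transaction Date': pd.to_datetime(
        ['2020-01-05', '2020-01-12', '2020-01-20',
         '2020-01-25', '2020-01-28', '2020-02-03']),
    'Amount': [10, 50, 30, 70, 20, 5],
})

month = df['Transaction Date'].dt.strftime('%B')
# nlargest(4) per month is equivalent to sort_values + tail(4);
# groups with fewer than four rows simply return all of them.
top4 = df.groupby(month)['Amount'].nlargest(4)
```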
This question already has answers here:
Groupby first two earliest dates, then average time between first two dates - pandas
(3 answers)
Closed 3 years ago.
I would like some help to solve the following problem using Pandas in Python.
I have a dataframe about the customers' transactions - in random order, which contains the following columns, along with datatypes:
user_id object;
transaction_date datetime64[ns];
account_creation_date datetime64[ns];
transaction_id object;
I need to find a dataframe that contains all the first (chronological) transactions for every customer. The final dataframe should contain the same columns as the original one.
So far I have tried some "group by" operations together with aggregate functions, but I keep getting the first transaction in order of appearance instead of the first in chronological order.
Any thoughts?
This will get you the earliest observation per customer:
df_first = df.sort_values('transaction_date').groupby('user_id').head(1)
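A self-contained sketch of that one-liner, using made-up rows with the exact columns and dtypes listed in the question:

```python
import pandas as pd

# Hypothetical transactions, deliberately out of chronological order.
df = pd.DataFrame({
    'user_id': ['a', 'a', 'b'],
    'transaction_date': pd.to_datetime(['2021-05-02', '2021-01-01', '2021-03-03']),
    'account_creation_date': pd.to_datetime(['2020-01-01'] * 3),
    'transaction_id': ['t1', 't2', 't3'],
})

# Sorting by date first guarantees head(1) picks the chronologically
# first transaction per customer, not the first by appearance.
df_first = df.sort_values('transaction_date').groupby('user_id').head(1)
```

Because `head(1)` keeps whole rows, the result retains all the original columns, as required.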
This question already has answers here:
Pandas groupby: How to get a union of strings
(8 answers)
Closed 3 years ago.
I'm new to pandas. I was able to create a dataframe from a CSV file, and I was also able to sort it.
What I am struggling with now is the following (I give an image of a pandas data frame as an example):
the first column is the index,
the second column is a group number,
the third column is what happened.
Based on the second column, I want to extract the corresponding values of the third column from the same data frame.
I highlight a few examples: for the number 9, return the sequence
[60,61,70,51]
For the number 6 get back the sequence
[65,55,56]
For the number 8 get back the single element 8.
How groupby can be used to do this extraction?
Thanks a lot
Regards
Alex
Starting from the answers to this question, we can use the following code to get the desired result:
dataframe = pd.DataFrame({'index':[0,1,2,3,4], 'groupNumber':[9,9,9,9,9], 'value':[12,13,14,15,16]})
grouped = dataframe.groupby('groupNumber')['value'].apply(list)
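To pull out a single group's sequence, index the resulting Series by group number. A runnable sketch (same toy data as above):

```python
import pandas as pd

dataframe = pd.DataFrame({'index': [0, 1, 2, 3, 4],
                          'groupNumber': [9, 9, 9, 9, 9],
                          'value': [12, 13, 14, 15, 16]})

# apply(list) collapses each group's 'value' column into a Python list,
# giving a Series indexed by groupNumber.
grouped = dataframe.groupby('groupNumber')['value'].apply(list)

# .loc selects one group's sequence by its group number.
sequence = grouped.loc[9]
```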