This question already has answers here:
Specifying column order following groupby aggregation
(2 answers)
Closed 5 years ago.
My DataFrame is
State|City|Year|Budget|Income
S1|C1|2000|1000|1
S1|C2|2000|1200|2
S2|C3|2000|5500|3
I need to get a new DataFrame with columns State, Year, Count, Sum_Budget, and Sum_Income. That is,
State|Year|Count|Sum_Budget|Sum_Income
S1|2000|2|2200|3
S2|2000|1|5500|3
In C#, the code would be:
dataframe
    .GroupBy(x => new { x.State, x.Year })
    .Select(x => new {
        x.Key.State,
        x.Key.Year,
        Count = x.Count(),
        Sum_Budget = x.Sum(y => y.Budget),
        Sum_Income = x.Sum(y => y.Income)
    })
    .ToArray();
How do I do so with Pandas?
Use agg:
d = {'Income':'Sum_Income','Budget':'Sum_Budget','City':'Count'}
agg_d = {'Budget':'sum', 'Income':'sum', 'City':'size'}
df = df.groupby(['State', 'Year'], as_index=False).agg(agg_d).rename(columns=d)
print(df)
State Year Sum_Income Sum_Budget Count
0 S1 2000 3 2200 2
1 S2 2000 3 5500 1
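On pandas 0.25+, named aggregation avoids the separate rename step and controls the output column order directly. A minimal sketch, using the sample data from the question:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'State': ['S1', 'S1', 'S2'],
    'City': ['C1', 'C2', 'C3'],
    'Year': [2000, 2000, 2000],
    'Budget': [1000, 1200, 5500],
    'Income': [1, 2, 3],
})

# Named aggregation: output_column=(input_column, aggregation)
out = df.groupby(['State', 'Year'], as_index=False).agg(
    Count=('City', 'size'),
    Sum_Budget=('Budget', 'sum'),
    Sum_Income=('Income', 'sum'),
)
print(out)
```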
This question already has answers here:
How do I melt a pandas dataframe?
(3 answers)
Closed 7 days ago.
I would like to change the arrangement of this table:
import pandas as pd
original_dict = {
    "group A": [10, 9, 11],
    "group B": [23, 42, 56]
}
original_df = pd.DataFrame(original_dict)
original_df
Here is the desired output:
  Group Type  Value
0    group A     10
1    group A      9
2    group A     11
3    group B     23
4    group B     42
5    group B     56
Thank you!
You can use the pandas melt function:
https://pandas.pydata.org/docs/reference/api/pandas.melt.html
df = pd.melt(original_df)
df.columns=['Group Type', 'Value']
df
  Group Type  Value
0    group A     10
1    group A      9
2    group A     11
3    group B     23
4    group B     42
5    group B     56
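The renaming can also be done inside melt itself via the var_name and value_name parameters, so the separate columns assignment is not needed. A minimal sketch:

```python
import pandas as pd

original_df = pd.DataFrame({
    "group A": [10, 9, 11],
    "group B": [23, 42, 56],
})

# var_name/value_name name the output columns directly
df = original_df.melt(var_name="Group Type", value_name="Value")
print(df)
```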
This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 1 year ago.
I need some help.
I have the following CSV file loaded into a DataFrame:
How could I move the data in cases into columns week 1, week 2, (...) using Python and Pandas?
It would be something like this:
x = (
df.pivot_table(
index=["city", "population"],
columns="week",
values="cases",
aggfunc="max",
)
.add_prefix("week ")
.reset_index()
.rename_axis("", axis=1)
)
print(x)
Prints:
city population week 1 week 2
0 x 50000 5 10
1 y 88000 2 15
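Since the original CSV is not shown, here is a self-contained version of the same pivot with input data reconstructed from the printed output (the long-format column layout is an assumption):

```python
import pandas as pd

# Long-format input reconstructed from the printed result (assumed)
df = pd.DataFrame({
    "city": ["x", "x", "y", "y"],
    "population": [50000, 50000, 88000, 88000],
    "week": [1, 2, 1, 2],
    "cases": [5, 10, 2, 15],
})

x = (
    df.pivot_table(
        index=["city", "population"],
        columns="week",
        values="cases",
        aggfunc="max",
    )
    .add_prefix("week ")       # columns 1, 2 -> "week 1", "week 2"
    .reset_index()
    .rename_axis("", axis=1)   # drop the leftover "week" columns name
)
print(x)
```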
This question already has answers here:
Multi Index Sorting in Pandas
(2 answers)
Closed 2 years ago.
I have a dataframe and I aggregated it as below. I want to sort it (descending) according to 'mean'. I am using the code below, but it gives an error.
df_agg = df.groupby('Subject Field').agg({'seniority_level':['min','mean','median','max']})
df_agg.sort_values(by='mean',ascending=False).head(10)
Error
Your aggregated dataframe has a multi-level column index, so you need to address the column by specifying both seniority_level and mean:
df_agg.sort_values(('seniority_level', 'mean'), ascending=False)
Quick check to demonstrate:
df = pd.DataFrame({
'Accounting': [1, 2, 3],
'Acoustics': [4, 5, 6],
}).melt(var_name='Subject Field', value_name='seniority_level')
df_agg = df.groupby('Subject Field').agg(
{'seniority_level':['min', 'mean', 'median']}
)
df_agg.sort_values(('seniority_level','mean'), ascending=True)
seniority_level
min mean median
Subject Field
Accounting 1 2 2
Acoustics 4 5 5
df_agg.sort_values(('seniority_level','mean'), ascending=False)
seniority_level
min mean median
Subject Field
Acoustics 4 5 5
Accounting 1 2 2
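If the MultiIndex is not needed at all, named aggregation (pandas 0.25+) produces flat column labels, so a plain column name works in sort_values. A sketch using the same demo data:

```python
import pandas as pd

df = pd.DataFrame({
    'Accounting': [1, 2, 3],
    'Acoustics': [4, 5, 6],
}).melt(var_name='Subject Field', value_name='seniority_level')

# One flat output column per named aggregation
df_agg = df.groupby('Subject Field').agg(
    min_seniority=('seniority_level', 'min'),
    mean_seniority=('seniority_level', 'mean'),
    median_seniority=('seniority_level', 'median'),
)

print(df_agg.sort_values('mean_seniority', ascending=False))
```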
This question already has answers here:
Pandas, Pivot table from 2 columns with values being a count of one of those columns
(2 answers)
Most efficient way to melt dataframe with a ton of possible values pandas
(2 answers)
How to form a pivot table on two categorical columns and count for each index?
(2 answers)
Closed 2 years ago.
I am trying to transform the rows and count the occurrences of the values, grouped by the id.
Dataframe:
id value
A cake
A cookie
B cookie
B cookie
C cake
C cake
C cookie
expected:
id cake cookie
A 1 1
B 0 2
C 2 1
I want to perform a groupby on an account ID, count the values within each group, and add the counts as new columns.
How can I do this in pandas?
Eg:
Account Id Values
1 Open
2 Closed
1 Open
3 Closed
2 Open
Output must be:
Account Id Open Closed
1 2 0
2 1 1
3 0 1
Use a groupby and value_counts to get the initial counts. Then unstack the MultiIndex to get a DataFrame, and set null values to 0 for the final result:
import pandas as pd

# Define the DataFrame
df = pd.DataFrame({
    'Account Id': [1, 2, 1, 3, 2],
    'Values': ['Open', 'Closed', 'Open', 'Closed', 'Open'],
})

grouped = df.groupby('Account Id')['Values'].value_counts()

# Remove the MultiIndex by pivoting the inner level into columns
grouped = grouped.unstack()

# Set null values to 0 (and restore integer dtype)
result = grouped.fillna(0).astype(int)
Output of result:
Closed Open
Account Id
1 0 2
2 1 1
3 1 0
This also returns the counts as a DataFrame, in long format, from a groupby object:
grouped_df = df.groupby(["Account Id", "Values"])
grouped_df.size().reset_index(name="Count")
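pd.crosstab does the group-and-count in one call and fills missing combinations with 0, which matches the expected output from the first question directly. A minimal sketch with that sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["A", "A", "B", "B", "C", "C", "C"],
    "value": ["cake", "cookie", "cookie", "cookie", "cake", "cake", "cookie"],
})

# Rows: unique ids; columns: unique values; cells: counts (0 where absent)
out = pd.crosstab(df["id"], df["value"]).reset_index().rename_axis(None, axis=1)
print(out)
```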