Sum by year and total_vehicles pandas dataframe - python

I have the following dataframe lrdata3, and I would like to sum total_vehicles for every year so that each year appears once instead of in multiple separate rows.
year total_vehicles
0 2000 2016
1 2000 1483
2 2000 1275
3 2000 1086
4 2000 816
When I do this
lrdata3.groupby('year')['total_vehicles'].sum()
I get this, which is not even a DataFrame:
year
2000 419587299
2001 425832533
2002 430480581
2003 434270003
2004 442680113
2005 443366960
2006 452086899
2007 452280161
2008 445462026
2009 443333980
2010 438827716
2011 440461505
2012 440073277
2013 441751395
2014 451394270
2015 460050397
2016 470256985
2017 474693803
2018 473765568
Any help please?
Thanks

You can do it in one line and get a df with this syntax.
Some sample data:
year total_vehicles
0 2000 2016
1 2000 1483
2 2000 1275
3 2000 1086
4 2000 816
5 2001 2016
6 2001 1483
7 2001 1275
8 2002 1086
9 2002 816
import pandas as pd
# Copy the sample data above to the clipboard before running this
df = pd.read_clipboard()
gb = df.groupby('year').agg({'total_vehicles': 'sum'})
print(gb)
total_vehicles
year
2000 6676
2001 4774
2002 1902
print(type(gb))
<class 'pandas.core.frame.DataFrame'>

Your code is fine, just add a .reset_index() to it. Like this:
lrdata3.groupby('year')['total_vehicles'].sum().reset_index()
This will get you what you want.

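You can convert the result to a DataFrame with to_frame():
lrdata3.groupby('year')['total_vehicles'].sum().to_frame()
or use groupby and transform, which keeps every original row and adds the yearly total as a new column:
lrdata3['yearlytotal_vehicles']=lrdata3.groupby('year')['total_vehicles'].transform('sum')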
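
To compare the options from these answers side by side, here is a minimal sketch. The sample frame is made up for illustration and only reuses the question's column names; `as_index=False` is an extra variant not mentioned above.

```python
import pandas as pd

# Hypothetical sample data reusing the question's column names
lrdata3 = pd.DataFrame({
    "year": [2000, 2000, 2000, 2001, 2001, 2002],
    "total_vehicles": [2016, 1483, 1275, 2016, 1483, 1086],
})

# Plain groupby + sum returns a Series indexed by year
as_series = lrdata3.groupby("year")["total_vehicles"].sum()

# reset_index() turns it into a DataFrame with 'year' as a regular column
as_columns = as_series.reset_index()

# to_frame() also returns a DataFrame, but keeps 'year' as the index
as_frame = as_series.to_frame()

# as_index=False produces the same layout as reset_index() in one call
direct = lrdata3.groupby("year", as_index=False)["total_vehicles"].sum()

# transform('sum') keeps every original row and repeats the yearly total on each
lrdata3["yearlytotal_vehicles"] = (
    lrdata3.groupby("year")["total_vehicles"].transform("sum")
)

print(as_columns)
```

Which form to pick mostly depends on whether you want one row per year (the first three) or the total attached to each original row (transform).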


Melt a Pandas Dataframe with multiple columns [duplicate]

I wanted to know if there's a way to melt a DataFrame with multiple column names.
I have this Pandas Data Frame:
Edad 2000 2001 2002 2003 ... 2017 2018 2019 2020
...
[15-25] 126675 158246 171958 188389 ... 78707 70246 65661 52209
(25-35] 65823 85059 92841 95394 ... 88479 157492 149862 122067
(35-45] 37474 48605 54593 56279 ... 65870 65798 64587 51502
(45-55] 20624 22067 25860 27601 ... 39476 40725 40566 33979
(55-65] 30240 9047 10500 10972 ... 20135 21095 21173 17242
And would like to have something like this:
Edad Year Value
[15-25] 2000 126675
[15-25] 2001 158246
[15-25] 2002 171958
[15-25] 2003 188389
I've used melt before, but I always specified a value column. This time my values are spread across the cells of the year columns, and I'm having a very hard time figuring out how to address them.
You can use melt with groupby and sort like this:
df.melt(id_vars='Edad', var_name='Year').groupby(['Edad','Year']).agg({'value':'first'}).reset_index().sort_values(by=['Edad','Year'], ascending=[False,True])
Desired results:
Edad Year value
32 [15-25] 2000 126675
33 [15-25] 2001 158246
34 [15-25] 2002 171958
35 [15-25] 2003 188389
36 [15-25] 2017 78707
37 [15-25] 2018 70246
38 [15-25] 2019 65661
39 [15-25] 2020 52209
24 (55-65] 2000 30240
25 (55-65] 2001 9047
26 (55-65] 2002 10500
27 (55-65] 2003 10972
28 (55-65] 2017 20135
29 (55-65] 2018 21095
30 (55-65] 2019 21173
31 (55-65] 2020 17242
16 (45-55] 2000 20624
17 (45-55] 2001 22067
18 (45-55] 2002 25860
19 (45-55] 2003 27601
20 (45-55] 2017 39476
21 (45-55] 2018 40725
22 (45-55] 2019 40566
23 (45-55] 2020 33979
8 (35-45] 2000 37474
9 (35-45] 2001 48605
10 (35-45] 2002 54593
11 (35-45] 2003 56279
12 (35-45] 2017 65870
13 (35-45] 2018 65798
14 (35-45] 2019 64587
15 (35-45] 2020 51502
0 (25-35] 2000 65823
1 (25-35] 2001 85059
2 (25-35] 2002 92841
3 (25-35] 2003 95394
4 (25-35] 2017 88479
5 (25-35] 2018 157492
6 (25-35] 2019 149862
7 (25-35] 2020 122067
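
As a smaller illustration of the same idea, the sketch below melts a made-up subset of the table without the groupby step. It assumes each `Edad` label appears only once in the wide frame, and the `value_name="Value"` argument is added here to match the column name in the desired output.

```python
import pandas as pd

# Hypothetical subset of the wide table from the question
df = pd.DataFrame({
    "Edad": ["[15-25]", "(25-35]"],
    "2000": [126675, 65823],
    "2001": [158246, 85059],
    "2002": [171958, 92841],
})

# Every column not listed in id_vars is treated as a value column:
# its header goes into 'Year' and its cell goes into 'Value'
long_df = (
    df.melt(id_vars="Edad", var_name="Year", value_name="Value")
      .sort_values(["Edad", "Year"])
      .reset_index(drop=True)
)

print(long_df)
```

Because no value column has to be named explicitly, this works regardless of how many year columns the frame has.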

Pandas : Groupby sum values

I am using this data frame from Excel:
I'd like to show the total sales per year.
Year Sales
2021 7
2018 6
2018 787
2018 935
2018 1 059
2018 5
2018 72
2018 2
2018 3
2019 218
2019 256
2020 2
2018 4
2021 8
2019 14
2020 3
2018 3
2018 1
2020 34
I'm using this :
df.groupby(['Year'])['Sales'].agg('sum')
And the result :
2018.0 67879351 05957223431
2019.0 21825614
2020.0 2334
2021.0 78
Do you know why I don't get the sum of the values?
Thanks
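The 'Sales' column is of dtype object (its values are strings, some containing spaces such as "1 059"), so summing concatenates them. Convert the column to numeric first: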
df['Sales']=pd.to_numeric(df['Sales'].replace(r"\s+",'',regex=True),errors='coerce')
#df['Sales'].replace(r"\s+",'',regex=True).astype(float)
Now calculate sum():
out=df.groupby(['Year'])['Sales'].sum()
output of out:
Year
2018 2877
2019 488
2020 39
2021 15
Name: Sales, dtype: int64
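
The fix above can be tried on a small frame. This is only a sketch: the rows below are a made-up subset of the question's data, with `Sales` stored as strings to mimic what the Excel import apparently produced.

```python
import pandas as pd

# Hypothetical subset: Sales arrives as strings, one with a thousands space
df = pd.DataFrame({
    "Year": [2018, 2018, 2018, 2019, 2019],
    "Sales": ["787", "1 059", "5", "218", "256"],
})

# Remove all whitespace inside each string, then convert to numbers;
# errors='coerce' turns anything still unparseable into NaN
df["Sales"] = pd.to_numeric(
    df["Sales"].str.replace(r"\s+", "", regex=True),
    errors="coerce",
)

# With a numeric column, sum() adds values instead of joining strings
totals = df.groupby("Year")["Sales"].sum()

print(totals)
```

Checking `df.dtypes` right after loading is a quick way to spot this kind of problem before aggregating.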

Pandas/Python - Generating a chart [duplicate]

So I want to generate a chart graph from a csv data file, and I've been following a guide but I can't seem to manipulate my code in such a way to get what I want.
So here is what I have so far:
from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd
import sys
import matplotlib
df = pd.read_csv("TB_burden_countries_2018-03-06.csv")
df = df.set_index(['country'])
df2 = df.loc["Zimbabwe", "e_mort_num"]
df = pd.DataFrame(data = df2, columns= ["e_mort_num"])
df.columns = ["Mortality"]
print(df2)
This code was just so I can choose a specific country (Zimbabwe) and look at its mortality number (e_mort_num). What could I write to generate a chart? I've been using this tutorial: http://pbpython.com/simple-graphing-pandas.html, but I'm having trouble manipulating variable names, as I'm not too sure what I should be doing. If you require more information, please say so. Thank you for your help!
CSV bit of interest:
Country Year Mortality
Zimbabwe 2000 20000
Zimbabwe 2001 18000
Zimbabwe 2002 17000
Zimbabwe 2003 19000
Zimbabwe 2004 19000
Zimbabwe 2005 22000
Zimbabwe 2006 24000
Zimbabwe 2007 24000
Zimbabwe 2008 23000
Zimbabwe 2009 17000
Zimbabwe 2010 13000
Zimbabwe 2011 14000
Zimbabwe 2012 14000
Zimbabwe 2013 11000
Zimbabwe 2014 11000
Zimbabwe 2015 9000
Zimbabwe 2016 5600
Assuming your dataframe looks like this:
>>> df
Country Year Mortality
0 Zimbabwe 2000 20000
1 Zimbabwe 2001 18000
2 Zimbabwe 2002 17000
3 Zimbabwe 2003 19000
4 Zimbabwe 2004 19000
5 Zimbabwe 2005 22000
6 Zimbabwe 2006 24000
7 Zimbabwe 2007 24000
8 Zimbabwe 2008 23000
9 Zimbabwe 2009 17000
10 Zimbabwe 2010 13000
11 Zimbabwe 2011 14000
12 Zimbabwe 2012 14000
13 Zimbabwe 2013 11000
14 Zimbabwe 2014 11000
15 Zimbabwe 2015 9000
16 Zimbabwe 2016 5600
You can obtain a barplot by using the following code:
# Plot mortality per year:
plt.bar(df['Year'], df['Mortality'])
# Set plot title
plt.title('Zimbabwe')
# Set the "xticks", for barplots, this is the labels on your x axis
plt.xticks(df['Year'], rotation=90)
# Set the name of the x axis
plt.xlabel('Year')
# Set the name of the y axis
plt.ylabel('Mortality')
# tight_layout makes it nicer for reading and saving
plt.tight_layout()
# Show your plot
plt.show()
Which gives you a bar chart of mortality per year for Zimbabwe.

How to add a column with the growth rate in a budget table in Pandas?

I would like to know how I can add a year-over-year growth rate column to the following data in Pandas.
Date Total Managed Expenditure
0 2001 503.2
1 2002 529.9
2 2003 559.8
3 2004 593.2
4 2005 629.5
5 2006 652.1
6 2007 664.3
7 2008 688.2
8 2009 732.0
9 2010 759.2
10 2011 769.2
11 2012 759.8
12 2013 760.6
13 2014 753.3
14 2015 757.6
15 2016 753.9
Use Series.pct_change():
df['Total Managed Expenditure'].pct_change()
Out:
0 NaN
1 0.053060
2 0.056426
3 0.059664
4 0.061194
5 0.035902
6 0.018709
7 0.035978
8 0.063644
9 0.037158
10 0.013172
11 -0.012220
12 0.001053
13 -0.009598
14 0.005708
15 -0.004884
Name: Total Managed Expenditure, dtype: float64
To assign it back:
df['Growth Rate'] = df['Total Managed Expenditure'].pct_change()
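
To make explicit what pct_change() computes, the sketch below compares it with the manual formula (current - previous) / previous on the first four rows of the question's data. The `Growth Rate (%)` column is an optional extra added here for readability.

```python
import pandas as pd

# First four rows of the question's data
df = pd.DataFrame({
    "Date": [2001, 2002, 2003, 2004],
    "Total Managed Expenditure": [503.2, 529.9, 559.8, 593.2],
})

col = df["Total Managed Expenditure"]

# Manual year-over-year growth: (current - previous) / previous
manual = (col - col.shift(1)) / col.shift(1)

# pct_change() computes the same ratio; the first row has no predecessor, so it is NaN
df["Growth Rate"] = col.pct_change()

# Optional: the same rate expressed as a percentage
df["Growth Rate (%)"] = (df["Growth Rate"] * 100).round(2)

print(df)
```

Note that this assumes the rows are already sorted by Date; otherwise sort first so each row is compared with the previous year.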

Enter Missing Year Amounts with Zeros After GroupBy in Pandas

I am grouping the following rows.
df = df.groupby(['id','year']).sum().sort_index(ascending=[True, False])
print(df)
amount
id year
1 2009 120
2008 240
2007 240
2006 240
2005 240
2 2014 100
2013 50
2012 50
2011 100
2010 50
2006 100
... ...
Is there a way to add years that do not have any values with the amount equal to zero until a specific year, in this case 2005, as I am showing below?
Expected Output:
amount
id year
1 2015 0
2014 0
2013 0
2012 0
2011 0
2010 0
2009 120
2008 240
2007 240
2006 240
2005 240
2 2015 0
2014 100
2013 50
2012 50
2011 100
2010 50
2009 0
2008 0
2007 0
2006 100
2005 0
... ...
Starting with your first DataFrame, this will add every year that occurs for at least one id to all ids, filling the missing amounts with 0. Years that no id has at all are not added.
df = df.unstack().fillna(0).stack()
e.g.
In [16]: df
Out[16]:
amt
id year
1 2001 1
2002 2
2003 3
2 2002 4
2003 5
2004 6
In [17]: df = df.unstack().fillna(0).stack()
In [18]: df
Out[18]:
amt
id year
1 2001 1
2002 2
2003 3
2004 0
2 2001 0
2002 4
2003 5
2004 6
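
If the result must cover a fixed range of years even when no id has data for some of them, one alternative is to reindex against the full id/year grid. This is a sketch, not the approach from the answer above: the sample data is made up, and the 2005 to 2015 range is assumed from the expected output in the question.

```python
import pandas as pd

# Hypothetical grouped result shaped like the question's output
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "year": [2008, 2009, 2006, 2014],
    "amount": [240, 120, 100, 100],
}).groupby(["id", "year"]).sum()

# Assumed range of years to cover (2005 through 2015 inclusive)
years = range(2005, 2016)

# Build every (id, year) combination, including years no id has
ids = df.index.get_level_values("id").unique()
full_index = pd.MultiIndex.from_product([ids, years], names=["id", "year"])

# Reindex onto the full grid; missing combinations get amount 0
filled = df.reindex(full_index, fill_value=0)

# Sort ids ascending and years descending, as in the expected output
filled = filled.sort_index(ascending=[True, False])

print(filled)
```

Unlike unstack/fillna/stack, this does not depend on some other id happening to contain the missing year.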
