I have a dataframe like this one below
Air Station Code Humidity Temperature Latitude Longitude
St.1 20 10 10.00 10.00
St.2 4 15 25.00 30.00
St.3 16 21 8.00 15.00
St.4 38 8 31.00 40.00
St.5 10 18 10.00 10.00
St.6 40 4 25.00 30.00
St.7 10 13 8.00 15.00
St.8 46 11 31.00 40.00
St.9 28 9 10.00 10.00
St.10 14 22 25.00 30.00
St.11 5 40 8.00 15.00
St.12 11 10 31.00 40.00
...
St.89 61 35 10.00 10.00
St.90 23 29 25.00 30.00
St.91 35 12 8.00 15.00
St.92 31 7 31.00 40.00
I want to change the station codes by matching the coordinates, substituing the codes by repeating the first 4 codes, obtaining this
Air Station Code Humidity Temperature Latitude Longitude
St.1 20 10 10.00 10.00
St.2 4 15 25.00 30.00
St.3 16 21 8.00 15.00
St.4 38 8 31.00 40.00
St.1 10 18 10.00 10.00
St.2 40 4 25.00 30.00
St.3 10 13 8.00 15.00
St.4 46 11 31.00 40.00
St.1 28 9 10.00 10.00
St.2 14 22 25.00 30.00
St.3 5 40 8.00 15.00
St.4 11 10 31.00 40.00
...
St.1 61 35 10.00 10.00
St.2 23 29 25.00 30.00
St.3 35 12 8.00 15.00
St.4 31 7 31.00 40.00
Is there some way to implement an "if/else" substitution on the whole dataframe without going manually over every observation in python?
df['Air Station Code'] = 'St.' + pd.Series(df[['Latitude','Longitude']].astype(str).agg(sum, axis=1).factorize()[0] + 1).astype(str)
df
Out[77]:
Air Station Code Humidity Temperature Latitude Longitude
0 St.1 20 10 10.0 10.0
1 St.2 4 15 25.0 30.0
2 St.3 16 21 8.0 15.0
3 St.4 38 8 31.0 40.0
4 St.1 10 18 10.0 10.0
5 St.2 40 4 25.0 30.0
6 St.3 10 13 8.0 15.0
7 St.4 46 11 31.0 40.0
8 St.1 28 9 10.0 10.0
9 St.2 14 22 25.0 30.0
10 St.3 5 40 8.0 15.0
11 St.4 11 10 31.0 40.0
There may be a better way to do this, but this solves the problem. I create a second dataframe without the duplicates, which keeps the first occurrence of each lat/long. I make lat/long the index and drop the other columns. I then do a "join", adding a new column with the matching lat/long. I then overwrite the original station code with the looked up one.
import pandas as pd
data = [
["St.1", 20, 10, 10.00, 10.00],
["St.2", 4, 15, 25.00, 30.00],
["St.3", 16, 21, 8.00, 15.00],
["St.4", 38, 8, 31.00, 40.00],
["St.5", 10, 18, 10.00, 10.00],
["St.6", 40, 4, 25.00, 30.00],
["St.7", 10, 13, 8.00, 15.00],
["St.8", 46, 11, 31.00, 40.00],
["St.9", 28, 9, 10.00, 10.00],
["St.10", 14, 22, 25.00, 30.00],
["St.11", 5, 40, 8.00, 15.00],
["St.12", 11, 10, 31.00, 40.00],
["St.89", 61, 35, 10.00, 10.00],
["St.90", 23, 29, 25.00, 30.00],
["St.91", 35, 12, 8.00, 15.00],
["St.92", 31, 7, 31.00, 40.00],
]
column = "Air_Station_Code Humidity Temperature Latitude Longitude".split()
df = pd.DataFrame(data,columns=column)
print(df)
df1 = df.drop_duplicates(['Latitude','Longitude'])
df1 = df1[['Air_Station_Code','Latitude','Longitude']]
df1.set_index(['Latitude','Longitude'], inplace=True)
print(df1)
df2 = df.join( df1, on=['Latitude','Longitude'], rsuffix='R' )
print(df2)
df['Air_Station_Code'] = df2['Air_Station_CodeR']
print(df)
Output:
Air_Station_Code Humidity Temperature Latitude Longitude
0 St.1 20 10 10.0 10.0
1 St.2 4 15 25.0 30.0
2 St.3 16 21 8.0 15.0
3 St.4 38 8 31.0 40.0
4 St.5 10 18 10.0 10.0
5 St.6 40 4 25.0 30.0
6 St.7 10 13 8.0 15.0
7 St.8 46 11 31.0 40.0
8 St.9 28 9 10.0 10.0
9 St.10 14 22 25.0 30.0
10 St.11 5 40 8.0 15.0
11 St.12 11 10 31.0 40.0
12 St.89 61 35 10.0 10.0
13 St.90 23 29 25.0 30.0
14 St.91 35 12 8.0 15.0
15 St.92 31 7 31.0 40.0
Air_Station_Code
Latitude Longitude
10.0 10.0 St.1
25.0 30.0 St.2
8.0 15.0 St.3
31.0 40.0 St.4
Air_Station_Code Humidity ... Longitude Air_Station_CodeR
0 St.1 20 ... 10.0 St.1
1 St.2 4 ... 30.0 St.2
2 St.3 16 ... 15.0 St.3
3 St.4 38 ... 40.0 St.4
4 St.5 10 ... 10.0 St.1
5 St.6 40 ... 30.0 St.2
6 St.7 10 ... 15.0 St.3
7 St.8 46 ... 40.0 St.4
8 St.9 28 ... 10.0 St.1
9 St.10 14 ... 30.0 St.2
10 St.11 5 ... 15.0 St.3
11 St.12 11 ... 40.0 St.4
12 St.89 61 ... 10.0 St.1
13 St.90 23 ... 30.0 St.2
14 St.91 35 ... 15.0 St.3
15 St.92 31 ... 40.0 St.4
[16 rows x 6 columns]
Air_Station_Code Humidity Temperature Latitude Longitude
0 St.1 20 10 10.0 10.0
1 St.2 4 15 25.0 30.0
2 St.3 16 21 8.0 15.0
3 St.4 38 8 31.0 40.0
4 St.1 10 18 10.0 10.0
5 St.2 40 4 25.0 30.0
6 St.3 10 13 8.0 15.0
7 St.4 46 11 31.0 40.0
8 St.1 28 9 10.0 10.0
9 St.2 14 22 25.0 30.0
10 St.3 5 40 8.0 15.0
11 St.4 11 10 31.0 40.0
12 St.1 61 35 10.0 10.0
13 St.2 23 29 25.0 30.0
14 St.3 35 12 8.0 15.0
15 St.4 31 7 31.0 40.0
Related
This question already has answers here:
Merge multiple DataFrames Pandas
(5 answers)
Pandas Merging 101
(8 answers)
Closed 7 months ago.
I have three dataframes
Dataframe df1:
date A
0 2022-04-11 1
1 2022-04-12 2
2 2022-04-14 26
3 2022-04-16 2
4 2022-04-17 1
5 2022-04-20 17
6 2022-04-21 14
7 2022-04-22 1
8 2022-04-23 9
9 2022-04-24 1
10 2022-04-25 5
11 2022-04-26 2
12 2022-04-27 21
13 2022-04-28 9
14 2022-04-29 17
15 2022-04-30 5
16 2022-05-01 8
17 2022-05-07 1241217
18 2022-05-08 211
19 2022-05-09 1002521
20 2022-05-10 488739
21 2022-05-11 12925
22 2022-05-12 57
23 2022-05-13 8515098
24 2022-05-14 1134576
Dateframe df2:
date B
0 2022-04-12 8
1 2022-04-14 7
2 2022-04-16 2
3 2022-04-19 2
4 2022-04-23 2
5 2022-05-07 2
6 2022-05-08 5
7 2022-05-09 2
8 2022-05-14 1
Dataframe df3:
date C
0 2022-04-12 6
1 2022-04-13 1
2 2022-04-14 2
3 2022-04-20 3
4 2022-04-21 9
5 2022-04-22 25
6 2022-04-23 56
7 2022-04-24 49
8 2022-04-25 68
9 2022-04-26 71
10 2022-04-27 40
11 2022-04-28 44
12 2022-04-29 27
13 2022-04-30 34
14 2022-05-01 28
15 2022-05-07 9
16 2022-05-08 20
17 2022-05-09 24
18 2022-05-10 21
19 2022-05-11 8
20 2022-05-12 8
21 2022-05-13 14
22 2022-05-14 25
23 2022-05-15 43
24 2022-05-16 36
25 2022-05-17 29
26 2022-05-18 28
27 2022-05-19 17
28 2022-05-20 6
I would like to merge df1, df2, df3 in a single dataframe with columns date, A, B, C, in such a way that date contains all dates which appeared in df1 and/or df2 and/or df3 (without repetition), and if a particular date was not in any of the dataframes, then for the respective column I put value 0.0. So, I would like to have something like that:
date A B C
0 2022-04-11 1.0 0.0 0.0
1 2022-08-12 2.0 8.0 6.0
2 2022-08-13 0.0 0.0 1.0
...
I tried to use this method
merge1 = pd.merge(df1, df2, how='outer')
sorted_merge1 = merge1.sort_values(by=['date'], ascending=False)
full_merge = pd.merge(sorted_merg1, df3, how='outer')
However, it seems it skips the dates which are not common for all three dataframes.
Try this,
print(pd.merge(df1, df2, on='date', how='outer').merge(df3, on='date', how='outer').fillna(0))
O/P:
date A B C
0 2022-04-11 1.0 0.0 0.0
1 2022-04-12 2.0 8.0 6.0
2 2022-04-14 26.0 7.0 2.0
3 2022-04-16 2.0 2.0 0.0
4 2022-04-17 1.0 0.0 0.0
5 2022-04-20 17.0 0.0 3.0
6 2022-04-21 14.0 0.0 9.0
7 2022-04-22 1.0 0.0 25.0
8 2022-04-23 9.0 2.0 56.0
9 2022-04-24 1.0 0.0 49.0
10 2022-04-25 5.0 0.0 68.0
11 2022-04-26 2.0 0.0 71.0
12 2022-04-27 21.0 0.0 40.0
13 2022-04-28 9.0 0.0 44.0
14 2022-04-29 17.0 0.0 27.0
15 2022-04-30 5.0 0.0 34.0
16 2022-05-01 8.0 0.0 28.0
17 2022-05-07 1241217.0 2.0 9.0
18 2022-05-08 211.0 5.0 20.0
19 2022-05-09 1002521.0 2.0 24.0
20 2022-05-10 488739.0 0.0 21.0
21 2022-05-11 12925.0 0.0 8.0
22 2022-05-12 57.0 0.0 8.0
23 2022-05-13 8515098.0 0.0 14.0
24 2022-05-14 1134576.0 1.0 25.0
25 2022-04-19 0.0 2.0 0.0
26 2022-04-13 0.0 0.0 1.0
27 2022-05-15 0.0 0.0 43.0
28 2022-05-16 0.0 0.0 36.0
29 2022-05-17 0.0 0.0 29.0
30 2022-05-18 0.0 0.0 28.0
31 2022-05-19 0.0 0.0 17.0
32 2022-05-20 0.0 0.0 6.0
perform merge chain and fill NaN with 0
I have a couple of data frames given this way :
38 47 7 20 35
45 76 63 96 24
98 53 2 87 80
83 86 92 48 1
73 60 26 94 6
80 50 29 53 92
66 90 79 98 46
40 21 58 38 60
35 13 72 28 6
48 76 51 96 12
79 80 24 37 51
86 70 1 22 71
52 69 10 83 13
12 40 3 0 30
46 50 48 76 5
Could you please tell me how it is possible to add them to a list of dataframes?
Thanks a lot!
First convert values to one DataFrame with separator misisng values (converted from blank lines):
df = pd.read_csv(file, header=None, skip_blank_lines=False)
print (df)
0 1 2 3 4
0 38.0 47.0 7.0 20.0 35.0
1 45.0 76.0 63.0 96.0 24.0
2 98.0 53.0 2.0 87.0 80.0
3 83.0 86.0 92.0 48.0 1.0
4 73.0 60.0 26.0 94.0 6.0
5 NaN NaN NaN NaN NaN
6 80.0 50.0 29.0 53.0 92.0
7 66.0 90.0 79.0 98.0 46.0
8 40.0 21.0 58.0 38.0 60.0
9 35.0 13.0 72.0 28.0 6.0
10 48.0 76.0 51.0 96.0 12.0
11 NaN NaN NaN NaN NaN
12 79.0 80.0 24.0 37.0 51.0
13 86.0 70.0 1.0 22.0 71.0
14 52.0 69.0 10.0 83.0 13.0
15 12.0 40.0 3.0 0.0 30.0
16 46.0 50.0 48.0 76.0 5.0
And then in list comprehension create smaller DataFrames in list:
dfs = [g.iloc[1:].astype(int).reset_index(drop=True)
for _, g in df.groupby(df[0].isna().cumsum())]
print (dfs[1])
0 1 2 3 4
0 80 50 29 53 92
1 66 90 79 98 46
2 40 21 58 38 60
3 35 13 72 28 6
4 48 76 51 96 12
Year Week_Number DC_Zip Asin_code
0 2016 1 12206 NaN
1 2016 1 29306 NaN
2 2016 1 33426 NaN
3 2016 1 37206 NaN
4 2016 1 45216 NaN
5 2016 1 60160 NaN
6 2016 1 76110 NaN
7 2016 1 80215 NaN
8 2016 1 84105 NaN
9 2016 1 85034 NaN
10 2016 1 93711 NaN
11 2016 1 98433 NaN
12 2016 2 12206 21.0
13 2016 2 29306 10.0
14 2016 2 33426 11.0
15 2016 2 37206 1.0
16 2016 2 45216 5.0
17 2016 2 60160 7.0
18 2016 2 76110 12.0
19 2016 2 80215 NaN
20 2016 2 84105 2.0
21 2016 2 85034 1.0
22 2016 2 93711 23.0
23 2016 2 98433 7.0
24 2016 3 12206 95.0
25 2016 3 29306 26.0
26 2016 3 33426 51.0
27 2016 3 37206 18.0
28 2016 3 45216 34.0
29 2016 3 60160 30.0
... ... ... ... ...
2778 2020 29 76110 33.0
2779 2020 29 80215 5.0
2780 2020 29 84105 3.0
2781 2020 29 85034 8.0
2782 2020 29 93711 53.0
2783 2020 29 98433 15.0
2784 2020 30 12206 75.0
2785 2020 30 29306 27.0
2786 2020 30 33426 34.0
2787 2020 30 37206 12.0
2788 2020 30 45216 14.0
2789 2020 30 60160 28.0
2790 2020 30 76110 47.0
2791 2020 30 80215 11.0
2792 2020 30 84105 3.0
2793 2020 30 85034 17.0
2794 2020 30 93711 62.0
2795 2020 30 98433 13.0
2796 2020 31 12206 109.0
2797 2020 31 29306 30.0
2798 2020 31 33426 31.0
2799 2020 31 37206 14.0
2800 2020 31 45216 23.0
2801 2020 31 60160 21.0
2802 2020 31 76110 25.0
2803 2020 31 80215 7.0
2804 2020 31 84105 4.0
2805 2020 31 85034 8.0
2806 2020 31 93711 71.0
2807 2020 31 98433 9.0
2808 rows × 4 columns
This is the sales data I am dealing with. I have to perform a weighted average on Asin_code with weighted rate = [5, 5, 20, 30, 40] on respective years 2016, 2017, 2018, 2019 and 2020. I have to create a function so that it will give me a column containing the weighted average of Asin_code."Nan" values should be dropped. We should also change the weighted rate in the future to view more patterns with the data. Any help would be appreciated.
i am trying the following code:
for i in range(len(df.Asin_code)):
df["Weighted_avg"]=rate[0]*df.Asin_code[i]/df.Asin_code.loc[(df.Year==2016)].sum()
just facing difficulties in consolidating the data for whole 5 years.
It becomes much simpler it you define your weights as a dict instead of a list then a simple use of apply() works
# define weights for year as a dict
wr = {2016:5, 2017:5, 2018:20, 2019:30, 2020:40}
df["Weighted_avg"] = df.apply(lambda r:
# numerator is weight * Asin_code[i]
( r["Asin_code"] * wr[r["Year"]]
/
# denomimator sum(Asin_code for year)
df.Asin_code.loc[(df.Year==r["Year"])].sum() ), axis=1)
output
Idx Year Week_Number DC_Zip Asin_code Weighted_avg
25 2016 3 29306 26.0 0.367232
26 2016 3 33426 51.0 0.720339
27 2016 3 37206 18.0 0.254237
28 2016 3 45216 34.0 0.480226
29 2016 3 60160 30.0 0.423729
2778 2020 29 76110 33.0 1.625616
2779 2020 29 80215 5.0 0.246305
2780 2020 29 84105 3.0 0.147783
2781 2020 29 85034 8.0 0.394089
2782 2020 29 93711 53.0 2.610837
suplementary update
Updated request: weighted_average[at index 1]=rate[for year 2016]*Asin_code[at first index of 2016]+rate[for year 2017]*Asin_code[at first index of 2017]+rate[for year 2018]*Asin_code[at first index of 2018]+rate[for year 2019]*Asin_code[at first index of 2019]+rate[for year 2020]*Asin_code[at first index of 2020]
df.dropna().groupby("Year").agg({"Asin_code":"first"}).reset_index()\
.assign(wa=lambda dfa:
dfa.apply(lambda r: r["Asin_code"]*wr[r['Year']],axis=1))["wa"].sum()
df["Weighted_avg"] = df.apply(lambda r: ( (r["Asin_code"] *wr[r["Year"]]).sum(axis = 0)), axis=1)
Output
12 2016 2 12206 21.0 105.0
13 2016 2 29306 10.0 50.0
14 2016 2 33426 11.0 55.0
15 2016 2 37206 1.0 5.0
16 2016 2 45216 5.0 25.0
17 2016 2 60160 7.0 35.0
18 2016 2 76110 12.0 60.0
19 2016 2 80215 NaN NaN
20 2016 2 84105 2.0 10.0
21 2016 2 85034 1.0 5.0
22 2016 2 93711 23.0 115.0
23 2016 2 98433 7.0 35.0
24 2016 3 12206 95.0 475.0
25 2016 3 29306 26.0 130.0
26 2016 3 33426 51.0 255.0
27 2016 3 37206 18.0 90.0
28 2016 3 45216 34.0 170.0
29 2016 3 60160 30.0 150.0
... ... ... ... ... ...
2778 2020 29 76110 33.0 1320.0
2779 2020 29 80215 5.0 200.0
2780 2020 29 84105 3.0 120.0
2781 2020 29 85034 8.0 320.0
2782 2020 29 93711 53.0 2120.0
2783 2020 29 98433 15.0 600.0
2784 2020 30 12206 75.0 3000.0
2785 2020 30 29306 27.0 1080.0
2786 2020 30 33426 34.0 1360.0
2787 2020 30 37206 12.0 480.0
2788 2020 30 45216 14.0 560.0
2789 2020 30 60160 28.0 1120.0
2790 2020 30 76110 47.0 1880.0
2791 2020 30 80215 11.0 440.0
2792 2020 30 84105 3.0 120.0
2793 2020 30 85034 17.0 680.0
2794 2020 30 93711 62.0 2480.0
2795 2020 30 98433 13.0 520.0
2796 2020 31 12206 109.0 4360.0
2797 2020 31 29306 30.0 1200.0
2798 2020 31 33426 31.0 1240.0
2799 2020 31 37206 14.0 560.0
2800 2020 31 45216 23.0 920.0
2801 2020 31 60160 21.0 840.0
2802 2020 31 76110 25.0 1000.0
2803 2020 31 80215 7.0 280.0
2804 2020 31 84105 4.0 160.0
2805 2020 31 85034 8.0 320.0
2806 2020 31 93711 71.0 2840.0
2807 2020 31 98433 9.0 360.0
Got my solution with this.
I have a pandas dataframe similar to this.
score avg
date
1/1/2017 0 0
1/2/2017 1 0.5
1/3/2017 2 1
1/4/2017 3 1.5
1/5/2017 4 2
1/6/2017 5 2.5
1/7/2017 6 3
1/8/2017 7 3.5
1/9/2017 8 4
1/10/2017 9 4.5
1/11/2017 10 5
1/12/2017 11 5.5
1/13/2017 12 7.5
1/14/2017 13 6.5
1/15/2017 14 7.5
1/16/2017 15 8.5
1/17/2017 16 9.5
1/18/2017 17 10.5
1/19/2017 18 11.5
1/20/2017 19 12.5
1/21/2017 20 13.5
1/22/2017 21 14.5
1/23/2017 22 15.5
1/24/2017 23 16.5
1/25/2017 24 17.5
1/26/2017 25 18.5
1/27/2017 26 19.5
1/28/2017 27 20.5
1/29/2017 28 21.5
Basically I am looking to create a 14 day rolling average of the data, but instead of showing NaNs for the first 14 days, simply showing the simple averages. For example, the average on day 2 is the average of day 1 and 2, the average on day 10 is the averages of days 1-10, etc. How would I go about doing this without having to manually create averages? Thanks for the help!
What you need to use is rolling with min_periods=1 as paramter:
df['avg2'] = df.rolling(14, min_periods=1)['score'].mean()
Output:
date score avg avg2
0 2017-01-01 0 0.0 0.0
1 2017-01-02 1 0.5 0.5
2 2017-01-03 2 1.0 1.0
3 2017-01-04 3 1.5 1.5
4 2017-01-05 4 2.0 2.0
5 2017-01-06 5 2.5 2.5
6 2017-01-07 6 3.0 3.0
7 2017-01-08 7 3.5 3.5
8 2017-01-09 8 4.0 4.0
9 2017-01-10 9 4.5 4.5
10 2017-01-11 10 5.0 5.0
11 2017-01-12 11 5.5 5.5
12 2017-01-13 12 7.5 6.0
13 2017-01-14 13 6.5 6.5
14 2017-01-15 14 7.5 7.5
15 2017-01-16 15 8.5 8.5
16 2017-01-17 16 9.5 9.5
17 2017-01-18 17 10.5 10.5
18 2017-01-19 18 11.5 11.5
19 2017-01-20 19 12.5 12.5
20 2017-01-21 20 13.5 13.5
21 2017-01-22 21 14.5 14.5
22 2017-01-23 22 15.5 15.5
23 2017-01-24 23 16.5 16.5
24 2017-01-25 24 17.5 17.5
25 2017-01-26 25 18.5 18.5
26 2017-01-27 26 19.5 19.5
27 2017-01-28 27 20.5 20.5
28 2017-01-29 28 21.5 21.5
I have data
age
32
16
39
39
23
36
29
26
43
34
35
50
29
29
31
42
53
I need to get smth like this
I can get
df.age.value_counts()
and
100. * df.age.value_counts() / len(df.age)
But how can I union this and give name to columns?
You can use cut with agg:
#helper df with min and max ages, necessary add category Total
df1 = pd.DataFrame({'G':['14 yo and younger','15-19','20-24','25-29','30-34',
'35-39','40-44','45-49','50-54','55-59','60-64','65+','Total'],
'Min':[0, 15,20,25,30,35,40,45,50,55,60,65,np.nan],
'Max':[14,19,24,29,34,39,44,49,54,59,64,120, np.nan]})
print (df1)
G Max Min
0 14 yo and younger 14.0 0.0
1 15-19 19.0 15.0
2 20-24 24.0 20.0
3 25-29 29.0 25.0
4 30-34 34.0 30.0
5 35-39 39.0 35.0
6 40-44 44.0 40.0
7 45-49 49.0 45.0
8 50-54 54.0 50.0
9 55-59 59.0 55.0
10 60-64 64.0 60.0
11 65+ 120.0 65.0
12 Total NaN NaN
cutoff = np.hstack([np.array(df1.Min[0]), df1.Max.values])
labels = df1.G.values
df['Groups'] = pd.cut(df.age, bins=cutoff, labels=labels, right=True, include_lowest=True)
print (df)
age Groups
0 32 30-34
1 16 15-19
2 39 35-39
3 39 35-39
4 23 20-24
5 36 35-39
6 29 25-29
7 26 25-29
8 43 40-44
9 34 30-34
10 35 35-39
11 50 50-54
12 29 25-29
13 29 25-29
14 31 30-34
15 42 40-44
16 53 50-54
df = df.groupby('Groups')['Groups']
.agg({'Total':[len, lambda x: len(x)/df.shape[0] * 100 ]})
.rename(columns={'len':'N', '<lambda>':'%'})
#last Total row
df.ix['Total'] = df.sum()
print (df)
Total
N %
Groups
14 yo and younger 0.0 0.000000
15-19 1.0 5.882353
20-24 1.0 5.882353
25-29 4.0 23.529412
30-34 3.0 17.647059
35-39 4.0 23.529412
40-44 2.0 11.764706
45-49 0.0 0.000000
50-54 2.0 11.764706
55-59 0.0 0.000000
60-64 0.0 0.000000
65+ 0.0 0.000000
Total 17.0 100.000000
EDIT1:
Solution with size scale better:
df1 = df.groupby('Groups').size().to_frame()
df1.columns = pd.MultiIndex.from_arrays(('Total','N'))
df1.ix[:,('Total','%')] = 100 * df1.ix[:,('Total','N')] / df.shape[0]
df1.ix['Total'] = df1.sum()
print (df1)
Total
N %
Groups
14 yo and younger 0.0 0.000000
15-19 1.0 5.882353
20-24 1.0 5.882353
25-29 4.0 23.529412
30-34 3.0 17.647059
35-39 4.0 23.529412
40-44 2.0 11.764706
45-49 0.0 0.000000
50-54 2.0 11.764706
55-59 0.0 0.000000
60-64 0.0 0.000000
65+ 0.0 0.000000
Total 17.0 100.000000