Is there a way to calculate a yearly moving average in Python?

I have some football data that I am modifying for analysis. I basically want to calculate career and yearly per game averages on a weekly basis for several stats.
Example
What I have:
Player        Year  Week  Rushing Yards  Catches
Seth Johnson  2020  1     100            4
Seth Johnson  2020  2     80             2
Seth Johnson  2021  1     50             3
Seth Johnson  2021  2     40             2
What I want:
Player        Year  Week  Rushing Yards  Catches  Career Rushing Yards per Game  Career Catches per Game  Yearly Rushing Yards per Game  Yearly Catches per Game
Seth Johnson  2020  1     100            4        100                            4                        100                            4
Seth Johnson  2020  2     80             2        90                             3                        90                             3
Seth Johnson  2021  1     50             3        76.67                          3                        50                             3
Seth Johnson  2021  2     40             2        67.5                           2.75                     45                             2.5
I figure I could calculate the Career stats and Yearly stats separately and then join everything on Player/Year/Week, but I'm not sure how to go about calculating the moving averages, given that the window would be dependent on Year and Week.
I've tried things like looping through the desired categories and calculating rolling averages:
new_df['Career ' + category + ' per Game'] = df.groupby('Player')[category].apply(lambda x: x.rolling(3, min_periods=0).mean())
But I'm not finding the creativity necessary to make the appropriate custom window for rolling(). Does anyone have any ideas here?
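One possible approach (a sketch of my own, not from the original thread): because the averages should grow with every game played rather than over a fixed number of games, an expanding window is a better fit than rolling(). Group by Player for the career averages and by Player and Year for the yearly averages, assuming the frame is sorted chronologically and each row is one game:
import pandas as pd

df = pd.DataFrame({
    'Player': ['Seth Johnson'] * 4,
    'Year': [2020, 2020, 2021, 2021],
    'Week': [1, 2, 1, 2],
    'Rushing Yards': [100, 80, 50, 40],
    'Catches': [4, 2, 3, 2],
})

# Games must be in chronological order before taking expanding means
df = df.sort_values(['Player', 'Year', 'Week'])

for category in ['Rushing Yards', 'Catches']:
    # Career per-game average: cumulative (expanding) mean over all of the player's games
    df['Career ' + category + ' per Game'] = (
        df.groupby('Player')[category]
          .transform(lambda s: s.expanding().mean())
    )
    # Yearly per-game average: the same expanding mean, restarted for each Player/Year pair
    df['Yearly ' + category + ' per Game'] = (
        df.groupby(['Player', 'Year'])[category]
          .transform(lambda s: s.expanding().mean())
    )
This avoids building a custom window for rolling() altogether; expanding() simply uses every prior game up to and including the current one.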

Related

Conditional aggregation on dataframe columns with combining 'n' rows into 1 row

I have an input dataframe like the following:
NAME TEXT START END
Tim Tim Wagner is a teacher. 10 20.5
Tim He is from Cleveland, Ohio. 20.5 40
Frank Frank is a musician. 40 50
Tim He like to travel with his family 50 62
Frank He is a performing artist who plays the cello. 62 70
Frank He performed at the Carnegie Hall last year. 70 85
Frank It was fantastic listening to him. 85 90
Frank I really enjoyed 90 93
Want output dataframe as follows:
NAME TEXT START END
Tim Tim Wagner is a teacher. He is from Cleveland, Ohio. 10 40
Frank Frank is a musician 40 50
Tim He like to travel with his family 50 62
Frank He is a performing artist who plays the cello. He performed at the Carnegie Hall last year. 62 85
Frank It was fantastic listening to him. I really enjoyed 85 93
My current code:
grp = (df['NAME'] != df['NAME'].shift()).cumsum().rename('group')
df.groupby(['NAME', grp], sort=False)['TEXT', 'START', 'END']\
  .agg({'TEXT': lambda x: ' '.join(x), 'START': 'min', 'END': 'max'})\
  .reset_index().drop('group', axis=1)
This combines the last 4 rows into one. Instead I want to combine only 2 rows (say any n rows) even if the 'NAME' has the same value.
Appreciate your help on this.
Thanks
You can group by the blocks plus a per-block counter, which splits each block into chunks of two rows:
blocks = df.NAME.ne(df.NAME.shift()).cumsum()
(df.groupby([blocks, df.groupby(blocks).cumcount() // 2])
   .agg({'NAME': 'first', 'TEXT': ' '.join,
         'START': 'min', 'END': 'max'})
)
Output:
NAME TEXT START END
NAME
1 0 Tim Tim Wagner is a teacher. He is from Cleveland,... 10.0 40.0
2 0 Frank Frank is a musician. 40.0 50.0
3 0 Tim He like to travel with his family 50.0 62.0
4 0 Frank He is a performing artist who plays the cello.... 62.0 85.0
1 Frank It was fantastic listening to him. I really en... 85.0 93.0
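As a small follow-up (my addition, not part of the original answer), the chunk size can be made a parameter and the grouping keys dropped so the result matches the desired layout:
n = 2  # merge up to n consecutive rows that share the same NAME
out = (df.groupby([blocks, df.groupby(blocks).cumcount() // n])
         .agg({'NAME': 'first', 'TEXT': ' '.join, 'START': 'min', 'END': 'max'})
         .reset_index(drop=True))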

How to select the row with same value in between two data frames?

I have the following huge dataset, containing a number of different app names and sentiments under the attribute Sent:
App Sent
0 10 Best Foods for You Positive
1 10 Best Foods for You Positive
2 10 Best Foods for You NaN
3 10 Best Foods for You Positive
4 10 Best Foods for You Positive
5 10 Best Foods for You Positive
6 10 Best Foods for You Positive
7 10 Best Foods for You NaN
8 10 Best Foods for You Neutral
9 10 Best Foods for You Neutral
10 10 Best Foods for You Positive
11 10 Best Foods for You Positive
12 10 Best Foods for You Positive
13 10 Best Foods for You Positive
... ...
64289 Houzz Interior Design Ideas NaN
64290 Houzz Interior Design Ideas NaN
64291 Houzz Interior Design Ideas NaN
64292 Houzz Interior Design Ideas NaN
64293 Houzz Interior Design Ideas NaN
64294 Houzz Interior Design Ideas NaN
I want to find the apps that have approximately the same ratio of positive to negative sentiments (i.e. the apps whose positive and negative counts are equal, or as close to equal as possible).
I tried separating the above data frame into two data frames, one with Positive values and one with Negative values, and then grouping them with count.
For example:
Positive dataframe p:
Sent
App
10 Best Foods for You 162
104 找工作 - 找工作 找打工 找兼職 履歷健檢 履歷診療室 31
11st 23
1800 Contacts - Lens Store 64
1LINE – One Line with One Touch 27
2018Emoji Keyboard 😂 Emoticons Lite -sticker&gif 25
21-Day Meditation Experience 68
2Date Dating App, Love and matching 26
2GIS: directory & navigator 23
2RedBeans 31
2ndLine - Second Phone Number 17
30 Day Fitness Challenge - Workout at Home 27
365Scores - Live Scores 5
3D Live Neon Weed Launcher 2
4 in a Row 17
7 Day Food Journal Challenge 9
7 Minute Workout 10
7 Weeks - Habit & Goal Tracker 10
8 Ball Pool 104
850 Sports News Digest 38
8fit Workouts & Meal Planner 137
95Live -SG#1 Live Streaming App 34
A Call From Santa Claus! 20
A Word A Day 3
A&E - Watch Full Episodes of TV Shows 30
A+ Gallery - Photos & Videos 24
...
HipChat - Chat Built for Teams 19
Hipmunk Hotels & Flights 30
Hitwe - meet people and chat 2
Hole19: Golf GPS App, Rangefinder & Scorecard 18
Home Decor Showpiece Art making: Medium Difficulty 16
Home Scouting® MLS Mobile 13
Home Security Camera WardenCam - reuse old phones 18
Home Street – Home Design Game 42
Home Workout - No Equipment 24
Home Workout for Men - Bodybuilding 22
Home workouts - fat burning, abs, legs, arms,chest 8
HomeWork 1
Homes.com 🏠 For Sale, Rent 11
Homescapes 27
Homesnap Real Estate & Rentals 25
Homestyler Interior Design & Decorating Ideas 19
Homework Planner 33
Honkai Impact 3rd 54
Hopper - Watch & Book Flights 54
Horoscopes – Daily Zodiac Horoscope and Astrology 21
Horses Live Wallpaper 22
Hostelworld: Hostels & Cheap Hotels Travel App 42
Hot Wheels: Race Off 14
HotelTonight: Book amazing deals at great hotels 93
Hotels Combined - Cheap deals 15
Hotels.com: Book Hotel Rooms & Find Vacation Deals 39
Hotspot Shield Free VPN Proxy & Wi-Fi Security 17
Hotstar 14
Hotwire Hotel & Car Rental App 16
Housing-Real Estate & Property 8
[853 rows x 1 columns]
And the negative dataframe n:
Sent
App
10 Best Foods for You 10
104 找工作 - 找工作 找打工 找兼職 履歷健檢 履歷診療室 1
11st 7
1800 Contacts - Lens Store 6
1LINE – One Line with One Touch 8
2018Emoji Keyboard 😂 Emoticons Lite -sticker&gif 1
21-Day Meditation Experience 10
2Date Dating App, Love and matching 7
2GIS: directory & navigator 6
2RedBeans 2
2ndLine - Second Phone Number 7
30 Day Fitness Challenge - Workout at Home 2
4 in a Row 3
7 Minute Workout 1
7 Weeks - Habit & Goal Tracker 4
8 Ball Pool 106
850 Sports News Digest 1
8fit Workouts & Meal Planner 19
95Live -SG#1 Live Streaming App 20
A Call From Santa Claus! 14
A&E - Watch Full Episodes of TV Shows 3
A+ Gallery - Photos & Videos 7
A+ Mobile 9
ABC Kids - Tracing & Phonics 1
ABC News - US & World News 29
ABC Preschool Free 8
...
Hill Climb Racing 13
Hill Climb Racing 2 11
Hily: Dating, Chat, Match, Meet & Hook up 29
Hinge: Dating & Relationships 11
HipChat - Chat Built for Teams 26
Hipmunk Hotels & Flights 1
Hitwe - meet people and chat 7
Home Decor Showpiece Art making: Medium Difficulty 5
Home Scouting® MLS Mobile 12
Home Security Camera WardenCam - reuse old phones 4
Home Street – Home Design Game 13
Home Workout - No Equipment 1
Homes.com 🏠 For Sale, Rent 3
Homescapes 25
Homesnap Real Estate & Rentals 6
Homestyler Interior Design & Decorating Ideas 7
Homework Planner 4
Honkai Impact 3rd 22
Hopper - Watch & Book Flights 18
Horoscopes – Daily Zodiac Horoscope and Astrology 1
Horses Live Wallpaper 2
Hostelworld: Hostels & Cheap Hotels Travel App 12
Hot Wheels: Race Off 6
HotelTonight: Book amazing deals at great hotels 17
Hotels Combined - Cheap deals 7
Hotels.com: Book Hotel Rooms & Find Vacation Deals 21
Hotspot Shield Free VPN Proxy & Wi-Fi Security 3
Hotstar 14
Hotwire Hotel & Car Rental App 6
Housing-Real Estate & Property 10
[782 rows x 1 columns]
Doing this, I tried to find the app names where p["Sent"] and n["Sent"] are equal:
df=p.loc[p["Sent"]==n["Sent"]]
print(df)
Output:
ValueError: Can only compare identically-labeled Series objects
You are comparing dataframes with different rows, so their indexes do not line up.
I would do it like this. Consider this situation:
name p n
app1 5 5
app2 8 6
app3 7 7
app4 10 8
app5 3 NaN
This code prints the name of the app and the count wherever the 'p' and 'n' numbers are the same:
import pandas as pd

# make dataframes p and n
p = pd.DataFrame([5, 8, 7, 10, 3], index=['app1', 'app2', 'app3', 'app4', 'app5'], columns=['p'])
n = pd.DataFrame([5, 6, 7, 8, None], index=['app1', 'app2', 'app3', 'app4', 'app5'], columns=['n'])
# combine p and n with concat
df = pd.concat([p, n], axis=1)
# check equality row by row
for i in range(len(df)):
    if df.iloc[i]['p'] == df.iloc[i]['n']:
        print(df.index[i], df.iloc[i]['p'])
# Outputs are
# app1 5.0
# app3 7.0
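As an aside (my addition, not part of the original answer), the same check can be done without the loop once the two frames are concatenated; NaN never compares equal, so app5 drops out automatically:
matches = df[df['p'] == df['n']]
print(matches)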

Sum based on grouping in pandas dataframe?

I have a pandas dataframe df which contains:
major men women rank
Art 5 4 1
Art 3 5 3
Art 2 4 2
Engineer 7 8 3
Engineer 7 4 4
Business 5 5 4
Business 3 4 2
Basically I need to find the total number of students, men and women combined, per major, regardless of the rank column. For Art, for example, the total should be all men + women, i.e. 23; Engineer 26; Business 17.
I have tried
df.groupby(['major_category']).sum()
But this separately sums the men and women rather than combining their totals.
Just add both columns and then groupby:
(df.men+df.women).groupby(df.major).sum()
major
Art 23
Business 17
Engineer 26
dtype: int64
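As a small follow-up (my addition, not part of the original answer), if you would rather have a regular dataframe than a Series, you can reset the index and name the summed column:
totals = (df.men + df.women).groupby(df.major).sum().reset_index(name='total')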
melt() then groupby():
df.drop('rank', axis=1).melt('major').groupby('major', as_index=False).sum()
major value
0 Art 23
1 Business 17
2 Engineer 26

Add new column to dataframe based on an average

I have a dataframe that includes the category of a project, currency, number of investors, goal, etc., and I want to create a new column which will be "average success rate of their category":
state category main_category currency backers country \
0 0 Poetry Publishing GBP 0 GB
1 0 Narrative Film Film & Video USD 15 US
2 0 Narrative Film Film & Video USD 3 US
3 0 Music Music USD 1 US
4 1 Restaurants Food USD 224 US
usd_goal_real duration year hour
0 1533.95 59 2015 morning
1 30000.00 60 2017 morning
2 45000.00 45 2013 morning
3 5000.00 30 2012 morning
4 50000.00 35 2016 afternoon
I have the average success rates in series format:
Dance 65.435209
Theater 63.796134
Comics 59.141527
Music 52.660558
Art 44.889045
Games 43.890467
Film & Video 41.790649
Design 41.594386
Publishing 34.701650
Photography 34.110847
Fashion 28.283186
Technology 23.785582
And now I want to add a new column, where each row will have the success rate matching its category, i.e. wherever the row's main_category is Technology, the new column should contain 23.78 for that row.
df['category_success_rate'] = ...  # I want this column to be the success % that matches the category in the "main_category" column
I think you need GroupBy.transform with a Boolean mask, df['state'].eq(1) or (df['state'] == 1):
df['category_success_rate'] = (df['state'].eq(1)
                               .groupby(df['main_category'])
                               .transform('mean') * 100)
Alternative:
df['category_success_rate'] = ((df['state'] == 1)
                               .groupby(df['main_category'])
                               .transform('mean') * 100)
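Alternatively (my addition, assuming the pre-computed averages from the question are held in a Series named success_rates indexed by category), you can map them onto the frame directly:
df['category_success_rate'] = df['main_category'].map(success_rates)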

Sort text in second column based on values in first column

In Python I would like to combine the text from separate rows into one row per value in the first column, so that I get:
Harry went to School 100
Mary sold goods 50
Sick man
using the provided information below:
number text
1 Harry
1 Went
1 to
1 School
1 100
2 Mary
2 sold
2 goods
2 50
3 Sick
3 Man
for i in range(len(df['number']) - 1):
    if df['number'][i + 1] == df['number'][i]:
        pass  # append text (e.g. Harry went to School 100)
    else:
        pass  # start a new row (Mary sold goods 50)
You can use groupby:
for name, group in df.groupby('number'):
    print(' '.join(group['text']))
Result
Harry Went to School 100
Mary sold goods 50
Sick Man
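A more compact variant (my addition) that collects the joined strings into a Series instead of printing them:
joined = df.groupby('number')['text'].apply(' '.join)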
