I have two dataframes:
from datetime import date, timedelta
file_date = (date.today() - timedelta(days=2)).strftime('%m-%d-%Y')
file_date
github_dir_path = 'https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_daily_reports/'
file_path = github_dir_path + file_date + '.csv'
first dataframe:
FIPS Admin2 Province_State Country_Region Last_Update Lat Long_ Confirmed Deaths Recovered Active Combined_Key
0 45001.0 Abbeville South Carolina US 2020-04-28 02:30:51 34.223334 -82.461707 29 0 0 29 Abbeville, South Carolina, US
1 22001.0 Acadia Louisiana US 2020-04-28 02:30:51 30.295065 -92.414197 130 9 0 121 Acadia, Louisiana, US
2 51001.0 Accomack Virginia US 2020-04-28 02:30:51 37.767072 -75.632346 195 3 0 192 Accomack, Virginia, US
3 16001.0 Ada Idaho US 2020-04-28 02:30:51 43.452658 -116.241552 650 15 0 635 Ada, Idaho, US
4 19001.0 Adair Iowa US 2020-04-28 02:30:51 41.330756 -94.471059 1 0 0 1 Adair, Iowa, US
#
0 0 ... 0 Kerala 0 Kerala 1
2 2020-02-01 Kerala 2 0 0 ... 0 Kerala 0 Kerala 2
3 2020-02-02 Kerala 3 0 0 ... 0 Kerala 0 Kerala 3
4 2020-02-03 Kerala 3 0 0 ... 0 Kerala 0 Kerala 3
Please guide me on how to concatenate both the data frames. I tried a couple of things but did not get the expected result.
I am very new to using Beautiful Soup and struggling to grasp how to retrieve my intended data. I have the following source code:
<pre>
<div class="scorebox_meta">
<div><strong>Saturday August 14, 2021</strong>,<span class="venuetime" data-venue-date="2021-08-14" data-venue-time="12:30" data-venue-epoch="1628940600">12:30 (venue time)</span> <span class="localtime" data-label="your time">13:30 (local time)</span>
</div><div><strong><small>Venue</small></strong>: <small>Old Trafford, Manchester</small></div>
</pre>
I'm trying to retrieve the date (2021-08-14), time (12:30), and venue location (Old Trafford, Manchester), but the code I've written returns either "none" or all the information in the scorebox. Can anyone assist?
import requests
from bs4 import BeautifulSoup
url = "https://fbref.com/en/matches/e62685d4/Manchester-United-Leeds-United-August-14-2021-Premier-League"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
scorebox = soup.find("div", class_="scorebox_meta")
date = scorebox.select("div", class_="data-venue-date")
time = scorebox.select("div", class_="data-venue-time")
venue = scorebox.select("div", class_="match_stadium")
print("Date:", date)
print("Time:", time)
print("Venue:", venue)
Here is one way to get the information from that page:
import pandas as pd
url = "https://fbref.com/en/matches/e62685d4/Manchester-United-Leeds-United-August-14-2021-Premier-League"
dfs = pd.read_html(url)
for df in dfs:
    print(df.head())
    print('\n')
Result in terminal:
Manchester Utd (4-2-3-1) Manchester Utd (4-2-3-1).1
0 1 David de Gea
1 2 Victor Lindelöf
2 5 Harry Maguire
3 6 Paul Pogba
4 11 Mason Greenwood
Leeds United (4-1-4-1) Leeds United (4-1-4-1).1
0 1 Illan Meslier
1 2 Luke Ayling
2 5 Robin Koch
3 6 Liam Cooper
4 9 Patrick Bamford
Manchester Utd Leeds United
Possession Possession
0 49% 51%
1 Passing Accuracy Passing Accuracy
2 363 of 465 — 78% 77% — 372 of 482
3 Shots on Target Shots on Target
4 8 of 16 — 50% 30% — 3 of 10
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Performance Expected SCA Passes Carries Take-Ons
Player # Nation Pos Age Min Gls Ast PK PKatt Sh SoT CrdY CrdR Touches Tkl Int Blocks xG npxG xAG SCA GCA Cmp Att Cmp% PrgP Carries PrgC Att Succ
0 Mason Greenwood 11.0 eng ENG FW,RW 19-317 90 1 0 0 0 3 2 0 0 38 0 0 1 0.1 0.1 0.3 5 1 22 27 81.5 4 28 3 9 5
1 Paul Pogba 6.0 fr FRA LW 28-152 74 0 4 0 0 2 1 0 0 46 1 1 1 0.3 0.3 0.7 5 4 25 33 75.8 5 29 2 2 1
2 Anthony Martial 9.0 fr FRA FW 25-252 16 0 0 0 0 0 0 0 0 7 0 0 0 0.0 0.0 0.0 0 0 5 7 71.4 1 5 0 2 2
3 Daniel James 21.0 wls WAL RW 23-277 74 0 0 0 0 3 1 0 0 34 2 0 2 0.2 0.2 0.0 2 0 13 19 68.4 2 16 2 1 0
4 Jadon Sancho 25.0 eng ENG LW 21-142 16 0 0 0 0 0 0 0 0 15 0 0 0 0.0 0.0 0.0 0 0 10 14 71.4 0 8 1 2 1
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Total Short Medium Long Unnamed: 20_level_0 Unnamed: 21_level_0 Unnamed: 22_level_0 Unnamed: 23_level_0 Unnamed: 24_level_0 Unnamed: 25_level_0 Unnamed: 26_level_0 Unnamed: 27_level_0
Player # Nation Pos Age Min Cmp Att Cmp% TotDist PrgDist Cmp Att Cmp% Cmp Att Cmp% Cmp Att Cmp% Ast xAG xA KP 1/3 PPA CrsPA PrgP
0 Mason Greenwood 11.0 eng ENG FW,RW 19-317 90 22 27 81.5 319 102 13 14 92.9 8 8 100.0 1 4 25.0 0 0.3 0.1 1 3 2 0 4
1 Paul Pogba 6.0 fr FRA LW 28-152 74 25 33 75.8 523 212 10 15 66.7 6 7 85.7 7 8 87.5 4 0.7 0.5 5 3 2 0 5
2 Anthony Martial 9.0 fr FRA FW 25-252 16 5 7 71.4 79 14 2 2 100.0 3 4 75.0 0 0 NaN 0 0.0 0.0 0 1 0 0 1
3 Daniel James 21.0 wls WAL RW 23-277 74 13 19 68.4 114 33 11 12 91.7 1 2 50.0 0 1 0.0 0 0.0 0.0 1 1 0 0 2
4 Jadon Sancho 25.0 eng ENG LW 21-142 16 10 14 71.4 113 3 6 8 75.0 3 5 60.0 0 0 NaN 0 0.0 0.0 0 0 0 0 0
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Unnamed: 6_level_0 Pass Types Corner Kicks Outcomes
Player # Nation Pos Age Min Att Live Dead FK TB Sw Crs TI CK In Out Str Cmp Off Blocks
0 Mason Greenwood 11.0 eng ENG FW,RW 19-317 90 27 25 2 0 3 0 2 0 2 2 0 0 22 0 0
1 Paul Pogba 6.0 fr FRA LW 28-152 74 33 32 1 1 2 3 0 0 0 0 0 0 25 0 1
2 Anthony Martial 9.0 fr FRA FW 25-252 16 7 7 0 0 0 0 0 0 0 0 0 0 5 0 1
3 Daniel James 21.0 wls WAL RW 23-277 74 19 19 0 0 0 0 2 0 0 0 0 0 13 0 1
4 Jadon Sancho 25.0 eng ENG LW 21-142 16 14 13 1 0 1 0 0 1 0 0 0 0 10 0 0
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Tackles Challenges Blocks Unnamed: 18_level_0 Unnamed: 19_level_0 Unnamed: 20_level_0 Unnamed: 21_level_0
Player # Nation Pos Age Min Tkl TklW Def 3rd Mid 3rd Att 3rd Tkl Att Tkl% Lost Blocks Sh Pass Int Tkl+Int Clr Err
0 Mason Greenwood 11.0 eng ENG FW,RW 19-317 90 0 0 0 0 0 0 0 NaN 0 1 0 1 0 0 0 0
1 Paul Pogba 6.0 fr FRA LW 28-152 74 1 1 0 1 0 1 3 33.3 2 1 0 1 1 2 0 0
2 Anthony Martial 9.0 fr FRA FW 25-252 16 0 0 0 0 0 0 0 NaN 0 0 0 0 0 0 0 0
3 Daniel James 21.0 wls WAL RW 23-277 74 2 1 0 2 0 0 0 NaN 0 2 0 2 0 2 1 0
4 Jadon Sancho 25.0 eng ENG LW 21-142 16 0 0 0 0 0 0 0 NaN 0 0 0 0 0 0 0 0
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Touches Take-Ons Carries Receiving
Player # Nation Pos Age Min Touches Def Pen Def 3rd Mid 3rd Att 3rd Att Pen Live Att Succ Succ% Tkld Tkld% Carries TotDist PrgDist PrgC 1/3 CPA Mis Dis Rec PrgR
0 Mason Greenwood 11.0 eng ENG FW,RW 19-317 90 38 0 3 16 21 2 38 9 5 55.6 4 44.4 28 180 111 3 3 2 3 2 28 6
1 Paul Pogba 6.0 fr FRA LW 28-152 74 46 1 3 23 20 3 46 2 1 50.0 1 50.0 29 125 73 2 3 1 6 3 35 6
2 Anthony Martial 9.0 fr FRA FW 25-252 16 7 0 1 6 1 0 7 2 2 100.0 0 0.0 5 57 25 0 0 0 0 0 6 1
3 Daniel James 21.0 wls WAL RW 23-277 74 34 1 2 13 19 5 34 1 0 0.0 1 100.0 16 132 48 2 1 1 3 1 21 10
4 Jadon Sancho 25.0 eng ENG LW 21-142 16 15 0 1 8 6 0 15 2 1 50.0 1 50.0 8 44 24 1 1 0 0 0 13 0
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Performance Aerial Duels
Player # Nation Pos Age Min CrdY CrdR 2CrdY Fls Fld Off Crs Int TklW PKwon PKcon OG Recov Won Lost Won%
0 Mason Greenwood 11.0 eng ENG FW,RW 19-317 90 0 0 0 2 1 0 2 0 0 0 0 0 5 0 2 0.0
1 Paul Pogba 6.0 fr FRA LW 28-152 74 0 0 0 2 1 0 0 1 1 0 0 0 5 1 1 50.0
2 Anthony Martial 9.0 fr FRA FW 25-252 16 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 NaN
3 Daniel James 21.0 wls WAL RW 23-277 74 0 0 0 0 0 2 2 0 1 0 0 0 3 1 1 50.0
4 Jadon Sancho 25.0 eng ENG LW 21-142 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Shot Stopping Launched Passes Goal Kicks Crosses Sweeper
Player Nation Age Min SoTA GA Saves Save% PSxG Cmp Att Cmp% Att Thr Launch% AvgLen Att Launch% AvgLen Opp Stp Stp% #OPA AvgDist
0 David de Gea es ESP 30-280 90 3 1 2 66.7 1.0 7 11 63.6 24 3 8.3 20.8 12 75.0 42.5 12 0 0.0 0 12.0
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Performance Expected SCA Passes Carries Take-Ons
Player # Nation Pos Age Min Gls Ast PK PKatt Sh SoT CrdY CrdR Touches Tkl Int Blocks xG npxG xAG SCA GCA Cmp Att Cmp% PrgP Carries PrgC Att Succ
0 Patrick Bamford 9.0 eng ENG FW 27-343 76 0 0 0 0 1 0 0 0 25 0 1 1 0.1 0.1 0.1 1 0 12 14 85.7 2 12 0 0 0
1 Tyler Roberts 11.0 wls WAL FW 22-214 14 0 0 0 0 1 0 0 0 6 1 0 0 0.0 0.0 0.2 1 0 3 3 100.0 1 3 0 0 0
2 Jack Harrison 22.0 eng ENG LM 24-267 68 0 0 0 0 1 1 0 0 31 1 1 1 0.0 0.0 0.1 3 0 14 21 66.7 3 24 0 1 1
3 Hélder Costa 17.0 ao ANG LM 27-214 22 0 0 0 0 1 0 0 0 9 0 0 1 0.0 0.0 0.0 1 0 5 5 100.0 0 6 0 0 0
4 Mateusz Klich 43.0 pl POL FW,CM 31-062 90 0 0 0 0 2 1 0 0 49 2 0 0 0.1 0.1 0.0 2 0 34 46 73.9 3 31 2 1 1
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Total Short Medium Long Unnamed: 20_level_0 Unnamed: 21_level_0 Unnamed: 22_level_0 Unnamed: 23_level_0 Unnamed: 24_level_0 Unnamed: 25_level_0 Unnamed: 26_level_0 Unnamed: 27_level_0
Player # Nation Pos Age Min Cmp Att Cmp% TotDist PrgDist Cmp Att Cmp% Cmp Att Cmp% Cmp Att Cmp% Ast xAG xA KP 1/3 PPA CrsPA PrgP
0 Patrick Bamford 9.0 eng ENG FW 27-343 76 12 14 85.7 141 25 10 12 83.3 0 0 NaN 1 1 100.0 0 0.1 0.0 1 0 1 0 2
1 Tyler Roberts 11.0 wls WAL FW 22-214 14 3 3 100.0 70 22 1 1 100.0 1 1 100.0 1 1 100.0 0 0.2 0.0 1 0 2 1 1
2 Jack Harrison 22.0 eng ENG LM 24-267 68 14 21 66.7 215 94 7 10 70.0 5 6 83.3 1 3 33.3 0 0.1 0.0 2 3 0 0 3
3 Hélder Costa 17.0 ao ANG LM 27-214 22 5 5 100.0 79 0 2 2 100.0 3 3 100.0 0 0 NaN 0 0.0 0.0 0 0 0 0 0
4 Mateusz Klich 43.0 pl POL FW,CM 31-062 90 34 46 73.9 518 80 19 21 90.5 12 16 75.0 2 5 40.0 0 0.0 0.0 0 3 0 0 3
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Unnamed: 6_level_0 Pass Types Corner Kicks Outcomes
Player # Nation Pos Age Min Att Live Dead FK TB Sw Crs TI CK In Out Str Cmp Off Blocks
0 Patrick Bamford 9.0 eng ENG FW 27-343 76 14 8 6 0 0 0 0 0 0 0 0 0 12 0 0
1 Tyler Roberts 11.0 wls WAL FW 22-214 14 3 3 0 0 0 0 1 0 0 0 0 0 3 0 0
2 Jack Harrison 22.0 eng ENG LM 24-267 68 21 21 0 0 0 0 1 0 0 0 0 0 14 0 1
3 Hélder Costa 17.0 ao ANG LM 27-214 22 5 5 0 0 0 0 0 0 0 0 0 0 5 0 0
4 Mateusz Klich 43.0 pl POL FW,CM 31-062 90 46 44 1 0 1 0 1 1 0 0 0 0 34 1 1
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Tackles Challenges Blocks Unnamed: 18_level_0 Unnamed: 19_level_0 Unnamed: 20_level_0 Unnamed: 21_level_0
Player # Nation Pos Age Min Tkl TklW Def 3rd Mid 3rd Att 3rd Tkl Att Tkl% Lost Blocks Sh Pass Int Tkl+Int Clr Err
0 Patrick Bamford 9.0 eng ENG FW 27-343 76 0 0 0 0 0 0 1 0.0 1 1 0 1 1 1 0 0
1 Tyler Roberts 11.0 wls WAL FW 22-214 14 1 0 0 1 0 0 0 NaN 0 0 0 0 0 1 0 0
2 Jack Harrison 22.0 eng ENG LM 24-267 68 1 0 0 1 0 1 2 50.0 1 1 0 1 1 2 0 0
3 Hélder Costa 17.0 ao ANG LM 27-214 22 0 0 0 0 0 0 0 NaN 0 1 0 1 0 0 0 0
4 Mateusz Klich 43.0 pl POL FW,CM 31-062 90 2 1 0 1 1 2 2 100.0 0 0 0 0 0 2 0 0
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Touches Take-Ons Carries Receiving
Player # Nation Pos Age Min Touches Def Pen Def 3rd Mid 3rd Att 3rd Att Pen Live Att Succ Succ% Tkld Tkld% Carries TotDist PrgDist PrgC 1/3 CPA Mis Dis Rec PrgR
0 Patrick Bamford 9.0 eng ENG FW 27-343 76 25 0 0 17 8 3 25 0 0 NaN 0 NaN 12 19 3 0 0 0 5 1 16 2
1 Tyler Roberts 11.0 wls WAL FW 22-214 14 6 0 0 2 4 1 6 0 0 NaN 0 NaN 3 9 4 0 0 0 1 0 4 2
2 Jack Harrison 22.0 eng ENG LM 24-267 68 31 0 5 19 7 0 31 1 1 100.0 0 0.0 24 78 25 0 0 0 4 0 22 6
3 Hélder Costa 17.0 ao ANG LM 27-214 22 9 0 2 4 3 0 9 0 0 NaN 0 NaN 6 18 1 0 0 0 1 0 6 2
4 Mateusz Klich 43.0 pl POL FW,CM 31-062 90 49 0 6 28 16 1 49 1 1 100.0 0 0.0 31 180 72 2 1 0 1 1 39 9
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Performance Aerial Duels
Player # Nation Pos Age Min CrdY CrdR 2CrdY Fls Fld Off Crs Int TklW PKwon PKcon OG Recov Won Lost Won%
0 Patrick Bamford 9.0 eng ENG FW 27-343 76 0 0 0 1 0 1 0 1 0 0 0 0 1 0 2 0.0
1 Tyler Roberts 11.0 wls WAL FW 22-214 14 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 0.0
2 Jack Harrison 22.0 eng ENG LM 24-267 68 0 0 0 1 1 1 1 1 0 0 0 0 3 0 1 0.0
3 Hélder Costa 17.0 ao ANG LM 27-214 22 0 0 0 1 2 0 0 0 0 0 0 0 1 0 1 0.0
4 Mateusz Klich 43.0 pl POL FW,CM 31-062 90 0 0 0 2 0 0 1 0 1 0 0 0 6 0 0 NaN
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Shot Stopping Launched Passes Goal Kicks Crosses Sweeper
Player Nation Age Min SoTA GA Saves Save% PSxG Cmp Att Cmp% Att Thr Launch% AvgLen Att Launch% AvgLen Opp Stp Stp% #OPA AvgDist
0 Illan Meslier fr FRA 21-165 90 8 5 3 37.5 1.6 5 20 25.0 37 10 37.8 30.2 10 60.0 44.5 10 1 10.0 2 13.3
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Unnamed: 6_level_0 Unnamed: 7_level_0 Unnamed: 8_level_0 SCA 1 SCA 2
Minute Player Squad xG PSxG Outcome Distance Body Part Notes Player Event Player Event
0 2.0 Scott McTominay Manchester Utd 0.03 NaN Blocked 15.0 Right Foot NaN Bruno Fernandes Pass (Live) Daniel James Pass (Live)
1 6.0 Mason Greenwood Manchester Utd 0.02 0.13 Saved 28.0 Left Foot NaN Scott McTominay Pass (Live) Bruno Fernandes Pass (Live)
2 9.0 Mateusz Klich Leeds United 0.02 NaN Off Target 22.0 Right Foot NaN Jack Harrison Pass (Live) Raphinha Pass (Dead)
3 11.0 Harry Maguire Manchester Utd 0.01 NaN Off Target 14.0 Head NaN Luke Shaw Pass (Dead) NaN NaN
4 12.0 Paul Pogba Manchester Utd 0.25 NaN Off Target 16.0 Left Foot NaN Mason Greenwood Pass (Live) Mason Greenwood Take-On
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Unnamed: 6_level_0 Unnamed: 7_level_0 Unnamed: 8_level_0 SCA 1 SCA 2
Minute Player Squad xG PSxG Outcome Distance Body Part Notes Player Event Player Event
0 2.0 Scott McTominay Manchester Utd 0.03 NaN Blocked 15.0 Right Foot NaN Bruno Fernandes Pass (Live) Daniel James Pass (Live)
1 6.0 Mason Greenwood Manchester Utd 0.02 0.13 Saved 28.0 Left Foot NaN Scott McTominay Pass (Live) Bruno Fernandes Pass (Live)
2 11.0 Harry Maguire Manchester Utd 0.01 NaN Off Target 14.0 Head NaN Luke Shaw Pass (Dead) NaN NaN
3 12.0 Paul Pogba Manchester Utd 0.25 NaN Off Target 16.0 Left Foot NaN Mason Greenwood Pass (Live) Mason Greenwood Take-On
4 15.0 Daniel James Manchester Utd 0.03 NaN Blocked 17.0 Left Foot NaN NaN NaN NaN NaN
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Unnamed: 6_level_0 Unnamed: 7_level_0 Unnamed: 8_level_0 SCA 1 SCA 2
Minute Player Squad xG PSxG Outcome Distance Body Part Notes Player Event Player Event
0 9.0 Mateusz Klich Leeds United 0.02 NaN Off Target 22.0 Right Foot NaN Jack Harrison Pass (Live) Raphinha Pass (Dead)
1 16.0 Jack Harrison Leeds United 0.02 0.05 Saved 25.0 Left Foot Volley Mateusz Klich Pass (Live) Raphinha Pass (Dead)
2 16.0 Mateusz Klich Leeds United 0.04 0.39 Saved 27.0 Right Foot NaN Mateusz Klich Take-On Jack Harrison Pass (Live)
3 26.0 Patrick Bamford Leeds United 0.12 NaN Off Target 10.0 Head NaN Raphinha Pass (Dead) Jack Harrison Fouled
4 34.0 Raphinha Leeds United 0.05 NaN Blocked 20.0 Left Foot NaN Patrick Bamford Pass (Live) Stuart Dallas Pass (Live)
Pick the table(s) you want. See the pandas documentation for read_html for details.
Edit: here is another way to get the particular info OP is after:
import requests
from bs4 import BeautifulSoup
url = "https://fbref.com/en/matches/e62685d4/Manchester-United-Leeds-United-August-14-2021-Premier-League"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
scorebox = soup.find("div", class_="scorebox_meta")
date = scorebox.select_one('span[class="venuetime"]').get('data-venue-date')
time = scorebox.select_one('span[class="venuetime"]').get('data-venue-time')
venue = scorebox.find('small', string='Venue').find_next('small').text
print(date, time, venue)
Result in terminal:
2021-08-14 12:30 Old Trafford, Manchester
scorebox.select("div", class_="data-venue-date") does not filter on "data-venue-date" at all; it simply returns every div inside scorebox.
In addition, "data-venue-date" is not a class but an attribute, and it belongs to the span element, not to a div.
To do as you wanted:
import re
print(scorebox.find("span", {"data-venue-date" : re.compile(r".*")}))
# <span class="venuetime" data-venue-date="2021-08-14" data-venue-epoch="1628940600" data-venue-time="12:30">12:30 (venue time)</span>
But we don't need to do this, we can do:
print(scorebox.find("div").text)
print(scorebox.find_all("div")[-2].text)
The first call selects the first div inside scorebox.
The second call selects the second-to-last div inside scorebox.
Output:
Saturday August 14, 2021, 12:30 (venue time)
Venue: Old Trafford, Manchester
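The difference between a class and an attribute described above can be sketched with a CSS attribute selector. This is only a minimal example: it parses a trimmed copy of the markup from the question instead of fetching the page, and the variable names match_date, match_time and venue are chosen here for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, so no network access is needed.
html = """
<div class="scorebox_meta">
<div><strong>Saturday August 14, 2021</strong>,<span class="venuetime" data-venue-date="2021-08-14" data-venue-time="12:30" data-venue-epoch="1628940600">12:30 (venue time)</span></div>
<div><strong><small>Venue</small></strong>: <small>Old Trafford, Manchester</small></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
scorebox = soup.find("div", class_="scorebox_meta")

# span[data-venue-date] matches a span that carries the attribute, whatever its value.
span = scorebox.select_one("span[data-venue-date]")
match_date = span["data-venue-date"]
match_time = span["data-venue-time"]

# The venue is the <small> element that follows the one labelled "Venue".
venue = scorebox.find("small", string="Venue").find_next("small").get_text(strip=True)

print(match_date, match_time, venue)
```

Reading the values from the attributes avoids having to parse the human-readable text of the div.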
I am trying to scrape an entire table and store it in a .csv file.
When I try to scrape the data, I get the error "No tables found".
Here is my code:
from pandas.io.html import read_html
page = 'https://games.crossfit.com/leaderboard/open/2020?view=0&division=1&scaled=0&sort=0'
tables = read_html(page, attrs={"class":"desktop athletes"})
print ("Extracted {num} tables".format(num=len(tables)))
Any suggestions or guidance would be appreciated.
This page uses JavaScript to fetch the data from the server and build the table, so read_html finds no table in the HTML it downloads.
Using DevTools in Chrome/Firefox (Network tab), you can see every request the browser sends to the server. One of the XHR/AJAX requests returns all the data as JSON, so you can request that URL directly and convert the JSON to Python data instead of scraping the page.
import requests
r = requests.get('https://games.crossfit.com/competitions/api/v1/competitions/open/2020/leaderboards?view=0&division=1&scaled=0&sort=0')
data = r.json()
for row in data['leaderboardRows']:
    print(row['entrant']['competitorName'], row['overallScore'], [(x['rank'],x['scoreDisplay']) for x in row['scores']])
Result
Patrick Vellner 64 [('13', '8:38'), ('19', '988 reps'), ('12', '6:29'), ('18', '16:29'), ('2', '10:09')]
Mathew Fraser 74 [('8', '8:28'), ('40', '959 reps'), ('3', '6:08'), ('2', '14:22'), ('21', '10:45')]
Lefteris Theofanidis 94 [('1', '8:05'), ('3', '1021 reps'), ('13', '6:32'), ('4', '15:00'), ('73', '11:11')]
# ... more ...
As the other answer says, you can access the API to get the data. To save it as CSV, you'll need to work through the JSON to get what you need (i.e. flatten out the nested data). There are two ways to do it: a) flatten it completely so that there is one row per entrant, or b) have a separate row for each entrant for each of their ordinal scores.
The only difference is that with a) you'll have a really wide table (but no repeated data), and with b) you'll have a long table with repeated data.
Since it's not too big of a file, I went with option b), so you can always group by particular columns or filter:
import requests
import pandas as pd

r = requests.get('https://games.crossfit.com/competitions/api/v1/competitions/open/2020/leaderboards?view=0&division=1&scaled=0&sort=0')
data = r.json()

frames = []
for row in data['leaderboardRows']:
    entrant = {}        # one value per field describing the entrant
    score_frames = []   # one DataFrame per list-valued field (the scores)
    for key, each in row.items():
        if isinstance(each, dict):
            entrant.update(each)                     # flatten nested dict fields
        elif isinstance(each, list):
            score_frames.append(pd.DataFrame(each))  # one row per score
        else:
            entrant[key] = each                      # plain top-level value
    scores = pd.concat(score_frames, ignore_index=True)
    # Repeat the entrant fields on every score row (instead of a hard-coded "* 5").
    entrant_rows = pd.DataFrame([entrant] * len(scores))
    frames.append(scores.merge(entrant_rows, left_index=True, right_index=True))

# DataFrame.append and Series.iteritems were removed in pandas 2.0,
# so the per-entrant frames are collected and concatenated once.
results = pd.concat(frames, ignore_index=True, sort=True)
results.to_csv('file.csv', index=False)
Output: first 15 rows of 250
print (results.head(15).to_string())
affiliate affiliateId affiliateName age breakdown competitorId competitorName countryChampion countryOfOriginCode countryOfOriginName divisionId drawBlueHR firstName gender heat height highlight judge lane lastName mobileScoreDisplay nextStage ordinal overallRank overallScore postCompStatus profilePicS3key rank scaled score scoreDisplay scoreIdentifier status time video weight
0 CrossFit Nanaimo 1918 CrossFit Nanaimo 30 10 rounds 158264 Patrick Vellner False CA Canada 1 NaN Patrick M 71 in False Dallyn Giroux Vellner 1 1 64 d471c-P158264_7-184.jpg 13 0 11800382 8:38 9d3979222412df2842a1 ACT 518 0 195 lb
1 CrossFit Soul Miami 1918 CrossFit Nanaimo 30 29 rounds +\n2 thrusters\n 158264 Patrick Vellner False CA Canada 1 NaN Patrick M 71 in False Lamar Vernon Vellner 2 1 64 d471c-P158264_7-184.jpg 19 0 1009880000 988 reps 9bd66b00e8367cc7fd0c ACT NaN 0 195 lb
2 CrossFit Nanaimo 1918 CrossFit Nanaimo 30 165 reps 158264 Patrick Vellner False CA Canada 1 NaN Patrick M 71 in False Jason Lochhead Vellner 3 1 64 d471c-P158264_7-184.jpg 12 0 1001650151 6:29 2347b4cb7339f2a13e6c ACT 389 0 195 lb
3 CrossFit Nanaimo 1918 CrossFit Nanaimo 30 240 reps 158264 Patrick Vellner False CA Canada 1 NaN Patrick M 71 in False Dallyn Giroux Vellner 4 1 64 d471c-P158264_7-184.jpg 18 0 1002400211 16:29 bcfd3882df3fa2e99451 ACT 989 0 195 lb
4 CrossFit New England 1918 CrossFit Nanaimo 30 240 reps 158264 Patrick Vellner False CA Canada 1 NaN Patrick M 71 in False Matt O'Keefe Vellner 5 1 64 d471c-P158264_7-184.jpg 2 0 1002400591 10:09 4bb25bed5f71141da122 ACT 609 0 195 lb
5 CrossFit Mayhem 3220 CrossFit Mayhem 30 10 rounds 153604 Mathew Fraser True US United States 1 NaN Mathew M 67 in False darren hunsucker Fraser 1 2 74 9e218-P153604_4-184.jpg 8 0 11800392 8:28 18b5b2e137f00a2d9d7d ACT 508 0 195 lb
6 CrossFit Soul Miami 3220 CrossFit Mayhem 30 28 rounds +\n4 thrusters\n3 toes-to-bars\n 153604 Mathew Fraser True US United States 1 NaN Mathew M 67 in False Daniel Lopez Fraser 2 2 74 9e218-P153604_4-184.jpg 40 0 1009590000 959 reps b96bc1b7b58fa34a28a1 ACT NaN 0 195 lb
7 CrossFit Mayhem 3220 CrossFit Mayhem 30 165 reps 153604 Mathew Fraser True US United States 1 NaN Mathew M 67 in False Jason Fernandez Fraser 3 2 74 9e218-P153604_4-184.jpg 3 0 1001650172 6:08 4f4a994a045652c894c5 ACT 368 0 195 lb
8 CrossFit Mayhem 3220 CrossFit Mayhem 30 240 reps 153604 Mathew Fraser True US United States 1 NaN Mathew M 67 in False Tasia Percevecz Fraser 4 2 74 9e218-P153604_4-184.jpg 2 0 1002400338 14:22 1a4a7d8760e72bb12d68 ACT 862 0 195 lb
9 CrossFit Mayhem 3220 CrossFit Mayhem 30 240 reps 153604 Mathew Fraser True US United States 1 NaN Mathew M 67 in False Kelley Jackson Fraser 5 2 74 9e218-P153604_4-184.jpg 21 0 1002400555 10:45 b4a259e7049f47f65356 ACT 645 0 195 lb
10 NaN 0 30 10 rounds 514502 Lefteris Theofanidis True GR Greece 1 NaN Lefteris M 171 cm False NaN Theofanidis 1 3 94 931eb-P514502_2-184.jpg 1 0 11800415 8:05 c8907e02512f42ff3142 ACT 485 1 81 kg
11 NaN 0 30 30 rounds +\n1 thruster\n 514502 Lefteris Theofanidis True GR Greece 1 NaN Lefteris M 171 cm False NaN Theofanidis 2 3 94 931eb-P514502_2-184.jpg 3 0 1010210000 1021 reps 63add31b22606957701c ACT NaN 1 81 kg
12 NaN 0 30 165 reps 514502 Lefteris Theofanidis True GR Greece 1 NaN Lefteris M 171 cm False NaN Theofanidis 3 3 94 931eb-P514502_2-184.jpg 13 0 1001650148 6:32 46d7cdb691c25ea38dbe ACT 392 1 81 kg
13 NaN 0 30 240 reps 514502 Lefteris Theofanidis True GR Greece 1 NaN Lefteris M 171 cm False NaN Theofanidis 4 3 94 931eb-P514502_2-184.jpg 4 0 1002400300 15:00 d49e55a2af5840740071 ACT 900 1 81 kg
14 NaN 0 30 240 reps 514502 Lefteris Theofanidis True GR Greece 1 NaN Lefteris M 171 cm False NaN Theofanidis 5 3 94 931eb-P514502_2-184.jpg 73 0 1002400529 11:11 d35c9d687eb6b72c8e36 ACT 671 1 81 kg
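The two layouts described above, long (option b) and wide (option a), can also be sketched with pandas.json_normalize. This is a minimal illustration and not the code used for the output above: the sample data is made up and only mimics the shape of the API response, using key names that appear in the answers (leaderboardRows, entrant, scores, ordinal, rank, scoreDisplay, overallRank, overallScore).

```python
import pandas as pd

# Hypothetical sample shaped like the API response; the values are made up.
data = {
    "leaderboardRows": [
        {
            "overallRank": "1",
            "overallScore": "64",
            "entrant": {"competitorName": "Athlete A", "age": "30"},
            "scores": [
                {"ordinal": 1, "rank": "13", "scoreDisplay": "8:38"},
                {"ordinal": 2, "rank": "19", "scoreDisplay": "988 reps"},
            ],
        },
        {
            "overallRank": "2",
            "overallScore": "74",
            "entrant": {"competitorName": "Athlete B", "age": "30"},
            "scores": [
                {"ordinal": 1, "rank": "8", "scoreDisplay": "8:28"},
                {"ordinal": 2, "rank": "40", "scoreDisplay": "959 reps"},
            ],
        },
    ]
}

# Option b) long: one row per entrant per score, entrant fields repeated.
long_df = pd.json_normalize(
    data["leaderboardRows"],
    record_path="scores",
    meta=["overallRank", "overallScore", ["entrant", "competitorName"]],
)

# Option a) wide: one row per entrant, one scoreDisplay column per ordinal.
wide_df = long_df.pivot(
    index="entrant.competitorName", columns="ordinal", values="scoreDisplay"
)

print(long_df)
print(wide_df)
```

Starting from the long table keeps both options open, since the wide table can be derived from it with a pivot.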
Pick Tm Player Pos Age To AP1 PB St CarAV ... Att Yds TD Rec Yds TD Tkl Int Sk College/Univ
0 1 CLE Myles Garrett DE 21 2017 0 0 0 0 ... 0 0 0 0 0 0 13 5.0 Texas A&M
1 2 CHI Mitch Trubisky QB 23 2017 0 0 1 0 ... 29 194 0 0 0 0 North Carolina
2 3 SFO Solomon Thomas DE 21 2017 0 0 1 0 ... 0 0 0 0 0 0 25 2.0 Stanford
3 4 JAX Leonard Fournette RB 22 2017 0 0 1 0 ... 207 822 7 25 195 1 LSU
4 5 TEN Corey Davis WR 22 2017 0 0 1 0 ... 0 0 0 22 227 0 West. Michigan
Given this df, I want to count the number of players per College/Univ.
So, in this particular df, every college has a count of 1.
Given a df and a college name, how can I count the number of matching rows?
You can create a boolean mask and then count the True values with sum, because True is treated as 1:
(df['College/Univ'] == 'Texas A&M').sum()
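As a minimal sketch of the mask approach, the example below uses a small made-up frame (the player names and colleges are placeholders) and also shows value_counts, which returns the count for every college at once.

```python
import pandas as pd

# Small made-up frame using the column name from the question.
df = pd.DataFrame({
    "Player": ["A", "B", "C", "D"],
    "College/Univ": ["Texas A&M", "Stanford", "Texas A&M", "LSU"],
})

# Boolean mask: True where the college matches; sum() counts the True values.
n_texas = int((df["College/Univ"] == "Texas A&M").sum())

# value_counts() gives the number of rows for every college in one call.
per_college = df["College/Univ"].value_counts()

print(n_texas)
print(per_college)
```

The mask is the direct answer for a single college name; value_counts is convenient when the counts for all colleges are needed.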
Here is the data for my problem. The dataset is built from movie reviews: one line = one review by a reviewer.
bigdataframe
Out[43]:
movie id movietitle releasedate \
0 1 Toy Story (1995) 01-Jan-1995
1 4 Get Shorty (1995) 01-Jan-1995
2 5 Copycat (1995) 01-Jan-1995
3 7 Twelve Monkeys (1995) 01-Jan-1995
4 8 Babe (1995) 01-Jan-1995
5 9 Dead Man Walking (1995) 01-Jan-1995
6 11 Seven (Se7en) (1995) 01-Jan-1995
7 12 Usual Suspects, The (1995) 14-Aug-1995
8 15 Mr. Holland's Opus (1995) 29-Jan-1996
9 17 From Dusk Till Dawn (1996) 05-Feb-1996
10 19 Antonia's Line (1995) 01-Jan-1995
11 21 Muppet Treasure Island (1996) 16-Feb-1996
12 22 Braveheart (1995) 16-Feb-1996
13 23 Taxi Driver (1976) 16-Feb-1996
14 24 Rumble in the Bronx (1995) 23-Feb-1996
15 25 Birdcage, The (1996) 08-Mar-1996
16 28 Apollo 13 (1995) 01-Jan-1995
17 30 Belle de jour (1967) 01-Jan-1967
18 31 Crimson Tide (1995) 01-Jan-1995
19 32 Crumb (1994) 01-Jan-1994
20 42 Clerks (1994) 01-Jan-1994
21 44 Dolores Claiborne (1994) 01-Jan-1994
22 45 Eat Drink Man Woman (1994) 01-Jan-1994
23 47 Ed Wood (1994) 01-Jan-1994
24 48 Hoop Dreams (1994) 01-Jan-1994
25 49 I.Q. (1994) 01-Jan-1994
26 50 Star Wars (1977) 01-Jan-1977
27 54 Outbreak (1995) 01-Jan-1995
28 55 Professional, The (1994) 01-Jan-1994
29 56 Pulp Fiction (1994) 01-Jan-1994
... ... ...
99970 332 Kiss the Girls (1997) 01-Jan-1997
99971 334 U Turn (1997) 01-Jan-1997
99972 338 Bean (1997) 01-Jan-1997
99973 346 Jackie Brown (1997) 01-Jan-1997
99974 682 I Know What You Did Last Summer (1997) 17-Oct-1997
99975 873 Picture Perfect (1997) 01-Aug-1997
99976 877 Excess Baggage (1997) 01-Jan-1997
99977 886 Life Less Ordinary, A (1997) 01-Jan-1997
99978 1527 Senseless (1998) 09-Jan-1998
99979 272 Good Will Hunting (1997) 01-Jan-1997
99980 288 Scream (1996) 20-Dec-1996
99981 294 Liar Liar (1997) 21-Mar-1997
99982 300 Air Force One (1997) 01-Jan-1997
99983 310 Rainmaker, The (1997) 01-Jan-1997
99984 313 Titanic (1997) 01-Jan-1997
99985 322 Murder at 1600 (1997) 18-Apr-1997
99986 328 Conspiracy Theory (1997) 08-Aug-1997
99987 333 Game, The (1997) 01-Jan-1997
99988 338 Bean (1997) 01-Jan-1997
99989 346 Jackie Brown (1997) 01-Jan-1997
99990 354 Wedding Singer, The (1998) 13-Feb-1998
99991 362 Blues Brothers 2000 (1998) 06-Feb-1998
99992 683 Rocket Man (1997) 01-Jan-1997
99993 689 Jackal, The (1997) 01-Jan-1997
99994 690 Seven Years in Tibet (1997) 01-Jan-1997
99995 748 Saint, The (1997) 14-Mar-1997
99996 751 Tomorrow Never Dies (1997) 01-Jan-1997
99997 879 Peacemaker, The (1997) 01-Jan-1997
99998 894 Home Alone 3 (1997) 01-Jan-1997
99999 901 Mr. Magoo (1997) 25-Dec-1997
videoreleasedate IMDb URL \
0 NaN http://us.imdb.com/M/title-exact?Toy%20Story%2...
1 NaN http://us.imdb.com/M/title-exact?Get%20Shorty%...
2 NaN http://us.imdb.com/M/title-exact?Copycat%20(1995)
3 NaN http://us.imdb.com/M/title-exact?Twelve%20Monk...
4 NaN http://us.imdb.com/M/title-exact?Babe%20(1995)
5 NaN http://us.imdb.com/M/title-exact?Dead%20Man%20...
6 NaN http://us.imdb.com/M/title-exact?Se7en%20(1995)
7 NaN http://us.imdb.com/M/title-exact?Usual%20Suspe...
8 NaN http://us.imdb.com/M/title-exact?Mr.%20Holland...
9 NaN http://us.imdb.com/M/title-exact?From%20Dusk%2...
10 NaN http://us.imdb.com/M/title-exact?Antonia%20(1995)
11 NaN http://us.imdb.com/M/title-exact?Muppet%20Trea...
12 NaN http://us.imdb.com/M/title-exact?Braveheart%20...
13 NaN http://us.imdb.com/M/title-exact?Taxi%20Driver...
14 NaN http://us.imdb.com/M/title-exact?Hong%20Faan%2...
15 NaN http://us.imdb.com/M/title-exact?Birdcage,%20T...
16 NaN http://us.imdb.com/M/title-exact?Apollo%2013%2...
17 NaN http://us.imdb.com/M/title-exact?Belle%20de%20...
18 NaN http://us.imdb.com/M/title-exact?Crimson%20Tid...
19 NaN http://us.imdb.com/M/title-exact?Crumb%20(1994)
20 NaN http://us.imdb.com/M/title-exact?Clerks%20(1994)
21 NaN http://us.imdb.com/M/title-exact?Dolores%20Cla...
22 NaN http://us.imdb.com/M/title-exact?Yinshi%20Nan%...
23 NaN http://us.imdb.com/M/title-exact?Ed%20Wood%20(...
24 NaN http://us.imdb.com/M/title-exact?Hoop%20Dreams...
25 NaN http://us.imdb.com/M/title-exact?I.Q.%20(1994)
26 NaN http://us.imdb.com/M/title-exact?Star%20Wars%2...
27 NaN http://us.imdb.com/M/title-exact?Outbreak%20(1...
28 NaN http://us.imdb.com/Title?L%E9on+(1994)
29 NaN http://us.imdb.com/M/title-exact?Pulp%20Fictio...
... ...
99970 NaN http://us.imdb.com/M/title-exact?Kiss+the+Girl...
99971 NaN http://us.imdb.com/Title?U+Turn+(1997)
99972 NaN http://us.imdb.com/M/title-exact?Bean+(1997)
99973 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
99974 NaN http://us.imdb.com/M/title-exact?I+Know+What+Y...
99975 NaN http://us.imdb.com/M/title-exact?Picture+Perfe...
99976 NaN http://us.imdb.com/M/title-exact?Excess+Baggag...
99977 NaN http://us.imdb.com/M/title-exact?Life+Less+Ord...
99978 NaN http://us.imdb.com/M/title-exact?imdb-title-12...
99979 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
99980 NaN http://us.imdb.com/M/title-exact?Scream%20(1996)
99981 NaN http://us.imdb.com/Title?Liar+Liar+(1997)
99982 NaN http://us.imdb.com/M/title-exact?Air+Force+One...
99983 NaN http://us.imdb.com/M/title-exact?Rainmaker,+Th...
99984 NaN http://us.imdb.com/M/title-exact?imdb-title-12...
99985 NaN http://us.imdb.com/M/title-exact?Murder%20at%2...
99986 NaN http://us.imdb.com/M/title-exact?Conspiracy+Th...
99987 NaN http://us.imdb.com/M/title-exact?Game%2C+The+(...
99988 NaN http://us.imdb.com/M/title-exact?Bean+(1997)
99989 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
99990 NaN http://us.imdb.com/M/title-exact?Wedding+Singe...
99991 NaN http://us.imdb.com/M/title-exact?Blues+Brother...
99992 NaN http://us.imdb.com/M/title-exact?Rocket+Man+(1...
99993 NaN http://us.imdb.com/M/title-exact?Jackal%2C+The...
99994 NaN http://us.imdb.com/M/title-exact?Seven+Years+i...
99995 NaN http://us.imdb.com/M/title-exact?Saint%2C%20Th...
99996 NaN http://us.imdb.com/M/title-exact?imdb-title-12...
99997 NaN http://us.imdb.com/M/title-exact?Peacemaker%2C...
99998 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
99999 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
unknown Action Adventure Animation Childrens ... Western \
0 0 0 0 1 1 ... 0
1 0 1 0 0 0 ... 0
2 0 0 0 0 0 ... 0
3 0 0 0 0 0 ... 0
4 0 0 0 0 1 ... 0
5 0 0 0 0 0 ... 0
6 0 0 0 0 0 ... 0
7 0 0 0 0 0 ... 0
8 0 0 0 0 0 ... 0
9 0 1 0 0 0 ... 0
10 0 0 0 0 0 ... 0
11 0 1 1 0 0 ... 0
12 0 1 0 0 0 ... 0
13 0 0 0 0 0 ... 0
14 0 1 1 0 0 ... 0
15 0 0 0 0 0 ... 0
16 0 1 0 0 0 ... 0
17 0 0 0 0 0 ... 0
18 0 0 0 0 0 ... 0
19 0 0 0 0 0 ... 0
20 0 0 0 0 0 ... 0
21 0 0 0 0 0 ... 0
22 0 0 0 0 0 ... 0
23 0 0 0 0 0 ... 0
24 0 0 0 0 0 ... 0
25 0 0 0 0 0 ... 0
26 0 1 1 0 0 ... 0
27 0 1 0 0 0 ... 0
28 0 0 0 0 0 ... 0
29 0 0 0 0 0 ... 0
... ... ... ... ... ... ...
99970 0 0 0 0 0 ... 0
99971 0 1 0 0 0 ... 0
99972 0 0 0 0 0 ... 0
99973 0 0 0 0 0 ... 0
99974 0 0 0 0 0 ... 0
99975 0 0 0 0 0 ... 0
99976 0 0 1 0 0 ... 0
99977 0 0 0 0 0 ... 0
99978 0 0 0 0 0 ... 0
99979 0 0 0 0 0 ... 0
99980 0 0 0 0 0 ... 0
99981 0 0 0 0 0 ... 0
99982 0 1 0 0 0 ... 0
99983 0 0 0 0 0 ... 0
99984 0 1 0 0 0 ... 0
99985 0 0 0 0 0 ... 0
99986 0 1 0 0 0 ... 0
99987 0 0 0 0 0 ... 0
99988 0 0 0 0 0 ... 0
99989 0 0 0 0 0 ... 0
99990 0 0 0 0 0 ... 0
99991 0 1 0 0 0 ... 0
99992 0 0 0 0 0 ... 0
99993 0 1 0 0 0 ... 0
99994 0 0 0 0 0 ... 0
99995 0 1 0 0 0 ... 0
99996 0 1 0 0 0 ... 0
99997 0 1 0 0 0 ... 0
99998 0 0 0 0 1 ... 0
99999 0 0 0 0 0 ... 0
The genres are Action Adventure Animation Children's ... Western. There are around 20 genres, but the dataframe doesn't print them all out. How can I figure out which reviews are for movies classified in at least 2 genres? That is, the movie was marked as belonging to two or more genres, such as action and drama.
Since each genre is in its own dataframe column, I am a bit confused about how to do this. If the genres were in a single column I would simply use groupby, because it would work well with the genres and their counts.
Any insight would help!
Edit: As an example, you can see that movie "0", Toy Story, was classified as both Animation and Children's because it has a 1 in both columns.
Essentially you are only interested in rows for which the sum of the genres columns is greater than 1.
For all the columns this can be achieved with df = df[df.sum(axis=1) > 1]. Older pandas versions silently skip non-numeric columns here; from pandas 2.0 on you need df.sum(axis=1, numeric_only=True) for the same behaviour.
The real issue here is how to sum only the genre columns, because other columns such as movie id are numeric too and would otherwise be included in the sum.
If you have an external list of genres you can use it to select those columns, i.e. df = df[df[['Horror', 'Comedy']].sum(axis=1) > 1].
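As a minimal sketch of the idea above, here is the row-sum filter applied to a tiny made-up frame. The titles and values are invented for illustration, and the genres list is shortened to four columns; with the real data you would pass the full list of genre column names.

```python
import pandas as pd

# Toy stand-in for the real data; titles and values are made up for illustration.
df = pd.DataFrame({
    "movie id": [1, 4, 21],
    "movietitle": ["A", "B", "C"],
    "Action": [0, 1, 1],
    "Adventure": [0, 0, 1],
    "Animation": [1, 0, 0],
    "Childrens": [1, 0, 0],
})

# Only these columns take part in the sum, so "movie id" cannot distort it.
genres = ["Action", "Adventure", "Animation", "Childrens"]

# Number of genres flagged on each row.
genre_count = df[genres].sum(axis=1)

# Keep the rows whose movie is tagged with at least two genres.
multi_genre = df[genre_count > 1]
print(multi_genre["movietitle"].tolist())
```

Selecting the genre columns explicitly also sidesteps the question of how the installed pandas version treats non-numeric columns in a row-wise sum.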
Here is my dataframe (yes, it is quite large):
bigdataframe
Out[2]:
movie id movietitle releasedate \
0 1 Toy Story (1995) 01-Jan-1995
1 4 Get Shorty (1995) 01-Jan-1995
2 5 Copycat (1995) 01-Jan-1995
3 7 Twelve Monkeys (1995) 01-Jan-1995
4 8 Babe (1995) 01-Jan-1995
5 9 Dead Man Walking (1995) 01-Jan-1995
6 11 Seven (Se7en) (1995) 01-Jan-1995
7 12 Usual Suspects, The (1995) 14-Aug-1995
8 15 Mr. Holland's Opus (1995) 29-Jan-1996
9 17 From Dusk Till Dawn (1996) 05-Feb-1996
10 19 Antonia's Line (1995) 01-Jan-1995
11 21 Muppet Treasure Island (1996) 16-Feb-1996
12 22 Braveheart (1995) 16-Feb-1996
13 23 Taxi Driver (1976) 16-Feb-1996
14 24 Rumble in the Bronx (1995) 23-Feb-1996
15 25 Birdcage, The (1996) 08-Mar-1996
16 28 Apollo 13 (1995) 01-Jan-1995
17 30 Belle de jour (1967) 01-Jan-1967
18 31 Crimson Tide (1995) 01-Jan-1995
19 32 Crumb (1994) 01-Jan-1994
20 42 Clerks (1994) 01-Jan-1994
21 44 Dolores Claiborne (1994) 01-Jan-1994
22 45 Eat Drink Man Woman (1994) 01-Jan-1994
23 47 Ed Wood (1994) 01-Jan-1994
24 48 Hoop Dreams (1994) 01-Jan-1994
25 49 I.Q. (1994) 01-Jan-1994
26 50 Star Wars (1977) 01-Jan-1977
27 54 Outbreak (1995) 01-Jan-1995
28 55 Professional, The (1994) 01-Jan-1994
29 56 Pulp Fiction (1994) 01-Jan-1994
... ... ...
99970 332 Kiss the Girls (1997) 01-Jan-1997
99971 334 U Turn (1997) 01-Jan-1997
99972 338 Bean (1997) 01-Jan-1997
99973 346 Jackie Brown (1997) 01-Jan-1997
99974 682 I Know What You Did Last Summer (1997) 17-Oct-1997
99975 873 Picture Perfect (1997) 01-Aug-1997
99976 877 Excess Baggage (1997) 01-Jan-1997
99977 886 Life Less Ordinary, A (1997) 01-Jan-1997
99978 1527 Senseless (1998) 09-Jan-1998
99979 272 Good Will Hunting (1997) 01-Jan-1997
99980 288 Scream (1996) 20-Dec-1996
99981 294 Liar Liar (1997) 21-Mar-1997
99982 300 Air Force One (1997) 01-Jan-1997
99983 310 Rainmaker, The (1997) 01-Jan-1997
99984 313 Titanic (1997) 01-Jan-1997
99985 322 Murder at 1600 (1997) 18-Apr-1997
99986 328 Conspiracy Theory (1997) 08-Aug-1997
99987 333 Game, The (1997) 01-Jan-1997
99988 338 Bean (1997) 01-Jan-1997
99989 346 Jackie Brown (1997) 01-Jan-1997
99990 354 Wedding Singer, The (1998) 13-Feb-1998
99991 362 Blues Brothers 2000 (1998) 06-Feb-1998
99992 683 Rocket Man (1997) 01-Jan-1997
99993 689 Jackal, The (1997) 01-Jan-1997
99994 690 Seven Years in Tibet (1997) 01-Jan-1997
99995 748 Saint, The (1997) 14-Mar-1997
99996 751 Tomorrow Never Dies (1997) 01-Jan-1997
99997 879 Peacemaker, The (1997) 01-Jan-1997
99998 894 Home Alone 3 (1997) 01-Jan-1997
99999 901 Mr. Magoo (1997) 25-Dec-1997
videoreleasedate IMDb URL \
0 NaN http://us.imdb.com/M/title-exact?Toy%20Story%2...
1 NaN http://us.imdb.com/M/title-exact?Get%20Shorty%...
2 NaN http://us.imdb.com/M/title-exact?Copycat%20(1995)
3 NaN http://us.imdb.com/M/title-exact?Twelve%20Monk...
4 NaN http://us.imdb.com/M/title-exact?Babe%20(1995)
5 NaN http://us.imdb.com/M/title-exact?Dead%20Man%20...
6 NaN http://us.imdb.com/M/title-exact?Se7en%20(1995)
7 NaN http://us.imdb.com/M/title-exact?Usual%20Suspe...
8 NaN http://us.imdb.com/M/title-exact?Mr.%20Holland...
9 NaN http://us.imdb.com/M/title-exact?From%20Dusk%2...
10 NaN http://us.imdb.com/M/title-exact?Antonia%20(1995)
11 NaN http://us.imdb.com/M/title-exact?Muppet%20Trea...
12 NaN http://us.imdb.com/M/title-exact?Braveheart%20...
13 NaN http://us.imdb.com/M/title-exact?Taxi%20Driver...
14 NaN http://us.imdb.com/M/title-exact?Hong%20Faan%2...
15 NaN http://us.imdb.com/M/title-exact?Birdcage,%20T...
16 NaN http://us.imdb.com/M/title-exact?Apollo%2013%2...
17 NaN http://us.imdb.com/M/title-exact?Belle%20de%20...
18 NaN http://us.imdb.com/M/title-exact?Crimson%20Tid...
19 NaN http://us.imdb.com/M/title-exact?Crumb%20(1994)
20 NaN http://us.imdb.com/M/title-exact?Clerks%20(1994)
21 NaN http://us.imdb.com/M/title-exact?Dolores%20Cla...
22 NaN http://us.imdb.com/M/title-exact?Yinshi%20Nan%...
23 NaN http://us.imdb.com/M/title-exact?Ed%20Wood%20(...
24 NaN http://us.imdb.com/M/title-exact?Hoop%20Dreams...
25 NaN http://us.imdb.com/M/title-exact?I.Q.%20(1994)
26 NaN http://us.imdb.com/M/title-exact?Star%20Wars%2...
27 NaN http://us.imdb.com/M/title-exact?Outbreak%20(1...
28 NaN http://us.imdb.com/Title?L%E9on+(1994)
29 NaN http://us.imdb.com/M/title-exact?Pulp%20Fictio...
... ...
99970 NaN http://us.imdb.com/M/title-exact?Kiss+the+Girl...
99971 NaN http://us.imdb.com/Title?U+Turn+(1997)
99972 NaN http://us.imdb.com/M/title-exact?Bean+(1997)
99973 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
99974 NaN http://us.imdb.com/M/title-exact?I+Know+What+Y...
99975 NaN http://us.imdb.com/M/title-exact?Picture+Perfe...
99976 NaN http://us.imdb.com/M/title-exact?Excess+Baggag...
99977 NaN http://us.imdb.com/M/title-exact?Life+Less+Ord...
99978 NaN http://us.imdb.com/M/title-exact?imdb-title-12...
99979 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
99980 NaN http://us.imdb.com/M/title-exact?Scream%20(1996)
99981 NaN http://us.imdb.com/Title?Liar+Liar+(1997)
99982 NaN http://us.imdb.com/M/title-exact?Air+Force+One...
99983 NaN http://us.imdb.com/M/title-exact?Rainmaker,+Th...
99984 NaN http://us.imdb.com/M/title-exact?imdb-title-12...
99985 NaN http://us.imdb.com/M/title-exact?Murder%20at%2...
99986 NaN http://us.imdb.com/M/title-exact?Conspiracy+Th...
99987 NaN http://us.imdb.com/M/title-exact?Game%2C+The+(...
99988 NaN http://us.imdb.com/M/title-exact?Bean+(1997)
99989 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
99990 NaN http://us.imdb.com/M/title-exact?Wedding+Singe...
99991 NaN http://us.imdb.com/M/title-exact?Blues+Brother...
99992 NaN http://us.imdb.com/M/title-exact?Rocket+Man+(1...
99993 NaN http://us.imdb.com/M/title-exact?Jackal%2C+The...
99994 NaN http://us.imdb.com/M/title-exact?Seven+Years+i...
99995 NaN http://us.imdb.com/M/title-exact?Saint%2C%20Th...
99996 NaN http://us.imdb.com/M/title-exact?imdb-title-12...
99997 NaN http://us.imdb.com/M/title-exact?Peacemaker%2C...
99998 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
99999 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
unknown Action Adventure Animation Childrens ... Western \
0 0 0 0 1 1 ... 0
1 0 1 0 0 0 ... 0
2 0 0 0 0 0 ... 0
3 0 0 0 0 0 ... 0
4 0 0 0 0 1 ... 0
5 0 0 0 0 0 ... 0
6 0 0 0 0 0 ... 0
7 0 0 0 0 0 ... 0
8 0 0 0 0 0 ... 0
9 0 1 0 0 0 ... 0
10 0 0 0 0 0 ... 0
11 0 1 1 0 0 ... 0
12 0 1 0 0 0 ... 0
13 0 0 0 0 0 ... 0
14 0 1 1 0 0 ... 0
15 0 0 0 0 0 ... 0
16 0 1 0 0 0 ... 0
17 0 0 0 0 0 ... 0
18 0 0 0 0 0 ... 0
19 0 0 0 0 0 ... 0
20 0 0 0 0 0 ... 0
21 0 0 0 0 0 ... 0
22 0 0 0 0 0 ... 0
23 0 0 0 0 0 ... 0
24 0 0 0 0 0 ... 0
25 0 0 0 0 0 ... 0
26 0 1 1 0 0 ... 0
27 0 1 0 0 0 ... 0
28 0 0 0 0 0 ... 0
29 0 0 0 0 0 ... 0
... ... ... ... ... ... ...
99970 0 0 0 0 0 ... 0
99971 0 1 0 0 0 ... 0
99972 0 0 0 0 0 ... 0
99973 0 0 0 0 0 ... 0
99974 0 0 0 0 0 ... 0
99975 0 0 0 0 0 ... 0
99976 0 0 1 0 0 ... 0
99977 0 0 0 0 0 ... 0
99978 0 0 0 0 0 ... 0
99979 0 0 0 0 0 ... 0
99980 0 0 0 0 0 ... 0
99981 0 0 0 0 0 ... 0
99982 0 1 0 0 0 ... 0
99983 0 0 0 0 0 ... 0
99984 0 1 0 0 0 ... 0
99985 0 0 0 0 0 ... 0
99986 0 1 0 0 0 ... 0
99987 0 0 0 0 0 ... 0
99988 0 0 0 0 0 ... 0
99989 0 0 0 0 0 ... 0
99990 0 0 0 0 0 ... 0
99991 0 1 0 0 0 ... 0
99992 0 0 0 0 0 ... 0
99993 0 1 0 0 0 ... 0
99994 0 0 0 0 0 ... 0
99995 0 1 0 0 0 ... 0
99996 0 1 0 0 0 ... 0
99997 0 1 0 0 0 ... 0
99998 0 0 0 0 1 ... 0
99999 0 0 0 0 0 ... 0
user id rating timestamp age gender occupation zipcode state \
0 308 4 887736532 60 M retired 95076 CA
1 308 5 887737890 60 M retired 95076 CA
2 308 4 887739608 60 M retired 95076 CA
3 308 4 887738847 60 M retired 95076 CA
4 308 5 887736696 60 M retired 95076 CA
5 308 4 887737194 60 M retired 95076 CA
6 308 5 887737837 60 M retired 95076 CA
7 308 5 887737243 60 M retired 95076 CA
8 308 3 887739426 60 M retired 95076 CA
9 308 4 887739056 60 M retired 95076 CA
10 308 3 887737383 60 M retired 95076 CA
11 308 3 887740729 60 M retired 95076 CA
12 308 4 887737647 60 M retired 95076 CA
13 308 5 887737293 60 M retired 95076 CA
14 308 4 887738057 60 M retired 95076 CA
15 308 4 887740649 60 M retired 95076 CA
16 308 3 887737036 60 M retired 95076 CA
17 308 4 887738933 60 M retired 95076 CA
18 308 3 887739472 60 M retired 95076 CA
19 308 5 887737432 60 M retired 95076 CA
20 308 4 887738191 60 M retired 95076 CA
21 308 4 887740451 60 M retired 95076 CA
22 308 4 887736843 60 M retired 95076 CA
23 308 4 887738933 60 M retired 95076 CA
24 308 4 887736880 60 M retired 95076 CA
25 308 3 887740833 60 M retired 95076 CA
26 308 5 887737431 60 M retired 95076 CA
27 308 2 887740254 60 M retired 95076 CA
28 308 3 887738760 60 M retired 95076 CA
29 308 5 887736924 60 M retired 95076 CA
... ... ... ... ... ... ... ...
99970 631 3 888465180 18 F student 38866 MS
99971 631 2 888464941 18 F student 38866 MS
99972 631 2 888465299 18 F student 38866 MS
99973 631 4 888465004 18 F student 38866 MS
99974 631 2 888465247 18 F student 38866 MS
99975 631 2 888465084 18 F student 38866 MS
99976 631 2 888465131 18 F student 38866 MS
99977 631 4 888465216 18 F student 38866 MS
99978 631 2 888465351 18 F student 38866 MS
99979 729 4 893286638 19 M student 56567 MN
99980 729 2 893286261 19 M student 56567 MN
99981 729 2 893286338 19 M student 56567 MN
99982 729 4 893286638 19 M student 56567 MN
99983 729 3 893286204 19 M student 56567 MN
99984 729 3 893286638 19 M student 56567 MN
99985 729 4 893286637 19 M student 56567 MN
99986 729 3 893286638 19 M student 56567 MN
99987 729 4 893286638 19 M student 56567 MN
99988 729 1 893286373 19 M student 56567 MN
99989 729 1 893286168 19 M student 56567 MN
99990 729 5 893286637 19 M student 56567 MN
99991 729 4 893286637 19 M student 56567 MN
99992 729 2 893286511 19 M student 56567 MN
99993 729 4 893286638 19 M student 56567 MN
99994 729 2 893286149 19 M student 56567 MN
99995 729 4 893286638 19 M student 56567 MN
99996 729 3 893286338 19 M student 56567 MN
99997 729 3 893286299 19 M student 56567 MN
99998 729 1 893286511 19 M student 56567 MN
99999 729 1 893286491 19 M student 56567 MN
State1
0 CA
1 CA
2 CA
3 CA
4 CA
5 CA
6 CA
7 CA
8 CA
9 CA
10 CA
11 CA
12 CA
13 CA
14 CA
15 CA
16 CA
17 CA
18 CA
19 CA
20 CA
21 CA
22 CA
23 CA
24 CA
25 CA
26 CA
27 CA
28 CA
29 CA
...
99970 MS
99971 MS
99972 MS
99973 MS
99974 MS
99975 MS
99976 MS
99977 MS
99978 MS
99979 MN
99980 MN
99981 MN
99982 MN
99983 MN
99984 MN
99985 MN
99986 MN
99987 MN
99988 MN
99989 MN
99990 MN
99991 MN
99992 MN
99993 MN
99994 MN
99995 MN
99996 MN
99997 MN
99998 MN
99999 MN
All of the genres are: [['Action', 'Adventure','Animation', 'Childrens', 'Comedy', 'Crime','Documentary', 'Drama', 'Fantasy', 'FilmNoir',
'Horror', 'Musical', 'Mystery', 'Romance','SciFi', 'Thriller', 'War', 'Western']]
How can I figure out which genre had the highest average review and which had the lowest? Should I group by rating and then aggregate all of the corresponding genre columns?
df = bigdataframe[['Action', 'Adventure','Animation', 'Childrens', 'Comedy',
'Crime','Documentary', 'Drama', 'Fantasy', 'FilmNoir',
'Horror', 'Musical', 'Mystery',
'Romance','SciFi', 'Thriller', 'War', 'Western','rating']]
gp = df.groupby('rating')
result = gp.agg(['mean'])
result gives me this:
Action Adventure Animation Childrens Comedy Crime \
mean mean mean mean mean mean
rating
1 0.253191 0.131588 0.030442 0.093944 0.372995 0.068249
2 0.286192 0.150308 0.032806 0.084521 0.339138 0.073351
3 0.267232 0.143710 0.037502 0.081709 0.322380 0.073899
4 0.246708 0.129806 0.036051 0.064728 0.284485 0.082958
5 0.240696 0.136928 0.037545 0.057403 0.246403 0.092590
Documentary Drama Fantasy FilmNoir Horror Musical \
mean mean mean mean mean mean
rating
1 0.009656 0.289034 0.018331 0.007365 0.082324 0.046645
2 0.005101 0.320756 0.019349 0.008531 0.071592 0.050484
3 0.006042 0.363861 0.016983 0.013520 0.055738 0.052238
4 0.007842 0.427459 0.011207 0.019430 0.047112 0.047609
5 0.009858 0.471534 0.008301 0.026414 0.041366 0.049526
Mystery Romance SciFi Thriller War Western
mean mean mean mean mean mean
rating
1 0.041735 0.154173 0.118494 0.203764 0.060065 0.011620
2 0.046262 0.177397 0.133597 0.229903 0.067018 0.015743
3 0.048112 0.186443 0.121422 0.224277 0.074415 0.019893
4 0.056563 0.201381 0.125154 0.222772 0.097589 0.019606
5 0.057780 0.215037 0.137446 0.203387 0.137446 0.018584
I think you need idxmin and idxmax. A new DataFrame is not necessary either: you can use bigdataframe directly and select the genre columns with [] after the groupby:
genres = ['Action', 'Adventure','Animation', 'Childrens', 'Comedy', 'Crime','Documentary', 'Drama', 'Fantasy', 'FilmNoir', 'Horror', 'Musical', 'Mystery', 'Romance','SciFi', 'Thriller', 'War', 'Western']
df1 = bigdataframe.groupby('rating')[genres].mean()
print (df1)
Action Adventure Animation Childrens Comedy Crime \
rating
1 0.253191 0.131588 0.030442 0.093944 0.372995 0.068249
2 0.286192 0.150308 0.032806 0.084521 0.339138 0.073351
3 0.267232 0.143710 0.037502 0.081709 0.322380 0.073899
4 0.246708 0.129806 0.036051 0.064728 0.284485 0.082958
5 0.240696 0.136928 0.037545 0.057403 0.246403 0.092590
Documentary Drama Fantasy FilmNoir Horror Musical \
rating
1 0.009656 0.289034 0.018331 0.007365 0.082324 0.046645
2 0.005101 0.320756 0.019349 0.008531 0.071592 0.050484
3 0.006042 0.363861 0.016983 0.013520 0.055738 0.052238
4 0.007842 0.427459 0.011207 0.019430 0.047112 0.047609
5 0.009858 0.471534 0.008301 0.026414 0.041366 0.049526
Mystery Romance SciFi Thriller War Western
rating
1 0.041735 0.154173 0.118494 0.203764 0.060065 0.011620
2 0.046262 0.177397 0.133597 0.229903 0.067018 0.015743
3 0.048112 0.186443 0.121422 0.224277 0.074415 0.019893
4 0.056563 0.201381 0.125154 0.222772 0.097589 0.019606
5 0.057780 0.215037 0.137446 0.203387 0.137446 0.018584
mingen = df1.idxmin(axis=1).reset_index(name='Genre')
print (mingen)
rating Genre
0 1 FilmNoir
1 2 Documentary
2 3 Documentary
3 4 Documentary
4 5 Fantasy
maxgen = df1.idxmax(axis=1).reset_index(name='Genre')
print (maxgen)
rating Genre
0 1 Comedy
1 2 Comedy
2 3 Drama
3 4 Drama
4 5 Drama
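Note that df1 above holds, for each rating, the share of reviews that fall in each genre. If the question is instead read as "what is the mean rating of the reviews in each genre", one possible sketch is below. It uses a small made-up frame (the ratings and genre flags are invented, not taken from the data above) and assumes the genre columns hold only 0 and 1.

```python
import pandas as pd

# Made-up reviews: one row per review, with 0/1 genre flags as in the question.
ratings = pd.DataFrame({
    "rating": [5, 3, 4, 1],
    "Action": [1, 1, 0, 0],
    "Drama":  [0, 1, 1, 0],
    "Comedy": [0, 0, 1, 1],
})
genres = ["Action", "Drama", "Comedy"]

# Multiplying each flag by the row's rating leaves the rating where the flag is 1
# and 0 elsewhere, so the column sums are the rating totals per genre.
rating_sum = ratings[genres].mul(ratings["rating"], axis=0).sum()

# Number of reviews tagged with each genre.
review_count = ratings[genres].sum()

# Mean rating per genre.
avg_rating = rating_sum / review_count
print(avg_rating)
print("highest:", avg_rating.idxmax())
print("lowest:", avg_rating.idxmin())
```

A review of a movie with several genres counts once for each of them, which is usually what is wanted when comparing genres.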