Retrieving HTML Info with Beautiful Soup - python

I am very new to using Beautiful Soup and struggling to grasp how to retrieve my intended data. I have the following source code:
<pre>
<div class="scorebox_meta">
<div><strong>Saturday August 14, 2021</strong>,<span class="venuetime" data-venue-date="2021-08-14" data-venue-time="12:30" data-venue-epoch="1628940600">12:30 (venue time)</span> <span class="localtime" data-label="your time">13:30 (local time)</span>
</div><div><strong><small>Venue</small></strong>: <small>Old Trafford, Manchester</small></div>
</pre>
I'm trying to retrieve the date (2021-08-14), time (12:30), and venue location (Old Trafford, Manchester), but the code I've written returns either "none" or all the information in the scorebox. Can anyone assist?
import requests
from bs4 import BeautifulSoup
url = "https://fbref.com/en/matches/e62685d4/Manchester-United-Leeds-United-August-14-2021-Premier-League"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
scorebox = soup.find("div", class_="scorebox_meta")
date = scorebox.select("div", class_="data-venue-date")
time = scorebox.select("div", class_="data-venue-time")
venue = scorebox.select("div", class_="match_stadium")
print("Date:", date)
print("Time:", time)
print("Venue:", venue)

Here is one way to get the information from that page:
import pandas as pd
url = "https://fbref.com/en/matches/e62685d4/Manchester-United-Leeds-United-August-14-2021-Premier-League"
dfs = pd.read_html(url)
for df in dfs:
    print(df.head())
    print('\n')
Result in terminal:
Manchester Utd (4-2-3-1) Manchester Utd (4-2-3-1).1
0 1 David de Gea
1 2 Victor Lindelöf
2 5 Harry Maguire
3 6 Paul Pogba
4 11 Mason Greenwood
Leeds United (4-1-4-1) Leeds United (4-1-4-1).1
0 1 Illan Meslier
1 2 Luke Ayling
2 5 Robin Koch
3 6 Liam Cooper
4 9 Patrick Bamford
Manchester Utd Leeds United
Possession Possession
0 49% 51%
1 Passing Accuracy Passing Accuracy
2 363 of 465 — 78% 77% — 372 of 482
3 Shots on Target Shots on Target
4 8 of 16 — 50% 30% — 3 of 10
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Performance Expected SCA Passes Carries Take-Ons
Player # Nation Pos Age Min Gls Ast PK PKatt Sh SoT CrdY CrdR Touches Tkl Int Blocks xG npxG xAG SCA GCA Cmp Att Cmp% PrgP Carries PrgC Att Succ
0 Mason Greenwood 11.0 eng ENG FW,RW 19-317 90 1 0 0 0 3 2 0 0 38 0 0 1 0.1 0.1 0.3 5 1 22 27 81.5 4 28 3 9 5
1 Paul Pogba 6.0 fr FRA LW 28-152 74 0 4 0 0 2 1 0 0 46 1 1 1 0.3 0.3 0.7 5 4 25 33 75.8 5 29 2 2 1
2 Anthony Martial 9.0 fr FRA FW 25-252 16 0 0 0 0 0 0 0 0 7 0 0 0 0.0 0.0 0.0 0 0 5 7 71.4 1 5 0 2 2
3 Daniel James 21.0 wls WAL RW 23-277 74 0 0 0 0 3 1 0 0 34 2 0 2 0.2 0.2 0.0 2 0 13 19 68.4 2 16 2 1 0
4 Jadon Sancho 25.0 eng ENG LW 21-142 16 0 0 0 0 0 0 0 0 15 0 0 0 0.0 0.0 0.0 0 0 10 14 71.4 0 8 1 2 1
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Total Short Medium Long Unnamed: 20_level_0 Unnamed: 21_level_0 Unnamed: 22_level_0 Unnamed: 23_level_0 Unnamed: 24_level_0 Unnamed: 25_level_0 Unnamed: 26_level_0 Unnamed: 27_level_0
Player # Nation Pos Age Min Cmp Att Cmp% TotDist PrgDist Cmp Att Cmp% Cmp Att Cmp% Cmp Att Cmp% Ast xAG xA KP 1/3 PPA CrsPA PrgP
0 Mason Greenwood 11.0 eng ENG FW,RW 19-317 90 22 27 81.5 319 102 13 14 92.9 8 8 100.0 1 4 25.0 0 0.3 0.1 1 3 2 0 4
1 Paul Pogba 6.0 fr FRA LW 28-152 74 25 33 75.8 523 212 10 15 66.7 6 7 85.7 7 8 87.5 4 0.7 0.5 5 3 2 0 5
2 Anthony Martial 9.0 fr FRA FW 25-252 16 5 7 71.4 79 14 2 2 100.0 3 4 75.0 0 0 NaN 0 0.0 0.0 0 1 0 0 1
3 Daniel James 21.0 wls WAL RW 23-277 74 13 19 68.4 114 33 11 12 91.7 1 2 50.0 0 1 0.0 0 0.0 0.0 1 1 0 0 2
4 Jadon Sancho 25.0 eng ENG LW 21-142 16 10 14 71.4 113 3 6 8 75.0 3 5 60.0 0 0 NaN 0 0.0 0.0 0 0 0 0 0
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Unnamed: 6_level_0 Pass Types Corner Kicks Outcomes
Player # Nation Pos Age Min Att Live Dead FK TB Sw Crs TI CK In Out Str Cmp Off Blocks
0 Mason Greenwood 11.0 eng ENG FW,RW 19-317 90 27 25 2 0 3 0 2 0 2 2 0 0 22 0 0
1 Paul Pogba 6.0 fr FRA LW 28-152 74 33 32 1 1 2 3 0 0 0 0 0 0 25 0 1
2 Anthony Martial 9.0 fr FRA FW 25-252 16 7 7 0 0 0 0 0 0 0 0 0 0 5 0 1
3 Daniel James 21.0 wls WAL RW 23-277 74 19 19 0 0 0 0 2 0 0 0 0 0 13 0 1
4 Jadon Sancho 25.0 eng ENG LW 21-142 16 14 13 1 0 1 0 0 1 0 0 0 0 10 0 0
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Tackles Challenges Blocks Unnamed: 18_level_0 Unnamed: 19_level_0 Unnamed: 20_level_0 Unnamed: 21_level_0
Player # Nation Pos Age Min Tkl TklW Def 3rd Mid 3rd Att 3rd Tkl Att Tkl% Lost Blocks Sh Pass Int Tkl+Int Clr Err
0 Mason Greenwood 11.0 eng ENG FW,RW 19-317 90 0 0 0 0 0 0 0 NaN 0 1 0 1 0 0 0 0
1 Paul Pogba 6.0 fr FRA LW 28-152 74 1 1 0 1 0 1 3 33.3 2 1 0 1 1 2 0 0
2 Anthony Martial 9.0 fr FRA FW 25-252 16 0 0 0 0 0 0 0 NaN 0 0 0 0 0 0 0 0
3 Daniel James 21.0 wls WAL RW 23-277 74 2 1 0 2 0 0 0 NaN 0 2 0 2 0 2 1 0
4 Jadon Sancho 25.0 eng ENG LW 21-142 16 0 0 0 0 0 0 0 NaN 0 0 0 0 0 0 0 0
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Touches Take-Ons Carries Receiving
Player # Nation Pos Age Min Touches Def Pen Def 3rd Mid 3rd Att 3rd Att Pen Live Att Succ Succ% Tkld Tkld% Carries TotDist PrgDist PrgC 1/3 CPA Mis Dis Rec PrgR
0 Mason Greenwood 11.0 eng ENG FW,RW 19-317 90 38 0 3 16 21 2 38 9 5 55.6 4 44.4 28 180 111 3 3 2 3 2 28 6
1 Paul Pogba 6.0 fr FRA LW 28-152 74 46 1 3 23 20 3 46 2 1 50.0 1 50.0 29 125 73 2 3 1 6 3 35 6
2 Anthony Martial 9.0 fr FRA FW 25-252 16 7 0 1 6 1 0 7 2 2 100.0 0 0.0 5 57 25 0 0 0 0 0 6 1
3 Daniel James 21.0 wls WAL RW 23-277 74 34 1 2 13 19 5 34 1 0 0.0 1 100.0 16 132 48 2 1 1 3 1 21 10
4 Jadon Sancho 25.0 eng ENG LW 21-142 16 15 0 1 8 6 0 15 2 1 50.0 1 50.0 8 44 24 1 1 0 0 0 13 0
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Performance Aerial Duels
Player # Nation Pos Age Min CrdY CrdR 2CrdY Fls Fld Off Crs Int TklW PKwon PKcon OG Recov Won Lost Won%
0 Mason Greenwood 11.0 eng ENG FW,RW 19-317 90 0 0 0 2 1 0 2 0 0 0 0 0 5 0 2 0.0
1 Paul Pogba 6.0 fr FRA LW 28-152 74 0 0 0 2 1 0 0 1 1 0 0 0 5 1 1 50.0
2 Anthony Martial 9.0 fr FRA FW 25-252 16 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 NaN
3 Daniel James 21.0 wls WAL RW 23-277 74 0 0 0 0 0 2 2 0 1 0 0 0 3 1 1 50.0
4 Jadon Sancho 25.0 eng ENG LW 21-142 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Shot Stopping Launched Passes Goal Kicks Crosses Sweeper
Player Nation Age Min SoTA GA Saves Save% PSxG Cmp Att Cmp% Att Thr Launch% AvgLen Att Launch% AvgLen Opp Stp Stp% #OPA AvgDist
0 David de Gea es ESP 30-280 90 3 1 2 66.7 1.0 7 11 63.6 24 3 8.3 20.8 12 75.0 42.5 12 0 0.0 0 12.0
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Performance Expected SCA Passes Carries Take-Ons
Player # Nation Pos Age Min Gls Ast PK PKatt Sh SoT CrdY CrdR Touches Tkl Int Blocks xG npxG xAG SCA GCA Cmp Att Cmp% PrgP Carries PrgC Att Succ
0 Patrick Bamford 9.0 eng ENG FW 27-343 76 0 0 0 0 1 0 0 0 25 0 1 1 0.1 0.1 0.1 1 0 12 14 85.7 2 12 0 0 0
1 Tyler Roberts 11.0 wls WAL FW 22-214 14 0 0 0 0 1 0 0 0 6 1 0 0 0.0 0.0 0.2 1 0 3 3 100.0 1 3 0 0 0
2 Jack Harrison 22.0 eng ENG LM 24-267 68 0 0 0 0 1 1 0 0 31 1 1 1 0.0 0.0 0.1 3 0 14 21 66.7 3 24 0 1 1
3 Hélder Costa 17.0 ao ANG LM 27-214 22 0 0 0 0 1 0 0 0 9 0 0 1 0.0 0.0 0.0 1 0 5 5 100.0 0 6 0 0 0
4 Mateusz Klich 43.0 pl POL FW,CM 31-062 90 0 0 0 0 2 1 0 0 49 2 0 0 0.1 0.1 0.0 2 0 34 46 73.9 3 31 2 1 1
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Total Short Medium Long Unnamed: 20_level_0 Unnamed: 21_level_0 Unnamed: 22_level_0 Unnamed: 23_level_0 Unnamed: 24_level_0 Unnamed: 25_level_0 Unnamed: 26_level_0 Unnamed: 27_level_0
Player # Nation Pos Age Min Cmp Att Cmp% TotDist PrgDist Cmp Att Cmp% Cmp Att Cmp% Cmp Att Cmp% Ast xAG xA KP 1/3 PPA CrsPA PrgP
0 Patrick Bamford 9.0 eng ENG FW 27-343 76 12 14 85.7 141 25 10 12 83.3 0 0 NaN 1 1 100.0 0 0.1 0.0 1 0 1 0 2
1 Tyler Roberts 11.0 wls WAL FW 22-214 14 3 3 100.0 70 22 1 1 100.0 1 1 100.0 1 1 100.0 0 0.2 0.0 1 0 2 1 1
2 Jack Harrison 22.0 eng ENG LM 24-267 68 14 21 66.7 215 94 7 10 70.0 5 6 83.3 1 3 33.3 0 0.1 0.0 2 3 0 0 3
3 Hélder Costa 17.0 ao ANG LM 27-214 22 5 5 100.0 79 0 2 2 100.0 3 3 100.0 0 0 NaN 0 0.0 0.0 0 0 0 0 0
4 Mateusz Klich 43.0 pl POL FW,CM 31-062 90 34 46 73.9 518 80 19 21 90.5 12 16 75.0 2 5 40.0 0 0.0 0.0 0 3 0 0 3
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Unnamed: 6_level_0 Pass Types Corner Kicks Outcomes
Player # Nation Pos Age Min Att Live Dead FK TB Sw Crs TI CK In Out Str Cmp Off Blocks
0 Patrick Bamford 9.0 eng ENG FW 27-343 76 14 8 6 0 0 0 0 0 0 0 0 0 12 0 0
1 Tyler Roberts 11.0 wls WAL FW 22-214 14 3 3 0 0 0 0 1 0 0 0 0 0 3 0 0
2 Jack Harrison 22.0 eng ENG LM 24-267 68 21 21 0 0 0 0 1 0 0 0 0 0 14 0 1
3 Hélder Costa 17.0 ao ANG LM 27-214 22 5 5 0 0 0 0 0 0 0 0 0 0 5 0 0
4 Mateusz Klich 43.0 pl POL FW,CM 31-062 90 46 44 1 0 1 0 1 1 0 0 0 0 34 1 1
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Tackles Challenges Blocks Unnamed: 18_level_0 Unnamed: 19_level_0 Unnamed: 20_level_0 Unnamed: 21_level_0
Player # Nation Pos Age Min Tkl TklW Def 3rd Mid 3rd Att 3rd Tkl Att Tkl% Lost Blocks Sh Pass Int Tkl+Int Clr Err
0 Patrick Bamford 9.0 eng ENG FW 27-343 76 0 0 0 0 0 0 1 0.0 1 1 0 1 1 1 0 0
1 Tyler Roberts 11.0 wls WAL FW 22-214 14 1 0 0 1 0 0 0 NaN 0 0 0 0 0 1 0 0
2 Jack Harrison 22.0 eng ENG LM 24-267 68 1 0 0 1 0 1 2 50.0 1 1 0 1 1 2 0 0
3 Hélder Costa 17.0 ao ANG LM 27-214 22 0 0 0 0 0 0 0 NaN 0 1 0 1 0 0 0 0
4 Mateusz Klich 43.0 pl POL FW,CM 31-062 90 2 1 0 1 1 2 2 100.0 0 0 0 0 0 2 0 0
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Touches Take-Ons Carries Receiving
Player # Nation Pos Age Min Touches Def Pen Def 3rd Mid 3rd Att 3rd Att Pen Live Att Succ Succ% Tkld Tkld% Carries TotDist PrgDist PrgC 1/3 CPA Mis Dis Rec PrgR
0 Patrick Bamford 9.0 eng ENG FW 27-343 76 25 0 0 17 8 3 25 0 0 NaN 0 NaN 12 19 3 0 0 0 5 1 16 2
1 Tyler Roberts 11.0 wls WAL FW 22-214 14 6 0 0 2 4 1 6 0 0 NaN 0 NaN 3 9 4 0 0 0 1 0 4 2
2 Jack Harrison 22.0 eng ENG LM 24-267 68 31 0 5 19 7 0 31 1 1 100.0 0 0.0 24 78 25 0 0 0 4 0 22 6
3 Hélder Costa 17.0 ao ANG LM 27-214 22 9 0 2 4 3 0 9 0 0 NaN 0 NaN 6 18 1 0 0 0 1 0 6 2
4 Mateusz Klich 43.0 pl POL FW,CM 31-062 90 49 0 6 28 16 1 49 1 1 100.0 0 0.0 31 180 72 2 1 0 1 1 39 9
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Performance Aerial Duels
Player # Nation Pos Age Min CrdY CrdR 2CrdY Fls Fld Off Crs Int TklW PKwon PKcon OG Recov Won Lost Won%
0 Patrick Bamford 9.0 eng ENG FW 27-343 76 0 0 0 1 0 1 0 1 0 0 0 0 1 0 2 0.0
1 Tyler Roberts 11.0 wls WAL FW 22-214 14 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 0.0
2 Jack Harrison 22.0 eng ENG LM 24-267 68 0 0 0 1 1 1 1 1 0 0 0 0 3 0 1 0.0
3 Hélder Costa 17.0 ao ANG LM 27-214 22 0 0 0 1 2 0 0 0 0 0 0 0 1 0 1 0.0
4 Mateusz Klich 43.0 pl POL FW,CM 31-062 90 0 0 0 2 0 0 1 0 1 0 0 0 6 0 0 NaN
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Shot Stopping Launched Passes Goal Kicks Crosses Sweeper
Player Nation Age Min SoTA GA Saves Save% PSxG Cmp Att Cmp% Att Thr Launch% AvgLen Att Launch% AvgLen Opp Stp Stp% #OPA AvgDist
0 Illan Meslier fr FRA 21-165 90 8 5 3 37.5 1.6 5 20 25.0 37 10 37.8 30.2 10 60.0 44.5 10 1 10.0 2 13.3
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Unnamed: 6_level_0 Unnamed: 7_level_0 Unnamed: 8_level_0 SCA 1 SCA 2
Minute Player Squad xG PSxG Outcome Distance Body Part Notes Player Event Player Event
0 2.0 Scott McTominay Manchester Utd 0.03 NaN Blocked 15.0 Right Foot NaN Bruno Fernandes Pass (Live) Daniel James Pass (Live)
1 6.0 Mason Greenwood Manchester Utd 0.02 0.13 Saved 28.0 Left Foot NaN Scott McTominay Pass (Live) Bruno Fernandes Pass (Live)
2 9.0 Mateusz Klich Leeds United 0.02 NaN Off Target 22.0 Right Foot NaN Jack Harrison Pass (Live) Raphinha Pass (Dead)
3 11.0 Harry Maguire Manchester Utd 0.01 NaN Off Target 14.0 Head NaN Luke Shaw Pass (Dead) NaN NaN
4 12.0 Paul Pogba Manchester Utd 0.25 NaN Off Target 16.0 Left Foot NaN Mason Greenwood Pass (Live) Mason Greenwood Take-On
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Unnamed: 6_level_0 Unnamed: 7_level_0 Unnamed: 8_level_0 SCA 1 SCA 2
Minute Player Squad xG PSxG Outcome Distance Body Part Notes Player Event Player Event
0 2.0 Scott McTominay Manchester Utd 0.03 NaN Blocked 15.0 Right Foot NaN Bruno Fernandes Pass (Live) Daniel James Pass (Live)
1 6.0 Mason Greenwood Manchester Utd 0.02 0.13 Saved 28.0 Left Foot NaN Scott McTominay Pass (Live) Bruno Fernandes Pass (Live)
2 11.0 Harry Maguire Manchester Utd 0.01 NaN Off Target 14.0 Head NaN Luke Shaw Pass (Dead) NaN NaN
3 12.0 Paul Pogba Manchester Utd 0.25 NaN Off Target 16.0 Left Foot NaN Mason Greenwood Pass (Live) Mason Greenwood Take-On
4 15.0 Daniel James Manchester Utd 0.03 NaN Blocked 17.0 Left Foot NaN NaN NaN NaN NaN
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Unnamed: 6_level_0 Unnamed: 7_level_0 Unnamed: 8_level_0 SCA 1 SCA 2
Minute Player Squad xG PSxG Outcome Distance Body Part Notes Player Event Player Event
0 9.0 Mateusz Klich Leeds United 0.02 NaN Off Target 22.0 Right Foot NaN Jack Harrison Pass (Live) Raphinha Pass (Dead)
1 16.0 Jack Harrison Leeds United 0.02 0.05 Saved 25.0 Left Foot Volley Mateusz Klich Pass (Live) Raphinha Pass (Dead)
2 16.0 Mateusz Klich Leeds United 0.04 0.39 Saved 27.0 Right Foot NaN Mateusz Klich Take-On Jack Harrison Pass (Live)
3 26.0 Patrick Bamford Leeds United 0.12 NaN Off Target 10.0 Head NaN Raphinha Pass (Dead) Jack Harrison Fouled
4 34.0 Raphinha Leeds United 0.05 NaN Blocked 20.0 Left Foot NaN Patrick Bamford Pass (Live) Stuart Dallas Pass (Live)
Pick the table(s) you want from the list that pd.read_html returns. See the pandas documentation for read_html for details.
Edit: here is another way to get the particular info OP is after:
import requests
from bs4 import BeautifulSoup
url = "https://fbref.com/en/matches/e62685d4/Manchester-United-Leeds-United-August-14-2021-Premier-League"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
scorebox = soup.find("div", class_="scorebox_meta")
date = scorebox.select_one('span[class="venuetime"]').get('data-venue-date')
time = scorebox.select_one('span[class="venuetime"]').get('data-venue-time')
venue = scorebox.find('small', string='Venue').find_next('small').text
print(date, time, venue)
Result in terminal:
2021-08-14 12:30 Old Trafford, Manchester

In scorebox.select("div", class_="data-venue-date"), nothing is filtered by "data-venue-date"; the call just selects every div in 'scorebox'.
Also, "data-venue-date" is not a class but an attribute, and it is an attribute of the "span" element, not of a "div".
To do what you wanted:
import re
print(scorebox.find("span", {"data-venue-date" : re.compile(r".*")}))
# <span class="venuetime" data-venue-date="2021-08-14" data-venue-epoch="1628940600" data-venue-time="12:30">12:30 (venue time)</span>
But we don't need to do this, we can do:
print(scorebox.find("div").text)
print(scorebox.find_all("div")[-2].text)
The first line selects the first "div" inside scorebox; the second line selects the second-to-last "div" of scorebox.
Output:
Saturday August 14, 2021, 12:30 (venue time)
Venue: Old Trafford, Manchester
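As a minimal offline sketch of the class-versus-attribute point above, the snippet below parses the HTML fragment from the question instead of fetching the page (a closing </div> is added so the fragment is well-formed). The helper name parse_scorebox is made up for illustration. The CSS attribute selector span[data-venue-date] matches a span that carries that attribute, whatever its value.

```python
from bs4 import BeautifulSoup

# HTML copied from the question; a closing </div> is added for well-formedness.
HTML = """
<div class="scorebox_meta">
<div><strong>Saturday August 14, 2021</strong>,<span class="venuetime" data-venue-date="2021-08-14" data-venue-time="12:30" data-venue-epoch="1628940600">12:30 (venue time)</span> <span class="localtime" data-label="your time">13:30 (local time)</span>
</div><div><strong><small>Venue</small></strong>: <small>Old Trafford, Manchester</small></div>
</div>
"""


def parse_scorebox(html):
    """Return (date, time, venue) from a scorebox_meta fragment (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    scorebox = soup.find("div", class_="scorebox_meta")
    # [data-venue-date] matches any span that has this attribute
    span = scorebox.select_one("span[data-venue-date]")
    # the venue name is the <small> that follows the <small>Venue</small> label
    label = scorebox.find("small", string="Venue")
    venue = label.find_next("small").get_text(strip=True)
    return span["data-venue-date"], span["data-venue-time"], venue


print(parse_scorebox(HTML))
```

On the live page the same selectors should apply as long as the markup matches the fragment in the question, which is an assumption worth re-checking if the site changes.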

Related

loop for creating new column and fill with neighborhood row

I have the following df. I want to dynamically create new columns based on the number of days (day_number=2) and conditionally fill them based on "code" and "count".
Current format:
code count
id date
ABC1 2019-04-04 1 76
2019-04-05 2 82
Desired matrix-like format:
code count code1_day1 code2_day1 code1_day2 code2_day2
id date
ABC1 2019-04-04 1 76 76 0 0 82
2019-04-05 2 82
I have done this, but it fills every column with the same values:
code=[1,2]
for date, new in df.groupby(level=[0]):
    for col in range(day_number): # day_number=2
        for lvl in code:
            new[f"day{col+1}_code1"]=new['count'].where(new['code']==1)
            new[f"day{col+1}_code2"]=new['count'].where(new['code']==2)
So many thanks for your help!
A bigger example of the database:
code count new-col1 new_col2 ......
id date
ABC1
2019-04-04 1 76 76 0 79 0 82 0 83 0 88 0 55 3 65 6
2019-04-05 1 79 79 0 82 0 83 0 88 0 55 3 65 6 101 10
2019-04-06 1 82 82 0 83 0 88 0 55 3 65 6 101 10 120 14
2019-04-07 2 83 83 0 88 0 55 3 65 6 101 10 120 14 0 0
2019-04-08 1 88 88 0 55 3 65 6 101 10 120 14 0 0 0 0
2019-04-09 1 55 55 3 65 6 101 10 120 14 0 0 0 0 10 0
2019-04-09 2 3 65 6 101 10 120 14 0 0 0 0 10 0
2019-04-10 1 65 101 10 120 14 0 0 0 0 10 0
2019-04-10 2 6 120 14 0 0 0 0 10 0
2019-04-11 1 101 0 0 0 0 10 0
Your sample data is not really usable, so I've simulated some.
Looking at it differently: the data is grouped, so use groupby() on ID (in the index) and code.
apply() after a groupby() is passed one dataframe per group; build the required columns on that dataframe.
import numpy as np
import pandas as pd

d = pd.date_range("01-jan-2021","03-jan-2021")
df = pd.concat([
    pd.DataFrame({"ID":"ABC1","date":d,"code":1,"count":np.random.randint(20,50, len(d))}),
    pd.DataFrame({"ID":"ABC1","date":d,"code":2,"count":np.random.randint(20,50, len(d))})
]).sort_values(["ID","date","code"], ascending=[True,False,True]).set_index(["ID","date"])
# pad an array with NaN to same length as second iterable
def nppad(a, s):
    return np.pad(a.astype(float), (0,len(s)-len(a)), "constant", constant_values=np.nan)
# for each (ID, code) group, column codeX_dayN holds the count from N-1 rows further down
df2 = df.groupby(["ID","code"]).apply(lambda dfa: dfa.assign(**{f"code{dfa.iloc[0,0]}_day{i+1}":
                                                    nppad(dfa["count"].values[i:],dfa)
                                                    for i in range(len(dfa))}))
output
code count code1_day1 code1_day2 code1_day3 code2_day1 code2_day2 code2_day3
ID date
ABC1 2021-01-03 1 40 40.0 38.0 46.0 NaN NaN NaN
2021-01-03 2 37 NaN NaN NaN 37.0 33.0 33.0
2021-01-02 1 38 38.0 46.0 NaN NaN NaN NaN
2021-01-02 2 33 NaN NaN NaN 33.0 33.0 NaN
2021-01-01 1 46 46.0 NaN NaN NaN NaN NaN
2021-01-01 2 33 NaN NaN NaN 33.0 NaN NaN
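For comparison, here is a sketch of a different technique, groupby().shift(), rather than the apply()/np.pad approach above. It uses the fixed counts from the printed output instead of random numbers and keeps ID and date as ordinary columns to keep the example simple. The name n_days is an assumption made for illustration.

```python
import pandas as pd

# Fixed counts copied from the output above (the answer itself used random numbers).
df = pd.DataFrame({
    "ID": "ABC1",
    "date": pd.to_datetime(["2021-01-03", "2021-01-02", "2021-01-01"] * 2),
    "code": [1, 1, 1, 2, 2, 2],
    "count": [40, 38, 46, 37, 33, 33],
})
df = df.sort_values(["ID", "date", "code"], ascending=[True, False, True]).reset_index(drop=True)

n_days = 3  # hypothetical name: number of day columns to build per code
counts_by_group = df.groupby(["ID", "code"])["count"]
for code in sorted(df["code"].unique()):
    for i in range(n_days):
        # shift(-i) pulls the value from i rows further down within the same group
        shifted = counts_by_group.shift(-i)
        # keep the value only on rows that belong to this code, NaN elsewhere
        df[f"code{code}_day{i + 1}"] = shifted.where(df["code"] == code)

print(df)
```

With these inputs the new columns should line up with the table printed above; whether shift or apply reads better is mostly a matter of taste.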

How to concatenate the following dataframe

I have two dataframes:
from datetime import date, timedelta
file_date = (date.today() - timedelta(days=2)).strftime('%m-%d-%Y')
file_date
github_dir_path = 'https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_daily_reports/'
file_path = github_dir_path + file_date + '.csv'
first dataframe:
FIPS Admin2 Province_State Country_Region Last_Update Lat Long_ Confirmed Deaths Recovered Active Combined_Key
0 45001.0 Abbeville South Carolina US 2020-04-28 02:30:51 34.223334 -82.461707 29 0 0 29 Abbeville, South Carolina, US
1 22001.0 Acadia Louisiana US 2020-04-28 02:30:51 30.295065 -92.414197 130 9 0 121 Acadia, Louisiana, US
2 51001.0 Accomack Virginia US 2020-04-28 02:30:51 37.767072 -75.632346 195 3 0 192 Accomack, Virginia, US
3 16001.0 Ada Idaho US 2020-04-28 02:30:51 43.452658 -116.241552 650 15 0 635 Ada, Idaho, US
4 19001.0 Adair Iowa US 2020-04-28 02:30:51 41.330756 -94.471059 1 0 0 1 Adair, Iowa, US
#
0 0 ... 0 Kerala 0 Kerala 1
2 2020-02-01 Kerala 2 0 0 ... 0 Kerala 0 Kerala 2
3 2020-02-02 Kerala 3 0 0 ... 0 Kerala 0 Kerala 3
4 2020-02-03 Kerala 3 0 0 ... 0 Kerala 0 Kerala 3
Please guide me on how to concatenate both the data frames. I tried a couple of things but did not get the expected result.

How do I count the number of items with a specific column name from a Pandas df? (Python)

Pick Tm Player Pos Age To AP1 PB St CarAV ... Att Yds TD Rec Yds TD Tkl Int Sk College/Univ
0 1 CLE Myles Garrett DE 21 2017 0 0 0 0 ... 0 0 0 0 0 0 13 5.0 Texas A&M
1 2 CHI Mitch Trubisky QB 23 2017 0 0 1 0 ... 29 194 0 0 0 0 North Carolina
2 3 SFO Solomon Thomas DE 21 2017 0 0 1 0 ... 0 0 0 0 0 0 25 2.0 Stanford
3 4 JAX Leonard Fournette RB 22 2017 0 0 1 0 ... 207 822 7 25 195 1 LSU
4 5 TEN Corey Davis WR 22 2017 0 0 1 0 ... 0 0 0 22 227 0 West. Michigan
Given this df, I want to count the number of players per College/Univ.
So, just in this particular df, all colleges will have the value of 1.
Given a df and a college name, how can I count the number of items?
You can create a boolean mask and then count the True values with sum, since each True is treated as 1:
(df['College/Univ'] == 'Texas A&M').sum()
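A small self-contained sketch of the mask-and-sum idea, using only the five rows shown in the question. The helper name count_college is made up for illustration. value_counts() is added as a way to get the count for every college at once, which is what "players per College/Univ" seems to ask for.

```python
import pandas as pd

# Only the two columns needed here, copied from the rows in the question.
df = pd.DataFrame({
    "Player": ["Myles Garrett", "Mitch Trubisky", "Solomon Thomas",
               "Leonard Fournette", "Corey Davis"],
    "College/Univ": ["Texas A&M", "North Carolina", "Stanford",
                     "LSU", "West. Michigan"],
})


def count_college(frame, college):
    """Count rows whose College/Univ equals the given name (hypothetical helper)."""
    # the comparison yields a boolean Series; summing it counts the True values
    return int((frame["College/Univ"] == college).sum())


# one count per distinct college, as a Series indexed by college name
per_college = df["College/Univ"].value_counts()

print(count_college(df, "Texas A&M"))
print(per_college)
```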

How to count occurrences that appear in 2 or more dataframe columns?

Here is the data from my problem below. This is a set of code based on movie reviewers. One line = one review by a reviewer.
bigdataframe
Out[43]:
movie id movietitle releasedate \
0 1 Toy Story (1995) 01-Jan-1995
1 4 Get Shorty (1995) 01-Jan-1995
2 5 Copycat (1995) 01-Jan-1995
3 7 Twelve Monkeys (1995) 01-Jan-1995
4 8 Babe (1995) 01-Jan-1995
5 9 Dead Man Walking (1995) 01-Jan-1995
6 11 Seven (Se7en) (1995) 01-Jan-1995
7 12 Usual Suspects, The (1995) 14-Aug-1995
8 15 Mr. Holland's Opus (1995) 29-Jan-1996
9 17 From Dusk Till Dawn (1996) 05-Feb-1996
10 19 Antonia's Line (1995) 01-Jan-1995
11 21 Muppet Treasure Island (1996) 16-Feb-1996
12 22 Braveheart (1995) 16-Feb-1996
13 23 Taxi Driver (1976) 16-Feb-1996
14 24 Rumble in the Bronx (1995) 23-Feb-1996
15 25 Birdcage, The (1996) 08-Mar-1996
16 28 Apollo 13 (1995) 01-Jan-1995
17 30 Belle de jour (1967) 01-Jan-1967
18 31 Crimson Tide (1995) 01-Jan-1995
19 32 Crumb (1994) 01-Jan-1994
20 42 Clerks (1994) 01-Jan-1994
21 44 Dolores Claiborne (1994) 01-Jan-1994
22 45 Eat Drink Man Woman (1994) 01-Jan-1994
23 47 Ed Wood (1994) 01-Jan-1994
24 48 Hoop Dreams (1994) 01-Jan-1994
25 49 I.Q. (1994) 01-Jan-1994
26 50 Star Wars (1977) 01-Jan-1977
27 54 Outbreak (1995) 01-Jan-1995
28 55 Professional, The (1994) 01-Jan-1994
29 56 Pulp Fiction (1994) 01-Jan-1994
... ... ...
99970 332 Kiss the Girls (1997) 01-Jan-1997
99971 334 U Turn (1997) 01-Jan-1997
99972 338 Bean (1997) 01-Jan-1997
99973 346 Jackie Brown (1997) 01-Jan-1997
99974 682 I Know What You Did Last Summer (1997) 17-Oct-1997
99975 873 Picture Perfect (1997) 01-Aug-1997
99976 877 Excess Baggage (1997) 01-Jan-1997
99977 886 Life Less Ordinary, A (1997) 01-Jan-1997
99978 1527 Senseless (1998) 09-Jan-1998
99979 272 Good Will Hunting (1997) 01-Jan-1997
99980 288 Scream (1996) 20-Dec-1996
99981 294 Liar Liar (1997) 21-Mar-1997
99982 300 Air Force One (1997) 01-Jan-1997
99983 310 Rainmaker, The (1997) 01-Jan-1997
99984 313 Titanic (1997) 01-Jan-1997
99985 322 Murder at 1600 (1997) 18-Apr-1997
99986 328 Conspiracy Theory (1997) 08-Aug-1997
99987 333 Game, The (1997) 01-Jan-1997
99988 338 Bean (1997) 01-Jan-1997
99989 346 Jackie Brown (1997) 01-Jan-1997
99990 354 Wedding Singer, The (1998) 13-Feb-1998
99991 362 Blues Brothers 2000 (1998) 06-Feb-1998
99992 683 Rocket Man (1997) 01-Jan-1997
99993 689 Jackal, The (1997) 01-Jan-1997
99994 690 Seven Years in Tibet (1997) 01-Jan-1997
99995 748 Saint, The (1997) 14-Mar-1997
99996 751 Tomorrow Never Dies (1997) 01-Jan-1997
99997 879 Peacemaker, The (1997) 01-Jan-1997
99998 894 Home Alone 3 (1997) 01-Jan-1997
99999 901 Mr. Magoo (1997) 25-Dec-1997
videoreleasedate IMDb URL \
0 NaN http://us.imdb.com/M/title-exact?Toy%20Story%2...
1 NaN http://us.imdb.com/M/title-exact?Get%20Shorty%...
2 NaN http://us.imdb.com/M/title-exact?Copycat%20(1995)
3 NaN http://us.imdb.com/M/title-exact?Twelve%20Monk...
4 NaN http://us.imdb.com/M/title-exact?Babe%20(1995)
5 NaN http://us.imdb.com/M/title-exact?Dead%20Man%20...
6 NaN http://us.imdb.com/M/title-exact?Se7en%20(1995)
7 NaN http://us.imdb.com/M/title-exact?Usual%20Suspe...
8 NaN http://us.imdb.com/M/title-exact?Mr.%20Holland...
9 NaN http://us.imdb.com/M/title-exact?From%20Dusk%2...
10 NaN http://us.imdb.com/M/title-exact?Antonia%20(1995)
11 NaN http://us.imdb.com/M/title-exact?Muppet%20Trea...
12 NaN http://us.imdb.com/M/title-exact?Braveheart%20...
13 NaN http://us.imdb.com/M/title-exact?Taxi%20Driver...
14 NaN http://us.imdb.com/M/title-exact?Hong%20Faan%2...
15 NaN http://us.imdb.com/M/title-exact?Birdcage,%20T...
16 NaN http://us.imdb.com/M/title-exact?Apollo%2013%2...
17 NaN http://us.imdb.com/M/title-exact?Belle%20de%20...
18 NaN http://us.imdb.com/M/title-exact?Crimson%20Tid...
19 NaN http://us.imdb.com/M/title-exact?Crumb%20(1994)
20 NaN http://us.imdb.com/M/title-exact?Clerks%20(1994)
21 NaN http://us.imdb.com/M/title-exact?Dolores%20Cla...
22 NaN http://us.imdb.com/M/title-exact?Yinshi%20Nan%...
23 NaN http://us.imdb.com/M/title-exact?Ed%20Wood%20(...
24 NaN http://us.imdb.com/M/title-exact?Hoop%20Dreams...
25 NaN http://us.imdb.com/M/title-exact?I.Q.%20(1994)
26 NaN http://us.imdb.com/M/title-exact?Star%20Wars%2...
27 NaN http://us.imdb.com/M/title-exact?Outbreak%20(1...
28 NaN http://us.imdb.com/Title?L%E9on+(1994)
29 NaN http://us.imdb.com/M/title-exact?Pulp%20Fictio...
... ...
99970 NaN http://us.imdb.com/M/title-exact?Kiss+the+Girl...
99971 NaN http://us.imdb.com/Title?U+Turn+(1997)
99972 NaN http://us.imdb.com/M/title-exact?Bean+(1997)
99973 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
99974 NaN http://us.imdb.com/M/title-exact?I+Know+What+Y...
99975 NaN http://us.imdb.com/M/title-exact?Picture+Perfe...
99976 NaN http://us.imdb.com/M/title-exact?Excess+Baggag...
99977 NaN http://us.imdb.com/M/title-exact?Life+Less+Ord...
99978 NaN http://us.imdb.com/M/title-exact?imdb-title-12...
99979 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
99980 NaN http://us.imdb.com/M/title-exact?Scream%20(1996)
99981 NaN http://us.imdb.com/Title?Liar+Liar+(1997)
99982 NaN http://us.imdb.com/M/title-exact?Air+Force+One...
99983 NaN http://us.imdb.com/M/title-exact?Rainmaker,+Th...
99984 NaN http://us.imdb.com/M/title-exact?imdb-title-12...
99985 NaN http://us.imdb.com/M/title-exact?Murder%20at%2...
99986 NaN http://us.imdb.com/M/title-exact?Conspiracy+Th...
99987 NaN http://us.imdb.com/M/title-exact?Game%2C+The+(...
99988 NaN http://us.imdb.com/M/title-exact?Bean+(1997)
99989 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
99990 NaN http://us.imdb.com/M/title-exact?Wedding+Singe...
99991 NaN http://us.imdb.com/M/title-exact?Blues+Brother...
99992 NaN http://us.imdb.com/M/title-exact?Rocket+Man+(1...
99993 NaN http://us.imdb.com/M/title-exact?Jackal%2C+The...
99994 NaN http://us.imdb.com/M/title-exact?Seven+Years+i...
99995 NaN http://us.imdb.com/M/title-exact?Saint%2C%20Th...
99996 NaN http://us.imdb.com/M/title-exact?imdb-title-12...
99997 NaN http://us.imdb.com/M/title-exact?Peacemaker%2C...
99998 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
99999 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
unknown Action Adventure Animation Childrens ... Western \
0 0 0 0 1 1 ... 0
1 0 1 0 0 0 ... 0
2 0 0 0 0 0 ... 0
3 0 0 0 0 0 ... 0
4 0 0 0 0 1 ... 0
5 0 0 0 0 0 ... 0
6 0 0 0 0 0 ... 0
7 0 0 0 0 0 ... 0
8 0 0 0 0 0 ... 0
9 0 1 0 0 0 ... 0
10 0 0 0 0 0 ... 0
11 0 1 1 0 0 ... 0
12 0 1 0 0 0 ... 0
13 0 0 0 0 0 ... 0
14 0 1 1 0 0 ... 0
15 0 0 0 0 0 ... 0
16 0 1 0 0 0 ... 0
17 0 0 0 0 0 ... 0
18 0 0 0 0 0 ... 0
19 0 0 0 0 0 ... 0
20 0 0 0 0 0 ... 0
21 0 0 0 0 0 ... 0
22 0 0 0 0 0 ... 0
23 0 0 0 0 0 ... 0
24 0 0 0 0 0 ... 0
25 0 0 0 0 0 ... 0
26 0 1 1 0 0 ... 0
27 0 1 0 0 0 ... 0
28 0 0 0 0 0 ... 0
29 0 0 0 0 0 ... 0
... ... ... ... ... ... ...
99970 0 0 0 0 0 ... 0
99971 0 1 0 0 0 ... 0
99972 0 0 0 0 0 ... 0
99973 0 0 0 0 0 ... 0
99974 0 0 0 0 0 ... 0
99975 0 0 0 0 0 ... 0
99976 0 0 1 0 0 ... 0
99977 0 0 0 0 0 ... 0
99978 0 0 0 0 0 ... 0
99979 0 0 0 0 0 ... 0
99980 0 0 0 0 0 ... 0
99981 0 0 0 0 0 ... 0
99982 0 1 0 0 0 ... 0
99983 0 0 0 0 0 ... 0
99984 0 1 0 0 0 ... 0
99985 0 0 0 0 0 ... 0
99986 0 1 0 0 0 ... 0
99987 0 0 0 0 0 ... 0
99988 0 0 0 0 0 ... 0
99989 0 0 0 0 0 ... 0
99990 0 0 0 0 0 ... 0
99991 0 1 0 0 0 ... 0
99992 0 0 0 0 0 ... 0
99993 0 1 0 0 0 ... 0
99994 0 0 0 0 0 ... 0
99995 0 1 0 0 0 ... 0
99996 0 1 0 0 0 ... 0
99997 0 1 0 0 0 ... 0
99998 0 0 0 0 1 ... 0
99999 0 0 0 0 0 ... 0
The genres are Action Adventure Animation Children's ... Western. There are around 20 genres, but the dataframe doesn't print them all out. How can I figure out which reviews classified their movie in at least 2 genres? This means the movie was marked as belonging to two genres, such as action and drama.
Since each of the genres is in its own dataframe column, I am a bit confused about how to do this. If there were a single genre column I would simply use groupby, because it would work well with the genres and their counts.
Any insight would help!
Edit: As an example, you can see movie "0", Toy Story, was classified as Animation and Children's because it has a 1 in both columns.
Essentially you are only interested in rows for which the sum of the genre columns is greater than 1.
Across all columns this can be achieved with df = df[df.sum(axis=1, numeric_only=True) > 1]; numeric_only=True skips the non-numeric columns (older pandas versions skipped them by default).
The real issue here is how to sum only the genre columns (because the movie id column is also numeric).
If you have an external list of genres you can use it, i.e. df = df[df[['Horror', 'Comedy']].sum(axis=1) > 1].
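A minimal sketch of summing only the genre columns, on a toy frame built from the first three rows of the question. Only three genre columns are included, and the list genre_cols is an assumption standing in for the full list of about 20 genres.

```python
import pandas as pd

# Toy data: first three movies from the question, with a subset of the genre columns.
df = pd.DataFrame({
    "movie id": [1, 4, 5],
    "movietitle": ["Toy Story (1995)", "Get Shorty (1995)", "Copycat (1995)"],
    "Action": [0, 1, 0],
    "Animation": [1, 0, 0],
    "Childrens": [1, 0, 0],
})

# explicit list of genre columns, so that "movie id" is not included in the sum
genre_cols = ["Action", "Animation", "Childrens"]

# number of genres flagged on each row
genre_count = df[genre_cols].sum(axis=1)

# keep only the rows classified in at least two genres
multi_genre = df[genre_count >= 2]

print(multi_genre["movietitle"].tolist())
```

Listing the genre columns explicitly avoids accidentally adding the numeric movie id to the total.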

Finding the averages within a multi-column dataframe?

Here is my dataframe; yes, it is quite large.
bigdataframe
Out[2]:
movie id movietitle releasedate \
0 1 Toy Story (1995) 01-Jan-1995
1 4 Get Shorty (1995) 01-Jan-1995
2 5 Copycat (1995) 01-Jan-1995
3 7 Twelve Monkeys (1995) 01-Jan-1995
4 8 Babe (1995) 01-Jan-1995
5 9 Dead Man Walking (1995) 01-Jan-1995
6 11 Seven (Se7en) (1995) 01-Jan-1995
7 12 Usual Suspects, The (1995) 14-Aug-1995
8 15 Mr. Holland's Opus (1995) 29-Jan-1996
9 17 From Dusk Till Dawn (1996) 05-Feb-1996
10 19 Antonia's Line (1995) 01-Jan-1995
11 21 Muppet Treasure Island (1996) 16-Feb-1996
12 22 Braveheart (1995) 16-Feb-1996
13 23 Taxi Driver (1976) 16-Feb-1996
14 24 Rumble in the Bronx (1995) 23-Feb-1996
15 25 Birdcage, The (1996) 08-Mar-1996
16 28 Apollo 13 (1995) 01-Jan-1995
17 30 Belle de jour (1967) 01-Jan-1967
18 31 Crimson Tide (1995) 01-Jan-1995
19 32 Crumb (1994) 01-Jan-1994
20 42 Clerks (1994) 01-Jan-1994
21 44 Dolores Claiborne (1994) 01-Jan-1994
22 45 Eat Drink Man Woman (1994) 01-Jan-1994
23 47 Ed Wood (1994) 01-Jan-1994
24 48 Hoop Dreams (1994) 01-Jan-1994
25 49 I.Q. (1994) 01-Jan-1994
26 50 Star Wars (1977) 01-Jan-1977
27 54 Outbreak (1995) 01-Jan-1995
28 55 Professional, The (1994) 01-Jan-1994
29 56 Pulp Fiction (1994) 01-Jan-1994
... ... ...
99970 332 Kiss the Girls (1997) 01-Jan-1997
99971 334 U Turn (1997) 01-Jan-1997
99972 338 Bean (1997) 01-Jan-1997
99973 346 Jackie Brown (1997) 01-Jan-1997
99974 682 I Know What You Did Last Summer (1997) 17-Oct-1997
99975 873 Picture Perfect (1997) 01-Aug-1997
99976 877 Excess Baggage (1997) 01-Jan-1997
99977 886 Life Less Ordinary, A (1997) 01-Jan-1997
99978 1527 Senseless (1998) 09-Jan-1998
99979 272 Good Will Hunting (1997) 01-Jan-1997
99980 288 Scream (1996) 20-Dec-1996
99981 294 Liar Liar (1997) 21-Mar-1997
99982 300 Air Force One (1997) 01-Jan-1997
99983 310 Rainmaker, The (1997) 01-Jan-1997
99984 313 Titanic (1997) 01-Jan-1997
99985 322 Murder at 1600 (1997) 18-Apr-1997
99986 328 Conspiracy Theory (1997) 08-Aug-1997
99987 333 Game, The (1997) 01-Jan-1997
99988 338 Bean (1997) 01-Jan-1997
99989 346 Jackie Brown (1997) 01-Jan-1997
99990 354 Wedding Singer, The (1998) 13-Feb-1998
99991 362 Blues Brothers 2000 (1998) 06-Feb-1998
99992 683 Rocket Man (1997) 01-Jan-1997
99993 689 Jackal, The (1997) 01-Jan-1997
99994 690 Seven Years in Tibet (1997) 01-Jan-1997
99995 748 Saint, The (1997) 14-Mar-1997
99996 751 Tomorrow Never Dies (1997) 01-Jan-1997
99997 879 Peacemaker, The (1997) 01-Jan-1997
99998 894 Home Alone 3 (1997) 01-Jan-1997
99999 901 Mr. Magoo (1997) 25-Dec-1997
videoreleasedate IMDb URL \
0 NaN http://us.imdb.com/M/title-exact?Toy%20Story%2...
1 NaN http://us.imdb.com/M/title-exact?Get%20Shorty%...
2 NaN http://us.imdb.com/M/title-exact?Copycat%20(1995)
3 NaN http://us.imdb.com/M/title-exact?Twelve%20Monk...
4 NaN http://us.imdb.com/M/title-exact?Babe%20(1995)
5 NaN http://us.imdb.com/M/title-exact?Dead%20Man%20...
6 NaN http://us.imdb.com/M/title-exact?Se7en%20(1995)
7 NaN http://us.imdb.com/M/title-exact?Usual%20Suspe...
8 NaN http://us.imdb.com/M/title-exact?Mr.%20Holland...
9 NaN http://us.imdb.com/M/title-exact?From%20Dusk%2...
10 NaN http://us.imdb.com/M/title-exact?Antonia%20(1995)
11 NaN http://us.imdb.com/M/title-exact?Muppet%20Trea...
12 NaN http://us.imdb.com/M/title-exact?Braveheart%20...
13 NaN http://us.imdb.com/M/title-exact?Taxi%20Driver...
14 NaN http://us.imdb.com/M/title-exact?Hong%20Faan%2...
15 NaN http://us.imdb.com/M/title-exact?Birdcage,%20T...
16 NaN http://us.imdb.com/M/title-exact?Apollo%2013%2...
17 NaN http://us.imdb.com/M/title-exact?Belle%20de%20...
18 NaN http://us.imdb.com/M/title-exact?Crimson%20Tid...
19 NaN http://us.imdb.com/M/title-exact?Crumb%20(1994)
20 NaN http://us.imdb.com/M/title-exact?Clerks%20(1994)
21 NaN http://us.imdb.com/M/title-exact?Dolores%20Cla...
22 NaN http://us.imdb.com/M/title-exact?Yinshi%20Nan%...
23 NaN http://us.imdb.com/M/title-exact?Ed%20Wood%20(...
24 NaN http://us.imdb.com/M/title-exact?Hoop%20Dreams...
25 NaN http://us.imdb.com/M/title-exact?I.Q.%20(1994)
26 NaN http://us.imdb.com/M/title-exact?Star%20Wars%2...
27 NaN http://us.imdb.com/M/title-exact?Outbreak%20(1...
28 NaN http://us.imdb.com/Title?L%E9on+(1994)
29 NaN http://us.imdb.com/M/title-exact?Pulp%20Fictio...
... ...
99970 NaN http://us.imdb.com/M/title-exact?Kiss+the+Girl...
99971 NaN http://us.imdb.com/Title?U+Turn+(1997)
99972 NaN http://us.imdb.com/M/title-exact?Bean+(1997)
99973 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
99974 NaN http://us.imdb.com/M/title-exact?I+Know+What+Y...
99975 NaN http://us.imdb.com/M/title-exact?Picture+Perfe...
99976 NaN http://us.imdb.com/M/title-exact?Excess+Baggag...
99977 NaN http://us.imdb.com/M/title-exact?Life+Less+Ord...
99978 NaN http://us.imdb.com/M/title-exact?imdb-title-12...
99979 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
99980 NaN http://us.imdb.com/M/title-exact?Scream%20(1996)
99981 NaN http://us.imdb.com/Title?Liar+Liar+(1997)
99982 NaN http://us.imdb.com/M/title-exact?Air+Force+One...
99983 NaN http://us.imdb.com/M/title-exact?Rainmaker,+Th...
99984 NaN http://us.imdb.com/M/title-exact?imdb-title-12...
99985 NaN http://us.imdb.com/M/title-exact?Murder%20at%2...
99986 NaN http://us.imdb.com/M/title-exact?Conspiracy+Th...
99987 NaN http://us.imdb.com/M/title-exact?Game%2C+The+(...
99988 NaN http://us.imdb.com/M/title-exact?Bean+(1997)
99989 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
99990 NaN http://us.imdb.com/M/title-exact?Wedding+Singe...
99991 NaN http://us.imdb.com/M/title-exact?Blues+Brother...
99992 NaN http://us.imdb.com/M/title-exact?Rocket+Man+(1...
99993 NaN http://us.imdb.com/M/title-exact?Jackal%2C+The...
99994 NaN http://us.imdb.com/M/title-exact?Seven+Years+i...
99995 NaN http://us.imdb.com/M/title-exact?Saint%2C%20Th...
99996 NaN http://us.imdb.com/M/title-exact?imdb-title-12...
99997 NaN http://us.imdb.com/M/title-exact?Peacemaker%2C...
99998 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
99999 NaN http://us.imdb.com/M/title-exact?imdb-title-11...
unknown Action Adventure Animation Childrens ... Western \
0 0 0 0 1 1 ... 0
1 0 1 0 0 0 ... 0
2 0 0 0 0 0 ... 0
3 0 0 0 0 0 ... 0
4 0 0 0 0 1 ... 0
5 0 0 0 0 0 ... 0
6 0 0 0 0 0 ... 0
7 0 0 0 0 0 ... 0
8 0 0 0 0 0 ... 0
9 0 1 0 0 0 ... 0
10 0 0 0 0 0 ... 0
11 0 1 1 0 0 ... 0
12 0 1 0 0 0 ... 0
13 0 0 0 0 0 ... 0
14 0 1 1 0 0 ... 0
15 0 0 0 0 0 ... 0
16 0 1 0 0 0 ... 0
17 0 0 0 0 0 ... 0
18 0 0 0 0 0 ... 0
19 0 0 0 0 0 ... 0
20 0 0 0 0 0 ... 0
21 0 0 0 0 0 ... 0
22 0 0 0 0 0 ... 0
23 0 0 0 0 0 ... 0
24 0 0 0 0 0 ... 0
25 0 0 0 0 0 ... 0
26 0 1 1 0 0 ... 0
27 0 1 0 0 0 ... 0
28 0 0 0 0 0 ... 0
29 0 0 0 0 0 ... 0
... ... ... ... ... ... ...
99970 0 0 0 0 0 ... 0
99971 0 1 0 0 0 ... 0
99972 0 0 0 0 0 ... 0
99973 0 0 0 0 0 ... 0
99974 0 0 0 0 0 ... 0
99975 0 0 0 0 0 ... 0
99976 0 0 1 0 0 ... 0
99977 0 0 0 0 0 ... 0
99978 0 0 0 0 0 ... 0
99979 0 0 0 0 0 ... 0
99980 0 0 0 0 0 ... 0
99981 0 0 0 0 0 ... 0
99982 0 1 0 0 0 ... 0
99983 0 0 0 0 0 ... 0
99984 0 1 0 0 0 ... 0
99985 0 0 0 0 0 ... 0
99986 0 1 0 0 0 ... 0
99987 0 0 0 0 0 ... 0
99988 0 0 0 0 0 ... 0
99989 0 0 0 0 0 ... 0
99990 0 0 0 0 0 ... 0
99991 0 1 0 0 0 ... 0
99992 0 0 0 0 0 ... 0
99993 0 1 0 0 0 ... 0
99994 0 0 0 0 0 ... 0
99995 0 1 0 0 0 ... 0
99996 0 1 0 0 0 ... 0
99997 0 1 0 0 0 ... 0
99998 0 0 0 0 1 ... 0
99999 0 0 0 0 0 ... 0
user id rating timestamp age gender occupation zipcode state \
0 308 4 887736532 60 M retired 95076 CA
1 308 5 887737890 60 M retired 95076 CA
2 308 4 887739608 60 M retired 95076 CA
3 308 4 887738847 60 M retired 95076 CA
4 308 5 887736696 60 M retired 95076 CA
5 308 4 887737194 60 M retired 95076 CA
6 308 5 887737837 60 M retired 95076 CA
7 308 5 887737243 60 M retired 95076 CA
8 308 3 887739426 60 M retired 95076 CA
9 308 4 887739056 60 M retired 95076 CA
10 308 3 887737383 60 M retired 95076 CA
11 308 3 887740729 60 M retired 95076 CA
12 308 4 887737647 60 M retired 95076 CA
13 308 5 887737293 60 M retired 95076 CA
14 308 4 887738057 60 M retired 95076 CA
15 308 4 887740649 60 M retired 95076 CA
16 308 3 887737036 60 M retired 95076 CA
17 308 4 887738933 60 M retired 95076 CA
18 308 3 887739472 60 M retired 95076 CA
19 308 5 887737432 60 M retired 95076 CA
20 308 4 887738191 60 M retired 95076 CA
21 308 4 887740451 60 M retired 95076 CA
22 308 4 887736843 60 M retired 95076 CA
23 308 4 887738933 60 M retired 95076 CA
24 308 4 887736880 60 M retired 95076 CA
25 308 3 887740833 60 M retired 95076 CA
26 308 5 887737431 60 M retired 95076 CA
27 308 2 887740254 60 M retired 95076 CA
28 308 3 887738760 60 M retired 95076 CA
29 308 5 887736924 60 M retired 95076 CA
... ... ... ... ... ... ... ...
99970 631 3 888465180 18 F student 38866 MS
99971 631 2 888464941 18 F student 38866 MS
99972 631 2 888465299 18 F student 38866 MS
99973 631 4 888465004 18 F student 38866 MS
99974 631 2 888465247 18 F student 38866 MS
99975 631 2 888465084 18 F student 38866 MS
99976 631 2 888465131 18 F student 38866 MS
99977 631 4 888465216 18 F student 38866 MS
99978 631 2 888465351 18 F student 38866 MS
99979 729 4 893286638 19 M student 56567 MN
99980 729 2 893286261 19 M student 56567 MN
99981 729 2 893286338 19 M student 56567 MN
99982 729 4 893286638 19 M student 56567 MN
99983 729 3 893286204 19 M student 56567 MN
99984 729 3 893286638 19 M student 56567 MN
99985 729 4 893286637 19 M student 56567 MN
99986 729 3 893286638 19 M student 56567 MN
99987 729 4 893286638 19 M student 56567 MN
99988 729 1 893286373 19 M student 56567 MN
99989 729 1 893286168 19 M student 56567 MN
99990 729 5 893286637 19 M student 56567 MN
99991 729 4 893286637 19 M student 56567 MN
99992 729 2 893286511 19 M student 56567 MN
99993 729 4 893286638 19 M student 56567 MN
99994 729 2 893286149 19 M student 56567 MN
99995 729 4 893286638 19 M student 56567 MN
99996 729 3 893286338 19 M student 56567 MN
99997 729 3 893286299 19 M student 56567 MN
99998 729 1 893286511 19 M student 56567 MN
99999 729 1 893286491 19 M student 56567 MN
State1
0 CA
1 CA
2 CA
3 CA
4 CA
5 CA
6 CA
7 CA
8 CA
9 CA
10 CA
11 CA
12 CA
13 CA
14 CA
15 CA
16 CA
17 CA
18 CA
19 CA
20 CA
21 CA
22 CA
23 CA
24 CA
25 CA
26 CA
27 CA
28 CA
29 CA
...
99970 MS
99971 MS
99972 MS
99973 MS
99974 MS
99975 MS
99976 MS
99977 MS
99978 MS
99979 MN
99980 MN
99981 MN
99982 MN
99983 MN
99984 MN
99985 MN
99986 MN
99987 MN
99988 MN
99989 MN
99990 MN
99991 MN
99992 MN
99993 MN
99994 MN
99995 MN
99996 MN
99997 MN
99998 MN
99999 MN
All of the genres are: [['Action', 'Adventure','Animation', 'Childrens', 'Comedy', 'Crime','Documentary', 'Drama', 'Fantasy', 'FilmNoir',
'Horror', 'Musical', 'Mystery', 'Romance','SciFi', 'Thriller', 'War', 'Western']]
How can I figure out which genre has the highest average review and which has the lowest? Should I group by rating and then aggregate all of the corresponding genre columns?
df = bigdataframe[['Action', 'Adventure','Animation', 'Childrens', 'Comedy',
'Crime','Documentary', 'Drama', 'Fantasy', 'FilmNoir',
'Horror', 'Musical', 'Mystery',
'Romance','SciFi', 'Thriller', 'War', 'Western','rating']]
gp = df.groupby('rating')
result = gp.agg(['mean'])
result gives me this:
Action Adventure Animation Childrens Comedy Crime \
mean mean mean mean mean mean
rating
1 0.253191 0.131588 0.030442 0.093944 0.372995 0.068249
2 0.286192 0.150308 0.032806 0.084521 0.339138 0.073351
3 0.267232 0.143710 0.037502 0.081709 0.322380 0.073899
4 0.246708 0.129806 0.036051 0.064728 0.284485 0.082958
5 0.240696 0.136928 0.037545 0.057403 0.246403 0.092590
Documentary Drama Fantasy FilmNoir Horror Musical \
mean mean mean mean mean mean
rating
1 0.009656 0.289034 0.018331 0.007365 0.082324 0.046645
2 0.005101 0.320756 0.019349 0.008531 0.071592 0.050484
3 0.006042 0.363861 0.016983 0.013520 0.055738 0.052238
4 0.007842 0.427459 0.011207 0.019430 0.047112 0.047609
5 0.009858 0.471534 0.008301 0.026414 0.041366 0.049526
Mystery Romance SciFi Thriller War Western
mean mean mean mean mean mean
rating
1 0.041735 0.154173 0.118494 0.203764 0.060065 0.011620
2 0.046262 0.177397 0.133597 0.229903 0.067018 0.015743
3 0.048112 0.186443 0.121422 0.224277 0.074415 0.019893
4 0.056563 0.201381 0.125154 0.222772 0.097589 0.019606
5 0.057780 0.215037 0.137446 0.203387 0.137446 0.018584
I think you need idxmin and idxmax. A new DataFrame is not necessary either: you can group bigdataframe directly and select the genre columns with []:
genres = ['Action', 'Adventure','Animation', 'Childrens', 'Comedy', 'Crime','Documentary', 'Drama', 'Fantasy', 'FilmNoir', 'Horror', 'Musical', 'Mystery', 'Romance','SciFi', 'Thriller', 'War', 'Western']
df1 = bigdataframe.groupby('rating')[genres].mean()
print (df1)
Action Adventure Animation Childrens Comedy Crime \
rating
1 0.253191 0.131588 0.030442 0.093944 0.372995 0.068249
2 0.286192 0.150308 0.032806 0.084521 0.339138 0.073351
3 0.267232 0.143710 0.037502 0.081709 0.322380 0.073899
4 0.246708 0.129806 0.036051 0.064728 0.284485 0.082958
5 0.240696 0.136928 0.037545 0.057403 0.246403 0.092590
Documentary Drama Fantasy FilmNoir Horror Musical \
rating
1 0.009656 0.289034 0.018331 0.007365 0.082324 0.046645
2 0.005101 0.320756 0.019349 0.008531 0.071592 0.050484
3 0.006042 0.363861 0.016983 0.013520 0.055738 0.052238
4 0.007842 0.427459 0.011207 0.019430 0.047112 0.047609
5 0.009858 0.471534 0.008301 0.026414 0.041366 0.049526
Mystery Romance SciFi Thriller War Western
rating
1 0.041735 0.154173 0.118494 0.203764 0.060065 0.011620
2 0.046262 0.177397 0.133597 0.229903 0.067018 0.015743
3 0.048112 0.186443 0.121422 0.224277 0.074415 0.019893
4 0.056563 0.201381 0.125154 0.222772 0.097589 0.019606
5 0.057780 0.215037 0.137446 0.203387 0.137446 0.018584
mingen = df1.idxmin(axis=1).reset_index(name='Genre')
print (mingen)
rating Genre
0 1 FilmNoir
1 2 Documentary
2 3 Documentary
3 4 Documentary
4 5 Fantasy
maxgen = df1.idxmax(axis=1).reset_index(name='Genre')
print (maxgen)
rating Genre
0 1 Comedy
1 2 Comedy
2 3 Drama
3 4 Drama
4 5 Drama
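Note that the tables above show, for each rating value, the share of reviews that fall in each genre. If what you want is instead the mean rating per genre, a different calculation may be closer to the question. The following is only a sketch under assumptions: the small bigdataframe here is made-up sample data standing in for the real one, and it assumes the genre columns are 0/1 flags with one row per review.

```python
import pandas as pd

# Hypothetical miniature stand-in for bigdataframe (made-up values):
# one row per review, genre columns are 0/1 flags.
bigdataframe = pd.DataFrame({
    "rating": [5, 4, 1, 2, 3, 5],
    "Action": [1, 0, 1, 1, 0, 0],
    "Comedy": [0, 1, 0, 0, 1, 0],
    "Drama":  [1, 1, 0, 0, 0, 1],
})
genres = ["Action", "Comedy", "Drama"]

flags = bigdataframe[genres]

# Multiply each flag column by the rating, so unflagged rows contribute 0,
# then divide the per-genre rating total by the number of flagged rows.
avg_by_genre = flags.mul(bigdataframe["rating"], axis=0).sum() / flags.sum()

print(avg_by_genre)
print("Highest:", avg_by_genre.idxmax())
print("Lowest:", avg_by_genre.idxmin())
```

A film tagged with several genres counts once towards each of them here, which seems reasonable for this kind of data but is a design choice rather than something the question specifies.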
