Python Beautiful Soup Webscraping: Cannot get a full table to display

I am relatively new to Python and this is my first web scrape. I am trying to scrape a table but can only get the first column to show up. I am using the find method instead of find_all, which I am fairly sure is the cause, but when I use find_all I cannot get any text to display. Here is the URL I am scraping from: https://www.fangraphs.com/teams/mariners/stats
I am trying to get the top table (Batting Stat Leaders) to work. My code is below:
from bs4 import BeautifulSoup
import requests
import time
htmlText = requests.get('https://www.fangraphs.com/teams/mariners/stats').text
soup = BeautifulSoup(htmlText, 'lxml', )
playerTable = soup.find('div', class_='team-stats-table')
input = input("Would you like to see Batting, Starting Pitching, Relief Pitching, or Fielding Stats? \n")
def BattingStats():
    print("BATTING STATS:")
    print("Player Name: ")
    for tr in playerTable.find_all('tr')[1:55]:
        tds = tr.find('td').text
        print(tds)
if input == "Batting" or "batting":
    BattingStats()
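
A side note on the last two lines of the code above: `input == "Batting" or "batting"` is evaluated as `(input == "Batting") or ("batting")`, and a non-empty string is always truthy, so the branch runs whatever the user types. Assigning to the name `input` also hides the built-in function. A minimal sketch of one possible fix, where `wants_batting` and `choice` are hypothetical names that are not part of the question:

```python
def wants_batting(choice: str) -> bool:
    """Return True only when the user typed some casing of 'batting'."""
    return choice.strip().lower() == "batting"


# The original test is parsed as (choice == "Batting") or ("batting").
# The second operand is a non-empty string, so the result is truthy for any input.
choice = "Fielding"
always_truthy = bool(choice == "Batting" or "batting")

# The helper compares the normalized text instead.
batting_selected = wants_batting(choice)
```

Normalizing with `strip().lower()` is only one option; a membership test such as `choice in ("Batting", "batting")` would work as well.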

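You can use a list comprehension to get the text from every cell in each row:
import requests
from bs4 import BeautifulSoup
url = "https://www.fangraphs.com/teams/mariners/stats"
soup = BeautifulSoup(requests.get(url).text, "lxml")
playerTable = soup.find("div", class_="team-stats-table")
def BattingStats():
    print("BATTING STATS:")
    print("Player Name: ")
    for tr in playerTable.find_all("tr")[1:55]:
        # collect the text of every <td> in the row, not just the first one
        tds = [td.text for td in tr.select("td")]
        print(*tds)
BattingStats()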
Prints:
BATTING STATS:
Player Name:
Mitch Haniger 30 94 406 25 0 6.7% 23.4% .257 .291 .268 .323 .524 .358 133 0.2 16.4 -6.5 2.4
Ty France 26 89 372 9 0 7.3% 16.9% .150 .314 .276 .355 .426 .341 121 0.0 9.5 -2.6 2.0
Kyle Seager 33 97 403 18 2 8.4% 25.8% .201 .246 .215 .285 .416 .302 95 -0.3 -2.9 5.4 1.6
...
Solution with pandas:
import pandas as pd
url = "https://www.fangraphs.com/teams/mariners/stats"
df = pd.read_html(url)[7]
print(df)
Prints:
Name Age G PA HR SB BB% K% ISO BABIP AVG OBP SLG wOBA wRC+ BsR Off Def WAR
0 Mitch Haniger 30 94 406 25 0 6.7% 23.4% 0.257 0.291 0.268 0.323 0.524 0.358 133.0 0.2 16.4 -6.5 2.4
1 Ty France 26 89 372 9 0 7.3% 16.9% 0.150 0.314 0.276 0.355 0.426 0.341 121.0 0.0 9.5 -2.6 2.0
2 Kyle Seager 33 97 403 18 2 8.4% 25.8% 0.201 0.246 0.215 0.285 0.416 0.302 95.0 -0.3 -2.9 5.4 1.6
...
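
To make the difference between `find` and `find_all` concrete without fetching the page, here is a minimal sketch on a small inline table. The HTML is a hypothetical stand-in for the real page (only the first three columns of two rows), and `html.parser` is used so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table standing in for the real page.
html = """
<table>
  <tr><th>Name</th><th>Age</th><th>G</th></tr>
  <tr><td>Mitch Haniger</td><td>30</td><td>94</td></tr>
  <tr><td>Ty France</td><td>26</td><td>89</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
data_rows = soup.find_all("tr")[1:]  # skip the header row

# find() returns only the first matching <td> in each row.
first_cells = [tr.find("td").text for tr in data_rows]

# find_all() returns a list of every <td>, so .text is read per element.
full_rows = [[td.text for td in tr.find_all("td")] for tr in data_rows]
```

Calling `.text` directly on the result of `find_all` fails because the result is a list-like object, which is probably why no text appeared when the question's code switched from `find` to `find_all`.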

Related

Cannot scrape some table using Pandas

I'm more than a noob in Python. I'm trying to get some tables from this page:
https://www.basketball-reference.com/wnba/boxscores/202208030SEA.html
Using pandas and pd.read_html I'm able to get most of them, but not the "Line Score" and "Four Factors" tables. If I print all the tables (there are 19), these two are missing. Inspecting with Chrome, they appear to be tables, and I can also get them in Excel by importing from the web.
What am I missing here?
Any help appreciated, thanks!

If you look at the page source (not the inspector), you'll see those tables sit inside HTML comments. You can either 1) remove the <!-- and --> markers from the HTML string and then let pandas parse it, or 2) use bs4 to pull out the comments and then parse the tables inside them.
I'll show you both options:
Option 1: Remove the comment tags from the page source
import requests
import pandas as pd
url = 'https://www.basketball-reference.com/wnba/boxscores/202208030SEA.html'
response = requests.get(url).text.replace("<!--","").replace("-->","")
dfs = pd.read_html(response, header=1)
Output:
You can see you now have 21 tables, with the 4th and 5th tables the ones in question.
print(len(dfs))
for each in dfs[3:5]:
    print('\n\n', each, '\n')
21
Unnamed: 0 1 2 3 4 T
0 Minnesota Lynx 18 14 22 23 77
1 Seattle Storm 30 26 22 11 89
Unnamed: 0 Pace eFG% TOV% ORB% FT/FGA ORtg
0 MIN 97.0 0.507 16.1 14.3 0.101 95.2
1 SEA 97.0 0.579 11.8 9.7 0.114 110.1
Option 2: Pull out comments with bs4
import requests
from bs4 import BeautifulSoup, Comment
import pandas as pd
url = 'https://www.basketball-reference.com/wnba/boxscores/202208030SEA.html'
result = requests.get(url).text
data = BeautifulSoup(result, 'html.parser')
dfs = pd.read_html(url, header=1)
comments = data.find_all(string=lambda text: isinstance(text, Comment))
other_tables = []
for each in comments:
    if '<table' in str(each):
        try:
            other_tables.append(pd.read_html(str(each), header=1)[0])
        except ValueError:  # read_html raises ValueError when it finds no table
            continue
Output:
for each in other_tables:
    print(each, '\n')
Unnamed: 0 1 2 3 4 T
0 Minnesota Lynx 18 14 22 23 77
1 Seattle Storm 30 26 22 11 89
Unnamed: 0 Pace eFG% TOV% ORB% FT/FGA ORtg
0 MIN 97.0 0.507 16.1 14.3 0.101 95.2
1 SEA 97.0 0.579 11.8 9.7 0.114 110.1
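
To see why the comment markers matter without hitting the network, the sketch below counts `<table>` tags with the standard library's `html.parser` before and after the markers are stripped. The `page` string, `TableCounter`, and `count_tables` are hypothetical and only illustrate the idea behind Option 1.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Count <table> start tags that the parser sees as real markup."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1


def count_tables(html: str) -> int:
    counter = TableCounter()
    counter.feed(html)
    counter.close()
    return counter.tables


# Hypothetical page: one visible table and one wrapped in an HTML comment.
page = (
    "<table><tr><td>visible</td></tr></table>"
    "<!-- <table><tr><td>hidden</td></tr></table> -->"
)

before = count_tables(page)
after = count_tables(page.replace("<!--", "").replace("-->", ""))
```

Markup inside a comment is handed to the parser as comment text rather than as tags, so the hidden table only becomes visible once the markers are gone.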

Web Scraping with BS4 - Can you sort this out?

Can you fix this code for me? It gives me this error message:
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
Can anyone please help me with this? The code is below:
import pandas as pd
import requests
from bs4 import BeautifulSoup
url="https://www.cse.lk/pages/trade-summary/trade-summary.component.html"
data = requests.get(url).text
soup = BeautifulSoup(data, 'html5lib')
cse = pd.DataFrame(columns=["Company Name", "Symbol", "Share Volume", "Trade Volume", "Previous Close (Rs.)", "Open (Rs.)", "High (Rs.)", "Low (Rs.)", "**Last Traded Price (Rs.)", "Change (Rs.)", "Change Percentage (%)"])
for row in soup.find_all('tbody').find_all('tr'): ##for row in soup.find("tbody").find_all('tr'):
    col = row.find_all("td")
    Company_Name = col[0].text
    Symbol = col[1].text
    Share_Volume = col[2].text
    Trade_Volume = col[3].text
    Previous_Close = col[4].text
    Open = col[5].text
    High = col[6].text
    Low = col[7].text
    Last_Traded_Price = col[8].text
    Change = col[9].text
    Change_Percentage = col[10].text
    cse = cse.append({"Company Name":Company_Name,"Symbol":Symbol,"Share Volume":Share_Volume,"Trade Volume":Trade_Volume,"Previous Close (Rs.)":Previous_Close,"Open (Rs.)":Open,"High (Rs.)":High,"Low (Rs.)":Low,"**Last Traded Price (Rs.)":Last_Traded_Price,"Change (Rs.)":Change,"Change Percentage (%)":Change_Percentage}, ignore_index=True)

The data is loaded from an external URL via JavaScript, so BeautifulSoup doesn't see it. You can load it directly from that URL, for example:
import requests
import pandas as pd
url = "https://www.cse.lk/api/tradeSummary"
data = requests.post(url).json()
df = pd.DataFrame(data["reqTradeSummery"])
print(df)
df.to_csv("data.csv", index=None)
Prints:
id name symbol quantity percentageChange change price previousClose high low lastTradedTime issueDate turnover sharevolume tradevolume marketCap marketCapPercentage open closingPrice crossingVolume crossingTradeVol status
0 204 ABANS ELECTRICALS PLC ABAN.N0000 317 4.184704 7.25 180.50 173.25 183.00 172.00 1626944252441 01/JAN/1984 1.256363e+06 7012 44 9.224561e+08 0.0 179.00 180.50 7012 44 0
1 1845 ABANS FINANCE PLC AFSL.N0000 89 -3.225806 -1.00 30.00 31.00 30.10 30.00 1626944124197 27/JUN/2011 1.160916e+06 38652 11 1.996847e+09 0.0 30.10 30.00 38652 11 3
2 2065 ACCESS ENGINEERING PLC AEL.N0000 500 -0.432900 -0.10 23.00 23.10 23.40 22.90 1626944388726 27/MAR/2012 1.968675e+07 855534 264 2.300000e+10 0.0 23.10 23.00 855534 264 0
3 472 ACL CABLES PLC ACL.N0000 1000 -0.963855 -0.40 41.10 41.50 41.70 40.90 1626944397450 01/JAN/1976 3.037800e+07 738027 421 9.846521e+09 0.0 41.50 41.10 738027 421 0
4 406 ACL PLASTICS PLC APLA.N0000 20 0.842697 2.25 269.25 267.00 272.75 266.00 1626943847820 05/APR/1995 1.436916e+06 5333 26 1.134216e+09 0.0 272.75 269.25 5333 26 0
...
and saves data.csv.
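
If pandas is not available, the same JSON-to-CSV step can be sketched with the standard library. The payload below is a hypothetical, trimmed stand-in shaped like the response printed above (records under the `reqTradeSummery` key); it is written to an in-memory buffer so the example runs without network or disk access.

```python
import csv
import io
import json

# Hypothetical, trimmed payload shaped like the response printed above.
payload = json.loads("""
{"reqTradeSummery": [
  {"name": "ABANS ELECTRICALS PLC", "symbol": "ABAN.N0000", "price": 180.5},
  {"name": "ABANS FINANCE PLC", "symbol": "AFSL.N0000", "price": 30.0}
]}
""")
records = payload["reqTradeSummery"]

# Swap the buffer for open("data.csv", "w", newline="") to write a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()
```

Taking the field names from the first record assumes every record has the same keys, which holds for this sample but is not guaranteed for the live API.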

Multiple table header <thead> in table <table> and how to scrape data from <thead> as a table row

I'm trying to scrape data from a website, but the table has two sets of data: the first 2-3 rows are in thead and the rest are in tbody. I can extract data from either one on its own, but when I try both I get errors such as TypeError and AttributeError. I'm using Python.
Here is the code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
url="https://www.worldometers.info/world-population/"
r=requests.get(url)
print(r)
html=r.text
soup=BeautifulSoup(html,'html.parser')
print(soup.title.text)
print()
print()
live_data=soup.find_all('div',id='maincounter-wrap')
print(live_data)
for i in live_data:
    print(i.text)
table_body=soup.find('thead')
table_rows=table_body.find_all('tr')
table_body_2=soup.find('tbody')
table_rows_2=soup.find_all('tr')
year_july1=[]
population=[]
yearly_change_in_perchantage=[]
yearly_change=[]
median_age=[]
fertillity_rate=[]
density=[]#density (p\km**)
urban_population_in_perchantage=[]
urban_population=[]
for tr in table_rows:
    td=tr.find_all('td')
    year_july1.append(td[0].text)
    population.append(td[1].text)
    yearly_change_in_perchantage.append(td[2].text)
    yearly_change.append(td[3].text)
    median_age.append(td[4].text)
    fertillity_rate.append(td[5].text)
    density.append(td[6].text)
    urban_population_in_perchantage.append(td[7].text)
    urban_population.append(td[8].text)
for tr in table_rows_2:
    td=tr.find_all('td')
    year_july1.append(td[0].text)
    population.append(td[1].text)
    yearly_change_in_perchantage.append(td[2].text)
    yearly_change.append(td[3].text)
    median_age.append(td[4].text)
    fertillity_rate.append(td[5].text)
    density.append(td[6].text)
    urban_population_in_perchantage.append(td[7].text)
    urban_population.append(td[8].text)
headers=['year_july1','population','yearly_change_in_perchantage','yearly_change','median_age','fertillity_rate','density','urban_population_in_perchantage','urban_population']
data_2= pd.DataFrame(list(zip(year_july1,population,yearly_change_in_perchantage,yearly_change,median_age,fertillity_rate,density,urban_population_in_perchantage,urban_population)),columns=headers)
print(data_2)
data_2.to_csv("C:\\Users\\data_2.csv")

You can try the code below; it generates the required data. Let me know if you need any clarification.
import requests
import pandas as pd
url = 'https://www.worldometers.info/world-population/'
html = requests.get(url).content
df_list = pd.read_html(html, header=0)
df = df_list[0]
#print(df)
df.to_csv("data.csv", index=False)
It gives the output below:
print(df)
Year (July 1) Population ... Urban Pop % Urban Population
0 2020 7794798739 ... 56.2 % 4378993944
1 2019 7713468100 ... 55.7 % 4299438618
2 2018 7631091040 ... 55.3 % 4219817318
3 2017 7547858925 ... 54.9 % 4140188594
4 2016 7464022049 ... 54.4 % 4060652683
5 2015 7379797139 ... 54.0 % 3981497663
6 2010 6956823603 ... 51.7 % 3594868146
7 2005 6541907027 ... 49.2 % 3215905863
8 2000 6143493823 ... 46.7 % 2868307513
9 1995 5744212979 ... 44.8 % 2575505235
10 1990 5327231061 ... 43.0 % 2290228096
11 1985 4870921740 ... 41.2 % 2007939063
12 1980 4458003514 ... 39.3 % 1754201029
13 1975 4079480606 ... 37.7 % 1538624994
14 1970 3700437046 ... 36.6 % 1354215496
15 1965 3339583597 ... N.A. N.A.
16 1960 3034949748 ... 33.7 % 1023845517
17 1955 2773019936 ... N.A. N.A.
[18 rows x 9 columns]
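
For readers who want to handle the rows by hand, the following is a minimal sketch using only the standard library's `html.parser`. It treats rows in `<thead>` and `<tbody>` alike and skips rows that contain no `<td>` cells, which is one likely source of the IndexError-style failures when header rows are indexed. The HTML snippet and the `RowCollector` class are hypothetical; the real page's markup may differ.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the <td> text of every <tr>, whether it sits in <thead> or <tbody>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row being read, or None outside a row
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:      # skip rows without <td>, e.g. a <th>-only header row
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Hypothetical markup with data rows in both <thead> and <tbody>.
html = """
<table>
  <thead>
    <tr><th>Year</th><th>Population</th></tr>
    <tr><td>2020</td><td>7794798739</td></tr>
  </thead>
  <tbody>
    <tr><td>2019</td><td>7713468100</td></tr>
  </tbody>
</table>
"""
collector = RowCollector()
collector.feed(html)
rows = collector.rows
```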

Selenium code can not catch the table from Chrome

I am using Selenium to parse
https://www.worldometers.info/coronavirus/
With the code below I get an AttributeError and the table variable remains empty. What is the reason?
I use Chrome 80. Are the tags right?
AttributeError: 'NoneType' object has no attribute 'tbody'
from selenium import webdriver
import bs4
browser = webdriver.Chrome()
browser.get("https://www.worldometers.info/coronavirus/")
html = bs4.BeautifulSoup(browser.page_source, "html.parser")
table = html.find("table",class_="table table-bordered table-hover main_table_countries dataTable no-footer") #

Wherever I have table tags, I find it easier to use pandas to capture the table.
import pandas as pd
url = 'https://www.worldometers.info/coronavirus/'
table = pd.read_html(url)[0]
Output:
print(table)
Country,Other TotalCases ... Tot Cases/1M pop Tot Deaths/1M pop
0 China 81093 ... 56.00 2.0
1 Italy 63927 ... 1057.00 101.0
2 USA 43734 ... 132.00 2.0
3 Spain 35136 ... 751.00 49.0
4 Germany 29056 ... 347.00 1.0
.. ... ... ... ... ...
192 Somalia 1 ... 0.06 NaN
193 Syria 1 ... 0.06 NaN
194 Timor-Leste 1 ... 0.80 NaN
195 Turks and Caicos 1 ... 26.00 NaN
196 Total: 378782 ... 48.60 2.1
[197 rows x 10 columns]
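
As for the `NoneType` error itself, one possible cause (an assumption, since the live markup is not shown here) is the way BeautifulSoup matches `class_`: a string with several class names is compared against the whole class attribute, so any difference in the list makes `find` return None. The sketch below uses hypothetical markup to show the difference between the full string and a single class name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class list differs slightly from the one in the question.
html = (
    '<table id="t" class="table table-bordered main_table_countries">'
    "<tbody></tbody></table>"
)
soup = BeautifulSoup(html, "html.parser")

# A multi-class string must equal the whole class attribute value to match.
exact = soup.find(
    "table",
    class_="table table-bordered table-hover main_table_countries dataTable no-footer",
)

# A single class name matches any tag that carries that class.
by_one_class = soup.find("table", class_="main_table_countries")

# select_one() takes a CSS selector and also returns None when nothing matches.
by_selector = soup.select_one("table.main_table_countries")
```

Checking the result for None before accessing `.tbody` turns the AttributeError into an explicit "table not found" condition.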

Loop Through Table Rows Using BeautifulSoup

I need help looping through table rows and putting them into a list. On this website, there are three tables, each with different statistics - http://www.fangraphs.com/statsplits.aspx?playerid=15640&position=OF&season=0&split=0.4
For instance, these three tables have rows for 2016, 2017, and a total row. I would like the following:
A list of the following -->table 1 - row 1, table 2 - row 1, table 3 - row 1
A second list of the following -->table 1 - row 2, table 2 - row 2, table 3 - row 2
A third list: -->table 1 - row 3, table 2 - row 3, table 3 - row 3
I know I need to create lists and use the append function; however, I am not sure how to loop through just the first row of each table, then the second row of each table, and so on through every row (the number of rows will vary; this one just happens to have 3).
Any help is greatly appreciated. The code is below:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import csv
idList2 = ['15640', '9256']
splitList=[0.4,0.2,0.3,0.4]
for id in idList2:
    pos = 'OF'
    for split in splitList:
        url = ('http://www.fangraphs.com/statsplits.aspx?playerid=' +
               str(id) + '&position=' + str(pos) + '&season=0&split=' +
               str(split))
        r = requests.get(url)
        for season in range(1,4):
            print(season)
            soup = BeautifulSoup(r.text, "html.parser")
            tableStats = soup.find("table", {"id" : "SeasonSplits1_dgSeason" + str(season) + "_ctl00"})
            soup.findAll('th')
            column_headers = [th.getText() for th in soup.findAll('th')]
            statistics = soup.find("table", {"id" :
                "SeasonSplits1_dgSeason" + str(season) + "_ctl00"})
            tabledata = [td.getText() for td in statistics('td')]
            print(tabledata)

This will be my last attempt. It has everything you should need. I added trail markers showing where the tables, rows, and columns are scraped; this all happens in the function extract_table(). Follow the markers and don't worry about any other code. Don't let the file size worry you; it is mostly documentation and spacing. The code is written for Python 2.
Trail marker: ### ... ###
Start at the ### START HERE ### marker (line 95 of the original file)
from bs4 import BeautifulSoup as Soup
import requests
import urllib
###### GO TO THE `START HERE` MARKER ######
### IGNORE ###
def generate_urls (idList, splitList):
    """ Using an id list and a split list, generate a list of urls"""
    urls = []
    url = 'http://www.fangraphs.com/statsplits.aspx'
    for id in idList:
        for split in splitList:
            # The parameters used in creating the url
            url_payload = {'split': split, 'playerid': id, 'position': 'OF', 'season': 0}
            # Create the url and add it to the collection of urls
            urls += ['?'.join([url, urllib.urlencode(url_payload)])]
    return urls # Return the list of urls
### IGNORE ###
def extract_player_name (soup):
    """ Extract the player name from the browser title """
    # Browser title contains player name, strip all but name
    player_name = repr(soup.title.text.strip('\r\n\t'))
    player_name = player_name.split(' \\xbb')[0] # Split on ` »`
    player_name = player_name[2:] # Erase the leading characters added by `repr`
    return player_name
########## FINISH HERE ##########
def extract_table (table_id, soup):
    """ Extract data from a table, return the column headers and the table rows"""
    ### IMPORTANT: THIS CODE IS WHERE ALL THE MAGIC HAPPENS ###
    # - First: Find lowest level tag of all the data we want (container).
    #
    # - Second: Extract the table column headers, requires minimal mining
    #
    # - Third: Gather a list of tags that represent the table's rows
    #
    # - Fourth: Loop through the list of rows
    #   A): Mine all columns in the row
    ### IMPORTANT: Get A Reference To The Table ###
    # SCRAPE 1:
    table_tag = soup.find("table", {"id" : 'SeasonSplits1_dgSeason%d_ctl00' % table_id})
    # SCRAPE 2:
    columns = [th.text for th in table_tag.findAll('th')]
    # SCRAPE 3:
    rows_tags = table_tag.tbody.findAll('tr') # All 'tr' tags in the table `tbody` tag are row tags
    ### IMPORTANT: Cycle Through Rows And Collect Column Data ###
    # SCRAPE 4:
    rows = [] # List of all table rows
    for row_tag in rows_tags:
        ### IMPORTANT: Mine All Columns In This Row || LOWEST LEVEL IN THE MINING OPERATION. ###
        # SCRAPE 4.A
        row = [col.text for col in row_tag.findAll('td')] # `td` represents a column in a row.
        rows.append (row) # Add this row to all the other rows of this table
    # RETURN: The column header and the rows of this table
    return [columns, rows]
### Look Deeper ###
def extract_player (soup):
    """ Extract player data and store in a list. ['name', [columns, rows], [table2]]"""
    player = [] # A list to store data in
    # player name is first in player list
    player.append (extract_player_name (soup))
    # Each table is a list entry
    for season in range(1,4):
        ### IMPORTANT: No Table Related Data Has Been Mined Yet. START HERE ###
        ### - See `extract_table()`
        table = extract_table (season, soup) # `season` represents the table id
        player.append(table) # Add this table (a list) to the player data list
    # Return the player list
    return player
##################################################
################## START HERE ####################
##################################################
###
### OBJECTIVE:
###
### - Follow the trail of important lines that extract the data
### - Important lines will be marked as the following `### ... ###`
###
### All this code really needs is a url and the `extract_table()` function.
###
### The `main()` function is where the journey starts
###
##################################################
##################################################
def main ():
    """ The main function is the core program code. """
    # Luckily the pages we will scrape all have the same layout making mining easier.
    all_players = [] # A place to store all the data
    # Values used to alter the url when making requests to access more player statistics
    idList2 = ['15640', '9256']
    splitList=[0.4,0.2,0.3,0.4]
    # Instead of looping through variables that don't tell a story,
    # let's create a list of urls generated from those variables.
    # This way the code is self-explanatory and is human-readable.
    urls = generate_urls(idList2, splitList) # The creation of the url is not important right now
    # Let's scrape each url
    for url in urls:
        print url
        # First Step: get a web page via http request.
        response = requests.get (url)
        # Second step: use a parsing library to create a parsable object
        soup = Soup(response.text, "html.parser") # Create a soup object (Once)
        ### IMPORTANT: Parsing Starts and Ends Here ###
        ### - See `extract_player()`
        # Final Step: Given a soup object, mine player data
        player = extract_player (soup)
        # Add the new entry to the list
        all_players += [player]
    return all_players
# If this script is being run, not imported, run the `main()` function.
if __name__ == '__main__':
    all_players = main ()
    print all_players[0][0] # Player List -> Name
    print all_players[0][1] # Player List -> Table 1
    print all_players[0][2] # Player List -> Table 2
    print all_players[0][3] # Player List -> Table 3
    print all_players[0][3][0] # Player List -> Table 3 -> Columns
    print all_players[0][3][1] # Player List -> Table 3 -> All Rows
    print all_players[0][3][1][0] # Player List -> Table 3 -> All Rows -> Row 1
    print all_players[0][3][1][2] # Player List -> Table 3 -> All Rows -> Row 3
    print all_players[0][3][1][2][0] # Player List -> Table 3 -> All Rows -> Row 3 -> Column 1
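
The script above targets Python 2, where `urlencode` lives directly in `urllib`. On Python 3 the function is `urllib.parse.urlencode`. A minimal sketch of the URL-building step for Python 3 follows; the parameter values come from the script, while the names `BASE_URL`, `id_list`, and `split_list` are only illustrative.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.fangraphs.com/statsplits.aspx"


def generate_urls(id_list, split_list):
    """Build one URL per (player id, split) pair."""
    urls = []
    for player_id in id_list:
        for split in split_list:
            # urlencode() turns the dict into "key=value&key=value" with escaping
            query = urlencode(
                {"playerid": player_id, "position": "OF", "season": 0, "split": split}
            )
            urls.append(f"{BASE_URL}?{query}")
    return urls


urls = generate_urls(["15640", "9256"], [0.4, 0.2, 0.3, 0.4])
```

With `requests`, the same result can be had by passing the dictionary as `params=` to `requests.get`, which leaves the encoding to the library.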

I've updated the code, separated the functionality, and used lists instead of dictionaries (as requested). Everything from the # Output data comment onward is output testing and can be ignored.
I see now that you're making multiple requests (4) for the same player to gather more data on them. In the last answer I provided, the code only kept the last request made. Using a list eliminates this problem.
You may want to condense the list so that there is only one entry per player.
The core of the program is the loop under the # Scrape data comment.
Everything above the all_players declaration is a function that handles scraping.
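
Condensing the list could look roughly like the sketch below. It assumes each entry has the `[name, table, table, ...]` layout described in the `extract_player` docstring, with each table stored as `[columns, rows]`; `condense_players` and the sample data are hypothetical and written for Python 3.

```python
def condense_players(all_players):
    """Merge entries that share a player name into one [name, table, table, ...] list."""
    merged = {}  # dicts keep insertion order, so players stay in first-seen order
    for name, *tables in all_players:
        merged.setdefault(name, []).extend(tables)
    return [[name, *tables] for name, tables in merged.items()]


# Hypothetical miniature data; each table is [columns, rows].
sample = [
    ["Aaron Judge", [["Season"], [["2016"]]]],
    ["Aaron Judge", [["Season"], [["2017"]]]],
    ["A.J. Pollock", [["Season"], [["2013"]]]],
]
condensed = condense_players(sample)
```

Grouping by name assumes names are unique per player; keying on the player id from the request would be safer if two players share a name.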
UPDATED: scrape_players.py
from bs4 import BeautifulSoup as Soup
import requests
def http_get (id, split):
    """ Make a get request, return the response. """
    # Create url parameters dictionary
    payload = {'split': split, 'playerid': id, 'position': 'OF', 'season': 0}
    url = 'http://www.fangraphs.com/statsplits.aspx'
    return requests.get(url, params=payload) # Pass payload through `requests.get()`
def extract_player_name (soup):
    """ Extract the player name from the browser title """
    # Browser title contains player name, strip all but name
    player_name = repr(soup.title.text.strip('\r\n\t'))
    player_name = player_name.split(' \\xbb')[0] # Split on ` »`
    player_name = player_name[2:] # Erase the leading characters added by `repr`
    return player_name
def extract_table (table_id, soup):
    """ Extract data from a table, return the column headers and the table rows"""
    # SCRAPE: Get a table
    table_tag = soup.find("table", {"id" : 'SeasonSplits1_dgSeason%d_ctl00' % table_id})
    # SCRAPE: Extract table column headers
    columns = [th.text for th in table_tag.findAll('th')]
    rows = []
    # SCRAPE: Extract Table Contents
    for row in table_tag.tbody.findAll('tr'):
        rows.append ([col.text for col in row.findAll('td')]) # Gather all columns in the row
    # RETURN: [columns, rows]
    return [columns, rows]
def extract_player (soup):
    """ Extract player data and store in a list. ['name', [columns, rows], [table2]]"""
    player = []
    # player name is first in player list
    player.append (extract_player_name (soup))
    # Each table is a list entry
    for season in range(1,4):
        player.append(extract_table (season, soup))
    # Return the player list
    return player
# A list of all players
all_players = [
    #'playername',
    #[table_columns, table_rows],
    #[table_columns, table_rows],
    #[['Season', 'vs R as R'], [['2015', 'yes'], ['2016', 'no'], ['2017', 'no'],]],
]
# I don't know what these values are. Sorry!
idList2 = ['15640', '9256']
splitList=[0.4,0.2,0.3,0.4]
# Scrape data
for id in idList2:
    for split in splitList:
        response = http_get (id, split)
        soup = Soup(response.text, "html.parser") # Create a soup object (Once)
        all_players.append (extract_player (soup))
        # or all_players += [extract_player (soup)]
# Output data
def PrintPlayerAsTable (player, show_name=True):
    if show_name: print player[0] # First entry is the player name
    for table in player[1:]: # All other entries are tables
        PrintTableAsTable(table)
def PrintTableAsTable (table, table_sep='\n'):
    print table_sep
    PrintRowAsTable(table[0]) # The first item in the table is the column headers
    for row in table[1]: # The second item in the table is a list of rows
        PrintRowAsTable (row)
def PrintRowAsTable (row=[], prefix='\t'):
    """ Print out the list in a table format. """
    print prefix + ''.join([col.ljust(15) for col in row])
# There are 4 entries to every player, one for each request made
PrintPlayerAsTable (all_players[0])
PrintPlayerAsTable (all_players[1], False)
PrintPlayerAsTable (all_players[2], False)
PrintPlayerAsTable (all_players[3], False)
print '\n\nScraped %d player Statistics' % len(all_players)
for player in all_players:
    print '\t- %s' % player[0]
# 5th player entry (index 4)
print '\n\n'
print all_players[4][0] # Player name
print '\n'
#print all_players[4][1] # Table 1
print all_players[4][1][0] # Table 1 Column Headers
#print all_players[4][1][1] # Table 1 Rows
print all_players[4][1][1][1] # Table 1 Rows Row 2
print all_players[4][1][1][2] # Table 1 Rows Row 3
print all_players[4][1][1][-1] # Table 1 Rows Last Row
print '\n'
#print all_players[4][2] # Table 2
print all_players[4][2][0] # Table 2 Column Headers
#print all_players[4][2][1] # Table 2 Rows
print all_players[4][2][1][1] # Table 2 Rows Row 2
print all_players[4][2][1][2] # Table 2 Rows Row 3
print all_players[4][2][1][-1] # Table 2 Rows Last Row
print '\nTable 3'
PrintRowAsTable(all_players[4][2][0], '') # Table 3 Column Headers
PrintRowAsTable(all_players[4][2][1][1], '') # Table 3 Rows Row 1
PrintRowAsTable(all_players[4][2][1][2], '') # Table 3 Rows Row 2
PrintRowAsTable(all_players[4][2][1][-1], '') # Table 3 Rows Last Row
OUTPUT:
Outputs scraped data, so you can see how the all_players is structured.
Aaron Judge
Season vs R as R G AB PA H 1B 2B 3B HR R RBI BB IBB SO HBP SF SH GDP SB CS AVG
2016 vs R as R 27 69 77 14 8 2 0 4 8 10 6 0 32 1 1 0 2 0 0 .203
2017 vs R as R 66 198 231 65 34 10 2 19 37 42 31 3 71 2 0 0 8 3 0 .328
Total vs R as R 93 267 308 79 42 12 2 23 45 52 37 3 103 3 1 0 10 3 0 .296
Season vs R as R BB% K% BB/K AVG OBP SLG OPS ISO BABIP wRC wRAA wOBA wRC+
2016 vs R as R 7.8 % 41.6 % 0.19 .203 .273 .406 .679 .203 .294 7 -1.7 .291 79
2017 vs R as R 13.4 % 30.7 % 0.44 .328 .424 .687 1.111 .359 .426 54 26.1 .454 189
Total vs R as R 12.0 % 33.4 % 0.36 .296 .386 .614 1.001 .318 .394 62 24.4 .413 162
Season vs R as R GB/FB LD% GB% FB% IFFB% HR/FB IFH% BUH% Pull% Cent% Oppo% Soft% Med% Hard% Pitches Balls Strikes
2016 vs R as R 0.74 13.2 % 36.8 % 50.0 % 0.0 % 21.1 % 7.1 % 0.0 % 50.0 % 29.0 % 21.1 % 7.9 % 42.1 % 50.0 % 327 117 210
2017 vs R as R 1.14 27.6 % 38.6 % 33.9 % 2.3 % 44.2 % 6.1 % 0.0 % 45.7 % 26.8 % 27.6 % 11.0 % 39.4 % 49.6 % 985 395 590
Total vs R as R 1.02 24.2 % 38.2 % 37.6 % 1.6 % 37.1 % 6.3 % 0.0 % 46.7 % 27.3 % 26.1 % 10.3 % 40.0 % 49.7 % 1312 512 800
Season vs R as L G AB PA H 1B 2B 3B HR R RBI BB IBB SO HBP SF SH GDP SB CS AVG
2016 vs R as L 3 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 1 .000
2017 vs R as L 20 0 0 0 0 0 0 0 13 0 0 0 0 0 0 0 0 3 1 .000
Total vs R as L 23 0 0 0 0 0 0 0 15 0 0 0 0 0 0 0 0 3 2 .000
Season vs R as L BB% K% BB/K AVG OBP SLG OPS ISO BABIP wRC wRAA wOBA wRC+
2016 vs R as L 0.0 % 0.0 % 0.00 .000 .000 .000 .000 .000 .000 0 0.0 .000  
2017 vs R as L 0.0 % 0.0 % 0.00 .000 .000 .000 .000 .000 .000 0 0.0 .000  
Total vs R as L 0.0 % 0.0 % 0.00 .000 .000 .000 .000 .000 .000 0 0.0 .000  
Season vs R as L GB/FB LD% GB% FB% IFFB% HR/FB IFH% BUH% Pull% Cent% Oppo% Soft% Med% Hard% Pitches Balls Strikes
2016 vs R as L 0.00 0.0 % 0.0 % 0.0 % 0.0 % 0.0 % 0.0 % 0.0 %             0 0 0
2017 vs R as L 0.00 0.0 % 0.0 % 0.0 % 0.0 % 0.0 % 0.0 % 0.0 %             0 0 0
Total vs R as L 0.00 0.0 % 0.0 % 0.0 % 0.0 % 0.0 % 0.0 % 0.0 %             0 0 0
Season vs L as R G AB PA H 1B 2B 3B HR R RBI BB IBB SO HBP SF SH GDP SB CS AVG
2016 vs L as R 11 15 18 1 1 0 0 0 0 0 3 0 10 0 0 0 0 0 0 .067
2017 vs L as R 26 47 61 16 9 1 1 5 9 12 13 0 16 1 0 0 2 0 0 .340
Total vs L as R 37 62 79 17 10 1 1 5 9 12 16 0 26 1 0 0 2 0 0 .274
Season vs L as R BB% K% BB/K AVG OBP SLG OPS ISO BABIP wRC wRAA wOBA wRC+
2016 vs L as R 16.7 % 55.6 % 0.30 .067 .222 .067 .289 .000 .200 0 -2.3 .164 -8
2017 vs L as R 21.3 % 26.2 % 0.81 .340 .492 .723 1.215 .383 .423 17 9.1 .496 218
Total vs L as R 20.3 % 32.9 % 0.62 .274 .430 .565 .995 .290 .387 16 6.8 .421 166
Season vs L as R GB/FB LD% GB% FB% IFFB% HR/FB IFH% BUH% Pull% Cent% Oppo% Soft% Med% Hard% Pitches Balls Strikes
2016 vs L as R 0.33 20.0 % 20.0 % 60.0 % 0.0 % 0.0 % 0.0 % 0.0 % 20.0 % 60.0 % 20.0 % 20.0 % 40.0 % 40.0 % 81 32 49
2017 vs L as R 0.73 16.1 % 35.5 % 48.4 % 0.0 % 33.3 % 0.0 % 0.0 % 29.0 % 48.4 % 22.6 % 16.1 % 35.5 % 48.4 % 295 135 160
Total vs L as R 0.67 16.7 % 33.3 % 50.0 % 0.0 % 27.8 % 0.0 % 0.0 % 27.8 % 50.0 % 22.2 % 16.7 % 36.1 % 47.2 % 376 167 209
Season vs R as R G AB PA H 1B 2B 3B HR R RBI BB IBB SO HBP SF SH GDP SB CS AVG
2016 vs R as R 27 69 77 14 8 2 0 4 8 10 6 0 32 1 1 0 2 0 0 .203
2017 vs R as R 66 198 231 65 34 10 2 19 37 42 31 3 71 2 0 0 8 3 0 .328
Total vs R as R 93 267 308 79 42 12 2 23 45 52 37 3 103 3 1 0 10 3 0 .296
Season vs R as R BB% K% BB/K AVG OBP SLG OPS ISO BABIP wRC wRAA wOBA wRC+
2016 vs R as R 7.8 % 41.6 % 0.19 .203 .273 .406 .679 .203 .294 7 -1.7 .291 79
2017 vs R as R 13.4 % 30.7 % 0.44 .328 .424 .687 1.111 .359 .426 54 26.1 .454 189
Total vs R as R 12.0 % 33.4 % 0.36 .296 .386 .614 1.001 .318 .394 62 24.4 .413 162
Season vs R as R GB/FB LD% GB% FB% IFFB% HR/FB IFH% BUH% Pull% Cent% Oppo% Soft% Med% Hard% Pitches Balls Strikes
2016 vs R as R 0.74 13.2 % 36.8 % 50.0 % 0.0 % 21.1 % 7.1 % 0.0 % 50.0 % 29.0 % 21.1 % 7.9 % 42.1 % 50.0 % 327 117 210
2017 vs R as R 1.14 27.6 % 38.6 % 33.9 % 2.3 % 44.2 % 6.1 % 0.0 % 45.7 % 26.8 % 27.6 % 11.0 % 39.4 % 49.6 % 985 395 590
Total vs R as R 1.02 24.2 % 38.2 % 37.6 % 1.6 % 37.1 % 6.3 % 0.0 % 46.7 % 27.3 % 26.1 % 10.3 % 40.0 % 49.7 % 1312 512 800
Scraped 8 player Statistics
- Aaron Judge
- Aaron Judge
- Aaron Judge
- Aaron Judge
- A.J. Pollock
- A.J. Pollock
- A.J. Pollock
- A.J. Pollock
A.J. Pollock
[u'Season', u'vs R as R', u'G', u'AB', u'PA', u'H', u'1B', u'2B', u'3B', u'HR', u'R', u'RBI', u'BB', u'IBB', u'SO', u'HBP', u'SF', u'SH', u'GDP', u'SB', u'CS', u'AVG']
[u'2013', u'vs R as R', u'115', u'270', u'295', u'70', u'52', u'12', u'2', u'4', u'25', u'21', u'21', u'1', u'54', u'1', u'0', u'3', u'4', u'3', u'1', u'.259']
[u'2014', u'vs R as R', u'71', u'215', u'232', u'66', u'42', u'17', u'3', u'4', u'21', u'14', u'15', u'0', u'41', u'2', u'0', u'0', u'3', u'7', u'1', u'.307']
[u'Total', u'vs R as R', u'395', u'1120', u'1230', u'330', u'225', u'67', u'13', u'25', u'122', u'102', u'93', u'1', u'199', u'5', u'9', u'3', u'23', u'41', u'6', u'.295']
[u'Season', u'vs R as R', u'BB%', u'K%', u'BB/K', u'AVG', u'OBP', u'SLG', u'OPS', u'ISO', u'BABIP', u'wRC', u'wRAA', u'wOBA', u'wRC+']
[u'2013', u'vs R as R', u'7.1 %', u'18.3 %', u'0.39', u'.259', u'.315', u'.363', u'.678', u'.104', u'.311', u'29', u'-3.0', u'.301', u'84']
[u'2014', u'vs R as R', u'6.5 %', u'17.7 %', u'0.37', u'.307', u'.358', u'.470', u'.828', u'.163', u'.365', u'35', u'9.6', u'.364', u'128']
[u'Total', u'vs R as R', u'7.6 %', u'16.2 %', u'0.47', u'.295', u'.349', u'.445', u'.793', u'.150', u'.337', u'168', u'30.7', u'.345', u'113']
Table 3
Season vs R as R BB% K% BB/K AVG OBP SLG OPS ISO BABIP wRC wRAA wOBA wRC+
2013 vs R as R 7.1 % 18.3 % 0.39 .259 .315 .363 .678 .104 .311 29 -3.0 .301 84
2014 vs R as R 6.5 % 17.7 % 0.37 .307 .358 .470 .828 .163 .365 35 9.6 .364 128
Total vs R as R 7.6 % 16.2 % 0.47 .295 .349 .445 .793 .150 .337 168 30.7 .345 113
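
To get the lists the question asks for (row 1 of every table, then row 2 of every table, and so on), the rows of a player's tables can be grouped with `zip`. The sketch below assumes the `[name, [columns, rows], ...]` layout used by the script and Python 3; `rows_across_tables` and the shortened sample entry are hypothetical.

```python
def rows_across_tables(player):
    """Group row i of every table: [[t1 row 1, t2 row 1, ...], [t1 row 2, ...], ...]."""
    tables = player[1:]  # skip the player name
    row_lists = [rows for _columns, rows in tables]
    # zip(*...) stops at the shortest table; itertools.zip_longest would pad instead.
    return [list(group) for group in zip(*row_lists)]


# Hypothetical, shortened player entry with two tables of three rows each.
player = [
    "Aaron Judge",
    [["Season", "G"], [["2016", "27"], ["2017", "66"], ["Total", "93"]]],
    [["Season", "BB%"], [["2016", "7.8 %"], ["2017", "13.4 %"], ["Total", "12.0 %"]]],
]
grouped = rows_across_tables(player)
```

Because the number of rows varies per player, nothing here hard-codes a row count; the result simply has as many groups as the shortest table has rows.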
