web scraping infogol.net attributerror with beautiful soup

web scraping infogol.net attributerror with beautiful soup - python

I am trying to scrape the information for ajax matches from infogol. When I inspect the webpage I find that the table class = 'teamstats-summary-matches ng-scope' but whenI try this i find nothing. So far I have come up with the following code:
import requests
from bs4 import BeautifulSoup
# Set the URL of the webpage you want to scrape
url = 'https://www.infogol.net/en/team/ajax/62'
# Make a request to the webpage
response = requests.get(url)
# Parse the HTML of the webpage
soup = BeautifulSoup(response.text, 'html.parser')
# Find the table containing the data
table = soup.find('table', class_='teamstats-summary-matches ng-scope')
if not table:
print('Cannot find table')

Check that you have found what you are expecting before proceeding
# Find the table containing the data
table = soup.find('table', class_='stats-table')
if not table:
print('Cannot find table')
sys.exit(1)

Related

Beautiful Soup parsing table from react html

Im trying to parse table with orders from html page.
Here the html:
HTML PIC
I need to get data from those table rows,
Here what i tried to do:
response = requests.get('https://partner.market.yandex.ru/supplier/23309133/fulfillment/orders', headers=headers)
soup = BeautifulSoup(response.text, 'lxml')
q = soup.findAll('tr')
a = soup.find('tr')
print(q)
print(a)
But it gives me None. So any idea how to get into those table rows?
I tried to iterate over each div in html... once i get closer to div which contains those tables it give me None as well.
Appreciate any help

Aight. I found a solution by using selenium instead of requests lib.
I don't have any idea why it doesn't work with requests lib since it's doing the same thing as selenium (just sending an get request). But, with the selenium it works.
So here is what I do:
driver = webdriver.Chrome(r"C:\Users\Booking\PycharmProjects\britishairways\chromedriver.exe")
driver.get('https://www.britishairways.com/travel/managebooking/public/ru_ru')
time.sleep(15) # make an authorization
res = driver.page_source
print(res)
soup = BeautifulSoup(res, 'lxml')
b = soup.find_all('tr')

Python Beautiful Soup not pulling all the data

I'm currently looking to pull specific issuer data from URL html with a specific class and ID from the Luxembourg Stock Exchange using Beautiful Soup.
The example link I'm using is here: https://www.bourse.lu/security/XS1338503920/234821
And the data I'm trying to pull is the name under 'Issuer' stored as text; in this case it's 'BNP Paribas Issuance BV'.
I've tried using the class vignette-description-content-text, but it can't seem to find any data, as when looking through the soup, not all of the html is being pulled.
I've found that my current code only pulls some of the html, and I don't know how to expand the data it's pulling.
import requests
from bs4 import BeautifulSoup
URL = "https://www.bourse.lu/security/XS1338503920/234821"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='ResultsContainer', class_="vignette-description-content-text")
I have found similar problems and followed guides shown in link 1, link 2 and link 3, but the example html used seems very different to the webpage I'm looking to scrape.
Is there something I'm missing to pull and scrape the data?

Based on your code, I suspect you are trying to get element which has class=vignette-description-content-text and id=ResultsContaine.
The class_ is correct way to use ,but not with the id
Try this:
import requests
from bs4 import BeautifulSoup
URL = "https://www.bourse.lu/security/XS1338503920/234821"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
def applyFilter(element):
if element.has_attr('id') and element.has_attr('class'):
if "vignette-description-content-text" in element['class'] and element['id'] == "ResultsContainer":
return True
results = soup.find_all(applyFilter)
for result in results:
#Each result is an element here

scraping data from Chicago Mercantile Exchange website

I am trying to scrape data from table of CME website. Specifically I want to pull the data of Open interest for every future currency. but when I try parse the table it gives me nothing.
Link from which I am trying to scrape the data given below is the code through which I am trying to do it.
from bs4 import BeautifulSoup
import requests
url="https://www.cmegroup.com/market-data/volume-open-interest/fx-volume.html"
# Make a GET request to fetch the raw HTML content
html_content = requests.get(url).text
# Parse the html content
soup = BeautifulSoup(html_content)
table = soup.find("table", attrs={"class": "cmeData voiDataset"})
print(table)

Table data comes from another HTML doc that you can get with
from bs4 import BeautifulSoup
import requests
url = 'https://www.cmegroup.com/CmeWS/mvc/xsltTransformer.do?xlstDoc=/XSLT/md/voi/voi_asset_class_final.xsl&url=/da/VOI/V2/Totals/TradeDate/20201116/AssetClassId/3/ReportType/F?excluded=CEE,CEU,KCB&hidelinks=false&html='
# Make a GET request to fetch the raw HTML content
html_content = requests.get(url).text
# Parse the html content
soup = BeautifulSoup(html_content)
table = soup.find("table", attrs={"class": "cmeData voiDataset"})
print(table)
To get data for specific date you can change URL as below
# Date for 2020 November 12
date = '20201112'
url = 'https://www.cmegroup.com/CmeWS/mvc/xsltTransformer.do?xlstDoc=/XSLT/md/voi/voi_asset_class_final.xsl&url=/da/VOI/V2/Totals/TradeDate/{}/AssetClassId/3/ReportType/F?excluded=CEE,CEU,KCB&hidelinks=false&html='.format(date)

Extract data from BSE website

How can I extract the value of Security ID, Security Code, Group / Index, Wtd.Avg Price, Trade Date, Quantity Traded, % of Deliverable Quantity to Traded Quantity using Python 3 and save it to an XLS file. Below is the link.
https://www.bseindia.com/stock-share-price/smartlink-network-systems-ltd/smartlink/532419/
PS: I am completely new to the python. I know there are few libs which make scrapping easier like BeautifulSoup, selenium, requests, lxml etc. Don't have much idea about them.
Edit 1:
I tried something
from bs4 import BeautifulSoup
import requests
URL = 'https://www.bseindia.com/stock-share-price/smartlink-network-systems-ltd/smartlink/532419/'
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')
table = soup.find('div', attrs = {'id':'newheaddivgrey'})
print(table)
Its output is None. I was expecting all tables in the webpage and filter them further to get required data.
import requests
import lxml.html
URL = 'https://www.bseindia.com/stock-share-price/smartlink-network-systems-ltd/smartlink/532419/'
r = requests.get(URL)
root = lxml.html.fromstring(r.content)
title = root.xpath('//*[#id="SecuritywiseDeliveryPosition"]/table/tbody/tr/td/table/tbody/tr[1]/td')
print(title)
Tried another code. Same problem.
Edit 2:
Tried selenium. But I am not getting the table contents.
from selenium import webdriver
driver = webdriver.Chrome(r"C:\Program Files\JetBrains\PyCharm Community Edition 2017.3.3\bin\chromedriver.exe")
driver.get('https://www.bseindia.com/stock-share-price/smartlink-network-systems-ltd/smartlink/532419/')
table=driver.find_elements_by_xpath('//*[#id="SecuritywiseDeliveryPosition"]/table/tbody/tr/td/table/tbody/tr[1]/td')
print(table)
driver.quit()
Output is [<selenium.webdriver.remote.webelement.WebElement (session="befdd4f01e6152942c9cfc7c563a6bf2", element="0.13124528538297953-1")>]

After loading the page with Selenium, you can get the Javascript modified page source using driver.page_source. You can then pass this page source in the BeautifulSoup object.
driver = webdriver.Chrome()
driver.get('https://www.bseindia.com/stock-share-price/smartlink-network-systems-ltd/smartlink/532419/')
html = driver.page_source
driver.quit()
soup = BeautifulSoup(html, 'lxml')
table = soup.find('div', id='SecuritywiseDeliveryPosition')
This code will give you the Securitywise Delivery Position table in the table variable. You can then parse this BeautifulSoup object to get the different values you want.
The soup object contains the full page source including the elements that were dynamically added. Now, you can parse this to get all the things you mentioned.

Scraping a table with beautiful soup

I'm trying to scrape the price table (buy yes, prices and contracts available) from this site: https://www.predictit.org/Contract/7069/Will-the-Senate-pass-the-Better-Care-Reconciliation-Act-by-July-31#prices.
This is my (obviously very preliminary) code, structured now just to find the table:
from bs4 import BeautifulSoup
import requests
from lxml import html
import json, re
url = "https://www.predictit.org/Contract/7069/Will-the-Senate-pass-the-Better-Care-Reconciliation-Act-by-July-31#prices"
ret = requests.get(url).text
soup = BeautifulSoup(ret, "lxml")
try:
table = soup.find('table')
print table
except AttributeError as e:
print 'No tables found, exiting'
The code finds and parses a table; however, it's the wrong one (the data table on a different tab https://www.predictit.org/Contract/7069/Will-the-Senate-pass-the-Better-Care-Reconciliation-Act-by-July-31#data).
How do I resolve this error to ensure the code identifies the correct table?

As #downshift mentioned in the comments the table is js generated using xhr request.
So you can either use Selenium or make a direct request to the site's api.
Using the 2nd option:
url = "https://www.predictit.org/PrivateData/GetPriceListAjax?contractId=7069"
ret = requests.get(url).text
soup = BeautifulSoup(ret, "lxml")
table = soup.find('table')

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

web scraping infogol.net attributerror with beautiful soup - python

Check that you have found what you are expecting before proceeding # Find the table containing the data table = soup.find('table', class_='stats-table') if not table: print('Cannot find table') sys.exit(1)

Related

Beautiful Soup parsing table from react html

Python Beautiful Soup not pulling all the data

scraping data from Chicago Mercantile Exchange website

Extract data from BSE website

Scraping a table with beautiful soup

Categories

Resources