Web scraping Tennis24 in play stats

Web scraping Tennis24 in play stats - python

I have been trying to work out how to scrape the live and updating statistics on Tennis 24 "https://www.tennis24.com/match/4xFaW6fP/#match-statistics;0" a page such as this but when I try to use selenium nothing is returned. even if I just try to return the 1 element such as
<div class="statText statText--awayValue">4</div>
Could someone please give me some pointers as this is my first scraping project?

To print the text 4 you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using XPATH and text attribute:
driver.get('https://www.tennis24.com/match/4xFaW6fP/#match-statistics;0')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[#class='statText statText--titleValue' and text()='Aces']//following::div"))).text)
Using XPATH and get_attribute('innerHTML'):
driver.get('https://www.tennis24.com/match/4xFaW6fP/#match-statistics;0')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[#class='statText statText--titleValue' and text()='Aces']//following::div"))).get_attribute('innerHTML'))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Related

TimeoutException while extracting the value of the title attribute using Selenium Python

I am trying to get the following count of an Instagram account. I believe that the XPATH is correct an exists. Here's a screenshot showing it exists when I search for it:
This is my code:
wait = WebDriverWait(driver, 30)
followers = wait.until(EC.presence_of_element_located((By.XPATH, "/html/body/div[1]/div/div/div/div[1]/div/div/div/div[1]/div[1]/section/main/div/ul/li[2]/button/div/span")))
print(followers.get_attribute("title"))
I have even looked at similar projects that find the following count and our code is almost exactly the same.

The desired element is a dynamic element, so to locate the element instead of presence_of_element_located() you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following locator strategy:
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(., 'followers')]//span[#class and #title and text()]"))).get_attribute("title"))
Note: You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

How to extract the value of the src attribute using Selenium

I have problems with something I don't even know the name.
I'm trying to reach the link next to where it says src= "LINK" with selenium.
I have class name = tWeCI, I guess I need to achieve using this but I have no idea how to do it.
Code trials:
from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Chrome()
browser.get("https://www.instagram.com/p/CYMHMEGBSRT/")
a = browser.find_element(by=By.CLASS_NAME, value="tWeCl")
Output:
https://instagram.fist4-1.fna.fbcdn.net/v/t50.2886-16/271301182_1082110899295465_6686845989868801609_n.mp4?efg=eyJ2ZW5jb2RlX3RhZyI6InZ0c192b2RfdXJsZ2VuLjcyMC5jbGlwcy5iYXNlbGluZSIsInFlX2dyb3VwcyI6IltcImlnX3dlYl9kZWxpdmVyeV92dHNfb3RmXCJdIn0&_nc_ht=instagram.fist4-1.fna.fbcdn.net&_nc_cat=101&_nc_ohc=6RJahYZ39DIAX_ul9E4&edm=AABBvjUBAAAA&vs=453007466354843_3354420889&_nc_vs=HBksFQAYJEdENjZLeERwbE1LVExOZ0RBRWtPNE5YLWNzeGNicV9FQUFBRhUAAsgBABUAGCRHS0ZaS1JCMldIUXZxRzhCQUJ4bFB2ZnVrbE1hYnFfRUFBQUYVAgLIAQAoABgAGwAVAAAmqN%2FFrpms4z8VAigCQzMsF0A%2BmZmZmZmaGBJkYXNoX2Jhc2VsaW5lXzFfdjERAHX%2BBwA%3D&_nc_rid=c8bb85b1e6&ccb=7-4&oe=62408107&oh=00_AT-E77LrFH7G6VVXGeh8bRFQsu95hhQlqUGZdinzFBIUYQ&_nc_sid=83d603"
Snapshot of the HTML:

To print the value of the src attribute you have to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
driver.get('https://www.instagram.com/p/CYMHMEGBSRT/')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "video.tWeCl"))).get_attribute("src"))
Using XPATH:
driver.get('https://www.instagram.com/p/CYMHMEGBSRT/')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//video[#class='tWeCl']"))).get_attribute("src"))
Console Output:
https://instagram.fpnq13-2.fna.fbcdn.net/v/t50.2886-16/271301182_1082110899295465_6686845989868801609_n.mp4?efg=eyJ2ZW5jb2RlX3RhZyI6InZ0c192b2RfdXJsZ2VuLjcyMC5jbGlwcy5iYXNlbGluZSIsInFlX2dyb3VwcyI6IltcImlnX3dlYl9kZWxpdmVyeV92dHNfb3RmXCJdIn0&_nc_ht=instagram.fpnq13-2.fna.fbcdn.net&_nc_cat=101&_nc_ohc=6RJahYZ39DIAX-Iphu9&edm=AABBvjUBAAAA&vs=453007466354843_3354420889&_nc_vs=HBksFQAYJEdENjZLeERwbE1LVExOZ0RBRWtPNE5YLWNzeGNicV9FQUFBRhUAAsgBABUAGCRHS0ZaS1JCMldIUXZxRzhCQUJ4bFB2ZnVrbE1hYnFfRUFBQUYVAgLIAQAoABgAGwAVAAAmqN%2FFrpms4z8VAigCQzMsF0A%2BmZmZmZmaGBJkYXNoX2Jhc2VsaW5lXzFfdjERAHX%2BBwA%3D&_nc_rid=34a08d5a01&ccb=7-4&oe=62408107&oh=00_AT9sh_BH__zjeReDB7lde4t3avzYqDjimTJRnoZi6Lj-TQ&_nc_sid=83d603
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

How can I select the image link from src html part using python selenium

I'm trying to get all the links of the images from the below mentioned html code type:
<img src="https://img.xyz.com/rcmshopping/PROD_IMAGES/fffff_2018_05_10_10_49_23.jpg">
I am not able to fetch any link from this part of html out f 3 any 1 is good for me. Im new to python please guide.

To print the value of the src attribute you have to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using XPATH:
print(WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[#class='elevatezoom-gallery']/img"))).get_attribute("src"))
Using CSS_SELECTOR:
print(WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a.elevatezoom-gallery>img"))).get_attribute("src"))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

You can achieve that by using the find_elements_by_css_selector() function.
links = self.webdriver.find_elements_by_css_selector('img[src="https://img.xyz.com/rcmshopping/PROD_IMAGES/fffff_2018_05_10_10_49_23.jpg"])

How to extract the text from the HTML using Selenium and Python

I have this HTML:
And I want get this text "rataoriginal". (This text changes, I need this part of the code as text)
I've tried
xpath = "//span[#class='_5h6Y_ _3Whw5 selectable-text invisible-space copyable-text']"
auxa = driver.find_element_by_xpath(xpath).text
print(auxa)
But it prints the same as print("\n"). I don't want to use beaultifulsoup for a while.
This HTML is from 'https://web.whatsapp.com'

The WebElement is a dynamic element, so to print the values you have to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.selectable-text.invisible-space.copyable-text[dir='auto']"))).text)
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[contains(#class, '') and contains(#class, 'invisible-space')][contains(#class, '') and #dir='auto']"))).text)
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
References
You can find a relevant discussion in:
How to retrieve the text of a WebElement using Selenium - Python

//*[contains(text(),"rataoriginal")] please use this xpath

How to extract the information of Total Confirmed from webpage with selenium webdriver in python

After clicking the selected country I want to get the number of "Total Confirmed" from the top of the website
Website:
https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6
Below is the code i have written:
poland = driver.find_element_by_xpath("//h5//span[contains(text(),'Poland')]")
poland.click()

To extract the number "1,862" clicking on Poland is not mandatory, instead you need to induce WebDriverWait for either visibility_of_element_located() or element_to_be_clickable() and you can use the following xpath based Locator Strategy:
Using XPATH and ``:
driver.get('https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h5//span[contains(text(),'Poland')]//preceding::span[2]/strong"))).text)
Console Output:
1,862
Using XPATH and ``:
driver.get('https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6')
print(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//h5//span[contains(text(),'Poland')]//preceding::span[2]/strong"))).text)
Console Output:
1,862
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Web scraping Tennis24 in play stats - python

Related

TimeoutException while extracting the value of the title attribute using Selenium Python

How to extract the value of the src attribute using Selenium

How can I select the image link from src html part using python selenium

How to extract the text from the HTML using Selenium and Python

How to extract the information of Total Confirmed from webpage with selenium webdriver in python

Categories

Resources