Web scraping Tennis24 in play stats - python

I have been trying to work out how to scrape the live and updating statistics on Tennis 24 "https://www.tennis24.com/match/4xFaW6fP/#match-statistics;0" a page such as this but when I try to use selenium nothing is returned. even if I just try to return the 1 element such as
<div class="statText statText--awayValue">4</div>
Could someone please give me some pointers as this is my first scraping project?

To print the text 4 you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using XPATH and text attribute:
driver.get('https://www.tennis24.com/match/4xFaW6fP/#match-statistics;0')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[#class='statText statText--titleValue' and text()='Aces']//following::div"))).text)
Using XPATH and get_attribute('innerHTML'):
driver.get('https://www.tennis24.com/match/4xFaW6fP/#match-statistics;0')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[#class='statText statText--titleValue' and text()='Aces']//following::div"))).get_attribute('innerHTML'))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Related

TimeoutException while extracting the value of the title attribute using Selenium Python

I am trying to get the following count of an Instagram account. I believe that the XPATH is correct an exists. Here's a screenshot showing it exists when I search for it:
This is my code:
wait = WebDriverWait(driver, 30)
followers = wait.until(EC.presence_of_element_located((By.XPATH, "/html/body/div[1]/div/div/div/div[1]/div/div/div/div[1]/div[1]/section/main/div/ul/li[2]/button/div/span")))
print(followers.get_attribute("title"))
I have even looked at similar projects that find the following count and our code is almost exactly the same.
The desired element is a dynamic element, so to locate the element instead of presence_of_element_located() you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following locator strategy:
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(., 'followers')]//span[#class and #title and text()]"))).get_attribute("title"))
Note: You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

How to extract the value of the src attribute using Selenium

I have problems with something I don't even know the name.
I'm trying to reach the link next to where it says src= "LINK" with selenium.
I have class name = tWeCI, I guess I need to achieve using this but I have no idea how to do it.
Code trials:
from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Chrome()
browser.get("https://www.instagram.com/p/CYMHMEGBSRT/")
a = browser.find_element(by=By.CLASS_NAME, value="tWeCl")
Output:
https://instagram.fist4-1.fna.fbcdn.net/v/t50.2886-16/271301182_1082110899295465_6686845989868801609_n.mp4?efg=eyJ2ZW5jb2RlX3RhZyI6InZ0c192b2RfdXJsZ2VuLjcyMC5jbGlwcy5iYXNlbGluZSIsInFlX2dyb3VwcyI6IltcImlnX3dlYl9kZWxpdmVyeV92dHNfb3RmXCJdIn0&_nc_ht=instagram.fist4-1.fna.fbcdn.net&_nc_cat=101&_nc_ohc=6RJahYZ39DIAX_ul9E4&edm=AABBvjUBAAAA&vs=453007466354843_3354420889&_nc_vs=HBksFQAYJEdENjZLeERwbE1LVExOZ0RBRWtPNE5YLWNzeGNicV9FQUFBRhUAAsgBABUAGCRHS0ZaS1JCMldIUXZxRzhCQUJ4bFB2ZnVrbE1hYnFfRUFBQUYVAgLIAQAoABgAGwAVAAAmqN%2FFrpms4z8VAigCQzMsF0A%2BmZmZmZmaGBJkYXNoX2Jhc2VsaW5lXzFfdjERAHX%2BBwA%3D&_nc_rid=c8bb85b1e6&ccb=7-4&oe=62408107&oh=00_AT-E77LrFH7G6VVXGeh8bRFQsu95hhQlqUGZdinzFBIUYQ&_nc_sid=83d603"
Snapshot of the HTML:
To print the value of the src attribute you have to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
driver.get('https://www.instagram.com/p/CYMHMEGBSRT/')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "video.tWeCl"))).get_attribute("src"))
Using XPATH:
driver.get('https://www.instagram.com/p/CYMHMEGBSRT/')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//video[#class='tWeCl']"))).get_attribute("src"))
Console Output:
https://instagram.fpnq13-2.fna.fbcdn.net/v/t50.2886-16/271301182_1082110899295465_6686845989868801609_n.mp4?efg=eyJ2ZW5jb2RlX3RhZyI6InZ0c192b2RfdXJsZ2VuLjcyMC5jbGlwcy5iYXNlbGluZSIsInFlX2dyb3VwcyI6IltcImlnX3dlYl9kZWxpdmVyeV92dHNfb3RmXCJdIn0&_nc_ht=instagram.fpnq13-2.fna.fbcdn.net&_nc_cat=101&_nc_ohc=6RJahYZ39DIAX-Iphu9&edm=AABBvjUBAAAA&vs=453007466354843_3354420889&_nc_vs=HBksFQAYJEdENjZLeERwbE1LVExOZ0RBRWtPNE5YLWNzeGNicV9FQUFBRhUAAsgBABUAGCRHS0ZaS1JCMldIUXZxRzhCQUJ4bFB2ZnVrbE1hYnFfRUFBQUYVAgLIAQAoABgAGwAVAAAmqN%2FFrpms4z8VAigCQzMsF0A%2BmZmZmZmaGBJkYXNoX2Jhc2VsaW5lXzFfdjERAHX%2BBwA%3D&_nc_rid=34a08d5a01&ccb=7-4&oe=62408107&oh=00_AT9sh_BH__zjeReDB7lde4t3avzYqDjimTJRnoZi6Lj-TQ&_nc_sid=83d603
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

How can I select the image link from src html part using python selenium

I'm trying to get all the links of the images from the below mentioned html code type:
<img src="https://img.xyz.com/rcmshopping/PROD_IMAGES/fffff_2018_05_10_10_49_23.jpg">
I am not able to fetch any link from this part of html out f 3 any 1 is good for me. Im new to python please guide.
To print the value of the src attribute you have to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using XPATH:
print(WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[#class='elevatezoom-gallery']/img"))).get_attribute("src"))
Using CSS_SELECTOR:
print(WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a.elevatezoom-gallery>img"))).get_attribute("src"))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can achieve that by using the find_elements_by_css_selector() function.
links = self.webdriver.find_elements_by_css_selector('img[src="https://img.xyz.com/rcmshopping/PROD_IMAGES/fffff_2018_05_10_10_49_23.jpg"])

How to extract the text from the HTML using Selenium and Python

I have this HTML:
And I want get this text "rataoriginal". (This text changes, I need this part of the code as text)
I've tried
xpath = "//span[#class='_5h6Y_ _3Whw5 selectable-text invisible-space copyable-text']"
auxa = driver.find_element_by_xpath(xpath).text
print(auxa)
But it prints the same as print("\n"). I don't want to use beaultifulsoup for a while.
This HTML is from 'https://web.whatsapp.com'
The WebElement is a dynamic element, so to print the values you have to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.selectable-text.invisible-space.copyable-text[dir='auto']"))).text)
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[contains(#class, '') and contains(#class, 'invisible-space')][contains(#class, '') and #dir='auto']"))).text)
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
References
You can find a relevant discussion in:
How to retrieve the text of a WebElement using Selenium - Python
//*[contains(text(),"rataoriginal")] please use this xpath

How to extract the information of Total Confirmed from webpage with selenium webdriver in python

After clicking the selected country I want to get the number of "Total Confirmed" from the top of the website
Website:
https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6
Below is the code i have written:
poland = driver.find_element_by_xpath("//h5//span[contains(text(),'Poland')]")
poland.click()
To extract the number "1,862" clicking on Poland is not mandatory, instead you need to induce WebDriverWait for either visibility_of_element_located() or element_to_be_clickable() and you can use the following xpath based Locator Strategy:
Using XPATH and ``:
driver.get('https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h5//span[contains(text(),'Poland')]//preceding::span[2]/strong"))).text)
Console Output:
1,862
Using XPATH and ``:
driver.get('https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6')
print(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//h5//span[contains(text(),'Poland')]//preceding::span[2]/strong"))).text)
Console Output:
1,862
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Categories

Resources