python - selenium scraping rotten tomatoes for audience score

I'm trying to scrape the audience score from Rotten Tomatoes. I was able to get the reviews, but I'm not sure how to use Selenium to get the "audiencescore" attribute.
Source:
<score-board
audiencestate="upright"
audiencescore="96"
class="scoreboard"
rating="R"
skeleton="panel"
tomatometerstate="certified-fresh"
tomatometerscore="92"
data-qa="score-panel"
>
<h1 slot="title" class="scoreboard__title" data-qa="score-panel-movie-title">Pulp Fiction</h1>
<p slot="info" class="scoreboard__info">1994, Crime/Drama, 2h 33m</p>
<a slot="critics-count" href="/m/pulp_fiction/reviews?intcmp=rt-scorecard_tomatometer-reviews" class="scoreboard__link scoreboard__link--tomatometer" data-qa="tomatometer-review-count">110 Reviews</a>
<a slot="audience-count" href="/m/pulp_fiction/reviews?type=user&intcmp=rt-scorecard_audience-score-reviews" class="scoreboard__link scoreboard__link--audience" data-qa="audience-rating-count">250,000+ Ratings</a>
<div slot="sponsorship" id="tomatometer_sponsorship_ad"></div>
</score-board>
Code:
from selenium import webdriver
driver = webdriver.Firefox()
url = 'https://www.rottentomatoes.com/m/pulp_fiction'
driver.get(url)
print(driver.find_element_by_css_selector('a[slot=audience-count]').text)

audiencescore is an attribute of the <score-board> element, not the value of a text node, so .text cannot return it. Select the element with a suitable locator and then call get_attribute() on it. The following expression works:
from selenium.webdriver.common.by import By
print(driver.find_element(By.CSS_SELECTOR, '#topSection score-board').get_attribute('audiencescore'))
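To make the attribute-versus-text distinction concrete without launching a browser, here is a minimal sketch that parses a trimmed copy of the question's markup with the standard library's html.parser instead of Selenium. The ScoreBoardParser class and the SNIPPET constant are illustrative names, not part of any answer above.

```python
from html.parser import HTMLParser

# Trimmed copy of the <score-board> markup from the question.
SNIPPET = '''<score-board audiencescore="96" class="scoreboard" tomatometerscore="92">
<h1 slot="title">Pulp Fiction</h1>
<a slot="audience-count">250,000+ Ratings</a>
</score-board>'''


class ScoreBoardParser(HTMLParser):
    """Collects the attributes of the <score-board> tag and all non-empty text nodes."""

    def __init__(self):
        super().__init__()
        self.attributes = {}
        self.text_nodes = []

    def handle_starttag(self, tag, attrs):
        # Attributes arrive as (name, value) pairs on the start tag.
        if tag == "score-board":
            self.attributes = dict(attrs)

    def handle_data(self, data):
        # Text nodes arrive separately; skip whitespace-only ones.
        if data.strip():
            self.text_nodes.append(data.strip())


parser = ScoreBoardParser()
parser.feed(SNIPPET)

print(parser.attributes["audiencescore"])  # the score lives in the attribute
print(parser.text_nodes)                   # the text nodes do not contain it
```

The score only shows up in the attribute dictionary, which is why a text-based lookup such as .text never returns it.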

You were close enough. To extract the value of the audiencescore attribute i.e. the text 96 ideally you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following locator strategies:
Using CSS_SELECTOR:
driver.get("https://www.rottentomatoes.com/m/pulp_fiction")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "score-board.scoreboard"))).get_attribute("audiencescore"))
Using XPATH:
driver.get("https://www.rottentomatoes.com/m/pulp_fiction")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//score-board[@class='scoreboard']"))).get_attribute("audiencescore"))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Console Output:
96
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python

Try this:
1- Get the score-board element
2- Get the audiencescore attribute from that element
from selenium.webdriver.common.by import By
audiencescore = driver.find_element(By.CSS_SELECTOR, 'score-board').get_attribute('audiencescore')

Related

Get Youtube video title using classname and text attribute using Selenium and Python

Hi I'm using Python Selenium Webdriver to get Youtube title but keep getting more info than I'd like.
The line is:
driver.find_element_by_class_name("style-scope ytd-video-primary-info-renderer").text
Is there any way to fix it and make it more efficient so that it displays only the title?
Here is the test script I'm using:
from selenium import webdriver as wd
from time import sleep as zz
driver = wd.Firefox(executable_path=r'./geckodriver.exe')
driver.get('https://www.youtube.com/watch?v=wma0szfIafk')
zz(4)
test_atr = driver.find_element_by_class_name("style-scope ytd-video-primary-info-renderer").text
print(test_atr)

To print the title text OBI-WAN KENOBI Official Trailer (2022) Teaser you can use either of the following Locator Strategies:
Using css_selector and get_attribute("innerHTML"):
print(driver.find_element(By.CSS_SELECTOR, "h1.title.style-scope.ytd-video-primary-info-renderer > yt-formatted-string.style-scope.ytd-video-primary-info-renderer").get_attribute("innerHTML"))
Using xpath and text attribute:
print(driver.find_element(By.XPATH, "//h1[@class='title style-scope ytd-video-primary-info-renderer']/yt-formatted-string[@class='style-scope ytd-video-primary-info-renderer']").text)
Ideally you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR and text attribute:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h1.title.style-scope.ytd-video-primary-info-renderer > yt-formatted-string.style-scope.ytd-video-primary-info-renderer"))).text)
Using XPATH and get_attribute("innerHTML"):
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h1[@class='title style-scope ytd-video-primary-info-renderer']/yt-formatted-string[@class='style-scope ytd-video-primary-info-renderer']"))).get_attribute("innerHTML"))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Console Output:
OBI-WAN KENOBI Official Trailer (2022) Teaser
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
References
Link to useful documentation:
get_attribute() method Gets the given attribute or property of the element.
text attribute returns The text of the element.
Difference between text and innerHTML using Selenium

Web scraping Tennis24 in play stats

I have been trying to work out how to scrape the live, updating statistics from a Tennis 24 page such as "https://www.tennis24.com/match/4xFaW6fP/#match-statistics;0", but when I use Selenium nothing is returned, even if I try to return just one element such as
<div class="statText statText--awayValue">4</div>
Could someone please give me some pointers as this is my first scraping project?

To print the text 4 you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using XPATH and text attribute:
driver.get('https://www.tennis24.com/match/4xFaW6fP/#match-statistics;0')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='statText statText--titleValue' and text()='Aces']//following::div"))).text)
Using XPATH and get_attribute('innerHTML'):
driver.get('https://www.tennis24.com/match/4xFaW6fP/#match-statistics;0')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='statText statText--titleValue' and text()='Aces']//following::div"))).get_attribute('innerHTML'))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
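For a first scraping project it may help to see what "waiting" means here. The sketch below is a simplified, Selenium-free illustration of the polling idea behind WebDriverWait: call a condition repeatedly until it returns something truthy or a timeout expires. The names wait_until and stat_is_rendered are hypothetical, and the real WebDriverWait does more (for example, it ignores NoSuchElementException between polls by default).

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Hypothetical stand-in for a page whose statistic only appears on the third poll,
# the way JavaScript-rendered content appears some time after driver.get() returns.
attempts = {"count": 0}


def stat_is_rendered():
    attempts["count"] += 1
    return "4" if attempts["count"] >= 3 else None


value = wait_until(stat_is_rendered, timeout=2.0, poll_interval=0.01)
print(value)
```

Looking the element up immediately after driver.get() corresponds to calling the condition only once, which is why nothing is found on a page that fills in its statistics later.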

How to retrieve partial text from a text node using Selenium and Python

I want to get only " text ... " not using .split() or index slicing
HTML:
<a class="call_recipe" href="/recipes/2913">
" text ... "
<strong> something~ </strong>
</a>

To print text ... you have to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR, firstChild and strip():
print(driver.execute_script('return arguments[0].firstChild.textContent;', WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a.call_recipe[href^='/recipes']")))).strip())
Using XPATH, get_attribute() and splitlines():
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[@class='call_recipe' and starts-with(@href, '/recipes')]"))).get_attribute("innerHTML").splitlines()[1])
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
References
You can find a couple of relevant detailed discussions in:
How to get specific text that belongs to div class
How to get text from textnodes seperated by whitespace using Selenium and Python
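The firstChild approach works because the wanted text is its own text node, sitting before the <strong> child. As a rough browser-free illustration, the sketch below parses similar markup with the standard library's xml.etree.ElementTree rather than Selenium. It assumes the quotes shown in the question are browser-devtools decoration and not part of the text, so they are left out of SNIPPET.

```python
import xml.etree.ElementTree as ET

# Markup modelled on the question (quotes around the text omitted by assumption).
SNIPPET = '''<a class="call_recipe" href="/recipes/2913">
 text ...
<strong> something~ </strong>
</a>'''

anchor = ET.fromstring(SNIPPET)

# Element.text holds only the text that precedes the first child element,
# which plays the same role as firstChild.textContent in the browser.
first_text_node = anchor.text.strip()
print(first_text_node)

# itertext() walks every descendant, so it also picks up the <strong> text,
# much like reading the text of the whole <a> element.
all_text = "".join(anchor.itertext())
print(all_text.split())
```

Reading only the first text node avoids post-processing with split() or index slicing, which is what the question asked for.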

driver.find_element_by_class_name("call_recipe").text
I think this is what you're after.
How to get text with selenium web driver in python

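You can use
driver.find_element(By.CLASS_NAME, "some_text").text
or, to match the element more precisely, an XPath:
driver.find_element(By.XPATH, "..").text
Note that in the Python bindings the element text is the .text property; getText() belongs to the Java bindings.
Hope this is helpful.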

How to grab the price information from flight reservation site https://reservations.airarabia.com

I'm very new to python and trying to learn webscraping. Following a tutorial, I'm trying to extract a price from a website but nothing is being printed. What is wrong with my code?
from selenium import webdriver
chrome_path = r"C:\webdrivers\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://reservations.airarabia.com/service-app/ibe/reservation.html#/fare/en/AED/AE/SHJ/KHI/07-09-2019/N/1/0/0/Y//N/N")
price = driver.find_elements_by_class_name("fare-and-services-flight-select-fare-value ng-isolate-scope")
for post in price:
    print(post.text)

To print the first price you have to induce WebDriverWait for the desired visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "isa-flight-select button:first-child span.fare-and-services-flight-select-fare-value.ng-isolate-scope"))).get_attribute("innerHTML"))
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//isa-flight-select//following::button[contains(@class, 'button')]//span[@class='fare-and-services-flight-select-fare-value ng-isolate-scope']"))).text)
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Console Output of two back to back execution:
475
You can find a relevant discussion in How to retrieve the title attribute through Selenium using Python?
Outro
As per the documentation:
get_attribute() method Gets the given attribute or property of the element.
text attribute returns The text of the element.
Difference between text and innerHTML using Selenium

The first reason is that the page you are trying to scrape uses JavaScript to build its HTML, so you need to wait until the element is present before reading it, using Selenium's WebDriverWait.
The second reason is that find_elements_by_class_name accepts only a single class name, so for an element with several classes you need a CSS selector or an XPath instead.
This is how your code should look:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
chrome_path = r"C:\webdrivers\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://reservations.airarabia.com/service-app/ibe/reservation.html#/fare/en/AED/AE/SHJ/KHI/07-09-2019/N/1/0/0/Y//N/N")
# Poll until the selector matches at least one element (an empty list is falsy).
price = WebDriverWait(driver, 10).until(
    lambda x: x.find_elements(By.CSS_SELECTOR, ".currency-value.fare-value.ng-scope.ng-isolate-scope"))
for post in price:
    print(post.get_attribute("innerText"))
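The second point, that a space-separated class attribute has to become a dotted CSS selector, can be sketched as a small helper. The function name class_names_to_css is hypothetical, and the sketch assumes plain class names that need no CSS escaping.

```python
def class_names_to_css(class_attribute, tag=""):
    """Turn a space-separated class attribute value into a CSS selector.

    Assumes the class names contain no characters that would need escaping.
    """
    classes = class_attribute.split()
    if not classes:
        raise ValueError("expected at least one class name")
    # Each class becomes ".name"; chaining them requires all classes at once.
    return tag + "".join("." + name for name in classes)


selector = class_names_to_css(
    "fare-and-services-flight-select-fare-value ng-isolate-scope", tag="span"
)
print(selector)
```

The resulting string can be passed with By.CSS_SELECTOR, whereas the original two-class string passed as a class name matches nothing.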

Python + Selenium: I can't get print text from div

Python + Selenium: I can't print the text from this div:
<div id="modal-content-18" class="modal-content" data-role="content">
<div>
SignUp Failed. Please Try Again.
</div>
</div>
I tried this:
resp = browser.find_element_by_class_name("modal-content").text
print resp
But it does not work.
Please help me.

I personally prefer XPath for cases like this, since it can handle many complex cases as well. Try the following:
from selenium.webdriver.common.by import By
resp = browser.find_element(By.XPATH, '//div[@class="modal-content"]/div').text
print(resp)
If the element isn't visible on the screen, .text returns an empty string. In that case, read the textContent attribute instead:
resp = browser.find_element(By.XPATH, '//div[@class="modal-content"]/div').get_attribute("textContent")
print(resp)
Let me know if it works for you. Also make sure there is only one modal-content element on the page. If there is more than one, your locator is insufficient to identify this element. To check, run the following:
l = len(browser.find_elements(By.XPATH, '//div[@class="modal-content"]/div'))
print(l)
If it prints a number greater than 1, the modal-content class alone isn't enough and you will need to expand on your selection criteria.
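The visible-text-first, textContent-second idea can be wrapped in one helper. This is only a sketch: visible_or_hidden_text is a hypothetical name, and FakeHiddenElement is a stand-in used so the example runs without a browser. A real Selenium WebElement exposes the same .text property and get_attribute() method.

```python
def visible_or_hidden_text(element):
    """Return element.text, falling back to the textContent attribute when it is empty."""
    text = element.text
    if text and text.strip():
        return text.strip()
    # Hidden elements have no rendered text, but textContent is still populated.
    content = element.get_attribute("textContent")
    return content.strip() if content else ""


class FakeHiddenElement:
    """Hypothetical stand-in for a WebElement that is not displayed."""

    text = ""

    def get_attribute(self, name):
        if name == "textContent":
            return "\n  SignUp Failed. Please Try Again.\n"
        return None


message = visible_or_hidden_text(FakeHiddenElement())
print(message)
```

With a real driver, the element returned by the XPath lookup would be passed in place of the fake object.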

Induce WebDriverWait with visibility_of_element_located() and use either of the following locator strategies.
Using CLASS_NAME:
print(WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.CLASS_NAME,"modal-content"))).text)
Using XPATH:
print(WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.XPATH,"//div[@class='modal-content' and @data-role='content']"))).text)
You need the following imports.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
EDITED
Check the textContent attribute value.
print(WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.CLASS_NAME,"modal-content"))).get_attribute("textContent"))
OR
print(WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.XPATH,"//div[@class='modal-content' and @data-role='content']"))).get_attribute("textContent"))

The desired text SignUp Failed. Please Try Again. is within the child <div> so you have to induce WebDriverWait for the desired visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.modal-content[id^='modal-content-'][data-role='content']>div"))).get_attribute("innerHTML"))
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='modal-content' and starts-with(@id, 'modal-content-')][@data-role='content']/div"))).text)
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in How to retrieve the title attribute through Selenium using Python?
Outro
As per the documentation:
get_attribute() method Gets the given attribute or property of the element.
text attribute returns The text of the element.
Difference between text and innerHTML using Selenium
