Trouble Scraping a loading score - python

I'm trying to scrape a score from a page. But i honestly can't get myself on the path of getting anywhere close. it's sandwich between a ::before after::. Googling that has led me to probably needing selenium? I've tried Beautiful Soup and Selenium but not getting anywhere. Below is the best(it didn't return an error) that i've gotten. But didn't return anything i can understand. [<selenium.webdriver.remote.webelement.WebElement (session="d902be37b19adc23f00bcaa20ecfc885", element="4064ab3c-7da4-4223-b8bb-c2fbb6590cbe")>]
from selenium import webdriver
from selenium.webdriver.common.by import By
PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
URL = "https://coolcatsnft.com/user/0x10eb84abd429fa4df8dcabbc7c2803822a5b82d9"
driver.get(URL)
search = driver.find_elements(By.CLASS_NAME,"sc-8ce97ff6-2.cgjlQk")
print(search)
driver.quit()

You can use the site's API to grab it using just the requests library. You just need to take the unique identifier from the end of the page url and append it to the API and then use json to extract the score.
import requests
import math
api = "https://prod.journey.coolcatsnft.com/v1/score/get/"
page = "https://coolcatsnft.com/user/0x10eb84abd429fa4df8dcabbc7c2803822a5b82d9"
request_url = api + page.split("/")[-1]
resp = requests.get(request_url)
data = resp.json()
score = data["result"]["overAllScore"]
print(score)
print(math.ceil(score))
output
115.5
116

You were close enough.
[<selenium.webdriver.remote.webelement.WebElement (session="d902be37b19adc23f00bcaa20ecfc885", element="4064ab3c-7da4-4223-b8bb-c2fbb6590cbe")>]
indicates the WebElement itself, where as you need to extract the text.
Solution
To extract the text 116 you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following locator strategies:
Using CSS_SELECTOR:
driver.get('https://coolcatsnft.com/user/0x10eb84abd429fa4df8dcabbc7c2803822a5b82d9')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[aria-label='Accept cookies']"))).click()
time.sleep(15)
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".sc-8ce97ff6-2.cgjlQk"))).text)
Using XPATH:
driver.get('https://coolcatsnft.com/user/0x10eb84abd429fa4df8dcabbc7c2803822a5b82d9')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[aria-label='Accept cookies']"))).click()
time.sleep(15)
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[#class='sc-8ce97ff6-2 cgjlQk' and text()]"))).text)
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python

Related

Python Selenium click login [duplicate]

I'm having trouble performing a click operation on a website. I'm getting a error message NoSuchElementException, but I'm not sure why because I got the class name from the site.
What am I missing?
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
s = Service('C:/Program Files (x86)/chromedriver.exe')
chromeOptions = Options()
chromeOptions.headless = False
driver = webdriver.Chrome(service=s, options=chromeOptions)
list_data = []
def initialize_browser():
driver.get("https://virtualracingschool.appspot.com/#/Home")
print("starting_Driver")
click_button = driver.find_element(By.CLASS_NAME, "white-text")
driver.implicitly_wait(15)
click_button.click()
initialize_browser()
Site & Code:
I tried referencing some documents from the selenium site and it mentions for a format
<p class="content">Site content goes here.</p>`
write the code:
content = driver.find_element(By.CLASS_NAME, 'content')`
I felt like I did this properly but my site has
<a class="white-text" style="" ...>
<span>Login</span>
</a>
format. Is the <a> and "style" element hindering my code?
Use the below XPath expression:
//span[text()='Login']
Your code should look like this:
click_button = driver.find_element(By.XPATH, "//span[text()='Login']")
Below is the inspect element by this XPath for your reference:
As per your code trials:
click_button = driver.find_element(By.CLASS_NAME, "white-text")
By.CLASS_NAME, "white-text" identifies four elements within the HTML DOM:
Hence you see the see the error.
Solution
To click on the clickable element you need to induce WebDriverWait for the element_to_be_clickable() and you can use either of the following locator strategies:
Using CSS_SELECTOR:
driver.get('https://virtualracingschool.appspot.com/#/Home')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.white-text > span"))).click()
Using XPATH:
driver.get('https://virtualracingschool.appspot.com/#/Home')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[#class='white-text']//span"))).click()
Note: You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Browser snapshot:

Dynamic Web scraping using Selenium

I am trying to scrape the tables from the below dynamic webpage. I am using the below code to find the data in tables (they are under tag name tr). But I am getting empty list as output. Is there anything that I am missing here?
https://www.taipower.com.tw/tc/page.aspx?mid=206&cid=406&cchk=b6134cc6-838c-4bb9-b77a-0b0094afd49d
from selenium import webdriver
chrome_path = r"C:\Users\upko\Downloads\My Projects\Ibrahim Projects\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://www.taipower.com.tw/tc/page.aspx?mid=206&cid=406&cchk=b6134cc6-838c-4bb9-b77a-0b0094afd49d")
driver.find_elements_by_tag_name('tr')
Website have iframes, you need switch into desired iframe to access data. Didnt tested code, but should work
iframe = driver.find_element_by_xpath("//iframe[#id='IframeId']")
driver.switch_to_frame(iframe)
#Now you can get data
trs = driver.find_elements_by_tag_name('tr')
The desired elements are within an <iframe> so you have to:
Induce WebDriverWait for the desired frame to be available and switch to it.
Induce WebDriverWait for the desired visibility_of_all_elements_located.
You can use either of the following Locator Strategies:
Using XPATH:
driver.get("https://www.taipower.com.tw/tc/page.aspx?mid=206&cid=406&cchk=b6134cc6-838c-4bb9-b77a-0b0094afd49d")
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"//iframe[#id='IframeId']")))
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[#class='container-fluid']//div[#class='span6']/strong")))])
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Console Output:
['核能(Nuclear)', '燃煤(Coal)', '汽電共生(Co-Gen)', '民營電廠-燃煤(IPP-Coal)', '燃氣(LNG)', '民營電廠-燃氣(IPP-LNG)', ' 燃油(Oil)', '輕油(Diesel)', '水力(Hydro)', '風力(Wind)', '太陽能(Solar)', '抽蓄發電(Pumping Gen)']
Reference
You can find a couple of relevant discussions in:
Switch to an iframe through Selenium and python

Scrape amazon url image/picture in python with selenium

I just need help for scrape Amazon url of image/picture on product page (first image, big size in screen), in python with selenium.
For example, this product:
https://www.amazon.fr/dp/B07CG3HFPV/ref=cm_sw_r_fm_api_glt_i_2RB9QBPTQXWJ7PQQ16MZ?_encoding=UTF8&psc=1
Here is the part of source code web page:
I need to scrape url image with tag "src".
Anyone know how to scrape this please?
Actually, I have this script part, but don't work:
url = https://www.amazon.fr/dp/B07CG3HFPV/ref=cm_sw_r_fm_api_glt_i_2RB9QBPTQXWJ7PQQ16MZ?_encoding=UTF8&psc=1
options = Options()
options.headless = True
driver = webdriver.Chrome(options=options)
driver.get(url)
import time
time.sleep(2)
actions = ActionChains(driver)
link_img = driver.find_element_by_tag_name("img").get_attribute("src")
Thanks for help
To scrape the amazon url of image/picture on product page (first image, big size in screen), in python with selenium you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.a-list-item>span.a-declarative>div.imgTagWrapper>img.a-dynamic-image"))).get_attribute("src"))
Using XPATH:
print(WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[#class='a-list-item']/span[#class='a-declarative']/div[#class='imgTagWrapper']/img[#class='a-dynamic-image']"))).get_attribute("src"))
Note: You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Anomaly (cannot find element) in only one out of several similar tripadvisor pages

Note: This code requires human intervention in between as it is incomplete, and should thus only be run with Jupyter. I am trying to get the last page number of a tripadvisor webpage.
The "Malaysia" and "Switzerland" webpages works fine (urls commented out below) but not the "Hong Kong" one.
from selenium import webdriver #for navigating through the pages
driver = webdriver.Chrome(executable_path=r'C:\\Users\\user\\Downloads\\chromedriver.exe')
url = "https://www.tripadvisor.com.sg/Hotels-g294217-Hong_Kong-Hotels.html"
#url = "https://www.tripadvisor.com.sg/Hotels-g293951-Malaysia-Hotels.html"
#url = "https://www.tripadvisor.com.sg/Hotels-g188045-Switzerland-Hotels.html"
driver.get(url)
driver.implicitly_wait(5)
Human intervention here: Now click on some arbitrary "Check in date", "Check out date" and then click "Update"
last_page_s = driver.find_element_by_css_selector("span.pageNum.last").get_attribute('data-page-number')
last_page = int(last_page_s)
print(last_page)
I'm still a newbie with webscraping so any help is greatly appreciated!!
To print the last_page number you have to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.separator.cx_brand_refresh_phase2 +a"))).get_attribute('data-page-number'))
Using XPATH`:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[contains(#class, 'separator')]//following::a"))).get_attribute('data-page-number'))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

How to grab the price information from flight reservation site https://reservations.airarabia.com

I'm very new to python and trying to learn webscraping. Following a tutorial, I'm trying to extract a price from a website but nothing is being printed. What is wrong with my code?
from selenium import webdriver
chrome_path = r"C:\webdrivers\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://reservations.airarabia.com/service-app/ibe/reservation.html#/fare/en/AED/AE/SHJ/KHI/07-09-2019/N/1/0/0/Y//N/N")
price = driver.find_elements_by_class_name("fare-and-services-flight-select-fare-value ng-isolate-scope")
for post in price:
print(post.text)
To print the first title you have to induce WebDriverWait for the desired visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "isa-flight-select button:first-child span.fare-and-services-flight-select-fare-value.ng-isolate-scope"))).get_attribute("innerHTML"))
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//isa-flight-select//following::button[contains(#class, 'button')]//span[#class='fare-and-services-flight-select-fare-value ng-isolate-scope']"))).text)
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Console Output of two back to back execution:
475
You can find a relevant discussion in How to retrieve the title attribute through Selenium using Python?
Outro
As per the documentation:
get_attribute() method Gets the given attribute or property of the element.
text attribute returns The text of the element.
Difference between text and innerHTML using Selenium
The first reason for that is because the webpage you are trying to scrape uses javascript to load the HTML so you will need to wait until that element is present to get it using selenium's WebDriverWait
The second reason is that the find_elements_by_class_name method only accepts one class so you would need to either use find_elements_by_css_selector or find_elements_by_xpath
this is how your code should look
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
chrome_path = r"C:\webdrivers\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://reservations.airarabia.com/service-app/ibe/reservation.html#/fare/en/AED/AE/SHJ/KHI/07-09-2019/N/1/0/0/Y//N/N")
price = WebDriverWait(driver, 10).until(
lambda x: x.find_elements_by_css_selector(".currency-value.fare-value.ng-scope.ng-isolate-scope"))
for post in price:
print(post.get_attribute("innerText"))

Categories

Resources