Scrape amazon url image/picture in python with selenium

I need help scraping the URL of the image on an Amazon product page (the first image, shown large on screen) in Python with Selenium.
For example, this product:
https://www.amazon.fr/dp/B07CG3HFPV/ref=cm_sw_r_fm_api_glt_i_2RB9QBPTQXWJ7PQQ16MZ?_encoding=UTF8&psc=1
Here is the relevant part of the page source:
I need to scrape the image URL from the "src" attribute.
Does anyone know how to scrape this?
So far I have this script, but it doesn't work:
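import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
url = "https://www.amazon.fr/dp/B07CG3HFPV/ref=cm_sw_r_fm_api_glt_i_2RB9QBPTQXWJ7PQQ16MZ?_encoding=UTF8&psc=1"
options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
driver.get(url)
time.sleep(2)
actions = ActionChains(driver)
# Returns the src of the first <img> in the document, not necessarily the product image
link_img = driver.find_element(By.TAG_NAME, "img").get_attribute("src")
Thanks for your help.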

To extract the URL of the main product image (the first, large image) with Selenium in Python, you need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.a-list-item>span.a-declarative>div.imgTagWrapper>img.a-dynamic-image"))).get_attribute("src"))
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[@class='a-list-item']/span[@class='a-declarative']/div[@class='imgTagWrapper']/img[@class='a-dynamic-image']"))).get_attribute("src"))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
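Since the question asks for the big version of the first image, a variant that reads every size Amazon offers and picks the largest may help. This is a minimal sketch under two assumptions not confirmed by the answer above: that the main image is rendered as img#landingImage, and that it carries a data-a-dynamic-image attribute holding a JSON object that maps each image URL to [width, height]. Amazon's markup changes often, so verify both in the browser inspector first. The helper names are hypothetical.

```python
import json


def largest_image_url(dynamic_image_attr: str) -> str:
    """Return the URL with the largest pixel area from a data-a-dynamic-image value.

    Assumption: the attribute is a JSON object such as
    {"https://.../small.jpg": [300, 300], "https://.../big.jpg": [1500, 1500]}.
    """
    candidates = json.loads(dynamic_image_attr)
    if not candidates:
        raise ValueError("no image candidates in attribute")
    # Iterating a dict yields its keys (the URLs); rank them by width * height.
    return max(candidates, key=lambda url: candidates[url][0] * candidates[url][1])


def get_main_image_url(driver, timeout: int = 20) -> str:
    """Wait for the main product image and return its largest available URL."""
    # Selenium is imported here so largest_image_url() works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumed locator for the main image; adjust if Amazon's markup differs.
    img = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, "img#landingImage"))
    )
    dynamic = img.get_attribute("data-a-dynamic-image")
    # Fall back to the plain src attribute when the JSON attribute is absent.
    return largest_image_url(dynamic) if dynamic else img.get_attribute("src")
```

Usage, with a driver created as in the question: print(get_main_image_url(driver)). Ranking by pixel area rather than trusting src means the result does not depend on which size variant the page happened to load first.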

Related

Trouble Scraping a loading score

I'm trying to scrape a score from a page, but I honestly can't get anywhere close. The value is sandwiched between ::before and ::after pseudo-elements, and Googling that suggested I probably need Selenium. I've tried both Beautiful Soup and Selenium without success. Below is my best attempt so far (it didn't raise an error), but it printed something I can't make sense of:
[<selenium.webdriver.remote.webelement.WebElement (session="d902be37b19adc23f00bcaa20ecfc885", element="4064ab3c-7da4-4223-b8bb-c2fbb6590cbe")>]
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
PATH = r"C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(service=Service(PATH))
URL = "https://coolcatsnft.com/user/0x10eb84abd429fa4df8dcabbc7c2803822a5b82d9"
driver.get(URL)
search = driver.find_elements(By.CLASS_NAME,"sc-8ce97ff6-2.cgjlQk")
print(search)
driver.quit()

You can use the site's API to grab the score with just the requests library. Take the unique identifier from the end of the page URL, append it to the API endpoint, and read the score from the JSON response.
import requests
import math
api = "https://prod.journey.coolcatsnft.com/v1/score/get/"
page = "https://coolcatsnft.com/user/0x10eb84abd429fa4df8dcabbc7c2803822a5b82d9"
request_url = api + page.split("/")[-1]
resp = requests.get(request_url)
data = resp.json()
score = data["result"]["overAllScore"]
print(score)
print(math.ceil(score))
Output:
115.5
116
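The same approach can be wrapped in a few small functions with a timeout and a clear error when the response does not have the expected shape. This sketch swaps requests for the standard library's urllib.request so it has no third-party dependency; the endpoint and the result/overAllScore fields are taken from the answer above (an undocumented site API that may change), and the function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Endpoint from the answer above; not a documented public API.
API_BASE = "https://prod.journey.coolcatsnft.com/v1/score/get/"


def score_api_url(page_url: str) -> str:
    """Build the API URL from the identifier at the end of the profile URL."""
    # rstrip("/") tolerates a trailing slash on the profile URL.
    identifier = urlparse(page_url).path.rstrip("/").split("/")[-1]
    if not identifier:
        raise ValueError(f"no identifier in {page_url!r}")
    return API_BASE + identifier


def extract_score(payload: dict) -> float:
    """Read result.overAllScore from the decoded JSON response."""
    try:
        return payload["result"]["overAllScore"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"unexpected response shape: {payload!r}") from exc


def fetch_score(page_url: str, timeout: float = 10.0) -> float:
    """Download the JSON for a profile page and return its score."""
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(score_api_url(page_url), timeout=timeout) as resp:
        return extract_score(json.load(resp))
```

Usage: math.ceil(fetch_score("https://coolcatsnft.com/user/0x10eb84abd429fa4df8dcabbc7c2803822a5b82d9")) rounds the score up, as in the answer. Keeping URL building and JSON parsing separate from the network call makes both easy to check offline.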

You were close.
[<selenium.webdriver.remote.webelement.WebElement (session="d902be37b19adc23f00bcaa20ecfc885", element="4064ab3c-7da4-4223-b8bb-c2fbb6590cbe")>]
is the list of WebElement objects itself, whereas you need to extract their text.
Solution
To extract the text 116 you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following locator strategies:
Using CSS_SELECTOR:
driver.get('https://coolcatsnft.com/user/0x10eb84abd429fa4df8dcabbc7c2803822a5b82d9')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[aria-label='Accept cookies']"))).click()
time.sleep(15)
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".sc-8ce97ff6-2.cgjlQk"))).text)
Using XPATH:
driver.get('https://coolcatsnft.com/user/0x10eb84abd429fa4df8dcabbc7c2803822a5b82d9')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[aria-label='Accept cookies']"))).click()
time.sleep(15)
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='sc-8ce97ff6-2 cgjlQk' and text()]"))).text)
Note: You have to add the following imports:
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
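When a locator can match several elements, the same idea applies to the whole list: find_elements() returns WebElement objects, so printing the list shows their repr, and reading .text from each one gives the rendered strings. A minimal sketch, assuming the generated class names from the question are still valid (styled-components class names like these tend to change between site builds); the helper names are hypothetical.

```python
def element_texts(elements) -> list:
    """Return the stripped visible text of each WebElement (anything with a .text attribute)."""
    return [el.text.strip() for el in elements]


def wait_for_texts(driver, css_selector: str, timeout: int = 20) -> list:
    """Wait until all elements matching css_selector are visible, then return their texts."""
    # Selenium is imported here so element_texts() can be used on its own.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    elements = WebDriverWait(driver, timeout).until(
        EC.visibility_of_all_elements_located((By.CSS_SELECTOR, css_selector))
    )
    return element_texts(elements)
```

Usage: print(wait_for_texts(driver, ".sc-8ce97ff6-2.cgjlQk")) prints a list of strings instead of WebElement reprs.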

Python Selenium click login [duplicate]

I'm having trouble performing a click operation on a website. I'm getting a NoSuchElementException error, but I'm not sure why, because I got the class name from the site.
What am I missing?
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
s = Service('C:/Program Files (x86)/chromedriver.exe')
chromeOptions = Options()
chromeOptions.headless = False
driver = webdriver.Chrome(service=s, options=chromeOptions)
list_data = []
def initialize_browser():
    driver.get("https://virtualracingschool.appspot.com/#/Home")
    print("starting_Driver")
    click_button = driver.find_element(By.CLASS_NAME, "white-text")
    driver.implicitly_wait(15)
    click_button.click()
initialize_browser()
I tried following the Selenium documentation, which says that for markup like
<p class="content">Site content goes here.</p>
you write:
content = driver.find_element(By.CLASS_NAME, 'content')
I felt like I did this properly, but my site has markup in this format:
<a class="white-text" style="" ...>
<span>Login</span>
</a>
Are the <a> tag and the "style" attribute hindering my code?

Use the below XPath expression:
//span[text()='Login']
Your code should look like this:
click_button = driver.find_element(By.XPATH, "//span[text()='Login']")
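The expression above hard-codes Login inside single quotes. If the visible text ever comes from a variable, a quote inside it would break the expression, and XPath 1.0 (the version browsers evaluate) has no escape character, so the usual workaround is concat(). A small sketch of that technique; the helper names are hypothetical.

```python
def xpath_literal(value: str) -> str:
    """Quote a Python string as an XPath 1.0 string literal."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote kinds present: split on ' and rejoin the pieces with concat(),
    # inserting "'" (a double-quoted single quote) between them.
    pieces = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in pieces) + ")"


def span_with_text(text: str) -> str:
    """XPath for a <span> whose text node equals text exactly."""
    return f"//span[text()={xpath_literal(text)}]"
```

Usage: driver.find_element(By.XPATH, span_with_text("Login")) builds exactly the //span[text()='Login'] expression shown above.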

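As per your code trials:
click_button = driver.find_element(By.CLASS_NAME, "white-text")
By.CLASS_NAME, "white-text" identifies four elements within the HTML DOM, so the locator is ambiguous. In addition, driver.implicitly_wait(15) is only called after find_element(), so that lookup does not wait for the page to finish rendering.
Hence you see the error.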
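To see which of the matches find_element() would pick (it returns the first one in document order), it can help to list all of them. A minimal debugging sketch; the helper name is hypothetical, and it relies only on the tag_name, text, and is_displayed() members of a WebElement.

```python
def describe_matches(elements) -> list:
    """Summarize each element as (tag name, visible text, displayed?)."""
    return [(el.tag_name, el.text, el.is_displayed()) for el in elements]
```

Usage: for row in describe_matches(driver.find_elements(By.CLASS_NAME, "white-text")): print(row). An empty list means the elements have not been rendered yet; several rows mean the locator needs to be more specific.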
Solution
To click on the clickable element you need to induce WebDriverWait for the element_to_be_clickable() and you can use either of the following locator strategies:
Using CSS_SELECTOR:
driver.get('https://virtualracingschool.appspot.com/#/Home')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.white-text > span"))).click()
Using XPATH:
driver.get('https://virtualracingschool.appspot.com/#/Home')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@class='white-text']//span"))).click()
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Anomaly (cannot find element) in only one out of several similar tripadvisor pages

Note: This code requires human intervention partway through, since it is incomplete, so it should only be run in Jupyter. I am trying to get the last page number of a TripAdvisor results page.
The "Malaysia" and "Switzerland" pages (URLs commented out below) work fine, but the "Hong Kong" one does not.
from selenium import webdriver #for navigating through the pages
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(service=Service(r'C:\Users\user\Downloads\chromedriver.exe'))
url = "https://www.tripadvisor.com.sg/Hotels-g294217-Hong_Kong-Hotels.html"
#url = "https://www.tripadvisor.com.sg/Hotels-g293951-Malaysia-Hotels.html"
#url = "https://www.tripadvisor.com.sg/Hotels-g188045-Switzerland-Hotels.html"
driver.get(url)
driver.implicitly_wait(5)
# Human intervention here: click an arbitrary "Check in date" and "Check out date", then click "Update"
last_page_s = driver.find_element(By.CSS_SELECTOR, "span.pageNum.last").get_attribute('data-page-number')
last_page = int(last_page_s)
print(last_page)
I'm still a newbie with webscraping so any help is greatly appreciated!!

To print the last_page number you have to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.separator.cx_brand_refresh_phase2 +a"))).get_attribute('data-page-number'))
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[contains(@class, 'separator')]//following::a"))).get_attribute('data-page-number'))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
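Because the Hong Kong page is the one that behaves differently, it may help to make the selector a parameter and decide explicitly what happens when no pagination link appears. A minimal sketch: the default selector is the one from the answer above, and treating a missing link as a single page of results is an assumption, not something the answer states. The function names are hypothetical.

```python
from typing import Optional

# Selector from the answer above; TripAdvisor's class names may change.
DEFAULT_LAST_PAGE_CSS = "span.separator.cx_brand_refresh_phase2 + a"


def page_count(data_page_number: Optional[str]) -> int:
    """Convert a data-page-number attribute value to an int."""
    if data_page_number is None or not data_page_number.strip():
        # Assumption: no pagination link means all results fit on one page.
        return 1
    return int(data_page_number)


def read_last_page(driver, css_selector: str = DEFAULT_LAST_PAGE_CSS, timeout: int = 20) -> int:
    """Wait for the last-page link and return its page number."""
    # Selenium is imported here so page_count() works without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        link = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, css_selector))
        )
    except TimeoutException:
        return page_count(None)
    return page_count(link.get_attribute("data-page-number"))
```

Usage, after the manual date selection: print(read_last_page(driver)), or pass "span.pageNum.last" as css_selector for pages that still use the markup from the question.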

How to click on the Sign in element with GitHub page https://github.com/ using Selenium and Python

I'm trying to use Selenium for Python to click on the Sign in link at the top of GitHub. I've tried using find_element_by_link_text() but I get a NoSuchElementException. I then tried using find_element_by_xpath() and got an ElementNotInteractableException. Here is the code for the first:
from selenium import webdriver
browser = webdriver.Chrome()
browser.get('https://github.com')
signin = browser.find_element_by_link_text('Sign in')
signin.click()
and here's the code for the second.
from selenium import webdriver
browser = webdriver.Chrome()
browser.get('https://github.com')
signin_link = browser.find_element_by_xpath('/html/body/div[1]/header/div/div[2]/div[2]/a[1]')
signin_link.click()
I even tried find_element_by_css_selector() but also got an ElementNotInteractableException. I don't understand what's going wrong. I haven't included the HTML, but if you go to GitHub, it's just the Sign in link at the very top right.

I think you are not passing the chromedriver path. Try this:
from selenium.webdriver.chrome.service import Service
browser = webdriver.Chrome(service=Service(r"C:\Users\...\chromedriver.exe"))
Also, if you just want to reach the login page, I would recommend avoiding the long way round. The following code takes you directly to the login page:
browser.get('https://github.com/login')
However, if you want to know how else you can click that element, try looping over the <a> elements and checking their "href":
from selenium.webdriver.common.by import By
for el in browser.find_elements(By.TAG_NAME, "a"):
    href = el.get_attribute('href')
    # get_attribute() returns None for <a> tags without an href
    if href and "/login" in href:
        el.click()
        break  # the page navigates, so the remaining elements become stale
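The substring check "/login" in href also matches paths such as /login/oauth/..., and if the page contains hidden copies of the link (a common cause of ElementNotInteractableException), clicking the first match can still fail. A minimal sketch that compares the parsed path exactly and skips hidden elements; the helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def is_login_href(href: Optional[str]) -> bool:
    """True if href points at the /login path; query strings like ?return_to= are ignored."""
    if not href:
        # get_attribute('href') returns None for <a> tags without an href.
        return False
    return urlparse(href).path == "/login"


def click_login_link(browser) -> bool:
    """Click the first visible <a> that points at /login; return False if none is found."""
    # Selenium is imported here so is_login_href() works without Selenium installed.
    from selenium.webdriver.common.by import By

    for el in browser.find_elements(By.TAG_NAME, "a"):
        if is_login_href(el.get_attribute("href")) and el.is_displayed():
            el.click()
            # Stop after clicking: navigation makes the remaining elements stale.
            return True
    return False
```

Usage: click_login_link(browser) after browser.get('https://github.com').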

To handle the dynamic element, induce WebDriverWait() and wait for element_to_be_clickable(), using the following locator strategies.
LINK_TEXT:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
browser = webdriver.Chrome()
browser.get('https://github.com')
signin = WebDriverWait(browser, 10).until(expected_conditions.element_to_be_clickable((By.LINK_TEXT, "Sign in")))
signin.click()
XPATH:
browser = webdriver.Chrome()
browser.get('https://github.com')
signin = WebDriverWait(browser, 10).until(expected_conditions.element_to_be_clickable((By.XPATH, "//a[@href='/login']")))
signin.click()
CSS Selector:
browser = webdriver.Chrome()
browser.get('https://github.com')
signin = WebDriverWait(browser, 10).until(expected_conditions.element_to_be_clickable((By.CSS_SELECTOR, "a[href='/login']")))
signin.click()

To click() on the Sign in element at the top right corner of GitHub page https://github.com/ using Selenium, you need to induce WebDriverWait for the element_to_be_clickable() and you can use either of the following Locator Strategies:
Using PARTIAL_LINK_TEXT:
WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.PARTIAL_LINK_TEXT, "Sign"))).click()
Using CSS_SELECTOR:
WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a[href='/login']"))).click()
Using XPATH:
WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[starts-with(., 'Sign')]"))).click()
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
