Web-scraping using Selenium: Get current url after selecting dropdown menu - python

I am trying to scrape pricing information for clothes from Amazon, but I have to select the clothes size first. After selecting the size I need, how do I keep track of the new URL? The following code works and selects a value from the dropdown menu, but I just don't know how to get hold of the new URL afterwards.
original url: https://www.amazon.ae/Jack-Jones-Glenn-Original-Pants/dp/B07JQ8MDGD/ref=sr_1_5?crid=M8QQKGLLZ1O9&keywords=jeans&qid=1657289288&sprefix=jeans%2Caps%2C232&sr=8-5&th=1
url after selecting size (the url I want to get):
https://www.amazon.ae/Jack-Jones-Glenn-Original-Pants/dp/B07JQBYC8J/ref=sr_1_5?crid=M8QQKGLLZ1O9&keywords=jeans&qid=1657289288&sprefix=jeans%2Caps%2C232&sr=8-5&th=1&psc=1
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import Select
url = 'https://www.amazon.ae/Jack-Jones-Glenn-Original-Pants/dp/B07JQB87KL/ref=sr_1_5?crid=M8QQKGLLZ1O9&keywords=jeans&qid=1657289288&sprefix=jeans%2Caps%2C232&sr=8-5&th=1'
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get(url)
select=Select(driver.find_element_by_id("native_dropdown_selected_size_name"))
select.select_by_index(2)
#driver.current_url: is returning the original url

Selenium may be moving on from the .select_by_index step and reading the URL before the site has had a chance to change it.
You might try an implicit wait (time-based):
driver.implicitly_wait(10)  # poll for up to 10 seconds when locating elements
Or an explicit wait (based on an expected condition):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)
element = wait.until(EC.element_to_be_clickable((By.ID, 'someid')))
Your expected condition will depend on your use case.
I would try the implicit wait first, just to see if you can get an updated driver.current_url.
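Building on the explicit-wait idea: Selenium also ships EC.url_changes, so WebDriverWait(driver, 10).until(EC.url_changes(url)) followed by driver.current_url should work here. As a sketch of what that condition does under the hood (the function and parameter names below are illustrative, not part of Selenium):

```python
import time

def wait_for_url_change(driver, old_url, timeout=10.0, poll=0.5):
    """Poll driver.current_url until it differs from old_url.

    Returns the new URL, or raises TimeoutError if nothing changed in time.
    Only the `current_url` attribute is used, so any object exposing it
    satisfies the helper.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if driver.current_url != old_url:
            return driver.current_url
        time.sleep(poll)
    raise TimeoutError(f"URL did not change from {old_url!r} within {timeout}s")
```

With a real driver you would call new_url = wait_for_url_change(driver, url) right after select.select_by_index(2).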

Related

Handling "Accept all cookie" popup with selenium when selector is unknown

I have a Python script; it looks like this.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.select import Select
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from os import path
import time
# Tried this code
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.default_content_setting_values.notifications" : 2}
chrome_options.add_experimental_option("prefs",prefs)
browser = webdriver.Chrome(ChromeDriverManager().install(), options=chrome_options)
links = ["https://www.henleyglobal.com/", "https://markets.ft.com/data"]
for link in links:
    browser.get(link)
    # WebDriverWait(browser, 20).until(EC.url_changes(link))
    # How do I disable/ignore/dismiss this "Accept all cookies" popup and then access the website to scrape data?
browser.quit()
So each website in the links array displays an "Accept all cookies" popup after navigating to the site; check the image below.
I have tried many ways and nothing works; check the attempt after the imports.
How do I dismiss this popup and then access the website to scrape data?
If you open the page in a new browser you'll notice that the page fully loads and then, a moment later, the popup appears. Selenium's default wait strategy is just that the page is loaded.
One way to handle this is to simply inspect the page and find the XPath of the popup's button. Code along these lines should work:
browser.implicitly_wait(30)
if link == 'https://www.henleyglobal.com/':
    browser.find_element(By.XPATH, "/html/body/div[7]/div/div/div/div[2]/div/div[2]/button[2]").click()
else:
    browser.find_element(By.XPATH, "/html/body/div[4]/div/div/div[2]/div[2]/a").click()
The implicit wait gives the popup time to appear before each element lookup, after which the button is clicked.
For unknown sites you could try:
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--disable-notifications")
webdriver.Chrome(os.path.join(path, 'chromedriver'), options=chrome_options)
Generally, you cannot use some universal locator that will match the "Accept cookies" button for each and every web site in the world.
Even here, you have 2 different sites, and the elements you need to click are totally different on these sites.
For the https://www.henleyglobal.com/ site the correct locator may be something like the CSS selector .confirmation button.primary-btn, while for the https://markets.ft.com/data site I'd advise the CSS selector .o-cookie-message__actions a.o-cookie-message__button.
These 2 elements are totally different: the first one is a button while the second is an a; they have totally different class names and all other attributes.
You might think about matching on the Accept text. It seems to be common, so you could use the XPath //*[contains(text(),'Accept')], but even this will not work, since on the first page it matches 2 elements and the accept-cookies element is the second of them.
So there are no general locators; you will have to define separate locators for each page.
Again, for https://www.henleyglobal.com/ I would prefer
driver.find_element(By.CSS_SELECTOR, ".confirmation button.primary-btn").click()
While for the second page https://markets.ft.com/data I would prefer this
driver.find_element(By.CSS_SELECTOR, ".o-cookie-message__actions a.o-cookie-message__button").click()
Also, generally we always use WebDriverWait with expected_conditions explicit waits, so the code will be as follows:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)
# for the first page
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".confirmation button.primary-btn"))).click()
# for the second page
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".o-cookie-message__actions a.o-cookie-message__button"))).click()
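If you still want a single call site per run, one possible pattern (a sketch that assumes you maintain the per-site locators yourself) is to keep a small list of known cookie-button locators and try them in order. The helper below relies only on the find_element(by, selector) part of the WebDriver interface; "css selector" is the string that By.CSS_SELECTOR resolves to:

```python
def click_first_match(driver, locators):
    """Try each (by, selector) pair in turn and click the first element found.

    Returns the locator that worked, or None if nothing matched.  `driver`
    only needs a find_element(by, selector) method that raises when no
    element matches (NoSuchElementException in real Selenium).
    """
    for by, selector in locators:
        try:
            driver.find_element(by, selector).click()
            return (by, selector)
        except Exception:
            continue
    return None
```

For the two sites above, the list would hold ("css selector", ".confirmation button.primary-btn") and ("css selector", ".o-cookie-message__actions a.o-cookie-message__button"), and you would call click_first_match(browser, candidates) right after browser.get(link).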

Can't select element from page [Selenium]

I have tried to scrape info from that site, specifically from a table. Every time, I get an error saying the elements don't exist.
https://polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d
I tried adding time.sleep(5) to my code, and a scroll-down function to load all the elements: ineffective.
Do you have any advice for me?
EDIT
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
# Options
chrome_options = Options()
chrome_options.add_argument("--headless")
# Set drive
chrome_driver_path = r"C:\Users\kacpe\OneDrive\Pulpit\Python\Projekty\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver_path, options=chrome_options)
driver.get("https://polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, "//table/tbody/tr[0]")))
    print(element)
except TimeoutException as e:
    print(e)
I added code per your request. My main goal is to scrape the content of the table on that site. I added explicit waits to my code and I still can't select anything from that table; it looks like the script doesn't see anything in that area.
One way to try to solve it is to use the XPath of the element (or its relative position), so that Selenium always looks at the same position on the page to return the value you are searching for.
Ex1: driver.find_element(By.XPATH, '//*[@id="wmd-input"]')  # in this case, the input of this check box
If that doesn't work, try this one.
Ex2: browser.implicitly_wait(30)  # sets a timer that gives the page time to load all its information
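Two things that may be at play on this particular page: explorer sites like polygonscan often render the transfers table inside an iframe, in which case you need driver.switch_to.frame(...) before your locators can see it, and XPath indices start at 1, so //table/tbody/tr[0] can never match anything (the first row is tr[1]). Once a table element is located, collecting the cell text can be sketched like this (a duck-typed helper, not polygonscan-specific):

```python
def table_rows_text(table):
    """Return the text of every cell, row by row, for a table element.

    `table` only needs find_elements(by, selector) returning objects with
    a .text attribute and their own find_elements -- i.e. the WebElement
    interface.  "xpath" is the string that Selenium's By.XPATH resolves to.
    """
    rows = table.find_elements("xpath", ".//tbody/tr")
    return [[cell.text for cell in row.find_elements("xpath", "./td")]
            for row in rows]
```

With a real driver you would locate the table first, e.g. table = driver.find_element(By.XPATH, "//table"), then call table_rows_text(table).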

Wait for class to load value after clicking button selenium python

After the website is loaded I click a button successfully, which then generates some numbers in this class:
<div class="styles__Value-sc-1bfbyy7-2 eVmhyz"></div>
but not instantly: the values are put in one by one. Selenium instantly grabs the first value that gets put into the class but doesn't wait for the other values to be added. Is there any way to wait for it to load all the values before grabbing them?
Here is the python code I use for grabbing the value:
total = driver.find_element_by_xpath("//div[@class='styles__Value-sc-1bfbyy7-2 eVmhyz']").text
Selenium has a WebDriverWait method:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
browser = webdriver.Chrome()
delay = 5
total = WebDriverWait(browser, delay).until(
    expected_conditions.presence_of_element_located(<locator>))
I haven't tested it locally, but it may work. There is also a presence_of_all_elements_located method; you can find the details in the Selenium documentation.
Hope this helps!
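The catch here is that presence_of_element_located fires as soon as the element exists, even if its text is still being filled in one value at a time. One way around that, sketched as a hand-rolled wait (names and thresholds are illustrative), is to keep reading the text until it has stopped changing for a few consecutive polls:

```python
import time

def wait_for_stable_text(element, stable_polls=3, poll=0.5, timeout=10.0):
    """Return element.text once it has been identical for `stable_polls`
    consecutive reads, or raise TimeoutError.

    Useful when a page fills a value in piece by piece: waiting for
    presence alone returns the first partial value.
    """
    deadline = time.time() + timeout
    last, seen = None, 0
    while time.time() < deadline:
        current = element.text
        if current == last:
            seen += 1
            if seen >= stable_polls:
                return current
        else:
            last, seen = current, 1
        time.sleep(poll)
    raise TimeoutError("element text never stabilised")
```

With the div above you would pass the located element in, e.g. total = wait_for_stable_text(driver.find_element_by_xpath("//div[@class='styles__Value-sc-1bfbyy7-2 eVmhyz']")).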

Python Selenium Webdriver doesn't refresh html after changing dropdown value in AJAX pages

I'm trying to scrape an AJAX webpage using Python and Selenium. The problem is, when I change the dropdown value the page content changes according to my selection, but Selenium returns the same old HTML from the page. I'd appreciate it if anyone can help. Here is my code:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
import time
url = "https://myurl.com/PATH"
driver = webdriver.Chrome()
driver.get(url)
time.sleep(5)
# change the dropdown value
sprintSelect = Select(driver.find_element_by_id("dropdown-select"))
sprintSelect.select_by_visible_text("DropDown_Value2")
html = driver.execute_script("return document.documentElement.outerHTML")
print(html)
You need to wait for the AJAX request to load the new content after your selection.
Try putting an implicit or explicit wait after the selection.
driver.implicitly_wait(10)  # 10 seconds
Or, if you know the tag/id etc. of the web element you want, try an explicit wait:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "some_ID"))
)
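Another explicit condition that fits this AJAX case is EC.staleness_of: grab some element from the old page before the selection, then wait until the DOM replaces it. The same idea can be duck-typed as a generic change poll (a sketch; the read function shown in the docstring, and the element id it uses, are hypothetical examples):

```python
import time

def wait_until_changed(read_value, old_value, timeout=10.0, poll=0.25):
    """Call read_value() until it returns something different from
    old_value, then return the new value; raise TimeoutError otherwise.

    read_value might be, e.g.,
        lambda: driver.find_element(By.ID, "results").text
    with old_value captured before the dropdown selection.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        new_value = read_value()
        if new_value != old_value:
            return new_value
        time.sleep(poll)
    raise TimeoutError("page content did not change in time")
```

Here you would capture the old text before sprintSelect.select_by_visible_text(...), call wait_until_changed afterwards, and only then run the execute_script call to read the refreshed outerHTML.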

Retrieve updated url in selenium using python

How do I get the updated URL from the Firefox browser using Selenium and Python? The code below is a good working example of what I am trying to do: the script opens a URL, looks for the search bar on the webpage, pastes in a particular product, and then executes the search.
I am trying to extract the updated URL after the search is completed, which should be https://www.myntra.com/avene-unisex-thermal-spring-water-50-ml, but I am getting https://www.myntra.com/. How can I get the required URL?
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
firefoxOptions = webdriver.FirefoxOptions()
firefoxOptions.set_preference("dom.webnotifications.enabled", False)
driver = webdriver.Firefox(options=firefoxOptions)
driver.implicitly_wait(5)
# Maximize the browser window
driver.maximize_window()
# navigate to the home page
driver.get("https://www.myntra.com/")
# Locate the text field to update values
text_field = driver.find_element_by_class_name("desktop-searchBar")
# Clears any value already present in text field
text_field.clear()
# Updates the string in search bar
text_field.send_keys("Avene Unisex Thermal Spring Water 50 ml")
text_field.send_keys(Keys.ENTER)
new_page = driver.current_url
print(new_page)
driver.close()
Seems you were pretty close. You need to induce WebDriverWait for the url to get changed and you can use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
driver.get("https://www.myntra.com/")
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input.desktop-searchBar"))).send_keys("Avene Unisex Thermal Spring Water 50 ml")
driver.find_element(By.CSS_SELECTOR, "a.desktop-submit").click()
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h1.title-title")))
print(driver.current_url)
driver.quit()
Console Output:
https://www.myntra.com/avene-unisex-thermal-spring-water-50-ml
You need to wait after text_field.send_keys(Keys.ENTER) for the page information to update.
If you don't want waits, then try using the click() method on the search button instead of send_keys.
