Web scraping - Accept cookies - Selenium - Python - Airbnb

I am trying to accept the cookies on the Airbnb home page, but I can't find the right selector to target the button.
Find below my code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from time import sleep
options = Options()
# options.add_argument('--headless')
options.add_argument('window-size=400,800')
navegador = webdriver.Chrome(options=options)
navegador.get('https://www.airbnb.com/')
# I tried these two ways, but I don't know what selector/XPath to use
WebDriverWait(navegador, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ""))).click()
WebDriverWait(navegador, 10).until(EC.element_to_be_clickable((By.XPATH, ""))).click()
Find below the HTML from Airbnb
<button class="optanon-allow-all accept-cookies-button" title="OK" aria-label="OK" onclick="Optanon.TriggerGoogleAnalyticsEvent('OneTrust Cookie Consent', 'Banner Accept Cookies');" tabindex="1">OK</button>

From my end I could see this HTML on Airbnb:
<button data-testid="accept-btn" type="button" class="_1xiwgrva">OK</button>
You can click the accept-cookies button like this:
WebDriverWait(navegador, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[data-testid='accept-btn']"))).click()
But I see you have shared this HTML:
<button class="optanon-allow-all accept-cookies-button" title="OK" aria-label="OK" onclick="Optanon.TriggerGoogleAnalyticsEvent('OneTrust Cookie Consent', 'Banner Accept Cookies');" tabindex="1">OK</button>
In that case, you could use the code below:
WebDriverWait(navegador, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.optanon-allow-all.accept-cookies-button"))).click()

Try this:
WebDriverWait(navegador, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "button.accept-cookies-button"))).click()
Also, why are you defining such a small window size?
We normally use
options.add_argument('--window-size=1920,1080')
Also, note that the argument should start with --, as I used here.
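Put together, the setup from the question would then look something like this (a sketch, keeping the headless flag commented out as in the original):
options = Options()
# options.add_argument('--headless')
options.add_argument('--window-size=1920,1080')  # note the leading --
navegador = webdriver.Chrome(options=options)
navegador.get('https://www.airbnb.com/')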

Handling "Accept all cookie" popup with selenium when selector is unknown

I have a Python script; it looks like this.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.select import Select
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from os import path
import time
# Tried this code
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.default_content_setting_values.notifications" : 2}
chrome_options.add_experimental_option("prefs",prefs)
browser = webdriver.Chrome(ChromeDriverManager().install(), chrome_options=chrome_options)
links = ["https://www.henleyglobal.com/", "https://markets.ft.com/data"]
for link in links:
    browser.get(link)
    # WebDriverWait(browser, 20).until(EC.url_changes(link))
    # How do I disable/ignore/remove/escape this "Accept all cookie" popup and then access the website to scrape data?
browser.quit()
So each website in the links list displays an "Accept all cookies" popup after navigating to the site.
I have tried many approaches and nothing works; see the attempt after the imports.
How do I dismiss this popup and then access the website to scrape data?
If you open the page in a new browser you'll notice that the page fully loads and then, a moment later, the popup appears. Selenium's default wait strategy only waits for the page to load.
One way to handle this is to inspect the page and find the XPath of each popup's accept button. The code below should work for that.
browser.implicitly_wait(30)
if link == 'https://www.henleyglobal.com/':
    browser.find_element(By.XPATH, "/html/body/div[7]/div/div/div/div[2]/div/div[2]/button[2]").click()
else:
    browser.find_element(By.XPATH, "/html/body/div[4]/div/div/div[2]/div[2]/a").click()
The implicit wait gives the popup time to appear; the code then locates the accept element for each site and clicks it.
For unknown sites you could try:
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--disable-notifications")
webdriver.Chrome(os.path.join(path, 'chromedriver'), chrome_options=chrome_options)
Generally, you cannot use a universal locator that will match the "Accept cookies" button on every website in the world.
Even here, you have two different sites, and the elements you need to click are completely different on each.
For the https://www.henleyglobal.com/ site the correct locator may be something like this CSS selector: .confirmation button.primary-btn, while for the https://markets.ft.com/data site I'd advise using this CSS selector: .o-cookie-message__actions a.o-cookie-message__button.
These two elements are completely different: the first is a button while the second is an a, and they have entirely different class names and other attributes.
You might think about matching on the Accept text. It seems to be common, so you could use the XPath //*[contains(text(),'Accept')], but even this will not work, since on the first page it matches two elements and the accept-cookies element is the second of them...
So there are no general locators; you will have to define separate locators for each page.
Again, for https://www.henleyglobal.com/ I would prefer
driver.find_element(By.CSS_SELECTOR, ".confirmation button.primary-btn").click()
While for the second page https://markets.ft.com/data I would prefer this
driver.find_element(By.CSS_SELECTOR, ".o-cookie-message__actions a.o-cookie-message__button").click()
Also, we generally use WebDriverWait with expected_conditions explicit waits, so the code will be as follows:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)
# for the first page
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".confirmation button.primary-btn"))).click()
# for the second page
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".o-cookie-message__actions a.o-cookie-message__button"))).click()
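If you want to keep the original loop over both links, a minimal sketch (reusing browser and links from the question plus the imports above; the locators are the ones suggested here and may break if the sites change) could map each site to its own locator:
from selenium.common.exceptions import TimeoutException

# Map each site to its own cookie-banner locator (taken from the suggestions above)
cookie_locators = {
    "https://www.henleyglobal.com/": (By.CSS_SELECTOR, ".confirmation button.primary-btn"),
    "https://markets.ft.com/data": (By.CSS_SELECTOR, ".o-cookie-message__actions a.o-cookie-message__button"),
}

for link in links:
    browser.get(link)
    try:
        WebDriverWait(browser, 10).until(EC.element_to_be_clickable(cookie_locators[link])).click()
    except TimeoutException:
        pass  # no cookie banner appeared within the timeout
    # ... scrape the page here ...
browser.quit()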

How to press a button?

I'm pretty new to the Selenium module in Python and I'm not able to press a button.
Here is the HTML code:
<!-- more tags -->
<a href="#">
<img src="flag-en.png" title="English" alt="English"> English
</a>
<!-- more tags -->
Here's a minimal example:
import selenium.webdriver
import selenium.webdriver.chrome
import selenium.webdriver.chrome.options
import time
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
def get_driver():
    chrome_options = selenium.webdriver.chrome.options.Options()
    return selenium.webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
driver = get_driver()
driver.get(url)
# you can comment this out, if the first language which is selected isn't english
# otherwise this time is used to manually change the language to a non-english
# language to test, if it really selects the correct button
time.sleep(2)
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[#class='btn-group lang-sel']"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//img[#title='English']"))).click()
What am I doing wrong?
Context
I want to automatically select the language of the website.
It seems your locator strategy is correct. You can shorten your XPath expression as follows:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//ul[@class="dropdown-menu"]/li[1]/a'))).click()
There may be spaces before or after the text inside the web element, so you can try this:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//img[@title='English']"))).click()
Or
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//img[contains(.,'English')]"))).click()

Selenium find element by class breaking code

My program checks a login page for available combinations. Upon first opening up the page, there is a "check availability" button, which has an ID which Selenium is able to find/use to click.
If the username is taken, a new button pops up which says "check another combination"; however, the DOM inspector only gives the following information about the second button:
<a class="btn btn-outline-primary btn-block mt-4" href="" role="button" data-ng-click="combination.reset()">
Check another combination
</a>
I have tried finding this button by CLASS, CLASS_NAME, and by copying and pasting the XPath from the inspector. All of these have led not only to the button click not working, but also to the entering and submitting of the initial combination ceasing to work.
Hoping someone can suggest how to identify this button from the given information. For reference, here is one of the lines I tried:
check_another_combo_button = driver.find_element(By.CLASS, "btn btn-outline-primary btn-block mt-4")
Here is my full code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from webdriver_manager.firefox import GeckoDriverManager
import time
driver = webdriver.Firefox(executable_path=GeckoDriverManager().install())
driver.maximize_window()
site = "https://www......"
driver.get(site)
combination_input_field = driver.find_element(By.ID, "userCombination")
check_availability_button = driver.find_element(By.ID, "btn_fQPAC_submit")
combination_input_field.send_keys(77777) #77777 is one of the unavailable combos
check_availability_button.click()
time.sleep(2)
check_another_combo_button = driver.find_element(By.CLASS_NAME, "btn btn-outline-primary btn-block mt-4")
check_another_combo_button.click()
I suspect you want to check several inputs, say, in a loop. Here is the solution using both Chrome and Firefox.
Solution using Chrome driver:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
chrome_path = r"C:\Users\hpoddar\Desktop\Tools\chromedriver_win32\chromedriver.exe"
s = Service(chrome_path)
url = 'https://www......'
driver = webdriver.Chrome(service=s)
driver.get(url)
inputs = ['77777', '88888', '99999', '11111']
for plate in inputs:
    combination_input_field = driver.find_element(By.ID, "qPlateCombination")
    check_availability_button = driver.find_element(By.ID, "btn_fQPAC_submit")
    combination_input_field.send_keys(plate)
    check_availability_button.click()
    element = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".btn.btn-outline-primary.btn-block.mt-4")))
    check_another_combo_button = driver.find_element(By.CSS_SELECTOR, ".btn.btn-outline-primary.btn-block.mt-4")
    check_another_combo_button.click()
Solution using Geckodriver:
To use Geckodriver, all you need to add is
from webdriver_manager.firefox import GeckoDriverManager
from selenium.webdriver.firefox.service import Service
and change the driver initialization lines to
s = Service(executable_path=GeckoDriverManager().install())
driver = webdriver.Firefox(service=s)
driver.maximize_window()
When you have common or dynamically generated classes on the elements you wish to interact with, the next approach is to match by text.
Try an XPath that matches the text:
//a[normalize-space(.)='Check another combination']
normalize-space(.) trims the whitespace and matches the text inside the element.
You can include @role="button" as an additional predicate on the a tag, as shown below.
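For example, combined with an explicit wait (a sketch reusing driver and the WebDriverWait/EC/By imports from the Chrome solution above):
check_another_combo_button = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, "//a[@role='button' and normalize-space(.)='Check another combination']"))
)
check_another_combo_button.click()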

How to execute all javascript content on webpage with selenium to find and send login form info on fully loaded webpage

I've been trying to make a Python script to log into a certain website, navigate through the menu, fill out a form and save the file it generates to a folder.
I've been using Selenium to try to make the website fully load so I can find the login elements, but I'm being unsuccessful, maybe because the website runs a lot of JavaScript before it fully loads; I can't make it fully load and show me the data I want.
I tried Robobrowser, Selenium, Requests and BeautifulSoup to get it done.
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
url = "https://directa.natal.rn.gov.br/"
driver = webdriver.Chrome(executable_path="C:\\webdrivers\\chromedriver.exe")
driver.get(url)
html = driver.execute_script("return document.documentElement.outerHTML")
sel_soup = BeautifulSoup(html, 'html.parser')
senha = driver.find_element_by_xpath('//*[@id="senha"]')
senha.send_keys("123")
I expected to have filled the password (senha) field with "123", but I can't even find the element.
It seems like what's needed here is a little bit of a scroll, wait and switch, in case the login fields just aren't ready for input :) The code below should work: we switch to the iframe and scroll to the element before interacting with the rest of the login form. You can adjust the delay from 5 seconds to whatever you prefer.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
""" Variables """
url = "https://directa.natal.rn.gov.br/"
delay = 5 # seconds
""" Initiate driver """
driver = webdriver.Chrome(executable_path="C:\\webdrivers\\chromedriver.exe")
""" Go to url """
driver.get(url)
""" Iframe switch """
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"frame[name='mainsystem'][src^='main']")))
""" Attempt to get all our elements """
try:
    """ Username """
    usuario = WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.ID, 'usuario')))
    """ Password """
    senha = WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.ID, 'senha')))
    print("All elements located!")
except TimeoutException:
    print("Loading took too much time!")
    exit(0)
"""Scroll to our element """
driver.execute_script("arguments[0].scrollIntoView();", usuario)
""" Input data into our fields """
usuario.send_keys("username")
senha.send_keys("password")
""" Locate our login element """
login = WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.ID, 'acessar')))
""" Click Login """
login.click()
To send the character sequence 123 to the password (senha) field: as the desired elements are within a <frame>, you have to:
Induce WebDriverWait for the desired frame to be available and switch to it.
Induce WebDriverWait for the desired element to be clickable.
You can use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get("https://directa.natal.rn.gov.br/")
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"frame[name='mainsystem'][src^='main']")))
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input.input[name='usuario']"))).send_keys("Tads")
driver.find_element_by_css_selector("input.input[name='senha']").send_keys("123")
Here you can find a relevant discussion on Ways to deal with #document under iframe
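As a general pattern (not specific to this site): once you are done inside the frame, switch back to the top-level document before locating elements outside it. A minimal sketch, reusing the driver and the frame locator from the code above:
# Switch into the frame that hosts the login form
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "frame[name='mainsystem'][src^='main']")))
# ... interact with the usuario/senha fields here ...
# Return to the top-level document
driver.switch_to.default_content()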

How to find the href attribute of the videos on twitch through selenium and python?

I'm trying to find the twitch video IDs of all videos for a specific user. So for example on this page
https://www.twitch.tv/dyrus/videos/all
So here we have all the videos linked, but it's not quite so simple as just scraping the HTML and finding the links, since they seem to be generated dynamically.
So I heard about selenium and did something like this:
from selenium import webdriver
# Change path here obviously
driver = webdriver.Chrome('C:/Users/Jason/Downloads/chromedriver')
driver.get('https://www.twitch.tv/dyrus/videos/all')
link_element = driver.find_elements_by_xpath("//*[@href]")
for link in link_element:
    print(link.get_attribute('href'))
driver.close()
This returns a bunch of links on the page, but not the videos; they lie "deeper", I think. Any input?
Thanks in advance
I would still suggest a couple of changes as follows:
Always open the web browser in maximized mode so that all (or most) of the desired elements are within the viewport.
If you are on Windows, you need to append the .exe extension to the WebDriver executable name, e.g. chromedriver.exe.
While identifying elements, always try to include the class attribute in your locator strategy.
Always invoke driver.quit() at the end of your test to close and destroy the WebDriver and web client instances gracefully.
Here is your own code block with the above-mentioned tweaks:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\path\to\chromedriver.exe')
driver.get('https://www.twitch.tv/dyrus/videos/all')
link_elements = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.tw-interactive.tw-link[data-a-target='preview-card-image-link']")))
for link in link_elements:
    print(link.get_attribute('href'))
driver.quit()
Console Output:
https://www.twitch.tv/videos/295314690
https://www.twitch.tv/videos/294901947
https://www.twitch.tv/videos/294472813
https://www.twitch.tv/videos/294075254
https://www.twitch.tv/videos/293617036
https://www.twitch.tv/videos/293236560
https://www.twitch.tv/videos/292800601
https://www.twitch.tv/videos/292409437
https://www.twitch.tv/videos/292328170
https://www.twitch.tv/videos/292032996
https://www.twitch.tv/videos/291625563
https://www.twitch.tv/videos/291192151
https://www.twitch.tv/videos/290824842
https://www.twitch.tv/videos/290434348
https://www.twitch.tv/videos/290021370
https://www.twitch.tv/videos/289561690
https://www.twitch.tv/videos/289495488
https://www.twitch.tv/videos/289138003
https://www.twitch.tv/videos/289110429
https://www.twitch.tv/videos/288804893
https://www.twitch.tv/videos/288784992
https://www.twitch.tv/videos/288687479
https://www.twitch.tv/videos/288432438
https://www.twitch.tv/videos/288117849
https://www.twitch.tv/videos/288004968
https://www.twitch.tv/videos/287689102
https://www.twitch.tv/videos/287451192
https://www.twitch.tv/videos/287267032
https://www.twitch.tv/videos/287017431
https://www.twitch.tv/videos/286819343
With your locator, you are returning every element on the page that contains an href attribute. You can be a little more specific than that and get what you are looking for. Switch to a CSS selector...
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Change path here obviously
driver = webdriver.Chrome('C:/Users/Jason/Downloads/chromedriver')
driver.get('https://www.twitch.tv/dyrus/videos/all')
links = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[data-a-target='preview-card-image-link']")))
for link in links:
    print(link.get_attribute('href'))
driver.close()
That prints 40 links from the page.
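Since the question is ultimately after the video IDs rather than the full URLs, you could replace the print loop with something like this (a small sketch; it has to run before driver.close(), since the elements need a live session, and it assumes every href ends with /videos/<id>):
video_ids = []
for link in links:
    href = link.get_attribute('href')  # e.g. https://www.twitch.tv/videos/295314690
    video_ids.append(href.rstrip('/').rsplit('/', 1)[-1])
print(video_ids)
driver.close()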
