Clicking button on webpage using Selenium Python Beautifulsoup - python

I'm trying to create a 'bot' to buy a graphics card. I've downloaded a pre-made script and I'm trying to adjust it to my needs.
The script takes me to the site in Firefox and finds the button I'm looking for using the following code:
findAllCards = soup.find('button', {'class': 'Button__StyledButton-iESSlv dJJJCD Button-dtUzzq kHUYTy'})
This works. However, when I try to click the button I am unable to, as I have no idea what I am supposed to put here:
driverWait(driver, 'css', '.space-b center')
Webpage I'm using to test is:
https://www.currys.co.uk/gbuk/gaming/console-gaming/controllers/xbox-wireless-controller-carbon-black-10211565-pdt.html
Full code here:
driver.get(url)

while True:
    html = driver.page_source
    soup = bs4.BeautifulSoup(html, 'html.parser')
    wait = WebDriverWait(driver, 15)
    wait2 = WebDriverWait(driver, 2)
    try:
        findAllCards = soup.find('button', {'class': 'Button__StyledButton-iESSlv dJJJCD Button-dtUzzq kHUYTy'})
        if findAllCards:
            print(f'Button Found!: {findAllCards.get_text()}')
            # Clicking Add to Cart.
            time.sleep(.3)
            print('Click')
            driverWait(driver, 'css', '.space-b center')
            print('Click1')
            time.sleep(2)
Thanks :)

Your findAllCards above returns 3 web elements, not 1. Assuming you are trying to click on the Add to Basket button:
findAllCards = driver.find_element_by_xpath("//div[@id='product-actions']//div[@data-component='add-to-basket-button-wrapper']//button")
findAllCards.click()
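As a side note, newer Selenium releases (4.3+) have removed the old find_element_by_* helpers, so the equivalent call there would be (a sketch, using the same XPath):

from selenium.webdriver.common.by import By

# Selenium 4 style: locate by strategy + locator tuple arguments
findAllCards = driver.find_element(By.XPATH, "//div[@id='product-actions']//div[@data-component='add-to-basket-button-wrapper']//button")
findAllCards.click()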

There is something odd going on with the element becoming interactable, so click it via JavaScript after waiting for it:
wait = WebDriverWait(driver, 10)
driver.get(url)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
try:
    elem = wait.until(EC.presence_of_element_located((By.XPATH, "//button[contains(.,'Add to basket')]")))
    driver.execute_script("arguments[0].click();", elem)
except Exception as e:
    print(str(e))
Imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
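For completeness, a minimal sketch of the setup the snippet above assumes (Chrome here, and url being the test page from the question):

from selenium import webdriver

driver = webdriver.Chrome()
url = 'https://www.currys.co.uk/gbuk/gaming/console-gaming/controllers/xbox-wireless-controller-carbon-black-10211565-pdt.html'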

Related

Python Selenium click load more on table

I am trying to get all of the data in this table. However, the last row is a "Load More" table row that I do not know how to load. So far I have tried different approaches that did not work.
I tried to click on the row itself with this:
from selenium import webdriver

driver = webdriver.Chrome()
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'html.parser')
table = soup.find('table', {"class": "competition-leaderboard__table"})

i = 0
for team in table.find_all('tbody'):
    rows = team.find_all('tr')
    for row in rows:
        i = i + 1
        if (i == 51):
            row.click()
        # the scraping code for the first 50 elements
The code above throws an error saying that "'NoneType' object is not callable".
Another thing that I have tried that did not work is the following:
I tried to get the "Load More" table row by its class and click on it.
from selenium import webdriver
driver = webdriver.Chrome()
driver.get(url)
load_more = driver.find_element_by_class_name('competition-leaderboard__load-more-wrapper')
load_more.click()
soup = BeautifulSoup(driver.page_source, 'html.parser')
The code above also did not work.
So my question is: how can I make Python click on the "Load More" table row, since in the HTML structure of the site it seems like "Load More" is not a clickable button?
In your code you have to accept cookies first, and then you can click the 'Load more' button.
CSS selectors are the most suitable in this case.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
driver.implicitly_wait(10)
driver.get('https://www.kaggle.com/c/coleridgeinitiative-show-us-the-data/leaderboard')
wait = WebDriverWait(driver, 30)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".sc-pAyMl.dwWbEz .sc-AxiKw.kOAUSS>.sc-AxhCb.gsXzyw")))
cookies = driver.find_element_by_css_selector(".sc-pAyMl.dwWbEz .sc-AxiKw.kOAUSS>.sc-AxhCb.gsXzyw").click()
load_more = driver.find_element_by_css_selector(".competition-leaderboard__load-more-count").click()
time.sleep(10) # Added for you to make sure that both buttons were clicked
driver.close()
driver.quit()
I tested this snippet and it clicked the desired button.
Note that I've added WebDriverWait in order to wait until the first button is clickable.
UPDATE:
I added time.sleep(10) so you could see that both buttons are clicked.

Selenium button click in loop fails after first try

Currently I am working on a web crawler that should be able to download text from a Dutch newspaper archive. The first link works correctly, but the second link suddenly produces an error that I do not know how to fix.
It seems that Selenium is unable to click the button on the second link, while it succeeds in doing so on the first link.
Do you know what causes the second link (Telegraaf page) to fail?
UPDATE CODE:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import pandas as pd
import numpy as np
import re
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.action_chains import ActionChains

# Set up the path to the chrome driver
driver = webdriver.Chrome()
html = driver.find_element_by_tag_name('html')

all_details = []
for c in range(1,2):
    try:
        driver.get("https://www.delpher.nl/nl/kranten/results?query=kernenergie&facets%5Bpapertitle%5D%5B%5D=Algemeen+Dagblad&facets%5Bpapertitle%5D%5B%5D=De+Volkskrant&facets%5Bpapertitle%5D%5B%5D=De+Telegraaf&facets%5Bpapertitle%5D%5B%5D=Trouw&page={}&sortfield=date&cql%5B%5D=(date+_gte_+%2201-01-1970%22)&cql%5B%5D=(date+_lte_+%2201-01-2018%22)&coll=ddd".format(c))
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        incategory = driver.find_elements_by_class_name("search-result")
        print(driver.current_url)
        links = [i.find_element_by_class_name("thumbnail search-result__thumbnail").get_attribute("href") for i in incategory]
        # Let's loop through each link to access the page of each book
        for link in links:
            # get one book url
            driver.get(link)
            # newspaper
            newspaper = driver.find_element_by_xpath("//*[@id='content']/div[2]/div/div[2]/header/h1/span[2]")
            # date of the article
            date = driver.find_element_by_xpath("//*[@id='content']/div[2]/div/div[2]/header/div/ul/li[1]")
            # click button and find title
            div_element = WebDriverWait(driver, 60).until(expected_conditions.presence_of_element_located((By.XPATH, '//*[@id="object"]/div/div/div')))
            hover = ActionChains(driver).move_to_element(div_element)
            hover.perform()
            div_element.click()
            button = WebDriverWait(driver, 90).until(expected_conditions.presence_of_element_located((By.XPATH, '//*[@id="object-viewer__ocr-button"]')))
            hover = ActionChains(driver).move_to_element(button)
            hover.perform()
            button.click()
            element = driver.find_element_by_css_selector(".object-viewer__ocr-panel-results")
            driver.execute_script("$(arguments[0]).click();", element)
            # content of article
            try:
                content = driver.find_elements_by_xpath("//*[contains(text(), 'kernenergie')]").text
            except:
                content = None
            # Define a dictionary with the details we need
            r = {
                "1Newspaper": newspaper.text,
                "2Date": date.text,
                "3Content": content,
            }
            # append r to all details
            all_details.append(r)
    except Exception as e:
        print(str(e))
        pass

# save the information into a CSV file
df = pd.DataFrame(all_details)
df = df.to_string()
time.sleep(3)
driver.close()
So you have some problems.
driver.implicitly_wait(10)
should only be called once per session.
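For example, a minimal sketch of setting it once, right after creating the driver:

from selenium import webdriver

driver = webdriver.Chrome()
driver.implicitly_wait(10)  # applies to every find_element* lookup for this session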
links = [i.find_element_by_class_name("search-result__thumbnail-link").get_attribute("href") for i in incategory]
is a more useful way to get all the links.
print(driver.current_url)
could replace
print("https://www.delpher.nl/nl/kranten/results?query=kernenergie&facets%5Bpapertitle%5D%5B%5D=Algemeen+Dagblad&facets%5Bpapertitle%5D%5B%5D=De+Volkskrant&facets%5Bpapertitle%5D%5B%5D=De+Telegraaf&facets%5Bpapertitle%5D%5B%5D=Trouw&page={}&sortfield=date&cql%5B%5D=(date+_gte_+%2201-01-1970%22)&cql%5B%5D=(date+_lte_+%2201-01-2018%22)&coll=ddd".format(c))
There is no need for url=link:
for link in links:
    driver.get(link)
Your title actually isn't retrieved on the second page. Use something like this for all the values:
try:
    content = driver.find_element_by_xpath('//*[@id="object-viewer__ocr-panel"]/div[2]/div[5]').text
except:
    content = None

# Define a dictionary
r = {
    "1Newspaper": newspaper,
    "2Date": date,
    "3Title": title,
    "4Content": content,
}
You can replace your bare exception with the following to figure out which line is the problem:
except Exception as e:
    print(str(e))
    pass
It might be that the button you are trying to reach is inside an iframe, which means you have to switch to it before searching for the XPath:
iframe = driver.find_element_by_tag_name('iframe')
driver.switch_to.frame(iframe)
There is also a possibility that the object you're trying to click is not visible yet, which can be solved with an explicit wait.
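A rough sketch of such a wait, reusing the OCR-button XPath from the question (assuming the usual WebDriverWait imports):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# block for up to 10 seconds until the button is both visible and enabled
button = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, '//*[@id="object-viewer__ocr-button"]'))
)
button.click()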

Not sure how to get elements from dynamically loading webpage using selenium

So I am scraping reviews and skin types from Sephora and have run into a problem identifying how to get elements off the page.
Sephora.com loads reviews dynamically as you scroll down the page, so I have switched from Beautiful Soup to Selenium to get the reviews.
The reviews have no ID, no name, and no CSS identifier that seems to be stable. The XPath isn't recognized whenever I try to use it, whether copied from Chrome or from Firefox.
Here is an example of the HTML from the inspected element that I loaded in Chrome:
[Screenshot: Inspect Element view from the desired page]
My Attempts thus far:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome("/Users/myName/Downloads/chromedriver")
url = 'https://www.sephora.com/product/the-porefessional-face-primer-P264900'
driver.get(url)
reviews = driver.find_elements_by_xpath(
    "//div[@id='ratings-reviews']//div[@data-comp='Ellipsis Box ']")
print("REVIEWS:", reviews)
Output:
| => /Users/myName/anaconda3/bin/python "/Users/myName/Documents/ScrapeyFile Group/attempt32.py"
REVIEWS: []
(base)
So basically an empty list.
ATTEMPT 2:
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
# Open up a Firefox browser and navigate to web page.
driver = webdriver.Firefox()
driver.get(
    "https://www.sephora.com/product/squalane-antioxidant-cleansing-oil-P416560?skuId=2051902&om_mmc=ppc-GG_1165716902_56760225087_pla-420378096665_2051902_257731959107_9061275_c&country_switch=us&lang=en&ds_rl=1261471&gclid=EAIaIQobChMIisW0iLbK6AIVaR6tBh005wUTEAYYBCABEgJVdvD_BwE&gclsrc=aw.ds"
)
#Scroll to bottom of page b/c its dynamically loading
html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.END)
#scrape stats and comments
comments = driver.find_elements_by_css_selector("div.css-7rv8g1")
print("!!!!!!Comments!!!!!")
print(comments)
OUTPUT:
| => /Users/MYNAME/anaconda3/bin/python /Users/MYNAME/Downloads/attempt33.py
!!!!!!Comments!!!!!
[]
(base)
Empty again. :(
I get the same results when I try to use different element selectors:
#scrape stats and comments
comments = driver.find_elements_by_class_name("css-7rv8g1")
I also get nothing when I tried this:
comments = driver.find_elements_by_xpath(
    "//div[@data-comp='GridCell Box']//div[@data-comp='Ellipsis Box ']")
and this (notice the space after 'Ellipsis Box' is gone):
comments = driver.find_elements_by_xpath(
    "//div[@data-comp='GridCell Box']//div[@data-comp='Ellipsis Box']")
I have tried using the solutions outlined here and here, but to no avail. I think there is something about the page or Selenium that I am missing, since this is my first time using Selenium and I'm a super newbie :(
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
import time

driver = webdriver.Chrome(executable_path=r"")
driver.maximize_window()
wait = WebDriverWait(driver, 20)
driver.get("https://www.sephora.fr/p/black-ink---classic-line-felt-liner---eyeliner-feutre-precis-waterproof-P3622017.html")

scrolls = 1
while True:
    scrolls -= 1
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(3)
    if scrolls < 0:
        break

reviewText = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//ol[@class='bv-content-list bv-content-list-reviews']//li//div[@class='bv-content-summary-body']//div[1]")))
for textreview in reviewText:
    print(textreview.text)
I've been scraping reviews from Sephora and basically, even if there is plenty of room for improvement, it works like this:
Clicks on "reviews" to access the reviews
Loads all reviews by scrolling until there aren't any reviews left to load
Finds review text and skin type by CSS selector
def load_all_reviews(driver):
    while True:
        try:
            driver.execute_script(
                "arguments[0].scrollIntoView(true);",
                WebDriverWait(driver, 10).until(
                    EC.visibility_of_element_located(
                        (By.CSS_SELECTOR, ".bv-content-btn-pages-load-more")
                    )
                ),
            )
            driver.execute_script(
                "arguments[0].click();",
                WebDriverWait(driver, 20).until(
                    EC.element_to_be_clickable(
                        (By.CSS_SELECTOR, ".bv-content-btn-pages-load-more")
                    )
                ),
            )
        except Exception as e:
            break
def get_review_text(review):
    try:
        return review.find_element(By.CLASS_NAME, "bv-content-summary-body-text").text
    except:
        return "NA"  # in case it doesn't find a review

def get_skin_type(review):
    try:
        return review.find_element(By.XPATH, '//*[@id="BVRRContainer"]/div/div/div/div/ol/li[2]/div[1]/div/div[2]/div[5]/ul/li[4]/span[2]').text
    except:
        return "NA"  # in case it doesn't find a skin type
To use those, you've got to create a webdriver and first call the load_all_reviews() function.
Then you've got to find the reviews with:
reviews = driver.find_elements(By.CSS_SELECTOR, ".bv-content-review")
and finally, for each review, you can call the get_review_text() and get_skin_type() functions:
for review in reviews:
    print(get_review_text(review))
    print(get_skin_type(review))
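Putting it all together, a rough usage sketch (product_url is a placeholder for whichever Sephora product page you're scraping):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get(product_url)  # placeholder: the Sephora product page to scrape
load_all_reviews(driver)  # keep clicking 'load more' until no reviews remain
reviews = driver.find_elements(By.CSS_SELECTOR, ".bv-content-review")
for review in reviews:
    print(get_review_text(review))
    print(get_skin_type(review))
driver.quit()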

Unable to click on 'more' button cyclically to get all the full reviews

I've created a script in Python, in combination with Selenium, to fetch all the reviews from a certain page of Google Maps. There are lots of reviews on that page and they are only visible once the page is scrolled downward. My script can handle all of them successfully.
However, the only issue I'm facing at this moment is that some of the reviews have a More button, which is meant to be clicked in order to show the full review.
One such page is the one linked in my script below.
I've tried with:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
link = "https://www.google.com/maps/place/Pizzeria+Di+Matteo/#40.8512552,14.255779,17z/data=!4m7!3m6!1s0x133b0841ef6e38e5:0xece6ea09987e9baf!8m2!3d40.8512512!4d14.2579677!9m1!1b1"
driver = webdriver.Chrome()
driver.get(link)
wait = WebDriverWait(driver,10)
while True:
    try:
        elem = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "[class='section-loading-spinner']")))
        driver.execute_script("arguments[0].scrollIntoView();", elem)
    except Exception:
        break

for see_more in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "button[class^='section-expand-review']"))):
    see_more.click()

for item in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".section-review-content"))):
    name = item.find_element_by_css_selector("[class='section-review-title'] > span").text
    try:
        review = item.find_element_by_css_selector("[class='section-review-text']").text
    except AttributeError:
        review = ""
    print(name)

driver.quit()
Currently the above script throws a stale element error when it hits the see_more.click() call inside the for see_more in wait.until(...) loop.
How can I click on that More button cyclically to get all the full reviews?
If you use WebDriverWait with presence_of_all_elements_located, it waits for the elements within the given time, and if an element is no longer attached to the HTML you will receive an error.
Instead, check whether the element is present on the webpage, and only click it if it is there.
if len(driver.find_elements_by_css_selector("button[class^='section-expand-review']")) > 0:
    driver.find_element_by_css_selector("button[class^='section-expand-review']").click()
Here is the code.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
link = "https://www.google.com/maps/place/Ecstasy/#23.7399982,90.3732109,17z/data=!3m1!4b1!4m7!3m6!1s0x3755b8caa669d5e3:0x41f47ddcc39a556e!8m2!3d23.7399933!4d90.3753996!9m1!1b1"
driver = webdriver.Chrome()
driver.get(link)
wait = WebDriverWait(driver,10)
while True:
    try:
        elem = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "[class='section-loading-spinner']")))
        driver.execute_script("arguments[0].scrollIntoView();", elem)
    except Exception:
        break

if len(driver.find_elements_by_css_selector("button[class^='section-expand-review']")) > 0:
    driver.find_element_by_css_selector("button[class^='section-expand-review']").click()
    print('pass')

for item in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".section-review-content"))):
    name = item.find_element_by_css_selector("[class='section-review-title'] > span").text
    try:
        review = item.find_element_by_css_selector("[class='section-review-text']").text
    except AttributeError:
        review = ""
    print(name)

driver.quit()
EDITED
if len(driver.find_elements_by_css_selector("button[class^='section-expand-review']")) > 0:
    for item in driver.find_elements_by_css_selector("button[class^='section-expand-review']"):
        item.location_once_scrolled_into_view
        item.click()
        time.sleep(2)
This worked for me; you can put it within a for loop or your method to get all the reviews:
try:
    driver.find_element_by_class_name("mapsConsumerUiSubviewSectionReview__section-expand-review").click()
except:
    continue
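For example, a sketch of that inside the review loop from the question (using the class name from the snippet above, and clicking per review item rather than globally):

for item in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".section-review-content"))):
    # expand this review's full text if it has a 'More' button
    try:
        item.find_element_by_class_name("mapsConsumerUiSubviewSectionReview__section-expand-review").click()
    except Exception:
        pass  # no 'More' button on this review
    review = item.find_element_by_css_selector("[class='section-review-text']").text
    print(review)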

Can't get rid of "stale element" error while running my script

I've written a script in Python with Selenium. The script is supposed to click on some links on a webpage. When I run my script, it does click on the first link, but then throws a stale element reference: element is not attached to the page document error instead of moving on to the next link. I've searched a lot over the last few hours for a solution to get rid of this error, but no luck.
I'm not interested in the links' data, so any solution other than fixing the clicking process is not what I'm looking for. How can I click on the links through to the last one?
This is my attempt so far:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
def click_links(driver, url):
    driver.get(url)
    for olink in wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "result-row__item-hover-visualizer"))):
        olink.click()
        time.sleep(3)

if __name__ == '__main__':
    weblink = "https://www.hitta.se/s%C3%B6k?vad=Markiser+%26+Persienner"
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 10)
    try:
        click_links(driver, weblink)
    finally:
        driver.quit()
You can try the code below:
def click_links(driver, url):
    driver.get(url)
    links_len = len(wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "result-row__item-hover-visualizer"))))
    for index in range(links_len):
        cookies_bar = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '[data-bind="visible: showCookieDialog"]')))
        driver.execute_script("arguments[0].hidden='true';", cookies_bar)
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'button[data-track="click-show-more"]'))).click()
        entry = wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "result-row__item-hover-visualizer")))[index]
        entry.click()
        time.sleep(3)
        driver.back()
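The key idea here is that the links are re-located on every iteration (fetched fresh and picked by index) rather than reused from the first lookup, so after each click and driver.back() you never hold a reference to an element from a previous page load, which is what triggers the stale element error.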
