Selenium Python - Find elements by class name is not working when new elements are loaded - python

I'm writing a Python program to access a website and click buttons that have the same class name.
Conditions:
There are 6 buttons provided per one time.
When a button is clicked, it disappears.
After 6 buttons are clicked, only the section contains the 6 buttons refreshes and the new 6 buttons appear. (then loop like this for a certain amount of times)
The problem is that I can get it to click only the 6 buttons that appear for the first time. After a section contains the buttons refreshes, find_elements_by_class_name cannot find the new 6 buttons.
Question: Is there a way to get find_elements_by_class_name or any alternatives to find the new 6 buttons every time the section refreshes?
Thank you so much in advance for your help. Highly appreciate!
Here is a part of my code.
webdriver_path = '/xxxxxxx/xxxxx/xxxx'
options = Options()
options.headless = True
options.add_argument('--window-size=1920x1080')
driver = webdriver.Chrome(options=options, executable_path=webdriver_path)
driver.get('https://xxxxx.xxx/')
click_n = 1
While True:
all_clicks = driver.find_elements_by_class_name('button_me')
all_clicks[click_n].click()
print('Click SUCESSFUL')
click_n +=1
# if already 6 clicks wait for refresh
if click_n = 6:
driver.implicitly_wait(10)
click_n -= 6
continue
else:
# Wait till click success notice appear
NoticeNCon = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//*[#id="Hint"]/div/b[contains(text(),"success")]')))

Related

How can I scroll a pop up element from Abnb with Selenium on Python?

hey guys I'm scrapping abnb with Selenium-Python. The issue is that I want to scrap the reviews from each abnb but in order to scrap them I need to scroll down the pop up element that appears.
As you can see on the image I click the button then the pop-up element appears and then I want to scroll but I can't. Of course as you can see all the reviews don't appear and I have to scroll if I want them to appear, they also don't appear on the DevTools from google chrome unless I scroll.
I tried:
reviews_lista = []
boton_revs = driver.find_element(By.XPATH, "//button[#data-testid='pdp-show-all-reviews-button']")
boton_revs.click()
I clicked on the button that you see on the image.
I selected the whole POP UP
bloque_pop = driver.find_element(By.XPATH, "//div[#class='_14vzertx']")
I tried to scroll with this code but it didn't work
scroll = 0
while scroll < 5: # scroll 5 times
driver.execute_script('arguments[0].scrollTop = arguments[0].scrollTop + arguments[0].offsetHeight;', bloque_pop)
time.sleep(1)
scroll += 1
I also tried
driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")
and this too:
bloque_pop = driver.find_element(By.XPATH, "//div[#class='_14vzertx']")
for i in range(5):
#driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", bloque_pop)
driver.execute_script("window.scrollTo(0,document.body.scrollHeight)", bloque_pop)
time.sleep(1.5)
None of them work.
The easiest way to scroll inside a popup is to scrape elements contained in the popup and then scroll to the last element by using the javascript command scrollIntoView.
popup_reviews = driver.find_elements(By.CSS_SELECTOR, 'div[role=dialog] div[data-review-id]')
driver.execute_script('arguments[0].scrollIntoView({block: "center", behavior: "smooth"});', popup_reviews[-1])

Selenium - HTML doesn't always update after a click. Content in the browser changes, but I often get the same HTML from prior to the click

I'm trying to set up a simple webscraping script to pull every hyperlink from the discover cards on Bandcamp.
Here is my code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.select import Select
browser = webdriver.Chrome()
all_links = []
page = 1
url = "https://bandcamp.com/?g=all&s=new&p=0&gn=0&f=digital&w=-1"
browser.get(url)
while page < 6:
page += 1
# wait until discover cards are loaded
test = WebDriverWait(browser, 20).until(EC.element_to_be_clickable(
(By.XPATH, '//*[#id="discover"]/div[9]/div[2]/div/div[1]/div/table/tbody/tr[1]/td[1]/a/div')))
# scrape hyperlinks for each of the 8 albums shown
titles = browser.find_elements(By.CLASS_NAME, "item-title")
links = [title.get_attribute('href') for title in titles[-8:]]
all_links = all_links + links
print(links)
# pagination - click through the page buttons as the links are scraped
page_nums = browser.find_elements(By.CLASS_NAME, 'item-page')
for page_num in page_nums:
if page_num.text.isnumeric():
if int(page_num.text) == page:
page_num.click()
time.sleep(20) # I've tried multiple long wait times as well as WebDriverWaits on different elements to see if the HTML will update, but I haven't seen a positive effect
break
I'm using print(links) to see where this is going wrong. In the selenium browser, it clicks through the pages well. Note that pagination via the url parameters doesn't seem possible as the discover cards often won't load unless you click the page buttons towards the bottom of my picture. BetterSoup and Requests don't work either for the same reason. The print function is returning the following:
['https://neubauten.bandcamp.com/album/stimmen-reste-musterhaus-7?from=discover-new', 'https://cirka1.bandcamp.com/album/time?from=discover-new', 'https://futuramusicsound.bandcamp.com/album/yoga-meditation?from=discover-new', 'https://deathsoundbatrecordings.bandcamp.com/album/real-mushrooms-dsbep092?from=discover-new', 'https://riacurley.bandcamp.com/album/take-me-album?from=discover-new', 'https://terracuna.bandcamp.com/album/el-origen-del-viento?from=discover-new', 'https://hyper-music.bandcamp.com/album/hypermusic-vol-4?from=discover-new', 'https://defisis1.bandcamp.com/album/priceless?from=discover-new']
['https://jarnosalo.bandcamp.com/album/here-lies-ancient-blob?from=discover-new', 'https://andreneitzel.bandcamp.com/album/allegasi-gold-2?from=discover-new', 'https://moonraccoon.bandcamp.com/album/prequels?from=discover-new', 'https://lolivone.bandcamp.com/album/live-at-the-berklee-performance-center?from=discover-new', 'https://nilswrasse.bandcamp.com/album/a-calling-from-the-desert-to-the-sea-original-motion-picture-soundtrack?from=discover-new', 'https://whitereaperaskingride.bandcamp.com/album/asking-for-a-ride?from=discover-new', 'https://collageeffect.bandcamp.com/album/emerald-network?from=discover-new', 'https://foxteethnj.bandcamp.com/album/through-the-blue?from=discover-new']
['https://jarnosalo.bandcamp.com/album/here-lies-ancient-blob?from=discover-new', 'https://andreneitzel.bandcamp.com/album/allegasi-gold-2?from=discover-new', 'https://moonraccoon.bandcamp.com/album/prequels?from=discover-new', 'https://lolivone.bandcamp.com/album/live-at-the-berklee-performance-center?from=discover-new', 'https://nilswrasse.bandcamp.com/album/a-calling-from-the-desert-to-the-sea-original-motion-picture-soundtrack?from=discover-new', 'https://whitereaperaskingride.bandcamp.com/album/asking-for-a-ride?from=discover-new', 'https://collageeffect.bandcamp.com/album/emerald-network?from=discover-new', 'https://foxteethnj.bandcamp.com/album/through-the-blue?from=discover-new']
['https://jarnosalo.bandcamp.com/album/here-lies-ancient-blob?from=discover-new', 'https://andreneitzel.bandcamp.com/album/allegasi-gold-2?from=discover-new', 'https://moonraccoon.bandcamp.com/album/prequels?from=discover-new', 'https://lolivone.bandcamp.com/album/live-at-the-berklee-performance-center?from=discover-new', 'https://nilswrasse.bandcamp.com/album/a-calling-from-the-desert-to-the-sea-original-motion-picture-soundtrack?from=discover-new', 'https://whitereaperaskingride.bandcamp.com/album/asking-for-a-ride?from=discover-new', 'https://collageeffect.bandcamp.com/album/emerald-network?from=discover-new', 'https://foxteethnj.bandcamp.com/album/through-the-blue?from=discover-new']
['https://finitysounds.bandcamp.com/album/kreme?from=discover-new', 'https://mylittlerobotfriend.bandcamp.com/album/amen-break?from=discover-new', 'https://electrinityband.bandcamp.com/album/rise?from=discover-new', 'https://abyssal-void.bandcamp.com/album/ritualist?from=discover-new', 'https://plataformarecs.bandcamp.com/album/v-a-david-lynch-experience?from=discover-new', 'https://hurricaneturtles.bandcamp.com/album/industrial-synth?from=discover-new', 'https://blackwashband.bandcamp.com/album/2?from=discover-new', 'https://worldwide-bitchin-records.bandcamp.com/album/wack?from=discover-new']
Each time it correctly pulls the first 8 albums on page 1, then for pages 2-4 it repeats the 8 albums on page 2, for pages 5-7 it repeats the 8 albums on page 5, and so on. Even though the page is updating (and the url changes) in the selenium browser, for some reason selenium is not recognizing any changes to the html so it repeats the same titles. Any idea where I've gone wrong?
Your definition of titles, i.e.
titles = browser.find_elements(By.CLASS_NAME, "item-title")
is a bad idea because item-title is the class of many elements in the page. Then another bad idea is to pick titles[-8:]. It may sounds good because you think ok each time I click a page the new elements are added at the end, but this is not always the case. Your case is one of those were elements are not added sequentially.
So let's start by considering a class exclusive of cards. For example discover-item. Then open the DevTools, press CTRL+F and enter .discover-item. When the url is first loaded, it will find 8 results. Now click next page, now it finds 16 results, click again and will find 24 results. To better see what's going I suggest you to run the following each time you click on the "next" button.
el = driver.find_elements(By.CSS_SELECTOR, '.discover-item')
for i,e in enumerate(el):
print(i,e.get_attribute('innerText').replace('\n',' - '))
In particular, when arriving to page 3, you will see that the first item shown in page 1 (which in my case is 0 and friends - Jacob Stanley - alternative), is now printed at a different position (in my case 8 and friends - Jacob Stanley - alternative). What happened is that the items of page 3 were added at the beginning of the list, and so you can see why titles[-8:] was a bad choice.
So a better choice is to consider all cards each time you go to the next page, instead of the last 8 only (notice that the HTML of this site can contain no more than 24 cards), and then add all current cards to a set (since a set cannot contain duplicates, only new elements will be added).
# scroll to cards
cards = WebDriverWait(driver,20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".discover-item")))
driver.execute_script('arguments[0].scrollIntoView({block: "center", behavior: "smooth"});', cards[0])
time.sleep(1)
items = set()
while 1:
links = driver.find_elements(By.CSS_SELECTOR, '.discover-item .item-title')
# extract info and store it
for idx,card in enumerate(cards):
tit_art_gen = card.get_attribute('innerText').replace('\n',' - ')
href = links[idx].get_attribute('href')
# print(idx, tit_art_gen)
items.add(tit_art_gen + ' - ' + href)
# click 'next' button if it is not disabled
next_button = driver.find_element(By.XPATH, "//a[.='next']")
if 'disabled' in next_button.get_attribute('class'):
print('last page reached')
break
else:
next_button.click()
# wait until new elements are loaded
cards_new = cards.copy()
while cards_new == cards:
cards_new = driver.find_elements(By.CSS_SELECTOR, '.discover-item')
time.sleep(.5)
cards = cards_new.copy()

How to break a loop if certain element is disabled and get text from multiple pages in Selenium Python

I am a new learner for python and selenium. I have written a code to extract data from multiple pages but there is certain problem in the code.
I am not able to break the a while loop function which clicks on next page until there is an option. The next page element disables after reaching the last page but code sill runs.
xpath: '//button[#aria-label="Next page"]'
Full SPAN: class="awsui_icon_h11ix_31bp4_98 awsui_size-normal-mapped-height_h11ix_31bp4_151 awsui_size-normal_h11ix_31bp4_147 awsui_variant-normal_h11ix_31bp4_219"
I am able to get the list of data which I want to extract from the webpage but I am getting on the last page data when I close the webpage from my end, ending the while loop.
Full Code:
opts = webdriver.ChromeOptions()
opts.headless = True
driver = webdriver.Chrome(ChromeDriverManager().install())
base_url = "XYZ"
driver.maximize_window()
driver.get(base_url)
driver.set_page_load_timeout(50)
element = WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.ID, 'all-my-groups')))
driver.find_element(by=By.XPATH, value = '//*[#id="sim-issueListContent"]/div[1]/div/div/div[2]/div[1]/span/div/input').send_keys('No Stock')
dfs = []
page_counter = 0
while True:
wait = WebDriverWait(driver, 30)
wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//div[contains(#class, 'alias-wrapper sim-ellipsis sim-list--shortId')]")))
cards = driver.find_elements_by_xpath("//div[contains(#class, 'alias-wrapper sim-ellipsis sim-list--shortId')]")
sims = []
for card in cards:
sims.append([card.text])
df = pd.DataFrame(sims)
dfs.append(df)
print(page_counter)
page_counter+=1
try:
wait.until(EC.element_to_be_clickable((By.XPATH,'//button[#aria-label="Next page"]'))).click()
except:
break
driver.close()
driver.quit()
I am also, attaching the image of the class and sorry I cannot share the URL as it of private domain.
The easiest option is to let your wait.until() fail via timeout when the "Next page" button is missing. Right now your line wait = WebDriverWait(driver, 30) is setting the timeout to 30 seconds; assuming the page normally loads much faster than that, you could change the timeout to be 5 seconds and then the loop will end faster once you're at the last page. If your page load times are sometimes slow then you should make sure the timeout won't accidentally cut off too early; if the load times are consistently fast then you might be able to get away with an even shorter timeout interval.
Alternatively, you could look through the specific target webpage more carefully to find some element that a) is always present and b) can be used to determine whether we're on the final page or not. Then you could read the value of that element and decide whether to break the loop before trying to find the "Next page" button. This could save a couple of seconds of waiting on the final loop iteration (avoid waiting for timeout) but may not be worth the trouble.
Change the below condtion
try:
wait.until(EC.element_to_be_clickable((By.XPATH,'//button[#aria-label="Next page"]'))).click()
except:
break
as shown in the below pseduocode #disabled is the diff that will make sure to exit the while loop if the button is disabled.
if(driver.find_elements_by_xpath('//button[#aria-label="Next page"][#disabled]'))).size()>0)
break
else
driver.find_element_by_xpath('//button[#aria-label="Next page"]').click()

Trouble with accessing dropdown values while scraping using Python Selenium

I am new to web scrapping. I am trying to scrape a website (for getting flight prices) using Python Selenium which has some drop down menu for fields like departure city, arrival city. The first option listen should be accessed and used.
Lets say that the field departure city is button, on click of that button, a new html page will be loaded with input box and a list set is available with every list element consisting of a button.
While debugging I found that, once my keyword is entered the options loading screen appears but options are not getting loaded even after increasing the sleep time (I have attached the image for the same).
image with error
Manually there are no issues with the drop down menu. As soon as I enter my departure city, options will be loaded and I choose the respective location.
I also tried to use actionchains. Unfortunately there is no luck. I am attaching my code below. Any help is appreciated.
#Manually trying to access the first element:**
#Accessing Input Field:**
fly_from = browser.find_element("xpath", "//button[contains(., 'Departing')]").click()
element = WebDriverWait(browser,10).until(ec.presence_of_element_located((By.ID,"location")))
#Sending the keyword:**
element.send_keys("Los Angeles")
#Selecting the first element:**
first_elem = browser.find_element("xpath", "//div[#class='classname default-padding']//div//ul//li[0]]//button")
Using Action Chains:
fly_from = browser.find_element("xpath", "//button[contains(., 'Departing')]").click()
time.sleep(10)
element = WebDriverWait(browser, 10).until(ec.presence_of_element_located((By.ID, "location")))
element.send_keys("Los Angeles")
time.sleep(3)
list_elem = browser.find_element("xpath", "//div[#class='class name default-padding']//div//ul//li[0]]//button")
size=element.size
action = ActionChains(browser)
action.move_to_element_with_offset(to_element=list_elem,xoffset =0.5*size['width'], yoffset=70).click().perform()
action.click(on_element = list_elem)
action.perform()

Python Selenium how to use an existing chromedriver window?

I am making an automated python script which opens chromedriver on a loop until it finds a specific element on the webpage (using selenium) the driver gets. This obviously eats up recourses eventually as it is constantly opening and closing the driver while on the loop.
Is there a way to use an existing chromedriver window instead of just opening and closing on a loop until a conditional is satisfied?
If that is not possible is there an alternative way to go about this you would reccomend?
Thanks!
Script:
from selenium.webdriver.common.keys import Keys
from selenium import webdriver
import pyautogui
import time
import os
def snkrs():
driver = webdriver.Chrome('/Users/me/Desktop/Random/chromedriver')
driver.get('https://www.nike.com/launch/?s=in-stock')
time.sleep(3)
pyautogui.click(184,451)
pyautogui.click(184,451)
current = driver.current_url
driver.get(current)
time.sleep(3.5)
elem = driver.find_element_by_xpath("//* .
[#id='j_s17368440']/div[2]/aside/div[1]/h1")
ihtml = elem.get_attribute('innerHTML')
if ihtml == 'MOON RACER':
os.system("clear")
print("SNKR has not dropped")
time.sleep(1)
else:
print("SNKR has dropped")
pyautogui.click(1303,380)
pyautogui.hotkey('command', 't')
pyautogui.typewrite('python3 messages.py') # Notifies me by text
pyautogui.press('return')
pyautogui.click(928,248)
pyautogui.hotkey('ctrl', 'z') # Kills the bash loop
snkrs()
Bash loop file:
#!/bin/bash
while [ 1 ]
do
python snkrs.py
done
You are defining a method that contains the chromedriver launch and then running through the method once (not looping) so each method call generates a new browser instance. Instead of doing that, do something more like this...
url = 'https://www.nike.com/launch/?s=in-stock'
driver.get(url)
# toggle grid view
WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[aria-label='Show Products as List']"))).click();
# wait for shoes to drop
while not driver.find_elements((By.XPATH, "//div[#class='figcaption-content']//h3[contains(.,'MOON RACER')]"))
print("SNKR has not dropped")
time.sleep(300) // 300s = 5 mins, don't spam their site
driver.get(url)
print("SNKR has dropped")
I simplified your code, changed the locator, and added a loop. The script launches a browser (once), loads the site, clicks the grid view toggle button, and then looks for the desired shoe to be displayed in this list. If the shoes don't exist, it just sleeps for 5 mins, reloads the page, and tries again. There's no need to refresh the page every 1s. You're going to draw attention to yourself and the shoes aren't going to be refreshed on the site that often anyway.
If you're just trying to wait until something changes on the page then this should do the trick:
snkr_has_not_dropped = True
while snkr_has_not_dropped:
elem = driver.find_element_by_xpath("//* .[ # id = 'j_s17368440'] / div[2] / aside / div[1] / h1")
ihtml = elem.get_attribute('innerHTML')
if ihtml == 'MOON RACER':
print("SNKR has not dropped")
driver.refresh()
else:
print("SNKR has dropped")
snkr_has_not_dropped = False
Just need to refresh the page and try again.

Categories

Resources