Scroll modal window using Selenium in Python

I am trying to scrape links to song pages for some artists on genius.com, but I'm running into issues because the links to the individual song pages are displayed inside a popup modal window.
The modal window doesn't load all links in one go, and instead loads more content via ajax when you scroll down to the bottom of the modal.
I tried using code to scroll to the bottom of the page, but unfortunately that just scrolled the window behind the modal rather than the modal itself:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
So then I tried selecting the last element in the modal and scrolling to it (with the idea of doing that a few times until all song pages had been loaded), but it wouldn't scroll far enough to get the website to load more content:
last_element = driver.find_elements_by_xpath('//div[@class="mini_card-metadata"]')[-1]
last_element.location_once_scrolled_into_view
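For reference, the generic technique here is to scroll the modal's own scrollable container, rather than the window, by setting its scrollTop via JavaScript. A minimal sketch, where div.modal-scroll-area is a hypothetical selector standing in for the modal's real scrollable container:
# Hypothetical selector; inspect the modal to find the actual scrollable container
modal_scroll_area = driver.find_element_by_css_selector('div.modal-scroll-area')
driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight;", modal_scroll_area)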
Here is my code so far:
import os
from bs4 import BeautifulSoup
from selenium import webdriver
chrome_driver = "/Applications/chromedriver"
os.environ["webdriver.chrome.driver"] = chrome_driver
driver = webdriver.Chrome(chrome_driver)
base_url = 'https://genius.com/artists/Stormzy'
driver.get(base_url)
xpath_str = '//div[contains(text(),"Show all songs by Stormzy")]'
driver.find_element_by_xpath(xpath_str).click()
Is there a way to extract all the song page links for the artist?

Try the code below to get the required output:
from selenium import webdriver as web
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
driver = web.Chrome()
base_url = 'https://genius.com/artists/Stormzy'
driver.get(base_url)
# Open modal
driver.find_element_by_xpath('//div[normalize-space()="Show all songs by Stormzy"]').click()
song_locator = By.CSS_SELECTOR, 'a.mini_card.mini_card--small'
# Wait for first XHR complete
wait(driver, 10).until(EC.visibility_of_element_located(song_locator))
# Get current length of songs list
current_len = len(driver.find_elements(*song_locator))
while True:
    # Keep triggering new XHR loads for as long as possible
    driver.find_element(*song_locator).send_keys(Keys.END)
    try:
        wait(driver, 3).until(lambda x: len(driver.find_elements(*song_locator)) > current_len)
        current_len = len(driver.find_elements(*song_locator))
    except TimeoutException:
        # Return full list of songs
        songs_list = [song.get_attribute('href') for song in driver.find_elements(*song_locator)]
        break
print(songs_list)
This should let you request new XHR content until the length of the songs list stops growing, and finally return the list of links.

When you scroll to the bottom of the modal dialog, it calls
$scrollable_data_ctrl.load_next();
As an option, you can try executing it yourself until no new results appear in the modal:
driver.execute_script("$scrollable_data_ctrl.load_next();")
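A hedged sketch of that idea, combining it with the count-based wait from the answer above (assuming the same song_locator and wait alias, and that $scrollable_data_ctrl exists on the page):
from selenium.common.exceptions import TimeoutException

current_len = len(driver.find_elements(*song_locator))
while True:
    driver.execute_script("$scrollable_data_ctrl.load_next();")
    try:
        wait(driver, 3).until(lambda d: len(d.find_elements(*song_locator)) > current_len)
        current_len = len(driver.find_elements(*song_locator))
    except TimeoutException:
        break  # no new songs appeared, so everything is loaded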

Related

2nd 'Element not interactable exception' when iterating on the loop, SELENIUM PYTHON

Selenium does not see the displayed element as visible on the page on the second iteration.
I click on a link, and a box within a website appears. I need to close that box.
This action will be performed 1000+ times. On the first iteration, Selenium opens the link and closes the box. On the second iteration, Selenium opens the link but cannot close the box. At this point, it gives this error message:
Exception has occurred: ElementNotInteractableException Message: element not interactable (Session info: chrome=105.0.5195.102)
My code + HTML of relevant element below.
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(chrome_options=options, executable_path=r"D:\SeleniumDriver\chromedriver.exe")
driver.get('https://sprawozdaniaopp.niw.gov.pl/')
find_button = driver.find_element("id", "btnsearch")
find_button.click()
interesting_links = driver.find_elements(By.CLASS_NAME, "dialog")
for i in range(len(interesting_links)):
    interesting_links[i].click()
    time.sleep(10)  # I tried 60 seconds, no change
    #
    # HERE I WOULD DO MY THINGS
    #
    close_box = driver.find_element(By.CLASS_NAME, "ui-dialog-titlebar-close")
    print(close_box.is_displayed())
    close_box.click()  # Here is where the program crashes on the 2nd iteration
    if i == 4:  # Stop the program after 5 iterations
        break
HTML code of the relevant element:
<span class="ui-icon ui-icon-closethick">close</span>
I tried to locate the element that closes the box by CSS selector and by XPath.
The CSS selector of the X/close button is the same every time, but only on the first iteration does Selenium see the X button as displayed.
The XPath is strange. On the first opening of the link, the X/close button has this path:
/html/body/div[6]/div[1]/a
However, if you open the next link, the path looks like this:
/html/body/div[8]/div[1]/a
Let me know what you think of that :-)
This is one way to achieve your goal:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time as t
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('disable-notifications')
chrome_options.add_argument("window-size=1280,720")
webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(browser, 20)
url = 'https://sprawozdaniaopp.niw.gov.pl/'
browser.get(url)
wait.until(EC.element_to_be_clickable((By.ID, "btnsearch"))).click()
links = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "a[class='dialog']")))
counter = 0
for link in links[:5]:
    link.click()
    print('clicked link', link.text)
    ### do your stuff ###
    t.sleep(1)
    wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'span[class="ui-icon ui-icon-closethick"]')))[counter].click()
    print('closed the popup')
    counter = counter + 1
This will print out in terminal:
clicked link STOWARZYSZENIE POMOCY DZIECIOM Z PORAŻENIEM MÓZGOWYM "JASNY CEL"
closed the popup
clicked link FUNDACJA NA RZECZ POMOCY DZIECIOM Z GRODZIEŃSZCZYZNY
closed the popup
clicked link FUNDACJA "ADAMA"
closed the popup
clicked link KUJAWSKO-POMORSKI ZWIĄZEK LEKKIEJ ATLETYKI
closed the popup
clicked link "RYBNICKI KLUB PIŁKARSKI - SZKÓŁKA PIŁKARSKA ROW W RYBNIKU"
closed the popup
Every time you click on a link, a new popup is created. When you close it, that popup does not disappear from the DOM; it only stays hidden. So when you click on a new link and then want to close the new popup, you need to select the new (nth) close button. This should also apply to other popup elements, so make sure you account for it. I stopped after the 5th link; of course, you will need to remove the slicing to handle all the links present on the page.
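If you'd rather not keep a counter, a hedged alternative under the same assumption (each click appends a new, hidden popup to the DOM) is to always click the newest close button:
# The newest popup's close button is the last matching element in the DOM
close_buttons = browser.find_elements(By.CSS_SELECTOR, 'span[class="ui-icon ui-icon-closethick"]')
close_buttons[-1].click()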
The Selenium setup above is chromedriver on Linux; you just have to mirror the imports and the code after the browser (driver) is defined.
Selenium documentation can be found at https://www.selenium.dev/documentation/

Python Selenium can't find load more ("mehr anzeigen") button

I basically want to click every load more button on the page before running the rest of my code, because otherwise I won't be able to access each profile.
There are 2 problems:
First, how do I even access the button? I tried methods similar to the fancyCompLabel part of my code, but it won't work.
Second, I'm not sure how I should loop through all the buttons, since I assume the second button only starts loading once the first one is clicked.
Here's the relevant HTML part and a picture of the button:
<span type="button" class="md-text-button button-orange-white" onclick="loadFollowing();">mehr anzeigen</span>
Here's the code to access each profile, but as you can see it only runs until the first load button:
from bs4 import BeautifulSoup
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
time.sleep(3)
# Set some Selenium Options
options = webdriver.ChromeOptions()
# options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# Webdriver
wd = webdriver.Chrome(executable_path='/usr/bin/chromedriver', options=options)
# URL
url = 'https://www.techpilot.de/zulieferer-suchen?laserschneiden%202d%20(laserstrahlschneiden)'
# Load URL
wd.get(url)
# Get HTML
soup = BeautifulSoup(wd.page_source, 'html.parser')
wait = WebDriverWait(wd, 15)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#bodyJSP #CybotCookiebotDialogBodyLevelButtonLevelOptinAllowAll"))).click()
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "#efficientSearchIframe")))
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".hideFunctionalScrollbar #CybotCookiebotDialogBodyLevelButtonLevelOptinAllowAll"))).click()
#wd.switch_to.default_content() # you do not need to switch to default content because iframe is closed already
wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".fancyCompLabel")))
results = wd.find_elements_by_css_selector(".fancyCompLabel")
for profil in results:
    print(profil.text)  # here's the rest of my code but it's not relevant
wd.close()
As I see it, the second pop-up element, located by .hideFunctionalScrollbar #CybotCookiebotDialogBodyLevelButtonLevelOptinAllowAll, initially appears outside the visible screen.
So after switching to the iframe you need to scroll to that element before trying to click on it.
Also, presence_of_all_elements_located doesn't actually wait for all the elements to be present. It doesn't even know how many such elements there will be. It returns once it finds at least 1 element matching the passed locator.
So I'd advise adding a short sleep after that line to allow all those elements to actually load.
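If you want to avoid a fixed sleep, one possible sketch is a small custom wait predicate that polls until the element count stops changing between two checks (this is an assumption-driven helper, not a Selenium built-in):
def count_is_stable(locator, pause=0.5):
    # Truthy once two consecutive counts, taken `pause` seconds apart, match
    def _predicate(drv):
        before = len(drv.find_elements(*locator))
        time.sleep(pause)
        return len(drv.find_elements(*locator)) == before
    return _predicate

WebDriverWait(wd, 15).until(count_is_stable((By.CSS_SELECTOR, ".fancyCompLabel")))
With that said, here is the adjusted version of your code: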
from selenium.webdriver.common.action_chains import ActionChains

wait = WebDriverWait(wd, 15)
actions = ActionChains(wd)

wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#bodyJSP #CybotCookiebotDialogBodyLevelButtonLevelOptinAllowAll"))).click()
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "#efficientSearchIframe")))
second_pop_up = wd.find_element_by_css_selector(".hideFunctionalScrollbar #CybotCookiebotDialogBodyLevelButtonLevelOptinAllowAll")
actions.move_to_element(second_pop_up).build().perform()
time.sleep(0.5)
second_pop_up.click()
wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".fancyCompLabel")))
time.sleep(0.5)
results = wd.find_elements_by_css_selector(".fancyCompLabel")  # re-query the profiles after the wait
for profil in results:
    print(profil.text)  # the rest of the code is not relevant here
wd.close()
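As for the actual goal of clicking every "mehr anzeigen" (load more) button, a hedged sketch based on the posted HTML (assuming the span stays clickable until no more profiles remain) is a loop that clicks until the wait times out:
from selenium.common.exceptions import TimeoutException

while True:
    try:
        more_button = WebDriverWait(wd, 5).until(EC.element_to_be_clickable(
            (By.CSS_SELECTOR, "span.md-text-button.button-orange-white")))
        wd.execute_script("arguments[0].click();", more_button)  # JS click sidesteps overlay issues
        time.sleep(1)  # give the newly loaded profiles a moment to render
    except TimeoutException:
        break  # no clickable "mehr anzeigen" button left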

Python, Selenium. Google Chrome. Web Scraping. How to navigate between 'tabs' in website

I'm quite a noob in Python and am currently building a web scraper in Selenium that should take all the URLs for products in the clicked 'tab' on a web page. But my code takes the URLs from the first 'tab'. Code below. Thank you guys. I'm starting to get kind of frustrated lol.
Screenshot
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
from lxml import html
PATH = r'C:\Program Files (x86)\chromedriver.exe'  # raw string so the backslashes aren't treated as escapes
driver = webdriver.Chrome(PATH)
url = 'https://www.alza.sk/vypredaj-akcia-zlava/e0.htm'
driver.get(url)
driver.find_element_by_xpath('//*[@id="tabs"]/ul/li[2]').click()
links = []
try:
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'blockFilter')))
    link = driver.find_elements_by_xpath("//a[@class='name browsinglink impression-binded']")
    for i in link:
        links.append(i.get_attribute('href'))
finally:
    driver.quit()
print(links)
To get the current tab:
current_tab = driver.current_window_handle
To switch between tabs (the first form is deprecated in newer Selenium versions):
driver.switch_to_window(driver.window_handles[1])
driver.switch_to.window(driver.window_handles[-1])
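A small sketch putting these together (remember the original handle, switch to the newest window, then switch back):
original_tab = driver.current_window_handle
driver.switch_to.window(driver.window_handles[-1])  # newest tab/window
# ... scrape the newly opened tab here ...
driver.switch_to.window(original_tab)  # back to where we started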
Assuming you have the link element that opens the new tab as tab_link, you should try:
from selenium.webdriver.common.action_chains import ActionChains
action = ActionChains(driver)
action.key_down(Keys.CONTROL).click(tab_link).key_up(Keys.CONTROL).perform()
Also, apparently the li doesn't have a click event. Are you sure the element you are getting with '//*[@id="tabs"]/ul/li[2]' has the aria-selected property set to true, or any of these classes: ui-tabs-active ui-state-active?
If not, you should call click on the a tag inside this li, as sketched below.
Then you should increase the timeout parameter of your WebDriverWait to guarantee that the div is loaded.
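A hedged sketch of that advice, assuming the li wraps an a tag (verify the exact structure in devtools):
# Click the <a> inside the second tab's <li>, then wait for that tab's content
tab_link = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(
    (By.XPATH, '//*[@id="tabs"]/ul/li[2]/a')))
tab_link.click()
WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.CLASS_NAME, 'blockFilter')))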

Python Selenium 'clicks' element but nothing happens

So I am trying to scrape a website, and I want to click an element, go to the page that opens from the click, find another element and click that one. The first click seems to work with no errors, but the next page doesn't open, so I get an error. Here is a screenshot of what I want to click on the first page: https://prnt.sc/10l8xa4. Clicking that should redirect to the second page. The problem seems to be that the driver clicks the element but nothing happens:
import sys, csv, os
from selenium import webdriver # Selenium 3.141.0
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
from datetime import datetime
from time import sleep
class Scraper(object):
    '''A lot of the messy code is just playing with the tags from the page'''

    def __init__(self, link):
        self.link = link
        self.driver = self.configure_driver()  # The simulated browser

    # Configuring the browser simulator, named driver, that will get all the information
    def configure_driver(self):
        # Add additional Options to the webdriver
        chrome_options = Options()
        # Add the argument to make the browser headless; it will work smoother & faster but it will miss the first category
        # chrome_options.add_argument("--headless")
        driver = webdriver.Chrome(options=chrome_options)
        return driver

    def click_element(self, selector):  # Clicks the provided element from the page, even if not visible
        element = WebDriverWait(self.driver, 20).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector)))
        ActionChains(self.driver).move_to_element(element).click(element).perform()

if __name__ == '__main__':
    product_link = 'https://www.action.com/nl-nl/click-and-collect-producten/'  # An example of a product
    app = Scraper(product_link)
    with app.driver:
        app.driver.get(product_link)
        app.click_element('a.content-card.has-text.card-theme--light.card-size--s.card-align--bottom-left')  # This gets clicked and should open a new page, but it doesn't
        sleep(10)
        app.click_element('a.product-card__link')  # This throws a Timeout because the element can't be found, which is obvious since the second page (which has this element) didn't open
        sleep(20)
Try it like this:
with app.driver:
    app.driver.get(product_link)
    sleep(2)
    app.click_element('li.has-submenu')
    sleep(2)
    app.click_element('div.grid-item.grid-item--content')  # This gets clicked and should open the new page
    sleep(2)
You should add a step of opening the pop-up and then click on your target button.
Code with the locator strategy as a required argument, using XPath:
def click_element(self, selector, by=By.CSS_SELECTOR):  # Clicks the provided element from the page, even if not visible
    element = WebDriverWait(self.driver, 20).until(
        EC.presence_of_element_located((by, selector)))
    ActionChains(self.driver).move_to_element(element).click(element).perform()

if __name__ == '__main__':
    product_link = 'https://www.action.com/nl-nl/click-and-collect-producten/'  # An example of a product
    app = Scraper(product_link)
    with app.driver:
        app.driver.get(product_link)
        sleep(2)
        app.click_element("//section[@class='grid']/div[@class='grid-item grid-item--content'][1]", By.XPATH)
        sleep(2)
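One more hedged variation on click_element: presence_of_element_located can return an element that is in the DOM but not yet interactable, so waiting for clickability may match the "clicks but nothing happens" symptom better:
def click_element(self, selector, by=By.CSS_SELECTOR):
    # Wait until the element is actually clickable, not merely present in the DOM
    element = WebDriverWait(self.driver, 20).until(
        EC.element_to_be_clickable((by, selector)))
    ActionChains(self.driver).move_to_element(element).click(element).perform()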

How can I get the next page's reviews with Selenium?

I'm trying to scrape more than 10 pages of reviews from https://www.innisfree.com/kr/ko/ProductReviewList.do.
However, when I move to the next page and try to get the new page's reviews, I still get the first page's reviews only.
I used driver.execute_script("goPage(2)") and also time.sleep(5), but my code only gives me the first page's reviews.
(I did not use a for-loop, just to see whether the results differ between page 1 and page 2. I imported BeautifulSoup and Selenium.)
Here is my code:
url = "https://www.innisfree.com/kr/ko/ProductReviewList.do"
chromedriver = r'C:\Users\hhm\Downloads\chromedriver_win32\chromedriver.exe'
driver = webdriver.Chrome(chromedriver)
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'html.parser')  # parsed once here and never refreshed below

print("this is page 1")
driver.execute_script("goPage(1)")
nTypes = soup.select('.reviewList ul .newType div[class^=reviewCon] .reviewConTxt')
for nType in nTypes:
    product = nType.select_one('.pdtName').text
    print(product)

print('\n')
print("this is page 2")
driver.execute_script("goPage(2)")
time.sleep(5)
nTypes = soup.select('.reviewList ul .newType div[class^=reviewCon] .reviewConTxt')
for nType in nTypes:
    product = nType.select_one('.pdtName').text
    print(product)
If your second page opens as a new window, then you need to switch to that page, i.e. switch your Selenium control to the other window.
Example:
# Opens a new tab
self.driver.execute_script("window.open()")
# Switch to the newly opened tab
self.driver.switch_to.window(self.driver.window_handles[1])
Source:
How to switch to new window in Selenium for Python?
https://www.techbeamers.com/switch-between-windows-selenium-python/
Try the following code. You need to click on each pagination link to reach the next page. You will get all 100 review comments.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import time
url = "https://www.innisfree.com/kr/ko/ProductReviewList.do"
chromedriver = r'C:\Users\hhm\Downloads\chromedriver_win32\chromedriver.exe'
driver = webdriver.Chrome(chromedriver)
driver.get(url)
for i in range(2, 12):
    time.sleep(2)
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    nTypes = soup.select('.reviewList ul .newType div[class^=reviewCon] .reviewConTxt')
    for nType in nTypes:
        product = nType.select_one('.pdtName').text
        print(product)
    if i == 11:
        break
    nextbutton = WebDriverWait(driver, 10).until(EC.element_to_be_clickable(
        (By.XPATH, "//span[@class='num']/a[text()='" + str(i) + "']")))
    driver.execute_script("arguments[0].click();", nextbutton)
