I'm trying to find all the elements on a page that match a specific selector, but not all of them are found.
The code I'm using:
from selenium import webdriver
from selenium.webdriver.common.by import By
PATH = "C:\Program Files (x86)\chromedriver.exe"#path of chrome driver
driver = webdriver.Chrome(PATH)#accesses the chrome driver
web = driver.get("https://www.eduqas.co.uk/qualifications/computer-science-as-a-level/#tab_pastpapers")#website
driver.maximize_window()
driver.implicitly_wait(3)
driver.execute_script("window.scrollTo(0, 540)")
driver.implicitly_wait(3)
elements = driver.find_elements(By.CSS_SELECTOR, ".css-13punl2")
driver.implicitly_wait(3)
for x in elements:
    x.click()
print(len(elements))
When I print the length of the list "elements" it returns 1, even though there are multiple elements on the web page with the selector ".css-13punl2".
link to the website: https://www.eduqas.co.uk/qualifications/computer-science-as-a-level/#tab_pastpapers
For some reason, when I inspect the web page there will sometimes be 6 elements with selector ".css-13punl2" and sometimes there will be 7, but I'm not too sure.
Is the selector stable?
I'm not very familiar with Selenium in Python, but from what I know, some element attributes change at runtime.
My advice:
Add a sleep of 30 seconds, open the console (F12) in the browser window that the driver opened, and run the following command:
$$(".css-13punl2")
If it returns only 1 element, then you have found the problem.
It may also return 6 elements, most of which are invisible.
Could you also provide a screenshot of the page itself, or a link to it?
EDITED ANSWER:
try this selector:
#pastpapers_content button
You are trying to wait for the elements with driver.implicitly_wait(3). This method activates the implicit wait. It only needs to be set once; you don't need to call it again. Once it is set, Selenium keeps trying to find the element for up to 3 seconds (in this case) instead of making a single attempt immediately after the page loads. In your case find_elements returns as soon as it finds one element and doesn't wait for all the elements to appear on the page.
You need to give the page some time to load everything. For that you can use sleep, for example. It pauses script execution for the number of seconds you set.
Also, the elements on this page move out of view, so you need to scroll each time you click an element. Additionally, you need to close the 'Cookie' prompt because it intercepts the clicks.
I guess you need the elements labelled with a year, so it's better to skip clicking 'GCSE', which this selector also matches.
So the code will look like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep
PATH = "C:\Program Files (x86)\chromedriver.exe" # path of chrome driver
driver = webdriver.Chrome(PATH) # accesses the chrome driver
driver.get("https://www.eduqas.co.uk/qualifications/computer-science-as-a-level/#tab_pastpapers") # website
driver.maximize_window()
driver.implicitly_wait(3)
driver.execute_script("window.scrollTo(0, 540)")
sleep(3) # Giving time to fully load the content
elements = driver.find_elements(By.CSS_SELECTOR, ".css-13punl2")
driver.find_element(By.ID, 'accept-cookies').click() # Closes the cookies prompt
for x in elements:
    if x.text == 'GCSE':
        continue
    x.click()
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") # this scrolls the page to the bottom
    sleep(1) # This sleep is necessary to give time to finish scrolling
print(len(elements))
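One way to avoid guessing a sleep duration is to poll until the number of matches stops changing. The following is a minimal sketch and not part of the answer above: wait_for_stable_count and its parameters are hypothetical names, and find stands for any zero-argument callable, for example lambda: driver.find_elements(By.CSS_SELECTOR, ".css-13punl2").

```python
import time


def wait_for_stable_count(find, timeout=10.0, poll=0.5, stable_polls=2):
    """Call find() repeatedly until its result has the same non-zero length
    for `stable_polls` consecutive polls, or until `timeout` seconds pass.

    Returns the most recent result either way.
    """
    deadline = time.monotonic() + timeout
    result = find()
    same = 1  # consecutive polls that returned this length
    while time.monotonic() < deadline:
        if len(result) > 0 and same >= stable_polls:
            break
        time.sleep(poll)
        latest = find()
        same = same + 1 if len(latest) == len(result) else 1
        result = latest
    return result
```

A stable count is only a heuristic: a page that loads content in slow bursts can still look stable between bursts, so the poll interval may need tuning for the site.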
Don't call implicitly_wait() more than once. Try the code below; it clicks on each year:
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# driver is the webdriver.Chrome instance created in the question
driver.get("https://www.eduqas.co.uk/qualifications/computer-science-as-a-level/#tab_pastpapers")
wait = WebDriverWait(driver, 10)
time.sleep(1)
# to click on Accept Cookies button
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#accept-cookies"))).click()
# waiting for list of years to appear and scrolling to it
content = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#pastpapers_content")))
driver.execute_script("arguments[0].scrollIntoView(true)", content)
# get all the elements
elements = wait.until(EC.presence_of_all_elements_located((By.XPATH, ".//*[@id='pastpapers_content']//button[@type='button']")))
print("Total elements: ", len(elements))
for i in range(len(elements)):
    # re-locate each button by its 1-based position
    ele = driver.find_element(By.XPATH, "(.//*[@id='pastpapers_content']//button[@type='button'])[" + str(i + 1) + "]")
    time.sleep(1)
    ele.location_once_scrolled_into_view  # accessing this property scrolls the element into view
    time.sleep(1)
    ele.click()
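The loop above re-locates each button by position with an XPath of the form (expression)[n]. A small sketch of how that string can be built, assuming the attribute tests are written with @ as XPath requires; nth_match_xpath and BUTTONS are hypothetical names used only for illustration.

```python
def nth_match_xpath(base_xpath, index):
    """Return an XPath selecting the match at zero-based `index` of base_xpath.

    XPath positions are 1-based, and the parentheses make the position apply
    to the whole result set rather than to each parent separately.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    return f"({base_xpath})[{index + 1}]"


BUTTONS = ".//*[@id='pastpapers_content']//button[@type='button']"
print(nth_match_xpath(BUTTONS, 0))
```

The result can be passed to driver.find_element(By.XPATH, ...) inside the loop in place of the string concatenation.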
Selenium does not see a displayed element as visible on the second iteration.
I click on a link, and a box within a website appears. I need to close that box.
This action will be performed 1000+ times. On the first iteration, Selenium opens the link and closes the box. On the second iteration, Selenium opens the link but cannot close the box. At that point, it gives this error message:
Exception has occurred: ElementNotInteractableException Message: element not interactable (Session info: chrome=105.0.5195.102)
My code + HTML of relevant element below.
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(chrome_options=options, executable_path=r"D:\SeleniumDriver\chromedriver.exe")
driver.get('https://sprawozdaniaopp.niw.gov.pl/')
find_button = driver.find_element("id", "btnsearch")
find_button.click()
interesting_links = driver.find_elements(By.CLASS_NAME, "dialog")
for i in range(len(interesting_links)):
    interesting_links[i].click()
    time.sleep(10) # I tried 60 seconds, no change
    #
    # HERE I WOULD DO MY THINGS
    #
    close_box = driver.find_element(By.CLASS_NAME, "ui-dialog-titlebar-close")
    print(close_box.is_displayed())
    close_box.click() # Here is where the program crashes on the 2nd iteration
    if i == 4: # Stop the program after 5 iterations
        break
HTML code of the relevant element:
<span class="ui-icon ui-icon-closethick">close</span>
I tried to locate the element that closes the box by CSS selector and by XPath.
The CSS selector of the X/close button is the same every time, but
Selenium sees the X button as displayed only the first time.
The XPath is strange. The first time a link is opened, the X/close button has the path:
/html/body/div[6]/div[1]/a
However, if you open the next link, the path looks like this:
/html/body/div[8]/div[1]/a
Let me know what you think of that :-)
This is one way to achieve your goal:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time as t
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('disable-notifications')
chrome_options.add_argument("window-size=1280,720")
webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(browser, 20)
url = 'https://sprawozdaniaopp.niw.gov.pl/'
browser.get(url)
wait.until(EC.element_to_be_clickable((By.ID, "btnsearch"))).click()
links = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "a[class='dialog']")))
counter = 0
for link in links[:5]:
    link.click()
    print('clicked link', link.text)
    ### do your stuff ###
    t.sleep(1)
    wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'span[class="ui-icon ui-icon-closethick"]')))[counter].click()
    print('closed the popup')
    counter = counter+1
This will print out in terminal:
clicked link STOWARZYSZENIE POMOCY DZIECIOM Z PORAŻENIEM MÓZGOWYM "JASNY CEL"
closed the popup
clicked link FUNDACJA NA RZECZ POMOCY DZIECIOM Z GRODZIEŃSZCZYZNY
closed the popup
clicked link FUNDACJA "ADAMA"
closed the popup
clicked link KUJAWSKO-POMORSKI ZWIĄZEK LEKKIEJ ATLETYKI
closed the popup
clicked link "RYBNICKI KLUB PIŁKARSKI - SZKÓŁKA PIŁKARSKA ROW W RYBNIKU"
closed the popup
Every time you click on a link, a new popup is created. When you close it, that popup is not removed from the page; it stays hidden. So when you click a new link and want to close the new popup, you need to select the new (nth) close button. The same applies to the other elements inside the popup, so make sure you account for it. I stopped after the 5th link; you will need to remove the slicing to handle all the links on the page.
The Selenium setup above is for chromedriver on Linux; pay attention to the imports and to the code after the browser (driver) is defined.
Selenium documentation can be found at https://www.selenium.dev/documentation/
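An alternative to tracking a counter is to pick whichever close button is currently displayed. This is only a sketch under two assumptions that the thread suggests but does not confirm: hidden popups report is_displayed() as False, and only one popup is visible at a time. first_displayed is a hypothetical helper name.

```python
def first_displayed(elements):
    """Return the first element whose is_displayed() is true, else None."""
    for element in elements:
        if element.is_displayed():
            return element
    return None


# Possible use with Selenium (not executed here):
#   buttons = browser.find_elements(By.CSS_SELECTOR, "span.ui-icon-closethick")
#   button = first_displayed(buttons)
#   if button is not None:
#       button.click()
```

This avoids relying on the order in which popups were added to the page, at the cost of one is_displayed() call per hidden popup.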
Objective:
Need to open a webpage, find urls by class and then open each url in a new tab in chrome
What I tried:
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome(executable_path="C:\\Users\xxx\Downloads\chromedriver_win32\chromedriver.exe")
driver.maximize_window()
driver.get("https://xxxxxxxx.com")
elements = driver.find_elements_by_class_name("post-comments")
for e in elements:
    e.click()
driver.quit()
current output:
It just opens the first URL in the same parent window and then stops. Focus does not return to the base URL, so it cannot open the second and later URLs in new tabs.
Please let me know your suggestions.
Update 1: when I looked into relative XPath, I learned that I need a pattern match (regexp) so that I can find all links containing "respond".
For that I tried to use
driver.find_element_by_css_selector("a[href*='respond']]")
but it raised an error saying there is no find_element.
Update 2:
After some trials, I can get most of the output I need with the code below. However, it works on the first page only. When it finishes the first page, there is a 'NextPage' button; I need to click it to open the next page, which becomes the new main page, and then run the same code again on that page. How can I make the code click the 'NextPage' button once the elements on the current page are done and then repeat the logic for the new page? Please suggest.
from selenium import webdriver
import time
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome(executable_path="C:\\Users\xxx\Downloads\chromedriver_win32\chromedriver.exe")
driver.maximize_window()
driver.get("https://xxxxxxxx.com")
current_windows_handle = driver.current_window_handle
elements = driver.find_elements_by_css_selector('h2.post-title.entry-title>a')
for ele in (elements):
    url = [ele.get_attribute("href")]
    for x in url:
        time.sleep(2)
        driver.execute_script("window.open('');")
        handles = driver.window_handles
        driver.switch_to.window(handles[1])
        driver.get(x)
        driver.find_element_by_xpath("//span[@class='mb-text'][contains(.,'Posts')]").click()
        time.sleep(2)
        driver.close()
        driver.switch_to.window(driver.window_handles[1])
        time.sleep(5)
        actualurl = driver.current_url
        with open(r'c:\test\new.txt', 'a') as text_file:
            text_file.write(actualurl + '\n')
        driver.close()
        time.sleep(5)
        driver.switch_to.window(current_windows_handle)
driver.close()
driver.quit()
The error is valid, because
driver.find_element_by_css_selector("a[href*='respond']]")
contains a syntax error in the selector (an extra closing bracket). Since the selector is inside a string, Python itself does not flag it.
Try this instead:
elements = driver.find_elements_by_css_selector("a[href*='respond']")
and then print the length like this:
print(len(elements))
I am assuming that the list will contain some elements, which you can then iterate over. Please see below; it should help you figure out how to simulate those actions.
current_windows_handle = driver.current_window_handle
elements = driver.find_elements_by_css_selector("a[href*='respond']")
for ele in elements:
    time.sleep(2)
    driver.execute_script("window.open('');") # open a new tab.
    handles = driver.window_handles
    driver.switch_to.window(handles[1])
    driver.get(ele.get_attribute('href'))
    time.sleep(2)
    # Now it would have loaded a new url in new tab.
    # do whatever you wanna do.
    # Now close the new tab
    driver.close()
    time.sleep(2)
    driver.switch_to.window(current_windows_handle)
Update 1:
current_windows_handle = driver.current_window_handle
elements = driver.find_elements_by_css_selector("a[href*='respond']")
j = 0
for ele in range(len(elements)):
    time.sleep(2)
    elements = driver.find_elements_by_css_selector("a[href*='respond']")
    driver.execute_script("window.open('');") # open a new tab.
    handles = driver.window_handles
    driver.switch_to.window(handles[1])
    driver.get(elements[j].get_attribute('href'))
    time.sleep(2)
    j = j + 1
    # Now it would have loaded a new url in new tab.
    # do whatever you wanna do.
    # Now close the new tab
    driver.close()
    time.sleep(2)
    driver.switch_to.window(current_windows_handle)
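Two ideas from this thread can be sketched without a browser. Reading every href into a plain string before leaving the page means the loop no longer depends on elements that may go stale. The 'NextPage' question from Update 2 can be handled by wrapping the per-page logic in a loop. Both helpers below are hypothetical names, and the locator for the 'NextPage' button is not given in the question, so it is left to the caller.

```python
def collect_hrefs(elements):
    """Read href from each element up front, skipping empties and duplicates."""
    seen = set()
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs


def for_each_page(process_page, go_to_next_page, max_pages=100):
    """Run process_page() on the current page, then go_to_next_page().

    go_to_next_page() should return False when there is no next page.
    Returns the number of pages processed.
    """
    pages = 0
    while pages < max_pages:
        process_page()
        pages += 1
        if not go_to_next_page():
            break
    return pages
```

With Selenium, go_to_next_page could look up the 'NextPage' link with find_elements, click it when the list is non-empty, and return False otherwise. The max_pages cap is a safety net against a next link that never disappears.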
I am trying to scrape the headers of Wikipedia pages as an exercise, and I want to be able to distinguish between headers with "h2" and "h3" tags.
Therefore I wrote this code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys #For being able to input key presses
import time #Useful for if your browser is faster than your code
PATH = r"C:\Users\Alireza\Desktop\chromedriver\chromedriver.exe" #Location of the chromedriver
driver = webdriver.Chrome(PATH)
driver.get("https://de.wikipedia.org/wiki/Alpha-Beta-Suche") #Open website in Chrome
print(driver.title) #Print title of the website to console
h1Header = driver.find_element_by_tag_name("h1") #Find the first heading in the article
h2HeaderTexts = driver.find_elements_by_tag_name("h2") #List of all other major headers in the article
h3HeaderTexts = driver.find_elements_by_tag_name("h3") #List of all minor headers in the article
for items in h2HeaderTexts:
    scor = items.find_element_by_class_name("mw-headline")
driver.quit()
However, this does not work and the program does not terminate.
Does anybody have a solution for this?
The problem lies in the for loop. Apparently I cannot find any elements by class name (or anything else) within the elements in h2HeaderTexts, although this should be possible.
You can filter in the XPath itself:
from selenium import webdriver
from selenium.webdriver.common.by import By
PATH = r"C:\Users\Alireza\Desktop\chromedriver\chromedriver.exe" #Location of the chromedriver
driver = webdriver.Chrome(PATH)
driver.maximize_window()
driver.implicitly_wait(30)
driver.get("https://de.wikipedia.org/wiki/Alpha-Beta-Suche") #Open website in Chrome
print(driver.title)
for item in driver.find_elements(By.XPATH, "//h2/span[@class='mw-headline']"):
    print(item.text)
This should give you the h2 headings whose child span has the class mw-headline.
Output:
Informelle Beschreibung
Der Algorithmus
Implementierung
Optimierungen
Vergleich von Minimax und AlphaBeta
Geschichte
Literatur
Weblinks
Fußnoten
Process finished with exit code 0
Update 1:
The reason your program does not finish is that, if you look at the page's HTML source, the first h2 tag has no child span with class mw-headline, so Selenium is trying to locate an element that is not in the DOM. Note also that find_elements returns a list of web elements when matches are found and an empty list when none are, so it does not raise an exception; find_element, which your loop uses, raises NoSuchElementException instead.
You should wait until the elements appear on the page before accessing them.
Also, there are several elements with the h1 tag name on that page.
To search for elements inside another element, use an XPath that starts with a dot. Otherwise the search covers the entire page and returns the first match there.
The first h2 element on that page has no element with class name mw-headline inside it, so you should handle that case too.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys #For being able to input key presses
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time #Useful for if your browser is faster than your code
PATH = r"C:\Users\Alireza\Desktop\chromedriver\chromedriver.exe" #Location of the chromedriver
driver = webdriver.Chrome(PATH)
wait = WebDriverWait(driver, 20)
driver.get("https://de.wikipedia.org/wiki/Alpha-Beta-Suche") #Open website in Chrome
print(driver.title) #Print title of the website to console
wait.until(EC.visibility_of_element_located((By.XPATH, "//h1")))
h1Headers = driver.find_elements_by_tag_name("h1") #List of all h1 headings in the article
h2HeaderTexts = driver.find_elements_by_tag_name("h2") #List of all other major headers in the article
h3HeaderTexts = driver.find_elements_by_tag_name("h3") #List of all minor headers in the article
for items in h2HeaderTexts:
    scor = items.find_elements_by_xpath(".//span[@class='mw-headline']")
    if scor:
        #do what you need with the scor[0] element, for example:
        print(scor[0].text)
driver.quit()
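The effect of the leading dot, and of returning a list instead of raising, can be tried without a browser. The sketch below swaps Selenium for the standard library's xml.etree.ElementTree, which accepts the same relative path syntax for this simple case. The PAGE snippet is made up for illustration and headline_texts is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Tiny made-up document: the first h2 has no mw-headline span.
PAGE = """
<body>
  <h2>Inhaltsverzeichnis</h2>
  <h2><span class="mw-headline">Der Algorithmus</span></h2>
  <h2><span class="mw-headline">Implementierung</span></h2>
</body>
"""

root = ET.fromstring(PAGE.strip())


def headline_texts(parent):
    """Return headline text for each h2 under parent, or None where missing."""
    texts = []
    for h2 in parent.findall(".//h2"):
        # The leading dot scopes the search to this h2 only.
        spans = h2.findall(".//span[@class='mw-headline']")
        texts.append(spans[0].text if spans else None)
    return texts


print(headline_texts(root))
```

Because findall returns an empty list for the first h2, the loop continues instead of stopping at the missing span, which mirrors what find_elements does in Selenium.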
Your version does not finish executing because Selenium raises an exception when it cannot locate an element.
Developers do not like to use try/except, but I personally have not found a better workaround. If you do:
for items in h2HeaderTexts:
    try:
        scor = items.find_element_by_class_name('mw-headline').text
        print(scor)
    except Exception:
        print('nothing found')
You will notice that it runs to the end and you get a result.
I basically want to click every "load more" button on the page before running the rest of my code, because otherwise I won't be able to access each profile.
There are 2 problems:
First, how do I even access the button? I tried methods similar to the fancyCompLabel part of my code, but they don't work.
Second, I'm not sure how I should loop through all the buttons, since I assume the second button only starts loading once the first one has been clicked.
Here's the relevant HTML for the button:
<span type="button" class="md-text-button button-orange-white" onclick="loadFollowing();">mehr anzeigen</span>
Here's the code to access each profile, but as you can see it only runs until the first load button.
from bs4 import BeautifulSoup
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
time.sleep(3)
# Set some Selenium Options
options = webdriver.ChromeOptions()
# options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# Webdriver
wd = webdriver.Chrome(executable_path='/usr/bin/chromedriver', options=options)
# URL
url = 'https://www.techpilot.de/zulieferer-suchen?laserschneiden%202d%20(laserstrahlschneiden)'
# Load URL
wd.get(url)
# Get HTML
soup = BeautifulSoup(wd.page_source, 'html.parser')
wait = WebDriverWait(wd, 15)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#bodyJSP #CybotCookiebotDialogBodyLevelButtonLevelOptinAllowAll"))).click()
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "#efficientSearchIframe")))
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".hideFunctionalScrollbar #CybotCookiebotDialogBodyLevelButtonLevelOptinAllowAll"))).click()
#wd.switch_to.default_content() # you do not need to switch to default content because iframe is closed already
wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".fancyCompLabel")))
results = wd.find_elements_by_css_selector(".fancyCompLabel")
for profil in results:
    print(profil.text) #heres the rest of my code but its not relevant
wd.close()
As far as I can see, the second pop-up element, located by .hideFunctionalScrollbar #CybotCookiebotDialogBodyLevelButtonLevelOptinAllowAll, initially appears outside the visible screen.
So after switching to the iframe you need to scroll to that element before trying to click on it.
Also, presence_of_all_elements_located doesn't actually wait for all the elements to be present. It doesn't even know how many such elements there will be. It returns as soon as it finds at least 1 element matching the passed locator.
So I'd advise adding a short sleep after that line to allow all those elements to actually load.
from selenium.webdriver.common.action_chains import ActionChains
wait = WebDriverWait(wd, 15)
actions = ActionChains(wd)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#bodyJSP #CybotCookiebotDialogBodyLevelButtonLevelOptinAllowAll"))).click()
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "#efficientSearchIframe")))
second_pop_up = wd.find_element(By.CSS_SELECTOR, ".hideFunctionalScrollbar #CybotCookiebotDialogBodyLevelButtonLevelOptinAllowAll")
actions.move_to_element(second_pop_up).perform() # scroll the second pop-up into view
time.sleep(0.5)
second_pop_up.click()
wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".fancyCompLabel")))
time.sleep(0.5) # give the remaining elements time to load
results = wd.find_elements(By.CSS_SELECTOR, ".fancyCompLabel")
for profil in results:
    print(profil.text) #heres the rest of my code but its not relevant
wd.close()
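The question's second problem, clicking every "mehr anzeigen" button when each one may only appear after the previous click, is not covered by the code above. One possible approach is to look the button up again after every click and stop when none is left. This is a sketch with hypothetical names: click_until_gone takes two callables so that it can be tried without a browser.

```python
def click_until_gone(find_buttons, click, max_clicks=50):
    """Click the first button returned by find_buttons() until none is left.

    find_buttons: zero-argument callable returning a list of buttons.
    click: callable that clicks one button and waits for new content.
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        buttons = find_buttons()  # look up again; earlier references may be stale
        if not buttons:
            break
        click(buttons[0])
        clicks += 1
    return clicks
```

With Selenium, find_buttons might be lambda: wd.find_elements(By.CSS_SELECTOR, "span.md-text-button.button-orange-white"), based on the class attribute in the HTML from the question. Whether that selector matches only the load button on the real page is an assumption, and click would need a short wait after each click so the new results can load.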
I'm trying to automate the search process on this website: https://www.bcbsga.com/health-insurance/provider-directory/searchcriteria
The process involves clicking the "Continue" button to search in 'guest' mode. The next page has a list of drop-down items to refine the search criteria. My code either raises the "Element not visible" exception (which I corrected by using a wait) or times out. Please help.
Here's my code:
# navigate to the desired page
driver.get("https://www.bcbsga.com/health-insurance/provider-directory/searchcriteria")
# get the guest button
btnGuest = driver.find_element_by_id("btnGuestContinue")
#click the guest button
btnGuest.click()
wait = WebDriverWait(driver,10)
#Find a Doctor Search Criteria page
element = wait.until(EC.visibility_of_element_located((By.ID,"ctl00_MainContent_maincontent_PFPlanQuestionnaire_ddlQuestionnaireInsurance")))
lstGetInsurance = Select(element)
lstGetInsurance.select_by_value("BuyMyself$14States")
# close the browser window
#driver.quit()
You can type the value into the drop-down's search input and press Return (Keys.RETURN):
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
divID = 'ctl00_MainContent_maincontent_PFPlanQuestionnaire_ddlQuestionnaireInsurance_chosen'
inputID = 'ctl00_MainContent_maincontent_PFPlanQuestionnaire_ddlQuestionnaireInsurance_chosen_input'
inputValue = 'I buy it myself (or plan to buy it myself)'
driver = webdriver.Chrome()
driver.get("https://www.bcbsga.com/health-insurance/provider-directory/searchcriteria")
driver.find_element_by_id("btnGuestContinue").click()
driver.implicitly_wait(10)
driver.find_element_by_id(divID).click()
driver.find_element_by_id(inputID).send_keys(inputValue)
driver.find_element_by_id(inputID).send_keys(Keys.RETURN)
time.sleep(6)
driver.close()