Selenium's element.text is too slow - Python

Chrome version: 105.0.5195.102
Selenium == 4.4.3
Python == 3.9.12
On a certain page, element.text takes ~0.x seconds, which is unbearable. I assumed element.text would simply return text from an already-loaded page, so I can't understand why it takes so long. How can I make it faster?
Here are similar Q&As, but I need to solve the problem with Selenium alone.
Parse text with BeautifulSoup
Parse text with lxml
Another question: why does each element.text call take a different amount of time?
For example,
import time

import chromedriver_autoinstaller
from selenium import webdriver
from selenium.webdriver.common.by import By

chromedriver_autoinstaller.install(cwd=True)
options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument('--no-sandbox')
options.add_argument('--ignore-certificate-errors')
options.add_argument('--disable-dev-shm-usage')
options.add_experimental_option("excludeSwitches", ["enable-logging"])
wd = webdriver.Chrome(options=options)
wd.get("https://www.bbc.com/")

t0 = time.time()
e = wd.find_element(By.CSS_SELECTOR, "#page > section.module.module--header > h2")
print(time.time() - t0)

for i in range(10):
    t0 = time.time()
    txt = e.text
    print(time.time() - t0)  # This prints a different result on every loop.

wd.quit()

Selenium can be a bit slow because it does not talk to Chrome directly: every command travels over a communication channel through the ChromeDriver executable.
If you want a faster automation tool, try Playwright.
Another thing you can try is to locate your element more directly, rather than with a long CSS or XPath expression: the longer the expression, the longer it takes to find the element and read its text.
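For example, a more direct locator might look like this (the id below is an illustrative assumption, not taken from the page):
# Hypothetical: if the target element had an id, an id lookup would be
# more direct than the long CSS chain used above.
e = wd.find_element(By.ID, "headline")  # "headline" is an assumed id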

I see the following output for your code:
0.0139617919921875
0.01196908950805664
0.003988742828369141
0.004987955093383789
0.003988027572631836
0.0039899349212646484
0.003989219665527344
0.004987955093383789
0.003987789154052734
0.003989696502685547
0.0049860477447509766
The first 2 timings are about 12-14 milliseconds, while the others are about 4 milliseconds.
The first action,
wd.find_element(By.CSS_SELECTOR, "#page > section.module.module--header > h2")
polls the DOM until it finds an element matching the given locator.
The txt = e.text line, on the other hand, uses an already existing reference to the element on the page, so it does not perform any polling or searching; it just accesses the element through the existing reference (pointer). That's why it takes significantly less time.
Why the second timing is as long as the first, I'm not sure.
I ran this test several times and got different outputs, but the overall picture was roughly the same.
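Note that every e.text call is still a separate WebDriver command - a round-trip over HTTP to ChromeDriver - so if you need the same text repeatedly, read it once and reuse it, or batch the read into a single execute_script() call. A minimal sketch (textContent can differ from .text, since it includes hidden text):
# One JavaScript round-trip instead of repeated e.text calls;
# textContent skips Selenium's rendered-text computation.
txt = wd.execute_script("return arguments[0].textContent;", e)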

Related

Printing web search results won't work in Selenium script, but works when I type it into the shell

I'm very new and learning web scraping in Python by trying to get the search results from the website below after a user types in some information, and then print the results. Everything works great up until the very last 2 lines of this script. When I include them in the script, nothing happens. However, when I remove them and just type them into the shell after the script is done running, they work exactly as I'd intended. Can you think of a reason this is happening? As I'm a beginner, I'm also open to any easier solutions. All feedback is welcome. Thank you!
#Setup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import time
#Open Chrome
driver = webdriver.Chrome()
driver.get("https://myutilities.seattle.gov/eportal/#/accountlookup/calendar")
#Wait for site to load
time.sleep(10)
#Click on street address search box
elem = driver.find_element(By.ID, 'sa')
elem.click()
#Get input from the user
addr = input('Please enter part of your address and press enter to search.\n')
#Enter user input into search box
elem.send_keys(addr)
#Get search results
elem = driver.find_element(By.XPATH, ('/html/body/app-root/main/div/div/account-lookup/div/section/div[2]/div[2]/div/div/form/div/ul/li/div/div[1]'))
print(elem.text)
I haven't used Selenium in a while, so I can only point you in the right direction. It seems to me you need to iterate over the individual entries, and print those, as opposed to printing the entire div as one element.
You should remove the parentheses from the XPath expression.
You can also shorten the XPath expression as follows:
Code:
elems = driver.find_elements(By.XPATH, '//*[@class="addressResults"]/div')
for elem in elems:
    print(elem.text)
You are using an absolute XPath; what you should be looking into is a relative XPath.
Something like this should do it:
elems = driver.find_elements(By.XPATH, "//*[@id='addressResults']/div")
for elem in elems:
    ...
I ended up figuring out my problem - I just needed to add a bit that waits until the search results actually load before proceeding with the script. Tossing in a time.sleep(5) did the trick. Eventually I'll replace that with a proper check that an element has loaded (see the sketch below), but this lets me continue for now. Thanks everyone for your answers!
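A minimal sketch of that explicit wait, assuming the addressResults container used in the answers above:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for at least one result row to appear.
results = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.XPATH, "//*[@class='addressResults']/div")))
for result in results:
    print(result.text)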

In Selenium, when I search with XPath, how do I capture the element two positions before?

In Python 3 and Selenium I have this script to automate searching for terms on a site with public information:
from selenium import webdriver
# Driver Path
CHROME = '/usr/bin/google-chrome'
CHROMEDRIVER = '/home/abraji/Documentos/Code/chromedriver_linux64/chromedriver'
# Chosen browser options
chrome_options = webdriver.chrome.options.Options()
chrome_options.add_argument('--window-size=1920,1080')
chrome_options.binary_location = CHROME
# Website accessed
link = 'https://pjd.tjgo.jus.br/BuscaProcessoPublica?PaginaAtual=2&Passo=7'
# Search term
nome = "MARCONI FERREIRA PERILLO JUNIOR"
# Waiting time
wait = 60
# Open browser
browser = webdriver.Chrome(CHROMEDRIVER, options = chrome_options)
# Implicit wait
browser.implicitly_wait(wait)
# Access the link
browser.get(link)
# Search by term
browser.find_element_by_xpath("//*[@id='NomeParte']").send_keys(nome)
browser.find_element_by_xpath("//*[@id='btnBuscarProcPublico']").click()
# Searches for the text of the last icon - the last page button
element = browser.find_element_by_xpath("//*[@id='divTabela']/div[2]/div[2]/div[4]/div[2]/ul/li[9]/a").text
element
'»'
When you search for terms, this site paginates the results and always shows the "»" button as the last pagination button.
The next-to-last button will be "›".
So I need to capture the text of the button that is always two positions before the last one - in this case the number "8" - to automate the page changes: that way I will know how many clicks on "next page" will be needed.
When I search with XPath, how do I capture the element two positions before?
I know this is not an answer to the original question, but clicking the next button several times is not good practice.
I checked the network traffic and saw that they are calling their API URL with an offset parameter. You should be able to use this URL with the proper offset as you need.
If you really need the element two positions before the last, get all the navigation buttons first and then access the one you want by index:
elems = browser.find_elements_by_xpath(xpath)
elems[-3]  # last is "»" (-1), next-to-last is "›" (-2), so "8" is -3
EDIT
I just tested their API and it works when the proper cookie value is given.
This approach is much faster than automation with Selenium.
Use Selenium just to extract the cookie value to be used in the header of the web request, as sketched below.
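The endpoint and the offset parameter below are assumptions inferred from the network-traffic description above, not a documented API:
import requests

# Copy the cookies from Selenium's session into a plain HTTP request.
cookies = {c['name']: c['value'] for c in browser.get_cookies()}
resp = requests.get(
    'https://pjd.tjgo.jus.br/BuscaProcessoPublica',  # hypothetical endpoint
    params={'offset': 0},                            # hypothetical parameter
    cookies=cookies,
)
print(resp.status_code)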

How to scroll a page to the end

I have tried to do this:
driver_1.execute_script("window.scrollTo(0, document.body.scrollHeight);")
but it does nothing, so I made a loop to scroll the page by steps:
initial_value = 0
end = 300000
for i in xrange(1000, end, 1000):
    driver_1.execute_script("window.scrollTo(" + str(initial_value) + ', ' + str(i) + ")")
    time.sleep(0.5)
    initial_value = i
    print 'scrolling >>>>'
It kind of works, but I don't know how long a given page is, so I have to put in a big number as the max height. That gives me two problems: first, even a big number might not be large enough to scroll some pages; second, if the page is shorter than that limit, I lose quite a lot of time waiting for the script to finish while it is doing nothing.
You need something to rely on - some indicator telling you when to stop scrolling. Here is an example use case where we stop scrolling once more than N particular elements have been loaded (see the sketch after these links):
Slow scrolling down the page using Selenium
Similar use case:
Headless endless scroll selenium
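A minimal sketch of that stop condition, assuming the page appends ".item" elements as it scrolls (the selector and the 100-item cap are illustrative assumptions):
import time

last_height = driver.execute_script("return document.body.scrollHeight")
while len(driver.find_elements_by_css_selector(".item")) < 100:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(0.5)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:  # nothing new loaded - we reached the bottom
        break
    last_height = new_height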
FYI, you may have noticed another way to scroll to the bottom - scrolling a footer into view:
footer = driver.find_element_by_tag_name("footer")
driver.execute_script("arguments[0].scrollIntoView();", footer)
To scroll the page to the end, you could simply send the END key:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://stackoverflow.com/search?tab=newest&q=selenium")
driver.find_element_by_tag_name("body").send_keys(Keys.END)
You could also scroll the full height:
driver = webdriver.Firefox()
driver.get("http://stackoverflow.com/search?tab=newest&q=selenium")
driver.execute_script("window.scrollBy(0, document.documentElement.scrollHeight)")
Hey, I found another solution that worked perfectly for me. Check this answer here.
Also, this implementation:
driver.find_element_by_tag_name("body").send_keys(Keys.END)
does not work for pages that use an infinite-scrolling design.

Selenium will not find my elements

I have a project in which I chose Selenium to open 1-5 links. It's stopping at the 3rd link. I've followed the same methods as for the previously successful requests. I've allowed 17 seconds and watched the page load before the script continues to run in my console. I'm just not sure why it can't find this link, and I hope it's something I'm simply overlooking...
from selenium import *
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
import csv
import time
username = "xxxxxxx"
password = "xxxxxxx"
driver = webdriver.Firefox()
driver.get("https://tm.login.trendmicro.com/simplesaml/saml2/idp/SSOService.php")
assert "Trend" in driver.title
elem1 = driver.find_element_by_class_name("input_username")
elem2 = driver.find_element_by_class_name("input_password")
elem3 = driver.find_element_by_id("btn_logon")
elem1.send_keys(username)
elem2.send_keys(password)
elem3.send_keys(Keys.RETURN)
time.sleep(7)
assert "No results found." not in driver.page_source
elem4 = driver.find_element_by_css_selector("a.float-right.open-console")
elem4.send_keys(Keys.RETURN)
time.sleep(17)
elem5 = driver.find_element_by_tag_name("a.btn_left")
elem5.send_keys(Keys.RETURN)
Well, one of the reasons is that elem5 looks the element up by tag name, but you are passing it a CSS selector. "a.btn_left" is not an HTML tag name, so your script will never actually find it, because it simply doesn't exist in the DOM.
You either need to find it by CSS selector or, better yet, by XPath. If you want to make this as reliable and future-proof as possible, I always try to find elements on a page with at least 2 descriptors, using XPath if possible.
Change this:
elem5 = driver.find_element_by_tag_name("a.btn_left")
To this:
elem5 = driver.find_element_by_css_selector("a.btn_left")
You will almost never use tag_name, mostly because it will always retrieve the first element with the tag you pass to it, so "a" will always find the first link on the page; yours, however, does not exist.
I wound up solving it with the code below. I increased the wait time to 20 secs. Believe it or not, I did try the find by CSS - I actually kept the a.btn_left and cycled through all the elements, and none of them worked. Fortunately, I could get there with tab and key functions, so that works for now.
time.sleep(20)
driver.get("https://wfbs-svc-nabu.trendmicro.com/wfbs-svc/portal/en/view/cm")
elem5 = driver.find_element_by_link_text("Devices")
elem5.send_keys(Keys.ENTER)

Use PhantomJS evaluate() function from within Selenium

I am using the Python bindings for Selenium with PhantomJS to scrape the contents of a website, like so.
The element I want to access is in the DOM but not in the HTML source. I understand that if I want to access elements in the DOM itself, I need to use the PhantomJS evaluate() function. (e.g. http://www.crmarsh.com/phantomjs/ ; http://phantomjs.org/quick-start.html)
How can I do this from within Selenium?
Here is part of my code (which is currently not able to access the element using a PhantomJS driver):
time.sleep(60)
driver.set_window_size(1024, 768)
todays_points = driver.find_elements_by_xpath("//div/a[contains(text(),'Today')]/preceding-sibling::span")
total = 0
for today in todays_points:
    driver.set_window_size(1024, 768)
    points = today.find_elements_by_class_name("stream_total_points")[0].text
    points = int(points[:-4])
    total += points
driver.close()
print total
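For reference, Selenium's execute_script() runs JavaScript in the page context, much like PhantomJS's evaluate(), so DOM-only content can be read without leaving Selenium. A sketch, reusing the class name from the code above:
# Run JavaScript inside the page and return the result to Python.
points_text = driver.execute_script(
    "var el = document.querySelector('.stream_total_points');"
    " return el ? el.textContent : null;")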
