Finding specific text in a page using Selenium in Python

I'm trying to find specific text on the page https://play.google.com/store/apps/details?id=org.codein.filemanager&hl=en using Selenium in Python. I'm looking for the "Current Version" element on that page. I used the code below:
browser = webdriver.Firefox() # Get local session of Firefox
browser.get(sampleURL) # Load page
elem = browser.find_elements_by_class_name("Current Version") # Find the query box
print elem
time.sleep(2) # Let the page load, will be added to the API
browser.close()
I don't seem to get the output printed. Am I doing anything wrong here?

There is no class with the name "Current Version". If you want to capture the version number that is below the "Current Version" text, then you can use this XPath expression:
browser.find_element_by_xpath("//div[@itemprop='softwareVersion']")
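Putting that together with the original snippet, a minimal sketch (assuming the Play Store page still exposes the itemprop='softwareVersion' markup, which may change over time):
browser = webdriver.Firefox()
browser.get(sampleURL)
version = browser.find_element_by_xpath("//div[@itemprop='softwareVersion']").text # visible text of the version div
print version
browser.close()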


Inconsistent results retrieving data with Selenium in Python

driver = webdriver.Chrome(driver_path, options=chrome_options)
wait = WebDriverWait(driver, 20)
driver.get('https://%s/' % asset_id)
wait.until(EC.presence_of_element_located((By.XPATH, "//*[@id='dev_diaginfo_fid']")))
print(driver.find_element_by_xpath("//*[@id='dev_diaginfo_fid']").get_attribute("innerHTML"))
I'm able to log into the website and Selenium returns the WebElement, but it is not consistent in returning the text from that WebElement. Sometimes it returns the text, and other times it seems like the page isn't loading fast enough (there is a slow network where this is being used) and it returns no data at all, though I can still see the WebElement itself, just not the data. The data is dynamically loaded via JS. Probably not relevant, but I am using send_keys to pass the credentials needed to log in, and then the page with the version is loaded.
Is there a way to use an ExpectedCondition (EC) to wait until it sees text before moving on? I'm attempting to pull the firmware version from a network device, and it finds the Firmware element but is not consistent in returning the actual firmware version. As stated before, there are occasional issues with network speeds, so my suspicion is that it's moving on before the firmware number loads. This device does not have internet access, so I can't share the URL. I can confirm that I have pulled the firmware version; it's just not consistent.
I have tried passing it to BeautifulSoup and can verify that it sees "Firmware Version:" but the inner tags are empty.
Edit: I have tried EC.visibility_of_all_elements_located and EC.visibility_of_element_located as well, with no luck.
Here's an idea.
Try a while loop until you see the text.
import time

counter = 0
elem = driver.find_element_by_xpath("//*[@id='dev_diaginfo_fid']").get_attribute("innerHTML")
while elem == "":
    time.sleep(0.5) # the original's pause(500), converted to seconds
    elem = driver.find_element_by_xpath("//*[@id='dev_diaginfo_fid']").get_attribute("innerHTML")
    if elem != "":
        print("Success! The text is: " + elem)
        break
    if counter > 20:
        print("Text still not found!")
        break
    counter += 1
Obviously, adjust the loop to suit your needs.
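For the ExpectedCondition part of the question: WebDriverWait.until() accepts any callable that takes the driver and returns a truthy value, so you can wait for non-empty text directly instead of hand-rolling a loop. A minimal sketch (the 20-second timeout is an arbitrary choice):
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By

def non_empty_text(driver):
    # Falsy return values make the wait retry until the timeout expires
    text = driver.find_element(By.XPATH, "//*[@id='dev_diaginfo_fid']").get_attribute("innerHTML")
    return text if text.strip() else False

firmware = WebDriverWait(driver, 20).until(non_empty_text)
print("Firmware version: " + firmware)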

Is there a procedure to enter each link of a Google results page and extract text?

A total newbie here in search of your wisdom (1st post/question, too)! Thank you in advance for your time and patience.
I am hoping to automate scientific literature searches in Google Scholar using Selenium specifically (via Chrome) with Python. I envision entering a topic, which will be searched on Google Scholar, then entering each link of the articles/books in the results, extracting the abstract/summary, and printing it to the console (or saving it to a text file). This will be an easy way to determine the relevancy of the articles in the results for the stuff I'm writing.
So far, I am able to visit Google Scholar, enter text in the search bar, filter by date (newest to oldest), and extract each of the links in the results. I have not been able to write a loop that will enter each article link and extract the abstracts (or other relevant text), as each result may be coded differently.
Kind regards,
JP (Aotus_californicus)
This is my code so far:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

def get_results(search_term):
    url = 'https://scholar.google.com'
    browser = webdriver.Chrome(executable_path=r'C:\Users\Aotuscalifornicus\Downloads\chromedriver_win32\chromedriver.exe')
    browser.get(url)
    searchBar = browser.find_element_by_id('gs_hdr_tsi')
    searchBar.send_keys(search_term)
    searchBar.submit()
    browser.find_element_by_link_text("Trier par date").click()
    results = []
    links = browser.find_elements_by_xpath('//h3/a')
    for link in links:
        href = link.get_attribute('href')
        print(href)
        results.append(href)
    browser.close()

get_results('Primate thermoregulation')
Wrt your comment, and using that as a basis for my answer:
To clarify, I am looking to write a loop that enters each link and extracts an element by tag, for example
Open a new window or start a new driver session to check the links in the results. Then use a rule to extract the text you want. You could re-use your existing driver session if you extract all the hrefs first or create a new tab as you get each result link.
for link in links:
    href = link.get_attribute('href')
    print(href)
    results.append(href)

extractor = webdriver.Chrome(executable_path=...) # as above
for result in results:
    extractor.get(result)
    section_you_want = extractor.find_elements_by_xpath(...) # or whichever set of rules
    # other code here

extractor.close()
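The new-tab variant mentioned above might look like this sketch, re-using the original browser session instead of a second driver (the newest tab being last in window_handles is an assumption that usually holds):
main_window = browser.current_window_handle
for result in results:
    browser.execute_script("window.open(arguments[0]);", result) # open the link in a new tab
    browser.switch_to.window(browser.window_handles[-1]) # jump to the newest tab
    # ... extract what you need here ...
    browser.close() # close the tab
    browser.switch_to.window(main_window) # return to the results page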
You can set up rules to use with the base find_element() or find_elements() finders and then iterate over them until you get a result (validating based on element presence, text length, or something else sane & useful). Each of the rules is a tuple that can be passed to the base finder function:
from selenium.webdriver.common.by import By # see the docs linked above for the available `By` class attributes

rules = [(By.XPATH, '//h3/p'),
         (By.ID, 'summary'),
         (By.TAG_NAME, 'div'),
         ... # etc.
        ]

for url in results:
    extractor.get(url)
    for rule in rules:
        elems = extractor.find_elements(*rule) # argument unpacking
        if not elems:
            continue # not found, try next rule
        print(elems[0].text) # `.text`, not Java's getText()
        break # stop after first successful "find"
    else: # only executed if no rule matches and `break` is never reached, or the `rules` list is empty
        print('Could not find anything for url:', url)

In Selenium, when I search with XPath, how do I capture the element two positions before the last?

In Python 3 with Selenium, I have this script to automate searching for terms on a site with public information:
from selenium import webdriver
# Driver Path
CHROME = '/usr/bin/google-chrome'
CHROMEDRIVER = '/home/abraji/Documentos/Code/chromedriver_linux64/chromedriver'
# Chosen browser options
chrome_options = webdriver.chrome.options.Options()
chrome_options.add_argument('--window-size=1920,1080')
chrome_options.binary_location = CHROME
# Website accessed
link = 'https://pjd.tjgo.jus.br/BuscaProcessoPublica?PaginaAtual=2&Passo=7'
# Search term
nome = "MARCONI FERREIRA PERILLO JUNIOR"
# Waiting time
wait = 60
# Open browser
browser = webdriver.Chrome(CHROMEDRIVER, options = chrome_options)
# Implicit wait
browser.implicitly_wait(wait)
# Access the link
browser.get(link)
# Search by term
browser.find_element_by_xpath("//*[#id='NomeParte']").send_keys(nome)
browser.find_element_by_xpath("//*[#id='btnBuscarProcPublico']").click()
# Searches for the text of the last icon - the last page button
element = browser.find_element_by_xpath("//*[@id='divTabela']/div[2]/div[2]/div[4]/div[2]/ul/li[9]/a").text
element
'»'
When searching for a term, this site paginates the results and always shows the "»" button as the last pagination button.
The next-to-last button will be "›".
So I need to capture the text of the button two positions before the last one, in this case the number "8", to automate the page changes: it tells me how many clicks on "next page" will be needed.
When I search with XPath, how do I capture the element two positions before the last?
I know this is not an answer to the original question.
But clicking the next button several times is not a good practice.
I checked the network traffic and see that they are calling their API url with offset parameter. You should be able to use this URL with proper offset as you need.
If you really need to access the button two positions before the last, you need to get all the navigation buttons first and then access by indexing, as follows.
elems = browser.find_elements_by_xpath(xpath)
elems[-3] # third from the end, i.e. two positions before the last
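A concrete sketch using the pagination list from the question (the li index is dropped so the XPath matches every button; this assumes the markup shown above):
xpath = "//*[@id='divTabela']/div[2]/div[2]/div[4]/div[2]/ul/li/a"
elems = browser.find_elements_by_xpath(xpath)
print(elems[-3].text) # the button two before "»", here the number "8"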
EDIT
I just tested their API and it works when given the proper cookie value.
This way is much faster than automation using Selenium.
Use Selenium just to extract the cookie value to be used in the header of the web request.
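A minimal sketch of that hand-off, assuming the requests library; the endpoint and its offset parameter are placeholders, since the actual API URL isn't shown here:
import requests

# Copy the cookies out of the Selenium session into a requests session
cookies = {c['name']: c['value'] for c in browser.get_cookies()}
session = requests.Session()
session.cookies.update(cookies)

# Hypothetical endpoint and parameter; substitute the real ones seen in the network traffic
response = session.get('https://pjd.tjgo.jus.br/...', params={'offset': 0})
print(response.status_code)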

Selenium: avoid ElementNotVisibleException exception

I've never studied HTML seriously, so what I'm going to say is probably not right.
While writing my Selenium code, I noticed that some buttons on some webpages do not redirect to other pages, but change the structure of the first one. From what I've understood, this happens because some JavaScript code modifies it.
So, when I want to get some data that is not present on the first page load, I just have to click the right sequence of buttons to obtain it, right?
The page I want to load is https://watch.nba.com/, and what I want to get is the match list for a given day. The fastest path to it is to open the calendar:
calendary = driver.find_element_by_class_name("calendar-btn")
calendary.click()
and click the selected day:
page = calendary.find_element_by_xpath("//*[contains(text(), '" + time.strftime("%d") + "')]")
page.click()
running this code, I get this error:
selenium.common.exceptions.ElementNotVisibleException
I read somewhere that the problem is that the page is not correctly loaded, or the element is not visible/clickable, so I tried with this:
wait = WebDriverWait(driver, 10)
page = wait.until(EC.visibility_of_element_located((By.XPATH, "//*[contains(text(), '" + time.strftime("%d") + "')]")))
page.click()
But this time I get this error:
selenium.common.exceptions.TimeoutException
Can you help me to solve at least one of these two problems?
The reason you are getting this behavior is that the page is loaded with iframes (I can see 15 on the main page). Once you click the calendar button, you need to switch your context either back to the default content or into the specific iframe where the calendar resides. There is plenty of code out there that shows how to switch into and out of an iframe. Hope this helps.
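A minimal sketch of that switching, assuming the day buttons live inside one of those iframes; the frame locator is a placeholder you would adapt to the page:
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
# Wait until the frame exists and switch into it in one step
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe"))) # hypothetical locator
day = wait.until(EC.element_to_be_clickable((By.XPATH, "//*[contains(text(), '" + time.strftime("%d") + "')]")))
day.click()
driver.switch_to.default_content() # switch back out when done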

Search results don't change URL - Web Scraping with Python and Selenium

I am trying to create a Python script to scrape the public county records website. I ultimately want to have a list of owner names and have the script run through all the names and pull the most recent deed of trust information (lender name and date filed). For the code below, I just wrote the owner name as the string 'ANCHOR EQUITIES LTD'.
I have used Selenium to automate entering the owner name into the form boxes, but when the 'return' key is pressed and my results are shown, the website URL does not change. I try to locate the specific text in the table using XPath, but the path does not exist when I look for it. I have concluded the path does not exist because it is searching for the XPath on the first page, where no results are shown. BeautifulSoup4 wouldn't work in this situation because fetching the URL directly would only return the blank search form's HTML.
See my code below:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
browser = webdriver.Chrome()
browser.get('http://deed.co.travis.tx.us/ords/f?p=105:5:0::NO:::#results')
ownerName = browser.find_element_by_id("P5_GRANTOR_FULLNAME")
ownerName.send_keys('ANCHOR EQUITIES LTD')
docType = browser.find_element_by_id("P5_DOCUMENT_TYPE")
docType.send_keys("deed of trust")
ownerName.send_keys(Keys.RETURN)
print(browser.page_source)
#lenderName = browser.find_element_by_xpath("//*[@id=\"report_results\"]/tbody[2]/tr/td/table/tbody/tr[25]/td[9]/text()")
I have commented out the variable that is giving me trouble. Please help!
If I am not explaining my problem correctly, please feel free to ask and I will clear up any questions.
I think you almost have it.
You match the element you are interested in using:
lenderNameElement = browser.find_element_by_xpath("//*[@id=\"report_results\"]/tbody[2]/tr/td/table/tbody/tr[25]/td[9]")
Next you access the text of that element:
lenderName = lenderNameElement.text
Or in a single step:
lenderName = browser.find_element_by_xpath("//*[@id=\"report_results\"]/tbody[2]/tr/td/table/tbody/tr[25]/td[9]").text
Have you used the following XPath?
//table[contains(@summary, "Search Results")]/tbody/tr
I have checked that it works perfectly. With it, you have to iterate over each tr.
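A short sketch of that iteration; which td holds the lender name and date filed is an assumption to confirm against the real table:
rows = browser.find_elements_by_xpath('//table[contains(@summary, "Search Results")]/tbody/tr')
for row in rows:
    cells = row.find_elements_by_tag_name('td')
    print([cell.text for cell in cells]) # inspect the columns to locate the lender name and date filed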
