I am trying to get the Nasdaq "Most Advanced" list of stocks from here: http://www.nasdaq.com/extended-trading/premarket-mostactive.aspx (click on Most Advanced tab)
What is the best way using Selenium to loop through all the Symbols and put them into a Python list? I have figured out the XPATH to the first Symbol:
/html/body/div[4]/div[3]/div/div[7]/div[2]/table/tbody/tr[2]/td/div/h3/a
but am not sure where to go from there. I tried:
element=driver.find_elements_by_xpath("/html/body/div[4]/div[3]/div/div[7]/div[2]/table/tbody/tr[2]/td/div/h3/a")
print element.text
...as a start, just to see if I can get a value, but it obviously doesn't work. Sorry for the stupid question :(
These XPaths containing the full absolute path to the element are very fragile.
Rely on the class name instead (//div[@class="symbol_links"]):
from selenium.webdriver.firefox import webdriver
driver = webdriver.WebDriver()
driver.get('http://www.nasdaq.com/extended-trading/premarket-mostactive.aspx')
# choose "Most Advanced" tab
advanced_link = driver.find_element_by_id('most-advanced')
advanced_link.click()
# get the symbols
print [symbol.text for symbol in driver.find_elements_by_xpath('//div[@class="symbol_links"]') if symbol.text]
driver.close()
prints:
[u'RNA', u'UBIC', u'GURE', u'DRTX', u'DSLV', u'YNDX', u'QIWI', u'NXPI', u'QGEN', u'ZGNX']
Hope that helps.
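To see why the class-based XPath is the robust choice, here's the same extraction idea run against a small static snippet. The HTML below is a hypothetical stand-in for the page's markup, parsed with the stdlib so it runs without a browser:

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet mirroring the page structure (not the real markup)
html = """
<table>
  <tr><td><div class="symbol_links"><a>RNA</a></div></td></tr>
  <tr><td><div class="symbol_links"><a>UBIC</a></div></td></tr>
  <tr><td><div class="other"><a>IGNORED</a></div></td></tr>
</table>
"""

root = ET.fromstring(html)
# The same relative idea: match on the class attribute, not on position,
# so the query keeps working if rows are added or the page layout shifts
symbols = [div.find('a').text
           for div in root.findall(".//div[@class='symbol_links']")]
print(symbols)  # ['RNA', 'UBIC']
```

The absolute path from the question breaks as soon as any ancestor div is added or removed; the class-based query only cares about the one attribute that identifies the data.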
Related
I am trying to use Selenium to find the text inside a span element that has no class, id, or anything else I can think of to find it specifically.
Here is the html:
HTML from inspect element on Chrome
I have tried the following:
reqStr = driver.find_elements_by_xpath('span[data-bind*=text]')
for i in reqStr:
print(i.get_attribute('text'))
But that results in it printing just an empty list '[]'.
Any advice on how to find the text "CHEM1040, PHYS1130 - Must be completed prior to taking this course" is appreciated.
You could get it using a CSS Selector like:
reqStr = driver.find_element_by_css_selector('span[data-bind="text: $data.DisplayText() + \' \' + $data.DisplayTextExtension()"]')
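If typing out the whole binding expression is too brittle, a partial attribute match is an option: in CSS, [attr*=value] matches any substring of the attribute. This assumes "DisplayText" appears in the data-bind value, and is demonstrated against a mocked driver since there's no live page here:

```python
from unittest.mock import MagicMock

def get_requirement_text(driver):
    # [data-bind*="DisplayText"] matches any span whose data-bind
    # attribute contains the substring "DisplayText"
    return driver.find_element_by_css_selector(
        'span[data-bind*="DisplayText"]').text

# Mocked driver stands in for a real browser session
driver = MagicMock()
driver.find_element_by_css_selector.return_value.text = (
    "CHEM1040, PHYS1130 - Must be completed prior to taking this course")
text = get_requirement_text(driver)
print(text)
```

The trade-off: a substring match can catch more elements than intended, so check that the substring is unique enough on the page.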
Thanks for the help guys. I actually managed to find it using:
reqStr = driver.find_elements_by_xpath("//span[contains(text(),'*')]")
print(reqStr)
for i in reqStr:
print(i.text)
I had this before but must have slightly messed something up because I was not getting the correct results. Thanks!
I am trying to use Selenium to get specific users' information (e.g., number of followers) by entering their page using their IDs. The thing is, though I can find the needed information in Inspect, I cannot locate it using Selenium, even with the help of ChroPath, which tells you the XPath or CSS selector you can use to locate it. It keeps saying: No such element... I'm quite confused. I'm not even trying to automatically log in or anything.
Here is the code:
from selenium import webdriver
driver = webdriver.Chrome(executable_path='E:/data mining/chromedriver.exe')
driver.get('https://twitter.com/intent/user?user_id=823730524426477568')
ele = driver.find_element_by_class_name('css-901oao css-16my406 r-poiln3 r-bcqeeo r-qvutc0').text
print(ele)
Error:
Message: no such element: Unable to locate element: {"method":"css selector","selector":".r-poiln3 r-bcqeeo r-qvutc0"}
(Session info: chrome=88.0.4324.104)
It's so strange, because everything is right on the first page, and I don't even need to scroll down to see the information, but it won't be scraped...
To fix the issue with your code, try this:
ele = driver.find_element_by_css_selector('.css-901oao.css-16my406.r-poiln3.r-bcqeeo.r-qvutc0').text
However, because of how the site is formatted, it won't get you the result you want. It appears that not only do they rotate the class name, but there isn't enough variability in how the elements are labelled to make anything but XPath a viable option for getting specific data. (Unless you want to go through a list of all the elements with the same class to find what you need). After some initial site interactions, these XPaths worked for me:
following = driver.find_element_by_xpath('//*[@id="react-root"]/div/div/div[2]/main/div/div/div/div[1]/div/div[2]/div/div/div[1]/div[2]/div[4]/div[1]/a/span[1]/span').text
followers = driver.find_element_by_xpath('//*[@id="react-root"]/div/div/div[2]/main/div/div/div/div[1]/div/div[2]/div/div/div[1]/div[2]/div[4]/div[2]/a/span[1]/span').text
Problem is, there are over 100 elements with classes 'css-901oao css-16my406 r-poiln3 r-bcqeeo r-qvutc0' on that page.
To avoid an absolute XPath, you could use something like this, e.g. for the number of followers:
Find the span that has a descendant span with the text 'Followers'. Its preceding sibling (on the same level) is a span whose child span holds the follower count:
ele = driver.find_element_by_xpath("//span[descendant::span[text()='Followers']]/preceding-sibling::span/span").text
The selectors you are using are very fragile. ChroPath generates fragile XPaths that change on the next run, so the script fails. You might like to use a relative XPath generated by SelectorsHub, which is much more robust.
I thought I'd find out when the next episode of my favorite anime show airs by doing a little web scraping and searching for that specific anime, then printing out the next-episode countdown from the span tag with id="nextEpisodeCountDown".
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
PATH = r"C:\Program Files (x86)\chromedriver.exe"  # raw string so backslashes aren't treated as escapes
driver = webdriver.Chrome(PATH)
driver.get("https://kissanime.ru/")
time.sleep(15)
search = driver.find_element_by_id("keyword")
search.send_keys("Rent a Girlfriend")
search.send_keys(Keys.RETURN)
time.sleep(10)
element = driver.find_element_by_id("nextEpisodeCountDown")
print(element)
I'm not sure exactly what your question is here. You basically just said "I had this idea and here's the code I wrote to do it." I'm going to go out on a limb and assume you're asking for help because it isn't working as you expected? The code you posted works just fine to accomplish the goal you stated, with perhaps one issue. I'm guessing what's being printed out by your code is something like
<selenium.webdriver.remote.webelement.WebElement (session="567cb03b2d0f311ccf81166ff58c62c4", element="92669594-5987-48e7-8cd1-1572a68fc34e")>
and the output you wanted was
06 days 19h:11m
Going with this assumption, what you need to do is alter your print statement to print(element.text). When you use element = driver.find_element_by_id("nextEpisodeCountDown"), what is returned is the element object, and you want to print the text of that element, so you need to write element.text. Hope this helps, and if not, perhaps edit your question to make it clearer what you're asking.
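A quick stdlib-only illustration of the difference (FakeWebElement is a made-up stand-in, not Selenium's class): printing an object shows its repr, while printing an attribute shows the value you actually want.

```python
class FakeWebElement:
    """Made-up stand-in for selenium's WebElement, for illustration only."""
    def __init__(self, text):
        self.text = text

    def __repr__(self):
        # Selenium's WebElement repr looks roughly like this
        return '<selenium.webdriver.remote.webelement.WebElement (session="...")>'

element = FakeWebElement("06 days 19h:11m")
print(element)       # the unhelpful object repr
print(element.text)  # the countdown string you wanted
```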
I'm trying to use Python's Selenium module to click on an element whose link text ends with "xlsx". Below is the code I'm using and the details of the element. Can someone please see why Python is unable to find this element?
driver.find_element_by_partial_link_text('xlsx').click()
Here are the element details:
<a name="URL$2" id="URL$2" ptlinktgt="pt_new" tabindex="43" onfocus="doFocus_win0(this,false,true);" href="http:******/HSC8_CNTRCT_ITEMS_IMPRVD-1479218.xlsx" onclick="window.open('http:********/HSC8_CNTRCT_ITEMS_IMPRVD-1479218.xlsx','','');cancelBubble(event);return false;" class="PSHYPERLINK">HSC8_CNTRCT_ITEMS_IMPRVD-1479218.xlsx</a>
I had to remove some parts of the URL for confidentiality purposes, however, it should not impact the answering of the question.
Thanks.
Thanks for the replies. Turns out, as @Andersson mentioned, the window was in a different frame.
I solved the problem using the following code before the find_element: driver.switch_to.frame('ptModFrame_0').
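The fix can be wrapped up so later lookups keep working. A sketch using the frame name above, demonstrated against a mocked driver since no live page is available here:

```python
from unittest.mock import MagicMock

def click_xlsx_link(driver):
    # The link lives inside an iframe, so switch into it first
    driver.switch_to.frame('ptModFrame_0')
    try:
        driver.find_element_by_partial_link_text('xlsx').click()
    finally:
        # Switch back so later lookups target the top-level document again
        driver.switch_to.default_content()

# Mocked driver stands in for a real browser session
driver = MagicMock()
click_xlsx_link(driver)
frame_used = driver.switch_to.frame.call_args[0][0]
print(frame_used)
```

The try/finally keeps the driver in a known state even if the click fails, which avoids confusing "no such element" errors in later steps.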
You can use a CSS selector:
driver.find_element_by_css_selector("a[href*='xlsx']")
If the element still cannot be located, I would suggest using a wait statement, to ensure that the element is visible, before you interact with it.
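For context, an explicit wait is just polling: it retries a condition until it returns something truthy or a timeout expires. A stdlib-only sketch of the idea (wait_until and the simulated attempts are illustrative, not the Selenium API):

```python
import time

def wait_until(condition, timeout=10, poll=0.5):
    """Minimal sketch of what an explicit wait does: poll a condition
    until it returns something truthy or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(poll)
    raise TimeoutError("condition not met within %s seconds" % timeout)

# With a real driver you might poll something like:
#   wait_until(lambda: driver.find_elements_by_css_selector("a[href*='xlsx']"))
# Simulated here: the "element" only appears on the third poll
attempts = iter([[], [], ['link']])
found = wait_until(lambda: next(attempts), timeout=5, poll=0.01)
print(found)  # ['link']
```

Selenium ships this pattern as WebDriverWait with expected_conditions, which is the preferred way to do it in real scripts.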
Please try:
driver.find_element_by_xpath(".//a[contains(@href,'xlsx')]")
You can grab it by class name (class name = PSHYPERLINK).
This should work:
from selenium import webdriver
import time

url = ''
driver = webdriver.Chrome('/path/to/chromedriver/executable')

if __name__ == '__main__':
    driver.get(url)
    time.sleep(3)
    driver.find_element_by_class_name('PSHYPERLINK').click()
When finding the element, make sure to use the singular find_element. Like:
driver.find_element_by_class_name('PSHYPERLINK').click()
not:
driver.find_elements_by_class_name('PSHYPERLINK').click()
Hope this helps.
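To see the plural/singular difference concretely, here it is demonstrated with a mocked driver (no browser needed): find_elements_* returns a plain list, and lists have no .click().

```python
from unittest.mock import MagicMock

# Mocked driver stands in for a real browser session
driver = MagicMock()
driver.find_elements_by_class_name.return_value = [MagicMock(), MagicMock()]

# find_elements_* returns a list of elements...
elements = driver.find_elements_by_class_name('PSHYPERLINK')
try:
    elements.click()  # ...and a list has no click() method
    outcome = 'clicked'
except AttributeError:
    outcome = 'AttributeError: a list has no click()'
print(outcome)
```

With find_element_* (singular) you get one element back and can call .click() on it directly; with the plural form you'd index into the list first, e.g. elements[0].click().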
I am trying to create a python script to scrape the public county records website. I ultimately want to be able to have a list of owner names and the script run through all the names and pull the most recent deed of trust information (lender name and date filed). For the code below, I just wrote the owner name as a string 'ANCHOR EQUITIES LTD'.
I have used Selenium to automate entering the owner name into the form boxes, but when the return button is pressed and my results are shown, the website URL does not change. I tried to locate the specific text in the table using XPath, but the path does not exist when I look for it. I have concluded the path does not exist because it is searching for the XPath on the first page, with no results shown. BeautifulSoup4 wouldn't work in this situation because parsing the URL would only return the blank search form HTML.
See my code below:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
browser = webdriver.Chrome()
browser.get('http://deed.co.travis.tx.us/ords/f?p=105:5:0::NO:::#results')
ownerName = browser.find_element_by_id("P5_GRANTOR_FULLNAME")
ownerName.send_keys('ANCHOR EQUITIES LTD')
docType = browser.find_element_by_id("P5_DOCUMENT_TYPE")
docType.send_keys("deed of trust")
ownerName.send_keys(Keys.RETURN)
print(browser.page_source)
#lenderName = browser.find_element_by_xpath("//*[@id=\"report_results\"]/tbody[2]/tr/td/table/tbody/tr[25]/td[9]/text()")
I have commented out the variable that is giving me trouble. Please help!
If I am not explaining my problem correctly, please feel free to ask and I will clear up any questions.
I think you almost have it.
You match the element you are interested in using:
lenderNameElement = browser.find_element_by_xpath("//*[@id=\"report_results\"]/tbody[2]/tr/td/table/tbody/tr[25]/td[9]")
Next you access the text of that element:
lenderName = lenderNameElement.text
Or in a single step:
lenderName = browser.find_element_by_xpath("//*[@id=\"report_results\"]/tbody[2]/tr/td/table/tbody/tr[25]/td[9]").text
Have you tried the following XPath?
//table[contains(@summary,"Search Results")]/tbody/tr
I have checked that it works perfectly. With it, you have to iterate over each tr.
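The iteration over each tr can be sketched like this, using a hypothetical stdlib-parsed fragment in place of the live results table (ElementTree's limited XPath support can't do contains(), so the match here is structural):

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment standing in for the search-results table
html = """
<table summary="Search Results">
  <tbody>
    <tr><td>ANCHOR EQUITIES LTD</td><td>DEED OF TRUST</td><td>LENDER A</td></tr>
    <tr><td>ANCHOR EQUITIES LTD</td><td>DEED OF TRUST</td><td>LENDER B</td></tr>
  </tbody>
</table>
"""

root = ET.fromstring(html)
rows = []
for tr in root.findall('.//tbody/tr'):
    # Collect the cell text of each row; pick out the lender/date columns
    # by index once you know the real table layout
    rows.append([td.text for td in tr.findall('td')])
print(rows)
```

In the real script the same loop runs over driver.find_elements_by_xpath with the summary-based XPath above, reading each row's cells via .text.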