Python - Unable to Find XPath Using Selenium

I have been struggling with this for a while now.
I have tried various ways of finding the XPath for the following highlighted HTML.
I am trying to grab the dollar value listed under the highlighted <strong> tag.
Here is what my last attempt looks like:
try:
    price = browser.find_element_by_xpath(".//table[@role='presentation']")
    price.find_element_by_xpath(".//tbody")
    price.find_element_by_xpath(".//tr")
    price.find_element_by_xpath(".//td[@align='right']")
    price.find_element_by_xpath(".//strong")
    print(price.get_attribute("text"))
except:
    print("Unable to find element text")
I attempted to access the table and all of its nested elements, but I am still unable to reach the highlighted portion. Using .text and get_attribute('text') also does not work.
Is there another way of accessing the nested element?
Or maybe I am not using XPath properly.
I have also tried the below:
price = browser.find_element_by_xpath("/html/body/div[4]")
UPDATE:
Here is the full code of the site.
The site I am using is www.concursolutions.com.
I am attempting to automate booking a flight using Selenium.
When I reach the end of the booking process and receive the price, I am unable to print out the price based on the HTML.
It may have something to do with the HTML being generated by JavaScript that runs as you proceed.

Looking at the structure of the html, you could use this xpath expression:
//div[@id="gdsfarequote"]/center/table/tbody/tr[14]/td[2]/strong

Making it work
There are a few things keeping your code from working.
price.find_element_by_xpath(...) returns a new element.
You aren't saving that result to use in your next query, so when you finally ask for its text, you're still asking the <table> element, not the <strong> element.
Instead, save each found element and use it as the scope for the next query:
table = browser.find_element_by_xpath(".//table[@role='presentation']")
tbody = table.find_element_by_xpath(".//tbody")
tr = tbody.find_element_by_xpath(".//tr")
td = tr.find_element_by_xpath(".//td[@align='right']")
strong = td.find_element_by_xpath(".//strong")
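You can reproduce the same pitfall without a browser. The standard library's xml.etree.ElementTree has a find() method that, like Selenium's, returns a new element instead of changing the one it was called on. The markup below is a made-up, simplified stand-in for the fare table. It illustrates the scoping rule only and is not Selenium code.

Separately, note that Selenium 4.3 removed the find_element_by_* helpers used throughout this page. On current versions the equivalent is browser.find_element(By.XPATH, ...), with By imported from selenium.webdriver.common.by.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the fare-quote table.
FARE_HTML = """
<div id="gdsfarequote">
  <table role="presentation">
    <tbody>
      <tr><td>Carrier</td><td align="right"><strong>XX 123</strong></td></tr>
      <tr><td>Taxes</td><td align="right"><strong>$45.60</strong></td></tr>
      <tr><td>Base Fare</td><td align="right"><strong>$312.40</strong></td></tr>
    </tbody>
  </table>
</div>
"""

root = ET.fromstring(FARE_HTML)

# Pattern from the question: each result is discarded,
# so `price` still refers to the <table> afterwards.
price = root.find(".//table[@role='presentation']")
price.find(".//tbody")
price.find(".//strong")
print(price.tag)  # table

# Fix: keep each result and use it as the scope of the next search.
table = root.find(".//table[@role='presentation']")
tbody = table.find(".//tbody")
tr = tbody.find(".//tr")                 # first <tr> only
td = tr.find(".//td[@align='right']")
strong = td.find(".//strong")
print(strong.tag, strong.text)  # strong XX 123
```

The chained version now reaches a <strong>. However, it is the first row's value, because find() stops at the first matching <tr>.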
find_element_by_* returns the first matching element.
This means your call to tbody.find_element_by_xpath(".//tr") will return the first <tr> element in the <tbody>.
Instead, it looks like you want the third:
tr = tbody.find_element_by_xpath(".//tr[3]")
Note: XPath is 1-indexed.
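Here is the same idea on a made-up three-row table in ElementTree, showing "first match" next to a positional predicate. ElementTree accepts the same 1-based [n] predicate. Indexing the Python list returned by findall() is 0-based, so mixing the two is an easy off-by-one.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-row fare table; the Base Fare row is third.
TBODY = """
<tbody>
  <tr><td>Carrier</td><td align="right"><strong>XX 123</strong></td></tr>
  <tr><td>Taxes</td><td align="right"><strong>$45.60</strong></td></tr>
  <tr><td>Base Fare</td><td align="right"><strong>$312.40</strong></td></tr>
</tbody>
"""

tbody = ET.fromstring(TBODY)

first_row = tbody.find("tr")        # first matching <tr>
third_row = tbody.find("tr[3]")     # XPath-style position: 1-based
same_row = tbody.findall("tr")[2]   # Python list index: 0-based

print(first_row.find("td").text)    # Carrier
print(third_row.find("td").text)    # Base Fare
print(third_row is same_row)        # True
```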
get_attribute(...) returns HTML element attributes.
Therefore, get_attribute("text") will return the value of the text attribute on the element.
To return the text content of the element, use element.text:
strong.text
Cleaning it up
But even with the code working, there’s more that can be done to improve it.
You often don't need to specify every intermediate element.
Unless there is some ambiguity that needs to be resolved, you can ignore the <tbody> and <td> elements entirely:
table = browser.find_element_by_xpath(".//table[@role='presentation']")
tr = table.find_element_by_xpath(".//tr[3]")
strong = tr.find_element_by_xpath(".//strong")
XPath can be overkill.
If you're just looking for an element by its tag name, you can avoid XPath entirely:
strong = tr.find_element_by_tag_name("strong")
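The position of the fare row may change.
Instead of relying on a specific position, you can scope using a text search:
tr = table.find_element_by_xpath(".//tr[contains(., 'Base Fare')]")
Use . (the row's full text, including its cells) rather than text(). Inside contains(), text() only looks at the first text node directly under the <tr>, and the label normally sits in a <td>.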
Other <table> elements may be added to the page.
If the table had some header text, you could use the same text search approach as with the <tr>.
In this case, it would probably be more meaningful to scope to the #gdsfarequote <div> rather than something as ambiguous as a <table>:
farequote = browser.find_element_by_id("gdsfarequote")
tr = farequote.find_element_by_xpath(".//tr[contains(., 'Base Fare')]")
But even better, capybara-py provides a nice wrapper on top of Selenium that makes this simpler and clearer:
from capybara.dsl import page
fare_quote = page.find("#gdsfarequote")
base_fare_row = fare_quote.find("tr", text="Base Fare")
base_fare = base_fare_row.find("strong").text
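XPath 1.0 can also select a row by comparing a child's text directly, as in //tr[td='Base Fare']. Unlike contains(), this is an exact match on a <td>'s text. ElementTree supports that [tag='text'] form too, so a label-based lookup can be tried offline.

The markup and the fare_for helper below are hypothetical. The helper also assumes the label contains no single quote.

```python
import xml.etree.ElementTree as ET

# Hypothetical fare-quote markup; rows are deliberately out of order
# to show that the lookup does not depend on position.
FARE_HTML = """
<div id="gdsfarequote">
  <table role="presentation">
    <tr><td>Base Fare</td><td align="right"><strong>$312.40</strong></td></tr>
    <tr><td>Taxes</td><td align="right"><strong>$45.60</strong></td></tr>
  </table>
</div>
"""


def fare_for(root, label):
    """Return the <strong> text of the row with a <td> whose text equals `label`."""
    row = root.find(".//tr[td='%s']" % label)  # assumes no single quote in label
    if row is None:
        return None
    return row.find(".//strong").text


farequote = ET.fromstring(FARE_HTML)
print(fare_for(farequote, "Base Fare"))  # $312.40
print(fare_for(farequote, "Taxes"))      # $45.60
```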

Related

Need Selenium to return the class title content of given HTML

I'm using Selenium to perform some web scraping. It logs in to a site, where an HTML table of data is returned five values at a time. I'm going to have Selenium scrape a particular bit of data off the table, write it to a file, click next, and repeat with the next five.
This is a new automation script. I've tried a myriad of variations of get_attribute, find_elements_by_class_name, etc. Example:
pnum = prtnames.get_attribute("title")
for x in prtnames:
    print('pnum')
Here's the HTML from one of the returned values:
<div class="text-container prtname"><span class="PrtName" title="P011">P011</span></div>
I need to get that "P011" value. Obviously Selenium doesn't have "find_elements_by_title", and there is no HTML id for the value. The Xpath for that line of HTML is:
//*[@id="printerConnectTable"]/tbody/tr[5]/td/table/tbody/tr[1]/td[2]/div/span
But I don't see a reference to "title" or "P011" in that Xpath.
pnum = prtnames.get_attribute("title")
AttributeError: 'list' object has no attribute 'get_attribute'
It's like get_attribute doesn't exist, but there is some (albeit not much) documentation on it.
Fundamentally I'd like to grab that "P011" value and print to console, then I know Selenium is working with the right data.
P.S. I'm self-taught with all of this, I'm automating a sysadmin task.
I think the problem is that prtnames is a list of elements, not a single element. If you want the title attribute of every element in prtnames, use a list comprehension:
pnums = [x.get_attribute('title') for x in prtnames]
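You can mirror the find_element / find_elements distinction with ElementTree's find() and findall(). The plural form returns a plain list, which has no element methods, hence the AttributeError in the question. Here ElementTree's get() plays the role of Selenium's get_attribute().

The first span below is copied from the question. The P012 row is made up for the example.

```python
import xml.etree.ElementTree as ET

# First row copied from the question; the second row is hypothetical.
ROWS = """
<td>
  <div class="text-container prtname"><span class="PrtName" title="P011">P011</span></div>
  <div class="text-container prtname"><span class="PrtName" title="P012">P012</span></div>
</td>
"""

root = ET.fromstring(ROWS)
prtnames = root.findall(".//span[@class='PrtName']")  # a list, like find_elements_*

# Calling an element method on the list itself fails, as in the question.
try:
    prtnames.get("title")
except AttributeError as exc:
    print("AttributeError:", exc)

# Ask each element in the list for its attribute instead.
pnums = [x.get("title") for x in prtnames]
print(pnums)  # ['P011', 'P012']
```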

Getting ID of Selenium WebElement in Python?

If I have the following HTML <a id="id" class="class" href="href">Element Text</a> how would I get "id" to be returned? My current procedure is:
print("Attribute: " + element.get_attribute('id'))
print("Property: " + element.get_property('id'))
print("Class: " + element.get_attribute('class'))
But all of those return empty strings. I am, however, able to get the text by using element.text
EDIT: Here's a more in-depth explanation
I'm looking for an element, but that element's ID varies. There is, however, an element linked to the one I want, which I can find using its XPath and by comparing its text to a specific text I know beforehand. The ID of that element has the form someID_XX. By taking the XX and appending it to another fixed string, I can then search for the element I actually want. My issue is that once I get the second element (not the one I want directly, but the one that can lead me to it), I can't seem to get its ID attribute, even though it seems to have one in the HTML. My question is: how do I get the id attribute?
For me it works with
element.get_attribute('id')
At first it didn't work because I had mistakenly selected a hidden JavaScript element without an id.
But if the error persists in your case, you can try this:
element.get_attribute('outerHTML')
This returns the element's HTML, which you can then parse to extract the id.
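Here is one way to do the "parse the HTML" step with only the standard library's html.parser. It is a minimal sketch: the helper names are hypothetical, and it reads the id of the outermost tag only.

The second usage follows the question's someID_XX scheme, with a made-up id someID_42.

```python
from html.parser import HTMLParser


class _FirstTagAttrs(HTMLParser):
    """Records the attributes of the first start tag it sees."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            self.attrs = dict(attrs)


def id_from_outer_html(outer_html):
    """Return the id of the outermost tag in an outerHTML string, or None."""
    parser = _FirstTagAttrs()
    parser.feed(outer_html)
    parser.close()
    return (parser.attrs or {}).get("id")


# In Selenium the string would come from element.get_attribute('outerHTML');
# here it is the example markup from the question.
print(id_from_outer_html('<a id="id" class="class" href="href">Element Text</a>'))  # id

# The question's someID_XX scheme (hypothetical id): keep the part after the last "_".
linked_id = id_from_outer_html('<span id="someID_42">x</span>')
suffix = linked_id.rsplit("_", 1)[1]
print(suffix)  # 42
```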

Should be Simple XPATH?

Using Python and Selenium, I'm trying to click a link if it contains certain text, in this case 14:10. This is the DIV I'm after:
<div class="league_check" id="hr_selection_18359391" onclick="HorseRacingBranchWindow.showEvent(18359391);" title="Odds Available"> <span class="race-status"> <img src="/i/none_v.gif" width="12" height="12" onclick="HorseRacingBranchWindow.toggleSelection(18359391); cancelBubble(event);"> </span>14:10 * </div>
I've been watching the browser move manually. I know the DIV has loaded before my code fires but I can't figure out what the heck it is actually doing.
Looked pretty straightforward. I'm not great at XPATH but I usually manage the basics.
justtime = "14:10"
links = Driver.find_elements_by_xpath("//div*[contains(.,justtime)")
As far as I can see, no other link on that page contains the text 14:10, but when I loop through links and print them, it shows basically every link on the page.
I've tried to narrow it down to that class name and the text:
justtime = "14:10"
links = Driver.find_elements_by_xpath("//div[contains(.,justtime) and (contains(@class, 'league_check'))]")
This doesn't return anything at all. I'm really stumped; it makes no sense to me.
Your XPath doesn't use the justtime Python variable. Instead, it refers to a child element <justtime>, which doesn't exist within the <div>. An expression of the form contains(., nonExistentElement) always evaluates to true, because the empty node-set converts to an empty string, and every string contains the empty string. This is probably why your initial XPath returned more elements than expected.
Incorporate the value of the justtime variable into your XPath using string interpolation. Don't forget to enclose the value in quotes so that it is recognized as an XPath string literal:
justtime = "14:10"
links = Driver.find_elements_by_xpath("//div[contains(.,'%s')]" % justtime)
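The '%s' interpolation works as long as the value contains no single quote. For arbitrary text, such as a runner named O'Brien, a small helper can pick safe quoting. XPath 1.0 string literals have no escape sequences, so the usual workaround is to rebuild the string with concat().

xpath_literal is a hypothetical helper, not part of Selenium. The example also combines the text test with the question's league_check class filter.

```python
def xpath_literal(value):
    """Quote a Python string as an XPath 1.0 string literal.

    XPath 1.0 has no escape sequences, so a value containing both kinds of
    quote is split on single quotes and rebuilt with concat().
    """
    if "'" not in value:
        return "'%s'" % value
    if '"' not in value:
        return '"%s"' % value
    parts = value.split("'")
    return "concat(%s)" % ", \"'\", ".join("'%s'" % part for part in parts)


justtime = "14:10"
xpath = "//div[contains(@class, 'league_check') and contains(., %s)]" % xpath_literal(justtime)
print(xpath)  # //div[contains(@class, 'league_check') and contains(., '14:10')]
```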
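You also need to wait for the element:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

wait = WebDriverWait(driver, 10)
element = wait.until(EC.element_to_be_clickable((By.ID, 'someid')))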
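WebDriverWait.until() re-evaluates a condition until it returns something truthy or the timeout expires. By default it polls every half second, ignores NoSuchElementException while polling, and raises TimeoutException if time runs out.

The function below is a hypothetical, Selenium-free sketch of that loop, meant only to show the mechanics. It raises the built-in TimeoutError and does not swallow exceptions. In real code, keep using WebDriverWait.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or `timeout` seconds pass.

    Returns that value; raises TimeoutError if the deadline is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Usage with a fake condition that "finds" the element on the third check.
checks = {"n": 0}


def fake_element_ready():
    checks["n"] += 1
    return "element" if checks["n"] >= 3 else None


print(wait_until(fake_element_ready, timeout=2, poll=0.01))  # element
```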

Python crawler not finding specific Xpath

I asked my previous question here:
Xpath pulling number in table but nothing after next span
This worked, and I managed to see the number I wanted in a Firefox add-on called XPath Checker. The results are shown below.
So I know I can find this number with this XPath, but when I run a Python script to find and save the number, it says it cannot find it.
try:
    views = browser.find_element_by_xpath("//div[@class='video-details-inside']/table//span[@class='added-time']/preceding-sibling::text()")
except NoSuchElementException:
    print "NO views"
    views = 'n/a'
    pass
I know that pass is not best practice, but I am just testing this at the moment while trying to find the number. I'm wondering if I need to change something at the end of the XPath, like .text, as XPath Checker normally shows results a little differently, like below:
I needed to use the XPath I gave rather than the one in the picture above because I only want the number, not the date. You can see part of the source in my previous question.
Thanks in advance! I'm scratching my head here.
The XPath passed to find_element_by_xpath() has to point to an element, not to a text node or an attribute. That is the critical issue here.
The easiest approach here would be to:
get the td's text (parent)
get the span's text (child)
remove child's text from parent's
Code:
span = browser.find_element_by_xpath("//div[@class='video-details-inside']/table//span[@class='added-time']")
td = span.find_element_by_xpath('..')
views = td.text.replace(span.text, '').strip()
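You can check the parent-minus-child idea offline with ElementTree on a hypothetical fragment of the video-details table. Selenium's .text returns rendered text with normalised whitespace. ElementTree's itertext() returns the raw text nodes, so the sketch strips whitespace at the end.

Unlike Selenium, ElementTree can also read the leading text node directly. The last line shows that.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the view count is a bare text node before the span.
DETAILS = """
<div class="video-details-inside">
  <table><tr><td>12,345 <span class="added-time">3 days ago</span></td></tr></table>
</div>
"""

root = ET.fromstring(DETAILS)
span = root.find(".//span[@class='added-time']")
td = root.find(".//span[@class='added-time']/..")  # '..' selects the parent <td>

td_text = "".join(td.itertext())  # "12,345 3 days ago", roughly Selenium's td.text
views = td_text.replace(span.text, "").strip()
print(views)  # 12,345

# ElementTree only: the text node that precedes the first child element.
print(td.text.strip())  # 12,345
```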

trouble getting text from xpath entry in python

I am on the website
http://www.baseball-reference.com/players/event_hr.cgi?id=bondsba01&t=b
and am trying to scrape the data from the tables. When I pull the XPath for one entry, say the pitcher
"Terry Mulholland," I get this:
pitchers = site.xpath("/html/body/div[2]/div[2]/div[6]/table/tbody/tr/td[3]/table/tbody/tr[2]/td/a")
When I try to print the text of each pitcher in pitchers, I get [] rather than the text. Any idea why?
The problem is that the last tbody doesn't exist in the original source. If you got that XPath from a browser, keep in mind that browsers can add missing elements (such as <tbody>) to make the HTML valid.
Removing the last tbody resolves the problem.
In : import lxml.html as html
In : site = html.parse("http://www.baseball-reference.com/players/event_hr.cgi?id=bondsba01&t=b")
In : pitchers = site.xpath("/html/body/div[2]/div[2]/div[6]/table/tbody/tr/td[3]/table/tr[2]/td/a")
In : pitchers[0].text
Out: 'Terry Mulholland'
I should add, though, that the XPath expression you are using is pretty fragile: add one div in the wrong place and your script breaks. If possible, try to find better references, like an id or class, that point to your expected location.
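Both points can be reproduced with ElementTree on a hypothetical, cut-down page whose source, like the real one, has no <tbody>. The browser-style path with tbody finds nothing. The path without it works. A path anchored on a class survives extra wrapper divs; the class name hr-log is invented for the example.

ElementTree only accepts relative paths, so the leading /html of the absolute XPath becomes ./ from the <html> element.

```python
import xml.etree.ElementTree as ET

# Hypothetical page source: note there is no <tbody> in the markup itself.
SOURCE = """
<html><body>
  <div class="hr-log">
    <table>
      <tr><th>Pitcher</th></tr>
      <tr><td><a href="#">Terry Mulholland</a></td></tr>
    </table>
  </div>
</body></html>
"""

root = ET.fromstring(SOURCE)  # root is the <html> element

# Path as a browser's inspector would show it, with an inserted <tbody>.
from_browser = root.findall("./body/div/table/tbody/tr[2]/td/a")
# Same path matching the real source (no <tbody>).
from_source = root.findall("./body/div/table/tr[2]/td/a")
# Less fragile: anchor on a class instead of counting divs.
by_class = root.findall(".//div[@class='hr-log']//tr/td/a")

print(len(from_browser))    # 0
print(from_source[0].text)  # Terry Mulholland
print(by_class[0].text)     # Terry Mulholland
```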
