Find text and elements with Python and Selenium?

When I go to a certain webpage I am trying to find a certain element and piece of text:
<span class="Bold Orange Large">0</span>
This didn't work: (It gave an error of compound class names or something...)
elem = browser.find_elements_by_class_name("Bold Orange Large")
So I tried this: (but I'm not sure it worked because I don't really understand the right way to do css selectors in selenium...)
elem = browser.find_elements_by_css_selector("span[class='Bold Orange Large']")
Once I find the span element, I want to find the number that is inside...
num = elem.(what to put here??)
Any help with css selectors, class names, and finding element text would be great!!
Thanks.
EDIT:
My other problem is that there are several of those exact span elements, each with a different number inside. How can I deal with that?

You're using CSS selectors correctly! Your first attempt failed because the class attribute contains spaces: "Bold Orange Large" is three separate class names, and find_elements_by_class_name accepts only a single class name, hence the compound class names error. I think that kind of markup is bad development practice to begin with, so it's not your problem. Selenium itself does not include an HTML editor, because that has already been done before.
Try looking here: How to find/replace text in html while preserving html tags/structure.
Also this one is relevant and popular as well: RegEx match open tags except XHTML self-contained tags
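To read the number inside each matching span, one option is to turn the space-separated class attribute into a CSS selector that requires all three classes, and then read .text from every match. Below is a minimal sketch, not a definitive implementation. The helper names class_attr_to_css and span_texts are made up for illustration, and the find_elements(By.CSS_SELECTOR, ...) form is used on the assumption that a recent Selenium 4 release is installed, where the find_elements_by_* helpers are no longer available.

```python
def class_attr_to_css(tag, class_attr):
    """Build a CSS selector that requires every class listed in class_attr."""
    # "Bold Orange Large" is three class names separated by whitespace.
    return tag + "".join("." + name for name in class_attr.split())


def span_texts(browser, class_attr="Bold Orange Large"):
    """Return the visible text of every matching span (hypothetical helper)."""
    # Imported lazily so the selector helper works without Selenium installed.
    from selenium.webdriver.common.by import By

    selector = class_attr_to_css("span", class_attr)
    # find_elements returns a list, so several matching spans are handled naturally.
    return [element.text for element in browser.find_elements(By.CSS_SELECTOR, selector)]


print(class_attr_to_css("span", "Bold Orange Large"))  # span.Bold.Orange.Large
```

With a live driver, something like nums = [int(t) for t in span_texts(browser)] would give the numbers, assuming every span really holds an integer.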

Related

I'm having trouble selecting an element using Selenium with Python

I want to read the text of this HTML element using Selenium with Python. I just can't find a way to locate or select it without using the text (I don't want to do that because its content changes).
<div font-size="14px" color="text" class="sc-gtsrHT jFEWVt">0.101 ONE</div>
Do you have an idea how I could select it? The conventional ways listed in the documentation don't seem to work for me. To be honest, I'm not very good with HTML, which doesn't make things any easier.
Thank you in advance.
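Try this (the two class names are combined in a CSS selector, because by_class_name accepts only a single class name):
element = browser.find_element_by_css_selector('.sc-gtsrHT.jFEWVt').text
Or use a loop if you have several elements:
elements = browser.find_elements_by_css_selector('.sc-gtsrHT.jFEWVt')
for e in elements:
    print(e.text)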
print(browser.find_element_by_xpath("//*[@class='sc-gtsrHT jFEWVt']").text)
You could simply grab it by its class attribute. It has two class names, so match the whole attribute value as shown above; by_class_name only accepts one.
That works if the class name isn't dynamic. Otherwise you'd have to right-click and copy the XPath, or find a unique identifier.
Find it by XPath, as long as the font-size and color attributes are consistent. Something like:
//div[@font-size='14px' and @color='text' and starts-with(@class,'sc-')]
I guess the class name is random?
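The attribute-based idea can be checked offline before pointing it at a live page. The sketch below uses only the standard library, under the assumption that the snippet is well-formed enough to parse as XML. The second div is invented for contrast, and because ElementTree supports only a subset of XPath (no "and", no starts-with()), the attribute tests are chained and the class prefix is checked in Python instead.

```python
import xml.etree.ElementTree as ET

# The div from the question plus one made-up div that should not match.
PAGE = """
<html><body>
  <div font-size="14px" color="text" class="sc-gtsrHT jFEWVt">0.101 ONE</div>
  <div font-size="12px" color="text" class="sc-other">ignored</div>
</body></html>
"""

root = ET.fromstring(PAGE)

# Chain the two attribute predicates, then filter on the class prefix in Python.
matches = [
    div.text
    for div in root.findall(".//div[@font-size='14px'][@color='text']")
    if div.get("class", "").startswith("sc-")
]
print(matches)  # ['0.101 ONE']
```

In Selenium the full XPath with "and" and starts-with() can be passed as a single expression, since the browser evaluates complete XPath 1.0.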

How do I find a more descriptive XML Path for my Selenium webscrape?

I'm building a website scraper using Selenium and I want to "click" the highlighted div in the image below.
My current code (which works, but isn't very descriptive) is:
button = driver.find_element_by_xpath("//div/div/div/div/div/div/div/div[5]/div[8]")
button.click()
I'm glad it works, but it feels fragile, since I'm accessing the divs purely by index, without any other identifying features. Is there a way, at least for the last div, that I can specify my choice by the text within span? What would the syntax be for choosing the div that contains a span with the text "Grandmaster"?
It's worth noting that this is the only div in any of the "filter-group"s that contains the text "Grandmaster". Is there a way to select this div specifically, without listing all the nested divs (as I've done in my code above)?
Any other ideas on how to make the XML path's code a bit more robust would be appreciated.
What would the syntax be for choosing the div that contains a span with the text "Grandmaster"?
The syntax would be:
driver.find_element_by_xpath("//*[contains(text(), 'Grandmaster')]")
What would the syntax be for choosing the div that contains a span
with the text "Grandmaster"?
You can use this XPath:
//span[contains(., 'Grandmaster')]/parent::div
You can find more information here.
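The idea of selecting a div by the text of its span, rather than by position, can be tried out on a small document. The markup below is hypothetical (the real page structure is only shown in the question's image), and the sketch uses the standard library's ElementTree, whose XPath subset offers an exact-match predicate instead of contains().

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the ids and nesting are invented for this demo.
PAGE = """
<div class="filter-group">
  <div id="opt-master"><span>Master</span></div>
  <div id="opt-grandmaster"><span>Grandmaster</span></div>
</div>
"""

root = ET.fromstring(PAGE)

# [span='Grandmaster'] keeps only divs that have a span child with exactly that text.
target = root.find(".//div[span='Grandmaster']")
print(target.get("id"))  # opt-grandmaster
```

For Selenium, a comparable full XPath would be //div[span[contains(., 'Grandmaster')]], which selects the div directly without listing the nested divs above it.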

Using Xpath to get the anchor text of a link in Python when the link has no class

(disclaimer: I only vaguely know python & am pretty new to coding)
I'm trying to get the text part of a link, but it doesn't have a specific class, and depending on how I word my code I get either way too many things (the xpath wasn't specific enough) or a blank [ ].
A screenshot of what I'm trying to access is :
Tree is all the html from the page.
The code that returns a blank is:
cardInfo=tree.xpath('div[@class="cardDetails"]/table/tbody/tr/td[2]/a/text()')
The code that returns way too much:
cardInfo=tree.xpath("a[contains(@href, 'domain_name')]/text()")
I tried going into Inspect in chrome and copying the xpath, which also gave me nothing. I've successfully gotten other things out of the page that are just plain text, not links. Super sorry if I didn't explain this well but does anyone have an idea of what I can write?
If you meant to find the text next to "Set Name:":
>>> import lxml.html
>>> tree = lxml.html.parse('http://shop.tcgplayer.com/pokemon/jungle/nidoqueen-7')
>>> tree.xpath(".//b[text()='Set Name:']/parent::td/following-sibling::td/a/text()")
['Jungle']
.//b[text()='Set Name:'] finds the b tag whose text is "Set Name:",
parent::td selects its parent td element,
following-sibling::td selects the td element that follows it.

Python crawler not finding specific Xpath

I asked my previous question here:
Xpath pulling number in table but nothing after next span
This worked, and I managed to see the number I wanted in a Firefox plugin called XPath Checker. The results are shown below.
So I know I can find this number with this XPath, but when I run a Python script to find and save the number, it says it cannot find it.
try:
    views = browser.find_element_by_xpath("//div[@class='video-details-inside']/table//span[@class='added-time']/preceding-sibling::text()")
except NoSuchElementException:
    print("NO views")
    views = 'n/a'
    pass
I know that pass is not best practice, but I am just testing at the moment, trying to find the number. I'm wondering if I need to change something at the end of the XPath, like .text, as XPath Checker normally shows results a little differently, like below:
I needed to use the XPath I gave rather than the one used in the above picture because I only want the number and not the date. You can see part of the source in my previous question.
Thanks in advance! Scratching my head here.
The xpath used in find_element_by_xpath() has to point to an element, not a text node and not an attribute. This is a critical thing here.
The easiest approach here would be to:
get the td's text (parent)
get the span's text (child)
remove child's text from parent's
Code:
span = browser.find_element_by_xpath("//div[@class='video-details-inside']/table//span[@class='added-time']")
td = span.find_element_by_xpath('..')
views = td.text.replace(span.text, '').strip()
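The three steps above can be rehearsed without a browser. The sketch below runs them on hypothetical markup (the real page is not shown here) using the standard library's ElementTree, with "".join(itertext()) standing in, as an approximation, for what Selenium's .text would return.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the page from the question.
PAGE = """
<div class="video-details-inside">
  <table><tr>
    <td>1,234 <span class="added-time">2 days ago</span></td>
  </tr></table>
</div>
"""

root = ET.fromstring(PAGE)

# Point the paths at elements only: first the span, then its parent td.
span = root.find(".//span[@class='added-time']")
td = root.find(".//span[@class='added-time']/..")

# Collect all text inside each element, including text of descendants.
td_text = "".join(td.itertext())
span_text = "".join(span.itertext())

# Remove the child's text from the parent's text and trim the whitespace.
views = td_text.replace(span_text, "").strip()
print(views)  # 1,234
```

Note that str.replace removes every occurrence of the span's text, so this relies on that text not also appearing in the part you want to keep.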

How to specify specific elements based on their attributes using XPATH in LXML

I am trying to improve my understanding of XPATH. I have a document that has many elements. I am looking for font elements within the document that have some specific text that is bolded.
Here is an example of a font element inside a div element. The font element has text that is bold that I want to capture
<div style="line-height:120%;padding-bottom:10px;padding-top:10px;font-size:10pt;"><font style="font-family:inherit;font-size:10pt;font-weight:bold;">SECTION 1. Executive Summary</font></div>
Let me be clear: this is part of a much larger document. I found an XPath tutorial that described how to select specific elements.
Just to make sure I am not running into an issue with how I am reading the file
from lxml import html
tree = html.fromstring(open('c:\\mytest.htm').read())
x = tree.xpath('//font')
This worked as x has 3023 elements and when I examined them I found that they were all font elements. Some were the elements I want.
I then tried to isolate the relevant elements by using
my_elements = tree.xpath("//font[@font-weight='bold']")
That did not work, as my_elements is empty. While writing this question I looked closer at the snippet of HTML and played around some in IDLE. I think the problem is that the font tag has no font-weight attribute. Font-weight is part of the value of the style attribute, which is the only attribute of the font tag in this example. I want to say more but am afraid I will muddy the water too much.
Bottom line: I want to be able to use XPath to find all font elements that are bold and have the word "section" in the text. I can do this by iterating through the elements and testing in a really clunky way:
my_elements = [e for e in tree.iter() if e.tag == 'font' and 'bold' in e.values()[0]]
my_elements = [e for e in my_elements if 'section' in e.text_content().lower()]
XPATH just looks like it is well worth understanding.
Thanks for any explanation.
Hmm, I am finally on the right track:
testelem=tree.xpath('//font[contains(@style,"font-weight:bold")]')
Okay, now we use the and operator:
testelem=tree.xpath('//font[contains(@style,"font-weight:bold") and contains(text(),"SECTION")]')
Now to make it case-insensitive.
I am getting close to understanding how contains works, but am happy for someone who does to put up a solution:
testelem=tree.xpath('//font[contains(@style,"font-weight:bold") and starts-with(translate(text(),"SECTION","section"),"section")]')
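A note on translate(): it maps characters one-to-one, so translating "SECTION" to "section" lowercases only those seven letters. Python's str.translate behaves the same way for equal-length strings without repeated characters, which makes the effect easy to check without lxml. A rough sketch follows; the helper name xpath_translate is made up for illustration, and the final query is only built as a string, not evaluated.

```python
import string


def xpath_translate(text, source, target):
    """Mimic XPath translate() for equal-length strings without repeated characters."""
    return text.translate(str.maketrans(source, target))


# Mapping only the letters of "SECTION" leaves every other capital untouched.
print(xpath_translate("SUMMARY", "SECTION", "section"))  # sUMMARY

# Mapping the full alphabet lowercases any ASCII text.
UPPER, LOWER = string.ascii_uppercase, string.ascii_lowercase
print(xpath_translate("SUMMARY", UPPER, LOWER))  # summary

# The same idea as an XPath predicate (a string only; not evaluated here).
query = (
    '//font[contains(@style, "font-weight:bold") and '
    f'contains(translate(text(), "{UPPER}", "{LOWER}"), "section")]'
)
print(query)
```

The resulting query string could be passed to tree.xpath() to match "section" in any ASCII capitalization.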
I just realised this post is more than 2 years old. Anyway, I still hope this answer will help someone who comes to this question.
You can use regular expressions in XPath with lxml. By default, lxml's XPath supports regular expressions in the EXSLT namespace:
testelem = tree.xpath('//font[re:match(text(), "(?i)^section.*") and '
                      'contains(@style, "font-weight:bold")]',
                      namespaces={'re': "http://exslt.org/regular-expressions"})
print(testelem)
[<Element font at 0x1042f49f0>]
for t in testelem:
    print(t.text, t.attrib)
SECTION 1. Executive Summary {'style': 'font-family:inherit;font-size:10pt;font-weight:bold;'}
