I'm attempting to extract "479" from this sample HTML:
<div data-testid="testid">
"479"
" Miles Away"
</div>
I'm using the following Selenium code in Python:
xpath = 'html/body/div/text()[1]'
WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.XPATH, xpath)))
distance = driver.find_element(By.XPATH, xpath)
print(distance)
Which returns the following error:
'The result of the xpath expression "html/body/div/text()[1]" is: [object Text]. It should be an element.'
I've tried removing 'text()[1]' from the end of my XPath, hoping it would print all the data contained in the HTML div, but it prints a blank line instead.
Note: I'm an amateur and self-taught (mostly via Google, YouTube, and this site), so some of my wording may not be correct. I apologize in advance.
Given the html:
<div data-testid="testid">
"479"
" Miles Away"
</div>
The texts 479 and Miles Away are in two different text nodes.
Selenium doesn't support text() here because it returns a text node, whereas Selenium expects a WebElement. Hence you see the error:
The result of the xpath expression "html/body/div/text()[1]" is: [object Text]. It should be an element.
Solution
To extract the text 479 you can use either of the following locator strategies:
Using xpath through execute_script() and textContent:
print(driver.execute_script('return arguments[0].firstChild.textContent;', WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.XPATH, "//div[@data-testid='testid']")))).strip())
Using xpath through splitlines() and get_attribute():
print(WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.XPATH, "//div[@data-testid='testid']"))).get_attribute("innerHTML").splitlines()[1])
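To see why firstChild picks out 479, here is a small offline sketch using Python's standard xml.dom.minidom instead of a browser. It assumes the div really holds two separate text nodes, as DevTools displays them; the DOM is built by hand because an HTML parser would normally merge adjacent text into a single node. It only mirrors what firstChild.textContent and textContent do in the browser and does not drive Selenium.

```python
from xml.dom.minidom import getDOMImplementation

# Build the DOM by hand so the <div> has two sibling text nodes,
# the way a client-side framework (assumed here) might render it.
doc = getDOMImplementation().createDocument(None, "html", None)
div = doc.createElement("div")
div.setAttribute("data-testid", "testid")
div.appendChild(doc.createTextNode("479"))
div.appendChild(doc.createTextNode(" Miles Away"))
doc.documentElement.appendChild(div)

# Equivalent of arguments[0].firstChild.textContent, then .strip().
first_text = div.firstChild.data.strip()

# Equivalent of the element's full textContent (all text nodes joined).
full_text = "".join(node.data for node in div.childNodes)

print(first_text)  # 479
print(full_text)   # 479 Miles Away
```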
The problem is that you can't treat the text like that: text() selects a text node, which Selenium cannot return. I think there is no XPath split function that can help you with that; I advise you to get the element's text into a Python variable and split it with split('\n').
xpath = 'html/body/div'
WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.XPATH, xpath)))
distance = driver.find_element(By.XPATH, xpath).text
print(distance.split('\n')[0])
You should take the entire element (without text()) using only
html/body/div
then get the text from the returned element, which will be: "479" " Miles Away".
Then, using Python's split method, you can extract the number (splitting by \n, a space, or ").
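A minimal sketch of that split step, assuming the element's .text comes back either as 479 Miles Away or with the quoted, line-separated layout shown in DevTools. leading_number is a hypothetical helper name, not part of Selenium:

```python
import re


def leading_number(text):
    """Return the first integer in an element's text, or None.

    Splits on runs of newlines, spaces and double quotes (the delimiters
    suggested above) and skips the empty pieces the split can produce.
    """
    for piece in re.split(r'[\n "]+', text):
        if piece.isdigit():
            return int(piece)
    return None  # no purely numeric token found


print(leading_number('479 Miles Away'))         # 479
print(leading_number('"479"\n" Miles Away"'))  # 479
```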
Selenium doesn't support the following XPath:
xpath = 'html/body/div/text()[1]'
To identify the element uniquely, your XPath should be like:
xpath = '//div[@data-testid="testid"]'
WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.XPATH, xpath)))
distance = driver.find_element(By.XPATH, xpath).text
print(distance)
To get the text of the element you have to use element.text
Related
I'm trying to extract the text value following a <b> tag that contains specific text. I'm using Selenium WebDriver with Python 3.
The HTML inspected for the value I'm trying to return (11,847) is here:
It has the XPath below (I'm not using this XPath directly to find the element, as the table structure changes across the different examples I plan to iterate through):
/html/body/form[1]/div[2]/table[2]/tbody/tr[3]/td[2]/text()
As an example, when I print the code below it returns Att:, i.e. the text of the element located by my search for 'Att' within the b tags.
att=driver.find_element("xpath",".//b[contains(text(), 'Att')]").text
print(att)
Is there a way I can return the value following <b>Att:</b> by searching for 'Att:'? (Conversely, I'd also like to return the value following <b>Ref:</b>.)
Thanks in advance.
The 11,847 text content belongs to the td node.
You can locate this td element by the text content of its child b.
Then you can retrieve the entire text content of that td node.
It will contain Att:, some extra spaces, and the desired 11,847 string.
Now you need to remove the Att: and the extra spaces so that only 11,847 remains.
As follows:
from selenium.webdriver.common.by import By
# get the entire text content
entire_text = driver.find_element(By.XPATH, "//td[.//b[contains(text(), 'Att')]]").text
# get the child node text content
child_text = driver.find_element(By.XPATH, "//b[contains(text(), 'Att')]").text
# remove child text content from entire text content
goal_text = entire_text.replace(child_text, '')
# trim white spaces
goal_text = goal_text.strip()
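The replace-and-strip step can be checked without a browser. This sketch assumes the td text looks like Att:  11,847 (the exact spacing is a guess, since the question's HTML was only shown as an image); strip_label is a hypothetical name, and unlike the code above it replaces only the first occurrence of the label. The last line shows one way to turn the result into a number:

```python
def strip_label(entire_text, label_text):
    """Remove the <b> label text from the <td> text and trim whitespace.

    Mirrors the replace()/strip() steps above, but replaces only the
    first occurrence so a value that repeats the label is left alone.
    """
    return entire_text.replace(label_text, "", 1).strip()


goal_text = strip_label("Att:  11,847 ", "Att:")
print(goal_text)                        # 11,847
print(int(goal_text.replace(",", "")))  # 11847
```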
You can use find_element() to locate the <b> element that contains the text 'Att:' and then read the text node that immediately follows it. Because Selenium cannot return a text node (an XPath such as ./following-sibling::text() raises InvalidSelectorException), that node has to be read with JavaScript through nextSibling. Here is an example of how you can do this:
from selenium.webdriver.common.by import By
att_element = driver.find_element(By.XPATH, "//b[contains(text(), 'Att:')]")
att_value = driver.execute_script("return arguments[0].nextSibling.textContent;", att_element).strip()
print(att_value)
This will locate the element that contains the text 'Att:', then read the text node that follows it and return its trimmed value.
Similarly, you can use the same approach for 'Ref:'; just change the text to 'Ref:':
ref_element = driver.find_element(By.XPATH, "//b[contains(text(), 'Ref:')]")
ref_value = driver.execute_script("return arguments[0].nextSibling.textContent;", ref_element).strip()
print(ref_value)
Note that this will only work if the text value you're trying to extract is immediately following the element that contains 'Att:' or 'Ref:' in a text node.
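The nextSibling idea can be tried offline with Python's xml.dom.minidom. The markup below is hypothetical (the question's HTML was an image, and the Ref: value is made up), and value_after_label is an illustrative helper, not a Selenium API. It applies the same restriction as the note above: the node right after the <b> must be a text node.

```python
from xml.dom.minidom import parseString

# Hypothetical markup; the real page was not included in the question.
td = parseString("<td><b>Att:</b> 11,847 <b>Ref:</b> ABC-123</td>").documentElement


def value_after_label(parent, label):
    """Return the trimmed text node right after the <b> whose text is label."""
    for b in parent.getElementsByTagName("b"):
        if b.firstChild is not None and b.firstChild.data.strip() == label:
            sibling = b.nextSibling
            # Only an immediately following text node counts.
            if sibling is not None and sibling.nodeType == sibling.TEXT_NODE:
                return sibling.data.strip()
    return None


print(value_after_label(td, "Att:"))  # 11,847
print(value_after_label(td, "Ref:"))  # ABC-123
```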
The following xpath would result in an error:
/html/body/form[1]/div[2]/table[2]/tbody/tr[3]/td[2]/text()
because Selenium can only return elements (WebElements), not text nodes.
Solution
The text 11,847 is within a text node, which is the third child node (childNodes[2]) of the <td> node. So to print the text, you have to use WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:
Using XPATH and childNodes[n]:
print(driver.execute_script('return arguments[0].childNodes[2].textContent;', WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//tr[@class='initial']//td[@align='right']")))).strip())
Using XPATH and splitlines():
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//tr[@class='initial']//td[@align='right']"))).get_attribute("innerHTML").splitlines()[2])
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
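Why index 2? A sketch with a hypothetical innerHTML for the td (the real markup was not included): if the cell starts with a line break, splitlines() yields an empty first entry, then the <b> line, then the value. The same layout gives childNodes of whitespace text, <b>, then the 11,847 text node, which is why both strategies use index 2. If the real markup is laid out differently, the index changes.

```python
# Hypothetical innerHTML of the <td>; the question's real markup was an image.
inner_html = "\n<b>Att:</b>\n11,847\n"

lines = inner_html.splitlines()
print(lines)             # ['', '<b>Att:</b>', '11,847']
print(lines[2].strip())  # 11,847
```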
Currently I have this line of code which correctly selects this type of object on the webpage I'm trying to manipulate with Selenium:
pointsObj = driver.find_elements(By.CLASS_NAME,'treeImg')
What I need to do is add in a partial string match condition as well which looks in the section "CLGV (AHU-01_ahu_ChilledWtrVlvOutVolts)" in the line below.
<span class="treeImg v65point" style="cursor:pointer;">CLGV (AHU-01_ahu_ChilledWtrVlvOutVolts)</span>
I found online there's the ChainedBy option but I can't think of how to reference that text in the span. Do I need to use XPath? I tried that for a second but I couldn't think of how to parse it.
Referring to both the CLASS_NAME and the innerText, you can use either of the following locator strategies:
xpath using the classname treeImg and partial innerText:
pointsObj = driver.find_elements(By.XPATH,"//span[contains(@class, 'treeImg') and contains(., 'AHU-01_ahu_ChilledWtrVlvOutVolts')]")
xpath using all the classnames and entire innerText:
pointsObj = driver.find_elements(By.XPATH,"//span[@class='treeImg v65point' and text()='CLGV (AHU-01_ahu_ChilledWtrVlvOutVolts)']")
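Alternatively, keeping the original find_elements(By.CLASS_NAME, 'treeImg') call, the partial match can be applied in Python afterwards. This is a different technique from the XPath above: it reads .text from every element, which costs one WebDriver round trip per element and can be slow on large trees. The sketch uses SimpleNamespace stand-ins instead of real WebElements, the second point's name is invented for illustration, and filter_by_text is a hypothetical helper.

```python
from types import SimpleNamespace


def filter_by_text(elements, fragment):
    """Keep only elements whose .text contains fragment."""
    return [el for el in elements if fragment in el.text]


# Stand-ins for WebElements returned by
# driver.find_elements(By.CLASS_NAME, 'treeImg'); only .text is used.
points = [
    SimpleNamespace(text="CLGV (AHU-01_ahu_ChilledWtrVlvOutVolts)"),
    SimpleNamespace(text="HTGV (AHU-01_ahu_HotWtrVlvOutVolts)"),  # invented
]

matches = filter_by_text(points, "AHU-01_ahu_ChilledWtrVlvOutVolts")
print(len(matches))  # 1
```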
This question already has answers here:
selenium.common.exceptions.InvalidSelectorException with "span:contains('string')"
When I try to use :contains in Selenium's By.CSS_SELECTOR, such as
presence = EC.presence_of_element_located((By.CSS_SELECTOR, ".btn:contains('Continue Shopping')"))
or
presence = EC.presence_of_element_located((By.CSS_SELECTOR, ".btn\:contains('Continue Shopping')"))
or
presence = EC.presence_of_element_located((By.CSS_SELECTOR, ".btn\\:contains('Continue Shopping')"))
the Python program crashes with the error
Exception: Message: invalid selector: An invalid or illegal selector was specified
(Session info: chrome=95.0.4638.54)
Is it possible to use :contains in Selenium? The CSS selector
$('.btn:contains("Continue Shopping")')
works fine in Chrome's JS console.
Using Chrome 95.0.4638.54, ChromeDriver 95.0.4638.54, Python 3.10 on Ubuntu 20.04.
The selector :contains('text') is a jQuery selector, not a valid CSS selector, which is what Selenium expects. I'm assuming the reason it works on the page via Chrome's DevTools console is that the page has jQuery defined on it.
Unfortunately, I do not believe you can directly select an element via its text using a CSS selector (link).
You have two options as far as I can see:
Alter your selector to be class or ID based (easiest)
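Create a Selenium utility that runs a JS snippet using this jQuery selector (this only works if the page loads jQuery), e.g. driver.execute_script("return jQuery(arguments[0] + ':contains(' + JSON.stringify(arguments[1]) + ')').get(0);", ".btn", "Continue Shopping")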
As mentioned by Aspok, your CSS locators are not valid CSS locators.
To locate an element based on its text you can use an XPath locator, something like:
//*[contains(@class,'btn') and contains(text(),'Continue Shopping')]
In case btn is the only class name of that element, your XPath can be:
//*[@class='btn' and contains(text(),'Continue Shopping')]
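When the class name and text come from variables, a small builder avoids quoting mistakes. This is a sketch with hypothetical helper names, using two common XPath 1.0 idioms that differ slightly from the XPath above: the concat(' ', normalize-space(@class), ' ') test matches btn only as a whole class token (so an element with just btn-primary won't match), and contains(., ...) checks the element's full string value rather than only its first text node. xpath_literal handles text containing quotes, since XPath 1.0 string literals have no escape character.

```python
def xpath_literal(s):
    """Quote a Python string as an XPath 1.0 string literal."""
    if "'" not in s:
        return f"'{s}'"
    if '"' not in s:
        return f'"{s}"'
    # Contains both quote kinds: stitch the pieces together with concat().
    parts = s.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def class_and_text_xpath(class_name, text):
    """XPath for any element whose class list includes class_name and
    whose string value contains text."""
    class_test = f"contains(concat(' ', normalize-space(@class), ' '), ' {class_name} ')"
    return f"//*[{class_test} and contains(., {xpath_literal(text)})]"


print(class_and_text_xpath("btn", "Continue Shopping"))
```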
As explained by @aspok, it is not a valid CSS selector.
If you would like an XPath for the same thing, where .btn is the class and the element has the text or partial text
Continue Shopping
you can try the XPath below:
//*[contains(text(),'Continue Shopping')]
or
//*[contains(text(), 'Continue Shopping') and contains(@class, 'btn')]
Please check in the Chrome dev tools whether the XPath matches a unique entry in the HTML DOM.
XPath that you should check:
//*[contains(text(), 'Continue Shopping') and contains(@class, 'btn')]
Steps to check:
Press F12 in Chrome -> go to the Elements section -> press CTRL + F -> then paste the XPath and see whether your desired element is highlighted with 1/1 matching node.
Also, just so you know, //* can be replaced by the tag name if you find multiple matching nodes.
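The same 1/1 check can be done from code. A minimal sketch: find_unique is a hypothetical helper that calls find_elements() (passing the string "xpath", which is the value of By.XPATH) and fails unless exactly one node matches. FakeDriver is only a stand-in so the snippet runs without a browser; in real use, pass your WebDriver instance.

```python
def find_unique(driver, xpath):
    """Return the single element matching xpath, or raise if the match
    count is not exactly one (the 1/1 check done by hand in DevTools)."""
    matches = driver.find_elements("xpath", xpath)  # "xpath" == By.XPATH
    if len(matches) != 1:
        raise ValueError(f"{len(matches)} nodes match {xpath!r}, expected 1")
    return matches[0]


class FakeDriver:
    """Stand-in for a WebDriver, used only to exercise find_unique()."""

    def __init__(self, results):
        self.results = results

    def find_elements(self, by, value):
        return self.results


xp = "//*[contains(text(), 'Continue Shopping') and contains(@class, 'btn')]"
print(find_unique(FakeDriver(["button"]), xp))  # button
```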
I am trying to extract some information from the Amazon website using Selenium, but I am not able to scrape that information using XPath in Selenium.
In the image below I want to extract the info highlighted.
This is the code I am using
try:
    path = "//div[@id='desktop_buybox']//div[@class='a-box-inner']//span[@class='a-size-small')]"
    seller_element = WebDriverWait(driver, 5).until(
        EC.visibility_of_element_located((By.XPATH, path)))
except Exception as e:
    print(e)
When I run this code, it shows that there is an error with seller_element = WebDriverWait(driver, 5).until( EC.visibility_of_element_located((By.XPATH, path))) but does not say what exception it is.
I tried looking online and found that this happens when selenium is not able to find the element in the webpage.
But I think the path I have specified is right. Please help me.
Thanks in advance
[EDIT-1]
This is the exception I am getting
Message:
//div[@class='a-section a-spacing-none a-spacing-top-base']//span[@class='a-size-small a-color-secondary']
The XPath could be something like this; you can shorten it.
A CSS selector could be the following, and so forth:
.a-section.a-spacing-none.a-spacing-top-base .a-size-small.a-color-secondary
I think the reason is that the XPath expression is not correct.
Take the following element as an example; it means the span has two classes:
<span class="a-size-small a-color-secondary">
So span[@class='a-size-small') will not work (the predicate is also closed with ) instead of ]).
Instead, you can use the XPath
//span[contains(@class, 'a-size-small') and contains(@class, 'a-color-secondary')]
or cssSelector as
span.a-size-small.a-color-secondary
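To illustrate the class-attribute point: the class attribute is a whitespace-separated list of tokens, so an equality test against one class fails, while a token test (which is what the CSS selector span.a-size-small.a-color-secondary does) succeeds. This plain-Python sketch, with a hypothetical has_classes helper, also shows that a substring test like contains(@class, 'a-size') can over-match:

```python
def has_classes(class_attr, *names):
    """True if every name is a whitespace-separated token of class_attr."""
    tokens = class_attr.split()
    return all(name in tokens for name in names)


class_attr = "a-size-small a-color-secondary"
print(class_attr == "a-size-small")                                  # False
print(has_classes(class_attr, "a-size-small", "a-color-secondary"))  # True
print("a-size" in class_attr, has_classes(class_attr, "a-size"))     # True False
```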
Amazon varies its content based on the country you are in. When I clicked the link you provided, I did not find the element you are looking for, simply because the item is not sold here in India.
In short, if you are in India and try to find your element, it is not there, but if you change the location to "United States", it appears.
Solution: change the location.
To print the text Ships from and sold by Amazon.com, you have to use WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:
Using CSS_SELECTOR and get_attribute():
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.a-section.a-spacing-none.a-spacing-top-base > span.a-size-small.a-color-secondary"))).get_attribute("innerHTML"))
Using XPATH and text attribute:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='a-section a-spacing-none a-spacing-top-base']/span[@class='a-size-small a-color-secondary']"))).text)
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
Outro
Link to useful documentation:
get_attribute() method Gets the given attribute or property of the element.
text attribute returns The text of the element.
Difference between text and innerHTML using Selenium
How would you find an element in Selenium (with Python) given the following HTML:
<a href="/user/login">Login</a>
As per the HTML you have shared, to find the element by href you can use either of the following:
from selenium.webdriver.common.by import By
css_selector:
driver.find_element(By.CSS_SELECTOR, "a[href='/user/login']")
xpath:
driver.find_element(By.XPATH, "//a[@href='/user/login']")
xpath (multiple attributes):
driver.find_element(By.XPATH, "//a[@href='/user/login' and text()='Login']")