XPath returns an empty list for the following queries.
I need to fetch UrlOne1, UrlOne2, DataOne1, DataOne, and DataOne2 from this table:
<table>
<thead></thead>
<tbody class="dataContainer">
<tr class="tableLight">
<td>DataOne1</td>
<td> <span class="badge"></span> <span class="long">DataOne</span> <span class="short">DataOne</span> </td>
<td class="hide-s"><span class="ClassOneCN"></span> <span class="ClassOne2">DataOne2</span></td></tr>
<tr class="tableLight">
<tr class="tableLight">
<tr class="tableLight">
The following queries return an empty list []:
response.xpath('//*[@class="dataContainer"]/a/@href')
response.xpath('//*[@class="tableLight"]')
response.xpath('//*[local-name() = "tr" and class="tableLight"]')
but the query below works fine and returns a match:
response.xpath('//*[@class="dataContainer"]')
For the first XPath, //*[@class="dataContainer"]/a/@href:
// selects descendants at any depth, whereas / selects only direct children of the current node. Here a isn't a direct child of the dataContainer element, so you need to use //:
//*[@class="dataContainer"]//a/@href
The second path, //*[@class="tableLight"], should work, but if you know it's a tr tag, say so:
//tr[@class="tableLight"]
In the third XPath, //*[local-name() = "tr" and class="tableLight"], class is an attribute, so you need to write @class (but I would suggest using the XPath above instead):
//*[local-name() = "tr" and @class="tableLight"]
As for what you need (UrlOne1, UrlOne2, DataOne1, DataOne, DataOne2), you could get the a elements with response.xpath('//tr[@class="tableLight"]//a') and then retrieve the href attribute or text of each a element.
Or get the href attributes and text directly:
//tr[@class="tableLight"]//a/@href
//tr[@class="tableLight"]//a//text()
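The difference between / and // can be checked offline. The sketch below uses the standard library's xml.etree.ElementTree rather than Scrapy's response.xpath, and the a elements in the sample markup are an assumption, since the posted HTML does not show them. ElementTree supports only a subset of XPath, so the href values are read with .get() instead of /@href.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the <a> elements are assumed, the posted HTML omits them.
SAMPLE = """
<table>
  <tbody class="dataContainer">
    <tr class="tableLight">
      <td><a href="UrlOne1">DataOne1</a></td>
      <td><a href="UrlOne2">DataOne2</a></td>
    </tr>
  </tbody>
</table>
"""

root = ET.fromstring(SAMPLE)

# "/" only looks at direct children of <tbody>, so no <a> is found.
direct = root.findall(".//tbody[@class='dataContainer']/a")

# "//" searches every level below <tbody>, so the nested <a> elements match.
nested = root.findall(".//tbody[@class='dataContainer']//a")

direct_hrefs = [a.get("href") for a in direct]
nested_hrefs = [a.get("href") for a in nested]
nested_texts = [a.text for a in nested]

print(direct_hrefs)
print(nested_hrefs)
print(nested_texts)
```

The first query comes back empty for the same reason the original Scrapy query did: the links sit inside tr and td, not directly under the container.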
I'm crawling with python3 and selenium.
I want to modify text in table
<table class="my table">
<tbody>
<tr>...</tr>
<tr>
<td>
<center>
This is text
</center>
</td>
</tr>
<tr>...</tr>
</tbody>
</table>
My goal is to modify "This is text" to another text.
I tried below code.
# python3 + selenium
table = driver.find_element_by_xpath("table xpath")
for tr in table.find_elements(By.TAG_NAME, 'tr'):
    for td in tr.find_elements(By.TAG_NAME, 'td'):
        for target in td.find_elements(By.TAG_NAME, 'center'):
            print(target.text) # This is text
            driver.execute_script('document.getElementById("{}").innerHTML = {}";"'.format(target, "new text"))
I got the following error
selenium.common.exceptions.JavascriptException: Message: javascript error: Unexpected identifier
(Session info: headless chrome=100.0.4896.60)
How can I modify that?
Thank you.
The document.getElementById expects an ID and not a WebElement. You're trying to pass a WebElement to it.
Do this instead:
from selenium.webdriver.common.by import By

table = driver.find_element(By.XPATH, '//table[@class="my table"]')
new_text = "This is the new text"
for tr in table.find_elements(By.TAG_NAME, 'tr'):
    for td in tr.find_elements(By.TAG_NAME, 'td'):
        for target in td.find_elements(By.TAG_NAME, 'center'):
            print(f'Text before: {target.text}')
            # arguments[0] is the WebElement passed after the script string
            driver.execute_script(f'arguments[0].innerHTML="{new_text}"', target)
            print(f'Text after: {target.text}\n')
For more information about arguments[0] in execute_script, read this answer
What does arguments[0] and arguments[1] mean
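To see why the original call was rejected, it can help to print the script string that .format() builds. The sketch below needs no browser. The target value is a stand-in string shaped like a WebElement's repr, which is an assumption about what str(target) looks like.

```python
# Stand-in for str(target); the exact repr of a WebElement is assumed here.
target = '<selenium.webdriver.remote.webelement.WebElement (session="abc", element="def")>'

# The same format call as in the question.
broken = 'document.getElementById("{}").innerHTML = {}";"'.format(target, "new text")

# The embedded double quotes end the JavaScript string early, and the
# replacement text is inserted without quotes, so this is not valid JavaScript.
print(broken)

# Passing values as script arguments avoids building JavaScript by formatting.
script = "arguments[0].innerHTML = arguments[1];"
# driver.execute_script(script, target_element, "new text")  # needs a live driver
print(script)
```

Passing the new text as arguments[1] is a variation on the answer above; it also sidesteps quoting problems when the text itself contains quotes.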
In the following example:
<tr>
<td>
</td>
<td>
</td>
<td>
</td>
<td>
</td>
<td>
text1
<br>
<img>
<br>
text2
</td>
</tr>
When I try to get the text in the 5th td like so:
something = elem.find_element_by_xpath('./td[5]').text
I get both texts in the same variable. I can split them but I was wondering if I can somehow get them in individual variables so I don't bother with a split. However when I try something like this:
something = elem.find_element_by_xpath('./td[5]/text()[1]')
I get the following error message:
InvalidSelectorException: invalid selector:
The result of the xpath expression "./td[5]/text()[1]" is: [object Text].
It should be an element.
Can I get around this error somehow?
You can try below code to get two separate text nodes:
something = elem.find_element_by_xpath('./td[5]')
text1 = driver.execute_script('return arguments[0].firstChild.textContent;', something).strip()
text2 = driver.execute_script('return arguments[0].lastChild.textContent;', something).strip()
In your initial attempt, when you used:
something = elem.find_element_by_xpath('./td[5]').text
you got both text1 and text2 because both text nodes belong to the fifth td.
In your next attempt, when you used:
something = elem.find_element_by_xpath('./td[5]/text()[1]')
Selenium raised InvalidSelectorException because, although ./td[5]/text()[1] is a valid XPath expression, Selenium only accepts expressions that resolve to elements, not text nodes.
To extract the texts text1 and text2 from the HTML you have provided you can use the str.splitlines method as follows :
text1 = driver.find_element_by_xpath("//tr//following-sibling::td[5]").get_attribute("innerHTML").splitlines()[1]
text2 = driver.find_element_by_xpath("//tr//following-sibling::td[5]").get_attribute("innerHTML").splitlines()[5]
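If the line positions in innerHTML are not stable, another option is to parse the cell's innerHTML and collect each text node separately. The sketch below uses the standard library's html.parser. The helper names and the inner_html string (standing in for the value returned by get_attribute("innerHTML")) are assumptions for illustration.

```python
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    """Collect each non-blank text node as a separate string."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def split_text_nodes(inner_html):
    """Return the stripped, non-empty text nodes found in inner_html."""
    parser = TextNodeCollector()
    parser.feed(inner_html)
    parser.close()  # flush any text still buffered at the end
    return parser.chunks


# Stand-in for element.get_attribute("innerHTML") on the fifth td.
inner_html = "\ntext1\n<br>\n<img>\n<br>\ntext2\n"
text1, text2 = split_text_nodes(inner_html)
print(text1, text2)
```

This does not depend on how many line breaks surround the tags, only on the order of the text nodes.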
I've written a script in Python to find the text within the td tag that is the next sibling of the first td tag, using BeautifulSoup in combination with CSS selectors. When I run the script, it works. However, when I do the same using the lxml library, it no longer works. How can I get my latter script working? Thanks.
This is the content:
html_content="""
<tr>
<td width="25%" valign="top" bgcolor="lightgrey" nowrap="">
<font face="Arial" size="-1" color="224119">
<b>Owner Address </b>
</font>
</td>
<td width="75%" valign="top" nowrap="">
<font face="Arial" size="-1" color="black">
1698 EIDER DOWN DR<br>SUMMERVILLE SC 29483
</font>
</td>
</tr>
"""
Working one with bs4:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_content,"lxml")
item = soup.select("td")[0].find_next_sibling().text
print(item)
Result:
1698 EIDER DOWN DRSUMMERVILLE SC 29483
The below script can find the address string:
from lxml.html import fromstring
root = fromstring(html_content)
item = root.cssselect("td b:contains('Address')")[0].text
print(item)
Result:
Owner Address
It doesn't work when it comes to finding the next sibling (I applied the "+" sign to find the next sibling):
from lxml.html import fromstring
root = fromstring(html_content)
item = root.cssselect("td b:contains('Owner Address')+td")[0].text
print(item)
Result:
Traceback (most recent call last):
File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\new_line_one.py", line 28, in <module>
item = root.cssselect("td b:contains('Owner Address')+td")[0].text
IndexError: list index out of range
How can I make it work to find the next sibling? Btw, I'm only after CSS selectors, not XPath. Thanks.
From the css3 selector docs:
8.3.1. Adjacent sibling combinator
The adjacent sibling combinator is made of the "plus sign" (U+002B, +)
character that separates two sequences of simple selectors. The
elements represented by the two sequences share the same parent in the
document tree and the element represented by the first sequence
immediately precedes the element represented by the second one.
This means that with your selector td b:contains('Owner Address')+td, you're asking for a td that has the same parent as the b that contains 'Owner Address' and is itself a child of another td. No such node exists. To make it work, your first partial selector must match the td, not the b node. Since the td contains the b, the following works:
td:contains('Owner Address') + td
Note that this td has no text of its own (only child nodes), so your snippet from above only prints whitespace.
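The parent and sibling relationship can be checked without lxml. The sketch below uses the standard library's xml.etree.ElementTree on a simplified, well-formed copy of the row (attributes dropped and br self-closed, both assumptions made so that it parses as XML). It is not a CSS selector; it only mirrors what td:contains('Owner Address') + td selects, and it shows how to read the address from the child nodes.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed version of the posted row (assumed for illustration).
ROW = """
<tr>
  <td><font><b>Owner Address </b></font></td>
  <td><font>1698 EIDER DOWN DR<br/>SUMMERVILLE SC 29483</font></td>
</tr>
"""

row = ET.fromstring(ROW)
cells = row.findall("td")

# Find the <td> whose combined text contains the label.
label_index = next(
    i for i, td in enumerate(cells) if "Owner Address" in "".join(td.itertext())
)

# The adjacent sibling is the next <td> under the same <tr>, not under <b>.
value_cell = cells[label_index + 1]

# The cell has no text of its own in this compact markup; the address lives
# in <font> and after <br/>, so gather it from all descendant text nodes.
address = " ".join(part.strip() for part in value_cell.itertext() if part.strip())

print(value_cell.text)
print(address)
```

With lxml, the equivalent of the last step would be to call text_content() on the matched td rather than reading .text.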
I am trying to use Python Selenium to input a custom value in an input box on a website.
The html shows this element is stored within a table. The html and my code are shown below. I have tried sending keys on the element with class 'filterLink' and the class 'value'. Is it possible to send keys on a table data element?
Or should I be looking somewhere else in the html to send keys on this field?
The error I am getting is "Message: unknown error: cannot focus element"
Thanks very much!!
<div class="ContentSourceTypeData show" group="Provider Type">
<ul>
<li>
<table class="listItem">
<tbody>
<tr>
<td class="label">
<a class="filterLink" href="javascript:void(0);" value="bank" name="Banks" address="true">
<span class="value">Banks</span> (1831)
</a>
</td>
</tr>
.....
My code...
index_details_elem = browser.find_element_by_class_name('ContentSourceTypeData')
nameTable = index_details_elem.find_elements_by_class_name("listItem")[0] # Select first listItem as element
nameDataElem = nameTable.find_element_by_class_name("label")
#nameInputElem = nameDataElem.find_elements_by_class_name("value") #used the above line instead of this one as the 'span' element seemed to be causing an issue
#print nameElem.location()
#nameDataElem.click() # removed as I can't click in a table
nameDataElem.send_keys("lookup value")
I would expect that you can call send_keys on any Selenium WebElement. I would locate the element with a CSS selector instead (there may be another element on the page with the class "label"). Try:
webelement = browser.find_element_by_css_selector('a[class="filterLink"]')
webelement.send_keys("lookup value")
That selects the first a element whose class attribute is "filterLink" and then sends the keys "lookup value" to it.
I want to recover a number that is located in the following table:
the site
<table class="table table-hover table-inx">
<tbody><tr>
</tr>
<tr>
</tr>
<tr>
</tr>
<tr>
<td class=""><label for="RentNet">Miete (netto)</label></td>
<td>478,28 €</td>
</tr>
<tr>
</tr>
<tr>
</tr>
<tr>
<td class=""><label for="Rooms">Zimmer</label></td>
<td>4</td>
</tr>
</tbody></table>
I suppose this strange format happens because the table entries are optional. I get to the table with driver.find_element_by_css_selector("table.table.table-hover") and I see how one could easily iterate through the <tr> tags. But how do I find the second <td> holding the data, in the <tr> with the <label for="Rooms"> ?
Is there a more elegant way than "find the only td field with a one-digit number" or load the detail page?
This similar question didn't help me, because there the tag in question has an id
EDIT:
I just found out about a very helpful cheat sheet for Xpath/CSS selectors posted in an answer to a related question: it contains ways to reference child/parent, next table entry etc
You can select the appropriate td tag using driver.find_element_by_xpath(). The XPath expression that you should use is as follows:
`'//label[@for="Rooms"]/parent::td/following-sibling::td'`
This selects the label tag with for attribute equal to Rooms, then navigates to its parent td element, then navigates to the following td element.
So your code will be:
elem = driver.find_element_by_xpath(
    '//label[@for="Rooms"]/parent::td/following-sibling::td')
An example of the XPath expression in action is here.
With xpath, you can create a search for an element that contains another element, like so:
elem = driver.find_element_by_xpath('//tr[./td/label[@for="Rooms"]]/td[2]')
The elem variable will now hold the second td element within the "Rooms" label row (which is what you were looking for). You could also assign the tr element to the variable, and then work with all of the data in the row since you know the cell structure (if you would like to work with the label and data).
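The same row-then-cell logic can be tried without a browser. The sketch below uses the standard library's xml.etree.ElementTree on a trimmed copy of the posted table. ElementTree does not support nested predicates or the following-sibling axis, so the lookup is split into two steps, and the helper name value_for_label is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted table; some empty rows are left out.
TABLE = """
<table class="table table-hover table-inx">
  <tbody>
    <tr></tr>
    <tr>
      <td class=""><label for="RentNet">Miete (netto)</label></td>
      <td>478,28 €</td>
    </tr>
    <tr></tr>
    <tr>
      <td class=""><label for="Rooms">Zimmer</label></td>
      <td>4</td>
    </tr>
  </tbody>
</table>
"""


def value_for_label(table, label_for):
    """Return the text of the second td in the row whose label has for=label_for."""
    for row in table.iter("tr"):
        # Step 1: does this row contain the label we are looking for?
        if row.find(f"td/label[@for='{label_for}']") is not None:
            # Step 2: take the second cell of that row.
            cell = row.find("td[2]")
            return cell.text.strip() if cell is not None and cell.text else None
    return None


table = ET.fromstring(TABLE)
rooms = value_for_label(table, "Rooms")
rent = value_for_label(table, "RentNet")
print(rooms, rent)
```

Because the row is found through its label, the result does not change when optional rows are added or removed.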
Have you tried xpath? Firebug is a great tool for copying xpaths. It will use indices to select the element you want. It's especially useful when your element has no name or ID.
Edit: not sure why I was down voted? I went on the site and found the XPath Firebug gave me:
/html/body/div[2]/div[7]/div[2]/div[3]/div/div[1]/div/div[3]/div[3]/div/table/tbody/tr[7]/td[2]
To get that 4, just:
xpath = "/html/body/div[2]/div[7]/div[2]/div[3]/div/div[1]/div/div[3]/div[3]/div/table/tbody/tr[7]/td[2]"
elem = driver.find_element_by_xpath(xpath)
print(elem.text) # prints '4'
To get all the elements for "rooms", you can call driver.find_elements_by_xpath with a partial XPath. A partial path must start with // so that it can match at any depth:
xpath = "//div/div[1]/div/div[3]/div[3]/div/table/tbody/tr[7]/td[2]"
elems = driver.find_elements_by_xpath(xpath) # returns list
for elem in elems:
    print(elem.text) # prints '3', '3', '4'
Finally, you might be able to get the data with page source.
First, let's make a function that outputs a list of rooms when we input the page source:
def get_rooms(html):
    rooms = list()
    partials = html.split('''<label for="Rooms">''')[1:]
    for partial in partials:
        partial = partial.split("<td>")[1]
        room = partial.split("</td>")[0]
        rooms.append(room)
    return rooms
Once that function is defined, we can retrieve the list of room numbers with:
html = driver.page_source
print(get_rooms(html))
It should output:
['3', '3', '4']