Selenium Python : find element whose href attribute has required keyword - python

The page I'm working on is in this link.
This is the relevant portion of that page:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>...</head>
<body>
...
<div id="searchResults">
<div class="box-related">...</div>
<img src="/ema/images/icon_download_spread.gif" />Download results to spreadsheet
<div class="table-holder">
<table class="table-epar eparResults" border="1" cellpadding="0" cellspacing="0" summary="Search results for EPARs ordered alphabetically">
<caption>EPAR Search results</caption>
<thead> ... </thead>
<tbody>
<tr>
<th scope="row" class="key-detail name word-wrap">
Abilify
</th>
...
</tr>
<tr>...</tr>
</tbody>
</table>
</div>
</div>
</body>
</html>
This is the XPath location of the element I wish to select:
//*[@id="searchResults"]/div[2]/table/tbody/tr[1]/th/a
But there may be many results on the searchpage, so I want to click on the link whose URL has the product number that I'm searching for (which is 000471 in this case). I want to select the <a> element which contains that string in the href attribute.
Here's what I've tried:
inp = driver.find_element_by_xpath("//*[@id='searchResults']/div[2]/table/tbody/tr[1]/th/a[contains(@href,'"+str3+"')]")
inp.click()
where str3 has the value 000471 in this case. But I keep getting NoSuchElementException.
Any help would be appreciated!

The problem is probably caused by elements that the browser's source viewer or inspector inserts when rebuilding the table. A tbody tag is often shown in the inspector even when it doesn't actually exist in the real source.
You can eliminate the unnecessary steps in your XPath, if you can still obtain a unique location path to the data you wish to select. This might be sufficient:
//*[@id='searchResults']//a[contains(@href,'000471')]
If the other steps are still necessary, you can try it without the tbody.
Update: I also noticed that your search page declares a namespace:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
...
Automatic registration of default namespaces is implementation-dependent. In XPath 1.0, an unprefixed name matches only elements in no namespace, so elements under a default namespace won't match plain selectors. If your Selenium implementation doesn't handle this for you, you need to either register a namespace/prefix mapping and prefix every element in the namespace (e.g. //h:table/h:tr/h:td), or ignore the namespace by using wildcards and comparing the local name in a predicate.
If the namespace is keeping you from selecting the node, you can ignore it with this expression:
//*[@id='searchResults']//*[local-name() = 'a'][contains(@href,'000471')]
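To see the namespace issue concretely outside the browser, here's a minimal stdlib sketch using xml.etree.ElementTree (the markup and hrefs are invented for illustration, not the real EMA page): the unprefixed search finds nothing, while the prefix-mapped one succeeds.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified version of the search-results markup.
doc = """<html xmlns="http://www.w3.org/1999/xhtml">
<body><div id="searchResults">
<a href="/medicines/000471/summary">Abilify</a>
<a href="/medicines/000999/summary">Other product</a>
</div></body>
</html>"""

root = ET.fromstring(doc)

# Unprefixed names match elements in *no* namespace, so this finds nothing:
assert root.findall(".//a") == []

# Register a prefix for the default namespace and qualify every step:
ns = {"h": "http://www.w3.org/1999/xhtml"}
links = root.findall(".//h:a", ns)
match = [a for a in links if "000471" in a.get("href", "")]
print(match[0].text)  # Abilify
```

The same idea carries over to Selenium: either qualify each step with a prefix, or fall back to the local-name() expression above.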

Related

Getting the data inside a td tag using Selenium Python

I am new to Selenium Python. I am using Selenium to log into a website. So far I have successfully logged in and navigated to the page I want. On that page there is a table with id "Common". Inside the table body there are a number of table rows. I need to get a particular value, "234", from the table. Below is a rough look of the HTML. I need the value "234" printed in the output window. I am using Python 2.7. Any help is much appreciated.
<div id="Common" class="x6w" theme="medium">
<div id="Common::content" class="x108" theme="medium">
<div>
<div id="pf12" class="x19" theme="medium">
<table cellpadding="0" cellspacing="0" border="0" summary="" role="presentation" style="width: auto">
<tbody>
<tr>
<td class="x4w" theme="medium" colspan="1">
<table cellpadding="0" cellspacing="0" border="0" width="100%" summary="" role="presentation">
<tbody>
<tr><td style="width: 150px"></td><td></td></tr>
<tr>....</tr>
<tr>....</tr>
<tr class="13" theme="medium" id="15">
<td class="13" theme="medium"><label class="label-text" theme="medium">ID</label></td>
<td valign="top" style="padding-left:9px" class="xv" theme="medium">234</td>
</tr>
There is an authorisation error in the HTML code.
Please provide the link for the page. [Complete]
If you want to know how to iterate through elements, this code snippet from GitHub will help you.
Your problem seems related to locating an element.
You can do research on relative XPath.
Anyway, based on your HTML, here's the locator using XPath:
targetElem = driver.find_element_by_xpath("//div[contains(@id,'Common')]//label[text()='ID']/parent::td/following-sibling::td")
value = targetElem.text
For any web table you are accessing, you should be able to extract the td columns; the trick is to build a unique locator, as above. Here you have text between the tags, so you can use the XPath text() function: "//*[text()='234']".
Otherwise, an id attribute can be used to locate the element, and its .text property gives you the text to print on the console.
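The label-then-sibling idea can be tested offline; here's a rough stdlib sketch with xml.etree.ElementTree on a simplified, well-formed version of the table (the markup here is invented, not the asker's exact page):

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down version of the table from the question.
doc = """<div id="Common"><table><tbody>
<tr><td><label>Name</label></td><td>Widget</td></tr>
<tr><td><label>ID</label></td><td>234</td></tr>
</tbody></table></div>"""

root = ET.fromstring(doc)
value = None
for tr in root.iter("tr"):
    cells = list(tr)  # the <td> children of this row
    label = cells[0].find("label")
    if label is not None and label.text == "ID":
        value = cells[1].text  # the td following the label's td
print(value)  # 234
```

The Selenium XPath above encodes the same walk (label with text 'ID', up to its td, over to the following sibling) in one expression.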

BeautifulSoup select function works differently between Python3.5.2 and Python3.4.2

Problem: I have an html file containing some tags, and I want to find a table tag with a class attribute whose value is 'targets'. Using BeautifulSoup 4.5.1, it works fine in Python 3.5.2 (Mac Sierra) but does not work in Python 3.4.2 (Raspberry Pi), and I want to figure out why.
Here is the example html file(test.html):
<!DOCTYPE html>
<html>
<head>
<title>test</title>
</head>
<body>
<table class="maincontainer">
<tbody>
<tr>中文</tr>
<tr>
<td>
<table class="main">
<tbody>
<tr>
<td class="embedded">
<td></td>
<table class="targets"></table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</body>
</html>
and here is how I write in the python file:
str=''
with open('test.html', 'rt', encoding='utf-8') as f:
    str = f.read()
from bs4 import BeautifulSoup
soup=BeautifulSoup(str)
table=soup.select('table[class="targets"]')
so can anyone tell me about the following questions:
how does the select function work?
why does this not work in 3.4.2 but works in 3.5.2?
is there any way to handle this problem?
This is because of the different modules installed in your 3.5 and 3.4 Python environments. When you don't pass a desired parser name explicitly:
soup = BeautifulSoup(str)
BeautifulSoup would pick the parser automatically, choosing from the installed modules. If you have lxml installed, it would pick it; if not, it would pick html5lib; and if that is not installed either, it would pick the built-in html.parser:
If you don’t specify anything, you’ll get the best HTML parser that’s
installed. Beautiful Soup ranks lxml’s parser as being the best, then
html5lib’s, then Python’s built-in parser.
In other words, you should define the parser explicitly to avoid related problems in the future. Determine which one works for your particular case and set it:
soup = BeautifulSoup(str, "html5lib")
# or soup = BeautifulSoup(str, "lxml")
# or soup = BeautifulSoup(str, "html.parser")
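The reason parser choice matters here is the malformed nesting in the sample (the `<td class="embedded">` is never closed before the next `<td>`), which each parser repairs differently, so the targets table may land in a different place in each tree. As a stdlib sketch, html.parser's event API shows the tag is present in the raw markup regardless of how any tree-builder would repair it (markup abbreviated from the question):

```python
from html.parser import HTMLParser

class ClassCounter(HTMLParser):
    """Count start tags with a given name and class attribute."""
    def __init__(self, tag, cls):
        super().__init__()
        self.tag, self.cls, self.hits = tag, cls, 0

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == self.tag and dict(attrs).get("class") == self.cls:
            self.hits += 1

html_doc = """<table class="maincontainer"><tbody><tr><td>
<table class="main"><tbody><tr><td class="embedded"><td></td>
<table class="targets"></table></td></tr></tbody></table>
</td></tr></tbody></table>"""

counter = ClassCounter("table", "targets")
counter.feed(html_doc)
print(counter.hits)  # 1
```

Since the tag is really there, a soup.select that misses it on one interpreter is a symptom of that interpreter's tree repair, which is exactly why pinning the parser name makes the behaviour reproducible.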

Python/Selenium, how to access an html list without an id, when the page has multiple lists of the same class

I am new to Selenium and was wondering how to correctly find items in the html list below. The issue I am having is that the html list does not have an 'id' directly; the id is on a 'span' a couple of lines above. The page has a few of these and they all have the same class "selectUL". In this example it is the "lang" list, but there are also region, timezone, etc.
I am trying to write a function that takes a 'field' (lang, region, etc.) and uses find_element_by_xpath to parse it out and eventually report which one is selected (and/or another function to set the selection).
So... assuming browser is webdriver.Chrome() and I was able to log in etc.
field = "lang"
# obviously not working but hopefully concept makes sense
sysEntry = browser.find_element_by_xpath("//*[@id='{}']//ul[contains(@class, 'selectUL')]".format(field))
Web Page snippit looks like:
<table class="info_table conf_table" cellspacing="0" cellpadding="0">
<tr>
<td class="head" colspan="2">Language / Country</td>
</tr>
<tr>
<td class="sub_head">Language</td>
<td class="content normal">
<div class="selectbox selectmenu">
<a class="selectbtn">
<span id="lang" class="selecttext">None</span>
<span class="select-arrow"></span>
</a>
<ul class="selectUL">
<li id="langEN" class="sel">English
</li>
<li id="langFR">French
</li>
<li id="langGE">German
</li>
</ul>
</div>
How do I access these so that I can read from/write to them?
I think the easier way to do this is to look for class="sel" on the LI. That seems to indicate which option is selected. In your snippet the option text sits directly inside the LI (there is no nested A), so you can use a CSS selector like "ul.selectUL > li.sel" and grab the text of that element. Something like
browser.find_element_by_css_selector("ul.selectUL > li.sel").text
This should return "English" from your HTML sample above.
Let's go a slightly different but more specific route. We can use XPath to find the TD that contains "Language" and then walk down through its sibling's children to find the LI with class sel, whose text is what you want.
browser.find_element_by_xpath("//td[@class='sub_head'][text()='Language']/following-sibling::td//li[@class='sel']").text
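The structure is easy to experiment with offline; here's a stdlib sketch using xml.etree.ElementTree on a trimmed, well-formed copy of the snippet (the surrounding table is omitted):

```python
import xml.etree.ElementTree as ET

doc = """<div class="selectbox selectmenu">
<span id="lang" class="selecttext">None</span>
<ul class="selectUL">
<li id="langEN" class="sel">English</li>
<li id="langFR">French</li>
<li id="langGE">German</li>
</ul>
</div>"""

root = ET.fromstring(doc)

# Read which option is currently selected:
selected = root.find(".//ul[@class='selectUL']/li[@class='sel']")
print(selected.text)  # English

# List every option, e.g. to pick one to click later:
options = [li.text for li in root.findall(".//ul[@class='selectUL']/li")]
print(options)  # ['English', 'French', 'German']
```

In live Selenium code the same predicates work in find_element_by_xpath, and for setting a selection you would click the li whose text (or id, e.g. langFR) matches the option you want.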

Remove matched tags in html files?

I have some html files, each of which contains
<td id="MenuTD" style="vertical-align: top;">
...
</td>
where ... can contain anything, and </td> matches <td id="MenuTD" style="vertical-align: top;">. I would like to remove this part from the html files.
Similarly, I may also want to remove some other tags in the files.
How shall I program that in Python?
I am looking at HTMLParser module in Python 2.7, but haven't figured out if that can help.
You can accomplish this using BeautifulSoup. You have two options, depending on what you want to do with the element you're removing.
Set up:
from bs4 import BeautifulSoup
html_doc = """
<html>
<header>
<title>A test</title>
</header>
<body>
<table>
<tr>
<td id="MenuTD" style="vertical-align: top;">
Stuff here <a>with a link</a>
<p>Or paragraph tags</p>
<div>Or a DIV</div>
</td>
<td>Another TD element, without the MenuTD id</td>
</tr>
</table>
</body>
</html>
"""
soup = BeautifulSoup(html_doc)
Option 1 is to use the extract() method. Using this, you will retain a copy of your extracted element so that you can utilize it later in your application:
Code:
menu_td = soup.find(id="MenuTD").extract()
At this point, the element you are removing has been saved to the menu_td variable. Do what you want with that. Your HTML in the soup variable no longer contains your element though:
print(soup.prettify())
Outputs:
<html>
<header>
<title>
A test
</title>
</header>
<body>
<table>
<tr>
<td>
Another TD element, without the MenuTD id
</td>
</tr>
</table>
</body>
</html>
Everything in the MenuTD element has been removed. You can see it is still in the menu_td variable though:
print(menu_td.prettify())
Outputs:
<td id="MenuTD" style="vertical-align: top;">
Stuff here
<a>
with a link
</a>
<p>
Or paragraph tags
</p>
<div>
Or a DIV
</div>
</td>
Option 2: Utilize .decompose(). If you do not need a copy of the removed element, you can utilize this function to remove it from the document and destroy the contents.
Code:
soup.find(id="MenuTD").decompose()
It doesn't return anything (unlike .extract()). It does, however, remove the element from your document:
print(soup.prettify())
Outputs:
<html>
<header>
<title>
A test
</title>
</header>
<body>
<table>
<tr>
<td>
Another TD element, without the MenuTD id
</td>
</tr>
</table>
</body>
</html>
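If installing BeautifulSoup isn't an option, the HTMLParser module the question mentions can do a cruder version of the same removal: re-emit every event except those inside the unwanted element. This is a minimal sketch (it ignores comments, doctypes and self-closing edge cases):

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Re-emit HTML, dropping the element whose id matches plus its subtree."""
    def __init__(self, drop_id):
        super().__init__()
        self.drop_id = drop_id
        self.out = []
        self.depth = 0        # nesting depth inside the dropped subtree
        self.drop_tag = None  # tag name of the dropped element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.drop_tag:
                self.depth += 1   # track same-named nested tags
            return
        if dict(attrs).get("id") == self.drop_id:
            self.depth, self.drop_tag = 1, tag
            return
        self.out.append(self.get_starttag_text())  # raw original start tag

    def handle_endtag(self, tag):
        if self.depth:
            if tag == self.drop_tag:
                self.depth -= 1
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.depth:
            self.out.append(data)

stripper = TagStripper("MenuTD")
stripper.feed('<table><tr><td id="MenuTD">menu <a>link</a></td><td>keep</td></tr></table>')
print("".join(stripper.out))  # <table><tr><td>keep</td></tr></table>
```

For anything beyond simple documents, though, the BeautifulSoup extract()/decompose() approach above is more robust.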

CSS Selectors, Choose by CHILD values

Let say I have an html structure like this:
<html><head></head>
<body>
<table>
<tr>
<td>
<table>
<tr>
<td>Left</td>
</tr>
</table>
</td>
<td>
<table>
<tr>
<td>Center</td>
</tr>
</table>
</td>
<td>
<table>
<tr>
<td>Right</td>
</tr>
</table>
</td>
</tr>
</table>
</body>
</html>
I would like to construct CSS selectors to access the three sub tables, which are only distinguished by the contents of a table data item in their first row.
How can I do this?
I don't think there is any method in CSS selectors to match on inner text.
You can achieve that by using an XPath or jQuery locator.
xpath :
"//td[contains(text(),'Left')]"
or
"//td[text()='Right']"
jQuery path
jQuery("td:contains('Center')")
Using the logic below (Java bindings shown), you can execute jQuery locators in WebDriver automation.
JavascriptExecutor js = (JavascriptExecutor) driver;
WebElement element=(WebElement)js.executeScript(locator);
The .text property on an element returns the text of that element.
tables = page.find_elements_by_xpath('.//table')
contents = "Left Center Right".split()
results = []
for table in tables:
    if table.find_element_by_xpath('.//td').text in contents:  # find_element returns only the first match
        results.append(table)
You can narrow the search field by setting 'page' to the first 'table' element and then running your search over that. There are all kinds of ways to improve performance like this. Note that this method will be fairly slow if there are a lot of extraneous tables present. Each webpage has its quirks in how it represents information; make sure you work around those to gain efficiency.
You can also use list comprehension to return your results.
results = [t for t in tables if t.find_element_by_xpath('.//td').text in contents]
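The first-cell-keying idea works the same offline; here is a stdlib sketch with xml.etree.ElementTree on the question's structure (trimmed):

```python
import xml.etree.ElementTree as ET

doc = """<table><tr>
<td><table><tr><td>Left</td></tr></table></td>
<td><table><tr><td>Center</td></tr></table></td>
<td><table><tr><td>Right</td></tr></table></td>
</tr></table>"""

root = ET.fromstring(doc)

# Key each inner table by the text of its first cell
# (findall('.//table') matches descendants only, not the outer root table):
inner = {t.find(".//td").text: t for t in root.findall(".//table")}
print(sorted(inner))  # ['Center', 'Left', 'Right']

# Pick the sub-table you want by its distinguishing cell:
center_table = inner["Center"]
print(center_table.find(".//td").text)  # Center
```

This mirrors the Selenium loop above: iterate over candidate tables, inspect the first td, and keep the matches.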
