Hi, we are running this code and it is driving me crazy.
We capture a data table in table; this works.
Then we grab all th elements and their text in sizes; this works.
Then we want to grab all underlying rows (tr) and afterwards loop over the columns in each row. This does not work: the color_rows object is always empty, yet the same XPath does work when tested in the browser. Why?
My question is: how can I grab the tbody/tr rows?
Expected flow:
Loop over the TRs.
Access the TRs one by one and get the first TD.
Get the data of all TDs that contain an input with class form-control.
table = response.xpath('//div[@class="content"]//table[contains(@class,"table")]')
sizes = table.xpath('./thead//th/text()').getall()[1:]  # works!
color_rows = table.xpath('./tbody/tr')  # does not work! object empty
for color_row in color_rows:
    color = color_row.xpath('/td[1]/b/text()').get().strip()
    print(color)
    stocks = color_row.xpath('/td/div[input[@class="form-control"]]/div//text()').getall()
    for size, stock in zip(sizes, stocks):
Our HTML data looks like this:
<table class="table">
<thead>
<tr>
<th id="ctl00_cphCEShop_colColore" class="text-left" colspan="2">Colore</th>
<th>S</th>
<th>M</th>
<th>L</th>
</tr>
</thead>
<tbody>
<tr>
<td id="x">
<b>White</b>
<input type="hidden" name="data" value="3230/201">
</td>
<td id="avail">
Avail:
</td>
<td id="1">
<div>
<input name="cell" type="text" class="form-control">
<div class="text-center">179</div>
</div>
</td>
<td id="2">
<div>
<input name="cell" type="text" class="form-control">
<div class="text-center">360</div>
</div>
</td>
etc etc
Apparently tbody tags are often omitted in the HTML source but added by the browser when it builds the DOM.
In this case there was no (real) tbody tag in the source, so the XPath matched nothing.
Hence the trouble with XPath if you assume the tbody tag is really there.
Why do browsers insert tbody element into table elements?
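To illustrate the effect, here is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selector (an assumption made so the snippet runs without Scrapy). The markup is a trimmed, hypothetical version of the table as the server might send it, with no tbody:

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the raw page source: note there is no <tbody>.
RAW = """
<table class="table">
  <thead><tr><th colspan="2">Colore</th><th>S</th><th>M</th></tr></thead>
  <tr>
    <td><b>White</b></td>
    <td>Avail:</td>
    <td><div><input class="form-control"/><div class="text-center">179</div></div></td>
    <td><div><input class="form-control"/><div class="text-center">360</div></div></td>
  </tr>
</table>
"""

table = ET.fromstring(RAW)

# Path that assumes a <tbody>: matches nothing in the raw source.
rows_via_tbody = table.findall("./tbody/tr")

# Path that does not depend on <tbody>: direct <tr> children of the table.
rows = table.findall("./tr")

# Skip the first header cell ("Colore"), as in the question.
sizes = [th.text for th in table.findall("./thead//th")][1:]

for row in rows:
    color = row.find("./td[1]/b").text.strip()
    # Inner <div> of every cell whose outer <div> holds an <input>.
    stocks = [d.text for d in row.findall("./td/div[input]/div")]
    print(color, dict(zip(sizes, stocks)))  # White {'S': '179', 'M': '360'}
```

In Scrapy itself, something like table.xpath('./tr | ./tbody/tr') should cover both cases. Note also that an expression starting with / (such as '/td[1]/b/text()') is evaluated from the document root, so the row-level expressions need a leading dot: './td[1]/b/text()'.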
Related
I'm using Ghost for Python 2.7 and I'm trying to click on a link which is in a table. The problem is that the link has no ID, name, etc. This is the HTML code:
<table id="table_webbookmarkline_2" cellpadding="4" cellspacing="0" border="0" width="100%">
<tr valign="top">
<td>
<a href="/dana/home/launch.cgi?url=.ahuvs%3A%2F%2Fhq0l5458452ERA-w-Xz8G3LKe8JNM%2F.ISDXWXaWXUivecOc" target="_blank" onClick='javascript:openBookmark(
this.href, "yes", "yes");
return false;' ><img src="/dana-cached/imgs/icn18x18WebBookmarkPop.gif" alt="This will open in a new TAB" width="18" height="18" border="0" ></a>
</td>
<td width="100%" align="left">
<a href="/dana/home/launch.cgi?url=.ahuvs%3A%2F%2Fhq0l5458452ERA-w-Xz8G3LKe8JNM%2F.ISDXWXaWXUivecOc" target="_blank" onClick='JavaScript:openBookmark(
this.href, "yes", "yes");
return false;' ><b>**LINK WHERE I WANT TO CLICK**</b> </a><br><span class="cssSmall"></span>
</td>
</tr>
</table>
How can I click on this kind of link?
Seems like Ghost's Session.click() takes a CSS selector. Here only the table has an ID, so a selector that takes the second td that is a descendant of that ID and finds the a element should work:
session.click('#table_webbookmarkline_2 td:nth-child(2) a')
I am working on a test case that sends values to the input fields for buying tickets. Selenium gives me an "unable to locate element" error when I try to locate the input field named itemq_3728. The problem is that the page changes the name of the input field every time it is reopened.
How can I locate the input field?
I tried XPath but could not get it to work, and I also could not write it relative to the name of the ticket.
<table id="bms_tickets" width="90%" cellspacing="5" cellpadding="0" class="bms_tickets table">
<thead>
<tr>
<th>NAME</th>
<th width="240px">PRICE</th>
<th width="100px">QUANTITY</th>
</tr>
</thead>
<tbody id="resTypesTable">
<tr id="bms_restype_3728" class="bms_restype">
<td class="bms_restype_desc">
Gen Ad
<div style="font-size:10px;margin-left:5px;">
</div>
</td>
<td class="bms_restype_price">
$10.00
<input type="hidden" name="pay_itemq_3728" value="10.00">
</td>
<td class="bms_restype_qty">
<input type="text" name="itemq_3728" value="0" placeholder="1" min="1">
</td>
</tr>
</tbody>
</table>
Hope this helps, assuming only the numeric part of the name changes after page load:
'//td[@class="bms_restype_qty"]//input[starts-with(@name,"itemq")]'
You can locate it using a CSS selector as below:
driver.find_element_by_css_selector("td.bms_restype_qty > input[type='text']")
Or, if you'd rather locate this element using XPath, you can locate it relative to the Gen Ad text in the name column as below:
driver.find_element_by_xpath(".//td[normalize-space(.)='Gen Ad' and @class = 'bms_restype_desc']/following-sibling::td[@class='bms_restype_qty']/input")
Or
driver.find_element_by_xpath(".//tr[td[normalize-space(.)='Gen Ad']]/td[@class='bms_restype_qty']/input")
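The idea behind these locators (anchor on the ticket name, then step over to the quantity cell in the same row) can be tried out without a browser. Below is a minimal sketch using the standard library's xml.etree.ElementTree instead of Selenium; quantity_input is a hypothetical helper, the markup is a trimmed, well-formed copy of the table, and the second row (VIP, 4100) is invented for illustration:

```python
import xml.etree.ElementTree as ET

# Well-formed, trimmed stand-in for the ticket table (second row is hypothetical).
PAGE = """
<table id="bms_tickets">
  <tbody id="resTypesTable">
    <tr id="bms_restype_3728" class="bms_restype">
      <td class="bms_restype_desc"> Gen Ad <div></div></td>
      <td class="bms_restype_price">$10.00 <input type="hidden" name="pay_itemq_3728" value="10.00"/></td>
      <td class="bms_restype_qty"><input type="text" name="itemq_3728" value="0"/></td>
    </tr>
    <tr id="bms_restype_4100" class="bms_restype">
      <td class="bms_restype_desc"> VIP </td>
      <td class="bms_restype_price">$25.00 <input type="hidden" name="pay_itemq_4100" value="25.00"/></td>
      <td class="bms_restype_qty"><input type="text" name="itemq_4100" value="0"/></td>
    </tr>
  </tbody>
</table>
"""


def quantity_input(table, ticket_name):
    """Return the quantity <input> of the row whose description is ticket_name."""
    for row in table.iter("tr"):
        desc = row.find("./td[@class='bms_restype_desc']")
        if desc is None:
            continue
        # Collapse whitespace, similar to normalize-space(.) in XPath.
        text = " ".join("".join(desc.itertext()).split())
        if text == ticket_name:
            return row.find("./td[@class='bms_restype_qty']/input")
    return None


field = quantity_input(ET.fromstring(PAGE), "Gen Ad")
print(field.get("name"))  # itemq_3728
```

Because the lookup never mentions the numeric suffix, it keeps working when the page hands out a different itemq_ name on the next load.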
From this Deutsche Börse web page, under the table header Issuer I want to get the string content 'db X-trackers' in the cell next to the one with Name in it.
Using my web browser, I inspect that table area and get the code, which I've pasted into this XML tree just so that I can test my xPath.
<root>
<div class="row">
<div class="col-lg-12">
<h2>Issuer</h2>
</div>
</div>
<div class="table-responsive">
<table class="table">
<tbody>
<tr>
<td>Name</td>
<td class="text-right">db X-trackers</td>
</tr>
</tbody>
</table>
</div>
</root>
According to FreeFormatter.com, my xPath below succeeds in retrieving the correct element (Text='db X-trackers'):
my_xpath = "//h2['Issuer']/ancestor::div[@class='row']/following-sibling::div//td['Name']/following-sibling::td[1]/text()"
Note: It goes to <h2>Issuer</h2> first to identify the right place to start working from.
However, when I run this on the actual web page using Selenium WebDriver, None is returned.
import re
from selenium.common.exceptions import NoSuchElementException

def get_sibling(driver, my_xpath):
    try:
        find_value = driver.find_element_by_xpath(my_xpath).text
    except NoSuchElementException:
        return None
    else:
        value = re.search(r"(.+)", find_value).group()
        return value
I don't believe anything is wrong in the function itself, so either the xPath must be faulty or there is something in the actual web page source code that throws it off.
When studying the actual Source code in Chrome, it looks a bit messier than what I see with Inspector, which is what I used to create the little XML tree above.
<div class="box">
<div class="row">
<div class="col-lg-12">
<h2>Issuer</h2>
</div>
</div>
<div class="table-responsive">
<table class="table">
<tbody>
<tr>
<td >
Name
</td>
<td class="text-right" >
db X-trackers
</td>
</tr>
<tr>
<td >
Product Family
</td>
<td class="text-right" >
db X-trackers
</td>
</tr>
<tr>
<td >
Homepage
</td>
<td class="text-right" >
<a target="_blank" href="http://www.etf.db.com">www.etf.db.com</a>
</td>
</tr>
</tbody>
</table>
</div>
Are there some peculiarities in the source code above, or is my xPath (or function) wrong?
I would use the following and following-sibling axes:
//h2[. = "Issuer"]/following::table//td[. = "Name"]/following-sibling::td
First we locate the h2 element, then get the following table element. In the table element we look for the td element with Name text and then get the following td sibling.
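The three steps can be sketched in plain Python to check the logic against the markup from the question. This is only an illustration with the standard library's xml.etree.ElementTree, not Selenium; text_of and value_after are hypothetical helper names. Two details are worth noting: a predicate such as ['Issuer'] is a non-empty string literal and therefore always true in XPath 1.0, so //h2['Issuer'] matches every h2; and the cells in the real source carry surrounding whitespace, so an exact comparison may need normalize-space(.) (the sketch normalises whitespace for that reason).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the page source, keeping the whitespace inside the cells.
SNIPPET = """
<root>
  <div class="row"><div class="col-lg-12"><h2>Issuer</h2></div></div>
  <div class="table-responsive">
    <table class="table">
      <tbody>
        <tr>
          <td >
            Name
          </td>
          <td class="text-right" >
            db X-trackers
          </td>
        </tr>
      </tbody>
    </table>
  </div>
</root>
"""


def text_of(el):
    """Whitespace-normalised text of an element, similar to normalize-space(.)."""
    return " ".join("".join(el.itertext()).split())


def value_after(root, heading, label):
    nodes = list(root.iter())  # all elements in document order
    # 1. Locate the <h2> whose text equals the heading.
    start = next((i for i, n in enumerate(nodes)
                  if n.tag == "h2" and text_of(n) == heading), None)
    if start is None:
        return None
    # 2. Take the first <table> that comes after it in document order.
    table = next((n for n in nodes[start + 1:] if n.tag == "table"), None)
    if table is None:
        return None
    # 3. Find the <td> holding the label and return its next <td> sibling.
    for row in table.iter("tr"):
        cells = row.findall("td")
        for cell, sibling in zip(cells, cells[1:]):
            if text_of(cell) == label:
                return text_of(sibling)
    return None


print(value_after(ET.fromstring(SNIPPET), "Issuer", "Name"))  # db X-trackers
```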
Similar to .renderContents here, I want to search by that value: Beautiful Soup [Python] and the extracting of text in a table
Sample HTML:
<table>
<tr>
<td>
This is garbage
</td>
<td>
<td class="thead" style="font-weight:normal">
<!-- status icon and date -->
<a name="post1"><img class="inlineimg" src="img.gif" alt="Old" border="0" title="Old"></a>
19-11-2010, 04:25 PM
<!-- / status icon and date -->
</td>
<td>
This is garbage
</td>
</tr>
</table>
What I tried:
soup.find_all("td", text = re.compile('(AM|PM)'))[0].get_text().strip()
However, the text parameter of find_all seems to not work for this application: IndexError: list index out of range
What do I need to do?
Don't specify the tag name at all and let it find the desired text node. Works for me:
soup.find(text=re.compile('(AM|PM)')).strip()
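As far as I understand it, the tag-plus-text search fails because the td here contains several children (comments, an a element and the date text), so there is no single string for the pattern to be matched against; searching the text nodes directly avoids that. The same idea, sketched with only the standard library's html.parser rather than BeautifulSoup (TextFinder is a hypothetical helper, and the sample is a tidied copy of the HTML above):

```python
import re
from html.parser import HTMLParser

SAMPLE = """
<table><tr>
<td>This is garbage</td>
<td class="thead" style="font-weight:normal">
  <!-- status icon and date -->
  <a name="post1"><img class="inlineimg" src="img.gif" alt="Old"></a>
  19-11-2010, 04:25 PM
  <!-- / status icon and date -->
</td>
<td>This is garbage</td>
</tr></table>
"""


class TextFinder(HTMLParser):
    """Collects text nodes matching a pattern, regardless of the enclosing tag."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.matches = []

    def handle_data(self, data):
        # Called for text nodes only; comments and tags are handled elsewhere.
        if self.pattern.search(data):
            self.matches.append(data.strip())


finder = TextFinder(r"\b(AM|PM)\b")
finder.feed(SAMPLE)
finder.close()
print(finder.matches)  # ['19-11-2010, 04:25 PM']
```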
I tried to convert an App Engine generated output page into a PDF and ran into some problems.
First: I select the contents with jQuery.
Second: I send this JavaScript variable to a new Python script.
Third: In the new Python script, I use xhtml2pdf to do the conversion.
However, I got confused in the second step. Below is my approach:
HTML:
<div class="articles">
<h2 class="model_header">PFAM Output</h2>
<form>
<table align="center">
<!--end 04uberoutput_start-->
<table class="out_chemical" width="550" border="1">
<tr>
<th scope="col" colspan="5">
<div align="center">Chemical Inputs</div>
</th>
</tr>
<tr>
<th scope="col" width="250">
<div align="center">Variable</div>
</th>
<th scope="col" width="150">
<div align="center">Unit</div>
</th>
<th scope="col" width="150">
<div align="center">Value</div>
</th>
</tr>
<tr>
<td>
<div align="center">Water Column Half life @20 ℃</div>
</td>
<td>
<div align="center">days</div>
</td>
<td>
<div align="center">11</div>
</td>
</tr>
</table>
</table>
</form>
</div>
JS
$(document).ready(function () {
var jq_html = $("div.articles").html();
console.log(jq_html);
$('.getpdf').append('<tr style="display:none"><td><input name="extract" value="' + jq_html + '"></input></td></tr>');
$('.getpdf').append('<tr><td><input type="submit" value="Generate PDF"/></td></tr>');
})
New Python script to do the conversion:
def post(self):
    form = cgi.FieldStorage()
    extract = form.getvalue('extract')
    print extract
    self.response.out.write(html)
When I tried to check if variable extract is transferred correctly, I got an empty page. It seems like this variable is ignored... The whole framework seems fine if I feed extract with a number. So could anyone help me to identify if my approach is correct? Thanks!
This line of code does not handle escaping HTML correctly. Additionally, it is a text field rather than a hidden field:
$('.getpdf').append('<tr style="display:none"><td><input name="extract" value="' + jq_html + '"></input></td></tr>');
A better way to do it would be like this:
$('<tr style="display:none"><td><input type="hidden" name="extract"></td></tr>')
.appendTo('.getpdf')
.find('input')
.val(jq_html);
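To see why the string concatenation breaks, here is a small sketch in Python (used here only as an illustration; the actual fix is the jQuery .val() call). ValueReader and roundtrip are hypothetical helpers, and jq_html is a shortened sample of the captured markup. The first double quote inside the markup ends the value attribute early, whereas escaping the markup first lets it survive parsing:

```python
import html
from html.parser import HTMLParser

# Markup to carry in the form field; it contains double quotes and angle brackets.
jq_html = '<h2 class="model_header">PFAM Output</h2>'


class ValueReader(HTMLParser):
    """Records the value attribute of every <input> it sees."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            self.values.append(dict(attrs).get("value"))


def roundtrip(field_markup):
    """Parse the field markup and return the value a parser would see."""
    reader = ValueReader()
    reader.feed(field_markup)
    reader.close()
    return reader.values[0] if reader.values else None


# Plain concatenation: the first " inside jq_html ends the attribute early.
broken = '<input type="hidden" name="extract" value="' + jq_html + '">'

# Escaped: quotes and angle brackets become entities and survive parsing.
safe = ('<input type="hidden" name="extract" value="'
        + html.escape(jq_html, quote=True) + '">')

print(repr(roundtrip(broken)))  # a truncated value, not the original markup
print(repr(roundtrip(safe)))    # the original markup, intact
```

Setting the value through .val() sidesteps the problem entirely, because the string is never interpreted as HTML.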