I'm parsing this page.
I collect the links from the elements with class number2. Then, inside the loop, I go through each number2 element and try to read the result from the cell with class 'center bold table-odds'. To do this I look up the parent row of each link, but every time I get the result of the first element (31:25 in this example):
<table class="table-main odds prediction-table" id="prediction-table-1">
<tbody>
<tr class="odd">
<td rowspan="3" class="center status-text-won">W</td>
<td rowspan="3" id="status-IwnElQet" class="table-time center datet t1570978800-6-1-0-0 ">Today<br>15:00</td>
<td rowspan="3" colspan="1" class="table-participant">
<a class="number2" href="/handball/europe/challenge-cup/vogosca-sviesa-IwnElQet/#1X2;2">1X2</a>
</td>
<td rowspan="3" class="center bold table-odds">31:25</td>
<td class="center table-odds result-ok">1.50</td>
</tr>
<tr class="even">
<td rowspan="3" class="center status-text-lost">L</td>
<td rowspan="3" id="status-0IZCD4u8" class="table-time center datet t1570978800-6-1-0-0 ">Today<br>15:00</td>
<td rowspan="3" colspan="2" class="table-participant">
<a class="number2" href="/volleyball/italy/serie-a2-women/marignano-talmassons-0IZCD4u8/#ah;2;-14.50;3">AH -14.5 Points</a>
</td>
<td rowspan="3" class="center bold table-odds">3:1</td>
<td class="center table-odds result-ok">2.01</td>
</tr>
</tbody>
</table>
odds = driver.find_elements_by_class_name('number2')
for odd in odds:
    print(odd.get_attribute('href'))
    print(odd.find_element_by_xpath('../..').find_element_by_class_name('center bold table-odds').text)
Your approach, with the lookup scoped to the link's own row:
odds = driver.find_elements_by_class_name('number2')
for odd in odds:
    print(odd.get_attribute('href'))
    print(odd.find_element_by_xpath('./ancestor::tr[1]').find_element_by_css_selector('.center.bold.table-odds').text)
    # or
    # print(odd.find_element_by_xpath('./ancestor::tr[1]//td[4]').text)
    # or
    # print(odd.find_element_by_xpath("./ancestor::tr[1]//td[contains(@class,'bold')]").text)
Second way, iterating over the rows instead of the links:
rows = driver.find_elements_by_css_selector('#prediction-table-1 > tbody > tr')
for row in rows:
    print(row.find_element_by_css_selector('.number2').get_attribute('href'))
    print(row.find_element_by_css_selector('.center.bold.table-odds').text)
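The row-scoped idea can be checked offline. Below is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for Selenium, on a trimmed copy of the table. The hrefs are shortened, the rowspan attributes and the <br> are dropped so the snippet parses as XML, and scores_by_link is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the table from the question (hrefs shortened).
SNIPPET = """
<table id="prediction-table-1">
  <tbody>
    <tr class="odd">
      <td class="table-participant"><a class="number2" href="/handball/#1X2;2">1X2</a></td>
      <td class="center bold table-odds">31:25</td>
      <td class="center table-odds result-ok">1.50</td>
    </tr>
    <tr class="even">
      <td class="table-participant"><a class="number2" href="/volleyball/#ah;2">AH -14.5 Points</a></td>
      <td class="center bold table-odds">3:1</td>
      <td class="center table-odds result-ok">2.01</td>
    </tr>
  </tbody>
</table>
"""


def scores_by_link(markup):
    """Pair each number2 link with the score cell found in the same row."""
    root = ET.fromstring(markup)
    pairs = []
    for row in root.iter("tr"):
        # Both lookups start from the row, so they cannot leak into other rows.
        link = row.find(".//a[@class='number2']")
        score = row.find("./td[@class='center bold table-odds']")
        if link is not None and score is not None:
            pairs.append((link.get("href"), score.text))
    return pairs


print(scores_by_link(SNIPPET))
```

Each href is paired with the score from its own row because every search starts at the row element rather than at the document.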
You have a typo:
find_element_by_class_name
should be
find_elements_by_class_name
Make it plural to get all of them. With the singular form you get only one element with class "number2", so your loop iterates only once.
odds = driver.find_elements_by_class_name('number2')
I'm having trouble printing the contents of this soup result in my IDE.
I have the following soup call:
row = soup.find_all('td', attrs = {'class': 'Table__TD'})
Here is a subset of what it returns:
[<td class="Table__TD">Sat 11/9</td>,
<td class="Table__TD"><span class="flex"><span class="pr2">vs</span><span class="pr2 TeamLink__Logo"><a class="AnchorLink v-mid" data-clubhouse-uid="s:40~l:46~t:6" href="/nba/team/_/name/dal/dallas-mavericks" title="Team - Dallas Mavericks"><img alt="DAL" class="v-mid" data-clubhouse-uid="s:40~l:46~t:6" height="20" src="" title="DAL" width="20"/></a></span><span><a class="AnchorLink v-mid" data-clubhouse-uid="s:40~l:46~t:6" href="/nba/team/_/name/dal/dallas-mavericks" title="Team - Dallas Mavericks">DAL</a></span></span></td>,
<td class="Table__TD"><a class="AnchorLink" data-game-link="true" href="http://www.espn.com/nba/game?gameId=401160772"><span class="flex tl"><span class="pr2"><div class="ResultCell tl loss-stat">L</div></span><span>138-122</span></span></a></td>,
<td class="Table__TD">31</td>,
<td class="Table__TD">6-12</td>,
<td class="Table__TD">50.0</td>,
<td class="Table__TD">4-9</td>,
<td class="Table__TD">44.4</td>,
<td class="Table__TD">2-2</td>,
<td class="Table__TD">100.0</td>,
<td class="Table__TD">4</td>,
<td class="Table__TD">4</td>,
<td class="Table__TD">2</td>,
<td class="Table__TD">3</td>,
<td class="Table__TD">2</td>,
<td class="Table__TD">1</td>,
<td class="Table__TD">18</td>,
<td class="Table__TD">Fri 11/8</td>,
I am trying to use a for loop to write these out but my console is not returning anything.
for data in row[0].find_all('td'):
    print(data.get_text())
Can anyone tell me what I am doing wrong? Thanks.
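The initial find_all already returns the td elements themselves, so you don't need to call find_all on the tag name again. row[0] is a single td with no td nested inside it, which is why your loop prints nothing.
Just do something like:
for data in row:
    print(data.get_text())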
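To see why the inner search comes back empty, here is a small sketch on a shortened copy of the cells. It uses the standard library's xml.etree.ElementTree in place of BeautifulSoup so that it runs without extra packages. findall plays the role of find_all, and joining itertext() is a rough equivalent of get_text().

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of three cells from the question.
ROW = """<tr>
<td class="Table__TD">Sat 11/9</td>
<td class="Table__TD"><span>vs</span><span><a href="/nba/team/_/name/dal/dallas-mavericks">DAL</a></span></td>
<td class="Table__TD">31</td>
</tr>"""

cells = ET.fromstring(ROW).findall("td")

# Searching for td elements inside the first td finds nothing:
# the cell is a leaf as far as td elements are concerned.
nested = cells[0].findall(".//td")

# Iterating over the cells themselves yields the text of every cell,
# including text that sits inside nested span and a elements.
texts = ["".join(cell.itertext()) for cell in cells]

print(nested)
print(texts)
```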
I have been trying for a few hours now, in vain, to extract the text from a specific cell in the following table:
<tbody class="table-body">
<tr class=" " data-blah="25293454534534513" data-currency="1">
<td class="action-cell no-sort">
</td>
<td class="col1 id">
<a class="alert-ico " data-tooltip=""></a>
<a class="isin-btn " data-tooltip="" id="isin" data-portfolioid="2423424" data-status="0">US3</a>
</td>
<td class="col2 name hide">4%</td>
<td class="col9 colNo.9" title="Bid: 101.23; Mid: 101.28; Ask: 101.33;
Liquidity Score: -*/5*; Merit: -/4;" data-bprice="101.28" data-uprice="101.28">101.28<span class="estim-star">*</span></td>
<td class="col10 price_change" nowrap="" data-sort="0.02"><span class="positive-change">0.02%</span><span class="change-sign positive-change">↑</span></td>
<td class="col11 yield yield-val" title="" data-sort="3.33" data-byield="3.33" data-uyield="3.34%">3.33%</td>
<td class="col12 purchase_price" data-bprice="101.28" data-uprice="101.28" data-sort="101.28"><input type="text" name="purchase_price" class="positive-num-only default" value="101.28"></td>
<td class="col13 margin_bond" data-bond="sec" data-sort="0"><input type="text" name="margin_bond" maxlength="3" class="positive-num-only default" value="0"></td>
</tr>
</tbody>
I'm trying to extract the text from the 'Price Change' column (col 10) using lxml.html, which lets me extract data from big tables in a matter of seconds. I'm doing it like this:
import lxml.html
import pandas as pd
root = lxml.html.fromstring(self.driver.page_source)
data = []
for row in root.xpath(".//*[@id='main']/div[5]/div[2]/table/tbody/tr"):
    cells = row.xpath('.//td/text()')
This extracts the whole table except column 10 ('price change'). For that column I tried the following, and each attempt returned the empty string (""):
row.xpath('.//tr[1]/td[11][@data-sort]/text()')
row.xpath(".//*[@id='main']/div[5]/div[2]/table/tbody/tr[1]/td[11]/span/text()")
row.xpath(".//*[@id='main']/div[5]/div[2]/table/tbody/tr[1]/td[11]/text()")
I don't want to extract the text using WebElement but only with lxml.html library
Thank you!
There are two problems:
In the HTML you posted there are eight td cells in the row, not eleven, and the one you are interested in is the fifth (td[5]), not td[11].
The td you are interested in contains two span elements, and your XPath does not say which span you want.
This code works:
html_code = """
<tbody class="table-body">
<tr class=" " data-blah="25293454534534513" data-currency="1">
<td class="action-cell no-sort">
</td>
<td class="col1 id">
<a class="alert-ico " data-tooltip=""></a>
<a class="isin-btn " data-tooltip="" id="isin" data-portfolioid="2423424" data-status="0">US3</a>
</td>
<td class="col2 name hide">4%</td>
<td class="col9 colNo.9" title="Bid: 101.23; Mid: 101.28; Ask: 101.33;
Liquidity Score: -*/5*; Merit: -/4;" data-bprice="101.28" data-uprice="101.28">101.28<span class="estim-star">*</span></td>
<td class="col10 price_change" nowrap="" data-sort="0.02">
<span class="positive-change">0.02%</span>
<span class="change-sign positive-change">↑</span></td>
<td class="col11 yield yield-val" title="" data-sort="3.33" data-byield="3.33" data-uyield="3.34%">3.33%</td>
<td class="col12 purchase_price" data-bprice="101.28" data-uprice="101.28" data-sort="101.28"><input type="text" name="purchase_price" class="positive-num-only default" value="101.28"></td>
<td class="col13 margin_bond" data-bond="sec" data-sort="0"><input type="text" name="margin_bond" maxlength="3" class="positive-num-only default" value="0"></td>
</tr>
</tbody>
"""
from lxml import html
tree = html.fromstring(html_code)
print("price change is %s" % tree.xpath(".//td[contains(@class,'col10')]/span[1]/text()")[0])
print("price change is %s" % tree.xpath(".//td[5]/span[1]/text()")[0])
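The empty results in the question come from asking the td for its own text when the text lives in a child span. A minimal sketch of that distinction follows, using the standard library's xml.etree.ElementTree rather than lxml so it runs anywhere. The row is shortened, and price_change is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of three cells from the row in the question.
ROW = """<tr>
<td class="col9">101.28<span class="estim-star">*</span></td>
<td class="col10 price_change" data-sort="0.02"><span class="positive-change">0.02%</span><span class="change-sign positive-change">↑</span></td>
<td class="col11 yield yield-val">3.33%</td>
</tr>"""


def price_change(row_markup):
    """Return (own text of the price_change cell, text of its first span)."""
    row = ET.fromstring(row_markup)
    for td in row.findall("td"):
        # Match on a class token instead of a column position.
        if "price_change" in td.get("class", "").split():
            first_span = td.find("span")
            return td.text, first_span.text
    return None


print(price_change(ROW))
```

The first item of the result is None because the cell has no text of its own. The value only appears once the first span is selected explicitly.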
I have a webpage with a table that only appears when I click 'Inspect Element' and is not visible through the View Source page. The table contains only two rows with several cells each and looks similar to this:
<table class="datadisplaytable">
<tbody>
<tr>
<td class="dddefault">16759</td>
<td class="dddefault">MATH</td>
<td class="dddefault">123</td>
<td class="dddefault">001</td>
<td class="dddefault">Calculus</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
<tr>
<td class="dddefault">16449</td>
<td class="dddefault">PHY</td>
<td class="dddefault">456</td>
<td class="dddefault">002</td>
<td class="dddefault">Physics</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
</tbody>
</table>
What I'm trying to do is to iterate through the rows and return the text contained in each cell. I can't really seem to do it with Selenium. The elements contain no IDs and I'm not sure how else to get them. I'm not very familiar with using xpaths and such.
Here is a debugging attempt that returns a TypeError:
def check_grades(self):
    table = []
    for i in self.driver.find_element_by_class_name("dddefault"):
        table.append(i)
    print(table)
What is an easy way to get the text from the rows?
XPath locators that depend on page structure can be fragile. It is often simpler to use CSS selectors or class names:
mytable = driver.find_element_by_css_selector('table.datadisplaytable')
for row in mytable.find_elements_by_css_selector('tr'):
    for cell in row.find_elements_by_tag_name('td'):
        print(cell.text)
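The find_element_by_* helpers used in this thread were removed in Selenium 4.3. As a rough sketch, assuming a Selenium 4 driver, the same loop can be written with find_element(by, value) and find_elements(by, value). The function name table_cell_texts is made up for illustration. The locator strings are the literal values of By.CSS_SELECTOR and By.TAG_NAME, spelled out here so the snippet needs no import.

```python
# Literal values of selenium.webdriver.common.by.By.CSS_SELECTOR and By.TAG_NAME.
CSS_SELECTOR = "css selector"
TAG_NAME = "tag name"


def table_cell_texts(driver):
    """Return the text of every cell in the table, one list per row.

    `driver` is assumed to be a Selenium 4 WebDriver (or any object
    exposing the same find_element / find_elements methods).
    """
    table = driver.find_element(CSS_SELECTOR, "table.datadisplaytable")
    rows = []
    for row in table.find_elements(TAG_NAME, "tr"):
        # Searching from the row keeps the lookup scoped to that row.
        rows.append([cell.text for cell in row.find_elements(TAG_NAME, "td")])
    return rows
```

In real code you would normally import By from selenium.webdriver.common.by and pass By.CSS_SELECTOR and By.TAG_NAME instead of the raw strings.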
If you want to go row by row using an xpath, you can use the following:
h = """<table class="datadisplaytable">
<tr>
<td class="dddefault">16759</td>
<td class="dddefault">MATH</td>
<td class="dddefault">123</td>
<td class="dddefault">001</td>
<td class="dddefault">Calculus</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
<tr>
<td class="dddefault">16449</td>
<td class="dddefault">PHY</td>
<td class="dddefault">456</td>
<td class="dddefault">002</td>
<td class="dddefault">Physics</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
</table>"""
from lxml import html
xml = html.fromstring(h)
# gets the table
table = xml.xpath("//table[@class='datadisplaytable']")[0]
# iterate over all the rows
for row in table.xpath(".//tr"):
    # get the text from all the td's from each row
    print([td.text for td in row.xpath(".//td[@class='dddefault'][text()]")])
Which outputs:
['16759', 'MATH', '123', '001', 'Calculus']
['16449', 'PHY', '456', '002', 'Physics']
Using td[text()] will avoid getting any Nones returned for the td's that hold no text.
To do the same using Selenium:
table = driver.find_element_by_xpath("//table[@class='datadisplaytable']")
for row in table.find_elements_by_xpath(".//tr"):
    print([td.text for td in row.find_elements_by_xpath(".//td[@class='dddefault'][1]")])
For multiple tables:
def get_row_data(table):
    for row in table.find_elements_by_xpath(".//tr"):
        yield [td.text for td in row.find_elements_by_xpath(".//td[@class='dddefault'][text()]")]

for table in driver.find_elements_by_xpath("//table[@class='datadisplaytable']"):
    for data in get_row_data(table):
        print(data)  # use the data
Correction of the Selenium part of @Padraic Cunningham's answer:
table = driver.find_element_by_xpath("//table[@class='datadisplaytable']")
for row in table.find_elements_by_xpath(".//tr"):
    print([td.text for td in row.find_elements_by_xpath(".//td[@class='dddefault']")])
Note: one round bracket was missing at the end. I also removed the [1] index, to match the first lxml example.
Another note: the example with the [1] index is still worth keeping, since it shows how to extract an individual element.
Another Version (modified and corrected post by Padraic Cunningham):
Tested with Python 3.x
#!/usr/bin/python
h = """<table class="datadisplaytable">
<tr>
<td class="dddefault">16759</td>
<td class="dddefault">MATH</td>
<td class="dddefault">123</td>
<td class="dddefault">001</td>
<td class="dddefault">Calculus</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
<tr>
<td class="dddefault">16449</td>
<td class="dddefault">PHY</td>
<td class="dddefault">456</td>
<td class="dddefault">002</td>
<td class="dddefault">Physics</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
</table>"""
from lxml import html
xml = html.fromstring(h)
# gets the table
table = xml.xpath("//table[@class='datadisplaytable']")[0]
# iterate over all the rows
for row in table.xpath(".//tr"):
    # get the text from all the td's from each row
    print([td.text for td in row.xpath(".//td[@class='dddefault']")])
I have a table (<table>) with values in each row (<tr>) from its body (<tbody>).
The value I would like to print out is in the <span> inside a <div> tag.
Inspecting the html, I see the value e.g. "Name" is in row 1 (tr[1]), column 2 (td[2]):
<tr class="GAT4PNUFG GAT4PNUMG" __gwt_subrow="0" __gwt_row="0">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUHG GAT4PNUNG">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUNG">
<div __gwt_cell="cell-gwt-uid-324" style="outline-style:none;">
<span class="linkhover" title="Name" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;color:#00A;cursor:pointer;">Name</span>
</div>
</td>
I would like to loop through each row of the table and print out the value in column 2, td[2].
I am using Python with Selenium WebDriver.
The full XPath to the table row 1, column 2 is:
html/body/div[2]/div[2]/div/div[4]/div/div[2]/div/div[3]/div/div[5]/div/div[3]/div/div[4]/div/div[2]/div/div[4]/div/div[3]/div/div[2]/div/div/table/tbody/tr[1]/td[2]/div/span
I was wondering whether I can start from the table, with the XPath as follows:
html/body/div[2]/div[2]/div/div[4]/div/div[2]/div/div[3]/div/div[5]/div/div[3]/div/div[4]/div/div[2]/div/div[4]/div/div[3]/div/div[2]/div/div/table/tbody
I can then use a for loop with an index for the tr and a fixed index for the td,
e.g. tr[i] for row i and td[2] for column 2:
html/body/div[2]/div[2]/div/div[4]/div/div[2]/div/div[3]/div/div[5]/div/div[3]/div/div[4]/div/div[2]/div/div[4]/div/div[3]/div/div[2]/div/div/table/tbody/tr[i]/td[2]/div/span
How can I loop through this table and print out the value of the span tag, which is always in column 2 of the table?
I tried to get the start of the table into a variable, which I could then maybe use to loop through the rows and columns.
I need some help, please.
table = self.driver.find_element(By.XPATH, 'html/body/div[2]/div[2]/div/div[4]/div/div[2]/div/div[3]/div/div[5]/div/div[3]/div/div[4]/div/div[2]/div/div[4]/div/div[3]/div/div[2]/div/div/table/tbody')
Here's the full HTML:
<table cellspacing="0" style="table-layout: fixed; width: 100%;">
<colgroup>
<tbody>
<tr class="GAT4PNUFG GAT4PNUMG" __gwt_subrow="0" __gwt_row="0">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUHG GAT4PNUNG">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUNG">
<div __gwt_cell="cell-gwt-uid-324" style="outline-style:none;">
<span class="linkhover" title="Name" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;color:#00A;cursor:pointer;">Name</span>
</div>
</td>
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUNG">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUNG">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUNG">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUBH GAT4PNUNG">
</tr>
<tr class="GAT4PNUEH" __gwt_subrow="0" __gwt_row="1">
<td class="GAT4PNUEG GAT4PNUFH GAT4PNUHG">
<td class="GAT4PNUEG GAT4PNUFH">
<div __gwt_cell="cell-gwt-uid-324" style="outline-style:none;">
<span class="linkhover" title="Address" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;color:#00A;cursor:pointer;">Address</span>
</div>
</td>
<td class="GAT4PNUEG GAT4PNUFH">
<td class="GAT4PNUEG GAT4PNUFH">
<td class="GAT4PNUEG GAT4PNUFH">
<td class="GAT4PNUEG GAT4PNUFH GAT4PNUBH">
</tr>
<tr class="GAT4PNUFG" __gwt_subrow="0" __gwt_row="2">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUHG">
<td class="GAT4PNUEG GAT4PNUGG">
<div __gwt_cell="cell-gwt-uid-324" style="outline-style:none;">
<span class="linkhover" title="DOB" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;color:#00A;cursor:pointer;">DOB</span>
</div>
</td>
<td class="GAT4PNUEG GAT4PNUGG">
<td class="GAT4PNUEG GAT4PNUGG">
<td class="GAT4PNUEG GAT4PNUGG">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUBH">
</tr>
<tr class="GAT4PNUEH" __gwt_subrow="0" __gwt_row="3">
---
<tr class="GAT4PNUFG" __gwt_subrow="0" __gwt_row="4">
---
</tbody>
</table>
The developer has put an ID into the table. I have it working now. It is printing all the cell values from column 2. The code is:
table_id = self.driver.find_element(By.ID, 'data_configuration_feeds_ct_fields_body0')
rows = table_id.find_elements(By.TAG_NAME, "tr")  # get all of the rows in the table
for row in rows:
    # Get column 2 of the row
    col = row.find_elements(By.TAG_NAME, "td")[1]  # note: indexing starts at 0, so 1 is column 2
    print(col.text)  # prints the text of the element
The XPath you are currently using is quite fragile, since it depends on the complete document structure and the relative position of the elements. It can easily break in the future.
Instead, locate the rows using their class or other attributes. For instance:
for row in driver.find_elements_by_css_selector("tr.GAT4PNUFG.GAT4PNUMG"):
    cell = row.find_elements_by_tag_name("td")[1]
    print(cell.text)
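A short relative path is usually enough to reach column 2 of every row. The sketch below demonstrates this on a trimmed, well-formed copy of the table (class and style attributes dropped), using the standard library's xml.etree.ElementTree as a stand-in for the browser. The same td[2]/div/span step should carry over to a Selenium XPath locator, though that is not tested here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the table: three rows, value in the second cell of each.
TABLE = """<table>
<tbody>
<tr><td></td><td><div><span class="linkhover" title="Name">Name</span></div></td><td></td></tr>
<tr><td></td><td><div><span class="linkhover" title="Address">Address</span></div></td><td></td></tr>
<tr><td></td><td><div><span class="linkhover" title="DOB">DOB</span></div></td><td></td></tr>
</tbody>
</table>"""

root = ET.fromstring(TABLE)

# td[2] selects the second td child of each tr, whatever wraps the table.
values = [span.text for span in root.findall(".//tr/td[2]/div/span")]

print(values)
```

Because the path starts at the table rather than at html/body, changes to the surrounding div structure do not affect it.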
Probably a little late to this, but here's my code, which works for me:
def find_in_table(self, name):
    check_table = self.isElementPresent("//table[@class='assessment_list_table_tableStyle__Qw-rz']",
                                        locatorType="xpath")
    while not check_table:
        time.sleep(10)
        check_table = self.isElementPresent("//table[@class='assessment_list_table_tableStyle__Qw-rz']",
                                            locatorType="xpath")
    table_id = self.driver.find_element(By.XPATH, "//table[@class='assessment_list_table_tableStyle__Qw-rz']")
    rows = table_id.find_elements(By.TAG_NAME, "tr")
    for x in range(1, len(rows)):
        col = rows[x].find_elements(By.TAG_NAME, "td")[0]
        s = col.text
        if s == name:
            return x
Check whether the table exists.
Get the table element with find_element.
Use the table element to find the rows in the table.
Iterate through the rows and read the text in the first column (index 0).
Return the row index when the text matches the given name.
The XPath of the table element can be obtained using the Selenium plugin in IntelliJ. I find the plugin useful for locating elements, and more accurate than the browser extensions I have tried.
(isElementPresent is a method I use to check whether an element is present; it calls a getElement method and returns a boolean indicating whether the element exists.)
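The polling loop in find_in_table never gives up if the table does not appear. A minimal sketch of a bounded wait, in plain Python, is shown below. wait_until is a hypothetical helper name, and the default timeout and interval are arbitrary. Selenium also ships WebDriverWait for this purpose.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns True as soon as the predicate succeeds, or False once
    `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Inside find_in_table this could replace the while loop, for example by passing a lambda that calls self.isElementPresent with the table XPath and returning early when wait_until reports False.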
I am new to Python, and someone suggested I use Beautiful Soup for scraping. I am stuck on a problem: I want to fetch the href attribute from the td tag in column 2, based on the year in column 4.
<table class="tableFile2" summary="Results">
<tr>
<th width="7%" scope="col">Filings</th>
<th width="10%" scope="col">Format</th>
<th scope="col">Description</th>
<th width="10%" scope="col">Filing Date</th>
<th width="15%" scope="col">File/Film Number</th>
</tr>
<tr>
<td nowrap="nowrap">8-K</td>
<td nowrap="nowrap"> Documents</td>
<td class="small" >Current report, items 8.01 and 9.01
<br />Acc-no: 0001193125</td>
<td>2013-05-03</td>
<td nowrap="nowrap">000-10030<br>13813281 </td>
</tr>
<tr class="blueRow">
<td nowrap="nowrap">424B2</td>
<td nowrap="nowrap"> Documents</td>
<td class="small" >Prospectus [Rule 424(b)(2)]<br />Acc-no: 0001193125</td>
<td>2013-05-01</td>
<td nowrap="nowrap">333-188191<br>13802405 </td>
</tr>
<tr>
<td nowrap="nowrap">FWP</td>
<td nowrap="nowrap"> Documents</td>
<td class="small" >Filing under Securities Act Rules 163/433 of free writing prospectuses<br />Acc-no: 0001193125-13-189053 (34 Act) Size: 52 KB </td>
<td>2013-05-01</td>
<td nowrap="nowrap">333-188191<br>13800170 </td>
</tr>
</table>
table = soup.find('table', class="tableFile2")
rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    if "2013" in cols[3]
        link = cols[1].find('a').get('href')
        print
This works for me in Python 2.7:
table = soup.find('table', {'class': 'tableFile2'})
rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    if len(cols) >= 4 and "2013" in cols[3].text:
        link = cols[1].find('a').get('href')
        print(link)
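A few issues with your previous code:
class is a reserved word in Python, so soup.find('table', class="tableFile2") is a syntax error. Pass a dictionary of attributes instead (e.g., {'class': 'tableFile2'}), or use the class_ keyword argument in Beautiful Soup 4.
Not every cols list has at least 4 entries (the header row contains only th cells), so you need to check the length first.
cols[3] is a tag, so test the year against cols[3].text rather than against the tag itself.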
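The two guards can be exercised on a small, made-up table. The sketch below uses the standard library's xml.etree.ElementTree instead of Beautiful Soup so it runs without extra packages. The hrefs and the 10-Q row are invented for illustration, and links_for_year is a hypothetical helper name. It also skips rows whose second cell has no link, a case the code above would fail on.

```python
import xml.etree.ElementTree as ET

# Made-up table in the shape of the one in the question (hrefs are invented).
TABLE = """<table class="tableFile2">
<tr><th>Filings</th><th>Format</th><th>Description</th><th>Filing Date</th></tr>
<tr><td>8-K</td><td><a href="/example/8-k-index.htm">Documents</a></td><td>Current report</td><td>2013-05-03</td></tr>
<tr><td>10-Q</td><td><a href="/example/10-q-index.htm">Documents</a></td><td>Quarterly report</td><td>2012-11-02</td></tr>
<tr><td>FWP</td><td>Documents</td><td>Free writing prospectus</td><td>2013-05-01</td></tr>
</table>"""


def links_for_year(markup, year):
    """Return the hrefs from column 2 for rows whose filing date starts with year."""
    links = []
    for tr in ET.fromstring(markup).iter("tr"):
        cols = tr.findall("td")
        if len(cols) < 4:
            continue  # header row: only th cells, so no td to index
        if not (cols[3].text or "").startswith(year):
            continue  # filing date is from a different year
        anchor = cols[1].find("a")
        if anchor is not None:  # some cells may hold plain text only
            links.append(anchor.get("href"))
    return links


print(links_for_year(TABLE, "2013"))
```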