Iterate through table rows and print column text with Python Selenium - python

I have a table (<table>) with values in each row (<tr>) from its body (<tbody>).
The value I would like to print out is in the <span> inside a <div> tag.
Inspecting the html, I see the value e.g. "Name" is in row 1 (tr[1]), column 2 (td[2]):
<tr class="GAT4PNUFG GAT4PNUMG" __gwt_subrow="0" __gwt_row="0">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUHG GAT4PNUNG">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUNG">
<div __gwt_cell="cell-gwt-uid-324" style="outline-style:none;">
<span class="linkhover" title="Name" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;color:#00A;cursor:pointer;">Name</span>
</div>
</td>
I would like to loop through each row of the table and print out the value in column 2 (td[2]).
I am using Python with Selenium WebDriver.
The full Xpath to the table row 1, column 2 is:
html/body/div[2]/div[2]/div/div[4]/div/div[2]/div/div[3]/div/div[5]/div/div[3]/div/div[4]/div/div[2]/div/div[4]/div/div[3]/div/div[2]/div/div/table/tbody/tr[1]/td[2]/div/span
I was thinking that I could start from the table, with the XPath as follows:
html/body/div[2]/div[2]/div/div[4]/div/div[2]/div/div[3]/div/div[5]/div/div[3]/div/div[4]/div/div[2]/div/div[4]/div/div[3]/div/div[2]/div/div/table/tbody
I could then use a for loop with an index for the tr and td,
e.g. for row 1 use tr[i], and for column 2 use td[2]:
html/body/div[2]/div[2]/div/div[4]/div/div[2]/div/div[3]/div/div[5]/div/div[3]/div/div[4]/div/div[2]/div/div[4]/div/div[3]/div/div[2]/div/div/table/tbody/tr[i]/td[2]/div/span
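The indexed-XPath idea above could be sketched roughly as follows. This is only an illustration: `cell_xpath` is a hypothetical helper, and the short `//table/tbody` prefix is an assumption standing in for whatever anchor the real page offers. The literal text `tr[i]` inside a string is not substituted by Python, so the index has to be formatted into the string, and XPath positions start at 1.

```python
# Hypothetical helper: builds a relative XPath for the <span> in a given
# row and column. The "//table/tbody" prefix is an assumption; a real page
# may need a more specific anchor, such as an id on the table.
def cell_xpath(row, col=2, table_xpath="//table/tbody"):
    # XPath positions are 1-based, so row 1 is tr[1] and column 2 is td[2].
    return f"{table_xpath}/tr[{row}]/td[{col}]/div/span"


# Build the locators for the first three rows.
xpaths = [cell_xpath(i) for i in range(1, 4)]
for xp in xpaths:
    print(xp)
```

Each generated string could then be passed to `find_element(By.XPATH, ...)` inside the loop.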
How can I loop through this table and print out the value of the <span> tag, which is always in column 2 of the table?
I tried to get the start of the table into a variable and then I could maybe use this to loop through the rows and columns.
I need some help please.
table = self.driver.find_element(By.XPATH, 'html/body/div[2]/div[2]/div/div[4]/div/div[2]/div/div[3]/div/div[5]/div/div[3]/div/div[4]/div/div[2]/div/div[4]/div/div[3]/div/div[2]/div/div/table/tbody')
Here's the full HTML:
<table cellspacing="0" style="table-layout: fixed; width: 100%;">
<colgroup>
<tbody>
<tr class="GAT4PNUFG GAT4PNUMG" __gwt_subrow="0" __gwt_row="0">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUHG GAT4PNUNG">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUNG">
<div __gwt_cell="cell-gwt-uid-324" style="outline-style:none;">
<span class="linkhover" title="Name" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;color:#00A;cursor:pointer;">Name</span>
</div>
</td>
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUNG">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUNG">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUNG">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUBH GAT4PNUNG">
</tr>
<tr class="GAT4PNUEH" __gwt_subrow="0" __gwt_row="1">
<td class="GAT4PNUEG GAT4PNUFH GAT4PNUHG">
<td class="GAT4PNUEG GAT4PNUFH">
<div __gwt_cell="cell-gwt-uid-324" style="outline-style:none;">
<span class="linkhover" title="Address" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;color:#00A;cursor:pointer;">Address</span>
</div>
</td>
<td class="GAT4PNUEG GAT4PNUFH">
<td class="GAT4PNUEG GAT4PNUFH">
<td class="GAT4PNUEG GAT4PNUFH">
<td class="GAT4PNUEG GAT4PNUFH GAT4PNUBH">
</tr>
<tr class="GAT4PNUFG" __gwt_subrow="0" __gwt_row="2">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUHG">
<td class="GAT4PNUEG GAT4PNUGG">
<div __gwt_cell="cell-gwt-uid-324" style="outline-style:none;">
<span class="linkhover" title="DOB" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;color:#00A;cursor:pointer;">DOB</span>
</div>
</td>
<td class="GAT4PNUEG GAT4PNUGG">
<td class="GAT4PNUEG GAT4PNUGG">
<td class="GAT4PNUEG GAT4PNUGG">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUBH">
</tr>
<tr class="GAT4PNUEH" __gwt_subrow="0" __gwt_row="3">
---
<tr class="GAT4PNUFG" __gwt_subrow="0" __gwt_row="4">
---
</tbody>
</table>

The developer has put an ID into the table. I have it working now. It is printing all the cell values from column 2. The code is:
table_id = self.driver.find_element(By.ID, 'data_configuration_feeds_ct_fields_body0')
rows = table_id.find_elements(By.TAG_NAME, "tr")  # get all of the rows in the table
for row in rows:
    # Get column 2 of the row
    col = row.find_elements(By.TAG_NAME, "td")[1]  # note: indices start at 0, so 1 is column 2
    print(col.text)  # prints the text of the element

The XPath you are currently using is quite fragile, since it depends on the complete document structure and the relative position of the elements. It can easily break in the future.
Instead, locate the rows using their class or other attributes. For instance:
for row in driver.find_elements_by_css_selector("tr.GAT4PNUFG.GAT4PNUMG"):
    cell = row.find_elements_by_tag_name("td")[1]
    print(cell.text)
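To show the row-by-row idea without a browser, here is a minimal sketch that uses the standard library's `xml.etree.ElementTree` in place of Selenium. The snippet is a trimmed, well-formed stand-in for the table in the question (the real markup has more cells and attributes), so treat it as an illustration of the short relative path rather than as the page's actual structure.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the table in the question (assumption:
# the first cell content is made up, only column 2 mirrors the original).
SNIPPET = """
<table>
  <tbody>
    <tr><td>0</td><td><div><span title="Name">Name</span></div></td></tr>
    <tr><td>1</td><td><div><span title="Address">Address</span></div></td></tr>
    <tr><td>2</td><td><div><span title="DOB">DOB</span></div></td></tr>
  </tbody>
</table>
"""

table = ET.fromstring(SNIPPET)

# "td[2]" is 1-based, like the XPath in the question: the second cell of each row.
column_two = [span.text for span in table.findall("./tbody/tr/td[2]/div/span")]
print(column_two)
```

The point is that a short path relative to the table is enough; the long absolute path from `html/body` is not needed once the table itself has been located.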

Probably a little late to this, but here's my code, and it works like a charm.
def find_in_table(self, name):
    table_xpath = "//table[@class='assessment_list_table_tableStyle__Qw-rz']"
    # Poll until the table is present (isElementPresent is my own helper)
    check_table = self.isElementPresent(table_xpath, locatorType="xpath")
    while not check_table:
        time.sleep(10)
        check_table = self.isElementPresent(table_xpath, locatorType="xpath")
    table_id = self.driver.find_element(By.XPATH, table_xpath)
    rows = table_id.find_elements(By.TAG_NAME, "tr")
    # Start at index 1, skipping the first row
    for x in range(1, len(rows)):
        col = rows[x].find_elements(By.TAG_NAME, "td")[0]
        s = col.text
        if s == name:
            return x
The steps are:
1. Check whether the table exists.
2. Get the table element with find_element.
3. Use the table element to find the rows in the table.
4. Iterate through the rows and read the text in the first column (index 0).
5. Return the row index when the text matches the given name.
The XPath of the table element can be obtained using the Selenium plugin in IntelliJ. The plugin is very useful for finding elements, and I find it more accurate than the browser extensions.
(isElementPresent is a method I wrote that checks whether an element is present using Selenium's getElement method and returns a boolean.)
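The "check whether the table exists" step above loops forever if the table never appears. A bounded polling loop is one alternative; the sketch below is a generic standard-library version, not part of the answer's code. `wait_until` and `table_present` are hypothetical names, and the fake check simply succeeds on the third poll so the example can run without a browser.

```python
import time


def wait_until(condition, timeout=30.0, interval=0.5):
    """Poll condition() until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in for a page check: reports the table as present on the third poll.
attempts = {"count": 0}


def table_present():
    attempts["count"] += 1
    return attempts["count"] >= 3


found = wait_until(table_present, timeout=5.0, interval=0.01)
print(found, attempts["count"])
```

In real Selenium code, the `condition` argument could wrap the presence check from the answer; Selenium also ships its own `WebDriverWait` class for this purpose.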

Related

Why I get only first element?

I'm parsing this page
I pull out the links from the elements with class number2. Then, in a loop, I go through each number2 element and try to get the result from the cell with class 'center bold table-odds'. To do this, I try to go up to the parent of each link, but every time I get the result from the first element (in this example, 31:25).
<table class="table-main odds prediction-table" id="prediction-table-1">
<tbody>
<tr class="odd">
<td rowspan="3" class="center status-text-won">W</td>
<td rowspan="3" id="status-IwnElQet" class="table-time center datet t1570978800-6-1-0-0 ">Today<br>15:00</td>
<td rowspan="3" colspan="1" class="table-participant">
<a class="number2" href="/handball/europe/challenge-cup/vogosca-sviesa-IwnElQet/#1X2;2">1X2</a>
</td>
<td rowspan="3" class="center bold table-odds">31:25</td>
<td class="center table-odds result-ok">1.50</td>
</tr>
<tr class="even">
<td rowspan="3" class="center status-text-lost">L</td>
<td rowspan="3" id="status-0IZCD4u8" class="table-time center datet t1570978800-6-1-0-0 ">Today<br>15:00</td>
<td rowspan="3" colspan="2" class="table-participant">
<a class="number2" href="/volleyball/italy/serie-a2-women/marignano-talmassons-0IZCD4u8/#ah;2;-14.50;3">AH -14.5 Points</a>
</td>
<td rowspan="3" class="center bold table-odds">3:1</td>
<td class="center table-odds result-ok">2.01</td>
</tr>
</tbody>
</table>
odds = driver.find_elements_by_class_name('number2')
for odd in odds:
    print(odd.get_attribute('href'))
    print(odd.find_element_by_xpath('../..').find_element_by_class_name('center bold table-odds').text)
Your way to do it:
odds = driver.find_elements_by_class_name('number2')
for odd in odds:
    print(odd.get_attribute('href'))
    print(odd.find_element_by_xpath('./ancestor::tr[1]').find_element_by_css_selector('.center.bold.table-odds').text)
    # or
    # print(odd.find_element_by_xpath('./ancestor::tr[1]//td[4]').text)
    # or
    # print(odd.find_element_by_xpath("./ancestor::tr[1]//td[contains(@class,'bold')]").text)
Second way:
rows = driver.find_elements_by_css_selector('#prediction-table-1 > tbody > tr')
for row in rows:
    print(row.find_element_by_css_selector('.number2').get_attribute('href'))
    print(row.find_element_by_css_selector('.center.bold.table-odds').text)
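Both variants above replace the class-name lookup with a CSS selector, because 'center bold table-odds' is three separate classes and a class-name locator is meant for a single class. A small sketch of that conversion follows; `classes_to_css` is a hypothetical helper written for illustration, not a Selenium function.

```python
def classes_to_css(class_attr, tag=""):
    """Turn a space-separated class attribute into a compound CSS selector."""
    names = class_attr.split()
    # Each class gets a leading dot; no spaces, so all classes must match
    # on the same element.
    return tag + "".join("." + name for name in names)


selector = classes_to_css("center bold table-odds", tag="td")
print(selector)
```

The resulting string could be passed to the CSS-selector lookups used in the answer.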
You have a typo:
find_element_by_class_name
should be
find_elements_by_class_name
Make it plural to get them all.
With the singular form you get only one element, so your loop iterates only once.
odds = driver.find_elements_by_class_name('number2')

Python - BS4 - extracting a subtable from a wikipedia table using only table header + save as dictionary

I am trying to define a function which extracts all rows of the 'Basisdaten' table on the website https://de.wikipedia.org/wiki/Stuttgart and return a dictionary whose keys and values correspond to the first and second cells in each row of the table.
The 'Basisdaten' table is part of a much larger table, as shown through the result of the following code:
import re
from bs4 import BeautifulSoup
import requests
r = requests.get("https://de.wikipedia.org/wiki/Stuttgart")
soup = BeautifulSoup(r.text, "html.parser")
soup.find('th', text=re.compile('Basisdaten')).find_parent('table')
Unfortunately, there is no unique ID which I can use to only select those rows making up the 'Basisdaten' table. These are the rows which I hope to extract in HTML format:
<tr>
<th colspan="2">Basisdaten
</th></tr>
<tr class="hintergrundfarbe2">
<td>Bundesland:</td>
<td>Baden-Württemberg
</td></tr>
<tr class="hintergrundfarbe2">
<td>Regierungsbezirk:
</td>
<td>Stuttgart
</td></tr>
<tr class="hintergrundfarbe2">
<td>Höhe:
</td>
<td>247 m ü. NHN
</td></tr>
<tr class="hintergrundfarbe2">
<td>Fläche:
</td>
<td>207,35 km<sup>2</sup>
</td></tr>
<tr class="hintergrundfarbe2">
<td>Einwohner:
</td>
<td style="line-height: 1.2em;">628.032 <small><i>(31. Dez. 2016)</i></small><sup class="reference" id="cite_ref-Metadaten_Einwohnerzahl_DE-BW_1-0">[1]</sup>
</td></tr>
<tr class="hintergrundfarbe2">
<td>Bevölkerungsdichte:
</td>
<td>3029 Einwohner je km<sup>2</sup>
</td></tr>
<tr class="hintergrundfarbe2">
<td style="vertical-align: top;">Postleitzahlen:
</td>
<td>70173–70619
</td></tr>
<tr class="hintergrundfarbe2">
<td style="vertical-align: top;">Vorwahl:
</td>
<td>0711
</td></tr>
<tr class="hintergrundfarbe2">
<td style="vertical-align: top;">Kfz-Kennzeichen:
</td>
<td>S
</td></tr>
<tr class="hintergrundfarbe2">
<td style="vertical-align: top;">Gemeindeschlüssel:
</td>
<td>08 1 11 000
</td></tr>
<tr class="hintergrundfarbe2 metadata">
<td>LOCODE:
</td>
<td>DE STR
</td></tr>
<tr class="hintergrundfarbe2 metadata">
<td>NUTS:
</td>
<td>DE111
</td></tr>
<tr class="hintergrundfarbe2">
<td style="vertical-align: top;">Stadtgliederung:
</td>
<td>23 Stadtbezirke<br/>mit 152 Stadtteilen
</td></tr>
<tr class="hintergrundfarbe2">
<td style="vertical-align: top;">Adresse der<br/>Stadtverwaltung:
</td>
<td>Marktplatz 1<br/>70173 Stuttgart
</td></tr>
<tr class="hintergrundfarbe2" style="vertical-align: top;">
<td>Webpräsenz:
</td>
<td style="max-width: 10em; overflow: hidden; word-wrap: break-word;"><a class="external text" href="//www.stuttgart.de/" rel="nofollow">www.stuttgart.de</a>
</td></tr>
<tr class="hintergrundfarbe2">
<td style="vertical-align: top;">Oberbürgermeister:
</td>
<td>Fritz Kuhn (Bündnis 90/Die Grünen)
</td></tr>
I have succeeded in writing this code which gives me the desired result in dictionary form:
data = []
def extractDict(y):
    results = y.find("th", {"colspan" : "2"}).find_parent('table').select('td')[3:35]
    for row in results:
        data.append(row.text.strip().replace('\xa0', '').replace(':', '').replace('[1]', ''))
    return dict(zip(data[::2], data[1::2]))
basisdaten = extractDict(soup)
basisdaten
Result:
{'Adresse derStadtverwaltung': 'Marktplatz 170173 Stuttgart',
'Bevölkerungsdichte': '3029Einwohner je km2',
'Bundesland': 'Baden-Württemberg',
'Einwohner': '628.032 (31.Dez.2016)',
'Fläche': '207,35km2',
'Gemeindeschlüssel': '08111000',
'Höhe': '247m ü.NHN',
'Kfz-Kennzeichen': 'S',
'LOCODE': 'DE STR',
'NUTS': 'DE111',
'Oberbürgermeister': 'Fritz Kuhn (Bündnis 90/Die Grünen)',
'Postleitzahlen': '70173–70619',
'Regierungsbezirk': 'Stuttgart',
'Stadtgliederung': '23 Stadtbezirkemit 152 Stadtteilen',
'Vorwahl': '0711',
'Webpräsenz': 'www.stuttgart.de'}
However I am looking for a better solution which does not involve simply picking the 4th to 35th row from the parent table. I subsequently intend to use this code on other similar wikipedia urls and the 'Basisdaten' tables may vary across websites in terms of number of rows.
The similarity amongst all 'Basisdaten' tables is that they are all embedded within the first table and that they all have two columns, hence all start with 'th colspan="2"'. The parent table contains other subtables, for example in this case the subtable 'Lage der Stadt Stuttgart in Baden-Württemberg' comes after 'Basisdaten'.
Is it possible to write a loop which searches for the 'Basisdaten' subtable header and takes all rows thereafter, but stops when it reaches the next subtable header ('th colspan="2"')?
I have only gotten as far as to find the row which contains the start of the Basisdaten table:
soup.find('th', text=re.compile('Basisdaten'))
Hope that made sense! I am very new to Beautifulsoup and Python and this is a very challenging problem for me.
This should do it:
from bs4 import BeautifulSoup
import requests
html_text = requests.get("https://de.wikipedia.org/wiki/Stuttgart").text
soup = BeautifulSoup(html_text, "lxml")
trs = soup.select('table[id*="Infobox"] tr')
is_in_basisdaten = False
data = {}
clean_data = lambda x: x.get_text().strip().replace('\xa0', '').replace(':', '')
for tr in trs:
    if tr.th:
        # get_text() is used instead of .string, which is None when <th> has several children
        header = tr.th.get_text()
        if "Basisdaten" in header:
            is_in_basisdaten = True
        elif is_in_basisdaten:
            break  # reached the next sub-table header
    elif is_in_basisdaten:
        key, val = tr.select('td')
        data[clean_data(key)] = clean_data(val)
print(data)
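The "start at the matching header, stop at the next header" logic can also be tried offline. The sketch below uses the standard library's `xml.etree.ElementTree` rather than BeautifulSoup, on a trimmed, well-formed stand-in for the infobox; the rows before and after 'Basisdaten' are invented placeholders, and `extract_section` and `cell_text` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the infobox (assumption: the real page
# has many more rows; the first and last sections here are placeholders).
SNIPPET = """
<table>
  <tr><th colspan="2">Stuttgart</th></tr>
  <tr><th colspan="2">Basisdaten</th></tr>
  <tr><td>Bundesland:</td><td>Baden-Württemberg</td></tr>
  <tr><td>Vorwahl:</td><td>0711</td></tr>
  <tr><th colspan="2">Lage der Stadt</th></tr>
  <tr><td>Karte:</td><td>ignored</td></tr>
</table>
"""


def cell_text(cell):
    # Join all nested text and drop the trailing colon used in the keys.
    return "".join(cell.itertext()).strip().rstrip(":")


def extract_section(table, heading):
    result = {}
    inside = False
    for tr in table.iter("tr"):
        th = tr.find("th")
        if th is not None:
            if inside:
                break  # the next sub-table header ends the section
            inside = heading in cell_text(th)
        elif inside:
            cells = tr.findall("td")
            if len(cells) == 2:
                result[cell_text(cells[0])] = cell_text(cells[1])
    return result


basisdaten = extract_section(ET.fromstring(SNIPPET), "Basisdaten")
print(basisdaten)
```

Because the loop stops at the next header row, the number of rows in the section does not need to be known in advance.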

Cannot access to cell text inside an html table (Selenium,python)

I have been trying in vain for a few hours now to extract the text from a specific cell in the following table:
<tbody class="table-body">
<tr class=" " data-blah="25293454534534513" data-currency="1">
<td class="action-cell no-sort">
</td>
<td class="col1 id">
<a class="alert-ico " data-tooltip=""></a>
<a class="isin-btn " data-tooltip="" id="isin" data-portfolioid="2423424" data-status="0">US3</a>
</td>
<td class="col2 name hide">4%</td>
<td class="col9 colNo.9" title="Bid: 101.23; Mid: 101.28; Ask: 101.33;
Liquidity Score: -*/5*; Merit: -/4;" data-bprice="101.28" data-uprice="101.28">101.28<span class="estim-star">*</span></td>
<td class="col10 price_change" nowrap="" data-sort="0.02"><span class="positive-change">0.02%</span><span class="change-sign positive-change">↑</span></td>
<td class="col11 yield yield-val" title="" data-sort="3.33" data-byield="3.33" data-uyield="3.34%">3.33%</td>
<td class="col12 purchase_price" data-bprice="101.28" data-uprice="101.28" data-sort="101.28"><input type="text" name="purchase_price" class="positive-num-only default" value="101.28"></td>
<td class="col13 margin_bond" data-bond="sec" data-sort="0"><input type="text" name="margin_bond" maxlength="3" class="positive-num-only default" value="0"></td>
</tr>
</tbody>
I'm trying to extract the text from the 'Price Change' column (col 10) using lxml.html, which allows me to extract data from big tables in a matter of seconds. I'm doing it like this:
import lxml.html
import pandas as pd
root = lxml.html.fromstring(self.driver.page_source)
data = []
for row in root.xpath(".//*[@id='main']/div[5]/div[2]/table/tbody/tr"):
    cells = row.xpath('.//td/text()')
So I succeeded in extracting the whole table that way, and I know that the only exception is column 10 ('price change'). I tried the following, and each attempt returned an empty string (""):
row.xpath('.//tr[1]/td[11][@data-sort]/text()')
row.xpath(".//*[@id='main']/div[5]/div[2]/table/tbody/tr[1]/td[11]/span/text()")
row.xpath(".//*[@id='main']/div[5]/div[2]/table/tbody/tr[1]/td[11]/text()")
I don't want to extract the text using WebElement but only with lxml.html library
Thank you!
There are two problems:
There are 8 tds in total, not 11; the td you are interested in is the 5th, not the 11th.
The td you are interested in contains two spans, and you are not specifying which span you want.
This code works fine:
html_code = """
<tbody class="table-body">
<tr class=" " data-blah="25293454534534513" data-currency="1">
<td class="action-cell no-sort">
</td>
<td class="col1 id">
<a class="alert-ico " data-tooltip=""></a>
<a class="isin-btn " data-tooltip="" id="isin" data-portfolioid="2423424" data-status="0">US3</a>
</td>
<td class="col2 name hide">4%</td>
<td class="col9 colNo.9" title="Bid: 101.23; Mid: 101.28; Ask: 101.33;
Liquidity Score: -*/5*; Merit: -/4;" data-bprice="101.28" data-uprice="101.28">101.28<span class="estim-star">*</span></td>
<td class="col10 price_change" nowrap="" data-sort="0.02">
<span class="positive-change">0.02%</span>
<span class="change-sign positive-change">↑</span></td>
<td class="col11 yield yield-val" title="" data-sort="3.33" data-byield="3.33" data-uyield="3.34%">3.33%</td>
<td class="col12 purchase_price" data-bprice="101.28" data-uprice="101.28" data-sort="101.28"><input type="text" name="purchase_price" class="positive-num-only default" value="101.28"></td>
<td class="col13 margin_bond" data-bond="sec" data-sort="0"><input type="text" name="margin_bond" maxlength="3" class="positive-num-only default" value="0"></td>
</tr>
</tbody>
"""
from lxml import html
tree = html.fromstring(html_code)
print("price change is %s" % tree.xpath(".//td[contains(@class,'col10')]/span[1]/text()")[0])
print("price change is %s" % tree.xpath(".//td[5]/span[1]/text()")[0])
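One way to see why a plain `td/text()` query skips the price-change column is that this cell has no text of its own; all of its text sits inside the child spans. The sketch below illustrates the difference with the standard library's `xml.etree.ElementTree` instead of lxml, on a reduced, well-formed copy of that one cell.

```python
import xml.etree.ElementTree as ET

# The "price change" cell from the question, reduced to a well-formed fragment.
cell = ET.fromstring(
    '<td class="col10 price_change" data-sort="0.02">'
    '<span class="positive-change">0.02%</span>'
    '<span class="change-sign positive-change">↑</span>'
    "</td>"
)

direct_text = cell.text               # text directly inside <td>, before any child
first_span = cell.find("span").text   # text of the first <span>
all_text = "".join(cell.itertext())   # every text node under <td>

print(repr(direct_text), first_span, all_text)
```

So the value has to be read from the span (or from the `data-sort` attribute), not from the cell's direct text.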

Iterating Through Table Rows in Selenium (Python)

I have a webpage with a table that only appears when I click 'Inspect Element' and is not visible through the View Source page. The table contains only two rows with several cells each and looks similar to this:
<table class="datadisplaytable">
<tbody>
<tr>
<td class="dddefault">16759</td>
<td class="dddefault">MATH</td>
<td class="dddefault">123</td>
<td class="dddefault">001</td>
<td class="dddefault">Calculus</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
<tr>
<td class="dddefault">16449</td>
<td class="dddefault">PHY</td>
<td class="dddefault">456</td>
<td class="dddefault">002</td>
<td class="dddefault">Physics</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
</tbody>
</table>
What I'm trying to do is to iterate through the rows and return the text contained in each cell. I can't really seem to do it with Selenium. The elements contain no IDs and I'm not sure how else to get them. I'm not very familiar with using xpaths and such.
Here is a debugging attempt that returns a TypeError:
def check_grades(self):
    table = []
    for i in self.driver.find_element_by_class_name("dddefault"):
        table.append(i)
    print(table)
What is an easy way to get the text from the rows?
XPath is fragile. It's better to use CSS selectors or classes:
mytable = driver.find_element_by_css_selector('table.datadisplaytable')
for row in mytable.find_elements_by_css_selector('tr'):
    for cell in row.find_elements_by_tag_name('td'):
        print(cell.text)
If you want to go row by row using an xpath, you can use the following:
h = """<table class="datadisplaytable">
<tr>
<td class="dddefault">16759</td>
<td class="dddefault">MATH</td>
<td class="dddefault">123</td>
<td class="dddefault">001</td>
<td class="dddefault">Calculus</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
<tr>
<td class="dddefault">16449</td>
<td class="dddefault">PHY</td>
<td class="dddefault">456</td>
<td class="dddefault">002</td>
<td class="dddefault">Physics</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
</table>"""
from lxml import html
xml = html.fromstring(h)
# gets the table
table = xml.xpath("//table[@class='datadisplaytable']")[0]
# iterate over all the rows
for row in table.xpath(".//tr"):
    # get the text from all the td's from each row
    print([td.text for td in row.xpath(".//td[@class='dddefault'][text()]")])
Which outputs:
['16759', 'MATH', '123', '001', 'Calculus']
['16449', 'PHY', '456', '002', 'Physics']
Using td[text()] will avoid getting any Nones returned for the td's that hold no text.
So to do the same using Selenium, you would write:
table = driver.find_element_by_xpath("//table[@class='datadisplaytable']")
for row in table.find_elements_by_xpath(".//tr"):
    print([td.text for td in row.find_elements_by_xpath(".//td[@class='dddefault'][1]")])
For multiple tables:
def get_row_data(table):
    for row in table.find_elements_by_xpath(".//tr"):
        yield [td.text for td in row.find_elements_by_xpath(".//td[@class='dddefault'][text()]")]
for table in driver.find_elements_by_xpath("//table[@class='datadisplaytable']"):
    for data in get_row_data(table):
        print(data)  # use the data
Correction of the Selenium part of @Padraic Cunningham's answer:
table = driver.find_element_by_xpath("//table[@class='datadisplaytable']")
for row in table.find_elements_by_xpath(".//tr"):
    print([td.text for td in row.find_elements_by_xpath(".//td[@class='dddefault']")])
Note: one round bracket was missing at the end; I also removed the [1] index to match the first lxml example.
Another note: the example with the [1] index is still worth keeping, since it shows how to extract an individual element.
Another Version (modified and corrected post by Padraic Cunningham):
Tested with Python 3.x
#!/usr/bin/python
h = """<table class="datadisplaytable">
<tr>
<td class="dddefault">16759</td>
<td class="dddefault">MATH</td>
<td class="dddefault">123</td>
<td class="dddefault">001</td>
<td class="dddefault">Calculus</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
<tr>
<td class="dddefault">16449</td>
<td class="dddefault">PHY</td>
<td class="dddefault">456</td>
<td class="dddefault">002</td>
<td class="dddefault">Physics</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
</table>"""
from lxml import html
xml = html.fromstring(h)
# gets the table
table = xml.xpath("//table[@class='datadisplaytable']")[0]
# iterate over all the rows
for row in table.xpath(".//tr"):
    # get the text from all the td's from each row
    print([td.text for td in row.xpath(".//td[@class='dddefault']")])

How do i iterate through a table and print out the values of the columns Selenium Webdriver Python

I am having a problem iterating through an HTML table and outputting the values from column 2 of each row. Column 2 (td[2]) in row 1 has the value "Name", column 2 in row 2 has the value "Address", and so on.
I am trying to print out the values to the console.
My Python webdriver code is:
# Get the table
table_xpath = self.driver.find_element(By.XPATH, 'html/body/div[2]/div[2]/div/div[4]/div/div[2]/div/div[3]/div/div[5]/div/div[3]/div/div[4]/div/div[2]/div/div[4]/div/div[3]/div/div[2]/div/div/table/tbody')
# Get the rows from the table
rows = table_xpath.find_elements(By.TAG_NAME, "tr.GAT4PNUFG.GAT4PNUMG")
# Get the columns (all the column 2)
cols = rows.find_elements(By.TAG_NAME, "td")[2]
for i in cols:
    print cols.text
The error I get is:
File "C:\Webdriver\ClearCore 501\Pages\data_objects_saved_page.py", line 87, in verify_variables_created
cols = rows.find_elements(By.TAG_NAME, "td")[2]
AttributeError: 'list' object has no attribute 'find_elements'
The HTML snippet is:
<table cellspacing="0" style="table-layout: fixed; width: 100%;">
<colgroup>
<tbody>
<tr class="GAT4PNUFG GAT4PNUMG" __gwt_subrow="0" __gwt_row="0">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUHG GAT4PNUNG">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUNG">
<div __gwt_cell="cell-gwt-uid-324" style="outline-style:none;">
<span class="linkhover" title="Name" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;color:#00A;cursor:pointer;">Name</span>
</div>
</td>
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUNG">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUNG">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUNG">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUBH GAT4PNUNG">
</tr>
<tr class="GAT4PNUEH" __gwt_subrow="0" __gwt_row="1">
<td class="GAT4PNUEG GAT4PNUFH GAT4PNUHG">
<td class="GAT4PNUEG GAT4PNUFH">
<div __gwt_cell="cell-gwt-uid-324" style="outline-style:none;">
<span class="linkhover" title="Address" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;color:#00A;cursor:pointer;">Address</span>
</div>
</td>
<td class="GAT4PNUEG GAT4PNUFH">
<td class="GAT4PNUEG GAT4PNUFH">
<td class="GAT4PNUEG GAT4PNUFH">
<td class="GAT4PNUEG GAT4PNUFH GAT4PNUBH">
</tr>
<tr class="GAT4PNUFG" __gwt_subrow="0" __gwt_row="2">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUHG">
<td class="GAT4PNUEG GAT4PNUGG">
<div __gwt_cell="cell-gwt-uid-324" style="outline-style:none;">
<span class="linkhover" title="DOB" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;color:#00A;cursor:pointer;">DOB</span>
</div>
</td>
<td class="GAT4PNUEG GAT4PNUGG">
<td class="GAT4PNUEG GAT4PNUGG">
<td class="GAT4PNUEG GAT4PNUGG">
<td class="GAT4PNUEG GAT4PNUGG GAT4PNUBH">
</tr>
<tr class="GAT4PNUEH" __gwt_subrow="0" __gwt_row="3">
---
<tr class="GAT4PNUFG" __gwt_subrow="0" __gwt_row="4">
---
</tbody>
</table>
How can I iterate through this table and print out the values in Python with WebDriver, please?
You need to iterate over rows:
rows = table_xpath.find_elements(By.CSS_SELECTOR, "tr.GAT4PNUFG.GAT4PNUMG")
for row in rows:
    # Get column 2 of the row (list indices start at 0, so column 2 is index 1)
    col = row.find_elements(By.TAG_NAME, "td")[1]
    print(col.text)
The developer has put an ID into the table. I have it working now. It is printing all the cell values from column 2. The code is:
table_id = self.driver.find_element(By.ID, 'data_configuration_feeds_ct_fields_body0')
rows = table_id.find_elements(By.TAG_NAME, "tr")  # get all of the rows in the table
for row in rows:
    # Get column 2 of the row
    col = row.find_elements(By.TAG_NAME, "td")[1]  # note: indices start at 0, so 1 is column 2
    print(col.text)
