Loop through table rows and print text in selenium using python

I have an HTML Table:
<div class="report-data">
<table>
<thead>
<tr>
<td></td>
<td>All</td>
<td>Long</td>
<td>Short</td>
</tr>
</thead>
<tbody>
<tr>
<td>Net Profit</td>
<td>
<div>3644.65</div>
<div><span class="additional_percent_value">3.64 %</span></div>
</td>
<td>
<div>3713.90</div>
<div><span class="additional_percent_value">3.71 %</span></div>
</td>
<td>
<div><span class="neg">69.25</span></div>
<div><span class="additional_percent_value"><span class="neg">0.07 %</span></span>
</div>
</td>
</tr>
<tr>
<td>Net Profit</td>
<td>
<div>3644.65</div>
<div><span class="additional_percent_value">3.64 %</span></div>
</td>
<td>
<div>3713.90</div>
<div><span class="additional_percent_value">3.71 %</span></div>
</td>
<td>
<div><span class="neg">69.25</span></div>
<div><span class="additional_percent_value"><span class="neg">0.07 %</span></span>
</div>
</td>
</tr>
</tbody>
</table>
</div>
Now I want to print the td[1] value of each row, so my output should be:
Net Profit
Net Profit
So I ran the code below:
for dt in driver.find_element_by_xpath("//div[@class='report-data']/following-sibling::table/tbody/tr"):
    text_label = dt.find_element_by_xpath(".//td").text
    print(text_label)
But it throws this error:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@class='report-data']/following-sibling::table/tbody/tr"}

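You're almost there, I believe. Two things need to change: the table is a child of the div rather than a following sibling, so the path is simply div/table; and find_element returns a single element, so you need the plural find_elements to get a list of rows you can loop over. Try this: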
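from selenium.webdriver.common.by import By

content = driver.find_elements(By.XPATH, "//div[@class='report-data']/table/tbody/tr")
for dt in content:
    # find_element returns the first match, i.e. the row's first <td>
    text_label = dt.find_element(By.XPATH, "./td").text
    print(text_label)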
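To check the corrected path without a browser, a minimal sketch can run it against a trimmed copy of the question's table using the standard library's xml.etree.ElementTree. This assumes the markup is well-formed XML (true for this snippet, not for HTML in general), and because ElementTree supports only a subset of XPath, the path is made relative with a leading .//:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's table (only the first two cells of each body row).
HTML = """<body>
<div class="report-data">
<table>
<thead>
<tr><td></td><td>All</td><td>Long</td><td>Short</td></tr>
</thead>
<tbody>
<tr><td>Net Profit</td><td><div>3644.65</div></td></tr>
<tr><td>Net Profit</td><td><div>3644.65</div></td></tr>
</tbody>
</table>
</div>
</body>"""

root = ET.fromstring(HTML)
# div -> table (a child, not a sibling) -> tbody rows; the thead row is not matched
rows = root.findall(".//div[@class='report-data']/table/tbody/tr")
# row.find("td") returns the first <td> child, like "./td" in the Selenium answer
labels = [row.find("td").text for row in rows]
print(labels)  # ['Net Profit', 'Net Profit']
```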

Related

Collect text using XPath

Is it possible to capture all the EAN numbers in a structure like this using XPath, or do I need regular expressions?
<table>
<tr>
<td>
EAN Giftbox
</td>
<td>
7350034654483
</td>
</tr>
<tr>
<td>
EAN Export Carton:
</td>
<td>
17350034643958
</td>
</tr>
</table>
I want to get the list ['7350034654483', '17350034643958'].

from lxml import html as lh
html = """<table>
<tr>
<td>
EAN Giftbox
</td>
<td>
7350034654483
</td>
</tr>
<tr>
<td>
EAN Export Carton:
</td>
<td>
17350034643958
</td>
</tr>
</table>
"""
root = lh.fragment_fromstring(html)
# second <td> of every <tr> that has child elements
tds = root.xpath('//tr[*]/td[2]')
for td in tds:
    print(td.text.strip())
Output:
7350034654483
17350034643958
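If the goal is the list itself rather than printed lines, a minimal sketch using only the standard library also works here, under the assumption that the markup is well-formed XML (true for this snippet, not for HTML in general). xml.etree.ElementTree supports a limited XPath subset that includes positional predicates such as td[2]:

```python
import xml.etree.ElementTree as ET

html = """<table>
<tr>
<td>
EAN Giftbox
</td>
<td>
7350034654483
</td>
</tr>
<tr>
<td>
EAN Export Carton:
</td>
<td>
17350034643958
</td>
</tr>
</table>"""

root = ET.fromstring(html)  # root is the <table> element itself
# "./tr/td[2]": the second cell of every row (positions are 1-based, as in XPath)
eans = [td.text.strip() for td in root.findall("./tr/td[2]")]
print(eans)  # ['7350034654483', '17350034643958']
```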

XPath Python Extract Data From Table Between Two Headings

I'm trying to extract data from a table that lies between two headers in an HTML file using Python. In this case, the id to look up is on a span inside a header: I need the table under id="Perlis", which lies between the Perlis and Kedah headings:
<h2>
<span class="mw-headline" id="Perlis">Perlis</span>
<span class="mw-editsection">
<span class="mw-editsection-bracket">[</span>
edit
<span class="mw-editsection-bracket">]</span>
</span>
</h2>
<table class="wikitable" style="text-align:center; font-size:90%; width:100%;">
<tbody>
<tr>
<th width="30"># </th>
<th width="150">Constituency s </th>
<th width="150">Winner </th>
<th width="80">Votes </th>
<th width="80">Majority </th>
<th width="150">Opponent(s) </th>
<th width="80">Votes </th>
<th width="150">Incumbent </th>
<th width="80">
<b>Incumbent Majority</b>
</th>
</tr>
<tr>
<td colspan="13">
BN
<b>2</b> | GS
<b>0</b> | PH
<b>1</b> | Independent
<b>0</b>
</td>
</tr>
<tr align="center">
<td rowspan="2">P1 </td>
<td rowspan="2">
Padang Besar
</td>
<td rowspan="2" bgcolor="#B5BED9">
Zahidi Zainul Abidin
<br /> ( <b>BN</b>- <b>UMNO</b>)
</td>
<td rowspan="2">
<b>15,032</b>
</td>
<td rowspan="2">
<b>1,438</b>
</td>
<td bgcolor="#F18A8F">Izizam Ibrahim <br /> ( <b>PH</b>- <b>PPBM</b>) </td>
<td>
<b>13,594</b>
</td>
<td rowspan="2" bgcolor="#B5BED9">
Zahidi Zainul Abidin
<br /> ( <b>BN</b>- <b>UMNO</b>)
</td>
<td rowspan="2">
<b>7,426</b>
</td>
</tr>
<tr>
<td bgcolor="#B2DBB2">Mokhtar Senik <br /> ( <b>GS</b>- <b>PAS</b>) </td>
<td>
<b>7,874</b>
</td>
</tr>
<tr align="center">
<td rowspan="2">P2 </td>
<td rowspan="2">
Kangar
</td>
<td rowspan="2" bgcolor="#C7F2F2">Noor Amin Ahmad <br /> ( <b>PH</b>- <b>PKR</b>) </td>
<td rowspan="2">
<b>20,909</b>
</td>
<td rowspan="2">
<b>5,603</b>
</td>
<td bgcolor="#B5BED9">Ramli Shariff <br /> ( <b>BN</b>- <b>UMNO</b>) </td>
<td>
<b>15,306</b>
</td>
<td rowspan="2" bgcolor="#B5BED9">
Shaharuddin Ismail
<br /> ( <b>BN</b>- <b>UMNO</b>)
</td>
<td rowspan="2">
<b>4,037</b>
</td>
</tr>
<tr>
<td bgcolor="#B2DBB2">Mohamad Zahid Ibrahim <br /> ( <b>GS</b>- <b>PAS</b>) </td>
<td>
<b>8,465</b>
</td>
</tr>
</tbody>
</table>
<h2>
<span class="mw-headline" id="Kedah">Kedah</span>
<span class="mw-editsection">
<span class="mw-editsection-bracket">[</span>
edit
<span class="mw-editsection-bracket">]</span>
</span>
</h2>
<table class="wikitable" style="text-align:center; font-size:90%; width:100%;"></table>
This is the resulting JSON that I am trying to construct:
[
{
"state": "Perlis",
"constituencies": [
{
"id": "P1",
"name": "Padang Besar"
},
{
"id": "P2",
"name": "Kangar"
}
]
}
]
I'd like to know how to reference that specific table so I can extract the data into JSON. I have used Scrapy before but am not sure how to do it in this case. This is what I had in mind:
import scrapy


class PostSpider(scrapy.Spider):
    name = 'manual_spider'
    start_urls = [
        '%URL%'
    ]

    def parse(self, response):
        doc = response.xpath('//comment()').getall()  # This is the bit I need
        # code continues here
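One common approach is to anchor on the heading and take the first table after it: //h2[span[@id='Perlis']]/following-sibling::table[1]. A minimal sketch with lxml (the library used in the earlier answer), run on a trimmed copy of the markup; the helper constituencies() is hypothetical, and picking constituency rows by the rowspan on their first cell is an assumption based on this one table's layout:

```python
import json
from lxml import html as lh

# Trimmed copy of the question's markup: only the cells this sketch reads.
DOC = """<div>
<h2><span class="mw-headline" id="Perlis">Perlis</span></h2>
<table class="wikitable"><tbody>
<tr><th># </th><th>Constituency s </th><th>Winner </th></tr>
<tr><td colspan="13">BN <b>2</b> | PH <b>1</b></td></tr>
<tr align="center"><td rowspan="2">P1 </td><td rowspan="2">
Padang Besar
</td></tr>
<tr><td>Mokhtar Senik</td></tr>
<tr align="center"><td rowspan="2">P2 </td><td rowspan="2">
Kangar
</td></tr>
<tr><td>Mohamad Zahid Ibrahim</td></tr>
</tbody></table>
<h2><span class="mw-headline" id="Kedah">Kedah</span></h2>
<table class="wikitable"></table>
</div>"""


def constituencies(root, state):
    # First <table> after the <h2> whose <span> has id=state;
    # [1] keeps the match from running on into the next state's table.
    tables = root.xpath(
        "//h2[span[@id=$state]]/following-sibling::table[1]", state=state
    )
    result = {"state": state, "constituencies": []}
    if not tables:
        return result
    # Assumption: a constituency row is one whose first <td> has a rowspan;
    # the header, summary and second-opponent rows do not.
    for row in tables[0].xpath(".//tr[td[1]/@rowspan]"):
        result["constituencies"].append({
            "id": str(row.xpath("normalize-space(td[1])")),
            "name": str(row.xpath("normalize-space(td[2])")),
        })
    return result


root = lh.fromstring(DOC)
print(json.dumps([constituencies(root, "Perlis")], indent=2))
```

Inside the spider, the same expression should work as response.xpath("//h2[span[@id=$state]]/following-sibling::table[1]", state="Perlis"), since Scrapy selectors also evaluate XPath 1.0 and accept XPath variables as keyword arguments.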

Python Beautiful Soup Iterate over Multiple Tables

I'm trying to find multiple tables by their CSS class names, but the output only shows the class names I searched for rather than the tables. I want to loop over each of the small tables; each row contains player info, with the tds holding attributes about each player. Why doesn't my code print the table contents to begin with? I want to confirm I have this first step right before going into the tr and tds for each mini table. I think part of the issue is that the first table.
My program -
import requests
from bs4 import BeautifulSoup
#url = 'https://www.skysports.com/premier-league-table'
base_url = 'https://www.skysports.com'
# Squad Data
squad_url = base_url + '/liverpool-squad'
squad_r = requests.get(squad_url)
print(squad_r.status_code)
premier_squad_soup = BeautifulSoup(squad_r.text, 'html.parser')
premier_squad_table = premier_squad_soup.find_all = ('table', {'class': 'table -small no-wrap football-squad-table '})
print(premier_squad_table)
HTML -
Each table looks like the following, but with a different title:
<table class="table -small no-wrap football-squad-table " title="Goalkeeper">
<colgroup>
<col class="" style="">
<col class="digit-4 -bp30-hdn">
<col class="digit-3 ">
<col class="digit-3 ">
<col class="digit-3 ">
</colgroup>
<thead>
<tr class="text-s -interact text-h6" style="">
<th class=" text-h4 -txt-left" title="">Goalkeeper</th>
<th class=" text-h6" title="Played">Pld</th>
<th class=" text-h6" title="Goals">G</th>
<th class=" text-h6" title="Yellow Cards ">YC</th>
<th class=" text-h6" title="Red Cards">RC</th>
</tr>
</thead>
<tbody>
<tr class="text-h6 -center">
<td>
<a href="/football/player/141016/alisson-ramses-becker">
<div class="row-table -2cols">
<span class="col span4/5 -txt-left"><h6 class=" text-h5">Alisson Ramses Becker</h6></span>
</div>
</a>
</td>
<td>
13 (0) </td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr class="text-h6 -center">
<td>
<a href="/simon-mignolet">
<div class="row-table -2cols">
<span class="col span4/5 -txt-left"><h6 class=" text-h5">Simon Mignolet</h6></span>
</div>
</a>
</td>
<td>
1 (0) </td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr class="text-h6 -center">
<td>
<a href="/football/player/153304/kamil-grabara">
<div class="row-table -2cols">
<span class="col span4/5 -txt-left"><h6 class=" text-h5">Kamil Grabara</h6></span>
</div>
</a>
</td>
<td>
1 (1) </td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>
Output -
200
('table', {'class': 'table -small no-wrap football-squad-table '})

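I had to find the div first and then get the tables inside it. Also note that in the original code, find_all = (...) assigns a tuple to find_all instead of calling the method, which is why the output was just ('table', {...}).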
premier_squad_div = premier_squad_soup.find('div', {'class': '-bp30-box col span1/1'})
premier_squad_table = premier_squad_div.find_all('table', {'class': 'table -small no-wrap football-squad-table '})
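Once find_all is actually called, each squad table can be walked row by row. A minimal sketch on a trimmed copy of the Goalkeeper table; it uses a CSS selector, which matches any table carrying the football-squad-table class regardless of its other classes, and the squad dict layout is just one possible choice:

```python
from bs4 import BeautifulSoup

# Trimmed copy of one squad table from the question; the real page has several.
HTML = """
<table class="table -small no-wrap football-squad-table " title="Goalkeeper">
<thead>
<tr><th>Goalkeeper</th><th title="Played">Pld</th><th title="Goals">G</th><th title="Yellow Cards ">YC</th><th title="Red Cards">RC</th></tr>
</thead>
<tbody>
<tr class="text-h6 -center">
<td><a href="/football/player/141016/alisson-ramses-becker"><h6 class=" text-h5">Alisson Ramses Becker</h6></a></td>
<td>
13 (0) </td>
<td>0</td><td>0</td><td>0</td>
</tr>
<tr class="text-h6 -center">
<td><a href="/simon-mignolet"><h6 class=" text-h5">Simon Mignolet</h6></a></td>
<td>
1 (0) </td>
<td>0</td><td>0</td><td>0</td>
</tr>
</tbody>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

squad = {}  # table title (position) -> list of player rows
for table in soup.select("table.football-squad-table"):
    position = table.get("title")
    rows = []
    for tr in table.tbody.find_all("tr"):  # body rows only, header row skipped
        # get_text(strip=True) trims the whitespace around values like "13 (0)"
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])
    squad[position] = rows

print(squad)
```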

Python Selenium Copy Table Columns by Column Name

I have a table with these headers: Courses, Teacher and Avg.
How would I select a whole column using XPath and store it in an array?
I was hoping for separate arrays, like:
courses = []
teacher = []
avg = []
Bear in mind these columns don't have any IDs or classes, so I need a way to select them using just the column name.
Here is the code for the table:
<table border="0">
<tbody>
<tr>
<td nowrap="nowrap">Courses</td>
<td nowrap="nowrap">Teacher</td>
<td><select name="fldMarkingPeriod" onchange="switchMarkingPeriod(this.value);">
<option value="MP1">MP1</option>
<option selected="selected" value="MP2">MP2</option>
<option value="MP3">MP3</option>
</select>Avg</td>
</tr>
<tr>
<td nowrap="nowrap">[Course Name]</td>
<td nowrap="nowrap">[Teacher Name]</td>
<td>
<table width="100%" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td title="View Course Summary" width="70%">100%</td>
<td width="30%">A+</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td nowrap="nowrap">[Course Name]</td>
<td nowrap="nowrap">[Teacher Name]</td>
<td>
<table width="100%" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td title="View Course Summary" width="70%">100%</td>
<td width="30%">A+</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td nowrap="nowrap">[Course Name]</td>
<td nowrap="nowrap">[Teacher Name]</td>
<td>
<table width="100%" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td title="View Course Summary" width="70%">100%</td>
<td width="30%">A+</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
Any ideas? Thanks.

Not sure why exactly you need the data by columns, but here is a sample implementation:
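from selenium.webdriver.common.by import By

courses = []
teachers = []
avgs = []
# table: the outer <table border="0"> element, e.g. driver.find_element(By.XPATH, "//table[@border='0']")
for row in table.find_elements(By.XPATH, "./tbody/tr")[1:]:  # [1:] skips the header row
    # "./td" matches only the row's own cells, not those of the nested grade table
    cells = row.find_elements(By.XPATH, "./td")
    courses.append(cells[0].text)
    teachers.append(cells[1].text)
    # the Avg cell wraps a small table: td[1] is the percentage, td[2] the letter grade
    avgs.append(cells[2].find_element(By.XPATH, ".//td[1]").text)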
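The answer above relies on column positions. To look columns up by header text instead, as the question asks, one option is to find the header cell and count its preceding siblings. A minimal sketch with lxml, used here only so the XPath can be tried without a browser; the helper name column() is hypothetical, and it assumes the header row is the first row of the outer table:

```python
from lxml import html as lh

# Header row and two data rows copied from the question's table.
HTML = """<table border="0">
<tbody>
<tr>
<td nowrap="nowrap">Courses</td>
<td nowrap="nowrap">Teacher</td>
<td><select name="fldMarkingPeriod" onchange="switchMarkingPeriod(this.value);">
<option value="MP1">MP1</option>
<option selected="selected" value="MP2">MP2</option>
<option value="MP3">MP3</option>
</select>Avg</td>
</tr>
<tr>
<td nowrap="nowrap">[Course Name]</td>
<td nowrap="nowrap">[Teacher Name]</td>
<td><table width="100%" cellspacing="0" cellpadding="0"><tbody><tr>
<td title="View Course Summary" width="70%">100%</td>
<td width="30%">A+</td>
</tr></tbody></table></td>
</tr>
<tr>
<td nowrap="nowrap">[Course Name]</td>
<td nowrap="nowrap">[Teacher Name]</td>
<td><table width="100%" cellspacing="0" cellpadding="0"><tbody><tr>
<td title="View Course Summary" width="70%">100%</td>
<td width="30%">A+</td>
</tr></tbody></table></td>
</tr>
</tbody>
</table>"""


def column(table, name):
    # Header cell whose text contains `name` (the Avg header also holds the <select> options)
    header = table.xpath("./tbody/tr[1]/td[contains(., $name)]", name=name)[0]
    # 1-based position of that header cell within its row
    index = int(header.xpath("count(preceding-sibling::td)")) + 1
    # Same position in every later row; "./tbody/tr" ignores the nested grade tables' rows
    cells = table.xpath("./tbody/tr[position() > 1]/td[$i]", i=index)
    return [" ".join(cell.text_content().split()) for cell in cells]


table = lh.fragment_fromstring(HTML)
courses = column(table, "Courses")
teachers = column(table, "Teacher")
avgs = column(table, "Avg")
print(courses, teachers, avgs)
```

With Selenium, the same expressions should work through find_elements(By.XPATH, ...), but Selenium's XPath lookup takes no variables, so the header name and index would be formatted into the expression string instead.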

How can I append <br> tags after a text element?

I am parsing an HTML file with BeautifulSoup and got stuck on <br> tags.
I want to append a <br> tag after inserting each list item, but it didn't work.
What is the easiest way to do this?
from bs4 import BeautifulSoup

with open("test.html") as f:
    soup = BeautifulSoup(f, "html.parser")
mylist = ["Item_1", "Item_2"]
for item in mylist:
    pass  # insert items into the 4th column (this is the part I'm stuck on)
This is the original HTML:
<html>
<body>
<table>
<tr>
<th>
1. Column
</th>
<th>
2. Column
</th>
<th>
3. Column
</th>
<th>
4. Column
</th>
<th>
5. Column
</th>
<th>
6. Column
</th>
<th>
7. Column
</th>
<th>
8. Column
</th>
</tr>
<tr class="a">
<td class="h">
Text in first column
</td>
<td>
<br/>
</td>
<td>
<br/>
</td>
<td>
<!--I want to insert items here-->
</td>
<td>
1
</td>
<td>
37
</td>
<td>
38
</td>
<td>
38
</td>
</tr>
</table>
</body>
</html>
This is the HTML I want to produce:
<html>
<body>
<table>
<tr>
<th>
1. Column
</th>
<th>
2. Column
</th>
<th>
3. Column
</th>
<th>
4. Column
</th>
<th>
5. Column
</th>
<th>
6. Column
</th>
<th>
7. Column
</th>
<th>
8. Column
</th>
</tr>
<tr class="a">
<td class="h">
Text in first column
</td>
<td>
<br/>
</td>
<td>
<br/>
</td>
<td>
Item_1 <br>
Item_2
</td>
<td>
1
</td>
<td>
37
</td>
<td>
38
</td>
<td>
38
</td>
</tr>
</table>
</body>
</html>

To append a tag, first create it with the new_tag() factory function, like so:
soup.td.append(soup.new_tag('br'))
Consider the following program. For every table cell (that is, every td) in the html, it appends a <br/> tag and some text to the cell.
from bs4 import BeautifulSoup
html_doc = '''
<html>
<body>
<table>
<tr>
<td>
data1
</td>
<td>
data2
</td>
</tr>
</table>
</body>
</html>
'''
soup = BeautifulSoup(html_doc, "html.parser")
mylist = ['addendum 1', 'addendum 2']
# pair each cell with one item; zip stops at the shorter of the two
for td, item in zip(soup.find_all('td'), mylist):
    td.append(soup.new_tag('br'))
    td.append(item)
print(soup.prettify())
Result:
<html>
<body>
<table>
<tr>
<td>
data1
<br/>
addendum 1
</td>
<td>
data2
<br/>
addendum 2
</td>
</tr>
</table>
</body>
</html>
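The answer above appends to every cell. For the exact output in the question (both items in the 4th column, separated by a <br>), a minimal sketch along the same lines, run on a trimmed copy of the question's row; clearing the cell to drop the placeholder comment is an assumption about the desired result:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the row from the question (first five cells only).
HTML = """<table>
<tr class="a">
<td class="h">Text in first column</td>
<td><br/></td>
<td><br/></td>
<td><!--I want to insert items here--></td>
<td>1</td>
</tr>
</table>"""

soup = BeautifulSoup(HTML, "html.parser")
mylist = ["Item_1", "Item_2"]

# 4th cell of the row (index 3, since find_all returns a 0-based list)
cell = soup.find("tr", class_="a").find_all("td")[3]
cell.clear()  # drop the placeholder comment
for i, item in enumerate(mylist):
    if i > 0:
        # a <br/> between items, so none trails after the last one
        cell.append(soup.new_tag("br"))
    cell.append(item)

print(cell)  # <td>Item_1<br/>Item_2</td>
```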
