I have an HTML Table:
<div class="report-data">
<table>
<thead>
<tr>
<td></td>
<td>All</td>
<td>Long</td>
<td>Short</td>
</tr>
</thead>
<tbody>
<tr>
<td>Net Profit</td>
<td>
<div>3644.65</div>
<div><span class="additional_percent_value">3.64 %</span></div>
</td>
<td>
<div>3713.90</div>
<div><span class="additional_percent_value">3.71 %</span></div>
</td>
<td>
<div><span class="neg">69.25</span></div>
<div><span class="additional_percent_value"><span class="neg">0.07 %</span></span>
</div>
</td>
</tr>
<tr>
<td>Net Profit</td>
<td>
<div>3644.65</div>
<div><span class="additional_percent_value">3.64 %</span></div>
</td>
<td>
<div>3713.90</div>
<div><span class="additional_percent_value">3.71 %</span></div>
</td>
<td>
<div><span class="neg">69.25</span></div>
<div><span class="additional_percent_value"><span class="neg">0.07 %</span></span>
</div>
</td>
</tr>
</tbody>
</table>
</div>
Now I want to print all the td[1] values for each row, so My output should be:
Net Profit
Net Profit
So I executed the below code:
for dt in driver.find_element_by_xpath("//div[#class='report-data']/following-sibling::table/tbody/tr"):
text_label = dt.find_element_by_xpath(".//td").text
print(text_label)
But it throws error:
selenium.common.exceptions.NoSuchElementException: Message: no such
element: Unable to locate element:
{"method":"xpath","selector":"//div[#class='report-data']/following-sibling::table/tbody/tr"}
You're almost there, I believe. Try this:
content = driver.find_elements_by_xpath("//div[#class='report-data']/table/tbody/tr")
for dt in content:
text_label = dt.find_element_by_xpath("./td").text
print(text_label)
Related
is it possible to capture all EAN numbers in such a construct using XPath, or do I need to use regular expressions?
<table>
<tr>
<td>
EAN Giftbox
</td>
<td>
7350034654483
</td>
</tr>
<tr>
<td>
EAN Export Carton:
</td>
<td>
17350034643958
</td>
</tr>
</table>
I want to get a list of ['7350034654483', '17350034643958']
from lxml import html as lh
html = """<table>
<tr>
<td>
EAN Giftbox
</td>
<td>
7350034654483
</td>
</tr>
<tr>
<td>
EAN Export Carton:
</td>
<td>
17350034643958
</td>
</tr>
</table>
"""
root = lh.fragment_fromstring(html)
tds = root.xpath('//tr[*]/td[2]')
for td in tds:
print(td.text.strip())
Output:
7350034654483
17350034643958
I'm trying to extract data from a table that lies in between two headers in an html file using Python. IN this case, the required id to lookup lies in a span inside a header (I need id="Perlis", which lies between Perlis and Kedah):
<h2>
<span class="mw-headline" id="Perlis">Perlis</span>
<span class="mw-editsection">
<span class="mw-editsection-bracket">[</span>
edit
<span class="mw-editsection-bracket">]</span>
</span>
</h2>
<table class="wikitable" style="text-align:center; font-size:90%; width:100%;">
<tbody>
<tr>
<th width="30"># </th>
<th width="150">Constituency s </th>
<th width="150">Winner </th>
<th width="80">Votes </th>
<th width="80">Majority </th>
<th width="150">Opponent(s) </th>
<th width="80">Votes </th>
<th width="150">Incumbent </th>
<th width="80">
<b>Incumbent Majority</b>
</th>
</tr>
<tr>
<td colspan="13">
BN
<b>2</b> | GS
<b>0</b> | PH
<b>1</b> | Independent
<b>0</b>
</td>
</tr>
<tr align="center">
<td rowspan="2">P1 </td>
<td rowspan="2">
Padang Besar
</td>
<td rowspan="2" bgcolor="#B5BED9">
Zahidi Zainul Abidin
<br /> ( <b>BN</b>- <b>UMNO</b>)
</td>
<td rowspan="2">
<b>15,032</b>
</td>
<td rowspan="2">
<b>1,438</b>
</td>
<td bgcolor="#F18A8F">Izizam Ibrahim <br /> ( <b>PH</b>- <b>PPBM</b>) </td>
<td>
<b>13,594</b>
</td>
<td rowspan="2" bgcolor="#B5BED9">
Zahidi Zainul Abidin
<br /> ( <b>BN</b>- <b>UMNO</b>)
</td>
<td rowspan="2">
<b>7,426</b>
</td>
</tr>
<tr>
<td bgcolor="#B2DBB2">Mokhtar Senik <br /> ( <b>GS</b>- <b>PAS</b>) </td>
<td>
<b>7,874</b>
</td>
</tr>
<tr align="center">
<td rowspan="2">P2 </td>
<td rowspan="2">
Kangar
</td>
<td rowspan="2" bgcolor="#C7F2F2">Noor Amin Ahmad <br /> ( <b>PH</b>- <b>PKR</b>) </td>
<td rowspan="2">
<b>20,909</b>
</td>
<td rowspan="2">
<b>5,603</b>
</td>
<td bgcolor="#B5BED9">Ramli Shariff <br /> ( <b>BN</b>- <b>UMNO</b>) </td>
<td>
<b>15,306</b>
</td>
<td rowspan="2" bgcolor="#B5BED9">
Shaharuddin Ismail
<br /> ( <b>BN</b>- <b>UMNO</b>)
</td>
<td rowspan="2">
<b>4,037</b>
</td>
</tr>
<tr>
<td bgcolor="#B2DBB2">Mohamad Zahid Ibrahim <br /> ( <b>GS</b>- <b>PAS</b>) </td>
<td>
<b>8,465</b>
</td>
</tr>
</tbody>
</table>
<h2>
<span class="mw-headline" id="Kedah">Kedah</span>
<span class="mw-editsection">
<span class="mw-editsection-bracket">[</span>
edit
<span class="mw-editsection-bracket">]</span>
</span>
</h2>
<table class="wikitable" style="text-align:center; font-size:90%; width:100%;"></table>
This is the resulting JSON that I am trying to construct:
[
{
"state": "Perlis",
"constituencies": [
{
"id": "P1",
"name": "Padang Besar"
},
{
"id": "P2",
"name": "Kangar"
}
]
}
]
I'd like to know how to reference the specific table so I can extract the data into a JSON format. I have used Scrapy before but not sure how to in this case- this is what I had in mind:
class PostSpider(scrapy.Spider):
name = 'manual_spider'
start_urls = [
'%URL%'
]
def parse(self, response):
doc = response.xpath('//comment()').getall() //This is the bit I need
//code continues here
Trying to find multiple tables using the CSS names and I am only getting the CSS in the output initially. I want to loop over each of the small tables and from there each row contains player info with the tds attributes about each player. How come what I have there doesn't actually print the table contents to begin with? I want to confirm I have made this first step right, before I then go on and into
the tr and tds for each mini table. I think part of the issue is that the first table.
My program -
import requests
from bs4 import BeautifulSoup
#url = 'https://www.skysports.com/premier-league-table'
base_url = 'https://www.skysports.com'
# Squad Data
squad_url = base_url + '/liverpool-squad'
squad_r = requests.get(squad_url)
print(squad_r.status_code)
premier_squad_soup = BeautifulSoup(squad_r.text, 'html.parser')
premier_squad_table = premier_squad_soup.find_all = ('table', {'class': 'table -small no-wrap football-squad-table '})
print(premier_squad_table)
HTML -
each table looks like the following but with a different title
<table class="table -small no-wrap football-squad-table " title="Goalkeeper">
<colgroup>
<col class="" style="">
<col class="digit-4 -bp30-hdn">
<col class="digit-3 ">
<col class="digit-3 ">
<col class="digit-3 ">
</colgroup>
<thead>
<tr class="text-s -interact text-h6" style="">
<th class=" text-h4 -txt-left" title="">Goalkeeper</th>
<th class=" text-h6" title="Played">Pld</th>
<th class=" text-h6" title="Goals">G</th>
<th class=" text-h6" title="Yellow Cards ">YC</th>
<th class=" text-h6" title="Red Cards">RC</th>
</tr>
</thead>
<tbody>
<tr class="text-h6 -center">
<td>
<a href="/football/player/141016/alisson-ramses-becker">
<div class="row-table -2cols">
<span class="col span4/5 -txt-left"><h6 class=" text-h5">Alisson Ramses Becker</h6></span>
</div>
</a>
</td>
<td>
13 (0) </td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr class="text-h6 -center">
<td>
<a href="/simon-mignolet">
<div class="row-table -2cols">
<span class="col span4/5 -txt-left"><h6 class=" text-h5">Simon Mignolet</h6></span>
</div>
</a>
</td>
<td>
1 (0) </td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr class="text-h6 -center">
<td>
<a href="/football/player/153304/kamil-grabara">
<div class="row-table -2cols">
<span class="col span4/5 -txt-left"><h6 class=" text-h5">Kamil Grabara</h6></span>
</div>
</a>
</td>
<td>
1 (1) </td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>
Output -
200
('table', {'class': 'table -small no-wrap football-squad-table '})
Had to find the div first to then get the table inside the div
premier_squad_div = premier_squad_soup.find('div', {'class': '-bp30-box col span1/1'})
premier_squad_table = premier_squad_div.find_all('table', {'class': 'table -small no-wrap football-squad-table '})
I have a table that has these headers, like this:
How would I select the whole column using xpath to store in an array.
I was hoping for different arrays, like:
courses = []
teacher = []
avg = []
Bare in mind these column don't have any ID's or classes, so I need a way to select just by using the name of the column.
Here is the code for the table:
<table border="0">
<tbody>
<tr>
<td nowrap="nowrap">Courses</td>
<td nowrap="nowrap">Teacher</td>
<td><select name="fldMarkingPeriod" onchange="switchMarkingPeriod(this.value);">
<option value="MP1">MP1</option>
<option selected="selected" value="MP2">MP2</option>
<option value="MP3">MP3</option>
</select>Avg</td>
</tr>
<tr>
<td nowrap="nowrap">[Course Name]</td>
<td nowrap="nowrap">[Teacher Name]</td>
<td>
<table width="100%" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td title="View Course Summary" width="70%">100%</td>
<td width="30%">A+</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td nowrap="nowrap">[Course Name]</td>
<td nowrap="nowrap">[Teacher Name]</td>
<td>
<table width="100%" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td title="View Course Summary" width="70%">100%</td>
<td width="30%">A+</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td nowrap="nowrap">[Course Name]</td>
<td nowrap="nowrap">[Teacher Name]</td>
<td>
<table width="100%" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td title="View Course Summary" width="70%">100%</td>
<td width="30%">A+</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
Any ideas? Thanks.
Not sure why exactly you need the data by columns, but here is a sample implementation:
courses = []
teachers = []
avgs = []
for row in table.find_elements_by_css("table > tbody > tr")[1:]:
course, teacher, _, avg = [td.text for td in row.find_elements_by_xpath(".//td")]
courses.append(course)
teachers.append(teacher)
avgs.append(avg)
I am parsing a HTML file with BeautifulSoup and got stuck with < br> tags.
I want to append < br> tag after inserting a list element, but it didn't work.
What is the easiest way to do this?
soup = BeautifulSoup(open("test.html"))
mylist = [Item_1,Item_2]
for i in range(len(mylist)):
#insert Items to the 4. column
This is the default HTML:
<html>
<body>
<table>
<tr>
<th>
1. Column
</th>
<th>
2. Column
</th>
<th>
3. Column
</th>
<th>
4. Column
</th>
<th>
5. Column
</th>
<th>
6. Column
</th>
<th>
7. Column
</th>
<th>
8. Column
</th>
</tr>
<tr class="a">
<td class="h">
Text in first column
</td>
<td>
<br/>
</td>
<td>
<br/>
</td>
<td>
<!--I want to insert items here-->
</td>
<td>
1
</td>
<td>
37
</td>
<td>
38
</td>
<td>
38
</td>
</tr>
</table>
</body>
</html>
This is the HTML i want to make
<html>
<body>
<table>
<tr>
<th>
1. Column
</th>
<th>
2. Column
</th>
<th>
3. Column
</th>
<th>
4. Column
</th>
<th>
5. Column
</th>
<th>
6. Column
</th>
<th>
7. Column
</th>
<th>
8. Column
</th>
</tr>
<tr class="a">
<td class="h">
Text in first column
</td>
<td>
<br/>
</td>
<td>
<br/>
</td>
<td>
Item_1 <br>
Item_2
</td>
<td>
1
</td>
<td>
37
</td>
<td>
38
</td>
<td>
38
</td>
</tr>
</table>
</body>
</html>
To append a tag, first create it with the new_tag() factory function, like so:
soup.td.append(soup.new_tag('br'))
Consider the following program. For every table cell (that is, every td) in the html, it appends a <br/> tag and some text to the cell.
from bs4 import BeautifulSoup
html_doc = '''
<html>
<body>
<table>
<tr>
<td>
data1
</td>
<td>
data2
</td>
</tr>
</table>
</body>
</html>
'''
soup = BeautifulSoup(html_doc)
mylist = ['addendum 1', 'addendum 2']
for td,item in zip(soup.find_all('td'), mylist):
td.append(soup.new_tag('br'))
td.append(item)
print soup.prettify()
Result:
<html>
<body>
<table>
<tr>
<td>
data1
<br/>
addendum 1
</td>
<td>
data2
<br/>
addendum 2
</td>
</tr>
</table>
</body>
</html>