I have this piece of HTML that I want to scrape from a table:
<tr id="vsViewer1_dgMainView_dgMainView_ctl02" class="GridItem odd">
<td class=" ">
<a class="hlPopup" id="lbdgMainView$ctl02" name="lbdgMainView$ctl02" onclick="wrjl_test(this,'lbdgMainView$ctl02','746402:O9oY58XKE+w=:746402:746402')" onmouseover="this.className='HLPopupOver'" onmouseout="this.className='HLPopup'"></a>
<span class="HLPopup" id="lbldgMainView$ctl02" name="lbldgMainView$ctl02" onclick="wrjl_test(this,'lbldgMainView$ctl02','746402:O9oY58XKE+w=:746402:746402')"> Info </span>
</td>
<td align="center" class=" ">746402</td>
<td align="center" class=" ">Wyndham Orlando Resort International Drive</td>
<td align="center" class=" ">Interiano, Ana</td>
<td align="center" class=" ">Yes</td>
<td align="center" class=" ">7.32</td>
<td align="left" class=" ">
<table width="250" class="TextTableSmall" border="0">
<tbody>
<tr>
<td align="center" style="background-color: rgb(128, 128, 128); text-align: center; font-size: 8pt;">Date</td>
<td align="center" style="background-color: rgb(128, 128, 128); text-align: center; font-size: 8pt;">In</td>
<td align="center" style="background-color: rgb(128, 128, 128); text-align: center; font-size: 8pt;">Out</td>
<td align="center" style="background-color: rgb(128, 128, 128); text-align: center; font-size: 8pt;">Hours</td>
<td style="background-color: rgb(128, 128, 128); text-align: center; font-size: 8pt;">Shift</td>
</tr>
<tr>
<td style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">Thu 10/24/19</td>
<td align="right" style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">8:00am</td>
<td align="right" style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">1:20pm</td>
<td align="right" style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">5.33</td>
<td align="center" style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">1
<br>FL ORL Wyndham Resort I Drive 18128 - Housekeeping
<br>Room Attendant
</td>
</tr>
<tr>
<td style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">Thu 10/24/19</td>
<td align="right" style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">1:39pm</td>
<td align="right" style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">3:38pm</td>
<td align="right" style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">1.98</td>
<td align="center" style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">1
<br>FL ORL Wyndham Resort I Drive 18128 - Housekeeping
<br>Room Attendant
</td>
</tr>
</tbody>
</table>
</td>
<td align="right" class=" ">12.25</td>
<td class=" ">9.0000</td>
<td align="center" class=" ">1</td>
<td align="center" class=" ">Housekeeper</td>
<td align="center" class=" ">HOUSEKEEPER</td>
<td align="center" class=" ">SE-FL-Orlando</td>
<td align="center" class=" ">Wyndham Hotel Group</td>
</tr>
I've done this:
from bs4 import BeautifulSoup
import requests
with open('vsShowViewTWO.html') as html_file:
    soup = BeautifulSoup(html_file, 'lxml')
tbody = soup.find('tbody', id='thetbody')
table_rows = tbody.find_all('tr')
for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    print(row)
And the results are:
[' Info ', '746402', 'Resort International', 'Interiano, Ana', 'Yes', '7.32', 'DateInOutHoursShiftThu 10/24/198:00am1:20pm5.331Resort I Drive 18128 - HousekeepingRoom AttendantThu 10/24/191:39pm3:38pm1.981Resort I Drive 18128 - HousekeepingRoom Attendant', 'Date', 'In', 'Out', 'Hours', 'Shift', 'Thu 10/24/19', '8:00am', '1:20pm', '5.33', '1Resort I Drive 18128 - HousekeepingRoom Attendant', 'Thu 10/24/19', '1:39pm', '3:38pm', '1.98', '1 Resort I Drive 18128 - HousekeepingRoom Attendant', '12.25', '9.0000', '1', 'Housekeeper', 'HOUSEKEEPER', 'SE', 'Hotel Group']
But I don't need the whole row, just the name "Interiano, Ana" and the last "HOUSEKEEPER". I've been trying to index the row variable, with no luck.
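One possible way to pick just those two cells is sketched below. It assumes the markup shown above: `find_all('td')` descends into the nested shift table, which is why the indexes shift around, so the sketch restricts the search to direct children with `recursive=False`. The trimmed `html` string and the positions 3 and 11 are assumptions taken from the sample row, not from the real page.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample row; the real file has more attributes and rows.
html = """
<table><tbody id="thetbody">
<tr class="GridItem odd">
<td><span> Info </span></td>
<td>746402</td>
<td>Wyndham Orlando Resort International Drive</td>
<td>Interiano, Ana</td>
<td>Yes</td>
<td>7.32</td>
<td><table><tbody><tr><td>Date</td><td>In</td></tr></tbody></table></td>
<td>12.25</td>
<td>9.0000</td>
<td>1</td>
<td>Housekeeper</td>
<td>HOUSEKEEPER</td>
<td>SE-FL-Orlando</td>
<td>Wyndham Hotel Group</td>
</tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
tbody = soup.find("tbody", id="thetbody")

results = []
# recursive=False keeps only the outer rows and their own cells,
# so the cells of the nested shift table are not mixed in.
for tr in tbody.find_all("tr", recursive=False):
    cells = tr.find_all("td", recursive=False)
    name = cells[3].get_text(strip=True)   # 4th outer cell: employee name
    job = cells[11].get_text(strip=True)   # 12th outer cell: upper-case job title
    results.append((name, job))

print(results)
```

If the column order on the real page differs, the two indexes would need adjusting.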
I want to find a specific cell in a table based on coordinates.
HTML
<div id="model" class="mobile handsontable htRowHeaders htColumnHeaders" style="height: 100%; overflow: hidden;" data-originalstyle="height: 100%; overflow: hidden;">
<div class="ht_master handsontable"><div class="wtHolder" style="position: relative; height: 380px; width: 1468px;">
<div class="wtHider" style="width: 2550px; height: 2109px;">
<div class="wtSpreader" style="position: relative; top: 0px; left: 0px;">
<table class="htCore">
<colgroup>
<col class="rowHeader" style="width: 50px;">
<col style="width: 400px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
</colgroup>
<thead>
<tr>
<th class="">
<div class="relative">
<span class="colHeader cornerHeader"> </span>
</div>
</th>
<th class="">
<div class="relative">
<span class="colHeader">Division, Base Alternative</span>
</div>
</th>
<th class="">
<div class="relative">
<span class="colHeader">2015 (M)</span>
</div>
</th>
<th class="">
<div class="relative">
<span class="colHeader">2016 (N)</span>
</div>
</th>
<th class="">
<div class="relative">
<span class="colHeader">2017 (O)</span>
</div>
</th>
<th class="">
<div class="relative">
<span class="colHeader">2018 (P)</span>
</div>
</th>
<th class="">
<div class="relative">
<span class="colHeader">2019 (Q)</span>
</div>
</th>
</tr>
<thead>
<tr> <th class="" style="height: 17px;">
<div class="relative">
<span class="rowHeader">1</span>
</div>
</th>
<td class="htDimmed htNoWrap"></td>
<td class="afterHiddenColumn htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<tr>
<th class="" style="height: 17px;">
<div class="relative">
<span class="rowHeader">2</span>
</div>
</th>
<td class="htDimmed htNoWrap" title="" style="background-color: rgb(194, 218, 254); font-size: 10pt; color: rgb(0, 0, 0); text-align: left;"></td>
<td class="afterHiddenColumn htDimmed htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;"></td>
<td class="htDimmed htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;"></td>
<td class="htDimmed htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;"></td>
<td class="htDimmed htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;"></td>
<td class="htDimmed htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;"></td>
</tr>
<tr>
<th class="" style="height: 17px;">
<div class="relative">
<span class="rowHeader">3</span>
</div>
</th>
<td class="htNoWrap" title="" style="background-color: rgb(194, 218, 254); font-size: 10pt; color: rgb(0, 0, 0); text-align: left;">Model period</td>
<td class="afterHiddenColumn htRight htNumeric htNoWrap" title="" style="background-color: rgb(255, 255, 128); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htRight htNumeric htNoWrap" title="" style="background-color: rgb(255, 255, 128); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htRight htNumeric htNoWrap" title="" style="background-color: rgb(255, 255, 128); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htRight htNumeric htNoWrap" title="" style="background-color: rgb(255, 255, 128); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htRight htNumeric htNoWrap" title="" style="background-color: rgb(255, 255, 128); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htRight htNumeric htNoWrap" title="" style="background-color: rgb(255, 255, 128); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htRight htNumeric htNoWrap" title="" style="background-color: rgb(255, 255, 128); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htRight htNumeric htNoWrap" title="" style="background-color: rgb(255, 255, 128); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
</tr>
<tr>
<th class="" style="height: 17px;">
<div class="relative">
<span class="rowHeader">7</span>
</div>
</th>
<td class="htNoWrap" title="" style="background-color: rgb(194, 218, 254); font-size: 10pt; color: rgb(0, 0, 0); text-align: left;">Pulp sales, pulp mill, MUSD</td>
<td class="afterHiddenColumn htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">597</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">572</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">648</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">35</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">326</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">326</td>
</tr>
<tr>
<th class="" style="height: 17px;">
<div class="relative">
<span class="rowHeader">8</span>
</div>
</th>
<td class="htNoWrap" title="" style="background-color: rgb(194, 218, 254); font-size: 10pt; color: rgb(0, 0, 0); text-align: left;">Pulp sales, paper mill, MUSD</td>
<td class="afterHiddenColumn htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
</tr>
</thead>
I have tried a bunch of different methods, but I can't understand how to do it.
I want to find a cell based on, for example, line 7 and column name 2017 (O), and get its value.
The only examples I can find locate a cell by its index or by its value, which is not what I want to accomplish.
Thanks in advance!
EDIT
I have provided more HTML code.
I assume you want to get to the span element using the 'Site 2' text, right?
If so, you can locate it with either of these XPaths:
//div[text() = 'Site 2']/parent::div/div[@class = 'flex-shrink-0']/tree-node-expander[1]/span[1]
or
//div[text() = 'Site 2']/preceding-sibling::div/tree-node-expander[1]/span[1]
Try like this:
el = driver.find_element_by_xpath("//div[contains(text(), 'Site 2')]/preceding::div//span")
print(el.get_attribute("class"))
or try parent element:
el = driver.find_element_by_xpath("//div[contains(text(), 'Site 2')]/parent::div//span")
or preceding-sibling:
el = driver.find_element_by_xpath("//div[contains(text(), 'Site 2')]/preceding-sibling::div//span")
Commenting on your requirement:
I want to find a cell, based on line 7 & column name 2017 (O) for example, and get its value
For this, you can grab the column headers, look up the index of the column by its name, and use that index as the column number in the XPath.
With this column number you can then get the value from row 7, as in the code below:
columnName = '2017 (O)'
rowname = '7'
driver.get(URL)
columnHeader = driver.find_elements_by_xpath('//div/span[@class="colHeader"]')
headerlist = []
# Collect the header text of every column
for element in columnHeader:
    headerlist.append(element.text)
    print(element.text)
# XPath positions are 1-based, hence the + 1
index = headerlist.index(columnName) + 1
text = driver.find_element_by_xpath("//div/span[@class='rowHeader' and text()='" + rowname + "']/ancestor::th/following-sibling::td[" + str(index) + "]").text
print(text)
Output -
And the table structure is -
HTML
<div id="model" class="mobile handsontable htRowHeaders htColumnHeaders" style="height: 100%; overflow: hidden;" data-originalstyle="height: 100%; overflow: hidden;">
<div class="ht_master handsontable"><div class="wtHolder" style="position: relative; height: 380px; width: 1468px;">
<div class="wtHider" style="width: 2550px; height: 2109px;">
<div class="wtSpreader" style="position: relative; top: 0px; left: 0px;">
<table class="htCore">
<colgroup>
<col class="rowHeader" style="width: 50px;">
<col style="width: 400px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
<col style="width: 70px;">
</colgroup>
<thead>
<tr>
<th class="">
<div class="relative">
<span class="colHeader cornerHeader"> </span>
</div>
</th>
<th class="">
<div class="relative">
<span class="colHeader">Division, Base Alternative</span>
</div>
</th>
<th class="">
<div class="relative">
<span class="colHeader">2015 (M)</span>
</div>
</th>
<th class="">
<div class="relative">
<span class="colHeader">2016 (N)</span>
</div>
</th>
<th class="">
<div class="relative">
<span class="colHeader">2017 (O)</span>
</div>
</th>
<th class="">
<div class="relative">
<span class="colHeader">2018 (P)</span>
</div>
</th>
<th class="">
<div class="relative">
<span class="colHeader">2019 (Q)</span>
</div>
</th>
</tr>
<thead>
<tr> <th class="" style="height: 17px;">
<div class="relative">
<span class="rowHeader">1</span>
</div>
</th>
<td class="htDimmed htNoWrap"></td>
<td class="afterHiddenColumn htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<td class="htDimmed htNoWrap"></td>
<tr>
<th class="" style="height: 17px;">
<div class="relative">
<span class="rowHeader">2</span>
</div>
</th>
<td class="htDimmed htNoWrap" title="" style="background-color: rgb(194, 218, 254); font-size: 10pt; color: rgb(0, 0, 0); text-align: left;"></td>
<td class="afterHiddenColumn htDimmed htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;"></td>
<td class="htDimmed htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;"></td>
<td class="htDimmed htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;"></td>
<td class="htDimmed htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;"></td>
<td class="htDimmed htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;"></td>
</tr>
<tr>
<th class="" style="height: 17px;">
<div class="relative">
<span class="rowHeader">3</span>
</div>
</th>
<td class="htNoWrap" title="" style="background-color: rgb(194, 218, 254); font-size: 10pt; color: rgb(0, 0, 0); text-align: left;">Model period</td>
<td class="afterHiddenColumn htRight htNumeric htNoWrap" title="" style="background-color: rgb(255, 255, 128); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htRight htNumeric htNoWrap" title="" style="background-color: rgb(255, 255, 128); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htRight htNumeric htNoWrap" title="" style="background-color: rgb(255, 255, 128); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htRight htNumeric htNoWrap" title="" style="background-color: rgb(255, 255, 128); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htRight htNumeric htNoWrap" title="" style="background-color: rgb(255, 255, 128); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htRight htNumeric htNoWrap" title="" style="background-color: rgb(255, 255, 128); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htRight htNumeric htNoWrap" title="" style="background-color: rgb(255, 255, 128); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htRight htNumeric htNoWrap" title="" style="background-color: rgb(255, 255, 128); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
</tr>
<tr>
<th class="" style="height: 17px;">
<div class="relative">
<span class="rowHeader">7</span>
</div>
</th>
<td class="htNoWrap" title="" style="background-color: rgb(194, 218, 254); font-size: 10pt; color: rgb(0, 0, 0); text-align: left;">Pulp sales, pulp mill, MUSD</td>
<td class="afterHiddenColumn htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">597</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">572</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">648</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">35</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">326</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">326</td>
</tr>
<tr>
<th class="" style="height: 17px;">
<div class="relative">
<span class="rowHeader">8</span>
</div>
</th>
<td class="htNoWrap" title="" style="background-color: rgb(194, 218, 254); font-size: 10pt; color: rgb(0, 0, 0); text-align: left;">Pulp sales, paper mill, MUSD</td>
<td class="afterHiddenColumn htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
<td class="htNoWrap" title="" style="background-color: rgb(255, 255, 255); font-size: 10pt; color: rgb(0, 0, 0); text-align: center;">0</td>
</tr>
</thead>
I want to get a specific cell in this grid, based on the row number on the left side and the column name at the top.
Let's say I want to get the cell with the value 597.
Instead of finding it by its value/text or index, I want to get the cell based on the row number and column name.
Something like this:
find the cell that has the row number 7 and the column name 2017 (O)
Expected Output
597
If a row or column is selected, it gets a class named ht__highlight.
If a cell is selected, it gets the classes htNoWrap current highlight.
I have searched for 2 days now, and I can't find any answers or examples for this situation.
This is quite hard to explain, so I apologize for the messy explanation.
Thanks in advance!
The approach should look like this:
rows = driver.find_elements_by_css_selector(".rowHeader")
for row in rows:
    row_2015 = row.find_element_by_css_selector(".colHeader:nth-of-type(1)").text
    row_2016 = row.find_element_by_css_selector(".colHeader:nth-of-type(2)").text
You get all rows by finding a unique row locator (for example, 15 rows give 15 results).
Then you loop through the table rows and get the text of the individual cells.
In other words, your CSS selector should look like this:
expected_text = driver.find_element_by_css_selector(".rowHeader:nth-of-type(1)>.colHeader:nth-of-type(1)").text
Without the full HTML page it's hard to say for sure.
Try 2:
With the conditions you specified, use the following XPath:
//span[contains(text(),'2017 (O)')]/../../../../../thead[2]/tr[4]/td[4]
It directly locates the cell with the value 597 by column name.
First it finds the column name, then it goes a few levels up to reach the values in the second thead.
Change the tr[4]/td[4] part to get any other value from the table with the same XPath.
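The row-number plus column-name lookup can also be tried offline on the page source, for example on `driver.page_source`. The sketch below is only an illustration of that idea with BeautifulSoup, not something taken from the answers above: the helper name `cell_value` and the shortened `html` string are hypothetical, and it assumes every data row starts with a single `<th>` holding the row header, as in the markup posted in the question.

```python
from bs4 import BeautifulSoup

# Shortened, well-formed version of the grid from the question.
html = """
<table class="htCore">
  <thead><tr>
    <th><span class="colHeader cornerHeader"> </span></th>
    <th><span class="colHeader">Division, Base Alternative</span></th>
    <th><span class="colHeader">2015 (M)</span></th>
    <th><span class="colHeader">2016 (N)</span></th>
    <th><span class="colHeader">2017 (O)</span></th>
  </tr></thead>
  <tbody>
    <tr><th><span class="rowHeader">7</span></th>
        <td>Pulp sales, pulp mill, MUSD</td><td>0</td><td>0</td><td>597</td></tr>
    <tr><th><span class="rowHeader">8</span></th>
        <td>Pulp sales, paper mill, MUSD</td><td>0</td><td>0</td><td>0</td></tr>
  </tbody>
</table>
"""


def cell_value(soup, row_label, column_name):
    """Return the text of the cell at (row header, column header)."""
    # The CSS class selector also matches the corner header, which sits at index 0.
    headers = [s.get_text(strip=True) for s in soup.select("span.colHeader")]
    col = headers.index(column_name)
    for span in soup.select("span.rowHeader"):
        if span.get_text(strip=True) == row_label:
            row = span.find_parent("tr")
            # The row's first cell is the <th>, so header index `col`
            # corresponds to the <td> at list index col - 1.
            return row.find_all("td")[col - 1].get_text(strip=True)
    raise LookupError(f"no row with header {row_label!r}")


soup = BeautifulSoup(html, "html.parser")
print(cell_value(soup, "7", "2017 (O)"))
```

Note that Handsontable may render only the rows and columns currently scrolled into view, so a lookup like this can only find cells that are present in the DOM at that moment.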
Here is an HTML table:
<table width="100%" cellpadding="4" cellspacing="0" style="page-break-before: always">
<col width="32*"/>
<col width="32*"/>
<col width="32*"/>
<col width="32*"/>
<col width="32*"/>
<col width="32*"/>
<col width="32*"/>
<col width="32*"/>
<tr valign="top">
<td colspan="2" width="25%" style="background: transparent" style="border: none; padding: 0cm"><p align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">A</font></font></font></p>
</td>
<td colspan="2" width="25%" style="background: transparent" style="border: none; padding: 0cm"><p align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">B</font></font></font></p>
</td>
<td colspan="2" width="25%" style="background: transparent" style="border: none; padding: 0cm"><p align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">C</font></font></font></p>
</td>
<td colspan="2" width="25%" style="background: transparent" style="border: none; padding: 0cm"><p align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">D</font></font></font></p>
</td>
</tr>
<tr valign="top">
<td colspan="2" width="25%" style="background: transparent" style="border: none; padding: 0cm"><p align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">E</font></font></font></p>
</td>
<td colspan="2" width="25%" style="background: transparent" style="border: none; padding: 0cm"><p align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">F</font></font></font></p>
</td>
<td colspan="2" width="25%" style="background: transparent" style="border: none; padding: 0cm"><p align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">G</font></font></font></p>
</td>
<td colspan="2" width="25%" style="background: transparent" style="border: none; padding: 0cm"><p align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">H</font></font></font></p>
</td>
</tr>
<tr valign="top">
<td colspan="2" width="25%" style="background: transparent" style="border: none; padding: 0cm"><p align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">I</font></font></font></p>
</td>
<td colspan="2" width="25%" style="background: transparent" style="border: none; padding: 0cm"><p align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">J</font></font></font></p>
</td>
<td colspan="2" width="25%" style="background: transparent" style="border: none; padding: 0cm"><p align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">K</font></font></font></p>
</td>
<td colspan="2" width="25%" style="background: transparent" style="border: none; padding: 0cm"><p align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">L</font></font></font></p>
</td>
</tr>
<tr valign="top">
<td width="12%" style="background: transparent" style="border: none; padding: 0cm"><p align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">M</font></font></font></p>
</td>
<td width="13%" style="background: transparent" style="border: none; padding: 0cm"><p lang="ru-RU" align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">M2</font></font></font></p>
</td>
<td width="12%" style="background: transparent" style="border: none; padding: 0cm"><p align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">N</font></font></font></p>
</td>
<td width="13%" style="background: transparent" style="border: none; padding: 0cm"><p lang="ru-RU" align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">N2</font></font></font></p>
</td>
<td width="12%" style="background: transparent" style="border: none; padding: 0cm"><p align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">O</font></font></font></p>
</td>
<td width="13%" style="background: transparent" style="border: none; padding: 0cm"><p lang="ru-RU" align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">O2</font></font></font></p>
</td>
<td width="12%" style="background: transparent" style="border: none; padding: 0cm"><p align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">P</font></font></font></p>
</td>
<td width="13%" style="background: transparent" style="border: none; padding: 0cm"><p lang="ru-RU" align="left" style="font-variant: normal; font-style: normal; font-weight: normal; text-decoration: none">
<font color="#000000"><font face="Liberation Serif, serif"><font size="3" style="font-size: 12pt">P2</font></font></font></p>
</td>
</tr>
</table>
The last row here has twice as many columns as the others. When I try to read it into a pandas DataFrame I get this result:
table = pd.read_html('1111.html')
table[0]
0 1 2 3 4 5 6 7
0 A A B B C C D D
1 E E F F G G H H
2 I I J J K K L L
3 M M2 N N2 O O2 P P2
How can I read it correctly, without the duplication? I don't need the last row.
You can use BeautifulSoup to parse the table and then convert the results to a dataframe:
import pandas as pd
from bs4 import BeautifulSoup as soup
df = pd.DataFrame([[k[1:-1] for i in b.find_all('td') if (k:=i.text) is not None] for b in soup(html, 'html.parser').table.find_all('tr')])
Output:
0 1 2 3 4 5 6 7
0 A B C D None None None None
1 E F G H None None None None
2 I J K L None None None None
3 M M2 N N2 O O2 P P2
Edit: solution without assignment expression:
df = pd.DataFrame([[i.text[1:-1] if i else i for i in b.find_all('td')] for b in soup(html, 'html.parser').table.find_all('tr')])
Output:
0 1 2 3 4 5 6 7
0 A B C D None None None None
1 E F G H None None None None
2 I J K L None None None None
3 M M2 N N2 O O2 P P2
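Since the last row is not needed, another option is to keep the pd.read_html result and slice it. This is only a sketch: it assumes every cell in the rows you keep spans exactly two columns, as in the output shown in the question, and it rebuilds that output by hand instead of reading 1111.html. The name clean is just for illustration.

```python
import pandas as pd

# Stand-in for table[0] from the question: pd.read_html repeats a cell
# across every column that its colspan covers.
df = pd.DataFrame(
    [
        ["A", "A", "B", "B", "C", "C", "D", "D"],
        ["E", "E", "F", "F", "G", "G", "H", "H"],
        ["I", "I", "J", "J", "K", "K", "L", "L"],
        ["M", "M2", "N", "N2", "O", "O2", "P", "P2"],
    ]
)

# Drop the last row, then keep every second column (0, 2, 4, 6).
clean = df.iloc[:-1, ::2].reset_index(drop=True)

# Renumber the remaining columns 0..3.
clean.columns = range(clean.shape[1])

print(clean)
```

If the colspan values are not uniform, this slicing would pick the wrong columns, and parsing the cells yourself (as in the BeautifulSoup answer) is the safer route.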
The available days have the class .calendarCellOpen:
table.calendario .calendarCellOpen input {
}
Here is the calendar CSS:
#calwrapper
{
min-height:230px;
margin-top:10px;
}
#calendar
{
float:left;
margin-left: 15px; /*Daniele 10-04-2014*/
}
span.calendario
{
display:block;
margin:0;
}
table.fasce
{
margin-left:20px;
}
table.fasce th
{
background-image: url( '../images/tab_body.png' );
background-repeat: repeat-x;
font-size:12px;
}
table.fasce tr
{
border-bottom: #f5f4e7 thin dotted;
}
table.calendario
{
border-top: 0px !important;
}
table.calendario, table.fasce
{
width: 300px;
background-color: White !important;
font-size: 15px;
border-right: #f5f4e7 1px solid !important;
border-left: #f5f4e7 1px solid !important;
border-bottom: #f5f4e7 1px solid !important;
}
table.calendario td, table.fasce td
{
text-align:center;
}
table.calendario .calTitolo
{
background-image: url( '../images/tab_body.png' );
background-repeat: repeat-x;
margin: 0px !important;
padding: 0px !important;
font-size:12px;
}
table.calendario .calTitolo td
{
padding:0px 5px 0px 5px;
width:14.3%;
}
table.calendario .calDayHeader /* ROW */
{
background-color:#FCFBF7;
font-size:12px;
}
table.calendario .otherMonthDay
{
color: #C0C0C0;
}
table.calendario .cellaSelezionata /* CELL */
{
background-color:#EDEBD5 !important;
border-collapse:collapse !important;
font-weight:bold;
}
table.calendario .calendarCellOpen input
{
color:#208020 !important; /*High availability (green)*/
font-weight:bold;
}
table.calendario .calendarCellRed
{
color:Red !important; /*no availability*/
font-weight:bold;
}
table.calendario .calendarCellMed input
{
color:#F09643 !important; /*Medium availability*/
font-weight:bold;
}
.pulsanteCalendario
{
border: 0px;
background-color: Transparent;
cursor: pointer;
padding: 0px 0px 0px 0px;
margin: 0px;
height:20px;
width:100%;
overflow:visible;
text-align:center;
font-size:16px;
}
.pulsanteCalendario:hover
{
text-decoration:underline;
}
#legend
{
margin-bottom:8px;
width:100%;
}
#legend ul
{
list-style-type:none;
}
#legend ul li
{
display:inline;
margin-left:20px;
}
What I want is to select (click with Selenium) an available day. It doesn't matter which one, just any day that appears as available (green).
Here is the calendar:
elementos = driver.find_elements_by_class_name("calendarCellOpen")
while True:
    if elementos:
        driver.find_element_by_class_name("calendarCellOpen").click()
        driver.find_element_by_id("ctl00_ContentPlaceHolder1_acc_Calendario1_repFasce_ctl01_btnConferma").click() #confirm button
    else:
        driver.find_element_by_xpath("//input[@value='<']").click() #back
        if elementos:
            driver.find_element_by_class_name("calendarCellOpen").click()
            driver.find_element_by_id("ctl00_ContentPlaceHolder1_acc_Calendario1_repFasce_ctl01_btnConferma").click()
        driver.find_element_by_xpath("//input[@value='>']").click() #forward
        if elementos:
            driver.find_element_by_class_name("calendarCellOpen").click()
            driver.find_element_by_id("ctl00_ContentPlaceHolder1_acc_Calendario1_repFasce_ctl01_btnConferma").click()
This is some code I wrote.
I go back and forward because that is the only way to reload the calendar.
This is the HTML of the calendar:
<div id="calwrapper">
<div id="legend" style="padding-left:15px; margin-bottom:20px">
<table style="width:90%; border-collapse:collapse; border: 0px">
<tr style="line-height:15px">
<td style="background-color:Red; width:80px; margin-right:10px">
</td>
<td style="width: 383px; padding-left:5px">
Tutto occupato # all none available
</td>
<td style="background-color:#F09643; width:80px">
</td>
<td style="width: 450px; padding-left:5px">
Media disponibilità #half available
</td>
<td style="background-color:#058d08; width:80px">
</td>
<td style="width: 383px; padding-left:5px">
Posti disponibili #available
</td>
<td style="background-color:#000000; width:80px">
</td>
<td style="width: 383px; padding-left:5px">
Non disponibile # none available
</td>
</tr>
</table>
</div>
<div id="calendar">
<span id="ctl00_ContentPlaceHolder1_acc_Calendario1_myCalendario1"
class="calendario">
<table class="calendario" summary="Summary" cellspacing="0">
<caption>Calendario eventi</caption>
<tr class="calTitolo">
<th>
<input type="submit"
name="ctl00$ContentPlaceHolder1$acc_Calendario1$myCalendario1$ctl01"
value="<" title="Clicca qui per andare al mese precedente"
class="pulsanteCalendario" />
</th>
<th colspan="5">
<span>agosto, 2017</span>
</th>
<th>
<input type="submit"
name="ctl00$ContentPlaceHolder1$acc_Calendario1$myCalendario1$ctl03"
value=">" title="Clicca qui per andare al mese successivo"
class="pulsanteCalendario" />
</th>
</tr>
<tr>
<th class="calDayHeader" scope="col">lun</th>
<th class="calDayHeader"
scope="col">mar</th>
<th class="calDayHeader" scope="col">mer</th>
<th class="calDayHeader" scope="col">gio</th>
<th class="calDayHeader" scope="col">ven</th>
<th class="calDayHeader" scope="col">sab</th>
<th class="calDayHeader" scope="col">dom</th>
</tr>
<tr>
<td title="Giorno non disponibile" class="otherMonthDay">31</td>
<td title="Tutto occupato" class="calendarCellRed">1</td>
<td title="Giorno non disponibile" class="noSelectableDay">2</td>
<td title="Tutto occupato" class="calendarCellRed">3</td>
<td title="Tutto occupato" class="calendarCellRed">4</td>
<td title="Giorno non disponibile" class="noSelectableDay">5</td>
<td title="Giorno non disponibile" class="noSelectableDay">6</td>
</tr>
<tr>
<td title="Tutto occupato" class="calendarCellRed">7</td>
<td class="calendarCellOpen">
<input type="submit"
name="ctl00$ContentPlaceHolder1$acc_Calendario1$myCalendario1$ctl12"
value="8" title="8 agosto 2017, Posti disponibili"
class="pulsanteCalendario" />
</td>
<td class="calendarCellOpen">
<input type="submit"
name="ctl00$ContentPlaceHolder1$acc_Calendario1$myCalendario1$ctl12"
value="8" title="8 agosto 2017, Posti disponibili"
class="pulsanteCalendario" />
</td>
<td class="calendarCellOpen">
<input type="submit"
name="ctl00$ContentPlaceHolder1$acc_Calendario1$myCalendario1$ctl12"
value="8" title="8 agosto 2017, Posti disponibili"
class="pulsanteCalendario" />
</td>
<td class="calendarCellOpen">
<input type="submit"
name="ctl00$ContentPlaceHolder1$acc_Calendario1$myCalendario1$ctl12"
value="8" title="8 agosto 2017, Posti disponibili"
class="pulsanteCalendario" />
</td>
<td title="Giorno non disponibile" class="noSelectableDay">9</td>
<td title="Giorno non disponibile" class="noSelectableDay">10</td>
</tr><tr>
<td title="Giorno non disponibile" class="noSelectableDay">14</td>
<td title="Giorno non disponibile" class="noSelectableDay">15</td>
<td title="Giorno non disponibile" class="noSelectableDay">16</td>
<td title="Giorno non disponibile" class="noSelectableDay">17</td>
<td title="Giorno non disponibile" class="noSelectableDay">18</td>
<td title="Giorno non disponibile" class="noSelectableDay">19</td>
<td title="Giorno non disponibile" class="noSelectableDay">20</td>
</tr><tr>
<td title="Giorno non disponibile" class="noSelectableDay">21</td>
<td title="Giorno non disponibile" class="noSelectableDay">22</td>
<td title="Giorno non disponibile" class="noSelectableDay">23</td>
<td title="Giorno non disponibile" class="noSelectableDay">24</td>
<td title="Giorno non disponibile" class="noSelectableDay">25</td>
<td title="Giorno non disponibile" class="noSelectableDay">26</td>
<td title="Giorno non disponibile" class="noSelectableDay">27</td>
</tr><tr>
<td title="Giorno non disponibile" class="noSelectableDay">28</td>
<td title="Giorno non disponibile" class="noSelectableDay">29</td>
<td title="Giorno non disponibile" class="noSelectableDay">30</td>
<td title="Giorno non disponibile" class="noSelectableDay">31</td>
<td title="Giorno non disponibile" class="otherMonthDay">1</td>
<td title="Giorno non disponibile" class="otherMonthDay">2</td>
<td title="Giorno non disponibile" class="otherMonthDay">3</td>
</tr></table></span>
</div>
<div id="orari" >
<input type="hidden"
name="ctl00$ContentPlaceHolder1$acc_Calendario1$HiddenField1"
id="ctl00_ContentPlaceHolder1_acc_Calendario1_HiddenField1" />
</div>
</div>
This is what I managed to write, but I'm not sure it is going to work:
while True:
    for dates in elementos:
        if dates.is_enabled():
            dates.click()
            driver.find_element_by_id("ctl00_ContentPlaceHolder1_acc_Calendario1_repFasce_ctl01_btnConferma").click()
    #if elementos > 0:
    #driver.find_element_by_class_name("calendarCellOpen").click()
    #else:
    driver.find_element_by_xpath("//input[@value='<']").click()
    driver.find_element_by_xpath("//input[@value='>']").click()
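One way to structure this, as a rough sketch only: re-query the open days on every pass (the page is re-rendered after each submit, so a list fetched once before the loop goes stale), click the first one found, and otherwise go back and forward a month to reload. The function name click_first_open_day and the max_reloads parameter are made up for illustration. The locator strategy is passed as the plain string "css selector", which is the value behind Selenium's By.CSS_SELECTOR; this assumes a Selenium version whose driver offers find_element(by, value) and find_elements(by, value), which newer releases use in place of the find_element_by_* helpers.

```python
CSS = "css selector"  # same string as selenium.webdriver.common.by.By.CSS_SELECTOR


def click_first_open_day(driver, max_reloads=10):
    """Click the first available (green) day; return True if one was clicked.

    `driver` is expected to be a Selenium WebDriver; `max_reloads` is a
    hypothetical safety limit so the loop cannot run forever.
    """
    for _ in range(max_reloads):
        # Look the buttons up again on every pass instead of reusing an
        # old list: elements found before a page reload are stale.
        days = driver.find_elements(CSS, "td.calendarCellOpen input")
        if days:
            days[0].click()
            return True
        # Nothing open: go to the previous month and back to reload.
        driver.find_element(CSS, "input[value='<']").click()
        driver.find_element(CSS, "input[value='>']").click()
    return False
```

After it returns True, the confirm button from the question (the element whose id ends in btnConferma) would still need to be clicked. Adding a short wait between reloads would also be kinder to the site.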
So, here's my code:
link = "https://nookipedia.com/w/api.php?action=query&list=categorymembers&cmtitle=Category:Insect&cmlimit=500&format=json"
async with aiohttp.get(link) as t:
    result = await t.json()
    foundCheck = False
    for list in result["query"]["categorymembers"]:
        print(list["title"])
        if bug.lower() == list["title"].lower():
            print(bug)
            await self.bot.say("{} is a real bug".format(bug.title()))
            bug2 = bug.replace(" ", "_")
            url = "https://nookipedia.com/wiki/{}".format(bug2)
            await self.bot.say(url)
            async with aiohttp.get(url) as response:
                soupObject = BeautifulSoup(await response.text(), "html.parser")
                try:
                    info = soupObject.find(id="Infobox-bug").tr.td.get_text()
                    await self.bot.say("{}".format(info))
                except:
                    await self.bot.say("Can't get the content from {}".format(url))
                foundCheck = True
                return
    if not foundCheck:
        await self.bot.say("That bug does not exist")
        return
    else:
        await self.bot.say("Error")
And here's the HTML I'm trying to read:
<table id="Infobox-bug" align="right" style="background: #adff2f; margin-left: 10px; margin-bottom: 10px; border-radius: 10px; -moz-border-radius: 10px; -webkit-border-radius: 10px; -khtml-border-radius: 10px; -icab-border-radius: 10px; -o-border-radius: 10px; border: 3px solid #9acd32; width: 25%">
<tr align="center">
<td colspan="2"> <big><big><b>Pill Bug</b></big></big>
</td></tr>
<tr align="center">
<td style="background: #caecc9; border-radius: 10px; -moz-border-radius: 10px; -webkit-border-radius: 10px; -khtml-border-radius: 10px; -icab-border-radius: 10px; -o-border-radius: 10px;" colspan="2"> <img alt="Pill Bug Picture.jpg" src="/w/images/b/bb/Pill_Bug_Picture.jpg" width="199" height="186" />
</td></tr>
<tr>
<th style="background: #86df2d; border-top-left-radius: 10px; -moz-border-radius-topleft: 10px; -webkit-border-top-left-radius: 10px; -khtml-border-top-left-radius: 10px; -icab-border-top-left-radius: 10px; -o-border-top-left-radius: 10px;" align="right"> Scientific name
</th>
<td style="background:#ffffff; border-top-right-radius: 10px; -moz-border-radius-topright: 10px; -webkit-border-top-right-radius: 10px; -khtml-border-top-right-radius: 10px; -icab-border-top-right-radius: 10px; -o-border-top-right-radius: 10px;" align="left"> <i>Armadillidium vulgare</i>
</td></tr>
<tr>
<th style="background: #86df2d" align="right"> Family
</th>
<td style="background:#ffffff" align="left"> <i>Armadillidiidae - Terrestrial Custaceans</i>
</td></tr>
<tr>
<th style="background: #86df2d" align="right"> Time of year
</th>
<td style="background:#ffffff" align="left"> All year
</td></tr>
<tr>
<th style="background: #86df2d" align="right"> Time of day
</th>
<td style="background:#ffffff" align="left"> All day
</td></tr>
<tr>
<th style="background: #86df2d" align="right"> Location
</th>
<td style="background:#ffffff" align="left"> Under rocks
</td></tr>
<tr>
<th style="background: #86df2d" align="right"> Size
</th>
<td style="background:#ffffff" align="left"> 2 mm
</td></tr>
<tr>
<th style="background: #86df2d" align="right"> Rarity
</th>
<td style="background:#ffffff" align="left"> Common
</td></tr>
<tr>
<th style="background: #86df2d" align="right"> Selling price
</th>
<td style="background:#ffffff" align="left"> 250 Bells
</td></tr>
<tr>
<th style="background: #86df2d; border-bottom-left-radius: 10px; -moz-border-radius-bottomleft: 10px; -webkit-border-bottom-left-radius: 10px; -khtml-border-bottom-left-radius: 10px; -icab-border-bottom-left-radius: 10px; -o-border-bottom-left-radius: 10px;" align="right"> Appearances
</th>
<td style="background:#ffffff; border-bottom-right-radius: 10px; -moz-border-radius-bottomright: 10px; -webkit-border-bottom-right-radius: 10px; -khtml-border-bottom-right-radius: 10px; -icab-border-bottom-right-radius: 10px; -o-border-bottom-right-radius: 10px;" align="left"> <i>Doubutsu no Mori</i>,<br /><i>Animal Crossing</i>,<br /><i>Animal Crossing: Wild World</i>,<br /><i>Animal Crossing: City Folk</i>,<br /><i>Animal Crossing: New Leaf</i>
</td></tr></table>
So basically I got "Pill Bug" (the info variable) as its own string, but I'm not sure how to get everything after it (within the tr and td elements) without getting "Pill Bug" again. How would I do that so that I get each piece of text as its own string?
Thank you so much for the help.
BeautifulSoup has many methods for getting tags and their attributes:
soup.find(args)
soup.find_all(args)
soup.select(CSS_selection)
tag.get(param) or tag.get(param, default) or tag[param]
tag.text or tag.get_text()
tag.name
etc.
find() and find_all() accept several kinds of arguments, so read the BeautifulSoup documentation for more.
Example:
html = '''<table id="Infobox-bug" align="right" style="background: #adff2f; margin-left: 10px; margin-bottom: 10px; border-radius: 10px; -moz-border-radius: 10px; -webkit-border-radius: 10px; -khtml-border-radius: 10px; -icab-border-radius: 10px; -o-border-radius: 10px; border: 3px solid #9acd32; width: 25%">
<tr align="center">
<td colspan="2"> <big><big><b>Pill Bug</b></big></big>
</td></tr>
<tr align="center">
<td style="background: #caecc9; border-radius: 10px; -moz-border-radius: 10px; -webkit-border-radius: 10px; -khtml-border-radius: 10px; -icab-border-radius: 10px; -o-border-radius: 10px;" colspan="2"> <img alt="Pill Bug Picture.jpg" src="/w/images/b/bb/Pill_Bug_Picture.jpg" width="199" height="186" />
</td></tr>
<tr>
<th style="background: #86df2d; border-top-left-radius: 10px; -moz-border-radius-topleft: 10px; -webkit-border-top-left-radius: 10px; -khtml-border-top-left-radius: 10px; -icab-border-top-left-radius: 10px; -o-border-top-left-radius: 10px;" align="right"> Scientific name
</th>
<td style="background:#ffffff; border-top-right-radius: 10px; -moz-border-radius-topright: 10px; -webkit-border-top-right-radius: 10px; -khtml-border-top-right-radius: 10px; -icab-border-top-right-radius: 10px; -o-border-top-right-radius: 10px;" align="left"> <i>Armadillidium vulgare</i>
</td></tr>
<tr>
<th style="background: #86df2d" align="right"> Family
</th>
<td style="background:#ffffff" align="left"> <i>Armadillidiidae - Terrestrial Custaceans</i>
</td></tr>
<tr>
<th style="background: #86df2d" align="right"> Time of year
</th>
<td style="background:#ffffff" align="left"> All year
</td></tr>
<tr>
<th style="background: #86df2d" align="right"> Time of day
</th>
<td style="background:#ffffff" align="left"> All day
</td></tr>
<tr>
<th style="background: #86df2d" align="right"> Location
</th>
<td style="background:#ffffff" align="left"> Under rocks
</td></tr>
<tr>
<th style="background: #86df2d" align="right"> Size
</th>
<td style="background:#ffffff" align="left"> 2 mm
</td></tr>
<tr>
<th style="background: #86df2d" align="right"> Rarity
</th>
<td style="background:#ffffff" align="left"> Common
</td></tr>
<tr>
<th style="background: #86df2d" align="right"> Selling price
</th>
<td style="background:#ffffff" align="left"> 250 Bells
</td></tr>
<tr>
<th style="background: #86df2d; border-bottom-left-radius: 10px; -moz-border-radius-bottomleft: 10px; -webkit-border-bottom-left-radius: 10px; -khtml-border-bottom-left-radius: 10px; -icab-border-bottom-left-radius: 10px; -o-border-bottom-left-radius: 10px;" align="right"> Appearances
</th>
<td style="background:#ffffff; border-bottom-right-radius: 10px; -moz-border-radius-bottomright: 10px; -webkit-border-bottom-right-radius: 10px; -khtml-border-bottom-right-radius: 10px; -icab-border-bottom-right-radius: 10px; -o-border-bottom-right-radius: 10px;" align="left"> <i>Doubutsu no Mori</i>,<br /><i>Animal Crossing</i>,<br /><i>Animal Crossing: Wild World</i>,<br /><i>Animal Crossing: City Folk</i>,<br /><i>Animal Crossing: New Leaf</i>
</td></tr></table>'''
from bs4 import BeautifulSoup
# The html string above has no <a> tags, so the 'a href' parts below need
# the live page; uncomment these lines to fetch it.
#import requests
#r = requests.get('https://nookipedia.com/wiki/Pill_Bug')
#html = r.content
soup = BeautifulSoup(html, "html.parser")
tds = soup.find(id="Infobox-bug").find_all('td')
print('--- all td text ---')
for x in tds:
    print('>', x.get_text().strip())
    # or
    print('>', x.text.strip())
print('--- one td text ---')
print(tds[0].text.strip())
print('--- one td a href ---')
print(tds[1].find('a').get('href'))
# or
print(tds[1].find('a')['href'])
print('--- all a href (using CSS selector) ---')
for a in soup.select('#Infobox-bug td a'):
    print(a['href'])
print('--- all td and th ---')
for tt in soup.find(id='Infobox-bug').find_all(['td', 'th']):
    if tt.name == 'th':
        print('[', tt.name, ']', tt.text.strip(), end=" --> ")
    elif tt.name == 'td':
        a = tt.find('a')
        if a:
            a = a['href']
        else:
            a = 'None'
        print('[', tt.name, ']', tt.text.strip(), '(', a, ')')
Result:
--- all td text ---
> Pill Bug
> Pill Bug
>
>
> Armadillidium vulgare
> Armadillidium vulgare
> Armadillidiidae - Terrestrial Custaceans
> Armadillidiidae - Terrestrial Custaceans
> All year
> All year
> All day
> All day
> Under rocks
> Under rocks
> 2 mm
> 2 mm
> Common
> Common
> 250 Bells
> 250 Bells
> Doubutsu no Mori,Animal Crossing,Animal Crossing: Wild World,Animal Crossing: City Folk,Animal Crossing: New Leaf
> Doubutsu no Mori,Animal Crossing,Animal Crossing: Wild World,Animal Crossing: City Folk,Animal Crossing: New Leaf
--- one td text ---
Pill Bug
--- one td a href ---
/wiki/File:Pill_Bug_Picture.jpg
/wiki/File:Pill_Bug_Picture.jpg
--- all a href (using CSS selector) ---
/wiki/File:Pill_Bug_Picture.jpg
/wiki/Bells
/wiki/Doubutsu_no_Mori_(game)
/wiki/Animal_Crossing_(GCN)
/wiki/Animal_Crossing:_Wild_World
/wiki/Animal_Crossing:_City_Folk
/wiki/Animal_Crossing:_New_Leaf
--- all td and th ---
[ td ] Pill Bug ( None )
[ td ] ( /wiki/File:Pill_Bug_Picture.jpg )
[ th ] Scientific name --> [ td ] Armadillidium vulgare ( None )
[ th ] Family --> [ td ] Armadillidiidae - Terrestrial Custaceans ( None )
[ th ] Time of year --> [ td ] All year ( None )
[ th ] Time of day --> [ td ] All day ( None )
[ th ] Location --> [ td ] Under rocks ( None )
[ th ] Size --> [ td ] 2 mm ( None )
[ th ] Rarity --> [ td ] Common ( None )
[ th ] Selling price --> [ td ] 250 Bells ( /wiki/Bells )
[ th ] Appearances --> [ td ] Doubutsu no Mori,Animal Crossing,Animal Crossing: Wild World,Animal Crossing: City Folk,Animal Crossing: New Leaf ( /wiki/Doubutsu_no_Mori_(game) )
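To get each value as its own string without picking up "Pill Bug" again, one option is to walk the rows and pair each th with the td next to it. The snippet below is a minimal sketch on a trimmed-down copy of the infobox (only a few rows, styles removed); the dict name info is just for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the infobox from the question.
html = """
<table id="Infobox-bug">
<tr><td colspan="2"><b>Pill Bug</b></td></tr>
<tr><th> Time of year </th><td> All year </td></tr>
<tr><th> Location </th><td> Under rocks </td></tr>
<tr><th> Selling price </th><td> 250 Bells </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

info = {}
for row in soup.find(id="Infobox-bug").find_all("tr"):
    th = row.find("th")
    td = row.find("td")
    # The title and image rows have no <th>, so they are skipped here.
    if th is not None and td is not None:
        info[th.get_text(strip=True)] = td.get_text(strip=True)

print(info)
```

Each field is then available by its label, for example info["Location"], and the title row is left out because it has no th.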