I'm scraping a webpage for a table using BeautifulSoup, but for some reason it only scrapes half the table: the half I get is the part that doesn't contain the input fields. Here is the HTML:
<table class="commonTable1" cellpadding="0" cellspacing="0" border="0" width="100%" id="portAllocTable">
<tbody>
<tr>
<th class="commonTableHeaderLastCell" colspan="2"><span class="commonBold"> Portfolio Allocation (%) </span></th>
</tr>
<tr>
<td colspan="2" class="commonHeaderContentSeparator"><img src="/fees-web/common/images/spacer.gif" height="1" style="display: block"></td>
</tr>
<tr>
<td>
<span>AdvisorGuided (Capital Portfolio)</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<!-- When collection method is invoice, the portfolio to charge table should be diabled.
Else work as it was-->
<input type="hidden" name="portfolioChargeList[0].feeCollectionRate" value="100" id="selText_1"><input type="text" name="portfolioChargeList[0].feeCollectionRateINPUT" maxlength="3" onkeypress="return disableMinus();" onblur="updateTotal(1);" value="100" maxvalue="100" decimals="0" showalertdialog="true" blankifzero="true" id="selText_1INPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
<tr>
<td>
<span>AdvisorGuided 2 (Capital Portfolio)</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<!-- When collection method is invoice, the portfolio to charge table should be diabled.
Else work as it was-->
<input type="hidden" name="portfolioChargeList[1].feeCollectionRate" value="0" id="selText_1"><input type="text" name="portfolioChargeList[1].feeCollectionRateINPUT" maxlength="3" onkeypress="return disableMinus();" onblur="updateTotal(1);" value="0" maxvalue="100" decimals="0" showalertdialog="true" blankifzero="true" id="selText_1INPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
<tr>
<td>
<span>Client Directed (Capital Portfolio)</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<!-- When collection method is invoice, the portfolio to charge table should be diabled.
Else work as it was-->
<input type="hidden" name="portfolioChargeList[2].feeCollectionRate" value="0" id="selText_1"><input type="text" name="portfolioChargeList[2].feeCollectionRateINPUT" maxlength="3" onkeypress="return disableMinus();" onblur="updateTotal(1);" value="0" maxvalue="100" decimals="0" showalertdialog="true" blankifzero="true" id="selText_1INPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
<tr>
<td>
<span>Holding MMKT (Capital Portfolio)</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<!-- When collection method is invoice, the portfolio to charge table should be diabled.
Else work as it was-->
<input type="hidden" name="portfolioChargeList[3].feeCollectionRate" value="0" id="selText_1"><input type="text" name="portfolioChargeList[3].feeCollectionRateINPUT" maxlength="3" onkeypress="return disableMinus();" onblur="updateTotal(1);" value="0" maxvalue="100" decimals="0" showalertdialog="true" blankifzero="true" id="selText_1INPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
<tr>
<td>
<span>Total</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<input type="hidden" name="portfolioChargeList[4].feeCollectionRate" value="100" id="selText_1Total"><input type="text" name="portfolioChargeList[4].feeCollectionRateINPUT" maxlength="3" value="100" maxvalue="100" decimals="0" blankifzero="true" id="selText_1TotalINPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
</tbody>
</table>
Here is my code:
url = driver.page_source
soup = BeautifulSoup(url, "lxml")
table = soup.find('table', id="portAllocTable")
rows = table.findAll('td')
list_of_rows = []
for row in table.findAll('tr'):
    list_of_cells = []
    for cell in row.findAll(["th","td"]):
        text = cell.text
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)
for item in list_of_rows:
    print(' '.join(item))
What am I doing wrong? Why is it only printing the left side of the table? Any recommendations about what to change would be much appreciated.
Results:
Portfolio Allocation (%)
AdvisorGuided (Capital Portfolio)
100 100
AdvisorGuided 2 (Capital Portfolio)
0 100
Client Directed (Capital Portfolio)
0 100
Holding MMKT (Capital Portfolio)
0 100
Total
100 100
The numbers are not text content. They are stored in the value attributes of the input elements, so cell.text never sees them. You'll have to go further into the child and sibling nodes and pull out those attributes.
import pandas as pd
import bs4
html = '''<table class="commonTable1" cellpadding="0" cellspacing="0" border="0" width="100%" id="portAllocTable">
<tbody>
<tr>
<th class="commonTableHeaderLastCell" colspan="2"><span class="commonBold"> Portfolio Allocation (%) </span></th>
</tr>
<tr>
<td colspan="2" class="commonHeaderContentSeparator"><img src="/fees-web/common/images/spacer.gif" height="1" style="display: block"></td>
</tr>
<tr>
<td>
<span>AdvisorGuided (Capital Portfolio)</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<!-- When collection method is invoice, the portfolio to charge table should be diabled.
Else work as it was-->
<input type="hidden" name="portfolioChargeList[0].feeCollectionRate" value="100" id="selText_1"><input type="text" name="portfolioChargeList[0].feeCollectionRateINPUT" maxlength="3" onkeypress="return disableMinus();" onblur="updateTotal(1);" value="100" maxvalue="100" decimals="0" showalertdialog="true" blankifzero="true" id="selText_1INPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
<tr>
<td>
<span>AdvisorGuided 2 (Capital Portfolio)</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<!-- When collection method is invoice, the portfolio to charge table should be diabled.
Else work as it was-->
<input type="hidden" name="portfolioChargeList[1].feeCollectionRate" value="0" id="selText_1"><input type="text" name="portfolioChargeList[1].feeCollectionRateINPUT" maxlength="3" onkeypress="return disableMinus();" onblur="updateTotal(1);" value="0" maxvalue="100" decimals="0" showalertdialog="true" blankifzero="true" id="selText_1INPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
<tr>
<td>
<span>Client Directed (Capital Portfolio)</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<!-- When collection method is invoice, the portfolio to charge table should be diabled.
Else work as it was-->
<input type="hidden" name="portfolioChargeList[2].feeCollectionRate" value="0" id="selText_1"><input type="text" name="portfolioChargeList[2].feeCollectionRateINPUT" maxlength="3" onkeypress="return disableMinus();" onblur="updateTotal(1);" value="0" maxvalue="100" decimals="0" showalertdialog="true" blankifzero="true" id="selText_1INPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
<tr>
<td>
<span>Holding MMKT (Capital Portfolio)</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<!-- When collection method is invoice, the portfolio to charge table should be diabled.
Else work as it was-->
<input type="hidden" name="portfolioChargeList[3].feeCollectionRate" value="0" id="selText_1"><input type="text" name="portfolioChargeList[3].feeCollectionRateINPUT" maxlength="3" onkeypress="return disableMinus();" onblur="updateTotal(1);" value="0" maxvalue="100" decimals="0" showalertdialog="true" blankifzero="true" id="selText_1INPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
<tr>
<td>
<span>Total</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<input type="hidden" name="portfolioChargeList[4].feeCollectionRate" value="100" id="selText_1Total"><input type="text" name="portfolioChargeList[4].feeCollectionRateINPUT" maxlength="3" value="100" maxvalue="100" decimals="0" blankifzero="true" id="selText_1TotalINPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
</tbody>
</table>'''
soup = bs4.BeautifulSoup(html, "lxml")
table = soup.find('table', id="portAllocTable")
rows = table.findAll('td')
list_of_rows = []
for row in table.findAll('tr'):
    list_of_cells = []
    for cell in row.find_all(["th","td"]):
        text = cell.text
        try:
            # the number lives in the input's value attribute, not in the text
            val = cell.find('input')['value']
            max_val = cell.find('input').next_sibling['maxvalue']
            list_of_cells.append(val)
            list_of_cells.append(max_val)
        except (TypeError, KeyError):
            # no <input> in this cell (find() returned None) or attribute missing
            pass
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)
for item in list_of_rows:
    print(' '.join(item))
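Here is a slightly more defensive sketch of the same idea. It assumes the page keeps the pattern shown above: a hidden input followed by a text input. It uses find_next_sibling('input') instead of next_sibling, because next_sibling returns whatever node comes next, and that could be a whitespace string if the markup is ever reformatted. The trimmed HTML and the extract_allocations helper name are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's table: one data row plus the Total row.
# Newlines between the inputs are deliberate, to show find_next_sibling skipping them.
HTML = """
<table id="portAllocTable"><tbody>
<tr><td><span>AdvisorGuided (Capital Portfolio)</span></td>
<td><span>
<input type="hidden" name="portfolioChargeList[0].feeCollectionRate" value="100">
<input type="text" name="portfolioChargeList[0].feeCollectionRateINPUT" value="100" maxvalue="100">
</span></td></tr>
<tr><td><span>Total</span></td>
<td><span>
<input type="hidden" name="portfolioChargeList[4].feeCollectionRate" value="100">
<input type="text" name="portfolioChargeList[4].feeCollectionRateINPUT" value="100" maxvalue="100">
</span></td></tr>
</tbody></table>
"""


def extract_allocations(html):
    """Return (label, value, maxvalue) for every row that contains an input."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="portAllocTable")
    results = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        hidden = row.find("input", type="hidden")
        if len(cells) < 2 or hidden is None:
            continue  # header and separator rows have no inputs
        # Skips any whitespace text nodes between the two inputs.
        visible = hidden.find_next_sibling("input")
        label = cells[0].get_text(strip=True)
        results.append((label, hidden["value"], visible["maxvalue"]))
    return results


for label, value, max_value in extract_allocations(HTML):
    print(label, value, max_value)
```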
To build a table, you could do something like this. You'll have to do a bit of clean-up, but it should get you going:
# DataFrame.append was removed in pandas 2.0, so collect the rows in a list first
records = []
for row in table.findAll('tr'):
    for cell in row.find_all(["th","td"]):
        text = cell.text
        try:
            val = cell.find('input')['value']
            max_val = cell.find('input').next_sibling['maxvalue']
        except (TypeError, KeyError):
            val = ''
            max_val = ''
        records.append([text, val, max_val])
results = pd.DataFrame(records, columns=['text','value','maxvalue'])
A few things come to mind.
First: it should be rows = table.findAll('tr'), since the tr HTML tag designates rows. The td tag is the cell tag, so td elements are what you loop over inside each row. But you're not even using the rows variable, so the point is moot. If you want, you could do something like this:
soup = BeautifulSoup(url, "lxml")
table = soup.find('table', id="portAllocTable")
rows = table.findAll("tr")
list_of_rows = []
for row in rows:
    list_of_cells = []
    for cell in row.findAll(['th', 'td']):
        text = cell.text
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)
for item in list_of_rows:
    print(' '.join(item))
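Second, this code doesn't get the values in the input fields. The numbers are stored in each input's value attribute rather than as text, so cell.text contains no numbers for the right-hand cells. This is why you only see the text on the left side.
Finally, you could try a different parser, such as html5lib.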
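To see why the right-hand cells come out blank, you can compare a cell's text with its input's attribute. Below is a minimal check on one trimmed row. It uses Python's built-in "html.parser" backend. A different parser, such as lxml or html5lib, may repair malformed markup differently, but it does not change where the number is stored.

```python
from bs4 import BeautifulSoup

# One row trimmed from the question's table.
ROW = """<table><tr>
<td><span>Client Directed (Capital Portfolio)</span></td>
<td><span><input type="text" name="portfolioChargeList[2].feeCollectionRateINPUT" value="0"></span></td>
</tr></table>"""

soup = BeautifulSoup(ROW, "html.parser")
left, right = soup.find_all("td")

# The left cell holds real text content.
print(repr(left.get_text(strip=True)))
# The right cell has no text at all; the number lives in an attribute.
print(repr(right.get_text(strip=True)))
print(right.find("input")["value"])
```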
Related
I'm really not sure what's going on here and can't figure it out, so I'm hoping someone can help me.
Essentially, I select a table and find all of the rows within it. Then I loop through the rows, select the three radio buttons in each row, and check whether they are selected. However, every row gives the same result as the first row. For example:
False
True
False
then...
False
True
False
when it should change each row.
Python/Selenium code:
table = driver.find_element_by_xpath("//table[@id='tableMscRatesModal']/tbody")
rows = table.find_elements(By.XPATH, "//tr[@class='ng-scope']") # get all of the rows in the tables
for row in rows:
    print(row.text)
    manager_box = row.find_element_by_xpath("//td/input[@type='radio'][contains(@name, 'userRole')][@value='1']")
    standard_box = row.find_element_by_xpath("//td/input[@type='radio'][contains(@name, 'userRole')][@value='2']")
    no_box = row.find_element_by_xpath("//td/input[@type='radio'][contains(@name, 'userRole')][@value='3']")
    print(manager_box.is_selected())
    print(standard_box.is_selected())
    print(no_box.is_selected())
    if not (manager_box.is_selected() or standard_box.is_selected() or no_box.is_selected()):
        standard_box.send_keys(Keys.SPACE)
I have included an example of two rows below. The same structure repeats for every row.
HTML:
<table id="tableMscRatesModal" ng-table="tableParams" class="table table-striped table-bordered" show-filter="false">
<colgroup>
<col span="1" style="width: 13%;">
<col span="1" style="width: 13%;">
<col span="1" style="width: 8%;">
<col span="1" style="width: 8%;">
<col span="1" style="width: 8%;">
<col span="1" style="width: 25%;">
<col span="1" style="width: 25%;">
</colgroup>
<tbody>
<tr>
<th>First name</th>
<th>Last name</th>
<th>Manager</th>
<th>Standard user</th>
<th>No access</th>
<th>Proposed change</th>
<th ng-show="iAmRequestManagerWithProposals(getUserList())" class="ng-hide">
Approval
<br>
<br>
<div class="row approvals">
<div class="col-xs-12">
<div class="input-group">
<span class="input-group-addon">
<label ng-click="rejectAll()">
<input type="radio" name="rejectapproveall" id="rejectall" value="rejectall">
Reject all
</label>
</span>
<span class="input-group-addon">
<label ng-click="approveAll()">
<input type="radio" name="rejectapproveall" id="acceptall" value="acceptall">
Approve all
</label>
</span>
</div>
</div>
</div>
</th>
</tr>
<!-- ngRepeat: accessUser in manageAccessControl.usersAndRoles | orderBy:['sortOrder','lastName','firstName'] -->
<tr ng-repeat="accessUser in manageAccessControl.usersAndRoles | orderBy:['sortOrder','lastName','firstName']" class="ng-scope">
<td class="ng-binding">First Name</td>
<td class="ng-binding">Last Name</td>
<td>
<input type="radio" name="userRole" ng-value="hmwAccess.roles.manager" ng-model="accessUser.selectedRoleId" ng-disabled="accessUser.proposedRoleId || accessUser.isAmsAdmin" class="ng-pristine ng-untouched ng-valid ng-not-empty" value="1">
</td>
<td>
<input type="radio" name="userRole" ng-value="hmwAccess.roles.stdUser" ng-model="accessUser.selectedRoleId" ng-disabled="accessUser.proposedRoleId || accessUser.isAmsAdmin" class="ng-pristine ng-untouched ng-valid ng-not-empty" value="2">
</td>
<td>
<input type="radio" name="userRole" ng-value="hmwAccess.roles.noAccess" ng-model="accessUser.selectedRoleId" ng-disabled="accessUser.proposedRoleId || accessUser.isAmsAdmin" class="ng-pristine ng-untouched ng-valid ng-not-empty" value="3">
</td>
<td>
<span ng-show="accessUser.proposedRoleId" class="ng-hide">
<img tooltip-placement="left" tooltip-append-to-body="true" uib-tooltip-html="getRoleChangeProposal(accessUser)" src="/img/infoIcon.png" style="width: 12px; height: 12px">
Pending manager approval
</span>
<span ng-show="accessUser.isAmsAdmin" class="ng-hide">
<img tooltip-placement="left" tooltip-append-to-body="true" uib-tooltip-html="amsAdminUserTooltip" src="/img/infoIcon.png" style="width: 12px; height: 12px">
AMS admin user
</span>
</td>
<td ng-show="iAmRequestManagerWithProposals(getUserList())" class="ng-hide">
<div class="row approvals">
<div class="col-xs-12">
<div class="input-group ng-hide" ng-show="accessUser.proposedRoleId && accessUser.proposedRoleId > 0">
<span class="input-group-addon">
<label ng-click="setApprovalForUser(accessUser, hmwAccess.rejectApprove.rejected)">
<input type="radio" name="rejectapprove" id="reject" value="rejected" ng-model="accessUser.rejectApprove" class="ng-pristine ng-untouched ng-valid ng-empty">
Reject
</label>
</span>
<span class="input-group-addon">
<label ng-click="setApprovalForUser(accessUser, hmwAccess.rejectApprove.approved)">
<input type="radio" name="rejectapprove" id="accept" value="approved" ng-model="accessUser.rejectApprove" class="ng-pristine ng-untouched ng-valid ng-empty">
Approve
</label>
</span>
</div>
</div>
</div>
</td>
</tr>
<!-- end ngRepeat: accessUser in manageAccessControl.usersAndRoles | orderBy:['sortOrder','lastName','firstName'] -->
<tr ng-repeat="accessUser in manageAccessControl.usersAndRoles | orderBy:['sortOrder','lastName','firstName']" class="ng-scope">
<td class="ng-binding">First Name</td>
<td class="ng-binding">Last Name</td>
<td>
<input type="radio" name="userRole" ng-value="hmwAccess.roles.manager" ng-model="accessUser.selectedRoleId" ng-disabled="accessUser.proposedRoleId || accessUser.isAmsAdmin" class="ng-pristine ng-untouched ng-valid ng-not-empty" value="1">
</td>
<td>
<input type="radio" name="userRole" ng-value="hmwAccess.roles.stdUser" ng-model="accessUser.selectedRoleId" ng-disabled="accessUser.proposedRoleId || accessUser.isAmsAdmin" class="ng-pristine ng-untouched ng-valid ng-not-empty" value="2">
</td>
<td>
<input type="radio" name="userRole" ng-value="hmwAccess.roles.noAccess" ng-model="accessUser.selectedRoleId" ng-disabled="accessUser.proposedRoleId || accessUser.isAmsAdmin" class="ng-pristine ng-untouched ng-valid ng-not-empty" value="3">
</td>
<td>
<span ng-show="accessUser.proposedRoleId" class="ng-hide">
<img tooltip-placement="left" tooltip-append-to-body="true" uib-tooltip-html="getRoleChangeProposal(accessUser)" src="/img/infoIcon.png" style="width: 12px; height: 12px">
Pending manager approval
</span>
<span ng-show="accessUser.isAmsAdmin" class="ng-hide">
<img tooltip-placement="left" tooltip-append-to-body="true" uib-tooltip-html="amsAdminUserTooltip" src="/img/infoIcon.png" style="width: 12px; height: 12px">
AMS admin user
</span>
</td>
<td ng-show="iAmRequestManagerWithProposals(getUserList())" class="ng-hide">
<div class="row approvals">
<div class="col-xs-12">
<div class="input-group ng-hide" ng-show="accessUser.proposedRoleId && accessUser.proposedRoleId > 0">
<span class="input-group-addon">
<label ng-click="setApprovalForUser(accessUser, hmwAccess.rejectApprove.rejected)">
<input type="radio" name="rejectapprove" id="reject" value="rejected" ng-model="accessUser.rejectApprove" class="ng-pristine ng-untouched ng-valid ng-empty">
Reject
</label>
</span>
<span class="input-group-addon">
<label ng-click="setApprovalForUser(accessUser, hmwAccess.rejectApprove.approved)">
<input type="radio" name="rejectapprove" id="accept" value="approved" ng-model="accessUser.rejectApprove" class="ng-pristine ng-untouched ng-valid ng-empty">
Approve
</label>
</span>
</div>
</div>
</div>
</td>
</tr>
Thanks in advance!
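When you locate an element relative to another element with XPath, you need to start the expression with "." (the current context node). Without it, "//" searches from the document root, so every row finds the first matching inputs on the page.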
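from selenium.webdriver.common.by import By

for row in rows:
    # the leading "." anchors the search at the current row
    # (find_element_by_xpath was removed in Selenium 4.3; find_element(By.XPATH, ...) works in 3 and 4)
    manager_box = row.find_element(By.XPATH, ".//td/input[@type='radio'][contains(@name, 'userRole')][@value='1']")
    standard_box = row.find_element(By.XPATH, ".//td/input[@type='radio'][contains(@name, 'userRole')][@value='2']")
    no_box = row.find_element(By.XPATH, ".//td/input[@type='radio'][contains(@name, 'userRole')][@value='3']")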
You can also simplify your code by fetching the three radio buttons of each row as a list:
for row in rows:
    checkboxes = row.find_elements(By.XPATH, ".//td/input[@type='radio'][contains(@name, 'userRole')]")
    for checkbox in checkboxes:
        print(checkbox.is_selected())
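And you can go even further if you drop the print(row.text). Skip the row loop, fetch every userRole radio button in one query, and read them in groups of three: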
# one query for the whole table; name='userRole' excludes each row's approve/reject radios
checkboxes = driver.find_elements(By.XPATH, "//table[@id='tableMscRatesModal']/tbody//tr[@class='ng-scope']//td/input[@type='radio'][@name='userRole']")
for i in range(0, len(checkboxes), 3):
    print(checkboxes[i].is_selected())
    print(checkboxes[i + 1].is_selected())
    print(checkboxes[i + 2].is_selected())
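You can check the same XPath rule offline, without a browser. This sketch uses lxml's XPath engine instead of Selenium and assumes lxml is installed. It runs on two simplified rows shaped like the question's. XPath 1.0 gives "//" the same root-relative meaning whichever engine evaluates it, so the contrast between the absolute and relative queries carries over.

```python
from lxml import html

# Two simplified rows in the same shape as the question's table (trimmed).
PAGE = """<table id="tableMscRatesModal"><tbody>
<tr class="ng-scope">
<td><input type="radio" name="userRole" value="1" checked="checked"></td>
<td><input type="radio" name="userRole" value="2"></td>
<td><input type="radio" name="userRole" value="3"></td>
</tr>
<tr class="ng-scope">
<td><input type="radio" name="userRole" value="1"></td>
<td><input type="radio" name="userRole" value="2" checked="checked"></td>
<td><input type="radio" name="userRole" value="3"></td>
</tr>
</tbody></table>"""

doc = html.fromstring(PAGE)
rows = doc.xpath("//tr[@class='ng-scope']")
second = rows[1]

# Without the leading dot, '//' restarts from the document root,
# so a query issued on the second row still matches the first row's inputs.
absolute = second.xpath("//td/input[@name='userRole']")
relative = second.xpath(".//td/input[@name='userRole']")
print(len(absolute), len(relative))
print([i.get("value") for i in relative if i.get("checked") == "checked"])
```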
I am trying to extract data from an HTML file using Python. Specifically, I am trying to extract the table content from the file.
Below is the HTML content of the table:
<table class="radiobutton" id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay" onclick="return false;">
<tbody>
<tr>
<td>
<input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_0" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="1" />
<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_0">Fitting</label>
</td>
</tr>
<tr>
<td>
<input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_1" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="2" />
<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_1">Material</label>
</td>
</tr>
<tr>
<td>
<input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_2" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="4" />
<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_2">Appliance</label>
</td>
</tr>
<tr>
<td>
<input checked="checked" id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_3" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="8" />
<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_3">Apparatus</label>
</td>
</tr>
<tr>
<td>
<input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_4" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="16" />
<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_4">Other procedures</label>
</td>
</tr>
<tr>
<td>
<input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_5" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="32" />
<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_5">Alternative fuel oils</label>
</td>
</tr>
<tr>
<td>
<input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_6" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="64" />
<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_6">Other compliance method:</label>
</td>
</tr>
</tbody>
</table>
Below is the Python code I use to print the properties of the input tags.
from bs4 import BeautifulSoup
from pyparsing import makeHTMLTags
with open(r'.\ABC.html', 'r') as read_file:
    data = read_file.read()
soup = BeautifulSoup(data, 'html.parser')
table = soup.find("table", attrs={"id":"ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay"})
spotterTag, spotterEndTag = makeHTMLTags("input")
for spotter in spotterTag.searchString(table):
    print(spotter.checked)
    print(spotter.id)
How can I print the label of each radio button along with its checked property?
Example: for the label tag below it should print "Fitting", and for the input tag below it should print "checked":
<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_0">Fitting</label>
<input checked="checked" id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_3" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="8"/>
The code below works, but I'd like a better solution:
from bs4 import BeautifulSoup
from pyparsing import makeHTMLTags
with open(r'.\ABC.html', 'r') as read_file:
    data = read_file.read()
soup = BeautifulSoup(data, 'html.parser')
table = soup.find("table", attrs={"id":"ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay"})
spotterTag, spotterEndTag = makeHTMLTags("input")
for spotter in spotterTag.searchString(table):
    if spotter.checked == 'checked':
        label = soup.find("label", attrs={"for":spotter.id})
        print(str(label)[str(label).find('>')+1:str(label).find('<',2)])
        print(spotter.checked)
Thanks in advance for help!
I'm not sure I understand you correctly, but do you want to pair each input with its label? If so, you can use the zip() function. For example (data is your HTML string):
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'html.parser')
print('{:^25} {:^15} {:^15}'.format('Text', 'Value', 'Checked'))
for inp, lbl in zip(soup.select('table#ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay input'),
                    soup.select('table#ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay label')):
    print('{:<25} {:^15} {:^15}'.format(lbl.text, inp['value'], 'checked' if 'checked' in inp.attrs else '-'))
Prints:
          Text                 Value          Checked
Fitting                          1               -
Material                         2               -
Appliance                        4               -
Apparatus                        8            checked
Other procedures                16               -
Alternative fuel oils           32               -
Other compliance method:        64               -
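If you only need the checked option, you can select it directly and then find its label through the label's for attribute. That way the pairing does not depend on inputs and labels appearing in the same order. This is a sketch on three of the question's rows. The input ids are shortened for readability, and checked_labels is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's radio-button table (ids shortened).
HTML = """<table class="radiobutton" id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay">
<tr><td><input id="opt_0" type="radio" value="1" /><label for="opt_0">Fitting</label></td></tr>
<tr><td><input checked="checked" id="opt_3" type="radio" value="8" /><label for="opt_3">Apparatus</label></td></tr>
<tr><td><input id="opt_4" type="radio" value="16" /><label for="opt_4">Other procedures</label></td></tr>
</table>"""


def checked_labels(html, table_id):
    """Return the label text of every checked radio button in the table."""
    soup = BeautifulSoup(html, "html.parser")
    labels = []
    for inp in soup.select(f"table#{table_id} input[type=radio][checked]"):
        # Look the label up by its for= attribute instead of relying on order.
        label = soup.find("label", attrs={"for": inp["id"]})
        if label is not None:
            labels.append(label.get_text(strip=True))
    return labels


print(checked_labels(HTML, "ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay"))
```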
I need to get the name and the price of each row in the sample HTML below. However, when I use BeautifulSoup's find_all('tr'), it returns all the tr elements of the main table and of the nested tables. What is the best way to extract only the name and the price of each row?
soup = BeautifulSoup(f, 'html.parser')
priceTable = soup.find('table', attrs={"class":"table table-hover table-responsive"})
That is what I have so far; calling find_all('tr') on priceTable returns every tr, including the ones in the nested tables.
What I need is to get each item's name and the price next to it, and finally save them to a CSV file.
<table class="table table-hover table-responsive">
<tbody><tr>
<td style="vertical-align: middle; width: 20%;" class="hidden-xs">
<img class="retailer-logo" data-placement="right" src="/images/20180813125BhYNMEK8lgOpXj3zxze53WmqeRWov7h.jpg" alt="Contact Energy" style="width:150px;" title="" data-original-title="" />
</td>
<td style="vertical-align: middle; width: 75px;" class="hidden-xs">
<img src="/images/result-arrow.png" />
</td>
<td>
<table style="width: 100%;">
<tbody><tr class="visible-xs">
<td class="text-center" colspan="2">
<img class="retailer-logo" data-placement="right" src="/images/20180813125BhYNMEK8lgOpXj3zxze53WmqeRWov7h.jpg" alt="Contact Energy" style="width:150px;" title="" data-original-title="" />
</td>
</tr>
<tr>
<td colspan="3"><h4>Contact Energy Saver Plus</h4></td>
</tr>
<tr style="text-transform: uppercase">
<td width="150px">Electricity:</td>
<td>$242.85 <a class="plan-breakdown" data-placement="right" title="" data-original-title="<table><tr><td>Anytime</td><td>$0.334</td><td>per kWh</td><tr><td>Daily</td><td>$0.333</td><td>per day</td><tr><td>EA Levy</td><td>$0.0013</td><td>per kWh</td></table>"><i class="glyphicon glyphicon-info-sign"> </i></a>
</td>
</tr>
<tr style="text-transform: uppercase">
<td>Discount:</td>
<td>$63.14 (26%)
</td>
</tr>
<tr>
<td colspan="3">
<a class="plan-detail" data-placement="right" title="" data-original-title="<ul><li>Provides fixed pricing until 31 June 2021 unless there are changes to taxes and levies.</li><li>24% Prompt Payment Discount when you pay on time. additional 1% discount for paying by direct debit (excl. credit card), and 1% discount for getting bills and correspondence by email. Up to 26% PPD available.</li><li>An early termination fee of $150 per contracted ICP if you terminate the contract before the end date (31/06/2021). Fee may be waived if you are moving house and take Contact Energy to the new property.</li><li>Not available to prepay customers.</li></ul>"><i class="glyphicon glyphicon-info-sign"> </i> What you need to know</a>
</td>
</tr>
<tr class="visible-xs">
<td colspan="2">
<h3 class="total">$179.71</h3>
<div class="incentive">
<b style="text-transform: uppercase">SPECIAL SwitchMe OFFER</b><br />
Special PPD & Fixed rates<br />
<a style="font-size: 0.9em;" class="incentive-info" title="" data-original-title="Receive a special Prompt Payment Discount and fixed rates until 31 June 2021 unless there are changes to taxes and levies">More Info</a>
</div>
</td>
</tr>
<tr class="visible-xs">
<td colspan="2">
<form id="w0" action="/switch/" method="post">
<input type="hidden" name="_csrf" value="Hi21xBvkP6NpUl0UcaFwxn4U5-94Jj8KqEeprOfuG9tMfP2gStRY6RFrBGdF6gGvT0uM3CAQaVvOPpnq1IddtQ==" /> <input type="hidden" name="query_id" value="409884" /> <input type="hidden" name="plan_group_id" value="54" /> <input type="hidden" name="plan_stage_id" value="367" /> <button type="submit" class="btn btn-block btn-switch" style="max-width: 100%; margin-top: 10px">Switch Now!</button> </form> <div class="wannatalk" style="max-width: 100%">
Want to talk?<br />
Call our friendly team on<br />
<b>0800 179 482</b>
</div>
</td>
</tr>
</tbody></table>
</td>
<td style="text-align: center" class="hidden-xs">
<h3 class="total">$179.71</h3>
<div class="incentive">
<b style="text-transform: uppercase">SPECIAL SwitchMe OFFER</b><br />
Special PPD & Fixed rates<br />
<a style="font-size: 0.9em;" class="incentive-info" title="" data-original-title="Receive a special Prompt Payment Discount and fixed rates until 31 June 2021 unless there are changes to taxes and levies">More Info</a>
</div>
</td>
<td class="hidden-xs">
<form id="w1" action="/switch/" method="post">
<input type="hidden" name="_csrf" value="Hi21xBvkP6NpUl0UcaFwxn4U5-94Jj8KqEeprOfuG9tMfP2gStRY6RFrBGdF6gGvT0uM3CAQaVvOPpnq1IddtQ==" /> <input type="hidden" name="query_id" value="409884" /> <input type="hidden" name="plan_group_id" value="54" /> <input type="hidden" name="plan_stage_id" value="367" /> <button type="submit" class="btn btn-block btn-switch">Switch Now!</button> </form> <div class="wannatalk">
Want to talk?<br />
Call our friendly team on<br />
<b>0800 179 482</b>
</div>
</td>
</tr>
<tr>
<td style="vertical-align: middle; width: 20%;" class="hidden-xs">
<img class="retailer-logo" data-placement="right" src="/images/20171013102LzWd_kdtQOk4yxxyZuCZBG6q7xIuClx.jpg" alt="Powershop" style="width:150px;" title="" data-original-title="" />
</td>
<td style="vertical-align: middle; width: 75px;" class="hidden-xs">
<img src="/images/result-arrow.png" />
</td>
<td>
<table style="width: 100%;">
<tbody><tr class="visible-xs">
<td class="text-center" colspan="2">
<img class="retailer-logo" data-placement="right" src="/images/20171013102LzWd_kdtQOk4yxxyZuCZBG6q7xIuClx.jpg" alt="Powershop" style="width:150px;" title="" data-original-title="" />
</td>
</tr>
<tr>
<td colspan="3"><h4>Powershop Saver</h4></td>
</tr>
<tr style="text-transform: uppercase">
<td width="150px">Electricity:</td>
<td>$183.40 <a class="plan-breakdown" data-placement="right" title="" data-original-title="<table><tr><td>Anytime</td><td>$0.2508</td><td>per kWh</td><tr><td>Daily</td><td>$0.30</td><td>per day</td><tr><td>EA Levy</td><td>$0.00</td><td>per kWh</td></table>"><i class="glyphicon glyphicon-info-sign"> </i></a>
</td>
</tr>
<tr style="text-transform: uppercase">
<td>Discount:</td>
<td>$0.00 (0%)
</td>
</tr>
<tr>
<td colspan="3">
<a class="plan-detail" data-placement="right" title="" data-original-title="<ul><li>The price estimate is based on forecast charges from Powershop for the next 12 months.</li><li>It assumes you purchase the Powershop Simple Saver powerpack once a month and special powerpacks that are made available from time to time.</li><li>This offer does not require a contract or a minimum supply period.</li><li>New customers will get a $150 power credit applied over their first 12 months ($25 straight away, $10 on the next 10 monthly account
review periods, and a final credit of $25 in the final account review period of
your first year as a Powershop customer).</li></ul>"><i class="glyphicon glyphicon-info-sign"> </i> What you need to know</a>
</td>
</tr>
<tr class="visible-xs">
<td colspan="2">
<h3 class="total">$183.40</h3>
<div class="incentive">
<b style="text-transform: uppercase">SPECIAL SwitchMe OFFER</b><br />
Get $150 off your bill over 12 months!<br /> <a style="font-size: 0.9em;" class="incentive-info" title="" data-original-title="<div><div>New customers will get a $150 power credit applied over their first 12 months ($25 straight away, then $10 for the next 10 monthly account
review periods, and a final credit of $25 in the final account review period of
your first year as a Powershop customer).</div><div> </div></div><div><br></div><div><br></div>">More Info</a> </div>
</td>
</tr>
<tr class="visible-xs">
<td colspan="2">
<form id="w2" action="/switch/" method="post">
<input type="hidden" name="_csrf" value="Hi21xBvkP6NpUl0UcaFwxn4U5-94Jj8KqEeprOfuG9tMfP2gStRY6RFrBGdF6gGvT0uM3CAQaVvOPpnq1IddtQ==" /> <input type="hidden" name="query_id" value="409884" /> <input type="hidden" name="plan_group_id" value="53" /> <input type="hidden" name="plan_stage_id" value="273" /> <button type="submit" class="btn btn-block btn-switch" style="max-width: 100%; margin-top: 10px">Switch Now!</button> </form><div class="wannatalk" style="max-width: 100%">
Want to talk?<br />
Call our friendly team on<br />
<b>0800 179 482</b>
</div>
</td>
</tr>
</tbody></table>
</td>
So the output should be:
from td[3] and td[4] in the first row:
Contact Energy Saver Plus
$179.71
and then the next row:
Powershop Saver
$183.40
and so on, until the last row (of the main table).
This is a similar process to the one given in the comments, but with different selectors:
from bs4 import BeautifulSoup as bs
html = '''yourhtml'''
soup = bs(html, 'lxml')
names = [item.text for item in soup.select('.table h4 ')]
prices = [item.text for item in soup.select('[colspan="2"] > .total')]
results = list(zip(names, prices))
print(results)
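To finish with the CSV file you asked for, you can feed the zipped pairs to the standard csv module. This sketch uses the same two selectors on a heavily trimmed copy of the markup. It parses with "html.parser" so that lxml is not required, and writes to an in-memory buffer so the result is easy to inspect. For a real file, you would pass open("plans.csv", "w", newline="") instead, where the filename is just an example. The plans_to_csv name is illustrative.

```python
import csv
import io

from bs4 import BeautifulSoup

# Heavily trimmed version of the question's markup: two plans, each with a
# mobile (colspan="2") total and a desktop duplicate outside the nested table.
HTML = """<table class="table table-hover table-responsive"><tbody>
<tr><td><table><tbody>
<tr><td colspan="3"><h4>Contact Energy Saver Plus</h4></td></tr>
<tr class="visible-xs"><td colspan="2"><h3 class="total">$179.71</h3></td></tr>
</tbody></table></td>
<td class="hidden-xs"><h3 class="total">$179.71</h3></td></tr>
<tr><td><table><tbody>
<tr><td colspan="3"><h4>Powershop Saver</h4></td></tr>
<tr class="visible-xs"><td colspan="2"><h3 class="total">$183.40</h3></td></tr>
</tbody></table></td>
<td class="hidden-xs"><h3 class="total">$183.40</h3></td></tr>
</tbody></table>"""


def plans_to_csv(html, out):
    """Write (name, price) pairs from the plan table to a CSV file object."""
    soup = BeautifulSoup(html, "html.parser")
    names = [h4.get_text(strip=True) for h4 in soup.select(".table h4")]
    # Only the colspan="2" totals, so the desktop duplicates are not counted twice.
    prices = [h3.get_text(strip=True) for h3 in soup.select('[colspan="2"] > .total')]
    writer = csv.writer(out)
    writer.writerow(["name", "price"])
    writer.writerows(zip(names, prices))


buffer = io.StringIO()
plans_to_csv(HTML, buffer)
print(buffer.getvalue())
```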
I actually managed to solve this using regex. I like the approach in the answer above much better, especially the use of zip(), but I thought I'd paste my solution here in case it's handy for other readers.
import re
from bs4 import BeautifulSoup

deals = []
prices = []
results = {}
with open("prices.html", "r") as f:
    soup = BeautifulSoup(f, 'html.parser')
priceTable = soup.find('table', attrs={"class":"table table-hover table-responsive"})
tbody = priceTable.find('tbody')
pplanPattern = r'<td colspan="3"><h4>([^<]+)</h4></td>'
pricePatterns = r'<h3 class="total">([^<]+)</h3>'
for rw in tbody:
    # re.search needs a string, so serialize each child (Tag or whitespace)
    plan = re.search(pplanPattern, str(rw))
    price = re.search(pricePatterns, str(rw))
    if plan:
        deals.append(plan.group(1))
    if plan and price:
        prices.append(price.group(1))
        results[plan.group(1)] = price.group(1)
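The regex step can also be avoided altogether. Passing recursive=False to find_all('tr') limits the search to the outer table's direct rows, which was the original problem. Each row can then be searched with ordinary tag lookups. This is a sketch on a trimmed copy of the markup; the plan_prices name is illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup: two top-level rows with nested tables.
HTML = """<table class="table table-hover table-responsive"><tbody>
<tr><td><table><tbody>
<tr><td colspan="3"><h4>Contact Energy Saver Plus</h4></td></tr>
<tr class="visible-xs"><td colspan="2"><h3 class="total">$179.71</h3></td></tr>
</tbody></table></td>
<td class="hidden-xs"><h3 class="total">$179.71</h3></td></tr>
<tr><td><table><tbody>
<tr><td colspan="3"><h4>Powershop Saver</h4></td></tr>
<tr class="visible-xs"><td colspan="2"><h3 class="total">$183.40</h3></td></tr>
</tbody></table></td>
<td class="hidden-xs"><h3 class="total">$183.40</h3></td></tr>
</tbody></table>"""


def plan_prices(html):
    """Map each plan name to its price, one entry per top-level row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="table-hover")
    results = {}
    # recursive=False keeps only the outer table's own rows, not nested ones.
    for row in table.tbody.find_all("tr", recursive=False):
        name = row.find("h4")
        price = row.find("h3", class_="total")
        if name and price:
            results[name.get_text(strip=True)] = price.get_text(strip=True)
    return results


print(plan_prices(HTML))
```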
I am writing a simple survey with a ModelForm. I have searched online for this issue, and the usual explanation is that the form is unbound... but for simplicity I only offered one choice field in models.py.
Edit: form.is_bound is True, so the cause must be something else.
form.errors shows: <property object at 0x03A146C0>
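In Python, a repr like `<property object at 0x...>` usually means the attribute was read from the class rather than from an instance. Django's `BaseForm.errors` is a property, so printing `officeForm.errors` (the class) instead of `form_class.errors` (the instance) would produce exactly that; which one was printed here is not shown, so this is an assumption. The plain-Python stand-in below (no Django; `SurveyForm` and its validation rule are hypothetical) shows the difference. It also mirrors one plausible cause worth checking: the ModelForm field is named `space`, while the hardcoded radio inputs post keys such as `R1B1`.

```python
class SurveyForm:
    """Stand-in for a Django form: `errors` is exposed as a property."""

    def __init__(self, data=None):
        self.data = data or {}

    @property
    def errors(self):
        # Hypothetical rule: the form only knows about a field called "space".
        if "space" in self.data:
            return {}
        return {"space": ["This field is required."]}


print(SurveyForm.errors)                     # the property object itself, not the errors
print(SurveyForm({"R1B1": "5"}).errors)      # {'space': ['This field is required.']}
print(SurveyForm({"space": "R1B1"}).errors)  # {}
```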
models.py
These choices are hardcoded as radio inputs in the HTML.
from django.db import models

class Office(models.Model):
    Office_Space = (
        ('R1B1', 'R1B1'),
        ('R2B1', 'R2B1'),
        ('R3B1', 'R3B1'),
        ('R1B2', 'R1B2'),
        ('R2B2', 'R2B2'),
        ('R3B2', 'R3B2'),
        ('R1B3', 'R1B3'),
        ('R2B3', 'R2B3'),
        ('R3B3', 'R3B3')
    )
    space = models.CharField(max_length=4, choices=Office_Space)
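Because the nine codes follow a regular pattern (R for the issue number, B for the bid number), the tuple could be generated instead of typed out. A minimal plain-Python sketch; the name `OFFICE_SPACE` is illustrative, and the result is the same tuple of (value, label) pairs, in the same order, that `Office_Space` passes to `choices=`:

```python
# Bids in the outer loop, issues in the inner loop, matching Office_Space's order.
OFFICE_SPACE = tuple(
    (f"R{issue}B{bid}", f"R{issue}B{bid}")
    for bid in range(1, 4)
    for issue in range(1, 4)
)

print(OFFICE_SPACE[:3])  # (('R1B1', 'R1B1'), ('R2B1', 'R2B1'), ('R3B1', 'R3B1'))
```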
form.py
from django import forms

from .models import Office

class officeForm(forms.ModelForm):
    class Meta:
        model = Office
        fields = ['space',]
Views.py
from django.db import connection
from django.http import JsonResponse
from django.shortcuts import render

from .form import officeForm

def get_SenarioChoice(request):
    form_class = officeForm(request.POST or None)
    if request.method == 'POST':
        if form_class.is_valid():
            space = request.POST.get('result')
            response_data = {}
            print(space + " is valid")  # here the RxCx value is printed for debugging
            response_data['space'] = space
            form_class.save()
            print(connection.queries)  # the SQL log (only populated when DEBUG=True)
            return JsonResponse(response_data)
    return render(request, 'Front.html', {'officeform': form_class})
Added: the template. I am very new to web development, so when I wrote this form I did not know that Django could render it by itself; therefore I hardcoded everything.
The survey consists of 3 bids; each bid has 3 issues, and each issue has 3 options. (I could potentially have separated them, but I didn't know how, so I coded them in one choice field named by the issue ID ("R#") plus the bid ID ("B#").)
i.e.: R1B1 = issue 1, bid 1
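Since each radio group's name encodes both numbers, the submitted values can be decoded back into a per-bid structure. A minimal sketch, assuming the POST data arrives as a plain mapping of radio names to values (a dict stands in for `request.POST` here; `group_by_bid` and the sample data are hypothetical):

```python
import re
from collections import defaultdict

# Radio-group names follow R<issue>B<bid>, e.g. R1B1 = issue 1, bid 1.
CHOICE_RE = re.compile(r"R(\d+)B(\d+)")


def group_by_bid(post_data):
    """Turn {'R1B1': '5', ...} into {bid: {issue: int(value)}}.

    Keys that do not follow the R#B# pattern (e.g. a CSRF token) are ignored.
    """
    bids = defaultdict(dict)
    for key, value in post_data.items():
        match = CHOICE_RE.fullmatch(key)
        if match:
            issue, bid = int(match.group(1)), int(match.group(2))
            bids[bid][issue] = int(value)
    return dict(bids)


# Hypothetical submission: one option picked per issue for bid 1.
sample = {"csrfmiddlewaretoken": "abc", "R1B1": "60", "R2B1": "80", "R3B1": "0"}
print(group_by_bid(sample))  # {1: {1: 60, 2: 80, 3: 0}}
```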
<tr>
<th>Bigger office</th>
</tr>
<tr>
<td>Bigger cubicle</td>
<td>5</td>
<td><input type="radio" name="R1B1" value="5" required><br></td>
<td> </td>
<td><input type="radio" name="R1B2" value="5" required><br></td>
<td> </td>
<td><input type="radio" name="R1B3" value="5" required><br></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td>Shared office</td>
<td>60</td>
<td><input type="radio" name="R1B1" value="60"><br></td>
<td id =R1C1></td>
<td><input type="radio" name="R1B2" value="60"><br></td>
<td id =R1C2></td>
<td><input type="radio" name="R1B3" value="60"><br></td>
<td id = R1C3></td>
<td id =R1C1C></td>
<td id =R1C2C></td>
<td id = R1C3C></td>
</tr>
<tr>
<td>No change</td>
<td>30</td>
<td><input type="radio" name="R1B1" value="30" required><br></td>
<td> </td>
<td><input type="radio" name="R1B2" value="30" required><br></td>
<td> </td>
<td><input type="radio" name="R1B3" value="30" required><br></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<th>New and challenging individual assignments</th>
</tr>
<tr>
<td>Some teamwork, some individual work</td>
<td>80</td>
<td><input type="radio" name="R2B1" value="80" required><br></td>
<td> </td>
<td><input type="radio" name="R2B2" value="80" required><br></td>
<td> </td>
<td><input type="radio" name="R2B3" value="80" required><br></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td>No (i.e., no change to current situation)</td>
<td>10</td>
<td><input type="radio" name="R2B1" value="10"><br></td>
<td id =R2C1></td>
<td><input type="radio" name="R2B2" value="10"><br></td>
<td id =R2C2></td>
<td><input type="radio" name="R2B3" value="10"><br></td>
<td id =R2C3></td>
<td id =R2C1C></td>
<td id =R2C2C></td>
<td id =R2C3C></td>
</tr>
<tr>
<td>Mostly Group Work</td>
<td>40</td>
<td><input type="radio" name="R2B1" value="40" required><br></td>
<td> </td>
<td><input type="radio" name="R2B2" value="40" required><br></td>
<td> </td>
<td><input type="radio" name="R2B3" value="40" required><br></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<th>Working hours</th>
</tr>
<tr>
<td>Yes, flextime and others</td>
<td>50</td>
<td><input type="radio" name="R3B1" value="50" required><br></td>
<td> </td>
<td><input type="radio" name="R3B2" value="50" required><br></td>
<td> </td>
<td><input type="radio" name="R3B3" value="50" required><br></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td>No change</td>
<td>0</td>
<td><input type="radio" name="R3B1" value="0"><br></td>
<td id =R3C1></td>
<td><input type="radio" name="R3B2" value="0"><br></td>
<td id =R3C2></td>
<td><input type="radio" name="R3B3" value="0"><br></td>
<td id =R3C3></td>
<td id =R3C1C></td>
<td id =R3C2C></td>
<td id =R3C3C></td>
</tr>
<tr>
<tr>
<td>Work more</td>
<td>10</td>
<td><input type="radio" name="R3B1" value="10" required><br></td>
<td> </td>
<td><input type="radio" name="R3B2" value="10" required><br></td>
<td> </td>
<td><input type="radio" name="R3B3" value="10" required><br></td>
<td> </td>
Thanks in advance.
I'm new to Python and the BeautifulSoup library. I have tried many things, but with no luck.
My HTML code looks like this:
<form method = "post" id="FORM1" name="FORM1">
<table cellpadding=0 cellspacing=1 border=0 align="center" bgcolor="#cccccc">
<tr>
<td class="producto"><b>Club</b><br>
<input value="CLUB TENIS DE MESA PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtClub" size="60" maxlength="55">
</td>
<tr>
<td colspan="2" class="producto"><b>Nombre Equipo</b><br>
<input value="C.T.M. PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtNomEqu" size="100" maxlength="80">
</td>
</tr>
<tr>
<td class="producto"><b>Telefono fijo</b><br>
<input value="63097005534" disabled class="txtmascaraform" type="TEXT" name="txtTelf" size="15" maxlength="10">
</td>
and I need to take JUST the text within <b></b> and its input's value.
Many thanks!!
First find() your form by id, then find_all() the inputs inside it and get the value of each one's value attribute:
from bs4 import BeautifulSoup
data = """<form method = "post" id="FORM1" name="FORM1">
<table cellpadding=0 cellspacing=1 border=0 align="center" bgcolor="#cccccc">
<tr>
<td class="producto"><b>Club</b><br>
<input value="CLUB TENIS DE MESA PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtClub" size="60" maxlength="55">
</td>
<tr>
<td colspan="2" class="producto"><b>Nombre Equipo</b><br>
<input value="C.T.M. PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtNomEqu" size="100" maxlength="80">
</td>
</tr>
<tr>
<td class="producto"><b>Telefono fijo</b><br>
<input value="63097005534" disabled class="txtmascaraform" type="TEXT" name="txtTelf" size="15" maxlength="10">
</td>
</tr>
</table>
</form>"""
soup = BeautifulSoup(data, "html.parser")
form = soup.find("form", {'id': "FORM1"})
print([item.get('value') for item in form.find_all('input')])
# UPDATE for getting table cell values
table = form.find("table")
print([item.text.strip() for item in table.find_all('td')])
prints:
['CLUB TENIS DE MESA PORTOBAIL', 'C.T.M. PORTOBAIL', '63097005534']
['Club', 'Nombre Equipo', 'Telefono fijo']
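The two lists above come out separately; since the question asks for each `<b>` label together with its input value, walking the cells one by one keeps each pair together. A minimal sketch on a trimmed copy of the question's markup (assuming, as in the question, that each label and its input share the same `td class="producto"`):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's form (same markup pattern).
data = """<form method="post" id="FORM1" name="FORM1">
<table>
<tr><td class="producto"><b>Club</b><br>
<input value="CLUB TENIS DE MESA PORTOBAIL" disabled type="TEXT" name="txtClub"></td></tr>
<tr><td class="producto"><b>Telefono fijo</b><br>
<input value="63097005534" disabled type="TEXT" name="txtTelf"></td></tr>
</table>
</form>"""

soup = BeautifulSoup(data, "html.parser")
form = soup.find("form", id="FORM1")

fields = {}
for cell in form.find_all("td", class_="producto"):
    label = cell.find("b")
    field = cell.find("input")
    if label and field:  # only cells that contain both pieces
        fields[label.get_text(strip=True)] = field.get("value")

print(fields)
# {'Club': 'CLUB TENIS DE MESA PORTOBAIL', 'Telefono fijo': '63097005534'}
```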