I am trying to extract data from an HTML file using Python, specifically the content of a table.
Below is the HTML content of the table:
<table class="radiobutton" id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay" onclick="return false;">
<tbody>
<tr>
<td>
<input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_0" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="1" />
<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_0">Fitting</label>
</td>
</tr>
<tr>
<td>
<input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_1" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="2" />
<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_1">Material</label>
</td>
</tr>
<tr>
<td>
<input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_2" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="4" />
<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_2">Appliance</label>
</td>
</tr>
<tr>
<td>
<input checked="checked" id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_3" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="8" />
<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_3">Apparatus</label>
</td>
</tr>
<tr>
<td>
<input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_4" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="16" />
<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_4">Other procedures</label>
</td>
</tr>
<tr>
<td>
<input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_5" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="32" />
<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_5">Alternative fuel oils</label>
</td>
</tr>
<tr>
<td>
<input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_6" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="64" />
<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_6">Other compliance method:</label>
</td>
</tr>
</tbody>
</table>
Below is the Python code that prints the properties of the input tags:
from bs4 import BeautifulSoup
from pyparsing import makeHTMLTags

# raw string, so the backslash in the Windows path is not read as an escape sequence
with open(r'.\ABC.html', 'r') as read_file:
    data = read_file.read()
soup = BeautifulSoup(data, 'html.parser')
table = soup.find("table", attrs={"id": "ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay"})
spotterTag, spotterEndTag = makeHTMLTags("input")
for spotter in spotterTag.searchString(table):
    print(spotter.checked)
    print(spotter.id)
How can I print the label of each radio button along with its checked property?
Example: for the label tag below it should print Fitting, and for the input tag below it should print checked:
<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_0">Fitting</label>
<input checked="checked" id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_3" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="8"/>
The code below works, but I'd like a better solution:
from bs4 import BeautifulSoup
from pyparsing import makeHTMLTags

with open(r'.\ABC.html', 'r') as read_file:
    data = read_file.read()
soup = BeautifulSoup(data, 'html.parser')
table = soup.find("table", attrs={"id": "ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay"})
spotterTag, spotterEndTag = makeHTMLTags("input")
for spotter in spotterTag.searchString(table):
    if spotter.checked == 'checked':
        # find the <label> whose "for" attribute points at this input
        label = soup.find("label", attrs={"for": spotter.id})
        print(str(label)[str(label).find('>')+1:str(label).find('<',2)])
        print(spotter.checked)
Thanks in advance for help!
I'm not sure I understand you correctly, but do you want to zip the inputs and labels together? If so, you can use the zip() function. For example (data is your HTML string):
from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')
print('{:^25} {:^15} {:^15}'.format('Text', 'Value', 'Checked'))
for inp, lbl in zip(soup.select('table#ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay input'),
                    soup.select('table#ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay label')):
    print('{:<25} {:^15} {:^15}'.format(lbl.text, inp['value'], 'checked' if 'checked' in inp.attrs else '-'))
Prints:
          Text                 Value          Checked
Fitting                          1               -
Material                         2               -
Appliance                        4               -
Apparatus                        8            checked
Other procedures                16               -
Alternative fuel oils           32               -
Other compliance method:        64               -
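The zip() approach assumes the inputs and labels appear in exactly the same order. A slightly more robust sketch, assuming (as in the posted markup) that each label's for attribute matches its input's id, looks every label up explicitly. The html string is a trimmed copy of three rows from the question's table, not the full page.

```python
from bs4 import BeautifulSoup

# Trimmed copy of three rows from the question's radio-button table
html = """<table class="radiobutton" id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay">
<tbody>
<tr><td>
<input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_0" type="radio" value="1" />
<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_0">Fitting</label>
</td></tr>
<tr><td>
<input checked="checked" id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_3" type="radio" value="8" />
<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_3">Apparatus</label>
</td></tr>
<tr><td>
<input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_4" type="radio" value="16" />
<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_4">Other procedures</label>
</td></tr>
</tbody>
</table>"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay")

rows = []
for inp in table.find_all("input", attrs={"type": "radio"}):
    # Pair each input with its label via the label's "for" attribute, not by position
    label = table.find("label", attrs={"for": inp["id"]})
    text = label.get_text(strip=True) if label is not None else ""
    rows.append((text, inp["value"], inp.has_attr("checked")))

for text, value, is_checked in rows:
    mark = "checked" if is_checked else "-"
    print(f"{text:<25} {value:^15} {mark:^15}")

# Only the selected option(s)
selected = [text for text, _, is_checked in rows if is_checked]
print(selected)  # ['Apparatus']
```

Because the lookup goes through the id, it keeps working even if the page adds extra inputs or labels elsewhere in the table.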
I'm scraping a table from a webpage using BeautifulSoup, but for some reason it only scrapes half of the table: the half I'm getting is the part that doesn't contain the input fields. Here is the HTML:
<table class="commonTable1" cellpadding="0" cellspacing="0" border="0" width="100%" id="portAllocTable">
<tbody>
<tr>
<th class="commonTableHeaderLastCell" colspan="2"><span class="commonBold"> Portfolio Allocation (%) </span></th>
</tr>
<tr>
<td colspan="2" class="commonHeaderContentSeparator"><img src="/fees-web/common/images/spacer.gif" height="1" style="display: block"></td>
</tr>
<tr>
<td>
<span>AdvisorGuided (Capital Portfolio)</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<!-- When collection method is invoice, the portfolio to charge table should be diabled.
Else work as it was-->
<input type="hidden" name="portfolioChargeList[0].feeCollectionRate" value="100" id="selText_1"><input type="text" name="portfolioChargeList[0].feeCollectionRateINPUT" maxlength="3" onkeypress="return disableMinus();" onblur="updateTotal(1);" value="100" maxvalue="100" decimals="0" showalertdialog="true" blankifzero="true" id="selText_1INPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
<tr>
<td>
<span>AdvisorGuided 2 (Capital Portfolio)</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<!-- When collection method is invoice, the portfolio to charge table should be diabled.
Else work as it was-->
<input type="hidden" name="portfolioChargeList[1].feeCollectionRate" value="0" id="selText_1"><input type="text" name="portfolioChargeList[1].feeCollectionRateINPUT" maxlength="3" onkeypress="return disableMinus();" onblur="updateTotal(1);" value="0" maxvalue="100" decimals="0" showalertdialog="true" blankifzero="true" id="selText_1INPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
<tr>
<td>
<span>Client Directed (Capital Portfolio)</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<!-- When collection method is invoice, the portfolio to charge table should be diabled.
Else work as it was-->
<input type="hidden" name="portfolioChargeList[2].feeCollectionRate" value="0" id="selText_1"><input type="text" name="portfolioChargeList[2].feeCollectionRateINPUT" maxlength="3" onkeypress="return disableMinus();" onblur="updateTotal(1);" value="0" maxvalue="100" decimals="0" showalertdialog="true" blankifzero="true" id="selText_1INPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
<tr>
<td>
<span>Holding MMKT (Capital Portfolio)</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<!-- When collection method is invoice, the portfolio to charge table should be diabled.
Else work as it was-->
<input type="hidden" name="portfolioChargeList[3].feeCollectionRate" value="0" id="selText_1"><input type="text" name="portfolioChargeList[3].feeCollectionRateINPUT" maxlength="3" onkeypress="return disableMinus();" onblur="updateTotal(1);" value="0" maxvalue="100" decimals="0" showalertdialog="true" blankifzero="true" id="selText_1INPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
<tr>
<td>
<span>Total</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<input type="hidden" name="portfolioChargeList[4].feeCollectionRate" value="100" id="selText_1Total"><input type="text" name="portfolioChargeList[4].feeCollectionRateINPUT" maxlength="3" value="100" maxvalue="100" decimals="0" blankifzero="true" id="selText_1TotalINPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
</tbody>
</table>
Here is my code:
from bs4 import BeautifulSoup

url = driver.page_source  # HTML of the page currently loaded in the Selenium driver
soup = BeautifulSoup(url, "lxml")
table = soup.find('table', id="portAllocTable")
rows = table.findAll('td')
list_of_rows = []
for row in table.findAll('tr'):
    list_of_cells = []
    for cell in row.findAll(["th","td"]):
        text = cell.text
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)
for item in list_of_rows:
    print(' '.join(item))
What am I doing wrong? Why is it only printing the left side of the table? Any recommendations about what to change would be much appreciated.
Results:
Portfolio Allocation (%)
AdvisorGuided (Capital Portfolio)
100 100
AdvisorGuided 2 (Capital Portfolio)
0 100
Client Directed (Capital Portfolio)
0 100
Holding MMKT (Capital Portfolio)
0 100
Total
100 100
You'll have to go further into the child and sibling nodes and pull out the attributes (those values are stored in the inputs' attributes, not in the text content).
import pandas as pd
import bs4
html = '''<table class="commonTable1" cellpadding="0" cellspacing="0" border="0" width="100%" id="portAllocTable">
<tbody>
<tr>
<th class="commonTableHeaderLastCell" colspan="2"><span class="commonBold"> Portfolio Allocation (%) </span></th>
</tr>
<tr>
<td colspan="2" class="commonHeaderContentSeparator"><img src="/fees-web/common/images/spacer.gif" height="1" style="display: block"></td>
</tr>
<tr>
<td>
<span>AdvisorGuided (Capital Portfolio)</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<!-- When collection method is invoice, the portfolio to charge table should be diabled.
Else work as it was-->
<input type="hidden" name="portfolioChargeList[0].feeCollectionRate" value="100" id="selText_1"><input type="text" name="portfolioChargeList[0].feeCollectionRateINPUT" maxlength="3" onkeypress="return disableMinus();" onblur="updateTotal(1);" value="100" maxvalue="100" decimals="0" showalertdialog="true" blankifzero="true" id="selText_1INPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
<tr>
<td>
<span>AdvisorGuided 2 (Capital Portfolio)</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<!-- When collection method is invoice, the portfolio to charge table should be diabled.
Else work as it was-->
<input type="hidden" name="portfolioChargeList[1].feeCollectionRate" value="0" id="selText_1"><input type="text" name="portfolioChargeList[1].feeCollectionRateINPUT" maxlength="3" onkeypress="return disableMinus();" onblur="updateTotal(1);" value="0" maxvalue="100" decimals="0" showalertdialog="true" blankifzero="true" id="selText_1INPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
<tr>
<td>
<span>Client Directed (Capital Portfolio)</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<!-- When collection method is invoice, the portfolio to charge table should be diabled.
Else work as it was-->
<input type="hidden" name="portfolioChargeList[2].feeCollectionRate" value="0" id="selText_1"><input type="text" name="portfolioChargeList[2].feeCollectionRateINPUT" maxlength="3" onkeypress="return disableMinus();" onblur="updateTotal(1);" value="0" maxvalue="100" decimals="0" showalertdialog="true" blankifzero="true" id="selText_1INPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
<tr>
<td>
<span>Holding MMKT (Capital Portfolio)</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<!-- When collection method is invoice, the portfolio to charge table should be diabled.
Else work as it was-->
<input type="hidden" name="portfolioChargeList[3].feeCollectionRate" value="0" id="selText_1"><input type="text" name="portfolioChargeList[3].feeCollectionRateINPUT" maxlength="3" onkeypress="return disableMinus();" onblur="updateTotal(1);" value="0" maxvalue="100" decimals="0" showalertdialog="true" blankifzero="true" id="selText_1INPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
<tr>
<td>
<span>Total</span>
</td>
<td class="commonTableBodyLastCell" align="right">
<span>
<input type="hidden" name="portfolioChargeList[4].feeCollectionRate" value="100" id="selText_1Total"><input type="text" name="portfolioChargeList[4].feeCollectionRateINPUT" maxlength="3" value="100" maxvalue="100" decimals="0" blankifzero="true" id="selText_1TotalINPUT" style="text-align:right;width:50px" class="commonTextBoxAmount">
</span>
</td>
</tr>
</tbody>
</table>'''
soup = bs4.BeautifulSoup(html, "lxml")
table = soup.find('table', id="portAllocTable")
rows = table.findAll('td')
list_of_rows = []
for row in table.findAll('tr'):
    list_of_cells = []
    for cell in row.find_all(["th","td"]):
        text = cell.text
        try:
            # the hidden input holds the value; its next sibling is the visible input
            val = cell.find('input')['value']
            max_val = cell.find('input').next_sibling['maxvalue']
            list_of_cells.append(val)
            list_of_cells.append(max_val)
        except (TypeError, KeyError):
            # this cell has no <input> (find() returned None), so only its text is kept
            pass
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)
for item in list_of_rows:
    print(' '.join(item))
To put the results into a table (a pandas DataFrame), you could do something like this. It will need a bit of cleanup, but it should get you going:
records = []
for row in table.findAll('tr'):
    for cell in row.find_all(["th","td"]):
        text = cell.text
        try:
            val = cell.find('input')['value']
            max_val = cell.find('input').next_sibling['maxvalue']
        except (TypeError, KeyError):
            # no <input> in this cell, so leave the value columns empty
            val = ''
            max_val = ''
        records.append([text, val, max_val])
# DataFrame.append() was removed in pandas 2.0, so collect the rows first and build the frame once
results = pd.DataFrame(records, columns=['text', 'value', 'maxvalue'])
A few things come to mind.
First, the line rows = table.findAll('td') is mislabelled: the tr HTML tag designates rows and the td tag designates cells, so it should be rows = table.findAll('tr'), and the cells of each row are then found with row.findAll('td'). But you're not even using the rows variable, so the point is moot. If you want, you could do something like this:
soup = BeautifulSoup(url, "lxml")
table = soup.find('table', id="portAllocTable")
rows = table.findAll("tr")
list_of_rows = []
for row in rows:
    list_of_cells = []
    for cell in row.findAll(['th', 'td']):
        text = cell.text
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)
for item in list_of_rows:
    print(' '.join(item))
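Second, this code doesn't read the input fields at all: an input element has no text content, and the numbers you see in the browser live in its value attribute, which cell.text ignores. That is probably why you only see the text on the left side.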
Finally, you could try a different parser, such as html5lib.
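Putting both answers' points together, a minimal sketch reads the label from the first cell of each row and the value attribute of the visible text input from the second. It assumes, as in the posted markup, that every data row has exactly two td cells, so the header row (th only) and the one-cell spacer row are skipped. The html string is a trimmed copy of the question's table.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's table: header, spacer and three data rows
html = """<table id="portAllocTable"><tbody>
<tr><th colspan="2"><span> Portfolio Allocation (%) </span></th></tr>
<tr><td colspan="2"><img src="/fees-web/common/images/spacer.gif" height="1"></td></tr>
<tr><td><span>AdvisorGuided (Capital Portfolio)</span></td>
<td><span><input type="hidden" name="portfolioChargeList[0].feeCollectionRate" value="100" id="selText_1"><input type="text" name="portfolioChargeList[0].feeCollectionRateINPUT" value="100" maxvalue="100" id="selText_1INPUT"></span></td></tr>
<tr><td><span>AdvisorGuided 2 (Capital Portfolio)</span></td>
<td><span><input type="hidden" name="portfolioChargeList[1].feeCollectionRate" value="0" id="selText_1"><input type="text" name="portfolioChargeList[1].feeCollectionRateINPUT" value="0" maxvalue="100" id="selText_1INPUT"></span></td></tr>
<tr><td><span>Total</span></td>
<td><span><input type="hidden" name="portfolioChargeList[4].feeCollectionRate" value="100" id="selText_1Total"><input type="text" name="portfolioChargeList[4].feeCollectionRateINPUT" value="100" maxvalue="100" id="selText_1TotalINPUT"></span></td></tr>
</tbody></table>"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="portAllocTable")

allocations = {}
for row in table.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) != 2:
        continue  # header row (th only) or the single-cell spacer row
    name = cells[0].get_text(strip=True)
    # the number shown on screen is the value attribute of the visible text input
    visible = cells[1].find("input", attrs={"type": "text"})
    if visible is not None:
        allocations[name] = visible.get("value")

for name, value in allocations.items():
    print(name, value)
```

Reading the visible input rather than the hidden one is a choice: both carry the same value in the posted page, but the visible one is what the user actually edits.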
I am writing a simple survey with a ModelForm. I have searched online for this issue, and the usual explanation is that the form is unbound... but for simplicity I only offered one choice field in models.py.
Edit: form.is_bound is True... so it's caused by something else.
form.errors shows <property object at 0x03A146C0>
models.py
These choices are hard-coded as radio inputs in the HTML.
from django.db import models


class Office(models.Model):
    Office_Space = (
        ('R1B1', 'R1B1'),
        ('R2B1', 'R2B1'),
        ('R3B1', 'R3B1'),
        ('R1B2', 'R1B2'),
        ('R2B2', 'R2B2'),
        ('R3B2', 'R3B2'),
        ('R1B3', 'R1B3'),
        ('R2B3', 'R2B3'),
        ('R3B3', 'R3B3')
    )
    space = models.CharField(max_length=4, choices=Office_Space)
form.py
from django import forms

from .models import Office


class officeForm(forms.ModelForm):
    class Meta:
        model = Office
        fields = ['space',]
Views.py
from django.db import connection
from django.http import JsonResponse
from django.shortcuts import render

from .form import officeForm


def get_SenarioChoice(request):
    form_class = officeForm(request.POST or None)
    if request.method == 'POST':
        if form_class.is_valid():
            space = request.POST.get('result')
            response_data = {}
            print(space + "is valid")  # here is the RxCx printed for debugging
            response_data['space'] = space
            form_class.save()
            print(connection.queries)  # the SQL log
            return JsonResponse(response_data)
    return render(request, 'Front.html', {'officeform': form_class})
Added: the template. I am very new to web development, so when I wrote this form I didn't know Django could render it by itself; therefore I hard-coded everything.
The survey consists of 3 bids, each bid has 3 issues, and each issue has 3 options. (I could have separated them, but I didn't know how, so I coded them all in one choice field, named by the issue ID ("R#") plus the bid ID ("B#").)
e.g. R1B1 = issue 1, bid 1
<tr>
<th>Bigger office</th>
</tr>
<tr>
<td>Bigger cubicle</td>
<td>5</td>
<td><input type="radio" name="R1B1" value="5" required><br></td>
<td> </td>
<td><input type="radio" name="R1B2" value="5" required><br></td>
<td> </td>
<td><input type="radio" name="R1B3" value="5" required><br></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td>Shared office</td>
<td>60</td>
<td><input type="radio" name="R1B1" value="60"><br></td>
<td id =R1C1></td>
<td><input type="radio" name="R1B2" value="60"><br></td>
<td id =R1C2></td>
<td><input type="radio" name="R1B3" value="60"><br></td>
<td id = R1C3></td>
<td id =R1C1C></td>
<td id =R1C2C></td>
<td id = R1C3C></td>
</tr>
<tr>
<td>No change</td>
<td>30</td>
<td><input type="radio" name="R1B1" value="30" required><br></td>
<td> </td>
<td><input type="radio" name="R1B2" value="30" required><br></td>
<td> </td>
<td><input type="radio" name="R1B3" value="30" required><br></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<th>New and challenging individual assignments</th>
</tr>
<tr>
<td>Some teamwork, some individual work</td>
<td>80</td>
<td><input type="radio" name="R2B1" value="80" required><br></td>
<td> </td>
<td><input type="radio" name="R2B2" value="80" required><br></td>
<td> </td>
<td><input type="radio" name="R2B3" value="80" required><br></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td>No (i.e., no change to current situation)</td>
<td>10</td>
<td><input type="radio" name="R2B1" value="10"><br></td>
<td id =R2C1></td>
<td><input type="radio" name="R2B2" value="10"><br></td>
<td id =R2C2></td>
<td><input type="radio" name="R2B3" value="10"><br></td>
<td id =R2C3></td>
<td id =R2C1C></td>
<td id =R2C2C></td>
<td id =R2C3C></td>
</tr>
<tr>
<td>Mostly Group Work</td>
<td>40</td>
<td><input type="radio" name="R2B1" value="40" required><br></td>
<td> </td>
<td><input type="radio" name="R2B2" value="40" required><br></td>
<td> </td>
<td><input type="radio" name="R2B3" value="40" required><br></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<th>Working hours</th>
</tr>
<tr>
<td>Yes, flextime and others</td>
<td>50</td>
<td><input type="radio" name="R3B1" value="50" required><br></td>
<td> </td>
<td><input type="radio" name="R3B2" value="50" required><br></td>
<td> </td>
<td><input type="radio" name="R3B3" value="50" required><br></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td>No change</td>
<td>0</td>
<td><input type="radio" name="R3B1" value="0"><br></td>
<td id =R3C1></td>
<td><input type="radio" name="R3B2" value="0"><br></td>
<td id =R3C2></td>
<td><input type="radio" name="R3B3" value="0"><br></td>
<td id =R3C3></td>
<td id =R3C1C></td>
<td id =R3C2C></td>
<td id =R3C3C></td>
</tr>
<tr>
<td>Work more</td>
<td>10</td>
<td><input type="radio" name="R3B1" value="10" required><br></td>
<td> </td>
<td><input type="radio" name="R3B2" value="10" required><br></td>
<td> </td>
<td><input type="radio" name="R3B3" value="10" required><br></td>
<td> </td>
Thanks in advance.
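One symptom in this question can be explained directly: errors is defined as a property on Django's BaseForm, and <property object at ...> is exactly what Python returns when a property is read from the class (for example officeForm.errors) instead of from a form instance. The plain-Python sketch below reproduces that with a hypothetical DemoForm class standing in for a Django form. It also illustrates an assumption worth checking: the template's radio buttons are named R1B1, R1B2, ..., so the POST data may contain no space key at all, in which case a required space field is reported as missing. Printing form_class.errors (on the instance) in the view when is_valid() returns False should show the real reason.

```python
class DemoForm:
    """Hypothetical stand-in for a Django form; Django's BaseForm.errors is also a property."""

    def __init__(self, data):
        self.data = data

    @property
    def errors(self):
        # Report every required field that is missing from the submitted data
        return {name: ["This field is required."]
                for name in ("space",) if name not in self.data}


# Read from the class: Python hands back the property object itself
print(DemoForm.errors)          # <property object at 0x...>

# Read from an instance: the property runs and returns the error dict
form = DemoForm({"R1B1": "5"})  # radio buttons named R1B1, ..., but no "space" key
print(form.errors)              # {'space': ['This field is required.']}
```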
I'm getting the following error in Python when I try to do some scraping:
Traceback (most recent call last):
  File "", line 26, in
    signin2.fields["ctl06$txtParam_1"].value = '139210'
  File "C:\Users\Alvaro Pabon\Anaconda3\lib\site-packages\werkzeug\datastructures.py", line 781, in __getitem__
    raise exceptions.BadRequestKeyError(key)
BadRequestKeyError: 400 Bad Request: The browser (or proxy) sent a request that this server could not understand.
Here are the HTML and the Python code. What am I doing wrong?
HTML:
<form method="post" action="Default.aspx?IdControl=SolicitarReporteUC&TipoProceso=G" id="Form1">
<div class="aspNetHidden">
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKLTE2MjczMjc4MQ9kFgICAw9kFgICBQ9kFgJmD2QWDgIBDxAPFgYeDkRhdGFWYWx1ZUZpZWxkBQpDb2RSZXBvcnRlHg1EYXRhVGV4dEZpZWxkBQdSZXBvcnRlHgtfIURhdGFCb3VuZGdkEBUBI0NlcnRpZmljYWRvIGRlIGhpc3RvcmlhIGxhYm9yYWwgRlBNFQEFMTAwOTUUKwMBZxYBZmQCAw9kFgJmD2QWAgIBD2QWAgIBDw9kFgIeB29uY2xpY2sFdmphdmFzY3JpcHQ6cmV0dXJuIEJ1c2NhckNvblBvc3RCYWNrKCdFbXBsZWFkb19WSVBQJywnQ29kRW1wbGVhZG8nLCdFbXBsZWFkbycsJycsJ2N0bDA2X3R4dFBhcmFtXzEnLCdjdGwwNl90eHREZXNjXzEnKTtkAgcPDxYCHgRUZXh0ZWRkAgkPEA8WAh4HVmlzaWJsZWdkEBUBA1BERhUBA1BERhQrAwFnZGQCCw8PFgIeB0VuYWJsZWRnZGQCDQ8PFgIfBGVkZAIRDzwrAAsBAA8WCB4IRGF0YUtleXMWAB4LXyFJdGVtQ291bnQCAR4JUGFnZUNvdW50AgEeFV8hRGF0YVNvdXJjZUl0ZW1Db3VudAIBZBYCZg9kFgICAg9kFgxmD2QWAgIDDw8WAh4LTmF2aWdhdGVVcmwFOkRlZmF1bHQuYXNweD9JZENvbnRyb2w9UGV0aWNpb25lc1ZlclVDJkNvZFBldGljaW9uPTk4NDI0NjZkZAIBDw8WAh8EBQc5ODQyNDY2ZGQCAg8PFgIfBAUKMDQvMDcvMjAxN2RkAgMPDxYCHwQFLENlcnRpZmljYWRvIGRlIGhpc3RvcmlhIGxhYm9yYWwgRlBNKDEzOTIxMCwpZGQCBA8PFgIfBAUBVGRkAgUPDxYCHwQFCVRlcm1pbmFkb2RkZG9xWba643oqthJTATkgc95Acvr6oJVDDdMGc4QiUOHQ" />
</div>
<script type="text/javascript">
//<![CDATA[
var theForm = document.forms['Form1'];
if (!theForm) {
theForm = document.Form1;
}
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
theForm.__EVENTTARGET.value = eventTarget;
theForm.__EVENTARGUMENT.value = eventArgument;
theForm.submit();
}
}
//]]>
</script>
<script src="/peoploEL/WebResource.axd?d=Vo5dwRm0erdgUaaz932BKtVNZGJOgXKXcR91FZwwFfehyhj6Sl2EkKnl2mAONakSWUxeINyfjibWOjKY8z8OLswtutIQ6CR4NPqhOOhW3-c1&t=635195493660000000" type="text/javascript"></script>
<script type="text/javascript">
//<![CDATA[
var __cultureInfo = {"name":"es-CO","numberFormat":{"CurrencyDecimalDigits":2,"CurrencyDecimalSeparator":",","IsReadOnly":true,"CurrencyGroupSizes":[3],"NumberGroupSizes":[3],"PercentGroupSizes":[3],"CurrencyGroupSeparator":".","CurrencySymbol":"$","NaNSymbol":"NeuN","CurrencyNegativePattern":14,"NumberNegativePattern":1,"PercentPositivePattern":0,"PercentNegativePattern":0,"NegativeInfinitySymbol":"-Infinito","NegativeSign":"-","NumberDecimalDigits":2,"NumberDecimalSeparator":",","NumberGroupSeparator":".","CurrencyPositivePattern":2,"PositiveInfinitySymbol":"Infinito","PositiveSign":"+","PercentDecimalDigits":2,"PercentDecimalSeparator":",","PercentGroupSeparator":".","PercentSymbol":"%","PerMilleSymbol":"‰","NativeDigits":["0","1","2","3","4","5","6","7","8","9"],"DigitSubstitution":1},"dateTimeFormat":{"AMDesignator":"a.m.","Calendar":{"MinSupportedDateTime":"\/Date(-62135578800000)\/","MaxSupportedDateTime":"\/Date(253402300799999)\/","AlgorithmType":1,"CalendarType":1,"Eras":[1],"TwoDigitYearMax":2029,"IsReadOnly":true},"DateSeparator":"/","FirstDayOfWeek":0,"CalendarWeekRule":0,"FullDateTimePattern":"dddd, dd\u0027 de \u0027MMMM\u0027 de \u0027yyyy hh:mm:ss tt","LongDatePattern":"dddd, dd\u0027 de \u0027MMMM\u0027 de \u0027yyyy","LongTimePattern":"hh:mm:ss tt","MonthDayPattern":"dd MMMM","PMDesignator":"p.m.","RFC1123Pattern":"ddd, dd MMM yyyy HH\u0027:\u0027mm\u0027:\u0027ss \u0027GMT\u0027","ShortDatePattern":"dd/MM/yyyy","ShortTimePattern":"hh:mm tt","SortableDateTimePattern":"yyyy\u0027-\u0027MM\u0027-\u0027dd\u0027T\u0027HH\u0027:\u0027mm\u0027:\u0027ss","TimeSeparator":":","UniversalSortableDateTimePattern":"yyyy\u0027-\u0027MM\u0027-\u0027dd HH\u0027:\u0027mm\u0027:\u0027ss\u0027Z\u0027","YearMonthPattern":"MMMM\u0027 de 
\u0027yyyy","AbbreviatedDayNames":["dom","lun","mar","mié","jue","vie","sáb"],"ShortestDayNames":["do","lu","ma","mi","ju","vi","sá"],"DayNames":["domingo","lunes","martes","miércoles","jueves","viernes","sábado"],"AbbreviatedMonthNames":["ene","feb","mar","abr","may","jun","jul","ago","sep","oct","nov","dic",""],"MonthNames":["enero","febrero","marzo","abril","mayo","junio","julio","agosto","septiembre","octubre","noviembre","diciembre",""],"IsReadOnly":true,"NativeCalendarName":"calendario gregoriano","AbbreviatedMonthGenitiveNames":["ene","feb","mar","abr","may","jun","jul","ago","sep","oct","nov","dic",""],"MonthGenitiveNames":["enero","febrero","marzo","abril","mayo","junio","julio","agosto","septiembre","octubre","noviembre","diciembre",""]},"eras":[1,"d.C.",null,0]};//]]>
</script>
<script src="/peoploEL/ScriptResource.axd?d=oxaJQOalmF_Pc9FHyAFTk_k6TF1NEbUrjIYsB44pk6WCbYo_nSIw4yk5tC2xEtvEorNRA5gOfFsIU4ZnWzjKxobYxQm7qlMyDI-yMbMSd2l6ZDbJap8N8TY6mfiS7PCqS0ZD_N1nysIMDoEuJENdCQ2&t=23c9c237" type="text/javascript"></script>
<script type="text/javascript">
//<![CDATA[
if (typeof(Sys) === 'undefined') throw new Error('ASP.NET Ajax client-side framework failed to load.');
//]]>
</script>
<div class="aspNetHidden">
<input type="hidden" name="__SCROLLPOSITIONX" id="__SCROLLPOSITIONX" value="0" />
<input type="hidden" name="__SCROLLPOSITIONY" id="__SCROLLPOSITIONY" value="0" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEdAArkW6hVSYy1X/RA+Sj0CGQLGp+bdMCDYaJlV2GIWm9IvBdcfX0kLMsTvDhzcFP+5BCmu+5iWjvwd5K06ry8EbPN8eAu30BFMFNpn4fF9w5RD0sfx0Rt1Zoo22r6RgHWIEvbk+/Q0viP1b4fioHhV6vuLByhWnJD/fsZOTyD54nbDa+qASD48033XmTIh5CNr4axLA/MabVFryGhaiI+QVUeJtZhbNAXh60wJUXNyENePpp0PUjhju74p8tImEJGpMk=" />
</div>
<TABLE id="Table1" border="0" cellSpacing="0" cellPadding="0" width="80%" align="center"
height="72%">
<TR>
<TD height="25" vAlign="top" width="165" align="center"></TD>
<TD height="25" width="10"></TD>
<TD height="25" vAlign="top"></TD>
</TR>
<TR>
<TD vAlign="top" width="165" align="center">
<LINK rel="stylesheet" type="text/css" href="EstilosWeb.css">
<LINK rel="stylesheet" type="text/css" href="EstilosWeb.css">
<TABLE style="WIDTH: 160px; HEIGHT: 64px" id="tMain" class="main" cellPadding="0" width="160">
<TR vAlign="top">
<TD id="NavTd">
<DIV id="Nav">
<H4 align="center">Menu
<table id="PanelIzquierdoUC1_htbCategorias" cellspacing="0" cellpadding="0" style="border-width:0px;width:160px;border-collapse:collapse;">
<tr>
<td><a id="PanelIzquierdoUC1_ConsultarLiquidacion" title="Consulta de Liquidación" href="Default.aspx?IdControl=ConsultaLiquidacionFltUC">Consultar Liquidación</a></td>
</tr><tr>
<td><a id="PanelIzquierdoUC1_Reportes" title="Certificado Ing. y Ret." href="Default.aspx?IdControl=ReportesUC">Certificado Ing. y Ret.</a></td>
</tr><tr>
<td><a id="PanelIzquierdoUC1_CambiarClave" title="Cambio de Clave" href="Default.aspx?IdControl=CambioClaveUC">Cambio de Clave</a></td>
</tr><tr>
<td><a id="PanelIzquierdoUC1_ReportesGeneral" title="Reportes" href="Default.aspx?IdControl=SolicitarReporteUC&TipoProceso=G">Reportes</a></td>
</tr><tr>
<td><a id="PanelIzquierdoUC1_CerrarSesion" title="Cerrar Sesion" href="Default.aspx?IdControl=CerrarSesionUC">Cerrar Sesion</a></td>
</tr>
</table></H4>
</DIV>
</TD>
</TR>
</TABLE>
</TD>
<td width="10"> </td>
<TD vAlign="top">
<div id="pnlCargaUserControl" style="width:100%;">
<LINK href="EstilosWeb.css" type="text/css" rel="stylesheet">
<style type="text/css">
.style1
{
height: 26px;
width: 36px;
}
</style>
<TABLE class="FormaTabla" id="Table1" cellSpacing="1" cellPadding="1" width="300" border="0">
<TR>
<TD class="FormaEncabezado" colSpan="2">Reportes</TD>
</TR>
<TR>
<TD colSpan="2">
<P align="center"> </P>
</TD>
</TR>
<TR>
<TD colSpan="2"><select size="4" name="ctl06$lstReportes" onchange="javascript:setTimeout('__doPostBack(\'ctl06$lstReportes\',\'\')', 0)" id="ctl06_lstReportes" class="FormaInfo" style="height:215px;width:564px;">
<option selected="selected" value="10095">Certificado de historia laboral FPM</option>
</select></TD>
</TR>
<TR>
<TD colSpan="2">Parametros</TD>
</TR>
<TR>
<TD style="HEIGHT: 45px" colSpan="2"><table id="ctl06_tbParametros" rules="all" border="1">
<tr>
<td>Empleado</td><td><input name="ctl06$txtParam_1" type="text" value="139211" readonly="readonly" onchange="javascript:setTimeout('__doPostBack(\'ctl06$txtParam_1\',\'\')', 0)" onkeypress="if (WebForm_TextBoxKeyHandler(event) == false) return false;" id="ctl06_txtParam_1" Tabla="Empleado_VIPP" CodigoCampo="CodEmpleado" DescripcionCampo="Empleado" Condicion="" TipoDato="N" Parametro="Empleado" /><input type="submit" name="ctl06$btnParam_1" value="..." id="ctl06_btnParam_1" disabled="disabled" class="aspNetDisabled" onclick="javascript:return BuscarConPostBack('Empleado_VIPP','CodEmpleado','Empleado','','ctl06_txtParam_1','ctl06_txtDesc_1');" style="width:25px;" /></td><td><input name="ctl06$txtDesc_1" type="text" value="JUAN DE LOS PALOTES" readonly="readonly" id="ctl06_txtDesc_1" style="width:250px;" /></td>
</tr>
</table></TD>
</TR>
<TR>
<TD class="style1">
</TD>
<td>
<P align="center"><select name="ctl06$ddlFormato" id="ctl06_ddlFormato" style="width:104px;">
<option value="PDF">PDF</option>
</select> <input type="submit" name="ctl06$btnAceptar" value="Aceptar" id="ctl06_btnAceptar" />
</P>
</td>
</TR>
<TR>
<TD colSpan="2">
<P align="left"><span id="ctl06_lblMensaje" style="color:Red;font-family:Arial;"></span></P>
</TD>
</TR>
</TABLE>
<P>
<input type="submit" name="ctl06$ButActualizar" value="Actualizar" id="ctl06_ButActualizar" /></P>
<P><table class="FormaGrid" cellspacing="0" rules="all" border="1" id="ctl06_dtgDatos" style="border-collapse:collapse;">
<tr>
<td> </td><td>CodPeticion</td><td>FechaHora</td><td>Peticion</td><td>Estado</td><td>DetalleEstado</td>
</tr><tr>
<td style="white-space:nowrap;">
<a id="ctl06_dtgDatos_ctl03_cmdVer" href="javascript:__doPostBack('ctl06$dtgDatos$ctl03$cmdVer','')">Ver</a>
</td><td>9842466</td><td>04/07/2017</td><td>Certificado(139211,)</td><td>T</td><td>Terminado</td>
</tr><tr>
<td colspan="6"><span>1</span></td>
</tr>
</table></P>
</div>
</TD>
</TR>
</TABLE>
PYTHON:
form2 = browser.get_form(id='Form1')
form2["ctl06$txtParam_1"].value = '139211'
form2["ctl06$txtDesc_1"].value = 'JUAN DE LOS POTES'
form2["ctl06$ddlFormato"].value = 'PDF'
form2["ctl06$lstReportes"].value = '10095'
form2["__EVENTTARGET"].value = 'ctl06$dtgDatos$ctl03$cmdVer'
form2["__EVENTARGUMENT"].value = ''
browser.submit_form(form2)
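Use the Python requests library for that.
Build a dict of the form fields and send it as the body of a POST request (the data argument of requests.post), not as headers or JSON. Remember that the hidden ASP.NET fields (__VIEWSTATE, __EVENTVALIDATION and similar) can change every few minutes, depending on the website, so read them fresh from the page before each request, and set __EVENTTARGET and __EVENTARGUMENT to the control whose postback you are imitating.
Before sending the request from Python, it is easier to try the POST once in Postman.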
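import requests

# url, index and token come from earlier in the script: token is a dict of the
# hidden ASP.NET fields (__VIEWSTATE, __EVENTVALIDATION, ...) scraped from the page
form_data = {
    "ctl00$ContentPlaceHolder1$txt_tradename": str(index),
    "ctl00$ContentPlaceHolder1$txtSearchTin": "",
    "ctl00$ContentPlaceHolder1$ddl_dist": 2,
    "ctl00$ContentPlaceHolder1$btnDlrSearch": "Search",
    "__EVENTVALIDATION": token.get("__EVENTVALIDATION", ""),
    "__VIEWSTATEGENERATOR": token.get("__VIEWSTATEGENERATOR"),
    "__VIEWSTATE": token.get("__VIEWSTATE"),
}
try:
    # passed as data, so requests sends it as a form-encoded POST body
    req = requests.post(url, data=form_data)
    req.raise_for_status()
except requests.RequestException as exc:
    print("Request failed:", exc)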
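The point about hidden fields can be made concrete. The sketch below (hidden_fields and build_postback are hypothetical helper names, not part of any library) uses BeautifulSoup to collect every hidden input from the page you just fetched, then sets __EVENTTARGET the way the page's __doPostBack() would for the "Ver" link (ctl06$dtgDatos$ctl03$cmdVer). The sample string and its placeholder values are a trimmed stand-in for the real page; in practice you would fetch the page with a requests.Session, build the payload from response.text, and post it back with session.post(url, data=payload) so the session cookies are kept.

```python
from bs4 import BeautifulSoup


def hidden_fields(html):
    """Return {name: value} for every hidden <input> on the page (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        inp["name"]: inp.get("value", "")
        for inp in soup.find_all("input", attrs={"type": "hidden"})
        if inp.has_attr("name")
    }


def build_postback(html, event_target, extra_fields):
    """Form data that mimics __doPostBack(event_target, '') (hypothetical helper)."""
    data = hidden_fields(html)          # fresh __VIEWSTATE, __EVENTVALIDATION, ...
    data["__EVENTTARGET"] = event_target
    data["__EVENTARGUMENT"] = ""
    data.update(extra_fields)           # the visible fields you want to submit
    return data


# Trimmed stand-in for the page from the question (placeholder values)
sample = """<form method="post" id="Form1">
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="viewstate-from-page" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="validation-from-page" />
</form>"""

payload = build_postback(
    sample,
    "ctl06$dtgDatos$ctl03$cmdVer",
    {"ctl06$lstReportes": "10095", "ctl06$txtParam_1": "139211", "ctl06$ddlFormato": "PDF"},
)
print(payload)
```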
I'm new to Python and the BeautifulSoup library. I have tried many things, but with no luck.
My HTML looks like this:
<form method = "post" id="FORM1" name="FORM1">
<table cellpadding=0 cellspacing=1 border=0 align="center" bgcolor="#cccccc">
<tr>
<td class="producto"><b>Club</b><br>
<input value="CLUB TENIS DE MESA PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtClub" size="60" maxlength="55">
</td>
<tr>
<td colspan="2" class="producto"><b>Nombre Equipo</b><br>
<input value="C.T.M. PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtNomEqu" size="100" maxlength="80">
</td>
</tr>
<tr>
<td class="producto"><b>Telefono fijo</b><br>
<input value="63097005534" disabled class="txtmascaraform" type="TEXT" name="txtTelf" size="15" maxlength="10">
</td>
and I need to extract just the text inside each <b></b> tag and the value attribute of its input.
Many thanks!!
First find() the form by its id, then find_all() the inputs inside it and get each one's value attribute:
from bs4 import BeautifulSoup
data = """<form method = "post" id="FORM1" name="FORM1">
<table cellpadding=0 cellspacing=1 border=0 align="center" bgcolor="#cccccc">
<tr>
<td class="producto"><b>Club</b><br>
<input value="CLUB TENIS DE MESA PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtClub" size="60" maxlength="55">
</td>
<tr>
<td colspan="2" class="producto"><b>Nombre Equipo</b><br>
<input value="C.T.M. PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtNomEqu" size="100" maxlength="80">
</td>
</tr>
<tr>
<td class="producto"><b>Telefono fijo</b><br>
<input value="63097005534" disabled class="txtmascaraform" type="TEXT" name="txtTelf" size="15" maxlength="10">
</td>
</tr>
</table>
</form>"""
soup = BeautifulSoup(data, "html.parser")
form = soup.find("form", {'id': "FORM1"})
print([item.get('value') for item in form.find_all('input')])
# UPDATE for getting table cell values
table = form.find("table")
print([item.text.strip() for item in table.find_all('td')])
prints:
['CLUB TENIS DE MESA PORTOBAIL', 'C.T.M. PORTOBAIL', '63097005534']
['Club', 'Nombre Equipo', 'Telefono fijo']
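The answer above returns the captions and the values as two separate lists. If you want each <b> caption paired with its own input value, a minimal sketch (assuming, as in the posted markup, that every caption is followed by its input) calls find_next('input') on each <b> tag. The html string is a trimmed copy of the question's form.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's form
html = """<form method="post" id="FORM1" name="FORM1">
<table>
<tr><td class="producto"><b>Club</b><br>
<input value="CLUB TENIS DE MESA PORTOBAIL" disabled type="TEXT" name="txtClub"></td></tr>
<tr><td class="producto"><b>Nombre Equipo</b><br>
<input value="C.T.M. PORTOBAIL" disabled type="TEXT" name="txtNomEqu"></td></tr>
<tr><td class="producto"><b>Telefono fijo</b><br>
<input value="63097005534" disabled type="TEXT" name="txtTelf"></td></tr>
</table>
</form>"""

soup = BeautifulSoup(html, "html.parser")
form = soup.find("form", id="FORM1")

# For each <b> caption, take the first <input> that follows it in document order
pairs = {b.get_text(strip=True): b.find_next("input").get("value")
         for b in form.find_all("b")}

for caption, value in pairs.items():
    print(caption, "->", value)
```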