BadRequestKeyError: 400 Bad Request Python Web Scraping
I'm getting the following error in Python when I try to do some scraping:
Traceback (most recent call last):
  File "", line 26, in
    signin2.fields["ctl06$txtParam_1"].value = '139210'
  File "C:\Users\Alvaro Pabon\Anaconda3\lib\site-packages\werkzeug\datastructures.py", line 781, in __getitem__
    raise exceptions.BadRequestKeyError(key)
BadRequestKeyError: 400 Bad Request: The browser (or proxy) sent a request that this server could not understand.
Below are the HTML and the Python code. What am I doing wrong?
HTML:
<form method="post" action="Default.aspx?IdControl=SolicitarReporteUC&TipoProceso=G" id="Form1">
<div class="aspNetHidden">
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKLTE2MjczMjc4MQ9kFgICAw9kFgICBQ9kFgJmD2QWDgIBDxAPFgYeDkRhdGFWYWx1ZUZpZWxkBQpDb2RSZXBvcnRlHg1EYXRhVGV4dEZpZWxkBQdSZXBvcnRlHgtfIURhdGFCb3VuZGdkEBUBI0NlcnRpZmljYWRvIGRlIGhpc3RvcmlhIGxhYm9yYWwgRlBNFQEFMTAwOTUUKwMBZxYBZmQCAw9kFgJmD2QWAgIBD2QWAgIBDw9kFgIeB29uY2xpY2sFdmphdmFzY3JpcHQ6cmV0dXJuIEJ1c2NhckNvblBvc3RCYWNrKCdFbXBsZWFkb19WSVBQJywnQ29kRW1wbGVhZG8nLCdFbXBsZWFkbycsJycsJ2N0bDA2X3R4dFBhcmFtXzEnLCdjdGwwNl90eHREZXNjXzEnKTtkAgcPDxYCHgRUZXh0ZWRkAgkPEA8WAh4HVmlzaWJsZWdkEBUBA1BERhUBA1BERhQrAwFnZGQCCw8PFgIeB0VuYWJsZWRnZGQCDQ8PFgIfBGVkZAIRDzwrAAsBAA8WCB4IRGF0YUtleXMWAB4LXyFJdGVtQ291bnQCAR4JUGFnZUNvdW50AgEeFV8hRGF0YVNvdXJjZUl0ZW1Db3VudAIBZBYCZg9kFgICAg9kFgxmD2QWAgIDDw8WAh4LTmF2aWdhdGVVcmwFOkRlZmF1bHQuYXNweD9JZENvbnRyb2w9UGV0aWNpb25lc1ZlclVDJkNvZFBldGljaW9uPTk4NDI0NjZkZAIBDw8WAh8EBQc5ODQyNDY2ZGQCAg8PFgIfBAUKMDQvMDcvMjAxN2RkAgMPDxYCHwQFLENlcnRpZmljYWRvIGRlIGhpc3RvcmlhIGxhYm9yYWwgRlBNKDEzOTIxMCwpZGQCBA8PFgIfBAUBVGRkAgUPDxYCHwQFCVRlcm1pbmFkb2RkZG9xWba643oqthJTATkgc95Acvr6oJVDDdMGc4QiUOHQ" />
</div>
<script type="text/javascript">
//<![CDATA[
var theForm = document.forms['Form1'];
if (!theForm) {
theForm = document.Form1;
}
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
theForm.__EVENTTARGET.value = eventTarget;
theForm.__EVENTARGUMENT.value = eventArgument;
theForm.submit();
}
}
//]]>
</script>
<script src="/peoploEL/WebResource.axd?d=Vo5dwRm0erdgUaaz932BKtVNZGJOgXKXcR91FZwwFfehyhj6Sl2EkKnl2mAONakSWUxeINyfjibWOjKY8z8OLswtutIQ6CR4NPqhOOhW3-c1&t=635195493660000000" type="text/javascript"></script>
<script type="text/javascript">
//<![CDATA[
var __cultureInfo = {"name":"es-CO","numberFormat":{"CurrencyDecimalDigits":2,"CurrencyDecimalSeparator":",","IsReadOnly":true,"CurrencyGroupSizes":[3],"NumberGroupSizes":[3],"PercentGroupSizes":[3],"CurrencyGroupSeparator":".","CurrencySymbol":"$","NaNSymbol":"NeuN","CurrencyNegativePattern":14,"NumberNegativePattern":1,"PercentPositivePattern":0,"PercentNegativePattern":0,"NegativeInfinitySymbol":"-Infinito","NegativeSign":"-","NumberDecimalDigits":2,"NumberDecimalSeparator":",","NumberGroupSeparator":".","CurrencyPositivePattern":2,"PositiveInfinitySymbol":"Infinito","PositiveSign":"+","PercentDecimalDigits":2,"PercentDecimalSeparator":",","PercentGroupSeparator":".","PercentSymbol":"%","PerMilleSymbol":"‰","NativeDigits":["0","1","2","3","4","5","6","7","8","9"],"DigitSubstitution":1},"dateTimeFormat":{"AMDesignator":"a.m.","Calendar":{"MinSupportedDateTime":"\/Date(-62135578800000)\/","MaxSupportedDateTime":"\/Date(253402300799999)\/","AlgorithmType":1,"CalendarType":1,"Eras":[1],"TwoDigitYearMax":2029,"IsReadOnly":true},"DateSeparator":"/","FirstDayOfWeek":0,"CalendarWeekRule":0,"FullDateTimePattern":"dddd, dd\u0027 de \u0027MMMM\u0027 de \u0027yyyy hh:mm:ss tt","LongDatePattern":"dddd, dd\u0027 de \u0027MMMM\u0027 de \u0027yyyy","LongTimePattern":"hh:mm:ss tt","MonthDayPattern":"dd MMMM","PMDesignator":"p.m.","RFC1123Pattern":"ddd, dd MMM yyyy HH\u0027:\u0027mm\u0027:\u0027ss \u0027GMT\u0027","ShortDatePattern":"dd/MM/yyyy","ShortTimePattern":"hh:mm tt","SortableDateTimePattern":"yyyy\u0027-\u0027MM\u0027-\u0027dd\u0027T\u0027HH\u0027:\u0027mm\u0027:\u0027ss","TimeSeparator":":","UniversalSortableDateTimePattern":"yyyy\u0027-\u0027MM\u0027-\u0027dd HH\u0027:\u0027mm\u0027:\u0027ss\u0027Z\u0027","YearMonthPattern":"MMMM\u0027 de 
\u0027yyyy","AbbreviatedDayNames":["dom","lun","mar","mié","jue","vie","sáb"],"ShortestDayNames":["do","lu","ma","mi","ju","vi","sá"],"DayNames":["domingo","lunes","martes","miércoles","jueves","viernes","sábado"],"AbbreviatedMonthNames":["ene","feb","mar","abr","may","jun","jul","ago","sep","oct","nov","dic",""],"MonthNames":["enero","febrero","marzo","abril","mayo","junio","julio","agosto","septiembre","octubre","noviembre","diciembre",""],"IsReadOnly":true,"NativeCalendarName":"calendario gregoriano","AbbreviatedMonthGenitiveNames":["ene","feb","mar","abr","may","jun","jul","ago","sep","oct","nov","dic",""],"MonthGenitiveNames":["enero","febrero","marzo","abril","mayo","junio","julio","agosto","septiembre","octubre","noviembre","diciembre",""]},"eras":[1,"d.C.",null,0]};//]]>
</script>
<script src="/peoploEL/ScriptResource.axd?d=oxaJQOalmF_Pc9FHyAFTk_k6TF1NEbUrjIYsB44pk6WCbYo_nSIw4yk5tC2xEtvEorNRA5gOfFsIU4ZnWzjKxobYxQm7qlMyDI-yMbMSd2l6ZDbJap8N8TY6mfiS7PCqS0ZD_N1nysIMDoEuJENdCQ2&t=23c9c237" type="text/javascript"></script>
<script type="text/javascript">
//<![CDATA[
if (typeof(Sys) === 'undefined') throw new Error('ASP.NET Ajax client-side framework failed to load.');
//]]>
</script>
<div class="aspNetHidden">
<input type="hidden" name="__SCROLLPOSITIONX" id="__SCROLLPOSITIONX" value="0" />
<input type="hidden" name="__SCROLLPOSITIONY" id="__SCROLLPOSITIONY" value="0" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEdAArkW6hVSYy1X/RA+Sj0CGQLGp+bdMCDYaJlV2GIWm9IvBdcfX0kLMsTvDhzcFP+5BCmu+5iWjvwd5K06ry8EbPN8eAu30BFMFNpn4fF9w5RD0sfx0Rt1Zoo22r6RgHWIEvbk+/Q0viP1b4fioHhV6vuLByhWnJD/fsZOTyD54nbDa+qASD48033XmTIh5CNr4axLA/MabVFryGhaiI+QVUeJtZhbNAXh60wJUXNyENePpp0PUjhju74p8tImEJGpMk=" />
</div>
<TABLE id="Table1" border="0" cellSpacing="0" cellPadding="0" width="80%" align="center"
height="72%">
<TR>
<TD height="25" vAlign="top" width="165" align="center"></TD>
<TD height="25" width="10"></TD>
<TD height="25" vAlign="top"></TD>
</TR>
<TR>
<TD vAlign="top" width="165" align="center">
<LINK rel="stylesheet" type="text/css" href="EstilosWeb.css">
<LINK rel="stylesheet" type="text/css" href="EstilosWeb.css">
<TABLE style="WIDTH: 160px; HEIGHT: 64px" id="tMain" class="main" cellPadding="0" width="160">
<TR vAlign="top">
<TD id="NavTd">
<DIV id="Nav">
<H4 align="center">Menu
<table id="PanelIzquierdoUC1_htbCategorias" cellspacing="0" cellpadding="0" style="border-width:0px;width:160px;border-collapse:collapse;">
<tr>
<td><a id="PanelIzquierdoUC1_ConsultarLiquidacion" title="Consulta de Liquidación" href="Default.aspx?IdControl=ConsultaLiquidacionFltUC">Consultar Liquidación</a></td>
</tr><tr>
<td><a id="PanelIzquierdoUC1_Reportes" title="Certificado Ing. y Ret." href="Default.aspx?IdControl=ReportesUC">Certificado Ing. y Ret.</a></td>
</tr><tr>
<td><a id="PanelIzquierdoUC1_CambiarClave" title="Cambio de Clave" href="Default.aspx?IdControl=CambioClaveUC">Cambio de Clave</a></td>
</tr><tr>
<td><a id="PanelIzquierdoUC1_ReportesGeneral" title="Reportes" href="Default.aspx?IdControl=SolicitarReporteUC&TipoProceso=G">Reportes</a></td>
</tr><tr>
<td><a id="PanelIzquierdoUC1_CerrarSesion" title="Cerrar Sesion" href="Default.aspx?IdControl=CerrarSesionUC">Cerrar Sesion</a></td>
</tr>
</table></H4>
</DIV>
</TD>
</TR>
</TABLE>
</TD>
<td width="10"> </td>
<TD vAlign="top">
<div id="pnlCargaUserControl" style="width:100%;">
<LINK href="EstilosWeb.css" type="text/css" rel="stylesheet">
<style type="text/css">
.style1
{
height: 26px;
width: 36px;
}
</style>
<TABLE class="FormaTabla" id="Table1" cellSpacing="1" cellPadding="1" width="300" border="0">
<TR>
<TD class="FormaEncabezado" colSpan="2">Reportes</TD>
</TR>
<TR>
<TD colSpan="2">
<P align="center"> </P>
</TD>
</TR>
<TR>
<TD colSpan="2"><select size="4" name="ctl06$lstReportes" onchange="javascript:setTimeout('__doPostBack(\'ctl06$lstReportes\',\'\')', 0)" id="ctl06_lstReportes" class="FormaInfo" style="height:215px;width:564px;">
<option selected="selected" value="10095">Certificado de historia laboral FPM</option>
</select></TD>
</TR>
<TR>
<TD colSpan="2">Parametros</TD>
</TR>
<TR>
<TD style="HEIGHT: 45px" colSpan="2"><table id="ctl06_tbParametros" rules="all" border="1">
<tr>
<td>Empleado</td><td><input name="ctl06$txtParam_1" type="text" value="139211" readonly="readonly" onchange="javascript:setTimeout('__doPostBack(\'ctl06$txtParam_1\',\'\')', 0)" onkeypress="if (WebForm_TextBoxKeyHandler(event) == false) return false;" id="ctl06_txtParam_1" Tabla="Empleado_VIPP" CodigoCampo="CodEmpleado" DescripcionCampo="Empleado" Condicion="" TipoDato="N" Parametro="Empleado" /><input type="submit" name="ctl06$btnParam_1" value="..." id="ctl06_btnParam_1" disabled="disabled" class="aspNetDisabled" onclick="javascript:return BuscarConPostBack('Empleado_VIPP','CodEmpleado','Empleado','','ctl06_txtParam_1','ctl06_txtDesc_1');" style="width:25px;" /></td><td><input name="ctl06$txtDesc_1" type="text" value="JUAN DE LOS PALOTES" readonly="readonly" id="ctl06_txtDesc_1" style="width:250px;" /></td>
</tr>
</table></TD>
</TR>
<TR>
<TD class="style1">
</TD>
<td>
<P align="center"><select name="ctl06$ddlFormato" id="ctl06_ddlFormato" style="width:104px;">
<option value="PDF">PDF</option>
</select> <input type="submit" name="ctl06$btnAceptar" value="Aceptar" id="ctl06_btnAceptar" />
</P>
</td>
</TR>
<TR>
<TD colSpan="2">
<P align="left"><span id="ctl06_lblMensaje" style="color:Red;font-family:Arial;"></span></P>
</TD>
</TR>
</TABLE>
<P>
<input type="submit" name="ctl06$ButActualizar" value="Actualizar" id="ctl06_ButActualizar" /></P>
<P><table class="FormaGrid" cellspacing="0" rules="all" border="1" id="ctl06_dtgDatos" style="border-collapse:collapse;">
<tr>
<td> </td><td>CodPeticion</td><td>FechaHora</td><td>Peticion</td><td>Estado</td><td>DetalleEstado</td>
</tr><tr>
<td style="white-space:nowrap;">
<a id="ctl06_dtgDatos_ctl03_cmdVer" href="javascript:__doPostBack('ctl06$dtgDatos$ctl03$cmdVer','')">Ver</a>
</td><td>9842466</td><td>04/07/2017</td><td>Certificado(139211,)</td><td>T</td><td>Terminado</td>
</tr><tr>
<td colspan="6"><span>1</span></td>
</tr>
</table></P>
</div>
</TD>
</TR>
</TABLE>
PYTHON:
# `browser` is assumed to be a RoboBrowser-style object already on the Reportes page
form2 = browser.get_form(id='Form1')
form2["ctl06$txtParam_1"].value = '139211'
form2["ctl06$txtDesc_1"].value = 'JUAN DE LOS POTES'
form2["ctl06$ddlFormato"].value = 'PDF'
form2["ctl06$lstReportes"].value = '10095'
# simulate the __doPostBack call made by the "Ver" link
form2["__EVENTTARGET"].value = 'ctl06$dtgDatos$ctl03$cmdVer'
form2["__EVENTARGUMENT"].value = ''
browser.submit_form(form2)
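Use the Python requests library for that.
Build a dict with the form fields and send it as the body of a POST request (not in the headers). Remember the ASP.NET fields __EVENTTARGET and __EVENTARGUMENT, as well as the hidden tokens (__VIEWSTATE, __VIEWSTATEGENERATOR, __EVENTVALIDATION) that the code below reads from `token`: their values can change after a few minutes (depending on the website), so read fresh values from the page before each request.
It is easiest to send the POST request directly, and before writing the code, try the request once in Postman.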
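import requests

# `url`, `index` and `token` (a dict of the page's hidden fields) must be defined beforehand
payload = {
    "ctl00$ContentPlaceHolder1$txt_tradename": str(index),
    "ctl00$ContentPlaceHolder1$txtSearchTin": "",
    "ctl00$ContentPlaceHolder1$ddl_dist": 2,
    "ctl00$ContentPlaceHolder1$btnDlrSearch": "Search",
    "__EVENTVALIDATION": token.get("__EVENTVALIDATION", ""),
    "__VIEWSTATEGENERATOR": token.get("__VIEWSTATEGENERATOR"),
    "__VIEWSTATE": token.get("__VIEWSTATE"),
}
try:
    # form fields are sent form-encoded in the request body (data=), not as HTTP headers
    req = requests.post(url, data=payload)
    req.raise_for_status()
except requests.RequestException as exc:
    print("Request failed:", exc)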
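The advice about changing hidden fields can be made concrete: parse the page you have just fetched, copy every hidden input of the form, then overlay your own values. Below is a sketch using only the standard library's `html.parser`; `FormFieldCollector` and `build_postback_payload` are hypothetical names, and it assumes the form is identified by its `id` (`Form1` in the question). It also raises a `KeyError` listing any field names missing from the parsed form, which is the condition that made the question's werkzeug `MultiDict` raise `BadRequestKeyError`.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Record the named fields of one <form>, found by its id attribute."""

    FIELD_TAGS = ("input", "select", "textarea")

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.fields = {}   # field name -> value found in the HTML
        self.hidden = []   # names of <input type="hidden"> fields, in page order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.in_form = True
        elif self.in_form and tag in self.FIELD_TAGS and attrs.get("name"):
            name = attrs["name"]
            self.fields[name] = attrs.get("value") or ""
            if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
                self.hidden.append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_postback_payload(html, form_id, event_target, overrides):
    """Copy the form's current hidden fields, set the postback target, apply overrides."""
    collector = FormFieldCollector(form_id)
    collector.feed(html)
    missing = [name for name in overrides if name not in collector.fields]
    if missing:
        # a missing name is what made the question's werkzeug MultiDict raise BadRequestKeyError
        raise KeyError(f"not found in form {form_id!r}: {missing}")
    payload = {name: collector.fields[name] for name in collector.hidden}
    payload["__EVENTTARGET"] = event_target
    payload["__EVENTARGUMENT"] = ""
    payload.update(overrides)
    return payload
```

With requests, you would typically GET the page inside a `requests.Session`, pass `response.text` to `build_postback_payload(...)`, and send the result with `session.post(url, data=payload)`, so that the cookies and the freshly read tokens belong to the same session.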
Related
I need to pass the result of soup.find_all to another soup.find_all function to filter the HTML code for a project
I have this HTML code for example: <table class="nested4"> <tr> <td colspan="1"></td> <td colspan="2"> <h2 class="zeroMargin" id="govtMsg" visible="false"></h2> </td> <td colspan="2"> <h2 class="zeroMargin "> Net Metering Conn. </h2> </td> <td colspan="2"> <h2 class="zeroMargin" hidden> Life Line Consumer</h2> </td> </tr> <tr> <td colspan="2"> <p style="margin: 0; text-align: left; padding-left: 5px"> <span>NAME & ADDRESS</span> <br /> <span>MUHAMMAD AMIN </span> <br /> <span>S/O MUHAMMAD KHAN </span> <br /> <span>H-NO.38 MARGALLA ROAD </span> <br /> <span>F-6/3 ISLAMABAD3 </span> <br /> <span></span> </p> </td> <td colspan="3" style="text-align: left"> <h2 class="color-red">Say No To Corruption</h2> <span style="font-size: 8pt; color: #78578e"> MCO Date : 10-Aug-2018</span> <br /> </td> <td> <h3 style="font-size: 14pt;"> </h3> <h2> <br /> </h2> </td> </tr> <tr> <td style="margin-top: 0;" class="border-b"> <br /> </td> <td colspan="1" style="margin-top: 0;" class="border-b"> </td> <td colspan="1" style="margin-top: 0;" class="border-b"> </td> </tr> <tr style="height: 7%;" class="border-tb"> <td style="width: 130px" class="border-r"> <h4>METER NO</h4> </td> <td style="width: 90px" class="border-r"> <h4>PREVIOUS READING</h4> </td> <td style="width: 90px" class="border-r"> <h4>PRESENT READING</h4> </td> <td style="width: 60px" class="border-r"> <h4>MF</h4> </td> <td style="width: 60px" class="border-r"> <h4>UNITS</h4> </td> <td> <h4>STATUS</h4> </td> </tr> <tr style="height: 30px" class="content"> <td class="border-r"> 3-P I 3301539<br> I 3301539<br> E 3301539<br> E 3301539<br> </td> <td class="border-r"> 78693<br>16823<br>19740<br>8<br> </td> <td class="border-r"> 80086<br>17210<br>20139<br>8<br> </td> <td class="border-r"> 1<br>1<br>1<br>1<br> </td> <td class="border-r"> 1393<br>387<br>399<br>0<br> </td> <td> </td> </tr> <tr id="roshniMsg" style="height: 30px" class="content"> <td colspan="6"> <div style="width: 452pt"> <img style="max-width: 100%; max-height: 35%" 
src="/images/companies/iesco/roshniMsg.jpg" alt="Roshni Message" /> </div> </td> </tr> </table>
From this table I want to extract the paragraph, and from that paragraph I want to get all the span tags. I used soup.find_all() to get the table, but I don't know how to use this function iteratively, passing its result back so that I can find the paragraph and then the span tags inside it. This is the Python code I wrote:
soup = BeautifulSoup(string, 'html.parser')
# Getting the table tag
results = soup.find_all('table', attrs={'class':'nested4'})
# Getting the paragraph tag
results = soup.find_all('p', attrs={'style':'margin: 0; text-align: left; padding-left: 5px'})
# Getting all the span tags
results = soup.find_all('span', attrs={})
I just want help on how to get the paragraphs within the table, and then how to get the spans within the paragraph, because right now I am getting the spans from the whole original HTML. I don't know how to pass the list of bs4 objects back to the soup object to use soup.find_all iteratively.
from bs4 import BeautifulSoup

html = ''' <table class="nested4"> <tr> <td colspan="1"></td> <td colspan="2"> <h2 class="zeroMargin" id="govtMsg" visible="false"></h2> </td> <td colspan="2"> <h2 class="zeroMargin "> Net Metering Conn. </h2> </td> <td colspan="2"> <h2 class="zeroMargin" hidden> Life Line Consumer</h2> </td> </tr> <tr> <td colspan="2"> <p style="margin: 0; text-align: left; padding-left: 5px"> <span>NAME & ADDRESS</span> <br /> <span>MUHAMMAD AMIN </span> <br /> <span>S/O MUHAMMAD KHAN </span> <br /> <span>H-NO.38 MARGALLA ROAD </span> <br /> <span>F-6/3 ISLAMABAD3 </span> <br /> <span></span> </p> </td> <td colspan="3" style="text-align: left"> <h2 class="color-red">Say No To Corruption</h2> '''
soup = BeautifulSoup(html, 'html.parser')
spans = soup.select_one('table.nested4').select('span')
for span in spans:
    print(span.text)
This returns:
NAME & ADDRESS
MUHAMMAD AMIN
S/O MUHAMMAD KHAN
H-NO.38 MARGALLA ROAD
F-6/3 ISLAMABAD3
If you have one table:
soup = BeautifulSoup(string, 'html.parser')
table = soup.find('table', attrs={'class': 'nested4'})
p = table.find('p', attrs={'style': 'margin: 0; text-align: left; padding-left: 5px'})
results = p.find_all('span')
for result in results:
    print(result.get_text(strip=True))
If you have a list of tables:
soup = BeautifulSoup(string, 'html.parser')
for table in soup.find_all('table', attrs={'class': 'nested4'}):
    for p in table.find_all('p', attrs={'style': 'margin: 0; text-align: left; padding-left: 5px'}):
        for span in p.find_all('span'):
            print(span.get_text(strip=True))
How to get values from nested tables using BeautifulSoup
I need to get the name and the price of each row in the sample HTML below. However, when I use BeautifulSoup's find_all('tr'), it returns all the tr elements of both the main table and the nested tables. What is the best way to extract only the name and the price of each row?
soup = BeautifulSoup(f, 'html.parser')
priceTable = soup.find('table', attrs={"class":"table table-hover table-responsive"})
Above is what I have, and it returns all the tr elements, including those of the nested tables. What I need is to get every name with the price of that item next to it, and finally save them in a CSV file:
<table class="table table-hover table-responsive"> <tbody><tr> <td style="vertical-align: middle; width: 20%;" class="hidden-xs"> <img class="retailer-logo" data-placement="right" src="/images/20180813125BhYNMEK8lgOpXj3zxze53WmqeRWov7h.jpg" alt="Contact Energy" style="width:150px;" title="" data-original-title="" /> </td> <td style="vertical-align: middle; width: 75px;" class="hidden-xs"> <img src="/images/result-arrow.png" /> </td> <td> <table style="width: 100%;"> <tbody><tr class="visible-xs"> <td class="text-center" colspan="2"> <img class="retailer-logo" data-placement="right" src="/images/20180813125BhYNMEK8lgOpXj3zxze53WmqeRWov7h.jpg" alt="Contact Energy" style="width:150px;" title="" data-original-title="" /> </td> </tr> <tr> <td colspan="3"><h4>Contact Energy Saver Plus</h4></td> </tr> <tr style="text-transform: uppercase"> <td width="150px">Electricity:</td> <td>$242.85 <a class="plan-breakdown" data-placement="right" title="" data-original-title="<table><tr><td>Anytime</td><td>$0.334</td><td>per kWh</td><tr><td>Daily</td><td>$0.333</td><td>per day</td><tr><td>EA Levy</td><td>$0.0013</td><td>per kWh</td></table>"><i class="glyphicon glyphicon-info-sign"> </i></a> </td> </tr> <tr style="text-transform: uppercase"> <td>Discount:</td> <td>$63.14 (26%) </td> </tr> <tr> <td colspan="3"> <a class="plan-detail" data-placement="right" title="" data-original-title="<ul><li>Provides fixed
pricing until 31 June 2021 unless there are changes to taxes and levies.</li><li>24% Prompt Payment Discount when you pay on time. additional 1% discount for paying by direct debit (excl. credit card), and 1% discount for getting bills and correspondence by email. Up to 26% PPD available.</li><li>An early termination fee of $150 per contracted ICP if you terminate the contract before the end date (31/06/2021). Fee may be waived if you are moving house and take Contact Energy to the new property.</li><li>Not available to prepay customers.</li></ul>"><i class="glyphicon glyphicon-info-sign"> </i> What you need to know</a> </td> </tr> <tr class="visible-xs"> <td colspan="2"> <h3 class="total">$179.71</h3> <div class="incentive"> <b style="text-transform: uppercase">SPECIAL SwitchMe OFFER</b><br /> Special PPD & Fixed rates<br /> <a style="font-size: 0.9em;" class="incentive-info" title="" data-original-title="Receive a special Prompt Payment Discount and fixed rates until 31 June 2021 unless there are changes to taxes and levies">More Info</a> </div> </td> </tr> <tr class="visible-xs"> <td colspan="2"> <form id="w0" action="/switch/" method="post"> <input type="hidden" name="_csrf" value="Hi21xBvkP6NpUl0UcaFwxn4U5-94Jj8KqEeprOfuG9tMfP2gStRY6RFrBGdF6gGvT0uM3CAQaVvOPpnq1IddtQ==" /> <input type="hidden" name="query_id" value="409884" /> <input type="hidden" name="plan_group_id" value="54" /> <input type="hidden" name="plan_stage_id" value="367" /> <button type="submit" class="btn btn-block btn-switch" style="max-width: 100%; margin-top: 10px">Switch Now!</button> </form> <div class="wannatalk" style="max-width: 100%"> Want to talk?<br /> Call our friendly team on<br /> <b>0800 179 482</b> </div> </td> </tr> </tbody></table> </td> <td style="text-align: center" class="hidden-xs"> <h3 class="total">$179.71</h3> <div class="incentive"> <b style="text-transform: uppercase">SPECIAL SwitchMe OFFER</b><br /> Special PPD & Fixed rates<br /> <a style="font-size: 0.9em;"
class="incentive-info" title="" data-original-title="Receive a special Prompt Payment Discount and fixed rates until 31 June 2021 unless there are changes to taxes and levies">More Info</a> </div> </td> <td class="hidden-xs"> <form id="w1" action="/switch/" method="post"> <input type="hidden" name="_csrf" value="Hi21xBvkP6NpUl0UcaFwxn4U5-94Jj8KqEeprOfuG9tMfP2gStRY6RFrBGdF6gGvT0uM3CAQaVvOPpnq1IddtQ==" /> <input type="hidden" name="query_id" value="409884" /> <input type="hidden" name="plan_group_id" value="54" /> <input type="hidden" name="plan_stage_id" value="367" /> <button type="submit" class="btn btn-block btn-switch">Switch Now!</button> </form> <div class="wannatalk"> Want to talk?<br /> Call our friendly team on<br /> <b>0800 179 482</b> </div> </td> </tr> <tr> <td style="vertical-align: middle; width: 20%;" class="hidden-xs"> <img class="retailer-logo" data-placement="right" src="/images/20171013102LzWd_kdtQOk4yxxyZuCZBG6q7xIuClx.jpg" alt="Powershop" style="width:150px;" title="" data-original-title="" /> </td> <td style="vertical-align: middle; width: 75px;" class="hidden-xs"> <img src="/images/result-arrow.png" /> </td> <td> <table style="width: 100%;"> <tbody><tr class="visible-xs"> <td class="text-center" colspan="2"> <img class="retailer-logo" data-placement="right" src="/images/20171013102LzWd_kdtQOk4yxxyZuCZBG6q7xIuClx.jpg" alt="Powershop" style="width:150px;" title="" data-original-title="" /> </td> </tr> <tr> <td colspan="3"><h4>Powershop Saver</h4></td> </tr> <tr style="text-transform: uppercase"> <td width="150px">Electricity:</td> <td>$183.40 <a class="plan-breakdown" data-placement="right" title="" data-original-title="<table><tr><td>Anytime</td><td>$0.2508</td><td>per kWh</td><tr><td>Daily</td><td>$0.30</td><td>per day</td><tr><td>EA Levy</td><td>$0.00</td><td>per kWh</td></table>"><i class="glyphicon glyphicon-info-sign"> </i></a> </td> </tr> <tr style="text-transform: uppercase"> <td>Discount:</td> <td>$0.00 (0%) </td> </tr> <tr> <td
colspan="3"> <a class="plan-detail" data-placement="right" title="" data-original-title="<ul><li>The price estimate is based on forecast charges from Powershop for the next 12 months.</li><li>It assumes you purchase the Powershop Simple Saver powerpack once a month and special powerpacks that are made available from time to time.</li><li>This offer does not require a contract or a minimum supply period.</li><li>New customers will get a $150 power credit applied over their first 12 months ($25 straight away, $10 on the next 10 monthly account review periods, and a final credit of $25 in the final account review period of your first year as a Powershop customer).</li></ul>"><i class="glyphicon glyphicon-info-sign"> </i> What you need to know</a> </td> </tr> <tr class="visible-xs"> <td colspan="2"> <h3 class="total">$183.40</h3> <div class="incentive"> <b style="text-transform: uppercase">SPECIAL SwitchMe OFFER</b><br /> Get $150 off your bill over 12 months!<br /> <a style="font-size: 0.9em;" class="incentive-info" title="" data-original-title="<div><div>New customers will get a $150 power credit applied over their first 12 months ($25 straight away, then $10 for the next 10 monthly account review periods, and a final credit of $25 in the final account review period of your first year as a Powershop customer).</div><div> </div></div><div><br></div><div><br></div>">More Info</a> </div> </td> </tr> <tr class="visible-xs"> <td colspan="2"> <form id="w2" action="/switch/" method="post"> <input type="hidden" name="_csrf" value="Hi21xBvkP6NpUl0UcaFwxn4U5-94Jj8KqEeprOfuG9tMfP2gStRY6RFrBGdF6gGvT0uM3CAQaVvOPpnq1IddtQ==" /> <input type="hidden" name="query_id" value="409884" /> <input type="hidden" name="plan_group_id" value="53" /> <input type="hidden" name="plan_stage_id" value="273" /> <button type="submit" class="btn btn-block btn-switch" style="max-width: 100%; margin-top: 10px">Switch Now!</button> </form><div class="wannatalk" style="max-width: 100%"> Want to talk?<br
/> Call our friendly team on<br /> <b>0800 179 482</b> </div> </td> </tr> </tbody></table> </td>
So the output should be, from td[3] and td[4] of the first row:
Contact Energy Saver Plus $179.71
and then, for the next row:
Powershop Saver $183.40
and so on until the last row (of the main table).
A similar process to the one given in the comments, but with different selectors:
from bs4 import BeautifulSoup as bs

html = '''yourhtml'''
soup = bs(html, 'lxml')
names = [item.text for item in soup.select('.table h4')]
prices = [item.text for item in soup.select('[colspan="2"] > .total')]
results = list(zip(names, prices))
print(results)
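The question also asks for the pairs to be saved in a CSV file. A minimal sketch with the standard `csv` module, assuming `results` is the list of `(name, price)` tuples produced by `list(zip(names, prices))`; the function name `save_results_csv` and the header labels are illustrative, not part of the answer.

```python
import csv


def save_results_csv(results, fileobj):
    """Write (plan name, price) pairs as CSV rows to an open text file object."""
    writer = csv.writer(fileobj)
    writer.writerow(["plan", "price"])  # header row; the column names are illustrative
    for name, price in results:
        # .text from the page may carry surrounding whitespace, so trim it
        writer.writerow([name.strip(), price.strip()])
```

For a real file, open it with `newline=""`, as the `csv` documentation recommends: `with open("prices.csv", "w", newline="", encoding="utf-8") as f: save_results_csv(results, f)`.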
I actually managed to solve this using regex. I like the approach in the answer above much better, especially the use of zip(), but I thought I'd paste my solution here in case it comes in handy for other readers.
import re
from bs4 import BeautifulSoup

deals = []
prices = []
results = {}
with open("prices.html", "r") as f:
    soup = BeautifulSoup(f, 'html.parser')
priceTable = soup.find('table', attrs={"class": "table table-hover table-responsive"})
tbody = priceTable.find('tbody')
pplanPattern = r'<td colspan="3"><h4>([^<]+)</h4></td>'
pricePatterns = r'<h3 class="total">([^<]+)</h3>'
# only the outer table's own rows; re.search needs a string, so serialise each row
for rw in tbody.find_all('tr', recursive=False):
    plan = re.search(pplanPattern, str(rw))
    price = re.search(pricePatterns, str(rw))
    if plan:
        deals.append(plan.group(1))
    if price:
        prices.append(price.group(1))
    if plan and price:
        results[plan.group(1)] = price.group(1)
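The core problem in the question is telling the outer table's rows apart from the nested ones. In BeautifulSoup, calling `find_all('tr', recursive=False)` on the outer `tbody` restricts the search to its direct children. To illustrate the same idea without third-party packages, here is a sketch using the standard library's `html.parser` that counts table nesting depth; `TopLevelPlanParser` and `top_level_plans` are hypothetical names, and it assumes, as in the sample HTML, that each top-level row holds one `<h4>` plan name and at least one `<h3 class="total">` price.

```python
from html.parser import HTMLParser


class TopLevelPlanParser(HTMLParser):
    """Collect [plan name, total] for each row of the outermost table only."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0  # 1 while inside the outer table, 2+ inside nested tables
        self.rows = []        # one [name, total] entry per top-level <tr>
        self.capture = None   # "name" inside <h4>, "total" inside <h3 class="total">

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table":
            self.table_depth += 1
        elif tag == "tr" and self.table_depth == 1:
            self.rows.append(["", ""])  # a new plan row of the outer table
        elif tag == "h4" and self.rows and not self.rows[-1][0]:
            self.capture = "name"
        elif tag == "h3" and "total" in classes and self.rows and not self.rows[-1][1]:
            self.capture = "total"  # keep only the first total; the page repeats it

    def handle_endtag(self, tag):
        if tag == "table":
            self.table_depth -= 1
        elif tag in ("h4", "h3"):
            self.capture = None

    def handle_data(self, data):
        if self.capture == "name":
            self.rows[-1][0] += data.strip()
        elif self.capture == "total":
            self.rows[-1][1] += data.strip()


def top_level_plans(html):
    parser = TopLevelPlanParser()
    parser.feed(html)
    return [tuple(row) for row in parser.rows if row[0] and row[1]]
```

Rows of the nested tables never start a new entry, because they open at depth 2 or more; their `<h4>` and `<h3>` text is attributed to the enclosing top-level row instead.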
Clicking a link using Selenium Python Library
How can I click the link highlighted in the attached image using the Python Selenium library? I have tried everything (most attempts are shown below), but nothing works. Note the element tree in the attached picture. Page code (the link in question is the last link in the code below):
<HEAD> <TITLE> Constructor Self Service</TITLE> </HEAD> <script type="text/javascript"> ...... </script> <link rel="stylesheet" href="/global/res/themes/corporate/css/style3.css" type="text/css"> <script type="text/javascript" src="/global/res/javascript/horizontal_subsection_HM_Loader.js"></script> <link rel=stylesheet href="css/tcss.css" TYPE="text/css"> <script type="text/javascript" src="arrays/aaclib.js"></script> <script type="text/javascript" src="arrays/csslib.js"></script> <table cellspacing=0 cellpadding=0 border=0 width="100%" height="100%" > <tr><td valign="top"> <form name="mainForm" action="/TCSSPRODapp/Controller" method="POST" onsubmit="return false;"> <input type="hidden" name="command"> <input type="hidden" name="page"> <input type="hidden" name="menuIndex"> <input type="hidden" name="submitted" value="false"> <input type="hidden" name="pageToken" value=679> <input type="hidden" name="isInternal" value=false> <body bgcolor="#ffffff" leftmargin="0" topmargin="0" marginwidth="0" marginheight="0"> <!--begin global header--><!--Don't place any other HTML code on this line!!--> <script language="javascript" src="/global/javascript/head_array.js" type="text/javascript"></script> <script language="javascript" type="text/javascript"> var v6=0; if(typeof hmVisi!="undefined"&&typeof sectionId!="undefined" )v6=1; if(v6)document.write('</head><body bgcolor="#ffffff" leftmargin="0" topmargin="0" marginwidth="0" marginheight="0" onLoad="TEL_onLoad()">'); function alertUser() { if(confirm("Warning: Any unsaved data would be lost.\rDo you want to Continue?")) { return true; } else { return false; } } </script> <link rel="stylesheet" href="/global/css/gwy_hed.css" type="text/css"> <a
name="top"></a> <table width="100%" cellpadding="0" cellspacing="0" border="0"> <tr> <td rowspan="4" valign="top"><img src="/global/images/telstra_logo2.gif" width="85" height="85" border="0" alt="telstra.com"></td> <td align="right" class="ap1"><span class="hd1"><img src="/global/images/sin.gif" width="1" height="37" border="0" alt="press enter now to toggle the accessibile text mode of this page" align="absmiddle"> Telstra Homepage | Contact Us | Search<img src="/global/images/sin.gif" width="12" height="1" alt=""></span></td> <td width="50%" class="ap1"><img src="/global/images/sin.gif" width="1" height="57" alt=""></td> </tr> <tr class="hd3"> <!--<td><img src="/global/images/sin.gif" width="500" height="2" alt=""></td>--> <td colspan=50 width="100%"><img src="/global/images/sin.gif" width="100" height="2" alt=""></td> <td><img src="/global/images/sin.gif" width="1" height="2" alt=""></td> </tr> <tr> <td align="right"><img src="/global/images/gwy_grad.gif" width="328" height="6" alt=""></td> <td class="hd4"><img src="/global/images/sin.gif" width="1" height="1" alt=""></td> </tr> <tr> <td colspan=2 class="hd5" nowrap><NOBR> <table cellpadding=1 cellspacing=1 border=0> <tr> <!-- AMCO-WORKFLOW:MODIFY:START --> <!--<td rowspan=3 class="hd5" nowrap><img src="/global/images/sin.gif" width="15" height="10" alt="" align="top">Telstra Contractor Self Service<img src="/global/images/sin.gif" width="25" height="1" alt="" align="top"></td>--> <td rowspan=4 class="hd5" nowrap><img src="/global/images/sin.gif" width="5" height="10" alt="" align="top">Telstra Constructor Self Service<img src="/global/images/sin.gif" width="5" height="1" alt="" align="top"></td> <!-- AMCO-WORKFLOW:MODIFY:END --> <TD nowrap><img src="/global/images/sin.gif" width="10" height="12"></TD> <TD nowrap></TD> <TD nowrap></TD> <!-- AMCO-WORKFLOW:ADD:START --> <!--5.14.01 Anand Changes: start--> <!--<TD nowrap></TD>--> <!--5.14.01 Anand Changes: start--> <!-- AMCO-WORKFLOW:ADD:END --> <!--5.14.01 Anand 
Changes: start--> </tr> <tr style="background-color:#99ccff;"> <!-- AMCO-WORKFLOW:ADD:START --> <TD class=button> <a class=button style="text-decoration:none" title="AMCO" href="javascript:doCommand('doWidebandWorkSummary')" onmousemove="window.status='View Work Summary'" onmouseout="window.status=window.defaultStatus"> AMCO</a> </TD> <!-- AMCO-WORKFLOW:ADD:END --> <TD nowrap style="background-color:#ffffff;"></TD> <TD class=button nowrap> <a class=button style="text-decoration:none" title="View Inbox" href="javascript:doCommand('doInboxStat')" onmousemove="window.status='View Inbox'" onmouseout="window.status=window.defaultStatus"> <NOBR>Inbox 0</NOBR></a> </TD> <TD class=button> <a class=button style="text-decoration:none" title="View Outbox" href="javascript:doCommand('doOutboxStat')" onmousemove="window.status='View Outbox'" onmouseout="window.status=window.defaultStatus"> <NOBR>Outbox 0</NOBR></a> </TD> <!--Start : IPaC Stage II Drop1 : Refresh button added--> <TD class=button> <a class=button style="text-decoration:none" title="Refresh Inbox and Outbox" href="javascript:doCommand('doInboxOutboxRefresh')" onmousemove="window.status='Refresh Inbox and Outbox'" onmouseout="window.status=window.defaultStatus" onclick="return alertUser()"> <NOBR>Refresh</NOBR></a> </TD> <!--End : IPaC Stage II Drop1--> <TD nowrap style="background-color:#ffffff;"></TD> <TD class=button nowrap> <a class=button style="text-decoration:none" title="Display TCSS help" href="javascript:loadHelpWindow()" onmousemove="window.status='Display TCSS help'" onmouseout="window.status=window.defaultStatus"> <NOBR>Help</NOBR></a> </TD> <!-- AMCO-WORKFLOW:ADD:START --> <!--5.14.01 Anand Changes: start--> <!--<TD nowrap width="151" style="background-color:#ffffff;"></TD>--> <!--5.14.01 Anand Changes: end--> </tr> <tr> <TD nowrap><img src="/global/images/sin.gif" width="38" height="12"></TD> <!--Start : IPaC Stage II Drop1 : Display message if refresh didnt occur--> <TD nowrap 
style="background-color:#ffffff;"></TD> <TD nowrap><img src="/global/images/sin.gif" width="38" height="12"></TD> <TD nowrap style="background-color:#ffffff;"></TD></tr> <tr> <TD nowrap><img src="/global/images/sin.gif" width="38" height="12"></TD> <TD nowrap style="background-color:#ffffff;"></TD> </tr> <!--<TD nowrap></TD>--> <!--<TD nowrap></TD>--> <!-- AMCO-WORKFLOW:ADD:START --> <!--<TD nowrap></TD>--> <!-- AMCO-WORKFLOW:ADD:END --> <!--End : IPaC Stage II Drop1--> </tr> </table> </NOBR> </td> </tr> </table> <script language="javascript" type="text/javascript"> <!--5.14.01 Anand Changes: changed from 100% to 15 % start--> var s='<table width="50%" cellpadding="0" cellspacing="0" border="0"><tr><td class="siteletTitleTab"><img src="/global/images/sin.gif" width="1" height="3" alt=""></td></tr></table>'; <!--5.14.01 Anand Changes: changed from 100% to 15 % end--> if(v6)document.write(s); </script> <!--end global header--><!--Don't place any other HTML code on this line!!--> <table width="100%" border="0" cellspacing="0" cellpadding="0"> <tr class=""> <td width="100%" rowspan="2" align=right> </td> <TD> <INPUT type="hidden" name="internalConstructor" size=10 value="false"> </TD> </tr> </table> <table border=0 cellspacing=0 cellpadding=0 width=100%> <tr><td colspan=50 class=sitelettitletab> <img src=/global/res/images/sin.gif width=0 height=3> </td></tr><tr class=sitelettitletab> <td><img src=/global/res/images/sin.gif width=3 height=1></td><td class=sitetabnav valign=top><img src=/global/res/themes/corporate/images/tab_left_inactive.gif width=8 height=20></td><td class=sitetabnav valign=middle align=center nowrap><a class=sitetabnav href='javascript:doCommand("doFrontPage")' onmousemove="window.status='Home Page'" onmouseout="window.status=window.defaultStatus" title='Home Page'>Home Page</a></td><td align=right class=sitetabnav valign=top><img src=/global/res/themes/corporate/images/tab_right_inactive.gif width=8 height=20> <td><img 
src=/global/res/images/sin.gif width=3 height=1></td><td class=sitetabnav valign=top><img src=/global/res/themes/corporate/images/tab_left_inactive.gif width=8 height=20></td><td class=sitetabnav valign=middle align=center nowrap><a class=sitetabnav href='javascript:doCommand("doTransmittalsInbox")' onmousemove="window.status='My TCSS'" onmouseout="window.status=window.defaultStatus" title='My TCSS'>My TCSS</a></td><td align=right class=sitetabnav valign=top><img src=/global/res/themes/corporate/images/tab_right_inactive.gif width=8 height=20> <td><img src=/global/res/images/sin.gif width=3 height=1></td><td class=siteactivetabnav valign=top><img src=/global/res/themes/corporate/images/tab_left_inactive.gif width=8 height=20></td><td class=siteactivetabnav valign=top><img src=/global/res/images/sin.gif width=5 height=20></td><td class=siteactivetabnav valign=middle align=center nowrap><img src=/global/res/themes/corporate/images/tick.gif width=8 height=9> <a class=siteactivetabnav onmousemove="window.status='Work Under Contract'" onmouseout="window.status=window.defaultStatus">Work Under Contract</a></td><td align=right class=siteactivetabnav valign=top><img src=/global/res/themes/corporate/images/tab_right_inactive.gif width=8 height=20></td> <td><img src=/global/res/images/sin.gif width=3 height=1></td><td class=sitetabnav valign=top><img src=/global/res/themes/corporate/images/tab_left_inactive.gif width=8 height=20></td><td class=sitetabnav valign=middle align=center nowrap><a class=sitetabnav href='javascript:doCommand("doGenDocSearch")' onmousemove="window.status='Documents'" onmouseout="window.status=window.defaultStatus" title='Documents'>Documents</a></td><td align=right class=sitetabnav valign=top><img src=/global/res/themes/corporate/images/tab_right_inactive.gif width=8 height=20> <td><img src=/global/res/images/sin.gif width=3 height=1></td><td class=sitetabnav valign=top><img src=/global/res/themes/corporate/images/tab_left_inactive.gif width=8 
height=20></td><td class=sitetabnav valign=middle align=center nowrap><a class=sitetabnav href='javascript:doCommand("doAssetSearch")' onmousemove="window.status='Reference Library'" onmouseout="window.status=window.defaultStatus" title='Reference Library'>Reference Library</a></td><td align=right class=sitetabnav valign=top><img src=/global/res/themes/corporate/images/tab_right_inactive.gif width=8 height=20> <td><img src=/global/res/images/sin.gif width=3 height=1></td><td class=sitetabnav valign=top><img src=/global/res/themes/corporate/images/tab_left_inactive.gif width=8 height=20></td><td class=sitetabnav valign=middle align=center nowrap><a class=sitetabnav href='javascript:doCommand("doAbout")' onmousemove="window.status='Support'" onmouseout="window.status=window.defaultStatus" title='Support'>Support</a></td><td align=right class=sitetabnav valign=top><img src=/global/res/themes/corporate/images/tab_right_inactive.gif width=8 height=20> <td><img src=/global/res/images/sin.gif width=3 height=1></td><td class=sitetabnav valign=top><img src=/global/res/themes/corporate/images/tab_left_inactive.gif width=8 height=20></td><td class=sitetabnav valign=middle align=center nowrap><a class=sitetabnav href='javascript:doLogout()' onmousemove="window.status='Close TCSS'" onmouseout="window.status=window.defaultStatus" title='Close TCSS'>Close TCSS</a></td><td align=right class=sitetabnav valign=top><img src=/global/res/themes/corporate/images/tab_right_inactive.gif width=8 height=20> <td width=100%><img src=/global/res/images/sin.gif width=1 height=1></td></tr><tr><td colspan=50><img src=/global/res/images/sin.gif width=0 height=2></td></tr></table> <table cellpadding=0 cellspacing=0 border=0 width=100%><tr><td> <img src='/global/res/images/sin.gif' width=1 height=3></td></tr><tr class=sitetabnav><td> <table cellpadding=0 cellspacing=0 border=0><tr><script type='text/javascript'> TEL_horizontalSubsectionNav('corporate') </script><td width=100%> 
</td></tr></table></td></tr></table> <table cellspacing=10 cellpadding=0 border=0 width="100%"> <tr><td valign=top width="100%"> <TABLE border=0 cellpadding=0 cellspacing=0 width=100%> <TR><TD bgcolor=#99ccff> <TABLE cellspacing=2 cellpadding=0 border=0 width=100%> <TR><TD bgcolor=#F1F8FE> <TABLE border=0 cellpadding=0 cellspacing=0> <TR><TD colspan=2 height=1 bgcolor=#F1F8FE></TD> <TD rowspan=3><IMG border=0 src='images/corner.jpg'></TD> </TR><TR><TD><IMG border=0 width=1 height=1 src='images/whitedot.gif'></TD> <TD class='groupboxheader'> WUC CSA - Scope Variation - Site & Financial Details </TD></TR></TABLE> <TABLE BORDER="0" CELLPADDING="0" CELLSPACING="3" WIDTH="100%"> <TR> <TD CLASS="label" WIDTH="20%">Contract Number:</TD> <TD WIDTH="25%" CLASS="fieldlabel">20003171</TD> <TD CLASS="label" WIDTH="20%">Separable Portion:</TD> <TD WIDTH="25%" CLASS="fieldlabel">30056120</TD> </TR> <TR> <TD CLASS="label">Work Order:</TD> <TD CLASS="fieldLabel">1</TD> <TD CLASS="label">CSA Number:</TD> <TD CLASS="fieldLabel">4117298</TD> </TR> <TR> <TD CLASS="label">Current CSA Status:</TD> <TD class="fieldLabel"> Draft</TD> <TD CLASS="label">Initiated By:</TD> <TD CLASS="fieldlabel">CONSTRUCTOR</TD> </TR> <TR> <TD CLASS="label">Issue Number:</TD> <TD CLASS="fieldlabel">1</TD> </TR> </TABLE> </TD> </TR> </TABLE> </TD> </TR> </TABLE> <SCRIPT type="text/javascript"> function showQuote(sequenceNo) { var f = document.mainForm; switch (parseInt(sequenceNo)) { case 10001: f.command.value = "doCsaViewPSDetails"; break; case 10002: f.command.value = "doCsaViewDRDetails"; break; case 10003: f.command.value = "doCsaViewLSDetails"; break; default: f.sequenceNo.value = sequenceNo; f.command.value = "doCsaViewDIDetails"; } doSubmit(f); } function showMaterialQuote(sequenceNo) { var f = document.mainForm; f.sequenceNo.value = sequenceNo; f.command.value = "doCsaViewMatDetails"; doSubmit(f); } //NDCG:For Link to Non Catalogued material screen //NDCG:Add:Start function 
showNonCatMaterialQuote(sequenceNo) { var f = document.mainForm; f.command.value = "doCsaViewMatExDetails"; f.sequenceNo.value = sequenceNo; doSubmit(f); } //NDCG:ADD:End </SCRIPT> <INPUT type="hidden" name="sequenceNo"> <BR> <TABLE border="0" cellspacing="1" cellpadding="0" class="table" width="100%"> <TR align="middle"> <TD rowspan="2" class=colHeader>Type</TD> <TD colspan="3" class=colHeader>TCSS Calculated</TD> <TD colspan="3" class=colHeader>Quoted</TD> <TD colspan="3" class=colHeader>Approved</TD> <TD rowspan="2" class=colHeader>View</TD> </TR> <TR align="middle"> <TD class=colHeader>Value</TD> <TD class=colHeader>GST</TD> <TD class=colHeader>Price</TD> <TD class=colHeader>Value</TD> <TD class=colHeader>GST</TD> <TD class=colHeader>Price</TD> <TD class=colHeader>Value</TD> <TD class=colHeader>GST</TD> <TD class=colHeader>Price</TD> </TR> <TR> <TD class=cell>1. PENRITH - Generic Land & Building - Project</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD class=cell align="center" nowrap> <IMG SRC="/global/res/images/view_white.gif" width=44 height=18 onmousemove="window.status='View Item'" onmouseout="window.status=window.defaultStatus" ALT="View Item" BORDER=0> <IMG SRC=/global/res/images/material.gif width=16 height=16 border=0 onmousemove="window.status='Catalogued Material'" onmouseout="window.status=window.defaultStatus" alt='Catalogued Material'> <IMG SRC=/global/res/images/material.gif width=16 height=16 border=0 onmousemove="window.status='Non Catalogued Material'" onmouseout="window.status=window.defaultStatus" alt='Non Catalogued Material'> </TD> </TR> <TR> <TD class=cell>Daywork Rates</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap 
class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD class=cell align="center" nowrap> <IMG SRC="/global/res/images/view_white.gif" width=44 height=18 onmousemove="window.status='View Item'" onmouseout="window.status=window.defaultStatus" ALT="View Item" BORDER=0> </TD> </TR> <TR> <TD class=cell>Lump Sums</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD nowrap class=rightcell>$0.00</TD> <TD class=cell align="center" nowrap> <IMG SRC="/global/res/images/view_white.gif" width=44 height=18 onmousemove="window.status='View Item'" onmouseout="window.status=window.defaultStatus" ALT="View Item" BORDER=0> </TD> </TR>

Attempts:

driver.find_element_by_css_selector("a[href='javascript:showQuote('10003')']").click()
driver.find_element_by_xpath("//a[@href="javascript:showQuote('10003')"]").click()
driver.find_element_by_partial_link_text('showQuote('10003')').click()
driver.find_element_by_css_selector('[href^=javascript:showQuote('10003')]').click()
driver.execute_script('showQuote('10003')')
Did you try an XPath like the one below?

//tbody/tr[5]/td[11]/a/img

I know that hard-coding row and cell positions such as tr[5] and td[11] is fragile if the layout keeps changing, but try it once to see whether the click works.

You can also try this:

//a[starts-with(@href, 'javascript:showQuote') and contains(@href, '10003')]

If you are using Firefox, use Firebug with the FirePath add-on to check whether the expression matches the unique element you need.

Thank you,
Murali
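A small sketch of building the second XPath as a Python string. The helper name `show_quote_xpath` is hypothetical. Delimiting the Python string with double quotes lets the single-quoted XPath string literals sit inside it without escaping, which is where the question's attempts went wrong; note also that XPath writes the attribute axis as `@href`.

```python
def show_quote_xpath(sequence_no: str) -> str:
    # Hypothetical helper. The Python string is delimited by double quotes,
    # so the single quotes around the XPath string literals need no escaping.
    return (
        "//a[starts-with(@href, 'javascript:showQuote') "
        f"and contains(@href, '{sequence_no}')]"
    )


xpath = show_quote_xpath("10003")
print(xpath)
# //a[starts-with(@href, 'javascript:showQuote') and contains(@href, '10003')]
```

With Selenium 4 the string would be passed as `driver.find_element(By.XPATH, xpath)`, since the `find_element_by_*` methods used in this thread are deprecated in Selenium 4 and removed in later 4.x releases. Keep in mind that `contains()` matches substrings, so `'10003'` would also match a longer number such as `100031` if the page had one.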
You need to escape the quotes to make it work:

driver.find_element_by_css_selector("[href='javascript:showQuote(\\'10003\\')']").click()

Or with a raw string:

driver.find_element_by_css_selector(r"[href='javascript:showQuote(\'10003\')']").click()
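To see why the escaped and raw-string selectors in the answer above are the same, here is a plain-Python sketch with no Selenium involved; it only inspects the strings that would be handed to the driver. It also shows an alternative that avoids CSS escaping (double-quoting the attribute value), and why the question's last attempt, `driver.execute_script('showQuote('10003')')`, never reaches the browser: it is a Python syntax error for the same quoting reason.

```python
# Python-only sketch: no Selenium is imported; it only checks the strings
# that would be handed to find_element_by_css_selector / execute_script.

# 1. In a normal string "\\'" is a backslash plus a quote; in a raw string
#    r"\'" is the same two characters. The CSS engine then reads \' as an
#    escaped quote inside the single-quoted attribute value.
escaped = "[href='javascript:showQuote(\\'10003\\')']"
raw = r"[href='javascript:showQuote(\'10003\')']"
print(escaped == raw)  # True

# 2. Alternative: delimit the CSS attribute value with double quotes, so the
#    single quotes inside it need no escaping at all.
double_quoted = """[href="javascript:showQuote('10003')"]"""

# 3. The question's last attempt fails before Selenium ever runs: the inner
#    quotes end the Python string early, so the line does not compile.
attempt = "driver.execute_script('showQuote('10003')')"
try:
    compile(attempt, "<attempt>", "exec")
    attempt_compiles = True
except SyntaxError:
    attempt_compiles = False
print(attempt_compiles)  # False

# Delimiting the JavaScript with double quotes fixes it.
script = "showQuote('10003')"
```

In Selenium the fixed string would be run as `driver.execute_script(script)`; this assumes `showQuote` is a global function, as the inline script in the question's HTML suggests.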
Extract a table from an HTML file using Python
I want to extract a table from an HTML file. I have written the following code snippet to extract the first table:

import urllib2
import os
import time
import traceback
from bs4 import BeautifulSoup

#find('table',{'class':'tbl_with_brdr'})
outfile= open('D:/Dropbox/Python/apelec.txt','wb')
rfile = open('D:/Dropbox/PRI/Data/AP/195778.html')
rsoup = BeautifulSoup(rfile)
nodes = rsoup.find('div',{'class':'frmtext'}).find('table').find('tr')
for node in nodes[1:]:
    x = node.find('th').find('b').get_text().encode("utf-8")
    print x
    y = node.find('th').findNext('th').find('b').get_text().encode("utf-8")
    print y
    outfile.write(str(x)+"\t"+str(y)+"\n")
outfile.close()

Here is the error:

      9 rfile = open('D:/Dropbox/PRI/Data/AP/195778.html')
     10 rsoup = BeautifulSoup(rfile)
---> 11 nodes = rsoup.find('div',{'class':'frmtext'}).find('table').find('tr')
     12 for node in nodes[1:]:
     13     x = node.find('th').find('b').get_text().encode("utf-8")

AttributeError: 'NoneType' object has no attribute 'find'

And the HTML file is:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html> <head> <link rel="icon" type="image/ico" href="images/favicon.ico"/> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> <link rel="stylesheet" href="themes/panchayat_default.css" type="text/css"/> <title>consolidated Election Report</title> </head> <body> <!-- To blur the background while processing dwr --> <div class="faded_div process"></div> <div class="popup_block_div process" style="display: none;"> <img alt="" src="images/loading_animation.gif" style="margin-left: auto; margin-right: auto;"> </div> <div id="maincontainer" class="resize"> <div id="headerwrap"> <!-- Header --> <html> <head> <script type='text/javascript' src="/profilerdwr/engine.js"> </script> <script type='text/javascript' src="/profilerdwr/util.js"> </script> <script type="text/javascript" src="/profilerdwr/interface/lgdDao.js"></script> <script type="text/javascript"
src="js/common_util_js.js"></script> <link rel="stylesheet" href="css/common_css.css" type="text/css"></link> <meta http-equiv='Content-Type' content='text/html; charset=UTF-8' /> </head> <body > <div class="clear"></div> <div id="headerwrap"> <div id="header"> <div id="new_header"> <div id="logoleft">Area Profiler</div> <div id="logoright"></div> <div class="clear"></div> </div> <div class="clear"></div> <div id="loginnav" align="right"> <table width="100%" class="tbl_no_brdr"> <tr> <td class="tblclear" align="left"> <div id="mainnav">Home </div> </td> </tr> </table> </div> </div> <div class="clear"></div> <div id="topnav"> <table width="100%" class="tbl_no_brdr"> <tr> <td width="85" class="tblclear">Choose Theme :</td> <td width="200" class="tblclear"> <form id="themeForm" name="themeForm" method="get" action="welcome.do"> <input type="hidden" name='OWASP_CSRFTOKEN' value='CN72-BGJW-G7FM-K1S3-P5FF-V1EN-IO4T-GHWU' /> <select name="theme" id="themeId" class="combofield" onchange="submitThemeForm()" style="width: 120px;"> <option value="default">Default Theme</option> <option value="mustard">Mustard Theme</option> <option value="peach">Peach Theme</option> <option value="green">Green Theme</option> <option value="blue">Blue Theme</option> </select> </form> </td> <td style="padding: 0px"> </td> <td class="tblclear"> </td> <td width="14" class="tblclear txticon"><img src="images/btnMinus.jpg" width="16" height="14" border="0" /></div></td> <td width="14" class="tblclear txticon"><img src="images/btnDefault.jpg" width="16" height="14" border="0" /> </td> <td width="28" class="tblclear txticon"><img src="images/btnPlus.jpg" width="16" height="14" border="0" /></td> <script type="text/javascript" > //documenttextsizer.setup("shared_css_class_of_toggler_controls") documenttextsizer.setup("texttoggler") </script> <td width="100" align="right" class="tblclear">Select Language :</td> <td width="108" align="right" class="tblclear"> <form id="languageForm" name="languageForm" 
method="get" action="welcome.do"> <input type="hidden" name='OWASP_CSRFTOKEN' value='CN72-BGJW-G7FM-K1S3-P5FF-V1EN-IO4T-GHWU' /> <select id="languageId" name="language" class="combofield" style="width: 120px;" onchange="submitLanguageForm()" > <option value=""> Select Language </option> </select> </form> </td> </tr> </table> </div> <div id="breadcrumbnav"> </div> </div> <script type="text/javascript"> function submitThemeForm() { var isOK = confirm("This will Refresh Your Page. Any Unsaved data will be Lost. Do You still want to Continue?"); if(isOK) { document.getElementById('themeForm').submit(); } else { return; } } function submitLanguageForm() { var isOK = confirm("This will Refresh Your Page. Any Unsaved data will be Lost. Do You still want to Continue?"); if(isOK) { document.getElementById('languageForm').submit(); } else { return; } } </script> </body> </html> </div> <div class="clear"></div> <div id="content"> <div id="leftpnl"> <table width="100%" border="0" cellspacing="0" cellpadding="0"> <tr> <td width="100%" valign="top" class="tblclear"> <!-- content -->. 
<script type="text/javascript" src="js/common_js.js"></script> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <script type="text/javascript"> var pathname; $(document).ready(function() {pathname = window.location.pathname;}); function onBack(s) { var position =pathname.indexOf("/", 2); var newPath = ""; var val = s.indexOf("?", 1); if(val>0) { newPath = s+"&redirect=true"; } else { newPath = s+"?redirect=true"; } window.location.replace(".."+pathname.substring(0,position)+"/"+newPath); } function downloadReport(repformat){ //window.location="downloadConsolidatedElectionReportPDF.do?OWASP_CSRFTOKEN=CN72-BGJW-G7FM-K1S3-P5FF-V1EN-IO4T-GHWU"; //document.forms["electionReportForm"].action="downloadConsolidatedElectionReportPDF.do?repformat="+repformat+"&OWASP_CSRFTOKEN=CN72-BGJW-G7FM-K1S3-P5FF-V1EN-IO4T-GHWU"; document.forms["electionReportForm"].action="downloadConsolidatedElectionReportPDF.do?reportformat="+repformat+"&OWASP_CSRFTOKEN=CN72-BGJW-G7FM-K1S3-P5FF-V1EN-IO4T-GHWU"; document.forms["electionReportForm"].method="POST"; document.getElementById('electionReportForm').target="_blank"; document.forms["electionReportForm"].submit(); } </script> <style type="text/css"> .data_link{ color:blue; display: block; text-decoration: none; font-size: 1em; font-weight: bolder; } .disable_link { cursor:default; color:blue; display: block; text-decoration: none; font-size: 1em; font-weight: bolder; } .data_link:VISITED { color:blue; display: block; text-decoration: none; font-size: 1em; font-weight: bolder; } .data_link:HOVER{ text-decoration: underline; } </style> </head> <body> <div id="frmcontent"> <div class="frmhd"> <table width="100%" class="tbl_no_brdr"> <tr> <td align="left" width="90%"> Consolidated Election</td> </tr> </table> </div> <div class="clear"></div> <div class="frmpnlbrdr"> <div 
class="frmpnlbg"> <div class="frmtxt"> <table width="100%" style="margin-bottom: 10px;" class="tbl_with_brdr"> <tr class="tblRowTitle tblclear" > <th align="left" ><b>State Name</b></th> <th align="left" ><b>Local Body Type</b></th> <th align="left" ><b>Election Term</b></th> <th align="left" ><b>Local Body Name</b></th> </tr> <tr class="tblRowB" style="color: blue;"> <th align="left" >ANDHRA PRADESH</th> <th align="left" >Village Panchayat</th> <th align="left" > 02-Aug-2013 To 01-Aug-2018 </th> <th align="left" >KODIHALLI</th> </tr> </table> <div class="frmhdtitle">Consolidated Election</div> <table width="100%" class="tbl_with_brdr"> <thead> <tr class="tblRowTitle tblclear"> <th align="center" width="5%" ><b>S.No.</b></th> <th align="left" width="9%"><b>Name</b></th> 0 <th align="left" width="9%"><b>Age</b></th> 1 <th align="left" width="9%"><b>Caste Category</b></th> 2 <th align="left" width="9%"><b>Gender</b></th> 3 <th align="left" width="9%"><b>Qualification</b></th> 4 <th align="left" width="9%"><b>Occupation</b></th> 5 <th align="left" width="9%"><b>Email Address</b></th> 6 <th align="left" width="9%"><b>Ward Name</b></th> 7 <th align="left" width="9%"><b>Reservation</b></th> 8 </tr> </thead> <tbody> <tr class="tblRowB"> <td align="center" >1</td> <td>Kambanna</td> <td>36</td> <td>OBC</td> <td>Male</td> <td>Middle or Lower Secondary</td> <td>N/A</td> <td> N/A </td> <td>N/A</td> <td > Yes (OBC / Others) </td> </tr> <tr class="tblRowA"> <td align="center" >2</td> <td>Ramesh</td> <td>39</td> <td>OBC</td> <td>Male</td> <td>Middle or Lower Secondary</td> <td>Workers not reporting any occupations</td> <td> N/A </td> <td>Ward no 1</td> <td > Yes (OBC / Others) </td> </tr> <tr class="tblRowB"> <td align="center" >3</td> <td>S.Manjunath</td> <td>29</td> <td>OBC</td> <td>Male</td> <td>Higher Secondary or Intermediate or Pre University or Senior Secondary</td> <td>Workers not reporting any occupations</td> <td> N/A </td> <td>Ward no 2</td> <td > No (General / Others) 
</td> </tr> <tr class="tblRowA"> <td align="center" >4</td> <td>Obuleshu</td> <td>48</td> <td>OBC</td> <td>Male</td> <td>Below Primary</td> <td>Workers not reporting any occupations</td> <td> N/A </td> <td>Ward no 3</td> <td > No (General / Others) </td> </tr> <tr class="tblRowB"> <td align="center" >5</td> <td>Mamatha</td> <td>24</td> <td>OBC</td> <td>Female</td> <td>Matriculation or Junior School Certificate or Secondary</td> <td>N/A</td> <td> N/A </td> <td>Ward no 4</td> <td > Yes (General / Female) </td> </tr> <tr class="tblRowA"> <td align="center" >6</td> <td>Shivamma</td> <td>38</td> <td>OBC</td> <td>Female</td> <td>Below Primary</td> <td>N/A</td> <td> N/A </td> <td>Ward no 5</td> <td > Yes (General / Female) </td> </tr> <tr class="tblRowB"> <td align="center" >7</td> <td>Hanumantappa</td> <td>46</td> <td>SC</td> <td>Male</td> <td>Illiterate</td> <td>N/A</td> <td> N/A </td> <td>Ward no 6</td> <td > No (General / Others) </td> </tr> <tr class="tblRowA"> <td align="center" >8</td> <td>Malingappa</td> <td>45</td> <td>SC</td> <td>Male</td> <td>Illiterate</td> <td>N/A</td> <td> N/A </td> <td>Ward no 7</td> <td > No (General / Others) </td> </tr> <tr class="tblRowB"> <td align="center" >9</td> <td>Kamalamma</td> <td>52</td> <td>OBC</td> <td>Female</td> <td>Illiterate</td> <td>N/A</td> <td> N/A </td> <td>Ward no 8</td> <td > Yes (OBC / Female) </td> </tr> <tr class="tblRowA"> <td align="center" >10</td> <td>Muddamma</td> <td>48</td> <td>OBC</td> <td>Female</td> <td>Illiterate</td> <td>N/A</td> <td> N/A </td> <td>Ward no 9</td> <td > Yes (General / Female) </td> </tr> <tr class="tblRowB"> <td align="center" >11</td> <td>Patta Tayamma</td> <td>45</td> <td>SC</td> <td>Female</td> <td>Middle or Lower Secondary</td> <td>N/A</td> <td> N/A </td> <td>Ward no 10</td> <td > Yes (SC / Female) </td> </tr> <tr class="tblRowA"> <td align="center" >12</td> <td>Sujatha</td> <td>35</td> <td>OBC</td> <td>Female</td> <td>Middle or Lower Secondary</td> <td>N/A</td> <td> N/A </td> 
<td>Ward no 11</td> <td > Yes (OBC / Female) </td> </tr> <tr class="tblRowB"> <td align="center" >13</td> <td>Kadurappa</td> <td>35</td> <td>SC</td> <td>Male</td> <td>Middle or Lower Secondary</td> <td>N/A</td> <td> N/A </td> <td>Ward no 12</td> <td > Yes (SC / Others) </td> </tr> </tbody> </table> <br /> <table width="100%" class="tbl_no_brdr"> <tr> <td align="center"> <input type="button" class="btn" onclick="onClose('welcome.do?OWASP_CSRFTOKEN=CN72-BGJW-G7FM-K1S3-P5FF-V1EN-IO4T-GHWU')" value=Close /> <input type="button" class="btn" onclick="this.disabled=true; this.value='Please Wait .!';onBack('consolidatedElectionReport.do?OWASP_CSRFTOKEN=CN72-BGJW-G7FM-K1S3-P5FF-V1EN-IO4T-GHWU&electionTermId=35107&stateId=28')" value=Back /> </td> </tr> </table> <form id="electionReportForm" name="electionReportForm" action="#" method="post"> <div align="center"><br/> <input type="button" class="btn" onclick="downloadReport('pdf');" value="Export to PDF" size="5" /> <input type="button" class="btn" onclick="downloadReport('xls');" value="Export to Excel" size="5" /> </div> </form> </div> <div class="myclass" style="font-family: Times; text-align: center; font-size: 10.0pt; color: white; font-weight: bold; border: 1px solid gray"> Report generated through Area Profiler (http://areaprofiler.gov.in)Thu Oct 02 22:34:20 IST 2014 </div> </div> </div> </div> </body> </html> </td> </tr> </table> </div> </div> <div class="clear"></div> <div id="footer"> <!-- Footer --> <html> <head> </head> <body> <table width="100%" class="tbl_no_brdr"> <tr> <td colspan="3" class="fotbrdr"></td> </tr> <tr> <td width="161" class="btmlogospace"><a href="http://www.negp.gov.in/" target= "_blank" ><img src="images/e_governance_logo.jpg" width="161" height="38" /></a></td> <td width="93" class="btmlogospace"><a href="http://www.panchayat.gov.in/" target= "_blank" ><img src="images/panchayatilogo.jpg" width="93" height="38" /></a></td> <td align="right" class="btmlogospace">Site is designed, hosted and 
maintained by National Informatics Centre<br /> Contents on this website is owned,updated and managed by the Ministry of Panchayati Raj</td> </tr> </table> </body> </html> </div> </div> </body> </html>
Here is an approach. It is not exactly the solution, but you can use it as a guide. You have to traverse the DOM tree and extract the values you want. I changed the class of the div you look for from frmtext to frmtxt, which is the class actually used in your HTML, and during the traversal you have to check whether each element was found before calling find() on it.

from bs4 import BeautifulSoup

with open('195778.html', encoding='utf-8') as rfile:
    rsoup = BeautifulSoup(rfile, 'html.parser')

# The div's class in the HTML is "frmtxt", not "frmtext"
div = rsoup.find('div', {'class': 'frmtxt'})
rows = div.find('table').find_all('tr')

with open('out.txt', 'w', encoding='utf-8') as outfile:
    for row in rows:
        x = y = None
        first_th = row.find('th')
        if first_th is not None:
            # Text of the <b> inside the first header cell
            b = first_th.find('b')
            if b is not None:
                x = b.get_text()
            # Text of the <b> inside the next header cell
            second_th = first_th.find_next('th')
            if second_th is not None:
                b2 = second_th.find('b')
                if b2 is not None:
                    y = b2.get_text()
        print(x, y)
        outfile.write(str(x) + "\t" + str(y) + "\n")
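If you would rather avoid the chain of None checks, a dependency-free alternative is the standard library's `html.parser.HTMLParser`; this is a swapped-in technique, not what the answer above uses. The sketch below is illustrative only: the class name `TableRowsParser` is hypothetical, the sample is a shortened copy of the candidates table from the question, and it collects the text of every th/td cell grouped by row, assuming tables are not nested.

```python
from html.parser import HTMLParser


class TableRowsParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>.

    Sketch only: assumes tables are not nested and rows/cells are closed.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the chunks (e.g. text inside <b>) and collapse whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (whitespace between tags) is ignored
        if self._cell is not None:
            self._cell.append(data)


# A shortened copy of the candidates table from the question
html = """<table width="100%" class="tbl_with_brdr">
<tr class="tblRowTitle tblclear"><th align="center"><b>S.No.</b></th><th><b>Name</b></th><th><b>Age</b></th></tr>
<tr class="tblRowB"><td align="center" >1</td><td>Kambanna</td><td>36</td></tr>
<tr class="tblRowA"><td align="center" >2</td><td>Ramesh</td><td>39</td></tr>
</table>"""

parser = TableRowsParser()
parser.feed(html)
parser.close()
for row in parser.rows:
    print("\t".join(row))
```

Each row comes back as a list of strings, so writing it out tab-separated, as the question does, is just `"\t".join(row)`. Restricting the output to one particular table (for example the second `tbl_with_brdr` table) would require tracking table depth or class, which this sketch does not do.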
Extracting table data from HTML with Python and BeautifulSoup
I'm new to Python and the BeautifulSoup library. I have tried many things, but with no luck. My HTML code could look like this:

<form method = "post" id="FORM1" name="FORM1">
<table cellpadding=0 cellspacing=1 border=0 align="center" bgcolor="#cccccc">
<tr>
<td class="producto"><b>Club</b><br>
<input value="CLUB TENIS DE MESA PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtClub" size="60" maxlength="55">
</td>
<tr>
<td colspan="2" class="producto"><b>Nombre Equipo</b><br>
<input value="C.T.M. PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtNomEqu" size="100" maxlength="80">
</td>
</tr>
<tr>
<td class="producto"><b>Telefono fijo</b><br>
<input value="63097005534" disabled class="txtmascaraform" type="TEXT" name="txtTelf" size="15" maxlength="10">
</td

I only need the text inside <b></b> and the value attribute of the <input> that follows it. Many thanks!
First find() your form by id, then find_all() the inputs inside it and read their value attribute:

from bs4 import BeautifulSoup

data = """<form method = "post" id="FORM1" name="FORM1">
<table cellpadding=0 cellspacing=1 border=0 align="center" bgcolor="#cccccc">
<tr>
<td class="producto"><b>Club</b><br>
<input value="CLUB TENIS DE MESA PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtClub" size="60" maxlength="55">
</td>
<tr>
<td colspan="2" class="producto"><b>Nombre Equipo</b><br>
<input value="C.T.M. PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtNomEqu" size="100" maxlength="80">
</td>
</tr>
<tr>
<td class="producto"><b>Telefono fijo</b><br>
<input value="63097005534" disabled class="txtmascaraform" type="TEXT" name="txtTelf" size="15" maxlength="10">
</td>
</tr>
</table>
</form>"""

soup = BeautifulSoup(data, "html.parser")
form = soup.find("form", {'id': "FORM1"})
# The value attribute of every <input> in the form
print([item.get('value') for item in form.find_all('input')])

# UPDATE for getting table cell values
table = form.find("table")
print([item.text.strip() for item in table.find_all('td')])

prints:

['CLUB TENIS DE MESA PORTOBAIL', 'C.T.M. PORTOBAIL', '63097005534']
['Club', 'Nombre Equipo', 'Telefono fijo']
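The answer above prints the labels and the input values as two separate lists, while the question asks for each <b> label together with its input value. A minimal sketch that pairs them directly, using the standard library's `html.parser.HTMLParser` instead of BeautifulSoup, is below. The class name `LabelValueParser` is hypothetical, and it assumes the layout from the question, where every cell holds one <b> label followed by one <input>.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Pair the text of each <b> element with the value of the next <input>.

    Assumes the question's layout: each cell has one <b> label followed by
    one <input value="...">.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []     # collected (label, value) tuples
        self._in_b = False  # currently inside a <b> element?
        self._label = None  # most recent label still waiting for its input

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self._in_b = True
            self._label = ""
        elif tag == "input" and self._label is not None:
            # attrs is a list of (name, value) pairs; "disabled" has value None
            self.pairs.append((self._label.strip(), dict(attrs).get("value")))
            self._label = None

    def handle_endtag(self, tag):
        if tag == "b":
            self._in_b = False

    def handle_data(self, data):
        if self._in_b:
            self._label += data


html = """<table>
<tr><td class="producto"><b>Club</b><br>
<input value="CLUB TENIS DE MESA PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtClub">
</td></tr>
<tr><td colspan="2" class="producto"><b>Nombre Equipo</b><br>
<input value="C.T.M. PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtNomEqu">
</td></tr>
<tr><td class="producto"><b>Telefono fijo</b><br>
<input value="63097005534" disabled class="txtmascaraform" type="TEXT" name="txtTelf">
</td></tr>
</table>"""

parser = LabelValueParser()
parser.feed(html)
parser.close()
print(parser.pairs)
# [('Club', 'CLUB TENIS DE MESA PORTOBAIL'), ('Nombre Equipo', 'C.T.M. PORTOBAIL'), ('Telefono fijo', '63097005534')]
```

Because the pairing happens per <b>/<input> sequence rather than by zipping two lists, a cell that is missing its label no longer shifts every later value onto the wrong label; `dict(parser.pairs)` then gives a label-to-value lookup.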