Selecting Current Day in Selenium - python

I'm trying to open a calendar and select only the current day in the displayed month, using Selenium.
So far, I have this:
# Click on the calendar field to open it
self.search_field = self.driver.find_element_by_xpath("//form[@id='searchform']/div/div/div[2]/div/input")
self.search_field.click()
# Get current day
current_date = date.today()
today_day = current_date.day
print(today_day)
Now that I've got the "current day," how would I select that "current day" within the given calendar?
[Edit]
Calendar HTML
<div id="ui-datepicker-div" class="ui-datepicker ui-widget ui-widget-content ui-helper-clearfix ui-corner-all" style="position: absolute; top: 151px; left: 222.5px; z-index: 1; display: block;">
<div class="ui-datepicker-header ui-widget-header ui-helper-clearfix ui-corner-all">
<table class="ui-datepicker-calendar">
<thead>
<tbody>
<tr>
<tr>
<tr>
<tr>
<td class=" ui-datepicker-week-end " data-year="2015" data-month="10" data-event="click" data-handler="selectDay">
<td class=" " data-year="2015" data-month="10" data-event="click" data-handler="selectDay">
<td class=" ui-datepicker-days-cell-over ui-datepicker-today" data-year="2015" data-month="10" data-event="click" data-handler="selectDay">
<a class="ui-state-default ui-state-highlight" href="#">24</a>
</td>
<td class=" ui-datepicker-unselectable ui-state-disabled ">
<td class=" ui-datepicker-unselectable ui-state-disabled ">
<td class=" ui-datepicker-unselectable ui-state-disabled ">
<td class=" ui-datepicker-week-end ui-datepicker-unselectable ui-state-disabled ">
</tr>
<tr>
</tbody>
</table>
</div>
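One possible approach, sketched under the assumption that the widget is a standard jQuery UI datepicker as the markup above suggests: the cell for today carries the ui-datepicker-today class, so the link inside it can be located directly instead of matching on the day number. The helper names today_link_selectors and click_today are made up for this illustration, and the XPath variant is only a fallback guess for calendars that do not mark today with a class.

```python
from datetime import date


def today_link_selectors(today=None):
    """Return (css, xpath) locators for today's link in a jQuery UI datepicker.

    Both locators assume the markup shown in the question.
    """
    today = today or date.today()
    # jQuery UI marks the current day's cell with this class.
    css = "#ui-datepicker-div td.ui-datepicker-today a"
    # Fallback: match the selectable cell whose link text is today's day number.
    xpath = (
        "//div[@id='ui-datepicker-div']"
        "//td[not(contains(@class, 'ui-datepicker-unselectable'))]"
        f"/a[normalize-space(text())='{today.day}']"
    )
    return css, xpath


def click_today(driver, timeout=10):
    """Wait for today's link to become clickable, then click it."""
    # Imported here so the locator helper above can be used without Selenium.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    css, _ = today_link_selectors()
    link = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, css))
    )
    link.click()
```

After the input field has been clicked, click_today(self.driver) would be called. The explicit wait is there because the calendar is rendered only once the field receives the click.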

Related

Returning None when scraping href using Python

Hi, I'm trying to scrape the product name "151 Heavy Duty Rubber Gloves - Ex Large" from a table with the following HTML (copied from the inspector). Can someone please help with the right Python script?
[<table border="0" class="ProductBox" id="Added0">
<tr>
<td align="center" colspan="2">
<div style="width:100%;float:left;display:inline;float:left;height:37px;"><div style="float:left;font-size:16px;font-family: 'Roboto Condensed', sans-serif;color:white;margin-top:4%;margin-left:6%;"> </div></div>
</td></tr><tr>
<td align="center" colspan="2" height="60px;" valign="top">
<div class="PromoPriceText"> <br/><br/></div><div class="StdPrice">£0.69</div><div class="UnitCost">(£0.69/Unit)</div>
</td>
</tr>
<tr>
<td align="center" colspan="2" height="185">
<a href="/products/DetailsPortal.asp?product_code=104373&Page=Products&BreadPath=/products/gridlist.asp?DeptCode=14*prodgroup=211" style=" line-height: 20px; padding-left: 0px;">
<img alt="" class="effectfront" id="prod" src="/~uldir/104373t.jpg" style="height:165px !important;"/></a>
</td>
</tr>
<tr>
<td class="ProdDetails" style="padding-left:10px;padding-right:10px;margin-bottom:5px;"><input name="product_code" type="hidden" value="104373"/>104373</td>
<td align="right" class="ProdDetails" style="padding-left:10px;padding-right:10px;margin-bottom:5px;">
</td>
</tr>
<tr>
<td class="ProdDetails" colspan="1" style="padding-left:10px;padding-right:10px;margin-bottom:5px;">
POR 0%
</td>
<td align="right" class="ProdDetails" colspan="1" style="padding-left:10px;padding-right:10px;margin-bottom:5px;">
VAT 20%
</td>
</tr>
<tr>
<td class="ProdDetails" colspan="2" style="padding-left:10px;padding-right:10px;margin-bottom:5px;height:50px;">
<a href="/products/DetailsPortal.asp?product_code=104373&Page=Products&BreadPath=/products/gridlist.asp?DeptCode=14*prodgroup=211" style=" line-height: 20px; padding-left: 0px;">
**151 Heavy Duty Rubber Gloves - Ex Large**</a></td>
</tr>
<tr>
<td class="ProdDetails" colspan="1" style="padding-left:10px;padding-right:10px;margin-bottom:5px;">
1s x 1
</td>
<td class="ProdDetails" colspan="1" style="padding-left:10px;padding-right:10px;margin-bottom:5px;float:right;width:98%;text-align:right;">
<div class="tooltip">
<div class="IconWishNS" id="IconWishNS104373" onclick="AddToWish('104373','A')" style="display:inline-block;">
<span class="tooltiptext tooltip-bottom" style="font-size:12px;">Add to Wish List</span></div>
</div>
<span class="OKStatus">In Stock </span>
</td>
</tr>
<tr>
<td colspan="2" style="padding-left:10px;padding-right:10px;margin-bottom:5px;">
<table style="margin-top : 10px;" width="100%">
<tr>
<td>
<img align="middle" alt="Take 1 Off Qty" src="/images/minus.png"/>
</td>
<td>
<input class="iQtyBox" id="104373_qty" maxlength="4" name="104373_qty" oninput="this.value=(parseInt(this.value)||'')" tabindex="1" type="text" value="1"/>
</td>
<td>
<img align="middle" alt="Add 1 To Qty" src="/images/add.png"/>
</td>
<td align="right">
<button class="subBlackButtonDiv subButtonDiv" style="width:70px;margin:0px;" type="button" value="add">Add</button>
</td>
</tr>
</table>
I tried to use the following:
r = s.get(url)
soup = BeautifulSoup(r.text, 'lxml')
table = soup.find_all('table')
for i in table:
    links = [link.get('href') for link in i.find_all('a')]
    print(links)
which unfortunately returns: ['/products/DetailsPortal.asp?product_code=104373&Page=Products&BreadPath=/products/gridlist.asp?DeptCode=14*prodgroup=211', '/products/DetailsPortal.asp?product_code=104373&Page=Products&BreadPath=/products/gridlist.asp?DeptCode=14*prodgroup=211', '#', '#', '#']
You can use the td.ProdDetails a selector (an a tag inside a td with the class ProdDetails) to target the text you are interested in, then call .strip() a few times to remove the surrounding whitespace and asterisks:
DATA = """<table border="0" class="ProductBox" id="Added0">
<tr>
...
</table>"""
from bs4 import BeautifulSoup
from typing import Optional
def extract_name(data: str) -> Optional[str]:
    soup = BeautifulSoup(data, "html.parser")
    links = soup.select("td.ProdDetails a")
    if len(links) >= 1:
        return links[0].text.strip().strip("*").strip()
    else:
        return None
print(extract_name(DATA))
# like above
r = s.get(url)
soup = BeautifulSoup(r.text, 'lxml')
tables = soup.find_all('table')
# extract_name expects an HTML string, so convert the Tag back to markup
text = extract_name(str(tables[0]))
Output: 151 Heavy Duty Rubber Gloves - Ex Large

I need to pass the result of soup.find_all to another soup.find_all function to filter the HTML code for a project

I have this HTML code for example:
<table class="nested4">
<tr>
<td colspan="1"></td>
<td colspan="2">
<h2 class="zeroMargin" id="govtMsg" visible="false"></h2>
</td>
<td colspan="2">
<h2 class="zeroMargin "> Net Metering Conn. </h2>
</td>
<td colspan="2">
<h2 class="zeroMargin" hidden> Life Line Consumer</h2>
</td>
</tr>
<tr>
<td colspan="2">
<p style="margin: 0; text-align: left; padding-left: 5px">
<span>NAME & ADDRESS</span>
<br />
<span>MUHAMMAD AMIN </span>
<br />
<span>S/O MUHAMMAD KHAN </span>
<br />
<span>H-NO.38 MARGALLA ROAD </span>
<br />
<span>F-6/3 ISLAMABAD3 </span>
<br />
<span></span>
</p>
</td>
<td colspan="3" style="text-align: left">
<h2 class="color-red">Say No To Corruption</h2>
<span style="font-size: 8pt; color: #78578e"> MCO Date : 10-Aug-2018</span>
<br />
</td>
<td>
<h3 style="font-size: 14pt;"> </h3>
<h2> <br /> </h2>
</td>
</tr>
<tr>
<td style="margin-top: 0;" class="border-b">
<br />
</td>
<td colspan="1" style="margin-top: 0;" class="border-b">
</td>
<td colspan="1" style="margin-top: 0;" class="border-b">
</td>
</tr>
<tr style="height: 7%;" class="border-tb">
<td style="width: 130px" class="border-r">
<h4>METER NO</h4>
</td>
<td style="width: 90px" class="border-r">
<h4>PREVIOUS READING</h4>
</td>
<td style="width: 90px" class="border-r">
<h4>PRESENT READING</h4>
</td>
<td style="width: 60px" class="border-r">
<h4>MF</h4>
</td>
<td style="width: 60px" class="border-r">
<h4>UNITS</h4>
</td>
<td>
<h4>STATUS</h4>
</td>
</tr>
<tr style="height: 30px" class="content">
<td class="border-r">
3-P I 3301539<br> I 3301539<br> E 3301539<br> E 3301539<br>
</td>
<td class="border-r">
78693<br>16823<br>19740<br>8<br>
</td>
<td class="border-r">
80086<br>17210<br>20139<br>8<br>
</td>
<td class="border-r">
1<br>1<br>1<br>1<br>
</td>
<td class="border-r">
1393<br>387<br>399<br>0<br>
</td>
<td>
</td>
</tr>
<tr id="roshniMsg" style="height: 30px" class="content">
<td colspan="6">
<div style="width: 452pt">
<img style="max-width: 100%; max-height: 35%" src="/images/companies/iesco/roshniMsg.jpg"
alt="Roshni Message" />
</div>
</td>
</tr>
</table>
From this table I want to extract the paragraph and from there I want to get all the span tags in that paragraph.
I used soup.find_all() to get the table, but I don't know how to apply the function to that result so that I can find the paragraph inside the table and then the span tags inside that paragraph.
This is the Python code I wrote:
soup = BeautifulSoup(string, 'html.parser')
#Getting the table tag
results = soup.find_all('table', attrs={'class':'nested4'})
#Getting the paragraph tag
results = soup.find_all('p', attrs={'style':'margin: 0; text-align: left; padding-left: 5px'})
#Getting all the span tags
results = soup.find_all('span', attrs={})
I just want help on how to get the paragraphs within the table. And then how to get the spans within the paragraph as I am getting the spans in all of the original HTML code. I don't know how to pass the bs4 object list back to the soup object to use soup.find_all iteratively.
from bs4 import BeautifulSoup
html = '''
<table class="nested4">
<tr>
<td colspan="1"></td>
<td colspan="2">
<h2 class="zeroMargin" id="govtMsg" visible="false"></h2>
</td>
<td colspan="2">
<h2 class="zeroMargin "> Net Metering Conn. </h2>
</td>
<td colspan="2">
<h2 class="zeroMargin" hidden> Life Line Consumer</h2>
</td>
</tr>
<tr>
<td colspan="2">
<p style="margin: 0; text-align: left; padding-left: 5px">
<span>NAME & ADDRESS</span>
<br />
<span>MUHAMMAD AMIN </span>
<br />
<span>S/O MUHAMMAD KHAN </span>
<br />
<span>H-NO.38 MARGALLA ROAD </span>
<br />
<span>F-6/3 ISLAMABAD3 </span>
<br />
<span></span>
</p>
</td>
<td colspan="3" style="text-align: left">
<h2 class="color-red">Say No To Corruption</h2>
'''
soup = BeautifulSoup(html, 'html.parser')
spans = soup.select_one('table.nested4').select('span')
for span in spans:
    print(span.text)
This returns:
NAME & ADDRESS
MUHAMMAD AMIN
S/O MUHAMMAD KHAN
H-NO.38 MARGALLA ROAD
F-6/3 ISLAMABAD3
If you have one table:
soup = BeautifulSoup(string, 'html.parser')
table = soup.find('table', attrs={'class': 'nested4'})
p = table.find('p', attrs={'style': 'margin: 0; text-align: left; padding-left: 5px'})
results = p.find_all('span')
for result in results:
    print(result.get_text(strip=True))
If you have a list of tables:
soup = BeautifulSoup(string, 'html.parser')
for table in soup.find_all('table', attrs={'class': 'nested4'}):
    for p in table.find_all('p', attrs={'style': 'margin: 0; text-align: left; padding-left: 5px'}):
        for span in p.find_all('span'):
            print(span.get_text(strip=True))
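The nested lookups can also be written as a single CSS selector, since the descendant combinator (a space) scopes each step to the element matched before it. This is only a sketch on a trimmed copy of the table from the question; the extra span after the table is a made-up line added to show that spans outside the paragraph are not picked up.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's table, plus one hypothetical span outside it.
HTML = """
<table class="nested4">
  <tr>
    <td colspan="2">
      <p style="margin: 0; text-align: left; padding-left: 5px">
        <span>NAME &amp; ADDRESS</span><br />
        <span>MUHAMMAD AMIN </span><br />
        <span></span>
      </p>
    </td>
    <td><span style="font-size: 8pt"> MCO Date : 10-Aug-2018</span></td>
  </tr>
</table>
<span>outside the table</span>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Only spans inside a <p> inside table.nested4 match; empty spans are dropped.
lines = [
    span.get_text(strip=True)
    for span in soup.select("table.nested4 p span")
    if span.get_text(strip=True)
]
print(lines)
```

Matching on the tag structure rather than on the full style attribute is a design choice here: the selector keeps working if the inline style changes slightly.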

Problem extracting text of td from table row (tr) with scrapy

I am parsing data table from the following URL:
https://www.signalstart.com/search-signals
In particular, I am trying to extract the data from the table rows.
The table row has a series of table-data cells:
<table class="table table-striped table-bordered dataTable table-hover" id="searchSignalsTable">
<thead>
<tr>
<th class="sorting sorting_asc">Rank</th>
<th class="sorting ">Name</th>
<th class="sorting ">Gain</th>
<th class="sorting ">Pips</th>
<th class="sorting ">DD</th>
<th class="sorting ">Trades</th>
<th class="sorting ">Type</th>
<th>Monthly</th>
<th>Chart</th>
<th class="sorting ">Price</th>
<th class="sorting " style="width: 40px">Age</th>
<th class="sorting " style="width: 70px">Added</th>
<th>Action</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center;"> - </td>
<td><a class="pointer" target="_blank" href="https://www.signalstart.com/analysis/joker-1k/110059">Joker 1k</a> </td>
<td><span class="red">-9.99%</span></td>
<td><span class="green">2,092.3</span></td>
<td>15.3%</td>
<td>108</td>
<td>Real</td>
<td><span class="monthlySparkline" id="monthlySpark110059"><canvas width="12" height="25" style="display: inline-block; vertical-align: top; width: 12px; height: 25px;"></canvas></span></td>
<td><span class="dayliSparkline" id="dayliSpark110059"><canvas width="100" height="25" style="display: inline-block; vertical-align: top; width: 100px; height: 25px;"></canvas></span></td>
<td>$30</td>
<td>
1m 24d
</td>
<td>
Mar 29, 2020
</td>
<td><a onclick="getMasterPricingData('110059');" data-toggle="modal"><button id="subscribeToMasterBtn110059" class="btn btn-circle btn-sm green" type="button">Copy</button></a>
<div style="display: none;">
<input type="hidden" class="monthlyData" oid="110059" value="-1.78,-3.68,-4.86">
<input type="hidden" class="dailyGrowthData" oid="110059" value="0.00,-0.03,-1.78,-5.69,-6.75,-5.59,-7.61,-5.31,-6.20,-3.81,-4.40,-8.00,-2.88,-3.78,-4.38,-0.20,-5.40,-10.66,-13.69,-12.51,-13.23,-9.99">
<input type="hidden" class="dailyEquityData" oid="110059" value="0.00,-0.23,-1.41,-5.02,-6.25,-4.29,-6.68,-3.91,-5.37,-4.10,-4.40,-3.59,-1.78,-1.75,-2.65,-0.21,-4.87,-10.76,-13.90,-11.58,-13.23,-10.18">
</div>
</td>
</tr>
<tr>
<td style="text-align: center;"> - </td>
<td><a class="pointer" target="_blank" href="https://www.signalstart.com/analysis/fxabakus/56043">FXabakus</a> </td>
<td><span class="red">-19.57%</span></td>
<td><span class="red">-8,615.2</span></td>
<td>42%</td>
<td>1642</td>
<td>Real</td>
<td><span class="monthlySparkline" id="monthlySpark56043"><canvas width="80" height="25" style="display: inline-block; vertical-align: top; width: 80px; height: 25px;"></canvas></span></td>
<td><span class="dayliSparkline" id="dayliSpark56043"><canvas width="100" height="25" style="display: inline-block; vertical-align: top; width: 100px; height: 25px;"></canvas></span></td>
<td>$30</td>
<td>
1y 7m
</td>
<td>
May 4, 2019
</td>
<td><a onclick="getMasterPricingData('56043');" data-toggle="modal"><button id="subscribeToMasterBtn56043" class="btn btn-circle btn-sm green" type="button">Copy</button></a>
<div style="display: none;">
<input type="hidden" class="monthlyData" oid="56043" value="1.22,1.35,3.92,1.35,-1.57,1.77,2.01,1.11,0.38,-14.89,-14.70,-5.21,5.97,7.03,-17.54,2.92,3.11,-8.94,13.38,1.77">
<input type="hidden" class="dailyGrowthData" oid="56043" value="-27.87,-29.29,-29.01,-26.76,-25.76,-25.59,-30.57,-30.13,-29.78,-29.60,-29.25,-28.34,-28.07,-27.89,-25.20,-25.08,-23.66,-23.46,-21.54,-21.02,-21.62,-20.28,-18.31,-26.97,-27.48,-27.00,-28.21,-24.20,-23.46,-30.04,-31.37,-34.62,-33.84,-32.87,-32.20,-30.99,-30.43,-30.30,-29.75,-27.64,-27.45,-24.34,-24.71,-24.09,-24.15,-21.48,-21.08,-20.97,-19.54,-19.57">
<input type="hidden" class="dailyEquityData" oid="56043" value="-27.87,-29.29,-28.89,-26.76,-25.76,-28.10,-34.47,-32.34,-31.54,-40.80,-32.76,-32.90,-33.50,-30.65,-25.37,-25.05,-22.88,-23.29,-21.54,-21.02,-21.54,-20.90,-19.11,-27.76,-35.15,-29.17,-27.79,-24.20,-26.23,-34.32,-35.95,-51.20,-33.84,-32.76,-32.71,-31.62,-30.43,-39.93,-29.75,-27.64,-28.35,-27.62,-28.41,-24.20,-24.51,-22.06,-21.08,-20.97,-18.82,-30.27">
</div>
</td>
</tr>
<tr>
<td style="text-align: center;"> - </td>
<td><a class="pointer" target="_blank" href="https://www.signalstart.com/analysis/af-investing-pro-final/122603">AF Investing Pro Final</a> </td>
<td><span class="green">56.69%</span></td>
<td><span class="green">29,812</span></td>
<td>8.6%</td>
<td>476</td>
<td>Real</td>
<td><span class="monthlySparkline" id="monthlySpark122603"><canvas width="8" height="25" style="display: inline-block; vertical-align: top; width: 8px; height: 25px;"></canvas></span></td>
<td><span class="dayliSparkline" id="dayliSpark122603"><canvas width="100" height="25" style="display: inline-block; vertical-align: top; width: 100px; height: 25px;"></canvas></span></td>
<td>$250</td>
<td>
17d 12h
</td>
<td>
Apr 30, 2020
</td>
<td><a onclick="getMasterPricingData('122603');" data-toggle="modal"><button id="subscribeToMasterBtn122603" class="btn btn-circle btn-sm green" type="button">Copy</button></a>
<div style="display: none;">
<input type="hidden" class="monthlyData" oid="122603" value="55.18,0.98">
<input type="hidden" class="dailyGrowthData" oid="122603" value="-0.02,0.04,54.78,55.02,55.18,55.82,55.86,55.99,56.06,56.25,56.69">
<input type="hidden" class="dailyEquityData" oid="122603" value="-8.60,16.85,54.86,54.11,55.44,55.85,54.38,52.15,45.00,51.07,56.25">
</div>
</td>
</tr>
<tr>
<td style="text-align: center;"> - </td>
<td><a class="pointer" target="_blank" href="https://www.signalstart.com/analysis/rapid-growth/111340">Rapid growth</a> </td>
<td><span class="green">130.78%</span></td>
<td><span class="green">1,102.9</span></td>
<td>44.3%</td>
<td>126</td>
<td>Real</td>
<td><span class="monthlySparkline" id="monthlySpark111340"><canvas width="12" height="25" style="display: inline-block; vertical-align: top; width: 12px; height: 25px;"></canvas></span></td>
<td><span class="dayliSparkline" id="dayliSpark111340"><canvas width="100" height="25" style="display: inline-block; vertical-align: top; width: 100px; height: 25px;"></canvas></span></td>
<td>$31</td>
<td>
2m 8d
</td>
<td>
Apr 1, 2020
</td>
<td><a onclick="getMasterPricingData('111340');" data-toggle="modal"><button id="subscribeToMasterBtn111340" class="btn btn-circle btn-sm green" type="button">Copy</button></a>
<div style="display: none;">
<input type="hidden" class="monthlyData" oid="111340" value="87.85,18.28,3.87">
<input type="hidden" class="dailyGrowthData" oid="111340" value="0.00,0.64,1.40,1.40,1.90,2.91,7.53,8.21,11.19,11.30,17.60,19.60,23.03,37.74,47.75,54.75,59.91,69.79,73.60,79.36,87.85,93.14,93.40,94.70,95.93,96.01,99.95,100.71,101.85,102.10,102.12,104.36,108.76,110.11,110.14,110.23,112.58,115.10,115.54,117.17,121.24,122.19,123.40,124.18,124.88,124.89,130.09,130.78">
<input type="hidden" class="dailyEquityData" oid="111340" value="-1.80,0.67,0.97,1.91,-0.64,2.58,6.82,6.72,8.65,8.46,16.29,17.71,19.96,34.10,47.24,51.91,59.07,69.79,73.58,79.26,88.01,91.03,93.43,87.85,96.19,95.80,100.29,95.63,98.94,101.71,98.33,104.12,108.26,108.46,86.24,108.42,112.83,114.51,94.42,116.29,120.16,121.93,123.05,115.67,122.81,124.45,130.47,130.14">
</div>
</td>
</tr>
<tr>
<td style="text-align: center;"> - </td>
<td><a class="pointer" target="_blank" href="https://www.signalstart.com/analysis/dream-presentation-1/66543">Dream Presentation 1</a> </td>
<td><span class="red">-99.9%</span></td>
<td><span class="red">-2,724.1</span></td>
<td>99.9%</td>
<td>1612</td>
<td>Real</td>
<td><span class="monthlySparkline" id="monthlySpark66543"><canvas width="28" height="25" style="display: inline-block; vertical-align: top; width: 28px; height: 25px;"></canvas></span></td>
<td><span class="dayliSparkline" id="dayliSpark66543"><canvas width="100" height="25" style="display: inline-block; vertical-align: top; width: 100px; height: 25px;"></canvas></span></td>
<td>$30</td>
<td>
6m 13d
</td>
<td>
Nov 8, 2019
</td>
<td><a onclick="getMasterPricingData('66543');" data-toggle="modal"><button id="subscribeToMasterBtn66543" class="btn btn-circle btn-sm green" type="button">Copy</button></a>
<div style="display: none;">
<input type="hidden" class="monthlyData" oid="66543" value="-100.14,-98.54,-98.79,-91.71,-98.23,-100.00,-88.82">
<input type="hidden" class="dailyGrowthData" oid="66543" value="24.18,-99.90,-99.89,-99.88,-99.88,-99.88,-99.87,-99.87,-99.86,-99.84,-99.83,-99.90,-99.89,-99.90,-99.90,-99.81,-99.81,-99.80,-99.90,-99.90,-99.86,-99.83,-99.79,-99.90,-99.90,-99.90,-99.88,-99.89,-99.89,-99.88,-99.82,-99.74,-99.85,-99.37,-99.88,-99.90,-99.90,-99.90,-99.90,-99.87,-99.83,-99.80,-99.75,-99.64,-99.56,-99.90,-99.90">
<input type="hidden" class="dailyEquityData" oid="66543" value="7.87,-99.90,-99.89,-99.88,-99.88,-99.88,-99.88,-99.87,-99.86,-99.84,-99.83,-99.90,-99.89,-99.90,-99.89,-99.83,-99.88,-99.88,-99.90,-99.90,-99.87,-99.83,-99.84,-99.72,-99.90,-99.90,-99.88,-99.89,-99.88,-99.92,-99.86,-99.74,-99.86,-99.39,-99.88,-99.90,-99.90,-99.90,-99.90,-99.87,-99.83,-99.79,-99.76,-99.63,-99.55,-100.16,-99.83">
</div>
</td>
</tr>
<tr>
<td style="text-align: center;"> - </td>
<td><a class="pointer" target="_blank" href="https://www.signalstart.com/analysis/limerence-ea-suite-3/93679">Limerence EA Suite 3</a> </td>
<td><span class="green">1,246.66%</span></td>
<td><span class="green">199.8</span></td>
<td>34.2%</td>
<td>8</td>
<td>Real</td>
<td><span class="monthlySparkline" id="monthlySpark93679"><canvas width="20" height="25" style="display: inline-block; vertical-align: top; width: 20px; height: 25px;"></canvas></span></td>
<td><span class="dayliSparkline" id="dayliSpark93679"><canvas width="100" height="25" style="display: inline-block; vertical-align: top; width: 100px; height: 25px;"></canvas></span></td>
<td>$75</td>
<td>
7m 11d
</td>
<td>
Feb 11, 2020
</td>
<td><a onclick="getMasterPricingData('93679');" data-toggle="modal"><button id="subscribeToMasterBtn93679" class="btn btn-circle btn-sm green" type="button">Copy</button></a>
<div style="display: none;">
<input type="hidden" class="monthlyData" oid="93679" value="95.40,82.01,94.38,87.49,3.90">
<input type="hidden" class="dailyGrowthData" oid="93679" value="0.00,95.40,255.64,591.28,552.49,1234.12,1196.10,1246.66">
<input type="hidden" class="dailyEquityData" oid="93679" value="0.00,95.40,255.64,591.28,1034.76,1234.12,1196.10,1246.66">
</div>
</td>
</tr>
<tr>
<td style="text-align: center;"> - </td>
<td><a class="pointer" target="_blank" href="https://www.signalstart.com/analysis/easy-money/31727">Easy Money</a> </td>
<td><span class="red">-99.9%</span></td>
<td><span class="green">2,430.6</span></td>
<td>100%</td>
<td>1095</td>
<td>Real</td>
<td><span class="monthlySparkline" id="monthlySpark31727"><canvas width="96" height="25" style="display: inline-block; vertical-align: top; width: 96px; height: 25px;"></canvas></span></td>
<td><span class="dayliSparkline" id="dayliSpark31727"><canvas width="100" height="25" style="display: inline-block; vertical-align: top; width: 100px; height: 25px;"></canvas></span></td>
<td>$30</td>
<td>
2y 2m
</td>
<td>
Apr 1, 2018
</td>
<td><a onclick="getMasterPricingData('31727');" data-toggle="modal"><button id="subscribeToMasterBtn31727" class="btn btn-circle btn-sm green" type="button">Copy</button></a>
<div style="display: none;">
<input type="hidden" class="monthlyData" oid="31727" value="6.22,-6.15,22.04,-5.08,0.08,12.08,-69.31,-99.82,245.26,88.44,113.73,52.29,25.38,77.72,-29.07,-24.73,-86.48,-89.27,195.77,-7.65,-99.98,278.89,-69.98,-65.48">
<input type="hidden" class="dailyGrowthData" oid="31727" value="-99.66,-99.69,-99.72,-99.73,-99.77,-99.77,-99.78,-99.81,-99.90,-99.90,-99.89,-99.84,-99.83,-99.82,-99.81,-99.75,-99.78,-99.77,-99.79,-99.78,-99.77,-99.48,-99.46,-99.36,-99.34,-99.33,-99.33,-99.31,-99.33,-99.34,-99.40,-99.45,-99.33,-99.58,-99.65,-99.73,-99.71,-99.70,-99.68,-99.68,-99.69,-99.68,-99.71,-99.68,-99.80,-99.80,-99.77,-99.81,-99.84,-99.90">
<input type="hidden" class="dailyEquityData" oid="31727" value="-99.66,-99.69,-99.73,-99.70,-99.85,-99.89,-99.95,-99.77,-99.85,-99.90,-99.88,-99.84,-99.83,-99.82,-99.79,-99.75,-99.78,-99.77,-99.70,-99.68,-99.59,-99.48,-99.46,-99.36,-99.34,-99.33,-99.32,-99.25,-99.30,-99.34,-99.37,-99.37,-99.35,-99.58,-99.61,-99.73,-99.71,-99.69,-99.68,-99.68,-99.68,-99.68,-99.71,-99.68,-99.80,-99.76,-99.73,-99.79,-99.80,-99.89">
</div>
</td>
</tr>
</tbody>
</table>
My code successfully extracts the data from the first table-data cell (the rank). But it is showing as blank for the second table data cell (the name). What is wrong with this source code:
import scrapy
from behold import Behold
class SignalStartSpider(scrapy.Spider):
    name = 'signalstart'
    start_urls = [
        'https://www.signalstart.com/search-signals',
    ]
    def parse(self, response):
        for provider in response.xpath("//div[@class='row']//tr"):
            yield {
                'rank': provider.xpath('td[1]/text()').get(),
                'name': provider.xpath('td[2]/text()').get(),
            }
UPDATE
I am now iterating over the td cells within tr and getting the td cells, but my final problem is: how to get the text from the td cells that I have?
import scrapy
from behold import Behold
class SignalStartSpider(scrapy.Spider):
    name = 'signalstart'
    start_urls = [
        'https://www.signalstart.com/search-signals',
    ]
    def parse(self, response):
        cols = "rank name gain pips drawdown trades type monthly chart price age added action"
        skip = [9,13]
        td = dict()
        for i, col in enumerate(cols.split()):
            td[i] = col
        Behold().show('td')
        for provider in response.xpath("//div[@class='row']//tr"):
            data_row = dict()
            for i, datum in enumerate(provider.xpath('td')):
                if i in skip:
                    continue
                data_row[td[i]] = datum
                # Behold().show('datum')
            yield data_row
The correct answer was provided by gallaecio_ in the Scrapy IRC channel - here is the code:
import scrapy
from behold import Behold
class SignalStartSpider(scrapy.Spider):
    name = 'signalstart'
    start_urls = [
        'https://www.signalstart.com/search-signals',
    ]
    def parse(self, response):
        cols = "rank name gain pips drawdown trades type monthly chart price age added action"
        skip = [9,13]
        td = dict()
        for i, col in enumerate(cols.split()):
            td[i] = col
        Behold().show('td')
        for provider in response.xpath("//div[@class='row']//tr"):
            data_row = dict()
            for i, datum in enumerate(provider.xpath('td/text()')):
                if i in skip:
                    continue
                data_row[td[i]] = datum.get()
                # Behold().show('datum')
            yield data_row
For more involved cases, you may need https://github.com/TeamHG-Memex/html-text
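A likely reason the name came back blank in the first attempt is that td[2]/text() selects only text nodes that are direct children of the cell, while the name sits inside the nested a element. The sketch below illustrates that distinction with the standard library's xml.etree.ElementTree instead of Scrapy, on a trimmed, well-formed copy of one row; cell_text is a hypothetical helper name. In a Scrapy selector, an XPath such as normalize-space(td[2]) would be the closest equivalent.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of one row from the question's table.
ROW = """
<tr>
  <td style="text-align: center;"> - </td>
  <td><a class="pointer" href="https://www.signalstart.com/analysis/joker-1k/110059">Joker 1k</a> </td>
  <td><span class="red">-9.99%</span></td>
  <td>15.3%</td>
</tr>
"""


def cell_text(td):
    """Join every text node under the cell and collapse whitespace."""
    return " ".join("".join(td.itertext()).split())


row = ET.fromstring(ROW)
cells = row.findall("td")

# Direct text only: what sits in the cell before its first child element.
direct = [(td.text or "").strip() for td in cells]
# All descendant text: includes text nested inside <a> and <span>.
full = [cell_text(td) for td in cells]

print(direct)
print(full)
```

The first list has empty strings for the name and gain cells, the second contains their nested text.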

Python Beautiful Soup Iterate over Multiple Tables

I'm trying to find multiple tables by their CSS class names, but initially I only get the class name itself in the output. I want to loop over each of the small tables; in each one, every row contains player info, with the tds holding the attributes of each player. Why doesn't what I have actually print the table contents to begin with? I want to confirm I have made this first step right before I go on into the tr and tds for each mini table. I think part of the issue is that the first table.
My program -
import requests
from bs4 import BeautifulSoup
#url = 'https://www.skysports.com/premier-league-table'
base_url = 'https://www.skysports.com'
# Squad Data
squad_url = base_url + '/liverpool-squad'
squad_r = requests.get(squad_url)
print(squad_r.status_code)
premier_squad_soup = BeautifulSoup(squad_r.text, 'html.parser')
premier_squad_table = premier_squad_soup.find_all = ('table', {'class': 'table -small no-wrap football-squad-table '})
print(premier_squad_table)
HTML -
each table looks like the following but with a different title
<table class="table -small no-wrap football-squad-table " title="Goalkeeper">
<colgroup>
<col class="" style="">
<col class="digit-4 -bp30-hdn">
<col class="digit-3 ">
<col class="digit-3 ">
<col class="digit-3 ">
</colgroup>
<thead>
<tr class="text-s -interact text-h6" style="">
<th class=" text-h4 -txt-left" title="">Goalkeeper</th>
<th class=" text-h6" title="Played">Pld</th>
<th class=" text-h6" title="Goals">G</th>
<th class=" text-h6" title="Yellow Cards ">YC</th>
<th class=" text-h6" title="Red Cards">RC</th>
</tr>
</thead>
<tbody>
<tr class="text-h6 -center">
<td>
<a href="/football/player/141016/alisson-ramses-becker">
<div class="row-table -2cols">
<span class="col span4/5 -txt-left"><h6 class=" text-h5">Alisson Ramses Becker</h6></span>
</div>
</a>
</td>
<td>
13 (0) </td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr class="text-h6 -center">
<td>
<a href="/simon-mignolet">
<div class="row-table -2cols">
<span class="col span4/5 -txt-left"><h6 class=" text-h5">Simon Mignolet</h6></span>
</div>
</a>
</td>
<td>
1 (0) </td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr class="text-h6 -center">
<td>
<a href="/football/player/153304/kamil-grabara">
<div class="row-table -2cols">
<span class="col span4/5 -txt-left"><h6 class=" text-h5">Kamil Grabara</h6></span>
</div>
</a>
</td>
<td>
1 (1) </td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>
Output -
200
('table', {'class': 'table -small no-wrap football-squad-table '})
I had to find the div first and then get the tables inside that div:
premier_squad_div = premier_squad_soup.find('div', {'class': '-bp30-box col span1/1'})
premier_squad_table = premier_squad_div.find_all('table', {'class': 'table -small no-wrap football-squad-table '})
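It is also worth noting that the line in the question writes premier_squad_soup.find_all = (...), which assigns a tuple to the attribute instead of calling the method; that is why the output shows the tuple itself. A minimal sketch of the call form, using a trimmed copy of the Goalkeeper table from the question and matching on a single class name rather than the whole class string (the squad dictionary is just an illustrative way to hold the result):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the Goalkeeper table from the question.
HTML = """
<div class="-bp30-box col span1/1">
  <table class="table -small no-wrap football-squad-table " title="Goalkeeper">
    <tbody>
      <tr><td><a href="/simon-mignolet"><h6 class=" text-h5">Simon Mignolet</h6></a></td>
          <td> 1 (0) </td><td>0</td><td>0</td><td>0</td></tr>
      <tr><td><a href="/football/player/153304/kamil-grabara"><h6 class=" text-h5">Kamil Grabara</h6></a></td>
          <td> 1 (1) </td><td>0</td><td>0</td><td>0</td></tr>
    </tbody>
  </table>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Call find_all(...); `soup.find_all = (...)` would only store a tuple.
tables = soup.find_all("table", class_="football-squad-table")

squad = {}
for table in tables:
    position = table.get("title")
    rows = table.find("tbody").find_all("tr")
    # One list of cell texts per player row.
    squad[position] = [
        [td.get_text(strip=True) for td in row.find_all("td")] for row in rows
    ]

print(squad)
```

Matching on class_="football-squad-table" avoids depending on the exact spacing and order of the class attribute.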

Extract table from html file using python

I want to extract a table from an HTML file. I have written the following code snippet to extract the first table:
import urllib2
import os
import time
import traceback
from bs4 import BeautifulSoup
#find('table',{'class':'tbl_with_brdr'})
outfile= open('D:/Dropbox/Python/apelec.txt','wb')
rfile = open('D:/Dropbox/PRI/Data/AP/195778.html')
rsoup = BeautifulSoup(rfile)
nodes = rsoup.find('div',{'class':'frmtext'}).find('table').find('tr')
for node in nodes[1:]:
    x = node.find('th').find('b').get_text().encode("utf-8")
    print x
    y = node.find('th').findNext('th').find('b').get_text().encode("utf-8")
    print y
    outfile.write(str(x)+"\t"+str(y)+"\n")
outfile.close()
Here is the error:
9 rfile = open('D:/Dropbox/PRI/Data/AP/195778.html')
10 rsoup = BeautifulSoup(rfile)
---> 11 nodes = rsoup.find('div',{'class':'frmtext'}).find('table').find('tr')
12 for node in nodes[1:]:
13 x = node.find('th').find('b').get_text().encode("utf-8")
AttributeError: 'NoneType' object has no attribute 'find'
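The traceback suggests that one of the chained find() calls returned None, which is what find() gives back when nothing matches. A small sketch (Python 3) of checking each step separately so the failing lookup becomes visible; the HTML string and the class name "other" are made up for the illustration, and first_table_rows is a hypothetical helper:

```python
from bs4 import BeautifulSoup

# Hypothetical document that has a table, but no div with class "frmtext".
HTML = """
<div class="other"><table><tr><th><b>Name</b></th><th><b>Votes</b></th></tr></table></div>
"""


def first_table_rows(html, div_class):
    """Return the rows of the first table inside div.<div_class>, or []."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", {"class": div_class})
    if div is None:  # find() returns None when nothing matches
        return []
    table = div.find("table")
    if table is None:
        return []
    # find_all() returns a list of rows, so slicing such as rows[1:] works.
    return table.find_all("tr")


print(len(first_table_rows(HTML, "frmtext")))
print(len(first_table_rows(HTML, "other")))
```

Note that the last step uses find_all('tr') rather than find('tr'), because find() returns a single element rather than a list of rows.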
And the html file is:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<link rel="icon" type="image/ico" href="images/favicon.ico"/>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel="stylesheet" href="themes/panchayat_default.css" type="text/css"/>
<title>consolidated Election Report</title>
</head>
<body>
<!-- To blur the background while processing dwr -->
<div class="faded_div process"></div>
<div class="popup_block_div process" style="display: none;">
<img alt="" src="images/loading_animation.gif" style="margin-left: auto; margin-right: auto;">
</div>
<div id="maincontainer" class="resize">
<div id="headerwrap">
<!-- Header -->
<html>
<head>
<script type='text/javascript' src="/profilerdwr/engine.js"> </script>
<script type='text/javascript' src="/profilerdwr/util.js"> </script>
<script type="text/javascript" src="/profilerdwr/interface/lgdDao.js"></script>
<script type="text/javascript" src="js/common_util_js.js"></script>
<link rel="stylesheet" href="css/common_css.css" type="text/css"></link>
<meta http-equiv='Content-Type' content='text/html; charset=UTF-8' />
</head>
<body >
<div class="clear"></div>
<div id="headerwrap">
<div id="header">
<div id="new_header">
<div id="logoleft">Area Profiler</div>
<div id="logoright"></div>
<div class="clear"></div>
</div>
<div class="clear"></div>
<div id="loginnav" align="right">
<table width="100%" class="tbl_no_brdr">
<tr>
<td class="tblclear" align="left">
<div id="mainnav">Home </div>
</td>
</tr>
</table>
</div>
</div>
<div class="clear"></div>
<div id="topnav">
<table width="100%" class="tbl_no_brdr">
<tr>
<td width="85" class="tblclear">Choose Theme :</td>
<td width="200" class="tblclear">
<form id="themeForm" name="themeForm" method="get" action="welcome.do">
<input type="hidden" name='OWASP_CSRFTOKEN' value='CN72-BGJW-G7FM-K1S3-P5FF-V1EN-IO4T-GHWU' />
<select name="theme" id="themeId" class="combofield" onchange="submitThemeForm()" style="width: 120px;">
<option value="default">Default Theme</option>
<option value="mustard">Mustard Theme</option>
<option value="peach">Peach Theme</option>
<option value="green">Green Theme</option>
<option value="blue">Blue Theme</option>
</select>
</form>
</td>
<td style="padding: 0px">
</td>
<td class="tblclear"> </td>
<td width="14" class="tblclear txticon"><img src="images/btnMinus.jpg" width="16" height="14" border="0" /></div></td>
<td width="14" class="tblclear txticon"><img src="images/btnDefault.jpg" width="16" height="14" border="0" /> </td>
<td width="28" class="tblclear txticon"><img src="images/btnPlus.jpg" width="16" height="14" border="0" /></td>
<script type="text/javascript" >
//documenttextsizer.setup("shared_css_class_of_toggler_controls")
documenttextsizer.setup("texttoggler")
</script>
<td width="100" align="right" class="tblclear">Select Language :</td>
<td width="108" align="right" class="tblclear">
<form id="languageForm" name="languageForm" method="get" action="welcome.do">
<input type="hidden" name='OWASP_CSRFTOKEN' value='CN72-BGJW-G7FM-K1S3-P5FF-V1EN-IO4T-GHWU' />
<select id="languageId" name="language" class="combofield" style="width: 120px;" onchange="submitLanguageForm()" >
<option value=""> Select Language </option>
</select>
</form>
</td>
</tr>
</table>
</div>
<div id="breadcrumbnav"> </div>
</div>
<script type="text/javascript">
function submitThemeForm()
{
var isOK = confirm("This will Refresh Your Page. Any Unsaved data will be Lost. Do You still want to Continue?");
if(isOK)
{
document.getElementById('themeForm').submit();
}
else
{
return;
}
}
function submitLanguageForm()
{
var isOK = confirm("This will Refresh Your Page. Any Unsaved data will be Lost. Do You still want to Continue?");
if(isOK)
{
document.getElementById('languageForm').submit();
}
else
{
return;
}
}
</script>
</body>
</html>
</div>
<div class="clear"></div>
<div id="content">
<div id="leftpnl">
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="100%" valign="top" class="tblclear">
<!-- content -->.
<script type="text/javascript" src="js/common_js.js"></script>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<script type="text/javascript">
var pathname;
$(document).ready(function() {pathname = window.location.pathname;});
function onBack(s) {
var position =pathname.indexOf("/", 2);
var newPath = "";
var val = s.indexOf("?", 1);
if(val>0)
{
newPath = s+"&redirect=true";
}
else
{
newPath = s+"?redirect=true";
}
window.location.replace(".."+pathname.substring(0,position)+"/"+newPath);
}
function downloadReport(repformat){
//window.location="downloadConsolidatedElectionReportPDF.do?OWASP_CSRFTOKEN=CN72-BGJW-G7FM-K1S3-P5FF-V1EN-IO4T-GHWU";
//document.forms["electionReportForm"].action="downloadConsolidatedElectionReportPDF.do?repformat="+repformat+"&OWASP_CSRFTOKEN=CN72-BGJW-G7FM-K1S3-P5FF-V1EN-IO4T-GHWU";
document.forms["electionReportForm"].action="downloadConsolidatedElectionReportPDF.do?reportformat="+repformat+"&OWASP_CSRFTOKEN=CN72-BGJW-G7FM-K1S3-P5FF-V1EN-IO4T-GHWU";
document.forms["electionReportForm"].method="POST";
document.getElementById('electionReportForm').target="_blank";
document.forms["electionReportForm"].submit();
}
</script>
<style type="text/css">
.data_link{
color:blue;
display: block;
text-decoration: none;
font-size: 1em;
font-weight: bolder;
}
.disable_link
{
cursor:default;
color:blue;
display: block;
text-decoration: none;
font-size: 1em;
font-weight: bolder;
}
.data_link:VISITED
{
color:blue;
display: block;
text-decoration: none;
font-size: 1em;
font-weight: bolder;
}
.data_link:HOVER{
text-decoration: underline;
}
</style>
</head>
<body>
<div id="frmcontent">
<div class="frmhd">
<table width="100%" class="tbl_no_brdr">
<tr>
<td align="left" width="90%">
Consolidated Election</td>
</tr>
</table>
</div>
<div class="clear"></div>
<div class="frmpnlbrdr">
<div class="frmpnlbg">
<div class="frmtxt">
<table width="100%" style="margin-bottom: 10px;" class="tbl_with_brdr">
<tr class="tblRowTitle tblclear" >
<th align="left" ><b>State Name</b></th>
<th align="left" ><b>Local Body Type</b></th>
<th align="left" ><b>Election Term</b></th>
<th align="left" ><b>Local Body Name</b></th>
</tr>
<tr class="tblRowB" style="color: blue;">
<th align="left" >ANDHRA PRADESH</th>
<th align="left" >Village Panchayat</th>
<th align="left" >
02-Aug-2013 To
01-Aug-2018
</th>
<th align="left" >KODIHALLI</th>
</tr>
</table>
<div class="frmhdtitle">Consolidated Election</div>
<table width="100%" class="tbl_with_brdr">
<thead>
<tr class="tblRowTitle tblclear">
<th align="center" width="5%" ><b>S.No.</b></th>
<th align="left" width="9%"><b>Name</b></th> 0
<th align="left" width="9%"><b>Age</b></th> 1
<th align="left" width="9%"><b>Caste Category</b></th> 2
<th align="left" width="9%"><b>Gender</b></th> 3
<th align="left" width="9%"><b>Qualification</b></th> 4
<th align="left" width="9%"><b>Occupation</b></th> 5
<th align="left" width="9%"><b>Email Address</b></th> 6
<th align="left" width="9%"><b>Ward Name</b></th> 7
<th align="left" width="9%"><b>Reservation</b></th> 8
</tr>
</thead>
<tbody>
<tr class="tblRowB">
<td align="center" >1</td>
<td>Kambanna</td>
<td>36</td>
<td>OBC</td>
<td>Male</td>
<td>Middle or Lower Secondary</td>
<td>N/A</td>
<td>
N/A
</td>
<td>N/A</td>
<td >
Yes (OBC / Others)
</td>
</tr>
<tr class="tblRowA">
<td align="center" >2</td>
<td>Ramesh</td>
<td>39</td>
<td>OBC</td>
<td>Male</td>
<td>Middle or Lower Secondary</td>
<td>Workers not reporting any occupations</td>
<td>
N/A
</td>
<td>Ward no 1</td>
<td >
Yes (OBC / Others)
</td>
</tr>
<tr class="tblRowB">
<td align="center" >3</td>
<td>S.Manjunath</td>
<td>29</td>
<td>OBC</td>
<td>Male</td>
<td>Higher Secondary or Intermediate or Pre University or Senior Secondary</td>
<td>Workers not reporting any occupations</td>
<td>
N/A
</td>
<td>Ward no 2</td>
<td >
No (General / Others)
</td>
</tr>
<tr class="tblRowA">
<td align="center" >4</td>
<td>Obuleshu</td>
<td>48</td>
<td>OBC</td>
<td>Male</td>
<td>Below Primary</td>
<td>Workers not reporting any occupations</td>
<td>
N/A
</td>
<td>Ward no 3</td>
<td >
No (General / Others)
</td>
</tr>
<tr class="tblRowB">
<td align="center" >5</td>
<td>Mamatha</td>
<td>24</td>
<td>OBC</td>
<td>Female</td>
<td>Matriculation or Junior School Certificate or Secondary</td>
<td>N/A</td>
<td>
N/A
</td>
<td>Ward no 4</td>
<td >
Yes (General / Female)
</td>
</tr>
<tr class="tblRowA">
<td align="center" >6</td>
<td>Shivamma</td>
<td>38</td>
<td>OBC</td>
<td>Female</td>
<td>Below Primary</td>
<td>N/A</td>
<td>
N/A
</td>
<td>Ward no 5</td>
<td >
Yes (General / Female)
</td>
</tr>
<tr class="tblRowB">
<td align="center" >7</td>
<td>Hanumantappa</td>
<td>46</td>
<td>SC</td>
<td>Male</td>
<td>Illiterate</td>
<td>N/A</td>
<td>
N/A
</td>
<td>Ward no 6</td>
<td >
No (General / Others)
</td>
</tr>
<tr class="tblRowA">
<td align="center" >8</td>
<td>Malingappa</td>
<td>45</td>
<td>SC</td>
<td>Male</td>
<td>Illiterate</td>
<td>N/A</td>
<td>
N/A
</td>
<td>Ward no 7</td>
<td >
No (General / Others)
</td>
</tr>
<tr class="tblRowB">
<td align="center" >9</td>
<td>Kamalamma</td>
<td>52</td>
<td>OBC</td>
<td>Female</td>
<td>Illiterate</td>
<td>N/A</td>
<td>
N/A
</td>
<td>Ward no 8</td>
<td >
Yes (OBC / Female)
</td>
</tr>
<tr class="tblRowA">
<td align="center" >10</td>
<td>Muddamma</td>
<td>48</td>
<td>OBC</td>
<td>Female</td>
<td>Illiterate</td>
<td>N/A</td>
<td>
N/A
</td>
<td>Ward no 9</td>
<td >
Yes (General / Female)
</td>
</tr>
<tr class="tblRowB">
<td align="center" >11</td>
<td>Patta Tayamma</td>
<td>45</td>
<td>SC</td>
<td>Female</td>
<td>Middle or Lower Secondary</td>
<td>N/A</td>
<td>
N/A
</td>
<td>Ward no 10</td>
<td >
Yes (SC / Female)
</td>
</tr>
<tr class="tblRowA">
<td align="center" >12</td>
<td>Sujatha</td>
<td>35</td>
<td>OBC</td>
<td>Female</td>
<td>Middle or Lower Secondary</td>
<td>N/A</td>
<td>
N/A
</td>
<td>Ward no 11</td>
<td >
Yes (OBC / Female)
</td>
</tr>
<tr class="tblRowB">
<td align="center" >13</td>
<td>Kadurappa</td>
<td>35</td>
<td>SC</td>
<td>Male</td>
<td>Middle or Lower Secondary</td>
<td>N/A</td>
<td>
N/A
</td>
<td>Ward no 12</td>
<td >
Yes (SC / Others)
</td>
</tr>
</tbody>
</table>
<br />
<table width="100%" class="tbl_no_brdr">
<tr>
<td align="center">
<input type="button" class="btn" onclick="onClose('welcome.do?OWASP_CSRFTOKEN=CN72-BGJW-G7FM-K1S3-P5FF-V1EN-IO4T-GHWU')" value=Close />
<input type="button" class="btn" onclick="this.disabled=true; this.value='Please Wait .!';onBack('consolidatedElectionReport.do?OWASP_CSRFTOKEN=CN72-BGJW-G7FM-K1S3-P5FF-V1EN-IO4T-GHWU&electionTermId=35107&stateId=28')" value=Back />
</td>
</tr>
</table>
<form id="electionReportForm" name="electionReportForm" action="#" method="post">
<div align="center"><br/>
<input type="button" class="btn" onclick="downloadReport('pdf');" value="Export to PDF" size="5" />
<input type="button" class="btn" onclick="downloadReport('xls');" value="Export to Excel" size="5" />
</div>
</form>
</div>
<div class="myclass"
style="font-family: Times; text-align: center; font-size: 10.0pt; color: white; font-weight: bold; border: 1px solid gray">
Report generated through Area Profiler (http://areaprofiler.gov.in)Thu Oct 02 22:34:20 IST 2014
</div>
</div>
</div>
</div>
</body>
</html>
</td>
</tr>
</table>
</div>
</div>
<div class="clear"></div>
<div id="footer">
<!-- Footer -->
<html>
<head>
</head>
<body>
<table width="100%" class="tbl_no_brdr">
<tr>
<td colspan="3" class="fotbrdr"></td>
</tr>
<tr>
<td width="161" class="btmlogospace"><a href="http://www.negp.gov.in/" target= "_blank" ><img src="images/e_governance_logo.jpg" width="161" height="38" /></a></td>
<td width="93" class="btmlogospace"><a href="http://www.panchayat.gov.in/" target= "_blank" ><img src="images/panchayatilogo.jpg" width="93" height="38" /></a></td>
<td align="right" class="btmlogospace">Site is designed, hosted
and maintained by National Informatics Centre<br /> Contents on
this website is owned,updated and managed by the Ministry of
Panchayati Raj</td>
</tr>
</table>
</body>
</html>
</div>
</div>
</body>
</html>
Here is an approach. It is not a complete solution, but you can use it as a guide.
You have to traverse the DOM tree and extract the values you want.
I changed the class of the div you search for from frmtext to frmtxt. During the traversal you also have to check at each step whether anything was found, because find returns None when there is no match.
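from bs4 import BeautifulSoup

# Python 2 code: parse the saved page and write one tab-separated line per table row.
outfile = open('out.txt', 'wb')
rfile = open('195778.html')
rsoup = BeautifulSoup(rfile)

# The wrapper div's class is 'frmtxt' (not 'frmtext').
nodes1 = rsoup.find('div', {'class': 'frmtxt'})
nodes = nodes1.find('table').find_all('tr')

for node in nodes:
    # x: bold text of the first <th> in the row, or None if there is none
    x = None
    a = node.find('th')
    if a is not None:
        x1 = a.find('b')
        if x1 is not None:
            x = x1.get_text().encode("utf-8")
            print x

    # y: bold text of the second <th> in the row; if no bold text is found,
    # y stays as the first <th> tag (or None when the row has no <th>)
    y = node.find('th')
    if y is not None:
        print 'y', y
        y2 = y.findNext('th')
        if y2 is not None:
            print 'y2', y2
            y3 = y2.find('b')
            if y3 is not None:
                y = y3.get_text().encode("utf-8")
                print y

    outfile.write(str(x) + "\t" + str(y) + "\n")

outfile.close()
rfile.close()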
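
To get at the member rows (Name, Age, Ward Name and so on) rather than only the header cells, the same traverse-and-check idea can be sketched as below. This is only a rough illustration under a few assumptions: it is written for Python 3, it uses the standard library's html.parser instead of BeautifulSoup so it runs without extra packages, SAMPLE_HTML is an abridged copy of the table from the question with fewer columns, and the names TableRowParser and parse_rows are made up for this example.

```python
from html.parser import HTMLParser

# Abridged copy of the "Consolidated Election" table from the question
# (assumption: only four of the ten columns are kept to stay short).
SAMPLE_HTML = """
<div class="frmtxt">
<table width="100%" class="tbl_with_brdr">
<thead>
<tr class="tblRowTitle tblclear">
<th align="center" width="5%" ><b>S.No.</b></th>
<th align="left" width="9%"><b>Name</b></th>
<th align="left" width="9%"><b>Age</b></th>
<th align="left" width="9%"><b>Ward Name</b></th>
</tr>
</thead>
<tbody>
<tr class="tblRowB">
<td align="center" >1</td>
<td>Kambanna</td>
<td>36</td>
<td>
N/A
</td>
</tr>
<tr class="tblRowA">
<td align="center" >2</td>
<td>Ramesh</td>
<td>39</td>
<td>Ward no 1</td>
</tr>
</tbody>
</table>
</div>
"""


class TableRowParser(HTMLParser):
    """Hypothetical helper: collect the text of each <th>/<td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None outside a <tr>
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse line breaks and repeated spaces inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html):
    """Return every table row in html as a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = parse_rows(SAMPLE_HTML)
header, members = rows[0], rows[1:]
# Pair each member row with the header names so cells can be looked up by column.
records = [dict(zip(header, row)) for row in members]
for record in records:
    print("\t".join(record[name] for name in header))
```

The sketch does not handle nested tables, and on the full page it would also pick up the rows of the other tables (navigation, state summary, footer), so you would still need to narrow the input down to the table you want first, as the BeautifulSoup code above does with the frmtxt div.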
