I'm having trouble printing the results of this soup call in my IDE.
I have the following soup function:
row = soup.find_all('td', attrs = {'class': 'Table__TD'})
Here is a subset of what it returns:
[<td class="Table__TD">Sat 11/9</td>,
<td class="Table__TD"><span class="flex"><span class="pr2">vs</span><span class="pr2 TeamLink__Logo"><a class="AnchorLink v-mid" data-clubhouse-uid="s:40~l:46~t:6" href="/nba/team/_/name/dal/dallas-mavericks" title="Team - Dallas Mavericks"><img alt="DAL" class="v-mid" data-clubhouse-uid="s:40~l:46~t:6" height="20" src="" title="DAL" width="20"/></a></span><span><a class="AnchorLink v-mid" data-clubhouse-uid="s:40~l:46~t:6" href="/nba/team/_/name/dal/dallas-mavericks" title="Team - Dallas Mavericks">DAL</a></span></span></td>,
<td class="Table__TD"><a class="AnchorLink" data-game-link="true" href="http://www.espn.com/nba/game?gameId=401160772"><span class="flex tl"><span class="pr2"><div class="ResultCell tl loss-stat">L</div></span><span>138-122</span></span></a></td>,
<td class="Table__TD">31</td>,
<td class="Table__TD">6-12</td>,
<td class="Table__TD">50.0</td>,
<td class="Table__TD">4-9</td>,
<td class="Table__TD">44.4</td>,
<td class="Table__TD">2-2</td>,
<td class="Table__TD">100.0</td>,
<td class="Table__TD">4</td>,
<td class="Table__TD">4</td>,
<td class="Table__TD">2</td>,
<td class="Table__TD">3</td>,
<td class="Table__TD">2</td>,
<td class="Table__TD">1</td>,
<td class="Table__TD">18</td>,
<td class="Table__TD">Fri 11/8</td>,
I am trying to use a for loop to write these out but my console is not returning anything.
for data in row[0].find_all('td'):
    print(data.get_text())
Can anyone tell me what I am doing wrong? Thanks.
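Your initial search already returned the <td> elements, so you don't need to call find_all on the tag name again. row[0] is a single <td> with no <td> nested inside it, so row[0].find_all('td') is empty and the loop body never runs.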
Just do something like:
for data in row:
    print(data.get_text())
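To make the difference concrete, here is a minimal sketch on a trimmed-down copy of the cells from the question. The surrounding <table> and the use of the built-in html.parser are my assumptions, not something the question specifies.

```python
from bs4 import BeautifulSoup

# Trimmed-down sample; the real page has many more cells
html = """
<table><tr>
<td class="Table__TD">Sat 11/9</td>
<td class="Table__TD">31</td>
<td class="Table__TD">6-12</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")
row = soup.find_all("td", attrs={"class": "Table__TD"})

# row[0] is one <td> with no <td> inside it, so this search finds nothing
nested = row[0].find_all("td")

# Iterating over the ResultSet itself visits every cell
texts = [cell.get_text(strip=True) for cell in row]

print(len(nested))
print(texts)
```

The first print shows why the original loop stayed silent, the second shows the text of each cell.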
I'm still a Python noob trying to learn BeautifulSoup. I looked at solutions on Stack Overflow but was unsuccessful. Please help me understand this better.
I have extracted the HTML shown below:
<table cellspacing="0" id="ContentPlaceHolder1_dlDetails"
style="width:100%;border-collapse:collapse;">
<tbody><tr>
<td>
<table border="0" cellpadding="5" cellspacing="0" width="70%">
<tbody><tr>
<td> </td>
<td> </td>
</tr>
<tr>
<td bgcolor="#4F95FF" class="listhead" width="49%">Location:</td>
<td bgcolor="#4F95FF" class="listhead" width="51%">On Site </td>
</tr>
<tr>
<td class="listmaintext">ATM ID: </td>
<td class="listmaintext">DAGR00401111111</td>
</tr>
<tr>
<td class="listmaintext">ATM Centre:</td>
<td class="listmaintext"></td>
</tr>
<tr>
<td class="listmaintext">Site Location: </td>
<td class="listmaintext">ADA Building - Agra</td>
</tr>
I tried to find the 'tbody' inside the table but was unsuccessful:
#table = bs.find("table", {"id": "ContentPlaceHolder1_dlDetails"})
html = browser.page_source
soup = bs(html, "lxml")
table = soup.find_all('table', {'id':'ContentPlaceHolder1_dlDetails'})
table_body = table.find('tbody')
rows = table.select('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])
I'm trying to save the values of the cells with the "listmaintext" class.
Error message:
AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
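The error message points at the cause: find_all() returns a ResultSet (a list of tags), which has no find(). A minimal sketch of the fix, assuming a shortened copy of the HTML and the built-in html.parser instead of browser.page_source and lxml: use find() to get the single table, then walk its rows. recursive=False keeps the outer row from also collecting the cells of the nested table.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for browser.page_source
html = """
<table id="ContentPlaceHolder1_dlDetails">
<tr><td>
<table>
<tr><td class="listmaintext">ATM ID: </td><td class="listmaintext">DAGR00401111111</td></tr>
<tr><td class="listmaintext">ATM Centre:</td><td class="listmaintext"></td></tr>
<tr><td class="listmaintext">Site Location: </td><td class="listmaintext">ADA Building - Agra</td></tr>
</table>
</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag (or None), so it can be searched further
table = soup.find("table", {"id": "ContentPlaceHolder1_dlDetails"})

data = []
for row in table.find_all("tr"):
    # Only look at the row's own cells, not cells of tables nested inside it
    cells = row.find_all("td", class_="listmaintext", recursive=False)
    cols = [td.get_text(strip=True) for td in cells]
    if cols:
        data.append(cols)

print(data)
```

Empty cells are kept as empty strings here so that every row still has two entries.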
Another way to do this is to use next_sibling:
from bs4 import BeautifulSoup as bs
html ='''
<html>
<table cellspacing="0" id="ContentPlaceHolder1_dlDetails"
style="width:100%;border-collapse:collapse;">
<tbody><tr>
<td>
<table border="0" cellpadding="5" cellspacing="0" width="70%">
<tbody><tr>
<td> </td>
<td> </td>
</tr>
<tr>
<td bgcolor="#4F95FF" class="listhead" width="49%">Location:</td>
<td bgcolor="#4F95FF" class="listhead" width="51%">On Site </td>
</tr>
<tr>
<td class="listmaintext">ATM ID: </td>
<td class="listmaintext">DAGR00401111111</td>
</tr>
<tr>
<td class="listmaintext">ATM Centre:</td>
<td class="listmaintext"></td>
</tr>
<tr>
<td class="listmaintext">Site Location: </td>
<td class="listmaintext">ADA Building - Agra</td>
</tr>
</html>'''
soup = bs(html, 'lxml')
data = [' '.join((item.text, item.next_sibling.next_sibling.text)) for item in soup.select('#ContentPlaceHolder1_dlDetails tr .listmaintext:first-child') if item.text !='']
print(data)
from bs4 import BeautifulSoup
data = '''<table cellspacing="0" id="ContentPlaceHolder1_dlDetails"
style="width:100%;border-collapse:collapse;">
<tbody><tr>
<td>
<table border="0" cellpadding="5" cellspacing="0" width="70%">
<tbody><tr>
<td> </td>
<td> </td>
</tr>
<tr>
<td bgcolor="#4F95FF" class="listhead" width="49%">Location:</td>
<td bgcolor="#4F95FF" class="listhead" width="51%">On Site </td>
</tr>
<tr>
<td class="listmaintext">ATM ID: </td>
<td class="listmaintext">DAGR00401111111</td>
</tr>
<tr>
<td class="listmaintext">ATM Centre:</td>
<td class="listmaintext"></td>
</tr>
<tr>
<td class="listmaintext">Site Location: </td>
<td class="listmaintext">ADA Building - Agra</td>
</tr>'''
soup = BeautifulSoup(data, 'lxml')
s = soup.select('.listmaintext')
for td1, td2 in zip(s[::2], s[1::2]):
    print('{} [{}]'.format(td1.text.strip(), td2.text.strip()))
Prints:
ATM ID: [DAGR00401111111]
ATM Centre: []
Site Location: [ADA Building - Agra]
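If the goal is to save the values rather than print them, the same zip pairing can fill a dictionary. This is only a sketch: it assumes the "listmaintext" cells strictly alternate label, value, and the shortened HTML and the html.parser choice are mine.

```python
from bs4 import BeautifulSoup

html = """
<table>
<tr><td class="listmaintext">ATM ID: </td><td class="listmaintext">DAGR00401111111</td></tr>
<tr><td class="listmaintext">ATM Centre:</td><td class="listmaintext"></td></tr>
<tr><td class="listmaintext">Site Location: </td><td class="listmaintext">ADA Building - Agra</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
s = soup.select(".listmaintext")

details = {}
# Even positions are labels, odd positions are the matching values
for label, value in zip(s[::2], s[1::2]):
    key = label.get_text(strip=True).rstrip(":")
    details[key] = value.get_text(strip=True)

print(details)
```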
I am trying to grab elements from a table if a cell has a certain color. The only issue is that I have not found a way to read the color attribute from the tags yet.
jump = []
for tr in site.findAll('tr'):
    for td in site.findAll('td'):
        if td == 'td bgcolor':
            jump.append(td)
print(jump)
This returns an empty list.
How do I grab just the color from the below html?
I need to get the color from the [td] tag (it would also be useful to get the color from the [tr] tag)
<tr bgcolor="#f4f4f4">
<td height="25" nowrap="NOWRAP"> CME_ES </td>
<td height="25" nowrap="NOWRAP"> 07:58:46 </td>
<td height="25" nowrap="NOWRAP"> Connected </td>
<td height="25" nowrap="NOWRAP"> 0 </td>
<td height="25" nowrap="NOWRAP"> 0 </td>
<td height="25" nowrap="NOWRAP"> 0 </td>
<td height="25" nowrap="NOWRAP"> 0 </td>
<td height="25" nowrap="NOWRAP"> 07:58:00 </td>
<td height="25" nowrap="NOWRAP" bgcolor="#55aa2a"> --:--:-- </td>
<td height="25" nowrap="NOWRAP"> 0 </td>
<td height="25" nowrap="NOWRAP"> 0 </td>
<td height="25" nowrap="NOWRAP"> 01:25:00 </td>
<td height="25" nowrap="NOWRAP"> 22:00:00 </td>
</tr>
How about this:
jump = []
for tr in site.findAll('tr'):
    for td in tr.findAll('td'):  # search within the current row, not the whole page
        if 'bgcolor' in td.attrs:
            jump.append(td)
            print(td.attrs['bgcolor'])
print(jump)
You can use has_attr to check whether an element has a certain attribute:
if td.has_attr('bgcolor'):
    jump.append(td)
If I misread your question and you only want to find tds of a certain color, use find_all:
tr.find_all("td", {"bgcolor": "#55aa2a"}) # returns list of matches
PS: if someone has a better docs snippet for has_attr, please edit this answer.
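Putting the pieces together, here is a small sketch that reads the color from both the <td> and its <tr>. The HTML is a shortened copy of the row in the question, and html.parser is my choice of parser. Tag.get() returns None when the attribute is missing, so no separate has_attr check is needed.

```python
from bs4 import BeautifulSoup

html = """
<table>
<tr bgcolor="#f4f4f4">
<td height="25" nowrap="NOWRAP"> CME_ES </td>
<td height="25" nowrap="NOWRAP" bgcolor="#55aa2a"> --:--:-- </td>
<td height="25" nowrap="NOWRAP"> 0 </td>
</tr>
</table>
"""
site = BeautifulSoup(html, "html.parser")

jump = []
for tr in site.find_all("tr"):
    row_color = tr.get("bgcolor")  # None if the row has no bgcolor
    for td in tr.find_all("td"):
        cell_color = td.get("bgcolor")
        if cell_color is not None:
            jump.append((td.get_text(strip=True), cell_color, row_color))

# Attribute filters compare the whole value, including the leading '#'
green = site.find_all("td", {"bgcolor": "#55aa2a"})

print(jump)
print(len(green))
```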
I am trying to scrape a webpage with the following URL:
https://www.bseindia.com/corporates/shpSecurities.aspx?scripcd=500209&qtrid=96.00
I want to scrape a table with the following HTML. I have tried a few things but have not been able to get the table into a form I can write to CSV. The <tr> tags of the data rows are not closed, so splitting the data into separate rows is the issue.
Thanks for help
--J
<table border='0' width='900' align='center' cellspacing='1' cellpadding='4'>
<tr>
<td class='innertable_header1' rowspan='3'>Category of shareholder</td>
<td class='innertable_header1' rowspan='3'>Nos. of shareholders</td>
<td class='innertable_header1' rowspan='3'>No. of fully paid up equity shares held</td>
<td class='innertable_header1' rowspan='3'>No. of shares underlying Depository Receipts</td>
<td class='innertable_header1' rowspan='3'>Total nos. shares held</td>
<td class='innertable_header1' rowspan='3'>Shareholding as a % of total no. of shares (calculated as per SCRR, 1957)As a % of (A+B+C2)</td>
<td class='innertable_header1' rowspan='3'> Number of equity shares held in dematerialized form</td>
</tr>
<tr></tr>
<tr></tr>
<tr>
<td class='TTRow_left'>(A) Promoter & Promoter Group</td>
<td class='TTRow_right'>19</td>
<td class='TTRow_right'>28,17,02,889</td>
<td class='TTRow_right'></td>
<td class='TTRow_right'>28,17,02,889</td>
<td class='TTRow_right'>12.90</td>
<td class='TTRow_right'>28,17,02,889</td>
<tr>
<td class='TTRow_left'>(B) Public</td>
<td class='TTRow_right'>9,16,058</td>
<td class='TTRow_right'>1,87,81,45,362</td>
<td class='TTRow_right'>1,32,95,642</td>
<td class='TTRow_right'>1,89,14,41,004</td>
<td class='TTRow_right'>86.61</td>
<td class='TTRow_right'>1,88,74,40,959</td>
<tr>
<td class='TTRow_left'>(C1) Shares underlying DRs</td>
<td class='TTRow_right'></td>
<td class='TTRow_right'></td>
<td class='TTRow_right'></td>
<td class='TTRow_right'></td>
<td class='TTRow_right'>0.00</td>
<td class='TTRow_right'></td>
<tr>
<td class='TTRow_left'>(C2) Shares held by Employee Trust</td>
<td class='TTRow_right'>1</td>
<td class='TTRow_right'>1,08,05,896</td>
<td class='TTRow_right'></td>
<td class='TTRow_right'>1,08,05,896</td>
<td class='TTRow_right'>0.49</td>
<td class='TTRow_right'>1,08,05,896</td>
<tr>
<td class='TTRow_left'>(C) Non Promoter-Non Public</td>
<td class='TTRow_right'>1</td>
<td class='TTRow_right'>1,08,05,896</td>
<td class='TTRow_right'></td>
<td class='TTRow_right'>1,08,05,896</td>
<td class='TTRow_right'>0.49</td>
<td class='TTRow_right'>1,08,05,896</td>
<tr>
<td class='TTRow_left'>Grand Total</td>
<td class='TTRow_right'>9,16,078</td>
<td class='TTRow_right'>2,17,06,54,147</td>
<td class='TTRow_right'>1,32,95,642</td>
<td class='TTRow_right'>2,18,39,49,789</td>
<td class='TTRow_right'>100.00</td>
<td class='TTRow_right'>2,17,99,49,744</td>
</tr>
</table>
You can try this:
from bs4 import BeautifulSoup as soup
import urllib.request
import re
# Python 3: urlopen lives in urllib.request, and read() returns bytes that BeautifulSoup accepts directly
s = soup(urllib.request.urlopen('https://www.bseindia.com/corporates/shpSecurities.aspx?scripcd=500209&qtrid=96.00').read(), 'lxml')
# Strip line breaks and runs of whitespace from each cell, then drop the empty cells
results = list(filter(None, [re.sub(r'[\n\r]+|\s{2,}', '', i.text) for i in s.find_all('td', {'class': re.compile('TTRow_right|TTRow_left')})]))
print(results)
Output:
['(A) Promoter & Promoter Group', '19', '28,17,02,889', '28,17,02,889', '12.90', '28,17,02,889', '(B) Public', '9,16,058', '1,87,81,45,362', '1,32,95,642', '1,89,14,41,004', '86.61', '1,88,74,40,959', '(C1) Shares underlying DRs', '0.00', '(C2) Shares held by Employee Trust', '1', '1,08,05,896', '1,08,05,896', '0.49', '1,08,05,896', '(C) Non Promoter-Non Public', '1', '1,08,05,896', '1,08,05,896', '0.49', '1,08,05,896', 'Grand Total', '9,16,078', '2,17,06,54,147', '1,32,95,642', '2,18,39,49,789', '100.00', '2,17,99,49,744']
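The flat list above loses the row boundaries, and dropping the empty cells shifts the columns. One way around the unclosed <tr> tags, sketched here under the assumption that every data row starts with exactly one TTRow_left cell, is to start a new row each time such a cell appears and to keep the empty cells. The sample HTML is a shortened stand-in for the page, html.parser is my choice of parser, and the CSV is written to an in-memory buffer instead of a file.

```python
import csv
import io
from bs4 import BeautifulSoup

# Shortened stand-in for the page: the first data row is never closed
html = """<table>
<tr>
<td class='TTRow_left'>(A) Promoter &amp; Promoter Group</td>
<td class='TTRow_right'>19</td>
<td class='TTRow_right'></td>
<td class='TTRow_right'>12.90</td>
<tr>
<td class='TTRow_left'>(B) Public</td>
<td class='TTRow_right'>9,16,058</td>
<td class='TTRow_right'>1,32,95,642</td>
<td class='TTRow_right'>86.61</td>
</tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")

rows = []
# A list of class names matches cells carrying any one of them
for td in soup.find_all("td", class_=["TTRow_left", "TTRow_right"]):
    if "TTRow_left" in td["class"]:
        rows.append([])  # a left-aligned cell opens a new row
    if rows:
        rows[-1].append(td.get_text(strip=True))

buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

Because the cells are visited in document order, this does not depend on how the parser nests the unclosed rows. To write a real file, pass a file opened with newline='' to csv.writer instead of the buffer.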
I'm having a problem with BeautifulSoup not completely parsing the html received. I tried with both lxml and html5lib parsers and I had the same problem.
html = '<td style="vertical-align: top">1</td> <td style="vertical-align: top"><span class="ui-icon country flg-fr"></span>\t</td><td class="pn"><a class="player-link" href="/Players/25604">Hugo Lloris <span class="incident-wrapper"></span> </a><span class="player-meta-data">29</span><span class="player-meta-data">, GK </span></td> <td class="ShotsTotal ">0\t</td><td class="ShotOnTarget ">0\t</td><td class="KeyPassTotal ">0\t</td><td class="PassSuccessInMatch ">88\t</td><td class="DuelAerialWon ">0\t</td><td class="Touches ">35\t</td><td class="rating ">6.24</td> <td style="text-align: left"><span class="incident-wrapper"></span></td> '
ipdb> BeautifulSoup(html, 'html5lib')
<html><head></head><body>1 <span class="ui-icon country flg-fr"></span> <a class="player-link" href="/Players/25604">Hugo Lloris <span class="incident-wrapper"></span> </a><span class="player-meta-data">29</span><span class="player-meta-data">, GK </span> 0 0 0 88 0 35 6.24 <span class="incident-wrapper"></span> </body></html>
It works for me. I ran the following code (using beautifulsoup4==4.4.1):
from bs4 import BeautifulSoup
html = """
<td style="vertical-align: top">1</td>
<td style="vertical-align: top"><span class="ui-icon country flg-fr"></span>\t</td>
<td class="pn"><a class="player-link" href="/Players/25604">Hugo Lloris <span class="incident-wrapper"></span> </a><span
class="player-meta-data">29</span><span class="player-meta-data">, GK </span></td>
<td class="ShotsTotal ">0\t</td>
<td class="ShotOnTarget ">0\t</td>
<td class="KeyPassTotal ">0\t</td>
<td class="PassSuccessInMatch ">88\t</td>
<td class="DuelAerialWon ">0\t</td>
<td class="Touches ">35\t</td>
<td class="rating ">6.24</td>
<td style="text-align: left"><span class="incident-wrapper"></span></td>
"""
parsed_html = BeautifulSoup(html, 'html5lib')
print(html)
And I've got the following html printed:
<td style="vertical-align: top">1</td>
<td style="vertical-align: top"><span class="ui-icon country flg-fr"></span> </td>
<td class="pn"><a class="player-link" href="/Players/25604">Hugo Lloris <span class="incident-wrapper"></span> </a><span
class="player-meta-data">29</span><span class="player-meta-data">, GK </span></td>
<td class="ShotsTotal ">0 </td>
<td class="ShotOnTarget ">0 </td>
<td class="KeyPassTotal ">0 </td>
<td class="PassSuccessInMatch ">88 </td>
<td class="DuelAerialWon ">0 </td>
<td class="Touches ">35 </td>
<td class="rating ">6.24</td>
<td style="text-align: left"><span class="incident-wrapper"></span></td>
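Don't see anything missing. Note, though, that print(html) prints the original input string rather than parsed_html, so this output does not show what html5lib produced. html5lib follows the HTML5 parsing rules, which ignore <td> tags that appear outside a <table>, and that matches the output shown in the question.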
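If the aim is to keep the <td> tags of a bare fragment like this, one option (a sketch, not the only fix) is Python's built-in html.parser, which builds the tree from the tags as written and does not require an enclosing <table>. The fragment below is a shortened copy of the one in the question. Wrapping the fragment in <table><tr>...</tr></table> before parsing should be another route if html5lib is required, though that is not shown here.

```python
from bs4 import BeautifulSoup

# Shortened copy of the fragment: table cells with no enclosing <table>
fragment = (
    '<td class="pn"><a class="player-link" href="/Players/25604">Hugo Lloris</a></td>'
    '<td class="Touches ">35\t</td>'
    '<td class="rating ">6.24</td>'
)

soup = BeautifulSoup(fragment, "html.parser")

cells = soup.find_all("td")
texts = [td.get_text(strip=True) for td in cells]

# The trailing space in class="rating " does not stop the class from matching
rating = soup.find("td", class_="rating").get_text(strip=True)

print(texts)
print(rating)
```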