BeautifulSoup finding attributes of a table row - python

I got the html below, I want to get the text of event_timestamp
<tr id="eventRowId_454169" event_attr_id="25" event_timestamp="2022-07-19 12:30:00" onclick="javascript:changeEventDisplay(454169, this, 'overview');">
<td class="first left time">15:30</td>
<td class="flagCur"> USD</td> <td class="sentiment" title="High Volatility Expected"><i class="newSiteIconsSprite grayFullBullishIcon middle"></i><i class="newSiteIconsSprite grayFullBullishIcon middle"></i><i class="newSiteIconsSprite grayFullBullishIcon middle"></i></td> <td class="left event">Building Permits (Jun)</td>
</tr>
Below is my code
Time = row.tr['event_timestamp']
Am getting None , what can I change to get the time?

from bs4 import BeautifulSoup
html = '''<tr id="eventRowId_454169" event_attr_id="25" event_timestamp="2022-07-19 12:30:00" onclick="javascript:changeEventDisplay(454169, this, 'overview');">
<td class="first left time">15:30</td>
<td class="flagCur"> USD</td> <td class="sentiment" title="High Volatility Expected"><i class="newSiteIconsSprite grayFullBullishIcon middle"></i><i class="newSiteIconsSprite grayFullBullishIcon middle"></i><i class="newSiteIconsSprite grayFullBullishIcon middle"></i></td> <td class="left event">Building Permits (Jun)</td>
</tr>
'''
soup = BeautifulSoup(html, 'html.parser')
time = soup.select_one('tr').get('event_timestamp')
print(time)

Related

Getting value from page with BeautifulSoup

I have the following page structure:
<tr class="small data-row" bgcolor="#f9f9f9">.</tr>
<td class="stats1" align="right">0</td>
<td class="stats1" align="right">0</td>
<td class="stats1" align="right">0</td>
<td class="stats1 stats-dash" align="right">-</td>
.
.
.
<tr class="small data-row" bgcolor="#ffffff">.</tr>
<tr class="small data-row" bgcolor="#f9f9f9">.</tr>
<tr class="small" bgcolor="#eff6ef">.</tr>
<td class="stats1" align="right">215</td>
<td class="stats1" align="right">183</td>
<td class="stats1" align="right">0</td>
<td class="stats1 stats-dash" align="right">-</td>
</tr>
I would like to get this second value == 183, but I am not sure how to do it. I tried in that way:
content = driver.page_source
soup = BeautifulSoup(content)
for elm in soup.select(".stats1"):
val=elm.get("align")
and the output is:
right
<td align="right" class="stats1">215</td>
if I got 183 instead of 215 I could use .split, but in this case I get only this first value.
.select() will return a list of elements. Just call that element by index:
from bs4 import BeautifulSoup
html = '''<tr class="small data-row" bgcolor="#f9f9f9">.</tr>
<tr class="small" bgcolor="#ffffff">.</tr>
<td class="stats1" align="right">215</td>
<td class="stats1" align="right">183</td>
<td class="stats1" align="right">0</td>
<td class="stats1 stats-dash" align="right">-</td>
</tr>'''
soup = BeautifulSoup(html, 'html.parser')
elm = soup.select(".stats1")[1]
Output:
print(elm.text)
183

Retrieving table values from HTML with the same tag names using Beautiful Soup in Python

I am trying to retrieve all the td text for the below table using Beautiful Soup, unfortunately the tag names are the same and I am either only able to retrieve the first element or some elements are repeatedly printing. Hence not really sure of how to go about it.
Below is HTML table snippet:
<div>Table</div>
<table class="Auto" width="100%">
<tr>
<td class="Auto_head">Address</td>
<td class="Auto_head">Name</td>
<td class="Auto_head">Type</td>
<td class="Auto_head">Value IN</td>
<td class="Auto_head">AUTO Statement</td>
<td class="Auto_head">Value OUT</td>
<td class="Auto_head">RESULT</td>
<td class="Auto_head"></td>
</tr>
<tr>
<td class="Auto_body">1</td>
<td class="Auto_body">abc</td>
<td class="Auto_body">yes</td>
<td class="Auto_body">abc123</td>
<td class="Auto_body">jar</td>
<td class="Auto_body">123abc</td>
<td class="Auto_body">PASS</td>
<td class="Auto_body">na</td>
</tr>
What I want is all the text content inside these tags for example the first auto_head corresponds to first auto_body i.e. Address = 1 similarly all the values should be retrieved.
I have used find,findall,findNext and next_sibling but no luck. Here is my current code in python:
self.table = self.soup_file.findAll(class_="Table")
self.headers = [tab.find(class_="Auto_head").findNext('td',class_="Auto_head").contents[0] for tab in self.table]
self.data = [data.find(class_="Auto_body").findNext('td').contents[0] for data in self.table]
Get the headers first, then use zip(...) to combine
from bs4 import BeautifulSoup
data = '''\
<table class="Auto" width="100%">
<tr>
<td class="Auto_head">Address</td>
<td class="Auto_head">Name</td>
<td class="Auto_head">Type</td>
</tr>
<tr>
<td class="Auto_body">1</td>
<td class="Auto_body">abc</td>
<td class="Auto_body">yes</td>
</tr>
<tr>
<td class="Auto_body">2</td>
<td class="Auto_body">def</td>
<td class="Auto_body">no</td>
</tr>
<tr>
<td class="Auto_body">3</td>
<td class="Auto_body">ghi</td>
<td class="Auto_body">maybe</td>
</tr>
</table>
'''
soup = BeautifulSoup(data, 'html.parser')
for table in soup.select('table.Auto'):
# get rows
rows = table.select('tr')
# get headers
headers = [td.text for td in rows[0].select('td.Auto_head')]
# get details
for row in rows[1:]:
values = [td.text for td in row.select('td.Auto_body')]
print(dict(zip(headers, values)))
My output:
{'Address': '1', 'Name': 'abc', 'Type': 'yes'}
{'Address': '2', 'Name': 'def', 'Type': 'no'}
{'Address': '3', 'Name': 'ghi', 'Type': 'maybe'}
Get each category first then iterate using zip
s = '''<div>Table</div>
<table class="Auto" width="100%">
<tr>
<td class="Auto_head">Address</td>
<td class="Auto_head">Name</td>
<td class="Auto_head">Type</td>
<td class="Auto_head">Value IN</td>
<td class="Auto_head">AUTO Statement</td>
<td class="Auto_head">Value OUT</td>
<td class="Auto_head">RESULT</td>
<td class="Auto_head"></td>
</tr>
<tr>
<td class="Auto_body">1</td>
<td class="Auto_body">abc</td>
<td class="Auto_body">yes</td>
<td class="Auto_body">abc123</td>
<td class="Auto_body">jar</td>
<td class="Auto_body">123abc</td>
<td class="Auto_body">PASS</td>
<td class="Auto_body">na</td>
</tr></table>'''
soup = BeautifulSoup(s,features='html')
head = soup.find_all(name='td',class_='Auto_head')
body = soup.find_all(name='td',class_='Auto_body')
for one,two in zip(head,body):
print(f'{one.text}={two.text}')
Address=1
Name=abc
Type=yes
Value IN=abc123
AUTO Statement=jar
Value OUT=123abc
RESULT=PASS
=na
Searching by CSS class
The easiest solution is to add the find_all method at the end of the find
so your code will be
source = requests.get('YOUR URL')
soup=BeautifulSoup(source.text,'html.parser')
data = soup.find('tr').find_all('td')[0]
data = soup.find('tr').find_all('td')[1]
and so on just change the last list number 0,1,2... or else use for loop for the same

Beautiful Soup, for loop to write data

Im having trouble writing the contents of this soup function to the my ide.
I have the following soup function:
row = soup.find_all('td', attrs = {'class': 'Table__TD'})
here is the a subset of what it returns:
[<td class="Table__TD">Sat 11/9</td>,
<td class="Table__TD"><span class="flex"><span class="pr2">vs</span><span class="pr2 TeamLink__Logo"><a class="AnchorLink v-mid" data-clubhouse-uid="s:40~l:46~t:6" href="/nba/team/_/name/dal/dallas-mavericks" title="Team - Dallas Mavericks"><img alt="DAL" class="v-mid" data-clubhouse-uid="s:40~l:46~t:6" height="20" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" title="DAL" width="20"/></a></span><span><a class="AnchorLink v-mid" data-clubhouse-uid="s:40~l:46~t:6" href="/nba/team/_/name/dal/dallas-mavericks" title="Team - Dallas Mavericks">DAL</a></span></span></td>,
<td class="Table__TD"><a class="AnchorLink" data-game-link="true" href="http://www.espn.com/nba/game?gameId=401160772"><span class="flex tl"><span class="pr2"><div class="ResultCell tl loss-stat">L</div></span><span>138-122</span></span></a></td>,
<td class="Table__TD">31</td>,
<td class="Table__TD">6-12</td>,
<td class="Table__TD">50.0</td>,
<td class="Table__TD">4-9</td>,
<td class="Table__TD">44.4</td>,
<td class="Table__TD">2-2</td>,
<td class="Table__TD">100.0</td>,
<td class="Table__TD">4</td>,
<td class="Table__TD">4</td>,
<td class="Table__TD">2</td>,
<td class="Table__TD">3</td>,
<td class="Table__TD">2</td>,
<td class="Table__TD">1</td>,
<td class="Table__TD">18</td>,
<td class="Table__TD">Fri 11/8</td>,
I am trying to use a for loop to write these out but my console is not returning anything.
for data in row[0].find_all('td'):
print(data.get_text())
Can anyone tell me what I am doing wrong? Thanks.
With the initial search, you don't need to re-find_all on the tag name.
Just do something like:
for data in row:
print(data.get_text())

How to extract specific <td> from table

I'm working on a web scraping program using Python & BeautifulSoup. I encountered a problem when scraping a table.
My problem is, I need to extract selected <td> tags only and not the entire table.
I only need the numbers for 52 Week High, 52 Week Low, Earnings Per Share and Price to book value.
Is there anyway I can do that?
Sample Table
<table id="TABLE_1">
<tbody id="TBODY_2">
<tr id="TR_3">
<td id="TD_4">
<strong id="STRONG_5">52-Week High:</strong>
</td>
<td id="TD_6">
1,116.00
</td>
<td id="TD_7">
<strong id="STRONG_8">Earnings Per Share TTM (EPS):</strong>
</td>
<td id="TD_9">
47.87 (15.57%)
</td>
<td id="TD_10">
<strong id="STRONG_11">Price to Book Value (P/BV):</strong>
</td>
<td id="TD_12">
2.5481125565
</td>
</tr>
<tr id="TR_13">
<td id="TD_14">
<strong id="STRONG_15">52-Week Low:</strong>
</td>
<td id="TD_16">
867.50
</td>
<td id="TD_17">
<strong id="STRONG_18">Price-Earnings Ratio TTM (P/E):</strong>
</td>
<td id="TD_19">
20.8272404429
</td>
<td id="TD_20">
<strong id="STRONG_21">Return on Equity (ROE):</strong>
</td>
<td id="TD_22">
12.42%
</td>
</tr>
<tr id="TR_23">
<td id="TD_24">
<strong id="STRONG_25">Fair Value:</strong>
</td>
<td id="TD_26">
-
</td>
<td id="TD_27">
<strong id="STRONG_28">Dividends Per Share (DPS):</strong>
</td>
<td id="TD_29">
-
</td>
<td id="TD_30">
<strong id="STRONG_31">Recommendation:</strong>
</td>
<td id="TD_32">
None<span id="SPAN_33"></span>
</td>
</tr>
<tr id="TR_34">
<td id="TD_35">
<strong id="STRONG_36">Last Price:</strong>
</td>
<td id="TD_37">
<span id="SPAN_38"></span> <span id="SPAN_39">984.5</span>
</td>
</tr>
</tbody>
</table>
I also showed my codes for your reference.
Any help would be very much appreciated! Thank you!
from bs4 import BeautifulSoup as soup
from urllib.request import Request, urlopen
import pandas as pd
myurl = "https://www.investagrams.com/Stock/ac"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(myurl,headers=hdr)
# Open connection to website
uClient = urlopen(req)
# Offloads the content to variable
page_html = uClient.read()
#just closing it
uClient.close()
# html parser
page_soup = soup(page_html, "html.parser")
table = page_soup.find("div", {"id":"FundamentalAnalysisPanel"}).find("table")
print(table.text)
You can do it with findNextSibling method.
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.investagrams.com/Stock/ac')
soup = BeautifulSoup(r.text)
# specify table parameters for which you want to find values
parameters = ['52-Week High:', '52-Week Low:', 'Earnings Per Share TTM (EPS):', 'Price-Earnings Ratio TTM (P/E):', 'Price to Book Value (P/BV):']
# iterate all <td> tags and print text of the next sibling (with value),
# if this <td> contains specified parameter.
for td in soup.findAll('td'):
for p in parameters:
if td.find('strong', text=p) is not None:
print(td.findNextSibling().text.strip())
Result:
1,116.00
47.87 (15.57%)
2.5481125565
867.50
20.8272404429
This might be what you want
page_soup = soup(req.data.decode('utf-8'))
#tables = page_soup.find_all('table')
tables = page_soup.find_all('td')
df = pd.read_html(str(tables[i]))
where i is the table you want

Get href Attribute Link from td tag BeautifulSoup Python

I am new in Python and someone suggested me to use Beautiful soup for Scrapping and i am struck in a problem to fetch the href attribute from a td tag Column 2 on the basis of year in column 4.
<table class="tableFile2" summary="Results">
<tr>
<th width="7%" scope="col">Filings</th>
<th width="10%" scope="col">Format</th>
<th scope="col">Description</th>
<th width="10%" scope="col">Filing Date</th>
<th width="15%" scope="col">File/Film Number</th>
</tr>
<tr>
<td nowrap="nowrap">8-K</td>
<td nowrap="nowrap"> Documents</td>
<td class="small" >Current report, items 8.01 and 9.01
<br />Acc-no: 0001193125</td>
<td>2013-05-03</td>
<td nowrap="nowrap">000-10030<br>13813281 </td>
</tr>
<tr class="blueRow">
<td nowrap="nowrap">424B2</td>
<td nowrap="nowrap"> Documents</td>
<td class="small" >Prospectus [Rule 424(b)(2)]<br />Acc-no: 0001193125</td>
<td>2013-05-01</td>
<td nowrap="nowrap">333-188191<br>13802405 </td>
</tr>
<tr>
<td nowrap="nowrap">FWP</td>
<td nowrap="nowrap"> Documents</td>
<td class="small" >Filing under Securities Act Rules 163/433 of free writing prospectuses<br />Acc-no: 0001193125-13-189053 (34 Act) Size: 52 KB </td>
<td>2013-05-01</td>
<td nowrap="nowrap">333-188191<br>13800170 </td>
</tr>
</table>
table = soup.find('table', class="tableFile2")
rows = table.findAll('tr')
for tr in rows:
cols = tr.findAll('td')
if "2013" in cols[3]
link = cols[1].find('a').get('href')
print
This works for me in Python 2.7:
table = soup.find('table', {'class': 'tableFile2'})
rows = table.findAll('tr')
for tr in rows:
cols = tr.findAll('td')
if len(cols) >= 4 and "2013" in cols[3].text:
link = cols[1].find('a').get('href')
print link
A few issues with your previous code:
soup.find() requires a dictionary of attributes (e.g., {'class' : 'tableFile2'})
Not every cols instance will have at least 3 columns, so you need to check length first.

Categories

Resources