This is a part of the HTML page I want to scrape.
I am trying to get the title and the value of cryptos using BeautifulSoup.
I have tried many solutions using find and find_all to get the content inside the div, but I don't see what is wrong. Here is an example of what I tried:
titles = soup.find_all("div", {"class": "tabTitle-qQlkPW5Y"})
Can you please help me with this?
My solution is to use Selenium to make sure the page is fully rendered, then navigate through its elements with BeautifulSoup.
from selenium import webdriver
from bs4 import BeautifulSoup

# Let Chrome execute the page's JavaScript before grabbing the source
driver = webdriver.Chrome(pathToChromeWebDriver)
url = "https://fr.tradingview.com/markets/cryptocurrencies/global-charts/"
driver.get(url)

# Hand the rendered HTML over to BeautifulSoup
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')

for title in soup.find_all("div", {"class": "tabTitle-qQlkPW5Y"}):
    print(title.string)
I would simply like to get the open price of a stock. BeautifulSoup or Selenium is okay, but I keep getting just the HTML tag and not the actual price I want:
# <div class="tv-fundamental-block__value js-symbol-open">33931.0</div>
import requests
from bs4 import BeautifulSoup
url = requests.get('https://www.tradingview.com/symbols/PEPPERSTONE-US30/')
response = url.content
soup = BeautifulSoup(response, 'html.parser')
# print(soup.prettify())
open = soup.find('div', {'class': 'js-symbol-open'})
print(open)
33931.0 is the price I'd like to see in my terminal, but I still don't get it.
Using Selenium, I've only gotten the page itself; I already know where I am getting the data from.
To extract the text content of the element using BeautifulSoup, use the .text property of the element:
open = soup.find('div', {'class': 'js-symbol-open'}).text
print(open)
In Selenium:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('https://www.tradingview.com/symbols/PEPPERSTONE-US30/')

# .text returns the rendered text of the matched element
open_price = driver.find_element(By.CSS_SELECTOR, '.js-symbol-open').text
print(open_price)
driver.quit()
I am trying to parse the title of links using BeautifulSoup. I have tried various things but just can't get it to work.
The HTML is behind a login, so here's a screenshot:
And here's my latest attempt, which I was sure would work but just returns "None".
from bs4 import BeautifulSoup

soup = BeautifulSoup(driver.page_source, 'html.parser')
links = soup.find_all('ul', class_='nav list-group')
print(links)

for link in links:
    title = link.get('title')
    print(title)
Can anyone see what I am doing wrong?
This line of code:
links = soup.find_all('ul', class_='nav list-group')
is not extracting the links; it's extracting the <ul> tags. Instead, you could try extracting the links with something like:
links = soup.find_all('a', class_='odds')
Then you will be able to loop over them and extract your titles:
for link in links:
    print(link['title'])
What happens?
You are selecting the <ul>, not its <a> elements, so you won't get any href value.
How to fix?
Select more specifically, e.g. with this CSS selector, which will find every <a> that has a title attribute inside your <ul>:
links = soup.select('ul.nav.list-group a[title]')
Example
Note: Your question needs some improvement; you should provide the relevant part of driver.page_source as text, not as an image. I took your code as a starting point, so this is just a hint.
from bs4 import BeautifulSoup

soup = BeautifulSoup(driver.page_source, 'html.parser')

for link in soup.select('ul.nav.list-group a[title]'):
    title = link.get('title')
    print(title)
I am trying to scrape the NBA.com play-by-play table, so I want to get the text for each box shown in the example picture.
For example: https://www.nba.com/game/bkn-vs-cha-0022000032/play-by-play.
Checking the HTML code, I figured that each line is in an <article> tag that contains a <div> tag, which in turn contains two <p> tags with the information I want. However, with the following code I get back 0 articles and only 9 <p> tags (there should be many more), and even for the tags I do get, their text is not the box content but something else. Since I only get 9 tags, I am doing something terribly wrong and I am not sure what it is.
This is the code to get the tags:
from urllib.request import urlopen
from bs4 import BeautifulSoup

def contains_word(t):
    return t and 'keyword' in t

url = "https://www.nba.com/game/bkn-vs-cha-0022000032/play-by-play"
page = urlopen(url)
html = page.read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")

div_tags = soup.find_all('div', text=contains_word("playByPlayContainer"))
articles = soup.find_all('article')
p_tag = soup.find_all('p', text=contains_word("md:bg"))
Thank you!
Use Selenium, since the page is rendered with JavaScript, and pass the page source to BeautifulSoup. You will also need to pip install selenium and get chromedriver.exe.
from selenium import webdriver
from bs4 import BeautifulSoup

# Let Chrome execute the page's JavaScript before parsing
driver = webdriver.Chrome()
driver.get("https://www.nba.com/game/bkn-vs-cha-0022000032/play-by-play")
soup = BeautifulSoup(driver.page_source, "html.parser")
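Once the page has rendered, the rows the question describes should be reachable from soup. A minimal follow-up sketch, assuming the <article>/<p> layout from the question still matches the rendered page:

# Assumption: each play-by-play row is an <article> whose <p> tags hold
# the clock and the play description
for row in soup.find_all('article'):
    texts = [p.get_text(strip=True) for p in row.find_all('p')]
    if texts:
        print(texts)

driver.quit()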
I've been trying to scrape two values from a website using BeautifulSoup in Python, and it's been giving me trouble. Here is the URL of the page I'm scraping:
https://www.stjosephpartners.com/Home/Index
Here are the values I'm trying to scrape:
[screenshot: HTML of the website to be scraped]
I tried:
from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.stjosephpartners.com/Home/Index').text
soup = BeautifulSoup(source, 'lxml')

gold_spot_shell = soup.find('div', class_='col-lg-10').children
print(gold_spot_shell)
The output I got was: <list_iterator object at 0x039FD0A0>
When I tried using: gold_spot_shell = soup.find('div', class_='col-lg-10').children
The output was: ['\n']
When I tried using: gold_spot_shell = soup.find('div', class_='col-lg-10').span
The output was: None
The HTML definitely has at least one span child. I'm not sure how to scrape the values I'm after. Thanks.
BeautifulSoup + requests is not a good way to scrape a dynamic website like this. That span is generated by JavaScript, so when you fetch the HTML with requests, it simply does not exist.
You can try to use Selenium instead.
You can check whether the website uses JavaScript to render an element by disabling JavaScript on the page and looking for that element again, or just by using "view page source".
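A minimal sketch of the Selenium route, assuming the value sits in a <span> inside that col-lg-10 div (the selector is a guess; adjust it to what you see in devtools):

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get('https://www.stjosephpartners.com/Home/Index')

# Parse the JavaScript-rendered page instead of the raw server response
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()

# 'div.col-lg-10 span' is an assumption; replace it with the real selector
gold_spot = soup.select_one('div.col-lg-10 span')
print(gold_spot.text if gold_spot else 'element not found')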
This is my code to scrape all links in a webpage:
from bs4 import BeautifulSoup
import requests
import re

page = requests.get("http://www3.asiainsurancereview.com/News")
soup = BeautifulSoup(page.text, "html.parser")

# Print every anchor whose href starts with http://
for link in soup.findAll('a', attrs={'href': re.compile("^http://")}):
    print(link.get('href'))
But it lists only the links present in the drop-downs. Why is that? Why does it not "see" the links to the news articles on the page? I actually want to scrape all the news articles. I tried the following to identify a tag and scrape the news-article links within it:
from bs4 import BeautifulSoup
import requests
import re

links = open("Life_and_health_links.txt", "a")
page = requests.get("http://www3.asiainsurancereview.com/News")
soup = BeautifulSoup(page.text, "html.parser")

li_box = soup.select('div.col-sm-5 > ul > li > h5 > a')
for link in li_box:
    print(link['href'])
But this, of course, displays only the links in that particular tag. To list the links in other tags, I would have to run this code multiple times, specifying each tag whose links I want. How do I list all the news-article links across all the tags, while skipping links that are not news articles?
You need to do some research to find the common pattern for news links.
Try this; hope it works.
li_box = soup.select("div ul li h5 a")
for a in li_box:
    print(a['href'])
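If some of the matched hrefs are relative, or you want to append them to the text file from your second snippet, here is a small follow-up sketch (the file name is taken from your question, and the selector is the one above):

from urllib.parse import urljoin

base = "http://www3.asiainsurancereview.com/News"
with open("Life_and_health_links.txt", "a") as links:
    for a in soup.select("div ul li h5 a"):
        # urljoin resolves relative hrefs and leaves absolute URLs unchanged
        links.write(urljoin(base, a['href']) + "\n")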