BeautifulSoup - Python - Find the key from HTML - python

I have been practicing with bs4 and Python, and now I am stuck.
My plan is to write an if/else statement that does something like this:
if (I find a value inside this HTML):
    do this method
else:
    do something else
I have scraped some HTML from a page I found at random, and it looks like this:
<div class="Talkinghand" data-backing="ShowingHide" data-key="123456" data-theme="$MemeTheme" style=""></div>
and what I have done so far is that:
s = requests.Session()
Url = 'www.myhtml.com' #Just took a random page which I don't feel to insert
r = s.get(Url)
soup = soup(r, "lxml")
findKey = soup.find(('div', {'class': 'Talkinghand'})['data-key'])
print(findKey)
but no luck. It gives me this error:
TypeError: object of type 'Response' has no len()
Once I can find or print out the key, I want to write an if/else statement that says:
if (there is a value inside that data-key):
...

To display the data-key attribute from inside the <div> tag, you can do the following:
from bs4 import BeautifulSoup
html = '<div class="Talkinghand" data-backing="ShowingHide" data-key="123456" data-theme="$MemeTheme" style=""></div>'
soup = BeautifulSoup(html, "html.parser")
print(soup.div['data-key'])
This would print:
123456
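The TypeError occurs because r is a requests Response object, not markup. Pass r.content (the raw bytes of the page) to your soup(...) call instead.
Your script also had an extra ( and ) inside find(), which made ['data-key'] apply to a tuple instead of to the tag that find() returns. With them removed, the following works:
findKey = soup.find('div', {'class': 'Talkinghand'})['data-key']
print(findKey)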
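Putting both fixes together with the if/else from the question, a minimal sketch could look like this. It parses the HTML snippet from the question instead of a live page (for a real page you would pass r.content from requests), and the helper name find_data_key is hypothetical. Using .get() instead of ['data-key'] returns None rather than raising a KeyError when the attribute is missing.

```python
from bs4 import BeautifulSoup


def find_data_key(html):
    """Return the data-key of the first <div class="Talkinghand">, or None.

    find_data_key is a hypothetical helper name used for illustration.
    """
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="Talkinghand")
    if div is None:  # find() returns None when no tag matches
        return None
    # .get() returns None instead of raising KeyError if the attribute is absent
    return div.get("data-key")


html = ('<div class="Talkinghand" data-backing="ShowingHide" data-key="123456" '
        'data-theme="$MemeTheme" style=""></div>')

key = find_data_key(html)
if key:  # an empty data-key="" also falls through to the else branch
    print("Found key:", key)  # the "do this method" branch
else:
    print("No key found")  # the "do something else" branch
```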

Related

Loop through div class tags in BeautifulSoup only gets the first result

Usually, when I want to loop through all the elements on a web page, I just do something like:
for i in range(..):
    print(get_stuff[i])
But in this case all of the results are inside one element, and findAll only gets me the first one. Even if I do this:
from bs4 import BeautifulSoup
import requests
req = requests.get(f"https://jisho.org/search/%23words%20%23n%20?page=1")
soup = BeautifulSoup(req.text, 'html.parser')
concepts = soup.findAll("div",{"class":"concepts"})
tango = concepts[0].findAll("div",{"class":"concept_light clearfix"})
for _ in tango:
    tango1 = tango[0].findAll("span",{"class":"text"})[0].text
    print(tango1)
I just get the output of the first result repeated. How do I loop through all the "concept_light clearfix" tags instead? I've looked at other answers for a similar question but I didn't understand the solutions (or how to apply them to my case) at all. Please explain simply, thank you.
Try this:
from bs4 import BeautifulSoup
import requests
with requests.Session() as session:
    req = session.get("https://jisho.org/search/%23words%20%23n%20?page=1")
    req.raise_for_status()
    soup = BeautifulSoup(req.text, 'html.parser')
    for concept in soup.find_all("div", attrs={"class": "concepts"}):
        for tango in concept.find_all("div", attrs={"class": "concept_light clearfix"}):
            for span in tango.find_all("span", attrs={"class": "text"}):
                print(span.text)
You can select all tags with class "concept_light" and then select the <span class="text"> within each of them. For example:
import requests
from bs4 import BeautifulSoup
req = requests.get("https://jisho.org/search/%23words%20%23n%20?page=1")
soup = BeautifulSoup(req.content, "html.parser")
for concept in soup.select(".concept_light"):
    print(concept.select_one("span.text").get_text(strip=True))
Prints:
学校
川
手
戸
眼鏡
煙草
赤
仕事
英語
問題
部屋
子供
時間
雨
先生
年
手紙
電話
水
病気
You are almost there.
The problem is in the for-loop. The loop runs once per item in tango, but every pass reads only the first item, because of this line:
tango1 = tango[0].findAll("span",{"class":"text"})[0].text
Use the loop variable instead:
for i in tango:
    tango1 = i.findAll("span",{"class":"text"})[0].text.strip()
    print(tango1)
Output with the above for-loop.
学校
川
手
戸
眼鏡
煙草
赤
仕事
英語
問題
部屋
子供
時間
雨
先生
年
手紙
電話
水
病気
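To see the difference between the two loops without hitting the live site, here is a small sketch against a hand-written HTML fragment. The fragment is an assumption that only imitates the concept_light blocks on jisho.org; it is not the real page markup.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the search results; not the real jisho.org markup
html = """
<div class="concepts">
  <div class="concept_light clearfix"><span class="text">学校</span></div>
  <div class="concept_light clearfix"><span class="text">川</span></div>
  <div class="concept_light clearfix"><span class="text">手</span></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
tango = soup.select("div.concept_light")

# The question's loop: indexing tango[0] on every pass repeats the first result
repeated = [tango[0].find("span", class_="text").get_text(strip=True) for _ in tango]

# The fixed loop: each pass reads the <div> bound to the loop variable
words = [t.find("span", class_="text").get_text(strip=True) for t in tango]

print(repeated)
print(words)
```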

How to extract text from 'a' element with BeautifulSoup?

I'm trying to get the text from an 'a' HTML element that I found with BeautifulSoup.
I am able to print the whole thing, and what I want is right there:
-1
Tensei Shitara Slime Datta Ken Manga
-1
But when I want to be more specific and get the text from that it gives me this error:
File "C:\python\manga\manga.py", line 15, in <module>
print(title.text)
AttributeError: 'int' object has no attribute 'text'
Here is the code I'm running:
import requests
from bs4 import BeautifulSoup
URL = 'https://mangapark.net/manga/tensei-shitara-slime-datta-ken-fuse'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find('section', class_='manga')
manga_title = soup.find('div', class_='pb-1 mb-2 line-b-f hd')
for m_title in manga_title:
    title = m_title.find('a')
    print(title.text)
I've searched for my problem but I couldn't find something that helps.
The -1 does not come from Beautiful Soup's search. Looping over a Tag (or over its .children) yields every child node, including the whitespace text between tags. Those text nodes are NavigableString objects, a subclass of str, so m_title.find('a') calls Python's str.find(), which returns -1 when the substring is not found. Skipping those values prints only the real link:
import requests
from bs4 import BeautifulSoup
URL = 'https://mangapark.net/manga/tensei-shitara-slime-datta-ken-fuse'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find('section', class_='manga')
manga_title = soup.find('div', class_='pb-1 mb-2 line-b-f hd')
for m_title in manga_title.children:
    title = m_title.find('a')
    # Whitespace text nodes are strings, so .find('a') here is str.find(),
    # which returns -1 when there is no "a" in the text
    if title != -1:
        print(title.text)
Output
Tensei Shitara Slime Datta Ken Manga
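The -1 values are what Python's str.find() returns for the whitespace text nodes among the div's children. Checking the node type instead of the return value is a little more robust, because str.find('a') returns a non-negative index (still an int) whenever a text node happens to contain the letter "a". The HTML below is an assumed stand-in for the mangapark.net markup, with the link nested one level inside the div.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Assumed structure for illustration; the real page may nest things differently
html = """
<div class="pb-1 mb-2 line-b-f hd">
  <h3><a href="/manga/tensei-shitara-slime-datta-ken-fuse">Tensei Shitara Slime Datta Ken Manga</a></h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
manga_title = soup.find("div", class_="pb-1 mb-2 line-b-f hd")

titles = []
for child in manga_title.children:
    # Whitespace between tags arrives as NavigableString; only Tag children can hold an <a>
    if isinstance(child, Tag):
        link = child.find("a")
        if link is not None:
            titles.append(link.get_text(strip=True))

print(titles)

# If only the first link is needed, searching the div directly avoids the loop
print(manga_title.find("a").get_text(strip=True))
```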

Trying to get a value from HTML code using Beautiful Soup, but having a hard time getting it

I'm trying to get the value shown in the picture from the website https://www.coop.se/butiker-erbjudanden/coop/coop-ladugardsangen-/ using Beautiful Soup, but the only value I get is the price number, not the "st" value.
Here is the code I'm trying to use to get it:
test = product.find('span', class_='Splash-content ')
print(Price.text)
import requests
from bs4 import BeautifulSoup as bsoup
site_source = requests.get("https://www.coop.se/butiker-erbjudanden/coop/coop-ladugardsangen-/").content
soup = bsoup(site_source, "html.parser")
all_items = soup.find("div", class_="Section Section--margin")
item_list = soup.find_all("span", class_="Splash-content")
for item in item_list:
    print("Price: ", item.find("span", class_="Splash-priceLarge").text)
    # Some items have no unit, so check before printing it
    unit = item.find("span", class_="Splash-priceSub Splash-priceUnitNoDecimal")
    if unit:
        print("Unit: ", unit.text)
In some cases the unit is missing, so the code checks for it before printing.
My understanding is that you want to print the price and unit of each item, so that is what this does.
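The same missing-unit check can be wrapped in a function that returns None for the unit instead of skipping it, which makes it easy to test offline. The nested span layout below is an assumption modeled on the class names in the answer, not copied from coop.se, and parse_offers is a hypothetical name.

```python
from bs4 import BeautifulSoup

# Assumed markup using the class names from the answer; not the real coop.se HTML
html = """
<span class="Splash-content"><span class="Splash-priceLarge">25</span><span class="Splash-priceSub Splash-priceUnitNoDecimal">/st</span></span>
<span class="Splash-content"><span class="Splash-priceLarge">49</span></span>
"""


def parse_offers(markup):
    """Return (price, unit) pairs; unit is None when the item has no unit span."""
    soup = BeautifulSoup(markup, "html.parser")
    offers = []
    for item in soup.find_all("span", class_="Splash-content"):
        price = item.find("span", class_="Splash-priceLarge")
        unit = item.find("span", class_="Splash-priceSub Splash-priceUnitNoDecimal")
        offers.append((
            price.get_text(strip=True) if price else None,
            unit.get_text(strip=True) if unit else None,
        ))
    return offers


for price, unit in parse_offers(html):
    print("Price:", price, "Unit:", unit or "-")
```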
Try this:
import urllib.request
from urllib.error import HTTPError
from bs4 import BeautifulSoup
url = "https://www.coop.se/butiker-erbjudanden/coop/coop-ladugardsangen-/"
try:
    page = urllib.request.urlopen(url, timeout=20)
except HTTPError as e:
    page = e.read()
soup = BeautifulSoup(page, 'html.parser')
body = soup.find('body')
result = body.find("span", class_="Splash-content")
print(result.get_text())
It worked for me!

AttributeError: 'NoneType' object has no attribute 'find_all' Python Web Scraping w/ Beautiful Soup

I have two problems. First, I get the error in the title, "AttributeError: 'NoneType' object has no attribute 'find_all'", whenever I run one particular line of code. Second, I want to get one more statistic from this website. My code is below. It is meant to gather names from a website, trim off the excess characters, insert each name into a URL, and collect two statistics from that page. The first statistic is gathered in the skillAverageList.append(...) line, which is where the error comes from. The second statistic is shown as HTML after my code.
import requests
from bs4 import BeautifulSoup
import re
res = requests.get('https://plancke.io/hypixel/guild/name/GBP')
soup = BeautifulSoup(res.text, 'lxml')
memberList = []
skillAverageList = []
for i in soup.select('.playerInfo'):
    memberList.append(i.text)
memberList = [e[37:-38] for e in memberList]
members = [re.sub("[A-Z][^A-Z]+$", "", member.split(" ")[1]) for member in memberList]
print(members)
for i in range(len(memberList) + 1):
    player = memberList[i]
    skyLeaMoe = requests.get('https://sky.lea.moe/stats/' + str(player))
    skillAverageList.append(soup.find("div", {"id":"additional_stats_container"}).find_all("div",class_="additional-stat")[-2].get_text(strip=True))
pprint(skillAverageList)
Below is the HTML for the second statistic that I would also like to scrape. This snippet comes from one particular player's page (https://sky.lea.moe/stats/Igris/Apple), but the code above should ideally cycle through the entire member list.
<span class="stat-name">Total Slayer XP: </span>
<span class="stat-value">457,530</span>
I'm sorry if this is a lot. I have almost no knowledge of HTML, and every attempt to learn it has been a struggle. Thanks in advance to anyone this reaches.
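It seems that this site doesn't have a div with the id "additional_stats_container", so soup.find("div", {"id":"additional_stats_container"}) returns None, and calling .find_all() on None raises the AttributeError.
Upon inspecting the HTML of this URL with a browser, I couldn't find such a div. Note also that inside your loop, soup still refers to the guild page parsed before the loop; the skyLeaMoe response is never passed to BeautifulSoup.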
This script will print all names and their Total Slayer XP:
import requests
from bs4 import BeautifulSoup
url = 'https://plancke.io/hypixel/guild/name/GBP'
stats_url = 'https://sky.lea.moe/stats/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
for a in soup.select('td a[href*="/hypixel/player/stats/"]'):
    s = BeautifulSoup(requests.get(stats_url + a['href'].split('/')[-1]).content, 'html.parser')
    # :-soup-contains() replaces the deprecated :contains() in soupsieve 2.1+
    total_slayer_xp = s.select_one('span:-soup-contains("Total Slayer XP") + span')
    print('{:<30} {}'.format(a.text, total_slayer_xp.text if total_slayer_xp else '-'))
Prints:
[MVP+] Igris 457,530
[VIP] Kutta 207,665
[VIP] mistercint 56,455
[MVP+] zouce 1,710,540
viellythedivelon 30
[MVP+] Louis7864 141,670
[VIP] Broadside1138 292,240
[VIP+] Babaloops 40
[VIP+] SparkleDuck9 321,290
[VIP] TooLongOfAUserNa 423,700
...etc.
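The AttributeError in the title follows a general pattern: find() and select_one() return None when nothing matches, and any method called on that None fails. A minimal sketch of guarding against it while reading the stat from the snippet in the question (the helper name total_slayer_xp is hypothetical, and the :-soup-contains() pseudo-class needs soupsieve 2.1 or later):

```python
from bs4 import BeautifulSoup


def total_slayer_xp(html):
    """Return Total Slayer XP as an int, or None if the stat is not on the page."""
    soup = BeautifulSoup(html, "html.parser")
    # "+" selects the stat-value span immediately following the matching stat-name span
    value = soup.select_one('span.stat-name:-soup-contains("Total Slayer XP") + span.stat-value')
    if value is None:  # no match: check first, or .get_text() would raise AttributeError
        return None
    return int(value.get_text(strip=True).replace(",", ""))


snippet = '<span class="stat-name">Total Slayer XP: </span><span class="stat-value">457,530</span>'
print(total_slayer_xp(snippet))
print(total_slayer_xp("<div>no stats here</div>"))
```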

How to retrieve href that contain specific text in Beautifulsoup 4?

My code:
import requests
from bs4 import BeautifulSoup
page = requests.get('https://example.com')
soup = BeautifulSoup(page.text, 'html.parser')
property_list = soup.find(class_='listing-list ListingsListstyle__ListingsListContainer-cNRhPr hqtMPr')
property_link_list = property_list.find_all('a',{ "class" : "depth-listing-card-link" },string="View details")
print(property_link_list)
I just get an empty list. What I need is to retrieve the href of every link that contains the text "View details".
This is an example of the input:
<a class="depth-listing-card-link" href="https://example.com">View details<i class="rui-icon rui-icon-arrow-right-small Icon-cFRQJw cqwgEb"></i></a>
I am using Python 3.7.
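The string="View details" filter compares against each tag's .string, which is None here because the <a> contains both text and an <i> element, so nothing matches. Try changing the last two lines of your code to:
property_link_list = property_list.find_all('a',{ "class" : "depth-listing-card-link" })
for pty in property_link_list:
    if pty.text=="View details":
        print(pty['href'])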
My output is:
/property/bandar-sungai-long/sale-7700845/
/property/bandar-sungai-long/sale-7700845/
/property/bandar-sungai-long/sale-4577620/
/property/bandar-sungai-long/sale-4577620/
/property/port-dickson/sale-8387235/
/property/port-dickson/sale-8387235/
etc.
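Two ways to do this filtering, sketched offline against the example input from the question (with a second, non-matching link added as an assumption): comparing get_text(strip=True) in Python, or letting a CSS selector do it with soupsieve's :-soup-contains(), which matches against all of the element's descendant text. Note that :-soup-contains() is a substring match, so it would also accept text like "View details now", while the first option requires an exact match.

```python
from bs4 import BeautifulSoup

# Example input from the question, plus a second link that should not match
html = """
<div class="listing-list">
  <a class="depth-listing-card-link" href="https://example.com">View details<i class="rui-icon rui-icon-arrow-right-small Icon-cFRQJw cqwgEb"></i></a>
  <a class="depth-listing-card-link" href="https://example.com/photos">Photos</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Option 1: exact comparison on the full text of each link
by_text = [
    a["href"]
    for a in soup.find_all("a", class_="depth-listing-card-link")
    if a.get_text(strip=True) == "View details"
]

# Option 2: substring match on the link's text with a CSS selector (soupsieve >= 2.1)
by_selector = [
    a["href"]
    for a in soup.select('a.depth-listing-card-link:-soup-contains("View details")')
]

print(by_text)
print(by_selector)
```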
