Got empty list with Beautiful Soup and Selenium - python

https://www.rottentomatoes.com/m/the_lord_of_the_rings_the_return_of_the_king
I want to get TOMATOMETER and AUDIENCE SCORE from that website,
but got an empty list.
soup = BeautifulSoup(html, 'html.parser')
notices = soup.select('#tomato_meter_link > span.mop-ratings-wrap__percentage')

You can use last-child selector for span type with the parent class. This is using BeautifulSoup 4.7.1
import requests
from bs4 import BeautifulSoup
res = requests.get('https://www.rottentomatoes.com/m/the_lord_of_the_rings_the_return_of_the_king')
soup = bs(res.content, 'lxml')
ratings = [item.text.strip() for item in soup.select('h1.mop-ratings-wrap__score span:last-child')]
print(ratings)

Your code works well
>>> from bs4 import BeautifulSoup
>>> html = requests.get('https://www.rottentomatoes.com/m/the_lord_of_the_rings_the_return_of_the_king').text
>>> soup = BeautifulSoup(html, 'html.parser')
>>> notices = soup.select('#tomato_meter_link > span.mop-ratings-wrap__percentage')
>>> notices
[<span class="mop-ratings-wrap__percentage">93%</span>]
How did you get html variable?

Related

How can I scrape Songs Title from this request that I have collected using python

import requests
from bs4 import BeautifulSoup
r = requests.get("https://gaana.com/playlist/gaana-dj-hindi-top-50-1")
soup = BeautifulSoup(r.text, "html.parser")
result = soup.find("div", {"class": "s_c"})
print(result.class)
From the above code, I am able to scrape this data
https://www.pastiebin.com/5f08080b8db82
Now I would like to scrape only the title of the songs and then make a list out of them like the below:
Meri Aashiqui
Genda Phool
Any suggestions are much appreciated!
Try this :
import requests
from bs4 import BeautifulSoup
r = requests.get("https://gaana.com/playlist/gaana-dj-hindi-top-50-1")
soup = BeautifulSoup(r.text, "html.parser")
result = soup.find("div", {"class": "s_c"})
#print(result)
div = result.find_all('div', class_='track_npqitemdetail')
name_list = []
for x in div:
span = x.find('span').text
name_list.append(span)
print(name_list)
this code will return all song name in name_list list.

How to get just links of articles in list using BeautifulSoup

Hey guess so I got as far as being able to add the a class to a list. The problem is I just want the href link to be added to the links_with_text list and not the entire a class. What am I doing wrong?
from bs4 import BeautifulSoup
from requests import get
import requests
URL = "https://news.ycombinator.com"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id = 'hnmain')
articles = results.find_all(class_="title")
links_with_text = []
for article in articles:
link = article.find('a', href=True)
links_with_text.append(link)
print('\n'.join(map(str, links_with_text)))
This prints exactly how I want the list to print but I just want the href from every a class not the entire a class. Thank you
To get all links from the https://news.ycombinator.com, you can use CSS selector 'a.storylink'.
For example:
from bs4 import BeautifulSoup
from requests import get
import requests
URL = "https://news.ycombinator.com"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
links_with_text = []
for a in soup.select('a.storylink'): # <-- find all <a> with class="storylink"
links_with_text.append(a['href']) # <-- note the ['href']
print(*links_with_text, sep='\n')
Prints:
https://blog.mozilla.org/futurereleases/2020/06/18/introducing-firefox-private-network-vpns-official-product-the-mozilla-vpn/
https://mxb.dev/blog/the-return-of-the-90s-web/
https://github.blog/2020-06-18-introducing-github-super-linter-one-linter-to-rule-them-all/
https://www.sciencemag.org/news/2018/11/why-536-was-worst-year-be-alive
https://www.strongtowns.org/journal/2020/6/16/do-the-math-small-projects
https://devblogs.nvidia.com/announcing-cuda-on-windows-subsystem-for-linux-2/
https://lwn.net/SubscriberLink/822568/61d29096a4012e06/
https://imil.net/blog/posts/2020/fakecracker-netbsd-as-a-function-based-microvm/
https://jepsen.io/consistency
https://tumblr.beesbuzz.biz/post/621010836277837824/advice-to-young-web-developers
https://archive.org/search.php?query=subject%3A%22The+Navy+Electricity+and+Electronics+Training+Series%22&sort=publicdate
https://googleprojectzero.blogspot.com/2020/06/ff-sandbox-escape-cve-2020-12388.html?m=1
https://apnews.com/1da061ce00eb531291b143ace0eed1c9
https://support.apple.com/library/content/dam/edam/applecare/images/en_US/appleid/android-apple-music-account-payment-none.jpg
https://standpointmag.co.uk/issues/may-june-2020/the-healing-power-of-birdsong/
https://steveblank.com/2020/06/18/the-coming-chip-wars-of-the-21st-century/
https://www.videolan.org/security/sb-vlc3011.html
https://onesignal.com/careers/2023b71d-2f44-4934-a33c-647855816903
https://www.bbc.com/news/world-europe-53006790
https://github.com/efficient/HOPE
https://everytwoyears.org/
https://www.historytoday.com/archive/natural-histories/intelligence-earthworms
https://cr.yp.to/2005-590/powerpc-cwg.pdf
https://quantum.country/
http://www.crystallography.net/cod/
https://parkinsonsnewstoday.com/2020/06/17/tiny-magnetically-powered-implant-may-be-future-of-deep-brain-stimulation/
https://spark.apache.org/releases/spark-release-3-0-0.html
https://arxiv.org/abs/1712.09624
https://www.washingtonpost.com/technology/2020/06/18/data-privacy-law-sherrod-brown/
https://blog.chromium.org/2020/06/improving-chromiums-browser.html

Terminal won't show print response using BeautifulSoup

Here is my code:
import requests
from bs4 import BeautifulSoup
page = requests.get('https://web.archive.org/web/20121007172955/https://www.nga.gov/collection/anZ1.htm')
soup = BeautifulSoup(page.text, 'html.parser')
name_list = soup.find(class_='BodyText')
name_list_item = name_list.find_all('a')
for i in name_list_item:
names = name_list.contents[0]
print(names)
Then I ran it but nothing showed up in terminal except for a blank space like this:
Please help!! :<
the problem is in the for loop, you have to exctract content from i and not from name_list_item.
your working code should look like this:
import requests
from bs4 import BeautifulSoup
page = requests.get('https://web.archive.org/web/20121007172955/https://www.nga.gov/collection/anZ1.htm')
soup = BeautifulSoup(page.text, 'html.parser')
name_list = soup.find(class_='BodyText')
name_list_item = name_list.find_all('a')
for i in name_list_item:
names = i.contents[0]
print(names)
I will suggest you to use the below approach to get the links.
(Actually the problem with your appraoch is that it also includes invalid data that we don't want, you can print and check). There are 32 names of type <class 'bs4.element.NavigableString'> which does not have contents, so it is printing 32 LF (ASCII value 10) characters.
Useful links »
How to find tags with only certain attributes - BeautifulSoup
How to find children of nodes using Beautiful Soup
Python: BeautifulSoup extract text from anchor tag
>>> import requests
>>> from bs4 import BeautifulSoup
>>>
>>> page = requests.get('https://web.archive.org/web/20121007172955/https://www
.nga.gov/collection/anZ1.htm')
>>>
>>> soup = BeautifulSoup(page.text, 'html.parser')
>>> name_list = soup.findAll("tr", {"valign": "top"})
>>>
>>> for name in name_list:
... print(name.find("a")["href"])
...
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=11630
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=34202
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3475
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=25135
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=2298
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=23988
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=8232
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=34154
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=4910
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3450
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=1986
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3451
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=20099
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3452
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=34309
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=27191
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=5846
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3941
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3941
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3453
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=35173
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=11133
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3455
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3454
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=961
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=11597
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=11597
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=11631
/web/20121007172955/https://www.nga.gov/cgi-bin/tsearch?artistid=3427
>>>
Thank you.

Beautiful Soup find nested div

I am trying to parse a webpage that looks like this with Python->Beautiful Soup
see image
I need data from
<div class="p-offer__price-new">199,99 ₽</div>
I tried this code:
soup = BeautifulSoup(data)
res = soup.findAll("div", {"class": "poffer__price-new"})
print(res)
But result is empty -- []
How can I get this data?
Example of URL: https://edadeal.ru/moskva/offers/d71b75ff-bfee-4731-95ad-52a24ddea72e?from=%2F
import bs4
from selenium import webdriver
driver = webdriver.Chrome('C:\chromedriver_win32\chromedriver.exe')
driver.get('https://edadeal.ru/moskva/offers/d71b75ff-bfee-4731-95ad-52a24ddea72e?from=%2F')
html = driver.page_source
soup = bs4.BeautifulSoup(html,'html.parser')
res = soup.findAll("div", {"class": "p-offer__price-new"})
print (res[0].text)
driver.close()

How to crawl href - Python & beautifulsoup

I am currently crawling a web page (https://www.klook.com/city/30-kyoto/?p=1) using Python 3.4 and bs4 in order to collect the deeplinks of the respective activities.
I found that the links are located in the html source like this:
<a class="j_activity_item_link" href="/activity/1031-arashiyama-rickshaw-tour-kyoto/" class="j_activity_item_link" data-card-tags="{}" data-sold-out="false" data-price="40.0" data-city-id="30" data-id="1031" data-url-seo="arashiyama-rickshaw-tour-kyoto">
But after several trials, this href="/activity/1031-arashiyama-rickshaw-tour-kyoto/" never show up.
Here is my logic so far:
import requests
from bs4 import BeautifulSoup
user_agent = {'User-agent': 'Chrome/43.0.2357'}
for page in range(1,6):
r = requests.get("https://www.klook.com/city/30-kyoto" + "/?p=" + str(page))
soup = BeautifulSoup(r.content, "lxml")
g_data = soup.find_all("a", {"class": "j_activity_item_link"})
for item in g_data:
Deeplink = item.find_all("a")
for t in Deeplink:
print(t.get("href"))
Output:
Process finished with exit code 0
Could you guys help me put? Any feedback is appreciated.
Your "error" of error code 0 simply indicates that everything went ok with your run. According to your example, your list g_data should contain all of the a tags that you are interested in. You should not need the second for loop to again iterate through and find nested a tags. As a debugging step, print the length of your lists to ensure that they are not empty. See the following:
import requests
from bs4 import BeautifulSoup
user_agent = {'User-agent': 'Chrome/43.0.2357'}
for page in range(1,6):
r = requests.get("https://www.klook.com/city/30-kyoto" + "/?p=" + str(page))
soup = BeautifulSoup(r.content, "lxml")
g_data = soup.find_all("a", {"class": "j_activity_item_link"})
for item in g_data:
print(item.get("href"))
You can first find the number of pages of activities, and then use regex with BeautifulSoup:
import re
from bs4 import BeautifulSoup as soup
data = soup(str(urllib.urlopen('https://www.klook.com/city/30-kyoto/?p=1').read()), 'lxml')
page_numbers = [i.text for i in data.find_all('a', {'class':'p_num '})]
activities = {1:[i['href'] for i in data.find_all('a', {'href':re.compile("^/activity/")})]}
for page in page_numbers:
data = soup(str(urllib.urlopen('https://www.klook.com/city/30-kyoto/?p={}'.format(page)).read()), 'lxml')
activities[int(page)] = [i['href'] for i in data.find_all('a', {'href':re.compile("^/activity/")})]
Output:
{1: ['/activity/1079-one-day-kimono-rental-kyoto/', '/activity/1032-higashiyama-rickshaw-tour-kyoto/', '/activity/6128-kyoto-seaside-day-tour-osaka/', '/activity/1540-hankyu-1-day-tourist-pass-osaka/', '/activity/1777-icoca-ic-card-kyoto/', '/activity/1541-kix-airport-limousine-bus-transfer-kyoto/', '/activity/1753-randen-kyoto-bus-subway-1-day-pass-kyoto/', '/activity/3260-sagano-romantic-train-ticket-kyoto/', '/activity/793-japanese-lzakaya-cooking-course-kyoto/', '/activity/882-nishiki-market-teramachi-street-kyoto/', '/activity/792-morning-bento-cooking-course-kyoto/', '/activity/2918-sushi-class-experience-kyoto/', '/activity/6032-ninja-kyoto-restaurant-labyrinth-kyoto/', '/activity/5215-garden-ryokan-nanzenji-yachiyo-kyoto/', '/activity/1079-one-day-kimono-rental-kyoto/', '/activity/3260-sagano-romantic-train-ticket-kyoto/', '/activity/675-wifi-device-japan-kyoto/', '/activity/1031-arashiyama-rickshaw-tour-kyoto/', '/activity/657-day-trip-hiroshima-miyajima-kyoto/', '/activity/4774-4G-wifi-kyoto/', '/activity/2826-gionya-kimono-rental-kyoto/', '/activity/1464-kyoto-tower-admission-ticket-kyoto/', '/activity/2249-sagano-romantic-train-ticket-kyoto/', '/activity/1777-icoca-ic-card-kyoto/', '/activity/1541-kix-airport-limousine-bus-transfer-kyoto/', '/activity/1540-hankyu-1-day-tourist-pass-osaka/', '/activity/3532-wifi-device-japan-kyoto/', '/activity/1753-randen-kyoto-bus-subway-1-day-pass-kyoto/', '/activity/1319-4g-wifi-device-kyoto/', '/activity/1447-wi-ho-japan-wifi-device-kyoto/', '/activity/3826-wifi-device-japan-kyoto/', '/activity/2699-japan-wifi-device-taiwan-kyoto/', '/activity/3652-wifi-device-singapore-kyoto/', '/activity/1122-wi-ho-japan-wifi-device-kyoto/', '/activity/719-japan-docomo-sim-card-kyoto/', '/activity/6128-kyoto-seaside-day-tour-osaka/', '/activity/6241-nanzen-ji-fushimi-inari-taisha-sagano-romantic-train-day-tour/', '/activity/5137-guenpin-fugu-restaurant-kyoto/'], 2: ['/activity/1079-one-day-kimono-rental-kyoto/', '/activity/1032-higashiyama-rickshaw-tour-kyoto/', '/activity/6128-kyoto-seaside-day-tour-osaka/', '/activity/1540-hankyu-1-day-tourist-pass-osaka/', '/activity/1777-icoca-ic-card-kyoto/', '/activity/1541-kix-airport-limousine-bus-transfer-kyoto/', '/activity/1753-randen-kyoto-bus-subway-1-day-pass-kyoto/', '/activity/3260-sagano-romantic-train-ticket-kyoto/', '/activity/793-japanese-lzakaya-cooking-course-kyoto/', '/activity/882-nishiki-market-teramachi-street-kyoto/', '/activity/792-morning-bento-cooking-course-kyoto/', '/activity/2918-sushi-class-experience-kyoto/', '/activity/6032-ninja-kyoto-restaurant-labyrinth-kyoto/', '/activity/5215-garden-ryokan-nanzenji-yachiyo-kyoto/', '/activity/6543-arashiyama-golden-pavilion-temple-todaiji-kobe-mosaic-day-tour-kyoto/', '/activity/5198-nanzenji-junsei-restaurant-kyoto/', '/activity/7877-hanami-kimono-rental-kyoto/', '/activity/793-japanese-lzakaya-cooking-course-kyoto/', '/activity/9915-kyoto-osaka-sightseeing-pass-kyoto-japan/', '/activity/883-geisha-districts-tour-kyoto/', '/activity/1097-gion-kimono-experience-kyoto/', '/activity/6032-ninja-kyoto-restaurant-labyrinth-kyoto/', '/activity/792-morning-bento-cooking-course-kyoto/', '/activity/9272-4g-data-daijobu-sim-card-kyoto/', '/activity/871-sake-brewery-visit-fushimi-inari-shrine-kyoto/', '/activity/5979-tower-terrace-kyoto/', '/activity/632-kyoto-backstreet-cycling/', '/activity/646-kyoto-afternoon-exploration/', '/activity/640-kyoto-morning-sightseeing/', '/activity/872-arashiyama-bamboo-forest-half-day-tour-kyoto/', '/activity/5272-mukadeya-kyoto/', '/activity/6081-one-night-in-kyoto/', '/activity/2918-sushi-class-experience-kyoto/', '/activity/1032-higashiyama-rickshaw-tour-kyoto/', '/activity/5445-kimono-photo-shoot-kyoto/', '/activity/5215-garden-ryokan-nanzenji-yachiyo-kyoto/', '/activity/882-nishiki-market-teramachi-street-kyoto/', '/activity/7096-japan-prepaid-sim-card-kyoto/'], 3: ['/activity/1079-one-day-kimono-rental-kyoto/', '/activity/1032-higashiyama-rickshaw-tour-kyoto/', '/activity/6128-kyoto-seaside-day-tour-osaka/', '/activity/1540-hankyu-1-day-tourist-pass-osaka/', '/activity/1777-icoca-ic-card-kyoto/', '/activity/1541-kix-airport-limousine-bus-transfer-kyoto/', '/activity/1753-randen-kyoto-bus-subway-1-day-pass-kyoto/', '/activity/3260-sagano-romantic-train-ticket-kyoto/', '/activity/793-japanese-lzakaya-cooking-course-kyoto/', '/activity/882-nishiki-market-teramachi-street-kyoto/', '/activity/792-morning-bento-cooking-course-kyoto/', '/activity/2918-sushi-class-experience-kyoto/', '/activity/6032-ninja-kyoto-restaurant-labyrinth-kyoto/', '/activity/5215-garden-ryokan-nanzenji-yachiyo-kyoto/', '/activity/5271-itoh-dining-kyoto/', '/activity/9094-sagano-sightseeing-carriage-tour-kyoto/', '/activity/8192-japan-sim-card-taiwan-airport-pickup-kyoto/', '/activity/8420-south-korea-wifi-device-kyoto/', '/activity/8644-rock-climbing-at-kyoto-konpirayama-kyoto /', '/activity/9934-3g-4g-wifi-mnl-pick-up-delivery-for-japan-kyoto/', '/activity/8966-donburi-cooking-course-and-nishiki-market-tour-kyoto/', '/activity/9215-arashiyama-kyoto-food-drink-half-day-tour/']}

Categories

Resources