Beautiful Soup findAll doesn't find them all - python

I'm trying to parse a website and get some info with the find_all() method, but it doesn't find them all.
This is the code:
#!/usr/bin/python3
from bs4 import BeautifulSoup
from urllib.request import urlopen
page = urlopen ("http://mangafox.me/directory/")
# print (page.read ())
soup = BeautifulSoup (page.read ())
manga_img = soup.findAll ('a', {'class' : 'manga_img'}, limit=None)
for manga in manga_img:
    print (manga['href'])
It only prints half of them...

Different HTML parsers deal differently with broken HTML. That page serves broken HTML, and the lxml parser is not dealing very well with it:
>>> import requests
>>> from bs4 import BeautifulSoup
>>> r = requests.get('http://mangafox.me/directory/')
>>> soup = BeautifulSoup(r.content, 'lxml')
>>> len(soup.find_all('a', class_='manga_img'))
18
The standard library html.parser has less trouble with this specific page:
>>> soup = BeautifulSoup(r.content, 'html.parser')
>>> len(soup.find_all('a', class_='manga_img'))
44
Translating that to your specific code sample using urllib, you would specify the parser thus:
soup = BeautifulSoup(page, 'html.parser')  # BeautifulSoup can do the reading
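To see how the parser choice changes the result without depending on the live site, the comparison can be sketched against a small piece of deliberately broken HTML. The markup and the count_links helper below are made up for illustration; lxml and html5lib are optional third-party parsers, so the sketch skips whichever ones are not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical broken markup: unclosed <li> and <a> tags and a stray </p>.
BROKEN_HTML = """
<ul>
  <li><a class="manga_img" href="/manga/one/">One</a>
  <li><a class="manga_img" href="/manga/two/">Two</a></p>
  <li><a class="manga_img" href="/manga/three/">Three
</ul>
"""


def count_links(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: number of matching <a> tags} for each installed parser."""
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # The parser is not installed in this environment; skip it.
            continue
        counts[name] = len(soup.find_all("a", class_="manga_img"))
    return counts


counts = count_links(BROKEN_HTML)
for name, n in counts.items():
    print(f"{name}: {n} links")
```

If the counts differ between parsers on your real page, that is a strong hint that the page's HTML is malformed and that switching parsers is worth trying.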

A quick way to grab all the links is a CSS selector that selects every a tag whose href attribute contains /manga (use a[href^="/manga/"] instead if the link must start with /manga/).
The output contains all links of the form /manga/"title" (check this in the dev tools inspector):
import requests
from bs4 import BeautifulSoup
import lxml
html = requests.get('http://fanfox.net/directory/').text
soup = BeautifulSoup(html, 'lxml')
for a_tag in soup.select('a[href*="/manga"]'):
    link = a_tag['href']
    link = link[1:]
    print(f'http://fanfox.net/{link}')
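The difference between a "contains" selector and a "starts with" selector can be shown on a few sample links. This is only a sketch: the HTML and the manga_links helper are made up, and urljoin from the standard library replaces the manual link[1:] slicing.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample: only two of these links are manga title pages.
SAMPLE_HTML = """
<a href="/manga/one_piece/">One Piece</a>
<a href="/directory/2.html">Next page</a>
<a href="/news/manga-awards/">News</a>
<a href="/manga/naruto/">Naruto</a>
"""
BASE_URL = "http://fanfox.net/"


def manga_links(html, base_url=BASE_URL):
    """Return absolute URLs for <a> tags whose href starts with /manga/."""
    soup = BeautifulSoup(html, "html.parser")
    # ^= means "starts with"; *= would also match /news/manga-awards/.
    return [urljoin(base_url, a["href"]) for a in soup.select('a[href^="/manga/"]')]


links = manga_links(SAMPLE_HTML)
print(links)
```

urljoin handles both relative and absolute href values, so it keeps working if the site starts emitting full URLs.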
Alternative method:
To scrape another page, change the URL passed to requests.get (for example directory/2.html).
Here's working code (it also works for pages 2, 3, 4, 5, 6, ...):
import requests
from bs4 import BeautifulSoup
import lxml
html = requests.get('http://fanfox.net/directory/').text
soup = BeautifulSoup(html, 'lxml')
for manga in soup.select('.line'):
    title = manga.select('.manga-list-1-item-title a')
    for t in title:
        print(t.text)
    for i in manga.findAll('img', class_='manga-list-1-cover'):
        img = i['src']
        print(img)
    for l in manga.findAll('p', class_='manga-list-1-item-title'):
        link = l.a['href']
        link = link[1:]
        print(f'http://fanfox.net/{link}')
Output (could be prettier), all in order:
A Story About Treating a Female Knig...
Tales of Demons and Gods
Martial Peak
Onepunch-Man
One Piece
Star Martial God Technique
Solo Leveling
The Last Human
Kimetsu no Yaiba
Versatile Mage
Boku no Hero Academia
Apotheosis
Black Clover
Tensei Shitara Slime Datta Ken
Kingdom
Tate no Yuusha no Nariagari
Tomo-chan wa Onna no ko!
Goblin Slayer
Yakusoku no Neverland
God of Martial Arts
Kaifuku Jutsushi no Yarinaoshi
Re:Monster
Mushoku Tensei - Isekai Ittara Honki...
Nanatsu no Taizai
Battle Through the Heavens
Shingeki no Kyojin
Iron Ladies
Monster Musume no Iru Nichijou
World’s End Harem
Bleach
Parallel Paradise
Shokugeki no Soma
Spirit Sword Sovereign
Horimiya
Dungeon ni Deai o Motomeru no wa Mac...
Dr. Stone
Berserk
The New Gate
Akatsuki no Yona
Naruto
Overlord
Death March kara Hajimaru Isekai Kyo...
Tsuki ga Michibiku Isekai Douchuu
Eternal Reverence
Minamoto-kun Monogatari
Beastars
Jujutsu Kaisen
Hajime no Ippo
Kaguya-sama wa Kokurasetai - Tensai-...
Domestic na Kanojo
The Legendary Moonlight Sculptor
The Gamer
Kumo desu ga, nani ka?
Bokutachi wa Benkyou ga Dekinai
Enen no Shouboutai
Tsuyokute New Saga
Fairy Tail
Komi-san wa Komyushou Desu.
Kenja no Mago
Soul Land
Boruto: Naruto Next Generations
Hunter X Hunter
History’s Strongest Disciple Kenichi
Phoenix against the World
LV999 no Murabito
Gate - Jietai Kare no Chi nite, Kaku...
Kengan Asura
Konjiki no Moji Tsukai - Yuusha Yoni...
Please don’t bully me, Nagatoro
Isekai Maou to Shoukan Shoujo Dorei ...
http://fmcdn.mfcdn.net/store/manga/27418/cover.jpg?token=64e5c0c930644528cba6eb2f2f5f5a2f3762188d&ttl=1616839200&v=1615891672
http://fmcdn.mfcdn.net/store/manga/16627/cover.jpg?token=33f5ea4c1ba1a013c5bdcfdac87209fe472cf6d5&ttl=1616839200&v=1616396463
http://fmcdn.mfcdn.net/store/manga/27509/cover.jpg?token=ce2b16e8e867a8ce13ad0bee9940b68eef324cac&ttl=1616839200&v=1616737688
http://fmcdn.mfcdn.net/store/manga/11362/cover.jpg?token=1a5876d8a767fd27b26f0287bbb36eb82f9cf811&ttl=1616839200&v=1615796703
http://fmcdn.mfcdn.net/store/manga/106/cover.jpg?token=5313fc0dae53f33fcd1284cd4858603fc47ffa04&ttl=1616839200&v=1616748903
http://fmcdn.mfcdn.net/store/manga/22443/cover.jpg?token=89760754754a63efc875aa7e2de0536a5238bed3&ttl=1616839200&v=1616396922
http://fmcdn.mfcdn.net/store/manga/29037/cover.jpg?token=e8b496db4ad520f002040761c5887bc1e17af63a&ttl=1616839200&v=1616653683
http://fmcdn.mfcdn.net/store/manga/28343/cover.jpg?token=71c1b201e4d714f893efb7ac984c9787dd8df915&ttl=1616839200&v=1616748232
http://fmcdn.mfcdn.net/store/manga/19287/cover.jpg?token=803eb8beab4dc6aa8d73f5137a6e3331c0034d24&ttl=1616839200&v=1609900224
http://fmcdn.mfcdn.net/store/manga/27761/cover.jpg?token=6c11f2bddb31b460fccc9a158cc13b9593fb1ad2&ttl=1616839200&v=1616740672
http://fmcdn.mfcdn.net/store/manga/14356/cover.jpg?token=93638c7ec630de193299caa8d513e045818b35ce&ttl=1616839200&v=1616170144
http://fmcdn.mfcdn.net/store/manga/27118/cover.jpg?token=9c876792ad8e6e5f9777386184ea8e6f409aa9fd&ttl=1616839200&v=1616654344
http://fmcdn.mfcdn.net/store/manga/15291/cover.jpg?token=e0a3195fcc88e397703e8bdf6580a62a0d856816&ttl=1616839200&v=1616345844
http://fmcdn.mfcdn.net/store/manga/15975/cover.jpg?token=e07844bb607a3d53ababab51683ee6fa06906d7c&ttl=1616839200&v=1616733843
http://fmcdn.mfcdn.net/store/manga/8198/cover.jpg?token=bc135016049bb63e5b65ec87207e0c91bb0c62c8&ttl=1616839200&v=1616335864
http://fmcdn.mfcdn.net/store/manga/14036/cover.jpg?token=c13dab07379e88fb871d3d833999ead13bfaf0fc&ttl=1616839200&v=1615393923
http://fmcdn.mfcdn.net/store/manga/16159/cover.jpg?token=cdf538f92f729999bcb9fcae7fb31b7a8c306c92&ttl=1616839200&v=1569492366
http://fmcdn.mfcdn.net/store/manga/20569/cover.jpg?token=f9c08cde2f0a6bd646dc87dc4a8dee6fa44eca3c&ttl=1616839200&v=1616680427
http://fmcdn.mfcdn.net/store/manga/21271/cover.jpg?token=062fd439c18afaf178d3408c64b2b305f679e91a&ttl=1616839200&v=1611285077
http://fmcdn.mfcdn.net/store/manga/26916/cover.jpg?token=cda99bf9831ada1322045bf82893a9ed1ad868d5&ttl=1616839200&v=1615188784
http://fmcdn.mfcdn.net/store/manga/26841/cover.jpg?token=055e9ff117c28b3a7c3089c4d691228adeba1f55&ttl=1616839200&v=1616201299
http://fmcdn.mfcdn.net/store/manga/13895/cover.jpg?token=e7661738326d62d38b5f93771105898cb95adaba&ttl=1616839200&v=1612570263
http://fmcdn.mfcdn.net/store/manga/14217/cover.jpg?token=3263f009d5b42e441a09e14c44e3fd7d12a83089&ttl=1616839200&v=1615259584
http://fmcdn.mfcdn.net/store/manga/11374/cover.jpg?token=ab9d85a9efdd5b41391db5249bcf0011ce07070f&ttl=1616839200&v=1600762925
http://fmcdn.mfcdn.net/store/manga/14225/cover.jpg?token=e8912699841e28f9ca8b40eb8fe1d37d2a6ce3e3&ttl=1616839200&v=1616097340
http://fmcdn.mfcdn.net/store/manga/9011/cover.jpg?token=eaca757d4352b66d4ef69812ec5c265b5a2f7a28&ttl=1616839200&v=1614982324
http://fmcdn.mfcdn.net/store/manga/29235/cover.jpg?token=23b3338eaa8984bad9c17a2d604c60c909282715&ttl=1616839200&v=1614666974
http://fmcdn.mfcdn.net/store/manga/10348/cover.jpg?token=c4209cc06013a704c9f7a0e942b8ae55a7546941&ttl=1616839200&v=1616082423
http://fmcdn.mfcdn.net/store/manga/20107/cover.jpg?token=699e867d86e4957b8ef4d3eee5200f80cdbbea88&ttl=1616839200&v=1610529669
http://fmcdn.mfcdn.net/store/manga/9/cover.jpg?token=a4894a5ce212a490dda9c6cf73b717bbfbf015c3&ttl=1616839200&v=1616593028
http://fmcdn.mfcdn.net/store/manga/24693/cover.jpg?token=d968c24525bc6fe467f40c9ad2ff087ebfb60e4a&ttl=1616839200&v=1615325943
http://fmcdn.mfcdn.net/store/manga/11529/cover.jpg?token=1a3ab38ba3f212d5c95138bb690b155f38390aab&ttl=1616839200&v=1615891829
http://fmcdn.mfcdn.net/store/manga/28001/cover.jpg?token=1769a66a83df9adfed58a36dc9275f202d1f8f37&ttl=1616839200&v=1615891671
http://fmcdn.mfcdn.net/store/manga/11147/cover.jpg?token=e6d602fcd4b438ec299c955738487127cef7a3bf&ttl=1616839200&v=1616264399
http://fmcdn.mfcdn.net/store/manga/12978/cover.jpg?token=7e9094f238fcbd19717ffeeb4dcfe686a99dba4b&ttl=1616839200&v=1568611983
http://fmcdn.mfcdn.net/store/manga/24445/cover.jpg?token=0f77d7a743c0f613ff773f3e430f688e3aa77239&ttl=1616839200&v=1616345762
http://fmcdn.mfcdn.net/store/manga/176/cover.jpg?token=e8e87528092cd5b902767d7564e035486b8535f2&ttl=1616839200&v=1611297351
http://fmcdn.mfcdn.net/store/manga/14588/cover.jpg?token=469da1dfa4953459e08efdeb24561f78f7a68b47&ttl=1616839200&v=1615891989
http://fmcdn.mfcdn.net/store/manga/9126/cover.jpg?token=53689bb06b90c163b58b0410e80252941b27aff6&ttl=1616839200&v=1616083893
http://fmcdn.mfcdn.net/store/manga/8/cover.jpg?token=8e5cbd08bd42f0684f36f107fc991c75b56bbed2&ttl=1616839200&v=1615891989
http://fmcdn.mfcdn.net/store/manga/14765/cover.jpg?token=8a8e0582258d852b4c9d017567dd6820958f5a67&ttl=1616839200&v=1615042503
http://fmcdn.mfcdn.net/store/manga/16457/cover.jpg?token=7e59859f7af131902006c3eb8ed55745ef14573f&ttl=1616839200&v=1613139843
http://fmcdn.mfcdn.net/store/manga/16675/cover.jpg?token=cbb268f1326b704b1bb11accadc35ae3b7222e39&ttl=1616839200&v=1615891829
http://fmcdn.mfcdn.net/store/manga/26261/cover.jpg?token=d83f514efe719b2dd301c2ecc8d672e9d935084c&ttl=1616839200&v=1613384403
http://fmcdn.mfcdn.net/store/manga/9518/cover.jpg?token=76170cb8b2defc468a817a69bf6e799900c4fd9f&ttl=1616839200&v=1596437944
http://fmcdn.mfcdn.net/store/manga/24547/cover.jpg?token=b99d7b791e14ec290054d57ead4dcf9fb61b4d7a&ttl=1616839200&v=1615891989
http://fmcdn.mfcdn.net/store/manga/27861/cover.jpg?token=d14b0f3f2362869830c2971007e86ca43637bb85&ttl=1616839200&v=1616345044
http://fmcdn.mfcdn.net/store/manga/231/cover.jpg?token=53c2dc9eb6bf5c6f635de12496088a27b28e04f7&ttl=1616839200&v=1616418784
http://fmcdn.mfcdn.net/store/manga/17825/cover.jpg?token=f1b7954fba32d3146282b2b5bba4e1419578d65b&ttl=1616839200&v=1616677923
http://fmcdn.mfcdn.net/store/manga/14099/cover.jpg?token=7b7a61b4e544a65a75394e4cabf04831cf0c5d7a&ttl=1616839200&v=1611909666
http://fmcdn.mfcdn.net/store/manga/15177/cover.jpg?token=4442c2f4cf7e5c69d3449e7b358960930ff19e11&ttl=1616839200&v=1605145143
http://fmcdn.mfcdn.net/store/manga/13088/cover.jpg?token=d8ab36b3d0f4d9c6263a4f482f98c4d99809eb36&ttl=1616839200&v=1616641226
http://fmcdn.mfcdn.net/store/manga/18225/cover.jpg?token=ea670a4bc8d1aa0312f5427b24bf5702c12ef3a3&ttl=1616839200&v=1615470603
http://fmcdn.mfcdn.net/store/manga/23945/cover.jpg?token=9e078e0cb6da91194a6f86c814ae03922e8460d0&ttl=1616839200&v=1615891671
http://fmcdn.mfcdn.net/store/manga/17045/cover.jpg?token=3026e40a21e490f37c656a778e9227c6c891cade&ttl=1616839200&v=1615891829
http://fmcdn.mfcdn.net/store/manga/13930/cover.jpg?token=f773694a746e2015b4ca5c46afcc801d9795393c&ttl=1616839200&v=1616100123
http://fmcdn.mfcdn.net/store/manga/246/cover.jpg?token=3926211df393a0d50e58c0285c05f067c1ad64e5&ttl=1616839200&v=1615891989
http://fmcdn.mfcdn.net/store/manga/17189/cover.jpg?token=f9ffcf2a07bb8d1f7a49eac36c1f6c4fcd7e5622&ttl=1616839200&v=1616514627
http://fmcdn.mfcdn.net/store/manga/20299/cover.jpg?token=121f6571e072381a545e9e3790b4bf1723865859&ttl=1616839200&v=1615891671
http://fmcdn.mfcdn.net/store/manga/13841/cover.jpg?token=86245cf3afab622c35a41f4e2bf388ac48713906&ttl=1616839200&v=1615891672
http://fmcdn.mfcdn.net/store/manga/19939/cover.jpg?token=563a2963a0a153ac1c53779712f48af5630e0377&ttl=1616839200&v=1616714152
http://fmcdn.mfcdn.net/store/manga/44/cover.jpg?token=febabec452a05c1415f02bf8387a0a8f16c20137&ttl=1616839200&v=1548837372
http://fmcdn.mfcdn.net/store/manga/107/cover.jpg?token=3dcce47a3a6760b9b81b7b576711980d36cf7be1&ttl=1616839200&v=1543561843
http://fmcdn.mfcdn.net/store/manga/24241/cover.jpg?token=b4a1834d714f0476c2d99c5ffb905351c7a4d72f&ttl=1616839200&v=1616176266
http://fmcdn.mfcdn.net/store/manga/25773/cover.jpg?token=7bf8a8e9346a02250bb24cd8e6e4da0933e6a05f&ttl=1616839200&v=1616655977
http://fmcdn.mfcdn.net/store/manga/10956/cover.jpg?token=db3b74dc959adedbd847142cd3a079caca6b25d1&ttl=1616839200&v=1612043463
http://fmcdn.mfcdn.net/store/manga/15593/cover.jpg?token=caceb80b7266f438bdedae8cf69653ab7911fe68&ttl=1616839200&v=1606188363
http://fmcdn.mfcdn.net/store/manga/14916/cover.jpg?token=0dab5e6797f4cc915a035632ed0d02a2492afbcc&ttl=1616839200&v=1609752363
http://fmcdn.mfcdn.net/store/manga/26771/cover.jpg?token=77a6aa9bbb7ebcd3df15cd4cc65b4e3915e96ed4&ttl=1616839200&v=1615891829
http://fmcdn.mfcdn.net/store/manga/16569/cover.jpg?token=e5815ac1520ad179ad2d6f798e4b6ead6790cd33&ttl=1616839200&v=1614957071
http://fanfox.net/manga/a_story_about_treating_a_female_knight_who_has_never_been_treated_as_a_woman_as_a_woman/
http://fanfox.net/manga/tales_of_demons_and_gods/
http://fanfox.net/manga/martial_peak/
http://fanfox.net/manga/onepunch_man/
http://fanfox.net/manga/one_piece/
http://fanfox.net/manga/star_martial_god_technique/
http://fanfox.net/manga/solo_leveling/
http://fanfox.net/manga/the_last_human/
http://fanfox.net/manga/kimetsu_no_yaiba/
http://fanfox.net/manga/versatile_mage/
http://fanfox.net/manga/boku_no_hero_academia/
http://fanfox.net/manga/apotheosis/
http://fanfox.net/manga/black_clover/
http://fanfox.net/manga/tensei_shitara_slime_datta_ken/
http://fanfox.net/manga/kingdom/
http://fanfox.net/manga/tate_no_yuusha_no_nariagari/
http://fanfox.net/manga/tomo_chan_wa_onna_no_ko/
http://fanfox.net/manga/goblin_slayer/
http://fanfox.net/manga/yakusoku_no_neverland/
http://fanfox.net/manga/god_of_martial_arts/
http://fanfox.net/manga/kaifuku_jutsushi_no_yarinaoshi/
http://fanfox.net/manga/re_monster/
http://fanfox.net/manga/mushoku_tensei_isekai_ittara_honki_dasu/
http://fanfox.net/manga/nanatsu_no_taizai/
http://fanfox.net/manga/battle_through_the_heavens/
http://fanfox.net/manga/shingeki_no_kyojin/
http://fanfox.net/manga/iron_ladies/
http://fanfox.net/manga/monster_musume_no_iru_nichijou/
http://fanfox.net/manga/world_s_end_harem/
http://fanfox.net/manga/bleach/
http://fanfox.net/manga/parallel_paradise/
http://fanfox.net/manga/shokugeki_no_soma/
http://fanfox.net/manga/spirit_sword_sovereign/
http://fanfox.net/manga/horimiya/
http://fanfox.net/manga/dungeon_ni_deai_o_motomeru_no_wa_machigatte_iru_darou_ka/
http://fanfox.net/manga/dr_stone/
http://fanfox.net/manga/berserk/
http://fanfox.net/manga/the_new_gate/
http://fanfox.net/manga/akatsuki_no_yona/
http://fanfox.net/manga/naruto/
http://fanfox.net/manga/overlord/
http://fanfox.net/manga/death_march_kara_hajimaru_isekai_kyousoukyoku/
http://fanfox.net/manga/tsuki_ga_michibiku_isekai_douchuu/
http://fanfox.net/manga/eternal_reverence/
http://fanfox.net/manga/minamoto_kun_monogatari/
http://fanfox.net/manga/beastars/
http://fanfox.net/manga/jujutsu_kaisen/
http://fanfox.net/manga/hajime_no_ippo/
http://fanfox.net/manga/kaguya_sama_wa_kokurasetai_tensai_tachi_no_renai_zunousen/
http://fanfox.net/manga/domestic_na_kanojo/
http://fanfox.net/manga/the_legendary_moonlight_sculptor/
http://fanfox.net/manga/the_gamer/
http://fanfox.net/manga/kumo_desu_ga_nani_ka/
http://fanfox.net/manga/bokutachi_wa_benkyou_ga_dekinai/
http://fanfox.net/manga/enen_no_shouboutai/
http://fanfox.net/manga/tsuyokute_new_saga/
http://fanfox.net/manga/fairy_tail/
http://fanfox.net/manga/komi_san_wa_komyushou_desu/
http://fanfox.net/manga/kenja_no_mago/
http://fanfox.net/manga/soul_land/
http://fanfox.net/manga/boruto_naruto_next_generations/
http://fanfox.net/manga/hunter_x_hunter/
http://fanfox.net/manga/history_s_strongest_disciple_kenichi/
http://fanfox.net/manga/phoenix_against_the_world/
http://fanfox.net/manga/lv999_no_murabito/
http://fanfox.net/manga/gate_jietai_kare_no_chi_nite_kaku_tatakeri/
http://fanfox.net/manga/kengan_asura/
http://fanfox.net/manga/konjiki_no_moji_tsukai_yuusha_yonin_ni_makikomareta_unique_cheat/
http://fanfox.net/manga/please_don_t_bully_me_nagatoro/
http://fanfox.net/manga/isekai_maou_to_shoukan_shoujo_dorei_majutsu/
I found that the best way (for me) to work with the results of .find_all()/.findAll() is simply a for loop; the same goes for .select().
In some cases .select() gives better results.
Check out SelectorGadget to quickly find a CSS selector.
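Because the code above prints all titles, then all covers, then all links, the three values for one manga end up far apart. A possible alternative is to build one record per list item. The sketch below assumes each entry is an li inside the .line container; that structure, the sample HTML, and the parse_listing helper are assumptions for illustration, and only the class names come from the answer.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup; the <li> wrapper and example.com URLs are assumptions.
SAMPLE_HTML = """
<ul class="line">
  <li>
    <img class="manga-list-1-cover" src="http://example.com/covers/1.jpg">
    <p class="manga-list-1-item-title"><a href="/manga/one_piece/">One Piece</a></p>
  </li>
  <li>
    <img class="manga-list-1-cover" src="http://example.com/covers/2.jpg">
    <p class="manga-list-1-item-title"><a href="/manga/naruto/">Naruto</a></p>
  </li>
</ul>
"""


def parse_listing(html, base_url="http://fanfox.net/"):
    """Return one dict per list item with its title, absolute URL and cover."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for item in soup.select(".line li"):
        link = item.select_one(".manga-list-1-item-title a")
        cover = item.select_one("img.manga-list-1-cover")
        if link is None:
            # Skip items that do not look like a manga entry.
            continue
        records.append({
            "title": link.get_text(strip=True),
            "url": urljoin(base_url, link["href"]),
            "cover": cover["src"] if cover else None,
        })
    return records


records = parse_listing(SAMPLE_HTML)
for record in records:
    print(record)
```

Keeping the fields together per item makes it straightforward to write the result to CSV or JSON later.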

Related

How can we use Mozilla to Screen Scrape raw data from real estate listings?

I'm looking at this URL.
https://www.century21.com/real-estate/long-island-city-ny/LCNYLONGISLANDCITY/
I'm trying to get this text, in a structured format.
FOR SALE
$1,248,000
3 beds
2 baths
45-09 Skillman Avenue
Sunnyside NY 11104
Listed By CENTURY 21 Sunny Gardens Realty, Inc.
##########################################
FOR SALE
$1,390,000
5 beds
3 baths
2,200 sq. ft
47-35 39th Place
Sunnyside NY 11104
Courtesy Of Keller Williams Realty of Greater Nassau
Here's the sample code that I tried to hack together.
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
from time import sleep
url='https://www.century21.com/real-estate/long-island-city-ny/LCNYLONGISLANDCITY/'
driver = webdriver.Chrome('C:\\Utility\\chromedriver.exe')
driver.get(url)
sleep(3)
content = driver.page_source
soup = BeautifulSoup(content, features='html.parser')
for element in soup.findAll('div', attrs={'class': 'infinite-item property-card clearfix property-card-C2183089596 initialized visited'}):
    #print(element)
    address = element.find('div', attrs={'class': 'property-card-primary-info'})
    print(address)
    price = element.find('a', attrs={'class': 'listing-price'})
    print(price)
When I run this, I get no addresses and no prices. Not sure why.
Web scraping is more of an art than a science. It's helpful to pull up the page source in Chrome or the browser of your choice so you can think about the DOM hierarchy and figure out how to get down to the elements you need to scrape. Some websites are built very cleanly and this isn't too much work; others are scraped-together nonsense and a nightmare to dig data out of.
This one, thankfully, is very clean.
This isn't perfect, but I think it will get you in the ballpark:
import requests
from bs4 import BeautifulSoup
url='https://www.century21.com/real-estate/long-island-city-ny/LCNYLONGISLANDCITY/'
page = requests.get(url)
soup = BeautifulSoup(page.content, features='html.parser')
for element in soup.findAll('div', attrs={'class': 'property-card'}):
    address = element.find('div', attrs={'class': 'property-card-primary-info'}).find('div', attrs={'class': 'property-address-info'})
    for address_item in address.children:
        print(address_item.get_text().strip())
    price = element.find('div', attrs={'class': 'property-card-primary-info'}).find('a', attrs={'class': 'listing-price'})
    print(price.get_text().strip())
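One likely reason the loop in the question printed nothing is the class filter: a multi-word class string is compared against the exact value of the class attribute, so a string that contains a listing-specific ID (and state classes such as visited) matches at most that one card in that one state. Searching for the single shared class matches every card. The sketch below illustrates this on made-up markup; the class names follow the question, while the listing IDs and prices in the sample are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: class names follow the question, the IDs are made up.
SAMPLE_HTML = """
<div class="infinite-item property-card clearfix property-card-C1 initialized">
  <a class="listing-price">$1,248,000</a>
</div>
<div class="infinite-item property-card clearfix property-card-C2 initialized">
  <a class="listing-price">$1,390,000</a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Exact class string: only the card whose class attribute is exactly this.
exact = soup.find_all(
    "div",
    attrs={"class": "infinite-item property-card clearfix property-card-C1 initialized"},
)

# Single class name: every element that has property-card among its classes.
shared = soup.find_all("div", class_="property-card")

prices = [card.find("a", class_="listing-price").get_text(strip=True) for card in shared]
print(len(exact), len(shared), prices)
```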

neither find_all nor find works

I am trying to scrape the name of every favorite on the profile page of a user of our choice, but with this code I get the error "ResultSet object has no attribute 'find_all'". If I try find instead, I get the opposite error, which tells me to use find_all. I'm a beginner and I don't know what to do. (To test the code you can use the username "Kineta"; she's an administrator, so anyone can access her profile page.)
Thanks for your help.
from bs4 import BeautifulSoup
import requests
usr_name = str(input('the user you are searching for '))
html_text = requests.get('https://myanimelist.net/profile/'+usr_name)
soup = BeautifulSoup(html_text.text, 'lxml')
favs = soup.find_all('div', class_='fav-slide-outer')
favs_title = favs.find_all('span', class_='title fs10')
print(favs_title)
Your program throws an exception because you are calling .find_all on a ResultSet (favs_title = favs.find_all(...)); a ResultSet doesn't have a .find_all method. Instead, you can use a CSS selector and select all the required elements directly:
import requests
from bs4 import BeautifulSoup
url = "https://myanimelist.net/profile/Kineta"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
for t in soup.select(".fav-slide .title"):
    print(t.text)
Prints:
Kono Oto Tomare!
Yuukoku no Moriarty
Kaze ga Tsuyoku Fuiteiru
ACCA: 13-ku Kansatsu-ka
Fukigen na Mononokean
Kakuriyo no Yadomeshi
Shirokuma Cafe
Fruits Basket
Akatsuki no Yona
Colette wa Shinu Koto ni Shita
Okobore Hime to Entaku no Kishi
Meteor Methuselah
Inu x Boku SS
Vampire Juujikai
Mirako, Yuuta
Forger, Loid
Osaki, Kaname
Miyazumi, Tatsuru
Takaoka, Tetsuki
Okamoto, Souma
Shirota, Tsukasa
Archiviste, Noé
Fang, Li Ren
Fukuroi, Michiru
Sakurayashiki, Kaoru
James Moriarty, Albert
Souma, Kyou
Hades
Yona
Son, Hak
Mashima, Taichi
Ootomo, Jin
Collabel, Yuca
Masuda, Toshiki
Furukawa, Makoto
Satou, Takuya
Midorikawa, Hikaru
Miki, Shinichiro
Hino, Satoshi
Hosoya, Yoshimasa
Kimura, Ryouhei
Ono, Daisuke
KENN
Yoshino, Hiroyuki
Toriumi, Kousuke
Toyonaga, Toshiyuki
Ooishi, Masayoshi
Shirodaira, Kyou
Hakusensha
EDIT: To get Anime/Manga/Character favorites:
import requests
from bs4 import BeautifulSoup
url = "https://myanimelist.net/profile/Kineta"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
anime_favorites = [t.text for t in soup.select("#anime_favorites .title")]
manga_favorites = [t.text for t in soup.select("#manga_favorites .title")]
char_favorites = [t.text for t in soup.select("#character_favorites .title")]
print("Anime Favorites")
print("-" * 80)
print(*anime_favorites, sep="\n")
print()
print("Manga Favorites")
print("-" * 80)
print(*manga_favorites, sep="\n")
print()
print("Character Favorites")
print("-" * 80)
print(*char_favorites, sep="\n")
Prints:
Anime Favorites
--------------------------------------------------------------------------------
Kono Oto Tomare!
Yuukoku no Moriarty
Kaze ga Tsuyoku Fuiteiru
ACCA: 13-ku Kansatsu-ka
Fukigen na Mononokean
Kakuriyo no Yadomeshi
Shirokuma Cafe
Manga Favorites
--------------------------------------------------------------------------------
Fruits Basket
Akatsuki no Yona
Colette wa Shinu Koto ni Shita
Okobore Hime to Entaku no Kishi
Meteor Methuselah
Inu x Boku SS
Vampire Juujikai
Character Favorites
--------------------------------------------------------------------------------
Mirako, Yuuta
Forger, Loid
Osaki, Kaname
Miyazumi, Tatsuru
Takaoka, Tetsuki
Okamoto, Souma
Shirota, Tsukasa
Archiviste, Noé
Fang, Li Ren
Fukuroi, Michiru
Sakurayashiki, Kaoru
James Moriarty, Albert
Souma, Kyou
Hades
Yona
Son, Hak
Mashima, Taichi
Ootomo, Jin
Collabel, Yuca
find and find_all work; you just need to use them correctly. You can't call them on lists (like the 'favs' variable in your example), but you can always iterate through the list with a for loop and call 'find' or 'find_all' on each element.
I preferred to make it a bit simpler, but choose whichever way you like; I am not sure mine is more efficient:
from bs4 import BeautifulSoup
import requests
usr_name = str(input('the user you are searching for '))
html_text = requests.get('https://myanimelist.net/profile/'+usr_name)
soup = BeautifulSoup(html_text.text, 'lxml')
favs = soup.find_all('div', class_='fav-slide-outer')
for fav in favs:
    tag = fav.span
    print(tag.text)
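The point about iterating over the result of find_all can be checked offline. The sketch below uses a made-up HTML snippet with the class names from the question: calling find_all on the ResultSet raises AttributeError, while calling it on each element works.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet that mimics the structure named in the question.
SAMPLE_HTML = """
<div class="fav-slide-outer"><span class="title fs10">Fruits Basket</span></div>
<div class="fav-slide-outer"><span class="title fs10">Akatsuki no Yona</span></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
favs = soup.find_all("div", class_="fav-slide-outer")  # a ResultSet (list-like)

# A ResultSet is a list of tags, not a tag, so it has no find_all of its own.
try:
    favs.find_all("span")
    raised = False
except AttributeError:
    raised = True

# Search inside each element instead.
titles = [
    span.get_text(strip=True)
    for fav in favs
    for span in fav.find_all("span", class_="title")
]
print(raised, titles)
```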
If you need more info on how to use bs4 functions correctly, I suggest looking through the docs.
I looked at the page a bit and changed the code a bit; this way you should get all the results you need:
from bs4 import BeautifulSoup
import requests
usr_name = str(input('the user you are searching for '))
html_text = requests.get('https://myanimelist.net/profile/'+usr_name)
soup = BeautifulSoup(html_text.text, 'lxml')
favs = soup.find_all('li', class_='btn-fav')
for fav in favs:
    tag = fav.span
    print(tag.text)
I think the problem here is not really the code but how you searched your results and how the site is structured.

Generating URL for Yahoo news and Bing news with Python and BeautifulSoup

I want to scrape data from the Yahoo News and Bing News pages. The data I want are the headlines and/or the text below the headlines (whatever can be scraped) and the dates (times) when they were posted.
I have written some code, but it does not return anything. The problem is with my URL, since I'm getting a 404 response.
Can you please help me with it?
This is the code for 'Bing'
from bs4 import BeautifulSoup
import requests
term = 'usa'
url = 'http://www.bing.com/news/q?s={}'.format(term)
response = requests.get(url)
print(response)
soup = BeautifulSoup(response.text, 'html.parser')
print(soup)
And this is for Yahoo:
term = 'usa'
url = 'http://news.search.yahoo.com/q?s={}'.format(term)
response = requests.get(url)
print(response)
soup = BeautifulSoup(response.text, 'html.parser')
print(soup)
Please help me generate these URLs. What's the logic behind them? I'm still a noob :)
Basically, your URLs are just wrong. The URLs you have to use are the same ones you see in the address bar when using a regular browser. Most search engines and aggregators use a q parameter for the search term. Most of the other parameters are usually not required (sometimes they are, e.g. for specifying the result page number).
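Rather than formatting the search term into the URL by hand, the query string can be built with urlencode from the standard library, which also escapes spaces and special characters. This is a minimal sketch: the endpoints are the ones used in this answer, while the SEARCH_ENDPOINTS dict and build_search_url helper are made-up names.

```python
from urllib.parse import urlencode

# Search endpoints as used in this answer; the dict itself is illustrative.
SEARCH_ENDPOINTS = {
    "bing": "https://www.bing.com/news/search",
    "yahoo": "https://news.search.yahoo.com/search",
}


def build_search_url(engine, term):
    """Return the search URL for the engine with the term in the q parameter."""
    return f"{SEARCH_ENDPOINTS[engine]}?{urlencode({'q': term})}"


print(build_search_url("bing", "usa"))
print(build_search_url("yahoo", "new york"))
```

With requests, the same effect is available through the params argument, for example requests.get(SEARCH_ENDPOINTS["bing"], params={"q": term}).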
Bing
from bs4 import BeautifulSoup
import requests
import re
term = 'usa'
url = 'https://www.bing.com/news/search?q={}'.format(term)
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
for news_card in soup.find_all('div', class_="news-card-body"):
    title = news_card.find('a', class_="title").text
    time = news_card.find(
        'span',
        attrs={'aria-label': re.compile(".*ago$")}
    ).text
    print("{} ({})".format(title, time))
Output
Jason Mohammed blitzkrieg sinks USA (17h)
USA Swimming held not liable by California jury in sexual abuse case (1d)
United States 4-1 Canada: USA secure payback in Nations League (1d)
USA always plays the Dalai Lama card in dealing with China, says Chinese Professor (1d)
...
Yahoo
from bs4 import BeautifulSoup
import requests
term = 'usa'
url = 'https://news.search.yahoo.com/search?q={}'.format(term)
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
for news_item in soup.find_all('div', class_='NewsArticle'):
    title = news_item.find('h4').text
    time = news_item.find('span', class_='fc-2nd').text
    # Clean time text
    time = time.replace('·', '').strip()
    print("{} ({})".format(title, time))
Output
USA Baseball will return to Arizona for second Olympic qualifying chance (52 minutes ago)
Prized White Sox prospect Andrew Vaughn wraps up stint with USA Baseball (28 minutes ago)
Mexico defeats USA in extras for Olympic berth (13 hours ago)
...
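Both loops above call .text directly on the result of find, which raises AttributeError as soon as one card lacks the expected tag (find returns None in that case). A more defensive variant might look like the sketch below; the sample HTML and the text_or_none and parse_news helpers are made up, and the class names are the ones from the Yahoo example, which may change whenever the site is redesigned.

```python
from bs4 import BeautifulSoup

# Hypothetical cards: the second one has no timestamp element.
SAMPLE_HTML = """
<div class="NewsArticle"><h4>First headline</h4><span class="fc-2nd">· 2 hours ago</span></div>
<div class="NewsArticle"><h4>Second headline</h4></div>
"""


def text_or_none(parent, name, **attrs):
    """Return the stripped text of the first matching child, or None if absent."""
    tag = parent.find(name, **attrs)
    return tag.get_text(strip=True) if tag is not None else None


def parse_news(html):
    """Return (title, posted) tuples; posted is None when no timestamp is found."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("div", class_="NewsArticle"):
        title = text_or_none(item, "h4")
        posted = text_or_none(item, "span", class_="fc-2nd")
        if posted is not None:
            # Drop the leading separator dot, as in the answer's cleanup step.
            posted = posted.replace("·", "").strip()
        rows.append((title, posted))
    return rows


rows = parse_news(SAMPLE_HTML)
for title, posted in rows:
    print(title, posted)
```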

http://fmcdn.mfcdn.net/store/manga/24693/cover.jpg?token=d968c24525bc6fe467f40c9ad2ff087ebfb60e4a&ttl=1616839200&v=1615325943
http://fmcdn.mfcdn.net/store/manga/11529/cover.jpg?token=1a3ab38ba3f212d5c95138bb690b155f38390aab&ttl=1616839200&v=1615891829
http://fmcdn.mfcdn.net/store/manga/28001/cover.jpg?token=1769a66a83df9adfed58a36dc9275f202d1f8f37&ttl=1616839200&v=1615891671
http://fmcdn.mfcdn.net/store/manga/11147/cover.jpg?token=e6d602fcd4b438ec299c955738487127cef7a3bf&ttl=1616839200&v=1616264399
http://fmcdn.mfcdn.net/store/manga/12978/cover.jpg?token=7e9094f238fcbd19717ffeeb4dcfe686a99dba4b&ttl=1616839200&v=1568611983
http://fmcdn.mfcdn.net/store/manga/24445/cover.jpg?token=0f77d7a743c0f613ff773f3e430f688e3aa77239&ttl=1616839200&v=1616345762
http://fmcdn.mfcdn.net/store/manga/176/cover.jpg?token=e8e87528092cd5b902767d7564e035486b8535f2&ttl=1616839200&v=1611297351
http://fmcdn.mfcdn.net/store/manga/14588/cover.jpg?token=469da1dfa4953459e08efdeb24561f78f7a68b47&ttl=1616839200&v=1615891989
http://fmcdn.mfcdn.net/store/manga/9126/cover.jpg?token=53689bb06b90c163b58b0410e80252941b27aff6&ttl=1616839200&v=1616083893
http://fmcdn.mfcdn.net/store/manga/8/cover.jpg?token=8e5cbd08bd42f0684f36f107fc991c75b56bbed2&ttl=1616839200&v=1615891989
http://fmcdn.mfcdn.net/store/manga/14765/cover.jpg?token=8a8e0582258d852b4c9d017567dd6820958f5a67&ttl=1616839200&v=1615042503
http://fmcdn.mfcdn.net/store/manga/16457/cover.jpg?token=7e59859f7af131902006c3eb8ed55745ef14573f&ttl=1616839200&v=1613139843
http://fmcdn.mfcdn.net/store/manga/16675/cover.jpg?token=cbb268f1326b704b1bb11accadc35ae3b7222e39&ttl=1616839200&v=1615891829
http://fmcdn.mfcdn.net/store/manga/26261/cover.jpg?token=d83f514efe719b2dd301c2ecc8d672e9d935084c&ttl=1616839200&v=1613384403
http://fmcdn.mfcdn.net/store/manga/9518/cover.jpg?token=76170cb8b2defc468a817a69bf6e799900c4fd9f&ttl=1616839200&v=1596437944
http://fmcdn.mfcdn.net/store/manga/24547/cover.jpg?token=b99d7b791e14ec290054d57ead4dcf9fb61b4d7a&ttl=1616839200&v=1615891989
http://fmcdn.mfcdn.net/store/manga/27861/cover.jpg?token=d14b0f3f2362869830c2971007e86ca43637bb85&ttl=1616839200&v=1616345044
http://fmcdn.mfcdn.net/store/manga/231/cover.jpg?token=53c2dc9eb6bf5c6f635de12496088a27b28e04f7&ttl=1616839200&v=1616418784
http://fmcdn.mfcdn.net/store/manga/17825/cover.jpg?token=f1b7954fba32d3146282b2b5bba4e1419578d65b&ttl=1616839200&v=1616677923
http://fmcdn.mfcdn.net/store/manga/14099/cover.jpg?token=7b7a61b4e544a65a75394e4cabf04831cf0c5d7a&ttl=1616839200&v=1611909666
http://fmcdn.mfcdn.net/store/manga/15177/cover.jpg?token=4442c2f4cf7e5c69d3449e7b358960930ff19e11&ttl=1616839200&v=1605145143
http://fmcdn.mfcdn.net/store/manga/13088/cover.jpg?token=d8ab36b3d0f4d9c6263a4f482f98c4d99809eb36&ttl=1616839200&v=1616641226
http://fmcdn.mfcdn.net/store/manga/18225/cover.jpg?token=ea670a4bc8d1aa0312f5427b24bf5702c12ef3a3&ttl=1616839200&v=1615470603
http://fmcdn.mfcdn.net/store/manga/23945/cover.jpg?token=9e078e0cb6da91194a6f86c814ae03922e8460d0&ttl=1616839200&v=1615891671
http://fmcdn.mfcdn.net/store/manga/17045/cover.jpg?token=3026e40a21e490f37c656a778e9227c6c891cade&ttl=1616839200&v=1615891829
http://fmcdn.mfcdn.net/store/manga/13930/cover.jpg?token=f773694a746e2015b4ca5c46afcc801d9795393c&ttl=1616839200&v=1616100123
http://fmcdn.mfcdn.net/store/manga/246/cover.jpg?token=3926211df393a0d50e58c0285c05f067c1ad64e5&ttl=1616839200&v=1615891989
http://fmcdn.mfcdn.net/store/manga/17189/cover.jpg?token=f9ffcf2a07bb8d1f7a49eac36c1f6c4fcd7e5622&ttl=1616839200&v=1616514627
http://fmcdn.mfcdn.net/store/manga/20299/cover.jpg?token=121f6571e072381a545e9e3790b4bf1723865859&ttl=1616839200&v=1615891671
http://fmcdn.mfcdn.net/store/manga/13841/cover.jpg?token=86245cf3afab622c35a41f4e2bf388ac48713906&ttl=1616839200&v=1615891672
http://fmcdn.mfcdn.net/store/manga/19939/cover.jpg?token=563a2963a0a153ac1c53779712f48af5630e0377&ttl=1616839200&v=1616714152
http://fmcdn.mfcdn.net/store/manga/44/cover.jpg?token=febabec452a05c1415f02bf8387a0a8f16c20137&ttl=1616839200&v=1548837372
http://fmcdn.mfcdn.net/store/manga/107/cover.jpg?token=3dcce47a3a6760b9b81b7b576711980d36cf7be1&ttl=1616839200&v=1543561843
http://fmcdn.mfcdn.net/store/manga/24241/cover.jpg?token=b4a1834d714f0476c2d99c5ffb905351c7a4d72f&ttl=1616839200&v=1616176266
http://fmcdn.mfcdn.net/store/manga/25773/cover.jpg?token=7bf8a8e9346a02250bb24cd8e6e4da0933e6a05f&ttl=1616839200&v=1616655977
http://fmcdn.mfcdn.net/store/manga/10956/cover.jpg?token=db3b74dc959adedbd847142cd3a079caca6b25d1&ttl=1616839200&v=1612043463
http://fmcdn.mfcdn.net/store/manga/15593/cover.jpg?token=caceb80b7266f438bdedae8cf69653ab7911fe68&ttl=1616839200&v=1606188363
http://fmcdn.mfcdn.net/store/manga/14916/cover.jpg?token=0dab5e6797f4cc915a035632ed0d02a2492afbcc&ttl=1616839200&v=1609752363
http://fmcdn.mfcdn.net/store/manga/26771/cover.jpg?token=77a6aa9bbb7ebcd3df15cd4cc65b4e3915e96ed4&ttl=1616839200&v=1615891829
http://fmcdn.mfcdn.net/store/manga/16569/cover.jpg?token=e5815ac1520ad179ad2d6f798e4b6ead6790cd33&ttl=1616839200&v=1614957071
http://fanfox.net/manga/a_story_about_treating_a_female_knight_who_has_never_been_treated_as_a_woman_as_a_woman/
http://fanfox.net/manga/tales_of_demons_and_gods/
http://fanfox.net/manga/martial_peak/
http://fanfox.net/manga/onepunch_man/
http://fanfox.net/manga/one_piece/
http://fanfox.net/manga/star_martial_god_technique/
http://fanfox.net/manga/solo_leveling/
http://fanfox.net/manga/the_last_human/
http://fanfox.net/manga/kimetsu_no_yaiba/
http://fanfox.net/manga/versatile_mage/
http://fanfox.net/manga/boku_no_hero_academia/
http://fanfox.net/manga/apotheosis/
http://fanfox.net/manga/black_clover/
http://fanfox.net/manga/tensei_shitara_slime_datta_ken/
http://fanfox.net/manga/kingdom/
http://fanfox.net/manga/tate_no_yuusha_no_nariagari/
http://fanfox.net/manga/tomo_chan_wa_onna_no_ko/
http://fanfox.net/manga/goblin_slayer/
http://fanfox.net/manga/yakusoku_no_neverland/
http://fanfox.net/manga/god_of_martial_arts/
http://fanfox.net/manga/kaifuku_jutsushi_no_yarinaoshi/
http://fanfox.net/manga/re_monster/
http://fanfox.net/manga/mushoku_tensei_isekai_ittara_honki_dasu/
http://fanfox.net/manga/nanatsu_no_taizai/
http://fanfox.net/manga/battle_through_the_heavens/
http://fanfox.net/manga/shingeki_no_kyojin/
http://fanfox.net/manga/iron_ladies/
http://fanfox.net/manga/monster_musume_no_iru_nichijou/
http://fanfox.net/manga/world_s_end_harem/
http://fanfox.net/manga/bleach/
http://fanfox.net/manga/parallel_paradise/
http://fanfox.net/manga/shokugeki_no_soma/
http://fanfox.net/manga/spirit_sword_sovereign/
http://fanfox.net/manga/horimiya/
http://fanfox.net/manga/dungeon_ni_deai_o_motomeru_no_wa_machigatte_iru_darou_ka/
http://fanfox.net/manga/dr_stone/
http://fanfox.net/manga/berserk/
http://fanfox.net/manga/the_new_gate/
http://fanfox.net/manga/akatsuki_no_yona/
http://fanfox.net/manga/naruto/
http://fanfox.net/manga/overlord/
http://fanfox.net/manga/death_march_kara_hajimaru_isekai_kyousoukyoku/
http://fanfox.net/manga/tsuki_ga_michibiku_isekai_douchuu/
http://fanfox.net/manga/eternal_reverence/
http://fanfox.net/manga/minamoto_kun_monogatari/
http://fanfox.net/manga/beastars/
http://fanfox.net/manga/jujutsu_kaisen/
http://fanfox.net/manga/hajime_no_ippo/
http://fanfox.net/manga/kaguya_sama_wa_kokurasetai_tensai_tachi_no_renai_zunousen/
http://fanfox.net/manga/domestic_na_kanojo/
http://fanfox.net/manga/the_legendary_moonlight_sculptor/
http://fanfox.net/manga/the_gamer/
http://fanfox.net/manga/kumo_desu_ga_nani_ka/
http://fanfox.net/manga/bokutachi_wa_benkyou_ga_dekinai/
http://fanfox.net/manga/enen_no_shouboutai/
http://fanfox.net/manga/tsuyokute_new_saga/
http://fanfox.net/manga/fairy_tail/
http://fanfox.net/manga/komi_san_wa_komyushou_desu/
http://fanfox.net/manga/kenja_no_mago/
http://fanfox.net/manga/soul_land/
http://fanfox.net/manga/boruto_naruto_next_generations/
http://fanfox.net/manga/hunter_x_hunter/
http://fanfox.net/manga/history_s_strongest_disciple_kenichi/
http://fanfox.net/manga/phoenix_against_the_world/
http://fanfox.net/manga/lv999_no_murabito/
http://fanfox.net/manga/gate_jietai_kare_no_chi_nite_kaku_tatakeri/
http://fanfox.net/manga/kengan_asura/
http://fanfox.net/manga/konjiki_no_moji_tsukai_yuusha_yonin_ni_makikomareta_unique_cheat/
http://fanfox.net/manga/please_don_t_bully_me_nagatoro/
http://fanfox.net/manga/isekai_maou_to_shoukan_shoujo_dorei_majutsu/
For me, the most reliable way to use .find_all() (alias .findAll()) is simply to iterate over its results with a for loop; the same goes for .select().
In some cases, .select() gives better results.
SelectorGadget is a handy tool for quickly finding a CSS selector.
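As a rough illustration of the two approaches, here is a minimal sketch that runs .find_all() and .select() side by side. The HTML string is made up for the example (it only imitates a directory page and is not the real site's markup), and html.parser is used so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a directory page (not the real site's HTML).
html = """
<ul>
  <li><a class="manga_img" href="/manga/one_piece/">One Piece</a></li>
  <li><a class="manga_img" href="/manga/naruto/">Naruto</a></li>
  <li><a href="/about/">About</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Filter by tag name and class with find_all().
links_find_all = [a["href"] for a in soup.find_all("a", class_="manga_img")]

# Filter by a CSS selector: <a> tags whose href starts with "/manga/".
links_select = [a["href"] for a in soup.select('a[href^="/manga/"]')]

print(links_find_all)
print(links_select)
```

On this sample both lists contain the same two links. On a real page the results can differ, because the class filter and the href filter do not necessarily match the same set of tags.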

BeautifulSoup Python

I'm scraping a news article with BeautifulSoup and want to return only the body text of the article itself, not all the additional "noise". Is there an easy way to do this?
import bs4
import requests
url = 'https://www.cnn.com/2018/01/22/us/puerto-rico-privatizing-state-power-authority/index.html'
res = requests.get(url)
soup = bs4.BeautifulSoup(res.text,'html.parser')
element = soup.select_one('div.pg-rail-tall__body #body-text').text
print(element)
I'm trying to exclude some of the returned content, such as:
{CNN.VideoPlayer.handleUnmutePlayer = function
handleUnmutePlayer(containerId, dataObj) {'use strict';var
playerInstance,playerPropertyObj,rememberTime,unmuteCTA,unmuteIdSelector
= 'unmute_' +
The noise, as you call it, is the text inside the <script>...</script> tags (JavaScript code). You can remove it with .extract():
for s in soup.find_all('script'):
    s.extract()  # detach the <script> tag and its contents from the tree
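To see the effect in isolation, here is a minimal sketch on a made-up HTML fragment (the markup and the id are assumptions for the example, not CNN's actual page). It uses .decompose(), which removes the tag like .extract() does but destroys it instead of returning it, and then reads the remaining text with .get_text().

```python
from bs4 import BeautifulSoup

# Hypothetical article fragment with an inline script mixed into the body.
html = """
<div id="body-text">
  <p>First paragraph.</p>
  <script>var player = 'unmute_';</script>
  <p>Second paragraph.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Drop every <script> and <style> tag before reading the text.
for tag in soup.find_all(["script", "style"]):
    tag.decompose()

body = soup.find("div", id="body-text")
# separator/strip keep adjacent paragraphs from running together.
text = body.get_text(separator=" ", strip=True)
print(text)
```

Passing a separator to .get_text() also avoids the run-together sentences ("citizens.He said") that plain .text can produce.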
You can use this:
import requests
from bs4 import BeautifulSoup
r = requests.get('https://edition.cnn.com/2018/01/22/us/puerto-rico-privatizing-state-power-authority/index.html')
soup = BeautifulSoup(r.text, 'html.parser')
for script in soup.find_all('script'):
    script.extract()  # remove each <script> tag so its JavaScript is not part of the text
element = soup.find('div', class_='pg-rail-tall__body')
print(element.text)
Partial Output:
(CNN)Puerto Rico Gov. Ricardo Rosselló announced Monday that the
commonwealth will begin privatizing the Puerto Rico Electric Power
Authority, or PREPA. In comments published on Twitter, the governor
said the assets sale would transform the island's power generation
system into a "modern" and "efficient" one that would be less
expensive for citizens.He said the system operates "deficiently" and
that the improved infrastructure would respond more "agilely" to
natural disasters. The privatization process will begin "in the next
few days" and occur in three phases over the next 18 months, the
governor said.JUST WATCHEDSchool cheers as power returns after 112
daysReplayMore Videos ...MUST WATCHSchool cheers as power returns
after 112 days 00:48San Juan Mayor Carmen Yulin Cruz, known for her
criticisms of the Trump administration's response to Puerto Rico after
Hurricane Maria, spoke out against the move.Cruz, writing on her
official Twitter account, said PREPA's privatization would put the
commonwealth's economic development into "private hands" and that the
power authority will begin to "serve other interests.
Try this:
import bs4
import requests
url = 'https://www.cnn.com/2018/01/22/us/puerto-rico-privatizing-state-power-au$'
res = requests.get(url)
soup = bs4.BeautifulSoup(res.text, 'html.parser')
# The article paragraphs carry the class 'zn-body__paragraph' on both <div> and <p> tags.
elementd = soup.findAll('div', {'class': 'zn-body__paragraph'})
elementp = soup.findAll('p', {'class': 'zn-body__paragraph'})
for i in elementp:
    print(i.text)
for i in elementd:
    print(i.text)
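One thing to note about the snippet above is that it prints all <p> matches first and then all <div> matches, so the paragraphs may come out in a different order than they appear on the page. A possible alternative, sketched here on made-up markup (only the class name zn-body__paragraph comes from the answer; everything else is an assumption), is a single class selector, which returns matches in document order:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: paragraphs split across <p> and <div> tags, plus an unrelated block.
html = """
<section>
  <p class="zn-body__paragraph">Lead sentence.</p>
  <div class="zn-body__paragraph">Second paragraph.</div>
  <div class="ad">Advertisement</div>
  <div class="zn-body__paragraph">Third paragraph.</div>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# One selector for the class, regardless of tag name; results follow page order.
paragraphs = [el.get_text(strip=True) for el in soup.select(".zn-body__paragraph")]
article = "\n".join(paragraphs)
print(article)
```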
