Get image source from a webpage - Python

So I want to get the image source from this website:
https://www.pixiv.net/en/artworks/77619496
But every time I try to scrape it with bs4 I keep failing; I've tried the approaches from other posts too, but couldn't get them to work. It keeps returning None:
import requests
import bs4
from bs4 import BeautifulSoup
url = 'https://www.pixiv.net/en/artworks/77564597'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
x = soup.find("img")
print(x)

If you look at the Network tab of Chrome's debug console, or the console of whichever browser you are using, you will see that there are no img elements in the initial response; the page generates the img elements by executing JavaScript. However, I inspected the page and there is a meta element that holds the image data, and you can parse it with the json module as shown:
import requests, json
from bs4 import BeautifulSoup
url = 'https://www.pixiv.net/en/artworks/77564597'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
x = soup.find("meta", {"id": "meta-preload-data"}).get("content")
usefulData = json.loads(x)
print(usefulData)
The sample output is here.
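If the preload JSON keeps its usual shape, a dict keyed by artwork id under "illust" with a "urls" dict per entry (these key names are an assumption based on the observed data, not anything pixiv documents), you can drill down to the image URLs like this:
illust = usefulData['illust']['77564597']       # assumed layout: illust -> <artwork id> -> urls
print(illust['urls'])                           # assumed keys such as 'thumb', 'regular', 'original'
print(illust['urls']['original'])               # full-size image URL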

Alternatively, you can render the page with Selenium and then parse the rendered HTML with BeautifulSoup:
from selenium import webdriver
import time
from bs4 import BeautifulSoup
browser = webdriver.Firefox()
url = 'https://www.pixiv.net/en/artworks/77564597'
browser.get(url)
time.sleep(3)
source = browser.page_source
soup = BeautifulSoup(source, 'html.parser')
for item in soup.findAll('div', attrs={'class': 'sc-fzXfPI fRnFme'}):
    for img in item.findAll('img', attrs={'class': 'sc-fzXfPJ lclRkv'}):
        print(img.get('src'))
Output:
https://i.pximg.net/c/250x250_80_a2/custom-thumb/img/2019/11/28/00/02/59/78026183_p0_custom1200.jpg
https://i.pximg.net/c/250x250_80_a2/img-master/img/2019/10/31/04/15/04/77564597_p0_square1200.jpg
https://i.pximg.net/c/250x250_80_a2/img-master/img/2019/08/30/07/23/45/76528190_p0_square1200.jpg
https://i.pximg.net/c/250x250_80_a2/img-master/img/2019/08/23/08/01/08/76410568_p0_square1200.jpg
https://i.pximg.net/c/250x250_80_a2/img-master/img/2019/07/24/03/41/47/75881545_p0_square1200.jpg
https://i.pximg.net/c/250x250_80_a2/img-master/img/2019/05/30/04/24/27/74969583_p0_square1200.jpg
https://i.pximg.net/c/250x250_80_a2/custom-thumb/img/2019/11/28/00/02/59/78026183_p0_custom1200.jpg
https://i.pximg.net/c/250x250_80_a2/img-master/img/2019/10/31/04/15/04/77564597_p0_square1200.jpg
https://i.pximg.net/c/250x250_80_a2/img-master/img/2019/08/30/07/23/45/76528190_p0_square1200.jpg
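Note that i.pximg.net usually rejects requests that lack a pixiv Referer header, so to actually download one of the URLs above you will likely need something like the sketch below (the Referer requirement reflects pixiv's hotlink protection; verify it for your own case):
import requests
img_url = 'https://i.pximg.net/c/250x250_80_a2/img-master/img/2019/10/31/04/15/04/77564597_p0_square1200.jpg'
headers = {'Referer': 'https://www.pixiv.net/'}  # without this, pixiv tends to answer 403
resp = requests.get(img_url, headers=headers)
with open('thumbnail.jpg', 'wb') as f:
    f.write(resp.content)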

Related

BeautifulSoup: enter data and click the button

How do I get to this field using BeautifulSoup in Python? I need to write inside this field and then press Enter.
import requests
from bs4 import BeautifulSoup as bs
URL = "http://192.168.0.1"
page = requests.get(URL)
soup = bs(page.content, "html.parser")
results = soup.find(id = "main_internal_container")
item = results.find(class_="rightcolumn")
print(item)
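BeautifulSoup only parses HTML; it cannot type into a field or press Enter. For that you need a browser driver such as Selenium. A minimal sketch, assuming the input sits inside the "rightcolumn" block (the CSS selector is hypothetical and must be adjusted to the real page):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://192.168.0.1")
# Hypothetical locator: point it at the actual <input> inside the rightcolumn block.
field = driver.find_element(By.CSS_SELECTOR, "#main_internal_container .rightcolumn input")
field.send_keys("text to enter")
field.send_keys(Keys.ENTER)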

Python scraping returns none

I'm trying to take a name from an HTML page with BeautifulSoup:
import urllib.request
from bs4 import BeautifulSoup
nightbot = 'https://nightbot.tv/t/tonyxzero/song_requests'
page = urllib.request.urlopen(nightbot)
soup = BeautifulSoup(page, 'html5lib')
list_item = soup.find('strong', attrs={'class': 'ng-binding'})
print (list_item)
But when I print list_item I get None as the reply. Is there a way to fix it?
The webpage is rendered by JavaScript, so you have to use a package like Selenium to get what you want.
You can try this:
CODE:
import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('https://nightbot.tv/t/tonyxzero/song_requests')
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
list_item = soup.find('strong', attrs={'class': 'ng-binding'})
print (list_item)
RESULT:
<strong class="ng-binding" ng-bind="$state.current.title">Song Requests: TONYXZERO</strong>

Python requests.get() not showing all HTML

I'm looking to scrape some information from Easy Allies reviews for a personal project using:
Python3
requests
BS4 (BeautifulSoup)
I would like to scrape the names of the latest games they have reviewed. The names are easy to find with the browser's inspect tool, but they don't exist in the page source returned by this Python code:
import requests
from bs4 import BeautifulSoup
page = requests.get("http://www.easyallies.com/#!/reviews")
soup = BeautifulSoup(page.text, 'html.parser')
print(soup.prettify())
How do I access this data?
Notice that when you open that URL, it calls the endpoint https://www.easyallies.com/api/review/get, which fetches the reviews.
Take this code as an example, and parse the JSON result as you wish.
import requests
from bs4 import BeautifulSoup
data = { 'method': 'review', 'action': 'get', 'data[start]': 0, 'data[limit]': 10 }
reviews = requests.post("https://www.easyallies.com/api/review/get", data=data)
print (reviews.text)
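To actually use the result, decode it as JSON and inspect the structure first, since the exact field names are not documented here:
import json
payload = reviews.json()                        # assumes the endpoint returns JSON
print(json.dumps(payload, indent=2)[:1000])     # peek at the structure before relying on specific keys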
Alternatively, you can render the page with Selenium and parse the result with BeautifulSoup:
from selenium import webdriver
import time
from bs4 import BeautifulSoup
browser = webdriver.Firefox()
url = 'https://www.easyallies.com/#!/reviews'
browser.get(url)
time.sleep(3)
source = browser.page_source
soup = BeautifulSoup(source, 'html.parser')
for item in soup.findAll('div', attrs={'class': 'name'}):
    print(item.text)

Want to scrape all members' profile links for member details

from bs4 import BeautifulSoup
import requests
r = requests.get('http://medicalassociation.in/doctor-search')
soup = BeautifulSoup(r.text,'lxml')
link = soup.find('table',{'class':'tab-gender'})
link1 = link.find('tbody')
link2 = link1.find('tr')[3:4]
link3 = link2.find('a',class_='user-name')
print link3.text
I'm not getting the links through this code. I want to extract the "view profile" links.
The following works for me over several test runs, using just requests and select with a class selector.
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('http://medicalassociation.in/doctor-search')
soup = bs(r.content, 'lxml')
results = [item['href'] for item in soup.select(".user-name")]
print(results)
requests.get() does not render JavaScript, so you can't see the element. You can use a WebDriver to get the page_source and then fetch the information.
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("http://medicalassociation.in/doctor-search")
soup = BeautifulSoup(driver.page_source,'html.parser')
for a in soup.find_all('a', class_="user-name"):
    if a.get('href') is not None:
        print(a['href'])

BeautifulSoup : Fetched all the links on a webpage how to navigate through them without selenium?

So I'm trying to write a mediocre script to download subtitles from one particular website, as y'all can see. I'm a newbie to BeautifulSoup; so far I have a list of all the "href" values after a search query (GET). So how do I navigate further, after getting all the links?
Here's the code:
import requests
from bs4 import BeautifulSoup
usearch = input("Movie Name? : ")
url = "https://www.yifysubtitles.com/search?q="+usearch
print(url)
resp = requests.get(url)
soup = BeautifulSoup(resp.content, 'lxml')
for link in soup.find_all('a'):
    dictn = link.get('href')
    print(dictn)
You need to use resp.text instead of resp.content. Try this to get the search results:
import requests
from bs4 import BeautifulSoup
base_url_f = "https://www.yifysubtitles.com"
search_url = base_url_f + "/search?q=last+jedi"
resp = requests.get(search_url)
soup = BeautifulSoup(resp.text, 'lxml')
for media in soup.find_all("div", {"class": "media-body"}):
    print(base_url_f + media.find('a')['href'])
Output: https://www.yifysubtitles.com/movie-imdb/tt2527336
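To navigate further, request each of those result pages in turn and parse them the same way. A minimal sketch (it only fetches and parses each movie page; which element on that page holds the subtitle links is not shown in the question, so the print is a placeholder):
for media in soup.find_all("div", {"class": "media-body"}):
    movie_url = base_url_f + media.find('a')['href']
    movie_resp = requests.get(movie_url)
    movie_soup = BeautifulSoup(movie_resp.text, 'lxml')
    print(movie_soup.title.text)                # replace with whatever you need from each page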
