My code prints None when trying to web scrape - Python

I'm a beginner who started learning Python a week ago. I was trying to get the title of a specific product on Amazon, but when I run my code it prints "None" instead of the title. Any help?
import requests
from bs4 import BeautifulSoup
url = 'https://www.amazon.com/Sony-ILCE7SM2-mount-Camera-Full-Frame/dp/B0158SRJVQ/ref=sr_1_1?dchild=1&keywords=a7s&qid=1589917834&sr=8-1'
headers = {
    'user_agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
title = soup.find(id='productTitle')
print(title)
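A minimal sketch of one way to narrow this down, not a confirmed fix. It rests on two assumptions the question does not state: that the header was meant to be `User-Agent` (a key spelled `user_agent` is sent as a separate, unrelated header, so requests still sends its default user agent), and that the server may answer with a page that has no `productTitle` element at all. The helper name `extract_title` and the sample HTML are made up for illustration.

```python
from bs4 import BeautifulSoup

# Assumption: the intended header name is "User-Agent" (hyphen, not underscore).
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
}


def extract_title(html):
    """Return the stripped product title, or None if the element is absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(id="productTitle")
    if tag is None:
        # find() returns None when nothing matches, which is what print() showed.
        return None
    return tag.get_text(strip=True)


# Hypothetical responses: one with the element, one without it.
page_with_title = '<span id="productTitle">  Sony a7S II  </span>'
page_without_title = "<html><body>Enter the characters you see below</body></html>"

print(extract_title(page_with_title))
print(extract_title(page_without_title))
```

With a live request you would pass `page.content` to `extract_title` and, when it returns None, inspect `page.status_code` and the start of `page.text` to see what was actually served.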

Related

Python Request returning different result than original page (browser)

I am trying to build a simple web scraper to monitor Nike's site here in Brazil.
Basically, I want to track products that are in stock right now, to check when new products are added.
My problem is that when I navigate to the site https://www.nike.com.br/snkrs#estoque I see different products compared to what I get using Python's requests library.
Here is the code I am using:
import requests
from bs4 import BeautifulSoup
headers ={
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}
url = 'https://www.nike.com.br/snkrs#estoque'
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
len(soup.find_all(class_='produto produto--comprar'))
This code gives me 40, but using the browser I can see 56 products https://prnt.sc/26jeo1i
The data is loaded from a different URL than the one in the address bar, and it is split across 3 pages:
import requests
from bs4 import BeautifulSoup
headers ={
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}
productList = []
for p in [1, 2, 3]:
    url = f'https://www.nike.com.br/Snkrs/Estoque?p={p}&demanda=true'
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    productList += soup.find_all(class_='produto produto--comprar')
Output:
print(len(productList))
56
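One detail worth knowing about the loop above: `find_all(class_='produto produto--comprar')` matches the exact string value of the class attribute, so it depends on the order of the class names. A rough sketch of an order-independent variant using a CSS selector follows. The sample HTML and the name `collect_products` are invented for illustration; the real pages may look different.

```python
from bs4 import BeautifulSoup

# Hypothetical page bodies standing in for the three paginated responses.
SAMPLE_PAGES = [
    '<div class="produto produto--comprar">A</div>'
    '<div class="produto produto--comprar">B</div>',
    '<div class="produto--comprar produto">C</div>',  # same classes, other order
]


def collect_products(pages_html):
    """Accumulate product elements from every page into one list."""
    products = []
    for html in pages_html:
        soup = BeautifulSoup(html, "html.parser")
        # The selector requires both classes but ignores their order.
        products += soup.select(".produto.produto--comprar")
    return products


print(len(collect_products(SAMPLE_PAGES)))
```

On the real site, the list of page bodies would come from the same `requests.get` calls as in the answer.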

Webscraping - I need some help understanding how to distinguish an item on a page BS4, Requests

I am stuck. I am able to extract the product name and price from Amazon using the following code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"
}
url = 'https://www.amazon.co.uk/dp/B083PHB6XX'
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'lxml')
name = soup.find('span', {'id': 'productTitle'}).text.strip()
price = soup.find('span', {'id': 'priceblock_ourprice'}).text.strip()
print(name)
print(price)
But I am unable to figure out how to extract the sales rank data from within the table, which is lower down on the page, under the additional information section. I'd be most grateful if anyone would be able to assist in helping me figure out how to write the next soup.find line of code, to show '106,505' for the sales rank.
Many thanks in advance.
One solution is to search for the <th> tag that contains the string "Best Sellers Rank" and then find the next <span>:
import requests
from bs4 import BeautifulSoup
url = "https://www.amazon.co.uk/dp/B083PHB6XX"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
ranks = soup.select_one('th:-soup-contains("Best Sellers Rank")').find_next(
"span"
)
print(ranks.text.split()[0])
Prints:
111,190
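If the rank is needed as a number rather than a string, the same lookup can be wrapped in a small helper. This is only a sketch: the function name `parse_sales_rank` and the sample table are assumptions, and the real Amazon markup may differ or change.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like the "additional information" table.
SAMPLE_HTML = """
<table>
  <tr>
    <th> Best Sellers Rank </th>
    <td><span><span>111,190 in Toys &amp; Games</span></span></td>
  </tr>
</table>
"""


def parse_sales_rank(html):
    """Return the sales rank as an int, or None if it cannot be located."""
    soup = BeautifulSoup(html, "html.parser")
    header = soup.select_one('th:-soup-contains("Best Sellers Rank")')
    if header is None:
        return None
    span = header.find_next("span")
    if span is None:
        return None
    tokens = span.get_text().split()
    if not tokens:
        return None
    # Drop the thousands separator before converting, e.g. "111,190" -> 111190.
    return int(tokens[0].replace(",", ""))


print(parse_sales_rank(SAMPLE_HTML))
```

Returning None instead of raising keeps the caller in control when the table is missing from the served page.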

I cannot scrape items from this website. Python

I am trying to scrape all the clothing items on this website, but I was not able to do it. I set 'limit=3' in 'find_all', but it gives me only 1 result. How can I get all results in one request?
Please help me, I am stuck with this!
This is the e-commerce website I am trying to scrape
import requests
from bs4 import BeautifulSoup
def trendyol():
    url = "https://www.trendyol.com/erkek+kazak--hirka?filtreler=22|175"
    headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36'}
    page = requests.get(url, headers=headers).text
    soup = BeautifulSoup(page, "html.parser")
    list = soup.find_all("div", {"class": "p-card-chldrn-cntnr"}, limit=3)
    for div in list:
        link = str("https://www.trendyol.com/" + div.a.get("href"))
        name = div.find("span", {"class": "prdct-desc-cntnr-name hasRatings"}).text
        print(f'link: {link}')
        print(f'isim: {name}')
Try this code:
from bs4 import BeautifulSoup
import requests
def trendyol(url):
    headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36'}
    page = requests.get(url, headers=headers).text
    soup = BeautifulSoup(page, "html.parser")
    # Narrow the search to the wrapper that holds the product cards
    wrapper = soup.find("div", {'class': 'prdct-cntnr-wrppr'})
    for card in wrapper.find_all('div', {'class': 'p-card-chldrn-cntnr'}):
        print("https://www.trendyol.com" + card.find('a', href=True)['href'])
        print(card.find('div', {'class': 'image-container'}).img['alt'])
        print(card.find('span', {'class': 'prdct-desc-cntnr-ttl'}).text)
url = "https://www.trendyol.com/erkek+kazak--hirka?filtreler=22%7C175&pi=3"
trendyol(url)
This code will print each product's URL, the alt text of its image, and its title. Thanks.
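The URL in the answer differs from the one in the question in two ways: the `|` in the filter is percent-encoded as `%7C`, and there is an extra `pi=3` parameter. A small sketch of building such URLs with the standard library follows. Treating `pi` as a page index is an assumption based only on that URL, and `build_listing_url` is a made-up helper name.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.trendyol.com/erkek+kazak--hirka"


def build_listing_url(page, filters="22|175"):
    """Build a listing URL; urlencode percent-encodes the '|' in the filter."""
    # Assumption: "pi" selects the page, as the answer's URL suggests.
    query = urlencode({"filtreler": filters, "pi": page})
    return f"{BASE_URL}?{query}"


print(build_listing_url(3))
# https://www.trendyol.com/erkek+kazak--hirka?filtreler=22%7C175&pi=3
```

Calling `trendyol(build_listing_url(n))` for several values of `n` would then walk through the pages, if the assumption about `pi` holds.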

How do I properly use the find function from BeautifulSoup4 in Python 3?

I'm following a YouTube tutorial on how to scrape an Amazon product page. First I'm trying to get the product title. Later I want to get the Amazon price and the second-hand price. For this I'm using requests and bs4. Here is the code so far:
import requests
from bs4 import BeautifulSoup
URL = 'https://www.amazon.de/Teenage-Engineering-Synthesizer-FM-Radio-AMOLED-Display/dp/B00CXSJUZS/ref=sr_1_1_sspa?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=op-1&qid=1594672884&sr=8-1-spons&psc=1&smid=A1GQGGPCGF8PV9&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUFEMUZSUjhQMUM3NTkmZW5jcnlwdGVkSWQ9QTAwMzMwODkyQkpTNUJUUE9QUFVFJmVuY3J5cHRlZEFkSWQ9QTA4MzM4NDgxV1Y3UzVVN1lXTUZKJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ=='
headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'}
page = requests.get(URL,headers=headers)
soup = BeautifulSoup(page.content,'html.parser')
title = soup.find('span',{'id' : "productTitle"})
print(title)
My title is None, so the find function doesn't find the element with the id "productTitle". But checking the soup shows that there is an element with that id.
So what's wrong with my code?
I also tried:
title = soup.find(id = "productTitle")
Try this:
import requests
from bs4 import BeautifulSoup
URL = 'https://www.amazon.de/Teenage-Engineering-Synthesizer-FM-Radio-AMOLED-Display/dp/B00CXSJUZS/ref=sr_1_1_sspa?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=op-1&qid=1594672884&sr=8-1-spons&psc=1&smid=A1GQGGPCGF8PV9&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUFEMUZSUjhQMUM3NTkmZW5jcnlwdGVkSWQ9QTAwMzMwODkyQkpTNUJUUE9QUFVFJmVuY3J5cHRlZEFkSWQ9QTA4MzM4NDgxV1Y3UzVVN1lXTUZKJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ=='
headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'}
page = requests.get(URL,headers=headers)
soup = BeautifulSoup(page.content,'lxml')
title = soup.find('span',{'id' : "productTitle"})
print(title.text.strip())
Your approach is right, but you are using a "bad" parser. Read more about the differences between parsers here. I prefer lxml but sometimes also use html5lib. I also added
.text.strip()
to the print so only the title text is printed.
Note: you have to install lxml for Python first!
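Since lxml and html5lib are optional installs, one way to try them without crashing when they are missing is to fall back to the built-in parser. This is a sketch only: `make_soup` is a hypothetical helper, and the preference order is just the one mentioned in the answer.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, parsers=("lxml", "html5lib", "html.parser")):
    """Parse with the first available parser and report which one was used."""
    for parser in parsers:
        try:
            return BeautifulSoup(markup, parser), parser
        except FeatureNotFound:
            # Raised by bs4 when the requested parser library is not installed.
            continue
    raise RuntimeError("none of the requested parsers is available")


# Hypothetical fragment; the real page is far larger.
soup, used = make_soup('<span id="productTitle">  OP-1 Synthesizer  </span>')
print(used)
print(soup.find("span", {"id": "productTitle"}).text.strip())
```

Printing which parser was used makes it easier to compare results between parsers on the same page.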

How to get live wind data from a site in Python

Hi, I am writing a Python script that takes live wind data from a site for the place where I live. When I run the following code I get a 'None' value, but on the website there is information at the given position.
I tried this code:
import requests
from bs4 import BeautifulSoup
link = 'http://www.actuelewind.nl/?stationcode=6308#SpotPage'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'}
def checkwind():
    pagina = requests.get(link, headers=headers)
    soup = BeautifulSoup(pagina.content, 'html.parser')
    windsnelheid = soup.find('div', attrs={"id": "spotInfoWindsnelheidMS"})
    print(windsnelheid)
checkwind()
Can anyone show me how to get live wind from this website?
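A small diagnostic sketch that may help here. It separates two cases: the element is not in the HTML that requests received at all, or it is there but empty. The second case would be consistent with the value being filled in later by JavaScript, which is only a guess about this site. The function name `describe_element` and the sample snippets are made up.

```python
from bs4 import BeautifulSoup


def describe_element(html, element_id):
    """Return (status, text) where status is 'missing', 'empty' or 'filled'."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(id=element_id)
    if tag is None:
        return "missing", None
    text = tag.get_text(strip=True)
    if not text:
        return "empty", ""
    return "filled", text


# Hypothetical responses illustrating the three outcomes.
ELEMENT_ID = "spotInfoWindsnelheidMS"
print(describe_element('<div id="other">x</div>', ELEMENT_ID))
print(describe_element('<div id="spotInfoWindsnelheidMS"></div>', ELEMENT_ID))
print(describe_element('<div id="spotInfoWindsnelheidMS"> 7.2 </div>', ELEMENT_ID))
```

Running it on `pagina.text` from the question's code would show which of the cases applies.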
