I am trying to read and parse a URL that returns JSON. Every attempt either fails with a timeout error or hangs without any error until I stop it manually.
URL:
https://api.louisvuitton.com/api/eng-us/catalog/availability/M57089
Code I have tried:
import json
import requests
from bs4 import BeautifulSoup as soup
from urllib.request import Request, urlopen
# Trial 1
BASE_URL = 'https://api.louisvuitton.com/api/eng-us/catalog/availability/M57089'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36'
}
response = requests.get(BASE_URL, headers=headers)
# Trial 2
url = 'https://api.louisvuitton.com/api/eng-us/catalog/availability/M57089'
req = Request(url, headers=headers)
webpage = urlopen(req).read()
page_soup = soup(webpage, "html.parser")
obj = json.loads(str(page_soup))
# Trial 3
import dload
j = dload.json('https://api.louisvuitton.com/api/eng-us/catalog/availability/M57089')
print(j)
So far, none of these attempts, or any variation on them, has managed to open the URL and read the response. Any help would be appreciated.
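One way to keep a stalled request from hanging indefinitely is to pass an explicit timeout and catch the resulting error. The following is a minimal sketch using only the standard library, along the lines of Trial 2. The function name `fetch_json` and the 10-second default are assumptions made for illustration, and a timeout does not by itself make a server respond if that server is ignoring or blocking the client.

```python
import json
import urllib.request


def fetch_json(url, headers=None, timeout=10.0):
    """Return the decoded JSON body of `url`, or None if the request fails.

    `timeout` (in seconds) applies to each blocking socket operation, so a
    connection that stalls raises an error instead of waiting forever.
    """
    try:
        req = urllib.request.Request(url, headers=headers or {})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # OSError covers URLError, HTTPError and socket timeouts;
        # ValueError covers a malformed URL or a body that is not valid JSON.
        print(f"request failed: {exc!r}")
        return None
```

Calling `fetch_json(url, headers=headers)` with the URL and headers from the question would then return either the parsed JSON or `None`. The `requests.get` call in Trial 1 accepts a similar `timeout` argument.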
I'm trying to scrape data from this website 1xbet but I'm getting this error <Response [404]> all the time.
Here is my code.
import requests, bs4
requests.packages.urllib3.disable_warnings()
headers = {"User-Agent":"Mozilla/5.0"}
url = "https://1xbet.com/sports/basketball/early"
response = requests.get(url,headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'}, verify=False)
print(response)
# soup = BeautifulSoup(page.content, 'html.parser')
# lists = soup.find_all('section', class_="a_event")
# print(lists)
How can I solve this?
I tried to include the headers and verify=False to avoid the "certificate verify failed" error, but after doing that I got this 404 response. Any help would be appreciated.
A 404 means the resource at this URL was not found. Check that the URL is correct.
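As a small aid when checking this, the sketch below turns a numeric status code (for example `response.status_code` from the question's code) into a readable description using the standard library's `http.HTTPStatus`. The helper name `describe_status` and the hint texts are made up for illustration.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable note for an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes that the standard library does not know about.
        phrase = "Unknown"

    if 200 <= code < 300:
        hint = "success"
    elif 300 <= code < 400:
        hint = "redirect, check where the Location header points"
    elif code == 404:
        hint = "nothing found at this path, check the URL"
    elif 400 <= code < 500:
        hint = "client-side problem, check URL, headers, or credentials"
    elif 500 <= code < 600:
        hint = "server-side problem"
    else:
        hint = "not a standard HTTP status"
    return f"{code} {phrase}: {hint}"
```

For instance, `print(describe_status(response.status_code))` gives more context than `print(response)` alone.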
I am trying to download a file from a password-protected site. I am using the following code, but when I run it nothing happens: there is no error, and nothing is downloaded. Any insights would be appreciated!
import requests
from bs4 import BeautifulSoup
from urllib.request import urlopen
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36'
}
with requests.Session() as s:
    url1 = "http://....."
    url2 = "http://......tab"
    def login(url1, url2):
        r = requests.get(url1)
        bs = BeautifulSoup(r.text, 'html.parser')
        csrf_token = bs.find('input', attrs={'name': '###_CSRF-Token'})['value']
        credentials = {
            'username': '#####',
            'password': '#####',
            '###_CSRF-Token': csrf_token,
        }
        s.post(url1, data=credentials, headers=headers)
        resp = s.get(url2)
        with open('/Users/...../Website\ Grab/october.vcf', 'wb') as f:
            f.write(r.content)
#urllib.request.urlretrieve(url2, '/Users/.../Downloads/october.vcf')
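Judging only from the code as posted, three things may contribute to nothing being downloaded: `login` is defined but never called, the login page is fetched with `requests.get` rather than the session `s` (so any cookies set on that request are not kept by the session), and the file is written from `r.content` (the login page) instead of `resp.content`. The sketch below shows one possible shape for the flow under those assumptions. The name `login_and_download` and its parameters are hypothetical, `session` is expected to behave like a `requests.Session`, and the standard library's `HTMLParser` is swapped in for BeautifulSoup so that the block has no third-party imports.

```python
from html.parser import HTMLParser


class _InputValueFinder(HTMLParser):
    """Record the value attribute of the first <input> with a given name."""

    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name
        self.value = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("name") == self.field_name and self.value is None:
            self.value = attrs.get("value")


def login_and_download(session, login_url, file_url, dest_path, credentials, token_field):
    """Log in through `session`, save the body of `file_url`, return its size."""
    # Fetch the login form with the session itself so its cookies are kept.
    login_page = session.get(login_url)
    finder = _InputValueFinder(token_field)
    finder.feed(login_page.text)
    if finder.value is None:
        raise ValueError(f"no <input name={token_field!r}> found on the login page")

    # Submit the credentials together with the CSRF token from the form.
    form = dict(credentials)
    form[token_field] = finder.value
    session.post(login_url, data=form)

    # Write the download response, not the login page fetched earlier.
    resp = session.get(file_url)
    with open(dest_path, "wb") as f:
        f.write(resp.content)
    return len(resp.content)
```

With `requests`, this would be called inside `with requests.Session() as s:` as `login_and_download(s, url1, url2, path, {...}, '###_CSRF-Token')`. Checking `resp.status_code` before writing would also show whether the login actually succeeded.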
The relevant line of code is:
response = requests.get(url)
Here's what I've tried so far:
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
response = requests.get(url, headers=headers)
and:
from fake_useragent import UserAgent
import requests
ua = UserAgent()
headers = {'User-Agent':str(ua.chrome)}
response = requests.get(url, headers=headers)
But the data I get is still not the current version of the website.
The website I'm trying to scrape is this grocery store flyer.
Can anyone tell me why the data I get is outdated and/or how to fix it?
Update: it works all of a sudden but I haven't changed anything so I'm still curious as to why ...
I'm building a Twitter bot using Tweepy and BeautifulSoup4. I'd like to save in a list the results of a request but my script isn't working anymore (but it was working days ago). I've been looking at it and I don't understand. Here is my function:
import requests
import tweepy
from bs4 import BeautifulSoup
import urllib
import os
from tweepy import StreamListener
from TwitterEngine import TwitterEngine
from ConfigEngine import TwitterAPIConfig
import urllib.request
import emoji
import random
# desktop user-agent
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
# mobile user-agent
MOBILE_USER_AGENT = "Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36"
# Retrieve the links
def parseLinks(url):
    headers = {"user-agent": USER_AGENT}
    resp = requests.get(url, headers=headers)
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        results = []
        for g in soup.find_all('div', class_='r'):
            anchors = g.find_all('a')
            if anchors:
                link = anchors[0]['href']
                results.append(link)
        return results
The "url" parameter is 100% correct in the rest of the code. As an output, I get a "None". To be more precise, the execution stops right after line "results = []" (so it doesn't enter into the for).
Any idea?
Thank you so much in advance!
It seems that Google changed the HTML markup on the page. Try to change the search from class="r" to class="rc":
import requests
from bs4 import BeautifulSoup
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
def parseLinks(url):
    headers = {"user-agent": USER_AGENT}
    resp = requests.get(url, headers=headers)
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        results = []
        for g in soup.find_all('div', class_='rc'): # <-- change 'r' to 'rc'
            anchors = g.find_all('a')
            if anchors:
                link = anchors[0]['href']
                results.append(link)
        return results
url = 'https://www.google.com/search?q=tree'
print(parseLinks(url))
Prints:
['https://en.wikipedia.org/wiki/Tree', 'https://simple.wikipedia.org/wiki/Tree', 'https://www.britannica.com/plant/tree', 'https://www.treepeople.org/tree-benefits', 'https://books.google.sk/books?id=yNGrqIaaYvgC&pg=PA20&lpg=PA20&dq=tree&source=bl&ots=_TP8PqSDlT&sig=ACfU3U16j9xRJgr31RraX0HlQZ0ryv9rcA&hl=sk&sa=X&ved=2ahUKEwjOq8fXyKjsAhXhAWMBHToMDw4Q6AEwG3oECAcQAg', 'https://teamtrees.org/', 'https://www.woodlandtrust.org.uk/trees-woods-and-wildlife/british-trees/a-z-of-british-trees/', 'https://artsandculture.google.com/entity/tree/m07j7r?categoryId=other']
I would like to capture the response data from a specific website.
The site is:
https://enjoy.eni.com/it/milano/map/
If I open the browser's debugger console, I can see a POST request that returns a JSON response.
How can I get this response in Python by scraping the website?
Thanks
Apparently the web service validates a PHPSESSID cookie, so we first need to obtain one with a request that sends a proper User-Agent:
import requests
import json
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'
}
# Load the map page first so the server issues a PHPSESSID cookie
r = requests.get('https://enjoy.eni.com/it/milano/map/', headers=headers)
session_id = r.cookies['PHPSESSID']
# Send the session cookie back with the POST request
headers['Cookie'] = 'PHPSESSID={};'.format(session_id)
res = requests.post('https://enjoy.eni.com/ajax/retrieve_vehicles', headers=headers, allow_redirects=False)
json_obj = json.loads(res.content)
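An alternative to building the Cookie header by hand is to let a cookie jar carry PHPSESSID from the first request to the second. With `requests`, a `requests.Session` does this. The sketch below shows the same idea using only the standard library (`urllib.request` with `http.cookiejar`). The function name `fetch_json_with_session_cookie` is hypothetical, and whether the endpoint still behaves as described above has not been verified here. Unlike the `allow_redirects=False` call above, `urllib` follows redirects by default.

```python
import http.cookiejar
import json
import urllib.request


def fetch_json_with_session_cookie(page_url, ajax_url, headers=None, timeout=10.0):
    """GET `page_url` to collect cookies, then POST to `ajax_url` and parse JSON."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # These headers (for example a browser-like User-Agent) go on every request.
    opener.addheaders = list((headers or {}).items())

    # First request: cookies from the server's Set-Cookie headers land in `jar`.
    with opener.open(page_url, timeout=timeout) as resp:
        resp.read()

    # Second request: the opener sends the stored cookies back automatically.
    # Passing data (even an empty body) makes urllib issue a POST.
    with opener.open(ajax_url, data=b"", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

It would be called with the two URLs and the `headers` dictionary from the answer above, for example `fetch_json_with_session_cookie('https://enjoy.eni.com/it/milano/map/', 'https://enjoy.eni.com/ajax/retrieve_vehicles', headers)`.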