Hi, I am trying to send a request to https://www.playerup.com/ with the code below:
import requests
header = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"}
r = requests.get("https://www.playerup.com/",headers=header)
print(r.status_code)
But it gives me a 503 status code.
I tried a timeout of 5 seconds, and that did not work either.
How should I fix it?
Sometimes using a referer and the Google cache can help you avoid these failures. Your code would then be:
# here added a referer
header = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36" ,'referer':'https://www.google.com/'}
# now use a google cache
r = requests.get("http://webcache.googleusercontent.com/search?q=cache:www.playerup.com/",headers=header)
Now see the status code:
>>>r
<Response [200]>
Good morning,
Since yesterday, I have been getting timeouts when making requests to the eBay website. The code is simple:
import requests
headers = {
"User-Agent":
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}
html = requests.get("https://www.ebay.es", headers=headers).text
I tested the same code against Google and it works. This is the response I receive from eBay:
'\nGateway Timeout - In read \n\nGateway Timeout\nThe proxy server did not receive a timely response from the upstream server.\nReference #1.477f1602.1645295618.7675ccad\n\n'
What happened or changed? How could I solve it?
Removing the headers should work. Perhaps they don't like that user agent for some reason.
import requests
# headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}
headers = {}
url = "https://www.ebay.es"
response = requests.get(url, headers=headers)
html_text = response.text
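If it is unclear which headers the site objects to, one option is to try a few header sets in turn and keep the first one that gets a successful status. The sketch below is only an illustration: `first_working_headers` and `fetch_status` are hypothetical names, and the fetch function is passed in so the logic can be tried without touching the network.

```python
from typing import Callable, Iterable, Mapping, Optional, Tuple


def first_working_headers(
    fetch_status: Callable[[Mapping[str, str]], int],
    candidates: Iterable[Mapping[str, str]],
) -> Optional[Tuple[Mapping[str, str], int]]:
    """Return the first (headers, status) pair with a 2xx status, else None.

    fetch_status is a hypothetical callable that sends one request with the
    given headers and returns the HTTP status code.
    """
    for headers in candidates:
        status = fetch_status(headers)
        if 200 <= status < 300:
            return headers, status
    return None
```

With requests, `fetch_status` could be something like `lambda h: requests.get(url, headers=h, timeout=10).status_code`, and `candidates` could be the original User-Agent dictionary followed by an empty dictionary.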
I have the same problem as in this question:
Python requests.get fails with 403 forbidden, even after using headers and Session object
Unfortunately, it has no answer. So how can I solve the 403 Forbidden error?
I tried:
Python requests - 403 forbidden - despite setting `User-Agent` headers
and:
Python requests. 403 Forbidden
Does anyone know another option to solve it?
import requests
url_complete='https://smartsub.les.inf.puc-rio.br//media/imagens/5f667ec98b21262d4fc0a9dc5df4d0e4/8c6bbb5844e009eab139442e4024684d.jpg'
session = requests.Session()
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36 Edg/95.0.1020.53',
'referer':'https://smartsub.les.inf.puc-rio.br/login/?next=/'}
Picture_request = session.get(url_complete,headers=headers)
print(Picture_request)
For anyone who has the same problem: the solution in my case was to fill in the cookie information in the headers.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36 Edg/95.0.1020.53',
'cookie':" ..."}
You can get the cookie information the same way as the user-agent, as explained here:
Python requests. 403 Forbidden
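As an alternative to pasting the raw cookie string into the headers, the string copied from the browser can be split into a dictionary and passed through the `cookies=` argument of requests. This is a minimal sketch; `cookie_header_to_dict` is a hypothetical helper name, and the cookie names in the usage note are made up for illustration.

```python
def cookie_header_to_dict(raw_cookie: str) -> dict:
    """Split a browser 'Cookie' header value into a name -> value dict."""
    cookies = {}
    for part in raw_cookie.split(";"):
        # Split on the first '=' only, since values may contain '=' themselves
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies
```

For example, `session.get(url_complete, headers=headers, cookies=cookie_header_to_dict("sessionid=abc; other=1"))` would send both cookies with the request.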
Try an HTTP proxy, for instance 'Zyte'
I am able to open this url via a browser and see the response in json format. However, when I use the requests module, there is no response from the method.
import requests
response = requests.get('https://api.nasdaq.com/api/calendar/earnings?date=2021-02-23')
What is wrong here?
This worked for me:
url = 'https://api.nasdaq.com/api/calendar/earnings?date=2021-02-23'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'}
response = requests.get(url, headers=headers)
Explanation
The site is blocking requests from python. Refer to explanation here
When adding the headers of the query that appear when inspecting the element in chrome, the request works well in python:
import requests
response = requests.get('https://api.nasdaq.com/api/calendar/earnings?date=2021-02-23',headers={"authority":"api.nasdaq.com","scheme":"https","path":"/api/calendar/earnings?date=2021-02-23","pragma":"no-cache","cache-control":"no-cache","accept":"application/json, text/plain, */*","user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36","origin":"https://www.nasdaq.com","sec-fetch-site":"same-site","sec-fetch-mode":"cors","sec-fetch-dest":"empty","referer":"https://www.nasdaq.com/","accept-encoding":"gzip, deflate, br","accept-language":"en-US,en;q=0.9,es;q=0.8,nl;q=0.7"})
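Some of the entries shown in the browser's network inspector (`:authority`, `:method`, `:path`, `:scheme`) are HTTP/2 pseudo-headers rather than ordinary request headers, so they probably do not need to be copied. Assuming the headers were copied from the inspector as a dictionary, a small filter like the following can drop them; `drop_pseudo_headers` is a hypothetical helper name.

```python
# Names of the HTTP/2 request pseudo-header fields, without the leading colon
PSEUDO_HEADER_NAMES = {"authority", "method", "path", "scheme"}


def drop_pseudo_headers(copied: dict) -> dict:
    """Return a copy of the headers without HTTP/2 pseudo-header entries."""
    cleaned = {}
    for name, value in copied.items():
        # Match both ':authority' and the colon-less form 'authority'
        if name.lstrip(":").lower() in PSEUDO_HEADER_NAMES:
            continue
        cleaned[name] = value
    return cleaned
```

The result can then be passed as `headers=` to `requests.get`, which keeps the call shorter and leaves only headers the server actually reads.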
Currently, I am trying to make a user generator for a website, and there is a fundamental problem that I've been facing. The code below works, but what it prints out is
The page has expired due to inactivity. Please refresh and try again
I have seen some of those solutions including using xsrf-token but either I am doing something wrong or it is not related to token.
import bs4
import requests

with requests.Session() as s:
    s.get('http://www.watchill.org/register')
    token = s.cookies["XSRF-TOKEN"]
    agent = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36 OPR/62.0.3331.116",
             "XSRF-TOKEN": token}
    r = s.post('http://www.watchill.org/register', headers=agent)
    print(bs4.BeautifulSoup(r.content, "html.parser"))
The problem is that your CSRF token is invalid.
I didn't check whether this code does what you are aiming for, but it does not return the page-expired message:
import requests
from bs4 import BeautifulSoup
def getXsrf(cookies):
    # Look up the XSRF-TOKEN cookie in the jar that was passed in
    for cookie in cookies:
        if cookie.name == 'XSRF-TOKEN':
            return cookie.value

with requests.Session() as s:
    # The first GET makes the server set the XSRF-TOKEN cookie on the session
    s.get('http://www.watchill.org/register')
    xsrf = getXsrf(s.cookies)
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36 OPR/62.0.3331.116"}
    # Send the token back in the X-XSRF-TOKEN header
    headers['X-XSRF-TOKEN'] = xsrf
    r = s.post('http://www.watchill.org/register', headers=headers)
    print(BeautifulSoup(r.content, "html.parser"))
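Some frameworks URL-encode the value of the `XSRF-TOKEN` cookie, in which case it may need decoding before it is sent back as a header. Whether that applies to this site is an assumption that has not been checked. The sketch below uses a hypothetical helper, `xsrf_header_from_cookies`, which accepts anything with a `.get()` method, such as `s.cookies` or a plain dict.

```python
from urllib.parse import unquote


def xsrf_header_from_cookies(cookies, cookie_name="XSRF-TOKEN",
                             header_name="X-XSRF-TOKEN"):
    """Build the XSRF header from a cookie jar, URL-decoding the value.

    Returns an empty dict when the cookie is missing.
    """
    raw = cookies.get(cookie_name)
    if raw is None:
        return {}
    # Assumption: the cookie value is percent-encoded (e.g. '%3D' for '=')
    return {header_name: unquote(raw)}
```

In the answer above, this would replace the manual assignment with `headers.update(xsrf_header_from_cookies(s.cookies))`.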
I am trying to scrape zk.fm in order to download music, but it's giving me some trouble. I'm using urllib3 to generate a response, but this always yields a Bad Gateway error. Accessing the website through a browser works perfectly fine.
This is my code (with a random fake user-agent). I'm trying to access "http://zk.fm/mp3/search?keywords=" followed by some keywords which indicate the song name and artist, for example "http://zk.fm/mp3/search?keywords=childish+gambino+heartbeat".
from bs4 import BeautifulSoup
from random import choice
import urllib3
desktop_agents = ['Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14',
'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0']
def random_headers():
    return {'User-Agent': choice(desktop_agents)}
ua = random_headers()
# Pass the randomly chosen headers to the pool manager
http = urllib3.PoolManager(10, headers=ua)
response = http.request('GET', "http://zk.fm/mp3/search?keywords=childish+gambino+heartbeat")
soup = BeautifulSoup(response.data, "html.parser")
Is there a way to work around the 502 Error, or is it out of my control?
You need to enable cookie persistence, then access, in order, the site's home page followed by the search URL. I (personally) suggest python-requests, but it is up to you. See here for discussion.
I tested this by visiting the search page: error 502. Visiting the home page: 200. Visiting search again: 200. Clearing cookies and visiting search once more: 502. So cookies must be the problem.
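The home-page-then-search order can be sketched as follows. This is only an illustration: `search_after_home` is a hypothetical helper, and `session` is assumed to be any object with a `get(url)` method that keeps cookies between calls, such as a `requests.Session()`.

```python
from urllib.parse import urlencode


def search_after_home(session, keywords, base_url="http://zk.fm"):
    """Visit the home page first so cookies get set, then run the search."""
    # First request: lets the site set its cookies on the session
    session.get(base_url + "/")
    # Second request: the search, sent with the cookies from the first one
    query = urlencode({"keywords": keywords})
    return session.get(f"{base_url}/mp3/search?{query}")
```

With requests, this could be called as `search_after_home(requests.Session(), "childish gambino heartbeat")`, and the returned response's `.status_code` can be checked to confirm the 502 is gone.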