The following Python script returns a 403 error. The request type is GET.
import requests
import json
url ='https://footballapi.pulselive.com/football/players?pageSize=30&compSeasons=210&altIds=true&page=2&type=player&id=-1&compSeasonId=210'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36'}
result = requests.get(url, headers=headers)
print(result.status_code)
(Screenshot of the XHR request attached.)
Your code looks fine. I ran it and got the same 403 response. But if you open the url you posted, you'll notice a 403 error there as well. This looks like an issue with the website itself or maybe you are using an incorrect url.
This might be a late answer, but what you're missing is the correct header to access the Pulselive API. The necessary header is 'Origin':'https://www.premierleague.com'.
This makes the API treat the request as coming from the official Premier League website, which is allowed to access the API.
Hope this helps!
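A minimal sketch of that fix, reusing the URL and User-Agent from the question. The helper names build_headers and fetch_players are made up for illustration, and whether the server still accepts this header is an assumption, since an API can change what it checks at any time.

```python
API_URL = (
    "https://footballapi.pulselive.com/football/players"
    "?pageSize=30&compSeasons=210&altIds=true&page=2"
    "&type=player&id=-1&compSeasonId=210"
)


def build_headers():
    """Return the browser-like headers plus the Origin the answer describes."""
    return {
        "User-Agent": (
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36"
        ),
        # The header the answer identifies as missing.
        "Origin": "https://www.premierleague.com",
    }


def fetch_players(url=API_URL):
    """Send the GET request; only this function touches the network."""
    # Third-party library, imported lazily so build_headers() has no dependencies.
    import requests

    response = requests.get(url, headers=build_headers(), timeout=10)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx such as 403
    return response.json()
```

Calling fetch_players() should return the decoded JSON if the server accepts the request. A 403 would surface as an HTTPError rather than as a printed status code.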
I am new to scraping and am trying to scrape some information from a website with Python, but when I check the HTTP response status (i.e. 200) I get no output in the terminal. My code is below. I appreciate any help! Edit: I have fixed my rookie mistake in the print statement below xD. Thank you for the correction!
import requests
url = "https://www.sephora.ae/en/shop/makeup-c302/"
page = requests.get(url)
print(page.status_code)
The problem is that the page you are trying to scrape protects against scraping by ignoring requests from unusual user agents.
Set the User-Agent header to a well-known browser string, as below:
import requests
url = "https://www.sephora.ae/en/shop/makeup-c302/"
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.63 Safari/537.36'
}
response = requests.get(url, headers=headers)
print(response.status_code)
For one thing, you don't print to the console in Python with the syntax Print = (page). That code assigns page to a new variable called Print. It never calls the built-in print function, so nothing is output, and the name is easy to confuse with the real print. To output to the console, change your code to:
print(page)
Second, printing page is just printing the response object you are receiving after making your GET request, which is not very helpful. The response object has a number of properties you can access, which you can read about in the documentation for the requests Python library.
To get the status code of your response, try:
print(page.status_code)
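As a rough illustration of the other properties mentioned above, the sketch below gathers a few attributes that a requests Response object provides (status_code, ok, headers, content). The function name summarize_response is hypothetical.

```python
def summarize_response(response):
    """Collect a few commonly used attributes of a requests Response."""
    return {
        "status_code": response.status_code,  # numeric HTTP status, e.g. 200 or 403
        "ok": response.ok,  # True when the status code is below 400
        "content_type": response.headers.get("Content-Type"),
        "size": len(response.content),  # body length in bytes
    }
```

For example, print(summarize_response(page)) shows at a glance whether the request was refused and what kind of body came back.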
Python error when using request get
Hello, I have this in my code:
import requests
from bs4 import BeautifulSoup
r = requests.get(url)
And I'm getting this:
<Response [403]>
What could the solution be?
The URL is 'https://www3.animeflv.net/anime/sailor-moon'
By the way, the title is odd because Stack Overflow would not let me phrase it the way I wanted :(
For your specific case you can overcome that by faking your User-Agent in request headers.
import requests
url = 'https://www3.animeflv.net/anime/sailor-moon'
headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'}
res = requests.get(url, headers=headers)
print(res.status_code)
200
Some websites try to block requests made with the Python requests library. By default, a request sent from a Python script carries a User-Agent like python-requests/2.x, but if you override it in the headers you can often get past that check. Take a look at the fake-useragent library (https://pypi.org/project/fake-useragent/) for generating User-Agent strings.
My Django website is hosted on an Apache server. I want to send data to it with requests.post from a Python script on my PC, but it returns 403 Forbidden.
import json
import requests
url = "http://54.161.205.225/Project/devicecapture"
headers = {'User-Agent':
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
'content-type': 'application/json'}
data = {
"nb_imsi":"test API",
"tmsi1":"test",
"tmsi2":"test",
"imsi":"test API",
"country":"USA",
"brand":"Vodafone",
"operator":"test",
"mcc":"aa",
"mnc":"jhj",
"lac":"hfhf",
"cellIid":"test cell"
}
response = requests.post(url, data=json.dumps(data), headers=headers)
print(response.status_code)
I have also given permission to the directory containing the views.py where this request will go.
I have gone through many other answers but they didn't help.
I have also tried the code without json.dumps, but that doesn't work either.
How to resolve this?
After investigating it looks like the URL that you need to post to in order to login is: http://54.161.205.225/Project/accounts/login/?next=/Project/
You can work out what you need to send in a post request by looking in the Chrome DevTools, Network tab. This tells us that you need to send the fields username, password and csrfmiddlewaretoken, which you need to pull from the page.
You can get it by extracting it from the response of the first get request. It is stored on the page like this:
<input type="hidden" name="csrfmiddlewaretoken" value="OspZfYQscPMHXZ3inZ5Yy5HUPt52LTiARwVuAxpD6r4xbgyVu4wYbfpgYMxDgHta">
So you'll need to do some kind of Regex to get it. You'll work it out.
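One way to do that extraction, sketched with the standard library's re module. The function name extract_csrf_token is made up here, and the pattern assumes the attribute order and double quotes shown in the snippet above. A page that renders the tag differently would need a looser pattern or an HTML parser.

```python
import re

# Matches the hidden input shown above and captures its value attribute.
# Assumes name comes before value, as in the snippet from the login page.
CSRF_PATTERN = re.compile(
    r'<input[^>]*name="csrfmiddlewaretoken"[^>]*value="([^"]+)"'
)


def extract_csrf_token(html):
    """Return the csrfmiddlewaretoken value found in html, or None if absent."""
    match = CSRF_PATTERN.search(html)
    return match.group(1) if match else None


sample = (
    '<input type="hidden" name="csrfmiddlewaretoken" '
    'value="OspZfYQscPMHXZ3inZ5Yy5HUPt52LTiARwVuAxpD6r4xbgyVu4wYbfpgYMxDgHta">'
)
print(extract_csrf_token(sample))
```

In the login flow, the HTML passed in would be response.text from the GET request for the login page, and the returned token would be sent as csrfmiddlewaretoken in the login POST.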
So first you have to create a session. Then load the login page with a get request. Then send a post request with your login credentials to that same URL. And then your session will gain the required cookies that will then allow you to post to your desired URL. This is an example below.
import requests
# Create session
session = requests.session()
# Add user-agent string
session.headers.update({'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) ' +
'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'})
# Get login page
response = session.get('http://54.161.205.225/Project/accounts/login/?next=/Project/')
# Get csrf
# Do something to response.text
# Post to login
response = session.post('http://54.161.205.225/Project/accounts/login/?next=/Project/', data={
'username': 'example123',
'password': 'examplexamplexample',
'csrfmiddlewaretoken': 'something123123',
})
# Post desired data
response = session.post('http://url.url/other_page', data={
'data': 'something',
})
print(response.status_code)
Hopefully this should get you there. Good luck.
For more information check out this question on requests: Python Requests and persistent sessions
I have faced this situation many times.
The causes were:
- 54.161.205.225 was not added to ALLOWED_HOSTS in settings.py
- Apache's WSGI setup was not configured correctly
Things that might help with debugging:
- Check the Apache error logs to investigate what went wrong
- Try running the server locally and posting to it, to make sure the problem is not related to Apache
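For the first cause, the relevant settings.py entry might look like the fragment below. The IP address is the one from the question. Any other hostnames the site is served under would need to be listed as well.

```python
# settings.py (fragment)
# Host/domain names that this Django site is allowed to serve.
ALLOWED_HOSTS = ["54.161.205.225"]
```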
I've had some success using the POST requests in the past on other sites and receiving data from them but for some reason I'm having difficulty with the metacritic site.
Using chrome and the developer tools, I can see that when I begin to type in the search bar, it starts a POST request to the following url.
searchURL = 'http://www.metacritic.com/g00/3_c-6bbb.rjyfhwnynh.htr_/c-6RTWJUMJZX77x24myyux3ax2fx2fbbb.rjyfhwnynh.htrx2ffzytx78jfwhmx3fn65h.rfwpx3dcmw_$/$'
I also know that my headers need to be the following in order to get a response
headers = {'User-Agent' : "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36"}
When I run this, I get a status code of 200 which indicates it worked but my response text is not what I expected. I am receiving the content of the entire page when I'm expecting json of search results. What am I missing here?
import requests
title = 'Grand Theft Auto'
#search request using POST
r = requests.post(searchURL, data = {'searchTerm' : title}, headers = headers)
print(r.status_code)
print(r.text)
(Screenshots of the expected request headers and response were attached.)
I'm not sure what causes the difference (maybe it is GDPR-related since I live in Europe, or because I have Do Not Track enabled in Chrome), but for me Metacritic's autocomplete requests simply POST to http://www.metacritic.com/autosearch with the parameter search_term set to the search value and search_filter set to all.
From your screenshots, I think the URL for autocomplete in your browser is constructed with your session ID, maybe to prevent exactly what you intend to do :)
So in your case I would try the following, in order:
1. POST to the /autosearch URL, and if that doesn't work,
2. figure out the logic that writes the session ID into the URL, then make an initial request in your code to get a session ID and work with that.
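The first option could be sketched roughly as follows. The URL and the two field names come from the answer above, while the helper names are invented for illustration. The endpoint may expect further headers or may have changed since, so treat this as a starting point rather than a confirmed recipe.

```python
AUTOSEARCH_URL = "http://www.metacritic.com/autosearch"


def build_autosearch_payload(term, search_filter="all"):
    """Form fields the answer observed in the browser's autocomplete request."""
    return {"search_term": term, "search_filter": search_filter}


def autosearch(term, user_agent):
    """POST the search term; only this function touches the network."""
    # Third-party library, imported lazily so the payload helper has no dependencies.
    import requests

    response = requests.post(
        AUTOSEARCH_URL,
        data=build_autosearch_payload(term),
        headers={"User-Agent": user_agent},
        timeout=10,
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of returning an error page
    return response
```

If the endpoint behaves as described, autosearch('Grand Theft Auto', headers['User-Agent']).json() would give the search results rather than the full page.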
I am trying to download a ZIP file from this website. I have looked at other questions like this and tried both requests and urllib, but I get the same error:
urllib.error.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop. The last 30x error message was: Found
Any ideas on how to open the file straight from the web?
Here is some sample code
from urllib.request import urlopen
response = urlopen('http://www1.caixa.gov.br/loterias/_arquivos/loterias/D_megase.zip')
The linked url will redirect indefinitely, that's why you get the 302 error.
You can verify this yourself by inspecting the redirect chain: the linked URL immediately redirects to itself, creating a single-URL loop.
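A small sketch of how such a loop can be recognised from a single response. The helper is_self_redirect is a hypothetical name. It only compares the requested URL with the Location header, so it would not catch loops that pass through several different URLs.

```python
from urllib.parse import urljoin


def is_self_redirect(url, status_code, location):
    """Return True when a 3xx response points straight back at the requested URL."""
    if not (300 <= status_code < 400) or not location:
        return False
    # Location may be relative, so resolve it against the requested URL first.
    return urljoin(url, location) == url
```

With the requests library, the inputs could come from r = requests.get(url, allow_redirects=False), passing r.status_code and r.headers.get('Location').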
Works for me using the Requests library
import io
import zipfile
import requests
url = 'http://www1.caixa.gov.br/loterias/_arquivos/loterias/D_megase.zip'
response = requests.get(url)
# Unzip it into a local directory if you want
# (named archive rather than zip, to avoid shadowing the built-in zip)
archive = zipfile.ZipFile(io.BytesIO(response.content))
archive.extractall("/path/to/your/directory")
Note that sometimes trying to access web pages programmatically leads to 302 responses because they only want you to access the page via a web browser.
If you need to fake this (don't be abusive), just set the 'User-Agent' header to be like a browser. Here's an example of making a request look like it's coming from a Chrome browser.
user_agent = 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36'
headers = {'User-Agent': user_agent}
response = requests.get(url, headers=headers)
There are several libraries (e.g. https://pypi.org/project/fake-useragent/) to help with this for more extensive scraping projects.
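For small scripts, a hand-rolled stand-in may be enough. The sketch below is not the fake-useragent API. It simply picks at random from a short list of browser strings, here the ones quoted in the examples on this page, and the name random_headers is made up.

```python
import random

# Browser User-Agent strings taken from the examples on this page.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/102.0.5005.63 Safari/537.36",
]


def random_headers(rng=random):
    """Return a headers dict with a randomly chosen User-Agent."""
    return {"User-Agent": rng.choice(USER_AGENTS)}
```

The result can be passed straight to requests.get(url, headers=random_headers()).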