Getting the same response different URL

Getting the same response different URL - python

I'm getting the same response from these 2 URLs:
First URL
Second URL
This is the code I'm using:
import requests
url = "https://www.amazon.it/blackfriday"
querystring = {"ref_":"nav_cs_gb_td_bf_dt_cr","deals-widget":"{\"version\":1,\"viewIndex\":60,\"presetId\":\"deals-collection-all-deals\",\"sorting\":\"BY_SCORE\"}"}
payload = ""
headers = {"cookie": "session-id=260-4643637-2647537; session-id-time=2082787201l; i18n-prefs=EUR; ubid-acbit=258-7747562-7485655; session-token=%22aZB70z2dnXHbhJ9e02ESp7q6xO23IGnDFT2iBCiPXZFoBTTEguAJ%2FBSnV7ud6bjAca64nh3bMF1bwDykOBf9BV%2BVjbx4tUQCyBkrg8tyR8PLZ8cjzpCz%2FzQSAmjiL6mSBcspkF8xuV0bxqLeRX7JQCMrHVBFf%2BsUhxV%2FMBLCH8UPk2o5aNL7OyAFCODBdRqm72RK5DAoKeMUymlVEOtqzvZSJbP%2Fut0gobiXJblRM2c%3D%22"}
response = requests.request("GET", url, data=payload, headers=headers, params=querystring)
I would like to get the same response that i get on the browser
How can i do it? Why does this happen?

You have to trick the server into thinking you are a browser. You can accomplish this by setting the user agent header.
headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36',"cookie": "session-id=260-4643637-2647537; session-id-time=2082787201l; i18n-prefs=EUR; ubid-acbit=258-7747562-7485655; session-token=%22aZB70z2dnXHbhJ9e02ESp7q6xO23IGnDFT2iBCiPXZFoBTTEguAJ%2FBSnV7ud6bjAca64nh3bMF1bwDykOBf9BV%2BVjbx4tUQCyBkrg8tyR8PLZ8cjzpCz%2FzQSAmjiL6mSBcspkF8xuV0bxqLeRX7JQCMrHVBFf%2BsUhxV%2FMBLCH8UPk2o5aNL7OyAFCODBdRqm72RK5DAoKeMUymlVEOtqzvZSJbP%2Fut0gobiXJblRM2c%3D%22"}

Related

Web scraping request stopped working, showing "Response [401]" in python?

import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36'}
url = 'https://www.nseindia.com/api/chart-databyindex?index=ACCEQN'
r = requests.get(url, headers=headers)
data = r.json()
print(data)
prices = data['grapthData']
print(prices)
It was working fine but now it showing error "Response [401]"

Well, it's all about the site's authentication requirements. It requires a certain level of authorization to access like this.

Web scraping a JSON file with requests

I'm trying to retrieve the timetable from this site using Requests.
I make the post sending the right parameters and get back the empty HTML skeleton, but instead I would like to get the json file returned.
Here is what I see when inspecting the page and highlighted you can see the file I want to retrieve.
Here is my code so far:
url = "https://alilauro-tickets.certusonline.com/"
headers = {'user-agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.3'}
data = "msg=TimeTable&req=%7B%22getAvailability%22%3A%22Y%22%2C%22getBasicPrice%22%3A%22Y%22%2C%22getRouteAnalysis%22%3A%22Y%22%2C%22directOnly%22%3A%22Y%22%2C%22legs%22%3A%224%22%2C%22pax%22%3A1%2C%22origin%22%3A%22BEV%22%2C%22destination%22%3A%22ISC%22%2C%22tripRequest%22%3A%5B%7B%22tripfrom%22%3A%22BEV%22%2C%22tripto%22%3A%22ISC%22%2C%22tripdate%22%3A%222020-03-19%22%2C%22tripleg%22%3A0%7D%2C%7B%22tripfrom%22%3A%22ISC%22%2C%22tripto%22%3A%22BEV%22%2C%22tripdate%22%3A%222020-03-19%22%2C%22tripleg%22%3A1%7D%2C%7B%22tripfrom%22%3A%22BEV%22%2C%22tripto%22%3A%22FOR%22%2C%22tripdate%22%3A%222020-03-19%22%2C%22tripleg%22%3A2%7D%2C%7B%22tripfrom%22%3A%22FOR%22%2C%22tripto%22%3A%22BEV%22%2C%22tripdate%22%3A%222020-03-19%22%2C%22tripleg%22%3A3%7D%5D%7D"
r = requests.post(url, data=data, headers=headers, timeout=20)

The request should be as below:
url = 'https://alilauro-tickets.certusonline.com/php/proxy.php'
headers = {'user-agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.3'}
data = {
'msg': 'TimeTable',
'req': '{"getAvailability":"Y","getBasicPrice":"Y","getRouteAnalysis":"Y","directOnly":"Y","legs":1,"pax":1,"origin":"BEV","destination":"FOR","tripRequest":[{"tripfrom":"BEV","tripto":"FOR","tripdate":"2020-03-18","tripleg":0}]}'
}
response = requests.post(url, headers=headers, data=data)

Response [412] when using the requests python package to access this webpage, how to get around it?

This is the reproducible code:
import requests
url = 'http://wjw.hubei.gov.cn/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36'}
res = requests.get(url,headers=headers)
print(res)
The code print(res) gives the following output:
<Response [412]>
I can open the webpage fine on my computer with Chrome.
Is there something missing in the header? Is there a way to get around the 412 error? Thanks in advance!

That website require a valid Cookie in order to response back to you.
I've tried several ways such as calling the main website and then retrieving the Cookie under requests.Session() but the website is not allowing me to pass through.
So the only way which you can use as for now. Or to use Selenium or pass a valid Cookie to the requests
Here's how to get the Cookie and User-Agent via the browser:
Using the following Code:
import requests
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0",
"Cookie": "Hm_lvt_5544783ae3e1427d6972d9e77268f25d=1578572654; Hm_lpvt_5544783ae3e1427d6972d9e77268f25d=1578572671; dataHide2=64fa0f2a-a6aa-43b4-adf0-ce901e8d1a37; FSSBBIl1UgzbN7N80S=sXE0qXcyGkTm4uVerLqfZyUU3XFMZzkm22k.eqVABLPe0eYMo3D8uX5ZJ07.7cCr; FSSBBIl1UgzbN7N80T=4aY.P74ZFvDef6i1BgsPAGpjsGOCcIHJFaOyshl4_fJ1WvTk1nqBkdG9PsyX3VRZcIuI8zdYiRJw4rEBQfx.Mv.GS_wT6Hzgiw.AY.UMP.Mw4iCKXGDzY1UeIH2gUd15impxzBVzZpN3MnSdqD0TUqcxSq0RrvIuE8RKT5pFLAqaNnVqtbeSACx43yIYtKJ41y8Isu6a6lNOlWNeaFJ8bx22pKm3lAIO.HIDhGSZqrUP76.q3i4Iux59f7dqJPuSRF90G1LSUBE8t8HrlWzBcSwJJJARX4Ioc0iHmHvdkVoigUitTRjLUHJM4ieOV1sLBDsq"
}
r = requests.get("http://wjw.hubei.gov.cn/", headers=headers)
print(r)
Output:
<Response [200]>
Update:
import requests
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"}
with requests.Session() as req:
r = req.get("http://www.hubei.gov.cn/")
headers['Cookie'] = r.headers.get("Set-Cookie")
for item in range(10):
new = req.get("http://wjw.hubei.gov.cn/", headers=headers)
print(new)

import requests
response=requests.get("https://precog.iiit.ac.in/")
< Response [200] >
<Response [400]>
<Response [800]>
None of the above responses

How can I use url lib using a json file

I'm trying to get data from a json link, but I'm getting this error: TypeError: can't concat str to bytes
This is my code:
l = "https://www.off---white.com/en/IT/men/products/omch016f18d471431088s"
url = (l+".json"+"?porcoiddio")
req = urllib.request.Request(url, headers)
response = urllib.request.urlopen(req)
size_opts = json.loads(response.decode('utf-8'))['available_sizes']
How can I solve this error?

Your question answer is change your code to:
size_opts = json.loads(response.read().decode('utf-8'))['available_sizes']
Change at 2018-10-02 22:55 : I view your source code and found Response 503 , the reason why you got 503 is that request did not contain cookies:
req = urllib.request.Request(url, headers=headers)
you have update your headers.
headers.update({"Cookie":cookie_value})
req = urllib.request.Request(url, headers=headers) # !!!! you need a headers include cookies !!!!

you are providing the data argument by mistake …
you'll have to use a keyword argument for headers as otherwise the second argument will be filled with positional input, which happens to be data, try this:
req = urllib.request.Request(url, headers=headers)
See https://docs.python.org/3/library/urllib.request.html#urllib.request.Request for a documentation of Requests signature.

You could have a go using requests instead?
import requests, json
l = "https://www.off---white.com/en/IT/men/products/omch016f18d471431088s"
url = (l+".json"+"?porcoiddio")
session = requests.Session()
session.mount('http://', requests.adapters.HTTPAdapter(max_retries=10))
size_opts = session.get(url, headers= {'Referer': 'off---white.com/it/IT/login', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'}).json()['available_sizes']
To check the response:
size_opts = session.get(url, headers= {'Referer': 'off---white.com/it/IT/login', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'})
print(size_opts)
Gives
<Response [503]>
This response means: "503 Service Unavailable. The server is currently unable to handle the request due to a temporary overload or scheduled maintenance"
I would suggest the problem isn't the code but the server?

Python requests: CSRF token missing or incorrect

I have a simple HTML page where I am trying to post form data using requests.post(); however, I keep getting Bad Request 400. CSRF token missing or incorrect even though I am passing it URL-encoded.
Please help.
url = "https://recruitment.advarisk.com/tests/scraping"
res = requests.get(url)
tree = etree.HTML(res.content)
csrf = tree.xpath('//input[#name="csrf_token"]/#value')[0]
postData = dict(csrf_token=csrf, ward=wardName)
print(postData)
postUrl = urllib.parse.quote(csrf)
formData = dict(csrf_token=postUrl, ward=wardName)
print(formData)
headers = {'referer': url, 'content-type': 'application/x-www-form-urlencoded', 'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
page = requests.post(url, data=formData, headers=headers)
return page.content

You have make sure the requests in one session, so that the csrf_token will be matched:
import sys
import requests
wardName = "DHANLAXMICOMPLEX"
url = 'https://recruitment.advarisk.com/tests/scraping'
#make the requests in one session
client = requests.session()
# Retrieve the CSRF token first
tree = etree.HTML(client.get(url).content)
csrf = tree.xpath('//input[#name="csrf_token"]/#value')[0]
#form data
formData = dict(csrf_token=csrf, ward=wardName)
headers = {'referer': url, 'content-type': 'application/x-www-form-urlencoded', 'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
#use same session client
r = client.post(url, data=formData, headers=headers)
print r.content
It will give you the html with the result data table.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Getting the same response different URL - python

Related

Web scraping request stopped working, showing "Response [401]" in python?

Web scraping a JSON file with requests

Response [412] when using the requests python package to access this webpage, how to get around it?

How can I use url lib using a json file

Python requests: CSRF token missing or incorrect

Categories

Resources