I'm trying to access papara.com using Python. When I make a request, it always returns a 403 response. I copied the cookies from my browser. Here is my code:
import requests
headers = {
    'authority': 'www.papara.com',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'en-US,en;q=0.9',
    'cookie': '__cfruid=64370d0d06d80a1e1a701ae8bee5a4b85c1de1af-1610296629',
}
response = requests.get('https://www.papara.com/', headers=headers)
I tried different user agents, and I tried removing the cookie from the headers, but neither worked.
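The __cfruid cookie indicates the site is behind Cloudflare, whose bot detection fingerprints the TLS handshake itself, so no combination of headers sent through requests is likely to get past it. Below is a minimal sketch using the third-party cloudscraper package, assuming the site's challenge is one cloudscraper can solve (a selenium-driven real browser is the heavier fallback):

import cloudscraper

# create_scraper() returns a drop-in replacement for requests.Session
# that tries to pass Cloudflare's JavaScript challenge before the request
scraper = cloudscraper.create_scraper()
response = scraper.get('https://www.papara.com/')
print(response.status_code)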
I need to get all the image prompts from the https://lexica.art/?q=history request, but the website returns a 403 error code when I try to send a request.
I already tried setting the User-Agent property, and I copied all of the request properties, but it still isn't working.
Here is my code:
import requests
url="https://lexica.art/api/trpc/prompts.infinitePrompts?batch=1&input={%220%22%3A{%22json%22%3A{%22text%22%3A%22history%22%2C%22searchMode%22%3A%22images%22%2C%22source%22%3A%22search%22%2C%22cursor%22%3A250}}}"
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.5',
    'Alt-Used': 'lexica.art',
    'cache-control': 'max-age=0',
    'Connection': 'keep-alive',
    'Host': 'lexica.art',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'cross-site',
    'TE': 'trailers',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
}

r = requests.get(url, headers=headers)
print(r.status_code)
Update: I used the selenium library instead, and everything works fine.
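For reference, a minimal sketch of that selenium route; it assumes Chrome renders a bare JSON response inside a <pre> element, which is its default behavior for JSON URLs:

import json

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get(url)  # the same API URL defined above
# Chrome wraps a raw JSON response in a <pre> tag
prompts = json.loads(driver.find_element(By.TAG_NAME, 'pre').text)
driver.quit()
print(prompts)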
I have a Python program that calls nseindia.com and tries to fetch the indices data using the URL: https://www1.nseindia.com/live_market/dynaContent/live_watch/stock_watch/liveIndexWatchData.json
This code works fine on my system, but when I deploy it to Heroku it gets stuck at the URL call.
import requests
url = "https://www1.nseindia.com/live_market/dynaContent/live_watch/stock_watch/liveIndexWatchData.json"
headers = {
    'authority': 'beta.nseindia.com',
    'cache-control': 'max-age=0',
    'dnt': '1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36',
    'sec-fetch-user': '?1',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,hi;q=0.8',
}
response = requests.get(url=url, headers=headers)
print(response.json())
Any suggestions?
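Two things are worth checking. First, the call passes no timeout, so if NSE silently drops packets from Heroku's IP range the request blocks forever; a timeout at least turns the hang into a fast failure. Second, NSE is known to reject clients that hit the JSON endpoint cold, so warming up a Session on the homepage first (to collect its cookies) sometimes helps. A minimal sketch combining both, assuming the block is cookie-based rather than an outright ban on cloud-provider IPs:

import requests

url = 'https://www1.nseindia.com/live_market/dynaContent/live_watch/stock_watch/liveIndexWatchData.json'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36',
    'accept-language': 'en-US,en;q=0.9,hi;q=0.8',
}

s = requests.Session()
# warm-up request: the homepage sets cookies the JSON endpoint expects
s.get('https://www1.nseindia.com', headers=headers, timeout=10)
# a timeout turns a silent block into an immediate, debuggable error
response = s.get(url, headers=headers, timeout=10)
print(response.json())

If this still hangs, the IP range itself is probably blocked and routing through a proxy is the only workaround.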
I'm trying to make a POST request using the following code, converted from a cURL command:
import requests

s = requests.Session()

headers = {
    'authority': 'www.discoverecentral.com',
    'accept': '*/*',
    'accept-language': 'en-US,en;q=0.9',
    'content-type': 'application/x-www-form-urlencoded',
# Requests sorts cookies= alphabetically
# 'cookie': 'COOKIE_SUPPORT=true; GUEST_LANGUAGE_ID=en_US; _ga=GA1.2.976004968.1651680892; _gid=GA1.2.1647177146.1652091069; ak_bmsc=376E16054B8CE1667585CF4B843B1281~000000000000000000000000000000~YAAQVJPIF87oQqyAAQAA5y8jrQ9DHy/4GZJUo1mSNg5U7s7R0A1ATGV+bFMIIp99MPTSGgwRJbLppQ33OtTnvp4dT1gF31OZ01N5b7SAvYbzGh6p1JHCPRkuLI7LI/yDQ/Y24KBTfsRYeTkILDOlI948yMwXay1lXdXMwVmiUOhfUV1TqPoS/kuHVjF+Pu5TYaGVoHmz2tARel9ydbLCv44P+yYkEssPPJanuEtdg3A3IYXH4SzSbaqhN+yV2OmwbYj9C4rHP3Vb1R7g2zQAKzS8Z+kwdV5Ns13EVuFPb+bVNxAKUIsnMKy7Lpxa05e+l38JktfKWtto7bBkfAzH7FyibI/6iyCvw/cghpDaE/PkXqXZDZh6GFWkVUABzngytkXRkS1aTG9VwhBJap2iJbWaVvA=; SAML_SP_SESSION_KEY=_bd42396230f077643c06f7bb75c60202169a8011748d5bf587745d054563; JSESSIONID=429078A672F540F1159490C033065E11.jvm6; _gat=1; bm_sv=237874D0F3F147A8B5E9FE30ABD61E37~YAAQVJPIFxrtQqyAAQAA/dqCrQ8Pjm4VHd954FLp0cvcoavAJFayiPFK25Q0lEeLQz4Ejuy7Q2GTzcT1DC0xhWkz2XAC6zLrqBc93TFAOG9zTjPZFqUTKfu9XplU5QowZlz76ekHhvprJpnen+rsaOPGScci0EPsUaU4LXyknJADa97lizWyy/1RpFDuSUnspML6cYGOBwVmpVs3EM13bfVQCuB7r4li7iMJ0toY6hl30+YIzwF7ESB1xrlwvl59Uvumf3j4w4UC1kw=~1; LFR_SESSION_STATE_4814701=1652178477701',
    'origin': 'https://www.discoverecentral.com',
    'referer': 'https://www.discoverecentral.com/group/merchant/my-reports',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="101", "Google Chrome";v="101"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
}
params = {
    'p_p_id': 'DiscoverMyReportPortlet_WAR_discovermyreportportlet',
    'p_p_lifecycle': '2',
    'p_p_state': 'normal',
    'p_p_mode': 'view',
    'p_p_resource_id': 'retreiveHierarchyList',
    'p_p_cacheability': 'cacheLevelPage',
}
data = '&direction=des&orderBy=Default&selectedLocalEntityId=6011&gridPageSize=5000'
response = s.post('https://www.discoverecentral.com/group/merchant/my-reports', params=params, headers=headers, data=data)
The response is a 401 Unauthorized. I know it has something to do with the structure of the data being passed into the request. Has anyone come across a similar issue?
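A 401 from a Liferay portlet endpoint like this is usually a session problem rather than a body problem: the commented-out cookie header carries JSESSIONID and SAML_SP_SESSION_KEY values that are tied to a browser login which has since expired. A sketch of one thing to try, reusing the headers and params dicts above and assuming the session can be primed with a GET before posting; note that the hand-built data string also begins with a stray '&', which passing a dict avoids by letting requests do the URL-encoding:

# prime the Session so JSESSIONID and related cookies are issued to it
# instead of being pasted in from the browser
s.get('https://www.discoverecentral.com/group/merchant/my-reports', headers=headers)

# pass the form fields as a dict; requests handles the encoding
data = {
    'direction': 'des',
    'orderBy': 'Default',
    'selectedLocalEntityId': '6011',
    'gridPageSize': '5000',
}
response = s.post('https://www.discoverecentral.com/group/merchant/my-reports',
                  params=params, headers=headers, data=data)
print(response.status_code)

If the endpoint sits behind the SAML login visible in the cookie, the GET alone won't authenticate and the login flow itself has to be scripted.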
So here is the basic login process mapped out (when you successfully log in on the site) for a high-level understanding. Below it is my code, which only gets me the first cookie, when I need the second cookie to access the JSON. I have working code that currently gets me the file, but that (second) cookie expires after a week, so it is not sustainable.
Step 1 - Purpose: Get first cookie that will be "c1" below
Request URL: https://www.fakesite.com/sign-in.html
Request Method: GET
Status Code: 200
Step 2 - Purpose: Use "c1" to make a POST request to obtain second cookie "c2" <<<<-Can't get past here
Request URL: https://www.fakesite.com/sign-in.html
Request Method: POST
Status Code: 302
Step 3 - Purpose: Use "c2" and the auth_token (which doesn't change) to GET the JSON file I need <<< I have working code for this step when I grab the 7-day cookie manually
Request URL: https://www.fakesite.com/ap1/file.json
Request Method: GET
Status Code: 200
import requests
headers = {
    'authority': 'www.fakesite.com',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="98", "Google Chrome";v="98"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'referer': 'https://www.fakesite.com/goodbye.html?service=logout',
    'accept-language': 'en-US,en;q=0.9',
}
s = requests.Session()
r1 = s.get('https://www.fakesite.com/sign-in.html', headers=headers)
c1 = s.cookies['fakesite']
print(r1.status_code)
print(c1)
c2 = 'fakesite=' + c1 + '; language=en'
headers = {
    'authority': 'www.fakesite.com',
    'cache-control': 'max-age=0',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="98", "Google Chrome";v="98"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'upgrade-insecure-requests': '1',
    'origin': 'https://www.fakesite.com',
    'content-type': 'application/x-www-form-urlencoded',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'referer': 'https://www.fakesite.com/sign-in.html',
    'accept-language': 'en-US,en;q=0.9',
    'cookie': c2,
}
data = {
    'csrf_token': '12345=',
    'username': 'userme',
    'password': 'passwordme',
    'returnUrl': '/sign-in.html',
    'service': 'login',
    'Login': 'Login'
}
r2 = requests.post('https://www.fakesite.com/sign-in.html', headers=headers, data=data)
c3 = s.cookies['fakesite']
print(r2.status_code)
print(c3)
OUTPUT:
200
151f3e3ba82030e5b7c03bc310ed5ad5
200
151f3e3ba82030e5b7c03bc310ed5ad5
My code results in returning the first cookie over again when I try to print all of them. I feel like I have tried everything, to no avail. When I try to go to the last site while logging in directly, I get a 401, because I need the first cookie to get the second cookie first.
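One detail stands out in the code above: step 2 is sent with requests.post(...) instead of s.post(...), so the POST bypasses the Session entirely; it neither sends c1 automatically nor stores the Set-Cookie that arrives with the 302. A minimal sketch of step 2 under the assumption that this is the culprit (the hand-built cookie header also becomes unnecessary, because the Session sends its jar on every request):

# send the login POST through the same Session so c1 goes out with it
# and the Set-Cookie on the 302 redirect lands in s.cookies
r2 = s.post('https://www.fakesite.com/sign-in.html', headers=headers, data=data)
print(r2.status_code)
print(s.cookies.get_dict())  # c2 should now appear here

It is also worth checking whether csrf_token is really constant; if not, it has to be scraped from the sign-in page fetched in step 1.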
I am trying to scrape the e-commerce site Myntra, but the request keeps loading without ever returning a result. I have tried passing headers with different user agents, but it still doesn't work. If a timeout parameter is added, the request times out, but there is still no success.
Here is the sample code I'm trying to execute:
import requests
url = 'https://www.myntra.com'
s = requests.Session()
headers = {
    'authority': 'www.myntra.com',
    # 'method', 'path', and 'scheme' are HTTP/2 pseudo-headers copied from
    # DevTools; requests sends them as literal (and meaningless) header names
    'method': 'GET',
    'path': '/',
    'scheme': 'https',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'max-age=0',
    # dnt: 1
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'none',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36',
}
response = s.get(url, headers=headers, timeout=10).content
print(response)
If I try to curl the same site, I get a 403 status code with the following output.
<HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>
You don't have permission to access "http://www.myntra.com/" on this server.<P>
Reference #18.24092e17.1601830542.453a61c2
</BODY>
</HTML>
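An "Access Denied" page with a Reference # like this is characteristic of Akamai's bot manager, which fingerprints the TLS handshake, so header changes in requests or curl won't alter the outcome. A minimal sketch using the third-party curl_cffi package, which impersonates a real browser's TLS fingerprint; it assumes the block is fingerprint-based rather than IP-based:

from curl_cffi import requests as cffi_requests

# impersonate= selects the browser TLS/HTTP2 fingerprint to mimic
response = cffi_requests.get('https://www.myntra.com',
                             impersonate='chrome110', timeout=10)
print(response.status_code)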