I'm trying to access papara.com using Python. When I make a request, it always returns a 403 response. I copied the cookies from my browser. Here is my code:
import requests
headers = {
    'authority': 'www.papara.com',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'en-US,en;q=0.9',
    'cookie': '__cfruid=64370d0d06d80a1e1a701ae8bee5a4b85c1de1af-1610296629',
}
response = requests.get('https://www.papara.com/', headers=headers)
I tried different user agents, and I tried removing the cookie from the headers, but neither worked.
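The __cfruid cookie indicates the site is behind Cloudflare, whose bot detection fingerprints the TLS handshake itself, so no combination of headers sent through requests is likely to get past it. Below is a minimal sketch using the third-party cloudscraper package, assuming the site's challenge is one cloudscraper can solve (a selenium-driven real browser is the heavier fallback):

import cloudscraper

# create_scraper() returns a drop-in replacement for requests.Session
# that tries to pass Cloudflare's JavaScript challenge before the request
scraper = cloudscraper.create_scraper()
response = scraper.get('https://www.papara.com/')
print(response.status_code)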
I need to get all the image prompts from the https://lexica.art/?q=history request, but the website returns a 403 error code when I try to send a request.
I already tried setting the User-Agent property, and I copied all of the request properties, but it still isn't working.
Here is my code:
import requests
url="https://lexica.art/api/trpc/prompts.infinitePrompts?batch=1&input={%220%22%3A{%22json%22%3A{%22text%22%3A%22history%22%2C%22searchMode%22%3A%22images%22%2C%22source%22%3A%22search%22%2C%22cursor%22%3A250}}}"
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.5',
    'Alt-Used': 'lexica.art',
    'cache-control': 'max-age=0',
    'Connection': 'keep-alive',
    'Host': 'lexica.art',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'cross-site',
    'TE': 'trailers',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
}

r = requests.get(url, headers=headers)
print(r.status_code)
Update: I used the selenium library instead, and everything works fine.
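For reference, a minimal sketch of that selenium route; it assumes Chrome renders a bare JSON response inside a <pre> element, which is its default behavior for JSON URLs:

import json

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get(url)  # the same API URL defined above
# Chrome wraps a raw JSON response in a <pre> tag
prompts = json.loads(driver.find_element(By.TAG_NAME, 'pre').text)
driver.quit()
print(prompts)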
I have a Python program that calls nseindia.com and tries to fetch the indices data using the URL: https://www1.nseindia.com/live_market/dynaContent/live_watch/stock_watch/liveIndexWatchData.json
This code works fine on my system, but when I deploy it to Heroku it gets stuck at the URL call.
import requests
url = "https://www1.nseindia.com/live_market/dynaContent/live_watch/stock_watch/liveIndexWatchData.json"
headers = {
    'authority': 'beta.nseindia.com',
    'cache-control': 'max-age=0',
    'dnt': '1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36',
    'sec-fetch-user': '?1',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,hi;q=0.8',
}
response = requests.get(url=url, headers=headers)
print(response.json())
Any suggestions?
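Two things are worth checking. First, the call passes no timeout, so if NSE silently drops packets from Heroku's IP range the request blocks forever; a timeout at least turns the hang into a fast failure. Second, NSE is known to reject clients that hit the JSON endpoint cold, so warming up a Session on the homepage first (to collect its cookies) sometimes helps. A minimal sketch combining both, assuming the block is cookie-based rather than an outright ban on cloud-provider IPs:

import requests

url = 'https://www1.nseindia.com/live_market/dynaContent/live_watch/stock_watch/liveIndexWatchData.json'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36',
    'accept-language': 'en-US,en;q=0.9,hi;q=0.8',
}

s = requests.Session()
# warm-up request: the homepage sets cookies the JSON endpoint expects
s.get('https://www1.nseindia.com', headers=headers, timeout=10)
# a timeout turns a silent block into an immediate, debuggable error
response = s.get(url, headers=headers, timeout=10)
print(response.json())

If this still hangs, the IP range itself is probably blocked and routing through a proxy is the only workaround.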
I'm trying to make a POST request using the following code, converted from a cURL command:
import requests

s = requests.Session()

headers = {
    'authority': 'www.discoverecentral.com',
    'accept': '*/*',
    'accept-language': 'en-US,en;q=0.9',
    'content-type': 'application/x-www-form-urlencoded',
# Requests sorts cookies= alphabetically
# 'cookie': 'COOKIE_SUPPORT=true; GUEST_LANGUAGE_ID=en_US; _ga=GA1.2.976004968.1651680892; _gid=GA1.2.1647177146.1652091069; ak_bmsc=376E16054B8CE1667585CF4B843B1281~000000000000000000000000000000~YAAQVJPIF87oQqyAAQAA5y8jrQ9DHy/4GZJUo1mSNg5U7s7R0A1ATGV+bFMIIp99MPTSGgwRJbLppQ33OtTnvp4dT1gF31OZ01N5b7SAvYbzGh6p1JHCPRkuLI7LI/yDQ/Y24KBTfsRYeTkILDOlI948yMwXay1lXdXMwVmiUOhfUV1TqPoS/kuHVjF+Pu5TYaGVoHmz2tARel9ydbLCv44P+yYkEssPPJanuEtdg3A3IYXH4SzSbaqhN+yV2OmwbYj9C4rHP3Vb1R7g2zQAKzS8Z+kwdV5Ns13EVuFPb+bVNxAKUIsnMKy7Lpxa05e+l38JktfKWtto7bBkfAzH7FyibI/6iyCvw/cghpDaE/PkXqXZDZh6GFWkVUABzngytkXRkS1aTG9VwhBJap2iJbWaVvA=; SAML_SP_SESSION_KEY=_bd42396230f077643c06f7bb75c60202169a8011748d5bf587745d054563; JSESSIONID=429078A672F540F1159490C033065E11.jvm6; _gat=1; bm_sv=237874D0F3F147A8B5E9FE30ABD61E37~YAAQVJPIFxrtQqyAAQAA/dqCrQ8Pjm4VHd954FLp0cvcoavAJFayiPFK25Q0lEeLQz4Ejuy7Q2GTzcT1DC0xhWkz2XAC6zLrqBc93TFAOG9zTjPZFqUTKfu9XplU5QowZlz76ekHhvprJpnen+rsaOPGScci0EPsUaU4LXyknJADa97lizWyy/1RpFDuSUnspML6cYGOBwVmpVs3EM13bfVQCuB7r4li7iMJ0toY6hl30+YIzwF7ESB1xrlwvl59Uvumf3j4w4UC1kw=~1; LFR_SESSION_STATE_4814701=1652178477701',
    'origin': 'https://www.discoverecentral.com',
    'referer': 'https://www.discoverecentral.com/group/merchant/my-reports',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="101", "Google Chrome";v="101"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
}
params = {
    'p_p_id': 'DiscoverMyReportPortlet_WAR_discovermyreportportlet',
    'p_p_lifecycle': '2',
    'p_p_state': 'normal',
    'p_p_mode': 'view',
    'p_p_resource_id': 'retreiveHierarchyList',
    'p_p_cacheability': 'cacheLevelPage',
}
data = '&direction=des&orderBy=Default&selectedLocalEntityId=6011&gridPageSize=5000'
response = s.post('https://www.discoverecentral.com/group/merchant/my-reports', params=params, headers=headers, data=data)
The response is a 401 Unauthorized. I know it has something to do with the structure of the data being passed into the request. Has anyone come across a similar issue?
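A 401 from a Liferay portlet endpoint like this is usually a session problem rather than a body problem: the commented-out cookie header carries JSESSIONID and SAML_SP_SESSION_KEY values that are tied to a browser login which has since expired. A sketch of one thing to try, reusing the headers and params dicts above and assuming the session can be primed with a GET before posting; note that the hand-built data string also begins with a stray '&', which passing a dict avoids by letting requests do the URL-encoding:

# prime the Session so JSESSIONID and related cookies are issued to it
# instead of being pasted in from the browser
s.get('https://www.discoverecentral.com/group/merchant/my-reports', headers=headers)

# pass the form fields as a dict; requests handles the encoding
data = {
    'direction': 'des',
    'orderBy': 'Default',
    'selectedLocalEntityId': '6011',
    'gridPageSize': '5000',
}
response = s.post('https://www.discoverecentral.com/group/merchant/my-reports',
                  params=params, headers=headers, data=data)
print(response.status_code)

If the endpoint sits behind the SAML login visible in the cookie, the GET alone won't authenticate and the login flow itself has to be scripted.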
So here is the basic login process mapped out (when you successfully log in on the site) for a high-level understanding. Below it is my code, which only gets me the first cookie, when I need the second cookie to access the JSON. I have working code that currently gets me the file, but that (second) cookie expires after a week, so it is not sustainable.
Step 1 - Purpose: Get first cookie that will be "c1" below
Request URL: https://www.fakesite.com/sign-in.html
Request Method: GET
Status Code: 200
Step 2 - Purpose: Use "c1" to make a POST request to obtain second cookie "c2" <<<<-Can't get past here
Request URL: https://www.fakesite.com/sign-in.html
Request Method: POST
Status Code: 302
Step 3 - Purpose: Use "c2" and the auth_token (which doesn't change) to GET the JSON file I need <<< I have working code for this step when I grab the 7-day cookie manually
Request URL: https://www.fakesite.com/ap1/file.json
Request Method: GET
Status Code: 200
import requests
headers = {
    'authority': 'www.fakesite.com',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="98", "Google Chrome";v="98"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'referer': 'https://www.fakesite.com/goodbye.html?service=logout',
    'accept-language': 'en-US,en;q=0.9',
}
s = requests.Session()
r1 = s.get('https://www.fakesite.com/sign-in.html', headers=headers)
c1 = s.cookies['fakesite']
print(r1.status_code)
print(c1)
c2 = 'fakesite=' + c1 + '; language=en'
headers = {
    'authority': 'www.fakesite.com',
    'cache-control': 'max-age=0',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="98", "Google Chrome";v="98"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'upgrade-insecure-requests': '1',
    'origin': 'https://www.fakesite.com',
    'content-type': 'application/x-www-form-urlencoded',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'referer': 'https://www.fakesite.com/sign-in.html',
    'accept-language': 'en-US,en;q=0.9',
    'cookie': c2,
}
data = {
    'csrf_token': '12345=',
    'username': 'userme',
    'password': 'passwordme',
    'returnUrl': '/sign-in.html',
    'service': 'login',
    'Login': 'Login'
}
r2 = requests.post('https://www.fakesite.com/sign-in.html', headers=headers, data=data)
c3 = s.cookies['fakesite']
print(r2.status_code)
print(c3)
OUTPUT:
200
151f3e3ba82030e5b7c03bc310ed5ad5
200
151f3e3ba82030e5b7c03bc310ed5ad5
My code results in returning the first cookie over again when I try to print all of them. I feel like I have tried everything, to no avail. When I try to go to the last site while logging in directly, I get a 401, because I need the first cookie to get the second cookie first.
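One detail stands out in the code above: step 2 is sent with requests.post(...) instead of s.post(...), so the POST bypasses the Session entirely; it neither sends c1 automatically nor stores the Set-Cookie that arrives with the 302. A minimal sketch of step 2 under the assumption that this is the culprit (the hand-built cookie header also becomes unnecessary, because the Session sends its jar on every request):

# send the login POST through the same Session so c1 goes out with it
# and the Set-Cookie on the 302 redirect lands in s.cookies
r2 = s.post('https://www.fakesite.com/sign-in.html', headers=headers, data=data)
print(r2.status_code)
print(s.cookies.get_dict())  # c2 should now appear here

It is also worth checking whether csrf_token is really constant; if not, it has to be scraped from the sign-in page fetched in step 1.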
I am trying to scrape the e-commerce site Myntra, but the request keeps loading without ever returning a result. I have tried passing headers with different user agents, but it still doesn't work. If a timeout parameter is added, the request times out, but there is still no success.
Here is the sample code I'm trying to execute:
import requests
url = 'https://www.myntra.com'
s = requests.Session()
headers = {
    'authority': 'www.myntra.com',
    # 'method', 'path', and 'scheme' are HTTP/2 pseudo-headers copied from
    # DevTools; requests sends them as literal (and meaningless) header names
    'method': 'GET',
    'path': '/',
    'scheme': 'https',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'max-age=0',
    # dnt: 1
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'none',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36',
}
response = s.get(url, headers=headers, timeout=10).content
print(response)
If I try to curl the same site, I get a 403 status code with the following output.
<HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>
You don't have permission to access "http://www.myntra.com/" on this server.<P>
Reference #18.24092e17.1601830542.453a61c2
</BODY>
</HTML>
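An "Access Denied" page with a Reference # like this is characteristic of Akamai's bot manager, which fingerprints the TLS handshake, so header changes in requests or curl won't alter the outcome. A minimal sketch using the third-party curl_cffi package, which impersonates a real browser's TLS fingerprint; it assumes the block is fingerprint-based rather than IP-based:

from curl_cffi import requests as cffi_requests

# impersonate= selects the browser TLS/HTTP2 fingerprint to mimic
response = cffi_requests.get('https://www.myntra.com',
                             impersonate='chrome110', timeout=10)
print(response.status_code)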