I am trying to do some web scraping from here, but I am struggling to get the access token automatically. Every time I do the web scraping, I need to manually update the Bearer token. Is there a way to do this automatically?
Let me show you how I do it manually:
import requests

url_WiZink = 'https://www.creditopessoal.wizink.pt/gravitee/gateway/api-chn-loans/v1/loans/quotation'
headers_WiZink = {
    'Accept': 'application/json',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'pt-PT',
    'Authorization': 'Bearer de6ea490-381e-417f-ab77-3aad0d7eb63c',
    'Connection': 'keep-alive',
    'Content-Length': '266',
    'Content-Type': 'application/json;charset=UTF-8',
    'Host': 'www.creditopessoal.wizink.pt',
    'Origin': 'https://www.wizink.pt',
    'Referer': 'https://www.wizink.pt/',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="90", "Google Chrome";v="90"',
    'sec-ch-ua-mobile': '?0',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-site',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36',
    'X-Channel-Id': 'LOANSIMULATOR',
    'X-Client-Id': 'simWzkPt',
    'X-Country-Id': 'PRT',
    'X-Device-UUID': 'd14e9b629804cbba1ac7c3e78ab39a56',
}
payload_WiZink = {
    'productCode': 'WZP01',
    'fixedTermLoanId': 84,
    'impositionAmount': {'amount': 10000, 'currency': 'EUR'},
    'settlementDay': '5',
    'dueOrAdvanceInterestIndicator': '3',
    'nominalInterest': '8.0000000',
    'feeRateId': '05',
    'settlementFrequencyId': '0001',
    'deprecationFrequencyId': '0001',
}
response_WiZink = requests.post(url_WiZink, headers=headers_WiZink, json=payload_WiZink, verify=False).json()
For that website, you can get an access token by calling their oauth/token endpoint:
import requests
access_token = requests.post(
    'https://www.creditopessoal.wizink.pt/gravitee/gateway/api-chn-auth-server/v1/oauth/token',
    headers={'Authorization': 'Basic c2ltV3prUHQ6YmllZTktZmR6dzAzLXBvZWpuY2Q='},
    data={'grant_type': 'client_credentials'},
).json()['access_token']
print(access_token)
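Putting the two pieces together, here is a minimal sketch that fetches a fresh token before each simulation call and injects it into the Authorization header (the endpoint and the Basic credentials are the ones from the snippet above; the helper name is my own):

def get_wizink_token():
    # request a fresh client-credentials token from the auth endpoint above
    return requests.post(
        'https://www.creditopessoal.wizink.pt/gravitee/gateway/api-chn-auth-server/v1/oauth/token',
        headers={'Authorization': 'Basic c2ltV3prUHQ6YmllZTktZmR6dzAzLXBvZWpuY2Q='},
        data={'grant_type': 'client_credentials'},
    ).json()['access_token']

# overwrite the hard-coded Bearer value before scraping
headers_WiZink['Authorization'] = f'Bearer {get_wizink_token()}'
response_WiZink = requests.post(url_WiZink, headers=headers_WiZink, json=payload_WiZink, verify=False).json()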
Related
I need to get all image prompts from the https://lexica.art/?q=history request, but the website returns a 403 error code when I try to send a request.
I already tried setting the User-Agent property and copying all the request properties, but it still isn't working.
Here is my code:
import requests

url = 'https://lexica.art/api/trpc/prompts.infinitePrompts?batch=1&input={%220%22%3A{%22json%22%3A{%22text%22%3A%22history%22%2C%22searchMode%22%3A%22images%22%2C%22source%22%3A%22search%22%2C%22cursor%22%3A250}}}'
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.5',
    'Alt-Used': 'lexica.art',
    'cache-control': 'max-age=0',
    'Connection': 'keep-alive',
    'Host': 'lexica.art',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'cross-site',
    'TE': 'trailers',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
}
r = requests.get(url, headers=headers)
print(r.status_code)
I used the Selenium library instead, and everything is working fine.
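In case it helps, here is a minimal sketch of what that Selenium approach can look like (the details are my assumption, not the answerer's actual code). Chrome typically renders a raw JSON response inside a <pre> element, so the payload can be read from there:

import json
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumes Chrome and a matching chromedriver are available
driver.get(url)  # the lexica.art API URL from the question above
raw = driver.find_element(By.TAG_NAME, 'pre').text  # Chrome wraps raw JSON in a <pre> tag
driver.quit()
data = json.loads(raw)
print(data)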
I want to get the source page of the g2.com website.
I included all the headers in the request and still get a 403 response.
import requests

url = 'https://www.g2.com/'  # assumed target; the original snippet used url without defining it

headers = {
    'Authority': 'www.g2.com',
    'Method': 'GET',
    'Scheme': 'https',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US, en; q=0.9',
    'Referer': 'https://www.g2.com/products/adobe-marketo-engage/reviews?__cf_chl_tk=Fo5h38ejOyOQlLESaOeFuwQwgN_UhEf3GEk.D8oGJUI-1657274049-0-gaNycGzNCmU',
    'Pragma': 'no-cache',
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Upgrade-Insecure-Requests': '1',
    'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"',
    'sec-ch-ua-mobile': '?0',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-User': '?1',
    'cookie': 'events_distinct_id=c2b8f12b-8533-4f3b-8116-5d1697f9fb9a; _ga=GA1.2.1805609855.1656582972; ajs_anonymous_id=c2b8f12b-8533-4f3b-8116-5d1697f9fb9a; intercom-id-rzpwcktf=ca9542d9-48e1-43e9-9c03-2e6f764bd313; _gcl_au=1.1.1117117031.1656582974; _delighted_web={"h7nzI49oCCJbJbS4":{"_delighted_fst":{"t":"1656582990103"}}}; __adroll_fpc=be0a43b909a1ee55f7fbbe0ff435677b-1656582993099; intercom-session-rzpwcktf=; cf_clearance=bZFBSAXN._7HrbMYKIHlwGhxjEpv1LcxhdgUZJvANuA-1657274052-0-150; __ar_v4=|C6MKFN32KVBHZAS4DKYVVW:20220707:1|EEPCTRZ5RNC6ZCBB2PJM4J:20220707:1|NBMTYK27EJFT3GYAV7FM56:20220707:1; _g2_session_id=741c7ae825402c34f500c9e045907672; _gid=GA1.2.923101367.1657517161; AWSALB=OS0LwCLYUCHeMYMSmWfhqAWaatiLtCuQhpwkGLfmjOb8/p6Lc/wFvC/DJ0MD7qnMTML3BbUpq9caoDwAQuU/EStHqVpw+cO7juiwSVs551sfSWPPYgiP8lwLahU+; AWSALBCORS=OS0LwCLYUCHeMYMSmWfhqAWaatiLtCuQhpwkGLfmjOb8/p6Lc/wFvC/DJ0MD7qnMTML3BbUpq9caoDwAQuU/EStHqVpw+cO7juiwSVs551sfSWPPYgiP8lwLahU+',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    'Content-Type': 'text/html; charset=UTF-8',
}
response = requests.get(url, headers=headers)
I also tried the pytor package, but I got the same problem.
I can view the website in all browsers, but when I try to get the source page with requests I get a 403.
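The cf_clearance cookie in those headers suggests g2.com sits behind Cloudflare's bot protection, which typically rejects plain requests calls no matter which headers you copy. One workaround people try (an assumption on my part, not a guaranteed fix) is the third-party cloudscraper package, which wraps requests and attempts to solve the Cloudflare challenge:

import cloudscraper  # pip install cloudscraper

scraper = cloudscraper.create_scraper()  # behaves like a requests.Session
response = scraper.get('https://www.g2.com/')
print(response.status_code)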
I'm trying to make a POST request using the following code, converted from a cURL command:
import requests

s = requests.Session()  # the original snippet used `s` without defining it

headers = {
    'authority': 'www.discoverecentral.com',
    'accept': '*/*',
    'accept-language': 'en-US,en;q=0.9',
    'content-type': 'application/x-www-form-urlencoded',
    # Requests sorts cookies= alphabetically
    # 'cookie': 'COOKIE_SUPPORT=true; GUEST_LANGUAGE_ID=en_US; _ga=GA1.2.976004968.1651680892; _gid=GA1.2.1647177146.1652091069; ak_bmsc=376E16054B8CE1667585CF4B843B1281~000000000000000000000000000000~YAAQVJPIF87oQqyAAQAA5y8jrQ9DHy/4GZJUo1mSNg5U7s7R0A1ATGV+bFMIIp99MPTSGgwRJbLppQ33OtTnvp4dT1gF31OZ01N5b7SAvYbzGh6p1JHCPRkuLI7LI/yDQ/Y24KBTfsRYeTkILDOlI948yMwXay1lXdXMwVmiUOhfUV1TqPoS/kuHVjF+Pu5TYaGVoHmz2tARel9ydbLCv44P+yYkEssPPJanuEtdg3A3IYXH4SzSbaqhN+yV2OmwbYj9C4rHP3Vb1R7g2zQAKzS8Z+kwdV5Ns13EVuFPb+bVNxAKUIsnMKy7Lpxa05e+l38JktfKWtto7bBkfAzH7FyibI/6iyCvw/cghpDaE/PkXqXZDZh6GFWkVUABzngytkXRkS1aTG9VwhBJap2iJbWaVvA=; SAML_SP_SESSION_KEY=_bd42396230f077643c06f7bb75c60202169a8011748d5bf587745d054563; JSESSIONID=429078A672F540F1159490C033065E11.jvm6; _gat=1; bm_sv=237874D0F3F147A8B5E9FE30ABD61E37~YAAQVJPIFxrtQqyAAQAA/dqCrQ8Pjm4VHd954FLp0cvcoavAJFayiPFK25Q0lEeLQz4Ejuy7Q2GTzcT1DC0xhWkz2XAC6zLrqBc93TFAOG9zTjPZFqUTKfu9XplU5QowZlz76ekHhvprJpnen+rsaOPGScci0EPsUaU4LXyknJADa97lizWyy/1RpFDuSUnspML6cYGOBwVmpVs3EM13bfVQCuB7r4li7iMJ0toY6hl30+YIzwF7ESB1xrlwvl59Uvumf3j4w4UC1kw=~1; LFR_SESSION_STATE_4814701=1652178477701',
    'origin': 'https://www.discoverecentral.com',
    'referer': 'https://www.discoverecentral.com/group/merchant/my-reports',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="101", "Google Chrome";v="101"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
}
params = {
    'p_p_id': 'DiscoverMyReportPortlet_WAR_discovermyreportportlet',
    'p_p_lifecycle': '2',
    'p_p_state': 'normal',
    'p_p_mode': 'view',
    'p_p_resource_id': 'retreiveHierarchyList',
    'p_p_cacheability': 'cacheLevelPage',
}
data = '&direction=des&orderBy=Default&selectedLocalEntityId=6011&gridPageSize=5000'
response = s.post('https://www.discoverecentral.com/group/merchant/my-reports', params=params, headers=headers, data=data)
The response is a 401 Unauthorized. I know it has something to do with the structure of the data being passed into the request. Has anyone come across a similar issue?
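Since the complaint is about the structure of data, one thing worth trying (a sketch, not a confirmed fix) is passing the form fields as a dict and letting requests handle the URL-encoding; the leading '&' in the hand-built string looks like an artifact of the cURL conversion. Note also that the 'cookie' header is commented out, and a 401 often just means the session cookies are missing or stale:

data = {
    'direction': 'des',
    'orderBy': 'Default',
    'selectedLocalEntityId': '6011',
    'gridPageSize': '5000',
}
response = s.post(
    'https://www.discoverecentral.com/group/merchant/my-reports',
    params=params,
    headers=headers,
    data=data,  # requests encodes the dict as application/x-www-form-urlencoded
)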
So here is the basic login process mapped out (for when you successfully log in on the site), for a high-level understanding. Below it is my code, which only gets me the first cookie, when I need the second cookie to access the JSON file. I have working code that currently gets me the file, but the second cookie expires after a week, so it is not sustainable.
Step 1 - Purpose: Get first cookie that will be "c1" below
Request URL: https://www.fakesite.com/sign-in.html
Request Method: GET
Status Code: 200
Step 2 - Purpose: Use "c1" to make a POST request to obtain second cookie "c2" <<<<-Can't get past here
Request URL: https://www.fakesite.com/sign-in.html
Request Method: POST
Status Code: 302
Step 3 - Purpose: Use "c2" and auth_token (doesn't change) to GET the json file I need <<<I have working code when manually getting 7-day cookie
Request URL: https://www.fakesite.com/ap1/file.json
Request Method: GET
Status Code: 200
import requests
headers = {
    'authority': 'www.fakesite.com',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="98", "Google Chrome";v="98"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'referer': 'https://www.fakesite.com/goodbye.html?service=logout',
    'accept-language': 'en-US,en;q=0.9',
}
s = requests.Session()
r1 = s.get('https://www.fakesite.com/sign-in.html', headers=headers)
c1 = s.cookies['fakesite']
print(r1.status_code)
print(c1)
c2 = 'fakesite=' + c1 + '; language=en'
headers = {
    'authority': 'www.fakesite.com',
    'cache-control': 'max-age=0',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="98", "Google Chrome";v="98"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'upgrade-insecure-requests': '1',
    'origin': 'https://www.fakesite.com',
    'content-type': 'application/x-www-form-urlencoded',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'referer': 'https://www.fakesite.com/sign-in.html',
    'accept-language': 'en-US,en;q=0.9',
    'cookie': c2,
}
data = {
    'csrf_token': '12345=',
    'username': 'userme',
    'password': 'passwordme',
    'returnUrl': '/sign-in.html',
    'service': 'login',
    'Login': 'Login',
}
r2 = requests.post('https://www.fakesite.com/sign-in.html', headers=headers, data=data)
c3 = s.cookies['fakesite']
print(r2.status_code)
print(c3)
OUTPUT:
200
151f3e3ba82030e5b7c03bc310ed5ad5
200
151f3e3ba82030e5b7c03bc310ed5ad5
My code just returns the first cookie over again when I try to print all of them. I feel like I have tried everything, to no avail. When I try to go to the last URL while logging in directly, I get a 401 because I need the first cookie to get the second cookie first.
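One detail worth flagging: r2 is sent with requests.post rather than s.post, so it bypasses the session entirely, and any cookie set by the 302 response never lands in s.cookies. A minimal sketch of the session-based version (same headers and data as above):

r2 = s.post('https://www.fakesite.com/sign-in.html', headers=headers, data=data,
            allow_redirects=False)  # keep the 302 so its Set-Cookie is easy to inspect
print(r2.status_code)        # expect 302 on a successful login
print(s.cookies.get_dict())  # the second cookie should now appear in the session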
I'm trying to get into papara.com using Python. When I make a request, it always returns a 403 response. I got the cookies from my browser. Here is my code:
import requests
headers = {
    'authority': 'www.papara.com',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'en-US,en;q=0.9',
    'cookie': '__cfruid=64370d0d06d80a1e1a701ae8bee5a4b85c1de1af-1610296629',
}
response = requests.get('https://www.papara.com/', headers=headers)
I tried different user agents and tried removing the cookie from the headers, but it didn't work.
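The __cfruid cookie here also points to Cloudflare, so this is likely the same bot-protection problem as in the posts above, and no combination of copied headers will get a plain requests call through. As with the lexica.art question, driving a real browser is one way around it; a minimal sketch (assuming Chrome and a matching chromedriver are installed):

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.papara.com/')
html = driver.page_source  # the rendered page, fetched with a real browser fingerprint
driver.quit()
print(html[:500])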