I am trying to download the file from here.
In Firefox, the link redirects me to a webpage where I have to log in with my credentials. After doing so, Firefox automatically prompts me to save the desired file with a pop-up windows.
I am trying to replicate that behavior with Python without success.
I've tried the code from this question, namely
import requests
session = requests.session()
url = "https://metamap.nlm.nih.gov/download/public_mm_linux_main_2020.tar.bz2"
data= {'username': 'xxxxx', 'password':'xxxxxx'}
session.post(url, data=data)
response = requests.get("https://metamap.nlm.nih.gov/download/public_mm_linux_main_2020.tar.bz2")
print(response.text)
Insights from Firefox
Using the Network tab from the Developer Mode, I have been able to see the following:
The POST request, apart from username and password sends also an execution parameter, which seems to be very long hashed string. I don't know what this is or how to replicate this.
POST /cas/login?service=https%3a%2f%2fmetamap.nlm.nih.gov%2fdownload%2fpublic_mm_linux_main_2020.tar.bz2 HTTP/1.1
Host: utslogin.nlm.nih.gov
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Content-Type: application/x-www-form-urlencoded
Content-Length: 4900
Origin: https://utslogin.nlm.nih.gov
Connection: keep-alive
Referer: https://utslogin.nlm.nih.gov/cas/login?service=https%3a%2f%2fmetamap.nlm.nih.gov%2fdownload%2fpublic_mm_linux_main_2020.tar.bz2
Cookie: TGC=exxxxxx; JSESSIONID=xxxxx
Upgrade-Insecure-Requests: 1
Afterwards, there is a GET request to the same address with a ticket parameter.
GET /download/public_mm_linux_main_2020.tar.bz2?ticket=xxxxxx HTTP/1.1
Host: metamap.nlm.nih.gov
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://utslogin.nlm.nih.gov/cas/login?service=https%3a%2f%2fmetamap.nlm.nih.gov%2fdownload%2fpublic_mm_linux_main_2020.tar.bz2
Connection: keep-alive
Cookie: MOD_AUTH_CAS_S=xxxxxx
Upgrade-Insecure-Requests: 1
Finally, another GET triggers the download.
GET /download/public_mm_linux_main_2020.tar.bz2 HTTP/1.1
Host: metamap.nlm.nih.gov
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://utslogin.nlm.nih.gov/cas/login?service=https%3a%2f%2fmetamap.nlm.nih.gov%2fdownload%2fpublic_mm_linux_main_2020.tar.bz2
Connection: keep-alive
Cookie: MOD_AUTH_CAS_S=xxxxx
Upgrade-Insecure-Requests: 1
In between all of this, cookies are set and read.
I don't know what information is useful. I'm putting here all that I can see.
Than you in advanced.
EDIT:
After more exploration I am trying to replicate the 3 request pattern seen in Mozilla/Chrome.
This is my code now:
import requests
url_1 = "https://utslogin.nlm.nih.gov/cas/login"
# url_2 = "https://metamap.nlm.nih.gov/download/public_mm_linux_main_2020.tar.bz2"
params = {'service': 'https%3a%2f%2fmetamap.nlm.nih.gov%2fdownload%2fpublic_mm_linux_main_2020.tar.bz2'}
data = {'username': 'xxxxx',
'password':'xxxxx',
'execution': '1cb37c1a-141b-4891-b961-ea5c4b20deed_ZXlKaGJHY2lPaUpJVXpVeE1pSjkuWXpWb1EzaFdTMmxOZGxsdUwwUlFiWEozSzBkSlVuUXhURlpuZEROSmVpdGtObFpVYjFkU01FMWlhMjkxS3lzNWVrbHlNblptTWt4MVdrdE9SU3RJTmsxTGNISmhNblYxTmtOVVExWkJPWFJZUkVKbFREWk1WV2RaZEhSSU5qbHNNbTlZVldONGMyTndNWGgxVGtWalZtcHRkRWMxZGxaSGNsVmtTVmh6UzJSWmFEQjBVRmRMUlVSalRsTk5WVVYxYlU4elRHUjNZV0pSWVhCblVsQXdjemRhZW5ac1NHeDNhMU5pYjA5UlVYTk1ZWHB1UWtkeGJtUTFRa1V5T0RseVNWcHlSa3RZY2pGS09FeFNjekZNY21KVmFsaE1la3RKZFdsd1pGaEhNVmhIY3pJMU4zRkZVVUZrYTA1aE5IZGljMlZZWTA5Q1kyNXFTbTlUUkRsR1UxcHdVa2wyUlhaU1dsbFpaRU5yVFhwQmQwOWlOVTh4WTNaVmNIWXJaalJqUmxGMU4xSmxVVWx3VjNwR1ZrdDROamhaTjI5UlJXTjVXbXg2YVdGbE5HTlpkM0JxUW5kVWJUaHVXVGx6V0VKek5VRmpjV2c0TmpsSlUydHJRWEZUVVRaeWNEUTNSVko2ZGtnclltZG5iRkY2VW5oQmMwcFVSRXAzV0c5eVJqSTVNSEJXTVhOSWNpczFMMVp3VDFWRlJsUnJlSE5ETlVaS2VYQjJPV2hWU2pOVVZ6RlBMM0pDYzBOWlQzbEJkV3g1TWtSYWMwaEhjR0ZSUjI5NGFIZGpha2RJTkVzeVRGaFRjMnh0UldKSFdVWm9halF6YUhKa04wUnNabmQ2U1ZaUE1tWkRUR2RPTlhORVJHOXhNbWR5WWxCWldFUkJjMmQyY1dGamVWTjZZMWRCUTBOclRXOW9XWGRpUkVKS09XZGlSVzl1WVZOQlpUVjZZamhSVEhKdGEyYzNUM0pxYVhCdmJUTmxRbWMwVXpJMGIycE5WRXBXZVhKV2NWRTJNSEJ6UTNobFVUZzJOa1ZMWlhkMGJGQjBRbVIwUjBGNlMyNURPVEZ6UWtWS1RHbGhWRzR4Ulc0MU0ybHRaRVIzVm1Ga1ZGVTVVemQ0UVd4Q2JtUndTMEZ1Tmtac1l6SmtWRVpxYUhKTlRFVnljMnhCV1VkMk5YaEZkbEpqZDBoSUwzQkZNVlJVYTFkUmFIUjBhMXBwTVZSdWRXcFdkR1EzYkhoQ2JXZDFjekZyV1hGbVVVaDBRV1JWTlZwQk1HSkRSV3BSWmxwSVNDOVNRbTR4V1dkNmJXNVlWRzl6VWxKWlRHUkdTV2h4TUhkVWRrMW9XR3AyVEdGRVZYRkhiMnB1Y2pSemFXNVBRa3h1Y0VGTUszWk9TekpYUjFsRVNYSkdOSFJ3ZVhwT1dqUldVMFozWVRoRVdIRkRZbmxEVGxSdWRqaE1SQzg0V2swdmJYcGpNRmxoUkVObllsVmtNVWRxYmpSM01VUlhURlpWY2t0VVJGRTNhMVZ4WlhSa2JVTnlWVVU0U2pWd2VDOWxZVFJZVmpoU1ZqWkxNVlJDVVUxS1lsUlJZM1k1ZWtwNmNqQTBZbFJXVUhaNmQxUnNkSFpuTWxsQmMxSkpjbWhOZW1SQ1ZEVllWeXR5ZFd0WFRsRkNOVkZNV2xweFRFVlpRelpSV0VoUWIyWlJZMjFSU1hScFJuQkhlRVI0Um5aV1l6RnJXazlpUlRjcmVFeE5ORWQwTTAxWVNYbGtTVzVLYnpsVFdXOUJhVXB5WWpGek1ERjZkMGxsZEVVMWRWTnVXbFprTjIwMFQza3lNWGgxUkhFME1FVlVLMXBqUzJWM1NEaFFPVFkwVEVwR2RtbElVelUzUVhJM1NHMVphMUJOVjJsUGFuZHJWRzVvT0hWdkwzVnVOVnBDV2tGSmJ6UkVWSFUwV2lzd1MwRXhSazlaUW5ZNE9EbFBkekZLZWpWa2RqZ3JUekEwT0ZaeGEzVmthamw0YWpSU09EUmFPRFIzT1VoUGN6QlpOVTlQZDFFMVdtWTJRMjFRYTFSUUx6aGhZMGxETTJsUVZFcHpMeXM0ZEVWS1MzaFdiRnBTU1RaU1dYYzJZWGR0UWpaUldsSXJVV2N3YVRZNWNIbEJSbUZGV2k4eFkwNTFUakUzVm5oWGFWbDVla1JhTlhKeVQzRkRSM05hTUVST1EyNUNTVkJFVkhKMldIRnpha3huVlZSNmQwTkdWVGN4YVhRMVFURldXUzlXUjB4a1VIUndZMFJSTTFobEwzSjZMemc1SzNoVGJGUjNkV3A2ZEd3elZtZFpPRVV2YVhaMlJUVXJkREJxWmtsSlZIRnhNa050ZGpSYUwwZEpVV3BXVVdodFFYbG5SazVOZUZCT2IycENVelkyT1dGRWIwUXlXblZFVFdVMmRFRlhXaXN5TWpVekwyZzRWSHBWZEdWU1pDczVURlI2Y0haYVRubzJXblpJYWl0Qk9Dc3djMlZIU0VJd2EwaFpNVWwxZEV4NE1ESnBWVFJOTUVSNVRXTk9Za3hDYjBwRWNtNUhVVzlSYjBoNk5rOXllSEpZYW5oUVdXbzRkeTkzYjJRM1dqSnRhekZNTVZKaGMyTjRXRGQ2V25sMFZpOVhVakUwTDBZM00yMTZVMDV0UTJRMFUweHBWbmhCUW14ME1ETmlWVGs0VHpaRFpIcGpjMk5xWVhWMGVtbzBTVmhpTTNWeVRHUnJSbmt3TVV4RlVVWkNha2Q1VkVkVWFuTnFOblpuTm1KTWNqZFRiVVZHTmxoTlYwUk5TbG8wZUdKTVNWbG9OU3NyVFRKQ2VGQktTek5hV0RKbVRraHdUbVY2YVdoUk9HMXNPVk42VVRsaGRuYzNaVlZoUzNkaFJHdEpZMjRyUkhwVGIzRjFTVlpZZWxGNllUZENiMVUwU1hOQlVteEtkVkpSZDFoak1TdExjR1JXV0dwWmFYbGtMMU5hYTBjMk5sSTFOSGRoTTBaVGFrZzVia1JOTmpWV0wzQnhaalZVS3pGNWJITm9SV1puVmprMFNIUm1halozWW10Q1VXRldTMGh5Wld4MFMzQkdUMFIyTmxCVmFUZEVLemhTTDNGVWVuTnNNa2hzUWpOUVlrZHdhSGRRVUZadGJWTnNPSFJHYVhoQ1RIVk9iMkZIUVVSdE1WSXpWM0pRZWt0aGMzcExhbWwxUWxNMFJWWXhjSE5uVVRZMVJsQkhXbEJYZDNORFVVaHZRM00wTkRaS1lsSkhaRll6UlhsS09IUmFNMHMxYkhwTk1XYzVSak5yYmpVclVqZzNWRGhNZEVwb1ZEQlRiMWN4VkZkdU1VTlplVmgzTlVRMllYUkpRbEJ5T0N0NVVYZDRhVkpsU1ZsTmMwaHpRelFyVFRBNU5GUnhVVWRUTTJjeFNWZEdTa0Z2YjJsTVFreHFXRTUwYkhveVZtTmhTSE5DVVdaTU1rTkhSR1JVWkRCMkt6RnRhbkpPVVhOeVVFMUZaMUpQVFhZMVJrUm9XVlZpSzA1SFUwMWhlVlJNV2taRGNqTmFVMFZpTkdsT1dVVjJUVzR5Y2pkTk4zQnFhMFZOWjJoSFpsaElXbGxVVjFCalJraGFaak5RYjA4eFJVcDBURUpUYURaeWJXbGlTek4xVG1aUVNYSlJaVEU1V213Mk1qUTJia3RTV1dvemVEVkZNVWhZYVdJMFVtUkJTWGxNWmsxS09WRmxRWFJ3VG1zdmVDdHlNaTlTWnpKSGRrZHRTMWhyV2xKbFUyczVSWFJSVTNaa1JrRlRRMnRFZUZFdlVXRkhOVGhpT0RGaUswTjFRbGd3ZEVSTmJsTkxiVVV4YVdGVVMwaFBLMXBpV2pWa1NXSjZOM0J2YlRoclRubHpOMFJ2ZVZWbk9ITnBlREZKWWtNckx6SnBkRmhtWkhGa05HZ3dhVEkzVTNwWFJFOXpWMnRqZURkSUsyZEtTRGxSYzNKeGMyWmxZbEJOU1hnekwwY3pWbmRNV0hScmMyVk5ZVzAzYW5nemQwWndSbVZFYTA5YWIzWndZVVpJVGk5TVZHVk9lRlIwU2poUlpWbDFaVTFaY1dGbGEzWkROVmxUWjNoa2JVMURPVkpWWm1WRlJYb3JiWEpaYkU4NFYzb3JkRTlYUWxOQlZFaFViMHhoVEhKVGQwMU5PVEpHV1dFNFJGb3hWM04zY214SGVGcFNlbEZ2VTFWNEsyaDNLMEpITDJaMGEycGxhM0pEZW0xTWMzUm9PSE4zUjJaV1FrcHRSVk4wVkc1M2FFRmxSbVJUVFZWelpFaEdXSEZyVkhkMU9HWmlabkZKWlhsdU16RlpOREpQYTJabldGaEtlRzR2WjNVM2JUWkJQVDAuSXFFcjcxblloaURBYjl6QUtnN2hRVVNoaE9QbVZQWlhnT1NZQ0lVN3E3NWw0TGtHS2x4OWhDbzVkMVNRQlBnaVRfbkQtemVEdHpyS0F4NldvV0JLZ1E=',
'submit': 'LOGIN'}
r = requests.get(url_1, params = params)
print(r.headers)
print(r.cookies)
print(r.status_code)
headers = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Accept-Encoding': 'gzip, deflate, br',
'Content-Type': 'application/x-www-form-urlencoded',
'Sec-Fetch-Dest': 'document',
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-Site': 'same-origin',
'Sec-Fetch-User': '?1',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36',
'Upgrade-Insecure-Requests': 1,
'Referer': 'https://utslogin.nlm.nih.gov/cas/login?service=https%3a%2f%2fmetamap.nlm.nih.gov%2fdownload%2fpublic_mm_linux_main_2020.tar.bz2',
'Origin': 'https://utslogin.nlm.nih.gov',
'Host': 'utslogin.nlm.nih.gov',
'Connection': 'keep-alive',
'Cache-Control': 'max-age=0',
'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8,es-CO;q=0.7,es-AR;q=0.6,es;q=0.5,it;q=0.',
'Content-Length': 4634}
r_2 = requests.post(url_1, data=data, cookies=r.cookies, params = params)
print("-----------------")
print(r_2.headers)
print(r_2.cookies)
print(r_2.status_code)
execution and submit are parameters also passed during the POST request. With the first GET I emulate the normal loading of the page, after which a cookie is given to me. Using that cookie and submitting the form, I expect to get back another cookie and a ticket with which to download the file. However, instead of getting this, what I see in output is the response header of another GET as is anything had happened.
By the way, to whomever wants to help me: the credentials from that site are acquired prior by registering and some hours may pass between registration and credential arrival.
Related
I'm trying to login to a specific site in Python, but without success. Any helpers?
import requests
url = 'https://us0.forgeofempires.com/page/'
user_agent = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0'
session = requests.Session()
session.headers['User-Agent'] = user_agent
_xsrf = session.cookies.get('_xsrf', domain='.forgeofempires.com')
login = session.post(url, {
'login[userid]':'user',
'login[password]':'pass',
'_xsrf':_xsrf
})
print(login)
f = open('results.html', 'w', encoding = 'utf-8')
f.write(login.text)
f.close()
Update
The following code finally works. I had to figure out by trial and error the fewest headers that needed to be set to get this to work. And, of course, as I previously stated, it is necessary to first visit the URL https://us.forgeofempires.com/glps/iframe-login to have the necessary cookies downloaded before you can proceed with sending the login POST (it is also necessary to extract from the cookies the 'XRRF-TOKEN' value and use this to set the 'x-xsrf-token' header value). And in my earlier attempt I was erroneously sending up the form encoded as JSON.
Note that the output is in JSON format; it is not HTML. Your output file should probably have the file extension of .json.
import requests
initial_url = 'https://us.forgeofempires.com/glps/iframe-login'
login_url = 'https://us.forgeofempires.com/glps/login_check'
session = requests.Session()
user_agent = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0'
session.headers['User-Agent'] = user_agent
r = session.get(initial_url)
xsrf_token = session.cookies.get('XSRF-TOKEN', domain='us.forgeofempires.com')
session.headers['x-xsrf-token'] = xsrf_token
session.headers['x-requested-with'] = 'XMLHttpRequest'
login = session.post(login_url, data={
'login[userid]': 'HTWyou',
'login[password]': 'kjh&*76dfgG',
'login[remember_me]': False
})
#f = open('results.html', 'w', encoding = 'utf-8')
f = open('results.json', 'w', encoding='utf-8')
f.write(login.text)
f.close()
"""
# code to print the headers:
hdrs = login.request.headers
for k in sorted(hdrs.keys()):
print(f'{k}: {hdrs[k]}')
"""
print(login.json())
Prints:
{'success': True, 'url': '/', 'player_id': 851846598}
The headers being passed up by the login XHR Request
:authority: us.forgeofempires.com
:method: POST
:path: /glps/login_check
:scheme: https
accept: application/json, text/plain, */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
content-length: 74
content-type: application/x-www-form-urlencoded; charset=UTF-8
cookie: device_view=full; metricsUvId=da9ca1f1-8e32-411f-ac7d-bf389df25469; portal_tid=1627319201921-36801; portal_ref_session=1; portal_ref_url=https://us0.forgeofempires.com/; portal_data=portal_tid=1627319201921-36801&portal_ref_session=1&portal_ref_url=https://us0.forgeofempires.com/; PHPSESSID=54955p3mbbstetnnquivk12rk2r0ihqmk41nagl620r5g9p5; XSRF-TOKEN=mQxleEoXU2Hyz6LP1ZlM7huHE-DLt5pg8qySl-M1vtI
dnt: 1
origin: https://us.forgeofempires.com
referer: https://us.forgeofempires.com/glps/iframe-login
sec-ch-ua: "Chromium";v="92", " Not A;Brand";v="99", "Google Chrome";v="92"
sec-ch-ua-mobile: ?0
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36
x-requested-with: XMLHttpRequest
x-xsrf-token: mQxleEoXU2Hyz6LP1ZlM7huHE-DLt5pg8qySl-M1vtI
The headers being passed up by session.post
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 89
Content-Type: application/x-www-form-urlencoded
Cookie: PHPSESSID=86lis3h0s8291fgkto5r2lcnqr7a6c22nnotveeo2iuuusht; XSRF-TOKEN=Qxm6zfZrDqaI8ce7MDQ5rQqpHHOUozKaFu-NL2Av9KU; device_view=full
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0
x-requested-with: XMLHttpRequest
x-xsrf-token: Qxm6zfZrDqaI8ce7MDQ5rQqpHHOUozKaFu-NL2Av9KU
How can I get around the 403 code?
I was trying to get search result from a website, however I got "Response[403]" message, I've found similar get solving 403 error by adding headers to request.get, however it didn't work for my problem. What should I do to correctly get the result I want?
Google Chrome Request
Request
Request URL: http://178.32.46.165/
Request Method: GET
Status Code: 200 OK
Remote Address: 178.32.46.165:80
Referrer Policy: strict-origin-when-cross-origin
Request Headers
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: tr-TR,tr;q=0.9
Cache-Control: max-age=0
Connection: keep-alive
Host: 178.32.46.165
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36
Python Request
import requests
headers = {
'Connection': 'keep-alive',
'Cache-Control': 'max-age=0',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Accept-Language': 'tr-TR,tr;q=0.9',
}
response = requests.get('http://178.32.46.165/', headers=headers, verify=False)
print(response)
Response
<Response [403]>
If you're getting a 403 response, you're likely not meant to be able to access this resource. This is not a Python issue, this is an issue with whatever website you're trying to access.
Additionally, this seems to be a temporary issue. I pasted your code as-is into my Python prompt and received <Response [200]>
I think it may provide blocking depending on the country.
Please help. I'm trying to create a POST request on an .asp site that requires cookies, but the way I handle them seems not to return anything. Read through some questions of similar topic but can't find the _SessionID cookie some are referring to. Please help me formulate this POST request so it works.
Headers
:authority: safer.fmcsa.dot.gov
:method: POST
:path: /query.asp
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
cache-control: max-age=0
content-length: 85
content-type: application/x-www-form-urlencoded
cookie: ASP.NET_SessionId=ywxszihqlu1yciwe5z5gm4qt; etype=au; ASPSESSIONIDQECTCDRB=KGFOBHBBLKCBKBFIAPEBMIHJ; ASPSESSIONIDQGARCDRA=LKEDBAOBMOMDNGBNBFEMMIPB; ASPSESSIONIDSEBQCBSB=DAMJMNKCNJKHCMDCIJBPKEHD; ASPSESSIONIDCEQRADQC=EIEJCDLBHHCCKHCNJNIMHDKA; ASPSESSIONIDAESTCBQC=KPDPJNHCLOBJENEHPNIFKJLH; LI_carrier=67449; ASPSESSIONIDAGSQADRC=CPIBAKEDDPNFCIPLIGLOKKLA; ASPSESSIONIDAERTDBQD=FMKFHJJAJKNIGCBCCFJFCMNF; AWSALB=Xc7OAuZUmx6vgE5l9NaawsH8oBWjy6eZ3B62kw2rZ5HieoRlMu4SSmVVcaPJPcPjp1fVt9U/T9FaRflgNHwtzmsK4X4e+y+yoGArTfgpb75NWo/ilAek0Qk/sFYI; AWSALBCORS=Xc7OAuZUmx6vgE5l9NaawsH8oBWjy6eZ3B62kw2rZ5HieoRlMu4SSmVVcaPJPcPjp1fVt9U/T9FaRflgNHwtzmsK4X4e+y+yoGArTfgpb75NWo/ilAek0Qk/sFYI
origin: https://safer.fmcsa.dot.gov
referer: https://safer.fmcsa.dot.gov/CompanySnapshot.aspx
sec-fetch-dest: document
sec-fetch-mode: navigate
sec-fetch-site: same-origin
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36
Form Data
searchtype: ANY
query_type: queryCarrierSnapshot
query_param: USDOT
query_string: 2300842
My Code So Far
def checkDOT():
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Accept-Language': 'en-US,en;q=0.9',
'Cache-Control': 'no-cache,no-store,must-revalidate,max-age=0,private',
'upgrade-insecure-requests': '1',
'Connection': 'keep-alive',
'origin': 'https://safer.fmcsa.dot.gov',
'referer': 'https://safer.fmcsa.dot.gov/CompanySnapshot.aspx'
}
s = requests.Session()
data = {
'searchtype': 'ANY',
'query_type': 'queryCarrierSnapshot',
'query_param': 'USDOT',
'query_string': '2300842'
}
params = (
('pageNumber', '0'),
('itemsPerPage', '15'),
)
url = 'https://safer.fmcsa.dot.gov/CompanySnapshot.aspx'
response = s.get(url, headers=headers, data=data, params=params)
if response:
print(response.content)
else:
print("This did not work")
get requests dont use data parameter, and your code is requests.get,is that right?
I can get a html page with:
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0',
}
url = 'https://safer.fmcsa.dot.gov/CompanySnapshot.aspx'
response = requests.get(url, headers=headers, verify=False)
print(response.text)
I went to this website
www4.fmovies.to
then I clicked a movie and checked its CDN URL via Inspect->Network
and got below details
https://cdn.mcloud.to/stream/sf:i0:q2:h3:p23:l1/LR6ljfLn3hrEjSfrOp19wg/1542603600/i/f/2/nr69r8/hls/480/480-0013.ts
:authority: cdn.mcloud.to
:method: GET
:path: /stream/sf:i0:q2:h3:p23:l1/LR6ljfLn3hrEjSfrOp19wg/1542603600/i/f/2/nr69r8/hls/480/480-0001.ts
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
cookie: __cfduid=d0847f9ac6d9a8da1dd131d1a0a91ea991542533053; _ga=GA1.2.485859786.1542533055; _gid=GA1.2.1916946057.1542533055; _gat=1
origin: https://mcloud.to
referer: https://mcloud.to/embed/#P#O8SE2916SEOA5?sub.file=https%253A%252F%252Fstatic1.akacdn.ru%252Fsubtitle%252F40039.vtt%253Fv1&ui=oAhi567w9OQEhJWEdbl0s%40Ep0Ir2VvG1xiK9JqKx
user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
created header information using the above information and then ran
request = requests.get(url, headers=headers)
But am getting 403 Not Authorized. What is the issue?
You need to pass referer header that is the src attribute of video content iframe that looks like
<iframe src="https://mcloud.to/embed/#9#4ZS04Z10SWOE5?ui=pwxi4Kjr6%40wHmIqHcrl0yeFfpYqUUIW1wCKlJr6x" allow="autoplay; fullscreen" scrolling="no" allowfullscreen="yes" style="width: 100%; height: 100%;" frameborder="no"></iframe>
The code looks like
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0)Gecko/20100101 Firefox/60.', 'pragma': 'no-cache', 'connection': 'keep-alive', 'cache-control': 'no-cache', 'referer': 'https://mcloud.to/embed/#9#4ZS04Z10SWOE5?ui=pwxi4Kjr6%40wHmIqHcrl0yeFfpYqUUIW1wCKlJr6x'}
requests.get('https://cdn.mcloud.to/stream/sf:i0:q2:h2:p24:l1/WjLDZuCBHmtyv63lT-RoVQ/1542603600/g/c/0/rj0m0m/hls/480/480-0000.ts', headers=headers)
Here is what my XHR data looks like when captured in chrome
Request Header
POST my_url?X-Progress-ID=ee821652321919bc7ae61fbe0b625990&userpkgname=file_name.tar.gz HTTP/1.1
Host: 10.110.134.28
Connection: keep-alive
Content-Length: 17461
Accept: application/json, text/plain, */*
Origin: https://10.110.134.28
X-XSRF-TOKEN: bGIwdfFE-oaL_1yVrCzw0iHvv4yUHLC28xjw
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36
Content-Type: multipart/form-data; boundary=----WebKitFormBoundarybfK9jSdLoc2Mpj0i
Referer: https://10.110.134.28/
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.8
Cookie: XSRF-TOKEN=bGIwdfFE-oaL_1yVrCzw0iHvv4yUHLC28xjw; sid=s%3AXglfJNLQ9zzp3eHjQ2QOpk19kFKDDMvJ.ZMKyZd1Gx13lz2MnJgty5WncnilySzfoThGktkhlk4w
Payload
------WebKitFormBoundarybfK9jSdLoc2Mpj0i
Content-Disposition: form-data; name="package"; filename="file_name.tar.gz"
Content-Type: application/x-gzip
------WebKitFormBoundarybfK9jSdLoc2Mpj0i--
And this is how I am building my request.
files = {'package': (<file_name>, open(config_path, 'rb'), 'application/x-gzip')}
request.post(url, files=files)
This is how my request header looks like
{
'Content-Length' : '17449',
'Accept-Encoding' : 'gzip, deflate',
'Accept' : '*/*',
'User-Agent' : 'python-requests/2.10.0',
'Connection' : 'keep-alive',
'Cookie' : 'XSRF-TOKEN=zmklLEL0-gDJOfNBk113MuTpBkLo0j6MAzw0; sid=s%3AC2JZDCfpg_CgkU7qSlS5YTvWXwpgMX35.5nU7W02TPNYtMkIQ4W%2B1bjd87A7KyJbh3shoNqqADXE',
'Content-Type' : 'multipart/form-data; boundary=270d9e02bf214dc7a09c3081cba5b0e0',
'XSRF-TOKEN' : 'zmklLEL0-gDJOfNBk113MuTpBkLo0j6MAzw0'
}
When I make the request I get 502 bad gateway response that too after few seconds while on chrome I get 200 OK instantly
So most probably I am not building my request correctly. Any suggestions?