Download file after POST form submission - python

I am trying to download the file from here.
In Firefox, the link redirects me to a webpage where I have to log in with my credentials. After doing so, Firefox automatically prompts me to save the desired file with a pop-up windows.
I am trying to replicate that behavior with Python without success.
I've tried the code from this question, namely
import requests
session = requests.session()
url = "https://metamap.nlm.nih.gov/download/public_mm_linux_main_2020.tar.bz2"
data= {'username': 'xxxxx', 'password':'xxxxxx'}
session.post(url, data=data)
response = requests.get("https://metamap.nlm.nih.gov/download/public_mm_linux_main_2020.tar.bz2")
print(response.text)
Insights from Firefox
Using the Network tab from the Developer Mode, I have been able to see the following:
The POST request, apart from username and password sends also an execution parameter, which seems to be very long hashed string. I don't know what this is or how to replicate this.
POST /cas/login?service=https%3a%2f%2fmetamap.nlm.nih.gov%2fdownload%2fpublic_mm_linux_main_2020.tar.bz2 HTTP/1.1
Host: utslogin.nlm.nih.gov
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Content-Type: application/x-www-form-urlencoded
Content-Length: 4900
Origin: https://utslogin.nlm.nih.gov
Connection: keep-alive
Referer: https://utslogin.nlm.nih.gov/cas/login?service=https%3a%2f%2fmetamap.nlm.nih.gov%2fdownload%2fpublic_mm_linux_main_2020.tar.bz2
Cookie: TGC=exxxxxx; JSESSIONID=xxxxx
Upgrade-Insecure-Requests: 1
Afterwards, there is a GET request to the same address with a ticket parameter.
GET /download/public_mm_linux_main_2020.tar.bz2?ticket=xxxxxx HTTP/1.1
Host: metamap.nlm.nih.gov
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://utslogin.nlm.nih.gov/cas/login?service=https%3a%2f%2fmetamap.nlm.nih.gov%2fdownload%2fpublic_mm_linux_main_2020.tar.bz2
Connection: keep-alive
Cookie: MOD_AUTH_CAS_S=xxxxxx
Upgrade-Insecure-Requests: 1
Finally, another GET triggers the download.
GET /download/public_mm_linux_main_2020.tar.bz2 HTTP/1.1
Host: metamap.nlm.nih.gov
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://utslogin.nlm.nih.gov/cas/login?service=https%3a%2f%2fmetamap.nlm.nih.gov%2fdownload%2fpublic_mm_linux_main_2020.tar.bz2
Connection: keep-alive
Cookie: MOD_AUTH_CAS_S=xxxxx
Upgrade-Insecure-Requests: 1
In between all of this, cookies are set and read.
I don't know what information is useful. I'm putting here all that I can see.
Than you in advanced.
EDIT:
After more exploration I am trying to replicate the 3 request pattern seen in Mozilla/Chrome.
This is my code now:
import requests
url_1 = "https://utslogin.nlm.nih.gov/cas/login"
# url_2 = "https://metamap.nlm.nih.gov/download/public_mm_linux_main_2020.tar.bz2"
params = {'service': 'https%3a%2f%2fmetamap.nlm.nih.gov%2fdownload%2fpublic_mm_linux_main_2020.tar.bz2'}
data = {'username': 'xxxxx',
'password':'xxxxx',
'execution': '1cb37c1a-141b-4891-b961-ea5c4b20deed_ZXlKaGJHY2lPaUpJVXpVeE1pSjkuWXpWb1EzaFdTMmxOZGxsdUwwUlFiWEozSzBkSlVuUXhURlpuZEROSmVpdGtObFpVYjFkU01FMWlhMjkxS3lzNWVrbHlNblptTWt4MVdrdE9SU3RJTmsxTGNISmhNblYxTmtOVVExWkJPWFJZUkVKbFREWk1WV2RaZEhSSU5qbHNNbTlZVldONGMyTndNWGgxVGtWalZtcHRkRWMxZGxaSGNsVmtTVmh6UzJSWmFEQjBVRmRMUlVSalRsTk5WVVYxYlU4elRHUjNZV0pSWVhCblVsQXdjemRhZW5ac1NHeDNhMU5pYjA5UlVYTk1ZWHB1UWtkeGJtUTFRa1V5T0RseVNWcHlSa3RZY2pGS09FeFNjekZNY21KVmFsaE1la3RKZFdsd1pGaEhNVmhIY3pJMU4zRkZVVUZrYTA1aE5IZGljMlZZWTA5Q1kyNXFTbTlUUkRsR1UxcHdVa2wyUlhaU1dsbFpaRU5yVFhwQmQwOWlOVTh4WTNaVmNIWXJaalJqUmxGMU4xSmxVVWx3VjNwR1ZrdDROamhaTjI5UlJXTjVXbXg2YVdGbE5HTlpkM0JxUW5kVWJUaHVXVGx6V0VKek5VRmpjV2c0TmpsSlUydHJRWEZUVVRaeWNEUTNSVko2ZGtnclltZG5iRkY2VW5oQmMwcFVSRXAzV0c5eVJqSTVNSEJXTVhOSWNpczFMMVp3VDFWRlJsUnJlSE5ETlVaS2VYQjJPV2hWU2pOVVZ6RlBMM0pDYzBOWlQzbEJkV3g1TWtSYWMwaEhjR0ZSUjI5NGFIZGpha2RJTkVzeVRGaFRjMnh0UldKSFdVWm9halF6YUhKa04wUnNabmQ2U1ZaUE1tWkRUR2RPTlhORVJHOXhNbWR5WWxCWldFUkJjMmQyY1dGamVWTjZZMWRCUTBOclRXOW9XWGRpUkVKS09XZGlSVzl1WVZOQlpUVjZZamhSVEhKdGEyYzNUM0pxYVhCdmJUTmxRbWMwVXpJMGIycE5WRXBXZVhKV2NWRTJNSEJ6UTNobFVUZzJOa1ZMWlhkMGJGQjBRbVIwUjBGNlMyNURPVEZ6UWtWS1RHbGhWRzR4Ulc0MU0ybHRaRVIzVm1Ga1ZGVTVVemQ0UVd4Q2JtUndTMEZ1Tmtac1l6SmtWRVpxYUhKTlRFVnljMnhCV1VkMk5YaEZkbEpqZDBoSUwzQkZNVlJVYTFkUmFIUjBhMXBwTVZSdWRXcFdkR1EzYkhoQ2JXZDFjekZyV1hGbVVVaDBRV1JWTlZwQk1HSkRSV3BSWmxwSVNDOVNRbTR4V1dkNmJXNVlWRzl6VWxKWlRHUkdTV2h4TUhkVWRrMW9XR3AyVEdGRVZYRkhiMnB1Y2pSemFXNVBRa3h1Y0VGTUszWk9TekpYUjFsRVNYSkdOSFJ3ZVhwT1dqUldVMFozWVRoRVdIRkRZbmxEVGxSdWRqaE1SQzg0V2swdmJYcGpNRmxoUkVObllsVmtNVWRxYmpSM01VUlhURlpWY2t0VVJGRTNhMVZ4WlhSa2JVTnlWVVU0U2pWd2VDOWxZVFJZVmpoU1ZqWkxNVlJDVVUxS1lsUlJZM1k1ZWtwNmNqQTBZbFJXVUhaNmQxUnNkSFpuTWxsQmMxSkpjbWhOZW1SQ1ZEVllWeXR5ZFd0WFRsRkNOVkZNV2xweFRFVlpRelpSV0VoUWIyWlJZMjFSU1hScFJuQkhlRVI0Um5aV1l6RnJXazlpUlRjcmVFeE5ORWQwTTAxWVNYbGtTVzVLYnpsVFdXOUJhVXB5WWpGek1ERjZkMGxsZEVVMWRWTnVXbFprTjIwMFQza3lNWGgxUkhFME1FVlVLMXBqUzJWM1NEaFFPVFkwVEVwR2RtbElVelUzUVhJM1NHMVphMUJOVjJsUGFuZHJWRzVvT0hWdkwzVnVOVnBDV2tGSmJ6UkVWSFUwV2lzd1MwRXhSazlaUW5ZNE9EbFBkekZLZWpWa2RqZ3JUekEwT0ZaeGEzVmthamw0YWpSU09EUmFPRFIzT1VoUGN6QlpOVTlQZDFFMVdtWTJRMjFRYTFSUUx6aGhZMGxETTJsUVZFcHpMeXM0ZEVWS1MzaFdiRnBTU1RaU1dYYzJZWGR0UWpaUldsSXJVV2N3YVRZNWNIbEJSbUZGV2k4eFkwNTFUakUzVm5oWGFWbDVla1JhTlhKeVQzRkRSM05hTUVST1EyNUNTVkJFVkhKMldIRnpha3huVlZSNmQwTkdWVGN4YVhRMVFURldXUzlXUjB4a1VIUndZMFJSTTFobEwzSjZMemc1SzNoVGJGUjNkV3A2ZEd3elZtZFpPRVV2YVhaMlJUVXJkREJxWmtsSlZIRnhNa050ZGpSYUwwZEpVV3BXVVdodFFYbG5SazVOZUZCT2IycENVelkyT1dGRWIwUXlXblZFVFdVMmRFRlhXaXN5TWpVekwyZzRWSHBWZEdWU1pDczVURlI2Y0haYVRubzJXblpJYWl0Qk9Dc3djMlZIU0VJd2EwaFpNVWwxZEV4NE1ESnBWVFJOTUVSNVRXTk9Za3hDYjBwRWNtNUhVVzlSYjBoNk5rOXllSEpZYW5oUVdXbzRkeTkzYjJRM1dqSnRhekZNTVZKaGMyTjRXRGQ2V25sMFZpOVhVakUwTDBZM00yMTZVMDV0UTJRMFUweHBWbmhCUW14ME1ETmlWVGs0VHpaRFpIcGpjMk5xWVhWMGVtbzBTVmhpTTNWeVRHUnJSbmt3TVV4RlVVWkNha2Q1VkVkVWFuTnFOblpuTm1KTWNqZFRiVVZHTmxoTlYwUk5TbG8wZUdKTVNWbG9OU3NyVFRKQ2VGQktTek5hV0RKbVRraHdUbVY2YVdoUk9HMXNPVk42VVRsaGRuYzNaVlZoUzNkaFJHdEpZMjRyUkhwVGIzRjFTVlpZZWxGNllUZENiMVUwU1hOQlVteEtkVkpSZDFoak1TdExjR1JXV0dwWmFYbGtMMU5hYTBjMk5sSTFOSGRoTTBaVGFrZzVia1JOTmpWV0wzQnhaalZVS3pGNWJITm9SV1puVmprMFNIUm1halozWW10Q1VXRldTMGh5Wld4MFMzQkdUMFIyTmxCVmFUZEVLemhTTDNGVWVuTnNNa2hzUWpOUVlrZHdhSGRRVUZadGJWTnNPSFJHYVhoQ1RIVk9iMkZIUVVSdE1WSXpWM0pRZWt0aGMzcExhbWwxUWxNMFJWWXhjSE5uVVRZMVJsQkhXbEJYZDNORFVVaHZRM00wTkRaS1lsSkhaRll6UlhsS09IUmFNMHMxYkhwTk1XYzVSak5yYmpVclVqZzNWRGhNZEVwb1ZEQlRiMWN4VkZkdU1VTlplVmgzTlVRMllYUkpRbEJ5T0N0NVVYZDRhVkpsU1ZsTmMwaHpRelFyVFRBNU5GUnhVVWRUTTJjeFNWZEdTa0Z2YjJsTVFreHFXRTUwYkhveVZtTmhTSE5DVVdaTU1rTkhSR1JVWkRCMkt6RnRhbkpPVVhOeVVFMUZaMUpQVFhZMVJrUm9XVlZpSzA1SFUwMWhlVlJNV2taRGNqTmFVMFZpTkdsT1dVVjJUVzR5Y2pkTk4zQnFhMFZOWjJoSFpsaElXbGxVVjFCalJraGFaak5RYjA4eFJVcDBURUpUYURaeWJXbGlTek4xVG1aUVNYSlJaVEU1V213Mk1qUTJia3RTV1dvemVEVkZNVWhZYVdJMFVtUkJTWGxNWmsxS09WRmxRWFJ3VG1zdmVDdHlNaTlTWnpKSGRrZHRTMWhyV2xKbFUyczVSWFJSVTNaa1JrRlRRMnRFZUZFdlVXRkhOVGhpT0RGaUswTjFRbGd3ZEVSTmJsTkxiVVV4YVdGVVMwaFBLMXBpV2pWa1NXSjZOM0J2YlRoclRubHpOMFJ2ZVZWbk9ITnBlREZKWWtNckx6SnBkRmhtWkhGa05HZ3dhVEkzVTNwWFJFOXpWMnRqZURkSUsyZEtTRGxSYzNKeGMyWmxZbEJOU1hnekwwY3pWbmRNV0hScmMyVk5ZVzAzYW5nemQwWndSbVZFYTA5YWIzWndZVVpJVGk5TVZHVk9lRlIwU2poUlpWbDFaVTFaY1dGbGEzWkROVmxUWjNoa2JVMURPVkpWWm1WRlJYb3JiWEpaYkU4NFYzb3JkRTlYUWxOQlZFaFViMHhoVEhKVGQwMU5PVEpHV1dFNFJGb3hWM04zY214SGVGcFNlbEZ2VTFWNEsyaDNLMEpITDJaMGEycGxhM0pEZW0xTWMzUm9PSE4zUjJaV1FrcHRSVk4wVkc1M2FFRmxSbVJUVFZWelpFaEdXSEZyVkhkMU9HWmlabkZKWlhsdU16RlpOREpQYTJabldGaEtlRzR2WjNVM2JUWkJQVDAuSXFFcjcxblloaURBYjl6QUtnN2hRVVNoaE9QbVZQWlhnT1NZQ0lVN3E3NWw0TGtHS2x4OWhDbzVkMVNRQlBnaVRfbkQtemVEdHpyS0F4NldvV0JLZ1E=',
'submit': 'LOGIN'}
r = requests.get(url_1, params = params)
print(r.headers)
print(r.cookies)
print(r.status_code)
headers = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Accept-Encoding': 'gzip, deflate, br',
'Content-Type': 'application/x-www-form-urlencoded',
'Sec-Fetch-Dest': 'document',
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-Site': 'same-origin',
'Sec-Fetch-User': '?1',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36',
'Upgrade-Insecure-Requests': 1,
'Referer': 'https://utslogin.nlm.nih.gov/cas/login?service=https%3a%2f%2fmetamap.nlm.nih.gov%2fdownload%2fpublic_mm_linux_main_2020.tar.bz2',
'Origin': 'https://utslogin.nlm.nih.gov',
'Host': 'utslogin.nlm.nih.gov',
'Connection': 'keep-alive',
'Cache-Control': 'max-age=0',
'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8,es-CO;q=0.7,es-AR;q=0.6,es;q=0.5,it;q=0.',
'Content-Length': 4634}
r_2 = requests.post(url_1, data=data, cookies=r.cookies, params = params)
print("-----------------")
print(r_2.headers)
print(r_2.cookies)
print(r_2.status_code)
execution and submit are parameters also passed during the POST request. With the first GET I emulate the normal loading of the page, after which a cookie is given to me. Using that cookie and submitting the form, I expect to get back another cookie and a ticket with which to download the file. However, instead of getting this, what I see in output is the response header of another GET as is anything had happened.
By the way, to whomever wants to help me: the credentials from that site are acquired prior by registering and some hours may pass between registration and credential arrival.

Related

What is the problem with my Python authentication script?

I'm trying to login to a specific site in Python, but without success. Any helpers?
import requests
url = 'https://us0.forgeofempires.com/page/'
user_agent = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0'
session = requests.Session()
session.headers['User-Agent'] = user_agent
_xsrf = session.cookies.get('_xsrf', domain='.forgeofempires.com')
login = session.post(url, {
'login[userid]':'user',
'login[password]':'pass',
'_xsrf':_xsrf
})
print(login)
f = open('results.html', 'w', encoding = 'utf-8')
f.write(login.text)
f.close()
Update
The following code finally works. I had to figure out by trial and error the fewest headers that needed to be set to get this to work. And, of course, as I previously stated, it is necessary to first visit the URL https://us.forgeofempires.com/glps/iframe-login to have the necessary cookies downloaded before you can proceed with sending the login POST (it is also necessary to extract from the cookies the 'XRRF-TOKEN' value and use this to set the 'x-xsrf-token' header value). And in my earlier attempt I was erroneously sending up the form encoded as JSON.
Note that the output is in JSON format; it is not HTML. Your output file should probably have the file extension of .json.
import requests
initial_url = 'https://us.forgeofempires.com/glps/iframe-login'
login_url = 'https://us.forgeofempires.com/glps/login_check'
session = requests.Session()
user_agent = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0'
session.headers['User-Agent'] = user_agent
r = session.get(initial_url)
xsrf_token = session.cookies.get('XSRF-TOKEN', domain='us.forgeofempires.com')
session.headers['x-xsrf-token'] = xsrf_token
session.headers['x-requested-with'] = 'XMLHttpRequest'
login = session.post(login_url, data={
'login[userid]': 'HTWyou',
'login[password]': 'kjh&*76dfgG',
'login[remember_me]': False
})
#f = open('results.html', 'w', encoding = 'utf-8')
f = open('results.json', 'w', encoding='utf-8')
f.write(login.text)
f.close()
"""
# code to print the headers:
hdrs = login.request.headers
for k in sorted(hdrs.keys()):
print(f'{k}: {hdrs[k]}')
"""
print(login.json())
Prints:
{'success': True, 'url': '/', 'player_id': 851846598}
The headers being passed up by the login XHR Request
:authority: us.forgeofempires.com
:method: POST
:path: /glps/login_check
:scheme: https
accept: application/json, text/plain, */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
content-length: 74
content-type: application/x-www-form-urlencoded; charset=UTF-8
cookie: device_view=full; metricsUvId=da9ca1f1-8e32-411f-ac7d-bf389df25469; portal_tid=1627319201921-36801; portal_ref_session=1; portal_ref_url=https://us0.forgeofempires.com/; portal_data=portal_tid=1627319201921-36801&portal_ref_session=1&portal_ref_url=https://us0.forgeofempires.com/; PHPSESSID=54955p3mbbstetnnquivk12rk2r0ihqmk41nagl620r5g9p5; XSRF-TOKEN=mQxleEoXU2Hyz6LP1ZlM7huHE-DLt5pg8qySl-M1vtI
dnt: 1
origin: https://us.forgeofempires.com
referer: https://us.forgeofempires.com/glps/iframe-login
sec-ch-ua: "Chromium";v="92", " Not A;Brand";v="99", "Google Chrome";v="92"
sec-ch-ua-mobile: ?0
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36
x-requested-with: XMLHttpRequest
x-xsrf-token: mQxleEoXU2Hyz6LP1ZlM7huHE-DLt5pg8qySl-M1vtI
The headers being passed up by session.post
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 89
Content-Type: application/x-www-form-urlencoded
Cookie: PHPSESSID=86lis3h0s8291fgkto5r2lcnqr7a6c22nnotveeo2iuuusht; XSRF-TOKEN=Qxm6zfZrDqaI8ce7MDQ5rQqpHHOUozKaFu-NL2Av9KU; device_view=full
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0
x-requested-with: XMLHttpRequest
x-xsrf-token: Qxm6zfZrDqaI8ce7MDQ5rQqpHHOUozKaFu-NL2Av9KU

curl get response 403

How can I get around the 403 code?
I was trying to get search result from a website, however I got "Response[403]" message, I've found similar get solving 403 error by adding headers to request.get, however it didn't work for my problem. What should I do to correctly get the result I want?
Google Chrome Request
Request
Request URL: http://178.32.46.165/
Request Method: GET
Status Code: 200 OK
Remote Address: 178.32.46.165:80
Referrer Policy: strict-origin-when-cross-origin
Request Headers
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: tr-TR,tr;q=0.9
Cache-Control: max-age=0
Connection: keep-alive
Host: 178.32.46.165
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36
Python Request
import requests
headers = {
'Connection': 'keep-alive',
'Cache-Control': 'max-age=0',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Accept-Language': 'tr-TR,tr;q=0.9',
}
response = requests.get('http://178.32.46.165/', headers=headers, verify=False)
print(response)
Response
<Response [403]>
If you're getting a 403 response, you're likely not meant to be able to access this resource. This is not a Python issue, this is an issue with whatever website you're trying to access.
Additionally, this seems to be a temporary issue. I pasted your code as-is into my Python prompt and received <Response [200]>
I think it may provide blocking depending on the country.

python post request with cookies asp.net

Please help. I'm trying to create a POST request on an .asp site that requires cookies, but the way I handle them seems not to return anything. Read through some questions of similar topic but can't find the _SessionID cookie some are referring to. Please help me formulate this POST request so it works.
Headers
:authority: safer.fmcsa.dot.gov
:method: POST
:path: /query.asp
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
cache-control: max-age=0
content-length: 85
content-type: application/x-www-form-urlencoded
cookie: ASP.NET_SessionId=ywxszihqlu1yciwe5z5gm4qt; etype=au; ASPSESSIONIDQECTCDRB=KGFOBHBBLKCBKBFIAPEBMIHJ; ASPSESSIONIDQGARCDRA=LKEDBAOBMOMDNGBNBFEMMIPB; ASPSESSIONIDSEBQCBSB=DAMJMNKCNJKHCMDCIJBPKEHD; ASPSESSIONIDCEQRADQC=EIEJCDLBHHCCKHCNJNIMHDKA; ASPSESSIONIDAESTCBQC=KPDPJNHCLOBJENEHPNIFKJLH; LI_carrier=67449; ASPSESSIONIDAGSQADRC=CPIBAKEDDPNFCIPLIGLOKKLA; ASPSESSIONIDAERTDBQD=FMKFHJJAJKNIGCBCCFJFCMNF; AWSALB=Xc7OAuZUmx6vgE5l9NaawsH8oBWjy6eZ3B62kw2rZ5HieoRlMu4SSmVVcaPJPcPjp1fVt9U/T9FaRflgNHwtzmsK4X4e+y+yoGArTfgpb75NWo/ilAek0Qk/sFYI; AWSALBCORS=Xc7OAuZUmx6vgE5l9NaawsH8oBWjy6eZ3B62kw2rZ5HieoRlMu4SSmVVcaPJPcPjp1fVt9U/T9FaRflgNHwtzmsK4X4e+y+yoGArTfgpb75NWo/ilAek0Qk/sFYI
origin: https://safer.fmcsa.dot.gov
referer: https://safer.fmcsa.dot.gov/CompanySnapshot.aspx
sec-fetch-dest: document
sec-fetch-mode: navigate
sec-fetch-site: same-origin
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36
Form Data
searchtype: ANY
query_type: queryCarrierSnapshot
query_param: USDOT
query_string: 2300842
My Code So Far
def checkDOT():
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Accept-Language': 'en-US,en;q=0.9',
'Cache-Control': 'no-cache,no-store,must-revalidate,max-age=0,private',
'upgrade-insecure-requests': '1',
'Connection': 'keep-alive',
'origin': 'https://safer.fmcsa.dot.gov',
'referer': 'https://safer.fmcsa.dot.gov/CompanySnapshot.aspx'
}
s = requests.Session()
data = {
'searchtype': 'ANY',
'query_type': 'queryCarrierSnapshot',
'query_param': 'USDOT',
'query_string': '2300842'
}
params = (
('pageNumber', '0'),
('itemsPerPage', '15'),
)
url = 'https://safer.fmcsa.dot.gov/CompanySnapshot.aspx'
response = s.get(url, headers=headers, data=data, params=params)
if response:
print(response.content)
else:
print("This did not work")
get requests dont use data parameter, and your code is requests.get,is that right?
I can get a html page with:
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0',
}
url = 'https://safer.fmcsa.dot.gov/CompanySnapshot.aspx'
response = requests.get(url, headers=headers, verify=False)
print(response.text)

Issue in passing session information for scraping

I went to this website
www4.fmovies.to
then I clicked a movie and checked its CDN URL via Inspect->Network
and got below details
https://cdn.mcloud.to/stream/sf:i0:q2:h3:p23:l1/LR6ljfLn3hrEjSfrOp19wg/1542603600/i/f/2/nr69r8/hls/480/480-0013.ts
:authority: cdn.mcloud.to
:method: GET
:path: /stream/sf:i0:q2:h3:p23:l1/LR6ljfLn3hrEjSfrOp19wg/1542603600/i/f/2/nr69r8/hls/480/480-0001.ts
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
cookie: __cfduid=d0847f9ac6d9a8da1dd131d1a0a91ea991542533053; _ga=GA1.2.485859786.1542533055; _gid=GA1.2.1916946057.1542533055; _gat=1
origin: https://mcloud.to
referer: https://mcloud.to/embed/#P#O8SE2916SEOA5?sub.file=https%253A%252F%252Fstatic1.akacdn.ru%252Fsubtitle%252F40039.vtt%253Fv1&ui=oAhi567w9OQEhJWEdbl0s%40Ep0Ir2VvG1xiK9JqKx
user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
created header information using the above information and then ran
request = requests.get(url, headers=headers)
But am getting 403 Not Authorized. What is the issue?
You need to pass referer header that is the src attribute of video content iframe that looks like
<iframe src="https://mcloud.to/embed/#9#4ZS04Z10SWOE5?ui=pwxi4Kjr6%40wHmIqHcrl0yeFfpYqUUIW1wCKlJr6x" allow="autoplay; fullscreen" scrolling="no" allowfullscreen="yes" style="width: 100%; height: 100%;" frameborder="no"></iframe>
The code looks like
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0)Gecko/20100101 Firefox/60.', 'pragma': 'no-cache', 'connection': 'keep-alive', 'cache-control': 'no-cache', 'referer': 'https://mcloud.to/embed/#9#4ZS04Z10SWOE5?ui=pwxi4Kjr6%40wHmIqHcrl0yeFfpYqUUIW1wCKlJr6x'}
requests.get('https://cdn.mcloud.to/stream/sf:i0:q2:h2:p24:l1/WjLDZuCBHmtyv63lT-RoVQ/1542603600/g/c/0/rj0m0m/hls/480/480-0000.ts', headers=headers)

Not able to upload tar.gz file using Python Request Module

Here is what my XHR data looks like when captured in chrome
Request Header
POST my_url?X-Progress-ID=ee821652321919bc7ae61fbe0b625990&userpkgname=file_name.tar.gz HTTP/1.1
Host: 10.110.134.28
Connection: keep-alive
Content-Length: 17461
Accept: application/json, text/plain, */*
Origin: https://10.110.134.28
X-XSRF-TOKEN: bGIwdfFE-oaL_1yVrCzw0iHvv4yUHLC28xjw
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36
Content-Type: multipart/form-data; boundary=----WebKitFormBoundarybfK9jSdLoc2Mpj0i
Referer: https://10.110.134.28/
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.8
Cookie: XSRF-TOKEN=bGIwdfFE-oaL_1yVrCzw0iHvv4yUHLC28xjw; sid=s%3AXglfJNLQ9zzp3eHjQ2QOpk19kFKDDMvJ.ZMKyZd1Gx13lz2MnJgty5WncnilySzfoThGktkhlk4w
Payload
------WebKitFormBoundarybfK9jSdLoc2Mpj0i
Content-Disposition: form-data; name="package"; filename="file_name.tar.gz"
Content-Type: application/x-gzip
------WebKitFormBoundarybfK9jSdLoc2Mpj0i--
And this is how I am building my request.
files = {'package': (<file_name>, open(config_path, 'rb'), 'application/x-gzip')}
request.post(url, files=files)
This is how my request header looks like
{
'Content-Length' : '17449',
'Accept-Encoding' : 'gzip, deflate',
'Accept' : '*/*',
'User-Agent' : 'python-requests/2.10.0',
'Connection' : 'keep-alive',
'Cookie' : 'XSRF-TOKEN=zmklLEL0-gDJOfNBk113MuTpBkLo0j6MAzw0; sid=s%3AC2JZDCfpg_CgkU7qSlS5YTvWXwpgMX35.5nU7W02TPNYtMkIQ4W%2B1bjd87A7KyJbh3shoNqqADXE',
'Content-Type' : 'multipart/form-data; boundary=270d9e02bf214dc7a09c3081cba5b0e0',
'XSRF-TOKEN' : 'zmklLEL0-gDJOfNBk113MuTpBkLo0j6MAzw0'
}
When I make the request I get 502 bad gateway response that too after few seconds while on chrome I get 200 OK instantly
So most probably I am not building my request correctly. Any suggestions?

Categories

Resources