I am trying to use the requests.Session.get method in Python which can take a headers dictionary as an argument. When I copy headers from "Inspect Element" in Mozilla Firefox they look like this:
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.5
Connection: keep-alive
Host: cdn.sstatic.net
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
But session.get needs them as a dictionary:
{"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit 537.36 (KHTML, like Gecko) Chrome",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"}
Is there a method in Python that takes a string and inserts characters at given places? In this example that would mean inserting a " on both sides of every colon, inserting ", to the left and " to the right of every newline character, and adding a \ for line continuation before every newline. Alternatively, is there something that automatically converts strings of this form into a dictionary?
headers_as_text = """Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.5
Connection: keep-alive
Host: cdn.sstatic.net
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0"""
def get_key_value(text):
    # Split on the first colon only, because header values may contain colons.
    # str.partition never raises: a line without a colon yields an empty value.
    key, _, value = text.partition(":")
    return key.strip(), value.strip()

# Drop empty lines, then build the dictionary from (key, value) pairs.
text_lines = filter(None, headers_as_text.split("\n"))
dict_values = dict(map(get_key_value, text_lines))
The final value is a dictionary:
dict_values = {
    'Host': 'cdn.sstatic.net',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
}
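As a variation on the approach above, the following sketch wraps the parsing in a single function and also skips HTTP/2 pseudo-headers (lines such as :authority: that Chrome includes when you copy request headers), which are not regular header fields. The function name parse_raw_headers and the sample text are illustrative assumptions, not part of the original answer.

```python
def parse_raw_headers(raw):
    """Turn 'Name: value' lines copied from browser dev tools into a dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ':authority:'.
        if not line or line.startswith(":"):
            continue
        # Split on the first colon only; values such as URLs contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers


# Hypothetical sample input, mixing a pseudo-header with regular headers.
raw = """
:method: GET
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Host: cdn.sstatic.net
Referer: https://example.com/page
"""

headers = parse_raw_headers(raw)
print(headers)
```

The resulting dictionary can then be passed as the headers argument, for example session.get(url, headers=headers).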
Related
I'm trying to login to a specific site in Python, but without success. Any helpers?
import requests
url = 'https://us0.forgeofempires.com/page/'
user_agent = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0'
session = requests.Session()
session.headers['User-Agent'] = user_agent
_xsrf = session.cookies.get('_xsrf', domain='.forgeofempires.com')
login = session.post(url, {
    'login[userid]': 'user',
    'login[password]': 'pass',
    '_xsrf': _xsrf
})
print(login)
f = open('results.html', 'w', encoding = 'utf-8')
f.write(login.text)
f.close()
Update
The following code finally works. I had to figure out by trial and error the fewest headers that needed to be set. As I stated previously, you must first visit https://us.forgeofempires.com/glps/iframe-login so that the necessary cookies are downloaded before you send the login POST. You must also extract the 'XSRF-TOKEN' value from the cookies and use it to set the 'x-xsrf-token' header. In my earlier attempt I was erroneously sending the form encoded as JSON.
Note that the output is in JSON format; it is not HTML. Your output file should probably have the file extension of .json.
import requests
initial_url = 'https://us.forgeofempires.com/glps/iframe-login'
login_url = 'https://us.forgeofempires.com/glps/login_check'
session = requests.Session()
user_agent = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0'
session.headers['User-Agent'] = user_agent
r = session.get(initial_url)
xsrf_token = session.cookies.get('XSRF-TOKEN', domain='us.forgeofempires.com')
session.headers['x-xsrf-token'] = xsrf_token
session.headers['x-requested-with'] = 'XMLHttpRequest'
login = session.post(login_url, data={
    'login[userid]': 'HTWyou',
    'login[password]': 'kjh&*76dfgG',
    'login[remember_me]': False
})
#f = open('results.html', 'w', encoding = 'utf-8')
f = open('results.json', 'w', encoding='utf-8')
f.write(login.text)
f.close()
"""
# code to print the headers:
hdrs = login.request.headers
for k in sorted(hdrs.keys()):
    print(f'{k}: {hdrs[k]}')
"""
print(login.json())
Prints:
{'success': True, 'url': '/', 'player_id': 851846598}
The headers being passed up by the login XHR Request
:authority: us.forgeofempires.com
:method: POST
:path: /glps/login_check
:scheme: https
accept: application/json, text/plain, */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
content-length: 74
content-type: application/x-www-form-urlencoded; charset=UTF-8
cookie: device_view=full; metricsUvId=da9ca1f1-8e32-411f-ac7d-bf389df25469; portal_tid=1627319201921-36801; portal_ref_session=1; portal_ref_url=https://us0.forgeofempires.com/; portal_data=portal_tid=1627319201921-36801&portal_ref_session=1&portal_ref_url=https://us0.forgeofempires.com/; PHPSESSID=54955p3mbbstetnnquivk12rk2r0ihqmk41nagl620r5g9p5; XSRF-TOKEN=mQxleEoXU2Hyz6LP1ZlM7huHE-DLt5pg8qySl-M1vtI
dnt: 1
origin: https://us.forgeofempires.com
referer: https://us.forgeofempires.com/glps/iframe-login
sec-ch-ua: "Chromium";v="92", " Not A;Brand";v="99", "Google Chrome";v="92"
sec-ch-ua-mobile: ?0
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36
x-requested-with: XMLHttpRequest
x-xsrf-token: mQxleEoXU2Hyz6LP1ZlM7huHE-DLt5pg8qySl-M1vtI
The headers being passed up by session.post
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 89
Content-Type: application/x-www-form-urlencoded
Cookie: PHPSESSID=86lis3h0s8291fgkto5r2lcnqr7a6c22nnotveeo2iuuusht; XSRF-TOKEN=Qxm6zfZrDqaI8ce7MDQ5rQqpHHOUozKaFu-NL2Av9KU; device_view=full
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0
x-requested-with: XMLHttpRequest
x-xsrf-token: Qxm6zfZrDqaI8ce7MDQ5rQqpHHOUozKaFu-NL2Av9KU
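The two header dumps show that the x-xsrf-token header simply repeats the value of the XSRF-TOKEN cookie. As a small offline check, the sketch below parses the Cookie header that session.post sent (copied from the listing above) with the standard library and compares the token with the header value; the variable names are illustrative assumptions.

```python
from http.cookies import SimpleCookie

# Cookie header and x-xsrf-token value as sent by session.post (copied from above).
cookie_header = (
    "PHPSESSID=86lis3h0s8291fgkto5r2lcnqr7a6c22nnotveeo2iuuusht; "
    "XSRF-TOKEN=Qxm6zfZrDqaI8ce7MDQ5rQqpHHOUozKaFu-NL2Av9KU; "
    "device_view=full"
)
sent_header = "Qxm6zfZrDqaI8ce7MDQ5rQqpHHOUozKaFu-NL2Av9KU"

# Parse the "name=value; name=value" string into individual cookies.
jar = SimpleCookie()
jar.load(cookie_header)
token = jar["XSRF-TOKEN"].value

# The header value should be identical to the cookie value.
print(token == sent_header)
```

In the working code this copy is done by reading session.cookies after the first GET and assigning the value to session.headers['x-xsrf-token'].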
I'm trying to post form data to a printer's web interface. I can post most fields successfully, but one type of form field doesn't seem to update.
My Code:
import sys
import requests
IP='xxx.xxx.xxx.xxx'
PrinterSession = requests.session()
csrfdata = PrinterSession.get('http://xxx.xxx.xxx.xxx/properties/authentication/login.php')
csrftoken = str(csrfdata.content)[str(csrfdata.content).index('CSRFToken')+18:str(csrfdata.content).index('CSRFToken')+146]
login_data = {
    'webUsername': 'admin',
    'webPassword': 'xxxx',
    'CSRFToken': csrftoken,
    'NextPage': '/properties/authentication/luidLogin.php',
    '_fun_function': 'HTTP_Authenticate_fn',
    'frmaltDomain': 'default',
}
LoginPrinter = PrinterSession.post("http://xxx.xxx.xxx.xxx/userpost/printer.set", data=login_data )
headersSMTP = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Content-Length': '328',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Cookie': 'PageToShow=; statusNumNodes=8; statusSelected=n1; frmCompany=; frmIFax=; frmFaxNumber=; frmProtocol=SMB; frmDocumentPath=; frmLoginName=Xerox; frmServerName=; frmNdsContext=; frmSmbShare=Scans; frmNdsTree=; frmIpv6_Host_1=%3A%3A; frmFirstName=David; frmLastName=Fester; frmFriendlyName=SMB%20Scan; frmEmail=dawie180#gmail.com; frmDisplayName=Fester%2C%20David; frmServerVolume=Scans; frmIpv4_1_1=12; frmIpv4_1_2=12; frmIpv4_1_3=12; frmIpv4_1_4=12; frmXrxAdd_1=Hn; frmHnAdd_1=asdasdsdasda; propNumNodes=117; PHPSESSID=457428a534bf077cc6bb0fff7ee80f7f; propSelected=n49; propHierarchy=001010000010000000000000000; WebTimerPopupID=15',
    'Host': '10.241.24.28',
    'Origin': 'https://10.241.24.28',
    'Referer': 'https://10.241.24.28/protocols/smtp/required.php?from=email_req_smtp',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="90", "Google Chrome";v="90"',
    'sec-ch-ua-mobile': '?0',
    'Sec-Fetch-Dest': 'frame',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36',
}
cookietolist=headersSMTP['Cookie'].split(";")
cookietolist[27]=' PHPSESSID='+PrinterSession.cookies.get_dict()['PHPSESSID']
StringCookie = ";".join(cookietolist)
headersSMTP['Cookie']=StringCookie
postSMTPform = {
    '_fun_function': 'HTTP_Set_Config_Attrib_fn',
    '_fun_function': 'HTTP_CN_Set_fn',
    '_fun_function': 'HTTP_SNMP_Set_SvcMon_fn',
    'NextPage': '/properties/email/required.php',
    'CSRFToken': csrftoken,
    'POP3_MAILBOX_ADDRESS': 'smtp.test.com',
    'connectivity.smtp.server': 'xxx.xxx.xxx.xxx:25',
}
PostSMTP = PrinterSession.post('http://'+IP+'/dummypost/printer.set', data=postSMTPform, headers=headersSMTP)
Post Request inspected via Chrome:
Request Headers:
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cache-Control: max-age=0
Connection: keep-alive
Content-Length: 364
Content-Type: application/x-www-form-urlencoded
Cookie: PHPSESSID=c960f2c4fb2f99c44333e601d3cc6a79; propSelected=n1; propNumNodes=117; WebTimerPopupID=2
Host: 10.241.24.28
Origin: https://10.241.24.28
Referer: https://10.241.24.28/protocols/smtp/required.php?from=email_cfg_over
sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="90", "Google Chrome";v="90"
sec-ch-ua-mobile: ?0
Sec-Fetch-Dest: frame
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: same-origin
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36
Form Data:
_fun_function: HTTP_Set_Config_Attrib_fn
_fun_function: HTTP_CN_Set_fn
_fun_function: HTTP_SNMP_Set_SvcMon_fn
NextPage: /config_overview/email.php
connectivity.smtp.server: 10.10.10.10:25
POP3_MAILBOX_ADDRESS: test#home.com
CSRFToken: eb590d22d4b0c75038816bd82a26e188f9a49941029c42765dd790cb8acd07b4966f5a910f493f2d839cb36d5d0a77a7632f614e06fcd67019c0f95f508d7da1
So "POP3_MAILBOX_ADDRESS: test#home.com" works fine and updates the SMTP From address in the printer, as do all other fields whose keys use the same uppercase-and-underscore style. The other field, "connectivity.smtp.server: 10.10.10.10:25", does not work: the setting is not updated in the printer, and it seems that every field using this lowercase, period-separated style fails in the same way.
"POP3_MAILBOX_ADDRESS: test#home.com" works even without passing the headers with the POST request. I added the headers afterwards and the period-separated fields still don't update. I don't know what I'm doing wrong.
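One thing that may be worth checking (an assumption, not a confirmed cause): the browser sends _fun_function three times, but a Python dict literal silently keeps only the last value for a repeated key, so the script sends it once. The sketch below uses only the standard library to show the difference between a dict and a list of (name, value) pairs; the requests data= argument also accepts such a list of tuples. The variable names are illustrative.

```python
from urllib.parse import urlencode

# A dict literal keeps only the last value assigned to a repeated key.
as_dict = {
    "_fun_function": "HTTP_Set_Config_Attrib_fn",
    "_fun_function": "HTTP_CN_Set_fn",
    "_fun_function": "HTTP_SNMP_Set_SvcMon_fn",
    "connectivity.smtp.server": "10.10.10.10:25",
}

# A list of (name, value) pairs preserves every occurrence, in order.
as_pairs = [
    ("_fun_function", "HTTP_Set_Config_Attrib_fn"),
    ("_fun_function", "HTTP_CN_Set_fn"),
    ("_fun_function", "HTTP_SNMP_Set_SvcMon_fn"),
    ("connectivity.smtp.server", "10.10.10.10:25"),
]

dict_body = urlencode(as_dict)
pairs_body = urlencode(as_pairs)

print(dict_body)
print(pairs_body)
```

If the printer dispatches each setting through a specific _fun_function handler, the two dropped values could explain why only some fields update; that remains a guess until it is tested against the device.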
I am trying to download the file from here.
In Firefox, the link redirects me to a web page where I have to log in with my credentials. After I do so, Firefox automatically prompts me to save the desired file with a pop-up window.
I am trying to replicate that behavior with Python without success.
I've tried the code from this question, namely
import requests
session = requests.session()
url = "https://metamap.nlm.nih.gov/download/public_mm_linux_main_2020.tar.bz2"
data= {'username': 'xxxxx', 'password':'xxxxxx'}
session.post(url, data=data)
response = requests.get("https://metamap.nlm.nih.gov/download/public_mm_linux_main_2020.tar.bz2")
print(response.text)
Insights from Firefox
Using the Network tab from the Developer Mode, I have been able to see the following:
Apart from username and password, the POST request also sends an execution parameter, which seems to be a very long hashed string. I don't know what this is or how to replicate it.
POST /cas/login?service=https%3a%2f%2fmetamap.nlm.nih.gov%2fdownload%2fpublic_mm_linux_main_2020.tar.bz2 HTTP/1.1
Host: utslogin.nlm.nih.gov
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Content-Type: application/x-www-form-urlencoded
Content-Length: 4900
Origin: https://utslogin.nlm.nih.gov
Connection: keep-alive
Referer: https://utslogin.nlm.nih.gov/cas/login?service=https%3a%2f%2fmetamap.nlm.nih.gov%2fdownload%2fpublic_mm_linux_main_2020.tar.bz2
Cookie: TGC=exxxxxx; JSESSIONID=xxxxx
Upgrade-Insecure-Requests: 1
Afterwards, there is a GET request to the same address with a ticket parameter.
GET /download/public_mm_linux_main_2020.tar.bz2?ticket=xxxxxx HTTP/1.1
Host: metamap.nlm.nih.gov
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://utslogin.nlm.nih.gov/cas/login?service=https%3a%2f%2fmetamap.nlm.nih.gov%2fdownload%2fpublic_mm_linux_main_2020.tar.bz2
Connection: keep-alive
Cookie: MOD_AUTH_CAS_S=xxxxxx
Upgrade-Insecure-Requests: 1
Finally, another GET triggers the download.
GET /download/public_mm_linux_main_2020.tar.bz2 HTTP/1.1
Host: metamap.nlm.nih.gov
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://utslogin.nlm.nih.gov/cas/login?service=https%3a%2f%2fmetamap.nlm.nih.gov%2fdownload%2fpublic_mm_linux_main_2020.tar.bz2
Connection: keep-alive
Cookie: MOD_AUTH_CAS_S=xxxxx
Upgrade-Insecure-Requests: 1
In between all of this, cookies are set and read.
I don't know what information is useful. I'm putting here all that I can see.
Thank you in advance.
EDIT:
After more exploration I am trying to replicate the 3 request pattern seen in Mozilla/Chrome.
This is my code now:
import requests
url_1 = "https://utslogin.nlm.nih.gov/cas/login"
# url_2 = "https://metamap.nlm.nih.gov/download/public_mm_linux_main_2020.tar.bz2"
params = {'service': 'https%3a%2f%2fmetamap.nlm.nih.gov%2fdownload%2fpublic_mm_linux_main_2020.tar.bz2'}
data = {'username': 'xxxxx',
'password':'xxxxx',
'execution': '1cb37c1a-141b-4891-b961-ea5c4b20deed_ZXlKaGJHY2lPaUpJVXpVeE1pSjkuWXpWb1EzaFdTMmxOZGxsdUwwUlFiWEozSzBkSlVuUXhURlpuZEROSmVpdGtObFpVYjFkU01FMWlhMjkxS3lzNWVrbHlNblptTWt4MVdrdE9SU3RJTmsxTGNISmhNblYxTmtOVVExWkJPWFJZUkVKbFREWk1WV2RaZEhSSU5qbHNNbTlZVldONGMyTndNWGgxVGtWalZtcHRkRWMxZGxaSGNsVmtTVmh6UzJSWmFEQjBVRmRMUlVSalRsTk5WVVYxYlU4elRHUjNZV0pSWVhCblVsQXdjemRhZW5ac1NHeDNhMU5pYjA5UlVYTk1ZWHB1UWtkeGJtUTFRa1V5T0RseVNWcHlSa3RZY2pGS09FeFNjekZNY21KVmFsaE1la3RKZFdsd1pGaEhNVmhIY3pJMU4zRkZVVUZrYTA1aE5IZGljMlZZWTA5Q1kyNXFTbTlUUkRsR1UxcHdVa2wyUlhaU1dsbFpaRU5yVFhwQmQwOWlOVTh4WTNaVmNIWXJaalJqUmxGMU4xSmxVVWx3VjNwR1ZrdDROamhaTjI5UlJXTjVXbXg2YVdGbE5HTlpkM0JxUW5kVWJUaHVXVGx6V0VKek5VRmpjV2c0TmpsSlUydHJRWEZUVVRaeWNEUTNSVko2ZGtnclltZG5iRkY2VW5oQmMwcFVSRXAzV0c5eVJqSTVNSEJXTVhOSWNpczFMMVp3VDFWRlJsUnJlSE5ETlVaS2VYQjJPV2hWU2pOVVZ6RlBMM0pDYzBOWlQzbEJkV3g1TWtSYWMwaEhjR0ZSUjI5NGFIZGpha2RJTkVzeVRGaFRjMnh0UldKSFdVWm9halF6YUhKa04wUnNabmQ2U1ZaUE1tWkRUR2RPTlhORVJHOXhNbWR5WWxCWldFUkJjMmQyY1dGamVWTjZZMWRCUTBOclRXOW9XWGRpUkVKS09XZGlSVzl1WVZOQlpUVjZZamhSVEhKdGEyYzNUM0pxYVhCdmJUTmxRbWMwVXpJMGIycE5WRXBXZVhKV2NWRTJNSEJ6UTNobFVUZzJOa1ZMWlhkMGJGQjBRbVIwUjBGNlMyNURPVEZ6UWtWS1RHbGhWRzR4Ulc0MU0ybHRaRVIzVm1Ga1ZGVTVVemQ0UVd4Q2JtUndTMEZ1Tmtac1l6SmtWRVpxYUhKTlRFVnljMnhCV1VkMk5YaEZkbEpqZDBoSUwzQkZNVlJVYTFkUmFIUjBhMXBwTVZSdWRXcFdkR1EzYkhoQ2JXZDFjekZyV1hGbVVVaDBRV1JWTlZwQk1HSkRSV3BSWmxwSVNDOVNRbTR4V1dkNmJXNVlWRzl6VWxKWlRHUkdTV2h4TUhkVWRrMW9XR3AyVEdGRVZYRkhiMnB1Y2pSemFXNVBRa3h1Y0VGTUszWk9TekpYUjFsRVNYSkdOSFJ3ZVhwT1dqUldVMFozWVRoRVdIRkRZbmxEVGxSdWRqaE1SQzg0V2swdmJYcGpNRmxoUkVObllsVmtNVWRxYmpSM01VUlhURlpWY2t0VVJGRTNhMVZ4WlhSa2JVTnlWVVU0U2pWd2VDOWxZVFJZVmpoU1ZqWkxNVlJDVVUxS1lsUlJZM1k1ZWtwNmNqQTBZbFJXVUhaNmQxUnNkSFpuTWxsQmMxSkpjbWhOZW1SQ1ZEVllWeXR5ZFd0WFRsRkNOVkZNV2xweFRFVlpRelpSV0VoUWIyWlJZMjFSU1hScFJuQkhlRVI0Um5aV1l6RnJXazlpUlRjcmVFeE5ORWQwTTAxWVNYbGtTVzVLYnpsVFdXOUJhVXB5WWpGek1ERjZkMGxsZEVVMWRWTnVXbFprTjIwMFQza3lNWGgxUkhFME1FVlVLMXBqUzJWM1NEaFFPVFkwVEVwR2RtbElVelUzUVhJM1NHMVphMUJOVjJsUGFuZHJWRzVvT0hWdkwzVnVOVnBDV
2tGSmJ6UkVWSFUwV2lzd1MwRXhSazlaUW5ZNE9EbFBkekZLZWpWa2RqZ3JUekEwT0ZaeGEzVmthamw0YWpSU09EUmFPRFIzT1VoUGN6QlpOVTlQZDFFMVdtWTJRMjFRYTFSUUx6aGhZMGxETTJsUVZFcHpMeXM0ZEVWS1MzaFdiRnBTU1RaU1dYYzJZWGR0UWpaUldsSXJVV2N3YVRZNWNIbEJSbUZGV2k4eFkwNTFUakUzVm5oWGFWbDVla1JhTlhKeVQzRkRSM05hTUVST1EyNUNTVkJFVkhKMldIRnpha3huVlZSNmQwTkdWVGN4YVhRMVFURldXUzlXUjB4a1VIUndZMFJSTTFobEwzSjZMemc1SzNoVGJGUjNkV3A2ZEd3elZtZFpPRVV2YVhaMlJUVXJkREJxWmtsSlZIRnhNa050ZGpSYUwwZEpVV3BXVVdodFFYbG5SazVOZUZCT2IycENVelkyT1dGRWIwUXlXblZFVFdVMmRFRlhXaXN5TWpVekwyZzRWSHBWZEdWU1pDczVURlI2Y0haYVRubzJXblpJYWl0Qk9Dc3djMlZIU0VJd2EwaFpNVWwxZEV4NE1ESnBWVFJOTUVSNVRXTk9Za3hDYjBwRWNtNUhVVzlSYjBoNk5rOXllSEpZYW5oUVdXbzRkeTkzYjJRM1dqSnRhekZNTVZKaGMyTjRXRGQ2V25sMFZpOVhVakUwTDBZM00yMTZVMDV0UTJRMFUweHBWbmhCUW14ME1ETmlWVGs0VHpaRFpIcGpjMk5xWVhWMGVtbzBTVmhpTTNWeVRHUnJSbmt3TVV4RlVVWkNha2Q1VkVkVWFuTnFOblpuTm1KTWNqZFRiVVZHTmxoTlYwUk5TbG8wZUdKTVNWbG9OU3NyVFRKQ2VGQktTek5hV0RKbVRraHdUbVY2YVdoUk9HMXNPVk42VVRsaGRuYzNaVlZoUzNkaFJHdEpZMjRyUkhwVGIzRjFTVlpZZWxGNllUZENiMVUwU1hOQlVteEtkVkpSZDFoak1TdExjR1JXV0dwWmFYbGtMMU5hYTBjMk5sSTFOSGRoTTBaVGFrZzVia1JOTmpWV0wzQnhaalZVS3pGNWJITm9SV1puVmprMFNIUm1halozWW10Q1VXRldTMGh5Wld4MFMzQkdUMFIyTmxCVmFUZEVLemhTTDNGVWVuTnNNa2hzUWpOUVlrZHdhSGRRVUZadGJWTnNPSFJHYVhoQ1RIVk9iMkZIUVVSdE1WSXpWM0pRZWt0aGMzcExhbWwxUWxNMFJWWXhjSE5uVVRZMVJsQkhXbEJYZDNORFVVaHZRM00wTkRaS1lsSkhaRll6UlhsS09IUmFNMHMxYkhwTk1XYzVSak5yYmpVclVqZzNWRGhNZEVwb1ZEQlRiMWN4VkZkdU1VTlplVmgzTlVRMllYUkpRbEJ5T0N0NVVYZDRhVkpsU1ZsTmMwaHpRelFyVFRBNU5GUnhVVWRUTTJjeFNWZEdTa0Z2YjJsTVFreHFXRTUwYkhveVZtTmhTSE5DVVdaTU1rTkhSR1JVWkRCMkt6RnRhbkpPVVhOeVVFMUZaMUpQVFhZMVJrUm9XVlZpSzA1SFUwMWhlVlJNV2taRGNqTmFVMFZpTkdsT1dVVjJUVzR5Y2pkTk4zQnFhMFZOWjJoSFpsaElXbGxVVjFCalJraGFaak5RYjA4eFJVcDBURUpUYURaeWJXbGlTek4xVG1aUVNYSlJaVEU1V213Mk1qUTJia3RTV1dvemVEVkZNVWhZYVdJMFVtUkJTWGxNWmsxS09WRmxRWFJ3VG1zdmVDdHlNaTlTWnpKSGRrZHRTMWhyV2xKbFUyczVSWFJSVTNaa1JrRlRRMnRFZUZFdlVXRkhOVGhpT0RGaUswTjFRbGd3ZEVSTmJsTkxiVVV4YVdGVVMwaFBLMXBpV2pWa1NXSjZOM0J2YlRoclRubHpOMFJ2ZVZWbk9ITnBlREZKWWtNckx6SnBkRmhtW
khGa05HZ3dhVEkzVTNwWFJFOXpWMnRqZURkSUsyZEtTRGxSYzNKeGMyWmxZbEJOU1hnekwwY3pWbmRNV0hScmMyVk5ZVzAzYW5nemQwWndSbVZFYTA5YWIzWndZVVpJVGk5TVZHVk9lRlIwU2poUlpWbDFaVTFaY1dGbGEzWkROVmxUWjNoa2JVMURPVkpWWm1WRlJYb3JiWEpaYkU4NFYzb3JkRTlYUWxOQlZFaFViMHhoVEhKVGQwMU5PVEpHV1dFNFJGb3hWM04zY214SGVGcFNlbEZ2VTFWNEsyaDNLMEpITDJaMGEycGxhM0pEZW0xTWMzUm9PSE4zUjJaV1FrcHRSVk4wVkc1M2FFRmxSbVJUVFZWelpFaEdXSEZyVkhkMU9HWmlabkZKWlhsdU16RlpOREpQYTJabldGaEtlRzR2WjNVM2JUWkJQVDAuSXFFcjcxblloaURBYjl6QUtnN2hRVVNoaE9QbVZQWlhnT1NZQ0lVN3E3NWw0TGtHS2x4OWhDbzVkMVNRQlBnaVRfbkQtemVEdHpyS0F4NldvV0JLZ1E=',
'submit': 'LOGIN'}
r = requests.get(url_1, params = params)
print(r.headers)
print(r.cookies)
print(r.status_code)
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-User': '?1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36',
    'Upgrade-Insecure-Requests': 1,
    'Referer': 'https://utslogin.nlm.nih.gov/cas/login?service=https%3a%2f%2fmetamap.nlm.nih.gov%2fdownload%2fpublic_mm_linux_main_2020.tar.bz2',
    'Origin': 'https://utslogin.nlm.nih.gov',
    'Host': 'utslogin.nlm.nih.gov',
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8,es-CO;q=0.7,es-AR;q=0.6,es;q=0.5,it;q=0.',
    'Content-Length': 4634,
}
r_2 = requests.post(url_1, data=data, cookies=r.cookies, params = params)
print("-----------------")
print(r_2.headers)
print(r_2.cookies)
print(r_2.status_code)
execution and submit are parameters that are also passed during the POST request. With the first GET I emulate the normal loading of the page, after which a cookie is given to me. Using that cookie and submitting the form, I expect to get back another cookie and a ticket with which to download the file. Instead, the output shows the response headers of another GET, as if nothing had happened.
By the way, for anyone who wants to help: credentials for that site must be obtained by registering in advance, and some hours may pass between registration and the arrival of the credentials.
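Two details of the code above may be worth checking; both are assumptions rather than a confirmed diagnosis. First, the execution value is presumably issued per page load as a hidden form field, so a value pasted from an earlier browser session may no longer be accepted. Second, requests URL-encodes the values in params, so a service value that is already percent-encoded would be encoded a second time. The sketch below shows, offline and with the standard library only, how hidden fields could be read from a login page and how the plain URL encodes. The HTML sample and the class name HiddenInputParser are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("type") == "hidden" and "name" in attrs:
            self.fields[attrs["name"]] = attrs.get("value") or ""


# Hypothetical stand-in for the HTML returned by the first GET of the login page.
sample_login_page = """
<form method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="hidden" name="execution" value="abc123_token">
  <input type="submit" name="submit" value="LOGIN">
</form>
"""

parser = HiddenInputParser()
parser.feed(sample_login_page)

# Merge the freshly scraped hidden fields into the login form data.
form_data = {"username": "xxxxx", "password": "xxxxx", "submit": "LOGIN", **parser.fields}

# Pass the plain URL and let the encoder escape it exactly once.
service = "https://metamap.nlm.nih.gov/download/public_mm_linux_main_2020.tar.bz2"
query = urlencode({"service": service})

print(form_data["execution"])
print(query)
```

With requests, the same idea would be to feed the text of the first response into the parser and to make all three requests through one requests.Session so that cookies carry over automatically.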
I'm trying to create a POST request to an .asp site that requires cookies, but the way I handle them doesn't seem to return anything. I have read through some questions on similar topics but can't find the _SessionID cookie some of them refer to. Please help me formulate this POST request so that it works.
Headers
:authority: safer.fmcsa.dot.gov
:method: POST
:path: /query.asp
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
cache-control: max-age=0
content-length: 85
content-type: application/x-www-form-urlencoded
cookie: ASP.NET_SessionId=ywxszihqlu1yciwe5z5gm4qt; etype=au; ASPSESSIONIDQECTCDRB=KGFOBHBBLKCBKBFIAPEBMIHJ; ASPSESSIONIDQGARCDRA=LKEDBAOBMOMDNGBNBFEMMIPB; ASPSESSIONIDSEBQCBSB=DAMJMNKCNJKHCMDCIJBPKEHD; ASPSESSIONIDCEQRADQC=EIEJCDLBHHCCKHCNJNIMHDKA; ASPSESSIONIDAESTCBQC=KPDPJNHCLOBJENEHPNIFKJLH; LI_carrier=67449; ASPSESSIONIDAGSQADRC=CPIBAKEDDPNFCIPLIGLOKKLA; ASPSESSIONIDAERTDBQD=FMKFHJJAJKNIGCBCCFJFCMNF; AWSALB=Xc7OAuZUmx6vgE5l9NaawsH8oBWjy6eZ3B62kw2rZ5HieoRlMu4SSmVVcaPJPcPjp1fVt9U/T9FaRflgNHwtzmsK4X4e+y+yoGArTfgpb75NWo/ilAek0Qk/sFYI; AWSALBCORS=Xc7OAuZUmx6vgE5l9NaawsH8oBWjy6eZ3B62kw2rZ5HieoRlMu4SSmVVcaPJPcPjp1fVt9U/T9FaRflgNHwtzmsK4X4e+y+yoGArTfgpb75NWo/ilAek0Qk/sFYI
origin: https://safer.fmcsa.dot.gov
referer: https://safer.fmcsa.dot.gov/CompanySnapshot.aspx
sec-fetch-dest: document
sec-fetch-mode: navigate
sec-fetch-site: same-origin
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36
Form Data
searchtype: ANY
query_type: queryCarrierSnapshot
query_param: USDOT
query_string: 2300842
My Code So Far
import requests

def checkDOT():
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'Accept-Language': 'en-US,en;q=0.9',
        'Cache-Control': 'no-cache,no-store,must-revalidate,max-age=0,private',
        'upgrade-insecure-requests': '1',
        'Connection': 'keep-alive',
        'origin': 'https://safer.fmcsa.dot.gov',
        'referer': 'https://safer.fmcsa.dot.gov/CompanySnapshot.aspx'
    }
    s = requests.Session()
    data = {
        'searchtype': 'ANY',
        'query_type': 'queryCarrierSnapshot',
        'query_param': 'USDOT',
        'query_string': '2300842'
    }
    params = (
        ('pageNumber', '0'),
        ('itemsPerPage', '15'),
    )
    url = 'https://safer.fmcsa.dot.gov/CompanySnapshot.aspx'
    response = s.get(url, headers=headers, data=data, params=params)
    if response:
        print(response.content)
    else:
        print("This did not work")
GET requests don't use the data parameter, and your code sends a GET (s.get) rather than a POST. Is that what you intended?
I can get a html page with:
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0',
}
url = 'https://safer.fmcsa.dot.gov/CompanySnapshot.aspx'
response = requests.get(url, headers=headers, verify=False)
print(response.text)
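To illustrate the difference, here is a minimal sketch that builds (but does not send) the POST the browser made, using only the standard library. The path /query.asp and the four form fields come from the captured request above; treating them as sufficient is an assumption, and the cookies the site expects are not handled here. With requests, the equivalent call would be s.post(url, data=form) on a session that has already visited the page.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Form fields copied from the "Form Data" section of the captured request.
form = {
    "searchtype": "ANY",
    "query_type": "queryCarrierSnapshot",
    "query_param": "USDOT",
    "query_string": "2300842",
}

# A POST carries the form in the request body, URL-encoded.
body = urlencode(form).encode("ascii")

# Building the Request object does not open a network connection.
req = Request(
    "https://safer.fmcsa.dot.gov/query.asp",
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(req.get_method())
print(body.decode("ascii"))
print(len(body))
```

The encoded body is 85 bytes long, which matches the content-length: 85 header in the browser capture and suggests that these four fields are the whole form.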
I went to the website www4.fmovies.to, clicked a movie, and checked its CDN URL via Inspect -> Network. I got the details below:
https://cdn.mcloud.to/stream/sf:i0:q2:h3:p23:l1/LR6ljfLn3hrEjSfrOp19wg/1542603600/i/f/2/nr69r8/hls/480/480-0013.ts
:authority: cdn.mcloud.to
:method: GET
:path: /stream/sf:i0:q2:h3:p23:l1/LR6ljfLn3hrEjSfrOp19wg/1542603600/i/f/2/nr69r8/hls/480/480-0001.ts
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
cookie: __cfduid=d0847f9ac6d9a8da1dd131d1a0a91ea991542533053; _ga=GA1.2.485859786.1542533055; _gid=GA1.2.1916946057.1542533055; _gat=1
origin: https://mcloud.to
referer: https://mcloud.to/embed/#P#O8SE2916SEOA5?sub.file=https%253A%252F%252Fstatic1.akacdn.ru%252Fsubtitle%252F40039.vtt%253Fv1&ui=oAhi567w9OQEhJWEdbl0s%40Ep0Ir2VvG1xiK9JqKx
user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
I created the headers dictionary from the information above and then ran:
request = requests.get(url, headers=headers)
But I am getting a 403 Forbidden response. What is the issue?
You need to pass a referer header whose value is the src attribute of the video content iframe, which looks like this:
<iframe src="https://mcloud.to/embed/#9#4ZS04Z10SWOE5?ui=pwxi4Kjr6%40wHmIqHcrl0yeFfpYqUUIW1wCKlJr6x" allow="autoplay; fullscreen" scrolling="no" allowfullscreen="yes" style="width: 100%; height: 100%;" frameborder="no"></iframe>
The code looks like this:
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0)Gecko/20100101 Firefox/60.', 'pragma': 'no-cache', 'connection': 'keep-alive', 'cache-control': 'no-cache', 'referer': 'https://mcloud.to/embed/#9#4ZS04Z10SWOE5?ui=pwxi4Kjr6%40wHmIqHcrl0yeFfpYqUUIW1wCKlJr6x'}
requests.get('https://cdn.mcloud.to/stream/sf:i0:q2:h2:p24:l1/WjLDZuCBHmtyv63lT-RoVQ/1542603600/g/c/0/rj0m0m/hls/480/480-0000.ts', headers=headers)