Python Mechanize Prevent Connection:Close - python

I'm trying to use mechanize to get information from a web page. It's basically succeeding in getting the first bit of information, but the web page includes a button for "Next" to get more information. I can't figure out how to programmatically get the additional information.
By using Live HTTP Headers, I can see the http request that is generated when I click the next button within a browser. It seems as if I can issue the same request using mechanize, but in the latter case, instead of getting the next page, I am redirected to the home page of the website.
Obviously, mechanize is doing something different than my browser is, but I can't figure out what. Comparing the headers, I did find one difference: the browser used
Connection: keep-alive
while mechanize used
Connection: close
I don't know if that's the culprit, but when I tried to add the header ('Connection','keep-alive'), it didn't change anything.
[UPDATE]
When I click the button for "page 2" within Firefox, the generated HTTP request is (according to Live HTTP Headers):
GET /statistics/movies/ww_load/the-fast-and-the-furious-6-2012?authenticity_token=ItU38334Qxh%2FRUW%2BhKoWk2qsPLwYKDfiNRoSuifo4ns%3D&facebook_fans_page=2&tbl=facebook_fans&authenticity_token=ItU38334Qxh%2FRUW%2BhKoWk2qsPLwYKDfiNRoSuifo4ns%3D HTTP/1.1
Host: www.boxoffice.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:18.0) Gecko/20100101 Firefox/18.0
Accept: text/javascript, text/html, application/xml, text/xml, */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
X-Requested-With: XMLHttpRequest
X-Prototype-Version: 1.6.0.3
Referer: http://www.boxoffice.com/statistics/movies/the-fast-and-the-furious-6-2012
Cookie: __utma=179025207.1680379428.1359475480.1360001752.1360005948.13; __utmz=179025207.1359475480.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __qca=P0-668235205-1359475480409; zip=13421; country_code=US; _boxoffice_session=2202c6a47fc5eb92cd0ba57ef6fbd2c8; __utmc=179025207; user_credentials=d3adbc6ecf16c038fcbff11779ad16f528db8ebd470befeba69c38b8a107c38e9003c7977e32c28bfe3955909ddbf4034b9cc396dac4615a719eb47f49cc9eac%3A%3A15212; __utmb=179025207.2.10.1360005948
Connection: keep-alive
When I try to request the same url within mechanize, it looks like this:
GET /statistics/movies/ww_load/the-fast-and-the-furious-6-2012?facebook_fans_page=2&tbl=facebook_fans&authenticity_token=ZYcZzBHD3JPlupj%2F%2FYf4dQ42Kx9ZBW1gDCBuJ0xX8X4%3D HTTP/1.1
Accept-Encoding: identity
Host: www.boxoffice.com
Accept: text/javascript, text/html, application/xml, text/xml, */*
Keep-Alive: 115
Connection: close
Cookie: _boxoffice_session=ced53a0ca10caa9757fd56cd89f9983e; country_code=US; zip=13421; user_credentials=d3adbc6ecf16c038fcbff11779ad16f528db8ebd470befeba69c38b8a107c38e9003c7977e32c28bfe3955909ddbf4034b9cc396dac4615a719eb47f49cc9eac%3A%3A15212
Referer: http://www.boxoffice.com/statistics/movies/the-fast-and-the-furious-6-2012
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1
--
Daryl

The server was checking X-Requested-With and/or X-Prototype-Version, so adding those two headers to the mechanize request fixed it.
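A minimal sketch of that fix, assuming mechanize is installed. The header values are copied from the Firefox capture in the question; the helper names make_ajax_browser and fetch_page are only illustrative, and the URL (including a current authenticity_token) has to be supplied by the caller.

```python
# Headers that the browser's XMLHttpRequest sent but the plain mechanize request did not.
AJAX_HEADERS = [
    ("X-Requested-With", "XMLHttpRequest"),
    ("X-Prototype-Version", "1.6.0.3"),
    ("Accept", "text/javascript, text/html, application/xml, text/xml, */*"),
]


def make_ajax_browser():
    """Return a mechanize.Browser that sends the AJAX headers on every request."""
    import mechanize  # third-party; imported lazily so the header list is usable without it

    br = mechanize.Browser()
    # addheaders is a list of (name, value) tuples; keep the defaults and append ours.
    br.addheaders = list(br.addheaders) + AJAX_HEADERS
    return br


def fetch_page(br, url, referer):
    """Fetch one page of results; url and referer are supplied by the caller."""
    import mechanize

    request = mechanize.Request(url, headers={"Referer": referer})
    return br.open(request).read()
```

With these headers the request looks like the one the page's own JavaScript issues, which is presumably why the server stops redirecting to the home page.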

Maybe a little late with an answer, but I fixed this by adding a line in _urllib2_forked.py.
On line 1098 there is the line: headers["Connection"] = "Close"
Change this to:
if 'Connection' not in headers:
    headers["Connection"] = "Close"
and make sure you set the header in your script, and it will work.
Gr. Squandor

Related

How exactly do I pass a POST body parameter using python requests

I'm trying to provide an "mfa-code" parameter to a post request using python requests, but the response I'm getting is that the parameter "mfa-code" is missing, even though I try and provide it via requests.post(url, data={"mfa-code": "0000"}) and also tried requests.post(url, json={"mfa-code": "0000"}).
Here is what I'm trying to send.
POST /login2 HTTP/1.1
Host: redacted.net
Cookie: session=qexMWyQnLtSlBI8B005qnVW4OYvEwEV2; verify=wiener
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: https://redacted.net/login2
Content-Type: application/x-www-form-urlencoded
Content-Length: 13
Origin: https://redacted.net
Upgrade-Insecure-Requests: 1
Dnt: 1
Sec-Gpc: 1
Te: trailers
Connection: close
mfa-code=0000
And this is the request I'm sending with my python script
import requests
# Full URL (scheme and path) taken from the captured request above
url = "https://redacted.net/login2"
data = {"mfa-code": "0000"}
r = requests.post(url, data=data)
print(r.text)
This results in a response only stating:
"Missing parameter 'mfa-code'"
I noticed that the response surrounds mfa-code with single quotes, so I went to Burp Repeater, put single quotes around mfa-code, and sure enough received the same error.
I then tried other options like json=json.dumps(data), but with the same result, since the request needs a POST body parameter of the form variable=data and not a JSON object.
What am I missing here?
Or is this something python requests cannot do?
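For what it's worth, a dict passed as data= is form-encoded, so the body itself should already match the capture. The sketch below checks that with the standard library's urlencode, which produces the same encoding. The post_mfa_code helper is hypothetical and rests on an assumption the question does not confirm: that the server also needs the session and verify cookies from the captured request, which the script never sends.

```python
from urllib.parse import urlencode

LOGIN_URL = "https://redacted.net/login2"  # scheme and path taken from the captured request
FORM = {"mfa-code": "0000"}

# requests form-encodes a dict passed as data= the same way urlencode does,
# so this is the body that a data= call puts on the wire.
body = urlencode(FORM)
print(body, len(body))  # mfa-code=0000 13


def post_mfa_code(session_cookie, verify_cookie):
    """Send the form together with the cookies seen in the capture (values are caller-supplied)."""
    import requests  # third-party

    cookies = {"session": session_cookie, "verify": verify_cookie}
    return requests.post(LOGIN_URL, data=FORM, cookies=cookies)
```

The 13-byte body agrees with the Content-Length: 13 in the capture, which suggests looking at the URL and cookies rather than at the encoding of the parameter.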

HTTPS - How to actually decrypt client request

I am using the socket library to handle HTTP requests, waiting on port 80 for connections (the port does not really matter right now). This works fine, as all requests have the following format:
b"""GET / HTTP/1.1
Host: localhost:8000
Connection: keep-alive
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36 OPR/70.0.3728.189
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br
Accept-Language: el-GR,el;q=0.9"""
If you open port 443 or just use https in any browser, the data is encrypted when a request is made. But how can you actually decrypt the data and interact with the client? I've seen many posts about this, but no one explains how the data can actually be decrypted. The data that is received always looks something like this and always starts the same way, with the bytes 0x16 and 0x03:
b'\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03\xfb\'\xa3\xa5\xa4\x1cf\xd1w~(L\xb5%0,\xfb\xa57\xf4\x92\x03}\x84xCIA\xd9}]2 \x15ID\xafU\xb6\xe3\x9d\xbdr\x93 L\x98\rD\xca\xa7\x11\x89\x00`Q\xf5\th\xde\x85S\xf8Q\x98\x00"jj\x13\x03\x13\x01\x13\x02\xcc\xa9\xcc\xa8\xc0+\xc0/\xc0,\xc00\xc0\x13\xc0\x14\x00\x9c\x00\x9d\x00/\x005\x00\n\x01\x00\x01\x91ZZ\x00\x00\x00\x00\x00\x0e\x00\x0c\x00\x00\tlocalhost\x00\x17\x00\x00\xff\x01\x00\x01\x00\x00\n\x00\n\x00\x08\x9a\x9a\x00\x1d\x00\x17\x00\x18\x00\x0b\x00\x02\x01\x00\x00#\x00\x00\x00\x10\x00\x0e\x00\x0c\x02h2\x08http/1.1\x00\x05\x00\x05\x01\x00\x00\x00\x00\x00\r\x00\x14\x00\x12\x04\x03\x08\x04\x04\x01\x05\x03\x08\x05\x05\x01\x08\x06\x06\x01\x02\x01\x00\x12\x00\x00\x003\x00+\x00)\x9a\x9a\x00\x01\x00\x00\x1d\x00 \xa5\x81S\xec\xf4I_\x08\xd2\n\xa6\xb5\xf6E\x9dE\xe6ha\xe7\xfdy\xdab=\xf4\xd3\x1b`V\x94F\x00-\x00\x02\x01\x01\x00+\x00\x0b\nZZ\x03\x04\x03\x03\x03\x02\x03\x01\x00\x1b\x00\x03\x02\x00\x02\xea\xea\x00\x01\x00\x00\x15\x00\xcf\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
My question is how I can bring the HTTPS data into a form like the above. I've read about some specific handshake procedures, but I could not find an answer that says exactly what to do. Of course, I am only asking for development purposes.
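The bytes shown above are a TLS handshake record (0x16 is the handshake content type, 0x03 the protocol major version), not an encrypted HTTP request, so there is nothing to decrypt until the handshake has completed. One way to get plaintext, sketched below, is to let Python's ssl module perform the handshake by wrapping the accepted socket. The certificate and key file paths, the port, and the function names are assumptions for illustration; a self-signed certificate is enough for development.

```python
import socket
import ssl


def looks_like_tls_handshake(data):
    """True if data starts with a TLS handshake record (type 0x16, major version 0x03)."""
    return len(data) >= 2 and data[0] == 0x16 and data[1] == 0x03


def serve_once(certfile, keyfile, host="localhost", port=8443):
    """Accept one HTTPS connection and return the decrypted request bytes."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    with socket.create_server((host, port)) as listener:
        conn, _addr = listener.accept()
        # wrap_socket runs the TLS handshake; recv() afterwards yields plaintext HTTP.
        with context.wrap_socket(conn, server_side=True) as tls_conn:
            request = tls_conn.recv(65536)
            tls_conn.sendall(
                b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok"
            )
            return request
```

The value returned by serve_once should then look like the plain GET request shown earlier, because the ssl layer handles the key exchange and decryption.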

What is the purpose of the header element 'x-instagram-ajax' in API calls via Python to Instagram?

On the instagram login page, if one inspects the element of the POST call for the url 'https://www.instagram.com/accounts/web_create_ajax/', it lists the following as headers:
Host: www.instagram.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:61.0) Gecko/20100101 Firefox/61.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://www.instagram.com/
X-CSRFToken: 7dmO9F3JuVGvSXumd79yByPxnHoWHz1A
X-Instagram-AJAX: c2d8f4380025
Content-Type: application/x-www-form-urlencoded
X-Requested-With: XMLHttpRequest
Content-Length: 102
Cookie: csrftoken=7dmO9F3JuVGvSXumd79yByPxnHoWHz1A; mid=W30zsQAEAAErXHJ3iUojfTceCd53; mcd=3; csrftoken=7dmO9F3JuVGvSXumd79yByPxnHoWHz1A; rur=FTW
Connection: keep-alive
I am wondering if anyone would have any idea what X-Instagram-AJAX is and how I can generate it each time. Is it connected as a pair with X-CSRFToken? Thanks.
Follow, like, and similar requests work without this header. I don't know exactly what it is, but I think Instagram uses it to detect suspicious requests and then logs them. You can get the x-instagram-ajax value from any page on Instagram.
You can parse it and use it.

Authenticate to router panel with Python

I am trying to log into my router's panel using python, but the problem is that I have no idea what the protocol for doing that is. I tried using Wireshark to find out, but it just shows just a GET request and a response. I tried logging in to the router and then searching the username and password in the packets, but it didn't find it. (My guess is that it's encrypted)
If anyone could help me with the protocol of logging in to the panel, it would be greatly appreciated.
Found it. Following the TCP stream gave me the following:
GET / HTTP/1.1
Host: 10.0.0.138
Connection: keep-alive
Cache-Control: max-age=0
Authorization: Basic UG90YXRvOg==
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,he;q=0.6
HTTP/1.0 401 Unauthorized
WWW-Authenticate: Basic realm="NETGEAR DGN2200v2BEZEQ"
Content-type: text/html
<html>
<head><title>401 Unauthorized</title></head>
<body><h1>401 Unauthorized</h1>
<p>Access to this resource is denied, your client has not supplied the correct authentication.</p></body>
</html>
The username and password are encoded in base64 in the format of username:password.
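As a small illustration, the Authorization value from the capture can be decoded and rebuilt with the standard library. The function names below are only illustrative.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def decode_basic_auth(header_value):
    """Split a 'Basic <token>' header value back into (username, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


# Value copied from the captured request above
print(decode_basic_auth("Basic UG90YXRvOg=="))  # ('Potato', '')
```

Base64 is an encoding, not encryption, which is why searching the capture for the plain-text username found nothing even though the credentials were sent in the clear.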

Openstack Swift logging for temp url and Cross domain

We have our own private cloud where I have installed OpenStack Swift. I have a working node (proxy and storage) that allows me to store and retrieve if I use the openstack and swift python cli to store and retrieve files. Additionally I am able to use the python API on a remote machine to store and retrieve files.
The root of my question is how to debug temp url and crossdomain filter issues. Is there a way to turn on detailed debug logging for these filters?
I have the default logging set to
log_name = swift
log_facility = LOG_LOCAL0
log_level = DEBUG
The situation I am trying to troubleshoot is as follows. When I try to use temp URL and cross domain (for CORS), I get a 401. I debugged the code and it appears to be an invalid HMAC error. Based on research, this appears to be a date/time issue where the client and the server have mismatched times. However, both are running the ntpd service, so the times should be in sync.
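One way to narrow down an invalid HMAC is to recompute the signature by hand and compare it with the temp_url_sig in the request. The sketch below assumes the HMAC-SHA1 scheme described in Swift's tempurl middleware documentation, where the signed text is the method, the expiry time, and the object path joined by newlines (the 40-character signature in the requests below is consistent with SHA-1). The key is a hypothetical placeholder; the path and expiry are copied from those requests.

```python
import hmac
from hashlib import sha1


def temp_url_sig(key, method, expires, path):
    """Compute a temp URL signature over 'METHOD\\nEXPIRES\\nPATH' with HMAC-SHA1."""
    hmac_body = f"{method}\n{expires}\n{path}"
    return hmac.new(key.encode("utf-8"), hmac_body.encode("utf-8"), sha1).hexdigest()


PATH = "/v1/AUTH_99cf99f26aaa4b2c923806231b03334c/436/88b6d895-6dbf-4f29-904d-96c9b7959016"
EXPIRES = 1441508891
KEY = "example-temp-url-key"  # hypothetical; use the account's X-Account-Meta-Temp-URL-Key

put_sig = temp_url_sig(KEY, "PUT", EXPIRES, PATH)
get_sig = temp_url_sig(KEY, "GET", EXPIRES, PATH)
print(put_sig != get_sig)  # True: the signature depends on the HTTP method
```

Because the method is part of the signed text, a signature generated for GET will not validate for a PUT, so it may be worth checking which method the URL was signed for, in addition to the clocks.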
For CORS, it appears that the preflight OPTIONS request is succeeding. The subsequent PUT is failing with a 401:
"No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://blahost' is therefore not allowed access. The response had HTTP status code 401."
The strange part is that the OPTIONS response returns "access-control-allow-origin" instead of 'Access-Control-Allow-Origin'; the case is off.
Preflight request:
OPTIONS /v1/AUTH_99cf99f26aaa4b2c923806231b03334c/436/88b6d895-6dbf-4f29-904d-96c9b7959016?temp_url_sig=4a953c34372e37b2a22bb31fb0581a7eb7f02cee&temp_url_expires=1441508891 HTTP/1.1
Host: 23.253.200.41:8080
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Access-Control-Request-Method: PUT
Origin: http://blahhost
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36
Access-Control-Request-Headers: accept, content-type
Accept: */*
Referer: http://blahost/binder/436/site/419/folder/17560/file
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
Preflight Response:
HTTP/1.1 200 OK
access-control-allow-origin: http://blahost
access-control-allow-methods: HEAD, GET, PUT, POST, COPY, OPTIONS, DELETE
access-control-allow-headers: content-type, accept
Allow: HEAD, GET, PUT, POST, COPY, OPTIONS, DELETE
Content-Length: 0
X-Trans-Id: tx9e359777dfb94148858cd-0055eba012
Date: Sun, 06 Sep 2015 02:08:18 GMT
Connection: keep-alive
Subsequent PUT request (notice it is missing the Access-Control-Allow-Origin):
PUT /v1/AUTH_99cf99f26aaa4b2c923806231b03334c/436/88b6d895-6dbf-4f29-904d-96c9b7959016?temp_url_sig=4a953c34372e37b2a22bb31fb0581a7eb7f02cee&temp_url_expires=1441508891 HTTP/1.1
Host: 23.253.200.41:8080
Connection: keep-alive
Content-Length: 16231
Pragma: no-cache
Cache-Control: no-cache
Accept: */*
Origin: http://blahost
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36
Content-Type: application/pdf
Referer: http://blahost/binder/436/site/419/folder/17560/file
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
I would appreciate any advice on how to troubleshoot.
Thanks
Greg
