MITM Proxy, getting entire request and response string - python

I am using mitmproxy to intercept traffic. What I want is to get the entire request and the entire response as strings. I know that there is a def response(context, flow) event and that the HTTPFlow object has request and response objects. What I want is simply something like this as a string:
GET http://www.google-analytics.com/collect?v=1& HTTP/1.1
Header 1: value
Header 2: value
request body
and this
HTTP/1.1 301 Moved Permanently
Header 1: value
Header 2: value
response body
Now I've been trying this by joining the different parts and bits of the requests and responses, but that is prone to errors. Is there a better way to do this?
Also, does mitmproxy handle gzip-encoded response bodies?

If someone bumps into this: the answer below does not work for mitmproxy 4. Instead, one can use this:
from mitmproxy.net.http.http1.assemble import assemble_request

def response(flow):
    # Reassemble the raw HTTP/1 request: request line, headers, and body
    print(assemble_request(flow.request).decode('utf-8'))
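The same module also has a counterpart for the response side; a minimal sketch, assuming assemble_response is available in your mitmproxy version:

from mitmproxy.net.http.http1.assemble import assemble_request, assemble_response

def response(flow):
    # Print both sides of the exchange as raw HTTP/1 strings
    print(assemble_request(flow.request).decode('utf-8'))
    print(assemble_response(flow.response).decode('utf-8'))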

You can get the whole request/response object as a string using flow.request.assemble(). If you want the request/response without transfer-encoding (gzip), you can use the decoded decorator:
from libmproxy.protocol.http import decoded

with decoded(flow.request):
    # Inside this block the body is transparently decoded (e.g. gzip removed)
    data = flow.request.assemble()
Apart from that, you may find https://github.com/mitmproxy/mitmproxy/tree/master/examples very useful.

Related

Python request gives 415 error while post data unless data = {'key':'value'}

I am making a straightforward request as follows.
import requests

def user_transactions():
    url = 'https://webapi.coinfloor.co.uk/v2/bist/XBT/GBP/user_transactions/'
    data = {'key': 'value'}
    r = requests.post(url, data=data, auth=("some_username", "some_password"))
    print(r.status_code)
    print(r.text)
    return
Even though data= is optional according to the documentation:
https://www.w3schools.com/python/ref_requests_post.asp
If I comment out the data variable, the routine returns a status_code=415 error.
If I include the data variable, the routine returns status_code=200 (success).
I have tried to look this up, for example here:
Python request gives 415 error while post data, but with no answer.
The question is: why does the request fail without data but succeed with it?
Yes, data is optional on the Python side. The requests library will happily send an empty request to the server, as you can see. If the argument were not optional, the program would crash before sending a request, so there would be no status code.
However, the server needs to be able to process the request. If it does not like what you sent, for whatever reason, it might send back a 4xx status code or otherwise not do what you expect.
In this case, it throws an error saying the data is in an invalid format. How can an empty request be in an invalid format? Because the format is specified in a header. If you supply a data argument, requests will send the data in urlencoded format and specify in the Content-Type header what format the data is in. If the data is empty, the request body will be empty but the header will still be there. This site apparently requires the header to specify a data format it knows.
You can solve this in two ways. Either give an empty object:
r = requests.post(url, data={}, auth=("some_username", "some_password") )
Or explicitly specify the header:
r = requests.post(url, auth=(...), headers={'Content-Type': 'application/x-www-form-urlencoded'})
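To see exactly which headers requests will send in each case, you can inspect a prepared request before it goes out; a minimal sketch (the URL is hypothetical):

import requests

req = requests.Request('POST', 'https://example.com/endpoint',
                       data={'key': 'value'},
                       auth=('some_username', 'some_password'))
prepared = req.prepare()
# With form data, requests sets Content-Type: application/x-www-form-urlencoded;
# with no data, the header is absent
print(prepared.headers.get('Content-Type'))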
Side note: You should not be using W3Schools as a source. It is frequently inaccurate and often recommends bad practices.
I think you are confusing the documentation of the requests.post function signature with the API's documentation. It is saying that data is an optional keyword argument of the function, not that the API optionally takes data.
It depends on the API endpoint you are trying to use. That endpoint must require data to be sent with the request. If you look at the documentation for the API you are using, it will mention what needs to be sent for a valid request.

Escaping string in json dictionary python request

So I've got a python application that is using requests.post to make a post request with json headers, body info, etc.
The problem is that in my dictionary that gets sent as headers, I have a variable that often contains character groups like "%25" or "%2F". I've seen this cause problems before when sent in body data, but that can be fixed by sending the body data as a string rather than a dictionary. I haven't figured out how to make this work with the headers, though, as you can't simply delimit the parameters with an ampersand like in body data.
How do I make sure that my cookie value is not altered in the process of the post request?
For instance, headers :
Host : blahblah.com
Connection : Keep-Alive
Cookie : My sensitive string with special characters
etc.
Note : Nothing server-side can be changed. The python application is being used for hired pentesting services.
A common technique for sending data that gets mangled in transit is to encode it, for example as base64.
Sender:
import base64
...
# b64encode takes bytes; encode()/decode() keep the value a str in Python 3
encoded_data = "base64:{}".format(base64.b64encode(data.encode()).decode())
Receiver:
import base64
...
if encoded_data.startswith("base64:"):
    # split on the first colon only, to strip just the "base64:" prefix
    data = base64.b64decode(encoded_data.split(':', 1)[1]).decode()
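Applied to the Cookie header from the question, a minimal sketch (the URL and cookie name are hypothetical, and it assumes whatever reads the cookie performs the matching decode):

import base64
import requests

sensitive = 'My sensitive string with %25 and %2F'
# base64 output contains only A-Z, a-z, 0-9, +, / and =, all safe in a header
cookie_value = "base64:{}".format(base64.b64encode(sensitive.encode()).decode())
r = requests.post('https://blahblah.com/endpoint',
                  headers={'Cookie': 'session={}'.format(cookie_value)})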

Receiving 500 HTTP response when posting to website

I am attempting to extract some information from a website that requires a POST to an ajax script.
I am trying to create an automated script, but I consistently run into an HTTP 500 error, in contrast to a different data pull I did earlier. Here is the code:
import requests

url = 'http://www.ise.com/ExchangeDataService.asmx/Get_ISE_Dividend_Volume_Data/'
paramList = ''
paramList += '"' + 'dtStartDate' + '":07/25/2014"'
paramList += ','
paramList += '"' + 'dtEndDate' + '":07/25/2014"'
paramList = '{' + paramList + '}'
response = requests.post(url, headers={
    'Content-Type': 'application/json; charset=UTF-8',
    'data': paramList,
    'dataType': 'json'
})
I was wondering if anyone had any recommendations as to what is happening. This isn't proprietary data, as they allow you to download it manually in Excel format.
The input you're generating is not valid JSON. It looks like this:
{"dtStartDate":07/25/2014","dtEndDate":07/25/2014"}
If you look carefully, you'll notice a missing " before the first 07.
This is one of many reasons you shouldn't be trying to generate JSON by string concatenation. Either build a dict and use json.dumps, or, if you must, use a multi-line string as a template for str.format or %.
Also, as bruno desthuilliers points out, you almost certainly want to be sending the JSON as the POST body, not as a data header in an empty POST. Doing it the wrong way does happen to work with some back-ends, but only by accident, and that's certainly not something you should be relying on. And if the server you're talking to isn't one of those back-ends, then you're sending the empty string as your JSON data, which is just as invalid.
So, why does this give you a 500 error? Probably because the backend is some messy PHP code that doesn't have an error handler for invalid JSON, so it just bails with no information on what went wrong, so the server can't do anything better than send you a generic 500 error.
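A minimal sketch of the dict-plus-json.dumps approach, assuming the endpoint expects the JSON in the POST body (the dates are taken from the question):

import json
import requests

url = 'http://www.ise.com/ExchangeDataService.asmx/Get_ISE_Dividend_Volume_Data/'
payload = {'dtStartDate': '07/25/2014', 'dtEndDate': '07/25/2014'}
response = requests.post(
    url,
    data=json.dumps(payload),  # valid JSON goes in the request body
    headers={'Content-Type': 'application/json; charset=UTF-8'},
)
print(response.status_code)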
If that's a copy/paste from your actual code, 'data' is probably not supposed to be part of the request headers. As a side note: you don't "post to an ajax script", you post to a URL. The fact that this URL is called via an asynchronous request from some JavaScript on some page of the site is totally irrelevant.
It sounds like a server error, so what you're posting could be breaking their API due to its formatting.
Or their API could be down.
http://pcsupport.about.com/od/findbyerrormessage/a/500servererror.htm

Urllib's urlopen breaking on some sites (e.g. StackApps api): returns garbage results

I'm using urllib2's urlopen function to try and get a JSON result from the StackOverflow api.
The code I'm using:
>>> import urllib2
>>> conn = urllib2.urlopen("http://api.stackoverflow.com/0.8/users/")
>>> conn.readline()
The result I'm getting:
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x04\x00\xed\xbd\x07`\x1cI\x96%&/m\xca{\x7fJ\...
I'm fairly new to urllib, but this doesn't seem like the result I should be getting. I've tried it in other places and I get what I expect (the same as visiting the address with a browser gives me: a JSON object).
Using urlopen on other sites (e.g. "http://google.com") works fine, and gives me actual html. I've also tried using urllib and it gives the same result.
I'm pretty stuck, not even knowing where to look to solve this problem. Any ideas?
That almost looks like something you would be feeding to pickle. Maybe something in the User-Agent string or Accepts header that urllib2 is sending is causing StackOverflow to send something other than JSON.
One telltale is to look at conn.headers.headers to see what the Content-Type header says.
And this question, Odd String Format Result from API Call, may have your answer. Basically, you might have to run your result through a gzip decompressor.
Double checking with this code:
>>> req = urllib2.Request("http://api.stackoverflow.com/0.8/users/",
...                       headers={'Accept-Encoding': 'gzip, identity'})
>>> conn = urllib2.urlopen(req)
>>> val = conn.read()
>>> conn.close()
>>> val[0:25]
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x04\x00\xed\xbd\x07`\x1cI\x96%&/m\xca{\x7fJ'
Yes, you are definitely getting gzip encoded data back.
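A minimal sketch of decompressing such a body in Python 2, assuming the server sets the Content-Encoding header:

import gzip
import urllib2
from StringIO import StringIO

conn = urllib2.urlopen("http://api.stackoverflow.com/0.8/users/")
raw = conn.read()
if conn.info().getheader('Content-Encoding') == 'gzip':
    # GzipFile needs a file-like object, so wrap the compressed bytes
    raw = gzip.GzipFile(fileobj=StringIO(raw)).read()
print(raw[:100])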
Since you seem to be getting different results on different machines with the same version of Python, and in general it looks like the urllib2 API would require you do something special to request gzip encoded data, my guess is that you have a transparent proxy in there someplace.
I saw a presentation by the EFF at CodeCon in 2009. They were doing end-to-end connectivity testing to discover dirty ISP tricks of various kinds. One of the things they discovered while doing this testing is that a surprising number of consumer level NAT routers add random HTTP headers or do transparent proxying. You might have some piece of equipment on your network that's adding or modifying the Accept-Encoding header in order to make your connection seem faster.

Python urllib2 Response header

I'm trying to extract the response header of a URL request. When I use firebug to analyze the response output of a URL request, it returns:
Content-Type text/html
However when I use the python code:
urllib2.urlopen(URL).info()
the resulting output returns:
Content-Type: video/x-flv
I am new to python, and to web programming in general; any helpful insight is much appreciated. Also, if more info is needed please let me know.
Thanks in advance for reading this post
Try to request as Firefox does. You can see the request headers in Firebug, so add them to your request object:
import urllib2
request = urllib2.Request('http://your.tld/...')
request.add_header('User-Agent', 'some fake agent string')
request.add_header('Referer', 'fake referrer')
...
response = urllib2.urlopen(request)
# check content type:
print response.info().getheader('Content-Type')
There's also HTTPCookieProcessor which can make it better, but I don't think you'll need it in most cases. Have a look at python's documentation:
http://docs.python.org/library/urllib2.html
Content-Type text/html
Really, like that, without the colon?
If so, that might explain it: it's an invalid header, so it gets ignored, so urllib guesses the content-type instead, by looking at the filename. If the URL happens to have ‘.flv’ at the end, it'll guess the type should be video/x-flv.
This peculiar discrepancy might be explained by different headers (maybe ones of the accept kind) being sent by the two requests -- can you check that...? Or, if Javascript is running in Firefox (which I assume you're using when you're running firebug?) -- since it's definitely NOT running in the Python case -- "all bets are off", as they say;-).
Keep in mind that a web server can return different results for the same URL based on differences in the request. For example, content-type negotiation: the requester can specify a list of content-types it will accept, and the server can return different results to try to accommodate different needs.
Also, you may be getting an error page for one of your requests, for example, because it is malformed, or you don't have cookies set that authenticate you properly, etc. Look at the response itself to see what you are getting.
According to http://docs.python.org/library/urllib2.html there is only a get_header() method and nothing about getheader.
I'm asking because your code works fine for
response.info().getheader('Set-Cookie')
but once I execute
response.info().get_header('Set-Cookie')
i get:
Traceback (most recent call last):
File "baza.py", line 11, in <module>
cookie = response.info().get_header('Set-Cookie')
AttributeError: HTTPMessage instance has no attribute 'get_header'
Edit: moreover,
response.headers.get('Set-Cookie') works fine as well, though it is not mentioned in the urllib2 docs...
For getting the raw data of the headers in Python 2, a little bit of a hack, but it works:
"".join(urllib2.urlopen("http://google.com/").info().__dict__["headers"])
Basically, "".join(list) joins the list of header lines, each of which ends in "\n".
__dict__ is a built-in attribute that exposes an object's attributes as a dict, which lets you pull the raw header list out of the response message.
And of course ["headers"] selects the list of raw header lines from that dict.
Hope this helped you learn a few easy Python tricks :)
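A slightly less hacky equivalent, assuming a Python 2 mimetools.Message response object, which exposes the same raw header lines as a .headers attribute:

import urllib2

response = urllib2.urlopen("http://google.com/")
# .headers is the same list that __dict__["headers"] digs out
print("".join(response.info().headers))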
