Apparently fine http request results malformed when sent over socket - python

I'm working with socket operations and have coded a basic interception proxy in python. It works fine, but some hosts return 400 bad request responses.
These requests do not look malformed though. Here's one:
GET http://www.baltour.it/ HTTP/1.1
Host: www.baltour.it
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Same request, raw:
GET http://www.baltour.it/ HTTP/1.1\r\nHost: www.baltour.it\r\nUser-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate\r\nConnection: keep-alive\r\n\r\n
The code I use to send the request is the most basic socket operation (though I don't think the problem lies there, it works fine with most hosts)
socket_client.send(request_raw)
while socket_client.recv is used to get the response (but no problems here, the response is well-formed, though its status is 400).
Any ideas?

When not talking to a proxy, you are not supposed to put the http://hostname part in the HTTP header; see section 5.1.2 of the HTTP 1.1 RFC 2616 spec:
The most common form of Request-URI is that used to identify a resource on an origin server or gateway. In this case the absolute path of the URI MUST be transmitted (see section 3.2.1, abs_path) as the Request-URI, and the network location of the URI (authority) MUST be transmitted in a Host header field.
(emphasis mine); abs_path is the absolute path part of the request URI, not the full absolute URI itself.
E.g. the server expects you to send:
GET / HTTP/1.1
Host: www.baltour.it
A receiving server should be tolerant of the incorrect behaviour, however. The server seems to violate the RFC as well here too. Further on in the same section it reads:
To allow for transition to absoluteURIs in all requests in future versions of HTTP, all HTTP/1.1 servers MUST accept the absoluteURI form in requests, even though HTTP/1.1 clients will only generate them in requests to proxies.

Related

Question regarding Python HTTP GET request using sockets

Was wondering why I am getting a 408 request timeout when sending an HTTP GET request using sockets. I just copied the GET request that was sent through Chrome and then pasted it into python figuring that I would get a 200 response, but clearly, I am missing something.
def GET():
headers = ("""GET / HTTP/1.1\r
Host: {insert host here}\r
Connection: close\r
Cache-Control: max-age=0\r
DNT: 1\r
Upgrade-Insecure-Requests: 1\r
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36\r
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r
Accept-Encoding: gzip, deflate\r
Accept-Language: en-US,en;q=0.9\r
Cookie: accept_cookies=1\r\n""").encode('ascii')
payload = headers
return payload
def activity1():
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((HOST, PORT))
user = GET()
sock.sendall(user)
poop = sock.recv(10000)
print(poop)
sock.close()
Assuming the hostname and port are defined correctly is there anything wrong with this request that would cause it to timeout? Thanks.
The initial problem is that the HTTP header is not properly finished, i.e. it is missing the final \r\n (empty line). Once this is done you will likely run into multiple other problems, like:
You are assuming that everything can be read within a single recv, which will only be true for short answers.
You likely assume that the body is a single byte buffer. But it can be transferred in chunks since HTTP/1.1 support this Transfer-Encoding.
You likely assume that the body is in plain. But it can be compressed since you explicitly accept gzip-compressed responses.
HTTP is not the simple protocol as it might look. Please read the actual standard before implementing it, see RFC 7230. Or just use a library which does the hard work for you.

HTTPS - How to actually decrypt client request

I am using the socket library for handling http requests waiting on port 80 for connections (does not really matter right now), which works fine as all responses follow the following format
b"""GET / HTTP/1.1
Host: localhost:8000
Connection: keep-alive
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36 OPR/70.0.3728.189
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br
Accept-Language: el-GR,el;q=0.9"""
if you open port 443 or just use https in any browser, when a request is made the data is encrypted. But how can you actually decrypt the data and interact with the client? I've seen many posts about this but no one explains how the data can actually be decrypted. The data that is received always looks something like this and starts the same way with 0x16 and 0x03 bytes
b'\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03\xfb\'\xa3\xa5\xa4\x1cf\xd1w~(L\xb5%0,\xfb\xa57\xf4\x92\x03}\x84xCIA\xd9}]2 \x15ID\xafU\xb6\xe3\x9d\xbdr\x93 L\x98\rD\xca\xa7\x11\x89\x00`Q\xf5\th\xde\x85S\xf8Q\x98\x00"jj\x13\x03\x13\x01\x13\x02\xcc\xa9\xcc\xa8\xc0+\xc0/\xc0,\xc00\xc0\x13\xc0\x14\x00\x9c\x00\x9d\x00/\x005\x00\n\x01\x00\x01\x91ZZ\x00\x00\x00\x00\x00\x0e\x00\x0c\x00\x00\tlocalhost\x00\x17\x00\x00\xff\x01\x00\x01\x00\x00\n\x00\n\x00\x08\x9a\x9a\x00\x1d\x00\x17\x00\x18\x00\x0b\x00\x02\x01\x00\x00#\x00\x00\x00\x10\x00\x0e\x00\x0c\x02h2\x08http/1.1\x00\x05\x00\x05\x01\x00\x00\x00\x00\x00\r\x00\x14\x00\x12\x04\x03\x08\x04\x04\x01\x05\x03\x08\x05\x05\x01\x08\x06\x06\x01\x02\x01\x00\x12\x00\x00\x003\x00+\x00)\x9a\x9a\x00\x01\x00\x00\x1d\x00 \xa5\x81S\xec\xf4I_\x08\xd2\n\xa6\xb5\xf6E\x9dE\xe6ha\xe7\xfdy\xdab=\xf4\xd3\x1b`V\x94F\x00-\x00\x02\x01\x01\x00+\x00\x0b\nZZ\x03\x04\x03\x03\x03\x02\x03\x01\x00\x1b\x00\x03\x02\x00\x02\xea\xea\x00\x01\x00\x00\x15\x00\xcf\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
My question is how can I bring the HTTPS data into a form like the above. I've read about some specific handshake procedures but I could not find something that just answsers telling exactly what to do. Of course I am only asking for development purposes.

Python socket - downloading files only work in chrome

So I created a code which a client uploads a file to the server folder and he has an option to download it back, it works perfectly fine in chrome, I click on the item I want to download and it downloads it
def send_image(request, cs):
request = request.split('=')
try:
name = request[1]
except:
name = request[0]
print('using send_iamge!')
print('Na ' + name)
path = 'C:\\Users\\x\\Desktop\\webroot\\uploads' + '\\file-name=' + name
print(path)
with open(path, 'rb') as re:
print('exist!')
read = re.read()
cs.send(read)
the code above reads the file that you choose and sends the data as bytes to the client back.
In chrome, it downloads the file as I showed you already but in for example internet explorer, it just prints the data to the client and doesn't download it The real question is why doesn't it just prints the data in chrome, why does it download it and doesn't print it as internet explorer does and how can I fix it?(for your info: all the files that I download have the name file-name before them that's why I put it there)
http request:
UPDATE:
POST /upload?file-name=Screenshot_2.png HTTP/1.1
Host: 127.0.0.1
Connection: keep-alive
Content-Length: 3534
Accept: */*
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36
Content-Type: application/octet-stream
Origin: http://127.0.0.1
Sec-Fetch-Site: same-origin
Sec-Fetch-Mode: cors
Referer: http://127.0.0.1/
Accept-Encoding: gzip, deflate, br
Accept-Language: en-GB,en;q=0.9,en-US;q=0.8,he;q=0.7
It looks like that you don't send a HTTP/1 response but a HTTP/0.9 response (Note that I'm talking about the response send from the server not the request send from the client). A HTTP/1 response consists of a HTTP header and a HTTP body, similar to how a HTTP request is constructed. A HTTP/0.9 response instead only consists of the actual body, i.e. no header and thus no meta information in the header which tell the browser what to do with the body.
HTTP/0.9 is obsolete for 25 years but some browsers still support it. When a browser gets a HTTP/0.9 request it could anything with it since there is no defined meaning from the HTTP header. Browsers might try to interpret is as HTML, as plain text, offer it for download, refuse it in total ... - whatever.
The way to fix the problem is to send an actual HTTP response header before sending the body, i.e. something like this
cs.send("HTTP/1.0 200 ok\r\nContent-type: application/octet-stream\r\n\r\n")
with open(path, 'rb') as re:
...
cs.send(read)
In any case: HTTP is way more complex than you might think. There are established libraries to deal with this complexity. If you insist on not using any library please study the standard in order to avoid such problems.

Python Mechanize Prevent Connection:Close

I'm trying to use mechanize to get information from a web page. It's basically succeeding in getting the first bit of information, but the web page includes a button for "Next" to get more information. I can't figure out how to programmatically get the additional information.
By using Live HTTP Headers, I can see the http request that is generated when I click the next button within a browser. It seems as if I can issue the same request using mechanize, but in the latter case, instead of getting the next page, I am redirected to the home page of the website.
Obviously, mechanize is doing something different than my browser is, but I can't figure out what. In comparing the headers, I did find one difference, which was the browser used
Connection: keep-alive
while mechanize used
Connection: close
I don't know if that's the culprit, but when I tried to add the header ('Connection','keep-alive'), it didn't change anything.
[UPDATE]
When I click the button for "page 2" within Firefox, the generated http is (according to Live HTTP Headers):
GET /statistics/movies/ww_load/the-fast-and-the-furious-6-2012?authenticity_token=ItU38334Qxh%2FRUW%2BhKoWk2qsPLwYKDfiNRoSuifo4ns%3D&facebook_fans_page=2&tbl=facebook_fans&authenticity_token=ItU38334Qxh%2FRUW%2BhKoWk2qsPLwYKDfiNRoSuifo4ns%3D HTTP/1.1
Host: www.boxoffice.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:18.0) Gecko/20100101 Firefox/18.0
Accept: text/javascript, text/html, application/xml, text/xml, */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
X-Requested-With: XMLHttpRequest
X-Prototype-Version: 1.6.0.3
Referer: http://www.boxoffice.com/statistics/movies/the-fast-and-the-furious-6-2012
Cookie: __utma=179025207.1680379428.1359475480.1360001752.1360005948.13; __utmz=179025207.1359475480.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __qca=P0-668235205-1359475480409; zip=13421; country_code=US; _boxoffice_session=2202c6a47fc5eb92cd0ba57ef6fbd2c8; __utmc=179025207; user_credentials=d3adbc6ecf16c038fcbff11779ad16f528db8ebd470befeba69c38b8a107c38e9003c7977e32c28bfe3955909ddbf4034b9cc396dac4615a719eb47f49cc9eac%3A%3A15212; __utmb=179025207.2.10.1360005948
Connection: keep-alive
When I try to request the same url within mechanize, it looks like this:
GET /statistics/movies/ww_load/the-fast-and-the-furious-6-2012?facebook_fans_page=2&tbl=facebook_fans&authenticity_token=ZYcZzBHD3JPlupj%2F%2FYf4dQ42Kx9ZBW1gDCBuJ0xX8X4%3D HTTP/1.1
Accept-Encoding: identity
Host: www.boxoffice.com
Accept: text/javascript, text/html, application/xml, text/xml, */*
Keep-Alive: 115
Connection: close
Cookie: _boxoffice_session=ced53a0ca10caa9757fd56cd89f9983e; country_code=US; zip=13421; user_credentials=d3adbc6ecf16c038fcbff11779ad16f528db8ebd470befeba69c38b8a107c38e9003c7977e32c28bfe3955909ddbf4034b9cc396dac4615a719eb47f49cc9eac%3A%3A15212
Referer: http://www.boxoffice.com/statistics/movies/the-fast-and-the-furious-6-2012
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1
--
Daryl
The server was checking X-Requested-With and/or X-Prototype-Version, so adding those two headers to the mechanize request fixed it.
Maybe a little late with an answer but i fixed this by adding an line in _urllib2_forked.py
on line 1098 stands the line: headers["Connection"] = "Close"
Change this to:
if not 'Connection' in headers:
headers["Connection"] = "Close"
and make sure you set the header in you script and it will work.
Gr. Squandor

How to accept the GET request in Python socket application?

I am writing a very basic web server as a homework assignment and I have it running on localhost port 14000. When I browser to localhost:14000, the server sends back an HTML page with a form on it (the form's action is the same address - localhost:14000, not sure if that's proper or not).
Basically I want to be able to gather the data from the GET request once the page reloads after the submit - how can I do this? How can i access the stuff in the GET in general?
NOTE: I already tried socket.recv(xxx), that doesn't work if the page is being loaded first time - in that case we are not "receiving" anything from the client so it just keeps spinning.
The secret lies in conn.recv which will give you the headers sent by the browser/client of the request. If they look like the one I generated with safari you can easily parse them (even without a complex regex pattern).
data = conn.recv(1024)
#Parse headers
"""
data will now be something like this:
GET /?banana=True HTTP/1.1
Host: localhost:50008
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/534.53.11 (KHTML, like Gecko) Version/5.1.3 Safari/534.53.10
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: keep-alive
"""
#A simple parsing of the get data would be:
GET={i.split("=")[0]:i.split("=")[1] for i in data.split("\n")[0].split(" ")[1][2:].split("&")}

Categories

Resources