I'm currently writing an HTTP server in Python using sockets, and I'm working on handling GET and POST requests. My GET implementation works fine, but the body of POST requests won't show up.
Code snippet:
self.host = ''
self.port = 8080
self.listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
self.listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
self.listener.bind((self.host, self.port))
self.listener.listen(1)
while True:
    client_connection, client_address = self.listener.accept()
    request = client_connection.recv(2048)
    print(request)
This code prints the following HTTP header when the web page sends a POST request:
POST /test.txt HTTP/1.1
Host: localhost:8080
Content-Type: application/x-www-form-urlencoded
Origin: http://localhost:8080
Content-Length: 21
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/601.6.17 (KHTML, like Gecko) Version/9.1.1 Safari/601.6.17
Referer: http://localhost:8080/
Accept-Language: nb-no
Accept-Encoding: gzip, deflate
But there is no body. Why am I not receiving the HTTP body when I know it is being sent?
Thanks!
while True:
    client_connection, client_address = self.listener.accept()
    request = client_connection.recv(2048)
    print(request)
recv does not read exactly 2048 bytes; it reads up to 2048 bytes. If some data has arrived, recv returns with that data even if more is still to follow. My guess is that in your case the client sends the HTTP header first and then the body. If Nagle's algorithm is disabled on the client side (which is common), it is likely that your first recv gets only the header and that you need another recv for the body. This would explain what you see: you get the header but not the body because you never call recv again.
But even that would be too simple an implementation and will go wrong sooner or later. To do it correctly you should implement the HTTP protocol properly: first read the HTTP header, which might need multiple recv calls if the header is large. Then parse the header, determine the size of the body (Content-Length header) and read the remaining bytes.
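The steps above could be sketched roughly as follows. This is a minimal sketch, not a full HTTP implementation: the function name read_http_request and the bufsize parameter are made up for illustration, and it assumes the request carries a Content-Length header (no chunked transfer encoding).

```python
import socket

def read_http_request(conn, bufsize=2048):
    """Read one HTTP request from conn and return (header_bytes, body_bytes).

    Hypothetical helper: assumes a Content-Length body, not chunked encoding.
    """
    data = b""
    # Keep receiving until the empty line that ends the header has arrived.
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(bufsize)
        if not chunk:  # peer closed before the header was complete
            return data, b""
        data += chunk
    header, _, body = data.partition(b"\r\n\r\n")

    # Look for Content-Length (header names are case-insensitive); default 0.
    content_length = 0
    for line in header.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            content_length = int(value.strip())

    # Part of the body may have arrived together with the header.
    # Receive again until the announced number of bytes is there.
    while len(body) < content_length:
        chunk = conn.recv(bufsize)
        if not chunk:
            break
        body += chunk
    return header, body[:content_length]
```

In the loop from the question you would call read_http_request(client_connection) instead of a single recv(2048).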
Related
I was wondering why I am getting a 408 Request Timeout when sending an HTTP GET request using sockets. I just copied the GET request that was sent by Chrome and pasted it into Python, figuring that I would get a 200 response, but clearly I am missing something.
def GET():
    headers = ("""GET / HTTP/1.1\r
Host: {insert host here}\r
Connection: close\r
Cache-Control: max-age=0\r
DNT: 1\r
Upgrade-Insecure-Requests: 1\r
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36\r
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r
Accept-Encoding: gzip, deflate\r
Accept-Language: en-US,en;q=0.9\r
Cookie: accept_cookies=1\r\n""").encode('ascii')
    payload = headers
    return payload
def activity1():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((HOST, PORT))
    user = GET()
    sock.sendall(user)
    poop = sock.recv(10000)
    print(poop)
    sock.close()
Assuming the hostname and port are defined correctly, is there anything wrong with this request that would cause it to time out? Thanks.
The initial problem is that the HTTP header is not properly finished, i.e. it is missing the final \r\n (empty line). Once this is done you will likely run into multiple other problems, like:
You are assuming that everything can be read within a single recv, which will only be true for short responses.
You likely assume that the body arrives as a single byte buffer. But it can be transferred in chunks, since HTTP/1.1 supports Transfer-Encoding: chunked.
You likely assume that the body is plain text. But it can be compressed, since you explicitly accept gzip-compressed responses.
HTTP is not as simple a protocol as it might look. Please read the actual standard before implementing it, see RFC 7230. Or just use a library which does the hard work for you.
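To give an idea of what the last two points involve, here is a rough sketch of decoding a response that has already been read completely (for example by calling recv in a loop until it returns an empty result). The function names split_response, dechunk and decode_body are made up for this illustration. The sketch ignores chunk extensions, trailers and the deflate encoding, so treat it as an outline rather than a complete implementation.

```python
import gzip

def split_response(raw):
    """Split raw response bytes into (status_line, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body

def dechunk(body):
    """Undo Transfer-Encoding: chunked (chunk extensions/trailers are ignored)."""
    out = b""
    while body:
        size_line, _, body = body.partition(b"\r\n")
        size = int(size_line.split(b";")[0], 16)  # chunk size is hexadecimal
        if size == 0:  # a zero-sized chunk marks the end of the body
            break
        out += body[:size]
        body = body[size + 2:]  # skip the chunk data and its trailing CRLF
    return out

def decode_body(headers, body):
    """Apply de-chunking first, then gzip decompression, if the headers ask for it."""
    if "chunked" in headers.get("transfer-encoding", "").lower():
        body = dechunk(body)
    if headers.get("content-encoding", "").lower() == "gzip":
        body = gzip.decompress(body)
    return body
```

If you do not want to deal with compression at all, you can also simply not send the Accept-Encoding: gzip, deflate line in your request.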
I have the following bare-bones server in order to make sure that I'm able to receive a socket connection:
import socket
HOST, PORT = '', 8888
listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listen_socket.bind((HOST, PORT))
listen_socket.listen(1)
print(f'Serving HTTP on port {PORT} ...')
while True:
    print('1 - start')
    client_connection, client_address = listen_socket.accept()
    request_data = client_connection.recv(1024)
    print(request_data.decode('utf-8'))
    client_connection.sendall(b"""HTTP/1.1 200 OK\n\nHello, World!\n""")
    client_connection.close()
    print('2 - end')
And when I view it in Chrome, it loads the page, but doesn't return any response. Here is what the server prints:
1 - start
GET / HTTP/1.1
Host: localhost:8888
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Sec-Fetch-Site: none
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cookie: csrftoken=ekiwd2n8m6BjVe0Uoyif6OZ7pmWmULDs; _xsrf=2|02c4087f|7e8be522d7bb6404ace2d2117e42deed|1571691550; username-localhost-8888="2|1:0|10:1571694083|23:username-localhost-8888|44:ODk2NzY4Yzg2ZjgyNDRlMzg0ZWEwMjU1MGQ0OTU5NzE=|87587a46d228990c91d93d828c44e2a79155d03332047321e1f882dced5c67dd"; _hp2_id.1263915336=%7B%22userId%22%3A%223410356020127146%22%2C%22pageviewId%22%3A%223017774512831795%22%2C%22sessionId%22%3A%228642071851852288%22%2C%22identity%22%3Anull%2C%22trackerVersion%22%3A%224.0%22%
2 - end
1 - start
So it seems to be going through, but the client (Chrome) just receives a blank response:
Why isn't it returning "Hello, World!"? When I try it with telnet the result is the same.
Note that this does work with requests, just not with telnet or Chrome:
>>> print(requests.get('http://127.0.0.1:8888').text)
Hello, World!
It seems to be related to this line:
request_data = client_connection.recv(1024)
As you said, the problem can occur when the browser sends more than 1024 bytes.
This example uses a while True loop to read in chunks and checks whether request_data ends with b'\r\n\r\n'. That works with GET, but POST is a problem because it sends the body directly after b'\r\n\r\n'. To work around this I simply use .recv(1) and check for b'\r\n\r\n' after every byte, but you could try to find a better way to detect b'\r\n\r\n' in request_data.
import socket
HOST, PORT = '', 8888
listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listen_socket.bind((HOST, PORT))
listen_socket.listen(1)
print(f'Serving HTTP on port {PORT} ...')
while True:
    print('1 - start')
    client_connection, client_address = listen_socket.accept()
    request_data = b''
    while True:
        chunk = client_connection.recv(1)
        if not chunk:
            # client closed the connection before the header was complete
            break
        request_data += chunk
        #print(chunk)
        if request_data.endswith(b'\r\n\r\n'):
            break
    print(request_data.decode('utf-8'))
    client_connection.sendall(b"""HTTP/1.1 200 OK\n\nHello, World!\n""")
    client_connection.close()
    print('2 - end')
This code reads only the header. To read the body of a POST you can call recv() with a bigger value. A POST should send a Content-Length header with the size of the body, so reading the body should be easier than reading the header.
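One possible alternative to reading byte by byte is to wrap the socket with makefile(), which gives you readline() for the header and read(n) for the body. The sketch below is only an illustration: the function name handle_client is made up, and the reply with CRLF line endings and a Content-Length header is my own addition, not something taken from the code above.

```python
import socket

def handle_client(conn):
    """Read one request through a buffered file wrapper and send a small reply."""
    with conn.makefile("rb") as reader:
        request_line = reader.readline()
        headers = {}
        # Header lines continue until the first empty line (just CRLF).
        for line in iter(reader.readline, b""):
            if line in (b"\r\n", b"\n"):
                break
            name, _, value = line.decode("iso-8859-1").partition(":")
            headers[name.strip().lower()] = value.strip()
        # read(n) returns once n bytes have arrived or the peer has closed.
        length = int(headers.get("content-length", 0))
        body = reader.read(length) if length else b""

    payload = b"Hello, World!\n"
    conn.sendall(
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/plain\r\n"
        b"Content-Length: %d\r\n"
        b"Connection: close\r\n"
        b"\r\n" % len(payload) + payload
    )
    return request_line, headers, body
```

In the server loop you would call handle_client(client_connection) right after accept() and close the connection afterwards.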
I'd like to know which HTTP GET request headers are sent by requests.get() on the client side.
requests.get('http://localhost:9000')
The request header sent by the above Python command, as monitored with netcat, is the following. However, I can't find a way to monitor the HTTP GET request header directly on the client side.
GET / HTTP/1.1
Host: localhost:9000
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.18.4
import requests
import sys
req = requests.Request('GET', 'http://localhost:9000')
print(req.headers)
prepared = req.prepare()
s = requests.Session()
page = s.send(prepared)
The request header sent by the above python command monitored by netcat is the following.
GET / HTTP/1.1
Host: localhost:9000
Accept-Encoding: identity
Also, req.headers can be used to inspect the headers. It is not exactly the same as the header that is sent, as it does not contain the Accept-Encoding header. In addition, the HTTP GET request header sent this way differs from that of the first way.
$ ./main.py
{}
Is there a way to directly monitor which HTTP GET header is sent in the first case?
Also, why are the requests sent by the two methods not the same? Wouldn't it be better to make them consistent to avoid possible confusion?
I'd like to know which HTTP GET request headers are sent by requests.get() on the client side
If I got it right, you want to view the headers that were actually sent by requests.get().
You can access them through the .request.headers attribute:
import requests
r = requests.get("http://example.com")
print(r.request.headers)
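As for why the two ways differ: as far as I can tell, requests.get() goes through a Session, which merges its default headers (User-Agent, Accept-Encoding, Accept, Connection) into the request, while calling Request.prepare() directly skips that step. A small sketch to compare the two without sending anything (the URL is just the one from the question):

```python
import requests

session = requests.Session()
req = requests.Request("GET", "http://localhost:9000")

# Prepared on its own: only the headers given to Request(...) are present.
bare = req.prepare()

# Prepared through the session: the session's default headers are merged in.
merged = session.prepare_request(req)

print(dict(bare.headers))
print(dict(merged.headers))
```

Sending merged with session.send(merged) should then give the same kind of request header as requests.get().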
I'm trying to connect to a website with Python requests, but not with my real IP. So I found a proxy on the internet and wrote this code:
import requests
proksi = {
    'http': 'http://5.45.64.97:3128'
}
x = requests.get('http://www.whatsmybrowser.org/', proxies = proksi)
print(x.text)
When I look at the output, the proxy simply doesn't work: the site returns my real IP address. What did I do wrong? Thanks.
The answer is quite simple. Although it is a proxy service, it doesn't guarantee 100% anonymity. When you send the HTTP GET request via the proxy server, the request sent by your program to the proxy server is:
GET http://www.whatsmybrowser.org/ HTTP/1.1
Host: www.whatsmybrowser.org
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.10.0
Now, when the proxy server sends this request to the actual destination, it sends:
GET http://www.whatsmybrowser.org/ HTTP/1.1
Host: www.whatsmybrowser.org
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.10.0
Via: 1.1 naxserver (squid/3.1.8)
X-Forwarded-For: 122.126.64.43
Cache-Control: max-age=18000
Connection: keep-alive
As you can see, it puts your IP (in my case, 122.126.64.43) in the X-Forwarded-For HTTP header, and hence the website knows that the request was sent on behalf of 122.126.64.43.
Read more about this header at: https://www.rfc-editor.org/rfc/rfc7239
If you want to host your own squid proxy server and want to disable setting X-Forwarded-For header, read: http://www.squid-cache.org/Doc/config/forwarded_for/
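On the receiving side, spotting such a proxy takes very little. The following is only a sketch of the kind of check a site might do; the function name revealing_headers and the list of header names are my own choice for illustration, not something a particular site is known to use.

```python
# Header names that typically expose a proxy or the original client address.
REVEALING = ("x-forwarded-for", "forwarded", "via")

def revealing_headers(headers):
    """Return the revealing headers found in a request, keyed by lower-cased name."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in REVEALING if name in lowered}

# Headers as forwarded by the proxy in the example above.
forwarded_request = {
    "Host": "www.whatsmybrowser.org",
    "User-Agent": "python-requests/2.10.0",
    "Via": "1.1 naxserver (squid/3.1.8)",
    "X-Forwarded-For": "122.126.64.43",
}
print(revealing_headers(forwarded_request))
```

A proxy that forwards none of these headers would make this check return an empty dict.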
How can I send an HTTP request as a string with Python? Something like this:
r = """GET /hello.htm HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.stackoverflow.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive"""
answer = send(r)
print answer # gives me the response as string
Assuming Python 3, it is recommended that you use urllib.request.
But since you specifically ask for providing the HTTP message as a string,
I assume you want to do low level stuff, so you can also use http.client:
import http.client
connection = http.client.HTTPConnection('www.python.org')
connection.request('GET', '/')
response = connection.getresponse()
print(response.status)
print(response.msg)
answer = response.read()
print(answer)
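If you really want to pass the whole request as one string, a plain socket can do that too. Below is a minimal sketch; the function name send_raw and its parameters are made up for illustration. It assumes a request without a body, and it assumes the request contains Connection: close so that the server closes the connection when it is done (otherwise recv waits until the timeout expires).

```python
import socket

def send_raw(host, port, request_text, timeout=10):
    """Send request_text with CRLF line endings and return the raw response bytes."""
    # Normalise line endings to CRLF and end the header with an empty line.
    lines = request_text.strip().splitlines()
    request = ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        # Keep reading until the server closes the connection.
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Usage would look like answer = send_raw('www.python.org', 80, r), where r is a request string like the one in the question with Connection: close instead of Keep-Alive. The result is the raw response, header included, and the body may still be chunked or compressed.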