I'm building a proxy server in Python and I have a question.
First, here is the part of my code that handles data received from the client. If the client sent data, it downloads the content of the requested website (using the urllib library) and then sends the client a 200 OK response with the content length and the content itself:
data = currentSocket.recv(4096)
if data == "":
    open_client_sockets.remove(currentSocket)
    print 'Conn is closed'
else:
    dataSplit = data.split("\r\n")
    Host = HostFliter(dataSplit)
    print Host, " Host"
    if Host == "":
        break
    contentURL = urllib.urlopen(Host)
    content_to_send = contentURL.read()
    currentSocket.send("HTTP/1.1 200 OK\r\nContent-Length:"+str(len(content_to_send))+"\r\n\r\n"+str(content_to_send))
    contentURL.close()
Note: the variable Host contains the URL of the website.
Now for the question:
Where do I get the headers from the server, and how do I send them on to the client?
The libraries I use are socket, select, and urllib.
This is my select call:
rlist, wlist, xlist = select.select([serverSocket] + open_client_sockets, open_client_sockets, [])
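The HTTP response syntax is as follows:
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 38

<html><body>Hello world!</body></html>
So you need to send the status line and each header line terminated by \r\n, then an empty line (a bare \r\n), and only then the body, in the format above. With urllib, the headers the server sent are available from contentURL.info().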
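Here is a minimal sketch of the forwarding step. It is written for Python 3 (urllib.request rather than the Python 2 urllib.urlopen used in the question), and build_response and forward are hypothetical helper names. It relays the upstream status line and headers, drops the hop-by-hop headers listed in RFC 2616, section 13.5.1, and recomputes Content-Length, because urllib hands back the body already de-chunked, so the upstream framing headers may no longer match what you send.

```python
import urllib.error
import urllib.request

# Hop-by-hop headers describe a single connection and should not be forwarded
# by a proxy (RFC 2616, section 13.5.1).
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "trailers", "transfer-encoding", "upgrade",
}


def build_response(status, reason, headers, body):
    """Serialize a response: status line, forwarded headers, empty line, body.

    headers is a list of (name, value) pairs, e.g. response.getheaders().
    """
    lines = [f"HTTP/1.1 {status} {reason}"]
    for name, value in headers:
        lower = name.lower()
        # urllib returns the body already de-chunked, so the upstream framing
        # headers may be wrong for it; Content-Length is recomputed below.
        if lower in HOP_BY_HOP or lower == "content-length":
            continue
        lines.append(f"{name}: {value}")
    lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("iso-8859-1") + body


def forward(url, client_socket):
    """Fetch url and relay its status, headers and body to client_socket."""
    try:
        with urllib.request.urlopen(url) as upstream:
            body = upstream.read()
            status, reason = upstream.status, upstream.reason
            headers = upstream.getheaders()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions but still carry headers and a body
        body = err.read()
        status, reason = err.code, err.reason
        headers = list(err.headers.items())
    client_socket.sendall(build_response(status, reason, headers, body))
```

Keep in mind that urlopen follows redirects on its own, so with this approach the client receives the final page's status and headers rather than the original 3xx response.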
Related
I'm sending data from a TCP client in Python, and the tutorial I'm following tells me to send this:
"GET / HTTP/1.1\r\nHost: google.com\r\n\r\n"
I've tried looking up information about this format, but I'm confused about what the GET is actually requesting and what data this request would return. Also, what is the purpose of the carriage returns and newlines?
If you want to write a low-level HTTP GET in Python, you can create a TCP socket, write the GET request (optionally with header fields), and then read the response.
The HTTP request starts with a request line, e.g. GET / HTTP/1.1 terminated by a CRLF ("\r\n"). In that line, GET is the method, / is the path of the requested resource (the site's root page), and HTTP/1.1 is the protocol version. The request line is followed by zero or more headers, each ending with a CRLF. A final CRLF (an empty line) marks the end of the request line and header part of the request, and is followed by an optional message body. The request structure is defined in section 5 of the HTTP 1.1 spec.
import socket
# host and port map to URL http://localhost:8000/
host = "localhost"
port = 8000
try:
    with socket.create_connection((host, port)) as sock:
        # The Host header names the server actually connected to; "Connection: close"
        # asks the server to close the connection after responding, so the loop ends
        request = f"GET / HTTP/1.1\r\nHost: {host}:{port}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode())
        # keep reading from socket until the server closes the connection
        while True:
            response = sock.recv(8192)
            if not response:
                break
            print(response)
except OSError as ex:
    print("I/O Error:", ex)
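To make the structure described above concrete, here is a small hypothetical helper (build_request is an illustrative name, not part of any library) that assembles a request from its parts: the request line, one "Name: value" line per header, each ending in CRLF, then the empty line that ends the header section, then the optional body.

```python
def build_request(method, path, headers, body=b""):
    """Assemble an HTTP/1.1 request as bytes.

    headers is a dict of header names to values; body is optional raw bytes.
    """
    lines = [f"{method} {path} HTTP/1.1"]  # request line: method, path, version
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # every line ends in CRLF, and one extra CRLF marks the end of the headers
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```

Calling build_request("GET", "/", {"Host": "google.com"}) produces exactly the string from the tutorial.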
The first line of the HTTP response is the status line (protocol version, status code, and reason phrase), terminated by \r\n and followed by the response headers, an empty line, and the body:
HTTP/1.1 200 OK\r\n
Content-type: text/plain\r\n
Content-length: 14\r\n
\r\n
This is a test
You need to parse the status line and headers to determine how to decode the message body of the HTTP response.
Details of the HTTP response are in section 6 of the HTTP 1.1 Spec.
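As a rough sketch of that parsing step, assuming the whole response has already been read into one bytes object and its body is delimited by Content-Length, the hypothetical parse_response below splits a response like the example above into its parts. Chunked bodies are not decoded, and repeated headers (such as several Set-Cookie lines) overwrite each other.

```python
def parse_response(raw):
    """Split a complete raw HTTP response into its parts.

    Returns (version, status_code, reason, headers, body). Header names are
    lower-cased because HTTP header names are case-insensitive.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete response: no empty line after the headers")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, *rest = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    if "content-length" in headers:
        # ignore anything after the declared body length
        body = body[: int(headers["content-length"])]
    return version, int(code), rest[0] if rest else "", headers, body
```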
Alternatively, the requests module implements the HTTP spec in a simple API.
Example of an HTTP GET using the requests API:
import requests
url = 'http://localhost:8000/'
response = requests.get(url)
print("Status code:", response.status_code)
print("Content:", response.text)
Here is code that sends an HTTPS request.
My problem is handling redirects using the same socket connection.
I know that the requests module handles these redirects very well, but this code is for a proxy server that I'm developing.
Any help would be appreciated. Thanks!
import socket, ssl
from ssl import SSLContext
HOST = "www.facebook.com"
PORT = 443
ContextoSSL = SSLContext(protocol=ssl.PROTOCOL_SSLv23)
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sslSocket = ContextoSSL.wrap_socket(sock, server_hostname=HOST)
sslSocket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sslSocket.connect((HOST, PORT))
sslSocket.do_handshake()
der = sslSocket.getpeercert(binary_form=True)
pem_data = ssl.DER_cert_to_PEM_cert(der)
#print(pem_data) # print certificate
'''1st request'''
headers = \
"GET / HTTP/1.1\r\n"\
"Host: www.facebook.com\r\n"\
"User-Agent: python-requests/2.22.0\r\n"\
"Accept-Encoding: gzip, deflate\r\nAccept: */*\r\n"\
"Connection: keep-alive\r\n\r\n"
print("\n\n" + headers)
sslSocket.send(headers.encode()) # send request
response = sslSocket.recv(9999)
print(response) # print receive response
'''2nd request''' # on this redirect with cookie set, response should be 200 OK
cookie, location = "", ""
for line in response.decode().splitlines():
    if "Set-Cookie" in line:
        cookie = line.replace("Set-Cookie: ", "").split(";")[0]
    if "Location" in line:
        location = line.replace("Location: ", "").split("/")[3]
print(cookie, location)
headers = \
f"GET /{location} HTTP/1.1\r\n"\
"Host: www.facebook.com\r\n"\
"User-Agent: python-requests/2.22.0\r\n"\
"Accept-Encoding: gzip, deflate\r\nAccept: */*\r\n"\
"Connection: keep-alive\r\n"\
f"Cookie: {cookie}\r\n\r\n"
print("\n\n" + headers)
sslSocket.send(headers.encode()) # send request
response = sslSocket.recv(9999)
print(response) # print received response
To handle a redirect you must first get the new location:
1. Read the full response as specified in the HTTP standard, i.e. read the complete body based on the length declared in the response, rather than relying on a single recv.
2. Parse the response.
3. Check for a status code that indicates a redirect (e.g. 301, 302, 303, 307, 308).
4. In case of a redirect, extract the new location from the Location field in the response header.
Once you have the new location you can issue the new request for this location. If the new location is for the same domain and if both request and response indicated that the TCP connection can be reused you can try to issue the new request on the same TCP connection. But you need to handle the case that the server might close the connection anyway since this is explicitly allowed.
In all other cases you have to create a new TCP connection for the new request.
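The steps above can be sketched roughly as follows. This is a simplified illustration under several assumptions: the request was a GET, the body is delimited by Content-Length (a chunked body raises NotImplementedError here), and repeated headers such as Set-Cookie overwrite each other. read_response, redirect_target and can_reuse are hypothetical helper names, and /home.php in the demo is a made-up path. In the question's code you would read through stream = sslSocket.makefile("rb") instead of a single recv(9999).

```python
import io
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def read_response(stream):
    """Read exactly one response from a binary file-like object, such as
    sslSocket.makefile("rb"). Returns (code, headers, body, framed), where
    framed is True if the body length was known from the headers.
    """
    status_line = stream.readline()
    if not status_line:
        raise ConnectionError("server closed the connection")
    code = int(status_line.split(b" ", 2)[1])
    headers = {}  # sketch: repeated headers (e.g. Set-Cookie) overwrite each other
    while True:
        line = stream.readline().rstrip(b"\r\n")
        if not line:  # the empty line ends the header section
            break
        name, _, value = line.decode("iso-8859-1").partition(":")
        headers[name.strip().lower()] = value.strip()
    if "chunked" in headers.get("transfer-encoding", "").lower():
        raise NotImplementedError("chunked bodies are not handled in this sketch")
    if 100 <= code < 200 or code in (204, 304):
        return code, headers, b"", True  # these responses never have a body
    if "content-length" in headers:
        return code, headers, stream.read(int(headers["content-length"])), True
    # No length given: the body ends when the server closes the connection.
    return code, headers, stream.read(), False


def redirect_target(url, code, headers):
    """Absolute URL to follow, or None if the response is not a redirect."""
    if code in REDIRECT_CODES and "location" in headers:
        return urljoin(url, headers["location"])  # Location may be relative
    return None


def can_reuse(old_url, new_url, headers, framed):
    """Whether the next request may be tried on the same connection."""
    old, new = urlsplit(old_url), urlsplit(new_url)
    same_origin = (old.scheme, old.hostname, old.port) == (new.scheme, new.hostname, new.port)
    keep_alive = headers.get("connection", "").lower() != "close"
    return same_origin and framed and keep_alive


if __name__ == "__main__":
    # Offline demo with a canned response instead of a real socket
    demo = io.BytesIO(b"HTTP/1.1 302 Found\r\nLocation: /home.php\r\nContent-Length: 0\r\n\r\n")
    code, headers, body, framed = read_response(demo)
    target = redirect_target("https://www.facebook.com/", code, headers)
    print(code, target, can_reuse("https://www.facebook.com/", target, headers, framed))
```

Even when can_reuse returns True, the server may still close the connection before the next response; read_response raises ConnectionError in that case, which is the signal to open a new connection and resend the request.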
Note that showing you exactly how to code this would be too broad. There is a reason HTTP libraries exist, and you'd be better off using one for this purpose instead of implementing all the complexity yourself.
I have been working on a program that sends an HTTP request to a specified server, but each time I run it to send a GET request, the server responds with that site's 404 Not Found page. Can anybody tell me what I'm doing wrong? I also tried copying the request Firefox sends and sending that, but it still didn't work.
import socket
server,port = 'google.com',80
ip = socket.gethostbyname(server)
print (ip)
sock = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
sock.connect((server,port))
request = 'GET /HTTP/1.1\nHost: '+str(ip)+'\n\n'
print(request)
sock.sendall(request.encode())
while True:
    data = ' '
    data = sock.recv(4096)
    if data == ' ':
        break
    print(data.decode())
Also, what are the uses of the socket module apart from creating remote servers?
I think your problem is your request. First, you need a space after that first /. Second, the Host header should be www.google.com, not an IP address. Third, HTTP requires \r\n line endings, and the request must end with an empty line:
request = 'GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n'
Also, you should change the server on the first line of your code to www.google.com, since google.com will redirect you there anyway:
server,port = 'www.google.com',80
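For completeness, here is a sketch of the whole client with those fixes, plus a fix for the receive loop: recv() returns bytes and signals a closed connection with b"" (not the string ' '), so the original loop never ends. The sketch also sends Connection: close, which is an addition not mentioned above: an HTTP/1.1 server otherwise keeps the connection open, and the loop would block waiting for more data. http_get and read_until_closed are hypothetical names.

```python
import socket


def read_until_closed(sock, bufsize=4096):
    """Collect everything the peer sends until it closes the connection."""
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:  # recv() returns b"" once the peer has closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)


def http_get(host, path="/", port=80):
    """Send a GET with CRLF line endings and return the raw response bytes."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        # Ask the server to close the connection after responding; otherwise an
        # HTTP/1.1 server keeps it open and read_until_closed() would block.
        "Connection: close\r\n"
        "\r\n"
    )
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request.encode("ascii"))
        return read_until_closed(sock)
```

For example, print(http_get("www.google.com").decode("iso-8859-1")) prints the raw status line, headers and body.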
I need to build an HTTP server without using an HTTP library.
I have the server running and an HTML page being loaded, but my <img src="..."/> tags are not being loaded: I receive the request, but I can't display the PNG/JPEG in the page.
httpServer.py
import socket
# Define socket host and port
SERVER_HOST = '0.0.0.0'
SERVER_PORT = 8000
# Create socket
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_socket.bind((SERVER_HOST, SERVER_PORT))
server_socket.listen(1)
print('Listening on port %s ...' % SERVER_PORT)
while True:
    # Wait for client connections
    client_connection, client_address = server_socket.accept()
    # Handle client request
    request = client_connection.recv(1024).decode()
    content = handle_request(request)
    # Send HTTP response
    if content:
        response = 'HTTP/1.1 200 OK\n\n'
        response += content
    else:
        response = 'HTTP/1.1 404 NOT FOUND\n\nFile Not Found'
    client_connection.sendall(response.encode())
    client_connection.close()
# Close socket
server_socket.close()
The function that handles the request:
def handle_request(request):
    http = HttpHandler.HTTPHandler
    # Parse headers
    print(request)
    headers = request.split('\n')
    get_content = headers[0].split()
    accept = headers[6].split()
    type_content = accept[1].split('/')
    try:
        # Filename
        filename = get_content[1]
        if get_content[0] == "GET":
            content = http.get(None, get_content[1], type_content[0])
            return content
    except FileNotFoundError:
        return None
The class that handles the HTTP verbs:
class HTTPHandler:
    def get(self, args, type):
        if args == '/':
            args = '/index.html'
        fin = open('htdocs' + args)
        if type != "image":
            fin = open('htdocs/' + args)
        if type == "image":
            fin = open('htdocs/' + args, 'rb')
        # Read file contents
        content = fin.read()
        fin.close()
        return content
Note that I'm trying to implement HTTP/1.1; if you see anything that doesn't follow the standard, feel free to say so. Thanks in advance.
I don't know where you've learnt how HTTP works but I'm pretty sure that you did not study the actual standard which you should do when implementing a protocol. Some notes about your implementation:
Line endings should be \r\n, not \n. This is true for responses from the server as well as for requests from the client.
You are assuming that the client's request is never larger than 1024 bytes and that it can be read with a single recv. But requests can have arbitrary length, and there is no guarantee that you get the whole request in a single recv (TCP is a stream protocol, not a message protocol).
While it is kind of OK to simply close the TCP connection after the body, it would be better to include the length of the body in the Content-Length header or to use chunked transfer encoding.
The type of the content should be given with the Content-Type header, i.e. Content-Type: text/html for HTML and Content-Type: image/jpeg for JPEG images. Without it, the browser might guess the type correctly or wrongly, or, depending on the context, might insist on a proper Content-Type header.
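The points above can be sketched as two hypothetical helpers (recv_request and build_file_response are illustrative names, not from the question). The first reads until the empty line that ends the headers and then as many body bytes as Content-Length declares; the second builds a response with a Content-Type guessed from the file extension by the standard mimetypes module, plus a Content-Length. It assumes every file, HTML included, is read in binary mode ('rb'), so the body is always bytes and never needs .encode().

```python
import mimetypes
import socket


def recv_request(conn: socket.socket, max_head: int = 65536):
    """Read one request: headers up to the empty line, then Content-Length bytes.

    Returns (head, body): head is the request line plus headers as text.
    """
    buf = b""
    while b"\r\n\r\n" not in buf:
        chunk = conn.recv(4096)
        if not chunk:
            raise ConnectionError("client closed before finishing the headers")
        buf += chunk
        if len(buf) > max_head:
            raise ValueError("request headers too large")
    head, _, body = buf.partition(b"\r\n\r\n")
    length = 0
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value)
    while len(body) < length:  # the body may arrive in several recv() calls
        chunk = conn.recv(4096)
        if not chunk:
            raise ConnectionError("client closed before sending the whole body")
        body += chunk
    return head.decode("iso-8859-1"), body[:length]


def build_file_response(path, body):
    """Return a complete 200 response (bytes) for a file's raw bytes."""
    content_type, _ = mimetypes.guess_type(path)
    if content_type is None:
        content_type = "application/octet-stream"
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"  # this server closes after each response
        "\r\n"
    )
    return head.encode("ascii") + body
```

recv_request ignores chunked request bodies and discards any bytes after the declared length (for example a pipelined second request); a complete server would have to handle both.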
Apart from that, if you debug such problems it is helpful to find out what gets actually exchanged between client and server. It might be that you've checked this for yourself but you did not include such information into your question. Your only error description is "...I recive the call but cannot preset the png/JPEG in the page" and then a dump of your code.
httpServer.py ended up like this:
while True:
    # Wait for client connections
    client_connection, client_address = server_socket.accept()
    # Handle client request
    request = client_connection.recv(10240).decode()
    content = handle_request(request)
    # Send HTTP response
    if content:
        if str(content).find("html") > 0:
            client_connection.sendall('HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n'.encode())
            client_connection.sendall(content.encode())
        else:
            # image data is already bytes, so it is sent without encoding
            client_connection.sendall('HTTP/1.1 200 OK\r\n'.encode())
            client_connection.sendall("Content-Type: image/jpeg\r\n".encode())
            client_connection.sendall("Accept-Ranges: bytes\r\n\r\n".encode())
            client_connection.sendall(content)
    else:
        # status line, empty line, then the body
        response = 'HTTP/1.1 404 NOT FOUND\r\n\r\nFile Not Found'
        client_connection.sendall(response.encode())
    client_connection.close()
And the get method like this:
class HTTPHandler:
    def get(self, args, type):
        if args == '/':
            args = '/index.html'
        fin = open('htdocs' + args)
        if type != "image":
            fin = open('htdocs/' + args)
        if type.find("html") == -1:
            image_data = open('htdocs/' + args, 'rb')
            bytes = image_data.read()
            # Content-Type: image/jpeg, image/png \n\n
            content = bytes
            fin.close()
            return content
        # Read file contents
        content = fin.read()
        fin.close()
        return content
I am trying to write a simple HTTP client program using raw sockets in Python 3. However, the server does not return a response despite having been sent a simple HTTP request. My question is why the server doesn't return a response.
Here is my code:
from socket import *
BUF_LEN = 8192 * 100000
info = getaddrinfo('google.com', 80, AF_INET)
addr = info[-1][-1]
print(addr)
client = socket(AF_INET, SOCK_STREAM)
client.connect(addr)
client.send(b"GET /index.html HTTP1.1\r\nHost: www.google.com\r\n")
print(client.recv(BUF_LEN).decode("utf-8")) # print nothing
You've left out the blank line at the end and mis-specified the HTTP version (it's missing the slash):
>>> client.send(b"GET /index.html HTTP1.1\r\nHost: www.google.com\r\n")
Should be:
>>> client.send(b"GET /index.html HTTP/1.1\r\nHost: www.google.com\r\n\r\n")
50
>>> client.recv(BUF_LEN).decode("utf-8")
u'HTTP/1.1 302 Found\r\nCache-Control: private\r\nContent-Type: text/html; charset=UTF-8\r\nLocation: http://www.google.co.uk/index.html?gfe_rd=cr&ei=fIR7WJ7QGejv8AeZzbWgCw\r\nContent-Length: 271\r\nDate: Sun, 15 Jan 2017 14:17:32 GMT\r\n\r\n<HTML><HEAD><meta http-equiv....
The blank line tells the server it's the end of the headers, and since this is a GET request there's no payload, so the server can then return the content.
Without the / in HTTP/1.1, Google's servers return a 400 Bad Request error.