I'm writing a really simple web proxy in Python, and right now I'm working on handling HTTPS CONNECT requests so I can open HTTPS websites. I'm trying to set up an SSL tunnel, but my code isn't quite right. I think I'm close, so if someone could take a look and push me in the right direction, that would be great. My current understanding of what I'm supposed to do is:
Recognize that the request is a CONNECT request
Send a message back to the browser as I have defined in the variable connect_req in my code
That's about it
Here's my code:
def ProxyThread(conn, client_addr):
    request = conn.recv(MAX_BUFFER)
    #print request
    # Parsing
    method, webserver, port = ParseReq(request)
    print 'Request = ' + method + ' ' + webserver + ':' + str(port) + '\n'
    try:
        serverSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        serverSocket.connect((webserver, port))
        if method == 'CONNECT':
            connect_req = 'HTTP/1.1 200 Connection established\r\n'
            connect_req += 'Proxy-agent: localhost\r\n\r\n'
            conn.send(connect_req.encode())
            serverSocket.send(connect_req)
        while 1:
            data = serverSocket.recv(MAX_BUFFER)
            # while there is data to receive from server
            if len(data) > 0:
                conn.send(data)
            else:
                break
        serverSocket.close()
        conn.close()
    except socket.error, (message):
        print message
        if conn:
            conn.close()
        if serverSocket:
            serverSocket.close()
        return
Edit 1: Updated code to start a thread when I get an HTTPS request
def ProxyThread(conn, client_addr):
    request = conn.recv(MAX_BUFFER)
    method, webserver, port = ParseReq(request)
    # Handle index out of range exception - Throw out the request
    if method is None or webserver is None or port is -1:
        return
    print 'Request = ' + method + ' ' + webserver + ':' + str(port) + ' START\n'
    serverSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if method == 'CONNECT':
            connect_req = 'HTTP/1.0 200 Connection established\r\n'
            connect_req += 'Proxy-agent: ProxyServer/1.0\r\n'
            connect_req += '\r\n'
            print connect_req
            conn.send(connect_req)
            thread = threading.Thread(target=HTTPSProxyThread, args=(conn, serverSocket))
            thread.start()
        serverSocket.connect((webserver, port))
        serverSocket.send(request)
        while 1:
            data = serverSocket.recv(MAX_BUFFER)
            # while there is data to receive from server
            if len(data) > 0:
                conn.send(data)
            else:
                break
        print 'Request = ' + method + ' ' + webserver + ':' + str(port) + ' FINISH\n'
        serverSocket.close()
        conn.close()
def HTTPSProxyThread(conn, serverSocket):
    while 1:
        request = conn.recv(MAX_BUFFER)
        print request
        method, webserver, port = ParseReq(request)
        serverSocket.connect((webserver, port))
        serverSocket.send(request)
        while 1:
            data = serverSocket.recv(MAX_BUFFER)
            # while there is data to receive from server
            if len(data) > 0:
                conn.send(data)
            else:
                break
A lot of people seem to be building their own web proxies in Python, or Node.js these days.
As someone who has spent the past 22 years making a web proxy, I wonder why people do this to themselves, especially when free products are available on all the main platforms where somebody has already dealt with issues such as (some of these you will have to deal with later):
tunneling (CONNECT)
chunking
HTTP authentication
Funky non-compliant behaviour from a surprising number of servers and clients.
performance
scalability
logging
caching
policy framework and enforcement
etc
Whilst it's a fun way to pass the time for a while, the more of these proxies there are out there, the more broken the web becomes overall if the naive implementations are used for more general traffic. If you're just using this for your own specific deployment requirement, then ignore this comment.
I guess the point I'm trying to make is that making a well-behaved (let alone performant) web proxy is non-trivial.
Related
I have a simple Python HTTP server which also connects to other HTTP servers to fetch some data. While connecting to other servers, my server acts as an HTTP client, but the socket created for incoming connection requests still keeps listening on port 8080 (I have a different socket for the client).
The list of other servers that I need to connect and fetch data is stored in a JSON file and I have code like this
with open(myjsonfile, 'r') as json_file:
    entries = json.load(json_file)

for entry in entries.keys():
    address = entries[entry]['address']
    port = int(entries[entry]['port'])
    client_port = config.server_port + 50000
    host = gethostname()

    # request the TXT file
    sock = socket(AF_INET, SOCK_STREAM)
    # sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    sock.bind((host, client_port))
    sock.connect((address, port))
    reqmsg = "GET /" + config.txt_name + " HTTP/1.0\r\n\r\n"
    sock.sendall(reqmsg.encode())
    response = ''
    response = sock.recv(2048).decode()
    pos = response.find("\r\n\r\n")
    txt_data = response[pos+4:]
    # processing the received data here
    sock.close()

    # request the IMG file
    sock = socket(AF_INET, SOCK_STREAM)
    # sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    sock.bind((host, client_port))
    sock.connect((address, port))
    reqmsg = "GET /" + config.img_name + " HTTP/1.0\r\n\r\n"
    sock.sendall(reqmsg.encode())
    response = b''
    while True:
        recvdata = sock.recv(2048)
        if len(recvdata) < 1:
            break
        response = response + recvdata
    pos = response.find(b"\r\n\r\n")
    img_data = response[pos+4:]
    # working with the image here
    sock.close()
I have to use a fixed port number for my client because this is how the server identifies me. However, I sometimes get an "Address already in use" error on the second socket.bind() call (the one for the image). Without the bind() calls, my code works fine.
I tried setting socket options (commented out in the code above) and using pycurl with the LOCALPORT property set to the client_port value above, but I am still getting the same error.
What could be the reason behind the error message? I open and close the sockets, so I'd expect the operating system to free the port for further use.
Thanks
PS : This is a small project, not a production system, hence do not bother with "why use port numbers to identify clients"
There is a TIME_WAIT period after the session is shut down, to make sure that there are no live packets left in the network. If you re-created the same address tuple right away and one of those stray packets showed up, it would be treated as a valid packet for your new connection and put it into an error state. The wait is usually twice the maximum packet lifetime, after which any leftover packet is discarded.
Before you can create a connection with the same tuple, all the packets from the previous session must be dead.
Try setting SO_REUSEADDR on the socket before the bind() call, so the local port can be re-bound while the old connection is still in TIME_WAIT:
...
sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
sock.bind((host, client_port))
sock.connect((address, port))
...
(listen() does not belong here; it is only for server sockets that accept connections. Also note that SO_REUSEADDR lets bind() succeed during TIME_WAIT, but reconnecting to the exact same remote address and port from the same local port can still fail until the old connection has fully expired.)
I am trying to use this code to create an HTTP proxy cache server. When I run the code it starts up, binds to its port and everything, but the problem appears when I connect from the browser. For example, it opens a port on 55555; if I type in localhost:52523/www.google.com it works fine, but with other plain-HTTP sites, for example localhost:52523/www.microcenter.com or just localhost:52523/google.com, the browser displays "localhost didn't send any data" with
ERR_EMPTY_RESPONSE, and an exception shows up in the console, though the cache file is still created on my computer.
I would like to find out how to edit the code so that I can access any website through the proxy just as I normally would in the browser without it. It should be able to work with www.microcenter.com.
import socket
import sys
import urllib
from urlparse import urlparse

Serv_Sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket.socket function creates a socket.
port = Serv_Sock.getsockname()[1]

# Server socket created, bound and starting to listen
Serv_Sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket.socket function creates a socket.
Serv_Sock.bind(('', port))
Serv_Sock.listen(5)
port = Serv_Sock.getsockname()[1]

# Prepare a server socket
print ("starting server on port %s...," % (port))

def caching_object(splitMessage, Cli_Sock):
    # this method is responsible for caching
    Req_Type = splitMessage[0]
    Req_path = splitMessage[1]
    Req_path = Req_path[1:]
    print "Request is ", Req_Type, " to URL : ", Req_path
    # Searching available cache if file exists
    url = urlparse(Req_path)
    file_to_use = "/" + Req_path
    print file_to_use
    try:
        file = open(file_to_use[5:], "r")
        data = file.readlines()
        print "File Present in Cache\n"
        # Proxy Server Will Send A Response Message
        #Cli_Sock.send("HTTP/1.0 200 OK\r\n")
        #Cli_Sock.send("Content-Type:text/html")
        #Cli_Sock.send("\r\n")
        # Proxy Server Will Send Data
        for i in range(0, len(data)):
            print (data[i])
            Cli_Sock.send(data[i])
        print "Reading file from cache\n"
    except IOError:
        print "File Doesn't Exists In Cache\n fetching file from server \n creating cache"
        serv_proxy = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        host_name = Req_path
        print "HOST NAME:", host_name
        try:
            serv_proxy.connect((url.host_name, 80))
            print 'Socket connected to port 80 of the host'
            fileobj = serv_proxy.makefile('r', 0)
            fileobj.write("GET " + "http://" + Req_path + " HTTP/1.0\n\n")
            # Read the response into buffer
            buffer = fileobj.readlines()
            # Create a new file in the cache for the requested file.
            # Also send the response in the buffer to client socket
            # and the corresponding file in the cache
            tmpFile = open(file_to_use, "wb")
            for data in buffer:
                tmpFile.write(data)
                tcpCliSock.send(data)
        except:
            print 'Illegal Request'
    Cli_Sock.close()

while True:
    # Start receiving data from the client
    print 'Initiating server... \n Accepting connection\n'
    Cli_Sock, addr = Serv_Sock.accept()  # Accept a connection from client
    #print addr
    print ' connection received from: ', addr
    message = Cli_Sock.recv(1024)  # Receives data from Socket
    splitMessage = message.split()
    if len(splitMessage) <= 1:
        continue
    caching_object(splitMessage, Cli_Sock)
There are a few errors in the code:
The first is that a GET request sent to an origin server should not include the protocol in the request line, nor the host; the GET should be restricted to only the path + query string.
An additional Host header should be added which specifies which host you are requesting (i.e. www.google.com). Some web servers may be set up to tolerate its absence and send you a default page instead, but results are intermittent.
You should have a peek at the HTTP RFC, which gives some other headers that can be passed via HTTP.
You could also install something like Fiddler or Wireshark and monitor some sample HTTP calls to see how the payload is supposed to look.
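As a concrete sketch of the first two points, a well-formed origin-form request keeps only the path in the request line and carries the host in a Host header. build_get_request below is a hypothetical helper for illustration, not part of the asker's code:

```python
def build_get_request(host, path):
    # The request line holds only the path (+ query string); the protocol
    # and host never appear there. The host goes in the Host header.
    return ("GET " + path + " HTTP/1.0\r\n"
            "Host: " + host + "\r\n"
            "Connection: close\r\n"
            "\r\n").encode("ascii")
```

So for the proxy above, a request for http://www.google.com/ would be sent to the upstream server as build_get_request("www.google.com", "/"), rather than putting the full URL in the request line.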
I currently have a Python client & server sending a JSON object over a socket as follows.
Client
# Create the socket & send the request
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print 'Connecting to server at host: ' + (self.host) + ' port: ' + str(self.port)
s.connect((self.host, self.port))
print 'Sending signing request to the server'
s.sendall(request_data)
print 'Waiting for server response'
response_data = s.recv(10 * 1024)
print 'Got server response'
s.close()
Server
# Create a socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print 'Starting the server at host: ' + (self.host) + ' port: ' + str(self.port)
s.bind((self.host, self.port))
s.listen(1)
while True:
    # Create a new connection
    print 'Listening for client requests...'
    conn, addr = s.accept()
    print 'Connected to: ' + str(addr)

    # Get the data
    request_data = conn.recv(10 * 1024)
    print 'Got message: ' + str(request_data)

    # Get the json object
    try:
        # Decode the data and do stuff
        # ...
        # ...
    except Exception as e:
        print e
    finally:
        # Close the connection
        conn.close()
However, besides the JSON object, I also need to send a file (which is not a JSON object). Inside the server's while loop, the socket cannot distinguish where the JSON object ends and the file begins.
My question here is about methodology. What would be the usual approach to sending two distinct types of data through the socket? Can we use the same socket to receive two data types in serial order? Would that require two while loops (one for JSON, another for the file) inside the current while loop?
Or are there other ways of doing this?
Thanks.
First things first, you cannot just do
response_data = s.recv(10 * 1024)
print 'Got server response'
or
# Get the data
request_data = conn.recv(10 * 1024)
print 'Got message: ' + str(request_data)
and then say you've got the data. TCP is a stream protocol: transmissions do not preserve message boundaries, so a single recv() may return less (or more) than one logical message.
Regarding methodology, you need a protocol built on top of TCP. HTTP would be a great choice if you don't need your server to connect to clients without a request; in that case great libraries and frameworks are available.
If you want to build your own protocol, consider using control characters in your data stream. Something like this is possible:
json = b"{foo: ['b', 'a', 'r']}\n" # \n here stands for end-of-the-json symbol
sock.send_byte(TYPE_JSON_MESSAGE)
sock.sendall(json)
sock.send_byte(TYPE_FILE_MESSAGE)
sock.send_int(file_size) # so the server can determine where the file transmission ends
for chunk in chunked_file:
    sock.sendall(chunk)
Here it's up to you to implement send_byte and send_int. It's not really difficult, if you use struct module.
On the server side:
message_type = sock.recv(1)
if message_type == TYPE_JSON_MESSAGE:
    # ...
elif message_type == TYPE_FILE_MESSAGE:
    file_size = sock.recv_int() # not implemented
    # ...
I am extremely new to sockets and have been doing a small project based on python.
I've written a simple script, which is this :
def runserver():
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    except socket.error, msg:
        print(msg[0])
    s.bind((host, port))
    s.listen(10)
    while True:
        client, address = s.accept()
        # !clients.append(client)
        request = client.recv(1024)
        params = request.split("|")
        if params[0] == "reg_user":
            print(params[1] + ", " + params[2])
            client.send("Registration Request Received.")
        elif params[0] == "login":
            print(params[1] + ", " + params[2])
            client.send("Login Request Received.")
        # !Handles administrative login
        elif params[0] == "admin_login":
            print("Administrator requested login.")
            res = admin_login(params[1], params[2])
            if res == 1:
                client.send("Login Successful!")
                print("Login Successful!")
            else:
                client.send("Login Failed!")
                print("Login Failed!")
        client.close()
    s.close()

# !main function
if __name__ == "__main__":
    runserver()
My question is about the line
client.close()
Does this line hamper the performance of the code?
What I mean is, each time a socket connection is made, it is catered to and then disconnected. This means that if the application needs further communication, it has to reconnect to the server before doing any further business.
I need to know if this hampers performance and, if so, how to get past it. Basically I want a persistent connection: after a (successful) login there should be no disconnection, and the application (client) and server can talk uninterrupted.
The second part of the question is:
s.listen(10)
I am allowing a backlog of 10 connections. What is a sensible value for this backlog? How many connections can a server usually handle in a real-life situation?
I am attempting to create a Python web server, however it seems unable to send any file larger than 4KB. If the file is above 4KB in size, everything past 4KB just gets cut off the end of the text/image. Anything embedded from other sources (Amazon S3/Twitter) works fine.
Here is the server code. It is a bit of a mess at the moment, but I am focused on getting it to work; afterwards I will add more security to the code.
'''
Simple socket server using threads
'''
import socket
import sys
import time
import os
from thread import *

HOST = ''   # Symbolic name meaning all available interfaces
PORT = 80   # Arbitrary non-privileged port

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print 'Socket created'

#Bind socket to local host and port
try:
    s.bind((HOST, PORT))
except socket.error as msg:
    print 'Bind failed. Error Code : ' + str(msg[0]) + ' Message ' + msg[1]
    sys.exit()
print 'Socket bind complete'

#Start listening on socket
s.listen(10)
print 'Socket now listening'

#Function for handling connections. This will be used to create threads
def clientthread(conn):
    #Sending message to connected client
    #infinite loop so that function do not terminate and thread do not end.
    while True:
        #Receiving from client
        data = conn.recv(4096)
        print data
        dataSplit = data.split(' ')
        print dataSplit
        contentType = "text/html"
        if(dataSplit[1].endswith(".html")):
            print "HTML FILE DETECTED"
            contentType = "text/html"
        elif(dataSplit[1].endswith(".png")):
            print "PNG FILE DETECTED"
            contentType = "image/png"
        elif(dataSplit[1].endswith(".css")):
            print "CSS FILE DETECTED"
            contentType = "text/css"
        else:
            print "NO MIMETYPE DEFINED"
        conn.sendall('HTTP/1.1 200 OK\nServer: TestWebServ/0.0.1\nContent-Length: ' + str(os.path.getsize('index.html')) + '\nConnection: close\nContent-Type:' + contentType + '\n\n')
        print '\n\n\n\n\n\n\n\n'
        with open(dataSplit[1][1:]) as f:
            fileText = f.read()
        n = 1000
        fileSplitToSend = [fileText[i:i+n] for i in range(0, len(fileText), n)]
        for lineToSend in fileSplitToSend:
            conn.sendall(lineToSend)
        break
        if not data:
            break
    #came out of loop
    conn.close()

#now keep talking with the client
while 1:
    #wait to accept a connection - blocking call
    conn, addr = s.accept()
    print 'Connected with ' + addr[0] + ':' + str(addr[1])
    #start new thread takes 1st argument as a function name to be run, second is the tuple of arguments to the function.
    start_new_thread(clientthread, (conn,))
s.close
Thank you for your time.
So, thanks to the user "YOU", we found the problem. I had this code:
conn.sendall('HTTP/1.1 200 OK\nServer: TestWebServ/0.0.1\nContent-Length: ' + str(os.path.getsize('index.html')) + '\nConnection: close\nContent-Type:' + contentType + '\n\n')
instead of this code:
conn.sendall('HTTP/1.1 200 OK\nServer: TestWebServ/0.0.1\nContent-Length: ' + str(os.path.getsize(dataSplit[1][1:])) + '\nConnection: close\nContent-Type:' + contentType + '\n\n')
The problem was that I was sending the file size of index.html for every file, so Chrome and other browsers just discarded the extra data. It just so happened that index.html was 4KB, so I thought it was a packet limitation or something in that area.