http proxy cache server not limiting browser functionality

I am trying to use this code to create an HTTP proxy cache server. When I run it, the server starts and binds to a port (for example, 55555). When I connect from the browser and type in localhost:52523/www.google.com, it works fine. But when I try other sites, specifically HTTP ones such as localhost:52523/www.microcenter.com, or just localhost:52523/google.com, the browser displays "localhost didn't send any data. ERR_EMPTY_RESPONSE" and an exception shows in the console, although the cache file is created on my computer.
I would like to find out how to edit the code so that I can access any website through the proxy just as I normally would in the browser without it. It should be able to work with www.microcenter.com.
import socket
import sys
import urllib
from urlparse import urlparse

Serv_Sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # socket.socket function creates a socket.
port = Serv_Sock.getsockname()[1]
# Server socket created, bound and starting to listen
Serv_Sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # socket.socket function creates a socket.
Serv_Sock.bind(('',port))
Serv_Sock.listen(5)
port = Serv_Sock.getsockname()[1]
# Prepare a server socket
print ("starting server on port %s...,"%(port))

def caching_object(splitMessage, Cli_Sock):
    #this method is responsible for caching
    Req_Type = splitMessage[0]
    Req_path = splitMessage[1]
    Req_path = Req_path[1:]
    print "Request is ", Req_Type, " to URL : ", Req_path
    #Searching available cache if file exists
    url = urlparse(Req_path)
    file_to_use = "/" + Req_path
    print file_to_use
    try:
        file = open(file_to_use[5:], "r")
        data = file.readlines()
        print "File Present in Cache\n"
        #Proxy Server Will Send A Response Message
        #Cli_Sock.send("HTTP/1.0 200 OK\r\n")
        #Cli_Sock.send("Content-Type:text/html")
        #Cli_Sock.send("\r\n")
        #Proxy Server Will Send Data
        for i in range(0, len(data)):
            print (data[i])
            Cli_Sock.send(data[i])
        print "Reading file from cache\n"
    except IOError:
        print "File Doesn't Exists In Cache\n fetching file from server \n creating cache"
        serv_proxy = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        host_name = Req_path
        print "HOST NAME:", host_name
        try:
            serv_proxy.connect((url.host_name, 80))
            print 'Socket connected to port 80 of the host'
            fileobj = serv_proxy.makefile('r', 0)
            fileobj.write("GET " + "http://" + Req_path + " HTTP/1.0\n\n")
            # Read the response into buffer
            buffer = fileobj.readlines()
            # Create a new file in the cache for the requested file.
            # Also send the response in the buffer to client socket
            # and the corresponding file in the cache
            tmpFile = open(file_to_use, "wb")
            for data in buffer:
                tmpFile.write(data)
                tcpCliSock.send(data)
        except:
            print 'Illegal Request'
    Cli_Sock.close()

while True:
    # Start receiving data from the client
    print 'Initiating server... \n Accepting connection\n'
    Cli_Sock, addr = Serv_Sock.accept() # Accept a connection from client
    #print addr
    print ' connection received from: ', addr
    message = Cli_Sock.recv(1024) #Receives data from Socket
    splitMessage = message.split()
    if len(splitMessage) <= 1:
        continue
    caching_object(splitMessage, Cli_Sock)

There are a few errors in the code:
First, a GET request sent to a web server does not expect the protocol or the host as part of the request line. The GET should be restricted to the path plus the query string.
Second, a Host header should be added that specifies which host you are requesting (e.g. www.google.com). Some web servers may be set up to tolerate a missing Host header and send you a default page instead, but results are inconsistent.
You should have a look at the HTTP RFC, which describes other headers that can be passed via HTTP.
You could also install something like Fiddler or Wireshark, monitor some sample HTTP calls, and see how the payload is supposed to look.
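
As a rough illustration of the two points above, here is a minimal sketch of the request the proxy could send to the destination server. It is written for Python 3 rather than the Python 2 of the question, and the function name build_origin_request and the extra Connection: close header are assumptions made for this example, not part of the original code.

```python
def build_origin_request(host, path="/"):
    """Return the bytes of a minimal HTTP/1.0 GET request for a web server.

    The request line carries only the path (plus any query string);
    the host goes into a separate Host header.
    """
    if not path.startswith("/"):
        path = "/" + path
    lines = [
        "GET {} HTTP/1.0".format(path),
        "Host: {}".format(host),
        "Connection: close",  # assumption: ask the server to close when done
        "",
        "",
    ]
    # HTTP header lines end with CRLF, and an empty line ends the header block.
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    print(build_origin_request("www.microcenter.com").decode("ascii"))
```

The returned bytes could then be passed to sendall() on the socket connected to port 80 of the host.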

Related

Python client code gets Address already in use error

I have a simple Python HTTP server which also connects to other HTTP servers to fetch some data. While connecting to other servers, my server acts as an HTTP client, but the socket created for incoming connection requests keeps listening on port 8080 (I have a different socket for the client).
The list of other servers that I need to connect to and fetch data from is stored in a JSON file, and I have code like this:
with open(myjsonfile, 'r') as json_file:
    entries = json.load(json_file)
for entry in entries.keys():
    address = entries[entry]['address']
    port = int(entries[entry]['port'])
    client_port = config.server_port + 50000
    host = gethostname()
    # request the TXT file
    sock = socket(AF_INET,SOCK_STREAM)
    # sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    sock.bind((host, client_port))
    sock.connect((address, port))
    reqmsg = "GET /" + config.txt_name + " HTTP/1.0\r\n\r\n"
    sock.sendall(reqmsg.encode())
    response = ''
    response = sock.recv(2048).decode()
    pos = response.find("\r\n\r\n")
    txt_data = response[pos+4:]
    # processing the received data here
    sock.close()
    # request the IMG file
    sock = socket(AF_INET,SOCK_STREAM)
    # sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    sock.bind((host, client_port))
    sock.connect((address, port))
    reqmsg = "GET /" + config.img_name + " HTTP/1.0\r\n\r\n"
    sock.sendall(reqmsg.encode())
    response = b''
    while True:
        recvdata = sock.recv(2048)
        if (len(recvdata) < 1):
            break
        response = response + recvdata
    pos = response.find(b"\r\n\r\n")
    img_data = response[pos+4:]
    # working with the image here
    sock.close()
I have to use a set port number for my client because this is how the server identifies me. However, I sometimes get an "Address already in use" error for the second socket.bind() call (the one for the image). Without the bind() calls, my code works fine.
I tried setting socket options (commented out in the code above) and using pycurl with the LOCALPORT property set to client_port value above, but still getting the same error.
What could be the reason behind the error message? I think I open and close the sockets so the operating system should free the port for further use (I think)?
Thanks
PS : This is a small project, not a production system, hence do not bother with "why use port numbers to identify clients"

After a session is shut down, the connection stays in the TIME_WAIT state for a while to make sure that no live packets from it remain in the network. If you re-created the same tuple (local address, local port, remote address, remote port) and one of those old packets showed up, it would be treated as a valid packet for your new connection and cause an error state. TIME_WAIT usually lasts twice the maximum packet age, after which stray packets have been discarded.
Before you create a connection with the same tuple, all the packets from the previous session must be dead.
Try setting SO_REUSEADDR before calling bind():
...
sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
sock.bind((host, client_port))
sock.connect((address, port))
...
Note that listen() is only needed on a server socket, not on a client socket that calls connect().
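
A small sketch of a client socket bound to a fixed local port, with SO_REUSEADDR set before bind(), might look like the following. It is written for Python 3, the helper name make_bound_client_socket is hypothetical, and whether the option alone is enough when reconnecting to the same remote address and port may depend on the operating system.

```python
import socket


def make_bound_client_socket(local_host, local_port):
    """Create a TCP client socket bound to a chosen local address and port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option has to be set before bind() to influence the bind call.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((local_host, local_port))
    # No listen() here: a client socket goes straight to connect().
    return sock


if __name__ == "__main__":
    # Port 0 lets the OS pick a free port; a real client would pass its fixed port.
    s = make_bound_client_socket("127.0.0.1", 0)
    print("bound to", s.getsockname())
    s.close()
```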

Python Socket Programming Simple Web Server, Trying to access a html file from server

I am trying to create a simple server in Python and access an HTML file in the same directory through it, but the only output I keep getting is "Ready to serve...".
EDIT:
Put an HTML file (e.g., HelloWorld.html) in the same directory that the server is in. Run the server program. Determine the IP address of the host that is running the server (e.g., 128.238.251.26). From another host, open a browser and provide the corresponding URL. For example:
http://128.238.251.26:6789/HelloWorld.html
‘HelloWorld.html’ is the name of the file you placed in the server directory. Note also the use of the port number after the colon. You need to replace this port number with whatever port you have used in the server code. In the above example, we have used the port number 6789. The browser should then display the contents of HelloWorld.html. If you omit ":6789", the browser will assume port 80 and you will get the web page from the server only if your server is listening at port 80.
Then try to get a file that is not present at the server. You should get a “404 Not Found” message.
#import socket module
from socket import *
serverSocket = socket(AF_INET, SOCK_STREAM)
#Prepare a server socket
serverSocket.bind(('', 12006))
serverSocket.listen(1)
while True:
    print 'Ready to serve...'
    #Establish the connection
    connectionSocket, addr = serverSocket.accept()
    try:
        message = connectionSocket.recv(1024)
        filename = message.split()[1]
        f = open(filename[1:])
        outputdata = f.read()
        f.close()
        #Send one HTTP header line into socket
        connectionSocket.send('HTTP/1.0 200 OK\r\n\r\n')
        #Send the content of the requested file to the client
        for i in range(0, len(outputdata)):
            connectionSocket.send(outputdata[i])
        connectionSocket.close()
    except IOError:
        #Send response message for file not found
        connectionSocket.send('404 Not Found')
        #Close client socket
        connectionSocket.close()
serverSocket.close()

The "Ready to serve..." text is just standard output written by the print statement. You need to make a request to your server to get the actual response.
If the server runs on your local machine, use the localhost address; otherwise, use the server's IP address. You also need to specify the port, 12006 in your case, for example localhost:12006.
Also, in Python 3 the socket.send method requires a bytes-like object, not a string.
If it is only a string literal, add a b character before the first quotation mark.
Example:
connectionSocket.send(b'HTTP/1.0 200 OK\r\n\r\n')
If it is a string object, encode it:
connectionSocket.send(outputdata[i].encode())
Check out the documentation
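
To make the bytes-versus-string point concrete, here is a minimal Python 3 sketch. The helper name send_ok_response is hypothetical, and sending the whole body with a single sendall() call (rather than one character at a time) is a choice made for this example.

```python
import socket


def send_ok_response(conn, body_text):
    """Send a 200 status line followed by the body, all as bytes."""
    header = b"HTTP/1.0 200 OK\r\n\r\n"   # bytes literal, note the b prefix
    body = body_text.encode("utf-8")      # str object converted to bytes
    conn.sendall(header + body)           # sendall keeps sending until done


if __name__ == "__main__":
    # socketpair() gives two connected sockets, enough to try the helper locally.
    server_side, client_side = socket.socketpair()
    send_ok_response(server_side, "<h1>Hello</h1>")
    server_side.close()
    print(client_side.recv(1024))
    client_side.close()
```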

http proxy server only working for https sites

I am trying to use this code to create an HTTP proxy cache server. When I run it, the server starts and binds to a port (for example, 55555). When I connect from the browser and type in localhost:52523/www.google.com, it works fine. But when I try other sites, specifically HTTP ones such as localhost:52523/www.microcenter.com, or just localhost:52523/google.com, the browser displays "localhost didn't send any data. ERR_EMPTY_RESPONSE" and an exception shows in the console, although the cache file is created on my computer.
I would like to find out how to edit the code so that I can access any website through the proxy just as I normally would in the browser without it. It should be able to work with www.microcenter.com.
import socket
import sys
import urllib
from urlparse import urlparse

Serv_Sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # socket.socket function creates a socket.
port = Serv_Sock.getsockname()[1]
# Server socket created, bound and starting to listen
Serv_Sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # socket.socket function creates a socket.
Serv_Sock.bind(('',port))
Serv_Sock.listen(5)
port = Serv_Sock.getsockname()[1]
# Prepare a server socket
print ("starting server on port %s...,"%(port))

def caching_object(splitMessage, Cli_Sock):
    #this method is responsible for caching
    Req_Type = splitMessage[0]
    Req_path = splitMessage[1]
    Req_path = Req_path[1:]
    print "Request is ", Req_Type, " to URL : ", Req_path
    #Searching available cache if file exists
    url = urlparse(Req_path)
    file_to_use = "/" + Req_path
    print file_to_use
    try:
        file = open(file_to_use[5:], "r")
        data = file.readlines()
        print "File Present in Cache\n"
        #Proxy Server Will Send A Response Message
        #Cli_Sock.send("HTTP/1.0 200 OK\r\n")
        #Cli_Sock.send("Content-Type:text/html")
        #Cli_Sock.send("\r\n")
        #Proxy Server Will Send Data
        for i in range(0, len(data)):
            print (data[i])
            Cli_Sock.send(data[i])
        print "Reading file from cache\n"
    except IOError:
        print "File Doesn't Exists In Cache\n fetching file from server \n creating cache"
        serv_proxy = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        host_name = Req_path
        print "HOST NAME:", host_name
        try:
            serv_proxy.connect((url.host_name, 80))
            print 'Socket connected to port 80 of the host'
            fileobj = serv_proxy.makefile('r', 0)
            fileobj.write("GET " + "http://" + Req_path + " HTTP/1.0\n\n")
            # Read the response into buffer
            buffer = fileobj.readlines()
            # Create a new file in the cache for the requested file.
            # Also send the response in the buffer to client socket
            # and the corresponding file in the cache
            tmpFile = open(file_to_use, "wb")
            for data in buffer:
                tmpFile.write(data)
                tcpCliSock.send(data)
        except:
            print 'Illegal Request'
    Cli_Sock.close()

while True:
    # Start receiving data from the client
    print 'Initiating server... \n Accepting connection\n'
    Cli_Sock, addr = Serv_Sock.accept() # Accept a connection from client
    #print addr
    print ' connection received from: ', addr
    message = Cli_Sock.recv(1024) #Receives data from Socket
    splitMessage = message.split()
    if len(splitMessage) <= 1:
        continue
    caching_object(splitMessage, Cli_Sock)

Your errors are not related to the URI scheme (http or https) but to how you use files and sockets.
When you try to open a file with:
file = open(file_to_use[1:], "r")
you are passing an illegal file path (http://ebay.com/ in your example).
Since you are working with URIs, you could use a parser like urlparse to handle the scheme, hostname, etc. more cleanly.
For example:
url = urlparse(Req_path)
file_to_use = url.hostname
file = open(file_to_use, "r")
This uses only the hostname as the file name.
Another problem is with the use of sockets. The connect function should receive a hostname, not a hostname with the scheme, which is what you are passing. Again, with the help of the parser:
serv_proxy.connect((url.hostname, 80))
Besides that, you do not call listen on a client socket (see examples), so you can remove that line.
Finally, to create the new file, use the hostname again:
tmpFile = open(file_to_use, "wb")
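
The hostname extraction described above can be sketched as follows, using Python 3's urllib.parse (the question's code imports urlparse from the Python 2 module of the same name). The helper name hostname_from_request_path is hypothetical, and prepending "http://" when no scheme is present is an assumption added here, because urlparse only fills in hostname when the string contains a scheme or a leading "//".

```python
from urllib.parse import urlparse


def hostname_from_request_path(req_path):
    """Return the hostname part of a request path such as 'http://ebay.com/'."""
    if "://" not in req_path:
        # Assumption: treat a bare 'www.example.com/page' as an http URL.
        req_path = "http://" + req_path
    return urlparse(req_path).hostname


if __name__ == "__main__":
    for path in ("http://ebay.com/", "www.microcenter.com", "google.com/search?q=x"):
        print(path, "->", hostname_from_request_path(path))
```

The returned value can serve both as the cache file name and as the first element of the tuple passed to connect().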

Cache Proxy Server Returning 404 with www.google.com

I have a homework assignment which involves implementing a proxy cache server in Python for web pages. Here is my implementation of it
from socket import *
import sys

def main():
    #Create a server socket, bind it to a port and start listening
    tcpSerSock = socket(AF_INET, SOCK_STREAM) #Initializing socket
    tcpSerSock.bind(("", 8030)) #Binding socket to port
    tcpSerSock.listen(5) #Listening for page requests
    while True:
        #Start receiving data from the client
        print 'Ready to serve...'
        tcpCliSock, addr = tcpSerSock.accept()
        print 'Received a connection from:', addr
        message = tcpCliSock.recv(1024)
        print message
        #Extract the filename from the given message
        filename = ""
        try:
            filename = message.split()[1].partition("/")[2].replace("/", "")
        except:
            continue
        fileExist = False
        try: #Check whether the file exists in the cache
            f = open(filename, "r")
            outputdata = f.readlines()
            fileExist = True
            #ProxyServer finds a cache hit and generates a response message
            tcpCliSock.send("HTTP/1.0 200 OK\r\n")
            tcpCliSock.send("Content-Type:text/html\r\n")
            for data in outputdata:
                tcpCliSock.send(data)
            print 'Read from cache'
        except IOError: #Error handling for file not found in cache
            if fileExist == False:
                c = socket(AF_INET, SOCK_STREAM) #Create a socket on the proxyserver
                try:
                    srv = getaddrinfo(filename, 80)
                    c.connect((filename, 80)) #https://docs.python.org/2/library/socket.html
                    # Create a temporary file on this socket and ask port 80 for
                    # the file requested by the client
                    fileobj = c.makefile('r', 0)
                    fileobj.write("GET " + "http://" + filename + " HTTP/1.0\r\n")
                    # Read the response into buffer
                    buffr = fileobj.readlines()
                    # Create a new file in the cache for the requested file.
                    # Also send the response in the buffer to client socket and the
                    # corresponding file in the cache
                    tmpFile = open(filename,"wb")
                    for data in buffr:
                        tmpFile.write(data)
                        tcpCliSock.send(data)
                except:
                    print "Illegal request"
            else: #File not found
                print "404: File Not Found"
        tcpCliSock.close() #Close the client and the server sockets

main()
I configured my browsers to use my proxy server like so
But my problem when I run it is that no matter what web page I try to access it returns a 404 error with the initial connection and then a connection reset error with subsequent connections. I have no idea why so any help would be greatly appreciated, thanks!

There are quite a number of issues with your code.
Your URL parser is quite cumbersome. Instead of the line
filename = message.split()[1].partition("/")[2].replace("/", "")
I would use
import re
parsed_url = re.match(r'GET\s+http://(([^/]+)(.*))\sHTTP/1.*$', message, re.MULTILINE)
local_path = parsed_url.group(3)
host_name = parsed_url.group(2)
filename = parsed_url.group(1)
The re.MULTILINE flag lets $ match at the end of the request line, since the message contains further header lines after it. If the match fails there (re.match returns None), you should probably return an error, because it is a request your proxy doesn't understand (e.g. a POST).
When you assemble your request to the destination server, you then use
fileobj.write("GET {object} HTTP/1.0\r\n".format(object=local_path))
fileobj.write("Host: {host}\r\n\r\n".format(host=host_name))
You should also include some of the header lines from the original request, because they can make a major difference to the returned content.
Furthermore, you currently cache the entire response with all header lines, so you should not add your own when serving from cache.
What you have doesn't work anyway, because there is no guarantee that you will get a 200 and text/html content. You should check the response code and only cache if you did indeed get a 200.
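
Put together, the parsing, the upstream request, and the status check could be sketched roughly as below. This is Python 3, all three function names are hypothetical, and the regular expression is a slightly stricter variant applied to the first line only rather than the exact pattern from the answer.

```python
import re

# Matches a proxy-style request line such as "GET http://host/path HTTP/1.1".
REQUEST_LINE = re.compile(r"GET\s+http://([^/\s]+)(\S*)\s+HTTP/1\.[01]$")


def parse_proxy_get(message):
    """Return (host_name, local_path), or None if the request is not understood."""
    first_line = message.split("\r\n", 1)[0]
    match = REQUEST_LINE.match(first_line)
    if match is None:
        return None  # e.g. a POST, or a malformed request line
    return match.group(1), match.group(2) or "/"


def build_upstream_request(host_name, local_path):
    """Build the request text sent on to the destination server."""
    return "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n".format(local_path, host_name)


def is_cacheable(response):
    """Cache only when the status line of the response reports 200."""
    parts = response.split("\r\n", 1)[0].split()
    return len(parts) >= 2 and parts[1] == "200"


if __name__ == "__main__":
    msg = "GET http://www.google.com/ HTTP/1.1\r\nHost: www.google.com\r\n\r\n"
    print(parse_proxy_get(msg))
```

Forwarding selected header lines from the original request, as suggested above, is left out of this sketch.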

HTTP Proxy server [Python]

I have this code for an HTTP proxy server, which works. How do I create another program called "Client"? The client will send HTTP GET requests to multiple web servers via the proxy server. It connects to the proxy and sends HTTP GET requests for the following 3 websites (www.google.com, www.yahoo.com, www.stackoverflow.com) with an interval of 30 seconds.
My overall question is: how do I send HTTP GET requests to the proxy server from Python, not from my web browser?
OSX 10.10.3 Python 3.4
When i call this proxy in my terminal:
python 1869.py 2000
You can give any port number in place of 2000.
Output:
starting server ....
Initiating server...
Accepting connection
Then in my browser (I'm using the most up-to-date version of Chrome) I type in:
localhost:2000/www.stackoverflow.com
And my terminal output is:
request is GET to URL : www.stackoverflow.com
/www.stackoverflow.com
File Present in Cache
HTTP/1.1 301 Moved Permanently
Content-Type: text/html; charset=UTF-8
Location: http://stackoverflow.com/
Date: Thu, 07 May 2015 17:45:40 GMT
Content-Length: 148
Connection: close
Age: 0
<head><title>Document Moved</title></head>
<body><h1>Object Moved</h1>This document may be found here</body>
Reading file from cache
Initiating server...
Accepting connection
Proxy code:
import socket
import sys

if len(sys.argv) <= 1:
    print 'Usage: "python S.py port"\n[port : It is the port of the Proxy Server'
    sys.exit(2)

# Server socket created, bound and starting to listen
Serv_Port = int(sys.argv[1]) # sys.argv[1] is the port number entered by the user
Serv_Sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # socket.socket function creates a socket.
# Prepare a server socket
print "starting server ...."
Serv_Sock.bind(('', Serv_Port))
Serv_Sock.listen(5)

def caching_object(splitMessage, Cli_Sock):
    #this method is responsible for caching
    Req_Type = splitMessage[0]
    Req_path = splitMessage[1]
    Req_path = Req_path[1:]
    print "Request is ", Req_Type, " to URL : ", Req_path
    #Searching available cache if file exists
    file_to_use = "/" + Req_path
    print file_to_use
    try:
        file = open(file_to_use[1:], "r")
        data = file.readlines()
        print "File Present in Cache\n"
        #Proxy Server Will Send A Response Message
        #Cli_Sock.send("HTTP/1.0 200 OK\r\n")
        #Cli_Sock.send("Content-Type:text/html")
        #Cli_Sock.send("\r\n")
        #Proxy Server Will Send Data
        for i in range(0, len(data)):
            print (data[i])
            Cli_Sock.send(data[i])
        print "Reading file from cache\n"
    except IOError:
        print "File Doesn't Exists In Cache\n fetching file from server \n creating cache"
        serv_proxy = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        host_name = Req_path
        print "HOST NAME:", host_name
        try:
            serv_proxy.connect((host_name, 80))
            print 'Socket connected to port 80 of the host'
            fileobj = serv_proxy.makefile('r', 0)
            fileobj.write("GET " + "http://" + Req_path + " HTTP/1.0\n\n")
            # Read the response into buffer
            buffer = fileobj.readlines()
            # Create a new file in the cache for the requested file.
            # Also send the response in the buffer to client socket
            # and the corresponding file in the cache
            tmpFile = open("./" + Req_path, "wb")
            for i in range(0, len(buffer)):
                tmpFile.write(buffer[i])
                Cli_Sock.send(buffer[i])
        except:
            print 'Illegal Request'
    Cli_Sock.close()

while True:
    # Start receiving data from the client
    print 'Initiating server... \n Accepting connection\n'
    Cli_Sock, addr = Serv_Sock.accept() # Accept a connection from client
    #print addr
    print ' connection received from: ', addr
    message = Cli_Sock.recv(1024) #Receives data from Socket
    splitMessage = message.split()
    if len(splitMessage) <= 1:
        continue
    caching_object(splitMessage, Cli_Sock)

You can use httpie from the Linux terminal:
http [get/post] http://host:port/link_name/
In your app you can use requests:
pip install requests
import requests
# proxies must be a dict mapping the URL scheme to the proxy address
proxies = {'http': 'http://host:port'}
response = requests.get('url', proxies=proxies)
print(response)
response.close()
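
If you would rather avoid third-party packages, a client can also be sketched with plain sockets. The example below is Python 3 and assumes the proxy from the question, which reads the target site from the request path (as in localhost:2000/www.stackoverflow.com); the names SITES, build_proxy_request, fetch_via_proxy and run_client are made up for this sketch.

```python
import socket
import time

SITES = ["www.google.com", "www.yahoo.com", "www.stackoverflow.com"]


def build_proxy_request(site):
    """Build the GET request the proxy expects: the site name is the path."""
    return "GET /{} HTTP/1.0\r\n\r\n".format(site).encode("ascii")


def fetch_via_proxy(proxy_host, proxy_port, site, timeout=10.0):
    """Send one GET to the proxy and return everything it sends back."""
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(build_proxy_request(site))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the proxy closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks)


def run_client(proxy_host="localhost", proxy_port=2000, interval=30, sites=SITES):
    """Request each site through the proxy, waiting `interval` seconds in between."""
    for index, site in enumerate(sites):
        response = fetch_via_proxy(proxy_host, proxy_port, site)
        print(site, "->", len(response), "bytes received")
        if index < len(sites) - 1:
            time.sleep(interval)
```

With the proxy running on port 2000, calling run_client() would request the three sites 30 seconds apart.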
