I need to build an HTTP server without using an HTTP library.
I have the server running and an HTML page being loaded, but the images in my <img src="..."/> tags are not being loaded. I receive the request but cannot present the PNG/JPEG in the page.
httpServer.py
# Define socket host and port
SERVER_HOST = '0.0.0.0'
SERVER_PORT = 8000

# Create socket
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_socket.bind((SERVER_HOST, SERVER_PORT))
server_socket.listen(1)
print('Listening on port %s ...' % SERVER_PORT)

while True:
    # Wait for client connections
    client_connection, client_address = server_socket.accept()

    # Handle client request
    request = client_connection.recv(1024).decode()
    content = handle_request(request)

    # Send HTTP response
    if content:
        response = 'HTTP/1.1 200 OK\n\n'
        response += content
    else:
        response = 'HTTP/1.1 404 NOT FOUND\n\nFile Not Found'

    client_connection.sendall(response.encode())
    client_connection.close()

# Close socket
server_socket.close()
The function that handles the request:
def handle_request(request):
    http = HttpHandler.HTTPHandler
    # Parse headers
    print(request)
    headers = request.split('\n')
    get_content = headers[0].split()
    accept = headers[6].split()
    type_content = accept[1].split('/')
    try:
        # Filename
        filename = get_content[1]
        if get_content[0] == "GET":
            content = http.get(None, get_content[1], type_content[0])
        return content
    except FileNotFoundError:
        return None
The class that handles the HTTP verbs:
class HTTPHandler:
    def get(self, args, type):
        if args == '/':
            args = '/index.html'
            fin = open('htdocs' + args)
        if type != "image":
            fin = open('htdocs/' + args)
        if type == "image":
            fin = open('htdocs/' + args, 'rb')
        # Read file contents
        content = fin.read()
        fin.close()
        return content
Note that I'm trying to implement HTTP/1.1. If you see anything out of line, feel free to say so. Thanks in advance.
I don't know where you've learnt how HTTP works but I'm pretty sure that you did not study the actual standard which you should do when implementing a protocol. Some notes about your implementation:
Line ends should be \r\n, not \n. This is true for both responses from the server and requests from the client.
You are assuming that the client's request is never larger than 1024 bytes and that it can be read with a single recv. But requests can have arbitrary length, and there is no guarantee that you get everything within a single recv (TCP is a stream protocol, not a message protocol).
While it is kind of OK to simply close the TCP connection after the body, it would be better to include the length of the body in the Content-Length header or to use chunked transfer encoding.
The type of the content should be given using the Content-Type header, i.e. Content-Type: text/html for HTML and Content-Type: image/jpeg for JPEG images. Without this, the browser might guess the type correctly or wrongly, or, depending on the context, might insist on a proper Content-Type header.
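As a rough illustration of the notes above, here is a minimal sketch of how a response could be assembled. The helper name build_response is hypothetical (it is not part of the question's code), and guessing the media type from the file name with the standard mimetypes module is just one possible approach.

```python
import mimetypes


def build_response(path, body, status="200 OK"):
    """Return a complete HTTP/1.1 response as bytes (hypothetical helper)."""
    # Guess the media type from the file extension; fall back to a generic
    # binary type when the extension is unknown.
    content_type, _ = mimetypes.guess_type(path)
    if content_type is None:
        content_type = "application/octet-stream"
    header_lines = [
        "HTTP/1.1 " + status,
        "Content-Type: " + content_type,
        "Content-Length: " + str(len(body)),
        "Connection: close",
    ]
    # Every header line ends with CRLF, and an empty line separates the
    # headers from the body.
    head = "\r\n".join(header_lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```

The sketch expects body to be bytes, so every file (HTML included) would be opened in 'rb' mode and the result passed straight to sendall without calling encode on it.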
Apart from that, if you debug such problems it is helpful to find out what gets actually exchanged between client and server. It might be that you've checked this for yourself but you did not include such information into your question. Your only error description is "...I recive the call but cannot preset the png/JPEG in the page" and then a dump of your code.
httpServer.py
Ended up like:
while True:
    # Wait for client connections
    client_connection, client_address = server_socket.accept()
    # Handle client request
    request = client_connection.recv(10240).decode()
    content = handle_request(request)
    # Send HTTP response
    if content:
        if str(content).find("html") > 0:
            client_connection.send('HTTP/1.1 200 OK\n\n'.encode())
            client_connection.send(content.encode())
        else:
            client_connection.send('HTTP/1.1 200 OK\r\n'.encode())
            client_connection.send("Content-Type: image/jpeg\r\n".encode())
            client_connection.send("Accept-Ranges: bytes\r\n\r\n".encode())
            client_connection.send(content)
    else:
        response = 'HTTP/1.1 404 NOT FOUND\r\nFile Not Found'
    client_connection.close()
And the Get method like:
class HTTPHandler:
    def get(self, args, type):
        if args == '/':
            args = '/index.html'
            fin = open('htdocs' + args)
        if type != "image":
            fin = open('htdocs/' + args)
        if type.find("html") == -1:
            image_data = open('htdocs/' + args, 'rb')
            bytes = image_data.read()
            # Content-Type: image/jpeg, image/png \n\n
            content = bytes
            fin.close()
            return content
        # Read file contents
        content = fin.read()
        fin.close()
        return content
I am trying to read the body of an HTTP response, but whenever I call sock.recv for it I get 0 bytes back. I already receive the header, and that works fine, although I read it byte by byte. My problem now is this: I have the content length, the length of the header, and the header itself. Now I want to get the body separately.
Task 3d
PS: I am aware that it can't work as it is on the screenshot but I haven't found another solution yet
# -*- coding: utf-8 -*-
"""
task3.simple_web_browser
XX-YYY-ZZZ
<Your name>
"""
from socket import gethostbyname, socket, timeout, AF_INET, SOCK_STREAM
from sys import argv

HTTP_HEADER_DELIMITER = b'\r\n\r\n'
CONTENT_LENGTH_FIELD = b'Content-Length:'
HTTP_PORT = 80
ONE_BYTE_LENGTH = 1


def create_http_request(host, path, method='GET'):
    '''
    Create a sequence of bytes representing an HTTP/1.1 request of the given method.
    :param host: the string contains the hostname of the remote server
    :param path: the string contains the path to the document to retrieve
    :param method: the string contains the HTTP request method (e.g., 'GET', 'HEAD', etc...)
    :return: a bytes object contains the HTTP request to send to the remote server
    e.g.,) An HTTP/1.1 GET request to http://compass.unisg.ch/
    host: compass.unisg.ch
    path: /
    return: b'GET / HTTP/1.1\nHost: compass.unisg.ch\r\n\r\n'
    '''
    ### Task 3(a) ###
    # Hint 1: see RFC7230-7231 for the HTTP/1.1 syntax and semantics specification
    # https://tools.ietf.org/html/rfc7230
    # https://tools.ietf.org/html/rfc7231
    # Hint 2: use str.encode() to create an encoded version of the string as a bytes object
    # https://docs.python.org/3/library/stdtypes.html#str.encode
    r = '{} {} HTTP/1.1\nHost: {}\r\n\r\n'.format(method, path, host)
    response = r.encode()
    return response
    ### Task 3(a) END ###


def get_content_length(header):
    '''
    Get the integer value from the Content-Length HTTP header field if it
    is found in the given sequence of bytes. Otherwise returns 0.
    :param header: the bytes object contains the HTTP header
    :return: an integer value of the Content-Length, 0 if not found
    '''
    ### Task 3(c) ###
    # Hint: use CONTENT_LENGTH_FIELD to find the value
    # Note that the Content-Length field may not be always at the end of the header.
    for line in header.split(b'\r\n'):
        if CONTENT_LENGTH_FIELD in line:
            return int(line[len(CONTENT_LENGTH_FIELD):])
    return 0
    ### Task 3(c) END ###


def receive_body(sock, content_length):
    '''
    Receive the body content in the HTTP response
    :param sock: the TCP socket connected to the remote server
    :param content_length: the size of the content to recieve
    :return: a bytes object contains the remaining content (body) in the HTTP response
    '''
    ### Task 3(d) ###
    body = bytes()
    data = bytes()
    while True:
        data = sock.recv(content_length)
        if len(data)<=0:
            break
        else:
            body += data
    return body
    ### Task 3(d) END ###


def receive_http_response_header(sock):
    '''
    Receive the HTTP response header from the TCP socket.
    :param sock: the TCP socket connected to the remote server
    :return: a bytes object that is the HTTP response header received
    '''
    ### Task 3(b) ###
    # Hint 1: use HTTP_HEADER_DELIMITER to determine the end of the HTTP header
    # Hint 2: use sock.recv(ONE_BYTE_LENGTH) to receive the chunk byte-by-byte
    header = bytes()
    chunk = bytes()
    try:
        while HTTP_HEADER_DELIMITER not in chunk:
            chunk = sock.recv(ONE_BYTE_LENGTH)
            if not chunk:
                break
            else:
                header += chunk
    except socket.timeout:
        pass
    return header
    ### Task 3(b) END ###


def main():
    # Change the host and path below to test other web sites!
    host = 'example.com'
    path = '/index.html'
    print(f"# Retrieve data from http://{host}{path}")

    # Get the IP address of the host
    ip_address = gethostbyname(host)
    print(f"> Remote server {host} resolved as {ip_address}")

    # Establish the TCP connection to the host
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((ip_address, HTTP_PORT))
    print(f"> TCP Connection to {ip_address}:{HTTP_PORT} established")

    # Uncomment this comment block after Task 3(a)
    # Send an HTTP GET request
    http_get_request = create_http_request(host, path)
    print('\n# HTTP GET request ({} bytes)'.format(len(http_get_request)))
    print(http_get_request)
    sock.sendall(http_get_request)
    # Comment block for Task 3(a) END

    # Uncomment this comment block after Task 3(b)
    # Receive the HTTP response header
    header = receive_http_response_header(sock)
    print(type(header))
    print('\n# HTTP Response Header ({} bytes)'.format(len(header)))
    print(header)
    # Comment block for Task 3(b) END

    # Uncomment this comment block after Task 3(c)
    content_length = get_content_length(header)
    print('\n# Content-Length')
    print(f"{content_length} bytes")
    # Comment block for Task 3(c) END

    # Uncomment this comment block after Task 3(d)
    body = receive_body(sock, content_length)
    print('\n# Body ({} bytes)'.format(len(body)))
    print(body)
    # Comment block for Task 3(d) END


if __name__ == '__main__':
    main()
I have the content length the length of the header and also the header
You don't. In receive_http_response_header you always check HTTP_HEADER_DELIMITER only against the latest byte (chunk instead of header), which means that you'll never match the end of the header:
while HTTP_HEADER_DELIMITER not in chunk:
    chunk = sock.recv(ONE_BYTE_LENGTH)
    if not chunk:
        break
    else:
        header += chunk
Then you just assume that you've read the full header, while in reality you've read the full response. This means that the next recv, which you do when trying to read the response body, will only return 0 bytes since no more data is available, i.e. the body was already included in what you consider the HTTP header.
Apart from that, receive_body is wrong too, since you make a similar mistake as in receive_http_response_header: the goal is not to recv content_length bytes again and again until no more bytes are available, as you do currently. The goal is to keep reading the remaining data as long as the body is not fully read, and to return once len(body) matches content_length.
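A minimal sketch of both fixes might look like the following. It reuses the function and constant names from the question, drops the timeout handling for brevity, and assumes the response carries a Content-Length header (no chunked transfer encoding).

```python
import socket

HTTP_HEADER_DELIMITER = b"\r\n\r\n"


def receive_http_response_header(sock):
    """Read byte by byte until the accumulated header ends with the delimiter."""
    header = b""
    # Test the whole header, not just the most recent one-byte chunk.
    while not header.endswith(HTTP_HEADER_DELIMITER):
        chunk = sock.recv(1)
        if not chunk:  # the peer closed the connection early
            break
        header += chunk
    return header


def receive_body(sock, content_length):
    """Read exactly content_length bytes, or fewer if the peer closes early."""
    body = b""
    while len(body) < content_length:
        # Ask only for what is still missing; recv may return less than that.
        data = sock.recv(content_length - len(body))
        if not data:
            break
        body += data
    return body
```

Because the header is read one byte at a time, nothing from the body is consumed by accident, so receive_body can start reading right where the header ended.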
I have a homework assignment which involves implementing a proxy cache server in Python for web pages. Here is my implementation of it
from socket import *
import sys

def main():
    #Create a server socket, bind it to a port and start listening
    tcpSerSock = socket(AF_INET, SOCK_STREAM) #Initializing socket
    tcpSerSock.bind(("", 8030)) #Binding socket to port
    tcpSerSock.listen(5) #Listening for page requests
    while True:
        #Start receiving data from the client
        print 'Ready to serve...'
        tcpCliSock, addr = tcpSerSock.accept()
        print 'Received a connection from:', addr
        message = tcpCliSock.recv(1024)
        print message

        #Extract the filename from the given message
        filename = ""
        try:
            filename = message.split()[1].partition("/")[2].replace("/", "")
        except:
            continue
        fileExist = False

        try: #Check whether the file exists in the cache
            f = open(filename, "r")
            outputdata = f.readlines()
            fileExist = True
            #ProxyServer finds a cache hit and generates a response message
            tcpCliSock.send("HTTP/1.0 200 OK\r\n")
            tcpCliSock.send("Content-Type:text/html\r\n")
            for data in outputdata:
                tcpCliSock.send(data)
            print 'Read from cache'
        except IOError: #Error handling for file not found in cache
            if fileExist == False:
                c = socket(AF_INET, SOCK_STREAM) #Create a socket on the proxyserver
                try:
                    srv = getaddrinfo(filename, 80)
                    c.connect((filename, 80)) #https://docs.python.org/2/library/socket.html
                    # Create a temporary file on this socket and ask port 80 for
                    # the file requested by the client
                    fileobj = c.makefile('r', 0)
                    fileobj.write("GET " + "http://" + filename + " HTTP/1.0\r\n")
                    # Read the response into buffer
                    buffr = fileobj.readlines()
                    # Create a new file in the cache for the requested file.
                    # Also send the response in the buffer to client socket and the
                    # corresponding file in the cache
                    tmpFile = open(filename,"wb")
                    for data in buffr:
                        tmpFile.write(data)
                        tcpCliSock.send(data)
                except:
                    print "Illegal request"
            else: #File not found
                print "404: File Not Found"
        tcpCliSock.close() #Close the client and the server sockets

main()
I configured my browsers to use my proxy server like so
But my problem when I run it is that no matter what web page I try to access it returns a 404 error with the initial connection and then a connection reset error with subsequent connections. I have no idea why so any help would be greatly appreciated, thanks!
There are quite a number of issues with your code.
Your URL parser is quite cumbersome. Instead of the line
filename = message.split()[1].partition("/")[2].replace("/", "")
I would use
import re
parsed_url = re.match(r'GET\s+http://(([^/\s]+)(\S*))\s+HTTP/1\.\d', message)
local_path = parsed_url.group(3)
host_name = parsed_url.group(2)
filename = parsed_url.group(1)
If you catch an exception there, you should probably throw an error because it is a request your proxy doesn't understand (e.g. a POST).
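As a sketch of that idea in Python 3, the parsing can be wrapped in a small function that raises on anything it does not understand. The name parse_proxy_request and the slightly stricter pattern are assumptions made for illustration only.

```python
import re

# Absolute-form request line as sent to a proxy, e.g.
# "GET http://example.com/index.html HTTP/1.1"
REQUEST_LINE = re.compile(r'GET\s+http://(([^/\s]+)(\S*))\s+HTTP/1\.\d')


def parse_proxy_request(message):
    """Return (host_name, local_path, filename) or raise ValueError."""
    # Only the first line of the request carries the method and URL.
    first_line = message.split('\r\n', 1)[0]
    match = REQUEST_LINE.match(first_line)
    if match is None:
        raise ValueError('unsupported request: %r' % first_line)
    filename, host_name, local_path = match.groups()
    # An empty path means the root document.
    return host_name, local_path or '/', filename
```

A caller could catch the ValueError and answer with an error status instead of silently continuing.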
When you assemble your request to the destination server, you then use
fileobj.write("GET {object} HTTP/1.0\r\n".format(object=local_path))
fileobj.write("Host: {host}\r\n\r\n".format(host=host_name))
You should also include some of the header lines from the original request because they can make a major difference to the returned content.
Furthermore, you currently cache the entire response with all header lines, so you should not add your own when serving from cache.
What you have doesn't work, anyway, because there is no guarantee that you will get a 200 and text/html content. You should check the response code and only cache if you did indeed get a 200.
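One way to make that check, sketched in Python 3 on the raw response bytes. Both function names are hypothetical, and real caching rules depend on far more than the status code; this only covers the point made above.

```python
def parse_status_code(response):
    """Return the numeric status code of a raw HTTP response, or None."""
    status_line = response.split(b"\r\n", 1)[0]
    parts = status_line.split()
    # A status line looks like: HTTP/1.0 200 OK
    if len(parts) >= 2 and parts[0].startswith(b"HTTP/") and parts[1].isdigit():
        return int(parts[1])
    return None


def should_cache(response):
    """Cache only responses whose status code is 200."""
    return parse_status_code(response) == 200
```

The proxy would still forward every response to the client and only skip writing the cache file when should_cache returns False.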
A strange question: I can see the text content, but I can't see the picture content, and I don't know why. Chrome reports a fault: "Resource interpreted as Image but transferred with MIME type text/html:". I suspect that my if/else code does not work. The picture should be sent as image/jpg, but I don't know why it isn't or how to fix it.
import socket

#Address
#HTTP server
HOST = ''
PORT = 8000

#prepare HTTP response
#start line, head and body
text_content = '''HTTP/1.x 200 OK
Content-Type: text/html
<html>
<head>
<title>WOW</title>
</head>
<p>WOW,python server</p>
<img src="test.jpg/">
</html>
'''

#read picture, put into HTTP format
f = open('test.jpg','rb')
pic_content = '''
HTTP/1.x 200 OK
Content-Type: image/jpg
'''
pic_content = pic_content + f.read()
f.close()

#configure socket
s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
s.bind((HOST,PORT))

#infinite loop, serve forever
while True:
    #3: maximum number of requests waiting
    s.listen(3)
    conn, addr = s.accept()
    request = conn.recv(1024)
    method = request.split(' ')[0]
    src = request.split(' ')[1]

    #deal with GET method
    if method =='GET':
        #URL
        if src =='/test.jpg':
            content = pic_content
        else:content = text_content

        print 'Connected by',addr
        print 'Request is:', request
        conn.sendall(content)
    #close connection
    conn.close()
You are missing blank lines between the HTTP header and the body (after Content-Type).
Since you are already logging the request, you can see that the browser is requesting test.jpg/ with an extraneous trailing slash. Remove it, and it works.
From command line
client.py Aaron 12000 HelloWorld.html GET
client.py
def main(argv):
    serverName = argv[0]
    serverPort = int(argv[1])
    fileName = argv[2]
    typeOfHttpRequest = argv[3]
    clientSocket = socket(AF_INET, SOCK_STREAM)
    clientSocket.connect((serverName, serverPort))
    clientSocket.send(typeOfHttpRequest + " " + fileName + " HTTP/1.1\r\n\r\n")
    content = clientSocket.recv(1024)
    print content
    clientSocket.close()

if __name__ == "__main__":
    main(sys.argv[1:])
server.py
while True:
    #Establish the connection
    print 'Ready to serve....'
    connectionSocket, addr = serverSocket.accept()
    try:
        message = connectionSocket.recv(1024)
        typeOfRequest = message.split()[0]
        filename = message.split()[1]
        print typeOfRequest
        print filename
        f = open(filename[1:])
        outputdata = f.read()
        if typeOfRequest == 'GET':
            for i in range(0, len(outputdata)):
                connectionSocket.send(outputdata[i])
            connectionSocket.close()
        elif typeOfRequest == 'HEAD':
            connectionSocket.send(True)
    except IOError:
        connectionSocket.send('HTTP/1.1 404 Not Found')
        connectionSocket.close()
serverSocket.close()
I have put HelloWorld.html in the same directory as server.py but this always generates an IOError. Anyone know why it might be the case?
The files are located in C:\Networking
os.getcwd shows C:\Networking
HelloWorld.html is located in C:/networking/HelloWorld.html
Filename prints out correctly.
As you might have noticed, you were trying to strip the / from the beginning of the URL, though it was not there. However, there are other errors in your code, which mean that it does not work like an HTTP server:
First of all, recv() is not guaranteed to read all the data: even if a total of 1024 bytes were written to the socket, recv(1024) could return just 10 bytes, say. Thus it is better to read in a loop:
buffer = []
while True:
    data = connection_socket.recv(1024)
    if not data:
        break
    buffer.append(data)
message = ''.join(buffer)
Now message contains everything the client sent. Note that this loop only ends once the client closes or shuts down its sending side; an HTTP server would normally stop reading as soon as the blank line that ends the request headers has arrived.
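For a request without a body, such as a GET, a variant of that loop can stop at the blank line that terminates the headers instead of waiting for the connection to close. This is a Python 3 sketch; the function name and the limit parameter are assumptions added for illustration.

```python
import socket


def receive_request_head(connection_socket, limit=65536):
    """Read until the blank line that ends the request headers."""
    buffer = b""
    # Stop at the header terminator, at end of stream, or at the size limit.
    while b"\r\n\r\n" not in buffer and len(buffer) < limit:
        data = connection_socket.recv(1024)
        if not data:
            break
        buffer += data
    return buffer
```

The size limit keeps a misbehaving client from making the server buffer data without bound.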
Next, to handle the header lines of the request, you can use
from cStringIO import StringIO
message_reader = StringIO(message)
first_line = next(message_reader)
type_of_request, filename = first_line.split()[:2]
With this it is easier to extend your code for more complete HTTP support.
Now open the file using open in a with statement:
with open(filename) as f:
    output_data = f.read()
This ensures that the file is closed properly too.
Finally, when you respond to the request, you should answer with HTTP/1.0, not HTTP/1.1, as you are not supporting the full extent of HTTP/1.1. Also, even an OK response needs to respond with full headers, say with:
HTTP/1.0 200 OK
Server: My Python Server
Content-Length: 123
Content-Type: text/html;charset=UTF-8

data goes here....
Thus your send routine should do that:
if typeOfRequest == 'GET':
    # The empty line (\r\n\r\n) must come after the last header, not before it
    headers = ('HTTP/1.0 200 OK\r\n'
               'Server: My Python Server\r\n'
               'Content-Length: %d\r\n'
               'Content-Type: text/html;charset=UTF-8\r\n'
               'Connection: close\r\n\r\n'
               ) % len(output_data)
    connection_socket.sendall(headers)
    connection_socket.sendall(output_data)
Notice how you can use sendall to send all data from a string.
I would like to be able to construct a raw HTTP request and send it with a socket. Obviously, you would like me to use something like urllib and urllib2 but I do not want to use that.
It would have to look something like this:
import socket
tcpsoc = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcpsoc.bind(('72.14.192.58', 80)) #bind to googles ip
tcpsoc.send('HTTP REQUEST')
response = tcpsoc.recv()
Obviously you would also have to request the page/file and get and post parameters
import socket
import urlparse

CONNECTION_TIMEOUT = 5
CHUNK_SIZE = 1024
HTTP_VERSION = 1.0
# Despite its name, this holds two CRLF pairs: the blank line that ends the headers
CRLF = "\r\n\r\n"

socket.setdefaulttimeout(CONNECTION_TIMEOUT)


def receive_all(sock, chunk_size=CHUNK_SIZE):
    '''
    Gather all the data from a request.
    '''
    chunks = []
    while True:
        chunk = sock.recv(int(chunk_size))
        if chunk:
            chunks.append(chunk)
        else:
            break
    return ''.join(chunks)


def get(url, **kw):
    kw.setdefault('timeout', CONNECTION_TIMEOUT)
    kw.setdefault('chunk_size', CHUNK_SIZE)
    kw.setdefault('http_version', HTTP_VERSION)
    kw.setdefault('headers_only', False)
    kw.setdefault('response_code_only', False)
    kw.setdefault('body_only', False)
    url = urlparse.urlparse(url)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(kw.get('timeout'))
    # url.hostname leaves out any ":port" suffix that url.netloc would include
    sock.connect((url.hostname, url.port or 80))
    # No space between the version and the terminator
    msg = 'GET {0} HTTP/{1}{2}'
    sock.sendall(msg.format(url.path or '/', kw.get('http_version'), CRLF))
    data = receive_all(sock, chunk_size=kw.get('chunk_size'))
    sock.shutdown(socket.SHUT_RDWR)
    sock.close()

    data = data.decode(errors='ignore')
    headers = data.split(CRLF, 1)[0]
    request_line = headers.split('\n')[0]
    response_code = request_line.split()[1]
    headers = headers.replace(request_line, '')
    body = data.replace(headers, '').replace(request_line, '')

    if kw['body_only']:
        return body
    if kw['headers_only']:
        return headers
    if kw['response_code_only']:
        return response_code
    else:
        return data


print(get('http://www.google.com/'))
Most of what you need to know is in the HTTP/1.1 spec, which you should definitely study if you want to roll your own HTTP implementation: http://www.w3.org/Protocols/rfc2616/rfc2616.html
Yes, basically you just have to write text, something like:
GET /pageyouwant.html HTTP/1.1[CRLF]
Host: google.com[CRLF]
Connection: close[CRLF]
User-Agent: MyAwesomeUserAgent/1.0.0[CRLF]
Accept-Encoding: gzip[CRLF]
Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7[CRLF]
Cache-Control: no-cache[CRLF]
[CRLF]
Feel free to remove / add headers at will.
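A small Python 3 sketch of assembling such a request from a dictionary of headers. The helper build_request is hypothetical; each [CRLF] marker above becomes a literal \r\n.

```python
def build_request(method, path, host, headers=None):
    """Return the raw request bytes for the given method, path and headers."""
    lines = ["{} {} HTTP/1.1".format(method, path), "Host: " + host]
    for name, value in (headers or {}).items():
        lines.append("{}: {}".format(name, value))
    # Each line ends with CRLF; one extra CRLF marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

The resulting bytes can be passed to sock.sendall() on a socket that was connected (not bound) to the server's address and port.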
"""
This module is a demonstration of how to send
an HTTP request from scratch with the socket module.
"""
import socket

__author__ = "Ricky L Wilson."
__email__ = "echoquote#gmail.com"
"""
The term CRLF refers to Carriage Return (ASCII 13, \r)
Line Feed (ASCII 10, \n).
They're used to note the termination of a line,
however, dealt with
differently in today's popular Operating Systems.
"""
CRLF = '\r\n'
SP = ' '
CR = '\r'
HOST = 'www.example.com'
PORT = 80
PATH = '/'


def request_header(host=HOST, path=PATH):
    """
    Create a request header.
    """
    return CRLF.join([
        "GET {} HTTP/1.1".format(path), "Host: {}".format(host),
        "Connection: Close\r\n\r\n"
    ])


def parse_header(header):
    # The response-header fields allow the server
    # to pass additional information about the
    # response which cannot be placed in the
    # Status- Line.
    # These header fields give information about
    # the server and about further access to the
    # resource identified by the Request-URI.
    # Split on CRLF rather than CR alone so that no
    # field name starts with a stray '\n'.
    header_fields = header.split(CRLF)
    # The first line of a Response message is the
    # Status-Line, consisting of the protocol version
    # followed by a numeric status code and its
    # associated textual phrase, with each element
    # separated by SP characters.
    # Get the numeric status code from the status
    # line.
    code = header_fields.pop(0).split(' ')[1]
    header = {}
    for field in header_fields:
        key, value = field.split(':', 1)
        header[key.lower()] = value.strip()
    return header, code


def send_request(host=HOST, path=PATH, port=PORT):
    """
    Send an HTTP GET request.
    """
    # Create the socket object.
    """
    A network socket is an internal endpoint
    for sending or receiving data within a node on
    a computer network.
    Concretely, it is a representation of this
    endpoint in networking software (protocol stack),
    such as an entry in a table
    (listing communication protocol,
    destination, status, etc.), and is a form of
    system resource.
    The term socket is analogous to physical
    female connectors, communication between two
    nodes through a channel being visualized as a
    cable with two male connectors plugging into
    sockets at each node.
    Similarly, the term port (another term for a female connector)
    is used for external endpoints at a node,
    and the term socket is also used for an
    internal endpoint of local inter-process
    communication (IPC) (not over a network).
    However, the analogy is limited, as network
    communication need not be one-to-one or
    have a dedicated communication channel.
    """
    sock = socket.socket()
    # Connect to the server.
    sock.connect((host, port))
    # Send the request.
    sock.send(request_header(host, path))
    # Get the response.
    response = ''
    chunk = sock.recv(4096)
    while chunk:
        response += chunk
        chunk = sock.recv(4096)
    # HTTP headers will be separated from the body by an empty line
    header, _, body = response.partition(CRLF + CRLF)
    header, code = parse_header(header)
    return header, code, body


header, code, body = send_request(host='www.google.com')
print code, CRLF, body
For a working example to guide you, you might want to take a look at libcurl, a library written in the C language that:
does what you want and much more;
is a snap to use;
is widely deployed; and
is actively supported.
It's a beautiful thing and one of the best examples of what open source can and should be.