Creating a raw HTTP request with sockets - python

I would like to be able to construct a raw HTTP request and send it with a socket. Obviously, you would like me to use something like urllib and urllib2 but I do not want to use that.
It would have to look something like this:
import socket
tcpsoc = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcpsoc.connect(('', 80))  # connect to Google's IP
tcpsoc.send(b'HTTP REQUEST')
response = tcpsoc.recv(4096)
Obviously you would also have to request the page/file and pass GET and POST parameters.

import socket
from urllib.parse import urlparse

# These constants were not defined in the snippet; the values are assumptions.
CONNECTION_TIMEOUT = 5
CHUNK_SIZE = 1024
HTTP_VERSION = '1.0'
CRLF = "\r\n\r\n"


def receive_all(sock, chunk_size=CHUNK_SIZE):
    """Gather all the data from a request."""
    chunks = []
    while True:
        chunk = sock.recv(int(chunk_size))
        if not chunk:
            break  # the server closed the connection
        chunks.append(chunk)
    return b''.join(chunks)


def get(url, **kw):
    kw.setdefault('timeout', CONNECTION_TIMEOUT)
    kw.setdefault('chunk_size', CHUNK_SIZE)
    kw.setdefault('http_version', HTTP_VERSION)
    kw.setdefault('headers_only', False)
    kw.setdefault('response_code_only', False)
    kw.setdefault('body_only', False)
    url = urlparse(url)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(kw.get('timeout'))
    # Use hostname rather than netloc, because netloc may include the port.
    sock.connect((url.hostname, url.port or 80))
    msg = 'GET {0} HTTP/{1}\r\nHost: {2}{3}'
    sock.sendall(msg.format(url.path or '/', kw.get('http_version'),
                            url.hostname, CRLF).encode())
    data = receive_all(sock, chunk_size=kw.get('chunk_size'))
    sock.close()
    data = data.decode(errors='ignore')
    # The first empty line separates the status line and headers from the body.
    head, _, body = data.partition(CRLF)
    status_line, _, headers = head.partition('\r\n')
    response_code = status_line.split()[1]
    if kw['body_only']:
        return body
    if kw['headers_only']:
        return headers
    if kw['response_code_only']:
        return response_code
    return data

Most of what you need to know is in the HTTP/1.1 spec, which you should definitely study if you want to roll your own HTTP implementation.

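Yes, basically you just have to write text, something like:
GET /pageyouwant.html HTTP/1.1[CRLF]
Host: www.example.com[CRLF]
Connection: close[CRLF]
User-Agent: MyAwesomeUserAgent/1.0.0[CRLF]
Accept-Encoding: gzip[CRLF]
Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7[CRLF]
Cache-Control: no-cache[CRLF]
[CRLF]
Feel free to remove / add headers at will, except Host, which HTTP/1.1 requires (www.example.com is only a placeholder here). The empty line at the end marks the end of the headers.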
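As a rough sketch of the same idea in Python: the helper below joins the request line and the headers with CRLF and appends the empty line that ends the header section. The name build_get_request and the host www.example.com are made up for illustration; the function only builds the bytes, and sending them with socket.sendall is left to you.

```python
CRLF = "\r\n"


def build_get_request(host, path="/", extra_headers=None):
    """Return the bytes of a minimal HTTP/1.1 GET request (hypothetical helper)."""
    headers = {"Host": host, "Connection": "close"}
    headers.update(extra_headers or {})
    lines = ["GET {} HTTP/1.1".format(path)]
    lines += ["{}: {}".format(name, value) for name, value in headers.items()]
    # An empty line terminates the header section.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


# Placeholder host and path, used only to show the resulting layout.
request = build_get_request("www.example.com", "/pageyouwant.html",
                            {"User-Agent": "MyAwesomeUserAgent/1.0.0"})
```

Keeping the headers in a dict makes it easy to add or drop fields before the request is serialized.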

"""
This module is a demonstration of how to send
an HTTP request from scratch with the socket module.
"""
import socket

__author__ = "Ricky L Wilson."
__email__ = ""

# The term CRLF refers to Carriage Return (ASCII 13, \r)
# followed by Line Feed (ASCII 10, \n).
# They are used to mark the end of a line, but are
# handled differently by today's popular operating systems.
CRLF = '\r\n'
SP = ' '
CR = '\r'
HOST = ''
PORT = 80
PATH = '/'


def request_header(host=HOST, path=PATH):
    """Create a request header."""
    return CRLF.join([
        "GET {} HTTP/1.1".format(path), "Host: {}".format(host),
        "Connection: Close\r\n\r\n"
    ])
def parse_header(header):
    # The response-header fields allow the server
    # to pass additional information about the
    # response which cannot be placed in the
    # Status-Line.
    # These header fields give information about
    # the server and about further access to the
    # resource identified by the Request-URI.
    # Split on CRLF; splitting on CR alone would leave
    # a stray '\n' in front of every field name.
    header_fields = header.split(CRLF)
    # The first line of a Response message is the
    # Status-Line, consisting of the protocol version
    # followed by a numeric status code and its
    # associated textual phrase, with each element
    # separated by SP characters.
    # Get the numeric status code from the status
    # line.
    code = header_fields.pop(0).split(' ')[1]
    header = {}
    for field in header_fields:
        key, value = field.split(':', 1)
        header[key.lower()] = value.strip()
    return header, code
def send_request(host=HOST, path=PATH, port=PORT):
    """Send an HTTP GET request."""
    # Create the socket object.
    #
    # A network socket is an internal endpoint
    # for sending or receiving data within a node on
    # a computer network.
    # Concretely, it is a representation of this
    # endpoint in networking software (protocol stack),
    # such as an entry in a table
    # (listing communication protocol,
    # destination, status, etc.), and is a form of
    # system resource.
    # The term socket is analogous to physical
    # female connectors, communication between two
    # nodes through a channel being visualized as a
    # cable with two male connectors plugging into
    # sockets at each node.
    # Similarly, the term port (another term for a female connector)
    # is used for external endpoints at a node,
    # and the term socket is also used for an
    # internal endpoint of local inter-process
    # communication (IPC) (not over a network).
    # However, the analogy is limited, as network
    # communication need not be one-to-one or
    # have a dedicated communication channel.
    sock = socket.socket()
    # Connect to the server.
    sock.connect((host, port))
    # Send the request (sockets carry bytes, so encode the text first).
    sock.sendall(request_header(host, path).encode())
    # Get the response.
    response = b''
    chunk = sock.recv(4096)
    while chunk:
        response += chunk
        chunk = sock.recv(4096)
    sock.close()
    # HTTP headers will be separated from the body by an empty line
    header, _, body = response.decode(errors='replace').partition(CRLF + CRLF)
    header, code = parse_header(header)
    return header, code, body


header, code, body = send_request(host='')
print(code, CRLF, body)

For a working example to guide you, you might want to take a look at libcurl, a library written in the C language that:
does what you want and much more;
is a snap to use;
is widely deployed; and
is actively supported.
It's a beautiful thing and one of the best examples of what open source can and should be.


Receive the body content in the HTTP response in python

I am trying to get the content of the body, but whenever I call sock.recv I get 0 bytes back. I already received the header and that worked fine, although I received it byte by byte. My problem now is this: I have the content length, the length of the header, and also the header. Now I want to get the body separately.
Task 3d
PS: I am aware that it can't work as it is on the screenshot but I haven't found another solution yet
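# -*- coding: utf-8 -*-
"""
<Your name>
"""
from socket import gethostbyname, socket, timeout, AF_INET, SOCK_STREAM
from sys import argv

# HTTP_PORT, HTTP_HEADER_DELIMITER and ONE_BYTE_LENGTH are used below, but
# their definitions were not shown; the values here are assumptions.
HTTP_PORT = 80
HTTP_HEADER_DELIMITER = b'\r\n\r\n'
ONE_BYTE_LENGTH = 1
CONTENT_LENGTH_FIELD = b'Content-Length:'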
def create_http_request(host, path, method='GET'):
    r"""
    Create a sequence of bytes representing an HTTP/1.1 request of the given method.

    :param host: the string contains the hostname of the remote server
    :param path: the string contains the path to the document to retrieve
    :param method: the string contains the HTTP request method (e.g., 'GET', 'HEAD', etc...)
    :return: a bytes object contains the HTTP request to send to the remote server

    e.g.,) An HTTP/1.1 GET request to
           path: /
           return: b'GET / HTTP/1.1\r\nHost:\r\n\r\n'
    """
    ### Task 3(a) ###
    # Hint 1: see RFC7230-7231 for the HTTP/1.1 syntax and semantics specification
    # Hint 2: use str.encode() to create an encoded version of the string as a bytes object
    # Every line of the request, including the request line, ends with \r\n.
    r = '{} {} HTTP/1.1\r\nHost: {}\r\n\r\n'.format(method, path, host)
    response = r.encode()
    return response
    ### Task 3(a) END ###
def get_content_length(header):
    """
    Get the integer value from the Content-Length HTTP header field if it
    is found in the given sequence of bytes. Otherwise returns 0.

    :param header: the bytes object contains the HTTP header
    :return: an integer value of the Content-Length, 0 if not found
    """
    ### Task 3(c) ###
    # Hint: use CONTENT_LENGTH_FIELD to find the value
    # Note that the Content-Length field may not be always at the end of the header.
    for line in header.split(b'\r\n'):
        if line.startswith(CONTENT_LENGTH_FIELD):
            return int(line[len(CONTENT_LENGTH_FIELD):])
    return 0
    ### Task 3(c) END ###
def receive_body(sock, content_length):
    """
    Receive the body content in the HTTP response

    :param sock: the TCP socket connected to the remote server
    :param content_length: the size of the content to receive
    :return: a bytes object contains the remaining content (body) in the HTTP response
    """
    ### Task 3(d) ###
    body = bytes()
    data = bytes()
    while True:
        data = sock.recv(content_length)
        if len(data)<=0:
            break
        body += data
    return body
    ### Task 3(d) END ###
def receive_http_response_header(sock):
    """
    Receive the HTTP response header from the TCP socket.

    :param sock: the TCP socket connected to the remote server
    :return: a bytes object that is the HTTP response header received
    """
    ### Task 3(b) ###
    # Hint 1: use HTTP_HEADER_DELIMITER to determine the end of the HTTP header
    # Hint 2: use sock.recv(ONE_BYTE_LENGTH) to receive the chunk byte-by-byte
    header = bytes()
    chunk = bytes()
    while HTTP_HEADER_DELIMITER not in chunk:
        try:
            chunk = sock.recv(ONE_BYTE_LENGTH)
            if not chunk:
                break
            header += chunk
        except timeout:
            break
    return header
    ### Task 3(b) END ###
def main():
    # Change the host and path below to test other web sites!
    host = ''
    path = '/index.html'
    print(f"# Retrieve data from http://{host}{path}")

    # Get the IP address of the host
    ip_address = gethostbyname(host)
    print(f"> Remote server {host} resolved as {ip_address}")

    # Establish the TCP connection to the host
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((ip_address, HTTP_PORT))
    print(f"> TCP Connection to {ip_address}:{HTTP_PORT} established")

    # Uncomment this comment block after Task 3(a)
    # Send an HTTP GET request
    http_get_request = create_http_request(host, path)
    print('\n# HTTP GET request ({} bytes)'.format(len(http_get_request)))
    print(http_get_request)
    sock.sendall(http_get_request)
    # Comment block for Task 3(a) END

    # Uncomment this comment block after Task 3(b)
    # Receive the HTTP response header
    header = receive_http_response_header(sock)
    print('\n# HTTP Response Header ({} bytes)'.format(len(header)))
    print(header)
    # Comment block for Task 3(b) END

    # Uncomment this comment block after Task 3(c)
    content_length = get_content_length(header)
    print('\n# Content-Length')
    print(f"{content_length} bytes")
    # Comment block for Task 3(c) END

    # Uncomment this comment block after Task 3(d)
    body = receive_body(sock, content_length)
    print('\n# Body ({} bytes)'.format(len(body)))
    print(body)
    # Comment block for Task 3(d) END


if __name__ == '__main__':
    main()
I have the content length the length of the header and also the header
You don't. In receive_http_response_header you always check for HTTP_HEADER_DELIMITER only in the latest byte (chunk instead of header), which means that you will never match the end of the header:
    while HTTP_HEADER_DELIMITER not in chunk:
        chunk = sock.recv(ONE_BYTE_LENGTH)
        if not chunk:
            break
        header += chunk
Then you just assume that you've read the full header, while in reality you've read the full response. This means that another recv, made when trying to read the response body, will return 0 bytes since no more data is there, i.e. the body was already included in what you consider the HTTP header.
Apart from that, receive_body is wrong too, since you make a similar mistake as in receive_http_response_header: the goal is not to recv content_length bytes again and again until no more bytes are available, as you do currently. The goal is to keep reading as long as the body is not fully read, and to return once len(body) matches content_length.
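To make both points concrete, here is a minimal sketch of how the two functions could look. It assumes HTTP_HEADER_DELIMITER is b"\r\n\r\n" (its definition is not shown in the question) and, purely for demonstration, uses a local socket pair with a made-up response instead of a real server.

```python
import socket

HTTP_HEADER_DELIMITER = b"\r\n\r\n"  # assumed value, not shown in the question


def receive_http_response_header(sock):
    """Read byte by byte until the accumulated header ends with the delimiter."""
    header = b""
    while not header.endswith(HTTP_HEADER_DELIMITER):
        chunk = sock.recv(1)
        if not chunk:  # the peer closed the connection early
            break
        header += chunk
    return header


def receive_body(sock, content_length):
    """Read until content_length bytes have arrived (or the peer closes)."""
    body = b""
    while len(body) < content_length:
        # Never ask for more than what is still missing.
        chunk = sock.recv(content_length - len(body))
        if not chunk:
            break
        body += chunk
    return body


# Demonstration over a connected socket pair with a made-up response.
client, server = socket.socketpair()
server.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello")
header = receive_http_response_header(client)
body = receive_body(client, 5)
client.close()
server.close()
```

Because the header loop stops right after the delimiter, the body is still waiting in the socket buffer when receive_body is called.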

Why is there both a resolver and a handler in dnslib's DNSServer()?

I am trying to understand the resolving process in dnslib. Specifically, I am using the example to implement a local DNS proxy which will send a request to specific servers based on the query.
(copy of
# -*- coding: utf-8 -*-
from __future__ import print_function

import binascii,socket,struct

from dnslib import DNSRecord,RCODE
from dnslib.server import DNSServer,DNSHandler,BaseResolver,DNSLogger


class ProxyResolver(BaseResolver):
    """
        Proxy resolver - passes all requests to upstream DNS server and
        returns response

        Note that the request/response will each be decoded/re-encoded:

        a) Request packet received by DNSHandler and parsed into DNSRecord
        b) DNSRecord passed to ProxyResolver, serialised back into packet
           and sent to upstream DNS server
        c) Upstream DNS server returns response packet which is parsed into
           DNSRecord
        d) ProxyResolver returns DNSRecord to DNSHandler which re-serialises
           this into packet and returns to client

        In practice this is actually fairly useful for testing but for a
        'real' transparent proxy option the DNSHandler logic needs to be
        modified (see PassthroughDNSHandler)
    """

    def __init__(self,address,port,timeout=0):
        self.address = address
        self.port = port
        self.timeout = timeout

    def resolve(self,request,handler):
        try:
            if handler.protocol == 'udp':
                proxy_r = request.send(self.address,self.port,
                                       timeout=self.timeout)
            else:
                proxy_r = request.send(self.address,self.port,
                                       tcp=True,timeout=self.timeout)
            reply = DNSRecord.parse(proxy_r)
        except socket.timeout:
            reply = request.reply()
            reply.header.rcode = getattr(RCODE,'NXDOMAIN')
        return reply


class PassthroughDNSHandler(DNSHandler):
    """
        Modify DNSHandler logic (get_reply method) to send directly to
        upstream DNS server rather than decoding/encoding packet and
        passing to Resolver (The request/response packets are still
        parsed and logged but this is not inline)
    """

    def get_reply(self,data):
        host,port = self.server.resolver.address,self.server.resolver.port
        request = DNSRecord.parse(data)
        if self.protocol == 'tcp':
            data = struct.pack("!H",len(data)) + data
            response = send_tcp(data,host,port)
            response = response[2:]
        else:
            response = send_udp(data,host,port)
        reply = DNSRecord.parse(response)
        return response


def send_tcp(data,host,port):
    """
        Helper function to send/receive DNS TCP request
        (in/out packets will have prepended TCP length header)
    """
    sock = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    sock.connect((host,port))
    sock.sendall(data)
    response = sock.recv(8192)
    length = struct.unpack("!H",bytes(response[:2]))[0]
    while len(response) - 2 < length:
        response += sock.recv(8192)
    sock.close()
    return response


def send_udp(data,host,port):
    """
        Helper function to send/receive DNS UDP request
    """
    sock = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
    sock.sendto(data,(host,port))
    response,server = sock.recvfrom(8192)
    sock.close()
    return response
if __name__ == '__main__':
import argparse,sys,time
p = argparse.ArgumentParser(description="DNS Proxy")
help="Local proxy port (default:53)")
help="Local proxy listen address (default:all)")
metavar="<dns server:port>",
help="Upstream DNS server:port (default:")
help="TCP proxy (default: UDP only)")
help="Upstream timeout (default: 5s)")
help="Dont decode/re-encode request/response (default: off)")
help="Log hooks to enable (default: +request,+reply,+truncated,+error,-recv,-send,-data)")
help="Log prefix (timestamp/handler/resolver) (default: False)")
args = p.parse_args()
args.dns,_,args.dns_port = args.upstream.partition(':')
args.dns_port = int(args.dns_port or 53)
print("Starting Proxy Resolver (%s:%d -> %s:%d) [%s]" % (
args.address or "*",args.port,
"UDP/TCP" if args.tcp else "UDP"))
resolver = ProxyResolver(args.dns,args.dns_port,args.timeout)
handler = PassthroughDNSHandler if args.passthrough else DNSHandler
logger = DNSLogger(args.log,args.log_prefix)
udp_server = DNSServer(resolver,
if args.tcp:
tcp_server = DNSServer(resolver,
while udp_server.isAlive():
I have successfully injected the business logic of my interactions in the get_reply method of PassthroughDNSHandler:
def get_reply(self, data):
    host, port = self.server.resolver.address, self.server.resolver.port
    request = DNSRecord.parse(data)
    query = str(request.questions[0].qname)
    if query.endswith(''):
        server = ""
    elif any(query.endswith(x) for x in ["", ""]):
        server = ""
    else:
        server = ""
    log.debug(f"{query} redirected to {server}")
    response = send_udp(data, server, port)
    reply = DNSRecord.parse(response)
    return response
This works as expected, the right DNS servers are queried depending on the request.
The part which I do not understand is the involvement of ProxyResolver in the initialization of the server.
resolver = ProxyResolver(args.dns, args.dns_port, args.timeout)
udp_server = DNSServer(resolver, port=53, address="", handler=PassthroughDNSHandler)
What is resolver needed for?
As far as I understand, the packet received on is passed, via handler, to PassthroughDNSHandler and actually processed in get_reply().
It is then further sent to the relevant upstream server via send_udp() and the response is forwarded back to the requesting client.
At what point does resolver get into the picture, and what is its role?
I put a breakpoint in the resolve() method of ProxyResolver and it is never hit.

Send image over http python

I need to build an HTTP server without using an HTTP library.
I have the server running and an HTML page being loaded, but my <img src="..."/> tags are not being loaded. I receive the call but cannot present the PNG/JPEG in the page.
import socket

# Define socket host and port
# (assumed values; the original definitions were not shown)
SERVER_HOST = '0.0.0.0'
SERVER_PORT = 8000

# Create socket
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_socket.bind((SERVER_HOST, SERVER_PORT))
server_socket.listen(1)
print('Listening on port %s ...' % SERVER_PORT)

while True:
    # Wait for client connections
    client_connection, client_address = server_socket.accept()

    # Handle client request
    request = client_connection.recv(1024).decode()
    content = handle_request(request)

    # Send HTTP response
    if content:
        response = 'HTTP/1.1 200 OK\n\n'
        response += content
    else:
        response = 'HTTP/1.1 404 NOT FOUND\n\nFile Not Found'
    client_connection.sendall(response.encode())
    client_connection.close()

# Close socket
server_socket.close()
The function that handles the call:
def handle_request(request):
    http = HttpHandler.HTTPHandler
    # Parse headers
    headers = request.split('\n')
    get_content = headers[0].split()
    accept = headers[6].split()
    type_content = accept[1].split('/')
    try:
        # Filename
        filename = get_content[1]
        if get_content[0] == "GET":
            content = http.get(None, get_content[1], type_content[0])
        return content
    except FileNotFoundError:
        return None
The class that handles the HTTP verbs:
class HTTPHandler:
    def get(self, args, type):
        if args == '/':
            args = '/index.html'
            fin = open('htdocs' + args)
        if type != "image":
            fin = open('htdocs/' + args)
        if type == "image":
            fin = open('htdocs/' + args, 'rb')
        # Read file contents
        content = fin.read()
        fin.close()
        return content
Note that I'm trying to implement HTTP/1.1. If you see anything out of the ordinary, feel free to say so. Thanks in advance.
I don't know where you learned how HTTP works, but I'm pretty sure that you did not study the actual standard, which you should do when implementing a protocol. Some notes about your implementation:
Line ends should be \r\n, not \n. This is true both for responses from the server and for requests from the client.
You are assuming that the client's request is never larger than 1024 bytes and that it can be read with a single recv. But requests can have arbitrary length, and there is no guarantee that you get everything within a single recv (TCP is a streaming protocol, not a message protocol).
While it is kind of OK to simply close the TCP connection after the body, it would be better to include the length of the body in the Content-Length header or to use chunked transfer encoding.
The type of the content should be given with the Content-Type header, i.e. Content-Type: text/html for HTML and Content-Type: image/jpeg for JPEG images. Without it, the browser might guess the type correctly or wrongly, or, depending on the context, might insist on a proper Content-Type header.
Apart from that, when you debug such problems it is helpful to find out what actually gets exchanged between client and server. You may have checked this yourself, but you did not include such information in your question. Your only error description is "...I receive the call but cannot present the PNG/JPEG in the page", followed by a dump of your code.
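A minimal sketch of a response builder that follows these notes might look like the following. The helper name build_response, the small extension table, and the sample bytes are made up for illustration; a real server would need a fuller list of media types.

```python
import os

# Small, illustrative mapping from file extension to media type.
CONTENT_TYPES = {
    ".html": "text/html",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}


def build_response(body, filename):
    """Return a complete HTTP/1.1 200 response for body (bytes); hypothetical helper."""
    extension = os.path.splitext(filename)[1].lower()
    content_type = CONTENT_TYPES.get(extension, "application/octet-stream")
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: {}\r\n"
        "Content-Length: {}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).format(content_type, len(body))
    # Headers are text; the body stays raw bytes so images are not corrupted.
    return head.encode("ascii") + body


# Made-up payload standing in for the contents of an image file.
response = build_response(b"\xff\xd8\xff\xe0 fake jpeg bytes", "photo.jpg")
```

The file would be opened in binary mode ('rb') for every content type, so the same code path serves HTML and images alike.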
Ended up like:
while True:
    # Wait for client connections
    client_connection, client_address = server_socket.accept()
    # Handle client request
    request = client_connection.recv(10240).decode()
    content = handle_request(request)
    # Send HTTP response
    if content:
        if str(content).find("html") > 0:
            client_connection.send('HTTP/1.1 200 OK\r\n\r\n'.encode())
            client_connection.send(content.encode())
        else:
            client_connection.send('HTTP/1.1 200 OK\r\n'.encode())
            client_connection.send("Content-Type: image/jpeg\r\n".encode())
            client_connection.send("Accept-Ranges: bytes\r\n\r\n".encode())
            client_connection.send(content)
    else:
        response = 'HTTP/1.1 404 NOT FOUND\r\n\r\nFile Not Found'
        client_connection.send(response.encode())
    client_connection.close()
And the get method like:
class HTTPHandler:
    def get(self, args, type):
        if args == '/':
            args = '/index.html'
            fin = open('htdocs' + args)
        if type != "image":
            fin = open('htdocs/' + args)
        if type.find("html") == -1:
            image_data = open('htdocs/' + args, 'rb')
            image_bytes = image_data.read()
            image_data.close()
            # Content-Type: image/jpeg, image/png \n\n
            content = image_bytes
            return content
        # Read file contents
        content = fin.read()
        fin.close()
        return content

scapy not getting HTTP requests on non-standard ports

I'm trying to get all HTTP GET/POST incoming requests.
I've found this code which seems promising, but I've noticed that it only works on standard HTTP ports. If I use another port (say 8080) scapy can't find the HTTP layer (packet.haslayer(http.HTTPRequest) == False).
This is the code:
from scapy.all import IP, sniff
from scapy.layers import http


def process_tcp_packet(packet):
    """
    Processes a TCP packet, and if it contains an HTTP request, it prints it.
    """
    if not packet.haslayer(http.HTTPRequest):
        # This packet doesn't contain an HTTP request so we skip it
        return
    http_layer = packet.getlayer(http.HTTPRequest)
    ip_layer = packet.getlayer(IP)
    print('\n{0[src]} just requested a {1[Method]} {1[Host]}{1[Path]}'.format(ip_layer.fields, http_layer.fields))


# Start sniffing the network.
sniff(filter='tcp', prn=process_tcp_packet)
Any idea about what I'm doing wrong?
**** UPDATE ****
I got rid of scapy_http and just looked at the raw data.
I'm posting here the code I'm using - it works fine for me as I'm debugging a strange problem I'm having on Apache Solr - but your mileage may vary.
import os
from datetime import datetime

from scapy.all import IP, Raw


def process_tcp_packet(packet):
    msg = list()
    try:
        if packet.dport == 8983 and packet.haslayer(Raw):
            # HTTP lines end with \r\n regardless of the local platform.
            lines = packet.getlayer(Raw).load.decode(errors='replace').split('\r\n')
            # GET or POST requests ?
            if lines[0].lower().startswith('get /') or lines[0].lower().startswith('post /'):
                # request forwarded by a proxy ?
                _ = [line.split()[1] for line in lines if line.startswith('X-Forwarded-For:')]
                s_ip = (_[0] if _ else packet.getlayer(IP).src)
                # collect info
                d_port = packet.getlayer(IP).dport
                now = datetime.now()  # timestamp for the log line
                msg.append('%s: %s > :%i -> %s' % (now, s_ip, d_port, lines[0]))
    except Exception as e:
        msg.append('%s: ERROR! %s' % (datetime.now(), str(e)))
    finally:
        with open('http.log', 'a') as out:
            for m in msg:
                out.write(m + os.linesep)
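Looking at the raw payload works on any port because it only inspects the TCP payload bytes and does not rely on a port-to-layer binding. As a minimal, library-independent sketch of that check (the function name parse_http_request_line and the sample payload are made up for illustration), the request line can be recognized like this:

```python
HTTP_METHODS = ("GET", "POST")


def parse_http_request_line(payload):
    """Return (method, path, forwarded_for) for an HTTP request payload, else None."""
    lines = payload.decode("latin-1").split("\r\n")
    parts = lines[0].split()
    # A request line looks like: METHOD SP request-target SP HTTP-version
    if len(parts) != 3 or parts[0] not in HTTP_METHODS or not parts[2].startswith("HTTP/"):
        return None
    forwarded = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.lower() == "x-forwarded-for":
            forwarded = value.strip()
    return parts[0], parts[1], forwarded


# Made-up payload resembling a proxied request to Solr.
sample = (b"GET /solr/select?q=*:* HTTP/1.1\r\n"
          b"Host: localhost:8983\r\n"
          b"X-Forwarded-For: 10.0.0.7\r\n\r\n")
result = parse_http_request_line(sample)
```

In a sniffer callback, the payload argument would be the bytes of the packet's Raw layer; note that a request split across several TCP segments would only be recognized from its first segment.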

how to show html page before it was fully downloaded

I am sending some data after the HTML content (with a little delay) in the same response during a keep-alive session, and I want the browser to show the HTML before the whole response is downloaded.
For example, I have the text 'hello, ' and a function that computes 'world' with a delay (let it be 1 sec). So I want the browser to show 'hello, ' immediately and 'world' after its delay. Is it possible within one request (so, without AJAX)?
Here is example Python code of what I do:
import socket
from time import sleep

sock = socket.socket()
sock.bind(('', 9090))
sock.listen(1)
conn, addr = sock.accept()


def give_me_a_world():
    sleep(1)
    return b'world'


while True:
    data = conn.recv(1024)
    response = b'HTTP/1.1 200 OK\r\n'\
               b'Content-Length: 12\r\n'\
               b'Connection: keep-alive\r\n'\
               b'\r\n'\
               b'hello, '
    conn.send(response)  # send first part
    conn.send(give_me_a_world())  # make a delay and send other part
First and foremost, read "How the web works: HTTP and CGI explained" to understand why and where your current code violates HTTP and thus doesn't and shouldn't work.
Now, as per "Is Content-Length or Transfer-Encoding is mandatory in a response when it has body", after fixing the violation, you should either
omit the Content-Length header and close the socket after sending all the data, OR
calculate the length of the entire data to send beforehand and specify it in the Content-Length header.
You could use Transfer-Encoding: chunked and omit Content-Length.
It works fine in text browsers like curl and the Links WWW Browser. But modern graphical browsers don't really start rendering until they reach some sort of buffer boundary.
import socket
from time import sleep

sock = socket.socket()
sock.bind(('', 9090))
sock.listen(1)
conn, addr = sock.accept()


def give_me_a_world():
    sleep(1)
    # One 5-byte chunk, then the zero-length chunk that ends the body.
    return b'5\r\n'\
           b'world\r\n'\
           b'0\r\n'\
           b'\r\n'


while True:
    data = conn.recv(1024)
    response = b'HTTP/1.1 200 OK\r\n'\
               b'Transfer-Encoding: chunked\r\n'\
               b'Connection: keep-alive\r\n'\
               b'\r\n'\
               b'7\r\n'\
               b'hello, \r\n'
    conn.send(response)  # send first part
    conn.send(give_me_a_world())  # make a delay and send other part
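Writing the chunk sizes by hand is error-prone, so a tiny helper can do the framing. This is only a sketch; the names encode_chunk and LAST_CHUNK are made up for illustration. Each chunk is its size in hexadecimal, CRLF, the data, and another CRLF, and a zero-length chunk ends the body.

```python
def encode_chunk(data):
    """Frame data (bytes) as one chunk: hex size, CRLF, data, CRLF."""
    return b"%x\r\n" % len(data) + data + b"\r\n"


LAST_CHUNK = b"0\r\n\r\n"  # zero-length chunk that terminates the body

# The two parts of the response body from the example, framed as chunks.
body = encode_chunk(b"hello, ") + encode_chunk(b"world") + LAST_CHUNK
```

With this helper, each part can be sent with conn.send(encode_chunk(part)) as soon as it is ready, followed by LAST_CHUNK at the end.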

