Receive the body content in the HTTP response in Python

I am trying to get the content of the body, but when I call sock.recv I always get 0 bytes back. I already received the header, and that worked fine, but I received it byte by byte. My problem now is: I have the content length, the length of the header and the header itself. Now I want to get the body separately.
Task 3d
PS: I am aware that it can't work the way it is in the screenshot, but I haven't found another solution yet.
# -*- coding: utf-8 -*-
"""
task3.simple_web_browser
XX-YYY-ZZZ
<Your name>
"""
from socket import gethostbyname, socket, timeout, AF_INET, SOCK_STREAM
from sys import argv
HTTP_HEADER_DELIMITER = b'\r\n\r\n'
CONTENT_LENGTH_FIELD = b'Content-Length:'
HTTP_PORT = 80
ONE_BYTE_LENGTH = 1
def create_http_request(host, path, method='GET'):
    '''
    Create a sequence of bytes representing an HTTP/1.1 request of the given method.
    :param host: the string contains the hostname of the remote server
    :param path: the string contains the path to the document to retrieve
    :param method: the string contains the HTTP request method (e.g., 'GET', 'HEAD', etc...)
    :return: a bytes object contains the HTTP request to send to the remote server
    e.g.,) An HTTP/1.1 GET request to http://compass.unisg.ch/
    host: compass.unisg.ch
    path: /
    return: b'GET / HTTP/1.1\nHost: compass.unisg.ch\r\n\r\n'
    '''
    ### Task 3(a) ###
    # Hint 1: see RFC7230-7231 for the HTTP/1.1 syntax and semantics specification
    # https://tools.ietf.org/html/rfc7230
    # https://tools.ietf.org/html/rfc7231
    # Hint 2: use str.encode() to create an encoded version of the string as a bytes object
    # https://docs.python.org/3/library/stdtypes.html#str.encode
    r = '{} {} HTTP/1.1\nHost: {}\r\n\r\n'.format(method, path, host)
    response = r.encode()
    return response
    ### Task 3(a) END ###
def get_content_length(header):
    '''
    Get the integer value from the Content-Length HTTP header field if it
    is found in the given sequence of bytes. Otherwise returns 0.
    :param header: the bytes object contains the HTTP header
    :return: an integer value of the Content-Length, 0 if not found
    '''
    ### Task 3(c) ###
    # Hint: use CONTENT_LENGTH_FIELD to find the value
    # Note that the Content-Length field may not be always at the end of the header.
    for line in header.split(b'\r\n'):
        if CONTENT_LENGTH_FIELD in line:
            return int(line[len(CONTENT_LENGTH_FIELD):])
    return 0
    ### Task 3(c) END ###
def receive_body(sock, content_length):
    '''
    Receive the body content in the HTTP response
    :param sock: the TCP socket connected to the remote server
    :param content_length: the size of the content to receive
    :return: a bytes object contains the remaining content (body) in the HTTP response
    '''
    ### Task 3(d) ###
    body = bytes()
    data = bytes()
    while True:
        data = sock.recv(content_length)
        if len(data)<=0:
            break
        else:
            body += data
    return body
    ### Task 3(d) END ###
def receive_http_response_header(sock):
    '''
    Receive the HTTP response header from the TCP socket.
    :param sock: the TCP socket connected to the remote server
    :return: a bytes object that is the HTTP response header received
    '''
    ### Task 3(b) ###
    # Hint 1: use HTTP_HEADER_DELIMITER to determine the end of the HTTP header
    # Hint 2: use sock.recv(ONE_BYTE_LENGTH) to receive the chunk byte-by-byte
    header = bytes()
    chunk = bytes()
    try:
        while HTTP_HEADER_DELIMITER not in chunk:
            chunk = sock.recv(ONE_BYTE_LENGTH)
            if not chunk:
                break
            else:
                header += chunk
    except socket.timeout:
        pass
    return header
    ### Task 3(b) END ###
def main():
    # Change the host and path below to test other web sites!
    host = 'example.com'
    path = '/index.html'
    print(f"# Retrieve data from http://{host}{path}")
    # Get the IP address of the host
    ip_address = gethostbyname(host)
    print(f"> Remote server {host} resolved as {ip_address}")
    # Establish the TCP connection to the host
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((ip_address, HTTP_PORT))
    print(f"> TCP Connection to {ip_address}:{HTTP_PORT} established")
    # Uncomment this comment block after Task 3(a)
    # Send an HTTP GET request
    http_get_request = create_http_request(host, path)
    print('\n# HTTP GET request ({} bytes)'.format(len(http_get_request)))
    print(http_get_request)
    sock.sendall(http_get_request)
    # Comment block for Task 3(a) END
    # Uncomment this comment block after Task 3(b)
    # Receive the HTTP response header
    header = receive_http_response_header(sock)
    print(type(header))
    print('\n# HTTP Response Header ({} bytes)'.format(len(header)))
    print(header)
    # Comment block for Task 3(b) END
    # Uncomment this comment block after Task 3(c)
    content_length = get_content_length(header)
    print('\n# Content-Length')
    print(f"{content_length} bytes")
    # Comment block for Task 3(c) END
    # Uncomment this comment block after Task 3(d)
    body = receive_body(sock, content_length)
    print('\n# Body ({} bytes)'.format(len(body)))
    print(body)
    # Comment block for Task 3(d) END
if __name__ == '__main__':
    main()

"I have the content length, the length of the header and also the header"
You don't. In receive_http_response_header you always check HTTP_HEADER_DELIMITER only against the latest byte (chunk instead of header). Since a single byte can never contain the four-byte delimiter, you'll never match the end of the header:
while HTTP_HEADER_DELIMITER not in chunk:
    chunk = sock.recv(ONE_BYTE_LENGTH)
    if not chunk:
        break
    else:
        header += chunk
The loop therefore only ends when recv returns no data, i.e. when the server has closed the connection. You then assume that you've read the full header, while in reality you've read the full response. This means that any further recv you do when trying to read the response body will only return 0 bytes, since no more data is left: the body was already included in what you consider the HTTP header.
Apart from that, receive_body is wrong too, since you make a similar mistake as in receive_http_response_header: the goal is not to call recv(content_length) again and again until no more bytes are available, as you currently do. The goal is to keep reading the remaining data as long as the body is not fully read, and to return as soon as len(body) matches content_length.
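A corrected version of both functions could look like the following minimal sketch. It assumes the same constants as the task code and, as the task's hints suggest, still reads the header one byte at a time, which guarantees that the header loop never consumes any body bytes.

```python
import socket

HTTP_HEADER_DELIMITER = b'\r\n\r\n'
ONE_BYTE_LENGTH = 1


def receive_http_response_header(sock):
    """Read byte by byte until the accumulated header ends with CRLF CRLF."""
    header = b''
    # Check the accumulated header, not just the latest one-byte chunk.
    while not header.endswith(HTTP_HEADER_DELIMITER):
        chunk = sock.recv(ONE_BYTE_LENGTH)
        if not chunk:  # connection closed before the header was complete
            break
        header += chunk
    return header


def receive_body(sock, content_length):
    """Read exactly content_length bytes (fewer only if the peer closes early)."""
    body = b''
    while len(body) < content_length:
        # Ask only for what is still missing; recv may return less than that.
        data = sock.recv(content_length - len(body))
        if not data:  # connection closed before the body was complete
            break
        body += data
    return body
```

Nothing else in main() needs to change: receive_body(sock, content_length) now stops after exactly Content-Length bytes instead of waiting for the server to close the connection. Responses without a Content-Length header (for example chunked ones) are not handled by this sketch.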

Related

Why does my looping Python TCP receiver receive the message only partially?

I have a server that sends some messages to a client. The print(trades) statement shows that the file reader reads the entire CSV correctly:
def send_past_trades(self):
    with open('OTC_trade_records.csv',newline='') as f:
        connectionSocket, addr = self.client
        trades = f.read()
        #print(trades)
        connectionSocket.send(trades.encode())
My client receiver looks like this:
msg = b""
while(True):
    print("Batch receiving")
    tmp = client_socket.recv(4096)
    msg += tmp
    if len(tmp) < 4096:
        print(len(tmp))
        break
msg = msg.decode()
print(msg)
The message is always partial. I can see that the statement "Batch receiving" is printed once, and when the break statement is reached, the length of the last chunk is 1228.
Another point: this code works fine on my local system. The problem occurs when I put the server program on a remote server machine. Is it possible that the server interferes with the message?
Note: I tried different ways to solve the problem, such as sending the data in a loop as 1024-byte messages. I still receive partial messages.
The problem is here:
if len(tmp) < 4096:
    print(len(tmp))
    break
The point is that bufsize in recv(bufsize) is the maximum amount of data to receive. recv will return fewer bytes if fewer are currently available, so a short read does not mean that the message is complete.
I suggest defining a simple communication protocol that describes the structure of a message as a header plus a payload. The header must contain the payload size. This lets you parse the incoming TCP stream, learn the exact size of the message and then receive exactly that amount of data.
A client could look like this:
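import struct
def recv_exactly(connection, n):
    # recv may return fewer bytes than requested, so loop until n bytes arrived
    data = b''
    while len(data) < n:
        chunk = connection.recv(min(4096, n - len(data)))
        if not chunk:
            raise ConnectionError('connection closed before the message was complete')
        data += chunk
    return data
# Receive the 8-byte header and parse the payload length (big-endian unsigned 64-bit)
header = recv_exactly(connection, 8)
(length,) = struct.unpack('>Q', header)
# Receive the payload
payload = recv_exactly(connection, length)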
Server:
import struct
with open('OTC_trade_records.csv', newline='') as f:
    connectionSocket, addr = self.client
    # Encode first: sockets send bytes, and the length prefix must count bytes, not characters
    trades = f.read().encode()
    length = struct.pack('>Q', len(trades))
    connectionSocket.sendall(length)
    connectionSocket.sendall(trades)
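Putting the client and server halves together, the following self-contained sketch shows that two length-prefixed messages sent back to back are separated correctly, even when one of them is larger than a single 4096-byte recv. The helper names send_msg, recv_exactly and recv_msg are illustrative, not part of the original answer, and socket.socketpair() plus a background sender thread stand in for a real network connection.

```python
import socket
import struct
import threading


def send_msg(sock, payload):
    # 8-byte big-endian length prefix, then the payload itself.
    sock.sendall(struct.pack('>Q', len(payload)) + payload)


def recv_exactly(sock, n):
    # recv() may return fewer bytes than asked for, so keep reading.
    data = b''
    while len(data) < n:
        chunk = sock.recv(min(4096, n - len(data)))
        if not chunk:
            raise ConnectionError('peer closed the connection mid-message')
        data += chunk
    return data


def recv_msg(sock):
    (length,) = struct.unpack('>Q', recv_exactly(sock, 8))
    return recv_exactly(sock, length)


def demo():
    left, right = socket.socketpair()
    # The second message (16000 bytes) cannot arrive in one 4096-byte recv.
    messages = [b'id,price\n', b'42,99.5\n' * 2000]

    def sender():
        for m in messages:
            send_msg(left, m)

    t = threading.Thread(target=sender)
    t.start()
    received = [recv_msg(right) for _ in messages]
    t.join()
    left.close()
    right.close()
    return messages, received


if __name__ == '__main__':
    sent, got = demo()
    print(got == sent)  # prints True: message boundaries are preserved
```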

Why is there both a resolver and a handler in dnslib's DNSServer()?

I am trying to understand the resolving process in dnslib. Specifically, I am using the proxy.py example to implement a local DNS proxy which will send a request to specific servers based on the query.
(copy of proxy.py):
# -*- coding: utf-8 -*-
from __future__ import print_function
import binascii,socket,struct
from dnslib import DNSRecord,RCODE
from dnslib.server import DNSServer,DNSHandler,BaseResolver,DNSLogger
class ProxyResolver(BaseResolver):
    """
        Proxy resolver - passes all requests to upstream DNS server and
        returns response
        Note that the request/response will be each be decoded/re-encoded
        twice:
        a) Request packet received by DNSHandler and parsed into DNSRecord
        b) DNSRecord passed to ProxyResolver, serialised back into packet
           and sent to upstream DNS server
        c) Upstream DNS server returns response packet which is parsed into
           DNSRecord
        d) ProxyResolver returns DNSRecord to DNSHandler which re-serialises
           this into packet and returns to client
        In practice this is actually fairly useful for testing but for a
        'real' transparent proxy option the DNSHandler logic needs to be
        modified (see PassthroughDNSHandler)
    """
    def __init__(self,address,port,timeout=0):
        self.address = address
        self.port = port
        self.timeout = timeout
    def resolve(self,request,handler):
        try:
            if handler.protocol == 'udp':
                proxy_r = request.send(self.address,self.port,
                                timeout=self.timeout)
            else:
                proxy_r = request.send(self.address,self.port,
                                tcp=True,timeout=self.timeout)
            reply = DNSRecord.parse(proxy_r)
        except socket.timeout:
            reply = request.reply()
            reply.header.rcode = getattr(RCODE,'NXDOMAIN')
        return reply
class PassthroughDNSHandler(DNSHandler):
    """
        Modify DNSHandler logic (get_reply method) to send directly to
        upstream DNS server rather then decoding/encoding packet and
        passing to Resolver (The request/response packets are still
        parsed and logged but this is not inline)
    """
    def get_reply(self,data):
        host,port = self.server.resolver.address,self.server.resolver.port
        request = DNSRecord.parse(data)
        self.server.logger.log_request(self,request)
        if self.protocol == 'tcp':
            data = struct.pack("!H",len(data)) + data
            response = send_tcp(data,host,port)
            response = response[2:]
        else:
            response = send_udp(data,host,port)
        reply = DNSRecord.parse(response)
        self.server.logger.log_reply(self,reply)
        return response
def send_tcp(data,host,port):
    """
        Helper function to send/receive DNS TCP request
        (in/out packets will have prepended TCP length header)
    """
    sock = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    sock.connect((host,port))
    sock.sendall(data)
    response = sock.recv(8192)
    length = struct.unpack("!H",bytes(response[:2]))[0]
    while len(response) - 2 < length:
        response += sock.recv(8192)
    sock.close()
    return response
def send_udp(data,host,port):
    """
        Helper function to send/receive DNS UDP request
    """
    sock = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
    sock.sendto(data,(host,port))
    response,server = sock.recvfrom(8192)
    sock.close()
    return response
if __name__ == '__main__':
    import argparse,sys,time
    p = argparse.ArgumentParser(description="DNS Proxy")
    p.add_argument("--port","-p",type=int,default=53,
                    metavar="<port>",
                    help="Local proxy port (default:53)")
    p.add_argument("--address","-a",default="",
                    metavar="<address>",
                    help="Local proxy listen address (default:all)")
    p.add_argument("--upstream","-u",default="8.8.8.8:53",
            metavar="<dns server:port>",
                    help="Upstream DNS server:port (default:8.8.8.8:53)")
    p.add_argument("--tcp",action='store_true',default=False,
                    help="TCP proxy (default: UDP only)")
    p.add_argument("--timeout","-o",type=float,default=5,
                    metavar="<timeout>",
                    help="Upstream timeout (default: 5s)")
    p.add_argument("--passthrough",action='store_true',default=False,
                    help="Dont decode/re-encode request/response (default: off)")
    p.add_argument("--log",default="request,reply,truncated,error",
                    help="Log hooks to enable (default: +request,+reply,+truncated,+error,-recv,-send,-data)")
    p.add_argument("--log-prefix",action='store_true',default=False,
                    help="Log prefix (timestamp/handler/resolver) (default: False)")
    args = p.parse_args()
    args.dns,_,args.dns_port = args.upstream.partition(':')
    args.dns_port = int(args.dns_port or 53)
    print("Starting Proxy Resolver (%s:%d -> %s:%d) [%s]" % (
                        args.address or "*",args.port,
                        args.dns,args.dns_port,
                        "UDP/TCP" if args.tcp else "UDP"))
    resolver = ProxyResolver(args.dns,args.dns_port,args.timeout)
    handler = PassthroughDNSHandler if args.passthrough else DNSHandler
    logger = DNSLogger(args.log,args.log_prefix)
    udp_server = DNSServer(resolver,
                           port=args.port,
                           address=args.address,
                           logger=logger,
                           handler=handler)
    udp_server.start_thread()
    if args.tcp:
        tcp_server = DNSServer(resolver,
                               port=args.port,
                               address=args.address,
                               tcp=True,
                               logger=logger,
                               handler=handler)
        tcp_server.start_thread()
    while udp_server.isAlive():
        time.sleep(1)
I have successfully injected my business logic into the get_reply method of PassthroughDNSHandler:
def get_reply(self, data):
    host, port = self.server.resolver.address, self.server.resolver.port
    request = DNSRecord.parse(data)
    query = str(request.questions[0].qname)
    if query.endswith('.example.info.'):
        server = "192.168.10.1"
    elif any(query.endswith(x) for x in ["example.net.", "example.com."]):
        server = "10.24.131.10"
    else:
        server = "1.1.1.1"
    log.debug(f"{query} redirected to {server}")
    response = send_udp(data, server, port)
    reply = DNSRecord.parse(response)
This works as expected: the right DNS servers are queried depending on the request.
The part which I do not understand is the involvement of ProxyResolver in the initialization of the server.
resolver = ProxyResolver(args.dns, args.dns_port, args.timeout)
udp_server = DNSServer(resolver, port=53, address="127.0.0.1", handler=PassthroughDNSHandler)
What is resolver needed for?
As far as I understand, the packet received on 127.0.0.1:53 is passed, via handler, to PassthroughDNSHandler and actually processed in get_reply().
It is then further sent to the relevant upstream server via send_udp() and the response is forwarded back to the requesting client.
At what point does resolver come into the picture, and what is its role?
I put a breakpoint in the resolve() method of ProxyResolver and it is never hit.

Send image over http python

I need to build a http server without using an HTTP library.
I have the server running and an HTML page being loaded, but the images in my <img src="..."/> tags are not being loaded: I receive the request but cannot present the PNG/JPEG in the page.
httpServer.py
# Define socket host and port
SERVER_HOST = '0.0.0.0'
SERVER_PORT = 8000
# Create socket
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_socket.bind((SERVER_HOST, SERVER_PORT))
server_socket.listen(1)
print('Listening on port %s ...' % SERVER_PORT)
while True:
    # Wait for client connections
    client_connection, client_address = server_socket.accept()
    # Handle client request
    request = client_connection.recv(1024).decode()
    content = handle_request(request)
    # Send HTTP response
    if content:
        response = 'HTTP/1.1 200 OK\n\n'
        response += content
    else:
        response = 'HTTP/1.1 404 NOT FOUND\n\nFile Not Found'
    client_connection.sendall(response.encode())
    client_connection.close()
# Close socket
server_socket.close()
The function that handles the request:
def handle_request(request):
    http = HttpHandler.HTTPHandler
    # Parse headers
    print(request)
    headers = request.split('\n')
    get_content = headers[0].split()
    accept = headers[6].split()
    type_content = accept[1].split('/')
    try:
        # Filename
        filename = get_content[1]
        if get_content[0] == "GET":
            content = http.get(None, get_content[1], type_content[0])
            return content
    except FileNotFoundError:
        return None
The class that handles the HTTP verbs:
class HTTPHandler:
    def get(self, args, type):
        if args == '/':
            args = '/index.html'
            fin = open('htdocs' + args)
        if type != "image":
            fin = open('htdocs/' + args)
        if type == "image":
            fin = open('htdocs/' + args, 'rb')
        # Read file contents
        content = fin.read()
        fin.close()
        return content
Note that I'm trying to implement HTTP/1.1, so if you see anything that doesn't follow the standard, feel free to say so. Thanks in advance.
I don't know where you've learnt how HTTP works, but I'm pretty sure that you did not study the actual standard, which you should do when implementing a protocol. Some notes about your implementation:
Line ends should be \r\n, not \n. This is true both for responses from the server and for requests from the client.
You are assuming that the client's request is never larger than 1024 bytes and that it can be read with a single recv. But requests can have arbitrary length, and there is no guarantee that you get everything in a single recv (TCP is a stream protocol, not a message protocol).
While it is kind of OK to simply close the TCP connection after the body, it would be better to include the length of the body in the Content-Length header or to use chunked transfer encoding.
The type of the content should be given in the Content-Type header, e.g. Content-Type: text/html for HTML and Content-Type: image/jpeg for JPEG images. Without it, the browser might guess the type correctly or wrongly, or, depending on the context, might insist on a proper Content-Type header.
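Following those notes, a response for any file could be built as in this minimal sketch. build_response is a hypothetical helper; it assumes the file has already been read in binary mode ('rb') and uses the standard library's mimetypes module to pick the Content-Type from the file extension instead of hard-coding image/jpeg.

```python
import mimetypes


def build_response(body: bytes, path: str, status: str = '200 OK') -> bytes:
    # Guess the media type from the file extension; fall back to a generic type.
    content_type = mimetypes.guess_type(path)[0] or 'application/octet-stream'
    head = (
        f'HTTP/1.1 {status}\r\n'
        f'Content-Type: {content_type}\r\n'
        f'Content-Length: {len(body)}\r\n'
        'Connection: close\r\n'
        '\r\n'  # empty line: end of the header block
    )
    # Only the header is text; the body stays bytes so images are not corrupted.
    return head.encode('ascii') + body
```

The server loop would then call client_connection.sendall(build_response(content, path)) once per request. The same code path works for HTML and images because the body is never converted to str.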
Apart from that, when you debug such problems, it is helpful to find out what actually gets exchanged between client and server. You might have checked this yourself, but you did not include such information in your question. Your only error description is "...I recive the call but cannot preset the png/JPEG in the page", followed by a dump of your code.
httpServer.py
It ended up like this:
while True:
    # Wait for client connections
    client_connection, client_address = server_socket.accept()
    # Handle client request
    request = client_connection.recv(10240).decode()
    content = handle_request(request)
    # Send HTTP response (sendall, because send may write only part of the data)
    if content:
        if str(content).find("html") > 0:
            client_connection.sendall('HTTP/1.1 200 OK\r\n\r\n'.encode())
            # get() may return bytes here, which have no .encode()
            client_connection.sendall(content if isinstance(content, bytes) else content.encode())
        else:
            client_connection.sendall('HTTP/1.1 200 OK\r\n'.encode())
            client_connection.sendall("Content-Type: image/jpeg\r\n".encode())
            client_connection.sendall("Accept-Ranges: bytes\r\n\r\n".encode())
            client_connection.sendall(content)
    else:
        # End the header block with an empty line and actually send the 404
        response = 'HTTP/1.1 404 NOT FOUND\r\n\r\nFile Not Found'
        client_connection.sendall(response.encode())
    client_connection.close()
And the get method like this:
class HTTPHandler:
    def get(self, args, type):
        if args == '/':
            args = '/index.html'
        if type.find("html") == -1:
            # Anything that is not HTML (e.g. an image) is read as raw bytes;
            # "with" closes the file, and no undefined fin is closed here
            with open('htdocs/' + args, 'rb') as image_data:
                content = image_data.read()
            # Content-Type: image/jpeg, image/png \n\n
            return content
        # Read text file contents
        with open('htdocs/' + args) as fin:
            content = fin.read()
        return content

Raw HTTP client not returning any data

import socket
host = 'www.google.com'
port = 80
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try :
    client.connect((host, port))
except socket.error:
    print ("Err")
package = '00101'
package.encode('utf-8')
client.sendall(package.encode(encoding = 'utf-8'))
response = client.recv(4096)
print (response.decode('UTF-8')
I kept getting b'' as the return value, so I tried to decode it. The error I receive is "unexpected EOF while parsing". Should I not include the decode() call in my print? I've tried printing only response, and the .decode() call did not decode anything. What should I try?
You need to send a valid HTTP request. For example:
# Request line, Host header and an empty line, each terminated by CRLF
package = b'GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n'
client.sendall(package)
This correctly returns a redirect on my machine. Note the empty line at the end of the request (the final CRLF CRLF), which marks the end of the request header.
When you send b'00101' and start reading, the Google server has not yet processed your request and returns nothing. If you send a trailing newline (package = b'00101\n'), it will start processing your request, and you will get:
...
<p>Your client has issued a malformed or illegal request. <ins>That’s all we know.</ins>
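For completeness, here is a minimal sketch of a client that sends a well-formed request and reads the whole response. The function name http_get is illustrative; it assumes the socket is already connected, and it sends Connection: close so that the server ends the response by closing the connection, which makes reading until recv() returns b'' a valid way to detect the end.

```python
import socket


def http_get(sock, host, path='/'):
    # CRLF line endings and an empty line to terminate the header block.
    request = (
        f'GET {path} HTTP/1.1\r\n'
        f'Host: {host}\r\n'
        'Connection: close\r\n'
        '\r\n'
    ).encode('ascii')
    sock.sendall(request)
    # With "Connection: close" the server closes after the response,
    # so reading until recv() returns b'' collects the whole response.
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    response = b''.join(chunks)
    # The header block ends at the first empty line.
    header, _, body = response.partition(b'\r\n\r\n')
    return header, body
```

Against a real server you would pass in a connected socket such as socket.create_connection(('www.google.com', 80), timeout=5) and print header.decode('iso-8859-1').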

Creating a raw HTTP request with sockets

I would like to be able to construct a raw HTTP request and send it with a socket. Obviously, you would like me to use something like urllib and urllib2 but I do not want to use that.
It would have to look something like this:
import socket
tcpsoc = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcpsoc.bind(('72.14.192.58', 80)) #bind to googles ip
tcpsoc.send('HTTP REQUEST')
response = tcpsoc.recv()
Obviously, you would also have to request the page/file and pass GET and POST parameters.
import socket
from urllib.parse import urlparse
CONNECTION_TIMEOUT = 5
CHUNK_SIZE = 1024
HTTP_VERSION = 1.0
CRLF = "\r\n\r\n"  # the empty line that ends the header block
socket.setdefaulttimeout(CONNECTION_TIMEOUT)
def receive_all(sock, chunk_size=CHUNK_SIZE):
    '''
    Gather all the data from a request.
    '''
    chunks = []
    while True:
        chunk = sock.recv(int(chunk_size))
        if chunk:
            chunks.append(chunk)
        else:
            break
    return b''.join(chunks)  # recv() returns bytes in Python 3
def get(url, **kw):
    kw.setdefault('timeout', CONNECTION_TIMEOUT)
    kw.setdefault('chunk_size', CHUNK_SIZE)
    kw.setdefault('http_version', HTTP_VERSION)
    kw.setdefault('headers_only', False)
    kw.setdefault('response_code_only', False)
    kw.setdefault('body_only', False)
    url = urlparse(url)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(kw.get('timeout'))
    # hostname excludes the ":port" suffix that netloc may contain
    sock.connect((url.hostname, url.port or 80))
    # Request line plus a Host header, terminated by an empty line
    msg = 'GET {0} HTTP/{1}\r\nHost: {2}{3}'
    request = msg.format(url.path or '/', kw.get('http_version'), url.hostname, CRLF)
    sock.sendall(request.encode('ascii'))
    # With HTTP/1.0 the server closes the connection after the response
    data = receive_all(sock, chunk_size=kw.get('chunk_size'))
    sock.close()
    data = data.decode(errors='ignore')
    # Split at the first empty line into the header block and the body
    headers, _, body = data.partition(CRLF)
    status_line = headers.split('\r\n', 1)[0]
    response_code = status_line.split()[1]
    headers = headers[len(status_line):].strip()
    if kw['body_only']:
        return body
    if kw['headers_only']:
        return headers
    if kw['response_code_only']:
        return response_code
    else:
        return data
print(get('http://www.google.com/'))
Most of what you need to know is in the HTTP/1.1 spec, which you should definitely study if you want to roll your own HTTP implementation: http://www.w3.org/Protocols/rfc2616/rfc2616.html (RFC 2616 has since been obsoleted by RFCs 7230-7235, which in turn were replaced by RFCs 9110-9112).
Yes, basically you just have to write text, something like:
GET /pageyouwant.html HTTP/1.1[CRLF]
Host: google.com[CRLF]
Connection: close[CRLF]
User-Agent: MyAwesomeUserAgent/1.0.0[CRLF]
Accept-Encoding: gzip[CRLF]
Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7[CRLF]
Cache-Control: no-cache[CRLF]
[CRLF]
Feel free to remove / add headers at will.
"""
This module is a demonstration of how to send
a HTTP request from scratch with the socket module.
"""
import socket
__author__ = "Ricky L Wilson."
__email__ = "echoquote#gmail.com"
"""
The term CRLF refers to a Carriage Return (ASCII 13, \r)
followed by a Line Feed (ASCII 10, \n).
Together they mark the termination of a line,
although line endings are handled
differently in today's popular operating systems.
"""
CRLF = '\r\n'
SP = ' '
CR = '\r'
HOST = 'www.example.com'
PORT = 80
PATH = '/'
def request_header(host=HOST, path=PATH):
    """
    Create a request header.
    """
    return CRLF.join([
        "GET {} HTTP/1.1".format(path), "Host: {}".format(host),
        "Connection: Close\r\n\r\n"
    ])
def parse_header(header):
    # The response-header fields allow the server
    # to pass additional information about the
    # response which cannot be placed in the
    # Status-Line.
    # These header fields give information about
    # the server and about further access to the
    # resource identified by the Request-URI.
    # Split on CRLF (not just CR) so that no stray '\n' stays on each field.
    header_fields = header.split(CRLF)
    # The first line of a Response message is the
    # Status-Line, consisting of the protocol version
    # followed by a numeric status code and its
    # associated textual phrase, with each element
    # separated by SP characters.
    # Get the numeric status code from the status
    # line.
    code = header_fields.pop(0).split(' ')[1]
    header = {}
    for field in header_fields:
        key, value = field.split(':', 1)
        header[key.lower()] = value.strip()
    return header, code
def send_request(host=HOST, path=PATH, port=PORT):
    """
    Send an HTTP GET request.
    """
    # Create the socket object.
    """
    A network socket is an internal endpoint
    for sending or receiving data within a node on
    a computer network.
    Concretely, it is a representation of this
    endpoint in networking software (protocol stack),
    such as an entry in a table
    (listing communication protocol,
    destination, status, etc.), and is a form of
    system resource.
    The term socket is analogous to physical
    female connectors, communication between two
    nodes through a channel being visualized as a
    cable with two male connectors plugging into
    sockets at each node.
    Similarly, the term port (another term for a female connector)
    is used for external endpoints at a node,
    and the term socket is also used for an
    internal endpoint of local inter-process
    communication (IPC) (not over a network).
    However, the analogy is limited, as network
    communication need not be one-to-one or
    have a dedicated communication channel.
    """
    sock = socket.socket()
    # Connect to the server.
    sock.connect((host, port))
    # Send the request; in Python 3 sockets send bytes, not str.
    sock.sendall(request_header(host, path).encode('ascii'))
    # Get the response, reading until the server closes the connection
    # ("Connection: Close" was requested).
    response = b''
    chunks = sock.recv(4096)
    while chunks:
        response += chunks
        chunks = sock.recv(4096)
    sock.close()
    # HTTP headers will be separated from the body by an empty line
    header, _, body = response.partition((CRLF + CRLF).encode('ascii'))
    # Header fields are ISO-8859-1 compatible text; the body stays as bytes
    header, code = parse_header(header.decode('iso-8859-1'))
    return header, code, body
header, code, body = send_request(host='www.google.com')
print(code, CRLF, body.decode('utf-8', errors='replace'))
For a working example to guide you, you might want to take a look at libcurl, a library written in the C language that:
does what you want and much more;
is a snap to use;
is widely deployed; and
is actively supported.
It's a beautiful thing and one of the best examples of what open source can and should be.
