I am implementing a simple web proxy using the Python socket module. After forwarding the client's HTTP request to the server, I use the following method to read the response:
def _read_response(self):
    response = ''
    while True:
        (readable, _, error) = select.select([self.server], [], [self.server], 3)
        if error:
            break
        if readable:
            data = self.server.recv(BUFSIZE)
            if not data:
                break
            response += data
    return response
The above code seems to work in most cases, but it is slow. I narrowed the problem down to this line:
data = self.server.recv(BUFSIZE)
This call takes upwards of 20 seconds when there is no longer any data to receive (when data == '').
What is the correct way to read an HTTP response, and why does the call to recv() take so long?
Parse the Content-Length header before reading the body, then read only that many bytes from the server.
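For example, a minimal sketch of that approach might look like this (it reuses BUFSIZE and self.server from the question, assumes the response is not chunked, and is written for Python 3, so the buffers are bytes):

def _read_response(self):
    # Read until the end of the header block.
    raw = b''
    while b'\r\n\r\n' not in raw:
        chunk = self.server.recv(BUFSIZE)
        if not chunk:
            return raw
        raw += chunk
    header_block, _, body = raw.partition(b'\r\n\r\n')

    # Find the Content-Length header.
    content_length = 0
    for line in header_block.split(b'\r\n'):
        if line.lower().startswith(b'content-length:'):
            content_length = int(line.split(b':', 1)[1].strip())
            break

    # Read exactly the remaining body bytes, then stop.
    while len(body) < content_length:
        chunk = self.server.recv(BUFSIZE)
        if not chunk:
            break
        body += chunk
    return header_block + b'\r\n\r\n' + body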
You may want to put the socket in non-blocking mode with:
socket.setblocking(flag)
or set a timeout for socket operations with:
socket.settimeout(value)
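For example, with a timeout set, the read loop from the question can simply catch socket.timeout instead of blocking indefinitely (a sketch reusing self.server and BUFSIZE):

import socket  # for socket.timeout

self.server.settimeout(3)  # give up on recv() after 3 seconds
response = b''
while True:
    try:
        data = self.server.recv(BUFSIZE)
    except socket.timeout:
        break  # no data arrived within the timeout
    if not data:
        break  # the server closed the connection
    response += data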
I have a Google Cloud Function that fetches data from a few websites with HTTP requests.
I want to send the data back as it comes in and not wait for all of it.
I was thinking streaming the data should work, but I have no idea how to do it, on either the server or the client side.
I have already tried about 20 examples from Google and none of them worked for me.
I tried:
response.write
response.send
and also this client-side code:
import json
import requests
r = requests.get('https://httpbin.org/stream/20', stream=True)
for line in r.iter_lines():
    # filter out keep-alive new lines
    if line:
        decoded_line = line.decode('utf-8')
        print(json.loads(decoded_line))
I am not sure which side I am getting wrong, the client or the server, but this is not working.
Does anyone know how to stream data in Python from an HTTP request to an HTTP response?
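One way this is commonly done on the server side is to return a generator from a Flask-style view, so each chunk is written out as it is produced. A hypothetical sketch (the route name and source URL are made up, and whether the chunks actually reach the client incrementally depends on the hosting platform not buffering the response):

import requests
from flask import Flask, Response

app = Flask(__name__)

@app.route('/proxy-stream')  # hypothetical route
def proxy_stream():
    def generate():
        # stream=True so requests does not buffer the whole body
        with requests.get('https://httpbin.org/stream/20', stream=True) as r:
            for line in r.iter_lines():
                if line:  # skip keep-alive new lines
                    yield line + b'\n'  # forward each line as it arrives
    # Returning a generator lets Flask write chunks as they are yielded.
    return Response(generate(), mimetype='text/plain')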
I have a Django App that accepts messages from a remote device as a POST message.
This fits well with Django's framework! I used the generic View class (from django.views import View) and defined my own POST function.
But the remote device requires a special reply that I cannot generate in Django (yet). So, I use the Requests library to re-create the POST message and send it up to the manufacturer's cloud server.
That server processes the data and responds with the special message in the body. Ideally, the entire HTTP response message should go back to the remote device. If it does not get a valid reply, it will re-send the message, which would be annoying!
I've been googling, but am having trouble getting a clear picture on how to either:
(a): Reply back in Django with the Requests.response object without any edits.
(b): Build a Django response and send it back.
Actually, I think I do know how to do (b), but it's work. I would rather do (a) if it's possible.
Thanks in Advance!
Rich.
Thanks for the comments and questions!
The perils of late-night programming: you might over-think something, or miss the obvious. I was so focused on finding a way to return the requests response without any changes/edits that I did not even sketch out what option (b) would be.
Well, it turns out it's pretty simple:
# Imports assumed for this snippet:
import time
from requests import Request, Session
from django.http import HttpResponse

s = Session()

# Populate POST to cloud with data from remote device request:
req = Request('POST', url, data=data, headers=headers)
prepped = req.prepare()

timeout = 10
retries = 3
while retries > 0:
    try:
        logger.debug("POST data to remote host")
        resp = s.send(prepped, timeout=timeout)
        break
    except:
        logger.debug("remote host connection failed, retry")
        retries -= 1
        logger.debug("retries left: %d", retries)
        time.sleep(.3)
if retries == 0:
    pass  # There isn't anything I can do if this fails repeatedly...

# Build reply to remote device:
r = HttpResponse(resp.content,
                 content_type=resp.headers['Content-Type'],
                 status=resp.status_code,
                 reason=resp.reason,
                 )
r['Server'] = resp.headers['Server']
r['Connection'] = resp.headers['Connection']
logger.debug("Returning Server response to remote device")
return r
The Session "s" allows one to use "prepped" and "send", which lets one monkey with the request object before it's sent, and re-try the send. I think at least some of it can be removed in a refactor, making this process even simpler (see the sketch at the end of this answer).
There are three HTTP objects at play here:
"req" is the POST I send up to the cloud server to get back a special (encrypted) reply.
"resp" is the reply back from the cloud server. The body (.content) contains the special reply.
"r" is the Django HTTP response I need to send back to the remote device that started this ball rolling by POSTing data to my view.
It's pretty simple to populate the response with the data and set the headers to the values returned by the cloud server.
I know this works because the remote device does not POST the same data twice! If there were a mistake anywhere in this process, it would re-send the same data over and over. I copied the while/try loop from a socket repeater module; I don't know if that is really applicable to HTTP. I have been testing this on live hardware for over 48 hours and so far it has never failed. Timeouts are a question mark too, in that I know the remote device and cloud server have strict limits. So if there is an error in my "repeater", re-trying may not work if the process takes too long. It might be better to just discard/give up on the current POST and wait for the remote device to re-try. Sorry, refactoring out loud...
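For what it's worth, the simpler form alluded to above might just drop the prepared-request step and let requests.post do the work (a sketch reusing url, data, headers, and logger from the code above; untested against the real device):

import requests

resp = None
for attempt in range(3):
    try:
        logger.debug("POST data to remote host")
        resp = requests.post(url, data=data, headers=headers, timeout=10)
        break
    except requests.RequestException:
        logger.debug("remote host connection failed, retry")
        time.sleep(.3)
# resp is still None here only if every attempt failed.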
We want to capture & modify the HTTP response inside the proxy just before sending it back to the client. We are using python-proxy (http://code.google.com/p/python-proxy/). When we read the HTTP stream from the proxy socket/buffer, the HTTP content/body is encoded/compressed. We need to decompress/decode the HTTP content/body, modify the content, and compress/encode it back & return it to the proxy to forward the modified response back to the client. How do we achieve this using Python? Any help will be appreciated.
NOTE: We cannot make any direct call on the URLs since this code will be running inside Proxy.
import gzip
import struct  # needed for the struct.error caught below
from io import BytesIO

def compress(data):
    if not isinstance(data, bytes):
        data = bytes(data, 'UTF-8')
    return gzip.compress(data)

def decompress(data):
    with gzip.GzipFile(fileobj=BytesIO(data)) as fh:
        try:
            unzipped = fh.read()
        except struct.error:
            return None
    return unzipped
Assuming a few things:
The Content-Encoding is gzip
You can actually retrieve the data from the proxy library
You've gathered the data correctly (de-chunked it if the transfer was chunked, etc.) so that it can be decompressed.
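Putting the two helpers together, the modify-and-re-encode step could look roughly like this (a sketch: body stands for the complete, already de-chunked compressed response body, and the string replacement is just a placeholder for whatever modification you need):

text = decompress(body).decode('utf-8')      # gzip bytes -> str
text = text.replace('Example', 'Modified')   # placeholder modification
new_body = compress(text)                    # str -> gzip bytes

# Remember to update the Content-Length header to len(new_body) before
# handing the modified response back to the proxy for forwarding.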
I'm trying to replace curl with Python & the requests library. With curl, I can upload a single XML file to a REST server with the curl -T option. I have been unable to do the same with the requests library.
A basic scenario works:
payload = '<person test="10"><first>Carl</first><last>Sagan</last></person>'
headers = {'content-type': 'application/xml'}
r = requests.put(url, data=payload, headers=headers, auth=HTTPDigestAuth("*", "*"))
When I change payload to a bigger string by opening an XML file, the .put method hangs (I use the codecs library to get a proper unicode string). For example, with a 66KB file:
xmlfile = codecs.open('trb-1996-219.xml', 'r', 'utf-8')
headers = {'content-type': 'application/xml'}
content = xmlfile.read()
r = requests.put(url, data=content, headers=headers, auth=HTTPDigestAuth("*", "*"))
I've been looking into using the multipart option (files), but the server doesn't seem to like that.
So I was wondering if there is a way to simulate curl -T behaviour in Python requests library.
UPDATE 1:
The program hangs in TextMate, but throws a UnicodeEncodeError on the command line. It seems that must be the problem. So the question would be: is there a way to send unicode strings to a server with the requests library?
UPDATE 2:
Thanks to Martijn Pieters' comment, the UnicodeEncodeError went away, but a new issue turned up.
With a literal (ASCII) XML string, logging shows the following lines:
2012-11-11 15:55:05,154 INFO Starting new HTTP connection (1): my.ip.address
2012-11-11 15:55:05,294 DEBUG "PUT /v1/documents?uri=/example/test.xml HTTP/1.1" 401 211
2012-11-11 15:55:05,430 DEBUG "PUT /v1/documents?uri=/example/test.xml HTTP/1.1" 201 0
Seems the server always bounces the first authentication attempt (?) but then accepts the second one.
With a file object (open('trb-1996-219.xml', 'rb')) passed to data, the logfile shows:
2012-11-11 15:50:54,309 INFO Starting new HTTP connection (1): my.ip.address
2012-11-11 15:50:55,105 DEBUG "PUT /v1/documents?uri=/example/test.xml HTTP/1.1" 401 211
2012-11-11 15:51:25,603 WARNING Retrying (0 attempts remain) after connection broken by 'BadStatusLine("''",)': /v1/documents?uri=/example/test.xml
So, the first attempt is blocked as before, but no second attempt is made.
According to Martijn Pieters (below), the second issue can be explained by a faulty server (empty line).
I will look into this, but if someone has a workaround (apart from using curl) I wouldn't mind hearing it.
And I am still surprised that the requests library behaves so differently for a small string and a file object. Isn't the file object serialized before it gets to the server anyway?
To PUT large files, don't read them into memory. Simply pass the file as the data keyword:
xmlfile = open('trb-1996-219.xml', 'rb')
headers = {'content-type': 'application/xml'}
r = requests.put(url, data=xmlfile, headers=headers, auth=HTTPDigestAuth("*", "*"))
Moreover, you were opening the file as unicode (decoding it from UTF-8). As you'll be sending it to a remote server, you need raw bytes, not unicode values, and you should open the file in binary mode instead.
Digest authentication always requires you to make at least two requests to the server. The first request doesn't contain any authentication data. This first request will fail with a 401 "Authorization required" response code and a digest challenge (called a nonce) to be used for hashing your password etc. (the exact details don't matter here). This is used to make a second request to the server containing your credentials hashed with the challenge.
The problem is in this two-step authentication: your large file was already sent with the first unauthorized request (sent in vain), but on the second request the file object is already at the EOF position. Since the file size was also sent in the Content-Length header of the second request, this causes the server to wait for a file that will never be sent.
You could solve it using a requests Session: first make a simple request for authentication purposes (say, a GET request), then make a second PUT request containing the actual payload, using the same digest challenge from the first request.
sess = requests.Session()
sess.auth = HTTPDigestAuth("*", "*")
sess.get(url)  # first request just completes the digest handshake

headers = {'content-type': 'application/xml'}
with open('trb-1996-219.xml', 'rb') as xmlfile:  # binary mode: send raw bytes
    sess.put(url, data=xmlfile, headers=headers)
I used requests in Python to upload an XML file with the following commands.
First, open the file with open():
file = open("PIR.xsd")
fragment = file.read()
file.close()
Copy the data of the XML file into the payload of the request and POST it:
payload = {'key':'PFAkrzjmuZR957','xmlFragment':fragment}
r = requests.post(URL,data=payload)
To check the HTML validation result returned by the server:
print(r.text)
I'm using Python Flask + nginx with FCGI.
On some requests, I have to output large responses. Usually those responses are fetched from a socket. Currently I build the response like this:
response = []
while True:
    recv = s.recv(1024)
    if not recv:
        break
    response.append(recv)
s.close()

response = ''.join(response)
return flask.make_response(response, 200, {
    'Content-type': 'binary/octet-stream',
    'Content-length': len(response),
    'Content-transfer-encoding': 'binary',
})
The problem is that I do not actually need to hold the data in memory. I also have a way to determine the exact response length to be fetched from the socket. So I need a good way to send the HTTP headers, then start outputting directly from the socket, instead of collecting everything in memory and then supplying it to nginx (probably by some sort of stream).
I was unable to find the solution to this seemingly common issue. How would that be achieved?
Thank you!
If the response passed to flask.make_response is an iterable, it will be iterated over to produce the response, and each string is written to the output stream on its own.
What this means is that you can also return a generator which will yield the output when iterated over. If you know the content length, then you can (and should) pass it as a header.
A simple example:
import sys
import time

import flask
from flask import Flask

app = Flask(__name__)

@app.route('/')
def generated_response_example():
    n = 20
    def response_generator():
        for i in range(n):
            print(i, file=sys.stderr)
            yield "%03d\n" % i
            time.sleep(.2)
    print("returning generator...", file=sys.stderr)
    gen = response_generator()
    # the call to flask.make_response is not really needed as it happens implicitly
    # if you return a tuple.
    return flask.make_response(gen, "200 OK", {'Content-length': 4 * n})

if __name__ == '__main__':
    app.run()
If you run this and try it in a browser, you should see a nice incremental count...
(The content type is not set because it seems that if I do that, my browser waits until the whole content has been streamed before rendering the page. wget -qO - localhost:5000 doesn't have this problem.)
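Applied to the original question, the same idea means yielding straight from the socket instead of joining everything in memory first. A sketch reusing s from the question and assuming the exact length is known, as described:

import flask

def stream_from_socket(s, length):
    def generate():
        # Forward data from the socket as it arrives.
        remaining = length
        while remaining > 0:
            chunk = s.recv(min(4096, remaining))
            if not chunk:
                break
            remaining -= len(chunk)
            yield chunk
        s.close()

    return flask.Response(generate(), status=200, headers={
        'Content-type': 'binary/octet-stream',
        'Content-length': str(length),
        'Content-transfer-encoding': 'binary',
    })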