Python TCPServer rfile.read blocks

I am writing a simple SocketServer.TCPServer request handler (StreamRequestHandler) that will capture the request, along with the headers and the message body. This is for faking out an HTTP server that we can use for testing.
I have no trouble grabbing the request line or the headers.
If I try to grab more from the rfile than exists, the code blocks. How can I grab all of the request body without knowing its size? In other words, I don't have a Content-Length header.
Here's a snippet of what I have now:
def _read_request_line(self):
    server.request_line = self.rfile.readline().rstrip('\r\n')

def _read_headers(self):
    headers = []
    for line in self.rfile:
        line = line.rstrip('\r\n')
        if not line:
            break
        parts = line.split(':', 1)
        header = (parts[0].strip(), parts[1].strip())
        headers.append(header)
    server.request_headers = headers

def _read_content(self):
    server.request_content = self.rfile.read()  # blocks

Keith's comment is correct. Here's what it looks like:
length = int(self.headers.getheader('content-length'))
data = self.rfile.read(length)
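For completeness, here is a minimal sketch of how that could slot into the handler above, reusing the server.request_headers list built by _read_headers (hypothetical glue code, not the poster's exact solution):

def _read_content(self):
    # Find Content-Length among the headers captured by _read_headers;
    # default to 0 so a missing header doesn't make rfile.read() block
    length = 0
    for name, value in server.request_headers:
        if name.lower() == 'content-length':
            length = int(value)
            break
    server.request_content = self.rfile.read(length)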

Using a variable from a dictionary in a loop to attach to an API call

I'm calling a LinkedIn API with the code below and it does what I want.
However, when I use almost identical code inside a loop, it returns a TypeError:
File "C:\Users\pchmurzynski\OneDrive - Centiq Ltd\Documents\Python\mergedreqs.py", line 54, in <module>
auth_headers = headers(access_token)
TypeError: 'dict' object is not callable
It has a problem with this line (which again, works fine outside of the loop):
headers = headers(access_token)
I tried changing it to
headers = headers.get(access_token)
or
headers = headers[access_token]
EDIT:
I have also tried this, with the same error:
auth_headers = headers(access_token)
But it didn't help. What am I doing wrong? Why does the dictionary work fine outside of the loop but not inside of it, and what should I do to make it work?
What I am hoping to achieve is a list of share statistics for each ID from the "shids" list, which I can then save as JSON. That can be done with individual requests - one link for one ID,
(f'https://api.linkedin.com/v2/organizationalEntityShareStatistics?q=organizationalEntity&organizationalEntity=urn%3Ali%3Aorganization%3A77487&ugcPosts=List(urn%3Ali%3AugcPost%3A{shid})
or with a single request containing a list of IDs.
(f'https://api.linkedin.com/v2/organizationalEntityShareStatistics?q=organizationalEntity&organizationalEntity=urn%3Ali%3Aorganization%3A77487&ugcPosts=List(urn%3Ali%3AugcPost%3A{shid},urn%3Ali%3AugcPost%3A{shid2},...,urn%3Ali%3AugcPost%3A{shidx})
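For the second form, the List(...) value can be built by joining one URL-encoded urn per ID; a rough sketch, assuming shids holds the share IDs as strings:

shids = ['12345', '67890']  # example share IDs

# Join one urn per share ID into the List(...) parameter
urns = ','.join('urn%3Ali%3AugcPost%3A{}'.format(shid) for shid in shids)
base = ('https://api.linkedin.com/v2/organizationalEntityShareStatistics'
        '?q=organizationalEntity'
        '&organizationalEntity=urn%3Ali%3Aorganization%3A77487'
        '&ugcPosts=List({})')
url = base.format(urns)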
Updated code, thanks to your comments:
shlink = "https://api.linkedin.com/v2/organizationalEntityShareStatistics?q=organizationalEntity&organizationalEntity=urn%3Ali%3Aorganization%3A77487&shares=List(urn%3Ali%3Ashare%3A{})"

# Loop through the list of share ids and make an API request for each of them
shares = []
token = auth(credentials)       # Authenticate the API
auth_headers = fheaders(token)  # Make the headers to attach to the API call
for shid in shids:
    # Create a request link for each share id
    r = shlink.format(shid)
    # Call the API
    res = requests.get(r, headers=auth_headers)
    share_stats = res.json()
    # Append the shares list with the response
    shares.append(share_stats["elements"])
"works fine outside the loop"
Because in the loop, you re-define the variable. I added print statements to show it:
from liapiauth import auth, headers  # one type

for ...:
    ...
    print(type(headers))
    headers = headers(access_token)  # now set to another type
    print(type(headers))
Lesson learned - don't overwrite your imports.
Some refactors - your auth token isn't changing, so don't put it in the loop; you can use one method for all LinkedIn API queries:
from liapiauth import auth, headers
import requests

API_PREFIX = 'https://api.linkedin.com/v2'
SHARES_ENDPOINT_FMT = '/organizationalEntityShareStatistics?q=organizationalEntity&organizationalEntity=urn%3Ali%3Aorganization%3A77487&shares=List(urn%3Ali%3Ashare%3A{})'

def get_linkedin_response(endpoint, headers):
    return requests.get(API_PREFIX + endpoint, headers=headers)

def main(access_token=None):
    if access_token is None:
        raise ValueError('Access-Token not defined')
    auth_headers = headers(access_token)
    shares = []
    for shid in shids:
        endpoint = SHARES_ENDPOINT_FMT.format(shid)
        resp = get_linkedin_response(endpoint, auth_headers)
        if resp.status_code // 100 == 2:
            share_stats = resp.json()
            shares.append(share_stats["elements"])
            # TODO: extract your data here
            idlist = [el["id"] for el in share_stats["elements"]]

if __name__ == '__main__':
    credentials = 'credentials.json'
    main(auth(credentials))

Flask redirect page after response is complete

I currently have a button in my HTML page that calls the route /return-files. This route returns a response with a zipped file, and I also disabled caching because I had issues getting a new file each time the response is made. This is working well, but since I cannot return multiple things (a response and a redirect), I cannot refresh the current page after sending the user the zip file. I have read many solutions, such as using JavaScript to create consecutive responses, but I am looking for the simplest technique. Source code below; thanks for all the help.
@app.route('/return-files')
def creareturn_files_tut():
    global itemsList
    global clientName
    try:
        if len(itemsList) <= 0:
            flash("The itemList was found to contain no items")
        else:
            taxPercent = 0.13
            ziped = FileHandler(clientName, itemsList, taxPercent)
            file = ziped.addToZip()
            os.remove(ziped.excelFileName)
            os.remove(ziped.wordFileName)
            # os.remove(file)
            itemsList = []
            clientName = None
            response = make_response(send_file(file, as_attachment=True))
            # Remove cache for the file so that a new file can be sent each time
            response.headers["Cache-Control"] = "no-cache, no-store, must-revalidate"
            response.headers["Pragma"] = "no-cache"
            response.headers["Expires"] = "0"
            response.headers['Cache-Control'] = 'public, max-age=0'
            return response
            # return out
    except Exception as e:
        return str(e)
You can use send_from_directory() or send_file() like this:
file = ziped.addToZip()
os.remove(ziped.excelFileName)
os.remove(ziped.wordFileName)
itemsList = []
clientName = None
return send_from_directory(directory='', filename=file, as_attachment=True, cache_timeout=0)
or
file = ziped.addToZip()
os.remove(ziped.excelFileName)
os.remove(ziped.wordFileName)
itemsList = []
clientName = None
return send_file(file, cache_timeout=0, as_attachment=True)
Note that caching is disabled by setting cache_timeout to 0.
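Putting it together, a minimal self-contained route could look like the sketch below (zip_path is a stand-in for the archive produced by the poster's FileHandler; note that in newer Flask versions the cache_timeout argument of send_file was renamed to max_age):

from flask import Flask, send_file

app = Flask(__name__)

@app.route('/return-files')
def return_files():
    zip_path = 'output.zip'  # placeholder for the archive built earlier in the request
    # as_attachment triggers a download; cache_timeout=0 disables caching
    # so the browser fetches a fresh file on every click
    return send_file(zip_path, as_attachment=True, cache_timeout=0)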

Python tornado async request handler

I have written an async file upload RequestHandler. It is correct byte-wise; that is, the files I receive are identical to the ones being sent. One issue I am having trouble figuring out is upload delay. Specifically, when I issue the POST request to upload the file while testing locally, I see the browser's upload progress get stuck. For files close to 4 MB in size it gets stuck at 50%+ for a little while, then some time passes, it sends all of the data, and then it gets stuck on "waiting for localhost...". The whole process may last 3+ minutes.
The kicker is that when I add print statements (ending with a newline) to the data_received method, the delays disappear. Does the print statement somehow trigger the network buffers to be flushed?
Here is the implementation of data_received, along with the helper methods:
@tornado.gen.coroutine
def _read_data(self, cont_buf):
    '''
    Read the file data.
    @param cont_buf - buffered HTTP request
    @returns boolean indicating whether data is still being read and new
    buffer
    '''
    # Check only last characters of the buffer guaranteed to be large
    # enough to contain the boundary
    end_of_data_idx = cont_buf.find(self._boundary)
    if end_of_data_idx >= 0:
        data = cont_buf[:(end_of_data_idx - self.LSEP)]
        self.receive_data(self.header_list[-1], data)
        new_buffer = cont_buf[(end_of_data_idx + len(self._boundary)):]
        return False, new_buffer
    else:
        self.receive_data(self.header_list[-1], cont_buf)
        return True, b""

@tornado.gen.coroutine
def _parse_params(self, param_buf):
    '''
    Parse HTTP header parameters.
    @param param_buf - string buffer containing the parameters.
    @returns parameters dictionary
    '''
    params = dict()
    param_res = self.PAT_HEADERPARAMS.findall(param_buf)
    if param_res:
        for name, value in param_res:
            params[name] = value
    elif param_buf:
        params['value'] = param_buf
    return params

@tornado.gen.coroutine
def _parse_header(self, header_buf):
    '''
    Parses a buffer containing an individual header with parameters.
    @param header_buf - header buffer containing a single header
    @returns header dictionary
    '''
    res = self.PAT_HEADERVALUE.match(header_buf)
    header = dict()
    if res:
        name, value, tail = res.groups()
        header = {'name': name, 'value': value,
                  'params': (yield self._parse_params(tail))}
    elif header_buf:
        header = {"value": header_buf}
    return header

@tornado.gen.coroutine
def data_received(self, chunk):
    '''
    Processes a chunk of content body.
    @param chunk - a piece of content body.
    '''
    self._count += len(chunk)
    self._buffer += chunk
    # Has boundary been established?
    if not self._boundary:
        self._boundary, self._buffer = \
            (yield self._extract_boundary(self._buffer))
        if (not self._boundary
                and len(self._buffer) > self.BOUNDARY_SEARCH_BUF_SIZE):
            raise RuntimeError("Cannot find multipart delimiter.")
    while True:
        if self._receiving_data:
            self._receiving_data, self._buffer = yield self._read_data(self._buffer)
            if self._is_end_of_request(self._buffer):
                yield self.request_done()
                break
            elif self._is_end_of_data(self._buffer):
                break
        else:
            headers, self._buffer = yield self._read_headers(self._buffer)
            if headers:
                self.header_list.append(headers)
                self._receiving_data = True
            else:
                break

Script to serve from URL for requests matching a regular expression

I am a complete n00b in Python and am trying to figure out a stub for mitmproxy.
I have tried the documentation, but it assumes you know Python, so I am at a standstill.
I've been working with a script:
original_url = 'http://production.domain.com/1/2/3'
new_content_path = '/home/andrepadez/proj/main.js'
body = open(new_content_path, 'r').read()

def response(context, flow):
    url = flow.request.get_url()
    if url == original_url:
        flow.response.content = body
As you can predict, the proxy takes every request to 'http://production.domain.com/1/2/3' and serves the content of my file.
I need this to be more dynamic:
for every request to 'http://production.domain.com/*', I need to serve the corresponding URL, for example:
http://production.domain.com/1/4/3 -> http://develop.domain.com/1/4/3
I know I have to use a regular expression so I can capture and map it correctly, but I don't know how to serve the contents of the develop URL as "flow.response.content".
Any help will be welcome.
You would have to do something like this:
import re

# In order not to re-read the original file every time, we maintain
# a cache of already-read bodies.
bodies = {}

def response(context, flow):
    # Intercept all URLs
    url = flow.request.get_url()
    # Check if this URL is one of "ours" (check out Python regexps)
    m = re.search(r'REGEXP_FOR_ORIGINAL_URL/(\d+)/(\d+)/(\d+)', url)
    if m is not None:
        # It is, and m will contain this information
        # The three numbers are in m.group(1), (2), (3)
        key = "%s.%s.%s" % (m.group(1), m.group(2), m.group(3))
        try:
            body = bodies[key]
        except KeyError:
            # We do not yet have this body: retrieve it however is
            # appropriate, e.g. from a file named after the captured groups
            body = open("%s.txt" % key, 'r').read()
            bodies[key] = body
        flow.response.content = body
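If the goal is to mirror each production path to the corresponding develop URL (as in the question) rather than read from local files, one option is to fetch the develop content inside the hook. A rough sketch using the requests library and the same old-style mitmproxy script API as above (note that requests.get is a blocking call):
import re
import requests

PRODUCTION_HOST = 'http://production.domain.com'
DEVELOP_HOST = 'http://develop.domain.com'

def response(context, flow):
    url = flow.request.get_url()
    # Capture everything after the production host
    m = re.match(re.escape(PRODUCTION_HOST) + r'/(.*)', url)
    if m is not None:
        # Fetch the same path from the develop host and serve its body
        dev_resp = requests.get('%s/%s' % (DEVELOP_HOST, m.group(1)))
        flow.response.content = dev_resp.content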

Python Bottle multiple responses for one route

I'm looking for a way to send multiple responses for one route.
The problem is that from what I've read I have to return the content data.
For example :
@route('/events')
def positions():
    for i in xrange(5):
        response.content_type = 'text/event-stream'
        response.set_header('Cache-Control', 'no-cache')
        now = datetime.datetime.now().time().replace(microsecond=0)
        return "data: %s\n\n" % now
Is there a way to replace the last line with some function call, so I can send all the responses and then exit the route?
Thanks,
Omer.
I'm not 100% sure I understand what you're asking so I might not be answering correctly, but would this do what you want?
@route('/events')
def positions():
    output = ''
    for i in xrange(5):
        now = datetime.datetime.now().time().replace(microsecond=0)
        output += "%s\n\n" % now
    response.content_type = 'text/event-stream'
    response.set_header('Cache-Control', 'no-cache')
    return "data: " + output
