Could someone help me to understand the difference between the various methods:
request.bounded_stream.read()
request.stream.read()
request.get_media()
They seem to do the same thing, but using stream or bounded_stream provides a bytes-like object.
class test_dev(object):
    async def on_post(self, request, response):
        obj = await request.bounded_stream.read()
        print(obj)

class test_dev(object):
    async def on_post(self, request, response):
        obj = await request.stream.read()
        print(obj)

class test_dev(object):
    async def on_post(self, request, response):
        obj = await request.get_media()
        print(obj)
stream and bounded_stream are file-like wrappers for accessing the request body data stream coming from the server. They are non-seekable and, as far as I know, there is no way to peek at them either.
The difference between them is that the latter is bounded by the Content-Length of the request, whereas stream may behave differently from system to system. This is detailed in the official documentation.
As for get_media(), it's a wrapper for the media property. From the documentation, you can read:
Warning:
This operation will consume the request stream the first time it’s called and cache the results. Follow-up calls will just retrieve a cached version of the object.
So, the first time you access request.media, the request stream is consumed and cached, meaning that from that moment on stream and bounded_stream will return an empty data stream. By default, the application assumes the content type is application/json and uses the json library to serialize and deserialize the content.
What is not so clear from the docs, but also happens, is that if you choose to access, and therefore consume, stream or bounded_stream, then accessing media afterwards will raise an error.
If you choose to access stream or bounded_stream yourself, you'll also have to store the data you consume yourself.
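A minimal sketch of that consume-and-cache behaviour (assuming a Falcon ASGI app; the EchoResource name and /echo route are hypothetical, not from the question):

import falcon.asgi

class EchoResource:
    async def on_post(self, req, resp):
        media = await req.get_media()          # consumes and caches the body
        raw = await req.bounded_stream.read()  # returns b'': already consumed
        resp.media = {'media': media, 'leftover_bytes': len(raw)}

app = falcon.asgi.App()
app.add_route('/echo', EchoResource)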
I have seen Tornado documentation and examples where the self.write method is widely used to render a value as HTML after a POST request is handled, but I could not find much clarity on how to return the response back to the client.
For example, I am calling a POST request on a Tornado server from my client. The code that accepts the POST request is:
class strest(tornado.web.RequestHandler):
    def post(self):
        value = self.get_argument('key')
        cbtp = cbt.main(value)
With this, I can find the value of cbtp, and with self.write(cbtp) I can get it printed in HTML. But instead, I want to return this value to the client in JSON format, like {'cbtp': cbtp}.
I want to know how to modify my code so that this response is sent to the client, or to be pointed to some documentation where this is clearly explained.
Doing something like

res = {cbtp: cbtp}
return cbtp

throws a BadYieldError: yielded unknown object.
You just need to set the Content-Type to JSON and json.dumps your output.
Normally I have set_default_headers in a parent class called RESTRequestHandler. If you want just one handler that returns JSON, you can set the headers in the post call.
import json

import tornado.web

class strest(tornado.web.RequestHandler):
    def set_default_headers(self):
        self.set_header("Content-Type", 'application/json')

    def post(self):
        value = self.get_argument('key')
        cbtp = cbt.main(value)
        r = json.dumps({'cbtp': cbtp})
        self.write(r)
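If you prefer the parent-class pattern mentioned above, a minimal sketch (RESTRequestHandler is the name used above; cbt.main comes from the question's code):

import json

import tornado.web

class RESTRequestHandler(tornado.web.RequestHandler):
    # Every handler inheriting from this class emits JSON by default
    def set_default_headers(self):
        self.set_header("Content-Type", "application/json")

class strest(RESTRequestHandler):
    def post(self):
        value = self.get_argument('key')  # cbt.main is the question's function
        self.write(json.dumps({'cbtp': cbt.main(value)}))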
As the Tornado documentation for write() explains: if the given chunk is a dictionary, we write it as JSON and set the Content-Type of the response to be application/json (if you want to send JSON as a different Content-Type, call set_header after calling write()).
Using it should give you exactly what you want:
self.write(json.dumps({'cbtp': cbtp}))
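In fact, since write() serializes dictionaries to JSON and sets the Content-Type header automatically (per the documentation quoted above), the json.dumps call can be dropped entirely; a minimal sketch:

def post(self):
    value = self.get_argument('key')
    # Passing a dict makes Tornado emit application/json automatically
    self.write({'cbtp': cbt.main(value)})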
I was wondering if this code is safe:
import requests

with requests.Session() as s:
    response = s.get("http://google.com", stream=True)
    content = response.content
In a simple example like this, it does not fail (note that I don't write "it works" :p), since the pool does not close the connection instantly anyway (that's a point of a session/pool, right?).
Using stream=True, the response object is supposed to have a raw attribute that contains the connection, but I'm unsure whether the connection is owned by the session or not, and therefore whether, if I don't read the content right away but only later, it might already have been closed.
My current 2 cents is that it's unsafe, but I'm not 100% sure.
Thanks!
[Edit after reading requests code]
After reading the requests code in more detail, it seems that this is what requests.get does itself:
https://github.com/requests/requests/blob/master/requests/api.py
def request(method, url, **kwargs):
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
Note that kwargs might contain stream=True.
So I guess the answer is "it's safe".
OK, I got my answer by digging into requests and urllib3.
The Response object owns the connection until the close() method is explicitly called:
https://github.com/requests/requests/blob/24092b11d74af0a766d9cc616622f38adb0044b9/requests/models.py#L937-L948
def close(self):
    """Releases the connection back to the pool. Once this method has been
    called the underlying ``raw`` object must not be accessed again.

    *Note: Should not normally need to be called explicitly.*
    """
    if not self._content_consumed:
        self.raw.close()

    release_conn = getattr(self.raw, 'release_conn', None)
    if release_conn is not None:
        release_conn()
release_conn() is a urllib3 method that releases the connection and puts it back in the pool:
def release_conn(self):
    if not self._pool or not self._connection:
        return

    self._pool._put_conn(self._connection)
    self._connection = None
If the Session has been destroyed (e.g., by leaving the with block), the pool has been destroyed too, so the connection cannot be put back into it and is simply closed.
TL;DR: this is safe.
Note that this also means that in stream mode, the connection is not returned to the pool until the response is closed explicitly or the content is read entirely.
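A defensive pattern follows from this (just a sketch; the URL is a placeholder): read or close the response while still inside the session, for instance by using the response itself as a context manager:

import requests

with requests.Session() as s:
    # The Response is also a context manager; closing it releases the
    # connection back to the pool (or closes it if the pool is gone)
    with s.get("http://google.com", stream=True) as response:
        content = response.content  # reading fully also releases the connection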
I believe using the "callback" method is asynchronous; please correct me if I'm wrong. I'm still new to Python, so please bear with me.
Anyway, I'm trying to make a method to check if a file exists and here is my code:
def file_exists(self, url):
    res = False
    response = Request(url, method='HEAD', dont_filter=True)
    if response.status == 200:
        res = True
    return res
I thought Request() would return a Response object, but it still returns a Request object; to capture the Response, I have to create a separate method as the callback.
Is there a way to get the Response object within the code block where you call Request()?
If anyone is still interested in a possible solution – I managed it by doing a request with "requests" sort of "inside" a scrapy function like this:
import requests
import scrapy

request_object = requests.get(the_url_you_like_to_get)
response_object = scrapy.Selector(text=request_object.text)
item['attribute'] = response_object.xpath('//path/you/like/to/get/text()').extract_first()
and then proceed.
Request objects don't generate anything.
Scrapy uses an asynchronous downloader engine which takes these Request objects and generates Response objects.
If any method in your spider returns a Request object, it is automatically scheduled in the downloader, and the resulting Response object is passed to the specified callback (i.e. Request(url, callback=self.my_callback)).
Check out more at scrapy's architecture overview.
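For illustration, a minimal sketch of the original file_exists check rewritten in callback style (the spider name and URL are hypothetical):

import scrapy

class FileCheckSpider(scrapy.Spider):
    name = 'file_check'

    def start_requests(self):
        yield scrapy.Request(
            'http://example.com/file.zip',
            method='HEAD',
            dont_filter=True,
            # let non-200 statuses reach the callback instead of being filtered
            meta={'handle_httpstatus_list': [404]},
            callback=self.check_exists,
        )

    def check_exists(self, response):
        # The downloader delivers the Response here, not at the call site
        yield {'url': response.url, 'exists': response.status == 200}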
Now, depending on when and where you are doing this, you can schedule requests manually by telling the downloader engine to schedule them:
self.crawler.engine.schedule(Request(url, callback=self.my_callback), spider)
If you run this from a spider, spider here can most likely be self, and self.crawler is inherited from scrapy.Spider.
Alternatively, you can always block the asynchronous stack by using a synchronous library such as requests:
def parse(self, response):
    image_url = response.xpath('//img/@href').extract_first()
    if image_url:
        image_head = requests.head(image_url)
        if 'image' in image_head.headers['Content-Type']:
            # item is assumed to have been created earlier in the callback
            item['image'] = image_url
It will slow your spider down but it's significantly easier to implement and manage.
Scrapy uses Request and Response objects for crawling web sites.
Typically, Request objects are generated in the spiders and passed across the system until they reach the Downloader, which executes the request and returns a Response object which travels back to the spider that issued the request.
Unless you are manually using a Downloader, it seems like the way you're using the framework is incorrect. I'd read a bit more about how you can create proper spiders here.
As for checking whether a file exists, your spider can store the relevant information in a database or another data structure when parsing the scraped data in its parse*() method, and you can later query it in your own code.
I'm using YouTube data API v3.
Is it possible to make a big BatchHttpRequest (e.g., see here) and also to use ETags for local caching at the httplib2 level (e.g., see here)?
ETags work fine for single queries; I don't understand whether they are also useful for batch requests.
TL;DR:
BatchHttpRequest cannot be used with caching.
Here is the analysis:
First, let's see the way BatchHttpRequest is initialized:
from googleapiclient.discovery import build
from googleapiclient.http import BatchHttpRequest

def list_animals(request_id, response, exception):
    if exception is not None:
        # Do something with the exception
        pass
    else:
        # Do something with the response
        pass

def list_farmers(request_id, response, exception):
    """Do something with the farmers list response."""
    pass

service = build('farm', 'v2')

batch = service.new_batch_http_request()
batch.add(service.animals().list(), callback=list_animals)
batch.add(service.farmers().list(), callback=list_farmers)
batch.execute(http=http)
Second, let's see how ETags are used:

import httplib2
from google.appengine.api import memcache

http = httplib2.Http(cache=memcache)

Now let's analyze:
Observe the last line of the BatchHttpRequest example, batch.execute(http=http). Checking the source code for execute, we see it calls _refresh_and_apply_credentials, which uses the http object we pass in.
def _refresh_and_apply_credentials(self, request, http):
    """Refresh the credentials and apply to the request.

    Args:
        request: HttpRequest, the request.
        http: httplib2.Http, the global http object for the batch.
    """
    # For the credentials to refresh, but only once per refresh_token
    # If there is no http per the request then refresh the http passed in
    # via execute()
This means the execute call, which takes in http, can be passed the ETag-caching http object you created:
http = httplib2.Http(cache=memcache)

# This would mean we would get the ETag-cached http
batch.execute(http=http)
Update 1:
You could try with a custom cache object as well:
import httplib2
from googleapiclient.discovery_cache import DISCOVERY_DOC_MAX_AGE
from googleapiclient.discovery_cache.file_cache import Cache as FileCache

custCache = FileCache(max_age=DISCOVERY_DOC_MAX_AGE)
http = httplib2.Http(cache=custCache)

# This would mean we would get the ETag-cached http
batch.execute(http=http)
This is just a hunch based on this comment in the httplib2 library:

"""If 'cache' is a string then it is used as a directory name for
a disk cache. Otherwise it must be an object that supports the
same interface as FileCache."""
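Based on that hint, any object exposing FileCache's get/set/delete surface should work; a minimal in-memory sketch (the DictCache name is my own):

import httplib2

class DictCache:
    """Implements the same interface as httplib2.FileCache, in memory."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value

    def delete(self, key):
        self._store.pop(key, None)

http = httplib2.Http(cache=DictCache())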
Update 2 (conclusion):
After verifying the google-api-python-client source code again, I see that BatchHttpRequest is fixed to a 'POST' request and has a content type of multipart/mixed;.. (source code).
This gives a clue that BatchHttpRequest is meant for POSTing data, which is then processed further down the line.
Now, keeping that in mind, observe that httplib2's request method calls _updateCache only when the following criteria are met:
the request method is in ["GET", "HEAD"], or response.status == 303, or it is a redirect request;
else, response.status in [200, 203] and the method is in ["GET", "HEAD"];
or, response.status == 304 and the method is "GET".
Since a batch request is always a POST, this means BatchHttpRequest cannot be used with caching.
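Single (non-batch) GET requests, on the other hand, do benefit from the cache; a minimal sketch (the ".cache" directory and the VIDEO_ID/API_KEY query parameters are placeholders):

import httplib2

http = httplib2.Http(cache=".cache")  # a string becomes a disk-cache directory
resp, content = http.request(
    "https://www.googleapis.com/youtube/v3/videos"
    "?part=snippet&id=VIDEO_ID&key=API_KEY"
)
print(resp.fromcache)  # True when a repeat request was answered from the ETag cache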
I am trying to write a file sharing application that exposes a REST interface.
The library I am using, Flask-RESTful, only supports returning JSON by default. Obviously, attempting to serve binary data over JSON is not a good idea at all.
What is the most "RESTful" way of serving up binary data through a GET method? It appears possible to extend Flask-RESTful to support returning different data representations besides JSON but the documentation is scarce and I'm not sure if it's even the best approach.
The approach suggested in the Flask-RESTful documentation is to declare our supported representations on the Api object so that it can support other mediatypes. The mediatype we are looking for is application/octet-stream.
First, we need to write a representation function:
from flask import Flask, send_file, safe_join
from flask_restful import Api

app = Flask(__name__)
api = Api(app)

@api.representation('application/octet-stream')
def output_file(data, code, headers):
    filepath = safe_join(data["directory"], data["filename"])

    response = send_file(
        filename_or_fp=filepath,
        mimetype="application/octet-stream",
        as_attachment=True,
        attachment_filename=data["filename"]
    )
    return response
What this representation function does is convert the data, code, headers our method returns into a Response object with mimetype application/octet-stream. Here we use the send_file function to construct this Response object.
Our GET method can be something like:
from flask_restful import Resource

class GetFile(Resource):
    def get(self, filename):
        return {
            "directory": <Our file directory>,
            "filename": filename
        }
And that's all the coding we need. When sending this GET request, we need to set the Accept mimetype to application/octet-stream so that our API will call the representation function; otherwise it will return the JSON data as it does by default.
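For example, assuming the resource was registered with api.add_resource(GetFile, '/files/<filename>') (a hypothetical route and filename), a client call could look like:

import requests

resp = requests.get(
    "http://localhost:5000/files/report.pdf",
    headers={"Accept": "application/octet-stream"},  # selects our representation
)
with open("report.pdf", "wb") as f:
    f.write(resp.content)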
There's an XML example on GitHub.
I know this question was asked 7 years ago, so it probably doesn't matter any more to @Ayrx. Hope it helps whoever drops by.
As long as you're setting the Content-Type header accordingly and respecting the Accept header sent by the client, you're free to return any format you want. You can just have a view that returns your binary data with the application/octet-stream content type.
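A minimal sketch of that plain-view approach (the /download route and the ./files directory are hypothetical):

from flask import Flask, safe_join, send_file

app = Flask(__name__)

@app.route("/download/<name>")
def download(name):
    # safe_join guards against path traversal; './files' is a placeholder
    return send_file(safe_join("./files", name),
                     mimetype="application/octet-stream")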
After a lot of trials and experiments, including hours of browsing, here is how to make a Resource class act as a single-responsibility downloader:
import json

from flask import make_response
from flask_restful import Resource

class DownloadResource(Resource):
    def get(self):
        item_list = dbmodel.query.all()
        item_list = [item.image for item in item_list]
        data = json.dumps({'items': item_list})

        response = make_response(data)
        response.headers['Content-Type'] = 'text/json'
        response.headers['Content-Disposition'] = 'attachment; filename=selected_items.json'
        return response
Change your filename and content type to support the format you want.