I was wondering if this code is safe:
import requests

with requests.Session() as s:
    response = s.get("http://google.com", stream=True)
    content = response.content
For a simple example like this, it does not fail (note I don't write "it works" :p), since the pool does not close the connection instantly anyway (that's the point of a session/pool, right?).
With stream=True, the response object is supposed to have a raw attribute that contains the connection, but I'm unsure whether the connection is owned by the session or not, and therefore whether, if I don't read the content right away but later, it might already have been closed.
My current 2 cents is that it's unsafe, but I'm not 100% sure.
Thanks!
[Edit after reading requests code]
After reading the requests code in more detail, it seems that this is what requests.get does itself:
https://github.com/requests/requests/blob/master/requests/api.py
def request(method, url, **kwargs):
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
Note that kwargs may contain stream=True.
So I guess the answer is "it's safe".
OK, I got my answer by digging into requests and urllib3.
The Response object owns the connection until its close() method is explicitly called:
https://github.com/requests/requests/blob/24092b11d74af0a766d9cc616622f38adb0044b9/requests/models.py#L937-L948
def close(self):
    """Releases the connection back to the pool. Once this method has been
    called the underlying ``raw`` object must not be accessed again.

    *Note: Should not normally need to be called explicitly.*
    """
    if not self._content_consumed:
        self.raw.close()

    release_conn = getattr(self.raw, 'release_conn', None)
    if release_conn is not None:
        release_conn()
release_conn() is a urllib3 method that releases the connection and puts it back in the pool:
def release_conn(self):
    if not self._pool or not self._connection:
        return

    self._pool._put_conn(self._connection)
    self._connection = None
If the Session was destroyed (e.g. by leaving the with block), the pool was destroyed too, so the connection cannot be put back and is simply closed.
TL;DR: this is safe.
Note that this also means that in stream mode, the connection is not available to the pool until the response is closed explicitly or the content is read entirely.
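For illustration, here is a minimal sketch (assuming a recent requests version, where the Response object is itself a context manager) that releases the connection back to the pool promptly while streaming:

import requests

with requests.Session() as s:
    # Closing the response (or consuming it fully) hands the
    # connection back to the pool while the session is still alive.
    with s.get("http://google.com", stream=True) as response:
        for chunk in response.iter_content(chunk_size=8192):
            pass  # process each chunk here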
Related
I have the following session-dependent code which must be run continuously.
Code
import requests

http = requests.Session()
while True:
    # if http is not good, then run http = requests.Session() again
    response = http.get(....)
    # process response
    # wait for 5 seconds
Note: I moved the line http = requests.Session() out of the loop.
Issue
How can I check whether the session is still working?
An example of a non-working session may be one left over after the web server is restarted, or after a load balancer redirects to a different web server.
The requests.Session object is just a persistence and connection-pooling object that allows shared state between different HTTP requests on the client side.
If the server unexpectedly closes a session so that it becomes invalid, the server would probably respond with some error-indicating HTTP status code, and requests would raise an error.
See Errors and Exceptions:
All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.
See the extended classes of RequestException.
Approach 1: implement open/close using try/except
Your code can catch such exceptions in a try/except block.
It depends on the server's API interface specification how it signals an invalidated/closed session; that signal response should be evaluated in the except block.
Here we use a session_was_closed(exception) function to evaluate the exception/response, and Session.close() to close the session correctly before opening a new one.
import time
import requests

# initially open a session object
s = requests.Session()

# execute requests continuously
while True:
    try:
        response = s.get(....)
        # process response
    except requests.exceptions.RequestException as e:
        if session_was_closed(e):
            s.close()  # close the session
            s = requests.Session()  # open a new session
        else:
            pass  # process non-session-related errors
    time.sleep(5)  # wait for 5 seconds
Depending on the server response of your case, implement the method session_was_closed(exception).
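As a purely hypothetical sketch (the checks and status code below are assumptions, not part of any real API), session_was_closed() could look something like this:

import requests

def session_was_closed(e):
    # Hypothetical heuristic: a dropped connection, or the HTTP status
    # the server uses for expired sessions (assumed here to be 401),
    # indicates a closed session.
    if isinstance(e, requests.exceptions.ConnectionError):
        return True
    response = getattr(e, 'response', None)
    return response is not None and response.status_code == 401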
Approach 2: automatically open/close using with
From Advanced Usage, Session Objects:
Sessions can also be used as context managers:
with requests.Session() as s:
s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
This will make sure the session is closed as soon as the with block is exited, even if unhandled exceptions occurred.
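A sketch of how this could be combined with the loop from the question (url and the processing step are placeholders):

import time
import requests

while True:
    # A fresh session per iteration gives up connection reuse, but the
    # with block guarantees cleanup even on unhandled exceptions.
    with requests.Session() as s:
        response = s.get(url)  # url is a placeholder for your endpoint
        # process response
    time.sleep(5)  # wait for 5 seconds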
I would flip the logic and add a try-except.
import requests

http = requests.Session()
while True:
    try:
        response = http.get(....)
    except requests.ConnectionError:
        http = requests.Session()
        continue
    # process response
    # wait for 5 seconds
See this answer for more info. I didn't test if the raised exception is that one, so please test it.
Could someone help me to understand the difference between the various methods:
request.bounded_stream.read()
request.stream.read()
request.get_media()
They seem to do the same thing, but using stream or bounded_stream yields a bytes-like object.
class test_dev(object):
    async def on_post(self, request, response):
        obj = await request.bounded_stream.read()
        print(obj)

class test_dev(object):
    async def on_post(self, request, response):
        obj = await request.stream.read()
        print(obj)

class test_dev(object):
    async def on_post(self, request, response):
        obj = await request.get_media()
        print(obj)
stream and bounded_stream are file-like wrappers for accessing the request body data stream coming from the server. They are non-seekable and, as far as I know, there is no way to peek at them either.
The difference between them is that the latter is bounded by the content_length of the request, whereas stream might behave differently from system to system. This is detailed in the official documentation.
As for get_media(), it's a wrapper for the media property. From the documentation, you can read:
Warning:
This operation will consume the request stream the first time it’s called and cache the results. Follow-up calls will just retrieve a cached version of the object.
So, the first time you access request.media, the request stream is consumed and cached, meaning that from that moment on stream and bounded_stream will return an empty data stream. By default, the application assumes the content type is application/json and uses the json library to serialize and deserialize the content.
What is not so clear from the docs, but also happens, is that if you choose to access, and therefore consume, stream or bounded_stream, then accessing media will raise an error.
If you choose to access stream or bounded_stream yourself, you'll also have to store the consumed data yourself.
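A minimal sketch of that last point, assuming an async Falcon-style resource and a JSON payload (the class name and attribute are illustrative):

import json

class CachingResource(object):
    async def on_post(self, request, response):
        # The stream can only be consumed once, so keep the raw
        # bytes around after reading them.
        raw = await request.bounded_stream.read()
        self.cached_body = raw  # store the consumed data for later use
        data = json.loads(raw)  # assumes a JSON payload
        print(data)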
import requests
requests.get(path_url, timeout=100)
In the above usage of the Python requests library, does the connection close automatically once requests.get is done running? If not, how can I make certain that the connection is closed?
Yes, there is a call to session.close behind the get code. If you use an IDE like PyCharm, for example, you can follow the get code to see what is happening. Inside get there is a call to request:
return request('get', url, params=params, **kwargs)
Within the definition of that request method, the call to session.close is made.
Following the link here to the requests repo, you can see how the session is controlled:
# By using the 'with' statement we are sure the session is closed, thus we
# avoid leaving sockets open which can trigger a ResourceWarning in some
# cases, and look like a memory leak in others.
with sessions.Session() as session:
    return session.request(method=method, url=url, **kwargs)
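If you want to make the lifetime explicit in your own code, a minimal sketch (path_url and the timeout are taken from the question):

import requests

# Managing the session yourself makes the connection lifetime obvious:
with requests.Session() as s:
    response = s.get(path_url, timeout=100)
# The session, and the sockets in its connection pool, are closed here.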
We have a custom module where we have redefined the open, seek, read, and tell functions to read only a part of a file according to the arguments.
But this logic overrides the default tell, and python requests tries to calculate the Content-Length, which involves calling tell(); that redirects to our custom tell function, whose logic is buggy somewhere and returns a wrong value. I tried some changes, but it throws an error.
I found the following in models.py of requests:
def prepare_content_length(self, body):
    if hasattr(body, 'seek') and hasattr(body, 'tell'):
        body.seek(0, 2)
        self.headers['Content-Length'] = builtin_str(body.tell())
        body.seek(0, 0)
    elif body is not None:
        l = super_len(body)
        if l:
            self.headers['Content-Length'] = builtin_str(l)
    elif (self.method not in ('GET', 'HEAD')) and (self.headers.get('Content-Length') is None):
        self.headers['Content-Length'] = '0'
For now, I am not able to figure out where the bug is, and I'm too stressed out to investigate more and fix it. Everything else works except the Content-Length calculation by python requests.
So, I have created my own definition for finding the content length, and I have included the value in the request headers. But the request is still preparing the Content-Length and throwing an error.
How can I stop requests from preparing the Content-Length and make it use the one I specified?
Requests lets you modify a request before sending. See Prepared Requests.
For example:
from requests import Request, Session
s = Session()
req = Request('POST', url, data=data, headers=headers)
prepped = req.prepare()
# do something with prepped.headers
prepped.headers['Content-Length'] = your_custom_content_length_calculation()
resp = s.send(prepped, ...)
If your session has its own configuration (like cookie persistence or connection-pooling), then you should use s.prepare_request(req) instead of req.prepare().
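For example, a sketch of the session-aware variant (your_custom_content_length_calculation() is the hypothetical helper from above):

prepped = s.prepare_request(req)  # merges session cookies etc. into the request
prepped.headers['Content-Length'] = your_custom_content_length_calculation()
resp = s.send(prepped)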
I am using urllib.request.urlopen() to GET from a web service I'm trying to test.
This returns an HTTPResponse object, which I then read() to get the response body.
But I always see a ResourceWarning about an unclosed socket from socket.py
Here's the relevant function:
import json
from urllib.request import Request, urlopen

def get_from_webservice(url):
    """ GET from the webservice """
    # HEADERS is defined elsewhere in the module
    req = Request(url, method="GET", headers=HEADERS)
    with urlopen(req) as rsp:
        body = rsp.read().decode('utf-8')
    return json.loads(body)
Here's the warning as it appears in the program's output:
$ ./test/test_webservices.py
/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/socket.py:359: ResourceWarning: unclosed <socket.socket object, fd=5, family=30, type=1, proto=6>
self._sock = None
.s
----------------------------------------------------------------------
Ran 2 tests in 0.010s
OK (skipped=1)
If there's anything I can do to the HTTPResponse (or the Request?) to make it close its socket cleanly,
I would really like to know, because this code is for my unit tests; I don't like
ignoring warnings anywhere, but especially not there.
I don't know if this is the answer, but it is part of the way to an answer.
If I add the header "connection: close" to the response from my web services, the HTTPResponse object seems to clean itself up properly without a warning.
And in fact, the HTTP Spec (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html) says:
HTTP/1.1 applications that do not support persistent connections MUST include the "close" connection option in every message.
So the problem was on the server end (i.e. my fault!). In the event that you don't have control over the headers coming from the server, I don't know what you can do.
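One thing you could try from the client side (an untested assumption on my part, not something from the spec discussion above) is to request a non-persistent connection yourself, so the server closes the socket after responding:

from urllib.request import Request, urlopen

headers = dict(HEADERS)          # HEADERS as in the question
headers["Connection"] = "close"  # ask for a non-persistent connection
req = Request(url, method="GET", headers=headers)
with urlopen(req) as rsp:
    body = rsp.read().decode('utf-8')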
I had the same problem with urllib3, and I just added a context manager to close the connection automatically:

import urllib3

def get(addr, headers):
    """ This function will close the connection after the HTTP request. """
    with urllib3.PoolManager() as conn:
        res = conn.request('GET', addr, headers=headers)
        if res.status == 200:
            return res.data
        else:
            raise ConnectionError(res.reason)
Note that urllib3 is designed to maintain a pool of connections and keep them alive for you. This can significantly speed up your application if it needs to make a series of requests, e.g. a few calls to a backend API.
Please read urllib3 documentation re connection pools here: https://urllib3.readthedocs.io/en/1.5/pools.html
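To actually benefit from that pooling, a sketch of the alternative: keep one PoolManager alive for the application's lifetime instead of creating one per call.

import urllib3

# One shared pool: connections stay alive and are reused across calls.
http = urllib3.PoolManager()

def get(addr, headers):
    res = http.request('GET', addr, headers=headers)
    if res.status == 200:
        return res.data
    raise ConnectionError(res.reason)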
P.S. You could also use the requests lib, which is not part of the Python standard library (as of 2019) but is very powerful and simple to use: http://docs.python-requests.org/en/master/