I expected calling close() on a Session object to close the session, but it looks like that's not happening. Am I missing something?
import requests
s = requests.Session()
url = 'https://google.com'
r = s.get(url)
s.close()
print("s is closed now")
r = s.get(url)
print(r)
output:
s is closed now
<Response [200]>
The second call to s.get() should have given an error.
Looking inside the implementation of Session.close(), we find:
def close(self):
"""Closes all adapters and as such the session"""
for v in self.adapters.values():
v.close()
And inside the adapter.close implementation:
def close(self):
"""Disposes of any internal state.
Currently, this closes the PoolManager and any active ProxyManager,
which closes any pooled connections.
"""
self.poolmanager.clear()
for proxy in self.proxy_manager.values():
proxy.clear()
So what I could make out is that close() releases the session's network resources: it closes the adapters' connection pools. It does not invalidate the Session object itself, and it does not clear stored state such as cookies. Because the adapters stay mounted and new connections are created on demand, a later s.get() simply opens a fresh connection, which is why the second call still returns a response.
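A minimal sketch to illustrate this (httpbin.org is used purely as an example endpoint): the cookie jar survives close(), and a follow-up request simply opens a new connection.

import requests

s = requests.Session()
s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')

s.close()  # closes the pooled connections, nothing else

# the stored cookie is still there after close() ...
print(s.cookies.get('sessioncookie'))  # 123456789

# ... and a new request just opens a fresh connection
print(s.get('https://httpbin.org/cookies').json())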
You can use a Context Manager to auto-close it:
import requests
with requests.Session() as s:
    url = 'https://google.com'
    r = s.get(url)
See requests docs > Sessions:
This will make sure the session is closed as soon as the with block is exited, even if unhandled exceptions occurred.
Related
I have the following session-dependent code which must be run continuously.
Code
import requests
http = requests.Session()
while True:
    # if http is not good, then run http = requests.Session() again
    response = http.get(....)
    # process response
    # wait for 5 seconds
Note: I moved the line http = requests.Session() out of the loop.
Issue
How to check if the session is working?
An example of a session that no longer works could be after the web server is restarted, or when a load balancer redirects to a different web server.
The requests.Session object is just a persistence and connection-pooling object that allows shared state between different HTTP requests on the client side.
If the server unexpectedly invalidates a session, it will probably respond with an error-indicating HTTP status code. requests does not raise on such status codes by itself, but you can turn them into exceptions with response.raise_for_status(); network-level failures (connection errors, timeouts) do raise exceptions directly. See Errors and Exceptions:
All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.
See the extended classes of RequestException.
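As a sketch of that idea (the URL is a placeholder): calling raise_for_status() after each request turns HTTP error statuses into HTTPError, so both transport failures and error responses surface as RequestException subclasses in a single except block.

import requests

s = requests.Session()
try:
    response = s.get('https://example.com/api/data', timeout=10)
    response.raise_for_status()  # raise HTTPError for 4xx/5xx responses
except requests.exceptions.RequestException as e:
    # connection problems and HTTP errors both land here
    print('request failed:', e)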
Approach 1: implement open/close using try/except
Your code can catch such exceptions within a try/except-block.
It depends on the server's API interface specification how it signals an invalidated/closed session; this signal response should be evaluated in the except block.
Here we use a session_was_closed(exception) function to evaluate the exception/response and Session.close() to close the session correctly before opening a new one.
import requests
# initially open a session object
s = requests.Session()
# execute requests continuously
while True:
    try:
        response = s.get(....)
        # process response
    except requests.exceptions.RequestException as e:
        if session_was_closed(e):
            s.close()               # close the session
            s = requests.Session()  # open a new session
        else:
            pass  # process non-session-related errors
    # wait for 5 seconds
Depending on the server response of your case, implement the method session_was_closed(exception).
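For illustration only, a hypothetical session_was_closed() could look like this; the exact exception types and status codes depend entirely on your server's API, so treat it as a template:

import requests

def session_was_closed(exc):
    # connection dropped entirely, e.g. after a web server restart
    if isinstance(exc, requests.exceptions.ConnectionError):
        return True
    # the server answered, but with a status code that (for this
    # particular API) signals an invalidated session, e.g. 401
    response = getattr(exc, 'response', None)
    if response is not None and response.status_code == 401:
        return True
    return False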
Approach 2: automatically open/close using with
From Advanced Usage, Session Objects:
Sessions can also be used as context managers:
with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
This will make sure the session is closed as soon as the with block is exited, even if unhandled exceptions occurred.
I would flip the logic and add a try-except.
import requests
http = requests.Session()
while True:
    try:
        response = http.get(....)
    except requests.exceptions.ConnectionError:
        http = requests.Session()
        continue
    # process response
    # wait for 5 seconds
See this answer for more info. I didn't test if the raised exception is that one, so please test it.
I've been developing an application, where I need to handle temporarily disconnects on the client (network interface goes down).
I initially thought the approach below would work, but sometimes, if I restart the network interface, the s.get(url) call would hang indefinitely:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

s = requests.Session()
s.mount('http://stackoverflow.com', HTTPAdapter(max_retries=Retry(total=10, connect=10, read=10)))
s.get(url)
By adding the timeout=10 keyword argument to s.get(url), the code is now able to handle this blocking behavior:
s = requests.Session()
s.mount('http://stackoverflow.com', HTTPAdapter(max_retries=Retry(total=10, connect=10, read=10)))
s.get(url, timeout=10)
Why is a timeout necessary to handle the cases, where a network interface resets or goes down temporarily? Why is max_retries=Retry(total=10, connect=10, read=10) not able to handle this? In particular, why is s.get() not informed that the network interface went offline, so that it could retry the connection instead of hanging?
Try:
https://urllib3.readthedocs.io/en/latest/reference/urllib3.util.html#urllib3.util.retry.Retry
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
s.mount('http://stackoverflow.com', HTTPAdapter(max_retries=5))
Or:
from urllib3 import PoolManager
from urllib3.util.retry import Retry

retries = Retry(connect=5, read=2, redirect=5)
http = PoolManager(retries=retries)
response = http.request('GET', 'http://stackoverflow.com')
Or:
response = http.request('GET', 'http://stackoverflow.com', retries=Retry(10))
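For the "why is a timeout necessary" part, one plausible explanation is that Retry only acts when an attempt actually fails: if the interface goes down after the TCP connection is established, the socket can simply block waiting for data, nothing ever raises, and the read retries never get a chance to run. A sketch combining retries with a timeout (the numbers are arbitrary placeholders):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

s = requests.Session()
retries = Retry(total=10, connect=10, read=10, backoff_factor=0.5)
s.mount('http://', HTTPAdapter(max_retries=retries))
s.mount('https://', HTTPAdapter(max_retries=retries))

# without the timeout a stalled read can block indefinitely and the
# Retry logic never gets an error to act on; with it, the stalled
# read raises and can be retried or reported
r = s.get('http://stackoverflow.com', timeout=10)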
What's the correct way to use HTTPAdapter with Async programming and calling out to a method? All of these requests are being made to the same domain.
I'm doing some async programming in Celery using eventlet and testing the load on one of my sites. I have a method that I call out to which makes the request to the url.
from requests.adapters import HTTPAdapter
from requests_html import HTMLSession

def get_session(url):
    # gets session, returns source
    headers, proxies = header_proxy()
    # set all of our necessary variables to None so that in the event of an error
    # we can make sure we don't break
    response = None
    status_code = None
    out_data = None
    content = None
    try:
        # we are going to use requests-html to be able to parse the
        # data upon the initial request
        with HTMLSession() as session:
            # you can swap out the original request session here
            # session = requests.session()
            # passing the parameters to the session
            session.mount('https://', HTTPAdapter(max_retries=0, pool_connections=250, pool_maxsize=500))
            response = session.get(url, headers=headers, proxies=proxies)
            status_code = response.status_code
            try:
                # we are checking to see if we are getting a 403 error on all requests. If so,
                # we update the status code
                code = response.html.xpath('''//*[@id="accessDenied"]/p[1]/b/text()''')
                if code:
                    status_code = str(code[0][:-1])
                else:
                    pass
            except Exception as error:
                pass
                # print(error)
            # assign the content to content
            content = response.content
    except Exception as error:
        print(error)
        pass
If I leave out the pool_connections and pool_maxsize parameters and run the code, I get an error indicating that I do not have enough open connections. However, I don't want to unnecessarily open up a large number of connections if I don't need to.
Based on this... https://laike9m.com/blog/requests-secret-pool_connections-and-pool_maxsize,89/ I'm going to guess that this applies per host and not so much to the async task. Therefore, I set the max number to the maximum number of connections that can be reused per host. If I hit a domain several times, the connection is reused.
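To make the per-host semantics concrete, a small sketch (the pool sizes and URL are placeholders to tune): pool_connections is the number of distinct host pools the adapter caches, and pool_maxsize is how many connections each of those pools keeps alive, so repeated requests to the same domain reuse connections. One way to set this up is to create the session and mount the adapter once, then share it across tasks:

from requests.adapters import HTTPAdapter
from requests_html import HTMLSession

session = HTMLSession()
adapter = HTTPAdapter(
    pool_connections=250,  # how many distinct host pools the adapter caches
    pool_maxsize=500,      # max connections kept alive per host pool
    pool_block=False,      # don't block when the pool is exhausted, open extra connections
)
session.mount('https://', adapter)

# repeated hits to the same host reuse connections from that host's pool
for _ in range(3):
    session.get('https://example.com')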
I was wondering if this code is safe:
import requests
with requests.Session() as s:
    response = s.get("http://google.com", stream=True)
    content = response.content
For a simple example like this, it does not fail (note that I don't write "it works" :p), since the pool does not close the connection instantly anyway (that's the point of a session/pool, right?).
With stream=True, the response object is supposed to have a raw attribute that holds the connection, but I'm unsure whether the connection is owned by the session or not, and therefore whether, if I don't read the content right away but only later, it might already have been closed.
My current 2 cents is that it's unsafe, but I'm not 100% sure.
Thanks!
[Edit after reading requests code]
After reading the requests code in more detail, it seems that this is what requests.get does itself:
https://github.com/requests/requests/blob/master/requests/api.py
def request(method, url, **kwargs):
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
Note that kwargs may contain stream=True.
So I guess the answer is "it's safe".
OK, I got my answer by digging into requests and urllib3.
The Response object owns the connection until its close() method is explicitly called:
https://github.com/requests/requests/blob/24092b11d74af0a766d9cc616622f38adb0044b9/requests/models.py#L937-L948
def close(self):
"""Releases the connection back to the pool. Once this method has been
called the underlying ``raw`` object must not be accessed again.
*Note: Should not normally need to be called explicitly.*
"""
if not self._content_consumed:
self.raw.close()
release_conn = getattr(self.raw, 'release_conn', None)
if release_conn is not None:
release_conn()
release_conn() is a urllib3 method; it releases the connection and puts it back in the pool:
def release_conn(self):
    if not self._pool or not self._connection:
        return

    self._pool._put_conn(self._connection)
    self._connection = None
If the Session has been destroyed (e.g. by leaving the with block), the pool has been destroyed as well, so the connection cannot be put back and is simply closed.
TL;DR; This is safe.
Note that this also means that in stream mode, the connection is not returned to the pool until the response is closed explicitly or the content is read entirely.
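A small sketch of the practical consequence (the URL is a placeholder): with stream=True, either consume the body or close the response so the connection goes back to the pool.

import requests

with requests.Session() as s:
    # using the response as a context manager closes it on exit,
    # which releases the connection back to the pool
    with s.get('https://example.com/big-file', stream=True) as response:
        for chunk in response.iter_content(chunk_size=8192):
            pass  # process each chunk here

    # alternatively, call response.close() explicitly when done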
Is there anyway to pickup on a previous session when starting a python program?
I've set session as a global variable so that it can be accessed across any method that needs it. However, I'm guessing when I start the program again the session variable is reset.
Is there a way to come back to a previous session when starting the program?
session = requests.Session()
def auth():
    session = self.session
    url = 'this url has auth'
    session.post(url, data=data)
    # Now authenticated so let's grab the data
    call_data(sessions)

def call_data(session):
    url = 'this url has the data'
    session.post(url, data=data)

def check_data():
    url = 'this url does a specific call on data elements'
    self.session.post(url, data=data)
When I load up my program a second time I only want to use the check_data method; I'd prefer not to require auth every time I start the program. Or perhaps I'm just curious to see if it can be done ;)
EDIT
I've updated my solution with the accepted answer.
def auth():
    session = self.session
    session.cookies = LWPCookieJar("cookies.txt")
    url = 'this url has auth'
    session.post(url, data=data)
    # Now authenticated so let's grab the data
    call_data(sessions)
    session.cookies.save()  # Save auth cookie

def some_other_method():
    if not cookie:
        session.cookies.load()
    # do stuff now that we're authed
The code obviously doesn't show the proper accessors for other methods, but the idea works fine.
Would be interested to know if this is the only way to remain authed.
Sessions are tracked in HTTP via cookies. You can save them across program restarts by storing them in an http.cookiejar.LWPCookieJar.
At the beginning of your program, set the session's cookie jar to this FileCookieJar and load the existing cookies, if any:
import requests
from http.cookiejar import LWPCookieJar
session = requests.Session()
session.cookies = LWPCookieJar("storage.jar")
session.cookies.load()
Before closing your program, you have to save them to the file:
session.cookies.save()
Note that, by default, this has the same behavior as a browser: session cookies (those not marked as persistent) are not saved across restarts. If you want a different behavior, tell the save() method to keep them by setting the ignore_discard argument to True, like this:
session.cookies.save(ignore_discard=True)
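Putting it together, a minimal sketch of the round trip (the file name is arbitrary): load on startup only if the file already exists, and pass ignore_discard=True to both load() and save() so non-persistent session cookies survive restarts too.

import os
import requests
from http.cookiejar import LWPCookieJar

session = requests.Session()
session.cookies = LWPCookieJar("storage.jar")

# load previously saved cookies, if any
if os.path.exists("storage.jar"):
    session.cookies.load(ignore_discard=True)

# ... authenticate and make requests with the session here ...

# persist the cookies, including non-persistent session cookies
session.cookies.save(ignore_discard=True)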
It's not clear what kind of session you're trying to establish. Django? Flask? Something different?
Also be aware that there seems to be a misspelling in call_data(sessions), since only session (without the s) is defined.