I have a very unreliable API that I query from Python, and I have been thinking about using requests_cache and setting expire_after to 999999999999, as I have seen other people do.
The only problem is: when the API starts working again, I do not know whether the cached data gets updated. Will requests_cache automatically fetch fresh data and delete the old entry?
I have tried reading the docs, but I cannot find this anywhere.
requests_cache will not update until the expire_after time has passed. Until then, it will not detect that your API is back in a working state.
Note that the project has since added an option that I had implemented myself in the past: you can now set the old_data_on_error option when configuring the cache; see the CachedSession documentation:
old_data_on_error – If True it will return expired cached response if update fails.
This reuses the existing cached data when a backend update fails, rather than deleting it.
In the past, I created my own requests_cache session setup (plus a small patch) that reuses cached values beyond expire_after if the backend returns a 500 error or times out (using short timeouts), to deal with a problematic API layer, rather than relying on expire_after alone:
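The behaviour that option enables can be illustrated with a small stdlib-only sketch (the class and method names here are mine for illustration, not part of requests_cache): keep entries past their expiry, and only serve the stale copy when refreshing fails.

```python
import time


class StaleFallbackCache:
    """Sketch of stale-if-error caching: entries are kept past their
    expiry and the stale copy is served only when a refresh fails."""

    def __init__(self, expire_after=300):
        self.expire_after = expire_after
        self.entries = {}  # key -> (value, stored_at)

    def get(self, key, fetch):
        entry = self.entries.get(key)
        if entry is not None:
            value, stored_at = entry
            if time.time() - stored_at <= self.expire_after:
                return value  # still fresh
        try:
            value = fetch()
        except Exception:
            if entry is None:
                raise  # nothing to fall back on
            return entry[0]  # stale, but better than nothing
        self.entries[key] = (value, time.time())
        return value
```

The real library does the same dance with HTTP responses and a persistent backend; this sketch only shows the decision order: fresh cache, then a refresh attempt, then the stale value.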
import logging
from datetime import (
    datetime,
    timedelta
)

from requests.exceptions import (
    ConnectionError,
    Timeout,
)
from requests_cache.core import (
    dispatch_hook,
    CachedSession,
)

log = logging.getLogger(__name__)
# Stop logging from complaining if no logging has been configured.
log.addHandler(logging.NullHandler())


class FallbackCachedSession(CachedSession):
    """Cached session that'll reuse expired cache data on timeouts

    This allows survival in case the backend is down, living off stale
    data until it comes back.

    """
    def send(self, request, **kwargs):
        # this *bypasses* CachedSession.send; we want to call the method
        # CachedSession.send() would have delegated to!
        session_send = super(CachedSession, self).send

        if (self._is_cache_disabled or
                request.method not in self._cache_allowable_methods):
            response = session_send(request, **kwargs)
            response.from_cache = False
            return response

        cache_key = self.cache.create_key(request)

        def send_request_and_cache_response(stale=None):
            try:
                response = session_send(request, **kwargs)
            except (Timeout, ConnectionError):
                if stale is None:
                    raise
                log.warning('No response received, reusing stale response for '
                            '%s', request.url)
                return stale

            if stale is not None and response.status_code == 500:
                log.warning('Response gave 500 error, reusing stale response '
                            'for %s', request.url)
                return stale

            if response.status_code in self._cache_allowable_codes:
                self.cache.save_response(cache_key, response)
            response.from_cache = False
            return response

        response, timestamp = self.cache.get_response_and_time(cache_key)
        if response is None:
            return send_request_and_cache_response()

        if self._cache_expire_after is not None:
            is_expired = datetime.utcnow() - timestamp > self._cache_expire_after
            if is_expired:
                self.cache.delete(cache_key)
                # try and get a fresh response, but if that fails reuse the
                # stale one
                return send_request_and_cache_response(stale=response)

        # dispatch hook here, because we've removed it before pickling
        response.from_cache = True
        response = dispatch_hook('response', request.hooks, response, **kwargs)
        return response


def basecache_delete(self, key):
    # We don't really delete; we instead set the timestamp to
    # datetime.min. This way we can re-use stale values if the backend
    # fails
    try:
        if key not in self.responses:
            key = self.keys_map[key]
        self.responses[key] = self.responses[key][0], datetime.min
    except KeyError:
        return

from requests_cache.backends.base import BaseCache
BaseCache.delete = basecache_delete
The above subclass of CachedSession bypasses the original send() method to go directly to the original requests.Session.send() method, so it can return the existing cached value even after the expiry time has passed, if the backend has failed. Deletion is disabled in favour of setting the stored timestamp to datetime.min, so the old value can still be reused if a new request fails.
Use the FallbackCachedSession instead of a regular CachedSession object.
If you wanted to use requests_cache.install_cache(), make sure to pass in FallbackCachedSession to that function in the session_factory keyword argument:
import requests_cache

requests_cache.install_cache(
    'cache_name', backend='some_backend', expire_after=180,
    session_factory=FallbackCachedSession)
My approach is a little more comprehensive than what requests_cache implemented some time after I hacked the above together; my version falls back to a stale response even if you explicitly marked it as deleted beforehand.
Try something like this:
class UnreliableAPIClient:
    def __init__(self):
        self.some_api_method_cached = {}  # we will store results here

    def some_api_method(self, param1, param2):
        params_hash = "{0}-{1}".format(param1, param2)  # need to identify the input
        try:
            result = do_call_some_api_method_with_fail_probability(param1, param2)
            self.some_api_method_cached[params_hash] = result  # save the result
        except Exception:
            result = self.some_api_method_cached.get(params_hash)  # fall back to the cached result
            if result is None:
                raise  # re-raise the exception if nothing is cached
        return result
Of course you can make a simple decorator out of that, it's up to you - http://www.artima.com/weblogs/viewpost.jsp?thread=240808
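Such a decorator might look like the sketch below (cache_on_failure is a made-up name, and for simplicity it only keys on positional arguments; real code would hash keyword arguments too):

```python
import functools


def cache_on_failure(func):
    """Remember the last good result per argument tuple and fall back
    to it when the wrapped call raises."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        try:
            result = func(*args)
        except Exception:
            if args not in cache:
                raise  # nothing cached for these arguments
            return cache[args]
        cache[args] = result
        return result

    return wrapper
```

Decorating the flaky API call with @cache_on_failure then gives you the same serve-the-last-good-value behaviour without touching the call sites.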
Related
I need to write historic data into InfluxDB (I'm using Python, though that is not a must here, so I may be willing to accept non-Python solutions). I set up the write API like this:
write_api = client.write_api(write_options=ASYNCHRONOUS)
The data comes from a DataFrame with a timestamp as the key, so I write it to the database like this:
result = write_api.write(bucket=bucket, data_frame_measurement_name=field_key, record=a_data_frame)
This call does not throw an exception, even if the InfluxDB server is down. result has a protected attribute _success that shows up as a boolean while debugging, but I cannot access it from my code.
How do I check if the write was a success?
If you use background batching, you can add custom success, error and retry callbacks.
from influxdb_client import InfluxDBClient

def success_cb(details, data):
    url, token, org = details
    print(url, token, org)

    data = data.decode('utf-8').split('\n')
    print('Total Rows Inserted:', len(data))

def error_cb(details, data, exception):
    print(exception)

def retry_cb(details, data, exception):
    print('Retrying because of an exception:', exception)

with InfluxDBClient(url, token, org) as client:
    with client.write_api(success_callback=success_cb,
                          error_callback=error_cb,
                          retry_callback=retry_cb) as write_api:
        write_api.write(...)
If you are eager to test all the callbacks and don't want to wait until all retries are finished, you can override the interval and number of retries.
from influxdb_client import InfluxDBClient, WriteOptions

with InfluxDBClient(url, token, org) as client:
    with client.write_api(success_callback=success_cb,
                          error_callback=error_cb,
                          retry_callback=retry_cb,
                          write_options=WriteOptions(retry_interval=60,
                                                     max_retries=2),
                          ) as write_api:
        ...
If you want to write data into the database immediately, use the SYNCHRONOUS version of write_api - https://github.com/influxdata/influxdb-client-python/blob/58343322678dd20c642fdf9d0a9b68bc2c09add9/examples/example.py#L12
The asynchronous write should be "triggered" by calling .get() - https://github.com/influxdata/influxdb-client-python#asynchronous-client
write_api.write() returns a multiprocessing.pool.AsyncResult (the same class is also exposed as multiprocessing.pool.ApplyResult; the two names refer to the same thing).
With this return object you can check on the asynchronous request in a couple of ways. See here: https://docs.python.org/2/library/multiprocessing.html#multiprocessing.pool.AsyncResult
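The returned object behaves like any other stdlib AsyncResult, so you can wait on it and then inspect it; a quick stdlib-only illustration (using a ThreadPool rather than the influxdb client):

```python
from multiprocessing.pool import ThreadPool

with ThreadPool(processes=1) as pool:
    # submit a call asynchronously; apply_async returns an AsyncResult
    async_result = pool.apply_async(pow, (2, 10))
    async_result.wait(timeout=5)

    print(async_result.ready())       # True once the call has finished
    print(async_result.successful())  # True if it finished without raising
    print(async_result.get())         # 1024
```

Note that successful() raises ValueError if the result is not ready yet, which is why the example waits first; get() re-raises any exception the call produced.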
If you can use a blocking request, then write_api = client.write_api(write_options=SYNCHRONOUS) can be used.
from datetime import datetime

from influxdb_client import WritePrecision, InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

with InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org", debug=False) as client:
    p = Point("my_measurement") \
        .tag("location", "Prague") \
        .field("temperature", 25.3) \
        .time(datetime.utcnow(), WritePrecision.MS)

    try:
        client.write_api(write_options=SYNCHRONOUS).write(bucket="my-bucket", record=p)
        reboot = False
    except Exception:
        reboot = True

    print(f"Reboot? {reboot}")
I have a method that pulls the information about a specific gerrit revision, and there are two possible endpoints that it could live under. Given the following sample values:
revision_id = rev1
change_id = chg1
proj_branch_id = pbi1
The revision can either live under
/changes/chg1/revisions/rev1/commit
or
/changes/pbi1/revisions/rev1/commit
I'm trying to handle this cleanly with minimal code re-use as follows:
@retry(
    wait_exponential_multiplier=1000,
    stop_max_delay=3000
)
def get_data(
        self,
        api_obj: GerritAPI,
        writer: cu.LockingWriter = None,
        test: bool = False
):
    """
    Make an HTTP Request using a pre-authed GerritAPI object to pull the
    JSON related to itself. It checks the endpoint using just the change
    id first, and if that doesn't return anything, will try the endpoint
    using the full id. It also cleans the data and adds ETL tracking
    fields.

    :param api_obj: An authed GerritAPI object.
    :param writer: A LockingWriter object.
    :param test: If True then the data will be returned instead of written
        out.
    :raises: If an HTTP Error occurs, an alternate endpoint will be
        attempted. If this also raises an HTTP Error then the error is
        re-raised and propagated to the caller.
    """
    logging.debug('Getting data for a commit for {f_id}'.format(
        f_id=self.proj_branch_id
    ))
    endpoint = (r'/changes/{c_id}/revisions/{r_id}/commit'.format(
        c_id=self.change_id,
        r_id=self.revision_id
    ))
    try:
        data = api_obj.get(endpoint)
    except HTTPError:
        try:
            endpoint = (r'/changes/{f_id}/revisions/{r_id}/commit'.format(
                f_id=self.proj_branch_id,
                r_id=self.revision_id
            ))
            data = api_obj.get(endpoint)
        except HTTPError:
            logging.debug('Neither endpoint returned data: {ep}'.format(
                ep=endpoint
            ))
            raise
        else:
            data['name'] = data.get('committer')['name']
            data['email'] = data.get('committer')['email']
            data['date'] = data.get('committer')['date']
            mu.clean_data(data)
    except ReadTimeout:
        logging.warning('Read Timeout occurred for a commit. Endpoint: '
                        '{ep}'.format(ep=endpoint))
    else:
        data['name'] = data.get('committer')['name']
        data['email'] = data.get('committer')['email']
        data['date'] = data.get('committer')['date']
        mu.clean_data(data)
    finally:
        try:
            data.update(self.__dict__)
        except NameError:
            data = self.__dict__
        if test:
            return data
        mu.add_etl_fields(data)
        self._write_data(data, writer)
I don't much like that I'm repeating the portion under the else, so I'm wondering if there is a way to handle this more cleanly. As a side note, as it stands my program will write out up to 3 times for every commit that returns an HTTPError; is creating an instance variable self.written, which tracks whether the commit has already been written out, a best-practice way to prevent this?
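One way to collapse the duplication is to loop over the candidate endpoints and keep the request-and-fallback logic in one place, so the committer extraction under the else only has to appear once; a sketch (get_first_working is a hypothetical helper name, not part of the code above):

```python
from requests.exceptions import HTTPError


def get_first_working(api_obj, endpoints):
    """Return the data from the first endpoint that responds; if every
    endpoint fails with an HTTPError, re-raise the last one."""
    last_error = None
    for endpoint in endpoints:
        try:
            return api_obj.get(endpoint)
        except HTTPError as exc:
            last_error = exc
    raise last_error
```

The caller would then build the two endpoint strings, call get_first_working once, and run the name/email/date extraction on whatever comes back, instead of duplicating that block in both the else branches.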
I am currently writing tests for our project, and I ran into an issue. We have this section of a view, which will redirect the user back to the page where they came from including an error message (that's being stored in the session):
if request.GET.get('error_code'):
    """
    Something went wrong or the call was cancelled
    """
    errorCode = request.GET.get('error_code')
    if errorCode == 4201:
        request.session['errormessage'] = _('Action cancelled by the user')
    return HttpResponseRedirect('/socialMedia/manageAccessToken')
Once the HttpResponseRedirect kicks in, the first thing that the new view does is scan the session, to see if any error messages are stored in the session. If there are, we place them in a dictionary and then delete it from the session:
def manageAccessToken(request):
    """
    View that handles all things related to the access tokens for Facebook,
    Twitter and Linkedin.
    """
    contextDict = {}
    try:
        contextDict['errormessage'] = request.session['errormessage']
        contextDict['successmessage'] = request.session['successmessage']
        del request.session['errormessage']
        del request.session['successmessage']
    except KeyError:
        pass
We should now have the error message in a dictionary, but after printing the dictionary the error message is not there. I also printed the session just before the HttpResponseRedirect, but the session is an empty dictionary there as well.
This is the test:
class oauthCallbacks(TestCase):
    """
    Class to test the different oauth callbacks
    """
    def setUp(self):
        self.user = User.objects.create(
            email='test@django.com'
        )
        self.c = Client()

    def test_oauthCallbackFacebookErrorCode(self):
        """
        Tests the Facebook oauth callback view

        This call contains an error code, so we will be redirected to the
        manage accesstoken page. We check if we get the error message.
        """
        self.c.force_login(self.user)
        response = self.c.get('/socialMedia/oauthCallbackFacebook/',
                              data={'error_code': 4201},
                              follow=True,
                              )
        self.assertEqual('Action cancelled by the user', response.context['errormessage'])
It looks like the session can not be accessed or written to directly from the views during testing. I can, however, access a value in the session by manually setting it in the test by using the following bit of code:
session = self.c.session
session['errormessage'] = 'This is an error message'
session.save()
This is however not what I want, because I need the session to be set by the view as there are many different error messages in the entire view. Does anyone know how to solve this? Thanks in advance!
After taking a closer look I found the issue; it is in the view itself:
errorCode = request.GET.get('error_code')
if errorCode == 4201:
    request.session['errormessage'] = _('Action cancelled by the user')
The errorCode variable is a string, and I was comparing it to an integer. I fixed it by changing the second line to:
if int(errorCode) == 4201:
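The same pitfall can be reproduced outside Django with the standard library, since query-string values always arrive as strings:

```python
from urllib.parse import parse_qs

# parse a query string the way a framework would before handing it to a view
query = parse_qs('error_code=4201')
error_code = query['error_code'][0]  # values are always str, never int

print(error_code == 4201)       # False: '4201' != 4201
print(int(error_code) == 4201)  # True
```

This is why the comparison in the view silently never matched: no exception, just a condition that is always False.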
I would like the .get() method in requests to do extra operations besides the GET itself:
print out "hello world" (in reality this will be logging)
wait 5 seconds before issuing the actual GET (in reality this will be a more complex wait-and-retry operation)
Right now my simplistic solution is to use a function which actually calls requests.get():
def multiple_requests(self, url, retries=5, wait=5):
"""
retries several times an URL
:param
url: the url to check
retries: how meny times to retry
wait: number of seconds to wait between retries
:return: the requests response, or None if failed
"""
for _ in range(retries):
try:
r = requests.get(url)
except Exception as e:
self.log.error("cannot connect to {url}: {e}, retrying in {wait} seconds".format(url=url, e=e, wait=wait))
else:
if r.ok:
return r
else:
self.log.error(
"error connecting to {url}, code {e}, retrying in {wait} seconds".format(
url=url, e=r.status_code, wait=wait
)
)
finally:
time.sleep(wait)
# give up after several tries
self.log.error("cannot connect to {url} despite retries, giving up".format(url=url))
return None
but I have a strong feeling that it should be possible to override the actual .get() method in requests.
I use object-oriented programming in a very basic way, and this would be an opportunity to actually learn the overriding part. There are various tutorials on how to override and call parent class methods (which is exactly what I want to do: be able to finally call the original .get() method).
I therefore tried a basic override:
import requests

class MyRequest(requests.Request):
    def get(self, url, **kwargs):
        print("hello world")
        # calling the parent .get() method to actually GET something
        super(MyRequest, self).get(url, **kwargs)

r = MyRequest.get('http://google.com')
This code fails with
Traceback (most recent call last):
  File "C:/Users/yop/dev/infoscreen/testingrequestsclass.py", line 8, in <module>
    r = MyRequest.get('http://google.com')
TypeError: get() missing 1 required positional argument: 'url'
To be honest, I am stuck here. The tutorials all start with a definition of the parent class, while what I have is hidden (though there is documentation).
requests.get is just a function, and you can override it. It is not a method on the requests.Request model:
import requests.api

def my_get(url, **kwargs):
    print('Hello World!')
    kwargs.setdefault('allow_redirects', True)
    return requests.api.request('get', url, **kwargs)

requests.api.get = my_get
This then uses a new session object to handle the request.
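One caveat worth noting: requests imports get into its package namespace by name when the package loads (from .api import get, ...), so rebinding requests.api.get alone does not change what requests.get(...) calls; you would want to rebind both names (requests.get = requests.api.get = my_get). The underlying name-binding behaviour can be shown with plain modules (the api module below is a stand-in for illustration, not the real requests.api):

```python
import types

# a stand-in module playing the role of requests.api
api = types.ModuleType('api')

def real_get(url):
    return 'real'

api.get = real_get

# simulate the `from .api import get` that requests does at import time:
# this copies the current binding into another namespace
package_level_get = api.get

def my_get(url):
    return 'patched'

api.get = my_get

print(api.get('x'))            # 'patched' - the module attribute was rebound
print(package_level_get('x'))  # 'real' - the earlier by-name binding still
                               # points at the original function
```

The same applies to any caller that did from requests import get: their local name keeps pointing at the original function no matter what you assign later.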
Instead of replacing requests.get(), I'd provide a subclass of the requests.Session() object, overriding the Session.request() method, then use an instance of that session object:
from requests import Session

class MySession(Session):
    def request(self, method, url, **kwargs):
        print('Hello World!')
        return super().request(method, url, **kwargs)
then use that like this:
with MySession() as session:
    response = session.get(url)
The advantage here is that you can then also make use of the full feature set that session objects offer, and your additional code will also apply to POST, PUT, DELETE, HEAD, etc. requests.
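If the goal is the wait-and-retry behaviour from the question, the same subclassing approach extends naturally; a sketch (the class name, retry counts, and waits are illustrative assumptions, not part of the answer above):

```python
import time

from requests import Session


class RetryingSession(Session):
    """Session whose request() retries failed calls with a fixed wait."""

    def __init__(self, retries=5, wait=5):
        super().__init__()
        self.retries = retries
        self.wait = wait

    def request(self, method, url, **kwargs):
        for attempt in range(self.retries):
            try:
                response = super().request(method, url, **kwargs)
            except Exception:
                # re-raise on the last attempt, otherwise retry
                if attempt == self.retries - 1:
                    raise
            else:
                if response.ok:
                    return response
            time.sleep(self.wait)
        # all attempts returned non-ok responses
        return None
```

Because every verb helper (get, post, put, ...) on Session funnels through request(), one override covers them all.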
I adapted this sample code in order to get webapp2 sessions to work on Google App Engine.
What do I need to do to be able to return webapp2.Response objects from a handler that's inheriting from a BaseHandler that overrides the dispatch method?
Here's a demonstration of the kind of handler I want to write:
import webapp2
import logging

from webapp2_extras import sessions

class BaseHandler(webapp2.RequestHandler):
    def dispatch(self):
        # Get a session store for this request.
        self.session_store = sessions.get_store(request=self.request)

        try:
            # Dispatch the request.
            webapp2.RequestHandler.dispatch(self)
        finally:
            # Save all sessions.
            self.session_store.save_sessions(self.response)

class HomeHandler(BaseHandler):
    def get(self):
        logging.debug('In homehandler')
        response = webapp2.Response()
        response.write('Foo')
        return response

config = {}
config['webapp2_extras.sessions'] = {
    'secret_key': 'some-secret-key',
}

app = webapp2.WSGIApplication([
    ('/test', HomeHandler),
], debug=True, config=config)
This code is obviously not working, since BaseHandler always calls dispatch with self. I've looked through the code of webapp2.RequestHandler, but it seriously eludes me how to modify my BaseHandler (or perhaps set a custom dispatcher) such that I can simply return response objects from inheriting handlers.
Curiously, the shortcut of assigning self.response = copy.deepcopy(response) does not work either.
You're mixing the two ways of producing a response in one method. Use either
return webapp2.Response('Foo')
or
self.response.write('Foo')
...not both.
I took a look at webapp2.RequestHandler and noticed that returned values are just passed up the stack.
A solution which works for me is to use the returned Response when one is returned from the handler, or self.response when nothing is returned.
class BaseHandler(webapp2.RequestHandler):
    def dispatch(self):
        # Get a session store for this request.
        self.session_store = sessions.get_store(request=self.request)

        response = None
        try:
            # Dispatch the request.
            response = webapp2.RequestHandler.dispatch(self)
            return response
        finally:
            # Save all sessions.
            if response is None:
                response = self.response
            self.session_store.save_sessions(response)
While I was playing I noticed that my session stored as a secure cookie was not getting updated when exceptions were raised in the handler.