How to cleanly handle nested try/excepts? - python

I have a method that pulls the information about a specific gerrit revision, and there are two possible endpoints that it could live under. Given the following sample values:
revision_id = rev1
change_id = chg1
proj_branch_id = pbi1
The revision can either live under
/changes/chg1/revisions/rev1/commit
or
/changes/pbi1/revisions/rev1/commit
I'm trying to handle this cleanly with minimal code re-use as follows:
@retry(
    wait_exponential_multiplier=1000,
    stop_max_delay=3000
)
def get_data(
        self,
        api_obj: GerritAPI,
        writer: cu.LockingWriter = None,
        test: bool = False
):
    """
    Make an HTTP Request using a pre-authed GerritAPI object to pull the
    JSON related to itself. It checks the endpoint using just the change
    id first, and if that doesn't return anything, will try the endpoint
    using the full id. It also cleans the data and adds ETL tracking
    fields.

    :param api_obj: An authed GerritAPI object.
    :param writer: A LockingWriter object.
    :param test: If True then the data will be returned instead of written
        out.
    :raises: If an HTTP Error occurs, an alternate endpoint will be
        attempted. If this also raises an HTTP Error then the error is
        re-raised and propagated to the caller.
    """
    logging.debug('Getting data for a commit for {f_id}'.format(
        f_id=self.proj_branch_id
    ))
    endpoint = (r'/changes/{c_id}/revisions/{r_id}/commit'.format(
        c_id=self.change_id,
        r_id=self.revision_id
    ))
    try:
        data = api_obj.get(endpoint)
    except HTTPError:
        try:
            endpoint = (r'/changes/{f_id}/revisions/{r_id}/commit'.format(
                f_id=self.proj_branch_id,
                r_id=self.revision_id
            ))
            data = api_obj.get(endpoint)
        except HTTPError:
            logging.debug('Neither endpoint returned data: {ep}'.format(
                ep=endpoint
            ))
            raise
        else:
            data['name'] = data.get('committer')['name']
            data['email'] = data.get('committer')['email']
            data['date'] = data.get('committer')['date']
            mu.clean_data(data)
    except ReadTimeout:
        logging.warning('Read Timeout occurred for a commit. Endpoint: '
                        '{ep}'.format(ep=endpoint))
    else:
        data['name'] = data.get('committer')['name']
        data['email'] = data.get('committer')['email']
        data['date'] = data.get('committer')['date']
        mu.clean_data(data)
    finally:
        try:
            data.update(self.__dict__)
        except NameError:
            data = self.__dict__
        if test:
            return data
        mu.add_etl_fields(data)
        self._write_data(data, writer)
I don't much like that I'm repeating the portion under the else, so I'm wondering if there is a way to handle this more cleanly. As a side note, as it stands my program will write out up to 3 times for every commit that returns an HTTPError; is creating an instance variable self.written, which tracks whether the data has already been written out, a best-practice way to prevent that?
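One way to collapse the duplication (a rough sketch, not from the original post; it keeps the HTTPError re-raise behaviour but leaves out the ReadTimeout branch for brevity) is to loop over the candidate endpoints so the success path appears only once:

endpoints = [
    r'/changes/{c_id}/revisions/{r_id}/commit'.format(
        c_id=self.change_id, r_id=self.revision_id),
    r'/changes/{f_id}/revisions/{r_id}/commit'.format(
        f_id=self.proj_branch_id, r_id=self.revision_id),
]
last_error = None
for endpoint in endpoints:
    try:
        data = api_obj.get(endpoint)
        break
    except HTTPError as exc:
        last_error = exc
else:
    # Neither endpoint returned data; propagate the last HTTP error.
    logging.debug('Neither endpoint returned data: {eps}'.format(eps=endpoints))
    raise last_error

committer = data.get('committer') or {}
data['name'] = committer.get('name')
data['email'] = committer.get('email')
data['date'] = committer.get('date')
mu.clean_data(data)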

Related

How to perform async commit when using kafka-python

I'm using the kafka-python library for my FastAPI consumer app and I'm consuming messages in batches with a maximum of 100 records. Since the topic has huge traffic and only one partition, consuming, processing and committing should be as quick as possible, hence I want to use commit_async() instead of the synchronous commit().
But I'm not able to find a good example of commit_async(). I'm looking for an example of commit_async() with a callback so that I can log in case of commit failure. But I'm not sure what arguments that callback function takes and what fields those arguments contain.
Although the docs related to commit_async mention the arguments, I'm not completely sure how to use them.
I need help completing my callback function on_commit(); someone please help here.
Code
import logging as log
from kafka import KafkaConsumer
from message_handler_impl import MessageHandlerImpl

def on_commit():
    pass

class KafkaMessageConsumer:

    def __init__(self, bootstrap_servers: str, topic: str, group_id: str):
        self.bootstrap_servers = bootstrap_servers
        self.topic = topic
        self.group_id = group_id
        self.consumer = KafkaConsumer(topic, bootstrap_servers=bootstrap_servers, group_id=group_id, enable_auto_commit=False, auto_offset_reset='latest')

    def consume_messages(self, max_poll_records: int,
                         message_handler: MessageHandlerImpl = MessageHandlerImpl()):
        try:
            while True:
                try:
                    msg_pack = self.consumer.poll(max_records=max_poll_records)
                    for topic_partition, messages in msg_pack.items():
                        message_handler.process_messages(messages)
                    self.consumer.commit_async(callback=on_commit)
                except Exception as e:
                    log.error("Error while consuming message due to: %s", e, exc_info=True)
        finally:
            log.error("Something went wrong, closing consumer...........")
            self.consumer.close()

if __name__ == "__main__":
    kafka_consumer = KafkaMessageConsumer("localhost:9092", "test-topic", "test-group")
    kafka_consumer.consume_messages(100)
The docs are fairly clear.
Called as callback(offsets, response) with response as either an Exception or an OffsetCommitResponse struct.
def on_commit(offsets, response):
    # or maybe try checking type(response)
    if hasattr(response, '<some attribute unique to OffsetCommitResponse>'):
        print('committed ' + str(offsets))
    else:
        print(response)  # exception
I'm sure you could look at the source code and maybe find a unit test that covers a full example.
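For instance, a minimal callback along those lines (a sketch that assumes only what the quoted docs state, i.e. that response is either an Exception or an OffsetCommitResponse struct):

import logging as log

def on_commit(offsets, response):
    # Per the docs quoted above: response is an Exception when the commit
    # failed, otherwise an OffsetCommitResponse struct.
    if isinstance(response, Exception):
        log.error("Async commit failed for offsets %s: %s", offsets, response)
    else:
        log.debug("Async commit succeeded for offsets %s", offsets)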

Return multiple times from one api call in Flask Restful

I want to call a generate() function and send the user a message, but then continue executing the function.
@application.route("/api/v1.0/gen", methods=['POST'])
def generate():
    return "Your id for getting the generated data is 'hgF8_dh4kdsRjdr'"
    main()  # generate a data
    return "Successfully generated something. Use your id to get the data"
I understand that this is not a correct way of returning, but I hope you get the idea of what I am trying to accomplish. Maybe Flask has some built-in method to return multiple times from one API call?
Basically, what you are describing is called Server-Sent Events (aka SSE).
The difference with this format is that the endpoint returns an 'event stream' response type instead of the usual JSON/plain text.
And if you want to use it with Python/Flask, you need generators.
Small code example (with GET request):
@application.route("/api/v1.0/gen", methods=['GET'])
def stream():
    def eventStream():
        text = "Your id for getting the generated data is 'hgF8_dh4kdsRjdr'"
        yield str(Message(data=text, type="message"))
        main()
        text = "Successfully generated something. Use your id to get the data"
        yield str(Message(data=text, type="message"))

    resp = Response(eventStream())  # flask.Response wrapping the generator
    resp.headers['Content-Type'] = 'text/event-stream'
    resp.headers['Cache-Control'] = 'no-cache'
    resp.headers['Connection'] = 'keep-alive'
    return resp
You can find the Message class here: https://gist.github.com/Alveona/b79c6583561a1d8c260de7ba944757a7
And of course, you need a specific client that can properly read such responses.
postwoman.io supports SSE in the Real-Time tab.
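If you just want to sanity-check the stream from plain Python instead, a rough sketch using the requests library (the URL and port are assumptions, not part of the answer above):

import requests

# Read the event stream line by line; SSE frames are plain text separated
# by blank lines.
with requests.get("http://localhost:5000/api/v1.0/gen", stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line:
            print(line)  # e.g. "data: Your id for getting the generated data ..."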

Does requests_cache automatically update cache on update of info?

I have a very unreliable API that I request using Python. I have been thinking about using requests_cache and setting expire_after to 999999999999, like I have seen other people do.
The only problem is, I do not know whether the data gets updated when the API works again, i.e. whether requests_cache will automatically update and delete the old entry.
I have tried reading the docs but I cannot really see this anywhere.
requests_cache will not update until the expire_after time has passed. In that case it will not detect that your API is back to a working state.
I note that the project has since added an option that I implemented in the past; you can now set the old_data_on_error option when configuring the cache; see the CachedSession documentation:
old_data_on_error – If True it will return expired cached response if update fails.
It would reuse existing cache data in case a backend update fails, rather than delete that data.
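For example (a sketch; it assumes a requests_cache version that accepts this option):

import requests_cache

# Return the stale cached response if refreshing an expired entry fails,
# instead of raising.
session = requests_cache.CachedSession(
    'api_cache', backend='sqlite', expire_after=300,
    old_data_on_error=True)
response = session.get('https://unreliable.example.com/endpoint')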
In the past, I created my own requests_cache session setup (plus small patch) that would reuse cached values beyond expire_after if the backend gave a 500 error or timed out (using short timeouts) to deal with a problematic API layer, rather than rely on expire_after:
import logging
from datetime import (
    datetime,
    timedelta
)

from requests.exceptions import (
    ConnectionError,
    Timeout,
)
from requests_cache.core import (
    dispatch_hook,
    CachedSession,
)

log = logging.getLogger(__name__)
# Stop logging from complaining if no logging has been configured.
log.addHandler(logging.NullHandler())


class FallbackCachedSession(CachedSession):
    """Cached session that'll reuse expired cache data on timeouts

    This allows survival in case the backend is down, living off stale
    data until it comes back.

    """
    def send(self, request, **kwargs):
        # this *bypasses* CachedSession.send; we want to call the method
        # CachedSession.send() would have delegated to!
        session_send = super(CachedSession, self).send

        if (self._is_cache_disabled or
                request.method not in self._cache_allowable_methods):
            response = session_send(request, **kwargs)
            response.from_cache = False
            return response

        cache_key = self.cache.create_key(request)

        def send_request_and_cache_response(stale=None):
            try:
                response = session_send(request, **kwargs)
            except (Timeout, ConnectionError):
                if stale is None:
                    raise
                log.warning('No response received, reusing stale response for '
                            '%s', request.url)
                return stale

            if stale is not None and response.status_code == 500:
                log.warning('Response gave 500 error, reusing stale response '
                            'for %s', request.url)
                return stale

            if response.status_code in self._cache_allowable_codes:
                self.cache.save_response(cache_key, response)
            response.from_cache = False
            return response

        response, timestamp = self.cache.get_response_and_time(cache_key)
        if response is None:
            return send_request_and_cache_response()

        if self._cache_expire_after is not None:
            is_expired = datetime.utcnow() - timestamp > self._cache_expire_after
            if is_expired:
                self.cache.delete(cache_key)
                # try and get a fresh response, but if that fails reuse the
                # stale one
                return send_request_and_cache_response(stale=response)

        # dispatch hook here, because we've removed it before pickling
        response.from_cache = True
        response = dispatch_hook('response', request.hooks, response, **kwargs)
        return response


def basecache_delete(self, key):
    # We don't really delete; we instead set the timestamp to
    # datetime.min. This way we can re-use stale values if the backend
    # fails
    try:
        if key not in self.responses:
            key = self.keys_map[key]
        self.responses[key] = self.responses[key][0], datetime.min
    except KeyError:
        return

from requests_cache.backends.base import BaseCache
BaseCache.delete = basecache_delete
The above subclass of CachedSession bypasses the original send() method to instead go directly to the original requests.Session.send() method, so it can return an existing cached value even if the expiry time has passed but the backend has failed. Deletion is disabled in favour of setting the timestamp to datetime.min, so we can still reuse that old value if a new request fails.
Use the FallbackCachedSession instead of a regular CachedSession object.
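For instance (a small usage sketch; the endpoint URL is made up):

session = FallbackCachedSession('cache_name', backend='sqlite', expire_after=180)
response = session.get('https://unreliable.example.com/endpoint')
print(response.from_cache, response.status_code)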
If you wanted to use requests_cache.install_cache(), make sure to pass in FallbackCachedSession to that function in the session_factory keyword argument:
import requests_cache

requests_cache.install_cache(
    'cache_name', backend='some_backend', expire_after=180,
    session_factory=FallbackCachedSession)
My approach is a little more comprehensive than what requests_cache implemented some time after I hacked together the above; my version will fall back to a stale response even if you explicitly marked it as deleted before.
Try something like this:
class UnreliableAPIClient:
    def __init__(self):
        self.some_api_method_cached = {}  # we will store results here

    def some_api_method(self, param1, param2):
        params_hash = "{0}-{1}".format(param1, param2)  # need to identify the input
        try:
            result = do_call_some_api_method_with_fail_probability(param1, param2)
            self.some_api_method_cached[params_hash] = result  # save the result
        except:
            result = self.some_api_method_cached.get(params_hash)  # fall back to the cached result
            if result is None:
                raise  # re-raise the exception if nothing is cached
        return result
Of course you can make a simple decorator out of that; up to you - http://www.artima.com/weblogs/viewpost.jsp?thread=240808
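A very rough sketch of such a decorator (it reuses the hypothetical do_call_some_api_method_with_fail_probability helper from above):

import functools

def fallback_to_last_result(func):
    # Remember the last good result per argument set and fall back to it
    # if the wrapped call raises.
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        try:
            result = func(*args)
            cache[args] = result
            return result
        except Exception:
            if args in cache:
                return cache[args]
            raise
    return wrapper

@fallback_to_last_result
def some_api_method(param1, param2):
    return do_call_some_api_method_with_fail_probability(param1, param2)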

Will RPC on App Engine localize to a single instance?

I'm using RPC to fetch multiple URLs asynchronously. I'm using a global variable to track completion and notice that that global has radically different contents before and after the RPC calls complete.
Feels like I'm missing something obvious... Is it possible for the rpc.wait() to result in the app context being loaded on a new instance when the callbacks are made?
Here's the basic pattern...
aggregated_results = {}

def aggregateData(sid):
    # local variable tracking results
    aggregated_results[sid] = []

    # create a bunch of asynchronous url fetches to get all of the route data
    rpcs = []
    for r in routes:
        rpc = urlfetch.create_rpc()
        rpc.callback = create_callback(rpc, sid)
        urlfetch.make_fetch_call(rpc, url)
        rpcs.append(rpc)

    # all of the schedule URLs have been fetched. now wait for them to finish
    for rpc in rpcs:
        rpc.wait()

    # look at results
    try:
        if len(aggregated_results[sid]) == 0:
            logging.debug("We couldn't find results for transaction")
    except KeyError as e:
        logging.error('aggregation error: %s' % e.message)
        logging.debug(aggregated_results)

    return aggregated_results[sid]

def magic_callback(rpc, sid):
    # do some work to parse the result
    # of the urlfetch call...
    # <hidden>
    #
    try:
        if len(aggregated_results[sid]) == 0:
            aggregated_results[sid] = [stop]
        else:
            done = False
            for i, s in enumerate(aggregated_results[sid]):
                if stop.time <= s.time:
                    aggregated_results[sid].insert(i, stop)
                    done = True
                    break
            if not done:
                aggregated_results[sid].append(stop)
    except KeyError as e:
        logging.error('aggregation error: %s' % e.message)
The KeyError is thrown both inside the callback and at the end of processing all of the results. Neither of those should happen.
When I print out the contents of the dictionary, the sid is in fact gone, but there are other entries for other requests that are being processed. In some cases, more entries than I see when the respective request starts.
This pattern is called from a web request handler, not in the background.
It's as if the callbacks occur on a different instance.
The sid key in this case is a combination of strings that includes a time string, and I'm confident it is unique.

How to POST multiple FILES using Flask test client?

In order to test a Flask application, I have a Flask test client POSTing a request with files as attachments:
def make_tst_client_service_call1(service_path, method, **kwargs):
    _content_type = kwargs.get('content-type', 'multipart/form-data')
    with app.test_client() as client:
        return client.open(service_path, method=method,
                           content_type=_content_type, buffered=True,
                           follow_redirects=True, **kwargs)

def _publish_a_model(model_name, pom_env):
    service_url = u'/publish/'
    scc.data['modelname'] = model_name
    scc.data['username'] = "BDD Script"
    scc.data['instance'] = "BDD Stub Simulation"
    scc.data['timestamp'] = datetime.now().strftime('%d-%m-%YT%H:%M')
    scc.data['file'] = (open(file_path, 'rb'), file_name)
    scc.response = make_tst_client_service_call1(service_url, method, data=scc.data)
The Flask server endpoint code which handles the above POST request is something like this:
@app.route("/publish/", methods=['GET', 'POST'])
def publish():
    if request.method == 'POST':
        LOG.debug("Publish POST Service is called...")
        upload_files = request.files.getlist("file[]")
        print "Files :\n", request.files
        print "Upload Files:\n", upload_files
        return render_response_template()
I get this output:
Files:
ImmutableMultiDict([('file', <FileStorage: u'Single_XML.xml' ('application/xml')>)])
Upload Files:
[]
If I change
scc.data['file'] = (open(file_path, 'rb'),file_name)
into (thinking that it would handle multiple files)
scc.data['file'] = [(open(file_path, 'rb'),file_name),(open(file_path, 'rb'),file_name1)]
I still get similar Output:
Files:
ImmutableMultiDict([('file', <FileStorage: u'Single_XML.xml' ('application/xml')>), ('file', <FileStorage: u'Second_XML.xml' ('application/xml')>)])
Upload Files:
[]
Question:
Why is request.files.getlist("file[]") returning an empty list?
How can I post multiple files using the Flask test client, so that they can be retrieved with request.files.getlist("file[]") on the Flask server side?
Note:
I would like to use the Flask test client; I don't want curl or any other client-based solutions.
I don't want to post a single file in multiple requests.
Thanks
Referred these links already:
Flask and Werkzeug: Testing a post request with custom headers
Python - What type is flask.request.files.stream supposed to be?
You send the files as the parameter named file, so you can't look them up with the name file[]. If you want to get all the files named file as a list, you should use this:
upload_files = request.files.getlist("file")
On the other hand, if you really want to read them from file[], then you need to send them like that:
scc.data['file[]'] = # ...
(The file[] syntax is from PHP and it's used only on the client side. When you send the parameters named like that to the server, you still access them using $_FILES['file'].)
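In other words, something along these lines (a sketch reusing the question's own variables, so the server-side request.files.getlist("file[]") finds both attachments):

scc.data['file[]'] = [
    (open(file_path, 'rb'), file_name),
    (open(file_path, 'rb'), file_name1),
]
scc.response = make_tst_client_service_call1(service_url, method, data=scc.data)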
Lukas already addressed this; just providing this info as it may help someone.
The Werkzeug client does some clever stuff by storing request data in a MultiDict:
@native_itermethods(['keys', 'values', 'items', 'lists', 'listvalues'])
class MultiDict(TypeConversionDict):
    """A :class:`MultiDict` is a dictionary subclass customized to deal with
    multiple values for the same key which is for example used by the parsing
    functions in the wrappers. This is necessary because some HTML form
    elements pass multiple values for the same key.

    :class:`MultiDict` implements all standard dictionary methods.
    Internally, it saves all values for a key as a list, but the standard dict
    access methods will only return the first value for a key. If you want to
    gain access to the other values, too, you have to use the `list` methods as
    explained below.
    """
The getlist call looks for the given key in the request's dictionary. If the key doesn't exist, it returns an empty list.
def getlist(self, key, type=None):
    """Return the list of items for a given key. If that key is not in the
    `MultiDict`, the return value will be an empty list. Just as `get`,
    `getlist` accepts a `type` parameter. All items will be converted
    with the callable defined there.

    :param key: The key to be looked up.
    :param type: A callable that is used to cast the value in the
                 :class:`MultiDict`. If a :exc:`ValueError` is raised
                 by this callable the value will be removed from the list.
    :return: a :class:`list` of all the values for the key.
    """
    try:
        rv = dict.__getitem__(self, key)
    except KeyError:
        return []
    if type is None:
        return list(rv)
    result = []
    for item in rv:
        try:
            result.append(type(item))
        except ValueError:
            pass
    return result
