I'm using the kafka-python library for my FastAPI consumer app and I'm consuming messages in batches with a maximum of 100 records. Since the topic has huge traffic and has only one partition, consuming, processing and committing should be as quick as possible, hence I want to use commit_async() instead of the synchronous commit().
But I'm not able to find a good example of commit_async(). I'm looking for an example of commit_async() with a callback so that I can log a commit failure. I'm not sure what arguments that callback function takes and what fields those arguments contain.
The docs for commit_async do mention the arguments, but I'm not completely sure how to use them.
I need help completing my callback function on_commit().
Code
import logging as log
from kafka import KafkaConsumer
from message_handler_impl import MessageHandlerImpl
def on_commit():
pass
class KafkaMessageConsumer:
def __init__(self, bootstrap_servers: str, topic: str, group_id: str):
self.bootstrap_servers = bootstrap_servers
self.topic = topic
self.group_id = group_id
self.consumer = KafkaConsumer(topic, bootstrap_servers=bootstrap_servers, group_id=group_id, enable_auto_commit=False, auto_offset_reset='latest')
def consume_messages(self, max_poll_records: int,
message_handler: MessageHandlerImpl = MessageHandlerImpl()):
try:
while True:
try:
msg_pack = self.consumer.poll(max_records=max_poll_records)
for topic_partition, messages in msg_pack.items():
message_handler.process_messages(messages)
self.consumer.commit_async(callback=on_commit)
except Exception as e:
log.error("Error while consuming message due to: %s", e, exc_info=True)
finally:
log.error("Something went wrong, closing consumer...........")
self.consumer.close()
if __name__ == "__main__":
kafka_consumer = KafkaMessageConsumer("localhost:9092", "test-topic", "test-group")
kafka_consumer.consume_messages(100)
The docs are fairly clear.
Called as callback(offsets, response) with response as either an Exception or an OffsetCommitResponse struct.
def on_commit(offsets, response):
# or maybe try checking type(response)
if hasattr(response, '<some attribute unique to OffsetCommitResponse>'):
print('committed ' + str(offsets))
else:
print(response) # exception
I'm sure you could look at the source code and maybe find a unit test that covers a full example.
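Building on that, here is a minimal, hedged sketch of what the callback could look like, checking whether the second argument is an Exception instead of probing for a response attribute (the signature matches the docs quoted above; the rest is an assumption, not the library's official example):
import logging as log

def on_commit(offsets, response):
    # offsets: dict of {TopicPartition: OffsetAndMetadata} that was committed
    # response: an Exception on failure, or an OffsetCommitResponse struct on success
    if isinstance(response, Exception):
        log.error("Offset commit failed for %s: %s", offsets, response)
    else:
        log.debug("Offsets committed: %s", offsets)
With that in place, the self.consumer.commit_async(callback=on_commit) call in the loop above should log any failed commit without blocking the poll loop.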
Related
I need to write historic data into InfluxDB (I'm using Python, which is not a must in this case, so I may be willing to accept non-Python solutions). I set up the write API like this
write_api = client.write_api(write_options=ASYNCHRONOUS)
The Data comes from a DataFrame with a timestamp as key, so I write it to the database like this
result = write_api.write(bucket=bucket, data_frame_measurement_name=field_key, record=a_data_frame)
This call does not throw an exception, even if the InfluxDB server is down. result has a protected attribute _success that shows as a boolean in the debugger, but I cannot access it from my code.
How do I check if the write was a success?
If you use background batching, you can add custom success, error and retry callbacks.
from influxdb_client import InfluxDBClient
def success_cb(details, data):
    url, token, org = details
    print(url, token, org)
    data = data.decode('utf-8').split('\n')
    print('Total Rows Inserted:', len(data))

def error_cb(details, data, exception):
    print(exception)

def retry_cb(details, data, exception):
    print('Retrying because of an exception:', exception)
with InfluxDBClient(url, token, org) as client:
with client.write_api(success_callback=success_cb,
error_callback=error_cb,
retry_callback=retry_cb) as write_api:
write_api.write(...)
If you are eager to test all the callbacks and don't want to wait until all retries are finished, you can override the interval and number of retries.
from influxdb_client import InfluxDBClient, WriteOptions
with InfluxDBClient(url, token, org) as client:
with client.write_api(success_callback=success_cb,
error_callback=error_cb,
retry_callback=retry_cb,
write_options=WriteOptions(retry_interval=60,
max_retries=2),
) as write_api:
...
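For reference, a hypothetical call that would exercise these callbacks might look like this (the bucket name and line-protocol record are made up):
# with the default batching write_options, this returns immediately;
# success_cb / retry_cb / error_cb are invoked later from the background thread
write_api.write(bucket="my-bucket", record="mem,host=host1 used_percent=23.4")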
If you want to write data into the database immediately, then use the SYNCHRONOUS version of write_api - https://github.com/influxdata/influxdb-client-python/blob/58343322678dd20c642fdf9d0a9b68bc2c09add9/examples/example.py#L12
The asynchronous write is "triggered" by calling .get() on the returned result - https://github.com/influxdata/influxdb-client-python#asynchronous-client
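For illustration, a minimal sketch of that pattern (the URL, token, bucket and record below are placeholders): .get() blocks until the background request completes and re-raises any exception, so a try/except around it tells you whether the write succeeded.
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import ASYNCHRONOUS

with InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org") as client:
    write_api = client.write_api(write_options=ASYNCHRONOUS)
    result = write_api.write(bucket="my-bucket", record="my_measurement temperature=25.3")
    try:
        result.get()  # blocks and re-raises if the write failed
        print("write succeeded")
    except Exception as e:
        print("write failed:", e)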
Regards
write_api.write() returns a multiprocessing.pool.AsyncResult (an alias for ApplyResult; both names refer to the same class).
With this return object you can check on the asynchronous request in a couple of ways. See here: https://docs.python.org/2/library/multiprocessing.html#multiprocessing.pool.AsyncResult
If you can use a blocking request, then write_api = client.write_api(write_options=SYNCHRONOUS) can be used.
from datetime import datetime
from influxdb_client import WritePrecision, InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS
with InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org", debug=False) as client:
p = Point("my_measurement") \
.tag("location", "Prague") \
.field("temperature", 25.3) \
.time(datetime.utcnow(), WritePrecision.MS)
try:
client.write_api(write_options=SYNCHRONOUS).write(bucket="my-bucket", record=p)
reboot = False
except Exception as e:
reboot = True
print(f"Reboot? {reboot}")
I have a Google Cloud Function triggered by a PubSub topic. The doc states messages are acknowledged when the function ends successfully.
link
But randomly, the function retries (same execution ID) exactly 10 minutes after execution, which is the PubSub ack max timeout.
I also tried to get the message ID and acknowledge it programmatically in the Function code, but the PubSub API responds that there is no message to ack with that ID.
In StackDriver monitoring, I see some messages not being acknowledged.
Here is my code: main.py
import base64
import logging
import traceback
from google.api_core import exceptions
from google.cloud import bigquery, error_reporting, firestore, pubsub
from sql_runner.runner import orchestrator
logging.getLogger().setLevel(logging.INFO)
def main(event, context):
bigquery_client = bigquery.Client()
firestore_client = firestore.Client()
publisher_client = pubsub.PublisherClient()
subscriber_client = pubsub.SubscriberClient()
logging.info(
'event=%s',
event
)
logging.info(
'context=%s',
context
)
try:
query_id = base64.b64decode(event.get('data',b'')).decode('utf-8')
logging.info(
'query_id=%s',
query_id
)
# inject dependencies
orchestrator(
query_id,
bigquery_client,
firestore_client,
publisher_client
)
sub_path = (context.resource['name']
.replace('topics', 'subscriptions')
.replace('function-sql-runner', 'gcf-sql-runner-europe-west1-function-sql-runner')
)
        # explicitly ack the message to avoid duplicate invocations
try:
subscriber_client.acknowledge(
sub_path,
[context.event_id] # message_id to ack
)
logging.warning(
'message_id %s acknowledged (FORCED)',
context.event_id
)
except exceptions.InvalidArgument as err:
# google.api_core.exceptions.InvalidArgument: 400 You have passed an invalid ack ID to the service (ack_id=982967258971474).
logging.info(
'message_id %s already acknowledged',
context.event_id
)
logging.debug(err)
except Exception as err:
# catch all exceptions and log to prevent cold boot
# report with error_reporting
error_reporting.Client().report_exception()
logging.critical(
'Internal error : %s -> %s',
str(err),
traceback.format_exc()
)
if __name__ == '__main__': # for testing
from collections import namedtuple # use namedtuple to avoid Class creation
Context = namedtuple('Context', 'event_id resource')
context = Context('666', {'name': 'projects/my-dev/topics/function-sql-runner'})
script_to_start = b' ' # launch the 1st script
script_to_start = b'060-cartes.sql'
main(
event={"data": base64.b64encode(script_to_start)},
context=context
)
Here is my code: runner.py
import logging
import os
from retry import retry
PROJECT_ID = os.getenv('GCLOUD_PROJECT') or 'my-dev'
def orchestrator(query_id, bigquery_client, firestore_client, publisher_client):
"""
if query_id empty, start the first sql script
else, call the given query_id.
Anyway, call the next script.
If the sql script is the last, no call
retrieve SQL queries from FireStore
run queries on BigQuery
"""
docs_refs = [
doc_ref.get() for doc_ref in
firestore_client.collection(u'sql_scripts').list_documents()
]
sorted_queries = sorted(docs_refs, key=lambda x: x.id)
if not bool(query_id.strip()) : # first execution
current_index = 0
else:
# find the query to run
query_ids = [ query_doc.id for query_doc in sorted_queries]
current_index = query_ids.index(query_id)
query_doc = sorted_queries[current_index]
bigquery_client.query(
query_doc.to_dict()['request'], # sql query
).result()
logging.info(
'Query %s executed',
query_doc.id
)
# exit if the current query is the last
if len(sorted_queries) == current_index + 1:
logging.info('All scripts were executed.')
return
next_query_id = sorted_queries[current_index+1].id.encode('utf-8')
publish(publisher_client, next_query_id)
@retry(tries=5)
def publish(publisher_client, next_query_id):
"""
send a message in pubsub to call the next query
this mechanism allow to run one sql script per Function instance
so as to not exceed the 9min deadline limit
"""
logging.info('Calling next query %s', next_query_id)
future = publisher_client.publish(
topic='projects/{}/topics/function-sql-runner'.format(PROJECT_ID),
data=next_query_id
)
    # ensure publish is successful
message_id = future.result()
logging.info('Published message_id = %s', message_id)
It looks like the PubSub message is not acked on success.
I do not think I have background activity in my code.
My question: why is my Function randomly retrying even on success?
Cloud Functions does not guarantee that your functions will run exactly once. According to the documentation, background functions, including pubsub functions, are given an at-least-once guarantee:
Background functions are invoked at least once. This is because of the
asynchronous nature of handling events, in which there is no caller
that waits for the response. The system might, in rare circumstances,
invoke a background function more than once in order to ensure
delivery of the event. If a background function invocation fails with
an error, it will not be invoked again unless retries on failure are
enabled for that function.
Your code will need to expect that it could possibly receive an event more than once. As such, your code should be idempotent:
To make sure that your function behaves correctly on retried execution
attempts, you should make it idempotent by implementing it so that an
event results in the desired results (and side effects) even if it is
delivered multiple times. In the case of HTTP functions, this also
means returning the desired value even if the caller retries calls to
the HTTP function endpoint. See Retrying Background Functions for more
information on how to make your function idempotent.
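As a concrete illustration of that advice, one common pattern is to record the PubSub event ID before doing any work and skip events you have already seen. This is only a hedged sketch, not the documented recipe; the processed_events collection name and the use of Firestore (which your function already has a client for) are assumptions:
import logging
from google.cloud import firestore

def main(event, context):
    firestore_client = firestore.Client()
    # one marker document per delivered event; context.event_id is stable across redeliveries
    marker = firestore_client.collection('processed_events').document(context.event_id)
    if marker.get().exists:
        logging.info('event_id %s already processed, skipping', context.event_id)
        return
    # ... run the orchestrator as in the question ...
    marker.set({'done': True})  # only record success after the work completed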
I am trying to get live data in Python 2.7.13 from Poloniex through the push API.
I read many posts (including How to connect to poloniex.com websocket api using a python library) and I arrived to the following code:
from autobahn.twisted.wamp import ApplicationSession
from autobahn.twisted.wamp import ApplicationRunner
from twisted.internet.defer import inlineCallbacks
import six
class PoloniexComponent(ApplicationSession):
def onConnect(self):
self.join(self.config.realm)
    @inlineCallbacks
def onJoin(self, details):
def onTicker(*args):
print("Ticker event received:", args)
try:
yield self.subscribe(onTicker, 'ticker')
except Exception as e:
print("Could not subscribe to topic:", e)
def main():
runner = ApplicationRunner(six.u("wss://api.poloniex.com"), six.u("realm1"))
runner.run(PoloniexComponent)
if __name__ == "__main__":
main()
Now, when I run the code, it looks like it's running successfully, but I don't know where I am getting the data. I have two questions:
I would really appreciate it if someone could walk me through the process of subscribing and getting ticker data (which I will then process in Python) from step 0: I am running the program in Spyder on Windows. Am I supposed to somehow activate Crossbar?
How do I quit the connection? I simply killed the process with Ctrl+C, and now when I try to run it again, I get the error: ReactorNonRestartable.
I ran into a lot of issues using Poloniex with Python 2.7 but finally came to a solution that hopefully helps you.
I found that Poloniex has pulled support for the original WAMP socket endpoint, so I would probably stray from this method altogether. Maybe this is the entirety of the answer you need, but if not, here is an alternate way to get ticker information.
The code that ended up working best for me is actually from the post you linked to above, but there was some info regarding currency pair IDs that I found elsewhere.
import websocket
import thread
import time
import json
def on_message(ws, message):
print(message)
def on_error(ws, error):
print(error)
def on_close(ws):
print("### closed ###")
def on_open(ws):
print("ONOPEN")
def run(*args):
# ws.send(json.dumps({'command':'subscribe','channel':1001}))
ws.send(json.dumps({'command':'subscribe','channel':1002}))
# ws.send(json.dumps({'command':'subscribe','channel':1003}))
# ws.send(json.dumps({'command':'subscribe','channel':'BTC_XMR'}))
while True:
time.sleep(1)
ws.close()
print("thread terminating...")
thread.start_new_thread(run, ())
if __name__ == "__main__":
websocket.enableTrace(True)
ws = websocket.WebSocketApp("wss://api2.poloniex.com/",
on_message = on_message,
on_error = on_error,
on_close = on_close)
ws.on_open = on_open
ws.run_forever()
I commented out the lines that pull data you don't seem to want, but for reference here is some more info from that previous post:
1001 = trollbox (you will get nothing but a heartbeat)
1002 = ticker
1003 = base coin 24h volume stats
1010 = heartbeat
'MARKET_PAIR' = market order books
Now you should get some data that looks something like this:
[121,"2759.99999999","2759.99999999","2758.00000000","0.02184376","12268375.01419869","4495.18724321",0,"2767.80020000","2680.10000000"]]
This is a bit annoying because the "121" at the beginning is the currency pair ID, which is undocumented and also unanswered in the other Stack Overflow question referred to here.
However, if you visit this url: https://poloniex.com/public?command=returnTicker it seems the id is shown as the first field, so you could create your own mapping of id->currency pair or parse the data by the ids you want from this.
Alternatively, something as simple as:
import urllib
import urllib2
import json
ret = urllib2.urlopen(urllib2.Request('https://poloniex.com/public?command=returnTicker'))
print json.loads(ret.read())
will return to you the data that you want, but you'll have to put it in a loop to get constantly updating information. Not sure of your needs once the data is received so I will leave the rest up to you.
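As a hedged sketch of that loop (staying with the Python 2 style of the snippet above; the five-second interval and the id-to-pair mapping are just illustrative assumptions):
import json
import time
import urllib2

while True:
    ret = urllib2.urlopen(urllib2.Request('https://poloniex.com/public?command=returnTicker'))
    ticker = json.loads(ret.read())
    # the 'id' field appears to match the first element of the websocket ticker rows,
    # so this doubles as an id -> currency pair mapping
    id_to_pair = {info['id']: pair for pair, info in ticker.items()}
    print 'Got %d pairs' % len(id_to_pair)
    time.sleep(5)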
Hope this helps!
I made, with the help of other posts, the following code to get the latest data using Python 3.x. I hope this helps you:
#TO SAVE THE HISTORICAL DATA (X MINUTES/HOURS) OF EVERY CRYPTOCURRENCY PAIR IN POLONIEX:
from poloniex import Poloniex
import pandas as pd
from time import time
import os
api = Poloniex(jsonNums=float)
#Obtains the pairs of cryptocurrencies traded in poloniex
pairs = [pair for pair in api.returnTicker()]
i = 0
while i < len(pairs):
#Available candle periods: 5min(300), 15min(900), 30min(1800), 2hr(7200), 4hr(14400), and 24hr(86400)
raw = api.returnChartData(pairs[i], period=86400, start=time()-api.YEAR*10)
df = pd.DataFrame(raw)
# adjust dates format and set dates as index
df['date'] = pd.to_datetime(df["date"], unit='s')
df.set_index('date', inplace=True)
# Saves the historical data of every pair in a csv file
path=r'C:\x\y\Desktop\z\folder_name'
df.to_csv(os.path.join(path,r'%s.csv' % pairs[i]))
i += 1
I'm sending Apple push notifications via AWS SNS from a Lambda function, using Boto3 and Python.
from __future__ import print_function
import boto3
def lambda_handler(event, context):
client = boto3.client('sns')
for record in event['Records']:
if record['eventName'] == 'INSERT':
rec = record['dynamodb']['NewImage']
competitors = rec['competitors']['L']
for competitor in competitors:
if competitor['M']['confirmed']['BOOL'] == False:
endpoints = competitor['M']['endpoints']['L']
for endpoint in endpoints:
print(endpoint['S'])
response = client.publish(
#TopicArn='string',
TargetArn = endpoint['S'],
Message = 'test message'
#Subject='string',
#MessageStructure='string',
)
Everything works fine! But when an endpoint is invalid for some reason (at the moment this happens every time I run a development build on my device, since I get a different endpoint then; it will be either not found or deactivated), the Lambda function fails and gets called all over again. In this particular case, if for example the second endpoint fails, it will send the push over and over again to endpoint 1, to infinity.
Is it possible to ignore invalid endpoints and just keep going with the function?
Thank you
Edit:
Thanks to your help I was able to solve it with:
try:
response = client.publish(
#TopicArn='string',
TargetArn = endpoint['S'],
Message = 'test message'
#Subject='string',
#MessageStructure='string',
)
except Exception as e:
print(e)
continue
On failure, AWS Lambda retries the function until the event expires from the stream.
In your case, since the exception on the second endpoint is not handled, the retry mechanism re-runs the whole invocation, including the publish to the first endpoint.
If you handle the exception and ensure the function ends successfully even when there is a failure, the retries will not happen.
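If you want to be more selective than a bare except, you could catch only the SNS errors that indicate a dead endpoint and re-raise everything else. This is a sketch, not the official pattern; the error codes checked here ('EndpointDisabled', 'InvalidParameter') are assumptions based on the failures you describe:
import boto3
from botocore.exceptions import ClientError

client = boto3.client('sns')

def publish_or_skip(target_arn, message):
    # publish to one endpoint; skip endpoints that are gone, let other errors trigger a retry
    try:
        response = client.publish(TargetArn=target_arn, Message=message)
        return response['MessageId']
    except ClientError as e:
        code = e.response['Error']['Code']
        if code in ('EndpointDisabled', 'InvalidParameter'):
            print('Skipping invalid endpoint {}: {}'.format(target_arn, code))
            return None
        raise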
I have a Python class with many methods:
Method1()
Method2()
...........
...........
MethodN()
All methods -- while performing different tasks -- have the same scheme:
do something
do something else
has anything gone wrong?
raise an exception
I want to be able to get an email whenever an exception is raised anywhere in the class.
Is there some easy way to build this logic into the class, rather than calling SendEmail() before every raise Exception statement? What is the right, Pythonic way to deal with such a case? Can a 'generalized' exception handler be the solution? I'd be glad for any ideas you may have.
Like @User said before, Python has logging.handlers.SMTPHandler to send logged error messages. Use the logging module! Overriding the exception class to send an email is a bad idea.
Quick example:
import logging
import logging.handlers
smtp_handler = logging.handlers.SMTPHandler(mailhost=("smtp.example.com", 25),
                                            fromaddr="from@example.com",
                                            toaddrs="to@example.com",
                                            subject=u"AppName error!")
logger = logging.getLogger()
logger.addHandler(smtp_handler)
try:
    1 / 0  # anything here that raises will be emailed by the handler
except Exception as e:
logger.exception('Unhandled Exception')
Note: Although this is a simple, obvious solution to the problem as stated, the below answer is probably better in most cases.
If the alternative is this:
if problem_test():
SendEmail()
raise Exception
Then why don't you just define a custom raise_email method?
def raise_email(self, e):
SendEmail()
raise e
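For instance, inside one of the methods this could look like the following (problem_test() is the same hypothetical check as above):
def MethodN(self):
    # do something
    # do something else
    if problem_test():
        self.raise_email(Exception('MethodN went wrong'))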
The Python stdlib has a dedicated class to do what you want. See logging.handlers.SMTPHandler.
Gist Link
The most important trick here is that if the secure parameter is not passed, its default value is None, which raises an exception when you try to authenticate with TLS/SSL-enabled SMTP servers like Gmail's, Yahoo's or Yandex's.
We pass an empty tuple so that the handler calls smtp.starttls() (see the excerpt below) and authenticates correctly over TLS.
...
if self.secure is not None:
smtp.ehlo()
smtp.starttls(*self.secure)
smtp.ehlo()
...
import logging
import logging.handlers
__author__ = 'Ahmed Şeref GÜNEYSU'
def foo():
raise Exception("Foo Bar")
def main():
logger = logging.getLogger()
    logger.addHandler(logging.handlers.SMTPHandler(
        mailhost=("smtp.mail.yahoo.com", 587),
        fromaddr="boss@example.com",
        toaddrs="me@example.com",
        subject="EXCEPTION",
        credentials=('smtpuser@example.com', 'MY SECRET PASSWORD'),
        secure=()))
try:
foo()
    except Exception as e:
logging.exception(e)
if __name__ == '__main__':
main()
Beware the wizard's apprentice!
It would be better to log those errors, then check to see when the last email was sent, and if the timespan is too short, do not send another message because the human being will already be looking at the log file. For many things, one message per day would be enough, but even for system critical things, if you have already had one failure, what could go wrong if you wait two hours to send the next email?
If you send one email per two hour timespan, then the maximum number of emails per day is 12. And if you get a cascading failure (you will!) then it will most likely happen within a couple of hours of the first failure event.
Most large networking companies offer an SLA of 4 hours to fix a failure, measured from the time it first occurs (because cascading failures tend to repeat) until the customer is satisfied that it is fixed. If you have a tighter SLA than that, then unless it is some finance-industry service, you are probably offering too high a service level.
But if you do have a 4 hour SLA, then I would make sure that any email sent within 2 - 4 hours of the last email, should use whatever bells and whistles you can to prioritise it, highlight it, etc. For instance use the X-Priority header and put the word URGENT in the subject so that your mail client can display it in large bold red letters.
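A hedged sketch of that throttling idea (the two-hour interval and the timestamp-file location are just the assumptions discussed above):
import os
import time

EMAIL_INTERVAL = 2 * 60 * 60          # two hours between alert emails
STAMP_FILE = '/tmp/last_error_email'  # hypothetical place to remember the last send

def should_send_email():
    # returns True at most once per EMAIL_INTERVAL; otherwise rely on the log file
    try:
        last_sent = os.path.getmtime(STAMP_FILE)
    except OSError:
        last_sent = 0
    if time.time() - last_sent < EMAIL_INTERVAL:
        return False
    with open(STAMP_FILE, 'w'):
        pass  # touch the file to record this send
    return True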
How about this:
class MyException(Exception):
def __init__(self):
SendEmail()
Something like this, perhaps?
def mailexception(ex):
# Be creative.
print 'Mailing... NOW!'
def pokemontrainer(cls):
class Rye(cls):
def __getattribute__(self, name):
def catcher(func):
def caller(*args, **kwargs):
try:
func(*args, **kwargs)
except Exception, e:
mailexception(e)
raise
return caller
ref = cls.__getattribute__(self, name)
if hasattr(cls, name) and hasattr(getattr(cls, name), '__call__'):
                return catcher(ref)
            return ref
return Rye
@pokemontrainer
class Exceptor(object):
def toss(self, e):
raise e('Incoming salad!')
ex = Exceptor()
ex.toss(ValueError)
By 2019, the easiest and best option seems to be Sentry.
You need just two lines of code:
import sentry_sdk
sentry_sdk.init("https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx@sentry.io/xxxxxxx")
and it will send you a detailed email with any raised error.
After this you can further inspect the error on the Sentry website, where there is impressive debug info: exception tracebacks, logs, data passed to functions, package versions, etc.
I'd just subclass Exception and send the e-mail in the custom Exception.
You can use an except hook to send an email when an exception is not caught.
See sys.excepthook.
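A minimal sketch of that approach, assuming you combine it with the SMTPHandler-based logger from the other answers (the handler setup itself is omitted here):
import sys
import logging

logger = logging.getLogger()  # attach an SMTPHandler to this logger as shown above

def emailing_excepthook(exc_type, exc_value, exc_traceback):
    # log (and therefore email) the uncaught exception, then defer to the default hook
    logger.error("Uncaught exception", exc_info=(exc_type, exc_value, exc_traceback))
    sys.__excepthook__(exc_type, exc_value, exc_traceback)

sys.excepthook = emailing_excepthook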
You can use the Python alerting library. It supports email (via Mailgun and SendGrid), Telegram and Slack notifications for sending alerts.
https://github.com/sinarezaei/alerting
Sample code:
from alerting import Alerting
from alerting.clients import AlertingMailGunClient, AlertingSlackClient, AlertingTelegramClient
alerts = Alerting(
clients=[
AlertingMailGunClient(your_mailgun_api_key, your_domain, from_email, target_email),
AlertingSlackClient(your_bot_user_oauth, target_channel),
AlertingTelegramClient(bot_token, chat_id)
]
)
try:
    pass  # something that can raise
except Exception as ex:
    alerts.send_alert(title='some bad error happened', message=str(ex))
I like the answers that use the logging module, but I use the SMTP library (smtplib) for this. To send an error message, I do something like the following in the exception branch of the try/except block:
import smtplib
s = smtplib.SMTP('smtp.gmail.com', 587)
s.starttls()
from_user = "foo@gmail.com"
to_user = "bar@gmail.com"
password = "xxxxxxxx"
s.login(from_user, password)
subject = "Uh oh"
text = "XYZ error message you blew it!"
message = f"Subject: {subject}\n\n{text}"
s.sendmail(from_user, to_user, message)
This works well, but it isn't the most secure option in the world. You actually have to tell Google you want to let less secure apps connect (you can change this setting here: https://myaccount.google.com/lesssecureapps).