Retrieve messages from a topic on a Message Hub - Python

I am trying to get messages from a topic on a Message Hub on Bluemix using Confluent Kafka Python. My code is below, but something is not working. The topic and the Message Hub are up and running, so the problem is probably in the code.
import sys

from confluent_kafka import Producer, KafkaError, Consumer

consumer_settings = {
    'bootstrap.servers': 'broker-url-here',
    'group.id': 'mygroup',
    'default.topic.config': {'auto.offset.reset': 'smallest'},
    'sasl.mechanisms': 'PLAIN',
    'security.protocol': 'ssl',
    'sasl.username': 'username-here',
    'sasl.password': 'password-here',
}

c = Consumer(**consumer_settings)
c.subscribe(['topic-here'])

running = True
while running:
    msg = c.poll()
    if msg.error():
        print("Error while retrieving message")
        c.close()
        sys.exit(10)
    elif (msg is not None):
        for x in msg:
            print(x)
    else:
        sys.exit(10)
When I run the code, it seems to get stuck at msg = c.poll(), so I guess it is either failing to connect or failing to retrieve messages. The credentials themselves are correct.

The consume logic looks fine, but the configuration for the consumer is incorrect:
security.protocol needs to be set to sasl_ssl
ssl.ca.location needs to point to a PEM file containing trusted certificates. The location of that file varies for each OS, but for the most common ones it is:
Bluemix/Ubuntu: /etc/ssl/certs
Red Hat: /etc/pki/tls/cert.pem
macOS: /etc/ssl/cert.pem
We also have a sample app using this client that can easily be started or deployed to Bluemix: https://github.com/ibm-messaging/message-hub-samples/tree/master/kafka-python-console-sample
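For reference, a minimal sketch of the consumer with those two settings applied might look like the following; the broker URL, credentials, topic name, and the CA path are placeholders to replace with your own values (pick the CA path that matches your OS from the list above):

from confluent_kafka import Consumer

consumer_settings = {
    'bootstrap.servers': 'broker-url-here',
    'group.id': 'mygroup',
    'default.topic.config': {'auto.offset.reset': 'smallest'},
    'sasl.mechanisms': 'PLAIN',
    'security.protocol': 'sasl_ssl',      # was 'ssl'
    'ssl.ca.location': '/etc/ssl/certs',  # adjust for your OS
    'sasl.username': 'username-here',
    'sasl.password': 'password-here',
}

c = Consumer(consumer_settings)
c.subscribe(['topic-here'])

try:
    while True:
        msg = c.poll(1.0)   # wait up to one second for a message
        if msg is None:
            continue        # no message within the timeout, keep polling
        if msg.error():
            print("Error while retrieving message: {}".format(msg.error()))
            break
        print(msg.value())
finally:
    c.close()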

Related

Kafka consumer implementation not working in Python

This is my first time working with Kafka and I've run into a problem.
I have the following implementation of my consumer:
import json

from kafka import KafkaConsumer

import config


class KafkaMessageConsumer:
    def __init__(self):
        self.consumer = KafkaConsumer(
            bootstrap_servers=config.KAFKA_BOOTSTRAP_SERVER,
            security_protocol=config.KAFKA_SECURITY_PROTOCOL,
            sasl_mechanism=config.KAFKA_SASL_MECHANISM,
            sasl_plain_username=config.KAFKA_USERNAME,
            sasl_plain_password=config.KAFKA_PASSWORD,
            value_deserializer=lambda x: json.loads(x.decode("utf-8")),
        )

    def receive_messages(self, topic):
        self.consumer.subscribe(topics=[topic])
        print(f"Subscribed to topics: {self.consumer.subscription()}")
        for msg in self.consumer:
            yield msg.value


if __name__ == "__main__":
    consumer = KafkaMessageConsumer()
    for message in consumer.receive_messages(config.KAFKA_TOPIC):
        print("Received message:", message)
The credentials should be set correctly. I get the message about subscribing to the topic without error, but no messages are yielded, even though I know for sure that there are messages to be consumed on the topic. Am I missing some necessary config here?
I'm no expert in Python, but it looks like you haven't consumed any messages. You have subscribed to the topic, but you would need to poll() for messages: https://kafka-python.readthedocs.io/en/master/apidoc/KafkaConsumer.html#kafka.KafkaConsumer.poll
Also, where have you set the topic name? [topic]
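For what it's worth, a minimal sketch of explicit polling with kafka-python could look like the following; the connection settings and topic are placeholders standing in for the values from the config module. Note also that kafka-python defaults auto_offset_reset to "latest", so messages produced before the consumer joined are skipped unless you set it to "earliest":

import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    bootstrap_servers="broker-url-here",
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username="username-here",
    sasl_plain_password="password-here",
    auto_offset_reset="earliest",  # also read messages produced before this consumer started
    value_deserializer=lambda x: json.loads(x.decode("utf-8")),
)
consumer.subscribe(topics=["topic-here"])

while True:
    # poll() returns a dict mapping TopicPartition -> list of ConsumerRecord
    records = consumer.poll(timeout_ms=1000)
    for tp, messages in records.items():
        for msg in messages:
            print("Received message:", msg.value)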

SendGrid Random Duplicate Emails (Python Flask Heroku)

I'm currently working on an app that updates artists with the times they are booked for events. Randomly, the app will send duplicate emails a few times a day; over 90% of the time there are no duplicates.
Currently, there are 10+ emails that can be produced, but there is only one email send function. The duplicates occur across all of the emails, which makes me think there is an issue with the email send function or with the web server configuration making multiple requests to SendGrid. PLEASE HELP ME FIND THE CAUSE OF THE DUPLICATES!
Stack:
Python (v3.10.6)
Flask (v2.2.2)
Heroku (buildstack-22)
Sendgrid python library (v6.9.7)
Email Send Function:
def send_email(to, subject, template, cc='None', attachment_location='None', attachment_name='None', private_email=False, **kwargs):
    ## Remove Guest DJ Emails Here
    guest_dj_list = get_list_of_guest_djs()
    if to in guest_dj_list:
        return None

    message = Mail(
        from_email=From('test#test.com', current_app.config['MAIL_ALIAS']),
        to_emails=[to],
        subject=subject,
        html_content=render_template(template, **kwargs))

    ## cc management
    if private_email == False:
        cc_list = ['test#test.com']
        if cc != 'None':
            cc_list.append(cc)
        message.cc = cc_list

    if attachment_location != 'None':
        with open(attachment_location, 'rb') as f:
            data = f.read()
            f.close()
        encoded_file = base64.b64encode(data).decode()
        attachedFile = Attachment(
            FileContent(encoded_file),
            FileName(attachment_name),
            FileType('application/xlsx'),
            Disposition('attachment')
        )
        message.attachment = attachedFile

    try:
        sg = SendGridAPIClient(os.environ.get('SENDGRID_API_KEY'))
        response = sg.send(message)
        # print(response.status_code)
        # print(response.body)
        # print(response.headers)
        print(f'Complete: Email Sent to {to}')
    except Exception as e:
        print(e.message)
Heroku Procfile
web: gunicorn test_app.app:app --preload --log-level=debug
According to your comments, the emails are triggered by user actions. From the code you have shared there is nothing that would cause an email to send twice, so my guess is that users are causing the occasional double sending by submitting forms more than once.
To discover the root of this, I would first attempt to disable your forms after the first submit. You can do this with a bit of JavaScript, something like:
document.querySelectorAll('form').forEach(form => {
  form.addEventListener('submit', (e) => {
    // Prevent if already submitting
    if (form.classList.contains('is-submitting')) {
      e.preventDefault();
    }
    // Add class to hook our visual indicator on
    form.classList.add('is-submitting');
  });
});
This comes from this article which has a good discussion of the issue too.
Once you have done that, you should likely also log the various actions that cause an email to send and try to chase back through your system to find what could be causing double submissions. I would pay attention to other things like callbacks or background jobs that may be the culprit here too.
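As a rough sketch of that logging idea (the logger name and the request attributes used here are assumptions, not part of the original app), a helper like the following, called at the top of send_email, would let you match duplicate emails back to the requests that triggered them:

import logging
from flask import has_request_context, request

logger = logging.getLogger("email_audit")

def log_email_attempt(to, subject, template):
    # Record enough context to trace a duplicate email back to the request that caused it
    path = request.path if has_request_context() else "n/a"
    remote_addr = request.remote_addr if has_request_context() else "n/a"
    logger.info(
        "send_email called: to=%s subject=%s template=%s path=%s remote_addr=%s",
        to, subject, template, path, remote_addr,
    )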

How to open a secure channel in python gRPC client without a client SSL certificate

I have a gRPC server (in Go) that has a valid TLS certificate and does not require client-side TLS. For some reason I cannot implement the client without mTLS in Python, even though I can do so in Go.
In Python I have
os.environ["GRPC_VERBOSITY"] = "DEBUG"
# os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = "/etc/ssl/certs/ca-bundle.crt"
channel = grpc.secure_channel(ADDR, grpc.ssl_channel_credentials())
grpc.channel_ready_future(channel).result(timeout=10)
This gives me the following error
D0513 08:02:08.147319164 21092 security_handshaker.cc:181] Security handshake failed: {"created":"#1652446928.147311309","description":"Handshake failed","file":"src/core/lib/security/transport/security_handshaker.cc","file_line":377,"tsi_code":10,"tsi_error":"TSI_PROTOCOL_FAILURE"}
I can get this to work if I use SSL certificates by uncommenting the commented-out line. I know for a fact that my server does not request, require, or verify client certificates, as the following Go code works perfectly:
conn, err := grpc.DialContext(
    ctx,
    gRPCAddr,
    grpc.WithTransportCredentials(credentials.NewClientTLSFromCert(nil, "")),
)

dummyClient := dummy.NewDummyServiceClient(conn)
if _, err := dummyClient.Ping(context.Background(), &dummy.PingRequest{
    Ping: "go client ping",
}); err != nil {
    return fmt.Errorf("failed to ping: %w", err)
}
https://grpc.github.io/grpc/python/_modules/grpc.html#secure_channel has the docs for channel = grpc.secure_channel(ORBIUM_ADDR, grpc.ssl_channel_credentials()). This function relies on the Channel class; see the docs at https://grpc.github.io/grpc/python/_modules/grpc/aio/_channel.html.
Basically, the Channel class wraps C code to provide a secure channel, and that wrapped C code expects the certificate. If you can implement this in C, it might be easiest to just change the C code.
If the certificate on the server-side is publicly signed, you can use:
grpc.secure_channel(ORBIUM_ADDR, grpc.ssl_channel_credentials())
But that doesn't seem to work for you, so I guess the server certificate is signed by a root cert owned by you. You can pass the root cert into the root_certificates field [1] and leave the other two fields empty. This use case is documented in our Authentication guide [2]:
with open(os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"], 'rb') as f:
    creds = grpc.ssl_channel_credentials(f.read())
channel = grpc.secure_channel(ORBIUM_ADDR, creds)
[1] https://grpc.github.io/grpc/python/grpc.html#grpc.ssl_channel_credentials
[2] https://grpc.io/docs/guides/auth/
The answer given by #former_Epsilon answered my question; however, the solution I came up with was different and I ended up using a secure_channel, so I wanted to post an answer for that as well.
import os
import platform

import grpc

# configure this dict for your systems
system_certs_map = {
    "Windows": "<Path to system cert>",
    "Darwin": "$REQUESTS_CA_BUNDLE",
    "Linux": "/etc/ssl/certs/ca-bundle.crt",
}

os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = system_certs_map[platform.system()]
channel_credentials = grpc.ssl_channel_credentials()
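With that in place, the channel itself can be created the same way as before. This is a sketch of how the pieces fit together; ORBIUM_ADDR is the server address placeholder used earlier in the question:

# With GRPC_DEFAULT_SSL_ROOTS_FILE_PATH pointing at the system CA bundle,
# ssl_channel_credentials() picks up those roots, so no client certificate is needed.
channel = grpc.secure_channel(ORBIUM_ADDR, channel_credentials)
grpc.channel_ready_future(channel).result(timeout=10)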
My guess, based on the Python gRPC docs (https://grpc.github.io/grpc/python/grpc.html), is to use:
channel = grpc.insecure_channel(ORBIUM_ADDR)
instead of:
channel = grpc.secure_channel(ORBIUM_ADDR, grpc.ssl_channel_credentials())

Python grpc - reading in all messages before sending responses

I'm trying to understand whether a gRPC server using streams is able to wait for all client messages to be read before sending its responses.
I have a trivial application where I send in several numbers I'd like to add and return.
I've set up a basic proto file to test this:
syntax = "proto3";
message CalculateRequest{
int64 x = 1;
int64 y = 2;
};
message CalculateReply{
int64 result = 1;
}
service Svc {
rpc CalculateStream (stream CalculateRequest) returns (stream CalculateReply);
}
On the server side I have implemented the following code, which returns the answer message as each request is received:
import logging
from concurrent import futures

import grpc

import contracts_pb2
import contracts_pb2_grpc


class CalculatorServicer(contracts_pb2_grpc.SvcServicer):
    def CalculateStream(self, request_iterator, context):
        for request in request_iterator:
            resultToOutput = request.x + request.y
            yield contracts_pb2.CalculateReply(result=resultToOutput)


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    contracts_pb2_grpc.add_SvcServicer_to_server(
        CalculatorServicer(), server)
    server.add_insecure_port('localhost:9000')
    server.start()
    server.wait_for_termination()


if __name__ == '__main__':
    print("We're up")
    logging.basicConfig()
    serve()
I'd like to tweak this to first read in all the numbers and then send these out at a later stage - something like the following:
class CalculatorServicer(contracts_pb2_grpc.SvcServicer):
    listToReturn = []

    def CalculateStream(self, request_iterator, context):
        for request in request_iterator:
            listToReturn.append(request.x + request.y)
        # ...
        # do some other stuff first before returning
        for item in listToReturn:
            yield contracts_pb2.CalculateReply(result=resultToOutput)
Currently, my implementation to write out the responses later doesn't work, as the code at the bottom is never reached. Is it by design that the connection seems to "close" before reaching that point?
The grpc.io website suggests that this should be possible with BiDirectional streaming:
for example, the server could wait to receive all the client messages before writing its responses, or it could alternately read a message then write a message, or some other combination of reads and writes.
Thanks in advance for any help :)
The issue here is the definition of "all client messages." At the transport level, the server has no way of knowing whether the client has finished independent of the client closing its connection.
You need to add some indication of the client's having finished sending requests to the protocol. Either add a bool field to the existing CalculateRequest, or add a top-level oneof with one of the options being something like a StopSendingRequests message.
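A sketch of the first option, assuming the .proto gains a field like "bool done = 3;" on CalculateRequest (the field name and number are illustrative) so the client can signal that it has finished sending:

class CalculatorServicer(contracts_pb2_grpc.SvcServicer):
    def CalculateStream(self, request_iterator, context):
        results = []
        for request in request_iterator:
            if request.done:
                break  # client signalled it has finished sending requests
            results.append(request.x + request.y)
        # ... do some other stuff first before returning
        for item in results:
            yield contracts_pb2.CalculateReply(result=item)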

AWS Lambda/SNS Publish ignore invalid endpoints

I'm sending Apple push notifications via AWS SNS from Lambda with Boto3 and Python.
from __future__ import print_function

import boto3


def lambda_handler(event, context):
    client = boto3.client('sns')
    for record in event['Records']:
        if record['eventName'] == 'INSERT':
            rec = record['dynamodb']['NewImage']
            competitors = rec['competitors']['L']
            for competitor in competitors:
                if competitor['M']['confirmed']['BOOL'] == False:
                    endpoints = competitor['M']['endpoints']['L']
                    for endpoint in endpoints:
                        print(endpoint['S'])
                        response = client.publish(
                            #TopicArn='string',
                            TargetArn = endpoint['S'],
                            Message = 'test message'
                            #Subject='string',
                            #MessageStructure='string',
                        )
Everything works fine! But when an endpoint is invalid for some reason (at the moment this happens every time I run a development build on my device, since I get a different endpoint then; it will be either not found or deactivated), the Lambda function fails and gets called all over again. In this particular case, if for example the second endpoint fails, it will send the push to endpoint 1 over and over again, to infinity.
Is it possible to ignore invalid endpoints and just keep going with the function?
Thank you
Edit:
Thanks to your help I was able to solve it with:
try:
    response = client.publish(
        #TopicArn='string',
        TargetArn = endpoint['S'],
        Message = 'test message'
        #Subject='string',
        #MessageStructure='string',
    )
except Exception as e:
    print(e)
    continue
On failure, AWS Lambda retries the function until the event expires from the stream.
In your case, since the exception on the second endpoint is not handled, the retry mechanism causes the publish to the first endpoint to be executed again and again.
If you handle the exception and ensure the function ends successfully even when there is a failure, the retries will not happen.
