AWS SQS MessageAttributes Purpose - Python

I'm just wondering what the purpose is of sending MessageAttributes along with a message when using SQS with Boto3. Is it to tell the receiver (say, a Python script reading from the queue) to automatically cast parts of the message to the relevant data types in the Python interpreter? For instance, if I send a datetime string and pass a MessageAttribute defining the data type (along with the format of the datetime string), would Boto3 automatically parse it and cast it to a datetime object? Or am I misunderstanding this?

There's nothing Python/Boto3 specific about Message Attributes. SQS Message Attributes are just a way to include metadata on SQS messages. I'm not aware of any AWS SDK in any programming language that performs data conversions automatically based on SQS Message Attributes.
From the linked page:
Amazon SQS lets you include structured metadata (such as timestamps, geospatial data, signatures, and identifiers) with messages using message attributes. Each message can have up to 10 attributes. Message attributes are optional and separate from the message body (however, they are sent alongside it). Your consumer can use message attributes to handle a message in a particular way without having to process the message body first.
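For illustration, a minimal sketch of sending and reading a message attribute with boto3 (the queue URL and attribute name are placeholders). Note that the attribute comes back as plain metadata; boto3 does not parse the body or cast anything to a datetime for you.
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

# Send a message with an attribute describing how the body should be interpreted.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody="2024-01-01T12:00:00Z",
    MessageAttributes={
        "BodyFormat": {"DataType": "String", "StringValue": "iso-8601-datetime"},
    },
)

# Attributes are only returned if explicitly requested.
response = sqs.receive_message(QueueUrl=queue_url, MessageAttributeNames=["All"])
for message in response.get("Messages", []):
    # Both come back as plain strings/dicts; any casting is up to your own code.
    print(message["Body"], message["MessageAttributes"])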

Related

Slack - Get messages & threads for date range via Slack WebClient

I am trying to use slack web client API to pull slack messages including threads for a specific date. The conversations.history API only returns the parent message in the case of threaded messages. There is conversations.replies API that returns the message threads, but it requires ts of the parent message to be passed in the request so it will only return conversations related to one thread.
Is there a way to pull all message history including replies for a date range, rather than having to combine a call to the conversations.history API with multiple calls to conversations.replies for each message that has a thread_ts?
This approach of combining both APIs won't work if the reply was posted on the specific date we want to pull, but the root thread message was posted on an older date. The root message won't be returned in conversations.history and hence we won't be able to get that particular message in the thread using conversations.replies.
It's strange that Slack doesn't provide such an API to pull all messages including threaded ones.
Unfortunately, there is no way to capture all threads in a workspace with a single API call. conversations.history is already a very data-heavy method; most developers calling it don't need thread information, and including it in the response would be overkill. Calling conversations.replies should return all the replies corresponding to that parent message regardless of the date it was posted, unless otherwise restricted using the latest or oldest parameters.
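For reference, a rough sketch of the combined approach discussed above, assuming the slack_sdk WebClient; the token, channel ID and timestamps are placeholders. It inherits the limitation described in the question: replies whose parent falls outside the range are not found.
from slack_sdk import WebClient

client = WebClient(token="xoxb-...")  # placeholder token
channel = "C0123456789"               # placeholder channel ID
oldest, latest = "1700000000.000000", "1700086400.000000"  # placeholder date range

messages = []
history = client.conversations_history(channel=channel, oldest=oldest, latest=latest)
for parent in history["messages"]:
    messages.append(parent)
    # A threaded parent carries a thread_ts equal to its own ts.
    if parent.get("thread_ts") == parent.get("ts"):
        replies = client.conversations_replies(channel=channel, ts=parent["ts"])
        messages.extend(replies["messages"][1:])  # skip the parent, already added

print(len(messages))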

Paho-mqtt subscribe one-time message

Is there an elegant way to pull one message off the broker without having to:
subscribe
create an on_message()
receive the message
unsubscribe
I ask because we are using a json message which has multiple fields. When new data comes in I want to ONLY update that particular field in the json message but not remove the rest of the data. Since we have a TON of these json topics, we don't really want to keep all of them in program memory (also in case the program has to be relaunched). On top of that, this program could be running for months without supervision.
So ideally, I'd like to post the json message to an ID'd topic with the retain flag set to True. Then when new data comes in for that ID, I do a pull of the info on that topic, update that particular field in the json message and repost to the same topic.
I can post example code but I'm hoping there is a simple function that I am unaware of.
Thanks, in advance, for any suggestions.
The Paho Python client comes with a set of helper modules that implement this single-shot pattern for you.
Doc here
e.g. the following connects to a broker, subscribes to a topic and returns on receipt of the first message on that topic.
import paho.mqtt.subscribe as subscribe
msg = subscribe.simple("paho/test/simple", hostname="mqtt.eclipse.org")
print("%s %s" % (msg.topic, msg.payload))
And the matching publish call:
import paho.mqtt.publish as publish
publish.single("paho/test/single", "payload", hostname="mqtt.eclipse.org")
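Putting the two helpers together gives roughly the read-modify-write pattern described in the question. This is only a sketch; the broker host, topic and JSON field are placeholders.
import json

import paho.mqtt.publish as publish
import paho.mqtt.subscribe as subscribe

BROKER = "mqtt.eclipse.org"          # placeholder broker
TOPIC = "devices/device-42/state"    # placeholder ID'd topic

def update_field(field, value):
    # Pull the current retained message for this topic (blocks until one arrives).
    msg = subscribe.simple(TOPIC, hostname=BROKER)
    state = json.loads(msg.payload)

    # Change only the one field and republish with retain=True so the broker
    # keeps the latest full document for the next update.
    state[field] = value
    publish.single(TOPIC, json.dumps(state), retain=True, hostname=BROKER)

update_field("temperature", 21.5)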
I don't think that is possible. You say "When new data comes in..." That's exactly why you need to subscribe and use the callback function. That's basically a "pull when something is actually there".
Just to get an idea of how it should work: you are sending that json message via MQTT, right? And you are re-sending it when it changes?
But you don't have to keep them all in the RAM. You could use a retained message in combination with a fixed topic (not ID'ed) and send the ID in the message.
If you use retained messages with ID'ed topics, that might fill the memory.
What does the ID stand for? A unique number? Something like a timestamp? A hash? The sender?
I think you can solve that problem by clearly separating your concerns, say into data and message, where data is something you maintain in Python (e.g. a database or something in RAM) and message is something that you actually send / receive via MQTT.
Then you can add / send / update data depending on what is received in MQTT and you don't have to send / update the complete set.

How to get all required Avro Schemas given a set of topics from the Confluent Schema Registry

We're using Kafka, Avro and the Avro Schema Registry. Given a set of topics I want to consume, is there a way to get all schema IDs needed to decode the messages I'll receive?
I've checked the implementation of Confluent's Python client and what it seems to be doing is to receive messages, get the Avro schema ID from the individual message and then look up the schema from the Avro Schema Registry on the fly.
I'm looking for a way to get all schemas required before execution of the program (i.e. manually).
Yes, you can get the schemas for any topic's data.
The REST API is:
GET /subjects/(string: subject)/versions
Get a list of versions registered under the specified subject.
A subject is the topic name suffixed with either "-key" or "-value", depending on whether you are registering the key schema or the value schema for that topic.
Once you have the versions for a subject, you can get the schema for each version using:
GET /subjects/(string: subject)/versions/(versionId: version)/schema
Reference
https://docs.confluent.io/current/schema-registry/docs/api.html
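As a sketch, a small script (using the requests library) that walks those endpoints for each topic before the consumer starts; the registry URL and topic names are placeholders. Requesting the version object without the trailing /schema also returns the global schema ID, which is handy for keying a local cache.
import requests

REGISTRY = "http://localhost:8081"   # placeholder registry URL
topics = ["orders", "payments"]      # placeholder topic names

schemas = {}
for topic in topics:
    subject = f"{topic}-value"       # use f"{topic}-key" for key schemas
    versions = requests.get(f"{REGISTRY}/subjects/{subject}/versions").json()
    for version in versions:
        info = requests.get(f"{REGISTRY}/subjects/{subject}/versions/{version}").json()
        # info contains "id" (the global schema ID) and "schema" (the Avro definition).
        schemas[info["id"]] = info["schema"]

print(schemas)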
You can get the schema definitions available in your Schema Registry by making API calls like:
curl http://localhost:8081/schemas/ids/3
where the last number in the URL is the schema ID that you are interested in. If you have multiple types of messages in the broker, you can change that ID to get the schema definitions for the different message types.
For detail on the API calls, please refer to: https://docs.confluent.io/3.3.0/schema-registry/docs/api.html#schemas
That link is for version 3.3 of the Confluent Platform; replace 3.3 with current in the URL to get the documentation for the current release.

Google Cloud Pub/Sub Python SDK retrieve single message at a time

Problem: My use case is that I want to receive messages from Google Cloud Pub/Sub, one message at a time, using the Python API. All the current examples mention using the async/callback option for pulling messages from a Pub/Sub subscription. The problem with that approach is that I need to keep the thread alive.
Is it possible to just receive 1 message and close the connection i.e. is there a feature where I can just set a parameter (something like a max_messages) to 1 so that once it receives 1 message the thread terminates?
The documentation here doesn't list anything for Python synchronous pull, which seems to have a num_of_messages option in other languages like Java.
See the following example in this link:
from google.cloud import pubsub_v1

client = pubsub_v1.SubscriberClient()
subscription = client.subscription_path('[PROJECT]', '[SUBSCRIPTION]')

# Synchronous pull: ask for at most one message.
response = client.pull(subscription=subscription, max_messages=1)

for received in response.received_messages:
    print(received.message.data)
    # Acknowledge the message so it is not redelivered.
    client.acknowledge(subscription=subscription, ack_ids=[received.ack_id])
I've tried it myself, and using that you get one message at a time.
If you get an error, try updating the Pub/Sub library to the latest version:
pip install --upgrade google-cloud-pubsub
The docs here have more info about the pull method used in the code snippet:
The Pull method relies on a request/response model: the application sends a request for messages, and the server replies with zero or more messages and closes the connection.
As per the official documentation here:
...you can achieve exactly once processing of Pub/Sub message streams,
as PubsubIO de-duplicates messages based on custom message identifiers
or identifiers assigned by Pub/Sub.
So you should be able to use record IDs, i.e. identifiers for your messages, to allow for exactly-once processing across the boundary between Dataflow and other systems. To use record IDs, you invoke idLabel when constructing PubsubIO.Read or PubsubIO.Write transforms, passing a string value of your choice. In Java this would be:
public PubsubIO.Read.Bound<T> idLabel(String idLabel)
This returns a transform that's like this one but that reads unique message IDs from the given message attribute.

SQS: How can I read the sent time of an SQS message using Python's boto library

I can see messages have a sent time when I view them in the SQS message view in the AWS console. How can I read this data using Python's boto library?
When you read a message from a queue in boto, you get a Message object. This object has an attribute called attributes. It is a dictionary of attributes that SQS keeps about this message. It includes SentTimestamp.
You can use the attributes parameter of the get_messages() method. See the documentation.
queue.get_messages(attributes=['All'])
The documentation also says that you can do this with the read() method but this is broken right now. I opened an issue for this on the project site: https://github.com/boto/boto/issues/2699.
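As a rough sketch with the legacy boto (boto2) API the question asks about, assuming a queue named "my-queue" in us-east-1; SentTimestamp comes back as epoch milliseconds, so it needs converting if you want a datetime:
from datetime import datetime, timezone

import boto.sqs

conn = boto.sqs.connect_to_region("us-east-1")   # placeholder region
queue = conn.get_queue("my-queue")               # placeholder queue name

for message in queue.get_messages(attributes=["SentTimestamp"]):
    # SentTimestamp is a string of milliseconds since the epoch.
    sent_ms = int(message.attributes["SentTimestamp"])
    print(datetime.fromtimestamp(sent_ms / 1000, tz=timezone.utc))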
