Problem: I want to receive messages from Google Cloud Pub/Sub one at a time using the Python API. All the current examples use the asynchronous/callback option for pulling messages from a Pub/Sub subscription. The problem with that approach is that I need to keep the thread alive.
Is it possible to receive just one message and then close the connection, i.e. is there a parameter (something like max_messages) that I can set to 1 so that the thread terminates once it has received one message?
The documentation here doesn't list anything for synchronous pull in Python, whereas other languages such as Java seem to have a num_of_messages option.
See the following example in this link:
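from google.cloud import pubsub_v1
client = pubsub_v1.SubscriberClient()
subscription = client.subscription_path('[PROJECT]', '[SUBSCRIPTION]')
max_messages = 1
# Pass the arguments by keyword: google-cloud-pubsub 2.x no longer accepts them positionally.
response = client.pull(subscription=subscription, max_messages=max_messages)
print(response)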
I've tried it myself, and with this you get one message at a time.
If you get an error, try updating the Pub/Sub library to the latest version:
pip install --upgrade google-cloud-pubsub
The docs here have more information about the pull method used in the code snippet:
The Pull method relies on a request/response model:
The application sends a request for messages. The server replies with
zero or more messages and closes the connection.
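Note that a pulled message is redelivered later unless it is acknowledged. A minimal sketch of pulling one message and acknowledging it, assuming the request-dict call style of google-cloud-pubsub 2.x; pull_one and main are hypothetical names, and the client is passed in so the helper does not depend on how it was created:

```python
def pull_one(client, subscription_path, timeout=10.0):
    """Pull at most one message, acknowledge it, and return its data (or None)."""
    response = client.pull(
        request={"subscription": subscription_path, "max_messages": 1},
        timeout=timeout,
    )
    if not response.received_messages:
        return None  # nothing was available
    received = response.received_messages[0]
    # Acknowledge so that Pub/Sub does not redeliver the message.
    # If processing can fail, acknowledge only after processing succeeds.
    client.acknowledge(
        request={"subscription": subscription_path, "ack_ids": [received.ack_id]}
    )
    return received.message.data


def main():
    # Imported here so pull_one can be used with any client-like object.
    from google.cloud import pubsub_v1

    client = pubsub_v1.SubscriberClient()
    path = client.subscription_path("[PROJECT]", "[SUBSCRIPTION]")
    print(pull_one(client, path))
```

An empty response is treated as "no message". Depending on the timeout, the call may instead raise a deadline error, which this sketch does not catch.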
As per the official documentation here:
...you can achieve exactly once processing of Pub/Sub message streams,
as PubsubIO de-duplicates messages based on custom message identifiers
or identifiers assigned by Pub/Sub.
So you should be able to use record IDs, i.e. identifiers for your messages, to allow for exactly-once processing across the boundary between Dataflow and other systems. To use record IDs, you invoke idLabel when constructing PubsubIO.Read or PubsubIO.Write transforms, passing a string value of your choice. In Java this would be:
public PubsubIO.Read.Bound<T> idLabel(String idLabel)
This returns a transform that's like this one but that reads unique message IDs from the given message attribute.
Related
I am trying to use slack web client API to pull slack messages including threads for a specific date. The conversations.history API only returns the parent message in the case of threaded messages. There is conversations.replies API that returns the message threads, but it requires ts of the parent message to be passed in the request so it will only return conversations related to one thread.
Is there a way to pull all message history including replies for a date range, rather than having to combine a call to the conversations.history API with multiple calls to conversations.replies, one for each message with a thread_ts?
This approach of combining both APIs won't work if the reply was posted on the specific date we want to pull, but the root thread message was posted on an older date. The root message won't be returned in conversations.history and hence we won't be able to get that particular message in the thread using conversations.replies.
It's strange that Slack doesn't provide an API to pull all messages including threaded ones.
Unfortunately, there is no way to capture all threads in a workspace with a single API call. conversations.history is already a very data-heavy method. Most developers calling this method don't need thread information, and including it in the response would be overkill. Calling conversations.replies should return all the replies to that parent message regardless of the date they were posted, unless you restrict the range with the latest or oldest parameters.
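The combination of the two calls can be sketched as follows. This is only an outline, assuming a client object that exposes conversations_history and conversations_replies the way the slack_sdk WebClient does; fetch_messages_with_replies is a hypothetical helper name, and pagination of the replies themselves is left out:

```python
def fetch_messages_with_replies(client, channel, oldest, latest):
    """Collect messages posted between oldest and latest, expanding each thread."""
    result = []
    cursor = None
    while True:
        kwargs = {"channel": channel, "oldest": oldest, "latest": latest, "limit": 200}
        if cursor:
            kwargs["cursor"] = cursor
        page = client.conversations_history(**kwargs)
        for message in page["messages"]:
            if message.get("reply_count", 0) > 0:
                # conversations.replies returns the parent followed by its replies,
                # regardless of when the replies were posted.
                thread = client.conversations_replies(channel=channel, ts=message["ts"])
                result.extend(thread["messages"])
            else:
                result.append(message)
        # An empty or missing next_cursor means there are no more pages.
        cursor = (page.get("response_metadata") or {}).get("next_cursor")
        if not cursor:
            break
    return result
```

Threads whose parent message is older than oldest are still missed, as the question points out. Widening the history window and filtering the collected messages by ts afterwards is one possible workaround.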
Is there an elegant way to pull one message off the broker without having to:
subscribe
create an on_message()
receive the message
unsubscribe
I ask because we are using a json message which has multiple fields. When new data comes in I want to ONLY update that particular field in the json message but not remove the rest of the data. Since we have a TON of these json topics, we don't really want to keep all of them in program memory (also in case the program has to be relaunched). On top of that, this program could be running for months without supervision.
So ideally, I'd like to post the json message to an ID'd topic with the retain flag set to True. Then when new data comes in for that ID, I do a pull of the info on that topic, update that particular field in the json message and repost to the same topic.
I can post example code but I'm hoping there is a simple function that I am unaware of.
Thanks, in advance, for any suggestions.
The Paho Python client comes with a set of helper functions that handle this single-shot pattern for you.
Doc here
e.g. the following connects to a broker, subscribes to a topic and returns on receipt of the first message on that topic.
import paho.mqtt.subscribe as subscribe
msg = subscribe.simple("paho/test/simple", hostname="mqtt.eclipse.org")
print("%s %s" % (msg.topic, msg.payload))
And the matching publish call:
import paho.mqtt.publish as publish
publish.single("paho/test/single", "payload", hostname="mqtt.eclipse.org")
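Putting the two helpers together for the retained-JSON use case in the question might look like the sketch below. merge_field and update_retained are hypothetical names, and the broker hostname is whatever you pass in; only subscribe.simple and publish.single come from Paho:

```python
import json


def merge_field(payload, field, value):
    """Return the JSON document in payload with one field replaced, as a string."""
    document = json.loads(payload) if payload else {}
    document[field] = value
    return json.dumps(document)


def update_retained(topic, field, value, hostname):
    # Imported lazily so merge_field can be used without paho-mqtt installed.
    import paho.mqtt.publish as publish
    import paho.mqtt.subscribe as subscribe

    # Blocks until one message arrives. If the topic has no retained message,
    # this waits for the next publish on that topic.
    current = subscribe.simple(topic, hostname=hostname, retained=True)
    updated = merge_field(current.payload, field, value)
    # Republish with the retain flag so the merged document replaces the old one.
    publish.single(topic, updated, retain=True, hostname=hostname)
```

This read-modify-write cycle is not atomic: if two programs update the same topic at the same time, one update can overwrite the other.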
I don't think that is possible. You say "When new data comes in..." That's exactly why you need to subscribe and use the callback function. That's basically a "pull when something is actually there".
Just to get an idea of how it should work: you are sending that json message via MQTT, right? And you are re-sending it when it changes?
But you don't have to keep them all in the RAM. You could use a retained message in combination with a fixed topic (not ID'ed) and send the ID in the message.
If you use retained messages with ID'ed topics, that might fill the memory.
What does the ID stand for? A unique number? Something like a timestamp? A hash? The sender?
I think you can solve that problem by clearly separating your things, e.g. into data and message, where data is something you maintain in Python (e.g. in a database or in RAM) and message is something that you actually send / receive via MQTT.
Then you can add / send / update data depending on what is received in MQTT and you don't have to send / update the complete set.
A report is posted every 5 hrs on a Slack channel, from which we need to sort/filter some information and put it into a file.
So, is there any way to read the channel continuously or run some command every 5 minutes or so before that time, and capture the report for future processing?
Yes, that is possible. Here is the basic outline of a solution:
Create a Slack app based on a script (e.g. in Python) that has access to that channel's history (e.g. has the channels:history permission scope)
Use cron to call your script at the needed time
The script reads the channel's history (e.g. with channels.history for public channels), filters out what it needs and then stores the report as a file.
Another approach would be to continuously read every new message from the channel, parse for a trigger (e.g. a specific user that sends it or the name of the report) and then filter and save the report when it appears. If you can identify a reliable trigger this would in my experience be the more stable solution, since scheduled reports can be delayed.
For that approach use the Events API of Slack instead of cron and subscribe to receiving messages (e.g. the message event for public channels). Slack will then automatically send each new message to your script as soon as it is posted.
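The trigger check itself can be small. A sketch, assuming the event is the dict that Slack delivers for a message event (with type, channel, user or bot_id, and text fields); is_report_message and its parameters are hypothetical names:

```python
def is_report_message(event, channel_id, sender_id, marker):
    """Return True if a Slack message event looks like the scheduled report."""
    return (
        event.get("type") == "message"
        and event.get("channel") == channel_id
        # The report may be posted by a regular user or by a bot.
        and sender_id in (event.get("user"), event.get("bot_id"))
        and marker in event.get("text", "")
    )
```

Your event handler would call this for every incoming message and save the report only when it returns True.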
If you are new to creating Slack apps I would advise to study the excellent official documentation and tutorials on the Slack API site to get started.
A Python example of this approach can be found here: https://gist.github.com/demmer/617afb2575c445ba25afc432eb37583b
This script counts the number of messages per user.
Based on this code I created the following example for you:
# sc is the Slack client and channel_name the channel to read, both defined earlier
channels = sc.api_call("channels.list")
# get the correct channel id
channel_id = None
for channel in channels['channels']:
    if channel['name'] == channel_name:
        channel_id = channel['id']
        break
if channel_id is None:
    raise Exception("cannot find channel " + channel_name)
# get the history as follows:
history = sc.api_call("channels.history", channel=channel_id)
# get all the messages from the history:
messages = history['messages']
# or reference them by index, so in this case get the first message:
first_message = messages[0]
I'm trying to use paho.mqtt for python (project pages) and all works nice. The only problem I have is I would find it very useful to find out who had sent the message. I looked up the source code but could not quite get my head around if the client variable passed within on_message is the client I use to connect to or details of the client who published the message (I'm guessing it's the first option).
So the question is - is it possible to find out who (the user name) had sent the message?
The MQTT protocol was designed to be as lightweight as possible. This means that the message header contains the absolute bare minimum needed to deliver a message to a specific topic; there is no room in the header for anything else.
MQTT is also a Pub/Sub protocol. One of the key features of this type of protocol is to decouple the publisher from the subscriber as much as possible. This means that the publisher shouldn't care how many subscribers there are, and subscribers shouldn't care where the information comes from as long as it is on a topic they are interested in.
If you want any more information other than the message topic then you have to add it to the payload yourself.
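One way to do that is to wrap each message in a small JSON envelope. The sketch below is only an illustration: wrap_payload, unwrap_payload and the "sender"/"data" field names are made up for this example, and the sender is whatever the publisher claims to be, not something the broker verifies:

```python
import json


def wrap_payload(sender, data):
    """Build a JSON payload that carries the sender's name next to the data."""
    return json.dumps({"sender": sender, "data": data})


def unwrap_payload(payload):
    """Split a wrapped payload back into (sender, data)."""
    envelope = json.loads(payload)
    return envelope["sender"], envelope["data"]


def on_message(client, userdata, msg):
    # client is the local client that received the message, not the publisher.
    sender, data = unwrap_payload(msg.payload)
    print("%s sent %r on %s" % (sender, data, msg.topic))
```

The publisher would then send wrap_payload("some-user", value) as the payload instead of the bare value.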
I am looking into PubNub for my real-time data visualization with Rickshaw, but I do not understand whether the channels are already configured or whether we have to configure them. If we do, how can we configure a channel for a data visualization? Also, I am getting the data from the Python Ceilometer API; how can I push that data into PubNub?
Channels are an abstraction, similar to "chat rooms". Any message sent using PubNub will be sent over a channel. A message consists of a channel and its associated data payload. A publishing client publishes messages to a given channel, and a subscribing client receives only the messages associated with the channels it is subscribed to.
Channels are created on-the-fly, and do not incur any additional charges to use one or many in your application. When you create a PubNub application, all messages will be associated with a channel.
It has the advantage of minimal network usage (each client receives only the data it needs) and minimal processing (no need for filtering unneeded data).
In order to push data into PubNub (we call this publish), you first need to create a PubNub instance and put in your API keys. Get your keys here.
pubnub = Pubnub(publish_key='demo', subscribe_key='demo')
PubNub uses simple APIs to publish data as shown below:
def callback(message):
    print(message)
pubnub.publish('my_channel', 'Hello from PubNub Python SDK!', callback=callback, error=callback)
The first parameter is the channel you want to publish data to, the second is the message you want to send, and the last two are callback functions that are called when you publish.
You can find detailed information on the APIs and on how to get started for the Python SDK on the site.
+1 Bhavana said :-)
Also, you can take a look at these Rickshaw with PubNub examples:
https://github.com/pubnub/pubnub-rickshaw
If your goal is visualizing data with d3, and you don't really need to rely on Rickshaw, give EON a try:
https://github.com/pubnub/eon
With the EON lib, you don't subscribe to the data yourself; instead, you use eon.chart to plot the data from a PubNub stream directly onto a chart. Pretty neat.