Sending JSON events to Azure Event Hub using Python program

I am new to Azure Stream Analytics. I am sending events to Azure Event Hub with the Python program below, serializing the response data to JSON so that the Stream Analytics job input can read it. However, something is either wrong or missing, because I cannot view the test data in the Stream Analytics job.
import asyncio
import time
import requests
import json
from azure.eventhub.aio import EventHubProducerClient
from azure.eventhub import EventData
connection_str = 'XXX'
consumer_group = 'XXX'
eventhub_name = 'XXX'
async def run():
    # create a producer client to send messages to the event hub
    # specify connection string to your event hubs namespace and
    # the event hub name
    producer = EventHubProducerClient.from_connection_string(connection_str, consumer_group=consumer_group, eventhub_name=eventhub_name)
    async with producer:
        # create a batch
        event_data_batch = await producer.create_batch(partition_id='1')
        while True:
            # add events to the batch
            response = requests.get("https://jsonplaceholder.typicode.com/todos/1")
            data = json.loads(response.text)
            event_data_batch.add(EventData(data))
            time.sleep(5)
        # send the batch of events to the event hub
        await producer.send_batch(event_data_batch)
loop = asyncio.get_event_loop()
loop.run_until_complete(run())
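Two things in the program above are worth checking, offered here as assumptions rather than a confirmed diagnosis: `EventData` is handed a parsed dict, whereas its documented body types are `str` or `bytes`, and `consumer_group` is a consumer-side setting that a producer does not need. A minimal sketch of one way to send each record as a JSON string follows. The helper names `to_event_body` and `send_records`, and the environment variable names `EVENTHUB_CONNECTION_STR` and `EVENTHUB_NAME`, are hypothetical and not part of the original post.

```python
import asyncio
import json
import os


def to_event_body(record: dict) -> str:
    """Serialize one record to a JSON string for use as an event body."""
    return json.dumps(record)


async def send_records(connection_str: str, eventhub_name: str, records: list) -> int:
    """Send each record as one JSON event; return the number of records sent."""
    # Imported here so that to_event_body() can be tried without the SDK installed.
    from azure.eventhub import EventData
    from azure.eventhub.aio import EventHubProducerClient

    producer = EventHubProducerClient.from_connection_string(
        connection_str, eventhub_name=eventhub_name
    )
    async with producer:
        # A fresh batch per send; add() raises ValueError if the batch is full.
        batch = await producer.create_batch()
        for record in records:
            batch.add(EventData(to_event_body(record)))
        await producer.send_batch(batch)
    return len(records)


if __name__ == "__main__":
    sample = [{"userId": 1, "id": 1, "title": "example", "completed": False}]
    print(to_event_body(sample[0]))
    # Hypothetical environment variable names; the send is skipped if they are unset.
    conn = os.environ.get("EVENTHUB_CONNECTION_STR")
    name = os.environ.get("EVENTHUB_NAME")
    if conn and name:
        asyncio.run(send_records(conn, name, sample))
```

This sketch assumes the Stream Analytics input is configured for JSON serialization. To keep polling an API as in the question, `send_records` could be called once per poll with `await asyncio.sleep(5)` between calls, so that every iteration actually sends its batch.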

Related

python JSON format invalid

I'm trying to get date and time flowing into Azure IoT hub to enable me to analyze using Azure DX as time series. I can get the temperature and humidity (humidity at the moment is just a random number). If I use this code, all works well and the JSON is well formatted and flows into IoT hub and onto Azure DX:
The basis for the code is taken from the Microsoft examples here - https://github.com/Azure-Samples/azure-iot-samples-python/blob/master/iot-hub/Quickstarts/simulated-device/SimulatedDeviceSync.py
import asyncio
import random
from azure.iot.device import Message
from azure.iot.device.aio import IoTHubDeviceClient
import time
from datetime import datetime
from w1thermsensor import W1ThermSensor
sensor = W1ThermSensor()
import json
CONNECTION_STRING = "xxxxx"
HUMIDITY = 60
MSG_TXT = '{{"temperature": {temperature},"humidity": {humidity}}}'
async def main():
    try:
        # Create instance of the device client
        client = IoTHubDeviceClient.create_from_connection_string(CONNECTION_STRING)
        print("Simulated device started. Press Ctrl-C to exit")
        while True:
            humidity = round(HUMIDITY + (random.random() * 20), 2)
            temperature = sensor.get_temperature()
            msg_txt_formatted = MSG_TXT.format(temperature=temperature, humidity=humidity)
            message = Message(msg_txt_formatted)
            # Send a message to the IoT hub
            print(f"Sending message: {message}")
            await client.send_message(message)
            await asyncio.sleep(1)
    except KeyboardInterrupt:
        print("Simulated device stopped")
if __name__ == '__main__':
    asyncio.run(main())
The JSON format is valid and works well -
{ "temperature": 7, "humidity": 66.09 }
If I try to add a date/time field like this:
import asyncio
import random
from azure.iot.device import Message
from azure.iot.device.aio import IoTHubDeviceClient
import time
from datetime import datetime
from w1thermsensor import W1ThermSensor
sensor = W1ThermSensor()
import json
CONNECTION_STRING = "xxxxx"
HUMIDITY = 60
x = datetime.now()
timesent = str(x)
MSG_TXT = '{{"temperature": {temperature},"humidity": {humidity},"timesent": {timesent}}}'
async def main():
    try:
        # Create instance of the device client
        client = IoTHubDeviceClient.create_from_connection_string(CONNECTION_STRING)
        print("Simulated device started. Press Ctrl-C to exit")
        while True:
            humidity = round(HUMIDITY + (random.random() * 20), 2)
            temperature = sensor.get_temperature()
            msg_txt_formatted = MSG_TXT.format(temperature=temperature, humidity=humidity, timesent=timesent)
            message = Message(msg_txt_formatted)
            # Send a message to the IoT hub
            print(f"Sending message: {message}")
            await client.send_message(message)
            await asyncio.sleep(1)
    except KeyboardInterrupt:
        print("Simulated device stopped")
if __name__ == '__main__':
    asyncio.run(main())
The output from the JSON is no longer valid and Azure DX will not map. The invalid JSON I get is:
"{\"temperature\": 7,\"humidity\": 72.88, \"timesent\": 2022-11-08 14:21:04.021812}"
I suspect this is something to do with the date/time being formatted as a string, but I'm totally lost.
Would anyone have any ideas how I can send this data?

@JoeHo, thank you for pointing to the sources that helped you resolve the issue. I am posting the solution here so that other community members facing a similar issue can benefit. The following modifications to the code resolved the issue for me.
from datetime import date, datetime
from json import dumps
def json_serial(obj):
    # Serialize date and datetime values as ISO 8601 strings
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    raise TypeError("Type %s not serializable" % type(obj))
x = datetime.now().isoformat()
# dumps() wraps the ISO string in double quotes, so the placeholder below yields valid JSON
timesent = dumps(datetime.now(), default=json_serial)
MSG_TXT = '{{"temperature": {temperature},"humidity": {humidity}, "timesent": {timesent}}}'
My table in Azure Data Explorer has the following field definitions.
.create table jsondata (temperature: real, humidity: real, timesent: datetime)
My data mapping query is as follows.
.create table jsondata ingestion json mapping 'jsonMapping' '[{"column":"humidity","path":"$.humidity","datatype":"real"},{"column":"temperature","path":"$.temperature","datatype":"real"}, {"column":"timesent","path":"$.timesent","datatype":"datetime"}]'
I then connected the Azure Data Explorer table to IoT Hub using the steps outlined in the resource "Connect Azure Data Explorer table to IoT hub".
When I execute the program, I can see the Azure IoT Hub telemetry data flowing into the Azure Data Explorer table without any issues.
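In both versions of the code above, `timesent` is computed once at module level, so every message sent by the loop would carry the same timestamp. A minimal sketch of an alternative is to build the payload as a dict for each message and let `json.dumps` handle the quoting. The function name `build_message` is hypothetical, and the use of UTC is an assumption (the original code uses the local time from `datetime.now()`).

```python
import json
from datetime import datetime, timezone


def build_message(temperature: float, humidity: float) -> str:
    """Return the telemetry payload as a JSON string with a fresh timestamp."""
    payload = {
        "temperature": temperature,
        "humidity": humidity,
        # isoformat() returns an ISO 8601 string; json.dumps adds the quotes.
        # UTC is an assumption made for this sketch.
        "timesent": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(payload)


if __name__ == "__main__":
    print(build_message(7, 66.09))
```

Inside the `while True` loop, `Message(build_message(temperature, humidity))` would then replace the `MSG_TXT.format(...)` call, which also avoids escaping braces by hand.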

Why Eventhub async receiver is fetching just 30-35 messages per minute?

I have an async_receive method for Eventhub written in Python, with a checkpoint store attached. The code was taken from the official Eventhub samples GitHub repo.
Problem: With the above-mentioned code, I receive only 20-35 messages per minute even when I keep the receiver running all day, whereas my Eventhub ingests far more stream data (~200 messages per minute). Because of the poor throughput at the receiver's end, the enqueued time of the messages being read now lags by 90 minutes: data enqueued in the Eventhub at minute X is pulled out of it at minute X+90.
Investigation: I looked at the receive subclass in the Eventhub Python SDK and came across a prefetch parameter (line 318), which is set to 300 by default. If this is already set to 300, I should be able to pull more than 30-35 messages by default.
Any idea how I can increase my pull capacity? I'm stuck at this point and have no direction forward, so any help is highly appreciated.
EDIT 1:
My Python code is shown below.
import asyncio
import json
import logging
import os
import sys
import time
from datetime import date
import requests
from azure.eventhub.aio import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblobaio import BlobCheckpointStore
import log_handler
import threading
import traceback
try:
    ## Set env variables
    CONNECTION_STR = os.environ["ECS"].strip()
    EVENTHUB_NAME = os.environ['EN'].strip()
    EVENTHUB_CONSUMER = os.environ["EC"].strip()
    API = os.environ['API_variable'].strip()
    AZURE_BLOB_CONNECTION_STR = os.environ["ACS"].strip()
    BLOB_CONTAINER_NAME = os.environ["BCN"].strip()
    BLOB_ACCOUNT_URL = os.environ["BAU"].strip()
    PREFETCH_COUNT = int(os.environ["PREFETCH_COUNT"])
    MAX_WAIT_TIME = float(os.environ["MAX_WAIT_TIME"])
except Exception as exception:
    logging.debug(traceback.format_exc())
    logging.warning(
        "*** Please check the environment variables for {}".format(str(exception)))
    sys.exit()
def API_CALL(event_data):
    """
    Sends the request to the API
    """
    try:
        url = event_data['image_url']
        payload = {"url": url}
        ## API call to the server
        service_response = requests.post(API, json=payload)
        logging.info(f"*** service_response.status_code : {service_response.status_code}")
        cloud_response = json.loads(
            service_response.text) if service_response.status_code == 200 else None
        today = date.today()
        response_date = today.strftime("%B %d, %Y")
        response_time = time.strftime("%H:%M:%S", time.gmtime())
        response_data = {
            "type": "response_data",
            "consumer_group": EVENTHUB_CONSUMER,
            'current_date': response_date,
            'current_time': response_time,
            'image_url': url,
            'status_code': service_response.status_code,
            'response': cloud_response,
            'api_response_time': int(service_response.elapsed.total_seconds()*1000),
            "eventhub_data": event_data
        }
        logging.info(f"*** response_data {json.dumps(response_data)}")
        logging.debug(f"*** response_data {json.dumps(response_data)}")
    except Exception as exception:
        logging.debug(traceback.format_exc())
        logging.error(
            "**** RaiseError: Failed request url %s, Root Cause of error: %s", url, exception)
async def event_operations(partition_context, event):
    start_time = time.time()
    data_ = event.body_as_str(encoding='UTF-8')
    json_data = json.loads(data_)
    ## forming data payload
    additional_data = {
        "type": "event_data",
        "consumer_group": EVENTHUB_CONSUMER,
        "image_name": json_data["image_url"].split("/")[-1]
    }
    json_data.update(additional_data)
    logging.info(f"*** Data fetched from EH : {json_data}")
    logging.debug(f"*** Data fetched from EH : {json_data}")
    API_CALL(json_data)
    logging.info(f"*** time taken to process an event(ms): {(time.time()-start_time)*1000}")
def between_callback(partition_context, event):
    """
    Loop to create threads
    """
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(event_operations(partition_context, event))
    loop.close()
async def on_event(partition_context, event):
    """
    Put your code here.
    Do some sync or async operations.
    If the operation is i/o intensive, async will have better performance.
    """
    t1 = time.time()
    _thread = threading.Thread(target=between_callback, args=(partition_context, event))
    _thread.start()
    logging.info(f"*** time taken to start a thread(ms): {(time.time()-t1)*1000}")
    logging.info("*** Fetching the next event")
    ## Update checkpoint per event
    t2 = time.time()
    await partition_context.update_checkpoint(event)
    logging.info(f"*** time taken to update checkpoint(ms): {(time.time()-t2)*1000}")
async def main(client):
    """
    Run the on_event method for each event received
    Args:
        client ([type]): Azure Eventhub listener client
    """
    try:
        async with client:
            # Call the receive method. Only read current data (#latest)
            logging.info("*** Listening to event")
            await client.receive(on_event=on_event,
                                 prefetch=PREFETCH_COUNT,
                                 max_wait_time=MAX_WAIT_TIME)
    except KeyboardInterrupt:
        print("*** Stopped receiving due to keyboard interrupt")
    except Exception as err:
        logging.debug(traceback.format_exc())
        print("*** some error occurred :", err)
if __name__ == '__main__':
    ## Checkpoint initialization
    checkpoint_store = BlobCheckpointStore(
        blob_account_url=BLOB_ACCOUNT_URL,
        container_name=BLOB_CONTAINER_NAME,
        credential=AZURE_BLOB_CONNECTION_STR
    )
    ## Client initialization
    client = EventHubConsumerClient.from_connection_string(
        CONNECTION_STR,
        consumer_group=EVENTHUB_CONSUMER,
        eventhub_name=EVENTHUB_NAME,
        checkpoint_store=checkpoint_store, #COMMENT TO RUN WITHOUT CHECKPOINT
        logging_enable=True,
        on_partition_initialize=on_partition_initialize,
        on_partition_close=on_partition_close,
        idle_timeout=10,
        on_error=on_error,
        retry_total=3
    )
    logging.info("Connecting to eventhub {} consumer {}".format(
        EVENTHUB_NAME, EVENTHUB_CONSUMER))
    logging.info("Registering receive callback.")
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main(client))
    except KeyboardInterrupt as exception:
        pass
    finally:
        loop.stop()
Execution-flow main()-->on_event()-->Thread(between_callback-->API_CALL)-->update_checkpoint

Change the receiver function to the format below, so that it receives the events and processes each one until it completes.
import asyncio
from azure.eventhub.aio import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblobaio import BlobCheckpointStore
async def on_event(partition_context, event):
    # Print the event data.
    print("Received the event: \"{}\" from the partition with ID: \"{}\"".format(event.body_as_str(encoding='UTF-8'), partition_context.partition_id))
    # Update the checkpoint so that the program doesn't read the events
    # that it has already read when you run it next time.
    await partition_context.update_checkpoint(event)
async def main():
    # Create an Azure blob checkpoint store to store the checkpoints.
    checkpoint_store = BlobCheckpointStore.from_connection_string("AZURE STORAGE CONNECTION STRING", "BLOB CONTAINER NAME")
    # Create a consumer client for the event hub.
    client = EventHubConsumerClient.from_connection_string("EVENT HUBS NAMESPACE CONNECTION STRING", consumer_group="$Default", eventhub_name="EVENT HUB NAME", checkpoint_store=checkpoint_store)
    async with client:
        # Call the receive method. Read from the beginning of the partition (starting_position: "-1")
        await client.receive(on_event=on_event, starting_position="-1")
if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    # Run the main method.
    loop.run_until_complete(main())
Also, as suggested in the comments, call the API on an interval basis.
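If the per-event API call has to stay, one possible approach (not stated in the answer, and offered only as a sketch) is to hand the blocking call to the event loop's default thread pool instead of creating a new thread and a new event loop for every event. The example below uses only the standard library. `call_api` is a hypothetical stand-in for the blocking `requests.post` call, and the URLs are placeholders.

```python
import asyncio
import time


def call_api(payload: dict) -> dict:
    """Hypothetical stand-in for the blocking requests.post call."""
    time.sleep(0.05)  # simulate network latency
    return {"image_url": payload["image_url"], "status_code": 200}


async def handle_event(payload: dict) -> dict:
    # Run the blocking call in the default thread pool so the event loop
    # stays free to receive further events.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, call_api, payload)


async def process_all(payloads: list) -> list:
    # gather() runs the handlers concurrently and preserves input order.
    return await asyncio.gather(*(handle_event(p) for p in payloads))


def demo(n: int = 10) -> list:
    payloads = [{"image_url": f"https://example.invalid/{i}.jpg"} for i in range(n)]
    return asyncio.run(process_all(payloads))


if __name__ == "__main__":
    print(len(demo()), "events processed")
```

In a real `on_event` callback, the `await loop.run_in_executor(...)` line would take the place of the thread creation. Whether this removes the bottleneck depends on the latency of the API and on how often the checkpoint is updated, so it is worth measuring both.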

Acknowledging pubsub messages through python synchronous pull does not work

With the python google-cloud-pubsub library, acknowledging messages through the subscriber.acknowledge() does not acknowledge my messages. My ack deadline is set at 30 seconds.
Here is my code:
from google.cloud import pubsub_v1
project_id = "$$$$"
subscription_name = "$$$$"
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, subscription_name)
response = subscriber.pull(subscription_path, max_messages=10, timeout=15)
for msg in response.received_messages:
    subscriber.acknowledge(subscription=subscription_path, ack_ids=[msg.ack_id])
Using google-cloud-pubsub==1.0.2
Any idea of what I'm doing wrong?

I recommend referring to the Synchronous Pull documentation and then running the sample Python code to pull and acknowledge messages:
from google.cloud import pubsub_v1
project_id = "Your Google Cloud Project ID"
subscription_name = "Your Pub/Sub subscription name"
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(
    project_id, subscription_name)
NUM_MESSAGES = 3
response = subscriber.pull(subscription_path, max_messages=NUM_MESSAGES)
ack_ids = []
for received_message in response.received_messages:
    print("Received: {}".format(received_message.message.data))
    ack_ids.append(received_message.ack_id)
subscriber.acknowledge(subscription_path, ack_ids)
print('Received and acknowledged {} messages. Done.'.format(
    len(response.received_messages)))
I can't find the definition of ack_ids = [] in your code (you need to define it before you start using it). If you see positive results when running that piece of code, you can assume that there is a bug in your code. Have you provided the full code?
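The pull-then-acknowledge pattern from the sample can be wrapped in a small helper, sketched below. It uses the positional call style shown in this thread for google-cloud-pubsub 1.x; later major versions changed the call signatures, so check the version you have installed. The names `pull_and_ack` and `FakeSubscriber` are hypothetical, and the fake class exists only so the helper can be exercised without a Google Cloud project.

```python
from types import SimpleNamespace


def pull_and_ack(subscriber, subscription_path, max_messages=10):
    """Pull up to max_messages, acknowledge them in one call, return their payloads."""
    response = subscriber.pull(subscription_path, max_messages=max_messages)
    ack_ids = [received.ack_id for received in response.received_messages]
    # Skip the acknowledge call when nothing was received.
    if ack_ids:
        subscriber.acknowledge(subscription_path, ack_ids)
    return [received.message.data for received in response.received_messages]


class FakeSubscriber:
    """Hypothetical in-memory stand-in, used only to exercise the helper."""

    def __init__(self, payloads):
        self._pending = [
            SimpleNamespace(ack_id=f"ack-{i}", message=SimpleNamespace(data=p))
            for i, p in enumerate(payloads)
        ]
        self.acked = []

    def pull(self, subscription_path, max_messages):
        return SimpleNamespace(received_messages=self._pending[:max_messages])

    def acknowledge(self, subscription_path, ack_ids):
        self.acked.extend(ack_ids)
        self._pending = [m for m in self._pending if m.ack_id not in ack_ids]


if __name__ == "__main__":
    fake = FakeSubscriber([b"a", b"b", b"c"])
    print(pull_and_ack(fake, "projects/p/subscriptions/s", max_messages=2))
```

With a real client, `subscriber` would be the `pubsub_v1.SubscriberClient()` instance and `subscription_path` the value returned by `subscriber.subscription_path(project_id, subscription_name)`.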

Python Slack Bot using asyncio - RuntimeError: Session is closed

I'm trying to set up the Slack Bot tutorial that uses their RTM framework in Python, and the WebSocket is not connecting.
I'm following this tutorial:
https://github.com/slackapi/python-slackclient/tree/master/tutorial#table-of-contents
I follow the code as instructed:
import os
import logging
import slack
import ssl as ssl_lib
import certifi
from onboarding_tutorial import OnboardingTutorial
# Data structure: {"channel": {"user_id": OnboardingTutorial}}
onboarding_tutorials_sent = {}
def start_onboarding(web_client: slack.WebClient, user_id: str, channel: str):
    # Create a new onboarding tutorial.
    onboarding_tutorial = OnboardingTutorial(channel)
    # Get the onboarding message payload
    message = onboarding_tutorial.get_message_payload()
    # Post the onboarding message in Slack
    response = web_client.chat_postMessage(**message)
    # Capture the timestamp of the message we've just posted so
    # we can use it to update the message after a user
    # has completed an onboarding task.
    onboarding_tutorial.timestamp = response["ts"]
    # Store the message sent in onboarding_tutorials_sent
    if channel not in onboarding_tutorials_sent:
        onboarding_tutorials_sent[channel] = {}
    onboarding_tutorials_sent[channel][user_id] = onboarding_tutorial
# ================ Team Join Event =============== #
# When the user first joins a team, the type of the event will be 'team_join'.
# Here we'll link the onboarding_message callback to the 'team_join' event.
@slack.RTMClient.run_on(event="team_join")
def onboarding_message(**payload):
    """Create and send an onboarding welcome message to new users. Save the
    time stamp of this message so we can update this message in the future.
    """
    # Get the id of the Slack user associated with the incoming event
    user_id = payload["data"]["user"]["id"]
    # Get WebClient so you can communicate back to Slack.
    web_client = payload["web_client"]
    # Open a DM with the new user.
    response = web_client.im_open(user=user_id)
    channel = response["channel"]["id"]
    # Post the onboarding message.
    start_onboarding(web_client, user_id, channel)
# ============= Reaction Added Events ============= #
# When a user adds an emoji reaction to the onboarding message,
# the type of the event will be 'reaction_added'.
# Here we'll link the update_emoji callback to the 'reaction_added' event.
@slack.RTMClient.run_on(event="reaction_added")
def update_emoji(**payload):
    """Update onboarding welcome message after receiving a "reaction_added"
    event from Slack. Update timestamp for welcome message as well.
    """
    data = payload["data"]
    web_client = payload["web_client"]
    channel_id = data["item"]["channel"]
    user_id = data["user"]
    # Get the original tutorial sent.
    onboarding_tutorial = onboarding_tutorials_sent[channel_id][user_id]
    # Mark the reaction task as completed.
    onboarding_tutorial.reaction_task_completed = True
    # Get the new message payload
    message = onboarding_tutorial.get_message_payload()
    # Post the updated message in Slack
    updated_message = web_client.chat_update(**message)
    # Update the timestamp saved on the onboarding tutorial object
    onboarding_tutorial.timestamp = updated_message["ts"]
# =============== Pin Added Events ================ #
# When a user pins a message the type of the event will be 'pin_added'.
# Here we'll link the update_pin callback to the 'pin_added' event.
@slack.RTMClient.run_on(event="pin_added")
def update_pin(**payload):
    """Update onboarding welcome message after receiving a "pin_added"
    event from Slack. Update timestamp for welcome message as well.
    """
    data = payload["data"]
    web_client = payload["web_client"]
    channel_id = data["channel_id"]
    user_id = data["user"]
    # Get the original tutorial sent.
    onboarding_tutorial = onboarding_tutorials_sent[channel_id][user_id]
    # Mark the pin task as completed.
    onboarding_tutorial.pin_task_completed = True
    # Get the new message payload
    message = onboarding_tutorial.get_message_payload()
    # Post the updated message in Slack
    updated_message = web_client.chat_update(**message)
    # Update the timestamp saved on the onboarding tutorial object
    onboarding_tutorial.timestamp = updated_message["ts"]
# ============== Message Events ============= #
# When a user sends a DM, the event type will be 'message'.
# Here we'll link the message callback to the 'message' event.
@slack.RTMClient.run_on(event="message")
def message(**payload):
    """Display the onboarding welcome message after receiving a message
    that contains "start".
    """
    data = payload["data"]
    web_client = payload["web_client"]
    channel_id = data.get("channel")
    user_id = data.get("user")
    text = data.get("text")
    if text and text.lower() == "start":
        return start_onboarding(web_client, user_id, channel_id)
if __name__ == "__main__":
    ssl_context = ssl_lib.create_default_context(cafile=certifi.where())
    slack_token = os.environ["SLACK_BOT_TOKEN"]
    rtm_client = slack.RTMClient(token=slack_token, ssl=ssl_context)
    rtm_client.start()
I am getting the following error message. It looks like the WebSocket is not connecting for some reason. I have checked that the token has the bot scope, etc.
Traceback (most recent call last):
  File "/Users/badger/.virtualenvs/gcpenv/lib/python3.6/site-packages/slack/rtm/client.py", line 334, in _connect_and_read
    proxy=self.proxy,
  File "/Users/badger/.virtualenvs/gcpenv/lib/python3.6/site-packages/aiohttp/client.py", line 1012, in __aenter__
    self._resp = await self._coro
  File "/Users/badger/.virtualenvs/gcpenv/lib/python3.6/site-packages/aiohttp/client.py", line 728, in _ws_connect
    proxy_headers=proxy_headers)
  File "/Users/badger/.virtualenvs/gcpenv/lib/python3.6/site-packages/aiohttp/client.py", line 357, in _request
    raise RuntimeError('Session is closed')
RuntimeError: Session is closed

Using sleekxmpp to send Google Talk Chats using OAuth2.0

I'm new to Python and need some help authenticating a chatbot via OAuth2. I have a Google Talk chatbot set up using sleekxmpp for Python. It comes with a built-in plugin called 'google' that I don't know how to use.
1) I have set up a service account on Google's Developer Console that gave me a JSON key, and I then request an access token scoped to GTalk via oauth2client.
def oAuthPing():
    json_key = json.load(open(credentialsPath))
    jid = json_key['client_email']
    scope = ['https://www.googleapis.com/auth/googletalk']
    accessToken = SignedJwtAssertionCredentials(json_key['client_email'], json_key['private_key'], scope)
    return accessToken, jid
2) Send chat:
def sendPing(toPerson, toPersonMessage, accessToken, jid):
    if sys.version_info < (3, 0):
        reload(sys)
        sys.setdefaultencoding('utf8')
    else:
        raw_input = input
    xmpp = SendMsgBot.SendMsgBot(jid, toPerson, unicode(toPersonMessage))
    xmpp.credentials['access_token'] = accessToken
    xmpp.register_plugin('xep_0030') # Service Discovery
    xmpp.register_plugin('xep_0004') # Data Forms
    xmpp.register_plugin('google') # oAuth2
    xmpp.register_plugin('xep_0199') # XMPP Ping
    # Connect to the XMPP server and start processing XMPP stanzas.
    if xmpp.connect(('talk.google.com', 5222)):
        xmpp.process(block=True)
    else:
        print("Unable to connect to Google Talk")
3) SendMsgBot class:
class SendMsgBot(sleekxmpp.ClientXMPP):
    """
    A basic SleekXMPP bot that will log in, send a message,
    and then log out.
    """
    def __init__(self, jid, recipient, message):
        sleekxmpp.ClientXMPP.__init__(self, jid, 'ignore')
        # The message we wish to send, and the JID that
        # will receive it.
        self.recipient = recipient
        self.msg = message
        # The session_start event will be triggered when
        # the bot establishes its connection with the server
        # and the XML streams are ready for use. We want to
        # listen for this event so that we can initialize
        # our roster.
        self.add_event_handler("session_start", self.start)
    def start(self, event):
        """
        Process the session_start event.
        Typical actions for the session_start event are
        requesting the roster and broadcasting an initial
        presence stanza.
        """
        self.send_presence()
        self.get_roster()
        self.send_message(mto=self.recipient,
                          mbody=self.msg,
                          mtype='chat')
        # Using wait=True ensures that the send queue will be
        # emptied before ending the session.
        self.disconnect(wait=True)
Any help would greatly be appreciated. Thanks.
