Azure Function Service Bus batch process messages - Python

I am implementing an Azure Function in Python that tries to batch process messages from a Service Bus queue. I have modified the host.json file as follows:
{
  "version": "2.0",
  "serviceBus": {
    "batchOptions": {
      "maxMessageCount": 20,
      "operationTimeout": "01:00:00",
      "autoComplete": true
    }
  },
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[2.*, 3.0.0)"
  }
}
However, I am a bit lost as to how the messages will be received in my entry function in __init__.py. Will it just be a list of messages that I can loop through as follows?
import json
import azure.functions as func

def main(messages: func.ServiceBusMessage):
    for msg in messages:
        json_obj = json.loads(msg.get_body().decode("utf-8"))
        print(json_obj)

Your binding configuration is not right; the trigger binding should be defined in function.json. Once your Azure Function is configured as a Service Bus trigger and bound correctly to the queue, your function will be invoked once per message, so it will not receive an array of messages (a minimal sketch follows). See the example code here for more details.
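For illustration, a minimal per-message handler could look like the sketch below; the binding name "msg" is an assumption and must match the "name" property of the serviceBusTrigger binding in function.json:
import json
import logging

import azure.functions as func

def main(msg: func.ServiceBusMessage):
    # The trigger hands the function exactly one message per invocation.
    payload = json.loads(msg.get_body().decode("utf-8"))
    logging.info("Processing message: %s", payload)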
If you need your function app to process a batch of Service Bus messages instead of using a Service Bus queue trigger, then you need to connect with the Service Bus client and receive the messages yourself; see the code on this page and the sketch below.
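As a rough sketch of that SDK approach (the connection string setting name and queue name below are placeholders, not taken from the question):
import json
import os

from azure.servicebus import ServiceBusClient

connection_str = os.environ["SERVICEBUS_CONNECTION"]  # hypothetical app setting
queue_name = "myqueue"                                # hypothetical queue name

with ServiceBusClient.from_connection_string(connection_str) as client:
    with client.get_queue_receiver(queue_name=queue_name) as receiver:
        # Pull up to 20 messages, waiting at most 5 seconds for the batch.
        batch = receiver.receive_messages(max_message_count=20, max_wait_time=5)
        for msg in batch:
            print(json.loads(str(msg)))
            receiver.complete_message(msg)  # settle each message explicitly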

Related

Azure Functions - Kafka Trigger and Output Binding - Consume exactly-once

I'm testing the Kafka trigger and output binding in Azure Functions to consume a topic and write the message to another topic, with very simple code.
But when I enable the auto-scale feature and the function provisions new instances, I lose the 'exactly-once' behavior and apparently some messages are being delivered to more than one instance.
host.json
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  },
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[3.6.0, 4.0.0)"
  }
}
function.json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "type": "kafkaTrigger",
      "name": "kafkaTrigger",
      "direction": "in",
      "brokerList": "%PEP_BEES_KAFKA_BOOTSTRAP%",
      "topic": "%PEP_BEES_KAFKA_SOURCE_TOPIC%",
      "consumerGroup": "%PEP_BEES_KAFKA_SOURCE_TOPIC_CONSUMER_GROUP%"
    },
    {
      "type": "kafka",
      "direction": "out",
      "name": "kafkaOutput",
      "brokerList": "%PEP_BEES_KAFKA_BOOTSTRAP%",
      "topic": "%PEP_BEES_KAFKA_DESTINATION_TOPIC%"
    }
  ]
}
__init__.py
import json
import logging

import azure.functions as func
from azure.functions import KafkaEvent

def main(kafkaTrigger: KafkaEvent, kafkaOutput: func.Out[str]):
    # One event per invocation (no "cardinality": "many" in function.json).
    message = json.loads(kafkaTrigger.get_body().decode('utf-8'))
    input_msg = str(message['Value'])
    kafkaOutput.set(input_msg)
As you can see in the image below, 'test-topic-output' (the destination) has more messages than 'test-topic' (the source), indicating that sometimes more than one instance is consuming the same message:
Message count
If I disable the auto-scaling feature, this behavior does not happen.
I just need the 'exactly-once' behavior to keep working with function auto-scaling enabled, so that I can have an elastic environment.
EDIT
I just found some errors during the Kafka Trigger processing:
Kafka Trigger errors
Confluent.Kafka.KafkaException:
at Confluent.Kafka.Impl.SafeKafkaHandle.StoreOffsets (Confluent.Kafka, Version=1.9.0.0, Culture=neutral, PublicKeyToken=12c514ca49093d1e)
at Confluent.Kafka.Consumer`2.StoreOffset (Confluent.Kafka, Version=1.9.0.0, Culture=neutral, PublicKeyToken=12c514ca49093d1e)
at Microsoft.Azure.WebJobs.Extensions.Kafka.AsyncCommitStrategy`2.Commit (Microsoft.Azure.WebJobs.Extensions.Kafka, Version=3.6.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: /mnt/vss/_work/1/s/src/Microsoft.Azure.WebJobs.Extensions.Kafka/Trigger/AsyncCommitStrategy.cs:28)
at Microsoft.Azure.WebJobs.Extensions.Kafka.FunctionExecutorBase`2.Commit (Microsoft.Azure.WebJobs.Extensions.Kafka, Version=3.6.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: /mnt/vss/_work/1/s/src/Microsoft.Azure.WebJobs.Extensions.Kafka/Trigger/FunctionExecutorBase.cs:87)
"If I disable the auto-scaling feature, this behavior does not happen"
Scaling up wouldn't prevent records that were just consumed by existing instances from also being sent to the output.
You'd need a way to catch a consumer-group rebalance within the function and, upon that event, skip sending to the output. Given that Azure doesn't expose the native Kafka API, that is unlikely to be possible. You would also need to control when the function commits its consumer offsets (assuming PEP_BEES_KAFKA_SOURCE_TOPIC_CONSUMER_GROUP is a constant value); e.g. if it commits before kafkaTrigger.get_body(), you should get fewer "extras", but if it commits after kafkaOutput.set(), you'll get more of them when scaling up.
Rather than use Python or serverless functions to replicate topics, you could use MirrorMaker or Kafka Streams instead.
I had noticed similar behavior, with the Kafka triggers consuming duplicate events when scaling up the instance count (and, surprisingly, also during scale-down). In batch mode with a batch size of 50, it is always the same 50 events that get read by two instances. I have also noticed that this happens only once per instance scale-up (or scale-down).
One potential solution you can explore is a warmup trigger that runs during scale-out operations:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-warmup?tabs=in-process&pivots=programming-language-python
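As a rough illustration (not from the original answer), a Python warmup function pairs a function.json containing a single warmupTrigger binding with a handler like the one below; the binding name warmupContext is an assumption:
import logging

import azure.functions as func

def main(warmupContext: func.Context) -> None:
    # Runs once on each new instance during scale-out; do per-instance
    # initialization (caches, connections, model loading) here.
    logging.info('Function app instance is now warm.')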

Python Azure Function service bus topic complete after process

I wrote an Azure function in Python on a Service Bus topic. It works well, but the issue is that it is not marking the message as completed after it finishes, so the message gets processed several times until the max delivery count is reached. I need to know how to mark the message as complete; here is my code.
import json
import logging
import os

import azure.functions as func
from azure.servicebus import ServiceBusClient, ServiceBusMessage

def main(message: func.ServiceBusMessage):
    try:
        message_content_type = message.content_type
        req_body = message.get_body().decode("utf-8")
        logging.info(req_body)
        response = obj_engine.extract_payload_message(req_body)  # obj_engine is defined elsewhere in the app

        connection_str = os.environ["idpdev_SERVICEBUS"]
        ocr_topic_name = os.environ["idpdev_topic"]
        servicebus_client = ServiceBusClient.from_connection_string(conn_str=connection_str, logging_enable=True)
        with servicebus_client:
            sender = servicebus_client.get_topic_sender(topic_name=ocr_topic_name)
            with sender:
                send_message = ServiceBusMessage(json.dumps(response))
                sender.send_messages(send_message)
    except ValueError:
        pass
Can someone please help me figure out which setting I am missing?
According to the docs, you need to enable auto-completion by setting autoComplete to true in the configuration:
Must be true for non-C# functions, which means that the trigger should either automatically call complete after processing, or the function code manually calls complete.
When set to true, the trigger completes the message automatically if the function execution completes successfully, and abandons the message otherwise.
Exceptions in the function result in the runtime calling abandonAsync in the background. If no exception occurs, then completeAsync is called in the background. This property is available only in Azure Functions 2.x and higher.
The configuration lives in the function.json file; make sure the property is present and set to true, as in the example below:
{
  "scriptFile": "__init__.py",
  "entryPoint": "main",
  "bindings": [
    {
      "name": "message",
      "type": "serviceBusTrigger",
      "direction": "in",
      "topicName": "mytopic",
      "subscriptionName": "mysubscription",
      "connection": "",
      "autoComplete": true
    }
  ]
}

Azure Function & TimerTrigger in a Custom Container: what else do I need in the function.json file?

I have a custom container with a time-triggered Azure Function. When I build the Docker image, and run it locally, I don't get any action, i.e. the trigger doesn't fire.
I'm wondering - is my function.json missing something? Do I need an output within that, and/or in __init__.py? I'm starting to think that the timer trigger alone isn't enough to elicit any kind of response.
The function.json file:
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "mytimer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 * * * * *",
      "authLevel": "anonymous"
    }
  ]
}
The __init__.py imports some custom functions, which work, but need the custom container for selenium concerns. The function scrapes and then outputs to Twitter. But is there a need for an output, like the answer in this question? If this Function needs to output to Twitter (the internet), is the timer trigger enough?
import datetime
import logging

import azure.functions as func
# importing custom modules here; they work

def main(mytimer: func.TimerRequest) -> None:
    utc_timestamp = datetime.datetime.utcnow().replace(
        tzinfo=datetime.timezone.utc).isoformat()
    if mytimer.past_due:
        logging.info('The timer is past due!')
    logging.info('Python timer trigger function ran at %s', utc_timestamp)
    # Class instantiation for the scraping, etc.
    # calling other functions
Some of the logs from the running container are below, though I don't think it's a connection string issue, as I have that defined in the local.settings.json file, or I use the storage emulator in VS Code, and that works too.
The listener for function 'Functions.BadFunc' was unable to start.
Microsoft.Azure.WebJobs.Host.Listeners.FunctionListenerException: The listener for function 'Functions.BadFunc' was unable to start.
---> System.ArgumentNullException: Value cannot be null. (Parameter 'connectionString')
at Microsoft.Azure.Storage.CloudStorageAccount.Parse(String connectionString)
at Microsoft.Azure.WebJobs.Extensions.Timers.StorageScheduleMonitor.get_TimerStatusDirectory() in C:\azure-webjobs-sdk-extensions\src\WebJobs.Extensions\Extensions\Timers\Scheduling\StorageScheduleMonitor.cs:line 77
at Microsoft.Azure.WebJobs.Extensions.Timers.StorageScheduleMonitor.GetStatusBlobReference(String timerName) in C:\azure-webjobs-sdk-extensions\src\WebJobs.Extensions\Extensions\Timers\Scheduling\StorageScheduleMonitor.cs:line 144
at Microsoft.Azure.WebJobs.Extensions.Timers.StorageScheduleMonitor.GetStatusAsync(String timerName)
at Microsoft.Azure.WebJobs.Extensions.Timers.Listeners.TimerListener.StartAsync(CancellationToken cancellationToken) in C:\azure-webjobs-sdk-extensions\src\WebJobs.Extensions\Extensions\Timers\Listener\TimerListener.cs:line 99
at Microsoft.Azure.WebJobs.Host.Listeners.SingletonListener.StartAsync(CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Singleton\SingletonListener.cs:line 72
at Microsoft.Azure.WebJobs.Host.Listeners.FunctionListener.StartAsync(CancellationToken cancellationToken, Boolean allowRetry) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Listeners\FunctionListener.cs:line 69
--- End of inner exception stack trace ---
info: Host.Startup[413]
Host started (154ms)
info: Host.Startup[0]
Job host started
Hosting environment: Production
Content root path: /
Now listening on: http://[::]:80
Application started. Press Ctrl+C to shut down.
info: Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcFunctionInvocationDispatcher[0]
Worker process started and initialized.
info: Host.Startup[0]
Retrying to start listener for function 'Functions.BadFunc' (Attempt 1)
info: Host.Startup[0]
Listener successfully started for function 'Functions.BadFunc' after 1 retries.
info: Host.General[316]
Host lock lease acquired by instance ID '000000000000000000000000F28FAECC'.
It looks like your function itself is fine, so maybe the problem comes from the Azure Storage emulator.
Try changing the local.settings.json file like this:
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx==;EndpointSuffix=core.windows.net",
    "FUNCTIONS_WORKER_RUNTIME": "python"
  }
}
For the value of AzureWebJobsStorage, you need to create a storage account on Azure and then use its connection string.
Give it a try.
Have you tried this?
Add an ENV statement (where you define your connection string) to your Dockerfile:
ENV AzureWebJobsStorage=DefaultEndpointsProtocol=https;AccountName=xxx;AccountKey=xxx==;EndpointSuffix=core.windows.net

Azure Functions Python | Send EventData messages with properties to Event Hub output

I am writing an Azure Python function that is triggered and then generates multiple outgoing messages. I need to send EventData (body + properties) messages to an Event Hub. So far I have not found any way to add properties to an outgoing message using the Event Hub output binding; it appears that the output string is put into the "body" property.
One possible solution I see is to create an EventHubClient inside the function, but is that really the most effective way to send properties with a message? Why would the output binding exist then?
My function.json file is:
{
  "type": "eventHub",
  "name": "outputHub",
  "eventHubName": "test",
  "connection": "TestSendConnection",
  "direction": "out"
}
Here is my code:
import logging

import azure.functions as func
from azure.eventhub import EventData

def main(events: func.EventHubEvent,
         referenceInput: func.InputStream,
         outputHub: func.Out[str]):
    logging.info('Send an output event to eventhub')
    evt_data_list = []
    for k in range(0, 10):
        evt_data = EventData("Sample Body")
        evt_data.properties['EventType'] = "log"
        evt_data_list.append(evt_data)
    outputHub.set("[" + ",".join([str(evt) for evt in evt_data_list]) + "]")
I am monitoring the incoming messages with the Azure Event Hub Explorer and I receive multiple messages, but they arrive in the following format. I need the body and the properties sections to be separate for the external parser.
{
  "body": {
    "body": "Sample Body",
    "properties": {
      "EventType": "log"
    }
  },
  "enqueuedTimeUtc": "2020-06-09T17:59:04.803Z",
  "offset": "1335734859528",
  "sequenceNumber": 4995022
}
I am afraid there is currently no way to add properties to an outgoing message using the Event Hub output binding.
The workaround is to use the Event Hub SDK inside the function, as sketched below.
Reference:
Microsoft Azure SDK for Event Hubs(Python)
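For illustration, here is a minimal sketch of that workaround, assuming the connection string is stored in the app setting "TestSendConnection" and the hub is named "test", matching the question's binding:
import json
import os

from azure.eventhub import EventData, EventHubProducerClient

def send_with_properties(payloads):
    producer = EventHubProducerClient.from_connection_string(
        conn_str=os.environ["TestSendConnection"],
        eventhub_name="test",
    )
    with producer:
        batch = producer.create_batch()
        for payload in payloads:
            event = EventData(json.dumps(payload))
            event.properties = {"EventType": "log"}  # custom application properties
            batch.add(event)
        producer.send_batch(batch)
Each event then arrives with its body and its application properties as separate sections, which is what the external parser expects.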

How to pass variables as context to IBM Cloud Watson Assistant with V2?

I am trying to use the new API version V2 of IBM Cloud Watson Assistant. Instead of sending a message to a workspace, I need to send a message to an assistant. The context structure now has global and skill-related sections.
How would my app pass in values as context variables? Where in the structure would they need to be placed? I am using the Python SDK.
I am interested in sending information as part of client dialog actions.
Based on testing the Python SDK and the V2 API with a tool, I came to the following conclusion: context is provided by the assistant if it is requested as part of the input options.
"context": {
"skills": {
"main skill": {
"user_defined": {
"topic": "some chatbot talk",
"skip_user_input": true
}
}
},
"global": {
"system": {
"turn_count": 2
}
}
}
To pass values back from my client / app to the assistant, I could use the context parameter. However, in contrast to the V1 API, I needed to place the key / value pairs "down below" in the user_defined part:
context['skills']['main skill']['user_defined'].update({'mydateOUT':'2018-10-08'})
The above is a code snippet from this sample file for a client action. With that placement of my context variables everything works, and I can implement client actions using API Version 2.
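For completeness, here is a minimal sketch of sending user-defined context with the V2 message API via the Python SDK (a recent ibm-watson release is assumed; the API key, service URL, and assistant ID are placeholders):
from ibm_watson import AssistantV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('your-apikey')
assistant = AssistantV2(version='2019-02-28', authenticator=authenticator)
assistant.set_service_url('https://api.us-south.assistant.watson.cloud.ibm.com')

session = assistant.create_session(assistant_id='your-assistant-id').get_result()

response = assistant.message(
    assistant_id='your-assistant-id',
    session_id=session['session_id'],
    input={
        'text': 'Book a table',
        'options': {'return_context': True},  # ask the assistant to return context
    },
    context={
        'skills': {
            'main skill': {
                'user_defined': {'mydateOUT': '2018-10-08'}
            }
        }
    },
).get_result()
print(response)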
