We are using Azure Functions with a Python runtime to take latitude/longitude tuples from a Snowflake database and return the respective countries. We also want to convert any non-English country names into English.
We initially found that although the script would show output in the terminal while testing on Azure, it would soon return a 503 error (although the script continues to run at this point). If we cancelled the script, it would show as a success in the monitor screen of the Azure portal; however, leaving the script to run to completion would result in the script failing. We decided (partially based on this post) that this was due to the runtime exceeding the maximum HTTP response time allowed. To combat this we tried a number of solutions.
First we extended the function timeout value in the host.json file:
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  },
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[2.*, 3.0.0)"
  },
  "functionTimeout": "00:10:00"
}
We then modified our script to add a queue output binding, changing the signature in the main .py script to
def main(req: func.HttpRequest, msg: func.Out[func.QueueMessage]) -> func.HttpResponse:
We also modified the function.json file to
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "authLevel": "function",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": [
        "get",
        "post"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    },
    {
      "type": "queue",
      "direction": "out",
      "name": "msg",
      "queueName": "processing",
      "connection": "QueueConnectionString"
    }
  ]
}
and the local.settings.json file to
{
  "IsEncrypted": false,
  "Values": {
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "AzureWebJobsStorage": "{AzureWebJobsStorage}",
    "QueueConnectionString": "<Connection String for Storage Account>",
    "STORAGE_CONTAINER_NAME": "testdocs",
    "STORAGE_TABLE_NAME": "status"
  }
}
We also then added a check to see if the country name was already in English. The intention here was to cut down on calls to the translate function.
After each of these changes we redeployed to the Functions app and tested again, with the same result: the function runs and prints output to the terminal, but after a few seconds it shows a 503 error and eventually fails.
I can show a code sample but cannot provide the tables unfortunately.
from snowflake import connector
import pandas as pd
import pyarrow
from geopy.geocoders import Nominatim
from deep_translator import GoogleTranslator
from pprint import pprint
import langdetect
import logging
import azure.functions as func


def main(req: func.HttpRequest, msg: func.Out[func.QueueMessage]) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')

    # Connection to Snowflake
    conn = connector.connect(
        user='<USER>',
        password='<PASSWORD>',
        account='<ACCOUNT>',
        warehouse='<WH>',
        database='<DB>',
        schema='<SCH>'
    )

    # Creating objects for Snowflake, Geolocation, Translate
    cur = conn.cursor()
    geolocator = Nominatim(user_agent="geoapiExercises")
    translator = GoogleTranslator(target='en')

    # Fetching weblog data to get the current latlong list
    fetchsql = "SELECT PAGELATLONG FROM <TABLE_NAME> WHERE PAGELATLONG IS NOT NULL GROUP BY PAGELATLONG;"
    logging.info(fetchsql)
    cur.execute(fetchsql)
    df = pd.DataFrame(cur.fetchall(), columns=['PageLatLong'])
    logging.info('created data frame')

    # Creating and inserting the mapping into the final table
    for index, row in df.iterrows():
        latlong = row['PageLatLong']
        location = geolocator.reverse(row['PageLatLong']).raw['address']
        logging.info('got addresses')

        city = str(location.get('state_district'))
        country = str(location.get('country'))
        countrycd = str(location.get('country_code'))
        logging.info('got countries')

        # Detect the language of the country name and only translate
        # when it is confidently identified as non-English
        res = langdetect.detect_langs(country)
        lang = str(res[0]).split(':')[0]
        conf = float(str(res[0]).split(':')[1])
        if lang != 'en' and conf > 0.99:
            country = translator.translate(country)
            logging.info('translated non-english country names')

        insertstmt = "INSERT INTO <RESULTS_TABLE> VALUES('" + latlong + "','" + city + "','" + country + "','" + countrycd + "')"
        logging.info(insertstmt)
        try:
            cur.execute(insertstmt)
        except Exception:
            pass

    return func.HttpResponse("success")
If anyone has an idea what may be causing this issue, I'd appreciate any suggestions.
Thanks.
To resolve the timeout errors, you can try the following:
As suggested by MayankBargali-MSFT, you can try defining retry policies. For triggers like HTTP and timer, retries don't resume on a new instance, so the max retry count is a best effort: in rare cases an execution could be retried more than the maximum, or, for triggers like HTTP and timer, fewer times than the maximum. You can also navigate to Diagnose and solve problems to help identify the root cause of the 503 error, as there can be multiple reasons for it.
As suggested by ryanchill, a 503 is typically the result of high memory consumption exceeding the limits of the Consumption plan. The best resolution is switching to a dedicated hosting plan, which provides more resources. If that isn't an option, reducing the amount of data being retrieved should be explored.
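If neither a plan change nor trimming the data is enough, the queue output binding the question already added can be taken one step further: have the HTTP function only enqueue the work and return immediately, and move the Snowflake/geocoding loop into a separate queue-triggered function, which is not subject to the roughly 230-second limit on an HTTP response (only functionTimeout applies). A minimal sketch, assuming a second function whose function.json declares a queueTrigger binding named latlongmsg on the same 'processing' queue with QueueConnectionString; the message payload and binding name are placeholders, not the poster's actual code:

# __init__.py of the HTTP-triggered function: enqueue the job and return straight away.
import json
import logging
import azure.functions as func

def main(req: func.HttpRequest, msg: func.Out[str]) -> func.HttpResponse:
    logging.info('Queuing lat/long processing request.')
    msg.set(json.dumps({"job": "latlong-to-country"}))  # illustrative payload only
    return func.HttpResponse("queued", status_code=202)

# __init__.py of the queue-triggered worker (a separate function): does the long-running work.
import json
import logging
import azure.functions as func

def main(latlongmsg: func.QueueMessage) -> None:
    job = json.loads(latlongmsg.get_body().decode('utf-8'))
    logging.info('Processing job: %s', job)
    # The Snowflake fetch, geolocator.reverse(), translation and INSERT loop
    # from the original script would run here instead of in the HTTP function.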
References:
https://learn.microsoft.com/en-us/answers/questions/539967/azure-function-app-503-service-unavailable-in-code.html
https://learn.microsoft.com/en-us/answers/questions/522216/503-service-unavailable-while-executing-an-azure-f.html
https://learn.microsoft.com/en-us/answers/questions/328952/azure-durable-functions-timeout-error-in-activity.html
https://learn.microsoft.com/en-us/answers/questions/250623/azure-function-not-running-successfully.html
Related
I'm testing the Kafka trigger and output binding in Azure Functions to consume a topic and write the messages to another topic; very simple code.
But when I enable the auto-scale feature and the function provisions new instances, I lose the 'exactly-once' behavior and apparently some messages are being delivered to more than one instance.
host.json
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  },
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[3.6.0, 4.0.0)"
  }
}
function.json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "type": "kafkaTrigger",
      "name": "kafkaTrigger",
      "direction": "in",
      "brokerList": "%PEP_BEES_KAFKA_BOOTSTRAP%",
      "topic": "%PEP_BEES_KAFKA_SOURCE_TOPIC%",
      "consumerGroup": "%PEP_BEES_KAFKA_SOURCE_TOPIC_CONSUMER_GROUP%"
    },
    {
      "type": "kafka",
      "direction": "out",
      "name": "kafkaOutput",
      "brokerList": "%PEP_BEES_KAFKA_BOOTSTRAP%",
      "topic": "%PEP_BEES_KAFKA_DESTINATION_TOPIC%"
    }
  ]
}
__init__.py
import logging, json
from azure.functions import KafkaEvent
import azure.functions as func

def main(kafkaTrigger: KafkaEvent, kafkaOutput: func.Out[str]):
    message = json.loads(kafkaTrigger.get_body().decode('utf-8'))
    input_msg = str(message['Value'])
    kafkaOutput.set(input_msg)
As you can see in the image below, the 'test-topic-output' (destination) has more messages than the 'test-topic' (source), indicating that sometimes more than one instance is consuming a message:
Message count
If I disable the auto-scaling feature, this behavior does not happen.
I just need the 'exactly-once' behavior to work even with function auto-scale enabled, so that I can have an elastic environment.
EDIT
I just found out some errors during the Kafka trigger processing:
Kafka Trigger errors
Confluent.Kafka.KafkaException:
at Confluent.Kafka.Impl.SafeKafkaHandle.StoreOffsets (Confluent.Kafka, Version=1.9.0.0, Culture=neutral, PublicKeyToken=12c514ca49093d1e)
at Confluent.Kafka.Consumer`2.StoreOffset (Confluent.Kafka, Version=1.9.0.0, Culture=neutral, PublicKeyToken=12c514ca49093d1e)
at Microsoft.Azure.WebJobs.Extensions.Kafka.AsyncCommitStrategy`2.Commit (Microsoft.Azure.WebJobs.Extensions.Kafka, Version=3.6.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: /mnt/vss/_work/1/s/src/Microsoft.Azure.WebJobs.Extensions.Kafka/Trigger/AsyncCommitStrategy.cs:28)
at Microsoft.Azure.WebJobs.Extensions.Kafka.FunctionExecutorBase`2.Commit (Microsoft.Azure.WebJobs.Extensions.Kafka, Version=3.6.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: /mnt/vss/_work/1/s/src/Microsoft.Azure.WebJobs.Extensions.Kafka/Trigger/FunctionExecutorBase.cs:87)
If I disable the auto-scaling feature, this behavior does not happen
Scaling up wouldn't prevent records that were just consumed by existing instances from being sent to the output.
You'd need a way to catch a consumer group rebalance within the function and, upon that event, skip sending to the output. Given that Azure doesn't expose the native Kafka API, that is unlikely to be possible. You would also need to control when the function commits its consumer offsets (assuming PEP_BEES_KAFKA_SOURCE_TOPIC_CONSUMER_GROUP is a constant value); e.g. if it commits before kafkaTrigger.get_body(), you should get fewer "extras", but if it commits after kafkaOutput.set(), you'll get more when scaling up.
Rather than use Python or serverless functions for replicating topics, you can use MirrorMaker or Kafka Streams instead.
I had noticed similar behavior of the Kafka trigger consuming duplicate events when scaling up the instance count (and, surprisingly, also during scale-down). In batch mode with a batch size of 50, it's always the same 50 events that get read by two instances. I have also noticed this happens only once per instance scale-up (or scale-down).
One potential solution you can explore is using a warmup trigger that runs during the scale-out operation.
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-warmup?tabs=in-process&pivots=programming-language-python
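For reference, a rough sketch of what that warmup function could look like in Python, assuming its function.json contains a single warmupTrigger binding named warmupContext; it runs only when a new instance is added during scale-out:

import logging
import azure.functions as func

def main(warmupContext: func.Context) -> None:
    # Runs once per new instance during scale-out; pre-load caches, connections,
    # etc. here so the instance is ready before real events arrive.
    logging.info('Function app instance is warm.')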
Hello, I'm trying to figure out how to use Azure Functions bindings to extract data from a trigger and use it in an input binding's database query for filtering. I'm using a Service Bus message as the trigger. The Service Bus binding expressions listed in the Azure Functions documentation always return null for me (even though, inside the function, the parameter has its value set).
Here is what I've been trying so far; in the example below I want to extract ApplicationProperties.zoneId set inside a message.
function.json
"bindings": [
{
"name": "msg",
"type": "serviceBusTrigger",
"direction": "in",
"queueName": "myqueue",
"connection": "ServiceBusConnection"
},
{
"name": "documents",
"type": "cosmosDB",
"direction": "in",
"databaseName": "test_db",
"collectionName": "items",
"sqlQuery": "SELECT * from c where c.zoneId = {ApplicationProperties.zoneId}",
"connectionStringSetting": "CosmosDBConnection"
}]
For testing, I'm sending a test message to the Service Bus:
from azure.servicebus import ServiceBusMessage

def send_single_message(sender):
    message = ServiceBusMessage(
        "woah_a_test",
        correlation_id="1",
        subject="az-fcn",
    )
    message.application_properties = {"zoneId": 1}
    sender.send_messages(message)
    print("Sent a single message")
It seems quite strange to me that I could not access any of the trigger parameters in function.json (I've also tried CorrelationId and Subject), while inside the triggered function those parameters return the correct values.
I'm aware that I can bypass this issue by filtering inside the function code instead of in the input binding, but I'm curious why those parameters don't resolve to the expected values there. Is there any way to debug it?
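For completeness, here is a rough sketch of that in-code workaround, assuming the sqlQuery is relaxed to something like SELECT * FROM c so the input binding returns unfiltered documents, and that msg.application_properties is available in the installed azure-functions library version:

import logging
import azure.functions as func

def main(msg: func.ServiceBusMessage, documents: func.DocumentList) -> None:
    # Read the application property off the trigger message in code,
    # since the {ApplicationProperties.zoneId} binding expression resolves to null.
    zone_id = (msg.application_properties or {}).get('zoneId')
    logging.info('zoneId from message: %s', zone_id)

    # Filter the documents returned by the (now unfiltered) Cosmos DB input binding.
    matching = [doc for doc in documents if doc.get('zoneId') == zone_id]
    logging.info('%d documents match zoneId %s', len(matching), zone_id)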
I wrote an Azure Function in Python on a Service Bus topic. It works well, but the issue is that it's not marking the message as completed after finishing; the message gets processed several times until the max delivery count is reached. I need to know how to mark the message as complete. Here is my code:
import json
import logging
import os

import azure.functions as func
from azure.servicebus import ServiceBusClient, ServiceBusMessage

def main(message: func.ServiceBusMessage):
    try:
        message_content_type = message.content_type
        req_body = message.get_body().decode("utf-8")
        logging.info(req_body)
        response = obj_engine.extract_payload_message(req_body)

        connection_str = os.environ["idpdev_SERVICEBUS"]
        ocr_topic_name = os.environ["idpdev_topic"]

        servicebus_client = ServiceBusClient.from_connection_string(conn_str=connection_str, logging_enable=True)
        with servicebus_client:
            sender = servicebus_client.get_topic_sender(topic_name=ocr_topic_name)
            with sender:
                send_message = ServiceBusMessage(json.dumps(response))
                sender.send_messages(send_message)
    except ValueError:
        pass
Can someone please help me find out what setting I am missing?
According to the docs, you need to enable auto-completion by setting autoComplete to true in the configuration:
Must be true for non-C# functions, which means that the trigger should either automatically call complete after processing, or the function code manually calls complete.
When set to true, the trigger completes the message automatically if the function execution completes successfully, and abandons the message otherwise.
Exceptions in the function results in the runtime calls abandonAsync in the background. If no exception occurs, then completeAsync is called in the background. This property is available only in Azure Functions 2.x and higher.
The configuration can be found in the function.json file, make sure the property is available and set to true, see the example below:
{
  "scriptFile": "__init__.py",
  "entryPoint": "main",
  "bindings": [
    {
      "name": "message",
      "type": "serviceBusTrigger",
      "direction": "in",
      "topicName": "mytopic",
      "subscriptionName": "mysubscription",
      "connection": "",
      "autoComplete": true
    }
  ]
}
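Related to the above: because the runtime decides between completeAsync and abandonAsync based on whether the function raises, it also helps not to swallow exceptions inside the handler. A minimal sketch of that shape (the payload-extraction step from the question is omitted, the forwarded body is purely illustrative, and the environment variable names are taken from the question):

import json
import logging
import os

import azure.functions as func
from azure.servicebus import ServiceBusClient, ServiceBusMessage

def main(message: func.ServiceBusMessage) -> None:
    req_body = message.get_body().decode("utf-8")
    logging.info(req_body)

    connection_str = os.environ["idpdev_SERVICEBUS"]
    topic_name = os.environ["idpdev_topic"]

    servicebus_client = ServiceBusClient.from_connection_string(conn_str=connection_str)
    with servicebus_client:
        sender = servicebus_client.get_topic_sender(topic_name=topic_name)
        with sender:
            sender.send_messages(ServiceBusMessage(json.dumps({"payload": req_body})))
    # No blanket try/except: if anything above raises, the trigger abandons the
    # message (so it is retried or dead-lettered); if the function returns
    # normally and autoComplete is true, the message is completed.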
I have encountered a problem when setting blob metadata in Azure Storage. I have developed a script for this in Spyder, so local Python, which works great. Now, I want to be able to execute this same script as an Azure Function. However, when setting the metadata I get the following error: HttpResponseError: The specifed resource name contains invalid characters.
The only change from Spyder to Functions that I made is:
Spyder:
def main(container_name, blob_name, metadata):
    from azure.storage.blob import BlobServiceClient

    # Connection string to storage account
    storageconnectionstring = secretstoragestringnotforstackoverflow

    # Initialize clients
    blobclient_from_connectionstring = BlobServiceClient.from_connection_string(storageconnectionstring)
    containerclient = blobclient_from_connectionstring.get_container_client(container_name)
    blob_client = containerclient.get_blob_client(blob_name)

    # Set metadata of the blob
    blob_client.set_blob_metadata(metadata=metadata)
    return
Functions
import json

import azure.functions as func
from azure.storage.blob import BlobServiceClient

def main(req: func.HttpRequest):
    container_name = req.params.get('container_name')
    blob_name = req.params.get('blob_name')
    metadata_raw = req.params.get('metadata')
    metadata_json = json.loads(metadata_raw)

    # Connection string to storage account
    storageconnectionstring = secretstoragestringnotforstackoverflow

    # Initialize clients
    blobclient_from_connectionstring = BlobServiceClient.from_connection_string(storageconnectionstring)
    containerclient = blobclient_from_connectionstring.get_container_client(container_name)
    blob_client = containerclient.get_blob_client(blob_name)

    # Set metadata of the blob
    blob_client.set_blob_metadata(metadata=metadata_json)
    return func.HttpResponse()
Arguments to the Function are passed in the header. The problem lies with metadata and not container_name or blob_name, as I get no error when I comment out metadata. Also, I tried formatting metadata in many variations, with single or double quotes and as JSON or as a string, but no luck so far. Is there anyone who could help me solve this problem?
I was able to fix the problem. The script was fine; the problem was in the input parameters, which needed to be in a specific format: metadata as a dict with double quotes, and the blob/container names as strings without any quotes.
As requested, the function.json:
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "authLevel": "anonymous",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": [
        "get",
        "post"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    }
  ]
}
With parameter formatting:
Picture from Azure Functions
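Since the screenshot isn't reproduced here, a rough example of the formatting that worked, with hypothetical values, sent as query parameters (which is what req.params.get reads); the function URL is a placeholder:

import requests

# Hypothetical container, blob and metadata values for illustration only.
params = {
    "container_name": "mycontainer",          # plain string, no extra quotes
    "blob_name": "myfolder/myblob.pdf",       # plain string, no extra quotes
    "metadata": '{"department": "finance"}',  # valid JSON, i.e. double quotes only
}
response = requests.get(
    "https://<FUNCTION_APP>.azurewebsites.net/api/<FUNCTION_NAME>",
    params=params,
)
print(response.status_code)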
I have an Azure Function which is triggered by a file being put into blob storage, and I was wondering how (if possible) to get the name of the blob (file) which triggered the function. I have tried:
fileObject=os.environ['inputBlob']
message = "Python script processed input blob'{0}'".format(fileObject.fileName)
and
fileObject=os.environ['inputBlob']
message = "Python script processed input blob'{0}'".format(fileObject.name)
but neither of these worked; they both resulted in errors. Can I get some help with this, or some suggestions?
Thanks
The blob name can be captured via the Function.json and provided as binding data. See the {filename} token below.
Function.json is language agnostic and works in all languages.
See documentation at https://learn.microsoft.com/en-us/azure/azure-functions/functions-triggers-bindings for details.
{
  "bindings": [
    {
      "name": "image",
      "type": "blobTrigger",
      "path": "sample-images/{filename}",
      "direction": "in",
      "connection": "MyStorageConnection"
    },
    {
      "name": "imageSmall",
      "type": "blob",
      "path": "sample-images-sm/{filename}",
      "direction": "out",
      "connection": "MyStorageConnection"
    }
  ]
}
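On the Python side, the trigger's InputStream then exposes the full blob path, and whatever is written to the output binding lands under the path with {filename} resolved from the trigger. A small sketch matching the bindings above (the example blob name in the comment is hypothetical):

import logging
import os
import azure.functions as func

def main(image: func.InputStream, imageSmall: func.Out[bytes]) -> None:
    # image.name is "<container>/<blob name>", e.g. "sample-images/photo.png"
    blob_name = os.path.basename(image.name)
    logging.info('Processing blob: %s', blob_name)

    # Written to sample-images-sm/{filename}, with {filename} taken from the trigger path.
    imageSmall.set(image.read())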
If you want to get the name of the file that triggered your function, you can do that as follows.
Use {name} in function.json:
{
  "bindings": [
    {
      "name": "myblob",
      "type": "blobTrigger",
      "path": "MyBlobPath/{name}",
      "direction": "in",
      "connection": "MyStorageConnection"
    }
  ]
}
The function will be triggered by changes in your blob storage.
Get the name of the file that triggered the function in Python (__init__.py):

def main(myblob: func.InputStream):
    filename = myblob.name

This will give you the name of the file that triggered your function.
There isn't any information about which trigger you used in your description. But fortunately, there is a sample project, yokawasa/azure-functions-python-samples, on GitHub for Azure Functions using Python, which includes many samples using different triggers such as the queue trigger or blob trigger. I think it will be very helpful for you; you can refer to these samples to write your own function to satisfy your needs.
Hope it helps.
Getting the name of the inputBlob is not currently possible with Python Azure-Functions. There are open issues about it in azure-webjobs-sdk and azure-webjobs-sdk-script GitHub:
https://github.com/Azure/azure-webjobs-sdk/issues/1090
https://github.com/Azure/azure-webjobs-sdk-script/issues/1339
Unfortunately, it's still not possible.
In Python, you can do:
import azure.functions as func
import os

def main(blobin: func.InputStream):
    filename = os.path.basename(blobin.name)