Adding metadata to Azure blob with Python SDK & Azure Functions


I have encountered a problem when setting blob metadata in Azure Storage. I developed a script for this in Spyder (that is, local Python), and it works great. Now I want to execute the same script as an Azure Function. However, when setting the metadata I get the following error: HttpResponseError: The specifed resource name contains invalid characters.
The only change I made from Spyder to Functions is:
Spyder:
from azure.storage.blob import BlobServiceClient

def main(container_name, blob_name, metadata):
    # Connection string to the storage account (secret omitted)
    storageconnectionstring = secretstoragestringnotforstackoverflow
    # Initialize the clients
    blobclient_from_connectionstring = BlobServiceClient.from_connection_string(storageconnectionstring)
    containerclient = blobclient_from_connectionstring.get_container_client(container_name)
    blob_client = containerclient.get_blob_client(blob_name)
    # Set the metadata of the blob
    blob_client.set_blob_metadata(metadata=metadata)
    return
Functions:
import json
import azure.functions as func
from azure.storage.blob import BlobServiceClient

def main(req: func.HttpRequest):
    container_name = req.params.get('container_name')
    blob_name = req.params.get('blob_name')
    metadata_raw = req.params.get('metadata')
    metadata_json = json.loads(metadata_raw)
    # Connection string to the storage account (secret omitted)
    storageconnectionstring = secretstoragestringnotforstackoverflow
    # Initialize the clients
    blobclient_from_connectionstring = BlobServiceClient.from_connection_string(storageconnectionstring)
    containerclient = blobclient_from_connectionstring.get_container_client(container_name)
    blob_client = containerclient.get_blob_client(blob_name)
    # Set the metadata of the blob
    blob_client.set_blob_metadata(metadata=metadata_json)
    return func.HttpResponse()
Arguments to the Function are passed in the header. The problem lies with metadata and not with container_name or blob_name, as I get no error when I comment out the metadata call. I also tried formatting metadata in many variations, with single or double quotes and as JSON or as a string, but no luck so far. Can anyone help me solve this problem?

I was able to fix the problem. The script was fine; the problem was in the input parameters. They needed to be in a specific format: metadata as a dict with double quotes, and the blob and container names as strings without any quotes.
As requested, here is the function.json:
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "authLevel": "anonymous",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": [
        "get",
        "post"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    }
  ]
}
With parameter formatting:
[Image: picture from Azure Functions]
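The formatting rule described in this answer can be checked before the storage call is made. The following is a minimal sketch, not the author's code: `parse_metadata` and `strip_quotes` are hypothetical helper names. It assumes the metadata arrives as a JSON object string (double quotes) and that stray quotes around a container or blob name should be removed, since quote characters are not valid in a container name.

```python
import json


def parse_metadata(raw):
    """Parse a metadata parameter into a dict of str -> str.

    Raises ValueError with a readable message when the input is not a
    JSON object, for example when single quotes were used.
    """
    try:
        parsed = json.loads(raw)
    except (TypeError, json.JSONDecodeError) as exc:
        raise ValueError(f"metadata is not valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("metadata must be a JSON object")
    # Blob metadata is sent as HTTP headers, so keep keys and values as strings.
    return {str(key): str(value) for key, value in parsed.items()}


def strip_quotes(name):
    """Remove accidental surrounding quotes from a container or blob name."""
    return name.strip().strip("'\"")
```

With these helpers, `parse_metadata('{"project": "demo"}')` returns a dict, while the single-quoted variant `"{'project': 'demo'}"` raises a ValueError instead of failing later inside the storage client.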

Related

Error in Azure Storage Explorer with Azurite : The first argument must be of type string or an instance of Buffer

I'm running an Azure function locally, from VSCode, that outputs a string to a blob. I'm using Azurite to emulate the output blob container.
My function looks like this:
import azure.functions as func

def main(mytimer: func.TimerRequest, outputblob: func.Out[str]):
    outputblob.set("hello")
My function.json:
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "mytimer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 * * * * *"
    },
    {
      "name": "outputblob",
      "type": "blob",
      "dataType": "string",
      "direction": "out",
      "path": "testblob/hello"
    }
  ]
}
In local.settings.json, I've set "AzureWebJobsStorage": "UseDevelopmentStorage=true".
The problem is, when I run the function and check in Azure Storage Explorer, the container is created (testblob) (along with 2 other containers: azure-webjobs-hosts and azure-webjobs-secrets) but it is empty and Azure Storage Explorer displays an error message when I refresh :
The first argument must be of type string or an instance of Buffer, ArrayBuffer, or Array or an Array-like Object. Received undefined
The function runs and doesn't return any error message.
When I use a queue instead of a blob as output, it works and I can see the string in the emulated queue storage.
When I use the blob storage in my Azure subscription instead of the emulated blob, it works as well, a new blob is created with the string.
I've tried the following:
clean and restart Azurite several times
replace "UseDevelopmentStorage=true" by the connection string of the emulated storage
reinstall Azure Storage Explorer
I keep getting the same error message.
I'm using Azure Storage Explorer Version 1.25.0 on Windows 11.
Thanks for any help!

It looks like this is a known issue with the latest release (v1.25.0) of Azure Storage Explorer, see:
https://github.com/microsoft/AzureStorageExplorer/issues/6008
The simplest solution is to uninstall it and install an earlier version:
https://github.com/microsoft/AzureStorageExplorer/releases/tag/v1.24.3

Timeout errors when testing Azure function app

We are using Azure Functions with a Python runtime to take latitude/longitude tuples from a Snowflake database and return the respective countries. We also want to convert any non-English country names into English.
We initially found that, although the script would show output in the terminal while testing on Azure, it would soon return a 503 error (although the script continues to run at this point). If we cancelled the script, it would show as a success in the Monitor screen of the Azure portal; however, leaving the script to run to completion would result in the script failing. We decided (partially based on this post) that this was due to the runtime exceeding the maximum HTTP response time allowed. To combat this we tried a number of solutions.
First we extended the function timeout value in the host.json file:
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  },
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[2.*, 3.0.0)"
  },
  "functionTimeout": "00:10:00"
}
We then modified our script to use a queue output binding by adding the msg parameter
def main(req: func.HttpRequest, msg: func.Out[func.QueueMessage]) -> func.HttpResponse:
to the main .py script. We then also modified the function.json file to
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "authLevel": "function",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": [
        "get",
        "post"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    },
    {
      "type": "queue",
      "direction": "out",
      "name": "msg",
      "queueName": "processing",
      "connection": "QueueConnectionString"
    }
  ]
}
and the local.settings.json file to
{
  "IsEncrypted": false,
  "Values": {
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "AzureWebJobsStorage": "{AzureWebJobsStorage}",
    "QueueConnectionString": "<Connection String for Storage Account>",
    "STORAGE_CONTAINER_NAME": "testdocs",
    "STORAGE_TABLE_NAME": "status"
  }
}
We also then added a check to see if the country name was already in English. The intention here was to cut down on calls to the translate function.
After each of these changes we redeployed to the functions app and tested again. Same result. The function will run, and print output to terminal, however after a few seconds it will show a 503 error and eventually fail.
I can show a code sample but cannot provide the tables unfortunately.
from snowflake import connector
import pandas as pd
import pyarrow
from geopy.geocoders import Nominatim
from deep_translator import GoogleTranslator
from pprint import pprint
import langdetect
import logging
import azure.functions as func

def main(req: func.HttpRequest, msg: func.Out[func.QueueMessage]) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    # Connecting to Snowflake
    conn = connector.connect(
        user='<USER>',
        password='<PASSWORD>',
        account='<ACCOUNT>',
        warehouse='<WH>',
        database='<DB>',
        schema='<SCH>'
    )
    # Creating objects for Snowflake, Geolocation, Translate
    cur = conn.cursor()
    geolocator = Nominatim(user_agent="geoapiExercises")
    translator = GoogleTranslator(target='en')
    # Fetching weblog data to get the current latlong list
    fetchsql = "SELECT PAGELATLONG FROM <TABLE_NAME> WHERE PAGELATLONG IS NOT NULL GROUP BY PAGELATLONG;"
    logging.info(fetchsql)
    cur.execute(fetchsql)
    df = pd.DataFrame(cur.fetchall(), columns=['PageLatLong'])
    logging.info('created data frame')
    # Creating and inserting the mapping into the final table
    for index, row in df.iterrows():
        latlong = row['PageLatLong']
        location = geolocator.reverse(row['PageLatLong']).raw['address']
        logging.info('got addresses')
        city = str(location.get('state_district'))
        country = str(location.get('country'))
        countrycd = str(location.get('country_code'))
        logging.info('got countries')
        # Detect the language of the country name
        res = langdetect.detect_langs(country)
        # Each result exposes .lang (language code) and .prob (confidence as a float)
        lang = res[0].lang
        conf = res[0].prob
        if lang != 'en' and conf > 0.99:
            country = translator.translate(country)
            logging.info('translated non-english country names')
        insertstmt = "INSERT INTO <RESULTS_TABLE> VALUES('"+latlong+"','"+city+"','"+country+"','"+countrycd+"')"
        logging.info(insertstmt)
        try:
            cur.execute(insertstmt)
        except Exception:
            # Log the failure instead of silently discarding it
            logging.exception('insert failed')
    return func.HttpResponse("success")
If anyone has an idea what may be causing this issue, I'd appreciate any suggestions.
Thanks.

To resolve the timeout errors, you can try the following:
As suggested by MayankBargali-MSFT, you can try defining retry policies. Note that triggers like HTTP and timer don't resume on a new instance. This means that the max retry count is a best effort, and in some rare cases an execution could be retried more than the maximum or, for triggers like HTTP and timer, be retried less than the maximum. You can also navigate to Diagnose and solve problems to see whether it helps you find the root cause of the 503 error, as there can be multiple reasons for this error.
As suggested by ryanchill, the 503 issue is the result of high memory consumption that exceeded the limits of the Consumption plan. The best resolution for this issue is switching to a dedicated hosting plan, which provides more resources. However, if that isn't an option, reducing the amount of data being retrieved should be explored.
References: https://learn.microsoft.com/en-us/answers/questions/539967/azure-function-app-503-service-unavailable-in-code.html , https://learn.microsoft.com/en-us/answers/questions/522216/503-service-unavailable-while-executing-an-azure-f.html , https://learn.microsoft.com/en-us/answers/questions/328952/azure-durable-functions-timeout-error-in-activity.html and https://learn.microsoft.com/en-us/answers/questions/250623/azure-function-not-running-successfully.html
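One way to reduce the amount of data handled per invocation, in line with the queue output binding the question already adds, is to split the list of lat/long values into small batches and hand each batch to the queue as its own message. The following is only a sketch of the batching step: `make_batches`, the batch size, and the sample values are hypothetical, and writing the payloads to the queue and processing them in a separate queue-triggered function are not shown. Batches should stay small enough to fit in a single queue message (Azure Storage queue messages are limited to 64 KiB).

```python
import json


def make_batches(items, batch_size=50):
    """Split a list into JSON-encoded batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        json.dumps(items[start:start + batch_size])
        for start in range(0, len(items), batch_size)
    ]


# Hypothetical lat/long strings standing in for the PAGELATLONG column.
latlongs = ["51.5,-0.12", "48.85,2.35", "40.71,-74.0"]
payloads = make_batches(latlongs, batch_size=2)
```

Each entry in `payloads` is a JSON string that a worker could decode with `json.loads` and process independently, so the HTTP-triggered function only has to enqueue work and return.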

Not able to run blob trigger when published on azure functions

I have created a simple blob trigger in Visual Studio, for which __init__.py is as below
import logging
import azure.functions as func

def main(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {myblob.name}\n"
                 f"Blob Size: {myblob.length} bytes")
and function.json is as below
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "myblob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "mycontainer/{name}",
      "connection": "AzureWebJobsStorage"
    }
  ]
}
local.settings.json looks as below
{
  "IsEncrypted": false,
  "Values": {
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "AzureWebJobsStorage": "DefaultEndpointsProtocol=https; AccountName=****;AccountKey=*****;EndpointSuffix=core.windows.net"
  }
}
This code works fine with Visual Studio on my local machine. But when published to Azure, it cannot read the blob path from function.json and gives this error:
Invalid blob path specified : ''. Blob identifiers must be in the format 'container/blob'.
I published the function with the following command, which also pushes the contents of local.settings.json:
func azure functionapp publish FUNCTIONNAME --build-native-deps --publish-local-settings -i
Can anyone please guide me on what I am missing after publishing?

Are you using the Run button in the Azure portal to test your function? The way this works for blob triggers is that, in the 'Test' tab on the right-hand side, you can specify the name of the blob you want to manually send a trigger event for, forcing your function to run.
The idea is that you should edit the contents of the request body box and put in the path to a valid blob in your account. That way the trigger runs, finds the blob, and retrieves it. If you don't modify the request body box, it will look for a blob, fail to find it, and throw the 404 error.
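The empty path in the error message ('') is consistent with a blank request body. To make the expected 'container/blob' format concrete, here is a small sketch of such a check. `split_blob_path` is a hypothetical helper written for illustration, not part of the Functions runtime or the portal.

```python
def split_blob_path(path):
    """Split 'container/blob' into (container, blob), or raise ValueError."""
    container, sep, blob = path.strip().partition("/")
    if not sep or not container or not blob:
        raise ValueError(
            f"Invalid blob path {path!r}: expected the format 'container/blob'"
        )
    return container, blob
```

For the binding in the question, a value such as `mycontainer/somefile.txt` passes this check, while an empty string does not.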
Also, please take a look at the document below for configuring the container name:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob#storage-blob-trigger
Please also verify whether your settings have been published to the function app in the portal.
func azure functionapp publish "functionname" --publish-local-settings
Hope it helps.

Blob Trigger for Python Function App is not Firing

I am using Ubuntu 16.04.5 LTS local machine to create and publish Python Function App to Azure using CLI and Azure Functions Core Tools (Ref). I have configured Blob Trigger and my function.json file looks like this:
{
"disabled": false,
"scriptFile": "__init__.py",
"bindings": [
{
"name": "<Blob Trigger Name>",
"type": "blobTrigger",
"direction": "in",
"path": "<Blob Container Name>/{name}",
"connection": "<Connection String having storage account and key>"
},
{
"name": "outputblob",
"type": "blob",
"path": "<Blob Container Name>",
"connection": "<Connection String having storage account and key>",
"direction": "out"
}
]
}
My __init__.py function looks like this:
import logging
import azure.functions as func
# Legacy azure-storage-blob (v2.x) classes used below
from azure.storage.blob import BlockBlobService, PublicAccess

def main(<Blob Trigger Name>: func.InputStream, doc: func.Out[func.Document]):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {<Blob Trigger Name>.name}\n"
                 f"Blob Size: {<Blob Trigger Name>.length} bytes")
    logging.basicConfig(filename='example.log', level=logging.DEBUG)
    logging.debug('This message should go to the log file')
    logging.info('So should this')
    logging.warning('And this, too')
    # Write text to the file.
    file = open("QuickStart.txt", 'w')
    file.write("Hello, World!")
    file.close()
    # Create the BlockBlobService that is used to call the Blob service for the storage account
    block_blob_service = BlockBlobService(account_name='<Storage Account Name>', account_key='<Storage Account Key>')
    container_name = '<Blob Container Name>'
    # Set the permission so the blobs are public.
    block_blob_service.set_container_acl(container_name, public_access=PublicAccess.Container)
    # Upload the created file, use local_file_name for the blob name
    block_blob_service.create_blob_from_path(container_name, 'QuickStart.txt', '')
The Function App is "Always On" but when I upload a blob in the storage the function is not getting triggered. Another Reference Link is this (Ref).
What's going wrong?
Thanks and regards,
Shashank

Have you checked that the settings from local.settings.json (the connection strings for the storage accounts) are also configured in the function app in Azure? They are not published from the local machine by default.
You can configure them manually in the portal or use the --publish-local-settings flag:
func azure functionapp publish "functionname" --publish-local-settings
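App settings of a function app are exposed to the function as environment variables, so one rough way to spot a setting that was not published is to compare the names under "Values" in local.settings.json with the environment. The sketch below assumes exactly that. `missing_app_settings` and the sample data are hypothetical and only illustrate the comparison. Note also that the "connection" property in function.json is expected to hold the name of an app setting, not the connection string itself.

```python
import json
import os


def missing_app_settings(local_settings_text, environ=None):
    """Return the names under "Values" that are absent from the environment."""
    environ = os.environ if environ is None else environ
    values = json.loads(local_settings_text).get("Values", {})
    return sorted(name for name in values if name not in environ)


# Hypothetical file contents and environment, for illustration only.
example_settings = (
    '{"IsEncrypted": false, "Values": '
    '{"AzureWebJobsStorage": "x", "MyStorageConnection": "y"}}'
)
example_environ = {"AzureWebJobsStorage": "x"}
missing = missing_app_settings(example_settings, example_environ)
```

Here `missing` contains only `MyStorageConnection`, the one setting that exists locally but not in the simulated environment.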

I tried to reproduce this issue by creating a sample function app in Python using Visual Studio Code with the default template, and finally deployed it on Linux. It worked for me.
Here is the piece of code I have written in the Python file.
import logging
import azure.functions as func

def main(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {myblob.name}\n"
                 f"Blob Size: {myblob.length} bytes")
and here is the function.json file from my function app.
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "myblob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "samples-workitems/{name}",
      "connection": ""
    }
  ]
}
I am using Azure Functions 2.0, Python 3.6 and Azure Functions Core Tools version 2.2.70.
This is the reference link I used:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-create-first-function-python
Please try to use this and see if it helps.

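In the main function of your Python script, the second argument is doc: func.Out[func.Document], which is for Cosmos DB. Since the output binding is of type blob, this should be an output stream instead, and the parameter name should match the binding name in function.json (outputblob).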

How Do I get the Name of The inputBlob That Triggered My Azure Function With Python

I have an azure function which is triggered by a file being put into blob storage and I was wondering how (if possible) to get the name of the blob (file) which triggered the function, I have tried doing:
fileObject=os.environ['inputBlob']
message = "Python script processed input blob'{0}'".format(fileObject.fileName)
and
fileObject=os.environ['inputBlob']
message = "Python script processed input blob'{0}'".format(fileObject.name)
but neither of these worked; they both resulted in errors. Can I get some help with this, or some suggestions?
Thanks

The blob name can be captured via function.json and provided as binding data. See the {filename} token below.
function.json is language agnostic and works in all languages.
See documentation at https://learn.microsoft.com/en-us/azure/azure-functions/functions-triggers-bindings for details.
{
  "bindings": [
    {
      "name": "image",
      "type": "blobTrigger",
      "path": "sample-images/{filename}",
      "direction": "in",
      "connection": "MyStorageConnection"
    },
    {
      "name": "imageSmall",
      "type": "blob",
      "path": "sample-images-sm/{filename}",
      "direction": "out",
      "connection": "MyStorageConnection"
    }
  ]
}

If you want to get the name of the file that triggered your function, you can do that as follows.
Use {name} in function.json:
{
  "bindings": [
    {
      "name": "myblob",
      "type": "blobTrigger",
      "path": "MyBlobPath/{name}",
      "direction": "in",
      "connection": "MyStorageConnection"
    }
  ]
}
The function will be triggered by changes in your blob storage.
Get the name of the file that triggered the function in Python (__init__.py):
import azure.functions as func

def main(myblob: func.InputStream):
    filename = myblob.name
This gives you the name of the file that triggered your function.

Your description does not say which trigger you used. Fortunately, there is a sample project on GitHub, yokawasa/azure-functions-python-samples, for Azure Functions using Python, which includes many samples using different triggers such as the queue trigger or the blob trigger. I think it will be very helpful for you, and you can refer to these samples to write your own function to satisfy your needs.
Hope it helps.

Getting the name of the inputBlob is not currently possible with Python Azure-Functions. There are open issues about it in azure-webjobs-sdk and azure-webjobs-sdk-script GitHub:
https://github.com/Azure/azure-webjobs-sdk/issues/1090
https://github.com/Azure/azure-webjobs-sdk-script/issues/1339

Unfortunately it's still not possible.
In Python, you can do:
import azure.functions as func
import os

def main(blobin: func.InputStream):
    filename = os.path.basename(blobin.name)
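The basename approach can be tried without a function app. The sketch below assumes the trigger's blob name has the form container/path/to/file. The helper names and the sample value are hypothetical. It uses posixpath because blob names always use forward slashes, regardless of the operating system the code runs on.

```python
import posixpath


def blob_file_name(blob_name):
    """Return the last path segment of a name like 'container/dir/file.csv'."""
    return posixpath.basename(blob_name)


def blob_container(blob_name):
    """Return the first path segment, assumed here to be the container."""
    return blob_name.split("/", 1)[0]


# Hypothetical value standing in for blobin.name.
sample_name = "sample-images/2024/cat.png"
file_name = blob_file_name(sample_name)
container = blob_container(sample_name)
```

For the sample value, `file_name` is `cat.png` and `container` is `sample-images`.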
