Just for reference I am coming from AWS so any comparisons would be welcome.
I need to create a function which detects when a blob is placed into a storage container and then downloads the blob to perform some actions on the data in it.
I have created a storage account with a container in it, and a function app with a Python function in it. I have then set up an Event Grid topic and subscription so that blob creation events trigger the function, and I can verify that this is working. This gives me the URL of the blob, which looks something like https://<name>.blob.core.windows.net/<container>/<blob-name>. However, when I then try to download this blob using BlobClient, I get various errors about not having the correct authentication or key. Is there a way in which I can just allow the function to access the container, in the same way that in AWS I would give a Lambda an execution role with S3 permissions, or do I need to create some key to pass through somehow?
Edit: I need this to run ASAP when the blob is put in the container, so as far as I can tell I need to use Event Grid triggers rather than the normal blob triggers.
I need to create a function which detects when a blob is placed into a storage container and then downloads the blob to perform some actions on the data in it.
This can be achieved by using an Azure Blob storage trigger for Azure Functions.
The Blob storage trigger starts a function when a new or updated blob is detected. The blob contents are provided as input to the function.
This last sentence, "The blob contents are provided as input to the function", means the blob can be an input parameter to the Function. This way, there's no (or less) need for you to download it manually.
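As a sketch of what that binding looks like for a Python function (the container name and parameter name here are placeholders, not taken from the question):

```json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "myblob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "samples-container/{name}",
      "connection": "AzureWebJobsStorage"
    }
  ]
}
```

The blob is then passed into the function as the `myblob` parameter (an `azure.functions.InputStream`), so you read its contents rather than downloading it yourself.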
Is there a way in which I can just allow the function to access the container in the same way that in AWS I would give a lambda an execution role with S3 permissions
Have a look at Using Managed Identity between Azure Functions and Azure Storage.
EDIT
Have I understood correctly that the normal blob trigger can have up to 10 minutes of delay?
This is correct, a Blob trigger could have up to 10 minutes of delay before it actually triggers the Function. The second part of the answer still stands, though.
The answer lay somewhere between @rickvdbosch's answer and Abdul's comment. I first had to assign an identity to the function, giving it permission to access the storage account. Then I was able to use the azure.identity.DefaultAzureCredential class to automatically handle the credentials for the BlobClient.
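A minimal sketch of that resolution (the helper names are mine, and it assumes the function app's managed identity has a data-plane role such as "Storage Blob Data Reader" on the storage account):

```python
from urllib.parse import urlparse

def parse_blob_url(url):
    """Split https://<account>.blob.core.windows.net/<container>/<blob>
    into (account, container, blob) parts."""
    parsed = urlparse(url)
    account = parsed.netloc.split(".")[0]
    container, _, blob = parsed.path.lstrip("/").partition("/")
    return account, container, blob

def download_blob(url):
    # SDK imports live inside the function so the parsing helper above
    # can be used without azure-identity / azure-storage-blob installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobClient

    # DefaultAzureCredential picks up the managed identity at runtime,
    # so no key or connection string is passed explicitly.
    client = BlobClient.from_blob_url(url, credential=DefaultAzureCredential())
    return client.download_blob().readall()
```

This mirrors the AWS pattern from the question: the managed identity plays the role of the Lambda execution role, and no secret is stored in the function.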
Related
I have created an Azure function which is triggered when a new file is added to my Blob Storage. This part works well!
BUT now I would like to start the "Speech-To-Text" Azure service using the API. So I try to create my URI leading to my new blob and then add it to the API call. To do so, I created a SAS token (from the Azure Portal) and appended it to my new blob path.
https://myblobstorage...../my/new/blob.wav?[SAS Token generated]
By doing so I get an error which says :
Authentification failed Invalid URI
What am I missing here?
N.B.: When I generate the SAS token manually from Azure Storage Explorer, everything works well. Also, my token was not expired in my test.
Thank you for your help!
You might have generated the SAS token with the wrong resource type selected.
Make sure the Object option is checked.
Here is the reason in docs:
Service (s): Access to service-level APIs (e.g., Get/Set Service Properties, Get Service Stats, List Containers/Queues/Tables/Shares)
Container (c): Access to container-level APIs (e.g., Create/Delete Container, Create/Delete Queue, Create/Delete Table, Create/Delete Share, List Blobs/Files and Directories)
Object (o): Access to object-level APIs for blobs, queue messages, table entities, and files (e.g., Put Blob, Query Entity, Get Messages, Create File, etc.)
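Another frequent cause of an "Invalid URI" (a guess, not confirmed by the question) is how the token is joined to the URL: a token copied with a leading "?" or appended to a URL that already has a query string produces a malformed URI. A small sketch (the helper name is mine) that composes the URL safely:

```python
def blob_url_with_sas(blob_url, sas_token):
    """Append a SAS token to a blob URL, tolerating a token that was
    copied with a leading '?' and a URL that already has a query string."""
    token = sas_token.lstrip("?")
    separator = "&" if "?" in blob_url else "?"
    return f"{blob_url}{separator}{token}"
```

For example, `blob_url_with_sas("https://acc.blob.core.windows.net/c/b.wav", "?sv=...&sig=...")` yields a single, well-formed "?" in the result.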
I want to use 2 blob containers to trigger that Azure function. Is there also a way to recognize which blob container triggered the Azure function? Please help. Thank you! (Python)
There are no plans to support multiple triggers per Function.
Each function has only one trigger but it can have multiple input bindings.
For your need, have your blob uploads trigger an Event Grid event, and have an Event Grid triggered function which is fired for each uploaded blob.
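That also answers the "which container triggered me" part: the Event Grid event's subject identifies the container, in the documented form /blobServices/default/containers/<container>/blobs/<path>. A small sketch (the helper name is mine) of extracting both:

```python
def container_and_blob_from_subject(subject):
    """Extract (container, blob path) from an Event Grid blob event subject,
    e.g. /blobServices/default/containers/container-a/blobs/dir/file.csv"""
    _, _, remainder = subject.partition("/containers/")
    container, _, blob = remainder.partition("/blobs/")
    return container, blob
```

Inside an Event Grid triggered Python function you would pass `event.subject` to this helper and branch on the returned container name.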
My Python Azure Function configuration file (function.json) defines a Blob storage trigger.
When the Azure Function wakes up (i.e. the server shown in Live Metrics comes online after some sleeping time), it processes all the existing blobs, regardless of which ones have already generated trigger events.
I noticed that the azure-webjobs-hosts/blobreceipts folder is populated with sandboxhost637nnn folders. A new sandboxhost folder is created on every Azure Function wake-up. This way the function forgets previously processed blobs (no old receipts found). In the past (Dec ’19), I remember that a single webjobs-hosts folder containing all receipts persisted across invocations.
To explain why this happens:
Irrespective of your environment, the blob trigger, by design, keeps track of new and updated blobs by maintaining blob receipts in the azure-webjobs-hosts container. These receipts are correlated by their **eTags** to the **host ID** of your Functions runtime.
When your function wakes up, your **host ID** changes, and the **eTag -> host ID** correlations you previously had no longer apply, which results in the new host re-processing all of your existing blobs -- the behavior you've observed.
The recommendation is to use an Event Grid trigger instead, or to run your function app on an Azure App Service plan, which will be costlier.
Additional reference:
https://github.com/Azure/azure-webjobs-sdk/issues/1327
Hope it helps.
I am not sure what happened in your case, but this problem should not occur any more.
This is an old problem, and the SDK has been updated.
Blob triggers are processed only once now; have a look at this:
https://azure.microsoft.com/zh-cn/blog/announcing-the-0-5-0-beta-preview-of-microsoft-azure-webjobs-sdk/
Maybe your problem is a new one, but then you need to provide your logs and the SDK version you used, and we can help you solve it. Let me know if you have an update later.
I've stored some records in my Datastore console. Now I want to manipulate them to optimize Big Data analytics.
How can I write a Python cloud routine to make some transformations on my data? Can I trigger it with Datastore events?
Thanks a lot.
I have written a little code myself. You can find it on GitHub.
What it is doing:
It is an HTTP Cloud Function
It establishes a connection to Google Cloud Datastore through the Datastore client
It updates the value of a specific entry in the entity, using the ID number and the entry's column name
How to use:
Test this Cloud Function to get an idea of how it works, then adapt it to your needs. I have tested it myself and it is working.
Create an HTTP trigger Python Cloud Function.
Set the name to updateDatastore
Copy and paste the code from GitHub.
Add this line google-cloud-datastore to the requirements.txt file.
In the main code, assign your entity's kind value to ENTITY_KIND
In the main code, assign your entity's key value to ENTITY_KEY
When you open the HTTP trigger URL, the current time will be written to the column after your Cloud Function executes.
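The steps above can be sketched as follows. This is a hedged sketch, not the exact code from the linked GitHub repo; ENTITY_KIND, ENTITY_KEY, and the "updated" property name are placeholders you must replace:

```python
from datetime import datetime, timezone

ENTITY_KIND = "Task"        # placeholder: your entity's kind
ENTITY_KEY = 1234567890     # placeholder: your entity's numeric ID

def update_datastore(request):
    # The Datastore import lives inside the handler so the sketch can be
    # read without the google-cloud-datastore package installed.
    from google.cloud import datastore

    client = datastore.Client()
    key = client.key(ENTITY_KIND, ENTITY_KEY)
    entity = client.get(key)
    if entity is None:
        return "Entity not found", 404

    # Write the current time into one property ("column") of the entity.
    entity["updated"] = datetime.now(timezone.utc).isoformat()
    client.put(entity)
    return "OK"
```

Deploying this as an HTTP-triggered Cloud Function (with google-cloud-datastore in requirements.txt) reproduces the behavior described above: each request stamps the current time on the chosen entity.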
I'm using a queue trigger to pass in some data about a job that I want to run with Azure Functions (I'm using Python). Part of the data is the name of a file that I want to pull from blob storage. Because of this, declaring a file path/name in an input binding doesn't seem like the right direction, since the function won't have the file name until it gets the queue trigger.
One approach I've tried is to use the azure-storage sdk, but I'm unsure of how to handle authentication from within the Azure Function.
Is there another way to approach this?
In function.json, the blob input binding can refer to properties from the queue payload. The queue payload needs to be a JSON object.
Since this is function.json, it works for all languages.
See official docs at https://learn.microsoft.com/en-us/azure/azure-functions/functions-triggers-bindings
For example, in your function.json:
{
"name": "imageSmall",
"type": "blob",
"path": "container/{filename}"
}
And if your queue message payload is:
{
"filename" : "myfilename"
}
Then the {filename} token in the blob's path expression will be substituted with "myfilename".
Typically, you store connection strings / account keys in the App Settings of the Function App, and then read them by accessing environment variables. I haven't used Python in Azure, but I believe that looks like
connection = os.environ['ConnectionString']
I've found one example of python function which does what you ask for: queue trigger + blob operation.
Storing secrets can (also) be done using App Settings.
In Azure, go to your Azure Functions App Service, then click "Application Settings" and scroll down to the "App Settings" list. This list consists of key-value pairs. Add your key, for example MY_CON_STR, with the actual connection string as the value.
Don't forget to click Save at this point.
Now, in your application (your Function for this example), you can load the stored value using its key. For example, in python, you can use:
os.environ['MY_CON_STR']
Note that since the setting isn't saved locally, you have to execute this from within Azure. Unfortunately, Azure Functions applications do not contain a web.config file.
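For example, reading the MY_CON_STR setting defensively (MY_CON_STR is the example key from the answer above):

```python
import os

# os.environ.get with a default avoids a KeyError when the code runs
# outside Azure, where the App Setting does not exist as an env var.
connection_string = os.environ.get("MY_CON_STR", "")
if not connection_string:
    print("MY_CON_STR is not set - running outside Azure?")
```

Using `.get` with a default also makes local development easier: you can fall back to a local emulator connection string when the App Setting is absent.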