I am working with Azure Functions 2.0, the Python 3.7 runtime and Azure Functions Tools 2.0.
I have a simple blob-triggered function that reads a file containing different URLs (www.xxx.com/any, ...) and scrapes them using the requests and beautifulsoup4 libraries.
The app is not on a "shared" plan; it runs on a dedicated App Service Plan.
The overall timeout is set to 15 minutes (in the host.json file).
Sometimes scraping a URL takes a long time, and I have observed this bad behavior:
The function goes into timeout for blob XXXX.
The function is then invoked again for blob XXXX; it seems the previous run counts as unsuccessful and the runtime re-executes the function.
Questions:
- How can I specify that a blob can trigger only one function execution? I would rather not have to write an external check.
- Do you have any suggestions for limiting request timeouts in a Python function? I have tried the standard requests timeout and eventlet.Timeout without success.
Thanks
Ra
A blob trigger will run for every blob that doesn't have a receipt. There is an issue on GitHub about this topic, but it concerns timestamps on a blob and is not quite what you are after:
https://github.com/Azure/azure-webjobs-sdk/issues/1327
Regarding timeouts in Functions, this is by design. You could check out Durable Functions.
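For the second question (capping how long each scrape can take), here is a minimal sketch of a per-request timeout with requests; scrape_one and its error handling are illustrative, not taken from the question:

import requests
from bs4 import BeautifulSoup

def scrape_one(url):
    try:
        # timeout=(connect, read) in seconds: fail fast if a site hangs,
        # so one slow URL cannot eat the whole functionTimeout budget.
        resp = requests.get(url, timeout=(5, 30))
        resp.raise_for_status()
    except requests.exceptions.RequestException:
        return None  # log and skip instead of letting the invocation time out
    return BeautifulSoup(resp.text, "html.parser")

Note that the read timeout applies to each socket read rather than to the whole response, so a slow but steadily streaming server can still take longer in total; splitting the URL list across invocations, or Durable Functions as suggested above, is the more robust fix.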
I have recently started exploring Azure Container Apps for a microservice.
I have set the minimum number of replicas to 0 and the maximum to 10.
I am using a queue trigger input binding, so that whenever a message arrives in the queue it is processed.
I expected it to work like a Function App, where the container is invoked on the incoming trigger. However, what I have observed is that the trigger does not get processed under the conditions described above.
If I change the minimum replicas to 1, the trigger gets processed as it would in a Function App. But that doesn't make it a serverless service, as one instance is on all the time and is costing me money (I am also unable to find out how much it costs in the idle state).
Can someone please tell me whether I have understood Container Apps correctly, and is there a way to invoke the container only when a message arrives in the queue?
The scenario you are describing is what is supported with ScaledJobs in KEDA, rather than ScaledObject (which targets daemon-like workloads).
ScaledJobs, however, are not supported in Azure Container Apps yet; this is tracked on GitHub.
Based on the example in the documentation, you can scale from 0 for an Azure Storage queue using the KEDA scaler.
I have a Python program that generates data; it may rely on web services, which slows down the process.
I would like to expose this program as a web service that returns a file to download (Content-Disposition: attachment).
I thought the easiest and cheapest option would be to go serverless.
I have quota on Azure, so I want to try Azure Functions.
Azure HTTP Python functions must return a func.HttpResponse() object that takes the response body as a parameter.
However, I would rather not generate the whole file in memory, or in a temporary file, before returning the response.
The files could be big, and file creation can also be slow.
The file could perfectly well be transferred back to the user as it is constructed (i.e. return an HTTP response with chunked encoding).
That would mean less waiting for people calling the service, and I believe it would reduce costs on the function side (less memory used during concurrent calls).
Is this possible with an Azure HTTP-triggered function? If not with Azure, is it with GCP or AWS functions?
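For context, this is roughly what an HTTP-triggered Python function looks like; the whole payload has to be materialized before func.HttpResponse is constructed (generate_file is a placeholder for the slow generation step described above):

import azure.functions as func

def generate_file() -> bytes:
    # Placeholder for the slow, webservice-backed data generation.
    return b"col1,col2\n1,2\n"

def main(req: func.HttpRequest) -> func.HttpResponse:
    body = generate_file()  # fully built in memory before the response exists
    return func.HttpResponse(
        body=body,
        mimetype="text/csv",
        headers={"Content-Disposition": "attachment; filename=result.csv"},
    )

The question is whether this body can instead be streamed to the caller as it is produced.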
Thank you MiniScalope. Posting your comment as an answer to help other community members.
As per the comment, you have found that it's only possible with C# as of now.
You can refer to Out of proc languages stream support
First, it looks like this thread but it is not: An unknown error has occurred in Cloud Function: GCP Python
I have deployed Cloud Functions a couple of times and they are still working fine. Nevertheless, since last week, following the same procedure, I can deploy correctly, but when I test them I get the error "An unknown error has occurred in Cloud Functions. The attempted action failed. Please try again, send feedback".
Run on my own machine, the script works perfectly and writes to Cloud Storage.
My Cloud Function is a zip containing a Python script that loads a CSV into Cloud Storage.
The CSV weighs 160 kB and the Python script 5 kB, so I allocated 128 MiB of memory.
The execution time is 38 secs, almost half of the default timeout.
It is configured to allow only traffic within the project.
Environment variables are not the problem.
It's triggered by Pub/Sub, and what I want is to schedule it once I can get it to work.
I'm quite puzzled. I'm so short of ideas right now that I have started to think everything works fine and it is Google's testing method that fails... Nevertheless, when I run the Pub/Sub topic from Cloud Scheduler it produces the error log without much more information. Has anyone by any chance had the same problem?
Thanks
Answer from my past self:
Finally "solved". I am processing a 160 kB CSV in the Cloud Function; on my computer the execution takes 38 seconds. For some reason, in the Cloud Function I need 512 MB of allocated memory and a timeout larger than 60 seconds.
Answer from my more recent past self:
Don't test a Cloud Function using the test button, because sometimes it takes longer than the maximum available timeout to finish, so you'll get errors.
If you want to test it easily:
- Write prints after milestones in your code to check how the script is evolving.
- Use the logs interface; the prints will be displayed there ;)
- Logs also show valuable info (sometimes even readable).
- If you're sending output to buckets, for example, check them after the Cloud Function has finished; you might get a surprise.
To sum up, don't trust the testing button blindly.
Answer from my present self (already regretting the prints thing):
There are nice Python logging libraries; don't use prints for that (if you have time).
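A minimal sketch of that idea for a Pub/Sub-triggered function, using the standard logging module instead of print; the bucket and object names are illustrative, not taken from the question:

import base64
import logging

from google.cloud import storage

def process_event(event, context):
    # 'event' carries the base64-encoded Pub/Sub payload, 'context' its metadata.
    logging.info("Received message id=%s", context.event_id)
    payload = base64.b64decode(event.get("data", b"")).decode("utf-8")

    client = storage.Client()
    bucket = client.bucket("my-output-bucket")  # illustrative bucket name
    bucket.blob("output/result.csv").upload_from_string(payload)
    logging.info("Milestone: csv written to Cloud Storage")

Everything sent through logging (and through print, for that matter) shows up in the logs interface mentioned above, which is far more reliable than judging success from the test button.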
I am interested in implementing a compute service, in the cloud, for an application I'm working on. The idea is that there are three modules in the service: a compute manager that receives requests (with input data) and triggers Azure Function computes (the computes being the second 'module'). Both modules share the same blob storage for the scripts to be run and for the input/output data (JSON) of the compute.
I want to draw up a basic diagram, but I need to understand a few things first. Is what I described above possible, or must Azure Functions have their own separate storage? Can Azure Functions run concurrent executions of the same script with different data?
I'm new to Azure, so what I've been learning about Azure Functions hasn't yet answered my questions. I'm also unsure how to minimise cost; the functions won't run often.
I hope someone could shed some light on this for me :)
Thanks
In fact, Azure Functions itself has many kinds of triggers, for example the HTTP trigger, storage triggers, or the Service Bus trigger.
So I think you can do without your compute manager if one of the built-in triggers meets your requirements.
At the same time, all functions can share the same storage account; you just need to use the correct storage account connection string, as sketched below.
Finally, as your functions will not run often, I suggest you use the Azure Functions Consumption plan. With the Consumption plan, instances of the Azure Functions host are dynamically added and removed based on the number of incoming events.
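As a hedged sketch of the shared-storage point: any number of functions (or concurrent executions of the same one) can work against the same account as long as they use the same connection string; the container, blob names and job layout below are illustrative:

import json
import os

from azure.storage.blob import BlobServiceClient

def run_compute(job_id: str) -> None:
    # The same connection string (stored as an app setting) can be shared by every function.
    service = BlobServiceClient.from_connection_string(os.environ["SHARED_STORAGE_CONNECTION"])
    container = service.get_container_client("compute-io")  # illustrative container

    # Read the input staged by the compute manager...
    data = json.loads(container.download_blob(f"input/{job_id}.json").readall())

    # ...and write the result back to the same shared storage.
    result = {"job": job_id, "echo": data}
    container.upload_blob(f"output/{job_id}.json", json.dumps(result), overwrite=True)

Concurrent executions with different data are simply separate invocations of the same code, each with its own job_id.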
I'm trying to download a large VHD file (30GB) from Azure Blob Storage using the following code:
blob_service.get_blob_to_path('vhds', '20161206092429.vhd', '20161206092429.vhd')
where the first parameter is the container name, the second the blob name, and the third the local file/path where it will be saved. This 30GB download was working normally, but all of a sudden I started receiving this error:
AzureHttpError: The condition specified using HTTP conditional header(s) is not met.
ConditionNotMet: The condition specified using HTTP conditional header(s) is not met.
RequestId:88b6ac24-0001-0001-5ec0-4f490d000000
Time:2016-12-06T12:57:13.5389237Z
The download now runs OK for some random amount of time: sometimes a really short time, sometimes a long one, even up to 9 or 10 GB of the full 30 GB download.
According to these questions:
Azure Blob: "The condition specified using HTTP conditional header(s) is not met"
304: The condition specified using HTTP conditional header(s) is not met
It seems to be a race condition, but that doesn't help much in solving the issue without diving into the SDK code. Any suggestions on what could be causing this, given that the download was working previously? Maybe an outage in the Azure cloud?
As a VHD changes, its associated ETag changes. Once that happens, an in-progress copy operation is no longer valid. I believe this is what you're seeing via your call to blob_service.get_blob_to_path(), since your VHD is attached to a running VM. And even if the VM is idle, a running OS is never really idle: there are always background operations, some of which write to disk.
Not that it guarantees a successful file-copy operation, but you would need to shut down the VM before initiating the copy.
Alternatively, you can take a snapshot of the VHD and copy from the snapshot instead of the original VHD (which lets you keep using your VHD during the copy operation), as sketched below.
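A hedged sketch of the snapshot approach, assuming the legacy azure-storage SDK the question appears to use (snapshot_blob plus the snapshot parameter of get_blob_to_path):

from azure.storage.blob import PageBlobService

blob_service = PageBlobService(account_name="myaccount", account_key="...")  # placeholders

# A snapshot is read-only, so its ETag can no longer change and the
# conditional-header check cannot fail halfway through the download.
snapshot = blob_service.snapshot_blob('vhds', '20161206092429.vhd')

blob_service.get_blob_to_path(
    'vhds',
    '20161206092429.vhd',
    '20161206092429.vhd',
    snapshot=snapshot.snapshot,  # download the frozen snapshot, not the live VHD
)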
If you're creating your blob service with an sas_token, it may only have been set to last for an hour. If that's the case, you can set the token's expiry to a later point in time when you create it.
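If an expiring SAS turns out to be the cause, here is a sketch (same legacy SDK assumed, method and parameter names worth double-checking) of issuing a longer-lived read-only token when the service object is created:

from datetime import datetime, timedelta
from azure.storage.blob import BlobPermissions, PageBlobService

# Generate a read-only SAS that comfortably outlives a 30 GB download.
account = PageBlobService(account_name="myaccount", account_key="...")  # placeholders
sas = account.generate_blob_shared_access_signature(
    'vhds',
    '20161206092429.vhd',
    permission=BlobPermissions.READ,
    expiry=datetime.utcnow() + timedelta(hours=12),
)

blob_service = PageBlobService(account_name="myaccount", sas_token=sas)
blob_service.get_blob_to_path('vhds', '20161206092429.vhd', '20161206092429.vhd')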