Azure Function with Blob Trigger times out - Python

I am having some trouble understanding how my Function App works.
My environment is as follows: Python 3.8, Blob Trigger, Consumption plan.
I am creating an application that is triggered when an audio file is uploaded into a container. The audio file triggers an Azure Function that runs speech-to-text using the Azure Cognitive Services Speech service (so my function spends most of its time waiting for an answer from that service). I set FUNCTIONS_WORKER_PROCESS_COUNT to 5 so that each of my Function App instances can run several speech-to-text analyses in parallel.
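For illustration, a minimal sketch of the kind of blob-triggered function described above (assuming the v1 Python programming model and the azure-cognitiveservices-speech SDK; SPEECH_KEY/SPEECH_REGION are placeholder app settings, and recognize_once only covers a single utterance, so long audio would need continuous or batch recognition):

```python
import logging
import os

import azure.functions as func
import azure.cognitiveservices.speech as speechsdk


def main(myblob: func.InputStream):
    # Write the uploaded audio to a local temp file for the Speech SDK.
    local_path = os.path.join("/tmp", os.path.basename(myblob.name))
    with open(local_path, "wb") as f:
        f.write(myblob.read())

    speech_config = speechsdk.SpeechConfig(
        subscription=os.environ["SPEECH_KEY"],   # placeholder app setting
        region=os.environ["SPEECH_REGION"],      # placeholder app setting
    )
    audio_config = speechsdk.AudioConfig(filename=local_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    # The invocation blocks here until the Speech service answers;
    # recognize_once only handles the first utterance of the audio.
    result = recognizer.recognize_once()
    logging.info("Transcribed %s: %s", myblob.name, result.text)
```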
So I uploaded 100 blobs into my container to check my function's behaviour, and here is what I get:
The Function App is triggered and starts several servers (5 for 100 blobs), then processes 1 blob per server, until more than 30 minutes have passed since I uploaded the blobs and I get a timeout.
But I was expecting this behaviour:
The Function App is triggered and starts several servers. Each server processes 5 blobs in parallel and gives me an answer for all of my blobs in 15 to 20 minutes!
So there are two things I don't get here.
Why are my functions processing 1 blob per server instead of 5 blobs per server? (I set FUNCTIONS_WORKER_PROCESS_COUNT to 5.)
Also, my blobs seem to be processed as soon as they appear in the container instead of being put in a queue, and this behaviour is responsible for the timeout, since they wait for quite a long time instead of being processed. Why?
I hope that was clear.
Thank you for your help !
EDIT: I just added 100 more blobs to see how the Function App reacts, and my freshly uploaded blobs are being processed before the ones that I uploaded at the beginning.

1. For your first question:
As far as I understand, Python is a single-threaded runtime:
Because Python is a single-threaded runtime, a host instance for Python can process only one function invocation at a time. For applications that process a large number of I/O events and/or are I/O bound, you can improve performance significantly by running functions asynchronously.
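For example, here is a minimal sketch of an asynchronous blob-triggered function (assuming the v1 Python programming model; transcribe is a stand-in for the blocking Speech call, not code from the question):

```python
import asyncio
import logging

import azure.functions as func


def transcribe(audio_bytes: bytes) -> str:
    # Placeholder for the blocking speech-to-text call; in the real
    # function this is where the Cognitive Services SDK would be used.
    return "transcript"


async def main(myblob: func.InputStream):
    loop = asyncio.get_running_loop()
    # Run the blocking call in a thread pool so the worker's single
    # event loop stays free to pick up other invocations while waiting.
    transcript = await loop.run_in_executor(None, transcribe, myblob.read())
    logging.info("Finished %s: %s", myblob.name, transcript)
```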
It is true that FUNCTIONS_WORKER_PROCESS_COUNT lets you run 5 blob-triggered function invocations per host, but they run one at a time if they compete for the same resource. In other words, even though 5 worker processes can exist at the same time, if the first process is busy running your function, the second process (running the same function) has to wait; if the first process is waiting for data to come in, the second process can run first.
Here is an article about how FUNCTIONS_WORKER_PROCESS_COUNT works.
You can also check how many worker instances you are using. If you have 100 blobs triggering your function and 5 worker processes per worker instance, it should start up to 20 instances to consume the requests. (Feel free to correct me if I'm wrong.)
2. For your second question:
The Blob storage trigger starts a function when a new or updated blob
is detected.
That's how blob triggered function works.

Related

Cloud Run suddenly starts timing out when processing any request

We've been running a backend application on Cloud Run for about a year and a half now, and a month ago it suddenly stopped properly handling all requests at seemingly random times (about every couple of days), only working again once we redeploy from the latest image from Cloud Build. The application will actually receive the request; however, it just doesn't do anything, and eventually the request times out (504) after 59m59s (the max timeout). Even a test endpoint that just returns 'Hello World' times out without sending a response.
The application is written in Python and uses Flask to handle requests. We have a Cloud SQL instance that is used as its database, however we're confident this is not the source of the issue as even requests that don't involve the DB in any form do not work and the Cloud SQL instance is accessible even when the application stops working. Cloud Run is deployed with the following configuration:
CPU: 2
Memory: 8Gi
Timeout: 59m59s
VPC connector
VPC egress: private-ranges-only
Concurrency: 100
The vast majority of endpoints should produce some form of log when they first start, so we're confident that the application isn't executing any of the code after being triggered. We're not seeing any useful error messages in Logs Explorer either, simply just 504 errors from the requests timing out. It's deployed with a 59m59s timeout, so it's not the case that the timeout has been entered incorrectly and even then, that wouldn't explain why it works again when it's redeployed.
We have a Cloud Scheduler schedule that triggers the application every 15 minutes, which sends a request to an endpoint in the application that checks if any tasks are due to run and creates Cloud Tasks tasks (which send HTTP requests to an endpoint on the same application) for any tasks that need performing at that point in time. Every time the application stops working, it does seem to be during one of these runs; however, we're not certain it's the cause, as the Cloud Scheduler schedule is the most frequent trigger anyway. There doesn't seem to be a specific time of day that the crashes take place either.
This is a (heavily redacted) screenshot of the logs. The Cloud Scheduler schedule hits the endpoint at 21:00 and creates a number of tasks but then hits the default 3m Cloud Scheduler timeout limit at 21:03. The tasks it created then hit the default 10m Cloud Tasks timeout limit at 21:10 without their endpoint having done anything. After that point, all requests to the service timeout without doing anything.
The closest post I could find on SO was this one; their problem is also temporarily fixed by redeployment, but ours isn't sending 200 responses when it stops working and is instead just timing out without doing anything. We've tried adding retries to Cloud Scheduler and increasing its timeout limit, and we've also tried increasing the CPU and RAM allocation.
Any help is appreciated!

Can Azure Container Apps scale from 0 to more, like Azure Functions?

I have recently started exploring Azure Container Apps for a microservice.
I have kept the minimum number of replicas at 0 and the maximum at 10.
I am using a queue trigger input binding, so that whenever a message arrives in the queue it is processed.
I expected it to work like a Function App, where the container would be invoked on the input trigger. However, what I have observed is that the trigger does not get processed under the conditions described above.
If I change the minimum replicas to 1, then the trigger gets processed, like a Function App. But this doesn't make it a serverless service, as one instance is on all the time and costing me money (I'm also unable to find out how much it costs in the idle state).
Can someone please guide me on whether I have understood Container Apps correctly, and is there a way to invoke the container only when a message arrives in the queue?
The scenario that you are describing is what we support with ScaledJobs in KEDA, rather than ScaledObjects (daemon-like workloads).
ScaledJobs, however, are not yet supported in Azure Container Apps; this is tracked on GitHub.
Based on the example in the documentation, you can scale from 0 for an Azure Storage queue using the KEDA scaler.

Heroku - downloaded files are taking over 30 seconds to process with AWS

So I am using AWS as the cloud for my website. Its main purpose is to be a storage unit (S3). Everything works great until I have a large file (5 MB or 7 MB) that exceeds Heroku's 30-second request limit and triggers an H12 error.
s3.Object(BUCKET, file_full_name).put(Body=file_to_put)
The problem starts there: this is where I write the file to the cloud, and because it takes too long to write, the site keeps trying to load the page and never does. file_to_put is of type bytes. How can I fix this so I can upload larger files to the cloud?
Note: I also need to read the file, but first I need to fix this problem.
Backend framework: Flask
This is where worker process types and task queues come in (so you can use Celery + Redis with Flask, or something similar).
Basically, you queue up the task of writing the file in a task queue (say, Redis), and your web process returns 200 OK to the website visitor immediately. In the meantime, your worker process picks the task off the queue and starts performing the time-consuming work (writing the file to S3).
On the front-end, you'll have to ask the visitor to "come back after some time", or show a wait "spinner" or something else that indicates the file is not available yet. Once the file is written, you can send a signal to refresh the page, or use JavaScript on the page to check whether the file is ready (say, every second), or simply ask the visitor to refresh the page after a minute or so.
I know all this might sound complicated, but this is the way it is done. Your web process shouldn't be waiting on long running tasks.
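A minimal sketch of that pattern, assuming Celery with a Redis broker; the route, task name and BUCKET value are illustrative, not taken from the original post:

```python
import base64

import boto3
from celery import Celery
from flask import Flask, request

BUCKET = "my-bucket"  # placeholder bucket name
app = Flask(__name__)
celery = Celery(__name__, broker="redis://localhost:6379/0")
s3 = boto3.resource("s3")


@celery.task
def upload_to_s3(bucket, key, body_b64):
    # The slow write now happens in a worker process, not in the web dyno,
    # so the HTTP request is never held open past Heroku's 30 s limit.
    # The body is base64-encoded because Celery's default JSON serializer
    # cannot carry raw bytes.
    s3.Object(bucket, key).put(Body=base64.b64decode(body_b64))


@app.route("/upload", methods=["POST"])
def upload():
    f = request.files["file"]
    # Queue the task and return immediately.
    upload_to_s3.delay(BUCKET, f.filename, base64.b64encode(f.read()).decode())
    return "Upload queued", 202
```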

How can I schedule or queue api calls to maintain rate limit?

I am trying to continuously crawl a large amount of information from a site using the REST API they provide. I have the following constraints:
Stay within the API limit (5 calls/sec)
Utilise the full limit (make exactly 5 calls per second, i.e. 5*60 calls per minute)
Each call will use different parameters (params will be fetched from a DB or an in-memory cache)
Calls will be made from AWS EC2 (or GAE), and the processed data will be stored in AWS RDS/DynamoDB
For now I am just using a scheduled task that runs a Python script every minute; the script makes 10-20 API calls, processes the responses, and stores the data in the DB. I want to scale this procedure up (to 5*60 = 300 calls per minute) and make it manageable via code (pushing new tasks, pausing/resuming them easily, monitoring failures, changing call frequency).
My question is: what are the best available tools to achieve this? Any suggestion/guidance/link is appreciated.
I do know the names of some task-queuing frameworks like Celery/RabbitMQ/Redis, but I do not know much about them. However, I am willing to learn one or all of them if they are the best tools to solve my problem; I'd like to hear from SO veterans before jumping in ☺
Also, please let me know if there's any other AWS service I should look at using (SQS or AWS Data Pipeline?) to make any step easier.
You needn't add an external dependency just for rate-limiting, as your use case is rather straightforward.
I can think of two options:
Modify the script (that currently wakes up every minute and makes 10-20 API calls) to wake up every second and make 5 calls (sequentially or in parallel).
In your current design, your API calls might not be properly distributed across 1 minute, i.e. you might be making all your 10-20 calls in the first, say, 20 seconds.
If you change that script to run every second, your API call rate will be more balanced.
Change your Python script into a long-running daemon, and use a rate limiter library, such as this. You can configure the latter to make 1 call per x seconds.
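For illustration, a minimal sketch of that long-running loop done by hand, without an external rate limiter library; fetch_params, call_api and store_result are placeholder stubs for your own DB/API/storage code:

```python
import time

RATE = 5              # API limit: 5 calls per second
INTERVAL = 1.0 / RATE


def fetch_params():
    """Placeholder: pull the next set of call parameters from DB/cache."""
    return {"page": 1}


def call_api(params):
    """Placeholder: make one REST call with the given parameters."""
    return {"ok": True, "params": params}


def store_result(response):
    """Placeholder: persist the processed response to RDS/DynamoDB."""
    pass


def run_forever():
    while True:
        started = time.monotonic()
        store_result(call_api(fetch_params()))
        # Sleep off whatever remains of this call's 200 ms slot, so the
        # calls are spread evenly and never exceed 5 per second.
        elapsed = time.monotonic() - started
        if elapsed < INTERVAL:
            time.sleep(INTERVAL - elapsed)


if __name__ == "__main__":
    run_forever()
```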

Azure Machine Learning Request Response latency

I have made an Azure Machine Learning Experiment which takes a small dataset (12x3 array) and some parameters and does some calculations using a few Python modules (a linear regression calculation and some more). This all works fine.
I have deployed the experiment and now want to throw data at it from the front-end of my application. The API-call goes in and comes back with correct results, but it takes up to 30 seconds to calculate a simple linear regression. Sometimes it is 20 seconds, sometimes only 1 second. I even got it down to 100 ms one time (which is what I'd like), but 90% of the time the request takes more than 20 seconds to complete, which is unacceptable.
I guess it has something to do with it still being an experiment, or it is still in a development slot, but I can't find the settings to get it to run on a faster machine.
Is there a way to speed up my execution?
Edit: To clarify: The varying timings are obtained with the same test data, simply by sending the same request multiple times. This made me conclude it must have something to do with my request being put in a queue, there is some start-up latency or I'm throttled in some other way.
First, I am assuming you are doing your timing test on the published AML endpoint.
When a call is made to AML, the first call must warm up the container. By default, a web service has 20 containers. Each container starts cold, and a cold container can cause a large (30 s) delay. In the response returned by the AML endpoint, only count requests that have the isWarm flag set to true. By hitting the service with MANY requests (relative to how many containers you have running), you can get all of your containers warmed.
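A rough sketch of that warm-up idea, assuming a generic request/response endpoint; the URL, key, payload and the exact location of the isWarm flag in the response are placeholders you would need to adapt to your service:

```python
import concurrent.futures

import requests

# Placeholders: point these at your published AML request/response endpoint.
URL = "https://<region>.services.azureml.net/workspaces/<ws>/services/<id>/execute?api-version=2.0"
HEADERS = {"Authorization": "Bearer <API_KEY>", "Content-Type": "application/json"}
PAYLOAD = {"Inputs": {}, "GlobalParameters": {}}  # replace with your 12x3 test data


def call_once(_):
    r = requests.post(URL, json=PAYLOAD, headers=HEADERS, timeout=60)
    return r.elapsed.total_seconds(), r.json()


# Fire enough concurrent requests to touch every container, then only look
# at the latency of responses the service reports as warm.
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
    for latency, body in pool.map(call_once, range(40)):
        print(latency, body.get("isWarm"))
```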
If you are sending out dozens of requests per instance, the endpoint might be getting throttled. You can adjust the number of calls your endpoint can accept by going to manage.windowsazure.com:
Azure ML section from the left bar
Select your workspace
Go to the Web Services tab
Select your web service from the list
Adjust the number of calls with the slider
By enabling debugging on your endpoint, you can get logs showing how long each of your modules takes to complete. You can use this to determine whether a module is not running as you intended, which may be adding to the time.
Overall, there is some overhead when using the Execute Python Script module, but I'd expect this request to complete in under 3 seconds.
