Stream data from Azure Python functions to client - Python

I have a Python program that generates data; it may rely on web services to work, which slows down the process.
I would like to expose this program as a web service that returns a file to download (Content-Disposition: attachment).
I thought the easiest and cheapest option would be to go serverless.
I have quota on Azure, so I want to try with Azure Functions.
Azure HTTP Python functions must return a func.HttpResponse() object that takes the response body as a parameter.
However, I would like to avoid generating the whole file in memory or in a temporary file before returning the response.
The files could be big, and file creation can also be slow.
The file could perfectly well be transferred back to the user as it is constructed (return an HttpResponse with chunked encoding).
That would mean less waiting for people calling the service and, I believe, lower costs on the function (less memory used during concurrent calls).
Is it possible with an Azure HTTP-triggered function? If not with Azure, is it possible with GCP or AWS functions?

Thank you MiniScalope. Posting your comment as an answer to help other community members.
As per the comment, you have found that it's only possible with C# as of now.
You can refer to "Out of proc languages stream support".
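For reference, a minimal sketch of what the non-streaming Python approach looks like as of now: the whole payload is built before func.HttpResponse is returned with a Content-Disposition header. The generate_report() helper is hypothetical.

# Conventional (non-streaming) pattern for a Python HTTP-triggered function.
import azure.functions as func

def generate_report() -> bytes:
    # Placeholder for the slow, web-service-backed data generation.
    return b"col1,col2\n1,2\n"

def main(req: func.HttpRequest) -> func.HttpResponse:
    body = generate_report()  # the entire payload is materialized here
    return func.HttpResponse(
        body=body,
        status_code=200,
        mimetype="text/csv",
        headers={"Content-Disposition": 'attachment; filename="report.csv"'},
    )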

Related

Fast way to query single API Endpoint with Python Azure Functions concurrently

For the project I am working on, I am developing several Python Azure Functions for making internal database calls and calling external APIs. I have chosen Python since this is the programming language I am most proficient in, but I am willing to explore other options in different programming languages (or outside the Azure Functions framework) if that makes it easier.
I have a single REST API endpoint that I would like to call concurrently, e.g., I would like to make 200 requests to this endpoint in a fast and efficient way. So the input for this is a JSON, array or list with 200 URL endpoint strings that need to be called and the output should be the API responses in a JSON, array or list.
The Python Azure Function documentation makes the following important remarks on Python as a programming language in relation to concurrency:
Because Python is a single-threaded runtime, a host instance for Python can process only one function invocation at a time by default. For applications that process a large number of I/O events and/or is I/O bound, you can improve performance significantly by running functions asynchronously.
As an experiment I have created a Durable Function with a fan-out/fan-in pattern to handle the problem. The Activity Function makes the actual request to the API endpoint using the Python requests module, and an Orchestrator Function handles the tasks sent to the workers (i.e. the Activity Functions). I am running the Function App on a Premium App Service plan, so not on the serverless Consumption plan.
My questions in relation to my problem:
Is the Durable Functions framework a good choice for what I am trying to achieve? Or should I model this as a single HTTP-triggered function that can handle the batch of requests as a whole and create a mechanism with asynchronous API calls within this single function?
How should I use asyncio (or any other related module) to make the API calls asynchronous and achieve parallelism? Does each Activity Function then process a single API call or a batch of API calls that are processed in parallel by a single worker?
What settings do I need to configure for my Function app to optimize this? I know the following settings are configurable: FUNCTIONS_WORKER_PROCESS_COUNT, PYTHON_THREADPOOL_THREAD_COUNT and maxConcurrentRequests. If for example I want to have 4 parallel executions of making API requests, do I set any of these settings to 4, and if so which one?
Changing maxConcurrentActivityFunctions to 4 does not seem to do anything in terms of processing times; does this setting only impact serverless functions (Consumption plan)?
Many questions at once, but I am a bit lost in the options out there, and the documentation does not explicitly state how to improve parallelism when running Durable Functions.
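For anyone weighing the second option from the question, here is a minimal sketch (not the asker's Durable Functions setup) of a single async HTTP-triggered function that fans the calls out itself with asyncio and aiohttp; the aiohttp dependency and the shape of the request payload are assumptions.

import asyncio
import json

import aiohttp
import azure.functions as func

async def fetch(session: aiohttp.ClientSession, url: str) -> dict:
    # One GET per URL; all of them are awaited concurrently below.
    async with session.get(url) as resp:
        return {"url": url, "status": resp.status, "body": await resp.text()}

async def main(req: func.HttpRequest) -> func.HttpResponse:
    urls = req.get_json()  # expected: a JSON array of endpoint URL strings
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*(fetch(session, u) for u in urls))
    return func.HttpResponse(json.dumps(results), mimetype="application/json")

Because the function is declared async, the Python worker awaits it on its event loop, so the 200 requests overlap in I/O instead of running one at a time.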

Run & scale simple python scripts on Google Cloud Platform

I have a simple Python script that I would like to run thousands of instances of on GCP (at the same time). This script is triggered by the $Universe scheduler with something like "python main.py --date '2022_01'".
What architecture and technology do I have to use to achieve this?
PS: I cannot drop $Universe but I'm not against suggestions to use another technologies.
My solution:
I already have a $Universe server running all the time.
Create a Pub/Sub topic
Create a permanent Compute Engine instance that listens to Pub/Sub all the time
$Universe sends thousands of events to Pub/Sub
The Compute Engine instance triggers the creation of a Python Docker image on another Compute Engine instance
Scale the creation of the Docker images (I don't know how to do it)
Is it a good architecture?
How to scale this kind of process?
Thank you :)
It might be very difficult to discuss architecture and design questions, as they are usually heavily dependent on the context, scope, functional and non-functional requirements, cost, available skills and knowledge, and so on...
Personally, I would prefer to stay with an entirely serverless approach if possible.
For example, use Cloud Scheduler (serverless cron jobs), which sends messages to a Pub/Sub topic; on the other side there is a Cloud Function (or something else) that is triggered by the message.
Whether it should be a Cloud Function or something else, and what exactly it should do and how, depends on your case.
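A rough sketch of the Cloud Function on the receiving end of that Pub/Sub topic, assuming a 1st-gen background function and that the scheduler puts the date in the message body; run_job() stands in for the existing main.py logic.

import base64

def handler(event, context):
    # 1st-gen background functions receive the Pub/Sub message as 'event';
    # the payload is base64-encoded in event['data'].
    date = base64.b64decode(event["data"]).decode("utf-8")  # e.g. "2022_01"
    run_job(date)

def run_job(date: str) -> None:
    print(f"running job for {date}")  # placeholder for the real script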
As I understand it, you will have a lot of simultaneous calls to custom Python code triggered by an orchestrator ($Universe), and you want it on the GCP platform.
Like @al-dann, I would go with a serverless approach in order to reduce the cost.
As I also understand it, Pub/Sub does not seem necessary: you could easily trigger the function from any HTTP call and avoid Pub/Sub.
Pub/Sub is necessary only to have some guarantee (at-least-once processing), but you can get the same behaviour if $Universe validates the HTTP response for every call (look at the HTTP response code & body and retry if they do not match the expected result).
If you want exactly-once processing, you will need more tooling; you are close to event streaming (which could be a good use case, as I also understand it). In that case, staying fully on GCP, I would go with Pub/Sub & Dataflow, which can guarantee exactly-once, or with Kafka & Kafka Streams or Flink.
If at-least-once processing is fine for you, I would go with the HTTP version, which I think will be simpler to maintain (see the sketch after this list). You have 3 serverless options for that case:
App Engine standard: scales to 0, you pay for CPU usage; it can be more affordable than the functions below if the requests are confined to a short period (a few hours per day, since the same hardware will process many requests)
Cloud Functions: you pay per request (+ CPU, memory, network, ...) and don't have to think about anything other than code, but the code is constrained to run on a proprietary solution.
Cloud Run: my preferred one, since the pricing is the same as Cloud Functions but you gain portability; the application is a simple Docker image that you can move easily (to Kubernetes, Compute Engine, ...) and you can change the execution engine depending on cost (if the load changes between the study and the real world).
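As a concrete illustration of the HTTP variant, here is a minimal Cloud Run-style handler; Flask, the /run route and the JSON body shape are all assumptions, and run_job() stands in for the existing main.py logic. A non-2xx response tells $Universe to retry, which gives the at-least-once behaviour described above.

import os

from flask import Flask, request

app = Flask(__name__)

@app.route("/run", methods=["POST"])
def run():
    date = request.json.get("date")  # e.g. "2022_01"
    try:
        run_job(date)                # placeholder for the existing script logic
    except Exception as exc:
        return str(exc), 500         # non-2xx: the orchestrator should retry
    return "ok", 200

def run_job(date: str) -> None:
    print(f"running job for {date}")

if __name__ == "__main__":
    # Cloud Run injects the port to listen on via the PORT environment variable.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))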

azure function python - Blob Trigger and Timeout

I am working with Azure Functions 2.0, the Python 3.7 runtime, and Azure Functions Tools 2.0.
I have a simple blob-trigger-based function that reads a file containing different URLs (www.xxx.com/any, ...) and scrapes them using the requests and beautifulsoup4 libraries.
The App Service plan is not "shared"; it is based on a specific App Service Plan.
The overall timeout is set to 15 minutes (in the host.json file).
Sometimes the action of scraping a URL takes a long time, and I have found this bad behaviour:
The function goes into timeout with the blob XXXX.
The function is then invoked again for the blob XXXX; it seems the previous run was unsuccessful and the runtime re-executes the function.
Questions:
- How can I specify that a blob can trigger only one function execution? I would like to avoid an external check.
- Do you have suggestions for limiting timeouts for requests in a Python function? I have tried the standard timeout and eventlet.Timeout without success.
Thanks
Ra
BlobTrigger will run for all blobs that don't have a receipt. There is an issue on GitHub regarding this topic, but it concerns timestamps on a blob and is not quite what you are after.
https://github.com/Azure/azure-webjobs-sdk/issues/1327
Regarding timeouts in functions, this is by design. You could check out Durable Functions.
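This does not stop the re-execution itself, but as a sketch for the second question (limiting per-request time), requests accepts a per-call timeout so that one slow URL cannot push the whole invocation past the host's functionTimeout; the (connect, read) values below are arbitrary.

import logging

import azure.functions as func
import requests

def main(myblob: func.InputStream):
    # The blob is assumed to contain one URL per line.
    urls = myblob.read().decode("utf-8").splitlines()
    for url in urls:
        try:
            resp = requests.get(url, timeout=(5, 30))  # 5s connect, 30s read
            logging.info("scraped %s (%d bytes)", url, len(resp.content))
        except requests.exceptions.Timeout:
            logging.warning("skipped %s: timed out", url)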

Azure functions: Can I implement my architecture and how do I minimize cost?

I am interested in implementing a compute service for an application I'm working on in the cloud. The idea is that there are 3 modules in the service. A compute manager receives requests (with input data) and triggers Azure Function computes (the computes are the 2nd 'module'). Both modules share the same blob storage for the scripts to be run and the input/output data (JSON) for the compute.
I want to draw up a basic diagram but need to understand a few things first. Is what I described above possible, or must Azure Functions have their own separate storage? Can Azure Functions have concurrent executions of the same script with different data?
I'm new to Azure, so what I've been learning about Azure Functions hasn't yet answered my questions. I'm also unsure how to minimise cost. The functions won't run often.
I hope someone could shed some light on this for me :)
Thanks
In fact, Azure Functions itself has many kinds of triggers, for example: HTTP trigger, Storage trigger, or Service Bus trigger.
So I think you can use it without your compute manager if there is an inbuilt trigger that meets your requirements.
At the same time, all functions can share the same storage account. You just need to use the correct storage account connection string.
And, in the end, as your functions will not run often, I suggest you use the Azure Functions Consumption plan. When you're using the Consumption plan, instances of the Azure Functions host are dynamically added and removed based on the number of incoming events.
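To illustrate the shared-storage point, a small sketch of reading the common container from any function (or from the compute manager) with the azure-storage-blob SDK; the container name, blob layout and the app setting holding the connection string are made up.

import os

from azure.storage.blob import BlobServiceClient

def load_input(blob_name: str) -> bytes:
    # Any module that holds the same connection string sees the same data.
    service = BlobServiceClient.from_connection_string(
        os.environ["SHARED_STORAGE_CONNECTION_STRING"]
    )
    blob = service.get_blob_client(container="compute-jobs", blob=blob_name)
    return blob.download_blob().readall()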

Azure infrastructure for a Python script triggered by a http request

I'm a bit lost in the jungle of documentation, offers, and services. I'm wondering what the infrastructure should look like, and it would be very helpful to get a nudge in the right direction.
We have a Python script with PyTorch that runs a prediction. The script has to be triggered from an HTTP request. Preferably, the samples to run a prediction on should also come from the same requester. It has to return the prediction as fast as possible.
What is the best / easiest / fastest way of doing this?
We have the script lying in a Container Registry for now. Can we use it? Azure Kubernetes Service? Azure Container Instances (is this fast enough)?
And about the trigger, should we use an Azure Function or a Logic App?
Thank you!
Azure Functions V2 has just launched a private preview for writing Functions using Python. You can find some instructions for how to play around with it here. This would probably be one of the most simple ways to execute this script with an HTTP request. Note that since it is in private preview, I would hesitate to recommend using it in a production scenario.
Another caveat to note with Azure Functions is that there will be a cold start whenever a new instance of your function application is created. This should be on the order of ~2-4 seconds, and should only happen on the first request after the application has not seen much traffic for a while, or when a new instance has been created to scale up your application to receive more traffic. You can avoid this cold start by running your function on a dedicated App Service Plan, but at that point you lose a lot of the benefits of Azure Functions.
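A hedged sketch of that "simple" option: an HTTP-triggered Python function that loads the PyTorch model once per worker process and scores the samples posted in the request body. The model file name and the input format are assumptions.

import json

import azure.functions as func
import torch

# Loaded at import time, i.e. once per worker process rather than per request.
# Assumes the model was saved as a whole object with torch.save(model, "model.pt").
model = torch.load("model.pt")
model.eval()

def main(req: func.HttpRequest) -> func.HttpResponse:
    samples = torch.tensor(req.get_json()["samples"])  # samples sent by the requester
    with torch.no_grad():
        prediction = model(samples)
    return func.HttpResponse(
        json.dumps({"prediction": prediction.tolist()}),
        mimetype="application/json",
    )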
