Why does AWS Lambda run multiple times via CloudWatch event? - python

I have a Lambda function built with Python.
The Lambda function runs a Docker image hosted on AWS ECR.
The Python code in the Docker container:
pulls data from a REST API
parses the data
sends an event based on the parsed data via AWS SNS
When I trigger the Lambda function via a test event, it only executes once.
When I execute the code locally on my machine, it also only executes once.
But when the Lambda function is triggered by a CloudWatch event (EventBridge), say every 5 minutes, the code gets executed multiple times.
I tried the following:
using a Python sleep timer
configuring the Lambda timeout to 5 minutes
deleting the function and pushing the Docker image into ECR again
writing the events to be sent into a /tmp file and reading them back
I use the Docker image so that I don't have to package the dependencies in Lambda.
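A simplified sketch of what the handler boils down to (the API URL, SNS topic ARN, and names are placeholders); logging context.aws_request_id helps tell Lambda retries of the same event apart from separate scheduled invocations:
import json
import logging
import urllib.request

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)
sns = boto3.client('sns')
TOPIC_ARN = 'arn:aws:sns:eu-west-1:123456789012:my-topic'  # placeholder

def handler(event, context):
    # If the same request ID shows up repeatedly, the extra runs are retries
    # (e.g. after an error or timeout) rather than new schedule events.
    logger.info('request_id=%s', context.aws_request_id)
    with urllib.request.urlopen('https://example.com/api/data') as resp:  # placeholder REST API
        data = json.loads(resp.read())
    sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps(data))
    return {'status': 'ok'}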

Related

Counterpart of Heroku dynos or workers on AWS

I am looking for an AWS service that will allow me to run a simple Python script, made with Flask, that runs constantly in an infinite loop.
The instance must be created and configured via an API call from the main web page, with its own PostgreSQL database.
Example:
Client1 -> Goes to the web page and creates his own instance -> The web page calls the AWS API -> AWS creates an instance1 for client1 which is constantly running.
Note: I am looking for a cheaper service for this. It may be something between an EC2 instance and a Lambda function (a Lambda function doesn't work because the script needs to run autonomously).
If the Python instance script fails for any reason, it should restart automatically.
Also the client1 has the option to turn the script off and on whenever he wants.
Thank you very much in advance!
Elastic Beanstalk is the equivalent of Heroku.
https://aws.amazon.com/elasticbeanstalk/
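If you go that route, each client's environment can be created from the web page's backend via an API call; a rough sketch with boto3 (the application name, environment name, version label, and solution stack below are placeholders you would substitute with your own values):
import boto3

eb = boto3.client('elasticbeanstalk')

# Spin up one Elastic Beanstalk environment per client
eb.create_environment(
    ApplicationName='client-scripts',      # application holding the Flask script (placeholder)
    EnvironmentName='client1-env',         # one environment per client (placeholder)
    VersionLabel='v1',                     # an application version uploaded earlier (placeholder)
    SolutionStackName='64bit Amazon Linux 2 v3.3.13 running Python 3.8',  # pick one from list_available_solution_stacks()
)
The environment can later be turned off and on per client, e.g. with eb.terminate_environment(EnvironmentName='client1-env') and a new create_environment call.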

Azure function & Blob Trigger times out

I have some trouble understanding how my Function App is working.
My environment is as follows: Python 3.8, Blob Trigger, Consumption Plan.
I am creating an application which is triggered when an audio file is uploaded into a container. The audio file triggers an Azure Function that runs "Speech-To-Text" using the Azure Cognitive Services (so my function is waiting for an answer from that service). I set "FUNCTIONS_WORKER_PROCESS_COUNT" to 5 in order to allow each of my Function App instances to run several speech-to-text analyses in parallel.
So I uploaded 100 blobs into my container to check my function's behaviour, and here is what I get:
The Function App is triggered and starts several servers (5 for 100 blobs), then processes 1 blob per server until more than 30 minutes have passed since I uploaded the blobs, and then I get a timeout.
But I was expecting this behaviour:
The Function App is triggered and starts several servers. Each server processes 5 blobs in parallel and gives me an answer for all of my blobs in 15 to 20 minutes!
So there are two things I don't get here.
Why are my functions not processing 5 blobs per server instead of 1 blob per server? (I set "FUNCTIONS_WORKER_PROCESS_COUNT" to 5.)
And my blobs seem to be processed as soon as they appear in the container instead of being put in a queue. This behaviour is responsible for the timeout, since they wait for quite a long time instead of being processed. Why?
I hope I was clear.
Thank you for your help!
EDIT: I just added 100 blobs to see how the Function App reacts, and my freshly uploaded blobs are being processed before the ones that I uploaded at the beginning.
1. For your first question:
As far as I understand, Python is a single-threaded runtime.
Because Python is a single-threaded runtime, a host instance for
Python can process only one function invocation at a time. For
applications that process a large number of I/O events and/or is I/O
bound, you can improve performance significantly by running functions
asynchronously.
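As a rough illustration of that point (just a sketch, assuming the v1 Python programming model and a blob binding named myblob), the handler can be declared async so the worker can interleave I/O-bound invocations such as the wait on Speech-to-Text:
import logging
import azure.functions as func

async def main(myblob: func.InputStream):
    # Declaring the handler async lets the single Python worker interleave
    # other invocations while this one awaits I/O (e.g. the Speech-to-Text call).
    logging.info("Processing blob %s (%s bytes)", myblob.name, myblob.length)
    # result = await transcribe(myblob)  # hypothetical awaitable Speech-to-Text call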
And it is right that FUNCTIONS_WORKER_PROCESS_COUNT lets you run 5 blob-triggered function invocations per host, but they run one after another if they compete for the same resource: even though 5 processes can exist at the same time, if the first process is running your function, a second process (running the same function) would wait; if your first process is waiting for data to come in, the second process would run first.
Here is an article on how FUNCTIONS_WORKER_PROCESS_COUNT works.
And you can check how many worker instances you are using. If you have 100 blobs to trigger your function and 5 worker processes set per worker instance, it should start 20 instances to consume the requests. (Welcome to correct me if I'm wrong.)
2. For your second question:
The Blob storage trigger starts a function when a new or updated blob
is detected.
That's how blob triggered function works.

Invoke AWS SAM local function from another SAM local function

I am trying to create an AWS SAM app with multiple AWS Serverless functions.
The app has one template.yaml file which defines resources for two different serverless Lambda functions, for instance "Consumer Lambda" and "Worker Lambda". The Consumer gets triggered at a rate of 5 minutes. The Consumer uses the boto3 library to trigger the Worker Lambda function. This code works when the Worker Lambda is deployed on AWS.
But I want to test both functions locally with sam local invoke "Consumer", which should also invoke "Worker" locally.
Here's a screenshot of the YAML file:
I am using PyCharm to run the project. There is an option to run only one function at a time, which then creates only one folder in the build folder.
I have to test whether the Consumer is able to invoke the Worker locally in PyCharm before deployment. I think there is some way to do it but I am not sure how. I did some extensive searching but it didn't yield anything.
Any help is appreciated. Thanks in advance
You can start the lambda invoke endpoint in the following way (official docs):
sam local start-lambda
Now you can point your AWS resource client to port 3001 and trigger the functions locally.
For example, if you are doing this in Python, it can be achieved in the following way with boto3:
import boto3

# Create a Lambda client pointed at the local endpoint started by `sam local start-lambda`
lambda_client = boto3.client('lambda',
                             region_name="us-east-1",              # any valid region works locally
                             endpoint_url="http://127.0.0.1:3001",
                             use_ssl=False,
                             verify=False)

# Invoke the function by its logical name from template.yaml
lambda_client.invoke(FunctionName="<function_name>",
                     Payload="<lambda_payload>")
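If the Consumer's own code creates the Lambda client, one option (just a sketch, not from the docs above) is to switch the endpoint based on the AWS_SAM_LOCAL environment variable that sam local sets inside the container; note that from inside the container the host is typically reached via host.docker.internal rather than 127.0.0.1, depending on your Docker setup:
import os
import boto3

# Use the local start-lambda endpoint when running under `sam local`,
# and the real Lambda service otherwise.
if os.environ.get('AWS_SAM_LOCAL'):
    lambda_client = boto3.client('lambda',
                                 endpoint_url='http://host.docker.internal:3001',
                                 use_ssl=False,
                                 verify=False)
else:
    lambda_client = boto3.client('lambda')

lambda_client.invoke(FunctionName='WorkerFunction',   # hypothetical logical name from template.yaml
                     Payload='{"job": "example"}')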

Run parallel Python code on multiple AWS instances

I have a Python algorithm that can be parallelized fairly easily.
I don't have the resources locally to run the whole thing in an acceptable time frame.
For each work unit, I would like to be able to:
Launch an AWS instance (EC2?)
Send input data to the instance
Run the Python code with the data as input
Return the result and aggregate it when all instances are done
What is the best way to do this?
Is AWS Lambda used for this purpose? Can this be done only with Boto3?
I am completely lost here.
Thank you
A common architecture for running tasks in parallel is:
Put inputs into an Amazon SQS queue
Run workers on multiple Amazon EC2 instances that:
Retrieve a message from the SQS queue
Process the data
Write results to Amazon S3
Delete the message from the SQS queue (to signify that the job is complete)
You can then retrieve all the results from Amazon S3. Depending on their format, you could even use Amazon Athena to run SQL queries against all the output files simultaneously.
You could even run multiple workers on the same instance if each worker is single-threaded and there is spare CPU available.
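A rough sketch of such a worker loop with boto3 (the queue URL, bucket name, and process() body are placeholders for your own setup and algorithm):
import json
import boto3

sqs = boto3.client('sqs')
s3 = boto3.client('s3')

QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/work-queue'  # placeholder
BUCKET = 'my-results-bucket'                                               # placeholder

def process(payload):
    # Placeholder for the real algorithm
    return {'input': payload, 'result': sum(payload.get('numbers', []))}

while True:
    # Long-poll the queue for one message at a time
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    for msg in resp.get('Messages', []):
        result = process(json.loads(msg['Body']))
        s3.put_object(Bucket=BUCKET,
                      Key='results/{}.json'.format(msg['MessageId']),
                      Body=json.dumps(result))
        # Deleting the message signals that the job is complete
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg['ReceiptHandle'])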

Execute lambda function on local device using greengrass

I am trying to learn AWS Greengrass, so I was following this tutorial https://docs.aws.amazon.com/greengrass/latest/developerguide/gg-gs.html which explains step by step how to set up Greengrass on a Raspberry Pi and publish some messages using a Lambda function.
A simple Lambda function looks as follows:
import greengrasssdk
import platform
from threading import Timer
import time

# Creating a greengrass core sdk client
client = greengrasssdk.client('iot-data')

# Retrieving platform information to send from Greengrass Core
my_platform = platform.platform()

def greengrass_hello_world_run():
    if not my_platform:
        client.publish(topic='hello/world', payload='hello Sent from Greengrass Core.')
    else:
        client.publish(topic='hello/world', payload='hello Sent from Greengrass Core running on platform: {}'.format(my_platform))
    # Asynchronously schedule this function to be run again in 5 seconds
    Timer(5, greengrass_hello_world_run).start()

# Execute the function above
greengrass_hello_world_run()

# This is a dummy handler and will not be invoked
# Instead the code above will be executed in an infinite loop for our example
def function_handler(event, context):
    return
This works, but I am trying to understand it better by having the Lambda function do some extra work, for example opening a file and writing to it.
I modified the greengrass_hello_world_run() function as follows:
def greengrass_hello_world_run():
    if not my_platform:
        client.publish(topic='hello/world', payload='hello Sent from Greengrass Core.')
    else:
        stdout = "hello from greengrass\n"
        with open('/home/pi/log', 'w') as file:
            for line in stdout:
                file.write(line)
        client.publish(topic='hello/world', payload='hello Sent from Greengrass Core running on platform: {}'.format(my_platform))
I expect that upon deploying, the daemon running on my local Pi should create that file in the given directory, because I believe Greengrass Core runs this Lambda function on the local device. However, it doesn't create any file, nor does it publish anything, so I believe this code might be breaking. I'm not sure how, though; I tried looking into CloudWatch but I don't see any events or errors being reported.
Any help on this would be really appreciated,
cheers!
A few thoughts on this...
If you turn on local logging in your GG group setup, it will start to write logs locally on your Pi. The settings are:
The logs are located in: /greengrass/ggc/var/log/system
If you tail the python_runtime.log you can see any errors from the lambda execution.
If you want to access local resources you will need to create a resource in your GG group definition. You can then give this access to a volume which is where you can write your file.
You do need to deploy your group once this has been done for the changes to take effect.
I think I found the answer: we have to create the resource in the Lambda environment and also make sure to give the Lambda read and write access to that resource. By default, Lambda can only access the /tmp folder.
Here is the link to the documentation
https://docs.aws.amazon.com/greengrass/latest/developerguide/access-local-resources.html
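For reference, a rough sketch of defining such a local volume resource with boto3 and the Greengrass (V1) API; the IDs, names, and paths here are placeholders, and the function definition still needs a matching resource access policy granting read/write access:
import boto3

gg = boto3.client('greengrass')

# Define a local volume resource that the Lambda can later be granted access to
gg.create_resource_definition(
    Name='pi-log-resources',                 # placeholder
    InitialVersion={
        'Resources': [{
            'Id': 'pi-home-volume',           # placeholder
            'Name': 'PiHomeVolume',           # placeholder
            'ResourceDataContainer': {
                'LocalVolumeResourceData': {
                    'SourcePath': '/home/pi',       # path on the Pi
                    'DestinationPath': '/home/pi',  # path seen by the Lambda
                    'GroupOwnerSetting': {'AutoAddGroupOwner': True}
                }
            }
        }]
    }
)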
