AWS Lambda Python Docker image: timeout

I am trying to deploy a function that converts strings into vectors to AWS Lambda:
from functools import lru_cache

def _encode(text: str):
    [<main functionality>]

@lru_cache
def encode(text: str):
    return _encode(text)

def handler(event, context):
    return encode(event["text"])
This function works as expected when I call it in the Python shell:
import app
app.handler({"text": "doc"}, None)
<expected result>
The encode() function is actually complex and requires external dependencies (>1GB), which is why I am going with the Docker image approach described in the documentation. This is my Dockerfile:
FROM amazon/aws-lambda-python:3.9
# Install requirements
RUN python -m pip install --upgrade pip
COPY requirements.txt ${LAMBDA_TASK_ROOT}
RUN python -m pip install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"
# Copy app
COPY app.py ${LAMBDA_TASK_ROOT}
CMD [ "app.handler" ]
Build the Docker image and run it:
$ docker build -t my-image:latest .
[...]
Successfully built xxxx
Successfully tagged my-image:latest
$ docker run -p 9000:8080 my-image:latest
time="2021-10-07T10:12:13.165" level=info msg="exec '/var/runtime/bootstrap' (cwd=/var/task, handler=)"
I then test the function locally with curl, following the testing documentation, and it succeeds.
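Concretely, the local test looked something like this (the invocation endpoint comes from the Lambda container-image testing docs; the payload matches the test event below):
$ curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{"text": "doc"}'
<expected result>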
I've pushed the image to AWS ECR and created a Lambda function from it. I've created a test event in the AWS Lambda console:
{
  "text": "doc"
}
When I run the test in the AWS Lambda console, it times out, even after increasing the timeout to 60s. Locally, the function takes less than 1 second to execute.
On AWS Lambda only the timeout is logged, and I don't see how to get to the root cause. How can I debug this? How can I get more useful logs?
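One way to get more signal than the bare timeout line (a debugging sketch, not a fix): Lambda forwards stdout/stderr and the standard logging module's output to CloudWatch Logs, so timestamped messages at import time and inside the handler show whether cold-start initialization or the invocation itself is hanging. The encode() call below is the one from the question:
import logging
import time

logger = logging.getLogger()
logger.setLevel(logging.INFO)

print("module import started")  # print() also lands in CloudWatch Logs
# ... heavy imports / model loading happen here ...
print("module import finished")

def handler(event, context):
    start = time.time()
    logger.info("handler invoked with event: %s", event)
    result = encode(event["text"])
    logger.info("encode finished in %.1fs", time.time() - start)
    return result
If the import-time messages never complete, initialization (e.g., loading the >1GB of dependencies) is what eats the timeout, and raising the function's memory size (which also scales its CPU share) may help more than raising the timeout further.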

Related

Lambda function completes run. Then runs again and crashes

I have a Lambda function that runs a Docker image from ECR. The image runs fine on my local machine, but the moment I test it on Lambda, it runs, finishes, and then runs again.
Here is the error:
'updatedRows': 1, 'updatedColumns': 1, 'updatedCells': 1}}
done
END RequestId: c20b4f94-0b27-4edc-bff6-e411d6d163f1
REPORT RequestId: c20b4f94-0b27-4edc-bff6-e411d6d163f1 Duration: 305675.98 ms Billed Duration: 305676 ms Memory Size: 1024 MB Max Memory Used: 206 MB
RequestId: c20b4f94-0b27-4edc-bff6-e411d6d163f1 Error: Runtime exited without providing a reason
Runtime.ExitError
"done" indicates the program has completed. I have set the timeout to 15 minutes, but it doesn't take that long and I don't get a timeout error.
Here is the Dockerfile:
# Dockerfile, Image, container
FROM python:3.9
COPY . /opt/app
WORKDIR /opt/app
RUN pip install -r ./requirements.txt
CMD ["python", "./run.py"]
I have checked, and I don't call the function anywhere except in run.py. All run.py does is call the function:
from dev_main import runJobs as run

run()
and in dev_main.py I don't call any functions.
First, I have to admit I am really disappointed with the responses I got here and on the AWS Discord. Toxicity in this community can be horrible.
Anyway, this is a fairly unique issue: Lambda is quite particular in how it works and executes container images, and this article outlines it nicely: https://docs.aws.amazon.com/lambda/latest/dg/images-create.html
The main issue was my docker file.
In the Dockerfile I had
CMD ["python", "./run.py"]
which runs just fine as a plain container, but a Lambda container image has to keep serving the Lambda runtime API rather than exit. When the script finishes and the process exits, Lambda reports Runtime.ExitError and starts the runtime again, without giving any good feedback about the actual problem.
The fix is to use the handler format. In run.py I needed this function:
def lambda_handler(event, context):
    # TODO implement
    run()
and in my Dockerfile I needed
FROM public.ecr.aws/lambda/python:3.8
# Copy function code
COPY dev_main.py ${LAMBDA_TASK_ROOT}
COPY run.py ${LAMBDA_TASK_ROOT}
# Install the function's dependencies using file requirements.txt
# from your project folder.
COPY keys.json .
COPY requirements.txt .
RUN pip3 install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"
# Set the CMD to your handler (could also be done as a parameter override outside of the Dockerfile)
CMD [ "run.lambda_handler" ]

Using Ray Multiprocessing on Docker

I am trying to parallelize processes using Ray in a Docker container.
from ray.util.multiprocessing import Pool

with Pool(self.n_cores) as pool:
    pool.starmap(func=func, iterable=list_of_parameters)
While it works perfectly locally, the following error occurs when it runs in the Docker container:
✗ failed cod_simulation-run
Job 'cod_simulation-run' failed:
| AttributeError: 'StreamCapture' object has no attribute 'fileno'
Write "details cod_simulation-run" to inspect the error.
Processed 1 jobs (1 failed).
I was previously doing the same thing with Python multiprocessing:
import multiprocessing as mp

with mp.Pool(self.n_cores) as pool:
    pool.starmap(func=func, iterable=list_of_parameters)
and this worked both locally and in the docker container. But for efficiency reasons, I would prefer to stick to Ray.
FROM python:3.9
WORKDIR /project_name
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
RUN find .
ENV DISABLE_CONTRACTS=1
RUN pipdeptree
RUN python setup.py develop --no-deps
RUN cod-demo --help
CMD ["cod-demo"]
This is my Dockerfile, and I am installing ray as a requirement.
Thanks in advance for any suggestions.
After pulling the appropriate tag from their repo on Docker Hub, you can simply run tests with:
docker run --shm-size=<shm-size> -t -i rayproject/ray
or, in case you're using a GPU:
docker run --shm-size=<shm-size> -t -i --gpus all rayproject/ray:<ray-version>-gpu
If you need additional packages that aren't pre-installed on the images, you must either install them from a terminal inside the container or write a new Dockerfile that uses the downloaded image as the base image (the FROM argument), as sketched below. This follows what was mentioned here.
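For the Dockerfile route, here is a minimal sketch based on the build steps from the question (assuming rayproject/ray as the base image, which already ships with Ray; you may need to adjust for the image's non-root ray user):
# Use the official Ray image as the base instead of plain python:3.9
FROM rayproject/ray:latest
WORKDIR /project_name
COPY requirements.txt .
# Ray itself is preinstalled; this only adds the project's own dependencies
RUN pip install -r requirements.txt
COPY . .
RUN python setup.py develop --no-deps
CMD ["cod-demo"]
Remember to start the container with --shm-size as shown above, since Ray uses shared memory (/dev/shm) for its object store.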

AWS Batch: /usr/local/bin/python: cannot execute binary file

I built an AWS Batch compute environment. I want to run a Python script in its jobs.
Here is the Dockerfile I'm using:
FROM python:slim
RUN apt-get update
RUN pip install boto3 matplotlib awscli
COPY runscript.py /
ENTRYPOINT ["/bin/bash"]
The command in my task definition is:
python /runscript.py
When I submit a job in the AWS console, I get this error in CloudWatch:
/usr/local/bin/python: /usr/local/bin/python: cannot execute binary file
And the job gets the status FAILED.
What is going wrong? When I run the container locally, I can launch the script without any errors.
Delete your ENTRYPOINT line and replace it with a CMD that says what the container actually does.
There are two parts to the main command a Docker container runs, ENTRYPOINT and CMD; these are combined into one command when the container starts. The command your container is running is probably something like
/bin/bash python /runscript.py
So bash finds a python in its $PATH (successfully) and tries to run it as a shell script, leading to that error.
You don't strictly need an ENTRYPOINT, and here it's causing trouble. Conversely, there's usually a single thing you want the container to do, so you should just specify it in the Dockerfile:
# No ENTRYPOINT
CMD ["python", "/runscript.py"]
You can try the following Dockerfile and task definition.
Dockerfile
FROM python:slim
RUN apt-get update
RUN pip install boto3 matplotlib awscli
COPY runscript.py /
# ENTRYPOINT (rather than CMD) so the task definition's command is appended to it
ENTRYPOINT ["python"]
Task Definition
['/runscript.py']
Passing the script name in the task definition gives you the flexibility to run any script when submitting a job: the container override replaces the image's CMD but leaves the ENTRYPOINT in place. Please refer to the example below for submitting a job with an overridden command.
import boto3

session = boto3.Session()
batch_client = session.client('batch')

response = batch_client.submit_job(
    jobName=job_name,
    jobQueue=AWS_BATCH_JOB_QUEUE,
    jobDefinition=AWS_BATCH_JOB_DEFINITION,
    containerOverrides={
        'command': [
            '/main.py'
        ]
    }
)

How to install external modules in a Python Lambda Function created by AWS CDK?

I'm using the Python AWS CDK in Cloud9, and I'm deploying a simple Lambda function that is supposed to send a request to Atlassian's API when an object is uploaded to an S3 bucket (also created by the CDK). Here is my code for the CDK stack:
from aws_cdk import core
from aws_cdk import aws_s3
from aws_cdk import aws_lambda
from aws_cdk.aws_lambda_event_sources import S3EventSource

class JiraPythonStack(core.Stack):
    def __init__(self, scope: core.Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        # The code that defines your stack goes here
        jira_bucket = aws_s3.Bucket(self,
                                    "JiraBucket",
                                    encryption=aws_s3.BucketEncryption.KMS)

        event_lambda = aws_lambda.Function(
            self,
            "JiraFileLambda",
            code=aws_lambda.Code.asset("lambda"),
            handler='JiraFileLambda.handler',
            runtime=aws_lambda.Runtime.PYTHON_3_6,
            function_name="JiraPythonFromCDK")

        event_lambda.add_event_source(
            S3EventSource(jira_bucket,
                          events=[aws_s3.EventType.OBJECT_CREATED]))
The Lambda function code uses the requests module, which I've imported. However, when I test the Lambda function and check the CloudWatch logs, I get:
Unable to import module 'JiraFileLambda': No module named 'requests'
My question is: how do I install the requests module via the Python CDK?
I've already looked around online and found this, but it seems to directly modify the Lambda function, which would result in stack drift (which I've been told is bad for IaC). I've also looked at the AWS CDK docs but didn't find any mention of external modules/libraries (I'm doing a more thorough check for it now). Does anybody know how I can work around this?
Edit: It would appear I'm not the only one looking for this.
Here's another GitHub issue that's been raised.
It is not even necessary to use the experimental PythonFunction functionality in CDK - there is support built into CDK to build the dependencies into a simple Lambda package (not a Docker image). It uses Docker to do the build, but the final result is still a simple zip of files. The documentation shows it here: https://docs.aws.amazon.com/cdk/api/latest/docs/aws-lambda-readme.html#bundling-asset-code ; the gist is:
new Function(this, 'Function', {
  code: Code.fromAsset(path.join(__dirname, 'my-python-handler'), {
    bundling: {
      image: Runtime.PYTHON_3_9.bundlingImage,
      command: [
        'bash', '-c',
        'pip install -r requirements.txt -t /asset-output && cp -au . /asset-output'
      ],
    },
  }),
  runtime: Runtime.PYTHON_3_9,
  handler: 'index.handler',
});
I have used this exact configuration in my CDK deployment and it works well.
And for Python, it is simply:
aws_lambda.Function(
    self,
    "Function",
    runtime=aws_lambda.Runtime.PYTHON_3_9,
    handler="index.handler",
    code=aws_lambda.Code.from_asset(
        "function_source_dir",
        bundling=core.BundlingOptions(
            image=aws_lambda.Runtime.PYTHON_3_9.bundling_image,
            command=[
                "bash", "-c",
                "pip install --no-cache -r requirements.txt -t /asset-output && cp -au . /asset-output"
            ],
        ),
    ),
)
UPDATE:
It now appears there is a new type of (experimental) Lambda function in the CDK known as the PythonFunction. The Python docs for it are here, and it includes support for a requirements.txt file, using a Docker container to install the dependencies into your function. See more details on that here. Specifically:
If requirements.txt or Pipfile exists at the entry path, the construct will handle installing all required modules in a Lambda compatible Docker container according to the runtime.
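A minimal sketch of that construct, assuming CDK v1's experimental aws_cdk.aws_lambda_python module (in CDK v2 it lives in aws_lambda_python_alpha), reusing the names from the question:
from aws_cdk import aws_lambda
from aws_cdk import aws_lambda_python as lambda_python  # experimental module

event_lambda = lambda_python.PythonFunction(
    self,
    "JiraFileLambda",
    entry="lambda",              # directory holding the handler and its requirements.txt
    index="JiraFileLambda.py",   # the file containing the handler (defaults to index.py)
    handler="handler",
    runtime=aws_lambda.Runtime.PYTHON_3_6,
)
The construct runs pip inside a Lambda-compatible Docker container during synth, so the resulting zip matches the target runtime.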
Original Answer:
So this is the awesome bit of code my manager wrote that we now use:
# Requires `import os` and `import subprocess` at the top of the stack module.
def create_dependencies_layer(self, project_name, function_name: str) -> aws_lambda.LayerVersion:
    requirements_file = "lambda_dependencies/" + function_name + ".txt"
    output_dir = ".lambda_dependencies/" + function_name

    # Install requirements for the layer into output_dir
    if not os.environ.get("SKIP_PIP"):
        # Note: pip will create the output dir if it does not exist
        subprocess.check_call(
            f"pip install -r {requirements_file} -t {output_dir}/python".split()
        )

    return aws_lambda.LayerVersion(
        self,
        project_name + "-" + function_name + "-dependencies",
        code=aws_lambda.Code.from_asset(output_dir)
    )
It's actually part of the Stack class as a method (not inside the init). The way we have it set up here is a folder called lambda_dependencies, which contains a text file for every Lambda function we deploy; each file is just a list of dependencies, like a requirements.txt.
To utilise this code, we include it in the Lambda function definition like this:
get_data_lambda = aws_lambda.Function(
    self,
    .....
    layers=[self.create_dependencies_layer(PROJECT_NAME, GET_DATA_LAMBDA_NAME)]
)
You should install the dependencies of your Lambda locally before deploying it via CDK. CDK has no idea how to install the dependencies or which libraries should be installed.
In your case, you should install the requests dependency and any other libraries before executing cdk deploy.
For example:
pip install requests --target ./asset/package
There is an example for reference.
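A sketch of how the pre-installed packages might then be wired in, assuming the handler file is also copied into ./asset/package so the asset zip contains both the code and its dependencies:
aws_lambda.Function(
    self,
    "JiraFileLambda",
    code=aws_lambda.Code.asset("asset/package"),  # handler + pip-installed packages
    handler="JiraFileLambda.handler",
    runtime=aws_lambda.Runtime.PYTHON_3_6,
)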
Wanted to share two template repos I made for this (heavily inspired by some of the above):
https://github.com/iguanaus/cdk-ecs-python-with-requirements- - demo of an ECS service running a basic Python function
https://github.com/iguanaus/cdk-lambda-python-with-requirements - demo of a Python Lambda job with requirements
Hope they are helpful for folks :)
Lastly, if you want to see a long thread on this subject, see here: https://github.com/aws/aws-cdk/issues/3660
I ran into this issue as well. I used a solution like @Kane and @Jamie suggest just fine when I was working on my Ubuntu machine. However, I ran into issues when working on macOS. Apparently some (all?) Python packages don't work on Lambda (a Linux environment) if they are pip installed on a different OS (see this Stack Overflow post).
My solution was to run the pip install inside a Docker container. This allowed me to cdk deploy from my MacBook and not run into issues with my Python packages in Lambda.
Suppose you have a dir lambda_layers/python in your CDK project that will house the Python packages for the Lambda layer:
import pathlib
import subprocess

current_path = str(pathlib.Path(__file__).parent.absolute())
pip_install_command = ("docker run --rm --entrypoint /bin/bash -v "
                       + current_path
                       + "/lambda_layers:/lambda_layers python:3.8 -c "
                       + "'pip3 install Pillow==8.1.0 -t /lambda_layers/python'")
subprocess.run(pip_install_command, shell=True)

lambda_layer = aws_lambda.LayerVersion(
    self,
    "PIL-layer",
    compatible_runtimes=[aws_lambda.Runtime.PYTHON_3_8],
    code=aws_lambda.Code.asset("lambda_layers"))
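The resulting layer is then attached to a function like any other layer; a usage sketch (the function id and code path here are placeholders, not from the original answer):
fn = aws_lambda.Function(
    self,
    "ImageFunction",  # placeholder id
    runtime=aws_lambda.Runtime.PYTHON_3_8,
    handler="index.handler",
    code=aws_lambda.Code.asset("lambda"),  # placeholder source directory
    layers=[lambda_layer],
)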
As an alternative to my other answer, here's a slightly different approach that also works with docker-in-docker (the bundling-options approach doesn't).
Set up the Lambda function like:
lambda_fn = aws_lambda.Function(
    self,
    "Function",
    runtime=aws_lambda.Runtime.PYTHON_3_9,
    code=aws_lambda.Code.from_docker_build(
        "function_source_dir",
    ),
    handler="index.lambda_handler",
)
and in function_source_dir/ have these files:
index.py (to match the above code - you can name this whatever you like)
requirements.txt
Dockerfile
Set up your Dockerfile like:
# Note that this dockerfile is only used to build the lambda asset - the
# lambda still just runs with a zip source, not a docker image.
# See the docstring for aws_lambda.Code.from_docker_build
FROM public.ecr.aws/lambda/python:3.9.2022.04.27.10-x86_64
COPY index.py /asset/
COPY requirements.txt /tmp/
RUN pip3 install -r /tmp/requirements.txt -t /asset
and the synth step will build your asset in Docker (using the above Dockerfile) and then pull the built Lambda source from the /asset/ directory in the image.
I haven't looked in much detail into why the BundlingOptions approach fails to build when running inside a Docker container, but this one does work (as long as Docker is run with -v /var/run/docker.sock:/var/run/docker.sock to enable docker-in-docker). As always, be sure to consider your security posture when doing this.

OpenWhisk Docker: different behavior from the IBM Cloud CLI than from its frontend

I want to run my Python program on IBM Cloud Functions; because of its dependencies, this needs to be done in an OpenWhisk Docker action. I've changed my code so that it accepts a JSON argument:
import json
import sys

json_input = json.loads(sys.argv[1])
INSTANCE_NAME = json_input['INSTANCE_NAME']
I can run it from the terminal:
python main/main.py '{"INSTANCE_NAME": "example"}'
I've added this Python program to OpenWhisk with this Dockerfile:
# Dockerfile for example whisk docker action
FROM openwhisk/dockerskeleton
ENV FLASK_PROXY_PORT 8080
### Add source file(s)
ADD requirements.txt /action/requirements.txt
RUN cd /action; pip install -r requirements.txt
# Move the source files into the action directory
ADD ./main /action
# Rename our executable Python action
ADD /main/main.py /action/exec
CMD ["/bin/bash", "-c", "cd actionProxy && python -u actionproxy.py"]
But now, if I run it using the IBM Cloud CLI, I just get my JSON back:
ibmcloud fn action invoke --result e2t-whisk --param-file ../env_var.json
# {"INSTANCE_NAME": "example"}
And if I run it from the IBM Cloud Functions website with the same JSON input, I get an error as if the input weren't there at all:
stderr: INSTANCE_NAME = json_input['INSTANCE_NAME']",
stderr: KeyError: 'INSTANCE_NAME'"
What can be wrong that the code runs when invoked directly, but not from the OpenWhisk container?
