I have a Lambda function that runs a Docker image from ECR. I ran the Docker image on my local machine and it runs fine, but the moment I test it on Lambda it runs, finishes, and then runs again.
Here is the log output:
'updatedRows': 1, 'updatedColumns': 1, 'updatedCells': 1}}
done
END RequestId: c20b4f94-0b27-4edc-bff6-e411d6d163f1
REPORT RequestId: c20b4f94-0b27-4edc-bff6-e411d6d163f1 Duration: 305675.98 ms Billed Duration: 305676 ms Memory Size: 1024 MB Max Memory Used: 206 MB
RequestId: c20b4f94-0b27-4edc-bff6-e411d6d163f1 Error: Runtime exited without providing a reason
Runtime.ExitError
"done" indicates the program has completed. I have set the timeout to 15 minutes, but the run doesn't take that long and I don't get a timeout error.
Here is the Dockerfile:
# Dockerfile, Image, container
FROM python:3.9
COPY . /opt/app
WORKDIR /opt/app
RUN pip install -r ./requirements.txt
CMD ["python", "./run.py"]
I have checked, and I don't call the function anywhere except in run.py.
All run.py does is call the function:
from dev_main import runJobs as run
run()
and in dev_main.py I don't call any functions.
So I first have to admit I am really disappointed with the responses I got here and on the AWS Discord. Toxicity in this community can be horrible.
Anyway, this is a fairly unusual issue. Lambda is particular about how it runs and executes container images; this article outlines it quite nicely: https://docs.aws.amazon.com/lambda/latest/dg/images-create.html
The main issue was my Dockerfile.
In the Dockerfile I had
CMD ["python", "./run.py"]
which on its own would run just fine. But when the script finishes, the container's process exits; Lambda treats the runtime exiting as an error (Runtime.ExitError), restarts it, and the script runs again, all without giving any useful feedback.
The fix is to use the format Lambda expects for container images. In run.py I needed a handler function:
from dev_main import runJobs as run

def lambda_handler(event, context):
    run()
and in my Dockerfile I needed
FROM public.ecr.aws/lambda/python:3.8
# Copy function code
COPY dev_main.py ${LAMBDA_TASK_ROOT}
COPY run.py ${LAMBDA_TASK_ROOT}
# Install the function's dependencies using file requirements.txt
# from your project folder.
COPY keys.json .
COPY requirements.txt .
RUN pip3 install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"
# Set the CMD to your handler (could also be done as a parameter override outside of the Dockerfile)
CMD [ "run.lambda_handler" ]
I am trying to parallelize processes using Ray in a docker container.
from ray.util.multiprocessing import Pool

with Pool(self.n_cores) as pool:
    pool.starmap(func=func, iterable=list_of_parameters)
While it works perfectly locally, when it runs in the Docker container the following error occurs:
✗ failed cod_simulation-run
Job 'cod_simulation-run' failed:
| AttributeError: 'StreamCapture' object has no attribute 'fileno'
Write "details cod_simulation-run" to inspect the error.
Processed 1 jobs (1 failed).
I was previously doing the same thing with Python multiprocessing:
import multiprocessing as mp

with mp.Pool(self.n_cores) as pool:
    pool.starmap(func=func, iterable=list_of_parameters)
and this worked both locally and in the docker container. But for efficiency reasons, I would prefer to stick to Ray.
FROM python:3.9
WORKDIR /project_name
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
RUN find .
ENV DISABLE_CONTRACTS=1
RUN pipdeptree
RUN python setup.py develop --no-deps
RUN cod-demo --help
CMD ["cod-demo"]
This is my Dockerfile, and I am installing Ray as a requirement.
Thanks in advance for any suggestions.
After pulling the appropriate tag for your needs from their repo on Docker Hub, you can simply run tests with:
docker run --shm-size=<shm-size> -t -i rayproject/ray
or
docker run --shm-size=<shm-size> -t -i --gpus all rayproject/ray:<ray-version>-gpu
in case you're using a GPU. If you need additional packages that aren't pre-installed on the images, install them from a terminal inside the container or build a new Dockerfile with the downloaded image as the base image (the FROM instruction's argument).
According to what was mentioned here!
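For reference, a minimal sketch of that setup inside the container (the worker function and process count are illustrative, not from the original code): initialize Ray explicitly, then use the drop-in Pool exactly as before, with the object store sized via the --shm-size flag shown above.
import ray
from ray.util.multiprocessing import Pool

def simulate(a, b):  # hypothetical stand-in for the real worker function
    return a * b

if __name__ == "__main__":
    ray.init()  # local Ray runtime; /dev/shm size comes from --shm-size
    with Pool(processes=4) as pool:
        results = pool.starmap(simulate, [(1, 2), (3, 4)])
    print(results)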
I'm using Docker to containerize a repo with the following structure:
engine
--models_api
----alpha_model_api
------model_api.py
----api_utils
------utils.py
--models_logic
----models_utils
------utils.py
----model_ids.json
--Dockerfile
model_api.api is the lambda handler. It uses the models_api.api_utils.utils module to get some data.
models_api.api_utils.utils in turn uses models_logic.models_utils.utils to provide that data, which is deserialized from models_logic/model_ids.json.
The problem is that I don't know how an object living inside the api_utils scope can find its way from the working directory (${LAMBDA_TASK_ROOT}/models_api/alpha_model_api/model_api) to model_ids.json without assuming it's running in a specific directory.
The ideal situation for me would be a way to call the AWS Lambda handler (model_api.api) with the working directory being ${LAMBDA_TASK_ROOT}.
Here's the Dockerfile:
FROM public.ecr.aws/lambda/python:3.8
COPY . ${LAMBDA_TASK_ROOT}
ENV PYTHONPATH="$PYTHONPATH:${LAMBDA_TASK_ROOT}"
RUN pip install -r ./models_api/alpha_api/requirements.txt --target ./models_api/alpha_api
WORKDIR ${LAMBDA_TASK_ROOT}/models_api/alpha_api
CMD ["model_api.api"]
I want to be able to call the Lambda handler without having to set WORKDIR like this.
Is it even possible?
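It should be possible to sidestep the working-directory question entirely. A sketch, assuming the layout above: resolve the JSON path relative to the module file itself rather than the current directory, so the lookup works whatever WORKDIR happens to be.
# models_logic/models_utils/utils.py (hypothetical excerpt)
import json
from pathlib import Path

# __file__ is .../models_logic/models_utils/utils.py, so two parents up is
# .../models_logic, which contains model_ids.json regardless of the WORKDIR.
_MODEL_IDS_PATH = Path(__file__).resolve().parent.parent / "model_ids.json"

def load_model_ids():
    with open(_MODEL_IDS_PATH) as f:
        return json.load(f)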
I am trying to deploy a function that converts strings into vectors to AWS Lambda:
from functools import lru_cache

def _encode(text: str):
    ...  # <main functionality>

@lru_cache
def encode(text: str):
    return _encode(text)

def handler(event, context):
    return encode(event["text"])
This function works as expected when I call it in the Python shell:
import app
app.handler({"text": "doc"}, None)
<expected result>
The encode() function is actually complex and requires external dependencies (>1 GB), which is why I am going for the Docker image approach as described in the documentation. This is my Dockerfile:
FROM amazon/aws-lambda-python:3.9
# Install requirements
RUN python -m pip install --upgrade pip
COPY requirements.txt ${LAMBDA_TASK_ROOT}
RUN python -m pip install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"
# Copy app
COPY app.py ${LAMBDA_TASK_ROOT}
CMD [ "app.handler" ]
Build the Docker image and run it:
$ docker build -t my-image:latest .
[...]
Successfully built xxxx
Successfully tagged my-image:latest
$ docker run -p 9000:8080 my-image:latest
time="2021-10-07T10:12:13.165" level=info msg="exec '/var/runtime/bootstrap' (cwd=/var/task, handler=)"
So I test the function locally with curl, following the testing documentation, and it succeeds.
I've pushed the image to AWS ECR and created a Lambda function. I've created a test in the AWS Lambda console:
{
"text": "doc"
}
When I run the test in the AWS Lambda Console, it times out, even after increasing the timeout to 60s. Locally, the function takes less than 1 second to execute.
On AWS Lambda, only the timeout is logged, and I don't see how to get to the root cause. How can I debug this? How can I get more useful logs?
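One low-tech way to narrow this down (a sketch of the handler from app.py with logging added, not part of the original code): the Lambda Python runtime forwards output from the standard logging module to CloudWatch Logs, so progress messages show how far an invocation gets before the timeout.
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def handler(event, context):
    logger.info("handler invoked with event: %s", event)
    result = encode(event["text"])  # encode() is the cached function defined above
    logger.info("encode finished")
    return result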
I have a Dockerfile that looks like this:
FROM python:3.6
WORKDIR /app
ADD . /app/
# Install system requirements
RUN apt-get update && \
xargs -a requirements_apt.txt apt-get install -y
# Install Python requirements
RUN python -m pip install --upgrade pip
RUN python -m pip install -r requirements_pip.txt
# Circle CI ignores entrypoints by default
ENTRYPOINT ["dostuff"]
I have a CircleCI config that does:
version: 2.1
orbs:
  aws-ecr: circleci/aws-ecr@6.15.3
jobs:
benchmark_tests_dev:
docker:
- image: blah_blah_image:test_dev
#auth
steps:
- checkout
- run:
name: Compile and run benchmarks
command: make bench
workflows:
workflow_test_and_deploy_dev:
jobs:
- aws-ecr/build-and-push-image:
name: build_test_dev
context: my_context
account-url: AWS_ECR_ACCOUNT_URL
region: AWS_REGION
repo: my_repo
aws-access-key-id: AWS_ACCESS_KEY_ID
aws-secret-access-key: AWS_SECRET_ACCESS_KEY
dockerfile: Dockerfile
tag: test_dev
filters:
branches:
only: my-test-branch
- benchmark_tests_dev:
requires: [build_test_dev]
context: my_context
filters:
branches:
only: my-test-branch
- aws-ecr/build-and-push-image:
name: deploy_dev
requires: [benchmark_tests_dev]
context: my_context
account-url: AWS_ECR_ACCOUNT_URL
region: AWS_REGION
repo: my_repo
aws-access-key-id: AWS_ACCESS_KEY_ID
aws-secret-access-key: AWS_SECRET_ACCESS_KEY
dockerfile: Dockerfile
tag: test2
filters:
branches:
only: my-test-branch
make bench looks like:
bench:
python tests/benchmarks/bench_1.py
python tests/benchmarks/bench_2.py
Both benchmark tests follow this pattern:
# imports
# define constants
# Define functions/classes
if __name__ == "__main__":
# Run those tests
If I build my Docker container on my-test-branch locally, override the entrypoint to get inside of it, and run make bench from inside the container, both Python scripts execute perfectly and exit.
If I commit to the same branch and trigger the CircleCI workflow, bench_1.py runs and then never exits. I have tried switching the order of the Python scripts in the make command; in that case, bench_2.py runs and then never exits. I have tried putting a sys.exit() at the end of the if __name__ == "__main__": block of both scripts, and that doesn't force an exit on CircleCI. I know the first script to run does run to completion, because I have placed logs throughout the script to track progress. It just never exits.
Any idea why these scripts would run and exit in the container locally but not exit in the container on CircleCI?
EDIT
I just realized "never exits" is an assumption I'm making. It's possible the script exits but the CircleCI job hangs silently after that? The point is the script runs, finishes, and the CircleCI job continues to run until I get a timeout error at 10 minutes (Too long with no output (exceeded 10m0s): context deadline exceeded).
It turns out the snowflake.connector Python library we were using has an issue where, if an error occurs while a Snowflake connection is open, the connection is not properly closed and the process hangs. There is also another issue where certain errors in that library are logged but not raised, causing the first issue to occur silently.
I updated our snowflake IO handler to explicitly open/close a connection for every read/execute so that this doesn't happen. Now my scripts run just fine in the container on CircleCI. I still don't know why they ran in the container locally and not remotely, but I'm going to leave that one for the dev ops gods.
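For reference, a rough sketch of that per-call pattern (the function name and parameters are illustrative, not our actual IO handler): open a fresh connection for each query and close everything in finally blocks, so a failed query cannot leave a live connection keeping the process alive.
import snowflake.connector

def run_query(sql, **connect_kwargs):
    # One connection per call, always closed, even when the query raises.
    conn = snowflake.connector.connect(**connect_kwargs)
    try:
        cur = conn.cursor()
        try:
            cur.execute(sql)
            return cur.fetchall()
        finally:
            cur.close()
    finally:
        conn.close()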
I built an AWS Batch compute environment. I want to run a python script in jobs.
Here is the docker file I'm using :
FROM python:slim
RUN apt-get update
RUN pip install boto3 matplotlib awscli
COPY runscript.py /
ENTRYPOINT ["/bin/bash"]
The command in my task definition is:
python /runscript.py
When I submit a job in AWS console I get this error in CloudWatch:
/usr/local/bin/python: /usr/local/bin/python: cannot execute binary file
And the job gets the status FAILED.
What is going wrong? When I run the container locally, I can launch the script without any errors.
Delete your ENTRYPOINT line, and replace it with a CMD that says what the container actually does.
There are two parts to the main command a Docker container runs, ENTRYPOINT and CMD; they are combined into one command when the container starts. The command your container is running is probably something like
/bin/bash python /runscript.py
So bash finds a python in its $PATH (successfully), and tries to run it as a shell script (leading to that error).
You don't strictly need an ENTRYPOINT, and here it's causing trouble. Conversely, there's usually a single thing you want the container to do, so just specify it in the Dockerfile.
# No ENTRYPOINT
CMD ["python", "/runscript.py"]
You can try the following Dockerfile and task definition.
Dockerfile
FROM python:slim
RUN apt-get update
RUN pip install boto3 matplotlib awscli
COPY runscript.py /
CMD ["/bin/python"]
Task Definition
['/runscript.py']
Passing the script name in the task definition gives you the flexibility to run any script when submitting a job. Refer to the example below for submitting a job and overriding the task definition.
import boto3
session = boto3.Session()
batch_client = session.client('batch')
response = batch_client.submit_job(
    jobName=job_name,
    jobQueue=AWS_BATCH_JOB_QUEUE,
    jobDefinition=AWS_BATCH_JOB_DEFINITION,
    containerOverrides={
        'command': [
            '/main.py'
        ]
    }
)