AWS CodeBuild not progressing in build phase - python

I have the following buildspec.yml file:
version: 0.2
env:
  parameter-store:
    s3DestFileName: "/CodeBuild/s3DestFileName"
    s3SourceFileName: "/CodeBuild/s3SourceFileName"
    imgFileName: "/CodeBuild/imgFileName"
    imgPickleFileName: "/CodeBuild/imgPickleFileName"
phases:
  install:
    on-failure: ABORT
    runtime-versions:
      python: 3.7
    commands:
      - echo Entered the install phase. Downloading new assets to /tmp
      - aws s3 cp s3://xxxx/yyy/test.csv /tmp/test.csv
      - aws s3 cp s3://xxxx/yyy/test2.csv /tmp/test2.csv
      - ls -la /tmp
      - curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python3 -
      - export PATH=$PATH:$HOME/.poetry/bin
      - poetry --version
      - cd ./create-model/ && poetry install
  build:
    on-failure: ABORT
    commands:
      - echo Entered the build phase...
      - echo Build started on `date`
      - ls -la
      - poetry run python3 knn.py
I'm using poetry to manage all my packages. I do not have any artifacts to be used.
This is the content of the knn.py file (or rather, part of it):
import pandas as pd
print("started...")
df = pd.read_csv('/tmp/n1.csv', index_col=False)
print("df read...")
print(df.head())
I don't see any errors in the logs. The install phase runs fine, but when the build phase is started, I see that it has invoked the knn.py file.
I've waited for almost 30 mins but all I see in the log is "started..."
I don't see any of the other print statements in the log, so it is probably not progressing any further. I've tried using different AWS managed images, but it's still the same result.
This code runs perfectly fine if I run it locally on my machine.
Edit:
I tried the advanced build override and connected to the container using SSM. I installed pandas locally and ran read_csv(), and it worked. However, the command poetry run python3 knn.py from the buildspec.yml still hangs.
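What follows is a minimal diagnostic sketch, not a confirmed fix. Its purpose is to make a silent hang visible in the CodeBuild log.

It assumes the inputs are the two files the install phase downloads (/tmp/test.csv and /tmp/test2.csv). The knn.py excerpt reads /tmp/n1.csv, which the buildspec does not download, so checking paths explicitly up front would surface that kind of mismatch.

A few choices here are hypothetical:
- The names check_inputs, read_head and main, and the 300-second watchdog, are made up for illustration.
- The standard csv module stands in for pandas, so the check does not depend on the Poetry environment.

faulthandler.dump_traceback_later (standard library) prints every thread's stack to stderr if the script is still running after the timeout.

```python
import csv
import faulthandler
import os
import sys


def check_inputs(paths):
    """Return the paths that do not exist (hypothetical helper)."""
    return [p for p in paths if not os.path.exists(p)]


def read_head(path, n=5):
    """Return the first n rows of a CSV file using only the standard library."""
    with open(path, newline="") as fh:
        # zip stops after n rows without reading the rest of the file.
        return [row for _, row in zip(range(n), csv.reader(fh))]


def main(paths=("/tmp/test.csv", "/tmp/test2.csv"), watchdog_seconds=300):
    """Hypothetical entry point for knn.py; paths and timeout are assumptions."""
    # If the script is still running after watchdog_seconds, dump every
    # thread's traceback to stderr so the build log shows where it is stuck.
    faulthandler.dump_traceback_later(watchdog_seconds, exit=False)
    try:
        missing = check_inputs(paths)
        if missing:
            # Fail fast with a clear message instead of a silent build.
            sys.exit(f"missing input files: {missing}")
        print("started...", flush=True)
        print(read_head(paths[0]), flush=True)
    finally:
        faulthandler.cancel_dump_traceback_later()
```

Call main() from knn.py's `if __name__ == "__main__":` block. You can also run the script as poetry run python3 -u knn.py, or set PYTHONUNBUFFERED=1. Either one stops Python from block-buffering its output when stdout is not a terminal, so print statements reach the log as they happen.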

Related

How to integrate AWS CodeBuild with Python pytest-cov code coverage report in buildspec.yaml

I have a Python-based application that consists of:
Some Python source code that uses the AWS Boto3 SDK to interact with AWS resources
A Dockerfile that builds upon the AWS public.ecr.aws/lambda/python:3.9 image
An AWS SAM (Serverless Application Model) template that builds a lambda to execute the Docker image when the lambda is invoked
The first part of my build commands in the buildspec.yaml file are intended to execute all unit tests with a code coverage report. This works well.
I was able to integrate the unit test report with AWS CodeBuild using the reports section of the buildspec:
reports:
  pytest_reports:
    files:
      - junitxml-report.xml
    base-directory: ./pytest_reports
    file-format: JUNITXML
This works as expected. I can see that a new "Report group" and the first report was created in CodeBuild after my code pipeline executed. Unfortunately, this only includes the unit test results report.
QUESTION: How do I integrate my Python code coverage report with CodeBuild via the buildspec.yaml file?
I have found some information on this AWS documentation page, but the list of code coverage report formats did not include anything that I can generate from a Python code coverage run. I am still somewhat new to Python development, so I was hoping an expert may have already solved this.
For reference, here is my complete buildspec.yaml file (with some sensitive values scrubbed):
version: 0.2
env:
  variables:
    # Elastic Container Registry (ECR) hosts
    MAIN_REPO: 999999999999.dkr.ecr.us-east-1.amazonaws.com
    DR_REPO: 999999999999.dkr.ecr.us-west-2.amazonaws.com
phases:
  install:
    runtime-versions:
      python: 3.9
  build:
    on-failure: ABORT
    commands:
      # -------------------------------------------------------------------------------------------
      # PART 1 - EXECUTE UNIT TESTS AND CODE COVERAGE ON THE PYTHON SOURCE CODE
      # -------------------------------------------------------------------------------------------
      # install/upgrade build-related modules that CodeBuild will use
      - python3 -m pip install --upgrade pip
      - python3 -m pip install --upgrade pytest
      - python3 -m pip install --upgrade pytest-mock
      - python3 -m pip install --upgrade pytest-cov
      # do local user 'install' of source code, then run pytest (company-private Pypi repo must be explicitly included)
      - pip install --extra-index-url https://artifactory.my-company-domain.com/artifactory/api/pypi/private-pypi/simple -e ./the_python_code
      - python3 -m pytest --junitxml=./pytest_reports/junitxml-report.xml --cov-fail-under=69 --cov-report xml:pytest_reports/cov.xml --cov-report html:pytest_reports/cov_html --cov-report term-missing --cov=./the_python_code/src/ ./the_python_code
      # -------------------------------------------------------------------------------------------
      # PART 2 - BUILD THE DOCKER IMAGE AND PUBLISH TO ECR
      # -------------------------------------------------------------------------------------------
      # REFERENCE: https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html
      # Pre-authenticate access to Docker Hub and Elastic Container Registry for image pulls and pushes
      - aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin 999999999999.dkr.ecr.us-east-1.amazonaws.com
      - docker image build -t 999999999999.dkr.ecr.us-east-1.amazonaws.com/my-docker-image-tag-name .
      - docker push 999999999999.dkr.ecr.us-east-1.amazonaws.com/my-docker-image-tag-name
      # -------------------------------------------------------------------------------------------
      # PART 3 - BUILD THE SAM PROJECT
      # -------------------------------------------------------------------------------------------
      - printenv
      - echo "-----------------------------------------------------"
      - 'echo "ARTIFACTS_BUCKET_NAME : $ARTIFACTS_BUCKET_NAME"'
      - 'echo "ARTIFACTS_BUCKET_PATH : $ARTIFACTS_BUCKET_PATH"'
      - 'echo "CODEBUILD_KMS_KEY_ID : $CODEBUILD_KMS_KEY_ID"'
      - echo "-----------------------------------------------------"
      - MAIN_TEMPLATE="main-template.yaml"
      - sam build --debug
      - |
        sam package \
          --template-file .aws-sam/build/template.yaml \
          --output-template-file "${MAIN_TEMPLATE}" \
          --image-repository "999999999999.dkr.ecr.us-east-1.amazonaws.com/my-docker-image-tag-name" \
          --s3-bucket "${ARTIFACTS_BUCKET_NAME}" \
          --s3-prefix "${ARTIFACTS_BUCKET_PATH}" \
          --kms-key-id "${CODEBUILD_KMS_KEY_ID}" \
          --force-upload
reports:
  pytest_reports:
    files:
      - junitxml-report.xml
    base-directory: ./pytest_reports
    file-format: JUNITXML
artifacts:
  files:
    - main-template.yaml
    - parameters/*.json
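CodeBuild's code coverage reports accept Cobertura XML. The XML that coverage.py writes for `--cov-report xml:pytest_reports/cov.xml` is Cobertura-compatible.

So one likely approach is a second entry under reports: with these settings (untested here):
- files: cov.xml
- base-directory: ./pytest_reports
- file-format: COBERTURAXML

Before wiring that up, a small standard-library sketch can confirm that the file has the expected shape. The function name is hypothetical, and the check only covers the root element and its line-rate attribute.

```python
import xml.etree.ElementTree as ET


def cobertura_line_rate(path):
    """Return the overall line-rate from a Cobertura-style coverage XML file."""
    root = ET.parse(path).getroot()
    # A Cobertura-style report has a <coverage> root element whose
    # line-rate attribute is a fraction between 0 and 1.
    if root.tag != "coverage" or "line-rate" not in root.attrib:
        raise ValueError(f"{path} does not look like a Cobertura report")
    return float(root.attrib["line-rate"])
```

For example, `cobertura_line_rate("pytest_reports/cov.xml")` run after the pytest command returns the same overall line coverage that --cov-fail-under checks, as a fraction.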

How to install python and run a python file in a gitlab job

image: mcr.microsoft.com/dotnet/core/sdk:3.1
.deploy: &deploy
  before_script:
    - apt-get update -y
  script:
    - cd source/
    - pip install -r requirements.txt
    - python build_file.py > swagger.yml
I want to run the build_file.py file and write the output to swagger.yml. So to run the file I need to install python. How can I do that?
You can use a different Docker image for each job, so you can split your deployment stage into multiple jobs. In one job, use the python:3 image, for example, to run pip and generate the swagger.yml, then define it as an artifact that will be used by the next jobs.
Example (untested!) snippet:
deploy-swagger:
  image: python:3
  stage: deploy
  script:
    - cd source/
    - pip install -r requirements.txt
    - python build_file.py > swagger.yml
  artifacts:
    paths:
      - source/swagger.yml
deploy-dotnet:
  image: mcr.microsoft.com/dotnet/core/sdk:3.1
  stage: deploy
  dependencies:
    - deploy-swagger
  script:
    - ls -l source/swagger.yml
    - ...
You could (and probably should) also make the swagger generation part of a previous stage and set an expiration for the artifact. See this blog post for an example.

Python process never exits in Docker container during CircleCI workflow

I have a Dockerfile that looks like this:
FROM python:3.6
WORKDIR /app
ADD . /app/
# Install system requirements
RUN apt-get update && \
    xargs -a requirements_apt.txt apt-get install -y
# Install Python requirements
RUN python -m pip install --upgrade pip
RUN python -m pip install -r requirements_pip.txt
# Circle CI ignores entrypoints by default
ENTRYPOINT ["dostuff"]
I have a CircleCI config that does:
version: 2.1
orbs:
  aws-ecr: circleci/aws-ecr@6.15.3
jobs:
  benchmark_tests_dev:
    docker:
      - image: blah_blah_image:test_dev
        #auth
    steps:
      - checkout
      - run:
          name: Compile and run benchmarks
          command: make bench
workflows:
  workflow_test_and_deploy_dev:
    jobs:
      - aws-ecr/build-and-push-image:
          name: build_test_dev
          context: my_context
          account-url: AWS_ECR_ACCOUNT_URL
          region: AWS_REGION
          repo: my_repo
          aws-access-key-id: AWS_ACCESS_KEY_ID
          aws-secret-access-key: AWS_SECRET_ACCESS_KEY
          dockerfile: Dockerfile
          tag: test_dev
          filters:
            branches:
              only: my-test-branch
      - benchmark_tests_dev:
          requires: [build_test_dev]
          context: my_context
          filters:
            branches:
              only: my-test-branch
      - aws-ecr/build-and-push-image:
          name: deploy_dev
          requires: [benchmark_tests_dev]
          context: my_context
          account-url: AWS_ECR_ACCOUNT_URL
          region: AWS_REGION
          repo: my_repo
          aws-access-key-id: AWS_ACCESS_KEY_ID
          aws-secret-access-key: AWS_SECRET_ACCESS_KEY
          dockerfile: Dockerfile
          tag: test2
          filters:
            branches:
              only: my-test-branch
make bench looks like:
bench:
	python tests/benchmarks/bench_1.py
	python tests/benchmarks/bench_2.py
Both benchmark tests follow this pattern:
# imports
# define constants
# Define functions/classes
if __name__ == "__main__":
    # Run those tests
If I build my Docker container on my-test-branch locally, override the entrypoint to get inside of it, and run make bench from inside the container, both Python scripts execute perfectly and exit.
If I commit to the same branch and trigger the CircleCI workflow, bench_1.py runs and then never exits. I have tried switching the order of the Python scripts in the make command; in that case, bench_2.py runs and then never exits. I have tried putting a sys.exit() at the end of the if __name__ == "__main__": block of both scripts, and that doesn't force an exit on CircleCI. I know the first script runs to completion because I have placed logs throughout it to track progress. It just never exits.
Any idea why these scripts would run and exit in the container locally but not exit in the container on CircleCI?
EDIT
I just realized "never exits" is an assumption I'm making. It's possible the script exits but the CircleCI job hangs silently after that? The point is the script runs, finishes, and the CircleCI job continues to run until I get a timeout error at 10 minutes (Too long with no output (exceeded 10m0s): context deadline exceeded).
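A common reason a Python process finishes its work but does not exit is a non-daemon thread that is still running. At shutdown the interpreter waits for every non-daemon thread. A sys.exit() in the main thread only raises SystemExit there, so it does not stop those threads.

Whether that is what happened here is an assumption. The sketch below is a way to check it. The function name is hypothetical and it uses only the standard threading module.

```python
import threading


def lingering_threads():
    """Return names of live non-daemon threads other than the main thread."""
    main = threading.main_thread()
    # The interpreter joins these at shutdown, which looks like a hang.
    return [
        t.name
        for t in threading.enumerate()
        if t is not main and t.is_alive() and not t.daemon
    ]
```

Calling it at the end of each benchmark's __main__ block and printing the result with flush=True would show whether something, such as a client library's background thread, is keeping the process alive.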
Turns out the snowflake.connector Python lib we were using has this issue where if an error occurs during an open Snowflake connection, the connection is not properly closed and the process hangs. There is also another issue where certain errors in that lib are being logged and not raised, causing the first issue to occur silently.
I updated our snowflake IO handler to explicitly open/close a connection for every read/execute so that this doesn't happen. Now my scripts run just fine in the container on CircleCI. I still don't know why they ran in the container locally and not remotely, but I'm going to leave that one for the dev ops gods.
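The fix described above opens and closes a connection around every read or execute. It can be sketched as a context manager. This is a generic DB-API 2.0 sketch, not the author's actual handler:
- short_lived_connection and execute are hypothetical names.
- connect stands for any connect function, for example snowflake.connector.connect together with your account parameters.

```python
import contextlib


@contextlib.contextmanager
def short_lived_connection(connect, **params):
    """Open a connection for a single operation and always close it."""
    conn = connect(**params)
    try:
        yield conn
    finally:
        # Runs even when the operation raised, so an error cannot leave
        # the connection open after the operation is over.
        conn.close()


def execute(connect, sql, args=(), **params):
    """Run one statement on a fresh connection and return all result rows."""
    with short_lived_connection(connect, **params) as conn:
        cur = conn.cursor()
        try:
            cur.execute(sql, args)
            return cur.fetchall()
        finally:
            cur.close()
```

As a stand-in that runs without credentials, `execute(sqlite3.connect, "SELECT 1 + 1", database=":memory:")` returns `[(2,)]`. Opening a connection per call costs some latency, but it keeps connection lifetime tied to a single operation, and that is the property the fix relies on.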

How to avoid error "conda --version: conda not found" in az ml run --submit-script command?

I would like to run a test script on an existing compute instance of Azure using the Azure Machine Learning extension to the Azure CLI:
az ml run submit-script test.py --target compute-instance-test --experiment-name test_example --resource-group ex-test-rg
I get a Service Error with the following error message:
Unable to run conda package manager. AzureML uses conda to provision python\nenvironments from a dependency specification. To manage the python environment\nmanually instead, set userManagedDependencies to True in the python environment\nconfiguration. To use system managed python environments, install conda from:\nhttps://conda.io/miniconda.html
But when I connect to the compute instance through the Azure portal and select the default Python kernel, conda --version prints 4.5.12. So conda is effectively already installed on the compute instance. This is why I do not understand the error message.
Further information on the azure versions:
"azure-cli": "2.12.1",
"azure-cli-core": "2.12.1",
"azure-cli-telemetry": "1.0.6",
"extensions": {
"azure-cli-ml": "1.15.0"
}
The image I use is:
mcr.microsoft.com/azure-cli:latest
Can somebody please explain as to why I am getting this error and help me resolve the error? Thank you!
EDIT: I tried to update the environment in which the az ml run-command is run.
Essentially this is my GitLab job. The installation of miniconda is a bit complicated as the azure-cli uses an alpine Linux image (reference: Installing miniconda on alpine linux fails). I replaced some names with ... and cut out some irrelevant pieces of code.
test:
  image: 'mcr.microsoft.com/azure-cli:latest'
  script:
    - echo "Download conda"
    - apk --update add bash curl wget ca-certificates libstdc++ glib
    - wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://raw.githubusercontent.com/sgerrand/alpine-pkg-node-bower/master/sgerrand.rsa.pub
    - curl -L "https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-2.23-r3.apk" -o glibc.apk
    - apk del libc6-compat
    - apk add glibc.apk
    - curl -L "https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-bin-2.23-r3.apk" -o glibc-bin.apk
    - apk add glibc-bin.apk
    - curl -L "https://github.com/andyshinn/alpine-pkg-glibc/releases/download/2.25-r0/glibc-i18n-2.25-r0.apk" -o glibc-i18n.apk
    - apk add --allow-untrusted glibc-i18n.apk
    - /usr/glibc-compat/bin/localedef -i en_US -f UTF-8 en_US.UTF-8
    - /usr/glibc-compat/sbin/ldconfig /lib /usr/glibc/usr/lib
    - rm -rf glibc*apk /var/cache/apk/*
    - echo "yes" | curl -sSL https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -o miniconda.sh
    - echo "Install conda"
    - (echo -e "\n"; echo "yes"; echo -e "\n"; echo "yes") | bash -bfp miniconda.sh
    - echo "Installing Azure Machine Learning Extension"
    - az extension add -n azure-cli-ml
    - echo "Azure Login"
    - az login
    - az account set --subscription ...
    - az configure --defaults group=...
    - az ml folder attach -w ...
    - az ml run submit-script test.py --target ... --experiment-name hello_world --resource-group ...
You need conda in your base image for a container-based environment. You can extend the base image by installing conda, using base_dockerfile instead of base_image:
https://learn.microsoft.com/en-us/python/api/azureml-core/azureml.core.environment.dockersection?view=azure-ml-py
or, if that works for you, use one of the AzureML base Docker images.
If you do not need any Python dependencies on top of your base image, you can set user_managed_dependencies to True; the base image will then be used as is and no additional dependencies will be installed:
https://learn.microsoft.com/en-us/python/api/azureml-core/azureml.core.environment.pythonsection?view=azure-ml-py
One needs to pass the --workspace-name argument to be able to run it on Azure's compute target and not on the local compute target:
az ml run submit-script test.py --target compute-instance-test --experiment-name test_example --resource-group ex-test-rg --workspace-name test-ws
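Without --workspace-name, the submission ran on the local compute target. In the GitLab job above, that is the azure-cli container, not the compute instance. So what matters is whether conda is on PATH inside that container.

The sketch below uses only the standard library. It assumes a Python interpreter is available in the container, and the function name is hypothetical.

```python
import shutil
import subprocess


def conda_version(executable="conda"):
    """Return the output of `<executable> --version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        # This is the situation the "conda not found" error describes.
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Combine both streams in case the version is written to stderr.
    return (result.stdout + result.stderr).strip()
```

A None result in the job, alongside 4.5.12 on the compute instance, would confirm that the two environments differ.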
Use:
runconfig.environment.python.user_managed_dependencies = True
That should solve the issue

Local GitLab runner freezes while Shared GitLab.com runner succeeds

EDIT: As Rekovni pointed out, using a GitLab runner with Docker on a Windows machine is a problem. Installing the runner in a Linux-based virtual machine solved the problem.
I am developing a Python program using a conda environment. It is hosted on GitLab.com and I am using GitLab-CI to generate the documentation.
I configured the following .gitlab-ci.yml file for it:
image: continuumio/miniconda3:latest
before_script:
  # Update conda and create environment, which is then activated.
  - conda update -vvv -y -c conda-forge conda
  - conda env create -f helpers/NAME.yml
  - source activate NAME
  # Correct installation.
  - conda install -q -y gsl=2.2.1
pages:
  script:
    # Install make.
    - apt-get update
    - apt-get install -q -y build-essential
    # Install Sphinx-related packages.
    - conda install -q -y sphinx sphinx_rtd_theme
    # Create documentation.
    - cd REPO/doc
    - sphinx-apidoc -o source/ ../REPO --force --separate
    - make html
    # Transfer documentation to public pages folder.
    - mv build/html/ ../../public/
  artifacts:
    paths:
      - public
  # only:
  #   - master
Running this script with a shared GitLab runner that is supplied with GitLab.com works and the documentation is generated and placed in the public folder.
For future unit tests (which take much longer), I want to provide a local runner on a Win 10 machine in my network. For this, I installed the gitlab-runner.exe and Docker Desktop. I successfully registered the runner with the project on GitLab.com.
The runner is using the following config.toml configuration file:
concurrent = 1
check_interval = 0
log_level = "info"

[session_server]
  session_timeout = 1800

[[runners]]
  name = "NAME"
  url = "https://gitlab.com"
  token = "TOKEN"
  executor = "docker"
  [runners.custom_build_dir]
  [runners.docker]
    tls_verify = false
    image = "alpine:latest"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
The problem now is that the local runner freezes while executing the above script, without producing any error messages, and I am at a loss as to how to debug it. What I have is:
1. The log of the script that is shown on the Job page on GitLab.com; and
2. The console output of the gitlab-runner.exe on the local machine.
Regarding 1., I see
Running with gitlab-runner 11.10.0 (3001a600)
...
Checking out COMMIT_HASH as BRANCH_NAME...
...
$ conda update -vvv -y -c conda-forge conda
DEBUG conda.gateways.logging:set_verbosity(148): verbosity set to 3
...
...
...
TRACE conda.gateways.disk.update:rename(52): renaming /opt/conda/share/doc/openssl/html/man3/OSSL_STORE_LOADER_new.html => /opt/conda/share/doc/openssl/html/man3/OSSL_STORE_LOADER_new.html.c~
TRACE conda.core.path_actions:execute(1041): renaming share/doc/openssl/html/man3/OSSL_STORE_LOADER_set_close.html => share/doc/openssl/html/man3/OSSL_STORE_LOADER_set_close.html.c~
TRACE conda.gateways.disk.update:rename(52): renaming /opt/conda/share/doc/openssl/html/man3/OSSL_STORE_LOADER_set_close.html => /opt/conda/share/doc/openssl/html/man3/OSSL_STORE_LOADER_set_close.html.c~
TRACE conda.core.path_actions:execute(1041): renaming share/doc/openssl/html/man3/OSSL_STORE_LOADER_set_ctrl.html => share/doc/openssl/html/man3/OSSL_STORE_LOADER_set_ctrl.html.c~
where it abruptly stops without reaching the - conda env create -f helpers/NAME.yml line.
Regarding 2., I see
C:\GitLab-Runner>gitlab-runner.exe --debug run
Runtime platform arch=amd64 os=windows pid=14116 revision=3001a600 version=11.10.0
Starting multi-runner from C:\GitLab-Runner\config.toml ... builds=0
Checking runtime mode GOOS=windows uid=-1
Configuration loaded builds=0
...
Feeding runners to channel builds=0
Checking for jobs... nothing runner=TOKEN
Feeding runners to channel builds=0
Checking for jobs... received job=203033130 repo_url=REPO_URL.git runner=TOKEN
...
Attaching to container HASH ... job=203033130 project=6249897 runner=TOKEN
Starting container HASH ... job=203033130 project=6249897 runner=TOKEN
Waiting for attach to finish HASH ... job=203033130 project=6249897 runner=TOKEN
Waiting for container HASH ... job=203033130 project=6249897 runner=TOKEN
Appending trace to coordinator... ok code=202 job=203033130 job-log=0-10348 job-status=running runner=TOKEN sent-log=1801-10347 status=202 Accepted
Appending trace to coordinator... ok code=202 job=203033130 job-log=0-19445 job-status=running runner=TOKEN sent-log=10348-19444 status=202 Accepted
...
Appending trace to coordinator... ok code=202 job=203033130 job-log=0-933150 job-status=running runner=TOKEN sent-log=241860-933149 status=202 Accepted
Submitting job to coordinator... ok code=200 job=203033130 job-status= runner=TOKEN
Submitting job to coordinator... ok code=200 job=203033130 job-status= runner=TOKEN
where it seems that the switch from Appending trace to coordinator to Submitting job to coordinator happens around the time when it gets stuck.
After this, 1. is not updated with any further information and 2. is stuck in a Submitting job to coordinator loop.
Does anyone know:
What the reason for the failure with a local runner could be (when the same script works with a shared runner)?
What I could do to debug this problem?
Thanks and all the best,
Thomas
GitLab CI doesn't currently offer a solution for using its runner with Docker on a Windows environment, however there is an epic at the moment which is tracking progress for this.
In one of the issues of the epic, a contributor has managed to get a working version of a gitlab-runner that uses Docker for Windows; more details can be found here.
A more common (and potentially easier) way of using Docker in a Windows environment would be to install the gitlab-runner as a Shell runner and call the Docker commands manually to run your tests.
Conversely, if you just want to keep using the same CI script, you could install a Linux VM on your Windows 10 machine, and have that host the docker runner!
