Local GitLab runner freezes while Shared GitLab.com runner succeeds - python

EDIT: As Rekovni pointed out, using a GitLab runner with Docker on a Windows machine is a problem. Installing the runner in a Linux-based virtual machine solved the problem.
I am developing a Python program using a conda environment. It is hosted on GitLab.com and I am using GitLab-CI to generate the documentation.
I configured the following .gitlab-ci.yml file for it:
image: continuumio/miniconda3:latest
before_script:
  # Update conda and create environment, which is then activated.
  - conda update -vvv -y -c conda-forge conda
  - conda env create -f helpers/NAME.yml
  - source activate NAME
  # Correct installation.
  - conda install -q -y gsl=2.2.1
pages:
  script:
    # Install make.
    - apt-get update
    - apt-get install -q -y build-essential
    # Install Sphinx-related packages.
    - conda install -q -y sphinx sphinx_rtd_theme
    # Create documentation.
    - cd REPO/doc
    - sphinx-apidoc -o source/ ../REPO --force --separate
    - make html
    # Transfer documentation to public pages folder.
    - mv build/html/ ../../public/
  artifacts:
    paths:
      - public
  # only:
  #   - master
Running this script with a shared GitLab runner that is supplied with GitLab.com works and the documentation is generated and placed in the public folder.
For future unit tests (which take much longer), I want to provide a local runner on a Win 10 machine in my network. For this, I installed the gitlab-runner.exe and Docker Desktop. I successfully registered the runner with the project on GitLab.com.
The runner is using the following config.toml configuration file:
concurrent = 1
check_interval = 0
log_level = "info"
[session_server]
  session_timeout = 1800
[[runners]]
  name = "NAME"
  url = "https://gitlab.com"
  token = "TOKEN"
  executor = "docker"
  [runners.custom_build_dir]
  [runners.docker]
    tls_verify = false
    image = "alpine:latest"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
The problem is now that the local runner freezes during the execution of the above script without producing any error messages and I am at a loss on how to debug it. What I have is
1. The log of the script that is shown on the Job page on GitLab.com; and
2. The console output of the gitlab-runner.exe on the local machine.
Regarding 1., I see
Running with gitlab-runner 11.10.0 (3001a600)
...
Checking out COMMIT_HASH as BRANCH_NAME...
...
$ conda update -vvv -y -c conda-forge conda
DEBUG conda.gateways.logging:set_verbosity(148): verbosity set to 3
...
...
...
TRACE conda.gateways.disk.update:rename(52): renaming /opt/conda/share/doc/openssl/html/man3/OSSL_STORE_LOADER_new.html => /opt/conda/share/doc/openssl/html/man3/OSSL_STORE_LOADER_new.html.c~
TRACE conda.core.path_actions:execute(1041): renaming share/doc/openssl/html/man3/OSSL_STORE_LOADER_set_close.html => share/doc/openssl/html/man3/OSSL_STORE_LOADER_set_close.html.c~
TRACE conda.gateways.disk.update:rename(52): renaming /opt/conda/share/doc/openssl/html/man3/OSSL_STORE_LOADER_set_close.html => /opt/conda/share/doc/openssl/html/man3/OSSL_STORE_LOADER_set_close.html.c~
TRACE conda.core.path_actions:execute(1041): renaming share/doc/openssl/html/man3/OSSL_STORE_LOADER_set_ctrl.html => share/doc/openssl/html/man3/OSSL_STORE_LOADER_set_ctrl.html.c~
where it abruptly stops without reaching the - conda env create -f helpers/NAME.yml line.
Regarding 2., I see
C:\GitLab-Runner>gitlab-runner.exe --debug run
Runtime platform arch=amd64 os=windows pid=14116 revision=3001a600 version=11.10.0
Starting multi-runner from C:\GitLab-Runner\config.toml ... builds=0
Checking runtime mode GOOS=windows uid=-1
Configuration loaded builds=0
...
Feeding runners to channel builds=0
Checking for jobs... nothing runner=TOKEN
Feeding runners to channel builds=0
Checking for jobs... received job=203033130 repo_url=REPO_URL.git runner=TOKEN
...
Attaching to container HASH ... job=203033130 project=6249897 runner=TOKEN
Starting container HASH ... job=203033130 project=6249897 runner=TOKEN
Waiting for attach to finish HASH ... job=203033130 project=6249897 runner=TOKEN
Waiting for container HASH ... job=203033130 project=6249897 runner=TOKEN
Appending trace to coordinator... ok code=202 job=203033130 job-log=0-10348 job-status=running runner=TOKEN sent-log=1801-10347 status=202 Accepted
Appending trace to coordinator... ok code=202 job=203033130 job-log=0-19445 job-status=running runner=TOKEN sent-log=10348-19444 status=202 Accepted
...
Appending trace to coordinator... ok code=202 job=203033130 job-log=0-933150 job-status=running runner=TOKEN sent-log=241860-933149 status=202 Accepted
Submitting job to coordinator... ok code=200 job=203033130 job-status= runner=TOKEN
Submitting job to coordinator... ok code=200 job=203033130 job-status= runner=TOKEN
where it seems that the switch from Appending trace to coordinator to Submitting job to coordinator happens around the time when it gets stuck.
After this, 1. is not updated with any further information and 2. is stuck in a Submitting job to coordinator loop.
Does anyone know:
What the reason for the failure with a local runner could be (when the same script works with a shared runner)?
What I could do to debug this problem?
Thanks and all the best,
Thomas

GitLab CI doesn't currently offer a solution for using its runner with Docker in a Windows environment; however, there is an epic that tracks progress on this.
In one of the issues in the epic, a contributor has managed to get a working version of gitlab-runner that uses Docker for Windows; more details can be found in that issue.
A more common (and potentially easier) way of using Docker in a Windows environment would be to install the gitlab-runner as a Shell runner and call the Docker commands manually to run your tests.
Alternatively, if you just want to keep using the same CI script, you could install a Linux VM on your Windows 10 machine and have that host the Docker runner!
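The Shell-runner route mentioned above could be sketched roughly as follows. This is only an illustration, not a tested CI setup: the helper name docker_run_command, the /work mount point, and the choice to drive Docker from a Python script are all assumptions; the image and the commands are taken from the question's .gitlab-ci.yml.

```python
import os
import subprocess
from typing import List, Sequence


def docker_run_command(image: str, project_dir: str, steps: Sequence[str]) -> List[str]:
    """Build a `docker run` invocation that executes `steps` inside `image`.

    The project directory is mounted at /work (a hypothetical mount point)
    and used as the working directory inside the container.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{project_dir}:/work",
        "-w", "/work",
        image,
        "bash", "-c", " && ".join(steps),
    ]


if __name__ == "__main__":
    cmd = docker_run_command(
        "continuumio/miniconda3:latest",
        os.getcwd(),
        [
            "conda env create -f helpers/NAME.yml",
            "source activate NAME",
            "conda install -q -y gsl=2.2.1",
        ],
    )
    # check=True makes the CI job fail if any step inside the container fails.
    subprocess.run(cmd, check=True)
```

A Shell-runner job would then only need to call this script; whether the volume mount behaves as expected on Windows depends on the Docker Desktop file-sharing settings.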

Related

how to run multiple fedora commands in python

So I'm trying to have Python run multiple commands to install programs and enable SSH in order to set up my Linux computer. I could type all this in by hand, but I'll be doing this on more devices, so I figured why not put it in a Python script. So far that's easier said than done: I did a boatload of research on this and I can't find anything like it.
So here's what I got so far.
import subprocess
SSH = "systemctl enable sshd"
payload = "nmap" # it'll be one of a few I'll be installing
subprocess.call(["sudo", "yum", "install", "-y", payload])
subprocess.call(["sudo", SSH])
The first part of this works perfectly: it asks for my password, then updates and installs nmap. But for some reason the command "systemctl enable sshd" always seems to throw it off. I know the command works because I can type it out and it works just fine by itself, but it won't work through this script. I've tried subprocess.run as well. What am I missing here?
Here's the error that I get:
sudo: systemctl start sshd: command not found
What you want is Ansible.
Ansible uses SSH to connect to a list of machines and perform configuration tasks. Tasks are described in YAML, which is readable and scales well. You can use playbooks or ad hoc commands. For example, an ad hoc command to install a package would be
ansible all -i inventory.file --become -m yum -a "name=nmap state=present"
In a playbook, installing and enabling openssh-server would look like this:
---
- hosts: all                                  # Single host or group of hosts from the inventory file
  become: yes                                 # Run the tasks with sudo
  tasks:                                      # List of tasks
    - name: Install ssh-server                # Free-text description of the task
      yum:                                    # Module name
        name: openssh-server                  # Name of the package
        state: present                        # "state: absent" would uninstall the package
    - name: Start and enable service          # Free-text description of the task
      service:                                # Module name
        name: sshd                            # Name of the service
        state: started                        # started or stopped
        enabled: yes                          # Start the service on boot
    - name: Edit config file sshd_config      # Free-text description of the task
      lineinfile:                             # Module name
        path: /etc/ssh/sshd_config            # Which file to edit
        regexp: '^(# *)?PasswordAuthentication'   # Which line to edit
        line: PasswordAuthentication no       # What to replace it with
Ansible has great documentation at https://docs.ansible.com/; in a few days you will be up to speed.
Best regards.
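As for why the original script fails: subprocess passes each list element to sudo as one argument, so sudo looks for a program literally named "systemctl enable sshd" and reports "command not found". A minimal sketch of a fix, assuming the commands stay as plain strings in the script (the helper name sudo_command is made up for illustration), is to split the string into separate arguments first:

```python
import shlex
import subprocess
from typing import List


def sudo_command(command: str) -> List[str]:
    """Split a shell-style command string into separate arguments for sudo."""
    return ["sudo"] + shlex.split(command)


if __name__ == "__main__":
    payload = "nmap"  # one of several packages to install
    subprocess.call(["sudo", "yum", "install", "-y", payload])
    # Each word is now its own argument: sudo, systemctl, enable, sshd.
    subprocess.call(sudo_command("systemctl enable sshd"))
```

Writing the list out by hand, as in ["sudo", "systemctl", "enable", "sshd"], works equally well; shlex.split is only a convenience when the commands are kept as strings.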

AWS Codebuild not progressing in build phase

I have the following buildspec.yml file
version: 0.2
env:
  parameter-store:
    s3DestFileName: "/CodeBuild/s3DestFileName"
    s3SourceFileName: "/CodeBuild/s3SourceFileName"
    imgFileName: "/CodeBuild/imgFileName"
    imgPickleFileName: "/CodeBuild/imgPickleFileName"
phases:
  install:
    on-failure: ABORT
    runtime-versions:
      python: 3.7
    commands:
      - echo Entered the install phase. Downloading new assets to /tmp
      - aws s3 cp s3://xxxx/yyy/test.csv /tmp/test.csv
      - aws s3 cp s3://xxxx/yyy/test2.csv /tmp/test2.csv
      - ls -la /tmp
      - curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python3 -
      - export PATH=$PATH:$HOME/.poetry/bin
      - poetry --version
      - cd ./create-model/ && poetry install
  build:
    on-failure: ABORT
    commands:
      - echo Entered the build phase...
      - echo Build started on `date`
      - ls -la
      - poetry run python3 knn.py
I'm using poetry to manage all my packages. I do not have any artifacts to be used.
This is the content of the knn.py file (or part of it actually)
import pandas as pd
print("started...")
df = pd.read_csv('/tmp/n1.csv', index_col=False)
print("df read...")
print(df.head())
I don't see any errors in the logs. The install phase runs fine, and when the build phase starts I can see that knn.py has been invoked.
I've waited for almost 30 minutes, but all I see in the log is "started...".
I don't see any of the later print statements in the log, so it is probably not progressing any further. I've tried different AWS managed images, but the result is still the same.
This code runs perfectly fine if I run it locally on my machine.
Edit:
I tried the advanced build override and connected to the container using SSM. I installed pandas locally, ran read_csv(), and it worked. However, the command poetry run python3 knn.py from the buildspec.yml still hangs.
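One way to narrow down where a script like knn.py hangs is sketched below. It does not diagnose this particular build: it only flushes every log line immediately, so buffered output cannot hide progress, and uses the standard-library faulthandler module to dump the stack of every thread periodically. The helper names and the 60-second interval are assumptions made for illustration.

```python
import faulthandler
import sys
from typing import Optional, TextIO


def enable_hang_diagnostics(timeout_s: float = 60.0, file: Optional[TextIO] = None) -> None:
    """Dump the tracebacks of all threads every `timeout_s` seconds until cancelled."""
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=file or sys.stderr)


def log(message: str) -> None:
    """Print and flush immediately so the CI log shows progress in real time."""
    print(message, flush=True)


if __name__ == "__main__":
    enable_hang_diagnostics()
    log("started...")
    # ... the existing work, such as pd.read_csv(...), would go here ...
    faulthandler.cancel_dump_traceback_later()
    log("finished")
```

If the job stalls, the periodic traceback in the build log shows which line the interpreter is sitting on.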

Python process never exits in Docker container during CircleCI workflow

I have a Dockerfile that looks like this:
FROM python:3.6
WORKDIR /app
ADD . /app/
# Install system requirements
RUN apt-get update && \
    xargs -a requirements_apt.txt apt-get install -y
# Install Python requirements
RUN python -m pip install --upgrade pip
RUN python -m pip install -r requirements_pip.txt
# Circle CI ignores entrypoints by default
ENTRYPOINT ["dostuff"]
I have a CircleCI config that does:
version: 2.1
orbs:
  aws-ecr: circleci/aws-ecr@6.15.3
jobs:
  benchmark_tests_dev:
    docker:
      - image: blah_blah_image:test_dev
        #auth
    steps:
      - checkout
      - run:
          name: Compile and run benchmarks
          command: make bench
workflows:
  workflow_test_and_deploy_dev:
    jobs:
      - aws-ecr/build-and-push-image:
          name: build_test_dev
          context: my_context
          account-url: AWS_ECR_ACCOUNT_URL
          region: AWS_REGION
          repo: my_repo
          aws-access-key-id: AWS_ACCESS_KEY_ID
          aws-secret-access-key: AWS_SECRET_ACCESS_KEY
          dockerfile: Dockerfile
          tag: test_dev
          filters:
            branches:
              only: my-test-branch
      - benchmark_tests_dev:
          requires: [build_test_dev]
          context: my_context
          filters:
            branches:
              only: my-test-branch
      - aws-ecr/build-and-push-image:
          name: deploy_dev
          requires: [benchmark_tests_dev]
          context: my_context
          account-url: AWS_ECR_ACCOUNT_URL
          region: AWS_REGION
          repo: my_repo
          aws-access-key-id: AWS_ACCESS_KEY_ID
          aws-secret-access-key: AWS_SECRET_ACCESS_KEY
          dockerfile: Dockerfile
          tag: test2
          filters:
            branches:
              only: my-test-branch
make bench looks like:
bench:
	python tests/benchmarks/bench_1.py
	python tests/benchmarks/bench_2.py
Both benchmark tests follow this pattern:
# imports
# define constants
# Define functions/classes
if __name__ == "__main__":
    # Run those tests
If I build my Docker container on my-test-branch locally, override the entrypoint to get inside of it, and run make bench from inside the container, both Python scripts execute perfectly and exit.
If I commit to the same branch and trigger the CircleCI workflow, bench_1.py runs and then never exits. I have tried switching the order of the Python scripts in the make command; in that case, bench_2.py runs and then never exits. I have tried putting a sys.exit() at the end of the if __name__ == "__main__": block of both scripts, and that doesn't force an exit on CircleCI. I know the first script to be run runs to completion, because I have placed logs throughout the script to track progress. It just never exits.
Any idea why these scripts would run and exit in the container locally but not exit in the container on CircleCI?
EDIT
I just realized "never exits" is an assumption I'm making. It's possible the script exits but the CircleCI job hangs silently after that? The point is the script runs, finishes, and the CircleCI job continues to run until I get a timeout error at 10 minutes (Too long with no output (exceeded 10m0s): context deadline exceeded).
Turns out the snowflake.connector Python lib we were using has this issue where if an error occurs during an open Snowflake connection, the connection is not properly closed and the process hangs. There is also another issue where certain errors in that lib are being logged and not raised, causing the first issue to occur silently.
I updated our snowflake IO handler to explicitly open/close a connection for every read/execute so that this doesn't happen. Now my scripts run just fine in the container on CircleCI. I still don't know why they ran in the container locally and not remotely, but I'm going to leave that one for the dev ops gods.
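The "explicitly open and close a connection for every read/execute" pattern could look roughly like the sketch below. It is not the author's actual handler: run_query and the connect factory are hypothetical names, and sqlite3 stands in for the Snowflake connector so that the example runs on its own. The assumption is only that the real connection object follows the usual Python DB-API shape (cursor(), execute(), fetchall(), close()).

```python
import sqlite3
from contextlib import closing
from typing import Any, Callable, List, Tuple


def run_query(connect: Callable[[], Any], sql: str) -> List[Tuple]:
    """Open a connection, run one statement, and always close the connection.

    `closing` calls close() even if execute() raises, so a failed query
    cannot leave a connection open behind it.
    """
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute(sql)
            return cur.fetchall()


if __name__ == "__main__":
    # sqlite3 is used here only as a stand-in for the real connector.
    rows = run_query(lambda: sqlite3.connect(":memory:"), "SELECT 1 + 1")
    print(rows)  # prints [(2,)]
```

Opening a connection per call costs some latency, but it keeps the lifetime of each connection short and explicit.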

ERROR: py35: InterpreterNotFound: python3.5 even though python3.5 is installed

I'm running my builds on my CI (Bamboo) via tox in Docker.
My tox.ini looks like this:
[tox]
envlist = py27,py35
[testenv]
deps=-rrequirements.txt
commands=pytest
I'm running the tests like so:
tox --recreate -vv -i $myindexserver
Testing the setup locally works (inside Docker):
py27: commands succeeded
py35: commands succeeded
congratulations :)
But running the same thing on the CI instance fails with
___________________________________ summary_________________________________
py27: commands succeeded
ERROR: py35: InterpreterNotFound: python3.5
Inside the Docker container, running which python3 and which python3.5 succeeds.
Has anyone faced a similar issue?
It turns out that the Docker image version I used locally and the one used by the CI were different.
I'm keeping the answer here in the hope that someone else finds it useful and is spared the many hours of debugging that I had to waste.
Run docker images to find the tag that you're using locally, and check it against the version running inside your CI.
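To compare the two containers quickly, a small check like the following could be run inside each of them. It is a sketch under assumptions: the helper names are invented, it relies on tox's default py<major><minor> environment naming, and a plain PATH lookup is not necessarily the same discovery logic tox itself uses.

```python
import shutil
from typing import Dict, Iterable, Optional


def interpreter_for(env_name: str) -> str:
    """Map a tox env name such as 'py35' to the executable name 'python3.5'."""
    digits = env_name[len("py"):]
    return f"python{digits[0]}.{digits[1:]}"


def find_interpreters(env_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the PATH location of each env's interpreter, or None if it is missing."""
    return {name: shutil.which(interpreter_for(name)) for name in env_names}


if __name__ == "__main__":
    for env, path in find_interpreters(["py27", "py35"]).items():
        print(f"{env}: {path or 'NOT FOUND'}")
```

If the output differs between the local container and the CI container, the two are not running the same image.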

Setting up SaltStack for local masterless development?

I am trying to set up Salt Stack for local development, but in masterless mode.
I have copied my states (top.sls, mystate.sls) to /srv/salt.
I have followed the instructions on the local development page and the salt masterless quickstart page, but when I run
$ sudo /home/vagrant/.virtualenvs/myenv/bin/salt-call -c /home/vagrant/.virtualenvs/myenv/etc/salt --local salt.highstate -l debug
All I get is
[DEBUG ] Could not LazyLoad salt.highstate
'salt.highstate' is not available.
I'm running salt in a vagrant ubuntu/trusty64 virtualbox virtual machine on a Mac.
It seems like other modules load (I see them in the debug listing) but for some reason highstate (highstate.py?) is not being loaded.
What am I doing wrong? Is there something additional I have to do for masterless development?
I got help on #salt IRC channel from whytewolf - the problem was that the command should be state.highstate (not salt.highstate):
$ sudo /home/vagrant/.virtualenvs/myenv/bin/salt-call -c /home/vagrant/.virtualenvs/myenv/etc/salt --local state.highstate -l debug
Problem solved!
