Celery task is always PENDING inside Docker container (Flask + Celery + RabbitMQ + Docker) - python

I'm creating a basic project to test Flask + Celery + RabbitMQ + Docker.
For some reason I cannot figure out, when I call the Celery task it seems to reach RabbitMQ, but it always stays in the PENDING state and never changes to another state. I tried to use task.get(), but the code just freezes. Example:
The celery worker (e.g. worker_a.py) is something like this:
from celery import Celery

# Initialize Celery
celery = Celery('worker_a',
                broker='amqp://guest:guest@tfcd_rabbit:5672//',
                backend='rpc://')

[...]

@celery.task()
def add_nums(a, b):
    return a + b
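For context, the Flask side calls the task roughly like this (a sketch; the actual caller code isn't shown in the question, and the timeout value is only illustrative):

from worker_a import add_nums

result = add_nums.delay(4, 4)      # publishes the task to the broker
print(result.state)                # stays "PENDING" in the failing setup
print(result.get(timeout=10))      # .get() without a timeout blocks forever here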
While docker-compose.yml is something like this:
[...]
  tfcd_rabbit:
    container_name: tfcd_rabbit
    hostname: tfcd_rabbit
    image: rabbitmq:3.8.11-management
    environment:
      - RABBITMQ_ERLANG_COOKIE=test
      - RABBITMQ_DEFAULT_USER=guest
      - RABBITMQ_DEFAULT_PASS=guest
    ports:
      - 5672:5672
      - 15672:15672
    networks:
      - tfcd

  tfcd_worker_a:
    container_name: tfcd_worker_a
    hostname: tfcd_worker_1
    image: test_flask_celery_docker
    entrypoint: celery
    command: -A worker_a worker -l INFO -Q worker_a
    volumes:
      - .:/app
    links:
      - tfcd_rabbit
    depends_on:
      - tfcd_rabbit
    networks:
      - tfcd
[...]
The repository with all the files and instructions to run it can be found here.
Would anyone know what might be going on?
Thank you in advance.

After a while, a friend of mine discovered the problem:
The queue name was missing from the task decorator, so Celery was publishing to its default queue "celery" instead of the worker_a queue the worker was listening on.
The final code is this:
[...]

@celery.task(queue='worker_a')
def add_nums(a, b):
    return a + b
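For reference, the routing can also be done at call time or in the app configuration instead of on the decorator; a minimal sketch (the queue and module names are taken from the question, the rest is illustrative):

# Option 1: route this particular call to the worker_a queue
add_nums.apply_async(args=(4, 4), queue='worker_a')

# Option 2: keep the task definition unchanged and route it via the app config
celery.conf.task_routes = {
    'worker_a.add_nums': {'queue': 'worker_a'},
}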

Related

python locust pass custom arguments to workers

I need to run Locust in distributed mode. I'd like to use custom arguments, and each worker needs to receive its own specific value for the argument.
Here is sample python code:
"""
Locust test runner
"""
from locust import HttpUser, task, events, constant_pacing, tag


@events.init_command_line_parser.add_listener
def add_custom_parameters(parser):
    """Set arguments which can be passed also via web ui"""
    parser.add_argument(
        "--property",
        type=str,
        env_var="PROPERTY",
        default="",
        help="set name or id",
    )


class AwesomeUser(HttpUser):
    """
    One AwesomeUser class to rule them all...
    """
    host = "EMPTY"
    wait_time = constant_pacing(1)

    def on_start(self):
        """
        On start procedure.
        """
        print(f"HERE: {self.environment.parsed_options.property}")

    @task(10)
    @tag("test_it")
    def test_it(self):
        """
        Test if custom parameters can be used in that way.
        """
        print(f"property: {self.environment.parsed_options.property}")


if __name__ == "__main__":
    AwesomeUser.tasks = [AwesomeUser.test_it]
I'd like to use docker-compose.yaml; I have made many attempts, but it looks like I cannot manage it. Sample code that is not working:
version: '3'
services:
  master:
    build:
      context: .
    volumes:
      - type: bind
        source: "./tests"
        target: "/home/locust/tests"
    ports:
      - "8089:8089"
    command: -f /home/locust/tests/load_test.py --master -u 3 -r 1
  worker:
    build:
      context: .
    volumes:
      - type: bind
        source: "./tests"
        target: "/home/locust/tests"
    command: -f /home/locust/tests/load_test.py --worker --master-host master --property "SOSN_1"
  worker2:
    build:
      context: .
    volumes:
      - type: bind
        source: "./tests"
        target: "/home/locust/tests"
    command: -f /home/locust/tests/load_test.py --worker --master-host master --property "SOSN_2"
  worker3:
    build:
      context: .
    volumes:
      - type: bind
        source: "./tests"
        target: "/home/locust/tests"
    command: -f /home/locust/tests/load_test.py --worker --master-host master --property "SOSN_3"
There is a workaround: run each worker in its own screen session, each worker as a standalone master (that way I'm able to run many Locust scripts in parallel):
killall screen
source venv/bin/activate
for i in {1..3}; do
    sleep 2
    echo create worker screen worker_$i
    screen -dmS "worker_$i" locust -f tests/load_test.py --property "SOSN_$i" -t 10m --headless
done
However, I hope that it can be done via docker-compose. Ideally I would run something like docker-compose up --scale worker=3 and, as a result, Locust would run in distributed mode with 3 workers, each using a different value of my custom argument.
Is it possible?
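One direction worth noting (my own assumption, not something stated in the question): the custom option above already declares env_var="PROPERTY", so each worker container could pick its value up from the environment rather than from a CLI flag, which fits scaled containers better. A rough Python sketch of the locustfile side, reading the environment directly:

import os

from locust import HttpUser, constant_pacing, task


class AwesomeUser(HttpUser):
    host = "EMPTY"
    wait_time = constant_pacing(1)

    def on_start(self):
        # Read the per-worker value from the environment; PROPERTY matches the
        # env_var declared on the custom --property option above.
        self.property_value = os.environ.get("PROPERTY", "")
        print(f"HERE: {self.property_value}")

    @task
    def test_it(self):
        print(f"property: {self.property_value}")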

Job is visible only from redis cli but not showing in rq dashboard and not executed

I want to build a pipeline using Redis and RQ. I created a worker, a server and a job: the worker is running and listening to the queue, the server dispatches a job to the queue, and I print the job ID. In the console I can see the worker log that it received a job on a queue, but the job never executes and never shows up in rq-dashboard, although I can see it in the Redis CLI.
Versions I am using:
rq==1.7.0
redis==3.5.0
Here is my code:
Worker in run.py
import os

import redis
from rq import Worker, Queue, Connection

listen = ['stance_queue', 'default']
redis_url = os.getenv('REDIS_URL', 'redis://redis:6379')
conn = redis.from_url(redis_url)
# conn = redis.Redis(host='redis', port=6379)

if __name__ == '__main__':
    with Connection(conn):
        print("Createing worker")
        worker = Worker(map(Queue, listen))
        # worker = Worker([Queue()])
        worker.work()
And here is where I dispatch a job:
from flask import request
from rq import Queue

from workers.stance.run import conn

q = Queue('default', connection=conn)


@server.route("/task")
def home():
    if request.args.get("n"):
        print('create a job in default queue')
        job = q.enqueue(background_task, args=(20,))
        return f"Task ({job.id}) added to queue at {job.enqueued_at}"
    return "No value for count provided"
And here is the background job
import time


def background_task(n):
    """ Function that returns len(n) and simulates a delay """
    delay = 2
    print("Task running", flush=True)
    print(f"Simulating a {delay} second delay", flush=True)
    time.sleep(delay)
    print(len(n))
    print("Task complete")
    return len(n)
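When a job seems stuck like this, it can also help to inspect it directly via RQ's Job API; a small debugging sketch (the job id is the one printed in the worker logs below):

from rq.job import Job

from workers.stance.run import conn

job = Job.fetch("9f1f31e0-f465-4019-9dc6-85bc349feab9", connection=conn)
print(job.origin)        # which queue the job was enqueued on
print(job.get_status())  # queued / started / finished / failed
print(job.exc_info)      # traceback, if the worker raised an exception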
Here is a screenshot of rq-dashboard.
And here are the logs from the worker:
Attaching to annotators_server_stance_worker_1
stance_worker_1 | Createing worker
stance_worker_1 | 08:33:44 Worker rq:worker:cae161cf792b4c998376cde2c0848291: started, version 1.7.0
stance_worker_1 | 08:33:44 Subscribing to channel rq:pubsub:cae161cf792b4c998376cde2c0848291
stance_worker_1 | 08:33:44 *** Listening on stance_queue, default...
stance_worker_1 | 08:33:44 Cleaning registries for queue: stance_queue
stance_worker_1 | 08:33:44 Cleaning registries for queue: default
stance_worker_1 | 08:33:49 default: home.annotator_server.background_task(20) (9f1f31e0-f465-4019-9dc6-85bc349feab9)
And here are the logs from redis-cli:
docker-compose exec redis redis-cli
127.0.0.1:6379> keys *
1) "rq:workers"
2) "rq:failed:default"
3) "rq:clean_registries:default"
4) "rq:queues"
5) "rq:job:9f1f31e0-f465-4019-9dc6-85bc349feab9"
6) "rq:worker:cae161cf792b4c998376cde2c0848291"
7) "rq:workers:default"
8) "rq:clean_registries:stance_queue"
9) "rq:workers:stance_queue"
And here is my compose
version: '3'
services:
  annotators_server:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "5000:5000"
    volumes:
      - ./app:/home
    depends_on:
      - redis
  redis:
    image: "redis:alpine"
  dashboard:
    image: "godber/rq-dashboard"
    ports:
      - 9181:9181
    command: rq-dashboard -H redis
    depends_on:
      - redis
  stance_worker:
    build:
      context: ./app/workers/stance
      dockerfile: Dockerfile
    environment:
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis
I never see any logs for the job execution. I tried to add a TTL and a timeout, but I'm still facing the same thing.
Pass the Redis database index in the connection string when starting the dashboard and the worker.
Redis URL = redis://redis-host:6379/0 (this refers to db 0).
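A minimal sketch of that suggestion applied to the worker above (assuming db 0; the hostname redis is the compose service name):

import os

import redis
from rq import Connection, Queue, Worker

# Pin the database index (/0) so the worker, the dashboard and the web app
# all point at the same Redis database.
redis_url = os.getenv("REDIS_URL", "redis://redis:6379/0")
conn = redis.from_url(redis_url)

if __name__ == "__main__":
    with Connection(conn):
        Worker([Queue("stance_queue"), Queue("default")]).work()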

Celery Worker not picking task when run inside docker containers

I am facing this issue: when I run my Celery worker inside a Docker container, it's not picking up tasks.
I am using Flask and celery.
Here are my logs when I run it without docker
celery@MacBook-Pro.local v4.4.2 (cliffs)
Darwin-18.2.0-x86_64-i386-64bit 2020-05-26 22:16:40

[config]
.> app:         __main__:0x111343470
.> transport:   redis://localhost:6379//
.> results:     redis://localhost:6379/
.> concurrency: 8 (prefork)
.> task events: ON

[queues]
.> celery           exchange=celery(direct) key=celery

[tasks]
. load_data.scraping.tasks.scrape_the_data_daily
. scrape the data daily
You can clearly see that my worker is finding the task but it's not running the periodic task.
When I run the same command in docker here is what I am getting:
celery-worker_1 | /usr/local/lib/python3.6/site-packages/celery/platforms.py:801: RuntimeWarning: You're running the worker with superuser privileges: this is
celery-worker_1 | absolutely not recommended!
celery-worker_1 |
celery-worker_1 | Please specify a different user using the --uid option.
celery-worker_1 |
celery-worker_1 | User information: uid=0 euid=0 gid=0 egid=0
celery-worker_1 |
celery-worker_1 | uid=uid, euid=euid, gid=gid, egid=egid,
celery-worker_1 | [2020-05-26 18:54:02,088: DEBUG/MainProcess] | Worker: Preparing bootsteps.
celery-worker_1 | [2020-05-26 18:54:02,090: DEBUG/MainProcess] | Worker: Building graph...
celery-worker_1 | [2020-05-26 18:54:02,092: DEBUG/MainProcess] | Worker: New boot order: {Timer, Hub, Pool, Autoscaler, StateDB, Beat, Consumer}
So it looks like it's not finding the app and the tasks.
But if I execute the same command from inside the Docker container, I can see that my tasks are found.
Here is how I set up my docker-compose
web:
  image: apis
  build: .
  command: uwsgi --http 0.0.0.0:5000 --module apis.wsgi:app
  env_file:
    - ./.env
  environment:
    - POSTGRES_HOST=db
    - CELERY_BROKER_URL=redis://redis:6379
    - CELERY_RESULT_BACKEND_URL=redis://redis:6379
  volumes:
    - ./apis:/code/apis
    - ./tests:/code/tests
    - ./load_data:/code/load_data
    - ./db/:/db/
  ports:
    - "5000:5000"
  links:
    - redis
redis:
  image: redis
celery-beat:
  image: apis
  command: "celery -A apis.celery_app:app beat -S celerybeatredis.schedulers.RedisScheduler --loglevel=info"
  env_file:
    - ./.env
  depends_on:
    - redis
  links:
    - redis
  environment:
    - CELERY_BROKER_URL=redis://redis:6379
    - CELERY_RESULT_BACKEND_URL=redis://redis:6379
    - CELERY_REDIS_SCHEDULER_URL=redis://redis:6379
    - C_FORCE_ROOT=true
  volumes:
    - ./apis:/code/apis
    - ./tests:/code/tests
    - ./load_data:/code/load_data
    - ./db/:/db/
  shm_size: '64m'
celery-worker:
  image: apis
  command: "celery worker -A apis.celery_app:app --loglevel=debug -E"
  env_file:
    - ./.env
  depends_on:
    - redis
    - celery-beat
  links:
    - redis
  environment:
    - CELERY_BROKER_URL=redis://redis:6379
    - CELERY_RESULT_BACKEND_URL=redis://redis:6379
    - CELERY_REDIS_SCHEDULER_URL=redis://redis:6379
    - C_FORCE_ROOT=true
  volumes:
    - ./apis:/code/apis
    - ./tests:/code/tests
    - ./load_data:/code/load_data
    - ./db/:/db/
  shm_size: '64m'
and the celery setup is like this...
from apis.app import init_celery
from celery.schedules import crontab
from apis.config import CELERY_REDIS_SCHEDULER_KEY_PREFIX, CELERY_REDIS_SCHEDULER_URL
from celery.task.control import inspect

app = init_celery()
app.conf.imports = app.conf.imports + ("load_data.scraping.tasks",)
app.conf.imports = app.conf.imports + ("apis.models.address",)

app.conf.beat_schedule = {
    'get-data-every-day': {
        'task': 'load_data.scraping.tasks.scrape_the_data_daily',
        'schedule': crontab(minute='*/5'),
    },
}
app.conf.timezone = 'UTC'
app.conf.CELERY_REDIS_SCHEDULER_URL = CELERY_REDIS_SCHEDULER_URL
app.conf.CELERY_REDIS_SCHEDULER_KEY_PREFIX = CELERY_REDIS_SCHEDULER_KEY_PREFIX

i = inspect()
print(10 * "===", i.registered_tasks())
And celery is being initialized like this
def init_celery(app=None):
    app = app or create_app()
    celery.conf.broker_url = app.config["CELERY_BROKER_URL"]
    celery.conf.result_backend = app.config["CELERY_RESULT_BACKEND"]
    celery.conf.update(app.config)

    class ContextTask(celery.Task):
        """Make celery tasks work with Flask app context"""

        def __call__(self, *args, **kwargs):
            with app.app_context():
                return self.run(*args, **kwargs)

    celery.Task = ContextTask
    return celery
Basically I have 2 questions.
1st: why am I not getting the task when running inside the Docker container?
2nd: why are my tasks not running?
Any ideas are welcome.
Okay,
I still don't know why the worker logs are not displaying the task on Docker.
But the problem was the beat scheduler I was using: for some weird reason, it was not sending the schedule for the task.
I just changed the scheduler to the RedBeat package, which is very well documented and helped me achieve what I wanted.
Celery is configured according to its documentation:
from apis.app import init_celery
from celery.schedules import crontab
from apis.config import CELERY_REDIS_SCHEDULER_URL

app = init_celery()
app.conf.imports = app.conf.imports + ("load_data.scraping.tasks",)
app.conf.imports = app.conf.imports + ("apis.models.address",)

app.conf.beat_schedule = {
    'get-data-every-day': {
        'task': 'load_data.scraping.tasks.scrape_the_data_daily',
        'schedule': crontab(minute='*/60'),
    },
}
app.conf.timezone = 'UTC'
app.conf.redbeat_redis_url = CELERY_REDIS_SCHEDULER_URL  # my redis url
And I updated the script that runs beat to this:
celery -A apis.celery_app:app beat -S redbeat.RedBeatScheduler --loglevel=info
I cannot comment as I don't have 50 karma. I'm willing to bet there is a networking issue present. Ensure all your containers are listening on the correct interface.
What makes me think this is that your redis service in docker-compose isn't declaring any networking parameters, so the defaults will be used. This would mean that the redis container isn't accessible from outside the container.
After you docker-compose up, run docker ps -a to see what interface redis is listening on.
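A quick way to verify this from inside the worker container is a broker ping; a minimal sketch, assuming the compose service name redis used above:

import redis

# If this raises a ConnectionError, the worker container cannot reach the
# broker and networking is indeed the problem.
r = redis.Redis(host="redis", port=6379)
print(r.ping())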

unable to connect to postgres DB container through psycopg2 on a different container using docker compose

Here's my docker compose:
version: '2.1'
services:
  db:
    restart: always
    image: nikitph/portcastdbimage:latest
    ports:
      - "5432:5432"
    environment:
      - DEBUG = false
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
  scraper:
    build: .
    restart: always
    links:
      - db
    environment:
      - DB_HOST = db
      - BAR = FOO
    depends_on:
      db:
        condition: service_healthy
    command: [ "python3", "./cycloneprocess.py" ]
Now, from what I have gleaned from Stack Overflow, there are two options to access this db from a different container:
a) use env variable
self.connection = psycopg2.connect(host=os.environ["DB_HOST"], user=username, password=password, dbname=database)
print(os.environ["DB_HOST"]) gives me 'db'; I don't know if that's expected.
b) directly use the hostname 'db'
self.connection = psycopg2.connect(host='db', user=username, password=password, dbname=database)
Neither of them seems to be working, as no data gets populated. Everything works locally, so I'm quite confident my code is accurate. All variables like user etc. have been checked and rechecked, and they work locally. I would really appreciate any help. Everything is on the same network, btw.
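For reference, a self-contained sketch of option (a); the credentials below are placeholders, not the asker's real values, and the "db" fallback matches the compose service name:

import os

import psycopg2

connection = psycopg2.connect(
    host=os.environ.get("DB_HOST", "db"),   # compose service name as fallback
    user="postgres",                        # placeholder credentials
    password="postgres",
    dbname="postgres",
    port=5432,
)
print(connection.get_dsn_parameters())      # confirms which host was actually used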

How can I run a docker image in kubernetes, initiated from another, and pass arguments

I have two dockerized applications which need to run in Kubernetes.
Here is the scenario which needs to be achieved.
Docker-1, which is a Flask application.
Docker-2, which is a Python script, will take input from Docker-1, execute, and needs to write some file to a volume shared with the Docker-1 container.
Here is the Flask web-app code:
from flask import Flask, request, Response, jsonify

app = Flask(__name__)


@app.route('/')
def root():
    return "The API is working fine"


@app.route('/run-docker')
def run_docker_2():
    args = "input_combo"
    query = <sql query>
    <initiate the docker run and pass params>
    exit
    # No return message, needs to run as async


if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=8080, threaded=True)
Docker file
FROM ubuntu:latest
MAINTAINER Abhilash KK "abhilash.kk@searshc.com"
RUN apt-get update -y
RUN apt-get install -y python-pip python-dev build-essential python-tk
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
ENTRYPOINT ["/usr/bin/python"]
CMD ["app.py"]
requirements.txt
flask
Python script for the second docker. start_docker.py
import sys

input_combo = sys.argv[1]
query = sys.argv[2]


def function_to_run(input_combination, query):
    # starting the model, finally creating the file
    ...


function_to_run(input_combo, query)
Docker file 2
FROM python
COPY . /script
WORKDIR /script
CMD ["python", "start_docker.py"]
Please help me connect the docker images, or let me know any other way to achieve this. The basic requirement is to add a message to some queue, have that queue be listened to at a time interval, and start the process in FIFO manner.
Any other approach, such as a GCP service, to initiate an async job that takes input from the client and creates a file accessible from the web-app Python code, would also work.
First, create a Pod running the "Docker-1" application. Then use the Kubernetes Python client to spawn a second pod with "Docker-2".
You can share a volume between your pods in order to return the data to Docker-1. In my code sample I'm using a hostPath volume, but you then need to ensure that both pods are scheduled on the same node. I added that code for readability.
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: docker1
  labels:
    app: docker1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: docker1
  template:
    metadata:
      labels:
        app: docker1
    spec:
      containers:
      - name: docker1
        image: abhilash/docker1
        ports:
        - containerPort: 8080
        volumeMounts:
        - mountPath: /shared
          name: shared-volume
      volumes:
      - name: shared-volume
        hostPath:
          path: /shared
The code of run_docker_2 handler:
from kubernetes import client, config
...
args = "input_combo"
config.load_incluster_config()

pod = client.V1Pod()
pod.metadata = client.V1ObjectMeta(name="docker2")

container = client.V1Container(name="docker2")
container.image = "abhilash/docker2"
container.args = [args]
volumeMount = client.V1VolumeMount(name="shared", mount_path="/shared")
container.volume_mounts = [volumeMount]

hostpath = client.V1HostPathVolumeSource(path="/shared")
volume = client.V1Volume(name="shared")
volume.host_path = hostpath

spec = client.V1PodSpec(containers=[container])
spec.volumes = [volume]
pod.spec = spec

v1 = client.CoreV1Api()  # API client used to create the pod
v1.create_namespaced_pod(namespace="default", body=pod)
return "OK"
A handler to read the returned results:
#app.route('/read-results')
def run_read():
with open("/shared/results.data") as file:
return file.read()
Note that it could be useful to add a watcher to wait for the pod to finish the job and then do some cleanup (delete the pod, for instance).
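A rough sketch of such a watcher (my own illustration, not part of the original answer): it blocks until the "docker2" pod reaches a terminal phase and then deletes it.

from kubernetes import client, config, watch

config.load_incluster_config()
v1 = client.CoreV1Api()

w = watch.Watch()
for event in w.stream(v1.list_namespaced_pod,
                      namespace="default",
                      field_selector="metadata.name=docker2"):
    phase = event["object"].status.phase
    if phase in ("Succeeded", "Failed"):
        w.stop()
        v1.delete_namespaced_pod(name="docker2", namespace="default")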
From what I can understand, you'd want the so-called "sidecar pattern": you can run multiple containers in one pod and share a volume, e.g.:
apiVersion: v1
kind: Pod
metadata:
  name: www
spec:
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - mountPath: /srv/www
      name: www-data
      readOnly: true
  - name: git-monitor
    image: kubernetes/git-monitor
    env:
    - name: GIT_REPO
      value: http://github.com/some/repo.git
    volumeMounts:
    - mountPath: /data
      name: www-data
  volumes:
  - name: www-data
    emptyDir: {}
You could also benefit from getting to know the basics of how Kubernetes works: Kubernetes Basics
