Can't decrypt _val for key=..., invalid token or value - python

I'm trying to use Docker for Airflow. My directories are organized like so:
airflow/
|
|--dags/
| |--test_dag.py
|
|--docker/
| |--config/
| | |--airflow.cfg
| |
| |--docker-compose.yml
| |--Dockerfile
|
|--requirements.txt
|--variables.json
I have the following in my airflow.cfg file:
...
# Secret key to save connection passwords in the db
fernet_key = $FERNET_KEY
...
I set the FERNET_KEY environment variable in my docker-compose file:
version: '2.1'
services:
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow

  webserver:
    image: docker-airflow
    restart: always
    depends_on:
      - postgres
    environment:
      - LOAD_EX=n
      - EXECUTOR=Local
      - FERNET_KEY=<FERNET_KEY>
    volumes:
      - $AIRFLOW_HOME/dags:/usr/local/airflow/dags
      - $AIRFLOW_HOME/variables.json:/variables.json
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
When I attempt docker-compose up I am met with an error in my test_dag.py:
Can't decrypt _val for key=test_env_variable, invalid token or value
Am I missing something? Do I need to specify my FERNET_KEY in an additional location? Any help is greatly appreciated.
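For context (not part of the original question), this error usually means the Fernet key the webserver is using does not match the key that encrypted the value in the metadata DB. A minimal sketch of generating a key and checking a round-trip with the cryptography package that Airflow depends on (variable names here are illustrative):

# A minimal sketch: generate a Fernet key and verify an encrypt/decrypt round-trip.
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # this value is what FERNET_KEY should hold
print(key.decode())

f = Fernet(key)
token = f.encrypt(b"some secret")  # analogous to what Airflow stores in its DB
print(f.decrypt(token))            # decryption only succeeds with the same key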

Related

Unable to connect to mysql using docker-compose

I am trying to connect to a MySQL DB from a Python script run via Docker. I have the following compose file:
version: '3.9'
services:
  mysql_db:
    image: mysql:latest
    restart: unless-stopped
    environment:
      MYSQL_DATABASE: ${MY_SQL_DATABASE}
      MYSQL_USER: ${MY_SQL_USER}
      MYSQL_PASSWORD: ${MY_SQL_PASSWORD}
      MYSQL_ROOT_PASSWORD: ${MY_SQL_ROOT_PASSWORD}
    ports:
      - '3306:3306'
    volumes:
      - ./mysql-data:/var/lib/mysql

  adminer:
    image: adminer:latest
    restart: unless-stopped
    ports:
      - 8080:8080

  ingestion-python:
    build:
      context: .
      dockerfile: ingestion.dockerfile
    depends_on:
      - mysql_db
Adminer connects to MySQL successfully. I then created the following ingestion script to automate the creation of a table:
from dotenv import load_dotenv
import os
import pandas as pd
from sqlalchemy import create_engine


def main():
    load_dotenv('.env')
    user = os.environ.get('MY_SQL_USER')
    password = os.environ.get('MY_SQL_PASSWORD')
    host = os.environ.get('MY_SQL_HOST')
    port = os.environ.get('MY_SQL_PORT')
    db = os.environ.get('MY_SQL_DATABASE')
    table_name = os.environ.get('MY_SQL_TABLE_NAME')

    print(f'mysql+pymysql://{user}:{password}@{host}:{port}/{db}')
    engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}:{port}/{db}')

    df = pd.read_csv('./data/data.parquet', encoding='ISO-8859-1', on_bad_lines='skip', engine='python')
    df.to_sql(name=table_name, con=engine, if_exists='append')


if __name__ == '__main__':
    main()
When I run my Docker Compose file (docker-compose up -d) I get:
2023-02-14 08:58:59 sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'mysql_db' ([Errno 111] Connection refused)")
2023-02-14 08:58:59 (Background on this error at: https://sqlalche.me/e/20/e3q8)
The credentials and connections are retrieved from my .env file:
#MYSQL CONFIG
MY_SQL_DATABASE = test_db
MY_SQL_USER = data
MY_SQL_PASSWORD = random
MY_SQL_ROOT_PASSWORD = root
#PYTHON INGESTION
MY_SQL_HOST = mysql_db
MY_SQL_PORT = 3306
MY_SQL_TABLE_NAME = test_table
Why can't I connect to the MySQL DB from my Python script?
This is most likely a timing problem: your ingestion container starts before the database inside the mysql container is ready. depends_on only waits for the mysql container to start, not for the database to actually be ready to accept connections.
You might want to check the log output from the containers to see when the database actually becomes ready, and add some delay to the ingestion container. Another option is to open the connection in a loop with enough retries and a short timeout between attempts, so that the ingestion starts as soon as the database is ready (see the sketch below).
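A minimal sketch of that retry loop, assuming the same SQLAlchemy/pymysql stack as the question (the retry count and delay are arbitrary):

import time
from sqlalchemy import create_engine, text

def wait_for_db(url, retries=30, delay=2):
    """Keep trying to open a connection until MySQL actually accepts it."""
    engine = create_engine(url)
    for attempt in range(1, retries + 1):
        try:
            with engine.connect() as conn:
                conn.execute(text("SELECT 1"))
            return engine  # database is ready
        except Exception as exc:
            print(f"DB not ready (attempt {attempt}/{retries}): {exc}")
            time.sleep(delay)
    raise RuntimeError("Database never became ready")

# Example usage (placeholder credentials):
# engine = wait_for_db('mysql+pymysql://data:random@mysql_db:3306/test_db')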
You should set the hostname in your docker compose file:
mysql_db:
  hostname: "mysql_db"
  image: mysql:latest
  restart: unless-stopped
  environment:
    MYSQL_DATABASE: ${MY_SQL_DATABASE}
    MYSQL_USER: ${MY_SQL_USER}
    MYSQL_PASSWORD: ${MY_SQL_PASSWORD}
    MYSQL_ROOT_PASSWORD: ${MY_SQL_ROOT_PASSWORD}
  ports:
    - '3306:3306'
  volumes:
    - ./mysql-data:/var/lib/mysql
But as a fallback, since you don't have a custom network set up, you can also try the default hostname:port that Docker exposes as the connection target:
MY_SQL_HOST = host.docker.internal
MY_SQL_PORT = 3306
MY_SQL_TABLE_NAME = test_table

Run a custom task asynchronously in airflow using existing celery

I have a running Airflow setup with Celery and Redis.
By default this sends the DAG's tasks to the Celery worker.
I want to run a custom Celery task from Python code inside one of the DAG's tasks.
In tasks.py I have the following code:
from airflow.configuration import conf
from airflow.config_templates.default_celery import DEFAULT_CELERY_CONFIG
from celery import Celery
from celery import shared_task

if conf.has_option('celery', 'celery_config_options'):
    celery_configuration = conf.getimport('celery', 'celery_config_options')
else:
    celery_configuration = DEFAULT_CELERY_CONFIG

app = Celery(conf.get('celery', 'CELERY_APP_NAME'), config_source=celery_configuration, include=["dags.tasks"])
app.autodiscover_tasks(force=True)

print("here")
print(conf.get('celery', 'CELERY_APP_NAME'))
print(celery_configuration)
print(app)


@app.task(name='maximum')
def maximum(x=10, y=11):
    # print("here")
    print(x)
    if x > y:
        return x
    else:
        return y


tasks = app.tasks.keys()
print(tasks)
I am calling this from one of the DAG's tasks:
max = maximum.apply_async(kwargs={'x': 5, 'y': 4})
print(max)
print(max.get(timeout=5))
I am getting:
File "/home/airflow/.local/lib/python3.7/site-packages/celery/result.py", line 336, in maybe_throw
self.throw(value, self._to_remote_traceback(tb))
File "/home/airflow/.local/lib/python3.7/site-packages/celery/result.py", line 329, in throw
self.on_ready.throw(*args, **kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/vine/promises.py", line 234, in throw
reraise(type(exc), exc, tb)
File "/home/airflow/.local/lib/python3.7/site-packages/vine/utils.py", line 30, in reraise
raise value
celery.exceptions.NotRegistered: 'maximum'
In the registered tasks from above I am getting:
tasks = app.tasks.keys()
print(tasks)
Output:
dict_keys(['celery.chunks', 'airflow.executors.celery_executor.execute_command', 'maximum', 'celery.backend_cleanup', 'celery.chord_unlock', 'celery.group', 'celery.map', 'celery.accumulate', 'celery.chain', 'celery.starmap', 'celery.chord'])
Maximum is there in registered tasks.
The Airflow worker is run from Docker as follows (snippet from docker-compose.yaml):
airflow-worker:
  <<: *airflow-common
  command: celery worker
  healthcheck:
    test:
      - "CMD-SHELL"
      - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
    interval: 10s
    timeout: 10s
    retries: 5
  restart: always
Full docker-compose.yaml
version: '3'
x-airflow-common:
  &airflow-common
  image: ${AIRFLOW_IMAGE_NAME:-tanesca-airflow:2.1.0}
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth'
    # _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-pandas kiteconnect}
    # _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-pandas}
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"
  # user: "${AIRFLOW_UID:-50000}:0"
  depends_on:
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: ****
      POSTGRES_PASSWORD: ***
      POSTGRES_DB: ***
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    ports:
      - 5432:5432
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5
    restart: always

  redis:
    image: redis:latest
    ports:
      - 6379:6379
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always

  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - 8080:8080
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-worker:
    <<: *airflow-common
    command: celery worker
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-init:
    <<: *airflow-common
    command: version
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_UPGRADE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}

  flower:
    <<: *airflow-common
    command: celery flower
    ports:
      - 5555:5555
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

volumes:
  postgres-db-volume:
Airflow worker logs:
 -------------- celery@eecdca8a08ff v5.2.7 (dawn-chorus)
--- ***** -----
-- ******* ---- Linux-5.15.0-1019-aws-x86_64-with-debian-11.4 2022-09-02 12:35:42
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app:         airflow.executors.celery_executor:0x7fa27b38b0d0
- ** ---------- .> transport:   redis://redis:6379/0
- ** ---------- .> results:     postgresql://airflow:**@postgres/airflow
- *** --- * --- .> concurrency: 16 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
 -------------- [queues]
                .> default          exchange=default(direct) key=default

[tasks]
  . airflow.executors.celery_executor.execute_command

[2022-09-02 12:35:50,295: INFO/MainProcess] Connected to redis://redis:6379/0
[2022-09-02 12:35:50,310: INFO/MainProcess] mingle: searching for neighbors
I assume that you simply want to run custom Python code within your task; I'm not sure why you are using the Celery decorator, maybe I missed something.
Anyway, I would recommend using PythonOperator for that. You implement your own logic and it will run on the Celery worker.
Based on your code above, I've created a short example:
import logging
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def maximum(**kwargs):
    logging.warning(f"got this args: {kwargs}")
    x = kwargs.get("x")
    y = kwargs.get("y")
    if x > y:
        return x
    else:
        return y


def minimum(**kwargs):
    logging.warning(f"got this args: {kwargs}")
    x = kwargs.get("x")
    y = kwargs.get("y")
    if x < y:
        return x
    else:
        return y


with DAG(
    'tutorial',
    default_args={},
    description='A simple tutorial DAG',
    schedule_interval=timedelta(days=1),
    start_date=datetime(2021, 1, 1),
    catchup=False,
    tags=['example'],
) as dag:
    op_kwargs = {
        "x": 10,
        "y": 11,
    }
    t1 = PythonOperator(
        task_id="my_max_python_task",
        python_callable=maximum,
        dag=dag,
        op_kwargs=op_kwargs,
    )
    t2 = PythonOperator(
        task_id="my_min_python_task",
        python_callable=minimum,
        dag=dag,
        op_kwargs=op_kwargs,
    )

    t1 >> t2
You can see in the UI that it ran, and that it returns the result: the return value of the python_callable is pushed to XCom, so a downstream task can consume it.
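For completeness, a minimal sketch of a downstream task reading that return value via XCom (this continues the example DAG above and is illustrative, not part of the original answer):

# Appended to the DAG file above (same imports and `dag` context).
def consume_result(**kwargs):
    ti = kwargs["ti"]
    # Pull the value returned by the upstream PythonOperator; PythonOperator
    # pushes its return value to XCom by default.
    max_value = ti.xcom_pull(task_ids="my_max_python_task")
    logging.warning(f"maximum task returned: {max_value}")

t3 = PythonOperator(
    task_id="consume_result",
    python_callable=consume_result,
    dag=dag,
)

t1 >> t3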

Celery task is always PENDING inside Docker container (Flask + Celery + RabbitMQ + Docker)

I'm creating a basic project to test Flask + Celery + RabbitMQ + Docker.
For some reason that I do not know, when I call Celery the task seems to reach RabbitMQ, but it always stays in the PENDING state and never changes to another state. I tried to use task.get(), but the code just freezes. Example:
The celery worker (e.g. worker_a.py) is something like this:
from celery import Celery

# Initialize Celery
celery = Celery('worker_a',
                broker='amqp://guest:guest@tfcd_rabbit:5672//',
                backend='rpc://')

[...]

@celery.task()
def add_nums(a, b):
    return a + b
While docker-compose.yml is something like this:
[...]
  tfcd_rabbit:
    container_name: tfcd_rabbit
    hostname: tfcd_rabbit
    image: rabbitmq:3.8.11-management
    environment:
      - RABBITMQ_ERLANG_COOKIE=test
      - RABBITMQ_DEFAULT_USER=guest
      - RABBITMQ_DEFAULT_PASS=guest
    ports:
      - 5672:5672
      - 15672:15672
    networks:
      - tfcd

  tfcd_worker_a:
    container_name: tfcd_worker_a
    hostname: tfcd_worker_1
    image: test_flask_celery_docker
    entrypoint: celery
    command: -A worker_a worker -l INFO -Q worker_a
    volumes:
      - .:/app
    links:
      - tfcd_rabbit
    depends_on:
      - tfcd_rabbit
    networks:
      - tfcd
[...]
The repository with all the files and instructions to run it can be found here.
Would anyone know what might be going on?
Thank you in advance.
After a while, a friend of mine discovered the problem:
The correct queue name was missing when creating the task, so Celery was publishing it to the default queue named "celery" instead of the worker_a queue the worker consumes.
The final code is this:
[...]

@celery.task(queue='worker_a')
def add_nums(a, b):
    return a + b
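As a small variation (not from the original answer), the queue can also be selected per call instead of on the decorator; a minimal sketch assuming the same worker_a setup:

# Route a single call to the worker_a queue explicitly.
result = add_nums.apply_async(args=(4, 5), queue='worker_a')
print(result.get(timeout=10))  # -> 9, once a worker consuming worker_a picks it up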

Celery Worker not picking task when run inside docker containers

I am facing this issue: when I run my Celery worker inside a Docker container, it's not picking up tasks.
I am using Flask and Celery.
Here are my logs when I run it without Docker:
celery@MacBook-Pro.local v4.4.2 (cliffs)
Darwin-18.2.0-x86_64-i386-64bit 2020-05-26 22:16:40

[config]
.> app:         __main__:0x111343470
.> transport:   redis://localhost:6379//
.> results:     redis://localhost:6379/
.> concurrency: 8 (prefork)
.> task events: ON

[queues]
.> celery           exchange=celery(direct) key=celery

[tasks]
  . load_data.scraping.tasks.scrape_the_data_daily
  . scrape the data daily
You can clearly see that my worker is finding the task but it's not running the periodic task.
When I run the same command in Docker, here is what I get:
celery-worker_1 | /usr/local/lib/python3.6/site-packages/celery/platforms.py:801: RuntimeWarning: You're running the worker with superuser privileges: this is
celery-worker_1 | absolutely not recommended!
celery-worker_1 |
celery-worker_1 | Please specify a different user using the --uid option.
celery-worker_1 |
celery-worker_1 | User information: uid=0 euid=0 gid=0 egid=0
celery-worker_1 |
celery-worker_1 | uid=uid, euid=euid, gid=gid, egid=egid,
celery-worker_1 | [2020-05-26 18:54:02,088: DEBUG/MainProcess] | Worker: Preparing bootsteps.
celery-worker_1 | [2020-05-26 18:54:02,090: DEBUG/MainProcess] | Worker: Building graph...
celery-worker_1 | [2020-05-26 18:54:02,092: DEBUG/MainProcess] | Worker: New boot order: {Timer, Hub, Pool, Autoscaler, StateDB, Beat, Consumer}
So it looks like it's not finding the app and the tasks.
But if I execute the command from inside the Docker container, I can see that my tasks are found.
Here is how I set up my docker-compose:
web:
  image: apis
  build: .
  command: uwsgi --http 0.0.0.0:5000 --module apis.wsgi:app
  env_file:
    - ./.env
  environment:
    - POSTGRES_HOST=db
    - CELERY_BROKER_URL=redis://redis:6379
    - CELERY_RESULT_BACKEND_URL=redis://redis:6379
  volumes:
    - ./apis:/code/apis
    - ./tests:/code/tests
    - ./load_data:/code/load_data
    - ./db/:/db/
  ports:
    - "5000:5000"
  links:
    - redis

redis:
  image: redis

celery-beat:
  image: apis
  command: "celery -A apis.celery_app:app beat -S celerybeatredis.schedulers.RedisScheduler --loglevel=info"
  env_file:
    - ./.env
  depends_on:
    - redis
  links:
    - redis
  environment:
    - CELERY_BROKER_URL=redis://redis:6379
    - CELERY_RESULT_BACKEND_URL=redis://redis:6379
    - CELERY_REDIS_SCHEDULER_URL=redis://redis:6379
    - C_FORCE_ROOT=true
  volumes:
    - ./apis:/code/apis
    - ./tests:/code/tests
    - ./load_data:/code/load_data
    - ./db/:/db/
  shm_size: '64m'

celery-worker:
  image: apis
  command: "celery worker -A apis.celery_app:app --loglevel=debug -E"
  env_file:
    - ./.env
  depends_on:
    - redis
    - celery-beat
  links:
    - redis
  environment:
    - CELERY_BROKER_URL=redis://redis:6379
    - CELERY_RESULT_BACKEND_URL=redis://redis:6379
    - CELERY_REDIS_SCHEDULER_URL=redis://redis:6379
    - C_FORCE_ROOT=true
  volumes:
    - ./apis:/code/apis
    - ./tests:/code/tests
    - ./load_data:/code/load_data
    - ./db/:/db/
  shm_size: '64m'
and the Celery setup is like this:
from apis.app import init_celery
from celery.schedules import crontab
from apis.config import CELERY_REDIS_SCHEDULER_KEY_PREFIX, CELERY_REDIS_SCHEDULER_URL
from celery.task.control import inspect

app = init_celery()
app.conf.imports = app.conf.imports + ("load_data.scraping.tasks",)
app.conf.imports = app.conf.imports + ("apis.models.address",)
app.conf.beat_schedule = {
    'get-data-every-day': {
        'task': 'load_data.scraping.tasks.scrape_the_data_daily',
        'schedule': crontab(minute='*/5'),
    },
}
app.conf.timezone = 'UTC'
app.conf.CELERY_REDIS_SCHEDULER_URL = CELERY_REDIS_SCHEDULER_URL
app.conf.CELERY_REDIS_SCHEDULER_KEY_PREFIX = CELERY_REDIS_SCHEDULER_KEY_PREFIX

i = inspect()
print(10 * "===", i.registered_tasks())
And celery is being initialized like this
def init_celery(app=None):
    app = app or create_app()
    celery.conf.broker_url = app.config["CELERY_BROKER_URL"]
    celery.conf.result_backend = app.config["CELERY_RESULT_BACKEND"]
    celery.conf.update(app.config)

    class ContextTask(celery.Task):
        """Make celery tasks work with Flask app context"""

        def __call__(self, *args, **kwargs):
            with app.app_context():
                return self.run(*args, **kwargs)

    celery.Task = ContextTask
    return celery
Basically I have two questions:
1. Why am I not seeing the task when running inside the Docker container?
2. Why are my tasks not running?
Any ideas are welcome.
Okay, I still don't know why the worker logs are not displaying the task in Docker.
But the problem was the beat scheduler I was using: for some weird reason it was not sending the schedule for the task.
I just changed the scheduler to RedBeat, which is very well documented and helped me achieve what I wanted.
Celery configuration, following the documentation:
from apis.app import init_celery
from celery.schedules import crontab
from apis.config import CELERY_REDIS_SCHEDULER_URL

app = init_celery()
app.conf.imports = app.conf.imports + ("load_data.scraping.tasks",)
app.conf.imports = app.conf.imports + ("apis.models.address",)
app.conf.beat_schedule = {
    'get-data-every-day': {
        'task': 'load_data.scraping.tasks.scrape_the_data_daily',
        'schedule': crontab(minute='*/60'),
    },
}
app.conf.timezone = 'UTC'
app.conf.redbeat_redis_url = my redis url
And I updated the script that runs beat to this:
celery -A apis.celery_app:app beat -S redbeat.RedBeatScheduler --loglevel=info
I cannot comment as I don't have 50 karma, but I'm willing to bet there is a networking issue present. Ensure all your containers are listening on the correct interface.
What makes me think this is that your redis service in docker-compose isn't declaring any networking parameters, so the default will be used. This could mean that the redis container isn't accessible from outside the container.
After you docker-compose up, run docker ps -a to see what interface redis is listening on.
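One quick check (an illustrative sketch, not from the original answer, assuming the redis-py client is available in the worker image) is to verify from inside the celery-worker container that the broker is reachable by its service name:

# Run inside the celery-worker container, e.g. via `docker-compose exec celery-worker python`.
import redis

client = redis.Redis(host="redis", port=6379)
print(client.ping())  # True if the broker is reachable over the compose network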

unable to connect to postgres DB container through pyscog2 on a different container using docker compose

Here's my docker-compose file:
version: '2.1'
services:
  db:
    restart: always
    image: nikitph/portcastdbimage:latest
    ports:
      - "5432:5432"
    environment:
      - DEBUG = false
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  scraper:
    build: .
    restart: always
    links:
      - db
    environment:
      - DB_HOST = db
      - BAR = FOO
    depends_on:
      db:
        condition: service_healthy
    command: [ "python3", "./cycloneprocess.py" ]
Now, from what I have gleaned from Stack Overflow, there are two options to access this DB from a different container:
a) use an env variable:
self.connection = psycopg2.connect(host=os.environ["DB_HOST"], user=username, password=password, dbname=database)
print(os.environ["DB_HOST"]) gives me 'db'. I don't know if that's expected.
b) directly use 'db':
self.connection = psycopg2.connect(host='db', user=username, password=password, dbname=database)
Neither of them seems to be working, as no data gets populated. Everything works locally, so I am quite confident my code is accurate. All variables like user etc. have been checked and rechecked, and they work locally. I would really appreciate any help. Everything is on the same network, by the way.
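For debugging, here is a minimal sketch (illustrative only, with placeholder credentials) that prints the host value the script actually sees and retries the psycopg2 connection until Postgres accepts it:

import os
import time

import psycopg2

host = os.environ.get("DB_HOST", "db")
print(f"Connecting to host={host!r}")  # reveals stray whitespace from the compose file, if any

for attempt in range(10):
    try:
        connection = psycopg2.connect(
            host=host.strip(),
            user="postgres",      # placeholder credentials for the sketch
            password="postgres",
            dbname="postgres",
        )
        print("Connected")
        break
    except psycopg2.OperationalError as exc:
        print(f"Attempt {attempt + 1} failed: {exc}")
        time.sleep(3)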
