Related
I am getting errors post configuration
*** Reading local file: /opt/airflow/logs/dag_id=DocImageExec/run_id=manual__2023-02-09T16:06:23.630116+00:00/task_id=execute_docker_command/attempt=1.log
[2023-02-09, 16:06:26 UTC] {taskinstance.py:1083} INFO - Dependencies all met for <TaskInstance: DocImageExec.execute_docker_command manual__2023-02-09T16:06:23.630116+00:00 [queued]>
[2023-02-09, 16:06:26 UTC] {taskinstance.py:1083} INFO - Dependencies all met for <TaskInstance: DocImageExec.execute_docker_command manual__2023-02-09T16:06:23.630116+00:00 [queued]>
[2023-02-09, 16:06:26 UTC] {taskinstance.py:1279} INFO -
--------------------------------------------------------------------------------
[2023-02-09, 16:06:26 UTC] {taskinstance.py:1280} INFO - Starting attempt 1 of 2
[2023-02-09, 16:06:26 UTC] {taskinstance.py:1281} INFO -
--------------------------------------------------------------------------------
[2023-02-09, 16:06:26 UTC] {taskinstance.py:1300} INFO - Executing <Task(BashOperator): execute_docker_command> on 2023-02-09 16:06:23.630116+00:00
[2023-02-09, 16:06:26 UTC] {standard_task_runner.py:55} INFO - Started process 17492 to run task
[2023-02-09, 16:06:26 UTC] {standard_task_runner.py:82} INFO - Running: ['***', 'tasks', 'run', 'DocImageExec', 'execute_docker_command', 'manual__2023-02-09T16:06:23.630116+00:00', '--job-id', '109', '--raw', '--subdir', 'DAGS_FOLDER/docimage.py', '--cfg-path', '/tmp/tmptln30ewq']
[2023-02-09, 16:06:26 UTC] {standard_task_runner.py:83} INFO - Job 109: Subtask execute_docker_command
[2023-02-09, 16:06:26 UTC] {task_command.py:388} INFO - Running <TaskInstance: DocImageExec.execute_docker_command manual__2023-02-09T16:06:23.630116+00:00 [running]> on host 14b5d43a840e
[2023-02-09, 16:06:26 UTC] {taskinstance.py:1509} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_EMAIL=austin.jackson#xxxxx.com
AIRFLOW_CTX_DAG_OWNER=***
AIRFLOW_CTX_DAG_ID=DocImageExec
AIRFLOW_CTX_TASK_ID=execute_docker_command
AIRFLOW_CTX_EXECUTION_DATE=2023-02-09T16:06:23.630116+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=manual__2023-02-09T16:06:23.630116+00:00
[2023-02-09, 16:06:26 UTC] {subprocess.py:63} INFO - Tmp dir root location:
/tmp
[2023-02-09, 16:06:26 UTC] {subprocess.py:75} INFO - Running command: ['/bin/bash', '-c', 'docker run -d -p 5000:5000 image-docker']
[2023-02-09, 16:06:26 UTC] {subprocess.py:86} INFO - Output:
[2023-02-09, 16:06:26 UTC] {subprocess.py:93} INFO - docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
[2023-02-09, 16:06:26 UTC] {subprocess.py:93} INFO - See 'docker run --help'.
[2023-02-09, 16:06:26 UTC] {subprocess.py:97} INFO - Command exited with return code 125
[2023-02-09, 16:06:27 UTC] {taskinstance.py:1768} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/operators/bash.py", line 197, in execute
f"Bash command failed. The command returned a non-zero exit code {result.exit_code}."
airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code 125.
[2023-02-09, 16:06:27 UTC] {taskinstance.py:1323} INFO - Marking task as UP_FOR_RETRY. dag_id=DocImageExec, task_id=execute_docker_command, execution_date=20230209T160623, start_date=20230209T160626, end_date=20230209T160627
[2023-02-09, 16:06:27 UTC] {warnings.py:110} WARNING - /home/***/.local/lib/python3.7/site-packages/***/utils/email.py:152: RemovedInAirflow3Warning: Fetching SMTP credentials from configuration variables will be deprecated in a future release. Please set credentials using a connection instead.
send_mime_email(e_from=mail_from, e_to=recipients, mime_msg=msg, conn_id=conn_id, dryrun=dryrun)
[2023-02-09, 16:06:27 UTC] {configuration.py:663} WARNING - section/key [smtp/smtp_user] not found in config
[2023-02-09, 16:06:27 UTC] {email.py:268} INFO - Email alerting: attempt 1
[2023-02-09, 16:06:27 UTC] {configuration.py:663} WARNING - section/key [smtp/smtp_user] not found in config
[2023-02-09, 16:06:27 UTC] {email.py:268} INFO - Email alerting: attempt 1
[2023-02-09, 16:06:27 UTC] {taskinstance.py:1831} ERROR - Failed to send email to: ['austin.jackson#xxxx.com']
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1374, in _run_raw_task
self._execute_task_with_callbacks(context, test_mode)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1520, in _execute_task_with_callbacks
result = self._execute_task(context, task_orig)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1581, in _execute_task
result = execute_callable(context=context)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/operators/bash.py", line 197, in execute
f"Bash command failed. The command returned a non-zero exit code {result.exit_code}."
airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code 125.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 2231, in email_alert
send_email(task.email, subject, html_content)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/email.py", line 91, in send_email
**kwargs,
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/email.py", line 152, in send_email_smtp
send_mime_email(e_from=mail_from, e_to=recipients, mime_msg=msg, conn_id=conn_id, dryrun=dryrun)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/email.py", line 270, in send_mime_email
smtp_conn = _get_smtp_connection(smtp_host, smtp_port, smtp_timeout, smtp_ssl)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/email.py", line 317, in _get_smtp_connection
else smtplib.SMTP(host=host, port=port, timeout=timeout)
File "/usr/local/lib/python3.7/smtplib.py", line 251, in __init__
(code, msg) = self.connect(host, port)
File "/usr/local/lib/python3.7/smtplib.py", line 336, in connect
self.sock = self._get_socket(host, port, self.timeout)
File "/usr/local/lib/python3.7/smtplib.py", line 307, in _get_socket
self.source_address)
File "/usr/local/lib/python3.7/socket.py", line 728, in create_connection
raise err
File "/usr/local/lib/python3.7/socket.py", line 716, in create_connection
sock.connect(sa)
OSError: [Errno 99] Cannot assign requested address
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1829, in handle_failure
self.email_alert(error, task)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 2233, in email_alert
send_email(task.email, subject, html_content_err)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/email.py", line 91, in send_email
**kwargs,
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/email.py", line 152, in send_email_smtp
send_mime_email(e_from=mail_from, e_to=recipients, mime_msg=msg, conn_id=conn_id, dryrun=dryrun)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/email.py", line 270, in send_mime_email
smtp_conn = _get_smtp_connection(smtp_host, smtp_port, smtp_timeout, smtp_ssl)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/email.py", line 317, in _get_smtp_connection
else smtplib.SMTP(host=host, port=port, timeout=timeout)
File "/usr/local/lib/python3.7/smtplib.py", line 251, in __init__
(code, msg) = self.connect(host, port)
File "/usr/local/lib/python3.7/smtplib.py", line 336, in connect
self.sock = self._get_socket(host, port, self.timeout)
File "/usr/local/lib/python3.7/smtplib.py", line 307, in _get_socket
self.source_address)
File "/usr/local/lib/python3.7/socket.py", line 728, in create_connection
raise err
File "/usr/local/lib/python3.7/socket.py", line 716, in create_connection
sock.connect(sa)
OSError: [Errno 99] Cannot assign requested address
[2023-02-09, 16:06:27 UTC] {standard_task_runner.py:105} ERROR - Failed to execute job 109 for task execute_docker_command (Bash command failed. The command returned a non-zero exit code 125.; 17492)
[2023-02-09, 16:06:27 UTC] {local_task_job.py:208} INFO - Task exited with return code 1
[2023-02-09, 16:06:27 UTC] {taskinstance.py:2578} INFO - 0 downstream tasks scheduled from follow-on schedule check
The script is as below:
"""
Code that goes along with the Airflow located at:
http://airflow.readthedocs.org/en/latest/tutorial.html
"""
import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
from airflow.operators.docker_operator import DockerOperator
from airflow.operators.python_operator import PythonOperator
from airflow.operators.email_operator import EmailOperator
default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2023, 2, 8),
    "email": ["austin.jackson#xxxx.com"],
    # "email_on_failure": False,
    "email_on_failure": True,
    "email_on_retry": True,
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}
dag = DAG("DocImageExec", default_args=default_args, schedule_interval=timedelta(1))
# t1 is an example of a task created by instantiating an operator
t1 = BashOperator(task_id="execute_docker_command", bash_command="docker run -d -p 5000:5000 image-v1", dag=dag)
t1
Please help with the proper configuration required for email alerts to work with Airflow; I need Office 365 mail integration for the Apache Airflow alerts.
Below is the code:
"""
Code that goes along with the Airflow located at:
http://airflow.readthedocs.org/en/latest/tutorial.html
"""
import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
from airflow.operators.docker_operator import DockerOperator
from airflow.operators.python_operator import PythonOperator
from airflow.operators.email_operator import EmailOperator
default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2023, 2, 8),
    "email": ["austin.jackson#xxxx.com"],
    # "email_on_failure": False,
    "email_on_failure": True,
    "email_on_retry": True,
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}
dag = DAG("DocImageExec", default_args=default_args, schedule_interval=timedelta(1))
# t1 is an example of a task created by instantiating an operator
t1 = BashOperator(task_id="execute_docker_command", bash_command="docker run -d -p 6000:6000 image-docker", dag=dag)
t1
Please review the code and output above; SMTP is raising the error and Airflow is not able to send the alert email to the required email ID.
The error you mentioned is something different and not related to SMTP; figure that out first. It says the Docker daemon is not running. How are you running Airflow, standalone or on Kubernetes? (Standalone, I guess.)
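On the Docker part of the error: the BashOperator runs the docker CLI inside the Airflow container, which has no Docker daemon of its own. A minimal sketch of one alternative, assuming the host's /var/run/docker.sock is mounted into the worker container and the apache-airflow-providers-docker package is installed (the image name is taken from the question, the task id is hypothetical):

from airflow.providers.docker.operators.docker import DockerOperator

run_image = DockerOperator(
    task_id="run_image_docker",               # hypothetical task id
    image="image-docker",                     # image from the question
    api_version="auto",
    docker_url="unix://var/run/docker.sock",  # the mounted host socket
    dag=dag,
)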
Normally, for setting up SMTP to send alert emails, it's better to use EmailOperator:
1) Enable IMAP/SMTP access for the mail account.
2) Update the Airflow config file with the SMTP details, such as smtp_host, SSL, user, password and port.
3) Use EmailOperator in the DAG.
It should do the job; a minimal sketch of both pieces follows.
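For reference, a minimal sketch on Airflow 2.x, assuming Office 365 as the SMTP provider (typically smtp.office365.com on port 587 with STARTTLS); the values and task id below are placeholders, not the asker's actual settings:

# The SMTP settings live in the [smtp] section of airflow.cfg (or the matching
# AIRFLOW__SMTP__* environment variables): smtp_host, smtp_starttls, smtp_ssl,
# smtp_user, smtp_password, smtp_port and smtp_mail_from.
from airflow.operators.email import EmailOperator

notify = EmailOperator(
    task_id="send_status_mail",    # hypothetical task id
    to=["austin.jackson#xxxx.com"],
    subject="DocImageExec status",
    html_content="The execute_docker_command task has finished.",
    dag=dag,
)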
We are using Airflow tasks to push files to S3 object storage, which runs perfectly fine except for very high volume files (e.g. 5 GB). For very high volume files the task fails with the error below.
[2021-07-29 11:24:32,416] {taskinstance.py:605} DEBUG - Refreshed TaskInstance <TaskInstance: DAG1.task1 2021-07-28T12:48:29+00:00 [None]>
[2021-07-29 11:24:32,418] {local_task_job.py:188} WARNING - State of this instance has been externally set to None. Terminating instance.
[2021-07-29 11:24:32,457] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 29564
[2021-07-29 11:24:32,457] {taskinstance.py:1239} ERROR - Received SIGTERM. Terminating subprocesses.
[2021-07-29 11:24:32,457] {bash.py:185} INFO - Sending SIGTERM signal to bash process group
[2021-07-29 11:24:32,458] {taskinstance.py:570} DEBUG - Refreshing TaskInstance <TaskInstance: DAG1.task1 2021-07-28T12:48:29+00:00 [running]> from DB
[2021-07-29 11:24:32,484] {taskinstance.py:605} DEBUG - Refreshed TaskInstance <TaskInstance: DAG1.task1 2021-07-28T12:48:29+00:00 [None]>
[2021-07-29 11:24:32,486] {taskinstance.py:1455} ERROR - Task received SIGTERM signal
Traceback (most recent call last):
File "/amex/app/airflow/venv/lib64/python3.6/site-packages/airflow/models/taskinstance.py", line 1112, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/amex/app/airflow/venv/lib64/python3.6/site-packages/airflow/models/taskinstance.py", line 1285, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/amex/app/airflow/venv/lib64/python3.6/site-packages/airflow/models/taskinstance.py", line 1315, in _execute_task
result = task_copy.execute(context=context)
File "/amex/app/airflow/venv/lib64/python3.6/site-packages/airflow/operators/bash.py", line 171, in execute
for raw_line in iter(self.sub_process.stdout.readline, b''):
File "/amex/app/airflow/venv/lib64/python3.6/site-packages/airflow/models/taskinstance.py", line 1241, in signal_handler
raise AirflowException("Task received SIGTERM signal")
airflow.exceptions.AirflowException: Task received SIGTERM signal
We are using boto3 to push to S3 object storage, but we are not able to figure out what is causing the issue. We have changed killed_task_cleanup_time = 60 to killed_task_cleanup_time = 21600 in airflow.cfg, but we still get the same error.
We have PostgreSQL as our Airflow metadata database.
I am trying to run an Airflow DAG that queries the dag table in the Airflow Postgres database. Here is the code for the DAG:
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator
from airflow.hooks.postgres_hook import PostgresHook
from datetime import datetime
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(year=2019, month=10, day=1),
    'retries': 0
}

def get_dag_table():
    query = 'SELECT * FROM dag LIMIT 5;'
    hook = PostgresHook(postgres_conn_id='postgres_default',
                        host='localhost',
                        database='airflow',
                        user='airflow',
                        password='airflow',
                        port=5432)
    connection = hook.get_conn()
    # COMMENTED OUT FOR DEBUGGING
    # cursor = connection.cursor()
    # cursor.execute(query)
    # return cursor.fetchall()
dag = DAG(
    "custom_postgres_tutorial",
    default_args=default_args,
    schedule_interval=None
)

start_task = DummyOperator(task_id='start_task', dag=dag)

postgres_task = PythonOperator(task_id='query_dag_table',
                               python_callable=get_dag_table,
                               dag=dag)
start_task >> postgres_task
Here are the steps that I followed:
1) I cloned the Puckel docker-airflow repo (https://github.com/puckel/docker-airflow).
2) I then ran the command $ docker-compose -f docker-compose-LocalExecutor.yml up -d to start an Airflow webserver and the Postgres database.
3) Created a custom connection that looks like this:
4) When I trigger the DAG I get the following error:
[2019-10-07 14:51:11,034] {{taskinstance.py:1078}} INFO - Marking task as FAILED.
[2019-10-07 14:51:11,050] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table Traceback (most recent call last):
[2019-10-07 14:51:11,050] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table File "/usr/local/bin/airflow", line 32, in <module>
[2019-10-07 14:51:11,050] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table args.func(args)
[2019-10-07 14:51:11,050] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table File "/usr/local/lib/python3.7/site-packages/airflow/utils/cli.py", line 74, in wrapper
[2019-10-07 14:51:11,050] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table return f(*args, **kwargs)
[2019-10-07 14:51:11,050] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table File "/usr/local/lib/python3.7/site-packages/airflow/bin/cli.py", line 522, in run
[2019-10-07 14:51:11,050] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table _run(args, dag, ti)
[2019-10-07 14:51:11,050] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table File "/usr/local/lib/python3.7/site-packages/airflow/bin/cli.py", line 440, in _run
[2019-10-07 14:51:11,050] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table pool=args.pool,
[2019-10-07 14:51:11,050] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table File "/usr/local/lib/python3.7/site-packages/airflow/utils/db.py", line 74, in wrapper
[2019-10-07 14:51:11,050] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table return func(*args, **kwargs)
[2019-10-07 14:51:11,050] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 922, in _run_raw_task
[2019-10-07 14:51:11,050] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table result = task_copy.execute(context=context)
[2019-10-07 14:51:11,050] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 113, in execute
[2019-10-07 14:51:11,050] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table return_value = self.execute_callable()
[2019-10-07 14:51:11,051] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 118, in execute_callable
[2019-10-07 14:51:11,051] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table return self.python_callable(*self.op_args, **self.op_kwargs)
[2019-10-07 14:51:11,051] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table File "/usr/local/airflow/dags/tutorial-postgres.py", line 23, in get_dag_table
[2019-10-07 14:51:11,051] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table connection = hook.get_conn()
[2019-10-07 14:51:11,051] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table File "/usr/local/lib/python3.7/site-packages/airflow/hooks/postgres_hook.py", line 75, in get_conn
[2019-10-07 14:51:11,051] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table self.conn = psycopg2.connect(**conn_args)
[2019-10-07 14:51:11,051] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table File "/usr/local/lib/python3.7/site-packages/psycopg2/__init__.py", line 130, in connect
[2019-10-07 14:51:11,051] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
[2019-10-07 14:51:11,051] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table psycopg2.OperationalError: could not connect to server: Connection refused
[2019-10-07 14:51:11,051] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table Is the server running on host "localhost" (127.0.0.1) and accepting
[2019-10-07 14:51:11,051] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table TCP/IP connections on port 5432?
[2019-10-07 14:51:11,051] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table could not connect to server: Cannot assign requested address
[2019-10-07 14:51:11,051] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table Is the server running on host "localhost" (::1) and accepting
[2019-10-07 14:51:11,051] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table TCP/IP connections on port 5432?
[2019-10-07 14:51:11,051] {{base_task_runner.py:115}} INFO - Job 5229: Subtask query_dag_table
I have tried following every suggestion that I have found online, and none of them has resolved this situation. I am confused because I can connect to the database using PyCharm:
Also, when I run the command $ docker container ls I get the following results, showing that the Postgres container is exposed on port 5432:
CONTAINER ID   IMAGE                          COMMAND                  CREATED       STATUS                 PORTS                                        NAMES
xxxxxxxxxxxx   puckel/docker-airflow:1.10.4   "/entrypoint.sh webs…"   2 hours ago   Up 2 hours (healthy)   5555/tcp, 8793/tcp, 0.0.0.0:8080->8080/tcp   docker-airflow_webserver_1
xxxxxxxxxxxx   postgres:9.6                   "docker-entrypoint.s…"   2 days ago    Up 2 hours             0.0.0.0:5432->5432/tcp                       docker-airflow_postgres_1
Try changing the Host field in the connection UI page to host.docker.internal or postgres instead of localhost.
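A minimal sketch of the task after that change, assuming the postgres_default connection now points at the postgres compose service rather than localhost (the query is the one from the question):

from airflow.hooks.postgres_hook import PostgresHook

def get_dag_table():
    # Host, credentials and port come from the postgres_default connection
    # (e.g. host "postgres" inside the docker-compose network), not from code.
    hook = PostgresHook(postgres_conn_id='postgres_default')
    return hook.get_records('SELECT * FROM dag LIMIT 5;')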
The Airflow task is unable to connect to a remote Oracle database, but I am able to connect to the same remote Oracle database using the same code from a Python shell.
I checked that the running environment is the same for both the shell and Airflow.
I am attaching the log with the error I am getting.
[2018-07-18 13:12:11,037] {models.py:1428} INFO - Executing <Task(AadharNumberValidation): validating_data> on 2018-07-18 07:42:05.573491
[2018-07-18 13:12:11,037] {base_task_runner.py:115} INFO - Running: ['bash', '-c', 'airflow run data_validation validating_data 2018-07-18T07:42:05.573491 --job_id 206 --raw -sd /Users/b0204890/Desktop/python/airflow_home/dags/data_validaton.py']
[2018-07-18 13:12:11,531] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 13:12:11,531] {__init__.py:45} INFO - Using executor SequentialExecutor
[2018-07-18 13:12:11,588] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 13:12:11,588] {models.py:189} INFO - Filling up the DagBag from /Users/b0204890/Desktop/python/airflow_home/dags/data_validaton.py
[2018-07-18 13:12:11,661] {cli.py:374} INFO - Running on host LTB0204890-Mac.local
[2018-07-18 13:12:11,669] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 13:12:11,669] {validation_operators.py:37} INFO - operator_param yadav: Script one validation
[2018-07-18 13:12:11,678] {models.py:1595} ERROR - DPI-1047: 64-bit Oracle Client library cannot be loaded: "dlopen(libclntsh.dylib, 1): image not found". See https://oracle.github.io/odpi/doc/installation.html#macos for help
Traceback (most recent call last):
File "/Users/b0204890/venv/python3/lib/python3.6/site-packages/airflow/models.py", line 1493, in _run_raw_task
result = task_copy.execute(context=context)
File "/Users/b0204890/Desktop/python//airflow_home/plugins/validation_operators.py", line 38, in execute
cursor = create_connection(user="USERNAME",port="PORT",host="HOST",pwd="password",sid="SID")
File "/Users/b0204890/Desktop/python/airflow_home/utility/validation.py", line 30, in create_connection
connection = cx_Oracle.connect(user, pwd, service)
cx_Oracle.DatabaseError: DPI-1047: 64-bit Oracle Client library cannot be loaded: "dlopen(libclntsh.dylib, 1): image not found". See https://oracle.github.io/odpi/doc/installation.html#macos for help
[2018-07-18 13:12:11,681] {models.py:1616} INFO - Marking task as UP_FOR_RETRY
[2018-07-18 13:12:11,682] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 13:12:11,682] {configuration.py:206} WARNING - section/key [smtp/smtp_user] not found in config
[2018-07-18 13:12:11,684] {models.py:1628} ERROR - Failed to send email to: ['tushar.smartcorp#gmail.com']
[2018-07-18 13:12:11,684] {models.py:1629} ERROR - [Errno 61] Connection refused
Traceback (most recent call last):
File "/Users/b0204890/venv/python3/lib/python3.6/site-packages/airflow/models.py", line 1493, in _run_raw_task
result = task_copy.execute(context=context)
File "/Users/b0204890/Desktop/python//airflow_home/plugins/validation_operators.py", line 38, in execute
cursor = create_connection(user="USERNAME",port="PORT",host="HOST",pwd="PASSWORD",sid="SID")
File "/Users/b0204890/Desktop/python/airflow_home/utility/validation.py", line 30, in create_connection
connection = cx_Oracle.connect(user, pwd, service)
cx_Oracle.DatabaseError: DPI-1047: 64-bit Oracle Client library cannot be loaded: "dlopen(libclntsh.dylib, 1): image not found". See https://oracle.github.io/odpi/doc/installation.html#macos for help
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/b0204890/venv/python3/lib/python3.6/site-packages/airflow/models.py", line 1618, in handle_failure
self.email_alert(error, is_retry=True)
File "/Users/b0204890/venv/python3/lib/python3.6/site-packages/airflow/models.py", line 1779, in email_alert
send_email(task.email, title, body)
File "/Users/b0204890/venv/python3/lib/python3.6/site-packages/airflow/utils/email.py", line 44, in send_email
return backend(to, subject, html_content, files=files, dryrun=dryrun, cc=cc, bcc=bcc, mime_subtype=mime_subtype)
File "/Users/b0204890/venv/python3/lib/python3.6/site-packages/airflow/utils/email.py", line 87, in send_email_smtp
send_MIME_email(SMTP_MAIL_FROM, recipients, msg, dryrun)
File "/Users/b0204890/venv/python3/lib/python3.6/site-packages/airflow/utils/email.py", line 107, in send_MIME_email
s = smtplib.SMTP_SSL(SMTP_HOST, SMTP_PORT) if SMTP_SSL else smtplib.SMTP(SMTP_HOST, SMTP_PORT)
File "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/smtplib.py", line 251, in __init__
(code, msg) = self.connect(host, port)
File "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/smtplib.py", line 336, in connect
self.sock = self._get_socket(host, port, self.timeout)
File "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/smtplib.py", line 307, in _get_socket
self.source_address)
File "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/socket.py", line 724, in create_connection
raise err
File "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/socket.py", line 713, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 61] Connection refused
Without following all of the instructions the connector will not work. Simply installing the Python package cx-Oracle is not enough. In this case it could also be that Airflow's venv cannot access all necessary files, or that there is some permissions issue.
As stated in the message, you need to follow all the steps described at https://oracle.github.io/odpi/doc/installation.html#macos
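As a rough illustration only: once the Oracle Instant Client has been installed per those instructions, newer cx_Oracle releases (8 and later) can be pointed at it explicitly, while older versions just need the library on the default search path as that page describes. The paths and connection details below are placeholders:

import cx_Oracle

# Placeholder path to wherever the Oracle Instant Client was unpacked.
cx_Oracle.init_oracle_client(lib_dir="/opt/oracle/instantclient_19_8")

# Placeholder host/port/SID, matching the shape of the question's create_connection call.
dsn = cx_Oracle.makedsn("HOST", 1521, sid="SID")
connection = cx_Oracle.connect(user="USERNAME", password="PASSWORD", dsn=dsn)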
Using Airflow, I want to get the result of a SQL query formatted as a pandas DataFrame.
import logging

from airflow.contrib.hooks.bigquery_hook import BigQueryHook

def get_my_query(*args, **kwargs):
    bq_hook = BigQueryHook(bigquery_conn_id='my_connection_id', delegate_to=None)
    my_query = """
    SELECT col1, col2
    FROM `my_bq_project.my_bq_dataset.my_table`
    """
    df = bq_hook.get_pandas_df(bql=my_query, dialect='standard')
    logging.info('df.head()\n{}'.format(df.head()))
Above is the python function that I want to execute in a PythonOperator. Here is the DAG:
my_dag = DAG('my_dag',start_date=datetime.today())
start = DummyOperator(task_id='start', dag=my_dag)
end = DummyOperator(task_id='end', dag=my_dag)
work = PythonOperator(task_id='work',python_callable=get_my_query, dag=my_dag)
start >> work >> end
But the work step throws an exception. Here is the log:
[2018-04-02 20:25:50,506] {base_task_runner.py:98} INFO - Subtask: [2018-04-02 20:25:50,506] {gcp_api_base_hook.py:82} INFO - Getting connection using a JSON key file.
[2018-04-02 20:25:51,035] {base_task_runner.py:98} INFO - Subtask: [2018-04-02 20:25:51,035] {slack_operator.py:70} ERROR - Slack API call failed (%s)
[2018-04-02 20:25:51,070] {base_task_runner.py:98} INFO - Subtask: Traceback (most recent call last):
[2018-04-02 20:25:51,071] {base_task_runner.py:98} INFO - Subtask: File "/opt/conda/bin/airflow", line 28, in <module>
[2018-04-02 20:25:51,072] {base_task_runner.py:98} INFO - Subtask: args.func(args)
[2018-04-02 20:25:51,072] {base_task_runner.py:98} INFO - Subtask: File "/home/airflow/.local/lib/python2.7/site-packages/airflow/bin/cli.py", line 392, in run
[2018-04-02 20:25:51,073] {base_task_runner.py:98} INFO - Subtask: pool=args.pool,
[2018-04-02 20:25:51,074] {base_task_runner.py:98} INFO - Subtask: File "/home/airflow/.local/lib/python2.7/site-packages/airflow/utils/db.py", line 50, in wrapper
[2018-04-02 20:25:51,075] {base_task_runner.py:98} INFO - Subtask: result = func(*args, **kwargs)
[2018-04-02 20:25:51,075] {base_task_runner.py:98} INFO - Subtask: File "/home/airflow/.local/lib/python2.7/site-packages/airflow/models.py", line 1493, in _run_raw_task
[2018-04-02 20:25:51,076] {base_task_runner.py:98} INFO - Subtask: result = task_copy.execute(context=context)
[2018-04-02 20:25:51,077] {base_task_runner.py:98} INFO - Subtask: File "/home/airflow/.local/lib/python2.7/site-packages/airflow/operators/python_operator.py", line 89, in execute
[2018-04-02 20:25:51,077] {base_task_runner.py:98} INFO - Subtask: return_value = self.execute_callable()
[2018-04-02 20:25:51,078] {base_task_runner.py:98} INFO - Subtask: File "/home/airflow/.local/lib/python2.7/site-packages/airflow/operators/python_operator.py", line 94, in execute_callable
[2018-04-02 20:25:51,079] {base_task_runner.py:98} INFO - Subtask: return self.python_callable(*self.op_args, **self.op_kwargs)
[2018-04-02 20:25:51,080] {base_task_runner.py:98} INFO - Subtask: File "/home/airflow/.local/lib/python2.7/site-packages/processing/dags/my_dag.py", line 37, in get_my_query
[2018-04-02 20:25:51,080] {base_task_runner.py:98} INFO - Subtask: df = bq_hook.get_pandas_df(bql=my_query, dialect='standard')
[2018-04-02 20:25:51,081] {base_task_runner.py:98} INFO - Subtask: File "/home/airflow/.local/lib/python2.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", line 94, in get_pandas_df
[2018-04-02 20:25:51,081] {base_task_runner.py:98} INFO - Subtask: schema, pages = connector.run_query(bql)
[2018-04-02 20:25:51,082] {base_task_runner.py:98} INFO - Subtask: File "/home/airflow/.local/lib/python2.7/site-packages/pandas_gbq/gbq.py", line 502, in run_query
[2018-04-02 20:25:51,082] {base_task_runner.py:98} INFO - Subtask: except self.http_error as ex:
[2018-04-02 20:25:51,082] {base_task_runner.py:98} INFO - Subtask: AttributeError: 'BigQueryPandasConnector' object has no attribute 'http_error'
This exception is due to this issue, which according to the description:
When BigQueryPandasConnector (in bigquery_hook.py) encounters a BQ job insertion error, the exception will be assigned to connector.http_error
hides another exception. This is still strange, because I'm not doing any insertion.
What am I doing wrong? Maybe there is a problem with the bigquery_conn_id used in the BigQueryHook. Or maybe a DataFrame is not the way to go in order to handle query results.
PS: result of pip freeze
alembic==0.8.10
amqp==2.2.2
apache-airflow==1.9.0
apache-beam==2.3.0
asn1crypto==0.24.0
avro==1.8.2
Babel==2.5.3
backports-abc==0.5
bcrypt==3.1.4
billiard==3.5.0.3
bleach==2.1.2
cachetools==2.0.1
celery==4.1.0
certifi==2018.1.18
cffi==1.11.4
chardet==3.0.4
click==6.7
configparser==3.5.0
crcmod==1.7
croniter==0.3.20
cryptography==2.1.4
dill==0.2.7.1
docutils==0.14
elasticsearch==1.4.0
enum34==1.1.6
fasteners==0.14.1
Flask==0.11.1
Flask-Admin==1.4.1
Flask-Bcrypt==0.7.1
Flask-Cache==0.13.1
Flask-Login==0.2.11
flask-swagger==0.2.13
Flask-WTF==0.14
flower==0.9.2
funcsigs==1.0.0
future==0.16.0
futures==3.2.0
gapic-google-cloud-datastore-v1==0.15.3
gapic-google-cloud-error-reporting-v1beta1==0.15.3
gapic-google-cloud-logging-v2==0.91.3
gapic-google-cloud-pubsub-v1==0.15.4
gapic-google-cloud-spanner-admin-database-v1==0.15.3
gapic-google-cloud-spanner-admin-instance-v1==0.15.3
gapic-google-cloud-spanner-v1==0.15.3
gitdb2==2.0.3
GitPython==2.1.8
google-api-core==1.1.0
google-api-python-client==1.6.5
google-apitools==0.5.20
google-auth==1.4.1
google-auth-oauthlib==0.2.0
google-cloud==0.27.0
google-cloud-bigquery==0.31.0
google-cloud-bigtable==0.26.0
google-cloud-core==0.28.1
google-cloud-dataflow==2.3.0
google-cloud-datastore==1.2.0
google-cloud-dns==0.26.0
google-cloud-error-reporting==0.26.0
google-cloud-language==0.27.0
google-cloud-logging==1.2.0
google-cloud-monitoring==0.26.0
google-cloud-pubsub==0.27.0
google-cloud-resource-manager==0.26.0
google-cloud-runtimeconfig==0.26.0
google-cloud-spanner==0.26.0
google-cloud-speech==0.28.0
google-cloud-storage==1.3.2
google-cloud-translate==1.1.0
google-cloud-videointelligence==0.25.0
google-cloud-vision==0.26.0
google-gax==0.15.16
google-resumable-media==0.3.1
googleads==4.5.1
googleapis-common-protos==1.5.3
googledatastore==7.0.1
grpc-google-iam-v1==0.11.4
grpcio==1.10.0
gunicorn==19.7.1
hdfs3==0.3.0
html5lib==1.0.1
httplib2==0.10.3
idna==2.6
ipaddress==1.0.19
itsdangerous==0.24
Jinja2==2.8.1
kombu==4.1.0
ldap3==2.4.1
lockfile==0.12.2
lxml==3.8.0
Mako==1.0.7
Markdown==2.6.11
MarkupSafe==1.0
mock==2.0.0
monotonic==1.4
mysqlclient==1.3.10
numpy==1.13.0
oauth2client==2.0.2
oauthlib==2.0.7
ordereddict==1.1
pandas==0.19.2
pandas-gbq==0.3.1
pbr==3.1.1
ply==3.8
proto-google-cloud-datastore-v1==0.90.4
proto-google-cloud-error-reporting-v1beta1==0.15.3
proto-google-cloud-logging-v2==0.91.3
proto-google-cloud-pubsub-v1==0.15.4
proto-google-cloud-spanner-admin-database-v1==0.15.3
proto-google-cloud-spanner-admin-instance-v1==0.15.3
proto-google-cloud-spanner-v1==0.15.3
protobuf==3.5.2
psutil==4.4.2
pyasn1==0.4.2
pyasn1-modules==0.2.1
pycosat==0.6.3
pycparser==2.18
Pygments==2.2.0
pyOpenSSL==17.5.0
PySocks==1.6.8
python-daemon==2.1.2
python-dateutil==2.7.0
python-editor==1.0.3
python-nvd3==0.14.2
python-slugify==1.1.4
pytz==2018.3
PyVCF==0.6.8
PyYAML==3.12
redis==2.10.6
requests==2.18.4
requests-oauthlib==0.8.0
rsa==3.4.2
setproctitle==1.1.10
setuptools-scm==1.15.0
singledispatch==3.4.0.3
six==1.11.0
slackclient==1.1.3
smmap2==2.0.3
SQLAlchemy==1.2.5
statsd==3.2.2
suds-jurko==0.6
tableauserverclient==0.5.1
tabulate==0.7.7
tenacity==4.9.0
thrift==0.11.0
tornado==5.0.1
typing==3.6.4
Unidecode==1.0.22
uritemplate==3.0.0
urllib3==1.22
vine==1.1.4
webencodings==0.5.1
websocket-client==0.47.0
Werkzeug==0.14.1
WTForms==2.1
xmltodict==0.11.0
zope.deprecation==4.3.0
Another possible way would be to use the pandas BigQuery connector, pd.read_gbq and pd.to_gbq. Looking at the stack trace, the BigQueryHook is using that connector itself.
It might be a good idea to:
1) try the connection with the pandas connector in a PythonOperator directly, as sketched below;
2) then either switch to the pandas connector, or try to debug the BigQueryHook once the connection works.
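A minimal sketch of option 1, assuming GCP authentication is already configured for the pandas BigQuery connector (the project, dataset and column names are the ones from the question; the function name is hypothetical):

import pandas as pd

def get_my_query_via_pandas(*args, **kwargs):
    # Goes straight through the pandas BigQuery connector, bypassing BigQueryHook.
    df = pd.read_gbq(
        "SELECT col1, col2 FROM `my_bq_project.my_bq_dataset.my_table`",
        project_id="my_bq_project",
        dialect="standard",
    )
    return df.head()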