SMTP configuration for Airflow giving an error - python

I am getting the following errors after configuration:
*** Reading local file: /opt/airflow/logs/dag_id=DocImageExec/run_id=manual__2023-02-09T16:06:23.630116+00:00/task_id=execute_docker_command/attempt=1.log
[2023-02-09, 16:06:26 UTC] {taskinstance.py:1083} INFO - Dependencies all met for <TaskInstance: DocImageExec.execute_docker_command manual__2023-02-09T16:06:23.630116+00:00 [queued]>
[2023-02-09, 16:06:26 UTC] {taskinstance.py:1083} INFO - Dependencies all met for <TaskInstance: DocImageExec.execute_docker_command manual__2023-02-09T16:06:23.630116+00:00 [queued]>
[2023-02-09, 16:06:26 UTC] {taskinstance.py:1279} INFO -
--------------------------------------------------------------------------------
[2023-02-09, 16:06:26 UTC] {taskinstance.py:1280} INFO - Starting attempt 1 of 2
[2023-02-09, 16:06:26 UTC] {taskinstance.py:1281} INFO -
--------------------------------------------------------------------------------
[2023-02-09, 16:06:26 UTC] {taskinstance.py:1300} INFO - Executing <Task(BashOperator): execute_docker_command> on 2023-02-09 16:06:23.630116+00:00
[2023-02-09, 16:06:26 UTC] {standard_task_runner.py:55} INFO - Started process 17492 to run task
[2023-02-09, 16:06:26 UTC] {standard_task_runner.py:82} INFO - Running: ['***', 'tasks', 'run', 'DocImageExec', 'execute_docker_command', 'manual__2023-02-09T16:06:23.630116+00:00', '--job-id', '109', '--raw', '--subdir', 'DAGS_FOLDER/docimage.py', '--cfg-path', '/tmp/tmptln30ewq']
[2023-02-09, 16:06:26 UTC] {standard_task_runner.py:83} INFO - Job 109: Subtask execute_docker_command
[2023-02-09, 16:06:26 UTC] {task_command.py:388} INFO - Running <TaskInstance: DocImageExec.execute_docker_command manual__2023-02-09T16:06:23.630116+00:00 [running]> on host 14b5d43a840e
[2023-02-09, 16:06:26 UTC] {taskinstance.py:1509} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_EMAIL=austin.jackson#xxxxx.com
AIRFLOW_CTX_DAG_OWNER=***
AIRFLOW_CTX_DAG_ID=DocImageExec
AIRFLOW_CTX_TASK_ID=execute_docker_command
AIRFLOW_CTX_EXECUTION_DATE=2023-02-09T16:06:23.630116+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=manual__2023-02-09T16:06:23.630116+00:00
[2023-02-09, 16:06:26 UTC] {subprocess.py:63} INFO - Tmp dir root location:
/tmp
[2023-02-09, 16:06:26 UTC] {subprocess.py:75} INFO - Running command: ['/bin/bash', '-c', 'docker run -d -p 5000:5000 image-docker']
[2023-02-09, 16:06:26 UTC] {subprocess.py:86} INFO - Output:
[2023-02-09, 16:06:26 UTC] {subprocess.py:93} INFO - docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
[2023-02-09, 16:06:26 UTC] {subprocess.py:93} INFO - See 'docker run --help'.
[2023-02-09, 16:06:26 UTC] {subprocess.py:97} INFO - Command exited with return code 125
[2023-02-09, 16:06:27 UTC] {taskinstance.py:1768} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/operators/bash.py", line 197, in execute
f"Bash command failed. The command returned a non-zero exit code {result.exit_code}."
airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code 125.
[2023-02-09, 16:06:27 UTC] {taskinstance.py:1323} INFO - Marking task as UP_FOR_RETRY. dag_id=DocImageExec, task_id=execute_docker_command, execution_date=20230209T160623, start_date=20230209T160626, end_date=20230209T160627
[2023-02-09, 16:06:27 UTC] {warnings.py:110} WARNING - /home/***/.local/lib/python3.7/site-packages/***/utils/email.py:152: RemovedInAirflow3Warning: Fetching SMTP credentials from configuration variables will be deprecated in a future release. Please set credentials using a connection instead.
send_mime_email(e_from=mail_from, e_to=recipients, mime_msg=msg, conn_id=conn_id, dryrun=dryrun)
[2023-02-09, 16:06:27 UTC] {configuration.py:663} WARNING - section/key [smtp/smtp_user] not found in config
[2023-02-09, 16:06:27 UTC] {email.py:268} INFO - Email alerting: attempt 1
[2023-02-09, 16:06:27 UTC] {configuration.py:663} WARNING - section/key [smtp/smtp_user] not found in config
[2023-02-09, 16:06:27 UTC] {email.py:268} INFO - Email alerting: attempt 1
[2023-02-09, 16:06:27 UTC] {taskinstance.py:1831} ERROR - Failed to send email to: ['austin.jackson#xxxx.com']
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1374, in _run_raw_task
self._execute_task_with_callbacks(context, test_mode)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1520, in _execute_task_with_callbacks
result = self._execute_task(context, task_orig)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1581, in _execute_task
result = execute_callable(context=context)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/operators/bash.py", line 197, in execute
f"Bash command failed. The command returned a non-zero exit code {result.exit_code}."
airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code 125.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 2231, in email_alert
send_email(task.email, subject, html_content)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/email.py", line 91, in send_email
**kwargs,
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/email.py", line 152, in send_email_smtp
send_mime_email(e_from=mail_from, e_to=recipients, mime_msg=msg, conn_id=conn_id, dryrun=dryrun)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/email.py", line 270, in send_mime_email
smtp_conn = _get_smtp_connection(smtp_host, smtp_port, smtp_timeout, smtp_ssl)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/email.py", line 317, in _get_smtp_connection
else smtplib.SMTP(host=host, port=port, timeout=timeout)
File "/usr/local/lib/python3.7/smtplib.py", line 251, in __init__
(code, msg) = self.connect(host, port)
File "/usr/local/lib/python3.7/smtplib.py", line 336, in connect
self.sock = self._get_socket(host, port, self.timeout)
File "/usr/local/lib/python3.7/smtplib.py", line 307, in _get_socket
self.source_address)
File "/usr/local/lib/python3.7/socket.py", line 728, in create_connection
raise err
File "/usr/local/lib/python3.7/socket.py", line 716, in create_connection
sock.connect(sa)
OSError: [Errno 99] Cannot assign requested address
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1829, in handle_failure
self.email_alert(error, task)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 2233, in email_alert
send_email(task.email, subject, html_content_err)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/email.py", line 91, in send_email
**kwargs,
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/email.py", line 152, in send_email_smtp
send_mime_email(e_from=mail_from, e_to=recipients, mime_msg=msg, conn_id=conn_id, dryrun=dryrun)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/email.py", line 270, in send_mime_email
smtp_conn = _get_smtp_connection(smtp_host, smtp_port, smtp_timeout, smtp_ssl)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/email.py", line 317, in _get_smtp_connection
else smtplib.SMTP(host=host, port=port, timeout=timeout)
File "/usr/local/lib/python3.7/smtplib.py", line 251, in __init__
(code, msg) = self.connect(host, port)
File "/usr/local/lib/python3.7/smtplib.py", line 336, in connect
self.sock = self._get_socket(host, port, self.timeout)
File "/usr/local/lib/python3.7/smtplib.py", line 307, in _get_socket
self.source_address)
File "/usr/local/lib/python3.7/socket.py", line 728, in create_connection
raise err
File "/usr/local/lib/python3.7/socket.py", line 716, in create_connection
sock.connect(sa)
OSError: [Errno 99] Cannot assign requested address
[2023-02-09, 16:06:27 UTC] {standard_task_runner.py:105} ERROR - Failed to execute job 109 for task execute_docker_command (Bash command failed. The command returned a non-zero exit code 125.; 17492)
[2023-02-09, 16:06:27 UTC] {local_task_job.py:208} INFO - Task exited with return code 1
[2023-02-09, 16:06:27 UTC] {taskinstance.py:2578} INFO - 0 downstream tasks scheduled from follow-on schedule check
The script is below:
"""
Code that goes along with the Airflow located at:
http://airflow.readthedocs.org/en/latest/tutorial.html
"""
import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
from airflow.operators.docker_operator import DockerOperator
from airflow.operators.python_operator import PythonOperator
from airflow.operators.email_operator import EmailOperator
default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2023, 2, 8),
    "email": ["austin.jackson@xxxx.com"],
    # "email_on_failure": False,
    "email_on_failure": True,
    "email_on_retry": True,
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}
dag = DAG("DocImageExec", default_args=default_args, schedule_interval=timedelta(1))
# t1, t2 and t3 are examples of tasks created by instantiating operators
t1 = BashOperator(task_id="execute_docker_command", bash_command="docker run -d -p 5000:5000 image-v1", dag=dag)
t1
Please help with the proper configuration required for mail to work with Airflow; I need Office 365 mail integration for Apache Airflow alerts. Below is the code:
"""
Code that goes along with the Airflow located at:
http://airflow.readthedocs.org/en/latest/tutorial.html
"""
import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
from airflow.operators.docker_operator import DockerOperator
from airflow.operators.python_operator import PythonOperator
from airflow.operators.email_operator import EmailOperator
default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2023, 2, 8),
    "email": ["austin.jackson@xxxx.com"],
    # "email_on_failure": False,
    "email_on_failure": True,
    "email_on_retry": True,
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}
dag = DAG("DocImageExec", default_args=default_args, schedule_interval=timedelta(1))
# t1 example of tasks created by instantiating operators
t1 = BashOperator(task_id="execute_docker_command", bash_command="docker run -d -p 6000:6000 image-docker", dag=dag)
t1
Please review the code and output above, and help with the SMTP error: Airflow is not able to send the alert email to the required email address.

The error you mentioned is something different and not related to SMTP; figure that out first. It says the Docker daemon is not running. How are you running Airflow, standalone or on K8s? (Standalone, I guess.) One common fix is sketched right below.
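If Airflow itself runs in Docker (the container hostname in the log suggests it does), one common approach, shown here only as an illustration, is to mount the host's Docker socket into the compose service that runs the tasks so the docker CLI inside the container talks to the host daemon (the service name below is a placeholder, and the docker CLI must be installed in the Airflow image):

services:
  airflow-worker:   # placeholder: whichever compose service executes the task
    volumes:
      # Expose the host Docker daemon socket inside the container so
      # `docker run -d -p 5000:5000 image-docker` reaches a running daemon.
      - /var/run/docker.sock:/var/run/docker.sock

Alternatively, the DockerOperator (already imported in the DAG) can be used instead of the BashOperator; it accepts a docker_url argument pointing at the same socket.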
Normally, for setting up SMTP to send alert emails, it is better to use the EmailOperator:
1. Enable IMAP/SMTP access for the mail account.
2. Update the airflow config file with the SMTP details: smtp_host, SSL, user, password and port.
3. Use EmailOperator in the DAG.
It should do the job; an example Office 365 configuration is sketched below.
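For illustration only: the host and port below assume a standard Office 365 account (STARTTLS on port 587), and the user, password and from-address are placeholders to replace with real values. In airflow.cfg the [smtp] section could look roughly like this:

[smtp]
smtp_host = smtp.office365.com
# Office 365 normally uses STARTTLS, not implicit SSL
smtp_starttls = True
smtp_ssl = False
# placeholder credentials
smtp_user = alerts@yourdomain.com
smtp_password = YOUR_ACCOUNT_OR_APP_PASSWORD
smtp_port = 587
smtp_mail_from = alerts@yourdomain.com

If Airflow runs in containers, the same values can be supplied as environment variables of the form AIRFLOW__SMTP__SMTP_HOST, AIRFLOW__SMTP__SMTP_USER, and so on. With that in place the existing email_on_failure/email_on_retry alerts should work; a dedicated notification task can also be added with the EmailOperator, for example (recipient and texts are placeholders):

from airflow.operators.email import EmailOperator

notify = EmailOperator(
    task_id="notify_me",
    to="austin.jackson@xxxx.com",  # placeholder recipient from the question
    subject="DocImageExec finished",
    html_content="See the Airflow UI for run details.",
    dag=dag,
)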

Related

AIRFLOW task fails with SIGTERM error while transferring large volume files to cloud object storage

We are using Airflow tasks to push files to S3 object storage, which runs perfectly fine except for very high volume files (e.g. 5 GB). For very high volume files the task fails with the error below.
[2021-07-29 11:24:32,416] {taskinstance.py:605} DEBUG - Refreshed TaskInstance <TaskInstance: DAG1.task1 2021-07-28T12:48:29+00:00 [None]>
[2021-07-29 11:24:32,418] {local_task_job.py:188} WARNING - State of this instance has been externally set to None. Terminating instance.
[2021-07-29 11:24:32,457] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 29564
[2021-07-29 11:24:32,457] {taskinstance.py:1239} ERROR - Received SIGTERM. Terminating subprocesses.
[2021-07-29 11:24:32,457] {bash.py:185} INFO - Sending SIGTERM signal to bash process group
[2021-07-29 11:24:32,458] {taskinstance.py:570} DEBUG - Refreshing TaskInstance <TaskInstance: DAG1.task1 2021-07-28T12:48:29+00:00 [running]> from DB
[2021-07-29 11:24:32,484] {taskinstance.py:605} DEBUG - Refreshed TaskInstance <TaskInstance: DAG1.task1 2021-07-28T12:48:29+00:00 [None]>
[2021-07-29 11:24:32,486] {taskinstance.py:1455} ERROR - Task received SIGTERM signal
Traceback (most recent call last):
File "/amex/app/airflow/venv/lib64/python3.6/site-packages/airflow/models/taskinstance.py", line 1112, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/amex/app/airflow/venv/lib64/python3.6/site-packages/airflow/models/taskinstance.py", line 1285, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/amex/app/airflow/venv/lib64/python3.6/site-packages/airflow/models/taskinstance.py", line 1315, in _execute_task
result = task_copy.execute(context=context)
File "/amex/app/airflow/venv/lib64/python3.6/site-packages/airflow/operators/bash.py", line 171, in execute
for raw_line in iter(self.sub_process.stdout.readline, b''):
File "/amex/app/airflow/venv/lib64/python3.6/site-packages/airflow/models/taskinstance.py", line 1241, in signal_handler
raise AirflowException("Task received SIGTERM signal")
airflow.exceptions.AirflowException: Task received SIGTERM signal
We are using boto3 to push to S3 object storage, but we are not able to figure out what is causing the issue. We changed killed_task_cleanup_time = 60 to killed_task_cleanup_time = 21600 in airflow.cfg (sketched below), but we still get the same error.
We use PostgreSQL as our Airflow metadata database.
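For reference, the change described above corresponds to an airflow.cfg entry roughly like the following (the [core] section name is an assumption based on the Airflow 2.x configuration reference; check where the key lives in your version):

[core]
# default is 60 seconds; raised so a killed task gets more time to clean up
killed_task_cleanup_time = 21600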

How to handle Permission errors when connecting with AWS s3 in Airflow 2.x?

I followed instruction number 2 from this manual to use S3Hook.
Note: I have hidden the credential info with THIS_IS_CREDENTIAL.
And here is the simple code to test:
from datetime import datetime

from airflow import DAG
from airflow.operators.python import task


@task
def load_to_s3():
    from airflow.providers.amazon.aws.hooks.s3 import S3Hook

    s3_hook = S3Hook(aws_conn_id="my_s3")
    s3_hook.load_string(
        string_data="ABC",
        key="year=2021/month=1/day=1/test.txt",
        bucket_name="my_bucket_in_s3",
    )


default_args = {
    "depends_on_past": False,
    "start_date": datetime(2021, 1, 1),
    "schedule_interval": "@daily",
}

with DAG("my_test_dag2", default_args=default_args) as dag:
    load_to_s3()
Errors occurred:
[2021-05-27 09:58:26,896] {base_aws.py:362} INFO - Airflow Connection: aws_conn_id=my_s3
[2021-05-27 09:58:26,905] {base_aws.py:173} INFO - No credentials retrieved from Connection
[2021-05-27 09:58:26,905] {base_aws.py:76} INFO - Retrieving region_name from Connection.extra_config['region_name']
[2021-05-27 09:58:26,905] {base_aws.py:78} INFO - Creating session with aws_access_key_id=None region_name=ap-northeast-1
[2021-05-27 09:58:26,913] {base_aws.py:151} INFO - role_arn is arn:aws:iam::THIS_IS_CREDENTIAL:role/airflow-v1
[2021-05-27 09:58:26,913] {base_aws.py:97} INFO - assume_role_method=None
[2021-05-27 09:58:26,930] {base_aws.py:182} INFO - Doing sts_client.assume_role to role_arn=arn:aws:iam::THIS_IS_CREDENTIAL:role/airflow-v1 (role_session_name=Airflow_my_s3)
[2021-05-27 09:58:26,932] {credentials.py:519} WARNING - Refreshing temporary credentials failed during mandatory refresh period.
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/botocore/credentials.py", line 516, in _protected_refresh
metadata = self._refresh_using()
File "/home/airflow/.local/lib/python3.8/site-packages/botocore/credentials.py", line 657, in fetch_credentials
return self._get_cached_credentials()
File "/home/airflow/.local/lib/python3.8/site-packages/botocore/credentials.py", line 667, in _get_cached_credentials
response = self._get_credentials()
File "/home/airflow/.local/lib/python3.8/site-packages/botocore/credentials.py", line 872, in _get_credentials
kwargs = self._assume_role_kwargs()
File "/home/airflow/.local/lib/python3.8/site-packages/botocore/credentials.py", line 882, in _assume_role_kwargs
identity_token = self._web_identity_token_loader()
File "/home/airflow/.local/lib/python3.8/site-packages/botocore/utils.py", line 2152, in __call__
with self._open(self._web_identity_token_path) as token_file:
PermissionError: [Errno 13] Permission denied: '/var/run/secrets/eks.amazonaws.com/serviceaccount/token'
...
It looks like Airflow found role_arn=arn:aws:iam::THIS_IS_CREDENTIAL:role/airflow-v1, but I can't understand why Airflow tries to access /var/run/secrets/eks.amazonaws.com/serviceaccount/token. I'd like to solve it without something like chmod 755 /var/run/secrets/eks.amazonaws.com/serviceaccount/token (I have no root privilege currently).
I think you don't need to specify the AWS role (role_arn) in the connection; you can instead add programmatic credentials, something like the sketch below:
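As an illustration only (the access key and secret key below are placeholders; the connection id my_s3 and the region come from the question), a programmatic-credentials connection can be created from the Airflow 2.x CLI roughly like this:

# remove the existing role_arn-based connection first, if any
airflow connections delete my_s3
# recreate it with plain access keys instead of an assumed role
airflow connections add my_s3 \
    --conn-type aws \
    --conn-login "AKIAXXXXXXXXXXXXXXXX" \
    --conn-password "YOUR_SECRET_ACCESS_KEY" \
    --conn-extra '{"region_name": "ap-northeast-1"}'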

Docker-compose with Airflow - MS SQL Server (connection failed)

I'm not able to connect to the SQL Server inside Airflow using docker-compose. I want to take data from SQL Server directly to Cloud Storage, and then the data will be sent to BigQuery.
How can I solve this?
import json
from datetime import timedelta, datetime
from airflow import DAG
from airflow.models import Variable
from airflow.contrib.operators.bigquery_operator import BigQueryOperator
from airflow.contrib.operators.bigquery_check_operator import BigQueryCheckOperator
from airflow.contrib.operators.file_to_gcs import FileToGoogleCloudStorageOperator
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator
from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperator
default_args = {
    'owner': 'Test Data',
    'depends_on_past': True,
    'start_date': datetime(2019, 5, 29),
    'end_date': datetime(2019, 5, 30),
    'email': ['email@clientx.com.br'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}
# Set Schedule: Run pipeline once a day.
# Use cron to define exact time. Eg. 8:15am would be "15 08 * * *"
schedule_interval = "* * * * *"
# Define DAG: Set ID and assign default args and schedule interval
dag = DAG(
    'bigquery_github_trends',
    default_args=default_args,
    schedule_interval=schedule_interval
)
extract = MySqlToGoogleCloudStorageOperator(
    task_id='chama_extract',
    mysql_conn_id='mysql_hml',
    google_cloud_storage_conn_id='my_gcp_conn',
    sql="""SELECT * FROM test""",
    bucket='my_bucket',
    filename='test/test{}.json',
    schema_filename='schemas/test.json',
    dag=dag)
load = GoogleCloudStorageToBigQueryOperator(
    task_id='chama_load',
    bigquery_conn_id='my_gcp_conn',
    google_cloud_storage_conn_id='my_gcp_conn',
    bucket='my_bucket',
    destination_project_dataset_table="tst.teste123",
    source_objects=['test/test0.json'],
    schema_object='schemas/test.json',
    source_format='NEWLINE_DELIMITED_JSON',
    create_disposition='CREATE_IF_NEEDED',
    write_disposition='WRITE_TRUNCATE',
    dag=dag)
# Setting up Dependencies
load.set_upstream(extract)
Docker-compose.yml
version: '3'
services:
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    ports:
      - "5432:5432"

  webserver:
    image: puckel/docker-airflow:1.10.1
    build:
      context: https://github.com/puckel/docker-airflow.git#1.10.1
      dockerfile: Dockerfile
      args:
        AIRFLOW_DEPS: gcp_api,s3
    restart: always
    depends_on:
      - postgres
    environment:
      - LOAD_EX=n
      - EXECUTOR=Local
      - FERNET_KEY=jsDPRErfv8Z_eVTnGfF8ywd19j4pyqE3NpdUBA_oRTo=
    volumes:
      - ./examples/intro-example/dags:/usr/local/airflow/dags
      # Uncomment to include custom plugins
      # - ./plugins:/usr/local/airflow/plugins
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
docker-compose-gcloud.yml
version: '3'
services:
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    ports:
      - "5432:5432"

  webserver:
    image: puckel/docker-airflow:1.10.1
    build:
      context: https://github.com/puckel/docker-airflow.git#1.10.1
      dockerfile: Dockerfile
      args:
        AIRFLOW_DEPS: gcp_api,s3
    restart: always
    depends_on:
      - postgres
    environment:
      - LOAD_EX=n
      - EXECUTOR=Local
      - FERNET_KEY=jsDPRErfv8Z_eVTnGfF8ywd19j4pyqE3NpdUBA_oRTo=
    volumes:
      - ./examples/gcloud-example/dags:/usr/local/airflow/dags
      # Uncomment to include custom plugins
      # - ./plugins:/usr/local/airflow/plugins
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
And execute the command:
docker-compose -f docker-compose-gcloud.yml up --abort-on-container-exit
Error message in Airflow:
[2019-05-29 07:00:37,938] {{logging_mixin.py:95}} INFO - [2019-05-29 07:00:37,937] {{base_hook.py:83}} INFO - Using connection to: 10.0.0.1
[2019-05-29 07:00:58,974] {{models.py:1760}} ERROR - (2003, 'Can\'t connect to MySQL server on 10.0.0.1 (111 "Connection refused")')
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1659, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.6/site-packages/airflow/contrib/operators/mysql_to_gcs.py", line 105, in execute
cursor = self._query_mysql()
File "/usr/local/lib/python3.6/site-packages/airflow/contrib/operators/mysql_to_gcs.py", line 127, in _query_mysql
conn = mysql.get_conn()
File "/usr/local/lib/python3.6/site-packages/airflow/hooks/mysql_hook.py", line 103, in get_conn
conn = MySQLdb.connect(**conn_config)
File "/usr/local/lib/python3.6/site-packages/MySQLdb/init.py", line 84, in Connect
return Connection(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/MySQLdb/connections.py", line 164, in init
super(Connection, self).init(*args, **kwargs2)
MySQLdb._exceptions.OperationalError: (2003, 'Can\'t connect to MySQL server on 10.0.0.1 (111 "Connection refused")')
[2019-05-29 07:00:58,988] {{models.py:1789}} INFO - All retries failed; marking task as FAILED
[2019-05-29 07:00:58,992] {{logging_mixin.py:95}} INFO - [2019-05-29 07:00:58,991] {{configuration.py:255}} WARNING - section/key [smtp/smtp_user] not found in config
[2019-05-29 07:00:58,998] {{models.py:1796}} ERROR - [Errno 99] Cannot assign requested address
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1659, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.6/site-packages/airflow/contrib/operators/mysql_to_gcs.py", line 105, in execute
cursor = self._query_mysql()
File "/usr/local/lib/python3.6/site-packages/airflow/contrib/operators/mysql_to_gcs.py", line 127, in _query_mysql
conn = mysql.get_conn()
File "/usr/local/lib/python3.6/site-packages/airflow/hooks/mysql_hook.py", line 103, in get_conn
conn = MySQLdb.connect(**conn_config)
File "/usr/local/lib/python3.6/site-packages/MySQLdb/init.py", line 84, in Connect
return Connection(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/MySQLdb/connections.py", line 164, in init
super(Connection, self).init(*args, **kwargs2)
MySQLdb._exceptions.OperationalError: (2003, 'Can\'t connect to MySQL server on 10.0.0.1 (111 "Connection refused")')
From the error, the key part to me seems to be the get_conn piece. This indicates that when Airflow tries to establish the connection to the database, it fails. This means either that your connection is not specified (it looks like it might be) or that some part of it is incorrect.
You should check that the password, server address, and port are all correct. These should be set either in your airflow.cfg, as environment variables, or in the webserver (Admin panel); an environment-variable sketch is shown below.
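As an illustration only (the user, password, port and schema are placeholders, and the connection id mysql_hml comes from the DAG), the connection could be supplied to the webserver service through an environment variable in the compose file:

webserver:
  environment:
    # Airflow reads connections from AIRFLOW_CONN_<CONN_ID> URIs;
    # replace the credentials, host, port and schema with reachable values.
    - AIRFLOW_CONN_MYSQL_HML=mysql://db_user:db_password@10.0.0.1:3306/my_schema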

airflow task is unable to connect to remote oracle database

The Airflow task is unable to connect to a remote Oracle database, but I am able to connect to the same remote Oracle database using the same code from a Python shell.
I checked that the running environment is the same for both the shell and Airflow.
I am attaching the error log that I am getting.
[2018-07-18 13:12:11,037] {models.py:1428} INFO - Executing <Task(AadharNumberValidation): validating_data> on 2018-07-18 07:42:05.573491
[2018-07-18 13:12:11,037] {base_task_runner.py:115} INFO - Running: ['bash', '-c', 'airflow run data_validation validating_data 2018-07-18T07:42:05.573491 --job_id 206 --raw -sd /Users/b0204890/Desktop/python/airflow_home/dags/data_validaton.py']
[2018-07-18 13:12:11,531] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 13:12:11,531] {__init__.py:45} INFO - Using executor SequentialExecutor
[2018-07-18 13:12:11,588] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 13:12:11,588] {models.py:189} INFO - Filling up the DagBag from /Users/b0204890/Desktop/python/airflow_home/dags/data_validaton.py
[2018-07-18 13:12:11,661] {cli.py:374} INFO - Running on host LTB0204890-Mac.local
[2018-07-18 13:12:11,669] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 13:12:11,669] {validation_operators.py:37} INFO - operator_param yadav: Script one validation
[2018-07-18 13:12:11,678] {models.py:1595} ERROR - DPI-1047: 64-bit Oracle Client library cannot be loaded: "dlopen(libclntsh.dylib, 1): image not found". See https://oracle.github.io/odpi/doc/installation.html#macos for help
Traceback (most recent call last):
File "/Users/b0204890/venv/python3/lib/python3.6/site-packages/airflow/models.py", line 1493, in _run_raw_task
result = task_copy.execute(context=context)
File "/Users/b0204890/Desktop/python//airflow_home/plugins/validation_operators.py", line 38, in execute
cursor = create_connection(user="USERNAME",port="PORT",host="HOST",pwd="password",sid="SID")
File "/Users/b0204890/Desktop/python/airflow_home/utility/validation.py", line 30, in create_connection
connection = cx_Oracle.connect(user, pwd, service)
cx_Oracle.DatabaseError: DPI-1047: 64-bit Oracle Client library cannot be loaded: "dlopen(libclntsh.dylib, 1): image not found". See https://oracle.github.io/odpi/doc/installation.html#macos for help
[2018-07-18 13:12:11,681] {models.py:1616} INFO - Marking task as UP_FOR_RETRY
[2018-07-18 13:12:11,682] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 13:12:11,682] {configuration.py:206} WARNING - section/key [smtp/smtp_user] not found in config
[2018-07-18 13:12:11,684] {models.py:1628} ERROR - Failed to send email to: ['tushar.smartcorp#gmail.com']
[2018-07-18 13:12:11,684] {models.py:1629} ERROR - [Errno 61] Connection refused
Traceback (most recent call last):
File "/Users/b0204890/venv/python3/lib/python3.6/site-packages/airflow/models.py", line 1493, in _run_raw_task
result = task_copy.execute(context=context)
File "/Users/b0204890/Desktop/python//airflow_home/plugins/validation_operators.py", line 38, in execute
cursor = create_connection(user="USERNAME",port="PORT",host="HOST",pwd="PASSWORD",sid="SID")
File "/Users/b0204890/Desktop/python/airflow_home/utility/validation.py", line 30, in create_connection
connection = cx_Oracle.connect(user, pwd, service)
cx_Oracle.DatabaseError: DPI-1047: 64-bit Oracle Client library cannot be loaded: "dlopen(libclntsh.dylib, 1): image not found". See https://oracle.github.io/odpi/doc/installation.html#macos for help
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/b0204890/venv/python3/lib/python3.6/site-packages/airflow/models.py", line 1618, in handle_failure
self.email_alert(error, is_retry=True)
File "/Users/b0204890/venv/python3/lib/python3.6/site-packages/airflow/models.py", line 1779, in email_alert
send_email(task.email, title, body)
File "/Users/b0204890/venv/python3/lib/python3.6/site-packages/airflow/utils/email.py", line 44, in send_email
return backend(to, subject, html_content, files=files, dryrun=dryrun, cc=cc, bcc=bcc, mime_subtype=mime_subtype)
File "/Users/b0204890/venv/python3/lib/python3.6/site-packages/airflow/utils/email.py", line 87, in send_email_smtp
send_MIME_email(SMTP_MAIL_FROM, recipients, msg, dryrun)
File "/Users/b0204890/venv/python3/lib/python3.6/site-packages/airflow/utils/email.py", line 107, in send_MIME_email
s = smtplib.SMTP_SSL(SMTP_HOST, SMTP_PORT) if SMTP_SSL else smtplib.SMTP(SMTP_HOST, SMTP_PORT)
File "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/smtplib.py", line 251, in __init__
(code, msg) = self.connect(host, port)
File "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/smtplib.py", line 336, in connect
self.sock = self._get_socket(host, port, self.timeout)
File "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/smtplib.py", line 307, in _get_socket
self.source_address)
File "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/socket.py", line 724, in create_connection
raise err
File "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/socket.py", line 713, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 61] Connection refused
Without following all of the instructions, the connector will not work. Simply installing the Python package cx-Oracle is not enough. In this case it could also be that the Airflow venv cannot access all the necessary files, or that there is some permissions issue.
As stated in the message, you need to follow all the steps described at https://oracle.github.io/odpi/doc/installation.html#macos; a sketch of the usual macOS setup is below.
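A rough sketch of the usual macOS setup, assuming Instant Client 19.x unzipped to ~/Downloads/instantclient_19_8 (version and path are placeholders; follow the linked instructions for the exact packages):

# Download and unzip the Oracle Instant Client "Basic" package for macOS first.
mkdir -p ~/lib
# Per the linked installation docs, cx_Oracle on macOS looks for libclntsh.dylib
# in ~/lib or /usr/local/lib.
ln -s ~/Downloads/instantclient_19_8/libclntsh.dylib ~/lib/
# Restart the Airflow scheduler/webserver from the same user and venv afterwards
# so the worker process picks the library up.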

Airflow airflow.exceptions.AirflowException: Failed to create remote temp file SSHExecuteOperator

I am trying to run a simple SSHExecuteOperator in Airflow.
Here is my .py file:
import airflow
from airflow import DAG
from airflow.contrib.hooks.ssh_hook import SSHHook
from airflow.contrib.operators.ssh_execute_operator import SSHExecuteOperator
from datetime import timedelta

default_args = {
    'owner': 'airflow',
    'start_date': airflow.utils.dates.days_ago(2),
    'retries': 3
}

dag = DAG('Nas_Hdfs', description='Simple tutorial DAG',
          schedule_interval=None, default_args=default_args,
          catchup=False)

sshHook = SSHHook(conn_id='101')
sshHook.no_host_key_check = True

t2 = SSHExecuteOperator(task_id="NAS_TO_HDFS_FILE_COPY",
                        bash_command="hostname ",
                        ssh_hook=sshHook,
                        dag=dag)
t2
The Connection id 101 looks like this:
I am getting the error below:
ERROR - Failed to create remote temp file
Here are the complete logs:
INFO - Subtask: --------------------------------------------------------------------------------
INFO - Subtask: Starting attempt 1 of 4
INFO - Subtask: --------------------------------------------------------------------------------
INFO - Subtask:
INFO - Subtask: [2018-05-28 08:54:22,812] {models.py:1342} INFO - Executing <Task(SSHExecuteOperator): NAS_TO_HDFS_FILE_COPY> on 2018-05-28 08:54:12.876538
INFO - Subtask: [2018-05-28 08:54:23,303] {models.py:1417} ERROR - Failed to create remote temp file
INFO - Subtask: Traceback (most recent call last):
INFO - Subtask: File "/opt/miniconda3/lib/python2.7/site-packages/airflow/models.py", line 1374, in run
INFO - Subtask: result = task_copy.execute(context=context)
INFO - Subtask: File "/opt/miniconda3/lib/python2.7/site-packages/airflow/contrib/operators/ssh_execute_operator.py", line 128, in execute
INFO - Subtask: self.task_id) as remote_file_path:
INFO - Subtask: File "/opt/miniconda3/lib/python2.7/site-packages/airflow/contrib/operators/ssh_execute_operator.py", line 64, in __enter__
INFO - Subtask: raise AirflowException("Failed to create remote temp file")
INFO - Subtask: AirflowException: Failed to create remote temp file
INFO - Subtask: [2018-05-28 08:54:23,304] {models.py:1433} INFO - Marking task as UP_FOR_RETRY
INFO - Subtask: [2018-05-28 08:54:23,342] {models.py:1462} ERROR - Failed to create remote temp file
INFO - Subtask: Traceback (most recent call last):
INFO - Subtask: File "/opt/miniconda3/bin/airflow", line 28, in <module>
INFO - Subtask: args.func(args)
INFO - Subtask: File "/opt/miniconda3/lib/python2.7/site-packages/airflow/bin/cli.py", line 422, in run
INFO - Subtask: pool=args.pool,
INFO - Subtask: File "/opt/miniconda3/lib/python2.7/site-packages/airflow/utils/db.py", line 53, in wrapper
INFO - Subtask: result = func(*args, **kwargs)
INFO - Subtask: File "/opt/miniconda3/lib/python2.7/site-packages/airflow/models.py", line 1374, in run
INFO - Subtask: result = task_copy.execute(context=context)
INFO - Subtask: File "/opt/miniconda3/lib/python2.7/site-packages/airflow/contrib/operators/ssh_execute_operator.py", line 128, in execute
INFO - Subtask: self.task_id) as remote_file_path:
INFO - Subtask: File "/opt/miniconda3/lib/python2.7/site-packages/airflow/contrib/operators/ssh_execute_operator.py", line 64, in __enter__
INFO - Subtask: raise AirflowException("Failed to create remote temp file")
INFO - Subtask: airflow.exceptions.AirflowException: Failed to create remote temp file
INFO - Task exited with return code 1
Any help is highly appreciated!
EDIT:
I ran this in my airflow user python shell and this is the output:
from airflow.contrib.hooks.ssh_hook import SSHHook
sshHook = SSHHook(conn_id='101')
sshHook.no_host_key_check = True
sshHook.Popen(["-q", "mktemp", "--tmpdir", "tmp_XXXXXX"])
Output:
Make sure you follow the 3 steps below (a shell sketch follows):
1. Use an SSH key instead of a password.
2. "key_file" must use the id_rsa file, not id_rsa.pub.
3. The airflow user needs ownership of, and permissions 0600 on, the id_rsa and id_rsa.pub files.
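A rough shell sketch of those steps (the user name "airflow", paths and remote host are placeholders; adapt them to where your Airflow worker actually runs):

# 1. Generate a key pair for the user that runs the Airflow worker.
sudo -u airflow ssh-keygen -t rsa -f /home/airflow/.ssh/id_rsa -N ""
# 2. Install the public key on the remote host so no password is needed.
sudo -u airflow ssh-copy-id -i /home/airflow/.ssh/id_rsa.pub remote_user@remote_host
# 3. Make sure the airflow user owns the key files and the private key is 0600.
sudo chown airflow: /home/airflow/.ssh/id_rsa /home/airflow/.ssh/id_rsa.pub
sudo chmod 0600 /home/airflow/.ssh/id_rsa
# Then set "key_file" in connection 101 to /home/airflow/.ssh/id_rsa (not the .pub file).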
