I am not able to print detailed log messages for successful tasks in Airflow. How can I do that?
For failed tasks it shows detailed failure messages, but it skips every successful mini-operation. So how can I print messages in the Airflow log for every successful SQL script execution (e.g. any insert script or create table script) inside a SQL file?
I could not come up with a solution.
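A minimal sketch of one possible approach: split the SQL file into individual statements and run them one by one from a Python task, logging after each statement succeeds so every INSERT/CREATE shows up in the task log. The PostgresHook, connection id, and file path below are assumptions.
import logging

from airflow.decorators import task
from airflow.providers.postgres.hooks.postgres import PostgresHook

log = logging.getLogger(__name__)

@task
def run_sql_file_with_logging(sql_path="/opt/airflow/sql/load.sql"):  # path is illustrative
    hook = PostgresHook(postgres_conn_id="my_postgres_conn")  # assumed connection id
    with open(sql_path) as f:
        # naive split on ';' -- fine for simple scripts, not for bodies that contain ';'
        statements = [s.strip() for s in f.read().split(";") if s.strip()]
    for i, stmt in enumerate(statements, start=1):
        hook.run(stmt)  # raises on failure, so only successful statements reach the log line
        log.info("Statement %d/%d executed successfully: %.80s", i, len(statements), stmt)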
I want to use the Telegram client API.
I want to run run_until_disconnected() to get all messages over 24 hours and save them in a database. This part is fine; I wrote the code and it's working. After some operations on the messages database, I want to send the result of that operation as a message to Telegram (to a channel or user). I wrote the code for sending the message too, but when I try to use it, I get a "database is locked" or "session is locked" error...
What should I do?
Please read: https://docs.telethon.dev/en/latest/quick-references/faq.html#id9
Solution according to the docs:
if you need two clients, use two sessions. If the problem persists and you’re on Linux, you can use fuser my.session to find out the process locking the file. As a last resort, you can reboot your system.
If you really dislike SQLite, use a different session storage. There is an entire section covering that at Session Files.
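A minimal sketch of the "two clients, two sessions" suggestion, where the listener and the sender each get their own session file so they never lock each other's SQLite session. The API credentials, session names, and target chat below are assumptions.
from telethon import TelegramClient

API_ID = 12345          # placeholder
API_HASH = "0123abcd"   # placeholder

# Client 1: long-running listener that saves incoming messages to the database.
listener = TelegramClient("listener_session", API_ID, API_HASH)

# Client 2: separate session file, used only to send results,
# so it never competes with the listener for the same session file.
sender = TelegramClient("sender_session", API_ID, API_HASH)

async def report(result_text):
    # call this from a handler or after your database operation
    async with sender:
        await sender.send_message("@my_channel", result_text)  # target chat is illustrative

with listener:
    # register the handlers that write to the database here, then block:
    listener.run_until_disconnected()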
I'm not strong in programming, and this task has landed on me: I need to write a Lambda function that runs once a day, retrieves the last 24 hours of logs, picks out the log entries containing errors, and sends them to email or Slack.
I created an SNS topic with delivery to email, but I do not understand how to use the Lambda function to extract and filter the logs.
https://github.com/EvanErickson/aws-lambda-parse-cloudwatch-logs-send-email/blob/main/index.py
I found this example on the Internet, but I cannot understand how it works or where to enter the log group name.
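Not taken from the linked example, but a minimal sketch of what such a Lambda could look like: the log group name goes into the logGroupName argument of filter_log_events, and the filtered messages are published to the SNS topic you already created. The log group name, filter pattern, and topic ARN below are assumptions.
import time
import boto3

LOG_GROUP_NAME = "/aws/lambda/my-app"                      # <- your log group name goes here
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:my-topic"  # <- your SNS topic

logs = boto3.client("logs")
sns = boto3.client("sns")

def handler(event, context):
    now_ms = int(time.time() * 1000)
    start_ms = now_ms - 24 * 60 * 60 * 1000  # last 24 hours

    error_lines = []
    paginator = logs.get_paginator("filter_log_events")
    for page in paginator.paginate(
        logGroupName=LOG_GROUP_NAME,
        startTime=start_ms,
        endTime=now_ms,
        filterPattern="ERROR",  # simple pattern; adjust to your log format
    ):
        error_lines.extend(e["message"] for e in page.get("events", []))

    if error_lines:
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject="Daily error log report",
            Message="\n".join(error_lines[:200]),  # keep the notification a reasonable size
        )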
I am running EMR clusters kicked off with Airflow and I need some way of passing error messages back to Airflow. Airflow runs in Python, so I need this to be done in Python.
Currently the error logs are in the "Log URI" section under configuration details. Accessing this might be one way to do it, but any way to access the error logs from EMR with Python would be much appreciated.
You can access the EMR logs in S3 with boto3, for example.
The S3 path would be:
stderr: s3://<EMR_LOG_BUCKET_DEFINED_IN_EMR_CONFIGURATION>/logs/<CLUSTER_ID>/steps/<STEP_ID>/stderr.gz
stdout: s3://<EMR_LOG_BUCKET_DEFINED_IN_EMR_CONFIGURATION>/logs/<CLUSTER_ID>/steps/<STEP_ID>/stdout.gz
controller: s3://<EMR_LOG_BUCKET_DEFINED_IN_EMR_CONFIGURATION>/logs/<CLUSTER_ID>/steps/<STEP_ID>/controller.gz
syslog: s3://<EMR_LOG_BUCKET_DEFINED_IN_EMR_CONFIGURATION>/logs/<CLUSTER_ID>/steps/<STEP_ID>/syslog.gz
Cluster ID and Step ID can be passed to your different tasks via XCom from the task(s) that create the cluster/steps.
Warning for Spark (might be applicable to other types of steps): this works if you submit your steps in client mode; if you are using cluster mode, you would need to change the URL to fetch the application logs of the driver instead.
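As a minimal sketch (the bucket name and the "logs/" prefix are assumptions, match them to your cluster's Log URI), fetching and decompressing a step's stderr with boto3 looks like this:
import gzip
import boto3

def get_step_stderr(cluster_id, step_id, bucket="my-emr-log-bucket"):
    # Path layout as described above; adjust the prefix to your Log URI.
    key = f"logs/{cluster_id}/steps/{step_id}/stderr.gz"
    obj = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    return gzip.decompress(obj["Body"].read()).decode("utf-8", errors="replace")

# e.g. called from an Airflow task, with cluster_id/step_id pulled from XCom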
Our Python Dataflow pipeline works locally but not when deployed using the Dataflow managed service on Google Cloud Platform. It shows no sign of being connected to the PubSub subscription. We have tried subscribing to both the subscription and the topic; neither of them worked. The messages accumulate in the PubSub subscription and the Dataflow pipeline shows no sign of being called at all. We have double-checked that the project is the same.
Any directions on this would be very much appreciated.
Here is the code to connect to a pull subscription:
with beam.Pipeline(options=options) as p:
    something = p | "ReadPubSub" >> beam.io.ReadFromPubSub(
        subscription="projects/PROJECT_ID/subscriptions/cloudflow"
    )
Here are the options used:
options = PipelineOptions()
file_processing_options = PipelineOptions().view_as(FileProcessingOptions)
if options.view_as(GoogleCloudOptions).project is None:
    print(sys.argv[0] + ": error: argument --project is required")
    sys.exit(1)
options.view_as(SetupOptions).save_main_session = True
options.view_as(StandardOptions).streaming = True
The PubSub subscription has this configuration:
Delivery type: Pull
Subscription expiration: Subscription expires in 31 days if there is no activity.
Acknowledgement deadline: 57 Seconds
Subscription filter: —
Message retention duration: 7 Days
Retained acknowledged messages: No
Dead lettering: Disabled
Retry policy: Retry immediately
Very late answer, but it may still help someone else. I had the same problem and solved it like this:
Thanks to user Paramnesia1 who wrote this answer, I figured out that I was not observing all the logs in Logs Explorer. Some default job_name query filters were preventing me from doing so. I am quoting & clarifying the steps to follow to be able to see all logs:
Open the Logs tab in the Dataflow Job UI, section Job Logs
Click the "View in Logs Explorer" button
In the new Logs Explorer screen, in your Query window, remove all the existing "logName" filters, keep only resource.type and resource.labels.job_id
Now you will be able to see all the logs and investigate your error further. In my case, I was getting some 'Syncing Pod' errors, which were due to importing the wrong data file in my setup.py.
I think for pulling from a subscription we need to pass the with_attributes parameter as True.
with_attributes – True: output elements will be PubsubMessage objects. False: output elements will be of type bytes (message data only).
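For illustration, applied to the read from the question it would look like this (the elements then come out as PubsubMessage objects with .data and .attributes):
something = p | "ReadPubSub" >> beam.io.ReadFromPubSub(
    subscription="projects/PROJECT_ID/subscriptions/cloudflow",
    with_attributes=True,
)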
Found a similar one here:
When using Beam IO ReadFromPubSub module, can you pull messages with attributes in Python? It's unclear if its supported
Requirement: Delete DMS Task, DMS Endpoints and Replication Instance.
Use: Boto3 Python script in Lambda
My Approach:
1. Delete the Database Migration Task first, as the Endpoint and Replication Instance can't be deleted before deleting it.
2. Delete Endpoints
3. Delete Replication Instance
Issue: When I run these 3 delete commands, I get the following error:
"errorMessage": "An error occurred (InvalidResourceStateFault) when calling the DeleteEndpoint operation:Endpoint arn:aws:dms:us-east-1:XXXXXXXXXXXXXX:endpoint:XXXXXXXXXXXXXXXXXXXXXX is part of one or more ReplicationTasks.
Here I know that the Database Migration Task will take some time to delete, so until then the endpoint will be occupied by the task and we can't delete it.
There is an AWS CLI command to check whether the task is deleted or not: replication-task-deleted.
I can run this in a shell and wait (sleep) until I get the final status, and then execute the delete endpoint script.
There is no equivalent command in the Boto3 DMS docs.
Is there any other Boto3 command I can use to check the status and make my Python script sleep until then?
Please let me know if I can approach the issue in a different way.
You need to use waiters. In your case, the Waiter.ReplicationTaskDeleted.
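A minimal sketch of the whole sequence with that waiter; the ARNs and the delay/attempt values below are placeholders.
import boto3

dms = boto3.client("dms")

# Placeholders -- use the ARNs of your own task, endpoints and instance.
task_arn = "arn:aws:dms:us-east-1:111111111111:task:EXAMPLE"
source_endpoint_arn = "arn:aws:dms:us-east-1:111111111111:endpoint:SRC"
target_endpoint_arn = "arn:aws:dms:us-east-1:111111111111:endpoint:TGT"
replication_instance_arn = "arn:aws:dms:us-east-1:111111111111:rep:EXAMPLE"

# 1. Delete the task and block until it is really gone.
dms.delete_replication_task(ReplicationTaskArn=task_arn)
waiter = dms.get_waiter("replication_task_deleted")
waiter.wait(
    Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}],
    WaiterConfig={"Delay": 15, "MaxAttempts": 60},  # poll every 15s, up to 15 minutes
)

# 2. Now the endpoints are no longer "part of one or more ReplicationTasks".
dms.delete_endpoint(EndpointArn=source_endpoint_arn)
dms.delete_endpoint(EndpointArn=target_endpoint_arn)

# 3. Finally, delete the replication instance.
dms.delete_replication_instance(ReplicationInstanceArn=replication_instance_arn)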