Trigger a Matillion ETL Job from Airflow using SqsPublishOperator - python

From within Airflow I would like to trigger an ETL job to run in Matillion by publishing to an SQS queue we have set up in AWS.
On running the code below in Airflow I get the following error:
Error Invalid type for parameter
import os
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.bash import BashOperator
from airflow.providers.amazon.aws.operators.sqs import SqsPublishOperator
from common.util import configuration, alerting
dag_id = os.path.basename(__file__).replace(".py", "")
config = configuration.get_config(f'{dag_id}.yml')
# SQS queue to publish to in AWS
sqs_queue_name = 'run-etl-jobs'
args = {
    "ENVIRONMENT": "dev",
    "account_name": "mydevaccount",
    "group": "Matillion_ETL_DEV",
    "project": "General"
}

with DAG(dag_id=dag_id, **config['airflow'],
         on_failure_callback=opsgenie_alert.create_alert) as dag:

    start_task = DummyOperator(task_id='start')
    end_task = DummyOperator(task_id='end')

    publish_to_queue = SqsPublishOperator(
        task_id="publish_to_queue",
        sqs_queue=sqs_queue_name,
        message_content={
            "group": args['group'],
            "project": args['project'],
            "version": "default",
            "environment": args['account_name'],
            "job": "Matillion ETL Job",
        }
    )

    start_task >> publish_to_queue >> end_task
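The "Invalid type for parameter" message is the kind of error boto3 raises when the SQS MessageBody is not a string, and SqsPublishOperator passes message_content through as the message body. A minimal sketch of the likely fix, serializing the payload with json.dumps before handing it to the operator (an assumption based on the truncated error, not a confirmed resolution):

import json

    publish_to_queue = SqsPublishOperator(
        task_id="publish_to_queue",
        sqs_queue=sqs_queue_name,
        # SQS message bodies must be strings, so serialize the dict first.
        message_content=json.dumps({
            "group": args['group'],
            "project": args['project'],
            "version": "default",
            "environment": args['account_name'],
            "job": "Matillion ETL Job",
        })
    )

Since Matillion's SQS integration consumes JSON-formatted messages, the serialized form should still be readable on the Matillion side.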

Related

Running default eventhub function returns "unexpected status code: 400"

Running the basic Azure Function template for "Azure Event Hub trigger":
import logging
import azure.functions as func

def main(event: func.EventHubEvent):
    logging.info('Python EventHub trigger processed an event: %s',
                 event.get_body().decode('utf-8'))
I'm getting unexpected status code: 400 when executing the function locally in VS Code.
I've got local functions running, and I've confirmed the Event Hub instance name is correct in my function.json:
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "type": "eventHubTrigger",
      "name": "event",
      "direction": "in",
      "eventHubName": "homeeventhub",
      "connection": "",
      "cardinality": "many",
      "consumerGroup": "$Default"
    }
  ]
}
The full error log:
[2022-11-02T23:14:53.401Z] Executing HTTP request: {
[2022-11-02T23:14:53.403Z] requestId: "0000-0000-0000-0000-0000",
[2022-11-02T23:14:53.404Z] method: "GET",
[2022-11-02T23:14:53.406Z] userAgent: "ms-rest-js/2.6.0 Node/v16.14.2 OS/(x64-Windows_NT-10.0.22000) vscode-azurefunctions/1.8.3",
[2022-11-02T23:14:53.407Z] uri: "/"
[2022-11-02T23:14:53.408Z] }
[2022-11-02T23:14:53.511Z] Executed HTTP request: {
[2022-11-02T23:14:53.512Z] Executing HTTP request: {
[2022-11-02T23:14:53.514Z] requestId: "0000-0000-0000-0000-0000",
[2022-11-02T23:14:53.512Z] requestId: "0000-0000-0000-0000-0000",
[2022-11-02T23:14:53.515Z] method: "POST",
[2022-11-02T23:14:53.516Z] identities: "",
[2022-11-02T23:14:53.517Z] userAgent: "ms-rest-js/2.6.0 Node/v16.14.2 OS/(x64-Windows_NT-10.0.22000) vscode-azurefunctions/1.8.3",
[2022-11-02T23:14:53.518Z] status: "200",
[2022-11-02T23:14:53.518Z] uri: "/admin/functions/ehtest2"
[2022-11-02T23:14:53.519Z] duration: "108"
[2022-11-02T23:14:53.520Z] }
[2022-11-02T23:14:53.520Z] }
[2022-11-02T23:14:53.599Z] Executing StatusCodeResult, setting HTTP status code 400
[2022-11-02T23:14:53.603Z] Executed HTTP request: {
[2022-11-02T23:14:53.604Z] requestId: "0000-0000-0000-0000-0000",
[2022-11-02T23:14:53.605Z] identities: "(WebJobsAuthLevel:Admin, WebJobsAuthLevel:Admin)",
[2022-11-02T23:14:53.606Z] status: "400",
[2022-11-02T23:14:53.606Z] duration: "91"
[2022-11-02T23:14:53.607Z] }
I'm sure it's just some kind of local configuration, but I don't know what else I can troubleshoot. I have an additional HTTP function that is running fine locally. Is there something else in the config that needs to be set up?
Ensure you have all the configuration settings set up properly, such as:
function.json:
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "type": "eventHubTrigger",
      "name": "event",
      "direction": "in",
      "eventHubName": "kdemoeventhub",
      "connection": "krisheventhubns_RootManageSharedAccessKey_EVENTHUB",
      "cardinality": "many",
      "consumerGroup": "$Default"
    }
  ]
}
krisheventhubns_RootManageSharedAccessKey_EVENTHUB is the Event Hub namespace connection string, available under Shared access policies of the Event Hub namespace in the Azure portal.
local.settings.json:
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "krisheventhubns_RootManageSharedAccessKey_EVENTHUB": "<Your-Event-Hub-NS-ConnStr>"
  }
}
To run the Azure Functions Python Event Hub trigger, you need to send some events so that you can observe the default function capturing and logging them.
Create a sender.py file and take the send-events code from the MS Doc on sending events to Azure Event Hubs using Python.
Then run the Azure Functions project from the terminal with func host start, or via Run > Run Without Debugging. Once the host is initialized, run the sender file with python sender.py and then observe the events in the Azure portal Event Hub.
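A minimal sender sketch based on that doc (the connection string and hub name below are placeholders; substitute your own values):

import asyncio
from azure.eventhub import EventData
from azure.eventhub.aio import EventHubProducerClient

CONNECTION_STR = "<Your-Event-Hub-NS-ConnStr>"   # Event Hub namespace connection string
EVENT_HUB_NAME = "homeeventhub"                  # the hub named in function.json

async def run():
    producer = EventHubProducerClient.from_connection_string(
        conn_str=CONNECTION_STR, eventhub_name=EVENT_HUB_NAME)
    async with producer:
        # Build a batch, add a few test events, and send it.
        batch = await producer.create_batch()
        batch.add(EventData("First event"))
        batch.add(EventData("Second event"))
        await producer.send_batch(batch)

asyncio.run(run())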

Azure Function with Event Hub trigger writes 1 event to storage and not working in batch

I have an Azure Function with an Event Hub trigger that gets events from the event hub. I wrote this code to handle the events and save them to a storage account.
The problem is that the events are not saved in a batch but one at a time. How can I save a batch of 500 events in one file instead of 500 separate files?
import logging
import numpy as np
import azure.functions as func

def main(event: func.EventHubEvent, outputblob: func.Out[bytes]):
    for ev in event:
        outputblob.set(ev.get_body().decode('utf-8'))
I tried working with the binding settings, but it did not work:
"extensions": {
"eventHubs": {
"maxEventBatchSize" : 10,
"batchCheckpointFrequency" : 1,
"prefetchCount" : 300,
"transportType" : "amqpTcp"
}
}
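This "extensions" block is read from host.json at the project root; a sketch of the surrounding file it would sit in (the extensionBundle range shown is an assumption for a bundle that includes the 5.x Event Hubs extension):

{
  "version": "2.0",
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[4.*, 5.0.0)"
  },
  "extensions": {
    "eventHubs": {
      "maxEventBatchSize": 10,
      "batchCheckpointFrequency": 1,
      "prefetchCount": 300,
      "transportType": "amqpTcp"
    }
  }
}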
I created an Event Hub in the portal and an Event Hub trigger function in VS Code.
I created a storage account and a container to save the file containing the stored events.
I sent three random event messages to the event hub, saved them in a file, and stored that file in a container in the storage account.
I tried the code below:
__init__.py
from asyncio import events
import logging
import numpy as np
import azure.functions as func
from azure.storage.blob import BlobClient

storage_connection_string = 'XXXXXXXXXX'
container_name = 'sample-workitems'

def main(event: func.EventHubEvent):
    for ev in event:
        logging.info(f'Function triggered to process a message: {ev.get_body().decode()}')
        logging.info(f'  SequenceNumber = {ev.sequence_number}')
        logging.info(f'  Offset = {ev.offset}')
        blob_client = BlobClient.from_connection_string(storage_connection_string, container_name, "filename")
        blob_client.upload_blob(ev.get_body().decode(), blob_type="AppendBlob")
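The append-blob approach above adds every event to the same blob. If the goal is instead one new file per trigger invocation, a rough alternative (a sketch, not part of the original answer; it assumes the binding's cardinality is "many" so the function receives a whole batch per invocation) is to join the event bodies in memory and upload them once:

import logging
import uuid
import azure.functions as func
from azure.storage.blob import BlobClient

storage_connection_string = 'XXXXXXXXXX'   # placeholder
container_name = 'sample-workitems'

def main(event: func.EventHubEvent):
    # Collect every event body delivered in this invocation.
    bodies = [ev.get_body().decode() for ev in event]
    logging.info(f'Received {len(bodies)} events in this batch')

    # Upload the whole batch as a single block blob with a unique name.
    blob_name = f'batch-{uuid.uuid4()}.txt'
    blob_client = BlobClient.from_connection_string(
        storage_connection_string, container_name, blob_name)
    blob_client.upload_blob('\n'.join(bodies))

The maxEventBatchSize setting shown earlier caps how many events are delivered per invocation, so it effectively bounds how many events end up in each file.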
Using the MS Doc reference on sending and receiving events to and from Event Hubs using Python:
send.py
import asyncio
from azure.eventhub.aio import EventHubProducerClient
from azure.eventhub import EventData

Pravueventhub_RootManageSharedAccessKey_EVENTHUB = "Eventhub connection string"

async def run():
    producer = EventHubProducerClient.from_connection_string(conn_str=Pravueventhub_RootManageSharedAccessKey_EVENTHUB, eventhub_name="eventhub")
    async with producer:
        event_data_batch = await producer.create_batch()
        event_data_batch.add(EventData('First event '))
        event_data_batch.add(EventData('Second event'))
        event_data_batch.add(EventData('Third event'))
        await producer.send_batch(event_data_batch)

loop = asyncio.get_event_loop()
loop.run_until_complete(run())
function.json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "type": "eventHubTrigger",
      "name": "event",
      "direction": "in",
      "eventHubName": "eventhub",
      "connection": "Pravueventhub_RootManageSharedAccessKey_EVENTHUB",
      "cardinality": "many",
      "consumerGroup": "$Default",
      "dataType": "binary"
    }
  ]
}
Result:

I have a Cloud Run Job working fine but I can't get Cloud Scheduler to work

Hi, I have a Python script running fine in the new Cloud Run Jobs service that GCP has released. What I am having problems with is Cloud Scheduler (this is Cloud Run Jobs, not just Cloud Run).
I think that perhaps the URL is wrong, but I am not 100% sure; here are the logs. It seems to be saying that it can't find the job, even though it has the correct name. Any help would be much appreciated. It is my first time using Cloud Scheduler.
Just so you know, the script basically makes an API call to Mouseflow, creates a dataframe and sends it directly to BigQuery. The script itself is working fine.
This is the Python code, which as I said is working fine.
import requests
import json
import pandas as pd
from google.cloud import bigquery
from requests.auth import HTTPBasicAuth
#import schedule
#import time
r = requests.get("https://api-eu.mouseflow.com/websites/e768ed54-c09b-48dc-bf49-beda12697013/pagelist",
                 auth=HTTPBasicAuth("***************", "*************"))
parsed = json.loads(r.text)
print(json.dumps(parsed['pages'], indent=4, sort_keys=True))
df = pd.DataFrame.from_records(parsed['pages'])
#df.to_csv('mousedata3.csv')#
df.to_gbq('Mouseflow.Mouseflow_ETL', if_exists='replace', project_id='api-data-pod')
#Sent info to cloud securely#
# Time
#schedule.every().day.at("06:30").do(req)
#schedule.every(30).seconds.do(req)
#schedule.every(45).seconds.do(dt)
#schedule.every(5).minutes.do(req)
#while True:
#schedule.run_pending()
#time.sleep(1)
This is the Cloud Scheduler URL I am using to call the job via HTTP, and the resulting log.
https://us-central1-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/api-data-pod/jobs/cloudyjoby2:run
{
  "insertId": "rgzvcyf8u3ywd",
  "jsonPayload": {
    "status": "NOT_FOUND",
    "url": "https://us-central1-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/api-data-pod/jobs/cloudyjoby2",
    "@type": "type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished",
    "targetType": "HTTP",
    "jobName": "projects/api-data-pod/locations/us-central1/jobs/Mouseflowtest"
  },
  "httpRequest": {
    "status": 404
  },
  "resource": {
    "type": "cloud_scheduler_job",
    "labels": {
      "job_id": "Mouseflowtest",
      "project_id": "api-data-pod",
      "location": "us-central1"
    }
  },
  "timestamp": "2022-07-28T15:59:05.356773252Z",
  "severity": "ERROR",
  "logName": "projects/api-data-pod/logs/cloudscheduler.googleapis.com%2Fexecutions",
  "receiveTimestamp": "2022-07-28T15:59:05.356773252Z"
}
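One way to verify the URL and permissions outside of Cloud Scheduler is to make the same call from Python with an authorized POST. A minimal sketch (assumptions: the v1 namespaces endpoint and job name from the Scheduler config above, a POST request with the :run suffix, and default credentials that are allowed to run the job):

import google.auth
from google.auth.transport.requests import AuthorizedSession

# Application default credentials with the cloud-platform scope.
credentials, project = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"])
session = AuthorizedSession(credentials)

# Same URL as configured in Cloud Scheduler, including the :run suffix.
url = ("https://us-central1-run.googleapis.com/apis/run.googleapis.com/v1/"
       "namespaces/api-data-pod/jobs/cloudyjoby2:run")

response = session.post(url)
print(response.status_code, response.text)

Note that the logged url above ends in .../jobs/cloudyjoby2 without :run, while the configured Scheduler URL includes it, so it may be worth re-checking exactly what method and URL the Scheduler job is sending against what this call does.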

Python - Read messages from Service bus topic and then store them in a container with the same name

I am new to this, but the code seems not to be working.
The intention is to read JSON messages from a Service Bus endpoint and then copy and store them in a blob container, but to keep the integrity constant throughout I need to keep the name as is.
I do not have much knowledge on this; I collected this code from a blog.
Also, if I can listen without a Function, that would also help.
Here is the code piece:
with receiver:
    for msg in receiver:
        print(str(msg))
        logging.info('Python ServiceBus trigger processed an Topics: %s', msg.get_body().decode('utf-8'))
        # receiver.complete_message(msg)

        temp_path = tempfile.gettempdir()
        # Create a file in the local data directory to upload and download
        local_file_name = str(uuid.uuid4()) + ".txt"
        upload_file_path = os.path.join(temp_path, local_file_name)

        # Write text to the file
        file = open(upload_file_path, 'w')
        file.write(msg.get_body().decode('utf-8'))
        file.close()

        # Create a blob client using the local file name as the name for the blob
        blob_client = blob_service_client.get_blob_client(container=container_name, blob=local_file_name)
        print("\nUploading to Azure Storage as blob:\n\t" + local_file_name)

        # Upload the created file
        with open(upload_file_path, "rb") as data:
            blob_client.upload_blob(data)
Here we need to create a function where we can configure which messages to read as they are received from Service Bus queues.
For this we need to declare the bindings in the function.json file as below:
serviceBusTrigger:
{
  "bindings": [
    {
      "type": "serviceBusTrigger",
      "name": "inputMessage",
      "connection": "AzureServiceBusConnectionString",
      "queueName": "inputqueue",
      "accessRights": "listen",
      "direction": "in"
    },
    {
      "type": "blob",
      "name": "inputBlob",
      "path": "container/{inputMessage}",
      "connection": "EnterConnection",
      "direction": "in"
    }
  ],
  "disabled": false
}
queueTrigger:
{
  "bindings": [
    {
      "type": "blob",
      "name": "inputBlob",
      "path": "incontainer/{queueTrigger}",
      "connection": "testweblog_STORAGE",
      "direction": "in"
    },
    {
      "type": "queueTrigger",
      "name": "myQueue",
      "queueName": "myqueue",
      "connection": " EnterConnection _STORAGE",
      "direction": "in"
    }
  ],
  "disabled": false
}
For more information about the triggers, please refer to the documentation on input and output bindings.
A queue is basically for first-in-first-out messages. When a message comes from the Service Bus, the Service Bus queue trigger gets fired and the Azure Function is called. In the Azure Function, we can process the message and then deliver it to its destination.
Below is sample code to receive messages from a Service Bus queue.
import os
from azure.servicebus import ServiceBusClient

CONNECTION_STR = os.environ['SERVICE_BUS_CONNECTION_STR']
QUEUE_NAME = os.environ["SERVICE_BUS_QUEUE_NAME"]

servicebus_client = ServiceBusClient.from_connection_string(conn_str=CONNECTION_STR)
with servicebus_client:
    receiver = servicebus_client.get_queue_receiver(queue_name=QUEUE_NAME)
    with receiver:
        received_msgs = receiver.receive_messages(max_message_count=10, max_wait_time=5)
        for msg in received_msgs:
            print(str(msg))
            receiver.complete_message(msg)
print("Receive is done.")
For more information, refer to the Azure Service Bus client library for Python.
servicebus_client = ServiceBusClient.from_connection_string(conn_str=CONNECTION_STR, logging_enable=True)
msg_topic = "XYZ"
with servicebus_client:
    receiver = servicebus_client.get_subscription_receiver(topic_name=TOPIC_NAME, subscription_name=SUBSCRIPTION_NAME, max_wait_time=5)
    with receiver:
        for msg in receiver:
            print(str(msg))
            msg_topic = msg
            print(str(msg_topic))
            receiver.complete_message(msg)

block_blob_service = BlockBlobService(account_name='stgidpdev', account_key='ZZZZ')
block_blob_service.create_container('servicebuscontainer', public_access=PublicAccess.Container)
print('Container Created')
# from azure.storage.blob import ContentSetting
block_blob_service.create_blob_from_text('servicebuscontainer', 'myblockblob', str(msg_topic), content_settings=None)
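BlockBlobService comes from the older azure-storage SDK. A rough equivalent with the current azure-storage-blob package, uploading each topic message as a blob named after its message_id so the naming stays consistent (the topic, subscription, container, and environment variable names below are placeholders), might look like this:

import os
from azure.servicebus import ServiceBusClient
from azure.storage.blob import BlobServiceClient

servicebus_conn_str = os.environ['SERVICE_BUS_CONNECTION_STR']
storage_conn_str = os.environ['STORAGE_CONNECTION_STR']   # placeholder setting name
topic_name = 'mytopic'                                     # placeholder
subscription_name = 'mysubscription'                       # placeholder
container_name = 'servicebuscontainer'

blob_service_client = BlobServiceClient.from_connection_string(storage_conn_str)

with ServiceBusClient.from_connection_string(servicebus_conn_str) as servicebus_client:
    receiver = servicebus_client.get_subscription_receiver(
        topic_name=topic_name, subscription_name=subscription_name, max_wait_time=5)
    with receiver:
        for msg in receiver:
            # Reassemble the message body and decode it as UTF-8 text.
            body = b''.join(msg.body).decode('utf-8')
            # Use the Service Bus message id as the blob name to keep the name stable.
            blob_client = blob_service_client.get_blob_client(
                container=container_name, blob=f'{msg.message_id}.json')
            blob_client.upload_blob(body, overwrite=True)
            receiver.complete_message(msg)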

Create Azure TimerTrigger Durable Function in python

As I said in the title, is it possible to have an Azure Durable Functions app that triggers using a TimerTrigger and not only an HttpTrigger?
I see here https://learn.microsoft.com/en-us/azure/azure-functions/durable/quickstart-python-vscode a very good example of how to implement it with an HttpTrigger client, and I'd like a Python example of how to do it with a TimerTrigger client, if that's possible.
Any help is appreciated, thanks in advance.
Just focusing on the starter (client) function is enough:
__init__.py
import logging
import azure.functions as func
import azure.durable_functions as df

async def main(mytimer: func.TimerRequest, starter: str) -> None:
    client = df.DurableOrchestrationClient(starter)
    instance_id = await client.start_new("YourOrchestratorName", None, None)
    logging.info(f"Started orchestration with ID = '{instance_id}'.")
function.json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "mytimer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "* * * * * *"
    },
    {
      "name": "starter",
      "type": "orchestrationClient",
      "direction": "in"
    }
  ]
}
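The orchestrator and activity functions can stay exactly as in the quickstart linked in the question; only the client changes from an HTTP trigger to a timer trigger. For completeness, a minimal orchestrator sketch in the quickstart's style (the "YourOrchestratorName" function folder and the "Hello" activity are assumptions taken from that quickstart pattern):

import azure.durable_functions as df

def orchestrator_function(context: df.DurableOrchestrationContext):
    # Call the "Hello" activity a few times, as in the quickstart sample.
    result1 = yield context.call_activity('Hello', 'Tokyo')
    result2 = yield context.call_activity('Hello', 'Seattle')
    result3 = yield context.call_activity('Hello', 'London')
    return [result1, result2, result3]

main = df.Orchestrator.create(orchestrator_function)

Note that the schedule "* * * * * *" fires every second, which would start a new orchestration every second; a less aggressive NCRONTAB expression such as "0 */5 * * * *" (every five minutes) is usually preferable outside of testing.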
