I have a Google Cloud Function that is working, and I am trying to call it from an Airflow DAG.
What I have tried so far is to use the SimpleHttpOperator:
MY_TASK_NAME = SimpleHttpOperator(
    task_id="MY_TASK_NAME",
    method='POST',
    http_conn_id='http_default',
    endpoint='https://us-central1-myprojectname.cloudfunctions.net/MyFunctionName',
    data=({"schema": schema, "table": table}),
    headers={"Content-Type": "application/json"},
    xcom_push=False
)
but digging into the logs, it says it cannot find the resource:
{base_task_runner.py:98} INFO - Subtask: The requested URL /https://us-central1-myprojectname.cloudfunctions.net/MyFunctionName was not found on this server. That’s all we know.
Also, I noticed that it actually posts to https://www.google.com/ plus the URL I gave:
Sending 'POST' to url: https://www.google.com/https://us-central1-myprojectname.cloudfunctions.net/MyFunctionName
What is the proper way to call the function?
Thanks
This is because you are using http_conn_id='http_default'.
If you look at the http_default connection in the Airflow UI, its Host field is set to http://www.google.com/.
Either create a new connection with the HTTP connection type, or modify the http_default connection and change the host to https://us-central1-myprojectname.cloudfunctions.net/
Then update the endpoint field in your task to:
MY_TASK_NAME = SimpleHttpOperator(
    task_id="MY_TASK_NAME",
    method='POST',
    http_conn_id='http_default',
    endpoint='MyFunctionName',
    data=({"schema": schema, "table": table}),
    headers={"Content-Type": "application/json"},
    xcom_push=False
)
Edit: Added / at the end of URLs
As @kaxil noted, you first need to change the HTTP connection. You then need to send the correct authentication for invoking the Cloud Function. The link below has a step-by-step guide for doing this by subclassing SimpleHttpOperator:
https://medium.com/google-cloud/calling-cloud-composer-to-cloud-functions-and-back-again-securely-8e65d783acce
As a side note, Google should make this process much clearer. It is perfectly reasonable to want to trigger a Google Cloud Function (GCF) from Google Cloud Composer. The documentation on sending an HTTP trigger to a GCF covers Cloud Scheduler, Cloud Tasks, Cloud Pub/Sub, and a host of others, but not Cloud Composer.
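To illustrate the idea, here is a minimal, hedged sketch of such a subclass (this is not the linked guide's exact code: the class name GCFHttpOperator, the audience parameter, and the payload values are my own). It assumes Airflow 1.10.x, the google-auth library, and environment credentials that can mint identity tokens; it fetches a Google-signed OIDC token for the function URL and attaches it as a Bearer header before delegating to SimpleHttpOperator:
import json

import google.auth.transport.requests
import google.oauth2.id_token
from airflow.operators.http_operator import SimpleHttpOperator


class GCFHttpOperator(SimpleHttpOperator):
    """SimpleHttpOperator that adds an OIDC Bearer token for an HTTP-triggered Cloud Function (illustrative)."""

    def __init__(self, audience, *args, **kwargs):
        super(GCFHttpOperator, self).__init__(*args, **kwargs)
        self.audience = audience  # the full URL of the Cloud Function being invoked

    def execute(self, context):
        # Fetch a Google-signed identity token whose audience is the function URL.
        auth_request = google.auth.transport.requests.Request()
        token = google.oauth2.id_token.fetch_id_token(auth_request, self.audience)
        self.headers = dict(self.headers or {})
        self.headers["Authorization"] = "Bearer {}".format(token)
        return super(GCFHttpOperator, self).execute(context)


call_my_function = GCFHttpOperator(
    task_id="call_my_function",
    http_conn_id="http_default",  # host set to https://us-central1-myprojectname.cloudfunctions.net/
    endpoint="MyFunctionName",
    method="POST",
    audience="https://us-central1-myprojectname.cloudfunctions.net/MyFunctionName",
    data=json.dumps({"schema": "my_schema", "table": "my_table"}),
    headers={"Content-Type": "application/json"},
)
The linked article covers the same flow in more detail; treat the snippet above as a starting point rather than a drop-in solution.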
Related
I'm having a very hard time finding out how to correctly configure SQS in boto3 so that I can send messages to my SQS queue. It looks like there is some confusion around boto3 and legacy endpoints, but I'm getting the error message The address 'https://us-west-2.queue.amazonaws.com/xxxx/my-name' is not valid for this endpoint. for every permutation of the config I can imagine. Here's the code.
# Tried both of these
sqs_queue_url = 'https://sqs.us-west-2.amazonaws.com/xxxx/my-queue'
sqs_queue_url = 'https://us-west-2.queue.amazonaws.com/xxxx/my-queue'
# Tried both of these
sqs = boto3.client("sqs", endpoint_url="https://sqs.us-west-2.amazonaws.com")
sqs = boto3.client("sqs")
# _endpoint updates
logger.info("sqs endpoint: %s", sqs._endpoint)
# Keeps failing
sqs.send_message(QueueUrl=sqs_queue_url, MessageBody=message_json)
I'm hoping this is a silly mistake. What config am I missing?
From the docs, the AWS CLI and the SDK for Python use legacy endpoints:
If you use the AWS CLI or SDK for Python, you can use the following legacy endpoints.
Also, when you set endpoint_url, you need to include https://:
sqs = boto3.client("sqs", endpoint_url="https://us-west-2.queue.amazonaws.com")
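For completeness, here is a minimal end-to-end sketch under the same assumptions (the account ID, queue name, and payload below are placeholders, not values from the question):
import json

import boto3

# Point the client at the legacy endpoint that matches the queue URL.
sqs = boto3.client(
    "sqs",
    region_name="us-west-2",
    endpoint_url="https://us-west-2.queue.amazonaws.com",
)

sqs_queue_url = "https://us-west-2.queue.amazonaws.com/xxxx/my-queue"
message_json = json.dumps({"example": "payload"})

# Send a single message and print its ID on success.
response = sqs.send_message(QueueUrl=sqs_queue_url, MessageBody=message_json)
print(response["MessageId"])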
I have created and run DAGs on a google-cloud-composer environment (dlkpipelinesv1: composer-1.13.0-airflow-1.10.12). I am able to trigger these DAGs manually and with the scheduler, but I am stuck when it comes to triggering them via cloud functions that detect changes in a google-cloud-storage bucket.
Note that I had another GC-Composer environment (pipelines: composer-1.7.5-airflow-1.10.2) that used those same Google Cloud Functions to trigger the relevant DAGs, and it was working.
I followed this guide to create the functions that trigger the DAGs. I retrieved the following variables:
PROJECT_ID = <project_id>
CLIENT_ID = <client_id_retrieved_by_running_the_code_in_the_guide_within_my_gcp_console>
WEBSERVER_ID = <airflow_webserver_id>
DAG_NAME = <dag_to_trigger>
WEBSERVER_URL = f"https://{WEBSERVER_ID}.appspot.com/api/experimental/dags/{DAG_NAME}/dag_runs"
def file_listener(event, context):
    """Entry point of the cloud function: Triggered by a change to a Cloud Storage bucket.

    Args:
        event (dict): Event payload.
        context (google.cloud.functions.Context): Metadata for the event.
    """
    logging.info("Running the file listener process")
    logging.info(f"event : {event}")
    logging.info(f"context : {context}")
    file = event
    if file["size"] == "0" or "DTM_DATALAKE_AUDIT_COMPTAGE" not in file["name"] or ".filepart" in file["name"].lower():
        logging.info("no matching file")
        exit(0)
    logging.info(f"File listener detected the presence of : {file['name']}.")
    # id_token = authorize_iap()
    # make_iap_request({"file_name": file["name"]}, id_token)
    make_iap_request(url=WEBSERVER_URL, client_id=CLIENT_ID, method="POST")

def make_iap_request(url, client_id, method="GET", **kwargs):
    """Makes a request to an application protected by Identity-Aware Proxy.

    Args:
        url: The Identity-Aware Proxy-protected URL to fetch.
        client_id: The client ID used by Identity-Aware Proxy.
        method: The request method to use
                ('GET', 'OPTIONS', 'HEAD', 'POST', 'PUT', 'PATCH', 'DELETE')
        **kwargs: Any of the parameters defined for the request function:
                  https://github.com/requests/requests/blob/master/requests/api.py
                  If no timeout is provided, it is set to 90 by default.

    Returns:
        The page body, or raises an exception if the page couldn't be retrieved.
    """
    # Set the default timeout, if missing
    if "timeout" not in kwargs:
        kwargs["timeout"] = 90
    # Obtain an OpenID Connect (OIDC) token from the metadata server or using a service account.
    open_id_connect_token = id_token.fetch_id_token(Request(), client_id)
    logging.info(f"Retrieved open id connect (bearer) token {open_id_connect_token}")
    # Fetch the Identity-Aware Proxy-protected URL, including an authorization header containing
    # "Bearer " followed by a Google-issued OpenID Connect token for the service account.
    resp = requests.request(method, url, headers={"Authorization": f"Bearer {open_id_connect_token}"}, **kwargs)
    if resp.status_code == 403:
        raise Exception("Service account does not have permission to access the IAP-protected application.")
    elif resp.status_code != 200:
        raise Exception(f"Bad response from application: {resp.status_code} / {resp.headers} / {resp.text}")
    else:
        logging.info(f"Response status - {resp.status_code}")
        return resp.json()  # call the method; bare resp.json would return the method object itself
This is the code that runs in the Cloud Function.
I have checked the environment details in dlkpipelinesv1 and pipelines respectively, using this code:
import google.auth
import google.auth.transport.requests

credentials, _ = google.auth.default(
    scopes=['https://www.googleapis.com/auth/cloud-platform'])
authed_session = google.auth.transport.requests.AuthorizedSession(
    credentials)
# project_id = 'YOUR_PROJECT_ID'
# location = 'us-central1'
# composer_environment = 'YOUR_COMPOSER_ENVIRONMENT_NAME'
environment_url = (
    'https://composer.googleapis.com/v1beta1/projects/{}/locations/{}'
    '/environments/{}').format(project_id, location, composer_environment)
composer_response = authed_session.request('GET', environment_url)
environment_data = composer_response.json()
Both environments run with the same service accounts, i.e. the same IAM roles, although I have noticed the following differences in their details:
In the old environment :
"airflowUri": "https://p5<hidden_value>-tp.appspot.com",
"privateEnvironmentConfig": { "privateClusterConfig": {} },
in the new environment:
"airflowUri": "https://da<hidden_value>-tp.appspot.com",
"privateEnvironmentConfig": {
"privateClusterConfig": {},
"webServerIpv4CidrBlock": "<hidden_value>",
"cloudSqlIpv4CidrBlock": "<hidden_value>"
}
The service account that I use to make the POST request has the following roles:
Cloud Functions Service Agent
Composer Administrator
Composer User
Service Account Token Creator
Service Account User
The service account that runs my Composer environment has the following roles:
BigQuery Admin
Composer Worker
Service Account Token Creator
Storage Object Admin
But I am still receiving a 403 - Forbidden in the Log Explorer when the POST request is made to the Airflow API.
EDIT 2020-11-16:
I've updated to the latest make_iap_request code.
I tinkered with IAP within the Security service, but I cannot find the webserver that will accept HTTP POST requests from my cloud functions. I added the service account to the default and CRM IAP resources just in case, but I still get this error:
Exception: Service account does not have permission to access the IAP-protected application.
The main question is: which IAP is at stake here, and how do I add my service account as a user of this IAP?
What am I missing?
There is a configuration parameter that causes ALL requests to the API to be denied...
In the documentation, it is mentioned that we need to override the following Airflow configuration:
[api]
auth_backend = airflow.api.auth.backend.deny_all
into
[api]
auth_backend = airflow.api.auth.backend.default
This detail is really important to know, and it is not mentioned in Google's documentation...
Useful links:
Triggering DAGS (workflows) with GCS
make_iap_request.py repository
The code that throws the 403 is the way it used to work; there was a breaking change in the middle of 2020. Instead of using requests to make an HTTP call for the token, you should use Google's OAuth2 library:
from google.oauth2 import id_token
from google.auth.transport.requests import Request
open_id_connect_token = id_token.fetch_id_token(Request(), client_id)
see this example
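As a hedged usage sketch tying this back to the question's code (the CLIENT_ID, webserver ID, DAG name, and conf payload below are placeholders), the token can then be sent to the Composer webserver's experimental API:
import requests
from google.auth.transport.requests import Request
from google.oauth2 import id_token

CLIENT_ID = "<ALPHANUMERIC>.apps.googleusercontent.com"
WEBSERVER_URL = "https://<WEBSERVER_ID>.appspot.com/api/experimental/dags/<DAG_NAME>/dag_runs"

# Fetch a Google-signed OIDC token for the IAP client ID, then trigger the DAG.
open_id_connect_token = id_token.fetch_id_token(Request(), CLIENT_ID)
resp = requests.post(
    WEBSERVER_URL,
    headers={"Authorization": f"Bearer {open_id_connect_token}"},
    json={"conf": {"file_name": "example.csv"}},  # optional DAG run configuration
    timeout=90,
)
resp.raise_for_status()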
I followed the steps in Triggering DAGs and it worked in my environment; please see my recommendations below.
It is a good start that the Composer environment is up and running. Through the process you will only need to upload the new DAG (trigger_response_dag.py) and get the client ID (it ends with .apps.googleusercontent.com), either with a Python script or from the login page the first time you open the Airflow UI.
On the Cloud Functions side, I noticed you have a combination of instructions for Node.js and for Python; for example, USER_AGENT is only for Node.js, and the make_iap_request routine is only for Python. I hope the following points help to resolve your problem:
Service Account (SA). The Node.js code uses the default service account ${projectId}@appspot.gserviceaccount.com, whose default role is Editor, meaning that it has wide access to GCP services, including Cloud Composer. In Python I think the authentication is managed somehow by client_id, since a token is retrieved with it. Please ensure that the SA has this Editor role, and don't forget to assign serviceAccountTokenCreator as specified in the guide.
I used the Node.js 8 runtime, and the user agent you are concerned about should be 'gcf-event-trigger', as it is hard-coded: USER_AGENT = 'gcf-event-trigger'. In Python, it does not seem necessary.
By default, in the GCS trigger, the GCS Event Type is set to Archive; you need to change it to Finalize/Create. If it is set to Archive, the trigger won't fire when you upload objects, and the DAG won't be started.
If you think your cloud function is correctly configured and an error persists, you can find its cause in your cloud function's LOGS tab in the Console. It can give you more details.
Basically, from the guide, I only had to change the following values in Node.js:
// The project that holds your function. Replace <YOUR-PROJECT-ID>
const PROJECT_ID = '<YOUR-PROJECT-ID>';
// Navigate to your webserver's login page and get this from the URL
const CLIENT_ID = '<ALPHANUMERIC>.apps.googleusercontent.com';
// This should be part of your webserver's URL in the Env's detail page: {tenant-project-id}.appspot.com.
const WEBSERVER_ID = 'v90eaaaa11113fp-tp';
// The name of the DAG you wish to trigger. It's DAG's name in the script trigger_response_dag.py you uploaded to your Env.
const DAG_NAME = 'composer_sample_trigger_response_dag';
For Python I only changed these parameters:
client_id = '<ALPHANUMERIC>.apps.googleusercontent.com'
# This should be part of your webserver's URL:
# {tenant-project-id}.appspot.com
webserver_id = 'v90eaaaa11113fp-tp'
# Change dag_name only if you are not using the example
dag_name = 'composer_sample_trigger_response_dag'
I'm trying to call a Google Cloud function from within Python using the following:
import requests
url = "MY_CLOUD_FUNCTION_URL"
data = {'name': 'example'}
response = requests.post(url, data = data)
but I get back the error: Your client does not have permission to get URL MY_CLOUD_FUNCTION from this server
Does anyone know how I can avoid this error? I am assuming I should be passing credentials as part of the request somehow?
Also note that if I instead try to call the function via gcloud from the command line as below, then it works, but I want to do this from within Python:
gcloud functions call MY_CLOUD_FUNCTION --data '{"name": "example"}'
Any help would be really appreciated!
Given a working Cloud Function in HTTP mode which requires authentication in order to be triggered, you need to generate an authentication token and insert it in the header as shown below:
import os
import json
import requests
import google.oauth2.id_token
import google.auth.transport.requests
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = './my-service-account.json'
request = google.auth.transport.requests.Request()
audience = 'https://mylocation-myprojectname.cloudfunctions.net/MyFunctionName'
TOKEN = google.oauth2.id_token.fetch_id_token(request, audience)
r = requests.post(
    'https://mylocation-myprojectname.cloudfunctions.net/MyFunctionName',
    headers={'Authorization': f"Bearer {TOKEN}", "Content-Type": "application/json"},
    data=json.dumps({"key": "value"})  # possible request parameters
)
r.status_code, r.reason
You have a few options here: either open the function to the public so that anyone can call it, or take the more secure route, which requires a few more steps. I will cover the second option, since it's the one I would suggest for security reasons, but if you are satisfied with simply opening the function to the public (which is especially useful if you are trying to create a public endpoint after all), see this documentation.
If you want to limit who can invoke your GCF however, you would have to perform a few more steps.
Create a service account and give it the Cloud Functions Invoker role (if you simply want to restrict its permissions to invoking the GCF only).
After you assign the Service Account a role(s), the next page will give you the option to create a key
After creating the Service Account Key and downloading it as credentials.json, the next step is straightforward. You would simply populate the environment variable GOOGLE_APPLICATION_CREDENTIALS with the path to the credentials.json file.
Once these steps are done, you can simply invoke the GCF as you did before, only this time, it will invoke it as the service account that you created, which contained all the permissions necessary to invoke a GCF.
This may be obvious to many, but to add to Marco's answer (I can't comment yet):
Make sure to install the google-auth package, not the google package. More details in the documentation and the requirements.txt for the code on GitHub.
I am a beginner to Amazon Web Services.
I have the below Lambda Python function:
import sys
import logging
import pymysql
import json

rds_host = ".amazonaws.com"
name = "name"
password = "123"
db_name = "db"
port = 3306


def save_events(event):
    result = []
    conn = pymysql.connect(host=rds_host, user=name, passwd=password, db=db_name,
                           connect_timeout=30)
    with conn.cursor(pymysql.cursors.DictCursor) as cur:
        cur.execute("select * from bodyPart")
        result = cur.fetchall()
    print("Data from RDS...")
    print(result)
    bodyparts = json.dumps(result)
    bodyParts = bodyparts.replace("\"", "'")
    return bodyParts  # return the payload instead of relying on a global


def lambda_handler(event, context):
    return save_events(event)
Using the above function I am sending JSON to the client through API Gateway. Now suppose the user selects an item from the list and sends it back as JSON: where will I receive that HTTP request, and how should I process it?
I just want to add some additional information to @Harsh Manvar's answer.
The easiest way, I think, is to use
api-gateway-proxy-integration-lambda
Currently API Gateway supports AWS Lambda very well; you can pass the request body (JSON) to your Lambda function via event.body.
I use it every day in my hobby project (a Slack command bot; it is harder there because you need to map from application/x-www-form-urlencoded to JSON through a mapping template).
For you I think it is simpler because you are using only JSON as request and response. The key is that you should set the Integration type to Lambda Function, as sketched below.
You can take some quick tutorials on Medium.com for more detail; I only link the docs from Amazon.
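As a hedged sketch of what the Lambda side can look like with proxy integration (the "item" field name is hypothetical, not something from the question):
import json


def lambda_handler(event, context):
    # With proxy integration the raw request body arrives as a JSON string in event["body"].
    body = json.loads(event.get("body") or "{}")
    selected_item = body.get("item")  # hypothetical field sent back by the client

    # ... process the selected item here, e.g. look it up in your database ...

    # Proxy integration expects this response shape (statusCode / headers / body).
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"received": selected_item}),
    }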
@mohith: Hi man, I just put together a simple approach for you.
First you need to create an API (see the docs above) and then link it to your Lambda function. Because you only use JSON, you need to check the option named Use Lambda Proxy integration.
Then you need to deploy it!
Then in your function you can handle your code; in my case, I return the entire event that is passed to my function.
Finally you can post to your endpoint; I used Postman in my case.
I hope you get my idea: once you have successfully deployed your API, you can do anything with it in your front end.
I suggest you research more about CloudWatch. When you work with API Gateway, Lambda, and friends, it is a Swiss army knife; you cannot live without it, and it makes tracing and debugging your code very easy.
Please do not hesitate to ask me anything.
You can use the AWS service called API Gateway; it will give you an endpoint for HTTP API requests.
This API Gateway connects to your Lambda, and you can pass the HTTP request through to the Lambda.
Here is info about creating a REST API on Lambda; you can check it out: https://docs.aws.amazon.com/apigateway/latest/developerguide/how-to-create-api.html
AWS also provides examples for Lambda GET and POST; you just have to edit the code and it will automatically create the API Gateway. You can check it as a reference.
From the Lambda console > Create function > choose AWS Serverless Application Repository > in the search bar type "get" and search > api-lambda-dynamodb > it will take a value from the user and process it in Lambda.
Here is a link where you can check the examples directly: https://console.aws.amazon.com/lambda/home?region=us-east-1#/create?tab=serverlessApps
I created 2 applications in my Azure directory, 1 for my API Server and one for my API client. I am using the Python ADAL Library and can successfully obtain a token using the following code:
tenant_id = "abc123-abc123-abc123"
context = adal.AuthenticationContext('https://login.microsoftonline.com/' + tenant_id)
token = context.acquire_token_with_username_password(
    'https://myapiserver.azurewebsites.net/',
    'myuser',
    'mypassword',
    'my_apiclient_client_id'
)
I then try to send a request to my API app using the following method but keep getting 'unauthorized':
at = token['accessToken']
id_token = "Bearer {0}".format(at)
response = requests.get('https://myapiserver.azurewebsites.net/', headers={"Authorization": id_token})
I am able to successfully login using myuser/mypass from the loginurl. I have also given the client app access to the server app in Azure AD.
Although the question was posted a long time ago, I'll try to provide an answer. I stumbled across the question because we had the exact same problem here. We could successfully obtain a token with the adal library, but then we were not able to access the resource we had obtained the token for.
To make things worse, we set up a simple console app in .NET, used the exact same parameters, and it was working. We could also copy the token obtained through the .NET app and use it in our Python request, and it worked (this one is kind of obvious, but it made us confident that the problem was not related to how we assembled the request).
The source of the problem was, in the end, in the oauth2_client of the adal Python package. When I compared the actual HTTP requests sent by the .NET and the Python apps, a subtle difference was that the Python app sent a POST request explicitly asking for api-version=1.0.
POST https://login.microsoftonline.com/common//oauth2/token?api-version=1.0
Once I changed the following line in oauth2_client.py in the adal library, I could access my resource.
Changed
return urlparse('{}?{}'.format(self._token_endpoint, urlencode(parameters)))
in the method _create_token_url, to
return urlparse(self._token_endpoint)
We are working on a pull request to patch the library on GitHub.
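Until such a patch lands, a possible stopgap (a hedged sketch that relies on adal's private internals, so pin your adal version if you try anything like it) is to monkey-patch the method from your own code rather than editing site-packages:
from urllib.parse import urlparse

from adal import oauth2_client


def _create_token_url_without_api_version(self):
    # Return the bare token endpoint instead of appending "?api-version=1.0".
    return urlparse(self._token_endpoint)


# Override the private method mentioned above; this may break on other adal versions.
oauth2_client.OAuth2Client._create_token_url = _create_token_url_without_api_version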
The current release of the Azure Python SDK supports authentication with a service principal. It does not support authentication using the ADAL library yet; maybe it will in future releases.
See https://azure-sdk-for-python.readthedocs.io/en/latest/resourcemanagement.html#authentication for details.
See also Azure Active Directory Authentication Libraries for the platforms ADAL is available on.
@Derek,
Could you set your Issuer URL on the Azure Portal? If I set the wrong Issuer URL, I get the same error as you. It seems that your code is right.
Based on my experience, you need to add your application into Azure AD and get a client ID (I am sure you have done this). Then you can get the tenant ID and enter it into the Issuer URL textbox on the Azure portal.
NOTE:
On the old portal (manage.windowsazure.com), in the bottom command bar, click View Endpoints, and then copy the Federation Metadata Document URL and download that document or navigate to it in a browser.
Within the root EntityDescriptor element, there should be an entityID attribute of the form https://sts.windows.net/ followed by a GUID specific to your tenant (called a "tenant ID"). Copy this value - it will serve as your Issuer URL. You will configure your application to use this later.
My demo is as following:
import adal
import requests
TenantURL='https://login.microsoftonline.com/*******'
context = adal.AuthenticationContext(TenantURL)
RESOURCE = 'http://wi****.azurewebsites.net'
ClientID='****'
ClientSect='7****'
token_response = context.acquire_token_with_client_credentials(
    RESOURCE,
    ClientID,
    ClientSect
)
access_token = token_response.get('accessToken')
print(access_token)
id_token = "Bearer {0}".format(access_token)
response = requests.get(RESOURCE, headers={"Authorization": id_token})
print(response)
Please try to modify it. If there are any updates, please let me know.