Google Cloud Scheduler cannot find Python Script Cloud Run jobname - python

When I run the script as a Cloud Function, Cloud Scheduler can find the job name, but when I run it as a Cloud Run job it can't find the job name.
What exactly am I doing wrong? Here is the Cloud Scheduler log entry:
{
httpRequest: {1}
insertId: "8vr1cwfi77fef"
jsonPayload: {
  @type: "type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished"
  jobName: "projects/api-data-pod/locations/us-central1/jobs/mouse-recording"
  status: "NOT_FOUND"
  targetType: "HTTP"
  url: "https://us-central1-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/api-data-pod/jobs/mouseflowrecording:run"
}
logName: "projects/api-data-pod/logs/cloudscheduler.googleapis.com%2Fexecutions"
receiveTimestamp: "2022-08-02T11:54:04.664184360Z"
resource: {
  labels: {3}
  type: "cloud_scheduler_job"
}
severity: "ERROR"
timestamp: "2022-08-02T11:54:04.664184360Z"
}
import requests
import json
from io import StringIO
import pandas as pd
import pandas_gbq
from google.cloud import bigquery
from requests.auth import HTTPBasicAuth
#import schedule
#import time

def rec(request):
    r = requests.get("https://api-eu.mouseflow.com/websites/e768ed54-c09b-48dc-bf49-beda12697013/recordings",
                     auth=HTTPBasicAuth("*************", "********************"))
    if r.status_code == 200:
        parsed = json.loads(r.text)
        print(json.dumps(parsed['recordings'], indent=4, sort_keys=True))
        df = pd.DataFrame.from_records(parsed['recordings'])
        #print(df.dtypes)
        #df.to_csv('mousedata3.csv')#
        try:
            temp_csv_string = df.to_csv(sep=";", index=False)
            temp_csv_string_IO = StringIO(temp_csv_string)
            # create new dataframe from string variable
            new_df = pd.read_csv(temp_csv_string_IO, sep=";")
            # this new df can be uploaded to BQ with no issues
            new_df.to_gbq('Mouseflow.Mouseflow_Recording', if_exists='replace', project_id='api-data-pod')
            return f'Successful'
        except Exception as err:
            return f'Upload to BigQuery failed: {err}'
    else:
        return f'API request error occurred: Status code {r.status_code}'

rec(requests.request)
Hi, here is the Python code that is causing the problem.
It works perfectly with Cloud Functions, but for some reason the scheduler can't find the job name for the Cloud Run job.
The only difference from the Cloud Function version is that it doesn't need the call at the end, so rec(requests.request) is not at the end of that code.
I hope this explains things a bit better.
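One thing worth checking (a hedged suggestion, not a confirmed fix): the NOT_FOUND in the log comes from the Cloud Run Admin API URL in the scheduler target, so the region and job name in that URL (here mouseflowrecording in us-central1) must match an existing Cloud Run job exactly. A minimal sketch to list the jobs that API actually sees, assuming the project and region from the log entry above:

import google.auth
from google.auth.transport.requests import AuthorizedSession

# Project and region taken from the log entry above.
PROJECT = "api-data-pod"
REGION = "us-central1"

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
session = AuthorizedSession(credentials)

# List Cloud Run jobs through the v1 namespaces API; the job targeted with
# .../jobs/<name>:run must appear here under exactly that name.
resp = session.get(
    f"https://{REGION}-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/{PROJECT}/jobs"
)
resp.raise_for_status()
for job in resp.json().get("items", []):
    print(job["metadata"]["name"])

A NOT_FOUND from Cloud Scheduler here usually just means the URL path does not match a deployed job (name or region mismatch).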

Related

Google Cloud Function - Python script to get data from Webhook

I hope someone can help me out with my problem.
I have a Google Cloud Function that is HTTP-triggered, and a webhook set up in customer.io.
I need to capture the data sent by the customer.io app; it should trigger the Cloud Function and run the Python script set up within it. I am new to writing Python scripts and using their libraries. The final goal is to write the webhook data into a BigQuery table.
For now, I can see that the trigger is working, since I can see the data sent by the app (using print) in the function logs. I can also check the schema of the data from the textPayload in the logs.
This is the sample data from the textPayload that I want to load into a BigQuery table:
{
  "data": {
    "action_id": 42,
    "campaign_id": 23,
    "customer_id": "user-123",
    "delivery_id": "RAECAAFwnUSneIa0ZXkmq8EdkAM==-",
    "identifiers": {
      "id": "user-123"
    },
    "recipient": "test#example.com",
    "subject": "Thanks for signing up"
  },
  "event_id": "01E2EMRMM6TZ12TF9WGZN0WJaa",
  "metric": "sent",
  "object_type": "email",
  "timestamp": 1669337039
}
And this is the sample Python code I have created in the Cloud Function:
import os

def webhook(request):
    request_json = request.get_json()
    if request.method == 'POST':
        print(request_json)
        return 'success'
    else:
        return 'failed'
So far I have only printed the data from the webhook; what I am expecting is Python code that writes this textPayload data into a BigQuery table.
So, you have set up a Cloud Function that executes some code whenever the webhook posts data to it.
What this Cloud Function needs now is the BigQuery Python client library. Here's an example of how it's used (source):
from google.cloud import bigquery

client = bigquery.Client()

dataset_id = ...   # ID of the target dataset
table_name = ...   # name of the target table
data = ...         # list of rows to insert

# Look up the target table and stream the rows into it.
dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_name)
table = client.get_table(table_ref)
result = client.insert_rows(table, data)
So you could put something like this into your cloud function in order to send your data to a target BigQuery table.
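For instance, here is a minimal sketch of how the webhook function above could write the sample payload into BigQuery. The table ID and the choice of columns are assumptions; adjust them to your own dataset and schema:

from google.cloud import bigquery

client = bigquery.Client()
# Hypothetical fully-qualified table ID: replace with your own project, dataset and table.
TABLE_ID = "your-project.customerio.webhook_events"

def webhook(request):
    request_json = request.get_json(silent=True)
    if request.method != 'POST' or not request_json:
        return 'failed', 400
    data = request_json.get('data', {})
    # Flatten the fields we care about into one row; extend as needed.
    row = {
        'event_id': request_json.get('event_id'),
        'metric': request_json.get('metric'),
        'object_type': request_json.get('object_type'),
        'timestamp': request_json.get('timestamp'),
        'customer_id': data.get('customer_id'),
        'recipient': data.get('recipient'),
        'subject': data.get('subject'),
    }
    # insert_rows_json streams the row into the table and returns a list of errors (empty on success).
    errors = client.insert_rows_json(TABLE_ID, [row])
    if errors:
        return 'BigQuery insert failed: {}'.format(errors), 500
    return 'success'

Note that the table must already exist with matching column names, since insert_rows_json uses the streaming insert API.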

ParamValidationError: Parameter validation failed: Bucket name must match the regex

I'm trying to run a Glue job by calling it from a Lambda function. The Glue job itself runs perfectly fine, but when I trigger it from the Lambda function, I get the error below:
[ERROR] ParamValidationError: Parameter validation failed: Bucket name must match the regex \"^[a-zA-Z0-9.\\-_]{1,255}$\" or be an ARN matching the regex \"^arn:(aws).*:(s3|s3-object-lambda):[a-z\\-0-9]*:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\\-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\\-]{1,63}$\""
There is no issue with my bucket name, as I am able to perform other actions with it, and my Glue job works fine when run standalone.
Any help would be appreciated.
Thanks in advance.
Maybe you are including the s3:// protocol prefix when specifying the bucket name; it is not required.
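For example (a hedged illustration with placeholder bucket and key names, not the asker's actual code), boto3 S3 calls expect a bare bucket name:

import boto3

s3 = boto3.client('s3')

# Fails parameter validation: the Bucket argument must not carry the protocol prefix.
# s3.get_object(Bucket='s3://my-bucket', Key='path/to/file.csv')

# Passes validation: bare bucket name only.
s3.get_object(Bucket='my-bucket', Key='path/to/file.csv')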
I was able to solve it by making a few changes.
My initial code was:
import json
import os
import logging
import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

client = boto3.client('glue')
glueJobName = "MyTestJob"

def lambda_handler(event, context):
    logger.info('## INITIATED BY EVENT: ')
    logger.info(event['detail'])
    response = client.start_job_run(JobName=glueJobName)
    logger.info('## STARTED GLUE JOB: ' + glueJobName)
    logger.info('## GLUE JOB RUN ID: ' + response['JobRunId'])
    return response
Once I removed the logging part (code below), it worked without any error:
from __future__ import print_function
import boto3
import urllib

print('Loading function')
glue = boto3.client('glue')

def lambda_handler(event, context):
    gluejobname = "MyTestJob"
    runId = glue.start_job_run(JobName=gluejobname)
    status = glue.get_job_run(JobName=gluejobname, RunId=runId['JobRunId'])
    print("Job Status : ", status['JobRun']['JobRunState'])
What could be the issue here?
Thanks

How to disable Airflow DAGs with AWS Lambda

I need to disable Airflow DAGs from AWS Lambda or in some other way. Can I use Python code to do this? Thank you in advance.
You can pause/unpause a DAG with the Airflow REST API.
The relevant endpoint is Update a DAG, a PATCH request against your Airflow webserver:
PATCH /api/v1/dags/{dag_id}
with the request body:
{
  "is_paused": true
}
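Since the question mentions AWS Lambda, here is a minimal sketch of calling that endpoint directly from a Lambda handler using only the standard library; the Airflow host, credentials, and DAG ID below are placeholders:

import base64
import json
import urllib.request

AIRFLOW_HOST = "https://your-airflow-webserver"  # placeholder
AIRFLOW_USER = "YOUR_USERNAME"                   # placeholder
AIRFLOW_PASSWORD = "YOUR_PASSWORD"               # placeholder

def lambda_handler(event, context):
    dag_id = event.get("dag_id", "dag_id_example")
    # Basic auth header for the Airflow REST API.
    token = base64.b64encode(f"{AIRFLOW_USER}:{AIRFLOW_PASSWORD}".encode()).decode()
    req = urllib.request.Request(
        url=f"{AIRFLOW_HOST}/api/v1/dags/{dag_id}",
        data=json.dumps({"is_paused": True}).encode(),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())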
You also have the official Airflow Python client that you can use to interact with the API. Example:
import time
import airflow_client.client
from airflow_client.client.api import dag_api
from airflow_client.client.model.dag import DAG
from airflow_client.client.model.error import Error
from pprint import pprint

# Configure the host and HTTP basic authorization: Basic
configuration = airflow_client.client.Configuration(
    host="http://localhost/api/v1",
    username='YOUR_USERNAME',
    password='YOUR_PASSWORD'
)

with airflow_client.client.ApiClient(configuration) as api_client:
    # Create an instance of the API class
    api_instance = dag_api.DAGApi(api_client)
    dag_id = "dag_id_example"  # str | The DAG ID.
    dag = DAG(
        is_paused=True,
    )
    try:
        # Update a DAG
        api_response = api_instance.patch_dag(dag_id, dag)
        pprint(api_response)
    except airflow_client.client.ApiException as e:
        print("Exception when calling DAGApi->patch_dag: %s\n" % e)
You can see the full example in the client doc.

Use iot_v1 in a GCP Cloud Function

I'm attempting to write a GCP Cloud Function in Python that calls the API for creating an IoT device. The initial challenge seems to be getting the appropriate module (specifically iot_v1) loaded within Cloud Functions so that it can make the call.
Example Python code from Google is located at https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/iot/api-client/manager/manager.py. The specific call desired is shown in "create_es_device". Trying to repurpose that into a Cloud Function (code below) errors out with "ImportError: cannot import name 'iot_v1' from 'google.cloud' (unknown location)"
Any thoughts?
import base64
import logging
import json
import datetime
from google.auth import compute_engine
from apiclient import discovery
from google.cloud import iot_v1

def handle_notification(event, context):
    # Triggered from a message on a Cloud Pub/Sub topic.
    # Args:
    #   event (dict): Event payload.
    #   context (google.cloud.functions.Context): Metadata for the event.
    pubsub_message = base64.b64decode(event['data']).decode('utf-8')
    logging.info('New device registration info: {}'.format(pubsub_message))
    certData = json.loads(pubsub_message)['certs']
    deviceID = certData['device-id']
    certKey = certData['certificate']
    projectID = certData['project-id']
    cloudRegion = certData['cloud-region']
    registryID = certData['registry-id']
    newDevice = create_device(projectID, cloudRegion, registryID, deviceID, certKey)
    logging.info('New device: {}'.format(newDevice))

def create_device(project_id, cloud_region, registry_id, device_id, public_key):
    # from https://cloud.google.com/iot/docs/how-tos/devices#api_1
    client = iot_v1.DeviceManagerClient()
    parent = client.registry_path(project_id, cloud_region, registry_id)
    # Note: You can have multiple credentials associated with a device.
    device_template = {
        #'id': device_id,
        'id': 'testing_device',
        'credentials': [{
            'public_key': {
                'format': 'ES256_PEM',
                'key': public_key
            }
        }]
    }
    return client.create_device(parent, device_template)
You need to have the google-cloud-iot package listed in your requirements.txt file.
See https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/iot/api-client/manager/requirements.txt
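For reference, the function's requirements.txt would then contain at least the lines below (version pinning is up to you; google-api-python-client is only needed if you keep the apiclient import):

google-cloud-iot
google-api-python-client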

Azure module on webservice

I am trying to publish a machine learning model as an Azure web service using Python. I am able to deploy the code successfully, but when I try to call it through the URL, it throws an error saying the 'azure' module doesn't exist. The code basically retrieves a TFIDF model from the container (blob) and uses it to predict a new value. The error clearly says the azure package is missing when running on the web service, and I am not sure how to fix it. Here is the code:
For deployment:
from azureml import services
from azure.storage.blob import BlobService
import pandas as pd

#services.publish('7c94eb2d9e4c01cbe7ce1063','f78QWNcOXHt9J+Qt1GMzgdEt+m3NXby9JL`npT7XX8ZAGdRZIX/NZ4lL2CkRkGQ==')
#services.types(res=unicode)
#services.returns(str)
def TechBot(res):
    from azure.storage.blob import BlobService
    from gensim.similarities import SparseMatrixSimilarity, MatrixSimilarity, Similarity
    blob_service = BlobService(account_name='tfidf', account_key='RU4R/NIVPsPOoR0bgiJMtosHJMbK1+AVHG0sJCHT6jIdKPRz3cIMYTsrQ5BBD5SELKHUXgBHNmvsIlhEdqUCzw==')
    blob_service.get_blob_to_path('techbot',"2014.csv","df")
    df = pd.read_csv("df")
    doct = res
To access the URL, I used the Python code from service.azureml.net:
import urllib2
import json
import requests

data = {
    "Inputs": {
        "input1":
        [
            {
                'res': "wifi wnable",
            }
        ],
    },
    "GlobalParameters": {
    }
}

body = str.encode(json.dumps(data))

#proxies = {"http":"http://%s" % proxy}
url = 'http://ussouthcentral.services.azureml.net/workspaces/7c94eb2de26a45399e4c01cbe7ce1063/services/11943e537e0741beb466cd91f738d073/execute?api-version=2.0&format=swagger'
api_key = '8fH9kp67pEt3C6XK9sXDLbyYl5cBNEwYg9VY92xvkxNd+cd2w46sF1ckC3jqrL/m8joV7o3rsTRUydkzRGDYig==' # Replace this with the API key for the web service
headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)}

#proxy_support = urllib2.ProxyHandler(proxies)
#opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler(debuglevel=1))
#urllib2.install_opener(opener)

req = urllib2.Request(url, body, headers)
try:
    response = urllib2.urlopen(req, timeout=60)
    result = response.read()
    print(result)
except urllib2.HTTPError, error:
    print("The request failed with status code: " + str(error.code))
    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(json.loads(error.read()))
The string 'res' will be predicted at the end. As I said, it runs perfectly fine if I run it as-is in Python by calling the azure module; the problem happens when I access the URL.
Any help is appreciated, please let me know if you need more information (I only showcased half of my code).
I tried to reproduce the issue via POSTMAN, and I got the error information below, as you said:
{
  "error": {
    "code": "ModuleExecutionError",
    "message": "Module execution encountered an error.",
    "details": [
      {
        "code": "85",
        "target": "Execute Python Script RRS",
        "message": "Error 0085: The following error occurred during script evaluation, please view the output log for more information:\r\n---------- Start of error message from Python interpreter ----------\r\nCaught exception while executing function: Traceback (most recent call last):\n File \"\\server\\InvokePy.py\", line 120, in executeScript\n outframe = mod.azureml_main(*inframes)\n File \"\\temp\\1280677032.py\", line 1094, in azureml_main\n File \"<ipython-input-15-bd03d199b8d9>\", line 6, in TechBot_2\nImportError: No module named azure\n\r\n\r\n---------- End of error message from Python interpreter ----------"
      }
    ]
  }
}
According to error code 0085 and the message ImportError: No module named azure, I think the issue was caused by importing the Python module azure-storage. There was a similar SO thread, Access Azure blob storage from within an Azure ML experiment, which hit the same issue. I think you can refer to its answer and try to use the HTTP protocol instead of HTTPS in your code to resolve the issue, as in client = BlobService(STORAGE_ACCOUNT, STORAGE_KEY, protocol="http").
Hope it helps. If you have any concerns or updates, please feel free to let me know.
Update: Using HTTP protocol for BlobService
from azureml import services
from azure.storage.blob import BlobService
import pandas as pd

#services.publish('7c94eb2d9e4c01cbe7ce1063','f78QWNcOXHt9J+Qt1GMzgdEt+m3NXby9JL`npT7XX8ZAGdRZIX/NZ4lL2CkRkGQ==')
#services.types(res=unicode)
#services.returns(str)
def TechBot(res):
    from azure.storage.blob import BlobService
    from gensim.similarities import SparseMatrixSimilarity, MatrixSimilarity, Similarity
    # Begin: Update code
    # Using `HTTP` protocol for BlobService
    blob_service = BlobService(account_name='tfidf',
                               account_key='RU4R/NIVPsPOoR0bgiJMtosHJMbK1+AVHG0sJCHT6jIdKPRz3cIMYTsrQ5BBD5SELKHUXgBHNmvsIlhEdqUCzw==',
                               protocol='http')
    # End
    blob_service.get_blob_to_path('techbot',"2014.csv","df")
    df = pd.read_csv("df")
    doct = res
