INVALID_ARGUMENT error for Google Cloud Dataflow - Python

I've got a Python pipeline which takes a file from Cloud Storage, removes some columns, then uploads the result to BigQuery. If I run this locally using the Dataflow runner, everything works as expected, but whenever I try to set it up with the Dataflow UI so I can schedule it, no job gets created and this INVALID_ARGUMENT error gets thrown in the logs (IDs etc. removed):
{
    "insertId": "...",
    "jsonPayload": {
        "url": "https://datapipelines.googleapis.com/v1/projects/.../locations/europe-west1/pipelines/cloudpipeline:run",
        "jobName": "projects/.../locations/europe-west1/jobs/...",
        "@type": "type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished",
        "targetType": "HTTP",
        "status": "INVALID_ARGUMENT"
    },
    "httpRequest": {
        "status": 400
    },
    "resource": {
        "type": "cloud_scheduler_job",
        "labels": {
            "location": "europe-west1",
            "project_id": "...",
            "job_id": "..."
        }
    },
    "timestamp": "2022-11-01T10:45:00.657145402Z",
    "severity": "ERROR",
    "logName": "projects/.../logs/cloudscheduler.googleapis.com%2Fexecutions",
    "receiveTimestamp": "2022-11-01T10:45:00.657145402Z"
}
I can't find anything about this error, and GCP doesn't seem to provide any additional info.
I've tried using GCP logging in the Python code, but the error seems to get thrown before any code is executed. I've also removed all of my optional parameters, so there shouldn't be anything required in the GCP set-up that isn't default.
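For context, a minimal sketch of the kind of pipeline described (the bucket, column, and table names are hypothetical), runnable with the usual Dataflow options supplied on the command line:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run():
    # Runner, region, temp_location, etc. come from the command line.
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("gs://my-bucket/input.csv")  # hypothetical bucket
            | "Parse" >> beam.Map(lambda line: dict(zip(["keep_a", "keep_b", "drop_me"], line.split(","))))
            | "DropColumns" >> beam.Map(lambda row: {k: v for k, v in row.items() if k != "drop_me"})
            | "WriteBQ" >> beam.io.WriteToBigQuery(
                "my-project:my_dataset.my_table",  # hypothetical table
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )

if __name__ == "__main__":
    run()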

Related

Is it possible to send a message out to Log Stream (App Insights Logs) Inside Azure Function App

I feel like it must be possible, but I've yet to find an answer.
I navigate to the Log Stream inside my Function App, then click the drop-down arrow and select Application Insights Logs.
There is a log entry there with an [Information] tag, so I thought maybe I could do something like this in the Azure Function that's running my Python script:
import logging
logging.info("Hello?")
However, I'm not able to get messages to show up in those logs. How do I actually achieve this? If there's a different place where logs created with logging.info() show up, I'd love to know that as well.
host.json file:
{
    "version": "2.0",
    "logging": {
        "applicationInsights": {
            "samplingSettings": {
                "isEnabled": true,
                "excludedTypes": "Request"
            }
        }
    },
    "logLevel": {
        "default": "Information",
        "Host.Results": "Information",
        "Function": "Information",
        "Host.Aggregator": "Information"
    },
    "extensionBundle": {
        "id": "Microsoft.Azure.Functions.ExtensionBundle",
        "version": "[2.*, 3.0.0)"
    },
    "extensions": {
        "queues": {
            "batchSize": 2,
            "maxDequeueCount": 2
        }
    },
    "functionTimeout": "00:45:00"
}
I believe there is no different place where the log info shows up; rather, you need to change the log levels in host.json accordingly to get the different types of logs.
I tried logging information-level messages with this workaround:
In VS Code, I created an Azure Functions project with the Python stack.
I added the line logging.info(f" Calling Activity Function") to the activity function code, like below:
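(The screenshot of the activity function is not reproduced here; based on the default Durable Functions Python activity template, the code would look roughly like this:)

import logging

def main(name: str) -> str:
    logging.info(f" Calling Activity Function")
    return f"Hello {name}!"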
This is the host.json code by default:
"logging": {
"applicationInsights": {
"samplingSettings": {
"isEnabled": true,
"excludedTypes": "Request"
}
}
}
After running this durable function, it logs at the information level.
Please refer to this workaround, where I gave information about logging levels and also about optimizing Application Insights logs for an Azure Python Function.
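For reference, when adjusting levels, the logLevel section is nested inside the logging object of host.json; a minimal sketch, reusing category names that appear in the question:

{
    "version": "2.0",
    "logging": {
        "logLevel": {
            "default": "Information",
            "Function": "Information"
        }
    }
}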

Python Azure Functions, EventGrid Trigger

I am pretty new to Azure and struggling with Python function triggers from Event Grid.
I am using the ready-made templates Azure provides for Python and getting errors.
I will share the files.
__init__.py:
import json
import logging
import azure.functions as func

def main(event: func.EventGridEvent):
    result = json.dumps(
        {
            "id": event.id,
            "data": event.get_json(),
            "topic": event.topic,
            "subject": event.subject,
            "event_type": event.event_type,
        }
    )
    logging.info("Python EventGrid trigger processed an event: %s", result)
function.json
{
    "bindings": [
        {
            "type": "eventGridTrigger",
            "name": "event",
            "direction": "in"
        }
    ],
    "disabled": false,
    "scriptFile": "__init__.py"
}
host.json
{
    "version": "2.0",
    "logging": {
        "applicationInsights": {
            "samplingSettings": {
                "isEnabled": true,
                "excludedTypes": "Request"
            }
        }
    },
    "extensionBundle": {
        "id": "Microsoft.Azure.Functions.ExtensionBundle",
        "version": "[2.*, 3.0.0)"
    }
}
And this is the test event that I sent to Event Grid:
{
    "topic": "/subscriptions/{subscriptionID}/resourcegroups/{resourceGroupName}/providers/Microsoft.EventHub/namespaces/event",
    "subject": "eventhubs/test",
    "eventType": "captureFileCreated",
    "eventTime": "2017-07-14T23:10:27.7689666Z",
    "id": "{id}",
    "data": {
        "fileUrl": "https://test.blob.core.windows.net/debugging/testblob.txt",
        "fileType": "AzureBlockBlob",
        "partitionId": "1",
        "sizeInBytes": 0,
        "eventCount": 0,
        "firstSequenceNumber": -1,
        "lastSequenceNumber": -1,
        "firstEnqueueTime": "0001-01-01T00:00:00",
        "lastEnqueueTime": "0001-01-01T00:00:00"
    },
    "dataVersion": "",
    "metadataVersion": "1"
}
And the error that I am getting is:
fail: Function.{functionName}[3]
Executed 'Functions.{functionName}' (Failed, Id={someID}, Duration=121ms)
Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: Functions.{functionName}
System.InvalidOperationException: Exception binding parameter 'event'
System.NotImplementedException: The method or operation is not implemented.
It is probably a super easy mistake somewhere above, but I couldn't find it.
Thanks in advance!
The above error is quite a common, generic error message. It can appear when your application hits some of the limitations of Event Grid.
If you are reaching Event Grid's limits, refer to this.
Note: Event Grid triggers aren't natively supported in an internal load balancer App Service Environment. The trigger uses an HTTP request that can't reach the function app without a gateway into the virtual network.
If you are using a consumption plan, try a dedicated plan to avoid this issue.
Refer here to reliable event processing to relate this error.
Check the limitations of Event Grid and choose the appropriate plan for your requirements.
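One thing worth checking when testing locally with func start: the Functions host exposes a dedicated Event Grid webhook endpoint and expects an aeg-event-type header, and posting the raw event straight to the function URL is a common cause of binding errors like the one above. A minimal sketch (the function name is a placeholder):

import json
import requests

# The test event from the question, saved to a local file.
with open("test-event.json") as f:
    event = json.load(f)

resp = requests.post(
    "http://localhost:7071/runtime/webhooks/EventGrid?functionName=MyEventGridFunc",
    headers={"aeg-event-type": "Notification"},
    json=event,
)
print(resp.status_code)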
Reference:
azure function fails with function invocation exception

How to access EventGrid trigger Event properties in Azure functions input bindings

I want to pull Cosmos DB documents into my Azure Function based on the contents of the Event Grid events it triggers on (Python worker runtime). Is it possible to do this?
I have the function.json below:
{
    "scriptFile": "__init__.py",
    "bindings": [
        {
            "type": "eventGridTrigger",
            "name": "event",
            "direction": "in"
        },
        {
            "name": "documents",
            "type": "cosmosDB",
            "databaseName": "Metadata",
            "collectionName": "DataElementMappings",
            "sqlQuery": "SELECT * from c where c.id = {subject}",
            "connectionStringSetting": "MyCosmosDBConnectionString",
            "direction": "in"
        }
    ]
}
I want to use properties of the event in the query for the Cosmos DB input binding. I tried it with subject here. It fails with:
[6/4/2020 5:34:45 PM] System.Private.CoreLib: Exception while executing function: Functions.orch_taskchecker. System.Private.CoreLib: The given key 'subject' was not present in the dictionary.
So I am not using the right key name. The only key I got to work is {data}, but unfortunately I was not able to access any properties inside the event data using the {data.property} syntax given in the docs. I would expect all of the Event Grid schema keys to be available in other bindings, since an event is a JSON payload, but I got none of them to work, e.g. eventTime, event_type, topic, etc.
Since I saw an example of the same idea for storage queues that used, for example, {Queue.id}, I tried things like {Event.subject} and {EventGridEvent.subject}, all to no avail.
Nowhere in the docs can I find samples of Event Grid trigger data being used in other bindings. Can this be achieved?
For an EventGridTrigger (C# script), a custom data object of the EventGridEvent class can be used for input bindings.
The following example shows a blob input binding for the event's storage account:
{
    "bindings": [
        {
            "type": "eventGridTrigger",
            "name": "eventGridEvent",
            "direction": "in"
        },
        {
            "type": "blob",
            "name": "body",
            "path": "{data.url}",
            "connection": "rk2018ebstg_STORAGE",
            "direction": "in"
        }
    ],
    "disabled": false
}
Note that only properties from the data object can be referenced.
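Although the example above is C# script, function.json is language agnostic; a Python handler for the same pair of bindings might look like this sketch (binding names taken from the function.json above):

import logging
import azure.functions as func

def main(eventGridEvent: func.EventGridEvent, body: func.InputStream):
    # "body" is the blob that {data.url} in the event payload resolved to.
    logging.info("Event %s resolved to a blob of %s bytes", eventGridEvent.id, body.length)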
There is the event.get_json() method that you can use to access the data field in the Python code; I did not look at the bindings.

Azure Functions (Python) blob output binding. How to set the name when the name is only part of the input message

I have an Azure Functions (Python 3) function that takes a message from a Service Bus queue and creates a blob in a container as a result.
The function trigger is the Service Bus message. This message is a JSON object with several properties, one of which is the blob name.
The docs suggest something like this in the bindings:
{
    "name": "outputblob",
    "type": "blob",
    "path": "samples-workitems/{queueTrigger}-Copy",
    "connection": "MyStorageConnectionAppSetting",
    "direction": "out"
}
But this suggests that the triggering message contains just the blob name. I cannot make the message solely the blob name, as I require the other attributes in the message to determine what to do / what data to put in the blob.
Is there any way to use the output bindings that will resolve this for me?
Thanks.
Yes, this can be done. You can set the input and output binding paths with JSON values from the trigger message. Below is my function.json. The Service Bus trigger gets the input blob name and output blob name, then the function writes the input blob to the output blob. You can also set the container name this way.
{
    "scriptFile": "__init__.py",
    "bindings": [
        {
            "name": "msg",
            "type": "serviceBusTrigger",
            "direction": "in",
            "queueName": "myqueue",
            "connection": "servicebuscon"
        },
        {
            "name": "inputblob",
            "type": "blob",
            "path": "inputcontainer/{blobname}",
            "connection": "AzureWebJobsStorage",
            "direction": "in"
        },
        {
            "name": "outputblob",
            "type": "blob",
            "path": "outputcontainer/{outblobname}",
            "connection": "AzureWebJobsStorage",
            "direction": "out"
        }
    ]
}
And below is the function code.
import json
import logging
import azure.functions as func

def main(msg: func.ServiceBusMessage, inputblob: func.InputStream, outputblob: func.Out[bytes]):
    # The blob bindings are resolved from {blobname}/{outblobname} in the message.
    logging.info('Python ServiceBus queue trigger processed message: %s',
                 msg.get_body().decode('utf-8'))
    jsonData = json.loads(inputblob.read())
    logging.info(jsonData)
    outputblob.set(str(jsonData))
And I set the Service Bus message as shown below.
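A sketch of sending such a message with the azure-servicebus SDK; the connection string and blob names are placeholders, and the JSON keys must match the {blobname} and {outblobname} tokens in function.json:

import json
from azure.servicebus import ServiceBusClient, ServiceBusMessage

conn_str = "<service-bus-connection-string>"  # placeholder
payload = json.dumps({"blobname": "test.json", "outblobname": "testout.json"})

with ServiceBusClient.from_connection_string(conn_str) as client:
    with client.get_queue_sender("myqueue") as sender:
        sender.send_messages(ServiceBusMessage(payload))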
Here is the result: the input blob's JSON data is shown in the console, and when I check the container, the output blob has been created.

How do I get the name of the input blob that triggered my Azure Function with Python

I have an Azure Function which is triggered by a file being put into blob storage, and I was wondering how (if possible) to get the name of the blob (file) which triggered the function. I have tried:
fileObject=os.environ['inputBlob']
message = "Python script processed input blob'{0}'".format(fileObject.fileName)
and
fileObject=os.environ['inputBlob']
message = "Python script processed input blob'{0}'".format(fileObject.name)
but neither of these worked; they both resulted in errors. Can I get some help with this or some suggestions?
Thanks
The blob name can be captured via function.json and provided as binding data. See the {filename} token below.
function.json is language agnostic and works in all languages.
See the documentation at https://learn.microsoft.com/en-us/azure/azure-functions/functions-triggers-bindings for details.
{
    "bindings": [
        {
            "name": "image",
            "type": "blobTrigger",
            "path": "sample-images/{filename}",
            "direction": "in",
            "connection": "MyStorageConnection"
        },
        {
            "name": "imageSmall",
            "type": "blob",
            "path": "sample-images-sm/{filename}",
            "direction": "out",
            "connection": "MyStorageConnection"
        }
    ]
}
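Since function.json is language agnostic, the Python side only needs parameters whose names match the bindings; a sketch (any actual resizing step is omitted):

import azure.functions as func

def main(image: func.InputStream, imageSmall: func.Out[bytes]):
    # {filename} from the trigger path is reused in the output path, so this
    # writes a blob with the same name into sample-images-sm.
    imageSmall.set(image.read())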
If you want to get the name of the file that triggered your function, you can do that as follows:
Use {name} in function.json:
{
    "bindings": [
        {
            "name": "myblob",
            "type": "blobTrigger",
            "path": "MyBlobPath/{name}",
            "direction": "in",
            "connection": "MyStorageConnection"
        }
    ]
}
The function will be triggered by changes in your blob storage.
Get the name of the file that triggered the function in Python (__init__.py):

import azure.functions as func

def main(myblob: func.InputStream):
    filename = myblob.name  # e.g. "MyBlobPath/myfile.txt"

This will give you the name of the file that triggered your function.
There is no information about which trigger you used in your description. But fortunately, there is a sample project, yokawasa/azure-functions-python-samples, on GitHub for Azure Functions using Python, which includes many samples using different triggers, such as the queue trigger or blob trigger. I think it's very helpful for you now, and you can refer to these samples to write your own one to satisfy your needs.
Hope it helps.
Getting the name of the inputBlob is not currently possible with Python Azure Functions. There are open issues about it in the azure-webjobs-sdk and azure-webjobs-sdk-script repos on GitHub:
https://github.com/Azure/azure-webjobs-sdk/issues/1090
https://github.com/Azure/azure-webjobs-sdk-script/issues/1339
Unfortunately, it's still not possible.
In Python, you can do:

import azure.functions as func
import os

def main(blobin: func.InputStream):
    # blobin.name is "container/blobname", so basename yields just the file name.
    filename = os.path.basename(blobin.name)
