How to read parquet file from Azure Python function blob input binding? - python

I have a python function with a blob input binding. The blob in question contains a parquet file. Ultimately I want to read the bound blob into a pandas dataframe but I am unsure of the correct way to do this.
I have verified that the binding is correctly set up and I've been able to successfully read a plain text file. I am happy that the integrity of the parquet file is fine as I have been able to read it using the example provided here: https://arrow.apache.org/docs/python/parquet.html#reading-a-parquet-file-from-azure-blob-storage
The following code shows what I am trying to do:
import logging
import io
import azure.functions as func
import pyarrow.parquet as pq
def main(req: func.HttpRequest, inputblob: func.InputStream) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    # Create a bytestream to hold blob content
    byte_stream = io.BytesIO()
    byte_stream.write(inputblob.read())
    df = pq.read_table(source=byte_stream).to_pandas()
I get the following error message:
pyarrow.lib.ArrowIOError: Couldn't deserialize thrift: TProtocolException: Invalid data
The following is my function.json file:
{
"scriptFile": "__init__.py",
"bindings": [
{
"authLevel": "function",
"type": "httpTrigger",
"direction": "in",
"name": "req",
"methods": [
"get",
"post"
]
},
{
"type": "http",
"direction": "out",
"name": "$return"
},
{
"name": "inputblob",
"type": "blob",
"path": "<container>/file.parquet",
"connection": "AzureWebJobsStorage",
"direction": "in"
}
]
}
My host.json file:
{
"version": "2.0",
"functionTimeout": "00:10:00",
"extensionBundle": {
"id": "Microsoft.Azure.Functions.ExtensionBundle",
"version": "[1.*, 2.0.0)"
}
}

I have been working on the same problem, and this solution worked for me.
__init__.py file:
from io import BytesIO
import azure.functions as func
import pandas as pd
def main(blobTrigger: func.InputStream):
    # Read the whole blob into memory as bytes
    blob_bytes = blobTrigger.read()
    # Wrap the bytes in a file-like object that starts at offset 0
    blob_to_read = BytesIO(blob_bytes)
    df = pd.read_parquet(blob_to_read, engine='pyarrow')
    print("Number of rows in the parquet file: " + str(len(df.index)))
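One visible difference between the question's code and the answer above is how the buffer is built: BytesIO(data) starts at offset 0, whereas writing into an empty BytesIO leaves the cursor at the end of the data. Whether the cursor position is what caused the thrift error in the question is an assumption and is not confirmed here. The sketch below only demonstrates the stream behavior with the standard library, and the payload is made-up bytes standing in for the blob content.

```python
import io

# Hypothetical payload standing in for the bytes returned by inputblob.read()
payload = b"PAR1 example bytes PAR1"

# Pattern from the question: write into an empty buffer
written = io.BytesIO()
written.write(payload)
position_after_write = written.tell()  # cursor sits at the end of the data
leftover = written.read()              # nothing is left to read from here

# Rewinding makes the full content readable again
written.seek(0)
after_rewind = written.read()

# Pattern from the answer: construct the buffer directly from the bytes
wrapped = io.BytesIO(payload)
position_after_wrap = wrapped.tell()   # cursor starts at offset 0
from_wrapped = wrapped.read()
```

If you keep the write() pattern, calling seek(0) before handing the buffer to a reader is harmless and removes one possible source of confusion.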

Related

unable to encode outgoing TypedData: unsupported type "<class 'azure_functions_worker.bindings.generic.GenericBinding'>" for Python type "DataFrame"

I'm trying to upload a file to Azure Blob Storage. I access the file via a SAS key, clean it, and then want to write it back. As you can see, I use a blob trigger for this. I can't figure out how to deal with the error below or how to convert the data into a DataFrame (I have already tried converting it to a pandas DataFrame). I have also read the Microsoft docs, but maybe I missed something.
The error I get:
Failure Exception: TypeError: unable to encode outgoing TypedData: unsupported type "<class 'azure_functions_worker.bindings.generic.GenericBinding'>" for Python type "DataFrame"
Basically, I'm reading an Excel file and I want to write it back to another container.
function.json file:
{
"scriptFile": "__init__.py",
"bindings": [
{
"name": "myblob",
"type": "blobTrigger",
"direction": "in",
"path": "input/{name}.xlsx",
"connection": "AzureWebJobsStorage"
},
{
"name": "outputblob",
"type": "blob",
"path": "output/{name}",
"connection": "AzureWebJobsStorage",
"direction": "out"
}
],
"disabled": false
}
__init__.py file:
def main(myblob: func.InputStream, outputblob: func.Out[bytes]):
    def read_excel_files(_container, _filename):
        sas = generate_SAS(f"{_container}", f"{_filename}")
        blob_url = f'https://{account_name}.blob.core.windows.net/{_container}/{_filename}?{sas}'
        return pd.read_excel(blob_url)
    if "Book" in myblob.name:
        logging.info("Book was found")
        Buch = read_excel_files("_container", "_filename.xlsx")
        logging.info("Starting cleaning Process")
        ...
        logging.info("Cleaning is finished")
        outputblob.set(Buch)
As described in another question and the code linked from it, I had to upload the data to Azure as a string.
I am posting only the part of my code that I changed:
def main(myblob: func.InputStream, outputblob: func.Out[str]) -> None:
    # everything else is the same
    # convert the DataFrame to a string before uploading
    outputblob.set(Buch.to_string())

Azure function: System.InvalidOperationException: Storage account connection string 'does not exist

I have written an Azure function that currently acts as a webhook consumer. Its job is to read webhook events and save them to Azure Storage.
I am using an HTTP trigger template to get the job done. I am able to receive the events from the webhook, but when I try to write to Azure Storage I get the error below.
I tried the option mentioned in this post, but no luck; I still get the same error.
System.InvalidOperationException: Storage account connection string 'AzureWebJobs<AzureStorageAccountName>' does not exist. Make sure that it is a defined App Setting.
Below is my function.json file
{
"scriptFile": "__init__.py",
"bindings": [
{
"authLevel": "anonymous",
"type": "httpTrigger",
"direction": "in",
"name": "req",
"methods": [
"get",
"post"
]
},
{
"type": "blob",
"name": "outputblob",
"path": "test/demo",
"direction": "out",
"connection": "<AzureStorageAccountName>"
}
]
}
__init__.py
import logging
import azure.functions as func
def main(req: func.HttpRequest, outputblob: func.Out[str]) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    name = 'some_name'
    if not name:
        try:
            req_body = 'req_body_test'  # req.get_json()
        except ValueError:
            pass
        else:
            name = 'name'  # req_body.get('name')
    print(str(req.get_json()))
    outputblob.set(str(req.get_json()))
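Make sure you have added the connection string to local.settings.json when running locally, or to the function app's configuration settings on Azure. The "connection" property in function.json is the name of the app setting that holds the connection string, not the storage account name.
Please test the code and settings files below:
__init__.py
import logging
import azure.functions as func
def main(req: func.HttpRequest, outputblob: func.Out[str]) -> func.HttpResponse:
    # Write a string to the blob configured by the "outputblob" binding
    outputblob.set("this is a test.")
    return func.HttpResponse(
        "Test.",
        status_code=200
    )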
function.json
{
"scriptFile": "__init__.py",
"bindings": [
{
"authLevel": "anonymous",
"type": "httpTrigger",
"direction": "in",
"name": "req",
"methods": [
"get",
"post"
]
},
{
"type": "http",
"direction": "out",
"name": "$return"
},
{
"name": "outputblob",
"type": "blob",
"path": "test/demo",
"connection": "MyStorageConnectionAppSetting",
"direction": "out"
}
]
}
local.settings.json
{
"IsEncrypted": false,
"Values": {
"AzureWebJobsStorage": "",
"FUNCTIONS_WORKER_RUNTIME": "python",
"MyStorageConnectionAppSetting":"DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx;EndpointSuffix=core.windows.net"
}
}
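To make the relationship between the two files concrete, here is a minimal sketch that checks whether every "connection" name used in function.json has a matching entry under "Values" in local.settings.json. The helper name missing_connection_settings and the sample account name mystorageaccount are hypothetical, and the exact-match lookup is a simplification of whatever the Functions host really does when it resolves a connection.

```python
import json

def missing_connection_settings(function_json: str, settings_json: str) -> list:
    """Return binding "connection" names with no matching key in "Values"."""
    bindings = json.loads(function_json).get("bindings", [])
    values = json.loads(settings_json).get("Values", {})
    missing = []
    for binding in bindings:
        name = binding.get("connection")
        # Bindings such as httpTrigger have no "connection" property
        if name is not None and name not in values and name not in missing:
            missing.append(name)
    return missing

settings = json.dumps({
    "IsEncrypted": False,
    "Values": {
        "FUNCTIONS_WORKER_RUNTIME": "python",
        "MyStorageConnectionAppSetting": "DefaultEndpointsProtocol=https;...",
    },
})

# Binding that refers to an app setting name, as in the answer
good = json.dumps({"bindings": [
    {"type": "httpTrigger", "direction": "in", "name": "req"},
    {"type": "blob", "direction": "out", "name": "outputblob",
     "path": "test/demo", "connection": "MyStorageConnectionAppSetting"},
]})

# Binding that uses a storage account name instead (hypothetical name)
bad = json.dumps({"bindings": [
    {"type": "blob", "direction": "out", "name": "outputblob",
     "path": "test/demo", "connection": "mystorageaccount"},
]})

good_result = missing_connection_settings(good, settings)
bad_result = missing_connection_settings(bad, settings)
print(good_result)
print(bad_result)
```

An empty list means every binding refers to a setting that exists in the file; any name that is returned needs to be added as a setting or corrected in function.json.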
On Azure, add the same MyStorageConnectionAppSetting entry to the function app's configuration settings.

Trying to trigger an url from azure functions using python, errored saying "without a $return binding returned a non-None value"

Below is the code:
import logging
import json
import urllib.request
import urllib.parse
import azure.functions as func
def main(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {myblob.name}\n"
                 f"Blob Size: {myblob.length} bytes")
    response = urllib.request.urlopen("http://example.com:5000/processing")
    return {
        'statusCode': 200,
        'body': json.dumps(response.read().decode('utf-8'))
    }
Error:
Result: Failure Exception: RuntimeError: function 'abc' without a $return binding returned a non-None value Stack: File "/azure-functions-host/workers/python/3.7/LINUX/X64/azure_functions_worker/dispatcher.py", line 341, in _handle__invocation_request f'function {fi.name!r} without a $return binding '
The same code works in Lambda. Please help me debug this in Azure Functions.
function.json
{
"scriptFile": "__init__.py",
"bindings": [
{
"name": "myblob",
"type": "blobTrigger",
"direction": "in",
"path": "sourcemetadata/{name}",
"connection": "AzureWebJobsStorage"
}
]
}
In an Azure Function, returning a value from the function code means that you want to use an output binding named $return, but you have not defined one in function.json. Either define it, or do not return a value from the function.
For example, I process a blob with a blob trigger and send a message to an Azure queue with a queue output binding.
function.json
{
"scriptFile": "__init__.py",
"bindings": [
{
"name": "myblob",
"type": "blobTrigger",
"direction": "in",
"path": "test/{name}.csv",
"connection": "AzureWebJobsStorage"
},
{
"name": "$return",
"direction": "out",
"type": "queue",
"queueName": "outqueue",
"connection": "AzureWebJobsStorage"
}
]
}
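Code
import logging
import azure.functions as func
async def main(myblob: func.InputStream) -> str:
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {myblob.name}\n")
    # The returned string is sent to the queue through the $return binding
    return "OK"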
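The rule in this answer can be checked against a function.json before deploying. The sketch below is only an illustration: has_return_binding is a hypothetical helper, not part of the Azure Functions SDK, and it simply looks for an output binding whose name is $return.

```python
import json

def has_return_binding(function_json: str) -> bool:
    """True if function.json declares an output binding named $return."""
    bindings = json.loads(function_json).get("bindings", [])
    return any(
        b.get("name") == "$return" and b.get("direction") == "out"
        for b in bindings
    )

# Bindings from the question: a blob trigger only
question_config = json.dumps({"bindings": [
    {"name": "myblob", "type": "blobTrigger", "direction": "in",
     "path": "sourcemetadata/{name}", "connection": "AzureWebJobsStorage"},
]})

# Bindings from the answer: blob trigger plus a queue output named $return
answer_config = json.dumps({"bindings": [
    {"name": "myblob", "type": "blobTrigger", "direction": "in",
     "path": "test/{name}.csv", "connection": "AzureWebJobsStorage"},
    {"name": "$return", "direction": "out", "type": "queue",
     "queueName": "outqueue", "connection": "AzureWebJobsStorage"},
]})

question_ok = has_return_binding(question_config)
answer_ok = has_return_binding(answer_config)
print(question_ok, answer_ok)
```

When the check is false, the function body should end without returning a value, for example by logging the HTTP response instead of returning it.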

Azure Functions (Python) blob output binding. How to set name when name is only part of the input message

I have an Azure Functions (Python 3) function that takes a message from a Service Bus queue and creates a Blob in a container as a result.
The function trigger is the Service Bus message. This message is a JSON object with several properties, one of which is the blob name.
The docs suggest something like this in the bindings:
{
"name": "outputblob",
"type": "blob",
"path": "samples-workitems/{queueTrigger}-Copy",
"connection": "MyStorageConnectionAppSetting",
"direction": "out"
}
But this suggests that the triggering message contains just the blob name. I cannot make the message solely the blob name, because I need the other attributes in the message to determine what to do and what data to put in the blob.
Is there any way to use the output bindings that will resolve this for me?
Thanks.
Yes, this can be done. You can set the input and output binding paths from properties of the trigger's JSON payload. Below is my function.json. It uses a Service Bus trigger to get the input blob name and the output blob name, then writes the input blob to the output blob. You can also set the container name this way.
{
"scriptFile": "__init__.py",
"bindings": [
{
"name": "msg",
"type": "serviceBusTrigger",
"direction": "in",
"queueName": "myqueue",
"connection": "servicebuscon"
},
{
"name": "inputblob",
"type": "blob",
"path": "inputcontainer/{blobname}",
"connection": "AzureWebJobsStorage",
"direction": "in"
},
{
"name": "outputblob",
"type": "blob",
"path": "outputcontainer/{outblobname}",
"connection": "AzureWebJobsStorage",
"direction": "out"
}
]
}
And below is the function code.
import logging
import json
import azure.functions as func
def main(msg: func.ServiceBusMessage, inputblob: func.InputStream, outputblob: func.Out[str]) -> None:
    logging.info('Python ServiceBus queue trigger processed message: %s',
                 msg.get_body().decode('utf-8'))
    # The input blob is expected to contain JSON
    jsonData = json.loads(inputblob.read())
    logging.info(jsonData)
    outputblob.set(str(jsonData))
The Service Bus message is a JSON object whose blobname and outblobname properties supply the values used in the binding paths.
In the result, the input blob's JSON data is shown in the console, and the output blob is created in the container.
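As a rough mental model of how the {blobname} and {outblobname} placeholders are filled, the sketch below substitutes properties of a JSON message into a binding path. It is a simplified illustration, not the host's actual implementation: resolve_binding_path is a hypothetical helper, and the sample message values (input.json, output.json, action) are made up.

```python
import json
import re

def resolve_binding_path(template: str, message_body: str) -> str:
    """Fill {name} placeholders in a binding path from a JSON message body."""
    payload = json.loads(message_body)

    def substitute(match):
        key = match.group(1)
        if key not in payload:
            raise KeyError(f"No value for named parameter '{key}'")
        return str(payload[key])

    return re.sub(r"\{(\w+)\}", substitute, template)

# Hypothetical Service Bus message: blob names plus other attributes
message = json.dumps({
    "blobname": "input.json",
    "outblobname": "output.json",
    "action": "copy",
})

input_path = resolve_binding_path("inputcontainer/{blobname}", message)
output_path = resolve_binding_path("outputcontainer/{outblobname}", message)
print(input_path)
print(output_path)
```

The extra "action" property is ignored by the paths, which is the point of the answer: the message can carry other attributes alongside the blob names.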

azure httptrigger blob storage using Python

I am trying to set up access to blob storage from a Python function app, but the file name is received from a POST request rather than preset. The HTTP trigger part works, but I'm having trouble accessing files in my blob storage. This is my JSON:
{
"bindings": [
{
"authLevel": "function",
"type": "httpTrigger",
"direction": "in",
"name": "req",
"methods": [
"post",
"get"
]
},
{
"name": "inputblob",
"type": "blob",
"path": "sites/{httpTrigger}",
"connection": "STORAGE",
"direction": "in"
},
{
"type": "http",
"direction": "out",
"name": "res"
}
],
"disabled": false
}
I saw an example (https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob#input---configuration) using a queue trigger, but when I do something similar using HTTP I get 'No value for named parameter 'httpTrigger''. My issue is that I don't know how to reference, in the binding path, a variable that is assigned in my Python code. When I use container/{variable} I get a NullReferenceException. This is my Python code:
import os
import json
import sys
import logging
import azure.functions as func
_AZURE_FUNCTION_DEFAULT_METHOD = "GET"
_AZURE_FUNCTION_HTTP_INPUT_ENV_NAME = "req"
_AZURE_FUNCTION_HTTP_OUTPUT_ENV_NAME = "res"
_REQ_PREFIX = "REQ_"
def write_http_response(status, response):
    output = open(os.environ[_AZURE_FUNCTION_HTTP_OUTPUT_ENV_NAME], 'w')
    output.write(json.dumps(response))
env = os.environ
postreqdata = json.loads(open(env['req']).read())
print('site: ' + postreqdata['site'])
site = postreqdata['site'] + '.xlsx'
input_file = open(os.environ['inputBlob'], 'r')
clear_text = input_file.read()
input_file.close()
print("Content in the blob file: '{0}'".format(clear_text))
# Get HTTP METHOD
http_method = env['REQ_METHOD'] if 'REQ_METHOD' in env else _AZURE_FUNCTION_DEFAULT_METHOD
print("HTTP METHOD => {}".format(http_method))
# Get QUERY STRING
req_url = env['REQ_HEADERS_X-ORIGINAL-URL'] if 'REQ_HEADERS_X-ORIGINAL-URL' in env else ''
urlparts = req_url.split('?')
query_string = urlparts[1] if len(urlparts) == 2 else ''
print("QUERY STRING => {}".format(query_string))
if http_method.lower() == 'post':
    request_body = open(env[_AZURE_FUNCTION_HTTP_INPUT_ENV_NAME], "r").read()
    print("REQUEST BODY => {}".format(request_body))
write_http_response(200, site)
Note: I have set up my connection string successfully (I think). I am new to Azure and am using the portal only.
This looks like an older version of function apps. In the new version, you can let the request handler do all this work for you. I just started working with Azure Functions, and if you want to access a file in blob storage, all you have to do is pass the file name as an HTTP query parameter and use that query parameter name as the binding variable.
Example:
def main(req: func.HttpRequest, inputblob: func.InputStream):
    input_file_content = inputblob.read()
and in your bindings you give
{
"scriptFile": "__init__.py",
"bindings": [
{
"authLevel": "function",
"type": "httpTrigger",
"direction": "in",
"name": "req",
"methods": [
"get",
"post"
]
},
{
"type": "http",
"direction": "out",
"name": "$return"
},
{
"type": "blob",
"direction":"in",
"name": "inputblob",
"path": "upload/{filename}",
"connection": "AzureWebJobsStorage"
}
]
}
and you simply call the API with the query parameter filename:
http://localhost:7071/api/HttpTriggerFileUpload?filename=file.ext
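When the file name contains spaces or other special characters, it is safer to build the request URL with an encoder than by hand. This is a small sketch using the standard library; build_request_url is a hypothetical helper and "my report.xlsx" is a made-up file name, while the base URL and the filename parameter come from the answer above.

```python
from urllib.parse import urlencode

def build_request_url(base_url: str, filename: str) -> str:
    """Append the filename query parameter, percent-encoding it as needed."""
    return f"{base_url}?{urlencode({'filename': filename})}"

base = "http://localhost:7071/api/HttpTriggerFileUpload"

simple_url = build_request_url(base, "file.ext")
spaced_url = build_request_url(base, "my report.xlsx")
print(simple_url)
print(spaced_url)
```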
