I've stored some records in Datastore through the console. Now I want to manipulate them to optimize Big Data analytics.
How can I write a Python cloud function to run some transformations on my data? Can I trigger it with Datastore events?
Thanks a lot.
I have coded a little bit myself. You can find my code in GitHub.
What it does:
It is an HTTP Cloud Function.
It connects to Google Cloud Datastore through the Datastore client.
It updates the value of a specific entry in an entity, using the entity's ID number and the entry's column name.
How to use:
Test this Cloud Function to get an idea of how it works, then adapt it to your needs. I have tested this myself and it works.
Create an HTTP trigger Python Cloud Function.
Set the name to updateDatastore
Copy and paste the code from GitHub (a minimal sketch along the same lines is shown after these steps).
Add the line google-cloud-datastore to the requirements.txt file.
In the main code, assign your entity's kind value to ENTITY_KIND.
In the main code, assign your entity's key value to ENTITY_KEY.
When you open the HTTP trigger URL, the Cloud Function executes and writes the current time into the column.
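The GitHub code itself isn't reproduced here, but a minimal sketch of that approach could look like the following; ENTITY_KIND, ENTITY_KEY, and the property name last_updated are placeholders to replace with your own values.

from datetime import datetime, timezone
from google.cloud import datastore

ENTITY_KIND = "YourKind"        # replace with your entity's kind
ENTITY_KEY = 1234567890         # replace with your entity's numeric ID (or key name)
PROPERTY_NAME = "last_updated"  # hypothetical column to update

client = datastore.Client()

def updateDatastore(request):
    # Look up the entity by kind + key, write the current time, and save it back.
    key = client.key(ENTITY_KIND, ENTITY_KEY)
    entity = client.get(key)
    if entity is None:
        return "Entity not found", 404
    entity[PROPERTY_NAME] = datetime.now(timezone.utc).isoformat()
    client.put(entity)
    return "Updated {} on {} {}".format(PROPERTY_NAME, ENTITY_KIND, ENTITY_KEY)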
What I basically want to happen is that my on-demand scheduled query runs when a new file lands in my Google Cloud Storage bucket. This query will load the CSV file into a temporary table, perform some transformation/cleaning, and then append to a table.
Just to try and get the first part running, my on-demand scheduled query looks like this. The idea is that it will pick up the CSV file from the bucket and dump it into a table.
LOAD DATA INTO `spreadsheep-20220603.Case_Studies.loading_test`
FROM FILES
(
  format = 'CSV',
  uris = ['gs://triggered_upload/*.csv']
);
I was in the process of setting up a Google Cloud Function that triggers when a file lands in the storage bucket. That part seems to be fine, but I haven't had any luck working out how that function can trigger the scheduled query.
Any idea what bit of python code is needed in the function to trigger the query?
It seems to me that it's not really a scheduled query you want at all. You don't want a query to run at regular intervals; you want to run a query in response to a certain event.
Now, you've rigged up a Cloud Function to execute some code whenever a new file is added to a bucket. What this Cloud Function needs is the BigQuery Python client library. Here's an example of how it could be used.
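A sketch of such a function, assuming the table and bucket from your question and google-cloud-bigquery listed in requirements.txt (the entry-point name load_new_file is just an example):

from google.cloud import bigquery

client = bigquery.Client()

def load_new_file(event, context):
    # 'event' describes the Cloud Storage object that triggered the function.
    file_name = event["name"]
    query = """
        LOAD DATA INTO `spreadsheep-20220603.Case_Studies.loading_test`
        FROM FILES (
          format = 'CSV',
          uris = ['gs://triggered_upload/*.csv']
        );
    """
    job = client.query(query)  # starts the load job
    job.result()               # waits for it to finish
    print("Loaded {} into Case_Studies.loading_test".format(file_name))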
All that remains is to wrap this code in an appropriate function and specify dependencies and permissions using the Cloud Functions Python framework. Here is a guide on how to do that.
I'm building a flask server in python with Cloud Run, for a chatbot to call.
Sometimes, when the user wants to do something with the chatbot, the bot needs to ask the user to log in to a 3rd-party server before doing it.
I have two routes:
Route 1 is "/login". It returns a simple iframe which opens a login page on a 3rd-party server, generates a "session_id", and saves some info I already have into a global dict called "runtimes", keyed by the "session_id", so that I can use it later once the visitor has successfully logged in.
Route 2 is "/callback/<session_id>". After the user successfully logs in to their account, the 3rd-party server calls this route with a token in the URL parameters. I then use the "session_id" to read the saved info from "runtimes" and do the rest.
It works well on my local machine. But on Google Cloud Run, which supports multiple instances, the "callback" request sometimes lands on a newly started instance, and it can't find the entry in "runtimes" because the dict lives in a different instance.
I know I can save the runtimes dict to a database to solve this problem, but that looks like overkill... it just doesn't seem right.
Is there any easy way to make "runtimes" shared between instances?
The solution here is to use a central point of storage: a database, Memorystore, Firestore, ... something outside of Cloud Run itself.
You can also try the Cloud Run second-generation execution environment, which allows you to mount a network file system such as Cloud Storage or Filestore. You could, for instance, store the session data in a file named after the session ID, as sketched below.
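A minimal sketch of that idea, assuming a volume mounted at /mnt/sessions (the mount path and file layout are assumptions, not Cloud Run defaults):

import json
import os

SESSION_DIR = "/mnt/sessions"  # hypothetical mount point shared by all instances

def save_session(session_id, data):
    # Called from "/login": persist the info instead of keeping it in a global dict.
    os.makedirs(SESSION_DIR, exist_ok=True)
    with open(os.path.join(SESSION_DIR, session_id + ".json"), "w") as f:
        json.dump(data, f)

def load_session(session_id):
    # Called from "/callback/<session_id>": works no matter which instance handles it.
    path = os.path.join(SESSION_DIR, session_id + ".json")
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)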
Note: on the Cloud Run side something is cooking, but it won't be 100% safe, only best effort. A database backup will still be required even with that new feature.
My Python Azure Function configuration file (function.json) defines a Blob storage trigger.
When the Azure Function wakes up (i.e. the server shown in Live Metrics comes online after some idle time), it processes all the existing blobs, regardless of which ones have already generated trigger events.
I noticed that the azure-webjobs-hosts/blobreceipts folder is populated with sandboxhost637nnn folders. A new sandboxhost folder is created on every Azure Function wake-up event, so the function forgets previously processed blobs (no old receipts are found). In the past (Dec '19), I remember that a single webjobs-hosts folder containing all receipts persisted across invocations.
To explain why this happens:
Irrespective of your environment, the blob trigger by design keeps track of new and updated blobs by maintaining blob receipts in the azure-webjobs-hosts container. These receipts are correlated by their **eTags** to the **host ID** of your Functions runtime.
When your function wakes up, your **host ID** changes, so the **eTag -> host ID** correlations you previously had no longer apply, which results in the new host re-processing all of your existing blobs, i.e. the behavior you've observed.
The recommendation is to use an Event Grid trigger instead, or to run your function app on an App Service plan, which will be costlier.
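For reference, an Event Grid-triggered Python function might look like the sketch below; the binding name "event" is an assumption and must match the "eventGridTrigger" binding declared in function.json.

import logging

import azure.functions as func

def main(event: func.EventGridEvent):
    # Fired once per "Blob Created" event, so blobs are not re-processed
    # when the host restarts under a new host ID.
    payload = event.get_json()
    logging.info("Blob created: %s", payload.get("url"))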
Additional reference:
https://github.com/Azure/azure-webjobs-sdk/issues/1327
Hope it helps.
I am not sure what happened in your case, but this problem should not occur any more.
This is an old problem, and the SDK has been updated.
Blob triggers are processed only once now; have a look at this:
https://azure.microsoft.com/zh-cn/blog/announcing-the-0-5-0-beta-preview-of-microsoft-azure-webjobs-sdk/
Maybe your problem is a new one, but then you need to provide your logs and the SDK version you used so we can help you solve it. Let me know if you have an update later.
I'm brand new to using the Elastic Stack, so excuse my lack of knowledge on the subject. I'm running the Elastic Stack on a Windows 10 corporate work computer. I have Git Bash installed for a bash CLI, and I can successfully launch the entire Elastic Stack. My task is to take log data that is stored in one of our databases and display it on a Kibana dashboard.
From what my team and I have reasoned, I don't need to use Logstash, because the database the logs are sent to is effectively our 'log stash', so using the Logstash service would be redundant. I found a nifty diagram on freeCodeCamp, and from what I gather, Logstash is just the intermediary for log retrieval between different services. So instead of using Logstash, since the log data is already in a database, I could just do something like this
USER ---> KIBANA <---> ELASTICSEARCH <--- My Python Script <--- [DATABASE]
My Python script successfully calls our database and retrieves the data, and has a function that molds the data into a dict object (as I understand it, Elasticsearch takes data in JSON format).
Now I want to insert all of that data into Elasticsearch. I've been reading the Elastic docs, and there's a lot of talk about indexing that isn't really indexing, and I haven't found any API calls I can use to plug the data right into Elasticsearch. All of the documentation I've found so far concerns the use of Logstash, but since I'm not using Logstash, I'm kind of at a loss here.
If there's anyone who can help me out and point me in the right direction I'd appreciate it. Thanks
-Dan
You ingest data into Elasticsearch using the Index API; it is basically an HTTP request using the PUT method.
To do that with Python you can use elasticsearch-py, the official Python client for Elasticsearch. For example:
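A minimal sketch with elasticsearch-py (the host, index name, and document fields are assumptions to adapt to your setup):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# One of the dicts produced by your molding function.
doc = {
    "timestamp": "2023-01-01T12:00:00",
    "level": "INFO",
    "message": "example log line pulled from the database",
}

# Creates the "logs" index on first use and stores the document in it.
resp = es.index(index="logs", document=doc)
print(resp["result"])  # e.g. "created"

For inserting many records at once, the bulk helper in elasticsearch.helpers is usually much faster than calling index() in a loop.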
But sometimes what you need is easier to do with Logstash, since it can extract the data from your database, format it using its many filters, and send it to Elasticsearch.
I am in the process of writing a Cloud Function for Firebase via the Python option. I am interested in Firebase Realtime Database triggers; in other words, I want to listen to events that happen in my Realtime Database.
The Python environment provides the following signature for handling Realtime Database triggers:
def handleEvent(data, context):
# Triggered by a change to a Firebase RTDB reference.
# Args:
# data (dict): The event payload.
# context (google.cloud.functions.Context): Metadata for the event.
This looks good. The data parameter provides two dictionaries: 'data' holds the value before the change and 'delta' holds the changed bits.
The confusion kicks in when comparing this signature with the Node.js environment. Here is a similar signature from the Node.js world:
exports.handleEvent = functions.database.ref('/path/{objectId}/').onWrite((change, context) => { /* ... */ });
In this signature, the change parameter is pretty powerful and it seems to be of type firebase.database.DataSnapshot. It has nice helper methods such as hasChild() or numChildren() that provide information about the changed object.
The question is: Does Python environment have a similar DataSnapshot object? With Python, do I have to query the database to get the number of children for example? It really isn't clear what Python environment can and can't do.
Related API/Reference/Documentation:
Firebase Realtime DB Triggers: https://cloud.google.com/functions/docs/calling/realtime-database
DataSnapshot Reference: https://firebase.google.com/docs/reference/js/firebase.database.DataSnapshot
The Python runtime currently doesn't have a similar object structure. The firebase-functions SDK is actually doing a lot of work for you by creating objects that are easy to consume. Nothing similar happens in the Python environment; you essentially get a fairly raw view of the data payload contained in the event that triggered your function.
If you write Realtime Database triggers for Node without using the Firebase SDK, it's a similar situation: you get a really basic object with properties similar to the Python dictionary.
This is the reason why firebase-functions together with the Firebase SDK is the preferred environment for writing triggers for Firebase products. The developer experience is superior: it does a bunch of convenient work for you. The downside is that you pay the cost of loading and initializing the Firebase Admin SDK on cold start.
Note that it might be possible for you to parse the event yourself and create your own convenience objects using the Firebase Admin SDK for Python, as sketched below.
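This sketch only uses the raw 'data'/'delta' dictionaries from the payload described in the docs linked above; the child name 'foo' is hypothetical.

def handleEvent(data, context):
    before = data.get("data") or {}
    delta = data.get("delta") or {}

    # Rough equivalents of numChildren()/hasChild() on the changed value,
    # assuming the changed node is a dict of children.
    num_children_before = len(before) if isinstance(before, dict) else 0
    num_children_after = len(delta) if isinstance(delta, dict) else 0
    has_foo = isinstance(delta, dict) and "foo" in delta

    print("Resource: {}".format(context.resource))
    print("Children before: {}, after: {}, has 'foo': {}".format(
        num_children_before, num_children_after, has_foo))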