Determine the project used by Vision API - Python

I'm working on a project on gcp using the OCR service from Google Vision.
My code is simple, I've declared a vision ImageAnnotatorClient like this:
client = vision.ImageAnnotatorClient()
and then call some batch annotation request like this:
response = client.batch_annotate_images(requests=[....])
Is there a way to know which project will be billed for these requests?
I've not explicitly set the credential and, in this case, the documentation (https://googleapis.dev/python/vision/latest/vision_v1/image_annotator.html) says that
if none are specified, the client will attempt to ascertain the credentials from the environment.
I need to know what "ascertain the credentials from the environment" means in this case, since I've already made some requests and I would like to know which of the projects I'm currently working on will be charged for them.
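For reference, here is a minimal sketch of the lookup order that the client follows, as described in Google's Application Default Credentials documentation. The helper below only mimics that order to show which source would win (the well-known file path shown is the Linux/macOS one); `google.auth.default()` is the real implementation, and it also returns the resolved project ID:

```python
import os
import pathlib

def describe_adc_source(environ=None, home=None):
    """Mimic the Application Default Credentials lookup order, to show
    where the Vision client would get its credentials (and thus which
    project gets charged). Illustrative only; google.auth.default()
    is the real implementation."""
    environ = os.environ if environ is None else environ
    home = pathlib.Path.home() if home is None else pathlib.Path(home)

    # 1. Explicit service account file named by the environment variable.
    explicit = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return "service account file: " + explicit

    # 2. gcloud's well-known file, written by
    #    `gcloud auth application-default login`.
    well_known = (home / ".config" / "gcloud"
                  / "application_default_credentials.json")
    if well_known.exists():
        return "gcloud user credentials: " + str(well_known)

    # 3. On GCP (Compute Engine, Cloud Run, ...), the metadata server
    #    supplies the runtime's attached service account.
    return "metadata server (the runtime's attached service account)"
```

In practice the quickest check is `credentials, project = google.auth.default()`, which returns the project ID that the client resolved from the environment.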

Related

Achieving data persistence of a dynamic .json file with a python web app as a public repository

A little info about the project: https://ahashplace.#PreferablyAFreeHost.something/ is a grid of query strings. When you make a request to the site with x, y, and color in the query string, and the SHA-256 hash of your query is lower than that of the current query at that (x, y) position, the site will save your query for that position.
So I was running on Heroku's free tier (a recently discontinued product) with an s3 bucket as backup. With Heroku private repos I had the AWS credentials as plain text. I need s3 to keep an updated copy of a file (data.json) so that the app can get it instead of rolling back to its deployment version when the build re-deploys.
I would like the build to be a public repository. This is for a couple of reasons: first, I want this to be open source, and second, I would prefer not to give out my GitHub credentials, even though Render will deploy from your own private repos on their hardware for free.
The question: how do I get my dynamic file to persist alongside an open-source Python web app?
My thoughts so far...
It would be great if I could make whatever credentials I put out there only work from the host, based on something like a known static IP and subnet mask, but I'm not sure that would be secure or viable: the host might change my network address, and it could be spoofed by an attacker.
I've used mongoDB before and it didn't really have this problem and I don't know why...
I could put the data in a public repo and just run code privately to periodically fetch data from the app and push updates to the repo. Then the app could look to the repo when starting up the build. This is messy though.

Cloud Run: endpoint that runs a function as background job

I am trying to deploy a rest api in cloud run where one endpoint launches an async job. The job is defined inside a function in the code.
It seems one way to do it is to use Cloud Tasks, but this would mean making a self-call to another endpoint of the deployed API. Specifically, creating an auxiliary endpoint that contains the job code (e.g. /run-my-function) and another one that enqueues a Cloud Task which calls /run-my-function.
Is this the right way to do it, or have I misunderstood something? If it is the right way, how do I specify the URL of the /run-my-function endpoint without explicitly hard-coding the deployed Cloud Run URL?
The code for the endpoint that enqueues the task targeting /run-my-function would be:
from google.cloud import tasks_v2

client = tasks_v2.CloudTasksClient()

project = 'myproject'
queue = 'myqueue'
location = 'mylocation'
url = 'https://cloudrunservice-abcdefg-ca.b.run.app/run-my-function'
service_account_email = '12345@cloudbuild.gserviceaccount.com'

parent = client.queue_path(project, location, queue)
task = {
    "http_request": {
        "http_method": tasks_v2.HttpMethod.POST,
        "url": url,
        "oidc_token": {"service_account_email": service_account_email},
    }
}
response = client.create_task(parent=parent, task=task)
However, this requires hard-coding the service name https://cloudrunservice-abcdefg-ca.b.run.app and defining an auxiliary endpoint /run-my-function that can be called via HTTP.
You can get the Cloud Run URL in your code without hard-coding it or setting it in an environment variable.
You can have a look at a previous article that I wrote, in the graceful termination part. I provide working code in Go; it's not difficult to re-implement in Python.
Here is the principle:
Get the region and the project number from the metadata server. Keep in mind that Cloud Run has specific metadata, like the region.
Get the K_SERVICE env var (it's a standard Cloud Run env var).
Perform a call to the Cloud Run REST API to get the service detail, customizing the request with the data obtained previously.
Extract the status.url JSON entry from the response.
Now you have it!
Let me know if you have difficulties achieving that. I'm not good at Python, but I will be able to write that piece of code!
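A rough Python translation of those steps might look like this. This is a sketch, not tested against a live service; the metadata paths and the regional run.googleapis.com endpoint are taken from the Cloud Run documentation:

```python
import json
import os
import urllib.request

METADATA = "http://metadata.google.internal/computeMetadata/v1"

def _metadata(path):
    # Every metadata-server request needs this header.
    req = urllib.request.Request(METADATA + path,
                                 headers={"Metadata-Flavor": "Google"})
    return urllib.request.urlopen(req).read().decode()

def parse_region(region_path):
    """Cloud Run's /instance/region metadata looks like
    'projects/123456/regions/us-central1'; return (project_number, region)."""
    parts = region_path.split("/")
    return parts[1], parts[3]

def get_service_url():
    # Steps 1-2: region and project number from metadata, service name from env.
    project_number, region = parse_region(_metadata("/instance/region"))
    service = os.environ["K_SERVICE"]  # standard Cloud Run env var

    # Step 3: call the Cloud Run REST API for the service detail.
    token = json.loads(
        _metadata("/instance/service-accounts/default/token"))["access_token"]
    api = ("https://%s-run.googleapis.com/apis/serving.knative.dev/v1/"
           "namespaces/%s/services/%s" % (region, project_number, service))
    req = urllib.request.Request(api,
                                 headers={"Authorization": "Bearer " + token})
    detail = json.loads(urllib.request.urlopen(req).read())

    # Step 4: extract status.url.
    return detail["status"]["url"]
```

Only code running inside Cloud Run can reach the metadata server, so `get_service_url()` will fail anywhere else; `parse_region` is pure and can be checked locally.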

Can I check if a script is running inside a Compute Engine or in a local environment?

I just wanted to know if there is a way to check whether a Python script is running inside a Compute Engine or in a local environment?
I want to check that in order to know how to authenticate. For example, when a script runs on a Compute Engine instance and I want to initiate a BigQuery client, I do not need to authenticate, but when a script runs locally I need to authenticate using a service account JSON file.
If I knew whether a script is running locally or in a Compute Engine I would be able to initiate Google services accordingly.
I could put initialization into a try-except statement but maybe there is another way?
Any help is appreciated.
If I understand your question correctly, I think a better solution is Google's Application Default Credentials. See Best practices to securely auth apps in Google Cloud (thanks @sethvargo) and Application Default Credentials.
Using this mechanism, authentication becomes consistent regardless of where you run your app (on- or off-GCP). See finding credentials automatically
When you run off-GCP, you set GOOGLE_APPLICATION_CREDENTIALS to point to the Service Account. When you run on-GCP (and, to be clear, you are still authenticating, it's just transparent), you don't set the environment variable because the library obtains the e.g. Compute Engine instance's service account for you.
So I read a bit on the Google Cloud authentication and came up with this solution:
import google.auth
from google.auth.exceptions import DefaultCredentialsError
from google.cloud import storage
from google.oauth2 import service_account

try:
    credentials, project = google.auth.default()
except DefaultCredentialsError:
    credentials = service_account.Credentials.from_service_account_file(
        '/path/to/service_account_json_file.json')

client = storage.Client(credentials=credentials)
This tries to retrieve the default Google Cloud credentials (in environments such as Compute Engine) and, if that fails, falls back to authenticating with a service account JSON file.
It might not be the best solution but it works and I hope it will help someone else too.
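If you do want an explicit environment check rather than a try/except fallback, one common approach (similar to what google.auth does internally) is to probe the metadata server, which only resolves and answers inside Google Cloud. A sketch:

```python
import urllib.error
import urllib.request

def running_on_gcp(timeout=1.0):
    """Best-effort check: the metadata server hostname only resolves
    and answers (given the Metadata-Flavor header) on Google Cloud."""
    req = urllib.request.Request(
        "http://metadata.google.internal/computeMetadata/v1/instance/id",
        headers={"Metadata-Flavor": "Google"},
    )
    try:
        urllib.request.urlopen(req, timeout=timeout)
        return True
    except (urllib.error.URLError, OSError):
        return False
```

That said, Application Default Credentials usually make an explicit check unnecessary, since the same code path works both on- and off-GCP.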

How to Create Python Code with Google API Client

I have code below that was given to me to list Google Cloud Service Accounts for a specific Project.
import os
from googleapiclient import discovery
from gcp import get_key
"""gets all Service Accounts from the Service Account page"""
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = get_key()
service = discovery.build('iam', 'v1')
project_id = 'projects/<google cloud project>'
request = service.projects().serviceAccounts().list(name=project_id)
response = request.execute()
accounts = response['accounts']
for account in accounts:
    print(account['email'])
This code works perfectly and prints the accounts as I need them. What I'm trying to figure out is:
Where can I go to see how to construct code like this? I found a site that has references for the Python API Client, but I can't seem to figure out how to build the code above from it. I can see the method to list the Service Accounts, but it's still not giving me enough information.
Is there somewhere else I should be going to educate myself? Any information you have is appreciated so I don't pull out the rest of my hair.
Thanks, Eric
Let me share with you this documentation page, where there is a detailed explanation of how to build a script such as the one you shared, and what each line of code means. It is extracted from the documentation of ML Engine, not IAM, but it is using the same Python Google API Client Library, so just ignore the references to ML and the rest will be useful for you.
In any case, here it is a commented version of your code, so that you understand it better:
# Imports for the Client API Libraries and the key management
import os
from googleapiclient import discovery
from gcp import get_key
# Look for an environment variable containing the credentials for Google Cloud Platform
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = get_key()
# Build a Python representation of the REST API
service = discovery.build('iam', 'v1')
# Define the Project ID of your project
project_id = 'projects/<google cloud project>'
"""Until this point, the code is general to any API
From this point on, it is specific to the IAM API"""
# Create the request using the appropriate 'serviceAccounts' API
# You can substitute serviceAccounts by any other available API
request = service.projects().serviceAccounts().list(name=project_id)
# Execute the request that was built in the previous step
response = request.execute()
# Process the data from the response obtained with the request execution
accounts = response['accounts']
for account in accounts:
    print(account['email'])
Once you understand the first part of the code, the last lines are specific to the API you are using, which in this case is the Google IAM API. In this link, you can find detailed information on the methods available and what they do.
Then, you can follow the Python API Client Library documentation that you shared in order to see how to call the methods. For instance, in the code you shared, the method used depends on service, which is the Python representation of the API, and then goes down the tree of methods in the last link, as in projects(), then serviceAccounts(), and finally the specific list() method, which ends up in request = service.projects().serviceAccounts().list(name=project_id).
Finally, just in case you are interested in the other available APIs, please refer to this page for more information.
I hope the comments I made on your code were of help, and that the documentation shared makes it easier for you to understand how a code like that one could be scripted.
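One detail worth adding, since list methods in these APIs are paginated: the discovery client exposes a matching list_next method at each level of the method tree, which builds the request for the next page (or returns None when there are no more pages). A sketch, assuming the same iam service object as above:

```python
def iter_service_accounts(service, project_id):
    """Yield every service account, following nextPageToken via the
    discovery client's list_next helper."""
    request = service.projects().serviceAccounts().list(name=project_id)
    while request is not None:
        response = request.execute()
        for account in response.get('accounts', []):
            yield account
        # list_next returns None once the last page has been fetched.
        request = service.projects().serviceAccounts().list_next(
            request, response)
```

This way the loop over accounts keeps working even for projects with more service accounts than fit in one response page.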
You can use IPython with googleapiclient installed - something like:
sudo pip install --upgrade google-api-python-client
You can go to interactive python console and do:
from googleapiclient import discovery
dir(discovery)
help(discovery)
dir gives all the entries that an object has - so:
a = ''
dir(a)
will tell you what you can do with a string object. Doing help(a) will give help for the string object. You can dig deeper:
dir(discovery)
# and then for instance
help(discovery.re)
You can run your script in steps, print and inspect intermediate results, and do some research along the way; once you have something working, do %history to print out your session and turn it into a solution that can be triggered as a script.

Authentication to Google Cloud Python API Library stopped working

I have problems with the authentication in the Python Library of Google Cloud API.
At first it worked for some days without problems, but suddenly the API calls stopped showing up in the API Overview of the Google Cloud Platform.
I created a service account and stored the json file locally. Then I set the environment variable GCLOUD_PROJECT to the project ID and GOOGLE_APPLICATION_CREDENTIALS to the path of the json file.
from google.cloud import speech
client = speech.Client()
print(client._credentials.service_account_email)
prints the correct service account email.
The following code transcribes the audio_file successfully, but the Dashboard for my Google Cloud project doesn't show anything for the activated Speech API Graph.
import io

with io.open(audio_file, 'rb') as f:
    audio = client.sample(f.read(), source_uri=None, sample_rate=48000,
                          encoding=speech.encoding.Encoding.FLAC)

alternatives = audio.sync_recognize(language_code='de-DE')
At some point the code also ran into some errors regarding the usage limit. I guess that due to the unsuccessful authentication, the free/limited option is being used somehow.
I also tried the alternative option for authentication by installing the Google Cloud SDK and gcloud auth application-default login, but without success.
I have no idea where to start troubleshooting the problem.
Any help is appreciated!
(My system is running Windows 7 with Anaconda)
EDIT:
The error count (Fehler) is increasing with calls to the API. How can I get detailed information about the errors?
Make sure you are using an absolute path when setting the GOOGLE_APPLICATION_CREDENTIALS environment variable. Also, you might want to try inspecting the access token using OAuth2 tokeninfo and make sure it has "scope": "https://www.googleapis.com/auth/cloud-platform" in its response.
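For example, using only the standard library, you could fetch the token info and check the scope field, which is a space-separated list; the access token itself could come from `gcloud auth application-default print-access-token`:

```python
import json
import urllib.parse
import urllib.request

CLOUD_PLATFORM = "https://www.googleapis.com/auth/cloud-platform"

def fetch_tokeninfo(access_token):
    """Ask Google's tokeninfo endpoint to describe an access token."""
    url = ("https://www.googleapis.com/oauth2/v3/tokeninfo?"
           + urllib.parse.urlencode({"access_token": access_token}))
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())

def has_cloud_platform_scope(tokeninfo):
    """The 'scope' field is a space-separated list of granted scopes."""
    return CLOUD_PLATFORM in tokeninfo.get("scope", "").split()
```

If `has_cloud_platform_scope(fetch_tokeninfo(token))` is False, the credentials being picked up are not the ones you think.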
Sometimes you will get different error information if you initialize the client with GRPC enabled:
0.24.0:
speech_client = speech.Client(_use_grpc=True)
0.23.0:
speech_client = speech.Client(use_gax=True)
Usually it's an encoding issue. Can you try with the sample audio, or try generating LINEAR16 samples using something like the Unix rec tool:
rec --channels=1 --bits=16 --rate=44100 audio.wav trim 0 5
...
with io.open(speech_file, 'rb') as audio_file:
    content = audio_file.read()

audio_sample = speech_client.sample(
    content,
    source_uri=None,
    encoding='LINEAR16',
    sample_rate=44100)
Other notes:
Sync Recognize is limited to 60 seconds of audio, you must use async for longer audio
If you haven't already, set up billing for your account
With regards to the usage problem, the issue is in fact that when you use the new google-cloud library to access ML APIs, it seems everyone authenticates against a project shared by everyone (hence it says you've used up your limit even though you haven't used anything). To check and confirm this, you can call an ML API that you have not enabled using the Python client library, and it will give you a result even though it shouldn't. This problem persists across the other language client libraries and operating systems, so I suspect it's an issue with their gRPC layer.
Because of this, to ensure consistency I always use the older googleapiclient that uses my API key. Here is an example to use the translate API:
from googleapiclient import discovery
service = discovery.build('translate', 'v2', developerKey='')
service_request = service.translations().list(q='hello world', target='zh')
result = service_request.execute()
print(result)
For the speech API, it's something along the lines of:
from googleapiclient import discovery
service = discovery.build('speech', 'v1beta1', developerKey='')
service_request = service.speech().syncrecognize()
result = service_request.execute()
print(result)
You can get the list of the discovery APIs at https://developers.google.com/api-client-library/python/apis/ with the speech one located in https://developers.google.com/resources/api-libraries/documentation/speech/v1beta1/python/latest/.
One of the other benefits of using the discovery library is that you get a lot more options compared to the current library, although often times it's a bit more of a pain to implement.
