Tracing a Python gRPC server deployed on Cloud Run with OpenTelemetry

Tracing a Python gRPC server deployed on Cloud Run with OpenTelemetry - python

I'm running a Python gRPC server on Cloud Run and attempting to add instrumentation to capture trace information. I have a basic setup currently, however I'm having trouble making use of propagation as shown in the OpenTelemetry docs.
Inbound requests have the x-cloud-trace-context header, and I can log the header value in the gRPC method I've been working with, however the traces created by the OpenTelemetry library always have a different ID than the trace ID from the request header.
This is the simple tracing.py module I've created to provide configuration and access to the current Tracer instance:
"""Utility functions for tracing."""
import opentelemetry.exporter.cloud_trace as cloud_trace
import opentelemetry.propagate as propagate
import opentelemetry.propagators.cloud_trace_propagator as cloud_trace_propagator
import opentelemetry.trace as trace
from opentelemetry.sdk import trace as sdk_trace
from opentelemetry.sdk.trace import export
import app_instance
def get_tracer() -> trace.Tracer:
"""Function that provides an object for tracing.
Returns:
trace.Tracer instance.
"""
return trace.get_tracer(__name__)
def configure_tracing() -> None:
trace.set_tracer_provider(sdk_trace.TracerProvider())
if app_instance.IS_LOCAL:
print("Configuring local tracing.")
span_exporter: export.SpanExporter = export.ConsoleSpanExporter()
else:
print(f"Configuring cloud tracing in environment {app_instance.ENVIRONMENT}.")
span_exporter = cloud_trace.CloudTraceSpanExporter()
propagate.set_global_textmap(cloud_trace_propagator.CloudTraceFormatPropagator())
trace.get_tracer_provider().add_span_processor(export.SimpleSpanProcessor(span_exporter))
This configure_tracing function is called by the entrypoint script run on container start, so it executes before any requests are handled. When running in Google Cloud, the CloudTraceFormatPropagator should be what's required to ensure trace propagation, however it doesn't seem to be working for me.
This is the simple gRPC method I've been implementing with:
import grpc
from opentelemetry import trace
import stripe
from common import cloud_logging, datastore_utils, proto_helpers, tracing
from services.payment_service import payment_service_pb2
from third_party import stripe_client
def GetStripeInvoice(
self, request: payment_service_pb2.GetStripeInvoiceRequest, context: grpc.ServicerContext
) -> payment_service_pb2.StripeInvoiceResponse:
tracer: trace.Tracer = tracing.get_tracer()
with tracer.start_as_current_span('GetStripeInvoice'):
print(f"trace ID from header: {dict(context.invocation_metadata()).get('x-cloud-trace-context')}")
cloud_logging.info(f"Getting Stripe invoice.")
order = datastore_utils.get_pb_with_pb_key(request.order)
try:
invoice: stripe.Invoice = stripe_client.get_invoice(
invoice_id=order.stripe_invoice_id
)
cloud_logging.info(f"Retrieved Stripe invoice. Amount due: {invoice['amount_due']}")
except stripe.error.StripeError as e:
cloud_logging.error(
f"Failed to retrieve invoice: {e}"
)
context.abort(code=grpc.StatusCode.INTERNAL, details=str(e))
return payment_service_pb2.StripeInvoiceResponse(
invoice=proto_helpers.create_struct(invoice)
)
I've even gone as far as adding the x-cloud-trace-context header to local client requests, to no avail - the included value isn't used when starting traces.
I'm not sure what I'm missing here - I can see traces in the Cloud Trace dashboard so I believe the basic instrumentation is correct, however there's obviously something going on with the configuration/usage of the CloudTraceFormatPropagator.

It turns out that my configuration wasn't correct - or, I should say, it wasn't complete. I'd followed this basic example from the docs for the Google Cloud OpenTelemetry library, but I didn't realize that manually instrumenting wasn't needed.
I removed the call to tracer.start_as_current_span in my gRPC method, installed the gRPC instrumentation package (opentelemetry-instrumentation-grpc), and added it to the tracing configuration step during startup of my gRPC server, which now looks something like this:
from opentelemetry.instrumentation import grpc as grpc_instrumentation
from common import tracing # from my original question
def main():
"""Starts up GRPC server."""
# Set up tracing
tracing.configure_tracing()
grpc_instrumentation.GrpcInstrumentorServer().instrument()
# Set up the gRPC server
server = grpc.server(futures.ThreadPoolExecutor(max_workers=100))
# set up services & start
This approach has solved the issue described in my question - my log messages are now threaded in the expected manner
As someone new to telemetry & instrumentation, I didn't realize that I'd need to take an extra step since I'm tracing gRPC requests, but it makes sense now.
I ended up finding some helpful examples in a different set of docs - I'm not sure why these are separate from the docs linked earlier in this answer.
EDIT: Ah, I believe the gRPC instrumentation, and thus the related docs, are part of a separate but related project wherein contributors can add packages that instrument libraries of interest (i.e. gRPC, redis, etc). It'd be helpful if it was unified, which is the topic of this issue in the main OpenTelemetry Python repo.

While reviewing Google Documentation of OpenTelemetry using Python, I found some configurations that could help with the issue of tracing the correct ID. Additionally, there is a troubleshooting document to view traces in your Google Cloud Project when you expect trace data to be present.
Python-OpenTelemetry - https://cloud.google.com/trace/docs/setup/python-ot
Google Cloud Trace Troubleshooting - https://cloud.google.com/trace/docs/troubleshooting
For secure channels, you need to pass in chanel_type=’secure’. It is explained in the following link: https://github.com/open-telemetry/opentelemetry-python-contrib/issues/365
You need to use the x-cloud-trace-context header to ensure your traces use the same trace ID as the load balancer and AppServer on Google Cloud Run, and all link up in Google Trace.
The code below works to see you logs alongside traces in Google Trace’s Trace List view:
from opentelemetry import trace
from opentelemetry.trace.span import get_hexadecimal_trace_id, get_hexadecimal_span_id
current_span = trace.get_current_span()
if current_span:
trace_id = current_span.get_span_context().trace_id
span_id = current_span.get_span_context().span_id
if trace_id and span_id:
logging_fields['logging.googleapis.com/trace'] = f"projects/{self.gce_project}/traces/{get_hexadecimal_trace_id(trace_id)}"
logging_fields['logging.googleapis.com/spanId'] = f"{get_hexadecimal_span_id(span_id)}"
logging_fields['logging.googleapis.com/trace_sampled'] = True
The documentation and code above were tested using Flask Framework.

Related

How to remove a cloud armor secuiry policy from backend service using Python

I'm creating a few GCP cloud armor policies across multiple projects using the Python client library and attaching them to several backend services using the .set_security_policy() method
I know you can do it using the console / gcloud but I need to automate this in Python
I've tried the .update() method in google-cloud-compute but that did not work out
from google.cloud import compute, compute_v1
client = compute.BackendServicesClient()
backend_service_resource = compute_v1.types.BackendService(security_policy="")
client.update(project='project_id',
backend_service='backend_service',
backend_service_resource=backend_service_resource)
The error I got when running the above code is
google.api_core.exceptions.BadRequest: 400 PUT https://compute.googleapis.com/compute/v1/projects/<project-id>/global/backendServices/<backend-name>: Invalid value for field 'resource.loadBalancingScheme': 'INVALID_LOAD_BALANCING_SCHEME'. Cannot change load balancing scheme.
When I specify loadBalancingScheme then the same error occurs with another resource value. At run-time I would not have information of all the meta data of the backend-service and some meta-data might not be initialized in the first place.

This is for anyone who had similar issues in the future. I was originally going to call the gcloud commands through python using os.system() as #giles-roberts recommended, but then I stumbled across a proper way to to do this using the Client Libraries
You simply use the same .set_security_policy() to set the security policy in the first place but this time make the policy as None. This is not quite obvious since the name of the security policy has to be a string in the documentation and it does not accept an empty string either.
from google.cloud import compute, compute_v1
client = compute.BackendServicesClient()
resource = compute_v1.types.SecurityPolicyReference(security_policy=None)
error = client.set_security_policy(project='<project_id>',
backend_service='<backend_service>',
security_policy_reference_resource=resource)

How to Create Python Code with Google API Client

I have code below that was given to me to list Google Cloud Service Accounts for a specific Project.
import os
from googleapiclient import discovery
from gcp import get_key
"""gets all Service Accounts from the Service Account page"""
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = get_key()
service = discovery.build('iam', 'v1')
project_id = 'projects/<google cloud project>'
request = service.projects().serviceAccounts().list(name=project_id)
response = request.execute()
accounts = response['accounts']
for account in accounts:
print(account['email'])
This code works perfectly and prints the accounts as I need them. What I'm trying to figure out is:
Where can I go to see how to construct code like this? I found a site that has references to the Python API Client, but I can't seem to figure out how to make the code above from it. I can see the Method to list the Service Accounts, but it's still not giving me enough information.
Is there somewhere else I should be going to educate myself. Any information you have is appreciated so I don't pull out the rest of my hair.
Thanks, Eric

Let me share with you this documentation page, where there is a detailed explanation on how to build a script such as the one you shared, and what does each line of code mean. It is extracted from the documentation of ML Engine, not IAM, but it is using the same Python Google API Client Libary, so just ignore the references to ML and the rest will be useful for you.
In any case, here it is a commented version of your code, so that you understand it better:
# Imports for the Client API Libraries and the key management
import os
from googleapiclient import discovery
from gcp import get_key
# Look for an environment variable containing the credentials for Google Cloud Platform
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = get_key()
# Build a Python representation of the REST API
service = discovery.build('iam', 'v1')
# Define the Project ID of your project
project_id = 'projects/<google cloud project>'
"""Until this point, the code is general to any API
From this point on, it is specific to the IAM API"""
# Create the request using the appropriate 'serviceAccounts' API
# You can substitute serviceAccounts by any other available API
request = service.projects().serviceAccounts().list(name=project_id)
# Execute the request that was built in the previous step
response = request.execute()
# Process the data from the response obtained with the request execution
accounts = response['accounts']
for account in accounts:
print(account['email'])
Once you understand the first part of the code, the last lines are specific to the API you are using, which in this case is the Google IAM API. In this link, you can find detailed information on the methods available and what they do.
Then, you can follow the Python API Client Library documentation that you shared in order to see how to call the methods. For instance, in the code you shared, the method used depends on service, which is the Python representation of the API, and then goes down the tree of methods in the last link as in projects(), then serviceAccounts() and finally the specificlist() method, which ends up in request = service.projects().serviceAccounts().list(name=project_id).
Finally, just in case you are interested in the other available APIs, please refer to this page for more information.
I hope the comments I made on your code were of help, and that the documentation shared makes it easier for you to understand how a code like that one could be scripted.

You can use ipython having googleapiclient installed - with something like:
sudo pip install --upgrade google-api-python-client
You can go to interactive python console and do:
from googleapiclient import discovery
dir(discovery)
help(discovery)
dir - gives all entries that object has - so:
a = ''
dir(a)
Will tell what you can do with string object. Doing help(a) will give help for string object. You can do dipper:
dir(discovery)
# and then for instance
help(discovery.re)
You can call your script in steps, and see what is result print it, do some research, having something - do %history to printout your session, and have solution that can be triggered as a script.

Authentication to Google Cloud Python API Library stopped working

I have problems with the authentication in the Python Library of Google Cloud API.
At first it worked for some days without problem, but suddenly the API calls are not showing up in the API Overview of the Google CloudPlatform.
I created a service account and stored the json file locally. Then I set the environment variable GCLOUD_PROJECT to the project ID and GOOGLE_APPLICATION_CREDENTIALS to the path of the json file.
from google.cloud import speech
client = speech.Client()
print(client._credentials.service_account_email)
prints the correct service account email.
The following code transcribes the audio_file successfully, but the Dashboard for my Google Cloud project doesn't show anything for the activated Speech API Graph.
import io
with io.open(audio_file, 'rb') as f:
audio = client.sample(f.read(), source_uri=None, sample_rate=48000, encoding=speech.encoding.Encoding.FLAC)
alternatives = audio.sync_recognize(language_code='de-DE')
At some point the code also ran in some errors, regarding the usage limit. I guess due to the unsuccessful authentication, the free/limited option is used somehow.
I also tried the alternative option for authentication by installing the Google Cloud SDK and gcloud auth application-default login, but without success.
I have no idea where to start troubleshooting the problem.
Any help is appreciated!
(My system is running Windows 7 with Anaconda)
EDIT:
The error count (Fehler) is increasing with calls to the API. How can I get detailed information about the error?!

Make sure you are using an absolute path when setting the GOOGLE_APPLICATION_CREDENTIALS environment variable. Also, you might want to try inspecting the access token using OAuth2 tokeninfo and make sure it has "scope": "https://www.googleapis.com/auth/cloud-platform" in its response.
Sometimes you will get different error information if you initialize the client with GRPC enabled:
0.24.0:
speech_client = speech.Client(_use_grpc=True)
0.23.0:
speech_client = speech.Client(use_gax=True)
Usually it's an encoding issue, can you try with the sample audio or try generating LINEAR16 samples using something like the Unix rec tool:
rec --channels=1 --bits=16 --rate=44100 audio.wav trim 0 5
...
with io.open(speech_file, 'rb') as audio_file:
content = audio_file.read()
audio_sample = speech_client.sample(
content,
source_uri=None,
encoding='LINEAR16',
sample_rate=44100)
Other notes:
Sync Recognize is limited to 60 seconds of audio, you must use async for longer audio
If you haven't already, set up billing for your account

With regards to the usage problem, the issue is in fact that when you use the new google-cloud library to access ML APIs, it seems everyone authenticates to a project shared by everyone (hence it says you've used up your limit even though you've not used anything). To check and confirm this, you can call an ML API that you have not enabled by using the python client library, which will give you a result even though it shouldn't. This problem persists to other language client libraries and OS, so I suspect it's an issue with their grpc.
Because of this, to ensure consistency I always use the older googleapiclient that uses my API key. Here is an example to use the translate API:
from googleapiclient import discovery
service = discovery.build('translate', 'v2', developerKey='')
service_request = service.translations().list(q='hello world', target='zh')
result = service_request.execute()
print(result)
For the speech API, it's something along the lines of:
from googleapiclient import discovery
service = discovery.build('speech', 'v1beta1', developerKey='')
service_request = service.speech().syncrecognize()
result = service_request.execute()
print(result)
You can get the list of the discovery APIs at https://developers.google.com/api-client-library/python/apis/ with the speech one located in https://developers.google.com/resources/api-libraries/documentation/speech/v1beta1/python/latest/.
One of the other benefits of using the discovery library is that you get a lot more options compared to the current library, although often times it's a bit more of a pain to implement.

Poor Call Quality using Twilio Programmable Voice SDK

I am upgrading my Twilio-enabled app from the old SDK to the new Twilio Programmable Voice (beta 5) but have run into several problems. Chief among them is poor audio quality of outgoing calls, in what can only be described as what lost packets must sound like. The problem exists even when I run the Quickstart demo app, leading me to the conclusion the problem rests in my Twiml. I've followed the instructions to a "T" with respect to setting the appropriate capabilities, entitlements, provisioning profile and uploading the voip push credential, but with little documentation on the new SDK or for Python versions of the server, I'm left scratching my head.
The only modifications to the demo app I've made are to include the "to" and "from" parameters in my call request like so:
NSDictionary *params = #{#"To" : self.phoneTextField.text, #"From": #"+16462332222",};
[[VoiceClient sharedInstance] configureAudioSession];
self.outgoingCall = [[VoiceClient sharedInstance] call:[self fetchAccessToken] params:params delegate:self];
The call goes out to my Twiml server (a python deployment on Heroku) at the appropriate endpoint as seen here:
import os
from flask import Flask, request
from twilio.jwt.access_token import AccessToken, VoiceGrant
from twilio.rest import Client
import twilio.twiml
ACCOUNT_SID = 'ACblahblahblahblahblahblah'
API_KEY = 'SKblahblahblahblahblahblah'
API_KEY_SECRET = 'blahblahblahblahblahblah'
PUSH_CREDENTIAL_SID = 'CRblahblahblahblahblahblah'
APP_SID = 'APblahblahblahblahblahblah'
IDENTITY = 'My_App'
CALLER_ID = '+15551111' # my actual number
app = Flask(__name__)
#app.route('/makeTheDamnCall', methods=['GET', 'POST'])
def makeTheDamnCall():
account_sid = os.environ.get("ACCOUNT_SID", ACCOUNT_SID)
api_key = os.environ.get("API_KEY", API_KEY)
api_key_secret = os.environ.get("API_KEY_SECRET", API_KEY_SECRET)
CALLER_ID = request.values.get('From')
IDENTITY = request.values.get('To')
client = Client(api_key, api_key_secret, account_sid)
call = client.calls.create(url=request.url_root, to='client:' + IDENTITY, from_='client:' + CALLER_ID)
return str(call.sid)
The console outputs outgoingCall:didFailWithError: Twilio Services Error and the call logs show a completed client call. An inspection of the debugger shows TwilioRestException: HTTP 400 error: Unable to create record. As you can see, the url I include in the request might be problematic as it just goes to the root but there is no way to leave the url blank (that I have found). I will eventually change this to a url=request.url_root + 'handleRecording' for call recording purposes but am taking things one step at a time for now.
My solution so far has been to ditch the call = client.calls.create in favor of the dial verb like so:
resp = twilio.twiml.Response()
resp.dial(number = IDENTITY, callerId = CALLER_ID)
return str(resp)
This makes calls, but the quality is so poor as to render it useless. (10+ seconds of silence followed by intermittent spurts of hearing the other party). Using the dial verb in this way is also unacceptable because of its inefficiency as I'm now billed for two calls each time.
The other major problem, which I'm not sure is connected or not, is the fact that I haven't yet been able to receive any incoming calls, though I suspect I may need to ask that question separately.
How can I get this line to work? I'm looking at you, #philnash. Help me make my app great again. :)
call = client.calls.create(url=request.url_root, to='client:' + IDENTITY, from_='client:' + CALLER_ID)

sorry it's taken me a while to get back to your question.
Firstly, the correct way to make the ongoing connection from your Programmable Voice SDK call is using TwiML <Dial>. You were creating a call using the REST API, however you will have already have created the first leg of the call in the SDK and the TwiML forwards onto the second leg of the call, the person you dialled. Notably, you are billed for each leg of the call, not for two calls (legs can be of different length, for example, you could put the original caller through a menu system before dialling onto the recipient).
Secondly, regarding poor call quality, that's not something I can help with on Stack Overflow. The best thing to do in this situation is to get in touch with Twilio support and provide some Call SIDs for affected calls. If you can record an example call that would help too.
Finally, I haven't seen if you've asked another question about incoming calls yet, but please do and I'll do my best to help there. That probably is a code question that we can cover on SO.

Testing Flask REST server

I have a tiny Flask server that is supposed to load data from a file and run a function on it. This function will return a DataFrame and I return the json version of it. Much to my surprise this all works nicely. However, how would I test this? I have included some attempts below but I don't understand Flask (nor REST) well enough yet:
#!/home/thomas/python
from flask import Flask
from flask.ext.restful import Resource, Api
app = Flask(__name__)
api = Api(app)
class UniverseAPI(Resource):
def get(self):
import pandas as pd
frame = pd.read_csv("//datasrv10//data$//AQ//test.csv", index_col=0, header=0)
return frame.to_json()
api.add_resource(UniverseAPI, '/data/universe')
I am happy to include a few of my attempts here... I appreciate any hints. I have read the official documentation.
I should specify what I mean with testing. I can run this on my linux server and can extract all the required information with the requests package. However, I want to create a unittest that comes without the need to start the server on the localhost. I think I have managed with the FLASK test-client. However, the problem now is that the requests response object and the flask response object treat the underlying json strings rather differently. So I guess my problem is more related to json string issues rather than FLASK. Thanks for all your helpful feedback though

Well, the basics of writing a REST API are essentially a set of design principles. My understanding of it is based on this article by Miguel Grinberg, http://blog.miguelgrinberg.com/post/designing-a-restful-api-with-python-and-flask .
In it, he talks about how a REST API is:
"Stateless" - All interactions with the service can happen using the information from one request.
Built upon accessing "resources" from URIs using HTTP requests like GET, PUTS, and POST. A resource could be an order in a store, a task in a web app, or whatever you like.
There's also a bunch of stuff about how the server should standardize all forms of communication between itself and the client, indicate whether it can do cacheing, and other stuff like that. From an initial design standpoint, though, this is "the point" as he put it:
"The task of designing a web service or API that adheres to the REST guidelines then becomes > an exercise in identifying the resources that will be exposed and how they will be affected > by the different request methods."
If you're looking for an interesting example of a REST API that might be suited to your interests (I know it is to mine), reddit's is open source. It's a relatable example to see how they try and structure the interactions behind requests: http://www.reddit.com/dev/api

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Tracing a Python gRPC server deployed on Cloud Run with OpenTelemetry - python

Related

How to remove a cloud armor secuiry policy from backend service using Python

How to Create Python Code with Google API Client

Authentication to Google Cloud Python API Library stopped working

Poor Call Quality using Twilio Programmable Voice SDK

Testing Flask REST server

Categories

Resources