I have a problem with my implementation of OpenCensus logging in Python and FastAPI. I want to log incoming requests to Application Insights in Azure, so I added a FastAPI middleware to my code, following the Microsoft docs and this GitHub post:
import os
from fastapi import Request
from opencensus.ext.azure.trace_exporter import AzureExporter
from opencensus.trace.attributes_helper import COMMON_ATTRIBUTES
from opencensus.trace.propagation.trace_context_http_header_format import TraceContextPropagator
from opencensus.trace.samplers import AlwaysOnSampler
from opencensus.trace.span import SpanKind
from opencensus.trace.tracer import Tracer

HTTP_HOST = COMMON_ATTRIBUTES['HTTP_HOST']
HTTP_METHOD = COMMON_ATTRIBUTES['HTTP_METHOD']
HTTP_PATH = COMMON_ATTRIBUTES['HTTP_PATH']
HTTP_ROUTE = COMMON_ATTRIBUTES['HTTP_ROUTE']
HTTP_URL = COMMON_ATTRIBUTES['HTTP_URL']
HTTP_STATUS_CODE = COMMON_ATTRIBUTES['HTTP_STATUS_CODE']

propagator = TraceContextPropagator()

@app.middleware('http')
async def middleware_opencensus(request: Request, call_next):
    tracer = Tracer(
        span_context=propagator.from_headers(request.headers),
        exporter=AzureExporter(connection_string=os.environ['APPLICATION_INSIGHTS_CONNECTION_STRING']),
        sampler=AlwaysOnSampler(),
        propagator=propagator)
    with tracer.span('main') as span:
        span.span_kind = SpanKind.SERVER
        tracer.add_attribute_to_current_span(HTTP_HOST, request.url.hostname)
        tracer.add_attribute_to_current_span(HTTP_METHOD, request.method)
        tracer.add_attribute_to_current_span(HTTP_PATH, request.url.path)
        tracer.add_attribute_to_current_span(HTTP_ROUTE, request.url.path)
        tracer.add_attribute_to_current_span(HTTP_URL, str(request.url))
        response = await call_next(request)
        tracer.add_attribute_to_current_span(HTTP_STATUS_CODE, response.status_code)
        return response
This works great when running locally, and all incoming requests to the API are logged to Application Insights. Since implementing OpenCensus, however, when deployed in a Container Instance on Azure, after a couple of days (approximately 3) an issue arises that looks like a recursive logging issue (+30,000 logs per second!), among other things stating Queue is full. Dropping telemetry, before finally crashing after a few hours of mad logging:
Our logger.py file where we define our logging handlers is as follows:
import logging.config
import os
from pathlib import Path

import tqdm
from opencensus.ext.azure.log_exporter import AzureLogHandler


class TqdmLoggingHandler(logging.Handler):
    """
    Class for enabling logging during a process with a tqdm progress bar.

    Using this handler, logs will be put above the progress bar, pushing
    the progress bar down instead of replacing it.
    """

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.formatter = logging.Formatter(fmt='%(asctime)s <%(name)s> %(levelname)s: %(message)s',
                                           datefmt='%d-%m-%Y %H:%M:%S')

    def emit(self, record):
        try:
            msg = self.format(record)
            tqdm.tqdm.write(msg)
            self.flush()
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            self.handleError(record)


logging_conf_path = Path(__file__).parent
logging.config.fileConfig(logging_conf_path / 'logging.conf')

logger = logging.getLogger(__name__)
logger.addHandler(TqdmLoggingHandler(logging.DEBUG))  # Add tqdm handler to root logger to replace the stream handler

if os.getenv('APPLICATION_INSIGHTS_CONNECTION_STRING'):
    logger.addHandler(AzureLogHandler(connection_string=os.environ['APPLICATION_INSIGHTS_CONNECTION_STRING']))

warning_level_loggers = ['urllib3', 'requests']
for lgr in warning_level_loggers:
    logging.getLogger(lgr).setLevel(logging.WARNING)
Does anyone have any idea what could be the cause of this issue, or has anyone encountered something similar? I don't know what the 'first' error log is, due to the sheer amount of logging.
Please let me know if additional information is required.
Thanks in advance!
We decided to revisit the problem and found two helpful threads describing similar, if not exactly the same, behaviour to what we were seeing:
https://github.com/census-instrumentation/opencensus-python/issues/862
https://github.com/census-instrumentation/opencensus-python/issues/1007
As described in the second thread, it seems that OpenCensus attempts to send telemetry to Application Insights, and on failure the failed items are batched and sent again after 15 s (the default). This goes on indefinitely until it succeeds, possibly causing the huge and seemingly recursive spam of failure logs.
A solution introduced and proposed by Izchen in this comment is to set enable_local_storage=False to work around this issue.
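For reference, a minimal sketch of what that looks like for the exporters used above (assuming a version of opencensus-ext-azure recent enough to expose the enable_local_storage option):
import os

from opencensus.ext.azure.log_exporter import AzureLogHandler
from opencensus.ext.azure.trace_exporter import AzureExporter

# Disable the local retry/storage queue so failed telemetry is dropped
# instead of being batched and retried indefinitely
exporter = AzureExporter(
    connection_string=os.environ['APPLICATION_INSIGHTS_CONNECTION_STRING'],
    enable_local_storage=False)
log_handler = AzureLogHandler(
    connection_string=os.environ['APPLICATION_INSIGHTS_CONNECTION_STRING'],
    enable_local_storage=False)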
Another solution would be to migrate to OpenTelemetry, which should not have this potential problem and is the solution we are currently running. Do keep in mind that OpenCensus is still the application monitoring solution officially supported by Microsoft, and OpenTelemetry is still very young. OpenTelemetry does have a lot of support, however, and is gaining more and more traction.
As for the implementation of OpenTelemetry we did the following to trace our requests:
if os.getenv('APPLICATION_INSIGHTS_CONNECTION_STRING'):
    from azure.monitor.opentelemetry.exporter import AzureMonitorTraceExporter
    from opentelemetry import trace
    from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
    from opentelemetry.propagate import extract
    from opentelemetry.sdk.resources import SERVICE_NAME, SERVICE_NAMESPACE, SERVICE_INSTANCE_ID, Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    provider = TracerProvider()
    processor = BatchSpanProcessor(AzureMonitorTraceExporter.from_connection_string(
        os.environ['APPLICATION_INSIGHTS_CONNECTION_STRING']))
    provider.add_span_processor(processor)
    trace.set_tracer_provider(provider)

    FastAPIInstrumentor.instrument_app(app)
OpenTelemetry supports a lot of custom instrumentors that can be used to create spans for, for example, Requests, PyMongo, Elasticsearch, Redis, etc. See https://opentelemetry.io/registry/.
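For example, enabling one of those instrumentors is usually a one-liner; a sketch for outgoing requests calls (assuming the opentelemetry-instrumentation-requests package is installed):
# Automatically create client spans for every call made with the requests library
from opentelemetry.instrumentation.requests import RequestsInstrumentor

RequestsInstrumentor().instrument()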
If you want to write your own custom tracers/spans like in the OpenCensus example above, you can attempt something like this:
# These attribute keys still come from OpenCensus, for convenience
# (other imports as in the snippet above)
HTTP_HOST = COMMON_ATTRIBUTES['HTTP_HOST']
HTTP_METHOD = COMMON_ATTRIBUTES['HTTP_METHOD']
HTTP_PATH = COMMON_ATTRIBUTES['HTTP_PATH']
HTTP_ROUTE = COMMON_ATTRIBUTES['HTTP_ROUTE']
HTTP_URL = COMMON_ATTRIBUTES['HTTP_URL']
HTTP_STATUS_CODE = COMMON_ATTRIBUTES['HTTP_STATUS_CODE']

provider = TracerProvider()
processor = BatchSpanProcessor(AzureMonitorTraceExporter.from_connection_string(
    os.environ['APPLICATION_INSIGHTS_CONNECTION_STRING']))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)


@app.middleware('http')
async def middleware_opentelemetry(request: Request, call_next):
    tracer = trace.get_tracer(__name__)

    with tracer.start_as_current_span('main',
                                      context=extract(request.headers),
                                      kind=trace.SpanKind.SERVER) as span:
        span.set_attributes({
            HTTP_HOST: request.url.hostname,
            HTTP_METHOD: request.method,
            HTTP_PATH: request.url.path,
            HTTP_ROUTE: request.url.path,
            HTTP_URL: str(request.url)
        })
        response = await call_next(request)
        span.set_attribute(HTTP_STATUS_CODE, response.status_code)
        return response
The AzureLogHandler from our logger.py configuration wasn't needed any more with this solution and was thus removed.
Some other sources that might be useful:
https://learn.microsoft.com/en-us/azure/communication-services/quickstarts/telemetry-application-insights?pivots=programming-language-python#setting-up-the-telemetry-tracer-with-communication-identity-sdk-calls
https://learn.microsoft.com/en-us/python/api/overview/azure/monitor-opentelemetry-exporter-readme?view=azure-python-preview
https://learn.microsoft.com/en-us/azure/azure-monitor/app/opentelemetry-enable?tabs=python
What we are trying:
We are trying to run a Cloud Run Job that does some computation and also uses one of our custom packages to do the computation. The Cloud Run Job uses google-cloud-logging and Python's default logging package as described here. The custom Python package also logs its data (only a logger is defined, as suggested here).
Simple illustration:
from google.cloud import logging as gcp_logging
import logging
import os
import google.auth
from our_package import do_something


def log_test_function():
    SCOPES = ["https://www.googleapis.com/auth/cloud-platform"]
    credentials, project_id = google.auth.default(scopes=SCOPES)
    try:
        function_logger_name = os.getenv("FUNCTION_LOGGER_NAME")

        logging_client = gcp_logging.Client(credentials=credentials, project=project_id)

        logging.basicConfig()
        logger = logging.getLogger(function_logger_name)
        logger.setLevel(logging.INFO)
        logging_client.setup_logging(log_level=logging.INFO)

        logger.critical("Critical Log TEST")
        logger.error("Error Log TEST")
        logger.info("Info Log TEST")
        logger.debug("Debug Log TEST")

        result = do_something()
        logger.info(result)
    except Exception as e:
        print(e)  # just to test how print works
    return "Returned"


if __name__ == "__main__":
    result = log_test_function()
    print(result)
[Screenshot: Cloud Run Job logs. The blue box indicates logs from the custom package; the black box indicates logs from the Cloud Run Job.]
Cloud Logging is not able to identify the severity of the logs; it parses every log entry at the Default level.
But if I run the same code in a Cloud Function, it works as expected (i.e. the severity level of logs from the Cloud Function and the custom package is respected), as shown in the image below.
[Screenshot: Cloud Function logs with correct severity levels.]
Both are serverless architectures, so why does it work in Cloud Functions but not in Cloud Run?
What we want to do:
We want to log every message from the Cloud Run Job and the custom package to Cloud Logging with the correct severity.
We would appreciate your help!
Edit 1
Following the Google Cloud Python library committers' solution almost solved the problem. Below is the modified code.
from google.cloud import logging as gcp_logging
import logging
import os
import google.auth
from our_package import do_something
from google.cloud.logging.handlers import CloudLoggingHandler
from google.cloud.logging_v2.handlers import setup_logging
from google.cloud.logging_v2.resource import Resource
from google.cloud.logging_v2.handlers._monitored_resources import retrieve_metadata_server, _REGION_ID, _PROJECT_NAME


def log_test_function():
    SCOPES = ["https://www.googleapis.com/auth/cloud-platform"]
    region = retrieve_metadata_server(_REGION_ID)
    project = retrieve_metadata_server(_PROJECT_NAME)
    try:
        function_logger_name = os.getenv("FUNCTION_LOGGER_NAME")

        # build a manual resource object
        cr_job_resource = Resource(
            type="cloud_run_job",
            labels={
                "job_name": os.environ.get('CLOUD_RUN_JOB', 'unknownJobId'),
                "location": region.split("/")[-1] if region else "",
                "project_id": project
            }
        )

        logging_client = gcp_logging.Client()
        gcloud_logging_handler = CloudLoggingHandler(logging_client, resource=cr_job_resource)
        setup_logging(gcloud_logging_handler, log_level=logging.INFO)

        logging.basicConfig()
        logger = logging.getLogger(function_logger_name)
        logger.setLevel(logging.INFO)

        logger.critical("Critical Log TEST")
        logger.error("Error Log TEST")
        logger.warning("Warning Log TEST")
        logger.info("Info Log TEST")
        logger.debug("Debug Log TEST")

        result = do_something()
        logger.info(result)
    except Exception as e:
        print(e)  # just to test how print works
    return "Returned"


if __name__ == "__main__":
    result = log_test_function()
    print(result)
Now every log is logged twice: one severity-aware entry, and one severity-unaware entry at the "Default" level, as shown below.
Cloud Functions takes your code, wraps it in a web server, builds a container and deploys it.
With Cloud Run, you only build and deploy the container.
That means the Cloud Functions web server wrapper does something more than you do: it initializes the Python logger correctly.
Have a look at that doc page; it should solve your issue.
EDIT 1
I took the exact example and added it to my Flask server like this:
import os

from flask import Flask

app = Flask(__name__)

import google.cloud.logging

# Instantiates a client
client = google.cloud.logging.Client()

# Retrieves a Cloud Logging handler based on the environment
# you're running in and integrates the handler with the
# Python logging module. By default this captures all logs
# at INFO level and higher
client.setup_logging()
# [END logging_handler_setup]

# [START logging_handler_usage]
# Imports Python standard library logging
import logging


@app.route('/')
def call_function():
    # The data to log
    text = "Hello, world!"

    # Emits the data using the standard logging module
    logging.warning(text)
    # [END logging_handler_usage]

    print("Logged: {}".format(text))
    return text


# For local execution
if __name__ == "__main__":
    app.run(host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))
It's a rough copy of that sample.
And the result is correct: a Logged: ... entry at the Default level (the print), and my Hello, world! at Warning severity (the logging.warning).
EDIT 2
Thanks to the help of the Google Cloud Python library committers, I got a solution to my issue while waiting for native integration in the library.
Here is my new code, for Cloud Run Jobs this time:
import google.cloud.logging
from google.cloud.logging.handlers import CloudLoggingHandler
from google.cloud.logging_v2.handlers import setup_logging
from google.cloud.logging_v2.resource import Resource
from google.cloud.logging_v2.handlers._monitored_resources import retrieve_metadata_server, _REGION_ID, _PROJECT_NAME

import os

# find metadata about the execution environment
region = retrieve_metadata_server(_REGION_ID)
project = retrieve_metadata_server(_PROJECT_NAME)

# build a manual resource object
cr_job_resource = Resource(
    type="cloud_run_job",
    labels={
        "job_name": os.environ.get('CLOUD_RUN_JOB', 'unknownJobId'),
        "location": region.split("/")[-1] if region else "",
        "project_id": project
    }
)

# configure handling using CloudLoggingHandler with custom resource
client = google.cloud.logging.Client()
handler = CloudLoggingHandler(client, resource=cr_job_resource)
setup_logging(handler)

import logging


def call_function():
    # The data to log
    text = "Hello, world!"

    # Emits the data using the standard logging module
    logging.warning(text)
    # [END logging_handler_usage]

    print("Logged: {}".format(text))
    return text


# For local execution
if __name__ == "__main__":
    call_function()
And the result works great:
The Logged: ... entry is the simple print to stdout, at the Default level.
The Warning entry appears as expected.
You might want to get rid of loggers you don't need. Take a look at https://stackoverflow.com/a/61602361/13161301
I spent some time going over this error but had no success.
File "C:\Users\ebara.conda\envs\asci\lib\site-packages\fastapi\openapi\utils.py", line 388, in get_openapi
flat_models=flat_models, model_name_map=model_name_map
File "C:\Users\ebara.conda\envs\asci\lib\site-packages\fastapi\utils.py", line 28, in get_model_definitions
model_name = model_name_map[model]
KeyError: <class 'pydantic.main.Body_login_access_token_api_v1_login_access_token_post'>
The problem is that I'm trying to build a project with user authentication, using the OpenAPI form to create new users in the database.
I've used the backend part of this template project: https://github.com/tiangolo/full-stack-fastapi-postgresql
Everything works except for authentication, like here:
#router.post("/login/access-token", response_model=schemas.Token)
def login_access_token(
db: Session = Depends(deps.get_db), form_data: OAuth2PasswordRequestForm = Depends()) -> Any:
When I add the form_data: OAuth2PasswordRequestForm = Depends() part and go to the /docs page, this error appears (Failed to load API definition. Fetch error. Internal Server Error /openapi.json).
The server itself runs normally, but it can't load the OpenAPI schema. If I remove the aforementioned form_data part, everything works smoothly, but without authorisation. I tried to debug it, but had no success. I think it might be connected to the dependency graph or some start-up issue, but I have no guess how to trace it back.
Here is a full working example which reproduces the error. The link points to the code which causes the problem. If you comment out lines 18-39, the docs open without any problems.
https://github.com/BEEugene/fastapi_error_demo/blob/master/fastapi_service/api/api_v1/endpoints/login.py
Any ideas on how to debug or why this error happens?
You are using the Depends function without an argument. Maybe in the console you were getting an error provoked by a function. You can pass the OAuth2PasswordRequestForm class, after importing it from fastapi.security, to get the result you were expecting:
from fastapi.security import OAuth2PasswordRequestForm

form_data: OAuth2PasswordRequestForm = Depends(OAuth2PasswordRequestForm)
It might work.
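A minimal, self-contained sketch of that suggestion (the route path matches the question; the token logic is just a placeholder, and python-multipart must be installed for form parsing):
from typing import Any

from fastapi import Depends, FastAPI
from fastapi.security import OAuth2PasswordRequestForm

app = FastAPI()


@app.post("/login/access-token")
def login_access_token(form_data: OAuth2PasswordRequestForm = Depends(OAuth2PasswordRequestForm)) -> Any:
    # Placeholder: a real implementation would verify form_data.username /
    # form_data.password and return a signed token
    return {"access_token": form_data.username, "token_type": "bearer"}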
It seems that in my case, the main issue was that I was an idiot.
As said, if you comment out lines 18-39, the docs open without any problems. But you will notice this warning:
UserWarning: Duplicate Operation ID read_users_api_v1_users__get for
function read_users at
...\fastapi_error\fastapi_service\api\api_v1\endpoints\users.py
  warnings.warn(message)
I started to compare all the files and it turned out that I had included the router in the FastAPI app twice:
import logging

from fastapi import FastAPI
from starlette.middleware.cors import CORSMiddleware

from fastapi_service.api.api_v1.api import api_router
from fastapi_service.core.config import settings
from fastapi_service.core.event_handlers import (start_app_handler,
                                                 stop_app_handler)

log = logging.getLogger(__name__)


def get_app(mode="prod") -> FastAPI:
    fast_app = FastAPI(title=settings.PROJECT_NAME,
                       version=settings.APP_VERSION,
                       debug=settings.IS_DEBUG)
                       # openapi_url=f"{settings.API_V1_STR}/openapi.json")
    # first time when I included the router
    fast_app.include_router(api_router, prefix=f"{settings.API_V1_STR}")
    fast_app.mode = mode

    logger = log.getChild("get_app")
    logger.info("adding startup")
    fast_app.add_event_handler("startup", start_app_handler(fast_app))
    logger.info("adding shutdown")
    fast_app.add_event_handler("shutdown", stop_app_handler(fast_app))

    return fast_app


app = get_app()

# Set all CORS enabled origins
if settings.BACKEND_CORS_ORIGINS:
    app.add_middleware(
        CORSMiddleware,
        allow_origins=[str(origin) for origin in settings.BACKEND_CORS_ORIGINS],
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )

# second time when I included the router
app.include_router(api_router, prefix=settings.API_V1_STR)
So, if you comment out (or just delete) the second router inclusion, the app will work normally.
It seems that the answer to my question on how to debug this error is to find the point where the bug appears in FastAPI and compare the values there with a version where there is no error. In my case, the number of keys in various dictionaries differed in the function get_model_definitions.
I had the same problem. For me it was because I had code like this:
from pydantic import BaseModel


class A(BaseModel):
    b: B


class B(BaseModel):
    c: int
but instead, class B should have been defined above class A. This fixed it:
from pydantic import BaseModel


class B(BaseModel):
    c: int


class A(BaseModel):
    b: B
More info: https://stackoverflow.com/a/70384637/9439097
Regarding your original question on how to debug these or similar errors:
You probably have your routes defined somewhere. Comment all of your routers/routes out; then the OpenAPI docs should generate (and they should show that you have no routes). Then enable the routes one by one and see which one causes the error. This is how I debugged my situation.
I am using Python and I was wondering if there is any package or simple way to log directly to Azure?
I found a package (azure-storage-logging) that would be really nice; however, it is not being maintained and is not compatible with the new Azure API.
Any help is welcome.
You should use Application Insights, which will send the logs to Azure Monitor (previously Log Analytics).
https://learn.microsoft.com/en-us/azure/azure-monitor/app/opencensus-python
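For example, with the opencensus-ext-azure package described on that page, attaching an AzureLogHandler to a standard Python logger looks roughly like this (the environment variable name here is an assumption):
import logging
import os

from opencensus.ext.azure.log_exporter import AzureLogHandler

logger = logging.getLogger(__name__)
logger.addHandler(AzureLogHandler(
    connection_string=os.environ['APPLICATIONINSIGHTS_CONNECTION_STRING']))
logger.warning('Hello from Python, sent to Application Insights')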
I had the same requirement to log error and debug messages for a small application and store the logs in Azure Data Lake. We did not want to use Application Insights, as ours was not a web application and we just needed logs to debug the code.
To solve this, I created a temp.log file:
logging.basicConfig(filename='temp.log',
                    format='%(asctime)s %(levelname)-8s [%(filename)s:%(lineno)d] %(message)s',
                    datefmt='%Y-%m-%d:%H:%M:%S')
At the end of the program, I uploaded temp.log to Azure using DataLakeFileClient.append_data:
local_file = open("temp.log",'r')
file_contents = local_file.read()
file_client.append_data(data=file_contents, offset=0, length=len(file_contents))
file_client.flush_data(len(file_contents))
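For context, a sketch of how the file_client above could be created with azure-storage-file-datalake (the account URL, credential, and paths are placeholders; see the docs link below):
from azure.storage.filedatalake import DataLakeServiceClient

# Illustrative values: replace with your storage account, credential and paths
service_client = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential="<storage-account-key>")
file_system_client = service_client.get_file_system_client(file_system="logs")
file_client = file_system_client.create_file("app/temp.log")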
https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-directory-file-acl-python
You can just create your own handler. I'll show you how to log to an Azure table; storing to a blob can be done similarly. The biggest benefit is that you emit each entry as you log it, instead of sending all logs at the end of a process.
Create a table in Azure Table Storage and then run the following code.
import logging
from datetime import datetime
from logging import Logger, getLogger

from azure.core.credentials import AzureSasCredential
from azure.data.tables import TableClient, TableEntity


class _AzureTableHandler(logging.Handler):

    def __init__(self, *args, **kwargs):
        super(_AzureTableHandler, self).__init__(*args, **kwargs)
        credentials: AzureSasCredential = AzureSasCredential(signature=<sas-token>)
        self._table_client: TableClient = TableClient(endpoint=<storage-account-url>, table_name=<table-name>, credential=credentials)

    def emit(self, record):
        level = record.levelname
        message = record.getMessage()
        self._table_client.create_entity(TableEntity({'Severity': level,
                                                      'Message': message,
                                                      'PartitionKey': f'{datetime.now().date()}',
                                                      'RowKey': f'{datetime.now().microsecond}'}))


if __name__ == "__main__":
    logger: Logger = getLogger(__name__)
    logger.addHandler(_AzureTableHandler())
    logger.warning('testing azure logging')
In this approach, you also have the benefit of creating custom columns for your table. For example, you can have separate columns for the name of the project that is logging, or the username of the dev who is running the script:
logger.addHandler(_AzureTableHandler(Username="Hesam", Project="Deployment-support-tools-client-portal"))
Make sure to add your custom column names to the TableEntity dictionary. Or you can use the project name as the partition key.
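One way this could look, as a sketch (the extra-column handling below is not part of the handler above, and the column names are just examples):
class _AzureTableHandlerWithColumns(_AzureTableHandler):
    """Variant of the handler above that stores arbitrary extra columns."""

    def __init__(self, level=logging.NOTSET, **extra_columns):
        super().__init__(level)
        self._extra_columns = extra_columns  # e.g. Username, Project

    def emit(self, record):
        self._table_client.create_entity(TableEntity({
            'Severity': record.levelname,
            'Message': record.getMessage(),
            'PartitionKey': f'{datetime.now().date()}',
            'RowKey': f'{datetime.now().microsecond}',
            **self._extra_columns}))


logger.addHandler(_AzureTableHandlerWithColumns(Username="Hesam", Project="Deployment-support-tools-client-portal"))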
I want to add a warning to an old function-based view alerting users that we are moving to class-based views. I see these warnings all the time in Django, such as RemovedInDjango19Warning. It's really cool.
import warnings


def old_view(request, *args, **kwargs):
    print('I can see this message in my terminal output!')
    warnings.warn("But not this message.", DeprecationWarning, stacklevel=2)
    view = NewClassBasedView.as_view()
    return view(request, *args, **kwargs)
I see the print statement in my terminal output but not the warning itself. I get other warnings from Django in the terminal (like the aforementioned RemovedInDjango19Warning) but not this one.
Am I missing something? Do I have to turn warnings on somehow at an import-specific level (even though warnings are working broadly for Django)?
Assuming you are using Django 1.8 and you have no logging configuration, you can read in the Django documentation:
Django uses Python’s builtin logging module to perform system logging. The usage of this module is discussed in detail in Python’s own documentation.
So you can do:
# import the logging library
import logging

# Get an instance of a logger
logger = logging.getLogger(__name__)


def my_view(request, arg1, arg):
    ...
    if bad_mojo:
        # Log an error message
        logger.error('Something went wrong!')
The logging module is very useful and has many features. I suggest you take a look at the Python logging documentation; you can, for example, do:
FORMAT = '%(asctime)-15s %(clientip)s %(user)-8s %(message)s'
logging.basicConfig(format=FORMAT)
d = {'clientip': '192.168.0.1', 'user': 'fbloggs'}
logger = logging.getLogger('tcpserver')
logger.warning('Protocol problem: %s', 'connection reset', extra=d)
That outputs:
2006-02-08 22:20:02,165 192.168.0.1 fbloggs Protocol problem: connection reset
so you can format all your messages and present pretty awesome messages to your users.
If you want to use the warnings module, you have to know that by default Python does not display all warnings raised in the application. There are two ways to change this behaviour:
By parameter: you have to call Python with the -Wd flag to load the default warning filter, so you can do python -Wd manage.py runserver to start the test server.
By program: you need to call warnings.simplefilter('default') just once. You can call it from anywhere, but you have to be sure that this line is executed before any call to warnings.warn. In my tests I placed it at the beginning of the settings.py file, but I am not sure that was the best place; the __init__.py file of the project package would be a nice place. A minimal sketch follows below.
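A minimal sketch of the programmatic option (the module path in the comment is illustrative; any module imported before the warning is raised would work):
# e.g. myproject/__init__.py: make DeprecationWarning and friends visible
# before any warnings.warn() call runs
import warnings

warnings.simplefilter('default')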
from tornado.web import RequestHandler


class HelloWorldHandler(RequestHandler):

    def get(self):
        # self.write("Hello, world...!!!")  # works without any error
        self.render('hello.html')  # but here I get:
        # `500: Internal Server Error` and my console shows `No handlers
        # could be found for logger "tornado.application"`.
What is the issue? I've already Googled No handlers could be found for logger "tornado.application", and surprisingly all the URLs suggest the same method, but I'm unable to implement it.
Here is the same thread on SO.
If your logs were configured correctly, you'd get a stack trace in the logs that would explain what went wrong. The logs are supposed to be configured automatically in IOLoop.start(), so I'm not sure why that's not happening, but you can configure them manually by calling logging.basicConfig() or tornado.options.parse_command_line() at the beginning of main.
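A minimal sketch of that manual configuration (the application and route setup is illustrative and assumes the HelloWorldHandler from the question is defined in the same module):
import logging

import tornado.ioloop
import tornado.web


def main():
    # Configure logging so errors logged to "tornado.application"
    # (including the stack trace from the failing render call) show up.
    logging.basicConfig(level=logging.DEBUG)
    # Alternatively: tornado.options.parse_command_line()

    app = tornado.web.Application([(r"/", HelloWorldHandler)])
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()


if __name__ == "__main__":
    main()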