GCP Cloud Functions printing extra blank line after every print statement - python

I have a Cloud Function on the Python 3.7 runtime, triggered from a Pub/Sub topic.
In the code, I have places where I use print() to write logs. However, when I go to the logs tab of my function, I see that an extra blank line is added after each log. I would like to remove these, since this is basically doubling my usage of the Logging API.
I have tried using print(message, end="") but this did not remove the blank lines.
Thanks in advance.

Although I have not found the root cause of the blank line, I was able to resolve this by using the google-cloud-logging library, as suggested by John in a comment on my question.
The resulting code is below:
import logging
import os

import google.cloud.logging

# Set up the logging client only when running on GCP
if not os.environ.get("DEVELOPMENT"):  # custom environment variable, set only locally
    logging_client = google.cloud.logging.Client()
    logging_client.setup_logging()  # attaches a Cloud Logging handler to the root logger

# Define the logger
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)  # min logging level for the logger

# Add a console handler only when running locally. In Cloud Functions a handler is
# already attached to the logger, and adding another one results in duplicate
# log entries with severity = ERROR.
if os.environ.get("DEVELOPMENT"):
    console_handler = logging.StreamHandler()  # handler that writes to a stream
    console_handler.setLevel(logging.DEBUG)  # min logging level for the handler
    logger.addHandler(console_handler)


def my_function():
    logger.info('info')
This code will:
not send logs to GCP when the function is executed locally;
print INFO and DEBUG logs both locally and on GCP.
Thank you both for your suggestions.
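Because every StreamHandler() call creates a new handler object, running the local setup more than once (for example from a function, or when a module is reloaded in tests) would attach a second console handler and print every record twice. A minimal stdlib-only sketch of an idempotent version of the local branch, assuming a hypothetical helper name configure_logger; the GCP branch (client.setup_logging()) is deliberately not reproduced:

```python
import logging
import os
import sys


def configure_logger(name: str) -> logging.Logger:
    """Hypothetical helper mirroring only the DEVELOPMENT branch of the answer.

    The GCP branch (google.cloud.logging.Client().setup_logging()) is left out.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # min logging level for the logger
    if os.environ.get("DEVELOPMENT") and not logger.handlers:
        # Attach at most one console handler; a second StreamHandler
        # would print every record twice.
        console_handler = logging.StreamHandler(sys.stdout)
        console_handler.setLevel(logging.DEBUG)
        logger.addHandler(console_handler)
    return logger
```

Calling configure_logger(__name__) at the top of each module is then safe to repeat.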

Instead of using print, use a logger:
import logging
logger = logging.getLogger("YourCloudFunctionLoggerName")
logger.setLevel(logging.DEBUG)
logger.info("message")  # instead of print("message")
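A minimal sketch of swapping print() for a named logger with a single handler. To keep it self-contained, the handler writes to an io.StringIO buffer (a demonstration assumption; in a function you would pass sys.stdout or rely on the handler the environment attaches), and make_logger is a hypothetical helper name:

```python
import io
import logging


def make_logger(stream) -> logging.Logger:
    """Hypothetical helper: a named logger used in place of print()."""
    logger = logging.getLogger("YourCloudFunctionLoggerName")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep records off the root logger's handlers
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.handlers = [handler]  # replace rather than append, so repeated calls stay single
    return logger


buffer = io.StringIO()
log = make_logger(buffer)
log.info("processing message")
log.debug("details")
print(buffer.getvalue(), end="")
```

Each record ends with exactly one newline (the handler's default terminator), so two calls produce two lines.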

Related

Async logging in Django

I have created a simple weather web app in Django that uses an API. Logging is enabled, and logs are written to files on Windows. I want logging to be asynchronous, that is, done at the end of execution. How can I do async logging in Django?
In Django, we can only create async views.
The python-logstash package has an async way of logging, but it stores logs in a database on a remote instance (the alternative is to store them in an SQLite3 database). It has no file logging option.
Moreover, async support is new in Django and many complexities are still unresolved. It might cause memory overhead, which can degrade performance. Some links for reference:
https://pypi.org/project/python-logstash/
https://docs.djangoproject.com/en/3.1/topics/async/#:~:text=New%20in%20Django%203.0.,have%20efficient%20long%2Drunning%20requests.
https://deepsource.io/blog/django-async-support/
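Neither answer mentions it, but the standard library already has a way to take file I/O off the request path without async views: logging.handlers.QueueHandler puts records on a queue and QueueListener writes them to the real handler from a background thread. A minimal sketch, assuming a log file in a temporary directory and a hypothetical logger name "weather"; in a Django project you would presumably start the listener once at startup and stop it at shutdown:

```python
import logging
import logging.handlers
import os
import queue
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "weather.log")  # hypothetical file name

# The handler that does the (slow) file I/O, run only by the listener thread.
file_handler = logging.FileHandler(log_path)
file_handler.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))

# Loggers only enqueue records; the listener drains the queue into file_handler.
log_queue = queue.Queue(-1)  # unbounded
listener = logging.handlers.QueueListener(log_queue, file_handler)

logger = logging.getLogger("weather")
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.QueueHandler(log_queue))

listener.start()
logger.info("fetching forecast")
logger.warning("API responded slowly")
listener.stop()  # processes the remaining records, then joins the thread
file_handler.close()

with open(log_path) as f:
    print(f.read(), end="")
```

The calling code pays only for a queue put; the file write happens later on the listener's thread.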
You can use the logging module from the Python standard library:
import logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)  # otherwise the default WARNING level drops info messages
# Set a file as output ("app.log" is just an example name)
handler = logging.FileHandler("app.log")
# Formatter template
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
# add the formatter to the handler
handler.setFormatter(formatter)
# add the handler to the logger
logger.addHandler(handler)
# your messages will be added to the file
logger.error("it's error message")
logger.info("it's info message")
logger.warning("it's warning message")
Official documentation: https://docs.python.org/3/library/logging.html
I hope this helps!
I would suggest starting the Django project like this. The downside is that nothing will be output to the console, but it will work faster than logging in middleware:
nohup python manage.py runserver > file.log

How to redirect another library's console logging messages to a file, in Python

The fastAPI library that I import for an API I have written writes many logging.INFO level messages to the console, which I would like to redirect either to a file-based log or, ideally, to both console and file. Here is an example of fastAPI module logging events in my console:
So I've tried to implement this Stack Overflow answer ("Easy-peasy with Python 3.3 and above"), but the log file it creates ("api_screen.log") is always empty....
import logging

# -------------------------- logging ----------------------------
logging_file = "api_screen.log"
logging_level = logging.INFO
logging_format = ' %(message)s'
logging_handlers = [logging.FileHandler(logging_file), logging.StreamHandler()]
logging.basicConfig(level = logging_level, format = logging_format, handlers = logging_handlers)
logging.info("------logging test------")
Even though my own "------logging test------" message does appear on console within the other fastAPI logs:
As you can see here it's created the file, but it has size zero.
So what do I need to do also to get the file logging working?
There are multiple issues here. First and most importantly: basicConfig does nothing if the root logger already has handlers configured, which fastAPI does. So the handlers you are creating are never used. When you call logging.info(), you are sending a log to the root logger, which is printed because fastAPI has added a handler to it. You are also not setting the level on your handlers. Try this code instead of what you currently have:
import logging

logging_file = "api_screen.log"
logging_level = logging.INFO
logging_fh = logging.FileHandler(logging_file)
logging_sh = logging.StreamHandler()
logging_fh.setLevel(logging_level)
logging_sh.setLevel(logging_level)
root_logger = logging.getLogger()
root_logger.setLevel(logging_level)  # make sure the root logger itself does not drop INFO records
root_logger.addHandler(logging_fh)
root_logger.addHandler(logging_sh)
logging.info('--test--')
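The "basicConfig does nothing" behaviour is easy to reproduce in isolation. Since Python 3.8, basicConfig also accepts force=True, which removes and closes the root logger's existing handlers before applying the new configuration; note that this also discards the console handler the framework installed, so only the handlers you pass remain. A minimal sketch in which a pre-added StreamHandler stands in for whatever the framework attached (whether a given framework uses the root logger depends on its version):

```python
import logging
import os
import tempfile

root = logging.getLogger()
root.addHandler(logging.StreamHandler())  # stand-in for a handler the framework already added

log_file = os.path.join(tempfile.mkdtemp(), "api_screen.log")

# Without force, basicConfig() returns early because the root logger already has a handler.
file_handler = logging.FileHandler(log_file)
logging.basicConfig(level=logging.INFO, handlers=[file_handler])
was_ignored = file_handler not in root.handlers
file_handler.close()  # it was never attached, so release the file ourselves

# Python 3.8+: force=True removes and closes the existing root handlers, then configures.
logging.basicConfig(
    level=logging.INFO,
    format=" %(message)s",
    handlers=[logging.FileHandler(log_file), logging.StreamHandler()],
    force=True,
)
logging.info("------logging test------")
```

Adding handlers to the root logger, as in the answer, keeps the framework's handler; force=True replaces it.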

How do I override an existing logger in another library?

I'm using Python's Tornado library for my web service, and I want every log that is created by my code as well as by Tornado to be JSON formatted. I've tried setting the formatter on the root logger and on all the other loggers. This is the hack I'm currently trying to get working. It seems to me like it should work... However, when I run the app, all of the logs from Tornado are still in their standard format.
import logging
from tornado.log import access_log, app_log, gen_log
import logmatic
loggers = [
logging.getLogger(),
logging.getLogger('tornado.access'),
logging.getLogger('tornado.application'),
logging.getLogger('tornado.general'),
access_log,
gen_log,
app_log
]
json_formatter = logmatic.JsonFormatter()
for logger in loggers:
    for hand in logger.handlers:
        hand.setFormatter(json_formatter)
logging.getLogger('tornado.access').warning('All the things')
# WARNING:tornado.access (172.26.0.6) 0.47ms
# NOT JSON???
NOTE: When I include the logger for my service, logging.getLogger('myservice'), in the loggers list and run this, it does get the updated formatter and spits out JSON. This rules out problems with the logmatic formatter. I just can't get the formatter to work for the Tornado loggers.
Tornado's loggers don't have any handlers before loop.start() is called, so your loop has no handlers to update. Instead, add a handler with the predefined formatting to the loggers yourself:
formatter = logging.Formatter(...)
handler = logging.StreamHandler()
handler.setFormatter(formatter)
for logger in loggers:
    logger.setLevel(logging.WARNING)
    logger.addHandler(handler)
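One caveat with that loop: the list contains both the root logger and the tornado.* loggers, and child loggers propagate to the root by default, so a record from tornado.access reaches the same handler twice and is printed twice. A minimal stdlib-only sketch that attaches the handler once, to the root logger, and relies on propagation. The small JsonFormatter class is a hypothetical stand-in for logmatic.JsonFormatter, and io.StringIO replaces the console for demonstration:

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Hypothetical stand-in for logmatic.JsonFormatter: one JSON object per record."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "logger": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        })


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

root = logging.getLogger()
root.setLevel(logging.WARNING)
root.addHandler(handler)  # attach once; tornado.* loggers propagate up to the root

logging.getLogger("tornado.access").warning("All the things")
print(stream.getvalue(), end="")
```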

Logging to separate files in Python

I'm using python's logging module. I've initialized it as:
import logging
logger = logging.getLogger(__name__)
in each of my modules. Then, in the main file:
logging.basicConfig(level=logging.INFO,filename="log.txt")
Now, in the app I'm also using WSGIServer from gevent. Its initializer takes a log argument where I can pass a logger instance. Since this is an HTTP server, it's very verbose.
I would like to log all of my app's regular logs to "log.txt" and WSGIServer's logs to "http-log.txt".
I tried this:
import logging
from gevent.pywsgi import WSGIServer

# app and config come from the rest of the application
logging.basicConfig(level=logging.INFO,filename="log.txt")
logger = logging.getLogger(__name__)
httpLogger = logging.getLogger("HTTP")
httpLogger.addHandler(logging.FileHandler("http-log.txt"))
httpLogger.addFilter(logging.Filter("HTTP"))
http_server = WSGIServer(('0.0.0.0', int(config['ApiPort'])), app, log=httpLogger)
This logs all HTTP messages into http-log.txt, but also into the main log file (log.txt).
How can I send all but HTTP messages to the default logger (log.txt), and HTTP messages only to http-log.txt?
EDIT: Since people are quick to point out that "Logging to two files with different settings" already has an answer, please read the linked answer and you'll see it doesn't use basicConfig but rather initializes each logger separately. This is not how I'm using the logging module.
Add the following line to disable propagation:
httpLogger.propagate = False
Then it will no longer propagate messages to its ancestors' handlers, which include the root logger's handler that you set up for the general log file.
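A minimal stdlib-only sketch of the propagate fix, using temporary files instead of the real paths and a plain httpLogger.info() call as a stand-in for a request line written by WSGIServer (gevent is not needed to see the effect):

```python
import logging
import os
import tempfile

log_dir = tempfile.mkdtemp()
app_log_path = os.path.join(log_dir, "log.txt")
http_log_path = os.path.join(log_dir, "http-log.txt")

# Root logger -> log.txt, as in the question. force=True (Python 3.8+) is only here
# so the sketch behaves the same even if the root logger was configured earlier.
logging.basicConfig(level=logging.INFO, filename=app_log_path, force=True)

logger = logging.getLogger(__name__)
httpLogger = logging.getLogger("HTTP")
httpLogger.addHandler(logging.FileHandler(http_log_path))
httpLogger.propagate = False  # HTTP records stop here and never reach the root handler

logger.info("application message")
httpLogger.info("GET / 200")  # stand-in for a request line

logging.shutdown()  # flush and close every handler before reading the files
```

Afterwards log.txt contains only the application message and http-log.txt only the request line.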

Duplicate log entries with Google Cloud Stackdriver logging of Python code on Kubernetes Engine

I have a simple Python app running in a container on Google Kubernetes Engine. I am trying to connect the standard Python logging to Google Stackdriver logging using this guide. I have almost succeeded, but I am getting duplicate log entries with one always at the 'error' level...
Screenshot of Stackdriver logs showing duplicate entries
This is my Python code that sets up the logging according to the above guide:
import webapp2
from paste import httpserver
import rpc
# Imports the Google Cloud client library
import google.cloud.logging
# Instantiates a client
client = google.cloud.logging.Client()
# Connects the logger to the root logging handler; by default this captures
# all logs at INFO level and higher
client.setup_logging()
app = webapp2.WSGIApplication([('/rpc/([A-Za-z]+)', rpc.RpcHandler),], debug=True)
httpserver.serve(app, host='0.0.0.0', port='80')
Here's the code that triggers the logs from the screenshot:
import logging
logging.info("INFO Entering PostEchoPost...")
logging.warning("WARNING Entering PostEchoPost...")
logging.error("ERROR Entering PostEchoPost...")
logging.critical("CRITICAL Entering PostEchoPost...")
Here is the full Stackdriver log, expanded from the screenshot, with an incorrectly interpreted ERROR level:
{
  insertId: "1mk4fkaga4m63w1"
  labels: {
    compute.googleapis.com/resource_name: "gke-alg-microservice-default-pool-xxxxxxxxxx-ttnz"
    container.googleapis.com/namespace_name: "default"
    container.googleapis.com/pod_name: "esp-alg-xxxxxxxxxx-xj2p2"
    container.googleapis.com/stream: "stderr"
  }
  logName: "projects/projectname/logs/algorithm"
  receiveTimestamp: "2018-01-03T12:18:22.479058645Z"
  resource: {
    labels: {
      cluster_name: "alg-microservice"
      container_name: "alg"
      instance_id: "703849119xxxxxxxxxx"
      namespace_id: "default"
      pod_id: "esp-alg-xxxxxxxxxx-xj2p2"
      project_id: "projectname"
      zone: "europe-west1-b"
    }
    type: "container"
  }
  severity: "ERROR"
  textPayload: "INFO Entering PostEchoPost...
"
  timestamp: "2018-01-03T12:18:20Z"
}
Here is the full Stackdriver log, expanded from the screenshot, with a correctly interpreted INFO level:
{
  insertId: "1mk4fkaga4m63w0"
  jsonPayload: {
    message: "INFO Entering PostEchoPost..."
    thread: 140348659595008
  }
  labels: {
    compute.googleapis.com/resource_name: "gke-alg-microservi-default-pool-xxxxxxxxxx-ttnz"
    container.googleapis.com/namespace_name: "default"
    container.googleapis.com/pod_name: "esp-alg-xxxxxxxxxx-xj2p2"
    container.googleapis.com/stream: "stderr"
  }
  logName: "projects/projectname/logs/algorithm"
  receiveTimestamp: "2018-01-03T12:18:22.479058645Z"
  resource: {
    labels: {
      cluster_name: "alg-microservice"
      container_name: "alg"
      instance_id: "703849119xxxxxxxxxx"
      namespace_id: "default"
      pod_id: "esp-alg-xxxxxxxxxx-xj2p2"
      project_id: "projectname"
      zone: "europe-west1-b"
    }
    type: "container"
  }
  severity: "INFO"
  timestamp: "2018-01-03T12:18:20.260099887Z"
}
So, this entry might be the key:
container.googleapis.com/stream: "stderr"
It looks like, in addition to my logging set-up working, all logs from the container are being sent to stderr in the container, and I believe that by default, at least on Kubernetes Container Engine, all stdout/stderr output is picked up by Google Stackdriver via FluentD... Having said that, I'm out of my depth at this point.
Any ideas why I am getting these duplicate entries?
I solved this problem by overwriting the handlers property of my root logger immediately after calling the setup_logging method:
import logging
from google.cloud import logging as gcp_logging
from google.cloud.logging.handlers import CloudLoggingHandler, ContainerEngineHandler, AppEngineHandler
logging_client = gcp_logging.Client()
logging_client.setup_logging(log_level=logging.INFO)
root_logger = logging.getLogger()
# use the GCP handler ONLY in order to prevent logs from getting written to STDERR
root_logger.handlers = [handler
for handler in root_logger.handlers
if isinstance(handler, (CloudLoggingHandler, ContainerEngineHandler, AppEngineHandler))]
To elaborate on this a bit, the client.setup_logging method sets up two handlers: a normal logging.StreamHandler and a GCP-specific handler. So logs will go to both stderr and Cloud Logging. You need to remove the stream handler from the handlers list to prevent the duplication.
EDIT:
I have filed an issue with Google to add an argument to make this less hacky.
The problem is in how the logging client initializes the root logger:
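The duplication, and the fix of filtering the handler list by type, can be reproduced without any Google libraries. In this stdlib-only sketch, CloudHandlerStandIn is a hypothetical subclass standing in for the GCP handler; with it and a plain StreamHandler on the root logger every record is emitted twice, and keeping only the "cloud" handler leaves one copy:

```python
import io
import logging


class CloudHandlerStandIn(logging.StreamHandler):
    """Hypothetical stand-in for a google-cloud-logging handler."""


sink = io.StringIO()  # both handlers write here so duplicates are visible
root = logging.getLogger()
root.setLevel(logging.INFO)
root.handlers = [CloudHandlerStandIn(sink), logging.StreamHandler(sink)]

logging.info("Entering PostEchoPost...")
before = sink.getvalue().count("Entering PostEchoPost...")

# Keep only the "cloud" handler, mirroring the isinstance() filter in the answer.
root.handlers = [h for h in root.handlers if isinstance(h, CloudHandlerStandIn)]
sink.seek(0)
sink.truncate()
logging.info("Entering PostEchoPost...")
after = sink.getvalue().count("Entering PostEchoPost...")

print(before, after)
```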
logger = logging.getLogger()
logger.setLevel(log_level)
logger.addHandler(handler)
logger.addHandler(logging.StreamHandler())
It adds a default stream handler in addition to the Stackdriver handler.
My workaround for now is to initialize appropriate Stackdriver handler manually:
# This manually sets up a logger compatible with GKE/fluentd,
# because the logging Client automatically adds another StreamHandler,
# so log records are duplicated.
import logging
import sys

from google.cloud.logging.handlers import ContainerEngineHandler

level = logging.INFO  # minimum level to send

formatter = logging.Formatter("%(message)s")
handler = ContainerEngineHandler(stream=sys.stderr)
handler.setFormatter(formatter)
handler.setLevel(level)
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(level)
I'm writing in 2022, shortly after v3.0.0 of google-cloud-logging was released, and this issue cropped up for me too (albeit almost certainly for a different reason).
Debugging
The most useful thing I did on the way to debugging it was stick the following in my code:
import logging
...
root_logger = logging.getLogger() # no arguments = return the root logger
print(root_logger.handlers, flush=True) # tell me what handlers are attached
...
If you're getting duplicate logs, it seems certain that it's because you've got multiple handlers attached to your logger, and Stackdriver is catching logs from both of them! To be fair, that is Stackdriver's job; it's just a pity that google-cloud-logging can't sort this out by default.
The good news is that Stackdriver will also catch the print statement (which goes to the STDOUT stream). In my case, the following list of handlers was logged: [<StreamHandler <stderr> (NOTSET)>, <StructuredLogHandler <stderr> (NOTSET)>]. So: two handlers were attached to the root logger.
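Extending that debugging idea, a small helper can report the handlers on the root logger and on any named loggers you suspect, together with their propagate flag (a named logger that propagates also sends its records to every root handler). A stdlib-only sketch; describe_handlers is a hypothetical name and the logger names are up to you:

```python
import logging


def describe_handlers(*names: str) -> dict:
    """Hypothetical debug helper: logger name -> (handler class names, propagate)."""
    report = {}
    for name in ("root",) + names:
        logger = logging.getLogger() if name == "root" else logging.getLogger(name)
        report[name] = ([type(h).__name__ for h in logger.handlers], logger.propagate)
    return report


print(describe_handlers("myservice"), flush=True)
```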
Fixing it
You might be able to find that your code is attaching the handler somewhere else, and simply remove that part. But it may instead be the case that e.g. a dependency is setting up the extra handler, something I wrestled with.
I used a solution based on the answer written by Andy Carlson. Keeping it general/extensible:
import google.cloud.logging
import logging


def is_cloud_handler(handler: logging.Handler) -> bool:
    """
    is_cloud_handler
    Returns True or False depending on whether the input is a
    google-cloud-logging handler class
    """
    accepted_handlers = (
        google.cloud.logging.handlers.StructuredLogHandler,
        google.cloud.logging.handlers.CloudLoggingHandler,
        google.cloud.logging.handlers.ContainerEngineHandler,
        google.cloud.logging.handlers.AppEngineHandler,
    )
    return isinstance(handler, accepted_handlers)


def set_up_logging():
    # here we assume you'll be using the basic logging methods
    # logging.info, logging.warning etc. which invoke the root logger
    client = google.cloud.logging.Client()
    client.setup_logging()
    root_logger = logging.getLogger()
    root_logger.handlers = [h for h in root_logger.handlers if is_cloud_handler(h)]
More context
For those who find this solution confusing
In Python there is a separation between 'loggers' and 'handlers': loggers generate logs, and handlers decide what happens to them. Thus, you can attach multiple handlers to the same logger (in case you want multiple things to happen to the logs from that logger).
The google-cloud-logging library suggests that you run its setup_logging method and then just use the basic logging methods of the built in logging library to create your logs. These are: logging.debug, logging.info, logging.warning, logging.error, and logging.critical (in escalating order of urgency).
All logging.Logger instances have the same methods, including a special Logger instance called the root logger. If you look at the source code for the basic logging methods, they simply call these methods on this root logger.
It's possible to set up specific Loggers, which is standard practice to demarcate logs generated by different areas of an application (rather than sending everything via the root logger). This is done using logging.getLogger("name-of-logger"). However, logging.getLogger() with no argument returns the root logger.
Meanwhile, the purpose of the google.cloud.logging.Client.setup_logging method is to attach a special log handler to the root logger. Thus, logs created using logging.info etc. will be handled by a google-cloud-logging handler. But you have to make sure no other handlers are also attached to the root logger.
Fortunately, Loggers have a property, .handlers, which is a list of attached log handlers. In this solution we just edit that list to ensure we have just one handler.
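A short stdlib-only sketch of those relationships: logging.getLogger() with no argument returns the root logger (the same object as logging.root), the module-level logging.info() goes through it, and a record from a named logger reaches the root's single handler through propagation. The io.StringIO sink is only for demonstration:

```python
import io
import logging

sink = io.StringIO()
root = logging.getLogger()  # no argument: the root logger
root.setLevel(logging.INFO)
handler = logging.StreamHandler(sink)
handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
root.handlers = [handler]  # exactly one handler on the root

logging.info("from the module-level function")  # uses the root logger
logging.getLogger("name-of-logger").info("from a named logger")  # propagates to root

print(root is logging.root)
print(sink.getvalue(), end="")
```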
