Something just doesn't click internally for me with Python's logging, despite reading the documentation.
I have this code
import logging
logging.basicConfig(level=logging.INFO,format='%(levelname)s::%(message)s')
LOG = logging.getLogger("__name__")
LOG.info("hey")
If I run it from bash I get this:
INFO::hey
If I run it in an AWS Lambda, the "hey" doesn't show up at all in the logs.
I then did a test setting the level on the logger by adding this:
LOG.setLevel(logging.INFO)
Run from bash I get the same thing I got before (the desired format), but run from the Lambda this shows up in the logs:
[INFO] 2022-02-14T23:30:43.39Z eb94600a-af45-4124-99b6-d9651d6a3cf6 hey
Okay... that is pretty odd. The format is not the same as from bash.
I thought I could rationalize the first example because the output on bash actually goes to stderr, and I assumed the AWS Lambda logs just don't capture that. But the second example also goes to stderr on bash, yet it shows up in the Lambda logs, just with the wrong format. So clearly I am missing something.
What is going on under the hood here?
When your Lambda function runs, a harness does some basic bootstrapping, then loads your module and invokes it. Part of that bootstrap in the AWS Lambda Python runtime installs its own handler on the standard Python root logger:
logger_handler = LambdaLoggerHandler(log_sink)
logger_handler.setFormatter(
    logging.Formatter(
        "[%(levelname)s]\t%(asctime)s.%(msecs)dZ\t%(aws_request_id)s\t%(message)s\n",
        "%Y-%m-%dT%H:%M:%S",
    )
)
logger_handler.addFilter(LambdaLoggerFilter())
This behavior is formally documented by AWS in "AWS Lambda function logging in Python", under the section "Logging library".
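This also explains your first result: the root logger already has the runtime's handler attached, so logging.basicConfig() is effectively a no-op (it only configures the root logger when it has no handlers yet), and your INFO record is then dropped by the root logger's default WARNING level. Once you call LOG.setLevel(logging.INFO), the record gets through, but it is rendered by the runtime's handler with AWS's format. If you want your own format in CloudWatch, one option is to keep the pre-installed handler and swap its formatter; a minimal sketch of that idea, not an official recipe:

import logging

# The Lambda runtime has already attached its own handler to the root logger,
# so reuse it instead of calling basicConfig().
root = logging.getLogger()
root.setLevel(logging.INFO)
for handler in root.handlers:
    handler.setFormatter(logging.Formatter('%(levelname)s::%(message)s'))

LOG = logging.getLogger(__name__)

def lambda_handler(event, context):
    LOG.info("hey")  # appears in CloudWatch as INFO::hey
    return {}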
Related
I am trying to learn AWS Greengrass, so I was following this tutorial, https://docs.aws.amazon.com/greengrass/latest/developerguide/gg-gs.html, which explains step by step how to set up Greengrass on a Raspberry Pi and publish some messages using a Lambda function.
A simple Lambda function looks like this:
import greengrasssdk
import platform
from threading import Timer
import time

# Creating a greengrass core sdk client
client = greengrasssdk.client('iot-data')

# Retrieving platform information to send from Greengrass Core
my_platform = platform.platform()


def greengrass_hello_world_run():
    if not my_platform:
        client.publish(topic='hello/world', payload='hello Sent from Greengrass Core.')
    else:
        client.publish(topic='hello/world', payload='hello Sent from Greengrass Core running on platform: {}'.format(my_platform))

    # Asynchronously schedule this function to be run again in 5 seconds
    Timer(5, greengrass_hello_world_run).start()


# Execute the function above
greengrass_hello_world_run()


# This is a dummy handler and will not be invoked
# Instead the code above will be executed in an infinite loop for our example
def function_handler(event, context):
    return
This works, but I am trying to understand it better by having the Lambda function do some extra work, for example opening a file and writing to it.
I modified the greengrass_hello_world_run() function as follows:
def greengrass_hello_world_run():
    if not my_platform:
        client.publish(topic='hello/world', payload='hello Sent from Greengrass Core.')
    else:
        stdout = "hello from greengrass\n"
        with open('/home/pi/log', 'w') as file:
            for line in stdout:
                file.write(line)
        client.publish(topic='hello/world', payload='hello Sent from Greengrass Core running on platform: {}'.format(my_platform))
I expect that upon deploying, the daemon running on my local Pi should create that file in the given directory, because I believe Greengrass Core runs this Lambda function on the local device. However, it doesn't create any file, nor does it publish anything, so I believe this code might be breaking. I'm not sure how, though; I tried looking into CloudWatch but I don't see any events or errors being reported.
Any help on this would be really appreciated, cheers!
A few thoughts on this...
If you turn on local logging in your Greengrass group settings, it will start to write logs locally on your Pi. The logs are located in /greengrass/ggc/var/log/system.
If you tail python_runtime.log you can see any errors from the Lambda execution.
If you want to access local resources, you will need to create a resource in your Greengrass group definition. You can then point it at a volume, which is where you can write your file.
You do need to deploy your group once this has been done for the changes to take effect.
I think I found the answer: we have to create the resource in the Lambda environment and also make sure to give the Lambda read and write access to that resource. By default the Lambda can only access the /tmp folder.
Here is the link to the documentation: https://docs.aws.amazon.com/greengrass/latest/developerguide/access-local-resources.html
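As an illustration only, here is a rough sketch of how the write could look once such a local volume resource is attached to the function. The /dest/log-dir destination path is made up, and everything not shown here (client, my_platform, Timer) is set up exactly as in the snippet above; without an attached resource, /tmp is the only writable location:

import os

# '/dest/log-dir' is the (hypothetical) destination path of the attached
# local volume resource; fall back to /tmp, which is always writable.
LOG_DIR = '/dest/log-dir' if os.path.isdir('/dest/log-dir') else '/tmp'

def greengrass_hello_world_run():
    with open(os.path.join(LOG_DIR, 'log'), 'a') as f:
        f.write('hello from greengrass\n')
    client.publish(topic='hello/world',
                   payload='hello Sent from Greengrass Core running on platform: {}'.format(my_platform))
    # Re-schedule, same as the original example
    Timer(5, greengrass_hello_world_run).start()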
Updated:
Turns out, this is not a function of cron. I get the same behavior when running the script from the command line, if it in fact has a record to process and communicates with ElasticSearch.
I have a cron job that runs a python script which uses pyelasticsearch to index some documents in an ElasticSearch instance. The script works fine from the command line, but when run via cron, it results in this error:
No handlers could be found for logger "elasticsearch.trace"
Clearly there's some logging configuration issue that only crops up when run under cron, but I'm not clear what it is. Any insight?
I solved this by explicitly configuring a handler for the elasticsearch.trace logger, as I saw in examples from the pyelasticsearch repo.
After importing pyelasticsearch, set up a handler like so:
import logging

tracer = logging.getLogger('elasticsearch.trace')
tracer.setLevel(logging.INFO)
tracer.addHandler(logging.FileHandler('/tmp/es_trace.log'))
I'm not interested in keeping the trace logs, so I used the near-at-hand Django NullHandler.
from django.utils.log import NullHandler
tracer = logging.getLogger('elasticsearch.trace')
tracer.setLevel(logging.INFO)
tracer.addHandler(NullHandler())
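If you don't want the Django dependency, the standard library has its own no-op handler (logging.NullHandler, available since Python 2.7 / 3.1), which does the same job:

import logging

tracer = logging.getLogger('elasticsearch.trace')
tracer.setLevel(logging.INFO)
tracer.addHandler(logging.NullHandler())  # swallow the trace records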
I am trying to set up logging when using IPython parallel. Specifically, I would like to redirect log messages from the engines to the client. So, rather than each of the engines logging individually to their own log files, as in "IPython.parallel - can I write my own log into the engine logs?", I am looking for something like "How should I log while using multiprocessing in Python?".
Based on reviewing the IPython code base, I have the impression that the way to do this would be to register a zmq.log.handlers.PUBHandler with the logging module (see the documentation in iploggerapp.py). I have tried this in various ways, but none seem to work. I also tried to register a logger via IPython.parallel.util.connect_engine_logger, but this also does not appear to do anything.
Update:
I have made some progress on this problem. If I specify c.IPEngineApp.log_url in ipengine_config, then the logger of the IPython application has the appropriate EnginePUBHandler. I checked this via:
%%px
from IPython.config import Application
log = Application.instance().log
print(log.handlers)
This indicated that the application logger has an EnginePUBHandler for each engine. Next, I can start the iplogger app in a separate terminal and see the log messages from each engine.
However, what I would like to achieve is to see these log messages in the notebook, rather than in a separate terminal. I have tried starting iplogger from within the notebook via a system call, but this crashes.
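For what it's worth, here is a rough sketch of the bare PUB/SUB idea outside of the iplogger machinery, under these assumptions: the client binds a SUB socket on a port the engines can reach (5557 is made up), each engine connects a PUB socket and hands it to pyzmq's PUBHandler, and you poll for records from a notebook cell. None of this uses IPython.parallel's own log plumbing, so treat it as a starting point rather than the blessed route:

# --- client / notebook process: bind a SUB socket and drain records on demand
import zmq

ctx = zmq.Context.instance()
sub = ctx.socket(zmq.SUB)
sub.bind("tcp://*:5557")            # engines connect to this endpoint
sub.setsockopt(zmq.SUBSCRIBE, b"")  # subscribe to every topic

def drain_engine_logs():
    # Non-blocking: print whatever log records have arrived so far.
    while sub.poll(timeout=0):
        topic, message = sub.recv_multipart()
        print(topic.decode(), message.decode(), end="")

# --- each engine (e.g. run via %%px): publish log records over zmq
import logging
import zmq
from zmq.log.handlers import PUBHandler

pub = zmq.Context.instance().socket(zmq.PUB)
pub.connect("tcp://127.0.0.1:5557")  # adjust the address for remote engines

handler = PUBHandler(pub)
handler.root_topic = "engine"
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.INFO)
logging.getLogger().info("hello from an engine")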
Sometimes I need to check the output of a Python web application.
If I execute the application directly, I can see it in the terminal.
But I have no idea how to check that for mod_wsgi. Will it appear in a separate Apache log? Or do I need to add some code for logging?
Instead of print "message", you could use sys.stderr.write("message"); under mod_wsgi, output written to stderr ends up in the Apache error log.
For logging to stderr with a StreamHandler:
import logging
import sys

handler = logging.StreamHandler(stream=sys.stderr)
log = logging.getLogger(__name__)
log.setLevel(logging.INFO)
log.addHandler(handler)
log.info("Message")
wsgilog is simple and can redirect standard output to a log file automatically for you. It's a breeze to set up, and I haven't had any real problems with it.
No WSGI application component which claims to be portable should write to standard output. That is, an application should not use the Python print statement without directing output to some alternate stream. An application should also not write directly to sys.stdout. (ModWSGI Wiki)
So don't... Instead, I recommend using a log aggregation tool like Sentry. It is useful while developing and a must-have in production.
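If you go that route, wiring a plain WSGI app to Sentry is only a couple of lines with the current sentry_sdk package; a minimal sketch (the DSN is a placeholder):

import sentry_sdk
from sentry_sdk.integrations.wsgi import SentryWsgiMiddleware

sentry_sdk.init(dsn="https://<key>@<your-sentry-host>/<project-id>")

def application(environ, start_response):
    # Unhandled exceptions raised here are reported to Sentry by the middleware.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]

application = SentryWsgiMiddleware(application)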
The mongo shell includes a useful print command.
When executing map_reduce from pymongo, how can one print / log info from within a javascript block?
Update: OK, I have an answer. The process running mongo will output whatever is printed with print(). A second option is to configure mongo logging (and probably tail the log files). Virtual bonus points for answering "Can I still get these values directly in Python via pymongo?"
You can still use print() inside the JS block, which will get written to /var/log/mongodb/mongodb.log (more info here). Because map/reduce happens server-side, it won't emit anything in your Python application.
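To make the server-side behaviour concrete, here is a small sketch using the pre-PyMongo-4 Collection.map_reduce() helper and a made-up "events" collection; the print() calls in the JS land in the mongod log, while Python only ever sees the output collection:

from pymongo import MongoClient
from bson.code import Code

db = MongoClient().test

mapper = Code("""
    function () {
        print('mapping _id ' + this._id);   // written to the mongod log, not to Python
        emit(this.category, 1);
    }
""")

reducer = Code("""
    function (key, values) {
        return Array.sum(values);
    }
""")

# Runs server-side; "event_counts" is the name of the output collection.
result = db.events.map_reduce(mapper, reducer, "event_counts")
for doc in result.find():
    print(doc)  # this print() runs in Python, so it shows up in your terminal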