python logging in AWS Fargate, datetime duplicated

I'm trying to use the Python logging module in AWS Fargate. The same application should also work locally, so I'd like to use a custom logger for local runs while keeping the CloudWatch logs intact.
This is what I'm doing:
import logging

if logging.getLogger().hasHandlers():
    log = logging.getLogger()
    log.setLevel(logging.INFO)
else:
    from logging.handlers import RotatingFileHandler
    log = logging.getLogger('sm')
    log.root.setLevel(logging.INFO)
    ...
But I get this in CloudWatch:
2023-02-08T13:06:27.317+01:00 08/02/2023 12:06 - sm - INFO - Starting
And this locally:
08/02/2023 12:06 - sm - INFO - Starting
I thought Fargate already defined a logger, but apparently the following check has no effect:
logging.getLogger().hasHandlers()
Ideally, this is the desired log in CloudWatch:
2023-02-08T13:06:27.317+01:00 sm - INFO - Starting

Fargate just runs Docker containers. It doesn't do any setup of the Python code that happens to be running in that container, and it doesn't even know or care that you are running Python.
Anything the container's primary process writes to STDOUT/STDERR gets sent to CloudWatch Logs, so if you want to be compatible with ECS CloudWatch Logs, just make sure you write logs to the console in the format you want.
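A minimal sketch of that advice: always log to stdout with a StreamHandler, and only include %(asctime)s locally, since CloudWatch already prepends its own ingestion timestamp. The RUNNING_IN_ECS environment variable used as the switch is an assumption for illustration; you would set it yourself in the task definition.
import logging
import os
import sys

# Hypothetical flag: set RUNNING_IN_ECS=1 in the ECS task definition yourself.
in_ecs = os.environ.get('RUNNING_IN_ECS') == '1'

# In ECS, omit %(asctime)s because CloudWatch adds its own timestamp;
# locally, keep it so log lines are readable on their own.
fmt = ('%(name)s - %(levelname)s - %(message)s' if in_ecs
       else '%(asctime)s - %(name)s - %(levelname)s - %(message)s')

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter(fmt))

log = logging.getLogger('sm')
log.setLevel(logging.INFO)
log.addHandler(handler)
log.info('Starting')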

You can use logging.basicConfig() to configure the root logger. The module-level debug, info, warning, error and critical functions call basicConfig() automatically if the root logger has no handlers.
logging.basicConfig(filename='test.log', format='%(filename)s: %(message)s',
                    level=logging.DEBUG)
Set the logging format to include the details you need, passed as arguments to basicConfig:
logging.basicConfig(format='%(asctime)s %(name)s - %(levelname)s - %(message)s', level=logging.INFO)
Use this to format the logs in CloudWatch. I found one Stack Overflow answer with a more detailed explanation here.
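To match the desired CloudWatch output from the question (a single timestamp followed by the logger name), a variant without %(asctime)s should work, on the assumption that the timestamp CloudWatch prepends on ingestion is enough:
import logging

# CloudWatch Logs prepends its own ingestion timestamp, so %(asctime)s is omitted here.
logging.basicConfig(format='%(name)s - %(levelname)s - %(message)s', level=logging.INFO)
logging.getLogger('sm').info('Starting')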

Related

Log who ran a python script: cron or human?

I created a Python script which is usually run by a cron job, but it can at times be run manually by a human. Is it possible to determine who ran the script and save that in a log file?
I'm using Python's logging library. It seems the LogRecord 'name' attribute only shows 'root' as the logger used to log the call.
log_format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
How about using command line options?
https://docs.python.org/3/library/argparse.html
When triggering the script from cron, pass a special argument that defaults to something else if not explicitly set; see the sketch below.
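A small sketch of that suggestion; the --invoked-by flag name is made up for illustration, and the crontab entry has to pass it explicitly:
import argparse
import logging

logging.basicConfig(filename='script.log', level=logging.INFO,
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

parser = argparse.ArgumentParser()
# Defaults to 'human'; the cron entry passes --invoked-by cron explicitly, e.g.
# */5 * * * * /usr/bin/python3 /path/to/script.py --invoked-by cron
parser.add_argument('--invoked-by', default='human', choices=['human', 'cron'])
args = parser.parse_args()

logging.getLogger('myscript').info('Script started by %s', args.invoked_by)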

Add python logger to stream logs to CloudWatch within Fargate

I have a Docker container with a Python script (Python 3.8), which I execute in AWS Fargate via Airflow (ECSOperator). The script streams several logs to CloudWatch using the awslogs driver defined in the task definition. I'm able to see all the logs in CloudWatch, but the problem is that the logs are always attached to a main log message, that is, my logs are shown inside another log message.
Here is an example of a log, where the first 3 columns are injected automatically, whereas the rest of the message refers to my custom log:
[2021-11-04 17:23:22,026] {{ecs.py:317}} INFO - [2021-11-04T17:22:47.719000] 2021-11-04 17:22:47,718 - myscript - WARNING - testing log message
Thus, no matter which log level I set, the first log message is always INFO. It seems like something that Fargate adds automatically. I would like my log message to stream directly to CloudWatch without being wrapped in another log message, just:
[2021-11-04T17:22:47.719000] 2021-11-04 17:22:47,718 - myscript - WARNING - testing log message
I assume that I'm not configuring the logger correctly or that I have to get another logger, but I don't know how to do it properly. These are some of the approaches I followed and the results I obtained.
Prints
If I use prints within my code, the messages go to stdout, so they are streamed to CloudWatch through the awslogs driver.
[2021-11-04 17:23:22,026] {{ecs.py:317}} INFO - testing log message
Logging without configuration
If I use the logger without any ConsoleHandler or StreamHandler configured, the generated log messages are equal to the ones created with prints.
import logging
logger = logging.getLogger(__name__)
logger.warning('testing log message')
[2021-11-04 17:23:22,026] {{ecs.py:317}} INFO - testing log message
Logging with StreamHandler
If I configure a StreamHandler with a formatter, then my log is attached to the main log, as stated before. Thus, it just replaces the string message (last column) with the newly formatted log message.
import logging
import sys

logger = logging.getLogger(__name__)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.warning('testing log message')
[2021-11-04 17:23:22,026] {{ecs.py:317}} INFO - [2021-11-04T17:22:47.719000] 2021-11-04 17:22:47,718 - myscript - WARNING - testing log message
This is the log configuration defined within the task definition:
"logConfiguration": {
"logDriver": "awslogs",
"secretOptions": [],
"options": {
"awslogs-group": "/ecs/my-group",
"awslogs-region": "eu-west-1",
"awslogs-stream-prefix": "ecs"
}
EDIT 1
I've been investigating the logs in Cloudwatch and I found out that the logs are streaming to 2 different log groups, since I'm using Airflow to launch fargate.
Airflow group: Airflow automatically creates a log group named <airflow_environment>-Task where it places the logs generated within the tasks. Here, it seems that Airflow wraps my custom logs within its own log messages, which are always INFO. The Airflow UI shows the logs obtained from this log group.
ECS group: this is the log group defined in the TaskDefinition (/ecs/my-group). In this group, the logs are streamed as they are, without being wrapped.
Hence, the problem seems to be with Airflow, as it wraps the logs within its own logger and shows those logs in the Airflow UI. In any case, the logs are correctly delivered and formatted in the log group defined in the task definition.
Probably a bit late and you may have already solved it, but I think the solution is here. Fargate probably pre-configures a log handler, like Lambda does, given that the awslogs configuration is in the task definition.
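A minimal sketch of that idea, assuming the fix is the same as in the linked Lambda answer: drop whatever handlers the runtime may have pre-configured on the root logger and install your own:
import logging
import sys

root = logging.getLogger()

# Remove any handlers that may have been pre-configured by the runtime.
for existing in list(root.handlers):
    root.removeHandler(existing)

# Install a single stdout handler with the desired format.
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
root.addHandler(handler)
root.setLevel(logging.WARNING)

logging.getLogger('myscript').warning('testing log message')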

How to disable discord.py logger?

So I've tried to add a logger to my Discord bot, to see logs in a file and not just in the console, because it's irritating to reset the app and find out the logs I need have already been destroyed. I set it up like this:
logging.basicConfig(filename='CoronaLog.log', level=logging.DEBUG, format='%(levelname)s %(asctime)s %(message)s')
I learned the hard way that the discord.py library installs its own logger, so now my logs look like one big mess. Is there any way to disable discord.py's logging, or at least output it to another file?
EDIT: I've tried creating two loggers, so it would look like this:
logging.basicConfig(filename='discord.log', level=logging.DEBUG, format='%(levelname)s %(asctime)s %(message)s')
nonDiscordLog = logging.getLogger('discord')
handler = logging.FileHandler(filename='CoronaLog.log', encoding='utf-8', mode='w')
handler.setFormatter(logging.Formatter('%(levelname)s %(asctime)s:%(name)s: %(message)s'))
nonDiscordLog.addHandler(handler)
So the discord log would go to the discord.log file, as basicConfig says, and when executed like this:
nonDiscordLog.info("execution took %s seconds \n" % (time.time() - startTime))
it would log into the CoronaLog.log file, although it didn't really change anything.
discord.py is terribly unintuitive in this regard for everyone but beginners; anyway, after consulting the docs you can find that this behavior can be avoided with:
import discord
client = discord.Client(intents=discord.Intents.all())
# Your regular code here
client.run(__YOURTOKEN__, log_handler=None)
Of course, you can supply your own handler instead of None, but to answer your question exactly, this is how you can disable discord.py's logging.
There's actually a bit more that you can do with the default logging, and you can read all about it on
the official docs
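For the "output it to another file" half of the question, a sketch assuming discord.py 2.0+, where Client.run accepts log_handler: give the library its own FileHandler and stop its records propagating into the root logger's file.
import logging
import discord

# Your own application logging: root logger -> CoronaLog.log
logging.basicConfig(filename='CoronaLog.log', level=logging.DEBUG,
                    format='%(levelname)s %(asctime)s %(message)s')

# Keep discord.py's records out of the root logger's file.
logging.getLogger('discord').propagate = False

# Give discord.py its own handler writing to a separate file.
discord_handler = logging.FileHandler(filename='discord.log', encoding='utf-8', mode='w')

client = discord.Client(intents=discord.Intents.all())
# Your regular code here

client.run(__YOURTOKEN__, log_handler=discord_handler)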
https://discordpy.readthedocs.io/en/latest/logging.html says:
"discord.py logs errors and debug information via the logging python module. It is strongly recommended that the logging module is configured, as no errors or warnings will be output if it is not set up. Configuration of the logging module can be as simple as:
import logging
logging.basicConfig(level=logging.INFO)
Placed at the start of the application. This will output the logs from discord as well as other libraries that use the logging module directly to the console."
Maybe try configuring logging a different way? When logging starts up, it appears to initialize discord.py's logging too. Maybe try:
import logging
# Setup logging...
import discord
Maybe if you import it afterwards, it won't set it up.

Does anyone know how to suppress all airflow "info" level logs, but not suppress application implementation specific logs?

Airflow 1.10.1 has an attribute called "logging_level" that I believe is tied to the Python logging level. When the value is INFO or lower, the output logs are too verbose and unnecessary in deployments.
Rather, I want to log just Airflow framework errors plus everything my application logs. That way I cut the logging down to something minimal, mostly in the context of the application, and only keep Airflow framework/execution errors.
In a particular PythonOperator I wrote log messages at 5 different levels to see what happens to them when I modify the airflow.cfg logging_level.
logging.debug('******************* HELLO debug *******************')
logging.info('******************* HELLO info *******************')
logging.warning('******************* HELLO warning *******************')
logging.error('******************* HELLO error *******************')
logging.critical('******************* HELLO critical *******************')
The idea is that by changing the airflow.cfg logging_level from DEBUG to INFO to WARNING, I see less and less of the Airflow logs and keep just the application-specific logs I want.
Step 1: logging_level = DEBUG
Here's the log from the task, which has messages at all levels from debug upward.
Step 2: logging_level = INFO
As expected, the logs do not include debug level messages.
Step 3: logging_level = WARNING
When we go up from INFO to WARNING, the file is empty. I was expecting the warning, error, and critical messages in the file and everything from Airflow suppressed, since the log did not contain anything from Airflow above the INFO level.
Step 4: logging_level = ERROR
The same problem here again: I expected to get the error and critical messages, but the file is empty.
Note: in the last two screenshots it's not that the path is invalid; Airflow just seems to display the path to the file when the log file has no content.
So my question is:
1) Is this just an Airflow bug?
2) Am I not using this properly? Or do I need to do something else to suppress Airflow logs at INFO and below in production, and keep just my application-specific logs?
If you look at your log screenshots, your log messages are actually wrapped in an INFO log. If you want to actually change the log level within the task log, and not have your messages wrapped, you can pull the logger off the task instance (from the **kwargs) and use it directly instead of generically calling logging.warning().
Here is an example:
def your_python_callable(**kwargs):
    log = kwargs["ti"].log
    log.warning("******HELLO Debug******")
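For completeness, a usage sketch wiring that callable into a task; on Airflow 1.10.x provide_context=True is needed for **kwargs to include the task instance, and the dag object is assumed to be defined elsewhere:
from airflow.operators.python_operator import PythonOperator

# 'dag' is assumed to be an existing DAG object defined elsewhere in the file.
log_task = PythonOperator(
    task_id="log_via_task_instance",
    python_callable=your_python_callable,
    provide_context=True,  # required on Airflow 1.10.x to receive **kwargs
    dag=dag,
)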

python how to log all third-party logs into a separate file

From my perspective, I would initialize a logger to log messages for my app:
import logging
import sys

logger = logging.getLogger('my_app')
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter(
    '%(asctime)s %(levelname)s %(name)s %(message)s'))
logger.addHandler(handler)
When errors happen:
try:
    blah()
except Exception as e:
    logger.warning(e)
But I'm using some third-party modules like sqlalchemy. sqlalchemy may log warnings when errors happen (e.g. a varchar is too long and gets truncated), and it uses a separate logger (as do some other modules, like requests):
sqlalchemy/log.py
This may lead to bad issues and it's not trackable.
Since I'm using lots of third-party modules, how can I log all third-party messages to a separate file to help with troubleshooting?
You can set up a handler to log to a file and attach it to the root logger, but that will log all messages, even ones from your own code. To leave your messages out of that file, use a logging.Filter subclass which filters out all messages that aren't from one of your code's top-level package namespaces, and attach that filter to the handler.
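A minimal sketch of that answer, assuming your own code lives under a top-level package called my_app (adjust the name to match your project):
import logging

class ExcludeOwnCodeFilter(logging.Filter):
    """Drop records that originate from our own top-level package."""
    def __init__(self, own_package='my_app'):
        super().__init__()
        self.own_package = own_package

    def filter(self, record):
        # Keep the record only if it is NOT from our own code.
        return not (record.name == self.own_package
                    or record.name.startswith(self.own_package + '.'))

third_party_handler = logging.FileHandler('third_party.log')
third_party_handler.setFormatter(
    logging.Formatter('%(asctime)s %(levelname)s %(name)s %(message)s'))
third_party_handler.addFilter(ExcludeOwnCodeFilter('my_app'))

# Attach to the root logger so any library that propagates its records is captured.
logging.getLogger().addHandler(third_party_handler)
logging.getLogger().setLevel(logging.WARNING)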
