Add Python logger to stream logs to CloudWatch within Fargate

I have a Docker container with a Python script (Python 3.8), which I execute in AWS Fargate via Airflow (ECSOperator). The script streams several logs to CloudWatch using the awslogs driver defined in the task definition. I'm able to correctly see all the logs in CloudWatch, but the problem is that my logs are always attached to a main log message, that is, they appear nested inside another log message.
Here is an example of a log, where the first 3 columns are injected automatically, whereas the rest of the message refers to my custom log:
[2021-11-04 17:23:22,026] {{ecs.py:317}} INFO - [2021-11-04T17:22:47.719000] 2021-11-04 17:22:47,718 - myscript - WARNING - testing log message
Thus, no matter which log level I set, the first log message is always INFO. It seems to be something that Fargate adds automatically. I would like my log messages to stream directly to CloudWatch without being wrapped in another log message, just:
[2021-11-04T17:22:47.719000] 2021-11-04 17:22:47,718 - myscript - WARNING - testing log message
I assume that I'm not configuring the logger correctly or that I have to get another logger, but I don't know how to do it properly. These are some of the approaches I followed and the results I obtained.
Prints
If I use prints within my code, the log messages go to stdout, so they are streamed to CloudWatch through the awslogs driver.
[2021-11-04 17:23:22,026] {{ecs.py:317}} INFO - testing log message
Logging without configuration
If I use the logger without any handler or formatter configured, the generated log messages are the same as the ones created with prints.
import logging
logger = logging.getLogger(__name__)
logger.warning('testing log message')
[2021-11-04 17:23:22,026] {{ecs.py:317}} INFO - testing log message
Logging with StreamHandler
If I configure a StreamHandler with a formatter, then my log is attached to the main log, as stated before. Thus, it just replaces the string message (last column) with the new formatted log message.
import logging
import sys
logger = logging.getLogger(__name__)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.warning('testing log message')
[2021-11-04 17:23:22,026] {{ecs.py:317}} INFO - [2021-11-04T17:22:47.719000] 2021-11-04 17:22:47,718 - myscript - WARNING - testing log message
This is the log configuration defined within the task definition:
"logConfiguration": {
"logDriver": "awslogs",
"secretOptions": [],
"options": {
"awslogs-group": "/ecs/my-group",
"awslogs-region": "eu-west-1",
"awslogs-stream-prefix": "ecs"
}
EDIT 1
I've been investigating the logs in CloudWatch and I found out that the logs are being streamed to two different log groups, since I'm using Airflow to launch the Fargate task.
Airflow group: Airflow automatically creates a log group named <airflow_environment>-Task where it places the logs generated within the tasks. Here, Airflow seems to wrap my custom logs inside its own log messages, which are always INFO. The Airflow UI shows the logs obtained from this log group.
ECS group: this is the log group defined in the TaskDefinition (/ecs/my-group). In this group, the logs are streamed as they are, without being wrapped.
Hence, the problem seems to be with Airflow, as it wraps the logs within its own logger and shows these logs in the Airflow UI. In any case, the logs are correctly delivered and formatted in the log group defined in the task definition.

Probably a bit late and you've already solved it, but I think the solution is here. Fargate probably pre-configures a log handler, as Lambda does, given that the awslogs configuration is in the task definition.
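For reference, the usual fix along those lines is to reuse any handler that is already attached to the root logger instead of adding a second one. A minimal sketch, assuming a pre-configured handler may or may not be present (the format string is just an example):

import logging
import sys

fmt = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
root = logging.getLogger()

if root.handlers:
    # A handler was already configured by the runtime or a parent process:
    # swap its formatter instead of adding a second handler.
    for handler in root.handlers:
        handler.setFormatter(fmt)
else:
    # No pre-configured handler: fall back to a plain stdout handler.
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(fmt)
    root.addHandler(handler)

root.setLevel(logging.INFO)
logging.getLogger(__name__).warning('testing log message')

Reusing the existing handler avoids the duplicated timestamps and double-wrapped lines described in the question.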

Related

python logging in AWS Fargate, datetime duplicated

I'm trying to use the Python logging module in AWS Fargate. The same application should also work locally, so I'd like to use a custom logger for local use but keep the CloudWatch logs intact.
This is what I'm doing:
if logging.getLogger().hasHandlers():
    log = logging.getLogger()
    log.setLevel(logging.INFO)
else:
    from logging.handlers import RotatingFileHandler
    log = logging.getLogger('sm')
    log.root.setLevel(logging.INFO)
    ...
But I get this in CloudWatch:
2023-02-08T13:06:27.317+01:00 08/02/2023 12:06 - sm - INFO - Starting
And this locally:
08/02/2023 12:06 - sm - INFO - Starting
I thought Fargate was already defining a logger, but apparently the following has no effect:
logging.getLogger().hasHandlers()
Ideally this would be the desired log in CloudWatch:
2023-02-08T13:06:27.317+01:00 sm - INFO - Starting
Fargate just runs Docker containers. It doesn't do any setup of the Python code that happens to be running in that container. It doesn't even know or care that you are running Python code.
Anything written to STDOUT/STDERR by the primary process of the Docker container gets sent to CloudWatch Logs, so if you want to be compatible with ECS CloudWatch Logs, just make sure you are sending logs to the console in the format you want.
You can use logging.basicConfig to configure the root logger. The module-level functions debug, info, warning, error and critical call basicConfig automatically if no handlers are defined.
logging.basicConfig(filename='test.log', format='%(filename)s: %(message)s',
                    level=logging.DEBUG)
Set the logging format to include the details you need:
logging.basicConfig(format='%(asctime)s %(name)s - %(levelname)s - %(message)s', level=logging.INFO)
Use this to format the logs sent to CloudWatch. I found a Stack Overflow answer with a detailed explanation here.
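Putting those pieces together, here is a minimal sketch (format string, logger name and level are only examples) that logs to stdout in the desired format, so the same code works locally and under the awslogs driver, which prepends its own ingestion timestamp:

import logging
import sys

# One configuration for both environments: write to stdout so the awslogs
# driver picks the lines up, and put the logger name/level in the format.
logging.basicConfig(
    stream=sys.stdout,
    format='%(name)s - %(levelname)s - %(message)s',
    level=logging.INFO,
)

log = logging.getLogger('sm')
log.info('Starting')  # CloudWatch adds its own timestamp column in front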

Log who ran a python script: cron or human?

I created a Python script which is usually run by a cron job, but the script can at times be run manually by a human. Is it possible to determine who ran the script and save that in a log file?
I'm using Python's logging library. It seems the LogRecord attribute %(name)s only shows root as the logger used to log the call.
log_format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
How about using command line options?
https://docs.python.org/3/library/argparse.html
When triggering the script from cron, pass a special argument that defaults to something else if not explicitly set.
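A minimal sketch of that idea; the flag name --invoked-by and its values are just placeholders, not from the original answer:

import argparse
import logging

parser = argparse.ArgumentParser()
# cron passes --invoked-by cron explicitly; a human running the script
# by hand gets the default value.
parser.add_argument('--invoked-by', default='human', choices=['human', 'cron'])
args = parser.parse_args()

logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                    level=logging.INFO)
logging.info('script started by: %s', args.invoked_by)

The crontab entry would then simply append --invoked-by cron to the command it runs.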

Python: Request-ID thread-local logging

Currently, I'm setting up logging in several Python components and have a question about the thread-local logging setup.
So, I have the Request-ID field coming into the component in different ways (headers for queue messages and REST calls).
I set up a logging formatter with this request ID:
%(levelname)s: [%(name)s] [%(request_id)s] %(message)s
This formatter is common to all logs.
Each subsequent class in the processing flow creates its own logger using
logging.getLogger(<class-name>)
The question is:
How can I set the request ID once at thread start (message handler, REST handler) and be sure that it remains the same in all subsequent calls inside that thread, while other parallel threads each have their own request IDs?
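One common approach, sketched below rather than taken from the thread, keeps the request ID in thread-local storage and injects it into every record through a logging.Filter attached to the shared handler:

import logging
import threading

_local = threading.local()

def set_request_id(request_id):
    # Call this once at the start of each message/REST handler thread.
    _local.request_id = request_id

class RequestIdFilter(logging.Filter):
    def filter(self, record):
        # Every record logged from this thread gets the thread's request ID;
        # threads that never called set_request_id fall back to '-'.
        record.request_id = getattr(_local, 'request_id', '-')
        return True

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    '%(levelname)s: [%(name)s] [%(request_id)s] %(message)s'))
handler.addFilter(RequestIdFilter())
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.INFO)

# In a handler thread:
set_request_id('req-123')
logging.getLogger('OrderService').info('processing message')

Because the per-class loggers created with logging.getLogger(<class-name>) propagate to the root logger's handler, the filter applies to all of them without further setup.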

Does anyone know how to suppress all Airflow "info" level logs, but not suppress application-specific logs?

Airflow 1.10.1 has an attribute called "logging_level" that I believe is tied to the Python logging level. When the value is INFO or lower, the output logs are too verbose and unnecessary in deployments.
Rather, I want to log just Airflow framework errors plus everything my application logs. That way I cut the logging down to something minimal, mostly in the context of the application, and only keep Airflow framework/execution errors.
In a particular PythonOperator, I wrote log messages at 5 different levels to see what happens to them when I modify the airflow.cfg logging_level.
import logging

logging.debug('******************* HELLO debug *******************')
logging.info('******************* HELLO info *******************')
logging.warning('******************* HELLO warning *******************')
logging.error('******************* HELLO error *******************')
logging.critical('******************* HELLO critical *******************')
The idea is that by changing the airflow.cfg logging_level from DEBUG to INFO to WARNING, I see less and less of the Airflow logs and keep just the application-specific logs I want.
Step 1: logging_level = DEBUG
Here's the log from the task; it contains messages at all levels from DEBUG upward.
Step 2: logging_level = INFO
As expected, the logs do not include debug level messages.
Step 3: logging_level = WARNING
When we go up from INFO to WARNING, the log file is empty. I was expecting the warning, error, and critical messages to be in the file and everything else to be suppressed, since Airflow itself did not log anything above INFO.
Step 4: logging_level = ERROR
The same problem again: I expected to get the error and critical messages, but the file is empty.
Note: in the last two screenshots, it's not that the path is invalid; Airflow seems to just display the path to the log file when the file has no content.
So my question is:
1) Is this just an Airflow bug?
2) Am I not using this properly? Or do I need to do something else in order to suppress Airflow logs at INFO and below in production, and just keep my application-specific logs?
If you look at your log screenshots, your log messages are actually wrapped in an INFO log. If you want to actually change the log level within the task log and not have it wrapped, you can pull the logger off of the task instance (from the **kwargs) and use it directly instead of generically calling logging.warning().
Here is an example:
def your_python_callable(**kwargs):
    # use the task instance's own logger instead of the root logger
    log = kwargs["ti"].log
    log.warning("******HELLO Debug******")
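For completeness, a minimal sketch of wiring that callable into a task; the task id is a placeholder, `dag` is assumed to be defined elsewhere in the DAG file, and provide_context=True is needed on Airflow 1.10.x to receive the kwargs:

from airflow.operators.python_operator import PythonOperator

# `your_python_callable` is the function defined above.
log_task = PythonOperator(
    task_id="log_example",              # placeholder task id
    python_callable=your_python_callable,
    provide_context=True,               # required on Airflow 1.10.x for **kwargs
    dag=dag,                            # assumes a `dag` object defined elsewhere
)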

Log disabling in python

I am new to this logging module.
logging.basicConfig(level=logging.DEBUG)
logging.disable = True
As per my understanding, this should disable debug logs. But when it is executed, it still prints debug logs.
I only have debug logs to print; I don't have critical or info logs. So how can I disable these debug logs?
logging.disable is a method, not a configurable attribute.
You can disable logging with:
https://docs.python.org/2/library/logging.html#logging.disable
To disable all, call:
logging.disable(logging.DEBUG)
This will disable all logs of level DEBUG and below.
To enable all logging, do logging.disable(logging.NOTSET) as it is the lowest level.
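A short, runnable illustration of that behaviour (the levels are only an example):

import logging

logging.basicConfig(level=logging.DEBUG)

logging.debug('shown: DEBUG is the configured level')
logging.info('shown as well')

logging.disable(logging.DEBUG)      # suppress DEBUG and below
logging.debug('not shown any more')
logging.info('still shown: INFO is above the disabled level')

logging.disable(logging.NOTSET)     # re-enable everything
logging.debug('shown again')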
The level argument you've set to logging.DEBUG in logging.basicConfig is the lowest level of logging that will be displayed.
The order of logging levels is documented here.
If you don't want to display DEBUG, you can either set logging.basicConfig(level=logging.INFO), or specify levels to be disabled via logging.disable(logging.DEBUG).
You can change the level to logging.CRITICAL and receive only critical logs.
