Log streams are a random hash instead of the logger name - Python

Recently I moved my app into a Docker container.
I noticed that the log streams of the log group changed their names to some random hash.
Before moving to docker:
After moving to docker:
The logger in each file is initialized as
logger = logging.getLogger(__name__)
The logger's config is set up inside the __main__ with
import logging.config

import yaml

def setup_logger(config_file):
    with open(config_file) as log_config:
        config_yml = log_config.read()
    config_dict = yaml.safe_load(config_yml)
    logging.config.dictConfig(config_dict)
with the config loaded from this file
version: 1
disable_existing_loggers: False
formatters:
  json:
    format: "[%(asctime)s] %(process)d %(levelname)s %(name)s:%(funcName)s:%(lineno)s - %(message)s"
  plaintext:
    format: "%(asctime)s %(levelname)s %(name)s - %(message)s"
    datefmt: "%Y-%m-%d %H:%M:%S"
handlers:
  console:
    class: logging.StreamHandler
    formatter: plaintext
    level: INFO
    stream: ext://sys.stdout
root:
  level: DEBUG
  propagate: True
  handlers: [console]
The docker image is run with the flags
--log-driver=awslogs \
--log-opt awslogs-group=XXXXX \
--log-opt awslogs-create-group=true \
Is there a way to keep the original log stream names?

That's how the awslogs driver works.
Per the documentation, you can control the name somewhat using the awslogs-stream-prefix option:
The awslogs-stream-prefix option allows you to associate a log stream with the specified prefix, the container name, and the ID of the Amazon ECS task to which the container belongs. If you specify a prefix with this option, then the log stream takes the following format:
prefix-name/container-name/ecs-task-id
If you don't specify a prefix with this option, then the log stream is named after the container ID that is assigned by the Docker daemon on the container instance. Because it is difficult to trace logs back to the container that sent them with just the Docker container ID (which is only available on the container instance), we recommend that you specify a prefix with this option.
You cannot change this behavior if you're using the awslogs driver. The only option would be to disable the log driver and use the AWS SDK to put the events into CloudWatch manually, but I don't think that'd be a good idea.
To be clear, your container settings/code don't affect the stream name at all when using awslogs - the log driver is just redirecting all of the container's STDOUT to CloudWatch.
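If you really did want to go the manual SDK route mentioned above, a minimal sketch with boto3 might look like the following; the group and stream names here are placeholders, not anything your current setup defines.

import time

import boto3  # AWS SDK for Python; assumes credentials are configured in the environment

logs = boto3.client("logs")

GROUP = "my-app-logs"        # placeholder log group name
STREAM = "my.module.name"    # placeholder: e.g. reuse your logger's __name__

# Both calls raise ResourceAlreadyExistsException if the group/stream already exists.
logs.create_log_group(logGroupName=GROUP)
logs.create_log_stream(logGroupName=GROUP, logStreamName=STREAM)

# Timestamps are milliseconds since the epoch; older API versions also expected
# a sequenceToken on subsequent calls.
logs.put_log_events(
    logGroupName=GROUP,
    logStreamName=STREAM,
    logEvents=[{"timestamp": int(time.time() * 1000), "message": "hello from my app"}],
)

Libraries such as watchtower wrap this pattern in a standard logging handler, but as the answer above says, hand-rolling CloudWatch delivery is rarely worth it compared to letting the log driver do its job.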

Related

How to configure the logging system in one file in Python

I have two files.
The first is the TCP server.
The second is the Flask app. They belong to one project, but they run inside separate Docker containers.
They should write logs to the same file because they are part of the same project.
I tried to create my own logging library and import it into both files.
I tried lots of things.
First I deleted the code below:
if logger.hasHandlers():
    logger.handlers.clear()
When I delete it, I get the same logs twice.
My structure:
docker-compose
Dockerfile
loggingLib.py
app.py
tcp.py
requirements.txt
.
.
.
My latest logging code:
from logging.handlers import RotatingFileHandler
from datetime import datetime
import logging
import time
import os, os.path

project_name = "proje_name"

def get_logger():
    if not os.path.exists("logs/"):
        os.makedirs("logs/")
    now = datetime.now()
    file_name = now.strftime(project_name + '-%H-%M-%d-%m-%Y.log')
    log_handler = RotatingFileHandler('logs/' + file_name, mode='a', maxBytes=10000000, backupCount=50)
    formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(funcName)s - %(message)s ', '%d-%b-%y %H:%M:%S')
    formatter.converter = time.gmtime
    log_handler.setFormatter(formatter)
    logger = logging.getLogger(__name__)
    logger.setLevel(level=logging.INFO)
    if logger.hasHandlers():
        logger.handlers.clear()
    logger.addHandler(log_handler)
    return logger
It works, but only in one file.
If app.py runs first, only it writes logs;
the other file doesn't write any logs.
Anything that directly uses files – config files, log files, data files – is a little trickier to manage in Docker than running locally. For logs in particular, it's usually better to set your process to log directly to stdout. Docker will collect the logs, and you can review them with docker logs. In this setup, without changing your code, you can configure Docker to send the logs somewhere else or use a log collector like fluentd or logstash to manage the logs.
In your Python code, you will usually want to configure the detailed logging setup at the top level, on the root logger:
import logging

def main():
    logging.basicConfig(
        format='%(asctime)s - %(levelname)s - %(funcName)s - %(message)s',
        datefmt='%d-%b-%y %H:%M:%S',
        level=logging.INFO
    )
    ...
and in each individual module you can just get a local logger, which will inherit the root logger's setup
import logging
LOGGER = logging.getLogger(__name__)
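Applied to the layout in the question, app.py and tcp.py would each grab their own module-level logger exactly like that, and only the process entry point would call basicConfig. A rough sketch for app.py; the Flask details are assumptions based on the question:

# app.py -- illustrative sketch only
import logging

from flask import Flask

LOGGER = logging.getLogger(__name__)  # module-level logger, inherits the root config
app = Flask(__name__)

@app.route("/")
def index():
    LOGGER.info("handling a request")  # ends up on stdout via the root handler
    return "ok"

if __name__ == "__main__":
    # Configure logging once, at the entry point, and send it to stdout.
    logging.basicConfig(level=logging.INFO)
    app.run(host="0.0.0.0", port=5000)

tcp.py would make the same getLogger(__name__) call and configure logging in its own entry point, and Docker keeps each container's output stream separate.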
With its default setup, Docker will capture log messages into JSON files on disk. If you generate a large amount of log messages in a long-running container, it can lead to local disk exhaustion (it will have no effect on memory available to processes). The Docker logging documentation advises using the local file logging driver, which does automatic log rotation. In a Compose setup you can specify logging: options:
version: '3.8'
services:
  app:
    image: ...
    logging:
      driver: local
You can also configure log rotation on the default JSON File logging driver:
version: '3.8'
services:
  app:
    image: ...
    logging:
      driver: json-file # default, can be omitted
      options:
        max-size: "10m"
        max-file: "50"
You "shouldn't" directly access the logs, but they are in a fairly stable format in /var/lib/docker, and tools like fluentd and logstash know how to collect them.
If you ever decide to run this application in a cluster environment like Kubernetes, that will have its own log-management system, but again designed around containers that directly log to their stdout. You would be able to run this application unmodified in Kubernetes, with appropriate cluster-level configuration to forward the logs somewhere. Retrieving a log file from opaque storage in a remote cluster can be tricky to set up.

How can I log stdout and stderr outputs using fb-hydra?

I am trying to log stdout and stderr into a file.
I found the custom.yaml file in the facebookresearch/hydra GitHub repository.
# @package _group_
version: 1
formatters:
  simple:
    format: '[%(levelname)s] - %(message)s'
handlers:
  console:
    class: logging.StreamHandler
    formatter: simple
    stream: ext://sys.stdout
root:
  handlers: [console]
disable_existing_loggers: False
I figured out that I can create a custom job_logging config file and log to stderr by editing the file as below:
stream: ext://sys.stderr
However, I want to log stderr and stdout at the same time.
I am having a hard time figuring it out. Does anyone know how I can do it by changing the config file?
Hydra is forwarding your config (as a primitive dictionary) to logging.config.dictConfig.
This is more of a question about that API than about Hydra.
Do you know how to do this via logging.config.dictConfig without Hydra?
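Since Hydra just hands this config to logging.config.dictConfig, one way to answer the underlying question is to define two StreamHandlers, one on stdout and one on stderr, and attach both to the root logger. A plain-Python sketch of that shape, which could equally be written as a custom job_logging YAML:

import logging
import logging.config

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "[%(levelname)s] - %(message)s"},
    },
    "handlers": {
        "console_out": {
            "class": "logging.StreamHandler",
            "formatter": "simple",
            "stream": "ext://sys.stdout",
        },
        "console_err": {
            "class": "logging.StreamHandler",
            "formatter": "simple",
            "stream": "ext://sys.stderr",
        },
    },
    "root": {"handlers": ["console_out", "console_err"], "level": "INFO"},
})

logging.getLogger(__name__).info("this record goes to both stdout and stderr")

Note that with both handlers on the root logger every record is emitted twice, once per stream; capturing what other code prints to stdout/stderr is a different problem from configuring logging handlers.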

Log update issue while running Python logging via Robot Framework

Issue: When invoking the log methods below via Robot files, not all log types are printed to the console and nothing is written to the log file.
import logging
from colorlog import ColoredFormatter

class Log():
    LOG_LEVEL = logging.DEBUG
    LOGFORMAT = " %(log_color)s%(levelname)-8s%(reset)s | %(log_color)s%(message)s%(reset)s"
    logging.root.setLevel(LOG_LEVEL)
    formatter = ColoredFormatter(LOGFORMAT)
    stream = logging.StreamHandler()
    stream.setLevel(LOG_LEVEL)
    stream.setFormatter(formatter)
    Log = logging.getLogger('pythonConfig')
    Log.setLevel(LOG_LEVEL)
    Log.addHandler(stream)
    logger = logging.getLogger(__name__)
    logging.basicConfig(
        filename='c://foo//app.log',
        format='%(asctime)s - %(levelname)s: %(message)s',
        datefmt='%d-%b-%y %H:%M:%S', level=logging.INFO,
    )

    @classmethod
    def warn(cls, message):
        cls.Log.warning(message)

    @classmethod
    def info(cls, message):
        cls.Log.info(message)

    @classmethod
    def error(cls, message):
        cls.Log.error(message)

    @classmethod
    def debug(cls, message):
        cls.Log.debug(message)

# Calling class methods
Log.warn("test")
Log.info("test")
Log.error("test")
Log.debug("test")
Running with Python from the command prompt:
C:\foo>py log.py
WARNING | test
INFO | test
ERROR | test
DEBUG | test
app.log
01-Sep-19 21:32:31 - WARNING: test
01-Sep-19 21:32:31 - INFO: test
01-Sep-19 21:32:31 - ERROR: test
01-Sep-19 21:32:31 - DEBUG: test
When I invoke the same methods via the robot file (Python >> Robot suite), I am unable to get any of the logs printed to the log file (app.log), and only error and warning messages are printed to the console. Could someone help me in this regard?
Runner.py
import robot
logFile = open('c:\\foo\\ExecutionReport.txt','w')
htmlpath = "c:\\foo\\Reports.html"
robot.run("c:\\foo\\test_sample.robot", log=None, report=htmlpath, output=None,
          stdout=logFile)
Robot:-
*** Settings ***
Library    ../Robot/Log.py

*** Test Cases ***
testinglogger
    info    test
    error    test
    debug    test
    warn    test
app.log:
None
In the Python Runner.py script you are invoking the robot.run function and setting some parameters for the execution, such as log=None and output=None. This causes no log files to be created (no output) and no logging to be visible (apart from ERROR and WARN, apparently).
See these Robot Framework run parameters:
-o --output file where file is the name of the Robot output. Setting this to NONE also disables reports and all other file logging.
Leave unassigned to create default log files for Robot Framework. I believe this maps to the robot.run command's output parameter.
-L --loglevel level where level is the desired lowest logging level.
Available levels: TRACE, DEBUG, INFO (default), WARN, NONE (no logging). In this case you probably want to set it to DEBUG
Make the required changes to your Runner.py script and try again :)
Here is the complete documentation for the robot.run script in Robot Framework 3.1.2:
https://robot-framework.readthedocs.io/en/v3.1.2/_modules/robot/run.html
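For example, the Runner.py call above might become something like this; the paths are kept from the question and the option values are only illustrative:

import robot

logFile = open('c:\\foo\\ExecutionReport.txt', 'w')
htmlpath = "c:\\foo\\Reports.html"

# Leaving out output/log lets Robot Framework create its default output.xml
# and log.html; raising the log level makes INFO/DEBUG messages visible too.
robot.run("c:\\foo\\test_sample.robot",
          report=htmlpath,
          loglevel="DEBUG",
          stdout=logFile)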

Pyramid uWSGI logging in daemon mode is not working

I've been trying multiple things on this one, but with no success.
I want to save logs to a file (SQLAlchemy logs, app debug logs, stack traces on errors, etc.).
I'm starting uwsgi with the following command:
uwsgi --ini-paste-logged myapp.ini
And here is the content of the ini file (where apiservice is my package):
[loggers]
keys = root, apiservice, sqlalchemy
[handlers]
keys = console
[formatters]
keys = generic
[logger_root]
level = INFO
handlers = console
[logger_apiservice]
level = DEBUG
handlers =
qualname = apiservice
[logger_sqlalchemy]
level = INFO
handlers =
qualname = sqlalchemy.engine
[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic
[formatter_generic]
format = %(asctime)s %(levelname)-5.5s [%(name)s][%(threadName)s] %(message)s
[uwsgi]
socket = /tmp/myapp-uwsgi.sock
virtualenv = /var/www/myapp/env
pidfile = ./uwsgi.pid
daemonize = ./uwsgi.log
master = true
processes = 4
The uwsgi.log contains only the request log, without any actual logging data.
I've tried with INI options like:
paste: config:%p
paste-logger: %p
logto: file
Nothing seems to work.
Apparently, the uwsgi config section was fine.
After a closer look at the uwsgi.log, even though the server was launched and running successfully, you could see an error:
ImportError: No module named script.util.logging_config
I've installed following packages to solve my problems:
pip install pastescript
pip install pastedeploy

How do I log from my Python Spark script

I have a Python Spark program which I run with spark-submit. I want to put logging statements in it.
logging.info("This is an informative message.")
logging.debug("This is a debug message.")
I want to use the same logger that Spark is using so that the log messages come out in the same format and the level is controlled by the same configuration files. How do I do this?
I've tried putting the logging statements in the code and starting out with a logging.getLogger(). In both cases I see Spark's log messages but not mine. I've been looking at the Python logging documentation, but haven't been able to figure it out from there.
Not sure if this is something specific to scripts submitted to Spark or just me not understanding how logging works.
You can get the logger from the SparkContext object:
log4jLogger = sc._jvm.org.apache.log4j
LOGGER = log4jLogger.LogManager.getLogger(__name__)
LOGGER.info("pyspark script logger initialized")
You need to get the logger for Spark itself; by default getLogger() will return the logger for your own module. Try something like:
logger = logging.getLogger('py4j')
logger.info("My test info statement")
It might also be 'pyspark' instead of 'py4j'.
In case the function that you use in your Spark program (and which does some logging) is defined in the same module as the main function, it will give a serialization error.
This is explained here and an example by the same person is given here.
I also tested this on Spark 1.3.1.
EDIT:
To change logging from STDERR to STDOUT you will have to remove the current StreamHandler and add a new one.
Find the existing StreamHandler (this line can be removed when finished):
print(logger.handlers)
# will look like [<logging.StreamHandler object at 0x7fd8f4b00208>]
There will probably only be a single one, but if not you will have to update the position.
logger.removeHandler(logger.handlers[0])
Add a new handler for sys.stdout:
import sys # Put at top if not already there
sh = logging.StreamHandler(sys.stdout)
sh.setLevel(logging.DEBUG)
logger.addHandler(sh)
We needed to log from the executors, not from the driver node. So we did the following:
We created an /etc/rsyslog.d/spark.conf on all of the nodes (using a bootstrap action with Amazon Elastic MapReduce) so that the core nodes forwarded syslog local1 messages to the master node.
On the Master node, we enabled the UDP and TCP syslog listeners, and we set it up so that all local messages got logged to /var/log/local1.log.
We created a Python logging module Syslog logger in our map function.
Now we can log with logging.info(). ...
One of the things we discovered is that the same partition is being processed simultaneously on multiple executors. Apparently Spark does this all the time, when it has extra resources. This handles the case when an executor is mysteriously delayed or fails.
Logging in the map functions has taught us a lot about how Spark works.
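For illustration, the syslog-backed logger described above could be built with the standard library's SysLogHandler roughly like this; the master host name, port and facility are placeholders for whatever your rsyslog setup uses:

import logging
import logging.handlers

def get_syslog_logger(name="spark-worker"):
    """Return a logger that forwards records to the master node's syslog listener."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers when tasks reuse the interpreter
        handler = logging.handlers.SysLogHandler(
            address=("master-node-hostname", 514),  # placeholder host/port
            facility=logging.handlers.SysLogHandler.LOG_LOCAL1,  # matches the local1 routing above
        )
        handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger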
In my case, I am just happy to get my log messages added to the workers' stderr, along with the usual Spark log messages.
If that suits your needs, then the trick is to redirect the particular Python logger to stderr.
For example, the following, inspired by this answer, works fine for me:
def getlogger(name, level=logging.INFO):
    import logging
    import sys

    logger = logging.getLogger(name)
    logger.setLevel(level)
    if logger.handlers:
        # or else, as I found out, we keep adding handlers and duplicate messages
        pass
    else:
        ch = logging.StreamHandler(sys.stderr)
        ch.setLevel(level)
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        ch.setFormatter(formatter)
        logger.addHandler(ch)
    return logger
Usage:
def tst_log():
    logger = getlogger('my-worker')
    logger.debug('a')
    logger.info('b')
    logger.warning('c')
    logger.error('d')
    logger.critical('e')
    ...
Output (plus a few surrounding lines for context):
17/05/03 03:25:32 INFO MemoryStore: Block broadcast_24 stored as values in memory (estimated size 5.8 KB, free 319.2 MB)
2017-05-03 03:25:32,849 - my-worker - INFO - b
2017-05-03 03:25:32,849 - my-worker - WARNING - c
2017-05-03 03:25:32,849 - my-worker - ERROR - d
2017-05-03 03:25:32,849 - my-worker - CRITICAL - e
17/05/03 03:25:32 INFO PythonRunner: Times: total = 2, boot = -40969, init = 40971, finish = 0
17/05/03 03:25:32 INFO Executor: Finished task 7.0 in stage 20.0 (TID 213). 2109 bytes result sent to driver
import logging
# Logger
logging.basicConfig(format='%(asctime)s %(filename)s %(funcName)s %(lineno)d %(message)s')
logger = logging.getLogger('driver_logger')
logger.setLevel(logging.DEBUG)
Simplest way to log from PySpark!
The key to making PySpark interact with Java log4j is the JVM.
The code below is Python; the conf is missing the URL, but this is about logging.
import os

from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

my_jars = os.environ.get("SPARK_HOME")
myconf = SparkConf()
myconf.setMaster("local").setAppName("DB2_Test")
myconf.set("spark.jars", "%s/jars/log4j-1.2.17.jar" % my_jars)
spark = SparkSession\
    .builder\
    .appName("DB2_Test")\
    .config(conf=myconf)\
    .getOrCreate()

Logger = spark._jvm.org.apache.log4j.Logger
mylogger = Logger.getLogger(__name__)
mylogger.error("some error trace")
mylogger.info("some info trace")
You can implement the logging.Handler interface in a class that forwards log messages to log4j under Spark. Then use logging.root.addHandler() (and, optionally, logging.root.removeHandler()) to install that handler.
The handler should have a method like the following:
def emit(self, record):
    """Forward a log message for log4j."""
    Logger = self.spark_session._jvm.org.apache.log4j.Logger
    logger = Logger.getLogger(record.name)
    if record.levelno >= logging.CRITICAL:
        # Fatal and critical seem about the same.
        logger.fatal(record.getMessage())
    elif record.levelno >= logging.ERROR:
        logger.error(record.getMessage())
    elif record.levelno >= logging.WARNING:
        logger.warn(record.getMessage())
    elif record.levelno >= logging.INFO:
        logger.info(record.getMessage())
    elif record.levelno >= logging.DEBUG:
        logger.debug(record.getMessage())
    else:
        pass
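For context, emit above is meant to live in a logging.Handler subclass; a minimal skeleton under that assumption (the name CustomHandler comes from the snippet below, the constructor argument is a guess):

import logging

class CustomHandler(logging.Handler):
    """Forward standard-library log records to Spark's log4j logger."""

    def __init__(self, spark_session):
        super().__init__()
        self.spark_session = spark_session  # emit() uses this to reach the JVM

    # def emit(self, record): ... as defined above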
Installing the handler should go immediately after you initialise your Spark session:
spark = SparkSession.builder.appName("Logging Example").getOrCreate()
handler = CustomHandler(spark)

# Replace the default handlers with the log4j forwarder.
root_handlers = logging.root.handlers[:]
for h in root_handlers:
    logging.root.removeHandler(h)
logging.root.addHandler(handler)

# Now you can log stuff.
logging.debug("Installed log4j log handler.")
There's a more complete example here: https://gist.github.com/thsutton/65f0ec3cf132495ef91dc22b9bc38aec
You need to make the Spark log reachable from the driver and all executors, so we created a logging class, treated it as a job dependency, and loaded it on each executor.
class Log4j:
    def __init__(self, spark_session):
        conf = spark_session.sparkContext.getConf()
        app_id = conf.get('spark.app.id')
        app_name = conf.get('spark.app.name')
        log4jlogger = spark_session._jvm.org.apache.log4j
        prefix_msg = '<' + app_id + ' : ' + app_name + '> '
        print(prefix_msg)
        self.logger = log4jlogger.LogManager.getLogger(prefix_msg)

    def warn(self, msg):
        # log warning
        self.logger.warn(msg)

    def error(self, msg):
        # log error
        self.logger.error(msg)

    def info(self, msg):
        # log information message
        self.logger.info(msg)
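Usage would then look roughly like this (a sketch, assuming an existing SparkSession):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("my-job").getOrCreate()
log = Log4j(spark)
log.info("job started")
log.warn("something looks off")
log.error("something failed")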
