What to write into a log file? - python

My question is simple: what to write into a log.
Are there any conventions?
What should I put in?
Since my app has to be released, I'd like to have friendly logs that most people can read without asking what they mean.
I already have some ideas, like a timestamp and a unique identifier for each function/method.
I'd like to have several log levels, like tracing/debugging, information, and errors/warnings.
Do you use some pre-formatted log resources?
Thank you

Python's logging module is quite pleasant, and already implemented.
Read this: http://docs.python.org/library/logging.html
Edit
"easy to parse, read," are generally contradictory features. English -- easy to read, hard to parse. XML -- easy to parse, hard to read. There is no format that achieves easy-to-read and easy-to-parse. Some log formats are "not horrible to read and not impossible to parse".
Some folks create multiple handlers so that one log has several formats: a summary for people to read and an XML version for automated parsing.
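For example, a minimal sketch of that dual-handler idea with the standard logging module (the file names and the pipe-delimited format are just placeholders):

import logging

log = logging.getLogger('myapp')
log.setLevel(logging.DEBUG)

# Human-readable summary log.
summary = logging.FileHandler('summary.log')
summary.setLevel(logging.INFO)
summary.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))

# Machine-parseable log for automated tools, one record per line.
machine = logging.FileHandler('machine.log')
machine.setFormatter(logging.Formatter(
    '%(created)f|%(levelname)s|%(name)s|%(funcName)s|%(message)s'))

log.addHandler(summary)
log.addHandler(machine)
log.info('application started')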

Here are some suggestions for content:
timestamp
message
log message type (such as error, warning, trace, debug)
thread id (so you can make sense of the log file from a multithreaded application)
Best practices for implementation:
Put a mutex around the write method so that you can be sure that each write is thread safe and will make sense.
Send one message at a time to the log file, and specify the type of log message each time. Then you can set what type of logging you want to take on program startup.
Use no buffering on the file, or flush often in case there is a program crash.
Edit: I just noticed the question was tagged with Python, so please see S. Lott's answer before mine. It may be enough for your needs.
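For what it's worth, a minimal sketch of the suggested content with Python's logging module, which already handles the thread-safety and flushing concerns above (the filename is a placeholder):

import logging

logging.basicConfig(
    filename='app.log',
    level=logging.DEBUG,
    format='%(asctime)s %(levelname)-8s [thread %(thread)d] %(message)s',
)
logging.warning('disk space is low')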

Since you tagged your question python, I refer you to this question as well.
As for the content, Brian's proposal is good. Remember however to add the program name if you are using a shared log.
The important thing about a logfile is "greppability". Try to provide all your info in a single line, with proper string identifiers that are unique (also in radix) for a particular condition. As for the timestamp, use the ISO-8601 standard which sorts nicely.
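With the standard logging module, a single-line, greppable record with an ISO-8601-style timestamp might look like this (the logger name and message are just examples):

import logging

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    fmt='%(asctime)s %(name)s %(levelname)s %(message)s',
    datefmt='%Y-%m-%dT%H:%M:%S',
))
log = logging.getLogger('billing')
log.addHandler(handler)
log.error('INVALID_CARD: payment rejected')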

A good idea is to look at log analysis software. Unless you plan to write your own, you will probably want to exploit an existing log analysis package such as Analog. If that is the case, you will probably want to generate a log output that is similar enough to the formats that it accepts. It will allow you to create nice charts and graphs with minimal effort!

In my opinion, the best approach is to use an existing logging library such as log4j (or one of its variants for other languages). It gives you control over how your messages are formatted, and you can change it without ever touching your code. It follows best practices, is robust, and is used by millions of users. Of course, you can write your own logging framework, but that would be a very odd thing to do unless you need something very specific. In any case, walk through their documentation and see how log statements are presented there.
Check out log4py, a Python port of log4j; I think there are several implementations for Python out there.

Python's logging library is thread-safe for threads within a single process.

Timestamp, i.e. DateTime YYYY/MM/DD:HH:mm:ss:ms
User
Thread ID
Function Name
Message/Error Message/Success Message/Function Trace
Have this in XML format and you can then easily write a parser for it.
<log>
  <logEntry DebugLevel="0|1|2|3|4|5....">
    <TimeStamp format="YYYY/MM/DD:HH:mm:ss:ms" value="2009/04/22:14:12:33:120" />
    <ThreadId value="" />
    <FunctionName value="" />
    <Message type="Error|Success|Failure|Status|Trace">
      Your message goes here
    </Message>
  </logEntry>
</log>
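A parser for that layout takes only a few lines with the standard library (a sketch assuming the file is named log.xml):

import xml.etree.ElementTree as ET

tree = ET.parse('log.xml')
for entry in tree.getroot().findall('logEntry'):
    stamp = entry.find('TimeStamp').get('value')
    message = entry.find('Message')
    print(stamp, message.get('type'), message.text.strip())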


Python - Creating meaningful crash logs / crash report

Actual Problem (Crash Log Generation)
Is there a python module that could help me produce meaningful crash logs? Or a good way to go about producing them?
I want my crash logs to contain:
All variables within the current execution stack
All global variables
Parameters at invocation (e.g. flags, their values, etc.)
Is this something I'd just be better off writing myself?
Context (Not particularly relevant)
I have a program that is used by a large number of people within my company that I am responsible for supporting. Unfortunately it doesn't always work correctly (it fails roughly 1 run in 1000) and I am having difficulty tracking the bug down. I think that having solid crash logs would really help here, so that my users could just submit those rather than making vague phone calls for help.
You can look at how django generates its error pages: https://github.com/django/django/blob/master/django/views/debug.py#L59
In non-debug mode these are mailed to the list of admins in the settings file. It's super handy!
The python logging module has everything you need already.
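As a sketch of what the standard library alone can do here, a custom sys.excepthook can dump the traceback, each frame's local variables, and the command line to a log file (the filename and formatting are just examples):

import logging
import sys
import traceback

logging.basicConfig(filename='crash.log', level=logging.ERROR)

def log_crash(exc_type, exc_value, exc_tb):
    lines = ['Uncaught exception:']
    lines += traceback.format_exception(exc_type, exc_value, exc_tb)
    lines.append('Command line: %r' % (sys.argv,))
    # Walk the stack and record every frame's local variables.
    tb = exc_tb
    while tb is not None:
        frame = tb.tb_frame
        lines.append('Locals in %s (%s:%d):' % (
            frame.f_code.co_name, frame.f_code.co_filename, tb.tb_lineno))
        for name, value in frame.f_locals.items():
            lines.append('    %s = %r' % (name, value))
        tb = tb.tb_next
    logging.error('\n'.join(lines))

sys.excepthook = log_crash

Globals could be captured the same way from frame.f_globals, though that tends to be noisy.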

How to see changes for a mercurial file context?

I'm currently trying to write a script that will find all the files changed given a certain # in the task description, and I have gotten the script to work for that. But now I'm trying to sort it by whether the file was added, modified or removed. I've looked through the Mercurial API, but I can't find anything that can do what I want.
My code currently uses repo[revnum].description() and parses that to find which ones contain the #, and if they do, add the file context to a list.
This works fine and I can print a list of files, but I can't find a method to see what was done with each context. Can anyone help me out here, or point me to some better documentation?
Do you need to work with the Mercurial API? It is possible to do what you need by working with the output of hg log.
In general, you should avoid writing scripts directly using the Mercurial API. It is better to write your scripts to use the CLI or perhaps even use hglib. As stated on the MercurialApi wiki:
For the vast majority of third party code, the best approach is to use
Mercurial's published, documented, and stable API: the command line
interface.
That being said, if you really need to use the API, you can use repo.status() to find the info you asked about:
modified, added, removed, deleted, unknown, ignored, clean = repo.status(revnum-1, revnum)
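If you go the recommended CLI route instead, the same information is one subprocess call away (a sketch; the revision number is an example):

import subprocess

revnum = 42  # example revision number
output = subprocess.check_output(
    ['hg', 'status', '--change', str(revnum)],
    universal_newlines=True,
)
for line in output.splitlines():
    code, path = line.split(' ', 1)
    print(code, path)  # A = added, M = modified, R = removed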
I ended up using something similar to what Tim said, although I did still use the API.
I imported commands from mercurial, and then called commands.status(repo.ui, repo, change=revnum)
I captured the output of this using repo.ui.pushbuffer() and repo.ui.popbuffer(), which was in the form:
A file_path1
R file_path2
R file_path3
A file_path4
M file_path5
I parsed this output and sorted it into Added, Removed, Modified, etc.
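Put together, the approach might look like this (a sketch against Mercurial's internal API, which can change between releases; the revision number is an example):

from mercurial import commands, hg, ui as uimod

repo = hg.repository(uimod.ui(), '.')
revnum = 42  # example revision number

repo.ui.pushbuffer()
commands.status(repo.ui, repo, change=str(revnum))
output = repo.ui.popbuffer()

changes = {}
for line in output.splitlines():
    code, path = line.split(' ', 1)
    changes.setdefault(code, []).append(path)
print(changes)  # e.g. {'A': [...], 'M': [...], 'R': [...]}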

Using stdout to customize Logfile

For my current project I need to find a way to write a filtered, or at least formatted, version of a subprocess's stdout to a logfile.
Currently I'm just using subprocess.Popen(exec, stdout=log), which writes it to a file, unformatted.
The best idea I've got was formatting the output on the fly by intercepting stdout, rerouting it to a custom module, and letting it write its own strings to the log using a combination of ifs and elifs. But I couldn't find anything in the Python 3 docs on how to do this. The most fitting answer I could find was in this question; the accepted answer is already kind of close.
But I just fail to understand how exactly I should build the class to be used as stdout=class. Which of these methods receives the input (and how is it formatted)? Which of these methods are necessary? Or might there even be an easier way to accomplish what I want?
You can use the sarge project to allow a parent process to capture the stdout (and/or stderr) of a subprocess, read it line by line (say) and do whatever you like with the read lines (including writing them, appropriately formatted, to a log file).
Disclosure: I'm the maintainer of the sarge project.
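If you would rather stay in the standard library, the same line-by-line filtering can be sketched with subprocess alone (the command name and the matching rules are placeholders):

import logging
import subprocess

logging.basicConfig(filename='run.log', level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')

proc = subprocess.Popen(['mycommand'], stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT, universal_newlines=True)
for line in proc.stdout:
    line = line.rstrip()
    if 'error' in line.lower():
        logging.error(line)
    elif 'warning' in line.lower():
        logging.warning(line)
    else:
        logging.info(line)
proc.wait()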

In python, why use logging instead of print?

For simple debugging in a complex project is there a reason to use the python logger instead of print? What about other use-cases? Is there an accepted best use-case for each (especially when you're only looking for stdout)?
I've always heard that this is a "best practice" but I haven't been able to figure out why.
The logging package has a lot of useful features:
Easy to see where and when (even what line no.) a logging call is being made from.
You can log to files, sockets, pretty much anything, all at the same time.
You can differentiate your logging based on severity.
Print doesn't have any of these.
Also, if your project is meant to be imported by other python tools, it's bad practice for your package to print things to stdout, since the user likely won't know where the print messages are coming from. With logging, users of your package can choose whether they want to propagate logging messages from your tool.
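As a small sketch of the first points, one basicConfig call buys you the timestamp, severity, and exact call site that print never shows (the format string is just an example):

import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s %(levelname)-8s %(filename)s:%(lineno)d %(message)s',
)

log = logging.getLogger(__name__)
log.debug('loaded configuration')
log.warning('disk is nearly full')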
One of the biggest advantages of proper logging is that you can categorize messages and turn them on or off depending on what you need. For example, it might be useful to turn on debugging level messages for a certain part of the project, but tone it down for other parts, so as not to be taken over by information overload and to easily concentrate on the task for which you need logging.
Also, logs are configurable. You can easily filter them, send them to files, format them, add timestamps, and any other things you might need on a global basis. Print statements are not easily managed.
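For instance, turning debug messages on for one subsystem while keeping everything else quiet is a single configuration call (the logger names are examples):

import logging

logging.basicConfig(level=logging.WARNING)  # quiet by default
logging.getLogger('myapp.parser').setLevel(logging.DEBUG)  # verbose for one part

logging.getLogger('myapp.parser').debug('visible')
logging.getLogger('myapp.db').debug('suppressed')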
Print statements are sort of the worst of both worlds, combining the negative aspects of an online debugger with diagnostic instrumentation. You have to modify the program, but you don't get any more useful code out of it.
An online debugger allows you to inspect the state of a running program, and the nice thing about a real debugger is that you don't have to modify the source, neither before nor after the debugging session; you just load the program into the debugger, tell the debugger where you want to look, and you're all set.
Instrumenting the application might take some work up front, modifying the source code in some way, but the resulting diagnostic output can have enormous amounts of detail, and can be turned on or off to a very specific degree. The python logging module can show not just the message logged, but also the file and function that called it, a traceback if there was one, the actual time that the message was emitted, and so on. More than that, diagnostic instrumentation need never be removed; it's just as valid and useful when the program is finished and in production as it was the day it was added, but it can have its output stuck in a log file where it's not likely to annoy anyone, or the log level can be turned down to keep all but the most urgent messages out.
Anticipating the need for a debugger is really no harder than using ipython while you're testing and becoming familiar with the commands it uses to control the built-in pdb debugger.
When you find yourself thinking that a print statement might be easier than using pdb (as it often is), you'll find that using a logger leaves your program in a much easier-to-work-on state than if you use and later remove print statements.
I have my editor configured to highlight print statements as syntax errors, and logging statements as comments, since that's about how I regard them.
In brief, the advantages of using logging libraries outweigh print for the reasons below:
Control what’s emitted
Define what types of information you want to include in your logs
Configure how it looks when it’s emitted
Most importantly, set the destination for your logs
In detail, segmenting log events by severity level is a good way to sift through which log messages may be most relevant at a given time. A log event's severity level also gives you an indication of how worried you should be when you see a particular message: for instance, log events can be divided into debug, info, warning, error, and critical. Timing can be everything when you're trying to understand what went wrong with an application. You want to know the answers to questions like:
“Was this happening before or after my database connection died?”
“Exactly when did that request come in?”
Furthermore, it is easy to see where a log event occurred: the filename, line number, method name, and even the thread.
Here's a functional logging library for Python named loguru.
If you use logging then the person responsible for deployment can configure the logger to send it to a custom location, with custom information. If you only print, then that's all they get.
Logging essentially creates a searchable plain text database of print outputs with other meta data (timestamp, loglevel, line number, process etc.).
This is pure gold: I can run egrep over the log file after the python script has run.
I can tune my egrep pattern search to pick exactly what I am interested in and ignore the rest. This reduction of cognitive load and freedom to pick my egrep pattern later on by trial and error is the key benefit for me.
tail -f mylogfile.log | egrep "key_word1|key_word2"
Now throw in other cool things that print can't do (sending to socket, setting debug levels, logrotate, adding meta data etc.), you have every reason to prefer logging over plain print statements.
I tend to use print statements because they're lazy and easy; adding logging needs some boilerplate code. But hey, we have yasnippets (emacs), ultisnips (vim), and other templating tools, so why give up logging for plain print statements!?
I would add to all the other advantages mentioned that the print function in its standard configuration is buffered. The flush may occur only when the output buffer fills or when the program exits.
This is true for any program launched in a non-interactive shell (CodeBuild or GitLab CI, for instance) or whose output is redirected.
If for any reason the program is killed (kill -9, a hard reset of the computer, …), you may be missing some lines of logs if you used print.
However, the logging library flushes the logs it writes to stderr and stdout immediately on every call.
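A quick way to see the difference (a sketch; redirect both streams to files and kill the process during the sleep):

import logging
import sys
import time

logging.basicConfig(stream=sys.stderr, level=logging.INFO)

print('print: may sit in the buffer while stdout is redirected')
logging.info('logging: the StreamHandler flushes after every record')

time.sleep(60)  # kill -9 the process here, then compare the two output files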

Create a user-group in linux using python

I want to create a user group using python on CentOS system. When I say 'using python' I mean I don't want to do something like os.system and give the unix command to create a new group. I would like to know if there is any python module that deals with this.
Searching on the net did not reveal much about what I want, except for Python user groups, so I had to ask this.
I learned about the grp module by searching here on SO, but couldn't find anything about creating a group.
EDIT: I don't know if I have to start a new question for this, but I would also like to know how to add (existing) users to the newly created group.
Any help appreciated.
Thank you.
I don't know of a python module to do it, but the /etc/group and /etc/gshadow format is pretty standard, so if you wanted you could just open the files, parse their current contents and then add the new group if necessary.
Before you go doing this, consider:
What happens if you try to add a group that already exists on the system
What happens when multiple instances of your program try to add a group at the same time
What happens to your code when an incompatible change is made to the group format a couple releases down the line
NIS, LDAP, Kerberos, ...
If you're not willing to deal with these kinds of problems, just use the subprocess module and run groupadd. It will be way less likely to break your customers' machines.
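A minimal sketch of that approach, which also covers the edit about adding existing users to the new group (the group and user names are examples; both commands need root privileges):

import subprocess

def create_group(group):
    # groupadd enforces the distribution's GID allocation policy.
    subprocess.check_call(['groupadd', group])

def add_user_to_group(user, group):
    # usermod -a -G appends the user to a supplementary group.
    subprocess.check_call(['usermod', '-a', '-G', group, user])

create_group('mygroup')
add_user_to_group('alice', 'mygroup')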
Another thing you could do that would be less fragile than writing your own would be to wrap the code in groupadd.c (in the shadow package) in Python and do it that way. I don't see this buying you much versus just exec'ing it, though, and it would add more complexity and fragility to your build.
I think you should use the command-line programs from your program; a lot of care has gone into making sure that they don't break the groups file if something goes wrong.
However, the file format is quite straightforward, so you could write something yourself if you choose to go that way.
There are no library calls for creating a group. This is because there's really no such thing as creating a group. A GID is simply a number assigned to a process or a file. All these numbers exist already - there is nothing you need to do to start using a GID. With the appropriate privileges, you can call chown(2) to set the GID of a file to any number, or setgid(2) to set the GID of the current process (there's a little more to it than that, with effective IDs, supplementary IDs, etc).
Giving a name to a GID is done by an entry in /etc/group on basic Unix/Linux/POSIX systems, but that's really just a convention adhered to by the Unix/Linux/POSIX userland tools. Other network-based directories also exist, as mentioned by Jack Lloyd.
The man page group(5) describes the format of the /etc/group file, but it is not recommended that you write to it directly. Your distribution will have policies on how unnamed GIDs are allocated, such as reserving certain spaces for different purposes (fixed system groups, dynamic system groups, user groups, etc). The range of these number spaces differs on different distributions. These policies are usually encoded in the command-line tools that a sysadmin uses to assign unnamed GIDs.
This means the best way to add a group locally is to use the command-line tools.
If you are looking at Python, then try this program. It's fairly simple to use, and the code can easily be customized: http://aleph-null.tv/downloads/mpb-adduser-1.tgz
