Looking for a Python approach to normally-closed logfiles

I have a Python program using a RotatingFileHandler for logging. The logging file handler opens the logfile in exclusive mode (once) and keeps it open until the app closes. The problem is that I need to allow other processes to read the logfile while the Python program is still running.
In past projects using C++, I created a queued logger. It maintained a queue of log entries. A secondary worker thread would regularly check the queue and if there were any entries, open the logfile, dump the entries to the file, and immediately close the file until more log entries are queued. This meant that (in processor time) >99% of the time, the file would be closed and available for other processes to peek into the logfile.
(From a bit of digging, I'm under the impression that the Python logging class already handles the queuing of log entries... That's not the part I'm asking about.)
Is there a simple way to accomplish a normally-closed logfile handler in Python? (Preferably without having to add a 3rd party library or subsystem.)

As suggested in the comments, I'd use a QueueHandler as the single root handler, combined with a QueueListener that acts on new records as they arrive. On top of that, a custom RotatingFileHandler is needed that closes the file after each record is persisted, provided no records are left in the queue.
Disclaimer: the code below is untested.
import logging
import queue
from logging.handlers import QueueHandler, QueueListener, RotatingFileHandler

que_listener = None

class MyHandler(RotatingFileHandler):
    def __init__(self, queue, *args, **kwargs):
        # delay=True keeps the base class from opening the file on init
        super().__init__(*args, delay=True, **kwargs)
        self.queue = queue

    def emit(self, record):
        if self.stream is None:
            self.stream = self._open()
        super().emit(record)
        # close the file as soon as the queue has drained
        if self.queue.empty():
            self.stream.close()
            self.stream = None

def init_logging():
    global que_listener
    que = queue.Queue(-1)
    root_handler = QueueHandler(que)
    file_handler = MyHandler(que, "app.log")  # RotatingFileHandler requires a filename
    que_listener = QueueListener(que, file_handler)
    root = logging.getLogger()
    root.addHandler(root_handler)
    que_listener.start()  # starts a separate thread that listens for queue updates

def cleanup_logging():  # stop the listener on program exit
    que_listener.stop()
I used delay=True so the handler won't immediately open and lock the file on init, as the default behaviour would. Also, since the file is closed between records being persisted, think about proper error handling in emit (file removed, locked by another process, etc.).
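For example, hardening emit could look like this rough sketch (the bare OSError handling and the handleError fallback are my assumptions, not part of the answer above):

import logging.handlers

class SafeClosingHandler(logging.handlers.RotatingFileHandler):
    # Hypothetical variant of MyHandler above with basic error handling.
    def __init__(self, queue, *args, **kwargs):
        super().__init__(*args, delay=True, **kwargs)
        self.queue = queue

    def emit(self, record):
        try:
            if self.stream is None:
                self.stream = self._open()
            super().emit(record)
            if self.queue.empty():
                self.stream.close()
                self.stream = None
        except OSError:
            # The file may have been removed or locked by another process
            # between writes; drop the stream and report the record through
            # logging's standard error path instead of killing the listener.
            self.stream = None
            self.handleError(record)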

To me it seems that creating a separate thread to manage this, as you pointed out, would be the best strategy.
That thread could perform the reads/writes while leaving the file available to every other program.
You can design such a thread in Python using the logging library.
Another strategy would be to ship the logs to a system such as Sentry or Kibana.
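A rough illustration of that writer-thread strategy (the queue name, sentinel, and file path are mine): the file is opened only while queued entries are being flushed, and is closed again as soon as the queue drains.

import queue
import threading

log_queue = queue.Queue()
_STOP = object()  # sentinel that tells the writer to shut down

def writer(path):
    while True:
        entry = log_queue.get()  # blocks until an entry arrives
        if entry is _STOP:
            return
        # Open, drain everything currently queued, then close again so the
        # file stays readable by other processes most of the time.
        with open(path, "a") as f:
            f.write(entry + "\n")
            while not log_queue.empty():
                nxt = log_queue.get()
                if nxt is _STOP:
                    return
                f.write(nxt + "\n")

t = threading.Thread(target=writer, args=("app.log",), daemon=True)
t.start()
log_queue.put("log entry")
log_queue.put(_STOP)
t.join()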

Related

Using the Jaeger Python client together with Luigi

I'm just starting to use Jaeger for tracing and want to get the Python client to work with Luigi. The root of the problem is that Luigi uses multiprocessing to fork worker processes. The docs mention that this can cause problems and recommend, in the case of web apps, deferring the tracer initialization until the request-handling process has been forked. This does not work for me, because I want to create traces in the main process and in the worker processes.
In the main process I initialize the tracer as described here:
from jaeger_client import Config
config = Config(...)
tracer = config.initialize_tracer()
Internally this creates a Tornado IO loop that cannot be reused in the forked processes. So I try to re-initialize the Jaeger client in each Luigi worker process. This is possible by setting the (undocumented?) task_process_context of the worker section to a class implementing the context manager protocol. It looks like this:
import threading
import time

from jaeger_client import Config

class WorkerContext:
    def __init__(self, process):
        pass

    def __enter__(self):
        Config._initialized = False
        Config._initialized_lock = threading.Lock()
        config = Config(...)
        self.tracer = config.initialize_tracer()

    def __exit__(self, type, value, traceback):
        self.tracer.close()
        time.sleep(2)
Those two lines are of course very "hackish":
Config._initialized = False
Config._initialized_lock = threading.Lock()
The class variables are copied into the forked process, and initialize_tracer would complain about already being initialized if I did not reset them.
The details of how multiprocessing forks the new process, and what this means for the Tornado loop, are somewhat mysterious to me. So my question is:
Is the above code safe or am I asking for trouble?
Of course I would get rid of accessing Config's internals. But if the solution can be considered safe, I would ask the maintainers for a reset method that can be called in a forked process.
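For illustration, such a reset hook might look like this purely hypothetical sketch (jaeger_client provides no such method; the attribute names are the same private ones used in the hack above):

import threading

from jaeger_client import Config

def reset_jaeger_config():
    # Hypothetical helper: clear the class-level state Config keeps so that
    # a forked worker can call initialize_tracer() again.
    Config._initialized = False
    Config._initialized_lock = threading.Lock()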

Python logging to PySide widget without delay

Problem: I have a PySide application that already uses logging for console output, but its logging should be extended so that LogRecords are also displayed immediately in a widget like a QTextBrowser. I am aware that this would usually be done via a worker thread that signals a slot in the main/gui thread; however, as the code base is fairly big and logging is probably used in a few blocking core operations, it would be nice if immediate feedback in the GUI could be achieved anyway without a bigger refactoring.
Example: Here is some example code for demonstration. It shows:
a logger with two handlers:
a StreamHandler logging to the console
a QSignalHandler emitting a signal with a message connected to a slot that appends the message to a QTextBrowser.
a method long_running_core_operation_that_should_log_immediately_to_ui() that simulates logging from a blocking core operation.
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import logging
import sys

from PySide import QtCore
from PySide import QtGui


class QSignaler(QtCore.QObject):
    log_message = QtCore.Signal(unicode)


class SignalHandler(logging.Handler):
    """Logging handler to emit QtSignal with log record text."""

    def __init__(self, *args, **kwargs):
        super(SignalHandler, self).__init__(*args, **kwargs)
        self.emitter = QSignaler()

    def emit(self, logRecord):
        msg = "{0}".format(logRecord.getMessage())
        self.emitter.log_message.emit(msg)
        # When the line below is enabled, logging is immediate; otherwise
        # events on the queue will be processed when the slot has finished.
        # QtGui.qApp.processEvents()


# configure logging
logging.basicConfig(level=logging.DEBUG)  # adds StreamHandler
signal_handler = SignalHandler()
logger = logging.getLogger()
logger.addHandler(signal_handler)


class TestWidget(QtGui.QWidget):
    def __init__(self, *args, **kwargs):
        super(TestWidget, self).__init__(*args, **kwargs)
        layout = QtGui.QVBoxLayout(self)
        # text_browser
        self.text_browser = QtGui.QTextBrowser()
        layout.addWidget(self.text_browser)
        # btn_start_operation
        self.btn_start_operation = QtGui.QPushButton("Start operation")
        self.btn_start_operation.clicked.connect(
            self.long_running_core_operation_that_should_log_immediately_to_ui)
        layout.addWidget(self.btn_start_operation)
        # btn_clear
        self.btn_clear = QtGui.QPushButton("Clear")
        self.btn_clear.clicked.connect(self.text_browser.clear)
        layout.addWidget(self.btn_clear)

    def long_running_core_operation_that_should_log_immediately_to_ui(self):
        for index in range(10000):
            msg = "{0}".format(index)
            logger.info(msg)


# test
if __name__ == "__main__":
    app = QtGui.QApplication(sys.argv)
    test_widget = TestWidget()
    signal_handler.emitter.log_message.connect(test_widget.text_browser.append)
    test_widget.show()
    sys.exit(app.exec_())
Question: While the StreamHandler logging to stdout happens immediately, the QSignalHandler logging only happens when the PySide event loop processes events again, which is after the for loop.
Is there a recommended way to achieve immediate logging from the QSignalHandler without invoking a worker thread for the core operation?
Is it safe/recommended to just call QtGui.qApp.processEvents() after the QSignalHandler has emitted the logging signal? (When uncommented, logging to the GUI happens directly.)
When reading the documentation for signal connection types, where it says "Qt.DirectConnection: The slot is invoked immediately, when the signal is emitted", I would have thought the QSignalHandler should update the widget immediately, just as the StreamHandler does, shouldn't it?
Is there a recommended way to achieve immediate logging from the QSignalHandler without invoking a worker thread for the core operation?
I don't know of any other way to trigger a repaint of the log widget than processing events.
Note that calling repaint() on the log widget is misleading and does not have the desired effect, it only forces the paintEvent() method of the log widget to be called. repaint() does not do crucial things like copying the window surface to the windowing system.
Is it safe/recommended to just call QtGui.qApp.processEvents() after the QSignalHandler has emitted the logging signal? (When uncommented, logging to the GUI happens directly).
Using a separate thread or asynchronous operations is the recommended way. Calling processEvents() is the recommended way if you can't do that, as in your case. Even Qt uses it for the same purpose inside QProgressDialog::setValue().
In general, manually processing events can be dangerous and should be done with care. After the call to processEvents(), the complete application state might be different. For example the log widget might no longer exist because the user closed the window! In your example code that is no problem, as the signal/slot connection will automatically disconnect, but imagine if you had tried to access the log widget after it has been deleted due to it being closed - you would have gotten a crash. So be careful.
When reading the documentation for signal connection types, where it says "Qt.DirectConnection: The slot is invoked immediately, when the signal is emitted", I would have thought the QSignalHandler should update the widget immediately, just as the StreamHandler does, shouldn't it?
The slot, in your case QTextBrowser::append(), is called immediately. However, QTextBrowser::append() does not immediately repaint. Instead, it schedules a repaint (via QWidget::update()), and the actual repainting happens when Qt gets around to process the events. That is either when you return to the event loop, or when you call processEvents() manually.
So slots are indeed called right away when emitting a signal, at least when using the default DirectConnection. However repainting does not happen immediately.
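For completeness, here is a minimal sketch of the thread-based route (this code is mine, not from the question): run the blocking loop in a worker thread so the GUI event loop keeps processing paint events, and let the cross-thread signal/slot connection, which Qt automatically queues, deliver the text to the widget.

import threading

def start_operation_in_thread(test_widget):
    # The QSignaler lives in the GUI thread, so signals emitted from this
    # worker are delivered via a queued connection and the slot
    # (text_browser.append) runs safely in the GUI thread.
    worker = threading.Thread(
        target=test_widget.long_running_core_operation_that_should_log_immediately_to_ui)
    worker.daemon = True
    worker.start()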

Maintaining log file from multiple threads in Python

I have my Python BaseHTTPServer server, which handles POST requests.
I used ThreadingMixIn, and it now opens a thread for each connection.
I wish to do several multithreaded actions, such as:
1. Monitoring successful/failed connection activity by incrementing a counter for each. I need a lock for that. My counter is in the global scope of the same file. How can I do that?
2. Handling some sort of queue and writing it to a file, where the content of the queue is a set of strings, written from my different threads, that simply send some information for logging purposes. How can that be done? I have failed to accomplish it, since my threading is done "behind the scenes": each time I am in the do_POST(...) method, I am already in a different thread.
Succcessful_Logins = 0
Failed_Logins = 0
LogsFile = open(logfile)

class httpHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        ..

class ThreadingHTTPServer(ThreadingMixIn, HTTPServer):
    pass

server = ThreadingHTTPServer(('localhost', PORT_NUMBER), httpHandler)
server.serve_forever()
This is a small fragment of my server.
Another thing that bothers me is the fact that I want to first send the POST response back to the client, and only then possibly get delayed by the locking mechanism or whatever.
From your code, it looks like a new httpHandler is constructed in each thread? If that's the case, you can use a class variable for the count and a mutex to protect it, like:
import threading

class httpHandler(BaseHTTPRequestHandler):
    # Note that these are class variables and are therefore accessible
    # to all instances
    numSuccess = 0
    numSuccessLock = threading.Lock()

    def do_POST(self):
        self.numSuccessLock.acquire()
        # Increment on the class, not the instance, so the count is shared
        httpHandler.numSuccess += 1
        self.numSuccessLock.release()
As for writing to a file from different threads, there are a few options:
Use the logging module, "The logging module is intended to be thread-safe without any special work needing to be done by its clients." from http://docs.python.org/2/library/logging.html#thread-safety
Use a Lock object like above to serialize writes to the file
Use a thread safe queue to queue up writes and then read from the queue and write to the file from a separate thread. See http://docs.python.org/2/library/queue.html#module-Queue for examples.
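For instance, a minimal sketch of the first option (the filename and format string are placeholders of mine), letting the logging module serialize writes from all handler threads:

import logging

# The logging module serializes writes internally, so every handler thread
# can log through this one logger without any extra locking.
logging.basicConfig(
    filename="server.log",
    format="%(asctime)s %(threadName)s %(message)s",
    level=logging.INFO,
)
logger = logging.getLogger("httpserver")

# Inside do_POST you would then simply call:
# logger.info("handled POST from %s", self.client_address[0])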

Python threading with filehandling

Hello, I have a program that looks through a range of data and finds anomalies in it. To make my program faster I incorporated threads (66 in total). When my program finds an anomaly, I want it to write it to a file; however, when I try to write to the file from within multiple threads, it won't write.
class myThread(threading.Thread):
    def __init__(self, arg1, arg2, lock, output):
        threading.Thread.__init__(self)
        self.arg1 = arg1
        self.arg2 = arg2
        self.lock = lock
        self.file = output

    def run(self):
        # print "Starting " + self.name
        main(self.arg1, self.arg2, self.lock, self.file)
        # print "Exiting " + self.name

def main(START_IP, END_IP, lock, File):
    # store found DNS servers
    foundDNS = []
    # scan all the ip addresses in the range
    for i0 in range(START_IP[0], END_IP[0]+1):
        for i1 in range(START_IP[1], END_IP[1]+1):
            for i2 in range(START_IP[2], END_IP[2]+1):
                for i3 in range(START_IP[3], END_IP[3]+1):
                    # build ip address
                    ipaddr = str(i0)+"."+str(i1)+"."+str(i2)+"."+str(i3)
                    print "Scanning "+ipaddr+"...",
                    # scan address
                    ret = ScanDNS(ipaddr, 10)
                    if ret == True:
                        foundDNS.append(ipaddr)
                        print "Found!"
                        lock.acquire()
                        File.write(ipaddr)
                        File.write("\n")
                        File.flush()
                        lock.release()
                    else:
                        print

file = open("file.txt", "wb")
lock = threading.Lock()
thread1 = myThread(START_IP, END_IP, lock, file)  # the file argument was missing here
thread1.start()
This uses my exact same myThread class, just with the arguments main requires to manipulate the data. If I run my code for about a minute as it scans over DNS servers, I should get maybe 20-30 DNS servers saved into a file, but I generally get this:
FILE.TXT
2.2.1.2
8.8.8.8
31.40.40
31.31.40.40
31.31.41.41
I know for a fact (because I watched the scanning output) that it hardly writes all of them. So why are some written and some not?
I don't know why your code is not working, but I can hazard a guess that it is due to race conditions. Hopefully someone knowledgeable can answer that part of your question.
However, I've encountered a similar problem before, and I solved it by moving the file writing code to a single output thread. This thread read from a synchronized queue to which other threads pushed data to be written.
Also, if you happen to be working on a machine with multiple cores, it's better to use multiprocessing instead of threading. The latter only runs Python code on one core at a time (because of the GIL), while the former does not have this limitation.
Instead of providing the file, provide a Queue. Spawn a new thread to read from the Queue and write to the file. Or use locks everywhere, in print too, because some threads can be deadlocked.
To avoid potential errors or misuse when accessing a file from multiple threads, you can try using logging to write down your results.
import logging

logger = logging.getLogger()
file_handler = logging.FileHandler("results.log")  # FileHandler needs a filename; this one is just an example
formatter = logging.Formatter("%(asctime)s %(message)s")  # your format here
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
Check the documentation for File Objects:
File.flush() is not enough to ensure that your data is written to disk; add
os.fsync(File.fileno()) just after it to make that happen.
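Applied to the loop in the question, that would look like this sketch (lock, File, and ipaddr come from the code above):

import os

lock.acquire()
try:
    File.write(ipaddr + "\n")
    File.flush()               # flush Python's internal buffer
    os.fsync(File.fileno())    # ask the OS to push the data out to disk
finally:
    lock.release()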

Tkinter GUI only updates when mouse is moved

I am running a Tkinter GUI that spins off another process (python script) with subprocess.Popen(...) and uses pipes for stdout and stderr. Then I'm spinning off a separate thread to asynchronously read the out/err from that process and draw it into a Tkinter Text widget using threading.Thread.
Everything works great except that the async read thread only executes when I'm moving the mouse or pressing keys on the keyboard. I even put print statements into the threaded function, and they start/stop printing when I move the mouse around in circles.
Here's the async read class that I'm using, borrowed from here:
class AsynchronousFileReader(threading.Thread):
    '''
    Helper class to implement asynchronous reading of a file
    in a separate thread. Pushes read lines on a queue to
    be consumed in another thread.
    '''

    def __init__(self, fd, queue):
        assert isinstance(queue, Queue.Queue)
        assert callable(fd.readline)
        threading.Thread.__init__(self)
        self._fd = fd
        self._queue = queue

    def run(self):
        '''The body of the thread: read lines and put them on the queue.'''
        for line in iter(self._fd.readline, ''):
            self._queue.put(line)

    def eof(self):
        '''Check whether there is no more content to expect.'''
        return not self.is_alive() and self._queue.empty()
And here is my consume method for pulling messages out of the async file reader (this is the one that runs on a separate thread):
def consume(self, process, console_frame):
    # Launch the asynchronous readers of the process' stdout and stderr.
    stdout_queue = Queue.Queue()
    stdout_reader = AsynchronousFileReader(process.stdout, stdout_queue)
    stdout_reader.start()
    stderr_queue = Queue.Queue()
    stderr_reader = AsynchronousFileReader(process.stderr, stderr_queue)
    stderr_reader.start()
    # Check the queues if we received some output (until there is nothing more to get).
    while not stdout_reader.eof() or not stderr_reader.eof():
        # Show what we received from standard output.
        while not stdout_queue.empty():
            line = stdout_queue.get()
            console_frame.writeToLog(line.strip(), max_lines=None)
            time.sleep(.03)  # prevents it from printing out in large blocks at a time
        # Show what we received from standard error.
        while not stderr_queue.empty():
            line = stderr_queue.get()
            console_frame.writeToLog(line.strip(), max_lines=None)
            time.sleep(.03)  # prevents it from printing out in large blocks at a time
        # Sleep a bit before asking the readers again.
        time.sleep(.05)
    # Let's be tidy and join the threads we've started.
    stdout_reader.join()
    stderr_reader.join()
    # Close subprocess' file descriptors.
    process.stdout.close()
    process.stderr.close()
    print "finished executing"
    if self.stop_callback:
        self.stop_callback()
Like I said before -- the consume() thread only executes when I move the mouse or type on the keyboard -- which means the writeToLog(...) function (for appending text into the Tkinter GUI) only gets executed when mouse/keyboard activity happens... Any ideas?
EDIT: I think I might have an idea of what's happening... If I comment out the writeToLog(...) call and replace it with a simple print (taking Tkinter out of the equation), then the consume thread executes normally. It seems Tkinter is the problem here. Any ideas on how I can accomplish the Tkinter text-widget update from the consume thread?
EDIT2: Got it working thanks to the comments. Here is the final code that I used:
gui_text_queue = Queue.Queue()

def consume(self, process, console_frame):
    # Launch the asynchronous readers of the process' stdout and stderr.
    stdout_queue = Queue.Queue()
    stdout_reader = AsynchronousFileReader(process.stdout, stdout_queue)
    stdout_reader.start()
    stderr_queue = Queue.Queue()
    stderr_reader = AsynchronousFileReader(process.stderr, stderr_queue)
    stderr_reader.start()
    # Check the queues if we received some output (until there is nothing more to get).
    while not stdout_reader.eof() or not stderr_reader.eof():
        # Show what we received from standard output.
        while not stdout_queue.empty():
            line = stdout_queue.get()
            gui_text_queue.put(line.strip())
        # Show what we received from standard error.
        while not stderr_queue.empty():
            line = stderr_queue.get()
            gui_text_queue.put(line.strip())
        # Sleep a bit before asking the readers again.
        time.sleep(.01)
    # Let's be tidy and join the threads we've started.
    stdout_reader.join()
    stderr_reader.join()
    # Close subprocess' file descriptors.
    process.stdout.close()
    process.stderr.close()
    if self.stop_callback:
        self.stop_callback()
Added this method to my Tkinter console frame and called it once at the end of the frame initializer:
def pull_text_and_update_gui(self):
    while not gui_text_queue.empty():
        text = gui_text_queue.get()
        self.writeToLog(text, max_lines=None)
    self.after(5, self.pull_text_and_update_gui)
Tkinter isn't thread safe. If your writeToLog function tries to insert data into the text widget, you'll get unpredictable behavior. In order for a separate thread to send data to a widget you'll need to write the data to a thread-safe queue, then have your main thread poll that queue (using tkinter's after method).
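A minimal self-contained illustration of that pattern (widget names and timings are mine):

import Queue
import threading
import time
import Tkinter

q = Queue.Queue()

def worker():
    # Producer thread: never touches the widget, only the queue.
    for i in range(20):
        q.put("line %d" % i)
        time.sleep(0.1)

root = Tkinter.Tk()
text = Tkinter.Text(root)
text.pack()

def poll_queue():
    # Runs in the main thread: drain the queue, then schedule another poll.
    while not q.empty():
        text.insert(Tkinter.END, q.get() + "\n")
    root.after(50, poll_queue)

threading.Thread(target=worker).start()
poll_queue()
root.mainloop()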

Categories

Resources