I've been reading the Python logging documentation, and it certainly has a lot of functionality... for everything.
The problem I'm facing is that I'm not Dutch and so I'm not sure what the right way to do this is.
I am running events in a simulator, and I would like to prefix every log message with the timestamp of the simulated time (probably with a length formatter too, just to keep it looking good). I could change this in a subclass of Logger or Handler, but I don't think that is the right way.
I think the right way is to use a LoggerAdapter or a Filter. Is this right, and if so, which one should I prefer?
Surely if you just need to prefix every log message with the timestamp, all you need to do is provide an appropriate format string for a formatter? As mentioned here in the documentation.
Update: A LoggerAdapter would be more appropriate for situations where contextual information is moderately long-lived. For example, in a network server application which handles multiple client connections, the connection details (e.g. client IP address) might be useful context, so you would create a LoggerAdapter instance when a new client connection was created. This doesn't sound like your simulation scenario.
A Filter, on the other hand, could be used if you have some way of getting hold of the simulation time, e.g.
import logging

class SimulationFilter(logging.Filter):
    def __init__(self, context):
        """
        Set up with context passed in which allows access
        to simulation times.
        """
        self.context = context

    def filter(self, record):
        "Add sim_time field to record, formatted as you wish"
        record.sim_time = '%s' % self.context.get_sim_time()
        return True
and then add %(sim_time)s in your Formatter's format string.
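For instance, a minimal sketch of the wiring (the context object and its get_sim_time() method are assumed to come from your simulator):

import logging

logger = logging.getLogger('sim')
logger.setLevel(logging.DEBUG)

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('%(sim_time)-12s %(levelname)s %(message)s'))
logger.addHandler(handler)
logger.addFilter(SimulationFilter(context))  # context.get_sim_time() supplies the prefix

logger.debug('event processed')  # message comes out prefixed with the simulated time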
Alternatively, if you know the simulation time whenever you make a logging call, you could just do e.g.
logger.debug('Message with %s', arguments, extra={'sim_time': sim_time})
and likewise have %(sim_time)s in your Formatter's format string.
I am working with a CAN bus in Python (PCAN Basic API) and would like to make it easier to use.
A lot of devices/modules are connected via the bus. They are all allowed to send data; if a collision happens, the lowest ID wins.
The data is organized in frames with ID, SubID, and hex values.
To illustrate the problem I am trying to address, imagine the amplitude of a signal.
To read the value, a frame is sent to the bus:
QuestionID QuestionSUBID QuestionData
If there is no message with higher priority (= lower ID), the answer is written to the bus:
AnswerID AnswerSubID AnswerData
Since any module/device is allowed to write to the bus, you don't know in advance which answer you will get next. Setting a value works the same way, just with different IDs. So for the above example the amplitude would have:
4 IDs and SubIDs associated with the read/write question/answer
Additionally, the length of the data (0-8) has to be specified/stored.
Since the data is all hex values, a parser has to be specified to obtain the human-readable value (e.g. voltage in decimal representation).
To store this information I use nested dicts:
parameters = {'Parameter_1': {'Read': {'question_ID': ID,
                                       'question_SUBID': SubID,
                                       'question_Data': hex_value_list,
                                       'answer_ID': ...,
                                       'answer_subID': ...,
                                       'answer_parser': function},
                              'Write': {'ID': ...,
                                        'SubID': ...,
                                        'parser': ...,
                                        'answer_ID': ...,
                                        'answer_subID': ...}},
              'Parameter_2': ...}
There are a lot of tools to show which value was set when, but for hardware control the order in which parameters are read is not relevant, as long as they are up to date. Thus one part of a possible solution would be storing the whole traffic in a dict of dicts:
busdata = {'firstID': {'first_subID': {'data': data,
                                       'timestamp': timestamp},
                       'second_subID': {'data': data,
                                        'timestamp': timestamp},
                       },
           'secondID': ...}
Due to the nature of the bus, I get a lot of answers to questions other devices asked (the bus is quite full). These should not be dismissed, since they might be the values I need next and there is no need to create additional traffic. I might use the timestamp with an expiry date, but I haven't thought a lot about that so far.
This works, but is horrible to work with. In general I guess I will have about 300 parameters. The final goal is to control the devices via a (PyQt) GUI, read some values like serial numbers, and also run measurement tasks.
So the big question is how to define a better data structure that is easily accessible and understandable? I am looking forward to any suggestion on a clean design.
The main goal would be to get rid of the whole message-based approach.
EDIT: My goal is to get rid of the whole CAN-specific message-based approach:
I assume I will need one thread for the communication; it should:
Read the buffer and update my variables
Send requests (messages) to obtain other values/variables
Send some values periodically
So from the GUI I would like to be able to:
get parameter by name --> send a string with parameter name
set parameter signal --> str(name), value (as displayed in the GUI)
get values periodically --> name, interval, duration(10s or infinite)
The thread would have to:
Log all data on the bus for internal storage
Process requests by generating messages from name, value and read until result is obtained
Send periodical signals
I would like to have this design independent of the actual hardware:
The solution I thought of is the above parameters_dict.
For internal storage I thought about the bus_data_dict.
Still I am not sure how to:
Pass data from the bus thread to the GUI (all values vs. new/requested value)
How to implement it with signals and slots in pyqt
Store data internally (dict of dicts or some new better idea)
If this design is a good choice
Using the python-can library will get you the networking thread - giving you a buffered queue of incoming messages. The library supports the PCAN interface among others.
Then you would create a middleware layer that converts and routes these can.Message types into PyQt signals. Think of this as a one-to-many source of events/signals.
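As a rough sketch of that middleware layer (not a drop-in implementation: PyQt5, the channel name and the signal name are assumptions), recent python-can versions accept plain callables as listeners, so something like this could work:

import can
from PyQt5 import QtCore

class CanBridge(QtCore.QObject):
    # Middleware: turns incoming can.Message objects into Qt signals.
    message_received = QtCore.pyqtSignal(object)  # carries the raw can.Message

    def on_message(self, msg):
        # Called from python-can's Notifier thread for every frame on the bus.
        self.message_received.emit(msg)

# Hypothetical wiring for a PCAN interface:
bus = can.interface.Bus(channel='PCAN_USBBUS1', bustype='pcan')
bridge = CanBridge()
notifier = can.Notifier(bus, [bridge.on_message])
# bridge.message_received can now be connected to any number of GUI slots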
I'd use another controller to be in charge of sending messages to the bus. It could have tasks like requesting periodic measurements from the bus, as well as handling on-demand requests driven by the GUI.
Regarding storing the data internally, it really depends on your programming style and the complexity. I have seen projects where each CAN message would have its own class.
Finally, queues are your friend!
Agree with @Hardbyte on the use of python-can. It's excellent.
As far as messaging between app layers, I've had a lot of luck with ZeroMQ -- you can set up your modules as event-based, from the CAN bus message event all the way through to updating a UI or whatever.
For data storage / persistence, I'm dropping messages into SQLite, and in parallel (using the ZMQ Pub/Sub pattern) passing the data to an IoT hub (via MQTT).
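Not the actual code from that project, just a minimal sketch of the publishing side with pyzmq and sqlite3 (the address, topic and table layout are made up, and msg is assumed to be a python-can Message):

import json
import sqlite3
import zmq

db = sqlite3.connect('canlog.db')
db.execute('CREATE TABLE IF NOT EXISTS messages (ts REAL, can_id INTEGER, data TEXT)')

ctx = zmq.Context.instance()
pub = ctx.socket(zmq.PUB)
pub.bind('tcp://127.0.0.1:5556')

def on_can_message(msg):
    # Persist the frame locally, then publish it for any subscribed module.
    row = (msg.timestamp, msg.arbitration_id, msg.data.hex())
    db.execute('INSERT INTO messages VALUES (?, ?, ?)', row)
    db.commit()
    pub.send_string('can ' + json.dumps(row))  # subscribers filter on the 'can' topic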
class MySimpleCanBus:
    def parse_message(self, raw_message):
        return MyMessageClass._create(*struct.unpack(FRAME_FORMAT, raw_message))

    def receive_message(self, filter_data):
        # code to receive and parse a message (filtered by ID)
        raw = canbus.recv(FRAME_SIZE)
        return self.parse_message(raw)

    def send_message(self, msg_data):
        # code to make sure the message can be sent and send the message
        return self.receive_message()

class MySpecificCanBus(MySimpleCanBus):
    def get_measurement_reading(self):
        msg_data = {}  # code to request a measurement
        return self.send_message(msg_data)

    def get_device_id(self):
        msg_data = {}  # code to get device_id
        return self.send_message(msg_data)
I probably don't understand your question properly... maybe you could update it with additional details.
Applications often need to connect to other services (a database, a cache, an API, etc). For sanity and DRY, we'd like to keep all of these connections in one module so the rest of our code base can share connections.
To reduce boilerplate, downstream usage should be simple:
# app/do_stuff.py
from .connections import AwesomeDB
db = AwesomeDB()
def get_stuff():
return db.get('stuff')
And setting up the connection should also be simple:
# app/cli.py or some other main entry point
from .connections import AwesomeDB
db = AwesomeDB()
db.init(username='stuff admin') # Or os.environ['DB_USER']
Web frameworks like Django and Flask do something like this, but it feels a bit clunky:
Connect to a Database in Flask, Which Approach is better?
http://flask.pocoo.org/docs/0.10/tutorial/dbcon/
One big issue with this is that we want a reference to the actual connection object instead of a proxy, because we want to retain tab-completion in IPython and other dev environments.
So what's the Right Way (tm) to do it? After a few iterations, here's my idea:
#app/connections.py
from awesome_database import AwesomeDB as RealAwesomeDB
from horrible_database import HorribleDB as RealHorribleDB
class ConnectionMixin(object):
__connection = None
def __new__(cls):
cls.__connection = cls.__connection or object.__new__(cls)
return cls.__connection
def __init__(self, real=False, **kwargs):
if real:
super().__init__(**kwargs)
def init(self, **kwargs):
kwargs['real'] = True
self.__init__(**kwargs)
class AwesomeDB(ConnectionMixin, RealAwesomeDB):
pass
class HorribleDB(ConnectionMixin, RealHorribleDB):
pass
Room for improvement: Set initial __connection to a generic ConnectionProxy instead of None, which catches all attribute access and throws an exception.
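Something small like this would do for that placeholder (a sketch):

class ConnectionProxy(object):
    # Fails loudly if the connection is used before init() has been called.
    def __getattr__(self, name):
        raise RuntimeError('Connection has not been initialized yet')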
I've done quite a bit of poking around here on SO and in various OSS projects and haven't seen anything like this. It feels pretty solid, though it does mean a bunch of modules will be instantiating connection objects as a side effect at import time. Will this blow up in my face? Are there any other negative consequences to this approach?
First, design-wise, I might be missing something, but I don't see why you need the heavy mixin+singleton machinery instead of just defining a helper like so:
_awesome_db = None
def awesome_db(**overrides):
global _awesome_db
if _awesome_db is None:
# Read config/set defaults.
# overrides.setdefault(...)
_awesome_db = RealAwesomeDB(**overrides)
return _awesome_db
Also, there is a bug (though it might not be a supported use case): if you make the following two calls in a row, you would wrongly get the same connection object twice, even though you passed different parameters:
db = AwesomeDB()
db.init(username='stuff admin')
db = AwesomeDB()
db.init(username='not-admin') # You'll get admin connection here.
An easy fix for that would be to use a dict of connections keyed on the input parameters.
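For example, the helper above could be extended along these lines (just a sketch):

_connections = {}

def awesome_db(**overrides):
    # Key the cache on the parameters, so different settings yield different connections.
    key = tuple(sorted(overrides.items()))
    if key not in _connections:
        _connections[key] = RealAwesomeDB(**overrides)
    return _connections[key]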
Now, on the essence of the question.
I think the answer depends on how your "connection" classes are actually implemented.
Potential downsides with your approach I see are:
In a multithreaded environment you could get problems with unsynchronized concurrent access to the global connection object from multiple threads, unless it is already thread-safe. If you care about that, you could change your code and interface a bit and use a thread-local variable (see the sketch after this list).
What if a process forks after creating the connection? Web application servers tend to do that and it might not be safe, again depending on the underlying connection.
Does the connection object have state? What happens if the connection object becomes invalid (e.g. due to a connection error or timeout)? You might need to replace the broken connection with a new one to return the next time a connection is requested.
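A rough sketch of the thread-local variant (assuming the underlying RealAwesomeDB connection is not itself thread-safe):

import threading

_local = threading.local()

def awesome_db(**overrides):
    # Each thread lazily creates and then reuses its own connection object.
    if not hasattr(_local, 'db'):
        _local.db = RealAwesomeDB(**overrides)
    return _local.db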
Connection management is often already efficiently and safely implemented through a connection pool in client libraries.
For example, the redis-py Redis client uses the following implementation:
https://github.com/andymccurdy/redis-py/blob/1c2071762ad9b9288e786665990083e61c1cf355/redis/connection.py#L974
The Redis client then uses the connection pool like so:
Requests a connection from the connection pool.
Tries to execute a command on the connection.
If the connection fails, the client closes it.
In any case, finally it is returned to the connection pool so it can be reused by subsequent calls or other threads.
So since the Redis client handles all of that under the hood, you can safely do what you want directly. Connections will be lazily created until the connection pool reaches full capacity.
# app/connections.py
def redis_client(**kwargs):
# Maybe read configuration/set default arguments
# kwargs.setdefault()
return redis.Redis(**kwargs)
Similarly, SQLAlchemy can use connection pooling as well.
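For instance, SQLAlchemy's create_engine() keeps a pool behind the scenes, so a single module-level engine can be shared; the URL and pool sizes below are only illustrative:

from sqlalchemy import create_engine, text

# One engine per process; connections are checked out of its pool as needed.
engine = create_engine('postgresql://user:secret@localhost/mydb',
                       pool_size=5, max_overflow=10)

with engine.connect() as conn:
    conn.execute(text('SELECT 1'))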
To summarize, my understanding is that:
If your client library supports connection pooling, you don't need to do anything special to share connections between modules and even threads. You could just define a helper similar to redis_client() that reads configuration, or specifies default parameters.
If your client library provides only low-level connection objects, you will need to make sure access to them is thread-safe and fork-safe. Also, you need to make sure you return a valid connection each time (or raise an exception if you can't establish one or reuse an existing one).
We have a network client based on asyncore in which the user's network connection is embodied in a Dispatcher. The goal is for a user working from an interactive terminal to be able to enter network request commands which would go out to a server and eventually come back with an answer. The client is written to be asynchronous so that the user can start several requests on different servers all at once, collecting the results as they become available.
How can we allow the user to type in commands while we're going around a select loop? If we hit the select() call registered only as readable, then we'll sit there until we read data or timeout. During this (possibly very long) time user input will be ignored. If we instead always register as writable, we get a hot loop.
One bad solution is as follows. We run the select loop in its own thread and have the user inject input into a thread safe write Queue by invoking a method we define on our Dispatcher. Something like
def enqueue(self, someData):  # a method on our Dispatcher subclass
    self.lock.acquire()
    self.queue.put(someData)
    self.lock.release()
We then only register as writable if the Queue is not empty:
def writable(self):
    return not self.queue.empty()
We would then specify a timeout for select() that's short on human scales but long for a computer. This way if we're in the select call registered only for reading when the user puts in new data, the loop will eventually run around again and find out that there's new data to write. This is a bad solution though because we might want to use this code for servers' client connections as well, in which case we don't want the dead time you get waiting for select() to time out. Again, I realize this is a bad solution.
It seems like the correct solution would be to bring the user input in through a file descriptor so that we can detect new input while sitting in the select call registered only as readable. Is there a way to do this?
NOTE: This is an attempt to simplify the question posted here
stdin is selectable. Put stdin into your dispatcher.
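One way to do that (on Unix) is asyncore.file_dispatcher, which wraps an existing file descriptor so it takes part in the same select loop; the command handling below is left as a stub:

import asyncore
import sys

class StdinDispatcher(asyncore.file_dispatcher):
    # Feeds whatever the user types into the same select() loop.

    def writable(self):
        return False  # we only ever read from stdin

    def handle_read(self):
        command = self.recv(4096).decode()
        # parse the command and enqueue the corresponding network request here

StdinDispatcher(sys.stdin)
asyncore.loop()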
Also, I recommend Twisted for future development of any event-driven software. It is much more featureful than asyncore, has better documentation, a bigger community, performs better, etc.
I am using QtCore.QThread (from PyQt4).
To log, I am also using the following formatter:
logging.Formatter('%(levelname)-8s %(asctime)s %(threadName)-15s %(message)s')
The resulting log is :
DEBUG 2012-10-01 03:59:31,479 Dummy-3 my_message
My problem is that I want to know more explicitly which thread is logging... Dummy-3 is not the most explicit name to me....
Is there a way to set a name to a QtCore.QThread that will be usable by the logging module (as a LogRecord attribute) in order to have a log more meaningful ?
Thanks!
If the threading module is available, the logging module will use threading.current_thread().name to set the threadName LogRecord attribute.
But the docs for threading.current_thread say that a dummy thread object will be used if the current thread was not created by the threading module (hence the "Dummy-x" name).
I suppose it would be possible to monkey-patch threading.current_thread to reset the name to something more appropriate. But surely a much better approach would be to make use of the extra dictionary when logging a message:
logging.Formatter('%(levelname)-8s %(asctime)s %(qthreadname)-15s %(message)s')
...
extras = {'qthreadname': get_qthreadname()}
logging.warning(message, extra=extras)
I had exactly your problem: I have a Qt GUI app running in the main thread and several Workers running in separate threads.
I started using the extras = {'qthreadname': get_qthreadname()} approach suggested by Ekhumoro, but the problem was the integration with other libraries that use logging. If you don't provide the extra dictionary, logging will throw an exception, as described in the docs and summarized here:
If you choose to use these attributes in logged messages, you need to exercise some care. In the above example, for instance, the Formatter has been set up with a format string which expects ‘clientip’ and ‘user’ in the attribute dictionary of the LogRecord. If these are missing, the message will not be logged because a string formatting exception will occur. So in this case, you always need to pass the extra dictionary with these keys.
While this might be annoying, this feature is intended for use in specialized circumstances, such as multi-threaded servers where the same code executes in many contexts, and interesting conditions which arise are dependent on this context (such as remote client IP address and authenticated user name, in the above example). In such circumstances, it is likely that specialized Formatters would be used with particular Handlers.
Instead of having specialized Formatters and Handlers, I have found a different approach. A QThread is also a thread, and you can always get a reference to the current thread and set its name. Here is a snippet of my code:
import threading
#
# your code
worker = MyWorker()
worker_thread = QThread()
worker_thread.setObjectName('MyThread')
worker.moveToThread(worker_thread)
#
# your code
class MyWorker(QtCore.QObject):
# your code
def start(self):
threading.current_thread().name = QtCore.QThread.currentThread().objectName()
# your code
Now all the log messages arriving from the QThread are properly identified.
I hope it can help you!
From the Qt5 documentation, you can call setObjectName() to modify the thread name:
To choose the name that your thread will be given (as identified by the command ps -L on Linux, for example), you can call setObjectName() before starting the thread.
If you don't call setObjectName(), the name given to your thread will be the class name of the runtime type of your thread object (for example, "RenderThread" in the case of the Mandelbrot Example, as that is the name of the QThread subclass).
Unfortunately, it also notes:
this is currently not available with release builds on Windows.
This probably is a very noobish question, but I want to make sure my code is doing what I think it's doing.
Here's what I'm after - get a request, make a decision, respond to the request with the decision, and only then log it. The sequence is important because writes can be slow and I want to make sure that a response is published before any writes take place.
Here's the sample code:
class ConferenceGreetingHandler(webapp.RequestHandler):
def get(self):
self.post()
def post(self):
xml_template(self, 'templates/confgreeting.xml')
new_log = Log()
new_log.log = 'test'
new_log.put()
I think I'm serving a response before logging, is this in fact true? Also, is there a better way to do this? Again, sorry for super-noobishness...
EDIT: Here's the template:
def xml_template(handler, page, values=None):
path = os.path.join(os.path.dirname(__file__), page)
handler.response.headers["Content-Type"] = "text/xml"
handler.response.out.write(template.render(path, values))
No matter what you do, App Engine will not send a response to a user until your handler code completes. There's currently no way, unfortunately, to tell App Engine "send the response now, I won't output any more".
You have a few options:
Just put the log entry synchronously. Datastore writes aren't hugely expensive with respect to wallclock latency, especially if you minimize the number of index updates needed.
Enqueue a task queue task to write the log data (see the sketch after this list). If you use pull queues, you can fetch log entries and write them in batches to the datastore from another task or the backend.
Start the datastore write for the log entry as soon as you have the relevant data, and use an asynchronous operation, allowing you to overlap the write with some of your processing.
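For the task queue option, the deferred library keeps the handler change small (a sketch: write_log is a hypothetical helper, and the deferred builtin must be enabled in app.yaml):

from google.appengine.ext import deferred

def write_log(text):
    # Runs later on the task queue, outside the user-facing request.
    new_log = Log()
    new_log.log = text
    new_log.put()

class ConferenceGreetingHandler(webapp.RequestHandler):
    def post(self):
        xml_template(self, 'templates/confgreeting.xml')
        deferred.defer(write_log, 'test')  # enqueued now, executed after this request returns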
Much depends on what xml_template does. If it does a self.response.write(...), then the handler has done its part to serve a response. The webapp framework does the rest once your handler completes normally.
I'm not sure what your "better way" question refers to, but two things stand out.
First, logger.warn("test") will write to the system log, rather than creating a Log instance that you have to (possibly) track down and delete later.
Second, if you're going to use xml_template widely, make it an instance method. Create your own subclass of webapp.RequestHandler, put xml_template there, and then subclass that for your specific handlers.
Updated: I overlooked the part about wanting to get the response out before doing writes. If you're suffering from slow writes, first look very carefully at whether the entity being written to is over-indexed (indexed on fields that would never be queried against). If that isn't enough to get performance into an acceptable range, the advice Nick lays out is the way to go.