High Level / Abstract CAN Bus Interface in Python

I am working with a CAN bus in Python (PCAN-Basic API) and would like to make it easier to use.
A lot of devices/modules are connected via the bus. They are all allowed to send data; if a collision happens, the lowest ID wins.
The data is organized in frames with ID, SubID and hex values.
To illustrate the problem I am trying to address, imagine the amplitude of a signal.
To read the value, a frame is sent to
QuestionID QuestionSubID QuestionData
If there is no message with higher priority (= lower ID), the answer is written to the bus:
AnswerID AnswerSubID AnswerData
Since any module/device is allowed to write to the bus, you don't know in advance which answer you will get next. Setting a value works the same way, just with different IDs. So for the above example the amplitude would have:
4 IDs and SubIDs associated with the read/write question/answer.
Additionally, the length of the data (0-8 bytes) has to be specified/stored.
Since the data is all hex values, a parser has to be specified to obtain the human-readable value (e.g. voltage in decimal representation).
To store this information I use nested dicts:
parameters = {'Parameter_1': {'Read': {'question_ID': ID,
                                       'question_SUBID': SubID,
                                       'question_Data': hex_value_list,
                                       'answer_ID': ...,
                                       'answer_subID': ...,
                                       'answer_parser': function},
                              'Write': {'ID': ...,
                                        'SubID': ...,
                                        'parser': ...,
                                        'answer_ID': ...,
                                        'answer_subID': ...}},
              'Parameter_2': ...}
There are a lot of tools to show which value was set when, but for hardware control the order in which parameters are read is not relevant, as long as they are up to date. Thus one part of a possible solution would be storing the whole traffic in a dict of dicts:
busdata = {'firstID': {'first_subID': {'data': data,
                                       'timestamp': timestamp},
                       'second_subID': {'data': data,
                                        'timestamp': timestamp},
                       },
           'secondID': ...}
Due to the nature of the bus, I also get a lot of answers to questions other devices asked - the bus is quite busy. These should not be dismissed, since they might be the values I need next and there is no need to create additional traffic. I might use the timestamp with an expiry date, but I haven't thought much about that so far.
This works, but is horrible to work with. In total I guess I will have about 300 parameters. The final goal is to control the devices via a (PyQt) GUI, read some values like serial numbers, and also run measurement tasks.
So the big question is how to define a better data structure that is easily accessible and understandable? I am looking forward to any suggestion on a clean design.
The main goal would be to get rid of the whole message-based approach.
EDIT: My goal is to get rid of the whole CAN-specific, message-based approach:
I assume I will need one thread for the communication; it should:
Read the buffer and update my variables
Send requests (messages) to obtain other values/variables
Send some values periodically
So from the GUI I would like to be able to:
get parameter by name --> send a string with the parameter name
set parameter signal --> str(name), value (as displayed in the GUI)
get values periodically --> name, interval, duration(10s or infinite)
The thread would have to:
Log all data on the bus for internal storage
Process requests by generating messages from name, value and read until result is obtained
Send periodical signals
I would like to have this design independent of the actual hardware:
The solution I thought of is the above parameters dict.
For internal storage I thought about the busdata dict.
Still I am not sure how to:
Pass data from the bus thread to the gui (all values vs. new/requested value)
How to implement it with signals and slots in pyqt
Store data internally (dict of dicts or some new better idea)
Whether this design is a good choice

Using the python-can library will give you the networking thread, providing a buffered queue of incoming messages. The library supports the PCAN interface among others.
Then you would create a middleware layer that converts and routes these can.Message objects into PyQt signals. Think of this as a one-to-many source of events/signals.
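A minimal sketch of such a bridge, assuming python-can's Listener/Notifier API and PyQt5 (the channel name and the signal payload layout are just placeholders):

import can
from PyQt5.QtCore import QObject, pyqtSignal

class CanBridge(QObject):
    """Middleware: re-emits incoming can.Message objects as a Qt signal."""
    # (arbitration_id, data, timestamp) - this payload layout is an assumption
    message_received = pyqtSignal(int, object, float)

class QtListener(can.Listener):
    def __init__(self, bridge):
        super().__init__()
        self.bridge = bridge

    def on_message_received(self, msg):
        # Called from the Notifier thread; the cross-thread emit becomes a
        # queued connection, so GUI slots still run in the GUI thread.
        self.bridge.message_received.emit(msg.arbitration_id,
                                          bytes(msg.data),
                                          msg.timestamp)

# 'PCAN_USBBUS1' is only an example channel name for the pcan interface.
bus = can.Bus(interface='pcan', channel='PCAN_USBBUS1', bitrate=500000)
bridge = CanBridge()
notifier = can.Notifier(bus, [QtListener(bridge)])
# From the GUI side: bridge.message_received.connect(some_slot)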
I'd use another controller to be in charge of sending messages to the bus. It could have tasks like requesting periodic measurements from the bus, as well as handling on-demand requests driven by the GUI.
Regarding storing the data internally, it really depends on your programming style and the complexity. I have seen projects where each CAN message would have its own class.
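As one illustration of that idea, each logical parameter could be described by a small class instead of nested dicts; this is only a sketch, and all field names, IDs and the scaling below are made up:

from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Parameter:
    """Maps one logical value onto its CAN question/answer frames."""
    name: str
    read_question: tuple   # (ID, SubID, data) sent to request the value
    read_answer: tuple     # (ID, SubID) the answer arrives on
    write_question: tuple  # (ID, SubID) used to set the value
    write_answer: tuple    # (ID, SubID) acknowledging the write
    parse: Callable[[List[int]], float]   # raw payload -> human-readable value
    encode: Callable[[float], List[int]]  # human-readable value -> raw payload

amplitude = Parameter(
    name='amplitude',
    read_question=(0x123, 0x01, [0x00]),
    read_answer=(0x124, 0x01),
    write_question=(0x125, 0x01),
    write_answer=(0x126, 0x01),
    parse=lambda data: int.from_bytes(bytes(data), 'big') / 1000.0,
    encode=lambda volts: list(int(volts * 1000).to_bytes(2, 'big')),
)

A dict keyed by name ({p.name: p for p in all_parameters}) then gives the GUI the get/set-by-name lookup described in the question.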
Finally, queues are your friend!

Agree with #Hardbyte on the use of python-can. It's excellent.
As far as messaging between app layers, I've had a lot of luck with ZeroMQ -- you can set up your modules as event-based, from the CAN bus message event all the way through to updating a UI or whatever.
For data storage / persistence, I'm dropping messages into SQLite, and in parallel (using ZMQ Pub/Sub pattern) passing the data to an IoT hub (via MQTT).
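A rough sketch of that layer, assuming pyzmq and the standard-library sqlite3 module (the endpoint, table schema and on_frame() callback are placeholders):

import json
import sqlite3
import zmq

ctx = zmq.Context.instance()
pub = ctx.socket(zmq.PUB)
pub.bind('tcp://127.0.0.1:5556')

db = sqlite3.connect('bus_log.db')
db.execute('CREATE TABLE IF NOT EXISTS frames '
           '(ts REAL, can_id INTEGER, sub_id INTEGER, data TEXT)')

def on_frame(ts, can_id, sub_id, data):
    """Persist the frame and fan it out to any interested subscriber."""
    db.execute('INSERT INTO frames VALUES (?, ?, ?, ?)',
               (ts, can_id, sub_id, json.dumps(data)))
    db.commit()
    topic = '{:03X}.{:02X}'.format(can_id, sub_id).encode()
    pub.send_multipart([topic, json.dumps({'ts': ts, 'data': data}).encode()])

A UI process (or an MQTT forwarder) then connects a SUB socket and subscribes only to the IDs it cares about.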

import struct

FRAME_FORMAT = ...  # struct format string describing a raw frame
FRAME_SIZE = ...    # size of a raw frame in bytes

class MySimpleCanBus:

    def parse_message(self, raw_message):
        return MyMessageClass._create(*struct.unpack(FRAME_FORMAT, raw_message))

    def receive_message(self, filter_data=None):
        # code to receive and parse a message (filtered by ID)
        raw = canbus.recv(FRAME_SIZE)
        return self.parse_message(raw)

    def send_message(self, msg_data):
        # code to make sure the message can be sent and send the message
        return self.receive_message()

class MySpecificCanBus(MySimpleCanBus):

    def get_measurement_reading(self):
        msg_data = {}  # code to request a measurement
        return self.send_message(msg_data)

    def get_device_id(self):
        msg_data = {}  # code to get device_id
        return self.send_message(msg_data)
I probably don't understand your question properly... maybe you could update it with additional details.

Related

Software Paradigm for Pushing Data Through a System

tl;dr: I wanted your feedback on whether the correct software design pattern to use would be a Push/Pull Pipeline pattern.
Details:
Let's say I have several software algorithms/blocks which process data coming into a software system:
[Download Data] --> [Pre Process Data] --> [ML Classification] --> [Post Results]
The [Download Data] block simply loiters until midnight, when new data is available, and then downloads it. The [Pre Process Data] block simply loiters until newly downloaded data is present, and then preprocesses it. The Machine Learning (ML) Classification block simply loiters until new data is available to classify, etc.
The entire system seems to be event-driven and I think fits the push/pull paradigm perfectly?
The [Download Data] block would be a producer? The consumers would be all the subsequent blocks, with the exception of the [Post Results] block, which would be a results collector?
Producer = push
Consumer = pull then push
Result collector = pull
I'm working within a python framework. This implementation looked ideal:
https://learning-0mq-with-pyzmq.readthedocs.io/en/latest/pyzmq/patterns/pushpull.html
https://github.com/ashishrv/pyzmqnotes
Push/Pull Pipeline Pattern
I'm totally open to using another software paradigm other than push/pull if I've missed the mark here. I'm also open to using another repo as well.
Thanks in advance for your help with the above!
I've done similar pipelines many, many times and very much like breaking them into blocks like that. Why? Mainly for automatic recovery from any errors. If something gets delayed, it will auto-recover the next hour. If something needs to be fixed mid-pipeline, fix it and name it so it gets picked up next cycle. (That, and the fact that smaller blocks are easier to design, build, and test.)
For example, your [Download Data] step should run every hour to look for waiting data: if none, go back to sleep; if some, download it to a file with a name containing a timestamp and state: 2020-0103T2153.downloaded.json. [Pre Process Data] should run every hour to look for files named *.downloaded.json: if none, go back to sleep; if one or more, pre-process each in increasing timestamp order with output to <same-timestamp>.pre-processed.json. Etc., etc. for each step.
Doing it this way meant many unplanned events auto-recovered and nobody would know unless they looked in the log files (you should log each step so you know what happened). Easy to sleep at night :)
In these scenarios, the event driving this is just time-of-day via crontab. When "awoken", each step in the pipeline just looks to see if it has any work waiting for it. Trying to make the file-creation event initiate things was non-trivial, especially if you need to re-initiate things (you would need to re-create the file).
I wouldn't use a message queue, as that's more complicated and better suited to handling incoming messages as they arrive. Your case is simple batch file processing, so keep it simple and sleep at night.
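A minimal sketch of one such step, following the timestamped-state file naming described above (the directory, naming and the placeholder "pre-processing" are illustrative):

import glob
import json
import os

def preprocess_pending(data_dir='data'):
    """One pipeline step: turn *.downloaded.json into *.pre-processed.json."""
    pending = sorted(glob.glob(os.path.join(data_dir, '*.downloaded.json')))
    if not pending:
        return  # nothing waiting - sleep until the next cron run
    for path in pending:  # increasing timestamp order thanks to sorted()
        stamp = os.path.basename(path).split('.')[0]
        out_path = os.path.join(data_dir, stamp + '.pre-processed.json')
        if os.path.exists(out_path):
            continue  # already handled in an earlier cycle
        with open(path) as f:
            records = json.load(f)
        processed = [r for r in records if r]  # placeholder pre-processing
        with open(out_path, 'w') as f:
            json.dump(processed, f)

if __name__ == '__main__':
    # Scheduled from crontab, e.g.:  0 * * * *  python preprocess.py
    preprocess_pending()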

What happens if I subscribe to the same topic multiple times? (Python, Google Pubsub)

If I have the following code, will anything bad happen? Will it try to create new subscriptions? Is subscribe an idempotent operation?
import time
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()

def f(msg):
    print(msg.data)
    print(msg)
    msg.ack()

def create_subscriptions():
    results = []  # some SQL query
    for result in results:
        path = subscriber.subscription_path('PROJECT', result)
        subscriber.subscribe(path, callback=f)

while True:
    time.sleep(60)
    create_subscriptions()
I need to be able to update my subscriptions based on when people create new ones. Is there any problem with this approach?
You should avoid repeatedly calling “subscribe” for the same subscription -- even though you will most likely not increase the number of duplicate messages that are delivered, you would create multiple instances of the receiving infrastructure. This is both inefficient, and defeats some of the flow control properties that Pub/Sub provides, since these are only computed per instance of the subscriber; i.e. it can cause your subscriber job to run out of memory and fail, for example.
Instead, I would suggest keeping track of which subscriptions you’ve already created. Note that the “subscribe” method returns a future that you can use for this purpose, or to cancel the message receiving when necessary. You can find more details in the documentation.
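A sketch of that bookkeeping, assuming the google-cloud-pubsub client (the project ID and subscription names are placeholders); subscribe() returns a streaming pull future that you can keep and later cancel:

import time
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
active = {}  # subscription path -> streaming pull future

def callback(msg):
    print(msg.data)
    msg.ack()

def sync_subscriptions(wanted_names):
    wanted_paths = {subscriber.subscription_path('PROJECT', name)
                    for name in wanted_names}
    for path in wanted_paths:
        if path not in active:  # only subscribe the first time we see it
            active[path] = subscriber.subscribe(path, callback=callback)
    for path in list(active):   # optionally drop subscriptions no longer wanted
        if path not in wanted_paths:
            active.pop(path).cancel()

while True:
    sync_subscriptions(['sub-a', 'sub-b'])  # e.g. names from your SQL query
    time.sleep(60)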

Latched topic in ZeroMQ

Is it possible to have a "latched" topic in ZeroMQ, such that the last message sent to the topic is repeated to newly joined subscribers?
At the moment I have to create a REQ-REP-socket pair in addition to the PUB-SUB pair, so that when the new SUB joins, it asks for that last message using the REQ-socket. But this additional work, which is all boilerplate, is highly undesirable.
ROS has the "latched" option and it is described as:
When a connection is latched, the last message published is saved and
automatically sent to any future subscribers that connect. This is
useful for slow-changing to static data like a map. Note that if there
are multiple publishers on the same topic, instantiated in the same
node, then only the last published message from that node will be
sent, as opposed to the last published message from each publisher on
that single topic.
Well, your idea is doable in ZeroMQ:
A few bits of history first: for distributed-computing performance and memory-capacity reasons, and given the low cost of traffic, the topic filter was initially implemented on the SUB side(s), whereas later versions started to operate this feature on the PUB side.
So your application will never know in advance which clients will use which version of ZeroMQ, and at that level the problem is principally undecidable.
Having said this,
your application user-code on the PUB side can solve this by sending 2-in-1 formatted messages, and your SUB side can be made aware of this soft logic embedded into the message stream.
Simply implement the "latched" logic in your user-code, be it via a naive re-send of the last message per topic line or by some other means.
Yes, the user-code is the only place that can handle this,
not the PUB/SUB Scalable Formal Communication Pattern archetype, for two reasons. First, it is not a general, universally applicable behaviour, but a user-specific speciality. Second, the topic filter (be it PUB-side or SUB-side operated) has no prior knowledge about lexical branching: subscriptions are interpreted lexically from left to right, and no one can say a priori what a next subscriber will actually subscribe to, so a "latched" last-message store cannot be pre-populated until the next subscriber actually joins and sets its actual topic-filter subscription (storing all deterministic, combinatorics-driven, possible {sub-|super-}topic options just to circumvent this principal undecidability would be a very bad idea, wouldn't it?).
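One possible way to implement that user-code "latching" on the PUB side is to use an XPUB socket, cache the last message per topic, and replay the cached message whenever a new subscription arrives. A sketch of that idea (the endpoint and topic handling are illustrative; it matches topics exactly, ignoring prefix subscriptions, and a real application would poll the socket alongside its normal publishing loop):

import zmq

ctx = zmq.Context.instance()
pub = ctx.socket(zmq.XPUB)           # XPUB hands subscription events to us
pub.setsockopt(zmq.XPUB_VERBOSE, 1)  # also report duplicate subscriptions
pub.bind('tcp://*:5557')

last_message = {}  # topic (bytes) -> last payload (bytes)

def publish(topic, payload):
    last_message[topic] = payload
    pub.send_multipart([topic, payload])

while True:
    # Subscription frames start with 0x01 (subscribe) or 0x00 (unsubscribe),
    # followed by the topic the peer subscribed to.
    event = pub.recv()
    if event and event[0] == 1:
        topic = event[1:]
        if topic in last_message:
            # "Latch": replay the last value so the newcomer is up to date.
            pub.send_multipart([topic, last_message[topic]])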

Best way to add custom information to for logging

I've been reading the Python logging documentation, and it certainly has a lot of functionality... for everything.
The problem I'm facing is that I'm not Dutch and so I'm not sure what the right way to do this is.
I am running events in a simulator, and I would like to prefix every log message with the timestamp of the simulated time (probably with a length formatter too, just to keep it looking good). I could change this in a subclass of Logger or Handler, but I don't think that is the right way.
I think the right way is to use a LoggerAdapter or a Filter. Is this right, and if so, which one should I prefer?
Surely if you just need to prefix every log message with the timestamp, all you need to do is provide an appropriate format string for a formatter? As mentioned here in the documentation.
Update: A LoggerAdapter would be more appropriate for situations where contextual information is moderately long-lived. For example, in a network server application which handles multiple client connections, the connection details (e.g. client IP address) might be useful context, so you would create a LoggerAdapter instance when a new client connection was created. This doesn't sound like your simulation scenario.
A Filter, on the other hand, could be used if it can get hold of the simulation time, e.g.
import logging

class SimulationFilter(logging.Filter):
    def __init__(self, context):
        """
        Set up with context passed in which allows access
        to simulation times.
        """
        self.context = context

    def filter(self, record):
        "Add sim_time field to record, formatted as you wish"
        record.sim_time = '%s' % self.context.get_sim_time()
        return True
and then add %(sim_time)s in your Formatter's format string.
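For completeness, a sketch of how those pieces might be wired together (the logger name and the context object are placeholders):

import logging

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('%(sim_time)s %(levelname)s %(message)s'))

logger = logging.getLogger('simulation')
logger.addHandler(handler)
logger.addFilter(SimulationFilter(context))  # context provides get_sim_time()
logger.setLevel(logging.DEBUG)

logger.debug('Event processed')  # record gets sim_time before formatting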
Alternatively, if you know the simulation time whenever you make a logging call, you could just do e.g.
logger.debug('Message with %s', arguments, extra={'sim_time': sim_time})
and likewise have %(sim_time)s in your Formatter's format string.

Celery Task Grouping/Aggregation

I'm planning to use Celery to handle sending push notifications and emails triggered by events from my primary server.
These tasks require opening a connection to an external server (GCM, APS, email server, etc). They can be processed one at a time, or handled in bulk with a single connection for much better performance.
Often there will be several instances of these tasks triggered separately in a short period of time. For example, in the space of a minute, there might be several dozen push notifications that need to go out to different users with different messages.
What's the best way of handling this in Celery? It seems like the naïve way is to simply have a different task for each message, but that requires opening a connection for each instance.
I was hoping there would be some sort of task aggregator allowing me to process e.g. 'all outstanding push notification tasks'.
Does such a thing exist? Is there a better way to go about it, for example like appending to an active task group?
Am I missing something?
Robert
I recently discovered and have implemented the celery.contrib.batches module in my project. In my opinion it is a nicer solution than Tommaso's answer, because you don't need an extra layer of storage.
Here is an example straight from the docs:
A click counter that flushes the buffer every 100 messages, or every
10 seconds. Does not do anything with the data, but can easily be
modified to store it in a database.
# Flush after 100 messages, or 10 seconds.
@app.task(base=Batches, flush_every=100, flush_interval=10)
def count_click(requests):
    from collections import Counter
    count = Counter(request.kwargs['url'] for request in requests)
    for url, count in count.items():
        print('>>> Clicks: {0} -> {1}'.format(url, count))
Be wary though: it works fine for my usage, but the documentation mentions that it is an "Experimental task class". This might deter some from using a feature with such a volatile description :)
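Calling a Batches task looks like calling any ordinary task; the batching happens in the worker (the URL is just an example):

# Each call queues one request; the worker flushes them into a single
# count_click(requests) call after 100 messages or 10 seconds.
count_click.delay(url='http://example.com')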
An easy way to accomplish this is to write all the actions a task should take to persistent storage (e.g. a database) and let a periodic job do the actual processing in one batch (with a single connection).
Note: make sure you have some locking in place to prevent the queue from being processed twice!
There is a nice example on how to do something similar at kombu level (http://ask.github.com/celery/tutorials/clickcounter.html)
Personally I like the way sentry does something like this to batch increments at db level (sentry.buffers module)
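A minimal sketch of that approach, assuming a Celery app, SQLite as the persistent store, and a hypothetical send_bulk() helper for the single-connection delivery (the periodic task would be scheduled with celery beat; locking is omitted for brevity):

import sqlite3
from celery import Celery

app = Celery('notifications', broker='redis://localhost:6379/0')

def _db():
    db = sqlite3.connect('pending.db')
    db.execute('CREATE TABLE IF NOT EXISTS pending (user_id TEXT, message TEXT)')
    return db

@app.task
def enqueue_push(user_id, message):
    # Instead of sending immediately, just record what should be sent.
    db = _db()
    db.execute('INSERT INTO pending VALUES (?, ?)', (user_id, message))
    db.commit()

@app.task
def flush_pending():
    # Scheduled periodically; one connection handles the whole batch.
    db = _db()
    rows = db.execute('SELECT user_id, message FROM pending').fetchall()
    if rows:
        send_bulk(rows)  # placeholder: single connection to GCM/APNs/SMTP
        db.execute('DELETE FROM pending')
        db.commit()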
