I'm processing data from a serial port in Python. The first byte indicates the start of a message and the second byte indicates what type of message it is. Depending on that second byte we read the rest of the message differently (to account for the different message types: some carry only data, others a string, and so on).
I currently have the following structure. I have a general Message class that contains basic functions for every type of message, and then derived classes that represent the different types of messages (for example DataMessage or StringMessage). These have their own specific read and print functions.
In my read_value_from_serial I read in all the bytes. Right now I use the following code (which is bad) to determine whether a message will be a DataMessage or a StringMessage (there are around 6 different types of messages, but I'm simplifying a bit).
msg_type = serial_port.read(size=1).encode("hex").upper()
msg_string = StringMessage()
msg_data = DataMessage()
processread = {"01" : msg_string.read, "02" : msg_data.read}
result = processread[msg_type]()
Now I want to simplify/improve this type of code. I've read about killing the switch but I don't like it that I have to create objects that I won't use in the end. Any suggestions for improving this specific problem?
Thanks
This is very close to what you have and I see nothing wrong with it.
class Message(object):
    def print(self):
        pass

class StringMessage(Message):
    def __init__(self, port):
        self.message = 'get a string from port'

def MessageFactory(port):
    readers = {'01': StringMessage, … }
    msg_type = port.read(size=1).encode("hex").upper()
    return readers[msg_type](port)
You say "I don't like it that I have to create objects that I won't use in the end". How is it that you aren't using the objects? If I have a StringMessage msg, then
msg.print()
is using an object exactly how it is supposed to be used. Did it bother you that your one instance of msg_string only existed to call msg_string.read()? My example code makes a new Message instance for every message read; that's what objects are for. That's actually how Object Oriented Programming works.
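For what it's worth, here's a minimal sketch of how the factory might be driven from a read loop (the DataMessage subclass and the loop itself are illustrative, not taken from your code):

class DataMessage(Message):
    def __init__(self, port):
        # hypothetical: read a fixed-size payload for a data message
        self.payload = port.read(size=8)

# each incoming message becomes a fresh, fully-initialised object
while True:
    msg = MessageFactory(serial_port)
    msg.print()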
Certain tools that we all use often allow strings to be parsed as optional commands. For example, with most IRC tools one can write something like /msg <nick> hi there!, resulting in the string being parsed and executing a command.
I was thinking about this over the weekend, and realised that I have absolutely no idea how I could implement this functionality robustly. My bird's-eye view understanding is that every input will need to be parsed, a potential match found for issuing a command, and that command will need to be executed with proper validation in place.
I wrote a quick proof of concept for this in Python:
class InputParser:
    def __init__(self):
        self.command_character = '!!'
        self.message = None
        self.command = None
        self.method = None

    def process_message(self, message):
        # every input into the system is sent through here. If the
        # start of the string matches the command_character, try and
        # find the command, otherwise return back the initial
        # message.
        self.message = message
        if self.message.startswith(self.command_character):
            self.command = self.message.split(' ')[0]
            self.method = self.command.replace(self.command_character, '')
            try:
                return self.__class__.__dict__['_%s' % self.method](self)
            except KeyError:
                # no matching command found, return the input message
                return self.message
        return self.message

    def _yell(self):
        # returns an uppercase string
        return self.message.replace(self.command, '').upper()

    def _me(self):
        # returns a string wrapped by * characters
        return ('*%s*' % self.message).replace(self.command, '')
Example usage:
!!yell hello friend > HELLO FRIEND
Question:
Can someone provide me a link to an existing project, an existing library or give me a conceptual overview on an effective way to robustly change the manner in which a string is interpreted by a program, leading to different behaviour by the application?
Rather than hacking away at the class internals, you would be better off using a dictionary to map the command strings to a function that performs the command. Set up the dictionary at the class level, or in the __init__() if it might vary between instances.
This way the dictionary serves two purposes: one to provide the valid command tokens and the other to map the command token to an action.
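A rough sketch of that idea, reusing the names from your proof of concept (the exact split of command name and arguments is illustrative):

class InputParser:
    def __init__(self):
        self.command_character = '!!'
        # the dictionary both lists the valid command tokens and maps
        # each one to the method that implements it
        self.commands = {
            'yell': self._yell,
            'me': self._me,
        }

    def process_message(self, message):
        if not message.startswith(self.command_character):
            return message
        token, _, rest = message.partition(' ')
        name = token[len(self.command_character):]
        handler = self.commands.get(name)
        if handler is None:
            # no matching command found, return the input message
            return message
        return handler(rest)

    def _yell(self, rest):
        # returns an uppercase string
        return rest.upper()

    def _me(self, rest):
        # returns a string wrapped by * characters
        return '*%s*' % rest

InputParser().process_message('!!yell hello friend') still returns 'HELLO FRIEND', while unknown commands simply fall through to the original message.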
I am working with a large number of message types that have similar but not identical structures. Everything that's common among these types is defined in a separate common message. When a message comes in, I parse it using that common message type. However, I can't seem to find a way to access the fields outside of this type (i.e. the non-common fields). Is there a way to access the unknown field set in Python?
Edit:
I just saw this in the documentation:
"If a message has unknown fields, the current Java and C++ implementations write them in arbitrary order after the sequentially-ordered known fields. The current Python implementation does not track unknown fields."
Does this mean that if I parse using the common type, e.g.:
proto = msg_pb2.Common()
proto.ParseFromString(raw_msg)
Any fields not defined in message Common are thrown away?
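One way I could check this empirically (a sketch, not from my code) is to compare the size of the raw bytes with the re-serialized message:

proto = msg_pb2.Common()
proto.ParseFromString(raw_msg)

# If unknown fields are not tracked, the re-serialized message will
# usually be smaller than the raw bytes that came off the wire.
print(len(raw_msg), len(proto.SerializeToString()))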
To someone looking for an answer to this, the reflection module helped me:
https://developers.google.com/protocol-buffers/docs/reference/python/google.protobuf.reflection-module
The relevant sample code:
Sample usage:
file_descriptor = descriptor_pb2.FileDescriptorProto()
file_descriptor.ParseFromString(proto2_string)
msg_descriptor = descriptor.MakeDescriptor(file_descriptor.message_type[0])
msg_class = reflection.MakeClass(msg_descriptor)
msg = msg_class()
Args:
    descriptor: A descriptor.Descriptor object describing the protobuf.
Returns:
    The Message class object described by the descriptor.
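With the dynamically built class, the raw message bytes can then be parsed against the full descriptor instead of just Common (a sketch of how I used it):

msg = msg_class()
msg.ParseFromString(raw_msg)   # the non-common fields are now accessible as attributes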
I have a simple spyne service:
class JiraAdapter(ServiceBase):
    @srpc(Unicode, String, Unicode, _returns=Status)
    def CreateJiraIssueWithBase64Attachment(summary, base64attachment, attachment_filename):
        status = Status
        try:
            newkey = jira_client.createWithBase64Attachment(summary, base64attachment, attachment_filename)
            status.Code = StatusCodes.IssueCreated
            status.Message = unicode(newkey)
        except Exception as e:
            status.Code = StatusCodes.InternalError
            status.Message = u'Internal Exception: %s' % e.message
        return status
The problem is that some programs will insert '\n' into the generated base64 string after every 60th character or so, and it comes into the service's method escaped ('\\n'), causing things to behave oddly. Is there a setting or something to avoid this?
First, some comments about the code you posted:
You must instantiate your types (i.e. status = Status() instead of status = Status). As it is, you're setting class attributes on the Status class. Not only is this just wrong, you're also creating race conditions by altering global state without proper locking.
Does Jira have a way of creating issues with binary data? If so, you can use ByteArray, which handles base64 encoding/decoding for you. Note that ByteArray gets deserialized as a sequence of strings.
You can define a custom base64 type:
Base64String = String(pattern='[0-9a-zA-Z/+=]+')
... and use it instead of plain String together with validation to effortlessly reject invalid input.
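For example, the signature from your service could use it directly (a sketch; validation itself has to be enabled on the input protocol, e.g. validator='lxml' for Soap11):

Base64String = String(pattern='[0-9a-zA-Z/+=]+')

class JiraAdapter(ServiceBase):
    @srpc(Unicode, Base64String, Unicode, _returns=Status)
    def CreateJiraIssueWithBase64Attachment(summary, base64attachment, attachment_filename):
        ...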
Instead of returning a "Status" object, I'd return nothing but raise an exception when needed (or you can just let the original exception bubble up). Exceptions also get serialized just like normal objects. But that's your decision to make as it depends on how you want your API to be consumed.
Now for your original question:
You'll agree that the right thing to do here is to fix whatever's escaping '\n' (i.e. 0x0a) as r"\n" (i.e. 0x5c 0x6e).
If you want to deal with it on your side though, I guess the solution in your comment (i.e. base64attachment = base64attachment.decode('string-escape')) would be the best approach.
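To illustrate what that call does (Python 2 only, since the 'string-escape' codec was removed in Python 3):

s = 'QUJD\\nREVG'                  # arrives with a literal backslash + 'n'
fixed = s.decode('string-escape')  # now contains a real newline: 'QUJD\nREVG'
# base64.b64decode(fixed) works, because the decoder skips the newline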
I hope that helps.
Best regards,
I'm creating an async child process with gobject.spawn_async, which generates data on stdout that I want to use when the child exits. So I create two callbacks (minimal example):
output = ""
def child_read(source, cb_condition):
for line in source: output += line + "\n"
def child_done(pid, condition, user_data):
print(user_data)
cpid, cout = gobject.spawn_async(['/bin/ls'],
flags = gobject.SPAWN_DO_NOT_REAP_CHILD,
standard_output=True)
gobject.child_watch_add(pid=cpid, function=child_done, data=output)
gobject.io_add_watch(os.fdopen(cout, 'r'), gobject.IO_IN | gobject.IO_PRI, child_read)
The obvious defect here is that child_done will always print nothing since output is reallocated in child_read. Now the question is, how do I do this in a syntactically nice and readable (i.e. self-documenting) way? Sure, I could just read output in child_done, but then the child_watch_add call doesn't document which data are used in the callback. Plus the callback can't be used for anything else. I'm really missing C/C++ pointer semantics here, since that would do just what I want.
I'm also aware that I could create a wrapper class that emulates pointer semantics, but that kind of bloats the syntax, too. So, any proposals for doing this the "pythonic" way, i.e. elegantly, in a nice and readable manner?
I haven't used gobject, but four hours is a long time for a Python question to remain unanswered on SO, so I'll give it a shot.
Make output a list of strings instead of a string. Something like this, perhaps:
output = []

def child_read(source, cb_condition):
    output.extend(source)

def child_done(pid, condition, user_data):
    output_str = '\n'.join(user_data)
    print(output_str)
The rest of the code is unchanged (i.e. still use data=output in the call to child_watch_add()).
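The reason this works where the string version didn't: the list is mutated in place, so the object passed as data=output to child_watch_add() is the very same object that child_read() extends. Assigning to a string, on the other hand, rebinds the name to a new object and the callback never sees the change. A minimal illustration outside of gobject:

def append_to(buf):
    buf.extend(['line'])      # mutates the shared list

def reassign(buf):
    buf = buf + 'line'        # rebinds a local name; caller unaffected

shared_list = []
append_to(shared_list)
print(shared_list)            # ['line']

shared_str = ''
reassign(shared_str)
print(shared_str)             # '' (unchanged)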
I've got a Python (3.1 if that matters) application that pickles data for another process to consume, and the two exchange it over network connections. For some reason, some exchanges are unexpectedly large ... I can make sense of some of the pickled data and figure out what's transmitted, but there remains a large blob of apparently binary data that I fail to explain to myself, such as redundant strings or large chunks of binary data.
Do you know whether there is a wireshark plugin that could assist me with that task, or another process you'd recommend to someone trying to figure out what more should have been =None'd before the object is transmitted over the connection ?
RouteDirect
q.).q.}q.(X...._RouteDirect__dst_nodeq.cnode
Node
q.).q.}q.(X...._Node__status_upq.NX...._Node__neighbourhoodq.NX...._Node__sendq.NX
..._Node__cpeq
cequation
CPE
q.).q.}q^M(X...._CPE__dim_countq.}q.X...._CPE__internal_nodesq.]q.ubX...._Node__major_stateq.NX...._Node__partition_idq.G?.$:..4. X...._Node__name_idq.cnodeid
NameID
q.).q.}q.X^M..._NameID__nameq.X....checkq.sbX...._Node__dispatcherq.NX...._Node__pendingq.]q.cmessages
^ I can make sense of that: RouteDirect, CPE and NameID are classes in my program.
v I'm more surprised about this: there shouldn't be that much "plain binary" data in the exchange, although Iproto, Tflags, Isrc and Idst are strings contained within that data
q0).q1}q2(X...._Range__maxq3X....1f40q4X...._Range__min_includedq5.X...._Range__max_includedq6.X...._
Range__minq7h4ubX...._Component__dimensionq8h'ubh&).q9}q:h)X....Tflagsq;sbh+).q<}q=(h..h/h0).q>}q?
(h3X....02q#h5.h6.h7h#ubh8h9ubh&).qA}qBh)X....IprotoqCsbh+).qD}qE(h..h/h0).qF}qG(h3X...
.06qHh5.h6.h7hHubh8hAubh&).qI}qJh)X....IsrcqKsbh+).qL}qM(h..h/h0).qN}qO(h3X....7d59d8faqPh5.h6.
h7hPubh8hIubh&).qQ}qRh)X....IdstqS
sbh+).qT}qU(h..h/h0).qV}qW(h3X....00001011qXh5.h6.h7hXubh8hQubh&).qY}qZh)X....Tsrcq[sbh+).q\}q]
(h..h/h0).q^}q_(h3X....0bcfq`h5.h6.h7h`ubh8hYubusbX....
v and this is really perplexing.
qt).qu}qv(X...._LookupRequest__keyqwh!).qx}qyh$}qz(h&).q{}q|h)h*sbh+).q}}q~(h..h/h0).q.}q.
(h3h4h5.h6.h7h4ubh8h{ubh&).q.}q.h)h;sbh+).q.}q.(h..h/h0).q.}q.(h3h#h5.h6.h7h#ubh8h.ubh&).q.}q.h)hCsbh+).q.}q.(h..h/h0).q.}q.
(h3hHh5.h6.h7hHubh8h.ubh&).q.}q.h)hKsbh+).q.}q.(h..h/h0).q.}q.(h3hPh5.h6.h7hPubh8h.ubh&).q.}q.h)hSsbh+).q.}q.(h..h/h0).q.}q.
(h3hXh5.h6.h7hXubh8h.ubh&).q.}q.h)h[sbh+).q.}q.(h..h/h0).q.}q.
(h3h`h5.h6.h7h`ubh8h.ubusbX...._LookupRequest__nonceq.G?...u...X...._LookupRequest__fromq.h.).q.}q.(h.Nh.Nh.Nh
h.).q.}q.(h.}q.
What puzzles me the most is that it seems too regular to be e.g. mere floats/ints in binary. It has some affinity for numbers and [shub] and a lot of 'isolated' q's ... which reminds me more of machine code. Or is it just my eyes?
An example of the pickling support in the Node class:
#
# Define special pickling behaviour.
def __getstate__(self):
    """Indicate which fields should be pickled."""
    state = copy.copy(self.__dict__)
    # 'state' is a shallow copy: don't modify objects' content
    # Make transient fields point to nothing
    state['_Node__dispatcher'] = None
    state['_Node__send'] = None
    state['_Node__neighbourhood'] = None
    state['_Node__status_up'] = None
    state['_Node__data_store'] = None
    state['_Node__running_op'] = None
    state['_Node__major_state'] = None
    return state
Many other objects (e.g. CPE, RouteDirect) have no __getstate__ method. I'd love it if there were some technique that doesn't require me to crawl through all the constructors of all the classes, of course.
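One technique I could imagine (a sketch, not what the code currently does) is a small mixin where each subclass only declares the names of its transient fields:

import copy

class TransientStateMixin(object):
    # subclasses list the (mangled) attribute names that must not be pickled
    TRANSIENT_FIELDS = ()

    def __getstate__(self):
        state = copy.copy(self.__dict__)   # shallow copy: don't touch live objects
        for name in self.TRANSIENT_FIELDS:
            if name in state:
                state[name] = None
        return state

class Node(TransientStateMixin):
    TRANSIENT_FIELDS = ('_Node__dispatcher', '_Node__send', '_Node__neighbourhood',
                        '_Node__status_up', '_Node__data_store', '_Node__running_op',
                        '_Node__major_state')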
Ah, reading the /usr/lib/python3.1/pickle.py code at least makes one point less obscure: the output of pickling is indeed bytecode for some interpreter, with push/pop pairs that explain the regular patterns seen.
BINPUT = b'q' # store stack top in memo; index is 1-byte arg
BINGET = b'h' # push item from memo on stack; index is 1-byte arg
EMPTY_TUPLE = b')' # push empty tuple
MARK = b'(' # push special markobject on stack
etc.
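The standard library can also disassemble the stream directly, which saves reading pickle.py by hand (a sketch):

import pickletools

with open("wirecapture.bin", "rb") as f:
    pickletools.dis(f.read())   # one line per opcode: BINPUT, BINGET, EMPTY_TUPLE, ...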
Following @Alfe's comment, I captured the raw traffic using wireshark's "follow TCP stream" and "save as ..." features, then used
x=pickle.load(open("wirecapture.bin","rb"))
and used the Python evaluator to get a better understanding of what was there. Especially, using
len(pickle.dumps(x.my_field))
for all fields reported by dir(x) allowed me to pin-point the over-sized field. Unfortunately, I couldn't get
for y in dir(x):
    print("%s: %iKb"%(y,len(pickle.dumps(x[y])/1024))
properly working (x[y] wasn't the valid way to extract x.my_field when y == 'my_field' >_< )
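For reference, a variant of that loop that does work (a sketch): iterate over x.__dict__ to skip methods, and do the division outside the len() call:

for name, value in x.__dict__.items():
    print("%s: %iKb" % (name, len(pickle.dumps(value)) // 1024))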