GHMM - Attempted m_free on NULL pointer - python

I'm trying to use the ghmm python module on mac osx with Python 2.7. I've managed to get everything installed, and I can import ghmm in the python environment, but there are errors when I run this (from the ghmm 'tutorial') (UnfairCasino can be found here http://ghmm.sourceforge.net/UnfairCasino.py):
from ghmm import *
from UnfairCasino import test_seq
sigma = IntegerRange(1,7)
A = [[0.9, 0.1], [0.3, 0.7]]
efair = [1.0 / 6] * 6
eloaded = [3.0 / 13, 3.0 / 13, 2.0 / 13, 2.0 / 13, 2.0 / 13, 1.0 / 13]
B = [efair, eloaded]
pi = [0.5] * 2
m = HMMFromMatrices(sigma, DiscreteDistribution(sigma), A, B, pi)
v = m.viterbi(test_seq)
Specifically I get this error:
GHMM ghmm.py:148 - sequence.c:ghmm_dseq_free(1199): Attempted m_free on NULL pointer. Bad program, BAD! No cookie for you.
python(52313,0x7fff70940cc0) malloc: * error for object 0x74706d6574744120: pointer being freed was not allocated
* set a breakpoint in malloc_error_break to debug
Abort trap
and when I set the ghmm.py logger to "DEBUG", the log prints out the following just before:
GHMM ghmm.py:2333 - HMM.viterbi() -- begin
GHMM ghmm.py:849 - EmissionSequence.asSequenceSet() -- begin >
GHMM ghmm.py:862 - EmissionSequence.asSequenceSet() -- end >
Traceback (most recent call last):
File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/logging/init.py", line 842, in emit
msg = self.format(record)
File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/logging/init.py", line 719, in format
return fmt.format(record)
File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/logging/init.py", line 464, in format
record.message = record.getMessage()
File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/logging/init.py", line 328, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Logged from file ghmm.py, line 1159
Traceback (most recent call last):
File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/logging/init.py", line 842, in emit
msg = self.format(record)
File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/logging/init.py", line 719, in format
return fmt.format(record)
File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/logging/init.py", line 464, in format
record.message = record.getMessage()
File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/logging/init.py", line 328, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Logged from file ghmm.py, line 949
GHMM ghmm.py:2354 - HMM.viterbi() -- end
GHMM ghmm.py:1167 - del SequenceSubSet >
So I suspect it has something to do with the way Sequences are deleted once the Viterbi function is completed, but I'm not sure if this means I need to modify the Python code, the C code, or if I need to compile ghmm and the wrappers differently. Any help/suggestions would be appreciated greatly as I have been trying to get this library to work the last 4 days.

Given the age of this question, you've probably moved onto something else, but this seemed to be the only related result I found. The issue is that a double-free is happening, due to some weirdness in how the python function 'EmissionSequence::asSequenceSet' is executed. If you look at how ghmm.py is implemented (~lines 845 - 863)
def asSequenceSet(self):
"""
#returns this EmissionSequence as a one element SequenceSet
"""
log.debug("EmissionSequence.asSequenceSet() -- begin " + repr(self.cseq))
seq = self.sequenceAllocationFunction(1)
# checking for state labels in the source C sequence struct
if self.emissionDomain.CDataType == "int" and self.cseq.state_labels is not None:
log.debug("EmissionSequence.asSequenceSet() -- found labels !")
seq.calloc_state_labels()
self.cseq.copyStateLabel(0, seq, 0)
seq.setLength(0, self.cseq.getLength(0))
seq.setSequence(0, self.cseq.getSequence(0))
seq.setWeight(0, self.cseq.getWeight(0))
log.debug("EmissionSequence.asSequenceSet() -- end " + repr(seq))
return SequenceSetSubset(self.emissionDomain, seq, self)
This should probably raise some red flags, since it seems to be reaching into the C a bit much (not that I know for sure, I haven't looked to far into it).
Anyways, if you look a little above this function, there is another function called 'sequenceSet':
def sequenceSet(self):
"""
#return a one-element SequenceSet with this sequence.
"""
# in order to copy the sequence in 'self', we first create an empty SequenceSet and then
# add 'self'
seqSet = SequenceSet(self.emissionDomain, [])
seqSet.cseq.add(self.cseq)
return seqSet
It seems that it has the same purpose, but is implemented differently. Anyways, if you replace the body of 'EmissionSequence::asSequenceSet' in ghmm.py, with just:
def asSequenceSet(self):
"""
#returns this EmissionSequence as a one element SequenceSet
"""
return self.sequenceSet();
And then rebuild/reinstall the ghmm module, the code will work without crashing, and you should be able to go on your merry way. I'm not sure if this can be submitted as a fix, since the ghmm project looks a little dead, but hopefully this is simple enough to help anyone in dire straights using this library.

Related

Porting Python2 to Python3 - Reading Bytes from a socket and using unpack() correctly

There is code here that I am trying to convert from Python2 to Python3. In this section of code, data is received from a socket. data is declared to be an empty string and then concatenated. This is an important Python2 to 3 distinction. In Python3, the 'received' variable is of type Bytes and thus needs to be converted to string first via the use of str(). However, str() needs an encoding parameter. What would the default one be for python 2? I've tried several different encodings (latin-1 and such) but they seem to not match up with a magic value that is defined here after being unpacked here
"\xffSMB seems to decode correctly while "\xfeSMB does not.
I have adjusted the non_polling_read function and the NetBIOSSessionPacket class as follows. Full modified source code available on GitHub.
def non_polling_read(self, read_length, timeout):
data = b''
bytes_left = read_length
while bytes_left > 0:
try:
ready, _, _ = select.select([self._sock.fileno()], [], [], timeout)
if not ready:
raise NetBIOSTimeout
received = self._sock.recv(bytes_left)
if len(received) == 0:
raise NetBIOSError('Error while reading from remote', ERRCLASS_OS, None)
data = data + received
bytes_left = read_length - len(data)
except select.error as ex:
if ex[0] != errno.EINTR and ex[0] != errno.EAGAIN:
raise NetBIOSError('Error occurs while reading from remote', ERRCLASS_OS, ex[0])
return data
class NetBIOSSessionPacket:
def __init__(self, data=0):
self.type = 0x0
self.flags = 0x0
self.length = 0x0
if data == 0:
self._trailer = ''
else:
try:
self.type = data[0]
if self.type == NETBIOS_SESSION_MESSAGE:
self.length = data[1] << 16 | (unpack('!H', data[2:4])[0])
else:
self.flags = data[1]
self.length = unpack('!H', data[2:4])[0]
self._trailer = data[4:]
except Exception as e:
import traceback
traceback.print_exc()
raise NetBIOSError('Wrong packet format ')
When I start the server and issue 'smbclient -L 127.0.0.1 -d 4' from the commandline, the server first creates a libs.nmb.NetBIOSTCPSession which appears to be working well. Once it tries to unwrap the libs.nmb.NetBIOSSessionPacket, it throws an exception.
Traceback (most recent call last):
File "/root/PycharmProjects/HoneySMB/libs/smbserver.py", line 3975, in processRequest
packet = smb.NewSMBPacket(data=data)
File "/root/PycharmProjects/HoneySMB/libs/smb.py", line 690, in __init__
Structure.__init__(self, **kargs)
File "/root/PycharmProjects/HoneySMB/libs/structure.py", line 77, in __init__
self.fromString(data)
File "/root/PycharmProjects/HoneySMB/libs/structure.py", line 144, in fromString
self[field[0]] = self.unpack(field[1], data[:size], dataClassOrCode = dataClassOrCode, field = field[0])
File "/root/PycharmProjects/HoneySMB/libs/structure.py", line 288, in unpack
raise Exception("Unpacked data doesn't match constant value %r should be %r" % (data, answer))
Exception: ("Unpacked data doesn't match constant value b'\\xffSMB' should be 'ÿSMB'", 'When unpacking field \'Signature | "ÿSMB | b\'\\xffSMBr\\x00\\x00\\x00\\x00\\x18C\\xc8\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\xfe\\xff\\x00\\x00\\x00\\x00\\x00\\xb1\\x00\\x02PC NETWORK PROGRAM 1.0\\x00\\x02MICROSOFT NETWORKS 1.03\\x00\\x02MICROSOFT NETWORKS 3.0\\x00\\x02LANMAN1.0\\x00\\x02LM1.2X002\\x00\\x02DOS LANMAN2.1\\x00\\x02LANMAN2.1\\x00\\x02Samba\\x00\\x02NT LANMAN 1.0\\x00\\x02NT LM 0.12\\x00\\x02SMB 2.002\\x00\\x02SMB 2.???\\x00\'[:4]\'')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/PycharmProjects/HoneySMB/libs/smbserver.py", line 3597, in handle
resp = self.__SMB.processRequest(self.__connId, p.get_trailer())
File "/root/PycharmProjects/HoneySMB/libs/smbserver.py", line 3979, in processRequest
packet = smb2.SMB2Packet(data=data)
File "/root/PycharmProjects/HoneySMB/libs/smb3structs.py", line 435, in __init__
Structure.__init__(self,data)
File "/root/PycharmProjects/HoneySMB/libs/structure.py", line 77, in __init__
self.fromString(data)
File "/root/PycharmProjects/HoneySMB/libs/structure.py", line 144, in fromString
self[field[0]] = self.unpack(field[1], data[:size], dataClassOrCode = dataClassOrCode, field = field[0])
File "/root/PycharmProjects/HoneySMB/libs/structure.py", line 288, in unpack
raise Exception("Unpacked data doesn't match constant value %r should be %r" % (data, answer))
Exception: ("Unpacked data doesn't match constant value b'\\xffSMB' should be 'þSMB'", 'When unpacking field \'ProtocolID | "þSMB | b\'\\xffSMBr\\x00\\x00\\x00\\x00\\x18C\\xc8\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\xfe\\xff\\x00\\x00\\x00\\x00\\x00\\xb1\\x00\\x02PC NETWORK PROGRAM 1.0\\x00\\x02MICROSOFT NETWORKS 1.03\\x00\\x02MICROSOFT NETWORKS 3.0\\x00\\x02LANMAN1.0\\x00\\x02LM1.2X002\\x00\\x02DOS LANMAN2.1\\x00\\x02LANMAN2.1\\x00\\x02Samba\\x00\\x02NT LANMAN 1.0\\x00\\x02NT LM 0.12\\x00\\x02SMB 2.002\\x00\\x02SMB 2.???\\x00\'[:4]\'')
Now it's obvious why this throws an exception. After all, 0xFF does NOT equal ÿ or þ. The question is why is 0xFF the value it tries to write in there in the first place when it should be two different values?
The original python 2 code seems to be quite close to C (using unpack and importing cstring). Is there an obvious benefit to this here or could this be done more simply?
My actual question is:
In the original code, there is no reference to any encoding anywhere. So how is this divined then? Where does it translate 0xFF to ÿ?

Python write file empty

I'm trying to write a script to convert an Intel HEX file to a Verilog mem format.
I can print the strings I want to save OK (eg the read & parse bit's working) but when I try to write to a file nothing ever appears :(
ihexf = open("test.hex","r")
vmemf = open("test.mem","w")
for line in ihexf:
rlen_s = line[1:3]
addr_s = line[3:7]
rtyp_s = line[7:9]
rlen = int(rlen_s, 16)
addr = int(addr_s, 16)
rtyp = int(rtyp_s, 16)
# print(rlen_s,addr_s,rtyp_s)
if rtyp == 0:
# print('#'+addr_s)
vmemf.write('#'+addr_s+'\n')
for i in range (0, rlen):
laddr = addr + i
val_s = line[9+i*2:9+i*2+2]
val = int(val_s, 16)
# print(val_s)
vmemf.write(val_s+'\n')
# print("")
else:
print("------- End Of File ------")
ihexf.close()
vmemf.close()
My test.hex looks like
:20000000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF000000FF000000FF000000FF555540FF0A
:20000800155540FF055540FF015540FF005540FF001540FF000540FF000140FF000040FF56
:20001000000040FF000140FF000540FF001540FF005540FF015540FF055540FF155540FF4E
:00000001FF
Any clues what I'm doing wrong?
Make sure you have closed the file and very importantly that you reposition the file pointer to the start of the file and start reading chunks.
ihexf.seek(0,0)
OK - I worked out what was happening (I think!)
Existing code works on linux but not Windows.
On Windows I was seeing the following once the script finished:
#0000
#0008
#0010
#0018
------- End Of File ------
Traceback (most recent call last):
File "C:\Users\Nigel\SkyDrive\Files\python\intexhex2v.py", line 8, in <module>
rlen = int(rlen_s, 16)
ValueError: invalid literal for int() with base 16: ''`
Looks like things were messing up at the end of the file read.
Adding break after the End-Of-File print fixed everything
Thanks

Getting the error "TypeError: 'NoneType' object is not iterable" when trying to use whisper-merge

I'm trying to use whisper-merge to merge 2 wsp files. They have identical retention strategies, one just has older data than the other.
When I run whisper-merge oldfile.wsp newfile.wsp I get this error
Traceback (most recent call last):
File "/usr/local/src/whisper-0.9.12/bin/whisper-merge.py", line 32, in <module>
whisper.merge(path_from, path_to)
File "/usr/local/lib/python2.7/dist-packages/whisper.py", line 821, in merge
(timeInfo, values) = fetch(path_from, fromTime, untilTime)
TypeError: 'NoneType' object is not iterable
Any ideas?
Here's the meta data output for the 2 files:
"from_file" http://sprunge.us/dBHC
"to_file" http://sprunge.us/eIVG
Line 812 in whisper.py is broken for files that contain multiple archives.
https://github.com/graphite-project/whisper/blob/0.9.12/whisper.py#L812
fromTime = int(time.time()) - headerFrom['maxRetention']
To fix, immediately following line 813, assign fromTime based on the archive retention.
https://github.com/graphite-project/whisper/blob/0.9.12/whisper.py#L813
for archive in archives: # this line already exists
fromTime = int(time.time()) - archive['retention'] # add this line
Snippet from whisper.py
def fetch(path,fromTime,untilTime=None):
"""fetch(path,fromTime,untilTime=None)
path is a string
fromTime is an epoch time
untilTime is also an epoch time, but defaults to now.
Returns a tuple of (timeInfo, valueList)
where timeInfo is itself a tuple of (fromTime, untilTime, step)
Returns None if no data can be returned
"""
fh = open(path,'rb')
return file_fetch(fh, fromTime, untilTime)
Suggests that whisper.fetch() is returning None, which in turn, (along with the final line in the traceback) suggests that there is a problem with your path_from file.
Looking a little deeper, whisper.file_fetch() appears to have two places where it can return None (explicitly, at least):
def file_fetch(fh, fromTime, untilTime):
header = __readHeader(fh)
now = int( time.time() )
if untilTime is None:
untilTime = now
fromTime = int(fromTime)
untilTime = int(untilTime)
# Here we try and be flexible and return as much data as we can.
# If the range of data is from too far in the past or fully in the future, we
# return nothing
if (fromTime > untilTime):
raise InvalidTimeInterval("Invalid time interval: from time '%s' is after until time '%s'" % (fromTime, untilTime))
oldestTime = now - header['maxRetention']
# Range is in the future
if fromTime > now:
return None # <== Here
# Range is beyond retention
if untilTime < oldestTime:
return None # <== ...and here
...

Python multiprocessing "Bad file descriptor" error (not repeatable)

Apologies in advance, but I am unable to post a fully working example (too much overhead in this code to distill to a runnable snippet). I will post as much explanatory detail as I can, and please do let me know if anything critical seems missing.
Running Python 2.7.5 through IDLE
I am writing a program to compare two text files. Since the files can be large (~500MB) and each row comparison is independent, I would like to implement multiprocessing to speed up the comparison. This is working pretty well, but I am getting stuck on a pseudo-random Bad file descriptor error. I am new to multiprocessing, so I guess there is a technical problem with my implementation. Can anyone point me in the right direction?
Here is the code causing the trouble (specifically the pool.map):
# openfiles
csvReaderTest = csv.reader(open(testpath, 'r'))
csvReaderProd = csv.reader(open(prodpath, 'r'))
compwriter = csv.writer(open(outpath, 'wb'))
pool = Pool()
num_chunks = 3
chunksTest = itertools.groupby(csvReaderTest, keyfunc)
chunksProd = itertools.groupby(csvReaderProd, keyfunc)
while True:
# make a list of num_chunks chunks
groupsTest = [list(chunk) for key, chunk in itertools.islice(chunksTest, num_chunks)]
groupsProd = [list(chunk) for key, chunk in itertools.islice(chunksProd, num_chunks)]
# merge the two lists (pair off comparison rows)
groups_combined = zip(groupsTest,groupsProd)
if groups_combined:
# http://stackoverflow.com/questions/5442910/python-multiprocessing-pool-map-for-multiple-arguments
a_args = groups_combined # a list - set of combinations to be tested
second_arg = True
worker_result = pool.map(worker_mini_star, itertools.izip(itertools.repeat(second_arg),a_args))
Here is the full error output. (This error sometimes occurs, and other times the comparison runs to finish without problems):
Traceback (most recent call last):
File "H:/<PATH_SNIP>/python_csv_compare_multiprocessing_rev02_test2.py", line 407, in <module>
main(fileTest, fileProd, fileout, stringFields, checkFileLengths)
File "H:/<PATH_SNIP>/python_csv_compare_multiprocessing_rev02_test2.py", line 306, in main
worker_result = pool.map(worker_mini_star, itertools.izip(itertools.repeat(second_arg),a_args))
File "C:\Python27\lib\multiprocessing\pool.py", line 250, in map
return self.map_async(func, iterable, chunksize).get()
File "C:\Python27\lib\multiprocessing\pool.py", line 554, in get
raise self._value
IOError: [Errno 9] Bad file descriptor
If it helps, here are the functions called by pool.map:
def worker_mini(flag, chunk):
row_comp = []
for entry, entry2 in zip(chunk[0][0], chunk[1][0]):
if entry == entry2:
temp_comp = entry
else:
temp_comp = '%s|%s' % (entry, entry2)
row_comp.append(temp_comp)
return True, row_comp
#takes a single tuple argument and unpacks the tuple to multiple arguments
def worker_mini_star(flag_chunk):
"""Convert `f([1,2])` to `f(1,2)` call."""
return worker_mini(*flag_chunk)
def main():

How do I know which contract failed with Python's contract.py?

I'm playing with contract.py, Terrence Way's reference implementation of design-by-contract for Python. The implementation throws an exception when a contract (precondition/postcondition/invariant) is violated, but it doesn't provide you a quick way of identifying which specific contract has failed if there are multiple ones associated with a method.
For example, if I take the circbuf.py example, and violate the precondition by passing in a negative argument, like so:
circbuf(-5)
Then I get a traceback that looks like this:
Traceback (most recent call last):
File "circbuf.py", line 115, in <module>
circbuf(-5)
File "<string>", line 3, in __assert_circbuf___init___chk
File "build/bdist.macosx-10.5-i386/egg/contract.py", line 1204, in call_constructor_all
File "build/bdist.macosx-10.5-i386/egg/contract.py", line 1293, in _method_call_all
File "build/bdist.macosx-10.5-i386/egg/contract.py", line 1332, in _call_all
File "build/bdist.macosx-10.5-i386/egg/contract.py", line 1371, in _check_preconditions
contract.PreconditionViolationError: ('__main__.circbuf.__init__', 4)
My hunch is that the second argument in the PreconditionViolationError (4) refers to the line number in the circbuf.init docstring that contains the assertion:
def __init__(self, leng):
"""Construct an empty circular buffer.
pre::
leng > 0
post[self]::
self.is_empty() and len(self.buf) == leng
"""
However, it's a pain to have to open the file and count the docstring line numbers. Does anybody have a quicker solution for identifying which contract has failed?
(Note that in this example, there's a single precondition, so it's obvious, but multiple preconditions are possible).
This is an old question but I may as well answer it. I added some output, you'll see it at the comment # jlr001. Add the line below to your contract.py and when it raises an exception it will show the doc line number and the statement that triggered it. Nothing more than that, but it will at least stop you from needing to guess which condition triggered it.
def _define_checker(name, args, contract, path):
"""Define a function that does contract assertion checking.
args is a string argument declaration (ex: 'a, b, c = 1, *va, **ka')
contract is an element of the contracts list returned by parse_docstring
module is the containing module (not parent class)
Returns the newly-defined function.
pre::
isstring(name)
isstring(args)
contract[0] in _CONTRACTS
len(contract[2]) > 0
post::
isinstance(__return__, FunctionType)
__return__.__name__ == name
"""
output = StringIO()
output.write('def %s(%s):\n' % (name, args))
# ttw001... raise new exception classes
ex = _EXCEPTIONS.get(contract[0], 'ContractViolationError')
output.write('\tfrom %s import forall, exists, implies, %s\n' % \
(MODULE, ex))
loc = '.'.join([x.__name__ for x in path])
for c in contract[2]:
output.write('\tif not (')
output.write(c[0])
# jlr001: adding conidition statement to output message, easier debugging
output.write('): raise %s("%s", %u, "%s")\n' % (ex, loc, c[1], c[0]))
# ...ttw001
# ttw016: return True for superclasses to use in preconditions
output.write('\treturn True')
# ...ttw016
return _define(name, output.getvalue(), path[0])
Without modifying his code, I don't think you can, but since this is python...
If you look for where he raises the exception to the user, it I think is possible to push the info you're looking for into it... I wouldn't expect you to be able to get the trace-back to be any better though because the code is actually contained in a comment block and then processed.
The code is pretty complicated, but this might be a block to look at - maybe if you dump out some of the args you can figure out whats going on...
def _check_preconditions(a, func, va, ka):
# ttw006: correctly weaken pre-conditions...
# ab002: Avoid generating AttributeError exceptions...
if hasattr(func, '__assert_pre'):
try:
func.__assert_pre(*va, **ka)
except PreconditionViolationError, args:
# if the pre-conditions fail, *all* super-preconditions
# must fail too, otherwise
for f in a:
if f is not func and hasattr(f, '__assert_pre'):
f.__assert_pre(*va, **ka)
raise InvalidPreconditionError(args)
# rr001: raise original PreconditionViolationError, not
# inner AttributeError...
# raise
raise args
# ...rr001
# ...ab002
# ...ttw006

Categories

Resources