I have a list of greyscale Pillow images.
I would like to sort that list by the average pixel value of each image.
My most recent attempt is below:
import os
from PIL import Image, ImageOps, ImageStat

def getTiles(tiles_directory):
    files = os.listdir(tiles_directory)
    tiles = []
    for file in files:
        filePath = os.path.abspath(os.path.join(tiles_directory, file))
        try:
            fp = open(filePath, "rb")
            im = Image.open(fp)
            im = ImageOps.grayscale(im)
            tiles.append(im)
            im.load()
            fp.close()
        except:
            print("Invalid tile: %s" % (filePath,))
    return tiles
input_tiles = getTiles(file_repository)
images_sorted_by_ave_pixel = sorted(
    input_tiles, key=lambda x: ImageStat.Stat(x).mean)
But I get an error:
File "StitchMosaic.py", line 351, in generate
sorted_tiles=sortTiles(tiles)
File "StitchMosaic.py", line 50, in sortTiles
byAvePixel = sorted(tiles, key=lambda x: ImageStat.Stat(x).mean)
File "StitchMosaic.py", line 50, in <lambda>
byAvePixel = sorted(tiles, key=lambda x: ImageStat.Stat(x).mean)
File "/Users/stuartfish/opt/anaconda3/lib/python3.8/site-packages/PIL/ImageStat.py", line 39, in __init__
raise TypeError("first argument must be image or list")
TypeError: first argument must be image or list
What did I do wrong?
Thanks all. The answer was a simple error on my part - highlighted by @Vishal Kumar Sahu - and knowing about the built-in type() function will make my debugging much easier :)
Despite my lengthy variable names I still managed to pass a string instead of a list of image objects.
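For anyone hitting the same TypeError, here is a minimal sketch of the check and the corrected sort; it assumes the getTiles() function and the file_repository name from the question above:

from PIL import ImageStat

# Sketch only: getTiles() and file_repository are the names used in the question.
input_tiles = getTiles(file_repository)

# A quick sanity check makes the bug obvious: if this prints a str type,
# a path was passed where the list of Image objects was expected.
print(type(input_tiles))

# For a grayscale ("L") image, ImageStat.Stat(...).mean is a one-element
# list, so keying on mean[0] sorts by a plain number.
images_sorted_by_ave_pixel = sorted(
    input_tiles, key=lambda x: ImageStat.Stat(x).mean[0])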
My goal is to open an Excel file, add a new column, and save it as a new file. Unfortunately the code crashes at the save step.
import pandas as pd

path = '//My Documents/Python/'
fileName = "test.xlsx"
ef = pd.ExcelFile(path+fileName)
df = pd.read_excel(path+fileName, sheet_name=ef.sheet_names[0])
i = 1
for test in df['Content']:
    try:
        df['Content'] = df['Content'].astype(str)
    except:
        print("An exception occurred")
        break
    i += 1
print('success')
df.to_excel('/My Documents/Python/test_NEW.xlsx')
The error message
with link or location/anchor > 255 characters since it exceeds Excel's limit for URLS force_unicode(url)) Traceback (most recent call last):
Now my question is: is the save method wrong? How can I save columns whose values are longer than 255 characters, or how can I cut the links off at 255 characters?
Thank you in advance! I would be very happy to receive an answer.
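One possible workaround, sketched under the assumption that the over-long links live in the 'Content' column and paths used above: truncate every value to Excel's 255-character limit for links before saving, so the writer never sees an over-long URL.

import pandas as pd

# Sketch only: the path, file name and 'Content' column come from the
# question above; adjust them to your own layout.
path = '//My Documents/Python/'
df = pd.read_excel(path + 'test.xlsx')

# Force the column to strings and cut everything at 255 characters,
# which is Excel's limit for a URL/anchor.
df['Content'] = df['Content'].astype(str).str.slice(0, 255)

df.to_excel(path + 'test_NEW.xlsx', index=False)

Depending on your pandas and XlsxWriter versions, another option is to tell XlsxWriter not to convert strings to URLs at all (its strings_to_urls option), which avoids truncating anything.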
I get this error whenever I try to run the Reducer Python program on the Hadoop system; the Mapper program runs perfectly. I have given the Reducer the same permissions as my Mapper program. Is there a syntax error?
Traceback (most recent call last):
File "reducer.py", line 13, in
word, count = line.split('\t', 1)
ValueError: need more than 1 value to unpack
#!/usr/bin/env python
import sys

# maps words to their counts
word2count = {}

# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # parse the input we got from mapper.py
    word, count = line.split('\t', 1)
    # convert count (currently a string) to int
    try:
        count = int(count)
    except ValueError:
        continue
    try:
        word2count[word] = word2count[word] + count
    except:
        word2count[word] = count

# write the tuples to stdout
# Note: they are unsorted
for word in word2count.keys():
    print '%s\t%s' % (word, word2count[word])
The error ValueError: need more than 1 value to unpack is thrown when you do a multiple assignment with too few values on the right-hand side. So it looks like line has no \t in it, which means line.split('\t', 1) returns a single value, and the assignment becomes something like word, count = ("foo",).
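If the mapper's output really can contain lines without a tab (blank lines, stray debug prints, and so on), a small guard in the reducer avoids the crash. A sketch of that idea, dropped into the loop from the question:

for line in sys.stdin:
    line = line.strip()
    # Skip anything that doesn't look like "word<TAB>count" instead of
    # letting the unpacking blow up.
    if '\t' not in line:
        continue
    word, count = line.split('\t', 1)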
I cannot answer in detail.
However, I solved the same issue when I removed some extra print statements I had added in the mapper. It is probably related to how those extra prints end up in the stream the reducer reads from sys.stdin.
I know you have probably already solved the issue by now.
I changed line.split('\t', 1) to line.split(' ', 1) and it worked.
Since the space is easy to miss, to be perfectly clear: it should be line.split(' ', 1), with a single space character between the quotes.
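Whether '\t' or ' ' is the right separator depends entirely on what your mapper prints; the reducer's split() has to match it exactly. For comparison, a hypothetical mapper.py that emits tab-separated pairs (so the matching reducer must split on '\t'):

#!/usr/bin/env python
# Hypothetical mapper: emits "word<TAB>1" per word. A mapper that printed
# '%s %s' instead would require line.split(' ', 1) in the reducer.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print '%s\t%s' % (word, 1)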
I am running the following code on Ubuntu 11.10, Python 2.7.2+.
import urllib
import Image
import StringIO
source = '/home/cah/Downloads/evil2.gfx'
dataFile = open(source, 'rb').read()
slicedFile1 = StringIO.StringIO(dataFile[::5])
slicedFile2 = StringIO.StringIO(dataFile[1::5])
slicedFile3 = StringIO.StringIO(dataFile[2::5])
slicedFile4 = StringIO.StringIO(dataFile[3::5])
jpgimage1 = Image.open(slicedFile1)
jpgimage1.save('/home/cah/Documents/pychallenge12.1.jpg')
pngimage1 = Image.open(slicedFile2)
pngimage1.save('/home/cah/Documents/pychallenge12.2.png')
gifimage1 = Image.open(slicedFile3)
gifimage1.save('/home/cah/Documents/pychallenge12.3.gif')
pngimage2 = Image.open(slicedFile4)
pngimage2.save('/home/cah/Documents/pychallenge12.4.png')
In essence, I'm taking a file that has the hex data of several image files jumbled together like 123451234512345..., clumping each stream back together, and then saving it. The problem is I'm getting the following error:
File "/usr/lib/python2.7/dist-packages/PIL/PngImagePlugin.py", line 96, in read
len = i32(s)
File "/usr/lib/python2.7/dist-packages/PIL/PngImagePlugin.py", line 44, in i32
return ord(c[3]) + (ord(c[2])<<8) + (ord(c[1])<<16) + (ord(c[0])<<24)
IndexError: string index out of range
I found PngImagePlugin.py and looked at the code it points to:

def i32(c):
    return ord(c[3]) + (ord(c[2])<<8) + (ord(c[1])<<16) + (ord(c[0])<<24)

(line 44)

"Fetch a new chunk. Returns header information."
if self.queue:
    cid, pos, len = self.queue[-1]
    del self.queue[-1]
    self.fp.seek(pos)
else:
    s = self.fp.read(8)
    cid = s[4:]
    pos = self.fp.tell()
    len = i32(s)

(lines 88-96)
I would try tinkering, but I'm afraid I'll screw up PNG and PIL, which have been irksome to get working.
Thanks
It would appear that len(s) < 4 at this stage:

    len = i32(s)

which means that

    s = self.fp.read(8)

isn't reading the whole 4 bytes that i32() needs, most likely because the stream has run out of data.
Probably the data in the fp you are passing doesn't make sense to the image decoder.
Double-check that you are slicing correctly, and make sure that the string you are passing in is at least 4 bytes long.
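One way to act on both points: before handing each slice to Image.open, check that it is long enough and starts with the magic bytes of the format you expect. A rough sketch, reusing the stride-of-5 slicing from the question (the signatures are the standard JPEG/PNG/GIF ones; the offsets and output names are assumptions):

import StringIO
import Image  # PIL on Python 2

dataFile = open('/home/cah/Downloads/evil2.gfx', 'rb').read()

# Standard leading bytes for each expected format.
signatures = {'jpg': '\xff\xd8\xff', 'png': '\x89PNG', 'gif': 'GIF8'}

for offset, kind in [(0, 'jpg'), (1, 'png'), (2, 'gif'), (3, 'png')]:
    chunk = dataFile[offset::5]
    if len(chunk) < 4 or not chunk.startswith(signatures[kind]):
        print 'slice %d does not look like a %s file' % (offset, kind)
        continue
    Image.open(StringIO.StringIO(chunk)).save('slice_%d.%s' % (offset, kind))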
I have been trying to parse a file with xml.etree.ElementTree:
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import ParseError

def analyze(xml):
    it = ET.iterparse(file(xml))
    count = 0
    last = None
    try:
        for (ev, el) in it:
            count += 1
            last = el
    except ParseError:
        print("catastrophic failure")
        print("last successful: {0}".format(last))
    print('count: {0}'.format(count))
This is of course a simplified version of my code, but it is enough to break my program. I get this error with some files if I remove the try/except block:
Traceback (most recent call last):
File "<pyshell#22>", line 1, in <module>
from yparse import analyze; analyze('file.xml')
File "C:\Python27\yparse.py", line 10, in analyze
for (ev, el) in it:
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1258, in next
self._parser.feed(data)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1624, in feed
self._raiseerror(v)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1488, in _raiseerror
raise err
ParseError: reference to invalid character number: line 1, column 52459
The results are deterministic though, if a file works it will always work. If a file fails, it always fails and always fails at the same point.
The strangest thing is I'm using the trace to find out if I have any malformed XML that's breaking the parser. I then isolate the node that caused the failure. But when I create an XML file containing that node and a few of its neighbors, the parsing works!
This doesn't seem to be a size problem either. I have managed to parse much larger files with no problems.
Any ideas?
Here are some ideas:
(0) Explain "a file" and "occasionally": do you really mean it works sometimes and fails sometimes with the same file?
Do the following for each failing file:
(1) Find out what is in the file at the point that it is complaining about:
text = open("the_file.xml", "rb").read()
err_col = 52459
print repr(text[err_col-50:err_col+100]) # should include the error text
print repr(text[:50]) # show the XML declaration
(2) Throw your file at a web-based XML validation service e.g. http://www.validome.org/xml/ or http://validator.aborla.net/
and edit your question to display your findings.
Update: Here is the minimal xml file that illustrates your problem:
[badcharref.xml]
<a>&#0;</a>
[Python 2.7.1 output]
>>> import xml.etree.ElementTree as ET
>>> it = ET.iterparse(file("badcharref.xml"))
>>> for ev, el in it:
... print el.tag
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\python27\lib\xml\etree\ElementTree.py", line 1258, in next
self._parser.feed(data)
File "C:\python27\lib\xml\etree\ElementTree.py", line 1624, in feed
self._raiseerror(v)
File "C:\python27\lib\xml\etree\ElementTree.py", line 1488, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: reference to invalid character number: line 1, column 3
>>>
Not all valid Unicode characters are valid in XML. See the XML 1.0 Specification.
You may wish to examine your files using regexes like r'&#([0-9]+);' and r'&#x([0-9A-Fa-f]+);', convert the matched text to an int ordinal and check against the valid list from the spec i.e. #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
... or maybe the numeric character reference is syntactically invalid, e.g. not terminated by a ';', or &#not-a-digit, etc.
Update 2: I was wrong; the number in the ElementTree error message counts Unicode code points, not bytes. See the code below and snippets from the output of running it over the two bad files.
# coding: ascii
# Find numeric character references that refer to Unicode code points
# that are not valid in XML.
# Get byte offsets for seeking etc in undecoded file bytestreams.
# Get unicode offsets for checking against ElementTree error message,
# **IF** your input file is small enough.
BYTE_OFFSETS = True
import sys, re, codecs
fname = sys.argv[1]
print fname
if BYTE_OFFSETS:
    text = open(fname, "rb").read()
else:
    # Assumes file is encoded in UTF-8.
    text = codecs.open(fname, "rb", "utf8").read()
rx = re.compile("&#([0-9]+);|&#x([0-9a-fA-F]+);")
endpos = len(text)
pos = 0
while pos < endpos:
    m = rx.search(text, pos)
    if not m: break
    mstart, mend = m.span()
    target = m.group(1)
    if target:
        num = int(target)
    else:
        num = int(m.group(2), 16)
    # #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    if not(num in (0x9, 0xA, 0xD) or 0x20 <= num <= 0xD7FF
           or 0xE000 <= num <= 0xFFFD or 0x10000 <= num <= 0x10FFFF):
        print mstart, m.group()
    pos = mend
Output:
comments.xml
6615405
10205764
10213901
10213936
10214123
13292514
...
155656543
155656564
157344876
157722583
posts.xml
7607143
12982273
12982282
12982292
12982302
12982310
16085949
16085955
...
36303479
36303494 <<=== whoops
38942863
...
785292911
801282472
848911592
As @John Machin suggested, the files in question do have dubious numeric entities in them, though the error messages seem to point at the wrong place in the text. Perhaps the streaming nature and buffering make it difficult to report accurate positions.
In fact, a couple of dozen distinct numeric character references appear in the text, almost all of them referring to control characters (which is why they don't display here).
Most are not allowed. It looks like this parser is quite strict; you'll need to find another that is not so strict, or pre-process the XML.
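If pre-processing is the route you take, one rough sketch (built on the same regex and validity ranges quoted above, not a drop-in fix) is to strip every numeric character reference that falls outside the allowed set before feeding the text to ElementTree:

import re

CHARREF = re.compile(r'&#([0-9]+);|&#x([0-9a-fA-F]+);')

def _valid(num):
    # #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (num in (0x9, 0xA, 0xD) or 0x20 <= num <= 0xD7FF
            or 0xE000 <= num <= 0xFFFD or 0x10000 <= num <= 0x10FFFF)

def strip_bad_charrefs(text):
    # Remove numeric character references that XML 1.0 does not allow.
    def repl(m):
        num = int(m.group(1)) if m.group(1) else int(m.group(2), 16)
        return m.group(0) if _valid(num) else ''
    return CHARREF.sub(repl, text)

The cleaned string can then be parsed with ET.fromstring, or wrapped in a StringIO object if you still want to use iterparse.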
I'm not sure if this answers your question, but if you want to catch the ParseError raised by ElementTree explicitly, you would do this:
except ET.ParseError:
    print("catastrophic failure")
    print("last successful: {0}".format(last))
Source: http://effbot.org/zone/elementtree-13-intro.htm
It might also be worth noting that you can catch the error rather easily, and avoid having to stop your program completely, by using what you're already using later in the function: place the statement

    it = ET.iterparse(file(xml))

inside a try/except block:
try:
    it = ET.iterparse(file(xml))
except:
    print('iterparse error')
Of course, this will not fix your XML file or your pre-processing technique, but it could help identify which file (if you're parsing lots of them) is causing the error.
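A variation on the same idea when many files are involved: since iterparse is lazy and only raises the ParseError while the iterator is being consumed, catching the error around the loop and reporting the file name identifies the offender without stopping the whole run. A minimal sketch with placeholder file names:

import xml.etree.ElementTree as ET

# Placeholder list of files to check.
for path in ['a.xml', 'b.xml', 'c.xml']:
    try:
        for ev, el in ET.iterparse(path):
            pass
    except ET.ParseError as e:
        print('{0}: {1}'.format(path, e))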