I have a list of greyscale Pillow images.
I would like to sort that list by the average pixel value of each image.
My most recent attempt is below:
import os
from PIL import Image, ImageOps, ImageStat

def getTiles(tiles_directory):
    files = os.listdir(tiles_directory)
    tiles = []
    for file in files:
        filePath = os.path.abspath(os.path.join(tiles_directory, file))
        try:
            fp = open(filePath, "rb")
            im = Image.open(fp)
            im = ImageOps.grayscale(im)
            tiles.append(im)
            im.load()
            fp.close()
        except:
            print("Invalid tile: %s" % (filePath,))
    return tiles
input_tiles = getTiles(file_repository)
images_sorted_by_ave_pixel = sorted(
    input_tiles, key=lambda x: ImageStat.Stat(x).mean)
But I get an error:
File "StitchMosaic.py", line 351, in generate
sorted_tiles=sortTiles(tiles)
File "StitchMosaic.py", line 50, in sortTiles
byAvePixel = sorted(tiles, key=lambda x: ImageStat.Stat(x).mean)
File "StitchMosaic.py", line 50, in <lambda>
byAvePixel = sorted(tiles, key=lambda x: ImageStat.Stat(x).mean)
File "/Users/stuartfish/opt/anaconda3/lib/python3.8/site-packages/PIL/ImageStat.py", line 39, in __init__
raise TypeError("first argument must be image or list")
TypeError: first argument must be image or list
What did I do wrong?
Thanks all. The answer was a simple error on my part - highlighted by @Vishal Kumar Sahu - and knowing about the built-in type() function will make my debugging much easier :)
Despite my lengthy variable names I still managed to pass a string instead of a list of image objects.
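For anyone hitting the same TypeError, here is a minimal sketch of the check and the corrected sort; it assumes the getTiles() function and the file_repository name from the question above:

from PIL import ImageStat

# Sketch only: getTiles() and file_repository are the names used in the question.
input_tiles = getTiles(file_repository)

# A quick sanity check makes the bug obvious: if this prints a str type,
# a path was passed where the list of Image objects was expected.
print(type(input_tiles))

# For a grayscale ("L") image, ImageStat.Stat(...).mean is a one-element
# list, so keying on mean[0] sorts by a plain number.
images_sorted_by_ave_pixel = sorted(
    input_tiles, key=lambda x: ImageStat.Stat(x).mean[0])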
My goal is to open an Excel file, add a new column, and save it as a new file. Unfortunately the code crashes at the save step.
import pandas as pd

path = '//My Documents/Python/'
fileName = "test.xlsx"
ef = pd.ExcelFile(path+fileName)
df = pd.read_excel(path+fileName, sheet_name=ef.sheet_names[0])
i = 1
for test in df['Content']:
    try:
        df['Content'] = df['Content'].astype(str)
    except:
        print("An exception occurred")
        break
    i += 1
print('success')
df.to_excel('/My Documents/Python/test_NEW.xlsx')
The error message
with link or location/anchor > 255 characters since it exceeds Excel's limit for URLS force_unicode(url)) Traceback (most recent call last):
Now my question is: is the save method wrong? How can I save columns whose values are longer than 255 characters, or how can I cut the links off at 255 characters?
Thank you in advance! I would be very happy to receive an answer.
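One possible workaround, sketched under the assumption that the over-long links live in the 'Content' column and paths used above: truncate every value to Excel's 255-character limit for links before saving, so the writer never sees an over-long URL.

import pandas as pd

# Sketch only: the path, file name and 'Content' column come from the
# question above; adjust them to your own layout.
path = '//My Documents/Python/'
df = pd.read_excel(path + 'test.xlsx')

# Force the column to strings and cut everything at 255 characters,
# which is Excel's limit for a URL/anchor.
df['Content'] = df['Content'].astype(str).str.slice(0, 255)

df.to_excel(path + 'test_NEW.xlsx', index=False)

Depending on your pandas and XlsxWriter versions, another option is to tell XlsxWriter not to convert strings to URLs at all (its strings_to_urls option), which avoids truncating anything.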
I get this error whenever I try to run the Reducer Python program on the Hadoop system; the Mapper program runs perfectly. I have given the Reducer the same permissions as my Mapper program. Is there a syntax error?
Traceback (most recent call last):
File "reducer.py", line 13, in
word, count = line.split('\t', 1)
ValueError: need more than 1 value to unpack
#!/usr/bin/env python
import sys

# maps words to their counts
word2count = {}

# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # parse the input we got from mapper.py
    word, count = line.split('\t', 1)
    # convert count (currently a string) to int
    try:
        count = int(count)
    except ValueError:
        continue
    try:
        word2count[word] = word2count[word] + count
    except:
        word2count[word] = count

# write the tuples to stdout
# Note: they are unsorted
for word in word2count.keys():
    print '%s\t%s' % (word, word2count[word])
The error ValueError: need more than 1 value to unpack is thrown when you do a multiple assignment with too few values on the right-hand side. So it looks like line has no \t in it, which means line.split('\t', 1) returns a single value, and the assignment becomes something like word, count = ("foo",).
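If the mapper's output really can contain lines without a tab (blank lines, stray debug prints, and so on), a small guard in the reducer avoids the crash. A sketch of that idea, dropped into the loop from the question:

for line in sys.stdin:
    line = line.strip()
    # Skip anything that doesn't look like "word<TAB>count" instead of
    # letting the unpacking blow up.
    if '\t' not in line:
        continue
    word, count = line.split('\t', 1)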
I cannot answer in detail.
However, I solved the same issue when I removed some extra print statements I had added in the mapper. It is probably related to how those extra prints end up in the stream the reducer reads from sys.stdin.
I know you have probably already solved the issue by now.
I changed line.split('\t', 1) to line.split(' ', 1) and it worked.
Since the space is easy to miss, to be perfectly clear: it should be line.split(' ', 1), with a single space character between the quotes.
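Whether '\t' or ' ' is the right separator depends entirely on what your mapper prints; the reducer's split() has to match it exactly. For comparison, a hypothetical mapper.py that emits tab-separated pairs (so the matching reducer must split on '\t'):

#!/usr/bin/env python
# Hypothetical mapper: emits "word<TAB>1" per word. A mapper that printed
# '%s %s' instead would require line.split(' ', 1) in the reducer.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print '%s\t%s' % (word, 1)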
I am running the following code on Ubuntu 11.10, Python 2.7.2+.
import urllib
import Image
import StringIO
source = '/home/cah/Downloads/evil2.gfx'
dataFile = open(source, 'rb').read()
slicedFile1 = StringIO.StringIO(dataFile[::5])
slicedFile2 = StringIO.StringIO(dataFile[1::5])
slicedFile3 = StringIO.StringIO(dataFile[2::5])
slicedFile4 = StringIO.StringIO(dataFile[3::5])
jpgimage1 = Image.open(slicedFile1)
jpgimage1.save('/home/cah/Documents/pychallenge12.1.jpg')
pngimage1 = Image.open(slicedFile2)
pngimage1.save('/home/cah/Documents/pychallenge12.2.png')
gifimage1 = Image.open(slicedFile3)
gifimage1.save('/home/cah/Documents/pychallenge12.3.gif')
pngimage2 = Image.open(slicedFile4)
pngimage2.save('/home/cah/Documents/pychallenge12.4.png')
In essence, I'm taking a file that has the hex data of several image files jumbled together like 123451234512345..., clumping each stream back together, and then saving it. The problem is I'm getting the following error:
File "/usr/lib/python2.7/dist-packages/PIL/PngImagePlugin.py", line 96, in read
len = i32(s)
File "/usr/lib/python2.7/dist-packages/PIL/PngImagePlugin.py", line 44, in i32
return ord(c[3]) + (ord(c[2])<<8) + (ord(c[1])<<16) + (ord(c[0])<<24)
IndexError: string index out of range
I found PngImagePlugin.py and looked at the code it points to:

def i32(c):
    return ord(c[3]) + (ord(c[2])<<8) + (ord(c[1])<<16) + (ord(c[0])<<24)

(line 44)

"Fetch a new chunk. Returns header information."
if self.queue:
    cid, pos, len = self.queue[-1]
    del self.queue[-1]
    self.fp.seek(pos)
else:
    s = self.fp.read(8)
    cid = s[4:]
    pos = self.fp.tell()
    len = i32(s)

(lines 88-96)
I would try tinkering, but I'm afraid I'll screw up PNG and PIL, which have been irksome to get working.
Thanks
It would appear that len(s) < 4 at this stage:

    len = i32(s)

which means that

    s = self.fp.read(8)

isn't reading the whole 4 bytes that i32() needs, most likely because the stream has run out of data.
Probably the data in the fp you are passing doesn't make sense to the image decoder.
Double-check that you are slicing correctly, and make sure that the string you are passing in is at least 4 bytes long.
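One way to act on both points: before handing each slice to Image.open, check that it is long enough and starts with the magic bytes of the format you expect. A rough sketch, reusing the stride-of-5 slicing from the question (the signatures are the standard JPEG/PNG/GIF ones; the offsets and output names are assumptions):

import StringIO
import Image  # PIL on Python 2

dataFile = open('/home/cah/Downloads/evil2.gfx', 'rb').read()

# Standard leading bytes for each expected format.
signatures = {'jpg': '\xff\xd8\xff', 'png': '\x89PNG', 'gif': 'GIF8'}

for offset, kind in [(0, 'jpg'), (1, 'png'), (2, 'gif'), (3, 'png')]:
    chunk = dataFile[offset::5]
    if len(chunk) < 4 or not chunk.startswith(signatures[kind]):
        print 'slice %d does not look like a %s file' % (offset, kind)
        continue
    Image.open(StringIO.StringIO(chunk)).save('slice_%d.%s' % (offset, kind))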
I have been trying to parse a file with xml.etree.ElementTree:
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import ParseError

def analyze(xml):
    it = ET.iterparse(file(xml))
    count = 0
    last = None
    try:
        for (ev, el) in it:
            count += 1
            last = el
    except ParseError:
        print("catastrophic failure")
        print("last successful: {0}".format(last))
    print('count: {0}'.format(count))
This is of course a simplified version of my code, but it is enough to break my program. I get this error with some files if I remove the try/except block:
Traceback (most recent call last):
File "<pyshell#22>", line 1, in <module>
from yparse import analyze; analyze('file.xml')
File "C:\Python27\yparse.py", line 10, in analyze
for (ev, el) in it:
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1258, in next
self._parser.feed(data)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1624, in feed
self._raiseerror(v)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1488, in _raiseerror
raise err
ParseError: reference to invalid character number: line 1, column 52459
The results are deterministic though, if a file works it will always work. If a file fails, it always fails and always fails at the same point.
The strangest thing is I'm using the trace to find out if I have any malformed XML that's breaking the parser. I then isolate the node that caused the failure. But when I create an XML file containing that node and a few of its neighbors, the parsing works!
This doesn't seem to be a size problem either. I have managed to parse much larger files with no problems.
Any ideas?
Here are some ideas:
(0) Explain "a file" and "occasionally": do you really mean it works sometimes and fails sometimes with the same file?
Do the following for each failing file:
(1) Find out what is in the file at the point that it is complaining about:
text = open("the_file.xml", "rb").read()
err_col = 52459
print repr(text[err_col-50:err_col+100]) # should include the error text
print repr(text[:50]) # show the XML declaration
(2) Throw your file at a web-based XML validation service e.g. http://www.validome.org/xml/ or http://validator.aborla.net/
and edit your question to display your findings.
Update: Here is the minimal xml file that illustrates your problem:
[badcharref.xml]
<a>&#0;</a>
[Python 2.7.1 output]
>>> import xml.etree.ElementTree as ET
>>> it = ET.iterparse(file("badcharref.xml"))
>>> for ev, el in it:
... print el.tag
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\python27\lib\xml\etree\ElementTree.py", line 1258, in next
self._parser.feed(data)
File "C:\python27\lib\xml\etree\ElementTree.py", line 1624, in feed
self._raiseerror(v)
File "C:\python27\lib\xml\etree\ElementTree.py", line 1488, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: reference to invalid character number: line 1, column 3
>>>
Not all valid Unicode characters are valid in XML. See the XML 1.0 Specification.
You may wish to examine your files using regexes like r'&#([0-9]+);' and r'&#x([0-9A-Fa-f]+);', convert the matched text to an int ordinal and check against the valid list from the spec i.e. #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
... or maybe the numeric character reference is syntactically invalid, e.g. not terminated by a ';', or &#not-a-digit, etc.
Update 2: I was wrong; the number in the ElementTree error message counts Unicode code points, not bytes. See the code below and snippets from the output of running it over the two bad files.
# coding: ascii
# Find numeric character references that refer to Unicode code points
# that are not valid in XML.
# Get byte offsets for seeking etc in undecoded file bytestreams.
# Get unicode offsets for checking against ElementTree error message,
# **IF** your input file is small enough.
BYTE_OFFSETS = True
import sys, re, codecs
fname = sys.argv[1]
print fname
if BYTE_OFFSETS:
    text = open(fname, "rb").read()
else:
    # Assumes file is encoded in UTF-8.
    text = codecs.open(fname, "rb", "utf8").read()
rx = re.compile("&#([0-9]+);|&#x([0-9a-fA-F]+);")
endpos = len(text)
pos = 0
while pos < endpos:
    m = rx.search(text, pos)
    if not m: break
    mstart, mend = m.span()
    target = m.group(1)
    if target:
        num = int(target)
    else:
        num = int(m.group(2), 16)
    # #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    if not(num in (0x9, 0xA, 0xD) or 0x20 <= num <= 0xD7FF
           or 0xE000 <= num <= 0xFFFD or 0x10000 <= num <= 0x10FFFF):
        print mstart, m.group()
    pos = mend
Output:
comments.xml
6615405
10205764
10213901
10213936
10214123
13292514
...
155656543
155656564
157344876
157722583
posts.xml
7607143
12982273
12982282
12982292
12982302
12982310
16085949
16085955
...
36303479
36303494 <<=== whoops
38942863
...
785292911
801282472
848911592
As @John Machin suggested, the files in question do have dubious numeric entities in them, though the error messages seem to point at the wrong place in the text. Perhaps the streaming nature and buffering make it difficult to report accurate positions.
In fact, a couple of dozen distinct numeric character references appear in the text, almost all of them referring to control characters (which is why they don't display here).
Most are not allowed. It looks like this parser is quite strict; you'll need to find another that is not so strict, or pre-process the XML.
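If pre-processing is the route you take, one rough sketch (built on the same regex and validity ranges quoted above, not a drop-in fix) is to strip every numeric character reference that falls outside the allowed set before feeding the text to ElementTree:

import re

CHARREF = re.compile(r'&#([0-9]+);|&#x([0-9a-fA-F]+);')

def _valid(num):
    # #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (num in (0x9, 0xA, 0xD) or 0x20 <= num <= 0xD7FF
            or 0xE000 <= num <= 0xFFFD or 0x10000 <= num <= 0x10FFFF)

def strip_bad_charrefs(text):
    # Remove numeric character references that XML 1.0 does not allow.
    def repl(m):
        num = int(m.group(1)) if m.group(1) else int(m.group(2), 16)
        return m.group(0) if _valid(num) else ''
    return CHARREF.sub(repl, text)

The cleaned string can then be parsed with ET.fromstring, or wrapped in a StringIO object if you still want to use iterparse.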
I'm not sure if this answers your question, but if you want to catch the ParseError raised by ElementTree explicitly, you would do this:
except ET.ParseError:
    print("catastrophic failure")
    print("last successful: {0}".format(last))
Source: http://effbot.org/zone/elementtree-13-intro.htm
It might also be worth noting that you can catch the error rather easily, and avoid having to stop your program completely, by using what you're already using later in the function: place the statement

    it = ET.iterparse(file(xml))

inside a try/except block:
try:
    it = ET.iterparse(file(xml))
except:
    print('iterparse error')
Of course, this will not fix your XML file or your pre-processing technique, but it could help identify which file (if you're parsing lots of them) is causing the error.
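A variation on the same idea when many files are involved: since iterparse is lazy and only raises the ParseError while the iterator is being consumed, catching the error around the loop and reporting the file name identifies the offender without stopping the whole run. A minimal sketch with placeholder file names:

import xml.etree.ElementTree as ET

# Placeholder list of files to check.
for path in ['a.xml', 'b.xml', 'c.xml']:
    try:
        for ev, el in ET.iterparse(path):
            pass
    except ET.ParseError as e:
        print('{0}: {1}'.format(path, e))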