Generate XML for strings of different languages, getting error

I'm writing code to generate an XML file whose content comes from strings in different languages. I initially got a Unicode error, so I added the setdefault command at the beginning; now I'm getting "AttributeError: 'str' object has no attribute 'iter'". I tried searching, but the answers didn't help much.
Here is the traceback:
Traceback (most recent call last):
  File "oldgood_XliffGenerator.py", line 118, in <module>
    convertToXliff(filename)
  File "oldgood_XliffGenerator.py", line 47, in convertToXliff
    tree.write(destifilename, xml_declaration=True, encoding='utf-8')
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 817, in write
    self._root, encoding, default_namespace
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 877, in _namespaces
    for elem in iterate():
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 477, in iter
    for e in e.iter(tag):
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 477, in iter
    for e in e.iter(tag):
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 477, in iter
    for e in e.iter(tag):
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 477, in iter
    for e in e.iter(tag):
AttributeError: 'str' object has no attribute 'iter'
Code Snippet:
def convertToXliff(filename):
    if filename:
        if os.path.isfile(filename):
            valid=True
        else:
            print "Could not open "+filename
    else:
        print "no input"
    global fileLength
    root = ET.Element("file")
    global file
    file = ET.SubElement(root, "file")
    file.set("id", generatingLang)
    file.set("native", nativeLang)
    file.set("useAsLocale", setLocale)
    print "Reached stage1"
    datainp = fileRead(filename)
    RecurseObjects(datainp)
    destifilename = "testconvfile.xml"
    #Indent(root)
    tree = ET.ElementTree(root)
    tree.write(destifilename, xml_declaration=True, encoding='utf-8')
Please check and let me know what I'm missing.
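The traceback shows ElementTree calling .iter() on every child while serializing, so one likely reading is that somewhere (presumably in RecurseObjects, which isn't shown) a plain string was added as a child instead of being assigned as an element's text. The Python 2.7 ElementTree module in the traceback evidently accepted the string and only failed later in write(). Below is a minimal Python 3 sketch of building a tree from strings in several languages, with each string stored in .text; the function name build_units and the trans-unit/source tags are illustrative assumptions, not the asker's code. In Python 3, Element.append() rejects a str immediately with a TypeError.

```python
import io
import xml.etree.ElementTree as ET


def build_units(strings):
    """Build <file><body><trans-unit id=...><source>text</source>... from a dict.

    build_units and the tag names are hypothetical, for illustration only.
    """
    root = ET.Element("file")
    body = ET.SubElement(root, "body")
    for key, value in strings.items():
        unit = ET.SubElement(body, "trans-unit", id=key)
        # Store the string as text; never append the raw string as a child.
        ET.SubElement(unit, "source").text = value
    return ET.ElementTree(root)


tree = build_units({"greeting_fr": "Bonjour à tous", "greeting_ja": "こんにちは"})
buf = io.BytesIO()
tree.write(buf, xml_declaration=True, encoding="utf-8")
xml_text = buf.getvalue().decode("utf-8")
print(xml_text)

# Appending a str directly is the mistake; Python 3 rejects it up front.
try:
    ET.Element("body").append("Bonjour")
    append_rejected = False
except TypeError:
    append_rejected = True
print("append(str) rejected:", append_rejected)
```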

Related

Exception when reading corrupted spreadsheet

This is more of a bug report with a possible fix. I'm using openpyxl version 3.0.9.
One of the files I need to handle has a problem with one of its images. When I open it in LibreOffice, I see a placeholder instead of the image, but when I open it with load_workbook(), an exception occurs:
Traceback (most recent call last):
  File "/home/pooh/work/isaac_choi/./1.py", line 5, in <module>
    wb=load_workbook('pritelli/FW21 WOMAN 27.09.21.xlsx')
  File "/home/pooh/venv39/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 317, in load_workbook
    reader.read()
  File "/home/pooh/venv39/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 282, in read
    self.read_worksheets()
  File "/home/pooh/venv39/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 257, in read_worksheets
    charts, images = find_images(self.archive, rel.target)
  File "/home/pooh/venv39/lib/python3.9/site-packages/openpyxl/reader/drawings.py", line 52, in find_images
    image = Image(BytesIO(archive.read(dep.target)))
  File "/usr/lib/python3.9/zipfile.py", line 1463, in read
    with self.open(name, "r", pwd) as fp:
  File "/usr/lib/python3.9/zipfile.py", line 1502, in open
    zinfo = self.getinfo(name)
  File "/usr/lib/python3.9/zipfile.py", line 1429, in getinfo
    raise KeyError(
KeyError: "There is no item named 'xl/drawings/NULL' in the archive"

I think the KeyError could be caught right after the OSError handler (line 53 of openpyxl/reader/drawings.py), and the loop could simply continue in that case:
except KeyError:
    warn('Missing image')
    continue
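The proposed change is the usual catch, warn, and continue pattern. The sketch below reproduces it outside openpyxl using the standard zipfile module: ZipFile.read() raises KeyError for a member name that isn't in the archive, which is the "There is no item named 'xl/drawings/NULL'" error above. The helper read_parts and the fake archive contents are hypothetical; this is not openpyxl's actual find_images code.

```python
import io
import warnings
import zipfile


def read_parts(archive, targets):
    """Return {target: bytes} for each target present; warn and skip the rest.

    read_parts is a hypothetical helper mirroring the proposed patch.
    """
    found = {}
    for target in targets:
        try:
            found[target] = archive.read(target)
        except KeyError:
            # zipfile raises KeyError("There is no item named ...") here
            warnings.warn("Missing image: %s" % target)
            continue
    return found


# Build a small in-memory archive that lacks the referenced drawing.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/media/image1.png", b"\x89PNG fake bytes")

with zipfile.ZipFile(buf) as zf:
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        parts = read_parts(zf, ["xl/media/image1.png", "xl/drawings/NULL"])

print(sorted(parts))  # ['xl/media/image1.png']
print(len(caught))    # 1
```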

Can someone help me understand what this error means in pdfminer's pdf2txt: AttributeError: 'PDFObjRef' object has no attribute 'decode'

I am using pdfminer's pdf2txt.py to extract text from various PDFs. It works well in many cases, but on one file I get the error below, and I'm not sure what I can do to get pdfminer to work.
AttributeError: 'PDFObjRef' object has no attribute 'decode'
I have run this same command on other documents and this is the only one I am getting this error on.
I am simply running it from the command line, so there is no other code to show:
pdf2txt.py -t xml -F -1.0 test.pdf
This is the complete output from pdf2txt.py:
<?xml version="1.0" encoding="utf-8" ?>
<pages>
Traceback (most recent call last):
  File "/usr/local/bin/pdf2txt.py", line 116, in <module>
    if __name__ == '__main__': sys.exit(main(sys.argv))
  File "/usr/local/bin/pdf2txt.py", line 110, in main
    interpreter.process_page(page)
  File "/Library/Python/2.7/site-packages/pdfminer2-20151206-py2.7.egg/pdfminer/pdfinterp.py", line 834, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/Library/Python/2.7/site-packages/pdfminer2-20151206-py2.7.egg/pdfminer/pdfinterp.py", line 844, in render_contents
    self.init_resources(resources)
  File "/Library/Python/2.7/site-packages/pdfminer2-20151206-py2.7.egg/pdfminer/pdfinterp.py", line 350, in init_resources
    self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
  File "/Library/Python/2.7/site-packages/pdfminer2-20151206-py2.7.egg/pdfminer/pdfinterp.py", line 200, in get_font
    font = self.get_font(None, subspec)
  File "/Library/Python/2.7/site-packages/pdfminer2-20151206-py2.7.egg/pdfminer/pdfinterp.py", line 191, in get_font
    font = PDFCIDFont(self, spec)
  File "/Library/Python/2.7/site-packages/pdfminer2-20151206-py2.7.egg/pdfminer/pdffont.py", line 643, in __init__
    self.cidcoding = '%s-%s' % (self.cidsysteminfo.get('Registry', b'unknown').decode("latin1"),
AttributeError: 'PDFObjRef' object has no attribute 'decode'
Any insights are appreciated!
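One likely reading of the last frame: the font's CIDSystemInfo dictionary stores Registry as an indirect object reference (a PDFObjRef) rather than a byte string, and this pdfminer2 build calls .decode() on it without resolving it first. The sketch below illustrates the resolve-before-decode pattern with a hypothetical FakeObjRef stand-in so it runs without pdfminer; in pdfminer itself, PDFObjRef has a resolve() method and pdfminer.pdftypes provides resolve1() for the same purpose. The second field is assumed here to be Ordering, the other string entry of a CIDSystemInfo dictionary.

```python
class FakeObjRef:
    """Hypothetical stand-in for pdfminer's PDFObjRef (an indirect reference)."""

    def __init__(self, objects, objid):
        self.objects = objects
        self.objid = objid

    def resolve(self):
        return self.objects[self.objid]


def resolve_value(value):
    """Follow references until a direct value is reached (similar to resolve1)."""
    while isinstance(value, FakeObjRef):
        value = value.resolve()
    return value


objects = {7: b"Adobe"}  # hypothetical object table of the PDF
cidsysteminfo = {"Registry": FakeObjRef(objects, 7), "Ordering": b"Japan1"}

# cidsysteminfo["Registry"].decode("latin1") would raise AttributeError here.
registry = resolve_value(cidsysteminfo.get("Registry", b"unknown")).decode("latin1")
ordering = resolve_value(cidsysteminfo.get("Ordering", b"unknown")).decode("latin1")
cidcoding = "%s-%s" % (registry, ordering)
print(cidcoding)  # Adobe-Japan1
```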

parse xml file and output to text file

I'm trying to parse an XML file (config.xml) with ElementTree and write its text content to a text file. I looked at other similar questions here, but none of them helped. I'm using Python 2.7.9.
import xml.etree.ElementTree as ET
tree = ET.parse('config.xml')
notags = ET.tostring(tree,encoding='us-ascii',method='text')
print(notags)
Output:
Traceback (most recent call last):
  File "./python_element", line 9, in <module>
    notags = ET.tostring(tree,encoding='us-ascii',method='text')
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1126, in tostring
    ElementTree(element).write(file, encoding, method=method
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 814, in write
    _serialize_text(write, self._root, encoding)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1005, in _serialize_text
    for part in elem.itertext():
AttributeError: 'ElementTree' object has no attribute 'itertext'

Instead of tree (an ElementTree object), pass an Element object. You can get the root element with the .getroot() method:
notags = ET.tostring(tree.getroot(), encoding='utf-8',method='text')
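For completeness, here is a small Python 3 sketch of the whole task: parse, extract the text, and write it to a file. It uses an inline XML string in place of config.xml so it runs on its own; with a real file you would call ET.parse('config.xml').getroot(). In Python 3, encoding='unicode' makes tostring() return a str rather than bytes, and joining root.itertext() gives the same text.

```python
import xml.etree.ElementTree as ET

# Stand-in for config.xml; with a file use: root = ET.parse("config.xml").getroot()
xml_source = "<config><name>demo</name><port>8080</port></config>"
root = ET.fromstring(xml_source)

# method="text" keeps only the text content; encoding="unicode" returns a str.
notags = ET.tostring(root, encoding="unicode", method="text")
print(repr(notags))  # 'demo8080'

# itertext() yields the same text pieces if you prefer to join them yourself.
assert notags == "".join(root.itertext())

with open("config.txt", "w", encoding="utf-8") as out:
    out.write(notags)
```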

Catch error in a for loop python

I have a for loop over an Avro data reader object:
for i in reader:
    print i
Then I got a UnicodeDecodeError in the for statement, so I wanted to skip that particular record. So I did this:
try:
    for i in reader:
        print i
except:
    pass
but the loop does not continue past the error. How can I overcome this problem?
Edit: error trace added:
Traceback (most recent call last):
  File "modify.py", line 22, in <module>
    for record in reader:
  File "/usr/lib/python2.6/site-packages/avro-1.7.7-py2.6.egg/avro/datafile.py", line 362, in next
    datum = self.datum_reader.read(self.datum_decoder)
  File "/usr/lib/python2.6/site-packages/avro-1.7.7-py2.6.egg/avro/io.py", line 445, in read
    return self.read_data(self.writers_schema, self.readers_schema, decoder)
  File "/usr/lib/python2.6/site-packages/avro-1.7.7-py2.6.egg/avro/io.py", line 490, in read_data
    return self.read_record(writers_schema, readers_schema, decoder)
  File "/usr/lib/python2.6/site-packages/avro-1.7.7-py2.6.egg/avro/io.py", line 690, in read_record
    field_val = self.read_data(field.type, readers_field.type, decoder)
  File "/usr/lib/python2.6/site-packages/avro-1.7.7-py2.6.egg/avro/io.py", line 468, in read_data
    return decoder.read_utf8()
  File "/usr/lib/python2.6/site-packages/avro-1.7.7-py2.6.egg/avro/io.py", line 233, in read_utf8
    return unicode(self.read_bytes(), "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb4 in position 14: invalid start byte
Could this be because the file is corrupted?
Edit 2:
Following the suggestion in the answers to step through iterobject, I modified the code and got this error:
Traceback (most recent call last):
  File "modify.py", line 28, in <module>
    print next(iterobject)["filepath"]
  File "/usr/lib/python2.6/site-packages/avro-1.7.7-py2.6.egg/avro/datafile.py", line 362, in next
    datum = self.datum_reader.read(self.datum_decoder)
  File "/usr/lib/python2.6/site-packages/avro-1.7.7-py2.6.egg/avro/io.py", line 445, in read
    return self.read_data(self.writers_schema, self.readers_schema, decoder)
  File "/usr/lib/python2.6/site-packages/avro-1.7.7-py2.6.egg/avro/io.py", line 490, in read_data
    return self.read_record(writers_schema, readers_schema, decoder)
  File "/usr/lib/python2.6/site-packages/avro-1.7.7-py2.6.egg/avro/io.py", line 690, in read_record
    field_val = self.read_data(field.type, readers_field.type, decoder)
  File "/usr/lib/python2.6/site-packages/avro-1.7.7-py2.6.egg/avro/io.py", line 468, in read_data
    return decoder.read_utf8()
  File "/usr/lib/python2.6/site-packages/avro-1.7.7-py2.6.egg/avro/io.py", line 233, in read_utf8
    return unicode(self.read_bytes(), "utf-8")
  File "/usr/lib/python2.6/site-packages/avro-1.7.7-py2.6.egg/avro/io.py", line 226, in read_bytes
    return self.read(self.read_long())
  File "/usr/lib/python2.6/site-packages/avro-1.7.7-py2.6.egg/avro/io.py", line 184, in read_long
    b = ord(self.read(1))
TypeError: ord() expected a character, but string of length 0 found

If the error is raised by the for i in reader line itself, try this instead; it skips an element of the iterator whenever a UnicodeDecodeError occurs:
iterobject = iter(reader)
while True:
    try:
        print(next(iterobject))
    except StopIteration:
        break  # reader is exhausted
    except UnicodeDecodeError:
        pass  # skip the record that failed to decode
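Here is a self-contained Python 3 sketch of why the try/except has to sit inside the loop. FlakyReader is a hypothetical iterator standing in for the Avro reader. It raises UnicodeDecodeError for one record but can still be advanced afterwards. The skip works only because this reader has already moved past the bad record when it raises; a real decoder that fails mid-record may lose its place instead, which could explain the TypeError in Edit 2.

```python
class FlakyReader:
    """Hypothetical iterator: some raw records are not valid UTF-8."""

    def __init__(self, raw_records):
        self._raw = iter(raw_records)

    def __iter__(self):
        return self

    def __next__(self):
        raw = next(self._raw)         # StopIteration ends iteration normally
        return raw.decode("utf-8")    # may raise UnicodeDecodeError


def read_all(reader):
    records = []
    iterobject = iter(reader)
    while True:
        try:
            records.append(next(iterobject))
        except StopIteration:
            break
        except UnicodeDecodeError:
            continue  # skip only the record that failed to decode
    return records


print(read_all(FlakyReader([b"first", b"bad \xb4 byte", b"third"])))  # ['first', 'third']
```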

You need the try/except inside the loop:
for i in reader:
    try:
        print i
    except UnicodeEncodeError:
        pass
By the way, it's good practice to name the specific type of error you're trying to catch (as I did with except UnicodeEncodeError:), since otherwise you risk making your code very hard to debug!

You can catch the specific error and avoid letting unknown errors pass unnoticed.
Python 3.x:
try:
    for i in reader:
        print(i)
except UnicodeDecodeError as ue:
    print(str(ue))
Python 2.x:
try:
    for i in reader:
        print i
except UnicodeDecodeError, ue:
    print(str(ue))
By printing the error, you can see what happened. A bare except catches everything (including, say, an obscure RuntimeError), so you'll never know what went wrong. It can occasionally be useful, but it's dangerous and generally bad practice.
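To make that risk concrete, here is a small Python 3 sketch with made-up input. The list mixes a good record, a record with an invalid UTF-8 byte, and a str passed in by mistake (hypothetically); calling .decode() on a str raises AttributeError in Python 3. The bare except silently drops both problems, while catching only UnicodeDecodeError skips the bad record and lets the unrelated bug surface.

```python
values = [b"ok", b"\xb4bad", "already text"]  # made-up sample input


def decode_bare(items):
    decoded = []
    for item in items:
        try:
            decoded.append(item.decode("utf-8"))
        except:  # bare except: swallows every error, including real bugs
            pass
    return decoded


def decode_specific(items):
    decoded = []
    for item in items:
        try:
            decoded.append(item.decode("utf-8"))
        except UnicodeDecodeError as ue:
            print("skipping record:", ue)  # only decoding problems are skipped
    return decoded


print(decode_bare(values))  # ['ok'] (the str mistake goes unnoticed)

try:
    decode_specific(values)
except AttributeError as exc:
    print("bug surfaced:", exc)
```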

ElementTree.parse() won't open a file that open() will

I am trying to write a Python script that opens an XML file, adds some elements, and then exits. I can't get ElementTree to open the file I want, although I can open it with open(). Example code:
try:
    file = open(projectLocation + "\etc\config\struts-config.xml", 'r')
    print "opened file"  # this will print
    tree = ET.parse(projectLocation + "\etc\config\struts-config.xml")  # this fails
    print "opened xml"
except:
    print "could not open " + projectLocation + "\etc\config\struts-config.xml"
    sys.exit()
Following a comment I received, I updated the code to this:
tree = ET.parse(projectLocation + "\etc\config\struts-config.xml")
print "opened xml"
This gives me the error:
Traceback (most recent call last):
  File "test.py", line 117, in <module>
    tree = ET.parse(projectLocation + "\etc\config\struts-config.xml")
  File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 862, in parse
    tree.parse(source, parser)
  File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 586, in parse
    parser.feed(data)
  File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 1245, in feed
    self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: mismatched tag: line 1205, column 2
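The traceback suggests the file is found and read, and that Expat then rejects it because the XML itself is not well-formed (a mismatched tag at line 1205, column 2). The bare except in the first version reported that as "could not open". Here is a small Python 3 sketch with a deliberately broken, made-up document. It shows how catching ET.ParseError reports where the problem is. ET.ParseError is available from Python 2.7 on; Python 2.6 raises xml.parsers.expat.ExpatError, as above.

```python
import xml.etree.ElementTree as ET

# Made-up document: <action-mappings> is never closed.
broken = "<struts-config>\n  <action-mappings>\n</struts-config>\n"

error_position = None
try:
    ET.fromstring(broken)
except ET.ParseError as err:
    # err.position is a (line, column) tuple; the message repeats it.
    error_position = err.position
    print("not well-formed XML:", err)
```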
