elementtree.parse() not opening file open() will - python

I am trying to write a python script that will open an XML file add some elements and then exit. I can't get element tree to open the file I want. I am able to open the file however with open(). Example code
try:
file = open(projectLocation + "\etc\config\struts-config.xml", 'r')
print "oppenned file" #this will print
tree = ET.parse(projectLocation + "\etc\config\struts-config.xml") #this fails
print "oppenned xml"
except:
print "could not open " + projectLocation + "\etc\config\struts-config.xml"
sys.exit()
Updating the code to this, for the comment I recieved:
tree = ET.parse(projectLocation + "\etc\config\struts-config.xml")
print "oppenned xml"
this give me the error:
Traceback (most recent call last):
File "test.py", line 117, in <module>
tree = ET.parse(projectLocation + "\etc\config\struts-config.xml")
File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 862, in parse
tree.parse(source, parser)
File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 586, in parse
parser.feed(data)
File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 1245, in feed
self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: mismatched tag: line 1205, column 2

Related

Python error :OSError: [Errno 9] Bad file descriptor

I have wanted to transform DOCX file using docx library. Everytime I run it i get this error
OSError: [Errno 9] Bad file descriptor
The code is :
from docx import Document
def bionify(path_to_text: str) -> None:
doc = Document(path_to_text)
new_doc = Document()
all_paragraphs = doc.paragraphs
for paragraph in all_paragraphs:
word_list = paragraph.text.split(' ')
new_paragraph = new_doc.add_paragraph()
for word in word_list:
i = 0
while i < len(word):
if i == 0 or i == 1:
new_paragraph.add_run(word[i]).bold = True
else:
new_paragraph.add_run(word[i]).bold = False
i += 1
new_paragraph.add_run(' ')
# Input the path to the document that you wish to save to:
new_doc.save('sample_output.docx')
if __name__ == '__main__':
# Input the path to the document containing your text file you wish to read from:
bionify(r'C:\Users\###\Desktop\bionic python reader transformer\BionicTexterizer\sample_input.docx')
I have changed the destination, python package, python version to run it. But every time I get OSError: [Errno 9] Bad file descriptor
Complete tracepack:
Traceback (most recent call last):
File "c:\Users\####\Desktop\bionic python reader transformer\BionicTexterizer\main.py", line 62, in <module>
bionify(r'C:\Users\####\Desktop\bionic python reader transformer\BionicTexterizer\sample_input.docx')
File "c:\Users\####\Desktop\bionic python reader transformer\BionicTexterizer\main.py", line 57, in bionify
new_doc.save('sample_output.docx')
File "C:\Python310\lib\site-packages\docx\document.py", line 135, in save
self._part.save(path_or_stream)
File "C:\Python310\lib\site-packages\docx\parts\document.py", line 111, in save
self.package.save(path_or_stream)
File "C:\Python310\lib\site-packages\docx\opc\package.py", line 172, in save
PackageWriter.write(pkg_file, self.rels, self.parts)
File "C:\Python310\lib\site-packages\docx\opc\pkgwriter.py", line 33, in write
PackageWriter._write_content_types_stream(phys_writer, parts)
File "C:\Python310\lib\site-packages\docx\opc\pkgwriter.py", line 45, in _write_content_types_stream
phys_writer.write(CONTENT_TYPES_URI, cti.blob)
File "C:\Python310\lib\site-packages\docx\opc\phys_pkg.py", line 155, in write
self._zipf.writestr(pack_uri.membername, blob)
File "C:\Python310\lib\zipfile.py", line 1810, in writestr
with self.open(zinfo, mode='w') as dest:
File "C:\Python310\lib\zipfile.py", line 1176, in close
self._fileobj.seek(self._zinfo.header_offset)
OSError: [Errno 9] Bad file descriptor
Exception ignored in: <function ZipFile.__del__ at 0x0000022D9BF4BEB0>
Traceback (most recent call last):
File "C:\Python310\lib\zipfile.py", line 1815, in __del__
self.close()
File "C:\Python310\lib\zipfile.py", line 1837, in close
self._fpclose(fp)
File "C:\Python310\lib\zipfile.py", line 1937, in _fpclose
fp.close()
Windows 11. It is a problem with windows 11. I have ran the code without any problems on windows 10. There seems to some package permission issues.

Python: Read a very large XML file

Am trying to read an xml file(which is a french diccionary) from a python script, but am getting the error below. Is there anyway I can fix it?
PS: the file is 158 643 Ko
from xml.dom import minidom
doc = minidom.parse("Dic.xml")
Data = doc.getElementsByTagName("title")[0]
titleData = Data.firstChild.data
print (titleData)
The error:
Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
doc = minidom.parse("Morphalou-2.0.xml")
File "C:\Python27\lib\xml\dom\minidom.py", line 1918, in parse
return expatbuilder.parse(file)
File "C:\Python27\lib\xml\dom\expatbuilder.py", line 924, in parse
result = builder.parseFile(fp)
File "C:\Python27\lib\xml\dom\expatbuilder.py", line 204, in parseFile
buffer = file.read(16*1024)
MemoryError
Advance Thanks

Generate XML for strings of different languages, getting error

I'm writing code to generate an XML with content from different languages strings. I got an error for unicode generation initially, added setdefault command at the beginning, now getting "attributeError: 'str' object has no attribute 'iter' python". Tried searching but the answers didnt help much.
Here is the traceback:
Traceback (most recent call last):
File "oldgood_XliffGenerator.py", line 118, in <module>
convertToXliff(filename)
File "oldgood_XliffGenerator.py", line 47, in convertToXliff
tree.write(destifilename, xml_declaration=True, encoding='utf-8')
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 817, in write
self._root, encoding, default_namespace
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 877, in _namespaces
for elem in iterate():
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 477, in iter
for e in e.iter(tag):
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 477, in iter
for e in e.iter(tag):
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 477, in iter
for e in e.iter(tag):
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 477, in iter
for e in e.iter(tag):
AttributeError: 'str' object has no attribute 'iter'
Code Snippet:
def convertToXliff(filename):
if filename:
if os.path.isfile(filename):
valid=True
else:
print "Could not open "+filename
else:
print "no input"
global fileLength
root = ET.Element("file")
global file
file = ET.SubElement(root, "file")
file.set("id", generatingLang)
file.set("native", nativeLang)
file.set("useAsLocale", setLocale)
print "Reached stage1"
datainp = fileRead(filename)
RecurseObjects(datainp)
destifilename = "testconvfile.xml"
#Indent(root)
tree = ET.ElementTree(root)
tree.write(destifilename, xml_declaration=True, encoding='utf-8')
plz check and let me know what Im missing

Reading password protected Word Documents with zipfile

I am trying to read a password protected word document on Python using zipfile.
The following code works with a non-password protected document, but gives an error when used with a password protected file.
try:
from xml.etree.cElementTree import XML
except ImportError:
from xml.etree.ElementTree import XML
import zipfile
psw = "1234"
WORD_NAMESPACE = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
PARA = WORD_NAMESPACE + 'p'
TEXT = WORD_NAMESPACE + 't'
def get_docx_text(path):
document = zipfile.ZipFile(path, "r")
document.setpassword(psw)
document.extractall()
xml_content = document.read('word/document.xml')
document.close()
tree = XML(xml_content)
paragraphs = []
for paragraph in tree.getiterator(PARA):
texts = [node.text
for node in paragraph.getiterator(TEXT)
if node.text]
if texts:
paragraphs.append(''.join(texts))
return '\n\n'.join(paragraphs)
When running get_docx_text() with a password protected file, I received the following error:
Traceback (most recent call last):
File "<ipython-input-15-d2783899bfe5>", line 1, in <module>
runfile('/Users/username/Workspace/Python/docx2txt.py', wdir='/Users/username/Workspace/Python')
File "/Applications/Spyder-Py2.app/Contents/Resources/lib/python2.7/spyderlib/widgets/externalshell/sitecustomize.py", line 680, in runfile
execfile(filename, namespace)
File "/Applications/Spyder-Py2.app/Contents/Resources/lib/python2.7/spyderlib/widgets/externalshell/sitecustomize.py", line 78, in execfile
builtins.execfile(filename, *where)
File "/Users/username/Workspace/Python/docx2txt.py", line 41, in <module>
x = get_docx_text("/Users/username/Desktop/file.docx")
File "/Users/username/Workspace/Python/docx2txt.py", line 23, in get_docx_text
document = zipfile.ZipFile(path, "r")
File "zipfile.pyc", line 770, in __init__
File "zipfile.pyc", line 811, in _RealGetContents
BadZipfile: File is not a zip file
Does anyone have any advice to get this code to work?
I don't think this is an encryption problem, for two reasons:
Decryption is not attempted when the ZipFile object is created. Methods like ZipFile.extractall, extract, and open, and read take an optional pwd parameter containing the password, but the object constructor / initializer does not.
Your stack trace indicates that the BadZipFile is being raised when you create the ZipFile object, before you call setpassword:
document = zipfile.ZipFile(path, "r")
I'd look carefully for other differences between the two files you're testing: ownership, permissions, security context (if you have that on your OS), ... even filename differences can cause a framework to "not see" the file you're working on.
Also --- the obvious one --- try opening the encrypted zip file with your zip-compatible command of choice. See if it really is a zip file.
I tested this by opening an encrypted zip file in Python 3.1, while "forgetting" to provide a password. I could create the ZipFile object (the variable zfile below) without any error, but got a RuntimeError --- not a BadZipFile exception --- when I tried to read a file without providing a password:
Traceback (most recent call last):
File "./zf.py", line 35, in <module>
main()
File "./zf.py", line 29, in main
print_checksums(zipfile_name)
File "./zf.py", line 22, in print_checksums
for checksum in checksum_contents(zipfile_name):
File "./zf.py", line 13, in checksum_contents
inner_file = zfile.open(inner_filename, "r")
File "/usr/lib64/python3.1/zipfile.py", line 903, in open
"password required for extraction" % name)
RuntimeError: File apache.log is encrypted, password required for extraction
I was also able to raise a BadZipfile exception, once by trying to open an empty file and once by trying to open some random logfile text that I'd renamed to a ".zip" extension. The two test files produced identical stack traces, down to the line numbers.
Traceback (most recent call last):
File "./zf.py", line 35, in <module>
main()
File "./zf.py", line 29, in main
print_checksums(zipfile_name)
File "./zf.py", line 22, in print_checksums
for checksum in checksum_contents(zipfile_name):
File "./zf.py", line 10, in checksum_contents
zfile = zipfile.ZipFile(zipfile_name, "r")
File "/usr/lib64/python3.1/zipfile.py", line 706, in __init__
self._GetContents()
File "/usr/lib64/python3.1/zipfile.py", line 726, in _GetContents
self._RealGetContents()
File "/usr/lib64/python3.1/zipfile.py", line 738, in _RealGetContents
raise BadZipfile("File is not a zip file")
zipfile.BadZipfile: File is not a zip file
While this stack trace isn't exactly the same as yours --- mine has a call to _GetContents, and the pre-3.2 "small f" spelling of BadZipfile --- but they're close enough that I think this is the kind of problem you're dealing with.

Unclosed XML Token

How would I handle this error in Python 2.6?
Traceback (most recent call last):
File "./fetch_xml_collect.py", line 32, in <module>
tree=ET.parse(response)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/etree/ElementTree.py", line 862, in parse
tree.parse(source, parser)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/etree/ElementTree.py", line 587, in parse
self._root = parser.close()
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/etree/ElementTree.py", line 1254, in close
self._parser.Parse("", 1) # end of data
xml.parsers.expat.ExpatError: unclosed token: line 56, column 1
Current code being implemented:
while urlget==1:
try:
response = urllib.urlopen(rep)
except IOError:
print("reason")
else:
try:
tree=ET.parse(response)
except IOError:
print("XML Parse Error\n")
else:
root=tree.getroot()
print root[0].text
powerlist=tree.findall('meter/power')
print powerlist[0].tag,powerlist[0].text
The question is: How would I handle the above error in the given code?
try:
#Some code
...
except xml.parsers.expat.ExpatError, ex:
print ex
continue
Something like the above should work. Just continue if you get that error. It will continue with the next iteration with the loop, or if it's the last iteration, break out of the loop.
The XML is formed incorrectly, and is unable to be processed. Just skip it and go on to the next one.

Categories

Resources