Element Tree Syntax Error (not well-formed with invalid token)


I am trying to use the ElementTree module, but I run into an error that I can't understand.
My code is based on the Python documentation for ElementTree, but it raises an error when I run the script:
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET
file_name_xml = "curl-result.xml"
tree = ET.parse(file_name_xml)
tree.getroot()
When I run this code:
./python2.6 modify_xml_file.py
Then it gives me this error:
Traceback (most recent call last):
  File "modify_xml_file.py", line 8, in <module>
    tree = ET.parse(file_name_xml)
  File "<string>", line 45, in parse
  File "<string>", line 32, in parse
SyntaxError: not well-formed (invalid token): line 1, column 4

The version of cElementTree included in Python 2.6 throws a SyntaxError exception for malformed XML:
>>> with open('bad.xml', 'w') as badxml:
...     badxml = '<foobar\n'  # rebinds the name and writes nothing, so bad.xml stays empty
...
>>> import xml.etree.cElementTree as ET
>>> tree = ET.parse('bad.xml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 45, in parse
  File "<string>", line 32, in parse
SyntaxError: no element found: line 1, column 0
This is a bug in the C acceleration code fixed in Python 2.7. The (slower) Python parser throws a more helpful error:
>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('bad.xml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mjpieters/Development/Library/buildout.python/parts/opt/lib/python2.6/xml/etree/ElementTree.py", line 862, in parse
    tree.parse(source, parser)
  File "/Users/mjpieters/Development/Library/buildout.python/parts/opt/lib/python2.6/xml/etree/ElementTree.py", line 587, in parse
    self._root = parser.close()
  File "/Users/mjpieters/Development/Library/buildout.python/parts/opt/lib/python2.6/xml/etree/ElementTree.py", line 1254, in close
    self._parser.Parse("", 1) # end of data
xml.parsers.expat.ExpatError: no element found: line 1, column 0
Fix your XML input file.
What changed in 2.7 is that ElementTree was updated to version 1.3, a version that improved the parser, introducing a new ParseError exception, which is a subclass of SyntaxError.
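For readers on Python 2.7 or later, here is a minimal sketch of catching that exception and reporting where the parser stopped. The helper name describe_xml_problem and the sample strings are made up for illustration; ParseError and its position attribute come with ElementTree 1.3.

```python
import xml.etree.ElementTree as ET


def describe_xml_problem(xml_text):
    """Return None if xml_text parses, else a short description of the failure."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # position is a (line, column) tuple reported by the expat parser
        line, column = err.position
        return "line %d, column %d: %s" % (line, column, err)
    return None


print(describe_xml_problem("<foobar\n"))           # malformed: the tag is never closed
print(describe_xml_problem("<foobar></foobar>"))   # well-formed, prints None
```

Because ParseError subclasses SyntaxError, code written for 2.6 that catches SyntaxError should keep working.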

Related

Parsing XML and HTML in the same project

I want to parse both XML and HTML in the same project.
I tried this:
from xml.etree import ElementTree as ET
tree = ET.parse(fpath)
html_file = ET.parse(htmlpath)
and got this error:
Traceback (most recent call last):
  File "C:.py", line 55, in
    html_file = ET.parse("htmlpath")
  File "C:\Users\AppData\Local\Programs\Python\Python37-32\lib\xml\etree\ElementTree.py", line 1197, in parse
    tree.parse(source, parser)
  File "C:\Users\AppData\Local\Programs\Python\Python37-32\lib\xml\etree\ElementTree.py", line 598, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: undefined entity : line 690, column 78
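A likely cause, assuming the HTML file uses named entities such as &nbsp;: XML predefines only five entities (amp, lt, gt, quot, apos), so an XML parser rejects the rest as undefined. A rough sketch of one way around it, using only the Python 3 standard library: keep ElementTree for the real XML and hand the HTML to html.parser. The sample markup, the TextCollector class, and both helper functions are invented for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical HTML snippet containing an HTML-only named entity
HTML_SAMPLE = "<html><body><p>Hello&nbsp;world</p></body></html>"


class TextCollector(HTMLParser):
    """Collect the text content of an HTML document."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_text(markup):
    """Return all text found in markup, with entities already resolved."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


def xml_error(markup):
    """Return the ElementTree error message for markup, or None if it parses."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)
    return None


print(xml_error(HTML_SAMPLE))        # the XML parser rejects &nbsp;
print(repr(html_text(HTML_SAMPLE)))  # the HTML parser resolves it
```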

xml.parsers.expat.ExpatError: not well-formed (invalid token) while parsing URL data with minidom

I am getting the error xml.parsers.expat.ExpatError: syntax error: line 1, column 0 while parsing data from a URL using minidom. Can anyone help me with this?
Here is my code:
from xml.dom import minidom
import urllib2
url= 'http://www.awgp.org/about_us'
openurl=urllib2.urlopen(url)
doc=minidom.parse("about_us.xml")
Error:
Traceback (most recent call last):
  File "test3.py", line 11, in <module>
    doc=minidom.parse("about_us.xml")
  File "C:\Python27\lib\xml\dom\minidom.py", line 1918, in parse
    return expatbuilder.parse(file)
  File "C:\Python27\lib\xml\dom\expatbuilder.py", line 924, in parse
    result = builder.parseFile(fp)
  File "C:\Python27\lib\xml\dom\expatbuilder.py", line 211, in parseFile
    parser.Parse("", True)
xml.parsers.expat.ExpatError: syntax error: line 1, column 0

parser.Parse("", True)
xml.parsers.expat.ExpatError: syntax error: line 1, column 0
These lines from your traceback suggest to me that your "about_us.xml" file is empty.
You call urlopen, but you have not shown that you ever call openurl.read() to actually get the data.
Nor have you shown where or how you write that data to your "about_us.xml" file.
Parsing the response object directly instead:
from xml.dom import minidom
import urllib2
url= 'http://www.awgp.org/about_us'
openurl=urllib2.urlopen(url)
doc=minidom.parse(openurl)
print doc
gives me
Traceback (most recent call last):
  File "main.py", line 5, in <module>
    doc=minidom.parse(openurl)
  File "/usr/local/lib/python2.7/xml/dom/minidom.py", line 1918, in parse
    return expatbuilder.parse(file)
  File "/usr/local/lib/python2.7/xml/dom/expatbuilder.py", line 928, in parse
    result = builder.parseFile(file)
  File "/usr/local/lib/python2.7/xml/dom/expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 51, column 81
which indicates that the page you are trying to parse as XML is not well-formed. Try using Beautiful Soup instead, which, from memory, is very forgiving.
from BeautifulSoup import BeautifulSoup
import urllib2
url= 'http://www.awgp.org/about_us'
openurl=urllib2.urlopen(url)
soup = BeautifulSoup(openurl.read())
for a in soup.findAll('a'):
    print (a.text, a.get('href'))
BTW, the import above is for version 3 of Beautiful Soup, which runs on Python 2.7. Beautiful Soup 4 is imported with from bs4 import BeautifulSoup instead.
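The first point above (an empty file produces a parse error) can be caught before the parser is ever called. A rough sketch in Python 3, assuming the downloaded bytes are already in memory; parse_xml_bytes is a hypothetical helper, not part of minidom:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_xml_bytes(data):
    """Parse XML bytes with minidom, failing early on empty input."""
    if not data.strip():
        raise ValueError("nothing to parse: was the response read and saved?")
    try:
        return minidom.parseString(data)
    except ExpatError as err:
        # Typical for HTML pages, which are usually not well-formed XML
        raise ValueError("input is not well-formed XML: %s" % err)


doc = parse_xml_bytes(b"<page><title>About us</title></page>")
print(doc.documentElement.tagName)
```

With this in place, an empty download and an HTML page both fail with a message that names the actual problem.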

With Python 3 and lxml, how to extract the Version number from a SOAP WSDL?

When I test with a subset of the WSDL file, with the namespaces omitted from both the file and the code, it works fine.
# for reference, these are the final lines from the WSDL
#
# <wsdl:service name="Shopping">
#   <wsdl:documentation>
#     <Version>1027</Version>
#   </wsdl:documentation>
#   <wsdl:port binding="ns:ShoppingBinding" name="Shopping">
#     <wsdlsoap:address location="http://open.api.ebay.com/shopping"/>
#   </wsdl:port>
# </wsdl:service>
# </wsdl:definitions>
from lxml import etree
wsdl = etree.parse('http://developer.ebay.com/webservices/latest/ShoppingService.wsdl')
print(wsdl.findtext('wsdl:.//Version')) # wish this would print 1027
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 "/Users/matecsaj/Google Drive/Projects/collectibles/eBay/figure-it3.py"
Traceback (most recent call last):
  File "src/lxml/_elementpath.py", line 79, in lxml._elementpath.xpath_tokenizer (src/lxml/_elementpath.c:2414)
KeyError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/matecsaj/Google Drive/Projects/collectibles/eBay/figure-it3.py", line 14, in <module>
    print(wsdl.findtext('wsdl:.//Version')) # wish this would print 1027
  File "src/lxml/etree.pyx", line 2230, in lxml.etree._ElementTree.findtext (src/lxml/etree.c:69049)
  File "src/lxml/etree.pyx", line 1552, in lxml.etree._Element.findtext (src/lxml/etree.c:60629)
  File "src/lxml/_elementpath.py", line 329, in lxml._elementpath.findtext (src/lxml/_elementpath.c:10089)
  File "src/lxml/_elementpath.py", line 311, in lxml._elementpath.find (src/lxml/_elementpath.c:9610)
  File "src/lxml/_elementpath.py", line 300, in lxml._elementpath.iterfind (src/lxml/_elementpath.c:9282)
  File "src/lxml/_elementpath.py", line 277, in lxml._elementpath._build_path_iterator (src/lxml/_elementpath.c:8675)
  File "src/lxml/_elementpath.py", line 82, in xpath_tokenizer (src/lxml/_elementpath.c:2542)
SyntaxError: prefix 'wsdl' not found in prefix map
Process finished with exit code 1

The XML declares namespaces, so to access an element you need to qualify it with the namespace URI. See if the code below helps:
wsdlLink = "http://schemas.xmlsoap.org/wsdl/"
wsdl = etree.parse('http://developer.ebay.com/webservices/latest/ShoppingService.wsdl')
print(wsdl.findtext('{'+wsdlLink+'}//Version'))

With credit to the kind folks that commented, here is a modified solution that does print the Version number. All I could get working was the wildcard search. Also, the iterator skipped the Version element, so I had to get at it from its parent element. Good enough.
from lxml import etree
wsdlLink = "http://schemas.xmlsoap.org/wsdl/"
wsdl = etree.parse('http://developer.ebay.com/webservices/latest/ShoppingService.wsdl')
for element in wsdl.iter('{'+wsdlLink+'}*'):
    if 'documentation' in element.tag:
        for child in element:
            print(child.text)
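For comparison, a minimal sketch of the prefix-map approach that the "prefix 'wsdl' not found in prefix map" message hints at. It runs against a small inline sample rather than the real eBay file, and it uses the standard library's ElementTree so that it works without lxml (lxml's findtext accepts the same namespaces argument). One loud assumption: in this sample Version carries no namespace. If the real WSDL declares a default namespace, Version would need that URI as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down stand-in for the tail of the WSDL file
SAMPLE = """<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/">
  <wsdl:service name="Shopping">
    <wsdl:documentation>
      <Version>1027</Version>
    </wsdl:documentation>
  </wsdl:service>
</wsdl:definitions>"""

# Map the prefix used in the path expression to its namespace URI
NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

root = ET.fromstring(SAMPLE)
version = root.findtext(".//wsdl:documentation/Version", namespaces=NS)
print(version)
```

The prefix goes on the element name inside the path (wsdl:documentation), not in front of the whole expression as in 'wsdl:.//Version'.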

Python: Read a very large XML file

I am trying to read an XML file (a French dictionary) from a Python script, but I get the error below. Is there any way I can fix it?
PS: the file is 158,643 KB
from xml.dom import minidom
doc = minidom.parse("Dic.xml")
Data = doc.getElementsByTagName("title")[0]
titleData = Data.firstChild.data
print (titleData)
The error:
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    doc = minidom.parse("Morphalou-2.0.xml")
  File "C:\Python27\lib\xml\dom\minidom.py", line 1918, in parse
    return expatbuilder.parse(file)
  File "C:\Python27\lib\xml\dom\expatbuilder.py", line 924, in parse
    result = builder.parseFile(fp)
  File "C:\Python27\lib\xml\dom\expatbuilder.py", line 204, in parseFile
    buffer = file.read(16*1024)
MemoryError
Thanks in advance.
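minidom builds the whole document in memory, which is the usual reason a file of this size ends in MemoryError. One common alternative is incremental parsing with ElementTree's iterparse. The following is only a sketch under assumptions: first_text is a hypothetical helper, and a tiny in-memory document stands in for the real dictionary file.

```python
import io
import xml.etree.ElementTree as ET


def first_text(source, tag):
    """Return the text of the first <tag> element without loading the whole tree."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            return elem.text
        # Drop the children and text of elements that are already processed
        elem.clear()
    return None


# Tiny in-memory stand-in for the large dictionary file
sample = io.BytesIO(b"<dic><entry><title>Bonjour</title></entry></dic>")
print(first_text(sample, "title"))
```

iterparse also accepts a file name, so first_text("Dic.xml", "title") would be the equivalent call for the file in the question. Cleared elements stay attached to their parent as empty shells, so very long files may still need extra care.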

python email decode_header raise HeaderParseError

I got a HeaderParseError, as shown below. What's wrong with it?
>>> from email import Header
>>> s= "=?UTF-8?B?6KGM6KGM5ZyI5Li65oKo5o6o6I2Q5Lul5LiL6IGM5L2N77yM?==?UTF-8?B?56Wd5oKo5om+5Yiw5aW95bel5L2c77yB44CQ6KGM6KGM5ZyI44CR?="
>>> src = Header.decode_header(s)
This is the error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/email/header.py", line 108, in decode_header
    raise HeaderParseError
email.errors.HeaderParseError

You are trying to parse two encoded headers at once, joined with nothing in between:
"=?UTF-8?B?6KGM6KGM5ZyI5Li65oKo5o6o6I2Q5Lul5LiL6IGM5L2N77yM?="
and
"=?UTF-8?B?56Wd5oKo5om+5Yiw5aW95bel5L2c77yB44CQ6KGM6KGM5ZyI44CR?="
Removing one of them will do the job. If you want to parse all of them, you have to split them first.
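A minimal sketch of the splitting step, written for Python 3 (where the module is email.header, lowercase). The regular expression and the helper decode_encoded_words are assumptions made for this example, not part of the email package, and text outside encoded words is simply ignored.

```python
import re
from email.header import decode_header

# Matches one encoded word of the form =?charset?encoding?payload?=
ENCODED_WORD = re.compile(r"=\?[^?]+\?[bBqQ]\?[^?]*\?=")


def decode_encoded_words(value):
    """Decode each encoded word in value separately and join the results.

    Text outside encoded words is ignored in this sketch.
    """
    pieces = []
    for word in ENCODED_WORD.findall(value):
        for raw, charset in decode_header(word):
            if isinstance(raw, bytes):
                raw = raw.decode(charset or "ascii")
            pieces.append(raw)
    return "".join(pieces)


s = ("=?UTF-8?B?6KGM6KGM5ZyI5Li65oKo5o6o6I2Q5Lul5LiL6IGM5L2N77yM?="
     "=?UTF-8?B?56Wd5oKo5om+5Yiw5aW95bel5L2c77yB44CQ6KGM6KGM5ZyI44CR?=")
print(decode_encoded_words(s))
```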
