_ElementInterface instance has no attribute 'tostring' - python

The code below generates this error. I can't figure out why. If ElementTree has parse, why doesn't it have tostring? http://docs.python.org/library/xml.etree.elementtree.html#xml.etree.ElementTree.ElementTree
from xml.etree.ElementTree import ElementTree
...
tree = ElementTree()
node = ElementTree()
node = tree.parse(open("my_xml.xml"))
text = node.tostring()

tostring is a function in the xml.etree.ElementTree module, not a method of the confusingly similarly named xml.etree.ElementTree.ElementTree class.
from xml.etree.ElementTree import ElementTree
from xml.etree.ElementTree import tostring
tree = ElementTree()
node = tree.parse(open("my_xml.xml"))
text = tostring(node)

tostring() is actually a function of the ElementTree module, not a method of the ElementTree wrapper class.
>>> import xml.etree.ElementTree as ET
>>> x = ET.fromstring('<xml><one>one</one></xml>')
>>> x
<Element xml at 7f749572f710>
>>> ET.tostring(x)
'<xml><one>one</one></xml>'

The docs you've linked to do not show an ElementTree.tostring() method.
Also, your call to tree.parse() rebinds node, so the earlier node = ElementTree() assignment has no effect.
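
To make the distinction concrete, here is a minimal Python 3 sketch. The in-memory document and the variable names are illustrative assumptions, not part of the question. It shows that ElementTree.parse() returns the root Element, that serialization goes through the module-level tostring() function, and that the ElementTree wrapper class has no tostring attribute.

```python
import io
import xml.etree.ElementTree as ET

# Illustrative in-memory document standing in for my_xml.xml.
source = io.BytesIO(b"<xml><one>one</one></xml>")

tree = ET.ElementTree()
root = tree.parse(source)  # parse() returns the root Element

# Serialize with the module-level function; encoding="unicode" yields a str.
text = ET.tostring(root, encoding="unicode")
print(text)

# The wrapper class offers write(), but no tostring() method.
print(hasattr(tree, "tostring"))
```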

Related

Restore CDATA during lxml serialization

I know that I can preserve CDATA sections during XML parsing, using the following:
from lxml import etree
parser = etree.XMLParser(strip_cdata=False)
root = etree.XML('<root><![CDATA[test]]></root>', parser)
See APIs specific to lxml.etree
But, is there a simple way to "restore" CDATA section during serialization?
For example, by specifying a list of tag names…
For instance, I want to turn:
<CONFIG>
<BODY>This is a &lt;message&gt;.</BODY>
</CONFIG>
to:
<CONFIG>
<BODY><![CDATA[This is a <message>.]]></BODY>
</CONFIG>
Just by declaring that BODY should contain CDATA…
Something like this?
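from lxml import etree
parser = etree.XMLParser(strip_cdata=True)
root = etree.XML('<root><x><![CDATA[<test>]]></x></root>', parser)
# the CDATA section was stripped while parsing, so the text is serialized escaped
print etree.tostring(root)
# wrap the text of each <x> element in a CDATA section again
for elem in root.findall('x'):
    elem.text = etree.CDATA(elem.text)
print etree.tostring(root)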
Produces:
<root><x>&lt;test&gt;</x></root>
<root><x><![CDATA[<test>]]></x></root>

How to process xml response from flickr

import flickrapi
from xml.etree import ElementTree as ET
from lxml import etree
flickr = flickrapi.FlickrAPI(api_key,secret=api_secret)
r = flickr.photos_search(tags='e-waste', has_geo="1", per_page='100')
tree = ET.ElementTree(r)
xml_input = etree.parse("response_clean.xml")
transform = etree.XSLT(xslt_root)
links = str(transform(xml_input))
The idea of this little script is to get xml response from Flickr, and then use xsl file to process it further.
I want to convert r object (which is of type lxml.etree._Element)
to xml_input (of type lxml.etree._ElementTree).
I used tree = ET.ElementTree(r) but result is of type xml.etree.ElementTree.ElementTree.
I see that this is not exactly the same, but I don't understand the difference.
How should r be converted to xml_input?
The code creates an xml.etree.ElementTree.ElementTree because ET in the corresponding import statement refers to xml.etree.ElementTree. You should have used etree.ElementTree instead, which is imported from lxml:
>>> from xml.etree import ElementTree as ET
>>> from lxml import etree
>>> raw ='''<root></root>'''
>>> r = etree.fromstring(raw)
>>> root = etree.ElementTree(r)
>>> type(r)
<type 'lxml.etree._Element'>
>>> type(root)
<type 'lxml.etree._ElementTree'>

Parsing XML using Python minidom

<PacketHeader>
<HeaderField>
<name>number</name>
<dataType>int</dataType>
</HeaderField>
</PacketHeader>
This is my small XML file, and I want to extract the text inside the name tag.
Here is my code snippet:
from xml.dom import minidom
from xml.dom.minidom import parse
xmldoc = minidom.parse('sample.xml')
packetHeader = xmldoc.getElementsByTagName("PacketHeader")
headerField = packetHeader.getElementsByTagName("HeaderField")
for field in headerField:
    getFieldName = field.getElementsByTagName("name")
    print getFieldName
But this prints the location of the element objects, not the text.
from xml.dom import minidom
from xml.dom.minidom import parse
xmldoc = minidom.parse('sample.xml')
# find the name element, if found return a list, get the first element
name_element = xmldoc.getElementsByTagName("name")[0]
# this will be a text node that contains the actual text
text_node = name_element.childNodes[0]
# get text
print text_node.data
Please check this.
Update
By the way, I suggest using ElementTree. The snippet below uses ElementTree to do the same thing as the minidom code above:
import xml.etree.ElementTree as ET
tree = ET.parse("sample.xml")
# the tree root is the toplevel `PacketHeader` element
print tree.findtext("HeaderField/name")
A small variant of the accepted and correct answer above is:
from xml.dom import minidom
xmldoc = minidom.parse('fichier.xml')
name_element = xmldoc.getElementsByTagName('name')[0]
print name_element.childNodes[0].nodeValue
This simply uses nodeValue instead of its alias data.
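
Both answers read only the first name element. As a hedged Python 3 sketch, the loop the question was attempting could look like the following; the helper name field_names and the inline SAMPLE string are assumptions for illustration. It walks every HeaderField and collects the text of its name children.

```python
from xml.dom import minidom

# Inline copy of the sample document from the question.
SAMPLE = """<PacketHeader>
  <HeaderField>
    <name>number</name>
    <dataType>int</dataType>
  </HeaderField>
</PacketHeader>"""


def field_names(xml_text):
    """Return the text of every <name> element found under a <HeaderField>."""
    doc = minidom.parseString(xml_text)
    names = []
    # getElementsByTagName is called on the document or on a single element,
    # never on the NodeList that it returns.
    for field in doc.getElementsByTagName("HeaderField"):
        for name_el in field.getElementsByTagName("name"):
            # Join all text-node children to get the element's text.
            text = "".join(
                child.data
                for child in name_el.childNodes
                if child.nodeType == child.TEXT_NODE
            )
            names.append(text)
    return names


print(field_names(SAMPLE))
```

Note that the question's code called getElementsByTagName on packetHeader, which is a NodeList rather than an element; iterating over the list, as above, avoids that.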

How to print Element as correct xml with xml tag?

So I have this function in my view:
from django.http import HttpResponse
from xml.etree.ElementTree import Element, SubElement, Comment, tostring
def helloworld(request):
    root_element = Element("root_element")
    comment = Comment("Hello World!!!")
    root_element.append(comment)
    foo_element = Element("foo")
    foo_element.text = "bar"
    bar_element = Element("bar")
    bar_element.text = "foo"
    root_element.append(foo_element)
    root_element.append(bar_element)
    return HttpResponse(tostring(root_element), "application/xml")
It outputs something like this:
<root_element><!--Hello World!!!--><foo>bar</foo><bar>foo</bar></root_element>
As you can see, the XML declaration is missing at the beginning. How do I output proper XML that begins with an XML declaration?
If you can add a dependency to your project, I suggest using lxml, which is more complete and better optimized than the basic xml module that comes with Python.
To do this, you just have to change your import statement to:
from lxml.etree import Element, SubElement, Comment, tostring
Then you'll have a tostring() with an xml_declaration option:
>>> tostring(root, xml_declaration=False)
'<root_element><!--Hello World!!!--><foo>bar</foo><bar>foo</bar></root_element>'
>>> tostring(root, xml_declaration=True)
"<?xml version='1.0' encoding='ASCII'?>\n<root_element><!--Hello World!!!--><foo>bar</foo><bar>foo</bar></root_element>"
In the standard library, only the write() method of ElementTree has an xml_declaration option (tostring() gained one in Python 3.8). Another solution is to create a wrapper that uses ElementTree.write() to write into a StringIO and then returns the content of the StringIO.
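
The wrapper approach can be sketched as follows for Python 3. The helper name tostring_with_declaration is hypothetical, and the sketch uses io.BytesIO rather than StringIO because write() produces bytes when given a byte encoding such as UTF-8.

```python
import io
from xml.etree.ElementTree import Element, ElementTree


def tostring_with_declaration(element, encoding="utf-8"):
    """Hypothetical helper: serialize an element to bytes, declaration included."""
    buffer = io.BytesIO()
    # ElementTree.write() accepts xml_declaration, unlike older tostring() versions.
    ElementTree(element).write(buffer, encoding=encoding, xml_declaration=True)
    return buffer.getvalue()


root_element = Element("root_element")
foo_element = Element("foo")
foo_element.text = "bar"
root_element.append(foo_element)

xml_bytes = tostring_with_declaration(root_element)
print(xml_bytes.decode("utf-8"))
```

In the Django view from the question, the returned bytes could be passed to HttpResponse in place of tostring(root_element).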

Python ElementTree support for parsing unknown XML entities?

I have a set of super simple XML files to parse... but... they use custom defined entities. I don't need to map these to characters, but I do wish to parse and act on each one. For example:
<Style name="admin-5678">
<Rule>
<Filter>[admin_level]='5'</Filter>
&maxscale_zoom11;
</Rule>
</Style>
There is a tantalizing hint at http://effbot.org/elementtree/elementtree-xmlparser.htm that XMLParser has limited entity support, but I can't find the methods mentioned, everything gives errors:
#!/usr/bin/python
##
## Where's the entity support as documented at:
## http://effbot.org/elementtree/elementtree-xmlparser.htm
## In Python 2.7.1+ ?
##
from pprint import pprint
from xml.etree import ElementTree
from cStringIO import StringIO
parser = ElementTree.ElementTree()
#parser.entity["maxscale_zoom11"] = unichr(160)
testf = StringIO('<foo>&maxscale_zoom11;</foo>')
tree = parser.parse(testf)
#tree = parser.parse(testf,"XMLParser")
for node in tree.iter('foo'):
    print node.text
Depending on how you adjust the comments, this gives:
xml.etree.ElementTree.ParseError: undefined entity: line 1, column 5
or
AttributeError: 'ElementTree' object has no attribute 'entity'
or
AttributeError: 'str' object has no attribute 'feed'
For those curious, the XML is from OpenStreetMap's Mapnik project.
As @cnelson already pointed out in a comment, the chosen solution here won't work in Python 3.
I finally got it working. Quoted from this Q&A:
Inspired by this post, we can just prepend an XML DOCTYPE definition to the incoming raw HTML content, and then ElementTree works out of the box.
This works on Python 2.6, 2.7, 3.3, and 3.4.
import xml.etree.ElementTree as ET
html = '''<html>
<div>Some reasonably well-formed HTML content.</div>
<form action="login">
<input name="foo" value="bar"/>
<input name="username"/><input name="password"/>
<div>It is not unusual to see &nbsp; in an HTML page.</div>
</form></html>'''
magic = '''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [
<!ENTITY nbsp ' '>
]>''' # You can define more entities here, if needed
et = ET.fromstring(magic + html)
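
Applied to the custom entities from the question, the same prepend-a-DOCTYPE idea might look like the following Python 3 sketch. The helper name parse_with_entities and the replacement text "ZOOM11" are assumptions for illustration. The sketch relies only on the parser expanding entities declared in an internal DTD subset.

```python
import xml.etree.ElementTree as ET


def parse_with_entities(xml_text, root_tag, entities):
    """Hypothetical helper: declare each entity in an internal DTD subset, then parse.

    Assumes xml_text has no XML declaration or DOCTYPE of its own, and that the
    entity values contain no quotes, '&' or '%' characters.
    """
    declarations = "".join(
        '<!ENTITY %s "%s">' % (name, value) for name, value in entities.items()
    )
    doctype = "<!DOCTYPE %s [%s]>" % (root_tag, declarations)
    return ET.fromstring(doctype + xml_text)


root = parse_with_entities(
    "<foo>&maxscale_zoom11;</foo>",
    "foo",
    {"maxscale_zoom11": "ZOOM11"},
)
print(root.text)
```

Mapping each entity to a recognizable marker string lets later code detect and act on it, at the cost of having to know the entity names in advance.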
I'm not sure whether this is a bug in ElementTree, but you need to call UseForeignDTD(True) on the expat parser to make it behave the way it did in the past.
It's a bit hacky, but you can do this by creating your own instance of ElementTree.XMLParser, calling the method on its underlying xml.parsers.expat parser, and then passing it to ElementTree.parse():
from xml.etree import ElementTree
from cStringIO import StringIO
testf = StringIO('<foo>&moo_1;</foo>')
parser = ElementTree.XMLParser()
parser.parser.UseForeignDTD(True)
parser.entity['moo_1'] = 'MOOOOO'
etree = ElementTree.ElementTree()
tree = etree.parse(testf, parser=parser)
for node in tree.iter('foo'):
    print node.text
This outputs "MOOOOO"
Or using a mapping interface:
from xml.etree import ElementTree
from cStringIO import StringIO
class AllEntities:
    def __getitem__(self, key):
        # key is your entity; you can do whatever you want with it here
        return key
testf = StringIO('<foo>&moo_1;</foo>')
parser = ElementTree.XMLParser()
parser.parser.UseForeignDTD(True)
parser.entity = AllEntities()
etree = ElementTree.ElementTree()
tree = etree.parse(testf, parser=parser)
for node in tree.iter('foo'):
    print node.text
This outputs "moo_1"
A more complex fix would be to subclass ElementTree.XMLParser and fix it there.
