XML Not Parsing in Python 2.7 with ElementTree

I have the following XML, which I get from a REST API:
<?xml version="1.0" encoding="utf-8"?>
<boxes>
<home id="1" name="productname"/>
<server>111.111.111.111</server>
<approved>yes</approved>
<creation>2007 handmade</creation>
<description>E-Commerce, buying and selling both attested</description>
<boxtype>
<sizes>large, medium, small</sizes>
<vendor>Some Organization</vendor>
<version>ANY</version>
</boxtype>
<method>Handmade, Handcrafted</method>
<time>2014</time>
</boxes>
I can fetch the output above, store it in a string variable, and print it to the console,
but the lookup fails when I pass it to ElementTree:
import base64
import urllib2
from xml.dom.minidom import Node, Document, parseString
from xml.etree import ElementTree as ET
from xml.etree.ElementTree import XML, fromstring, tostring
print outputxml ##Printing xml correctly, outputxml contains xml above
content = ET.fromstring(outputxml)
boxes = content.find('boxes')
print boxes
boxtype = boxes.find("boxes/boxtype")
Printing boxes gives None, which leads to the error below:
boxtype = boxes.find("boxes/boxtype")
AttributeError: 'NoneType' object has no attribute 'find'

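ET.fromstring returns the root element, which here is boxes itself. find only searches below the element it is called on, so content.find('boxes') looks for a boxes child inside boxes and returns None.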
boxtype = content.find("boxtype")
should be sufficient.
DEMO:
>>> import base64
>>> import urllib2
>>> from xml.dom.minidom import Node, Document, parseString
>>> from xml.etree import ElementTree as ET
>>> from xml.etree.ElementTree import XML, fromstring, tostring
>>>
>>> print outputxml ##Printing xml correctly, outputxml contains xml above
<?xml version="1.0" encoding="utf-8"?>
<boxes>
<home id="1" name="productname"/>
<server>111.111.111.111</server>
<approved>yes</approved>
<creation>2007 handmade</creation>
<description>E-Commerce, buying and selling both attested</description>
<boxtype>
<sizes>large, medium, small</sizes>
<vendor>Some Organization</vendor>
<version>ANY</version>
</boxtype>
<method>Handmade, Handcrafted</method>
<time>2014</time>
</boxes>
>>> content = ET.fromstring(outputxml)
>>> boxes = content.find('boxes')
>>> print boxes
None
>>>
>>> boxes
>>> content #note that the content is the root level node - boxes
<Element 'boxes' at 0x1075a9250>
>>> content.find('boxtype')
<Element 'boxtype' at 0x1075a93d0>
>>>
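
The behaviour in the demo can be condensed into a small sketch. It uses a shortened copy of the XML from the question (the XML declaration and several child elements are left out for brevity), and the variable names are only illustrative:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question (assumption: trimmed for brevity).
outputxml = """<boxes>
<home id="1" name="productname"/>
<boxtype>
<sizes>large, medium, small</sizes>
<vendor>Some Organization</vendor>
</boxtype>
</boxes>"""

content = ET.fromstring(outputxml)

root_tag = content.tag                       # the parsed object is already <boxes>
missing = content.find('boxes')              # no <boxes> child inside <boxes>
boxtype = content.find('boxtype')            # direct child of the root
sizes = content.findtext('boxtype/sizes')    # paths are relative to the root

# Guard against a failed lookup instead of chaining .find() on a possible None.
vendor = boxtype.findtext('vendor') if boxtype is not None else None
```

Checking the result of find against None before calling further methods avoids the AttributeError shown in the question.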

Related

Parsing XML Attributes with Python

I am trying to parse out all the green highlighted attributes (some sensitive things have been blacked out). I have a bunch of XML files, all with similar formats. I already know how to loop through them individually, but I am having trouble parsing out the specific attributes.
XML Document
I need the text in these attributes:
name="text1" from
<project logLevel="verbose" version="2.0" mainModule="Main" name="text1">
destinationDir="/text2" from
<put label="Put Files" destinationDir="/Trigger/FPDMMT_INBOUND">
destDir="/text3" from
<copy disabled="false" version="1.0" label="Archive Files" destDir="/text3" suffix="">
I am using
import csv
import os
import re
import xml.etree.ElementTree as ET
tree = ET.parse(XMLfile_path)
item = tree.getroot()[0]
root = tree.getroot()
print (item.get("name"))
print (root.get("name"))
This outputs:
Main
text1
item.get reads from the element at index [0] under the root, which is <module.
root.get reads from the root element itself, <project.
I know there's a way to search for exactly the right part of the root/tree with something like:
test = root.find('./project/module/ftp/put')
print (test.get("destinationDir"))
I need to be able to jump directly to the thing I need and output the attributes I need.
Any help would be appreciated
Thanks.

Simplified copy of your XML:
xml = '''<project logLevel="verbose" version="2.0" mainModule="Main" name="hidden">
<module name="Main">
<createWorkspace version="1.0"/>
<ftp version="1.0" label="FTP connection to PRD">
<put label="Put Files" destinationDir="destination1">
</put>
</ftp>
<ftp version="1.0" label="FTP connection to PRD">
<put label="Put Files" destinationDir="destination2">
</put>
</ftp>
<copy disabled="false" destDir="destination3">
</copy>
</module>
</project>
'''
# solution using ETree
from xml.etree import ElementTree as ET
root = ET.fromstring(xml)
name = root.get('name')
ftp_destination_dir1 = root.findall('./module/ftp/put')[0].get('destinationDir')
ftp_destination_dir2 = root.findall('./module/ftp/put')[1].get('destinationDir')
copy_destination_dir = root.find('./module/copy').get('destDir')
print(name)
print(ftp_destination_dir1)
print(ftp_destination_dir2)
print(copy_destination_dir)
# solution using lxml
from lxml import etree as et
root = et.fromstring(xml)
name = root.get('name')
ftp_destination_dirs = root.xpath('./module/ftp/put/@destinationDir')
copy_destination_dir = root.xpath('./module/copy/@destDir')[0]
print(name)
print(ftp_destination_dirs[0])
print(ftp_destination_dirs[1])
print(copy_destination_dir)
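
When the number of put or copy elements varies between files, indexing with [0] and [1] becomes brittle. A minimal sketch of one alternative, using only the standard library: the helper name collect_attribute is hypothetical, and the XML is the simplified copy from above.

```python
import xml.etree.ElementTree as ET

xml = '''<project logLevel="verbose" version="2.0" mainModule="Main" name="hidden">
<module name="Main">
<ftp version="1.0" label="FTP connection to PRD">
<put label="Put Files" destinationDir="destination1">
</put>
</ftp>
<ftp version="1.0" label="FTP connection to PRD">
<put label="Put Files" destinationDir="destination2">
</put>
</ftp>
<copy disabled="false" destDir="destination3">
</copy>
</module>
</project>
'''


def collect_attribute(root, tag, attribute):
    """Return `attribute` from every `tag` element in the tree that defines it."""
    return [el.get(attribute)
            for el in root.iter(tag)
            if el.get(attribute) is not None]


root = ET.fromstring(xml)
project_name = root.get('name')
put_dirs = collect_attribute(root, 'put', 'destinationDir')
copy_dirs = collect_attribute(root, 'copy', 'destDir')
```

root.iter(tag) walks the whole tree in document order, so the helper does not depend on the exact nesting of module and ftp.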

How to process xml response from flickr

import flickrapi
from xml.etree import ElementTree as ET
from lxml import etree
flickr = flickrapi.FlickrAPI(api_key,secret=api_secret)
r = flickr.photos_search(tags='e-waste', has_geo="1", per_page='100')
tree = ET.ElementTree(r)
xml_input = etree.parse("response_clean.xml")
transform = etree.XSLT(xslt_root)
links = str(transform(xml_input))
The idea of this little script is to get xml response from Flickr, and then use xsl file to process it further.
I want to convert r object (which is of type lxml.etree._Element)
to xml_input (of type lxml.etree._ElementTree).
I used tree = ET.ElementTree(r) but result is of type xml.etree.ElementTree.ElementTree.
I see that this is not exactly the same, but I don't understand the difference.
How should r be converted to xml_input ?

The code creates an xml.etree.ElementTree.ElementTree because ET in the corresponding import statement refers to xml.etree.ElementTree. Use etree.ElementTree instead, which is imported from lxml:
>>> from xml.etree import ElementTree as ET
>>> from lxml import etree
>>> raw ='''<root></root>'''
>>> r = etree.fromstring(raw)
>>> root = etree.ElementTree(r)
>>> type(r)
<type 'lxml.etree._Element'>
>>> type(root)
<type 'lxml.etree._ElementTree'>

How to extract the string values "Hello" and "World" from the XML using Python 2.6

I need to extract the strings "Hello" and "World" using Python 2.6. Please advise.
<Translate_Array_Request>
<App_Id />
<From>language-code</From>
<Options>
<Category xmlns="http://schemas.datacontract.org/2004/07/Microsoft.MT.Web.Service.V2" >string-value</Category>
<ContentType xmlns="http://schemas.datacontract.org/2004/07/Microsoft.MT.Web.Service.V2">text/plain</ContentType>
<ReservedFlags xmlns="http://schemas.datacontract.org/2004/07/Microsoft.MT.Web.Service.V2" />
<State xmlns="http://schemas.datacontract.org/2004/07/Microsoft.MT.Web.Service.V2" >int-value</State>
<Uri xmlns="http://schemas.datacontract.org/2004/07/Microsoft.MT.Web.Service.V2" >string-value</Uri>
<User xmlns="http://schemas.datacontract.org/2004/07/Microsoft.MT.Web.Service.V2" >string-value</User>
</Options>
<Texts>
<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays">**Hello**</string>
<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays">**World**</string>
</Texts>
<To>language-code</To>
</Translate_Array_Request>

There are multiple libraries in Python that let you parse and extract data from XML. One way is to use the ElementTree XML API. Assuming the input is saved as a string xml_data, you can do this:
>>> import xml.etree.ElementTree as ET
>>> root = ET.fromstring(xml_data)
>>> texts = root.find('Texts')
>>> for data in texts:
...     print data.text
...
**Hello**
**World**

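With the xml package, do something like this (the string elements are namespaced, so the tag has to be given in its qualified form):
import xml.etree.ElementTree as ET

ARRAYS_NS = '{http://schemas.microsoft.com/2003/10/Serialization/Arrays}'

def getTags(xml):
    root = ET.fromstring(xml)
    res = []
    # Element.iter() needs Python 2.7+; on 2.6 use root.getiterator() instead.
    for tag in root.iter(ARRAYS_NS + 'string'):
        res.append(tag.text)
    return res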

An alternative solution using minidom:
import xml.dom.minidom as minidom

def getTags(xml):
    root = minidom.parseString(xml)
    return [i.firstChild.nodeValue for i in root.getElementsByTagName('string')]
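
The string elements declare a default namespace, and ElementTree matches on the namespace-qualified name. One way to keep the lookup readable is a prefix map passed to findall. This is a sketch under some assumptions: the prefix arr is arbitrary, the input is a trimmed copy of the request, and the namespaces argument requires Python 2.7 or later.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the request from the question (assumption: only <Texts> kept).
xml_data = '''<Translate_Array_Request>
<Texts>
<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays">Hello</string>
<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays">World</string>
</Texts>
</Translate_Array_Request>'''

# The prefix "arr" is our own choice; only the URI has to match the document.
NS = {'arr': 'http://schemas.microsoft.com/2003/10/Serialization/Arrays'}

root = ET.fromstring(xml_data)

# <Texts> has no namespace, so it is addressed by its plain name.
words = [el.text for el in root.findall('Texts/arr:string', NS)]

# A lookup by the bare tag name finds nothing, because the qualified name differs.
unqualified = root.findall('Texts/string')
```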

python ElementTree the text of element who has a child

When I try to read the text of an element that has a child, I get None.
See the XML (say test.xml):
<?xml version="1.0"?>
<data>
<test><ref>MemoryRegion</ref> abcd</test>
</data>
and the Python code that should read 'abcd':
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
print root.find("test").text
When I run this, it prints None rather than abcd.
How can I read abcd under this condition?

Use the Element.tail attribute:
>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('test.xml')
>>> root = tree.getroot()
>>> print root.find(".//ref").tail
abcd

ElementTree has a rather different view of XML, one better suited to nested data. .text is the data right after a start tag; .tail is the data right after an end tag. So you want:
print root.find('test/ref').tail
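
To make the split between .text and .tail concrete, here is a small sketch that parses the same document from a string instead of test.xml (an assumption made so that it runs on its own). It also shows itertext(), available from Python 2.7, for the case where all the text inside an element is wanted:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<data><test><ref>MemoryRegion</ref> abcd</test></data>')
test = root.find('test')
ref = test.find('ref')

test_text = test.text    # nothing sits between <test> and <ref>
ref_text = ref.text      # text right after the <ref> start tag
ref_tail = ref.tail      # text right after the </ref> end tag

# itertext() yields every text fragment inside <test> in document order.
all_text = ''.join(test.itertext())
```

Note that the tail keeps the leading space from the source, so strip() may be needed before comparing values.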

ElementTree returns no nodes parsing simple KML document

I have a very simple KML file which returns no nodes when parsed with ElementTree. This is frustrating me :-). Any clues?
from xml.etree import ElementTree
from pprint import pprint
kml = '''<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.0">
<Document>
<name>NEXRAD Radar Sites</name>
<Schema parent="Placemark" name="wsr">
<SimpleField type="wstring" name="STATE">
</SimpleField>
</Schema>
<wsr>
<name>KABR</name>
</wsr>
</Document>
</kml>
'''
tree = ElementTree.fromstring(kml)
ElementTree.dump(tree)
for node in tree.iter('wsr'):
    pprint(node)
for node in tree.findall('../wsr'):
    pprint(node)

The tags are namespaced. If you try tree.iter() with no tag it will show what ElementTree thinks the tags are called. The wsr tag is called {http://earth.google.com/kml/2.0}wsr. This returns a node:
list(tree.iter('{http://earth.google.com/kml/2.0}wsr'))
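
Hard-coding the namespace URI works, but it can also be read from the root tag. A minimal sketch, assuming a trimmed copy of the KML from the question; the helper namespace_of is hypothetical and is not part of ElementTree:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the KML document (assumption: Schema block omitted).
kml = '''<kml xmlns="http://earth.google.com/kml/2.0">
<Document>
<name>NEXRAD Radar Sites</name>
<wsr>
<name>KABR</name>
</wsr>
</Document>
</kml>'''


def namespace_of(element):
    """Return the namespace URI of an element's tag, or '' if it has none."""
    tag = element.tag
    return tag[1:].split('}', 1)[0] if tag.startswith('{') else ''


root = ET.fromstring(kml)

# Listing the tags shows the qualified names ElementTree actually stores.
tags = [el.tag for el in root.iter()]

ns = namespace_of(root)
wsr_nodes = root.findall('.//{%s}wsr' % ns)
site_names = [node.findtext('{%s}name' % ns) for node in wsr_nodes]
```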
