Python getparent() not working

I'd like to use getparent() in some code I'm working on to read XML files. When I try what's below I get this error: AttributeError: getparent
I assume I'm making a basic mistake but after an hour of searching and trial and error, I can't figure out what it is. (Using python 2.7 if that matters)
import xml.etree.cElementTree as ET
import lxml.etree
url = 'file.xml'
tree = ET.ElementTree(file=url)
txt = 'texthere'
for elem in tree.iter(tag='text'):
    print elem.text
    print elem.getparent()

Element objects created with the standard library module ElementTree do not have a getparent() method. Element objects created with lxml do have this method. You import lxml (import lxml.etree) in your code but you don't use it.
Here is a small working demonstration:
from lxml import etree
XML = """
<root>
<a>
<b>foo</b>
</a>
</root>"""
tree = etree.fromstring(XML)
for elem in tree.iter(tag="b"):
    print "text:", elem.text
    print "parent:", elem.getparent()
Output:
text: foo
parent: <Element a at 0x27a6f08>
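If you would rather stay with the standard library, a common workaround is to build a child-to-parent dictionary once and look parents up in it. This is a minimal sketch, assuming the same sample XML; `parent_map` and `get_parent` are illustrative names, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

XML = """
<root>
<a>
<b>foo</b>
</a>
</root>"""

root = ET.fromstring(XML)

# ElementTree elements only know their children, so map every child to its parent once
parent_map = {child: parent for parent in root.iter() for child in parent}

def get_parent(elem):
    """Hypothetical helper: return elem's parent, or None for the root element."""
    return parent_map.get(elem)

for elem in root.iter("b"):
    print("text:", elem.text)
    print("parent:", get_parent(elem).tag)
```

The map reflects the tree at the time it was built, so rebuild it if you move or remove elements afterwards.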

I think it's better to try this. There is a problem with your imports. The same thing can be done using DOM; there's a great example here: http://www.mkyong.com/python/python-read-xml-file-dom-example/
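With DOM, every node carries a parentNode attribute, so no extra lookup is needed. This is a minimal sketch using the standard library's xml.dom.minidom on a copy of the sample document above; it is not the code from the linked tutorial.

```python
from xml.dom import minidom

XML = "<root><a><b>foo</b></a></root>"

doc = minidom.parseString(XML)
for elem in doc.getElementsByTagName("b"):
    # firstChild is the text node holding "foo"
    print("text:", elem.firstChild.data)
    # DOM nodes expose their parent directly through parentNode
    print("parent:", elem.parentNode.tagName)
```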

Related

Python: xml find() always returns None

So I'm trying to search and replace the XML element RunCodeAnalysis inside a .vcxproj file with Python.
I'm pretty new to Python, so be gentle, but I thought it would be the simplest language for this kind of thing.
I read a handful of similar examples and came up with the code below, but no matter what I search for, the ElementTree find() call always returns None.
from xml.etree import ElementTree as et
xml = '''\
<?xml version="1.0" encoding="utf-8"?>
<Project DefaultTargets="Build" ToolsVersion="12.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Protected_Debug|Win32'">
<RunCodeAnalysis>false</RunCodeAnalysis>
</PropertyGroup>
</Project>
'''
et.register_namespace('', "http://schemas.microsoft.com/developer/msbuild/2003")
tree = et.ElementTree(et.fromstring(xml))
print(tree.find('.//RunCodeAnalysis'))
Here's a simplified code example online: https://ideone.com/1T1wsb
Can anyone tell me what I'm doing wrong?
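The missing piece is the default xmlns on <Project>: it applies to every element, so the parser stores each tag as "{uri}localname" and a bare 'RunCodeAnalysis' never matches. A small sketch, assuming a trimmed copy of the project file from the question:

```python
import xml.etree.ElementTree as ET

MSBUILD_NS = "http://schemas.microsoft.com/developer/msbuild/2003"
xml_text = '''<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PropertyGroup>
<RunCodeAnalysis>false</RunCodeAnalysis>
</PropertyGroup>
</Project>'''

root = ET.fromstring(xml_text)
# The default namespace is folded into every tag name ("Clark notation")
print(root.tag)  # {http://schemas.microsoft.com/developer/msbuild/2003}Project

# A bare tag name finds nothing...
print(root.find('.//RunCodeAnalysis'))  # None
# ...but the fully qualified name does
elem = root.find('.//{%s}RunCodeAnalysis' % MSBUILD_NS)
print(elem.text)  # false
```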
OK, so @ThomWiggers helped me with the missing piece, and here's my final code in all its naive glory. There's no parameter checking or any kind of smarts yet, but it takes two parameters: the filename, and whether to set static code analysis to true or false. I've got about 30 projects I want to turn it on for in nightly builds, but I really don't want it on day to day because it's just too slow.
import sys
from xml.etree import ElementTree as et
et.register_namespace('', "http://schemas.microsoft.com/developer/msbuild/2003")
tree = et.parse(sys.argv[1])
value = sys.argv[2]
for item in tree.findall('.//{http://schemas.microsoft.com/developer/msbuild/2003}RunCodeAnalysis'):
    item.text = value
for item in tree.findall('.//{http://schemas.microsoft.com/developer/msbuild/2003}EnablePREfast'):
    item.text = value
tree.write(sys.argv[1])
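Since the script has no parameter checking yet, one way to add it is to wrap the same logic in a function that accepts only "true" or "false". This is a sketch under that assumption; `set_code_analysis` and `TAGS` are hypothetical names, while the tag names and namespace come from the script above.

```python
import xml.etree.ElementTree as ET

MSBUILD_NS = "http://schemas.microsoft.com/developer/msbuild/2003"
TAGS = ("RunCodeAnalysis", "EnablePREfast")

def set_code_analysis(path, value):
    """Set every RunCodeAnalysis/EnablePREfast element in path to value; return the count changed."""
    value = value.lower()
    if value not in ("true", "false"):
        raise ValueError("value must be 'true' or 'false', got %r" % value)
    # Keep the MSBuild namespace as the unprefixed default when writing back
    ET.register_namespace('', MSBUILD_NS)
    tree = ET.parse(path)
    count = 0
    for tag in TAGS:
        # tree.iter() walks the whole document, like findall('.//...')
        for item in tree.iter('{%s}%s' % (MSBUILD_NS, tag)):
            item.text = value
            count += 1
    tree.write(path, encoding="utf-8", xml_declaration=True)
    return count
```

From the command line this would be called as set_code_analysis(sys.argv[1], sys.argv[2]), after checking that len(sys.argv) == 3.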

How to make nested xml structure flat with python

I have XML with a huge nested structure, like this one:
<root>
  <node1>
    <subnode1>
      <name1>text1</name1>
    </subnode1>
  </node1>
  <node2>
    <subnode2>
      <name2>text2</name2>
    </subnode2>
  </node2>
</root>
I want to convert it to:
<root>
  <node1>
    <name1>text1</name1>
  </node1>
  <node2>
    <name2>text2</name2>
  </node2>
</root>
I tried the following:
from xml.etree import ElementTree as et
tr = et.parse(path)
root = tr.getroot()
for node in root.getchildren():
    for element in node.iter():
        if (element.text is not None):
            node.extend(element)
I also tried node.append(element), but that doesn't work either: it adds the element at the end and I get an infinite loop.
Any help would be appreciated.
A few points to mention here:
Firstly, your test element.text is not None is always True when you parse the XML above with xml.etree.ElementTree. Every element that contains other elements still has the newline (and any indentation) before its first child as its text, so even the supposedly text-free nodes have a text of "\n". An alternative is to use lxml.etree.parse with an lxml.etree.XMLParser that ignores blank text, as below.
Secondly, it's not a good idea to append to a tree while iterating over it, for the same reason that this code loops forever:
>>> a = [1,2,3,4]
>>> for k in a:
...     a.append(5)
See @Alex Martelli's answer to the question "Modifying list while iterating" for more on this issue.
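For the list case, the usual fix is to loop over a snapshot copy so the appends no longer extend the sequence being iterated. A minimal illustration:

```python
a = [1, 2, 3, 4]
# list(a) is a copy taken before the loop starts, so exactly 4 iterations run
for k in list(a):
    a.append(5)
print(a)  # [1, 2, 3, 4, 5, 5, 5, 5]
```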
Hence, you should make a buffer XML tree and build it accordingly rather than modifying your tree while traversing it.
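from lxml import etree

# A parser that drops whitespace-only text, so elements such as
# <node1> and <subnode1> get text == None instead of "\n"
p = etree.XMLParser(remove_blank_text=True)
path = 'test.xml'
tr = etree.parse(path, parser=p)
root = tr.getroot()

# Build a separate buffer tree instead of modifying the tree being traversed
buffer = etree.Element(root.tag)
for node in root:
    bnode = etree.Element(node.tag)
    # Snapshot the iterator: appending an lxml element moves it out of its old parent
    for element in list(node.iter()):
        if element.text is not None:
            bnode.append(element)
    buffer.append(bnode)
print(etree.tostring(buffer, encoding='unicode'))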
Sample run and results:
Chip chip# 01:01:53# ~: python stackoverflow.py
<root><node1><name1>text1</name1></node1><node2><name2>text2</name2></node2></root>
NOTE: the tree printed above is hard to read with the naked eye, so you may want to pretty-print it with the lxml package; see the tutorials in "Pretty printing XML in Python".
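If lxml isn't available, the same buffer-tree idea works with the standard library alone by skipping whitespace-only text instead of relying on remove_blank_text. This is a sketch under that assumption; `flatten` is a hypothetical helper, and it copies tag, attributes and text into fresh elements so the parsed tree is left untouched.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <node1>
    <subnode1>
      <name1>text1</name1>
    </subnode1>
  </node1>
  <node2>
    <subnode2>
      <name2>text2</name2>
    </subnode2>
  </node2>
</root>"""

def flatten(root):
    """Keep each child of root, plus only the descendants that carry real (non-whitespace) text."""
    buffer = ET.Element(root.tag)
    for node in root:
        bnode = ET.SubElement(buffer, node.tag)
        for element in node.iter():
            # Skip node itself and whitespace-only text such as "\n    "
            if element is not node and element.text and element.text.strip():
                leaf = ET.SubElement(bnode, element.tag, element.attrib)
                leaf.text = element.text
    return buffer

flat = flatten(ET.fromstring(XML))
print(ET.tostring(flat).decode())
```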

Find an element in an XML tree using ElementTree

I am trying to locate a specific element in an XML file, using ElementTree. Here is the XML:
<documentRoot>
<?version="1.0" encoding="UTF-8" standalone="yes"?>
<n:CallFinished xmlns="http://api.callfire.com/data" xmlns:n="http://api.callfire.com/notification/xsd">
<n:SubscriptionId>96763001</n:SubscriptionId>
<Call id="158864460001">
<FromNumber>5129618605</FromNumber>
<ToNumber>15122537666</ToNumber>
<State>FINISHED</State>
<ContactId>125069153001</ContactId>
<Inbound>true</Inbound>
<Created>2014-01-15T00:15:05Z</Created>
<Modified>2014-01-15T00:15:18Z</Modified>
<FinalResult>LA</FinalResult>
<CallRecord id="94732950001">
<Result>LA</Result>
<FinishTime>2014-01-15T00:15:15Z</FinishTime>
<BilledAmount>1.0</BilledAmount>
<AnswerTime>2014-01-15T00:15:06Z</AnswerTime>
<Duration>9</Duration>
</CallRecord>
</Call>
</n:CallFinished>
</documentRoot>
I am interested in the <Created> item. Here is the code I am using:
import xml.etree.ElementTree as ET
calls_root = ET.fromstring(calls_xml)
for item in calls_root.find('CallFinished/Call/Created'):
    print "Found you!"
    call_start = item.text
I have tried a bunch of different XPath expressions, but I'm stumped - I cannot locate the element. Any tips?
You aren't referencing the namespaces that exist in the XML document, so ElementTree can't find the elements in that XPath. You need to tell ElementTree what namespaces you are using.
The following should work:
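import xml.etree.ElementTree as ET

# Clark-notation ("{uri}") prefixes for the two namespaces in the document
namespaces = {'n': '{http://api.callfire.com/notification/xsd}',
              '_': '{http://api.callfire.com/data}'}

calls_root = ET.fromstring(calls_xml)
# find() returns one element (or None); findall() returns a list you can loop over
for item in calls_root.findall('{n}CallFinished/{_}Call/{_}Created'.format(**namespaces)):
    print "Found you!"
    call_start = item.text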
Alternatively, lxml implements the same ElementTree API and has good support for namespaces, without having to worry about string formatting.
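The standard library can also avoid the string formatting: find() and findall() accept a prefix-to-URI mapping as a second argument. A sketch, assuming a trimmed copy of the document above with the malformed "<?version=...?>" line left out, since ElementTree cannot parse it; the prefixes "n" and "d" are arbitrary, only the URIs must match.

```python
import xml.etree.ElementTree as ET

calls_xml = """<documentRoot>
<n:CallFinished xmlns="http://api.callfire.com/data" xmlns:n="http://api.callfire.com/notification/xsd">
<n:SubscriptionId>96763001</n:SubscriptionId>
<Call id="158864460001">
<Created>2014-01-15T00:15:05Z</Created>
</Call>
</n:CallFinished>
</documentRoot>"""

# Our own prefixes mapped to the document's namespace URIs
ns = {'n': 'http://api.callfire.com/notification/xsd',
      'd': 'http://api.callfire.com/data'}

calls_root = ET.fromstring(calls_xml)
created = calls_root.find('n:CallFinished/d:Call/d:Created', ns)
if created is not None:
    print("Found you!")
    call_start = created.text
    print(call_start)  # 2014-01-15T00:15:05Z
```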

lxml findall() problem

I'm trying to write a simple program that gets Wikipedia's recentchanges and parses the XML.
I'm stuck at the point where findall() doesn't work. What am I doing wrong?
import urllib2
from lxml import etree as ET
result = urllib2.urlopen('http://en.wikipedia.org/w/api.php?action=query&format=xml&list=recentchanges&rcprop=title|ids|sizes|flags|user|timestamp').read()
xml=ET.fromstring (result)
print xml[0][0][0].attrib # that works!
print xml.findall ('api/query/recentchanges/rc') # that don't!
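I suspect the root node is the top-level api element itself, so findall() is looking for an element named "api" inside the root. If so, the relative path
query/recentchanges/rc
will work. An absolute path such as
/api/query/recentchanges/rc
is rejected by findall() on an element (ElementPath raises "SyntaxError: cannot use absolute path on element"), but lxml's full XPath support accepts it: xml.xpath('/api/query/recentchanges/rc').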
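A quick way to see this is to check the root's tag and compare the paths. The question uses lxml, whose findall() follows the same ElementPath rules; this sketch uses the standard library so it runs without lxml, and the XML is a hypothetical, trimmed response shaped like the API output.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like <api><query><recentchanges><rc .../>...
result = """<api><query><recentchanges>
<rc type="edit" title="Page A" />
<rc type="new" title="Page B" />
</recentchanges></query></api>"""

root = ET.fromstring(result)
print(root.tag)  # api: fromstring() returns the root element itself

# Paths are relative to the element findall() is called on, so leave out "api"
print(len(root.findall('api/query/recentchanges/rc')))  # 0
rcs = root.findall('query/recentchanges/rc')
print([rc.get('title') for rc in rcs])  # ['Page A', 'Page B']

# An absolute path is rejected when called on an element
try:
    root.findall('/api/query/recentchanges/rc')
except SyntaxError as err:
    print("SyntaxError:", err)
```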

What's an easy and fast way to put returned XML data into a dict?

I'm trying to take the data returned from
http://ipinfodb.com/ip_query.php?ip=74.125.45.100&timezone=true
and put it into a dict in a fast and easy way. What's the best way to do this?
Thanks.
Using xml.etree from the standard Python library:
import xml.etree.ElementTree as xee
contents='''\
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Ip>74.125.45.100</Ip>
<Status>OK</Status>
<CountryCode>US</CountryCode>
<CountryName>United States</CountryName>
<RegionCode>06</RegionCode>
<RegionName>California</RegionName>
<City>Mountain View</City>
<ZipPostalCode>94043</ZipPostalCode>
<Latitude>37.4192</Latitude>
<Longitude>-122.057</Longitude>
<TimezoneName>America/Los_Angeles</TimezoneName>
<Gmtoffset>-25200</Gmtoffset>
<Isdst>1</Isdst>
</Response>'''
doc=xee.fromstring(contents)
print dict(((elt.tag,elt.text) for elt in doc))
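The same idea can be written as a dict comprehension (Python 2.7 and 3). This sketch, on a trimmed copy of the response above, also makes two caveats explicit: every value comes back as a string (None for an empty element), and a repeated tag keeps only its last value.

```python
import xml.etree.ElementTree as ET

contents = """<Response>
<Ip>74.125.45.100</Ip>
<City>Mountain View</City>
<Latitude>37.4192</Latitude>
<Isdst>1</Isdst>
</Response>"""

doc = ET.fromstring(contents)
# One entry per direct child of <Response>
info = {elt.tag: elt.text for elt in doc}
print(info['City'])  # Mountain View

# Values stay strings, so convert the ones you need explicitly
latitude = float(info['Latitude'])
```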
Or using lxml:
import lxml.etree
import urllib2
url='http://ipinfodb.com/ip_query.php?ip=74.125.45.100&timezone=true'
doc = lxml.etree.parse( urllib2.urlopen(url) ).getroot()
print dict(((elt.tag,elt.text) for elt in doc))
I would use the xml.dom builtin, something like this:
import urllib
from xml.dom import minidom
data = urllib.urlopen('http://ipinfodb.com/ip_query.php?ip=74.125.45.100&timezone=true')
xml_data = minidom.parse(data)
my_dict ={}
for node in xml_data.getElementsByTagName('Response')[0].childNodes:
    if node.nodeType != minidom.Node.TEXT_NODE:
        my_dict[node.nodeName] = node.childNodes[0].data
xml.etree has been in the standard library since Python 2.5. Also look at lxml, which has the same interface. I haven't dived into it much, but I think this also applies to Python >= 2.5.
Edit:
This is a fast and really easy way to parse XML. It doesn't really put the data into a dict, but the API is pretty intuitive.
