Wildcard search at any nested depth using xml.etree.ElementTree


I have a group of XML files which contain entries like
<group name="XXX common string">
<value val="12" description="a dozen">
<text>one less than a baker's dozen</text>
</value>
<value val="13" description="a baker's dozen">
<text>One more than a dozen</text>
</value>
</group>
<group name="YYY common string">
<value val="42" description="the answer">
<text>What do you get if you multiple 6 by 9?</text>
</value>
</group>
Is there a simple way, using import xml.etree.ElementTree as ET and
parser = ET.XMLParser()
parser.parser.UseForeignDTD(True)
if args.info or args.diagnostics:
    print('Parsing input file : ' + inputFileName)
tree = ET.parse(inputFileName, parser=parser)
root = tree.getroot()
to find only the <group> elements whose name contains "common string" and that hold a particular value val?
Important: these groups are nested at different depths in different files.

This was a little difficult, because your own code won't work with the
example data you posted in your question (e.g., nothing there contains
the string error, there are no id attributes, and your code
doesn't appear to search for "a particular value val", which seemed
to be one of your requirements). But here are a few ideas...
For finding all group elements that contain common string in the name attribute, you could do something like this (note that xpath() is provided by lxml.etree, not by xml.etree.ElementTree):
>>> matching_groups = []
>>> for group in tree.xpath('//group[contains(@name, "common string")]'):
...     matching_groups.append(group)
...
Which, given your sample data, would result in:
>>> print('\n'.join(etree.tostring(x, encoding='unicode') for x in matching_groups))
<group name="XXX common string">
<value val="12" description="a dozen">
<text>one less than a baker's dozen</text>
</value>
<value val="13" description="a baker's dozen">
<text>One more than a dozen</text>
</value>
</group>
<group name="YYY common string">
<value val="42" description="the answer">
<text>What do you get if you multiple 6 by 9?</text>
</value>
</group>
If you wanted to limit the results to only group elements that
contain a value element with attribute val == 42, you could try:
>>> matching_groups = []
>>> for group in tree.xpath('//group[contains(@name, "common string")][value/@val = "42"]'):
...     matching_groups.append(group)
...
Which would yield:
>>> print('\n'.join(etree.tostring(x, encoding='unicode') for x in matching_groups))
<group name="YYY common string">
<value val="42" description="the answer">
<text>What do you get if you multiple 6 by 9?</text>
</value>
</group>

The problems were 1) wildcard searching of the group name, and 2) the fact that the groups were nested at different levels in different files.
I implemented this brute-force approach to build a dictionary of all error entries in any group whose name contains "error", anywhere in the file.
I leave it here for posterity and invite more elegant solutions.
import xml.etree.ElementTree as ET

parser = ET.XMLParser()
parser.parser.UseForeignDTD(True)
tree = ET.parse(inputFileName, parser=parser)
root = tree.getroot()

args.errorDefinitions = {}
# iter('group') visits every <group> element at any depth.
for element in tree.iter('group'):
    # Case-insensitive wildcard match on the name attribute.
    if 'error' in element.get('name', '').lower() and len(element):
        # Use the public sequence API instead of the private _children list.
        first_child = element[0]
        for errorMessage in first_child:
            args.errorDefinitions[errorMessage.get('name')] = {
                'id': errorMessage.get('id'),
                'description': first_child.text,
            }
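
For the question as originally asked (name contains "common string", for a particular val), here is a minimal standard-library sketch. The sample document, the find_groups helper and its parameter names are assumptions made for illustration; only iter(), get() and find() come from xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the matching groups sit at different depths.
SAMPLE = """
<root>
  <group name="XXX common string">
    <value val="12" description="a dozen"/>
  </group>
  <section>
    <group name="YYY common string">
      <value val="42" description="the answer"/>
    </group>
    <group name="unrelated">
      <value val="42" description="not wanted"/>
    </group>
  </section>
</root>
"""


def find_groups(root, name_part, val):
    """Return <group> elements at any depth whose name contains name_part
    and that have a direct <value> child with the given val attribute."""
    matches = []
    for group in root.iter('group'):  # iter() walks the whole subtree
        if name_part not in group.get('name', ''):
            continue
        if group.find("value[@val='{}']".format(val)) is not None:
            matches.append(group)
    return matches


root = ET.fromstring(SAMPLE)
names = [g.get('name') for g in find_groups(root, 'common string', '42')]
print(names)
```

ElementTree's limited XPath has no contains() function, so the substring test is done in Python while the attribute test on val stays in the path expression.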

Related

ElementTree - Change subchild value conditionally

I have just started working with XML using ElementTree in my current project. My task is to change a subchild's value based on the value of another subchild in the same child.
I have written code for that, but I feel there might be a way to improve it in terms of readability and performance.
Here is my code:
import xml.etree.ElementTree as ET
tree = ET.ElementTree(ET.fromstring("<Properties><Property><Name>KENT</Name><Value>99</Value></Property><Property><Name>JOHN</Name><Value>fifthy</Value></Property></Properties>"))
root = tree.getroot()
change_found = False
for item in root:
    for subItem in item:
        if change_found and subItem.tag == "Value":
            subItem.text = "50"
            change_found = False
        if subItem.tag == "Name" and subItem.text == "JOHN":
            change_found = True
print(ET.tostring(root, encoding='utf8', method='xml'))
As you can see from the code, when the subchild's tag is "Name" and its text is "JOHN", change_found is set to True. Since the next subchild has the tag Value, its text is then changed (from fifty to 50).
The code works fine, but I believe it can be improved.
You can assume that the structure of a property is always in this order:
<Property>
<Name> Some name </Name>
<Value> Some value </Value>
</Property>
You can also assume that there is only one Property that has a "Name" subchild with the text "JOHN".

If I understand you correctly, you can get there more simply using XPath:
root = ET.fromstring([your string above])
for p in root.findall('.//Property'):
    if p.find('.//Name').text.strip() == "JOHN":
        p.find('.//Value').text = "50"
print(ET.tostring(root).decode())
Output:
<Properties>
<Property>
<Name>KENT
</Name>
<Value>99
</Value>
</Property>
<Property>
<Name>JOHN
</Name>
<Value>50</Value>
</Property>
</Properties>
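
A self-contained variant of the same idea, run against the question's sample string. The set_value_for helper and its parameter names are hypothetical, chosen here only for illustration.

```python
import xml.etree.ElementTree as ET

# The sample string from the question, unchanged.
xml_text = ("<Properties>"
            "<Property><Name>KENT</Name><Value>99</Value></Property>"
            "<Property><Name>JOHN</Name><Value>fifthy</Value></Property>"
            "</Properties>")


def set_value_for(root, name, new_value):
    """Set <Value> in the first <Property> whose <Name> text equals name.

    Returns True if a matching property was found and updated."""
    for prop in root.iter('Property'):
        # findtext() returns None when <Name> is missing, hence the `or ''`.
        if (prop.findtext('Name') or '').strip() == name:
            prop.find('Value').text = new_value
            return True
    return False


root = ET.fromstring(xml_text)
changed = set_value_for(root, 'JOHN', '50')
result = ET.tostring(root, encoding='unicode')
print(result)
```

ElementTree's limited XPath also accepts a child-text predicate such as .//Property[Name='JOHN']/Value, which matches only when the text is exactly JOHN with no surrounding whitespace.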

Fetch xml tag values recursively using ElementTree

I have an XML document of this type:
<SCHOOL>
<GROUP name="GetStudInfo">
<DATA>
<NAME type="char">Sahil Jha</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Rashmi Kaur</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Palak Bisht</NAME>
<STD>11th</STD>
</DATA>
</GROUP>
</SCHOOL>
I need to fetch the values of NAME and STD.
I tried doing this:
e = ET.ElementTree(ET.fromstring(getunitinfo_str))
for elt in e.iter():
    print("{} {}".format(elt.tag, elt.text))
But this prints the other elements as well:
Output:
SCHOOL
GROUP
DATA
NAME Sahil Jha
STD 11th
DATA
NAME Rashmi Kaur
STD 11th
DATA
NAME Palak Bisht
STD 11th
{}
Expected output:
{'Sahil Jha':'11th', 'Rashmi Kaur':'11th', 'Palak Bisht':'11th'}
That is, the result should map each NAME to its STD. Where am I going wrong?

As mentioned by @furas, you can use XPath to find all DATA elements and then find
the NAME and STD elements:
import xml.etree.ElementTree as ET
xml = '''<SCHOOL>
<GROUP name="GetStudInfo">
<DATA>
<NAME type="char">Sahil Jha</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Rashmi Kaur</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Palak Bisht</NAME>
<STD>11th</STD>
</DATA>
</GROUP>
</SCHOOL>'''
e = ET.fromstring(xml)
# DATA is nested under GROUP, so search at any depth with './/DATA'.
for data_tag in e.findall('.//DATA'):
    name = data_tag.find('NAME')
    std = data_tag.find('STD')
    print("{} {}".format(name.text, std.text))
Or you can use a dict comprehension to get the dictionary you want:
my_dict = {
    data_tag.find('NAME').text: data_tag.find('STD').text
    for data_tag in e.findall('.//DATA')
}
print(my_dict)

You need something more than just print(): you need an if/else that checks elt.tag so that you pick up only NAME and STD.
Because NAME and STD are different tags, you will have to remember NAME in a variable and use it when you reach STD:
name = None  # default value at start
for elt in e.iter():
    if elt.tag == 'NAME':
        name = elt  # remember element
    if elt.tag == 'STD':
        print("{}:{}".format(name.text, elt.text))
Or you could use XPath as in @qouify's answer.
Minimal working code
getunitinfo_str = '''
<SCHOOL>
<GROUP name="GetStudInfo">
<DATA>
<NAME type="char">Sahil Jha</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Rashmi Kaur</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Palak Bisht</NAME>
<STD>11th</STD>
</DATA>
</GROUP>
</SCHOOL>
'''
import xml.etree.ElementTree as ET
e = ET.ElementTree(ET.fromstring(getunitinfo_str))
name = None  # to remember the element
for elt in e.iter():
    if elt.tag == 'NAME':
        name = elt
    if elt.tag == 'STD':
        print("{}:{}".format(name.text, elt.text))

A one-liner using a dict comprehension:
import xml.etree.ElementTree as ET
xml = '''<SCHOOL>
<GROUP name="GetStudInfo">
<DATA>
<NAME type="char">Sahil Jha</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Rashmi Kaur</NAME>
<STD>116th</STD>
</DATA>
<DATA>
<NAME type="char">Palak Bisht</NAME>
<STD>17th</STD>
</DATA>
</GROUP>
</SCHOOL>'''
root = ET.fromstring(xml)
data = {x.find("NAME").text: x.find("STD").text for x in root.findall('.//DATA')}
print(data)
output
{'Sahil Jha': '11th', 'Rashmi Kaur': '116th', 'Palak Bisht': '17th'}

Large XML parsing in Python

I am a novice in python and have the following task on hand.
I have a large xml file like the one below:
<Configuration>
<Parameters>
<Component Name='ABC'>
<Group Name='DEF'>
<Parameter Name='GHI'>
<Description>
Some Text
</Description>
<Type>Integer</Type>
<Restriction>
<Level>5</Level>
</Restriction>
<Value>
<Item Value='5'/>
</Value>
</Parameter>
<Parameter Name='JKL'>
<Description>
Some Text
</Description>
<Type>Integer</Type>
<Restriction>
<Level>5</Level>
</Restriction>
<Value>
<Item Value='5'/>
</Value>
</Parameter>
</Group>
<Group Name='MNO'>
<Parameter Name='PQR'>
<Description>
Some Text
</Description>
<Type>Integer</Type>
<Restriction>
<Level>5</Level>
</Restriction>
<Value>
<Item Value='5'/>
</Value>
</Parameter>
<Parameter Name='TUV'>
<Description>
Some Text
</Description>
<Type>Integer</Type>
<Restriction>
<Level>5</Level>
</Restriction>
<Value>
<Item Value='5'/>
</Value>
</Parameter>
</Group>
</Component>
</Parameters>
</Configuration>
In this XML file I have to go to the component "ABC", then to the group "MNO", then to the parameter "TUV", and change the item value under it to 10.
I have tried using xml.etree.cElementTree, but without success. lxml is not available on the server, which runs a very old version of Python, and I have no permission to upgrade it.
I have been using the following code to parse and edit a relatively small XML file:
def fnXMLModification(ArgStr):
    argList = ArgStr.split()
    strXMLPath = argList[0]
    if not os.path.exists(strXMLPath):
        fnlogs("XML File: " + strXMLPath + " does not exist.\n")
        return False
    try:
        import xml.etree.cElementTree as ET
    except ImportError:
        import xml.etree.ElementTree as ET
    f = open(strXMLPath, 'rt')
    tree = ET.parse(f)
    ValueSetFlag = False
    AttrSetFlag = False
    for strXPath in argList[1:]:
        strXPathList = strXPath.split("[")
        sxPath = strXPathList[0]
        if len(strXPathList) == 3:
            # both present
            AttrSetFlag = True
            ValueSetFlag = True
            valToBeSet = strXPathList[1].strip("]")
            sAttr = strXPathList[2].strip("]")
            attrList = sAttr.split(",")
        elif len(strXPathList) == 2:
            # anyone present
            if "=" in strXPathList[1]:
                AttrSetFlag = True
                sAttr = strXPathList[1].strip("]")
                attrList = sAttr.split(",")
            else:
                ValueSetFlag = True
                valToBeSet = strXPathList[1].strip("]")
        node = tree.find(sxPath)
        if AttrSetFlag:
            for att in attrList:
                slist = att.split("=")
                node.set(slist[0].strip(), slist[1].strip())
        if ValueSetFlag:
            node.text = valToBeSet
    tree.write(strXMLPath)
    fnlogs("XML File: " + strXMLPath + " has been modified successfully.\n")
    return True
Using this function I am not able to traverse the current XML, as it has many nested child elements and subgroups.

Import statement:
import xml.etree.cElementTree as ET
Parse the content with the fromstring method:
root = ET.fromstring(data)
Iterate as required to reach the target Item tag, then change its Value attribute:
for component_tag in root.iter("Component"):
    if "Name" in component_tag.attrib and component_tag.attrib['Name'] == 'ABC':
        for group_tag in component_tag.iter("Group"):
            if "Name" in group_tag.attrib and group_tag.attrib['Name'] == 'MNO':
                # for value_tag in group_tag.iter("Value"):
                for item_tag in group_tag.findall("Parameter[@Name='TUV']/Value/Item"):
                    item_tag.attrib["Value"] = "10"
Alternatively, a single XPath expression can locate the target Item tag:
for item_tag in root.findall("Parameters/Component[@Name='ABC']/Group[@Name='MNO']/Parameter[@Name='TUV']/Value/Item"):
    item_tag.attrib["Value"] = "10"
Use the tostring method to get the modified content back:
data = ET.tostring(root)
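
As a runnable check of the single-XPath approach, here is a sketch on a trimmed-down copy of the question's document. The Description, Type and Restriction elements are left out for brevity, and it imports xml.etree.ElementTree rather than cElementTree so that it also runs on current Python versions.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's structure; only the parts needed here.
data = """
<Configuration>
  <Parameters>
    <Component Name='ABC'>
      <Group Name='DEF'>
        <Parameter Name='GHI'><Value><Item Value='5'/></Value></Parameter>
      </Group>
      <Group Name='MNO'>
        <Parameter Name='PQR'><Value><Item Value='5'/></Value></Parameter>
        <Parameter Name='TUV'><Value><Item Value='5'/></Value></Parameter>
      </Group>
    </Component>
  </Parameters>
</Configuration>
"""

root = ET.fromstring(data)

# The path is relative to the root element (<Configuration>).
path = ("Parameters/Component[@Name='ABC']/Group[@Name='MNO']"
        "/Parameter[@Name='TUV']/Value/Item")
for item in root.findall(path):
    item.set('Value', '10')

# Collect (parameter name, item value) pairs to show that only TUV changed.
values = [(p.get('Name'), p.find('Value/Item').get('Value'))
          for p in root.iter('Parameter')]
print(values)
```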

Proper way to convert xml to dictionary

I'm not sure whether this is the best way to convert this XML result into a dictionary. Is there a more proper way to do the conversion?
XML from the HTTP request result:
<Values version="2.0">
<value name="configuration">test</value>
<array name="configurationList" type="value" depth="1">
<value>test</value>
</array>
<value name="comment">Upload this for our robot.</value>
<array name="propertiesTable" type="record" depth="1">
<record javaclass="com.wm.util.Values">
<value name="name">date_to_go</value>
<value name="value">1990</value>
</record>
<record javaclass="com.wm.util.Values">
<value name="name">role</value>
<value name="value">Survivor</value>
</record>
<record javaclass="com.wm.util.Values">
<value name="name">status</value>
<value name="value">living</value>
</record>
<record javaclass="com.wm.util.Values">
<value name="name">user</value>
<value name="value">John Connor</value>
</record>
</array>
<null name="propertiesList"/>
</Values>
Code to convert the XML to a dictionary (which is working properly):
from xml.etree import ElementTree
tree = ElementTree.fromstring(xml)
mom = []
mim = []
configuration = tree.find('value[@name="configuration"]').text
comment = tree.find('value[@name="comment"]').text
prop = (configuration, comment)
mom.append(prop)
for records in tree.findall('./array/record'):
    me = []
    for child in records.iter('value'):
        me.append(child.text)
    mim.append(me)
for key, value in mim:
    mi_dict = dict()
    mi_dict[key] = value
    mom.append(mi_dict)
print(mom)
The result ( working as intended ):
[('test', 'Upload this for our robot.'), {'date_to_go': '1990'}, {'role': 'Survivor'}, {'status': 'living'}, {'user': 'John Connor'}]
EDIT:
Sorry if I wasn't clear: the code described is working as expected, but I'm not sure whether this is the proper (Pythonic, clean) way to do it.
Thanks in advance.

I don't think it's too bad. You can make some minor changes to be a bit more Pythonic:
from xml.etree import ElementTree
tree = ElementTree.fromstring(xml)
mom = []
mim = []
configuration = tree.find('value[@name="configuration"]').text
comment = tree.find('value[@name="comment"]').text
prop = (configuration, comment)
mom.append(prop)
for records in tree.findall('./array/record'):
    mim.append([child.text for child in records.iter('value')])
# mim is a list of [key, value] pairs, so unpack each pair directly.
mom += [{k: v} for k, v in mim]
print(mom)
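
If a single nested dictionary is acceptable instead of the list of one-item dictionaries, one possible sketch is shown below. The shape of result and its 'properties' key are assumptions about what the caller wants, not something stated in the question, and the sample is shortened to two records.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's XML (two records instead of four).
xml_text = """
<Values version="2.0">
  <value name="configuration">test</value>
  <value name="comment">Upload this for our robot.</value>
  <array name="propertiesTable" type="record" depth="1">
    <record javaclass="com.wm.util.Values">
      <value name="name">date_to_go</value>
      <value name="value">1990</value>
    </record>
    <record javaclass="com.wm.util.Values">
      <value name="name">role</value>
      <value name="value">Survivor</value>
    </record>
  </array>
</Values>
"""

root = ET.fromstring(xml_text)

# findtext() returns the text of the first element matching the path.
result = {
    'configuration': root.findtext('value[@name="configuration"]'),
    'comment': root.findtext('value[@name="comment"]'),
    'properties': {
        rec.findtext('value[@name="name"]'): rec.findtext('value[@name="value"]')
        for rec in root.findall('./array/record')
    },
}
print(result)
```

Looking up the name and value fields by their name attribute, rather than by position, keeps the mapping correct even if the two value elements appear in a different order inside a record.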

replace only first occurrence of field/word on a file

I have some zip files (700+), each containing a file with exactly the following structure:
<?xml version="1.0" encoding="UTF-8"?>
<Values version="2.0">
<record name="trigger">
<value name="uniqueId">6xjUCpDlrTVHRsEVmxx0Ews6ni8=</value>
<value name="processingSuspended">false</value>
<value name="retrievalSuspended">false</value>
</record>
<record name="trigger">
<value name="uniqueId">6xjUCpDlrTVHRsEVmxx0Ews6ni8=</value>
<value name="processingSuspended">false</value>
<value name="retrievalSuspended">false</value>
</record>
</Values>
What I would like to achieve is to set the first occurrence of the fields processingSuspended and retrievalSuspended to false, regardless of whether their current value is true or false. Only the first occurrence should change.
EDIT:
By request, I'm adding what I have so far. It gets the fields that I want, but I believe there is a simpler way to do it:
import os
import zipfile
import glob
import time
import re

def main():
    rList = []
    for z in glob.glob("*.zip"):
        root = zipfile.ZipFile(z)
        for filename in root.namelist():
            if filename.find("node.ndf") >= 0:
                for line in root.read(filename).split("\n"):
                    if line.find("broker-trigger") >= 0:
                        for iline in root.read(filename).split("\n"):
                            Values = dict()
                            #match Processing state
                            if iline.find("processingSuspended") >= 0:
                                mpr = re.search(r'(.*>)(.*?)(<.*)',
                                                iline, re.M|re.I)
                            #match Retrieval state
                            if iline.find("retrievalSuspended") >= 0:
                                mr = re.search(r'(.*>)(.*?)(<.*)',
                                               iline, re.M|re.I)
                                Values['processingSuspended'] = mpr.group(2)
                                Values['retrievalSuspended'] = mr.group(2)
                                #print mr.group(2)
                                rList.append(Values)
    print rList

if __name__== "__main__":
    main()
Thanks in advance.

Try using lxml:
>>> xml = '''\
<?xml version="1.0" encoding="UTF-8"?>
<Values version="2.0">
<record name="trigger">
<value name="uniqueId">6xjUCpDlrTVHRsEVmxx0Ews6ni8=</value>
<value name="processingSuspended">true</value>
<value name="retrievalSuspended">true</value>
</record>
<record name="trigger">
<value name="uniqueId">6xjUCpDlrTVHRsEVmxx0Ews6ni8=</value>
<value name="processingSuspended">true</value>
<value name="retrievalSuspended">true</value>
</record>
</Values>\
'''
>>> from lxml import etree
>>> tree = etree.fromstring(xml)
>>> tree.xpath('//value[@name="processingSuspended"]')[0].text = 'false'
>>> tree.xpath('//value[@name="retrievalSuspended"]')[0].text = 'false'
The XPath expression '//value[@name="processingSuspended"]' finds all value tags whose name attribute equals "processingSuspended". We then take the first match with [0] and change that tag's text to 'false'.
Output:
>>> print(etree.tostring(tree, pretty_print=True))
<Values version="2.0">
<record name="trigger">
<value name="uniqueId">6xjUCpDlrTVHRsEVmxx0Ews6ni8=</value>
<value name="processingSuspended">false</value>
<value name="retrievalSuspended">false</value>
</record>
<record name="trigger">
<value name="uniqueId">6xjUCpDlrTVHRsEVmxx0Ews6ni8=</value>
<value name="processingSuspended">true</value>
<value name="retrievalSuspended">true</value>
</record>
</Values>
>>>

You can read the zip archives and update the XML data in the files they contain using only Python's built-in modules. There's even a tutorial in the documentation for xml.etree.ElementTree.
import glob
import xml.etree.ElementTree as ET
import zipfile

def main():
    for z in glob.glob("*.zip"):
        print('processing file: {!r}'.format(z))
        zfile = zipfile.ZipFile(z)
        for filename in zfile.namelist():
            print('processing archive member: {!r} in {}'.format(filename, z))
            contents = zfile.open(filename).read()
            print('Before changes:')
            print(contents)
            root = ET.fromstring(contents)
            if root.tag != "Values" or root.attrib["version"] != "2.0":
                print('unsupported xml file')
                break
            # root[0] is the first <record>; its children are addressed by position.
            if (root[0][1].tag == "value" and
                    root[0][1].attrib["name"] == "processingSuspended"):
                root[0][1].text = "false"
            else:
                print('expected "processingSuspended" value field not found')
                break
            if (root[0][2].tag == "value" and
                    root[0][2].attrib["name"] == "retrievalSuspended"):
                root[0][2].text = "false"
            else:
                print('expected "retrievalSuspended" value field not found')
                break
            print('After changes:')
            updated_contents = ET.tostring(root)
            print(updated_contents)

if __name__ == "__main__":
    main()
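
If the position of the fields inside the first record is not guaranteed, a lookup by attribute may be more robust than indexing. This is a sketch under that assumption, with a shortened sample document (uniqueId omitted); find() returns only the first matching element in document order, which is what "first occurrence" needs.

```python
import xml.etree.ElementTree as ET

# Shortened sample; both records start out as "true".
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8"?>
<Values version="2.0">
  <record name="trigger">
    <value name="processingSuspended">true</value>
    <value name="retrievalSuspended">true</value>
  </record>
  <record name="trigger">
    <value name="processingSuspended">true</value>
    <value name="retrievalSuspended">true</value>
  </record>
</Values>"""

root = ET.fromstring(xml_bytes)

for field in ('processingSuspended', 'retrievalSuspended'):
    # find() stops at the first match, so later records are left untouched.
    first = root.find('.//value[@name="{}"]'.format(field))
    if first is not None:
        first.text = 'false'

states = [v.text for v in root.iter('value')]
print(states)
```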
