ElementTree - Change subchild value conditionally - python

I have just started working with XML using ElementTree in my current project. My task is to change a subchild's value based on the value of another subchild in the same child.
I have written code for this, but I feel there might be a way to improve it in terms of readability and performance.
Here is my code:
import xml.etree.ElementTree as ET
tree = ET.ElementTree(ET.fromstring("<Properties><Property><Name>KENT</Name><Value>99</Value></Property><Property><Name>JOHN</Name><Value>fifthy</Value></Property></Properties>"))
root = tree.getroot()
change_found = False
for item in root:
    for subItem in item:
        if change_found and subItem.tag == "Value":
            subItem.text = "50"
            change_found = False
        if subItem.tag == "Name" and subItem.text == "JOHN":
            change_found = True
print(ET.tostring(root, encoding='utf8', method='xml'))
As you can see from the code, when a subchild has the tag "Name" and the text "JOHN", change_found is set to True. Since the next subchild has the tag "Value", its text is then changed (from "fifthy" to "50").
The code works fine, but I believe it can be improved.
You can assume that the structure of each property is always in this order:
<Property>
    <Name> Some name </Name>
    <Value> Some value </Value>
</Property>
You can also assume that only one Property has a "Name" subchild with the text "JOHN".

If I understand you correctly, you can get there more simply using XPath:
root = ET.fromstring([your string above])
for p in root.findall('.//Property'):
    if p.find('.//Name').text.strip() == "JOHN":
        p.find('.//Value').text = "50"
print(ET.tostring(root).decode())
Output:
<Properties>
<Property>
<Name>KENT
</Name>
<Value>99
</Value>
</Property>
<Property>
<Name>JOHN
</Name>
<Value>50</Value>
</Property>
</Properties>
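A further option, offered as a minimal sketch: ElementTree's limited XPath support includes a child-text predicate, so the matching Value can be reached with a single find call. This assumes, as the question states, that each Property holds one Name and one Value; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

xml_data = (
    "<Properties>"
    "<Property><Name>KENT</Name><Value>99</Value></Property>"
    "<Property><Name>JOHN</Name><Value>fifthy</Value></Property>"
    "</Properties>"
)
root = ET.fromstring(xml_data)

# [Name='JOHN'] keeps only Property elements whose Name child has exactly that text.
value = root.find("Property[Name='JOHN']/Value")
if value is not None:
    value.text = "50"

print(ET.tostring(root, encoding="unicode"))
```

The predicate compares the text exactly, so if the names may carry surrounding whitespace, the loop with strip() shown above is the safer choice.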

Related

Increment list indexes to get correct values to be updated in XML data based on Title

List elements to be appended in XML data:
Sorted_TestSpecID: [10860972, 10860972, 10860972, 10860972, 10860972]
Sorted_TestCaseID: [16961435, 16961462, 16961739, 16961741, 16961745]
Sorted_TestText : ['SIG1', 'SIG2', 'SIG3', 'Signal1', 'Signal2']
original xml data:
<tc>
<title>Signal1</title>
<tcid>2c758925-dc3d-4b1d-a5e2-e0ca54c52a47</tcid>
<attributes>
<attr>
<key>TestSpec ID</key>
<value>0</value>
</attr>
<attr>
<key>TestCase ID</key>
<value>0</value>
</attr>
</attributes>
</tc>
I am trying to write a Python script that does the following:
Search the XML data for the title Signal1, taken from Sorted_TestText.
Then find the key "TestCase ID" and update its value to the corresponding entry, 16961741.
Then find the key "TestSpec ID" and update its value to the corresponding entry, 10860972.
from bs4 import BeautifulSoup
soup = BeautifulSoup(xml_data, 'xml')
for tc in soup.find_all('tc'):
    for title, spec, case in zip(Sorted_TestText, Sorted_TestSpecID, Sorted_TestCaseID):
        if tc.find('title').text == title:
            for attr in tc.find_all('attr'):
                if attr.find('key').text == "TestSpec ID":
                    attr.find('value').text = str(spec)
                if attr.find('key').text == "TestCase ID":
                    attr.find('value').text = str(case)
print(soup)
I've tried the script above, but it does not update spec and case based on the title; it only works when spec, case, and title are in the same order. My intention is for the script to look up the title and then update that title's attributes. Let's say 'SIG1', 'SIG2', and 'SIG3' are not present in my XML: I want to update the spec and case of Signal1 with spec 10860972 and case 16961741, but with this script it is updating SIG4 as spec 10860972 and case 16961435. The spec and case lists need to be traversed together with their respective title. I tried, but had no luck. Any help is appreciated. Thanks in advance.
I'd use a dictionary where keys are titles and values are TestCaseIDs and TestSpecIDs.
Then, to change the contents of <value> use .string instead of .text:
dct = {
    c: (str(a), str(b))
    for a, b, c in zip(Sorted_TestSpecID, Sorted_TestCaseID, Sorted_TestText)
}
for tc in soup.select("tc"):
    title = tc.title.get_text(strip=True)
    if title not in dct:
        continue
    val = tc.select_one('attr:has(key:-soup-contains("TestSpec ID")) value')
    if val:
        val.string = str(dct[title][0])
    val = tc.select_one('attr:has(key:-soup-contains("TestCase ID")) value')
    if val:
        val.string = str(dct[title][1])
print(soup.prettify())
Prints:
<?xml version="1.0" encoding="utf-8"?>
<tc>
<title>
Signal1
</title>
<tcid>
2c758925-dc3d-4b1d-a5e2-e0ca54c52a47
</tcid>
<attributes>
<attr>
<key>
TestSpec ID
</key>
<value>
10860972
</value>
</attr>
<attr>
<key>
TestCase ID
</key>
<value>
16961741
</value>
</attr>
</attributes>
</tc>
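For comparison, here is a sketch of the same title-keyed dictionary lookup using only the standard library's xml.etree.ElementTree in place of BeautifulSoup. The helper name update_ids and the lowercase variable names are assumptions made for this illustration; the data is copied from the question.

```python
import xml.etree.ElementTree as ET

xml_data = """<tc>
<title>Signal1</title>
<attributes>
<attr><key>TestSpec ID</key><value>0</value></attr>
<attr><key>TestCase ID</key><value>0</value></attr>
</attributes>
</tc>"""

sorted_spec_ids = [10860972, 10860972, 10860972, 10860972, 10860972]
sorted_case_ids = [16961435, 16961462, 16961739, 16961741, 16961745]
sorted_titles = ["SIG1", "SIG2", "SIG3", "Signal1", "Signal2"]

# Map each title to the values that belong to it, keyed by the <key> text.
ids_by_title = {
    title: {"TestSpec ID": str(spec), "TestCase ID": str(case)}
    for title, spec, case in zip(sorted_titles, sorted_spec_ids, sorted_case_ids)
}

def update_ids(root, ids_by_title):
    """Hypothetical helper: fill in <value> for every <tc> whose title is known."""
    for tc in root.iter("tc"):
        title = (tc.findtext("title") or "").strip()
        ids = ids_by_title.get(title)
        if ids is None:
            continue  # this title is not in the lists, so leave it untouched
        for attr in tc.iter("attr"):
            key = (attr.findtext("key") or "").strip()
            value = attr.find("value")
            if key in ids and value is not None:
                value.text = ids[key]

root = ET.fromstring(xml_data)
update_ids(root, ids_by_title)
print(ET.tostring(root, encoding="unicode"))
```

Because the lookup is by title rather than by position, it does not matter which titles from the lists are missing from the XML.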

Modify value xml tag with python

I have this xml:
<resources>
<string name="name1">value1</string>
<string name="name2">value2</string>
<string name="name3">value3</string>
<string name="name4">value4</string>
<string name="name5">value5</string>
</resources>
and I want to change the value of each string tag. I've tried with ElementTree, but I cannot solve it.
This is what I have, but it doesn't work:
tree = ET.parse(archivo_xml)
root = tree.getroot()
cadena = root.findall('string')
cadena.text = "something"
root.findall() returns a list, which is why that approach doesn't work: the list itself has no text to set.
Use root.iter() instead to find all the tags matching 'string', then loop over the results and change the text value of each.
for cadena in root.iter('string'):
    cadena.text = "something"
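If each tag should get its own value rather than the same text, one possible sketch is to key the replacements on the name attribute. The new_values dictionary below is made up for illustration, and the XML is a shortened copy of the one in the question.

```python
import xml.etree.ElementTree as ET

xml_data = """<resources>
    <string name="name1">value1</string>
    <string name="name2">value2</string>
    <string name="name3">value3</string>
</resources>"""

# Hypothetical replacement values, keyed by the name attribute.
new_values = {"name1": "first", "name3": "third"}

root = ET.fromstring(xml_data)
for cadena in root.iter("string"):
    name = cadena.get("name")
    if name in new_values:
        cadena.text = new_values[name]

# Collect the final texts to check the outcome.
result = {s.get("name"): s.text for s in root.iter("string")}
print(result)
```

Tags whose name is not in the dictionary keep their original text.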

Large XML parsing in Python

I am a novice in python and have the following task on hand.
I have a large xml file like the one below:
<Configuration>
<Parameters>
<Component Name='ABC'>
<Group Name='DEF'>
<Parameter Name='GHI'>
<Description>
Some Text
</Description>
<Type>Integer</Type>
<Restriction>
<Level>5</Level>
</Restriction>
<Value>
<Item Value='5'/>
</Value>
</Parameter>
<Parameter Name='JKL'>
<Description>
Some Text
</Description>
<Type>Integer</Type>
<Restriction>
<Level>5</Level>
</Restriction>
<Value>
<Item Value='5'/>
</Value>
</Parameter>
</Group>
<Group Name='MNO'>
<Parameter Name='PQR'>
<Description>
Some Text
</Description>
<Type>Integer</Type>
<Restriction>
<Level>5</Level>
</Restriction>
<Value>
<Item Value='5'/>
</Value>
</Parameter>
<Parameter Name='TUV'>
<Description>
Some Text
</Description>
<Type>Integer</Type>
<Restriction>
<Level>5</Level>
</Restriction>
<Value>
<Item Value='5'/>
</Value>
</Parameter>
</Group>
</Component>
</Parameters>
</Configuration>
In this XML file I have to go to the component "ABC", then to the group "MNO", then to the parameter "TUV", and under it change the item value to 10.
I have tried using xml.etree.cElementTree, but to no avail. lxml is not supported on the server, as it is running a very old version of Python, and I have no permission to upgrade it.
I have been using the following code to parse and edit a relatively small XML file:
def fnXMLModification(ArgStr):
    argList = ArgStr.split()
    strXMLPath = argList[0]
    if not os.path.exists(strXMLPath):
        fnlogs("XML File: " + strXMLPath + " does not exist.\n")
        return False
    try:
        import xml.etree.cElementTree as ET
    except ImportError:
        import xml.etree.ElementTree as ET
    f=open(strXMLPath, 'rt')
    tree = ET.parse(f)
    ValueSetFlag = False
    AttrSetFlag = False
    for strXPath in argList[1:]:
        strXPathList = strXPath.split("[")
        sxPath = strXPathList[0]
        if len(strXPathList)==3:
            # both present
            AttrSetFlag = True
            ValueSetFlag = True
            valToBeSet = strXPathList[1].strip("]")
            sAttr = strXPathList[2].strip("]")
            attrList = sAttr.split(",")
        elif len(strXPathList) == 2:
            #anyone present
            if "=" in strXPathList[1]:
                AttrSetFlag = True
                sAttr = strXPathList[1].strip("]")
                attrList = sAttr.split(",")
            else:
                ValueSetFlag = True
                valToBeSet = strXPathList[1].strip("]")
        node = tree.find(sxPath)
        if AttrSetFlag:
            for att in attrList:
                slist = att.split("=")
                node.set(slist[0].strip(),slist[1].strip())
        if ValueSetFlag:
            node.text = valToBeSet
    tree.write(strXMLPath)
    fnlogs("XML File: " + strXMLPath + " has been modified successfully.\n")
    return True
Using this function I am not able to traverse the current XML, as it has a lot of child attributes and subgroups.
Import statement:
import xml.etree.cElementTree as ET
Parse the content with the fromstring method:
root = ET.fromstring(data)
Iterate as required to reach the target Item tag and change the value of its Value attribute:
for component_tag in root.iter("Component"):
    if "Name" in component_tag.attrib and component_tag.attrib['Name']=='ABC':
        for group_tag in component_tag.iter("Group"):
            if "Name" in group_tag.attrib and group_tag.attrib['Name']=='MNO':
                #for value_tag in group_tag.iter("Value"):
                for item_tag in group_tag.findall("Parameter[@Name='TUV']/Value/Item"):
                    item_tag.attrib["Value"] = "10"
Alternatively, we can use XPath to get the target Item tag directly:
for item_tag in root.findall("Parameters/Component[@Name='ABC']/Group[@Name='MNO']/Parameter[@Name='TUV']/Value/Item"):
    item_tag.attrib["Value"] = "10"
Use the tostring method to get the content back:
data = ET.tostring(root)
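The XPath variant can be tried end to end with the sketch below, which uses a shortened copy of the question's XML. It imports xml.etree.ElementTree rather than cElementTree, and it assumes an ElementTree version that supports attribute predicates (as far as I know, ElementTree 1.3, shipped with Python 2.7, and later), so it may not run on the very old interpreter mentioned in the question.

```python
import xml.etree.ElementTree as ET

xml_data = """<Configuration><Parameters><Component Name='ABC'>
<Group Name='DEF'>
<Parameter Name='GHI'><Value><Item Value='5'/></Value></Parameter>
</Group>
<Group Name='MNO'>
<Parameter Name='PQR'><Value><Item Value='5'/></Value></Parameter>
<Parameter Name='TUV'><Value><Item Value='5'/></Value></Parameter>
</Group>
</Component></Parameters></Configuration>"""

root = ET.fromstring(xml_data)

# Each [@Name='...'] predicate narrows the match to one element at that level.
path = (
    "Parameters/Component[@Name='ABC']/Group[@Name='MNO']"
    "/Parameter[@Name='TUV']/Value/Item"
)
for item in root.findall(path):
    item.set("Value", "10")

# Collect every parameter's item value to confirm only TUV changed.
values = {
    p.get("Name"): p.find("Value/Item").get("Value")
    for p in root.iter("Parameter")
}
print(values)
```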

Modify XML file using ElementTree

I am trying to do the following with Python:
get the "price" value and change it
find "price_qty" and insert a new line with a new tier and a different price based on the "price".
So far I could only find the price, change it, and insert a line in roughly the correct place, but I can't find a way to get the "item" tag with its "qty" and "price" attributes in there; nothing has worked so far.
This is my original XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<body start="20.04.2014 10:02:60">
<pricelist>
<item>
<name>LEO - red pen</name>
<price>31,4</price>
<price_snc>0</price_snc>
<price_ao>0</price_ao>
<price_qty>
<item qty="150" price="28.20" />
<item qty="750" price="26.80" />
<item qty="1500" price="25.60" />
</price_qty>
<stock>50</stock>
</item>
</pricelist>
the new xml should look this way:
<pricelist>
<item>
<name>LEO - red pen</name>
<price>31,4</price>
<price_snc>0</price_snc>
<price_ao>0</price_ao>
<price_qty>
<item qty="10" price="31.20" /> **-this is the new line**
<item qty="150" price="28.20" />
<item qty="750" price="26.80" />
<item qty="1500" price="25.60" />
</price_qty>
<stock>50</stock>
</item>
</pricelist>
my code so far:
import xml.etree.cElementTree as ET
from xml.etree.ElementTree import Element, SubElement
tree = ET.ElementTree(file='pricelist.xml')
root = tree.getroot()
pos=0
# price - raise the main price and insert new tier
for elem in tree.iterfind('pricelist/item/price'):
    price = elem.text
    newprice = (float(price.replace(",", ".")))*1.2
    newtier = "NEW TIER"
    SubElement(root[0][pos][5], newtier)
    pos+=1
tree.write('pricelist.xml', "UTF-8")
result:
...
<price_qty>
<item price="28.20" qty="150" />
<item price="26.80" qty="750" />
<item price="25.60" qty="1500" />
<NEW TIER /></price_qty>
thank you for any help.
Don't use fixed indexing. You already have the item element, so why not use it?
tree = ET.ElementTree(file='pricelist.xml')
root = tree.getroot()
for elem in tree.iterfind('pricelist/item'):
    price = elem.findtext('price')
    newprice = float(price.replace(",", ".")) * 1.2
    newtier = ET.Element("item", qty="10", price="%.2f" % newprice)
    elem.find('price_qty').insert(0, newtier)
tree.write('pricelist.xml', "UTF-8")
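To see the effect without touching a file, the same approach can be run on an in-memory string. This is only a sketch: the XML is a shortened copy of the question's data with the closing body tag added, and the new tier's price follows the answer's 1.2 multiplier rather than the 31.20 shown in the desired output.

```python
import xml.etree.ElementTree as ET

xml_data = """<body>
<pricelist>
<item>
<name>LEO - red pen</name>
<price>31,4</price>
<price_qty>
<item qty="150" price="28.20" />
<item qty="750" price="26.80" />
</price_qty>
<stock>50</stock>
</item>
</pricelist>
</body>"""

root = ET.fromstring(xml_data)

# 'pricelist/item' matches only the product items, not the nested tier items.
for elem in root.iterfind("pricelist/item"):
    price = float(elem.findtext("price").replace(",", "."))
    new_tier = ET.Element("item", qty="10", price="%.2f" % (price * 1.2))
    elem.find("price_qty").insert(0, new_tier)  # index 0 puts it first

tiers = [
    (t.get("qty"), t.get("price"))
    for t in root.iterfind("pricelist/item/price_qty/item")
]
print(tiers)
```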

updating XML attribute value in python

I want to parse the XML below and update the value of "alcohol" to "yes" for all the elements where age > 21. I'm having a problem because the element is a node buried inside other nodes. Could someone help me understand how to handle this?
Here's the XML:
<root xmlns="XYZ" usingPalette="">
<grandParent hostName="XYZ">
<parent>
<child name="JohnsDad">
<grandChildren name="John" sex="male" age="22" alcohol="no"/>
</child>
<child name="PaulasDad">
<grandChildren name="Paula" sex="female" age="15" alcoho="no"/>
</child>
</parent>
</grandParent>
</root>
I tried the findall and find methods, following this document (http://pymotw.com/2/xml/etree/ElementTree/parse.html), but they didn't find it. For example, the following code returns no results:
for node in tree.findall('.//grandParent'):
    print(node)
import xml.etree.ElementTree as ET
tree = ET.parse('data')
# getiterator() was removed in Python 3.9; use tree.iter() there instead
for node in tree.getiterator():
    if int(node.attrib.get('age', 0)) > 21:
        node.attrib['alcohol'] = 'yes'
root = tree.getroot()
ET.register_namespace("", "XYZ")
print(ET.tostring(root))
yields
<root xmlns="XYZ" usingPalette="">
<grandParent hostName="XYZ">
<parent>
<child name="JohnsDad">
<grandChildren age="22" alcohol="yes" name="John" sex="male" />
</child>
<child name="PaulasDad">
<grandChildren age="15" alcoho="no" name="Paula" sex="female" />
</child>
</parent>
</grandParent>
</root>
By the way, since the XML uses the namespace "XYZ", you must specify the namespace in your XPath:
for node in tree.findall('.//{XYZ}grandParent'):
    print(node)
That will return the grandParent element, but since you want to inspect all subnodes, I think using getiterator is easier here.
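As a sketch of the namespace-aware route, findall also accepts a prefix-to-URI mapping, which avoids repeating {XYZ} in every path. The prefix "x" and the trimmed XML below (with the attribute spelled alcohol on both elements) are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

xml_data = """<root xmlns="XYZ" usingPalette="">
<grandParent hostName="XYZ">
<parent>
<child name="JohnsDad">
<grandChildren name="John" sex="male" age="22" alcohol="no"/>
</child>
<child name="PaulasDad">
<grandChildren name="Paula" sex="female" age="15" alcohol="no"/>
</child>
</parent>
</grandParent>
</root>"""

# "x" is an arbitrary prefix chosen here; only the URI has to match the document.
NS = {"x": "XYZ"}
ET.register_namespace("", "XYZ")  # keep xmlns="XYZ" as the default on output

root = ET.fromstring(xml_data)
for node in root.findall(".//x:grandChildren", NS):
    if int(node.get("age", 0)) > 21:
        node.set("alcohol", "yes")

print(ET.tostring(root, encoding="unicode"))
```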
To preserve comments while using xml.etree.ElementTree, you could use the custom parser Fredrik Lundh shows at http://effbot.org/zone/element-pi.htm:
import xml.etree.ElementTree as ET
class PIParser(ET.XMLTreeBuilder):
    """
    http://effbot.org/zone/element-pi.htm
    """
    def __init__(self):
        ET.XMLTreeBuilder.__init__(self)
        # assumes ElementTree 1.2.X
        self._parser.CommentHandler = self.handle_comment
        self._parser.ProcessingInstructionHandler = self.handle_pi
        self._target.start("document", {})
    def close(self):
        self._target.end("document")
        return ET.XMLTreeBuilder.close(self)
    def handle_comment(self, data):
        self._target.start(ET.Comment, {})
        self._target.data(data)
        self._target.end(ET.Comment)
    def handle_pi(self, target, data):
        self._target.start(ET.PI, {})
        self._target.data(target + " " + data)
        self._target.end(ET.PI)
tree = ET.parse('data', PIParser())
Note that if you install lxml, you could instead use:
import lxml.etree as ET
parser = ET.XMLParser(remove_comments=False)
tree = ET.parse('data', parser=parser)
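On newer interpreters there is a third route that needs neither the custom parser nor lxml. The sketch below assumes Python 3.8 or later, where TreeBuilder accepts insert_comments; the sample XML is made up for the illustration.

```python
import xml.etree.ElementTree as ET

xml_data = "<root><!-- keep me --><child/></root>"

# insert_comments=True makes the builder add comment nodes to the tree
# instead of dropping them while parsing.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(xml_data, parser=parser)

out = ET.tostring(root, encoding="unicode")
print(out)
```

Comments that sit inside the root element are then written back out along with the rest of the tree.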
