Editing the text of XML elements in a file using Python

I have an XML file that contains the following data:
<?xml version="1.0" encoding="UTF-8" ?>
- <ParameterData>
<CreationInfo date="10/28/2009 03:05:14 PM" user="manoj" />
- <ParameterList count="85">
- <Parameter name="Spec 2 Included" type="boolean" mode="both">
<Value>n/a</Value>
<Result>n/a</Result>
</Parameter>
- <Parameter name="Spec 2 Label" type="string" mode="both">
<Value>n/a</Value>
<Result>n/a</Result>
</Parameter>
- <Parameter name="Spec 3 Included" type="boolean" mode="both">
<Value>n/a</Value>
<Result>n/a</Result>
</Parameter>
- <Parameter name="Spec 3 Label" type="string" mode="both">
<Value>n/a</Value>
<Result>n/a</Result>
</Parameter>
</ParameterList>
</ParameterData>
I also have a text file with lines like these:
Spec 2 Included : TRUE
Spec 2 Label: 19-Flat2-HS3
Spec 3 Included : FALSE
Spec 3 Label: 4-1-Bead1-HS3
Now I want to edit the XML text, i.e. I want to replace each n/a field
with the corresponding value from the text file.
I want the file to look like this:
<?xml version="1.0" encoding="UTF-8" ?>
- <ParameterData>
<CreationInfo date="10/28/2009 03:05:14 PM" user="manoj" />
- <ParameterList count="85">
- <Parameter name="Spec 2 Included" type="boolean" mode="both">
<Value>TRUE</Value>
<Result>TRUE</Result>
</Parameter>
- <Parameter name="Spec 2 Label" type="string" mode="both">
<Value>19-Flat2-HS3</Value>
<Result>19-Flat2-HS3</Result>
</Parameter>
- <Parameter name="Spec 3 Included" type="boolean" mode="both">
<Value>FALSE</Value>
<Result>FALSE</Result>
</Parameter>
- <Parameter name="Spec 3 Label" type="string" mode="both">
<Value>4-1-Bead1-HS3</Value>
<Result>4-1-Bead1-HS3</Result>
</Parameter>
</ParameterList>
</ParameterData>
I am new to Python XML coding and don't know how to edit the text fields in an XML file.
I am trying to use the elementtree.ElementTree module, but I don't know which modules need to be imported to read the XML file and extract the attributes.
Please help.
Thanks and regards.

You can convert your text data into a Python dictionary with a regular expression:
data="""Spec 2 Included : TRUE
Spec 2 Label: 19-Flat2-HS3
Spec 3 Included : FALSE
Spec 3 Label: 4-1-Bead1-HS3"""
#data=open("data.txt").read()
import re
data=dict(re.findall(r'(Spec \d+ (?:Included|Label))\s*:\s*(\S+)',data))
data will then be as follows:
{'Spec 3 Included': 'FALSE', 'Spec 2 Included': 'TRUE', 'Spec 3 Label': '4-1-Bead1-HS3', 'Spec 2 Label': '19-Flat2-HS3'}
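The pattern above captures each value with `\S+`, so a value that contains a space would be cut off at the first space. As a minimal sketch of an alternative, the lines can be split on the first colon instead. The helper name `parse_spec_lines` is made up for illustration and is not part of the original answer.

```python
def parse_spec_lines(text):
    """Map 'name : value' lines to a dict, splitting on the first colon only."""
    result = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that contain no colon
        result[name.strip()] = value.strip()
    return result


sample = """Spec 2 Included : TRUE
Spec 2 Label: 19-Flat2-HS3
Spec 3 Included : FALSE
Spec 3 Label: 4-1-Bead1-HS3"""
spec_values = parse_spec_lines(sample)
```

Because only the first colon is used as the separator, values that themselves contain colons or spaces are kept whole.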
Then you can update the XML with any XML parser you like; I will use minidom here (xml_text holds the XML document as a string).
from xml.dom import minidom
dom = minidom.parseString(xml_text)
params = dom.getElementsByTagName("Parameter")
for param in params:
    name = param.getAttribute("name")
    if name in data:
        for item in param.getElementsByTagName("*"):  # You may change to "Result" or "Value" only
            item.firstChild.replaceWholeText(data[name])
print(dom.toxml())
# write to file
with open("output.xml", "w") as out_file:
    out_file.write(dom.toxml())
Results
<?xml version="1.0" ?><ParameterData>
<CreationInfo date="10/28/2009 03:05:14 PM" user="manoj"/>
<ParameterList count="85">
<Parameter mode="both" name="Spec 2 Included" type="boolean">
<Value>TRUE</Value>
<Result>TRUE</Result>
</Parameter>
<Parameter mode="both" name="Spec 2 Label" type="string">
<Value>19-Flat2-HS3</Value>
<Result>19-Flat2-HS3</Result>
</Parameter>
<Parameter mode="both" name="Spec 3 Included" type="boolean">
<Value>FALSE</Value>
<Result>FALSE</Result>
</Parameter>
<Parameter mode="both" name="Spec 3 Label" type="string">
<Value>4-1-Bead1-HS3</Value>
<Result>4-1-Bead1-HS3</Result>
</Parameter>
</ParameterList>
</ParameterData>

Well, you could start with
import xml.etree.ElementTree as ET
tree = ET.parse("blah.xml")
Find the elements you want to modify.
To replace the contents of an element, just do
element.text = "TRUE"
The import statement above works in Python 2.5 or later. If you have an older version of Python you'll need to install ElementTree as an extension, and then the import statement is different: import elementtree.ElementTree as ET.
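To make the "find the elements" step concrete, here is a minimal sketch that uses a shortened copy of the question's XML as an inline string (an assumption for illustration; a real script would parse the file instead). It compares the `name` attribute by hand rather than relying on XPath predicates.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """<ParameterData>
  <ParameterList count="85">
    <Parameter name="Spec 2 Included" type="boolean" mode="both">
      <Value>n/a</Value>
      <Result>n/a</Result>
    </Parameter>
  </ParameterList>
</ParameterData>"""

root = ET.fromstring(XML_TEXT)

# Walk every Parameter element and check its name attribute explicitly.
for parameter in root.iter("Parameter"):
    if parameter.get("name") == "Spec 2 Included":
        parameter.find("Value").text = "TRUE"
        parameter.find("Result").text = "TRUE"

updated = ET.tostring(root, encoding="unicode")
```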

Unfortunately, the XPath supported by ElementTree isn't complete. Since Python 2.6 includes an older version, finding elements by attribute (as stated here) does not work. So Python's own documentation should be your first stop: xml.etree.ElementTree
import xml.etree.ElementTree as ET
original = ET.parse("original.xml")
parameters = original.findall(".//Parameter")
changes = {}
# read changes
with open("changes.txt", "r") as in_file:
    for change in in_file:
        change = change.rstrip()  # remove line endings
        if ":" not in change:
            continue  # skip blank or malformed lines
        name, value = change.split(":", 1)
        changes[name.strip()] = value.strip()  # remove whitespace
# find parameter elements and apply changes
for parameter in parameters:
    parameter_name = parameter.get("name")
    if parameter_name in changes:
        value = parameter.find("./Value")
        value.text = changes[parameter_name]
        result = parameter.find("./Result")
        result.text = changes[parameter_name]
original.write("new.xml")

Here is how you could do it using Amara
from amara import bindery
doc = bindery.parse(XML)
def cleanup_for_dict(key, value):
    return key.strip(), value.strip()
params = dict(( cleanup_for_dict(*line.split(':', 1))
                for line in TEXT.splitlines()))
for param in doc.ParameterData.ParameterList.Parameter:
    if param.name in params:
        param.Value = params[param.name]
        param.Result = params[param.name]
doc.xml_write()

Related

Get text inside xml tags by their name

I have some XML and I want to get the text of specific elements (XML tags) using Python.
I have tried a couple of solutions, but they didn't work.
import xml.etree.ElementTree as ET
tree = ET.fromstring(xml)
for node in tree.iter('Model'):
    print node
How can I do that?
XML:
<soap:Envelope
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetVehicleLimitedInfoResponse
xmlns="http://schemas.conversesolutions.com/xsd/dmticta/v1">
<return>
<ResponseMessage xsi:nil="true" />
<ErrorCode xsi:nil="true" />
<RequestId> 2012290007705 </RequestId>
<TransactionCharge>150</TransactionCharge>
<VehicleNumber>GF-0176</VehicleNumber>
<AbsoluteOwner>SIYAPATHA FINANCE PLC</AbsoluteOwner>
<EngineNo>GA15-483936F</EngineNo>
<ClassOfVehicle>MOTOR CAR</ClassOfVehicle>
<Make>NISSAN</Make>
<Model>PULSAR</Model>
<YearOfManufacture>1998</YearOfManufacture>
<NoOfSpecialConditions>0</NoOfSpecialConditions>
<SpecialConditions xsi:nil="true" />
</return>
</GetVehicleLimitedInfoResponse>
</soap:Body>
</soap:Envelope>
Edited and improved answer:
import xml.etree.ElementTree as ET
import re
ns = {"veh": "http://schemas.conversesolutions.com/xsd/dmticta/v1"}
tree = ET.parse('test.xml') # save your xml as test.xml
root = tree.getroot()
def get_tag_name(tag):
    return re.sub(r'\{.*\}', '', tag)
for node in root.find(".//veh:return", ns):
    print(get_tag_name(node.tag)+': ', node.text)
It should produce something like this:
ResponseMessage: None
ErrorCode: None
RequestId: 2012290007705
TransactionCharge: 150
VehicleNumber: GF-0176
AbsoluteOwner: SIYAPATHA FINANCE PLC
EngineNo: GA15-483936F
ClassOfVehicle: MOTOR CAR
Make: NISSAN
Model: PULSAR
YearOfManufacture: 1998
NoOfSpecialConditions: 0
SpecialConditions: None
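If only one element is needed, as in the question's attempt with `Model`, the same namespace mapping can be passed to `find`. The sketch below uses a shortened inline copy of the SOAP response (an assumption for illustration) and also shows why the unqualified `iter('Model')` from the question matches nothing.

```python
import xml.etree.ElementTree as ET

SOAP_XML = """<soap:Envelope
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetVehicleLimitedInfoResponse
        xmlns="http://schemas.conversesolutions.com/xsd/dmticta/v1">
      <return>
        <Make>NISSAN</Make>
        <Model>PULSAR</Model>
      </return>
    </GetVehicleLimitedInfoResponse>
  </soap:Body>
</soap:Envelope>"""

ns = {"veh": "http://schemas.conversesolutions.com/xsd/dmticta/v1"}
root = ET.fromstring(SOAP_XML)

# The default namespace applies to <Model>, so a bare tag name finds nothing.
unqualified = list(root.iter("Model"))

# With the prefix mapped to the namespace URI, the element is found.
model = root.find(".//veh:Model", ns)
model_text = model.text if model is not None else None
print(model_text)
```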

How to add space before and after CDATA in XML file

I want to create a function to modify XML content without changing the format. I managed to change the text but I can't do it without changing the format in XML.
So now, what I wanted to do is to add space before and after CDATA in a XML file.
Default XML file:
<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row> <![CDATA[001 001 001]]> </Row>
</Data>
</Device>
</Map>
</Maps>
And I am getting this result:
<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row><![CDATA[001 001 099]]></Row>
</Data>
</Device>
</Map>
</Maps>
However, I want the new xml to be like this:
<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row> <![CDATA[001 001 099]]> </Row>
</Data>
</Device>
</Map>
</Maps>
Here is my code:
from lxml import etree as ET
def xml_new(f,fpath,newtext,xmlrow):
    xmlrow = 19
    parser = ET.XMLParser(strip_cdata=False)
    tree = ET.parse(f, parser)
    root = tree.getroot()
    for child in root:
        value = child[0][2][xmlrow].text
        text = ET.CDATA("001 001 099")
        child[0][2][xmlrow] = ET.Element('Row')
        child[0][2][xmlrow].text = text
        child[0][2][xmlrow].tail = "\n"
    ET.register_namespace('A', "http://www.semi.org")
    tree.write(fpath,encoding='utf-8',xml_declaration=True)
    return value
Can anyone help me with this? Thanks in advance!
I don't quite understand what you want to do. Here's an example for you. I don't know if it can meet your needs.
from simplified_scrapy import SimplifiedDoc,req,utils
html ='''<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row> <![CDATA[001 001 001]]> </Row>
</Data>
</Device>
</Map>
</Maps>'''
doc = SimplifiedDoc(html)
row = doc.Data.Row # Get the node you want to modify.
row.setContent(" "+row.html+" ") # Modify the node content.
print (doc.html)
Result:
<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice />
<Bin>
<Bin Bin="001" />
</Bin>
<Data>
<Row> <![CDATA[001 001 001]]> </Row>
</Data>
</Device>
</Map>
</Maps>
Thanks for all your help. I have found another way to achieve the result I want.
This is the code:
# what you want to change
replaceby = '020]]> </Row>\n'
# row you want to change
row = 1
# col you want to change based on list
col = 3
file = open(file,'r')
line = file.readlines()
i = 0
editedXML=[]
for l in line:
    if 'cdata' in l.lower():
        i=i+1
        if i == row:
            oldVal = l.split(' ')
            newVal = []
            for index, old in enumerate(oldVal):
                if index == col:
                    newVal.append(replaceby)
                else:
                    newVal.append(old)
            editedXML.append(' '.join(newVal))
        else:
            editedXML.append(l)
    else:
        editedXML.append(l)
file2 = open(newfile,'w')
file2.write(''.join(editedXML))
file2.close()
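The same text-level idea can be written with a regular expression, so that everything outside the CDATA body, including the spaces around it, is left exactly as it was. This is only a sketch: the function name `replace_cdata` is hypothetical, and it assumes the new value never contains the sequence `]]>`.

```python
import re

# Non-greedy match of one CDATA section; group 1 is its body.
CDATA_PATTERN = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


def replace_cdata(xml_text, occurrence, new_value):
    """Replace the body of the n-th CDATA section (1-based), leaving
    every other character of the document untouched."""
    counter = {"seen": 0}

    def substitute(match):
        counter["seen"] += 1
        if counter["seen"] == occurrence:
            return "<![CDATA[" + new_value + "]]>"
        return match.group(0)  # keep all other sections as they are

    return CDATA_PATTERN.sub(substitute, xml_text)


before = "<Data>\n<Row> <![CDATA[001 001 001]]> </Row>\n</Data>\n"
after = replace_cdata(before, 1, "001 001 099")
```

Since the document is never parsed and re-serialized, its original formatting is preserved by construction.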

Large XML parsing in Python

I am a novice in python and have the following task on hand.
I have a large xml file like the one below:
<Configuration>
<Parameters>
<Component Name='ABC'>
<Group Name='DEF'>
<Parameter Name='GHI'>
<Description>
Some Text
</Description>
<Type>Integer</Type>
<Restriction>
<Level>5</Level>
</Restriction>
<Value>
<Item Value='5'/>
</Value>
</Parameter>
<Parameter Name='JKL'>
<Description>
Some Text
</Description>
<Type>Integer</Type>
<Restriction>
<Level>5</Level>
</Restriction>
<Value>
<Item Value='5'/>
</Value>
</Parameter>
</Group>
<Group Name='MNO'>
<Parameter Name='PQR'>
<Description>
Some Text
</Description>
<Type>Integer</Type>
<Restriction>
<Level>5</Level>
</Restriction>
<Value>
<Item Value='5'/>
</Value>
</Parameter>
<Parameter Name='TUV'>
<Description>
Some Text
</Description>
<Type>Integer</Type>
<Restriction>
<Level>5</Level>
</Restriction>
<Value>
<Item Value='5'/>
</Value>
</Parameter>
</Group>
</Component>
</Parameters>
</Configuration>
In this XML file I have to navigate to the component "ABC", then to the group "MNO", then to the parameter "TUV", and change its item value to 10.
I have tried using xml.etree.cElementTree, but to no avail. lxml is not supported on the server, as it is running a very old version of Python, and I have no permission to upgrade it.
I have been using the following code to parse and edit a relatively small XML file:
def fnXMLModification(ArgStr):
    argList = ArgStr.split()
    strXMLPath = argList[0]
    if not os.path.exists(strXMLPath):
        fnlogs("XML File: " + strXMLPath + " does not exist.\n")
        return False
    try:
        import xml.etree.cElementTree as ET
    except ImportError:
        import xml.etree.ElementTree as ET
    f=open(strXMLPath, 'rt')
    tree = ET.parse(f)
    ValueSetFlag = False
    AttrSetFlag = False
    for strXPath in argList[1:]:
        strXPathList = strXPath.split("[")
        sxPath = strXPathList[0]
        if len(strXPathList)==3:
            # both present
            AttrSetFlag = True
            ValueSetFlag = True
            valToBeSet = strXPathList[1].strip("]")
            sAttr = strXPathList[2].strip("]")
            attrList = sAttr.split(",")
        elif len(strXPathList) == 2:
            #anyone present
            if "=" in strXPathList[1]:
                AttrSetFlag = True
                sAttr = strXPathList[1].strip("]")
                attrList = sAttr.split(",")
            else:
                ValueSetFlag = True
                valToBeSet = strXPathList[1].strip("]")
        node = tree.find(sxPath)
        if AttrSetFlag:
            for att in attrList:
                slist = att.split("=")
                node.set(slist[0].strip(),slist[1].strip())
        if ValueSetFlag:
            node.text = valToBeSet
    tree.write(strXMLPath)
    fnlogs("XML File: " + strXMLPath + " has been modified successfully.\n")
    return True
Using this function I am not able to traverse the current XML, as it has a lot of child elements, attributes and subgroups.
Import statement:
import xml.etree.cElementTree as ET
Parse content by fromstring method.
root = ET.fromstring(data)
Iterate as required to find the target Item tag and change its Value attribute:
for component_tag in root.iter("Component"):
    if "Name" in component_tag.attrib and component_tag.attrib['Name']=='ABC':
        for group_tag in component_tag.iter("Group"):
            if "Name" in group_tag.attrib and group_tag.attrib['Name']=='MNO':
                #for value_tag in group_tag.iter("Value"):
                for item_tag in group_tag.findall("Parameter[@Name='TUV']/Value/Item"):
                    item_tag.attrib["Value"] = "10"
Alternatively, we can use XPath to get the target Item tag directly:
for item_tag in root.findall("Parameters/Component[@Name='ABC']/Group[@Name='MNO']/Parameter[@Name='TUV']/Value/Item"):
    item_tag.attrib["Value"] = "10"
Use tostring method to get content.
data = ET.tostring(root)
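The attribute predicates used above may not be available in the ElementTree bundled with the very old Python mentioned in the question. As a hedged sketch for that case, the tree can be walked with plain tag paths and an explicit attribute check. The helper `children_named` and the shortened sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

CONFIG_XML = """<Configuration>
  <Parameters>
    <Component Name='ABC'>
      <Group Name='DEF'>
        <Parameter Name='TUV'><Value><Item Value='5'/></Value></Parameter>
      </Group>
      <Group Name='MNO'>
        <Parameter Name='TUV'><Value><Item Value='5'/></Value></Parameter>
      </Group>
    </Component>
  </Parameters>
</Configuration>"""


def children_named(parent, tag, name):
    """Yield direct children with the given tag whose Name attribute matches."""
    for child in parent.findall(tag):
        if child.get("Name") == name:
            yield child


root = ET.fromstring(CONFIG_XML)
for component in children_named(root.find("Parameters"), "Component", "ABC"):
    for group in children_named(component, "Group", "MNO"):
        for parameter in children_named(group, "Parameter", "TUV"):
            for item in parameter.findall("Value/Item"):
                item.set("Value", "10")

# Collect all Item values in document order to check the result.
values = [item.get("Value") for item in root.findall(".//Item")]
```

Only the Item under group "MNO" is changed; the parameter with the same name under group "DEF" keeps its value.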

Proper way to convert xml to dictionary

I'm not sure if this is the best way to convert this XML result into a dictionary. Is there a more proper way to convert it to a dict?
XML from the HTTP request result:
<Values version="2.0">
<value name="configuration">test</value>
<array name="configurationList" type="value" depth="1">
<value>test</value>
</array>
<value name="comment">Upload this for our robot.</value>
<array name="propertiesTable" type="record" depth="1">
<record javaclass="com.wm.util.Values">
<value name="name">date_to_go</value>
<value name="value">1990</value>
</record>
<record javaclass="com.wm.util.Values">
<value name="name">role</value>
<value name="value">Survivor</value>
</record>
<record javaclass="com.wm.util.Values">
<value name="name">status</value>
<value name="value">living</value>
</record>
<record javaclass="com.wm.util.Values">
<value name="name">user</value>
<value name="value">John Connor</value>
</record>
</array>
<null name="propertiesList"/>
</Values>
Code to convert the xml to dictionary ( which is working properly )
from xml.etree import ElementTree
tree = ElementTree.fromstring(xml)
mom = []
mim = []
configuration = tree.find('value[@name="configuration"]').text
comment = tree.find('value[@name="comment"]').text
prop = (configuration, comment)
mom.append(prop)
for records in tree.findall('./array/record'):
    me = []
    for child in records.iter('value'):
        me.append(child.text)
    mim.append(me)
for key, value in mim:
    mi_dict = dict()
    mi_dict[key] = value
    mom.append(mi_dict)
print(mom)
The result ( working as intended ):
[('test', 'Upload this for our robot.'), {'date_to_go': '1990'}, {'role': 'Survivor'}, {'status': 'living'}, {'user': 'John Connor'}]
EDIT:
Sorry if I wasn't clear: the code described is working as expected, but I'm not sure if this is the proper (Pythonic, clean) way to do it.
Thanks in advance.
I don't think it's too bad. You can make some minor changes to make it a bit more Pythonic:
from xml.etree import ElementTree
tree = ElementTree.fromstring(xml)
mom = []
mim = []
configuration = tree.find('value[@name="configuration"]').text
comment = tree.find('value[@name="comment"]').text
prop = (configuration, comment)
mom.append(prop)
for records in tree.findall('./array/record'):
    mim.append([child.text for child in records.iter('value')])
mom += [{k: v} for k, v in mim]
print(mom)
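If a single dictionary is preferred over a list of one-item dictionaries, the records can be folded into one mapping. This is a sketch of one possible shape, not the asker's required output: the `properties` key is an invented name, and the sample XML is a shortened copy of the response.

```python
from xml.etree import ElementTree

XML_TEXT = """<Values version="2.0">
  <value name="configuration">test</value>
  <value name="comment">Upload this for our robot.</value>
  <array name="propertiesTable" type="record" depth="1">
    <record javaclass="com.wm.util.Values">
      <value name="name">date_to_go</value>
      <value name="value">1990</value>
    </record>
    <record javaclass="com.wm.util.Values">
      <value name="name">role</value>
      <value name="value">Survivor</value>
    </record>
  </array>
</Values>"""

tree = ElementTree.fromstring(XML_TEXT)

# Top-level <value> elements become plain keys.
result = {v.get("name"): v.text for v in tree.findall("value")}

# Each <record> holds one name/value pair; collect them under one key.
result["properties"] = {
    record.find("value[@name='name']").text:
        record.find("value[@name='value']").text
    for record in tree.findall("array/record")
}
```

Looking up the `name` and `value` children by attribute avoids depending on their order inside each record.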

replace only first occurrence of field/word on a file

I have some zipfiles ( 700+ ) with the following structure ( the file is exactly like this )
<?xml version="1.0" encoding="UTF-8"?>
<Values version="2.0">
<record name="trigger">
<value name="uniqueId">6xjUCpDlrTVHRsEVmxx0Ews6ni8=</value>
<value name="processingSuspended">false</value>
<value name="retrievalSuspended">false</value>
</record>
<record name="trigger">
<value name="uniqueId">6xjUCpDlrTVHRsEVmxx0Ews6ni8=</value>
<value name="processingSuspended">false</value>
<value name="retrievalSuspended">false</value>
</record>
</Values>
What I would like to achieve is to set the fields processingSuspended and retrievalSuspended to false, regardless of whether their current value is true or false, but only for their first occurrence.
EDIT:
By request, I'm adding what I have so far. It gets the fields that I want, but I believe there is a simpler way to do it:
import os
import zipfile
import glob
import time
import re
def main():
    rList = []
    for z in glob.glob("*.zip"):
        root = zipfile.ZipFile(z)
        for filename in root.namelist():
            if filename.find("node.ndf") >= 0:
                for line in root.read(filename).split("\n"):
                    if line.find("broker-trigger") >= 0:
                        for iline in root.read(filename).split("\n"):
                            Values = dict()
                            #match Processing state
                            if iline.find("processingSuspended") >= 0:
                                mpr = re.search(r'(.*>)(.*?)(<.*)',
                                                iline, re.M|re.I)
                            #match Retrieval state
                            if iline.find("retrievalSuspended") >= 0:
                                mr = re.search(r'(.*>)(.*?)(<.*)',
                                               iline, re.M|re.I)
                                Values['processingSuspended'] = mpr.group(2)
                                Values['retrievalSuspended'] = mr.group(2)
                                #print mr.group(2)
                                rList.append(Values)
    print rList
if __name__== "__main__":
    main()
Thanks in advance.
Try using lxml:
>>> xml = '''\
<?xml version="1.0" encoding="UTF-8"?>
<Values version="2.0">
<record name="trigger">
<value name="uniqueId">6xjUCpDlrTVHRsEVmxx0Ews6ni8=</value>
<value name="processingSuspended">true</value>
<value name="retrievalSuspended">true</value>
</record>
<record name="trigger">
<value name="uniqueId">6xjUCpDlrTVHRsEVmxx0Ews6ni8=</value>
<value name="processingSuspended">true</value>
<value name="retrievalSuspended">true</value>
</record>
</Values>\
'''
>>> from lxml import etree
>>> tree = etree.fromstring(xml)
>>> tree.xpath('//value[@name="processingSuspended"]')[0].text = 'false'
>>> tree.xpath('//value[@name="retrievalSuspended"]')[0].text = 'false'
The XPath expression '//value[@name="processingSuspended"]' finds all value tags whose name attribute equals "processingSuspended". Then we just take the first one with [0] and change the tag's text to 'false'.
Output:
>>> print(etree.tostring(tree, pretty_print=True))
<Values version="2.0">
<record name="trigger">
<value name="uniqueId">6xjUCpDlrTVHRsEVmxx0Ews6ni8=</value>
<value name="processingSuspended">false</value>
<value name="retrievalSuspended">false</value>
</record>
<record name="trigger">
<value name="uniqueId">6xjUCpDlrTVHRsEVmxx0Ews6ni8=</value>
<value name="processingSuspended">true</value>
<value name="retrievalSuspended">true</value>
</record>
</Values>
>>>
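If lxml is not available, roughly the same thing can be sketched with the standard library, since `find` in xml.etree.ElementTree returns only the first match. The inline XML below is a shortened copy of the example, used here as an assumption for illustration.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """<Values version="2.0">
  <record name="trigger">
    <value name="processingSuspended">true</value>
    <value name="retrievalSuspended">true</value>
  </record>
  <record name="trigger">
    <value name="processingSuspended">true</value>
    <value name="retrievalSuspended">true</value>
  </record>
</Values>"""

root = ET.fromstring(XML_TEXT)
for field in ("processingSuspended", "retrievalSuspended"):
    # find() returns the first matching element in document order, or None.
    first = root.find(".//value[@name='%s']" % field)
    if first is not None:
        first.text = "false"

# Collect the text of every value element to check the result.
texts = [v.text for v in root.iter("value")]
```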
You can read the zip archives and update the xml formatted data in the file they contain with Python's built-in modules. There's even a tutorial in the documentation for xml.etree.ElementTree.
import glob
import xml.etree.ElementTree as ET
import zipfile
def main():
    for z in glob.glob("*.zip"):
        print('processing file: {!r}'.format(z))
        zfile = zipfile.ZipFile(z)
        for filename in zfile.namelist():
            print('processing archive member: {!r} in {}'.format(filename, z))
            contents = zfile.open(filename).read()
            print('Before changes:')
            print(contents)
            root = ET.fromstring(contents)
            if root.tag != "Values" or root.attrib["version"] != "2.0":
                print('unsupported xml file')
                break
            if (root[0][1].tag == "value" and
                    root[0][1].attrib["name"] == "processingSuspended"):
                root[0][1].text = "false"
            else:
                print('expected "processingSuspended" value field not found')
                break
            if (root[0][2].tag == "value" and
                    root[0][2].attrib["name"] == "retrievalSuspended"):
                root[0][2].text = "false"
            else:
                print('expected "retrievalSuspended" value field not found')
                break
            print('After changes:')
            updated_contents = ET.tostring(root)
            print(updated_contents)
if __name__ == "__main__":
    main()
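The code above only prints the updated XML. The zipfile module has no way to modify a member in place, so saving the change means writing a new archive. The following is a sketch under stated assumptions: the helper names `rewrite_archive` and `unsuspend_first_record` are invented, the member name `node.ndf` is taken from the question's code, and the archive is built in memory so the example is self-contained.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def rewrite_archive(zip_bytes, edit_member):
    """Return new zip bytes where every member has passed through edit_member."""
    source = zipfile.ZipFile(io.BytesIO(zip_bytes))
    target_buffer = io.BytesIO()
    with zipfile.ZipFile(target_buffer, "w", zipfile.ZIP_DEFLATED) as target:
        for info in source.infolist():
            contents = source.read(info.filename)
            target.writestr(info.filename, edit_member(info.filename, contents))
    source.close()
    return target_buffer.getvalue()


def unsuspend_first_record(filename, contents):
    """Set both flags of the first record to 'false'; pass other files through."""
    if not filename.endswith("node.ndf"):
        return contents
    root = ET.fromstring(contents)
    for field in ("processingSuspended", "retrievalSuspended"):
        first = root.find(".//value[@name='%s']" % field)
        if first is not None:
            first.text = "false"
    return ET.tostring(root, encoding="UTF-8")


# Build a tiny archive in memory to exercise the helpers.
SAMPLE_XML = (
    '<Values version="2.0">'
    '<record name="trigger">'
    '<value name="processingSuspended">true</value>'
    '<value name="retrievalSuspended">true</value>'
    '</record>'
    '<record name="trigger">'
    '<value name="processingSuspended">true</value>'
    '<value name="retrievalSuspended">true</value>'
    '</record>'
    '</Values>'
)
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("ns/node.ndf", SAMPLE_XML)
    archive.writestr("readme.txt", "unchanged")
updated_zip = rewrite_archive(sample.getvalue(), unsuspend_first_record)
```

For files on disk, the returned bytes could be written to a temporary file that then replaces the original archive, so a failure midway does not leave a half-written zip.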
