How to parse an XML file to a list? - python

I am trying to parse an XML file into a list with Python. I have looked at some solutions on this site and others but could not make them work for me. I have managed to do it, but in a laborious way that seems clumsy; there should be an easier way.
I have tried to adapt other people's code to suit my needs, but that is not working because I am not always sure what I am reading.
This is the XML file:
<?xml version="1.0"?>
<configuration>
<location name ="location">
<latitude>54.637348</latitude>
<latHemi>N</latHemi>
<longitude>5.829723</longitude>
<longHemi>W</longHemi>
</location>
<microphone name="microphone">
<sensitivity>-26.00</sensitivity>
</microphone>
<weighting name="weighting">
<cWeight>68</cWeight>
<aWeight>2011</aWeight>
</weighting>
<optionalLevels name="optionalLevels">
<L95>95</L95>
<L90>90</L90>
<L50>50</L50>
<L10>10</L10>
<L05>05</L05>
<fmax>fmax</fmax>
</optionalLevels>
<averagingPeriod name="averagingPeriod">
<onemin>1</onemin>
<fivemin>5</fivemin>
<tenmin>10</tenmin>
<fifteenmin>15</fifteenmin>
<thirtymin>30</thirtymin>
</averagingPeriod>
<timeWeighting name="timeWeighting">
<fast>fast</fast>
<slow>slow</slow>
</timeWeighting>
<rebootTime name="rebootTime">
<midnight>midnight</midnight>
<sevenAm>7am</sevenAm>
<sevenPm>7pm</sevenPm>
<elevenPm>23pm</elevenPm>
</rebootTime>
<remoteUpload name="remoteUpload">
<nointernet>nointernet</nointernet>
<vodafone>vodafone</vodafone>
</remoteUpload>
</configuration>
And this is the Python program.
#!/usr/bin/python
import xml.etree.ElementTree as ET
import os
# Prefer the faster C implementation where it exists (older Pythons only).
try:
    import cElementTree as ET
except ImportError:
    try:
        import xml.etree.cElementTree as ET
    except ImportError:
        pass  # keep the xml.etree.ElementTree module imported above
file_name = ('/home/mark/Desktop/Practice/config_settings.xml')
full_file = os.path.abspath(os.path.join('data', file_name))
dom = ET.parse(full_file)
tree = ET.parse(full_file)
root = tree.getroot()
location_settings = dom.findall('location')
mic_settings = dom.findall('microphone')
weighting = dom.findall('weighting')
olevels = dom.findall('optionalLevels')
avg_period = dom.findall('averagingPeriod')
time_weight = dom.findall('timeWeighting')
reboot = dom.findall('rebootTime')
remote_upload = dom.findall('remoteUpload')
for i in location_settings:
    latitude = i.find('latitude').text
    latHemi = i.find('latHemi').text
    longitude = i.find('longitude').text
    longHemi = i.find('longHemi').text
for i in mic_settings:
    sensitivity = i.find('sensitivity').text
for i in weighting:
    cWeight = i.find('cWeight').text
    aWeight = i.find('aWeight').text
for i in olevels:
    L95 = i.find('L95').text
    L90 = i.find('L90').text
    L50 = i.find('L50').text
    L10 = i.find('L10').text
    L05 = i.find('L05').text
for i in avg_period:
    onemin = i.find('onemin').text
    fivemin = i.find('fivemin').text
    tenmin = i.find('tenmin').text
    fifteenmin = i.find('fifteenmin').text
    thirtymin = i.find('thirtymin').text
for i in time_weight:
    fast = i.find('fast').text
    slow = i.find('slow').text
for i in reboot:
    midnight = i.find('midnight').text
    sevenAm = i.find('sevenAm').text
    sevenPm = i.find('sevenPm').text
    elevenPm = i.find('elevenPm').text
for i in remote_upload:
    nointernet = i.find('nointernet').text
    vodafone = i.find('vodafone').text
config_list = [latitude,latHemi,longitude,longHemi,sensitivity,aWeight,cWeight,
               L95,L90,L50,L10,L05,onemin,fivemin,tenmin,fifteenmin,thirtymin,
               fast,slow,midnight,sevenAm,sevenPm,elevenPm,nointernet,vodafone]
print(config_list)

The problem you're posing isn't very well defined: the XML structure doesn't map naturally onto a flat list to begin with. If you're new to Python, I think the best way to go about this is to use something like xmltodict, which parses the implicit schema in your XML into Python data structures (nested dictionaries).
For example:
import xmltodict
xml = """<?xml version="1.0"?>
<configuration>
<location name ="location">
<latitude>54.637348</latitude>
<latHemi>N</latHemi>
<longitude>5.829723</longitude>
<longHemi>W</longHemi>
</location>
<microphone name="microphone">
<sensitivity>-26.00</sensitivity>
</microphone>
<weighting name="weighting">
<cWeight>68</cWeight>
<aWeight>2011</aWeight>
</weighting>
<optionalLevels name="optionalLevels">
<L95>95</L95>
<L90>90</L90>
<L50>50</L50>
<L10>10</L10>
<L05>05</L05>
<fmax>fmax</fmax>
</optionalLevels>
<averagingPeriod name="averagingPeriod">
<onemin>1</onemin>
<fivemin>5</fivemin>
<tenmin>10</tenmin>
<fifteenmin>15</fifteenmin>
<thirtymin>30</thirtymin>
</averagingPeriod>
<timeWeighting name="timeWeighting">
<fast>fast</fast>
<slow>slow</slow>
</timeWeighting>
<rebootTime name="rebootTime">
<midnight>midnight</midnight>
<sevenAm>7am</sevenAm>
<sevenPm>7pm</sevenPm>
<elevenPm>23pm</elevenPm>
</rebootTime>
<remoteUpload name="remoteUpload">
<nointernet>nointernet</nointernet>
<vodafone>vodafone</vodafone>
</remoteUpload>
</configuration>"""
d = xmltodict.parse(xml)
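If a flat list really is what you need, the standard library alone may be enough. The following is a minimal sketch, assuming every setting is a grandchild of the root element as in the file above. The XML is a shortened copy of the question's file, and config_list and config_dict are illustrative names, not anything from the original program.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the configuration file from the question.
xml_text = """<?xml version="1.0"?>
<configuration>
  <location name="location">
    <latitude>54.637348</latitude>
    <latHemi>N</latHemi>
  </location>
  <microphone name="microphone">
    <sensitivity>-26.00</sensitivity>
  </microphone>
</configuration>"""

root = ET.fromstring(xml_text)

# Flat list: the text of every setting, in document order.
config_list = [setting.text for section in root for setting in section]

# Nested dict: keeps the section and setting names next to the values.
config_dict = {
    section.tag: {setting.tag: setting.text for setting in section}
    for section in root
}

print(config_list)
print(config_dict["microphone"]["sensitivity"])
```

The list form loses the element names, so the dictionary form is probably easier to use from another program.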

Thanks for the comments. Sorry if the question was not well posed. I have found an answer myself. I was looking to parse the XML child elements into a list for later use in another program. I figured it out. Thank you for your patience.

Related

python lxml etree namespaces on creation

I am using lxml etree to create XML for a REST call. I have a problem with namespaces: if they are not formulated correctly, I get a syntax error from the server.
As you can see in the following two examples, I should be getting prefixes such as ns1, ns2, ns4 and ns5, but the generated XML uses prefixes such as ns15 and ns16, each declared on the element itself, e.g. xmlns:ns1="ns2". I understand what those declarations mean, but for my REST call I need the XML exactly as in the first example.
How can I prevent that?
I have to get the following XML:
<ns5:prenosPodatkovRazporedaZahtevaSporocilo xmlns="http://xxx.yyy/sheme/pdr/skupno/v1" xmlns:ns2="http://xxx.yyy/sheme/pdr/v1" xmlns:ns3="http://xxx.yyy/sheme/kis/skupno/v2" xmlns:ns4="http://xxx.yyy/sheme/kis/v2" xmlns:ns5="http://xxx.yyy/sheme/pdr/sporocila/v1">
<ns5:podatkiRazporeda>
<ns2:podatkiRazporeda>
<ns2:delitvenaEnota>
<sifra>80</sifra>
</ns2:delitvenaEnota>
<ns2:vrstaRazporeda>
<sifra>4</sifra>
</ns2:vrstaRazporeda>
<ns2:tipRazporeda>
<sifra>D</sifra>
</ns2:tipRazporeda>
<ns2:obdobje>
<ns2:mesec>12</ns2:mesec>
<ns2:leto>2017</ns2:leto>
</ns2:obdobje>
<ns2:skupina>0</ns2:skupina>
<ns2:izvor>P_738</ns2:izvor>
<ns2:oznakeDelaZaDneve>
<ns2:oznakaDelaZaDan>
<ns2:dan>1</ns2:dan>
<ns2:oznakaDela>D4</ns2:oznakaDela>
</ns2:oznakaDelaZaDan>
....
</ns2:oznakeDelaZaDneve>
<ns2:organizacijskaEnota>
<sifra>738</sifra>
</ns2:organizacijskaEnota>
<ns2:zaposlenec>
<ns4:osebnaStevilka>10357</ns4:osebnaStevilka>
</ns2:zaposlenec>
</ns2:podatkiRazporeda>
</ns5:podatkiRazporeda>
But this is the XML I am getting.
Note the namespace prefixes.
<ns0:prenosPodatkovRazporedaOdgovorSporocilo xmlns:ns="http://rccirc.si/sheme/pdr/skupno/v1" xmlns:ns2="http://rccirc.si/sheme/pdr/v1" xmlns:ns3="http://rccirc.si/sheme/kis/skupno/v2" xmlns:ns4="http://rccirc.si/sheme/kis/v2" xmlns:ns5="http://rccirc.si/sheme/pdr/sporocila/v1" xmlns:ns0="ns5">
<ns0:podatkiRazporeda>
<ns1:podatkiRazporeda xmlns:ns1="ns2">
<ns1:vrstaRazporeda>
<sifra>647</sifra>
</ns1:vrstaRazporeda>
<ns1:tipRazporeda>
<sifra>D</sifra>
</ns1:tipRazporeda>
<ns1:obdobje>
<ns1:mesec>1</ns1:mesec>
<ns1:leto>2018</ns1:leto>
</ns1:obdobje>
<ns1:skupina>0</ns1:skupina>
<ns1:izvor>0</ns1:izvor>
<ns1:organizacijskaEnota>
<sifra>250</sifra>
</ns1:organizacijskaEnota>
<ns6:delitvenaenota xmlns:ns6="ns3">
<sifra>80</sifra>
</ns6:delitvenaenota>
<ns1:oznakeDelaZaDneve>
<oznakeDelaZaDneve>
<ns1:dan>29</ns1:dan>
<ns1:oznakaDela>1930-0730</ns1:oznakaDela>
</oznakeDelaZaDneve>
</ns1:oznakeDelaZaDneve>
<ns1:zaposlenec>
<ns7:osebnaStevilka xmlns:ns7="ns4">Z1</ns7:osebnaStevilka>
</ns1:zaposlenec>
</ns1:podatkiRazporeda>
.......
<ns11:podatkiRazporeda xmlns:ns11="ns2">
<ns11:vrstaRazporeda>
<sifra>647</sifra>
</ns11:vrstaRazporeda>
<ns11:tipRazporeda>
<sifra>D</sifra>
</ns11:tipRazporeda>
<ns11:obdobje>
<ns11:mesec>1</ns11:mesec>
<ns11:leto>2018</ns11:leto>
</ns11:obdobje>
<ns11:skupina>0</ns11:skupina>
<ns11:izvor>0</ns11:izvor>
<ns11:organizacijskaEnota>
<sifra>250</sifra>
</ns11:organizacijskaEnota>
<ns12:delitvenaenota xmlns:ns12="ns3">
<sifra>80</sifra>
</ns12:delitvenaenota>
<ns11:oznakeDelaZaDneve>
<oznakeDelaZaDneve>
<ns11:dan>3</ns11:dan>
<ns11:oznakaDela>0730-1530</ns11:oznakaDela>
</oznakeDelaZaDneve>
.....
</ns11:oznakeDelaZaDneve>
<ns11:zaposlenec>
<ns13:osebnaStevilka xmlns:ns13="ns4">Z1</ns13:osebnaStevilka>
</ns11:zaposlenec>
</ns11:podatkiRazporeda>
</ns0:podatkiRazporeda>
</ns0:prenosPodatkovRazporedaOdgovorSporocilo>
Here is my code.
root = etree.Element('{ns5}prenosPodatkovRazporedaOdgovorSporocilo', nsmap = {'ns': "http://xxx.yyy/sheme/pdr/skupno/v1", 'ns2': "http://xxx.yyy/sheme/pdr/v1", 'ns3': "http://xxx.yyy/sheme/kis/skupno/v2", 'ns4': "http://xxx.yyy/sheme/kis/v2", 'ns5': "http://xxx.yyy/sheme/pdr/sporocila/v1"})
podatkiRazporedaMain = etree.SubElement(root, '{ns5}podatkiRazporeda')
#followed by creating sub elements etc.
for rec in grouped_workers:
    podatkiRazporeda = etree.SubElement(podatkiRazporedaMain, '{ns2}podatkiRazporeda')
    vrstaRazporeda= etree.SubElement(podatkiRazporeda, '{ns2}vrstaRazporeda')
    vrstaRazporedaSifra = etree.SubElement(vrstaRazporeda, 'sifra')
    vrstaRazporedaSifra.text = "647"
    tipRazporeda= etree.SubElement(podatkiRazporeda, '{ns2}tipRazporeda')
    tipRazporedaSifra = etree.SubElement(tipRazporeda, 'sifra')
    tipRazporedaSifra.text = 'D'
    for rr in rec["data"]:
        oznakaDelaZaDan = etree.SubElement(oznakeDelaZaDneve, 'oznakeDelaZaDneve')
        dan= etree.SubElement(oznakaDelaZaDan, '{ns2}dan')
        dan.text = str(rr["rw_date"].day)
        oznakaDela = etree.SubElement(oznakaDelaZaDan, '{ns2}oznakaDela')
        oznakaDela.text = str(rr["rw_shift"])
#print etree.tostring(root, pretty_print=True, xml_declaration=False, encoding='UTF-8')
fle = os.path.join(request.folder, 'private', str(647) + '.xml')
with open(fle, 'wb') as f:
    f.write(etree.tostring(root, pretty_print=True, xml_declaration=False, encoding='UTF-8'))#,inclusive_ns_prefixes=None))
#etree..write(fle, pretty_print=True, xml_declaration=False, encoding='UTF-8')
print "Done"
So why are the ns prefixes incremented?
I hope I was clear.
Thank you.
As it turns out, when you are creating tags you should not put the prefix inside the braces:
vrstaRazporeda= etree.SubElement(podatkiRazporeda, '{ns2}vrstaRazporeda')
vrstaRazporedaSifra = etree.SubElement(vrstaRazporeda, 'sifra').text = "647"
but the full namespace URI:
vrstaRazporeda= etree.SubElement(podatkiRazporeda, '{http://xxx.yyy/sheme/pdr/v1}vrstaRazporeda')
vrstaRazporedaSifra = etree.SubElement(vrstaRazporeda, 'sifra').text = "647"
lxml treats whatever is inside the braces as the namespace URI itself, so '{ns2}' creates a namespace whose URI is literally "ns2" and a fresh prefix is generated for it. Using the whole URL solved the issue.
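The same `{uri}tag` notation can be tried without lxml. The following is a minimal sketch using the standard library's xml.etree.ElementTree rather than lxml's nsmap. One assumption to note: ElementTree's register_namespace() rejects prefixes of the form ns followed by digits, so the prefix "pdr" is used here purely for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the question; the "pdr" prefix is illustrative.
PDR_NS = "http://xxx.yyy/sheme/pdr/v1"
ET.register_namespace("pdr", PDR_NS)

# The braces hold the full namespace URI, never the prefix.
root = ET.Element("{%s}podatkiRazporeda" % PDR_NS)
vrsta = ET.SubElement(root, "{%s}vrstaRazporeda" % PDR_NS)
sifra = ET.SubElement(vrsta, "sifra")  # no braces: element without a namespace
sifra.text = "647"

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because both elements share one URI, the serializer should emit a single namespace declaration and reuse the prefix, instead of inventing a new prefix per element.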

Python xml parsing etree find element X by position

I'm trying to parse the following xml to pull out certain data then eventually edit the data as needed.
Here is the xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<CHECKLIST>
<VULN>
<STIG_DATA>
<VULN_ATTRIBUTE>Vuln_Num</VULN_ATTRIBUTE>
<ATTRIBUTE_DATA>V-38438</ATTRIBUTE_DATA>
</STIG_DATA>
<STIG_DATA>
<VULN_ATTRIBUTE>Rule_Title</VULN_ATTRIBUTE>
<ATTRIBUTE_DATA>More text.</ATTRIBUTE_DATA>
</STIG_DATA>
<STIG_DATA>
<VULN_ATTRIBUTE>Vuln_Discuss</VULN_ATTRIBUTE>
<ATTRIBUTE_DATA>Some text here</ATTRIBUTE_DATA>
</STIG_DATA>
<STIG_DATA>
<VULN_ATTRIBUTE>IA_Controls</VULN_ATTRIBUTE>
<ATTRIBUTE_DATA></ATTRIBUTE_DATA>
</STIG_DATA>
<STIG_DATA>
<VULN_ATTRIBUTE>Rule_Ver</VULN_ATTRIBUTE>
<ATTRIBUTE_DATA>Gen000000</ATTRIBUTE_DATA>
</STIG_DATA>
<STATUS>NotAFinding</STATUS>
<FINDING_DETAILS></FINDING_DETAILS>
<COMMENTS></COMMENTS>
<SEVERITY_OVERRIDE></SEVERITY_OVERRIDE>
<SEVERITY_JUSTIFICATION></SEVERITY_JUSTIFICATION>
</VULN>
The data that I'm looking to pull from this is the STATUS, the COMMENTS and the ATTRIBUTE_DATA directly following the VULN_ATTRIBUTE whose text is Rule_Ver. So in this example I should get the following:
Gen000000 NotAFinding None
What I have so far is that I can get the Status and Comments easily, but I can't figure out the ATTRIBUTE_DATA portion. I can find the first one (Vuln_Num), but when I tried to add an index I got a "list index out of range" error.
This is where I'm at now.
import xml.etree.ElementTree as ET
doc = ET.parse('test.ckl')
root=doc.getroot()
TagList = doc.findall("./VULN")
for curTag in TagList:
    StatusTag = curTag.find("STATUS")
    CommentTag = curTag.find("COMMENTS")
    DataTag = curTag.find("./STIG_DATA/ATTRIBUTE_DATA")
    print "GEN:[%s] Status:[%s] Comments: %s" %( DataTag.text, StatusTag.text, CommentTag.text)
This gives the following output:
GEN:[V-38438] Status:[NotAFinding] Comments: None
I want:
GEN:[Gen000000] Status:[NotAFinding] Comments: None
So the end goal is to be able to parse hundreds of these and edit the comments field as needed. I don't think the editing part will be that hard once I get the right element.
Logically I see two ways of doing this. Either go to the ATTRIBUTE_DATA[5] and grab the text or find VULN_ATTRIBUTE == Rule_Ver then grab the next ATTRIBUTE_DATA.
I have tried doing this:
DataTag = curTag.find(".//STIG_DATA//ATTRIBUTE_DATA")[5]
and
DataTag[5].text
and both give me IndexError: list index out of range
I saw lxml had get_element_by_id and xpath, but I can't add modules to this system so it is etree for me.
Thanks in advance.
One can find an element by position, but you've used the incorrect XPath syntax. Either of the following lines should work:
DataTag = curTag.find("./STIG_DATA[5]/ATTRIBUTE_DATA") # Note: 5, not 4
DataTag = curTag.findall("./STIG_DATA/ATTRIBUTE_DATA")[4] # Note: 4, not 5
However, I strongly recommend against using that. There is no guarantee that the Rule_Ver instance of STIG_DATA is always the fifth item.
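To make the off-by-one between the two forms concrete, here is a small sketch on a two-entry sample (shortened from the question's XML). XPath positions start at 1, while the list returned by findall() is indexed from 0, so both lines below select the second entry. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Two STIG_DATA entries only, shortened from the question's file.
vuln = ET.fromstring("""<VULN>
  <STIG_DATA>
    <VULN_ATTRIBUTE>Vuln_Num</VULN_ATTRIBUTE>
    <ATTRIBUTE_DATA>V-38438</ATTRIBUTE_DATA>
  </STIG_DATA>
  <STIG_DATA>
    <VULN_ATTRIBUTE>Rule_Ver</VULN_ATTRIBUTE>
    <ATTRIBUTE_DATA>Gen000000</ATTRIBUTE_DATA>
  </STIG_DATA>
</VULN>""")

# XPath predicate: positions are 1-based.
by_xpath_position = vuln.find("./STIG_DATA[2]/ATTRIBUTE_DATA")

# Python list returned by findall(): indices are 0-based.
by_list_index = vuln.findall("./STIG_DATA/ATTRIBUTE_DATA")[1]

print(by_xpath_position.text, by_list_index.text)
```

Note that find() returns a single element, so indexing its result looks up that element's children. ATTRIBUTE_DATA has none, which would explain the IndexError in the question.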
If you could change to lxml, then this works:
DataTag = curTag.xpath(
'./STIG_DATA/VULN_ATTRIBUTE[text()="Rule_Ver"]/../ATTRIBUTE_DATA')[0]
Since you can't use lxml, you must iterate the STIG_DATA elements by hand, like so:
def GetData(curTag):
    for stig in curTag.findall('STIG_DATA'):
        if stig.find('VULN_ATTRIBUTE').text == 'Rule_Ver':
            return stig.find('ATTRIBUTE_DATA')
Here is a complete program with error checking added to GetData():
import xml.etree.ElementTree as ET
doc = ET.parse('test.ckl')
root=doc.getroot()
TagList = doc.findall("./VULN")
def GetData(curTag):
    for stig in curTag.findall('STIG_DATA'):
        vuln = stig.find('VULN_ATTRIBUTE')
        if vuln is not None and vuln.text == 'Rule_Ver':
            data = stig.find('ATTRIBUTE_DATA')
            return data
for curTag in TagList:
    StatusTag = curTag.find("STATUS")
    CommentTag = curTag.find("COMMENTS")
    DataTag = GetData(curTag)
    print "GEN:[%s] Status:[%s] Comments: %s" %( DataTag.text, StatusTag.text, CommentTag.text)
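As an alternative to the hand-written loop, ElementTree's limited XPath support (version 1.3, shipped with Python 2.7 and later) includes a `[tag='text']` predicate that matches on a child's text. The following is a sketch on a shortened copy of the question's XML, written for Python 3; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

checklist = ET.fromstring("""<CHECKLIST>
  <VULN>
    <STIG_DATA>
      <VULN_ATTRIBUTE>Vuln_Num</VULN_ATTRIBUTE>
      <ATTRIBUTE_DATA>V-38438</ATTRIBUTE_DATA>
    </STIG_DATA>
    <STIG_DATA>
      <VULN_ATTRIBUTE>Rule_Ver</VULN_ATTRIBUTE>
      <ATTRIBUTE_DATA>Gen000000</ATTRIBUTE_DATA>
    </STIG_DATA>
    <STATUS>NotAFinding</STATUS>
    <COMMENTS></COMMENTS>
  </VULN>
</CHECKLIST>""")

rows = []
for vuln in checklist.findall("./VULN"):
    # Select the STIG_DATA whose VULN_ATTRIBUTE child has the text Rule_Ver.
    data = vuln.find("./STIG_DATA[VULN_ATTRIBUTE='Rule_Ver']/ATTRIBUTE_DATA")
    rule_ver = data.text if data is not None else None
    rows.append((rule_ver, vuln.findtext("STATUS"), vuln.find("COMMENTS").text))

for rule_ver, status, comments in rows:
    print("GEN:[%s] Status:[%s] Comments: %s" % (rule_ver, status, comments))
```

This does not depend on the Rule_Ver entry being in any particular position, and it needs no extra modules.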
References:
https://stackoverflow.com/a/10836343/8747
http://lxml.de/xpathxslt.html#xpath

Reg adding data to an existing XML in Python

I have to parse an xml file & modify the data in a particular tag using Python. I'm using Element Tree to do this. I'm able to parse & reach the required tag. But I'm not able to modify the value. I'm not sure if Element Tree is okay or if I should use TreeBuilder for this.
As you can see below I just want to replace the Not Executed under Verdict with a string value.
<Procedure>
<PreCondition>PRECONDITION: - ECU in extended diagnostic session (zz = 0x03) </PreCondition>
<PostCondition/>
<ProcedureID>428495</ProcedureID>
<SequenceNumber>2</SequenceNumber>
<CID>-1</CID>
<Verdict Writable="true">NotExecuted</Verdict>
</Procedure>
import sys
import xml.etree.ElementTree as etree
X_tree = etree.parse('DIAGNOSTIC SERVER.xml')
X_root = X_tree.getroot()
ATC_Name = X_root.iterfind('TestOrder//TestOrder//TestSuite//')
try:
    while(1):
        temp = ATC_Name.next()
        if temp.tag == 'ProcedureID' and temp.text == str(TestCase_Id[j].text).split('-')[1]:
            ATC_Name.next()
            ATC_Name.next()
            ATC_Name.next().text = 'Pass'  # <-- This is what I want to do
            ATC_Name.close()
            break
except:
    print sys.exc_info()
I believe my approach is wrong. Kindly guide me with right pointers.
Thanks.
You'd better switch to lxml so that you can use the "unlimited" power of xpath.
The idea is to use the following xpath expression:
//Procedure[ProcedureID/text()="%d"]/Verdict
where %d placeholder is substituted with the appropriate procedure id via string formatting operation.
The xpath expression finds the appropriate Verdict tag which you can set text on:
from lxml import etree
data = """<Procedure>
<PreCondition>PRECONDITION: - ECU in extended diagnostic session (zz = 0x03) </PreCondition>
<PostCondition/>
<ProcedureID>428495</ProcedureID>
<SequenceNumber>2</SequenceNumber>
<CID>-1</CID>
<Verdict Writable="true">NotExecuted</Verdict>
</Procedure>"""
ID = 428495
tree = etree.fromstring(data)
verdict = tree.xpath('//Procedure[ProcedureID/text()="%d"]/Verdict' % ID)[0]
verdict.text = 'test'
print etree.tostring(tree)
prints:
<Procedure>
<PreCondition>PRECONDITION: - ECU in extended diagnostic session (zz = 0x03) </PreCondition>
<PostCondition/>
<ProcedureID>428495</ProcedureID>
<SequenceNumber>2</SequenceNumber>
<CID>-1</CID>
<Verdict Writable="true">test</Verdict>
</Procedure>
Here is a solution using ElementTree (see "Modifying an XML File"):
import xml.etree.ElementTree as et
tree = et.parse('prison.xml')
root = tree.getroot()
print root.find('Verdict').text #before update
root.find('Verdict').text = 'Executed'
tree.write('prison.xml')
Try this:
import xml.etree.ElementTree as et
root=et.parse(xmldata).getroot()
s=root.find('Verdict')
s.text='Your string'
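Note that root.find('Verdict') only looks at direct children of the root, and it does not check the ProcedureID. The following is a sketch of how both could be handled with the standard library alone, assuming the Procedure elements sit somewhere below the root. The TestSuite wrapper and the second procedure ID (428494) are made up for the example.

```python
import xml.etree.ElementTree as ET

# <TestSuite> and ProcedureID 428494 are hypothetical, added for illustration.
suite = ET.fromstring("""<TestSuite>
  <Procedure>
    <ProcedureID>428494</ProcedureID>
    <Verdict Writable="true">NotExecuted</Verdict>
  </Procedure>
  <Procedure>
    <ProcedureID>428495</ProcedureID>
    <Verdict Writable="true">NotExecuted</Verdict>
  </Procedure>
</TestSuite>""")

procedure_id = 428495

# [ProcedureID='...'] keeps only the Procedure whose ProcedureID text matches.
verdict = suite.find(".//Procedure[ProcedureID='%d']/Verdict" % procedure_id)
if verdict is not None:
    verdict.text = "Pass"

verdicts = [v.text for v in suite.iter("Verdict")]
print(verdicts)
```

When working from a file, the tree returned by ET.parse() can be saved afterwards with its write() method.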

How to parse XML using python

I am trying to parse an XML file using Python to create a result summary file. Below are my code and a snippet of the XML. Like the one below, I have a couple of sections delimited by <test> and </test>.
<test name="tst_case1">
<prolog time="2013-01-18T14:41:09+05:30"/>
<verification name="VP5" file="D:/Squish/HMI_testing/tst_case1/test.py" type="properties" line="6">
<result time="2013-01-18T14:41:10+05:30" type="PASS">
<description>VP5: Object property comparison of ':_QMenu_3.enabled' passed</description>
<description type="DETAILED">'false' and 'false' are equal</description>
<description type="object">:_QMenu_3</description>
<description type="property">enabled</description>
<description type="failedValue">false</description>
</result>
</verification>
<epilog time="2013-01-18T14:41:11+05:30"/>
</test>
What I want to get is how many PASS / FAIL results there are in each <test> section.
The code below prints the total number of PASS/FAIL results in the whole XML file, but I am interested in how many there are in each section. Can anybody tell me how to fetch this?
import sys
import xml.dom.minidom as XY
file = open("result.txt", "w")
tree = XY.parse('D:\\Squish\\squish results\\Results-On-2013-01-18_0241 PM.xml')
Test_name = tree.getElementsByTagName('test')
Test_status = tree.getElementsByTagName('result')
count_testname =0
passcount = 0
failcount = 0
Test_name_array = []
for my_Test_name in Test_name:
    count_testname = count_testname+1
    passcount = 0
    failcount = 0
    my_Test_name_final = my_Test_name.getAttribute('name')
    Test_name_array = my_Test_name_final
    if(count_testname > 1):
        print(my_Test_name_final)
    for my_Test_status in Test_status:
        my_Test_status_final = my_Test_status.getAttribute('type')
        if(my_Test_status_final == 'PASS'):
            passcount = passcount+1
        if(my_Test_status_final == 'FAIL'):
            failcount = failcount+1
        print(str(my_Test_status_final))
I'd not use minidom for this task; the DOM API is very cumbersome, verbose, and not suited for searching and matching.
The Python library also includes the xml.etree.ElementTree API, I'd use that instead:
from xml.etree import ElementTree as ET
tree = ET.parse(r'D:\Squish\squish results\Results-On-2013-01-18_0241 PM.xml')
tests = dict()
# Find all <test> elements with a <verification> child:
for test in tree.findall('.//test[verification]'):
    # Count the <result> descendants of this <test> by their type attribute.
    passed = len(test.findall(".//result[@type='PASS']"))
    failed = len(test.findall(".//result[@type='FAIL']"))
    tests[test.attrib['name']] = {'pass': passed, 'fail': failed}
The above piece of code counts the number of passed and failed tests per <test> element and stores them in a dictionary, keyed to the name attribute of the <test> element.
I've tested the above code with Python 3.2 and the full XML document from another question you posted, which results in:
{'tst_Setup_menu_2': {'fail': 0, 'pass': 8}}
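For readers without the full results file, here is a self-contained sketch of the same per-test counting on inline sample data. The SquishReport root element and the sample tests are assumptions made for the example, and the tab-separated lines only illustrate one possible summary format.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened results document.
report = ET.fromstring("""<SquishReport>
  <test name="tst_case1">
    <verification name="VP1">
      <result type="PASS"/>
    </verification>
    <verification name="VP2">
      <result type="FAIL"/>
    </verification>
  </test>
  <test name="tst_case2">
    <verification name="VP1">
      <result type="PASS"/>
    </verification>
  </test>
</SquishReport>""")

summary = {}
for test in report.findall(".//test[verification]"):
    # Searching from the <test> element keeps the counts local to that test.
    summary[test.get("name")] = {
        "pass": len(test.findall(".//result[@type='PASS']")),
        "fail": len(test.findall(".//result[@type='FAIL']")),
    }

# One tab-separated line per test: name, passes, fails.
lines = ["%s\t%d\t%d" % (name, counts["pass"], counts["fail"])
         for name, counts in sorted(summary.items())]
print("\n".join(lines))
```

The key point is that findall() is called on each test element rather than on the whole tree, which is what the original minidom code was missing.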
Thanks for the post. I got it working using minidom.
I would still like to see how it can be solved using xml.etree.ElementTree.
import sys
import xml.dom.minidom as XY
file = open("Result_Summary.txt", "w")
#tree = XY.parse('D:\\Squish\\squish results\\Results-On-2013-01-18_0241 PM.xml')
#print (str(sys.argv[1]))
tree = XY.parse(sys.argv[1])
Test_name = tree.getElementsByTagName('test')
count_testname =0
file.write('Test Name \t\t\t No:PASS\t\t\t No:FAIL\t \n\n')
for my_Test_name in Test_name:
    count_testname = count_testname+1
    my_Test_name_final = my_Test_name.getAttribute('name')
    if(count_testname > 1):
        #print(my_Test_name_final)
        file.write(my_Test_name_final)
        file.write('\t\t\t\t')
        my_Test_status = my_Test_name.getElementsByTagName('result')
        passcount = 0
        failcount = 0
        for my_Test_status_1 in my_Test_status:
            my_Test_status_final = my_Test_status_1.getAttribute('type')
            if(my_Test_status_final == 'PASS'):
                passcount = passcount+1
            if(my_Test_status_final == 'FAIL'):
                failcount = failcount+1
            #print(str(my_Test_status_final))
        file.write(str(passcount))
        #print(passcount)
        file.write('\t\t\t\t')
        file.write(str(failcount))
        # print(failcount)
        file.write('\n')
#print ('loop count: %d' %count_testname)
#print('PASS count: %s' %passcount)
#print('FAIL count: %s' %failcount)
file.close()
Although lxml is not a standard module, it is well worth the effort of installing, especially if you want fast XML parsing, IMHO.
Without a full example of your results, I guessed at what they would look like.
from lxml import etree
tree = etree.parse("results.xml")
count_result_type = etree.XPath("count(.//result[@type = $name])")
for test in tree.xpath("//test"):
    print test.attrib['name']
    print "\t# FAILS ", count_result_type(test, name="FAIL")
    print "\t# PASSES", count_result_type(test, name="PASS")
I generated the following running against my guess of your xml, which should give you an idea of what is happening.
tst_case1
# FAILS 1.0
# PASSES 1.0
tst_case0
# FAILS 0.0
# PASSES 1.0
tst_case2
# FAILS 0.0
# PASSES 1.0
tst_case3
# FAILS 0.0
# PASSES 1.0
What I like about lxml is how expressive it can be, YMMV.
I see you are using Squish. You should check your squish folder under \examples\regressiontesting. There you can find a file called xml2result2html.py. Here you can find an example of converting squish test results into html.

How to replace node values in XML with Python

I am new to Python. Now I have to replace a number of values in an XML file with Python. The example snippet of XML is:
<gmd:extent>
<gmd:EX_Extent>
<gmd:description gco:nilReason="missing">
<gco:CharacterString />
</gmd:description>
<gmd:geographicElement>
<gmd:EX_GeographicBoundingBox>
<gmd:westBoundLongitude>
<gco:Decimal>112.907</gco:Decimal>
</gmd:westBoundLongitude>
<gmd:eastBoundLongitude>
<gco:Decimal>158.96</gco:Decimal>
</gmd:eastBoundLongitude>
<gmd:southBoundLatitude>
<gco:Decimal>-54.7539</gco:Decimal>
</gmd:southBoundLatitude>
<gmd:northBoundLatitude>
<gco:Decimal>-10.1357</gco:Decimal>
</gmd:northBoundLatitude>
</gmd:EX_GeographicBoundingBox>
</gmd:geographicElement>
</gmd:EX_Extent>
</gmd:extent>
What I want to do is to replace those decimal values, i.e. 112.907, with a specified value.
<gmd:extent>
<gmd:EX_Extent>
<gmd:description gco:nilReason="missing">
<gco:CharacterString />
</gmd:description>
<gmd:geographicElement>
<gmd:EX_GeographicBoundingBox>
<gmd:westBoundLongitude>
<gco:Decimal>new value</gco:Decimal>
</gmd:westBoundLongitude>
<gmd:eastBoundLongitude>
<gco:Decimal>new value</gco:Decimal>
</gmd:eastBoundLongitude>
<gmd:southBoundLatitude>
<gco:Decimal>new value</gco:Decimal>
</gmd:southBoundLatitude>
<gmd:northBoundLatitude>
<gco:Decimal>new value</gco:Decimal>
</gmd:northBoundLatitude>
</gmd:EX_GeographicBoundingBox>
</gmd:geographicElement>
</gmd:EX_Extent>
</gmd:extent>
I tried a few methods but none of them worked. My assumption is that the difficulty lies with the namespace prefixes gmd and gco.
Please help me out. Thanks in advance!
Cheers, Alex
I couldn't get lxml to process your XML without adding fake namespace declarations at the top, so here is how your input looked:
<gmd:extent xmlns:gmd="urn:x:y:z:1" xmlns:gco="urn:x:y:z:1">
<gmd:EX_Extent>
<gmd:description gco:nilReason="missing">
<gco:CharacterString />
</gmd:description>
<gmd:geographicElement>
<gmd:EX_GeographicBoundingBox>
<gmd:westBoundLongitude>
<gco:Decimal>112.907</gco:Decimal>
</gmd:westBoundLongitude>
<gmd:eastBoundLongitude>
<gco:Decimal>158.96</gco:Decimal>
</gmd:eastBoundLongitude>
<gmd:southBoundLatitude>
<gco:Decimal>-54.7539</gco:Decimal>
</gmd:southBoundLatitude>
<gmd:northBoundLatitude>
<gco:Decimal>-10.1357</gco:Decimal>
</gmd:northBoundLatitude>
</gmd:EX_GeographicBoundingBox>
</gmd:geographicElement>
</gmd:EX_Extent>
</gmd:extent>
I assumed you have two lists, one for the current values and one for the new ones, like this:
old = [112.907, 158.96, -54.7539, -10.1357]
new = [1,2,3,4]
d = dict(zip(old,new))
Here is the full code
#!/usr/bin/env python
import sys
from lxml import etree
def process(fname):
    # Parse the input file; lxml accepts a file name directly.
    tree = etree.parse(fname)
    root = tree.getroot()
    # Map each current value to its replacement.
    old = [112.907, 158.96, -54.7539, -10.1357]
    new = [1,2,3,4]
    d = dict(zip(old,new))
    nodes = root.findall('.//gco:Decimal', root.nsmap)
    for node in nodes:
        node.text = str(d[float(node.text)])
    return etree.tostring(root, pretty_print=True)
def main():
    fname = sys.argv[1]
    text = process(fname)
    # etree.tostring() returns bytes, so write in binary mode.
    with open('out.xml', 'wb') as outfile:
        outfile.write(text)
if __name__ == '__main__':
    main()
and here is what the output looked like:
<gmd:extent xmlns:gmd="urn:x:y:z:1" xmlns:gco="urn:x:y:z:1">
<gmd:EX_Extent>
<gmd:description gco:nilReason="missing">
<gco:CharacterString/>
</gmd:description>
<gmd:geographicElement>
<gmd:EX_GeographicBoundingBox>
<gmd:westBoundLongitude>
<gco:Decimal>1</gco:Decimal>
</gmd:westBoundLongitude>
<gmd:eastBoundLongitude>
<gco:Decimal>2</gco:Decimal>
</gmd:eastBoundLongitude>
<gmd:southBoundLatitude>
<gco:Decimal>3</gco:Decimal>
</gmd:southBoundLatitude>
<gmd:northBoundLatitude>
<gco:Decimal>4</gco:Decimal>
</gmd:northBoundLatitude>
</gmd:EX_GeographicBoundingBox>
</gmd:geographicElement>
</gmd:EX_Extent>
</gmd:extent>
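If lxml is not available, roughly the same replacement can be sketched with the standard library by passing a prefix-to-URI mapping to find(). The two URIs below are the ones commonly used for ISO 19139 metadata and are an assumption here; they must match whatever the root element of the real file declares. The replacement values are made up.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; check the declarations in your own file.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

doc = ET.fromstring("""<gmd:EX_GeographicBoundingBox
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:westBoundLongitude><gco:Decimal>112.907</gco:Decimal></gmd:westBoundLongitude>
  <gmd:eastBoundLongitude><gco:Decimal>158.96</gco:Decimal></gmd:eastBoundLongitude>
</gmd:EX_GeographicBoundingBox>""")

# Hypothetical replacement values, keyed by the name of the enclosing element.
new_values = {"westBoundLongitude": "110.0", "eastBoundLongitude": "160.0"}

for name, value in new_values.items():
    # The prefixes in the path are resolved through the NS mapping.
    decimal = doc.find("./gmd:%s/gco:Decimal" % name, NS)
    if decimal is not None:
        decimal.text = value

updated = [d.text for d in doc.iter("{%s}Decimal" % NS["gco"])]
print(updated)
```

Keying the new values by element name avoids comparing floating-point numbers, which the dictionary lookup on float(node.text) depends on.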
