Python:XML List index out of range - python

I'm having trouble getting some values from an XML file. The error is IndexError: list index out of range
XML
<?xml version="1.0" encoding="UTF-8"?>
<nfeProc xmlns="http://www.portalfiscal.inf.br/nfe" versao="3.10">
<NFe xmlns="http://www.portalfiscal.inf.br/nfe">
<infNFe Id="NFe35151150306471000109550010004791831003689145" versao="3.10">
<ide>
<nNF>479183</nNF>
</ide>
<emit>
<CNPJ>3213213212323</CNPJ>
</emit>
<det nItem="1">
<prod>
<cProd>7030-314</cProd>
</prod>
<imposto>
<ICMS>
<ICMS10>
<orig>1</orig>
<CST>10</CST>
<vICMS>10.35</vICMS>
<vICMSST>88.79</vICMSST>
</ICMS10>
</ICMS>
</imposto>
</det>
<det nItem="2">
<prod>
<cProd>7050-6</cProd>
</prod>
<imposto>
<ICMS>
<ICMS00>
<orig>1</orig>
<CST>00</CST>
<vICMS>7.49</vICMS>
</ICMS00>
</ICMS>
</imposto>
</det>
</infNFe>
</NFe>
</nfeProc>
I'm getting the values from the XML. It works for XML files that have both the vICMS and vICMSST tags:
vicms = doc.getElementsByTagName('vICMS')[i].firstChild.nodeValue
vicmsst = doc.getElementsByTagName('vICMSST')[1].firstChild.nodeValue
For the first imposto this returns:
print vicms
>> 10.35
print vicmsst
>> 88.79
The second imposto crashes because it has no vICMSST tag:
**IndexError: list index out of range**
What is the best way to test for it? I'm using xml.etree.ElementTree:
My code:
import os
import sys
import subprocess
import base64,xml.dom.minidom
from xml.dom.minidom import Node
import glob
import xml.etree.ElementTree as ET
origem = 0
# only loops over XML documents in folder
for file in glob.glob("*.xml"):
    f = open("%s" % file,'r')
    data = f.read()
    i = 0
    doc = xml.dom.minidom.parseString(data)
    for topic in doc.getElementsByTagName('emit'):
        #Get Fiscal Number
        nnf= doc.getElementsByTagName('nNF')[i].firstChild.nodeValue
        print 'Fiscal Number %s' % nnf
    print '\n'
    for prod in doc.getElementsByTagName('det'):
        vicms = 0
        vicmsst = 0
        #Get value of ICMS
        vicms = doc.getElementsByTagName('vICMS')[i].firstChild.nodeValue
        #Get value of VICMSST
        vicmsst = doc.getElementsByTagName('vICMSST')[i].firstChild.nodeValue
        #PRINT INFO
        print 'ICMS %s' % vicms
        print 'Valor do ICMSST: %s' % vicmsst
        print '\n\n'
        i +=1
    print '\n\n'

There is only one vICMSST tag in your XML document. So, when i=1, the following line raises an IndexError.
vicmsst = doc.getElementsByTagName('vICMSST')[1].firstChild.nodeValue
You can restructure this to:
try:
    vicmsst = doc.getElementsByTagName('vICMSST')[i].firstChild.nodeValue
except IndexError:
    # set a default value or deal with this how you like
    vicmsst = None
It's hard to say what you should do upon an exception without knowing more about what you're trying to do.
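One way to avoid the positional index altogether is to look for vICMSST inside each det element, so a missing tag simply yields an empty result for that item. A minimal sketch with minidom, assuming a trimmed-down copy of the question's XML; the helper name first_text and the "0" default are illustrative assumptions, not something from the question:

```python
import xml.dom.minidom

# Trimmed-down stand-in for the question's document (assumption).
SAMPLE = """<infNFe>
  <det nItem="1"><imposto><vICMS>10.35</vICMS><vICMSST>88.79</vICMSST></imposto></det>
  <det nItem="2"><imposto><vICMS>7.49</vICMS></imposto></det>
</infNFe>"""


def first_text(parent, tag, default="0"):
    """Return the text of the first <tag> below parent, or default if absent."""
    nodes = parent.getElementsByTagName(tag)
    if nodes and nodes[0].firstChild is not None:
        return nodes[0].firstChild.nodeValue
    return default


doc = xml.dom.minidom.parseString(SAMPLE)
rows = []
for det in doc.getElementsByTagName("det"):
    # Searching below `det` keeps each value tied to its own item.
    rows.append((det.getAttribute("nItem"),
                 first_text(det, "vICMS"),
                 first_text(det, "vICMSST")))

for n_item, vicms, vicmsst in rows:
    print("item %s: ICMS %s, ICMSST %s" % (n_item, vicms, vicmsst))
```

Because the lookup is scoped to one det at a time, the second item no longer depends on how many vICMSST tags exist elsewhere in the document.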

You are making several general mistakes in your code.
Don't use counters to index into lists you don't know the length of. Normally, iteration with for .. in is a lot better than using indexes anyway.
You have many imports you don't seem to use, get rid of them.
You can use minidom, but ElementTree is better for your task because it supports searching for nodes with XPath and it supports XML namespaces.
Don't read an XML file as a string and then use parseString. Let the XML parser handle the file directly. This way all file encoding related issues will be handled without errors.
The following is a lot better than your original approach.
import glob
import xml.etree.ElementTree as ET

def get_text(context_elem, xpath, xmlns=None):
    """ helper function that gets the text value of a node """
    node = context_elem.find(xpath, xmlns)
    if node is not None:
        return node.text
    else:
        return ""

# set up XML namespace URIs
xmlns = {
    "nfe": "http://www.portalfiscal.inf.br/nfe"
}

for path in glob.glob("*.xml"):
    doc = ET.parse(path)
    for infNFe in doc.iterfind('.//nfe:infNFe', xmlns):
        print 'Fiscal Number\t%s' % get_text(infNFe, ".//nfe:nNF", xmlns)
        for det in infNFe.iterfind(".//nfe:det", xmlns):
            print ' ICMS\t%s' % get_text(det, ".//nfe:vICMS", xmlns)
            print ' Valor do ICMSST:\t%s' % get_text(det, ".//nfe:vICMSST", xmlns)
        print '\n\n'
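The get_text helper above has a close built-in counterpart: Element.findtext takes a default value and a namespaces mapping, so a missing element can be handled in one call. A small Python 3 sketch, assuming an inline, shortened copy of the question's document instead of files on disk:

```python
import xml.etree.ElementTree as ET

NS = {"nfe": "http://www.portalfiscal.inf.br/nfe"}

# Shortened stand-in for one of the NF-e files (assumption).
SAMPLE = """<nfeProc xmlns="http://www.portalfiscal.inf.br/nfe">
  <NFe><infNFe>
    <ide><nNF>479183</nNF></ide>
    <det nItem="1"><imposto><ICMS><ICMS10>
      <vICMS>10.35</vICMS><vICMSST>88.79</vICMSST>
    </ICMS10></ICMS></imposto></det>
    <det nItem="2"><imposto><ICMS><ICMS00>
      <vICMS>7.49</vICMS>
    </ICMS00></ICMS></imposto></det>
  </infNFe></NFe>
</nfeProc>"""

root = ET.fromstring(SAMPLE)
items = []
for det in root.iterfind(".//nfe:det", NS):
    items.append({
        "item": det.get("nItem"),
        # findtext returns the default when the element is missing
        "vICMS": det.findtext(".//nfe:vICMS", default="", namespaces=NS),
        "vICMSST": det.findtext(".//nfe:vICMSST", default="", namespaces=NS),
    })

for entry in items:
    print(entry)
```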

Related

how to check if an attribute <Reporting_date> YYYYMMDD </Reporting_Date> in a .xml file is equal to a fixed Date value

I am new to Python and wondering how to solve the use case below with a Python script called from a shell script.
I have a shell script which holds variables such as the file name and an ODATE value in the fixed format YYYYMMDD. It does some checks on the file name.
In the next step I want to run a Python script that checks whether the value of <Reporting_date> YYYYMMDD </Reporting_Date>, for every occurrence in the test.xml file, is equal to the fixed value from ODATE.
If all the values of <Reporting_date> YYYYMMDD </Reporting_Date> match the ODATE, print the message "all the attributes are matched".
If there is a mismatch while scanning the records in test.xml, stop scanning the file at the first mismatch and print the message "Encountered a mismatch".
Could anyone please guide me with this use case? It would be very helpful.
Many thanks in advance.
How about something like this (I haven't tested this code and I don't know the structure of your XML file, but you should get the idea).
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
ODATE = '20210215'
allok = True
for child in root.iter('Reporting_date'):
    if child.text != ODATE:
        allok = False
        print("Encountered a mismatch")
        break
if allok:
    print("All attributes are matched")
For more information see https://docs.python.org/3/library/xml.etree.elementtree.html
--- EDIT using findall instead of iter (see comments) ---
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
ODATE = '20210215'
allok = True
for directchild in root:
    for child in directchild.findall("REPORTING_DATE"):
        # this supports multiple reporting_date in one POSITION structure.
        # If you know that there always is only one, find would be better.
        if child.text != ODATE:
            allok = False
            print("Encountered a mismatch")
            break
    if not allok:
        # the inner break only leaves the inner loop, so stop the outer one too
        break
if allok:
    print("All attributes are matched")
See below
import xml.etree.ElementTree as ET
XML = '''<?xml version="1.0" encoding="UTF-8"?>
<POSITIONS>
<POSITION>
<ISIN>aaaaaaa</ISIN>
<ACCOUNT>7777777</ACCOUNT>
<POSITION>11111</POSITION>
<SETTLEMENT_DATE>20210202</SETTLEMENT_DATE>
<REPORTING_DATE>20210202</REPORTING_DATE>
</POSITION>
<POSITION>
<ISIN>bbbbbbb</ISIN>
<ACCOUNT>66666666</ACCOUNT>
<POSITION>888888888</POSITION>
<SETTLEMENT_DATE>20210203</SETTLEMENT_DATE>
<REPORTING_DATE>20210215</REPORTING_DATE>
</POSITION>
</POSITIONS>'''
ODATE = '20210215'
root = ET.fromstring(XML)
dates = root.findall('.//REPORTING_DATE')
for date in dates:
    if date.text == ODATE:
        print(f'ODATE {ODATE} and REPORTING_DATE {date.text} are the same dates')
    else:
        print(f'ODATE {ODATE} and REPORTING_DATE {date.text} are NOT the same dates')
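The question writes the value with padding (<Reporting_date> YYYYMMDD </Reporting_Date>) and asks to stop at the first mismatch, so a variation that strips whitespace and returns early may be useful. This is only a sketch: the function name first_mismatch and the sample data are assumptions, and the tag name follows the REPORTING_DATE spelling used in the sample above.

```python
import xml.etree.ElementTree as ET

# Made-up sample; the first date is padded with spaces on purpose.
SAMPLE = """<POSITIONS>
  <POSITION><REPORTING_DATE> 20210215 </REPORTING_DATE></POSITION>
  <POSITION><REPORTING_DATE>20210202</REPORTING_DATE></POSITION>
  <POSITION><REPORTING_DATE>20210215</REPORTING_DATE></POSITION>
</POSITIONS>"""


def first_mismatch(root, odate, tag="REPORTING_DATE"):
    """Return the first date text that differs from odate, or None if all match."""
    for elem in root.iter(tag):
        value = (elem.text or "").strip()  # tolerate padding around the date
        if value != odate:
            return value  # stop scanning at the first mismatch
    return None


root = ET.fromstring(SAMPLE)
bad = first_mismatch(root, "20210215")
if bad is None:
    print("all the attributes are matched")
else:
    print("Encountered a mismatch")
```

When called from a shell script, the ODATE could be passed in through sys.argv and the outcome reported through the exit status; that wiring is left out here.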

lxml (etree) - Pretty Print attributes of root tag

Is it possible in python to pretty print the root's attributes?
I used etree to extend the attributes of the child tag and then I had overwritten the existing file with the new content. However during the first generation of the XML, we were using a template where the attributes of the root tag were listed one per line and now with the etree I don't manage to achieve the same result.
I found similar questions but they were all referring to the tutorial of etree, which I find incomplete.
Hopefully someone has found a solution for this using etree.
EDIT: This is for custom XML so HTML Tidy (which was proposed in the comments), doesn't work for this.
Thanks!
generated_descriptors = list_generated_files(generated_descriptors_folder)
counter = 0
for g in generated_descriptors:
    if counter % 20 == 0:
        print "Extending Descriptor # %s out of %s" % (counter, len(descriptor_attributes))
    with open(generated_descriptors_folder + "\\" + g, 'r+b') as descriptor:
        root = etree.XML(descriptor.read(), parser=parser)
        # Go through every ContextObject to check if the block is mandatory
        for context_object in root.findall('ContextObject'):
            for attribs in descriptor_attributes:
                if attribs['descriptor_name'] == g[:-11] and context_object.attrib['name'] in attribs['attributes']['mandatoryobjects']:
                    context_object.set('allow-null', 'false')
                elif attribs['descriptor_name'] == g[:-11] and context_object.attrib['name'] not in attribs['attributes']['mandatoryobjects']:
                    context_object.set('allow-null', 'true')
        # Sort the ContextObjects based on allow-null and their name
        context_objects = root.findall('ContextObject')
        context_objects_sorted = sorted(context_objects, key=lambda c: (c.attrib['allow-null'], c.attrib['name']))
        root[:] = context_objects_sorted
        # Remove mandatoryobjects from Descriptor attributes and pretty print
        root.attrib.pop("mandatoryobjects", None)
        # paste new line here
        # Convert to string in order to write the enhanced descriptor
        xml = etree.tostring(root, pretty_print=True, encoding="UTF-8", xml_declaration=True)
        # Write the enhanced descriptor
        descriptor.seek(0) # Set cursor at beginning of the file
        descriptor.truncate(0) # Make sure that file is empty
        descriptor.write(xml)
        descriptor.close()
    counter+=1
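No answer is recorded for this question. One possible workaround, offered only as a sketch, is to post-process the serialized text and rewrite just the root start tag so that each attribute lands on its own line. The example below uses the standard-library ElementTree as a stand-in for lxml; the function name root_attrs_one_per_line, the sample document, and the four-space indent are assumptions, and whether lxml's output can be treated the same way would need to be checked.

```python
import re
import xml.etree.ElementTree as ET


def root_attrs_one_per_line(xml_text, indent="    "):
    """Rewrite only the first start tag so each attribute sits on its own line.

    Assumes xml_text starts with the root start tag (no XML declaration)
    and was produced by ElementTree, which escapes '>' and '"' inside
    attribute values.
    """
    end = xml_text.index(">")            # end of the root start tag
    start_tag, rest = xml_text[:end], xml_text[end:]
    attrs = re.findall(r'[^\s=]+="[^"]*"', start_tag)
    if not attrs:
        return xml_text                  # nothing to reflow
    name = start_tag.split()[0]          # e.g. "<Descriptor"
    closing = "/" if start_tag.rstrip().endswith("/") else ""
    lines = [name] + [indent + a for a in attrs]
    return "\n".join(lines) + closing + rest


root = ET.fromstring(
    '<Descriptor name="d1" version="2" owner="qa">'
    '<ContextObject name="a"/></Descriptor>'
)
root.attrib.pop("owner", None)
pretty = root_attrs_one_per_line(ET.tostring(root, encoding="unicode"))
print(pretty)
```

The rewritten text is still well-formed XML, since whitespace between attributes is not significant, but the approach is string surgery rather than a serializer option and should be treated with corresponding care.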

Reg adding data to an existing XML in Python

I have to parse an xml file & modify the data in a particular tag using Python. I'm using Element Tree to do this. I'm able to parse & reach the required tag. But I'm not able to modify the value. I'm not sure if Element Tree is okay or if I should use TreeBuilder for this.
As you can see below I just want to replace the Not Executed under Verdict with a string value.
<Procedure>
<PreCondition>PRECONDITION: - ECU in extended diagnostic session (zz = 0x03) </PreCondition>
<PostCondition/>
<ProcedureID>428495</ProcedureID>
<SequenceNumber>2</SequenceNumber>
<CID>-1</CID>
<Verdict Writable="true">NotExecuted</Verdict>
</Procedure>
import sys
import xml.etree.ElementTree as etree
X_tree = etree.parse('DIAGNOSTIC SERVER.xml')
X_root = X_tree.getroot()
ATC_Name = X_root.iterfind('TestOrder//TestOrder//TestSuite//')
try:
    while(1):
        temp = ATC_Name.next()
        if temp.tag == 'ProcedureID' and temp.text == str(TestCase_Id[j].text).split('-')[1]:
            ATC_Name.next()
            ATC_Name.next()
            ATC_Name.next().text = 'Pass'  # <-- This is what I want to do
            ATC_Name.close()
            break
except:
    print sys.exc_info()
I believe my approach is wrong. Kindly guide me with right pointers.
Thanks.
You'd better switch to lxml so that you can use the "unlimited" power of xpath.
The idea is to use the following xpath expression:
//Procedure[ProcedureID/text()="%d"]/Verdict
where %d placeholder is substituted with the appropriate procedure id via string formatting operation.
The xpath expression finds the appropriate Verdict tag which you can set text on:
from lxml import etree
data = """<Procedure>
<PreCondition>PRECONDITION: - ECU in extended diagnostic session (zz = 0x03) </PreCondition>
<PostCondition/>
<ProcedureID>428495</ProcedureID>
<SequenceNumber>2</SequenceNumber>
<CID>-1</CID>
<Verdict Writable="true">NotExecuted</Verdict>
</Procedure>"""
ID = 428495
tree = etree.fromstring(data)
verdict = tree.xpath('//Procedure[ProcedureID/text()="%d"]/Verdict' % ID)[0]
verdict.text = 'test'
print etree.tostring(tree)
prints:
<Procedure>
<PreCondition>PRECONDITION: - ECU in extended diagnostic session (zz = 0x03) </PreCondition>
<PostCondition/>
<ProcedureID>428495</ProcedureID>
<SequenceNumber>2</SequenceNumber>
<CID>-1</CID>
<Verdict Writable="true">test</Verdict>
</Procedure>
Here is a solution using ElementTree. See Modifying an XML File
import xml.etree.ElementTree as et
tree = et.parse('prison.xml')
root = tree.getroot()
print root.find('Verdict').text #before update
root.find('Verdict').text = 'Executed'
tree.write('prison.xml')
try this
import xml.etree.ElementTree as et
root=et.parse(xmldata).getroot()
s=root.find('Verdict')
s.text='Your string'
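The two ElementTree snippets above update the first Verdict they find. To pick the Procedure by its ProcedureID without lxml, the standard library's limited XPath support includes a [tag='text'] predicate that may be enough. A sketch under assumptions: the TestSuite wrapper, the second Procedure, and the helper name set_verdict are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up document with two procedures (assumption).
SAMPLE = """<TestSuite>
  <Procedure>
    <ProcedureID>428494</ProcedureID>
    <Verdict Writable="true">NotExecuted</Verdict>
  </Procedure>
  <Procedure>
    <ProcedureID>428495</ProcedureID>
    <Verdict Writable="true">NotExecuted</Verdict>
  </Procedure>
</TestSuite>"""


def set_verdict(root, procedure_id, verdict):
    """Set <Verdict> of the <Procedure> whose <ProcedureID> text matches."""
    path = ".//Procedure[ProcedureID='%s']/Verdict" % procedure_id
    node = root.find(path)
    if node is None:
        return False  # no such procedure; nothing changed
    node.text = verdict
    return True


root = ET.fromstring(SAMPLE)
changed = set_verdict(root, 428495, "Pass")
verdicts = [v.text for v in root.iter("Verdict")]
print(changed, verdicts)
```

To persist the change when working from a file, the tree returned by ET.parse can be written back with its write method, as in the answer above.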

LXML Xpath does not seem to return full path

OK, I'll be the first to admit it does, just not the path I want, and I don't know how to get it.
I'm using Python 3.3 in Eclipse with the PyDev plugin, on Windows 7 at work and Ubuntu 13.04 at home. I'm new to Python and have limited programming experience.
I'm trying to write a script to take in an XML Lloyds market insurance message, find all the tags and dump them in a .csv where we can easily update them and then reimport them to create an updated XML.
I have managed to do all of that, except that when I collect the tags I only get each tag's name and not the tags above it.
<TechAccount Sender="broker" Receiver="insurer">
<UUId>2EF40080-F618-4FF7-833C-A34EA6A57B73</UUId>
<BrokerReference>HOY123/456</BrokerReference>
<ServiceProviderReference>2012080921401A1</ServiceProviderReference>
<CreationDate>2012-08-10</CreationDate>
<AccountTransactionType>premium</AccountTransactionType>
<GroupReference>2012080921401A1</GroupReference>
<ItemsInGroupTotal>
<Count>1</Count>
</ItemsInGroupTotal>
<ServiceProviderGroupReference>8-2012-08-10</ServiceProviderGroupReference>
<ServiceProviderGroupItemsTotal>
<Count>13</Count>
</ServiceProviderGroupItemsTotal>
That is a fragment of the XML. What I want is to find all the tags and their paths. For example, I want to show the first Count as ItemsInGroupTotal/Count but can only get it as Count.
Here is my code:
xml = etree.parse(fullpath)
print( xml.xpath('.//*'))
all_xpath = xml.xpath('.//*')
every_tag = []
for i in all_xpath:
    single_tag = '%s,%s' % (i.tag, i.text)
    every_tag.append(single_tag)
print(every_tag)
This gives:
'{http://www.ACORD.org/standards/Jv-Ins-Reinsurance/1}ServiceProviderGroupReference,8-2012-08-10', '{http://www.ACORD.org/standards/Jv-Ins-Reinsurance/1}ServiceProviderGroupItemsTotal,\n', '{http://www.ACORD.org/standards/Jv-Ins-Reinsurance/1}Count,13',
As you can see Count is shown as {namespace}Count, 13 and not {namespace}ItemsInGroupTotal/Count, 13
Can anyone point me towards what I need?
Thanks (hope my first post is OK)
Adam
EDIT:
This is my code now:
with open(fullpath, 'rb') as xmlFilepath:
xmlfile = xmlFilepath.read()
fulltext = '%s' % xmlfile
text = fulltext[2:]
print(text)
xml = etree.fromstring(fulltext)
tree = etree.ElementTree(xml)
every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in xml.iter()]
print(every_tag)
But this returns an error:
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
I removed the first two chars as they are b' and it complained that the input didn't start with a tag.
Update:
I have been playing around with this and if I remove the xis: xxx tags and the namespace stuff at the top it works as expected. I need to keep the xis tags and be able to identify them as xis tags so can't just delete them.
Any help on how I can achieve this?
ElementTree objects have a method getpath(element), which returns a
structural, absolute XPath expression to find that element
Calling getpath on each element in a iter() loop should work for you:
from pprint import pprint
from lxml import etree
text = """
<TechAccount Sender="broker" Receiver="insurer">
<UUId>2EF40080-F618-4FF7-833C-A34EA6A57B73</UUId>
<BrokerReference>HOY123/456</BrokerReference>
<ServiceProviderReference>2012080921401A1</ServiceProviderReference>
<CreationDate>2012-08-10</CreationDate>
<AccountTransactionType>premium</AccountTransactionType>
<GroupReference>2012080921401A1</GroupReference>
<ItemsInGroupTotal>
<Count>1</Count>
</ItemsInGroupTotal>
<ServiceProviderGroupReference>8-2012-08-10</ServiceProviderGroupReference>
<ServiceProviderGroupItemsTotal>
<Count>13</Count>
</ServiceProviderGroupItemsTotal>
</TechAccount>
"""
xml = etree.fromstring(text)
tree = etree.ElementTree(xml)
every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in xml.iter()]
pprint(every_tag)
prints:
['/TechAccount, \n',
'/TechAccount/UUId, 2EF40080-F618-4FF7-833C-A34EA6A57B73',
'/TechAccount/BrokerReference, HOY123/456',
'/TechAccount/ServiceProviderReference, 2012080921401A1',
'/TechAccount/CreationDate, 2012-08-10',
'/TechAccount/AccountTransactionType, premium',
'/TechAccount/GroupReference, 2012080921401A1',
'/TechAccount/ItemsInGroupTotal, \n',
'/TechAccount/ItemsInGroupTotal/Count, 1',
'/TechAccount/ServiceProviderGroupReference, 8-2012-08-10',
'/TechAccount/ServiceProviderGroupItemsTotal, \n',
'/TechAccount/ServiceProviderGroupItemsTotal/Count, 13']
UPD:
If your xml data is in the file test.xml, the code would look like:
from pprint import pprint
from lxml import etree
xml = etree.parse('test.xml').getroot()
tree = etree.ElementTree(xml)
every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in xml.iter()]
pprint(every_tag)
Hope that helps.
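The ValueError in the question's edit comes from formatting the bytes object with '%s': in Python 3 that produces the textual repr of the bytes, including the leading b', rather than decoded XML. The error message itself asks for bytes input, so the conversion can simply be dropped and the bytes passed to the parser as read. A small illustration using only the standard library, with a made-up document:

```python
import xml.etree.ElementTree as ET

# Made-up bytes, as they would come back from a file opened with 'rb'.
raw = b'<?xml version="1.0" encoding="UTF-8"?><TechAccount><UUId>abc</UUId></TechAccount>'

as_text = '%s' % raw          # repr of the bytes object, not decoded XML
print(as_text[:2])            # the stray prefix the question sliced off

root = ET.fromstring(raw)     # pass the bytes straight to the parser
print(root.tag, root.find('UUId').text)
```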
getpath() does indeed return an xpath that's not suited for human consumption. From this xpath, you can build up a more useful one though. Such as with this quick-and-dirty approach:
def human_xpath(element):
    full_xpath = element.getroottree().getpath(element)
    xpath = ''
    human_xpath = ''
    for i, node in enumerate(full_xpath.split('/')[1:]):
        xpath += '/' + node
        element = element.xpath(xpath)[0]
        namespace, tag = element.tag[1:].split('}', 1)
        if element.getparent() is not None:
            nsmap = {'ns': namespace}
            same_name = element.getparent().xpath('./ns:' + tag,
                                                  namespaces=nsmap)
            if len(same_name) > 1:
                tag += '[{}]'.format(same_name.index(element) + 1)
        human_xpath += '/' + tag
    return human_xpath
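If lxml's getpath is not available, or only plain tag names are wanted in the CSV, the paths can also be built by walking the tree and carrying the parent path along. A sketch with the standard library, assuming a shortened copy of the question's message; the helper names local_name and walk are illustrative:

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's namespaced message (assumption).
SAMPLE = """<TechAccount xmlns="http://www.ACORD.org/standards/Jv-Ins-Reinsurance/1">
  <UUId>2EF40080</UUId>
  <ItemsInGroupTotal><Count>1</Count></ItemsInGroupTotal>
  <ServiceProviderGroupItemsTotal><Count>13</Count></ServiceProviderGroupItemsTotal>
</TechAccount>"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit('}', 1)[-1]


def walk(elem, prefix=''):
    """Yield (path, text) pairs for elem and all of its descendants."""
    path = prefix + '/' + local_name(elem.tag)
    yield path, (elem.text or '').strip()
    for child in elem:
        yield from walk(child, path)


rows = list(walk(ET.fromstring(SAMPLE)))
for path, text in rows:
    print('%s,%s' % (path, text))
```

This version drops the namespace entirely and does not number repeated siblings, so it would need extending if tags from different namespaces (such as the xis ones mentioned in the question) have to stay distinguishable.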

How to parse XML using python

I am trying to parse an XML file using Python to create a result summary file. Below are my code and a snippet of the XML. Like the snippet below, I have several sections delimited by <test> and </test>.
<test name="tst_case1">
<prolog time="2013-01-18T14:41:09+05:30"/>
<verification name="VP5" file="D:/Squish/HMI_testing/tst_case1/test.py" type="properties" line="6">
<result time="2013-01-18T14:41:10+05:30" type="PASS">
<description>VP5: Object property comparison of ':_QMenu_3.enabled' passed</description>
<description type="DETAILED">'false' and 'false' are equal</description>
<description type="object">:_QMenu_3</description>
<description type="property">enabled</description>
<description type="failedValue">false</description>
</result>
</verification>
<epilog time="2013-01-18T14:41:11+05:30"/>
</test>
What I want to get is how many PASS / FAIL results there are in each <test> section.
The code below prints the total PASS/FAIL count for the whole XML file, but I am interested in the PASS/FAIL count for each section. Can anybody tell me how to fetch this?
import sys
import xml.dom.minidom as XY
file = open("result.txt", "w")
tree = XY.parse('D:\\Squish\\squish results\\Results-On-2013-01-18_0241 PM.xml')
Test_name = tree.getElementsByTagName('test')
Test_status = tree.getElementsByTagName('result')
count_testname =0
passcount = 0
failcount = 0
Test_name_array = []
for my_Test_name in Test_name:
    count_testname = count_testname+1
    passcount = 0
    failcount = 0
    my_Test_name_final = my_Test_name.getAttribute('name')
    Test_name_array = my_Test_name_final
    if(count_testname > 1):
        print(my_Test_name_final)
    for my_Test_status in Test_status:
        my_Test_status_final = my_Test_status.getAttribute('type')
        if(my_Test_status_final == 'PASS'):
            passcount = passcount+1
        if(my_Test_status_final == 'FAIL'):
            failcount = failcount+1
        print(str(my_Test_status_final))
I'd not use minidom for this task; the DOM API is very cumbersome, verbose, and not suited for searching and matching.
The Python library also includes the xml.etree.ElementTree API, I'd use that instead:
from xml.etree import ElementTree as ET
tree = ET.parse(r'D:\Squish\squish results\Results-On-2013-01-18_0241 PM.xml')
tests = dict()
# Find all <test> elements with a <verification> child:
for test in tree.findall('.//test[verification]'):
    passed = len(test.findall(".//result[@type='PASS']"))
    failed = len(test.findall(".//result[@type='FAIL']"))
    tests[test.attrib['name']] = {'pass': passed, 'fail': failed}
The above piece of code counts the number of passed and failed tests per <test> element and stores them in a dictionary, keyed to the name attribute of the <test> element.
I've tested the above code with Python 3.2 and the full XML document from another question you posted, which results in:
{'tst_Setup_menu_2': {'fail': 0, 'pass': 8}}
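For readers who want to try the per-test counting without a Squish results file, here is a self-contained variation. The inline sample (including the SquishReport root name) is made up, and collections.Counter is swapped in for the two findall calls, so this is a sketch of the same idea rather than the answer's exact code:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up results document with two <test> sections (assumption).
SAMPLE = """<SquishReport>
  <test name="tst_case1">
    <verification name="VP1"><result type="PASS"/></verification>
    <verification name="VP2"><result type="FAIL"/></verification>
  </test>
  <test name="tst_case2">
    <verification name="VP1"><result type="PASS"/></verification>
    <verification name="VP2"><result type="PASS"/></verification>
  </test>
</SquishReport>"""

root = ET.fromstring(SAMPLE)
summary = {}
for test in root.iter("test"):
    # Count result types below this <test> only, not across the whole file.
    counts = Counter(r.get("type") for r in test.iter("result"))
    summary[test.get("name")] = {"pass": counts["PASS"], "fail": counts["FAIL"]}

for name, totals in summary.items():
    print("%s\t%d\t%d" % (name, totals["pass"], totals["fail"]))
```

The key point is the same as in the answer: the result elements are searched below each test element, which is what the original minidom code missed by calling getElementsByTagName on the whole document.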
Thanks for posting. I got it working using minidom.
I'd still like to see how it can be solved using xml.etree.ElementTree.
import sys
import xml.dom.minidom as XY
file = open("Result_Summary.txt", "w")
#tree = XY.parse('D:\\Squish\\squish results\\Results-On-2013-01-18_0241 PM.xml')
#print (str(sys.argv[1]))
tree = XY.parse(sys.argv[1])
Test_name = tree.getElementsByTagName('test')
count_testname =0
file.write('Test Name \t\t\t No:PASS\t\t\t No:FAIL\t \n\n')
for my_Test_name in Test_name:
    count_testname = count_testname+1
    my_Test_name_final = my_Test_name.getAttribute('name')
    if(count_testname > 1):
        #print(my_Test_name_final)
        file.write(my_Test_name_final)
        file.write('\t\t\t\t')
        my_Test_status = my_Test_name.getElementsByTagName('result')
        passcount = 0
        failcount = 0
        for my_Test_status_1 in my_Test_status:
            my_Test_status_final = my_Test_status_1.getAttribute('type')
            if(my_Test_status_final == 'PASS'):
                passcount = passcount+1
            if(my_Test_status_final == 'FAIL'):
                failcount = failcount+1
            #print(str(my_Test_status_final))
        file.write(str(passcount))
        #print(passcount)
        file.write('\t\t\t\t')
        file.write(str(failcount))
        # print(failcount)
        file.write('\n')
#print ('loop count: %d' %count_testname)
#print('PASS count: %s' %passcount)
#print('FAIL count: %s' %failcount)
file.close()
Although it is not a standard module, lxml is well worth the effort of installing, especially if you want fast XML parsing, IMHO.
Without a full example of your results I guessed at what they would look like.
from lxml import etree
tree = etree.parse("results.xml")
count_result_type = etree.XPath("count(.//result[@type = $name])")
for test in tree.xpath("//test"):
    print test.attrib['name']
    print "\t# FAILS ", count_result_type(test, name="FAIL")
    print "\t# PASSES", count_result_type(test, name="PASS")
I generated the following running against my guess of your xml, which should give you an idea of what is happening.
tst_case1
# FAILS 1.0
# PASSES 1.0
tst_case0
# FAILS 0.0
# PASSES 1.0
tst_case2
# FAILS 0.0
# PASSES 1.0
tst_case3
# FAILS 0.0
# PASSES 1.0
What I like about lxml is how expressive it can be, YMMV.
I see you are using Squish. You should check your squish folder under \examples\regressiontesting. There you can find a file called xml2result2html.py. Here you can find an example of converting squish test results into html.
