I'm a beginner in Python and struggling to understand why I get an error when I look up the required keys while iterating through a dictionary obtained from an XML file. I should also mention that I still get the result I want, but I keep receiving an error.
import os
import shelve
import xml.etree.ElementTree as et
shelfFile = shelve.open('xml_data')
base_path = os.path.dirname(os.path.realpath(__file__))
xml_file = os.path.join(base_path, "data\\nrm_icg_catalog.xml")
tree = et.parse(xml_file)
root = tree.getroot()
elements = ['Name','WBS']
for child in root:
    for itemGroup1 in child:
        for item in elements:
            print(itemGroup1.attrib[item])
Result with Error Message:
Facilitating Works
0
Substructure
1
Superstructure
2
Internal Finish
3
Fittings
4
Services
5
Prefabs
6
Works to Existing Building
7
External Works
8
MC Prelims
9
MC OH and P
10
Traceback (most recent call last):
File "c:/Users/Dodzi Agbenorku/OneDrive/Training Files/Programming Lessons/Python/xmlExcel/app.py", line 22, in <module>
print(itemGroup1.attrib[item])
KeyError: 'Name'
Here's a small section of the xml file I am using:
<?xml version="1.0" encoding="utf-8"?>
<Takeoff xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://download.autodesk.com/us/navisworks/schemas/nw-TakeoffCatalog-10.0.xsd">
<Catalog>
<ItemGroup Name="Facilitating Works" WBS="0" CatalogId="32b4ab2d-6fe8-4c45-9872-c8ea68c0c4de">
<ItemGroup Name="Hazardous Materials" WBS="1" CatalogId="2fdb6bd1-b2d1-4167-a74d-da818183a156">
<ItemGroup Name="Material Removal" WBS="1" CatalogId="ccc8a515-4152-400c-a72e-6fd78561325e">
<Item Name="Material Details" WBS="1" Transparency="0.3" Color="-15161029" LineThickness="0.1" CatalogId="c0a7de26-6bc3-491e-b3c7-3ff5b560eeaf">
<VariableCollection>
<Variable Name="Length" Formula="=ModelLength" Units="Meter" />
<Variable Name="Width" Formula="=ModelWidth" Units="Meter" />
<Variable Name="Thickness" Formula="=ModelThickness" Units="Meter" />
<Variable Name="Height" Formula="=ModelHeight" Units="Meter" />
<Variable Name="Perimeter" Formula="=ModelPerimeter" Units="Meter" />
<Variable Name="Area" Formula="=ModelArea" Units="SquareMeter" />
Any help will be extremely appreciated.
The error occurs because some nodes have no Name attribute. attrib is a dictionary, so looking up a missing key raises KeyError.
Instead of itemGroup1.attrib[item], use itemGroup1.attrib.get(item). It returns None if the key does not exist instead of raising an error.
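As a minimal sketch of the difference, the snippet below uses a small hypothetical XML string (not the asker's catalog file) in which one node lacks the Name attribute. The names xml_text and rows are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second child has no Name attribute.
xml_text = """
<Catalog>
    <ItemGroup Name="Substructure" WBS="1" />
    <Setting WBS="2" />
</Catalog>
"""

root = ET.fromstring(xml_text)
elements = ["Name", "WBS"]

rows = []
for node in root:
    # .get() returns None for a missing attribute instead of raising KeyError
    rows.append([node.attrib.get(item) for item in elements])

for row in rows:
    print(row)
```

If None values are not wanted in the output, the loop can skip a node when "Name" is not in node.attrib.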
Related
I am trying to create an API connection, and the response looks like the one below. I need to parse this data and turn it into a pandas DataFrame and/or loop over it to find specific information belonging to particular tags.
Below is the code I tried to run, but it returns an empty list, and the result does not appear to be iterable.
It also cannot be converted to a DataFrame for now. What steps should I take to handle this data?
import requests
import pandas as pd
import xml.etree.ElementTree as ET
response = """<?xml version = "1.0" encoding = "utf-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<SOAP-ENV:Body>
<Desperados_Clientes_V2.DESPERADOSResponse xmlns="TrainsWebb_V16">
<Sdtdesperadosclient xmlns="TrainsWebb_V16">
<SDTDesperadosClientItem xmlns="TrainsWebb_V16">
<AESA>10555555555 </AESA>
<DOCUMENTO>1666666666</DOCUMENTO>
<REMITENTE>888888888 </REMITENTE>
<NM_REMITENTE>ABDULREZZAK S.A.S. </NM_REMITENTE>
<FECHA_ELABORACION>14/8/2020</FECHA_ELABORACION>
<HORA_ELABORACION>11:27</HORA_ELABORACION>
<CODIGO_DEST>0000000000</CODIGO_DEST>
<NIT_DESTINATARIO>0000000000</NIT_DESTINATARIO>
<NOMBRE_DESTINATARIO>HOST ADMIRALE GORA</NOMBRE_DESTINATARIO>
<DIRECCION_DESTINATARIO>BBA 56 # 21 - 001</DIRECCION_DESTINATARIO>
<DANE_DESTINO>0200000</DANE_DESTINO>
<CIUDAD_DESTINO>GORA </CIUDAD_DESTINO>
<DEPARTAMENTO_DESTINO>ANTIOCHIA </DEPARTAMENTO_DESTINO>
<FECHA_ENTREGA>11/02/2020</FECHA_ENTREGA>
<HORA_ENTREGA>11:44</HORA_ENTREGA>
<FECHA_CITA />
<HORA_CITA />
<CODIGO_ESTADO>Z </CODIGO_ESTADO>
<NOMBRE_ESTADO>CUMPLEANNO </NOMBRE_ESTADO>
<FECHA_ESTADO>11/01/2020</FECHA_ESTADO>
<HORA_ESTADO>11:44</HORA_ESTADO>
<CODIGO_NOVEDAD />
<NOMBRE_NOVEDAD />
<FECHA_NOVEDAD />
<HORA_NOVEDAD />
<COMENTARIO_NOVEDAD />
<OBSERVACIONES />
<ENLACE_IMAGEN>https://ssssss.ssssssss.com/SSSSSS/xxxxxxxxxxxxxx.aspx?1111111,222222222,SIXA_XEOX,SIXAXEOX2016</ENLACE_IMAGEN>
<DOCUMENTO_2>169999999999</DOCUMENTO_2>
<DOCUMENTO_3 />
<DOCUMENTO_4 />
<FECHA_TRANSMISION>18/02/2020</FECHA_TRANSMISION>
<HORA_TRANSMISION>08:12:30</HORA_TRANSMISION>
<MENSAJE_TRANSMISION>KK</MENSAJE_TRANSMISION>
<PROMESA_SERVICIO>15/10/21</PROMESA_SERVICIO>
<CODIGO_DIVISION>011111</CODIGO_DIVISION>
<NOMBRE_DIVISION>ABDURREZZAK </NOMBRE_DIVISION>
</SDTDesperadosClientItem>
<SDTDesperadosClientItem xmlns="TrainsWebb_V16">
<AESA>10555555555 </AESA>
<DOCUMENTO>177777777</DOCUMENTO>
<REMITENTE>9999999999 </REMITENTE>
<NM_REMITENTE>ABDULREZZAK S.A.S. </NM_REMITENTE>
<FECHA_ELABORACION>12/8/2020</FECHA_ELABORACION>
<HORA_ELABORACION>16:27</HORA_ELABORACION>
<CODIGO_DEST>0000000000</CODIGO_DEST>
<NIT_DESTINATARIO>0000000000</NIT_DESTINATARIO>
<NOMBRE_DESTINATARIO>GORA FORA</NOMBRE_DESTINATARIO>
<DIRECCION_DESTINATARIO>BBG 16 # 91 - 021</DIRECCION_DESTINATARIO>
<DANE_DESTINO>0500000</DANE_DESTINO>
<CIUDAD_DESTINO>AROG </CIUDAD_DESTINO>
<DEPARTAMENTO_DESTINO>ANTIOCHIA </DEPARTAMENTO_DESTINO>
<FECHA_ENTREGA>10/02/2020</FECHA_ENTREGA>
<HORA_ENTREGA>10:44</HORA_ENTREGA>
<FECHA_CITA />
<HORA_CITA />
<CODIGO_ESTADO>D </CODIGO_ESTADO>
<NOMBRE_ESTADO>CUMPLEANNI </NOMBRE_ESTADO>
<FECHA_ESTADO>11/01/2020</FECHA_ESTADO>
<HORA_ESTADO>11:44</HORA_ESTADO>
<CODIGO_NOVEDAD />
<NOMBRE_NOVEDAD />
<FECHA_NOVEDAD />
<HORA_NOVEDAD />
<COMENTARIO_NOVEDAD />
<OBSERVACIONES />
<ENLACE_IMAGEN>https://ssssss.ssssssss.com/SSSSSS/xxxxxxxxxxxxxx.aspx?1111111,222222222,SIXA_XEOX,SIXAXEOX2016</ENLACE_IMAGEN>
<DOCUMENTO_2>1677777777</DOCUMENTO_2>
<DOCUMENTO_3 />
<DOCUMENTO_4 />
<FECHA_TRANSMISION>18/02/2020</FECHA_TRANSMISION>
<HORA_TRANSMISION>08:12:30</HORA_TRANSMISION>
<MENSAJE_TRANSMISION>HK</MENSAJE_TRANSMISION>
<PROMESA_SERVICIO>15/10/21</PROMESA_SERVICIO>
<CODIGO_DIVISION>011111</CODIGO_DIVISION>
<NOMBRE_DIVISION>ABDURREZZAK </NOMBRE_DIVISION>
</SDTDesperadosClientItem>
</Sdtdesperadosclient>
</Desperados_Clientes_V2.DESPERADOSResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""
myroot = ET.fromstring(response)
for child in myroot.iter('*'):
    print(child.tag)
sid = myroot.findall(".//{'TrainsWebb_V16'}AESA")
print(sid)
For parsing into a pandas DataFrame, you can use the pandas.read_xml function:
data_frame = pd.read_xml(response, xpath="//*[name()='SDTDesperadosClientItem']")
https://pandas.pydata.org/docs/reference/api/pandas.read_xml.html#pandas-read-xml
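The empty list from findall in the question is most likely caused by the quotes inside the braces: ElementTree expects the namespace URI in the form {TrainsWebb_V16}AESA, without quotes. The following is a sketch using only the standard library, with a shortened, hypothetical stand-in for the response. The names NS, rows and aesa are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for the SOAP response in the question.
response = """<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <Sdtdesperadosclient xmlns="TrainsWebb_V16">
      <SDTDesperadosClientItem>
        <AESA>10555555555 </AESA>
        <DOCUMENTO>1666666666</DOCUMENTO>
        <FECHA_CITA />
      </SDTDesperadosClientItem>
      <SDTDesperadosClientItem>
        <AESA>10555555555 </AESA>
        <DOCUMENTO>177777777</DOCUMENTO>
        <FECHA_CITA />
      </SDTDesperadosClientItem>
    </Sdtdesperadosclient>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

NS = "{TrainsWebb_V16}"  # braces around the namespace URI, no quotes
root = ET.fromstring(response)

rows = []
for item in root.iter(NS + "SDTDesperadosClientItem"):
    # Strip the namespace from each child tag and trim padded text;
    # empty elements such as <FECHA_CITA /> become empty strings.
    row = {child.tag.replace(NS, ""): (child.text or "").strip() for child in item}
    rows.append(row)

aesa = [el.text.strip() for el in root.findall(".//" + NS + "AESA")]
print(aesa)
print(rows)
```

A list of dictionaries like rows can be passed to pd.DataFrame(rows) if a DataFrame is needed.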
I have some XML and I want to get the text of specific elements (XML tags) using Python.
I have tried a couple of solutions, but they didn't work.
import xml.etree.ElementTree as ET
tree = ET.fromstring(xml)
for node in tree.iter('Model'):
    print node
How can I do that?
XML code:
<soap:Envelope
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetVehicleLimitedInfoResponse
xmlns="http://schemas.conversesolutions.com/xsd/dmticta/v1">
<return>
<ResponseMessage xsi:nil="true" />
<ErrorCode xsi:nil="true" />
<RequestId> 2012290007705 </RequestId>
<TransactionCharge>150</TransactionCharge>
<VehicleNumber>GF-0176</VehicleNumber>
<AbsoluteOwner>SIYAPATHA FINANCE PLC</AbsoluteOwner>
<EngineNo>GA15-483936F</EngineNo>
<ClassOfVehicle>MOTOR CAR</ClassOfVehicle>
<Make>NISSAN</Make>
<Model>PULSAR</Model>
<YearOfManufacture>1998</YearOfManufacture>
<NoOfSpecialConditions>0</NoOfSpecialConditions>
<SpecialConditions xsi:nil="true" />
</return>
</GetVehicleLimitedInfoResponse>
</soap:Body>
</soap:Envelope>
Edited and improved answer:
import xml.etree.ElementTree as ET
import re
ns = {"veh": "http://schemas.conversesolutions.com/xsd/dmticta/v1"}
tree = ET.parse('test.xml') # save your xml as test.xml
root = tree.getroot()
def get_tag_name(tag):
    return re.sub(r'\{.*\}', '', tag)

for node in root.find(".//veh:return", ns):
    print(get_tag_name(node.tag) + ': ', node.text)
It should produce something like this:
ResponseMessage: None
ErrorCode: None
RequestId: 2012290007705
TransactionCharge: 150
VehicleNumber: GF-0176
AbsoluteOwner: SIYAPATHA FINANCE PLC
EngineNo: GA15-483936F
ClassOfVehicle: MOTOR CAR
Make: NISSAN
Model: PULSAR
YearOfManufacture: 1998
NoOfSpecialConditions: 0
SpecialConditions: None
I want to read the text between
<dc:title> </dc:title>
This is the XML:
<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="calibre-uuid">
<metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:opf="http://www.idpf.org/2007/opf" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:calibre="http://calibre.kovidgoyal.net/2009/metadata" xmlns:dc="http://purl.org/dc/elements/1.1/">
<meta name="calibre:series_index" content="1"/>
<dc:language>UND</dc:language>
<dc:creator opf:file-as="Unbekannt" opf:role="aut">Johann Wolfgang von Goethe</dc:creator>
<meta name="calibre:timestamp" content="2009-10-08T07:26:21"/>
<dc:title>Faust_I_</dc:title>
<meta name="cover" content="cover"/>
<dc:date>2009-10-08T07:26:21</dc:date>
<dc:contributor opf:role="bkp">calibre (0.6.13) [http://calibre-ebook.com]</dc:contributor>
<dc:identifier id="calibre-uuid">urn:uuid:3cd4b26f-39a3-4783-9730-a86c26b30818</dc:identifier>
And that's my code:
from xml.etree import ElementTree as ET
tree = ET.parse('content.opf')
root = tree.getroot()
dc_namespace = "http://purl.org/dc/elements/1.1/"
print (root.attrib[ET.QName(dc_namespace, 'title')])
Output Error:
Traceback (most recent call last):
File "C:\Users\User\Documents\Visual Studio 2017\Projects\PythonApplication1\Modul1.py", line 8, in <module>
print (root.attrib[ET.QName(dc_namespace, 'title')])
KeyError: <QName '{xmlns:dc}title'>
What's wrong?
What you are looking for (<dc:title>) is an element, not an attribute. Here is how you can get its value:
from xml.etree import ElementTree as ET
tree = ET.parse('content.opf')
title = tree.find(".//{http://purl.org/dc/elements/1.1/}title")
print(title.text)
Output:
Faust_I_
Relevant references:
https://docs.python.org/3/library/xml.etree.elementtree.html#parsing-xml-with-namespaces
https://docs.python.org/3/library/xml.etree.elementtree.html#supported-xpath-syntax
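As an alternative sketch, the same lookup can be written with a prefix-to-URI mapping passed to find(), which avoids repeating the full namespace URI in every path. The XML below is a shortened, hypothetical stand-in for content.opf, and the names opf, ns and title_text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for content.opf
opf = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:language>UND</dc:language>
    <dc:title>Faust_I_</dc:title>
  </metadata>
</package>"""

root = ET.fromstring(opf)
ns = {"dc": "http://purl.org/dc/elements/1.1/"}

title = root.find(".//dc:title", ns)
# find() returns None when nothing matches, so guard before reading .text
title_text = title.text if title is not None else None
print(title_text)

# An element lookup is not an attribute lookup: root.attrib only holds
# the attributes written on <package> itself.
print(root.attrib)
```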
You can use:
root[number][number]
to access the elements by position.
For example, in
<base>
    <element1>
        <element2>asdada</element2>
    </element1>
</base>
root[0][0] will give you element2.
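A minimal sketch of positional access with xml.etree.ElementTree, using the small example above as an inline string. The variable names first and nested are made up for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<base><element1><element2>asdada</element2></element1></base>"
)

first = root[0]       # first child of <base>
nested = root[0][0]   # first child of that child
print(first.tag, nested.tag, nested.text)
```

Positional indexing raises IndexError when a child is missing, so it is best suited to documents whose layout is fixed.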
Example xml:
<response version-api="2.0">
<value>
<books>
<book available="20" id="1" tags="">
<title></title>
<author id="1" tags="Joel">Manuel De Cervantes</author>
</book>
<book available="14" id="2" tags="Jane">
<title>Catcher in the Rye</title>
<author id="2" tags="">JD Salinger</author>
</book>
<book available="13" id="3" tags="">
<title></title>
<author id="3">Lewis Carroll</author>
</book>
<book available="5" id="4" tags="Harry">
<title>Don</title>
<author id="4">Manuel De Cervantes</author>
</book>
</books>
</value>
</response>
I want to append a string value of my choosing to all attributes called "tags". This applies whether or not the "tags" attribute already has a value, and the attributes are at different levels of the XML structure. I have tried the findall() method, but I keep getting the error "IndexError: list index out of range." This is the code I have so far. It is a little short, but I have run out of ideas for what else to write:
splitter = etree.XMLParser(strip_cdata=False)
xmldoc = etree.parse(os.path.join(root, xml_file), splitter ).getroot()
for child in xmldoc:
    if child.tag != 'response':
        allDescendants = list(etree.findall())
        for child in allDescendants:
            if hasattr(child, 'tags'):
                child.attribute["tags"].value = "someString"
findall() is the right API to use. Here is an example:
from lxml import etree
import os
splitter = etree.XMLParser(strip_cdata=False)
xml_file = 'foo.xml'
root = '.'
xmldoc = etree.parse(os.path.join(root, xml_file), splitter ).getroot()
for element in xmldoc.findall(".//*[@tags]"):
    element.attrib["tags"] += " KILROY!"
print(etree.tostring(xmldoc).decode())
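The same idea can be sketched with the standard library alone, since ElementTree's XPath subset also supports the [@attribute] predicate. The XML below is a shortened, hypothetical variant of the example, and suffix is a placeholder for the string to append.

```python
import xml.etree.ElementTree as ET

xml_text = """<response version-api="2.0">
  <books>
    <book available="20" id="1" tags="">
      <author id="1" tags="Joel">Manuel De Cervantes</author>
    </book>
    <book available="13" id="3" tags="Jane">
      <author id="3">Lewis Carroll</author>
    </book>
  </books>
</response>"""

root = ET.fromstring(xml_text)
suffix = "someString"  # hypothetical value to append

# Select every descendant that carries a "tags" attribute, at any depth
for element in root.findall(".//*[@tags]"):
    current = element.get("tags")
    # strip() avoids a leading space when the attribute was empty
    element.set("tags", f"{current} {suffix}".strip())

tags = [el.get("tags") for el in root.iter() if "tags" in el.attrib]
print(tags)
```

Elements without a "tags" attribute, such as the second author, are left untouched.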
I am parsing an output XML file generated by gtest. I want to find the result of each test case. A test case has failed only when "testcase" has a "failure" element; otherwise the test case has passed. But I could not access that element.
My xml file :-
<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="11" failures="0" disabled="0" errors="0" timestamp="2015-03-23T17:29:43" time="1.309" name="AllTests">
<testsuite name="AAA" tests="4" failures="0" disabled="0" errors="0" time="0.008">
<testcase name="BBBB" status="run" time="0.002" classname="AAA" />
<failure message="Value of: add(1, 1)
Actual: 3
Expected: 2" type="" />
<testcase name="CCC" status="run" time="0.002" classname="AAA" />
<testcase name="DDD" status="run" time="0.002" classname="AAA" />
<testcase name="FFF" status="run" time="0.002" classname="AAA" />
</testsuite>
</testsuites>
My python file is :-
from xlrd import open_workbook
from xml.dom.minidom import parse
import xml.dom.minidom
# Open XML document using minidom parser
DOMTree = xml.dom.minidom.parse("output.xml")
testsuites = DOMTree.documentElement
testCaseCollection = testsuites.getElementsByTagName("testcase")
testCasefailure = testsuites.getElementsByTagName("failure")
OutputXLS = open_workbook('output.xls')
for testCase in testCaseCollection:
    #print testCase.firstChild;
    if testsuites.getElementsByTagName("failure"):
        print testCase.getAttribute("name"), " --> ","FAIL"
    else:
        print testCase.getAttribute("name"), " --> ","PASS"
And output is :-
BBB --> PASS
CCC --> PASS
DDD --> PASS
FFF --> PASS
Although test case "BBB" failed, since it has a "failure" element in the XML, it is shown as passed in the result.
Kindly help me out with this.
from xlrd import open_workbook
from xml.dom.minidom import parse
# Open XML document using minidom parser
DOMTree = parse("output.xml")
testsuites = DOMTree.documentElement
testCaseCollection = testsuites.getElementsByTagName("testcase")
OutputXLS = open_workbook('output.xls')
for testCase in testCaseCollection:
    # Skip the whitespace text node that follows each testcase element
    sibNode = testCase.nextSibling.nextSibling
    if sibNode and sibNode.nodeName == 'failure':
        print(testCase.getAttribute("name"), " --> ", "FAIL")
    else:
        print(testCase.getAttribute("name"), " --> ", "PASS")
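For comparison, here is a sketch of the same check using xml.etree.ElementTree, which does not expose whitespace as separate child nodes. It treats a test case as failed if a failure element is either nested inside it or directly follows it as a sibling, as in the file above. The XML is a shortened, hypothetical variant of the question's file, and the results dictionary is made up for illustration.

```python
import xml.etree.ElementTree as ET

xml_text = """<testsuites name="AllTests">
  <testsuite name="AAA">
    <testcase name="BBBB" status="run" classname="AAA" />
    <failure message="Value of: add(1, 1)" type="" />
    <testcase name="CCC" status="run" classname="AAA" />
    <testcase name="DDD" status="run" classname="AAA">
      <failure message="nested failure" type="" />
    </testcase>
  </testsuite>
</testsuites>"""

root = ET.fromstring(xml_text)
results = {}

for suite in root.iter("testsuite"):
    children = list(suite)
    for index, node in enumerate(children):
        if node.tag != "testcase":
            continue
        # Case 1: <failure> nested inside the <testcase>
        failed = node.find("failure") is not None
        # Case 2: <failure> is the next sibling, as in the question's file
        nxt = children[index + 1] if index + 1 < len(children) else None
        if nxt is not None and nxt.tag == "failure":
            failed = True
        results[node.get("name")] = "FAIL" if failed else "PASS"

for name, status in results.items():
    print(name, "-->", status)
```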