Not getting XML output as expected - python

I am using Python 3 and following this XML tutorial: https://docs.python.org/3.7/library/xml.etree.elementtree.html
I want to output a listing of all DailyIndexRatio elements, like this:
DailyIndexRatio {'CUSIP': '912810FD5', 'IssueDate': '1998-04-15', 'Date': '2019-03-01', 'RefCPI': '251.23300', 'IndexRatio': '1.55331'}
....
Instead, my code outputs:
DailyIndexRatio {}
....
How do I fix this?
Here is the code:
import xml.etree.ElementTree as ET

tree = ET.parse('CPI_20190213.xml')
root = tree.getroot()
print(root.tag)
print(root.attrib)
for child in root:
    print(child.tag, child.attrib)
I downloaded the XML file from https://treasurydirect.gov/xml/CPI_20190213.xml

import xml.etree.ElementTree as ET

tree = ET.parse('CPI_20190213.xml')  # Load the XML
root = tree.getroot()  # Get XML root element
e = root.findall('.//DailyIndexRatio')  # use XPath to find relevant elements
# for each element
for i in e:
    # create a dictionary object
    d = {}
    # for each child of the element
    for child in i:
        # add the tag name and text value to the dictionary
        d[child.tag] = child.text
    # print the DailyIndexRatio tag name and dictionary
    print(i.tag, d)
Outputs:
DailyIndexRatio {'CUSIP': '912810FD5', 'IssueDate': '1998-04-15', 'Date': '2019-03-01', 'RefCPI': '251.23300', 'IndexRatio': '1.55331'}
DailyIndexRatio {'CUSIP': '912810FD5', 'IssueDate': '1998-04-15', 'Date': '2019-03-02', 'RefCPI': '251.24845', 'IndexRatio': '1.55341'}
DailyIndexRatio {'CUSIP': '912810FD5', 'IssueDate': '1998-04-15', 'Date': '2019-03-03', 'RefCPI': '251.26390', 'IndexRatio': '1.55351'}
DailyIndexRatio {'CUSIP': '912810FD5', 'IssueDate': '1998-04-15', 'Date': '2019-03-04', 'RefCPI': '251.27935', 'IndexRatio': '1.55360'}
...
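The same loop can be written more compactly with a dict comprehension. The sketch below runs on an inline sample instead of the downloaded file; the <Root> wrapper tag is hypothetical (the real file's root element isn't shown here), while the DailyIndexRatio fields are copied from the output above.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for CPI_20190213.xml: <Root> is an assumed root tag,
# the DailyIndexRatio children match the printed output above.
SAMPLE = """<Root>
  <DailyIndexRatio>
    <CUSIP>912810FD5</CUSIP>
    <IssueDate>1998-04-15</IssueDate>
    <Date>2019-03-01</Date>
    <RefCPI>251.23300</RefCPI>
    <IndexRatio>1.55331</IndexRatio>
  </DailyIndexRatio>
  <DailyIndexRatio>
    <CUSIP>912810FD5</CUSIP>
    <IssueDate>1998-04-15</IssueDate>
    <Date>2019-03-02</Date>
    <RefCPI>251.24845</RefCPI>
    <IndexRatio>1.55341</IndexRatio>
  </DailyIndexRatio>
</Root>"""

# For the real file: root = ET.parse('CPI_20190213.xml').getroot()
root = ET.fromstring(SAMPLE)

# One dict per DailyIndexRatio, built from its child elements' tags and texts.
records = [
    {child.tag: child.text for child in ratio}
    for ratio in root.iter("DailyIndexRatio")
]
for rec in records:
    print("DailyIndexRatio", rec)
```

Keeping the records in a list (rather than printing inside the loop) makes it easy to hand them to other code afterwards.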

You're printing each element's attributes (child.attrib), but DailyIndexRatio does not have any attributes, so you get an empty dict.
This is an element with attributes:
<element name="Bob" age="40" sex="male" />
But the element you're trying to print doesn't have those. It has child elements:
<element>
  <name>Bob</name>
  <age>40</age>
  <sex>male</sex>
</element>
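A small sketch of the difference, parsing the two example elements above: attributes show up in .attrib, while child elements have to be read through their tag and text.

```python
import xml.etree.ElementTree as ET

with_attrs = ET.fromstring('<element name="Bob" age="40" sex="male" />')
with_children = ET.fromstring(
    "<element><name>Bob</name><age>40</age><sex>male</sex></element>"
)

print(with_attrs.attrib)     # {'name': 'Bob', 'age': '40', 'sex': 'male'}
print(with_children.attrib)  # {} (what the question's loop printed)

# For child elements, read each child's tag and text instead.
print({c.tag: c.text for c in with_children})  # {'name': 'Bob', 'age': '40', 'sex': 'male'}
print(with_children.findtext("age"))           # 40
```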

Related

Parse xml file to a python list

I have an XML file:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03">
<CstmrCdtTrfInitn>
<GrpHdr>
<MsgId>637987745078994894</MsgId>
<CreDtTm>2022-09-14T05:48:27</CreDtTm>
<NbOfTxs>205</NbOfTxs>
<CtrlSum>154761.02</CtrlSum>
<InitgPty>
<Nm> Company</Nm>
</InitgPty>
</GrpHdr>
<PmtInf>
<PmtInfId>20220914054827-154016</PmtInfId>
<PmtMtd>TRF</PmtMtd>
<BtchBookg>true</BtchBookg>
<NbOfTxs>205</NbOfTxs>
<CtrlSum>154761.02</CtrlSum>
<PmtTpInf>
<SvcLvl>
<Cd>SEPA</Cd>
</SvcLvl>
<CtgyPurp>
<Cd>SALA</Cd>
</CtgyPurp>
</PmtTpInf>
<CdtTrfTxInf> <----------------------------------
<Amt>
<InstdAmt Ccy="EUR">1536.96</InstdAmt>
</Amt>
<Cdtr>
<Nm>Achternaam, Voornaam </Nm>
</Cdtr>
<CdtrAcct>
<Id>
<IBAN>NL80RABO0134343443</IBAN>
</Id>
</CdtrAcct>
</CdtTrfTxInf> <------------------------------------
<CdtTrfTxInf> <----------------------------------
<Amt>
<InstdAmt Ccy="EUR">1676.96</InstdAmt>
</Amt>
<Cdtr>
<Nm>Achternaam, Voornaam </Nm>
</Cdtr>
<CdtrAcct>
<Id>
<IBAN>NL80RABO013433222243</IBAN>
</Id>
</CdtrAcct>
</CdtTrfTxInf> <------------------------------------
</CstmrCdtTrfInitn>
</Document>
I'm using ElementTree.
I want a Python list of tuples with the information inside each CdtTrfTxInf element (everything between the arrows in the example XML file), so in this example I want a list with 2 tuples.
How can I do that?
I can iterate over the tree, but that's it.
My code:
import xml.etree.ElementTree as ET

tree = ET.parse(xml_file)
root = tree.getroot()
for elem in tree.iter():
    print(elem.tag, elem.text)  # I get every tag in the whole file
I'd rather use xmltodict for this.
First of all, your input data as given is missing a closing </PmtInf> tag towards the end, just before your closing </CstmrCdtTrfInitn> tag. After fixing that, I saved your xml data into a file and did the following:
import xmltodict

with open("input_data.xml", "r") as f:
    xml_data = f.read()
xml_dict = xmltodict.parse(xml_data)
You can then access the xml data using dictionary accessors, for example:
>>> xml_dict
{'Document': {'#xmlns:xsi': 'http://www.w3.org/20...a-instance', '#xmlns': 'urn:iso:std:iso:2002...001.001.03', 'CstmrCdtTrfInitn': {...}}}
>>> xml_dict["Document"]
{'#xmlns:xsi': 'http://www.w3.org/20...a-instance', '#xmlns': 'urn:iso:std:iso:2002...001.001.03', 'CstmrCdtTrfInitn': {'GrpHdr': {...}, 'PmtInf': {...}}}
>>> xml_dict["Document"]["CstmrCdtTrfInitn"].keys()
dict_keys(['GrpHdr', 'PmtInf'])
>>> xml_dict["Document"]["CstmrCdtTrfInitn"]["PmtInf"]
{'PmtInfId': '20220914054827-154016', 'PmtMtd': 'TRF', 'BtchBookg': 'true', 'NbOfTxs': '205', 'CtrlSum': '154761.02', 'PmtTpInf': {'SvcLvl': {...}, 'CtgyPurp': {...}}, 'CdtTrfTxInf': [{...}, {...}]}
>>> xml_dict["Document"]["CstmrCdtTrfInitn"]["PmtInf"].keys()
dict_keys(['PmtInfId', 'PmtMtd', 'BtchBookg', 'NbOfTxs', 'CtrlSum', 'PmtTpInf', 'CdtTrfTxInf'])
Then you can loop over your CdtTrfTxInf with:
for item in xml_dict["Document"]["CstmrCdtTrfInitn"]["PmtInf"]["CdtTrfTxInf"]:
    print(item)
giving the output:
{'Amt': {'InstdAmt': {'#Ccy': 'EUR', '#text': '1536.96'}}, 'Cdtr': {'Nm': 'Achternaam, Voornaam'}, 'CdtrAcct': {'Id': {'IBAN': 'NL80RABO0134343443'}}}
{'Amt': {'InstdAmt': {'#Ccy': 'EUR', '#text': '1676.96'}}, 'Cdtr': {'Nm': 'Achternaam, Voornaam'}, 'CdtrAcct': {'Id': {'IBAN': 'NL80RABO013433222243'}}}
which you can process as you want.
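This is just a quick attempt, give it a try. Note that the file declares a default namespace, so ElementTree reports tags as "{urn:iso:...}CdtTrfTxInf"; the code strips that prefix before comparing, and skips elements whose text is None:
import xml.etree.ElementTree as ET

tree = ET.parse("fr.xml")
root = tree.getroot()
test = False
for elem in tree.iter():
    tag = elem.tag.split("}")[-1]  # drop the "{namespace}" prefix
    if tag == "CdtTrfTxInf":
        test = True
        continue
    # elem.text is None for elements without text, so check it before strip()
    if test and elem.text and elem.text.strip():
        print(tag, elem.text)
With the result as a list of tuples:
import xml.etree.ElementTree as ET

tree = ET.parse("fr.xml")
root = tree.getroot()
test = False
tag = []
textval = []
for elem in tree.iter():
    local = elem.tag.split("}")[-1]  # drop the "{namespace}" prefix
    if local == "CdtTrfTxInf":
        test = True
        continue
    if test and elem.text and elem.text.strip():
        tag.append(local)
        textval.append(elem.text)
data = list(zip(tag, textval))
print(data)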
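The flag-based loop above yields one flat list of (tag, text) pairs for all transactions together. To get exactly one tuple per CdtTrfTxInf, as the question asks, a sketch using ElementTree's namespace mapping could look like this. Which fields go into each tuple (currency, amount, name, IBAN) is an assumption, and the "p" prefix is an arbitrary local alias for the file's default namespace.

```python
import xml.etree.ElementTree as ET

# "p" is an arbitrary prefix mapped to the default namespace declared on <Document>.
NS = {"p": "urn:iso:std:iso:20022:tech:xsd:pain.001.001.03"}

# Trimmed copy of the question's data, with the missing </PmtInf> restored.
SAMPLE = """<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03">
  <CstmrCdtTrfInitn>
    <PmtInf>
      <CdtTrfTxInf>
        <Amt><InstdAmt Ccy="EUR">1536.96</InstdAmt></Amt>
        <Cdtr><Nm>Achternaam, Voornaam </Nm></Cdtr>
        <CdtrAcct><Id><IBAN>NL80RABO0134343443</IBAN></Id></CdtrAcct>
      </CdtTrfTxInf>
      <CdtTrfTxInf>
        <Amt><InstdAmt Ccy="EUR">1676.96</InstdAmt></Amt>
        <Cdtr><Nm>Achternaam, Voornaam </Nm></Cdtr>
        <CdtrAcct><Id><IBAN>NL80RABO013433222243</IBAN></Id></CdtrAcct>
      </CdtTrfTxInf>
    </PmtInf>
  </CstmrCdtTrfInitn>
</Document>"""


def transactions(root):
    """Return one (currency, amount, name, iban) tuple per CdtTrfTxInf element."""
    rows = []
    for tx in root.iterfind(".//p:CdtTrfTxInf", NS):
        amt = tx.find("p:Amt/p:InstdAmt", NS)
        rows.append((
            amt.get("Ccy"),
            amt.text,
            tx.findtext("p:Cdtr/p:Nm", default="", namespaces=NS).strip(),
            tx.findtext("p:CdtrAcct/p:Id/p:IBAN", namespaces=NS),
        ))
    return rows


root = ET.fromstring(SAMPLE)  # or ET.parse("fr.xml").getroot()
print(transactions(root))
```

Searching within each CdtTrfTxInf keeps the fields of one transaction together, so the tuples cannot get out of step the way two parallel lists can.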

Adjust Python function to parse XML

I need to read an XML file from an external domain.
My code:
import urllib2
import xml.etree.ElementTree as ET

tree = ET.ElementTree(file=urllib2.urlopen('http://192.168.2.57:8010/data/camera_state.xml'))
root = tree.getroot()
root.tag, root.attrib
for elem in tree.iter():
    print elem.tag, elem.attrib
I can't get at the structure I need; my code produces this:
CameraState {}
Cameras {}
Camera {'Id': '1'}
State {}
Camera {'Id': '2'}
State {}
Camera {'Id': '3'}
State {}
Camera {'Id': '4'}
State {}
I need to adjust this code to get a result like the following:
<CameraState>
  <Cameras>
    <Camera Id="1">
      <State>NO_SIGNAL</State>
    </Camera>
    <Camera Id="2">
      <State>OK</State>
    </Camera>
  </Cameras>
</CameraState>
You do have the parsed structure. It's just about the way you are accessing it.
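Iterate over a node (for child in node) to access its children; the older getchildren() method was deprecated and removed in Python 3.9. An example of recursively printing the structure:
import xml.etree.ElementTree as ET

def print_tree(node, prefix=''):
    # node.text is None for elements without text, so fall back to ''
    print(prefix, node.tag, node.attrib, (node.text or '').strip())
    for child in node:
        print_tree(child, prefix + ' ')

tree = ET.ElementTree(file='camera_state.xml')  # path to your copy of the file
root = tree.getroot()
print_tree(root)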
It gives:
CameraState {}
Cameras {}
Camera {'Id': '1'}
State {} NO_SIGNAL
Camera {'Id': '2'}
State {} OK
However, I recommend you take a look at xmltodict:
import xmltodict
with open('camera_state.xml') as f:  # path to your copy of the file
    tree = xmltodict.parse(f.read())
print(tree)
It gives you nested OrderedDicts (recent xmltodict versions return plain dicts instead):
OrderedDict([('CameraState', OrderedDict([('Cameras', OrderedDict([('Camera', [OrderedDict([('#Id', '1'), ('State', 'NO_SIGNAL')]), OrderedDict([('#Id', '2'), ('State', 'OK')])])]))]))])
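If the goal is simply the state of each camera, a sketch of a Python 3 version that maps each Camera's Id to its State. The question's code is Python 2 (urllib2); in Python 3 the equivalent is urllib.request.urlopen, and ET.parse should accept the response object directly. The inline bytes below only stand in for the HTTP response body and are taken from the expected output above.

```python
import io
import xml.etree.ElementTree as ET


def camera_states(source):
    """Map each Camera's Id attribute to the text of its <State> child.

    source may be a filename or a binary file object, for example the
    response from urllib.request.urlopen(url) in Python 3.
    """
    root = ET.parse(source).getroot()
    return {
        cam.get("Id"): cam.findtext("State", default="").strip()
        for cam in root.iter("Camera")
    }


# Stand-in for the HTTP response body.
SAMPLE = b"""<CameraState>
  <Cameras>
    <Camera Id="1"><State>NO_SIGNAL</State></Camera>
    <Camera Id="2"><State>OK</State></Camera>
  </Cameras>
</CameraState>"""

print(camera_states(io.BytesIO(SAMPLE)))  # {'1': 'NO_SIGNAL', '2': 'OK'}
```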

lxml etree get all text before element

How can I get all the text before an element in an lxml tree, separately from the text after it?
from lxml import etree
tree = etree.fromstring('''
<a>
find
<b>
the
</b>
text
<dd></dd>
<c>
before
</c>
<dd></dd>
and after
</a>
''')
What I want: in this example the <dd> tags act as separators, and for each of them
for el in tree.findall('.//dd'):
I would like to get all the text before and after it:
[
{
el : <Element dd at 0xsomedistinctadress>,
before : 'find the text',
after : 'before and after'
},
{
el : <Element dd at 0xsomeotherdistinctadress>,
before : 'find the text before',
after : 'and after'
}
]
My idea was to replace the <dd> tags with placeholders in the tree and then split the string at each placeholder, but I need to keep the correspondence with the actual element.
There might be a simpler way, but I would use the following XPath expressions:
preceding-sibling::*/text()|preceding-sibling::text()
following-sibling::*/text()|following-sibling::text()
Sample implementation (definitely violating the DRY principle):
def get_text_before(element):
    for item in element.xpath("preceding-sibling::*/text()|preceding-sibling::text()"):
        item = item.strip()
        if item:
            yield item

def get_text_after(element):
    for item in element.xpath("following-sibling::*/text()|following-sibling::text()"):
        item = item.strip()
        if item:
            yield item

for el in tree.findall('.//dd'):
    before = " ".join(get_text_before(el))
    after = " ".join(get_text_after(el))
    print({
        "el": el,
        "before": before,
        "after": after
    })
Prints:
{'el': <Element dd at 0x10af81488>, 'after': 'before and after', 'before': 'find the text'}
{'el': <Element dd at 0x10af81200>, 'after': 'and after', 'before': 'find the text before'}
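A different technique that needs no XPath and also picks up text nested deeper than one level (the sibling-axis expressions above only see the direct text of sibling elements): walk the tree in document order, collect each element's .text and .tail, and start a new chunk whenever a separator element is reached. This is a sketch using the standard-library ElementTree on the question's sample; text_chunks and text_around are hypothetical helper names, and a separator's own text is deliberately ignored.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<a>
find
<b>
the
</b>
text
<dd></dd>
<c>
before
</c>
<dd></dd>
and after
</a>"""


def text_chunks(root, sep_tag):
    """Collect all text in document order, starting a new chunk at each sep_tag element."""
    chunks = [[]]

    def walk(el):
        if el.tag == sep_tag:
            chunks.append([])  # separator: following text goes into a new chunk
        elif el.text:
            chunks[-1].append(el.text)
        for child in el:
            walk(child)
            if child.tail:  # text after the child's closing tag
                chunks[-1].append(child.tail)

    walk(root)
    # Collapse runs of whitespace inside each chunk.
    return [" ".join(" ".join(parts).split()) for parts in chunks]


def text_around(root, sep_tag):
    """For the i-th separator, 'before' is chunks 0..i and 'after' is the rest."""
    chunks = text_chunks(root, sep_tag)
    return [
        {
            "el": el,
            "before": " ".join(c for c in chunks[:i + 1] if c),
            "after": " ".join(c for c in chunks[i + 1:] if c),
        }
        for i, el in enumerate(root.iter(sep_tag))
    ]


tree = ET.fromstring(SAMPLE)
for item in text_around(tree, "dd"):
    print(item["before"], "|", item["after"])
```

Both the walk and root.iter() visit elements in document order, which is what lets the i-th separator line up with the boundary between chunk i and chunk i + 1.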

How to parse and display the content of an lxml object using lxml

I am having difficulty parsing the XML file below using lxml:
_file = "qv.xml"
File content:
<document reference="suspicious-document00500.txt">
<feature name="plagiarism" type="artificial" obfuscation="none" this_offset="128" this_length="2503" source_reference="source-document00500.txt" source_offset="138339" source_length="2503"/>
<feature name="plagiarism" type="artificial" obfuscation="none" this_offset="8593" this_length="1582" source_reference="source-document00500.txt" source_offset="49473" source_length="1582"/>
</document>
Here is my attempt:
from lxml.etree import XMLParser, parse
parsefile = parse(_file)
print(parsefile)
Output: <lxml.etree._ElementTree object at 0x000000000642E788>
The output is just the representation of the lxml ElementTree object (including its memory address), whereas I am after the actual file content, i.e.:
Desired output = {'document reference': 'suspicious-document00500.txt', 'this_offset': '128', 'obfuscation': 'none', 'source_length': '2503', 'name': 'plagiarism', 'this_length': '2503', 'source_reference': 'source-document00500.txt', 'source_offset': '138339', 'type': 'artificial'}
Any ideas on how to get the desired output? Thanks.
Here's one way of getting the desired outputs:
from lxml import etree

def main():
    doc = etree.parse('qv.xml')
    root = doc.getroot()
    print(root.attrib)
    for item in root:
        print(item.attrib)

if __name__ == "__main__":
    main()
Output:
{'reference': 'suspicious-document00500.txt'}
{'this_offset': '128', 'obfuscation': 'none', 'source_length': '2503', 'name': 'plagiarism', 'this_length': '2503', 'source_reference': 'source-document00500.txt', 'source_offset': '138339', 'type': 'artificial'}
{'this_offset': '8593', 'obfuscation': 'none', 'source_length': '1582', 'name': 'plagiarism', 'this_length': '1582', 'source_reference': 'source-document00500.txt', 'source_offset': '49473', 'type': 'artificial'}
It works fine with the contents you gave.
You might want to read this to see how etree represents XML objects.
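The desired output in the question combines the document's reference attribute with each feature's attributes. A minimal sketch of that merge, using the standard-library ElementTree on the question's data (the same getroot, get, iter and attrib members exist on lxml elements, so it should carry over); feature_records and the "document reference" key name are taken from the desired output as assumptions.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"""<document reference="suspicious-document00500.txt">
<feature name="plagiarism" type="artificial" obfuscation="none" this_offset="128" this_length="2503" source_reference="source-document00500.txt" source_offset="138339" source_length="2503"/>
<feature name="plagiarism" type="artificial" obfuscation="none" this_offset="8593" this_length="1582" source_reference="source-document00500.txt" source_offset="49473" source_length="1582"/>
</document>"""


def feature_records(source):
    """One dict per <feature>, with the parent document's reference merged in."""
    root = ET.parse(source).getroot()
    ref = root.get("reference")
    return [{"document reference": ref, **feat.attrib} for feat in root.iter("feature")]


for rec in feature_records(io.BytesIO(SAMPLE)):  # or feature_records("qv.xml")
    print(rec)
```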

How to iterate over GraphML file with lxml

I have the following GraphML file 'mygraph.gml' that I want to parse with a simple Python script.
It describes a small graph with two nodes and an edge between them:
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="name" for="node" attr.name="name" attr.type="string"/>
<key id="weight" for="edge" attr.name="weight" attr.type="double"/>
<graph id="G" edgedefault="directed">
<node id="n0">
<data key="name">node1</data>
</node>
<node id="n1">
<data key="name">node2</data>
</node>
<edge source="n1" target="n0">
<data key="weight">1</data>
</edge>
</graph>
</graphml>
The nodes n0 and n1 carry the names "node1" and "node2", and the edge from n1 to n0 has weight 1.
I want to parse this structure with python.
I wrote a script using lxml (I need it because the real dataset is much bigger than this simple example, more than 10^5 nodes, and Python's minidom is too slow):
import lxml.etree as et

tree = et.parse('mygraph.gml')
root = tree.getroot()
graphml = {
    "graph": "{http://graphml.graphdrawing.org/xmlns}graph",
    "node": "{http://graphml.graphdrawing.org/xmlns}node",
    "edge": "{http://graphml.graphdrawing.org/xmlns}edge",
    "data": "{http://graphml.graphdrawing.org/xmlns}data",
    "label": "{http://graphml.graphdrawing.org/xmlns}data[#key='label']",
    "x": "{http://graphml.graphdrawing.org/xmlns}data[#key='x']",
    "y": "{http://graphml.graphdrawing.org/xmlns}data[#key='y']",
    "size": "{http://graphml.graphdrawing.org/xmlns}data[#key='size']",
    "r": "{http://graphml.graphdrawing.org/xmlns}data[#key='r']",
    "g": "{http://graphml.graphdrawing.org/xmlns}data[#key='g']",
    "b": "{http://graphml.graphdrawing.org/xmlns}data[#key='b']",
    "weight": "{http://graphml.graphdrawing.org/xmlns}data[#key='weight']",
    "edgeid": "{http://graphml.graphdrawing.org/xmlns}data[#key='edgeid']"
}
graph = tree.find(graphml.get("graph"))
nodes = graph.findall(graphml.get("node"))
edges = graph.findall(graphml.get("edge"))
This script correctly gets the nodes and edges, so I can simply iterate over them:
for n in nodes:
    print(n.attrib)
or similarly over the edges:
for e in edges:
    print(e.attrib['source'], e.attrib['target'])
but I can't work out how to get the "data" elements of the edges or nodes in order to print the edge weight and the node "name".
This doesn't work for me:
weights = graph.findall(graphml.get("weight"))
The resulting list is always empty. Why? I'm missing something but can't tell what.
You can't do it in one pass, but for each node (or edge) found, you can build a dict from the key/value pairs of its data children:
graph = tree.find(graphml.get("graph"))
nodes = graph.findall(graphml.get("node"))
edges = graph.findall(graphml.get("edge"))
for node in nodes + edges:
    attribs = {}
    for data in node.findall(graphml.get('data')):
        attribs[data.get('key')] = data.text
    print('Node', node, 'have', attribs)
It gives the result:
Node <Element {http://graphml.graphdrawing.org/xmlns}node at 0x7ff053d3e5a0> have {'name': 'node1'}
Node <Element {http://graphml.graphdrawing.org/xmlns}node at 0x7ff053d3e5f0> have {'name': 'node2'}
Node <Element {http://graphml.graphdrawing.org/xmlns}edge at 0x7ff053d3e640> have {'weight': '1'}
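As for why the weights lookup in the question comes back empty, two details of the path seem to be responsible: in ElementTree/lxml path syntax an attribute test is written [@key='weight'] (the #key form is not attribute syntax and matches nothing here), and a plain path passed to graph.findall() only matches direct children of <graph>, while the weight <data> sits inside <edge>. A sketch with the standard-library ElementTree and a namespace mapping; the "g" prefix is an arbitrary alias, and the same find/findall/findtext calls with a namespaces argument should also work on lxml elements.

```python
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}  # "g" is an arbitrary local prefix

# Trimmed copy of the question's mygraph.gml.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="n0"><data key="name">node1</data></node>
    <node id="n1"><data key="name">node2</data></node>
    <edge source="n1" target="n0"><data key="weight">1</data></edge>
  </graph>
</graphml>"""

root = ET.fromstring(SAMPLE)
graph = root.find("g:graph", NS)

# '@key' tests the attribute; './/' searches below <edge>, not just <graph>'s children.
weights = graph.findall(".//g:edge/g:data[@key='weight']", NS)
print([w.text for w in weights])  # ['1']

# Node names keyed by node id.
names = {
    node.get("id"): node.findtext("g:data[@key='name']", namespaces=NS)
    for node in graph.findall("g:node", NS)
}
print(names)  # {'n0': 'node1', 'n1': 'node2'}

# Edge weights keyed by (source, target).
edge_weights = {
    (e.get("source"), e.get("target")): float(e.findtext("g:data[@key='weight']", namespaces=NS))
    for e in graph.findall("g:edge", NS)
}
print(edge_weights)  # {('n1', 'n0'): 1.0}
```

For files with more than 10^5 nodes, the same per-element lookups also work inside an iterparse loop if memory becomes a concern.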
