Not getting XML output as expected

Not getting XML output as expected - python

I have Python3 and am following this XML tutorial, https://docs.python.org/3.7/library/xml.etree.elementtree.html
I wish to output a listing of all DailyIndexRatio
DailyIndexRatio {'CUSIP': '912810FD5','IssueDate': '1998-04-15',
'Date':'2019-03-01','RefCPI':'251.23300','IndexRatio':'1.55331' }
....
Instead my code outputs
DailyIndexRatio {}
....
How to fix?
Here is the code
import xml.etree.ElementTree as ET
tree = ET.parse('CPI_20190213.xml')
root = tree.getroot()
print(root.tag)
print(root.attrib)
for child in root:
print(child.tag,child.attrib)
And I downloaded the xml file from https://treasurydirect.gov/xml/CPI_20190213.xml

import xml.etree.ElementTree as ET
tree = ET.parse('CPI_20190213.xml') # Load the XML
root = tree.getroot() # Get XML root element
e = root.findall('.//DailyIndexRatio') # use xpath to find relevant elements
# for each element
for i in e:
# create a dictionary object.
d = {}
# for each child of element
for child in i:
# add the tag name and text value to the dictionary
d[child.tag] = child.text
# print the DailyIndexRatio tag name and dictionary
print (i.tag, d)
Outputs:
DailyIndexRatio {'CUSIP': '912810FD5', 'IssueDate': '1998-04-15', 'Date': '2019-03-01', 'RefCPI': '251.23300', 'IndexRatio': '1.55331'}
DailyIndexRatio {'CUSIP': '912810FD5', 'IssueDate': '1998-04-15', 'Date': '2019-03-02', 'RefCPI': '251.24845', 'IndexRatio': '1.55341'}
DailyIndexRatio {'CUSIP': '912810FD5', 'IssueDate': '1998-04-15', 'Date': '2019-03-03', 'RefCPI': '251.26390', 'IndexRatio': '1.55351'}
DailyIndexRatio {'CUSIP': '912810FD5', 'IssueDate': '1998-04-15', 'Date': '2019-03-04', 'RefCPI': '251.27935', 'IndexRatio': '1.55360'}
...

You're printing the attributes, but that element does not have any attributes.
This is an element with attributes:
<element name="Bob" age="40" sex="male" />
But the element you're trying to print doesn't have those. It has child elements:
<element>
<name>Bob</name>
<age>40</age>
<sex>male</sex>
</element>

Related

Parse xml file to a python list

I have a xml file:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03">
<CstmrCdtTrfInitn>
<GrpHdr>
<MsgId>637987745078994894</MsgId>
<CreDtTm>2022-09-14T05:48:27</CreDtTm>
<NbOfTxs>205</NbOfTxs>
<CtrlSum>154761.02</CtrlSum>
<InitgPty>
<Nm> Company</Nm>
</InitgPty>
</GrpHdr>
<PmtInf>
<PmtInfId>20220914054827-154016</PmtInfId>
<PmtMtd>TRF</PmtMtd>
<BtchBookg>true</BtchBookg>
<NbOfTxs>205</NbOfTxs>
<CtrlSum>154761.02</CtrlSum>
<PmtTpInf>
<SvcLvl>
<Cd>SEPA</Cd>
</SvcLvl>
<CtgyPurp>
<Cd>SALA</Cd>
</CtgyPurp>
</PmtTpInf>
<CdtTrfTxInf> <----------------------------------
<Amt>
<InstdAmt Ccy="EUR">1536.96</InstdAmt>
</Amt>
<Cdtr>
<Nm>Achternaam, Voornaam </Nm>
</Cdtr>
<CdtrAcct>
<Id>
<IBAN>NL80RABO0134343443</IBAN>
</Id>
</CdtrAcct>
</CdtTrfTxInf> <------------------------------------
<CdtTrfTxInf> <----------------------------------
<Amt>
<InstdAmt Ccy="EUR">1676.96</InstdAmt>
</Amt>
<Cdtr>
<Nm>Achternaam, Voornaam </Nm>
</Cdtr>
<CdtrAcct>
<Id>
<IBAN>NL80RABO013433222243</IBAN>
</Id>
</CdtrAcct>
</CdtTrfTxInf> <------------------------------------
</CstmrCdtTrfInitn>
</Document>
I use ElementTree:
I want a python list of tuples with the info within the tag (everything between the arrows in the example xml file). So in this example i want al list with 2 tuples.
How can i do that.
I can iterate over the tree, but thats is.
my code:
import xml.etree.ElementTree as ET
tree = ET.parse(xml_file)
root = tree.getroot()
for elem in tree.iter():
print(elem.tag, elem.text) --> i get every tag in the whole file

I rather like to use xmltodict.
First of all, your input data as given is missing a closing </PmtInf> tag towards the end, just before your closing </CstmrCdtTrfInitn> tag. After fixing that, I saved your xml data into a file and did the following:
import xmltodict
with open("input_data.xml", "r") as f:
xml_data = f.read()
xml_dict = xmltodict.parse(xml_data)
You can then access the xml data using dictionary accessors, for example:
xml_dict
>>>{'Document': {'#xmlns:xsi': 'http://www.w3.org/20...a-instance', '#xmlns': 'urn:iso:std:iso:2002...001.001.03', 'CstmrCdtTrfInitn': {...}}}
xml_dict["Document"]
>>>{'#xmlns:xsi': 'http://www.w3.org/20...a-instance', '#xmlns': 'urn:iso:std:iso:2002...001.001.03', 'CstmrCdtTrfInitn': {'GrpHdr': {...}, 'PmtInf': {...}}}
xml_dict["Document"]["CstmrCdtTrfInitn"].keys()
>>>dict_keys(['GrpHdr', 'PmtInf'])
xml_dict["Document"]["CstmrCdtTrfInitn"]["PmtInf"]
{'PmtInfId': '20220914054827-154016', 'PmtMtd': 'TRF', 'BtchBookg': 'true', 'NbOfTxs': '205', 'CtrlSum': '154761.02', 'PmtTpInf': {'SvcLvl': {...}, 'CtgyPurp': {...}}, 'CdtTrfTxInf': [{...}, {...}]}
xml_dict["Document"]["CstmrCdtTrfInitn"]["PmtInf"].keys()
dict_keys(['PmtInfId', 'PmtMtd', 'BtchBookg', 'NbOfTxs', 'CtrlSum', 'PmtTpInf', 'CdtTrfTxInf'])
Then you can loop over your CdtTrfTxInf with:
for item in xml_dict["Document"]["CstmrCdtTrfInitn"]["PmtInf"]["CdtTrfTxInf"]:
print(item)
giving the output:
{'Amt': {'InstdAmt': {'#Ccy': 'EUR', '#text': '1536.96'}}, 'Cdtr': {'Nm': 'Achternaam, Voornaam'}, 'CdtrAcct': {'Id': {'IBAN': 'NL80RABO0134343443'}}}
{'Amt': {'InstdAmt': {'#Ccy': 'EUR', '#text': '1676.96'}}, 'Cdtr': {'Nm': 'Achternaam, Voornaam'}, 'CdtrAcct': {'Id': {'IBAN': 'NL80RABO013433222243'}}}
which you can process as you want.

this is just a speedcode try xd give it a chance and try it :
import xml.etree.ElementTree as ET
tree = ET.parse("fr.xml")
root = tree.getroot()
test = False
for elem in tree.iter():
if elem.tag == "CdtTrfTxInf":
test = True
continue
if test and elem.text.strip() :
print(elem.tag, elem.text)
with result as list of tuple :
import xml.etree.ElementTree as ET
tree = ET.parse("fr.xml")
root = tree.getroot()
test = False
tag = []
textval=[]
for elem in tree.iter():
if elem.tag == "CdtTrfTxInf":
test = True
continue
if test and elem.text.strip() :
tag.append(elem.tag)
textval.append(elem.text)
data = list(zip(tag, textval))
print (data)

adjust Python fuction to parse xml

I need to read an XML file in an external domain.
my code:
tree = ET.ElementTree(file=urllib2.urlopen('http://192.168.2.57:8010/data/camera_state.xml'))
root = tree.getroot()
root.tag, root.attrib
for elem in tree.iter():
print elem.tag, elem.att
I could not get into the structure I need, the result of my function is this below:
CameraState {}
Cameras {}
Camera {'Id': '1'}
State {}
Camera {'Id': '2'}
State {}
Camera {'Id': '3'}
State {}
Camera {'Id': '4'}
State {}
I need to adjust this Python function to get into a result as below:
<CameraState>
<Cameras>
<Camera Id="1">
<State>NO_SIGNAL</State>
</Camera>
<Camera Id="2">
<State>OK</State>
</Camera>
</Cameras>
</CameraState>

You do have the parsed structure. It's just about the way you are accessing it.
Use getchildren to access children nodes. An example of recursively printing the structure:
import xml.etree.ElementTree as ET
def print_tree(node, prefix=''):
print(prefix, node.tag, node.attrib, node.text.strip())
for child in node:
print_tree(child, prefix + ' ')
tree = ET.ElementTree(file=<your file>)
root = tree.getroot()
print_tree(root)
It gives:
CameraState {}
Cameras {}
Camera {'Id': '1'}
State {} NO_SIGNAL
Camera {'Id': '2'}
State {} OK
However, I recommend you take a look at xmltodict:
import xmltodict
with open(<your file>) as f:
tree = xmltodict.parse(f.read())
print(tree)
It gives you OrderedDicts:
OrderedDict([('CameraState', OrderedDict([('Cameras', OrderedDict([('Camera', [OrderedDict([('#Id', '1'), ('State', 'NO_SIGNAL')]), OrderedDict([('#Id', '2'), ('State', 'OK')])])]))]))])

lxml etree get all text before element

How to get all text before an element in a etree separated from the text after the element?
from lxml import etree
tree = etree.fromstring('''
<a>
find
<b>
the
</b>
text
<dd></dd>
<c>
before
</c>
<dd></dd>
and after
</a>
''')
What do I want? In this example, the <dd> tags are separators and for all of them
for el in tree.findall('.//dd'):
I would like to have all text before and after them:
[
{
el : <Element dd at 0xsomedistinctadress>,
before : 'find the text',
after : 'before and after'
},
{
el : <Element dd at 0xsomeotherdistinctadress>,
before : 'find the text before',
after : 'and after'
}
]
My idea was to use some kind of placeholders in the tree with which I replace the <dd> tags and then cut the string at that placeholder, but I need the correspondence with the actual element.

There might be a simpler way, but I would use the following XPath expressions:
preceding-sibling::*/text()|preceding::text()
following-sibling::*/text()|following::text()
Sample implementation (definitely violating the DRY principle):
def get_text_before(element):
for item in element.xpath("preceding-sibling::*/text()|preceding-sibling::text()"):
item = item.strip()
if item:
yield item
def get_text_after(element):
for item in element.xpath("following-sibling::*/text()|following-sibling::text()"):
item = item.strip()
if item:
yield item
for el in tree.findall('.//dd'):
before = " ".join(get_text_before(el))
after = " ".join(get_text_after(el))
print {
"el": el,
"before": before,
"after": after
}
Prints:
{'el': <Element dd at 0x10af81488>, 'after': 'before and after', 'before': 'find the text'}
{'el': <Element dd at 0x10af81200>, 'after': 'and after', 'before': 'find the text before'}

How to parse and display the content of an Ixml object using IXML

I am having difficult parsing the xml _file below using Ixml:
>>_file= "qv.xml"
file content:
<document reference="suspicious-document00500.txt">
<feature name="plagiarism" type="artificial" obfuscation="none" this_offset="128" this_length="2503" source_reference="source-document00500.txt" source_offset="138339" source_length="2503"/>
<feature name="plagiarism" type="artificial" obfuscation="none" this_offset="8593" this_length="1582" source_reference="source-document00500.txt" source_offset="49473" source_length="1582"/>
</document>
Here is my attempt:
>>from lxml.etree import XMLParser, parse
>>parsefile = parse(_file)
>>print parsefile
Output: <lxml.etree._ElementTree object at 0x000000000642E788>
The output is the location of the ixml object, while I am after the actual file content ie
Desired output={'document reference'="suspicious-document00500.txt", 'this_offset': '128', 'obfuscation': 'none', 'source_length': '2503', 'name': 'plagiarism', 'this_length': '2503', 'source_reference': 'source-document00500.txt', 'source_offset': '138339', 'type': 'artificial'}
Any ideas on how to get the desired output? thanks.

Here's one way of getting the desired outputs:
from lxml import etree
def main():
doc = etree.parse('qv.xml')
root = doc.getroot()
print root.attrib
for item in root:
print item.attrib
if __name__ == "__main__":
main()
Output:
{'reference': 'suspicious-document00500.txt'}
{'this_offset': '128', 'obfuscation': 'none', 'source_length': '2503', 'name': 'plagiarism', 'this_length': '2503', 'source_reference': 'source-document00500.txt', 'source_offset': '138339', 'type': 'artificial'}
{'this_offset': '8593', 'obfuscation': 'none', 'source_length': '1582', 'name': 'plagiarism', 'this_length': '1582', 'source_reference': 'source-document00500.txt', 'source_offset': '49473', 'type': 'artificial'}
It works fine with the contents you gave.
You might want to read thisto see how etree represents xml objects.

How to iterate over GraphML file with lxml

I have the following GraphML file 'mygraph.gml' that I want to parse with a simple python script:
This represents a simple graph with 2 nodes "node0", "node1" and an edge between them
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="name" for="node" attr.name="name" attr.type="string"/>
<key id="weight" for="edge" attr.name="weight" attr.type="double"/>
<graph id="G" edgedefault="directed">
<node id="n0">
<data key="name">node1</data>
</node>
<node id="n1">
<data key="name">node2</data>
</node>
<edge source="n1" target="n0">
<data key="weight">1</data>
</edge>
</graph>
</graphml>
This represents a graph with two nodes n0 and n1 with an edge of weight 1 between them.
I want to parse this structure with python.
I wrote a script with the help of lxml (I need to use it because the dataset in much much bigger than this simple example, more than 10^5 nodes, python minidom is too slow)
import lxml.etree as et
tree = et.parse('mygraph.gml')
root = tree.getroot()
graphml = {
"graph": "{http://graphml.graphdrawing.org/xmlns}graph",
"node": "{http://graphml.graphdrawing.org/xmlns}node",
"edge": "{http://graphml.graphdrawing.org/xmlns}edge",
"data": "{http://graphml.graphdrawing.org/xmlns}data",
"label": "{http://graphml.graphdrawing.org/xmlns}data[#key='label']",
"x": "{http://graphml.graphdrawing.org/xmlns}data[#key='x']",
"y": "{http://graphml.graphdrawing.org/xmlns}data[#key='y']",
"size": "{http://graphml.graphdrawing.org/xmlns}data[#key='size']",
"r": "{http://graphml.graphdrawing.org/xmlns}data[#key='r']",
"g": "{http://graphml.graphdrawing.org/xmlns}data[#key='g']",
"b": "{http://graphml.graphdrawing.org/xmlns}data[#key='b']",
"weight": "{http://graphml.graphdrawing.org/xmlns}data[#key='weight']",
"edgeid": "{http://graphml.graphdrawing.org/xmlns}data[#key='edgeid']"
}
graph = tree.find(graphml.get("graph"))
nodes = graph.findall(graphml.get("node"))
edges = graph.findall(graphml.get("edge"))
This script gets correctly the nodes and edges so that I can simply iterate over them
for n in nodes:
print n.attrib
or similarly on edges:
for e in edges:
print (e.attrib['source'], e.attrib['target'])
but I can't really understand how to get the "data" tag for the edges or the nodes in order to print the edge weight and nodes tag "name".
This doesn't work for me:
weights = graph.findall(graphml.get("weight"))
the last list is always empty. Why? I'm missing something around but can't understand what.

You can't do it in one pass, but for each node found, you can build a dict with the key/value of data:
graph = tree.find(graphml.get("graph"))
nodes = graph.findall(graphml.get("node"))
edges = graph.findall(graphml.get("edge"))
for node in nodes + edges:
attribs = {}
for data in node.findall(graphml.get('data')):
attribs[data.get('key')] = data.text
print 'Node', node, 'have', attribs
It give the result:
Node <Element {http://graphml.graphdrawing.org/xmlns}node at 0x7ff053d3e5a0> have {'name': 'node1'}
Node <Element {http://graphml.graphdrawing.org/xmlns}node at 0x7ff053d3e5f0> have {'name': 'node2'}
Node <Element {http://graphml.graphdrawing.org/xmlns}edge at 0x7ff053d3e640> have {'weight': '1'}

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Not getting XML output as expected - python

Related

Parse xml file to a python list

adjust Python fuction to parse xml

lxml etree get all text before element

How to parse and display the content of an Ixml object using IXML

How to iterate over GraphML file with lxml

Categories

Resources