How to iterate over GraphML file with lxml

I have the following GraphML file 'mygraph.gml', a simple graph with two nodes and one edge between them, that I want to parse with a simple Python script:
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="name" for="node" attr.name="name" attr.type="string"/>
<key id="weight" for="edge" attr.name="weight" attr.type="double"/>
<graph id="G" edgedefault="directed">
<node id="n0">
<data key="name">node1</data>
</node>
<node id="n1">
<data key="name">node2</data>
</node>
<edge source="n1" target="n0">
<data key="weight">1</data>
</edge>
</graph>
</graphml>
This represents a graph with two nodes, n0 and n1, with an edge of weight 1 between them.
I want to parse this structure with Python.
I wrote a script with the help of lxml (I need to use it because the real dataset is much bigger than this simple example, with more than 10^5 nodes, and Python's minidom is too slow):
import lxml.etree as et
tree = et.parse('mygraph.gml')
root = tree.getroot()
graphml = {
"graph": "{http://graphml.graphdrawing.org/xmlns}graph",
"node": "{http://graphml.graphdrawing.org/xmlns}node",
"edge": "{http://graphml.graphdrawing.org/xmlns}edge",
"data": "{http://graphml.graphdrawing.org/xmlns}data",
"label": "{http://graphml.graphdrawing.org/xmlns}data[@key='label']",
"x": "{http://graphml.graphdrawing.org/xmlns}data[@key='x']",
"y": "{http://graphml.graphdrawing.org/xmlns}data[@key='y']",
"size": "{http://graphml.graphdrawing.org/xmlns}data[@key='size']",
"r": "{http://graphml.graphdrawing.org/xmlns}data[@key='r']",
"g": "{http://graphml.graphdrawing.org/xmlns}data[@key='g']",
"b": "{http://graphml.graphdrawing.org/xmlns}data[@key='b']",
"weight": "{http://graphml.graphdrawing.org/xmlns}data[@key='weight']",
"edgeid": "{http://graphml.graphdrawing.org/xmlns}data[@key='edgeid']"
}
graph = tree.find(graphml.get("graph"))
nodes = graph.findall(graphml.get("node"))
edges = graph.findall(graphml.get("edge"))
This script correctly gets the nodes and edges, so I can simply iterate over them:
for n in nodes:
    print(n.attrib)
or similarly over the edges:
for e in edges:
    print(e.attrib['source'], e.attrib['target'])
but I can't work out how to get the "data" tags of the edges and nodes in order to print each edge's weight and each node's "name".
This doesn't work for me:
weights = graph.findall(graphml.get("weight"))
The resulting list is always empty. Why? I'm missing something but can't see what.

The data elements are children of each node and edge, not direct children of graph, so a findall on graph with that path finds nothing. For each node or edge found, you can build a dict from the key/value pairs of its data elements:
graph = tree.find(graphml.get("graph"))
nodes = graph.findall(graphml.get("node"))
edges = graph.findall(graphml.get("edge"))
for node in nodes + edges:
    attribs = {}
    for data in node.findall(graphml.get('data')):
        attribs[data.get('key')] = data.text
    print('Node', node, 'have', attribs)
It gives the result:
Node <Element {http://graphml.graphdrawing.org/xmlns}node at 0x7ff053d3e5a0> have {'name': 'node1'}
Node <Element {http://graphml.graphdrawing.org/xmlns}node at 0x7ff053d3e5f0> have {'name': 'node2'}
Node <Element {http://graphml.graphdrawing.org/xmlns}edge at 0x7ff053d3e640> have {'weight': '1'}
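As a complement to the loop above, here is a minimal sketch of why the original findall came back empty and of two ways to reach the weights. It assumes the "@key" predicate form of the path and uses an inline copy of the sample graph; the names NS, SAMPLE, direct, weights and edge_weights are made up for illustration. It falls back to the standard library's ElementTree when lxml is not installed, since both accept these find/findall paths.

```python
try:
    import lxml.etree as et  # the library used in the question
except ImportError:
    import xml.etree.ElementTree as et  # stdlib fallback, same find/findall calls

NS = "{http://graphml.graphdrawing.org/xmlns}"

# Inline copy of the sample graph from the question (assumption: same structure).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="name" for="node" attr.name="name" attr.type="string"/>
  <key id="weight" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="n0"><data key="name">node1</data></node>
    <node id="n1"><data key="name">node2</data></node>
    <edge source="n1" target="n0"><data key="weight">1</data></edge>
  </graph>
</graphml>"""

root = et.fromstring(SAMPLE)
graph = root.find(NS + "graph")

# A plain path only matches direct children of <graph>; <data> sits one level
# lower, inside <node> and <edge>, so this list is empty.
direct = graph.findall(NS + "data[@key='weight']")

# ".//" searches all descendants of <graph> instead.
weights = [float(d.text) for d in graph.findall(".//" + NS + "data[@key='weight']")]

# Looking up the weight per edge keeps it tied to its source and target.
edge_weights = {
    (e.get("source"), e.get("target")): float(e.find(NS + "data[@key='weight']").text)
    for e in graph.findall(NS + "edge")
}

print(direct, weights, edge_weights)
```

The per-edge dictionary is usually the more useful shape, because a flat list of weights loses the information about which edge each weight belongs to.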

Related

Element Tree - Iterate dictionary to append elements to new line xml

I am attempting to append elements to an existing .xml using ElementTree.
I have the desired attributes stored as a list of dictionaries:
myDict = [{"name": "dan",
"age": "25",
"subject":"maths"},
{"name": "susan",
"age": "27",
"subject":"english"},
{"name": "leo",
"age": "24",
"subject":"psychology"}]
And I use the following code for the append:
import xml.etree.ElementTree as ET
tree = ET.parse('<path to existing .xml>')
root = tree.getroot()
for attrs in myDict:
    root.append(ET.Element("student", attrib=attrs))
tree.write('<path to .xml>')
This works mostly fine, except that all elements are appended on a single line. I'd like each appended element to be on a new line:
# Not this:
<student name='dan' age='25' subject='maths' /><student name='susan' age='27' subject='english' /><student name='leo' age='24' subject='psychology' />
# But this:
<student name='dan' age='25' subject='maths' />
<student name='susan' age='27' subject='english' />
<student name='leo' age='24' subject='psychology' />
I have attempted to use lxml and pass the pretty_print=True argument in the tree.write call, but it had no effect.
I'm sure I'm missing something simple here, so your help is appreciated!

With pointers from here (thanks @Thicc_Gandhi), I solved it by amending the iteration to:
for attrs in myDict:
    elem = ET.Element("student", attrib=attrs)
    elem.tail = "\n"
    root.append(elem)
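A self-contained sketch of the same idea, serializing to a string instead of a file so the effect of tail is easy to see. The root element name "students" and the variable names are assumptions made for the example, not taken from the original file.

```python
import xml.etree.ElementTree as ET

students = [
    {"name": "dan", "age": "25", "subject": "maths"},
    {"name": "susan", "age": "27", "subject": "english"},
    {"name": "leo", "age": "24", "subject": "psychology"},
]

# Stand-in for the root of the existing file (hypothetical element name).
root = ET.fromstring("<students>\n</students>")

for attrs in students:
    elem = ET.Element("student", attrib=attrs)
    elem.tail = "\n"  # text written right after this element's closing tag
    root.append(elem)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

On Python 3.9 or later, calling ET.indent(tree) before tree.write is another option; it re-indents the whole tree rather than only the newly appended elements.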

Parse xml file to a python list

I have an XML file:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03">
<CstmrCdtTrfInitn>
<GrpHdr>
<MsgId>637987745078994894</MsgId>
<CreDtTm>2022-09-14T05:48:27</CreDtTm>
<NbOfTxs>205</NbOfTxs>
<CtrlSum>154761.02</CtrlSum>
<InitgPty>
<Nm> Company</Nm>
</InitgPty>
</GrpHdr>
<PmtInf>
<PmtInfId>20220914054827-154016</PmtInfId>
<PmtMtd>TRF</PmtMtd>
<BtchBookg>true</BtchBookg>
<NbOfTxs>205</NbOfTxs>
<CtrlSum>154761.02</CtrlSum>
<PmtTpInf>
<SvcLvl>
<Cd>SEPA</Cd>
</SvcLvl>
<CtgyPurp>
<Cd>SALA</Cd>
</CtgyPurp>
</PmtTpInf>
<CdtTrfTxInf> <----------------------------------
<Amt>
<InstdAmt Ccy="EUR">1536.96</InstdAmt>
</Amt>
<Cdtr>
<Nm>Achternaam, Voornaam </Nm>
</Cdtr>
<CdtrAcct>
<Id>
<IBAN>NL80RABO0134343443</IBAN>
</Id>
</CdtrAcct>
</CdtTrfTxInf> <------------------------------------
<CdtTrfTxInf> <----------------------------------
<Amt>
<InstdAmt Ccy="EUR">1676.96</InstdAmt>
</Amt>
<Cdtr>
<Nm>Achternaam, Voornaam </Nm>
</Cdtr>
<CdtrAcct>
<Id>
<IBAN>NL80RABO013433222243</IBAN>
</Id>
</CdtrAcct>
</CdtTrfTxInf> <------------------------------------
</CstmrCdtTrfInitn>
</Document>
I use ElementTree.
I want a Python list of tuples with the info inside each CdtTrfTxInf tag (everything between the arrows in the example XML file). So in this example I want a list with 2 tuples.
How can I do that?
I can iterate over the tree, but that's it.
My code:
import xml.etree.ElementTree as ET
tree = ET.parse(xml_file)
root = tree.getroot()
for elem in tree.iter():
    print(elem.tag, elem.text)  # I get every tag in the whole file

I prefer to use xmltodict.
First of all, your input data as given is missing a closing </PmtInf> tag towards the end, just before your closing </CstmrCdtTrfInitn> tag. After fixing that, I saved your xml data into a file and did the following:
import xmltodict
with open("input_data.xml", "r") as f:
    xml_data = f.read()
xml_dict = xmltodict.parse(xml_data)
You can then access the XML data using dictionary accessors, for example:
xml_dict
>>>{'Document': {'@xmlns:xsi': 'http://www.w3.org/20...a-instance', '@xmlns': 'urn:iso:std:iso:2002...001.001.03', 'CstmrCdtTrfInitn': {...}}}
xml_dict["Document"]
>>>{'@xmlns:xsi': 'http://www.w3.org/20...a-instance', '@xmlns': 'urn:iso:std:iso:2002...001.001.03', 'CstmrCdtTrfInitn': {'GrpHdr': {...}, 'PmtInf': {...}}}
xml_dict["Document"]["CstmrCdtTrfInitn"].keys()
>>>dict_keys(['GrpHdr', 'PmtInf'])
xml_dict["Document"]["CstmrCdtTrfInitn"]["PmtInf"]
{'PmtInfId': '20220914054827-154016', 'PmtMtd': 'TRF', 'BtchBookg': 'true', 'NbOfTxs': '205', 'CtrlSum': '154761.02', 'PmtTpInf': {'SvcLvl': {...}, 'CtgyPurp': {...}}, 'CdtTrfTxInf': [{...}, {...}]}
xml_dict["Document"]["CstmrCdtTrfInitn"]["PmtInf"].keys()
dict_keys(['PmtInfId', 'PmtMtd', 'BtchBookg', 'NbOfTxs', 'CtrlSum', 'PmtTpInf', 'CdtTrfTxInf'])
Then you can loop over your CdtTrfTxInf with:
for item in xml_dict["Document"]["CstmrCdtTrfInitn"]["PmtInf"]["CdtTrfTxInf"]:
    print(item)
giving the output:
{'Amt': {'InstdAmt': {'@Ccy': 'EUR', '#text': '1536.96'}}, 'Cdtr': {'Nm': 'Achternaam, Voornaam'}, 'CdtrAcct': {'Id': {'IBAN': 'NL80RABO0134343443'}}}
{'Amt': {'InstdAmt': {'@Ccy': 'EUR', '#text': '1676.96'}}, 'Cdtr': {'Nm': 'Achternaam, Voornaam'}, 'CdtrAcct': {'Id': {'IBAN': 'NL80RABO013433222243'}}}
which you can process as you want.

This is just a quick attempt; give it a try:
import xml.etree.ElementTree as ET
tree = ET.parse("fr.xml")
root = tree.getroot()
test = False
for elem in tree.iter():
    # Tags carry the document namespace, e.g. "{urn:...}CdtTrfTxInf"
    if elem.tag.endswith("CdtTrfTxInf"):
        test = True
        continue
    # Skip elements whose text is missing or only whitespace
    if test and elem.text and elem.text.strip():
        print(elem.tag, elem.text)
With the result as a list of tuples:
import xml.etree.ElementTree as ET
tree = ET.parse("fr.xml")
root = tree.getroot()
test = False
tag = []
textval = []
for elem in tree.iter():
    # Tags carry the document namespace, e.g. "{urn:...}CdtTrfTxInf"
    if elem.tag.endswith("CdtTrfTxInf"):
        test = True
        continue
    # Skip elements whose text is missing or only whitespace
    if test and elem.text and elem.text.strip():
        tag.append(elem.tag)
        textval.append(elem.text)
data = list(zip(tag, textval))
print(data)
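The loop above yields one flat list of (tag, text) pairs. If the goal is one tuple per CdtTrfTxInf, as the question asks, a namespace-aware lookup is one possible sketch. The tuple layout (currency, amount, creditor name, IBAN), the prefix "p" and the trimmed inline sample are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Map a prefix of our choosing ("p") to the document's default namespace.
NS = {"p": "urn:iso:std:iso:20022:tech:xsd:pain.001.001.03"}

# Trimmed copy of the question's file, with the missing </PmtInf> restored.
SAMPLE = """<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03">
  <CstmrCdtTrfInitn>
    <PmtInf>
      <CdtTrfTxInf>
        <Amt><InstdAmt Ccy="EUR">1536.96</InstdAmt></Amt>
        <Cdtr><Nm>Achternaam, Voornaam </Nm></Cdtr>
        <CdtrAcct><Id><IBAN>NL80RABO0134343443</IBAN></Id></CdtrAcct>
      </CdtTrfTxInf>
      <CdtTrfTxInf>
        <Amt><InstdAmt Ccy="EUR">1676.96</InstdAmt></Amt>
        <Cdtr><Nm>Achternaam, Voornaam </Nm></Cdtr>
        <CdtrAcct><Id><IBAN>NL80RABO013433222243</IBAN></Id></CdtrAcct>
      </CdtTrfTxInf>
    </PmtInf>
  </CstmrCdtTrfInitn>
</Document>"""

root = ET.fromstring(SAMPLE)

rows = []
for tx in root.iterfind(".//p:CdtTrfTxInf", NS):
    amount = tx.find("p:Amt/p:InstdAmt", NS)
    rows.append((
        amount.get("Ccy"),                                    # currency attribute
        amount.text,                                          # amount as a string
        tx.findtext("p:Cdtr/p:Nm", namespaces=NS).strip(),    # creditor name
        tx.findtext("p:CdtrAcct/p:Id/p:IBAN", namespaces=NS), # creditor IBAN
    ))

print(rows)
```

For a real file, ET.parse(path).getroot() would replace the ET.fromstring call; the rest stays the same.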

How to check if particular tag name equals a value and return the entire data inside the tag from XML using Python

I have an XML file with the following structure:
<?xml version="1.0" ?>
<Parameters>
<Test name="Login" browser="chrome">
<TestParamerer version="1.2" timeout="480"/>
</Test>
<Test name="Logout" browser="chrome">
<TestParamerer version="2.3" timeout="480"/>
<Arguments name="EF" version="2.2"/>
</Test>
</Parameters>
I need to retrieve the Test with name="Login" and return everything inside that Test tag in JSON format. I am new to Python, so any help is appreciated.

You can use etree, json and xmltodict
from lxml import etree
import json
import xmltodict
xml = """<?xml version="1.0" ?>
<Parameters>
<Test name="Login" browser="chrome">
<TestParamerer version="1.2" timeout="480"/>
</Test>
<Test name="Logout" browser="chrome">
<TestParamerer version="2.3" timeout="480"/>
<Arguments name="EF" version="2.2"/>
</Test>
</Parameters>"""
tree = etree.fromstring(xml)
login = tree.findall('.//Test[@name="Login"]')
# [<Element Test at 0x29bc1ac7480>] (in this case 1 element).
dict_from_xml = xmltodict.parse(etree.tostring(login[0]))
'''
OrderedDict([('Test',
              OrderedDict([('@name', 'Login'),
                           ('@browser', 'chrome'),
                           ('TestParamerer',
                            OrderedDict([('@version', '1.2'),
                                         ('@timeout', '480')]))]))])
'''
# And now to JSON:
js = json.dumps(dict_from_xml, indent=2)
print(js)
'''
{
  "Test": {
    "@name": "Login",
    "@browser": "chrome",
    "TestParamerer": {
      "@version": "1.2",
      "@timeout": "480"
    }
  }
}
'''
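If installing lxml and xmltodict is not an option, a rough standard-library sketch can produce similar JSON. The helper element_to_dict is hypothetical and written for this example: it prefixes attributes with "@" (the convention xmltodict uses), ignores text content, and overwrites repeated child tags, so it only suits simple documents like this one.

```python
import json
import xml.etree.ElementTree as ET

XML = """<Parameters>
  <Test name="Login" browser="chrome">
    <TestParamerer version="1.2" timeout="480"/>
  </Test>
  <Test name="Logout" browser="chrome">
    <TestParamerer version="2.3" timeout="480"/>
    <Arguments name="EF" version="2.2"/>
  </Test>
</Parameters>"""


def element_to_dict(elem):
    """Hypothetical helper: attributes become "@name" keys, children nest by tag."""
    node = {"@" + key: value for key, value in elem.attrib.items()}
    for child in elem:
        # Note: a repeated child tag would overwrite the earlier entry.
        node[child.tag] = element_to_dict(child)
    return node


root = ET.fromstring(XML)
login = root.find("./Test[@name='Login']")  # first Test whose name is "Login"
login_json = json.dumps({login.tag: element_to_dict(login)}, indent=2)
print(login_json)
```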

Not getting XML output as expected

I have Python3 and am following this XML tutorial, https://docs.python.org/3.7/library/xml.etree.elementtree.html
I wish to output a listing of all DailyIndexRatio
DailyIndexRatio {'CUSIP': '912810FD5','IssueDate': '1998-04-15',
'Date':'2019-03-01','RefCPI':'251.23300','IndexRatio':'1.55331' }
....
Instead my code outputs
DailyIndexRatio {}
....
How to fix?
Here is the code
import xml.etree.ElementTree as ET
tree = ET.parse('CPI_20190213.xml')
root = tree.getroot()
print(root.tag)
print(root.attrib)
for child in root:
    print(child.tag, child.attrib)
And I downloaded the xml file from https://treasurydirect.gov/xml/CPI_20190213.xml

import xml.etree.ElementTree as ET
tree = ET.parse('CPI_20190213.xml') # Load the XML
root = tree.getroot() # Get XML root element
e = root.findall('.//DailyIndexRatio') # use xpath to find relevant elements
# for each element
for i in e:
    # create a dictionary object.
    d = {}
    # for each child of element
    for child in i:
        # add the tag name and text value to the dictionary
        d[child.tag] = child.text
    # print the DailyIndexRatio tag name and dictionary
    print(i.tag, d)
Outputs:
DailyIndexRatio {'CUSIP': '912810FD5', 'IssueDate': '1998-04-15', 'Date': '2019-03-01', 'RefCPI': '251.23300', 'IndexRatio': '1.55331'}
DailyIndexRatio {'CUSIP': '912810FD5', 'IssueDate': '1998-04-15', 'Date': '2019-03-02', 'RefCPI': '251.24845', 'IndexRatio': '1.55341'}
DailyIndexRatio {'CUSIP': '912810FD5', 'IssueDate': '1998-04-15', 'Date': '2019-03-03', 'RefCPI': '251.26390', 'IndexRatio': '1.55351'}
DailyIndexRatio {'CUSIP': '912810FD5', 'IssueDate': '1998-04-15', 'Date': '2019-03-04', 'RefCPI': '251.27935', 'IndexRatio': '1.55360'}
...

You're printing the attributes, but that element does not have any attributes.
This is an element with attributes:
<element name="Bob" age="40" sex="male" />
But the element you're trying to print doesn't have those. It has child elements:
<element>
<name>Bob</name>
<age>40</age>
<sex>male</sex>
</element>
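The difference shows up directly in ElementTree. A small sketch using the two snippets above (the variable names are made up for illustration): attrib holds attributes only, while child values have to be read from the child elements' tag and text.

```python
import xml.etree.ElementTree as ET

with_attrs = ET.fromstring('<element name="Bob" age="40" sex="male" />')
with_children = ET.fromstring(
    "<element><name>Bob</name><age>40</age><sex>male</sex></element>"
)

# Attributes live in .attrib ...
attrs = dict(with_attrs.attrib)

# ... but an element that only has child elements has an empty .attrib.
empty = dict(with_children.attrib)

# Child values are read by iterating the element and taking tag/text.
children = {child.tag: child.text for child in with_children}

print(attrs, empty, children)
```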

What's the best way to parse a definition list using XPath?

I'm using Python + XPath to parse some HTML, but I'm having trouble parsing a definition list. An example would be as follows:
<dl>
<dt>Section One</dt>
<dd>Child one</dd>
<dd>Child one.2</dd>
<dt>Section Two</dt>
<dd>Child two</dd>
</dl>
I want to transform this into an output like:
{'Section One' : ['Child one','Child one.2'], 'Section Two' : ['Child two']}
I'm having difficulty, though, because of the way the list is structured: it doesn't have the same hierarchy you find in the output.
Thanks

A solution without xpath, using lxml (which you are probably already using if you are using xpath?):
from collections import defaultdict
from lxml import etree
dl = etree.fromstring('''<dl>
<dt>Section One</dt>
<dd>Child one</dd>
<dd>Child one.2</dd>
<dt>Section Two</dt>
<dd>Child two</dd>
</dl>''')
result = defaultdict(list)
for dt in dl.findall('dt'):
    for child in dt.itersiblings():  # iterate over following siblings
        if child.tag != 'dd':
            break  # stop at the first element that is not a dd
        result[dt.text].append(child.text)
print(dict(result))
(any xpath solution I can come up with is worse than this, it seems)

A single-expression XPath 1.0 solution, if possible at all, would be difficult to write and understand.
Here is a simple XSLT 1.0 solution:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:key name="kFollowing" match="dd"
use="generate-id(preceding-sibling::dt[1])"/>
<xsl:template match="dl">
{ <xsl:apply-templates select="dt"/> }
</xsl:template>
<xsl:template match="dt">
<xsl:text/>'<xsl:value-of select="."/>' : [ <xsl:text/>
<xsl:apply-templates select=
"key('kFollowing', generate-id())"/>
<xsl:text> ]</xsl:text>
<xsl:if test="not(position()=last())">, </xsl:if>
</xsl:template>
<xsl:template match="dd">
<xsl:text/>'<xsl:value-of select="."/>'<xsl:text/>
<xsl:if test="not(position()=last())">, </xsl:if>
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on the provided XML document:
<dl>
<dt>Section One</dt>
<dd>Child one</dd>
<dd>Child one.2</dd>
<dt>Section Two</dt>
<dd>Child two</dd>
</dl>
the wanted, correct result is produced:
{ 'Section One' : [ 'Child one', 'Child one.2' ], 'Section Two' : [ 'Child two' ] }
Explanation: An xsl:key is defined and used to capture the 1 --> many relationship between a dt and its immediately following dd sibling elements.
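The same grouping, each dd belonging to the nearest preceding dt, can also be sketched in Python as a single pass over the children of dl. This is an illustration using only the standard library, not a translation of the stylesheet; the variable names are made up.

```python
import xml.etree.ElementTree as ET

dl = ET.fromstring("""<dl>
<dt>Section One</dt>
<dd>Child one</dd>
<dd>Child one.2</dd>
<dt>Section Two</dt>
<dd>Child two</dd>
</dl>""")

result = {}
current = None  # list belonging to the most recent <dt>
for child in dl:
    if child.tag == "dt":
        # Start (or reuse) the list for this section title.
        current = result.setdefault(child.text, [])
    elif child.tag == "dd" and current is not None:
        # Attach the <dd> to the nearest preceding <dt>.
        current.append(child.text)

print(result)
```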
