Element Tree - Iterate dictionary to append elements to new line xml - python

I am attempting to append elements to an existing .xml using ElementTree.
I have the desired attributes stored as a list of dictionaries:
myDict = [{"name": "dan",
           "age": "25",
           "subject": "maths"},
          {"name": "susan",
           "age": "27",
           "subject": "english"},
          {"name": "leo",
           "age": "24",
           "subject": "psychology"}]
And I use the following code for the append:
import xml.etree.ElementTree as ET
tree = ET.parse('<path to existing .xml>')
root = tree.getroot()
for x, y in enumerate(myDict):
    root.append(ET.Element("student", attrib=myDict[x]))
tree.write('<path to .xml>')
This mostly works, except that all elements are appended on a single line. I'd like each appended element to be on its own line:
# Not this:
<student name='dan' age='25' subject='maths' /><student name='susan' age='27' subject='english' /><student name='leo' age='24' subject='psychology' />
# But this:
<student name='dan' age='25' subject='maths' />
<student name='susan' age='27' subject='english' />
<student name='leo' age='24' subject='psychology' />
I have attempted to use lxml and pass the pretty_print=True argument to the tree.write call, but it had no effect.
I'm sure I'm missing something simple here, so your help is appreciated!

With pointers from here (thanks @Thicc_Gandhi), I solved it by amending the iteration to:
for x, y in enumerate(myDict):
    elem = ET.Element("student", attrib=myDict[x])
    elem.tail = "\n"  # text written after the closing tag, so the next element starts on a new line
    root.append(elem)
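As an alternative to setting tail by hand, the standard library can re-indent the whole tree. The following is a minimal sketch, assuming Python 3.9 or later (where xml.etree.ElementTree.indent is available); the <students> root element is a hypothetical stand-in for the existing file, whose structure is not shown in the question.

```python
import xml.etree.ElementTree as ET

students = [
    {"name": "dan", "age": "25", "subject": "maths"},
    {"name": "susan", "age": "27", "subject": "english"},
]

# Hypothetical root element; in the question it comes from ET.parse(...).getroot()
root = ET.fromstring("<students></students>")
for attrs in students:
    root.append(ET.Element("student", attrib=attrs))

# Rewrites the whitespace (text/tail) of the whole tree in place
ET.indent(root, space="  ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Note that ET.indent changes the whitespace of every element in the tree, not only the appended ones, so setting tail manually remains the less invasive option when the rest of the file must stay byte-for-byte the same.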

Related

Is there a package that converts xmltodict dictionaries to lxml trees?

The problem I have is this. I've started the XML creation using the dictionary structure used by xmltodict Python package so I can use the unparse method to create the XML. But I think I reached a point where xmltodict can't help me. I have actions in this dictionary format, highly nested each, something like this, just much more complex:
action = {
    "@id": 1,
    "some-nested-stuff":
        {"@attr1": "string value", "child": True}
}
Now I need to group some actions similar to this:
<action id=1>...</action>
<action-group groupId=1>
<action id=2>...</action>
<action id=3>...</action>
</action-group>
<action id=4>...</action>
And yes, the first action needs to go before the action group and the fourth action after it. It seems impossible to do this with just xmltodict. My idea is to create each action's XML tree as an lxml object from these dictionaries and then merge those objects into one XML document. I don't think it would be a big task, but there might be a ready-made package for it. Is there one?
The alternative solution — that I try to avoid if possible — is to rewrite the project from scratch using just lxml. Or is there a way to create that XML using just xmltodict but not the xml/lxml packages?
It seems that there is no such package. So far I have this solution. It doesn't handle #text keys, and there may be problems with namespaces.
"""
Converts the dictionary used by the xmltodict package to represent XML
into an lxml element.
"""
from typing import Any, Dict, Tuple
from lxml import etree
XmlDictType = Dict[str, Any]
# Derive the element classes from instances so they can be used in type hints.
element = etree.Element("for-creating-types")
ElementType = type(element)
ElementTreeType = type(etree.ElementTree(element))
def convert(xml_dict: XmlDictType) -> ElementType:
    # The top-level dictionary has a single key: the root element's name.
    root_name = list(xml_dict)[0]
    inside_dict = xml_dict[root_name]
    attrs, children = split_attrs_and_children(inside_dict)
    root = etree.Element(root_name, **attrs)
    convert_children(root, children)
    return root
def split_attrs_and_children(xml_dict: XmlDictType) -> Tuple[XmlDictType, XmlDictType]:
    """Split the categories and fix the types"""
    def fix_types(v):
        # bool must be tested first because bool is a subclass of int.
        if isinstance(v, bool):
            return {True: "true", False: "false"}[v]
        elif isinstance(v, (int, float)):
            return str(v)
        else:
            return v
    # Keys starting with "@" are attributes; "#text" is skipped (not handled yet).
    attrs = {k[1:]: fix_types(v) for k, v in xml_dict.items() if k.startswith("@")}
    children = {k: fix_types(v) for k, v in xml_dict.items() if not (k.startswith("@") or k.startswith("#"))}
    return attrs, children
def convert_children(parent: ElementType, children: XmlDictType) -> ElementType:
    for child_name, value in children.items():
        if isinstance(value, dict):
            attrs, grandchildren = split_attrs_and_children(value)
            child = etree.SubElement(parent, child_name, **attrs)
            convert_children(child, grandchildren)
        elif isinstance(value, list):
            # A list means repeated sibling elements with the same tag.
            for v in value:
                etree.SubElement(parent, child_name).text = v
        else:
            etree.SubElement(parent, child_name).text = value
    return parent
For example, you can convert this dictionary:
xml_dict = {
    "mydocument": {
        "@has": "an attribute",
        "and": {
            "many": [
                "elements",
                "more elements"
            ]
        },
        "plus": {
            "@a": "complex",
            "#text": "element as well"
        }
    }
}
Note that the #text key is not handled yet, so the text of "plus" is dropped.
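For illustration, here is a minimal sketch of how #text handling and the grouping from the question could fit together. It swaps lxml for the standard library's xml.etree.ElementTree so that it runs without extra packages (the element API used here is the same in both). The helper names build and to_text and the wrapping <actions> root are hypothetical, and namespaces are not handled.

```python
import xml.etree.ElementTree as ET


def to_text(value):
    """Render a scalar as XML text (bool is checked before other types)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build(tag, content):
    """Build one element from an xmltodict-style value."""
    elem = ET.Element(tag)
    if not isinstance(content, dict):
        if content is not None:
            elem.text = to_text(content)
        return elem
    for key, value in content.items():
        if key.startswith("@"):        # attribute
            elem.set(key[1:], to_text(value))
        elif key == "#text":           # text of an element that also has attributes
            elem.text = to_text(value)
        else:                          # child element(s); a list means repeated siblings
            items = value if isinstance(value, list) else [value]
            for item in items:
                elem.append(build(key, item))
    return elem


# Four actions shaped like the one in the question
actions = [
    build("action", {"@id": i, "some-nested-stuff": {"@attr1": "string value", "child": True}})
    for i in range(1, 5)
]

# Assemble them in the required order: action 1, a group of 2 and 3, then action 4
root = ET.Element("actions")  # hypothetical wrapper element
root.append(actions[0])
group = ET.SubElement(root, "action-group", groupId="1")
group.extend(actions[1:3])
root.append(actions[3])

print(ET.tostring(root, encoding="unicode"))
```

Building each action as its own element and assembling them afterwards keeps the dictionaries unchanged, so only the final ordering step needs to know about groups.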

Parse xml file to a python list

I have an XML file:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03">
<CstmrCdtTrfInitn>
<GrpHdr>
<MsgId>637987745078994894</MsgId>
<CreDtTm>2022-09-14T05:48:27</CreDtTm>
<NbOfTxs>205</NbOfTxs>
<CtrlSum>154761.02</CtrlSum>
<InitgPty>
<Nm> Company</Nm>
</InitgPty>
</GrpHdr>
<PmtInf>
<PmtInfId>20220914054827-154016</PmtInfId>
<PmtMtd>TRF</PmtMtd>
<BtchBookg>true</BtchBookg>
<NbOfTxs>205</NbOfTxs>
<CtrlSum>154761.02</CtrlSum>
<PmtTpInf>
<SvcLvl>
<Cd>SEPA</Cd>
</SvcLvl>
<CtgyPurp>
<Cd>SALA</Cd>
</CtgyPurp>
</PmtTpInf>
<CdtTrfTxInf> <----------------------------------
<Amt>
<InstdAmt Ccy="EUR">1536.96</InstdAmt>
</Amt>
<Cdtr>
<Nm>Achternaam, Voornaam </Nm>
</Cdtr>
<CdtrAcct>
<Id>
<IBAN>NL80RABO0134343443</IBAN>
</Id>
</CdtrAcct>
</CdtTrfTxInf> <------------------------------------
<CdtTrfTxInf> <----------------------------------
<Amt>
<InstdAmt Ccy="EUR">1676.96</InstdAmt>
</Amt>
<Cdtr>
<Nm>Achternaam, Voornaam </Nm>
</Cdtr>
<CdtrAcct>
<Id>
<IBAN>NL80RABO013433222243</IBAN>
</Id>
</CdtrAcct>
</CdtTrfTxInf> <------------------------------------
</CstmrCdtTrfInitn>
</Document>
I use ElementTree. I want a Python list of tuples with the information inside each CdtTrfTxInf tag (everything between the arrows in the example XML file), so in this example I want a list with 2 tuples.
How can I do that?
I can iterate over the tree, but that is all.
My code:
import xml.etree.ElementTree as ET
tree = ET.parse(xml_file)
root = tree.getroot()
for elem in tree.iter():
    print(elem.tag, elem.text)  # prints every tag in the whole file
I would rather use xmltodict for this.
First of all, your input data as given is missing a closing </PmtInf> tag towards the end, just before your closing </CstmrCdtTrfInitn> tag. After fixing that, I saved your xml data into a file and did the following:
import xmltodict
with open("input_data.xml", "r") as f:
    xml_data = f.read()
xml_dict = xmltodict.parse(xml_data)
You can then access the xml data using dictionary accessors, for example:
>>> xml_dict
{'Document': {'@xmlns:xsi': 'http://www.w3.org/20...a-instance', '@xmlns': 'urn:iso:std:iso:2002...001.001.03', 'CstmrCdtTrfInitn': {...}}}
>>> xml_dict["Document"]
{'@xmlns:xsi': 'http://www.w3.org/20...a-instance', '@xmlns': 'urn:iso:std:iso:2002...001.001.03', 'CstmrCdtTrfInitn': {'GrpHdr': {...}, 'PmtInf': {...}}}
>>> xml_dict["Document"]["CstmrCdtTrfInitn"].keys()
dict_keys(['GrpHdr', 'PmtInf'])
>>> xml_dict["Document"]["CstmrCdtTrfInitn"]["PmtInf"]
{'PmtInfId': '20220914054827-154016', 'PmtMtd': 'TRF', 'BtchBookg': 'true', 'NbOfTxs': '205', 'CtrlSum': '154761.02', 'PmtTpInf': {'SvcLvl': {...}, 'CtgyPurp': {...}}, 'CdtTrfTxInf': [{...}, {...}]}
>>> xml_dict["Document"]["CstmrCdtTrfInitn"]["PmtInf"].keys()
dict_keys(['PmtInfId', 'PmtMtd', 'BtchBookg', 'NbOfTxs', 'CtrlSum', 'PmtTpInf', 'CdtTrfTxInf'])
Then you can loop over your CdtTrfTxInf with:
for item in xml_dict["Document"]["CstmrCdtTrfInitn"]["PmtInf"]["CdtTrfTxInf"]:
    print(item)
giving the output:
{'Amt': {'InstdAmt': {'@Ccy': 'EUR', '#text': '1536.96'}}, 'Cdtr': {'Nm': 'Achternaam, Voornaam'}, 'CdtrAcct': {'Id': {'IBAN': 'NL80RABO0134343443'}}}
{'Amt': {'InstdAmt': {'@Ccy': 'EUR', '#text': '1676.96'}}, 'Cdtr': {'Nm': 'Achternaam, Voornaam'}, 'CdtrAcct': {'Id': {'IBAN': 'NL80RABO013433222243'}}}
which you can process as you want.
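To get from those dictionaries to the list of tuples the question asks for, something like the following sketch could work. It operates on plain dictionaries shaped like the printed output, so it runs without xmltodict installed. xmltodict prefixes attributes with "@" and stores element text under "#text". The helper names to_tuple and as_list are hypothetical, and the choice of tuple fields (amount, currency, name, IBAN) is an assumption about what is wanted.

```python
# Items shaped like the output of the loop over CdtTrfTxInf
items = [
    {"Amt": {"InstdAmt": {"@Ccy": "EUR", "#text": "1536.96"}},
     "Cdtr": {"Nm": "Achternaam, Voornaam"},
     "CdtrAcct": {"Id": {"IBAN": "NL80RABO0134343443"}}},
    {"Amt": {"InstdAmt": {"@Ccy": "EUR", "#text": "1676.96"}},
     "Cdtr": {"Nm": "Achternaam, Voornaam"},
     "CdtrAcct": {"Id": {"IBAN": "NL80RABO013433222243"}}},
]


def as_list(value):
    """A tag that occurs only once is parsed as a single dict, not a list."""
    return value if isinstance(value, list) else [value]


def to_tuple(item):
    """Flatten one CdtTrfTxInf dictionary into (amount, currency, name, iban)."""
    amount = item["Amt"]["InstdAmt"]
    return (
        amount["#text"],
        amount["@Ccy"],
        item["Cdtr"]["Nm"],
        item["CdtrAcct"]["Id"]["IBAN"],
    )


rows = [to_tuple(item) for item in as_list(items)]
print(rows)
```

The as_list wrapper matters for files that contain a single transfer, where iterating directly over the parsed value would loop over dictionary keys instead of transfers.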
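Here is a quick attempt with ElementTree. Because the document declares a default namespace, each tag is stored as "{namespace}name", so the comparison uses endswith:
import xml.etree.ElementTree as ET
tree = ET.parse("fr.xml")
root = tree.getroot()
test = False
for elem in tree.iter():
    # Start printing once the first CdtTrfTxInf element has been seen
    if elem.tag.endswith("CdtTrfTxInf"):
        test = True
        continue
    # elem.text is None for empty elements, so check it before stripping
    if test and elem.text and elem.text.strip():
        print(elem.tag, elem.text)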
With the result as a list of (tag, text) tuples:
import xml.etree.ElementTree as ET
tree = ET.parse("fr.xml")
root = tree.getroot()
test = False
tag = []
textval = []
for elem in tree.iter():
    # Tags carry the namespace prefix, so match on the local name
    if elem.tag.endswith("CdtTrfTxInf"):
        test = True
        continue
    # elem.text is None for empty elements, so check it before stripping
    if test and elem.text and elem.text.strip():
        tag.append(elem.tag)
        textval.append(elem.text)
data = list(zip(tag, textval))
print(data)
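The snippet above yields one flat list of (tag, text) pairs. If one tuple per CdtTrfTxInf element is wanted instead, a namespace-aware lookup is an option. The following is a sketch under assumptions: the sample is a shortened copy of the file from the question, the function name transfers is hypothetical, and every transfer is assumed to contain an InstdAmt element.

```python
import xml.etree.ElementTree as ET

# Prefix "p" is arbitrary; it maps to the document's default namespace
NS = {"p": "urn:iso:std:iso:20022:tech:xsd:pain.001.001.03"}

SAMPLE = """<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03">
  <CstmrCdtTrfInitn>
    <PmtInf>
      <CdtTrfTxInf>
        <Amt><InstdAmt Ccy="EUR">1536.96</InstdAmt></Amt>
        <Cdtr><Nm>Achternaam, Voornaam </Nm></Cdtr>
        <CdtrAcct><Id><IBAN>NL80RABO0134343443</IBAN></Id></CdtrAcct>
      </CdtTrfTxInf>
      <CdtTrfTxInf>
        <Amt><InstdAmt Ccy="EUR">1676.96</InstdAmt></Amt>
        <Cdtr><Nm>Achternaam, Voornaam </Nm></Cdtr>
        <CdtrAcct><Id><IBAN>NL80RABO013433222243</IBAN></Id></CdtrAcct>
      </CdtTrfTxInf>
    </PmtInf>
  </CstmrCdtTrfInitn>
</Document>"""


def transfers(root):
    """Return one (amount, currency, name, iban) tuple per CdtTrfTxInf."""
    rows = []
    for tx in root.iterfind(".//p:CdtTrfTxInf", NS):
        amount = tx.find("p:Amt/p:InstdAmt", NS)  # assumed to be present
        name = tx.findtext("p:Cdtr/p:Nm", default="", namespaces=NS)
        iban = tx.findtext("p:CdtrAcct/p:Id/p:IBAN", default="", namespaces=NS)
        rows.append((amount.text.strip(), amount.get("Ccy"), name.strip(), iban.strip()))
    return rows


rows = transfers(ET.fromstring(SAMPLE))
print(rows)
```

For a file on disk, ET.parse(path).getroot() can be passed to the same function.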

Open a JSON files and edit structure

I have produced a couple of json files after scraping a few elements. The structure for each file is as follows:
us.json
{'Pres': 'Biden', 'Vice': 'Harris', 'Secretary': 'Blinken'}
uk.json
{'1st Min': 'Johnson', 'Queen':'Elizabeth', 'Prince': 'Charles'}
I'd like to know how I could edit the structure of each dictionary inside the JSON file to get output as follows:
[
{"title": "Pres",
"name": "Biden"}
,
{"title": "Vice",
"name": "Harris"}
,
{"title": "Secretary",
"name": "Blinken"}
]
As far as I can tell (I'm a beginner and have only been studying for a few weeks), I first need to run a loop to open each file, then generate a list of dictionaries, and finally modify each dictionary to change the structure. This is what I have so far. It is NOT WORKING, as it always overwrites the same keys.
import os
import json
list_of_dicts = []
for filename in os.listdir("DOCS/Countries Data"):
    with open(os.path.join("DOCS/Countries Data", filename), 'r', encoding='utf-8') as f:
        text = f.read()
        country_json = json.loads(text)
        list_of_dicts.append(country_json)
for country in list_of_dicts:
    newdict = country
    lastdict = {}
    for key in newdict:
        lastdict = {'Title': key}
    for value in newdict.values():
        lastdict['Name'] = value
    print(lastdict)
Extra bonus if you could also show me how to generate an ID number for each entry. Thank you very much.
This looks like a task for a list comprehension. I would do it the following way:
import json
us = '{"Pres": "Biden", "Vice": "Harris", "Secretary": "Blinken"}'
data = json.loads(us)
us2 = [{"title":k,"name":v} for k,v in data.items()]
us2json = json.dumps(us2)
print(us2json)
output
[{"title": "Pres", "name": "Biden"}, {"title": "Vice", "name": "Harris"}, {"title": "Secretary", "name": "Blinken"}]
data is a dict, and .items() provides key-value pairs, which I unpack into k and v (see tuple unpacking).
You can do this easily by writing a simple function like the one below:
import uuid
def format_dict(data: dict):
    return [dict(title=title, name=name, id=str(uuid.uuid4())) for title, name in data.items()]
Here each item becomes a separate object, and uuid adds an identifier to each one.
The full code can be modified like this:
import uuid
import os
import json
def format_dict(data: dict):
    return [dict(title=title, name=name, id=str(uuid.uuid4())) for title, name in data.items()]
list_of_dicts = []
for filename in os.listdir("DOCS/Countries Data"):
    with open(os.path.join("DOCS/Countries Data", filename), 'r', encoding='utf-8') as f:
        country_json = json.load(f)
        list_of_dicts.append(format_dict(country_json))
# list_of_dicts contains all file contents
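If a plain running number is preferred over a UUID for the requested ID, enumerate can supply one. This is a small sketch with a hypothetical helper name, with_ids; the numbering restarts for each dictionary unless a different start value is passed.

```python
import json


def with_ids(data, start=1):
    """Turn {"title": "name", ...} into a list of records with a running id."""
    return [
        {"id": number, "title": title, "name": name}
        for number, (title, name) in enumerate(data.items(), start=start)
    ]


us = json.loads('{"Pres": "Biden", "Vice": "Harris", "Secretary": "Blinken"}')
rows = with_ids(us)
print(json.dumps(rows, indent=2))
```

To keep IDs unique across several files, the next file can be started at len(rows) + 1.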

Dictionary from a String with particular structure

I am using python 3 to read this file and convert it to a dictionary.
I have this string from a file and I would like to know how could be possible to create a dictionary from it.
[User]
Date=10/26/2003
Time=09:01:01 AM
User=teodor
UserText=Max Cor
UserTextUnicode=392039n9dj90j32
[System]
Type=Absolute
Dnumber=QS236
Software=1.1.1.2
BuildNr=0923875
Source=LAM
Column=OWKD
[Build]
StageX=12345
Spotter=2
ApertureX=0.0098743
ApertureY=0.2431899
ShiftXYZ=-4.234809e-002
[Text]
Text=Here is the Text files
DataBaseNumber=The database number is 918723
..... (There are more than 1000 lines per file) ...
On the text I have "Name=Something" and then I would like to convert it as follows:
{'Date': '10/26/2003',
 'Time': '09:01:01 AM',
 'User': 'teodor',
 'UserText': 'Max Cor',
 'UserTextUnicode': '392039n9dj90j32', ...}
The word between [ ] can be removed, like [User], [System], [Build], [Text], etc...
In some fields there is only the first part of the string:
[Colors]
Red=
Blue=
Yellow=
DarkBlue=
What you have is an ordinary properties file. You can use this example to read the values into a map:
try (InputStream input = new FileInputStream("your_file_path")) {
    Properties prop = new Properties();
    prop.load(input);
    // prop.getProperty("User") returns "teodor"
} catch (IOException ex) {
    ex.printStackTrace();
}
EDIT:
For a Python solution, refer to the answered question.
You can use configparser to read .ini or .properties files (the format you have).
import configparser
config = configparser.ConfigParser()
config.read('your_file_path')
# Option names are case-insensitive and stored in lowercase by default:
# dict(config['User']) == {'date': '10/26/2003', 'time': '09:01:01 AM', ...}
# config['User']['User'] == 'teodor'
# config['System']['Type'] == 'Absolute'
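Since the question asks for a single flat dictionary with the original key spelling, the sections can be merged after parsing. Below is a minimal sketch using a shortened sample of the file. Setting optionxform to str keeps the key case, and interpolation=None is an assumption made so that a literal "%" in a value would not be interpreted.

```python
import configparser

SAMPLE = """\
[User]
Date=10/26/2003
Time=09:01:01 AM
User=teodor
[Colors]
Red=
Blue=
"""

parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep keys as written instead of lowercasing them
parser.read_string(SAMPLE)  # for a real file: parser.read('your_file_path')

# Merge every section into one dictionary; section names are dropped
flat = {}
for section in parser.sections():
    flat.update(parser.items(section))

print(flat)
```

If the same key appears in more than one section, the later section wins in the merged dictionary.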
This can easily be done in Python. Assuming your file is named test.txt, the following also works for lines with nothing after the = as well as lines with multiple =.
d = {}
with open('test.txt', 'r') as f:
    for line in f:
        line = line.strip()  # Remove any space or newline characters
        parts = line.split('=', 1)  # Split at the first `=` only, so later `=` stay in the value
        if len(parts) > 1:
            d[parts[0]] = parts[1]
print(d)
Output:
{
"Date": "10/26/2003",
"Time": "09:01:01 AM",
"User": "teodor",
"UserText": "Max Cor",
"UserTextUnicode": "392039n9dj90j32",
"Type": "Absolute",
"Dnumber": "QS236",
"Software": "1.1.1.2",
"BuildNr": "0923875",
"Source": "LAM",
"Column": "OWKD",
"StageX": "12345",
"Spotter": "2",
"ApertureX": "0.0098743",
"ApertureY": "0.2431899",
"ShiftXYZ": "-4.234809e-002",
"Text": "Here is the Text files",
"DataBaseNumber": "The database number is 918723"
}
I would suggest doing some cleaning first to get rid of the [] lines.
After that, you can split the remaining lines on the "=" separator and build a dictionary from the resulting pairs.
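Those two steps could be sketched as follows. The function name parse_key_values is hypothetical, and the sample lines are a shortened version of the file plus one made-up line (Expr=a=b) to show a value that itself contains "=".

```python
def parse_key_values(lines):
    """Skip [Section] headers and blank lines, then split each line at the first '='."""
    result = {}
    for raw in lines:
        line = raw.strip()
        if not line or (line.startswith("[") and line.endswith("]")):
            continue  # section header or empty line
        key, separator, value = line.partition("=")
        if separator:  # ignore lines without any '='
            result[key] = value
    return result


sample = ["[User]", "Date=10/26/2003", "UserText=Max Cor", "", "[Colors]", "Red=", "Expr=a=b"]
parsed = parse_key_values(sample)
print(parsed)
```

For a real file, the open file object can be passed directly, since iterating over it yields lines.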

How to iterate over GraphML file with lxml

I have the following GraphML file 'mygraph.gml' that I want to parse with a simple Python script:
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="name" for="node" attr.name="name" attr.type="string"/>
<key id="weight" for="edge" attr.name="weight" attr.type="double"/>
<graph id="G" edgedefault="directed">
<node id="n0">
<data key="name">node1</data>
</node>
<node id="n1">
<data key="name">node2</data>
</node>
<edge source="n1" target="n0">
<data key="weight">1</data>
</edge>
</graph>
</graphml>
This represents a graph with two nodes, n0 and n1, with an edge of weight 1 between them.
I want to parse this structure with Python.
I wrote a script with the help of lxml (I need to use it because the real dataset is much bigger than this simple example, with more than 10^5 nodes, and Python's minidom is too slow):
import lxml.etree as et
tree = et.parse('mygraph.gml')
root = tree.getroot()
graphml = {
    "graph": "{http://graphml.graphdrawing.org/xmlns}graph",
    "node": "{http://graphml.graphdrawing.org/xmlns}node",
    "edge": "{http://graphml.graphdrawing.org/xmlns}edge",
    "data": "{http://graphml.graphdrawing.org/xmlns}data",
    "label": "{http://graphml.graphdrawing.org/xmlns}data[@key='label']",
    "x": "{http://graphml.graphdrawing.org/xmlns}data[@key='x']",
    "y": "{http://graphml.graphdrawing.org/xmlns}data[@key='y']",
    "size": "{http://graphml.graphdrawing.org/xmlns}data[@key='size']",
    "r": "{http://graphml.graphdrawing.org/xmlns}data[@key='r']",
    "g": "{http://graphml.graphdrawing.org/xmlns}data[@key='g']",
    "b": "{http://graphml.graphdrawing.org/xmlns}data[@key='b']",
    "weight": "{http://graphml.graphdrawing.org/xmlns}data[@key='weight']",
    "edgeid": "{http://graphml.graphdrawing.org/xmlns}data[@key='edgeid']"
}
graph = tree.find(graphml.get("graph"))
nodes = graph.findall(graphml.get("node"))
edges = graph.findall(graphml.get("edge"))
This script correctly gets the nodes and edges, so I can simply iterate over them:
for n in nodes:
    print(n.attrib)
or similarly over the edges:
for e in edges:
    print(e.attrib['source'], e.attrib['target'])
but I can't work out how to get the "data" tag of the edges or the nodes in order to print the edge weight and the node's "name".
This doesn't work for me:
weights = graph.findall(graphml.get("weight"))
The last list is always empty. Why? I'm missing something, but I can't see what.
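The list is empty because that path only matches data elements that are direct children of graph, while in this file they sit inside each node and edge. For each node or edge found, you can build a dict with the key/value of its data:
graph = tree.find(graphml.get("graph"))
nodes = graph.findall(graphml.get("node"))
edges = graph.findall(graphml.get("edge"))
for node in nodes + edges:
    attribs = {}
    # Collect every <data key="...">value</data> child of this node or edge
    for data in node.findall(graphml.get('data')):
        attribs[data.get('key')] = data.text
    print('Node', node, 'have', attribs)
It gives the result: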
Node <Element {http://graphml.graphdrawing.org/xmlns}node at 0x7ff053d3e5a0> have {'name': 'node1'}
Node <Element {http://graphml.graphdrawing.org/xmlns}node at 0x7ff053d3e5f0> have {'name': 'node2'}
Node <Element {http://graphml.graphdrawing.org/xmlns}edge at 0x7ff053d3e640> have {'weight': '1'}
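As a complement, here is a sketch of how the weights and names could be looked up directly with a namespace prefix map and a descendant search (".//"). It uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages; the find/findall calls shown here take the same arguments in lxml. The prefix "g" is an arbitrary choice and the sample is a shortened copy of the file from the question.

```python
import xml.etree.ElementTree as ET

# Arbitrary prefix mapped to the GraphML namespace
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="n0"><data key="name">node1</data></node>
    <node id="n1"><data key="name">node2</data></node>
    <edge source="n1" target="n0"><data key="weight">1</data></edge>
  </graph>
</graphml>"""

root = ET.fromstring(GRAPHML)
graph = root.find("g:graph", NS)

# Node id -> value of its <data key="name"> child
names = {
    node.get("id"): node.findtext("g:data[@key='name']", namespaces=NS)
    for node in graph.findall("g:node", NS)
}

# (source, target, weight) for every edge
weights = [
    (edge.get("source"), edge.get("target"),
     float(edge.findtext("g:data[@key='weight']", namespaces=NS)))
    for edge in graph.findall("g:edge", NS)
]

# Searching direct children of <graph> finds nothing; ".//" searches all descendants
direct = graph.findall("g:data[@key='weight']", NS)
nested = graph.findall(".//g:data[@key='weight']", NS)

print(names, weights, len(direct), len(nested))
```

The per-edge lookup keeps each weight tied to its source and target, which the flat descendant search does not.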
