I would like to sort the XML below by the "value" attribute of the "entry" tags, with the strings (letters) sorted before the numbers.
<test>
<entry value="-12" />
<entry value="0" />
<entry value="043" />
<entry value="14" />
<entry value="6" />
<entry value="_null" />
<entry value="abc" />
<entry value="abcd" />
<entry value="empty" />
<entry value="false" />
<entry value="test1" />
<entry value="test2" />
<entry value="true" />
</test>
I have written some Python that sorts this XML, but it puts the numbers first and then the strings.
I have checked this thread, but could not get any of its XML-sorting solutions to work.
import xml.etree.ElementTree as ElT
import os
from os.path import sep

def sort_xml(directory, xml_file, level1_tag, attribute, mode=0):
    # mode 0 - numbers before letters
    # mode 1 - letters before numbers
    file = directory + sep + xml_file
    tree = ElT.parse(file)
    data = tree.getroot()
    els = data.findall(level1_tag)
    if mode == 0:
        new_els = sorted(els, key=lambda e: (e.tag, e.attrib[attribute]))
    if mode == 1:
        new_els = sorted(els, key=lambda e: (isinstance(e.tag, (float, int)), e.attrib[attribute]))
    for el in new_els:
        if mode == 0:
            el[:] = sorted(el, key=lambda e: (e.tag, e.attrib[attribute]))
        if mode == 1:
            el[:] = sorted(el, key=lambda e: (isinstance(e.tag, (float, int)), e.attrib[attribute]))
    data[:] = new_els
    tree.write(file, xml_declaration=True, encoding='utf-8')
    with open(file, 'r') as fin:
        data = fin.read().splitlines(True)
    with open(file, 'w') as fout:
        fout.writelines(data[1:])

sort_xml(os.getcwd(), "test.xml", "entry", "value", 1)
Any ideas how this could be done?
Edit1: Desired output
<test>
<entry value="_null" />
<entry value="abc" />
<entry value="abcd" />
<entry value="empty" />
<entry value="false" />
<entry value="test1" />
<entry value="test2" />
<entry value="true" />
<entry value="-12" />
<entry value="0" />
<entry value="043" />
<entry value="14" />
<entry value="6" />
</test>
I think your problem is that, when sorting, you check whether the value is an int or a float. In fact all the values are strings (and the check actually looks at e.tag, which is the string "entry"), so isinstance(e.tag, (float, int)) will always be False.
A sorter function like this does what you want:
def sorter(x):
    "Sort by whether the value can be interpreted as an integer, then by the string itself"
    value = x.get("value")
    def is_integer(i):
        try:
            int(i)
        except ValueError:
            return False
        return True
    return is_integer(value), value
which can be used like so (using StringIO as a substitute for the file):
from xml.etree import ElementTree
from io import StringIO
xml = """<test>
<entry value="-12" />
<entry value="0" />
<entry value="043" />
<entry value="14" />
<entry value="6" />
<entry value="_null" />
<entry value="abc" />
<entry value="abcd" />
<entry value="empty" />
<entry value="false" />
<entry value="test1" />
<entry value="test2" />
<entry value="true" />
</test>"""
tree = ElementTree.parse(StringIO(xml))
root = tree.getroot()
root[:] = sorted(root, key=sorter)
tree.write("output.xml")
The contents of output.xml are:
<test>
<entry value="_null" />
<entry value="abc" />
<entry value="abcd" />
<entry value="empty" />
<entry value="false" />
<entry value="test1" />
<entry value="test2" />
<entry value="true" />
<entry value="-12" />
<entry value="0" />
<entry value="043" />
<entry value="14" />
<entry value="6" />
</test>
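The sorter above still compares the numbers as strings, which is why "6" ends up after "14" and "043" before "14". If you would rather order the numbers by value while still keeping the non-numeric entries first, a minimal sketch could convert them inside the key (numeric_last_key is a hypothetical name, and it assumes every entry has a value attribute):

```python
from xml.etree import ElementTree as ET

def numeric_last_key(elem):
    """Sort key: non-integer values first (alphabetically), then integers by numeric value."""
    value = elem.get("value")
    try:
        number = int(value)
    except ValueError:
        # (False, str) always sorts before (True, int), so a str is never compared to an int
        return (False, value)
    return (True, number)

xml = """<test>
<entry value="-12" />
<entry value="043" />
<entry value="6" />
<entry value="abc" />
<entry value="_null" />
<entry value="14" />
</test>"""

root = ET.fromstring(xml)
root[:] = sorted(root, key=numeric_last_key)
print([e.get("value") for e in root])  # ['_null', 'abc', '-12', '6', '14', '043']
```

Because the first tuple element separates the two groups, the second element only ever compares strings with strings or ints with ints.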
I took the part where the letters start and put it at the top. The actual requirement is only to have the letters at the top; I don't care about the order of the rest.
See below:
import xml.etree.ElementTree as ET
xml = '''<test>
<entry value="-12" />
<entry value="/this" />
<entry value="0" />
<entry value="043" />
<entry value="14" />
<entry value="6" />
<entry value="_null" />
<entry value="abc" />
<entry value="abcd" />
<entry value="empty" />
<entry value="false" />
<entry value="test1" />
<entry value="test2" />
<entry value="true" />
</test>'''
root = ET.fromstring(xml)
numeric = []
non_numeric = []
for entry in root.findall('.//entry'):
    try:
        x = int(entry.attrib['value'])
        numeric.append((x, entry.attrib['value']))
    except ValueError:
        non_numeric.append(entry.attrib['value'])
# sorted() returns a new list and leaves the original alone, so sort in place instead
numeric.sort(key=lambda x: x[0])
non_numeric.sort()
root = ET.Element('test')
for value in non_numeric:
    entry = ET.SubElement(root, 'entry')
    entry.attrib['value'] = value
for value in numeric:
    entry = ET.SubElement(root, 'entry')
    entry.attrib['value'] = str(value[1])
ET.indent(root)  # one entry per line (Python 3.9+)
ET.dump(root)
output
<test>
  <entry value="/this" />
  <entry value="_null" />
  <entry value="abc" />
  <entry value="abcd" />
  <entry value="empty" />
  <entry value="false" />
  <entry value="test1" />
  <entry value="test2" />
  <entry value="true" />
  <entry value="-12" />
  <entry value="0" />
  <entry value="6" />
  <entry value="14" />
  <entry value="043" />
</test>
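Rebuilding the tree with ET.SubElement keeps only the value attribute, so any other attributes or children of the entry elements would be lost. A minimal sketch that keeps the original elements instead splits them into the same two groups and reassigns them to the root in place (split_and_sort is a hypothetical helper; it assumes the root contains only entry children, each with a value attribute):

```python
import xml.etree.ElementTree as ET

def split_and_sort(root):
    """Reorder root's <entry> children in place: non-numeric values first, then numeric ones."""
    numeric, non_numeric = [], []
    for entry in root.findall("entry"):
        value = entry.get("value")
        try:
            numeric.append((int(value), entry))
        except ValueError:
            non_numeric.append(entry)
    non_numeric.sort(key=lambda e: e.get("value"))
    # Compare only the int, never the Element objects themselves
    numeric.sort(key=lambda pair: pair[0])
    # Reuse the original elements, so extra attributes and children survive
    root[:] = non_numeric + [entry for _, entry in numeric]

xml = ('<test><entry value="14" extra="a" /><entry value="abc" />'
       '<entry value="-12" /><entry value="/this" /></test>')
root = ET.fromstring(xml)
split_and_sort(root)
print([e.get("value") for e in root])  # ['/this', 'abc', '-12', '14']
```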
I have an array of XML elements (given as strings, see below). I need to append them to the root tag, which already contains a few tags (i.e. xml.etree.ElementTree.Element objects).
For example:
<MxGraphModel>
<root>
<mxCell id="0"></mxCell>
<mxCell id="1"></mxCell>
</root>
</MxGraphModel>
My array: ['<mxCell id="3"></mxCell>','<mxCell id="4"></mxCell>']
My final output needs to be:
<MxGraphModel>
<root>
<mxCell id="0"></mxCell>
<mxCell id="1"></mxCell>
<mxCell id="3"></mxCell>
<mxCell id="4"></mxCell>
</root>
</MxGraphModel>
Try this:
from xml.etree import ElementTree as ET
data = ['<mxCell id="3"></mxCell>','<mxCell id="4"></mxCell>']
root = ET.parse('test.xml').getroot()
nodes = root.find('root')
for x in data:
    nodes.append(ET.fromstring(x))
ET.indent(root)  # normalize the indentation, including the appended cells (Python 3.9+)
# encoding="unicode" returns a str instead of bytes
print(ET.tostring(root, encoding="unicode"))
Output:
<MxGraphModel>
  <root>
    <mxCell id="0" />
    <mxCell id="1" />
    <mxCell id="3" />
    <mxCell id="4" />
  </root>
</MxGraphModel>
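If the fragments always arrive as strings, the same idea can be wrapped in a small function that parses each fragment and appends them all at once with Element.extend. A sketch, assuming the container element is found by a path such as "root" as in the question (append_fragments is a hypothetical name):

```python
import xml.etree.ElementTree as ET

def append_fragments(xml_text, container_path, fragments):
    """Append each XML fragment string to the element at container_path; return the new XML."""
    top = ET.fromstring(xml_text)
    container = top.find(container_path)
    if container is None:
        raise ValueError(f"no element matches {container_path!r}")
    # Parse every fragment into an Element and append them in one call
    container.extend([ET.fromstring(fragment) for fragment in fragments])
    return ET.tostring(top, encoding="unicode")

model = ('<MxGraphModel><root><mxCell id="0"></mxCell>'
         '<mxCell id="1"></mxCell></root></MxGraphModel>')
data = ['<mxCell id="3"></mxCell>', '<mxCell id="4"></mxCell>']
print(append_fragments(model, "root", data))
```

Empty elements such as <mxCell id="3"></mxCell> come back in the short form <mxCell id="3" />, which is equivalent XML.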
Here is a piece of the XML data before I go any further:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xmeml>
<xmeml version="5">
<sequence id="episode1">
<media>
<video>
<track>
<generatoritem id="Gen Subtitle1">
<effect>
<name>Gen Subtitle</name>
<effectid>Gen Subtitle</effectid>
<effectcategory>Text</effectcategory>
<effecttype>generator</effecttype>
<mediatype>video</mediatype>
<parameter>
<parameterid>part1</parameterid>
<name>Text Settings</name>
<value/>
</parameter>
<parameter>
<parameterid>str</parameterid>
<name>Text</name>
<value>You're a coward for picking on people
who are weaker than you.</value>
</parameter>
<parameter>
<parameterid>font</parameterid>
<name>Font</name>
<value>Arial</value>
</parameter>
</effect>
</media>
</sequence>
</xmeml>
Now, as you can see, the tree starts with <effect>, and inside there are multiple <parameter> elements, but I'm only after the <value> from the <parameter> that also contains
<parameterid>str</parameterid>
<name>Text</name>
so I can get an output of "That child is so cute.
And he's smart."
Here is my code:
lst = tree.findall('xmeml/sequence/media/video/track/generatoritem/effect/parameter/value')
counts = tree.findall('.//value')
for each in counts:
    print(each.text)
And this is what I get:
And he's smart.
Arial
See below
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xmeml>
<xmeml version="5">
<sequence id="episode1">
<effect>
<name>Gen Subtitle</name>
<effectid>Gen Subtitle</effectid>
<effectcategory>Text</effectcategory>
<effecttype>generator</effecttype>
<mediatype>video</mediatype>
<parameter>
<parameterid>part1</parameterid>
<name>Text Settings</name>
<value/>
</parameter>
<parameter>
<parameterid>str</parameterid>
<name>Text</name>
<value>That child is so cute. And he's smart</value>
</parameter>
<parameter>
<parameterid>font</parameterid>
<name>Font</name>
<value>Arial</value>
</parameter>
</effect>
</sequence>
</xmeml>'''
root = ET.fromstring(xml)
str_params = root.findall('.//parameter[parameterid="str"]')
for param in str_params:
    if param.find('./name').text == 'Text':
        print('The text: {}'.format(param.find('./value').text))
        break
output
The text: That child is so cute. And he's smart
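ElementTree's limited XPath also accepts several predicates in a row, so both conditions (parameterid is "str" and name is "Text") can go into a single path. Since the text in the question spans two lines, this sketch also collapses the whitespace; get_subtitle_text is a hypothetical name:

```python
import xml.etree.ElementTree as ET

def get_subtitle_text(xml_text):
    """Return the <value> text of the <parameter> with parameterid 'str' and name 'Text'."""
    root = ET.fromstring(xml_text)
    # Two predicates in a row: both must hold for the same <parameter>
    value = root.find('.//parameter[parameterid="str"][name="Text"]/value')
    if value is None or value.text is None:
        return None
    # Collapse the line break and any indentation inside the text into single spaces
    return " ".join(value.text.split())

sample = """<xmeml version="5"><sequence id="episode1"><effect>
<parameter><parameterid>part1</parameterid><name>Text Settings</name><value/></parameter>
<parameter><parameterid>str</parameterid><name>Text</name><value>That child is so cute.
And he's smart.</value></parameter>
<parameter><parameterid>font</parameterid><name>Font</name><value>Arial</value></parameter>
</effect></sequence></xmeml>"""
print(get_subtitle_text(sample))  # That child is so cute. And he's smart.
```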
I have to add one element to an XML file at runtime using Python.
My original XML file has content like this:
<?xml version='1.0' encoding='utf-8'?>
<!--
Some comments.
-->
<rootTag>
<childTag className="org.Tiger" SSLEngine="on" />
<childTag name="serv1">
<Connector port="8001" SSLEnabled="true"
maxThreads="800"
URIEncoding="UTF-8"
clientAuth="false" />
<Track name="Pacific" defaultHost="localhost">
<Realm className="Realm" appName="kernel"
userClassNames="User"
roleClassNames="Role"/>
<Host name="localhost"
createDirs="false">
<Value className="Remote"
httpsServerPort="223" />
</Host>
</Track>
</childTag>
</rootTag>
Below is the code I wrote to add the Value element at runtime:
import xml.etree.ElementTree as ET
myTree = ET.parse("new2.xml")
myRoot = myTree.getroot()
x = myTree.findall('.//Value[@className="Error"]')
print(len(x))
if len(x) == 0:
    for a in myRoot.findall('childTag'):
        for b in a.findall('Track'):
            for c in b.findall('Host'):
                ele = ET.Element('Value')
                ele.set("className", "Error")
                ele.set("showReport", "false")
                ele.set("showServerInfo", "false")
                c.append(ele)
    myTree.write("new2.xml")
The output I got is this:
<rootTag>
<childTag className="org.Tiger" SSLEngine="on" />
<childTag name="serv1">
<Connector port="8001" SSLEnabled="true" maxThreads="800" URIEncoding="UTF-8" clientAuth="false" />
<Track name="Pacific" defaultHost="localhost">
<Realm className="Realm" appName="kernel" userClassNames="User" roleClassNames="Role" />
<Host name="localhost" autoDeploy="false" createDirs="false">
<Value className="Remote" httpsServerPort="223" />
<Value className="Error" showReport="false" showServerInfo="false" /></Host>
</Track>
</childTag>
</rootTag>
The problem is that it removes the XML declaration and the comments from the file, and it also changes the file's indentation.
How can I add only the subelement, with correct indentation, without changing anything else in the file?
The output should look like this:
<?xml version='1.0' encoding='utf-8'?>
<!--
Some comments.
-->
<rootTag>
<childTag className="org.Tiger" SSLEngine="on" />
<childTag name="serv1">
<Connector port="8001" SSLEnabled="true"
maxThreads="800"
URIEncoding="UTF-8"
clientAuth="false" />
<Track name="Pacific" defaultHost="localhost">
<Realm className="Realm" appName="kernel"
userClassNames="User"
roleClassNames="Role"/>
<Host name="localhost"
createDirs="false">
<Value className="Remote"
httpsServerPort="223" />
<Value className="Error"
showReport="false" showServerInfo="false" />
</Host>
</Track>
</childTag>
</rootTag>
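ElementTree rebuilds the file from its own tree when you call write: ET.parse drops comments by default, attribute line breaks are not kept, and with the default settings no XML declaration is written. One way to leave everything else untouched is to use ElementTree only to check whether the element already exists, and to splice the new element into the raw text. A minimal sketch, assuming there is exactly one Host element whose closing </Host> tag sits on its own line, and assuming four spaces per indentation level (add_error_value and extra_indent are hypothetical names):

```python
import xml.etree.ElementTree as ET

def add_error_value(xml_text, extra_indent="    "):
    """Insert an Error <Value> before the first </Host>, leaving the rest of xml_text untouched."""
    # ElementTree is used only to check whether the element already exists
    root = ET.fromstring(xml_text)
    if root.find('.//Host/Value[@className="Error"]') is not None:
        return xml_text

    pos = xml_text.find("</Host>")
    if pos == -1:
        raise ValueError("no </Host> closing tag found")

    # Leading whitespace of the line that holds </Host>
    line_start = xml_text.rfind("\n", 0, pos) + 1
    host_indent = xml_text[line_start:pos]
    if host_indent.strip():
        raise ValueError("expected </Host> to be on its own line")

    child_indent = host_indent + extra_indent
    new_element = (
        f'{child_indent}<Value className="Error"\n'
        f'{child_indent}{extra_indent}showReport="false" showServerInfo="false" />\n'
    )
    # Splice the new element in just before the </Host> line
    return xml_text[:line_start] + new_element + xml_text[line_start:]
```

To apply it, read new2.xml as text (for example with open(..., encoding="utf-8")), pass the string through add_error_value, and write the result back with the same encoding. Because the check runs first, calling it twice does not add a second element.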
My XML file looks like this:
<?xml version="1.0"?>
<BCPFORMAT
xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="12"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="20" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="30" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="age" xsi:type="SQLINT"/>
<COLUMN SOURCE="2" NAME="firstname" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="3" NAME="lastname" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>
I need to know the index of the child node with ID="1" in its parent node RECORD (i.e. the index is 0 in this case).
Please help me solve this.
Thanks :)
Using xml.etree.ElementTree:
import xml.etree.ElementTree as ET
root = ET.fromstring('''<?xml version="1.0"?>
<BCPFORMAT
...
</BCPFORMAT>''')
# Accessing parent node: http://effbot.org/zone/element.htm#accessing-parents
# (root.iter() replaces getiterator(), which was removed in Python 3.9)
parent_map = {c: p for p in root.iter() for c in p}
child = root.find('.//*[@ID="1"]')
print(list(parent_map[child]).index(child)) # => 0
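If you only need the position among RECORD's children, you can also skip the parent map: look RECORD up through the default namespace declared on BCPFORMAT and enumerate its children. A sketch using the full file from the question (the NS mapping, its prefix f and the child_index helper are assumptions of this example):

```python
import xml.etree.ElementTree as ET

XML = r'''<?xml version="1.0"?>
<BCPFORMAT
xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="12"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="20" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="30" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="age" xsi:type="SQLINT"/>
<COLUMN SOURCE="2" NAME="firstname" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="3" NAME="lastname" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>'''

# Prefix "f" is our own alias for the file's default namespace
NS = {"f": "http://schemas.microsoft.com/sqlserver/2004/bulkload/format"}

def child_index(root, parent_path, field_id):
    """Return the position of the child whose ID attribute equals field_id, or -1 if absent."""
    parent = root.find(parent_path, NS)
    if parent is None:
        raise ValueError(f"no element matches {parent_path!r}")
    for index, child in enumerate(parent):
        if child.get("ID") == field_id:
            return index
    return -1

root = ET.fromstring(XML)
print(child_index(root, "f:RECORD", "1"))  # 0
print(child_index(root, "f:RECORD", "3"))  # 2
```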
Using lxml:
import lxml.etree as ET
root = ET.fromstring('''<?xml version="1.0"?>
<BCPFORMAT
...
</BCPFORMAT>''')
child = root.find('.//*[@ID="1"]')
print(child.getparent().index(child)) # => 0