How to replace elements in XML using Python

Sorry for my poor English, but I need your help.
I have two XML files.
The first, one.xml, is:
<root>
<data name="aaaa">
<value>"old value1"</value>
<comment>"this is an old value1 of aaaa"</comment>
</data>
<data name="bbbb">
<value>"old value2"</value>
<comment>"this is an old value2 of bbbb"</comment>
</data>
</root>
The second, two.xml, is:
<root>
<data name="aaaa">
<value>"value1"</value>
<comment>"this is a value 1 of aaaa"</comment>
</data>
<data name="bbbb">
<value>"value2"</value>
<comment>"this is a value2 of bbbb"</comment>
</data>
<data name="cccc">
<value>"value3"</value>
<comment>"this is a value3 of cccc"</comment>
</data>
</root>
one.xml should be updated from two.xml, so that afterwards it looks like this.
one.xml (after):
<root>
<data name="aaaa">
<value>"value1"</value>
<comment>"this is a value1 of aaaa"</comment>
</data>
<data name="bbbb">
<value>"value2"</value>
<comment>"this is a value2 of bbbb"</comment>
</data>
</root>
The entry with name="cccc" does not exist in one.xml, so it is ignored.
What I actually want to do is:
download two.xml (the whole list) from the database, then
update my one.xml (which contains only the data entries the app uses) from two.xml.
Can anyone help me, please?
Thanks!
==============================================================
Update: the xml.etree.ElementTree code from the answer works with the example, but I found a problem with my real XML file.
The real one.xml contains:
<?xml version="1.0" encoding="utf-8"?>
<root>
<resheader name="resmimetype">
<value>text/microsoft-resx</value>
</resheader>
<resheader name="version">
<value>2.0</value>
</resheader>
<resheader name="reader">
<value>System.Resources.ResXResourceReader, System.Windows.Forms, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>
</resheader>
<resheader name="writer">
<value>System.Resources.ResXResourceWriter, System.Windows.Forms, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>
</resheader>
<data name="NotesLabel" xml:space="preserve">
<value>Hinweise:</value>
<comment>label for input field</comment>
</data>
<data name="NotesPlaceholder" xml:space="preserve">
<value>z . Milch kaufen</value>
<comment>example input for notes field</comment>
</data>
<data name="AddButton" xml:space="preserve">
<value>Neues Element hinzufügen</value>
<comment>this string appears on a button to add a new item to the list</comment>
</data>
</root>
It seems the resheader elements cause trouble.
Do you have any idea how to fix this?

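You can use xml.etree.ElementTree. There are probably more elegant ways, but this should work on files that fit in memory, provided the names are unique in two.xml. Only data elements are selected, so resheader elements are never touched, and entries of one.xml that have no counterpart in two.xml are kept as they are.
import xml.etree.ElementTree as ET

tree_one = ET.parse('one.xml')
root_one = tree_one.getroot()
tree_two = ET.parse('two.xml')
root_two = tree_two.getroot()

# Index the entries of two.xml by their name attribute.
data_two = {e.get("name"): e for e in root_two.findall("data")}

for eo in root_one.findall("data"):
    en = data_two.get(eo.get("name"))
    if en is None:
        continue  # no replacement in two.xml: keep the old entry
    tail = eo.tail  # clear() also resets the tail, so save it first
    eo.clear()
    eo.tail = tail
    for k, v in en.items():
        eo.set(k, v)
    eo.extend(en.findall("*"))
    eo.text = en.text

tree_one.write("one.xml", encoding="utf-8", xml_declaration=True)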
If your files do not fit in memory you can still use xml.dom.pulldom as long as single data entries do fit.
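A variant that may be easier to reason about, as a minimal sketch: instead of clearing and rebuilding each data element, it only overwrites the text of value and comment. The inline XML strings, the function name update_values and the entry name "only_in_one" are made up for illustration; they are not from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature versions of one.xml and two.xml.
ONE = """<root>
<resheader name="version"><value>2.0</value></resheader>
<data name="aaaa"><value>old value1</value><comment>old comment</comment></data>
<data name="only_in_one"><value>keep me</value></data>
</root>"""

TWO = """<root>
<data name="aaaa"><value>value1</value><comment>new comment</comment></data>
<data name="cccc"><value>value3</value></data>
</root>"""


def update_values(root_one, root_two):
    """Copy value/comment text from root_two into same-named data entries of root_one."""
    by_name = {e.get("name"): e for e in root_two.findall("data")}
    for target in root_one.findall("data"):
        source = by_name.get(target.get("name"))
        if source is None:
            continue  # no counterpart in two.xml: leave the entry untouched
        for tag in ("value", "comment"):
            new = source.find(tag)
            old = target.find(tag)
            if new is not None and old is not None:
                old.text = new.text
    return root_one


root_one = update_values(ET.fromstring(ONE), ET.fromstring(TWO))
values = {e.get("name"): e.findtext("value") for e in root_one.findall("data")}
print(values)
```

Because attributes such as xml:space are never removed here, there is nothing to restore afterwards; the trade-off is that child elements other than value and comment are not synchronized.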

Related

Python - replace root element of one xml file with another root element without its children

I have one xml file that looks like this, XML1:
<?xml version='1.0' encoding='utf-8'?>
<report>
</report>
And the other one that is like this,
XML2:
<?xml version='1.0' encoding='utf-8'?>
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla" >
<child1>
<child2>
....
</child2>
</child1>
</report>
I need to replace and put root element of XML2 without its children, so XML1 looks like this:
<?xml version='1.0' encoding='utf-8'?>
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla">
</report>
Currently my code looks like this, but it doesn't remove the children; it writes the whole tree instead:
source_tree = ET.parse('XML2.xml')
source_root = source_tree.getroot()
report = source_root.findall('report')
for child in list(report):
    report.remove(child)
source_tree.write('XML1.xml', encoding='utf-8', xml_declaration=True)
Does anyone have an idea how I can achieve this?
Thanks!
Try the code below (it just copies attrib):
import xml.etree.ElementTree as ET
xml1 = '''<?xml version='1.0' encoding='utf-8'?>
<report>
</report>'''
xml2 = '''<?xml version='1.0' encoding='utf-8'?>
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla" >
<child1>
<child2>
</child2>
</child1>
</report>'''
root1 = ET.fromstring(xml1)
root2 = ET.fromstring(xml2)
root1.attrib = root2.attrib
ET.dump(root1)
Output:
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla">
</report>
So here is the working code:
import xml.etree.ElementTree as ET

source_tree = ET.parse('XML2.xml')
source_root = source_tree.getroot()
dest_tree = ET.parse('XML1.xml')
dest_root = dest_tree.getroot()
dest_root.attrib = source_root.attrib
dest_tree.write('XML1.xml', encoding='utf-8', xml_declaration=True)
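One detail worth knowing: assigning attrib directly makes both elements share the same dictionary. If the source tree is modified later, copying the entries avoids surprises. A minimal sketch with shortened inline documents (the strings are trimmed from the question for illustration):

```python
import xml.etree.ElementTree as ET

# Shortened stand-ins for XML2 (source) and XML1 (destination).
source_root = ET.fromstring('<report attrib1="blabla" attrib2="blabla"><child1/></report>')
dest_root = ET.fromstring('<report></report>')

# Copy the attribute entries instead of sharing the dict object.
dest_root.attrib.update(source_root.attrib)

# A later change to the source no longer affects the destination.
source_root.set("attrib1", "changed")
print(dest_root.attrib)
```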

Parsing XML in Python with ElementTree

I'm using the documentation here to try to get only the values (name, ip, netmask) of certain elements.
This is an example of the structure of my xml:
<?xml version="1.0" ?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="urn:uuid:5cf32451-91af-4f71-a0bd-ead244b81b1f">
<data>
<interfaces xmlns="urn:ietf:params:xml:ns:yang:ietf-interfaces">
<interface>
<name>GigabitEthernet1</name>
<type xmlns:ianaift="urn:ietf:params:xml:ns:yang:iana-if-type">ianaift:ethernetCsmacd</type>
<enabled>true</enabled>
<ipv4 xmlns="urn:ietf:params:xml:ns:yang:ietf-ip">
<address>
<ip>192.168.40.30</ip>
<netmask>255.255.255.0</netmask>
</address>
</ipv4>
<ipv6 xmlns="urn:ietf:params:xml:ns:yang:ietf-ip"/>
</interface>
<interface>
<name>GigabitEthernet2</name>
<type xmlns:ianaift="urn:ietf:params:xml:ns:yang:iana-if-type">ianaift:ethernetCsmacd</type>
<enabled>true</enabled>
<ipv4 xmlns="urn:ietf:params:xml:ns:yang:ietf-ip">
<address>
<ip>10.10.10.1</ip>
<netmask>255.255.255.0</netmask>
</address>
</ipv4>
<ipv6 xmlns="urn:ietf:params:xml:ns:yang:ietf-ip"/>
</interface>
</interfaces>
</data>
</rpc-reply>
Python code (this returns nothing):
import xml.etree.ElementTree as ET
tree = ET.parse("C:\\Users\\Redha\\Documents\\test_network\\interface1234.xml")
root = tree.getroot()
namespaces = {'interfaces': 'urn:ietf:params:xml:ns:yang:ietf-interfaces' }
for elem in root.findall('.//interfaces:interfaces', namespaces):
    s0 = elem.find('.//interfaces:name',namespaces)
    name = s0.text
    print(name)
You can also index into the tree by position instead of searching by name:
import xml.etree.ElementTree as ET
interface = ET.parse('interface2.xml')
interface_root = interface.getroot()
for interface_attribute in interface_root[0][0]:
    print(f"{interface_attribute[0].text}, {interface_attribute[3][0][0].text}, {interface_attribute[3][0][1].text}")
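A namespace-aware alternative, as a minimal sketch: the document uses two default namespaces (ietf-interfaces for interface and name, ietf-ip for ipv4, address, ip and netmask), so both need a prefix in the lookup. The prefixes "ifc" and "ip", the function name interface_addresses, and the trimmed inline XML are choices made for this example only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the reply from the question, inlined so the sketch runs on its own.
XML = """<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <data>
    <interfaces xmlns="urn:ietf:params:xml:ns:yang:ietf-interfaces">
      <interface>
        <name>GigabitEthernet1</name>
        <ipv4 xmlns="urn:ietf:params:xml:ns:yang:ietf-ip">
          <address><ip>192.168.40.30</ip><netmask>255.255.255.0</netmask></address>
        </ipv4>
      </interface>
      <interface>
        <name>GigabitEthernet2</name>
        <ipv4 xmlns="urn:ietf:params:xml:ns:yang:ietf-ip">
          <address><ip>10.10.10.1</ip><netmask>255.255.255.0</netmask></address>
        </ipv4>
      </interface>
    </interfaces>
  </data>
</rpc-reply>"""

# Prefix names are arbitrary; only the URIs have to match the document.
NS = {
    "ifc": "urn:ietf:params:xml:ns:yang:ietf-interfaces",
    "ip": "urn:ietf:params:xml:ns:yang:ietf-ip",
}


def interface_addresses(root):
    """Return (name, ip, netmask) for every interface element under root."""
    rows = []
    for iface in root.findall(".//ifc:interface", NS):
        name = iface.findtext("ifc:name", namespaces=NS)
        ip = iface.findtext("ip:ipv4/ip:address/ip:ip", namespaces=NS)
        mask = iface.findtext("ip:ipv4/ip:address/ip:netmask", namespaces=NS)
        rows.append((name, ip, mask))
    return rows


rows = interface_addresses(ET.fromstring(XML))
for row in rows:
    print(", ".join(str(field) for field in row))
```

Unlike positional indexing, this keeps working if the order of the child elements changes; findtext returns None for an interface that has no IPv4 address.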

Fetch xml tag values recursively using ElementTree

I have an XML document of this type:
<SCHOOL>
<GROUP name="GetStudInfo">
<DATA>
<NAME type="char">Sahil Jha</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Rashmi Kaur</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Palak Bisht</NAME>
<STD>11th</STD>
</DATA>
</GROUP>
</SCHOOL>
I need to fetch the values of NAME and STD.
I tried this:
e = ET.ElementTree(ET.fromstring(getunitinfo_str))
for elt in e.iter():
    print("{} {}".format(elt.tag, elt.text))
But this also prints the other elements:
Output:
SCHOOL
GROUP
DATA
NAME Sahil Jha
STD 11th
DATA
NAME Rashmi Kaur
STD 11th
DATA
NAME Palak Bisht
STD 11th
{}
Expected output:
{'Sahil Jha':'11th', 'Rashmi Kaur':'11th', 'Palak Bisht':'11th'}
The result should map each NAME to its STD. Where am I going wrong?
As mentioned by @furas, you can use XPath to find all DATA elements and then find the
NAME and STD elements inside each one:
import xml.etree.ElementTree as ET
xml = '''<SCHOOL>
<GROUP name="GetStudInfo">
<DATA>
<NAME type="char">Sahil Jha</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Rashmi Kaur</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Palak Bisht</NAME>
<STD>11th</STD>
</DATA>
</GROUP>
</SCHOOL>'''
e = ET.fromstring(xml)
# DATA is nested inside GROUP, so search the whole tree with './/DATA'
for data_tag in e.findall('.//DATA'):
    name = data_tag.find('NAME')
    std = data_tag.find('STD')
    print("{} {}".format(name.text, std.text))
Or you can use a dict comprehension to get the dictionary you want:
my_dict = {
    data_tag.find('NAME').text: data_tag.find('STD').text
    for data_tag in e.findall('.//DATA')
}
print(my_dict)
You need more than just print(): you need an if/else check on elt.tag to pick out only NAME and STD.
Because NAME and STD are different tags, you have to remember NAME in a variable and use it when you reach STD:
name = None # default value at start
for elt in e.iter():
    if elt.tag == 'NAME':
        name = elt # remember element
    if elt.tag == 'STD':
        print("{}:{}".format(name.text, elt.text))
Or you could use XPath as in @qouify's answer.
Minimal working code
getunitinfo_str = '''
<SCHOOL>
<GROUP name="GetStudInfo">
<DATA>
<NAME type="char">Sahil Jha</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Rashmi Kaur</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Palak Bisht</NAME>
<STD>11th</STD>
</DATA>
</GROUP>
</SCHOOL>
'''
import xml.etree.ElementTree as ET
e = ET.ElementTree(ET.fromstring(getunitinfo_str))
name = None # to remember the element
for elt in e.iter():
    if elt.tag == 'NAME':
        name = elt
    if elt.tag == 'STD':
        print("{}:{}".format(name.text, elt.text))
A one-liner version:
import xml.etree.ElementTree as ET
xml = '''<SCHOOL>
<GROUP name="GetStudInfo">
<DATA>
<NAME type="char">Sahil Jha</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Rashmi Kaur</NAME>
<STD>116th</STD>
</DATA>
<DATA>
<NAME type="char">Palak Bisht</NAME>
<STD>17th</STD>
</DATA>
</GROUP>
</SCHOOL>'''
root = ET.fromstring(xml)
data = {x.find("NAME").text: x.find("STD").text for x in root.findall('.//DATA')}
print(data)
output
{'Sahil Jha': '11th', 'Rashmi Kaur': '116th', 'Palak Bisht': '17th'}

How to add space before and after CDATA in XML file

I want to create a function that modifies XML content without changing the formatting. I managed to change the text, but not without also changing the formatting of the XML.
Specifically, I want to keep a space before and after the CDATA section in the XML file.
Default XML file:
<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row> <![CDATA[001 001 001]]> </Row>
</Data>
</Device>
</Map>
</Maps>
And I am getting this result:
<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row><![CDATA[001 001 099]]></Row>
</Data>
</Device>
</Map>
</Maps>
However, I want the new xml to be like this:
<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row> <![CDATA[001 001 099]]> </Row>
</Data>
</Device>
</Map>
</Maps>
Here is my code:
from lxml import etree as ET
def xml_new(f,fpath,newtext,xmlrow):
    xmlrow = 19
    parser = ET.XMLParser(strip_cdata=False)
    tree = ET.parse(f, parser)
    root = tree.getroot()
    for child in root:
        value = child[0][2][xmlrow].text
        text = ET.CDATA("001 001 099")
        child[0][2][xmlrow] = ET.Element('Row')
        child[0][2][xmlrow].text = text
        child[0][2][xmlrow].tail = "\n"
    ET.register_namespace('A', "http://www.semi.org")
    tree.write(fpath,encoding='utf-8',xml_declaration=True)
    return value
Can anyone help me with this? Thanks in advance!
I don't quite understand what you want to do. Here's an example for you. I don't know if it can meet your needs.
from simplified_scrapy import SimplifiedDoc,req,utils
html ='''<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row> <![CDATA[001 001 001]]> </Row>
</Data>
</Device>
</Map>
</Maps>'''
doc = SimplifiedDoc(html)
row = doc.Data.Row # Get the node you want to modify.
row.setContent(" "+row.html+" ") # Modify the node content.
print (doc.html)
Result:
<?xml version="1.0" encoding="utf-8"?>
<Maps xmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice />
<Bin>
<Bin Bin="001" />
</Bin>
<Data>
<Row> <![CDATA[001 001 001]]> </Row>
</Data>
</Device>
</Map>
</Maps>
Thanks for all your help. I have found another way to achieve the result I want.
This is the code:
# `file` and `newfile` are the input and output paths, assumed to be defined earlier
# what you want to change
replaceby = '020]]> </Row>\n'
# row you want to change
row = 1
# col you want to change based on list
col = 3
with open(file, 'r') as src:
    line = src.readlines()
i = 0
editedXML = []
for l in line:
    if 'cdata' in l.lower():
        i = i + 1
        if i == row:
            oldVal = l.split(' ')
            newVal = []
            for index, old in enumerate(oldVal):
                if index == col:
                    newVal.append(replaceby)
                else:
                    newVal.append(old)
            editedXML.append(' '.join(newVal))
        else:
            editedXML.append(l)
    else:
        editedXML.append(l)
with open(newfile, 'w') as dst:
    dst.write(''.join(editedXML))
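The same text-based idea can be sketched with a regular expression, which avoids splitting the whole line and leaves everything outside the CDATA payload (including the surrounding spaces) byte-for-byte unchanged. The function name replace_cdata_token and its parameters are hypothetical; note that col here counts tokens inside the CDATA payload, not pieces of the full line as in the code above.

```python
import re

# Captures the opening marker, the payload, and the closing marker separately.
CDATA_RE = re.compile(r"(<!\[CDATA\[)(.*?)(\]\]>)", re.DOTALL)


def replace_cdata_token(xml_text, row, col, new_token):
    """Replace the space-separated token `col` (0-based) inside the
    `row`-th (1-based) CDATA section; all other text is left as-is."""
    count = 0

    def _sub(match):
        nonlocal count
        count += 1
        if count != row:
            return match.group(0)  # not the requested section: keep it verbatim
        tokens = match.group(2).split(" ")
        tokens[col] = new_token
        return match.group(1) + " ".join(tokens) + match.group(3)

    return CDATA_RE.sub(_sub, xml_text)


sample = "<Data>\n<Row> <![CDATA[001 001 001]]> </Row>\n</Data>\n"
updated = replace_cdata_token(sample, row=1, col=2, new_token="099")
print(updated)
```

This treats the file as plain text, so it assumes CDATA markers never appear inside comments or attribute values.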

Sorting XML tags by child elements Python

I have a number of 'root' tags with children 'name'. I want to sort the 'root' blocks, ordered alphabetically by the 'name' element. Have tried lxml / etree / minidom but can't get it working...
I can't get it to parse the value inside the tags, and then sort the parent root tags.
<?xml version='1.0' encoding='UTF-8'?>
<roots>
<root>
<path>//1.1.1.100/Alex</path>
<name>Alex Space</name>
</root>
<root>
<path>//1.1.1.101/Steve</path>
<name>Steve Space</name>
</root>
<root>
<path>//1.1.1.150/Bethany</path>
<name>Bethanys</name>
</root>
</roots>
Here is what I have tried:
import xml.etree.ElementTree as ET
def sortchildrenby(parent, child):
    parent[:] = sorted(parent, key=lambda child: child)
tree = ET.parse('data.xml')
root = tree.getroot()
sortchildrenby(root, 'name')
for child in root:
    sortchildrenby(child, 'name')
tree.write('output.xml')
If you want to put the name nodes first:
x = """
<roots>
<root>
<path>//1.1.1.100/Alex</path>
<name>Alex Space</name>
</root>
<root>
<path>//1.1.1.101/Steve</path>
<name>Bethanys</name>
</root>
<root>
<path>//1.1.1.150/Bethany</path>
<name>Steve Space</name>
</root>
</roots>"""
import lxml.etree as et
tree = et.fromstring(x)
for r in tree.iter("root"):
    r[:] = sorted(r, key=lambda ch: -(ch.tag == "name"))
print(et.tostring(tree).decode("utf-8"))
Which would give you:
<roots>
<root>
<name>Alex Space</name>
<path>//1.1.1.100/Alex</path>
</root>
<root>
<name>Bethanys</name>
<path>//1.1.1.101/Steve</path>
</root>
<root>
<name>Steve Space</name>
<path>//1.1.1.150/Bethany</path>
</root>
</roots>
But there is no need to sort if you just want the name first; you can simply remove the name element and reinsert it at index 0:
import lxml.etree as et
tree = et.fromstring(x)
for r in tree.iter("root"):
    ch = r.find("name")
    r.remove(ch)
    r.insert(0, ch)
print(et.tostring(tree).decode("utf-8"))
If the root nodes are not in sorted order and you want to rearrange them alphabetically by name:
x = """
<roots>
<root>
<path>//1.1.1.100/Alex</path>
<name>Alex Space</name>
</root>
<root>
<path>//1.1.1.101/Steve</path>
<name>Steve Space</name>
</root>
<root>
<path>//1.1.1.150/Bethany</path>
<name>Bethanys</name>
</root>
</roots>"""
import lxml.etree as et
tree = et.fromstring(x)
tree[:] = sorted(tree, key=lambda ch: ch.xpath("name/text()"))
print(et.tostring(tree).decode("utf-8"))
Which would give you:
<roots>
<root>
<path>//1.1.1.100/Alex</path>
<name>Alex Space</name>
</root>
<root>
<path>//1.1.1.150/Bethany</path>
<name>Bethanys</name>
</root>
<root>
<path>//1.1.1.101/Steve</path>
<name>Steve Space</name>
</root>
</roots>
You can also combine this with either of the first two approaches to rearrange each root node so that name comes first.
Try this:
import xml.etree.ElementTree as ET
xml="<?xml version='1.0' encoding='UTF-8'?><roots><root><path>//1.1.1.100/Alex</path><name>Alex Space</name></root><root><path>//1.1.1.101/Steve</path><name>Steve Space</name></root><root><path>//1.1.1.150/Bethany</path><name>Bethanys</name></root></roots>"
oldxml = ET.fromstring(xml)
names = []
for rootobj in oldxml.findall('root'):
    names.append(rootobj.find('name').text)
newxml = ET.Element('roots')
for name in sorted(names):
    for rootobj in oldxml.findall('root'):
        if name == rootobj.find('name').text:
            newxml.append(rootobj)
ET.dump(oldxml)
ET.dump(newxml)
I'm reading from a variable and dumping the result to the screen.
You can change it to read from a file and write to a file as needed.
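For completeness, the slice-assignment approach also works with the standard library alone, which is what the question started from. A minimal sketch, with the question's document inlined as a string:

```python
import xml.etree.ElementTree as ET

XML = """<roots>
<root><path>//1.1.1.100/Alex</path><name>Alex Space</name></root>
<root><path>//1.1.1.101/Steve</path><name>Steve Space</name></root>
<root><path>//1.1.1.150/Bethany</path><name>Bethanys</name></root>
</roots>"""

roots = ET.fromstring(XML)

# Reorder the <root> children in place, keyed on the text of their <name> child.
# default="" keeps the sort from failing on a <root> that has no <name>.
roots[:] = sorted(roots, key=lambda r: r.findtext("name", default=""))

names = [r.findtext("name") for r in roots]
print(names)
```

The original attempt sorted with key=lambda child: child, which compares Element objects rather than the text of the name child; the key function above is the only real change.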
