python xml remove grandchildren or grandgrandchildren

python xml remove grandchildren or grandgrandchildren - python

I've been googling for removing grandchildren from an xml file. However, I've found no perfect solution.
Here's my case:
<tree>
<category title="Item 1">item 1 text
<subitem title="subitem1">subitem1 text</subitem>
<subitem title="subitem2">subitem2 text</subitem>
</category>
<category title="Item 2">item 2 text
<subitem title="subitem21">subitem21 text</subitem>
<subitem title="subitem22">subitem22 text</subitem>
<subsubitem title="subsubitem211">subsubitem211 text</subsubitem>
</category>
</tree>
In some cases, I want to remove subitems. In other cases, I want to remove subsubitem. I know I can do like this in current given content:
import xml.etree.ElementTree as ET
root = ET.fromstring(given_content)
# case 1
for item in root.getiterator():
for subitem in item:
item.remove(subitem)
# case 2
for item in root.getiterator():
for subitem in item:
for subsubitem in subitem:
subitem.remove(subsubitem)
I can write in this style only when I know the depth of the target node. If I only know the tag name of node I want to remove, how should I implement it?
pseudo-code:
import xml.etree.ElementTree as ET
for item in root.getiterator():
if item.tag == 'subsubitem' or item.tag == 'subitem':
# remove item
If I do root.remove(item), it will certainly return an error because item is not a direct child of root.
Edited:
I cannot install any 3rd-party-lib, so I have to solve this with xml.

I finally got this work for me only on xml lib by writing a recursive function.
def recursive_xml(root):
if root.getchildren() is not None:
for child in root.getchildren():
if child.tag == 'subitem' or child.tag == 'subsubitem':
root.remove(child)
else:
recursive_xml(child)
By doing so, the function will iterate every node in ET and remove my target nodes.
test_xml = r'''
<test>
<test1>
<test2>
<test3>
</test3>
<subsubitem>
</subsubitem>
</test2>
<subitem>
</subitem>
<nothing_matters>
</nothing_matters>
</test1>
</test>
'''
root = ET.fromstring(test_xml)
recursive_xml(root)
Hope this helps someone has restricted requirements like me....

To remove instances of subsubitem or subitem, no matter what their depth, consider the following example (with the caveat that it uses lxml.etree rather than upstream ElementTree):
import lxml.etree as etree
el = etree.fromstring('<root><item><subitem><subsubitem/></subitem></item></root>')
for child in el.xpath('.//subsubitem | .//subitem'):
child.getparent().remove(child)

Related

Check if node exists

I am working with IronPython 2.7 in Dynamo. I need to check if a node exists. If so, the text in the node should be written to a list. If no, then False should be written to the list.
I get no error. But, even if a node exists in the list, it doesn't write the text in the list. False is correctly written into the list.
Simple Example:
<note>
<note2>
<yolo>
<to>
<type>
<game>
<name>Jani</name>
<lvl>111111</lvl>
<fun>2222222</fun>
</game>
</type>
</to>
<mo>
<type>
<game>
<name>Bani</name>
<fun>44444444</fun>
</game>
</type>
</mo>
</yolo>
</note2>
</note>
So, the node lvl is only in the first node game. I expect the resulting list like list[11111, false].
Here is my code:
import clr
import sys
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import *
sys.path.append("C:\Program Files (x86)\IronPython 2.7\Lib")
import xml.etree.ElementTree as ET
xml="note.xml"
main_xpath=".//game"
searchforxpath =".//lvl"
list=[]
tree = ET.parse(xml)
root = tree.getroot()
main_match = root.findall(main_xpath)
for elem in main_match:
if elem.find(searchforxpath) is not None:
list.append(elem.text)
else:
list.append(False)
print list
Why is the list empty where the string should be? I get list[ ,false].

You need to use the text of the match from the elem.find, not the original elem:
for elem in main_match:
subelem = elem.find(searchforxpath)
if subelem != None:
list.append(subelem.text)
else:
list.append(False)

Adding a parent tag to a nested structure with ElementTree (Python)

I have the following structure
<root>
<data>
<config>
CONFIGURATION
<config>
</data>
</root>
With Python's ElementTree module I want to add a parent element to <config> tag as
<root>
<data>
<type>
<config>
CONFIGURATION
<config>
</type>
</data>
</root>
Also the xml file might have other config tags elsewhere but I'm only interested in the ones appearing under data tag.

This boils down to ~3 steps:
get the elements that match your criteria (tag == x, parent tag == y)
remove that element from the parent, putting a new child in that place
add the former child to the new child.
For the first step, we can use this answer. Since we know we'll need the parent later, let's keep that too in our search.
def find_elements(tree, child_tag, parent_tag):
parent_map = dict((c, p) for p in tree.iter() for c in p)
for el in tree.iter(child_tag):
parent = parent_map[el]
if parent.tag == parent_tag:
yield el, parent
steps two and three are pretty related, we can do them together.
def insert_new_els(tree, child_tag, parent_tag, new_node_tag):
to_replace = list(find_elements(tree, child_tag, parent_tag))
for child, parent in to_replace:
ix = list(parent).index(child)
new_node = ET.Element(new_node_tag)
parent.insert(ix, new_node)
parent.remove(child)
new_node.append(child)
Your tree will be modified in place.
Now usage is simply:
tree = ET.parse('some_file.xml')
insert_new_els(tree, 'config', 'data', 'type')
tree.write('some_file_processed.xml')
untested

Removing parent element and all subelements from XML

Given an XML file with the following structure:
<Root>
<Stuff></Stuff>
<MoreStuff></MoreStuff>
<Targets>
<Target>
<ID>12345</ID>
<Type>Ground</Type>
<Size>Large</Size>
</Target>
<Target>
...
</Target>
</Targets>
</Root>
I'm trying to loop through each child under the <Targets> element, check each <ID> for a specific value, and if the value is found, then I want to delete the entire <Target> entry. I've been using the ElementTree Python library with little success. Here's what I have so far:
import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
iterator = root.getiterator('Target')
for item in iterator:
old = item.find('ID')
text = old.text
if '12345' in text:
item.remove(old)
tree.write('out.xml')
The problem I'm having with this approach is that only the <ID> sub element is removed, however I need the entire <Target> element and all of its child elements removed. Can anyone help! Thanks.

Unfortunately, element tree elements don't know who their parents are. There is a workaround -- You can build the mapping yourself:
tree = ET.parse('file.xml')
root = tree.getroot()
parent_map = dict((c, p) for p in tree.getiterator() for c in p)
# list so that we don't mess up the order of iteration when removing items.
iterator = list(root.getiterator('Target'))
for item in iterator:
old = item.find('ID')
text = old.text
if '12345' in text:
parent_map[item].remove(item)
continue
tree.write('out.xml')
Untested

You need to keep a reference to the Targets element so that you can remove its children, so start your iteration from there. Grab each Target, check your condition and remove what you don't like.
#!/usr/bin/env python
import xml.etree.ElementTree as ET
xmlstr="""<Root>
<Stuff></Stuff>
<MoreStuff></MoreStuff>
<Targets>
<Target>
<ID>12345</ID>
<Type>Ground</Type>
<Size>Large</Size>
</Target>
<Target>
...
</Target>
</Targets>
</Root>"""
root = ET.fromstring(xmlstr)
targets = root.find('Targets')
for target in targets.findall('Target'):
_id = target.find('ID')
if _id is not None and '12345' in _id.text:
targets.remove(target)
print ET.tostring(root)

How to copy multiple XML nodes to another file in Python

Bare in mind I am very new to Python. I'm trying to copy few XML nodes from sample1.xml to out.xml if it doesn't exist in sample2.xml.
this is how far I got before I'm stuck
import xml.etree.ElementTree as ET
tree = ET.ElementTree(file='sample1.xml')
addtree = ET.ElementTree(file='sample2.xml')
root = tree.getroot()
addroot = addtree.getroot()
for adel in addroot.findall('.//cars/car'):
for el in root.findall('cars/car'):
with open('out.xml', 'w+') as f:
f.write("BEFORE\n")
f.write(el.tag)
f.write("\n")
f.write(adel.tag)
f.write("\n")
f.write("\n")
f.write("AFTER\n")
el = adel
f.write(el.tag)
f.write("\n")
f.write(adel.tag)
I have no idea what I'm missing, but it's only copying the actual "tag" itself.
outputs this:
BEFORE
car
car
AFTER
car
car
So I'm missing the children nodes, and also the <, >, </, > tags. Expected result is below.
sample1.xml:
<cars>
<car>
<use-car>0</use-car>
<use-gas>0</use-gas>
<car-name />
<car-key />
<car-location>hawaii</car-location>
<car-port>5</car-port>
</car>
</cars>
sample2.xml:
<cars>
<old>
1
</old>
<new>
8
</new>
<car />
</cars>
expected result in out.xml (final product)
<cars>
<old>
1
</old>
<new>
8
</old>
<car>
<use-car>0</use-car>
<use-gas>0</use-gas>
<car-name />
<car-key />
<car-location>hawaii</car-location>
<car-port>5</car-port>
</car>
</cars>
All the other nodes old and new must remain untouched. I'm just trying to replace <car /> with all its children and grandchildren (if existed) nodes.

First, a couple of trivial issues with your XML:
sample1: The closing cars tag is missing a /
sample2: The closing new tag incorrectly reads old, should read new
Second, a disclaimer: my solution below has its limitations - in particular, it wouldn't handle repeatedly substituting the car node from sample1 into multiple spots in sample2. But it works fine for the sample files you've supplied.
Third: thanks to the top couple of answers on access ElementTree node parent node - they informed the implementation of get_node_parent_info below.
Finally, the code:
import xml.etree.ElementTree as ET
def find_child(node, with_name):
"""Recursively find node with given name"""
for element in list(node):
if element.tag == with_name:
return element
elif list(element):
sub_result = find_child(element, with_name)
if sub_result is not None:
return sub_result
return None
def replace_node(from_tree, to_tree, node_name):
"""
Replace node with given node_name in to_tree with
the same-named node from the from_tree
"""
# Find nodes of given name ('car' in the example) in each tree
from_node = find_child(from_tree.getroot(), node_name)
to_node = find_child(to_tree.getroot(), node_name)
# Find where to substitute the from_node into the to_tree
to_parent, to_index = get_node_parent_info(to_tree, to_node)
# Replace to_node with from_node
to_parent.remove(to_node)
to_parent.insert(to_index, from_node)
def get_node_parent_info(tree, node):
"""
Return tuple of (parent, index) where:
parent = node's parent within tree
index = index of node under parent
"""
parent_map = {c:p for p in tree.iter() for c in p}
parent = parent_map[node]
return parent, list(parent).index(node)
from_tree = ET.ElementTree(file='sample1.xml')
to_tree = ET.ElementTree(file='sample2.xml')
replace_node(from_tree, to_tree, 'car')
# ET.dump(to_tree)
to_tree.write('output.xml')
UPDATE: It was recently brought to my attention that the implementation of find_child() in the solution I originally supplied would fail if the "child" in question was not in the first branch of the XML tree that was traversed. I've updated the implementation above to rectify this.

Parsing subelements with elementTree

I have code in a XML file, which I parse using et.parse:
<VIAFCluster xmlns="http://viaf.org/viaf/terms#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:void="http://rdfs.org/ns/void#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
<viafID>15</viafID>
<nameType>Personal</nameType>
</VIAFCluster>
<mainHeadings>
<data>
<text>
Gondrin de Pardaillan de Montespan, Louis-Antoine de, 1665-1736
</text>
</data>
</mainHeadings>
and I want to parse it as:
[15, "Personal", "Gondrin etc."]
I can't seem to print any of the string information with:
import xml.etree.ElementTree as ET
tree = ET.parse('/Users/user/Documents/work/oneline.xml')
root = tree.getroot()
for node in tree.iter():
name = node.find('nameType')
print(name)
as it appears as 'None' ... what am I doing wrong?

I'm still not sure exactly what you are wanting to do, but hopefully if you run the code below, it will help get you on your way. Using the getiterator function to iter through the elements will let you see what's going on. You can pick up the stuff you want as you come to them:
import xml.etree.ElementTree as et
xml = '''
<VIAFCluster xmlns="http://viaf.org/viaf/terms#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:void="http://rdfs.org/ns/void#"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<viafID>15</viafID>
<nameType>Personal</nameType>
<mainHeadings>
<data>
<text>
Gondrin de Pardaillan de Montespan, Louis-Antoine de, 1665-1736
</text>
</data>
</mainHeadings>
</VIAFCluster>
'''
tree = et.fromstring(xml)
lst = []
for i in tree.getiterator():
t = i.text.strip()
if t:
lst.append(t)
print i.tag
print t
You will end up with a list as you wanted. I had to clean up your xml because you had more than one top level element, which is a no-no. Maybe that was your problem all along.
good luck, Mike

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

python xml remove grandchildren or grandgrandchildren - python

Related

Check if node exists

Adding a parent tag to a nested structure with ElementTree (Python)

Removing parent element and all subelements from XML

How to copy multiple XML nodes to another file in Python

Parsing subelements with elementTree

Categories

Resources