Given an XML file with the following structure:
<Root>
<Stuff></Stuff>
<MoreStuff></MoreStuff>
<Targets>
<Target>
<ID>12345</ID>
<Type>Ground</Type>
<Size>Large</Size>
</Target>
<Target>
...
</Target>
</Targets>
</Root>
I'm trying to loop through each child under the <Targets> element, check each <ID> for a specific value, and if the value is found, then I want to delete the entire <Target> entry. I've been using the ElementTree Python library with little success. Here's what I have so far:
import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
iterator = root.getiterator('Target')
for item in iterator:
    old = item.find('ID')
    text = old.text
    if '12345' in text:
        item.remove(old)
tree.write('out.xml')
The problem with this approach is that only the <ID> sub-element is removed, but I need the entire <Target> element and all of its child elements removed. Can anyone help? Thanks.
Unfortunately, ElementTree elements don't know who their parents are. There is a workaround, though: build the child-to-parent mapping yourself.
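import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
# Map every child element to its parent.
# (getiterator() was removed in Python 3.9; iter() is the replacement.)
parent_map = dict((c, p) for p in tree.iter() for c in p)
# Build a list first so that removing items doesn't disturb the iteration.
iterator = list(root.iter('Target'))
for item in iterator:
    old = item.find('ID')
    # Skip targets that have no <ID> child or an empty one.
    if old is not None and old.text and '12345' in old.text:
        parent_map[item].remove(item)
tree.write('out.xml')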
Untested
You need to keep a reference to the Targets element so that you can remove its children, so start your iteration from there. Grab each Target, check your condition and remove what you don't like.
#!/usr/bin/env python
import xml.etree.ElementTree as ET
xmlstr="""<Root>
<Stuff></Stuff>
<MoreStuff></MoreStuff>
<Targets>
<Target>
<ID>12345</ID>
<Type>Ground</Type>
<Size>Large</Size>
</Target>
<Target>
...
</Target>
</Targets>
</Root>"""
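root = ET.fromstring(xmlstr)
targets = root.find('Targets')
# findall() returns a list, so removing children inside the loop is safe.
for target in targets.findall('Target'):
    _id = target.find('ID')
    if _id is not None and _id.text and '12345' in _id.text:
        targets.remove(target)
print(ET.tostring(root, encoding='unicode'))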
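If an exact match on the ID is good enough (rather than a substring test), ElementTree's limited XPath support can do the filtering for you. This is only a minimal sketch: the function name remove_targets_by_id is made up for illustration, and the second Target in the sample data is invented so there is something left over after the removal.

```python
import xml.etree.ElementTree as ET

# Sample data; the second <Target> is invented for this illustration.
XML = """<Root>
  <Targets>
    <Target><ID>12345</ID><Type>Ground</Type><Size>Large</Size></Target>
    <Target><ID>67890</ID><Type>Air</Type><Size>Small</Size></Target>
  </Targets>
</Root>"""


def remove_targets_by_id(root, target_id):
    """Remove every <Target> whose <ID> text equals target_id; return the count."""
    removed = 0
    # Snapshot the <Targets> containers before mutating the tree.
    for targets in list(root.iter('Targets')):
        # The [ID='...'] predicate matches on the complete text of the <ID> child.
        # findall() returns a list, so removing inside the loop is safe.
        for target in targets.findall("Target[ID='%s']" % target_id):
            targets.remove(target)
            removed += 1
    return removed


root = ET.fromstring(XML)
removed_count = remove_targets_by_id(root, '12345')
remaining_ids = [t.findtext('ID') for t in root.iter('Target')]
print(removed_count, remaining_ids)
```

Note that the predicate compares the whole text of <ID>, so an ID such as 123456 would not be touched, unlike with the `in` test.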
I've been searching for a way to remove grandchildren from an XML file, but I haven't found a good solution.
Here's my case:
<tree>
<category title="Item 1">item 1 text
<subitem title="subitem1">subitem1 text</subitem>
<subitem title="subitem2">subitem2 text</subitem>
</category>
<category title="Item 2">item 2 text
<subitem title="subitem21">subitem21 text</subitem>
<subitem title="subitem22">subitem22 text</subitem>
<subsubitem title="subsubitem211">subsubitem211 text</subsubitem>
</category>
</tree>
In some cases I want to remove the subitem elements; in other cases, the subsubitem elements. I know I can do it like this for the content given above:
import xml.etree.ElementTree as ET
root = ET.fromstring(given_content)
# case 1
for item in root.getiterator():
    for subitem in item:
        item.remove(subitem)
# case 2
for item in root.getiterator():
    for subitem in item:
        for subsubitem in subitem:
            subitem.remove(subsubitem)
I can write it this way only when I know the depth of the target node. If I only know the tag name of the node I want to remove, how should I implement it?
pseudo-code:
import xml.etree.ElementTree as ET
for item in root.getiterator():
    if item.tag == 'subsubitem' or item.tag == 'subitem':
        # remove item
If I do root.remove(item), it will certainly return an error because item is not a direct child of root.
Edited:
I cannot install any 3rd-party-lib, so I have to solve this with xml.
I finally got this working with only the standard xml library by writing a recursive function.
def recursive_xml(root):
    # Iterate over a copy of the children so that removal doesn't skip siblings.
    # (Element.getchildren() was removed in Python 3.9; list(root) replaces it.)
    for child in list(root):
        if child.tag == 'subitem' or child.tag == 'subsubitem':
            root.remove(child)
        else:
            recursive_xml(child)
This way, the function visits every node in the tree and removes the target nodes.
test_xml = r'''
<test>
<test1>
<test2>
<test3>
</test3>
<subsubitem>
</subsubitem>
</test2>
<subitem>
</subitem>
<nothing_matters>
</nothing_matters>
</test1>
</test>
'''
root = ET.fromstring(test_xml)
recursive_xml(root)
Hope this helps someone who has restricted requirements like mine.
To remove instances of subsubitem or subitem, no matter what their depth, consider the following example (with the caveat that it uses lxml.etree rather than upstream ElementTree):
import lxml.etree as etree
el = etree.fromstring('<root><item><subitem><subsubitem/></subitem></item></root>')
for child in el.xpath('.//subsubitem | .//subitem'):
    child.getparent().remove(child)
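For readers who, like the asker, cannot install lxml, roughly the same effect can be had with the standard library by checking each element's direct children. This is a sketch rather than a drop-in equivalent of the XPath version; the helper name remove_by_tag is hypothetical, and the sample data is the XML from the question.

```python
import xml.etree.ElementTree as ET

GIVEN_CONTENT = """<tree>
  <category title="Item 1">item 1 text
    <subitem title="subitem1">subitem1 text</subitem>
    <subitem title="subitem2">subitem2 text</subitem>
  </category>
  <category title="Item 2">item 2 text
    <subitem title="subitem21">subitem21 text</subitem>
    <subitem title="subitem22">subitem22 text</subitem>
    <subsubitem title="subsubitem211">subsubitem211 text</subsubitem>
  </category>
</tree>"""


def remove_by_tag(root, tags):
    """Remove every descendant of root whose tag is in tags, at any depth."""
    # Snapshot the traversal first so the tree is not mutated while iterating.
    for parent in list(root.iter()):
        # list(parent) copies the child list, so remove() cannot skip siblings.
        for child in list(parent):
            if child.tag in tags:
                parent.remove(child)


root = ET.fromstring(GIVEN_CONTENT)
remove_by_tag(root, {'subitem', 'subsubitem'})
remaining_tags = [el.tag for el in root.iter()]
print(remaining_tags)
```

Because each parent removes its own children, no parent lookup is needed. Removing an element also drops any tail text that followed it.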
I am working with IronPython 2.7 in Dynamo. I need to check if a node exists. If so, the text in the node should be written to a list. If not, then False should be written to the list.
I get no error, but even when the node exists, its text is not written to the list. False is written to the list correctly.
Simple Example:
<note>
<note2>
<yolo>
<to>
<type>
<game>
<name>Jani</name>
<lvl>111111</lvl>
<fun>2222222</fun>
</game>
</type>
</to>
<mo>
<type>
<game>
<name>Bani</name>
<fun>44444444</fun>
</game>
</type>
</mo>
</yolo>
</note2>
</note>
So the lvl node exists only in the first game node. I expect the resulting list to look like [111111, False].
Here is my code:
import clr
import sys
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import *
sys.path.append(r"C:\Program Files (x86)\IronPython 2.7\Lib")
import xml.etree.ElementTree as ET
xml="note.xml"
main_xpath=".//game"
searchforxpath =".//lvl"
list=[]
tree = ET.parse(xml)
root = tree.getroot()
main_match = root.findall(main_xpath)
for elem in main_match:
    if elem.find(searchforxpath) is not None:
        list.append(elem.text)
    else:
        list.append(False)
print list
Why is the list empty where the string should be? I get list[ ,false].
You need to use the text of the element returned by elem.find, not the text of the original elem:
for elem in main_match:
    subelem = elem.find(searchforxpath)
    if subelem is not None:
        list.append(subelem.text)
    else:
        list.append(False)
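The same lookup can be written a little more compactly with findtext, which returns a default when nothing matches. This is a sketch run against an in-memory copy of the sample XML under standard CPython rather than IronPython in Dynamo; the function name collect_text is made up for illustration.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<note><note2><yolo>
  <to><type><game><name>Jani</name><lvl>111111</lvl><fun>2222222</fun></game></type></to>
  <mo><type><game><name>Bani</name><fun>44444444</fun></game></type></mo>
</yolo></note2></note>"""


def collect_text(root, main_xpath, search_xpath):
    """For each element matching main_xpath, collect the text of the first
    element matching search_xpath below it, or False if there is none."""
    results = []
    for elem in root.findall(main_xpath):
        # findtext() returns the default when no matching element is found.
        results.append(elem.findtext(search_xpath, default=False))
    return results


root = ET.fromstring(NOTE_XML)
results = collect_text(root, './/game', './/lvl')
print(results)  # ['111111', False]
```

The collected values are strings, so convert with int() if you need numbers. The sketch also avoids naming the variable list, which shadows the built-in type.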
Given the following xml:
<node a='1' b='1'>
<subnode x='25'/>
</node>
I would like to extract the tagname and all attributes for the first node, i.e., the verbatim code:
<node a='1' b='1'>
without the subnode.
For example in Python, tostring returns too much:
from lxml import etree
root = etree.fromstring("<node a='1' b='1'><subnode x='25'>some text</subnode></node>")
print(etree.tostring(root))
returns
b'<node a="1" b="1"><subnode x="25">some text</subnode></node>'
The following gives the desired result, but is much too verbose:
tag = root.tag
for att, val in root.attrib.items():
    tag += ' '+att+'="'+val+'"'
tag = '<'+tag+'>'
print(tag)
result:
<node a="1" b="1">
What is an easier (and guaranteed attribute order preserving) way of doing this?
You can remove all of the subnodes.
from lxml import etree
root = etree.fromstring("<node a='1' b='1'><subnode x='25'>some text</subnode></node>")
for subnode in root.xpath("//subnode"):
    subnode.getparent().remove(subnode)
etree.tostring(root)  # b'<node a="1" b="1"/>' on Python 3
Alternatively, you can use a simple regex on the serialized, unmodified element. Order is guaranteed.
import re
# encoding='unicode' makes tostring() return str rather than bytes,
# so that a str pattern can be used on Python 3.
res = re.search('<(.*?)>', etree.tostring(root, encoding='unicode'))
res.group(1)  # 'node a="1" b="1"'
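A third option is to build the start tag from .tag and .attrib directly, letting the standard library handle attribute quoting and escaping. This is a sketch using xml.etree.ElementTree so that it runs without lxml; it only reads .tag and .attrib, so it should behave the same on an lxml element. The function name start_tag is made up, and it assumes a Python version where dicts keep insertion order and a tag without a namespace (a namespaced tag would show up as {uri}name).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def start_tag(element):
    """Return only the opening tag of element, with its attributes."""
    # quoteattr() adds the surrounding quotes and escapes &, < and > as needed.
    attrs = ''.join(' %s=%s' % (name, quoteattr(value))
                    for name, value in element.attrib.items())
    return '<%s%s>' % (element.tag, attrs)


root = ET.fromstring("<node a='1' b='1'><subnode x='25'>some text</subnode></node>")
tag_text = start_tag(root)
print(tag_text)  # <node a="1" b="1">
```

Unlike the removal approach, this leaves the tree untouched.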
I have the following structure
<root>
<data>
<config>
CONFIGURATION
</config>
</data>
</root>
With Python's ElementTree module, I want to wrap the <config> tag in a new parent element, like this:
<root>
<data>
<type>
<config>
CONFIGURATION
</config>
</type>
</data>
</root>
Also, the XML file might have other config tags elsewhere, but I'm only interested in the ones appearing under the data tag.
This boils down to ~3 steps:
1. Get the elements that match your criteria (tag == x, parent tag == y).
2. Remove each such element from its parent, putting a new child in its place.
3. Add the former child to the new child.
For the first step, we can use this answer. Since we know we'll need the parent later, let's keep that too in our search.
def find_elements(tree, child_tag, parent_tag):
    parent_map = dict((c, p) for p in tree.iter() for c in p)
    for el in tree.iter(child_tag):
        parent = parent_map[el]
        if parent.tag == parent_tag:
            yield el, parent
Steps two and three are closely related, so we can do them together.
import xml.etree.ElementTree as ET
def insert_new_els(tree, child_tag, parent_tag, new_node_tag):
    # Collect the matches first so the tree isn't modified while iterating over it.
    to_replace = list(find_elements(tree, child_tag, parent_tag))
    for child, parent in to_replace:
        ix = list(parent).index(child)
        new_node = ET.Element(new_node_tag)
        parent.insert(ix, new_node)
        parent.remove(child)
        new_node.append(child)
Your tree will be modified in place.
Now usage is simply:
tree = ET.parse('some_file.xml')
insert_new_els(tree, 'config', 'data', 'type')
tree.write('some_file_processed.xml')
untested
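To check the idea without touching any files, here is a compact, self-contained sketch of the same three steps run against an in-memory document. The function name wrap_children and the extra <other> branch in the sample are invented for illustration. It also skips the root element, which has no entry in the parent map.

```python
import xml.etree.ElementTree as ET


def wrap_children(root, child_tag, parent_tag, wrapper_tag):
    """Wrap each child_tag element found directly under parent_tag in a new
    wrapper_tag element. Returns the number of elements wrapped."""
    parent_map = {c: p for p in root.iter() for c in p}
    # Collect matches first so the tree is not modified during traversal.
    matches = [(el, parent_map[el]) for el in root.iter(child_tag)
               if el in parent_map and parent_map[el].tag == parent_tag]
    for child, parent in matches:
        index = list(parent).index(child)
        wrapper = ET.Element(wrapper_tag)
        parent.remove(child)
        parent.insert(index, wrapper)  # the wrapper takes the child's position
        wrapper.append(child)
    return len(matches)


root = ET.fromstring(
    "<root><data><config>CONFIGURATION</config></data>"
    "<other><config>X</config></other></root>"
)
count = wrap_children(root, 'config', 'data', 'type')
result = ET.tostring(root, encoding='unicode')
print(count, result)
```

Only the config under data is wrapped; the one under other is left alone.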
I have the following in an XML file, which I parse using et.parse:
<VIAFCluster xmlns="http://viaf.org/viaf/terms#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:void="http://rdfs.org/ns/void#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
<viafID>15</viafID>
<nameType>Personal</nameType>
</VIAFCluster>
<mainHeadings>
<data>
<text>
Gondrin de Pardaillan de Montespan, Louis-Antoine de, 1665-1736
</text>
</data>
</mainHeadings>
and I want to parse it as:
[15, "Personal", "Gondrin etc."]
I can't seem to print any of the string information with:
import xml.etree.ElementTree as ET
tree = ET.parse('/Users/user/Documents/work/oneline.xml')
root = tree.getroot()
for node in tree.iter():
    name = node.find('nameType')
    print(name)
as it appears as 'None' ... what am I doing wrong?
I'm still not sure exactly what you want to do, but hopefully running the code below will help get you on your way. Iterating over every element in the tree lets you see what's going on, and you can pick up the values you want as you come to them:
import xml.etree.ElementTree as et
xml = '''
<VIAFCluster xmlns="http://viaf.org/viaf/terms#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:void="http://rdfs.org/ns/void#"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<viafID>15</viafID>
<nameType>Personal</nameType>
<mainHeadings>
<data>
<text>
Gondrin de Pardaillan de Montespan, Louis-Antoine de, 1665-1736
</text>
</data>
</mainHeadings>
</VIAFCluster>
'''
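tree = et.fromstring(xml)
lst = []
# Element.getiterator() was removed in Python 3.9; iter() walks the same elements.
for i in tree.iter():
    # text is None for elements with no text at all, so fall back to ''.
    t = (i.text or '').strip()
    if t:
        lst.append(t)
        print(i.tag)
        print(t)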
You will end up with a list as you wanted. I had to clean up your xml because you had more than one top level element, which is a no-no. Maybe that was your problem all along.
good luck, Mike
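A likely reason the original find('nameType') returned None is the default namespace declared on VIAFCluster: ElementTree stores such tags as {http://viaf.org/viaf/terms#}nameType, so an unqualified name doesn't match. Here is a minimal sketch of a namespace-aware lookup, assuming the cleaned-up single-root XML (trimmed to the one namespace that matters); the prefix v is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

XML = """<VIAFCluster xmlns="http://viaf.org/viaf/terms#">
  <viafID>15</viafID>
  <nameType>Personal</nameType>
  <mainHeadings>
    <data>
      <text>
        Gondrin de Pardaillan de Montespan, Louis-Antoine de, 1665-1736
      </text>
    </data>
  </mainHeadings>
</VIAFCluster>"""

# Map an arbitrary prefix to the document's default namespace URI.
NS = {'v': 'http://viaf.org/viaf/terms#'}

root = ET.fromstring(XML)
viaf_id = root.findtext('v:viafID', namespaces=NS)
name_type = root.findtext('v:nameType', namespaces=NS)
# './/' searches all descendants, not just direct children of the root.
heading = root.findtext('.//v:text', namespaces=NS)

record = [int(viaf_id), name_type, heading.strip()]
print(record)
```

This picks out the three values by name instead of relying on the order in which elements are visited.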