Check if node exists - python

I am working with IronPython 2.7 in Dynamo. I need to check whether a node exists. If it does, the text in the node should be written to a list. If not, False should be written to the list.
I get no error. But even when the node exists, its text is not written to the list. False is written to the list correctly.
Simple Example:
<note>
<note2>
<yolo>
<to>
<type>
<game>
<name>Jani</name>
<lvl>111111</lvl>
<fun>2222222</fun>
</game>
</type>
</to>
<mo>
<type>
<game>
<name>Bani</name>
<fun>44444444</fun>
</game>
</type>
</mo>
</yolo>
</note2>
</note>
So the node lvl exists only in the first game node. I expect the resulting list to look like [111111, False].
Here is my code:
import clr
import sys
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import *
sys.path.append("C:\Program Files (x86)\IronPython 2.7\Lib")
import xml.etree.ElementTree as ET
xml="note.xml"
main_xpath=".//game"
searchforxpath =".//lvl"
list=[]
tree = ET.parse(xml)
root = tree.getroot()
main_match = root.findall(main_xpath)
for elem in main_match:
    if elem.find(searchforxpath) is not None:
        list.append(elem.text)
    else:
        list.append(False)
print list
Why is the list empty where the string should be? I get list[ ,false].

You need to use the text of the match from the elem.find, not the original elem:
for elem in main_match:
    # keep the match itself so that its text can be read
    subelem = elem.find(searchforxpath)
    if subelem is not None:
        list.append(subelem.text)
    else:
        list.append(False)
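For reference, here is a minimal self-contained sketch of the same fix. It assumes a trimmed-down copy of the sample XML held in a string (the name `SAMPLE` is made up for the example) instead of the note.xml file. The original code returned whitespace because `elem.text` of a game node is only the text before its first child. The `is not None` test is deliberate: the ElementTree documentation advises against testing an element's truth value directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of the sample document from the question.
SAMPLE = """<note><yolo>
<to><game><name>Jani</name><lvl>111111</lvl></game></to>
<mo><game><name>Bani</name></game></mo>
</yolo></note>"""

root = ET.fromstring(SAMPLE)
result = []
for game in root.findall('.//game'):
    lvl = game.find('.//lvl')  # None when this game has no lvl node
    result.append(lvl.text if lvl is not None else False)

print(result)
```

Note that the text comes back as a string, so convert it with int() if a number is needed.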

Iterating through xml file

I am trying to get all surnames from an XML file, but when I use find, it throws an exception:
TypeError: 'NoneType' object is not iterable
This is my code:
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
for elem in root:
    for subelem in elem:
        for subsubelem in subelem.find('surname'):
            print(subsubelem.text)
When I remove the find('surname') from the code, it returns the text of every sub-subelement.
This is xml:
<?xml version="1.0" encoding="UTF-8"?>
<pp:card xmlns:pp="http://xmlns.page.com/path/subpath">
<pp:id>1</pp:id>
<pp:customers>
<pp:customer>
<pp:name>John</pp:name>
<pp:surname>Walker</pp:surname>
<pp:adress>
<pp:street>Walker street</pp:street>
<pp:number>1/1</pp:number>
<pp:state>England</pp:state>
</pp:adress>
<pp:created>2021-03-08Z</pp:created>
</pp:customer>
<pp:customer>
<pp:name>Michael</pp:name>
<pp:surname>Jordan</pp:surname>
<pp:adress>
<pp:street>Jordan street</pp:street>
<pp:number>28</pp:number>
<pp:state>USA</pp:state>
</pp:adress>
<pp:created>2021-03-09Z</pp:created>
</pp:customer>
</pp:customers>
</pp:card>
How should I fix it?
Not really a python person, but should the "find" statement include the "pp:" in its search, such as,
find('pp:surname')
Neither the opening nor closing tags actually match "surname".
Use the namespace when you call findall
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<pp:card xmlns:pp="http://xmlns.page.com/path/subpath">
<pp:id>1</pp:id>
<pp:customers>
<pp:customer>
<pp:name>John</pp:name>
<pp:surname>Walker</pp:surname>
<pp:adress>
<pp:street>Walker street</pp:street>
<pp:number>1/1</pp:number>
<pp:state>England</pp:state>
</pp:adress>
<pp:created>2021-03-08Z</pp:created>
</pp:customer>
<pp:customer>
<pp:name>Michael</pp:name>
<pp:surname>Jordan</pp:surname>
<pp:adress>
<pp:street>Jordan street</pp:street>
<pp:number>28</pp:number>
<pp:state>USA</pp:state>
</pp:adress>
<pp:created>2021-03-09Z</pp:created>
</pp:customer>
</pp:customers>
</pp:card>'''
ns = {'pp': 'http://xmlns.page.com/path/subpath'}
root = ET.fromstring(xml)
names = [sn.text for sn in root.findall('.//pp:surname', ns)]
print(names)
output
['Walker', 'Jordan']
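To make the cause of the original TypeError visible, here is a small sketch on a shortened copy of the document (the name `XML` and the trimmed content are assumptions for the example). An unqualified find('surname') matches nothing in a namespaced document and returns None, and looping over None is what raises the exception. As an alternative to passing a prefix map, the fully qualified `{uri}tag` form can be used directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of the document from the question.
XML = '''<pp:card xmlns:pp="http://xmlns.page.com/path/subpath">
<pp:customers>
<pp:customer><pp:surname>Walker</pp:surname></pp:customer>
<pp:customer><pp:surname>Jordan</pp:surname></pp:customer>
</pp:customers>
</pp:card>'''

root = ET.fromstring(XML)
customer = root[0][0]  # first pp:customer inside pp:customers

# The tag name without its namespace matches nothing, so find() gives None.
unqualified = customer.find('surname')

# The same tag written with its namespace URI in braces.
qualified = '{http://xmlns.page.com/path/subpath}surname'
surnames = [el.text for el in root.iter(qualified)]
print(unqualified, surnames)
```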

Reshape xml using python?

I have an XML file like this:
<data>
<B>Head1</B>
<I>Inter1</I>
<I>Inter2</I>
<I>Inter3</I>
<I>Inter4</I>
<I>Inter5</I>
<O>,</O>
<B>Head2</B>
<I>Inter6</I>
<I>Inter7</I>
<I>Inter8</I>
<I>Inter9</I>
<O>,</O>
<O> </O>
</data>
and I want the XML to look like
<data>
<combined>Head1 Inter1 Inter2 Inter3 Inter4 Inter5</combined>,
<combined>Head2 Inter6 Inter7 Inter8 Inter9</combined>
</data>
I tried to get all values of "B"
for value in mod.getiterator(tag='B'):
    print (value.text)
Head1
Head2
for value in mod.getiterator(tag='I'):
    print (value.text)
Inter1
Inter2
Inter3
Inter4
Inter5
Inter6
Inter7
Inter8
Inter9
Now, how should I save the values from the first group in one tag and those from the second group in a different tag? That is, how do I make the iteration start at a "B" tag, collect all the "I" tags that follow it, and then start again with a new tag whenever I find another "B"?
An "O" tag will always be present at the end.
You can use the ElementTree module from xml.etree:
from xml.etree import ElementTree
struct = """
<data>
{}
</data>
"""
def reformat(tree):
    root = tree.getroot()
    seen = []
    for neighbor in root.iter('data'):
        # iterate over the children directly (getchildren() was removed in Python 3.9)
        for child in neighbor:
            tx = child.text
            if tx == ',':
                # a comma separator closes the current group
                yield "<combined>{}</combined>".format(' '.join(seen))
                seen = []
            else:
                seen.append(tx)
with open('test.xml') as f:
    tree = ElementTree.parse(f)
print(struct.format(',\n'.join(reformat(tree))))
result:
<data>
<combined>Head1 Inter1 Inter2 Inter3 Inter4 Inter5</combined>,
<combined>Head2 Inter6 Inter7 Inter8 Inter9</combined>
</data>
Note that if you're not sure all the blocks are separated with a comma, you can simply change the condition if tx == ',': to suit your file format. You can also check whether tx starts with 'Head': if it does and seen is not empty, yield the contents of seen and clear it; then append tx and continue.
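The alternative from the note can be sketched roughly as follows. This version is an assumption-laden variation rather than the answer's own code: it keys on the tag name "B" instead of the text prefix 'Head', builds real elements instead of formatting strings, and runs on a shortened sample (the names `SAMPLE` and `combine` are made up). It drops the "O" separators entirely, so the comma between the combined elements is not reproduced.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of the input from the question.
SAMPLE = """<data>
<B>Head1</B><I>Inter1</I><I>Inter2</I><O>,</O>
<B>Head2</B><I>Inter6</I><O>,</O><O> </O>
</data>"""

def combine(root):
    """Group each <B> with the <I> elements that follow it."""
    groups = []
    for child in root:
        if child.tag == 'B':
            groups.append([child.text])    # a <B> starts a new group
        elif child.tag == 'I' and groups:
            groups[-1].append(child.text)  # an <I> joins the current group
        # <O> separators are ignored
    new_root = ET.Element('data')
    for group in groups:
        ET.SubElement(new_root, 'combined').text = ' '.join(group)
    return new_root

result = combine(ET.fromstring(SAMPLE))
combined_texts = [el.text for el in result]
print(combined_texts)
```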

python xml remove grandchildren or grandgrandchildren

I've been googling for removing grandchildren from an xml file. However, I've found no perfect solution.
Here's my case:
<tree>
<category title="Item 1">item 1 text
<subitem title="subitem1">subitem1 text</subitem>
<subitem title="subitem2">subitem2 text</subitem>
</category>
<category title="Item 2">item 2 text
<subitem title="subitem21">subitem21 text</subitem>
<subitem title="subitem22">subitem22 text</subitem>
<subsubitem title="subsubitem211">subsubitem211 text</subsubitem>
</category>
</tree>
In some cases, I want to remove subitems. In other cases, I want to remove subsubitem. I know I can do it like this for the given content:
import xml.etree.ElementTree as ET
root = ET.fromstring(given_content)
# case 1
for item in root.getiterator():
    for subitem in item:
        item.remove(subitem)
# case 2
for item in root.getiterator():
    for subitem in item:
        for subsubitem in subitem:
            subitem.remove(subsubitem)
I can write in this style only when I know the depth of the target node. If I only know the tag name of node I want to remove, how should I implement it?
pseudo-code:
import xml.etree.ElementTree as ET
for item in root.getiterator():
    if item.tag == 'subsubitem' or item.tag == 'subitem':
        # remove item
If I do root.remove(item), it will certainly return an error because item is not a direct child of root.
Edited:
I cannot install any 3rd-party-lib, so I have to solve this with xml.
I finally got this to work using only the standard xml library, by writing a recursive function.
def recursive_xml(root):
    # iterate over a copy of the children so that removing one does not skip its siblings
    # (getchildren() was removed in Python 3.9, and it never returned None anyway)
    for child in list(root):
        if child.tag == 'subitem' or child.tag == 'subsubitem':
            root.remove(child)
        else:
            recursive_xml(child)
This way, the function visits every node in the tree and removes my target nodes.
test_xml = r'''
<test>
<test1>
<test2>
<test3>
</test3>
<subsubitem>
</subsubitem>
</test2>
<subitem>
</subitem>
<nothing_matters>
</nothing_matters>
</test1>
</test>
'''
root = ET.fromstring(test_xml)
recursive_xml(root)
Hope this helps someone who has restricted requirements like me.
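A non-recursive variant of the same idea, still using only the standard library, might look like the sketch below. The function name `remove_tags` and the sample document are made up for the example. It snapshots the elements before removing anything, so the traversal is not disturbed by the removals.

```python
import xml.etree.ElementTree as ET

def remove_tags(root, tags):
    """Remove every descendant of root whose tag is in tags, at any depth."""
    # Snapshot all elements first so that removal does not disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in tags:
                parent.remove(child)

# Hypothetical sample document.
doc = ET.fromstring(
    '<tree><category><subitem><subsubitem/></subitem><keep/></category></tree>'
)
remove_tags(doc, {'subitem', 'subsubitem'})
remaining = [el.tag for el in doc.iter()]
print(remaining)
```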
To remove instances of subsubitem or subitem, no matter what their depth, consider the following example (with the caveat that it uses lxml.etree rather than upstream ElementTree):
import lxml.etree as etree
el = etree.fromstring('<root><item><subitem><subsubitem/></subitem></item></root>')
for child in el.xpath('.//subsubitem | .//subitem'):
    child.getparent().remove(child)

lxml non-recursive full tag

Given the following xml:
<node a='1' b='1'>
<subnode x='25'/>
</node>
I would like to extract the tagname and all attributes for the first node, i.e., the verbatim code:
<node a='1' b='1'>
without the subnode.
For example in Python, tostring returns too much:
from lxml import etree
root = etree.fromstring("<node a='1' b='1'><subnode x='25'>some text</subnode></node>")
print(etree.tostring(root))
returns
b'<node a="1" b="1"><subnode x="25">some text</subnode></node>'
The following gives the desired result, but is much too verbose:
tag = root.tag
for att, val in root.attrib.items():
    tag += ' '+att+'="'+val+'"'
tag = '<'+tag+'>'
print(tag)
result:
<node a="1" b="1">
What is an easier (and guaranteed attribute order preserving) way of doing this?
You can remove all of the subnodes.
from lxml import etree
root = etree.fromstring("<node a='1' b='1'><subnode x='25'>some text</subnode></node>")
for subnode in root.xpath("//subnode"):
    subnode.getparent().remove(subnode)
etree.tostring(root) # b'<node a="1" b="1"/>'
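Alternatively, you can use a simple regex on the original root (with its subnode still present). Order is guaranteed.
import re
# serialize to text rather than bytes so that a str pattern can be applied
res = re.search('<(.*?)>', etree.tostring(root, encoding='unicode'))
res.group(1) # 'node a="1" b="1"'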
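If neither modifying the tree nor running a regex over the serialized output is appealing, the asker's own loop can be condensed into a small helper. This is only a sketch: the name `opening_tag` is made up, it uses the standard library's ElementTree for the demonstration (lxml elements expose the same .tag and .attrib), and it assumes tag and attribute names without namespaces. `quoteattr` takes care of quoting and escaping the attribute values.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

def opening_tag(element):
    """Rebuild only the opening tag of element, attributes in stored order."""
    attrs = ''.join(
        ' {}={}'.format(name, quoteattr(value))
        for name, value in element.attrib.items()
    )
    return '<{}{}>'.format(element.tag, attrs)

root = ET.fromstring("<node a='1' b='1'><subnode x='25'>some text</subnode></node>")
tag = opening_tag(root)
print(tag)
```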

Removing parent element and all subelements from XML

Given an XML file with the following structure:
<Root>
<Stuff></Stuff>
<MoreStuff></MoreStuff>
<Targets>
<Target>
<ID>12345</ID>
<Type>Ground</Type>
<Size>Large</Size>
</Target>
<Target>
...
</Target>
</Targets>
</Root>
I'm trying to loop through each child under the <Targets> element, check each <ID> for a specific value, and if the value is found, then I want to delete the entire <Target> entry. I've been using the ElementTree Python library with little success. Here's what I have so far:
import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
iterator = root.getiterator('Target')
for item in iterator:
    old = item.find('ID')
    text = old.text
    if '12345' in text:
        item.remove(old)
tree.write('out.xml')
The problem I'm having with this approach is that only the <ID> sub element is removed, however I need the entire <Target> element and all of its child elements removed. Can anyone help! Thanks.
Unfortunately, ElementTree elements don't know who their parents are. As a workaround, you can build the mapping yourself:
tree = ET.parse('file.xml')
root = tree.getroot()
# map every child element to its parent (iter() replaces the removed getiterator())
parent_map = dict((c, p) for p in tree.iter() for c in p)
# list so that we don't mess up the order of iteration when removing items.
iterator = list(root.iter('Target'))
for item in iterator:
    old = item.find('ID')
    # skip targets that have no ID element
    if old is not None and '12345' in old.text:
        parent_map[item].remove(item)
tree.write('out.xml')
Untested
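Since the snippet above is marked as untested, here is a small self-contained check of the parent-map idea. It works on an in-memory string instead of file.xml, and the second target (ID 67890, type Air) is invented sample data for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the second Target is made up for the demonstration.
XML = """<Root><Targets>
<Target><ID>12345</ID><Type>Ground</Type></Target>
<Target><ID>67890</ID><Type>Air</Type></Target>
</Targets></Root>"""

root = ET.fromstring(XML)

# Map every child element to its parent.
parent_map = {child: parent for parent in root.iter() for child in parent}

# Copy the matches into a list before removing anything.
for target in list(root.iter('Target')):
    id_el = target.find('ID')
    if id_el is not None and id_el.text and '12345' in id_el.text:
        parent_map[target].remove(target)

remaining_ids = [el.text for el in root.iter('ID')]
print(remaining_ids)
```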
You need to keep a reference to the Targets element so that you can remove its children, so start your iteration from there. Grab each Target, check your condition and remove what you don't like.
#!/usr/bin/env python
import xml.etree.ElementTree as ET
xmlstr="""<Root>
<Stuff></Stuff>
<MoreStuff></MoreStuff>
<Targets>
<Target>
<ID>12345</ID>
<Type>Ground</Type>
<Size>Large</Size>
</Target>
<Target>
...
</Target>
</Targets>
</Root>"""
root = ET.fromstring(xmlstr)
targets = root.find('Targets')
for target in targets.findall('Target'):
    _id = target.find('ID')
    if _id is not None and '12345' in _id.text:
        targets.remove(target)
print(ET.tostring(root))
