I want to replace child elements in one tree with elements from another tree, based on some criteria. Can I do this using a comprehension? And how do I replace an element in an ElementTree at all?
You can't replace an element through the ElementTree object itself; you can only work with Element objects.
Even when you call ElementTree.find(), it's just a shortcut for getroot().find().
So you really need to:
extract the parent element;
use a comprehension (or whatever you like) on that parent element.
Extracting the parent element is easy if your target is a direct child of the root (just call getroot()); otherwise you'll have to find it.
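A minimal sketch of both steps for a target nested below the root. Because Element objects have no parent pointer, a dictionary built from root.iter() is used to find the parent, and a list comprehension assigned to the parent's full slice swaps the child. The function name replace_child and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def replace_child(root, old, new):
    """Replace element `old` (somewhere below `root`) with `new`, in place."""
    # Elements do not know their parent, so build a child -> parent map first
    parent_map = {child: parent for parent in root.iter() for child in parent}
    parent = parent_map[old]
    # Rebuild the parent's children with a comprehension; assigning to the
    # full slice replaces all children of `parent` while keeping their order
    parent[:] = [new if child is old else child for child in parent]


root = ET.fromstring("<root><a><b>old</b><c/></a></root>")
target = root.find("a/b")
replace_child(root, target, ET.fromstring("<b>new</b>"))
print(ET.tostring(root, encoding="unicode"))
# prints <root><a><b>new</b><c /></a></root>
```

Comparing with `is` rather than `==` makes the intent explicit: only that exact element object is swapped out.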
Unlike the DOM, etree has no explicit multi-document functions. However, you should be able to just move elements freely from one document to another. You may want to call _setroot after doing so.
By calling insert and then remove, you can replace a node in a document.
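A sketch of insert-then-remove, moving an element parsed from a second document into the first. The two sample documents and their tags are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two independent documents (hypothetical sample data)
doc1 = ET.ElementTree(ET.fromstring("<root><old>1</old><keep/></root>"))
doc2 = ET.ElementTree(ET.fromstring("<other><new>2</new></other>"))

parent = doc1.getroot()
old = parent.find("old")
new = doc2.getroot().find("new")

# Insert the foreign element at the old element's position, then drop the old one
index = list(parent).index(old)
parent.insert(index, new)
parent.remove(old)

# ElementTree does not detach `new` from doc2 automatically; remove it there
# too if it should be a move rather than a shared reference
doc2.getroot().remove(new)

print(ET.tostring(parent, encoding="unicode"))
# prints <root><new>2</new><keep /></root>
```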
I'm new to Python, but I've found a dodgy way to do this:
Input file input1.xml:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<import ref="input2.xml" />
<name awesome="true">Chuck</name>
</root>
Input file input2.xml:
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar>blah blah</bar>
</foo>
Python code: (note, messy and hacky)
import os
import xml.etree.ElementTree as ElementTree

def getElementTree(xmlFile):
    print("-- Processing file: '%s' in: '%s'" % (xmlFile, os.getcwd()))
    et = ElementTree.parse(xmlFile).getroot()
    # map every child element to its parent (elements have no parent pointer)
    parent_map = dict((c, p) for p in et.iter() for c in p)
    # ref: https://stackoverflow.com/questions/2170610/access-elementtree-node-parent-node/2170994
    importList = et.findall('.//import[@ref]')
    for importPlaceholder in importList:
        old_dir = os.getcwd()
        new_dir = os.path.dirname(importPlaceholder.attrib['ref'])
        shallPushd = os.path.exists(new_dir)
        if shallPushd:
            print(" pushd: %s" % new_dir)
            os.chdir(new_dir)  # pushd (for relative linking)
        # Recursing to import element from file reference
        importedElement = getElementTree(os.path.basename(importPlaceholder.attrib['ref']))
        # element replacement: overwrite the placeholder at the same position
        parent = parent_map[importPlaceholder]
        index = list(parent).index(importPlaceholder)
        parent[index] = importedElement
        if shallPushd:
            print(" popd: %s" % old_dir)
            os.chdir(old_dir)  # popd
    return et

xmlET = getElementTree("input1.xml")
print(ElementTree.tostring(xmlET, encoding='unicode'))
gives the output:
-- Processing file: 'input1.xml' in: 'C:\temp\testing'
-- Processing file: 'input2.xml' in: 'C:\temp\testing'
<root>
<foo>
<bar>blah blah</bar>
</foo><name awesome="true">Chuck</name>
</root>
This was put together with information from:
stackoverflow answer: access ElementTree node parent node
accessing parents from effbot.org
Could you please help me access the element named 'id' in the following structure in Python? (I have the lxml and xml.etree.ElementTree libraries.)
Desired result: '0000000'
Desired method:
Search the XML document for a child whose name is fcsProtocolEF3.
Search fcsProtocolEF3 for an element named 'id'.
It is crucial to search by element name, not by ordinal position.
I tried to use something like this: tree.findall('{http://zakupki.gov.ru/oos/export/1}fcsProtocolEF3')[0].findall('{http://zakupki.gov.ru/oos/types/1}id')[0].text
It works, but it requires me to spell out the namespaces. The XML documents have different namespaces, and I don't know them beforehand.
Thank you.
It would be great to use something like XQuery in SQL:
value('(/*:export/*:fcsProtocolEF3/*:id)[1]', 'nvarchar(21)')) AS [id],
XML document:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>
lxml solution:
xml = '''<?xml version="1.0"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>'''
from lxml import etree as et
root = et.fromstring(xml)
text = root.xpath('//*[local-name()="export"]/*[local-name()="fcsProtocolEF3"]/*[local-name()="id"]/text()')[0]
print(text)
Below is an ElementTree-based solution. The namespaces are written out explicitly.
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>
'''
def get_id_text():
    root = ET.fromstring(xml)
    fcs = root.find('{http://zakupki.gov.ru/oos/export/1}fcsProtocolEF3')
    # assuming there is one fcs element and one id under fcs
    return fcs.find('{http://zakupki.gov.ru/oos/types/1}id').text
print(get_id_text())
output
0000000
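If the namespace URIs are not known beforehand, ElementTree in Python 3.8 and later accepts a {*} wildcard that matches a tag in any namespace (or in none). A sketch on a trimmed copy of the question's document, keeping only the two namespaces actually used:

```python
import xml.etree.ElementTree as ET

# Trimmed version of the question's document; {*} needs Python 3.8+
xml = '''<ns2:export xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>'''

root = ET.fromstring(xml)
# {*}name matches the local name regardless of the namespace URI
id_text = root.find('{*}fcsProtocolEF3/{*}id').text
print(id_text)  # prints 0000000
```

This still searches by element name, as the question requires, and behaves like the lxml local-name() XPath above.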
I am reading an XLIFF file and want to retrieve a specific element. I tried to print all the elements using:
from lxml import etree
with open(r'path\to\file\.xliff', 'r', encoding='utf-8') as xml_file:
    tree = etree.parse(xml_file)
root = tree.getroot()
for element in root.iter():
    print("child", element)
The output was
child <Element {urn:oasis:names:tc:xliff:document:2.0}segment at 0x6b8f9c8>
child <Element {urn:oasis:names:tc:xliff:document:2.0}source at 0x6b8f908>
When I tried to get a specific element (the source tag), with the help of many posts here:
segment = tree.xpath('{urn:oasis:names:tc:xliff:document:2.0}segment')
print(segment)
it returns an empty list. Can someone tell me how to retrieve it properly?
Input:
<?xml version='1.0' encoding='UTF-8'?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0">
<segment id = 1>
<source>
Hello world
</source>
</segment>
<segment id = 2 >
<source>
2nd statement
</source>
</segment>
</xliff>
I want to get the values of each segment and its corresponding source.
This code,
tree.xpath('{urn:oasis:names:tc:xliff:document:2.0}segment')
is not accepted by lxml ("lxml.etree.XPathEvalError: Invalid expression"). You need to use findall().
The following works (in the XML sample, the segment elements are children of xliff):
from lxml import etree
tree = etree.parse("test.xliff") # XML in the question; ill-formed attributes corrected
segment = tree.findall('{urn:oasis:names:tc:xliff:document:2.0}segment')
print(segment)
However, the real XML is apparently more complex (segment is not a direct child of xliff). Then you need to add .// to search the whole tree:
segment = tree.findall('.//{urn:oasis:names:tc:xliff:document:2.0}segment')
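To pair each segment's id with its source text, a prefix-to-URI mapping avoids repeating the full namespace in every path. A sketch with the standard library's ElementTree (lxml's findall() takes the same namespaces argument, and its xpath() accepts namespaces= as well); the prefix x is arbitrary, and the attributes are quoted so the document is well-formed:

```python
import xml.etree.ElementTree as ET

# Question's input with the attribute values quoted (id="1") so it parses
xliff = '''<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0">
<segment id="1">
<source>
Hello world
</source>
</segment>
<segment id="2">
<source>
2nd statement
</source>
</segment>
</xliff>'''

# The prefix "x" is arbitrary; only the URI has to match the document
ns = {"x": "urn:oasis:names:tc:xliff:document:2.0"}

root = ET.fromstring(xliff)
pairs = []
for segment in root.iterfind(".//x:segment", ns):
    # findtext returns the element's text (or the default if it is missing)
    source = segment.findtext("x:source", default="", namespaces=ns).strip()
    pairs.append((segment.get("id"), source))

print(pairs)  # prints [('1', 'Hello world'), ('2', '2nd statement')]
```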
I have an XML document like this:
<xml>
<access>
<user>
<name>user1</name>
<group>testgroup</group>
</user>
<user>
<name>user2</name>
<group>testgroup</group>
</user>
</access>
</xml>
I now want to add a <group>testgroup2</group> to the user1 subtree.
Using the following, I can get the name element:
access = root.find('access')
name = [element for element in access.iter() if element.text == 'user1']
But I can't access the parent using name.find('..'); it tells me:
AttributeError: 'list' object has no attribute 'find'.
Is there any way to access the exact <user> child of <access> whose name text is "user1"?
Expected result:
<xml>
<access>
<user>
<name>user1</name>
<group>testgroup</group>
<group>testgroup2</group>
</user>
<user>
<name>user2</name>
<group>testgroup</group>
</user>
</access>
</xml>
Important: I can NOT use lxml (and therefore its getparent() method); I am stuck with xml.etree.
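To do that using find, you need to call it on each element of the list, because name is a list (hence the AttributeError):
for ele in name:
    ele.find('..')  # called on an Element, not on the list
Note, though, that xml.etree returns None here: '..' cannot select an ancestor of the element that find() is called on.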
Here is how I solved this, in case anyone else needs to do it with xml.etree instead of lxml (for whatever reason).
Following the suggestion at
http://effbot.org/zone/element.htm#accessing-parents
import xml.etree.ElementTree as et

tree = et.parse(my_xmlfile)
root = tree.getroot()
access = root.find('access')
# ... snip ...

def iterparent(tree):
    # yield (parent, child) pairs for every element in the subtree
    for parent in tree.iter():
        for child in parent:
            yield parent, child

# users = list of user names that need new_group added
# iterate through the tuples and find the user name
# alter the XML tree when found
for user in users:
    print("processing user: %s" % user)
    for parent, child in iterparent(access):
        if child.tag == "name" and child.text == user:
            print("Name found: %s" % user)
            parent.append(et.fromstring('<group>%s</group>' % new_group))
After this, et.dump(tree) shows that the tree now contains the correctly altered user subtree with the extra group tag.
Note: I am not really sure why this works. I expect that yield gives a reference into the tree, so altering the parent that yield returned alters the original tree. My Python knowledge is not good enough to be sure about this, though; I just know that it works for me this way.
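The expectation is right: a generator yields the very Element objects stored in the tree, not copies, so appending to a yielded parent changes the tree itself. A small check on a cut-down version of the question's XML (with a proper closing </access> tag):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<xml><access><user><name>user1</name></user></access></xml>"
)


def iterparent(tree):
    # same generator as above: (parent, child) pairs
    for parent in tree.iter():
        for child in parent:
            yield parent, child


parent, child = next((p, c) for p, c in iterparent(root) if c.tag == "name")

# The yielded parent is the same object the tree holds, not a copy
print(parent is root.find("access/user"))  # prints True

parent.append(ET.fromstring("<group>testgroup2</group>"))
print(root.find("access/user/group").text)  # prints testgroup2
```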
You can write a recursive method to iterate through the tree and capture the parents.
def recurse_tree(node):
    # yield every element that has a direct child whose text is 'user1'
    for child in node:
        if child.text == 'user1':
            yield node
        for subchild in recurse_tree(child):
            yield subchild

print(list(recurse_tree(root)))
# [<Element 'user' at 0x18a1470>]
If you're using Python 3.3 or later, you can use the nifty yield from syntax rather than iterating over the recursive call.
Note that this could possibly yield the same node more than once (if there are multiple children containing the target text). You can either use a set to remove duplicates, or you can alter the control flow to prevent this from happening.
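A Python 3 sketch combining both suggestions: yield from replaces the inner loop, and checking the children with any() before yielding changes the control flow so each parent is yielded at most once. The text parameter and the sample XML (with a second matching child, alias, added to show the duplicate case) are hypothetical.

```python
import xml.etree.ElementTree as ET


def recurse_tree(node, text="user1"):
    # Yield `node` at most once, if any direct child has the target text
    if any(child.text == text for child in node):
        yield node
    for child in node:
        # Python 3.3+: delegate to the recursive generator
        yield from recurse_tree(child, text)


root = ET.fromstring(
    "<xml><access>"
    "<user><name>user1</name><alias>user1</alias></user>"
    "<user><name>user2</name></user>"
    "</access></xml>"
)
print([e.tag for e in recurse_tree(root)])  # prints ['user']
```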
You can directly use the findall() method to get the parent node whose name child is 'user1'. See the code below:
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml') #build tree object using your xml
root = tree.getroot() #using tree object get the root
for parent in root.findall(".//*[name='user1']"):
    # the predicate [name='user1'] preceded by an asterisk selects
    # every element that has a <name> child with the text 'user1'
    parent.append(ET.fromstring("<group>testgroup2</group>"))
# if you want to see the xml after adding the string
ET.dump(root)
# optionally to save the xml
tree.write('output.xml')
Hi, I work with IronPython 2.7 in Dynamo.
I need to get the value from a specific node in a really big XML file. I made an example XML so you can better understand my problem.
So I need to get the value of "lvl", but only in the node "to".
At the moment I get an error:
TypeError: list objects are unhashable
for the line:
list.extend(elem.findall(match))
What am I doing wrong? Is there a better or easier way to do it?
Here is the example xml:
<?xml version="1.0" encoding="UTF-8"?>
<note>
<note2>
<yolo>
<to>
<type>
<game>
<name>Jani</name>
<lvl>111111</lvl>
<fun>2222222</fun>
</game>
</type>
</to>
<mo>
<type>
<game>
<name>Bani</name>
<lvl>3333333</lvl>
<fun>44444444</fun>
</game>
</type>
</mo>
</yolo>
</note2>
</note>
And here is my code:
import clr
import sys
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import *
sys.path.append("C:\Program Files (x86)\IronPython 2.7\Lib")
import xml.etree.ElementTree as ET
xml="note.xml"
xpathstr=".//yolo"
match ="lvl"
list=[]
tree = ET.parse(xml)
root = tree.getroot()
specific = root.findall(xpathstr)
for elem in specific:
    list.extend(elem.findall(match))
print tree, root, specific, list
If you need to get the value of "lvl", but only in the node "to", you can do it with a single XPath expression:
import xml.etree.ElementTree as ET
xml="note.xml"
xpathstr=".//to//lvl"
tree = ET.parse(xml)
root = tree.getroot()
specific = root.findall(xpathstr)
list=[]
for elem in specific:
    list.append(elem.text)
print (list)
Gives:
['111111']
If you know that the elements "type" and "game" contain "lvl", you can alternatively use the XPath ".//to/type/game/lvl", or, if the element "to" has to be contained in "yolo", use ".//yolo/to/type/game/lvl".
And you probably want list.append instead of list.extend, but maybe not; I don't know the rest of your code.
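A quick check of the three path variants on the example XML, inlined as a string instead of read from note.xml. Each should select only the lvl under "to", not the one under "mo":

```python
import xml.etree.ElementTree as ET

xml = """<note><note2><yolo>
<to><type><game><name>Jani</name><lvl>111111</lvl><fun>2222222</fun></game></type></to>
<mo><type><game><name>Bani</name><lvl>3333333</lvl><fun>44444444</fun></game></type></mo>
</yolo></note2></note>"""

root = ET.fromstring(xml)
results = {}
for path in (".//to//lvl", ".//to/type/game/lvl", ".//yolo/to/type/game/lvl"):
    # collect the text of every matching element
    results[path] = [elem.text for elem in root.findall(path)]
    print(path, results[path])  # the list is ['111111'] for each path
```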
When I try to read the text of an element that has a child, I get None.
Here is the XML (say, test.xml):
<?xml version="1.0"?>
<data>
<test><ref>MemoryRegion</ref> abcd</test>
</data>
and the Python code that should read 'abcd':
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
print root.find("test").text
When I run this, it prints None rather than abcd.
How can I read abcd under this condition?
Use the Element.tail attribute:
>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('test.xml')
>>> root = tree.getroot()
>>> print root.find(".//ref").tail
abcd
ElementTree has a rather different view of XML, one that is better suited to nested data. .text is the data right after a start tag; .tail is the data right after an end tag. So you want:
print root.find('test/ref').tail
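A short Python 3 sketch of where each piece of text lives, plus two ways to collect it: itertext() for all text in the subtree, and the element's own text plus each child's tail for only the text directly inside <test>:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<data><test><ref>MemoryRegion</ref> abcd</test></data>")
test = root.find("test")

print(repr(test.text))              # None: nothing between <test> and <ref>
print(repr(test.find("ref").text))  # 'MemoryRegion'
print(repr(test.find("ref").tail))  # ' abcd': the text after </ref>

# All text in the subtree, children included
print("".join(test.itertext()))     # MemoryRegion abcd

# Only the text that belongs directly to <test>: its text plus children's tails
own = (test.text or "") + "".join(child.tail or "" for child in test)
print(repr(own.strip()))            # 'abcd'
```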