I have an XML file that looks like this:
<xml>
<access>
<user>
<name>user1</name>
<group>testgroup</group>
</user>
<user>
<name>user2</name>
<group>testgroup</group>
</user>
</access>
</xml>
I now want to add a <group>testgroup2</group> to the user1 subtree.
Using the following I can get the name
access = root.find('access')
name = [element for element in access.iter() if element.text == 'user1']
But I can't access the parent using name.find('..'). It fails with:
AttributeError: 'list' object has no attribute 'find'.
Is there any possibility to access the exact <user> child of <access> where the text in name is "user1"?
Expected result:
<xml>
<access>
<user>
<name>user1</name>
<group>testgroup</group>
<group>testgroup2</group>
</user>
<user>
<name>user2</name>
<group>testgroup</group>
</user>
</access>
</xml>
Important: I can NOT use lxml and its getparent() method; I am stuck with xml.etree.
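name is a list, so find has to be called on each element in it rather than on the list itself:
for ele in name:
    ele.find('..')  # ele is an Element here, not a list
Note that in xml.etree a path cannot reach above the element find() was called on, so ele.find('..') returns None instead of the parent.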
Here is how I solved this, in case anyone is interested in doing this with xml.etree instead of lxml (for whatever reason).
It follows the suggestion from
http://effbot.org/zone/element.htm#accessing-parents
import xml.etree.ElementTree as et

tree = et.parse(my_xmlfile)
root = tree.getroot()
access = root.find('access')

# ... snip ...

def iterparent(tree):
    # iter() replaces getiterator(), which was removed in Python 3.9
    for parent in tree.iter():
        for child in parent:
            yield parent, child

# users = list of user names that need new_group added
# iterate over the (parent, child) tuples and find the user name
# alter the XML tree when found
for user in users:
    print("processing user: %s" % user)
    # list() walks the tree completely before any element is modified
    for parent, child in list(iterparent(access)):
        if child.tag == "name" and child.text == user:
            print("Name found: %s" % user)
            parent.append(et.fromstring('<group>%s</group>' % new_group))
After this et.dump(tree) shows that tree now contains the correctly altered user-subtree with another group tag added.
Note: I am not really sure why this works. I assume that yield hands back references to the elements of the tree, so altering the yielded parent alters the original tree. My Python knowledge is not good enough to be sure about this, though. I just know that it works for me this way.
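An alternative to walking the tree again for every user is to build a child-to-parent dictionary once. The elements stored in that dictionary are the same objects that make up the tree, which is why changing a parent found this way changes the tree itself. A minimal sketch, assuming the XML from the question (with the closing </access> tag corrected) is held in a string; XML_TEXT, parent_map and add_group are hypothetical names introduced for illustration, and the dict comprehension assumes Python 3:

```python
import xml.etree.ElementTree as ET

# Assumed sample data: the document from the question with </access> closed properly.
XML_TEXT = """<xml>
<access>
<user><name>user1</name><group>testgroup</group></user>
<user><name>user2</name><group>testgroup</group></user>
</access>
</xml>"""

root = ET.fromstring(XML_TEXT)

# Map every child element to its parent in a single pass over the tree.
parent_map = {child: parent for parent in root.iter() for child in parent}


def add_group(root, parent_map, user_name, group_name):
    """Append <group>group_name</group> to the <user> whose <name> is user_name."""
    # list() finishes the search before the tree is modified.
    for name in list(root.iter("name")):
        if name.text == user_name:
            group = ET.SubElement(parent_map[name], "group")
            group.text = group_name


add_group(root, parent_map, "user1", "testgroup2")

# The first <user> is user1; collect the text of its <group> children.
groups = [g.text for g in root.find("access/user").findall("group")]
print(groups)
```

The map has to be rebuilt if elements are added or moved afterwards, since it is only a snapshot of the tree at the time it was created.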
You can write a recursive method to iterate through the tree and capture the parents.
def recurse_tree(node):
    # iterate over the element directly; getchildren() was removed in Python 3.9
    for child in node:
        if child.text == 'user1':
            yield node
        for subchild in recurse_tree(child):
            yield subchild

print(list(recurse_tree(root)))
# [<Element 'user' at 0x18a1470>]
If you're using Python 3.X, you can use the nifty yield from ... syntax rather than iterating over the recursive call.
Note that this could possibly yield the same node more than once (if there are multiple children containing the target text). You can either use a set to remove duplicates, or you can alter the control flow to prevent this from happening.
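Both remarks can be combined in one variant. This is only a sketch for Python 3: parents_of_text is a hypothetical name, and the <alias> element in the sample data is an assumption added so that one user has two children with the target text. Testing the children with any() means a parent is yielded at most once, so no set is needed afterwards.

```python
import xml.etree.ElementTree as ET

# Assumed sample data; <alias> is made up so that user1 has two matching children.
XML_TEXT = (
    "<access>"
    "<user><name>user1</name><alias>user1</alias></user>"
    "<user><name>user2</name></user>"
    "</access>"
)


def parents_of_text(node, text):
    """Yield each element that has at least one direct child with the given text."""
    # any() stops at the first matching child, so node is yielded only once.
    if any(child.text == text for child in node):
        yield node
    for child in node:
        # Delegate to the recursive call instead of looping over its results.
        yield from parents_of_text(child, text)


root = ET.fromstring(XML_TEXT)
matches = list(parents_of_text(root, "user1"))
print([m.tag for m in matches])
```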
You can use the findall() method directly to get the parent node whose child matches name='user1'. See the code below:
import xml.etree.ElementTree as ET

tree = ET.parse('test.xml')  # build the tree object from your XML
root = tree.getroot()        # get the root from the tree object

for parent in root.findall(".//*[name='user1']"):
    # the predicate [name='user1'] preceded by an asterisk matches
    # every element that has a child <name> with the text 'user1'
    parent.append(ET.fromstring("<group>testgroup2</group>"))

# if you want to see the XML after adding the element
ET.dump(root)

# optionally, save the XML
tree.write('output.xml')
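The same predicate can be narrowed to <user> elements, and ET.SubElement can stand in for ET.fromstring. The following is a sketch under assumptions rather than a drop-in replacement: it parses the question's XML from a string (XML_TEXT is a hypothetical name) instead of reading test.xml, and encoding="unicode" requires Python 3.

```python
import xml.etree.ElementTree as ET

# Assumed sample data: the document from the question, written compactly.
XML_TEXT = (
    "<xml><access>"
    "<user><name>user1</name><group>testgroup</group></user>"
    "<user><name>user2</name><group>testgroup</group></user>"
    "</access></xml>"
)

root = ET.fromstring(XML_TEXT)

# Match only <user> elements that have a <name> child with the text 'user1'.
for user in root.findall(".//user[name='user1']"):
    new_group = ET.SubElement(user, "group")  # creates and appends in one step
    new_group.text = "testgroup2"

result = ET.tostring(root, encoding="unicode")
print(result)
```

Because SubElement sets the text as data rather than parsing it as markup, a group name containing characters such as & or < is escaped on output instead of breaking the parse.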
Sorry if this is a silly question, but I've been trying to reference the directory in a .props file so that I can parse it and use it as a variable for a Python project. The problem, however, is that the header of the file is being treated as the root of the program, and I haven't been able to reference the main root no matter what I've done. I've tried 'json.dumps()', 'root.iter()', 'root.findall()', and a slew of other options to try and get past the header, but every time I try, the result is either an error or nothing at all.
I'm guessing it's because I'm using a props file and while similar, these solutions are supposed to be for .xml files, but I haven't found anything that implies I should be dealing with .props files any differently.
In short. How can I take the information in the MainRoot node below, and, in a separate python program, parse it and make it into a variable? Said props file is below.
<!--YouFoundMe.props-->
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="14.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PropertyGroup>
<MainRoot>..\..\..\YouFoundMe</MainRoot>
</PropertyGroup>
</Project>
This may not be important, but if it helps, I'll also post a python file containing some of the failed methods I tried below:
import xml.etree.ElementTree as ET
import json

# raw string so the backslash in the path is not read as an escape sequence
tree = ET.parse(r'YouFoundMe\YouFoundMe.props')
root = tree.getroot()
FIND_ME_DIR = json.dumps(root.attrib)
boobop = json.dumps(root.tag)
print(FIND_ME_DIR)

for child in root:
    print(child.tag)
    print(child.attrib)

for MainRoot in root.iter('Project'):
    print(MainRoot.attrib)
for MainRoot in root.iter('PropertyGroup'):
    print(MainRoot.attrib)
for MainRoot in root.iter('MainRoot'):
    print(MainRoot.attrib)
for child in root.iter('MainRoot'):
    print("Aything? Please?")

for PROP in root.findall('PropertyGroup'):
    result = PROP.find('MainRoot').text
    print(result)

for MainRoot in root.findall('Project'):
    print("Text")
for MainRoot in root.findall('PropertyGroup'):
    print("Text")
for MainRoot in root.findall('MainRoot'):
    print("Text")

element = root.find('Project')
if not element:  # careful!
    print("element not found, or element has no subelements")
if element is None:
    print("element not found")

test = str(root.get("Project"))
print(test)
test = str(root.get("PropertyGroup"))
print(test)
test = str(root.get("MainRoot"))
print(test)

print(tree)
print(root)
Notice that your XML has a default namespace declared at the root element level:
xmlns="http://schemas.microsoft.com/developer/msbuild/2003"
Note that descendant elements without prefix, including MainRoot, inherit this default namespace implicitly. You can define a prefix that references the above default namespace and then use that prefix to find MainRoot, for example:
ns = { 'd': 'http://schemas.microsoft.com/developer/msbuild/2003' }
main_root = root.find('.//d:MainRoot', namespaces=ns)
print(main_root.text)
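To see why the unprefixed searches in the question return nothing, it helps to look at how ElementTree stores the tag names. The sketch below parses the .props content from a string; PROPS_TEXT and MSBUILD_NS are hypothetical names, and it assumes the <!--YouFoundMe.props--> line and the XML declaration are left out of the string.

```python
import xml.etree.ElementTree as ET

MSBUILD_NS = "http://schemas.microsoft.com/developer/msbuild/2003"

# Assumed sample data: the .props content from the question, without the
# leading comment line and XML declaration.
PROPS_TEXT = r"""<Project ToolsVersion="14.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <MainRoot>..\..\..\YouFoundMe</MainRoot>
  </PropertyGroup>
</Project>"""

root = ET.fromstring(PROPS_TEXT)

# Every tag is stored as {namespace-uri}localname.
print(root.tag)  # {http://schemas.microsoft.com/developer/msbuild/2003}Project

# 1. Prefix mapped through the namespaces argument.
via_prefix = root.find(".//d:MainRoot", namespaces={"d": MSBUILD_NS})

# 2. The fully qualified name written out directly.
via_clark = root.find(".//{%s}MainRoot" % MSBUILD_NS)

# 3. The unqualified name does not match, which is what the question ran into.
unqualified = root.find(".//MainRoot")

main_root = via_prefix.text
print(main_root)
```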
My XML file looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Messages xmlns="URL/sampleMessages-v1">
<Header>
<TransactionId>0</TransactionId>
<RequestNo>41194812</RequestNo>
<VNo>6789</VNo>
<Source></Source>
</Header>
...
...
</Messages>
I want to read it and change the RequestNo value
<RequestNo>41194812</RequestNo> to
<RequestNo>41194000</RequestNo>
I am currently using the ElementTree module on a Windows machine.
I want to update the value in the same file.
I have tried the code below:
for elem in root:
    for subelem in elem:
        # print(subelem.tag)
        if 'RequestNo' in subelem.tag:
            # print(subelem.text)
            subelem.text = "41194813"
But I am not able to see the change, or rather I don't know how to write the new value subelem.text="41194813" back to the existing XML file.
Your for loop does the job: it did replace the text correctly. The change is in your root variable. You can verify that by adding the following line right after the for loop:
ElementTree.dump(root)
Now that you have the XML updated, you will need to write that into a file:
tree.write('newfile.xml')
Where tree is the result of ElementTree.parse(). So, to put everything together:
from xml.etree import ElementTree

tree = ElementTree.parse('messages.xml')
root = tree.getroot()
for elem in root:
    for subelem in elem:
        if 'RequestNo' in subelem.tag:
            subelem.text = '41194813'
            break
tree.write('messages-new.xml')
Dealing with Namespaces
Your XML document contains namespaces, so if you plan to search for a tag, you need to include the namespaces in the tag names. Here is an alternative solution which deals with namespaces:
tree = ElementTree.parse('messages.xml')
root = tree.getroot()
namespaces = {'xxx': 'URL/sampleMessages-v1'}
node = root.find('xxx:Header/xxx:RequestNo', namespaces)
if node is not None:
    node.text = '41194813'
tree.write('messages-new.xml')
In the example above I gave your namespace the prefix 'xxx'. It can be anything ('foo', 'bar', ...), but the same prefix must then be used in the call to root.find().
Removing "ns0" from Output File
In order to remove "ns0" from output file, you need to register the namespace before writing:
ElementTree.register_namespace('', 'URL/sampleMessages-v1')
tree.write('messages-new.xml')
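Putting the three steps together, here is a sketch that can be run without any files: the XML is parsed from a string and written to an in-memory buffer, both of which are assumptions made for the sake of a self-contained example. XML_TEXT, NS_URI and the prefix 'm' are hypothetical names. With a real file, passing the original file name to tree.write() overwrites it in place, which is what the question asked for.

```python
import io
import xml.etree.ElementTree as ET

NS_URI = "URL/sampleMessages-v1"

# Assumed sample data: a shortened version of the file from the question.
XML_TEXT = """<Messages xmlns="URL/sampleMessages-v1">
  <Header>
    <TransactionId>0</TransactionId>
    <RequestNo>41194812</RequestNo>
  </Header>
</Messages>"""

# Register the URI as the default namespace so no ns0: prefix is written.
ET.register_namespace("", NS_URI)

tree = ET.ElementTree(ET.fromstring(XML_TEXT))
node = tree.getroot().find("m:Header/m:RequestNo", {"m": NS_URI})
if node is not None:
    node.text = "41194000"

# Write to a buffer here; a file name works the same way.
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue().decode("utf-8")
print(output)
```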
I'm a newbie with Python and I'd like to remove the element openingHours and the child elements from the XML.
I have this input
<Root>
<stations>
<station id= "1">
<name>whatever</name>
<openingHours>
<openingHour>
<entrance>main</entrance>
<timeInterval>
<from>05:30</from>
<to>21:30</to>
</timeInterval>
<openingHour/>
<openingHours>
<station/>
<station id= "2">
<name>foo</name>
<openingHours>
<openingHour>
<entrance>main</entrance>
<timeInterval>
<from>06:30</from>
<to>21:30</to>
</timeInterval>
<openingHour/>
<openingHours>
<station/>
<stations/>
<Root/>
I'd like this output
<Root>
<stations>
<station id= "1">
<name>whatever</name>
<station/>
<station id= "2">
<name>foo</name>
<station/>
<stations/>
<Root/>
So far I've tried this from another thread How to remove elements from XML using Python
from lxml import etree
doc=etree.parse('stations.xml')
for elem in doc.xpath('//*[attribute::openingHour]'):
    parent = elem.getparent()
    parent.remove(elem)
print(etree.tostring(doc))
However, it doesn't seem to be working.
Thanks
I took your code for a spin but at first Python couldn't agree with the way you composed your XML, wanting the / in the closing tag to be at the beginning (like </...>) instead of at the end (<.../>).
That aside, the reason your code isn't working is because the xpath expression is looking for the attribute openingHour while in reality you want to look for elements called openingHours. I got it to work by changing the expression to //openingHours. Making the entire code:
from lxml import etree
doc=etree.parse('stations.xml')
for elem in doc.xpath('//openingHours'):
    parent = elem.getparent()
    parent.remove(elem)
print(etree.tostring(doc))
You want to remove the tags <openingHours> and not some attribute with name openingHour:
from lxml import etree
doc = etree.parse('stations.xml')
for elem in doc.findall('.//openingHours'):
    parent = elem.getparent()
    parent.remove(elem)
print(etree.tostring(doc))
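If lxml is not available, the same removal can be sketched with the standard library, which has no getparent(). The idea is to visit each element as a potential parent and remove matching children from it. XML_TEXT is a hypothetical name, and the sample is a shortened version of the question's XML with the closing tags written correctly.

```python
import xml.etree.ElementTree as ET

# Assumed sample data: a shortened, well-formed version of the input.
XML_TEXT = """<Root><stations>
<station id="1"><name>whatever</name><openingHours><openingHour><entrance>main</entrance></openingHour></openingHours></station>
<station id="2"><name>foo</name><openingHours><openingHour><entrance>main</entrance></openingHour></openingHours></station>
</stations></Root>"""

root = ET.fromstring(XML_TEXT)

# Copy both sequences into lists first so nothing is removed from a
# collection while it is being iterated.
for parent in list(root.iter()):
    for child in list(parent):
        if child.tag == "openingHours":
            parent.remove(child)

remaining = root.findall(".//openingHours")
names = [station.find("name").text for station in root.iter("station")]
print(ET.tostring(root, encoding="unicode"))
```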
Is there a way I can get the document root from a child Element or Node? I am migrating from a library that works with any of Document, Element or Node to one that works only with Document. eg.
From:
element.xpath('/a/b/c') # 4Suite
to:
xpath.find('/a/b/c', doc) # pydomxpath
Node objects have an ownerDocument property that refers to the Document object associated with the node. See http://www.w3.org/TR/DOM-Level-2-Core/core.html#node-ownerDoc.
This property is not mentioned in the Python documentation, but it's available. Example:
from xml.dom import minidom
XML = """
<root>
<x>abc</x>
<y>123</y>
</root>"""
dom = minidom.parseString(XML)
x = dom.getElementsByTagName('x')[0]
print(x)
print(x.ownerDocument)
Output:
<DOM Element: x at 0xc57cd8>
<xml.dom.minidom.Document instance at 0x00C1CC60>
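If what you need is the root element rather than the Document object, ownerDocument can be followed by documentElement. A short sketch with minidom, using the same sample document as above; document and root_element are names chosen for illustration:

```python
from xml.dom import minidom

dom = minidom.parseString("<root><x>abc</x><y>123</y></root>")
x = dom.getElementsByTagName("x")[0]

# From any node, go up to the owning Document ...
document = x.ownerDocument
# ... and from the Document down to its root element.
root_element = document.documentElement

print(root_element.tagName)
```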
I want to replace child elements in one tree with elements from another, based on some criteria. Can I do this using a comprehension? And how do we replace an element in ElementTree?
You can't replace an element through the ElementTree object itself; you can only work with Element objects.
Even when you call ElementTree.find() it's just a shortcut for getroot().find().
So you really need to:
extract the parent element
use comprehension (or whatever you like) on that parent element
The extraction of the parent element can be easy if your target is a root sub-element (just call getroot()) otherwise you'll have to find it.
Unlike the DOM, etree has no explicit multi-document functions. However, you should be able to just move elements freely from one document to another. You may want to call _setroot after doing so.
By calling insert and then remove, you can replace a node in a document.
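The insert-then-remove approach could look like the sketch below. The two small documents and the names target, source, old and new are assumptions made for illustration. Since an Element does not keep track of its parent, the element is also removed from the source tree here so that it ends up in only one document.

```python
import xml.etree.ElementTree as ET

# Assumed sample data: two tiny documents.
target = ET.fromstring("<root><old>1</old><keep>2</keep></root>")
source = ET.fromstring("<other><new>3</new></other>")

old = target.find("old")
new = source.find("new")

# Find the position of the element to replace among its siblings.
index = list(target).index(old)

target.insert(index, new)  # put the new element in front of the old one
target.remove(old)         # then drop the old one
source.remove(new)         # detach from the source tree to finish the move

tags = [child.tag for child in target]
print(ET.tostring(target, encoding="unicode"))
```

Assigning with target[index] = new has the same effect as the insert and remove pair, because Element supports list-style item assignment.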
I'm new to python, but I've found a dodgy way to do this:
Input file input1.xml:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<import ref="input2.xml" />
<name awesome="true">Chuck</name>
</root>
Input file input2.xml:
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar>blah blah</bar>
</foo>
Python code: (note, messy and hacky)
import os
import xml.etree.ElementTree as ElementTree

def getElementTree(xmlFile):
    print("-- Processing file: '%s' in: '%s'" % (xmlFile, os.getcwd()))
    # read as bytes so the parser honours the encoding declaration itself
    with open(xmlFile, 'rb') as xmlFH:
        xmlStr = xmlFH.read()
    et = ElementTree.fromstring(xmlStr)
    # child -> parent map; iter() replaces getiterator(), removed in Python 3.9
    parent_map = dict((c, p) for p in et.iter() for c in p)
    # ref: https://stackoverflow.com/questions/2170610/access-elementtree-node-parent-node/2170994
    importList = et.findall('.//import[@ref]')
    for importPlaceholder in importList:
        old_dir = os.getcwd()
        new_dir = os.path.dirname(importPlaceholder.attrib['ref'])
        shallPushd = os.path.exists(new_dir)
        if shallPushd:
            print("   pushd: %s" % (new_dir))
            os.chdir(new_dir)  # pushd (for relative linking)
        # Recursing to import element from file reference
        importedElement = getElementTree(os.path.basename(importPlaceholder.attrib['ref']))
        # element replacement, using the list-style API instead of the private _children
        parent = parent_map[importPlaceholder]
        index = list(parent).index(importPlaceholder)
        parent[index] = importedElement
        if shallPushd:
            print("   popd: %s" % (old_dir))
            os.chdir(old_dir)  # popd
    return et

xmlET = getElementTree("input1.xml")
# encoding='unicode' (Python 3) returns str rather than bytes
print(ElementTree.tostring(xmlET, encoding='unicode'))
gives the output:
-- Processing file: 'input1.xml' in: 'C:\temp\testing'
-- Processing file: 'input2.xml' in: 'C:\temp\testing'
<root>
<foo>
<bar>blah blah</bar>
</foo><name awesome="true">Chuck</name>
</root>
This was put together with information from:
stackoverflow answer: access ElementTree node parent node
accessing parents from effbot.org