Remove element in a XML file with Python - python

I'm a newbie with Python and I'd like to remove the element openingHours and the child elements from the XML.
I have this input
<Root>
<stations>
<station id= "1">
<name>whatever</name>
<openingHours>
<openingHour>
<entrance>main</entrance>
<timeInterval>
<from>05:30</from>
<to>21:30</to>
</timeInterval>
<openingHour/>
<openingHours>
<station/>
<station id= "2">
<name>foo</name>
<openingHours>
<openingHour>
<entrance>main</entrance>
<timeInterval>
<from>06:30</from>
<to>21:30</to>
</timeInterval>
<openingHour/>
<openingHours>
<station/>
<stations/>
<Root/>
I'd like this output
<Root>
<stations>
<station id= "1">
<name>whatever</name>
<station/>
<station id= "2">
<name>foo</name>
<station/>
<stations/>
<Root/>
So far I've tried this from another thread How to remove elements from XML using Python
from lxml import etree
doc=etree.parse('stations.xml')
for elem in doc.xpath('//*[attribute::openingHour]'):
parent = elem.getparent()
parent.remove(elem)
print(etree.tostring(doc))
However, It doesn't seem to be working.
Thanks

I took your code for a spin but at first Python couldn't agree with the way you composed your XML, wanting the / in the closing tag to be at the beginning (like </...>) instead of at the end (<.../>).
That aside, the reason your code isn't working is because the xpath expression is looking for the attribute openingHour while in reality you want to look for elements called openingHours. I got it to work by changing the expression to //openingHours. Making the entire code:
from lxml import etree
doc=etree.parse('stations.xml')
for elem in doc.xpath('//openingHours'):
parent = elem.getparent()
parent.remove(elem)
print(etree.tostring(doc))

You want to remove the tags <openingHours> and not some attribute with name openingHour:
from lxml import etree
doc = etree.parse('stations.xml')
for elem in doc.findall('.//openingHours'):
parent = elem.getparent()
parent.remove(elem)
print(etree.tostring(doc))

Related

How can I print XPaths of lxml tree elements?

I'm trying to print XPaths of all elements in XML tree, but I get strange output when using lxml. Instead of xpath which contains name of each node within path, I get strange "*"-kind of output.
Do you know what might be the issue here? Here the code, as well as XML I am trying to analyze.
from lxml import etree
xml = """
<filter xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<bundles xmlns="http://cisco.com/ns/yang/Cisco-IOS-XR-bundlemgr-oper">
<bundles>
<bundle>
<data>
<bundle-status/>
<lacp-status/>
<minimum-active-links/>
<ipv4bfd-status/>
<active-member-count/>
<active-member-configured/>
</data>
<members>
<member>
<member-interface/>
<interface-name/>
<member-mux-data>
<member-state/>
</member-mux-data>
</member>
</members>
<bundle-interface>{{bundle_name}}</bundle-interface>
</bundle>
</bundles>
</bundles>
<bfd xmlns="http://cisco.com/ns/yang/Cisco-IOS-XR-ip-bfd-oper">
<session-briefs>
<session-brief>
<state/>
<interface-name>{{bundle_name}}</interface-name>
</session-brief>
</session-briefs>
</bfd>
</filter>
"""
root = etree.XML(xml)
tree = etree.ElementTree(root)
for element in root.iter():
print(tree.getpath(element))
The output looks like this (there should be node names instead of "*"):
/*
/*/*[1]
/*/*[1]/*
/*/*[1]/*/*
/*/*[1]/*/*/*[1]
/*/*[1]/*/*/*[1]/*[1]
/*/*[1]/*/*/*[1]/*[2]
/*/*[1]/*/*/*[1]/*[3]
/*/*[1]/*/*/*[1]/*[4]
/*/*[1]/*/*/*[1]/*[5]
/*/*[1]/*/*/*[1]/*[6]
/*/*[1]/*/*/*[2]
/*/*[1]/*/*/*[2]/*
/*/*[1]/*/*/*[2]/*/*[1]
/*/*[1]/*/*/*[2]/*/*[2]
/*/*[1]/*/*/*[2]/*/*[3]
/*/*[1]/*/*/*[2]/*/*[3]/*
/*/*[1]/*/*/*[3]
/*/*[2]
/*/*[2]/*
/*/*[2]/*/*
/*/*[2]/*/*/*[1]
/*/*[2]/*/*/*[2]
Thanks a lot!
Dragan
I found that besides getpath, etree contains also a "sibling"
method called getelementpath, giving proper result also for
namespaced elements.
So change your code to:
for element in root.iter():
print(tree.getelementpath(element))
For your source sample, with namespaces shortened for readability,
the initial part of the result is:
.
{http://cisco.com/ns}bundles
{http://cisco.com/ns}bundles/{http://cisco.com/ns}bundles

Python -lxml xpath returns empty list

I am reading an xliff file and planning to retrieve specific element. I tried to print all the elements using
from lxml import etree
with open('path\to\file\.xliff', 'r',encoding = 'utf-8') as xml_file:
tree = etree.parse(xml_file)
root = tree.getroot()
for element in root.iter():
print("child", element)
The output was
child <Element {urn:oasis:names:tc:xliff:document:2.0}segment at 0x6b8f9c8>
child <Element {urn:oasis:names:tc:xliff:document:2.0}source at 0x6b8f908>
When I tried to get the specific element (with the help of many posts here) - source tag
segment = tree.xpath('{urn:oasis:names:tc:xliff:document:2.0}segment')
print(segment)
it returns an empty list. Can someone tell me how to retrieve it properly.
Input :
<?xml version='1.0' encoding='UTF-8'?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0">
<segment id = 1>
<source>
Hello world
</source>
</segment>
<segment id = 2 >
<source>
2nd statement
</source>
</segment>
</xliff>
I want to get the values of segment and its corresponding source
This code,
tree.xpath('{urn:oasis:names:tc:xliff:document:2.0}segment')
is not accepted by lxml ("lxml.etree.XPathEvalError: Invalid expression"). You need to use findall().
The following works (in the XML sample, the segment elements are children of xliff):
from lxml import etree
tree = etree.parse("test.xliff") # XML in the question; ill-formed attributes corrected
segment = tree.findall('{urn:oasis:names:tc:xliff:document:2.0}segment')
print(segment)
However, the real XML is apparently more complex (segment is not a direct child of xliff). Then you need to add .// to search the whole tree:
segment = tree.findall('.//{urn:oasis:names:tc:xliff:document:2.0}segment')

Python xml etree find parent node by text of child

I have an XML that's like this
<xml>
<access>
<user>
<name>user1</name>
<group>testgroup</group>
</user>
<user>
<name>user2</name>
<group>testgroup</group>
</user>
<access>
</xml>
I now want to add a <group>testgroup2</group> to the user1 subtree.
Using the following I can get the name
access = root.find('access')
name = [element for element in access.iter() if element.text == 'user1']
But I can't access the parent using name.find('..') it tells me
AttributeError: 'list' object has no attribute 'find'.
Is there any possibility to access the exact <user> child of <access> where the text in name is "user1"?
Expected result:
<xml>
<access>
<user>
<name>user1</name>
<group>testgroup</group>
<group>testgroup2</group>
</user>
<user>
<name>user2</name>
<group>testgroup</group>
</user>
<access>
</xml>
Important notice: I can NOT use lxml to use getparent() method, I am stuck to xml.etree
To do that, using 'find', you need to do like this: for ele in name:
ele.find('..') # To access ele as an element
Here is how I solved this, if anyone is interested in doing this stuff in xml instead of lxml (why ever).
According to suggestion from
http://effbot.org/zone/element.htm#accessing-parents
import xml.etree.ElementTree as et
tree = et.parse(my_xmlfile)
root = tree.getroot()
access = root.find('access')
# ... snip ...
def iterparent(tree):
for parent in tree.getiterator():
for child in parent:
yield parent, child
# users = list of user-names that need new_group added
# iter through tupel and find the username
# alter xml tree when found
for user in users:
print "processing user: %s" % user
for parent, child in iterparent(access):
if child.tag == "name" and child.text == user:
print "Name found: %s" % user
parent.append(et.fromstring('<group>%s</group>' % new_group))
After this et.dump(tree) shows that tree now contains the correctly altered user-subtree with another group tag added.
Note: I am not really sure why this works, I just expect that yield gives a reference to the tree and therefore altering the parent yield returned alters the original tree. My python knowledge is not good enough to be sure about this tho. I just know that it works for me this way.
You can write a recursive method to iterate through the tree and capture the parents.
def recurse_tree(node):
for child in node.getchildren():
if child.text == 'user1':
yield node
for subchild in recurse_tree(child):
yield subchild
print list(recurse_tree(root))
# [<Element 'user' at 0x18a1470>]
If you're using Python 3.X, you can use the nifty yield from ... syntax rather than iterating over the recursive call.
Note that this could possibly yield the same node more than once (if there are multiple children containing the target text). You can either use a set to remove duplicates, or you can alter the control flow to prevent this from happening.
you can directly use findall() method to get the parent node that match the name='user1'. see below code
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml') #build tree object using your xml
root = tree.getroot() #using tree object get the root
for parent in root.findall(".//*[name='user1']"):
# the predicate [name='user1'] preceded by asterisk will give
# all elements where child having name='user1'
parent.append(ET.fromstring("<group>testgroup2</group>"))
# if you want to see the xml after adding the string
ET.dump(root)
# optionally to save the xml
tree.write('output.xml')

XML parsing specific values - Python

I've been attempting to parse a list of xml files. I'd like to print specific values such as the userName value.
<?xml version="1.0" encoding="utf-8"?>
<Drives clsid="{8FDDCC1A-0C3C-43cd-A6B4-71A6DF20DA8C}"
disabled="1">
<Drive clsid="{935D1B74-9CB8-4e3c-9914-7DD559B7A417}"
name="S:"
status="S:"
image="2"
changed="2007-07-06 20:57:37"
uid="{4DA4A7E3-F1D8-4FB1-874F-D2F7D16F7065}">
<Properties action="U"
thisDrive="NOCHANGE"
allDrives="NOCHANGE"
userName=""
cpassword=""
path="\\scratch"
label="SCRATCH"
persistent="1"
useLetter="1"
letter="S"/>
</Drive>
</Drives>
My script is working fine collecting a list of xml files etc. However the below function is to print the relevant values. I'm trying to achieve this as suggested in this post. However I'm clearly doing something incorrectly as I'm getting errors suggesting that elm object has no attribute text. Any help would be appreciated.
Current Code
from lxml import etree as ET
def read_files(files):
for fi in files:
doc = ET.parse(fi)
elm = doc.find('userName')
print elm.text
doc.find looks for a tag with the given name. You are looking for an attribute with the given name.
elm.text is giving you an error because doc.find doesn't find any tags, so it returns None, which has no text property.
Read the lxml.etree docs some more, and then try something like this:
doc = ET.parse(fi)
root = doc.getroot()
prop = root.find(".//Properties") # finds the first <Properties> tag anywhere
elm = prop.attrib['userName']
userName is an attribute, not an element. Attributes don't have text nodes attached to them at all.
for el in doc.xpath('//*[#userName]'):
print el.attrib['userName']
You can try to take the element using the tag name and then try to take its attribute (userName is an attribute for Properties):
from lxml import etree as ET
def read_files(files):
for fi in files:
doc = ET.parse(fi)
props = doc.getElementsByTagName('Properties')
elm = props[0].attributes['userName']
print elm.value

Python lxml: Items without .text attribute returned when querying for nodes()

I am trying to parse out certain tags from an XML document and it is retiring an AttributeError: '_ElementStringResult' object has no attribute 'text' error.
Here is the xml document:
<?xml version='1.0' encoding='ASCII'?>
<Root>
<Data>
<FormType>Log</FormType>
<Submitted>2012-03-19 07:34:07</Submitted>
<ID>1234</ID>
<LAST>SJTK4</LAST>
<Latitude>36.7027777778</Latitude>
<Longitude>-108.046111111</Longitude>
<Speed>0.0</Speed>
</Data>
</Root>
Here is the code I am using
from lxml import etree
from StringIO import StringIO
import MySQLdb
import glob
import os
import shutil
import logging
import sys
localPath = "C:\data"
xmlFiles = glob.glob1(localPath,"*.xml")
for file in xmlFiles:
a = os.path.join(localPath,file)
element = etree.parse(a)
Data = element.xpath('//Root/Data/node()')
parsedData = [{field.tag: field.text for field in Data} for action in Data]
print parsedData #AttributeError: '_ElementStringResult' object has no attribute 'text'
'//Root/Data/node()' will return a list of all the child elements which include text elements as strings which will not have a text attribute. If you put a print right after the Data = ... you will see something like ['\n ', <Element FormType at 0x10675fdc0>, '\n ', ....
I would do a filter first such as:
Data = [f for f in elem.xpath('//Root/Data/node()') if hasattr(f, 'text')]
Then I think the following line could be rewritten as:
parsedData = {field.tag: field.text for field in Data}
which will give the element tag and text dictionary which I believe is what you want.
Instead of querying for //Root/Data/node(), query for /Root/Data/* if you want only elements (as opposed to text nodes) to be returned. (Also, using only a single leading / rather than // allows the engine to do a cheaper search, rather than needing to look through the whole subtree for an additional Root.
Also -- are you sure you really want to loop through the entire list of subelements of Data inside your inner loop, rather than looping over only the subelements of a single Data element selected by your outer loop? I think your logic is broken, though it would only be visible if you had a file with more than one Data element under Root.

Categories

Resources