Parsing an XML configuration file using etree in Python

Please help me parse a configuration file like the one below using lxml etree. I tried iterparse with "for event, element" and tostring. I don't need the text, though; I need the XML between <template name="..."> and </template> (the <request> block) for a given name attribute.
I started with this code, but I get a KeyError while searching for the attribute, because iterparse visits every element from the start and most of them have no name attribute:
from lxml import etree
config_tree = etree.iterparse(token_template_file)
for event, element in config_tree:
    if element.attrib['name'] == "ad auth":
        print("attrib reached. get XML before child ends")
Since I am new to XML and Python, I am not sure how to go about it. Here is the config file:
<Templates>
<template name="config1">
<request>
<password>pass</password>
<userName>username</userName>
<appID>someapp</appID>
</request>
</template>
<template name="config2">
<request>
<password>pass1</password>
<userName>username1</userName>
<appID>someapp</appID>
</request>
</template>
</Templates>
Thanks in advance!
Expected Output:
Say the user requests config2; then the output should look like:
<request>
<password>pass1</password>
<userName>username1</userName>
<appID>someapp</appID>
</request>
(I send this XML using httplib2 to a server for initial authentication)
FINAL CODE:
Thanks to FC and Constantnius. Here is the final code:
from lxml import etree
config_tree = etree.parse(token_template_file)
for template in config_tree.iterfind("template"):
    if template.get("name") == "config2":
        # encoding="unicode" returns a str instead of bytes on Python 3
        element = etree.tostring(template.find("request"), encoding="unicode")
        print(template.get("name"))
        print(element)
output:
config2
<request>
<password>pass1</password>
<userName>username1</userName>
<appID>someapp</appID>
</request>
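For reference, here is a self-contained sketch of the same lookup using the standard library's xml.etree.ElementTree instead of lxml, with the configuration inlined as a string. It swaps the explicit loop for an ElementPath predicate, template[@name='...'], which selects the template by attribute directly. The helper name get_template_request is hypothetical, and the sketch assumes template names contain no quote characters.

```python
import xml.etree.ElementTree as ET

# The configuration from the question, inlined so the sketch runs on its own.
CONFIG = """<Templates>
<template name="config1">
<request>
<password>pass</password>
<userName>username</userName>
<appID>someapp</appID>
</request>
</template>
<template name="config2">
<request>
<password>pass1</password>
<userName>username1</userName>
<appID>someapp</appID>
</request>
</template>
</Templates>"""


def get_template_request(root, name):
    """Return the <request> subtree of template `name` as a string, or None.

    Hypothetical helper; assumes `name` contains no quote characters,
    because ElementPath predicates cannot escape them.
    """
    request = root.find(f"template[@name='{name}']/request")
    if request is None:
        return None
    # encoding="unicode" returns str instead of bytes; strip() drops the
    # whitespace tail that tostring() serializes after </request>.
    return ET.tostring(request, encoding="unicode").strip()


root = ET.fromstring(CONFIG)
print(get_template_request(root, "config2"))
```

Because the inlined XML has no indentation, this prints exactly the expected output shown in the question; an unknown name returns None instead of raising.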

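You could try to iterate over all template elements in the XML and read their values with the following code:
from lxml import etree
root = etree.parse(token_template_file).getroot()
for template in root.iterfind("template"):
    name = template.get("name")
    request = template.find("request")
    # findtext returns the text of the first matching element, or None
    password = template.findtext("request/password")
    username = template.findtext("request/userName")
    app_id = template.findtext("request/appID")
    # Do something with the values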
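One way to "do something with the values" is to collect them into a dictionary keyed by template name. The sketch below assumes the standard library's ElementTree rather than lxml (the find/findtext calls behave the same here) and a trimmed copy of the question's config; load_templates is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

CONFIG = """<Templates>
<template name="config1"><request><password>pass</password><userName>username</userName><appID>someapp</appID></request></template>
<template name="config2"><request><password>pass1</password><userName>username1</userName><appID>someapp</appID></request></template>
</Templates>"""


def load_templates(root):
    """Map each template name to a dict of its request fields (hypothetical helper)."""
    templates = {}
    for template in root.iterfind("template"):
        templates[template.get("name")] = {
            # findtext returns the child's text, or None if the child is missing
            "password": template.findtext("request/password"),
            "userName": template.findtext("request/userName"),
            "appID": template.findtext("request/appID"),
        }
    return templates


templates = load_templates(ET.fromstring(CONFIG))
print(templates["config2"])
```

Parsing once into a dict like this makes later lookups cheap if several templates are needed.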

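You could try using element.get('name', default='') instead of element.attrib['name']; get() returns the default instead of raising KeyError when an element has no such attribute.
To get the text inside a tag, use .text.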
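Applied to the original iterparse attempt, get() plus a tag check removes the KeyError. This is a minimal sketch using the standard library's ElementTree.iterparse (lxml's etree.iterparse can be called the same way here), assuming a byte string stands in for the config file; find_request_streaming is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET

CONFIG = b"""<Templates>
<template name="config1"><request><userName>username</userName></request></template>
<template name="config2"><request><userName>username1</userName></request></template>
</Templates>"""


def find_request_streaming(source, name):
    """Return the <request> element of the named template, or None.

    `source` is a filename or binary file object (hypothetical helper).
    """
    # "end" events fire once an element and all its children are parsed.
    for event, element in ET.iterparse(source, events=("end",)):
        # Only <template> carries a name; get() returns None elsewhere
        # instead of raising KeyError like element.attrib['name'] does.
        if element.tag == "template" and element.get("name") == name:
            return element.find("request")
    return None


request = find_request_streaming(io.BytesIO(CONFIG), "config2")
print(request.findtext("userName"))
```

Streaming only pays off for large files; for a small config, parsing the whole tree as in the final code is simpler.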

Related

How to modify XML element using Python Elementtree

I would like to modify one key's value inside an attribute (e.g. change the value of "strokeColor" inside the "style" attribute) without changing the other values in that attribute. I'm using the ElementTree module included with Python.
Here is an example of what I did before:
Part of my XML example code:
<?xml version="1.0"?>
<mxCell edge="1" id="line1" parent="1" source="main_wins" style="endArrow=none;html=1;entryX=0;entryY=0.25;entryDx=0;entryDy=0;strokeWidth=5;strokeColor=#32AC2D;rounded=0;edgeStyle=orthogonalEdgeStyle;exitX=1;exitY=0.5;exitDx=0;exitDy=0;" target="main-switch" value="">
</mxCell>
My Python code:
import xml.etree.ElementTree as ET
tree = ET.parse('example.xml')
target = tree.find('.//mxCell[@id="line1"]')
target.set("strokeColor","#FF0000")
tree.write('output.xml')
My output XML:
<?xml version="1.0"?>
<mxCell edge="1" id="line1" parent="1" source="main_wins" strokeColor="#FF0000" style="endArrow=none;html=1;entryX=0;entryY=0.25;entryDx=0;entryDy=0;strokeWidth=5;strokeColor=#32AC2D;rounded=0;edgeStyle=orthogonalEdgeStyle;exitX=1;exitY=0.5;exitDx=0;exitDy=0;" target="main-switch" value="">
</mxCell>
As you can see, a new attribute called "strokeColor" was added, but the strokeColor value inside the "style" attribute did not change. I want to change the strokeColor inside the "style" attribute. How can I fix this?
Another method, using the third-party simplified_scrapy package:
from simplified_scrapy import SimplifiedDoc, utils, req
html = '''
<?xml version="1.0"?>
<mxCell edge="1" id="line1" parent="1" source="main_wins" style="endArrow=none;html=1;entryX=0;entryY=0.25;entryDx=0;entryDy=0;strokeWidth=5;strokeColor=#32AC2D;rounded=0;edgeStyle=orthogonalEdgeStyle;exitX=1;exitY=0.5;exitDx=0;exitDy=0;" target="main-switch" value="">
</mxCell>
'''
doc = SimplifiedDoc(html)
mxCell = doc.select('mxCell#line1')
style = doc.replaceReg(mxCell['style'],'strokeColor=.*?;','strokeColor=#FF0000;')
mxCell.setAttr('style',style)
print(doc.html)
Result:
<?xml version="1.0"?>
<mxCell edge="1" id="line1" parent="1" source="main_wins" style="endArrow=none;html=1;entryX=0;entryY=0.25;entryDx=0;entryDy=0;strokeWidth=5;strokeColor=#FF0000;rounded=0;edgeStyle=orthogonalEdgeStyle;exitX=1;exitY=0.5;exitDx=0;exitDy=0;" target="main-switch" value="">
</mxCell>
Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
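With only the standard library, the usual approach is to treat style as a plain string: read it with get(), split it on ";", replace the strokeColor pair, and write it back with set(). This is a sketch under that assumption; set_style_property is a hypothetical helper, and it assumes the style string is a flat list of key=value pairs as in the question.

```python
import xml.etree.ElementTree as ET

STYLE = ("endArrow=none;html=1;entryX=0;entryY=0.25;entryDx=0;entryDy=0;strokeWidth=5;"
         "strokeColor=#32AC2D;rounded=0;edgeStyle=orthogonalEdgeStyle;exitX=1;exitY=0.5;"
         "exitDx=0;exitDy=0;")
XML = f'<mxCell edge="1" id="line1" parent="1" source="main_wins" style="{STYLE}" target="main-switch" value=""></mxCell>'


def set_style_property(element, key, value):
    """Set one key=value pair inside a semicolon-separated style attribute,
    keeping the other pairs and their order (hypothetical helper)."""
    parts = [p for p in element.get("style", "").split(";") if p]
    for i, part in enumerate(parts):
        if part.partition("=")[0] == key:
            parts[i] = f"{key}={value}"
            break
    else:
        parts.append(f"{key}={value}")  # key was absent: add it
    # The original style string ends with ';', so keep that convention.
    element.set("style", ";".join(parts) + ";")


root = ET.fromstring(XML)
# root.iter() includes the root itself, unlike find('.//mxCell[...]'),
# so this also works when mxCell is nested deeper in a real file.
cell = next(e for e in root.iter("mxCell") if e.get("id") == "line1")
set_style_property(cell, "strokeColor", "#FF0000")
print(cell.get("style"))
```

After the change, tree.write() would save it as in the question; no extra strokeColor attribute is created.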

lxml: How do I search for fields without adding a xmlns (localhost) path to each search term?

I'm trying to locate fields in a SOAP xml file using lxml (3.6.0)
...
<soap:Body>
<Request xmlns="http://localhost/">
<Test>
<field1>hello</field1>
<field2>world</field2>
</Test>
</Request>
</soap:Body>
...
In this example I'm trying to find field1 and field2.
I need to add the namespace path to the search term to find the field:
print(myroot.find(".//{http://localhost/}field1").tag)  # prints '{http://localhost/}field1'
Without it, I don't find anything:
print(myroot.find("field1").tag)  # find() returns None, so .tag raises AttributeError
Is there any other way to search for the field tag (here field1) without giving the namespace path info?
Full example below:
from lxml import etree
example = """<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body><Request xmlns="http://localhost/">
<Test><field1>hello</field1><field2>world</field2></Test>
</Request></soap:Body></soap:Envelope>
"""
myroot = etree.fromstring(example)
# this works
print (myroot.find(".//{http://localhost/}field1").text)
print (myroot.find(".//{http://localhost/}field2").text)
# this fails
print (myroot.find(".//field1").text)
print (myroot.find("field1").text)
Comment: The input of the SOAP request is given; I can't change any of it in real life to make things easier.
There is a way to ignore the namespace when selecting elements with XPath, but that isn't good practice: the namespace is there for a reason. There is, however, a cleaner way to reference an element in a namespace: use a namespace prefix mapped to the namespace URI, instead of writing the full URI every time:
.....
>>> ns = {'d': 'http://localhost/'}
>>> print (myroot.find(".//d:field1", ns).text)
hello
>>> print (myroot.find(".//d:field2", ns).text)
world
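For comparison, the standard library's ElementTree accepts the same prefix map, and since Python 3.8 it also supports a {*} wildcard that matches a tag in any namespace, which is the "ignore the namespace" option mentioned above. This sketch uses ElementTree instead of lxml and a trimmed copy of the SOAP example.

```python
import xml.etree.ElementTree as ET

EXAMPLE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body><Request xmlns="http://localhost/">
<Test><field1>hello</field1><field2>world</field2></Test>
</Request></soap:Body></soap:Envelope>"""

root = ET.fromstring(EXAMPLE)

# Prefix map: the prefix "d" is arbitrary; only the URI has to match.
ns = {"d": "http://localhost/"}
print(root.find(".//d:field1", ns).text)

# Wildcard namespace (Python 3.8+): matches field2 in any namespace.
print(root.find(".//{*}field2").text)

# A bare ".//field1" means "field1 in no namespace", so it finds nothing.
print(root.find(".//field1"))
```

The prefix map keeps the namespace explicit; the wildcard is shorter but would also match a field1 from an unrelated namespace.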

Modify XML node using Python

I am taking an XML file as input and have to search it for a keyword, e.g. GENTEST05.
If it is found, I need to pick up its parent node (in this example, <ScriptElement>) and replace the complete node <ScriptElement>blahblah</ScriptElement> with new content.
...
...
<ScriptElement>
<ScriptElement>
<ScriptElement>
<ElementData xsi:type="anyData">
<DeltaTime>
<Area>
<Datatype>USER PROMPT [GENTEST05]</Datatype>
<Description />
<Multipartmessage>False<Multipartmessage>
<Comment>false</Comment>
</ElementData>
</ScriptElement>
<ScriptElement>
<ScriptElement>
...
...
...
I am trying to do this using BeautifulSoup. This is what I've done so far, but I can't find a proper way to proceed. Besides BeautifulSoup, ElementTree or any other suggestion is welcome.
import sys
from BeautifulSoup import BeautifulStoneSoup as bs
xmlsoup = bs(open('file_xml' , 'r'))
a = raw_input('Enter Text')
paraText = xmlsoup.findAll(text=a)
print paraText
print paraText.findParent()
OK, here is some sample code to get you started. I used ElementTree because it's a built-in module and quite suitable for this type of task.
Here is the XML file I used:
<?xml version="1.0" ?>
<Script>
<ScriptElement/>
<ScriptElement/>
<ScriptElement>
<ElementData>
<DeltaTime/>
<Area/>
<Datatype>USER PROMPT [GENTEST05]</Datatype>
<Description/>
<Multipartmessage>False</Multipartmessage>
<Comment>false</Comment>
</ElementData>
</ScriptElement>
<ScriptElement/>
<ScriptElement/>
</Script>
Here is the Python program:
import sys
import xml.etree.ElementTree as ElementTree
tree = ElementTree.parse("test.xml")
root = tree.getroot()
# The keyword to find and remove
keyword = "GENTEST05"
# Iterate over a copy of the children, since root is modified in the loop
for element in list(root):
    # element.iter() walks element itself and all of its descendants
    for sub in element.iter():
        if sub.text and keyword in sub.text:
            root.remove(element)
            print(ElementTree.tostring(root, encoding="unicode"))
            sys.exit()
I have kept the program simple so that you can improve on it. Since your XML has one root node, I am assuming you want to remove everything from the keyword-matched element up to, but not including, the root. In ElementTree, you can call root.remove() to remove the <ScriptElement> element that is the ancestor of the keyword-matched element.
This is just to get you started: it only removes the first matching element, then prints the resulting tree and quits.
Output:
<Script>
<ScriptElement />
<ScriptElement />
<ScriptElement />
<ScriptElement />
</Script>
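The question actually asks to replace the matching <ScriptElement> with new content rather than delete it. ElementTree elements support item assignment (parent[index] = new_element), so the same search can swap the node in place. In this sketch, replace_matching_children and new_script_element are hypothetical names, the XML is a trimmed copy of the answer's file, and the replacement content is a placeholder.

```python
import xml.etree.ElementTree as ET

SCRIPT = """<Script>
  <ScriptElement/>
  <ScriptElement>
    <ElementData>
      <Datatype>USER PROMPT [GENTEST05]</Datatype>
      <Comment>false</Comment>
    </ElementData>
  </ScriptElement>
  <ScriptElement/>
</Script>"""


def replace_matching_children(root, keyword, make_replacement):
    """Replace each direct child of `root` whose subtree has `keyword` in some
    element's text; returns the number of replacements (hypothetical helper)."""
    count = 0
    for index, child in enumerate(list(root)):
        if any(sub.text and keyword in sub.text for sub in child.iter()):
            new = make_replacement()
            new.tail = child.tail  # keep the original whitespace after the node
            root[index] = new      # swap the node in place, same position
            count += 1
    return count


def new_script_element():
    # Placeholder content; substitute whatever the new node should contain.
    elem = ET.Element("ScriptElement")
    ET.SubElement(elem, "Datatype").text = "REPLACED"
    return elem


root = ET.fromstring(SCRIPT)
count = replace_matching_children(root, "GENTEST05", new_script_element)
print(ET.tostring(root, encoding="unicode"))
```

To write the result back to disk, wrap the root in ET.ElementTree(root) and call write() on it.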

Python function for getting locator values

<?xml version="1.0" encoding="UTF-8" ?>
<uimap>
<page name="login">
<uielement name="username">
<locator>//input[@type='text']</locator>
</uielement>
<uielement name="password">
<locator>//input[@type='password']</locator>
Given an XML file like the one above, what I am trying to get to is a call like
login.getlocator("username"), where login is an object representing an XML section and username is an attribute of that section. getlocator is just a function name that I will probably have to write.
The objective is to get the value of the locator (I mean the text of the locator element under login). Any suggestions on how I can get this going? I looked at BeautifulSoup for XML parsing, but are there any other options?
One option would be to use lxml and dynamically construct an xpath expression:
from lxml import etree as ET
# A bytes literal: on Python 3, lxml rejects str input that carries an encoding declaration
data = b"""<?xml version="1.0" encoding="UTF-8" ?>
<uimap>
<page name="login">
<uielement name="username">
<locator>//input[@type='text']</locator>
</uielement>
<uielement name="password">
<locator>//input[@type='password']</locator>
</uielement>
</page>
</uimap>
"""
tree = ET.fromstring(data)
page = 'login'
element = 'username'
print(tree.findtext('.//page[@name="{page}"]/uielement[@name="{element}"]/locator'.format(page=page, element=element)))
Prints:
//input[@type='text']
Then, you can improve it and extract it into a reusable function, like:
def get_locator(tree, page, element):
    # findtext returns the default ('Not Found') when the path matches nothing
    return tree.findtext('.//page[@name="{page}"]/uielement[@name="{element}"]/locator'.format(page=page, element=element), 'Not Found')
tree = ET.fromstring(data)
print(get_locator(tree, 'login', 'username'))
print(get_locator(tree, 'login', 'password'))
print(get_locator(tree, 'login', 'invalid element'))
Prints:
//input[@type='text']
//input[@type='password']
Not Found
Of course, this still can be improved, but I hope it at least gives you a basic idea.
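To get the login.getlocator("username") style from the question, the lookup can be wrapped in a small class per page. This sketch uses the standard library's ElementTree instead of lxml; the Page class, its getlocator method and load_pages are hypothetical names chosen to mirror the question, and each page's locators are read into a dict once.

```python
import xml.etree.ElementTree as ET

UIMAP = """<uimap>
  <page name="login">
    <uielement name="username"><locator>//input[@type='text']</locator></uielement>
    <uielement name="password"><locator>//input[@type='password']</locator></uielement>
  </page>
</uimap>"""


class Page:
    """Wraps one <page> element so locators can be looked up by name
    (hypothetical class mirroring login.getlocator("username"))."""

    def __init__(self, element):
        self.name = element.get("name")
        # uielement name -> locator text, built once per page
        self._locators = {
            ui.get("name"): ui.findtext("locator")
            for ui in element.iterfind("uielement")
        }

    def getlocator(self, name, default=None):
        return self._locators.get(name, default)


def load_pages(root):
    """Return {page name: Page} for every <page> under the root."""
    return {p.get("name"): Page(p) for p in root.iterfind("page")}


pages = load_pages(ET.fromstring(UIMAP))
login = pages["login"]
print(login.getlocator("username"))
```

Reading everything up front suits a small UI map; for very large files the per-call findtext approach above avoids holding the dict.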

List only one category Python xml

I am trying to write a Python program that uses DOM to read an XML file and print another XML structure that lists only the nodes whose category attribute is "fun".
<?xml version="1.0" encoding="ISO-8859-1"?>
<website>
<url category="fun">
<title>Fun world</title>
<author>Jack</author>
<year>2010</year>
<price>100.00</price>
</url>
<url category="entertainment">
<title>Fun world</title>
<author>Jack</author>
<year>2010</year>
<price>100.00</price>
</url>
</website>
I couldn't select only the url elements that have category="fun".
I tried this code:
for n in dom.getElementsByTagName('url'):
    s = n.attribute['category']
    if (s.value == "fun"):
        print n.toxml()
Can you help me debug my code?
nb: XML tag names are case-sensitive, so a tag opened as "Website" cannot be closed as "website"; make sure they match.
You've mentioned lxml:
from lxml import etree as et
# xml holds the document as bytes; lxml rejects str input with an encoding declaration
root = et.fromstring(xml)
fun = root.xpath('/website/url[@category="fun"]')
for node in fun:
    print(et.tostring(node, encoding="unicode"))
Use getAttribute:
for n in dom.getElementsByTagName('url'):
    if n.getAttribute('category') == "fun":
        print(n.toxml())
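If the goal is a new XML document containing only the matching nodes, the standard library's ElementTree can do the filtering with an attribute predicate and build the output tree. This is a sketch with a trimmed copy of the question's XML; urls_in_category and category_document are hypothetical helper names, and copies are used so the source tree is left untouched.

```python
import copy
import xml.etree.ElementTree as ET

WEBSITE = """<website>
  <url category="fun"><title>Fun world</title><author>Jack</author></url>
  <url category="entertainment"><title>Fun world</title><author>Jack</author></url>
</website>"""


def urls_in_category(root, category):
    """Return the <url> children whose category attribute equals `category`."""
    return root.findall(f"url[@category='{category}']")


def category_document(root, category):
    """Serialize a new root element holding copies of the matching <url> nodes."""
    out = ET.Element(root.tag)
    out.extend([copy.deepcopy(u) for u in urls_in_category(root, category)])
    return ET.tostring(out, encoding="unicode")


root = ET.fromstring(WEBSITE)
doc = category_document(root, "fun")
print(doc)
```

The predicate url[@category='fun'] plays the same role as the XPath in the lxml answer, but works with the built-in module.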
