<?xml version="1.0" encoding="UTF-8" ?>
<uimap>
<page name="login">
<uielement name="username">
<locator>//input[@type='text']</locator>
</uielement>
<uielement name="password">
<locator>//input[@type='password']</locator>
</uielement>
</page>
</uimap>
Given an XML file like the one above, I would like to be able to call login.getlocator("username"), where login is an object representing the login page section of the XML and username is the name attribute of one of its uielement children. getlocator is just a function name; I will probably have to write it myself.
The objective is to get the value of the locator, i.e. the text contained in the matching locator element. Any suggestions on how I can get this going? I looked at BeautifulSoup, a Python library that can parse XML, but are there any other options?
One option would be to use lxml and dynamically construct an xpath expression:
from lxml import etree as ET
data = b"""<?xml version="1.0" encoding="UTF-8" ?>
<uimap>
<page name="login">
<uielement name="username">
<locator>//input[@type='text']</locator>
</uielement>
<uielement name="password">
<locator>//input[@type='password']</locator>
</uielement>
</page>
</uimap>
"""
tree = ET.fromstring(data)
page = 'login'
element = 'username'
print(tree.findtext('.//page[@name="{page}"]/uielement[@name="{element}"]/locator'.format(page=page, element=element)))
Prints:
//input[@type='text']
Then, you can improve it and extract into a reusable function, like:
def get_locator(tree, page, element):
    # The second argument to findtext() is the default returned when nothing matches.
    return tree.findtext('.//page[@name="{page}"]/uielement[@name="{element}"]/locator'.format(page=page, element=element), 'Not Found')
tree = ET.fromstring(data)
print(get_locator(tree, 'login', 'username'))
print(get_locator(tree, 'login', 'password'))
print(get_locator(tree, 'login', 'invalid element'))
Prints:
//input[@type='text']
//input[@type='password']
Not Found
Of course, this still can be improved, but I hope it at least gives you a basic idea.
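To get closer to the login.getlocator("username") call from the question, the lookup can be wrapped in a small class. This is only a sketch: Page and load_page are hypothetical names, not part of any library, and it uses the standard library's xml.etree.ElementTree (lxml's etree offers the same find/findtext calls). It compares the name attribute in Python instead of formatting it into the path, so a name containing quotes cannot break the expression.

```python
import xml.etree.ElementTree as ET

UIMAP_XML = """<uimap>
  <page name="login">
    <uielement name="username">
      <locator>//input[@type='text']</locator>
    </uielement>
    <uielement name="password">
      <locator>//input[@type='password']</locator>
    </uielement>
  </page>
</uimap>"""


class Page(object):
    """Hypothetical wrapper around one <page> element of the UI map."""

    def __init__(self, page_element):
        self._element = page_element

    def getlocator(self, name, default=None):
        # Scan the <uielement> children and compare the name attribute directly.
        for uielement in self._element.findall("uielement"):
            if uielement.get("name") == name:
                return uielement.findtext("locator", default)
        return default


def load_page(root, page_name):
    """Return a Page for the named <page>, or raise KeyError if it is missing."""
    for page in root.iter("page"):
        if page.get("name") == page_name:
            return Page(page)
    raise KeyError(page_name)


root = ET.fromstring(UIMAP_XML)
login = load_page(root, "login")
print(login.getlocator("username"))
print(login.getlocator("password"))
print(login.getlocator("missing", "Not Found"))
```

Raising KeyError for an unknown page but returning a default for an unknown element is a design choice made for this sketch; either behaviour could be swapped.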
Related
I would like to modify the value of one key inside an attribute (e.g. change the value of "strokeColor" inside the "style" attribute) without changing the other values in that attribute. I'm using the ElementTree module from Python's standard library.
Here is an example of what I did before:
Part of my XML example code:
<?xml version="1.0"?>
<mxCell edge="1" id="line1" parent="1" source="main_wins" style="endArrow=none;html=1;entryX=0;entryY=0.25;entryDx=0;entryDy=0;strokeWidth=5;strokeColor=#32AC2D;rounded=0;edgeStyle=orthogonalEdgeStyle;exitX=1;exitY=0.5;exitDx=0;exitDy=0;" target="main-switch" value="">
</mxCell>
My python code:
import xml.etree.ElementTree as ET
tree = ET.parse('example.xml')
target = tree.find('.//mxCell[@id="line1"]')
target.set("strokeColor","#FF0000")
tree.write('output.xml')
My output XML:
<?xml version="1.0"?>
<mxCell edge="1" id="line1" parent="1" source="main_wins" strokeColor="#FF0000" style="endArrow=none;html=1;entryX=0;entryY=0.25;entryDx=0;entryDy=0;strokeWidth=5;strokeColor=#32AC2D;rounded=0;edgeStyle=orthogonalEdgeStyle;exitX=1;exitY=0.5;exitDx=0;exitDy=0;" target="main-switch" value="">
</mxCell>
As you can see, this adds a new attribute called "strokeColor" instead of changing the strokeColor value inside the "style" attribute. I want to change the strokeColor inside the "style" attribute. How can I fix this?
Another method, using the simplified_scrapy library:
from simplified_scrapy import SimplifiedDoc, utils, req
html = '''
<?xml version="1.0"?>
<mxCell edge="1" id="line1" parent="1" source="main_wins" style="endArrow=none;html=1;entryX=0;entryY=0.25;entryDx=0;entryDy=0;strokeWidth=5;strokeColor=#32AC2D;rounded=0;edgeStyle=orthogonalEdgeStyle;exitX=1;exitY=0.5;exitDx=0;exitDy=0;" target="main-switch" value="">
</mxCell>
'''
doc = SimplifiedDoc(html)
mxCell = doc.select('mxCell#line1')
style = doc.replaceReg(mxCell['style'],'strokeColor=.*?;','strokeColor=#FF0000;')
mxCell.setAttr('style',style)
print(doc.html)
Result:
<?xml version="1.0"?>
<mxCell edge="1" id="line1" parent="1" source="main_wins" style="endArrow=none;html=1;entryX=0;entryY=0.25;entryDx=0;entryDy=0;strokeWidth=5;strokeColor=#FF0000;rounded=0;edgeStyle=orthogonalEdgeStyle;exitX=1;exitY=0.5;exitDx=0;exitDy=0;" target="main-switch" value="">
</mxCell>
Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
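The same edit can also be sketched with the standard library's ElementTree alone, which is what the question was using. The idea is to read the style attribute, treat it as a semicolon-separated list of key=value pairs, replace only the wanted key, and write the whole string back. set_style_key is a hypothetical helper name, and the XML is shortened and wrapped in a made-up root element for the example.

```python
import xml.etree.ElementTree as ET

xml_text = """<root>
<mxCell edge="1" id="line1" style="endArrow=none;html=1;strokeWidth=5;strokeColor=#32AC2D;rounded=0;" target="main-switch" value="" />
</root>"""


def set_style_key(style, key, value):
    """Return the style string with key set to value; other entries are kept as-is."""
    entries = [entry for entry in style.split(";") if entry]
    result = []
    found = False
    for entry in entries:
        name = entry.partition("=")[0]
        if name == key:
            result.append("{}={}".format(key, value))
            found = True
        else:
            result.append(entry)
    if not found:
        # The key was not present, so append it at the end.
        result.append("{}={}".format(key, value))
    return ";".join(result) + ";"


root = ET.fromstring(xml_text)
cell = root.find('.//mxCell[@id="line1"]')
# Update the existing style attribute instead of creating a new attribute.
cell.set("style", set_style_key(cell.get("style"), "strokeColor", "#FF0000"))
print(cell.get("style"))
```

Splitting on the separator avoids a regular expression and also handles the case where strokeColor is the last entry without a trailing semicolon.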
I'm trying to locate fields in a SOAP XML file using lxml (3.6.0):
...
<soap:Body>
<Request xmlns="http://localhost/">
<Test>
<field1>hello</field1>
<field2>world</field2>
</Test>
</Request>
</soap:Body>
...
In this example I'm trying to find field1 and field2.
I need to add the namespace to the search term to find the field:
print (myroot.find(".//{http://localhost/}field1").tag) # prints '{http://localhost/}field1'
Without it, I don't find anything:
print (myroot.find("field1")) # prints 'None'
Is there any other way to search for the field tag (here field1) without giving the namespace info?
Full example below:
from lxml import etree
example = b"""<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body><Request xmlns="http://localhost/">
<Test><field1>hello</field1><field2>world</field2></Test>
</Request></soap:Body></soap:Envelope>
"""
myroot = etree.fromstring(example)
# this works
print (myroot.find(".//{http://localhost/}field1").text)
print (myroot.find(".//{http://localhost/}field2").text)
# this fails
print (myroot.find(".//field1").text)
print (myroot.find("field1").text)
Comment: The input of the SOAP request is given; in real life I can't change any of it to make things easier.
There is a way to ignore the namespace when selecting an element using XPath, but that isn't good practice. The namespace is there for a reason. Anyway, there is a cleaner way to reference an element in a namespace: use a namespace prefix mapped to the namespace URI instead of writing out the actual namespace URI every time:
.....
>>> ns = {'d': 'http://localhost/'}
>>> print (myroot.find(".//d:field1", ns).text)
hello
>>> print (myroot.find(".//d:field2", ns).text)
world
I have the following XML document (the real one is much longer):
<?xml version ="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE fmresultset PUBLIC "-//FMI//DTD fmresultset//EN" "http://localhost:16020/fmi/xml/fmresultset.dtd">
<fmresultset xmlns="http://www.filemaker.com/xml/fmresultset" version="1.0">
<error code="0">
</error>
<product build="11/11/2014" name="FileMaker Web Publishing Engine" version="13.0.5.518">
</product>
I use the following python to extract some of the tag names:
from lxml import etree
doc = etree.fromstring(resulttxt)
print(doc.attrib)
print(doc.tag)
print(doc[4][0][0].tag)
if doc[4][0][0].tag == 'field':
    print('hi')
What I'm getting though is:
{'version': '1.0'}
{http://www.filemaker.com/xml/fmresultset}fmresultset
{http://www.filemaker.com/xml/fmresultset}field
The xmlns doesn't show up as an attribute of the root tag, but it is there.
It is also placed in front of each tag name, which makes it difficult to loop through the elements and use conditionals. I want doc.tag to show just the tag, not the namespace and the tag.
This is day 1 for me using this. Could anyone help out?
You need to handle namespaces, in your case a default (unprefixed) one:
from lxml import etree as ET
data = b"""<?xml version ="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE fmresultset PUBLIC "-//FMI//DTD fmresultset//EN" "http://localhost:16020/fmi/xml/fmresultset.dtd">
<fmresultset xmlns="http://www.filemaker.com/xml/fmresultset" version="1.0">
<error code="0">
</error>
<product build="11/11/2014" name="FileMaker Web Publishing Engine" version="13.0.5.518">
</product>
</fmresultset>
"""
namespaces = {
"myns": "http://www.filemaker.com/xml/fmresultset"
}
tree = ET.fromstring(data)
print(tree.find("myns:product", namespaces=namespaces).attrib.get("name"))
Prints:
FileMaker Web Publishing Engine
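If you only need the bare tag name for comparisons in a loop, as the question asks, one option is to strip the {namespace} part from element.tag yourself. Below is a minimal sketch using the standard library's ElementTree on a shortened copy of the document; local_name is a hypothetical helper name. With lxml, etree.QName(element).localname should give the same result.

```python
import xml.etree.ElementTree as ET

data = """<fmresultset xmlns="http://www.filemaker.com/xml/fmresultset" version="1.0">
  <error code="0"/>
  <product build="11/11/2014" name="FileMaker Web Publishing Engine" version="13.0.5.518"/>
</fmresultset>"""


def local_name(element):
    """Return the tag without its '{namespace}' prefix (hypothetical helper)."""
    # Parsed tags look like '{uri}name'; tags without a namespace are returned unchanged.
    return element.tag.rpartition("}")[2]


root = ET.fromstring(data)
print(root.tag)
print(local_name(root))

# Compare against plain names while looping over the children.
names = [local_name(child) for child in root]
print(names)
```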
Is there a way, using lxml, to search in a single pass for an element that occurs in a document both with and without a namespace? As an example, I would want to get all occurrences of the element identifier, irrespective of whether or not it is associated with a specific namespace. I am currently only able to access them separately, as below.
Code:
from lxml import etree
xmlfile = etree.parse('xmlfile.xml')
root = xmlfile.getroot()
for l in root.iter('identifier'):
    print(l.text)
for l in root.iter('{http://www.openarchives.org/OAI/2.0/provenance}identifier'):
    print(l.text)
File: xmlfile.xml
<?xml version="1.0"?>
<record>
<header>
<identifier>identifier1</identifier>
<datestamp>datastamp1</datestamp>
<setSpec>setspec1</setSpec>
</header>
<metadata>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>title1</dc:title>
<dc:title>title2</dc:title>
<dc:creator>creator1</dc:creator>
<dc:subject>subject1</dc:subject>
<dc:subject>subject2</dc:subject>
</oai_dc:dc>
</metadata>
<about>
<provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/provenance http://www.openarchives.org/OAI/2.0/provenance.xsd">
<originDescription altered="false" harvestDate="2011-08-11T03:47:51Z">
<baseURL>baseURL1</baseURL>
<identifier>identifier3</identifier>
<datestamp>datestamp2</datestamp>
<metadataNamespace>xxxxx</metadataNamespace>
<originDescription altered="false" harvestDate="2010-10-10T06:15:53Z">
<baseURL>xxxxx</baseURL>
<identifier>identifier4</identifier>
<datestamp>2010-04-27T01:10:31Z</datestamp>
<metadataNamespace>xxxxx</metadataNamespace>
</originDescription>
</originDescription>
</provenance>
</about>
</record>
You could use XPath to solve that kind of issue:
from lxml import etree
xmlfile = etree.parse('xmlfile.xml')
identifier_nodes = xmlfile.xpath("//*[local-name() = 'identifier']")
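For comparison, here is a sketch of the same idea with the standard library's ElementTree, which has no local-name() function but accepts a {*} namespace wildcard in find/findall paths (this assumes Python 3.8 or later). The XML is a trimmed copy of xmlfile.xml embedded as a string so the example is self-contained.

```python
import xml.etree.ElementTree as ET

xml_text = """<record>
  <header>
    <identifier>identifier1</identifier>
  </header>
  <about>
    <provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance">
      <originDescription>
        <identifier>identifier3</identifier>
        <originDescription>
          <identifier>identifier4</identifier>
        </originDescription>
      </originDescription>
    </provenance>
  </about>
</record>"""

root = ET.fromstring(xml_text)

# '{*}identifier' matches 'identifier' in any namespace or in no namespace.
identifiers = [element.text for element in root.findall(".//{*}identifier")]
print(identifiers)
```

Results come back in document order, so the un-namespaced identifier in the header is listed first.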
Please help me parse a configuration file of the prototype below using lxml etree. I tried iterating with for event, element and calling tostring. I don't need the text, though; I need the XML between
<template name>
<config>
</template>
for a given value of the name attribute.
I started with this code, but I get a KeyError while checking the attribute, since the iteration also visits elements that have no name attribute:
config_tree = etree.iterparse(token_template_file)
for event, element in config_tree:
    if element.attrib['name']=="ad auth":
        print ("attrib reached. get XML before child ends")
Since I am new to XML and Python, I am not sure how to go about it. Here is the config file:
<Templates>
<template name="config1">
<request>
<password>pass</password>
<userName>username</userName>
<appID>someapp</appID>
</request>
</template>
<template name="config2">
<request>
<password>pass1</password>
<userName>username1</userName>
<appID>someapp</appID>
</request>
</template>
</Templates>
Thanks in advance!
Expected Output:
Say the user requests config2; then the output should look like:
<request>
<password>pass1</password>
<userName>username1</userName>
<appID>someapp</appID>
</request>
(I send this XML using httplib2 to a server for initial authentication)
FINAL CODE:
Thanks to FC and Constantnius. Here is the final code:
config_tree = etree.parse(token_template_file)
for template in config_tree.iterfind("template"):
    if template.get("name") == "config2":
        element = etree.tostring(template.find("request"))
        print (template.get("name"))
        print (element)
output:
config2
<request>
<password>pass1</password>
<userName>username1</userName>
<appID>someapp</appID>
</request>
You could try to iterate over all template elements in the XML and parse them with the following code:
for template in root.iterfind("template"):
    name = template.get("name")
    request = template.find("request")
    password = template.findtext("request/password")
    username = ...
    ...
    # Do something with the values
To avoid the KeyError, you could use element.get('name', default='') instead of element.attrib['name'].
To get the text inside a tag, use .text
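If you do want to keep the streaming iterparse approach from the question, a sketch along these lines avoids the KeyError by using .get() and by checking the tag first. It uses the standard library's ElementTree and an in-memory copy of the config so it runs on its own; request_xml is a hypothetical function name, and lxml's iterparse and tostring should behave similarly.

```python
import io
import xml.etree.ElementTree as ET

CONFIG = b"""<Templates>
  <template name="config1">
    <request><password>pass</password><userName>username</userName><appID>someapp</appID></request>
  </template>
  <template name="config2">
    <request><password>pass1</password><userName>username1</userName><appID>someapp</appID></request>
  </template>
</Templates>"""


def request_xml(source, template_name):
    """Return the serialized <request> of the named template, or None if not found."""
    # iterparse yields 'end' events by default, so each element is complete
    # (all of its children are parsed) by the time it is seen here.
    for event, element in ET.iterparse(source):
        # .get() returns None for elements without a name attribute,
        # where element.attrib['name'] would raise KeyError.
        if element.tag == "template" and element.get("name") == template_name:
            request = element.find("request")
            if request is not None:
                # strip() removes the whitespace that follows the element in the file.
                return ET.tostring(request, encoding="unicode").strip()
    return None


result = request_xml(io.BytesIO(CONFIG), "config2")
print(result)
```

In real use you would pass the path of the template file (token_template_file in the question) instead of the BytesIO object.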