I am trying to write a Python program that uses the DOM to read an XML file and print a new XML structure containing only the node whose category attribute is "fun".
<?xml version="1.0" encoding="ISO-8859-1"?>
<website>
<url category="fun">
<title>Fun world</title>
<author>Jack</author>
<year>2010</year>
<price>100.00</price>
</url>
<url category="entertainment">
<title>Fun world</title>
<author>Jack</author>
<year>2010</year>
<price>100.00</price>
</url>
</website>
I can't select the url element that has category="fun".
I tried this code:
for n in dom.getElementsByTagName('url'):
    s = n.attribute['category']
    if (s.value == "fun"):
        print n.toxml()
Can you help me debug my code?
nb: One of your tags opens "Website" and attempts to close "website" - so you'll want to fix that one...
You've mentioned lxml.
from lxml import etree as et
root = et.fromstring(xml)
# XPath selects attributes with "@", not "#"
fun = root.xpath('/Website/url[@category="fun"]')
for node in fun:
    print(et.tostring(node))
Use getAttribute:
for n in dom.getElementsByTagName('url'):
    if (n.getAttribute('category') == "fun"):
        print(n.toxml())
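For reference, here is a minimal self-contained sketch of the getAttribute approach using the standard library's minidom. The sample XML is a shortened version of the question's data, and the helper name urls_with_category is made up for illustration.

```python
from xml.dom import minidom

# Shortened sample modeled on the question (assumption: lowercase <website> root).
SAMPLE_XML = """<website>
<url category="fun"><title>Fun world</title></url>
<url category="entertainment"><title>Fun world</title></url>
</website>"""


def urls_with_category(xml_text, category):
    """Return the <url> elements whose category attribute equals `category`."""
    dom = minidom.parseString(xml_text)
    return [
        node
        for node in dom.getElementsByTagName("url")
        # getAttribute returns "" when the attribute is missing, so no KeyError.
        if node.getAttribute("category") == category
    ]


for node in urls_with_category(SAMPLE_XML, "fun"):
    print(node.toxml())
```

Because getAttribute returns an empty string for a missing attribute, url elements without a category are simply skipped.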
I am writing a program that reads an XML file and changes it. But whenever I access a tag, I get an extra prefix in front of its name.
My xml file:
<?xml version="1.0" encoding="UTF-8"?>
<Package xmlns="http://soap.sforce.com/2006/04/metadata">
<types>
<members>sbaa__ApprovalChain__c.ExternalID__c</members>
<members>sbaa__ApprovalCondition__c.ExternalID__c</members>
<members>sbaa__ApprovalRule__c.ExternalID__c</members>
<name>CustomField</name>
</types>
<version>40.0</version>
</Package>
And I have my code:
from lxml import etree
import sys
tree = etree.parse('package.xml')
root = tree.getroot()
print( root[0][0].tag )
As output I expect to see members but I get something like this:
{http://soap.sforce.com/2006/04/metadata}members
Why do I see that URL, and how do I stop it from showing up?
You have defined a default namespace (Wikipedia, lxml tutorial). When defined, it is a part of every child tag.
If you want to print the tag without the namespace, slice off everything up to and including the closing brace:
tag = root[0][0].tag
print(tag[tag.find('}')+1:])
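The same idea can be wrapped in a small helper. This is a sketch using the standard library's ElementTree, which represents namespaced tags in the same {namespace}name form as lxml; the helper name local_name and the shortened sample XML are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened sample modeled on the question's package.xml.
SAMPLE_XML = """<Package xmlns="http://soap.sforce.com/2006/04/metadata">
<types><members>sbaa__ApprovalChain__c.ExternalID__c</members><name>CustomField</name></types>
</Package>"""


def local_name(tag):
    """Strip a leading '{namespace}' from a tag; tags without one pass through."""
    return tag.rpartition("}")[2]


root = ET.fromstring(SAMPLE_XML)
print(root[0][0].tag)              # namespace-qualified tag
print(local_name(root[0][0].tag))  # bare tag name
```

Using rpartition means a tag with no namespace is returned unchanged, so the helper is safe to call on every element.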
If you want to remove the namespace from XML, see this question.
I know this is a very common question, but the structure of my XML file and the data I need to extract from it are a little unusual. I would appreciate any help with the steps to extract the required data using Python 2.7.
I have the below XML
<?xml version="1.0" encoding="UTF-8"?>
<Package xmlns="http://soap.sforce.com/2006/04/metadata">
<types>
<members>Mango.XYZ_DIG_Team_ABCDEF_Mango_Review</members>
<members>Mango.XYZ_DIG_Team_Reporting_Mango_Review</members>
<members>Opportunity.A_T_Occupier_City_Job_List</members>
<name>ListView</name>
</types>
<types>
<members>Modify_All_Data_Permission</members>
<members>Opportunity_Alerts_Implementation</members>
<members>Process_Builder_Permission</members>
<members>Regional_Business_Support</members>
<members>Reports_Dashboards_Data_Export_for_Super_Users</members>
<name>PermissionSet</name>
</types>
<types>
<members>SolutionManager</members>
<members>Standard</members>
<name>Profile</name>
</types>
<types>
<members>Mango.Set Verified Date and System Id</members>
<members>Mango.Update Mango Site With Billing Street%2C City%2C Country</members>
<members>Mango.Update Family Id on Mango when created</members>
<members>Opportunity.Set Opportunity Name</members>
<name>WorkflowRule</name>
</types>
<version>38.0</version>
</Package>
I am trying to extract only the members from the PermissionSet block, so that I eventually have a file containing only entries like
Modify_All_Data_Permission
Opportunity_Alerts_Implementation
Process_Builder_Permission
Regional_Business_Support
Reports_Dashboards_Data_Export_for_Super_Users
I have been able to extract only the 'name' tag by
from xml.dom import minidom
doc = minidom.parse("path_to_xmlFile")
t = doc.getElementsByTagName("types")
for n in t:
    name = n.getElementsByTagName("name")[0]
    print name.firstChild.data
How can I extract the members and save them to a file?
Note: the number of 'members' entries is not fixed; it varies from block to block.
I can also try with a different library, if it serves the purpose.
Probably easiest to use XPath
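import xml.etree.ElementTree as ET
# The file declares a default namespace, so the path must be namespace-qualified
ns = {'sf': 'http://soap.sforce.com/2006/04/metadata'}
root = ET.parse('file.xml').getroot()
# Note: this prints the members of every <types> block, not just PermissionSet
for member in root.findall('.//sf:members', ns):
    print(member.text)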
This may help you!
import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
# root[1] is the second <types> block (PermissionSet); its children include <name> too
for data in root[1]:
    print(data.text)
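Neither snippet above selects the PermissionSet block by name or writes a file. A minimal sketch of both steps, using the standard library's ElementTree, could look like the following; the function names, the shortened sample XML, and the "sf" prefix are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# "sf" is an arbitrary prefix bound to the document's default namespace.
NS = {"sf": "http://soap.sforce.com/2006/04/metadata"}

# Shortened sample modeled on the question's data.
SAMPLE_XML = """<Package xmlns="http://soap.sforce.com/2006/04/metadata">
<types><members>SolutionManager</members><members>Standard</members><name>Profile</name></types>
<types><members>Modify_All_Data_Permission</members><members>Process_Builder_Permission</members><name>PermissionSet</name></types>
</Package>"""


def members_of(root, type_name):
    """Return the text of every <members> in the <types> block named `type_name`."""
    for types in root.findall("sf:types", NS):
        if types.findtext("sf:name", namespaces=NS) == type_name:
            return [m.text for m in types.findall("sf:members", NS)]
    return []


def save_members(path, members):
    """Write one member per line to `path`."""
    with open(path, "w") as out:
        out.write("\n".join(members) + "\n")


root = ET.fromstring(SAMPLE_XML)  # for a file on disk: ET.parse(path).getroot()
members = members_of(root, "PermissionSet")
print(members)
```

Calling save_members with an output path of your choice and the returned list then writes the entries one per line. Matching on the name element rather than on a position such as root[1] keeps the code working when the order or number of types blocks changes.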
I am reading an XLIFF file and want to retrieve a specific element. I tried to print all the elements using
from lxml import etree
with open('path\to\file\.xliff', 'r', encoding='utf-8') as xml_file:
    tree = etree.parse(xml_file)
    root = tree.getroot()
    for element in root.iter():
        print("child", element)
The output was
child <Element {urn:oasis:names:tc:xliff:document:2.0}segment at 0x6b8f9c8>
child <Element {urn:oasis:names:tc:xliff:document:2.0}source at 0x6b8f908>
When I tried to get the specific element (with the help of many posts here) - source tag
segment = tree.xpath('{urn:oasis:names:tc:xliff:document:2.0}segment')
print(segment)
it returns an empty list. Can someone tell me how to retrieve it properly?
Input :
<?xml version='1.0' encoding='UTF-8'?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0">
<segment id = 1>
<source>
Hello world
</source>
</segment>
<segment id = 2 >
<source>
2nd statement
</source>
</segment>
</xliff>
I want to get each segment and its corresponding source text.
This code,
tree.xpath('{urn:oasis:names:tc:xliff:document:2.0}segment')
is not accepted by lxml ("lxml.etree.XPathEvalError: Invalid expression"). You need to use findall().
The following works (in the XML sample, the segment elements are children of xliff):
from lxml import etree
tree = etree.parse("test.xliff") # XML in the question; ill-formed attributes corrected
segment = tree.findall('{urn:oasis:names:tc:xliff:document:2.0}segment')
print(segment)
However, the real XML is apparently more complex (segment is not a direct child of xliff). Then you need to add .// to search the whole tree:
segment = tree.findall('.//{urn:oasis:names:tc:xliff:document:2.0}segment')
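To pair each segment with its source text, a prefix map is often easier to read than spelling out the {namespace} in every path. The sketch below uses the standard library's ElementTree (lxml's findall also accepts a namespaces argument); the "x" prefix is arbitrary, and the sample is the question's XML with the attribute values quoted so that it parses.

```python
import xml.etree.ElementTree as ET

# "x" is an arbitrary prefix bound to the XLIFF 2.0 namespace.
NS = {"x": "urn:oasis:names:tc:xliff:document:2.0"}

# The question's sample with attribute values quoted (assumption: ids are "1" and "2").
SAMPLE_XML = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0">
<segment id="1"><source>
Hello world
</source></segment>
<segment id="2"><source>
2nd statement
</source></segment>
</xliff>"""

root = ET.fromstring(SAMPLE_XML)

# Map each segment id to the whitespace-stripped text of its <source> child.
sources = {
    seg.get("id"): seg.findtext("x:source", default="", namespaces=NS).strip()
    for seg in root.findall(".//x:segment", NS)
}
print(sources)
```

The leading .// makes the search work whether or not segment is a direct child of the root.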
I take an XML file as input and have to search it for a keyword, e.g. GENTEST05.
If the keyword is found, I need to pick up its enclosing node (in this example, <ScriptElement>) and replace the complete node <ScriptElement>blahblah</ScriptElement> with new content.
...
...
<ScriptElement>
<ScriptElement>
<ScriptElement>
<ElementData xsi:type="anyData">
<DeltaTime>
<Area>
<Datatype>USER PROMPT [GENTEST05]</Datatype>
<Description />
<Multipartmessage>False<Multipartmessage>
<Comment>false</Comment>
</ElementData>
</ScriptElement>
<ScriptElement>
<ScriptElement>
...
...
...
I am trying to do this using BeautifulSoup. This is what I've done so far, but I can't find a proper way to proceed. Suggestions using ElementTree or anything other than BeautifulSoup are welcome.
import sys
from BeautifulSoup import BeautifulStoneSoup as bs
xmlsoup = bs(open('file_xml' , 'r'))
a = raw_input('Enter Text')
paraText = xmlsoup.findAll(text=a)
print paraText
print paraText.findParent()
Ok here is some sample code to get you started. I used ElementTree because it's a builtin module and quite suitable for this type of task.
Here is the XML file I used:
<?xml version="1.0" ?>
<Script>
<ScriptElement/>
<ScriptElement/>
<ScriptElement>
<ElementData>
<DeltaTime/>
<Area/>
<Datatype>USER PROMPT [GENTEST05]</Datatype>
<Description/>
<Multipartmessage>False</Multipartmessage>
<Comment>false</Comment>
</ElementData>
</ScriptElement>
<ScriptElement/>
<ScriptElement/>
</Script>
Here is the python program:
import sys
import xml.etree.ElementTree as ElementTree
tree = ElementTree.parse("test.xml")
root = tree.getroot()
#The keyword to find and remove
keyword = "GENTEST05"
# Iterate over a copy of the children so removing one is safe
for element in list(root):
    for sub in element.iter():
        if sub.text and keyword in sub.text:
            root.remove(element)
            print(ElementTree.tostring(root))
            sys.exit()
I have kept the program simple so that you can improve on it. Since your XML has one root node, I am assuming you want to remove the top-level ancestor of the keyword-matched element, i.e. the direct child of the root that contains it. In ElementTree, you can call root.remove() to remove the <ScriptElement> element that is the ancestor of the keyword-matched element.
This is just to get you started: this will only remove the first element, then print the resulting tree and quit.
Output:
<Script>
<ScriptElement />
<ScriptElement />
<ScriptElement />
<ScriptElement />
</Script>
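The question asks for the matching element to be replaced rather than removed. One way to extend the program above is to assign the new element to the same index in the root. This is only a sketch: the function name replace_matching and the replacement markup are made-up placeholders, and the sample XML is a shortened version of the file above.

```python
import xml.etree.ElementTree as ET

# Shortened sample modeled on the test file above.
SAMPLE_XML = """<Script>
<ScriptElement/>
<ScriptElement><ElementData><Datatype>USER PROMPT [GENTEST05]</Datatype></ElementData></ScriptElement>
<ScriptElement/>
</Script>"""


def replace_matching(root, keyword, replacement_xml):
    """Replace every direct child of root whose subtree text contains `keyword`.

    Returns the number of children replaced.
    """
    replaced = 0
    for index, child in enumerate(list(root)):
        if any(keyword in (el.text or "") for el in child.iter()):
            new_child = ET.fromstring(replacement_xml)
            new_child.tail = child.tail  # keep the original whitespace layout
            root[index] = new_child      # same position, new content
            replaced += 1
    return replaced


root = ET.fromstring(SAMPLE_XML)
# The replacement markup is a hypothetical placeholder.
count = replace_matching(
    root, "GENTEST05", "<ScriptElement><Note>replaced</Note></ScriptElement>"
)
print(count)
print(ET.tostring(root).decode())
```

Assigning by index keeps the number of children unchanged, so the positions taken from the copied list stay valid for the rest of the loop.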
Please help me parse a configuration file of the format below using lxml etree. I tried iterating with "for event, element" and calling tostring. I don't need the text, though; I need the XML between
<template name>
<config>
</template>
for a given attribute.
I started with this code, but get a key error while searching for the attribute since it scans from start
config_tree = etree.iterparse(token_template_file)
for event, element in config_tree:
    if element.attrib['name']=="ad auth":
        print ("attrib reached. get XML before child ends")
Since I am a newbie to XML and python, I am not sure how to go about it. Here is the config file:
<Templates>
<template name="config1">
<request>
<password>pass</password>
<userName>username</userName>
<appID>someapp</appID>
</request>
</template>
<template name="config2">
<request>
<password>pass1</password>
<userName>username1</userName>
<appID>someapp</appID>
</request>
</template>
</Templates>
Thanks in advance!
Expected Output:
Say the user requests the config2- then the output should look like:
<request>
<password>pass1</password>
<userName>username1</userName>
<appID>someapp</appID>
</request>
(I send this XML using httplib2 to a server for initial authentication)
FINAL CODE:
Thanks to FC and Constantnius. Here is the final code:
config_tree = etree.parse(token_template_file)
for template in config_tree.iterfind("template"):
    if template.get("name") == "config2":
        element = etree.tostring(template.find("request"))
        print (template.get("name"))
        print (element)
output:
config2
<request>
<password>pass1</password>
<userName>username1</userName>
<appID>someapp</appID>
</request>
You could try to iterate over all template elements in the XML and parse them with the following code:
for template in root.iterfind("template"):
    name = template.get("name")
    request = template.find("request")
    password = template.findtext("request/password")
    username = ...
    ...
    # Do something with the values
You could try using get('name', default='') instead of ['name']
To get the text in the tag use .text
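Putting the suggestions together, here is a minimal sketch that returns the request XML for a named template. It uses the standard library's ElementTree rather than lxml, and the function name request_xml and the shortened sample XML are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened sample modeled on the question's config file.
SAMPLE_XML = """<Templates>
<template name="config1"><request><userName>username</userName></request></template>
<template name="config2"><request><userName>username1</userName></request></template>
</Templates>"""


def request_xml(root, template_name):
    """Return the serialized <request> of the named template, or None if absent."""
    for template in root.iterfind("template"):
        # .get() returns None for a missing attribute instead of raising KeyError.
        if template.get("name") == template_name:
            request = template.find("request")
            if request is not None:
                # strip() drops the trailing whitespace that follows the element.
                return ET.tostring(request, encoding="unicode").strip()
    return None


root = ET.fromstring(SAMPLE_XML)
print(request_xml(root, "config2"))
```

Parsing the whole tree first and looking only at template elements avoids the KeyError from the iterparse attempt, which visited every element, including those without a name attribute.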