Inquiry: Parse the following xml code with xml.etree? - python

Problem Statement:
Given the XML below, I want to write a simple script that produces (http)www.herp.com/ and (http)www.herp.com/derp, and ideally does the same for every application path I come across. That is, if there are more entries, such as <application path="wassup" applicationPool="derp" />, I would want that too, as (http)www.herp.com/wassup.
<sites>
<site name="(http)www.herp.com" id="1" serverAutoStart="true">
<application path="/" applicationPool="derp_administration">
<virtualDirectory path="/" physicalPath="D:\inetpub\herp_webs\derp" />
<virtualDirectory path="/Controls" physicalPath="D:\inetpub\usercontrolslibnew_ent" />
</application>
<application path="/derp" applicationPool="BOOGA">
<virtualDirectory path="/" physicalPath="D:\inetpub\herp_webs\derp" />
<virtualDirectory path="/Controls" physicalPath="D:\inetpub\usercontrolslibnew" />
</application>
</site>
</sites>
Attempted Solution:
I am using the following code:
import xml.etree.ElementTree as ET
tree = ET.parse("applicationHost.config")
root = tree.getroot()
sites = root.iter('site')
for site in sites:
    print(site.get('name'))
However, this obviously will only give me:
(http)www.herp.com
I am unable to see in the attributes anything that will point me to the <application path = "i want this stuff" />
I tried using site.tag, site.text, site.attrib, and site.tail and none of this helps me see the application path to build my url. How can I parse this xml code to give me both name and path attribute?
So given the excellent suggestions from here. I tried the following code:
sites = root.iter('site')
for site in sites:
    apps = site.findall('application')
    print(apps.tag, apps.attrib)
I get the following error.
AttributeError: 'list' object has no attribute 'attrib'
Similar error is given for tags. Basically, if I used site.find('application') that will give me the first <application path ="/" applicationPool="whatever"/>, but I cannot find the rest below it. I'm sorry. Apparently this particular config I ran it on had website dependencies that I was unaware of. I'm new on the job.
Researched Sources:
RTFM: https://docs.python.org/2/library/xml.etree.elementtree.html
http://luisartola.com/software/2010/easy-xml-in-python/
google / here
Notes:
I have multiple *.config files and parsing using a script is the way to go. I am aware of some GUI tools that can do basic stuff, but not appropriate here.

You need to obtain the <application> Element before you can access its path attribute. Given site, you can do this using site.findall('application'):
import xml.etree.ElementTree as ET
tree = ET.parse("applicationHost.config")
root = tree.getroot()
sites = root.iter('site')
for site in sites:
    apps = site.findall('application')
    for app in apps:
        print(''.join([site.get('name'), app.get('path')]))
prints
(http)www.herp.com/
(http)www.herp.com/derp
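
The AttributeError in the question comes from findall returning a list of elements rather than a single element, so each item has to be visited in turn. The following is a minimal, self-contained sketch of the same idea that also copes with a path written without a leading slash, such as the "wassup" example from the question. The function name application_urls and the inline CONFIG string are illustrative assumptions, not part of the original config file.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for applicationHost.config (assumed sample data).
CONFIG = """<sites>
<site name="(http)www.herp.com" id="1" serverAutoStart="true">
<application path="/" applicationPool="derp_administration" />
<application path="/derp" applicationPool="BOOGA" />
<application path="wassup" applicationPool="derp" />
</site>
</sites>"""


def application_urls(root):
    """Return one URL per <application>, built from site name and app path."""
    urls = []
    for site in root.iter('site'):
        name = site.get('name')
        # findall returns a list, so loop over it instead of using it directly.
        for app in site.findall('application'):
            path = app.get('path', '')
            # Normalise so exactly one slash separates name and path.
            urls.append(name.rstrip('/') + '/' + path.lstrip('/'))
    return urls


urls = application_urls(ET.fromstring(CONFIG))
for url in urls:
    print(url)
```

For a file on disk, ET.parse("applicationHost.config").getroot() can be passed in place of ET.fromstring(CONFIG).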

Related

Facing an error while modifying XML file with python

I am parsing an XML file and trying to delete an empty node, but I am receiving the following error:
ValueError: list.remove(x): x not in list
The XML file is as follows:
<toc>
<topic filename="GUID-5B8DE7B7-879F-45A4-88E0-732155904029.xml" docid="GUID-5B8DE7B7-879F-45A4-88E0-732155904029" TopicTitle="Notes, cautions, and warnings" />
<topic filename="GUID-89943A8D-00D3-4263-9306-CDC944609F2B.xml" docid="GUID-89943A8D-00D3-4263-9306-CDC944609F2B" TopicTitle="HCI Deployment with Windows Server">
<childTopics>
<topic filename="GUID-A3E5EA96-2110-46FF-9251-2291DF755F50.xml" docid="GUID-A3E5EA96-2110-46FF-9251-2291DF755F50" TopicTitle="Installing the OMIMSWAC license" />
<topic filename="GUID-7C4D616D-0D9A-4AE1-BE0F-EC6FC9DAC87E.xml" docid="GUID-7C4D616D-0D9A-4AE1-BE0F-EC6FC9DAC87E" TopicTitle="Managing Microsoft HCI-based clusters">
<childTopics>
</childTopics>
</topic>
</childTopics>
</topic>
</toc>
Kindly note that this is just an example of the format of my XML file. In this file, I want to remove the empty <childTopics> tag, but I am getting an error. My current code is:
import xml.etree.ElementTree as ET
tree = ET.parse("toc2 - Copy.xml")
root = tree.getroot()
node_to_remove = root.findall('.//childTopics//childTopics')
for node in node_to_remove:
    root.remove(node)
You need to call remove on the node's immediate parent, not on root. This is tricky using xml.etree, but if instead you use lxml.etree you can write:
import lxml.etree as ET
tree = ET.parse("data.xml")
root = tree.getroot()
node_to_remove = root.findall('.//childTopics//childTopics')
for node in node_to_remove:
    node.getparent().remove(node)
print(ET.tostring(tree).decode())
Nodes in xml.etree do not have a getparent() method. If you're unable to use lxml, you'll need to look into other solutions for finding the parent of a node; this question has some discussion on that topic.
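
One common workaround when lxml is not available is to build a child-to-parent lookup once and use it in place of getparent(). The sketch below assumes a shortened copy of the sample document (the TOC string) and adds an emptiness check that the original code did not have; the name parent_of is purely illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document (attributes trimmed for brevity).
TOC = """<toc>
<topic TopicTitle="HCI Deployment with Windows Server">
<childTopics>
<topic TopicTitle="Installing the OMIMSWAC license" />
<topic TopicTitle="Managing Microsoft HCI-based clusters">
<childTopics>
</childTopics>
</topic>
</childTopics>
</topic>
</toc>"""

root = ET.fromstring(TOC)

# xml.etree elements do not know their parent, so record it for every child.
parent_of = {child: parent for parent in root.iter() for child in parent}

# findall returns a list, so removing nodes while looping over it is safe.
for node in root.findall('.//childTopics//childTopics'):
    is_empty = len(node) == 0 and not (node.text or '').strip()
    if is_empty:
        # Remove from the immediate parent, not from root.
        parent_of[node].remove(node)

result = ET.tostring(root, encoding='unicode')
print(result)
```

The map has to be rebuilt if the tree is restructured afterwards, since it is only a snapshot taken at the time it was created.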

Modify XML Custom Part Word Document Server Properties using XML Element Tree and or XML Minidom

I am reformatting and restructuring our document management to use SharePoint. Our SOPs, Forms and Records were previously kept in SharePoint, were migrated to a major Document Management System, and now need to be migrated back into SharePoint. The other DMS used Document Variables to store key document information; before that, this information was stored in the custom XML Part "documentManagement" properties. I have already developed Python scripts to modify the existing core_properties, extended_properties and custom_properties. However, my attempts with the docx, aspose and xml.dom.minidom libraries have not yet produced a script that can read or edit the XML Part "documentManagement" properties.
I have unzipped the Word document and located the XML Part "documentManagement" properties in the \customXML\item1.xml, \customXML\item2.xml, \customXML\item3.xml and sometimes \customXML\item4.xml files. The schema, elements, and restrictions for these properties are usually in \customXML\item1.xml, and the property values are usually stored in \customXML\item2.xml. I have included the item2.xml file here for reference.
Item2.xml
<p:properties xmlns:p="http://schemas.microsoft.com/office/2006/metadata/properties" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:pc="http://schemas.microsoft.com/office/infopath/2007/PartnerControls">
<documentManagement>
<qetp xmlns="71220325-c405-4751-a4f1-a91992783649">
<UserInfo>
<DisplayName/>
<AccountId xsi:nil="true" />
<AccountType/>
</UserInfo>
</qetp>
<IconOverlay xmlns="http://schemas.microsoft.com/sharepoint/v4" xsi:nil="true" />
<Revision xmlns="8db16272-67aa-4515-a58c-977707b42560">1</Revision>
<Review_x0020_Date xmlns="8db16272-67aa-4515-a58c-977707b42560">2019-04-10T07:00:00+00:00</Review_x0020_Date>
<Site xmlns="8db16272-67aa-4515-a58c-977707b42560">
<Value>Franklin</Value>
</Site>
<Status xmlns="8db16272-67aa-4515-a58c-977707b42560">Approved</Status>
<Effective_x0020_Date xmlns="8db16272-67aa-4515-a58c-977707b42560">2017-04-10T07:00:00+00:00</Effective_x0020_Date>
<Document_x0020_Number xmlns="8db16272-67aa-4515-a58c-977707b42560">EQIP-0033-00</Document_x0020_Number>
<Module xmlns="8db16272-67aa-4515-a58c-977707b42560">4</Module>
</documentManagement>
</p:properties>
Libraries such as docx and aspose.word have not been able to access these custom XML Part properties, even though they were used to access/edit the core, extended and custom properties. I am new to the xml.etree.ElementTree library and running into many failures. I hope someone might give me a starting point and direction.
I would like to change the value of Revision from 1 to a value of 5
I would like to change the value of Site from Franklin to Liverpool
I would like to change the value of Document_x0020_Number from EQIP-0033-00 to GOV-0112
I would like to change the Value of Status from Approved to Effective
I made progress using etree.ElementTree, but it has caused a problem that I now need help with.
I used the following code to parse and edit the element text values in the tree. However, since the XML file uses namespaces, the parse resulted in each "tag" being {url}name instead of name with xmlns={url}.
Code:
import xml.etree.ElementTree as ET
# Raw string so the backslashes in the Windows path are not read as escapes.
path = r'D:\DTONAS01_DATAPART2_DriveE_Shares\Data\Quality\Veeva Export\XLSX_docProps\extracted\customXml\item2.xml'
tree = ET.parse(path)
root = tree.getroot()
print(root.tag, root.attrib, root.text)
for child in root:
    print(child.tag, child.attrib, child.text)
    for grandchild in child:
        print(grandchild.tag, grandchild.attrib, grandchild.text)
        if grandchild.tag == '{8db16272-67aa-4515-a58c-977707b42560}Revision':
            grandchild.text = str(5)
            print(grandchild.tag, grandchild.attrib, grandchild.text)
        if grandchild.tag == '{8db16272-67aa-4515-a58c-977707b42560}Document_x0020_Number':
            grandchild.text = "GOV-0112"
            print(grandchild.tag, grandchild.attrib, grandchild.text)
        if grandchild.tag == '{8db16272-67aa-4515-a58c-977707b42560}Status':
            grandchild.text = "Effective"
            print(grandchild.tag, grandchild.attrib, grandchild.text)
        if grandchild.tag == '{8db16272-67aa-4515-a58c-977707b42560}Site':
            # Site keeps its value in a nested <Value> element.
            for subelement in grandchild:
                print(subelement.tag, subelement.attrib, subelement.text)
                subelement.text = "Liverpool, England"
                print(grandchild.tag, grandchild.attrib, grandchild.text, subelement.text)
tree.write(path, encoding="utf-8")
The item2.xml file written back out:
<ns0:properties xmlns:ns0="http://schemas.microsoft.com/office/2006/metadata/properties" xmlns:ns1="71220325-c405-4751-a4f1-a91992783649" xmlns:ns3="http://schemas.microsoft.com/sharepoint/v4" xmlns:ns4="8db16272-67aa-4515-a58c-977707b42560" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<documentManagement>
<ns1:qetp>
<ns1:UserInfo>
<ns1:DisplayName/>
<ns1:AccountId xsi:nil="true" />
<ns1:AccountType/>
</ns1:UserInfo>
</ns1:qetp>
<ns3:IconOverlay xsi:nil="true" />
<ns4:Revision>5</ns4:Revision>
<ns4:Review_x0020_Date>2019-04-10T07:00:00+00:00</ns4:Review_x0020_Date>
<ns4:Site>
<ns4:Value>Liverpool, England</ns4:Value>
</ns4:Site>
<ns4:Status>Effective</ns4:Status>
<ns4:Effective_x0020_Date>2017-04-10T07:00:00+00:00</ns4:Effective_x0020_Date>
<ns4:Document_x0020_Number>GOV-0112</ns4:Document_x0020_Number>
<ns4:Module>4</ns4:Module>
</documentManagement>
</ns0:properties>
As a result the XML file I write back out looks very different than the original XML file. Unfortunately, Word does not like the new XML when zipped back together. Word is able to open the document, but Word no longer displays the properties in File\Info\Properties and shows an error in File\Info\Properties.
What do I need to do so that the output XML file looks the same as the input XML file regarding namespace notation?
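
One possible direction, offered only as a sketch: xml.etree.ElementTree rewrites namespace declarations when it serializes, and ET.register_namespace only controls which prefix is used, so it cannot restore the per-element xmlns="..." form. xml.dom.minidom, by contrast, keeps xmlns declarations as ordinary attributes and writes them back as it found them. The example below uses a shortened copy of item2.xml (the SAMPLE string) and a hypothetical helper named set_text; whether Word accepts the resulting file has not been verified here.

```python
from xml.dom import minidom

# Shortened copy of item2.xml (assumed sample; the real file has more fields).
SAMPLE = """<p:properties xmlns:p="http://schemas.microsoft.com/office/2006/metadata/properties" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<documentManagement>
<Revision xmlns="8db16272-67aa-4515-a58c-977707b42560">1</Revision>
<Site xmlns="8db16272-67aa-4515-a58c-977707b42560">
<Value>Franklin</Value>
</Site>
<Status xmlns="8db16272-67aa-4515-a58c-977707b42560">Approved</Status>
<Document_x0020_Number xmlns="8db16272-67aa-4515-a58c-977707b42560">EQIP-0033-00</Document_x0020_Number>
</documentManagement>
</p:properties>"""


def set_text(node, tag_name, new_text):
    """Set the text of the first descendant of node named tag_name.

    Hypothetical helper: assumes the element's first child is its text node.
    """
    element = node.getElementsByTagName(tag_name)[0]
    element.firstChild.data = new_text


doc = minidom.parseString(SAMPLE)
set_text(doc, "Revision", "5")
set_text(doc, "Status", "Effective")
set_text(doc, "Document_x0020_Number", "GOV-0112")
# Site stores its value in a nested <Value> element.
site = doc.getElementsByTagName("Site")[0]
set_text(site, "Value", "Liverpool")

result = doc.toxml()
print(result)
```

For a file on disk, minidom.parse(path) reads it and the string returned by doc.toxml() can be written back with an ordinary open(path, "w", encoding="utf-8").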

Reading xml with lxml lib getting strange string from xmlns tag

I am writing a program to work on an xml file and change it. But when I try to access any part of it, I get an extra part in the tag name.
My xml file:
<?xml version="1.0" encoding="UTF-8"?>
<Package xmlns="http://soap.sforce.com/2006/04/metadata">
<types>
<members>sbaa__ApprovalChain__c.ExternalID__c</members>
<members>sbaa__ApprovalCondition__c.ExternalID__c</members>
<members>sbaa__ApprovalRule__c.ExternalID__c</members>
<name>CustomField</name>
</types>
<version>40.0</version>
</Package>
And I have my code:
from lxml import etree
import sys
tree = etree.parse('package.xml')
root = tree.getroot()
print( root[0][0].tag )
As output I expect to see members but I get something like this:
{http://soap.sforce.com/2006/04/metadata}members
Why do I see that url and how to stop it from showing up?
You have defined a default namespace (Wikipedia, lxml tutorial). When defined, it is a part of every child tag.
If you want to print the tag without the namespace, it's easy
tag = root[0][0].tag
print(tag[tag.find('}')+1:])
If you want to remove the namespace from XML, see this question.
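
The {namespace}tag form is the same in the standard library's xml.etree.ElementTree, so the idea can be sketched without lxml. The example below is a minimal illustration using a shortened copy of the package file; the helper name local_name and the "md" prefix label are arbitrary choices made for this sketch. It shows both stripping the namespace for display and searching with the namespace kept in place.

```python
import xml.etree.ElementTree as ET

# Shortened copy of package.xml without the XML declaration (assumed sample).
PACKAGE_XML = """<Package xmlns="http://soap.sforce.com/2006/04/metadata">
<types>
<members>sbaa__ApprovalChain__c.ExternalID__c</members>
<members>sbaa__ApprovalRule__c.ExternalID__c</members>
<name>CustomField</name>
</types>
<version>40.0</version>
</Package>"""


def local_name(tag):
    """Return the tag without its '{namespace}' part, if it has one."""
    return tag.rpartition('}')[2]


root = ET.fromstring(PACKAGE_XML)
first = root[0][0]
print(first.tag)              # full name, namespace included
print(local_name(first.tag))  # local name only

# Searching needs the namespace too; map a prefix of your choice to the URI.
ns = {"md": "http://soap.sforce.com/2006/04/metadata"}
members = [m.text for m in root.findall("md:types/md:members", ns)]
print(members)
```

Keeping the namespace and searching with a prefix map is usually safer than stripping it from the document, because the file that is written back stays valid for whatever tool consumes it.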

How to make my PyObjC application AppleScript-able

I want to create an application on OS X with Python, that is AppleScript-able.
First I used this tutorial to create an Application (it works!). Then I used this SO answer to add AppleScript support; I tried to translate the Objective-C stuff into Python. I added a plist-item to the options in setup.py:
OPTIONS = {
    #...
    'plist': {
        'NSAppleScriptEnabled': True,
        'OSAScriptingDefinition': 'SimpleXibDemo.sdef',
        #...
    }
}
I created SimpleXibDemo.sdef
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE dictionary SYSTEM "file://localhost/System/Library/DTDs/sdef.dtd">
<dictionary title="SimpleXibDemo">
<suite name="SimpleXibDemo Suite" code="SXiD" description="SimpleXibDemo Scripts">
<command name="increase" code="increxib" description="Increase the value">
<cocoa class="SimpleXibDemoIncreaseCommand"/>
<parameter name="by" type="integer" optional="yes">
<cocoa key="by"/>
</parameter>
<result description="Returns the new value" type="integer"/>
</command>
</suite>
</dictionary>
and I added the class SimpleXibDemoIncreaseCommand:
class SimpleXibDemoIncreaseCommand(NSScriptCommand):
    def performDefaultImplementation(self):
        args = self.evaluatedArguments()
        nr = args.valueForKey("by")
        viewController.increment()
        return nr + 1
The application itself works, but when I run this AppleScript:
tell application "SimpleXibDemo"
set myResult to increase by 3
end tell
I get this error:
error "SimpleXibDemo got an error: Can’t continue increase." number -1708
I can't find any info on error number -1708. And when I open the application with Script Editor -> Open Dictionary, nothing happens.
What is the problem here? I'm a bit stuck :-/

List only one category Python xml

I am trying to write a Python program that uses DOM to read an XML file and print only the nodes whose category attribute has a particular value, here "fun".
<?xml version="1.0" encoding="ISO-8859-1"?>
<website>
<url category="fun">
<title>Fun world</title>
<author>Jack</author>
<year>2010</year>
<price>100.00</price>
</url>
<url category="entertainment">
<title>Fun world</title>
<author>Jack</author>
<year>2010</year>
<price>100.00</price>
</url>
</website>
I couldn't select the list from the URL having category="fun".
I tried this code:
for n in dom.getElementsByTagName('url'):
    s = n.attribute['category']
    if (s.value == "fun"):
        print n.toxml()
Can you guys help to me to debug my code?
nb: One of your tags opens "Website" and attempts to close "website" - so you'll want to fix that one...
You've mentioned lxml.
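from lxml import etree as et
# xml holds the document above as bytes; lxml rejects str input
# that carries an encoding declaration.
root = et.fromstring(xml)
fun = root.xpath('/website/url[@category="fun"]')
for node in fun:
    print(et.tostring(node).decode())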
Use getAttribute:
for n in dom.getElementsByTagName('url'):
    if (n.getAttribute('category') == "fun"):
        print(n.toxml())
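
The original attempt fails because minidom elements have no attribute property; the mapping is called attributes, and getAttribute is the shorter route. The following self-contained sketch shows both on a trimmed copy of the sample document (the SITE_XML string is an assumption made so the example runs on its own).

```python
from xml.dom import minidom

# Trimmed copy of the sample document (only the fields needed here).
SITE_XML = """<website>
<url category="fun"><title>Fun world</title></url>
<url category="entertainment"><title>Fun world</title></url>
</website>"""

dom = minidom.parseString(SITE_XML)

# getAttribute returns the value as a string ('' if the attribute is absent).
fun_nodes = [n for n in dom.getElementsByTagName('url')
             if n.getAttribute('category') == 'fun']

# Equivalent lookup through the attributes mapping (note the plural);
# this form raises KeyError when the attribute is missing.
via_mapping = [n for n in dom.getElementsByTagName('url')
               if n.attributes['category'].value == 'fun']

for n in fun_nodes:
    print(n.toxml())
```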
