Python: Parse dita map file and contents and output all href values

I'm a newbie Python programmer, and I am looking for a script or snippet to help. I have to parse a DITA map/XML file and, for every XML file, output its filename, open it, search for referenced .dita, .ditamap, or .xml files, output their filenames, and recurse into those files. The idea is to output a list of all the files referenced by that .ditamap/.xml file and its children. This list will feed a step that zips that group of files to send for processing.
I found some sample code but I get no output!
import os
import glob
root_dir ='~/test_folder'
for filename in glob.glob(root_dir + '**/*.xml', recursive=True):
    print(filename)
Here is a sample ditamap file:
<?xml version="1.0" encoding="utf-8"?><?Inspire CreateDate="2019-04-04T16:06:14" ModifiedDate="2022-11-11T16:44:57"?><!DOCTYPE bookmap PUBLIC "-//OASIS//DTD DITA BookMap//EN" "bookmap.dtd">
<bookmap id="bookmap_e90eb827-7421-4491-8df3-5fea34a44931" xml:lang="en-US">
<booktitle id="booktitle_a78ddf49-09d7-4d3d-925c-d42d9ff7f360">
<mainbooktitle id="mainbooktitle_0a34f716-bedc-4c5d-b198-dfd5006a3174">About the Documentation</mainbooktitle>
</booktitle>
<bookmeta>
<prodinfo>
<prodname />
<vrmlist>
<vrm version="1" />
</vrmlist>
<!--Do not change: Must be Manual-->
<brand>Manual</brand>
</prodinfo>
<!--sets task labels (1st othermeta tag below)-->
<othermeta content="yes" name="task-labels" />
<othermeta content="about" name="bundle" />
<bookid>
<!--Revision-->
<volume>A0X</volume>
</bookid>
<bookrights>
<copyrfirst>
<!--Format of copyright year is yyyy - mm-->
<year>2019 - 04</year>
</copyrfirst>
<bookowner>
<!--Do not change organization-->
<organization>Dell</organization>
</bookowner>
</bookrights>
</bookmeta>
<chapter href="subjectscheme_6b1f4589-e73e-49be-806d-0d064f3efd01.xml" format="ditamap" outputclass="subjectscheme" processing-role="resource-only" scope="external" />
<chapter href="atm-About_user_guide_891d23dc-a186-422d-af40-75249dd31f87.xml">
<topicmeta>
<navtitle>About the <keyword conref="lib-Boomi_Keywords_0346af2b-13d7-491e-bec9-18c5d89225bf.xml#GUID-0207C7F1-40FD-4537-BE59-1D6DA46B9A1D/BOOMI_DELL" /><keyword conref="lib-Boomi_Keywords_0346af2b-13d7-491e-bec9-18c5d89225bf.xml#GUID-0207C7F1-40FD-4537-BE59-1D6DA46B9A1D/BOOMI_ATOMSPHERE" /> User Guide</navtitle>
</topicmeta>
<topicref href="atm-Content_browsing_2c16a734-5cf8-416c-8978-0062ac04e430.xml">
<topicmeta>
<navtitle>Content browsing</navtitle>
</topicmeta>
</topicref>
<topicref href="atm-Content_searching_acdba241-6d33-41bc-8886-0907906fed64.xml">
<topicmeta>
<navtitle>Content searching</navtitle>
<othermeta name="mini-toc" content="yes" />
</topicmeta>
</topicref>
<topicref href="atm-Creating_a_documentation_account_c4ddf038-e007-4ee3-bef9-9f4eb06d0f89.dita" />
<topicref href="atm-Collections_of_your_favorite_topics_5dd10ed2-b689-4628-bc2c-bc35dd4f571e.xml">
<topicref id="topicref_bb2f9a40-0266-44b5-a061-39eca24b5d41" href="atm-sharing_saved_collections_d41e734f-4b2e-4c1e-82e7-91617d1008ae.dita" navtitle="atm-Sharing_saved_collections" type="task" />
</topicref>
<topicref id="topicref_8a2ba548-6595-4cc5-af12-afa2631abfbb" href="atm-Using_table_filters_178c0de0-ddee-4073-b828-476ad13345c4.dita" type="task" />
<topicref href="atm-Team_welcomes_your_feedback_848e635e-0132-43d8-b22d-bbdf87ca398a.xml">
<topicmeta>
<navtitle>The <keyword conref="lib-Boomi_Keywords_0346af2b-13d7-491e-bec9-18c5d89225bf.xml#GUID-0207C7F1-40FD-4537-BE59-1D6DA46B9A1D/BOOMI_ATOMSPHERE">The T</keyword> documentation team welcomes your feedback</navtitle>
</topicmeta>
</topicref>
<topicref href="atm-Other_ways_to_get_help_09adc783-784f-4f15-87f9-672d8030b689.xml">
<topicmeta>
<navtitle>Other ways to get help</navtitle>
</topicmeta>
</topicref>
</chapter>
<chapter>
<topicref href="atm-Terms_of_use_78ffba54-261d-428d-afcd-a9db3ce51123.dita" />
</chapter>
<chapter>
<topicref>
<topicref href="atm-API_licensing_df074d66-3a10-4df5-8dd5-0a3e13373d0e.dita" />
</topicref>
</chapter>
<backmatter>
<topicref href="r-boo-Copyright_Boomi_Online_Help_9eea563b-53a2-4d69-b6e7-7372bf7d5440.xml" navtitle="Copyright">
<topicmeta>
<navtitle>CopyrightBoomiOnlineHelp</navtitle>
</topicmeta>
</topicref>
<topicref href="atm-About_reltable_72640fe6-ae6d-490c-b369-7adbcb67bc99.xml" linking="normal" print="no" toc="no">
<topicmeta>
<navtitle>reltable</navtitle>
</topicmeta>
</topicref>
</backmatter>
</bookmap>
If anyone can help or has a similar script that would traverse and parse the files, that would be great!
Any help is greatly appreciated!
Thanks,
Russ

You can search for the href like:
import xml.etree.ElementTree as ET
tree = ET.parse("bookmap.dita")
root = tree.getroot()
for elem in root.iter():
    if 'href' in elem.attrib:
        # print tag name and file reference
        print(elem.tag, elem.get('href'))
Output:
chapter subjectscheme_6b1f4589-e73e-49be-806d-0d064f3efd01.xml
chapter atm-About_user_guide_891d23dc-a186-422d-af40-75249dd31f87.xml
topicref atm-Content_browsing_2c16a734-5cf8-416c-8978-0062ac04e430.xml
topicref atm-Content_searching_acdba241-6d33-41bc-8886-0907906fed64.xml
topicref atm-Creating_a_documentation_account_c4ddf038-e007-4ee3-bef9-9f4eb06d0f89.dita
topicref atm-Collections_of_your_favorite_topics_5dd10ed2-b689-4628-bc2c-bc35dd4f571e.xml
topicref atm-sharing_saved_collections_d41e734f-4b2e-4c1e-82e7-91617d1008ae.dita
topicref atm-Using_table_filters_178c0de0-ddee-4073-b828-476ad13345c4.dita
topicref atm-Team_welcomes_your_feedback_848e635e-0132-43d8-b22d-bbdf87ca398a.xml
topicref atm-Other_ways_to_get_help_09adc783-784f-4f15-87f9-672d8030b689.xml
topicref atm-Terms_of_use_78ffba54-261d-428d-afcd-a9db3ce51123.dita
topicref atm-API_licensing_df074d66-3a10-4df5-8dd5-0a3e13373d0e.dita
topicref r-boo-Copyright_Boomi_Online_Help_9eea563b-53a2-4d69-b6e7-7372bf7d5440.xml
topicref atm-About_reltable_72640fe6-ae6d-490c-b369-7adbcb67bc99.xml
Hope this helps you.
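On the glob snippet from the question: it most likely prints nothing because glob does not expand "~" and because root_dir + '**/*.xml' has no path separator before "**". Below is a minimal sketch of one way to fix that. The function name find_xml_files and the temporary demo folder are illustrative assumptions, not part of the question.

```python
import glob
import os
import tempfile


def find_xml_files(root_dir):
    """Return every .xml file under root_dir, including subfolders."""
    # glob does not expand "~", so expand it explicitly.
    root_dir = os.path.expanduser(root_dir)
    # os.path.join supplies the separator that root_dir + '**/*.xml' lacked.
    pattern = os.path.join(root_dir, "**", "*.xml")
    return sorted(glob.glob(pattern, recursive=True))


# Small demonstration in a throwaway folder (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.xml", os.path.join("sub", "b.xml"), "notes.txt"):
        open(os.path.join(tmp, rel), "w").close()
    found = find_xml_files(tmp)
    # Show paths relative to the folder, with forward slashes.
    demo_names = [os.path.relpath(p, tmp).replace(os.sep, "/") for p in found]
    print(demo_names)
```

With the question's folder, the call would be find_xml_files('~/test_folder').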

As we discussed, this code writes a CSV file listing each file and its links (be careful: as I told you, this program has no stop condition. It only stops, possibly with an error, when it reaches a file without links or a file that can't be found):
import xml.etree.ElementTree as ET
class Dita:
    """write a csv file with file name and included file list """
    def __init__(self, file):
        self.file_name = file
        self.file_list = []
    def parse_dita(self, file_name):
        tree = ET.parse(file_name)
        root = tree.getroot()
        file_list = []
        for elem in root.iter():
            if 'href' in elem.attrib:
                row = elem.get('href') #elem.tag,
                file_list.append(row)
        row = file_name, str(file_list),'\n'
        with open("f_and_links.csv", 'a') as f_and_links:
            f_and_links.writelines(row)
        return file_list
def main():
    root_file = "bookmap.dita"
    print("Source file:", root_file)
    dita_obj = Dita(root_file)
    file_list = dita_obj.parse_dita(root_file)
    for f in file_list:
        print("Links list", f)
        dita_obj.parse_dita(f)
if __name__ == '__main__':
    main()
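The code above follows links only one level deep and has no guard against missing files or circular references. A possible variation is sketched below: it keeps a set of files already visited, follows both href and conref attributes, and skips targets that do not exist on disk. The name collect_refs, the EXTENSIONS tuple, and the demo files are assumptions made for illustration. The sketch also assumes that every reference is a path relative to the file that contains it.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# File types the question wants to follow.
EXTENSIONS = (".dita", ".ditamap", ".xml")


def collect_refs(path, seen=None):
    """Return the set of existing files reachable from path via href/conref."""
    if seen is None:
        seen = set()
    path = os.path.normpath(path)
    # Stop condition: already visited, or not a file on disk.
    if path in seen or not os.path.isfile(path):
        return seen
    seen.add(path)
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        return seen  # keep the file in the list, but do not look inside it
    base = os.path.dirname(path)
    for elem in root.iter():
        for attr in ("href", "conref"):
            target = elem.get(attr)
            if not target:
                continue
            target = target.split("#", 1)[0]  # drop the "#id/..." fragment
            if target.lower().endswith(EXTENSIONS):
                collect_refs(os.path.join(base, target), seen)
    return seen


# Demonstration with three tiny hypothetical files, one of them circular.
with tempfile.TemporaryDirectory() as tmp:
    files = {
        "map.ditamap": '<map><topicref href="a.dita"/><topicref href="b.xml"/></map>',
        "a.dita": '<topic id="a"><p conref="b.xml#b/p1"/>'
                  '<xref href="map.ditamap"/><xref href="missing.dita"/></topic>',
        "b.xml": '<topic id="b"><p id="p1">text</p></topic>',
    }
    for name, text in files.items():
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write(text)
    found = collect_refs(os.path.join(tmp, "map.ditamap"))
    demo_refs = sorted(os.path.basename(p) for p in found)
    print(demo_refs)
```

The resulting set could then be written to a text file, one path per line, to feed the zipping step mentioned in the question.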

Related

How to extract text from xml file using python

I'm trying to extract text data from this XML file, but I don't know why my code is not working. How do I get the phone number? Please have a look at the XML file and my code below. I'm trying to extract data from the telecom tag. Thank you in advance :)
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:voc="urn:hl7-org:v3/voc" xmlns:sdtc="urn:hl7-org:sdtc" xsi:schemaLocation="CDA.xsd">
<realmCode code="US"/>
<languageCode code="en-US"/>
<recordTarget>
<patientRole>
<addr use="HP">
<streetAddressLine>3345 Elm Street</streetAddressLine>
<city>Aurora</city>
<state>CO</state>
<postalCode>80011</postalCode>
<country>US</country>
</addr>
<telecom value="tel:+1(303)-554-8889" use="HP"/>
<patient>
<name use="L">
<given>Janson</given>
<given>J</given>
<family>Example</family>
</name>
</patient>
</patientRole>
</recordTarget>
</ClinicalDocument>
Here is my python code
import xml.etree.ElementTree as ET
tree = ET.parse('country.xml')
root = tree.getroot()
print(root)
for country in root.findall('patientRole'):
    number = country.get('telecom')
    print(number)

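Your XML document declares a default namespace (urn:hl7-org:v3), so every tag name has to be qualified with it. Also, telecom is a child element and the number is stored in its value attribute, so the loop becomes something like:
for country in tree.findall('.//{urn:hl7-org:v3}patientRole'):
    number = country.find('{urn:hl7-org:v3}telecom').attrib['value']
    print(number)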
Output:
tel:+1(303)-554-8889
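An alternative to repeating the namespace URI in braces is to pass a prefix-to-URI mapping to findall(). The sketch below assumes a trimmed copy of the question's document held in a string. The prefix "hl7" is an arbitrary local name chosen for illustration and does not appear in the XML.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document, for illustration only.
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <recordTarget>
    <patientRole>
      <telecom value="tel:+1(303)-554-8889" use="HP"/>
    </patientRole>
  </recordTarget>
</ClinicalDocument>"""

# "hl7" is an arbitrary prefix used only inside the XPath expressions.
NS = {"hl7": "urn:hl7-org:v3"}

root = ET.fromstring(SAMPLE)
numbers = [
    telecom.get("value")
    for telecom in root.findall(".//hl7:patientRole/hl7:telecom", NS)
]
print(numbers)
```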

XML tag value replace using python

I have an XML file in which I need to replace the value attribute inside the filter tag. I have tried to parse this XML, but it seems to contain an error that I am unable to resolve. Can anybody please help me replace the value?
path = """
<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP:Header>
<header xmlns="xmlapi_1.0">
<security>
<user>testuser</user>
<password hashed="false">testpassword</password>
</security>
<requestID>XML_API_client#n</requestID>
</header>
</SOAP:Header>
<SOAP:Body>
<find>
<fullClassName>equipment.PhysicalPort</fullClassName>
<filter>
<equal name="siteId" value="x.x.x.x." />
</filter>
<resultFilter>
<attribute>objectFullName</attribute>
<attribute>displayedName</attribute>
<attribute>portName</attribute>
<attribute>description</attribute>
<attribute>lagMembershipId</attribute>
<attribute>encapType</attribute>
<attribute>mode</attribute>
<attribute>speed</attribute>
<attribute>mtuValue</attribute>
</resultFilter>
</find>
<find xmlns="xmlapi_1.0">
<fullClassName>ethernetequipment.EthernetPortSpecifics</fullClassName>
<filter>
<equal name="siteId" value="x.x.x.x." />
</filter>
<resultFilter>
<attribute>autoNegotiate</attribute>
<attribute>downWhenLooped</attribute>
</resultFilter>
</find>
"""
The other way I tried:
path = '../request.xml'
IP = '"' + '63.130.111.89' + '"' + '/>'
f = open('test.xml', 'w+')
for line in open(path, 'r'):
    output_line = line
    string1, string2 = "siteId", "value"
    if string1 in output_line:
        value = output_line.split("value"'=', 1)[1]
        output_line = str.replace(output_line, value, IP)
    f.write(output_line)
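One possible approach, sketched under assumptions: the request string shown above is missing its closing SOAP:Body and SOAP:Envelope tags, which would explain a parse error. Once the document is well formed, ElementTree can set the attribute directly. The trimmed REQUEST string and the helper name set_site_id are illustrative, not taken from the question. Matching on the local tag name lets the same loop handle find elements with or without the xmlapi_1.0 namespace.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed version of the request (closing tags added).
REQUEST = """<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP:Body>
    <find>
      <fullClassName>equipment.PhysicalPort</fullClassName>
      <filter>
        <equal name="siteId" value="x.x.x.x." />
      </filter>
    </find>
  </SOAP:Body>
</SOAP:Envelope>"""

# Keep the SOAP prefix when the tree is serialized again.
ET.register_namespace("SOAP", "http://schemas.xmlsoap.org/soap/envelope/")


def set_site_id(xml_text, new_ip):
    """Return xml_text with every siteId filter value replaced by new_ip."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Compare the local name so a namespace on the element does not matter.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "equal" and elem.get("name") == "siteId":
            elem.set("value", new_ip)
    return ET.tostring(root, encoding="unicode")


updated = set_site_id(REQUEST, "63.130.111.89")
print(updated)
```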

Parse XML with childs that have different tags in Python

I am trying to parse the following XML data from a file with Python and print only the entries that have a "zip-code" tag, together with their name attribute.
<response status="success" code="19"><result total-count="1" count="1">
<address>
<entry name="studio">
<zip-code>14407</zip-code>
<description>Nothing</description>
</entry>
<entry name="mailbox">
<zip-code>33896</zip-code>
<description>Nothing</description>
</entry>
<entry name="garage">
<zip-code>33746</zip-code>
<description>Tony garage</description>
</entry>
<entry name="playstore">
<url>playstation.com</url>
<description>game download</description>
</entry>
<entry name="gym">
<zip-code>33746</zip-code>
<description>Getronics NOC subnet 2</description>
</entry>
<entry name="e-cigars">
<url>vape.com/24</url>
<description>vape juices</description>
</entry>
</address>
</result></response>
The python code that I am trying to run is
from xml.etree import ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
items = root.iter('entry')
for item in items:
    zip = item.find('zip-code').text
    names = (item.attrib)
    print(' {} {} '.format(
        names, zip
    ))
However, it fails once it gets to an item without a "zip-code" tag.
How could I make this run?
Thanks in advance

As @AmitaiIrron suggested, XPath can help here.
This code searches the document for elements named zip-code and steps back up to the parent of each one. From there, you can get the name attribute and pair it with the text of the zip-code element:
for ent in root.findall(".//zip-code/.."):
    print(ent.attrib.get('name'), ent.find('zip-code').text)
studio 14407
mailbox 33896
garage 33746
gym 33746
OR
{ent.attrib.get('name'): ent.find('zip-code').text
 for ent in root.findall(".//zip-code/..")}
{'studio': '14407', 'mailbox': '33896', 'garage': '33746', 'gym': '33746'}

Your loop should look like this:
# Find all <entry> tags in the hierarchy
for item in root.findall('.//entry'):
    # Try finding a <zip-code> child
    zipc = item.find('./zip-code')
    # If found a child, print data for it
    if zipc is not None:
        names = (item.attrib)
        print(' {} {} '.format(
            names, zipc.text
        ))
It's all a matter of learning to use xpath properly when searching through the XML tree.
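For completeness, here is a self-contained variant of the same idea that can be run without a file. It is a sketch that assumes a shortened copy of the question's data in a string, and it uses findtext(), which returns None when the child element is absent.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's data, for illustration only.
SAMPLE = """<response status="success" code="19"><result total-count="1" count="1">
<address>
  <entry name="studio"><zip-code>14407</zip-code></entry>
  <entry name="playstore"><url>playstation.com</url></entry>
  <entry name="gym"><zip-code>33746</zip-code></entry>
</address>
</result></response>"""

root = ET.fromstring(SAMPLE)
pairs = []
for entry in root.iter("entry"):
    zip_code = entry.findtext("zip-code")  # None when there is no <zip-code>
    if zip_code is not None:
        pairs.append((entry.get("name"), zip_code))
print(pairs)
```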

If you don't mind using regular expressions, the following is a quick alternative, though it is fragile when an entry has no zip-code:
import re
file = open('file.xml', 'r').read()
pattern = r'name="(.*?)".*?<zip-code>(.*?)<\/zip-code>'
matches = re.findall(pattern, file, re.S)
for m in matches:
    print("{} {}".format(m[0], m[1]))
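and produces the result:
studio 14407
mailbox 33896
garage 33746
playstore 33746
Note that the last line is wrong: playstore has no zip-code, so the lazy pattern runs on to the next zip-code in the file, which belongs to gym.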

how to find and edit tags in XML files with namespaces using ElementTree

I would like to find specific tags in my XML document and edit their text or attributes. My XML file contains namespaces (and, if I understand correctly, nested namespaces). The tool I'd like to use for this purpose is ElementTree. I managed to read the XML file with iterparse, but I don't know how to save the edited XML, because the iterparse object has no write method. I need either a way to read the XML file with parse and strip its namespaces and nested namespaces, or a way to save the iterparsed file.
For this case, let's edit the "Rating" tag text.
it = ET.iterparse(adiPath)
for _, el in it:
    if '}' in el.tag:
        el.tag = el.tag.split('}', 1)[1]  # strip all namespaces
    for at in list(el.attrib):  # strip namespaces of attributes too
        if '}' in at:
            newat = at.split('}', 1)[1]
            el.attrib[newat] = el.attrib[at]
            del el.attrib[at]
root = it.root
# Search Rating tag and edit its value
for rating in root.iter('Rating'):
    print(rating.text)  # Prints 18
    rating.text = "999"
    print(rating.text)  # Prints 999
However, in this case the XML file remains unchanged.
Here is the XML file:
<?xml version="1.0" encoding="utf-8"?>
<ADI3 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:content="urn:cablelabs:md:xsd:content:3.0" xmlns:core="urn:cablelabs:md:xsd:core:3.0" xmlns:offer="urn:cablelabs:md:xsd:offer:3.0" xmlns:terms="urn:cablelabs:md:xsd:terms:3.0" xmlns:title="urn:cablelabs:md:xsd:title:3.0" xmlns:adb="urn:adb:md:xsd:adb:01" xmlns:schemaLocation="urn:adb:md:xsd:adb:01 ADB-EXT-C01.xsd urn:cablelabs:md:xsd:core:3.0 MD-SP-CORE-C01.xsd urn:cablelabs:md:xsd:content:3.0 MD-SP-CONTENT-C01.xsd urn:cablelabs:md:xsd:offer:3.0 MD-SP-OFFER-C01.xsd urn:cablelabs:md:xsd:terms:3.0 MD-SP-TERMS-C01.xsd urn:cablelabs:md:xsd:title:3.0 MD-SP-TITLE-C01.xsd" xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns="urn:cablelabs:md:xsd:core:3.0">
<Asset xsi:type="title:TitleType" uriId="ab://cc.com" providerVersionNum="1" internalVersionNum="0" creationDateTime="2020-01-28T08:55:19Z" startDateTime="2019-05-20T00:00:00Z" endDateTime="2028-08-20T23:59:00Z">
<AlternateId identifierSystem="VOD1.1">ab://cc.com</AlternateId>
<Ext>
<adb:ExtensionType>
<adb:TitleExt>
<adb:SeriesInfo episodeNumber="6">
<adb:series seriesId="GOT" seasonCount="8"></adb:series>
<adb:season seasonId="GOTS08" number="8" episodeCount="6"></adb:season>
</adb:SeriesInfo>
</adb:TitleExt>
</adb:ExtensionType>
</Ext>
<title:LocalizableTitle xml:lang="pol">
<title:TitleLong>Game of Thrones VIII</title:TitleLong>
<title:SummaryLong>Long summary, long summary, long summary...</title:SummaryLong>
<title:Actor fullName="Peter Dinklage" firstName="Peter" lastName="Dinklage" />
<title:Actor fullName="Nikolaj Coster-Waldau" firstName="Nikolaj" lastName="Coster-Waldau" />
<title:Actor fullName="Emilia Clarke" firstName="Emilia" lastName="Clarke" />
<title:Actor fullName="Lena Headey" firstName="Lena" lastName="Headey" />
<title:Director fullName="David Nutter" firstName="David" lastname="Nutter" />
</title:LocalizableTitle>
<title:Rating ratingSystem="PL">18</title:Rating>
<title:Audience>General</title:Audience>
<title:DisplayRunTime>01:15</title:DisplayRunTime>
<title:Year>2019</title:Year>
<title:CountryOfOrigin>US</title:CountryOfOrigin>
<title:Genre>Film fantasy</title:Genre>
<title:ShowType>Movie</title:ShowType>
</Asset>
<Asset xsi:type="offer:CategoryType" uriId="cc.com/XX">
<AlternateId identifierSystem="VOD1.1">cc.com/XX</AlternateId>
<offer:CategoryPath>VOD/GOT/Season 8</offer:CategoryPath>
</Asset>
<Asset xsi:type="content:MovieType" uriId="GraoTronVIII_0_1080mp4">
<AlternateId identifierSystem="VOD1.1">GraoTronVIII_0_1080mp4</AlternateId>
<content:SourceUrl>GOTS08E06.mp4</content:SourceUrl>
<content:Resolution>1080p</content:Resolution>
<content:Duration>PT1H15M20S</content:Duration>
<content:Language>pol</content:Language>
<content:Language>eng</content:Language>
</Asset>
<Asset xsi:type="content:PreviewType" uriId="GraoTronVIII_1_1080mp4">
<AlternateId identifierSystem="VOD1.1">GraoTronVIII_1_1080mp4</AlternateId>
<content:SourceUrl>GOTS08E06_trailer.mp4</content:SourceUrl>
<content:Resolution>1080p</content:Resolution>
<content:Duration>PT0H01M48S</content:Duration>
<content:Language>pol</content:Language>
<content:Language>eng</content:Language>
</Asset>
<Asset xsi:type="content:PosterType" uriId="GraoTronVIIIPoster">
<AlternateId identifierSystem="VOD1.1">GraoTronVIIIPoster</AlternateId>
<content:SourceUrl>GOTS08E06.jpg</content:SourceUrl>
<content:X_Resolution>600</content:X_Resolution>
<content:Y_Resolution>900</content:Y_Resolution>
<content:Language>pol</content:Language>
</Asset>
<Asset xsi:type="offer:ContentGroupType" uriId="abc">
<AlternateId identifierSystem="VOD1.1">abc</AlternateId>
<offer:TitleRef uriId="abc" />
<offer:MovieRef uriId="GraoTronVIII_0_1080mp4" />
</Asset>
<Asset xsi:type="offer:ContentGroupType" uriId="abc">
<AlternateId identifierSystem="VOD1.1">abc</AlternateId>
<offer:TitleRef uriId="abc" />
<offer:MovieRef uriId="GraoTronVIII_1_1080mp4" />
</Asset>
<Asset xsi:type="offer:ContentGroupType" uriId="abc">
<AlternateId identifierSystem="VOD1.1">abc</AlternateId>
<offer:TitleRef uriId="abc" />
<offer:MovieRef uriId="GraoTronVIIIPoster" />
</Asset>
</ADI3>

Instead of stripping out the namespaces, I suggest using namespace wildcards. Support for this was added in Python 3.8.
from xml.etree import ElementTree as ET
tree = ET.parse(adiPath)
rating = tree.find(".//{*}Rating") # Find the Rating element in any namespace
rating.text = "999"
Note that you have to use find() (or findall()). Wildcards do not work with iter().
The following workaround can be used to preserve the original namespace prefixes when serializing the XML document (see also https://stackoverflow.com/a/42372404/407651 and https://stackoverflow.com/a/54491129/407651).
namespaces = dict([elem for _, elem in ET.iterparse("test1.xml", events=['start-ns'])])
for ns in namespaces:
    ET.register_namespace(ns, namespaces[ns])
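The file in the question stays unchanged because the edited tree is never written back. The sketch below puts the pieces together on a shortened copy of the document: it registers the prefixes, edits Rating through the {*} wildcard (Python 3.8 or later), and serializes the tree. It works on in-memory bytes so that it runs without a file. With a real file, tree.write(adiPath, ...) would be the analogous call. The shortened SAMPLE is an assumption made for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Shortened copy of the question's document, for illustration only.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<ADI3 xmlns="urn:cablelabs:md:xsd:core:3.0" xmlns:title="urn:cablelabs:md:xsd:title:3.0">
  <Asset uriId="ab://cc.com">
    <title:Rating ratingSystem="PL">18</title:Rating>
  </Asset>
</ADI3>"""

# Register the prefixes used in the source so they survive serialization.
for _, (prefix, uri) in ET.iterparse(io.BytesIO(SAMPLE), events=["start-ns"]):
    ET.register_namespace(prefix, uri)

tree = ET.parse(io.BytesIO(SAMPLE))
rating = tree.find(".//{*}Rating")  # Rating in any namespace (Python 3.8+)
rating.text = "999"

# Serialize the edited tree; this is the step missing in the question.
out = io.BytesIO()
tree.write(out, encoding="utf-8", xml_declaration=True)
result = out.getvalue().decode("utf-8")
print(result)
```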

Reading from another child element using Elementree

The code below goes through the xml files and parses them into a single csv file
from xml.etree import ElementTree as ET
from collections import defaultdict
import csv
from pathlib import Path
directory = 'path to a folder with xml files'
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    headers = ['id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']
    writer.writerow(headers)
    xml_files_list = list(map(str, Path(directory).glob('**/*.xml')))
    print(xml_files_list)
    for xml_file in xml_files_list:
        tree = ET.parse(xml_file)
        root = tree.getroot()
        start_nodes = root.findall('.//START')
        for sn in start_nodes:
            row = defaultdict(str)
            repeated_values = dict()
            for k,v in sn.attrib.items():
                repeated_values[k] = v
            for rn in sn.findall('.//Rational'):
                repeated_values['rational'] = rn.text
            for qu in sn.findall('.//Qualify'):
                repeated_values['qualify'] = qu.text
            for ds in sn.findall('.//Description'):
                repeated_values['description_txt'] = ds.text
                repeated_values['description_num'] = ds.attrib['num']
            for st in sn.findall('.//SetData'):
                for k,v in st.attrib.items():
                    row['set_data_'+ str(k)] = v
                for key in repeated_values.keys():
                    row[key] = repeated_values[key]
                row_data = [row[i] for i in headers]
                writer.writerow(row_data)
                row = defaultdict(str)
This is the xml file.
<?xml version="1.0" encoding="utf-8"?>
<ProjectData>
<Phones>
<Date />
<Prog />
<Box />
<Feature />
<IN>MAFWDS</IN>
<Set>234234</Set>
<Pr>23423</Pr>
<Number>afasfhrtv</Number>
<Simple>dfasd</Simple>
<Nr />
<Get>6070106091</Get>
<Reno>1233</Reno>
</Phones>
<FINAL>
<START id="B001" service_code="0x5196">
<Docs Docs_type="START">
<Rational>225196</Rational>
<Qualify>6251960000A0DE</Qualify>
</Docs>
<Description num="1213f2312">The parameter</Description>
<DataFile dg="12" dg_id="let">
<SetData value="32" />
</DataFile>
</START>
<START id="C003" service_code="0x517B">
<Docs Docs_type="START">
<Rational>23423</Rational>
<Qualify>342342</Qualify>
</Docs>
<Description num="3423423f3423">The third</Description>
<DataFile dg="55" dg_id="big">
<SetData x="E1" value="21259" />
<SetData x="E2" value="02" />
</DataFile>
</START>
<START id="Z048" service_code="0x5198">
<RawData rawdata_type="ASDS">
<Rational>225198</Rational>
<Qualify>343243324234234</Qualify>
</RawData>
<Description num="434234234">The forth</Description>
<DataFile unit="21" unit_id="FEDS">
<FileX unit="eg" discrete="false" axis_pts="19" name="Vsome" text_id="bx5" unit_id="GDFSD" />
<SetData xin="5" xax="233" value="323" />
<SetData xin="123" xax="77" value="555" />
<SetData xin="17" xax="65" value="23" />
</DataFile>
</START>
</FINAL>
</ProjectData>
I am currently struggling to modify the code so that it goes to Phones (another child of ProjectData), takes the text of the Set and Get elements, joins them with an underscore, and writes the result into a first column with the header Identify.

Modify your headers line to
headers = ['identify', 'id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']
Then, once the tree for a file has been parsed, read the two values from Phones:
p_get = tree.find('.//Phones/Get').text
p_set = tree.find('.//Phones/Set').text
and add this info to row_data just before the line writer.writerow(row_data), like this:
row_data.insert(0, p_get + '_' + p_set)
Update: because 'identify' is now in headers, row_data already has an empty first column, so assign to it instead of inserting:
row_data[0] = p_get + '_' + p_set
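A compact, self-contained sketch of the same idea follows. It assumes a cut-down copy of the question's XML in a string and joins Set and Get in the order the question describes (Set first). The variable names are illustrative. findtext() with a default avoids an error if either element is missing.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the question's XML, for illustration only.
SAMPLE = """<ProjectData>
  <Phones>
    <Set>234234</Set>
    <Get>6070106091</Get>
  </Phones>
  <FINAL>
    <START id="B001" service_code="0x5196" />
    <START id="C003" service_code="0x517B" />
  </FINAL>
</ProjectData>"""

root = ET.fromstring(SAMPLE)

# One identify value per file, built from Phones/Set and Phones/Get.
identify = "{}_{}".format(
    root.findtext("./Phones/Set", default=""),
    root.findtext("./Phones/Get", default=""),
)

# Put the identify value in the first column of every row.
rows = [
    [identify, sn.get("id"), sn.get("service_code")]
    for sn in root.findall(".//START")
]
print(rows)
```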
