I have an XML file in which I need to replace the value inside the filter tag. I tried to parse the XML, but I get an error that I am unable to resolve. Can anybody please help me replace the value?
path = """
<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP:Header>
<header xmlns="xmlapi_1.0">
<security>
<user>testuser</user>
<password hashed="false">testpassword</password>
</security>
<requestID>XML_API_client#n</requestID>
</header>
</SOAP:Header>
<SOAP:Body>
<find>
<fullClassName>equipment.PhysicalPort</fullClassName>
<filter>
<equal name="siteId" value="x.x.x.x." />
</filter>
<resultFilter>
<attribute>objectFullName</attribute>
<attribute>displayedName</attribute>
<attribute>portName</attribute>
<attribute>description</attribute>
<attribute>lagMembershipId</attribute>
<attribute>encapType</attribute>
<attribute>mode</attribute>
<attribute>speed</attribute>
<attribute>mtuValue</attribute>
</resultFilter>
</find>
<find xmlns="xmlapi_1.0">
<fullClassName>ethernetequipment.EthernetPortSpecifics</fullClassName>
<filter>
<equal name="siteId" value="x.x.x.x." />
</filter>
<resultFilter>
<attribute>autoNegotiate</attribute>
<attribute>downWhenLooped</attribute>
</resultFilter>
</find>
"""
The other way I tried:
path = '../request.xml'
IP = '"' + '63.130.111.89' + '"' + '/>'
f = open('test.xml', 'w+')
for line in open(path, 'r'):
    output_line = line
    string1, string2 = "siteId", "value"
    if string1 in output_line:
        value = output_line.split("value"'=', 1)[1]
        output_line = str.replace(output_line, value, IP)
    f.write(output_line)
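One thing worth checking first: the request pasted above never closes <SOAP:Body> or <SOAP:Envelope>, and that alone is enough to make an XML parser raise an error. Assuming the real file is well-formed, a minimal sketch of the ElementTree route could look like this. The trimmed SAMPLE string and the set_site_id helper are illustrative names, not part of the original request.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the request (assumption: the real file
# closes SOAP:Body and SOAP:Envelope).
SAMPLE = """<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP:Body>
<find xmlns="xmlapi_1.0">
<fullClassName>equipment.PhysicalPort</fullClassName>
<filter>
<equal name="siteId" value="x.x.x.x." />
</filter>
</find>
</SOAP:Body>
</SOAP:Envelope>"""


def set_site_id(xml_text, new_value):
    """Set value=... on every <equal name="siteId"> element and return the XML."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Tags look like '{namespace}equal'; compare the local name only so
        # that namespaced and un-namespaced <find> blocks are both handled.
        local_name = elem.tag.rsplit('}', 1)[-1]
        if local_name == 'equal' and elem.get('name') == 'siteId':
            elem.set('value', new_value)
    return ET.tostring(root, encoding='unicode')


updated = set_site_id(SAMPLE, '63.130.111.89')
print(updated)
```

Note that ElementTree rewrites namespace prefixes on output (SOAP: becomes something like ns0:) unless they are registered beforehand with ET.register_namespace.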
I'm a newbie Python programmer, and I was looking for a script or snippet to help. I have to parse a DITA map/XML file and, for every XML file, output that filename, open the file, search for referenced .dita, .ditamap, or .xml files, output their filenames, and recurse into those files. The idea is to output a list of all the files referenced by that .ditamap/XML file and its children. This list will then be used to zip that group of files to send for processing.
I found some sample code but I get no output!
import os
import glob
root_dir ='~/test_folder'
for filename in glob.glob(root_dir + '**/*.xml', recursive=True):
    print(filename)
Here is a sample ditamap file:
<?xml version="1.0" encoding="utf-8"?><?Inspire CreateDate="2019-04-04T16:06:14" ModifiedDate="2022-11-11T16:44:57"?><!DOCTYPE bookmap PUBLIC "-//OASIS//DTD DITA BookMap//EN" "bookmap.dtd">
<bookmap id="bookmap_e90eb827-7421-4491-8df3-5fea34a44931" xml:lang="en-US">
<booktitle id="booktitle_a78ddf49-09d7-4d3d-925c-d42d9ff7f360">
<mainbooktitle id="mainbooktitle_0a34f716-bedc-4c5d-b198-dfd5006a3174">About the Documentation</mainbooktitle>
</booktitle>
<bookmeta>
<prodinfo>
<prodname />
<vrmlist>
<vrm version="1" />
</vrmlist>
<!--Do not change: Must be Manual-->
<brand>Manual</brand>
</prodinfo>
<!--sets task labels (1st othermeta tag below)-->
<othermeta content="yes" name="task-labels" />
<othermeta content="about" name="bundle" />
<bookid>
<!--Revision-->
<volume>A0X</volume>
</bookid>
<bookrights>
<copyrfirst>
<!--Format of copyright year is yyyy - mm-->
<year>2019 - 04</year>
</copyrfirst>
<bookowner>
<!--Do not change organization-->
<organization>Dell</organization>
</bookowner>
</bookrights>
</bookmeta>
<chapter href="subjectscheme_6b1f4589-e73e-49be-806d-0d064f3efd01.xml" format="ditamap" outputclass="subjectscheme" processing-role="resource-only" scope="external" />
<chapter href="atm-About_user_guide_891d23dc-a186-422d-af40-75249dd31f87.xml">
<topicmeta>
<navtitle>About the <keyword conref="lib-Boomi_Keywords_0346af2b-13d7-491e-bec9-18c5d89225bf.xml#GUID-0207C7F1-40FD-4537-BE59-1D6DA46B9A1D/BOOMI_DELL" /><keyword conref="lib-Boomi_Keywords_0346af2b-13d7-491e-bec9-18c5d89225bf.xml#GUID-0207C7F1-40FD-4537-BE59-1D6DA46B9A1D/BOOMI_ATOMSPHERE" /> User Guide</navtitle>
</topicmeta>
<topicref href="atm-Content_browsing_2c16a734-5cf8-416c-8978-0062ac04e430.xml">
<topicmeta>
<navtitle>Content browsing</navtitle>
</topicmeta>
</topicref>
<topicref href="atm-Content_searching_acdba241-6d33-41bc-8886-0907906fed64.xml">
<topicmeta>
<navtitle>Content searching</navtitle>
<othermeta name="mini-toc" content="yes" />
</topicmeta>
</topicref>
<topicref href="atm-Creating_a_documentation_account_c4ddf038-e007-4ee3-bef9-9f4eb06d0f89.dita" />
<topicref href="atm-Collections_of_your_favorite_topics_5dd10ed2-b689-4628-bc2c-bc35dd4f571e.xml">
<topicref id="topicref_bb2f9a40-0266-44b5-a061-39eca24b5d41" href="atm-sharing_saved_collections_d41e734f-4b2e-4c1e-82e7-91617d1008ae.dita" navtitle="atm-Sharing_saved_collections" type="task" />
</topicref>
<topicref id="topicref_8a2ba548-6595-4cc5-af12-afa2631abfbb" href="atm-Using_table_filters_178c0de0-ddee-4073-b828-476ad13345c4.dita" type="task" />
<topicref href="atm-Team_welcomes_your_feedback_848e635e-0132-43d8-b22d-bbdf87ca398a.xml">
<topicmeta>
<navtitle>The <keyword conref="lib-Boomi_Keywords_0346af2b-13d7-491e-bec9-18c5d89225bf.xml#GUID-0207C7F1-40FD-4537-BE59-1D6DA46B9A1D/BOOMI_ATOMSPHERE">The T</keyword> documentation team welcomes your feedback</navtitle>
</topicmeta>
</topicref>
<topicref href="atm-Other_ways_to_get_help_09adc783-784f-4f15-87f9-672d8030b689.xml">
<topicmeta>
<navtitle>Other ways to get help</navtitle>
</topicmeta>
</topicref>
</chapter>
<chapter>
<topicref href="atm-Terms_of_use_78ffba54-261d-428d-afcd-a9db3ce51123.dita" />
</chapter>
<chapter>
<topicref>
<topicref href="atm-API_licensing_df074d66-3a10-4df5-8dd5-0a3e13373d0e.dita" />
</topicref>
</chapter>
<backmatter>
<topicref href="r-boo-Copyright_Boomi_Online_Help_9eea563b-53a2-4d69-b6e7-7372bf7d5440.xml" navtitle="Copyright">
<topicmeta>
<navtitle>CopyrightBoomiOnlineHelp</navtitle>
</topicmeta>
</topicref>
<topicref href="atm-About_reltable_72640fe6-ae6d-490c-b369-7adbcb67bc99.xml" linking="normal" print="no" toc="no">
<topicmeta>
<navtitle>reltable</navtitle>
</topicmeta>
</topicref>
</backmatter>
</bookmap>
If anyone can help or has a similar script that would traverse and parse the files, that would be great!
Any help is greatly appreciated!
Thanks,
Russ
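On the glob snippet in the question: two details would explain the empty output. glob does not expand a leading '~', and root_dir + '**/*.xml' produces '~/test_folder**/*.xml' because there is no separator between the folder and '**'. A minimal sketch of a corrected search follows; find_xml_files is an illustrative name and the throwaway folder only exists for the demonstration.

```python
import glob
import os
import tempfile


def find_xml_files(root_dir):
    """Return every .xml file under root_dir, including subfolders."""
    # expanduser turns a leading '~' into the home directory; glob does not.
    root_dir = os.path.expanduser(root_dir)
    # os.path.join supplies the separator that plain string concatenation lacked.
    pattern = os.path.join(root_dir, '**', '*.xml')
    return sorted(glob.glob(pattern, recursive=True))


# Small demonstration in a temporary folder
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, 'sub'))
    for name in ('a.xml', os.path.join('sub', 'b.xml'), 'notes.txt'):
        open(os.path.join(demo_dir, name), 'w').close()
    found = [os.path.relpath(p, demo_dir) for p in find_xml_files(demo_dir)]
    print(found)
```

For the real folder the call would be find_xml_files('~/test_folder').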
You can search for the href attributes like this:
import xml.etree.ElementTree as ET
tree = ET.parse("bookmap.dita")
root = tree.getroot()
for elem in root.iter():
    if 'href' in elem.attrib:
        # print tag name and file reference
        print(elem.tag, elem.get('href'))
Output:
chapter subjectscheme_6b1f4589-e73e-49be-806d-0d064f3efd01.xml
chapter atm-About_user_guide_891d23dc-a186-422d-af40-75249dd31f87.xml
topicref atm-Content_browsing_2c16a734-5cf8-416c-8978-0062ac04e430.xml
topicref atm-Content_searching_acdba241-6d33-41bc-8886-0907906fed64.xml
topicref atm-Creating_a_documentation_account_c4ddf038-e007-4ee3-bef9-9f4eb06d0f89.dita
topicref atm-Collections_of_your_favorite_topics_5dd10ed2-b689-4628-bc2c-bc35dd4f571e.xml
topicref atm-sharing_saved_collections_d41e734f-4b2e-4c1e-82e7-91617d1008ae.dita
topicref atm-Using_table_filters_178c0de0-ddee-4073-b828-476ad13345c4.dita
topicref atm-Team_welcomes_your_feedback_848e635e-0132-43d8-b22d-bbdf87ca398a.xml
topicref atm-Other_ways_to_get_help_09adc783-784f-4f15-87f9-672d8030b689.xml
topicref atm-Terms_of_use_78ffba54-261d-428d-afcd-a9db3ce51123.dita
topicref atm-API_licensing_df074d66-3a10-4df5-8dd5-0a3e13373d0e.dita
topicref r-boo-Copyright_Boomi_Online_Help_9eea563b-53a2-4d69-b6e7-7372bf7d5440.xml
topicref atm-About_reltable_72640fe6-ae6d-490c-b369-7adbcb67bc99.xml
Hope this helps you.
As we discussed, this code writes each file name and the links found in it to a CSV file. Be careful: as I told you, the program has no stop condition and no error handling. It only follows the links found in the root file, and it stops with an exception as soon as a referenced file cannot be found or parsed:
import csv
import xml.etree.ElementTree as ET
class Dita:
    """Write a CSV file with each file name and the list of files it links to."""
    def __init__(self, file):
        self.file_name = file
        self.file_list = []
    def parse_dita(self, file_name):
        tree = ET.parse(file_name)
        root = tree.getroot()
        file_list = []
        for elem in root.iter():
            if 'href' in elem.attrib:
                file_list.append(elem.get('href'))
        # Append one row per parsed file: the file name, then its list of links
        with open("f_and_links.csv", 'a', newline='') as f_and_links:
            csv.writer(f_and_links).writerow([file_name, str(file_list)])
        return file_list
def main():
    root_file = "bookmap.dita"
    print("Source file:", root_file)
    dita_obj = Dita(root_file)
    file_list = dita_obj.parse_dita(root_file)
    # Parse each file linked from the root file (one level deep)
    for f in file_list:
        print("Links list", f)
        dita_obj.parse_dita(f)
if __name__ == '__main__':
    main()
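If a stop condition is wanted, one possible approach is to remember which files have already been visited and skip them. The sketch below is only an illustration under a few assumptions: collect_references is a made-up name, references are resolved relative to the referencing file, only href attributes are followed (not conref), and the three tiny demo files stand in for real DITA content.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def collect_references(start_file):
    """Return start_file plus every file reachable through href attributes."""
    seen = set()
    pending = [os.path.normpath(start_file)]
    while pending:
        current = pending.pop()
        if current in seen:
            continue  # already handled: this is the stop condition
        seen.add(current)
        try:
            root = ET.parse(current).getroot()
        except (OSError, ET.ParseError):
            continue  # missing or unparsable file: keep it listed, do not descend
        base_dir = os.path.dirname(current)
        for elem in root.iter():
            href = elem.get('href')
            if not href:
                continue
            target = href.split('#', 1)[0]  # drop any '#fragment' part
            if target.endswith(('.dita', '.ditamap', '.xml')):
                pending.append(os.path.normpath(os.path.join(base_dir, target)))
    return sorted(seen)


# Demonstration with three small files, one of which links back to the root
with tempfile.TemporaryDirectory() as demo_dir:
    files = {
        'root.ditamap': '<map><topicref href="a.dita"/><topicref href="b.xml#x"/></map>',
        'a.dita': '<topic><xref href="root.ditamap"/></topic>',
        'b.xml': '<topic/>',
    }
    for name, text in files.items():
        with open(os.path.join(demo_dir, name), 'w', encoding='utf-8') as fh:
            fh.write(text)
    start = os.path.join(demo_dir, 'root.ditamap')
    result = [os.path.basename(p) for p in collect_references(start)]
    print(result)
```

Files that are referenced but missing still end up in the returned list; filtering with os.path.exists before zipping would remove them.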
I am trying to create a MyXml.xml file by parsing another file, Source.xml. The current structure of MyXml is:
<tag atrib="true" atrib2="false" atrib3="1" atrib4="7">
    <tag1 txt="CONTENT">
        <tag2 name="Category">1</tag2>
        <tag3 name="Wallet"> </tag3>
        <tag4 name="Increase">1</tag4>
        <tag5 name="Text">
            <div />
        </tag5>
    </tag1>
</tag>
But my output should look like this (the contents of tag5 should be on the same line):
<tag atrib="true" atrib2="false" atrib3="1" atrib4="7">
    <tag1 txt="CONTENT">
        <tag2 name="Category">1</tag2>
        <tag3 name="Wallet"> </tag3>
        <tag4 name="Increase">1</tag4>
        <tag5 name="Text"><div><h2>SomeTxt</h2></div></tag5>
    </tag1>
</tag>
My current code is this:
from xml.dom import minidom
import xml.etree.ElementTree as ET
MDroot = minidom.Document()
tag = MDroot.createElement('tag')
MDroot.appendChild(tag)
# Other tags
root = ET.Element('tag')
tag1 = ET.SubElement(root, 'tag1', txt='CONTENT')
ET.SubElement(tag1, "tag2", name='Category').text = "Heading"
ET.SubElement(tag1, "tag3", name='Wallet').text = ' '
ET.SubElement(tag1, "tag4", name='Increase').text = '1'  # text must be a string
tag5 = ET.SubElement(tag1, "tag5", name='Text')
div = ET.SubElement(tag5, "div")
root1 = ET.Element(tag5)
root1.insert(1, div)
But this code always creates the normal XML structure, with each child element on its own line. Any idea how I can put those on the same line?
Thanks!
In XML, line breaks between tags are NOT important!
<tag5 name="Text"><div><h2>SomeTxt</h2></div></tag5>
has the same meaning as:
<tag5 name="Text">
    <div>
        <h2>SomeTxt</h2>
    </div>
</tag5>
So you can just ignore the line breaks.
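If the single-line layout is still wanted for cosmetic reasons, one possible sketch is to let ElementTree indent the whole tree and then remove the whitespace it added inside tag5. This assumes Python 3.9 or later (for ET.indent) and uses the element names from the question.

```python
import xml.etree.ElementTree as ET

root = ET.Element('tag', atrib='true', atrib2='false', atrib3='1', atrib4='7')
tag1 = ET.SubElement(root, 'tag1', txt='CONTENT')
ET.SubElement(tag1, 'tag2', name='Category').text = '1'
ET.SubElement(tag1, 'tag3', name='Wallet').text = ' '
ET.SubElement(tag1, 'tag4', name='Increase').text = '1'  # text must be a str
tag5 = ET.SubElement(tag1, 'tag5', name='Text')
div = ET.SubElement(tag5, 'div')
ET.SubElement(div, 'h2').text = 'SomeTxt'

# Python 3.9+: adds line breaks and two-space indentation in place
ET.indent(root)

# Undo the indentation inside tag5 so its children stay on one line.
for elem in tag5.iter():
    if elem.text is not None and not elem.text.strip():
        elem.text = None
    # Keep tag5's own tail: it holds the line break before </tag1>.
    if elem is not tag5 and elem.tail is not None and not elem.tail.strip():
        elem.tail = None

xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)
```

The whitespace-only text in tag3 is left alone, because ET.indent only touches the text of elements that have children.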
I have a simple XML structure from which I want to extract the data then process it. I'm using Python and xml.etree.ElementTree, and it works well, except for a particular case. When I parse a particular XML, there is one node that returns None as values for the content of the elements.
Here is the code and the output:
import xml.etree.ElementTree as ET
sample = '<?xml version="1.0" encoding="UTF-8"?>\
<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">\
<file original="global" datatype="plaintext" source-language="en" target-language="fr-CA">\
<body>\
<trans-unit id="translations.amountMoreString" resname="e438d8a3237fefa5ace76eae98c157bd">\
<source xml:lang="en"><ph id="1" ctype="x-phrase-placeholder">{{ count }}</ph> more</source>\
<target xml:lang="fr-CA" state="signed-off"><ph id="1" ctype="x-phrase-placeholder">{{ count }}</ph> de plus</target>\
</trans-unit>\
<trans-unit id="translations.amountString" resname="7cdc0d444b4ee4ccc0a11819a3f96af2">\
<source xml:lang="en">Amount</source>\
<target xml:lang="fr-CA" state="signed-off">Quantité</target>\
</trans-unit>\
</body>\
</file>\
</xliff>'
xliff_root = ET.fromstring(sample)
translation_unit = {}
for file_element in xliff_root:
    for body_element in file_element:
        for trans_unit_element in body_element:
            print('\ntrans-unit element', trans_unit_element)
            for text_element in trans_unit_element:
                print('\tText element:', text_element)
                if text_element.tag == '{urn:oasis:names:tc:xliff:document:1.2}source':
                    print('\t\tSource:', text_element.text)
                    translation_unit['source'] = text_element.text
                elif text_element.tag == '{urn:oasis:names:tc:xliff:document:1.2}target':
                    print('\t\tTarget:', text_element.text)
                    translation_unit['target'] = text_element.text
The output:
trans-unit element <Element '{urn:oasis:names:tc:xliff:document:1.2}trans-unit' at 0x00000163DBAA71D0>
    Text element: <Element '{urn:oasis:names:tc:xliff:document:1.2}source' at 0x00000163DBAA72C0>
        Source: None
    Text element: <Element '{urn:oasis:names:tc:xliff:document:1.2}target' at 0x00000163DBAA7450>
        Target: None
trans-unit element <Element '{urn:oasis:names:tc:xliff:document:1.2}trans-unit' at 0x00000163DBAA7590>
    Text element: <Element '{urn:oasis:names:tc:xliff:document:1.2}source' at 0x00000163DBAA75E0>
        Source: Amount
    Text element: <Element '{urn:oasis:names:tc:xliff:document:1.2}target' at 0x00000163DBAA7630>
        Target: Quantité
I'm getting None as values for the <source> and <target> elements for the first <trans-unit>, but the second one returned the correct values. I understand that there are other elements inside the <source> and <target> elements, but there is also textual content.
I would like to thank in advance anyone who could help me understand and correct this issue...
Kind regards,
JF
In the first <trans-unit> element, <source> does not contain only text; it also contains a <ph> element. The structure is:
xliff
    body
        trans-unit
            source
                ph
                    "text is here"
                "text is also here"
Compare the second trans-unit element, where the structure is:
xliff
    body
        trans-unit
            source
                "text is here"
You can use the itertext method to extract all the text children. For example, this code:
import xml.etree.ElementTree as ET
sample='''<?xml version="1.0" encoding="UTF-8"?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
<file original="global" datatype="plaintext" source-language="en" target-language="fr-CA">
<body>
<trans-unit id="translations.amountMoreString" resname="e438d8a3237fefa5ace76eae98c157bd">
<source xml:lang="en"><ph id="1" ctype="x-phrase-placeholder">{{ count }}</ph> more</source>
<target xml:lang="fr-CA" state="signed-off"><ph id="1" ctype="x-phrase-placeholder">{{ count }}</ph> de plus</target>
</trans-unit>
<trans-unit id="translations.amountString" resname="7cdc0d444b4ee4ccc0a11819a3f96af2">
<source xml:lang="en">Amount</source>
<target xml:lang="fr-CA" state="signed-off">Quantité</target>
</trans-unit>
</body>
</file>
</xliff>
'''
xliff_root = ET.fromstring(sample)
translation_unit = {}
for trans_unit in xliff_root.findall('.//{urn:oasis:names:tc:xliff:document:1.2}trans-unit'):
    source = trans_unit.find('{urn:oasis:names:tc:xliff:document:1.2}source')
    source_text = [text.strip() for text in source.itertext()][-1]
    target = trans_unit.find('{urn:oasis:names:tc:xliff:document:1.2}target')
    target_text = [text.strip() for text in target.itertext()][-1]
    print('source:', source_text)
    print('target:', target_text)
Produces:
source: more
target: de plus
source: Amount
target: Quantité
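To make the layout of the text concrete, here is a small sketch of where ElementTree stores each piece. The one-line <source> string is a trimmed, namespace-free stand-in for the sample above. Text that follows a child element lives in that child's tail attribute, not in the parent's text.

```python
import xml.etree.ElementTree as ET

source = ET.fromstring('<source><ph id="1">{{ count }}</ph> more</source>')
ph = source[0]

print(repr(source.text))  # None: there is no text before <ph>
print(repr(ph.text))      # '{{ count }}'
print(repr(ph.tail))      # ' more': the text after </ph> belongs to ph.tail

# itertext() yields every text and tail in document order
full_text = ''.join(source.itertext())
print(full_text)          # {{ count }} more
```

Joining everything from itertext() keeps the placeholder text as well, whereas taking only the last item (as in the answer above) keeps just the trailing part.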
Thanks to @larsks's answer and some digging in the documentation, I managed to get exactly what I need. I found the ET.tostring function, which returns the whole node as a text string; I then use a couple of regular expressions to remove the unwanted parts from those strings.
Full code here:
import xml.etree.ElementTree as ET
import re
sample = '''<?xml version="1.0" encoding="UTF-8"?>\
<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">\
<file original="global" datatype="plaintext" source-language="en" target-language="fr-CA">\
<body>\
<trans-unit id="translations.amountMoreString" resname="e438d8a3237fefa5ace76eae98c157bd">\
<source xml:lang="en"><ph id="1" ctype="x-phrase-placeholder">{{ count }}</ph> more</source>\
<target xml:lang="fr-CA" state="signed-off"><ph id="1" ctype="x-phrase-placeholder">{{ count }}</ph> de plus</target>\
</trans-unit>\
<trans-unit id="translations.amountString" resname="7cdc0d444b4ee4ccc0a11819a3f96af2">\
<source xml:lang="en">Amount</source>\
<target xml:lang="fr-CA" state="signed-off">Quantité</target>\
</trans-unit>\
</body>\
</file>\
</xliff>
'''
xliff_root = ET.fromstring(sample)
translation_unit = {}
for trans_unit in xliff_root.findall('.//{urn:oasis:names:tc:xliff:document:1.2}trans-unit'):
    source = trans_unit.find('{urn:oasis:names:tc:xliff:document:1.2}source')
    source_text = ET.tostring(source, encoding='utf8').decode('utf8')
    # Remove the XML declaration line, the nsN: prefixes, and the <source> tags
    source_text = re.sub(r'^<\?.*?\?>\n', r'', source_text)
    source_text = re.sub(r'<(\/?)ns\d+:', r'<\1', source_text)
    source_text = re.sub(r'<(\/?)source(.*?)>', r'', source_text)
    print(source_text)
    target = trans_unit.find('{urn:oasis:names:tc:xliff:document:1.2}target')
    target_text = ET.tostring(target, encoding='utf8').decode('utf8')
    # Same clean-up for the <target> element
    target_text = re.sub(r'^<\?.*?\?>\n', r'', target_text)
    target_text = re.sub(r'<(\/?)ns\d+:', r'<\1', target_text)
    target_text = re.sub(r'<(\/?)target(.*?)>', r'', target_text)
    print(target_text)
Results:
<ph id="1" ctype="x-phrase-placeholder">{{ count }}</ph> more
<ph id="1" ctype="x-phrase-placeholder">{{ count }}</ph> de plus
Amount
Quantité
A big thank you to @larsks as well for showing me a more efficient way to parse XML documents using XPath and the .findall and .find methods! :-)
JF
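For what it's worth, the same inner markup can probably be obtained without regular expressions by serializing the children one by one. This is only a sketch: inner_xml is a made-up helper, the shortened sample keeps a single <source> element, and the helper renames tags in the parsed tree, so it should be run on a tree that is not needed afterwards.

```python
import xml.etree.ElementTree as ET

NS = '{urn:oasis:names:tc:xliff:document:1.2}'


def inner_xml(element):
    """Return the text and child markup inside element, without its own tags."""
    parts = [element.text or '']
    for child in element:
        # Strip the namespace from the tag names so no ns0: prefix is emitted.
        # Note: this modifies the tree in place.
        for node in child.iter():
            if node.tag.startswith(NS):
                node.tag = node.tag[len(NS):]
        # tostring() also includes the child's tail text.
        parts.append(ET.tostring(child, encoding='unicode'))
    return ''.join(parts)


sample = ('<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">'
          '<source xml:lang="en"><ph id="1" ctype="x-phrase-placeholder">'
          '{{ count }}</ph> more</source>'
          '</xliff>')
source = ET.fromstring(sample).find(NS + 'source')
result = inner_xml(source)
print(result)
```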
The code below goes through the XML files in a folder and parses them into a single CSV file:
from xml.etree import ElementTree as ET
from collections import defaultdict
import csv
from pathlib import Path
directory = 'path to a folder with xml files'
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    headers = ['id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']
    writer.writerow(headers)
    xml_files_list = list(map(str, Path(directory).glob('**/*.xml')))
    print(xml_files_list)
    for xml_file in xml_files_list:
        tree = ET.parse(xml_file)
        root = tree.getroot()
        start_nodes = root.findall('.//START')
        for sn in start_nodes:
            row = defaultdict(str)
            repeated_values = dict()
            for k,v in sn.attrib.items():
                repeated_values[k] = v
            for rn in sn.findall('.//Rational'):
                repeated_values['rational'] = rn.text
            for qu in sn.findall('.//Qualify'):
                repeated_values['qualify'] = qu.text
            for ds in sn.findall('.//Description'):
                repeated_values['description_txt'] = ds.text
                repeated_values['description_num'] = ds.attrib['num']
            for st in sn.findall('.//SetData'):
                for k,v in st.attrib.items():
                    row['set_data_'+ str(k)] = v
                for key in repeated_values.keys():
                    row[key] = repeated_values[key]
                row_data = [row[i] for i in headers]
                writer.writerow(row_data)
                row = defaultdict(str)
This is the xml file.
<?xml version="1.0" encoding="utf-8"?>
<ProjectData>
<Phones>
<Date />
<Prog />
<Box />
<Feature />
<IN>MAFWDS</IN>
<Set>234234</Set>
<Pr>23423</Pr>
<Number>afasfhrtv</Number>
<Simple>dfasd</Simple>
<Nr />
<Get>6070106091</Get>
<Reno>1233</Reno>
</Phones>
<FINAL>
<START id="B001" service_code="0x5196">
<Docs Docs_type="START">
<Rational>225196</Rational>
<Qualify>6251960000A0DE</Qualify>
</Docs>
<Description num="1213f2312">The parameter</Description>
<DataFile dg="12" dg_id="let">
<SetData value="32" />
</DataFile>
</START>
<START id="C003" service_code="0x517B">
<Docs Docs_type="START">
<Rational>23423</Rational>
<Qualify>342342</Qualify>
</Docs>
<Description num="3423423f3423">The third</Description>
<DataFile dg="55" dg_id="big">
<SetData x="E1" value="21259" />
<SetData x="E2" value="02" />
</DataFile>
</START>
<START id="Z048" service_code="0x5198">
<RawData rawdata_type="ASDS">
<Rational>225198</Rational>
<Qualify>343243324234234</Qualify>
</RawData>
<Description num="434234234">The forth</Description>
<DataFile unit="21" unit_id="FEDS">
<FileX unit="eg" discrete="false" axis_pts="19" name="Vsome" text_id="bx5" unit_id="GDFSD" />
<SetData xin="5" xax="233" value="323" />
<SetData xin="123" xax="77" value="555" />
<SetData xin="17" xax="65" value="23" />
</DataFile>
</START>
</FINAL>
</ProjectData>
This is what the output looks like.
I am currently struggling to modify the code so that it goes to Phones (another child of ProjectData), takes the values of the Set and Get elements, joins them with an underscore, and writes the result into the first column, under the header Identify.
The picture below shows how it should look.
Modify your headers line to:
headers = ['identify', 'id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']
Then, after tree = ET.parse(xml_file), read the two values from Phones:
p_get = tree.find('.//Phones/Get').text
p_set = tree.find('.//Phones/Set').text
and add this info to row_data just before the line writer.writerow(row_data), like this:
row_data.insert(0, p_get + '_' + p_set)
Update: because 'identify' is now part of headers, row_data already contains an (empty) first column, so assign to it instead of inserting a new one:
row_data[0] = p_get + '_' + p_set
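Put together, the idea could be sketched as follows. This is a cut-down illustration, not the full script: XML_TEXT is a shortened version of the sample file, the header list is reduced to the columns it needs, and the rows are written to an in-memory buffer instead of output.csv.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened stand-in for the sample file
XML_TEXT = """<ProjectData>
  <Phones><Set>234234</Set><Get>6070106091</Get></Phones>
  <FINAL>
    <START id="C003" service_code="0x517B">
      <DataFile><SetData x="E1" value="21259" /><SetData x="E2" value="02" /></DataFile>
    </START>
  </FINAL>
</ProjectData>"""

headers = ['identify', 'id', 'service_code', 'set_data_x', 'set_data_value']

root = ET.fromstring(XML_TEXT)
# findtext returns the element's text, or the default if the element is missing.
p_get = root.findtext('./Phones/Get', default='')
p_set = root.findtext('./Phones/Set', default='')
identify = p_get + '_' + p_set

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(headers)
for start in root.iter('START'):
    for set_data in start.iter('SetData'):
        # One CSV row per SetData, repeating the START attributes and identify
        row = {'identify': identify}
        row.update(start.attrib)
        row.update({'set_data_' + k: v for k, v in set_data.attrib.items()})
        writer.writerow([row.get(h, '') for h in headers])

rows = buffer.getvalue().splitlines()
print(rows)
```

Using findtext with a default avoids an AttributeError when a file has no Phones/Get or Phones/Set element.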
For each instrumentData in the XML file below, I need to print the id and the values in the data value=" " attributes (the value between the quotes), all using Python 2.7.
This is my XML file:
<instrumentDatas>
    <instrumentData>
        <instrument>
            <id>Stephan</id>
        </instrument>
        <data value="A" />
        <data value="B" />
        <data value="C" />
    </instrumentData>
    <instrumentData>
        <instrument>
            <id>Patrick</id>
        </instrument>
        <data value="F" />
        <data value="G" />
        <data value="H" />
    </instrumentData>
</instrumentDatas>
I am able to print the id for each instrumentData with the code below, but I can't figure out how to print the values.
from xml.dom import minidom
xmldoc = minidom.parse("C:/Users/Desktop/PythonXMLproject/Smallfile.xml")
instrumentDatas = xmldoc.getElementsByTagName("instrumentDatas")[0]
instrumentDatax = instrumentDatas.getElementsByTagName("instrumentData")
for instrumentData in instrumentDatax:
    idx = instrumentData.getElementsByTagName("id")[0].firstChild.data
    print(idx)
Thank you
import xml.dom.minidom
DOMTree = xml.dom.minidom.parse("C:/Users/Desktop/PythonXMLproject/Smallfile.xml")
collection = DOMTree.documentElement
instrumentDatas = collection.getElementsByTagName("instrumentData")
for instrumentData in instrumentDatas:
    idx = instrumentData.getElementsByTagName("id")[0].firstChild.data
    # print(...) with a single argument works in both Python 2.7 and 3
    print("ID: %s" % idx)
    datas = instrumentData.getElementsByTagName("data")
    for data in datas:
        if data.hasAttribute('value'):
            value = data.getAttribute('value')
            print("Value: %s" % value)
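As an alternative to minidom, the same loop can be sketched with xml.etree.ElementTree, which ships with Python 2.7 as well as Python 3. The XML_TEXT string below is a shortened inline copy of the file from the question, used here so the example is self-contained.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the file from the question
XML_TEXT = """<instrumentDatas>
  <instrumentData>
    <instrument><id>Stephan</id></instrument>
    <data value="A" /><data value="B" /><data value="C" />
  </instrumentData>
  <instrumentData>
    <instrument><id>Patrick</id></instrument>
    <data value="F" /><data value="G" /><data value="H" />
  </instrumentData>
</instrumentDatas>"""

root = ET.fromstring(XML_TEXT)
results = []
for instrument_data in root.findall('instrumentData'):
    # findtext follows the path instrument/id and returns the text of <id>
    instrument_id = instrument_data.findtext('instrument/id')
    # get('value') reads the attribute of each direct <data> child
    values = [data.get('value') for data in instrument_data.findall('data')]
    results.append((instrument_id, values))
    print('%s: %s' % (instrument_id, ', '.join(values)))
```

To read from the file instead, ET.parse(path).getroot() would replace ET.fromstring(XML_TEXT).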