I have written the code below to create a moderately large XML file, in which I create nodes in a loop.
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

number = 0

def xml_write(number, doc):
    # number changes from 0 to 9, one value per loop iteration
    ET.SubElement(doc, "extra-TextID", used="true").text = str(number)

while number != 10:
    doc = ET.Element("message")
    xml_write(number, doc)
    tree = ET.ElementTree(doc)
    tree.write('XML_file.xml')
    number = number + 1
But when I run the above code, I only get the last node, i.e., the one with "9". The data in the file is replaced on every iteration. How can I append instead, so that I get all the nodes, with the values 0 to 9?
<?xml version="1.0"?>
<message>
<source>Rain</source>
<translations language="Dev">Cyclone</translations>
<extra-TextID used="true">9</extra-TextID>
</message>
I need the XML file to look like this:
<?xml version="1.0"?>
<message>
<source>Rain</source>
<translations language="Dev">Cyclone</translations>
<extra-TextID used="true">0</extra-TextID>
</message>
<?xml version="1.0"?>
<message>
<source>Rain</source>
<translations language="Dev">Cyclone</translations>
<extra-TextID used="true">1</extra-TextID>
</message>
<?xml version="1.0"?>
<message>
<source>Rain</source>
<translations language="Dev">Cyclone</translations>
<extra-TextID used="true">2</extra-TextID>
</message>
.
.
.
<?xml version="1.0"?>
<message>
<source>Rain</source>
<translations language="Dev">Cyclone</translations>
<extra-TextID used="true">9</extra-TextID>
</message>
The ElementTree library will not write an XML document with multiple root elements, because such a file is not well-formed XML. If you still want this kind of output in the file, append the generated elements manually:
with open('XML_file.xml', 'wb') as f:
    while number != 10:
        doc = ET.Element("message")
        xml_write(number, doc)
        f.write(ET.tostring(doc, method="xml"))
        number += 1
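If a single well-formed document is acceptable, one alternative is to collect all the `message` elements under one wrapper root and serialize once. This is only a sketch: the `messages` wrapper name and the `build_messages` helper are assumptions for illustration, not part of the original code.

```python
import xml.etree.ElementTree as ET


def build_messages(count):
    """Return a hypothetical <messages> root holding one <message> per number."""
    root = ET.Element("messages")  # wrapper name is an assumption
    for number in range(count):
        message = ET.SubElement(root, "message")
        ET.SubElement(message, "extra-TextID", used="true").text = str(number)
    return root


root = build_messages(10)

# Serialize the whole tree in one go; nothing is overwritten between iterations.
xml_bytes = ET.tostring(root, encoding="utf-8")
print(xml_bytes.decode("utf-8"))
```

To write to disk instead of printing, `ET.ElementTree(root).write('XML_file.xml', encoding='utf-8', xml_declaration=True)` produces one file with a single declaration and a single root.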
My xml file:
<?xml version='1.0' encoding='UTF-8'?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<CstmrCdtTrfInitn>
<CtgyPurp>. // ---->i want to change this tag
<Cd>SALA</Cd> //-----> no change
</CtgyPurp> // ----> i want to change this tag
</CstmrCdtTrfInitn>
</Document>
I want to make this change in the XML file:
<CtgyPurp></CtgyPurp> should become <newName></newName>
I know how to change the value within a tag, but not how to change the tag itself with lxml.
Something like this should work - note the treatment of namespaces:
from lxml import etree
ctg = """[your xml above]"""
doc = etree.XML(ctg.encode())
ns = {"xx": "urn:iso:std:iso:20022:tech:xsd:pain.001.001.03"}
target = doc.xpath('//xx:CtgyPurp',namespaces=ns)[0]
target.tag = "newName"
print(etree.tostring(doc).decode())
Output:
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<CstmrCdtTrfInitn>
<newName>. // ---->i want to change this tag
<Cd>SALA</Cd> //-----> no change
</newName> // ----> i want to change this tag
</CstmrCdtTrfInitn>
</Document>
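Note that a bare name assigned to `.tag` carries no namespace. If the renamed element should stay in the document's default namespace, the new tag can be written in `{uri}name` form. Below is a minimal sketch using the standard library's ElementTree instead of lxml (an assumption made so the example runs without extra packages; the same `{uri}name` notation applies in lxml), with a shortened copy of the XML:

```python
import xml.etree.ElementTree as ET

NS_URI = "urn:iso:std:iso:20022:tech:xsd:pain.001.001.03"
xml_text = f"""<Document xmlns="{NS_URI}">
  <CstmrCdtTrfInitn>
    <CtgyPurp>
      <Cd>SALA</Cd>
    </CtgyPurp>
  </CstmrCdtTrfInitn>
</Document>"""

# Serialize NS_URI as the default namespace instead of an auto-generated prefix.
ET.register_namespace("", NS_URI)

doc = ET.fromstring(xml_text)
target = doc.find(".//{%s}CtgyPurp" % NS_URI)
target.tag = "{%s}newName" % NS_URI  # new local name, same namespace

result = ET.tostring(doc, encoding="unicode")
print(result)
```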
I have a log file from an application in an XML-like format that I'm trying to parse. As you can see from the file, one "group" starts with a [trace] line and contains four nodes: RequestMeta, Request, ReplyMeta, and Reply.
Once the file is parsed, I want to create an object for each "group" and use the objects for further processing. There can be anywhere from 1 to n groups, depending on the complexity of the log file.
I have been able to parse the XML, but I have some questions on how best to proceed based on its structure.
The first problem is how to structure/re-structure the file for parsing. Since I'm adding a single root node to more than one "group", there will be no easy way for me to know which children of the root node belong together in that group. In the original file, the group is denoted as everything between the [trace] line and the next [trace] line.
I think I could potentially solve this by taking the string for each "group" and creating a tree per group instead of one tree for the entire file.
The second problem is how to store the data once it's parsed. Each and every request/reply will contain different data elements under the srvdata node. I'm not sure how to dynamically store a variable number of values that have a variable number of names.
After parsing all of the data, I want to output it in a simple webpage that looks something like https://imgur.com/a/2l6ZSJK
py script
import xml.etree.ElementTree as ET

with open('C:/code/mra/requestreply.txt') as f:
    txt = f.read()

pos = 0
# replace all [trace] lines
while pos >= 0:
    pos = txt.find('[trace-')
    pos2 = txt.find('\n', pos + 1) + 1
    if pos >= 0:
        txt = txt.replace(txt[pos:pos2], '')

# replace all xml instances because they are out of order
txt = txt.replace('<?xml version="1.0" encoding="utf-8"?>\n', '')

# add a master root node
xml = '<root>\n' + txt + '</root>'
tree = ET.fromstring(xml)
xml file - this is considered a single group (there could be hundreds)
[trace-592] TransactionID=6010 TransactionName=CPM.ExecuteDiscernScript User=MEPPS
<RequestMeta>
<?xml version="1.0" encoding="utf-8"?>
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</RequestMeta>
<Request>
<?xml version="1.0" encoding="utf-8"?>
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</Request>
<ReplyMeta>
<?xml version="1.0" encoding="utf-8"?>
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</ReplyMeta>
<Reply>
<?xml version="1.0" encoding="utf-8"?>
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</Reply>
I suggest modifying your XML structure like this (I named the file trace.xml):
<?xml version="1.0" encoding="utf-8"?>
<root>
<!--[trace-592] TransactionID=6010 TransactionName=CPM.ExecuteDiscernScript User=MEPPS-->
<RequestMeta>
<!-- <?xml version="1.0" encoding="utf-8"?> -->
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</RequestMeta>
<Request>
<!-- <?xml version="1.0" encoding="utf-8"?> -->
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</Request>
<ReplyMeta>
<!-- <?xml version="1.0" encoding="utf-8"?> -->
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</ReplyMeta>
<Reply>
<!-- <?xml version="1.0" encoding="utf-8"?> -->
<srvxml>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
....
</xs:schema>
<srvdata lang="C">
....
</srvdata>
</srvxml>
</Reply>
</root>
Then you can parse each segment separately, like this:
import xml.etree.ElementTree as ET

def parseRequestMeta(RequestMeta):
    """Parse the parts you are interested in here."""
    for root in RequestMeta:
        print(root.tag)
        for child in root.iter():
            print(child.tag, child.text)

def parseRequest(Request):
    pass

def parseReplyMeta(ReplyMeta):
    pass

def parseReply(Reply):
    pass

RequestMeta = []
Request = []
ReplyMeta = []
Reply = []

events = ["start", "end"]
for event, node in ET.iterparse('trace.xml', events=events):
    # an element is complete only at its "end" event
    if event == "end" and node.tag == "RequestMeta":
        RequestMeta.append(node)
        print(node.tag)
    if event == "end" and node.tag == "Request":
        Request.append(node)
        print(node.tag)
    if event == "end" and node.tag == "ReplyMeta":
        ReplyMeta.append(node)
        print(node.tag)
    if event == "end" and node.tag == "Reply":
        Reply.append(node)
        print(node.tag)

# parseRequestMeta is reused for every list because the other parsers are still stubs
parseRequestMeta(RequestMeta)
parseRequestMeta(Request)
parseRequestMeta(ReplyMeta)
parseRequestMeta(Reply)
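The per-group idea from the question (one tree per `[trace-...]` block) can also be sketched without editing the log file by hand. The sample log below is shortened and made up, and the `split_groups` and `srvdata_to_dict` helpers, the dictionary layout, and the `<name>`, `<id>`, and `<status>` fields are assumptions for illustration. A flat dictionary only fits `srvdata` children with unique names; repeated or nested fields would need lists or nested dictionaries.

```python
import re
import xml.etree.ElementTree as ET

# Shortened, made-up sample in the same shape as the log above.
LOG = """[trace-592] TransactionID=6010 TransactionName=CPM.ExecuteDiscernScript User=MEPPS
<Request>
<?xml version="1.0" encoding="utf-8"?>
<srvxml>
<srvdata lang="C"><name>abc</name><id>1</id></srvdata>
</srvxml>
</Request>
<Reply>
<?xml version="1.0" encoding="utf-8"?>
<srvxml>
<srvdata lang="C"><status>ok</status></srvdata>
</srvxml>
</Reply>
[trace-593] TransactionID=6011 TransactionName=Other User=MEPPS
<Request>
<?xml version="1.0" encoding="utf-8"?>
<srvxml>
<srvdata lang="C"><name>xyz</name></srvdata>
</srvxml>
</Request>
"""

TRACE_LINE = re.compile(r"^\[trace-\d+\].*$", re.MULTILINE)
XML_DECL = re.compile(r"<\?xml[^>]*\?>\s*")


def split_groups(text):
    """Yield (trace_header, xml_body) pairs, one per [trace-...] block."""
    matches = list(TRACE_LINE.finditer(text))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        yield match.group(0), text[match.end():end]


def srvdata_to_dict(section):
    """Map each child tag of <srvdata> to its text; names are not known in advance."""
    srvdata = section.find("./srvxml/srvdata")
    return {} if srvdata is None else {child.tag: child.text for child in srvdata}


groups = []
for header, body in split_groups(LOG):
    # Drop the nested XML declarations, then give this one group its own root.
    root = ET.fromstring("<group>" + XML_DECL.sub("", body) + "</group>")
    groups.append({"trace": header, **{sec.tag: srvdata_to_dict(sec) for sec in root}})

for group in groups:
    print(group)
```

Each entry in `groups` keeps its trace header next to the sections that belong to it, so the grouping is no longer lost when the `[trace]` lines are removed.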
I get an xml string from a post request and I need to use this xml in a subsequent request. I need to edit the XML from the first request to reflect the correct format for the subsequent request.
I can successfully remove the namespaces, but I am struggling to extract the desired node while keeping the XML formatting.
current format
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<GetExResponse xmlns="http://www.someurl.com/">
<GetExResult>
<DataMap xmlns="" sourceType="0">
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
</DataMap>
</GetExResult>
</GetExResponse>
</soap:Body>
</soap:Envelope>
Desired Format
<?xml version="1.0" encoding="UTF-8"?>
<DataMap xmlns="" sourceType="0">
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
</DataMap>
--removes namespaces
dmXML = xmlstring
from lxml import etree
root = etree.fromstring(dmXML)
for elem in root.iter():  # iter() replaces the deprecated getiterator()
    elem.tag = etree.QName(elem).localname
etree.cleanup_namespaces(root)
test = etree.tostring(root).decode()
print(test)
--extracts desired node but into dataframe changing the formatting
xdf = pandas.read_xml(dmXML, xpath='.//DataMap/*', namespaces={"doc": "http://www.w3.org/2001/XMLSchema"})
xml = pandas.DataFrame.to_xml(xdf)
You can simply extract the relevant portion into a new document:
import xml.etree.ElementTree as ET
root = ET.fromstring(dmXML)
new_root = root.find('.//DataMap')
print(ET.tostring(new_root, xml_declaration=True, encoding='UTF-8').decode())
Output:
<?xml version='1.0' encoding='UTF-8'?>
<DataMap sourceType="0">
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1" />
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1" />
</DataMap>
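The bare `.//DataMap` path works here because `DataMap` resets the default namespace with `xmlns=""`, while its ancestors still live in `http://www.someurl.com/`. A small self-contained sketch of that difference, using a shortened copy of the response (the `r` prefix and the variable names are assumptions for illustration):

```python
import xml.etree.ElementTree as ET

dm_xml = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetExResponse xmlns="http://www.someurl.com/">
      <GetExResult>
        <DataMap xmlns="" sourceType="0">
          <FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>
        </DataMap>
      </GetExResult>
    </GetExResponse>
  </soap:Body>
</soap:Envelope>"""

root = ET.fromstring(dm_xml)

# Elements that inherit the default namespace need it in the search path ...
ns = {"r": "http://www.someurl.com/"}
result = root.find(".//r:GetExResult", ns)
unqualified = root.find(".//GetExResult")  # bare name does not match

# ... while DataMap resets it with xmlns="", so a bare name matches.
data_map = root.find(".//DataMap")

extracted = ET.tostring(data_map, encoding="unicode")
print(extracted)
```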
I have one xml file that looks like this, XML1:
<?xml version='1.0' encoding='utf-8'?>
<report>
</report>
And the other one that is like this,
XML2:
<?xml version='1.0' encoding='utf-8'?>
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla" >
<child1>
<child2>
....
</child2>
</child1>
</report>
I need to replace the root element of XML1 with the root element of XML2, but without its children, so that XML1 looks like this:
<?xml version='1.0' encoding='utf-8'?>
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla">
</report>
Currently my code looks like this, but it doesn't remove the children; it puts the whole tree inside:
source_tree = ET.parse('XML2.xml')
source_root = source_tree.getroot()
report = source_root.findall('report')
for child in list(report):
    report.remove(child)
source_tree.write('XML1.xml', encoding='utf-8', xml_declaration=True)
Does anyone have an idea how I can achieve this?
Thanks!
Try the code below (it just copies the attributes):
import xml.etree.ElementTree as ET
xml1 = '''<?xml version='1.0' encoding='utf-8'?>
<report>
</report>'''
xml2 = '''<?xml version='1.0' encoding='utf-8'?>
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla" >
<child1>
<child2>
</child2>
</child1>
</report>'''
root1 = ET.fromstring(xml1)
root2 = ET.fromstring(xml2)
root1.attrib = root2.attrib
ET.dump(root1)
output
<report attrib1="blabla" attrib2="blabla" attrib3="blabla" attrib4="blabla" attrib5="blabla">
</report>
So here is the working code:
source_tree = ET.parse('XML2.xml')
source_root = source_tree.getroot()
dest_tree = ET.parse('XML1.xml')
dest_root = dest_tree.getroot()
dest_root.attrib = source_root.attrib
dest_tree.write('XML1.xml', encoding='utf-8', xml_declaration=True)
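A variant of the same idea copies the attributes into the destination's own dictionary with `update`, which keeps the two elements independent if either one is modified afterwards. This is a sketch on in-memory strings rather than the real files; the shortened attribute list is an assumption for brevity.

```python
import xml.etree.ElementTree as ET

xml1 = "<report>\n</report>"
xml2 = '<report attrib1="blabla" attrib2="blabla"><child1><child2/></child1></report>'

dest_root = ET.fromstring(xml1)
source_root = ET.fromstring(xml2)

# Copy the attributes into the destination's own dict; the children of
# source_root are never touched, so nothing has to be removed.
dest_root.attrib.update(source_root.attrib)

# A later change to the destination does not affect the source element.
dest_root.set("attrib1", "changed")

print(ET.tostring(dest_root, encoding="unicode"))
```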
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfRecord xmlns:i="http://www.w3.org/2001/XMLSchema-instance" i:type="Record">
<AvailableCharts>
<Accelerometer>true</Accelerometer>
<Velocity>false</Velocity>
</AvailableCharts>
<Trics>
<Trick>
<EndOffset>PT2M21.835S</EndOffset>
<Values>
<TrickValue>
<Acceleration>26.505801694441629</Acceleration>
<Rotation>0.023379150593228679</Rotation>
</TrickValue>
</Values>
</Trick>
</Trics>
<Values>
<SensorValue>
<accelx>-3.593643144</accelx>
<accely>7.316485176</accely>
</SensorValue>
<SensorValue>
<accelx>0.31103436</accelx>
<accely>7.70408184</accely>
</SensorValue>
</Values>
</ArrayOfRecord>
I am only interested in 'accelx' and 'accely' value in this data and need to create a csv out of it.
Update: The code given below breaks when I replace the second line with the following one; nothing is printed:
<ArrayOfRecord xmlns:i="http://www.w3.org/2001/XMLSchema-instance" i:type="Record" xmlns="http://schemas">
The following code works:
import xml.etree.ElementTree as etree
tree = etree.parse(r"C:\Users\data.xml")
root = tree.getroot()
val_of_interest = root.findall("./Values/SensorValue")
for sensor_val in val_of_interest:
    print(sensor_val.find('accelx').text)
    print(sensor_val.find('accely').text)
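With the `xmlns="http://schemas"` declaration from the update, every element is in that default namespace, so unqualified paths such as `./Values/SensorValue` no longer match. A minimal sketch that passes a namespace map to `findall` and also produces the CSV the question asks for; the prefix `s`, the shortened inline XML, and the in-memory buffer are assumptions for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

xml_text = """<ArrayOfRecord xmlns:i="http://www.w3.org/2001/XMLSchema-instance" i:type="Record" xmlns="http://schemas">
  <Trics><Trick><Values><TrickValue><Acceleration>26.5</Acceleration></TrickValue></Values></Trick></Trics>
  <Values>
    <SensorValue><accelx>-3.593643144</accelx><accely>7.316485176</accely></SensorValue>
    <SensorValue><accelx>0.31103436</accelx><accely>7.70408184</accely></SensorValue>
  </Values>
</ArrayOfRecord>"""

root = ET.fromstring(xml_text)
ns = {"s": "http://schemas"}  # prefix "s" is arbitrary; the URI must match xmlns

rows = [
    (value.findtext("s:accelx", namespaces=ns), value.findtext("s:accely", namespaces=ns))
    for value in root.findall("./s:Values/s:SensorValue", ns)
]

# Written to an in-memory buffer here; open("out.csv", "w", newline="") would
# be the file-based equivalent (the file name is hypothetical).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["accelx", "accely"])
writer.writerows(rows)
print(buffer.getvalue())
```

Because the path starts at the root's direct `Values` child, the `Values` element nested under `Trics/Trick` is not picked up.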