For each instrumentData in the below XML file, I need to print the id and the values in the data value=" " field. (the value between the quotes) All this using python 2.7
This is my XML file:
<instrumentDatas>
<instrumentData>
<instrument>
<id>Stephan</id>
</instrument>
<data value="A" />
<data value="B" />
<data value="C" />
</instrumentData>
<instrumentData>
<instrument>
<id>Patrick</id>
</instrument>
<data value="F" />
<data value="G" />
<data value="H" />
</instrumentData>
I am able to print the id for each instrumentData with the below code but cant figure out how to print the values.
from xml.dom import minidom
xmldoc=minidom.parse("C:/Users/Desktop/PythonXMLproject/Smallfile.xml")
instrumentDatas = xmldoc.getElementsByTagName("instrumentDatas")[0]
instrumentDatax= instrumentDatas.getElementsByTagName("instrumentData")
for instrumentData in instrumentDatax:
idx=instrumentData.getElementsByTagName("id")[0].firstChild.data
print(idx)
Thank you
import xml.dom.minidom
DOMTree = xml.dom.minidom.parse("C:/Users/Desktop/PythonXMLproject/Smallfile.xml")
collection = DOMTree.documentElement
instrumentDatas = collection.getElementsByTagName("instrumentData")
for instrumentData in instrumentDatas:
idx = instrumentData.getElementsByTagName("id")[0].firstChild.data
print "ID: %s" % idx
datas = instrumentData.getElementsByTagName("data")
for data in datas:
if data.hasAttribute('value'):
value = data.getAttribute('value')
print "Value: %s" % value
Related
I am working on a simple python script to extract certain data from an xml file. The xml contains windows events and eventid. Below I am showing the code. It is failing when it needs to extract the data, but it is creating the file but is empty.
from xml.etree import ElementTree as ET
import csv
tree = ET.parse("SecurityLog-rev2.xml")
root = tree.getroot()
url = root[0].tag[:-len("Event")]
fieldnames = ['EventID']
with open ('event_log.csv', 'w') as csvfile:
writecsv = csv.DictWriter(csvfile, fieldnames = fieldnames)
writecsv.writeheader()
for event in root:
system = event.find(url + "System")
output = {}
fields = ['EventID']
# for tag,att in fields:
# output[tag] = system.find(url + tag).attrib[att]
if event.find(url + "EventData") != None:
for data in event.find(url + "EventData"):
name = data.attrib['Name']
output[name] = data.text
writecsv.writerow(output)
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='Microsoft-Windows-Security-Auditing' Guid='{54849625-5478-4994-A5BA-3E3B0328C30D}'/>
<EventID>4634</EventID>
<Version>0</Version><Level>0</Level><Task>12545</Task><Opcode>0</Opcode><Keywords>0x8020000000000000</Keywords><TimeCreated SystemTime='2011-04-16T15:07:53.890625000Z'/>
<EventRecordID>1410962</EventRecordID><Correlation/><Execution ProcessID='452' ThreadID='3900'/><Channel>Security</Channel><Computer>DC01.AFC.com</Computer><Security/></System>
<EventData><Data Name='TargetUserSid'>S-1-5-21-2795111079-3225111112-3329435632-1610</Data>
<Data Name='TargetUserName'>grant.larson</Data>
<Data Name='TargetDomainName'>AFC</Data><Data Name='TargetLogonId'>0x3642df8</Data><Data Name='LogonType'>3</Data></EventData></Event>
I am not sure what exactly you would parse. Here is a solution for the Id and the events:
Your XML File provided above as Input:
<?xml version="1.0" encoding="utf-8"?>
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
<System>
<Provider Name='Microsoft-Windows-Security-Auditing' Guid='{54849625-5478-4994-A5BA-3E3B0328C30D}' />
<EventID>4634</EventID>
<Version>0</Version>
<Level>0</Level>
<Task>12545</Task>
<Opcode>0</Opcode>
<Keywords>0x8020000000000000</Keywords>
<TimeCreated SystemTime='2011-04-16T15:07:53.890625000Z' />
<EventRecordID>1410962</EventRecordID>
<Correlation />
<Execution ProcessID='452' ThreadID='3900' />
<Channel>Security</Channel>
<Computer>DC01.AFC.com</Computer>
<Security />
</System>
<EventData>
<Data Name='TargetUserSid'>S-1-5-21-2795111079-3225111112-3329435632-1610</Data>
<Data Name='TargetUserName'>grant.larson</Data>
<Data Name='TargetDomainName'>AFC</Data>
<Data Name='TargetLogonId'>0x3642df8</Data>
<Data Name='LogonType'>3</Data>
</EventData>
</Event>
The program code without regex for catching the namespace:
from xml.etree import ElementTree as ET
import pandas as pd
import csv
tree = ET.parse("SecurityLog-rev2.xml")
root = tree.getroot()
ns = "{http://schemas.microsoft.com/win/2004/08/events/event}"
data = []
for eventID in root.findall(".//"):
if eventID.tag == f"{ns}System":
for e_id in eventID.iter():
if e_id.tag == f'{ns}EventID':
row = "EventID", e_id.text
data.append(row)
if eventID.tag == f"{ns}EventData":
for attr in eventID.iter():
if attr.tag == f'{ns}Data':
#print(attr.attrib)
row = attr.get('Name'), attr.text
data.append(row)
df = pd.DataFrame.from_dict(data, orient='columns')
df.to_csv('event_log.csv', index=False, header=False)
print(df)
Output:
0 1
0 EventID 4634
1 TargetUserSid S-1-5-21-2795111079-3225111112-3329435632-1610
2 TargetUserName grant.larson
3 TargetDomainName AFC
4 TargetLogonId 0x3642df8
5 LogonType 3
The CSV File doesn't contain the index and header:
EventID,4634
TargetUserSid,S-1-5-21-2795111079-3225111112-3329435632-1610
TargetUserName,grant.larson
TargetDomainName,AFC
TargetLogonId,0x3642df8
LogonType,3
You can tanspose() the output:
df.T.to_csv('event_log.csv', index=False, header=False)
I want to create a function to modify XML content without changing the format. I managed to change the text but I can't do it without changing the format in XML.
So now, what I wanted to do is to add space before and after CDATA in a XML file.
Default XML file:
<?xml version="1.0" encoding="utf-8"?>
<Mapsxmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row> <![CDATA[001 001 001]]> </Row>
</Data>
</Device>
</Map>
</Maps>
And I am getting this result:
<?xml version="1.0" encoding="utf-8"?>
<Mapsxmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row><![CDATA[001 001 099]]></Row>
</Data>
</Device>
</Map>
</Maps>
However, I want the new xml to be like this:
<?xml version="1.0" encoding="utf-8"?>
<Mapsxmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row> <![CDATA[001 001 099]]> </Row>
</Data>
</Device>
</Map>
</Maps>
Here is my code:
from lxml import etree as ET
def xml_new(f,fpath,newtext,xmlrow):
xmlrow = 19
parser = ET.XMLParser(strip_cdata=False)
tree = ET.parse(f, parser)
root = tree.getroot()
for child in root:
value = child[0][2][xmlrow].text
text = ET.CDATA("001 001 099")
child[0][2][xmlrow] = ET.Element('Row')
child[0][2][xmlrow].text = text
child[0][2][xmlrow].tail = "\n"
ET.register_namespace('A', "http://www.semi.org")
tree.write(fpath,encoding='utf-8',xml_declaration=True)
return value
Anyone can help me on this? thanks in advance!
I don't quite understand what you want to do. Here's an example for you. I don't know if it can meet your needs.
from simplified_scrapy import SimplifiedDoc,req,utils
html ='''<?xml version="1.0" encoding="utf-8"?>
<Mapsxmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice/>
<Bin>
<Bin Bin="001"/>
</Bin>
<Data>
<Row> <![CDATA[001 001 001]]> </Row>
</Data>
</Device>
</Map>
</Maps>'''
doc = SimplifiedDoc(html)
row = doc.Data.Row # Get the node you want to modify.
row.setContent(" "+row.html+" ") # Modify the node content.
print (doc.html)
Result:
<?xml version="1.0" encoding="utf-8"?>
<Mapsxmlns="http://www.semi.org">
<Map>
<Device>
<ReferenceDevice />
<Bin>
<Bin Bin="001" />
</Bin>
<Data>
<Row> <![CDATA[001 001 001]]> </Row>
</Data>
</Device>
</Map>
</Maps>
thanks for all your help. I have found another way to achieve the result I want
This is the code:
# what you want to change
replaceby = '020]]> </Row>\n'
# row you want to change
row = 1
# col you want to change based on list
col = 3
file = open(file,'r')
line = file.readlines()
i = 0
editedXML=[]
for l in line:
if 'cdata' in l.lower():
i=i+1
if i == row:
oldVal = l.split(' ')
newVal = []
for index, old in enumerate(oldVal):
if index == col:
newVal.append(replaceby)
else:
newVal.append(old)
editedXML.append(' '.join(newVal))
else:
editedXML.append(l)
else:
editedXML.append(l)
file2 = open(newfile,'w')
file2.write(''.join(editedXML))
file2.close()
The code below goes through the xml files and parses them into a single csv file
from xml.etree import ElementTree as ET
from collections import defaultdict
import csv
from pathlib import Path
directory = 'path to a folder with xml files'
with open('output.csv', 'w', newline='') as f:
writer = csv.writer(f)
headers = ['id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']
writer.writerow(headers)
xml_files_list = list(map(str, Path(directory).glob('**/*.xml')))
print(xml_files_list)
for xml_file in xml_files_list:
tree = ET.parse(xml_file)
root = tree.getroot()
start_nodes = root.findall('.//START')
for sn in start_nodes:
row = defaultdict(str)
repeated_values = dict()
for k,v in sn.attrib.items():
repeated_values[k] = v
for rn in sn.findall('.//Rational'):
repeated_values['rational'] = rn.text
for qu in sn.findall('.//Qualify'):
repeated_values['qualify'] = qu.text
for ds in sn.findall('.//Description'):
repeated_values['description_txt'] = ds.text
repeated_values['description_num'] = ds.attrib['num']
for st in sn.findall('.//SetData'):
for k,v in st.attrib.items():
row['set_data_'+ str(k)] = v
for key in repeated_values.keys():
row[key] = repeated_values[key]
row_data = [row[i] for i in headers]
writer.writerow(row_data)
row = defaultdict(str)
This is the xml file.
<?xml version="1.0" encoding="utf-8"?>
<ProjectData>
<Phones>
<Date />
<Prog />
<Box />
<Feature />
<IN>MAFWDS</IN>
<Set>234234</Set>
<Pr>23423</Pr>
<Number>afasfhrtv</Number>
<Simple>dfasd</Simple>
<Nr />
<Get>6070106091</Get>
<Reno>1233</Reno>
</Phones>
<FINAL>
<START id="B001" service_code="0x5196">
<Docs Docs_type="START">
<Rational>225196</Rational>
<Qualify>6251960000A0DE</Qualify>
</Docs>
<Description num="1213f2312">The parameter</Description>
<DataFile dg="12" dg_id="let">
<SetData value="32" />
</DataFile>
</START>
<START id="C003" service_code="0x517B">
<Docs Docs_type="START">
<Rational>23423</Rational>
<Qualify>342342</Qualify>
</Docs>
<Description num="3423423f3423">The third</Description>
<DataFile dg="55" dg_id="big">
<SetData x="E1" value="21259" />
<SetData x="E2" value="02" />
</DataFile>
</START>
<START id="Z048" service_code="0x5198">
<RawData rawdata_type="ASDS">
<Rational>225198</Rational>
<Qualify>343243324234234</Qualify>
</RawData>
<Description num="434234234">The forth</Description>
<DataFile unit="21" unit_id="FEDS">
<FileX unit="eg" discrete="false" axis_pts="19" name="Vsome" text_id="bx5" unit_id="GDFSD" />
<SetData xin="5" xax="233" value="323" />
<SetData xin="123" xax="77" value="555" />
<SetData xin="17" xax="65" value="23" />
</DataFile>
</START>
</FINAL>
</ProjectData>
This is how the output looks like
Currently struggling to modify the code , so it goes to Phones (which is another child of Projectdata) takes elements from Set and Get attaches them together with _ and parses them into the first column that has the header names ** Identify**
The picture bellow shows how It should look.
Modify your headers line to
headers = ['identify', 'id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']
p_get = tree.find('.//Phones/Get').text
p_set = tree.find('.//Phones/Set').text
and add this info to the row_data just before the line writer.writerow(row_data)
like this:
row_data.insert(0, p_get + '_' + p_set)
Update
row_data[0] = p_get + '_' + p_set
I have the following xml file and I will like to structure it group it by Table Id.
xml = """
<Tables Count="19">
<Table Id="1" >
<Data>
<Cell>
<Brush/>
<Text>AA</Text>
<Text>BB</Text>
</Cell>
</Data>
</Table>
<Table Id="2" >
<Data>
<Cell>
<Brush/>
<Text>CC</Text>
<Text>DD</Text>
</Cell>
</Data>
</Table>
</Tables>
"""
I would like to parse it and get something like this.
I have tried something below but couldn't figure out it.
from lxml import etree
tree = etree.fromstring(xml)
users = {}
for user in tree.xpath("//Tables"):
name = user.xpath("Table")[0].text
users[name] = []
for group in user.xpath("Data/Cell/Text"):
users[name].append(group.text)
print (users)
Is that possible to get the above result? if so, could anyone help me to do this? I really appreciate your effort.
You need to change your xpath queries to:
from lxml import etree
tree = etree.fromstring(xml)
users = {}
for user in tree.xpath("//Tables/Table"):
# ^^^
name = user.attrib['Id']
users[name] = []
for group in user.xpath(".//Data/Cell/Text"):
# ^^^
users[name].append(group.text)
print (users)
...and use the attrib dictionary.
This yields for your string:
{'1': ['AA', 'BB'], '2': ['CC', 'DD']}
If you're into "one-liners", you could even do:
users = {name: [group.text for group in user.xpath(".//Data/Cell/Text")]
for user in tree.xpath("//Tables/Table")
for name in [user.attrib["Id"]]}
sorry for my poor English. but i need your help ;(
i have 2 xml files.
one is:
<root>
<data name="aaaa">
<value>"old value1"</value>
<comment>"this is an old value1 of aaaa"</comment>
</data>
<data name="bbbb">
<value>"old value2"</value>
<comment>"this is an old value2 of bbbb"</comment>
</data>
</root>
two is:
<root>
<data name="aaaa">
<value>"value1"</value>
<comment>"this is a value 1 of aaaa"</comment>
</data>
<data name="bbbb">
<value>"value2"</value>
<comment>"this is a value2 of bbbb"</comment>
</data>
<data name="cccc">
<value>"value3"</value>
<comment>"this is a value3 of cccc"</comment>
</data>
</root>
one.xml will be updated from two.xml.
so, the one.xml should be like this.
one.xml(after) :
<root>
<data name="aaaa">
<value>"value1"</value>
<comment>"this is a value1 of aaaa"</comment>
</data>
<data name="bbbb">
<value>"value2"</value>
<comment>"this is a value2 of bbbb"</comment>
</data>
</root>
data name="cccc" is not exist in one.xml. therefore ignored.
actually what i want to do is
download two.xml(whole list) from db
update my one.xml (it contains DATA-lists that only the app uses) by two.xml
Any can help me please !!
Thanks!!
==============================================================
xml.etree.ElementTree
your code works with the example. but i found a problem in real xml file.
the real one.xml contains :
<?xml version="1.0" encoding="utf-8"?>
<root>
<resheader name="resmimetype">
<value>text/microsoft-resx</value>
</resheader>
<resheader name="version">
<value>2.0</value>
</resheader>
<resheader name="reader">
<value>System.Resources.ResXResourceReader, System.Windows.Forms, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>
</resheader>
<resheader name="writer">
<value>System.Resources.ResXResourceWriter, System.Windows.Forms, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>
</resheader>
<data name="NotesLabel" xml:space="preserve">
<value>Hinweise:</value>
<comment>label for input field</comment>
</data>
<data name="NotesPlaceholder" xml:space="preserve">
<value>z . Milch kaufen</value>
<comment>example input for notes field</comment>
</data>
<data name="AddButton" xml:space="preserve">
<value>Neues Element hinzufügen</value>
<comment>this string appears on a button to add a new item to the list</comment>
</data>
</root>
it seems, resheader causes trouble.
do you have any idea to fix?
You can use xml.etree.ElementTree and while there are propably more elegant ways, this should work on files that fit in memory if names are unique in two.xml
import xml.etree.ElementTree as ET
tree_one = ET.parse('one.xml')
root_one = tree_one.getroot()
tree_two = ET.parse('two.xml')
root_two = tree_two.getroot()
data_two=dict((e.get("name"), e) for e in root_two.findall("data"))
for eo in root_one.findall("data"):
name=eo.get("name")
tail=eo.tail
eo.clear()
eo.tail=tail
en=data_two[name]
for k,v in en.items():
eo.set(k,v)
eo.extend(en.findall("*"))
eo.text=en.text
tree_one.write("one.xml")
If your files do not fit in memory you can still use xml.dom.pulldom as long as single data entries do fit.