extract values and construct new file

extract values and construct new file - python

I have "vtu" format file (for paraview) as a text. The format is like below:
<?xml version="1.0"?>
<VTKFile type="UnstructuredGrid" version="0.1" byte_order="LittleEndian" >
<UnstructuredGrid>
<Piece NumberOfPoints="21" NumberOfCells="20" >
<Points>
<DataArray type="Float64" Name="coordinates" NumberOfComponents="3" format="ascii" >
-3.3333333333e-01 1.1111111111e-01 0.0000000000e+00
-2.7777777778e-01 1.1111111111e-01 0.0000000000e+00
-1.1111111111e-01 4.4444444445e-01 0.0000000000e+00
</DataArray>
</Points>
<Cells>
<DataArray type="UInt64" Name="connectivity" NumberOfComponents="1" format="ascii" >
0 1
2 3
5 4
It is representing a mesh file.
I would like to extract the value for NumberOfPoints and also the first two coordinate and store them in another file as following:
21
-3.3333333333e-01
1.1111111111e-01
-2.7777777778e-01
1.1111111111e-01
-1.1111111111e-01
4.4444444445e-01
I am not familiat with python, I could only read the file line by line but I don't know to construct the above file.
What I have learnt so far is very simple. For the first file I am able to detect the line NumberOfPoints is included by
import xml.etree.ElementTree as ET
tree = ET.parse('read.vtu')
root = tree.getroot()
for Piece in root.iter('Piece'):
print Piece.attrib
nr = Piece.get('NumberOfPoints')
print nr
I can I have 21 :) the next step is to add Coordinate. But I dont know how to parse them, since I cannot find any node connected to them.

Try this:
import xml.etree.ElementTree as ET
try:
from cStringIO import StringIO
except:
from StringIO import StringIO
o = file('out.txt', 'w')
tree = ET.parse('read.vtu')
root = tree.getroot()
for Piece in root.iter('Piece'):
nr = Piece.get('NumberOfPoints')
o.write(nr+ '\n')
piece = root.iter('Piece')
piece = piece.next()
point = piece.getchildren()[0]
dataArr = point.getchildren()
data = dataArr[0]
# Writing to a buffer
output = StringIO()
output.write(data.text)
# Retrieve the value written
crds = output.seek(1)
for l in output:
ls = l.split( );
o.write(ls[0]+ '\n')
o.write(ls[1]+ '\n')
output.close()
o.close()

meshio (a project of mine) knows the VTU format, so you could simply
pip install meshio
and then
import meshio
points, cells, _, _, _ = meshio.read('file.vtu')

Related

How to avoid double escape using XML

I'm using python to make a program which will have to write data in a XML tag of a specific file.
The line of data I'm willing to write is the following.
<Stream>XXXX-XXXX-XXXX-XXXX?p=0</Stream><URL>rtmp://a.rtmp.youtube.com/live2</URL>
But what I get in my XML file after writing is pretty different.
&lt;Stream&gt;XXXX-XXXX-XXXX-XXXX?p=0&lt;/Stream&gt;&lt;URL&gt;rtmp://a.rtmp.youtube.com/live2&lt;/URL&gt;
The &lt and &gt are here for purpose, and are NOT < and >. I need to keep this formatting but when I use the export as xml file, it replaces all the & by &
I use this code to write data in the xml file:
from lxml import etree as ET
Name_with_single_quote= """IF [Calculation_1] = 'Day-1' THEN [begintime] + 1
ELSEIF[Calculation_1] < 'Day-2' THEN [begintime] + 2
ELSEIF [Calculation_1] > "Day-3" THEN [begintime] + 3
ELSE [begintime]
END"""
Name_with_single_quote = Name_with_single_quote.replace("\n", "
").replace("<", "<").replace("'", "&apos;").replace(">",">").replace("\"", """)
Name_with_single_quote = str(Name_with_single_quote)
xml = """<?xml version="1.0"?>
<column role="dimension" type="nominal" name="[Calculation_1]" datatype="boolean" caption="">
<calculation formula=""/>
</column>"""
tree = ET.fromstring(xml)
formula = tree.find('.//calculation')
formula.set('formula', Name_with_single_quote)
from xml.dom import minidom
xmlstr = minidom.parseString(ET.tostring(tree)).toprettyxml()
xmlstr = '\n'.join(list(filter(lambda x: len(x.strip()), xmlstr.split('\n'))))
with open('test_for_esc_result.xml', "w") as f:
f.write(xmlstr)

XML to CSV but same tags under parent

I have an XML file like that and trying to convert it to CSV with xml2csv python library. But there is a < images > image tag that brokes everything. I want to get all < img_item > tags on different column. How can I achieve that?
Thanks,
<products>
<product>
<code>722</code>
<ws_code>B515C16CRU</ws_code>
<supplier_code>B515C16CRU</supplier_code>
<images>
<img_item type_name="">
https://www.apparel.com.tr/stance-corap-cruker-grey-orap-stance-ankle-bters-3378-72-B.jpg
</img_item>
<img_item type_name="">
https://www.apparel.com.tr/stance-corap-cruker-grey-orap-stance-ankle-bters-3379-72-B.jpg
</img_item>
<img_item type_name="">
https://www.apparel.com.tr/stance-corap-cruker-grey-orap-stance-ankle-bters-3380-72-B.jpg
</img_item>
</images>
</product>
....
</products>

As you might have guessed, the problem is because each product node has multiple img_item tags which xml2csv does not know how to handle (and, going over its documentation, does not seem to have an option to let it know how to handle these nodes).
You can, however, do this quite easily using the builtin csv module. You just need to decide how you want to delimit the different images' urls. In the example below I've decided to use ; (obviously you can't use ,, unless you use another delimiter for the columns).
Also note that I hardcoded the headers. This can be (quite) easily changed so that the headers are dynamically detected from the product node's sub-elements.
import csv
import xml.etree.ElementTree as ET
string = '''<products>
<product>
<code>722</code>
<ws_code>B515C16CRU</ws_code>
<supplier_code>B515C16CRU</supplier_code>
<images>
<img_item type_name="">https://www.apparel.com.tr/stance-corap-cruker-grey-orap-stance-ankle-bters-3378-72-B.jpg</img_item>
<img_item type_name="">https://www.apparel.com.tr/stance-corap-cruker-grey-orap-stance-ankle-bters-3379-72-B.jpg</img_item>
<img_item type_name="">https://www.apparel.com.tr/stance-corap-cruker-grey-orap-stance-ankle-bters-3380-72-B.jpg</img_item>
</images>
</product>
</products>'''
root = ET.fromstring(string)
headers = ('code', 'ws_code', 'supplier_code', 'images')
with open('test.csv', 'w', newline='') as f:
writer = csv.DictWriter(f, fieldnames=headers)
writer.writeheader()
for product in root.iter('product'):
writer.writerow({'code': product.find('code').text,
'ws_code': product.find('ws_code').text,
'supplier_code': product.find('supplier_code').text,
'images': ';'.join(img.text for img in product.iter('img_item'))})
Which produces the below CSV:
code,ws_code,supplier_code,images
722,B515C16CRU,B515C16CRU,https://www.apparel.com.tr/stance-corap-cruker-grey-orap-stance-ankle-bters-3378-72-B.jpg;https://www.apparel.com.tr/stance-corap-cruker-grey-orap-stance-ankle-bters-3379-72-B.jpg;https://www.apparel.com.tr/stance-corap-cruker-grey-orap-stance-ankle-bters-3380-72-B.jpg

import xml.etree.ElementTree as ET
import csv
import re
class xml_to_csv:
def do(self):
#self.xml_file_location = input("Enter full path of XML file(Eg = D:\programs\ResidentData.xml) : ")
self.tree = ET.parse("urunler-fotolu.xml")
self.root = self.tree.getroot()
self.csv_file_location = input("Enter full path to store CSV file(Eg = D:\programs\csv_file.csv ) : ")
self.csv_data = open(self.csv_file_location, 'w')
self.csv_writer = csv.writer(self.csv_data)
self.find_records(self.root)
def find_attributes(self,record):
temp = []
dont_do = 0
for j in record:
temp = temp + self.find_attributes(j)
dont_do = 1
if(dont_do == 0):
return [record.text]
return temp
def find_records(self,root1):
for i in root1:
csv_record = self.find_attributes(i)
sz = len(csv_record)
i=0
while (i<sz):
if csv_record[i][0] == '\n':
csv_record[i] = csv_record[i][1:len(csv_record[i])-1]
i = i+1;
print(csv_record)
self.csv_writer.writerow(csv_record)
if __name__ == "__main__":
obj = xml_to_csv()
obj.do()
Input:
For this = """
<State>
<Resident Id="100">
<Name>Sample Name</Name>
<PhoneNumber>1234567891</PhoneNumber>
<EmailAddress>sample_name#example.com</EmailAddress
<Address>
<StreetLine1>Street Line1</StreetLine1>
<City>City Name</City>
<StateCode>AE</StateCode>
<PostalCode>12345</PostalCode>
</Address>
</Resident>
</State>
"""
Output :
['Sample Name', '1234567891', 'sample_name#example.com', 'Street Line1', 'City Name', 'AE', '12345']

Python: Writing list to xml file

I'm trying to write the list elements to an xml file. I have written the below code. The xml file is created, but the data is repeated. I'm unable to figure out why is the data written twice in the xml file.
users_list = ['Group1User1', 'Group1User2', 'Group2User1', 'Group2User2']
def create_xml(self):
usrconfig = Element("usrconfig")
usrconfig = ET.SubElement(usrconfig,"usrconfig")
for user in range(len( users_list)):
usr = ET.SubElement(usrconfig,"usr")
usr.text = str(users_list[user])
usrconfig.extend(usrconfig)
tree = ET.ElementTree(usrconfig)
tree.write("details.xml",encoding='utf-8', xml_declaration=True)
Output File: details.xml
-
<usr>Group1User1</usr>
<usr>Group1User2</usr>
<usr>Group2User1</usr>
<usr>Group2User2</usr>
<usr>Group1User1</usr>
<usr>Group1User2</usr>
<usr>Group2User1</usr>
<usr>Group2User2</usr>
enter image description here

usrconfig.extend(usrconfig)
This line looks suspicious to me. if userconfig was a list, this line would be equivalent to "duplicate every element in this list". I suspect that something similar happens for Elements, too. Try deleting that line.
import xml.etree.ElementTree as ET
users_list = ["Group1User1", "Group1User2", "Group2User1", "Group2User2"]
def create_xml():
usrconfig = ET.Element("usrconfig")
usrconfig = ET.SubElement(usrconfig,"usrconfig")
for user in range(len( users_list)):
usr = ET.SubElement(usrconfig,"usr")
usr.text = str(users_list[user])
tree = ET.ElementTree(usrconfig)
tree.write("details.xml",encoding='utf-8', xml_declaration=True)
create_xml()
Result:
<?xml version='1.0' encoding='utf-8'?>
<usrconfig>
<usr>Group1User1</usr>
<usr>Group1User2</usr>
<usr>Group2User1</usr>
<usr>Group2User2</usr>
</usrconfig>

For such a simple xml structure, we can directly write out the file. But this technique might also be useful if one is not up to speed with the python xml modules.
import os
users_list = ["Group1User1", "Group1User2", "Group2User1", "Group2User2"]
os.chdir("C:\\Users\\Mike\\Desktop")
xml_out_DD = open("test.xml", 'wb')
xml_out_DD.write(bytes('<usrconfig>', 'utf-8'))
for i in range(0, len(users_list)):
xml_out_DD.write(bytes('<usr>' + users_list[i] + '</usr>', 'utf-8'))
xml_out_DD.write(bytes('</usrconfig>', 'utf-8'))
xml_out_DD.close()

xml file to csv file python script

I need a python script for extract data from xml file
I have a xml file as shoen below:
<software>
<name>Update Image</name>
<Build>22.02</Build>
<description>Firmware for Delta-M Series </description>
<CommonImages> </CommonImages>
<ModelBasedImages>
<ULT>
<CNTRL_0>
<file type="UI_APP" ver="2.35" crc="1234"/>
<file type="MainFW" ver="5.01" crc="5678"/>
<SIZE300>
<file type="ParamTableDB" ver="1.1.4" crc="9101"/>
</SIZE300>
</CNTRL_0>
<CNTRL_2>
<file type="UI_APP" ver="2.35" crc="1234"/>
<file type="MainFW" ver="5.01" crc="9158"/>
</CNTRL_2>
</ULT>
</ModelBasedImages>
</software>
I want the data in table format like:
type ver crc
UI_APP 2.35 1234
MainFW 5.01 5678
ParamTableDB 1.1.4 9101
UI_APP 2.35 1234
MainFW 5.01 9158
Extract into any type of file csv/doc....
I tried this code:
import xml.etree.ElementTree as ET
import csv
tree = ET.parse("Build_40.01 (copy).xml")
root = tree.getroot()
# open a file for writing
Resident_data = open('ResidentData.csv', 'w')
# create the csv writer object
csvwriter = csv.writer(Resident_data)
resident_head = []
count = 0
for member in root.findall('file'):
resident = []
address_list = []
if count == 0:
name = member.find('type').tag
resident_head.append(name)
ver = member.find('ver').tag
resident_head.append(ver)
crc = member.find('crc').tag
resident_head.append(crc)
csvwriter.writerow(resident_head)
count = count + 1
name = member.find('type').text
resident.append(name)
ver = member.find('ver').text
resident.append(ver)
crc = member.find('crc').text
resident.append(crc)
csvwriter.writerow(resident)
Resident_data.close()
Thanks in advance
edited:xml code updated.

Use the xpath expression .//file to find all <file> elements in the XML document, and then use each element's attributes to populate the CSV file through a csv.DictWriter:
import csv
import xml.etree.ElementTree as ET
tree = ET.parse("Build_40.01 (copy).xml")
root = tree.getroot()
with open('ResidentData.csv', 'w') as f:
w = csv.DictWriter(f, fieldnames=('type', 'ver', 'crc'))
w.writerheader()
w.writerows(e.attrib for e in root.findall('.//file'))
For your sample input the output CSV file will look like this:
type,ver,crc
UI_APP,2.35,1234
MainFW,5.01,5678
ParamTableDB,1.1.4,9101
UI_APP,2.35,1234
MainFW,5.01,9158
which uses the default delimiter (comma) for a CSV file. You can change the delimiter using the delimiter=' ' option to DictWriter(), however, you will not be able to obtain the same formatting as your sample output, which appears to use fixed width fields (but you might get away with using tab as the delimiter).

Problems with parsing xml

I have some code that is parsing an xml file and saving it as a csv. I can do this two ways, one by manually downloading the xml file and then parsing it, the other by taking the xml feed directly using ET.fromstring and then parsing. When I go directly I get data errors it appears to be an integrity issue. I am trying to include the xml download in to the code, but I am not quite sure the best way to approach this.
import xml.etree.ElementTree as ET
import csv
import urllib
url = 'http://www.capitalbikeshare.com/data/stations/bikeStations.xml'
connection = urllib.urlopen(url)
data = connection.read()
#I need code here!!!
tree = ET.parse('bikeStations.xml')
root = tree.getroot()
#for child in root:
#print child.tag, child.attrib
locations = []
for station in root.findall('station'):
name = station.find('name').text
bikes = station.find('nbBikes').text
docks = station.find('nbEmptyDocks').text
time = station.find('latestUpdateTime').text
sublist = [name, bikes, docks, time]
locations.append(sublist)
#print 'Station:', name, 'has', bikes, 'bikes and' ,docks, 'docks'
#print locations
s = open('statuslog.csv', 'wb')
w = csv.writer(s)
w.writerows(locations)
s.close()
f = open('filelog.csv', 'ab')
w = csv.writer(f)
w.writerows(locations)
f.close()

What you need is:
root = ET.fromstring(data)
and omit the line of: tree = ET.parse('bikeStations.xml')
As the response from connection.read() returns String, you can directly read the XML string by using fromstring method, you can read more from HERE.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

extract values and construct new file - python

meshio (a project of mine) knows the VTU format, so you could simply pip install meshio and then import meshio points, cells, _, _, _ = meshio.read('file.vtu')

Related

How to avoid double escape using XML

XML to CSV but same tags under parent

Python: Writing list to xml file

xml file to csv file python script

Problems with parsing xml

Categories

Resources