My input JSON file looks something like this:
{"id":1,"author":"abc","title":"xyz"}
{"id":2,"author":"def","title":"mno"}
I want a Python script to create an XML file that looks like this:
<sequence>
<id>1</id>
<author>abc</author>
<title>xyz</title>
</sequence>
<sequence>
<id>2</id>
<author>def</author>
<title>mno</title>
</sequence>
Right now this is the code I'm using
import json as j
with open("test.json") as json_format_file:
    d = j.load(json_format_file)
import xml.etree.cElementTree as e
r = e.Element("sequence")
e.SubElement(r,"id").text = d["id"]
e.SubElement(r,"submitter").text = d["submitter"]
e.SubElement(r,"authors").text = str(d["authors"])
e.SubElement(r,"title").text = str(d["title"])
a = e.ElementTree(r)
a.write("json_to_xml.xml")
The problem is that it only works for one entry: if I have more than one entry in the JSON file, it throws an error. How can I make this run for multiple entries and write them all into the XML file?
EDIT:
I have changed my JSON file to look like this:
[{"id":1,"author":"abc","title":"xyz"},
{"id":2,"author":"def","title":"mno"}]
json2xml is a powerful Python library for this. Try it:
json.json
[{"id":1,"author":"abc","title":"xyz"}, {"id":2,"author":"def","title":"mno"}]
Python
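from json2xml import json2xml
import lxml.etree as ET
import json

with open('json.json') as data_file:
    data = json.load(data_file)

# Use a new name so the imported json2xml module is not shadowed
xml_string = json2xml.Json2xml(data, wrapper="sequence", pretty=True, attr_type=False).to_xml()
with open('json2xml.xml', 'w') as f:
    f.write(xml_string)

# Modify the XML data: move the children of each <item> up into the root, then drop the <item>
# remove_blank_text lets pretty_print re-indent the moved elements
tree = ET.parse('json2xml.xml', ET.XMLParser(remove_blank_text=True))
root = tree.getroot()
for item in root.findall('item'):
    for child in list(item):  # iterate over a copy, since append() moves each child out of item
        root.append(child)
    root.remove(item)

new_xml = ET.tostring(root, pretty_print=True, xml_declaration=True, encoding="UTF-8")
with open('file.xml', 'wb') as final:
    final.write(new_xml)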
Can you modify the JSON data so that the records are in an array, like this?
{
"updated": "2020-07-09",
"list": [
{"id":1,"author":"abc","title":"xyz"},
{"id":2,"author":"def","title":"mno"}
]
}
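import json
import xml.etree.ElementTree as e

with open('data.json') as data_file:
    data = json.load(data_file)

# A well-formed XML file needs a single root element, so wrap the <sequence> elements
root = e.Element("root")
for obj in data['list']:
    r = e.SubElement(root, "sequence")
    # obj is already one record, so index it directly; element text must be a string
    e.SubElement(r, "id").text = str(obj["id"])
    e.SubElement(r, "author").text = str(obj["author"])
    e.SubElement(r, "title").text = str(obj["title"])

# Write once, after the loop, so earlier records are not overwritten
a = e.ElementTree(root)
a.write("json_to_xml.xml")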
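In the original question, json.load() most likely fails because the file holds one JSON object per line rather than a single JSON document: json.load() expects exactly one document and raises json.JSONDecodeError ("Extra data") when it reaches the second line. A minimal standard-library sketch that accepts either layout (one object per line, or the array from the EDIT) could look like this. The function names and the <sequences> wrapper element are illustrative assumptions, not part of the question; the wrapper exists because a well-formed XML document allows only one top-level element.

```python
import json
import xml.etree.ElementTree as ET


def parse_records(text):
    """Return a list of dicts from a JSON array or from one JSON object per line."""
    text = text.strip()
    if text.startswith("["):
        return json.loads(text)  # array layout, as in the EDIT
    # One object per line (JSON Lines), as in the original question
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def records_to_xml(records, fields=("id", "author", "title")):
    # Hypothetical <sequences> root: XML allows only one top-level element
    root = ET.Element("sequences")
    for record in records:
        seq = ET.SubElement(root, "sequence")
        for field in fields:
            # ElementTree can only serialize string text, so convert ints such as id
            ET.SubElement(seq, field).text = str(record[field])
    return ET.ElementTree(root)


# Usage (file names from the question):
# with open("test.json") as f:
#     tree = records_to_xml(parse_records(f.read()))
# tree.write("json_to_xml.xml", encoding="utf-8", xml_declaration=True)
```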
Related
I have an XML file like the one below and I'm trying to convert it to CSV with the xml2csv Python library. But there is an <images> tag that breaks everything. I want to get all <img_item> tags into different columns. How can I achieve that?
Thanks,
<products>
<product>
<code>722</code>
<ws_code>B515C16CRU</ws_code>
<supplier_code>B515C16CRU</supplier_code>
<images>
<img_item type_name="">
https://www.apparel.com.tr/stance-corap-cruker-grey-orap-stance-ankle-bters-3378-72-B.jpg
</img_item>
<img_item type_name="">
https://www.apparel.com.tr/stance-corap-cruker-grey-orap-stance-ankle-bters-3379-72-B.jpg
</img_item>
<img_item type_name="">
https://www.apparel.com.tr/stance-corap-cruker-grey-orap-stance-ankle-bters-3380-72-B.jpg
</img_item>
</images>
</product>
....
</products>
As you might have guessed, the problem is that each product node has multiple img_item tags, which xml2csv does not know how to handle (and, judging by its documentation, it does not seem to have an option to tell it how to handle these nodes).
You can, however, do this quite easily using the built-in csv module. You just need to decide how to delimit the different images' URLs. In the example below I've used ; (a comma would clash with the column delimiter unless you use a different delimiter for the columns).
Also note that I hardcoded the headers. This can be (quite) easily changed so that the headers are dynamically detected from the product node's sub-elements.
import csv
import xml.etree.ElementTree as ET
string = '''<products>
<product>
<code>722</code>
<ws_code>B515C16CRU</ws_code>
<supplier_code>B515C16CRU</supplier_code>
<images>
<img_item type_name="">https://www.apparel.com.tr/stance-corap-cruker-grey-orap-stance-ankle-bters-3378-72-B.jpg</img_item>
<img_item type_name="">https://www.apparel.com.tr/stance-corap-cruker-grey-orap-stance-ankle-bters-3379-72-B.jpg</img_item>
<img_item type_name="">https://www.apparel.com.tr/stance-corap-cruker-grey-orap-stance-ankle-bters-3380-72-B.jpg</img_item>
</images>
</product>
</products>'''
root = ET.fromstring(string)
headers = ('code', 'ws_code', 'supplier_code', 'images')
with open('test.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=headers)
    writer.writeheader()
    for product in root.iter('product'):
        writer.writerow({'code': product.find('code').text,
                         'ws_code': product.find('ws_code').text,
                         'supplier_code': product.find('supplier_code').text,
                         # strip() drops the line breaks around each URL in the real file
                         'images': ';'.join(img.text.strip() for img in product.iter('img_item'))})
This produces the following CSV:
code,ws_code,supplier_code,images
722,B515C16CRU,B515C16CRU,https://www.apparel.com.tr/stance-corap-cruker-grey-orap-stance-ankle-bters-3378-72-B.jpg;https://www.apparel.com.tr/stance-corap-cruker-grey-orap-stance-ankle-bters-3379-72-B.jpg;https://www.apparel.com.tr/stance-corap-cruker-grey-orap-stance-ankle-bters-3380-72-B.jpg
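If you would rather have each image in its own column, as the question asks, and want the headers detected from each product's sub-elements instead of hardcoded, something along these lines might work. It is a sketch: the image_1, image_2, ... column names and the products.xml file name are my own inventions, and products with fewer images than others simply get empty cells in the extra columns.

```python
import csv
import io
import xml.etree.ElementTree as ET


def product_rows(root):
    """Yield one dict per <product>, giving every <img_item> its own column."""
    for product in root.iter("product"):
        row = {}
        for child in product:
            if child.tag == "images":
                # Hypothetical column names image_1, image_2, ...
                for n, img in enumerate(child.iter("img_item"), start=1):
                    row[f"image_{n}"] = (img.text or "").strip()
            else:
                row[child.tag] = (child.text or "").strip()
        yield row


def products_to_csv(root):
    rows = list(product_rows(root))
    # Collect headers in first-seen order across all products, so a product
    # with more images than the first one still gets all of its columns
    headers = []
    for row in rows:
        for key in row:
            if key not in headers:
                headers.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=headers)  # missing keys are written as ""
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Usage:
# with open("test.csv", "w", newline="") as f:
#     f.write(products_to_csv(ET.parse("products.xml").getroot()))
```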
import xml.etree.ElementTree as ET
import csv

class xml_to_csv:
    def do(self):
        #self.xml_file_location = input(r"Enter full path of XML file (e.g. D:\programs\ResidentData.xml): ")
        self.tree = ET.parse("urunler-fotolu.xml")
        self.root = self.tree.getroot()
        # Raw string so the backslashes in the example path are not read as escapes
        self.csv_file_location = input(r"Enter full path to store CSV file (e.g. D:\programs\csv_file.csv): ")
        # newline='' avoids blank rows on Windows; "with" closes the file when done
        with open(self.csv_file_location, 'w', newline='') as self.csv_data:
            self.csv_writer = csv.writer(self.csv_data)
            self.find_records(self.root)

    def find_attributes(self, record):
        # Recursively collect the text of every leaf element below `record`
        children = list(record)
        if not children:
            return [record.text]
        temp = []
        for child in children:
            temp += self.find_attributes(child)
        return temp

    def find_records(self, root1):
        # Each direct child of the root becomes one CSV row
        for record in root1:
            # strip() removes surrounding newlines/indentation; empty leaves become ''
            csv_record = [(text or '').strip() for text in self.find_attributes(record)]
            print(csv_record)
            self.csv_writer.writerow(csv_record)

if __name__ == "__main__":
    obj = xml_to_csv()
    obj.do()
Input:
<State>
<Resident Id="100">
<Name>Sample Name</Name>
<PhoneNumber>1234567891</PhoneNumber>
<EmailAddress>sample_name#example.com</EmailAddress>
<Address>
<StreetLine1>Street Line1</StreetLine1>
<City>City Name</City>
<StateCode>AE</StateCode>
<PostalCode>12345</PostalCode>
</Address>
</Resident>
</State>
Output:
['Sample Name', '1234567891', 'sample_name#example.com', 'Street Line1', 'City Name', 'AE', '12345']
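A shorter way to get the same row is Element.itertext(), which yields every piece of text below an element in document order. It also yields the whitespace between tags, hence the filter. One difference from the recursive version above: an empty leaf element is skipped rather than producing an empty cell, so columns can shift if some fields are blank. This is a sketch, and leaf_texts and xml_to_rows are names I made up.

```python
import xml.etree.ElementTree as ET


def leaf_texts(record):
    """Return the non-blank text inside `record` and its descendants, in document order."""
    # itertext() also yields the indentation between tags, so blank strings are dropped
    return [text.strip() for text in record.itertext() if text.strip()]


def xml_to_rows(xml_string):
    root = ET.fromstring(xml_string)
    # Each direct child of the root (a <Resident> here) becomes one row
    return [leaf_texts(record) for record in root]
```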
I'm trying to write the list elements to an XML file. I have written the code below. The XML file is created, but the data is repeated. I'm unable to figure out why the data is written twice in the XML file.
users_list = ['Group1User1', 'Group1User2', 'Group2User1', 'Group2User2']
def create_xml(self):
    usrconfig = Element("usrconfig")
    usrconfig = ET.SubElement(usrconfig,"usrconfig")
    for user in range(len( users_list)):
        usr = ET.SubElement(usrconfig,"usr")
        usr.text = str(users_list[user])
    usrconfig.extend(usrconfig)
    tree = ET.ElementTree(usrconfig)
    tree.write("details.xml",encoding='utf-8', xml_declaration=True)
Output File: details.xml
<usr>Group1User1</usr>
<usr>Group1User2</usr>
<usr>Group2User1</usr>
<usr>Group2User2</usr>
<usr>Group1User1</usr>
<usr>Group1User2</usr>
<usr>Group2User1</usr>
<usr>Group2User2</usr>
usrconfig.extend(usrconfig)
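This line looks suspicious to me. If usrconfig were a list, this line would be equivalent to "duplicate every element in this list". The same thing happens with Elements: extend() appends every item of the sequence it is given, and iterating over an Element yields its children, so each <usr> element ends up in usrconfig twice. Try deleting that line.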
import xml.etree.ElementTree as ET

users_list = ["Group1User1", "Group1User2", "Group2User1", "Group2User2"]

def create_xml():
    # One <usrconfig> root is enough; the extra nested SubElement was discarded anyway
    usrconfig = ET.Element("usrconfig")
    for user in users_list:
        usr = ET.SubElement(usrconfig, "usr")
        usr.text = str(user)
    tree = ET.ElementTree(usrconfig)
    tree.write("details.xml", encoding='utf-8', xml_declaration=True)

create_xml()
Result:
<?xml version='1.0' encoding='utf-8'?>
<usrconfig>
<usr>Group1User1</usr>
<usr>Group1User2</usr>
<usr>Group2User1</usr>
<usr>Group2User2</usr>
</usrconfig>
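The Result above is shown with line breaks for readability, but tree.write() adds no whitespace by itself, so details.xml will actually contain all the elements on one line after the declaration. If you want the indented layout, ET.indent() (added in Python 3.9) can insert it before writing. A sketch; build_usrconfig is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

users_list = ["Group1User1", "Group1User2", "Group2User1", "Group2User2"]


def build_usrconfig(users):
    usrconfig = ET.Element("usrconfig")
    for user in users:
        ET.SubElement(usrconfig, "usr").text = user
    tree = ET.ElementTree(usrconfig)
    # ET.indent() (Python 3.9+) adds newline and indentation whitespace in place
    ET.indent(tree, space="  ")
    return tree


tree = build_usrconfig(users_list)
# tree.write("details.xml", encoding="utf-8", xml_declaration=True)
```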
For such a simple XML structure, we can write the file out directly. This technique might also be useful if you are not yet comfortable with Python's XML modules.
import os

users_list = ["Group1User1", "Group1User2", "Group2User1", "Group2User2"]

os.chdir("C:\\Users\\Mike\\Desktop")
with open("test.xml", 'wb') as xml_out_DD:
    xml_out_DD.write(bytes('<usrconfig>', 'utf-8'))
    for user in users_list:
        xml_out_DD.write(bytes('<usr>' + user + '</usr>', 'utf-8'))
    xml_out_DD.write(bytes('</usrconfig>', 'utf-8'))
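One caveat with writing the tags by hand: nothing escapes the text, so a user name containing &, < or > would produce an invalid file. The standard library's xml.sax.saxutils.escape() handles those three characters. A sketch (usrconfig_xml is a hypothetical name) that also writes text with an explicit encoding instead of converting each piece to bytes:

```python
from xml.sax.saxutils import escape

users_list = ["Group1User1", "Group1User2", "Group2User1", "Group2User2"]


def usrconfig_xml(users):
    # escape() turns &, < and > into &amp;, &lt; and &gt;, so a name
    # such as "R&D" cannot break the document
    parts = ["<usrconfig>"]
    parts.extend("<usr>" + escape(user) + "</usr>" for user in users)
    parts.append("</usrconfig>")
    return "".join(parts)


# Usage:
# with open("test.xml", "w", encoding="utf-8") as f:
#     f.write(usrconfig_xml(users_list))
```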
Parsing an XML file with ElementTree in Python.
Here is the file:
<?xml version='1.0' encoding='utf-8'?>
<Device fqdm="DESKTOP-4OB3072">
<IP>192.168.203.1</IP>
<MAC>00:00:00:00:00:00</MAC>
</Device>
I am receiving the error below when trying to parse the file and retrieve the value of the 'fqdm' attribute:
"xml.etree.ElementTree.ParseError: junk after document element: line 2, column 90"
Here is the parsing code (please ignore the stupid file handling, it will be changed):
with open('received_file.xml', 'a+') as f:
    while True:
        data = conn.recv(BUFFER_SIZE)
        print data
        if not data:
            f.close()
            break
        f.write(data)
    f.close()
g = open('received_file.xml', 'r+')
tree = ET.parse(g)
root = tree.getroot()
print root
test = root.find('./Device').attrib['fqdm']
print test
sock.close()
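The root element is already <Device>, so root.find('./Device') searches its children, finds nothing and returns None. Read the attribute from the root itself. Try this: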
with open('received_file.xml', 'a+') as f:
    while True:
        data = conn.recv(BUFFER_SIZE)
        print data
        if not data:
            f.close()
            break
        f.write(data)
    f.close()
g = open('received_file.xml', 'r+')
tree = ET.parse(g)
root = tree.getroot()
attributes = root.attrib
print root
test = attributes['fqdm']
print test
sock.close()
Your parse error is at column 90, but the XML snippet you shared only has 32 columns. If this file is generated by your socket object, you probably have extra unprintable characters following the valid XML on line 2. The code that creates this file probably needs to be updated to properly terminate the strings in the lines it receives.
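Alternatively, use get(), which returns None instead of raising KeyError when the attribute is missing:
yourTag.attrib.get("the_attribute")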
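For Python 3, a sketch that keeps the received bytes in memory and parses them with ET.fromstring() avoids the intermediate file entirely. That could matter here: mode 'a+' appends to whatever a previous run left in received_file.xml, and a second document after the first is exactly the kind of content "junk after document element" complains about. Stripping trailing NUL bytes and whitespace is an assumption based on the answer's guess about unprintable characters; receive_all and device_fqdm are hypothetical names, and conn and BUFFER_SIZE are the ones from the question.

```python
import xml.etree.ElementTree as ET


def receive_all(conn, buffer_size=4096):
    """Read from a connected socket until the sender closes it; return all bytes."""
    chunks = []
    while True:
        data = conn.recv(buffer_size)
        if not data:  # b"" means the other side closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)


def device_fqdm(raw):
    """Parse the received XML bytes and return the fqdm attribute of the root."""
    # Assumption: any junk is trailing NULs or whitespace, so strip it first
    root = ET.fromstring(raw.rstrip(b"\x00 \t\r\n"))
    # The root element *is* <Device>, so read its attribute directly
    return root.get("fqdm")


# Usage with the connection from the question:
# print(device_fqdm(receive_all(conn, BUFFER_SIZE)))
```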
I know this question has been asked before, but I tried all the Python code I found, modified it for my file, and none of it worked. I need a way to convert my file myData.csv into an XML file that can be read by a web browser.
I just started to learn Python this month so I'm a beginner. This is my code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import csv, sys, os
from lxml import etree
csvFile = 'myData.csv' # create the variable for the CSV file
reader= csv.reader(open(csvFile), delimiter=';', quoting=csv.QUOTE_NONE) # create a reader variable that holds the CSV table
print "<data>"
for record in reader:
    if reader.line_num == 1:
        header = record
    else:
        innerXml = ""
        dontShow = False
        type = ""
        for i, field in enumerate(record):
            innerXml += "<%s>" % header[i].lower() + field + "</%s>" % header[i].lower()
            if i == 1 and field == "0":
                type = "Next"
            elif type == "" and i == 3 and field == "0":
                type = "Next"
            elif type == "" and i == 3 and field != "0":
                type = "film"
            if i == 1 and field == "X":
                dontShow = True
        if dontShow == False:
            xml = "<%s>" % type
            xml += innerXml
            xml += "</%s>" % type
            print xml
print "</data>"
Consider building your XML with dedicated DOM objects rather than by concatenating strings; you can do this with the lxml module. Using methods such as Element() and SubElement(), you can iteratively build an XML tree from the CSV data:
import csv
import lxml.etree as ET

headers = ['Titre', 'Realisateur', 'Date_Debut_Evenement', 'Date_Fin_Evenement', 'Cadre',
           'Lieu', 'Adresse', 'Arrondissement', 'Adresse_complète', 'Geo_Coordinates']

# INITIALIZING XML FILE
root = ET.Element('root')

# READING CSV FILE AND BUILDING TREE
with open('myData.csv') as f:
    next(f)  # SKIP HEADER
    csvreader = csv.reader(f)  # pass delimiter=';' if the file is semicolon-separated, as in the question
    for row in csvreader:
        data = ET.SubElement(root, "data")
        for header, value in zip(headers, row):
            ET.SubElement(data, header).text = str(value)

# SERIALIZING XML
tree_out = ET.tostring(root, pretty_print=True, xml_declaration=True, encoding="UTF-8")

# OUTPUTTING XML CONTENT TO FILE
with open('Output.xml', 'wb') as f:
    f.write(tree_out)
Output
<?xml version='1.0' encoding='UTF-8'?>
<root>
<data>
<Titre>1</Titre>
<Realisateur>BUS PALLADIUM</Realisateur>
<Date_Debut_Evenement>CHRISTOPHER THOMPSON</Date_Debut_Evenement>
<Date_Fin_Evenement>21 mai 2009</Date_Fin_Evenement>
<Cadre>21 mai 2009</Cadre>
<Lieu>EXTERIEUR</Lieu>
<Adresse>PLACE</Adresse>
<Arrondissement>PIGALLE</Arrondissement>
<Adresse_complète>75018</Adresse_complète>
<Geo_Coordinates>PLACE PIGALLE 75018 Paris France</Geo_Coordinates>
</data>
<data>
<Titre>2</Titre>
<Realisateur>LES INVITES DE MON PERE</Realisateur>
<Date_Debut_Evenement>ANNE LE NY</Date_Debut_Evenement>
<Date_Fin_Evenement>20 mai 2009</Date_Fin_Evenement>
<Cadre>20 mai 2009</Cadre>
<Lieu>DOMAINE PUBLIC</Lieu>
<Adresse>SQUARE</Adresse>
<Arrondissement>DU CLIGNANCOURT</Arrondissement>
<Adresse_complète>75018</Adresse_complète>
<Geo_Coordinates>SQUARE DU CLIGNANCOURT 75018 Paris France</Geo_Coordinates>
</data>
<data>
<Titre>3</Titre>
<Realisateur>DEMAIN, A L'AUBE</Realisateur>
<Date_Debut_Evenement>GAEL CABOUAT</Date_Debut_Evenement>
<Date_Fin_Evenement>17 avril 2009</Date_Fin_Evenement>
<Cadre>17 avril 2009</Cadre>
<Lieu>EXTERIEUR</Lieu>
<Adresse>RUE</Adresse>
<Arrondissement>QUINCAMPOIX</Arrondissement>
<Adresse_complète>75004</Adresse_complète>
<Geo_Coordinates>RUE QUINCAMPOIX 75004 Paris France</Geo_Coordinates>
</data>
...
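In the output above the values appear to be shifted by one column relative to the hardcoded headers (the film title lands in Realisateur and the director in Date_Debut_Evenement), which suggests the header list doesn't match the file. A sketch that takes the tag names from the CSV's own header row with csv.DictReader, using the standard library's ElementTree instead of lxml, might look like this. The ';' delimiter default follows the csv.reader call in the question; csv_to_xml is a hypothetical name, and it assumes every header is a valid XML tag name (no spaces, not starting with a digit).

```python
import csv
import xml.etree.ElementTree as ET


def csv_to_xml(lines, delimiter=";"):
    """Build <root><data>...</data>...</root> from CSV lines, one <data> per row.

    Tag names come from the CSV's header row instead of a hardcoded list.
    """
    root = ET.Element("root")
    for row in csv.DictReader(lines, delimiter=delimiter):
        data = ET.SubElement(root, "data")
        for header, value in row.items():
            if header is None:
                continue  # DictReader puts surplus fields under the key None
            ET.SubElement(data, header).text = value
    return ET.ElementTree(root)


# Usage (the encoding is an assumption about your file):
# with open("myData.csv", newline="", encoding="utf-8") as f:
#     csv_to_xml(f).write("Output.xml", encoding="UTF-8", xml_declaration=True)
```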
(posted as an answer so I can show a code block)
There are a lot of picky details when writing XML. In Python, you should probably use some version of ElementTree to help with that. One good tutorial is Creating XML Documents. Quoting from there:
from xml.etree.ElementTree import Element, SubElement, Comment, tostring
top = Element('top')
comment = Comment('Generated for PyMOTW')
top.append(comment)
child = SubElement(top, 'child')
child.text = 'This child contains text.'
child_with_tail = SubElement(top, 'child_with_tail')
child_with_tail.text = 'This child has regular text.'
child_with_tail.tail = 'And "tail" text.'
child_with_entity_ref = SubElement(top, 'child_with_entity_ref')
child_with_entity_ref.text = 'This & that'
print(tostring(top))
If you use this as an example of how to create a tree of XML elements, you should be able to translate your code into the XML structure you need.
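As an illustration of that translation, here is one way the question's loop could be rewritten with ElementTree while keeping its rules: rows whose second field is "X" are skipped, and a row becomes a <Next> element if its second or fourth field is "0", otherwise a <film> element. It assumes every data row has at least four fields and that the lowercased headers are valid tag names; rows_to_tree is a hypothetical helper. A side benefit is that ElementTree escapes characters such as & in the field values, which the string concatenation does not.

```python
import xml.etree.ElementTree as ET


def rows_to_tree(rows):
    """Rebuild the question's <data> document; the first row is the header."""
    rows = iter(rows)
    header = [name.lower() for name in next(rows)]
    data = ET.Element("data")
    for record in rows:
        if record[1] == "X":  # the question's dontShow rule
            continue
        # Same precedence as the question: field 1 == "0" wins, then field 3 decides
        kind = "Next" if record[1] == "0" or record[3] == "0" else "film"
        item = ET.SubElement(data, kind)
        for name, field in zip(header, record):
            ET.SubElement(item, name).text = field
    return ET.ElementTree(data)


# Usage, with the reader settings from the question:
# with open("myData.csv", newline="") as f:
#     tree = rows_to_tree(csv.reader(f, delimiter=";", quoting=csv.QUOTE_NONE))
# tree.write("myData.xml", encoding="utf-8", xml_declaration=True)
```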
Import pandas and store the file name:
import pandas as pd
csvFile = 'myData.csv'
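The following reads the CSV into a pandas DataFrame and then converts it to XML (DataFrame.to_xml() requires pandas 1.3 or later):
df = pd.read_csv(csvFile)  # add sep=';' for a semicolon-separated file like the question's
df_xml = df.to_xml()  # uses lxml by default; pass parser='etree' if lxml is not installed
The code below creates a new file and saves the XML data to it as "csv2xml.xml":
with open("csv2xml.xml", "w") as f:
    f.write(df_xml)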
I have some code that parses an XML file and saves it as a CSV. I can do this in two ways: one by manually downloading the XML file and then parsing it, the other by taking the XML feed directly using ET.fromstring and then parsing it. When I go directly, I get data errors; it appears to be an integrity issue. I am trying to include the XML download in the code, but I am not quite sure of the best way to approach this.
import xml.etree.ElementTree as ET
import csv
import urllib

url = 'http://www.capitalbikeshare.com/data/stations/bikeStations.xml'
connection = urllib.urlopen(url)
data = connection.read()
#I need code here!!!
tree = ET.parse('bikeStations.xml')
root = tree.getroot()
#for child in root:
#    print child.tag, child.attrib
locations = []
for station in root.findall('station'):
    name = station.find('name').text
    bikes = station.find('nbBikes').text
    docks = station.find('nbEmptyDocks').text
    time = station.find('latestUpdateTime').text
    sublist = [name, bikes, docks, time]
    locations.append(sublist)
    #print 'Station:', name, 'has', bikes, 'bikes and' ,docks, 'docks'
#print locations
s = open('statuslog.csv', 'wb')
w = csv.writer(s)
w.writerows(locations)
s.close()
f = open('filelog.csv', 'ab')
w = csv.writer(f)
w.writerows(locations)
f.close()
What you need is:
root = ET.fromstring(data)
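in place of the two lines tree = ET.parse('bikeStations.xml') and root = tree.getroot(). Delete both: there is no longer a file to parse, nor a tree object to call getroot() on.
Since connection.read() returns a string, you can parse the XML directly with the fromstring method; you can read more about it here.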
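A Python 3 sketch of the whole flow, with the download and parsing done in memory: urllib.urlopen became urllib.request.urlopen in Python 3, where read() returns bytes, which ET.fromstring() accepts directly. findtext() returns an element's text (or None if the element is missing). The URL is the one from the question and may no longer serve this feed; parse_stations, fetch_stations and write_rows are hypothetical names.

```python
import csv
import xml.etree.ElementTree as ET
from urllib.request import urlopen

URL = "http://www.capitalbikeshare.com/data/stations/bikeStations.xml"
FIELDS = ("name", "nbBikes", "nbEmptyDocks", "latestUpdateTime")


def parse_stations(xml_data):
    """Return one [name, bikes, docks, time] row per <station> element."""
    root = ET.fromstring(xml_data)  # accepts bytes or str; no file needed
    return [[station.findtext(field) for field in FIELDS]
            for station in root.findall("station")]


def fetch_stations(url=URL):
    # In Python 3, read() returns bytes, which ET.fromstring() parses directly
    with urlopen(url) as response:
        return parse_stations(response.read())


def write_rows(rows, path, mode="w"):
    # newline="" is what the csv module expects for files it writes to
    with open(path, mode, newline="") as f:
        csv.writer(f).writerows(rows)


# Usage, mirroring the question's two CSV files:
# rows = fetch_stations()
# write_rows(rows, "statuslog.csv")     # overwritten on each run
# write_rows(rows, "filelog.csv", "a")  # appended on each run
```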