Extract resulted list data to a xml file in python - python

How can I write my resulting list data to an XML file?
My resulting list is given below:
week=[{'item': Electrelane, 'weight': 140}, {'item': Kraftwerk, 'weight': 117},{'item': The Flaming Lips, 'weight': 113}]

Since you don't provide any information on how you want to format your XML, I just invented my own notation.
week=[{'item': 'Electrelane', 'weight': 140}, {'item': 'Kraftwerk', 'weight': 117},{'item': 'The Flaming Lips', 'weight': 113}]
print("<?xml version='1.0' ?>")
print("<week>")
for day in week:
    print("  <day>")
    for key, value in day.items():
        print("    <%s>%s</%s>" % (key, value, key))
    print("  </day>")
print("</week>")
EDIT
To print to the console, iterate over the items in a similar way but change the output (the print commands):
# enumerate the days in the week
for i, day in enumerate(week):
    print("day %d" % i)
    # show values in sorted order
    for key in sorted(day):
        print("  - %s\t: %s" % (key, day[key]))

You can trivially adjust this to your needs.
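One caveat with building tags by hand: a value containing &, < or > would produce malformed XML. The following is a minimal sketch, not the answer's own code, that escapes each value with xml.sax.saxutils.escape and returns the document as a string. The function name week_to_xml and the "Simon & Garfunkel" entry are made up for illustration.

```python
from xml.sax.saxutils import escape

def week_to_xml(week):
    """Return the list of dicts as an XML string, one <day> per entry."""
    lines = ["<?xml version='1.0' ?>", "<week>"]
    for day in week:
        lines.append("  <day>")
        for key, value in day.items():
            # escape() replaces &, < and > so the text stays well-formed
            lines.append("    <%s>%s</%s>" % (key, escape(str(value)), key))
        lines.append("  </day>")
    lines.append("</week>")
    return "\n".join(lines)

# The second entry is invented to show the escaping at work.
week = [{'item': 'Electrelane', 'weight': 140},
        {'item': 'Simon & Garfunkel', 'weight': 117}]
xml_text = week_to_xml(week)
```

To get a file instead of console output, the returned string can be written with an ordinary open(..., "w") call.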

Here's some code that uses xml.dom.minidom to build up the XML document.
week=[{'item': 'Electrelane', 'weight': 140}, {'item': 'Kraftwerk', 'weight': 117},{'item': 'The Flaming Lips', 'weight': 113}]
from xml.dom.minidom import getDOMImplementation
impl = getDOMImplementation()
document = impl.createDocument(None, "week", None)
week_element = document.documentElement
for entry in week:
    node = document.createElement("entry")
    # items() works on both Python 2 and 3; iteritems() exists only on Python 2
    for attr, value in entry.items():
        node.setAttribute(attr, str(value))
    week_element.appendChild(node)
print(document.toprettyxml())
Produces:
<?xml version="1.0" ?>
<week>
	<entry item="Electrelane" weight="140"/>
	<entry item="Kraftwerk" weight="117"/>
	<entry item="The Flaming Lips" weight="113"/>
</week>
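Since the question asks for a file rather than console output, here is a hedged sketch of the same attribute-based layout built with the standard library's xml.etree.ElementTree and written through ElementTree.write. An in-memory buffer stands in for the file here; passing a file name such as "week.xml" (a hypothetical name) to write() would save it to disk instead.

```python
import io
import xml.etree.ElementTree as ET

week = [{'item': 'Electrelane', 'weight': 140},
        {'item': 'Kraftwerk', 'weight': 117},
        {'item': 'The Flaming Lips', 'weight': 113}]

root = ET.Element("week")
for entry in week:
    node = ET.SubElement(root, "entry")
    for key, value in entry.items():
        node.set(key, str(value))  # attribute values must be strings

tree = ET.ElementTree(root)
buffer = io.BytesIO()  # stands in for a file opened in binary mode
tree.write(buffer, encoding="utf-8", xml_declaration=True)
xml_bytes = buffer.getvalue()
```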

Related

Parse xml file to a python list

I have a xml file:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03">
<CstmrCdtTrfInitn>
<GrpHdr>
<MsgId>637987745078994894</MsgId>
<CreDtTm>2022-09-14T05:48:27</CreDtTm>
<NbOfTxs>205</NbOfTxs>
<CtrlSum>154761.02</CtrlSum>
<InitgPty>
<Nm> Company</Nm>
</InitgPty>
</GrpHdr>
<PmtInf>
<PmtInfId>20220914054827-154016</PmtInfId>
<PmtMtd>TRF</PmtMtd>
<BtchBookg>true</BtchBookg>
<NbOfTxs>205</NbOfTxs>
<CtrlSum>154761.02</CtrlSum>
<PmtTpInf>
<SvcLvl>
<Cd>SEPA</Cd>
</SvcLvl>
<CtgyPurp>
<Cd>SALA</Cd>
</CtgyPurp>
</PmtTpInf>
<CdtTrfTxInf> <----------------------------------
<Amt>
<InstdAmt Ccy="EUR">1536.96</InstdAmt>
</Amt>
<Cdtr>
<Nm>Achternaam, Voornaam </Nm>
</Cdtr>
<CdtrAcct>
<Id>
<IBAN>NL80RABO0134343443</IBAN>
</Id>
</CdtrAcct>
</CdtTrfTxInf> <------------------------------------
<CdtTrfTxInf> <----------------------------------
<Amt>
<InstdAmt Ccy="EUR">1676.96</InstdAmt>
</Amt>
<Cdtr>
<Nm>Achternaam, Voornaam </Nm>
</Cdtr>
<CdtrAcct>
<Id>
<IBAN>NL80RABO013433222243</IBAN>
</Id>
</CdtrAcct>
</CdtTrfTxInf> <------------------------------------
</CstmrCdtTrfInitn>
</Document>
I use ElementTree.
I want a Python list of tuples with the info inside each CdtTrfTxInf tag (everything between the arrows in the example XML file). So in this example I want a list with 2 tuples.
How can I do that?
I can iterate over the tree, but that's it.
My code:
import xml.etree.ElementTree as ET
tree = ET.parse(xml_file)
root = tree.getroot()
for elem in tree.iter():
    print(elem.tag, elem.text)  # --> I get every tag in the whole file
I rather like to use xmltodict.
First of all, your input data as given is missing a closing </PmtInf> tag towards the end, just before your closing </CstmrCdtTrfInitn> tag. After fixing that, I saved your xml data into a file and did the following:
import xmltodict
with open("input_data.xml", "r") as f:
    xml_data = f.read()
xml_dict = xmltodict.parse(xml_data)
You can then access the xml data using dictionary accessors, for example:
xml_dict
>>>{'Document': {'@xmlns:xsi': 'http://www.w3.org/20...a-instance', '@xmlns': 'urn:iso:std:iso:2002...001.001.03', 'CstmrCdtTrfInitn': {...}}}
xml_dict["Document"]
>>>{'@xmlns:xsi': 'http://www.w3.org/20...a-instance', '@xmlns': 'urn:iso:std:iso:2002...001.001.03', 'CstmrCdtTrfInitn': {'GrpHdr': {...}, 'PmtInf': {...}}}
xml_dict["Document"]["CstmrCdtTrfInitn"].keys()
>>>dict_keys(['GrpHdr', 'PmtInf'])
xml_dict["Document"]["CstmrCdtTrfInitn"]["PmtInf"]
>>>{'PmtInfId': '20220914054827-154016', 'PmtMtd': 'TRF', 'BtchBookg': 'true', 'NbOfTxs': '205', 'CtrlSum': '154761.02', 'PmtTpInf': {'SvcLvl': {...}, 'CtgyPurp': {...}}, 'CdtTrfTxInf': [{...}, {...}]}
xml_dict["Document"]["CstmrCdtTrfInitn"]["PmtInf"].keys()
>>>dict_keys(['PmtInfId', 'PmtMtd', 'BtchBookg', 'NbOfTxs', 'CtrlSum', 'PmtTpInf', 'CdtTrfTxInf'])
Then you can loop over your CdtTrfTxInf with:
for item in xml_dict["Document"]["CstmrCdtTrfInitn"]["PmtInf"]["CdtTrfTxInf"]:
    print(item)
giving the output:
{'Amt': {'InstdAmt': {'@Ccy': 'EUR', '#text': '1536.96'}}, 'Cdtr': {'Nm': 'Achternaam, Voornaam'}, 'CdtrAcct': {'Id': {'IBAN': 'NL80RABO0134343443'}}}
{'Amt': {'InstdAmt': {'@Ccy': 'EUR', '#text': '1676.96'}}, 'Cdtr': {'Nm': 'Achternaam, Voornaam'}, 'CdtrAcct': {'Id': {'IBAN': 'NL80RABO013433222243'}}}
which you can process as you want.
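For the list of tuples the question asks for, a sketch with plain ElementTree is also possible. It assumes each tuple should hold (amount, creditor name, IBAN); that choice, the prefix "p" and the function name transfers_as_tuples are illustrative, not from the question. The namespace URI is the one declared on the Document element, and the sample is a trimmed copy of the question's data with the missing </PmtInf> added.

```python
import xml.etree.ElementTree as ET

# Map a prefix of our choosing to the document's default namespace.
NS = {"p": "urn:iso:std:iso:20022:tech:xsd:pain.001.001.03"}

def transfers_as_tuples(xml_text):
    """Return one (amount, name, iban) tuple per CdtTrfTxInf element."""
    root = ET.fromstring(xml_text)
    rows = []
    for tx in root.iterfind(".//p:CdtTrfTxInf", NS):
        amount = tx.findtext("p:Amt/p:InstdAmt", namespaces=NS)
        name = tx.findtext("p:Cdtr/p:Nm", default="", namespaces=NS).strip()
        iban = tx.findtext("p:CdtrAcct/p:Id/p:IBAN", namespaces=NS)
        rows.append((amount, name, iban))
    return rows

sample = """<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03">
  <CstmrCdtTrfInitn>
    <PmtInf>
      <CdtTrfTxInf>
        <Amt><InstdAmt Ccy="EUR">1536.96</InstdAmt></Amt>
        <Cdtr><Nm>Achternaam, Voornaam </Nm></Cdtr>
        <CdtrAcct><Id><IBAN>NL80RABO0134343443</IBAN></Id></CdtrAcct>
      </CdtTrfTxInf>
      <CdtTrfTxInf>
        <Amt><InstdAmt Ccy="EUR">1676.96</InstdAmt></Amt>
        <Cdtr><Nm>Achternaam, Voornaam </Nm></Cdtr>
        <CdtrAcct><Id><IBAN>NL80RABO013433222243</IBAN></Id></CdtrAcct>
      </CdtTrfTxInf>
    </PmtInf>
  </CstmrCdtTrfInitn>
</Document>"""

rows = transfers_as_tuples(sample)
```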
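This is just a quick attempt; give it a try:
import xml.etree.ElementTree as ET
tree = ET.parse("fr.xml")
root = tree.getroot()
test = False
for elem in tree.iter():
    # tags are namespace-qualified ("{namespace}CdtTrfTxInf"), so compare the suffix
    if elem.tag.endswith("CdtTrfTxInf"):
        test = True
        continue
    # elem.text is None for empty elements, so check it before stripping
    if test and elem.text and elem.text.strip():
        print(elem.tag, elem.text)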
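With the result as a list of tuples:
import xml.etree.ElementTree as ET
tree = ET.parse("fr.xml")
root = tree.getroot()
test = False
tag = []
textval = []
for elem in tree.iter():
    # tags are namespace-qualified ("{namespace}CdtTrfTxInf"), so compare the suffix
    if elem.tag.endswith("CdtTrfTxInf"):
        test = True
        continue
    # elem.text is None for empty elements, so check it before stripping
    if test and elem.text and elem.text.strip():
        tag.append(elem.tag)
        textval.append(elem.text)
data = list(zip(tag, textval))
print(data)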

Local variables in a dictionary function in Python

I am trying to handle the requirement below. As a beginner in Python programming, I am stuck on how to declare the variables. I have a huge XML file that I need to open and build three dictionaries from.
Here are my programming steps:
Open the file using the built-in open function
Read each line from the object created above
Between certain tags, I need to search for a pattern and fill the data into the dictionary.
The XML file looks like
<tag_1>
name=(pattern1)
age=(pattern1.1)
company=(pattern1.2)
<\tag_1>
<tag_2>
name=(pattern2)
age=(pattern2.1)
company=(pattern2.2)
<\tag_2>
<tag_3>
name=(pattern3)
age=(pattern3.1)
comapany=(pattern3.2)
<\tag_3>
and so on, with repeated above tags.
From each tag above, I need to create 3 dictionaries like:
dict1[pattern1]['age']=pattern1.1
dict1[pattern1]['company']=pattern1.2
Similarly for dict2 and dict3.
I created a dictionary function that takes the line and the dictionary as arguments.
for line in file.readlines():
    dict_instance(line, dictionary_1)
    dict_instance(line, dictionary_2)
    dict_instance(line, dictionary_3)
def dict_instance(line, object):
    #ON TAG START (i have this condition set in my code)
    if re.search(r'name=(.*)', line):
        name=re.search(r'name=(.*)', line).group(1)
    if re.search(r'age=(.*)', line):
        age=re.search(r'age=(.*)', line).group(1)
    if re.search(r'company=(.*)', line):
        company=re.search(r'company=(.*)', line).group(1)
    #ON TAG END (i have this condition set in my code)
    object[name]={}
    if not age:
        object[name]['age']=age
    if not company:
        object[name]['company']=company
Each tag's data should go into its own dictionary: tag1 to dict1, tag2 to dict2 and tag3 to dict3.
Now my question is: how can I make the "name", "age" and "company" variables local to each dictionary? If I create global variables, they get mixed up across all three dictionaries, which produces incorrect data.
Please ignore any indentation issues in the above.
I'm not sure I understand the requirements. But here are some methods which might be helpful:
xml_content = """<tags>
<tag_1>
name=(pattern1)
age=(pattern1.1)
company=(pattern1.2)
</tag_1>
<tag_2>
name=(pattern2)
age=(pattern2.1)
company=(pattern2.2)
</tag_2>
<tag_3>
name=(pattern3)
age=(pattern3.1)
company=(pattern3.2)
</tag_3>
</tags>
"""
from xml.etree import ElementTree
document = ElementTree.fromstring(xml_content)
You can iterate over the tags and get the desired information:
for tag in document:
    print(tag.tag)
    print(tag.text)
    print(tag.text.split())
    print(dict(line.split('=') for line in tag.text.split()))
    print("---------------------")
It outputs:
tag_1
name=(pattern1)
age=(pattern1.1)
company=(pattern1.2)
['name=(pattern1)', 'age=(pattern1.1)', 'company=(pattern1.2)']
{'name': '(pattern1)', 'age': '(pattern1.1)', 'company': '(pattern1.2)'}
---------------------
tag_2
name=(pattern2)
age=(pattern2.1)
company=(pattern2.2)
['name=(pattern2)', 'age=(pattern2.1)', 'company=(pattern2.2)']
{'name': '(pattern2)', 'age': '(pattern2.1)', 'company': '(pattern2.2)'}
---------------------
tag_3
name=(pattern3)
age=(pattern3.1)
company=(pattern3.2)
['name=(pattern3)', 'age=(pattern3.1)', 'company=(pattern3.2)']
{'name': '(pattern3)', 'age': '(pattern3.1)', 'company': '(pattern3.2)'}
If you want one big list or one big dict:
def tag_to_dict(tag):
    return dict(line.split('=') for line in tag.text.split())
[tag_to_dict(tag) for tag in document]
{tag.tag:tag_to_dict(tag) for tag in document}
These return:
[{'name': '(pattern1)', 'age': '(pattern1.1)', 'company': '(pattern1.2)'},
{'name': '(pattern2)', 'age': '(pattern2.1)', 'company': '(pattern2.2)'},
{'name': '(pattern3)', 'age': '(pattern3.1)', 'company': '(pattern3.2)'}]
and
{'tag_1': {'name': '(pattern1)',
'age': '(pattern1.1)',
'company': '(pattern1.2)'},
'tag_2': {'name': '(pattern2)',
'age': '(pattern2.1)',
'company': '(pattern2.2)'},
'tag_3': {'name': '(pattern3)',
'age': '(pattern3.1)',
'company': '(pattern3.2)'}}
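To get the dict1[pattern1]['age'] shape from the question, the same idea can be pushed one step further. The sketch below is one possible reading of the requirement: name and fields are local to each call of the hypothetical helper tag_to_record, so values from one tag cannot leak into another.

```python
from xml.etree import ElementTree

xml_content = """<tags>
<tag_1>
name=(pattern1)
age=(pattern1.1)
company=(pattern1.2)
</tag_1>
<tag_2>
name=(pattern2)
age=(pattern2.1)
company=(pattern2.2)
</tag_2>
</tags>
"""

def tag_to_record(tag):
    """Return {name: {'age': ..., 'company': ...}} for one tag."""
    # split on the first '=' only, in case a value contains '=' itself
    fields = dict(line.split("=", 1) for line in tag.text.split())
    name = fields.pop("name")  # local variable, recreated on every call
    return {name: fields}

document = ElementTree.fromstring(xml_content)
# One dictionary per tag, keyed by the tag name.
records = {tag.tag: tag_to_record(tag) for tag in document}
```

With this layout, records["tag_1"]["(pattern1)"]["age"] plays the role of dict1[pattern1]['age'].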

XML to CSV using xml.etree.ElementTree.iterparse functionality

Folks, I am new (brand new) to Python, so after taking a course I decided to create a script to convert an XML file to CSV. The file in question is 2GB in size, so after searching here and on Google I think I need to use the xml.etree.ElementTree.iterparse functionality. For reference, the XML file I am looking to convert looks like this:
<Document>
<type></type>
<internal_id></internal_id>
<name></name>
<number></number>
<cadname></cadname>
<version></version>
<iteration></iteration>
**<isLatest></isLatest>**
<modifiedBy>
<username></username>
<email/>
</modifiedBy>
<content>
**<name></name>**
<id></id>
<uploaded></uploaded>
<refSize></refSize>
<storage>
<vault></vault>
<folder></folder>
**<filename></filename>**
<location></location>
**<actualLocation></actualLocation>**
</storage>
<replicatedTo></replicatedTo>
<copies></copies>
<status></status>
</content>
I am using the value of isLatest to determine whether I need to add the items to the CSV file. If the value is "true" I want the data to move to the CSV file. Here is the code that works to a point:
import xml.etree.ElementTree as ET
import csv
parser = ET.iterparse("windchill.xml")
# open a file for writing
csvfile = open('windchill.txt', 'w', encoding="utf-8")
# create the csv writer object
csvwriter = csv.writer(csvfile)
count = 0
for event, document in parser:
    if document.tag == 'Document':
        if document.find('isLatest').text == 'true':
            row = []
            name = document.find('content').find('name').text
            row.append(name)
            filename = document.find('content').find('storage').find('filename').text
            row.append(filename)
            folder = document.find('content').find('storage').find('actualLocation').text
            row.append(folder)
            csvwriter.writerow(row)
        document.clear()
csvfile.close()
If I run the code, I get this error:
Traceback (most recent call last):
File "C:/Users/mike/PycharmProjects/windchill/xml2csv-stream.py", line 17, in <module>
if document.find('isLatest').text == 'true':
AttributeError: 'NoneType' object has no attribute 'text'
A file is created that has 91,000 entries that look like this:
plate.prt,000000000518e8,/vault/Vlt7
adhesive.prt,0000000005024b,/vault/Vlt7
brd_pad.prt,00000000057862,/vault/Vlt7
support_pad.prt,0000000005024c,/vault/Vlt7
ground.prt,0000000005089b,/vault/Vlt7
There seem to be two issues with the output.
Some items seem to be duplicated, although the source file has no duplications. The name could be duplicated in the source file, but there can only be one name value that is .
I don't think the file completed. I looked at the last entry of my TXT (CSV) file and it does not match the last line of my source file. I was assuming the iterator was serial in nature.
So, any idea what the error is telling me, and any idea why I may be seeing duplicates? Originally I thought the error may have been related to me not ending gracefully. I am confident the XML is formed properly throughout, but maybe that is a bad assumption.
******UPDATES******
Here is a sample of the elements.
<Document>
<type>wt.epm.EPMDocument</type>
<internal_id>33709881</internal_id>
<name>bga_13x11p137_0_4_0_8.prt</name>
<number>BGA_13X11P137_0_4_0_8.PRT</number>
<cadname>bga_13x11p137_0_4_0_8.prt</cadname>
<version>A</version>
<iteration>1</iteration>
<isLatest>false</isLatest>
<modifiedBy>
<username>ets027 (deleted)</username>
<email/>
</modifiedBy>
<content>
<name>bga_13x11p137_0_4_0_8.prt</name>
<id>5341368</id>
<uploaded>Jan 13, 2006 09:14:41</uploaded>
<refSize>287764</refSize>
<storage>
<vault>master_vault</vault>
<folder>master_vault7</folder>
<filename>000000000505a6</filename>
<location>[wt.fv.FvItem:33709835]::master::master_vault::master_vault7::000000000505a6</location>
<actualLocation>/vault/Windchill_Vaults/WcVlt7</actualLocation>
</storage>
<replicatedTo>
</replicatedTo>
<copies>
</copies>
<status>Content File Missing</status>
</content>
</Document>
<Document>
<type>wt.epm.EPMDocument</type>
<internal_id>34570129</internal_id>
<name>d61-2446-02_nest_plate.prt</name>
<number>D61-2446-02_NEST_PLATE.PRT</number>
<cadname>d61-2446-02_nest_plate.prt</cadname>
<version>-</version>
<iteration>1</iteration>
<isLatest>true</isLatest>
<modifiedBy>
<username>esb044c (deleted)</username>
<email/>
</modifiedBy>
<content>
<name>d61-2446-02_nest_plate.prt</name>
<id>5344204</id>
<uploaded>Jan 30, 2006 09:09:24</uploaded>
<refSize>109278</refSize>
<storage>
<vault>master_vault</vault>
<folder>master_vault7</folder>
<filename>000000000518e8</filename>
<location>[wt.fv.FvItem:34566594]::master::master_vault::master_vault7::000000000518e8</location>
<actualLocation>/vault/Windchill_Vaults/WcVlt7</actualLocation>
</storage>
<replicatedTo>
</replicatedTo>
<copies>
</copies>
<status>Content File Missing</status>
</content>
</Document>
<Document>
<type>wt.epm.EPMDocument</type>
<internal_id>33512036</internal_id>
<name>d68-2568-07_press_head_adhesive.prt</name>
<number>D68-2568-07_PRESS_HEAD_ADHESIVE.PRT</number>
<cadname>d68-2568-07_press_head_adhesive.prt</cadname>
<version>-</version>
<iteration>2</iteration>
<isLatest>true</isLatest>
<modifiedBy>
<username>e3789c (deleted)</username>
<email/>
</modifiedBy>
<content>
<name>d68-2568-07_press_head_adhesive.prt</name>
<id>5340927</id>
<uploaded>Jan 10, 2006 15:42:31</uploaded>
<refSize>76314</refSize>
<storage>
<vault>master_vault</vault>
<folder>master_vault7</folder>
<filename>0000000005024b</filename>
<location>[wt.fv.FvItem:33512072]::master::master_vault::master_vault7::0000000005024b</location>
<actualLocation>/vault/Windchill_Vaults/WcVlt7</actualLocation>
</storage>
<replicatedTo>
</replicatedTo>
<copies>
</copies>
<status>Content File Missing</status>
</content>
</Document>
<Document>
<type>wt.epm.EPMDocument</type>
<internal_id>34715717</internal_id>
<name>dbk_flip_sleeve.prt</name>
<number>DBK_FLIP_SLEEVE.PRT</number>
<cadname>dbk_flip_sleeve.prt</cadname>
<version>-</version>
<iteration>1</iteration>
<isLatest>false</isLatest>
<modifiedBy>
<username>EKA014 (deleted)</username>
<email/>
</modifiedBy>
<content>
<name>dbk_flip_sleeve.prt</name>
<id>5344969</id>
<uploaded>Feb 01, 2006 12:54:43</uploaded>
<refSize>847210</refSize>
<storage>
<vault>master_vault</vault>
<folder>master_vault7</folder>
<filename>00000000051b54</filename>
<location>[wt.fv.FvItem:34714395]::master::master_vault::master_vault7::00000000051b54</location>
<actualLocation>/vault/Windchill_Vaults/WcVlt7</actualLocation>
</storage>
<replicatedTo>
</replicatedTo>
<copies>
</copies>
<status>Content File Missing</status>
</content>
</Document>
Here is my updated code:
import xml.etree.ElementTree as ET
import csv
parser = ET.iterparse("windchill.xml", events=('start', 'end'))
csvfile = open('windchill.txt', 'w', encoding="utf-8")
csvwriter = csv.writer(csvfile)
for event, document in parser:
    if event=='end' and document.tag=='Document':
        if document.find('type').text == 'wt.epm.EPMDocument' and document.find('isLatest').text == 'true':
            row = []
            version = document.find('version').text
            row.append(version)
            name = document.find('content').find('name').text
            row.append(name)
            filename = document.find('content').find('storage').find('filename').text
            row.append(filename)
            # folder = document.find('content').find('storage').find('actualLocation').text
            folder = document.find('content').find('storage').find('folder').text
            row.append(folder)
            csvwriter.writerow(row)
csvfile.close()
I added a check for type. Documents of type wt.epm.EPMDocument will have the record. I then want to pull the data out of the storage element, specifically name, folder, and filename. I was originally using actualLocation instead of folder, but changed it hoping the shorter value would help with my memory error.
Concerning your first issue: iterparse 'sees' each and every xml element in a document when that element starts and, again, when it closes. This probably explains the duplication that you find. Not only must you filter for the element(s) that you want, you must filter for the appropriate event. You might look at this answer, https://stackoverflow.com/a/46167799/131187, to see how to deal with this.
Concerning the second: When document.find('isLatest') fails to find what you've requested it returns None, rather than an object representing an xml element. None has no properties, including text, therefore, your program croaks at that point, and you receive an incomplete csv file.
Edit in answer to comment: This code parses the xml but does not write the csv. csv records would be written in the save_csv_record function, or its equivalent. It appears only once in the code so should be easy to replace.
Called the way it is in this code, iterparse returns only 'end' events and their corresponding xml elements. Therefore, the code watches for the 'end' of a 'Document'. When it sees one, it asks whether the document contains an 'isLatest' of 'true'. If it does, it writes the record out; if not, it ignores it. Either way, it then empties document_content. If the code has not seen the 'end' of a document, it simply saves the content of the tag and keeps reading until it does.
from xml.etree.ElementTree import iterparse
def save_csv_record(record):
    print(record)
    return
document_content = {}
for ev, el in iterparse('windchill.xml'):
    if el.tag=='Document':
        # .get() avoids a KeyError when a document has no isLatest element
        if document_content.get('isLatest') == 'true':
            save_csv_record(document_content)
        document_content = {}
    else:
        document_content[el.tag] = el.text.strip() if el.text else None
Output:
{'folder': 'master_vault7', 'storage': '', 'refSize': '109278', 'cadname': 'd61-2446-02_nest_plate.prt', 'filename': '000000000518e8', 'replicatedTo': '', 'status': 'Content File Missing', 'number': 'D61-2446-02_NEST_PLATE.PRT', 'location': '[wt.fv.FvItem:34566594]::master::master_vault::master_vault7::000000000518e8', 'vault': 'master_vault', 'uploaded': 'Jan 30, 2006 09:09:24', 'id': '5344204', 'actualLocation': '/vault/Windchill_Vaults/WcVlt7', 'name': 'd61-2446-02_nest_plate.prt', 'modifiedBy': '', 'email': None, 'content': '', 'internal_id': '34570129', 'iteration': '1', 'username': 'esb044c (deleted)', 'type': 'wt.epm.EPMDocument', 'copies': '', 'isLatest': 'true', 'version': '-'}
{'folder': 'master_vault7', 'storage': '', 'refSize': '76314', 'cadname': 'd68-2568-07_press_head_adhesive.prt', 'filename': '0000000005024b', 'replicatedTo': '', 'status': 'Content File Missing', 'number': 'D68-2568-07_PRESS_HEAD_ADHESIVE.PRT', 'location': '[wt.fv.FvItem:33512072]::master::master_vault::master_vault7::0000000005024b', 'vault': 'master_vault', 'uploaded': 'Jan 10, 2006 15:42:31', 'id': '5340927', 'actualLocation': '/vault/Windchill_Vaults/WcVlt7', 'name': 'd68-2568-07_press_head_adhesive.prt', 'modifiedBy': '', 'email': None, 'content': '', 'internal_id': '33512036', 'iteration': '2', 'username': 'e3789c (deleted)', 'type': 'wt.epm.EPMDocument', 'copies': '', 'isLatest': 'true', 'version': '-'}
EDITED FOR LATEST CODE:
Here is the new code that I am using, which still runs out of memory:
from xml.etree.ElementTree import iterparse
def save_csv_record(record):
    print(record)
    return
document_content = {}
for ev, el in iterparse('windchill.xml'):
    if el.tag=='Document':
        if (document_content['type']=='wt.epm.EPMDocument' and
                document_content['isLatest'] == 'true'):
            save_csv_record(document_content)
        document_content = {}
    else:
        document_content[el.tag] = el.text.strip() if el.text else None
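A likely reason for the memory error is that iterparse still builds the complete tree in the background unless finished elements are cleared. The sketch below shows one common way to bound memory: grab the root element from the first 'start' event and clear it after each Document has been handled. It assumes the Document elements sit directly under a single wrapper element (called Documents here, a made-up name), and the function name documents_to_csv is hypothetical. In-memory buffers stand in for windchill.xml and the CSV file.

```python
import csv
import io
import xml.etree.ElementTree as ET

def documents_to_csv(xml_source, csv_target):
    """Stream <Document> records and write the latest ones as CSV rows."""
    writer = csv.writer(csv_target)
    written = 0
    context = ET.iterparse(xml_source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end" or elem.tag != "Document":
            continue
        # findtext() returns None instead of raising when a child is missing
        if elem.findtext("isLatest") == "true":
            writer.writerow([
                elem.findtext("content/name"),
                elem.findtext("content/storage/filename"),
                elem.findtext("content/storage/folder"),
            ])
            written += 1
        root.clear()  # drop finished records so the tree does not keep growing
    return written

# Made-up two-record sample; the wrapper element name is an assumption.
sample = b"""<Documents>
<Document><isLatest>false</isLatest><content><name>a.prt</name>
<storage><folder>v7</folder><filename>01</filename></storage></content></Document>
<Document><isLatest>true</isLatest><content><name>b.prt</name>
<storage><folder>v7</folder><filename>02</filename></storage></content></Document>
</Documents>"""

out = io.StringIO()
count = documents_to_csv(io.BytesIO(sample), out)
```

For the real file, the two buffers would be replaced by the XML file name and a file opened with open(..., 'w', newline='', encoding='utf-8').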

combine dictionaries and pass as output to another function

I am learning Python and coding, and I am trying a web scraping example. I download the currency exchange data from a website and I want to compute the average exchange rate for each currency over a 50-day period. The problem is that I am unable to do the following.
I get results from the first function, which should be in the form of a dictionary, and then pass these dictionaries to another function as an argument to average the values. I am unable to pass the dict values correctly to the other function.
My code is as follows:
import os
import webbrowser
import requests as rq
import sys
from bs4 import BeautifulSoup
from xml.etree import ElementTree as ET
def saveData(path, date):
    session = rq.session()
    url = 'https://www.bnm.md/en/official_exchange_rates?get_xml=1&date=' + date
    datastore = session.get(url)
    with open(path, 'wb') as f:
        f.write(datastore.content)
    data = ET.fromstring(datastore.content)
    '''
    elements = {}
    for element in data.iter():
        if element.tag in ('Name', 'Value'):
            elements[element.tag] = element.text
            print('elements:', elements)
    # Here I want to combine all those dictionaries in a variable so that I can pass it as an argument to another function
    return elements
    '''
    # I replaced the triple-quoted code above with the code below
    elements = {}
    for tag, text in data.items():
        if tag in ('Name', 'Value'):
            elements.setdefault(tag, [])
            elements[tag].append(text)
    return elements
def computeAverage(elements): # I want to pass the results of saveData(), which are in dictionary form, to this function, but I am unable to solve this issue.
    print(elements)
def main():
    dates = ['20.04.2016', '21.04.2016', '22.04.2016']
    paths = []
    for date in dates:
        path = '/home/robbin/Desktop/webscrape/{}.xml'.format(date)
        paths.append(path)
    data3 = {}
    for path, date in zip(paths, dates):
        data2 = saveData(path, date)
        print('data2: ', data2)
        for k, v in data2.items():
            data3.setdefault(k, [])
            data3[k].append(v)
    print('data3: ', data3)
    computeAverage(data3)
if __name__ == '__main__':
    main()
Also, the saveData() function gives me dictionaries like this, where each dictionary's value is carried over into the next item, which is wrong:
elements: {'Name': 'Euro'}
elements: {'Name': 'Euro', 'Value': '22.4023'}
elements: {'Name': 'US Dollar', 'Value': '22.4023'}
elements: {'Name': 'US Dollar', 'Value': '19.7707'}
elements: {'Name': 'Russian Ruble', 'Value': '19.7707'}
elements: {'Name': 'Russian Ruble', 'Value': '0.3014'}
elements: {'Name': 'Romanian Leu', 'Value': '0.3014'}
elements: {'Name': 'Romanian Leu', 'Value': '4.9988'}
What I wanted was results like this, but I failed:
elements: {'Name': 'Euro', 'Value': '22.4023'}
elements: {'Name': 'US Dollar', 'Value': '19.7707'}
elements: {'Name': 'Russian Ruble', 'Value': '0.3014'}
elements: {'Name': 'Romanian Leu', 'Value': '4.9988'}
Updates:-------------
elements = []
for element in data.iter():
    if element.tag in ('Name', 'Value'):
        elements.append(element.text)
# print 'elements: ', elements
return elements
and in the main() function I have:
for path, date in zip(paths, dates):
    data = saveData(path, date)
    # print 'data from main: ', data
    computeAverage(data)
and the output of "print 'data from main: ', data" looks like this:
['Euro', '22.4023', 'US Dollar', '19.7707', 'Russian Ruble', '0.3014', 'Romanian Leu', '4.9988',.........'Special Drawing Rights', '27.8688']
['Euro', '22.4408', 'US Dollar', '19.7421', 'Russian Ruble', '0.3007', 'Romanian Leu', '5.0012',.....'Special Drawing Rights', '27.8606']
I am a newbie to coding; if someone could help me with these two problems, I would be really thankful.
First of all, I agree with @Prakhar Verma.
Second, you didn't state clearly what you want, but I assume you want to merge the data you got from the 'saveData' function and then calculate the average. So, here is the missing code.
data3 = {}
for path, date in zip(paths, dates):
    data2 = saveData(path, date)
    for k, v in data2.items():
        # you can move this line after declaring the data3 dict if keys returned by saveData are fixed i.e. name, value
        data3.setdefault(k, [])
        data3[k].append(v)
computeAverage(data3)
Update to the saveData function:
elements = {}
for tag, text in data.items():
    if tag in ('Name', 'Value'):
        elements.setdefault(tag, [])
        elements[tag].append(text)
===================================================
Update 2:
def saveData(path, date):
    #session = rq.session()
    url = 'https://www.bnm.md/en/official_exchange_rates?get_xml=1&date=' + date
    datastore = rq.get(url)
    with open(path, 'wb') as f:
        f.write(datastore.content)
    data = ET.fromstring(datastore.content)
    # I replaced the triple-quoted code above with the code below
    elements = {}
    for element in data.iter():
        tag = element.tag
        text = element.text
        if tag in ('Name', 'Value'):
            elements.setdefault(tag, [])
            elements[tag].append(text)
    return elements
def main():
    dates = ['20.03.2016', '21.03.2016', '22.03.2016']
    paths = []
    for date in dates:
        #please edit this
        path = '{}.xml'.format(date)
        paths.append(path)
    data3 = {}
    for path, date in zip(paths, dates):
        data2 = saveData(path, date)
        for k, v in data2.items():
            data3.setdefault(k, [])
            data3[k].append(v)
    computeAverage(data3)
The 'saveData' function is returning data but you are not saving it in any variable. So what you need to do is save the data when it's returned from 'saveData' function and then send it as a parameter to 'computeAverage' function.
Please go through the basics of coding and follow any programming tutorial. :)
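The averaging step itself is not shown in the thread. The following is a rough sketch of how it could look once each day's data is a {currency name: rate} dictionary. The helper names rates_from_xml and compute_average are made up, and the sample XML layout (ValCurs/Valute wrappers, invented rates) is an assumption; only the Name and Value tags come from the question.

```python
from collections import defaultdict
from xml.etree import ElementTree as ET

def rates_from_xml(xml_text):
    """Pair the i-th <Name> with the i-th <Value>: {'Euro': 22.0, ...}."""
    root = ET.fromstring(xml_text)
    names = [el.text for el in root.iter('Name')]
    values = [float(el.text) for el in root.iter('Value')]
    return dict(zip(names, values))

def compute_average(daily_rates):
    """Average each currency over a list of per-day {name: rate} dicts."""
    collected = defaultdict(list)
    for rates in daily_rates:
        for name, rate in rates.items():
            collected[name].append(rate)
    return {name: sum(rates) / len(rates) for name, rates in collected.items()}

# Made-up two-day sample in an assumed layout; the real feed may differ.
day1 = ("<ValCurs><Valute><Name>Euro</Name><Value>22.0</Value></Valute>"
        "<Valute><Name>US Dollar</Name><Value>19.5</Value></Valute></ValCurs>")
day2 = ("<ValCurs><Valute><Name>Euro</Name><Value>23.0</Value></Valute>"
        "<Valute><Name>US Dollar</Name><Value>20.5</Value></Valute></ValCurs>")

averages = compute_average([rates_from_xml(day1), rates_from_xml(day2)])
```

Keeping each name together with its rate avoids the flat ['Euro', '22.4023', 'US Dollar', ...] list from the update, where the pairing has to be reconstructed by position.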

How to parse and display the content of an lxml object using lxml

I am having difficulty parsing the XML file (_file) below using lxml:
>>_file= "qv.xml"
file content:
<document reference="suspicious-document00500.txt">
<feature name="plagiarism" type="artificial" obfuscation="none" this_offset="128" this_length="2503" source_reference="source-document00500.txt" source_offset="138339" source_length="2503"/>
<feature name="plagiarism" type="artificial" obfuscation="none" this_offset="8593" this_length="1582" source_reference="source-document00500.txt" source_offset="49473" source_length="1582"/>
</document>
Here is my attempt:
>>from lxml.etree import XMLParser, parse
>>parsefile = parse(_file)
>>print parsefile
Output: <lxml.etree._ElementTree object at 0x000000000642E788>
The output is the location of the lxml object, while I am after the actual file content, i.e.
Desired output={'document reference'="suspicious-document00500.txt", 'this_offset': '128', 'obfuscation': 'none', 'source_length': '2503', 'name': 'plagiarism', 'this_length': '2503', 'source_reference': 'source-document00500.txt', 'source_offset': '138339', 'type': 'artificial'}
Any ideas on how to get the desired output? thanks.
Here's one way of getting the desired outputs:
from lxml import etree
def main():
    doc = etree.parse('qv.xml')
    root = doc.getroot()
    print(root.attrib)
    for item in root:
        print(item.attrib)
if __name__ == "__main__":
    main()
Output:
{'reference': 'suspicious-document00500.txt'}
{'this_offset': '128', 'obfuscation': 'none', 'source_length': '2503', 'name': 'plagiarism', 'this_length': '2503', 'source_reference': 'source-document00500.txt', 'source_offset': '138339', 'type': 'artificial'}
{'this_offset': '8593', 'obfuscation': 'none', 'source_length': '1582', 'name': 'plagiarism', 'this_length': '1582', 'source_reference': 'source-document00500.txt', 'source_offset': '49473', 'type': 'artificial'}
It works fine with the contents you gave.
You might want to read this to see how etree represents xml objects.
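If a single dictionary per feature is wanted, closer to the desired output in the question, the root's reference can be merged into each feature's attributes. This sketch uses the standard library's xml.etree.ElementTree so that it runs without extra packages; the get/attrib/iter calls used here are also available on lxml elements. The key name 'document reference' follows the question, and the function name features_with_reference is made up.

```python
import xml.etree.ElementTree as ET

def features_with_reference(xml_text):
    """Return one dict per <feature>, including the document's reference."""
    root = ET.fromstring(xml_text)
    rows = []
    for feature in root.iter("feature"):
        row = {"document reference": root.get("reference")}
        row.update(feature.attrib)  # copy all attributes of the feature
        rows.append(row)
    return rows

# Trimmed copy of the question's file content (fewer attributes per feature).
sample = """<document reference="suspicious-document00500.txt">
<feature name="plagiarism" type="artificial" this_offset="128" this_length="2503"/>
<feature name="plagiarism" type="artificial" this_offset="8593" this_length="1582"/>
</document>"""

rows = features_with_reference(sample)
```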
