Parsing name/value pairs from XML

Parsing name/value pairs from XML - python

I am trying to pull account details from XML files supplied by vendors.
I have one vendor that supplied XML files like:
<Accounts>
<Account>
<AccountNumber>1234567</AccountNumber>
<Balance>$200.00</Balance>
</Account>
<Account>
...
</Account>
</Accounts>
And I can parse this fairly easily using python:
mytree = et.parse(xml_path)
myroot = mytree.getroot()
for acc in charges_root.findall('Account'):
acctnum = acc.find('AccountNumber').text
balance = acc.find('Balance').text
print(acctnum, balance)
Which outputs like this:
1234567 $200.00
However another vendor supplies the XML files in something more like name/value pairs, and I am unsure how to easily access that data. It doesn't work the same way as above:
<Accounts>
<Account>
<field name='AccountNumber' value='1234567' />
<field name='Balance' value='$200.00' />
</Account>
<Account>
...
</Account>
</Accounts>
So far I've got this, but would like to be able to access the values separately and easily:
mytree = et.parse(xml_path)
myroot = mytree.getroot()
for field in myroot.findall('Account'):
for line in field:
print(line.attrib)
Which outputs something like:
{'name': 'AccountNumber', 'value': '1234567'}
{'name': 'Balance', 'value': '$200.00'}
So my question is this - How can I access the values and assign them to variables (based on the name) so that I can make use of them elsewhere in the script, like I have with acctnum and balance in the first example?

Populate a new datastructure (like a dict) from the field when you iterate instead of just discarding:
account_d = {}
for field in myroot.findall('Account'):
for line in field:
account_d[line.attrib['name']] = line.attrib['value']
# account_d should now be:
# { 'AccountNumber': '1234567', 'Balance': '$200.00' }
You can use a list of lists/tuples too:
account_a = []
for field in myroot.findall('Account'):
for line in field:
account_d.append(line.attrib['name'], line.attrib['value'])
# account_a should now be:
# [('AccountNumber', '1234567'), ('Balance', '$200.00')]

ElementTree 1.3 has the ability to locate nodes with particular attributes:
from xml.etree import ElementTree as et
data = '''\
<Accounts>
<Account>
<field name='AccountNumber' value='1234567' />
<field name='Balance' value='$200.00' />
</Account>
<Account>
<field name='AccountNumber' value='9999999' />
<field name='Balance' value='$300.00' />
</Account>
</Accounts>'''
tree = et.fromstring(data)
for acc in tree.iterfind('Account'):
acctnum = acc.find("field[#name='AccountNumber']").attrib['value']
balance = acc.find("field[#name='Balance']").attrib['value']
print(acctnum,balance)
1234567 $200.00
9999999 $300.00

You can do it by collecting all the Account element's field attributes into a dictionary and then using the information in it as needed:
accounts.xml sample input file:
<?xml version="1.0"?>
<Accounts>
<Account>
<field name='AccountNumber' value='1234567' />
<field name='Balance' value='$200.00' />
</Account>
<Account>
<field name='AccountNumber' value='8901234' />
<field name='Balance' value='$100.00' />
</Account>
</Accounts>
Code:
import xml.etree.ElementTree as et
xml_path = 'accounts.xml'
mytree = et.parse(xml_path)
myroot = mytree.getroot()
for acct in myroot.findall('Account'):
info = {field.attrib['name']: field.attrib['value']
for field in acct.findall('field')}
acctnum, balance = info['AccountNumber'], info['Balance']
print(acctnum, balance)
Result:
1234567 $200.00
8901234 $100.00

Question: How can I access the values and assign them to variables (based on the name)
Convert all Accounts to a Dict[AccountNumber] of Dict[field].
The Attribute name becomes the dict Key:
Accounts = {}
for account in root.findall('Account'):
fields = {}
for field in account.findall('field'):
fields[field.attrib['name']] = field.attrib['value']
print('{a[AccountNumber]} {a[Balance]}'.format(a=fields))
Accounts[fields['AccountNumber']] = fields
print(Accounts)
Output:
1234567 $200.00
9999999 $300.00
{'9999999': {'AccountNumber': '9999999', 'Balance': '$300.00'}, '1234567': {'AccountNumber': '1234567', 'Balance': '$200.00'}}
Tested with Python: 3.4.2

Related

How can I extract elementary values with ElementTree in Python?

I try to extract values attributes (ex. 'Filename') of that XML file in Python.
Can you help me ?
Here is the MC 'Librarytest.xml' file :
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<MPL Version="2.0" Title="Library">
<Item>
<Field Name="Filename">Y:\Styx\08 - Styx - Snowblind8. Snowblind.flac</Field>
<Field Name="Name">Snowblind</Field>
<Field Name="Artist">Styx</Field>
<Field Name="Album">Paradise Theater</Field>
<Field Name="Genre">Rock</Field>
</Item>
<Item>
<Field Name="Filename">Y:\David Gilmour\04 A Boat Lies Waiting.flac</Field>
<Field Name="Name">A Boat Lies Waiting</Field>
<Field Name="Artist">David Gilmour</Field>
<Field Name="Album">Rattle That Lock (Deluxe)</Field>
<Field Name="Genre">Progressive</Field>
</Item>
</MPL>
I try this :
import xml.etree.ElementTree as ET
xml_file = 'C:/Users/ClientMD/Downloads/MC Librarytest.xml'
tree = ET.parse(xml_file)
root = tree.getroot()
for each in root.findall('.//Field'):
rating = each.find('.//Filename')
print ('Nothing' if rating is None else rating.text)
and I obtain :
Nothing
...
Nothing

Like this:
import xml.etree.ElementTree as ET
xml_file = 'C:/Users/ClientMD/Downloads/MC Librarytest.xml'
tree = ET.parse(xml_file)
root = tree.getroot()
for each in root.findall('.//Field[#Name="Filename"]'):
rating = each.text
print ('Nothing' if rating is None else rating)
Output
Y:\Styx\08 - Styx - Snowblind8. Snowblind.flac
Y:\David Gilmour\04 A Boat Lies Waiting.flac

If you want to grab more elements and keep them under a single item context - you can use the below
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<MPL Version="2.0" Title="Library">
<Item>
<Field Name="Filename">Y:\Styx\\08 - Styx - Snowblind8. Snowblind.flac</Field>
<Field Name="Name">Snowblind</Field>
<Field Name="Artist">Styx</Field>
<Field Name="Album">Paradise Theater</Field>
<Field Name="Genre">Rock</Field>
</Item>
<Item>
<Field Name="Filename">Y:\David Gilmour\\04 A Boat Lies Waiting.flac</Field>
<Field Name="Name">A Boat Lies Waiting</Field>
<Field Name="Artist">David Gilmour</Field>
<Field Name="Album">Rattle That Lock (Deluxe)</Field>
<Field Name="Genre">Progressive</Field>
</Item>
</MPL>'''
INTERESTING_NAMES = ['Filename','Artist']
data = []
root = ET.fromstring(xml)
for item in root.findall('.//Item'):
temp = {}
for name in INTERESTING_NAMES:
temp[name] = item.find(f'Field[#Name="{name}"]').text
data.append(temp)
print(data)
output
[{'Filename': 'Y:\\Styx\\08 - Styx - Snowblind8. Snowblind.flac', 'Artist': 'Styx'}, {'Filename': 'Y:\\David Gilmour\\04 A Boat Lies Waiting.flac', 'Artist': 'David Gilmour'}]

How to remove root element from xml file using python

i have a a number of xml files with me, whose format is:
<objects>
<object>
<record>
<invoice_source>EMAIL</invoice_source>
<invoice_capture_date>2022-11-18</invoice_capture_date>
<document_type>INVOICE</document_type>
<data_capture_provider_code>00001</data_capture_provider_code>
<data_capture_provider_reference>1264</data_capture_provider_reference>
<document_capture_provide_code>00002</document_capture_provide_code>
<document_capture_provider_ref>1264</document_capture_provider_ref>
<rows/>
</record>
</object>
</objects>
there is two root objects in this xml. i want to remove one of them using. i want the xml to look like this:
<objects>
<record>
<invoice_source>EMAIL</invoice_source>
<invoice_capture_date>2022-11-18</invoice_capture_date>
<document_type>INVOICE</document_type>
<data_capture_provider_code>00001</data_capture_provider_code>
<data_capture_provider_reference>1264</data_capture_provider_reference>
<document_capture_provide_code>00002</document_capture_provide_code>
<document_capture_provider_ref>1264</document_capture_provider_ref>
<rows/>
</record>
</objects>
i have a folder full of this files. i want to do it using python. is there any way.

The direct way is shown below. If your real files are more complicated than one-object/one-record you'll have to be more specific with examples:
from xml.etree import ElementTree as et
xml = '''\
<objects>
<object>
<record>
<invoice_source>EMAIL</invoice_source>
<invoice_capture_date>2022-11-18</invoice_capture_date>
<document_type>INVOICE</document_type>
<data_capture_provider_code>00001</data_capture_provider_code>
<data_capture_provider_reference>1264</data_capture_provider_reference>
<document_capture_provide_code>00002</document_capture_provide_code>
<document_capture_provider_ref>1264</document_capture_provider_ref>
<rows/>
</record>
</object>
</objects>
'''
objects = et.fromstring(xml)
objects.append(objects[0][0]) # move "record" out of "object" and append as child to "objects"
objects.remove(objects[0]) # remove empty "object"
et.indent(objects) # reformat indentation (Python 3.9+)
et.dump(objects) # show result
Output:
<objects>
<record>
<invoice_source>EMAIL</invoice_source>
<invoice_capture_date>2022-11-18</invoice_capture_date>
<document_type>INVOICE</document_type>
<data_capture_provider_code>00001</data_capture_provider_code>
<data_capture_provider_reference>1264</data_capture_provider_reference>
<document_capture_provide_code>00002</document_capture_provide_code>
<document_capture_provider_ref>1264</document_capture_provider_ref>
<rows />
</record>
</objects>
Another option that would handle any nested content in object:
objects = et.fromstring(xml)
objects = objects[0] # extract "object" (lose "objects" layer)
objects.tag = 'objects' # rename "object" tag
et.indent(objects) # reformat indentation (Python 3.9+)
et.dump(objects) # show result (same output)

My approach is to iterate over the children of <objects>, which is <object>, then move the <record> nodes up one level. After which, I can remove the <object> nodes.
import xml.etree.ElementTree as ET
doc = ET.parse("input.xml")
objects = doc.getroot()
for obj in objects:
for record in obj:
objects.append(record)
objects.remove(obj)
doc.write("output.xml")
Here is the contents of output.xml:
<objects>
<record>
<invoice_source>EMAIL</invoice_source>
<invoice_capture_date>2022-11-18</invoice_capture_date>
<document_type>INVOICE</document_type>
<data_capture_provider_code>00001</data_capture_provider_code>
<data_capture_provider_reference>1264</data_capture_provider_reference>
<document_capture_provide_code>00002</document_capture_provide_code>
<document_capture_provider_ref>1264</document_capture_provider_ref>
<rows />
</record>
</objects>

xml.etree.ElementTree access subelement without creating

I have a code
ffdata = ET.Element("FFData")
fForm = ET.SubElement(ffdata, "Form")
fForm.set("FormDefId","{DD0F88DD-A858-4595-AF2F-3643D0069A39}")
fPages = ET.SubElement(fForm, "Pages")
for xml_file in xml_files:
xml_file = os.path.join(*[CurrentFolderPath,xml_file])
tree = ET.parse(xml_file)
xml_data = tree.getroot()
for xPage in xml_data.iter('Page'):
# --- Ignore first element
if int(xPage.attrib['PageNumber']) >1:
#---- Change Paginators index
xPage.set('PageNumber',str(sPageNumber))
# -- Set page number to fields
fFields = ET.SubElement(xPage, "Fields")
fxField = ET.SubElement(fFields, "Field")
fxField.set('PageNumber',str(sPageNumber-1))
fPages.append(xPage) # Add element to root
sPageNumber= sPageNumber +1
else:
if sImoneExists == 0:
fPages.append(xPage) # Add element to root
sImoneExists = 1
fPages.set("Count",str(sPageNumber-1))
indent(ffdata)
tree = ET.ElementTree(ffdata)
xml_file_save = os.path.join(*[CurrentFolderPath,"Merged.ffdata"])
tree.write(xml_file_save)
i trying to change sub element inside loop
fFields = ET.SubElement(xPage, "Fields")
fxField = ET.SubElement(fFields, "Field")
fxField.set('PageNumber',str(sPageNumber-1))
But it create new element instead of change existing
so i get
<FFData>
<Form FormDefId="{DD0F88DD-A858-4595-AF2F-3643D0069A39}">
<Pages Count="41">
<Page PageDefName="1" PageNumber="2">
<Fields Count="135">
<Field Name="L1-1"></Field>
<Field Name="PageNumber">1</Field>
</Fields>
<Fields>
<Field PageNumber="2" />
</Fields>
</Page>
</Pages>
</Form>
</FFData>
expected
<FFData>
<Form FormDefId="{DD0F88DD-A858-4595-AF2F-3643D0069A39}">
<Pages Count="41">
<Page PageDefName="1" PageNumber="2">
<Fields Count="135">
<Field Name="L1-1"></Field>
<Field Name="PageNumber">2</Field>
</Fields>
</Page>
</Pages>
</Form>
</FFData>
So how to change existing sub element of each iterating page?

Fetch xml tag values recursively using ElementTree

I have an xmk of the type:
<SCHOOL>
<GROUP name="GetStudInfo">
<DATA>
<NAME type="char">Sahil Jha</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Rashmi Kaur</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Palak Bisht</NAME>
<STD>11th</STD>
</DATA>
</SCHOOL>
I need to fetch the values of NAME, STD.
I tried doing this:
e = ET.ElementTree(ET.fromstring(getunitinfo_str))
for elt in e.iter():
print("{} {}".format(elt.tag, elt.text))
But this was covering other values as well:
Output:
SCHOOL
GROUP
DATA
NAME Sahil Jha
STD 11th
DATA
NAME Rashmi Kaur
STD 11th
DATA
NAME Palak Bisht
STD 11th
{}
Expected O/p:
{'Sahil Jha':'11th', 'Rashmi Kaur'::'11th', 'Palak Bisht':'11th'}
But the formatting should be of the type NAME:STD. Where am I going wrong?

As mentionned by #furas you can use XPATH to find all DATA elements and then find
NAME and STD elements:
import xml.etree.ElementTree as ET
xml = '''<SCHOOL>
<GROUP name="GetStudInfo">
<DATA>
<NAME type="char">Sahil Jha</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Rashmi Kaur</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Palak Bisht</NAME>
<STD>11th</STD>
</DATA>
</GROUP>
</SCHOOL>'''
e = ET.fromstring(xml)
for data_tag in e.findall('DATA'):
name = data_tag.find('NAME')
std = data_tag.find('STD')
print("{} {}".format(name.text, std.text))
Or you can use a dict comprehension to get the dictionary you want:
my_dict = {
data_tag.find('NAME').text: data_tag.find('STD').text
for data_tag in e.findall('.//DATA')
}
print(my_dict)

You need something more then only print() - you need if/else to check elt.tag to get only NAME and `STD.
Because NAME and STD are different tags so you will have to remeber NAME in some variable to use it when you get STD
name = None # default value at start
for elt in e.iter():
if elt.tag == 'NAME':
name = elt # remember element
if elt.tag == 'STD':
print("{}:{}".format(name.text, elt.text))
Or you could use xpath like in #qouify answer.
Minimal working code
getunitinfo_str = '''
<SCHOOL>
<GROUP name="GetStudInfo">
<DATA>
<NAME type="char">Sahil Jha</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Rashmi Kaur</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Palak Bisht</NAME>
<STD>11th</STD>
</DATA>
</GROUP>
</SCHOOL>
'''
import xml.etree.ElementTree as ET
e = ET.ElementTree(ET.fromstring(getunitinfo_str))
name = None # to remeber element
for elt in e.iter():
if elt.tag == 'NAME':
name = elt
if elt.tag == 'STD':
print("{}:{}".format(name.text, elt.text))

One liner below
import xml.etree.ElementTree as ET
xml = '''<SCHOOL>
<GROUP name="GetStudInfo">
<DATA>
<NAME type="char">Sahil Jha</NAME>
<STD>11th</STD>
</DATA>
<DATA>
<NAME type="char">Rashmi Kaur</NAME>
<STD>116th</STD>
</DATA>
<DATA>
<NAME type="char">Palak Bisht</NAME>
<STD>17th</STD>
</DATA>
</GROUP>
</SCHOOL>'''
root = ET.fromstring(xml)
data = {x.find("NAME").text: x.find("STD").text for x in root.findall('.//DATA')}
print(data)
output
{'Sahil Jha': '11th', 'Rashmi Kaur': '116th', 'Palak Bisht': '17th'}

How can I display the child element of a node from an xml file, in python?

Here is my xml file
<root>
<Module name="ac4" offset="32" width="12">
<register name="xga_control" offset="0x000" width="32" access="R/W">
<field name="reserved" offset="0" bit_span="5"/>
<field name="force_all_fault_clear" bit_span="1" default="0">
<description>Rising edge forces all fault registers to clear</description>
</field>
<field name="force_warning" default="0" bit_span="1">
<description>Forces AC2 to report a Master Warning</description>
</field>
<field name="force_error" default="0" bit_span="1">
<description>Forces AC2 to report a Master Error</description>
</field>
</register>
</Module>
<root>
Right now I can access the names of my registers and display them. However I also want to display the names and attributes of my field elements. How can I do that? Here is my code so far.
input_file = etree.parse('file1.xml')
output=open("ac4.vhd","w+")
output.write("Registers \n")
for node in input_file.iter():
if node.tag=="register":
name=node.attrib.get("name")
print(name)
output.write(name)
output.write("\n")
if node.tag=="field":
name=node.attrib.get("name")
output.write(name)
Right now the output looks like
Registers
xga_control
i_cmd_reg
I want it to look like
Registers
xga_control
reserved
force_all_fault_clear
force_warning
force_error
i_cmd_reg
field name
field name
Any ideas on how to do this?

Instead of iterating over input_file.iter() you can do input_file.getroot() and iterate systematically over that.
This is how you would write your code:
import xml.etree.ElementTree as ET
tree = ET.parse('file1.xml')
root = tree.getroot()
with open('ac4.vhd', 'w+') as fd:
fd.write('Registers\n')
for node in root:
if node.tag == 'Module':
for sub_node in node:
fd.write('{0}\n'.format(sub_node.get('name')))
for child in sub_node:
fd.write('\t{0}\n'.format(child.get('name')))
Your output becomes:
Registers
xga_control
reserved
force_all_fault_clear
force_warning
force_error

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Parsing name/value pairs from XML - python

Related

How can I extract elementary values with ElementTree in Python?

How to remove root element from xml file using python

xml.etree.ElementTree access subelement without creating

Fetch xml tag values recursively using ElementTree

How can I display the child element of a node from an xml file, in python?

Categories

Resources