List namespace definitions in an XML document with ElementTree? - python

If I've got an XML file like this:
<root
xmlns:a="http://example.com/a"
xmlns:b="http://example.com/b"
xmlns:c="http://example.com/c"
xmlns="http://example.com/base">
...
</root>
How can I get a list of the namespace definitions (i.e., the xmlns:a="…", etc.)?
Using:
import xml.etree.ElementTree as ET
tree = ET.parse('foo.xml')
root = tree.getroot()
print(root.attrib)
Shows an empty attribute dictionary.

Via @mzjn in the comments, here's how to do it with stock ElementTree (https://stackoverflow.com/a/42372404/407651):
import xml.etree.ElementTree as ET
my_namespaces = dict([
node for (_, node) in ET.iterparse('file.xml', events=['start-ns'])
])
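A slightly expanded, self-contained sketch of the same iterparse approach, assuming Python 3. It reads the question's sample from an in-memory io.StringIO instead of a file; the nested <child> element with its xmlns:d declaration is hypothetical, added only to show that declarations below the root are collected too. Note that the default namespace comes back under the empty-string prefix ''.

```python
import io
import xml.etree.ElementTree as ET

# The question's sample, plus a hypothetical nested declaration (xmlns:d).
SAMPLE = """<root
  xmlns:a="http://example.com/a"
  xmlns:b="http://example.com/b"
  xmlns:c="http://example.com/c"
  xmlns="http://example.com/base">
  <child xmlns:d="http://example.com/d"/>
</root>"""


def list_namespaces(source):
    """Return {prefix: uri} for every xmlns declaration in the document.

    The default namespace is reported under the prefix ''.
    If the same prefix is declared more than once, the last URI wins.
    """
    namespaces = {}
    # Each 'start-ns' event carries a (prefix, uri) tuple.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        namespaces[prefix] = uri
    return namespaces


print(list_namespaces(io.StringIO(SAMPLE)))
```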

You might find it easier to use lxml.
from lxml import etree
xml_data = '<root xmlns:a="http://example.com/a" xmlns:b="http://example.com/b" xmlns:c="http://example.com/c" xmlns="http://example.com/base"></root>'
root_node = etree.fromstring(xml_data)
print(root_node.nsmap)
This outputs
{None: 'http://example.com/base',
'a': 'http://example.com/a',
'b': 'http://example.com/b',
'c': 'http://example.com/c'}

Related

Parsing custom xml file using python

I have an XML file in the following format:
<?xml version='1.0' encoding='utf-8'?>
<execute time="0.59">
<exec name="recursive_a" loops="3" fail="2" skipped="0">
<testcase tname="test_a" name="test.cpp" time="0.50">
<pass>
001,test,pass
</pass>
</testcase>
</exec>
</execute>
How can I parse the "recursive_a" string from this XML using Python? (I am using the minidom XML parser.)
With xml.etree.ElementTree and pandas one solution could be:
import xml.etree.ElementTree as ET
import pandas as pd
tree = ET.parse('Code3r.xml')
root = tree.getroot()
for elem in root:
    if elem.tag == "exec":
        # print(elem.attrib) or with pandas
        df = pd.DataFrame.from_dict(elem.attrib, orient='index')
        print(df.T.to_string(index=False))
Output:
name loops fail skipped
recursive_a 3 2 0
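If pandas isn't needed, a plain lookup may be enough. A minimal sketch, assuming the XML above is available as bytes in a variable named XML (that name and both helper functions are hypothetical): one version uses ElementTree's find() and get(), the other uses minidom, since the question mentions it.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

XML = b"""<?xml version='1.0' encoding='utf-8'?>
<execute time="0.59">
  <exec name="recursive_a" loops="3" fail="2" skipped="0">
    <testcase tname="test_a" name="test.cpp" time="0.50">
      <pass>001,test,pass</pass>
    </testcase>
  </exec>
</execute>"""


def exec_name_etree(data):
    root = ET.fromstring(data)        # root is the <execute> element
    exec_elem = root.find("exec")     # first direct <exec> child
    return exec_elem.get("name")      # read the attribute value


def exec_name_minidom(data):
    doc = minidom.parseString(data)
    # getElementsByTagName searches the whole document, not just direct children.
    return doc.getElementsByTagName("exec")[0].getAttribute("name")


print(exec_name_etree(XML))
print(exec_name_minidom(XML))
```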

Get value for XML attribute in python

I am trying to parse an XML file in Python, and my XML seems to be structured differently from the usual examples.
Below is my XML snippet:
<records>
<record>
<parameter>
<name>Server</name>
<value>Application_server_01</value>
</parameter>
</record>
</records>
I am trying to get the name and value of each "parameter", but I only get empty values.
I checked the online documentation, and almost all of the XML examples are in the format below:
<neighbor name="Switzerland" direction="W"/>
I am able to parse that fine. How can I get the values from my XML without changing its format?
Working code:
import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)
Output:
C:/Users/xxxxxx/PycharmProjects/default/parse.py
{'direction': 'E', 'name': 'Austria'}
{'direction': 'W', 'name': 'Switzerland'}
{'direction': 'N', 'name': 'Malaysia'}
{'direction': 'W', 'name': 'Costa Rica'}
{'direction': 'E', 'name': 'Colombia'}
PS: I will be using the XML to make an API call, and I doubt the downstream application would accept the second format.
Below is my Python code:
import xml.etree.ElementTree as ET
tree = ET.parse('at.xml')
root = tree.getroot()
for name in root.iter('name'):
    print(name.attrib)
Output for the above code:
C:/Users/xxxxxx/PycharmProjects/default/learning.py
{}
{}
{}
{}
{}
{}
{}
{}
Use lxml and XPath:
from lxml import etree as et
tree = et.parse("/tmp/so.xml")
name = tree.xpath("/records/record/parameter/name/text()")[0]
value = tree.xpath("/records/record/parameter/value/text()")[0]
print(name, value)
Output:
Server Application_server_01
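The standard library can do this too. <name> and <value> are child elements rather than attributes, which is why .attrib comes back empty; their content is element text. A minimal sketch, assuming the snippet above with a well-formed closing </parameter> tag; parameters() is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

XML = """<records>
  <record>
    <parameter>
      <name>Server</name>
      <value>Application_server_01</value>
    </parameter>
  </record>
</records>"""


def parameters(data):
    """Return a list of (name, value) pairs, one per <parameter>."""
    root = ET.fromstring(data)
    pairs = []
    for param in root.iter("parameter"):
        # findtext() returns the text of the first matching child element.
        pairs.append((param.findtext("name"), param.findtext("value")))
    return pairs


for name, value in parameters(XML):
    print(name, value)
```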

don't sort attrib for xml tree in python

I have code:
from lxml import etree
root = etree.Element("check", attrib={"p": "1","c": "2", "d": "3","v": "4"})
tree = etree.ElementTree(root)
and I get:
<check c="2" d="3" p="1" v="4"/>
But I need the attributes in the order I gave them, without sorting:
<check p="1" c="2" d="3" v="4"/>
How can I get that?
Many thanks for any help.
You cannot control the order of the attributes, but you can access them:
tree = etree.ElementTree(root)
att = root.attrib
print(att)  # {attr_name: attr_value}
From att you can get just the attribute names:
attr_names = att.keys()  # ['attr_name_1', 'attr_name_2', ...]
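For comparison, the standard library's xml.etree.ElementTree (not lxml) keeps attributes in the order they were given when serializing, starting with Python 3.8; earlier versions sorted them. A minimal sketch of that stdlib behavior; it says nothing about lxml, whose handling may depend on the version:

```python
import xml.etree.ElementTree as ET

# Python 3.7+ dicts keep insertion order, and 3.8+ tostring() preserves it.
root = ET.Element("check", attrib={"p": "1", "c": "2", "d": "3", "v": "4"})
print(ET.tostring(root, encoding="unicode"))
# <check p="1" c="2" d="3" v="4" />
```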

Error in parsing a YQL string xml Python

I'm trying to scrape data from an API like this:
import urllib2
a = urllib2.urlopen('http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.xchange%20where%20pair%20in%20(%22USDEUR%22,%20%22USDJPY%22,%20%22USDBGN%22,%20%22USDCZK%22,%20%22USDDKK%22,%20%22USDGBP%22,%20%22USDHUF%22,%20%22USDLTL%22,%20%22USDLVL%22,%20%22USDPLN%22,%20%22USDRON%22,%20%22USDSEK%22,%20%22USDCHF%22,%20%22USDNOK%22,%20%22USDHRK%22,%20%22USDRUB%22,%20%22USDTRY%22,%20%22USDAUD%22,%20%22USDBRL%22,%20%22USDCAD%22,%20%22USDCNY%22,%20%22USDHKD%22,%20%22USDIDR%22,%20%22USDILS%22,%20%22USDINR%22,%20%22USDKRW%22,%20%22USDMXN%22,%20%22USDMYR%22,%20%22USDNZD%22,%20%22USDPHP%22,%20%22USDSGD%22,%20%22USDTHB%22,%20%22USDZAR%22,%20%22USDISK%22)&env=store://datatables.org/alltableswithkeys')
b = a.read()
Here b holds the XML response as a string:
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="34" yahoo:created="2017-04-21T19:46:11Z" yahoo:lang="en-US"><results><rate id="USDEUR"><Name>USD/EUR</Name><Rate>0.9347</Rate><Date>4/21/2017</Date><Time>7:13pm</Time><Ask>0.9352</Ask><Bid>0.9347</Bid></rate><rate id="USDJPY"><Name>USD/JPY</Name><Rate>109.2200</Rate><Date>4/21/2017</Date><Time>6:58pm</Time><Ask>109.2260</Ask><Bid>109.2200</Bid></rate><rate id="USDBGN"><Name>USD/BGN</Name><Rate>1.8282</Rate><Date>4/21/2017</Date><Time>3:15pm</Time><Ask>N/A</Ask><Bid>1.8282</Bid></rate><rate id="USDCZK"><Name>USD/CZK</Name><Rate>25.1629</Rate><Date>4/21/2017</Date><Time>8:35pm</Time><Ask>25.1702</Ask><Bid>25.1629</Bid></rate><rate id="USDDKK"><Name>USD/DKK</Name><Rate>6.9458</Rate><Date>4/21/2017</Date><Time>6:44pm</Time><Ask>6.9466</Ask><Bid>6.9458</Bid></rate><rate id="USDGBP"><Name>USD/GBP</Name><Rate>0.7812</Rate><Date>4/21/2017</Date><Time>6:29pm</Time><Ask>0.7813</Ask><Bid>0.7812</Bid></rate><rate id="USDHUF"><Name>USD/HUF</Name><Rate>292.4200</Rate><Date>4/21/2017</Date><Time>8:14pm</Time><Ask>292.6200</Ask><Bid>292.4200</Bid></rate><rate id="USDLTL"><Name>USD/LTL</Name><Rate>3.0487</Rate><Date>6/22/2015</Date><Time>9:39am</Time><Ask>3.0491</Ask><Bid>3.0487</Bid></rate><rate id="USDLVL"><Name>USD/LVL</Name><Rate>0.6205</Rate><Date>6/22/2015</Date><Time>9:37am</Time><Ask>0.6206</Ask><Bid>0.6205</Bid></rate><rate id="USDPLN"><Name>USD/PLN</Name><Rate>3.9907</Rate><Date>4/21/2017</Date><Time>6:53pm</Time><Ask>3.9916</Ask><Bid>3.9907</Bid></rate><rate id="USDRON"><Name>USD/RON</Name><Rate>4.2276</Rate><Date>4/21/2017</Date><Time>6:02pm</Time><Ask>4.2411</Ask><Bid>4.2276</Bid></rate><rate id="USDSEK"><Name>USD/SEK</Name><Rate>9.0293</Rate><Date>4/21/2017</Date><Time>8:28pm</Time><Ask>9.0310</Ask><Bid>9.0293</Bid></rate><rate id="USDCHF"><Name>USD/CHF</Name><Rate>0.9977</Rate><Date>4/21/2017</Date><Time>6:33pm</Time><Ask>0.9977</Ask><Bid>0.9977</Bid></rate><rate 
id="USDNOK"><Name>USD/NOK</Name><Rate>8.6823</Rate><Date>4/21/2017</Date><Time>7:00pm</Time><Ask>8.6858</Ask><Bid>8.6823</Bid></rate><rate id="USDHRK"><Name>USD/HRK</Name><Rate>6.9250</Rate><Date>4/21/2017</Date><Time>6:53pm</Time><Ask>6.9981</Ask><Bid>6.9250</Bid></rate><rate id="USDRUB"><Name>USD/RUB</Name><Rate>56.5055</Rate><Date>4/21/2017</Date><Time>6:33pm</Time><Ask>56.5405</Ask><Bid>56.5055</Bid></rate><rate id="USDTRY"><Name>USD/TRY</Name><Rate>3.6473</Rate><Date>4/21/2017</Date><Time>6:02pm</Time><Ask>3.6478</Ask><Bid>3.6473</Bid></rate><rate id="USDAUD"><Name>USD/AUD</Name><Rate>1.3263</Rate><Date>4/21/2017</Date><Time>8:35pm</Time><Ask>1.3267</Ask><Bid>1.3263</Bid></rate><rate id="USDBRL"><Name>USD/BRL</Name><Rate>3.1473</Rate><Date>4/21/2017</Date><Time>7:02pm</Time><Ask>3.1493</Ask><Bid>3.1473</Bid></rate><rate id="USDCAD"><Name>USD/CAD</Name><Rate>1.3513</Rate><Date>4/21/2017</Date><Time>6:49pm</Time><Ask>1.3513</Ask><Bid>1.3513</Bid></rate><rate id="USDCNY"><Name>USD/CNY</Name><Rate>6.8844</Rate><Date>4/21/2017</Date><Time>6:38pm</Time><Ask>6.8854</Ask><Bid>6.8844</Bid></rate><rate id="USDHKD"><Name>USD/HKD</Name><Rate>7.7746</Rate><Date>4/21/2017</Date><Time>6:01pm</Time><Ask>7.7754</Ask><Bid>7.7746</Bid></rate><rate id="USDIDR"><Name>USD/IDR</Name><Rate>13316.0000</Rate><Date>4/21/2017</Date><Time>6:38pm</Time><Ask>13326.0000</Ask><Bid>13316.0000</Bid></rate><rate id="USDILS"><Name>USD/ILS</Name><Rate>3.6723</Rate><Date>4/21/2017</Date><Time>6:52pm</Time><Ask>3.6823</Ask><Bid>3.6723</Bid></rate><rate id="USDINR"><Name>USD/INR</Name><Rate>64.6490</Rate><Date>4/21/2017</Date><Time>6:26pm</Time><Ask>64.6990</Ask><Bid>64.6490</Bid></rate><rate id="USDKRW"><Name>USD/KRW</Name><Rate>1133.3700</Rate><Date>4/21/2017</Date><Time>6:50pm</Time><Ask>1134.3700</Ask><Bid>1133.3700</Bid></rate><rate id="USDMXN"><Name>USD/MXN</Name><Rate>18.8424</Rate><Date>4/21/2017</Date><Time>6:16pm</Time><Ask>18.8443</Ask><Bid>18.8424</Bid></rate><rate 
id="USDMYR"><Name>USD/MYR</Name><Rate>4.3980</Rate><Date>4/21/2017</Date><Time>6:38pm</Time><Ask>4.4030</Ask><Bid>4.3980</Bid></rate><rate id="USDNZD"><Name>USD/NZD</Name><Rate>1.4226</Rate><Date>4/21/2017</Date><Time>6:50pm</Time><Ask>1.4236</Ask><Bid>1.4226</Bid></rate><rate id="USDPHP"><Name>USD/PHP</Name><Rate>49.8400</Rate><Date>4/21/2017</Date><Time>6:13pm</Time><Ask>49.8900</Ask><Bid>49.8400</Bid></rate><rate id="USDSGD"><Name>USD/SGD</Name><Rate>1.3966</Rate><Date>4/21/2017</Date><Time>8:28pm</Time><Ask>1.3969</Ask><Bid>1.3966</Bid></rate><rate id="USDTHB"><Name>USD/THB</Name><Rate>34.3500</Rate><Date>4/21/2017</Date><Time>6:49pm</Time><Ask>34.4000</Ask><Bid>34.3500</Bid></rate><rate id="USDZAR"><Name>USD/ZAR</Name><Rate>13.1525</Rate><Date>4/21/2017</Date><Time>6:50pm</Time><Ask>13.1620</Ask><Bid>13.1525</Bid></rate><rate id="USDISK"><Name>USD/ISK</Name><Rate>109.4900</Rate><Date>4/21/2017</Date><Time>5:32pm</Time><Ask>109.9900</Ask><Bid>109.4900</Bid></rate></results></query><!-- total: 1083 -->
<!-- prod_bf1_1;paas.yql;queryyahooapiscomproductionbf1;885cf297-259f-11e7-b972-d4ae52974741 -->
However, when I use the xml.etree module to parse this string into an XML object, I get errors saying the object is not indexable and not iterable. What exactly does this code return?
import xml.etree.ElementTree as ET
d = ET.ElementTree(ET.fromstring(b))
EDIT: The errors are coming up when I'm trying to iterate through the children of d like so:
for child in d:
    print(child.tag)
The error here is "TypeError: 'ElementTree' object is not iterable"
How can I access the children in this string xml to get specific values from it?
You are overdoing things: ET.fromstring() already returns an Element, so there is no need to wrap it in an ElementTree:
import xml.etree.ElementTree as ET
b = '''<?xml version="1.0" encoding="UTF-8"?>...'''
element = ET.fromstring(b) # that does it!
print(element.attrib)
Now you can use element as you would any instance of xml.etree.ElementTree.Element.
For example, to iterate over the element and all of its descendants:
for child in element.iter():
    print(child, child.tag, child.text, child.attrib)
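A sketch of pulling specific values out of the response, assuming a trimmed copy of the XML above with only two <rate> entries (rates() is a hypothetical helper). It also shows that an existing ElementTree can be made iterable by calling getroot():

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the YQL response: two <rate> entries only.
b = """<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="34"><results><rate id="USDEUR"><Name>USD/EUR</Name><Rate>0.9347</Rate><Ask>0.9352</Ask><Bid>0.9347</Bid></rate><rate id="USDJPY"><Name>USD/JPY</Name><Rate>109.2200</Rate><Ask>109.2260</Ask><Bid>109.2200</Bid></rate></results></query>"""


def rates(xml_text):
    """Map each rate id to its <Rate> value as a float."""
    root = ET.fromstring(xml_text)  # the <query> Element
    result = {}
    # 'results/rate' is a path relative to <query>.
    for rate in root.findall("results/rate"):
        result[rate.get("id")] = float(rate.findtext("Rate"))
    return result


# If you already have an ElementTree, getroot() returns the iterable Element.
tree = ET.ElementTree(ET.fromstring(b))
for child in tree.getroot():
    print(child.tag)

print(rates(b))
```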

How do I set attributes for an XML element with Python?

I am using ElementTree to build an XML file.
When I try to set an element's attribute with ET.SubElement().__setattr__(), I get the error AttributeError: __setattr__.
import xml.etree.cElementTree as ET
summary = open('Summary.xml', 'w')
root = ET.Element('Summary')
ET.SubElement(root, 'TextSummary')
ET.SubElement(root,'TextSummary').__setattr__('Status','Completed') # Error occurs here
tree = ET.ElementTree(root)
tree.write(summary)
summary.close()
After code execution, my XML should resemble the following:
<Summary>
<TextSummary Status = 'Completed'/>
</Summary>
How do I add attributes to an XML element with Python using xml.etree.cElementTree?
You should be doing:
ET.SubElement(root,'TextSummary').set('Status','Completed')
The Etree documentation shows usage.
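A complete version of the question's script along these lines, as a sketch. It creates the TextSummary element only once (calling SubElement twice, as in the question, produces two elements) and uses xml.etree.ElementTree, because xml.etree.cElementTree was removed in Python 3.9. build_summary() and write_summary() are hypothetical names.

```python
import xml.etree.ElementTree as ET


def build_summary():
    root = ET.Element("Summary")
    # Keep a reference to the new element, then set its attribute.
    text_summary = ET.SubElement(root, "TextSummary")
    text_summary.set("Status", "Completed")
    return ET.ElementTree(root)


def write_summary(path):
    # Given a path, ElementTree.write() opens and closes the file itself.
    build_summary().write(path, encoding="utf-8", xml_declaration=True)


print(ET.tostring(build_summary().getroot(), encoding="unicode"))
```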
You can specify attributes for an Element or SubElement during creation with keyword arguments.
import xml.etree.ElementTree as ET
root = ET.Element('Summary')
ET.SubElement(root, 'TextSummary', Status='Completed')
XML:
<Summary>
<TextSummary Status="Completed"/>
</Summary>
Alternatively, you can use .set to add attributes to an existing element.
import xml.etree.ElementTree as ET
root = ET.Element('Summary')
sub = ET.SubElement(root, 'TextSummary')
sub.set('Status', 'Completed')
XML:
<Summary>
<TextSummary Status="Completed"/>
</Summary>
Technical Explanation:
The constructors for Element and SubElement include **extra, which accepts attributes as keyword arguments.
xml.etree.ElementTree.Element(tag, attrib={}, **extra)
xml.etree.ElementTree.SubElement(parent, tag, attrib={}, **extra)
This allows you to add an arbitrary number of attributes.
root = ET.Element('Summary', Date='2018/07/02', Timestamp='11:44am')
# <Summary Date="2018/07/02" Timestamp="11:44am" />
You can also use .set() to add attributes to a pre-existing element. However, it adds only one attribute at a time (as suggested by Thomas Orozco).
root = ET.Element('Summary')
root.set('Date', '2018/07/02')
root.set('Timestamp', '11:44am')
# <Summary Date="2018/07/02" Timestamp="11:44am" />
Full Example:
import xml.etree.ElementTree as ET
root = ET.Element('school', name='Willow Creek High')
ET.SubElement(root, 'student', name='Jane Doe', grade='9')
print(ET.tostring(root).decode())
# Python 3.8+: <school name="Willow Creek High"><student name="Jane Doe" grade="9" /></school>
# (Python 3.7 and earlier sort attributes: <student grade="9" name="Jane Doe" />)
One way to set multiple attributes in a single call is to pass an attrib dictionary.
I wrote this code for SVG XML creation:
from xml.etree import ElementTree as ET
svg = ET.Element('svg', attrib={'height':'210','width':'500'})
g = ET.SubElement(svg,'g', attrib={'x':'10', 'y':'12','id':'groupName'})
line = ET.SubElement(g, 'line', attrib={'x1':'0','y1':'0','x2':'200','y2':'200','stroke':'red'})
print(ET.tostring(svg, encoding="us-ascii", method="xml"))
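Keyword arguments only work for attribute names that are valid Python identifiers. A sketch (Python 3; the values are illustrative) of mixing the two forms: a name such as stroke-width has to go in the attrib dictionary, and any keyword arguments are merged into it.

```python
import xml.etree.ElementTree as ET

svg = ET.Element("svg", width="500", height="210")
# 'stroke-width' can't be a keyword argument, so pass it in the attrib dict;
# the remaining keyword arguments are merged into the same attribute dict.
line = ET.SubElement(svg, "line", {"stroke-width": "2"},
                     x1="0", y1="0", x2="200", y2="200", stroke="red")
print(ET.tostring(svg, encoding="unicode"))
```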
