I am using ElementTree to build an XML file.
When I try to set an element's attribute with ET.SubElement().__setattr__(), I get the error AttributeError: __setattr__.
import xml.etree.cElementTree as ET
summary = open('Summary.xml', 'w')
root = ET.Element('Summary')
ET.SubElement(root, 'TextSummary')
ET.SubElement(root,'TextSummary').__setattr__('Status','Completed') # Error occurs here
tree = ET.ElementTree(root)
tree.write(summary)
summary.close()
After code execution, my XML should resemble the following:
<Summary>
<TextSummary Status = 'Completed'/>
</Summary>
How do I add attributes to an XML element with Python using xml.etree.cElementTree?
You should be doing:
ET.SubElement(root,'TextSummary').set('Status','Completed')
The Etree documentation shows usage.
You can specify attributes for an Element or SubElement during creation with keyword arguments.
import xml.etree.ElementTree as ET
root = ET.Element('Summary')
ET.SubElement(root, 'TextSummary', Status='Completed')
XML:
<Summary>
<TextSummary Status="Completed"/>
</Summary>
Alternatively, you can use .set to add attributes to an existing element.
import xml.etree.ElementTree as ET
root = ET.Element('Summary')
sub = ET.SubElement(root, 'TextSummary')
sub.set('Status', 'Completed')
XML:
<Summary>
<TextSummary Status="Completed"/>
</Summary>
Technical Explanation:
The constructors for Element and SubElement include **extra, which accepts attributes as keyword arguments.
xml.etree.ElementTree.Element(tag, attrib={}, **extra)
xml.etree.ElementTree.SubElement(parent, tag, attrib={}, **extra)
This allows you to add an arbitrary number of attributes.
root = ET.Element('Summary', Date='2018/07/02', Timestamp='11:44am')
# <Summary Date = "2018/07/02" Timestamp = "11:44am">
You can also use .set to add attributes to a pre-existing element. However, this can only add one attribute at a time. (As suggested by Thomas Orozco).
root = ET.Element('Summary')
root.set('Date', '2018/07/02')
root.set('Timestamp', '11:44am')
# <Summary Date = "2018/07/02" Timestamp = "11:44am">
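If several attributes need to be added to an existing element in one call, a minimal sketch, assuming it is acceptable to work on the element's attrib mapping directly (it behaves like a regular dict), could look like this:

```python
import xml.etree.ElementTree as ET

root = ET.Element('Summary')

# Element.attrib is a dict-like mapping of attribute names to values,
# so update() adds several attributes in a single call.
root.attrib.update({'Date': '2018/07/02', 'Timestamp': '11:44am'})

print(ET.tostring(root).decode())
```

For one-off changes, .set remains the more conventional choice; the update() form is mainly convenient when the attributes already live in a dict.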
Full Example:
import xml.etree.ElementTree as ET
root = ET.Element('school', name='Willow Creek High')
ET.SubElement(root, 'student', name='Jane Doe', grade='9')
print(ET.tostring(root).decode())
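# <school name="Willow Creek High"><student grade="9" name="Jane Doe" /></school>
# (Attribute order as printed by Python versions before 3.8, which sort attributes; 3.8+ keeps insertion order.)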
A convenient way to set multiple attributes in a single line is to pass them as a dict through the attrib argument.
I wrote this code to create SVG XML:
from xml.etree import ElementTree as ET
svg = ET.Element('svg', attrib={'height':'210','width':'500'})
g = ET.SubElement(svg,'g', attrib={'x':'10', 'y':'12','id':'groupName'})
line = ET.SubElement(g, 'line', attrib={'x1':'0','y1':'0','x2':'200','y2':'200','stroke':'red'})
print(ET.tostring(svg, encoding="us-ascii", method="xml"))
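The attrib dict and keyword arguments can also be combined, which helps when some attribute names (such as stroke-width) are not valid Python identifiers. The following is a small sketch of that idea; the stroke-width attribute is added here purely for illustration and is not part of the code above:

```python
from xml.etree import ElementTree as ET

svg = ET.Element('svg', height='210', width='500')

line = ET.SubElement(
    svg, 'line',
    {'stroke-width': '2'},  # hyphenated name: must go in the attrib dict
    x1='0', y1='0', x2='200', y2='200', stroke='red',  # plain names work as keywords
)

print(ET.tostring(svg, encoding='unicode'))
```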
Related
Let's assume that we have an XML file:
<School Name = "school1">
<Class Name = "class A">
<Student Name = "student"/>
<Student/>
<!-- -->
</Class>
</School>
I have a Python script that parses this file, and I want to print the line number of a tag.
For example, I want to print the line numbers of tags that have no "Name" attribute.
Is that possible?
I saw an example that subclasses ElementTree, but I couldn't understand it.
import xml.etree.ElementTree as ET
def read_root(root):
    for x in root:
        print(x.lineNum)
        read_root(x)
def main():
    fn = "a.xml"
    try:
        tree = ET.parse(fn)
    except ET.ParseError as e:
        print("\nParse error:", str(e))
        print("while reading: " + fn)
        exit(1)
    root = tree.getroot()
    read_root(root)
The question is somewhat unclear, but if you just want to find the tags that lack a Name attribute and print their line numbers, you can use etree from lxml as shown below:
from lxml import etree
doc = etree.parse('test.xml')
for element in doc.iter():
    # Check if the tag has a "Name" attribute
    if "Name" not in element.attrib:
        print(f"Line {element.sourceline}: {element.tag}")
output:
Line 4: Student
Line 5: <cyfunction Comment at 0x13b8e6dc0>
You need a parser such as ET.XMLPullParser, which can report "comment" and "pi" (processing instruction) events in addition to "start", "end", "start-ns" and "end-ns" events.
If your XML file 'comment.xml' looks like:
<?xml version="1.0" encoding="UTF-8"?>
<School Name = "school1">
<Class Name = "class A">
<Student Name = "student"/>
<Student/>
<!-- Comment xml -->
</Class>
</School>
You can parse to find TAG's without the attribute "Name" and comments:
import xml.etree.ElementTree as ET
#parser = ET.XMLPullParser(['start', 'end', "comment", "pi", "start-ns", "end-ns"])
parser = ET.XMLPullParser([ 'start', 'end', 'comment'])
with open('comment.xml', 'r', encoding='utf-8') as xml:
    feedstring = xml.readlines()
for line in enumerate(feedstring):
    parser.feed(line[1])
    for event, elem in parser.read_events():
        if elem.get("Name"):
            pass
        else:
            print(f"{line[0]} Event:{event} | {elem.tag}, {elem.text}")
Output:
4 Event:start | Student, None
4 Event:end | Student, None
5 Event:comment | <function Comment at 0x00000216C4FDA200>, Comment xml
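Note that the line numbers above are zero-based indexes into the list of lines. As an alternative sketch using only the standard library, the lower-level xml.parsers.expat module exposes a 1-based CurrentLineNumber while its handlers run. The helper name tags_missing_attribute and the inline sample are assumptions made for this illustration:

```python
import xml.parsers.expat

def tags_missing_attribute(xml_text, attr_name):
    """Return (line_number, tag) pairs for start tags lacking attr_name."""
    found = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(tag, attrs):
        # attrs is a dict of the attributes on this start tag;
        # CurrentLineNumber is the 1-based line of the current event.
        if attr_name not in attrs:
            found.append((parser.CurrentLineNumber, tag))

    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return found

sample = """<School Name="school1">
  <Class Name="class A">
    <Student Name="student"/>
    <Student/>
    <!-- -->
  </Class>
</School>"""

for line_number, tag in tags_missing_attribute(sample, "Name"):
    print(f"Line {line_number}: {tag}")
```

Because only a start-element handler is registered, comments are skipped rather than reported.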
I often use len(find_all("some_element")) to count the number of entities in an XML file. I tried to build a function for this, but it doesn't work: it always gives me "None".
The XML file:
<parent>
<some>
<child>text</child>
<child>text</child>
<child>text</child>
</some>
</parent>
my python code:
def return_len(para1,para2): # doesn't work
    if bool(suppe.para1): # the element isn't always present in the xml
        return len(suppe.para1.find_all(para2))
def return_len1(): # does work
    if bool(suppe.some):
        return len(suppe.some.find_all("child"))
print(return_len("some","child")) # doesnt work
print(return_len1()) # does work
How must I modify my function return_len to get it working, and what did I do wrong?
You can do it like this:
from bs4 import BeautifulSoup
s = """<parent>
<some>
<child>text</child>
<child>text</child>
<child>text</child>
</some>
</parent>
"""
soup = BeautifulSoup(s, 'xml')
def return_len(para1,para2,soup):
    print(f'No. of <{para2}> tags inside <{para1}> tag.')
    temp = soup.find(para1)
    if temp:
        return len(temp.find_all(para2))
print(return_len('some', 'child', soup))
print(return_len('parent', 'some', soup))
No. of <child> tags inside <some> tag.
3
No. of <some> tags inside <parent> tag.
1
Without any external library, see below:
import xml.etree.ElementTree as ET
xml = '''<parent>
<some>
<child>text</child>
<child>text</child>
<child>text</child>
</some>
</parent>'''
root = ET.fromstring(xml)
print(f'Number of child elements is {len(root.findall(".//child"))}')
output
Number of child elements is 3
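The same idea can be wrapped in a parameterized function like the one the question asks for. This is only a sketch: the parameter names are illustrative, and returning 0 for a missing parent is an assumption about the desired behaviour (the question's version returned None).

```python
import xml.etree.ElementTree as ET

xml = '''<parent>
<some>
<child>text</child>
<child>text</child>
<child>text</child>
</some>
</parent>'''

def return_len(root, parent_tag, child_tag):
    """Count direct child_tag children of the first parent_tag descendant."""
    parent = root.find(f".//{parent_tag}")
    # Compare with None explicitly: an Element without children is falsy.
    if parent is None:
        return 0
    return len(parent.findall(child_tag))

root = ET.fromstring(xml)
print(return_len(root, "some", "child"))
print(return_len(root, "missing", "child"))
```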
I have code:
from lxml import etree
root = etree.Element("check", attrib={"p": "1","c": "2", "d": "3","v": "4"})
tree = etree.ElementTree(root)
and I get:
<check c="2" d="3" p="1" v="4"/>
But I need the attributes without sorting:
<check p="1" c="2" d="3" v="4"/>
How can I get this?
Many thanks for any help.
You cannot control the attribute order here, but you can access the attributes through the element (the ElementTree wrapper itself has no attrib):
tree = etree.ElementTree(root)
att = root.attrib
print(att)  # {attr_name: attr_value}
From att you can get just the attribute names:
attr_name = att.keys()  # ['attr_name_1', 'attr_name_2', ...]
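Whether the order can be preserved depends on the library and the Python version. As a hedged aside that swaps in the standard library's xml.etree.ElementTree rather than lxml: from Python 3.8 onward it serializes attributes in the order they were set, so a sketch like the following should keep the p, c, d, v order on those versions.

```python
import xml.etree.ElementTree as ET

root = ET.Element("check")
# Set attributes one by one in the desired order.
for name, value in [("p", "1"), ("c", "2"), ("d", "3"), ("v", "4")]:
    root.set(name, value)

# Python 3.8+ keeps insertion order; older versions sort alphabetically.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```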
I'm trying to scrape data from an API like this:
import urllib2
a = urllib2.urlopen('http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.xchange%20where%20pair%20in%20(%22USDEUR%22,%20%22USDJPY%22,%20%22USDBGN%22,%20%22USDCZK%22,%20%22USDDKK%22,%20%22USDGBP%22,%20%22USDHUF%22,%20%22USDLTL%22,%20%22USDLVL%22,%20%22USDPLN%22,%20%22USDRON%22,%20%22USDSEK%22,%20%22USDCHF%22,%20%22USDNOK%22,%20%22USDHRK%22,%20%22USDRUB%22,%20%22USDTRY%22,%20%22USDAUD%22,%20%22USDBRL%22,%20%22USDCAD%22,%20%22USDCNY%22,%20%22USDHKD%22,%20%22USDIDR%22,%20%22USDILS%22,%20%22USDINR%22,%20%22USDKRW%22,%20%22USDMXN%22,%20%22USDMYR%22,%20%22USDNZD%22,%20%22USDPHP%22,%20%22USDSGD%22,%20%22USDTHB%22,%20%22USDZAR%22,%20%22USDISK%22)&env=store://datatables.org/alltableswithkeys')
b = a.read()
b is a string object of the xml:
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="34" yahoo:created="2017-04-21T19:46:11Z" yahoo:lang="en-US"><results><rate id="USDEUR"><Name>USD/EUR</Name><Rate>0.9347</Rate><Date>4/21/2017</Date><Time>7:13pm</Time><Ask>0.9352</Ask><Bid>0.9347</Bid></rate><rate id="USDJPY"><Name>USD/JPY</Name><Rate>109.2200</Rate><Date>4/21/2017</Date><Time>6:58pm</Time><Ask>109.2260</Ask><Bid>109.2200</Bid></rate><rate id="USDBGN"><Name>USD/BGN</Name><Rate>1.8282</Rate><Date>4/21/2017</Date><Time>3:15pm</Time><Ask>N/A</Ask><Bid>1.8282</Bid></rate><rate id="USDCZK"><Name>USD/CZK</Name><Rate>25.1629</Rate><Date>4/21/2017</Date><Time>8:35pm</Time><Ask>25.1702</Ask><Bid>25.1629</Bid></rate><rate id="USDDKK"><Name>USD/DKK</Name><Rate>6.9458</Rate><Date>4/21/2017</Date><Time>6:44pm</Time><Ask>6.9466</Ask><Bid>6.9458</Bid></rate><rate id="USDGBP"><Name>USD/GBP</Name><Rate>0.7812</Rate><Date>4/21/2017</Date><Time>6:29pm</Time><Ask>0.7813</Ask><Bid>0.7812</Bid></rate><rate id="USDHUF"><Name>USD/HUF</Name><Rate>292.4200</Rate><Date>4/21/2017</Date><Time>8:14pm</Time><Ask>292.6200</Ask><Bid>292.4200</Bid></rate><rate id="USDLTL"><Name>USD/LTL</Name><Rate>3.0487</Rate><Date>6/22/2015</Date><Time>9:39am</Time><Ask>3.0491</Ask><Bid>3.0487</Bid></rate><rate id="USDLVL"><Name>USD/LVL</Name><Rate>0.6205</Rate><Date>6/22/2015</Date><Time>9:37am</Time><Ask>0.6206</Ask><Bid>0.6205</Bid></rate><rate id="USDPLN"><Name>USD/PLN</Name><Rate>3.9907</Rate><Date>4/21/2017</Date><Time>6:53pm</Time><Ask>3.9916</Ask><Bid>3.9907</Bid></rate><rate id="USDRON"><Name>USD/RON</Name><Rate>4.2276</Rate><Date>4/21/2017</Date><Time>6:02pm</Time><Ask>4.2411</Ask><Bid>4.2276</Bid></rate><rate id="USDSEK"><Name>USD/SEK</Name><Rate>9.0293</Rate><Date>4/21/2017</Date><Time>8:28pm</Time><Ask>9.0310</Ask><Bid>9.0293</Bid></rate><rate id="USDCHF"><Name>USD/CHF</Name><Rate>0.9977</Rate><Date>4/21/2017</Date><Time>6:33pm</Time><Ask>0.9977</Ask><Bid>0.9977</Bid></rate><rate 
id="USDNOK"><Name>USD/NOK</Name><Rate>8.6823</Rate><Date>4/21/2017</Date><Time>7:00pm</Time><Ask>8.6858</Ask><Bid>8.6823</Bid></rate><rate id="USDHRK"><Name>USD/HRK</Name><Rate>6.9250</Rate><Date>4/21/2017</Date><Time>6:53pm</Time><Ask>6.9981</Ask><Bid>6.9250</Bid></rate><rate id="USDRUB"><Name>USD/RUB</Name><Rate>56.5055</Rate><Date>4/21/2017</Date><Time>6:33pm</Time><Ask>56.5405</Ask><Bid>56.5055</Bid></rate><rate id="USDTRY"><Name>USD/TRY</Name><Rate>3.6473</Rate><Date>4/21/2017</Date><Time>6:02pm</Time><Ask>3.6478</Ask><Bid>3.6473</Bid></rate><rate id="USDAUD"><Name>USD/AUD</Name><Rate>1.3263</Rate><Date>4/21/2017</Date><Time>8:35pm</Time><Ask>1.3267</Ask><Bid>1.3263</Bid></rate><rate id="USDBRL"><Name>USD/BRL</Name><Rate>3.1473</Rate><Date>4/21/2017</Date><Time>7:02pm</Time><Ask>3.1493</Ask><Bid>3.1473</Bid></rate><rate id="USDCAD"><Name>USD/CAD</Name><Rate>1.3513</Rate><Date>4/21/2017</Date><Time>6:49pm</Time><Ask>1.3513</Ask><Bid>1.3513</Bid></rate><rate id="USDCNY"><Name>USD/CNY</Name><Rate>6.8844</Rate><Date>4/21/2017</Date><Time>6:38pm</Time><Ask>6.8854</Ask><Bid>6.8844</Bid></rate><rate id="USDHKD"><Name>USD/HKD</Name><Rate>7.7746</Rate><Date>4/21/2017</Date><Time>6:01pm</Time><Ask>7.7754</Ask><Bid>7.7746</Bid></rate><rate id="USDIDR"><Name>USD/IDR</Name><Rate>13316.0000</Rate><Date>4/21/2017</Date><Time>6:38pm</Time><Ask>13326.0000</Ask><Bid>13316.0000</Bid></rate><rate id="USDILS"><Name>USD/ILS</Name><Rate>3.6723</Rate><Date>4/21/2017</Date><Time>6:52pm</Time><Ask>3.6823</Ask><Bid>3.6723</Bid></rate><rate id="USDINR"><Name>USD/INR</Name><Rate>64.6490</Rate><Date>4/21/2017</Date><Time>6:26pm</Time><Ask>64.6990</Ask><Bid>64.6490</Bid></rate><rate id="USDKRW"><Name>USD/KRW</Name><Rate>1133.3700</Rate><Date>4/21/2017</Date><Time>6:50pm</Time><Ask>1134.3700</Ask><Bid>1133.3700</Bid></rate><rate id="USDMXN"><Name>USD/MXN</Name><Rate>18.8424</Rate><Date>4/21/2017</Date><Time>6:16pm</Time><Ask>18.8443</Ask><Bid>18.8424</Bid></rate><rate 
id="USDMYR"><Name>USD/MYR</Name><Rate>4.3980</Rate><Date>4/21/2017</Date><Time>6:38pm</Time><Ask>4.4030</Ask><Bid>4.3980</Bid></rate><rate id="USDNZD"><Name>USD/NZD</Name><Rate>1.4226</Rate><Date>4/21/2017</Date><Time>6:50pm</Time><Ask>1.4236</Ask><Bid>1.4226</Bid></rate><rate id="USDPHP"><Name>USD/PHP</Name><Rate>49.8400</Rate><Date>4/21/2017</Date><Time>6:13pm</Time><Ask>49.8900</Ask><Bid>49.8400</Bid></rate><rate id="USDSGD"><Name>USD/SGD</Name><Rate>1.3966</Rate><Date>4/21/2017</Date><Time>8:28pm</Time><Ask>1.3969</Ask><Bid>1.3966</Bid></rate><rate id="USDTHB"><Name>USD/THB</Name><Rate>34.3500</Rate><Date>4/21/2017</Date><Time>6:49pm</Time><Ask>34.4000</Ask><Bid>34.3500</Bid></rate><rate id="USDZAR"><Name>USD/ZAR</Name><Rate>13.1525</Rate><Date>4/21/2017</Date><Time>6:50pm</Time><Ask>13.1620</Ask><Bid>13.1525</Bid></rate><rate id="USDISK"><Name>USD/ISK</Name><Rate>109.4900</Rate><Date>4/21/2017</Date><Time>5:32pm</Time><Ask>109.9900</Ask><Bid>109.4900</Bid></rate></results></query><!-- total: 1083 -->
<!-- prod_bf1_1;paas.yql;queryyahooapiscomproductionbf1;885cf297-259f-11e7-b972-d4ae52974741 -->
However, when I use the xml.etree module to parse this string as an XML object, I get errors saying that the object is not indexable and not iterable. What exactly is the output of this code?
import xml.etree.ElementTree as ET
d = ET.ElementTree(ET.fromstring(b))
EDIT: The errors are coming up when I'm trying to iterate through the children of d like so:
for child in d:
    print child.tag
The error here is "TypeError: 'ElementTree' object is not iterable"
How can I access the children in this string xml to get specific values from it?
You are overdoing things when you wrap the result in an ElementTree; ET.fromstring already returns the root element:
import xml.etree.ElementTree as ET
b = '''<?xml version="1.0" encoding="UTF-8"?>...'''
element = ET.fromstring(b) # that does it!
print(element.attrib)
Now you can access element as you would any instance of xml.etree.ElementTree.Element.
For example, to iterate over the element and all of its descendants:
for child in element.iter():
    print(child, child.tag, child.text, child.attrib)
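To pull specific values out of the response, one option is to iterate over the rate elements and read their children. The sketch below uses a trimmed two-entry excerpt of the XML from the question as a stand-in for b; the dictionary layout is just one possible choice.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the response shown in the question (stand-in for b).
sample = (
    '<query><results>'
    '<rate id="USDEUR"><Name>USD/EUR</Name><Rate>0.9347</Rate></rate>'
    '<rate id="USDJPY"><Name>USD/JPY</Name><Rate>109.2200</Rate></rate>'
    '</results></query>'
)

root = ET.fromstring(sample)

# Map each pair id to its rate; iter("rate") visits every <rate> descendant.
rates = {
    rate.get("id"): float(rate.findtext("Rate"))
    for rate in root.iter("rate")
}
print(rates)
```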
If I've got an XML file like this:
<root
xmlns:a="http://example.com/a"
xmlns:b="http://example.com/b"
xmlns:c="http://example.com/c"
xmlns="http://example.com/base">
...
</root>
How can I get a list of the namespace definitions (ie, the xmlns:a="…", etc)?
Using:
import xml.etree.ElementTree as ET
tree = ET.parse('foo.xml')
root = tree.getroot()
print root.attrib
Shows an empty attribute dictionary.
Via @mzjn, in the comments, here's how to do it with stock ElementTree (https://stackoverflow.com/a/42372404/407651):
import xml.etree.ElementTree as ET
my_namespaces = dict([
    node for (_, node) in ET.iterparse('file.xml', events=['start-ns'])
])
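The same approach works on XML held in memory by wrapping it in a file-like object. This is a small sketch, assuming the document is available as bytes; note that ElementTree reports the default namespace under the empty-string prefix.

```python
import io
import xml.etree.ElementTree as ET

xml_data = (
    b'<root xmlns:a="http://example.com/a" xmlns:b="http://example.com/b" '
    b'xmlns:c="http://example.com/c" xmlns="http://example.com/base"></root>'
)

# Each "start-ns" event yields a (prefix, uri) tuple.
namespaces = dict(
    ns for _, ns in ET.iterparse(io.BytesIO(xml_data), events=['start-ns'])
)
print(namespaces)
```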
You might find it easier to use lxml.
from lxml import etree
xml_data = '<root xmlns:a="http://example.com/a" xmlns:b="http://example.com/b" xmlns:c="http://example.com/c" xmlns="http://example.com/base"></root>'
root_node = etree.fromstring(xml_data)
print root_node.nsmap
This outputs
{None: 'http://example.com/base',
'a': 'http://example.com/a',
'b': 'http://example.com/b',
'c': 'http://example.com/c'}