I would like to add attribute to a lxml Element like this
<outer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header>
<field1 name="blah">some value1</field1>
<field2 name="asdfasd">some value2</field2>
</Header>
</outer>
Here is what I have
E = lxml.builder.ElementMaker()
outer = E.outer
header = E.Header
FIELD1 = E.field1
FIELD2 = E.field2
the_doc = outer(
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance",
XML_2_HEADER(
FIELD1('some value1', name='blah'),
FIELD2('some value2', name='asdfasd'),
),
)
seems like this line is causing some problem
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance",
even if I replace it with
'xmlns:xsi'="http://www.w3.org/2001/XMLSchema-instance",
it won't work.
What is a way to add attribute to lxml Element?
That's a namespace definition, not an ordinary XML attribute. You can pass namespace information to ElementMaker() as a dictionary, for example :
from lxml import etree as ET
import lxml.builder
nsdef = {'xsi':'http://www.w3.org/2001/XMLSchema-instance'}
E = lxml.builder.ElementMaker(nsmap=nsdef)
doc = E.outer(
E.Header(
E.field1('some value1', name='blah'),
E.field2('some value2', name='asdfasd'),
),
)
print ET.tostring(doc, pretty_print=True)
output :
<outer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header>
<field1 name="blah">some value1</field1>
<field2 name="asdfasd">some value2</field2>
</Header>
</outer>
Link to the docs: http://lxml.de/api/lxml.builder.ElementMaker-class.html
Related
Let's assume that we have xml file:
<School Name = "school1">
<Class Name = "class A">
<Student Name = "student"/>
<Student/>
<!-- -->
</Class>
</School>
And I have a python script that using parsing. I want to print the line of a tag.
For example I want to print lines of tags that have no "Name" attribute.
Is it possible ?
I saw an example with inheritance ElementTree but couldn't understand it.
import xml.etree.ElementTree as ET
def read_root(root):
for x in root:
print(x.lineNum)
read_root(x)
def main():
fn = "a.xml"
try:
tree = ET.parse(fn)
except ET.ParseError as e:
print("\nParse error:", str(e))
print("while reading: " + fn)
exit(1)
root = tree.getroot()
read_root(root)
Your question is so unclear. Anyways, if you just want to check if the tag has a Name attribute and want to print that line number, you can use etree from lxml as shown below:
from lxml import etree
doc = etree.parse('test.xml')
for element in doc.iter():
# Check if the tag has a "Name" attribute
if "Name" not in element.attrib:
print(f"Line {element.sourceline}: {element.tag}"))
output:
Line 4: Student
Line 5: <cyfunction Comment at 0x13b8e6dc0>
You need a parser like ET.XMLPullParser what can read "comment" and "process instructions", "namespces", "start" and "end" events.
If your XML file 'comment.xml' looks like:
<?xml version="1.0" encoding="UTF-8"?>
<School Name = "school1">
<Class Name = "class A">
<Student Name = "student"/>
<Student/>
<!-- Comment xml -->
</Class>
</School>
You can parse to find TAG's without the attribute "Name" and comments:
import xml.etree.ElementTree as ET
#parser = ET.XMLPullParser(['start', 'end', "comment", "pi", "start-ns", "end-ns"])
parser = ET.XMLPullParser([ 'start', 'end', 'comment'])
with open('comment.xml', 'r', encoding='utf-8') as xml:
feedstring = xml.readlines()
for line in enumerate(feedstring):
parser.feed(line[1])
for event, elem in parser.read_events():
if elem.get("Name"):
pass
else:
print(f"{line[0]} Event:{event} | {elem.tag}, {elem.text}")
Output:
4 Event:start | Student, None
4 Event:end | Student, None
5 Event:comment | <function Comment at 0x00000216C4FDA200>, Comment xml
There is a xml file like below:
<aa>
<bb>BB</bb>
<cc>
<dd>Tom</dd>
</cc>
<cc>
<dd>David</dd>
</cc>
</aa>
I'm trying to modify the value "Tom" and "David", but I can't get any value in <dd>. Then I try to get the value in <bb>, but I got the response "None" from my code.
My code as below:
import xml.etree.ElementTree as ET
tree = ET.parse("abc.xml")
root = tree.getroot()
a = root.find('aa/bb')
print(a)
Does someone could help me to correct my code to get and modify the value of <dd> ? Many thanks.
Your top level object is aa. So root is element aa
To get bb, just do root.find('bb')
>>> root
<Element 'aa' at 0x7fb1df5f0278>
>>> a = root.find('bb')
>>> a
<Element 'bb' at 0x7fb1df5f0228>
So to edit the names, try something like this
for dd in root.findall('cc/dd'):
if dd.text in ["Tom", "David"]:
dd.text = "something else"
Using ElementTree
Demo:
import xml.etree.ElementTree
et = xml.etree.ElementTree.parse(filename)
root = et.getroot()
for cc in root.findall('cc'): #Find all cc tags
print(cc.find("dd").text) #Print current text
cc.find("dd").text = "NewValue" #Update dd tags with new value
et.write(filename) #Write back to xml
If you don't mind using BeautifulSoup, you can modify your XML through it:
data = """<aa>
<bb>BB</bb>
<cc>
<dd>Tom</dd>
</cc>
<cc>
<dd>David</dd>
</cc>
</aa>"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'xml')
for dd in soup.select('cc > dd'): # using CSS selectors
dd.clear()
dd.append('XXX')
print(soup.prettify())
Output:
<?xml version="1.0" encoding="utf-8"?>
<aa>
<bb>
BB
</bb>
<cc>
<dd>
XXX
</dd>
</cc>
<cc>
<dd>
XXX
</dd>
</cc>
</aa>
Hello I am making a requests call to return order data from a online store. My issue is that once I have passed my data to a root variable the method iter is not returning the correct results. e.g. Display multiple tags of the same name rather than one and not showing the data within the tag.
I thought this was due to the XML not being correctly formatted so I formatted it by saving it to a file using pretty_print but that hasn't fixed the error.
How do I fix this? - Thanks in advance
Code:
import requests, xml.etree.ElementTree as ET, lxml.etree as etree
url="http://publicapi.ekmpowershop24.com/v1.1/publicapi.asmx"
headers = {'content-type': 'application/soap+xml'}
body = """<?xml version="1.0" encoding="utf-8"?>
<soap12:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap12="http://www.w3.org/2003/05/soap-envelope">
<soap12:Body>
<GetOrders xmlns="http://publicapi.ekmpowershop.com/">
<GetOrdersRequest>
<APIKey>my_api_key</APIKey>
<FromDate>01/07/2018</FromDate>
<ToDate>04/07/2018</ToDate>
</GetOrdersRequest>
</GetOrders>
</soap12:Body>
</soap12:Envelope>"""
#send request to ekm
r = requests.post(url,data=body,headers=headers)
#save output to file
file = open("C:/Users/Mark/Desktop/test.xml", "w")
file.write(r.text)
file.close()
#take the file and format the xml
x = etree.parse("C:/Users/Mark/Desktop/test.xml")
newString = etree.tostring(x, pretty_print=True)
file = open("C:/Users/Mark/Desktop/test.xml", "w")
file.write(newString.decode('utf-8'))
file.close()
#parse the file to get the roots
tree = ET.parse("C:/Users/Mark/Desktop/test.xml")
root = tree.getroot()
#access elements names in the data
for child in root.iter('*'):
print(child.tag)
#show orders elements attributes
tree = ET.parse("C:/Users/Mark/Desktop/test.xml")
root = tree.getroot()
for order in root.iter('{http://publicapi.ekmpowershop.com/}Order'):
out = {}
for child in order:
if child.tag in ('OrderID'):
out[child.tag] = child.text
print(out)
Elements output:
{http://publicapi.ekmpowershop.com/}Orders
{http://publicapi.ekmpowershop.com/}Order
{http://publicapi.ekmpowershop.com/}OrderID
{http://publicapi.ekmpowershop.com/}OrderNumber
{http://publicapi.ekmpowershop.com/}CustomerID
{http://publicapi.ekmpowershop.com/}CustomerUserID
{http://publicapi.ekmpowershop.com/}Order
{http://publicapi.ekmpowershop.com/}OrderID
{http://publicapi.ekmpowershop.com/}OrderNumber
{http://publicapi.ekmpowershop.com/}CustomerID
{http://publicapi.ekmpowershop.com/}CustomerUserID
Orders Output:
{http://publicapi.ekmpowershop.com/}Order {}
{http://publicapi.ekmpowershop.com/}Order {}
XML Structure after formating:
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetOrdersResponse xmlns="http://publicapi.ekmpowershop.com/">
<GetOrdersResult>
<Status>Success</Status>
<Errors/>
<Date>2018-07-10T13:47:00.1682029+01:00</Date>
<TotalOrders>10</TotalOrders>
<TotalCost>100</TotalCost>
<Orders>
<Order>
<OrderID>100</OrderID>
<OrderNumber>102/040718/67</OrderNumber>
<CustomerID>6910</CustomerID>
<CustomerUserID>204</CustomerUserID>
<FirstName>TestFirst</FirstName>
<LastName>TestLast</LastName>
<CompanyName>Test Company</CompanyName>
<EmailAddress>test#Test.com</EmailAddress>
<OrderStatus>Dispatched</OrderStatus>
<OrderStatusColour>#00CC00</OrderStatusColour>
<TotalCost>85.8</TotalCost>
<OrderDate>10/07/2018 14:30:43</OrderDate>
<OrderDateISO>2018-07-10T14:30:43</OrderDateISO>
<AbandonedOrder>false</AbandonedOrder>
<EkmStatus>SUCCESS</EkmStatus>
</Order>
</Orders>
<Currency>GBP</Currency>
</GetOrdersResult>
</GetOrdersResponse>
</soap:Body>
</soap:Envelope>
You need to consider the namespace when checking for tags.
>>> # Include the namespace part of the tag in the tag values that we check.
>>> tags = ('{http://publicapi.ekmpowershop.com/}OrderID', '{http://publicapi.ekmpowershop.com/}OrderNumber')
>>> for order in root.iter('{http://publicapi.ekmpowershop.com/}Order'):
... out = {}
... for child in order:
... if child.tag in tags:
... out[child.tag] = child.text
... print(out)
...
{'{http://publicapi.ekmpowershop.com/}OrderID': '100', '{http://publicapi.ekmpowershop.com/}OrderNumber': '102/040718/67'}
If you don't want the namespace prefixes in the output, you can strip them by only including that part of the tag after the } character.
>>> for order in root.iter('{http://publicapi.ekmpowershop.com/}Order'):
... out = {}
... for child in order:
... if child.tag in tags:
... out[child.tag[child.tag.index('}')+1:]] = child.text
... print(out)
...
{'OrderID': '100', 'OrderNumber': '102/040718/67'}
Here is a little xml example:
<?xml version="1.0" encoding="UTF-8"?>
<list>
<person id="1">
<name>Smith</name>
<city>New York</city>
</person>
<person id="2">
<name>Pitt</name>
</person>
...
...
</list>
Now I need all Persons with a name and city.
I tried:
#!/usr/bin/python
# coding: utf8
import xml.dom.minidom as dom
tree = dom.parse("test.xml")
for listItems in tree.firstChild.childNodes:
for personItems in listItems.childNodes:
if personItems.nodeName == "name" and personItems.nextSibling == "city":
print personItems.firstChild.data.strip()
But the ouput is empty. Without the "and" condition I become all names. How can I check that the next tag after "name" is "city"?
You can do this in minidom:
import xml.dom.minidom as minidom
def getChild(n,v):
for child in n.childNodes:
if child.localName==v:
yield child
xmldoc = minidom.parse('test.xml')
person = getChild(xmldoc, 'list')
for p in person:
for v in getChild(p,'person'):
attr = v.getAttributeNode('id')
if attr:
print attr.nodeValue.strip()
This prints id of person nodes:
1
2
use element tree check this element tree
import xml.etree.ElementTree as ET
tree = ET.parse('a.xml')
root = tree.getroot()
for person in root.findall('person'):
name = person.find('name').text
try:
city = person.find('city').text
except:
continue
print name, city
for id u can get it by id= person.get('id')
output:Smith New York
Using lxml, you can use xpath to get in one step what you need:
from lxml import etree
xmlstr = """
<list>
<person id="1">
<name>Smith</name>
<city>New York</city>
</person>
<person id="2">
<name>Pitt</name>
</person>
</list>
"""
xml = etree.fromstring(xmlstr)
xp = "//person[city]"
for person in xml.xpath(xp):
print etree.tostring(person)
lxml is external python package, but is so useful, that to me it is always worth to install.
xpath is searching for any (//) element person having (declared by content of []) subelement city.
I'm using Python 3.2's xml.etree.ElementTree, and am attempting to generate XML like this:
<XnaContent xmlns:data="Model.Data">
<Asset Type="data:MyData">
...
The format is out of my control (it's XNA). Notice that the data XML namespace is never actually used to qualify elements or attributes, but rather to qualify attribute values to XNA. My code looks like this:
root = Element('XnaContent')
ET.register_namespace('data', 'Model.Data')
asset = SubElement(root, 'Asset', {"Type": "data:MyData"})
However, the output looks like (pretty-printed by me):
<XnaContent>
<Asset Type="data:MyData">
...
</Asset>
</XnaContent>
How can I get the data XML namespace included in the output?
>>>print ET.tostring(doc, pretty_print=True)
<XnaContent>
<Asset Type="data:MyData"/>
<Asset Type="data:MyData"/>
</XnaContent>
>>> tree=ET.ElementTree(doc)
>>> root=tree.getroot()
>>> nsmap=root.nsmap
>>> nsmap['data']="ModelData"
>>> new_root = ET.Element(root.tag, nsmap=nsmap)
>>> print ET.tostring(new_root, pretty_print=True)
<XnaContent xmlns:data="ModelData"/>
>>> new_root[:] = root[:]
>>> print ET.tostring(new_root, pretty_print=True)
<XnaContent xmlns:data="ModelData">
<Asset Type="data:MyData"/>
<Asset Type="data:MyData"/>
</XnaContent>
import xml.etree.ElementTree as ET
content = '''
<XnaContent>
<Asset Type="data:MyData"/>
<Asset Type="data:MyData"/>
</XnaContent>'''
doc = ET.fromstring(content)
ET.register_namespace('data','ModelData')
tree = ET.ElementTree(doc)
root = tree.getroot()
root.tag = '{ModelData}XnaContent'
print(ET.tostring(root, method = 'xml'))
yields
<data:XnaContent xmlns:data="ModelData">
<Asset Type="data:MyData" />
<Asset Type="data:MyData" />
</data:XnaContent>