XML Parse attributes using namespace

XML Parse attributes using namespace - python

I'm using the following XML:
<feed xmlns:im="http://itunes.apple.com/rss" xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
<id>
https://itunes.apple.com/IN/rss/topfreeapplications/limit=200/xml
</id>
<title>iTunes Store: Top Free Apps</title>
<updated>2016-12-05T12:37:06-07:00</updated>
<link rel="alternate" type="text/html" href="https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewTop?cc=in&id=134581&popId=27"/>
<link rel="self" href="https://itunes.apple.com/IN/rss/topfreeapplications/limit=200/xml"/>
<icon>http://itunes.apple.com/favicon.ico</icon>
<author>
<name>iTunes Store</name>
<uri>http://www.apple.com/uk/itunes/</uri>
</author>
<rights>Copyright 2008 Apple Inc.</rights>
<entry>
<updated>2016-12-05T12:37:06-07:00</updated>
<id im:id="473941634" im:bundleId="com.one97.paytm">https://itunes.apple.com/in/app/recharge-bill-payment-wallet/id473941634?mt=8&uo=2</id>
<title>Recharge, Bill Payment & Wallet - Paytm Mobile Solutions</title>
<summary></summary>
<im:name>Recharge, Bill Payment & Wallet</im:name>
<link rel="alternate" type="text/html" href="https://itunes.apple.com/in/app/recharge-bill-payment-wallet/id473941634?mt=8&uo=2"/>
<im:contentType term="Application" label="Application"/>
<category im:id="6024" term="Shopping" scheme="https://itunes.apple.com/in/genre/ios-shopping/id6024?mt=8&uo=2" label="Shopping"/>
<im:artist href="https://itunes.apple.com/in/developer/paytm-mobile-solutions/id473941637?mt=8&uo=2">Paytm Mobile Solutions</im:artist>
<im:price amount="0.00000" currency="INR">Get</im:price>
<im:image height="53">http://is1.mzstatic.com/image/thumb/Purple71/v4/9b/37/bf/9b37bf75-6b4d-9c95-a8a4-ea369f05ae7e/pr_source.png/53x53bb-85.png</im:image>
<im:image height="75">http://is5.mzstatic.com/image/thumb/Purple71/v4/9b/37/bf/9b37bf75-6b4d-9c95-a8a4-ea369f05ae7e/pr_source.png/75x75bb-85.png</im:image>
<im:image height="100">http://is5.mzstatic.com/image/thumb/Purple71/v4/9b/37/bf/9b37bf75-6b4d-9c95-a8a4-ea369f05ae7e/pr_source.png/100x100bb-85.png</im:image>
<rights>© One97 Communications Ltd</rights>
<im:releaseDate label="24 October 2011">2011-10-24T16:18:48-07:00</im:releaseDate>
<content type="html"></content>
</entry>
</feed>
I would like to extract the id information for each entry value:
the attribute is as follows: "im:id"
from xml.dom import minidom
xmldoc = minidom.parse('topIN.xml')
itemlist = xmldoc.getElementsByTagName('link')
print(len(itemlist))
print(itemlist[0].attributes.keys())
I get information:
1
[u'href', u'type', u'rel']
But when I do the same of id, nothing returns.

Here is a version using xml.etree.ElementTree:
import xml.etree.ElementTree as ET
tree = ET.parse('topIN.xml')
root = tree.getroot()
ns={'im':"http://itunes.apple.com/rss", 'atom':"http://www.w3.org/2005/Atom"}
for id_ in root.findall('atom:entry/atom:id', ns):
print (id_.attrib['{' + ns['im'] + '}id'])
Here is a version using lxml:
from lxml import etree
root=etree.parse('topIN.xml')
ns={'im':"http://itunes.apple.com/rss", 'atom':"http://www.w3.org/2005/Atom"}
print('\n'.join(root.xpath('atom:entry/atom:id/#im:id', namespaces=ns)))

This worked:
from xml.dom import minidom
xmldoc = minidom.parse('topIN.xml')
itemlist = xmldoc.getElementsByTagName('entry')
print(len(itemlist))
for s in itemlist:
print s.getElementsByTagName('id')[0].attributes['im:id'].value

Related

Change the tag id attribute in KML (Simplekml)

Is it possible to change the id attribute in a tag in simplekml?
import simplekml
kml = simplekml.Kml()
pnt = kml.newpoint(name='A Point')
pnt.coords = [(1.0, 2.0)]
kml.save("icon.kml")
This will generate the following document
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Document id="1">
<Folder id="2">
<Style id="5">
<IconStyle id="6">
<colorMode>normal</colorMode>
<scale>1</scale>
<heading>0</heading>
<Icon id="7">
<href>https://www.gstatic.com/mapspro/images/stock/503-wht-blank_maps.png</href>
</Icon>
</IconStyle>
</Style>
<Placemark id="4">
<name>A Point</name>
<styleUrl>#5</styleUrl>
<Point id="3">
<coordinates>1.0,2.0,0.0</coordinates>
</Point>
</Placemark>
</Folder>
</Document>
</kml>
Note that in some tags, an id attribute appears as if it had been generated from some simplekml index. I needed to change the ID assigned to the <Style id="5"> tag to <Style id="icon-1532-0288D1-nodesc-normal">
This would change the icon on Google My Maps. How can I do this using simplekml?

It looks like in the (getting started) documentation, https://simplekml.readthedocs.io/en/latest/gettingstarted.html, the id tag is generated differently for different kinds of objects, ie 'feat_1', 'feat_2', 'geom_0':
import simplekml
kml = simplekml.Kml()
pnt = kml.newpoint(name="A Point")
print kml.kml()
This is what is generated:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Document id="feat_1">
<Placemark id="feat_2">
<name>A Point</name>
<Point id="geom_0">
<coordinates>0.0, 0.0, 0.0</coordinates>
</Point>
</Placemark>
</Document>
</kml>
I looked through the source code, and it looks like, at least in version 1.3.2, they got away from that, and generate tag id's by just counting up.
class Kmlable(object):
"""Enables a subclass to be converted into KML."""
_globalid = 0
_currentroot = None
_compiling = False
_namespaces = ['xmlns="http://www.opengis.net/kml/2.2"', 'xmlns:gx="http://www.google.com/kml/ext/2.2"']
def __init__(self):
self._id = str(Kmlable._globalid)
Kmlable._globalid += 1
try:
from collections import OrderedDict
self._kml = OrderedDict()
except ImportError:
self._kml = {}
For some reason, these id tags are designed to be read only:
#property
def id(self):
"""The id string (read only)."""
return self._id
One hack-y way you could change this is something like (in base.py):
class Kmlable(object):
"""Enables a subclass to be converted into KML."""
#_globalid = 0
_globalid = 'your_string_here'
_currentroot = None
_compiling = False
_namespaces = ['xmlns="http://www.opengis.net/kml/2.2"', 'xmlns:gx="http://www.google.com/kml/ext/2.2"']
def __init__(self):
self._id = str(Kmlable._globalid)
#Kmlable._globalid += 1
try:
from collections import OrderedDict
self._kml = OrderedDict()
except ImportError:
self._kml = {}
But that would set the same id for every object you have in your kml document:
import simplekml
kml = simplekml.Kml()
pnt = kml.newpoint(name="A Point")
print(kml.kml())
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Document id="your_string_here">
<Placemark id="your_string_here">
<name>A Point</name>
<Point id="your_string_here">
<coordinates>0.0, 0.0, 0.0</coordinates>
</Point>
</Placemark>
</Document>
</kml>
Hopefully that helps your understanding. Not a full-stop answer, but too long to post in the comments.

Get text inside xml tags by their name

I had a xml code and i want to get text in exact elements(xml tags) using python language .
I have tried couple of solutions and didnt work.
import xml.etree.ElementTree as ET
tree = ET.fromstring(xml)
for node in tree.iter('Model'):
print node
How can i do that ?
Xml Code :
<soap:Envelope
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetVehicleLimitedInfoResponse
xmlns="http://schemas.conversesolutions.com/xsd/dmticta/v1">
<return>
<ResponseMessage xsi:nil="true" />
<ErrorCode xsi:nil="true" />
<RequestId> 2012290007705 </RequestId>
<TransactionCharge>150</TransactionCharge>
<VehicleNumber>GF-0176</VehicleNumber>
<AbsoluteOwner>SIYAPATHA FINANCE PLC</AbsoluteOwner>
<EngineNo>GA15-483936F</EngineNo>
<ClassOfVehicle>MOTOR CAR</ClassOfVehicle>
<Make>NISSAN</Make>
<Model>PULSAR</Model>
<YearOfManufacture>1998</YearOfManufacture>
<NoOfSpecialConditions>0</NoOfSpecialConditions>
<SpecialConditions xsi:nil="true" />
</return>
</GetVehicleLimitedInfoResponse>
</soap:Body>
</soap:Envelope>

Edited and improved answer:
import xml.etree.ElementTree as ET
import re
ns = {"veh": "http://schemas.conversesolutions.com/xsd/dmticta/v1"}
tree = ET.parse('test.xml') # save your xml as test.xml
root = tree.getroot()
def get_tag_name(tag):
return re.sub(r'\{.*\}', '',tag)
for node in root.find(".//veh:return", ns):
print(get_tag_name(node.tag)+': ', node.text)
It should produce something like this:
ResponseMessage: None
ErrorCode: None
RequestId: 2012290007705
TransactionCharge: 150
VehicleNumber: GF-0176
AbsoluteOwner: SIYAPATHA FINANCE PLC
EngineNo: GA15-483936F
ClassOfVehicle: MOTOR CAR
Make: NISSAN
Model: PULSAR
YearOfManufacture: 1998
NoOfSpecialConditions: 0
SpecialConditions: None

Parsing nested XML with ElementTree

I have the following XML format, and I want to pull out the values for name, region, and status using python's xml.etree.ElementTree module.
However, my attempt to get this information has been unsuccessful so far.
<feed>
<entry>
<id>uuid:asdfadsfasdf123123</id>
<title type="text"></title>
<content type="application/xml">
<NamespaceDescription xmlns="http://schemas.microsoft.com/netservices/2010/10/servicebus/connect" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<Name>instancename</Name>
<Region>US</Region>
<Status>Active</Status>
</NamespaceDescription>
</content>
</entry>
<entry>
<id>uuid:asdfadsfasdf234234</id>
<title type="text"></title>
<content type="application/xml">
<NamespaceDescription xmlns="http://schemas.microsoft.com/netservices/2010/10/servicebus/connect" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<Name>instancename2</Name>
<Region>US2</Region>
<Status>Active</Status>
</NamespaceDescription>
</content>
</entry>
</feed>
My code attempt:
NAMESPACE = '{http://www.w3.org/2005/Atom}'
root = et.fromstring(XML_STRING)
entry_root = root.findall('{0}entry'.format(NAMESPACE))
for child in entry_root:
content_node = child.find('{0}content'.format(NAMESPACE))
for content in content_node:
for desc in content.iter():
print desc.tag
name = desc.find('{0}Name'.format(NAMESPACE))
print name
desc.tag is giving me the nodes I want to access, but name is returning None. Any ideas what's wrong with my code?
Output of desc.tag:
{http://schemas.microsoft.com/netservices/2010/10/servicebus/connect}Name
{http://schemas.microsoft.com/netservices/2010/10/servicebus/connect}Region
{http://schemas.microsoft.com/netservices/2010/10/servicebus/connect}Status

I don't know why I didn't see this before. But, I was able to get the values.
root = et.fromstring(XML_STRING)
entry_root = root.findall('{0}entry'.format(NAMESPACE))
for child in entry_root:
content_node = child.find('{0}content'.format(NAMESPACE))
for descr in content_node:
name_node = descr.find('{0}Name'.format(NAMESPACE))
print name_node.text

You can use lxml.etree along with default namespace mapping to parse the XML as follows:
content = '''
<feed>
<entry>
<id>uuid:asdfadsfasdf123123</id>
<title type="text"></title>
<content type="application/xml">
<NamespaceDescription xmlns="http://schemas.microsoft.com/netservices/2010/10/servicebus/connect" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<Name>instancename</Name>
<Region>US</Region>
<Status>Active</Status>
</NamespaceDescription>
</content>
</entry>
<entry>
<id>uuid:asdfadsfasdf234234</id>
<title type="text"></title>
<content type="application/xml">
<NamespaceDescription xmlns="http://schemas.microsoft.com/netservices/2010/10/servicebus/connect" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<Name>instancename2</Name>
<Region>US2</Region>
<Status>Active</Status>
</NamespaceDescription>
</content>
</entry>
</feed>'''
from lxml import etree
tree = etree.XML(content)
ns = {'default': 'http://schemas.microsoft.com/netservices/2010/10/servicebus/connect'}
names = tree.xpath('//default:Name/text()', namespaces=ns)
regions = tree.xpath('//default:Region/text()', namespaces=ns)
statuses = tree.xpath('//default:Status/text()', namespaces=ns)
print(names)
print(regions)
print(statuses)
Output
['instancename', 'instancename2']
['US', 'US2']
['Active', 'Active']
This XPath/namespace functionality can be adapted to output the data in any format you require.

Appending to attribute values of an xml using python

Example xml:
<response version-api="2.0">
<value>
<books>
<book available="20" id="1" tags="">
<title></title>
<author id="1" tags="Joel">Manuel De Cervantes</author>
</book>
<book available="14" id="2" tags="Jane">
<title>Catcher in the Rye</title>
<author id="2" tags="">JD Salinger</author>
</book>
<book available="13" id="3" tags="">
<title></title>
<author id="3">Lewis Carroll</author>
</book>
<book available="5" id="4" tags="Harry">
<title>Don</title>
<author id="4">Manuel De Cervantes</author>
</book>
</books>
</value>
</response>
I want to append a string value of my choosing to all attributes called "tags". This is whether the "tags" attribute has a value or not and also the attributes are at different levels of the xml structure. I have tried the method findall() but I keep on getting an error "IndexError: list index out of range." This is the code I have so far which is a little short but I have run out of steam for what else I need to type...
splitter = etree.XMLParser(strip_cdata=False)
xmldoc = etree.parse(os.path.join(root, xml_file), splitter ).getroot()
for child in xmldoc:
if child.tag != 'response':
allDescendants = list(etree.findall())
for child in allDescendants:
if hasattr(child, 'tags'):
child.attribute["tags"].value = "someString"

findall() is the right API to use. Here is an example:
from lxml import etree
import os
splitter = etree.XMLParser(strip_cdata=False)
xml_file = 'foo.xml'
root = '.'
xmldoc = etree.parse(os.path.join(root, xml_file), splitter ).getroot()
for element in xmldoc.findall(".//*[#tags]"):
element.attrib["tags"] += " KILROY!"
print etree.tostring(xmldoc)

Python XML check next item

Here is a little xml example:
<?xml version="1.0" encoding="UTF-8"?>
<list>
<person id="1">
<name>Smith</name>
<city>New York</city>
</person>
<person id="2">
<name>Pitt</name>
</person>
...
...
</list>
Now I need all Persons with a name and city.
I tried:
#!/usr/bin/python
# coding: utf8
import xml.dom.minidom as dom
tree = dom.parse("test.xml")
for listItems in tree.firstChild.childNodes:
for personItems in listItems.childNodes:
if personItems.nodeName == "name" and personItems.nextSibling == "city":
print personItems.firstChild.data.strip()
But the ouput is empty. Without the "and" condition I become all names. How can I check that the next tag after "name" is "city"?

You can do this in minidom:
import xml.dom.minidom as minidom
def getChild(n,v):
for child in n.childNodes:
if child.localName==v:
yield child
xmldoc = minidom.parse('test.xml')
person = getChild(xmldoc, 'list')
for p in person:
for v in getChild(p,'person'):
attr = v.getAttributeNode('id')
if attr:
print attr.nodeValue.strip()
This prints id of person nodes:
1
2

use element tree check this element tree
import xml.etree.ElementTree as ET
tree = ET.parse('a.xml')
root = tree.getroot()
for person in root.findall('person'):
name = person.find('name').text
try:
city = person.find('city').text
except:
continue
print name, city
for id u can get it by id= person.get('id')
output:Smith New York

Using lxml, you can use xpath to get in one step what you need:
from lxml import etree
xmlstr = """
<list>
<person id="1">
<name>Smith</name>
<city>New York</city>
</person>
<person id="2">
<name>Pitt</name>
</person>
</list>
"""
xml = etree.fromstring(xmlstr)
xp = "//person[city]"
for person in xml.xpath(xp):
print etree.tostring(person)
lxml is external python package, but is so useful, that to me it is always worth to install.
xpath is searching for any (//) element person having (declared by content of []) subelement city.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

XML Parse attributes using namespace - python

This worked: from xml.dom import minidom xmldoc = minidom.parse('topIN.xml') itemlist = xmldoc.getElementsByTagName('entry') print(len(itemlist)) for s in itemlist: print s.getElementsByTagName('id')[0].attributes['im:id'].value

Related

Change the tag id attribute in KML (Simplekml)

Get text inside xml tags by their name

Parsing nested XML with ElementTree

Appending to attribute values of an xml using python

Python XML check next item

Categories

Resources