Parsing custom xml file using python

I have an XML file in the following format:
<?xml version='1.0' encoding='utf-8'?>
<execute time="0.59">
<exec name="recursive_a" loops="3" fail="2" skipped="0">
<testcase tname="test_a" name="test.cpp" time="0.50">
<pass>
001,test,pass
</pass>
</testcase>
</exec>
</execute>
How can I extract the "recursive_a" string from this XML using Python? (I am using the minidom XML parser.)

With xml.etree.ElementTree and pandas, one solution could be:
import xml.etree.ElementTree as ET
import pandas as pd
tree = ET.parse('Code3r.xml')
root = tree.getroot()
for elem in root:
    if elem.tag == "exec":
        # print(elem.attrib) or with pandas
        df = pd.DataFrame.from_dict(elem.attrib, orient='index')
        print(df.T.to_string(index=False))
Output:
       name loops fail skipped
recursive_a     3    2       0
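
Since the question mentions minidom, here is a minimal sketch of the same lookup with xml.dom.minidom. It parses the sample from the question out of a string (the variable names xml_text and exec_names are illustrative, not from the original post); for a file on disk, minidom.parse(path) could be used instead.

```python
import xml.dom.minidom as minidom

# Sample document copied from the question.
xml_text = """<?xml version='1.0' encoding='utf-8'?>
<execute time="0.59">
<exec name="recursive_a" loops="3" fail="2" skipped="0">
<testcase tname="test_a" name="test.cpp" time="0.50">
<pass>
001,test,pass
</pass>
</testcase>
</exec>
</execute>"""

doc = minidom.parseString(xml_text)

# getElementsByTagName searches the whole document for <exec> elements;
# getAttribute returns the attribute value as a string.
exec_names = [node.getAttribute("name") for node in doc.getElementsByTagName("exec")]
print(exec_names)  # ['recursive_a']
```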

Related

Extract multiple xml attributes to pandas dataframe

I have a basic xml file called meals.xml which looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<meals name="Sample Text">
<meal id="1" name="Poached Eggs" type="breakfast"/>
<meal id="2" name="Club Sandwich" type="lunch"/>
<meal id="3" name="Steak" type="dinner"/>
<meal id="4" name="Steak" type="dinner"/>
</meals>
I want to extract both the 'id' and 'name' attributes into a dataframe. I can extract one when specifying one column and one attribute (e.g., name only), but can't figure out the syntax for getting multiple attributes in the for loop. This is what I've tried, adding id to 'df_cols' and to the 'attrib.get' call:
import xml.etree.ElementTree as ET
import pandas as pd
root = ET.parse('meals.xml').getroot()
df_cols = ["id", "name"]
rows = []
for node in root:
    value = node.attrib.get('id', 'name')
    rows.append(value)
df = pd.DataFrame(rows, columns = df_cols)
df
Can someone advise how to do this?

The following may work for you:
import xml.etree.ElementTree as ET
import pandas as pd
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<meals name="Sample Text">
<meal id="1" name="Poached Eggs" type="breakfast"/>
<meal id="2" name="Club Sandwich" type="lunch"/>
<meal id="3" name="Steak" type="dinner"/>
<meal id="4" name="Steak" type="dinner"/>
</meals>'''
root = ET.fromstring(xml)
data = [{'id': m.attrib['id'], 'name': m.attrib['name']} for m in root.findall('.//meal')]
df = pd.DataFrame(data)
print(df)
Output:
  id           name
0  1   Poached Eggs
1  2  Club Sandwich
2  3          Steak
3  4          Steak
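
The attempt in the question fails because dict.get(key, default) treats its second argument as a fallback value, not as a second key, so each row ends up holding only the id. A minimal sketch of a loop driven by the column list follows; it uses a shortened copy of the sample data and stops at plain Python lists, which could then be passed to pd.DataFrame(rows, columns=df_cols) as in the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of meals.xml from the question.
xml_text = """<meals name="Sample Text">
<meal id="1" name="Poached Eggs" type="breakfast"/>
<meal id="2" name="Club Sandwich" type="lunch"/>
</meals>"""

root = ET.fromstring(xml_text)
df_cols = ["id", "name"]

# The second argument of get() is only used when the key is missing,
# so this returns the id and never looks at 'name'.
first_value = root[0].attrib.get("id", "name")
print(first_value)  # 1

# One row per <meal>, one cell per requested attribute (None if absent).
rows = [[node.attrib.get(col) for col in df_cols] for node in root.iter("meal")]
print(rows)  # [['1', 'Poached Eggs'], ['2', 'Club Sandwich']]
```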

Error in parsing a YQL string xml Python

I'm trying to scrape data from an API like this:
import urllib2
a = urllib2.urlopen('http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.xchange%20where%20pair%20in%20(%22USDEUR%22,%20%22USDJPY%22,%20%22USDBGN%22,%20%22USDCZK%22,%20%22USDDKK%22,%20%22USDGBP%22,%20%22USDHUF%22,%20%22USDLTL%22,%20%22USDLVL%22,%20%22USDPLN%22,%20%22USDRON%22,%20%22USDSEK%22,%20%22USDCHF%22,%20%22USDNOK%22,%20%22USDHRK%22,%20%22USDRUB%22,%20%22USDTRY%22,%20%22USDAUD%22,%20%22USDBRL%22,%20%22USDCAD%22,%20%22USDCNY%22,%20%22USDHKD%22,%20%22USDIDR%22,%20%22USDILS%22,%20%22USDINR%22,%20%22USDKRW%22,%20%22USDMXN%22,%20%22USDMYR%22,%20%22USDNZD%22,%20%22USDPHP%22,%20%22USDSGD%22,%20%22USDTHB%22,%20%22USDZAR%22,%20%22USDISK%22)&env=store://datatables.org/alltableswithkeys')
b = a.read()
b is a string containing the XML:
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="34" yahoo:created="2017-04-21T19:46:11Z" yahoo:lang="en-US"><results><rate id="USDEUR"><Name>USD/EUR</Name><Rate>0.9347</Rate><Date>4/21/2017</Date><Time>7:13pm</Time><Ask>0.9352</Ask><Bid>0.9347</Bid></rate><rate id="USDJPY"><Name>USD/JPY</Name><Rate>109.2200</Rate><Date>4/21/2017</Date><Time>6:58pm</Time><Ask>109.2260</Ask><Bid>109.2200</Bid></rate><rate id="USDBGN"><Name>USD/BGN</Name><Rate>1.8282</Rate><Date>4/21/2017</Date><Time>3:15pm</Time><Ask>N/A</Ask><Bid>1.8282</Bid></rate><rate id="USDCZK"><Name>USD/CZK</Name><Rate>25.1629</Rate><Date>4/21/2017</Date><Time>8:35pm</Time><Ask>25.1702</Ask><Bid>25.1629</Bid></rate><rate id="USDDKK"><Name>USD/DKK</Name><Rate>6.9458</Rate><Date>4/21/2017</Date><Time>6:44pm</Time><Ask>6.9466</Ask><Bid>6.9458</Bid></rate><rate id="USDGBP"><Name>USD/GBP</Name><Rate>0.7812</Rate><Date>4/21/2017</Date><Time>6:29pm</Time><Ask>0.7813</Ask><Bid>0.7812</Bid></rate><rate id="USDHUF"><Name>USD/HUF</Name><Rate>292.4200</Rate><Date>4/21/2017</Date><Time>8:14pm</Time><Ask>292.6200</Ask><Bid>292.4200</Bid></rate><rate id="USDLTL"><Name>USD/LTL</Name><Rate>3.0487</Rate><Date>6/22/2015</Date><Time>9:39am</Time><Ask>3.0491</Ask><Bid>3.0487</Bid></rate><rate id="USDLVL"><Name>USD/LVL</Name><Rate>0.6205</Rate><Date>6/22/2015</Date><Time>9:37am</Time><Ask>0.6206</Ask><Bid>0.6205</Bid></rate><rate id="USDPLN"><Name>USD/PLN</Name><Rate>3.9907</Rate><Date>4/21/2017</Date><Time>6:53pm</Time><Ask>3.9916</Ask><Bid>3.9907</Bid></rate><rate id="USDRON"><Name>USD/RON</Name><Rate>4.2276</Rate><Date>4/21/2017</Date><Time>6:02pm</Time><Ask>4.2411</Ask><Bid>4.2276</Bid></rate><rate id="USDSEK"><Name>USD/SEK</Name><Rate>9.0293</Rate><Date>4/21/2017</Date><Time>8:28pm</Time><Ask>9.0310</Ask><Bid>9.0293</Bid></rate><rate id="USDCHF"><Name>USD/CHF</Name><Rate>0.9977</Rate><Date>4/21/2017</Date><Time>6:33pm</Time><Ask>0.9977</Ask><Bid>0.9977</Bid></rate><rate 
id="USDNOK"><Name>USD/NOK</Name><Rate>8.6823</Rate><Date>4/21/2017</Date><Time>7:00pm</Time><Ask>8.6858</Ask><Bid>8.6823</Bid></rate><rate id="USDHRK"><Name>USD/HRK</Name><Rate>6.9250</Rate><Date>4/21/2017</Date><Time>6:53pm</Time><Ask>6.9981</Ask><Bid>6.9250</Bid></rate><rate id="USDRUB"><Name>USD/RUB</Name><Rate>56.5055</Rate><Date>4/21/2017</Date><Time>6:33pm</Time><Ask>56.5405</Ask><Bid>56.5055</Bid></rate><rate id="USDTRY"><Name>USD/TRY</Name><Rate>3.6473</Rate><Date>4/21/2017</Date><Time>6:02pm</Time><Ask>3.6478</Ask><Bid>3.6473</Bid></rate><rate id="USDAUD"><Name>USD/AUD</Name><Rate>1.3263</Rate><Date>4/21/2017</Date><Time>8:35pm</Time><Ask>1.3267</Ask><Bid>1.3263</Bid></rate><rate id="USDBRL"><Name>USD/BRL</Name><Rate>3.1473</Rate><Date>4/21/2017</Date><Time>7:02pm</Time><Ask>3.1493</Ask><Bid>3.1473</Bid></rate><rate id="USDCAD"><Name>USD/CAD</Name><Rate>1.3513</Rate><Date>4/21/2017</Date><Time>6:49pm</Time><Ask>1.3513</Ask><Bid>1.3513</Bid></rate><rate id="USDCNY"><Name>USD/CNY</Name><Rate>6.8844</Rate><Date>4/21/2017</Date><Time>6:38pm</Time><Ask>6.8854</Ask><Bid>6.8844</Bid></rate><rate id="USDHKD"><Name>USD/HKD</Name><Rate>7.7746</Rate><Date>4/21/2017</Date><Time>6:01pm</Time><Ask>7.7754</Ask><Bid>7.7746</Bid></rate><rate id="USDIDR"><Name>USD/IDR</Name><Rate>13316.0000</Rate><Date>4/21/2017</Date><Time>6:38pm</Time><Ask>13326.0000</Ask><Bid>13316.0000</Bid></rate><rate id="USDILS"><Name>USD/ILS</Name><Rate>3.6723</Rate><Date>4/21/2017</Date><Time>6:52pm</Time><Ask>3.6823</Ask><Bid>3.6723</Bid></rate><rate id="USDINR"><Name>USD/INR</Name><Rate>64.6490</Rate><Date>4/21/2017</Date><Time>6:26pm</Time><Ask>64.6990</Ask><Bid>64.6490</Bid></rate><rate id="USDKRW"><Name>USD/KRW</Name><Rate>1133.3700</Rate><Date>4/21/2017</Date><Time>6:50pm</Time><Ask>1134.3700</Ask><Bid>1133.3700</Bid></rate><rate id="USDMXN"><Name>USD/MXN</Name><Rate>18.8424</Rate><Date>4/21/2017</Date><Time>6:16pm</Time><Ask>18.8443</Ask><Bid>18.8424</Bid></rate><rate 
id="USDMYR"><Name>USD/MYR</Name><Rate>4.3980</Rate><Date>4/21/2017</Date><Time>6:38pm</Time><Ask>4.4030</Ask><Bid>4.3980</Bid></rate><rate id="USDNZD"><Name>USD/NZD</Name><Rate>1.4226</Rate><Date>4/21/2017</Date><Time>6:50pm</Time><Ask>1.4236</Ask><Bid>1.4226</Bid></rate><rate id="USDPHP"><Name>USD/PHP</Name><Rate>49.8400</Rate><Date>4/21/2017</Date><Time>6:13pm</Time><Ask>49.8900</Ask><Bid>49.8400</Bid></rate><rate id="USDSGD"><Name>USD/SGD</Name><Rate>1.3966</Rate><Date>4/21/2017</Date><Time>8:28pm</Time><Ask>1.3969</Ask><Bid>1.3966</Bid></rate><rate id="USDTHB"><Name>USD/THB</Name><Rate>34.3500</Rate><Date>4/21/2017</Date><Time>6:49pm</Time><Ask>34.4000</Ask><Bid>34.3500</Bid></rate><rate id="USDZAR"><Name>USD/ZAR</Name><Rate>13.1525</Rate><Date>4/21/2017</Date><Time>6:50pm</Time><Ask>13.1620</Ask><Bid>13.1525</Bid></rate><rate id="USDISK"><Name>USD/ISK</Name><Rate>109.4900</Rate><Date>4/21/2017</Date><Time>5:32pm</Time><Ask>109.9900</Ask><Bid>109.4900</Bid></rate></results></query><!-- total: 1083 -->
<!-- prod_bf1_1;paas.yql;queryyahooapiscomproductionbf1;885cf297-259f-11e7-b972-d4ae52974741 -->
However, when I use the xml.etree module to parse this string as an XML object, I get errors saying that the object is not indexable and not iterable. What exactly is the output of this code?
import xml.etree.ElementTree as ET
d = ET.ElementTree(ET.fromstring(b))
EDIT: The errors come up when I try to iterate through the children of d like so:
for child in d:
    print(child.tag)
The error here is "TypeError: 'ElementTree' object is not iterable"
How can I access the children in this XML string to get specific values from it?

You are overdoing things when you try to convert the string to an ElementTree element:
import xml.etree.ElementTree as ET
b = '''<?xml version="1.0" encoding="UTF-8"?>...'''
element = ET.fromstring(b) # that does it!
print(element.attrib)
Now you can access element as you would any instance of xml.etree.ElementTree.Element.
You could do this, for example, to iterate over the element and all of its descendants:
for child in element.iter():
    print(child, child.tag, child.text, child.attrib)
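
As a concrete illustration of getting specific values, the sketch below maps each rate id to its Rate text. It assumes a shortened copy of the response from the question (two rates, some root attributes dropped); the names root, tree and rates are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response shown in the question (two rates only).
b = """<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:lang="en-US"><results><rate id="USDEUR"><Name>USD/EUR</Name><Rate>0.9347</Rate><Date>4/21/2017</Date><Time>7:13pm</Time><Ask>0.9352</Ask><Bid>0.9347</Bid></rate><rate id="USDJPY"><Name>USD/JPY</Name><Rate>109.2200</Rate><Date>4/21/2017</Date><Time>6:58pm</Time><Ask>109.2260</Ask><Bid>109.2200</Bid></rate></results></query>"""

# fromstring() returns the root Element, which is iterable.
root = ET.fromstring(b)

# An ElementTree wrapper is not iterable; getroot() gives back the Element.
tree = ET.ElementTree(root)
assert tree.getroot() is root

# Map each rate's id attribute to the text of its <Rate> child.
rates = {rate.get("id"): rate.findtext("Rate") for rate in root.iter("rate")}
print(rates)  # {'USDEUR': '0.9347', 'USDJPY': '109.2200'}
```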

List namespace definitions in an XML document with ElementTree?

If I've got an XML file like this:
<root
xmlns:a="http://example.com/a"
xmlns:b="http://example.com/b"
xmlns:c="http://example.com/c"
xmlns="http://example.com/base">
...
</root>
How can I get a list of the namespace definitions (ie, the xmlns:a="…", etc)?
Using:
import xml.etree.ElementTree as ET
tree = ET.parse('foo.xml')
root = tree.getroot()
print(root.attrib)
Shows an empty attribute dictionary.

Via @mzjn, in the comments, here's how to do it with stock ElementTree (https://stackoverflow.com/a/42372404/407651):
import xml.etree.ElementTree as ET
my_namespaces = dict([
    node for (_, node) in ET.iterparse('file.xml', events=['start-ns'])
])
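
The attribute dictionary is empty because ElementTree consumes xmlns declarations while parsing instead of storing them as attributes. Here is a self-contained sketch of the iterparse approach; it reads the sample from an in-memory buffer (io.BytesIO is a stand-in for the file) so it can be run without foo.xml.

```python
import io
import xml.etree.ElementTree as ET

# The root element from the question, with the elided body left out.
xml_bytes = b"""<root
 xmlns:a="http://example.com/a"
 xmlns:b="http://example.com/b"
 xmlns:c="http://example.com/c"
 xmlns="http://example.com/base">
</root>"""

# Each 'start-ns' event carries a (prefix, uri) tuple;
# the default namespace is reported with an empty prefix.
namespaces = dict(
    node for _, node in ET.iterparse(io.BytesIO(xml_bytes), events=["start-ns"])
)
print(namespaces)
```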

You might find it easier to use lxml.
from lxml import etree
xml_data = '<root xmlns:a="http://example.com/a" xmlns:b="http://example.com/b" xmlns:c="http://example.com/c" xmlns="http://example.com/base"></root>'
root_node = etree.fromstring(xml_data)
print(root_node.nsmap)
This outputs:
{None: 'http://example.com/base',
'a': 'http://example.com/a',
'b': 'http://example.com/b',
'c': 'http://example.com/c'}

How to extract data from xml file that is deep down the tag

<?xml version="1.0" encoding="utf-8"?>
<ArrayOfRecord xmlns:i="http://www.w3.org/2001/XMLSchema-instance" i:type="Record">
<AvailableCharts>
<Accelerometer>true</Accelerometer>
<Velocity>false</Velocity>
</AvailableCharts>
<Trics>
<Trick>
<EndOffset>PT2M21.835S</EndOffset>
<Values>
<TrickValue>
<Acceleration>26.505801694441629</Acceleration>
<Rotation>0.023379150593228679</Rotation>
</TrickValue>
</Values>
</Trick>
</Trics>
<Values>
<SensorValue>
<accelx>-3.593643144</accelx>
<accely>7.316485176</accely>
</SensorValue>
<SensorValue>
<accelx>0.31103436</accelx>
<accely>7.70408184</accely>
</SensorValue>
</Values>
</ArrayOfRecord>
I am only interested in the 'accelx' and 'accely' values in this data and need to create a CSV file from them.
Update: The code below stops working when I replace the second line of the XML with the following; nothing is printed:
<ArrayOfRecord xmlns:i="http://www.w3.org/2001/XMLSchema-instance" i:type="Record" xmlns="http://schemas">
The following code works with the original XML:
import xml.etree.ElementTree as etree
tree = etree.parse(r"C:\Users\data.xml")
root = tree.getroot()
val_of_interest = root.findall("./Values/SensorValue")
for sensor_val in val_of_interest:
    print(sensor_val.find('accelx').text)
    print(sensor_val.find('accely').text)
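
A likely reason nothing is printed after the change: xmlns="http://schemas" puts every unprefixed element into that default namespace, so the bare tag names in the path no longer match. A minimal sketch of a namespace-aware lookup that also writes the CSV follows. The prefix "d" is an arbitrary label chosen here, the XML is a shortened copy of the sample, and io.StringIO stands in for a real output file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened copy of the sample, with the default namespace from the update.
xml_text = """<ArrayOfRecord xmlns:i="http://www.w3.org/2001/XMLSchema-instance" i:type="Record" xmlns="http://schemas">
<Values>
<SensorValue><accelx>-3.593643144</accelx><accely>7.316485176</accely></SensorValue>
<SensorValue><accelx>0.31103436</accelx><accely>7.70408184</accely></SensorValue>
</Values>
</ArrayOfRecord>"""

root = ET.fromstring(xml_text)

# Map an arbitrary prefix to the default namespace URI of the document.
ns = {"d": "http://schemas"}

rows = [
    (sv.findtext("d:accelx", namespaces=ns), sv.findtext("d:accely", namespaces=ns))
    for sv in root.findall("./d:Values/d:SensorValue", ns)
]

# In-memory stand-in for open("out.csv", "w", newline="").
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["accelx", "accely"])
writer.writerows(rows)
print(out.getvalue())
```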

Python XML check next item

Here is a little xml example:
<?xml version="1.0" encoding="UTF-8"?>
<list>
<person id="1">
<name>Smith</name>
<city>New York</city>
</person>
<person id="2">
<name>Pitt</name>
</person>
...
...
</list>
Now I need all persons that have both a name and a city.
I tried:
#!/usr/bin/python
# coding: utf8
import xml.dom.minidom as dom
tree = dom.parse("test.xml")
for listItems in tree.firstChild.childNodes:
    for personItems in listItems.childNodes:
        if personItems.nodeName == "name" and personItems.nextSibling == "city":
            print(personItems.firstChild.data.strip())
But the output is empty. Without the "and" condition I get all names. How can I check that the next tag after "name" is "city"?

You can do this in minidom:
import xml.dom.minidom as minidom
def getChild(n, v):
    for child in n.childNodes:
        if child.localName == v:
            yield child
xmldoc = minidom.parse('test.xml')
person = getChild(xmldoc, 'list')
for p in person:
    for v in getChild(p, 'person'):
        attr = v.getAttributeNode('id')
        if attr:
            print(attr.nodeValue.strip())
This prints the id of each person node:
1
2
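
The condition in the question never matches because nextSibling is a Node object, not a string, and with this layout the node right after <name> is a whitespace Text node anyway. A minimal minidom sketch that keeps only persons having both children is shown below. It parses the sample from a string (with the elided entries left out), and the name matches is illustrative.

```python
import xml.dom.minidom as minidom

# Sample from the question, without the elided entries.
xml_text = """<list>
<person id="1">
<name>Smith</name>
<city>New York</city>
</person>
<person id="2">
<name>Pitt</name>
</person>
</list>"""

doc = minidom.parseString(xml_text)

matches = []
for person in doc.getElementsByTagName("person"):
    names = person.getElementsByTagName("name")
    cities = person.getElementsByTagName("city")
    # Keep the person only if both child elements are present.
    if names and cities:
        matches.append(
            (names[0].firstChild.data.strip(), cities[0].firstChild.data.strip())
        )
print(matches)  # [('Smith', 'New York')]
```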

Use ElementTree:
import xml.etree.ElementTree as ET
tree = ET.parse('a.xml')
root = tree.getroot()
for person in root.findall('person'):
    name = person.find('name').text
    try:
        city = person.find('city').text
    except AttributeError:
        # find() returns None when there is no <city> child
        continue
    print(name, city)
The id is available via person.get('id').
Output:
Smith New York

Using lxml, you can use xpath to get in one step what you need:
from lxml import etree
xmlstr = """
<list>
<person id="1">
<name>Smith</name>
<city>New York</city>
</person>
<person id="2">
<name>Pitt</name>
</person>
</list>
"""
xml = etree.fromstring(xmlstr)
xp = "//person[city]"
for person in xml.xpath(xp):
    print(etree.tostring(person))
lxml is an external Python package, but it is so useful that I find it always worth installing.
The XPath expression selects every person element anywhere in the document (//) that has a city subelement (the condition inside []).
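
If installing lxml is not an option, the limited XPath support in the standard library's ElementTree also accepts a child-element predicate. A minimal sketch under that assumption, using the same sample data (the name pairs is illustrative):

```python
import xml.etree.ElementTree as ET

xml_text = """<list>
<person id="1">
<name>Smith</name>
<city>New York</city>
</person>
<person id="2">
<name>Pitt</name>
</person>
</list>"""

root = ET.fromstring(xml_text)

# "person[city]" selects <person> children of the root that have a <city> child.
pairs = [(p.findtext("name"), p.findtext("city")) for p in root.findall("person[city]")]
print(pairs)  # [('Smith', 'New York')]
```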
