I have an XML file that looks like this:
<?xml version="1.0" encoding="utf-8"?>
<session id="2934" name="Valves" docVersion="5.0.1">
<docInfo>
<field name="Employee" isMandotory="True">Jake Roberts</field>
<field name="Section" isOpen="True" isMandotory="False">5</field>
<field name="Location" isOpen="True" isMandotory="False">Munchen</field>
</docInfo>
</session>
Using xmltodict I want to get the Employee in a string. It is probably quite simple but I can't seem to figure it out.
Here's my code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
import xmltodict
with open('valves.xml') as fd:
    doc = xmltodict.parse(fd.read())

print "ID : %s" % doc['session']['#id']
print "Name : %s" % doc['session']['#name']
print "Doc Version : %s" % doc['session']['#docVersion']
print "Employee : %s" % doc['session']['docInfo']['field']

sys.exit(0)
With this I do get all the fields in a list, but I assume that with xmltodict every individual field attribute or element is accessible as a key-value pair.
How can I access the value "Jake Roberts" like I access the value of docVersion for example?
What you are getting is a list of fields, where every field is represented by a dict(). Explore this dict (e.g. in the Python interactive shell) to narrow down how to get to the value you want.
>>> doc["session"]["docInfo"]["field"][0]
OrderedDict([(u'#name', u'Employee'), (u'#isMandotory', u'True'), ('#text', u'Jake Roberts')])
In order to get to the element value add ["#text"] to the end of the line in the snippet above.
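For example, continuing from the parse in the question, the employee name can be read like this (Python 2 print syntax kept to match the snippet above):

print "Employee : %s" % doc["session"]["docInfo"]["field"][0]["#text"]
# Employee : Jake Roberts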
I'm trying to locate fields in a SOAP XML file using lxml (3.6.0).
...
<soap:Body>
    <Request xmlns="http://localhost/">
        <Test>
            <field1>hello</field1>
            <field2>world</field2>
        </Test>
    </Request>
</soap:Body>
...
In this example I'm trying to find field1 and field2.
I need to add a path to the search term, to find the field:
print (myroot.find(".//{http://localhost/}field1").tag) # prints 'field1'
Without it, I don't find anything:
print (myroot.find("field1").tag) # fails, find() returns None
Is there any other way to search for the field tag (here field1) without giving path info?
Full example below:
from lxml import etree
example = """<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body><Request xmlns="http://localhost/">
<Test><field1>hello</field1><field2>world</field2></Test>
</Request></soap:Body></soap:Envelope>
"""
myroot = etree.fromstring(example)
# this works
print (myroot.find(".//{http://localhost/}field1").text)
print (myroot.find(".//{http://localhost/}field2").text)
# this fails
print (myroot.find(".//field1").text)
print (myroot.find("field1").text)
Comment: The input of the SOAP request is given; I can't change any of it in real life to make things easier.
There is a way to ignore the namespace when selecting elements using XPath, but that isn't good practice; the namespace is there for a reason. In any case, there is a cleaner way to reference an element in a namespace: map a prefix to the namespace URI and use the prefix, instead of writing out the actual namespace URI every time:
>>> ns = {'d': 'http://localhost/'}
>>> print (myroot.find(".//d:field1", ns).text)
hello
>>> print (myroot.find(".//d:field2", ns).text)
world
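For completeness, the namespace-ignoring approach mentioned above would look roughly like this, using an XPath local-name() test (lxml only; again, not recommended):

>>> print (myroot.xpath(".//*[local-name()='field1']")[0].text)
hello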
I've got a directory full of Salesforce objects in XML format. I'd like to identify the <fullName> and parent file of all the custom <fields> where <required> is true. Here is some truncated sample data, let's call it "Custom_Object__c":
<?xml version="1.0" encoding="UTF-8"?>
<CustomObject xmlns="http://soap.sforce.com/2006/04/metadata">
<deprecated>false</deprecated>
<description>descriptiontext</description>
<fields>
<fullName>custom_field1</fullName>
<required>false</required>
<type>Text</type>
<unique>false</unique>
</fields>
<fields>
<fullName>custom_field2</fullName>
<deprecated>false</deprecated>
<visibleLines>5</visibleLines>
</fields>
<fields>
<fullName>custom_field3</fullName>
<required>false</required>
</fields>
<fields>
<fullName>custom_field4</fullName>
<deprecated>false</deprecated>
<description>custom field 4 description</description>
<externalId>true</externalId>
<required>true</required>
<scale>0</scale>
<type>Number</type>
<unique>false</unique>
</fields>
<fields>
<fullName>custom_field5</fullName>
<deprecated>false</deprecated>
<description>Creator of this log message. Application-specific.</description>
<externalId>true</externalId>
<label>Origin</label>
<length>255</length>
<required>true</required>
<type>Text</type>
<unique>false</unique>
</fields>
<label>App Log</label>
<nameField>
<displayFormat>LOG-{YYYYMMDD}-{00000000}</displayFormat>
<label>Entry ID</label>
<type>AutoNumber</type>
</nameField>
</CustomObject>
The desired output would be a dictionary with a format something like:
required_fields = {'Custom_Object__1': 'custom_field4', 'Custom_Object__1': 'custom_field5', ... etc. for all the required fields in all files in the folder}
or anything similar.
I've already gotten my list of objects through glob.glob, and I can get a list of all the children and their attributes with ElementTree, but I'm struggling from there. I feel like I'm very close but I'd love a hand finishing this task off. Here is my code so far:
import os
import glob
import xml.etree.ElementTree as ET

os.chdir("/Users/paulsallen/workspace/fforce/FForce Dev Account/config/objects/")

objs = []
for file in glob.glob("*.object"):
    objs.append(file)

fields_dict = {}
for object in objs:
    root = ET.parse(object).getroot()
    ....
and once I get the XML data parsed I don't know where to take it from there.
You really want to switch to using lxml here, because then you can use an XPath query:
import os
import glob
from lxml import etree as ET

os.chdir("/Users/paulsallen/workspace/fforce/FForce Dev Account/config/objects/")
objs = glob.glob("*.object")

fields_dict = {}
for filename in objs:
    root = ET.parse(filename).getroot()
    required = root.xpath('.//n:fullName[../n:required/text()="true"]/text()',
                          namespaces={'n': root.nsmap[None]})
    fields_dict[os.path.splitext(filename)[0]] = required
With that code you end up with a dictionary of lists; each key is a filename (without the extension), each value is a list of required fields.
The XPath query looks for fullName elements in the default namespace that have a required sibling element containing the text 'true'. It then takes the text content of each of those matching elements, which gives us a list we can store in the dictionary.
Use this function to find all required fields under a given root. It should also help as an example/starting point for future parsing needs:
def find_required_fields(root):
    NS = {'soap': 'http://soap.sforce.com/2006/04/metadata'}
    required_fields = []
    for field in root.findall('soap:fields', namespaces=NS):
        required = field.findtext('soap:required', namespaces=NS) == "true"
        name = field.findtext('soap:fullName', namespaces=NS)
        if required:
            required_fields.append(name)
    return required_fields
Example usage:
>>> import xml.etree.ElementTree as ET
>>> root = ET.parse('objects.xml') # where objects.xml contains the example in the question
>>> print find_required_fields(root)
['custom_field4', 'custom_field5']
>>>
I am trying to parse out certain tags from an XML document, and it is raising an AttributeError: '_ElementStringResult' object has no attribute 'text' error.
Here is the xml document:
<?xml version='1.0' encoding='ASCII'?>
<Root>
    <Data>
        <FormType>Log</FormType>
        <Submitted>2012-03-19 07:34:07</Submitted>
        <ID>1234</ID>
        <LAST>SJTK4</LAST>
        <Latitude>36.7027777778</Latitude>
        <Longitude>-108.046111111</Longitude>
        <Speed>0.0</Speed>
    </Data>
</Root>
Here is the code I am using:
from lxml import etree
from StringIO import StringIO
import MySQLdb
import glob
import os
import shutil
import logging
import sys

localPath = "C:\data"
xmlFiles = glob.glob1(localPath, "*.xml")

for file in xmlFiles:
    a = os.path.join(localPath, file)
    element = etree.parse(a)
    Data = element.xpath('//Root/Data/node()')
    parsedData = [{field.tag: field.text for field in Data} for action in Data]
    print parsedData  # AttributeError: '_ElementStringResult' object has no attribute 'text'
'//Root/Data/node()' will return a list of all the child nodes, which includes the text nodes as plain strings, and those strings do not have a .text attribute. If you put a print right after the Data = ... line you will see something like ['\n ', <Element FormType at 0x10675fdc0>, '\n ', ...].
I would do a filter first such as:
Data = [f for f in element.xpath('//Root/Data/node()') if hasattr(f, 'text')]
Then I think the following line could be rewritten as:
parsedData = {field.tag: field.text for field in Data}
which will give the element tag and text dictionary which I believe is what you want.
Instead of querying for //Root/Data/node(), query for /Root/Data/* if you want only elements (as opposed to text nodes) to be returned. (Also, using only a single leading / rather than // lets the engine do a cheaper search, since it doesn't need to look through the whole subtree for an additional Root.)
Also -- are you sure you really want to loop through the entire list of subelements of Data inside your inner loop, rather than looping over only the subelements of a single Data element selected by your outer loop? I think your logic is broken, though it would only be visible if you had a file with more than one Data element under Root.
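A minimal sketch of how that corrected inner loop might look, keeping the variable names from the question and building one dict per Data element:

parsedData = []
for data in element.xpath('/Root/Data'):
    parsedData.append({field.tag: field.text for field in data})
print parsedData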
Please help me parse a configuration file of the below prototype using lxml etree. I tried iterating with for event, element and using tostring. Unfortunately I don't need just the text, but the XML between
<template name>
<config>
</template>
for a given attribute.
I started with this code, but I get a KeyError while searching for the attribute, since it scans from the start:
config_tree = etree.iterparse(token_template_file)
for event, element in config_tree:
    if element.attrib['name'] == "ad auth":
        print ("attrib reached. get XML before child ends")
Since I am a newbie to XML and python, I am not sure how to go about it. Here is the config file:
<Templates>
    <template name="config1">
        <request>
            <password>pass</password>
            <userName>username</userName>
            <appID>someapp</appID>
        </request>
    </template>
    <template name="config2">
        <request>
            <password>pass1</password>
            <userName>username1</userName>
            <appID>someapp</appID>
        </request>
    </template>
</Templates>
Thanks in advance!
Expected Output:
Say the user requests config2; then the output should look like:
<request>
    <password>pass1</password>
    <userName>username1</userName>
    <appID>someapp</appID>
</request>
(I send this XML using httplib2 to a server for initial authentication)
FINAL CODE:
Thanks to FC and Constantnius. Here is the final code:
config_tree = etree.parse(token_template_file)
for template in config_tree.iterfind("template"):
    if template.get("name") == "config2":
        element = etree.tostring(template.find("request"))
        print (template.get("name"))
        print (element)
output:
config2
<request>
    <password>pass1</password>
    <userName>username1</userName>
    <appID>someapp</appID>
</request>
You could try to iterate over all template elements in the XML and parse them with the following code:
for template in root.iterfind("template"):
    name = template.get("name")
    request = template.find("request")
    password = template.findtext("request/password")
    username = ...
    ...
    # Do something with the values
You could try using get('name', default='') instead of ['name'].
To get the text in the tag, use .text.
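A minimal sketch of that suggestion applied to the original iterparse loop (iterparse fires end events, so the children of a template are already parsed by the time it is reached; token_template_file is the same file as in the question):

config_tree = etree.iterparse(token_template_file)
for event, element in config_tree:
    if element.get('name', default='') == "config2":
        print (etree.tostring(element.find("request")))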
I am new to Python and am trying to parse an XML document (filled with documents for a Solr instance) into a Python dictionary. I am having trouble trying to actually accomplish this. I have tried using ElementTree and minidom but I can't seem to get the right results.
Here is my XML Structure:
<add>
    <doc>
        <field name="genLatitude">45.639968</field>
        <field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
        <field name="genLongitude">5.879745</field>
    </doc>
    <doc>
        <field name="genLatitude">46.639968</field>
        <field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
        <field name="genLongitude">6.879745</field>
    </doc>
</add>
And from this I need to turn it into a dictionary that looks like:
doc {
    "genLatitude": '45.639968',
    "carOfficeHoursEnd": '2000-01-01T09:00:00.000Z',
    "genLongitude": '5.879745',
}
I am not too familiar with how dictionaries work, but is there also a way to get all the "docs" into one dictionary?
cheers.
import xml.etree.cElementTree as etree
from pprint import pprint
root = etree.fromstring(xmlstr) # or etree.parse(filename_or_file).getroot()
docs = [{f.attrib['name']: f.text for f in doc.iterfind('field[@name]')}
        for doc in root.iterfind('doc')]
pprint(docs)
Output
[{'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z',
'genLatitude': '45.639968',
'genLongitude': '5.879745'},
{'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z',
'genLatitude': '46.639968',
'genLongitude': '6.879745'}]
Where xmlstr is:
xmlstr = """
<add>
<doc>
<field name="genLatitude">45.639968</field>
<field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
<field name="genLongitude">5.879745</field>
</doc>
<doc>
<field name="genLatitude">46.639968</field>
<field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
<field name="genLongitude">6.879745</field>
</doc>
</add>
"""
Solr can return a Python dictionary if you add wt=python to the request parameters. To convert this text response into a Python object, use ast.literal_eval(text_response).
This is much simpler than parsing the XML.
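A rough sketch of what that could look like in Python 2 (the Solr URL here is only a placeholder; adjust it for your own instance):

import ast
import urllib2

raw = urllib2.urlopen('http://localhost:8983/solr/select?q=*:*&wt=python').read()
solr_response = ast.literal_eval(raw)
print solr_response['response']['docs']  # a list of dicts, one per document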
A possible solution using ElementTree, with the output pretty-formatted for the sake of example:
>>> import xml.etree.ElementTree as etree
>>> root = etree.parse(document).getroot()
>>> docs = []
>>> for doc in root.findall('doc'):
...     fields = {}
...     for field in doc:
...         fields[field.attrib['name']] = field.text
...     docs.append(fields)
...
>>> print docs
[{'genLongitude': '5.879745',
'genLatitude': '45.639968',
'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z'},
{'genLongitude': '6.879745',
'genLatitude': '46.639968',
'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z'}]
The XML document you show does not provide a way to distinguish each doc from the other, so I would maintain that a list is the best structure to collect each dictionary.
Indeed, if you want to insert each doc data into another dictionary, of course you can, but you need to choose a suitable key for that dictionary. For example, using the id Python provides for each object, you could write:
>>> docs = {}
>>> for doc in root.findall('doc'):
...     fields = {}
...     for field in doc:
...         fields[field.attrib['name']] = field.text
...     docs[id(fields)] = fields
...
>>> print docs
{3076930796L: {'genLongitude': '6.879745',
'genLatitude': '46.639968',
'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z'},
3076905540L: {'genLongitude': '5.879745',
'genLatitude': '45.639968',
'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z'}}
This example is designed just to let you see how to use the outer dictionary. If you decide to go down this path, I would suggest finding a meaningful and usable key instead of the object's memory address returned by id, which can change from run to run.
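For instance, a hypothetical variant keyed by each doc's position in the file, which at least stays stable across runs (unlike id()):

>>> docs = {}
>>> for position, doc in enumerate(root.findall('doc')):
...     fields = {}
...     for field in doc:
...         fields[field.attrib['name']] = field.text
...     docs[position] = fields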
It's risky to eval any string that comes from the outside directly into python. Who knows what's in there.
I'd suggest using the json interface. Something like:
import json
import urllib2
response_dict = json.loads(urllib2.urlopen('http://localhost:8080/solr/combined/select?wt=json&q=*&rows=1').read())
# to view the dict
print json.dumps(response_dict, indent=1)