How to handle web SQL queries and XML replies in Python

I have a remote database to which I can send SQL select queries through a web service, like this:
http://aa.bb.cc.dd:85/SQLWEB?query=select+*+from+machine&output=xml_v2
which returns
<Query>
<SQL></SQL>
<Fields>
<MACHINEID DataType="Integer" DataSize="4"/>
<NAME DataType="WideString" DataSize="62"/>
<MACHINECLASSID DataType="Integer" DataSize="4"/>
<SUBMACHINECLASS DataType="WideString" DataSize="22"/>
<DISABLED DataType="Integer" DataSize="4"/>
</Fields>
<Record>
<MACHINEID>1</MACHINEID>
<NAME>LOADER</NAME>
<MACHINECLASSID>16</MACHINECLASSID>
<SUBMACHINECLASS>A</SUBMACHINECLASS>
<DISABLED>0</DISABLED>
</Record>
<Record>
...
</Record>
...
</Query>
Then I need to insert the records into a local SQL database.
What's the easiest way? Thanks!

First of all, putting raw SQL queries in the URL is a bad idea for security: anyone who can reach the service can send it whatever query they like.
Use an XML library to parse the reply, then iterate over the records and add each one to the database. In this reply the values are child elements of each Record, not attributes, so read them with findtext() rather than get():
import xml.etree.ElementTree as ET
tree = ET.parse('xml file')
root = tree.getroot()
# root = ET.fromstring(xml_string) if you have the reply as a string
for record in root.findall('Record'):
    machine_id = record.findtext('MACHINEID')
    name = record.findtext('NAME')
    machine_class_id = record.findtext('MACHINECLASSID')
    sub_machine_class = record.findtext('SUBMACHINECLASS')
    disabled = record.findtext('DISABLED')
    # your code to add this result to the db
ElementTree XML API
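As a rough sketch of the "add this result to the db" step, the following uses the standard-library sqlite3 module as a stand-in for the local SQL database. The table layout, the helper names rows_from_xml and store_rows, and the in-memory database are assumptions made for illustration; the sample reply is a shortened copy of the one in the question.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Column names taken from the <Fields> section of the reply in the question.
COLUMNS = ("MACHINEID", "NAME", "MACHINECLASSID", "SUBMACHINECLASS", "DISABLED")


def rows_from_xml(xml_text):
    """Return one tuple per <Record>, reading each value from a child element."""
    root = ET.fromstring(xml_text)
    return [tuple(record.findtext(col) for col in COLUMNS)
            for record in root.findall("Record")]


def store_rows(conn, rows):
    """Create the (hypothetical) local table if needed and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS machine ("
        "MACHINEID INTEGER, NAME TEXT, MACHINECLASSID INTEGER, "
        "SUBMACHINECLASS TEXT, DISABLED INTEGER)"
    )
    # Parameter placeholders keep the values out of the SQL text itself.
    conn.executemany("INSERT INTO machine VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


sample = """<Query><SQL></SQL>
<Record><MACHINEID>1</MACHINEID><NAME>LOADER</NAME>
<MACHINECLASSID>16</MACHINECLASSID><SUBMACHINECLASS>A</SUBMACHINECLASS>
<DISABLED>0</DISABLED></Record>
</Query>"""

conn = sqlite3.connect(":memory:")  # assumption: swap in your real database
store_rows(conn, rows_from_xml(sample))
stored = conn.execute("SELECT * FROM machine").fetchall()
print(stored)
```

In real use the reply text would presumably come from the web service, for example via urllib.request.urlopen(url).read(), and the connection would point at your actual local database.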

Related

python, xml: how to access the 3rd child by element' name

Would you please help me access the element named 'id' in the following structure in Python? (I have the lxml and xml.etree.ElementTree libraries.)
Desired result: '0000000'
Desired method:
Search the XML document for a child named fcsProtocolEF3.
Search fcsProtocolEF3 for an element named 'id'.
It is crucial to search by element name, not by ordinal position.
I tried something like this: tree.findall('{http://zakupki.gov.ru/oos/export/1}fcsProtocolEF3')[0].findall('{http://zakupki.gov.ru/oos/types/1}id')[0].text
It works, but it requires the namespaces as input. The XML documents have different namespaces and I don't know how to determine them beforehand.
Thank you.
It would be great to use something like XQuery in SQL:
value('(/*:export/*:fcsProtocolEF3/*:id)[1]', 'nvarchar(21)')) AS [id],
XML-document:
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>
lxml solution:
xml = '''<?xml version="1.0"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>'''
from lxml import etree as et
root = et.fromstring(xml)
text = root.xpath('//*[local-name()="export"]/*[local-name()="fcsProtocolEF3"]/*[local-name()="id"]/text()')[0]
print(text)
Below is an ElementTree-based solution. It spells out the namespaces explicitly.
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>
'''
def get_id_text():
    root = ET.fromstring(xml)
    fcs = root.find('{http://zakupki.gov.ru/oos/export/1}fcsProtocolEF3')
    # assuming there is one fcs element and one id under fcs
    return fcs.find('{http://zakupki.gov.ru/oos/types/1}id').text
print(get_id_text())
output
0000000
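If the namespaces are not known beforehand and only the standard library is available, one possible approach is to compare local names directly, similar in spirit to the local-name() XPath above. This is only a sketch: the helpers local_name and find_child_by_local_name are made up for illustration, and the sample document is trimmed to the two namespace declarations it actually uses.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag, if present."""
    return tag.rsplit('}', 1)[-1]


def find_child_by_local_name(parent, name):
    """Return the first direct child whose local name matches, else None."""
    for child in parent:
        if isinstance(child.tag, str) and local_name(child.tag) == name:
            return child
    return None


XML = '''<ns2:export xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>'''

root = ET.fromstring(XML)
fcs = find_child_by_local_name(root, 'fcsProtocolEF3')
id_text = find_child_by_local_name(fcs, 'id').text
print(id_text)
```

The lookup is by element name only, so it keeps working when the namespace URIs change between documents. The trade-off is that two elements with the same local name in different namespaces are no longer distinguished.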

How to import specific data from an XML file and process it using python 3.5?

I have an xml file containing the following code:
<?xml version="1.0" encoding="windows-1252" standalone="yes"?><!-- Generated by SMExport 4.88--><ROOT> <RECORDS>
<METADATA><FIELDS><FIELD attrname="SALENUM" fieldtype="i4"/><FIELD attrname="TIME" fieldtype="time"/><FIELD attrname="DATE" fieldtype="date"/><FIELD attrname="AMOUNT" fieldtype="r8" SUBTYPE="Money"/></FIELDS><PARAMS DEFAULT_ORDER="1" PRIMARY_KEY="1" LCID="1033"/></METADATA>
<RECORD>
<ROW
SALENUM="1"
TIME="125108"
DATE="20160122"
AMOUNT="22.9"
/>
</RECORD>
<RECORD>
<ROW
SALENUM="2"
TIME="125243"
DATE="20160122"
AMOUNT="22.9"
/>
</RECORD>
plus a whole lot of other records.
Question: What is the simplest way to import this data and process it in python? I am trying to use xml.etree.ElementTree, what I would like to do in the end is add up the sales prices and store that information in a variable.
Any ideas?
Use .findall() to locate the ROW elements inside RECORD elements and .attrib to access the AMOUNT attribute. Complete example:
import xml.etree.ElementTree as ET
data = """<?xml version="1.0" encoding="windows-1252" standalone="yes"?><!-- Generated by SMExport 4.88--><ROOT> <RECORDS>
<METADATA><FIELDS><FIELD attrname="SALENUM" fieldtype="i4"/><FIELD attrname="TIME" fieldtype="time"/><FIELD attrname="DATE" fieldtype="date"/><FIELD attrname="AMOUNT" fieldtype="r8" SUBTYPE="Money"/></FIELDS><PARAMS DEFAULT_ORDER="1" PRIMARY_KEY="1" LCID="1033"/></METADATA>
<RECORD>
<ROW
SALENUM="1"
TIME="125108"
DATE="20160122"
AMOUNT="22.9"
/>
</RECORD>
<RECORD>
<ROW
SALENUM="2"
TIME="125243"
DATE="20160122"
AMOUNT="22.9"
/>
</RECORD>
</RECORDS>
</ROOT>"""
root = ET.fromstring(data)
amounts = [float(row.attrib["AMOUNT"]) for row in root.findall(".//RECORD/ROW")]
print(amounts)
Prints:
[22.9, 22.9]
Then, you can use the built-in sum() to add up the amounts: sum(amounts).
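Since AMOUNT is marked as a Money field, it may be worth considering decimal.Decimal instead of float so that the total is not affected by binary rounding. The following is a small sketch of that variation, using a trimmed copy of the sample data; whether exact decimal arithmetic matters depends on what you do with the total.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

data = """<ROOT><RECORDS>
<RECORD><ROW SALENUM="1" TIME="125108" DATE="20160122" AMOUNT="22.9"/></RECORD>
<RECORD><ROW SALENUM="2" TIME="125243" DATE="20160122" AMOUNT="22.9"/></RECORD>
</RECORDS></ROOT>"""

root = ET.fromstring(data)

# Build each Decimal from the attribute string, never from a float,
# so "22.9" stays exactly 22.9.
total = sum(
    (Decimal(row.attrib["AMOUNT"]) for row in root.findall(".//RECORD/ROW")),
    Decimal("0"),
)
print(total)
```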

How to use python to parse XML to required custom fields

I've got a directory full of Salesforce objects in XML format. I'd like to identify the <fullName> and parent file of all the custom <fields> where <required> is true. Here is some truncated sample data; let's call it "Custom_Object__c":
<?xml version="1.0" encoding="UTF-8"?>
<CustomObject xmlns="http://soap.sforce.com/2006/04/metadata">
<deprecated>false</deprecated>
<description>descriptiontext</description>
<fields>
<fullName>custom_field1</fullName>
<required>false</required>
<type>Text</type>
<unique>false</unique>
</fields>
<fields>
<fullName>custom_field2</fullName>
<deprecated>false</deprecated>
<visibleLines>5</visibleLines>
</fields>
<fields>
<fullName>custom_field3</fullName>
<required>false</required>
</fields>
<fields>
<fullName>custom_field4</fullName>
<deprecated>false</deprecated>
<description>custom field 4 description</description>
<externalId>true</externalId>
<required>true</required>
<scale>0</scale>
<type>Number</type>
<unique>false</unique>
</fields>
<fields>
<fullName>custom_field5</fullName>
<deprecated>false</deprecated>
<description>Creator of this log message. Application-specific.</description>
<externalId>true</externalId>
<label>Origin</label>
<length>255</length>
<required>true</required>
<type>Text</type>
<unique>false</unique>
</fields>
<label>App Log</label>
<nameField>
<displayFormat>LOG-{YYYYMMDD}-{00000000}</displayFormat>
<label>Entry ID</label>
<type>AutoNumber</type>
</nameField>
</CustomObject>
The desired output would be a dictionary with format something like:
required_fields = {'Custom_Object__1': 'custom_field4', 'Custom_Object__1': 'custom_field5',... etc for all the required fields in all files in the folder.}
or anything similar.
I've already gotten my list of objects through glob.glob, and I can get a list of all the children and their attributes with ElementTree but I'm struggling past there. I feel like I'm very close but I'd love a hand finishing this task off. Here is my code so far:
import os
import glob
import xml.etree.ElementTree as ET
os.chdir("/Users/paulsallen/workspace/fforce/FForce Dev Account/config/objects/")
objs = []
for file in glob.glob("*.object"):
    objs.append(file)
fields_dict = {}
for object in objs:
    root = ET.parse(object).getroot()  # parse one file at a time, not the whole list
    ....
and once I get the XML data parsed I don't know where to take it from there.
You really want to switch to using lxml here, because then you can use an XPath query:
import glob
import os
from lxml import etree as ET
os.chdir("/Users/paulsallen/workspace/fforce/FForce Dev Account/config/objects/")
objs = glob.glob("*.object")
fields_dict = {}
for filename in objs:
    root = ET.parse(filename).getroot()
    # root.nsmap[None] is the document's default namespace URI
    required = root.xpath('.//n:fullName[../n:required/text()="true"]/text()',
                          namespaces={'n': root.nsmap[None]})
    fields_dict[os.path.splitext(filename)[0]] = required
With that code you end up with a dictionary of lists; each key is a filename (without the extension), each value is a list of required fields.
The XPath query looks for fullName elements in the default namespace that have a sibling required element containing the text 'true'. It then takes the text of each matching element, which gives a list we can store in the dictionary.
Use this function to find all required fields under a given root. It should also help as an example or starting point for future parsing needs:
def find_required_fields(root):
    NS = {'soap': 'http://soap.sforce.com/2006/04/metadata'}
    required_fields = []
    for field in root.findall('soap:fields', namespaces=NS):
        required = field.findtext('soap:required', namespaces=NS) == "true"
        name = field.findtext('soap:fullName', namespaces=NS)
        if required:
            required_fields.append(name)
    return required_fields
Example usage:
>>> import xml.etree.ElementTree as ET
>>> root = ET.parse('objects.xml').getroot()  # where objects.xml contains the example in the question
>>> print(find_required_fields(root))
['custom_field4', 'custom_field5']
>>>
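To get from a single root to the dictionary the question asks for, the same idea can be applied to every *.object file in a directory. The following standard-library sketch assumes that layout; the function names and the "sf" prefix are arbitrary choices for illustration, and only the namespace URI comes from the sample file.

```python
import glob
import os
import xml.etree.ElementTree as ET

# The prefix "sf" is arbitrary; only the URI has to match the document.
NS = {"sf": "http://soap.sforce.com/2006/04/metadata"}


def required_fields(root):
    """Return the fullName of every <fields> entry whose <required> is 'true'."""
    return [
        field.findtext("sf:fullName", namespaces=NS)
        for field in root.findall("sf:fields", NS)
        if field.findtext("sf:required", namespaces=NS) == "true"
    ]


def required_fields_by_object(directory):
    """Map each object file name (without extension) to its required fields."""
    result = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.object"))):
        name = os.path.splitext(os.path.basename(path))[0]
        result[name] = required_fields(ET.parse(path).getroot())
    return result
```

Calling required_fields_by_object on the objects directory should give a dictionary of lists, one list per file. Fields that have no required element at all, such as custom_field2 in the sample, are simply skipped because findtext returns None for them.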

parsing XML configuration file using Etree in python

Please help me parse a configuration file of the below prototype using lxml etree. I tried with for event, element with tostring. Unfortunately I don't need the text, but the XML between
<template name>
<config>
</template>
for a given attribute.
I started with this code, but get a key error while searching for the attribute since it scans from start
config_tree = etree.iterparse(token_template_file)
for event, element in config_tree:
    if element.attrib['name']=="ad auth":
        print ("attrib reached. get XML before child ends")
Since I am a newbie to XML and python, I am not sure how to go about it. Here is the config file:
<Templates>
<template name="config1">
<request>
<password>pass</password>
<userName>username</userName>
<appID>someapp</appID>
</request>
</template>
<template name="config2">
<request>
<password>pass1</password>
<userName>username1</userName>
<appID>someapp</appID>
</request>
</template>
</Templates>
Thanks in advance!
Expected Output:
Say the user requests the config2- then the output should look like:
<request>
<password>pass1</password>
<userName>username1</userName>
<appID>someapp</appID>
</request>
(I send this XML using httplib2 to a server for initial authentication)
FINAL CODE:
thanks to FC and Constantnius. Here is the final code:
config_tree = etree.parse(token_template_file)
for template in config_tree.iterfind("template"):
    if template.get("name") == "config2":
        element = etree.tostring(template.find("request"))
        print (template.get("name"))
        print (element)
output:
config2
<request>
<password>pass1</password>
<userName>username1</userName>
<appID>someapp</appID>
</request>
You could try to iterate over all template elements in the XML and parse them with the following code:
for template in root.iterfind("template"):
    name = template.get("name")
    request = template.find("request")
    password = template.findtext("request/password")
    username = ...
    ...
    # Do something with the values
You could try using element.get('name', default='') instead of element.attrib['name'], which raises a KeyError on elements that have no name attribute.
To get the text inside a tag, use .text.
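As an alternative to looping over every template, the lookup can be written as a single path with an attribute predicate, which the standard-library ElementTree supports. This is only a sketch: the helper name request_xml is made up, and the config is inlined as a string here instead of being read from token_template_file.

```python
import xml.etree.ElementTree as ET

CONFIG = """<Templates>
<template name="config1">
<request>
<password>pass</password>
<userName>username</userName>
<appID>someapp</appID>
</request>
</template>
<template name="config2">
<request>
<password>pass1</password>
<userName>username1</userName>
<appID>someapp</appID>
</request>
</template>
</Templates>"""


def request_xml(root, template_name):
    """Return the <request> subtree of the named template as an XML string."""
    # Note: this simple formatting assumes the name contains no single quote.
    request = root.find("template[@name='{}']/request".format(template_name))
    if request is None:
        raise KeyError(template_name)
    # encoding="unicode" makes tostring return str rather than bytes.
    return ET.tostring(request, encoding="unicode")


root = ET.fromstring(CONFIG)
config2 = request_xml(root, "config2")
print(config2)
```

Raising KeyError for an unknown template name is a design choice for the sketch; returning None or a default request would work just as well.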

Parsing Solr XML into Python Dictionary

I am new to python and am trying to pass an xml document (filled with documents for a solr instance) into a python dictionary. I am having trouble trying to actually accomplish this. I have tried using ElementTree and minidom but I can't seem to get the right results.
Here is my XML Structure:
<add>
<doc>
<field name="genLatitude">45.639968</field>
<field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
<field name="genLongitude">5.879745</field>
</doc>
<doc>
<field name="genLatitude">46.639968</field>
<field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
<field name="genLongitude">6.879745</field>
</doc>
</add>
And From this I need to turn it into a dictionary that looks like:
doc {
"genLatitude": '45.639968',
"carOfficeHoursEnd": '2000-01-01T09:00:00.000Z',
"genLongitude": '5.879745',
}
I am not too familiar with how dictionaries work but is there also a way to get all the "docs" into one dictionary.
cheers.
import xml.etree.ElementTree as etree
from pprint import pprint
root = etree.fromstring(xmlstr) # or etree.parse(filename_or_file).getroot()
# field[@name] selects only <field> elements that have a name attribute
docs = [{f.attrib['name']: f.text for f in doc.iterfind('field[@name]')}
        for doc in root.iterfind('doc')]
pprint(docs)
Output
[{'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z',
'genLatitude': '45.639968',
'genLongitude': '5.879745'},
{'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z',
'genLatitude': '46.639968',
'genLongitude': '6.879745'}]
Where xmlstr is:
xmlstr = """
<add>
<doc>
<field name="genLatitude">45.639968</field>
<field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
<field name="genLongitude">5.879745</field>
</doc>
<doc>
<field name="genLatitude">46.639968</field>
<field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
<field name="genLongitude">6.879745</field>
</doc>
</add>
"""
Solr can return a Python dictionary if you add wt=python to the request parameters. To convert this text response into a Python object, use ast.literal_eval(text_response).
This is much simpler than parsing the XML.
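A small sketch of the ast.literal_eval step follows. The response text is a made-up, heavily shortened stand-in and not real Solr output. Unlike eval(), literal_eval only accepts Python literals (strings, numbers, lists, dicts and so on), so an expression such as a function call is rejected instead of executed.

```python
import ast

# Hypothetical, shortened response text; a real wt=python reply is larger.
text_response = (
    "{'responseHeader': {'status': 0},"
    " 'response': {'numFound': 2,"
    " 'docs': [{'genLatitude': '45.639968', 'genLongitude': '5.879745'}]}}"
)

parsed = ast.literal_eval(text_response)
first_doc = parsed['response']['docs'][0]
print(first_doc['genLatitude'])

# Anything that is not a plain literal raises ValueError instead of running.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
print(rejected)
```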
A possible solution using ElementTree, with output pretty formatted for sake of example:
>>> import xml.etree.ElementTree as etree
>>> root = etree.parse(document).getroot()
>>> docs = []
>>> for doc in root.findall('doc'):
...     fields = {}
...     for field in doc:
...         fields[field.attrib['name']] = field.text
...     docs.append(fields)
...
>>> print(docs)
[{'genLongitude': '5.879745',
'genLatitude': '45.639968',
'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z'},
{'genLongitude': '6.879745',
'genLatitude': '46.639968',
'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z'}]
The XML document you show does not provide a way to distinguish each doc from the other, so I would maintain that a list is the best structure to collect each dictionary.
Indeed, if you want to insert each doc data into another dictionary, of course you can, but you need to choose a suitable key for that dictionary. For example, using the id Python provides for each object, you could write:
>>> docs = {}
>>> for doc in root.findall('doc'):
...     fields = {}
...     for field in doc:
...         fields[field.attrib['name']] = field.text
...     docs[id(fields)] = fields
...
>>> print(docs)
{3076930796L: {'genLongitude': '6.879745',
'genLatitude': '46.639968',
'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z'},
3076905540L: {'genLongitude': '5.879745',
'genLatitude': '45.639968',
'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z'}}
This example is only meant to show how to use the outer dictionary. If you decide to go down this path, I would suggest finding a meaningful and usable key instead of the object's memory address returned by id, which can change from run to run.
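As one illustration of a more meaningful key, the sketch below keys each document by its (genLatitude, genLongitude) pair. That choice is purely an assumption: it only works if every doc has both fields and no two docs share the same coordinates.

```python
import xml.etree.ElementTree as ET

xmlstr = """
<add>
<doc>
<field name="genLatitude">45.639968</field>
<field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
<field name="genLongitude">5.879745</field>
</doc>
<doc>
<field name="genLatitude">46.639968</field>
<field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
<field name="genLongitude">6.879745</field>
</doc>
</add>
"""

root = ET.fromstring(xmlstr)
docs = {}
for doc in root.findall("doc"):
    fields = {f.attrib["name"]: f.text for f in doc.findall("field")}
    # Assumed key: the coordinate pair, which is stable from run to run.
    key = (fields["genLatitude"], fields["genLongitude"])
    docs[key] = fields

print(docs[("45.639968", "5.879745")]["carOfficeHoursEnd"])
```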
It's risky to eval any string that comes from the outside directly into python. Who knows what's in there.
I'd suggest using the json interface. Something like:
import json
from urllib.request import urlopen
response_dict = json.loads(urlopen('http://localhost:8080/solr/combined/select?wt=json&q=*&rows=1').read())
# to view the dict
print(json.dumps(response_dict, indent=1))
