Prepending nightmare - python

I wrote some code to connect with FileMaker Server and ask for a new record to be created on the backend server. This portion works fine and the result contains unexpected text that does not show up in a browser. I couldn't get the find operations to work by looking at the tree in a browser so I ended up printing the tags and attributes out and every tag attribute pair has this format:
Tag: {http://www.filemaker.com/xml/fmresultset}error Attrib: {'code': '0'}
In my code below I'm having to put the fully qualified tag into the code to get the first find to work. This makes it harder to figure out the xPath to the object that I want to get from the XML. The second find below doesn't work since I can't figure out the path. The path should be /resultset/record/ and the attribute is record-id. Anybody have an idea of what is happening. Why is the fmresultset document prepended to every tag?
url = self.getBaseURL( machineID, file, lay ) + fields + "&-new"
result = self.sendURL( url )
import xml.etree.ElementTree as ET
root = ET.fromstring(result)
self.printXML( root )
base = "{http://www.filemaker.com/xml/fmresultset}"
find = root.find( base + "error" )
error = find.attrib[ 'code' ]
recID = 0
if ( error == '0' ):
find = root.find( base + "resultset" + base + "/record" )
recID = find.attrib[ 'record-id' ]
Keith

OK... You mentioned the word namespace and I figured it out. This uses the element version of root.
def getXMLValue( self, root, path, value ):
ns = {'fmrs': 'http://www.filemaker.com/xml/fmresultset' }
find = root.find( path, ns )
value = find.attrib[ value ]
return value
def getError ( self, root ):
return self.getXMLValue ( root, "fmrs:error", 'code' )
def getRecID ( self, root ):
return self.getXMLValue ( root, "fmrs:resultset/fmrs:record", 'record-id' )
This code correctly returns the correct value. I was expecting something like. Create an instance of the class then call setNamespace(). I'm having to put all these references to fmrs which I would rather not do but this solves the issues and the code isn't horrible to read.
Thanks for the assist.
-keith

Related

Assigning the reference of one object to a variable does not work in a python script but works in command line

This simple script extracts a hierarchy from a table. It's straightforward if you look at the code. Every record has an ID and a parent ID, the script associates in the dictionary accDictionary the ID of an account to its path. A "path" is the concatenation of the account names starting from the root.
The script gives this error:
longAcctName = curAccount['name'] + '.' + longAcctName;
TypeError: 'NoneType' object has no attribute '__getitem__'
The problem is that curAccount is not assigned to a, but there is a line in the code that should do that.
If I type curAccount['name'] in the console after the error (I use IDLE editor), I obtain the same result, but if I type curAccount = a and then curAccount['name'], the behavior is correct, because I see the same name I'd see if I typed a['name'], because now a and curAccount reference to the same object in memory.
Where's the error?
Here is the code:
import sqlite3;
conn = sqlite3.connect('export-2014-11-29.sqlite3');
conn.row_factory = sqlite3.Row;
accCur = conn.cursor();
accNameCur = conn.cursor();
accDictionary = dict();
accounts = accCur.execute('SELECT name, guid, parent_guid FROM accounts');
for a in accounts:
longAcctName = '';
curAccount = a;
while (True):
longAcctName = curAccount['name'] + '.' + longAcctName;
if (curAccount['parent_guid'] == ''):
print 'Questo conto non ha padre.';
break;
accNameCur.execute('SELECT name, guid, parent_guid FROM accounts WHERE guid=?', (curAccount['parent_guid'],));
curAccount = accNameCur.fetchone();
accDictionary[a['guid']] = longAcctName;
The problem appears to be curAccount = accNameCur.fetchone();
You have no check in place for when you've run out of records, so you just keep trying to fetch. The fetchone method will return None when there's no data remaining to get from the cursor. You should probably change your while True: to while curAccount: or something of that nature.

How to use the Typed unmarshaller in suds?

I have existing code that processes the output from suds.client.Client(...).service.GetFoo(). Now that part of the flow has changed and we are no longer using SOAP, instead receiving the same XML through other channels. I would like to re-use the existing code by using the suds Typed unmarshaller, but so far have not been successful.
I came 90% of the way using the Basic unmarshaller:
tree = suds.umx.basic.Basic().process(xmlroot)
This gives me the nice tree of objects with attributes, so that the pre-existing code can access tree[some_index].someAttribute, but the value will of course always be a string, rather than an integer or date or whatever, so the code can still not be re-used as-is.
The original class:
class SomeService(object):
def __init__(self):
self.soap_client = Client(some_wsdl_url)
def GetStuff(self):
return self.soap_client.service.GetStuff()
The drop-in replacement that almost works:
class SomeSourceUntyped(object):
def __init__(self):
self.url = some_url
def GetStuff(self):
xmlfile = urllib2.urlopen(self.url)
xmlroot = suds.sax.parser.Parser().parse(xmlfile)
if xmlroot:
# because the parser creates a document root above the document root
tree = suds.umx.basic.Basic().process(xmlroot)[0]
else:
tree = None
return tree
My vain effort to understand suds.umx.typed.Typed():
class SomeSourceTyped(object):
def __init__(self):
self.url = some_url
self.schema_file_name =
os.path.realpath(os.path.join(os.path.dirname(__file__),'schema.xsd'))
with open(self.schema_file_name) as f:
self.schema_node = suds.sax.parser.Parser().parse(f)
self.schema = suds.xsd.schema.Schema(self.schema_node, "", suds.options.Options())
self.schema_query = suds.xsd.query.ElementQuery(('http://example.com/namespace/','Stuff'))
self.xmltype = self.schema_query.execute(self.schema)
def GetStuff(self):
xmlfile = urllib2.urlopen(self.url)
xmlroot = suds.sax.parser.Parser().parse(xmlfile)
if xmlroot:
unmarshaller = suds.umx.typed.Typed(self.schema)
# I'm still running into an exception, so obviously something is missing:
# " Exception: (document, None, ), must be qref "
# Do I need to call the Parser differently?
tree = unmarshaller.process(xmlroot, self.xmltype)[0]
else:
tree = None
return tree
This is an obscure one.
Bonus caveat: Of course I am in a legacy system that uses suds 0.3.9.
EDIT: further evolution on the code, found how to create SchemaObjects.

Python ElementTree

Having trouble with XML config files using ElementTree. I want to have an easy way to find the text of an element regardless of where it is in the XML Tree. From what the documentation says, I should be able to do this with findtext(), but no matter what, I get a return of None. Where am I going wrong here? Everyone was telling me XML is so simple to handle in Python, yet I have had nothing but troubles.
configFileName = 'file.xml'
def configSet (x):
if os.path.exists(configFileName):
tree = ET.parse(configFileName)
root = tree.getroot()
return root.findtext(x)
hiTemp = configSet('hiTemp')
print hiTemp
and the XML
<configData>
<units>
<temp>F</temp>
</units>
<pins>
<lights>1</lights>
<fan>2</fan>
<co2>3</co2>
</pins>
<events>
<airTemps>
<hiTemp>80</hiTemp>
<lowTemp>72</lowTemp>
<hiTempAlarm>84</hiTempAlarm>
</airTemps>
<CO2>
<co2Hi>1500</co2Hi>
<co2Low>1400</co2Low>
<co2Alarm>600</co2Alarm>
</CO2>
</events>
<settings>
<apikeys>
<prowl>
<apikey>None</apikey>
</prowl>
</apikeys>
</settings>
expected result
80
actual result
None
findtext requires a full path, but you have given a relative path, so you cannot find the element you are looking for.
You can either provide a good xpath or modify your code
def configSet(x):
if os.path.exists(configFileName):
tree = ET.parse(configFileName)
root = tree.getroot()
for e in root.getiterator():
t = e.findtext(x)
if t is not None:
return t
Update 1:
If you want to have all matched text as a list, the code is a bit different.
def configSet(x):
matches = []
if os.path.exists(configFileName):
tree = ET.parse(configFileName)
root = tree.getroot()
for e in root.getiterator():
t = e.findtext(x)
if t is not None:
matches.append(t)
return matches
You can use xpath to get to your desired element.
return root.find('./events/airTemps/hiTemp').text
There's easy to follow documentation here.

docutils/sphinx custom directive creating sibling section rather than child

Consider a reStructuredText document with this skeleton:
Main Title
==========
text text text text text
Subsection
----------
text text text text text
.. my-import-from:: file1
.. my-import-from:: file2
The my-import-from directive is provided by a document-specific Sphinx extension, which is supposed to read the file provided as its argument, parse reST embedded in it, and inject the result as a section in the current input file. (Like autodoc, but for a different file format.) The code I have for that, right now, looks like this:
class MyImportFromDirective(Directive):
required_arguments = 1
def run(self):
src, srcline = self.state_machine.get_source_and_line()
doc_file = os.path.normpath(os.path.join(os.path.dirname(src),
self.arguments[0]))
self.state.document.settings.record_dependencies.add(doc_file)
doc_text = ViewList()
try:
doc_text = extract_doc_from_file(doc_file)
except EnvironmentError as e:
raise self.error(e.filename + ": " + e.strerror) from e
doc_section = nodes.section()
doc_section.document = self.state.document
# report line numbers within the nested parse correctly
old_reporter = self.state.memo.reporter
self.state.memo.reporter = AutodocReporter(doc_text,
self.state.memo.reporter)
nested_parse_with_titles(self.state, doc_text, doc_section)
self.state.memo.reporter = old_reporter
if len(doc_section) == 1 and isinstance(doc_section[0], nodes.section):
doc_section = doc_section[0]
# If there was no title, synthesize one from the name of the file.
if len(doc_section) == 0 or not isinstance(doc_section[0], nodes.title):
doc_title = nodes.title()
doc_title.append(make_title_text(doc_file))
doc_section.insert(0, doc_title)
return [doc_section]
This works, except that the new section is injected as a child of the current section, rather than a sibling. In other words, the example document above produces a TOC tree like this:
Main Title
Subsection
File1
File2
instead of the desired
Main Title
Subsection
File1
File2
How do I fix this? The Docutils documentation is ... inadequate, particularly regarding control of section depth. One obvious thing I have tried is returning doc_section.children instead of [doc_section]; that completely removes File1 and File2 from the TOC tree (but does make the section headers in the body of the document appear to be for the right nesting level).
I don't think it is possible to do this by returning the section from the directive (without doing something along the lines of what Florian suggested), as it will get appended to the 'current' section. You can, however, add the section via self.state.section as I do in the following (handling of options removed for brevity)
class FauxHeading(object):
"""
A heading level that is not defined by a string. We need this to work with
the mechanics of
:py:meth:`docutils.parsers.rst.states.RSTState.check_subsection`.
The important thing is that the length can vary, but it must be equal to
any other instance of FauxHeading.
"""
def __init__(self, length):
self.length = length
def __len__(self):
return self.length
def __eq__(self, other):
return isinstance(other, FauxHeading)
class ParmDirective(Directive):
required_arguments = 1
optional_arguments = 0
has_content = True
option_spec = {
'type': directives.unchanged,
'precision': directives.nonnegative_int,
'scale': directives.nonnegative_int,
'length': directives.nonnegative_int}
def run(self):
variableName = self.arguments[0]
lineno = self.state_machine.abs_line_number()
secBody = None
block_length = 0
# added for some space
lineBlock = nodes.line('', '', nodes.line_block())
# parse the body of the directive
if self.has_content and len(self.content):
secBody = nodes.container()
block_length += nested_parse_with_titles(
self.state, self.content, secBody)
# keeping track of the level seems to be required if we want to allow
# nested content. Not sure why, but fits with the pattern in
# :py:meth:`docutils.parsers.rst.states.RSTState.new_subsection`
myLevel = self.state.memo.section_level
self.state.section(
variableName,
'',
FauxHeading(2 + len(self.options) + block_length),
lineno,
[lineBlock] if secBody is None else [lineBlock, secBody])
self.state.memo.section_level = myLevel
return []
I don't know how to do it directly inside your custom directive. However, you can use a custom transform to raise the File1 and File2 nodes in the tree after parsing. For example, see the transforms in the docutils.transforms.frontmatter module.
In your Sphinx extension, use the Sphinx.add_transform method to register the custom transform.
Update: You can also directly register the transform in your directive by returning one or more instances of the docutils.nodes.pending class in your node list. Make sure to call the note_pending method of the document in that case (in your directive you can get the document via self.state_machine.document).

LXML Xpath does not seem to return full path

OK I'll be the first to admit its is, just not the path I want and I don't know how to get it.
I'm using Python 3.3 in Eclipse with Pydev plugin in both Windows 7 at work and ubuntu 13.04 at home. I'm new to python and have limited programming experience.
I'm trying to write a script to take in an XML Lloyds market insurance message, find all the tags and dump them in a .csv where we can easily update them and then reimport them to create an updated xml.
I have managed to do all of that except when I get all the tags it only gives the tag name and not the tags above it.
<TechAccount Sender="broker" Receiver="insurer">
<UUId>2EF40080-F618-4FF7-833C-A34EA6A57B73</UUId>
<BrokerReference>HOY123/456</BrokerReference>
<ServiceProviderReference>2012080921401A1</ServiceProviderReference>
<CreationDate>2012-08-10</CreationDate>
<AccountTransactionType>premium</AccountTransactionType>
<GroupReference>2012080921401A1</GroupReference>
<ItemsInGroupTotal>
<Count>1</Count>
</ItemsInGroupTotal>
<ServiceProviderGroupReference>8-2012-08-10</ServiceProviderGroupReference>
<ServiceProviderGroupItemsTotal>
<Count>13</Count>
</ServiceProviderGroupItemsTotal>
That is a fragment of the XML. What I want is to find all the tags and their path. For example for I want to show it as ItemsInGroupTotal/Count but can only get it as Count.
Here is my code:
xml = etree.parse(fullpath)
print( xml.xpath('.//*'))
all_xpath = xml.xpath('.//*')
every_tag = []
for i in all_xpath:
single_tag = '%s,%s' % (i.tag, i.text)
every_tag.append(single_tag)
print(every_tag)
This gives:
'{http://www.ACORD.org/standards/Jv-Ins-Reinsurance/1}ServiceProviderGroupReference,8-2012-08-10', '{http://www.ACORD.org/standards/Jv-Ins-Reinsurance/1}ServiceProviderGroupItemsTotal,\n', '{http://www.ACORD.org/standards/Jv-Ins-Reinsurance/1}Count,13',
As you can see Count is shown as {namespace}Count, 13 and not {namespace}ItemsInGroupTotal/Count, 13
Can anyone point me towards what I need?
Thanks (hope my first post is OK)
Adam
EDIT:
This is my code now:
with open(fullpath, 'rb') as xmlFilepath:
xmlfile = xmlFilepath.read()
fulltext = '%s' % xmlfile
text = fulltext[2:]
print(text)
xml = etree.fromstring(fulltext)
tree = etree.ElementTree(xml)
every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in xml.iter()]
print(every_tag)
But this returns an error:
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
I remove the first two chars as thy are b' and it complained it didn't start with a tag
Update:
I have been playing around with this and if I remove the xis: xxx tags and the namespace stuff at the top it works as expected. I need to keep the xis tags and be able to identify them as xis tags so can't just delete them.
Any help on how I can achieve this?
ElementTree objects have a method getpath(element), which returns a
structural, absolute XPath expression to find that element
Calling getpath on each element in a iter() loop should work for you:
from pprint import pprint
from lxml import etree
text = """
<TechAccount Sender="broker" Receiver="insurer">
<UUId>2EF40080-F618-4FF7-833C-A34EA6A57B73</UUId>
<BrokerReference>HOY123/456</BrokerReference>
<ServiceProviderReference>2012080921401A1</ServiceProviderReference>
<CreationDate>2012-08-10</CreationDate>
<AccountTransactionType>premium</AccountTransactionType>
<GroupReference>2012080921401A1</GroupReference>
<ItemsInGroupTotal>
<Count>1</Count>
</ItemsInGroupTotal>
<ServiceProviderGroupReference>8-2012-08-10</ServiceProviderGroupReference>
<ServiceProviderGroupItemsTotal>
<Count>13</Count>
</ServiceProviderGroupItemsTotal>
</TechAccount>
"""
xml = etree.fromstring(text)
tree = etree.ElementTree(xml)
every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in xml.iter()]
pprint(every_tag)
prints:
['/TechAccount, \n',
'/TechAccount/UUId, 2EF40080-F618-4FF7-833C-A34EA6A57B73',
'/TechAccount/BrokerReference, HOY123/456',
'/TechAccount/ServiceProviderReference, 2012080921401A1',
'/TechAccount/CreationDate, 2012-08-10',
'/TechAccount/AccountTransactionType, premium',
'/TechAccount/GroupReference, 2012080921401A1',
'/TechAccount/ItemsInGroupTotal, \n',
'/TechAccount/ItemsInGroupTotal/Count, 1',
'/TechAccount/ServiceProviderGroupReference, 8-2012-08-10',
'/TechAccount/ServiceProviderGroupItemsTotal, \n',
'/TechAccount/ServiceProviderGroupItemsTotal/Count, 13']
UPD:
If your xml data is in the file test.xml, the code would look like:
from pprint import pprint
from lxml import etree
xml = etree.parse('test.xml').getroot()
tree = etree.ElementTree(xml)
every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in xml.iter()]
pprint(every_tag)
Hope that helps.
getpath() does indeed return an xpath that's not suited for human consumption. From this xpath, you can build up a more useful one though. Such as with this quick-and-dirty approach:
def human_xpath(element):
full_xpath = element.getroottree().getpath(element)
xpath = ''
human_xpath = ''
for i, node in enumerate(full_xpath.split('/')[1:]):
xpath += '/' + node
element = element.xpath(xpath)[0]
namespace, tag = element.tag[1:].split('}', 1)
if element.getparent() is not None:
nsmap = {'ns': namespace}
same_name = element.getparent().xpath('./ns:' + tag,
namespaces=nsmap)
if len(same_name) > 1:
tag += '[{}]'.format(same_name.index(element) + 1)
human_xpath += '/' + tag
return human_xpath

Categories

Resources