How do I merge two lxml.objectify elements?

I am parsing some XML configuration to use the settings therein. I am not going to write out the XML again, so my interest here is only in extraction.
I have two lxml.objectify elements: On the one hand an element containing default global settings, on the other one containing instance-specific settings. Each element is similarly structured (e.g. root.global_settings.lights holds the same kind of settings as root.instance_settings.lights) but there can be intersections as well as set differences. The elements contain some children text nodes but also nodes containing other nodes.
What I want: A single element containing all settings from both elements. Instance-specific settings override global ones.
The best solution I currently have is looping over the instance children and overwriting/adding to the global ones (at all levels where there are text nodes). I was thinking maybe there would be something more like dict.update?
EDIT: Just to give an example
<global>
    <birthday>Unknown</birthday>
    <wishes>
        <cake>Chocolate</cake>
        <gift>Book</gift>
    </wishes>
</global>
<instance>
    <name>Mike</name>
    <birthday>06-06-1974</birthday>
    <wishes>
        <notes>Hates flowers</notes>
    </wishes>
</instance>
would yield the same as if I had run objectify.parse on
<global>
    <name>Mike</name>
    <birthday>06-06-1974</birthday>
    <wishes>
        <cake>Chocolate</cake>
        <gift>Book</gift>
        <notes>Hates flowers</notes>
    </wishes>
</global>

I didn't find any 'native' lxml solutions. Objectify elements have traits from both dictionaries (you can access their values like a dict) and lists (you can extend and append an element with other elements). Update, however, does not work at all and extend has severe limitations, including the lack of recursiveness.
So I put together this recursive function that updates one element using another. The specific context here is using user settings to override default settings, leaving defaults in place where there are no user settings.
In essence the function distinguishes between four kinds of nodes defined by two characteristics:
1) Is the node absent in the default settings? If so we can just copy over (append) the user one.
2) If the node is also found in the default settings, we need to make a further distinction: Is it a DataElement - i.e. a node with a direct data value, e.g. <name>Mike</name> - or more of a 'structural' node without a direct data value, e.g. <wishes>...</wishes> in the example above. In the first case we replace the default node (and value) with the user one. In the second, we need to go one level deeper and repeat the whole procedure.
from lxml import objectify

def merge(user_el, default_el):
    '''Update one lxml objectify element with another.'''
    for child in user_el.iterchildren():
        default_child = default_el.find(child.tag)
        if default_child is None:
            # Not present in the defaults: add the user node.
            default_el.append(child)
            continue
        if isinstance(child, objectify.ObjectifiedDataElement):
            # Data node: the user node replaces the default one.
            default_el.replace(default_child, child)
        elif isinstance(child, objectify.ObjectifiedElement):
            # Structural node: repeat the procedure one level deeper.
            merge(child, default_child)
EDIT: Testing the above made me realise that if a structural user element that also existed in the defaults, e.g. as an empty node, had multiple child nodes with the same tag name, they would gradually replace each other's data child nodes. To avoid that I created a version that edits a copy of the default settings. That way we continue to check against the empty placeholder element rather than the element we're gradually filling.
import copy
from lxml import objectify

def merge(user_el, new_el, default_el):
    '''Update one lxml objectify element with another.'''
    for child in user_el.iterchildren():
        new_child = new_el.find(child.tag)
        default_child = default_el.find(child.tag)
        if default_child is None:
            # Not present in the defaults: add the user node.
            new_el.append(child)
            continue
        if isinstance(child, objectify.ObjectifiedDataElement):
            new_el.replace(new_child, child)
        elif isinstance(child, objectify.ObjectifiedElement):
            merge(child, new_child, default_child)

# Edit a copy, so we keep checking against the untouched defaults.
new_xml = copy.deepcopy(DEFAULT_XML)
merge(user_xml, new_xml, DEFAULT_XML)
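
For anyone who wants to try the merge idea end to end, here is a minimal sketch that uses the standard library's xml.etree.ElementTree instead of lxml.objectify (a deliberate substitution, so it runs without extra packages). The names merge_settings and _merge_into are made up for this illustration. Unlike the code above, it copies nodes rather than moving them, and it treats any element without children as a data node.

```python
import copy
import xml.etree.ElementTree as ET


def _merge_into(target, source):
    """Recursively copy settings from source into target (target is modified)."""
    for child in source:
        existing = target.find(child.tag)
        if existing is None:
            # Setting only exists on the user side: add a copy of it.
            target.append(copy.deepcopy(child))
        elif len(child) == 0 or len(existing) == 0:
            # At least one side is a leaf: the user node replaces the default.
            index = list(target).index(existing)
            target.remove(existing)
            target.insert(index, copy.deepcopy(child))
        else:
            # Both sides are structural nodes: merge one level deeper.
            _merge_into(existing, child)


def merge_settings(default_el, user_el):
    """Return a new merged element; neither input is modified."""
    merged = copy.deepcopy(default_el)
    _merge_into(merged, user_el)
    return merged


# The example from the question (hypothetical inline data).
defaults = ET.fromstring(
    "<global><birthday>Unknown</birthday>"
    "<wishes><cake>Chocolate</cake><gift>Book</gift></wishes></global>"
)
instance = ET.fromstring(
    "<instance><name>Mike</name><birthday>06-06-1974</birthday>"
    "<wishes><notes>Hates flowers</notes></wishes></instance>"
)
merged = merge_settings(defaults, instance)
```

Here merged keeps the root tag of the defaults, with the instance values overriding or extending them. New settings are appended after the existing ones, so the child order can differ from the hand-written example in the question.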

Related

Repeating an extra when using a Dragonfly CompoundRule

Using dragonfly2, the voice command framework, you can make a grammar like so:
chrome_rules = MappingRule(
    name='chrome',
    mapping={
        'down [<n>]': actions.Key('space:%(n)d'),
    },
    extras=[
        IntegerRef("n", 1, 100)
    ],
    defaults={
        "n": 1
    }
)
This lets me press space n times, where n is some integer. But what do I do if I want to use the same variable (n), multiple times in the same grammar? If I repeat it in the grammar, e.g. 'down <n> <n>' and then say something like "down three four", Dragonfly will parse it correctly, but it will only execute the actions.Key('space:%(n)d') with n=3, using the first value of n. How can I get it to execute it 3 times, and then 4 times using the same variable?
Ideally I don't want to have to duplicate the variable n, in the extras and defaults, because that seems like redundant code.

TL;DR: Your MappingRule passes data to your Action (e.g. Key, Text) in the form of a dictionary, so it can only pass one value per extra. Your best bet right now is probably to create multiple extras.
This is a side-effect of the way dragonfly parses recognitions. I'll explain it first with Action objects, then we can break down why this happens at the Rule level.
When Dragonfly receives a recognition, it has to deconstruct it and extract any extras that occurred. The speech recognition engine itself has no trouble with multiple occurrences of the same extra, and it does pass that data to Dragonfly, but Dragonfly loses that information.
All Action objects are derived from ActionBase, and this is the method dragonfly calls when it wants to execute an Action:
def execute(self, data=None):
    self._log_exec.debug("Executing action: %s (%s)" % (self, data))
    try:
        if self._execute(data) == False:
            raise ActionError(str(self))
    except ActionError as e:
        self._log_exec.error("Execution failed: %s" % e)
        return False
    return True
This is how Text works, same with Key. It's not documented here, but data is a dictionary of extras mapped to values. For example:
{
    "n": 3,
    "text": "some recognized dictation",
}
See the issue? That means we can only communicate a single value per extra. Even if we combine multiple actions, we have the same problem. For example:
{
    "down <n> <n>": Key("%(n)d") + Text("%(n)d"),
}
Under the hood, these two actions are combined into an ActionSeries object - a single action. It exposes the same execute interface. One series of actions, one data dict.
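
To see why one dict means one value per extra, the substitution can be mimicked with plain Python %-formatting. This only illustrates the dictionary limitation and is not Dragonfly code. The spec strings and the second extra name m are hypothetical.

```python
# One data dict per recognition: each name can hold only one value.
data = {"n": 3}
spec = "space:%(n)d, space:%(n)d"  # 'down <n> <n>' reuses the same key
same_extra = spec % data

# Giving each slot its own (hypothetical) extra name keeps both values.
data_two = {"n": 3, "m": 4}
spec_two = "space:%(n)d, space:%(m)d"
two_extras = spec_two % data_two
```

With a single key, both placeholders receive the same number. With two keys, "down three four" could carry 3 and 4 separately, which is the "create multiple extras" workaround from the TL;DR.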
Note that this doesn't happen with compound rules, even if each underlying rule shares an extra with the same name. That's because data is decoded & passed per-rule. Each rule passes a different data dict to the Action it wishes to execute.
If you're curious where we lose the second extra, we can navigate up the call chain.
Each rule has a process_recognition method. This is the method that's called when a recognition occurs. It takes the current rule's node and processes it. This node might be a tree of rules, or it could be something lower-level, like an Action. Let's look at the implementation in MappingRule:
def process_recognition(self, node):
    """
    Process a recognition of this rule.

    This method is called by the containing Grammar when this
    rule is recognized. This method collects information about
    the recognition and then calls *self._process_recognition*.

    - *node* -- The root node of the recognition parse tree.
    """
    # Prepare *extras* dict for passing to _process_recognition().
    extras = {
        "_grammar": self.grammar,
        "_rule": self,
        "_node": node,
    }
    extras.update(self._defaults)
    for name, element in self._extras.items():
        extra_node = node.get_child_by_name(name, shallow=True)
        if extra_node:
            extras[name] = extra_node.value()
        elif element.has_default():
            extras[name] = element.default

    # Call the method to do the actual processing.
    self._process_recognition(node, extras)
I'm going to skip some complexity - the extras variable you see here is an early form of the data dictionary. See where we lose the value?
extra_node = node.get_child_by_name(name, shallow=True)
Which looks like:
def get_child_by_name(self, name, shallow=False):
    """Get one node below this node with the given name."""
    for child in self.children:
        if child.name:
            if child.name == name:
                return child
            if shallow:
                # If shallow, don't look past named children.
                continue
        match = child.get_child_by_name(name, shallow)
        if match:
            return match
    return None
So, you see the issue. Dragonfly tries to extract one value for each extra, and it gets the first one. Then, it stuffs that value into a dictionary and passes it down to Action. Additional occurrences are lost.
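
The first-match behaviour can be reproduced with a toy parse tree. This is a sketch under stated assumptions: the Node class below is a simplified stand-in written for this illustration, not Dragonfly's real class, and get_children_by_name is a collect-all variant added to the toy class for contrast.

```python
class Node:
    """Toy stand-in for a parse-tree node (not Dragonfly's real class)."""

    def __init__(self, name=None, value=None, children=()):
        self.name = name
        self.value = value
        self.children = list(children)

    def get_child_by_name(self, name):
        """Return the first descendant with this name, like the lookup above."""
        for child in self.children:
            if child.name == name:
                return child
            match = child.get_child_by_name(name)
            if match:
                return match
        return None

    def get_children_by_name(self, name):
        """Collect-all variant for this toy class: every descendant with this name."""
        found = []
        for child in self.children:
            if child.name == name:
                found.append(child)
            found.extend(child.get_children_by_name(name))
        return found


# "down three four" parsed against 'down <n> <n>'
root = Node(children=[Node("n", 3), Node("n", 4)])
first_only = root.get_child_by_name("n").value
all_values = [node.value for node in root.get_children_by_name("n")]
```

The single-result lookup stops at the first n, which is the point where the second value is dropped before the data dict is built.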

pywinauto: Iterate through all controls in a window

I'm trying to write a general test script to find errors in new software builds. My idea is to iterate through the controls in the window and interact with each one, logging any errors that are caused and restarting the software if it crashes.
I'm looking for a way to dynamically find control identifiers, a bit like print_control_identifiers() but with the output being a list or similar structure which I can iterate through.
On a GitHub question about control identifiers this was mentioned:
it's possible to walk the hierarchy by using .children() (immediate children only) and .descendants() (the whole subtree as a plain list)
I assumed I could just iterate through my Application object's descendants() list and call a relevant interaction method for each, but I can't work out how to get this list. I assumed I could do something like this, but I haven't had any success:
def test(application):
    for child in application.descendants():
        pass  # interact with child control

software = Application(backend='uia').start(cmd_line=FILE_PATH)
test(software)
AttributeError: Neither GUI element (wrapper) nor wrapper method 'descendants' were found (typo?)
EDIT
I resorted to looking through the code and found the print_control_identifiers method:
class Application(object):
    def print_control_identifiers(self, depth=None, filename=None):
        """
        Prints the 'identifiers'

        Prints identifiers for the control and for its descendants to
        a depth of **depth** (the whole subtree if **None**).

        .. note:: The identifiers printed by this method have been made
            unique. So if you have 2 edit boxes, they won't both have "Edit"
            listed in their identifiers. In fact the first one can be
            referred to as "Edit", "Edit0", "Edit1" and the 2nd should be
            referred to as "Edit2".
        """
        if depth is None:
            depth = sys.maxsize
        # Wrap this control
        this_ctrl = self.__resolve_control(self.criteria)[-1]
        # Create a list of this control and all its descendants
        all_ctrls = [this_ctrl, ] + this_ctrl.descendants()
        # Create a list of all visible text controls
        txt_ctrls = [ctrl for ctrl in all_ctrls if ctrl.can_be_label and ctrl.is_visible() and ctrl.window_text()]
        # Build a dictionary of disambiguated list of control names
        name_ctrl_id_map = findbestmatch.UniqueDict()
        for index, ctrl in enumerate(all_ctrls):
            ctrl_names = findbestmatch.get_control_names(ctrl, all_ctrls, txt_ctrls)
            for name in ctrl_names:
                name_ctrl_id_map[name] = index
        # Swap it around so that we are mapped off the control indices
        ctrl_id_name_map = {}
        for name, index in name_ctrl_id_map.items():
            ctrl_id_name_map.setdefault(index, []).append(name)
This shows that .descendants() isn't a method of the Application class, but belongs to the control, so it seems I was wrong there. Is it possible to create my own version of print_control_identifiers() which returns a list of control objects that can be iterated through?

The correct method to list top-level windows is application.windows(). You can then call .descendants() on every listed window. In most cases an application has only one top-level window. In particular, for backend="uia" even new dialogs are children of the main window (for backend="win32" every dialog is a top-level window).
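
As a rough sketch of that approach, the helper below walks every top-level window and then its descendants. The name iter_controls is made up for this example; it relies only on the windows() and descendants() calls mentioned above, and the commented usage assumes pywinauto on a Windows desktop.

```python
def iter_controls(app):
    """Yield each top-level window of app, followed by all of its descendants."""
    for window in app.windows():
        yield window
        for control in window.descendants():
            yield control


# Intended use (requires pywinauto; FILE_PATH as in the question):
#   from pywinauto.application import Application
#   software = Application(backend="uia").start(cmd_line=FILE_PATH)
#   for control in iter_controls(software):
#       print(control.window_text())
```

Because it is a generator, the test loop can interact with each control as it is yielded, or wrap the call in list() to get the plain list the question asks for.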

py2neo cypher create several relations to central node in for loop

I'm just starting out with neo4j, py2neo and Cypher.
I have encountered the following problem and google and my knowledge of what to ask have not yet given me an answer or a helpful hint in the right direction. Anyway:
Problem:
I don't know how to, in python/py2neo, create relations between a unique starting node and a number of following nodes that I create dynamically in a for loop.
Background:
I have a json object which defines a person object, who will have an id, and several properties, such as favourite colour, favourite food etc.
So at the start of my py2neo script I define my person. After this I loop through my json for every property this person has.
This works fine, and with no relations I end up with a neo4j chart with several nodes with the right parameters.
If I'm understanding the docs right I have to make a match to find my newly created person, for each new property I want to link. This seems absurd to me as I just created this person and still have the reference to the person object in memory. But for me it is unclear on how to actually write the code for creating the relation. Also, as a relative newbie in both python and Cypher, best practices are still an unknown to me.
What I understand is I can use py2neo
graph = Graph("http://...")
tx = graph.begin()
p = Node("Person", id)
tx.create(p)
and then I can reference p later on. But for my properties, of which there can be many, I create a string in python like so (pseudocode here, I have a nice oneliner for this that fits my actual case with lambda, join, map, format and so on)
for param in params:
    par = "MERGE (par:" + param + ... )
    tx.append(par)
tx.process()
tx.commit()
How do I create a relation "likes" back to the person for each and every par in the for loop?
Or do I need to rethink my whole solution?
Help?! :-)
//Jonas

Assuming you've already created the node for Alice and want to create the others dynamically, I'd suggest that while looping over the parsed nodes you build a Node for each one and then pass it to a Relationship. The syntax is
Relationship(Node_1, Relation, Node_2)
The key thing to know here is that Node_1 and Node_2 must both be of type Node.
I've stored all the nodes (only their names) from the JSON in a list named nodes.
Since you mentioned you only have a reference to Alice:
a = Node("Person", name="Alice")
for node in nodes:  # excluding Alice
    n = Node("<label>", name=node)
    r = Relationship(a, "likes", n)
Make sure to use a new variable name (or collect the objects in a list) on each iteration, otherwise each one overwrites the last.
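
If you prefer to stay with Cypher strings as in the question, one possible sketch is to build a parameterised MERGE statement per property. Everything here is an assumption for illustration: the helper name likes_statements, the LIKES relationship type, the name property, and the $param syntax (older Neo4j versions used {param} instead).

```python
def likes_statements(person_id, properties):
    """Build (cypher, params) pairs linking one Person to each property node.

    properties is an iterable of (label, name) pairs, e.g. ("Colour", "blue").
    """
    statements = []
    for label, name in properties:
        # Labels cannot be passed as Cypher parameters, so validate before
        # interpolating them into the query text.
        if not label.isidentifier():
            raise ValueError("unsafe label: %r" % (label,))
        cypher = (
            "MERGE (p:Person {id: $person_id}) "
            f"MERGE (x:{label} {{name: $name}}) "
            "MERGE (p)-[:LIKES]->(x)"
        )
        statements.append((cypher, {"person_id": person_id, "name": name}))
    return statements


# Each pair could then be run inside a transaction, for example with
# something like tx.run(cypher, params) (assumed; check your py2neo version).
```

Using MERGE on the person inside each statement means no Python reference to the node is needed, at the cost of one extra lookup per statement.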

Python-pptx: “Show legend overlapping the chart”

I am using Office 2007.
To show the legend overlapping the chart in Office 2007, I found that the XML should look like this:
<c:legend>
    <c:overlay val="1"/>
</c:legend>
But whether I set chart.legend.include_in_layout = True through the python-pptx API or leave it at the default, the generated XML is always:
<c:legend>
    <c:overlay/>
</c:legend>
Without val="1", Office 2007 does not display the legend properly.
How can I force python-pptx to write val="1"? Thanks.

Explanation
In short, the True value is not explicitly set (in contrast to False) because True corresponds to the default value of overlay's val attribute.
To explain it in more detail - you can follow the python-pptx hierarchy as follows: overlay is mapped to CT_Boolean (all overlay oxml elements are instantiated from CT_Boolean). The actual val parameter is then mapped via OptionalAttribute and is defined with the default value of True:
class CT_Boolean(BaseOxmlElement):
    """
    Common complex type used for elements having a True/False value.
    """
    val = OptionalAttribute('val', XsdBoolean, default=True)
Now, when setting the optional attribute to its default value, it is actually skipped/deleted, as you can see here if value == self._default:
class OptionalAttribute(BaseAttribute):
    """
    Defines an optional attribute on a custom element class. An optional
    attribute returns a default value when not present for reading. When
    assigned |None|, the attribute is removed.
    """
    @property
    def _setter(self):
        def set_attr_value(obj, value):
            if value == self._default:
                if self._clark_name in obj.attrib:
                    del obj.attrib[self._clark_name]
                return
            str_value = self._simple_type.to_xml(value)
            obj.set(self._clark_name, str_value)
        return set_attr_value
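
The effect of that setter can be imitated in a few lines with the standard library's xml.etree.ElementTree. This is a simplified stand-in, not python-pptx code; set_optional_bool is a hypothetical helper and the bare overlay element ignores the chart namespace.

```python
import xml.etree.ElementTree as ET


def set_optional_bool(elem, name, value, default=True):
    """Mimic the setter above: a value equal to the default removes the attribute."""
    if value == default:
        elem.attrib.pop(name, None)
        return
    elem.set(name, "1" if value else "0")


overlay = ET.Element("overlay")

set_optional_bool(overlay, "val", False)
after_false = dict(overlay.attrib)   # attribute written, differs from default

set_optional_bool(overlay, "val", True)
after_true = dict(overlay.attrib)    # attribute dropped, equals default

# With no default (default=None), True is written out explicitly.
set_optional_bool(overlay, "val", True, default=None)
after_forced = dict(overlay.attrib)
```

The last call shows why removing the default, as the fixes below do, makes the explicit val attribute appear.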
Fix - provide custom CT_Boolean class
Add these lines somewhere before you need to use the overlay. It will overwrite python-pptx overlay mapping with the custom CT_Boolean_NoDefault class:
from pptx.oxml import register_element_cls
from pptx.oxml.xmlchemy import BaseOxmlElement, OptionalAttribute
from pptx.oxml.simpletypes import XsdBoolean

class CT_Boolean_NoDefault(BaseOxmlElement):
    """
    Common complex type used for elements having a True/False value with no
    default value.
    """
    val = OptionalAttribute('val', XsdBoolean)

register_element_cls('c:overlay', CT_Boolean_NoDefault)
This worked for me and finally I got:
<c:legend>
    <c:overlay val="1"/>
</c:legend>
Fix - modify python-pptx permanently
This is not recommended but you might want to modify python-pptx instead of adding the solution from above for each script you run.
First, add the following to pptx/oxml/chart/shared.py which defines a new bool class without a default value:
class CT_Boolean_NoDefault(BaseOxmlElement):
    """
    Common complex type used for elements having a True/False value.
    """
    val = OptionalAttribute('val', XsdBoolean)
Second, modify pptx/oxml/__init__.py to add the new bool class:
from .chart.shared import (
    CT_Boolean, CT_Double, CT_Layout, CT_LayoutMode, CT_ManualLayout,
    CT_NumFmt, CT_Tx, CT_UnsignedInt, CT_Boolean_NoDefault
)
Third, modify pptx/oxml/__init__.py to change the mapping of the overlay element to the new bool class:
register_element_cls('c:overlay', CT_Boolean_NoDefault)
Better solution
If you have time, please submit a ticket here so it might become a permanent fix. If @scanny finds some time, he will read this. Perhaps there is a better solution for this, too, and I've completely missed something.

@pansen's analysis is spot-on. Here's an alternative way to get this working in your case that might be a little lighter weight:
def include_in_layout(legend):
    legend_element = legend._element
    overlay = legend_element.get_or_add_overlay()
    overlay.set('val', '1')
This appears to be a localized non-conformance of that version of PowerPoint with the ISO/IEC 29500 spec. As pansen rightly points out, a missing val attribute is to be interpreted the same as val=1 (True). I'd be interested to discover how far this non-conformance extends, i.e. what other elements exhibit this same behavior. The CT_Boolean type is used quite frequently in PowerPoint, for things like bold, italic, varyColors, smooth, and on and on. So a "compensating" fix would need to be applied carefully to avoid reporting incorrect results for other elements.
I think I'll take pansen's cue and use a specialized element class for this element only. It will still report True for an element without the val attribute, which will be inconsistent with the observed behavior on this version of PowerPoint; but assuming other versions behave correctly (according to the spec), the inconsistency will be localized and at least assigning True to that property will make the legend show up the way you want.

Accessing related object key without fetching object in App Engine

In general, it's better to do a single query vs. many queries for a given object. Let's say I have a bunch of 'son' objects each with a 'father'. I get all the 'son' objects:
sons = Son.all()
Then, I'd like to get all the fathers for that group of sons. I do:
father_keys = {}
for son in sons:
    father_keys.setdefault(son.father.key(), None)
Then I can do:
fathers = Father.get(father_keys.keys())
Now, this assumes that son.father.key() doesn't actually go fetch the object. Am I wrong on this? I have a bunch of code that assumes the object.related_object.key() doesn't actually fetch related_object from the datastore.
Am I doing this right?

You can find the answer by studying the sources of appengine.ext.db in your download of the App Engine SDK sources -- and the answer is, no, there's no special-casing as you require: the __get__ method (line 2887 in the sources for the 1.3.0 SDK) of the ReferenceProperty descriptor gets invoked before knowing if .key() or anything else will later be invoked on the result, so it just doesn't get a chance to do the optimization you'd like.
However, see line 2929: method get_value_for_datastore does do exactly what you want!
Specifically, instead of son.father.key(), use Son.father.get_value_for_datastore(son) and you should be much happier as a result;-).
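
Applied to the question's loop, that suggestion might look like the sketch below. The helper name referenced_keys is made up for this example; it relies only on the get_value_for_datastore call named above, and the commented usage assumes the old App Engine db API with the question's Son and Father models.

```python
def referenced_keys(entities, reference_property):
    """Collect the distinct keys a reference property points to, without fetching."""
    keys = set()
    for entity in entities:
        key = reference_property.get_value_for_datastore(entity)
        if key is not None:  # skip entities whose reference is unset
            keys.add(key)
    return keys


# Intended use with the question's models (needs the App Engine SDK):
#   father_keys = referenced_keys(Son.all(), Son.father)
#   fathers = db.get(list(father_keys))
```

Note that the property is read from the class (Son.father), not from an instance, which is what avoids triggering the fetch.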

I'd rather loop through the sons and get each parent's key using son.parent_key().
parent_key()
    Returns the Key of the parent entity of this instance, or None if this instance does not have a parent.
Since the whole path is saved in the instance's key, theoretically there is no need to hit the database again to get the parent's key.
After that, it's possible to get all the parents' instances at once using db.get().
get(keys)
    Gets the entity or entities for the given key or keys, of any Model.
    Arguments:
    keys
        A Key object or a list of Key objects.
    If one Key is provided, the return value is an instance of the appropriate Model class, or None if no entity exists with the given Key. If a list of Keys is provided, the return value is a corresponding list of model instances, with None values when no entity exists for a corresponding Key.
