Is it possible to remove unnecessary nested structure in a YAML file? - python

I need to set a param that is deep inside a YAML object like below:
executors:
  hpc01:
    context:
      cp2k:
        charge: 0
Is it possible to make it clearer, for example
executors: hpc01: context: cp2k: charge: 0
I am using ruamel.yaml in Python to parse the file, and it fails to parse the example. Is there some YAML dialect that supports such a style, or is there a better way to write such a configuration within the standard YAML spec?

Since all JSON is valid YAML...
executors: {"hpc01" : {"context": {"cp2k": {"charge": 0}}}}
should be valid...
A little proof:
from ruamel.yaml import YAML
a = YAML().load('executors: {"hpc01" : {"context": {"cp2k": {"charge": 0}}}}')
b = YAML().load('''executors:
  hpc01:
    context:
      cp2k:
        charge: 0''')
if a == b:
    print("equal")
will print: equal.

What you are proposing is invalid YAML, since colon + space is parsed as a value indicator. Because YAML can have mappings as keys for other mappings, you would get all kinds of interpretation issues. For example, should
a: b: c
be interpreted as a mapping with key a: b and value c, or as a mapping with key a and value b: c?
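You can see the parser reject such a line directly (a quick check of my own; the exact exception type and message may vary between ruamel.yaml versions):
from ruamel.yaml import YAML
from ruamel.yaml.error import YAMLError
yaml = YAML()
try:
    yaml.load('a: b: c')
except YAMLError as exc:
    # typically a ScannerError complaining that mapping values
    # are not allowed in this context
    print('rejected:', type(exc).__name__)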
If you want to write everything on one line, and don't want the overhead of YAML's flow-style, I suggest
you use the fact that the value indicator expects a space after the colon and do a little post-processing:
import sys
import ruamel.yaml
yaml_str = """\
before: -1
executors:hpc01:context:cp2k:charge: 0
after: 1
"""
COLON = ':'
def unfold_keys(d):
    if isinstance(d, dict):
        replace = []
        for idx, (k, v) in enumerate(d.items()):
            if COLON in k:
                for segment in reversed(k.split(COLON)):
                    v = {segment: v}
                replace.append((idx, k, v))
            else:
                unfold_keys(v)
        for idx, oldkey, kv in replace:
            del d[oldkey]
            v = list(kv.values())[0]
            # v.refold = True
            d.insert(idx, list(kv.keys())[0], v)
    elif isinstance(d, list):
        for elem in d:
            unfold_keys(elem)
    return d
yaml = ruamel.yaml.YAML()
data = unfold_keys(yaml.load(yaml_str))
yaml.dump(data, sys.stdout)
which gives:
before: -1
executors:
  hpc01:
    context:
      cp2k:
        charge: 0
after: 1
Since ruamel.yaml in the default (round-trip) mode parses mappings to CommentedMap instances, which have an .insert() method, you can actually preserve the position of the "unfolded" key in the mapping.
You can of course use another character (e.g. an underscore) instead of the colon. You can also reverse the process by uncommenting the line # v.refold = True and providing another recursive function that walks over the data, checks for that attribute and does the reverse of unfold_keys(), just before dumping.
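A rough sketch of what such a refold step could look like (my own illustration, not part of the original answer): instead of checking a refold marker it simply collapses every chain of single-key mappings back into one colon-joined key, and it reuses COLON, yaml and data from the snippet above, assuming the containing mappings are CommentedMap instances so that .insert() keeps their position:
def refold_keys(d):
    # collapse chains of single-key mappings back into 'a:b:c' style keys
    if isinstance(d, dict):
        for idx, key in enumerate(list(d)):
            value = d[key]
            segments = [key]
            while isinstance(value, dict) and len(value) == 1:
                (sub_key, value), = value.items()
                segments.append(sub_key)
            refold_keys(value)
            if len(segments) > 1:
                del d[key]
                d.insert(idx, COLON.join(segments), value)  # CommentedMap only
    elif isinstance(d, list):
        for elem in d:
            refold_keys(elem)
    return d
refold_keys(data)
yaml.dump(data, sys.stdout)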

How to map over a CommentedMap while preserving the comments/style?

Given a ruamel.yaml CommentedMap, and some transformation function f: CommentedMap → Any, I would like to produce a new CommentedMap with transformed keys and values, but otherwise as similar as possible to the original.
If I don't care about preserving style, I can do this:
result = {
    f(key): f(value)
    for key, value in my_commented_map.items()
}
If I didn't need to transform the keys (and I didn't care about mutating the original), I could do this:
for key, value in my_commented_map.items():
    my_commented_map[key] = f(value)
The style and comment information are each attached to the CommentedMap via special attributes. The style you can copy as-is, but the comments are partly indexed by the key on whose line they occur, and if you transform that key, you also need to re-index that comment.
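A quick way to see where this information lives (my own illustration; the attribute details may differ between ruamel.yaml versions):
import ruamel.yaml
yaml = ruamel.yaml.YAML()
data = yaml.load('abc: All Strings are Equal  # a comment\n')
print(data.ca)  # comment attribute, indexed by the key 'abc'
print(data.fa)  # format attribute (block vs. flow style)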
In your first example you apply f() to both key and value; I'll use separate functions in my example, upper-casing the keys and lower-casing the values (this of course only works on string type keys and values, so it is a restriction of the example, not of the solution):
import sys
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap as CM
from ruamel.yaml.comments import Format, Comment
yaml_str = """\
# example YAML document
abc: All Strings are Equal # but some Strings are more Equal then others
klm: Flying Blue
xYz: the End # for now
"""
def fkey(s):
    return s.upper()
def fval(s):
    return s.lower()
def transform(data, fk, fv):
    d = CM()
    if hasattr(data, Format.attrib):
        setattr(d, Format.attrib, getattr(data, Format.attrib))
    ca = None
    if hasattr(data, Comment.attrib):
        setattr(d, Comment.attrib, getattr(data, Comment.attrib))
        ca = getattr(d, Comment.attrib)
    # as the key mapping could map new keys onto old keys, first gather everything
    key_com = {}
    for k in data:
        new_k = fk(k)
        d[new_k] = fv(data[k])
        if ca is not None and k in ca.items:
            key_com[new_k] = ca.items.pop(k)
    if ca is not None:
        assert len(ca.items) == 0
        ca._items = key_com  # the attribute, not the read-only property
    return d
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
# the following will print any new CommentedMap with curly braces, this is just here to check
# if the style attribute copying is working correctly; remove from real code
yaml.default_flow_style = True
data = transform(data, fkey, fval)
yaml.dump(data, sys.stdout)
which gives:
# example YAML document
ABC: all strings are equal # but some Strings are more Equal then others
KLM: flying blue
XYZ: the end # for now
Please note:
the above tries (and succeeds) to start a comment in the original column; if that is not possible, e.g. when a transformed key or value takes more space, the comment is pushed further to the right.
if you have a more complex data structure, recursively walk the tree, descending into mappings and sequences. In that case it might be easier to store (key, value, comment) tuples, then pop() all the keys and reinsert the stored values (instead of rebuilding the tree). A rough sketch of such a recursive walk is given below.
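Here is that sketch (my own illustration, building on the transform() function above; it only handles comments on mappings, not on sequences):
def transform_recursive(data, fk, fv):
    # reuse transform() per mapping level; recurse into nested values
    if isinstance(data, CM):
        return transform(data, fk, lambda v: transform_recursive(v, fk, fv))
    if isinstance(data, list):
        return [transform_recursive(item, fk, fv) for item in data]
    return fv(data) if isinstance(data, str) else data
data = transform_recursive(yaml.load(yaml_str), fkey, fval)
yaml.dump(data, sys.stdout)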

How to parse Cloudformation YAML to get all the !ImportValue from YAML template?

I am working on a project to parse an AWS CloudFormation YAML file and extract all the !ImportValue entries from the YAML template.
I am trying to use ruamel.yaml (which I am new to) to parse it; I was able to read the YAML file and get the individual elements.
import ruamel.yaml
def general_constructor(loader, tag_suffix, node):
    return node.value
ruamel.yaml.SafeLoader.add_multi_constructor(u'!', general_constructor)
with open(cfFile, 'r') as service:
    stream = service.read()
yaml_data = ruamel.yaml.safe_load(stream)
print yaml_data
The above code gets the content of the specified YAML file, and the output looks like the following.
{'Application': {'Properties': {'ApplicationName': [ScalarNode(tag=u'tag:yaml.org,2002:str', value=u'-'),
SequenceNode(tag=u'tag:yaml.org,2002:seq', value=[ScalarNode(tag=u'tag:yaml.org,2002:str', value=u'***'), ScalarNode(tag=u'!ImportValue', value=u'jkl')])],
*
*
ScalarNode(tag=u'!ImportValue', value=u'def'),
*
*
ScalarNode(tag=u'!ImportValue', value=u'rst')])]},
So there are a bunch of !ImportValue entries listed in ScalarNodes (e.g. ScalarNode(tag=u'!ImportValue', value=u'rst')), and that is what I actually want to extract. These ImportValues are scattered at various places in the template. What would be the best way to extract the value of those? In our CloudFormation setup we have a bunch of YAML files; some of them export certain resources and other YAML files import them. So I want to build a sort of dependency map (maybe a JSON file) which will depict the interdependence between the CloudFormation files.
If you use ruamel.yaml's round-trip loader you don't have to do
anything special to load the tag, and walking recursively over the
resulting data structure is relatively easy. The corresponding key
needs to be passed on, as at least the first !ImportValue is within
a sequence under the key.
Assuming an input.yaml consisting of:
Application:
  Properties:
    ApplicationName: ["-", ["**", !ImportValue "jkl"]]
  AnotherKey:
  - 42
  - nested: !ImportValue xyz
(which might not be exactly what you got as input, but will do for
demonstration purposes), and using the new ruamel.yaml API (which
defaults to round-trip loading/dumping):
import sys
from pathlib import Path
import ruamel.yaml
ta = ruamel.yaml.comments.Tag.attrib
yaml = ruamel.yaml.YAML()
data = yaml.load(Path('input.yaml'))
def process(d, key=None):
    if isinstance(d, dict):
        for k, v in d.items():
            for res in process(v, k):  # recurse and pass on new key
                yield res
    elif isinstance(d, list):
        for item in d:
            for res in process(item, key):
                yield res
    else:
        try:
            if getattr(d, ta, None).value == '!ImportValue':
                yield (key, d)
        except AttributeError:
            pass
for k, v in process(data):
    print(k, '->', v)
which gives:
ApplicationName -> jkl
nested -> xyz
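To get towards the dependency map mentioned in the question, you could collect the results of process() per file. A rough sketch, reusing yaml, Path and process() from above (the glob pattern and the use of .value on the loaded tagged scalars are my assumptions):
dependency_map = {}
for path in Path('.').glob('*.yaml'):
    file_data = yaml.load(path)
    # set of export names this template imports
    dependency_map[path.name] = sorted({v.value for _, v in process(file_data)})
print(dependency_map)  # e.g. {'input.yaml': ['jkl', 'xyz']}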

Accessing / Replacing Yaml values via python using variables

I have been trying to solve what I thought would be simple, but I can't wrap my head around getting a YAML file updated based on a variable.
What I have:
1. An Ansible hosts file in YAML format. This hosts file is not 100% the same all the time. It can have a dictionary of multiple image values (as one example) and I only want one to change.
namespace: demo1
images:
  image1:
    path: "path1"
    version: "v1"
  image2:
    path: "path2"
    version: "1.2.3"
user: "root"
2. A YAML file that contains the key/values for the things I want to replace. We already have a lot of configuration inside this YAML for other parts of our system, so I don't want to split off to some other config format if I can help it (INI, JSON, etc.). I would really want this to be dot notation.
schema: v1.0
hostfile:
- path: path/to/ansible_hosts_file
  images:
    image1.version: v1.1
I am trying to find a way to load the YAML from #1, read in the key hostfile.images.[variable] to replace, and write back to the original Ansible file with the new value. I keep getting tripped up on the variable aspect, since today it can be image1.version and in the next config it's image2.path, or both at the same time.
I think your problem primarily comes from mixing key-value pairs with dotted notation. I.e.
images:
  image1.version: v1.1
instead of doing
images.image1.version: v1.1
Retrieving by dotted notation has been solved for Python and arbitrary separators (not necessarily '.') in this answer. Setting just involves providing two extra functions that take a second argument, which is the value to set, and grafting them onto CommentedMap resp. CommentedSeq.
Based on that you need to preselect based on your key:
upd = ruamel.yaml.round_trip_load(open('update.yaml'))
# schema check here
for hostfile in upd['hostfile']:
    data = ruamel.yaml.round_trip_load(open(hostfile['path']))
    images = data['images']
    for dotted in hostfile['images']:
        val = hostfile['images'][dotted]
        images.string_set(dotted, val)
The actual string_set-ting code could look like (untested):
def mapping_string_set(self, s, val, delimiter=None, key_delim=None):
    def p(v):
        try:
            v = int(v)
        except ValueError:
            pass
        return v
    # possibly extend for primitives like float, datetime, booleans, etc.
    if delimiter is None:
        delimiter = '.'
    if key_delim is None:
        key_delim = ','
    try:
        key, rest = s.split(delimiter, 1)
    except ValueError:
        key, rest = s, None
    if key_delim in key:
        key = tuple(p(k) for k in key.split(key_delim))
    else:
        key = p(key)
    if rest is None:
        self[key] = val
        return
    self[key].string_set(rest, val, delimiter, key_delim)
ruamel.yaml.comments.CommentedMap.string_set = mapping_string_set
def sequence_string_set(self, s, val, delimiter=None, key_delim=None):
    if delimiter is None:
        delimiter = '.'
    try:
        key, rest = s.split(delimiter, 1)
    except ValueError:
        key, rest = s, None
    key = int(key)
    if rest is None:
        self[key] = val
        return
    self[key].string_set(rest, val, delimiter, key_delim)
ruamel.yaml.comments.CommentedSeq.string_set = sequence_string_set
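A quick check of the grafted string_set (my own sketch; it assumes the grafting above has been run, and an inline document stands in for the Ansible hosts file):
import sys
import ruamel.yaml
data = ruamel.yaml.round_trip_load("""\
images:
  image1:
    path: "path1"
    version: "v1"
""")
data['images'].string_set('image1.version', 'v1.1')
ruamel.yaml.round_trip_dump(data, sys.stdout)
# images:
#   image1:
#     path: "path1"
#     version: v1.1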

Return all keys along with value in nested dictionary

I am working on getting all text that exists in several .yaml files placed into a new singular YAML file that will contain the English translations that someone can then translate into Spanish.
Each YAML file has a lot of nested text. I want to print the full 'path', aka all the keys, along with the value, for each value in the YAML file. Here's an example input for a .yaml file that lives in the myproject.section.more_information file:
default:
  heading: Here’s A Title
  learn_more:
    title: Title of Thing
    url: www.url.com
    description: description
    opens_new_window: true
and here's the desired output:
myproject.section.more_information.default.heading: Here’s a Title
myproject.section.more_information.default.learn_more.title: Title of Thing
myproject.section.more_information.default.learn_more.url: www.url.com
myproject.section.more_information.default.learn_more.description: description
myproject.section.more_information.default.learn_more.opens_new_window: true
This seems like a good candidate for recursion, so I've looked at examples such as this answer
However, I want to preserve all of the keys that lead to a given value, not just the last key in a value. I'm currently using PyYAML to read/write YAML.
Any tips on how to save each key as I continue to check if the item is a dictionary and then return all the keys associated with each value?
What you want to do is flatten nested dictionaries. This would be a good place to start: Flatten nested Python dictionaries, compressing keys
In fact, I think the code snippet in the top answer would work for you if you just changed the sep argument to '.'.
Edit:
Check this for a working example based on the linked SO answer: http://ideone.com/Sx625B
import collections.abc
some_dict = {
    'default': {
        'heading': 'Here’s A Title',
        'learn_more': {
            'title': 'Title of Thing',
            'url': 'www.url.com',
            'description': 'description',
            'opens_new_window': 'true'
        }
    }
}
def flatten(d, parent_key='', sep='_'):
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        if isinstance(v, collections.abc.MutableMapping):
            items.extend(flatten(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)
results = flatten(some_dict, parent_key='', sep='.')
for item in results:
    print(item + ': ' + results[item])
If you want it in order, you'll need an OrderedDict though.
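To tie this back to the YAML file from the question, you would load it first and prepend the module path. A small sketch reusing the flatten() function above (the file name and prefix are my assumptions):
import yaml  # PyYAML
with open('more_information.yaml') as f:
    data = yaml.safe_load(f)
prefix = 'myproject.section.more_information'
results = flatten(data, parent_key=prefix, sep='.')
for key in results:
    print('{}: {}'.format(key, results[key]))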
Walking over nested dictionaries begs for recursion, and handing the "prefix" of the "path" down into each call saves you from having to do any manipulation on the segments of your path (as @Prune suggests).
There are a few things to keep in mind that make this problem interesting:
because you are using multiple files, the same path can occur in more than one file, and you need to handle that (at least by throwing an error, as otherwise you might just lose data). In my example I generate a list of values.
dealing with special keys (non-string (convert?), empty string, keys containing a '.'). My example reports these and exits.
Example code using ruamel.yaml ¹:
import sys
import glob
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap, CommentedSeq
from ruamel.yaml.compat import string_types, ordereddict
class Flatten:
    def __init__(self, base):
        self._result = ordereddict()  # key to list of tuples of (value, comment)
        self._base = base
    def add(self, file_name):
        data = ruamel.yaml.round_trip_load(open(file_name))
        self.walk_tree(data, self._base)
    def walk_tree(self, data, prefix=None):
        """
        this is based on ruamel.yaml.scalarstring.walk_tree
        """
        if prefix is None:
            prefix = ""
        if isinstance(data, dict):
            for key in data:
                full_key = self.full_key(key, prefix)
                value = data[key]
                if isinstance(value, (dict, list)):
                    self.walk_tree(value, full_key)
                    continue
                # value is a scalar
                comment_token = data.ca.items.get(key)
                comment = comment_token[2].value if comment_token else None
                self._result.setdefault(full_key, []).append((value, comment))
        elif isinstance(data, list):
            print("don't know how to handle lists", prefix)
            sys.exit(1)
    def full_key(self, key, prefix):
        """
        check here for valid keys
        """
        if not isinstance(key, string_types):
            print('key has to be string', repr(key), prefix)
            sys.exit(1)
        if '.' in key:
            print('dot in key not allowed', repr(key), prefix)
            sys.exit(1)
        if key == '':
            print('empty key not allowed', repr(key), prefix)
            sys.exit(1)
        return prefix + '.' + key
    def dump(self, out):
        res = CommentedMap()
        for path in self._result:
            values = self._result[path]
            if len(values) == 1:  # single value for path
                res[path] = values[0][0]
                if values[0][1]:
                    res.yaml_add_eol_comment(values[0][1], key=path)
                continue
            res[path] = seq = CommentedSeq()
            for index, value in enumerate(values):
                seq.append(value[0])
                if value[1]:
                    seq.yaml_add_eol_comment(value[1], key=index)
        ruamel.yaml.round_trip_dump(res, out)
flatten = Flatten('myproject.section.more_information')
for file_name in glob.glob('*.yaml'):
    flatten.add(file_name)
flatten.dump(sys.stdout)
If you have an additional input file:
default:
  learn_more:
    commented: value # this value has a comment
    description: another description
then the result is:
myproject.section.more_information.default.heading: Here’s A Title
myproject.section.more_information.default.learn_more.title: Title of Thing
myproject.section.more_information.default.learn_more.url: www.url.com
myproject.section.more_information.default.learn_more.description:
- description
- another description
myproject.section.more_information.default.learn_more.opens_new_window: true
myproject.section.more_information.default.learn_more.commented: value # this value has a comment
Of course if your input doesn't have double paths, your output won't have any lists.
Using string_types and ordereddict from ruamel.yaml makes this Python 2 and Python 3 compatible (you don't indicate which version you are using).
The ordereddict preserves the original key ordering, but this is of course dependent on the processing order of the files. If you want the paths sorted, just change dump() to use:
for path in sorted(self._result):
Also note that the comment on the 'commented' dictionary entry is preserved.
¹ ruamel.yaml is a YAML 1.2 parser that preserves comments and other data on round-tripping (PyYAML does most parts of YAML 1.1). Disclaimer: I am the author of ruamel.yaml
Keep a simple list of strings, being the most recent key at each indentation depth. When you progress from one line to the next with no change, simply change the item at the end of the list. When you "out-dent", pop the last item off the list. When you indent, append to the list.
Then, each time you hit a colon, the corresponding key item is the concatenation of the strings in the list, something like:
'.'.join(key_list)
Does that get you moving at an honorable speed?
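A minimal sketch of that idea (my own illustration, assuming two-space indentation, simple key: value lines, and no lists or multi-line scalars; the file name and prefix are assumptions):
def flatten_by_indent(text, prefix='myproject.section.more_information'):
    key_list = [prefix]
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith('#'):
            continue
        depth = (len(line) - len(line.lstrip())) // 2 + 1
        key, _, value = line.strip().partition(':')
        del key_list[depth:]       # pop on out-dent, overwrite at the same depth
        key_list.append(key)
        if value.strip():          # only lines with a value are leaves
            print('.'.join(key_list) + ':' + value)
with open('more_information.yaml') as f:
    flatten_by_indent(f.read())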

Can PyYAML dump dict items in non-alphabetical order?

I'm using yaml.dump to output a dict. It prints out each item in alphabetical order based on the key.
>>> d = {"z":0,"y":0,"x":0}
>>> yaml.dump( d, default_flow_style=False )
'x: 0\ny: 0\nz: 0\n'
Is there a way to control the order of the key/value pairs?
In my particular use case, printing in reverse would (coincidentally) be good enough. For completeness though, I'm looking for an answer that shows how to control the order more precisely.
I've looked at using collections.OrderedDict but PyYAML doesn't (seem to) support it. I've also looked at subclassing yaml.Dumper, but I haven't been able to figure out if it has the ability to change item order.
If you upgrade PyYAML to version 5.1, it now supports dumping without sorting the keys, like this:
yaml.dump(data, sort_keys=False)
As shown in help(yaml.Dumper), sort_keys defaults to True:
Dumper(stream, default_style=None, default_flow_style=False,
       canonical=None, indent=None, width=None, allow_unicode=None,
       line_break=None, encoding=None, explicit_start=None, explicit_end=None,
       version=None, tags=None, sort_keys=True)
(These are passed as kwargs to yaml.dump)
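Applied to the dict from the question (a quick check; requires PyYAML 5.1 or later):
import yaml
d = {"z": 0, "y": 0, "x": 0}
print(yaml.dump(d, default_flow_style=False, sort_keys=False))
# z: 0
# y: 0
# x: 0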
There's probably a better workaround, but I couldn't find anything in the documentation or the source.
Python 2 (see comments)
I subclassed OrderedDict and made it return a list of unsortable items:
from collections import OrderedDict
class UnsortableList(list):
    def sort(self, *args, **kwargs):
        pass
class UnsortableOrderedDict(OrderedDict):
    def items(self, *args, **kwargs):
        return UnsortableList(OrderedDict.items(self, *args, **kwargs))
yaml.add_representer(UnsortableOrderedDict, yaml.representer.SafeRepresenter.represent_dict)
And it seems to work:
>>> d = UnsortableOrderedDict([
... ('z', 0),
... ('y', 0),
... ('x', 0)
... ])
>>> yaml.dump(d, default_flow_style=False)
'z: 0\ny: 0\nx: 0\n'
Python 3 or 2 (see comments)
You can also write a custom representer, but I don't know if you'll run into problems later on, as I stripped out some style checking code from it:
import yaml
from collections import OrderedDict
def represent_ordereddict(dumper, data):
    value = []
    for item_key, item_value in data.items():
        node_key = dumper.represent_data(item_key)
        node_value = dumper.represent_data(item_value)
        value.append((node_key, node_value))
    return yaml.nodes.MappingNode(u'tag:yaml.org,2002:map', value)
yaml.add_representer(OrderedDict, represent_ordereddict)
But with that, you can use the native OrderedDict class.
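With that representer registered, the order is preserved (a quick check of my own, in the same interpreter session):
>>> d = OrderedDict([('z', 0), ('y', 0), ('x', 0)])
>>> yaml.dump(d, default_flow_style=False)
'z: 0\ny: 0\nx: 0\n'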
For Python 3.7+, dicts preserve insertion order. Since PyYAML 5.1.x, you can disable the sorting of keys (#254). Unfortunately, key sorting still defaults to True.
>>> import yaml
>>> yaml.dump({"b":1, "a": 2})
'a: 2\nb: 1\n'
>>> yaml.dump({"b":1, "a": 2}, sort_keys=False)
'b: 1\na: 2\n'
My project oyaml is a monkeypatch/drop-in replacement for PyYAML. It will preserve dict order by default in all Python versions and PyYAML versions.
>>> import oyaml as yaml # pip install oyaml
>>> yaml.dump({"b":1, "a": 2})
'b: 1\na: 2\n'
Additionally, it will dump the collections.OrderedDict subclass as normal mappings, rather than Python objects.
>>> from collections import OrderedDict
>>> d = OrderedDict([("b", 1), ("a", 2)])
>>> import yaml
>>> yaml.dump(d)
'!!python/object/apply:collections.OrderedDict\n- - - b\n - 1\n - - a\n - 2\n'
>>> yaml.safe_dump(d)
RepresenterError: ('cannot represent an object', OrderedDict([('b', 1), ('a', 2)]))
>>> import oyaml as yaml
>>> yaml.dump(d)
'b: 1\na: 2\n'
>>> yaml.safe_dump(d)
'b: 1\na: 2\n'
One-liner to rule them all:
yaml.add_representer(dict, lambda self, data: yaml.representer.SafeRepresenter.represent_dict(self, data.items()))
That's it. Finally. After all those years and hours, the mighty represent_dict has been defeated by giving it the dict.items() instead of just the dict.
Here is how it works.
This is the relevant PyYAML source code:
if hasattr(mapping, 'items'):
    mapping = list(mapping.items())
    try:
        mapping = sorted(mapping)
    except TypeError:
        pass
for item_key, item_value in mapping:
To prevent the sorting we just need some Iterable[Pair] object that does not have .items().
dict_items is a perfect candidate for this.
Here is how to do this without affecting the global state of the yaml module:
# Using a custom Dumper class to prevent changing the global state
class CustomDumper(yaml.Dumper):
    # Super neat hack to preserve the mapping key order. See https://stackoverflow.com/a/52621703/1497385
    def represent_dict_preserve_order(self, data):
        return self.represent_dict(data.items())
CustomDumper.add_representer(dict, CustomDumper.represent_dict_preserve_order)
yaml.dump(component_dict, Dumper=CustomDumper)
This is really just an addendum to @Blender's answer. If you look in the PyYAML source, in the representer.py module, you find this method:
def represent_mapping(self, tag, mapping, flow_style=None):
    value = []
    node = MappingNode(tag, value, flow_style=flow_style)
    if self.alias_key is not None:
        self.represented_objects[self.alias_key] = node
    best_style = True
    if hasattr(mapping, 'items'):
        mapping = mapping.items()
        mapping.sort()
    for item_key, item_value in mapping:
        node_key = self.represent_data(item_key)
        node_value = self.represent_data(item_value)
        if not (isinstance(node_key, ScalarNode) and not node_key.style):
            best_style = False
        if not (isinstance(node_value, ScalarNode) and not node_value.style):
            best_style = False
        value.append((node_key, node_value))
    if flow_style is None:
        if self.default_flow_style is not None:
            node.flow_style = self.default_flow_style
        else:
            node.flow_style = best_style
    return node
If you simply remove the mapping.sort() line, then it maintains the order of items in the OrderedDict.
Another solution is given in this post. It's similar to @Blender's, but works for safe_dump. The common element is converting the dict to a list of tuples, so the if hasattr(mapping, 'items') check evaluates to false.
Update:
I just noticed that The Fedora Project's EPEL repo has a package called python2-yamlordereddictloader, and there's one for Python 3 as well. The upstream project for that package is likely cross-platform.
There are two things you need to do to get this the way you want:
you need to use something other than a dict, because it doesn't keep the items ordered;
you need to dump that alternative in the appropriate way.¹
import sys
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap
d = CommentedMap()
d['z'] = 0
d['y'] = 0
d['x'] = 0
ruamel.yaml.round_trip_dump(d, sys.stdout)
output:
z: 0
y: 0
x: 0
¹ This was done using ruamel.yaml a YAML 1.2 parser, of which I am the author.
If safe_dump (i.e. dump with Dumper=SafeDumper) is used, then calling yaml.add_representer has no effect. In that case it is necessary to call the add_representer method explicitly on the SafeRepresenter class:
yaml.representer.SafeRepresenter.add_representer(
    OrderedDict, ordered_dict_representer
)
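A small end-to-end sketch of that route (my own illustration; the representer mirrors the represent_ordereddict function from the earlier answer, here named ordered_dict_representer to match the snippet above):
import yaml
from collections import OrderedDict
def ordered_dict_representer(dumper, data):
    # build the mapping node directly, preserving insertion order
    value = [(dumper.represent_data(k), dumper.represent_data(v)) for k, v in data.items()]
    return yaml.nodes.MappingNode(u'tag:yaml.org,2002:map', value)
yaml.representer.SafeRepresenter.add_representer(OrderedDict, ordered_dict_representer)
print(yaml.safe_dump(OrderedDict([('z', 0), ('y', 0), ('x', 0)]), default_flow_style=False))
# z: 0
# y: 0
# x: 0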
I was also looking for an answer to the question "how to dump mappings with the order preserved?". I couldn't follow the solutions given above, as I am new to PyYAML and Python. After spending some time on the PyYAML documentation and other forums, I found this.
You can use the tag
!!omap
to dump the mappings while preserving the order. If you want to play with the order, I think you have to go for keys:values.
The links below can help for a better understanding.
https://bitbucket.org/xi/pyyaml/issue/13/loading-and-then-dumping-an-omap-is-broken
http://yaml.org/type/omap.html
The following setting makes sure the content is not sorted in the output:
yaml.sort_base_mapping_type_on_output = False
