Dumping a dictionary to a YAML file while preserving order - python

I've been trying to dump a dictionary to a YAML file. The problem is that the program that imports the YAML file needs the keywords in a specific order. This order is not alphabetically.
import yaml
import os
baseFile = 'myfile.dat'
lyml = [{'BaseFile': baseFile}]
lyml.append({'Environment':{'WaterDepth':0.,'WaveDirection':0.,'WaveGamma':0.,'WaveAlpha':0.}})
CaseName = 'OrderedDict.yml'
CaseDir = r'C:\Users\BTO\Documents\Projects\Mooring code testen'
CaseFile = os.path.join(CaseDir, CaseName)
with open(CaseFile, 'w') as f:
yaml.dump(lyml, f, default_flow_style=False)
This produces a *.yml file which is formatted like this:
- BaseFile: myfile.dat
- Environment:
WaterDepth: 0.0
WaveAlpha: 0.0
WaveDirection: 0.0
WaveGamma: 0.0
But what I want is that the order is preserved:
- BaseFile: myfile.dat
- Environment:
WaterDepth: 0.0
WaveDirection: 0.0
WaveGamma: 0.0
WaveAlpha: 0.0
Is this possible?

yaml.dump has a sort_keys keyword argument that is set to True by default. Set it to False to not reorder:
with open(CaseFile, 'w') as f:
yaml.dump(lyml, f, default_flow_style=False, sort_keys=False)

Use an OrderedDict instead of dict. Run the below setup code at the start. Now yaml.dump, should preserve the order. More details here and here
def setup_yaml():
""" https://stackoverflow.com/a/8661021 """
represent_dict_order = lambda self, data: self.represent_mapping('tag:yaml.org,2002:map', data.items())
yaml.add_representer(OrderedDict, represent_dict_order)
setup_yaml()
Example: https://pastebin.com/raw.php?i=NpcT6Yc4

PyYAML supports representer to serialize a class instance to a YAML node.
yaml.YAMLObject uses metaclass magic to register a constructor, which transforms a YAML node to a class instance, and a representer, which serializes a class instance to a YAML node.
Add following lines above your code:
def represent_dictionary_order(self, dict_data):
return self.represent_mapping('tag:yaml.org,2002:map', dict_data.items())
def setup_yaml():
yaml.add_representer(OrderedDict, represent_dictionary_order)
setup_yaml()
Then you can use OrderedDict to preserve the order in yaml.dump():
import yaml
from collections import OrderedDict
def represent_dictionary_order(self, dict_data):
return self.represent_mapping('tag:yaml.org,2002:map', dict_data.items())
def setup_yaml():
yaml.add_representer(OrderedDict, represent_dictionary_order)
setup_yaml()
dic = OrderedDict()
dic['a'] = 1
dic['b'] = 2
dic['c'] = 3
print(yaml.dump(dic))
# {a: 1, b: 2, c: 3}

Your difficulties are a result of assumptions on multiple levels that are incorrect and, depending on your YAML parser, might not be transparently resolvable.
In Python's dict the keys are unordered (at least for Python < 3.6). And even though the keys have some order in the source file, as soon as they are in the dict they aren't:
d = {'WaterDepth':0.,'WaveDirection':0.,'WaveGamma':0.,'WaveAlpha':0.}
for key in d:
print key
gives:
WaterDepth
WaveGamma
WaveAlpha
WaveDirection
If you want your keys ordered you can use the collections.OrderedDict type (or my own ruamel.ordereddict type which is in C and more than an order of magnitude faster), and you have to add the keys ordered, either as a list of tuples:
from ruamel.ordereddict import ordereddict
# from collections import OrderedDict as ordereddict # < this will work as well
d = ordereddict([('WaterDepth', 0.), ('WaveDirection', 0.), ('WaveGamma', 0.), ('WaveAlpha', 0.)])
for key in d:
print key
which will print the keys in the order they were specified in the source.
The second problem is that even if a Python dict has some key ordering that happens to be what you want, the YAML specification does explicitly say that mappings are unordered and that is the way e.g. PyYAML implements the dumping of Python dict to YAML mapping (And the other way around).
Also, if you dump an ordereddict or OrderedDict you normally don't get the plain YAML mapping that you indicate you want, but some tagged YAML entry.
As losing the order is often undesirable, in your case because your reader assumes some order, in my case because that made it difficult to compare versions because key ordering would not be consistent after insertion/deletion, I implemented round-trip consistency in ruamel.yaml so you can do:
import sys
import ruamel.yaml as yaml
yaml_str = """\
- BaseFile: myfile.dat
- Environment:
WaterDepth: 0.0
WaveDirection: 0.0
WaveGamma: 0.0
WaveAlpha: 0.0
"""
data = yaml.load(yaml_str, Loader=yaml.RoundTripLoader)
print(data)
yaml.dump(data, sys.stdout, Dumper=yaml.RoundTripDumper)
which gives you exactly your output result. data works as a dict (and so does `data['Environment'], but underneath they are smarter constructs that preserve order, comments, YAML anchor names etc). You can of course change these (adding/deleting key-value pairs), which is easy, but you can also build these from scratch:
import sys
import ruamel.yaml as yaml
from ruamel.yaml.comments import CommentedMap
baseFile = 'myfile.dat'
lyml = [{'BaseFile': baseFile}]
lyml.append({'Environment': CommentedMap([('WaterDepth', 0.), ('WaveDirection', 0.), ('WaveGamma', 0.), ('WaveAlpha', 0.)])})
yaml.dump(data, sys.stdout, Dumper=yaml.RoundTripDumper)
Which again prints the contents with keys in the order you want them.
I find the later less readable, than when starting from a YAML string, but it does construct the lyml data structure somewhat faster.

oyaml is a python library which preserves dict ordering when dumping.
It is specifically helpful in more complex cases where the dictionary is nested and may contain lists.
Once installed:
import oyaml as yaml
with open(CaseFile, 'w') as f:
f.write(yaml.dump(lyml))

Related

Is it possible to remove unecessary nested structure in yaml file?

I need to set a param that is deep inside a yaml object like below:
executors:
hpc01:
context:
cp2k:
charge: 0
Is it possible to make it more clear, for example
executors: hpc01: context: cp2k: charge: 0
I am using ruamel.yaml in Python to parse the file and it fails to parse the example. Is there some yaml dialect can support such style, or is there better way to write such configuration in standard yaml spec?
since all json is valid yaml...
executors: {"hpc01" : {"context": {"cp2k": {"charge": 0}}}}
should be valid...
a little proof:
from ruamel.yaml import YAML
a = YAML().load('executors: {"hpc01" : {"context": {"cp2k": {"charge": 0}}}}')
b = YAML().load('''executors:
hpc01:
context:
cp2k:
charge: 0''')
if a == b:
print ("equal")
will print: equal.
What you are proposing is invalid YAML, since the colon + space is parsed as a value indicator. Since
YAML can have mappings as keys for other mappings, you would get all kinds of interpretation issues, such as
should
a: b: c
be interpreted as a mapping with key a: b and value c or as a mapping with key a and value b: c.
If you want to write everything on one line, and don't want the overhead of YAML's flow-style, I suggest
you use the fact that the value indicator expects a space after the colon and do a little post-processing:
import sys
import ruamel.yaml
yaml_str = """\
before: -1
executors:hpc01:context:cp2k:charge: 0
after: 1
"""
COLON = ':'
def unfold_keys(d):
if isinstance(d, dict):
replace = []
for idx, (k, v) in enumerate(d.items()):
if COLON in k:
for segment in reversed(k.split(COLON)):
v = {segment: v}
replace.append((idx, k, v))
else:
unfold_keys(v)
for idx, oldkey, kv in replace:
del d[oldkey]
v = list(kv.values())[0]
# v.refold = True
d.insert(idx, list(kv.keys())[0], v)
elif isinstance(d, list):
for elem in d:
unfold_keys
return d
yaml = ruamel.yaml.YAML()
data = unfold_keys(yaml.load(yaml_str))
yaml.dump(data, sys.stdout)
which gives:
before: -1
executors:
hpc01:
context:
cp2k:
charge: 0
after: 1
Since ruamel.yaml parses mappings in the default mode to CommentedMap instances which have .insert() method,
you can actually preserve the position of the "unfolded" key in the mapping.
You can of course use another character (e.g. underscore). You can also reverse the process by uncommenting the line # v.refold = True and provide another recursive function that walks over the data and checks on that attribute and does the reverse
of unfold_keys(), just before dumping.

How to map over a CommentedMap while preserving the comments/style?

Given a ruamel.yaml CommentedMap, and some transformation function f: CommentedMap → Any, I would like to produce a new CommentedMap with transformed keys and values, but otherwise as similar as possible to the original.
If I don't care about preserving style, I can do this:
result = {
f(key) : f(value)
for key, value in my_commented_map.items()
}
If I didn't need to transform the keys (and I didn't care about mutating the original), I could do this:
for key, value in my_commented_map.items():
my_commented_map[key] = f(value)
The style and comment information are each attached to the
CommentedMap via special attributes. The style you can copy, but
the comments are partly indexed to key on which line they occur, and
if you transform that key, you also need to transform that indexed
comment.
In your first example you apply f() to both key and value, I'll use
seperate functions in my example, all-capsing the keys, and
all-lowercasing the values (this of course only works on string type
keys and value, so this is a restriction of the example, not of
the solution)
import sys
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap as CM
from ruamel.yaml.comments import Format, Comment
yaml_str = """\
# example YAML document
abc: All Strings are Equal # but some Strings are more Equal then others
klm: Flying Blue
xYz: the End # for now
"""
def fkey(s):
return s.upper()
def fval(s):
return s.lower()
def transform(data, fk, fv):
d = CM()
if hasattr(data, Format.attrib):
setattr(d, Format.attrib, getattr(data, Format.attrib))
ca = None
if hasattr(data, Comment.attrib):
setattr(d, Comment.attrib, getattr(data, Comment.attrib))
ca = getattr(d, Comment.attrib)
# as the key mapping could map new keys on old keys, first gather everything
key_com = {}
for k in data:
new_k = fk(k)
d[new_k] = fv(data[k])
if ca is not None and k in ca.items:
key_com[new_k] = ca.items.pop(k)
if ca is not None:
assert len(ca.items) == 0
ca._items = key_com # the attribute, not the read-only property
return d
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
# the following will print any new CommentedMap with curly braces, this just here to check
# if the style attribute copying is working correctly, remove from real code
yaml.default_flow_style = True
data = transform(data, fkey, fval)
yaml.dump(data, sys.stdout)
which gives:
# example YAML document
ABC: all strings are equal # but some Strings are more Equal then others
KLM: flying blue
XYZ: the end # for now
Please note:
the above tries (and succeeds) to start a comment in the original
column, if that is not possible, e.g. when a transformed key or
value takes more space, it is pushed further to the right.
if you have a more complex datastructure, recursively walk the tree, descending into mappings
and sequences. In that case it might be more easy to store (key, value, comment) tuples
then pop() all the keys and reinsert the stored values (instead of rebuilding the tree).

Preserving anchors and aliases in YAML with Python

I'm editing a large YAML document in Python with extensive anchors and aliases. I'd like to be able to determine how the anchor is derived based on data from the node it references.
For instance the node has a 'name' field and I'd like the anchor to be the value of that field rather than a random id number.
Is this possible with PyYAML or ruamel.yaml?
There are a few things to keep in mind:
YAML has no fields. I assume that that is your interpretation of keys in a mapping, so that you want an anchor associated with a mapping to be the same as the value for the key 'name'
During load time the event created when encountering an anchor doesn't know about whether it is an anchor on a scalar, sequence or mapping. Let alone that it could access the value for 'name'.
Changing the anchor during load is tricky, as you have to keep track of aliases referring to the original anchor (and map them to its new value)
In PyYAML the anchor name gets created during dump-ing, so you would have to hook into that when using PyYAML. You can do the same with ruamel.yaml
Only ruamel.yaml has the capability to preserve an anchor on round-trip. I.e. if you can have the anchor to be persistent, even if the value for the key 'name' changes (assuming you test e.g. on the default generated form idNNNN)
When you use ruamel.yaml you can recursively walk the data-structure, keeping track of nodes already visited (in case a child contains an ancestor) and when encountering a ruamel.yaml.comments.CommentedMap, set the anchor (currently the attribute with the value of ruamel.yaml.comments.Anchor.attrib i.e. _yaml_anchor). Untested code:
if isinstance(x, ruamel.yaml.comments.CommentedMap):
if 'name' in x:
x.yaml_set_anchor(x['name'])
If you have a YAML document that you can round-trip you can hook into the representer:
import sys
import ruamel.yaml
from ruamel.yaml.representer import RoundTripRepresenter
yaml_str = """\
# data = [dict(a=1, b=2, name='mydata'), dict(c=3)]
# data.append(data[0])
- &id001
a: 1
b: 2
name: mydata
- c: 3
- *id001
"""
class MyRTR(RoundTripRepresenter):
def represent_mapping(self, tag, mapping, flow_style=None):
if 'name' in mapping:
# if not isinstance(mapping, ruamel.yaml.comments.CommentedMap):
# mapping = ruamel.yaml.comments.CommentedMap(mapping)
mapping.yaml_set_anchor(mapping['name'])
mapping.yaml_set_anchor(mapping['name'])
return RoundTripRepresenter.represent_mapping(
self, tag, mapping, flow_style=flow_style)
yaml = ruamel.yaml.YAML()
yaml.Representer = MyRTR
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
which gives:
# data = [dict(a=1, b=2, name='mydata'), dict(c=3)]
# data.append(data[0])
- &mydata a: 1
b: 2
name: mydata
- c: 3
- *mydata
But note that this assumes that you loaded the data and that all dicts are actually CommentedMaps under the hood. If that is not the case (i.e. you added normal dicts, then uncomment the two lines doing the conversion.

Prettify a yml dump in python

I have this example:
import yaml
from collections import OrderedDict
data = [OrderedDict({"one": u"Hello\u2122", "two":["something", u"something2", u"something3"]})]
print yaml.dump(data, default_flow_style=False, default_style='"', allow_unicode=True, encoding="utf-8")
This prints out:
- !!python/object/apply:collections.OrderedDict
- - - "two"
- - "something"
- !!python/unicode "something2"
- !!python/unicode "something3"
- - "one"
- "Hello\u2122"
I use OrderedDict because I want to preserve the key order when dumping into YML. However, I don't care about the order when reading the YML back into python.
How can I prettify the dump to be something like:
- two:
- "something"
- "something2"
- "something3"
one:
- "Hello\xe2\x84\xa2"
And then read it back into python using yaml.load()?
One option is to use representers to change the serialization of some objects. But this has to be done on a case-by-case basis and I don't know if it will scale well for your particular use case.
Preserving the order in your OrderedDict will get a little more tricky, since the represent_mapping will always sort the items if your map has an items attribute, but passing the items as a tuple should work.
import yaml
from yaml.representer import SafeRepresenter
from collections import OrderedDict
data = [OrderedDict({"one": u"Hello\u2122",
"two":["something", u"something2", u"something3"]})]
# Represent an OrderedDict preserving order
def _represent_dict_in_order(dumper, odict):
return dumper.represent_mapping(u'tag:yaml.org,2002:map', odict.items())
# Use a safe dictionary representer for OrderectDict
yaml.add_representer(OrderedDict, _represent_dict_in_order)
# Use a safe string representer for unicode data
yaml.add_representer(unicode, SafeRepresenter.represent_unicode)
print yaml.dump(data, default_flow_style=False,
default_style='"', allow_unicode=True, encoding="utf-8")

Can PyYAML dump dict items in non-alphabetical order?

I'm using yaml.dump to output a dict. It prints out each item in alphabetical order based on the key.
>>> d = {"z":0,"y":0,"x":0}
>>> yaml.dump( d, default_flow_style=False )
'x: 0\ny: 0\nz: 0\n'
Is there a way to control the order of the key/value pairs?
In my particular use case, printing in reverse would (coincidentally) be good enough. For completeness though, I'm looking for an answer that shows how to control the order more precisely.
I've looked at using collections.OrderedDict but PyYAML doesn't (seem to) support it. I've also looked at subclassing yaml.Dumper, but I haven't been able to figure out if it has the ability to change item order.
If you upgrade PyYAML to 5.1 version, now, it supports dump without sorting the keys like this:
yaml.dump(data, sort_keys=False)
As shown in help(yaml.Dumper), sort_keys defaults to True:
Dumper(stream, default_style=None, default_flow_style=False,
canonical=None, indent=None, width=None, allow_unicode=None,
line_break=None, encoding=None, explicit_start=None, explicit_end=None,
version=None, tags=None, sort_keys=True)
(These are passed as kwargs to yaml.dump)
There's probably a better workaround, but I couldn't find anything in the documentation or the source.
Python 2 (see comments)
I subclassed OrderedDict and made it return a list of unsortable items:
from collections import OrderedDict
class UnsortableList(list):
def sort(self, *args, **kwargs):
pass
class UnsortableOrderedDict(OrderedDict):
def items(self, *args, **kwargs):
return UnsortableList(OrderedDict.items(self, *args, **kwargs))
yaml.add_representer(UnsortableOrderedDict, yaml.representer.SafeRepresenter.represent_dict)
And it seems to work:
>>> d = UnsortableOrderedDict([
... ('z', 0),
... ('y', 0),
... ('x', 0)
... ])
>>> yaml.dump(d, default_flow_style=False)
'z: 0\ny: 0\nx: 0\n'
Python 3 or 2 (see comments)
You can also write a custom representer, but I don't know if you'll run into problems later on, as I stripped out some style checking code from it:
import yaml
from collections import OrderedDict
def represent_ordereddict(dumper, data):
value = []
for item_key, item_value in data.items():
node_key = dumper.represent_data(item_key)
node_value = dumper.represent_data(item_value)
value.append((node_key, node_value))
return yaml.nodes.MappingNode(u'tag:yaml.org,2002:map', value)
yaml.add_representer(OrderedDict, represent_ordereddict)
But with that, you can use the native OrderedDict class.
For Python 3.7+, dicts preserve insertion order. Since PyYAML 5.1.x, you can disable the sorting of keys (#254). Unfortunately, the sorting keys behaviour does still default to True.
>>> import yaml
>>> yaml.dump({"b":1, "a": 2})
'a: 2\nb: 1\n'
>>> yaml.dump({"b":1, "a": 2}, sort_keys=False)
'b: 1\na: 2\n'
My project oyaml is a monkeypatch/drop-in replacement for PyYAML. It will preserve dict order by default in all Python versions and PyYAML versions.
>>> import oyaml as yaml # pip install oyaml
>>> yaml.dump({"b":1, "a": 2})
'b: 1\na: 2\n'
Additionally, it will dump the collections.OrderedDict subclass as normal mappings, rather than Python objects.
>>> from collections import OrderedDict
>>> d = OrderedDict([("b", 1), ("a", 2)])
>>> import yaml
>>> yaml.dump(d)
'!!python/object/apply:collections.OrderedDict\n- - - b\n - 1\n - - a\n - 2\n'
>>> yaml.safe_dump(d)
RepresenterError: ('cannot represent an object', OrderedDict([('b', 1), ('a', 2)]))
>>> import oyaml as yaml
>>> yaml.dump(d)
'b: 1\na: 2\n'
>>> yaml.safe_dump(d)
'b: 1\na: 2\n'
One-liner to rule them all:
yaml.add_representer(dict, lambda self, data: yaml.representer.SafeRepresenter.represent_dict(self, data.items()))
That's it. Finally. After all those years and hours, the mighty represent_dict has been defeated by giving it the dict.items() instead of just dict
Here is how it works:
This is the relevant PyYaml source code:
if hasattr(mapping, 'items'):
mapping = list(mapping.items())
try:
mapping = sorted(mapping)
except TypeError:
pass
for item_key, item_value in mapping:
To prevent the sorting we just need some Iterable[Pair] object that does not have .items().
dict_items is a perfect candidate for this.
Here is how to do this without affecting the global state of the yaml module:
#Using a custom Dumper class to prevent changing the global state
class CustomDumper(yaml.Dumper):
#Super neat hack to preserve the mapping key order. See https://stackoverflow.com/a/52621703/1497385
def represent_dict_preserve_order(self, data):
return self.represent_dict(data.items())
CustomDumper.add_representer(dict, CustomDumper.represent_dict_preserve_order)
return yaml.dump(component_dict, Dumper=CustomDumper)
This is really just an addendum to #Blender's answer. If you look in the PyYAML source, at the representer.py module, You find this method:
def represent_mapping(self, tag, mapping, flow_style=None):
value = []
node = MappingNode(tag, value, flow_style=flow_style)
if self.alias_key is not None:
self.represented_objects[self.alias_key] = node
best_style = True
if hasattr(mapping, 'items'):
mapping = mapping.items()
mapping.sort()
for item_key, item_value in mapping:
node_key = self.represent_data(item_key)
node_value = self.represent_data(item_value)
if not (isinstance(node_key, ScalarNode) and not node_key.style):
best_style = False
if not (isinstance(node_value, ScalarNode) and not node_value.style):
best_style = False
value.append((node_key, node_value))
if flow_style is None:
if self.default_flow_style is not None:
node.flow_style = self.default_flow_style
else:
node.flow_style = best_style
return node
If you simply remove the mapping.sort() line, then it maintains the order of items in the OrderedDict.
Another solution is given in this post. It's similar to #Blender's, but works for safe_dump. The common element is the converting of the dict to a list of tuples, so the if hasattr(mapping, 'items') check evaluates to false.
Update:
I just noticed that The Fedora Project's EPEL repo has a package called python2-yamlordereddictloader, and there's one for Python 3 as well. The upstream project for that package is likely cross-platform.
There are two things you need to do to get this as you want:
you need to use something else than a dict, because it doesn't keep the items ordered
you need to dump that alternative in the appropriate way.¹
import sys
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap
d = CommentedMap()
d['z'] = 0
d['y'] = 0
d['x'] = 0
ruamel.yaml.round_trip_dump(d, sys.stdout)
output:
z: 0
y: 0
x: 0
¹ This was done using ruamel.yaml a YAML 1.2 parser, of which I am the author.
If safe_dump (i.e. dump with Dumper=SafeDumper) is used, then calling yaml.add_representer has no effect. In such case it is necessary to call add_representer method explicitly on SafeRepresenter class:
yaml.representer.SafeRepresenter.add_representer(
OrderedDict, ordered_dict_representer
)
I was also looking for an answer to the question "how to dump mappings with the order preserved?" I couldn't follow the solution given above as i am new to pyyaml and python. After spending some time on the pyyaml documentation and other forums i found this.
You can use the tag
!!omap
to dump the mappings by preserving the order. If you want to play with the order i think you have to go for keys:values
The links below can help for better understanding.
https://bitbucket.org/xi/pyyaml/issue/13/loading-and-then-dumping-an-omap-is-broken
http://yaml.org/type/omap.html
The following setting makes sure the content is not sorted in the output:
yaml.sort_base_mapping_type_on_output = False

Categories

Resources