I have this example:
import yaml
from collections import OrderedDict
data = [OrderedDict({"one": u"Hello\u2122", "two":["something", u"something2", u"something3"]})]
print yaml.dump(data, default_flow_style=False, default_style='"', allow_unicode=True, encoding="utf-8")
This prints out:
- !!python/object/apply:collections.OrderedDict
- - - "two"
- - "something"
- !!python/unicode "something2"
- !!python/unicode "something3"
- - "one"
- "Hello\u2122"
I use OrderedDict because I want to preserve the key order when dumping to YAML. However, I don't care about the order when reading the YAML back into Python.
How can I prettify the dump to be something like:
- two:
  - "something"
  - "something2"
  - "something3"
  one:
  - "Hello\xe2\x84\xa2"
And then read it back into python using yaml.load()?
One option is to use representers to change the serialization of some objects. But this has to be done on a case-by-case basis and I don't know if it will scale well for your particular use case.
Preserving the order of your OrderedDict is a little trickier, since represent_mapping always sorts the items if your mapping has an items attribute. Passing the items as a list of (key, value) tuples, which has no items attribute, avoids that sorting.
import yaml
from yaml.representer import SafeRepresenter
from collections import OrderedDict
# Build the OrderedDict from a list of pairs: a dict literal has no defined order in Python 2
data = [OrderedDict([("one", u"Hello\u2122"),
                     ("two", ["something", u"something2", u"something3"])])]
# Represent an OrderedDict preserving order
def _represent_dict_in_order(dumper, odict):
    return dumper.represent_mapping(u'tag:yaml.org,2002:map', odict.items())
# Register the order-preserving representer for OrderedDict
yaml.add_representer(OrderedDict, _represent_dict_in_order)
# Use a safe string representer for unicode data
yaml.add_representer(unicode, SafeRepresenter.represent_unicode)
print yaml.dump(data, default_flow_style=False,
                default_style='"', allow_unicode=True, encoding="utf-8")
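For Python 3 (where unicode is simply str), a minimal sketch of the same idea, assuming PyYAML 5.1 or later. Here the representer is registered on yaml.SafeDumper so that yaml.safe_dump emits a plain mapping, and yaml.safe_load reads it back as ordinary dicts. The name represent_ordereddict is illustrative, not part of PyYAML.

```python
from collections import OrderedDict

import yaml


def represent_ordereddict(dumper, odict):
    # A list of (key, value) pairs has no .items(), so represent_mapping
    # does not sort it and the OrderedDict's order is kept.
    return dumper.represent_mapping("tag:yaml.org,2002:map", list(odict.items()))


# Register on SafeDumper only; yaml.Dumper is left untouched.
yaml.SafeDumper.add_representer(OrderedDict, represent_ordereddict)

data = [OrderedDict([("two", ["something", "something2", "something3"]),
                     ("one", "Hello\u2122")])]

text = yaml.safe_dump(data, default_flow_style=False, allow_unicode=True)
print(text)

# Reading back yields plain dicts, which is fine since order doesn't matter on load.
loaded = yaml.safe_load(text)
```

Dropping default_style='"' keeps the keys unquoted, which is closer to the layout asked for; add it back if you want every scalar double-quoted.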
I'm editing a large YAML document in Python with extensive anchors and aliases. I'd like to be able to determine how the anchor is derived based on data from the node it references.
For instance the node has a 'name' field and I'd like the anchor to be the value of that field rather than a random id number.
Is this possible with PyYAML or ruamel.yaml?
There are a few things to keep in mind:
YAML has no fields. I assume that is your interpretation of keys in a mapping, so that you want the anchor on a mapping to be the same as the value for its key 'name'.
During loading, the event created when an anchor is encountered doesn't know whether the anchor is on a scalar, sequence or mapping, let alone have access to the value for 'name'.
Changing the anchor during loading is tricky, as you have to keep track of aliases referring to the original anchor (and map them to its new value).
In PyYAML the anchor name is created during dumping, so you would have to hook into that when using PyYAML. You can do the same with ruamel.yaml.
Only ruamel.yaml can preserve an anchor on round-trip, i.e. you can keep an anchor persistent even if the value for the key 'name' changes (assuming you test, e.g., for the default generated form idNNNN before replacing it).
When you use ruamel.yaml you can recursively walk the data-structure, keeping track of nodes already visited (in case a child contains an ancestor) and when encountering a ruamel.yaml.comments.CommentedMap, set the anchor (currently the attribute with the value of ruamel.yaml.comments.Anchor.attrib i.e. _yaml_anchor). Untested code:
if isinstance(x, ruamel.yaml.comments.CommentedMap):
    if 'name' in x:
        x.yaml_set_anchor(x['name'])
If you have a YAML document that you can round-trip you can hook into the representer:
import sys
import ruamel.yaml
from ruamel.yaml.representer import RoundTripRepresenter
yaml_str = """\
# data = [dict(a=1, b=2, name='mydata'), dict(c=3)]
# data.append(data[0])
- &id001
  a: 1
  b: 2
  name: mydata
- c: 3
- *id001
"""
class MyRTR(RoundTripRepresenter):
    def represent_mapping(self, tag, mapping, flow_style=None):
        if 'name' in mapping:
            # if not isinstance(mapping, ruamel.yaml.comments.CommentedMap):
            #     mapping = ruamel.yaml.comments.CommentedMap(mapping)
            mapping.yaml_set_anchor(mapping['name'])
        return RoundTripRepresenter.represent_mapping(
            self, tag, mapping, flow_style=flow_style)
yaml = ruamel.yaml.YAML()
yaml.Representer = MyRTR
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
which gives:
# data = [dict(a=1, b=2, name='mydata'), dict(c=3)]
# data.append(data[0])
- &mydata a: 1
b: 2
name: mydata
- c: 3
- *mydata
But note that this assumes that you loaded the data and that all dicts are actually CommentedMaps under the hood. If that is not the case (i.e. you added normal dicts), then uncomment the two lines doing the conversion.
I have a YAML file and it looks like below
test:
- exam.com
- exam1.com
- exam2.com
test2:
- examp.com
- examp1.com
- examp2.com
I'd like to manage this file using Python.
The task is to add an entry under "test2" and delete an entry from "test".
You first have to load the data, which gives you a top-level dict (in a variable called data in the following example) whose values are lists. On those lists you can use del to remove an entry and insert() (or append()) to add one.
import sys
import ruamel.yaml
yaml_str = """\
test:
- exam.com
- exam1.com
- exam2.com
test2:
- examp.com
- examp1.com # want to insert after this
- examp2.com
"""
data = ruamel.yaml.round_trip_load(yaml_str)
del data['test'][1]
data['test2'].insert(2, 'examp1.5')
ruamel.yaml.round_trip_dump(data, sys.stdout, block_seq_indent=1)
gives:
test:
- exam.com
- exam2.com
test2:
- examp.com
- examp1.com # want to insert after this
- examp1.5
- examp2.com
The block_seq_indent=1 is necessary as by default ruamel.yaml will left align a sequence value with the key.¹
If you want to get rid of the comment in the output you can do:
data['test2']._yaml_comment = None
¹ This was done using ruamel.yaml a YAML 1.2 parser, of which I am the author.
I've been trying to dump a dictionary to a YAML file. The problem is that the program that imports the YAML file needs the keywords in a specific order. This order is not alphabetical.
import yaml
import os
baseFile = 'myfile.dat'
lyml = [{'BaseFile': baseFile}]
lyml.append({'Environment':{'WaterDepth':0.,'WaveDirection':0.,'WaveGamma':0.,'WaveAlpha':0.}})
CaseName = 'OrderedDict.yml'
CaseDir = r'C:\Users\BTO\Documents\Projects\Mooring code testen'
CaseFile = os.path.join(CaseDir, CaseName)
with open(CaseFile, 'w') as f:
    yaml.dump(lyml, f, default_flow_style=False)
This produces a *.yml file which is formatted like this:
- BaseFile: myfile.dat
- Environment:
    WaterDepth: 0.0
    WaveAlpha: 0.0
    WaveDirection: 0.0
    WaveGamma: 0.0
But what I want is that the order is preserved:
- BaseFile: myfile.dat
- Environment:
    WaterDepth: 0.0
    WaveDirection: 0.0
    WaveGamma: 0.0
    WaveAlpha: 0.0
Is this possible?
yaml.dump has a sort_keys keyword argument that is set to True by default. Set it to False to not reorder:
with open(CaseFile, 'w') as f:
    yaml.dump(lyml, f, default_flow_style=False, sort_keys=False)
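To check the effect on the data from the question, a short sketch assuming Python 3.7+ (where plain dicts keep insertion order) and PyYAML 5.1 or later. It dumps to a string instead of the file so the result is easy to inspect.

```python
import yaml  # PyYAML >= 5.1 is needed for sort_keys

lyml = [{'BaseFile': 'myfile.dat'}]
lyml.append({'Environment': {'WaterDepth': 0., 'WaveDirection': 0.,
                             'WaveGamma': 0., 'WaveAlpha': 0.}})

# Dicts already hold the keys in insertion order; sort_keys=False stops PyYAML re-sorting them.
text = yaml.dump(lyml, default_flow_style=False, sort_keys=False)
print(text)
```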
Use an OrderedDict instead of a dict, and run the setup code below at the start. yaml.dump should then preserve the order.
import yaml
from collections import OrderedDict
def setup_yaml():
    """ https://stackoverflow.com/a/8661021 """
    represent_dict_order = lambda self, data: self.represent_mapping('tag:yaml.org,2002:map', data.items())
    yaml.add_representer(OrderedDict, represent_dict_order)
setup_yaml()
Example: https://pastebin.com/raw.php?i=NpcT6Yc4
PyYAML supports representers to serialize a class instance to a YAML node.
yaml.YAMLObject uses metaclass magic to register a constructor, which transforms a YAML node to a class instance, and a representer, which serializes a class instance to a YAML node.
Add the following lines above your code:
def represent_dictionary_order(self, dict_data):
    return self.represent_mapping('tag:yaml.org,2002:map', dict_data.items())
def setup_yaml():
    yaml.add_representer(OrderedDict, represent_dictionary_order)
setup_yaml()
Then you can use OrderedDict to preserve the order in yaml.dump():
import yaml
from collections import OrderedDict
def represent_dictionary_order(self, dict_data):
    return self.represent_mapping('tag:yaml.org,2002:map', dict_data.items())
def setup_yaml():
    yaml.add_representer(OrderedDict, represent_dictionary_order)
setup_yaml()
dic = OrderedDict()
dic['a'] = 1
dic['b'] = 2
dic['c'] = 3
print(yaml.dump(dic))
# {a: 1, b: 2, c: 3}   (PyYAML < 5.1; from 5.1 on the default is block style, one key per line)
Your difficulties are a result of assumptions on multiple levels that are incorrect and, depending on your YAML parser, might not be transparently resolvable.
In Python's dict the keys are unordered (at least for Python < 3.6). And even though the keys have some order in the source file, as soon as they are in the dict they aren't:
d = {'WaterDepth':0.,'WaveDirection':0.,'WaveGamma':0.,'WaveAlpha':0.}
for key in d:
    print key
gives:
WaterDepth
WaveGamma
WaveAlpha
WaveDirection
If you want your keys ordered you can use the collections.OrderedDict type (or my own ruamel.ordereddict type, which is written in C and more than an order of magnitude faster), and you have to add the keys in order, e.g. as a list of tuples:
from ruamel.ordereddict import ordereddict
# from collections import OrderedDict as ordereddict # < this will work as well
d = ordereddict([('WaterDepth', 0.), ('WaveDirection', 0.), ('WaveGamma', 0.), ('WaveAlpha', 0.)])
for key in d:
    print key
which will print the keys in the order they were specified in the source.
The second problem is that even if a Python dict has some key ordering that happens to be what you want, the YAML specification explicitly says that mappings are unordered, and that is how, e.g., PyYAML implements dumping a Python dict to a YAML mapping (and the other way around).
Also, if you dump an ordereddict or OrderedDict you normally don't get the plain YAML mapping that you indicate you want, but some tagged YAML entry.
Losing the order is often undesirable: in your case because your reader assumes some order, in my case because it made versions hard to compare, as key ordering would not be consistent after insertion/deletion. So I implemented round-trip consistency in ruamel.yaml, and you can do:
import sys
import ruamel.yaml as yaml
yaml_str = """\
- BaseFile: myfile.dat
- Environment:
    WaterDepth: 0.0
    WaveDirection: 0.0
    WaveGamma: 0.0
    WaveAlpha: 0.0
"""
data = yaml.load(yaml_str, Loader=yaml.RoundTripLoader)
print(data)
yaml.dump(data, sys.stdout, Dumper=yaml.RoundTripDumper)
which gives you exactly the output you want. data works like a list of dicts (and data[1]['Environment'] works like a dict), but underneath these are smarter constructs that preserve order, comments, YAML anchor names, etc. You can of course change them (adding/deleting key-value pairs), which is easy, but you can also build them from scratch:
import sys
import ruamel.yaml as yaml
from ruamel.yaml.comments import CommentedMap
baseFile = 'myfile.dat'
lyml = [{'BaseFile': baseFile}]
lyml.append({'Environment': CommentedMap([('WaterDepth', 0.), ('WaveDirection', 0.), ('WaveGamma', 0.), ('WaveAlpha', 0.)])})
yaml.dump(lyml, sys.stdout, Dumper=yaml.RoundTripDumper)
Which again prints the contents with keys in the order you want them.
I find the latter less readable than starting from a YAML string, but it does construct the lyml data structure somewhat faster.
oyaml is a Python library which preserves dict ordering when dumping.
It is specifically helpful in more complex cases where the dictionary is nested and may contain lists.
Once installed:
import oyaml as yaml
with open(CaseFile, 'w') as f:
    f.write(yaml.dump(lyml))
I'm using yaml.dump to output a dict. It prints out each item in alphabetical order based on the key.
>>> d = {"z":0,"y":0,"x":0}
>>> yaml.dump( d, default_flow_style=False )
'x: 0\ny: 0\nz: 0\n'
Is there a way to control the order of the key/value pairs?
In my particular use case, printing in reverse would (coincidentally) be good enough. For completeness though, I'm looking for an answer that shows how to control the order more precisely.
I've looked at using collections.OrderedDict but PyYAML doesn't (seem to) support it. I've also looked at subclassing yaml.Dumper, but I haven't been able to figure out if it has the ability to change item order.
If you upgrade PyYAML to version 5.1 or later, it supports dumping without sorting the keys:
yaml.dump(data, sort_keys=False)
As shown in help(yaml.Dumper), sort_keys defaults to True:
Dumper(stream, default_style=None, default_flow_style=False,
canonical=None, indent=None, width=None, allow_unicode=None,
line_break=None, encoding=None, explicit_start=None, explicit_end=None,
version=None, tags=None, sort_keys=True)
(These are passed as kwargs to yaml.dump)
There's probably a better workaround, but I couldn't find anything in the documentation or the source.
Python 2 (see comments)
I subclassed OrderedDict and made it return a list of unsortable items:
import yaml
from collections import OrderedDict
class UnsortableList(list):
    def sort(self, *args, **kwargs):
        pass
class UnsortableOrderedDict(OrderedDict):
    def items(self, *args, **kwargs):
        return UnsortableList(OrderedDict.items(self, *args, **kwargs))
yaml.add_representer(UnsortableOrderedDict, yaml.representer.SafeRepresenter.represent_dict)
And it seems to work:
>>> d = UnsortableOrderedDict([
... ('z', 0),
... ('y', 0),
... ('x', 0)
... ])
>>> yaml.dump(d, default_flow_style=False)
'z: 0\ny: 0\nx: 0\n'
Python 3 or 2 (see comments)
You can also write a custom representer, but I don't know if you'll run into problems later on, as I stripped out some style checking code from it:
import yaml
from collections import OrderedDict
def represent_ordereddict(dumper, data):
    value = []
    for item_key, item_value in data.items():
        node_key = dumper.represent_data(item_key)
        node_value = dumper.represent_data(item_value)
        value.append((node_key, node_value))
    return yaml.nodes.MappingNode(u'tag:yaml.org,2002:map', value)
yaml.add_representer(OrderedDict, represent_ordereddict)
But with that, you can use the native OrderedDict class.
For Python 3.7+, dicts preserve insertion order. Since PyYAML 5.1, you can disable the sorting of keys (#254). Unfortunately, sort_keys still defaults to True.
>>> import yaml
>>> yaml.dump({"b":1, "a": 2})
'a: 2\nb: 1\n'
>>> yaml.dump({"b":1, "a": 2}, sort_keys=False)
'b: 1\na: 2\n'
My project oyaml is a monkeypatch/drop-in replacement for PyYAML. It will preserve dict order by default in all Python versions and PyYAML versions.
>>> import oyaml as yaml # pip install oyaml
>>> yaml.dump({"b":1, "a": 2})
'b: 1\na: 2\n'
Additionally, it will dump the collections.OrderedDict subclass as normal mappings, rather than Python objects.
>>> from collections import OrderedDict
>>> d = OrderedDict([("b", 1), ("a", 2)])
>>> import yaml
>>> yaml.dump(d)
'!!python/object/apply:collections.OrderedDict\n- - - b\n - 1\n - - a\n - 2\n'
>>> yaml.safe_dump(d)
RepresenterError: ('cannot represent an object', OrderedDict([('b', 1), ('a', 2)]))
>>> import oyaml as yaml
>>> yaml.dump(d)
'b: 1\na: 2\n'
>>> yaml.safe_dump(d)
'b: 1\na: 2\n'
One-liner to rule them all:
yaml.add_representer(dict, lambda self, data: yaml.representer.SafeRepresenter.represent_dict(self, data.items()))
That's it. Finally. After all those years and hours, the mighty represent_dict has been defeated by giving it dict.items() instead of just the dict.
Here is how it works:
This is the relevant PyYaml source code:
if hasattr(mapping, 'items'):
    mapping = list(mapping.items())
    try:
        mapping = sorted(mapping)
    except TypeError:
        pass
for item_key, item_value in mapping:
To prevent the sorting we just need some Iterable[Pair] object that does not have .items().
dict_items is a perfect candidate for this.
Here is how to do this without affecting the global state of the yaml module:
import yaml
# Using a custom Dumper class to prevent changing the global state
class CustomDumper(yaml.Dumper):
    # Super neat hack to preserve the mapping key order. See https://stackoverflow.com/a/52621703/1497385
    def represent_dict_preserve_order(self, data):
        return self.represent_dict(data.items())
CustomDumper.add_representer(dict, CustomDumper.represent_dict_preserve_order)
# The return belongs inside your own function; dump_component is an illustrative name
def dump_component(component_dict):
    return yaml.dump(component_dict, Dumper=CustomDumper)
This is really just an addendum to #Blender's answer. If you look in the PyYAML source, at the representer.py module, you find this method:
def represent_mapping(self, tag, mapping, flow_style=None):
    value = []
    node = MappingNode(tag, value, flow_style=flow_style)
    if self.alias_key is not None:
        self.represented_objects[self.alias_key] = node
    best_style = True
    if hasattr(mapping, 'items'):
        mapping = mapping.items()
        mapping.sort()
    for item_key, item_value in mapping:
        node_key = self.represent_data(item_key)
        node_value = self.represent_data(item_value)
        if not (isinstance(node_key, ScalarNode) and not node_key.style):
            best_style = False
        if not (isinstance(node_value, ScalarNode) and not node_value.style):
            best_style = False
        value.append((node_key, node_value))
    if flow_style is None:
        if self.default_flow_style is not None:
            node.flow_style = self.default_flow_style
        else:
            node.flow_style = best_style
    return node
If you simply remove the mapping.sort() line, then it maintains the order of items in the OrderedDict.
Another solution is given in this post. It's similar to #Blender's, but works for safe_dump. The common element is converting the dict to a list of tuples, so that the if hasattr(mapping, 'items') check evaluates to false.
Update:
I just noticed that The Fedora Project's EPEL repo has a package called python2-yamlordereddictloader, and there's one for Python 3 as well. The upstream project for that package is likely cross-platform.
There are two things you need to do to get this as you want:
you need to use something other than a dict, because it doesn't keep the items ordered
you need to dump that alternative in the appropriate way.¹
import sys
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap
d = CommentedMap()
d['z'] = 0
d['y'] = 0
d['x'] = 0
ruamel.yaml.round_trip_dump(d, sys.stdout)
output:
z: 0
y: 0
x: 0
¹ This was done using ruamel.yaml a YAML 1.2 parser, of which I am the author.
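If safe_dump (i.e. dump with Dumper=SafeDumper) is used, then calling yaml.add_representer has no effect, because yaml.add_representer registers on yaml.Dumper by default. In such a case it is necessary to call the add_representer method explicitly on the SafeRepresenter class:
import yaml
from collections import OrderedDict
def ordered_dict_representer(dumper, data):
    return dumper.represent_mapping('tag:yaml.org,2002:map', data.items())
yaml.representer.SafeRepresenter.add_representer(OrderedDict, ordered_dict_representer)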
I was also looking for an answer to the question "how to dump mappings with the order preserved?" I couldn't follow the solutions given above as I am new to PyYAML and Python. After spending some time on the PyYAML documentation and other forums, I found this.
You can use the tag !!omap to dump mappings while preserving the order. If you want to play with the order, I think you have to go for keys:values.
The links below can help for better understanding.
https://bitbucket.org/xi/pyyaml/issue/13/loading-and-then-dumping-an-omap-is-broken
http://yaml.org/type/omap.html
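A minimal PyYAML sketch of the !!omap route, assuming PyYAML 5.1 or later; represent_omap is a hypothetical representer name. It dumps an OrderedDict as a tagged sequence of one-pair mappings and relies on SafeLoader's built-in !!omap constructor, which returns a list of (key, value) tuples, to read it back.

```python
from collections import OrderedDict

import yaml


def represent_omap(dumper, data):
    # Emit an !!omap: a sequence of single-pair mappings, one per key, in order.
    return dumper.represent_sequence('tag:yaml.org,2002:omap',
                                     [{key: value} for key, value in data.items()])


yaml.SafeDumper.add_representer(OrderedDict, represent_omap)

d = OrderedDict([('z', 0), ('y', 0), ('x', 0)])
text = yaml.safe_dump(d, default_flow_style=False)
print(text)

# SafeLoader turns an !!omap back into a list of (key, value) tuples.
pairs = yaml.safe_load(text)
restored = OrderedDict(pairs)
```

The trade-off is that the file now contains a tagged sequence rather than a plain mapping, so the program reading it has to understand !!omap.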
The following setting makes sure the content is not sorted in the output:
yaml.sort_base_mapping_type_on_output = False
I'm attempting to:
load dictionary
update/change the dictionary
save
(repeat)
Problem: I want to work with just 1 dictionary (players_scores)
but the defaultdict expression creates a completely separate dictionary.
How do I load, update, and save to one dictionary?
Code:
from collections import defaultdict#for manipulating dict
players_scores = defaultdict(dict)
import ast #module for removing string from dict once it's called back
a = {}
open_file = open("scores", "w")
open_file.write(str(a))
open_file.close()
open_file2 = open("scores")
open_file2.readlines()
open_file2.seek(0)
i = input("Enter new player's name: ").upper()
players_scores[i]['GOLF'] = 0
players_scores[i]['MON DEAL'] = 0
print()
scores_str = open_file2.read()
players_scores = ast.literal_eval(scores_str)
open_file2.close()
print(players_scores)
You are wiping your changes; instead of writing out your file, you read it anew and the result is used to replace your players_scores dictionary. Your defaultdict worked just fine before that, even if you can't really use defaultdict here (ast.literal_eval() does not support collections.defaultdict, only standard python literal dict notation).
You can simplify your code by using the json module here:
import json
try:
    with open('scores', 'r') as f:
        players_scores = json.load(f)
except IOError:
    # no such file, create an empty dictionary
    players_scores = {}
name = input("Enter new player's name: ").upper()
# create a complete, new dictionary
players_scores[name] = {'GOLF': 0, 'MON DEAL': 0}
with open('scores', 'w') as f:
    json.dump(players_scores, f)
You don't need defaultdict here at all; you are only creating new dictionary for every player name anyway.
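Since the question's loop is load, update, save, repeat, here is a sketch of the same json approach with one cycle wrapped in a function, so each call works on the single dictionary read from disk. add_player and the temporary-directory demo are illustrative only; note that re-adding an existing name resets that player's scores.

```python
import json
import os
import tempfile


def add_player(path, name):
    """Load the scores file (if any), add one player, and save it back (hypothetical helper)."""
    try:
        with open(path) as f:
            players_scores = json.load(f)
    except FileNotFoundError:
        players_scores = {}  # first run: start with an empty dictionary
    players_scores[name.upper()] = {'GOLF': 0, 'MON DEAL': 0}
    with open(path, 'w') as f:
        json.dump(players_scores, f)
    return players_scores


# Demo in a throwaway directory: two cycles accumulate into one dictionary on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'scores')
    add_player(path, 'alice')
    scores = add_player(path, 'bob')
print(scores)
```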
I think one problem is that to index the data structure the way you want, something like a defaultdict(defaultdict(dict)) is what's really needed, but unfortunately it's impossible to specify one directly like that. However, to work around that, all you need to do is define a simple intermediary factory function to pass to the upper-level defaultdict:
from collections import defaultdict
def defaultdict_factory(*args, **kwargs):
    """ Create and return a defaultdict(dict). """
    return defaultdict(dict, *args, **kwargs)
Then you can use players_scores = defaultdict(defaultdict_factory) to create one.
However, ast.literal_eval() won't work with one that's been converted to its string representation, because it's not one of the simple literal data types the function supports. Instead I would suggest you consider using Python's venerable pickle module, which can handle most of Python's built-in data types as well as custom classes like the one I'm describing. Here's an example of applying it to your code (in conjunction with the code above):
import pickle
try:
    with open('scores', 'rb') as input_file:
        players_scores = pickle.load(input_file)
except FileNotFoundError:
    print('new scores file will be created')
    players_scores = defaultdict(defaultdict_factory)
player_name = input("Enter new player's name: ").upper()
players_scores[player_name]['GOLF'] = 0
players_scores[player_name]['MON DEAL'] = 0
# below is a shorter way to do the initialization for a new player
# players_scores[player_name] = defaultdict_factory({'GOLF': 0, 'MON DEAL': 0})
# write new/updated data structure (back) to disk
with open('scores', 'wb') as output_file:
    pickle.dump(players_scores, output_file)
print(players_scores)