Using PyYAML, if I read in a file with blank values in a dict:
test_str = '''
attrs:
  first:
  second: value2
'''
This returns None for the key first:
>>> data = yaml.load(test_str)
>>> data
{'attrs': {'second': 'value2', 'first': None}}
But when writing, the None value is replaced with null:
>>> print(yaml.dump(data, default_flow_style=False))
attrs:
  first: null
  second: value2
Is there a way to format the dump output to print a blank scalar rather than null?
Based on @Anthon's excellent answer, I was able to craft this solution:
def represent_none(self, _):
    return self.represent_scalar('tag:yaml.org,2002:null', '')

yaml.add_representer(type(None), represent_none)
Based on my understanding of the PyYAML code, adding a representer for an existing type should simply replace the existing representer.
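For example, after registering the representer above, None values dump as blanks:
>>> print(yaml.dump({'first': None, 'second': 'value2'}, default_flow_style=False))
first:
second: value2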
This is a global change, which means that all subsequent dumps use a blank. If some unrelated piece of code in your program relies on None being represented in the "normal" way, e.g. a library that you import that also uses PyYAML, then that library will no longer work as expected; in that case, subclassing is the correct way to go.
You get null because dump() uses the Representer(), which subclasses SafeRepresenter(), and to represent None the following method is called:
def represent_none(self, data):
    return self.represent_scalar(u'tag:yaml.org,2002:null',
                                 u'null')
As the string null is hardcoded, there is no option to dump() to change that.
The proper way to solve this in PyYAML is to make your own Dumper subclass, which has the Emitter, Serializer, and Resolver from the standard Dumper that dump() uses, but with a subclass of Representer that represents None the way you want it:
import sys
import yaml
from yaml.representer import Representer
from yaml.dumper import Dumper
from yaml.emitter import Emitter
from yaml.serializer import Serializer
from yaml.resolver import Resolver
yaml_str = """\
attrs:
first:
second: value2
"""
class MyRepresenter(Representer):
    def represent_none(self, data):
        return self.represent_scalar(u'tag:yaml.org,2002:null',
                                     u'')
class MyDumper(Emitter, Serializer, MyRepresenter, Resolver):
    def __init__(self, stream,
                 default_style=None, default_flow_style=None,
                 canonical=None, indent=None, width=None,
                 allow_unicode=None, line_break=None,
                 encoding=None, explicit_start=None, explicit_end=None,
                 version=None, tags=None):
        Emitter.__init__(self, stream, canonical=canonical,
                         indent=indent, width=width,
                         allow_unicode=allow_unicode, line_break=line_break)
        Serializer.__init__(self, encoding=encoding,
                            explicit_start=explicit_start, explicit_end=explicit_end,
                            version=version, tags=tags)
        MyRepresenter.__init__(self, default_style=default_style,
                               default_flow_style=default_flow_style)
        Resolver.__init__(self)

MyRepresenter.add_representer(type(None),
                              MyRepresenter.represent_none)
data = yaml.load(yaml_str)
yaml.dump(data, stream=sys.stdout, Dumper=MyDumper, default_flow_style=False)
gives you:
attrs:
  first:
  second: value2
If that sounds like a lot of overhead just to get rid of null, it is. There are some shortcuts you can take, and you can even try to graft the alternate function onto the existing Representer, but since the actual function used is referenced in a lookup table (populated by add_representer), you need to handle at least that reference as well.
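For illustration, a minimal sketch of that graft shortcut, relying on the internal yaml_representers lookup table (an implementation detail, so treat this as fragile); it patches SafeRepresenter's entry for None and restores it afterwards:
import yaml
from yaml.representer import SafeRepresenter

def represent_none_blank(self, data):
    return self.represent_scalar('tag:yaml.org,2002:null', '')

# patch the lookup-table entry directly, keeping the original around
original = SafeRepresenter.yaml_representers[type(None)]
SafeRepresenter.yaml_representers[type(None)] = represent_none_blank
try:
    print(yaml.safe_dump({'first': None}, default_flow_style=False))  # emits "first:"
finally:
    SafeRepresenter.yaml_representers[type(None)] = original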
The far easier solution is to replace PyYAML with ruamel.yaml and use its round_trip functionality (disclaimer: I am the author of that package):
import ruamel.yaml
yaml_str = """\
# trying to round-trip preserve empty scalar
attrs:
first:
second: value2
"""
data = ruamel.yaml.round_trip_load(yaml_str)
assert ruamel.yaml.round_trip_dump(data) == yaml_str
Apart from emitting None as the empty scalar, it also preserves the order of mapping keys, comments, and tag names, none of which PyYAML does. ruamel.yaml also follows the YAML 1.2 specification (from 2009), where PyYAML uses the older YAML 1.1.
The ruamel.yaml package can be installed with pip from PyPI, or, on modern Debian-based distributions, with apt-get install python-ruamel.yaml.
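With newer ruamel.yaml releases (0.15 and later), the same round trip can also be written through the YAML() API; a sketch, reusing yaml_str from above:
import sys
import ruamel.yaml

yaml = ruamel.yaml.YAML()  # round-trip mode is the default
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)  # emits the input unchanged, blank scalar included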
Extending @Jace Browning's answer while addressing @Anthon's concern, we can use a context manager which remembers the prior representer for None:
class BlankNone(Representer):
    """Print None as blank when used as context manager"""
    def represent_none(self, *_):
        return self.represent_scalar(u'tag:yaml.org,2002:null', u'')

    def __enter__(self):
        self.prior = Dumper.yaml_representers[type(None)]
        yaml.add_representer(type(None), self.represent_none)

    def __exit__(self, exc_type, exc_val, exc_tb):
        Dumper.yaml_representers[type(None)] = self.prior
which can be used thus:
with BlankNone(), open(file, 'wt') as f:
    yaml.dump(hosts, f)
Just use a string replace:
print(yaml.dump(data).replace("null", ""))
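Note that this will also clobber any other occurrence of the substring null in the output. A slightly narrower text-level variant (still a hack, offered only as a sketch) replaces only values that are exactly the plain scalar null:
import re
print(re.sub(r'(?m):\s*null\s*$', ':', yaml.dump(data, default_flow_style=False)))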
I'm using python 3.x and pyyaml. I'm not married to pyyaml if I need to replace it.
There are a number of questions (with answers) on how to replace a value in a yaml file with the value of an environment variable.
E.g. db_password: !ENV DB_PASSWORD becomes db_password: s00p3rs3kr3t.
The user and the program can make changes to other values (e.g., user sets db_table with cli option, program sets generated hash value).
I want to save those changes without saving the value of the environment variable for db_password.
A simplified example of what I have looks like the following code.
def my_regex():
    return regex

def resolve_env_vars(loader, node):
    # replace string with environment variable value
    ...

loader = yaml.SafeLoader
loader.add_implicit_resolver('!ENV', my_regex(), None)
loader.add_constructor('!ENV', resolve_env_vars)

with open(yamlfile, 'r') as raw:
    cfg = yaml.load(raw, Loader=loader)
While this works fine for loading the value into the resulting dict, I need to figure out some way of noting the original value and which key it goes with.
I have stepped through the entire process with pudb and I cannot find a way to restore the original value when writing the config file. By the time the code gets to resolve_env_vars the associated key (e.g., db_password in the example above) is not accessible.
How do I save db_password: !ENV DB_PASSWORD instead of db_password: s00p3rs3kret when writing the data back to the config file?
You need the tag to cause the creation of an instance that behaves like a string, but has the original environment variable tucked onto it, so it can be found at dump time:
import sys
import os
import ruamel.yaml
yaml_str = """\
db_password: !ENV DB_PASSWORD
"""
yaml = ruamel.yaml.YAML(typ='safe')
yaml.default_flow_style = False
@yaml.register_class
class EnvStr(str):
    yaml_tag = '!ENV'

    def __new__(cls, env_var):
        ret_val = str.__new__(cls, os.environ.get(env_var, f'ENV "{env_var}" NOT SET'))
        ret_val.env_var = env_var
        return ret_val

    @classmethod
    def from_yaml(cls, constructor, node):
        return cls(node.value)

    @classmethod
    def to_yaml(cls, representer, node):
        return representer.represent_scalar(cls.yaml_tag, node.env_var)
os.environ['DB_PASSWORD'] = 's00p3rs3kr3t'
data = yaml.load(yaml_str)
print(f'The password is "{data["db_password"]}" (without the double quotes). Keep it safe!')
print('\nYAML dump:')
yaml.dump(data, sys.stdout)
which gives:
The password is "s00p3rs3kr3t" (without the double quotes). Keep it safe!
YAML dump:
db_password: !ENV DB_PASSWORD
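Since EnvStr subclasses str, the loaded value behaves like the resolved secret everywhere a string is expected, while the original variable name stays attached; a quick check against the data loaded above:
assert data['db_password'] == 's00p3rs3kr3t'
assert data['db_password'].env_var == 'DB_PASSWORD'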
I have a YAML file for storing constants. Some of the entries have custom tags such as !HANDLER or !EXPR and these are easily handled by adding constructors to the YAML loader.
However, I also want to have a custom constructor for non-tagged nodes. The reason being, I want to add these non-tagged values to a dictionary for use elsewhere. These values need to be available before parsing finishes, hence I can't just let parsing finish and then update the dictionary.
So with a YAML file like
sample_rate: 16000
input_file: !HANDLER
  handler_fn: file_loader
  handler_input: path/to/file
  mode: w
I have a handler constructor
def file_handler_loader(loader, node):
    params = loader.construct_mapping(node)
    # pop handler_fn once and reuse it; popping it twice would raise KeyError
    handler_fn = params.pop('handler_fn')
    module = __import__('handlers.file_handlers', fromlist=[handler_fn])
    func = getattr(module, handler_fn)
    handler_input = params.pop('handler_input')
    return func(handler_input, **params)
And a function initialize_constants
def _get_loader():
    loader = FullLoader
    loader.add_constructor('!HANDLER', file_handler_loader)
    loader.add_constructor('!EXPR', expression_loader)
    return loader

def initialize_constants(path_to_yaml: str) -> None:
    try:
        with open(path_to_yaml, 'r') as yaml_file:
            constants = yaml.load(yaml_file, Loader=_get_loader())
    except FileNotFoundError as ex:
        LOGGER.error(ex)
        exit(-1)
The goal is then to have a constructor for non-tagged entries in the YAML. I haven't been able to figure out, though, how to add a constructor for non-tagged entries. Ideally, the code would look like the following:
def default_constructor(loader, node):
    param = loader.construct_scalar(node)
    constants[node_name] = param
I've also attempted to add a resolver to solve the problem. The code below was tested but didn't work as expected.
loader.add_constructor('!DEF', default_constructor)
loader.add_implicit_resolver('!DEF', re.compile('.*'), first=None)

def default_constructor(loader, node):
    # do stuff
    ...
In this case what happened was the node contained the value sample_rate and not the 16000 as expected.
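A minimal reproduction of that behavior (trimmed down): implicit resolvers are consulted for every plain, untagged scalar, keys included, and the built-in resolvers (e.g. for ints) are matched first by leading character, so the catch-all regex fires for the key sample_rate but never for 16000:
import re
import yaml

def default_constructor(loader, node):
    print('constructing:', node.value)
    return node.value

# note: this mutates SafeLoader globally; fine for a throwaway demo
loader = yaml.SafeLoader
loader.add_constructor('!DEF', default_constructor)
loader.add_implicit_resolver('!DEF', re.compile('.*'), None)

yaml.load('sample_rate: 16000', Loader=loader)
# prints: constructing: sample_rate   (16000 resolves to int first)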
Thanks in advance :)
I'm writing a unit test for a function that takes an array of dictionaries and ends up saving it in a CSV. I'm trying to mock it with pytest as usual:
csv_output = (
    "Name\tSurname\r\n"
    "Eve\tFirst\r\n"
)
with patch("builtins.open", mock_open()) as m:
export_csv_func(array_of_dicts)
assert m.assert_called_once_with('myfile.csv', 'wb') is None
[and here I want to gather all output sent to the mock "m" and assert it against "csv_output"]
I cannot find any simple way to get all the data sent to the mock during the open() phase, so that I can do the comparison in bulk instead of line by line. To simplify things, I verified that the following code mimics the operations that export_csv_func() performs on the mock:
with patch("builtins.open", mock_open()) as m:
with open("myfile.csv", "wb") as f:
f.write("Name\tSurname\r\n")
f.write("Eve\tFirst\r\n")
When I dig into the mock, I see:
>>> m
<MagicMock name='open' spec='builtin_function_or_method' id='4380173840'>
>>> m.mock_calls
[call('myfile.csv', 'wb'),
call().__enter__(),
call().write('Name\tSurname\r\n'),
call().write('Eve\tFirst\r\n'),
call().__exit__(None, None, None)]
>>> m().write.mock_calls
[call('Name\tSurname\r\n'), call('Eve\tFirst\r\n')]
>>> dir(m().write.mock_calls[0])
['__add__'...(many methods), '_mock_from_kall', '_mock_name', '_mock_parent', 'call_list', 'count', 'index']
I don't see anything in the MagicMock interface where I can gather all the input that the mock has received.
I also tried calling m().write.call_args but it only returns the last call (the last element of the mock_calls attribute, i.e. call('Eve\tFirst\r\n')).
Is there any way of doing what I want?
You can create your own mock.call objects and compare them with what you have in the .call_args_list.
from unittest.mock import patch, mock_open, call

with patch("builtins.open", mock_open()) as m:
    with open("myfile.csv", "wb") as f:
        f.write("Name\tSurname\r\n")
        f.write("Eve\tFirst\r\n")

# Create your array of expected strings
expected_strings = ["Name\tSurname\r\n", "Eve\tFirst\r\n"]
write_calls = m().write.call_args_list

for expected_str in expected_strings:
    # assert that a mock.call(expected_str) exists in the write calls
    assert call(expected_str) in write_calls
Note that you can use the assert call of your choice. If you're in a unittest.TestCase subclass, prefer to use self.assertIn.
Additionally, if you just want the argument values, you can unpack a mock.call object as a tuple: index 0 holds the positional args, index 1 the kwargs. For example:
for write_call in write_calls:
    print('args: {}'.format(write_call[0]))
    print('kwargs: {}'.format(write_call[1]))
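On Python 3.8 and later, call objects also expose args and kwargs attributes, which reads a little better than indexing:
for write_call in write_calls:
    print('args: {}'.format(write_call.args))
    print('kwargs: {}'.format(write_call.kwargs))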
Indeed you can't patch builtins.open.write directly, since the patch within a with would need to enter the patched method and would find that write is not a class method.
There are a bunch of solutions and the one I would think of first would be to use your own mock. See the example:
from unittest.mock import patch

class MockOpenWrite:
    def __init__(self, *args, **kwargs):
        self.res = []

    # What's actually mocking the write. The name must match.
    def write(self, s: str):
        self.res.append(s)

    # These 2 methods are needed specifically for the use of with.
    # If you mock using a decorator, you don't need them anymore.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        return

mock = MockOpenWrite
with patch("builtins.open", mock):
    with open("myfile.csv", "w") as f:
        f.write("Name\tSurname\r\n")
        f.write("Eve\tFirst\r\n")
        print(f.res)
In that case, the res attribute is tied to the instance, so it disappears once the with block closes.
You could instead store results somewhere else, like a global array, and check them beyond the end of the with.
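For example, a sketch that collects writes in a module-level list so they survive the with block:
from unittest.mock import patch

results = []

class MockOpenWrite:
    def __init__(self, *args, **kwargs):
        pass

    def write(self, s: str):
        results.append(s)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        return

with patch("builtins.open", MockOpenWrite):
    with open("myfile.csv", "w") as f:
        f.write("Name\tSurname\r\n")

assert results == ["Name\tSurname\r\n"]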
Feel free to play around with your actual method.
I had to do it this way (Python 3.9). It was quite tedious just to get the mock args out of the function.
from somewhere import my_thing

@patch("lib.function", return_value=MagicMock())
def test_my_thing(my_mock):
    my_thing(value1, value2)
    (value1_call_args, value2_call_args) = my_mock.call_args_list[0].args
I'm using ruamel.yaml to generate a YAML file that will be read by Tavern, which requires the file to contain a list like this:
includes:
- !include vars.yaml
Attempting to use any of the usual approaches to dump the data as strings results in single quotes being added around the tags, which doesn't work when the YAML is ingested by the next tool.
How do I generate a YAML file that contains unquoted local tags, starting with data that is defined in a dictionary?
I was able to create a YAML file with the required format using the following approach, based on prior examples. My approach is more flexible because it allows the tag handle to be an instance property rather than a class property, so you don't need to define a different class for every tag handle.
import sys
from ruamel.yaml import YAML
yaml = YAML(typ='rt')
class TaggedString:
    def __init__(self, handle, value):
        self.handle = handle
        self.value = value

    @classmethod
    def to_yaml(cls, representer, node):
        # I don't understand the arguments to the following function!
        return representer.represent_scalar(u'{.handle}'.format(node),
                                            u'{.value}'.format(node))
yaml.register_class(TaggedString)
data = {
    'includes': [
        TaggedString('!include', 'vars.yaml'),
        TaggedString('!exclude', 'dummy.yaml')
    ]
}
yaml.dump(data, sys.stdout)
Output:
includes:
- !include vars.yaml
- !exclude dummy.yaml
I am not sure if this is the best approach. I might be missing a simpler way to achieve the same result. Note that my goal is not to dump a Python class; I'm just doing that as a way to get the tag to be written correctly.
I am not sure if this is a better approach, but if you had tried to round-trip your required output, you would have seen that ruamel.yaml can actually preserve your tagged strings, without you having to do anything. Inspecting the Python data structure, you'll notice that ruamel.yaml does this by creating a TaggedScalar (as you cannot attach attributes to the built-in string type).
import sys
import ruamel.yaml
yaml_str = """\
includes:
- !include vars.yaml
- !exclude dummy.yaml
"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
incl = data['includes'][0]
print(type(incl))
which gives:
includes:
- !include vars.yaml
- !exclude dummy.yaml
<class 'ruamel.yaml.comments.TaggedScalar'>
After inspecting comments.py (and possibly constructor.py), you should be able to create ruamel.yaml's internal data structure on the fly:
import sys
import ruamel.yaml
from ruamel.yaml.comments import TaggedScalar
def tagged_string(tag, val):
    # starting with ruamel.yaml>0.16.5 you can replace the following lines with:
    #   return TaggedScalar(value=val, tag=tag)
    ret_val = TaggedScalar()
    ret_val.value = val
    ret_val.yaml_set_tag(tag)
    return ret_val

yaml = ruamel.yaml.YAML()
data = dict(includes=[tagged_string('!include', 'vars.yaml'),
                      tagged_string('!include', 'vars.yaml'),
                      ])
yaml.dump(data, sys.stdout)
which also gives:
includes:
- !include vars.yaml
- !include vars.yaml
import json
import yaml
from enum import IntEnum
from json import JSONEncoder

class MSG_TYPE(IntEnum):
    REQUEST = 0
    GRANT = 1
    RELEASE = 2
    FAIL = 3
    INQUIRE = 4
    YIELD = 5

    def __json__(self):
        return str(self)

class MessageEncoder(JSONEncoder):
    def default(self, obj):
        return obj.__json__()

class Message(object):
    def __init__(self, msg_type, src, dest, data):
        self.msg_type = msg_type
        self.src = src
        self.dest = dest
        self.data = data

    def __json__(self):
        return dict(
            msg_type=self.msg_type,
            src=self.src,
            dest=self.dest,
            data=self.data,
        )

    def ToJSON(self):
        return json.dumps(self, cls=MessageEncoder)

msg = Message(msg_type=MSG_TYPE.FAIL, src=0, dest=1, data="hello world")
encoded_msg = msg.ToJSON()
decoded_msg = yaml.load(encoded_msg)
print(type(decoded_msg['msg_type']))
When calling print(type(decoded_msg['msg_type'])), I get the result <type 'str'> instead of the original MSG_TYPE type. I feel like I should also write a custom JSON decoder, but I'm kind of confused about how to do that. Any ideas? Thanks.
When calling print(type(decoded_msg['msg_type'])), I get the result <type 'str'> instead of the original MSG_TYPE type.
Well, yeah, that's because you told MSG_TYPE to encode itself like this:
def __json__(self):
    return str(self)
So, that's obviously going to decode back to a string. If you don't want that, come up with some unique way to encode the values, instead of just encoding their string representations.
The most common way to do this is to encode all of your custom types (including your enum types) using some specialized form of object—just like you've done for Message. For example, you might put a py-type field in the object which encodes the type of your object, and then the meanings of the other fields all depend on the type. Ideally you'll want to abstract out the commonalities instead of hardcoding the same thing 100 times, of course.
I feel like I should also write a custom json decoder but kind of confused how to do that.
Well, have you read the documentation? Where exactly are you confused? You're not going to get a complete tutorial by tacking on a followup to a StackOverflow question…
Assuming you've got a special object structure for all your types, you can use an object_hook to decode the values back to the originals. For example, as a quick hack:
from json import JSONDecoder, JSONEncoder

class MessageEncoder(JSONEncoder):
    def default(self, obj):
        return {'py-type': type(obj).__name__, 'value': obj.__json__()}

class MessageDecoder(JSONDecoder):
    def __init__(self, hook=None, *args, **kwargs):
        if hook is None:
            hook = self.hook
        # object_hook is keyword-only on JSONDecoder, so pass it by name
        super().__init__(*args, object_hook=hook, **kwargs)

    def hook(self, obj):
        if isinstance(obj, dict):
            pytype = obj.get('py-type')
            if pytype:
                t = globals()[pytype]
                return t.__unjson__(**obj['value'])
        return obj
And now, in your Message class:
@classmethod
def __unjson__(cls, msg_type, src, dest, data):
    return cls(msg_type, src, dest, data)
And you need a MSG_TYPE.__json__ that returns a dict, maybe just {'name': str(self)}, then an __unjson__ that does something like getattr(cls, name).
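A minimal sketch of those two methods (using self.name rather than str(self), since str(self) gives 'MSG_TYPE.FAIL', which getattr can't look up):
from enum import IntEnum

class MSG_TYPE(IntEnum):
    REQUEST = 0
    FAIL = 3

    def __json__(self):
        return {'name': self.name}

    @classmethod
    def __unjson__(cls, name):
        return getattr(cls, name)

assert MSG_TYPE.__unjson__(**MSG_TYPE.FAIL.__json__()) is MSG_TYPE.FAIL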
A real-life solution should probably either have the classes register themselves instead of looking them up by name, or should handle looking them up by qualified name instead of just going to globals(). And you may want to let things encode to something other than object—or, if not, to just cram py-type into the object instead of wrapping it in another one. And there may be other ways to make the JSON more compact and/or readable. And a little bit of error handling would be nice. And so on.
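The registration idea can be as small as a class decorator; a hypothetical sketch (JSON_TYPES and json_serializable are made-up names):
JSON_TYPES = {}

def json_serializable(cls):
    # register the class so the decoder can look it up by name
    JSON_TYPES[cls.__name__] = cls
    return cls

# then in MessageDecoder.hook, instead of t = globals()[pytype]:
#     t = JSON_TYPES[pytype]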
You may want to look at the implementation of jsonpickle—not because you want to do the exact same thing it does, but to see how it hooks up all the pieces.
Overriding the default method of the encoder won't matter in this case because your object never gets passed to the method. It's treated as an int.
If you run the encoder on its own:
msg_type = MSG_TYPE.RELEASE
MessageEncoder().encode(msg_type)
You'll get:
'MSG_TYPE.RELEASE'
If you can, use an Enum and you shouldn't have any issues. I also asked a similar question:
How do I serialize IntEnum from enum34 to json in python?