I have an object that I want to turn into YAML; the only catch is that I need to be able to put !anything into it without quotes.
When I try it with PyYAML, I end up with '!anything' in my YAML file.
I've already tried ruamel.yaml's PreservedScalarString and LiteralScalarString. That kind of works, but not in the way I need. I end up with YAML that looks like this:
10.1.1.16:
  text: '1470814.27'
  confidence: |-
    !anything
But I don't need this |- symbol.
My goal is to get yaml like this:
10.1.1.16:
  text: '1470814.27'
  confidence: !anything
Any ideas how I can achieve that?
To dump a custom tag, you need to define a type and register a representer for that type. Here's how to do it for scalars:
import yaml
class MyTag:
    def __init__(self, content):
        self.content = content
    def __repr__(self):
        return self.content
    def __str__(self):
        return self.content
def mytag_dumper(dumper, data):
    return dumper.represent_scalar("!anything", data.content)
yaml.add_representer(MyTag, mytag_dumper)
print(yaml.dump({"10.1.1.16": {
    "text": "1470814.27",
    "confidence": MyTag("")}}))
This emits
10.1.1.16:
  confidence: !anything ''
  text: '1470814.27'
Note the '' after the tag, which is the tagged scalar itself (no, you can't get rid of it). You can tag collections as well, but you'll need to use represent_sequence or represent_mapping accordingly.
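For the collection case, here is a minimal sketch of a representer built on represent_sequence. The TaggedList class and the !mylist tag are hypothetical names chosen for illustration; only add_representer and represent_sequence come from PyYAML.

```python
import yaml

class TaggedList:
    """Hypothetical wrapper marking a list that should carry a custom tag."""
    def __init__(self, items):
        self.items = list(items)

def tagged_list_representer(dumper, data):
    # represent_sequence(tag, sequence) builds a sequence node with the given tag
    return dumper.represent_sequence("!mylist", data.items)

yaml.add_representer(TaggedList, tagged_list_representer)

dumped = yaml.dump({"addresses": TaggedList(["10.1.1.16", "10.1.1.17"])})
print(dumped)
```

The tag is written unquoted in front of the sequence. A mapping works the same way with represent_mapping and a dict.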
Contrary to @flix's comment, in YAML a tag does not need to be followed by single or double quotes (or a block scalar). You can try Oren Ben-Kiki's reference parser (programmatically derived from the YAML specification) to confirm that your expected output is valid YAML.
Empty content is normally loaded as None in Python (both by the outdated PyYAML as well as ruamel.yaml). Tagged empty content can of course only indicate existence of a particular instance, without any value indication.
ruamel.yaml can perfectly well round-trip your expected output:
import sys
from ruamel.yaml import YAML
yaml_str = """\
10.1.1.16:
  text: '1470814.27'
  confidence: !anything
"""
yaml = YAML()
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
gives:
10.1.1.16:
  text: '1470814.27'
  confidence: !anything
You can generate an object that dumps just the tag without a value from scratch (as the parser does), but if you don't want to go into the details, you can just load the tagged object and add it to your data structure:
import sys
import ruamel.yaml
yaml = ruamel.yaml.YAML()
def tagged_empty_scalar(tag):
    return yaml.load('!' + tag)
data = {'10.1.1.16': dict(text='1470814.27', confidence=tagged_empty_scalar('anything'))}
yaml.dump(data, sys.stdout)
You can get the exact same result in PyYAML and without the quotes, but that is more complicated.
Related
I am reading in some YAML-files like this:
data = yaml.safe_load(pathtoyamlfile)
When doing so I get the following error:
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:value'
When I checked the line of the YAML file that is given in the error message, I noticed that it always contains this key-value pair: simple: =.
Since the YAML files are autogenerated, I am not sure whether I can change the files themselves. Is there a way to read the data from the YAML files nonetheless?
It looks like you have hit this bug. There is a workaround suggested in the comments.
Given this content in example.yaml:
example: =
This code fails as you've described in your question:
import yaml
with open('example.yaml') as fd:
    data = yaml.safe_load(fd)
print(data)
But this works:
import yaml
yaml.SafeLoader.yaml_implicit_resolvers.pop('=')
with open('example.yaml') as fd:
    data = yaml.safe_load(fd)
print(data)
And outputs:
{'example': '='}
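Note that the pop() call above changes SafeLoader for every caller in the process. As a sketch of a more contained variant, assuming the resolver table is a class-level dict keyed by the scalar's first character (which is how PyYAML stores it), you can give a subclass its own copy without the '=' entry. The name NoValueLoader is hypothetical.

```python
import yaml

class NoValueLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader variant that treats a plain '=' as a string."""
    # Copy the class-level resolver table so SafeLoader itself stays unchanged,
    # leaving out the entry for plain scalars that start with '='.
    yaml_implicit_resolvers = {
        first_char: list(resolvers)
        for first_char, resolvers in yaml.SafeLoader.yaml_implicit_resolvers.items()
        if first_char != '='
    }

loaded = yaml.load("example: =\n", Loader=NoValueLoader)
print(loaded)
```

Other code that calls yaml.safe_load keeps the stock behaviour, since only the subclass has the modified table.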
If you cannot change the input, you might be able to upgrade the library that you use:
import sys
import ruamel.yaml
yaml_str = """\
example: =
"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
for key, value in data.items():
    print(f'{key}: {value}')
which gives:
example: =
Please be aware that ruamel.yaml still has this bug in its safe-mode loading (YAML(typ='safe')).
I have a YAML file with configuration data for my application, which is dumped to a new file whenever the application is run for debugging purposes. Unfortunately, some keys in the YAML file hold sensitive data and need to be obfuscated or simply excluded from the dumped file.
Example YAML input file:
logging_config:
  level: INFO
  file_path: /path/to/log_file.log
database_access:
  table_to_query: customer_table
  database_api_key: XXX-XXX-XXX # Sensitive data, exclude from archived file
There are workarounds, of course:
Keeping a list of keys with sensitive data and pre-processing dicts before outputting them to YAML
Separating sensitive and non-sensitive data into separate configuration files and outputting only the latter
etc.
But I was hoping that there was a solution similar to implementing a custom Loader reacting to a command like !keep_secret whenever it appears in a dict value, as it would keep my configuration files more readable.
You can use a custom representer. Here's a basic example:
import yaml
class SensitiveText:
    def __init__(self, content):
        self.content = content
    def __repr__(self):
        return self.content
    def __str__(self):
        return self.content
def sensitive_text_remover(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:null", "")
yaml.add_representer(SensitiveText, sensitive_text_remover)
data = {
    "logging_config": {
        "level": "INFO",
        "file_path": "/path/to/log_file.log"
    },
    "database_access": {
        "table_to_query": "customer_table",
        "database_api_key": SensitiveText("XXX-XXX-XXX")
    }
}
print(yaml.dump(data))
This prints:
database_access:
  database_api_key:
  table_to_query: customer_table
logging_config:
  file_path: /path/to/log_file.log
  level: INFO
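To get closer to the !keep_secret idea from the question, the representer can be paired with a constructor so that the marking lives in the YAML file itself. This is only a sketch: the !keep_secret tag comes from the question, while SecretLoader, SecretDumper and the function names are hypothetical.

```python
import yaml

class SensitiveText:
    def __init__(self, content):
        self.content = content

def construct_secret(loader, node):
    # Wrap the tagged scalar so the application can still read .content
    return SensitiveText(loader.construct_scalar(node))

def represent_secret(dumper, data):
    # Emit an empty scalar instead of the secret value
    return dumper.represent_scalar("tag:yaml.org,2002:null", "")

class SecretLoader(yaml.SafeLoader):
    pass

class SecretDumper(yaml.SafeDumper):
    pass

# Registering on the subclasses leaves the stock loader and dumper untouched
SecretLoader.add_constructor("!keep_secret", construct_secret)
SecretDumper.add_representer(SensitiveText, represent_secret)

config_text = """\
database_access:
  table_to_query: customer_table
  database_api_key: !keep_secret XXX-XXX-XXX
"""
config = yaml.load(config_text, Loader=SecretLoader)
archived = yaml.dump(config, Dumper=SecretDumper)
print(archived)
```

The application reads the key through config["database_access"]["database_api_key"].content, and the archived dump no longer contains the value.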
You can of course have a class for the database_access instead with a representer that removes the database_api_key altogether.
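A minimal sketch of that variant, assuming a hypothetical DatabaseAccess class that holds its settings in a dict and knows which keys are secret:

```python
import yaml

class DatabaseAccess:
    """Hypothetical container for the database_access section."""
    SECRET_KEYS = {"database_api_key"}

    def __init__(self, **settings):
        self.settings = settings

def represent_database_access(dumper, data):
    # Dump as an ordinary mapping, leaving the secret keys out entirely
    public = {key: value for key, value in data.settings.items()
              if key not in data.SECRET_KEYS}
    return dumper.represent_mapping("tag:yaml.org,2002:map", public)

yaml.add_representer(DatabaseAccess, represent_database_access)

data = {
    "logging_config": {"level": "INFO", "file_path": "/path/to/log_file.log"},
    "database_access": DatabaseAccess(table_to_query="customer_table",
                                      database_api_key="XXX-XXX-XXX"),
}
dumped = yaml.dump(data)
print(dumped)
```

Using the standard map tag means the section is written as a plain mapping, without a Python-specific tag.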
I'm using ruamel.yaml to generate a YAML file that will be read by Tavern, which requires the file to contain a list like this:
includes:
  - !include vars.yaml
Attempting to use any of the usual approaches to dump the data as strings results in single quotes being added around the tags, which doesn't work when the YAML is ingested by the next tool.
How do I generate a YAML file that contains unquoted local tags, starting with data that is defined in a dictionary?
I was able to create a YAML file with the required format using the following approach, based on prior examples. My approach is more flexible because it allows the tag handle to be an instance property rather than a class property, so you don't need to define a different class for every tag handle.
import sys
from ruamel.yaml import YAML
yaml = YAML(typ='rt')
class TaggedString:
    def __init__(self, handle, value):
        self.handle = handle
        self.value = value
    @classmethod
    def to_yaml(cls, representer, node):
        # I don't understand the arguments to the following function!
        return representer.represent_scalar(u'{.handle}'.format(node),
                                            u'{.value}'.format(node))
yaml.register_class(TaggedString)
data = {
    'includes': [
        TaggedString('!include', 'vars.yaml'),
        TaggedString('!exclude', 'dummy.yaml')
    ]
}
yaml.dump(data, sys.stdout)
Output:
includes:
  - !include vars.yaml
  - !exclude dummy.yaml
I am not sure if this is the best approach. I might be missing a simpler way to achieve the same result. Note that my goal is not to dump a Python class; I'm just doing that as a way to get the tag to be written correctly.
I am not sure if this is a better approach, but if you had tried to round-trip your required output, you would have seen that ruamel.yaml can actually preserve your tagged strings without you having to do anything. Inspecting the Python data structure, you'll notice that ruamel.yaml does this by creating a TaggedScalar (as you cannot attach attributes to the built-in string type).
import sys
import ruamel.yaml
yaml_str = """\
includes:
  - !include vars.yaml
  - !exclude dummy.yaml
"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
incl = data['includes'][0]
print(type(incl))
which gives:
includes:
  - !include vars.yaml
  - !exclude dummy.yaml
<class 'ruamel.yaml.comments.TaggedScalar'>
After inspecting comments.py (and possibly constructor.py), you should be able to create ruamel.yaml's internal data structure on the fly:
import sys
import ruamel.yaml
from ruamel.yaml.comments import TaggedScalar
def tagged_string(tag, val):
    # starting with ruamel.yaml>0.16.5 you can replace the following lines with:
    # return TaggedScalar(value=val, tag=tag)
    ret_val = TaggedScalar()
    ret_val.value = val
    ret_val.yaml_set_tag(tag)
    return ret_val
yaml = ruamel.yaml.YAML()
data = dict(includes=[tagged_string('!include', 'vars.yaml'),
                      tagged_string('!include', 'vars.yaml'),
                     ])
yaml.dump(data, sys.stdout)
which also gives:
includes:
  - !include vars.yaml
  - !include vars.yaml
Here is a config file. I use PyYAML to change a value in it and then write the config back out, but that changes the formatting, which confuses me.
$ results.yaml
nas:
    mount_dir: '/nvr'
    mount_dirs: ['/mount/data0', '/mount/data1', '/mount/data2']
# yaml.py
import yaml
with open("results.conf", "r") as conf:
    results = yaml.safe_load(conf)
results['nas']['mount_dirs'][0] = "haha"
with open('/home/zonion/speedio/speedio.conf', 'w') as conf:
    yaml.dump(results, conf, default_flow_style=False)
But it changes my format. What should I do?
# cat results.conf
nas:
  mount_dir: /nvr
  mount_dirs:
  - haha
  - /mount/data1
  - /mount/data2
If you use ruamel.yaml ¹, you can achieve this relatively easily by combining this and this answer here on Stack Overflow.
By default ruamel.yaml normalizes to an indent of 2, and drops superfluous quotes. As you don't seem to want that, you have to either explicitly set the indent, or have ruamel.yaml analyse the input, and tell it to preserve quotes:
import sys
import ruamel.yaml
import ruamel.yaml.util
yaml_str = """\
nas:
    mount_dir: '/nvr'
    mount_dirs: ['/mount/data0', '/mount/data1', '/mount/data2']
"""
result, indent, block_seq_indent = ruamel.yaml.util.load_yaml_guess_indent(
    yaml_str, preserve_quotes=True)
result['nas']['mount_dirs'][0] = "haha"
ruamel.yaml.round_trip_dump(result, sys.stdout, indent=indent,
                            block_seq_indent=block_seq_indent)
Instead of the load_yaml_guess_indent() invocation you can do:
result = ruamel.yaml.round_trip_load(yaml_str, preserve_quotes=True)
indent = 4
block_seq_indent = None
If you want haha to be (single) quoted in the output make it a SingleQuotedScalarString:
result['nas']['mount_dirs'][0] = \
    ruamel.yaml.scalarstring.SingleQuotedScalarString("haha")
With that the output will be:
nas:
    mount_dir: '/nvr'
    mount_dirs: ['haha', '/mount/data1', '/mount/data2']
(Given that your short example input has no block style sequences, the block_seq_indent cannot be determined and will be None.)
When using the newer API you have control over the indent of mappings and sequences separately:
yaml = ruamel.yaml.YAML()
yaml.indent(mapping=4, sequence=6, offset=3) # not that that looks nice
data = yaml.load(some_stream)
yaml.dump(data, some_stream)
This will make your YAML formatted consistently if it wasn't so to begin with, and make no further changes after the first round-trip.
¹ Disclaimer: I am the author of that package.
ruamel.yaml unfortunately does not completely preserve original format, quoting its docs:
Although individual indentation of lines is not preserved, you can
specify separate indentation levels for mappings and sequences
(counting for sequences does not include the dash for a sequence
element) and specific offset of block sequence dashes within that
indentation.
I do not know any Python library that does that.
When I need to change a YAML file without touching its format I reluctantly use regexp (reluctantly as it's almost as bad as parsing XHTML with it).
Please feel free to suggest a better solution if you know any, I would gladly learn about it!
ruamel implements a round-trip loader and dumper, try:
import ruamel.yaml
conf = open("results.conf", "r")
results = ruamel.yaml.load(conf, ruamel.yaml.RoundTripLoader)
conf.close()
results['nas']['mount_dirs'][0] = "haha"
with open('/home/zonion/speedio/speedio.conf', 'w') as conf:
    ruamel.yaml.dump(results, conf, ruamel.yaml.RoundTripDumper)
Try loading it first and then dumping it like this:
import ruamel.yaml
yaml_str = """\
nas:
    mount_dir: '/nvr'
    mount_dirs: ['/mount/data0', '/mount/data1', '/mount/data2']"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
with open("test.yaml", 'w') as outfile:
    yaml.dump(data, outfile)
Using PyYAML, if I read in a file with blank values in a dict:
test_str = '''
attrs:
  first:
  second: value2
'''
This returns None for the key first:
>>> data = yaml.load(test_str)
>>> data
{'attrs': {'second': 'value2', 'first': None}}
But when writing, the None value is replaced with null:
>>> print(yaml.dump(data, default_flow_style=False))
attrs:
  first: null
  second: value2
Is there a way to format the dump output to print a blank scalar rather than null?
Based on @Anthon's excellent answer, I was able to craft this solution:
def represent_none(self, _):
    return self.represent_scalar('tag:yaml.org,2002:null', '')
yaml.add_representer(type(None), represent_none)
Based on my understanding of the PyYAML code, adding a representer for an existing type should simply replace the existing representer.
This is a global change, which means that all following dumps use a blank. If some unrelated piece of code in your program relies on None being represented in the "normal" way (e.g. a library that you import and that uses PyYAML as well), that code will no longer work as expected. In that case, subclassing is the correct way to go.
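A short sketch of the subclassing route: registering the representer on a Dumper subclass keeps the change local, because add_representer copies the lookup table for that subclass. The name BlankNoneDumper is hypothetical, and SafeDumper is used here as an assumption; yaml.Dumper would work the same way.

```python
import yaml

class BlankNoneDumper(yaml.SafeDumper):
    """Hypothetical dumper that writes None as an empty scalar."""

def represent_none_as_blank(dumper, _):
    return dumper.represent_scalar("tag:yaml.org,2002:null", "")

# Registered on the subclass only, so other users of PyYAML are unaffected
BlankNoneDumper.add_representer(type(None), represent_none_as_blank)

data = {"attrs": {"first": None, "second": "value2"}}
blank = yaml.dump(data, Dumper=BlankNoneDumper, default_flow_style=False)
default = yaml.safe_dump(data, default_flow_style=False)
print(blank)
print(default)
```

Only dumps that pass Dumper=BlankNoneDumper get the blank; yaml.safe_dump elsewhere in the program still writes null.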
You get null because dump() uses the Representer() which subclasses SafeRepresenter() and to represent None, the following method is called:
def represent_none(self, data):
    return self.represent_scalar(u'tag:yaml.org,2002:null',
                                 u'null')
As the string null is hardcoded, there is no option to dump() to change that.
The proper way to solve this in PyYAML is to make your own Dumper subclass which has the Emitter, Serializer, and Resolver from the standard Dumper that dump() uses, but with a subclass of Representer that represents None the way you want it:
import sys
import yaml
from yaml.representer import Representer
from yaml.dumper import Dumper
from yaml.emitter import Emitter
from yaml.serializer import Serializer
from yaml.resolver import Resolver
yaml_str = """\
attrs:
  first:
  second: value2
"""
class MyRepresenter(Representer):
    def represent_none(self, data):
        # emit an empty scalar instead of the hardcoded 'null'
        return self.represent_scalar(u'tag:yaml.org,2002:null',
                                     u'')
class MyDumper(Emitter, Serializer, MyRepresenter, Resolver):
    def __init__(self, stream,
                 default_style=None, default_flow_style=None,
                 canonical=None, indent=None, width=None,
                 allow_unicode=None, line_break=None,
                 encoding=None, explicit_start=None, explicit_end=None,
                 version=None, tags=None, sort_keys=True):
        # sort_keys is passed in by yaml.dump() in PyYAML 5.1 and later
        Emitter.__init__(self, stream, canonical=canonical,
                         indent=indent, width=width,
                         allow_unicode=allow_unicode, line_break=line_break)
        Serializer.__init__(self, encoding=encoding,
                            explicit_start=explicit_start, explicit_end=explicit_end,
                            version=version, tags=tags)
        MyRepresenter.__init__(self, default_style=default_style,
                               default_flow_style=default_flow_style,
                               sort_keys=sort_keys)
        Resolver.__init__(self)
MyRepresenter.add_representer(type(None),
                              MyRepresenter.represent_none)
# yaml.load() without an explicit Loader fails on PyYAML 6, so use safe_load()
data = yaml.safe_load(yaml_str)
yaml.dump(data, stream=sys.stdout, Dumper=MyDumper, default_flow_style=False)
gives you:
attrs:
  first:
  second: value2
If that sounds like a lot of overhead just to get rid of null, it is. There are some shortcuts you can take and you can even try to graft the alternate function onto the existing Representer, but since the actual function taken is referenced in a lookup table ( populated by add_representer ) you need to handle at least that reference as well.
The far easier solution is to replace PyYAML with ruamel.yaml and use its round_trip functionality (disclaimer: I am the author of that package):
import ruamel.yaml
yaml_str = """\
# trying to round-trip preserve empty scalar
attrs:
  first:
  second: value2
"""
data = ruamel.yaml.round_trip_load(yaml_str)
assert ruamel.yaml.round_trip_dump(data) == yaml_str
Apart from emitting None as the empty scalar, it also preserves the order of mapping keys, comments, and tag names, none of which PyYAML does. ruamel.yaml also follows the YAML 1.2 specification (from 2009), whereas PyYAML uses the older YAML 1.1.
The ruamel.yaml package can be installed with pip from PyPI or, on modern Debian-based distributions, with apt-get install python-ruamel.yaml.
Extending @Jace Browning's answer while addressing @Anthon's concern, we can use a context manager which remembers the prior representer for None:
import yaml
from yaml.dumper import Dumper
from yaml.representer import Representer
class BlankNone(Representer):
    """Print None as blank when used as context manager"""
    def represent_none(self, *_):
        return self.represent_scalar(u'tag:yaml.org,2002:null', u'')
    def __enter__(self):
        # remember the representer that is currently registered for None
        self.prior = Dumper.yaml_representers[type(None)]
        yaml.add_representer(type(None), self.represent_none)
    def __exit__(self, exc_type, exc_val, exc_tb):
        # restore the previous representer on the way out
        Dumper.yaml_representers[type(None)] = self.prior
which can be used thus:
with BlankNone(), open(file, 'wt') as f:
    yaml.dump(hosts, f)
Just use string replace:
print(yaml.dump(data).replace("null", ""))
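Be aware that a plain replace() also removes "null" wherever it occurs inside other values. A slightly more targeted sketch, assuming block-style output, only strips a null that forms the whole value at the end of a mapping line. It is still text processing: it does not handle list items, and it could touch the content of a multi-line scalar that happens to end in ": null". The sample data is made up for illustration.

```python
import re
import yaml

data = {"attrs": {"first": None, "note": "nullable field"}}
text = yaml.dump(data, default_flow_style=False)

# Naive replacement also mangles the string value "nullable field"
naive = text.replace("null", "")

# Only drop a null that is the entire value at the end of a "key: null" line
targeted = re.sub(r":[ ]null$", ":", text, flags=re.MULTILINE)

print(naive)
print(targeted)
```

The targeted version still loads back to the same data, because an empty value is read as None.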