Python: use PyYAML and keep the format

Here is a config file. I use PyYAML to change some values in it and then write the config back, but that changes my formatting, which confuses me.
$ results.yaml
nas:
    mount_dir: '/nvr'
    mount_dirs: ['/mount/data0', '/mount/data1', '/mount/data2']
# yaml.py
import yaml
conf = open("results.conf", "r")
results = yaml.safe_load(conf)
conf.close()
results['nas']['mount_dirs'][0] = "haha"
with open('/home/zonion/speedio/speedio.conf', 'w') as conf:
    # the with block closes the file, so no explicit close() is needed
    yaml.dump(results, conf, default_flow_style=False)
But it changes my format. What should I do?
# cat results.conf
nas:
  mount_dir: /nvr
  mount_dirs:
  - haha
  - /mount/data1
  - /mount/data2

If you use ruamel.yaml ¹, you can achieve this relatively easily by combining this and this answer here on Stack Overflow.
By default ruamel.yaml normalizes to an indent of 2, and drops superfluous quotes. As you don't seem to want that, you have to either explicitly set the indent, or have ruamel.yaml analyse the input, and tell it to preserve quotes:
import sys
import ruamel.yaml
import ruamel.yaml.util
yaml_str = """\
nas:
    mount_dir: '/nvr'
    mount_dirs: ['/mount/data0', '/mount/data1', '/mount/data2']
"""
result, indent, block_seq_indent = ruamel.yaml.util.load_yaml_guess_indent(
    yaml_str, preserve_quotes=True)
result['nas']['mount_dirs'][0] = "haha"
ruamel.yaml.round_trip_dump(result, sys.stdout, indent=indent,
                            block_seq_indent=block_seq_indent)
Instead of the load_yaml_guess_indent() invocation you can do:
result = ruamel.yaml.round_trip_load(yaml_str, preserve_quotes=True)
indent = 4
block_seq_indent = None
If you want haha to be (single) quoted in the output make it a SingleQuotedScalarString:
result['nas']['mount_dirs'][0] = \
    ruamel.yaml.scalarstring.SingleQuotedScalarString("haha")
With that the output will be:
nas:
    mount_dir: '/nvr'
    mount_dirs: ['haha', '/mount/data1', '/mount/data2']
(Given that your short example input has no block-style sequences, block_seq_indent cannot be determined and will be None.)
When using the newer API you have control over the indent of mappings and sequences separately:
yaml = ruamel.yaml.YAML()
yaml.indent(mapping=4, sequence=6, offset=3) # not that that looks nice
data = yaml.load(some_stream)
yaml.dump(data, some_stream)
This will make your YAML formatted consistently if it wasn't so to begin with, and make no further changes after the first round-trip.
¹ Disclaimer: I am the author of that package.

ruamel.yaml unfortunately does not completely preserve original format, quoting its docs:
Although individual indentation of lines is not preserved, you can
specify separate indentation levels for mappings and sequences
(counting for sequences does not include the dash for a sequence
element) and specific offset of block sequence dashes within that
indentation.
I do not know any Python library that does that.
When I need to change a YAML file without touching its format I reluctantly use regexp (reluctantly as it's almost as bad as parsing XHTML with it).
Please feel free to suggest a better solution if you know any, I would gladly learn about it!
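A rough sketch of the regexp approach mentioned above, using only the standard library. It assumes that the key mount_dirs appears once, on a single line, as a flow sequence whose first item is single-quoted; anything else (block sequences, comments inside the list, multi-line values) would need a different pattern, which is exactly why this approach is fragile.

```python
import re

yaml_text = """\
nas:
    mount_dir: '/nvr'
    mount_dirs: ['/mount/data0', '/mount/data1', '/mount/data2']
"""

# Capture everything up to the opening bracket of mount_dirs, then match
# only the first single-quoted item that follows it.
pattern = re.compile(r"^([ \t]*mount_dirs:[ \t]*\[)'[^']*'", flags=re.MULTILINE)

# Put the captured prefix back and swap in the new first item.
new_text, count = pattern.subn(r"\g<1>'haha'", yaml_text, count=1)

print(new_text)
```

Every other character of the file is left untouched, but the pattern knows nothing about YAML syntax, so checking count (the number of substitutions made) before writing the file back is a sensible safeguard.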

ruamel implements a round-trip loader and dumper, try:
import ruamel.yaml
conf = open("results.conf", "r")
results = ruamel.yaml.load(conf, ruamel.yaml.RoundTripLoader)
conf.close()
results['nas']['mount_dirs'][0] = "haha"
with open('/home/zonion/speedio/speedio.conf', 'w') as conf:
    ruamel.yaml.dump(results, conf, ruamel.yaml.RoundTripDumper)

Try loading it first and then dumping it, like this:
import ruamel.yaml
yaml_str = """\
nas:
    mount_dir: '/nvr'
    mount_dirs: ['/mount/data0', '/mount/data1', '/mount/data2']"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
with open("test.yaml", 'w') as outfile:
    yaml.dump(data, outfile)

Related

Loading a YAML-file throws an error in Python

I am reading in some YAML-files like this:
data = yaml.safe_load(pathtoyamlfile)
When doing so I get the following error:
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:value'
When checking for the line of the YAML-file which is also given in the error messages I recognized that there is always this key-value-pair: simple: =.
Since the YAML-files are autogenerated, I am not sure if I can change the files themselves. Is there a way to read the data from the YAML-files nonetheless?
It looks like you have hit this bug. There is a workaround suggested in the comments.
Given this content in example.yaml:
example: =
This code fails as you've described in your question:
import yaml
with open('example.yaml') as fd:
    data = yaml.safe_load(fd)
print(data)
But this works:
import yaml
yaml.SafeLoader.yaml_implicit_resolvers.pop('=')
with open('example.yaml') as fd:
    data = yaml.safe_load(fd)
print(data)
And outputs:
{'example': '='}
If you cannot change the input, you might be able to upgrade the library that you use to ruamel.yaml:
import sys
import ruamel.yaml
yaml_str = """\
example: =
"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
for key, value in data.items():
    print(f'{key}: {value}')
which gives:
example: =
Please be aware that ruamel.yaml still has this bug in its safe mode loading ( YAML(typ='safe') ).
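The PyYAML workaround shown earlier pops the '=' entry from a resolver table that is shared between loaders, so it affects every later load in the same process. A minimal sketch of a more contained variant follows; the class name NoValueTagLoader is hypothetical, and the sketch assumes that the implicit resolver table is a dict keyed by the first character of a scalar, as the pop('=') call above implies.

```python
import yaml


class NoValueTagLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that treats a bare '=' as a string."""


# Give the subclass its own copy of the resolver table, minus the '=' entry,
# so that yaml.SafeLoader itself is left unchanged.
NoValueTagLoader.yaml_implicit_resolvers = {
    first_char: list(resolvers)
    for first_char, resolvers in yaml.SafeLoader.yaml_implicit_resolvers.items()
    if first_char != '='
}

data = yaml.load("example: =\n", Loader=NoValueTagLoader)
print(data)
```

Only loads that pass Loader=NoValueTagLoader get the relaxed behaviour; plain yaml.safe_load() keeps raising the ConstructorError for such input.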

Dump object into yaml without quotes

I have an object that I want to turn into YAML; the only catch is that I need to be able to put "!anything" into it without quotes.
When I try it with pyyaml I end up with '!anything' inside my yaml file.
I've already tried using ruamel.yaml's PreservedScalarString and LiteralScalarString. It kind of works, but not in the way that I need it to. I end up with YAML that looks like this:
10.1.1.16:
  text: '1470814.27'
  confidence: |-
    !anything
But I don't need this |- symbol.
My goal is to get yaml like this:
10.1.1.16:
  text: '1470814.27'
  confidence: !anything
Any ideas how I can achieve that?
To dump a custom tag, you need to define a type and register a representer for that type. Here's how to do it for scalars:
import yaml
class MyTag:
    def __init__(self, content):
        self.content = content
    def __repr__(self):
        return self.content
    def __str__(self):
        return self.content
def mytag_dumper(dumper, data):
    return dumper.represent_scalar("!anything", data.content)
yaml.add_representer(MyTag, mytag_dumper)
print(yaml.dump({"10.1.1.16": {
    "text": "1470814.27",
    "confidence": MyTag("")}}))
This emits
10.1.1.16:
  confidence: !anything ''
  text: '1470814.27'
Note the '' behind the tag, which is the tagged scalar (no, you can't get rid of it). You can tag collections as well but you'll need to use represent_sequence or represent_mapping accordingly.
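For the collection case mentioned above, a minimal sketch with represent_sequence might look as follows. The class TaggedList and the tag !mylist are made up for illustration; only add_representer and represent_sequence come from PyYAML.

```python
import yaml


class TaggedList(list):
    """Hypothetical marker type: a list that should be dumped with a tag."""


def taggedlist_dumper(dumper, data):
    # First argument is the tag, second the sequence to represent.
    return dumper.represent_sequence("!mylist", list(data))


yaml.add_representer(TaggedList, taggedlist_dumper)

out = yaml.dump({"items": TaggedList([1, 2])})
print(out)
```

The tag is emitted directly after the key, without quotes, and is followed by the sequence items. A mapping would be handled the same way with represent_mapping.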
Contrary to @flix's comment, in YAML you don't need to follow a tag by single or double quotes (or a block scalar). You can try Oren Ben-Kiki's reference parser (programmatically derived from the YAML specification) to confirm that your expected output is valid YAML.
Empty content is normally loaded as None in Python (both by the outdated PyYAML as well as ruamel.yaml). Tagged empty content can of course only indicate existence of a particular instance, without any value indication.
ruamel.yaml can perfectly well round-trip your expected output:
import sys
from ruamel.yaml import YAML
yaml_str = """\
10.1.1.16:
  text: '1470814.27'
  confidence: !anything
"""
yaml = YAML()
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
gives:
10.1.1.16:
  text: '1470814.27'
  confidence: !anything
You can generate an object that dumps just the tag without a value from scratch (as the parser does), but if you don't want to go into the details, you can just load the tagged object and add it to your data structure:
import sys
import ruamel.yaml
yaml = ruamel.yaml.YAML()
def tagged_empty_scalar(tag):
    return yaml.load('!' + tag)
data = {'10.1.1.16': dict(text='1470814.27', confidence=tagged_empty_scalar('anything'))}
yaml.dump(data, sys.stdout)
You can get the exact same result in PyYAML and without the quotes, but that is more complicated.

How do I generate YAML containing local tags with ruamel.yaml?

I'm using ruamel.yaml to generate a YAML file that will be read by Tavern, which requires the file to contain a list like this:
includes:
- !include vars.yaml
Attempting to use any of the usual approaches to dump the data as strings results in single quotes being added around the tags, which doesn't work when the YAML is ingested by the next tool.
How do I generate a YAML file that contains unquoted local tags, starting with data that is defined in a dictionary?
I was able to create a YAML file with the required format using the following approach, based on prior examples. My approach is more flexible because it allows the tag handle to be an instance property rather than a class property, so you don't need to define a different class for every tag handle.
import sys
from ruamel.yaml import YAML
yaml = YAML(typ='rt')
class TaggedString:
    def __init__(self, handle, value):
        self.handle = handle
        self.value = value
    @classmethod
    def to_yaml(cls, representer, node):
        # I don't understand the arguments to the following function!
        return representer.represent_scalar(u'{.handle}'.format(node),
                                            u'{.value}'.format(node))
yaml.register_class(TaggedString)
data = {
    'includes': [
        TaggedString('!include', 'vars.yaml'),
        TaggedString('!exclude', 'dummy.yaml')
    ]
}
yaml.dump(data, sys.stdout)
Output:
includes:
- !include vars.yaml
- !exclude dummy.yaml
I am not sure if this is the best approach. I might be missing a simpler way to achieve the same result. Note that my goal is not to dump a Python class; I'm just doing that as a way to get the tag to be written correctly.
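On the arguments the code comment above puzzles over: judging from how represent_scalar is called elsewhere in this thread, its first argument is the tag and its second is the scalar text. The format strings are plain attribute lookups on the object being dumped. A small standard-library sketch of that idiom, with a stand-in class named Node that is not part of ruamel.yaml:

```python
class Node:
    """Stand-in for the TaggedString instance passed to to_yaml()."""

    def __init__(self, handle, value):
        self.handle = handle
        self.value = value


node = Node('!include', 'vars.yaml')

# '{.handle}' means: take the first positional argument and read its
# .handle attribute, so this is equivalent to str(node.handle).
tag = '{.handle}'.format(node)
text = '{.value}'.format(node)

print(tag, text)
```

So the call in to_yaml() amounts to representer.represent_scalar(node.handle, node.value), with the parameter named node actually holding the TaggedString instance.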
I am not sure if this is a better approach, but if you had tried to round-trip your required output, you would have seen that ruamel.yaml can actually preserve your tagged strings without you having to do anything. Inspecting the Python data structure, you'll notice that ruamel.yaml does this by creating a TaggedScalar (as you cannot attach attributes to the built-in string type).
import sys
import ruamel.yaml
yaml_str = """\
includes:
- !include vars.yaml
- !exclude dummy.yaml
"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
incl = data['includes'][0]
print(type(incl))
which gives:
includes:
- !include vars.yaml
- !exclude dummy.yaml
<class 'ruamel.yaml.comments.TaggedScalar'>
After inspecting comments.py (and possibly constructor.py), you should be able to create ruamel.yaml's internal data structure on the fly:
import sys
import ruamel.yaml
from ruamel.yaml.comments import TaggedScalar
def tagged_string(tag, val):
    # starting with ruamel.yaml>0.16.5 you can replace the following lines with:
    # return TaggedScalar(value=val, tag=tag)
    ret_val = TaggedScalar()
    ret_val.value = val
    ret_val.yaml_set_tag(tag)
    return ret_val
yaml = ruamel.yaml.YAML()
data = dict(includes=[tagged_string('!include', 'vars.yaml'),
                      tagged_string('!include', 'vars.yaml'),
                     ])
yaml.dump(data, sys.stdout)
which also gives:
includes:
- !include vars.yaml
- !include vars.yaml

Why is PyYAML spending so much time in just parsing a YAML File?

I am parsing a YAML file with around 6500 lines with this format:
foo1:
  bar1:
    blah: { name: "john", age: 123 }
    metadata: { whatever1: "whatever", whatever2: "whatever" }
    stuff:
      thing1:
        bluh1: { name: "Doe1", age: 123 }
        bluh2: { name: "Doe2", age: 123 }
      thing2:
      ...
      thingN:
foo2:
...
fooN:
I just want to parse it with the PyYAML library (I think there are no alternatives to it in Python: How can I parse a YAML file in Python).
Just for testing, I wrote this code to parse my file:
import yaml
config_file = "/path/to/file.yaml"
stream = open(config_file, "r")
sensors = yaml.load(stream)
Running the script under the time command, I get this timing:
real 0m3.906s
user 0m3.672s
sys 0m0.100s
Those values don't seem very good. I wanted to test the same thing with JSON, so I first converted the same YAML file to JSON:
import json
config_file = "/path/to/file.json"
stream = open(config_file, "r")
sensors = json.load(stream) # We read the JSON config file
But the execution time is far better:
real 0m0.058s
user 0m0.032s
sys 0m0.008s
What is the main reason that PyYAML spends so much more time parsing the YAML file than json spends parsing the JSON one? Is it a problem with PyYAML, or is it because the YAML format is hard to parse? (Probably the former.)
EDIT:
I add another example with ruby and YAML:
require 'yaml'
sensors = YAML.load_file('/path/to/file.yaml')
And the execution time is good (or at least not as bad as in the PyYAML example):
real 0m0.278s
user 0m0.240s
sys 0m0.032s
According to the docs, you should use CLoader/CSafeLoader (and CDumper) for better performance:
import yaml
try:
    from yaml import CLoader as Loader
except ImportError:
    from yaml import Loader
config_file = "test.yaml"
stream = open(config_file, "r")
sensors = yaml.load(stream, Loader=Loader)
This gives me
real 0m0.503s
instead of
real 0m2.714s
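If the input is not fully trusted, the same fallback pattern can be sketched with the safe loader variant. This assumes that CSafeLoader is only importable when PyYAML was built against libyaml; the short document string is made up here and merely mimics the shape of the file in the question.

```python
import yaml

try:
    # libyaml-backed loader, only present if PyYAML was built with libyaml
    from yaml import CSafeLoader as SafeLoader
    using_c_loader = True
except ImportError:
    # pure-Python fallback: same result, just slower
    from yaml import SafeLoader
    using_c_loader = False

doc = 'foo1:\n  bar1:\n    blah: { name: "john", age: 123 }\n'
sensors = yaml.load(doc, Loader=SafeLoader)

print(using_c_loader, sensors)
```

Printing using_c_loader makes it obvious which implementation was picked, which is worth checking before comparing timings.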
