PyYAML : Control ordering of items called by yaml.load()

I have a yaml setting file which creates some records in db:
setting1:
  name: [item,item]
  name1: text
anothersetting2:
  name: [item,item]
  sub_setting:
    name :[item,item]
When I update this file with setting3 and regenerate the records in the db with:
import yaml
fh = open('setting.txt', 'r')
setting_list = yaml.load(fh)
for i in setting_list:
    add_to_db(i)
It's vital that the order of the settings (their id numbers in the db) stays the same each time I add them to the db, and that setting3 just gets appended to the end of what yaml.load() returns, so that its id doesn't clash with any records which are already in the db.
At the moment, each time I add another setting and call yaml.load(), the records get loaded in a different order, which results in different ids. I would welcome any ideas ;)
EDIT:
I've followed abarnert's tips and used this gist: https://gist.github.com/844388
Works as expected, thanks!

My project oyaml is a drop-in replacement for PyYAML, which will load maps into collections.OrderedDict instead of regular dicts. Just pip install it and use as normal - works on both Python 3 and Python 2.
Demo with your example:
>>> import oyaml as yaml  # pip install oyaml
>>> yaml.load('''setting1:
...   name: [item,item]
...   name1: text
... anothersetting2:
...   name: [item,item]
...   sub_setting:
...     name :[item,item]''')
OrderedDict([('setting1',
              OrderedDict([('name', ['item', 'item']), ('name1', 'text')])),
             ('anothersetting2',
              OrderedDict([('name', ['item', 'item']),
                           ('sub_setting', 'name :[item,item]')]))])
Note that if the stdlib dict is order preserving (Python >= 3.7, CPython >= 3.6) then oyaml will use an ordinary dict.
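On interpreters where dict keeps insertion order, plain PyYAML appears to give the same result, because it fills each mapping key by key in document order. A minimal sketch, assuming Python 3.7+ and an installed PyYAML; setting3 is a hypothetical entry appended to the file:

```python
import yaml

# Hypothetical settings document with a new entry appended at the end.
doc = """\
setting1:
  name: [item, item]
  name1: text
anothersetting2:
  name: [item, item]
setting3:
  name: [item]
"""

settings = yaml.safe_load(doc)

# Keys come back in the order they appear in the document.
order = list(settings)
print(order)
```

Appending new settings at the end of the file should then leave the positions of the existing ones unchanged.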

You can now use ruamel.yaml for this.
From https://pypi.python.org/pypi/ruamel.yaml:
ruamel.yaml is a YAML parser/emitter that supports roundtrip
preservation of comments, seq/map flow style, and map key order

The YAML spec clearly says that the key order within a mapping is a "representation detail" that cannot be relied on. So your settings file is already invalid if it relies on the order of the mapping's keys, and you'd be much better off using valid YAML, if at all possible.
Of course YAML is extensible, and there's nothing stopping you from adding an "ordered mapping" type to your settings files. For example:
!omap setting1:
  name: [item,item]
  name1: text
!omap anothersetting2:
  name: [item,item]
  !omap sub_setting:
    name :[item,item]
You didn't mention which yaml module you're using. There is no such module in the standard library, and there are at least two packages just on PyPI that provide modules with that name. However, I'm going to guess it's PyYAML, because as far as I know that's the most popular.
The extension described above is easy to parse with PyYAML. See http://pyyaml.org/ticket/29:
def omap_constructor(loader, node):
    return loader.construct_pairs(node)
yaml.add_constructor(u'!omap', omap_constructor)
Now, instead of:
{'anothersetting2': {'name': ['item', 'item'],
                     'sub_setting': 'name :[item,item]'},
 'setting1': {'name': ['item', 'item'], 'name1': 'text'}}
You'll get this:
(('anothersetting2', (('name', ['item', 'item']),
                      ('sub_setting', ('name, [item,item]'),))),
 ('setting1', (('name', ['item', 'item']), ('name1', 'text'))))
Of course this gives you a tuple of key-value tuples, but you can easily write a construct_ordereddict and get an OrderedDict instead. You can also write a representer that stores OrderedDict objects as !omaps, if you need to output as well as input.
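A construct_ordereddict along those lines might look like the sketch below. It is only an illustration: OmapLoader is a hypothetical name, the constructor is registered on a private SafeLoader subclass rather than globally, and the sample document writes the tag after the key so that it applies to the mapping value.

```python
from collections import OrderedDict

import yaml


class OmapLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass; keeps the custom tag local to it."""


def construct_ordereddict(loader, node):
    # construct_pairs returns (key, value) tuples in document order.
    return OrderedDict(loader.construct_pairs(node, deep=True))


OmapLoader.add_constructor("!omap", construct_ordereddict)

# Assumption: the tag follows the key, so it tags the mapping value.
doc = """\
settings: !omap
  zeta: 1
  alpha: 2
  mid: 3
"""

data = yaml.load(doc, Loader=OmapLoader)
settings = data["settings"]
print(settings)
```

Registering on a subclass avoids changing what yaml.load does for other code that uses the stock loaders.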
If you really want to hook PyYAML to make it use an OrderedDict instead of a dict for default mappings, it's pretty easy to do if you're already working directly on parser objects, but more difficult if you want to stick with the high-level convenience methods. Fortunately, the above-linked ticket has an implementation you can use. Just remember that you're not using real YAML anymore, but a variant, so any other software that deals with your files can, and likely will, break.
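For reference, one commonly used way to do this through the high-level API is to replace the constructor for the default mapping tag on a loader subclass. The sketch below is not the implementation from the linked ticket; OrderedLoader and construct_ordered_mapping are names chosen for illustration.

```python
from collections import OrderedDict

import yaml


class OrderedLoader(yaml.SafeLoader):
    """Subclass so that other users of SafeLoader are unaffected."""


def construct_ordered_mapping(loader, node):
    loader.flatten_mapping(node)  # apply "<<" merge keys first
    return OrderedDict(loader.construct_pairs(node))


# Replace the constructor used for untagged (default) mappings.
OrderedLoader.add_constructor(
    yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, construct_ordered_mapping
)

doc = """\
setting1:
  name: [item, item]
  name1: text
anothersetting2:
  name: [item, item]
"""

data = yaml.load(doc, Loader=OrderedLoader)
print(list(data))
```

Every mapping in the document, nested ones included, then comes back as an OrderedDict.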

For a single item that is known to be an ordered dictionary, just turn its entries into a list of single-key mappings and use collections.OrderedDict:
setting1:
  - name: [item,item]
  - name1: text
anothersetting2:
  - name: [item,item]
  - sub_setting:
      name :[item,item]
import collections
import yaml
fh = open('setting.txt', 'r')
setting_list = yaml.load(fh)
setting1 = collections.OrderedDict(list(x.items())[0] for x in setting_list['setting1'])

Last I heard, PyYAML did not support this, though it would probably be easy to modify it to accept a dictionary or dictionary-like object as a starting point.

Related

can we distinguish string/int value in yaml file using yaml BaseLoader?

I have a yaml file with the following data:
apple: 1
banana: '2'
cat: "3"
My project parses it in Python with yaml.BaseLoader, and I want to deduce with isinstance() that "apple" is associated with an integer.
But in my case isinstance(config['apple'], int) is False and isinstance(config['apple'], str) is True.
I think that makes sense, since we are using BaseLoader. Is there a way to make it recognize integers without replacing BaseLoader, as the project's parsing script is used in many places?

As you've noticed, the base loader doesn't distinguish between scalar types (this behavior comes from BaseConstructor.construct_scalar).
I'm not quite sure if this is what you want, but...
There's no safe way (that wouldn't affect other libraries using BaseLoader) to add integers to BaseLoader, but if you're willing to do a single search-and-replace to replace the use of BaseLoader with something else, you can do
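import re
from yaml import BaseLoader
from yaml.constructor import SafeConstructor
class OurLoader(BaseLoader):
    pass
# Construct values tagged as int with the SafeLoader integer constructor.
OurLoader.add_constructor(
    "tag:yaml.org,2002:int", SafeConstructor.construct_yaml_int,
)
# Borrowed from the yaml module itself:
YAML_INT_RE = re.compile(
    r"""
    ^(?:[-+]?0b[0-1_]+
    |[-+]?0[0-7_]+
    |[-+]?(?:0|[1-9][0-9_]*)
    |[-+]?0x[0-9a-fA-F_]+
    |[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+)$""",
    re.X,
)
# Tag plain scalars that look like integers; everything else stays a string.
OurLoader.add_implicit_resolver(
    "tag:yaml.org,2002:int", YAML_INT_RE, list("-+0123456789")
)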
to end up with a loader that knows integers but nothing else.
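To check the effect on the file from the question, a compact variant can be compared against the stock BaseLoader. This is a sketch under assumptions: IntAwareLoader is a hypothetical name, and the pattern is deliberately simplified to plain decimal integers instead of the full expression above.

```python
import re

import yaml
from yaml.constructor import SafeConstructor


class IntAwareLoader(yaml.BaseLoader):
    """Hypothetical name; BaseLoader plus integer resolution only."""


IntAwareLoader.add_constructor(
    "tag:yaml.org,2002:int", SafeConstructor.construct_yaml_int
)
# Simplified pattern for this sketch: plain decimal integers only.
IntAwareLoader.add_implicit_resolver(
    "tag:yaml.org,2002:int",
    re.compile(r"^[-+]?(?:0|[1-9][0-9]*)$"),
    list("-+0123456789"),
)

doc = "apple: 1\nbanana: '2'\ncat: \"3\"\n"

plain = yaml.load(doc, Loader=yaml.BaseLoader)   # every scalar is a str
typed = yaml.load(doc, Loader=IntAwareLoader)    # unquoted 1 becomes an int
print(plain, typed)
```

Quoted scalars are not subject to implicit resolution, so banana and cat stay strings in both cases.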

Why the yaml can't load value as expected?

I use the following minimal example to explain my problem:
test.py
#! /usr/bin/python3
import jinja2
import yaml
from yaml import CSafeLoader as SafeLoader
devices = [
    "usb_otg_path: 1:8",
    "usb_otg_path: m1:8",
    "usb_otg_path: 18",
]
for device in devices:
    template = jinja2.Template(device)
    device_template = template.render()
    print(device_template)
    obj = yaml.load(device_template, Loader=SafeLoader)
    print(obj)
The run result is:
root@pie:~# python3 test.py
usb_otg_path: 1:8
{'usb_otg_path': 68}
usb_otg_path: m1:8
{'usb_otg_path': 'm1:8'}
usb_otg_path: 18
{'usb_otg_path': 18}
You can see that if the value of device_template is usb_otg_path: 1:8, then after yaml.load the 1:8 becomes 68, apparently because it contains a colon. The other two inputs are fine.
The above is a simplification of a complex system, in which "usb_otg_path: 1:8" is an input value that I cannot change, and yaml.load is the basic mechanism it uses to turn a string into a Python object.
Is it possible to get {'usb_otg_path': '1:8'} with some small change, such as a different parameter to yaml.load? (We need to upstream this to that project, so we may not be able to make big changes that affect others.)

YAML allows numerical literals (scalars) formatted as x:y:z and interprets them as "sexagesimal," that is to say: base 60.
1:8 is thus interpreted by YAML as 1*60**1 + 8*60**0, obviously giving you 68.
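The arithmetic can be reproduced in a few lines of plain Python. This is only an illustration of the base-60 rule, not PyYAML's own code, and sexagesimal_to_int is a hypothetical helper name:

```python
def sexagesimal_to_int(text):
    """Interpret colon-separated digit groups as a base-60 integer."""
    value = 0
    for part in text.split(":"):
        value = value * 60 + int(part)
    return value


print(sexagesimal_to_int("1:8"))      # 1*60 + 8 = 68
print(sexagesimal_to_int("1:30:00"))  # (1*60 + 30)*60 + 0 = 5400
```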
Notably, m1:8 is loaded as a string and 18 as a number. It sounds like you want all values as strings? This answer might be useful:
yaml.load(device_template, Loader=yaml.BaseLoader)
This disables automatic value conversion, as BaseLoader "does not resolve or support any tags and construct only basic Python objects: lists, dictionaries, and Unicode strings."
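A quick side-by-side check of the two loaders on the input from the question, as a minimal sketch (it uses the pure-Python SafeLoader rather than CSafeLoader so that it does not depend on the C extension being installed):

```python
import yaml

text = "usb_otg_path: 1:8"

# SafeLoader resolves the plain scalar 1:8 as a base-60 integer.
resolved = yaml.load(text, Loader=yaml.SafeLoader)
# BaseLoader leaves every scalar as a string.
unresolved = yaml.load(text, Loader=yaml.BaseLoader)

print(resolved, unresolved)
```

Keep in mind that with BaseLoader the third input, 18, also comes back as the string '18'.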

YAML response from Flask MySQL query doesn't seem formatted correctly [duplicate]

I'm writing a file type converter using Python and PyYAML for a project where I translate to and from YAML files multiple times. These files are then used by a separate service that I have no control over, so I need to write the YAML back in the same form as I originally got it. My original file has sections like the following:
key:
- value1
- value2
- value3
Which evaluates to {key: [value1,value2,value3]} using yaml.load(). When I translate this back to YAML my new file reads like this:
key: [value1,value2,value3]
My question is whether these two forms are equivalent as far as the various language parsers of YAML files are concerned. Obviously using PyYaml, these are equivalent, but does this hold true for Ruby or other languages, which the application is using? If not, then the application will not be able to display the data properly.

As Jordan already pointed out the node style is a serialization detail. And the output is equivalent to your input.
With PyYAML you can get the same block style output by using the default_flow_style keyword when dumping:
import sys
import yaml
yaml.dump(yaml.safe_load("""\
key:
- value1
- value2
- value3
"""), sys.stdout, default_flow_style=False)
gives you:
key:
- value1
- value2
- value3
If you use the round-trip capabilities of ruamel.yaml (disclaimer: I am the author of that package), you can do:
import sys
from ruamel.yaml import YAML
# Round-trip mode is the default for YAML(); older releases spelled this
# with Loader=RoundTripLoader and Dumper=RoundTripDumper.
yaml = YAML()
yaml_str = """\
key:
- value1
- value2 # this is the second value
- value3
"""
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
to get:
key:
- value1
- value2 # this is the second value
- value3
Not only does it preserve the flow/block style, but also the comment, the key ordering and some more details, transparently. This makes comparison (e.g. when using some revision control system to check in the YAML file) much easier.
For the service reading the YAML file this all makes no difference, but for the ease of checking whether you are transforming things correctly, it does.

Yes, to any YAML parser that follows the spec, they are equivalent. You can read the spec here: http://www.yaml.org/spec/1.2/spec.html
Section 3.2.3.1 is particularly relevant (emphasis mine):
3.2.3.1. Node Styles
Each node is presented in some style, depending on its kind. The node style is a presentation detail and is not reflected in the serialization tree or representation graph. There are two groups of styles. Block styles use indentation to denote structure; In contrast, flow styles rely on explicit indicators.
To clarify, a node is any structure in YAML, including arrays (called sequences in the spec). The single-line style is called a flow sequence (see section 7.4.1) and the multi-line style is called a block sequence (section 8.2.1). A compliant parser will deserialize both into identical objects.
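With PyYAML the equivalence is easy to confirm. A minimal sketch using the two forms from the question:

```python
import yaml

block_style = """\
key:
- value1
- value2
- value3
"""
flow_style = "key: [value1,value2,value3]\n"

from_block = yaml.safe_load(block_style)
from_flow = yaml.safe_load(flow_style)

# Both presentations load into the same Python object.
print(from_block == from_flow)
```

This only demonstrates the behavior of one parser; for Ruby or other languages the guarantee comes from the spec rather than from this check.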

Use key/value inheritance but leave it out of the result in pyyaml

PyYAML is pretty cool with respect to inheritance of key/value pairs, but is it possible to leave the following base_value_structure out of the final structure?
Default_profile: &Default_profile
  base_value_structure: &base_value_structure
    path_to_value: 'path to element'
    selector_type: 'XPATH'
    required: false
  title:
    <<: *base_value_structure
    path_to_value: "//div[@id='ctitle']/text()"
After parsing the config above, the base_value_structure is in the result. Can I prevent this behavior or do I need to filter it by hand?
Desired result:
{"Default_profile": {
    "title": {
        "path_to_value": "//div[@id='ctitle']/text()",
        "selector_type": "XPATH",
        "required": False }
    }
}

You would need to filter this out by hand. The merge key specification has no provision for this.
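Filtering by hand can be as small as one pop() after loading. A minimal sketch with PyYAML, using the question's document with the XPath written as [@id='ctitle']:

```python
import yaml

doc = """\
Default_profile: &Default_profile
  base_value_structure: &base_value_structure
    path_to_value: 'path to element'
    selector_type: 'XPATH'
    required: false
  title:
    <<: *base_value_structure
    path_to_value: "//div[@id='ctitle']/text()"
"""

data = yaml.safe_load(doc)

# The merge keys are already applied at this point, so the helper
# mapping can be dropped without affecting "title".
data["Default_profile"].pop("base_value_structure", None)
print(data)
```

The key name to remove has to be known in advance, which is the cost of doing it by hand.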
You could, if you would not load mappings as Python dicts, but as more complex types, filter these "base" mappings out automatically, but at the cost of complicating the syntax of the YAML file.
It should also be possible to tweak the parser to keep a list of mappings used as a base and delete those that are used. Or alternatively, if only "base" mappings have an anchor, delete only those. Neither of these can be done with PyYAML as is.
However it is not necessary that the anchored mapping has the same anchor name as the key. The anchored mapping doesn't have to be a key value (as it is in your example) at all. By reordering the YAML file you can much more easily remove the "base" or even multiple bases:
from pprint import pprint
from ruamel.yaml import YAML
# The safe loader returns plain dicts and lists, much like PyYAML.
yaml = YAML(typ='safe')
yaml_str = """\
-
  - &base_value_structure
    path_to_value: 'path to element'
    selector_type: 'XPATH'
    required: false
  - &base_other_structure
    key1: val1
    key2: val2
- Default_profile: &Default_profile
    title:
      <<: *base_value_structure
      path_to_value: "//div[@id='ctitle']/text()"
"""
# Element 0 holds the bases, element 1 the actual data.
data = yaml.load(yaml_str)[1]
pprint(data)
gives:
{'Default_profile': {'title': {'path_to_value': "//div[@id='ctitle']/text()",
                               'required': False,
                               'selector_type': 'XPATH'}}}
In the above I used my ruamel.yaml library, which is a derivative of PyYAML, which for this example should work the same as PyYAML, but it would preserve the merge information if you used its round-trip loader/dumper.

Parsing Yaml in Python: Detect duplicated keys

The yaml library in Python is not able to detect duplicated keys. This is a bug that was reported years ago and there is no fix yet.
I would like to find a decent workaround for this problem. How plausible would it be to create a regex that returns all the keys? Then it would be quite easy to detect this problem.
Could any regex master suggest a regex that is able to extract all the keys to find duplicates?
File example:
mykey1:
    subkey1: value1
    subkey2: value2
    subkey3:
      - value 3.1
      - value 3.2
mykey2:
    subkey1: this is not duplicated
    subkey5: value5
    subkey5: duplicated!
    subkey6:
        subkey6.1: value6.1
        subkey6.2: valye6.2

The yamllint command-line tool does what you want:
sudo pip install yamllint
Specifically, it has a rule key-duplicates that detects repetitions and keys over-writing one another:
$ yamllint test.yaml
test.yaml
  1:1       warning  missing document start "---"  (document-start)
  10:5      error    duplication of key "subkey5" in mapping  (key-duplicates)
(It has many other rules that you can enable/disable or tweak.)

Overriding one of the built-in loaders is a more lightweight approach:
import yaml
# special loader with duplicate key checking
class UniqueKeyLoader(yaml.SafeLoader):
    def construct_mapping(self, node, deep=False):
        mapping = []
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            assert key not in mapping
            mapping.append(key)
        return super().construct_mapping(node, deep)
then:
with open(filename, 'r') as fh:
    data = yaml.load(fh, Loader=UniqueKeyLoader)
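A variant of the same idea that raises a YAML error instead of relying on assert (which is skipped when Python runs with -O) could look like the sketch below. StrictKeyLoader and has_duplicate_keys are hypothetical names, and merge keys (<<) are not handled here.

```python
import yaml
from yaml.constructor import ConstructorError


class StrictKeyLoader(yaml.SafeLoader):
    """Hypothetical name; rejects mappings that repeat a key."""

    def construct_mapping(self, node, deep=False):
        seen = set()
        for key_node, _value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in seen:
                raise ConstructorError(
                    "while constructing a mapping", node.start_mark,
                    f"found duplicate key {key!r}", key_node.start_mark,
                )
            seen.add(key)
        return super().construct_mapping(node, deep)


def has_duplicate_keys(text):
    """Return True if loading `text` trips the duplicate-key check."""
    try:
        yaml.load(text, Loader=StrictKeyLoader)
    except ConstructorError:
        return True
    return False


duplicated = "mykey2:\n    subkey5: value5\n    subkey5: duplicated!\n"
unique = "mykey2:\n    subkey5: value5\n    subkey6: other\n"
print(has_duplicate_keys(duplicated), has_duplicate_keys(unique))
```

Because the error carries the key's start mark, the message printed for an uncaught exception points at the line and column of the repeated key.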
