Accessing / Replacing Yaml values via python using variables

I have been trying to solve what I thought would be a simple problem, but I can't wrap my head around updating a YAML file based on a variable.
What I have:
An ansible hosts file in YAML format. This hosts file is not 100% the same all the time. It can have a dictionary of multiple image values (as one example) and I only want one to change.
namespace: demo1
images:
  image1:
    path: "path1"
    version: "v1"
  image2:
    path: "path2"
    version: "1.2.3"
user: "root"
A YAML file that contains the key/values for things I want to replace. We already have a lot of configuration inside this YAML for other parts of our system so I don't want to split off to some other type of config type if I can help it (ini, JSON, etc) I would really want this to be dot notation.
schema: v1.0
hostfile:
  - path: path/to/ansible_hosts_file
    images:
      image1.version: v1.1
I am trying to find a way to load the YAML from #1, read in the key hostfile.images.[variable] to replace, and write the new value back to the original ansible file. I keep getting tripped up on the variable aspect, since today it can be image1.version and in the next config it's image2.path, or both at the same time.

I think your problem primarily comes from mixing key-value pairs with dotted notation. That is, you use
images:
  image1.version: v1.1
instead of
images.image1.version: v1.1
Retrieving by dotted notation has been solved for Python and arbitrary separators (not necessarily '.') in this answer. Setting just involves providing two extra functions that take a second argument, the value to set, and grafting them onto CommentedMap and CommentedSeq.
Based on that, you need to preselect based on your key:
upd = ruamel.yaml.round_trip_load(open('update.yaml'))
# schema check here
for hostfile in upd['hostfile']:
    data = ruamel.yaml.round_trip_load(open(hostfile['path']))
    images = data['images']
    for dotted in hostfile['images']:
        val = hostfile['images'][dotted]
        images.string_set(dotted, val)
The actual string_set-ting code could look like (untested):
def mapping_string_set(self, s, val, delimiter=None, key_delim=None):
    def p(v):
        # convert a key segment to int where possible, otherwise keep the string
        try:
            v = int(v)
        except ValueError:
            pass
        return v
    # possible extend for primitives like float, datetime, booleans, etc.
    if delimiter is None:
        delimiter = '.'
    if key_delim is None:
        key_delim = ','
    try:
        key, rest = s.split(delimiter, 1)
    except ValueError:
        key, rest = s, None
    if key_delim in key:
        key = tuple((p(key) for key in key.split(key_delim)))
    else:
        key = p(key)
    if rest is None:
        self[key] = val
        return
    self[key].string_set(rest, val, delimiter, key_delim)
ruamel.yaml.comments.CommentedMap.string_set = mapping_string_set
def sequence_string_set(self, s, val, delimiter=None, key_delim=None):
    # val was missing from the original signature although the body uses it
    if delimiter is None:
        delimiter = '.'
    try:
        key, rest = s.split(delimiter, 1)
    except ValueError:
        key, rest = s, None
    key = int(key)
    if rest is None:
        self[key] = val
        return
    self[key].string_set(rest, val, delimiter, key_delim)
ruamel.yaml.comments.CommentedSeq.string_set = sequence_string_set
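To see the dotted-path idea in isolation, here is a minimal sketch that works on plain dicts and lists rather than on ruamel.yaml's CommentedMap/CommentedSeq. The helper name set_by_dotted_path and the sample data are made up for illustration; it does not preserve comments and it does not handle the tuple keys that the functions above support.

```python
def set_by_dotted_path(data, dotted, value, delimiter="."):
    """Set data[k1][k2]...[kN] = value for a path such as 'k1.k2.kN'.

    Hypothetical helper: mappings are indexed by key, lists by integer position.
    """
    keys = dotted.split(delimiter)
    node = data
    for key in keys[:-1]:
        # walk down one level per path segment
        node = node[int(key)] if isinstance(node, list) else node[key]
    last = keys[-1]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value


# sample data modelled on the hosts file from the question
hosts = {
    "namespace": "demo1",
    "images": {
        "image1": {"path": "path1", "version": "v1"},
        "image2": {"path": "path2", "version": "1.2.3"},
    },
}
# the keys of this mapping are the "variables" that change per config
updates = {"image1.version": "v1.1", "image2.path": "new/path2"}

for dotted, value in updates.items():
    set_by_dotted_path(hosts["images"], dotted, value)

print(hosts["images"])
```

Because the dotted string is only split at run time, it does not matter whether a given config names image1.version, image2.path, or both.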

Related

Access elements inside yaml using python

I am using yaml and pyyaml to configure my application.
Is it possible to configure something like this -
config.yml -
root:
  repo_root: /home/raghhuveer/code/data_science/papers/cv/AlexNet_lght
  data_root: $root.repo_root/data
service:
  root: $root.data_root/csv/xyz.csv
yaml loading function -
def load_config(config_path):
    config_path = os.path.abspath(config_path)
    if not os.path.isfile(config_path):
        raise FileNotFoundError("{} does not exist".format(config_path))
    else:
        with open(config_path) as f:
            config = yaml.load(f, Loader=yaml.SafeLoader)
        # logging.info(config)
        logging.info("Config used for run - \n{}".format(yaml.dump(config, sort_keys=False)))
        return DotDict(config)
Current Output-
root:
  repo_root: /home/raghhuveer/code/data_science/papers/cv/AlexNet_lght
  data_root: ${root.repo_root}/data
service:
  root: ${root.data_root}/csv/xyz.csv
Desired Output -
root:
  repo_root: /home/raghhuveer/code/data_science/papers/cv/AlexNet_lght
  data_root: /home/raghhuveer/code/data_science/papers/cv/AlexNet_lght/data
service:
  root: /home/raghhuveer/code/data_science/papers/cv/AlexNet_lght/data/csv/xyz.csv
Is this even possible with python? If so any help would be really nice.
Thanks in advance.

A general approach:
1. read the file as is
2. search for strings containing $:
   determine the "path" of the "variables"
   replace the "variables" with actual values
An example that recurses into dictionaries and replaces the variables in strings:
import re, pprint, yaml
def convert(input, top=None):
    """Replaces $key1.key2 with actual values. Modifies input in-place"""
    if top is None:
        top = input  # top should be the original input
    if isinstance(input, dict):
        ret = {k: convert(v, top) for k, v in input.items()}  # recursively convert items
        if input != ret:  # in case order matters, do it one or several times more until no change happens
            ret = convert(ret, top)  # keep resolving against the original top-level input
        input.update(ret)  # update original input
        return input  # return updated input (for the case of recursion)
    if isinstance(input, str):
        vars = re.findall(r"\$[\w\.]+", input)  # find $key_1.key_2.keyN sequences
        for var in vars:
            keys = var[1:].split(".")  # remove dollar and split by dots to make "key chain"
            val = top  # starting from top ...
            for k in keys:  # ... for each key in the key chain ...
                val = val[k]  # ... go one level down
            input = input.replace(var, val)  # replace $key sequence with actual value
        return input  # return modified input
    return input  # other types are passed through unchanged; TODO int, float, list, ...
with open("in.yml") as f:
    config = yaml.safe_load(f)  # load as is
convert(config)  # convert it (in-place)
pprint.pprint(config)
Output:
{'root': {'data_root': '/home/raghhuveer/code/data_science/papers/cv/AlexNet_lght/data',
'repo_root': '/home/raghhuveer/code/data_science/papers/cv/AlexNet_lght'},
'service': {'root': '/home/raghhuveer/code/data_science/papers/cv/AlexNet_lght/data/csv/xyz.csv'}}
Note: YAML is not that important here, would work also with JSON, XML or other formats.
Note2: If you use exclusively YAML and exclusively python, some answers from this post may be useful (using anchors and references and application specific local tags)

How to parse Cloudformation YAML to get all the !ImportValue from YAML template?

I am working on a project to parse an AWS Cloudformation Yaml File to extract all the !ImportValue from the YAML template.
I am trying to use ruamel.yaml to parse that (to which I am new), I was able to read the YAML file and get the individual elements.
import ruamel.yaml
def general_constructor(loader, tag_suffix, node):
    return node.value
ruamel.yaml.SafeLoader.add_multi_constructor(u'!', general_constructor)
with open(cfFile, 'r') as service:
    stream = service.read()
yaml_data = ruamel.yaml.safe_load(stream)
print yaml_data
The above code gets the content of the specified YAML file, and the output looks like the following.
{'Application': {'Properties': {'ApplicationName': [ScalarNode(tag=u'tag:yaml.org,2002:str', value=u'-'),
SequenceNode(tag=u'tag:yaml.org,2002:seq', value=[ScalarNode(tag=u'tag:yaml.org,2002:str', value=u'***'), ScalarNode(tag=u'!ImportValue', value=u'jkl')])],
*
*
ScalarNode(tag=u'!ImportValue', value=u'def'),
*
*
ScalarNode(tag=u'!ImportValue', value=u'rst')])]},
So there are a bunch of !ImportValue tags listed in ScalarNode objects (e.g. ScalarNode(tag=u'!ImportValue', value=u'rst')), and those are what I actually want to extract. These ImportValues are scattered throughout the template. What would be the best way to extract their values? In our CloudFormation setup we have a bunch of YAML files; some of them export certain resources and other YAML files import them. So I want to build a sort of dependency map (maybe a JSON file) that depicts the interdependence between the CloudFormation files.

If you use ruamel.yaml's round-trip loader you don't have to do
anything special to load the tag, and walking recursively over the
resulting data structure is relatively easy. The corresponding key
needs to be passed on, as at least the first !ImportValue is within
a sequence under the key.
Assuming an input.yaml consisting of:
Application:
  Properties:
    ApplicationName: ["-", ["**", !ImportValue "jkl"]]
    AnotherKey:
    - 42
    - nested: !ImportValue xyz
(which might not be exactly what you got as input, but will do for
demonstration purposes), and using the new ruamel.yaml API (which
defaults to round-trip loading/dumping):
import sys
from pathlib import Path
import ruamel.yaml
ta = ruamel.yaml.comments.Tag.attrib
yaml = ruamel.yaml.YAML()
data = yaml.load(Path('input.yaml'))
def process(d, key=None):
    if isinstance(d, dict):
        for k, v in d.items():
            for res in process(v, k):  # recurse and pass on new key
                yield res
    elif isinstance(d, list):
        for item in d:
            for res in process(item, key):
                yield res
    else:
        try:
            if getattr(d, ta, None).value == '!ImportValue':
                yield (key, d)
        except AttributeError:
            pass
for k, v in process(data):
    print(k, '->', v)
which gives:
ApplicationName -> jkl
nested -> xyz

Iterating a conversion of a string to a float in a scripting file when parsing an old file

I am using a new script (a) to extract information from an old script (b) to create a new file (c). I am looking for an equal sign in the old script (b) and want to modify the modification script (a) to make it automated.
The string is
lev1tolev2 'from=e119-b3331l1 mappars="simp:180" targ=enceladus.bi.def.3 km=0.6 lat=(-71.5,90) lon=(220,360)'
It is written in python 3.
The current output is fixed at
cam2map from=e119-b3331l1 to=rsmap-x map=enc.Ink.map pixres=mpp defaultrange=MAP res=300 minlat=-71.5 maxlat=90 minlon=220 maxlon=360
Currently, I have the code able to export a string of 0.6 for all of the iterations of lev1tolev2, but each one of these is going to be different.
cam2map = Call("cam2map")
cam2map.kwargs["from"] = old_lev1tolev2.kwargs["from"]
cam2map.kwargs["to"] = "rsmap-x"
cam2map.kwargs["map"] = "enc.Ink.map"
cam2map.kwargs["pixres"] = "mpp"
cam2map.kwargs["defaultrange"] = "MAP"
**cam2map.kwargs["res"] = float((old_lev1tolev2.kwargs["km"]))**
cam2map.kwargs["minlat"] = lat[0]
cam2map.kwargs["maxlat"] = lat[1]
cam2map.kwargs["minlon"] = lon[0]
cam2map.kwargs["maxlon"] = lon[1]
I have two questions, why is this not converting the string to a float? And, why is this not iterating over all of the lev1tolev2 commands as everything else in the code does?
The full code is available here.
https://codeshare.io/G6drmk

The problem occurred at a different location in the code.
def escape_kw_value(value):
    if not isinstance(value, str):
        return value
    elif (value.startswith(('"', "'")) and value.endswith(('"', "'"))):
        return value
    # TODO escape the quote with \" or \'
    #if value.startswith(('"', "'")) or value.endswith(('"', "'")):
    #    return value
    if " " in value:
        value = '"{}"'.format(value)
    return value

It isn't entirely clear to me, but from your syntax here:
**cam2map.kwargs["res"] = float((old_lev1tolev2.kwargs["km"]))**
I'd bet that cam2map.kwargs["res"] is a dict, and that you thought the ** syntax would convert every value in the dict. Instead, the float built-in should be called in a loop over the elements of the dict (or possibly in a comprehension), as here:
cam2map.kwargs["res"] = dict()
for key, value in old_lev1tolev2.kwargs["res"].items():
    cam2map.kwargs["res"][key] = float(value)
Edit :
OK, so it seems you took the string 'from=e119-b3331l1 mappars="simp:180" targ=enceladus.bi.def.3 km=0.6 lat=(-71.5,90) lon=(220,360)'
and then thought that calling yourstring.kwargs would give you a dict. It won't. You can parse the string into a dict first, either with some library or by splitting on spaces and then on '=', like this:
output = dict()
for one_bit in lev_1_lev2.split(' '):
    key, value = one_bit.split('=', 1)
    output[key] = value
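As a sketch of the "parse it with some library" route, the standard library's shlex module can do the splitting and also strips the quotes around values such as "simp:180". The function name parse_kwargs is made up here, and this is not the Call class from the linked script, only an illustration of turning the argument string into a dict before converting km.

```python
import shlex


def parse_kwargs(arg_string):
    """Split 'key=value key2="quoted value"' into a dict of strings.

    Hypothetical helper: shlex.split handles the quoting, and
    str.partition splits each token on its first '=' only.
    """
    result = {}
    for token in shlex.split(arg_string):
        key, _, value = token.partition("=")
        result[key] = value
    return result


line = ('from=e119-b3331l1 mappars="simp:180" targ=enceladus.bi.def.3 '
        'km=0.6 lat=(-71.5,90) lon=(220,360)')
kwargs = parse_kwargs(line)

# every value is still a string at this point; convert explicitly
res = float(kwargs["km"])
print(kwargs)
print(res)
```

Running this once per lev1tolev2 line gives each call its own km value instead of a fixed one.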

Return all keys along with value in nested dictionary

I am working on getting all text that exists in several .yaml files placed into a new singular YAML file that will contain the English translations that someone can then translate into Spanish.
Each YAML file has a lot of nested text. I want to print the full 'path', aka all the keys, along with the value, for each value in the YAML file. Here's an example input for a .yaml file that lives in the myproject.section.more_information file:
default:
  heading: Here’s A Title
  learn_more:
    title: Title of Thing
    url: www.url.com
    description: description
    opens_new_window: true
and here's the desired output:
myproject.section.more_information.default.heading: Here’s a Title
myproject.section.more_information.default.learn_more.title: Title of Thing
myproject.section.more_information.default.learn_more.url: www.url.com
myproject.section.more_information.default.learn_more.description: description
myproject.section.more_information.default.learn_more.opens_new_window: true
This seems like a good candidate for recursion, so I've looked at examples such as this answer
However, I want to preserve all of the keys that lead to a given value, not just the last key in a value. I'm currently using PyYAML to read/write YAML.
Any tips on how to save each key as I continue to check if the item is a dictionary and then return all the keys associated with each value?

What you're wanting to do is flatten nested dictionaries. This would be a good place to start: Flatten nested Python dictionaries, compressing keys
In fact, I think the code snippet in the top answer would work for you if you just changed the sep argument to '.'.
edit:
Check this for a working example based on the linked SO answer http://ideone.com/Sx625B
from collections.abc import MutableMapping  # collections.MutableMapping no longer exists in current Python 3
some_dict = {
    'default': {
        'heading': 'Here’s A Title',
        'learn_more': {
            'title': 'Title of Thing',
            'url': 'www.url.com',
            'description': 'description',
            'opens_new_window': 'true'
        }
    }
}
def flatten(d, parent_key='', sep='_'):
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        if isinstance(v, MutableMapping):
            # nested mapping: recurse, passing the path built so far
            items.extend(flatten(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)
results = flatten(some_dict, parent_key='', sep='.')
for item in results:
    print(item + ': ' + results[item])
If you want it in order, you'll need an OrderedDict though.

Walking over nested dictionaries begs for recursion, and handing in the "prefix" to "path" prevents you from having to do any manipulation on the segments of your path (as @Prune suggests).
There are a few things to keep in mind that make this problem interesting:
because you are using multiple files, the same path can occur in more than one file, which you need to handle (at least by throwing an error, as otherwise you might just lose data). In my example I generate a list of values.
dealing with special keys (non-string (convert?), empty string, keys containing a .). My example reports these and exits.
Example code using ruamel.yaml ¹:
import sys
import glob
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap, CommentedSeq
from ruamel.yaml.compat import string_types, ordereddict
class Flatten:
    def __init__(self, base):
        self._result = ordereddict()  # key to list of tuples of (value, comment)
        self._base = base
    def add(self, file_name):
        data = ruamel.yaml.round_trip_load(open(file_name))
        self.walk_tree(data, self._base)
    def walk_tree(self, data, prefix=None):
        """
        this is based on ruamel.yaml.scalarstring.walk_tree
        """
        if prefix is None:
            prefix = ""
        if isinstance(data, dict):
            for key in data:
                full_key = self.full_key(key, prefix)
                value = data[key]
                if isinstance(value, (dict, list)):
                    self.walk_tree(value, full_key)
                    continue
                # value is a scalar
                comment_token = data.ca.items.get(key)
                comment = comment_token[2].value if comment_token else None
                self._result.setdefault(full_key, []).append((value, comment))
        elif isinstance(data, list):  # was isinstance(base, list); base is not defined here
            print("don't know how to handle lists", prefix)
            sys.exit(1)
    def full_key(self, key, prefix):
        """
        check here for valid keys
        """
        if not isinstance(key, string_types):
            print('key has to be string', repr(key), prefix)
            sys.exit(1)
        if '.' in key:
            print('dot in key not allowed', repr(key), prefix)
            sys.exit(1)
        if key == '':
            print('empty key not allowed', repr(key), prefix)
            sys.exit(1)
        return prefix + '.' + key
    def dump(self, out):
        res = CommentedMap()
        for path in self._result:
            values = self._result[path]
            if len(values) == 1:  # single value for path
                res[path] = values[0][0]
                if values[0][1]:
                    res.yaml_add_eol_comment(values[0][1], key=path)
                continue
            res[path] = seq = CommentedSeq()
            for index, value in enumerate(values):
                seq.append(value[0])
                if value[1]:  # attach each item's own comment to the sequence entry
                    seq.yaml_add_eol_comment(value[1], key=index)
        ruamel.yaml.round_trip_dump(res, out)
flatten = Flatten('myproject.section.more_information')
for file_name in glob.glob('*.yaml'):
    flatten.add(file_name)
flatten.dump(sys.stdout)
If you have an additional input file:
default:
  learn_more:
    commented: value  # this value has a comment
    description: another description
then the result is:
myproject.section.more_information.default.heading: Here’s A Title
myproject.section.more_information.default.learn_more.title: Title of Thing
myproject.section.more_information.default.learn_more.url: www.url.com
myproject.section.more_information.default.learn_more.description:
- description
- another description
myproject.section.more_information.default.learn_more.opens_new_window: true
myproject.section.more_information.default.learn_more.commented: value # this value has a comment
Of course if your input doesn't have double paths, your output won't have any lists.
Using string_types and ordereddict from ruamel.yaml makes this compatible with both Python 2 and Python 3 (you don't indicate which version you are using).
The ordereddict preserves the original key ordering, but this is of course dependent on the processing order of the files. If you want the paths sorted, just change dump() to use:
for path in sorted(self._result):
Also note that the comment on the 'commented' dictionary entry is preserved.
¹ ruamel.yaml is a YAML 1.2 parser that preserves comments and other data on round-tripping (PyYAML does most parts of YAML 1.1). Disclaimer: I am the author of ruamel.yaml

Keep a simple list of strings, being the most recent key at each indentation depth. When you progress from one line to the next with no change, simply change the item at the end of the list. When you "out-dent", pop the last item off the list. When you indent, append to the list.
Then, each time you hit a colon, the corresponding key item is the concatenation of the strings in the list, something like:
'.'.join(key_list)
Does that get you moving at an honorable speed?
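The key-list idea can be sketched roughly as follows. This is only an illustration under strong assumptions: block-style YAML indented with spaces, one key per line, no sequences, no multi-line scalars, and no colons inside keys. The function name flatten_lines and the second list that tracks indentation widths are additions for the sketch; a real YAML parser is the safer choice for anything beyond such simple files.

```python
def flatten_lines(text, prefix=""):
    """Yield (dotted_path, value) pairs from simple block-style YAML text."""
    key_list = []  # most recent key at each indentation depth
    indents = []   # indentation width belonging to each entry in key_list
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or ":" not in stripped:
            continue
        indent = len(line) - len(line.lstrip(" "))
        key, _, value = stripped.partition(":")
        # same level or "out-dent": drop the keys that are at least this deep
        while indents and indents[-1] >= indent:
            indents.pop()
            key_list.pop()
        indents.append(indent)
        key_list.append(key.strip())
        value = value.strip()
        if value:  # a scalar on this line, so emit the full path
            parts = ([prefix] if prefix else []) + key_list
            yield ".".join(parts), value


SAMPLE = """\
default:
  heading: A Title
  learn_more:
    title: Title of Thing
    url: www.url.com
  footer: bye
"""

rows = list(flatten_lines(SAMPLE, prefix="myproject"))
for path, value in rows:
    print(path + ": " + value)
```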

Python Config Parser (Duplicate Key Support)

So I recently started writing a config parser for a Python project I'm working on. I initially avoided configparser and configobj, because I wanted to support a config file like so:
key=value
key2=anothervalue
food=burger
food=hotdog
food=cake icecream
In short, this config file is going to be edited via the command line over SSH often. So I don't want to deal with tabs or be finicky about spacing (like YAML), but I also want to avoid keys with multiple values (easily 10 or more) being line wrapped in vi. This is why I would like to support duplicate keys.
In my ideal world, when I ask the Python config object for food, it would give me a list back with ['burger', 'hotdog', 'cake', 'icecream']. If there wasn't a food value defined, it would look in a defaults config file and give me that/those values.
I have already implemented the above
However, my troubles started when I realized I wanted to support preserving inline comments and such. The way I handle reading and writing the config files is to decode the file into a dict in memory, read values from or write values to that dict, and then dump the dict back out into a file. This isn't really nice for preserving line order and commenting and such, and it's bugging the crap out of me.
A) ConfigObj looks like it has everything I need except support for duplicate keys. Instead it wants me to make a list, which is going to be a pain to edit manually in vi over ssh due to line wrapping. Can I make configobj more ssh/vi friendly?
B) Is my homebrew solution wrong? Is there a better way of reading/writing/storing my config values? Is there any easy way to handle changing a key value in a config file by just modifying that line and rewriting the entire config file from memory?

Well I would certainly try to leverage what is in the standard library if I could.
The signature for the config parser classes look like this:
class ConfigParser.SafeConfigParser([defaults[, dict_type[, allow_no_value]]])
Notice the dict_type argument. When provided, this will be used to construct the dictionary objects for the list of sections, for the options within a section, and for the default values. It defaults to collections.OrderedDict. Perhaps you could pass something in there to get your desired multiple-key behavior, and then reap all the advantages of ConfigParser. You might have to write your own class to do this, or you could possibly find one written for you on PyPI or in the ActiveState recipes. Try looking for a bag or multiset class.
I'd either go that route or just suck it up and make a list:
foo = value1, value2, value3
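For the "make a list" route, a minimal sketch might look like the following. It uses the Python 3 module name configparser (the signature above is the Python 2 ConfigParser module), and the [main] section header, the sample text and the get_list helper are assumptions made for illustration, since the standard parser requires a section header that the file in the question does not have.

```python
import configparser

# hypothetical sample; the [main] header is needed by configparser
SAMPLE = """\
[main]
key = value
food = burger, hotdog, cake, icecream
"""


def get_list(parser, section, option):
    """Return a comma-separated option as a list of stripped strings."""
    raw = parser.get(section, option)
    return [item.strip() for item in raw.split(",") if item.strip()]


parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

foods = get_list(parser, "main", "food")
print(foods)
```

This keeps everything in the standard library, at the cost of the long single lines the question wanted to avoid.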

Crazy idea: make your dictionary values as a list of 3-tuples with line number, col number and value itself and add special key for comment.
CommentSymbol = ';'
def readConfig(filename):
    def addValue(dict, key, lineIdx, colIdx, value):
        if key in dict:
            dict[key].append((lineIdx, colIdx, value))
        else:
            dict[key] = [(lineIdx, colIdx, value)]
    res = {}
    with open(filename, 'r') as f:
        for i, line in enumerate(f):
            line = line.rstrip('\n')
            idx = line.find(CommentSymbol)
            if idx != -1:
                # store the comment under the special key, with its position
                comment = line[idx + 1:]
                addValue(res, CommentSymbol, i, idx, comment)
                line = line[:idx]
            pair = [x.strip() for x in line.split('=', 1)]
            if len(pair) == 2:
                addValue(res, pair[0], i, 0, pair[1])
    return res
def writeConfig(dict, filename):
    # flatten to (lineIdx, colIdx, key, value) so entries sort by position in the file
    entries = sorted((l, c, k, v) for k, V in dict.items() for (l, c, v) in V)
    with open(filename, 'w') as f:
        i = 0
        for lineIdx, colIdx, k, v in entries:
            if lineIdx > i:
                f.write('\n' * (lineIdx - i))  # also restores blank lines
                i = lineIdx
            if k == CommentSymbol:
                # a comment that follows a value gets a separating space
                f.write('{0}{1}{2}'.format(' ' if colIdx else '', CommentSymbol, v))
            else:
                f.write('{0} = {1}'.format(k, v))
        f.write('\n')
