Access elements inside YAML using Python

I am using YAML (via PyYAML) to configure my application.
Is it possible to configure something like this?
config.yml:
root:
  repo_root: /home/raghhuveer/code/data_science/papers/cv/AlexNet_lght
  data_root: $root.repo_root/data
service:
  root: $root.data_root/csv/xyz.csv
YAML loading function:
def load_config(config_path):
    config_path = os.path.abspath(config_path)
    if not os.path.isfile(config_path):
        raise FileNotFoundError("{} does not exist".format(config_path))
    else:
        with open(config_path) as f:
            config = yaml.load(f, Loader=yaml.SafeLoader)
        # logging.info(config)
        logging.info("Config used for run - \n{}".format(yaml.dump(config, sort_keys=False)))
        return DotDict(config)
Current output:
root:
  repo_root: /home/raghhuveer/code/data_science/papers/cv/AlexNet_lght
  data_root: ${root.repo_root}/data
service:
  root: ${root.data_root}/csv/xyz.csv
Desired output:
root:
  repo_root: /home/raghhuveer/code/data_science/papers/cv/AlexNet_lght
  data_root: /home/raghhuveer/code/data_science/papers/cv/AlexNet_lght/data
service:
  root: /home/raghhuveer/code/data_science/papers/cv/AlexNet_lght/data/csv/xyz.csv
Is this even possible with Python? If so, any help would be really nice.
Thanks in advance.

A general approach:
read the file as is
search for strings containing $
determine the "path" of the "variables"
replace the "variables" with actual values
An example, using a recursive call for dictionaries and replacing strings:
import re, pprint, yaml

def convert(input, top=None):
    """Replaces $key1.key2 with actual values. Modifies input in-place."""
    if top is None:
        top = input  # top should be the original input
    if isinstance(input, dict):
        ret = {k: convert(v, top) for k, v in input.items()}  # recursively convert items
        if input != ret:  # in case order matters, repeat until no further change happens
            ret = convert(ret, top)
        input.update(ret)  # update the original input
        return input  # return the updated input (for the case of recursion)
    if isinstance(input, str):
        vars = re.findall(r"\$[\w_\.]+", input)  # find $key_1.key_2.keyN sequences
        for var in vars:
            keys = var[1:].split(".")  # remove the dollar sign and split by dots to make a "key chain"
            val = top  # starting from top ...
            for k in keys:  # ... for each key in the key chain ...
                val = val[k]  # ... go one level down
            input = input.replace(var, val)  # replace the $key sequence with the actual value
        return input  # return the modified input
    return input  # TODO: int, float, list, ...

with open("in.yml") as f:
    config = yaml.load(f, Loader=yaml.SafeLoader)  # load as is
convert(config)  # convert it (in-place)
pprint.pprint(config)
Output:
{'root': {'data_root': '/home/raghhuveer/code/data_science/papers/cv/AlexNet_lght/data',
          'repo_root': '/home/raghhuveer/code/data_science/papers/cv/AlexNet_lght'},
 'service': {'root': '/home/raghhuveer/code/data_science/papers/cv/AlexNet_lght/data/csv/xyz.csv'}}
Note: YAML is not that important here; this would also work with JSON, XML or other formats.
Note 2: If you use exclusively YAML and exclusively Python, some answers from this post may be useful (using anchors and references, and application-specific local tags).
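On that second note: a quick sketch of what plain YAML anchors and aliases can (and cannot) do here. An alias reuses a whole value verbatim, so the `$root.repo_root/data` concatenation style above still needs custom code like `convert()`. The paths below are made up for illustration:

```python
import yaml

doc = """
root:
  repo_root: &repo /home/user/project   # anchor the scalar
  repo_again: *repo                     # alias reuses it verbatim
"""

data = yaml.safe_load(doc)
# data["root"]["repo_again"] == "/home/user/project"
# but there is no standard YAML way to build "<repo>/data" from the anchor
```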

Related

Is it possible to remove unecessary nested structure in yaml file?

I need to set a param that is deep inside a YAML object, like below:
executors:
  hpc01:
    context:
      cp2k:
        charge: 0
Is it possible to make it more compact, for example:
executors: hpc01: context: cp2k: charge: 0
I am using ruamel.yaml in Python to parse the file and it fails to parse the example. Is there some YAML dialect that can support such a style, or is there a better way to write such configuration within the standard YAML spec?
Since all JSON is valid YAML...
executors: {"hpc01" : {"context": {"cp2k": {"charge": 0}}}}
...should be valid.
A little proof:
from ruamel.yaml import YAML

a = YAML().load('executors: {"hpc01" : {"context": {"cp2k": {"charge": 0}}}}')
b = YAML().load('''executors:
  hpc01:
    context:
      cp2k:
        charge: 0''')
if a == b:
    print("equal")
will print: equal.
What you are proposing is invalid YAML, since the colon + space is parsed as a value indicator. Because YAML can have mappings as keys for other mappings, you would get all kinds of interpretation issues, such as: should
a: b: c
be interpreted as a mapping with key a: b and value c, or as a mapping with key a and value b: c?
If you want to write everything on one line, and don't want the overhead of YAML's flow style, I suggest you use the fact that the value indicator expects a space after the colon, and do a little post-processing:
import sys
import ruamel.yaml

yaml_str = """\
before: -1
executors:hpc01:context:cp2k:charge: 0
after: 1
"""

COLON = ':'

def unfold_keys(d):
    if isinstance(d, dict):
        replace = []
        for idx, (k, v) in enumerate(d.items()):
            if COLON in k:
                for segment in reversed(k.split(COLON)):
                    v = {segment: v}
                replace.append((idx, k, v))
            else:
                unfold_keys(v)
        for idx, oldkey, kv in replace:
            del d[oldkey]
            v = list(kv.values())[0]
            # v.refold = True
            d.insert(idx, list(kv.keys())[0], v)
    elif isinstance(d, list):
        for elem in d:
            unfold_keys(elem)
    return d

yaml = ruamel.yaml.YAML()
data = unfold_keys(yaml.load(yaml_str))
yaml.dump(data, sys.stdout)
which gives:
before: -1
executors:
  hpc01:
    context:
      cp2k:
        charge: 0
after: 1
Since ruamel.yaml in the default mode parses mappings to CommentedMap instances, which have an .insert() method, you can actually preserve the position of the "unfolded" key in the mapping.
You can of course use another character (e.g. an underscore). You can also reverse the process by uncommenting the line # v.refold = True and providing another recursive function that walks over the data, checks for that attribute, and does the reverse of unfold_keys(), just before dumping.
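That reverse step can be sketched on plain dicts (refold_keys is a hypothetical helper, not part of ruamel.yaml; it ignores the comment/position preservation the refold-attribute approach would give you, and note it folds every single-key chain, not only ones produced by unfolding):

```python
def refold_keys(d, sep=':'):
    # inverse of unfold_keys: collapse chains of single-key dicts
    # back into one colon-joined key
    if not isinstance(d, dict):
        return d
    out = {}
    for k, v in d.items():
        # follow the chain while the value is a single-key dict
        while isinstance(v, dict) and len(v) == 1:
            (k2, v), = v.items()
            k = f"{k}{sep}{k2}"
        out[k] = refold_keys(v, sep)
    return out

nested = {'executors': {'hpc01': {'context': {'cp2k': {'charge': 0}}}}}
# refold_keys(nested) -> {'executors:hpc01:context:cp2k:charge': 0}
```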

Python - Getting integers from a text (config) file [duplicate]

This question already has answers here: How do I convert all strings in a list of lists to integers? (15 answers)
Closed 1 year ago.
I'm currently learning Python and I'm having an issue with getting integer values from a text file (myfile.config). My goal is to be able to read a text file, find the integers then assign the said integers to some variables.
This is what my text file looks like (myFile.config):
someValue:100
anotherValue:1000
yetAnotherValue:-5
someOtherValueHere:5
This is what I've written so far:
import os.path
import numpy as np

# Check if the config exists, otherwise generate a config file
def checkConfig():
    if os.path.isfile('myFile.config'):
        return True
    else:
        print("Config file not found - Generating default config...")
        configFile = open("myFile.config", "w+")
        configFile.write("someValue:100\nanotherValue:1000\nyetAnotherValue:-5\nsomeOtherValueHere:5")
        configFile.close()

# Read the config file
def readConfig():
    tempConfig = []
    configFile = open('myFile.config', 'r')
    for line in configFile:
        cleanedField = line.strip()  # remove \n from elements in the list
        fields = cleanedField.split(":")
        tempConfig.append(fields[1])
    configFile.close()
    print(str(tempConfig))
    return tempConfig

configOutput = np.asarray(readConfig())
someValue = configOutput[0]
anotherValue = configOutput[1]
yetAnotherValue = configOutput[2]
someOtherValueHere = configOutput[3]
One of the issues which I've noticed so far (if my current understanding of Python is correct) is that the elements in the list are being stored as strings. I've tried to correct this by converting the list to an array via the NumPy library, but it hasn't worked.
Thank you for taking the time to read this question.
You can use float() or int() to turn strings into floats or integers. So in this case you can just write
tempConfig.append(float(fields[1]))
or
tempConfig.append(int(fields[1]))
You have to call int for the conversion, and I would use a dictionary for the result.
def read_config():
    configuration = {}
    with open('myFile.config', 'r') as config_file:
        for line in config_file:
            fields = line.split(':')
            if len(fields) == 2:
                configuration[fields[0].strip()] = int(fields[1])
    print(configuration)  # for debugging
    return configuration
Now there's no need to create single variables like someValue or anotherValue. If you call the function with config = read_config() you have the values available as config['someValue'] and config['anotherValue'].
This is a much more flexible approach. Your current code will fail if you change the order of the lines in the configuration file. And if you add a fifth configuration entry you will have to change your code to create a new variable. The code in this answer can handle this by design.
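The same parsing logic can be checked on an in-memory list of lines (a sketch; parse_config is a hypothetical variant of read_config above with the file handling dropped for brevity):

```python
def parse_config(lines):
    # same idea as read_config, but taking any iterable of lines
    configuration = {}
    for line in lines:
        fields = line.split(':')
        if len(fields) == 2:
            # int() tolerates surrounding whitespace, so no strip needed on the value
            configuration[fields[0].strip()] = int(fields[1])
    return configuration

config = parse_config(["someValue:100", "anotherValue:1000",
                       "yetAnotherValue:-5", "someOtherValueHere:5"])
# config["someValue"] is now the integer 100, config["yetAnotherValue"] is -5
```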
With some eval magic you get a dict out of the text file, and if you insist, you can put the values into the global namespace using globals():
def read_config():
    config = '{' + open('myFile.config', 'r').read() + '}'
    globals().update(eval(config.replace('{', '{"').replace(':', '":').replace('\n', ',"')))

How to map over a CommentedMap while preserving the comments/style?

Given a ruamel.yaml CommentedMap, and some transformation function f: CommentedMap → Any, I would like to produce a new CommentedMap with transformed keys and values, but otherwise as similar as possible to the original.
If I don't care about preserving style, I can do this:
result = {
    f(key): f(value)
    for key, value in my_commented_map.items()
}
If I didn't need to transform the keys (and I didn't care about mutating the original), I could do this:
for key, value in my_commented_map.items():
    my_commented_map[key] = f(value)
The style and comment information are each attached to the CommentedMap via special attributes. The style you can copy, but the comments are partly indexed by the key on whose line they occur, and if you transform that key, you also need to transform that indexed comment.
In your first example you apply f() to both key and value; I'll use separate functions in my example, upper-casing the keys and lower-casing the values (this of course only works on string-type keys and values, so this is a restriction of the example, not of the solution).
import sys
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap as CM
from ruamel.yaml.comments import Format, Comment

yaml_str = """\
# example YAML document
abc: All Strings are Equal # but some Strings are more Equal then others
klm: Flying Blue
xYz: the End # for now
"""

def fkey(s):
    return s.upper()

def fval(s):
    return s.lower()

def transform(data, fk, fv):
    d = CM()
    if hasattr(data, Format.attrib):
        setattr(d, Format.attrib, getattr(data, Format.attrib))
    ca = None
    if hasattr(data, Comment.attrib):
        setattr(d, Comment.attrib, getattr(data, Comment.attrib))
        ca = getattr(d, Comment.attrib)
    # as the key mapping could map new keys onto old keys, first gather everything
    key_com = {}
    for k in data:
        new_k = fk(k)
        d[new_k] = fv(data[k])
        if ca is not None and k in ca.items:
            key_com[new_k] = ca.items.pop(k)
    if ca is not None:
        assert len(ca.items) == 0
        ca._items = key_com  # the attribute, not the read-only property
    return d

yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
# the following will print any new CommentedMap with curly braces; it is just here
# to check that the style attribute copying works correctly - remove it from real code
yaml.default_flow_style = True
data = transform(data, fkey, fval)
yaml.dump(data, sys.stdout)
which gives:
# example YAML document
ABC: all strings are equal # but some Strings are more Equal then others
KLM: flying blue
XYZ: the end # for now
Please note:
the above tries (and succeeds) to start a comment in the original column; if that is not possible, e.g. when a transformed key or value takes more space, it is pushed further to the right.
if you have a more complex data structure, recursively walk the tree, descending into mappings and sequences. In that case it might be easier to store (key, value, comment) tuples, then pop() all the keys and reinsert the stored values (instead of rebuilding the tree).
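For plain dicts and lists, that recursive walk could look like this (a sketch only: it assumes string keys and scalar string values as in the example above, and does none of the comment-index fixup that transform() performs):

```python
def walk(data, fk, fv):
    # descend into mappings and sequences, applying fk to keys
    # and fv to scalar values
    if isinstance(data, dict):
        return {fk(k): walk(v, fk, fv) for k, v in data.items()}
    if isinstance(data, list):
        return [walk(v, fk, fv) for v in data]
    return fv(data)

result = walk({'abc': ['All Strings', {'klm': 'Flying Blue'}]},
              str.upper, str.lower)
# result == {'ABC': ['all strings', {'KLM': 'flying blue'}]}
```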

Accessing / Replacing Yaml values via python using variables

I have been trying to solve what I thought would be simple, but I can't wrap my head around getting a YAML file updated based on a variable.
What I have:
An Ansible hosts file in YAML format. This hosts file is not 100% the same all the time. It can have a dictionary of multiple image values (as one example) and I only want one of them to change.
namespace: demo1
images:
  image1:
    path: "path1"
    version: "v1"
  image2:
    path: "path2"
    version: "1.2.3"
user: "root"
A YAML file that contains the key/values for the things I want to replace. We already have a lot of configuration inside this YAML for other parts of our system, so I don't want to split off to some other config format if I can help it (INI, JSON, etc.). I would really want this to be dot notation.
schema: v1.0
hostfile:
  - path: path/to/ansible_hosts_file
    images:
      image1.version: v1.1
I am trying to find a way to load the YAML from #1, read in the key hostfile.images.[variable] to replace, and write back to the original Ansible file with the new value. I keep getting tripped up on the variable aspect, since today it can be image1.version, in the next config it is image2.path, or both at the same time.
I think your problem primarily comes from mixing key-value pairs with dotted notation, i.e.
images:
  image1.version: v1.1
instead of doing
images.image1.version: v1.1
Retrieving by dotted notation has been solved for Python and arbitrary separators (not necessarily '.') in this answer. Setting just involves providing two extra functions that take a second argument, which is the value to set, and grafting them onto CommentedMap resp. CommentedSeq.
Based on that, you need to preselect based on your key:
upd = ruamel.yaml.round_trip_load(open('update.yaml'))
# schema check here
for hostfile in upd['hostfile']:
    data = ruamel.yaml.round_trip_load(open(hostfile['path']))
    images = data['images']
    for dotted in hostfile['images']:
        val = hostfile['images'][dotted]
        images.string_set(dotted, val)
The actual string_set-ting code could look like (untested):
def mapping_string_set(self, s, val, delimiter=None, key_delim=None):
    def p(v):
        try:
            v = int(v)
        except ValueError:
            pass
        return v
    # possibly extend for primitives like float, datetime, booleans, etc.
    if delimiter is None:
        delimiter = '.'
    if key_delim is None:
        key_delim = ','
    try:
        key, rest = s.split(delimiter, 1)
    except ValueError:
        key, rest = s, None
    if key_delim in key:
        key = tuple(p(k) for k in key.split(key_delim))
    else:
        key = p(key)
    if rest is None:
        self[key] = val
        return
    self[key].string_set(rest, val, delimiter, key_delim)

ruamel.yaml.comments.CommentedMap.string_set = mapping_string_set

def sequence_string_set(self, s, val, delimiter=None, key_delim=None):
    if delimiter is None:
        delimiter = '.'
    try:
        key, rest = s.split(delimiter, 1)
    except ValueError:
        key, rest = s, None
    key = int(key)
    if rest is None:
        self[key] = val
        return
    self[key].string_set(rest, val, delimiter, key_delim)

ruamel.yaml.comments.CommentedSeq.string_set = sequence_string_set
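The core mechanics can be checked on plain dicts with a standalone helper (a hypothetical sketch named dotted_set; unlike the methods above it does no int-key conversion or key-tuple handling, and creates no ruamel.yaml round-trip types):

```python
def dotted_set(d, dotted, val, sep='.'):
    # walk every key except the last, then assign at the final key
    keys = dotted.split(sep)
    for k in keys[:-1]:
        d = d[k]
    d[keys[-1]] = val

images = {'image1': {'path': 'path1', 'version': 'v1'},
          'image2': {'path': 'path2', 'version': '1.2.3'}}
dotted_set(images, 'image1.version', 'v1.1')
# images['image1']['version'] is now 'v1.1'; image2 is untouched
```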

Loading empty dictionary when YAML file is empty (Python 3.4)

I have a YAML file that is empty, and when I load it in, I would like to load it in as an empty dictionary. For example, I have
import yaml

with open('an_empty_file.yml', 'r') as config_file:
    config = yaml.load(config_file)

print(config)
which prints:
None
It turns out that yaml.load(config_file) returns a NoneType object, which I suppose makes sense. Is there an easy way to just return an empty dictionary instead?
If it returns None, you can just use or so that config will hold an empty dictionary by default whenever yaml.load returns None:
config = yaml.load(config_file) or {}
Ultimately, what is happening here: we assign the result of yaml.load(config_file) to config, but by using or, if that result evaluates as falsy (in this case, None), {} is assigned to config instead.
Quick demo taking an empty string as config_file:
>>> import yaml
>>> config_file = ''
>>> config = yaml.load(config_file) or {}
>>> print(config)
{}
At the top level a YAML file can have:
a mapping, indicated by key-value pairs separated by ": " (optionally in flow style using {}),
a sequence, indicated by "- " if it is block style and [ ] if it is flow style,
a (single) scalar.
Your file is not a mapping or a sequence, so it is a scalar, and a scalar with an empty representation is considered to be the same as specifying
null
in the file.
To load this as an empty dictionary and not as None, you can do:
with open('an_empty_file.yml', 'r') as config_file:
    config = yaml.load(config_file)
    config = {} if config is None else config
    print(config)
You should never try to take the shortcut of doing:
config = yaml.load(config_file) or {}
as this would also cause files with the single scalar 0:
0
with a single scalar float 0.0:
0.0
with a hex scalar 0x0:
0x0
with an empty double-quoted scalar "":
""
with an empty single-quoted scalar '':
''
with an empty folded scalar:
>
with an empty literal-style scalar:
|
with the single boolean False:
False
or no ¹:
no
or off ¹:
off
as well as the empty sequence:
[
]
all to result in config being an empty dictionary.
The number of different file contents that would be incorrectly changed to empty dictionaries is endless.
¹ This is a result of PyYAML never having been updated from the 1.1 standard to the 1.2 standard published in 2009. If it had been, it would also convert octals of the form 0o0.
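The difference is easy to demonstrate (using yaml.safe_load here; the falsy-but-valid documents come from the list above):

```python
import yaml

docs = ["0", "0.0", '""', "[]", "no"]   # all valid YAML, all load to falsy values

shortcut = [yaml.safe_load(d) or {} for d in docs]

explicit = []
for d in docs:
    v = yaml.safe_load(d)
    explicit.append({} if v is None else v)

# shortcut silently replaces every document with {};
# explicit keeps 0, 0.0, '', [] and False intact
```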
