Loading a YAML-file throws an error in Python - python

I am reading in some YAML-files like this:
with open(pathtoyamlfile) as fd:
    data = yaml.safe_load(fd)
When doing so I get the following error:
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:value'
Checking the line of the YAML file that the error message points to, I noticed that it always contains this key-value pair: simple: =.
Since the YAML files are autogenerated, I am not sure whether I can change the files themselves. Is there a way to read the data from these YAML files nonetheless?

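It looks like you have hit this bug: PyYAML's resolver maps a plain = scalar to the tag tag:yaml.org,2002:value, but SafeLoader has no constructor registered for that tag. There is a workaround suggested in the comments.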
Given this content in example.yaml:
example: =
This code fails as you've described in your question:
import yaml
with open('example.yaml') as fd:
    data = yaml.safe_load(fd)
print(data)
But this works:
import yaml
yaml.SafeLoader.yaml_implicit_resolvers.pop('=')
with open('example.yaml') as fd:
    data = yaml.safe_load(fd)
print(data)
And outputs:
{'example': '='}
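Note that popping '=' from yaml.SafeLoader.yaml_implicit_resolvers mutates a dictionary that, in the PyYAML versions I know, SafeLoader shares with Loader and FullLoader (all of them inherit it from Resolver), so the change affects the whole process. A minimal sketch that confines the workaround to a dedicated loader subclass follows. The class name NoValueSafeLoader is hypothetical, and the approach relies on yaml_implicit_resolvers being a plain class-level dict, which is a PyYAML implementation detail rather than documented API.

```python
import yaml


class NoValueSafeLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader variant that does not resolve a plain '=' to the value tag."""


# Give the subclass its own copy of the resolver table (lists copied too),
# so yaml.SafeLoader and the other loaders keep their default behaviour.
NoValueSafeLoader.yaml_implicit_resolvers = {
    first_char: list(resolvers)
    for first_char, resolvers in yaml.SafeLoader.yaml_implicit_resolvers.items()
}
# Drop the resolver that maps '=' to tag:yaml.org,2002:value.
NoValueSafeLoader.yaml_implicit_resolvers.pop('=', None)

data = yaml.load("example: =\n", Loader=NoValueSafeLoader)
print(data)  # {'example': '='}
```

For a file, pass the open file object instead of the string, e.g. yaml.load(fd, Loader=NoValueSafeLoader).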

If you cannot change the input, you might be able to upgrade the library you use to ruamel.yaml, whose default (round-trip) loader accepts this input:
import sys
import ruamel.yaml
yaml_str = """\
example: =
"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
for key, value in data.items():
    print(f'{key}: {value}')
which gives:
example: =
Please be aware that ruamel.yaml still has this bug in its safe-mode loading (YAML(typ='safe')).

Related

How do I generate YAML containing local tags with ruamel.yaml?

I'm using ruamel.yaml to generate a YAML file that will be read by Tavern, which requires the file to contain a list like this:
includes:
- !include vars.yaml
Attempting to use any of the usual approaches to dump the data as strings results in single quotes being added around the tags, which doesn't work when the YAML is ingested by the next tool.
How do I generate a YAML file that contains unquoted local tags, starting with data that is defined in a dictionary?
I was able to create a YAML file with the required format using the following approach, based on prior examples. My approach is more flexible because it allows the tag handle to be an instance property rather than a class property, so you don't need to define a different class for every tag handle.
import sys
from ruamel.yaml import YAML
yaml = YAML(typ='rt')
class TaggedString:
    def __init__(self, handle, value):
        self.handle = handle
        self.value = value

    @classmethod
    def to_yaml(cls, representer, node):
        # I don't understand the arguments to the following function!
        return representer.represent_scalar(u'{.handle}'.format(node),
                                            u'{.value}'.format(node))
yaml.register_class(TaggedString)
data = {
'includes': [
TaggedString('!include', 'vars.yaml'),
TaggedString('!exclude', 'dummy.yaml')
]
}
yaml.dump(data, sys.stdout)
Output:
includes:
- !include vars.yaml
- !exclude dummy.yaml
I am not sure if this is the best approach. I might be missing a simpler way to achieve the same result. Note that my goal is not to dump a Python class; I'm just doing that as a way to get the tag to be written correctly.
I am not sure if this is a better approach, but if you had tried to round-trip your required output, you
would have seen that ruamel.yaml can actually preserve your tagged strings without you having to
do anything. Inspecting the Python data structure, you'll notice that ruamel.yaml does
this by creating a TaggedScalar (as you cannot attach attributes to the built-in string type).
import sys
import ruamel.yaml
yaml_str = """\
includes:
- !include vars.yaml
- !exclude dummy.yaml
"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
incl = data['includes'][0]
print(type(incl))
which gives:
includes:
- !include vars.yaml
- !exclude dummy.yaml
<class 'ruamel.yaml.comments.TaggedScalar'>
After inspecting comments.py (and possibly constructor.py), you should be able
to create ruamel.yaml's internal data structure on the fly:
import sys
import ruamel.yaml
from ruamel.yaml.comments import TaggedScalar
def tagged_string(tag, val):
    # starting with ruamel.yaml>0.16.5 you can replace the following lines with:
    # return TaggedScalar(value=val, tag=tag)
    ret_val = TaggedScalar()
    ret_val.value = val
    ret_val.yaml_set_tag(tag)
    return ret_val
yaml = ruamel.yaml.YAML()
data = dict(includes=[tagged_string('!include', 'vars.yaml'),
tagged_string('!include', 'vars.yaml'),
])
yaml.dump(data, sys.stdout)
which also gives:
includes:
- !include vars.yaml
- !include vars.yaml

How can I edit a yaml file while preserving the format using a python script? [duplicate]

Here is a config file. I use PyYAML to change a value in it and then write the config back out, but that changes my formatting, which confuses me.
$ results.yaml
nas:
    mount_dir: '/nvr'
    mount_dirs: ['/mount/data0', '/mount/data1', '/mount/data2']
# yaml.py
import yaml
with open("results.conf", "r") as conf:
    results = yaml.safe_load(conf)
results['nas']['mount_dirs'][0] = "haha"
with open('/home/zonion/speedio/speedio.conf', 'w') as conf:
    yaml.dump(results, conf, default_flow_style=False)
But this changes my formatting. What should I do?
# cat results.conf
nas:
  mount_dir: /nvr
  mount_dirs:
  - haha
  - /mount/data1
  - /mount/data2
If you use ruamel.yaml ¹, you can achieve this relatively easily by combining this and this answer here on StackOverflow.
By default ruamel.yaml normalizes to an indent of 2, and drops superfluous quotes. As you don't seem to want that, you have to either explicitly set the indent, or have ruamel.yaml analyse the input, and tell it to preserve quotes:
import sys
import ruamel.yaml
import ruamel.yaml.util
yaml_str = """\
nas:
    mount_dir: '/nvr'
    mount_dirs: ['/mount/data0', '/mount/data1', '/mount/data2']
"""
result, indent, block_seq_indent = ruamel.yaml.util.load_yaml_guess_indent(
    yaml_str, preserve_quotes=True)
result['nas']['mount_dirs'][0] = "haha"
ruamel.yaml.round_trip_dump(result, sys.stdout, indent=indent,
                            block_seq_indent=block_seq_indent)
Instead of the load_yaml_guess_indent() invocation you can do:
result = ruamel.yaml.round_trip_load(yaml_str, preserve_quotes=True)
indent = 4
block_seq_indent = None
If you want haha to be (single) quoted in the output make it a SingleQuotedScalarString:
result['nas']['mount_dirs'][0] = \
    ruamel.yaml.scalarstring.SingleQuotedScalarString("haha")
With that, the output will be:
nas:
    mount_dir: '/nvr'
    mount_dirs: ['haha', '/mount/data1', '/mount/data2']
(Given that your short example input has no block-style sequences, block_seq_indent cannot be determined and will be None.)
When using the newer API, you have control over the indentation of mappings and sequences separately:
yaml = ruamel.yaml.YAML()
yaml.indent(mapping=4, sequence=6, offset=3) # not that that looks nice
data = yaml.load(some_stream)
yaml.dump(data, some_stream)
This will make your YAML formatted consistently if it wasn't so to begin with, and make no further changes after the first round-trip.
¹ Disclaimer: I am the author of that package.
Unfortunately, ruamel.yaml does not completely preserve the original formatting; quoting its docs:
Although individual indentation of lines is not preserved, you can
specify separate indentation levels for mappings and sequences
(counting for sequences does not include the dash for a sequence
element) and specific offset of block sequence dashes within that
indentation.
I do not know of any Python library that does.
When I need to change a YAML file without touching its formatting, I reluctantly use regular expressions (reluctantly, as it's almost as bad as parsing XHTML with them).
Please feel free to suggest a better solution if you know any, I would gladly learn about it!
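As an illustration of the regexp route, here is a sketch that changes only the first entry of the mount_dirs flow list and copies everything else byte for byte. The pattern is hypothetical and tied to this exact layout (a single-line flow sequence of single-quoted items); it is not a YAML parser and will silently miss other layouts, which is why the substitution count is checked.

```python
import re

yaml_text = """\
nas:
    mount_dir: '/nvr'
    mount_dirs: ['/mount/data0', '/mount/data1', '/mount/data2']
"""

# Group 1 keeps everything up to the first single-quoted item verbatim;
# only that quoted item is replaced.
pattern = re.compile(r"^([ \t]*mount_dirs:[ \t]*\[[ \t]*)'[^']*'", re.MULTILINE)
new_text, count = pattern.subn(r"\1'haha'", yaml_text, count=1)

if count != 1:
    raise ValueError("mount_dirs line not found in the expected layout")
print(new_text)
```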
ruamel.yaml implements a round-trip loader and dumper; try:
import ruamel.yaml
conf = open("results.conf", "r")
results = ruamel.yaml.load(conf, ruamel.yaml.RoundTripLoader)
conf.close()
results['nas']['mount_dirs'][0] = "haha"
with open('/home/zonion/speedio/speedio.conf', 'w') as conf:
    ruamel.yaml.dump(results, conf, ruamel.yaml.RoundTripDumper)
Try to load it first and then dump it like this:
import ruamel.yaml
yaml_str = """\
nas:
    mount_dir: '/nvr'
    mount_dirs: ['/mount/data0', '/mount/data1', '/mount/data2']"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
with open("test.yaml", 'w') as outfile:
    yaml.dump(data, outfile)

Python Json Config 'Extended Interpolation'

I am currently using the Python library configparser:
from configparser import ConfigParser, ExtendedInterpolation
I find ExtendedInterpolation very useful because it avoids the risk of having to re-enter constants in multiple places.
I now have a requirement to use a Json document as the basis of the configuration as it provides more structure.
import json
from collections import OrderedDict
def get_json_config(file):
    """Load Json into OrderedDict from file"""
    with open(file) as json_data:
        d = json.load(json_data, object_pairs_hook=OrderedDict)
    return d
Does anyone have suggestions on the best way to implement configparser-style ExtendedInterpolation?
For example, if a node in the JSON contains the value ${home_dir}/lumberjack, it should resolve to the value of the root node home_dir followed by /lumberjack.
Try using string.Template, though I'm not sure it covers everything you need. There may also be a package that does this. Below is what I would do.
config.json
{
    "home_dir": "/home/joey",
    "class_path": "/user/local/bin",
    "dir_one": "${home_dir}/dir_one",
    "dir_two": "${home_dir}/dir_two",
    "sep_path_list": [
        "${class_path}/python",
        "${class_path}/ruby",
        "${class_path}/php"
    ]
}
python code:
import json
from string import Template
with open("config.json", "r") as config_file:
    config_content = config_file.read()
    config_template = Template(config_content)
    mid_json = json.loads(config_content)
    config = config_template.safe_substitute(mid_json)
    print(config)
This substitutes the keys defined in the JSON file.
This came in very useful; however, I found that "config" is a (unicode) string rather than a parsed object. I resolved that with:
# parse the substituted text back into Python objects
config = json.loads(config)
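Because the substitution above works on the raw JSON text, a substituted value containing a double quote or a backslash (a Windows path, say) could produce invalid JSON, and the result still has to be parsed a second time. A variant worth considering substitutes on the parsed values instead. It is a sketch, not configparser's algorithm: it assumes the top level of the document is an object, and it resolves references only to top-level string keys in a single pass (a ${...} inside the value being substituted is not expanded again). The function names load_interpolated_json and _interpolate are hypothetical.

```python
import json
from string import Template


def _interpolate(node, mapping):
    """Recursively apply ${name} substitution to every string in node."""
    if isinstance(node, str):
        return Template(node).safe_substitute(mapping)
    if isinstance(node, list):
        return [_interpolate(item, mapping) for item in node]
    if isinstance(node, dict):
        return {key: _interpolate(value, mapping) for key, value in node.items()}
    return node  # numbers, booleans and None are left as-is


def load_interpolated_json(text):
    """Parse JSON text, then resolve ${key} using the top-level string values."""
    data = json.loads(text)
    mapping = {key: value for key, value in data.items() if isinstance(value, str)}
    return _interpolate(data, mapping)


config_text = """{
    "home_dir": "/home/joey",
    "class_path": "/user/local/bin",
    "dir_one": "${home_dir}/dir_one",
    "sep_path_list": ["${class_path}/python", "${class_path}/ruby"]
}"""
config = load_interpolated_json(config_text)
print(config["dir_one"])  # /home/joey/dir_one
```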

parsing API with Python - how to handle JSON with BOM

I'm using Python 2.7.11 on Windows to get JSON data from an API (data on trees in Warsaw, Poland, but never mind that). I want to generate an output CSV file with all the data provided by the API, for further analysis. I started with a script I used for another project (also discussed here on Stack Overflow and corrected for me by @Martin Taylor). That script didn't work, so I tried to modify it using my very basic understanding, googling around and using the pdb debugger. At the moment, the result looks like this:
import pdb
import json
import urllib2
import csv
pdb.set_trace()
url = "https://api.um.warszawa.pl/api/action/datastore_search/?resource_id=ed6217dd-c8d0-4f7b-8bed-3b7eb81a95ba"
myfile = 'C:/dane/drzewa.csv'
csv_myfile = csv.writer(open(myfile, 'wb'))
cols = ['numer_adres', 'stan_zdrowia', 'y_wgs84', 'dzielnica', 'adres', 'lokalizacja', 'wiek_w_dni', 'srednica_k', 'pnie_obwod', 'miasto', 'jednostka', 'x_pl2000', 'wysokosc', 'y_pl2000', 'numer_inw', 'x_wgs84', '_id', 'gatunek_1', 'gatunek', 'data_wyk_pom']
csv_myfile.writerow(cols)
def api_iterate(myfile):
    while True:
        global url
        print url
        json_page = urllib2.urlopen(url)
        data = json.load(json_page)
        json_page.close()
        for data_object in data['result']['records']:
            csv_myfile.writerow([data_object[col] for col in cols])
        try:
            url = data['_links']['next']
        except KeyError as e:
            break
with open(myfile, 'wb'):
    api_iterate(myfile)
I'm a very new Python user, so I get confused all the time. I have now reached the point where, while reading the objects in the JSON dictionary, I get a KeyError for the 'x_wgs84' element. I suppose it has something to do with the fact that in the source URL this element is preceded by a U+FEFF Unicode character. I tried to work around this but got stuck and would appreciate assistance.
I suspect the code may be broken in several other ways; as I mentioned, I'm (still) a very unskilled programmer.
You need to use the key including the Unicode character (U+FEFF).
One easy way to find out what the key looks like is to print the keys:
>>> import requests
>>> res = requests.get('https://api.um.warszawa.pl/api/action/datastore_search/?resource_id=ed6217dd-c8d0-4f7b-8bed-3b7eb81a95ba')
>>> data = res.json()
>>> records = data['result']['records']
>>> records[0]
{u'numer_adres': u'', u'stan_zdrowia': u'dobry', u'y_wgs84': u'52.21865', u'y_pl2000': u'5787241.04475524', u'adres': u'ul. ALPEJSKA', u'x_pl2000': u'7511793.96937063', u'lokalizacja': u'Ulica ALPEJSKA', u'wiek_w_dni': u'60', u'miasto': u'Warszawa', u'jednostka': u'Dzielnica Wawer', u'pnie_obwod': u'73', u'wysokosc': u'14', u'data_wyk_pom': u'20130709', u'dzielnica': u'Wawer', u'\ufeffx_wgs84': u'21.172584', u'numer_inw': u'D386200', u'_id': 125435, u'gatunek_1': u'Quercus robur', u'gatunek': u'd\u0105b szypu\u0142kowy', u'srednica_k': u'7'}
>>> records[0].keys()
[u'numer_adres', u'stan_zdrowia', u'y_wgs84', u'y_pl2000', u'adres', u'x_pl2000', u'lokalizacja', u'wiek_w_dni', u'miasto', u'jednostka', u'pnie_obwod', u'wysokosc', u'data_wyk_pom', u'dzielnica', u'\ufeffx_wgs84', u'numer_inw', u'_id', u'gatunek_1', u'gatunek', u'srednica_k']
>>> records[0][u'\ufeffx_wgs84']
u'21.172584'
As you can see, to get your key you need to write it as u'\ufeffx_wgs84', including the invisible character that is causing the trouble.
Note: I don't know whether you are using Python 2 or 3, but in Python 2 you need to put a u before the string literal to declare it as a Unicode string; otherwise \ufeff is not treated as an escape sequence.
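If you would rather not spell out the BOM in every lookup, another option is to strip a leading U+FEFF from the keys once, right after reading each record. A minimal sketch, written for Python 3 (in Python 2.7 the literal would need to be u'\ufeff'); the helper name strip_bom_keys is hypothetical and the sample record is trimmed from the output above.

```python
def strip_bom_keys(record):
    """Return a copy of record whose keys have any leading U+FEFF removed."""
    return {key.lstrip('\ufeff'): value for key, value in record.items()}


record = {'\ufeffx_wgs84': '21.172584', 'y_wgs84': '52.21865', '_id': 125435}
clean = strip_bom_keys(record)
print(clean['x_wgs84'])  # 21.172584
```

In the question's loop you would call it once per data_object and then build the CSV row from the cleaned dict, so the cols list can keep the plain name 'x_wgs84'.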
