Can we remove extra space in yaml after colon in PyYaml

I have a dictionary which looks like:
{'ab':8082 , 'bc': 8082}
When I dump it to YAML with Python, I want it to look like:
ab:8082
and not like:
ab: 8082
Is there a way to achieve this?

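Your desired output is not valid YAML for a key-value pair, as block style requires a space after the colon (a parser would read ab:8082 as a plain string rather than as a key with a value).
So what I recommend is post-processing the output using the transform argument of ruamel.yaml's dump: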
import sys
import ruamel.yaml
data = {'ab':8082 , 'bc': 8082}
def remove_space_after_colon(s):
    res = []
    for line in s.splitlines(True):
        res.append(line.replace(': ', ':', 1))  # 1, to prevent replacing in values
    return ''.join(res)
yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout, transform=remove_space_after_colon)
which gives:
ab:8082
bc:8082
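To see what the count argument of 1 in line.replace protects against, here is a minimal, library-free sketch that applies the same transform to a hand-written string. The sample text (including the note key) is made up for illustration; it only stands in for what a dump might produce.

```python
def remove_space_after_colon(s):
    # same logic as the transform above: only the first ': ' on each line is replaced
    res = []
    for line in s.splitlines(True):
        res.append(line.replace(': ', ':', 1))
    return ''.join(res)

# hypothetical dumped text; the second value itself contains ': '
dumped = "ab: 8082\nnote: 'host: port'\n"
result = remove_space_after_colon(dumped)
print(result, end='')
```

The space inside the quoted value survives because only the first occurrence per line is touched. A key that itself contains ': ' would still be affected, so this remains a text-level workaround rather than a general solution.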

Related

remove double quotes around dictionary object - python

I have a dictionary that I am using to populate a YAML config file for each key.
{'id': ['HP:000111'], 'id1': ['HP:000111'], 'id2': ['HP:0001111', 'HP:0001123']}
This is the code I use to insert each key's value into the YAML template using ruamel.yaml:
import ruamel.yaml
import sys
yaml = ruamel.yaml.YAML()
with open('yaml.yml') as fp:
    data = yaml.load(fp)
for k in start.keys():
    data['analysis']['hpoIds'] = start.get(k)
    with open(f"path/yaml-{k}.yml","w+") as f:
        yaml.dump(data, sys.stdout)
This is the output I am getting:
analysis:
  # hg19 or hg38 - ensure that the application has been configured to run the specified assembly otherwise it will halt.
  genomeAssembly: hg38
  vcf:
  ped:
  proband:
  hpoIds: "['HP:000111','HP:000112','HP:000113']"
But this is what I need:
hpoIds: ['HP:000111','HP:000112','HP:000113']
I've tried using string tools, i.e. strip and replace, but that didn't work.
Output from ast.literal_eval:
hpoIds:
- HP:000111
- HP:000112
- HP:000113
Output from repr:
hpoIds: "\"['HP:000111','HP: 000112','HP:000113']\""
Any help would be greatly appreciated.

It is not entirely clear to me what you are trying to do, or why you, for example, open files
with 'w+' for dumping.
However, if you have something that comes out block style and unquoted, that can easily be remedied
by using a small function:
import sys
from pathlib import Path
import ruamel.yaml
SQ = ruamel.yaml.scalarstring.SingleQuotedScalarString
def flow_seq_single_quoted(lst):
    res = ruamel.yaml.CommentedSeq([SQ(x) if isinstance(x, str) else x for x in lst])
    res.fa.set_flow_style()
    return res
in_file = Path('yaml.yaml')
in_file.write_text("""\
hpoIds:
- HP:000111
- HP:000112
- HP:000113
""")
yaml = ruamel.yaml.YAML()
data = yaml.load(in_file)
data['hpoIds'] = flow_seq_single_quoted(data['hpoIds'])
yaml.dump(data, sys.stdout)
which gives:
hpoIds: ['HP:000111', 'HP:000112', 'HP:000113']
The recommended extension for YAML files has been .yaml since at least September 2006.
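The quoted output in the question suggests that the value being assigned is a string holding the repr of a list, rather than an actual list. Assuming that is the case, a minimal sketch of turning it into a real list first could look like this (the names raw and parsed are hypothetical, and the sample value is modelled on the question's output):

```python
import ast

# hypothetical value, modelled on the quoted output shown in the question
raw = "['HP:000111','HP:000112','HP:000113']"

# literal_eval safely evaluates a Python literal; skip it if the value is already a list
parsed = ast.literal_eval(raw) if isinstance(raw, str) else raw

print(type(raw).__name__, '->', type(parsed).__name__)
print(parsed)
```

The resulting list is the kind of input a helper such as flow_seq_single_quoted expects; passing the string itself would dump it as a quoted scalar again.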

Get wrong values by parsing YAML

I'm somewhat confused by yaml parsing results. I made a test.yaml and the results are the same.
val_1: 05334000
val_2: 2345784
val_3: 0537380
str_1: foobar
val_4: 05798
val_5: 051342123
Parsing that with:
import yaml
with open('test.yaml', 'r', encoding='utf8') as f:
    a = yaml.load(f, Loader=yaml.FullLoader)
returns:
{'val_1': 1423360,
'val_2': 2345784,
'val_3': '0537380',
'str_1': 'foobar',
'val_4': '05798',
'val_5': 10863699}
Why do val_1 and val_5 get these values? Is there something special about them?
In my real data, spread over many YAML files, there are values like val_1. Some are parsed correctly, but others are not. They all start with 05, followed by more digits. Because of the leading 0, I expected the results to be strings, but YAML parses some of them as something completely different.
If I read the YAML as a text file with f.readlines(), all is fine:
['val_1: 05334000\n',
'val_2: 2345784\n',
'val_3: 0537380\n',
'str_1: foobar\n',
'val_4: 05798\n',
'val_5: 051342123\n']

Integers with a leading 0 are parsed as octal; in python you'd need to write them with a leading 0o:
0o5334000 == 1423360
As for '0537380': since it contains the digit 8, it cannot be parsed as an octal number, so it remains a string. The same holds for '05798', which contains 9 and 8.
If you want to get strings for all your entries, you can use the BaseLoader:
from io import StringIO
import yaml
file = StringIO("""
val_1: 05334000
val_2: 2345784
val_3: 0537380
str_1: foobar
val_4: 05798
val_5: 051342123
""")
dct = yaml.load(file, Loader=yaml.BaseLoader)
With that I get:
{'val_1': '05334000', 'val_2': '2345784', 'val_3': '0537380',
'str_1': 'foobar', 'val_4': '05798', 'val_5': '051342123'}
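The octal rule described in this answer can be sketched without any YAML library. The regular expressions below are a simplified stand-in for the integer patterns a YAML 1.1 loader applies, not the library's actual resolver, and resolve_scalar is a hypothetical helper name.

```python
import re

# simplified assumption: optional sign, a leading 0, then octal digits only
OCTAL_RE = re.compile(r'[-+]?0[0-7]+')
# plain decimal integers: a lone 0, or digits not starting with 0
DECIMAL_RE = re.compile(r'[-+]?(0|[1-9][0-9]*)')

def resolve_scalar(text):
    """Return an int for octal or decimal looking text, else the text itself."""
    if OCTAL_RE.fullmatch(text):
        return int(text, 8)
    if DECIMAL_RE.fullmatch(text):
        return int(text)
    return text

values = ['05334000', '2345784', '0537380', 'foobar', '05798', '051342123']
resolved = {v: resolve_scalar(v) for v in values}
for key, value in resolved.items():
    print(key, '->', repr(value))
```

Values with a leading 0 that contain an 8 or a 9 match neither pattern, which is why they stay strings while their neighbours turn into unexpected integers.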

ruamel.yaml dump lists without adding new line at the end

I am trying to dump a dict object as YAML using the snippet below:
from ruamel.yaml import YAML
# YAML settings
yaml = YAML(typ="rt")
yaml.default_flow_style = False
yaml.explicit_start = False
yaml.indent(mapping=2, sequence=4, offset=2)
rip= {"rip_routes": ["23.24.10.0/15", "23.30.0.10/15", "50.73.11.0/16", "198.0.0.0/16"]}
file = 'test.yaml'
with open(file, "w") as f:
    yaml.dump(rip, f)
It dumps correctly, but I am getting a newline appended to the end of the list:
rip_routes:
  - 23.24.10.0/15
  - 23.30.0.10/15
  - 198.0.11.0/16
I don't want the newline to be inserted at the end of the file. How can I do it?

The newline is part of the representation code for block style sequence elements. And since that code
doesn't have much knowledge about context, and certainly not about representing the last element to be dumped
in a document, it is almost impossible for the final newline not to be output.
However, the .dump() method has an optional transform parameter that allows you to
run the output of the dumped text through some filter:
import sys
import pathlib
import string
import ruamel.yaml
# YAML settings
yaml = ruamel.yaml.YAML(typ="rt")
yaml.default_flow_style = False
yaml.explicit_start = False
yaml.indent(mapping=2, sequence=4, offset=2)
rip= {"rip_routes": ["23.24.10.0/15", "23.30.0.10/15", "50.73.11.0/16", "198.0.0.0/16"]}
def strip_final_newline(s):
    if not s or s[-1] != '\n':
        return s
    return s[:-1]
file = pathlib.Path('test.yaml')
yaml.dump(rip, file, transform=strip_final_newline)
print(repr(file.read_text()))
which gives:
'rip_routes:\n  - 23.24.10.0/15\n  - 23.30.0.10/15\n  - 50.73.11.0/16\n  - 198.0.0.0/16'
It is better to use Path() instances as in the code above,
especially if your YAML document is going to contain non-ASCII characters.
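The transform function itself is plain string handling, so its edge cases can be checked without ruamel.yaml. A minimal sketch, with made-up sample strings standing in for dumped text:

```python
def strip_final_newline(s):
    # same logic as the transform function in the answer above
    if not s or s[-1] != '\n':
        return s
    return s[:-1]

# hypothetical samples: normal dump, no trailing newline, empty, double newline
samples = ['a:\n  - 1\n', 'a: 1', '', 'a: 1\n\n']
results = [strip_final_newline(s) for s in samples]
print(results)
```

Only a single trailing newline is removed, so text that ends in two newlines keeps one of them; s.rstrip('\n') would strip them all, which may or may not be what you want.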

YAML: Dump Python List Without Quotes

I have a Python list, my_list that looks like this ["test1", "test2", "test3"]. I simply want to dump it to a YAML file without quotes. So the desired output is:
test_1
test_2
test_3
I've tried:
import yaml
with open("my_yaml.yaml", "w") as f:
    yaml.safe_dump(my_list, f)
Unfortunately, this includes all 3 elements on a single line and they're quoted:
'test_1', 'test_2', 'test_3'
How can I modify to get the desired output?

Try using default_style=None to avoid quotes, and default_flow_style=False to output items on separate lines:
yaml.safe_dump(my_list, f, default_style=None, default_flow_style=False)

You want to output a Python list as a multi-line plain scalar, and that
is going to be hard. Normally a list is output as a YAML sequence, which
either has dashes (-, in block style, over multiple lines) or uses
square brackets ([], in flow style, on one or more lines).
Block style with dashes:
import sys
from ruamel.yaml import YAML
data = ["test1", "test2", "test3"]
yaml = YAML()
yaml.dump(data, sys.stdout)
gives:
- test1
- test2
- test3
Flow style, on a single line:
yaml = YAML()
yaml.default_flow_style = True
yaml.dump(data, sys.stdout)
output:
[test1, test2, test3]
Flow style, made narrow:
yaml = YAML()
yaml.default_flow_style = True
yaml.width = 5
yaml.dump(data, sys.stdout)
gets you:
[test1,
test2,
test3]
This is unlikely to be what you want, as it affects the whole YAML document,
and you still get the square brackets.
One alternative is converting the list to a plain scalar. This is
actually what your desired output would be loaded as.
yaml_str = """\
test_1
test_2
test_3
"""
yaml = YAML()
x = yaml.load(yaml_str)
assert type(x) == str
assert x == 'test_1 test_2 test_3'
Loading your expected output is often a good test to see what you
need to provide.
Therefore you would have to convert your list to a multi-word
string. Once more, the problem is that in any YAML library known to me you
can only force the line breaks by setting the width of the document, and
most have a minimum width that is bigger than 4 (that can be patched,
but patching doesn't change the fact that the width applies to the whole
document).
yaml = YAML()
yaml.width = 5
s = ' '.join(data)
yaml.dump(s, sys.stdout)
result:
test1 test2
test3
...
This leaves what is IMO the best solution if you really don't want dashes: to
use a literal block style scalar (string):
from ruamel.yaml.scalarstring import PreservedScalarString
yaml = YAML()
s = PreservedScalarString('\n'.join(data) + '\n')
yaml.dump(s, sys.stdout)
In that scalar style newlines are preserved:
|
test1
test2
test3

Preserve quotes and also add data with quotes in Ruamel

I am using Ruamel to preserve quote styles in human-edited YAML files.
I have example input data as:
---
a: '1'
b: "2"
c: 3
I read in data using:
def read_file(f):
    with open(f, 'r') as _f:
        return ruamel.yaml.round_trip_load(_f.read(), preserve_quotes=True)
I then edit that data:
data = read_file('in.yaml')
data['foo'] = 'bar'
I write back to disk using:
def write_file(f, data):
    with open(f, 'w') as _f:
        _f.write(ruamel.yaml.dump(data, Dumper=ruamel.yaml.RoundTripDumper, width=1024))
write_file('out.yaml', data)
And the output file is:
a: '1'
b: "2"
c: 3
foo: bar
Is there a way I can enforce hard quoting of the string 'bar' without also enforcing that quoting style throughout the rest of the file?
(Also, can I stop it from deleting the three dashes --- ?)

In order to preserve quotes (and literal block style) for string scalars, ruamel.yaml¹—in round-trip-mode—represents these scalars as SingleQuotedScalarString, DoubleQuotedScalarString and PreservedScalarString. The class definitions for these very thin wrappers can be found in scalarstring.py.
When serializing, such instances are written "as they were read", although the representer sometimes falls back to double quotes when things get difficult, as those can represent any string.
To get this behaviour when adding new key-value pairs (or when updating an existing pair), you just have to create these instances yourself:
import sys
from ruamel.yaml import YAML
from ruamel.yaml.scalarstring import SingleQuotedScalarString, DoubleQuotedScalarString
yaml_str = """\
---
a: '1'
b: "2"
c: 3
"""
yaml = YAML()
yaml.preserve_quotes = True
yaml.explicit_start = True
data = yaml.load(yaml_str)
data['foo'] = SingleQuotedScalarString('bar')
data.yaml_add_eol_comment('# <- single quotes added', 'foo', column=20)
yaml.dump(data, sys.stdout)
gives:
---
a: '1'
b: "2"
c: 3
foo: 'bar'          # <- single quotes added
The yaml.explicit_start = True setting recreates the (superfluous) document start marker. Whether such a marker was in the original file is not "known" by the top-level dictionary object, so you have to re-add it by hand.
Please note that without preserve_quotes, there would be (single) quotes around the values 1 and 2 anyway to make sure they are seen as string scalars and not as integers.
¹ Of which I am the author.

Since Ruamel 0.15, set the preserve_quotes flag like this:
from ruamel.yaml import YAML
from pathlib import Path
yaml = YAML(typ='rt') # Round trip loading and dumping
yaml.preserve_quotes = True
data = yaml.load(Path("in.yaml"))
yaml.dump(data, Path("out.yaml"))
