How to separate yaml.dump key:value pair by a new line? - python

I am trying to make yaml dump each key:value pair on a separate line. Is there a native option to do that? I have tried line_break but couldn't get it to work.
Here is a code example:
import yaml

def test_yaml_dump():
    obj = {'key0': 1, 'key1': 2}
    with open('test.yaml', 'w') as tmpf:
        yaml.dump(obj, tmpf, line_break=0)
The output is:
{key0: 1, key1: 2}
I want it to be:
{key0: 1,
key1: 2}

If you add the argument default_flow_style=False to dump then the output will be:
key1: 2
key0: 1
(the so-called block style). That is a much more readable way of dumping Python dicts to YAML mappings. In ruamel.yaml this is the default when using ruamel.yaml.round_trip_dump().
import sys
import ruamel.yaml as yaml
obj = dict(key0=1, key1=2)
yaml.round_trip_dump(obj, sys.stdout)
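For comparison, a minimal PyYAML sketch of the same idea (PyYAML sorts keys by default, hence the order below):

```python
import yaml

obj = {'key0': 1, 'key1': 2}
# default_flow_style=False forces block style: one key/value pair per line
text = yaml.dump(obj, default_flow_style=False)
print(text)
```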

Related

Can we remove extra space in yaml after colon in PyYaml

I have a dictionary which looks like:
{'ab':8082 , 'bc': 8082}
When I dump it to python yaml, I want it to look like:
ab:8082
and not like:
ab: 8082
Is there a way we can achieve it ?
Your output is not valid YAML, as that requires a space after the colon in block style.
So what I recommend is post-processing the output using ruamel.yaml's transform argument to dump:
import sys
import ruamel.yaml

data = {'ab': 8082, 'bc': 8082}

def remove_space_after_colon(s):
    res = []
    for line in s.splitlines(True):
        res.append(line.replace(': ', ':', 1))  # 1, to prevent replacing in values
    return ''.join(res)

yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout, transform=remove_space_after_colon)
which gives:
ab:8082
bc:8082

Preserving order of dictionary while using ruamel.yaml

I am using ruamel.yaml for dumping a dict to a yaml file. While doing so, I want to keep the order of the dictionary. That is how I came across the question Keep YAML file order with ruamel. But this solution is not working in my case:
The order is not preserved, and tags like !!python/object/apply:ruamel.yaml.comments.CommentedMap or dictitems are added.
import os
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap as ordereddict

generated_file = os.path.join('data_TEST.yaml')
data_dict = {'Sources': {'coil': None}, 'Magnet': 'ABC', 'Current': ordereddict({'heat': {'i': [[]], 'h': None}})}
data_dict = ordereddict(data_dict)
with open(generated_file, 'w') as yaml_file:
    ruamel.yaml.dump(data_dict, yaml_file, default_flow_style=False)
The used dictionary is just an arbitrary one and in the end an automatically created array that could look different is going to be used. So, we cannot hard-code the mapping of the dictionaries in the dictionary like in my example.
Result:
!!python/object/apply:ruamel.yaml.comments.CommentedMap
dictitems:
  Current: !!python/object/apply:ruamel.yaml.comments.CommentedMap
    dictitems:
      heat:
        h: null
        i:
        - []
  Magnet: ABC
  Sources:
    coil: null
Desired result:
Sources:
  coil: null
Magnet: ABC
Current:
  heat:
    h: null
    i:
    - []
You should really not be using the old PyYAML API that sorts keys when dumping.
Instantiate a YAML instance and use its dump method:
yaml = ruamel.yaml.YAML()
yaml.dump(data, stream)
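If you'd rather keep plain PyYAML, note that since PyYAML 5.1 dump() also accepts sort_keys=False, which preserves the dict's insertion order; a minimal sketch:

```python
import yaml

data_dict = {'Sources': {'coil': None}, 'Magnet': 'ABC',
             'Current': {'heat': {'i': [[]], 'h': None}}}
# sort_keys=False keeps the dict's insertion order (PyYAML 5.1+)
text = yaml.dump(data_dict, default_flow_style=False, sort_keys=False)
print(text)
```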

Can you append to a dictionary from a foreign python file?

So I have a project I'm working on for fun but it requires me to append to a dictionary from another python file. In file1.py it will look like
Name: Eric <-- user input
Age: 27 <-- user input
and file2.py,
information = {'Eric':27}
I know that I can temporarily append to a dictionary while running the code, but it seems to reset after I close the program. Like recently I've seen this on a StackOverflow question
d = {'key': 'value'}
print(d)
# {'key': 'value'}
d['mynewkey'] = 'mynewvalue'
print(d)
# {'key': 'value', 'mynewkey': 'mynewvalue'}
But this too, resets after every run so I thought that the only way to save the dictionary is to write it to another file. Is there any way that I can achieve this or maybe a better alternative?
You can use JSON to save data to a file.
This will save the data, that is stored in your dictionary, in a file.
import json

my_dict = {"key": "value", "key2": "value2"}
with open("output_file.txt", "w") as file:
    json.dump(my_dict, file, indent=4)
To use that data again, you can load that file.
import json

with open("output_file.txt") as file:
    my_dict = json.load(file)
print(my_dict)  # Will print {'key': 'value', 'key2': 'value2'}
JSON stands for JavaScript Object Notation, and it's a way to save data in a string format (a file).
So json can convert a string into data, if it is valid JSON:
import json
string_data = '{"key": "value"}'
dictionary = json.loads(string_data)
print(type(string_data)) # <class 'str'>
print(type(dictionary)) # <class 'dict'>
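Putting the two halves together, a sketch of the full load-update-save cycle (the file name information.json is just an example):

```python
import json
import os

PATH = 'information.json'

def load_information(path=PATH):
    # Return the stored dict, or an empty one on the first run
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)

def save_information(d, path=PATH):
    with open(path, 'w') as f:
        json.dump(d, f, indent=4)

information = load_information()
information['Eric'] = 27  # e.g. from user input
save_information(information)
```

Each run now starts from whatever the previous run saved, so the dictionary no longer resets.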

python yaml load formatted string

I'm trying to load yaml that contains python formatted strings, e.g. test: {formatted_string}. This would allow me to format the string using dictionary["test"].format(formatted_string="hello yaml"), but when I load the yaml, it's automatically converted to {'test': {'formatted_string': None}} instead of {'test': '{formatted_string}'}.
There are dozens of .yaml files that are already formatted in this way.
I don't see this in the pyyaml docs or anywhere on SO.
Code in full for clarity:
import yaml
data = """
test: {formatted_string}
"""
d1 = yaml.load(data)
print(d1)
# {'test': {'formatted_string': None}}
d2 = {"test": "{formatted_string}"}
print(d2)
# {'test': '{formatted_string}'}
d2["test"] = d2["test"].format(formatted_string="hello yaml")
print(d2)
# {'test': 'hello yaml'}
Thanks!
The { character in YAML (as in JSON) introduces a dictionary. That is this:
a_dictionary:
  key1: value1
  key2: value2
Is completely equivalent to:
a_dictionary: {key1: value1, key2: value2}
So when you write...
test: {formatted_string}
...the YAML parser thinks you are introducing a dictionary, and that it has a single key (formatted_string) and no value. If you want to use a { as part of a YAML value you need to quote it:
test: "{formatted_string}"
Compare:
>>> yaml.safe_load('test: {formatted_string}')
{'test': {'formatted_string': None}}
>>> yaml.safe_load('test: "{formatted_string}"')
{'test': '{formatted_string}'}
In general, if you always quote your YAML strings your life will be easier :).
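To close the loop on the original goal, a short sketch showing that the quoted value survives loading and can then be formatted:

```python
import yaml

# With quotes, the braces are part of a plain string, not a flow mapping
d = yaml.safe_load('test: "{formatted_string}"')
result = d['test'].format(formatted_string='hello yaml')
print(result)
```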

pythonic way of iterating over a collection of json objects stored in a text file

I have a text file that has several thousand json objects (meaning the textual representation of json) one after the other. They're not separated and I would prefer not to modify the source file. How can I load/parse each json in python? (I have seen this question, but if I'm not mistaken, this only works for a list of jsons (already separated by a comma?) My file looks like this:
{"json":1}{"json":2}{"json":3}{"json":4}{"json":5}...
I don't see a clean way to do this without using the real JSON parser. The other options of modifying the text and using a non-JSON parser are risky. So the best way to go is to find a way to iterate using the real JSON parser, so that you're sure to comply with the JSON spec.
The core idea is to let the real JSON parser do all the work in identifying the groups:
import json
import re

combined = '{"json":1}{"json":2}{"json":3}{"json":4}{"json":5}'

start = 0
while start != len(combined):
    try:
        json.loads(combined[start:])
    except ValueError as e:
        # The "Extra data" error reports the column where parsing failed,
        # which is where the next object begins
        end = start + int(re.search(r'column (\d+)', e.args[0]).group(1)) - 1
    else:
        end = len(combined)  # the remainder is a single valid object
    result = json.loads(combined[start:end])
    start = end
    print(result)
This outputs:
{'json': 1}
{'json': 2}
{'json': 3}
{'json': 4}
{'json': 5}
I think the following would work as long as there are no non-comma-delimited json arrays of json sub-objects inside any of the outermost json objects. It's somewhat brute-force in that it reads the whole file into memory and attempts to fix it.
import json

def get_json_array(filename):
    with open(filename, 'rt') as jsonfile:
        json_array = '[{}]'.format(jsonfile.read().replace('}{', '},{'))
    return json.loads(json_array)

for obj in get_json_array('multiobj.json'):
    print(obj)
Output:
{'json': 1}
{'json': 2}
{'json': 3}
{'json': 4}
{'json': 5}
Instead of modifying the source file, just make a copy. Use a regex to replace }{ with },{ and then hopefully a pre-built json reader will take care of it nicely.
EDIT: quick solution:
from re import sub

with open(inputfile, 'r') as fin:
    text = sub(r'}{', r'},{', fin.read())
with open(outfile, 'w') as fout:
    fout.write('[')
    fout.write(text)
    fout.write(']')
>>> import ast
>>> s = '{"json":1}{"json":2}{"json":3}{"json":4}{"json":5}'
>>> [ast.literal_eval(ele + '}') for ele in s.split('}')[:-1]]
[{'json': 1}, {'json': 2}, {'json': 3}, {'json': 4}, {'json': 5}]
Provided you have no nested objects and splitting on '}' is feasible this can be accomplished pretty simply.
Here is one pythonic way to do it:
from json.scanner import make_scanner
from json import JSONDecoder

def load_jsons(multi_json_str):
    s = multi_json_str.strip()
    scanner = make_scanner(JSONDecoder())
    idx = 0
    objects = []
    while idx < len(s):
        obj, idx = scanner(s, idx)
        objects.append(obj)
    return objects
I think json was never supposed to be used this way, but it solves your problem.
I agree with @Raymond Hettinger: you need to use json itself to do the work; text manipulation doesn't work for complex JSON objects. His answer parses the exception message to find the split position. It works, but it looks like a hack, hence, not pythonic :)
EDIT:
Just found out this is actually supported by json module, just use raw_decode like this:
from json import JSONDecoder

decoder = JSONDecoder()
first_obj, end_index = decoder.raw_decode(multi_json_str)
# end_index is where parsing stopped, i.e. where the next object starts
Read http://pymotw.com/2/json/index.html#mixed-data-streams
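Building on raw_decode, here is a sketch of a small generator (the name iter_json_objects is my own) that walks an entire string of concatenated objects:

```python
from json import JSONDecoder

def iter_json_objects(s):
    decoder = JSONDecoder()
    idx = 0
    while idx < len(s):
        # raw_decode returns (object, index where parsing stopped)
        obj, idx = decoder.raw_decode(s, idx)
        # Skip any whitespace between concatenated objects
        while idx < len(s) and s[idx].isspace():
            idx += 1
        yield obj

objects = list(iter_json_objects('{"json":1}{"json":2}{"json":3}'))
print(objects)
```

Unlike the string-replacement approaches, this handles nested objects and braces inside string values correctly, because the real parser decides where each object ends.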
