I'm trying to load YAML that contains Python format strings, e.g. test: {formatted_string}. This would allow me to format the string using dictionary["test"].format(formatted_string="hello yaml"), but when I load the YAML, the value is automatically converted to {'test': {'formatted_string': None}} instead of {'test': '{formatted_string}'}.
There are dozens of .yaml files that are already formatted in this way.
I don't see this in the pyyaml docs or anywhere on SO.
Code in full for clarity:
import yaml
data = """
test: {formatted_string}
"""
d1 = yaml.safe_load(data)
print(d1)
# {'test': {'formatted_string': None}}
d2 = {"test": "{formatted_string}"}
print(d2)
# {'test': '{formatted_string}'}
d2["test"] = d2["test"].format(formatted_string="hello yaml")
print(d2)
# {'test': 'hello yaml'}
Thanks!
The { character in YAML (as in JSON) introduces a dictionary. That is, this:
a_dictionary:
  key1: value1
  key2: value2
is completely equivalent to:
a_dictionary: {key1: value1, key2: value2}
So when you write...
test: {formatted_string}
...the YAML parser thinks you are introducing a dictionary that has a single key (formatted_string) and no value. If you want to use a { as part of a YAML value, you need to quote it:
test: "{formatted_string}"
Compare:
>>> yaml.safe_load('test: {formatted_string}')
{'test': {'formatted_string': None}}
>>> yaml.safe_load('test: "{formatted_string}"')
{'test': '{formatted_string}'}
In general, if you always quote your YAML strings your life will be easier :).
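Since the question mentions dozens of existing files, one option is to add the quotes mechanically before parsing. The following is only a minimal sketch and not part of the answer above: quote_placeholders is a hypothetical helper, and it assumes that each placeholder is the whole value on its line and consists of a single identifier in braces. Genuine flow mappings such as {key1: value1} are left untouched, and list items are not handled.

```python
import re

# Matches "key: {name}" where the braces hold one bare identifier and nothing
# else follows on the line. Flow mappings such as {a: 1} contain ":" inside
# the braces and therefore do not match.
_PLACEHOLDER = re.compile(
    r'^([ \t]*[^#\s][^:\n]*:[ \t]*)(\{\w+\})[ \t]*$',
    re.MULTILINE,
)


def quote_placeholders(yaml_text: str) -> str:
    """Wrap bare {placeholder} values in double quotes."""
    return _PLACEHOLDER.sub(r'\1"\2"', yaml_text)


source = "test: {formatted_string}\nreal_map: {key1: value1}\n"
fixed = quote_placeholders(source)
print(fixed)
```

Already quoted values do not match the pattern, so running the helper twice leaves the text unchanged.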
I am using ruamel.yaml to dump a dict to a YAML file, and I want to keep the order of the dictionary. That is how I came across the question Keep YAML file order with ruamel. But that solution does not work in my case:
The order is not preserved.
Tags like !!python/object/apply:ruamel.yaml.comments.CommentedMap and keys like dictitems are added to the output.
import os
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap as ordereddict
generated_file = os.path.join('data_TEST.yaml')
data_dict = {'Sources': {'coil': None}, 'Magnet': 'ABC', 'Current': ordereddict({'heat': {'i': [[]], 'h': None, }})}
data_dict = ordereddict(data_dict)
with open(generated_file, 'w') as yaml_file:
    ruamel.yaml.dump(data_dict, yaml_file, default_flow_style=False)
The dictionary used here is arbitrary; in the end, an automatically generated one that may look different will be used. So we cannot hard-code the nesting of the dictionaries as in my example.
Result:
!!python/object/apply:ruamel.yaml.comments.CommentedMap
dictitems:
  Current: !!python/object/apply:ruamel.yaml.comments.CommentedMap
    dictitems:
      heat:
        h: null
        i:
        - []
  Magnet: ABC
  Sources:
    coil: null
Desired result:
Sources:
  coil: null
Magnet: ABC
Current:
  heat:
    h: null
    i:
    - []
You should really not be using the old PyYAML API that sorts keys when dumping.
Instantiate a YAML instance and use its dump method:
yaml = ruamel.yaml.YAML()
yaml.dump(data, stream)
I need a little help converting a string to a dict. The string is not in a common format; it is the output of a UDF.
The return from the PySpark UDF looks like the string below:
"{list=[{a=1}, {a=2}, {a=3}]}"
And I need to convert it to a python dictionary with the structure below:
{
    "list": [
        {"a": 1},
        {"a": 2},
        {"a": 3}
    ]
}
So I can access its values, like
dict["list"][1]["a"]
I already tried using:
json.loads()
ast.literal_eval()
Could someone please help me?
As an example of how this unparsed string is generated:
from pyspark.sql.functions import udf
@udf()
def execute_method():
    return {"list": [{"a": 1}, {"b": 1}, {"c": 1}]}
df_result = df_source.withColumn("result", execute_method())
At the very least, you will need to replace = with : and surround the keys with double quotes:
import json
import re
string = "{list=[{a=1}, {a=2}, {a=3}]}"
fixed_string = re.sub(r'(\w+)=', r'"\1":', string)
print(type(fixed_string), fixed_string)
parsed = json.loads(fixed_string)
print(type(parsed), parsed)
outputs
<class 'str'> {"list":[{"a":1}, {"a":2}, {"a":3}]}
<class 'dict'> {'list': [{'a': 1}, {'a': 2}, {'a': 3}]}
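The same substitution can be wrapped in a small function. This is a sketch under the assumption that every key is a bare word followed by = and that every value is a number, a list, or a nested object. The name parse_udf_string is hypothetical, and unquoted string values (e.g. {a=foo}) would still make json.loads fail.

```python
import json
import re


def parse_udf_string(raw: str) -> dict:
    """Turn a '{key=value, ...}' string into a dict by rewriting it as JSON."""
    # Quote each bare-word key and swap '=' for ':'.
    as_json = re.sub(r'(\w+)=', r'"\1":', raw)
    return json.loads(as_json)


result = parse_udf_string("{list=[{a=1}, {a=2}, {a=3}]}")
print(result["list"][1]["a"])
```

With the result stored under a name other than dict, the access pattern from the question works as intended, and the built-in dict type is not shadowed.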
Try this:
import re
import json
data="{list=[{a=1}, {a=2}, {a=3}]}"
data=data.replace('=',':')
pattern=[e.group() for e in re.finditer('[a-z]+', data, flags=re.IGNORECASE)]
for e in set(pattern):
    data=data.replace(e,"\""+e+"\"")
print(json.loads(data))
So I have a project I'm working on for fun but it requires me to append to a dictionary from another python file. In file1.py it will look like
Name: Eric <-- user input
Age: 27 <-- user input
and file2.py,
information = {'Eric':27}
I know that I can temporarily append to a dictionary while the code is running, but it resets after I close the program. For example, I recently saw this in a StackOverflow question:
d = {'key': 'value'}
print(d)
# {'key': 'value'}
d['mynewkey'] = 'mynewvalue'
print(d)
# {'key': 'value', 'mynewkey': 'mynewvalue'}
But this too, resets after every run so I thought that the only way to save the dictionary is to write it to another file. Is there any way that I can achieve this or maybe a better alternative?
You can use JSON to save data to a file.
This will save the data, that is stored in your dictionary, in a file.
import json
my_dict = {"key": "value", "key2": "value2"}
with open("output_file.txt", "w") as file:
    json.dump(my_dict, file, indent=4)
To use that data again, you can load that file.
import json
with open("output_file.txt") as file:
    my_dict = json.load(file)
print(my_dict)  # Prints {'key': 'value', 'key2': 'value2'}
JSON stands for JavaScript Object Notation. It is a text format for storing data, which is why it can be written to a file or held in a string.
So JSON can convert a string into data, if it is valid JSON:
import json
string_data = '{"key": "value"}'
dictionary = json.loads(string_data)
print(type(string_data)) # <class 'str'>
print(type(dictionary)) # <class 'dict'>
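To tie this back to the question, here is a minimal sketch of a load, update, save cycle, so that entries added in one run are still there in the next. The file name information.json, the helper names, and the second sample entry are assumptions made for illustration.

```python
import json
import os
import tempfile


def load_information(path):
    """Return the saved dictionary, or an empty one if the file does not exist yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as file:
        return json.load(file)


def add_person(path, name, age):
    """Load the stored dictionary, add one entry, and write it back."""
    information = load_information(path)
    information[name] = age
    with open(path, "w") as file:
        json.dump(information, file, indent=4)
    return information


# A temporary directory keeps the demo from touching real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "information.json")
    add_person(path, "Eric", 27)  # first "run"
    add_person(path, "Anna", 31)  # a later "run" still sees the earlier entry
    saved = load_information(path)

print(saved)
```

In a real program the path would point to a fixed location next to the script, and the name and age would come from input().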
How can I exclude the key 'u' from the dictionary below?
{u'{"auth":{"user_id":"2"},"data":{"collection":"master-services"}}': [u'']}
I need to get my dictionary like below,
{"auth":{"user_id":"2"},"data":{"collection":"master-services"}}
It looks like you have a dictionary where the key(s) is JSON data. Try parsing it with a JSON parser.
>>> json.loads(list(data)[0])
{'auth': {'user_id': '2'}, 'data': {'collection': 'master-services'}}
If you have many such keys, you can iterate over data (or, data.keys()), like this:
>>> new_data = [json.loads(d) for d in data]
This gives you a list of dictionaries.
The u prefix marks a Unicode string literal in Python 2. It is part of how the string is displayed, not a character stored in the dictionary.
You just need the key of the dictionary entry. Because there is only one key, you can do this:
my_dict = {u'{"auth":{"user_id":"2"},"data":{"collection":"master-services"}}': [u'']}
my_key = next(iter(my_dict))
my_key will hold the value {"auth":{"user_id":"2"},"data":{"collection":"master-services"}}
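The two answers can be combined: take the single key, then parse it as JSON to get a real dictionary. A minimal sketch, with the u prefixes dropped because they are optional in Python 3:

```python
import json

my_dict = {'{"auth":{"user_id":"2"},"data":{"collection":"master-services"}}': ['']}

# The only key is a JSON document stored as a string.
my_key = next(iter(my_dict))

# Parsing it yields the nested dictionary from the question.
parsed = json.loads(my_key)
print(parsed["auth"]["user_id"])
```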
I am trying to make yaml dump each key:value pair on a separate line. Is there a native option to do that? I have tried line_break but couldn't get it to work.
Here is a code example:
import yaml
def test_yaml_dump():
    obj = {'key0': 1, 'key1': 2}
    with open('test.yaml', 'w') as tmpf:
        yaml.dump(obj, tmpf, line_break=0)
The output is:
{key0: 1, key1: 2}
I want it to be:
{key0: 1,
key1: 2}
If you add the argument default_flow_style=False to dump, the output will be:
key0: 1
key1: 2
(the so-called block style). That is the much more readable way of dumping Python dicts to YAML mappings. In ruamel.yaml this is the default when using ruamel.yaml.round_trip_dump().
import sys
import ruamel.yaml as yaml
obj = dict(key0=1, key1=2)
yaml.round_trip_dump(obj, sys.stdout)