How to configure ruamel.yaml.dump output? - python

With this data structure:
d = {
    (2,3,4): {
        'a': [1,2],
        'b': 'Hello World!',
        'c': 'Voilà!'
    }
}
I would like to get this YAML:
%YAML 1.2
---
[2,3,4]:
  a:
    - 1
    - 2
  b: Hello World!
  c: 'Voilà!'
Unfortunately I get this format:
$ print ruamel.yaml.dump(d, default_flow_style=False, line_break=1, explicit_start=True, version=(1,2))
%YAML 1.2
---
? !!python/tuple
- 2
- 3
- 4
: a:
  - 1
  - 2
  b: Hello World!
  c: !!python/str 'Voilà!'
I cannot configure the output I want even with safe_dump. How can I do that without manual regex work on the output?
The only ugly solution I found is something like:
def rep(x):
    return repr([int(y) for y in re.findall(r'^\??\s*-\s*(\d+)', x.group(0), re.M)]) + ":\n"
print re.sub(r'\?(\s*-\s*(\w+))+\s*:', rep,
    ruamel.yaml.dump(d, default_flow_style=False, line_break=1, explicit_start=True, version=(1,2)))

New ruamel.yaml API
You cannot get what you want using ruamel.yaml.dump(), but with the new API, which has
a few more controls, you can come very close.
import sys
import ruamel.yaml
d = {
    (2,3,4): {
        'a': [1,2],
        'b': 'Hello World!',
        'c': 'Voilà!'
    }
}
def prep(d):
    # recursively convert tuple keys and quote strings containing 'à', in place
    if isinstance(d, dict):
        needs_restocking = False
        for k in d:
            if isinstance(k, tuple):
                needs_restocking = True
            try:
                if 'à' in d[k]:
                    d[k] = ruamel.yaml.scalarstring.SingleQuotedScalarString(d[k])
            except TypeError:
                pass
            prep(d[k])
        if not needs_restocking:
            return
        # re-insert all items so the key order is kept when tuple keys are replaced
        items = list(d.items())
        for (k, v) in items:
            d.pop(k)
        for (k, v) in items:
            if isinstance(k, tuple):
                k = ruamel.yaml.comments.CommentedKeySeq(k)
            d[k] = v
    elif isinstance(d, list):
        for item in d:
            prep(item)
yaml = ruamel.yaml.YAML()
yaml.indent(mapping=2, sequence=4, offset=2)
yaml.version = (1, 2)
prep(d)  # modifies d in place and returns None
yaml.dump(d, sys.stdout)
which gives:
%YAML 1.2
---
[2, 3, 4]:
  a:
    - 1
    - 2
  b: Hello World!
  c: 'Voilà!'
There is still no simple way to suppress the space after each comma in a flow-style sequence, so you cannot get [2,3,4] instead of [2, 3, 4] without some major effort.
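If the remaining spaces really matter, one fallback is to post-process the dumped text. The sketch below is plain string handling, not a ruamel.yaml feature: compact_flow_keys is a hypothetical helper, the sample text is copied from the output above, and it assumes the key items contain no quoted commas.

```python
import re

# Sample text copied from the output shown above (not produced by this snippet).
dumped = (
    "%YAML 1.2\n"
    "---\n"
    "[2, 3, 4]:\n"
    "  a:\n"
    "    - 1\n"
    "    - 2\n"
    "  b: Hello World!\n"
    "  c: 'Voilà!'\n"
)

# Matches a line that consists only of a flow sequence used as a mapping key.
FLOW_KEY = re.compile(r'^([ \t]*)\[([^\]\n]*)\]:[ \t]*$', re.M)

def compact_flow_keys(text):
    """Hypothetical helper: drop the spaces after commas in flow-sequence keys."""
    def repl(match):
        items = [item.strip() for item in match.group(2).split(',')]
        return '{}[{}]:'.format(match.group(1), ','.join(items))
    return FLOW_KEY.sub(repl, text)

print(compact_flow_keys(dumped))
```

Only lines that are nothing but a bracketed key followed by a colon are touched, so flow sequences used as values keep their spacing.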
Original answer:
You cannot get exactly what you want as output using ruamel.yaml.dump() without major rework of the internals.
The output you like has indentation 2 for the values of the top-level mapping (keys a, b, etc.) and indentation 4 for the elements of the sequence that is the value for the a key (with the - pushed in 2 positions). That would at least require differentiating between indentation levels for mappings and sequences (if not for individual collections), and that is non-trivial.
Your sequence output is compacted from the ", " (comma, space) that a "normal" flow style emits to just a ",". IIRC this cannot currently be influenced by any parameter, and since you have little contextual knowledge when emitting a collection, it is difficult to "not include the spaces when emitting a sequence that is a key". An additional option to dump() would require changes in several of the source files and classes.
Less difficult issues, with indication of solution:
Your tuple has to magically convert to a sequence to get rid of the tag !!python/tuple. As you don't want to affect all tuples, this is IMO best done by making a subclass of tuple and representing this as a sequence (optionally representing such a tuple as a list only if it is actually used as a key). You can use comments.CommentedKeySeq for that (assuming ruamel.yaml>=0.12.14); it has the proper representation support when using ruamel.yaml.round_trip_dump().
Your key is, when tested before emitting, not a simple key, and as such it gets a '? ' (question mark, space) to indicate a complex mapping key. You would have to change the emitter so that the SequenceStartEvent starts a simple key (if it has flow style and not block style). An additional issue is that such a SequenceStartEvent will then be "tested" to have a style attribute (which might indicate an explicit need for '?' on the key). This requires changing emitter.py:Emitter.check_simple_key() and emitter.py:Emitter.expect_block_mapping_key().
Your scalar string value for c gets quotes, whereas your scalar string value for b doesn't. You can only get that kind of difference in output in ruamel.yaml by making them different types, e.g. by making the value for c a scalarstring.SingleQuotedScalarString() (and using round_trip_dump()).
If you do:
import sys
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap, CommentedKeySeq
assert ruamel.yaml.version_info >= (0, 12, 14)
data = CommentedMap()
data[CommentedKeySeq((2, 3, 4))] = cm = CommentedMap()
cm['a'] = [1, 2]
cm['b'] = 'Hello World!'
cm['c'] = ruamel.yaml.scalarstring.SingleQuotedScalarString('Voilà!')
ruamel.yaml.round_trip_dump(data, sys.stdout, explicit_start=True, version=(1, 2))
you will get:
%YAML 1.2
---
[2, 3, 4]:
  a:
  - 1
  - 2
  b: Hello World!
  c: 'Voilà!'
which, apart from the now consistent indentation level of 2, the extra spaces in the flow style sequence, and the required use of round_trip_dump(), gets you as close to what you want as is possible without major rework.
Whether the above code is ugly as well or not is of course a matter of taste.
The output will, not incidentally, round-trip correctly when loaded using ruamel.yaml.round_trip_load(preserve_quotes=True).
If control over the quotes is not needed, and neither is the order of your mapping keys important, then you can also patch the normal dumper:
def my_key_repr(self, data):
    # emit tuple keys as plain flow-style sequences instead of !!python/tuple
    if isinstance(data, tuple):
        return self.represent_sequence(u'tag:yaml.org,2002:seq', data,
                                       flow_style=True)
    return ruamel.yaml.representer.SafeRepresenter.represent_key(self, data)
ruamel.yaml.representer.Representer.represent_key = my_key_repr
Then you can use a normal sequence:
data = {}
data[(2, 3, 4)] = cm = {}
cm['a'] = [1, 2]
cm['b'] = 'Hello World!'
cm['c'] = 'Voilà!'
ruamel.yaml.dump(data, sys.stdout, allow_unicode=True, explicit_start=True, version=(1, 2))
will give you:
%YAML 1.2
---
[2, 3, 4]:
  a: [1, 2]
  b: Hello World!
  c: Voilà!
Please note that you need to explicitly allow Unicode in your output (the default with round_trip_dump()) using allow_unicode=True.
¹ Disclaimer: I am the author of ruamel.yaml.

Related

How to use a variable inside a str.format() Method? [duplicate]

In order to print a header for tabular data, I'd like to use only one format string line and one spec for column widths w1, w2, w3 (or even w = x, y, z if possible.)
I've looked at this but tabulate etc. don't let me justify things in the column like format does.
This approach works:
head = 'eggs', 'bacon', 'spam'
w1, w2, w3 = 8, 7, 10 # column widths
line = ' {:{ul}>{w1}} {:{ul}>{w2}} {:{ul}>{w3}}'
under = 3 * '='
print line.format(*head, ul='', w1=w1, w2=w2, w3=w3)
print line.format(*under, ul='=', w1=w1, w2=w2, w3=w3)
Must I have individual names as widths {w1}, {w2}, ... in the format string? Attempts like {w[1]}, {w[2]}, give either KeyError or keyword can't be an expression.
Also I think the w1=w1, w2=w2, w3=w3 is not very succinct. Is there a better way?
With f-strings this is now very easy.
If you were using
print(f'{token:10}')
And you want the 10 to be another variable (for example the max length of all the tokens), you would write
print(f'{token:{maxTokenLength}}')
In other words, enclose the variable within {}
In your particular case, all you need is this.
head = 'eggs', 'bacon', 'spam'
w1, w2, w3 = 8, 7, 10 # column widths
print(f' {head[0]:>{w1}} {head[1]:>{w2}} {head[2]:>{w3}}')
print(f' {"="*w1:>{w1}} {"="*w2:>{w2}} {"="*w3:>{w3}}')
Which produces
     eggs   bacon       spam
 ======== ======= ==========
Specifying w[0], w[1], w[2] should work if you defined w = 8, 7, 10 and passed w as keyword argument like below:
>>> head = 'eggs', 'bacon', 'spam'
>>> w = 8, 7, 10 # <--- list is also okay
>>> line = ' {:{ul}>{w[0]}} {:{ul}>{w[1]}} {:{ul}>{w[2]}}'
>>> under = 3 * '='
>>> print line.format(*head, ul='', w=w) # <-- pass as a keyword argument
     eggs   bacon       spam
>>> print line.format(*under, ul='=', w=w) # <-- pass as a keyword argument
 ======== ======= ==========
This is jonrsharpe's comment to my OP, worked out so as to visualise what's going on.
line = ' {:{ul}>{w1}} {:{ul}>{w2}} {:{ul}>{w3}}'
under = 3 * '_'
head = 'sausage', 'rat', 'strawberry tart'
# manual dict
v = {'w1': 8, 'w2':5, 'w3': 17}
print line.format(*under, ul='_', **v)
# auto dict
widthl = [8, 7, 9]
x = {'w{}'.format(index): value for index, value in enumerate(widthl, 1)}
print line.format(*under, ul='_', **x)
The point is that I want to be able to quickly rearrange the header without having to tweak the format string. The auto dict meets that requirement very nicely.
As for filling a dict in this way: WOW!
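If the format string itself does not need to be reused, another option is to format each cell separately and join the results, which keeps a single widths tuple as the only spec. This is a sketch rather than anything from the answers above; format_row is a hypothetical helper name.

```python
head = ('eggs', 'bacon', 'spam')
widths = (8, 7, 10)  # column widths, one per header cell

def format_row(cells, widths, fill=' '):
    """Hypothetical helper: right-align each cell in its column width."""
    return ' ' + ' '.join(
        '{:{fill}>{width}}'.format(cell, fill=fill, width=width)
        for cell, width in zip(cells, widths)
    )

print(format_row(head, widths))
print(format_row(['='] * len(widths), widths, fill='='))
```

Rearranging the header then only means reordering head and widths together; nothing in the format string has to change.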

Print empty string if variable is None [duplicate]

This question already has answers here:
Python: most idiomatic way to convert None to empty string?
(17 answers)
Closed 3 years ago.
I'm trying to make a templated string that will print values of a given dict. However, the key may or may not exist in the dict. If it doesn't exist, I'd like it to return an empty string instead.
To demonstrate what I mean, let's say I have a dict:
test_dict = { 1: "a", 2: "b"}
And a template string:
'{} {} {}'.format(test_dict.get(1), test_dict.get(2), test_dict.get(3))
I'd like the following output:
'a b '
But instead I get:
'a b None'
Use the dictionary's get function. This allows you to specify a value to return if the key is not found
'{}, {}, {}'.format(test_dict.get(1,''), test_dict.get(2,''), test_dict.get(3, ''))
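Note that the default passed to get only applies when the key is missing; a key that is present but stored as None still comes back as None. A small sketch covering both cases, where value_or_empty is a hypothetical helper and the template is the space-separated one from the question:

```python
test_dict = {1: "a", 2: "b"}
keys = [1, 2, 3]  # 3 is deliberately missing from test_dict

def value_or_empty(mapping, key):
    """Hypothetical helper: '' for a missing key or a stored None."""
    value = mapping.get(key, '')
    return '' if value is None else value

result = '{} {} {}'.format(*(value_or_empty(test_dict, k) for k in keys))
print(repr(result))
```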
One way would be to get the length of the dict, and put the same number of placeholders inside the template:
In [27]: test_dict = { 1: "a", 2: "b"}
In [28]: ' '.join(['{}'] * len(test_dict))
Out[28]: '{} {}'
In [29]: ' '.join(['{}'] * len(test_dict)).format(*test_dict.values())
Out[29]: 'a b'
Note that, this is basically the same as ' '.join(test_dict.values()) but showing you the template string as an example.
UPDATES PER OP COMMENT
You can use the string library to help here. See the below script using your test_dict:
# https://stackoverflow.com/a/51359690
from string import Formatter
class NoneAsEmptyFormatter(Formatter):
    def get_value(self, key, args, kwargs):
        v = super().get_value(key, args, kwargs)
        # substitute an empty string for None, as the class name promises
        return '' if v is None else v
fmt = NoneAsEmptyFormatter()
test_dict = { 1: "a", 2: "b"}
test_str = fmt.format('{} {} {}', test_dict.get(1), test_dict.get(2), test_dict.get(3))
print(test_str)
We build a quick NoneAsEmptyFormatter class and use that to format the strings coming from the dict.
Re your comment,
Now that you mention the extra space though, is there a way to remove the placeholder completely if key doesn't exist?
Yes, this is possible. Just make a list of values, filter out any Nones, then join the result:
In [3]: values = map(test_dict.get, [1, 2, 3])
In [4]: ' '.join(v for v in values if v is not None)
Out[4]: 'a b'
Or if order is not important, or if you're using Python 3.7+ and you want to preserve insertion order, you can skip some steps:
In [5]: ' '.join(test_dict.values())
Out[5]: 'a b'

How to scan a list for a partial appearance using a dictionary?

I'm trying to use a dictionary to scan a list of strings and see whether any of its keys appears within each string. For example, let's say I have a dictionary of {'C99': 1, 'C4': 1} and a list of ['C99C2C3C5', 'C88C4']. Then the new list would be ['1', '1'], because 'C99' appears within the string 'C99C2C3C5' and 'C4' appears in 'C88C4'.
My current method of doing this is:
import re
dict = {'C99': 1,'C15':1}
ComponentList = ['C1C15C99', 'C15', 'C17']
def func(s):
    for k, v in dict.items():
        if all(i in s for i in re.findall('\w\d', k)):
            return v
    else:
        return 0
ComponentList = [func(i) for i in ComponentList]
Output:
[1, 1, 1]
Wanted Output:
[1,1,0]
For clarification, if this is my system:
my_dict = {'C1C55C99': 1, 'C17': 1, 'C3': 1}
component_list = ['C1C15C55C99', 'C15', 'C17']
Because 'C1C55C99' appears within 'C1C15C55C99' I'd want the value to change to the dictionary value to give an output:
results = ['1','0','1']
However, this method doesn't work when the component number gets above C9. I'm hoping someone can help me with a fix so that it works for any Cx, and explain why the previous method didn't work.
Thanks Ben
From your comments here, it seems to me that the character 'C' in your component list is significant, because you seem to want to differentiate between 'C11', for example, and 'C1'.
BTW, I fully agree with @martineau about always using standard naming in Python. CamelCasingLikeThis should be reserved for class names, and you should use lower_case_like_this for variables in general, not capitalized names.
Let's walk through how this can be done.
my_dict = {'C99': 1, 'C15': 1, 'C1': 1}
component_list = ['C1C15C99', 'C15', 'C17']
result = []
# first convert my_dict to a list of numbers ['99', '15', '1']
elements = [element[1:] for element in my_dict.keys()]
# for every component you want to characterize
for component in component_list:
    # a flag to know if we found any element in this component
    found = False
    # split the string by the 'C' character to get its sub element numbers
    # for example 'C1C15C99'.split('C') == ['', '1', '15', '99']
    for sub_elem in component.split('C'):
        # make sure sub_elem is not an empty string
        if sub_elem:
            # check if this sub element exists in elements
            if sub_elem in elements:
                found = True
                # exit the inner loop
                break
    # convert the boolean to int (either 0 or 1)
    # and finally add this to the result
    result.append(int(found))
print(result)
# [1, 1, 0]
So far, I've been under the presumption that my_dict can only take singular components like C1 or C6 but not composites like C12C14. From your latest comment, it appears this is not the case. Two more things are suddenly made clear: my_dict can contain a combination of components, and when checking for the existence of one in another, order doesn't matter. For example, C1C2 does exist in C5C2C7C1, but C1C2 does not exist in C1, since both sub components have to be present.
This is very important and it changes the problem entirely. For future reference, please make sure to exhaustively describe your problem from the start.
my_dict = {'C99': 1, 'C15': 1, 'C1': 1, 'C1C55C99': 1, 'C99C6': 1, 'C2C4C18': 1}
component_list = ['C1C15C99', 'C15', 'C17', 'C8C6C80C99', 'C6', 'C55C2C4C18C7', 'C55C1', 'C18C4']
result = []
# first convert my_dict to a list of lists containing singular elements
elements = [element.split('C')[1:] for element in my_dict.keys()]
# elements = [['2', '4', '18'], ['99'], ['1'], ['15'], ['99', '6'], ['1', '55', '99']]
for component in component_list:
    found = False
    # gather the sub elements for this component
    comp_elements = component.split('C')[1:]
    for composite_element in elements:
        element_exists = True
        # check if every singular element in this element is present in component
        for singular_element in composite_element:
            if singular_element not in comp_elements:
                element_exists = False
                break
        if element_exists:
            found = True
            break
    result.append(int(found))
print(result)
# [1, 1, 0, 1, 0, 1, 1, 0]
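The "every sub component must be present, in any order" rule is a subset test, so the nested loops above can also be sketched with sets. This is an alternative formulation, not the answer's own code; lookup is a hypothetical name, and it assumes every component is written as 'C' followed by digits.

```python
my_dict = {'C99': 1, 'C15': 1, 'C1': 1, 'C1C55C99': 1, 'C99C6': 1, 'C2C4C18': 1}
component_list = ['C1C15C99', 'C15', 'C17', 'C8C6C80C99', 'C6',
                  'C55C2C4C18C7', 'C55C1', 'C18C4']

def lookup(component, mapping):
    """Return the value of the first key whose parts all occur in component, else 0."""
    parts = set(component.split('C')[1:])
    for key, value in mapping.items():
        # `<=` on sets is the subset test: every part of the key must be present
        if set(key.split('C')[1:]) <= parts:
            return value
    return 0

result = [lookup(component, my_dict) for component in component_list]
print(result)
```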
I'm bad at one liners but it is much more simple than yours, and there was no need to use regex, just use if x in y
def func(s):
    for k, v in dict.items():
        if k in s:
            return v
    return 0
Based on the edits to your question and comments, I think I (finally) understand what it is you want to do, so here's my substantially revised answer.
I think the code shown could stand a little improvement/optimization, but would first like confirmation that it's now doing the right thing.
import re
def func(comps):
    pats = [c for c in re.findall(r'\w\d+', comps)]
    for k, v in my_dict.items():
        if any(p in k for p in pats):
            return v
    return 0
# Testcases
my_dict = {'C99': 1, 'C4': 1}
components_list = ['C99C2C3C5', 'C88C4']
result = [func(comps) for comps in components_list]
print('result:', result) # -> result: [1, 1]
my_dict = {'C99': 1,'C15': 1}
components_list = ['C1C15C99', 'C15', 'C17']
result = [func(comps) for comps in components_list]
print('result:', result) # -> result: [1, 1, 0]
my_dict = {'C1C55C99': 1, 'C17': 1, 'C3': 1}
components_list = ['C1C15C55C99', 'C15', 'C17']
result = [func(comps) for comps in components_list]
print('result:', result) # -> result: [1, 0, 1]
Note: You really shouldn't name variables the same as Python built-ins, like dict, as it's confusing and can cause subtle bugs unless you're very careful (or just got lucky).
Generally I would suggest following PEP 8 - Style Guide for Python Code, especially the Naming Conventions section, which would also require changing ComponentList into lowercase words separated by "_" characters; in this case, components_list would conform to the guidelines.
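As for why the pattern in the question breaks above C9: '\w\d' matches exactly one word character followed by exactly one digit, so multi-digit components get truncated and then match as substrings of unrelated components. A quick experiment (added here for illustration, not part of the answer above) shows the difference from the r'\w\d+' pattern used in func:

```python
import re

# One word character plus ONE digit: 'C15' is cut down to 'C1'.
short = re.findall(r'\w\d', 'C15')
print(short)

# The truncated piece is a substring of an unrelated component.
print('C1' in 'C17')

# With \d+ the whole number is captured for each component.
full = re.findall(r'\w\d+', 'C1C15C99')
print(full)
```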

Iteratively declare variables based on string?

Not sure if this has been asked before or not. It's a bit of an odd question, so I'll go ahead and fire away.
I've got some variable (or rather constant) definitions:
# Constants
# Colors
RED="RED"
ORANGE="ORANGE"
YELLOW="YELLOW"
GREEN="GREEN"
CYAN="CYAN"
BLUE="BLUE"
MAGENTA="MAGENTA"
# Modes
PANIC="PANIC"
SOLID="SOLID"
BREATHING="BREATHING"
# Special sub-modes (for panic)
BLINKING="BLINKING"
# Declare them
SOLID_RED="{}_{}".format(SOLID,RED)
SOLID_BLUE="{}_{}".format(SOLID,BLUE)
SOLID_MAGENTA="{}_{}".format(SOLID,MAGENTA)
## ..
BREATHING_RED="{}_{}".format(BREATHING,RED)
BREATHING_BLUE="{}_{}".format(BREATHING,BLUE)
BREATHING_MAGENTA="{}_{}".format(BREATHING,MAGENTA)
## ..
PANIC_RED="{}_{}".format(PANIC,RED)
PANIC_BLUE="{}_{}".format(PANIC,BLUE)
PANIC_MAGENTA="{}_{}".format(PANIC,MAGENTA)
## ..
PANIC_BLINKING="{}_{}".format(PANIC,BLINKING)
I got a lot of definitions! Instead of having to type them all out like this, would there be a way for me to just construct all these constants into existence as strings only using the definitions BEFORE # declare them , or by using, say, a dictionary?
The format I'd need for such an iterative construction is the MODE_COLOR naming convention.
I require that this answer works using Python 2.7. As I have some dependent 2.7 APIs included.
Another way using itertools.combinations and locals():
from itertools import combinations
from pprint import pprint
# Colors
RED="RED"
ORANGE="ORANGE"
YELLOW="YELLOW"
GREEN="GREEN"
CYAN="CYAN"
BLUE="BLUE"
MAGENTA="MAGENTA"
# Modes
PANIC="PANIC"
SOLID="SOLID"
BREATHING="BREATHING"
# Special sub-modes (for panic)
BLINKING="BLINKING"
v_consts = {k:v for k, v in locals().items() if k.isupper()}
combs = combinations(v_consts.values(), 2)
d_consts = {'%s_%s' % k: '%s_%s' % k for k in combs}
pprint(d_consts)
# Edit:
# If you want to add the created variables in Python's scope
# You can do something like this
globals().update(d_consts)
print SOLID_BLINKING, type(SOLID_BLINKING)
Output:
{'BLINKING_CYAN': 'BLINKING_CYAN',
'BLINKING_MAGENTA': 'BLINKING_MAGENTA',
'BLINKING_ORANGE': 'BLINKING_ORANGE',
'BLINKING_PANIC': 'BLINKING_PANIC',
'BLINKING_RED': 'BLINKING_RED',
...
'YELLOW_MAGENTA': 'YELLOW_MAGENTA',
'YELLOW_ORANGE': 'YELLOW_ORANGE',
'YELLOW_PANIC': 'YELLOW_PANIC',
'YELLOW_RED': 'YELLOW_RED'}
SOLID_BLINKING <type 'str'>
I would use a dictionary as the container to store the variables. Just list all of the colors and modes in lists, and then use a dictionary comprehension:
colors_list = ['red', 'blue']
modes_list = ['panic', 'solid']
color_modes = {k1 + '_' + k2: k1.upper() + '_' + k2.upper()
               for k1 in colors_list for k2 in modes_list}
>>> color_modes
{'blue_panic': 'BLUE_PANIC',
'blue_solid': 'BLUE_SOLID',
'red_panic': 'RED_PANIC',
'red_solid': 'RED_SOLID'}
I think what you're trying to do is emitting a bit of a code smell.
The way I might approach this is by using a dictionary and a cross product. Here's a minimal example:
from itertools import product
A = ['a', 'b', 'c']
B = ['d', 'e', 'f']
AB = {"{0} {1}".format(a, b): "{0}_{1}".format(a, b) for a, b in product(A, B)}
print(AB)
You can apply this to your colors and modifiers and access the colors by name:
colors['Magenta Solid']
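Applied to the names from the question, that could look roughly like the sketch below. It sticks to str.format and a plain loop so it should also run on Python 2.7; the list names COLORS and MODES and the dict name constants are made up for this example.

```python
from itertools import product

COLORS = ['RED', 'ORANGE', 'YELLOW', 'GREEN', 'CYAN', 'BLUE', 'MAGENTA']
MODES = ['PANIC', 'SOLID', 'BREATHING']

# Build MODE_COLOR for every pairing; name and value are the same string.
constants = {}
for mode, color in product(MODES, COLORS):
    name = '{}_{}'.format(mode, color)
    constants[name] = name
constants['PANIC_BLINKING'] = 'PANIC_BLINKING'  # the one special sub-mode

print(constants['SOLID_RED'])
print(len(constants))
```

Unlike combinations over all constants, product over the two lists only yields MODE_COLOR pairs, so names such as RED_BLUE or BLINKING_CYAN are never created.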

How to convert a malformed string to a dictionary?

I have a string s (note that the a and b are not enclosed in quotation marks, so it can't directly be evaluated as a dict):
s = '{a:1,b:2}'
I want to convert this variable to a dict like this:
{'a':1,'b':2}
How can I do this?
This will work with your example:
import ast
def elem_splitter(s):
    return s.split(':',1)
s = '{a:1,b:2}'
s_no_braces = s.strip()[1:-1] #s.translate(None,'{}') is more elegant, but can fail if you can have strings with '{' or '}' enclosed.
elements = (elem_splitter(ss) for ss in s_no_braces.split(','))
d = dict((k,ast.literal_eval(v)) for k,v in elements)
Note that this will fail if you have a string formatted as:
'{s:"foo,bar",ss:2}' #comma in string is a problem for this algorithm
or:
'{s,ss:1,v:2}'
but it will pass a string like:
'{s ss:1,v:2}' #{"s ss":1, "v":2}
You may also want to modify elem_splitter slightly, depending on your needs:
def elem_splitter(s):
    k,v = s.split(':',1)
    return k.strip(),v # maybe `v.strip()` also?
*Somebody else might cook up a better example using more of the ast module, but I don't know its internals very well, so I doubt I'll have time to write that answer.
As your string is malformed both as JSON and as a Python dict, you can use neither json.loads nor ast.literal_eval to convert the data directly.
In this particular case, you would have to translate it to a Python dictionary manually, using what you know about the input data:
>>> foo = '{a:1,b:2}'
>>> dict(e.split(":") for e in foo.translate(None,"{}").split(","))
{'a': '1', 'b': '2'}
As Tim pointed out, I overlooked the fact that the values should be integers; here is an alternative implementation:
>>> {k: int(v) for e in foo.translate(None,"{}").split(",")
...     for k, v in [e.split(":")]}
{'a': 1, 'b': 2}
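The two-argument str.translate(None, "{}") call is Python 2 only; in Python 3, str.translate takes just a translation table. A rough Python 3 equivalent of the same idea, assuming (as above) that keys are bare names and values are integers:

```python
foo = '{a:1,b:2}'

# Drop surrounding whitespace and the braces, then split into key:value pairs.
body = foo.strip().strip('{}')
pairs = (item.split(':', 1) for item in body.split(','))
result = {key.strip(): int(value) for key, value in pairs}
print(result)
```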
import re,ast
regex = re.compile('([a-z])')
ast.literal_eval(regex.sub(r'"\1"', s))
out:
{'a': 1, 'b': 2}
EDIT:
If you happen to have something like {foo1:1,bar:2} add an additional capture group to the regex:
regex = re.compile(r'(\w+)(:)')
ast.literal_eval(regex.sub(r'"\1"\2', s))
You can do it simply with this:
s = "{a:1,b:2}"
content = s[s.index("{")+1:s.index("}")]
to_int = lambda x: int(x) if x.isdigit() else x
d = dict((to_int(i) for i in pair.split(":", 1)) for pair in content.split(","))
For simplicity I've omitted exception handling if the string doesn't contain a valid specification, and also this version doesn't strip whitespace, which you may want. If the interpretation you prefer is that the key is always a string and the value is always an int, then it's even easier:
s = "{a:1,b:2}"
content = s[s.index("{")+1:s.index("}")]
d = dict((k.strip(), int(v)) for k, v in (pair.split(":", 1) for pair in content.split(",")))
As a bonus, this version also strips whitespace from the key to show how simple it is.
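import re
import simplejson
s = '{a:1,b:2}'
# simplejson.loads(s) fails on the bare keys, so wrap them in double quotes first
a = simplejson.loads(re.sub(r'(\w+):', r'"\1":', s))
print a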
