Using ruamel.yaml, how can I (1) reproduce the input exactly (same order, comments, anchors, and aliases) and (2) produce output with dereferenced values?
For instance, given the following code:
import sys
import ruamel.yaml
yaml_input = """\
shape: &shape
  color: blue
square: &square
  a: 5
rectangle:
  <<: *shape
  <<: *square
  b: 3
  color: green
"""
yaml = ruamel.yaml.YAML()
yaml.allow_duplicate_keys = True
data = yaml.load(yaml_input)
yaml.dump(data, sys.stdout)
Its output is:
shape: &shape
  color: blue
square:
  a: 5
rectangle:
  <<: *shape
  b: 3
  color: green
How can I produce dereferenced output, like this:
shape:
  color: blue
square:
  a: 5
rectangle:
  b: 3
  a: 5
  color: green
And how can I produce the same output as the input itself (with implicit data, that is, without dereferencing values), like this:
shape: &shape
  color: blue
square: &square
  a: 5
rectangle:
  <<: *shape
  <<: *square
  b: 3
  color: green
Having duplicate keys in a mapping is not allowed in YAML. There are, however, YAML libraries that ignore this,
including PyYAML. Historically, in ruamel.yaml, throwing an error on a duplicate key
was added first (including the option to explicitly allow duplicate keys), and preserving merge keys came later.
But the preservation of merge keys expects correct YAML (it would be nice if ruamel.yaml threw
an error when encountering duplicate merge keys while .allow_duplicate_keys is set, but it doesn't).
Since there is no mechanism built into
ruamel.yaml to process duplicate merge keys in round-trip mode, you have to use a single merge key with a sequence of aliases:
<<: [*shape, *square]
import sys
import ruamel.yaml
yaml_str = """\
shape: &shape
  color: blue
square: &square
  a: 5
rectangle:
  <<: [*shape, *square]
  b: 3
  color: green
"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
which gives the same output as the input:
shape: &shape
  color: blue
square: &square
  a: 5
rectangle:
  <<: [*shape, *square]
  b: 3
  color: green
Internally, ruamel.yaml gives you a dict-like type that contains a list of the dicts of the merged mappings, so that
you can do:
print(data['rectangle']['a'])
and expect to see 5 printed. It does so by looking up 'a' in the dicts pointed to when not found in the main dict.
But if you assign to data['rectangle']['a'], the value for key a is set, whether the key was already there or not:
data['rectangle']['a'] = data['rectangle']['a']
will dump with a: 5 at the bottom (and of course still as part of square as well).
That means you can do:
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
rectangle = data['rectangle']
for k, v in rectangle.items():
    rectangle[k] = v
yaml.dump(data, sys.stdout)
and get:
shape: &shape
  color: blue
square: &square
  a: 5
rectangle:
  <<: [*shape, *square]
  b: 3
  color: green
  a: 5
And if you want to get rid of the merge keys, it is easiest to create a new dict:
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
rectangle = data['rectangle']
new_rectangle = {}
for k, v in rectangle.items():
    new_rectangle[k] = v
data['rectangle'] = new_rectangle
yaml.dump(data, sys.stdout)
and get:
shape:
  color: blue
square:
  a: 5
rectangle:
  b: 3
  color: green
  a: 5
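The fallback lookup described in this answer (look in the mapping's own keys first, then in the merged mappings in order) can be modelled in plain Python. The sketch below uses `collections.ChainMap` purely as an analogy; it is not how ruamel.yaml implements merges, and the names `own`, `shape`, `square` and `flat` are made up for illustration.

```python
from collections import ChainMap

# Plain-dict stand-ins for the mappings in the YAML example
shape = {"color": "blue"}
square = {"a": 5}
own = {"b": 3, "color": "green"}  # keys written directly under rectangle

# Own keys are searched first, then the merged mappings in list order
rectangle = ChainMap(own, shape, square)

print(rectangle["a"])      # 5, found in square
print(rectangle["color"])  # green, the own key wins over shape

# Copying into a new dict "dereferences" everything, as in the last step above
flat = dict(rectangle)
print(flat == {"b": 3, "color": "green", "a": 5})  # True
```

Copying the items into a fresh dict is the same idea as building `new_rectangle`: the result no longer depends on the merged mappings.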
# a colorbutton (which opens a dialogue window in
# which we choose a color)
self.button_err_bg = Gtk.ColorButton()
# with a default color (blue, in this instance)
color_bg = Gdk.RGBA()
color_bg.red = 0.5
color_bg.green = 0.4
color_bg.blue = 0.3
color_bg.alpha = 1.0
color_error_background = self.button_err_bg.set_rgba(color_bg)
# choosing a color in the dialogue window emits a signal
self.button_err_bg.connect("color-set", self.on_color_fg_error_chosen)
and the method:
def on_color_fg_error_chosen(self, user_data):
    print("You chose the color: " + self.button_err_bg.get_rgba().to_string())
    color_rgba = self.button.get_rgba().to_string()
    color_rgba_bracket = color_rgba[color_rgba.find("(")+1:color_rgba.find(")")]
    color_hex = '#{:02x}{:02x}{:02x}'.format(color_rgba_bracket)
    print(color_hex)
This fails with:
color_hex = '#{:02x}{:02x}{:02x}'.format(color_rgba_bracket)
ValueError: Unknown format code 'x' for object of type 'str'
Based on your code, I'm going to assume that self.button.get_rgba() returns a tuple.
Since you convert the assumed tuple to a string (for some reason), the format fails because, as your error shows, it doesn't know how to convert a string to hex. Additionally, even if it did have an x format for a string, it would fail with IndexError: tuple index out of range because it's expecting 3 items and you're only passing 1.
If you skip the string conversion altogether, and unpack the tuple, you should get a proper format:
def on_color_fg_error_chosen(self, user_data):
    print("You chose the color: " + self.button_err_bg.get_rgba().to_string())
    color_rgba = self.button.get_rgba()
    color_hex = '#{:02x}{:02x}{:02x}'.format(*color_rgba)
    print(color_hex)
Using a basic example:
>>> color_rgba = (12, 28, 200, 28) # R, G, B, Alpha
>>> '#{:02x}{:02x}{:02x}'.format(*color_rgba) # 4th item is ignored because only looking for 3 items
'#0c1cc8'
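One caveat to the tuple assumption: in GTK 3, `Gdk.RGBA` stores `red`, `green`, `blue` and `alpha` as floats between 0.0 and 1.0, so the channels would need to be scaled to 0..255 before the `x` format code can be applied. A minimal sketch of that conversion, using a hypothetical helper name `rgba_floats_to_hex` and plain floats so it runs without GTK:

```python
def rgba_floats_to_hex(red, green, blue):
    """Convert float channels in [0.0, 1.0] to a '#rrggbb' string."""
    as_ints = []
    for channel in (red, green, blue):
        value = int(round(channel * 255))        # scale to 0..255
        as_ints.append(max(0, min(255, value)))  # clamp out-of-range input
    return '#{:02x}{:02x}{:02x}'.format(*as_ints)


print(rgba_floats_to_hex(1.0, 0.0, 0.2))  # '#ff0033'

# In the handler this might be used as (assuming get_rgba() returns a Gdk.RGBA):
#     rgba = self.button_err_bg.get_rgba()
#     color_hex = rgba_floats_to_hex(rgba.red, rgba.green, rgba.blue)
```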
I'm trying to dump a Python dict to a YAML file using ruamel.yaml. I'm familiar with the json module's interface, where pretty-printing a dict is as simple as
import json
with open('outfile.json', 'w') as f:
    json.dump(mydict, f, indent=4, sort_keys=True)
With ruamel.yaml, I've gotten as far as
import ruamel.yaml
with open('outfile.yaml', 'w') as f:
    ruamel.yaml.round_trip_dump(mydict, f, indent=2)
but it doesn't seem to support the sort_keys option. ruamel.yaml also doesn't seem to have any exhaustive docs, and searching Google for "ruamel.yaml sort" or "ruamel.yaml alphabetize" didn't turn up anything at the level of simplicity I'd expect.
Is there a one-or-two-liner for pretty-printing a YAML file with sorted keys?
(Note that I need the keys to be alphabetized down through the whole container, recursively; just alphabetizing the top level is not good enough.)
Notice that if I use round_trip_dump, the keys are not sorted; and if I use safe_dump, the output is not "YAML-style" (or more importantly "Kubernetes-style") YAML. I don't want [] or {} in my output.
$ pip freeze | grep yaml
ruamel.yaml==0.12.5
$ python
>>> import ruamel.yaml
>>> mydict = {'a':1, 'b':[2,3,4], 'c':{'a':1,'b':2}}
>>> print ruamel.yaml.round_trip_dump(mydict) # right format, wrong sorting
a: 1
c:
  a: 1
  b: 2
b:
- 2
- 3
- 4
>>> print ruamel.yaml.safe_dump(mydict) # wrong format, right sorting
a: 1
b: [2, 3, 4]
c: {a: 1, b: 2}
You need some recursive function that handles mappings/dicts, sequence/list:
import sys
import ruamel.yaml
CM = ruamel.yaml.comments.CommentedMap
yaml = ruamel.yaml.YAML()
data = dict(a=1, c=dict(b=2, a=1), b=[2, dict(e=6, d=5), 4])
yaml.dump(data, sys.stdout)
def rec_sort(d):
    try:
        if isinstance(d, CM):
            return d.sort()
    except AttributeError:
        pass
    if isinstance(d, dict):
        # could use dict in newer python versions
        res = ruamel.yaml.CommentedMap()
        for k in sorted(d.keys()):
            res[k] = rec_sort(d[k])
        return res
    if isinstance(d, list):
        for idx, elem in enumerate(d):
            d[idx] = rec_sort(elem)
    return d
print('---')
yaml.dump(rec_sort(data), sys.stdout)
which gives:
a: 1
c:
  b: 2
  a: 1
b:
- 2
- e: 6
  d: 5
- 4
---
a: 1
b:
- 2
- d: 5
  e: 6
- 4
c:
  a: 1
  b: 2
The commented map is the structure ruamel.yaml uses when doing a
round-trip (load+dump) and round-tripping is designed to keep the keys in
the order that they were during loading.
The above should do a reasonable job of preserving comments on mappings/sequences when you load data from a commented YAML file.
There is an undocumented sort() in ruamel.yaml that will work on a variation of this problem:
import sys
import ruamel.yaml
yaml = ruamel.yaml.YAML()
test = """- name: a11
  value: 11
- name: a2
  value: 2
- name: a21
  value: 21
- name: a3
  value: 3
- name: a1
  value: 1"""
test_yml = yaml.load(test)
yaml.dump(test_yml, sys.stdout)
Unsorted output:
- name: a11
  value: 11
- name: a2
  value: 2
- name: a21
  value: 21
- name: a3
  value: 3
- name: a1
  value: 1
Sort by name:
test_yml.sort(lambda x: x['name'])
yaml.dump(test_yml, sys.stdout)
Sorted output:
- name: a1
  value: 1
- name: a11
  value: 11
- name: a2
  value: 2
- name: a21
  value: 21
- name: a3
  value: 3
As pointed out in @Anthon's example, if you are using Python 3.7 or newer (and do not need to support lower versions), you just need:
import sys
from ruamel.yaml import YAML
yaml = YAML()
data = dict(a=1, c=dict(b=2, a=1), b=[2, dict(e=6, d=5), 4])
def rec_sort(d):
    if isinstance(d, dict):
        res = dict()
        for k in sorted(d.keys()):
            res[k] = rec_sort(d[k])
        return res
    if isinstance(d, list):
        for idx, elem in enumerate(d):
            d[idx] = rec_sort(elem)
    return d
yaml.dump(rec_sort(data), sys.stdout)
This works because dict preserves insertion order as of that version.
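Note that the rec_sort versions shown here rebuild dicts but overwrite list elements in place, so the original data is partly modified. If that matters, a variant that returns a fully new structure might look like the sketch below. The name `sorted_copy` is made up for this example, and because it returns plain dicts and lists it would not carry over ruamel.yaml comments.

```python
def sorted_copy(node):
    """Return a copy of node with all dict keys sorted, recursively."""
    if isinstance(node, dict):
        # dict keeps insertion order (Python 3.7+), so inserting in
        # sorted key order yields a sorted mapping
        return {key: sorted_copy(node[key]) for key in sorted(node)}
    if isinstance(node, list):
        # build a new list instead of assigning into the old one
        return [sorted_copy(item) for item in node]
    return node


data = dict(a=1, c=dict(b=2, a=1), b=[2, dict(e=6, d=5), 4])
result = sorted_copy(data)

print(list(result))          # ['a', 'b', 'c']
print(list(result['b'][1]))  # ['d', 'e']
print(list(data['b'][1]))    # ['e', 'd'], the input is left untouched
```

The result can be passed to `yaml.dump` in the same way as the output of rec_sort.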
This insertion sort is missorting all elements but the last. It's very weird because I have an identical function that sorts ALL elements by a different attribute. I tried copying, pasting, and altering the working function, but that seemed futile.
for i in range(1, len(metals)):
    index = i
    while index != 0 and metals[index].weightPerBar > metals[index - 1].weightPerBar:
        metals[index], metals[index - 1] = metals[index - 1], metals[index]
        index -= 1
Thanks
Here's the rest of the module:
class Metal(struct):
    """
    Represents a single metal type, composed of:
    :slot name (str): The name of the metal
    :slot totalBars (int): The total number of bars
    :slot weightPerBar (int): The weight of a single bar
    :slot valuePerBar (int): The value of a single bar
    :slot valuePerWeight (float): The value per weight of the metal
    :slot barsTaken (int): The number of bars added to the satchel
    """
    _slots = ((str, "name"), (int, "totalBars"), (int, "weightPerBar"), (int, "valuePerBar"), (float, "valuePerWeight"), (int, "barsTaken"))
    pass
def createMetal(name, totalBars, weightPerBar, valuePerBar):
    """
    Create and return a new Metal object.
    :param name (str): The name of the metal
    :param totalBars (int): The total number of bars
    :param weightPerBar (int): The weight of a single bar
    :param valuePerBar (int): The value of a single bar
    :return: A newly initialized Metal object
    :rtype: Metal
    """
    new_metal = Metal(name, totalBars, weightPerBar, valuePerBar)
    return new_metal
    pass
def readMetals(fileName):
    """
    Read the metals from a file whose format is:
    metalName totalBars weightPerBar valuePerBar
    :param fileName (str): The name of the file
    :return: A list of Metal objects
    :rtype: list
    """
    metal_list = []
    file = open(fileName)
    for line in file:
        line = line.split()
        weight_per_bar = float(line[3])/float(line[2]) # creating derived value
        new_metal = Metal(line[0], int(line[1]), int(line[2]), int(line[3]), weight_per_bar, 0)
        metal_list += [new_metal]
    return metal_list
    pass
def sortMetalsByValuePerBar(metals):
    """
    Sort the metals by value per bar using insertion sort. The list of
    metals is modified in place to be ordered by value per bar.
    :param metals (list of Metal): The list of metals
    :return: None
    :rtype: NoneType
    """
    for i in range(1, len(metals)):
        index = i
        while index != 0 and metals[index].valuePerBar > metals[index - 1].valuePerBar:
            metals[index], metals[index - 1] = metals[index - 1], metals[index]
            index -= 1
    pass
def sortMetalsByValuePerWeight(metals):
    """
    Sort the metals by value per weight using insertion sort. The list of
    metals is modified in place to be ordered by value per weight.
    :param metals (list of Metal): The list of metals
    :return: None
    :rtype: NoneType
    """
    for i in range(1, len(metals)):
        index = i
        while index != 0 and metals[index].weightPerBar > metals[index - 1].weightPerBar:
            metals[index], metals[index - 1] = metals[index - 1], metals[index]
            index -= 1
    pass
It should work if the .weightPerBar values are all of the same type and are numeric (not strings or other objects). If the weights are strings, you could end up with "2", "6", "4", "10" sorting as "6", "4", "2", "10" instead of 10, 6, 4, 2 as desired.
Your code works fine on my machine, but why are you implementing a sorting algorithm on your own? You can just use:
metals.sort(key=lambda metal: metal.weightPerBar, reverse=True)
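To make both points concrete, here is a small sketch. The `Metal` namedtuple is only a stand-in for the question's struct-based class (the metal names and numbers are invented), so treat it as an illustration rather than a drop-in replacement. It shows the string-versus-number ordering, and sorting by key. Since the docstring of sortMetalsByValuePerWeight says it orders by value per weight while the loop compares weightPerBar, the last lines also show what sorting on valuePerWeight would look like, in case that is what was intended.

```python
from collections import namedtuple

# Hypothetical stand-in for the question's Metal struct
Metal = namedtuple("Metal", "name weightPerBar valuePerWeight")

# Strings compare character by character, numbers compare by value
print(sorted(["2", "6", "4", "10"], reverse=True))  # ['6', '4', '2', '10']
print(sorted([2, 6, 4, 10], reverse=True))          # [10, 6, 4, 2]

metals = [
    Metal("gold", 2, 5.0),
    Metal("iron", 10, 0.5),
    Metal("silver", 6, 2.0),
]

# Descending by weight per bar, as in the one-liner above
by_weight = sorted(metals, key=lambda metal: metal.weightPerBar, reverse=True)
print([m.name for m in by_weight])  # ['iron', 'silver', 'gold']

# Descending by value per weight, if that is the attribute that was meant
by_value = sorted(metals, key=lambda metal: metal.valuePerWeight, reverse=True)
print([m.name for m in by_value])   # ['gold', 'silver', 'iron']
```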
I have a particle list of objects of type Particle, which takes two parameters, position and energy:
class Particle(object):
    def __init__(self, pos, energy):
        self.x = pos
        self.E = energy
The only way I've managed to do it so far is to create a list of particles using a list comprehension:
number_of_particles = 10
initial_energy = 0
particle_list = [Particle(initial_energy,i) for i in range(number_of_particles)]
which now allows me to do things like:
particle_list[0].x
which is what I want.
However, what I would really, really like is to do something as follows:
particle_list = ParticleList(no_of_particles, initial_energy)
and have it create the exact same list.
I assume I have to extend the list class somehow but I'm at a loss as to how to do this.
Why not just build a function to do this for you? You could do something simple like:
def ParticleList(no_of_particles, initial_energy):
    return [Particle(i, initial_energy) for i in range(no_of_particles)]
This should be a simple way of getting your list.
class Particle(object):
    def __init__(self, pos, energy):
        self.x = pos
        self.E = energy
    @classmethod
    def make_particle_list(cls, no_particles, initial_energy=0):
        return [cls(i, initial_energy) for i in range(no_particles)]
    # this is just for display purposes
    def __repr__(self):
        return 'pos: {p.x} - Energy: {p.E}'.format(p=self)
This offers you a little flexibility. If you only need one particle you can create it the normal way; if you need a list:
>>> lst = Particle.make_particle_list(10)
>>> lst
[pos: 0 - Energy: 0, pos: 1 - Energy: 0, pos: 2 - Energy: 0, pos: 3 - Energy: 0, pos: 4 - Energy: 0, pos: 5 - Energy: 0, pos: 6 - Energy: 0, pos: 7 - Energy: 0, pos: 8 - Energy: 0, pos: 9 - Energy: 0]
This also allows you to pass in a different initial_energy if you ever need a different value.
You also had your arguments backwards in your example. You had initial_energy as the first positional argument in your list comprehension but you have it as the second in your __init__() method.
Create your class with your custom __init__ method.
class ParticleList(list):
    def __init__(self, num, energy):
        self.particle_list = [Particle(energy,i) for i in range(num)]
particles = ParticleList(2, 0).particle_list
for particle in particles:
    print (particle.x, particle.E)
>>(0, 0)
>>(0, 1)
You may create your own method and not use the __init__, this way you will be able to simply return the created list and not assign it to a member (__init__ is not allowed to have a return value).
class ParticleList(list):
    def create_list(self, num, energy):
        return [Particle(energy,i) for i in range(num)]
my_list = ParticleList().create_list(2, 0)
And as others have said, you don't even need the class and can get away with only creating a function:
def create_list(num, energy):
    return [Particle(energy,i) for i in range(num)]
my_list = create_list(2, 0)
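If the goal really is for `ParticleList(no_of_particles, initial_energy)` to be the list itself, one possible sketch is to subclass list and hand the particles to the parent initializer. This is an illustration under the assumption that position should be the index and energy the shared initial value; it is not taken from the answers above.

```python
class Particle(object):
    def __init__(self, pos, energy):
        self.x = pos
        self.E = energy


class ParticleList(list):
    def __init__(self, no_of_particles, initial_energy=0):
        # Fill the list itself instead of storing a second list as an attribute
        super().__init__(
            Particle(i, initial_energy) for i in range(no_of_particles)
        )


particle_list = ParticleList(3, 0)
print(len(particle_list))   # 3
print(particle_list[2].x)   # 2
print(particle_list[2].E)   # 0
```

The object then supports indexing, iteration and `len()` directly. One caveat of subclassing list this way is that slicing returns a plain list rather than a ParticleList, which is part of why a simple factory function is often enough.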
I have a large file (5Gb) called my_file. I have a list called my_list. What is the most efficient way to read each line in the file and, if an item from my_list matches an item from a line in my_file, create a new list called matches that contains items from the lines in my_file AND items from my_list where a match occurred. Here is what I am trying to do:
def calc(my_file, my_list):
    matches = []
    my_file.seek(0,0)
    for i in my_file:
        i = list(i.rstrip('\n').split('\t'))
        for v in my_list:
            if v[1] == i[2]:
                item = v[0], i[1], i[3]
                matches.append(item)
    return matches
here are some lines in my_file:
lion 4 blue ch3
sheep 1 red pq2
frog 9 green xd7
donkey 2 aqua zr8
here are some items in my_list
intel yellow
amd green
msi aqua
The desired output, a list of lists, in the above example would be:
[['amd', 9, 'xd7'], ['msi', 2, 'zr8']]
My code currently works, albeit really slowly. Would using a generator or serialization help? Thanks.
You could build a dictionary for looking up v. I added a few further small optimizations:
def calc(my_file, my_list):
    vd = dict( (v[1],v[0]) for v in my_list)
    my_file.seek(0,0)
    for line in my_file:
        f0, f1, f2, f3 = line[:-1].split('\t')
        v0 = vd.get(f2)
        if v0 is not None:
            yield (v0, f1, f3)
This should be much faster for a large my_list.
Using get is faster than checking if i[2] is in vd + accessing vd[i[2]]
For getting more speedup beyond these optimizations I recommend http://www.cython.org
Keep the items in a dictionary rather than a list (let's call it items). Now iterate through your file as you're doing, pick out the key to look for (i[2]), and check whether it is in items.
items would be:
dict(yellow="intel", green="amd", aqua="msi")
So the checking part would be:
if i[2] in items:
    yield [items[i[2]], i[1], i[3]]
Since you're just creating the list and returning it, using a generator might improve the memory characteristics of the program compared with building the whole list and returning it.
There isn't much you can do with the overheads of reading the file in, but based on your example code, you can speed up the matching by storing your list as a dict (with the target field as the key).
Here's an example, with a few extra optimisation tweaks:
mylist = {
    "yellow" : "intel",
    "green" : "amd",
    # ....
}
matches = []
for line in my_file:
    i = line[:-1].split("\t")
    try: # faster to ask for forgiveness than permission
        matches.append([mylist[i[2]], i[1], i[3]])
    except KeyError: # no entry in mylist for this line's key
        pass
But again, do note that most of your performance bottleneck will be in the reading of the file and optimisation at this level may not have a big enough impact on the runtime.
Here's a variation on @rocksportrocker's answer using the csv module:
import csv
def calc_csv(lines, lst):
    d = dict((v[1], v[0]) for v in lst) # use dict to speed up membership test
    return ((d[f2], f1, f3)
            for _, f1, f2, f3 in csv.reader(lines, dialect='excel-tab')
            if f2 in d) # assume that intersection is much less than the file
Example:
def test():
    # fields are tab-separated, as required by the 'excel-tab' dialect
    my_file = """\
lion\t4\tblue\tch3
sheep\t1\tred\tpq2
frog\t9\tgreen\txd7
donkey\t2\taqua\tzr8
""".splitlines()
    my_list = [
        ("intel", "yellow"),
        ("amd", "green"),
        ("msi", "aqua"),
    ]
    res = list(calc_csv(my_file, my_list))
    assert [('amd', '9', 'xd7'), ('msi', '2', 'zr8')] == res
if __name__=="__main__":
    test()