Append duplicate key value pair to nested dictionary in YAML

I am trying to append a duplicate key: value pair to a nested dictionary in a YAML file from a Python script. This is the code I have written so far:
import click
import ruamel.yaml
def organization():
    org_num = int(input("Please enter the number of organizations to be created: "))
    org_val = 0
    while org_val != org_num:
        print("")
        print("Please enter values to create Organizations")
        print("")
        for org in range(org_num):
            organization.org_name = str(raw_input("Enter the Organization Name: "))
            organization.org_description = str(raw_input("Enter the Description of Organization: "))
            print("")
            if click.confirm("Organization Name: "+ organization.org_name + "\nDescription: "+ organization.org_description + "\nIs this Correct?", default=True):
                if org_val == 0:
                    org_val = org_val + 1
                    yaml = ruamel.yaml.YAML()
                    org_data = dict(
                        organizations=dict(
                            name=organization.org_name,
                            description=organization.org_description,
                        )
                    )
                    with open('input.yml', 'a') as outfile:
                        yaml.indent(mapping=2, sequence=4, offset=2)
                        yaml.dump(org_data, outfile)
                else:
                    org_val = org_val + 1
                    yaml = ruamel.yaml.YAML()
                    org_data = dict(
                        name=organization.org_name,
                        description=organization.org_description,
                    )
                    with open('input.yml', 'r') as yamlfile:
                        cur_yaml = yaml.load(yamlfile)
                        cur_yaml['organizations'].update(org_data)
                    if cur_yaml:
                        with open('input.yml','w') as yamlfile:
                            yaml.indent(mapping=2, sequence=4, offset=2)
                            yaml.dump(cur_yaml, yamlfile)
    return organization.org_name, organization.org_description
organization()
At the end of the Python script, my input.yml file should look like the following:
version: x.x.x
is_enterprise: 'true'
license: secrets/license.txt
organizations:
  - description: xyz
    name: abc
  - description: pqr
    name: def
However, every time the script runs, it overwrites the value of organizations instead of appending to it.
I have also tried using append instead of update, but I get the following error:
AttributeError: 'CommentedMap' object has no attribute 'append'
What can I do to solve this?
Also, since I am new to development, any suggestions for making this code better would be really helpful.

If I understand correctly, what you want is cur_yaml['organizations'] += [org_data].
Note that if you run the script many times, you will have the same entry multiple times.
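To avoid the repeated entries mentioned above, one option is to check whether an equal entry is already present before appending. A minimal sketch using plain Python lists and dicts as a stand-in for the loaded data; `append_unique` is a hypothetical helper name, not part of ruamel.yaml:

```python
def append_unique(entries, new_entry):
    """Append new_entry to entries unless an equal entry is already present.

    Returns True if the entry was added, False if it was a duplicate.
    """
    if new_entry in entries:
        return False
    entries.append(new_entry)
    return True


# Plain-Python stand-in for the data loaded from input.yml (assumed shape).
cur_yaml = {"organizations": [{"name": "abc", "description": "xyz"}]}
org_data = {"name": "def", "description": "pqr"}

first = append_unique(cur_yaml["organizations"], org_data)         # added
second = append_unique(cur_yaml["organizations"], dict(org_data))  # skipped
print(first, second, len(cur_yaml["organizations"]))
```

Comparing whole entries is only one possible definition of "duplicate"; comparing just the name would be an equally reasonable choice.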

Using update is not going to work, because the value for the key
organizations is a sequence and loads as the list-like type
CommentedSeq. So appending would be the right thing to do.
Why that doesn't work for you is unclear, since you provide neither the input
that you start with, nor the code used when doing the appending which gets
an AttributeError on CommentedMap.
Here is what works if you have one organization and add another:
import sys
import ruamel.yaml
yaml_str = """\
version: x.x.x
is_enterprise: 'true'
license: secrets/license.txt
organizations:
  - description: xyz
    name: abc
"""
org_data = dict(
    description='pqr',
    name='def',
)
yaml = ruamel.yaml.YAML()
yaml.indent(mapping=4, sequence=4, offset=2)
cur_yaml = yaml.load(yaml_str)
cur_yaml['organizations'].append(org_data)
yaml.dump(cur_yaml, sys.stdout)
This gives:
version: x.x.x
is_enterprise: 'true'
license: secrets/license.txt
organizations:
  - description: xyz
    name: abc
  - description: pqr
    name: def
If you have no organizations yet, make sure your input YAML looks like:
version: x.x.x
is_enterprise: 'true'
license: secrets/license.txt
organizations: []
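If you cannot guarantee that the file already contains an organizations list, the script can create the empty list itself before appending. A sketch on a plain dict standing in for the loaded data; `ensure_list` is a hypothetical helper, and the handling of None assumes that an empty YAML value loads as None:

```python
def ensure_list(mapping, key):
    """Return mapping[key] as a list, creating an empty list when the key
    is missing or its value is None (an empty value in YAML loads as None)."""
    if mapping.get(key) is None:
        mapping[key] = []
    return mapping[key]


# Stand-ins for two possible starting files (assumed shapes).
no_key = {"version": "x.x.x"}
empty_value = {"version": "x.x.x", "organizations": None}

for cur_yaml in (no_key, empty_value):
    ensure_list(cur_yaml, "organizations").append(
        {"description": "pqr", "name": "def"}
    )

print(no_key["organizations"], empty_value["organizations"])
```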
On older versions of Python, the order of the keys in the data you add is not guaranteed. To
enforce that order on older versions as well, do:
org_data = ruamel.yaml.comments.CommentedMap((('description', 'pqr'), ('name', 'def')))
or
org_data = ruamel.yaml.comments.CommentedMap()
org_data['description'] = 'pqr'
org_data['name'] = 'def'

Found the issue, and it's working fine now. Since organizations has to hold a list of name/description entries, I added [] to the code below and it started working.
org_data = dict(
    organizations=[dict(
        name=tower_organization.org_name,
        description=tower_organization.org_description,
        )
    ]
)
In addition to the above, I guess append wasn't working because the hyphen '-' was missing from the first object as part of the indentation. After fixing the above code, append works fine as well.
Thank you all for your answers.
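The difference between a single mapping and a list of mappings under organizations can be reproduced with plain Python types, which behave the same way for this purpose. A small sketch (the variable names are illustrative only):

```python
first = {"name": "abc", "description": "xyz"}
second = {"name": "def", "description": "pqr"}

# Shape 1: 'organizations' is a single mapping. update() replaces the
# values of the existing keys, so the first organization is lost.
as_mapping = {"organizations": dict(first)}
as_mapping["organizations"].update(second)

# A mapping has no append() method, which matches the AttributeError
# reported in the question.
try:
    as_mapping["organizations"].append(second)
    append_failed = False
except AttributeError:
    append_failed = True

# Shape 2: 'organizations' is a list of mappings, so append() adds an entry.
as_sequence = {"organizations": [dict(first)]}
as_sequence["organizations"].append(second)

print(as_mapping, append_failed, as_sequence)
```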

Related

Checking for Duplicates twice over in a File - Python

config.yml example:
DBtables:
  CurrentMinuteLoad:
    CSV_File: trend.csv
    Table_Name: currentminuteload
(GUI image not included)
This may not be the cleanest route to take.
I'm making a GUI that creates a config.yml file for another python script I'm working with.
Using PySimpleGUI, my button isn't functioning the way I'd expect it to. It correctly checks for the reference name (CurrentMinuteLoad in the example above) and kicks it back if it exists, but it skips the check for the table name (the elif branch never matches). Adding the table still works; I'm just not getting the double check that I want. Also, I have to hit the Okay button twice in the GUI for it to work, a quirk that doesn't make sense to me.
def add_table():
    window2.read()
    with open ("config.yml","r") as h:
        if values['new_ref'] in h.read():
            sg.popup('Reference name already exists')
        elif values['new_db'] in h.read():
            sg.popup('Table name already exists')
        else:
            with open("config.yml", "a+") as f:
                f.write("\n " + values['new_ref'] +":")
                f.write("\n CSV_File:" + values['new_csv'])
                f.write("\n Table_Name:" + values['new_db'])
                f.close()
            sg.popup('The reference "' + values['new_ref'] + '" has been included and will add the table "' + values['new_db'] + '" to PG Admin during the next scheduled upload')

When you call h.read(), you should save the result: the file is read like a stream, so subsequent calls to this method return an empty string.
Try editing the code like this:
with open ("config.yml","r") as h:
    content = h.read()
    if values['new_ref'] in content:
        sg.popup('Reference name already exists')
    elif values['new_db'] in content:
        sg.popup('Table name already exists')
    else:
        # ...
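The stream behaviour is easy to reproduce without a real file. A minimal sketch that uses `io.StringIO` as a stand-in for the opened config.yml (the content is an assumed sample):

```python
import io

# In-memory stand-in for the file handle returned by open("config.yml").
h = io.StringIO(
    "DBtables:\n"
    "  CurrentMinuteLoad:\n"
    "    Table_Name: currentminuteload\n"
)

first_read = h.read()   # consumes the whole stream
second_read = h.read()  # the position is now at the end, so this is empty

print("currentminuteload" in first_read)
print("currentminuteload" in second_read)
```

This is why the elif branch in the question can never match: by the time it runs, the first h.read() has already consumed the file.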

You should update the YAML file using a real YAML parser. That allows you
to check for duplicate values without using in, which gives false
positives when a new value is a substring of an existing value (or key).
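The false positive can be seen without any YAML library. In this sketch the nested dict is written by hand to mirror what a parser would return for the text; it is a stand-in, not actual parser output:

```python
text = """DBtables:
  CurrentMinuteLoad:
    CSV_File: trend.csv
    Table_Name: currentminuteload
"""

# Hand-written equivalent of the parsed document (assumed structure).
data = {
    "DBtables": {
        "CurrentMinuteLoad": {
            "CSV_File": "trend.csv",
            "Table_Name": "currentminuteload",
        }
    }
}

new_ref = "CurrentMinuteL"

substring_match = new_ref in text        # substring test on the raw text
key_match = new_ref in data["DBtables"]  # exact comparison against the keys

print(substring_match, key_match)
```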
In the following I add values twice, and show the resulting YAML. The
first time around the check on new_ref and new_db does not find
a match although it is a substring of existing values. The second time
using the same values there is of course a match on the previously added
values.
import sys
import ruamel.yaml
from pathlib import Path
def add_table(filename, values, verbose=False):
    error = False
    yaml = ruamel.yaml.YAML()
    data = yaml.load(filename)
    dbtables = data['DBtables']
    if values['new_ref'] in dbtables:
        print(f'Reference name "{values["new_ref"]}" already exists') # use sg.popup in your code
        error = True
    for k, v in dbtables.items():
        if values['new_db'] in v.values():
            print(f'Table name "{values["new_db"]}" already exists')
            error = True
    if error:
        return
    dbtables[values['new_ref']] = d = {}
    for x in ['new_cv', 'new_db']:
        d[x] = values[x]
    yaml.dump(data, filename)
    if verbose:
        sys.stdout.write(filename.read_text())
values = dict(new_ref='CurrentMinuteL', new_cv='trend_csv', new_db='currentminutel')
add_table(Path('config.yaml'), values, verbose=True)
print('========')
add_table(Path('config.yaml'), values, verbose=True)
which gives:
DBtables:
  CurrentMinuteLoad:
    CSV_File: trend.csv
    Table_Name: currentminuteload
  CurrentMinuteL:
    new_cv: trend_csv
    new_db: currentminutel
========
Reference name "CurrentMinuteL" already exists
Table name "currentminutel" already exists

Access yaml key if absent or empty

I want to share some ideas about how to access a key in a yaml file if the key may be empty or absent and see if there are other ways.
Consider three possible yaml files called 'dummy.yml':
# variant 1: station_type absent
model: phone
mode: production
# variant 2: station_type present
model: phone
station_type: tester1
mode: production
# variant 3: station_type empty
model: phone
station_type:
mode: production
In Python we load the yaml into a python dictionary:
import yaml
with open("dummy.yml", 'r') as stream:
    try:
        # safe_load needs no Loader argument; current PyYAML requires one for yaml.load
        config = yaml.safe_load(stream)
    except yaml.YAMLError as exc:
        print(exc)
Now I want to access the station_type key. If the key is missing or the value is empty, we should get an empty string as the default value.
Here are some options for accessing station_type:
# KeyError if station_type is missing:
st = config["station_type"]
# provides an empty string as the default value if the key is absent,
# but returns None if the key is present and its value is empty:
st = config.get("station_type", "")
# my solution that returns a string in all three cases
st = ('' if not config.get("station_type") else config.get("station_type"))
Does this seem about right?

Here is some code testing your solution against a working solution:
import yaml
def orig_solution(config):
    return ('' if not config.get("station_type") else config.get("station_type"))
def working_solution(config):
    return tmp if (tmp := config.get("station_type")) is not None else ""
def test(name, solution, config):
    print("\n{}:".format(name))
    val = solution(config)
    print("got value: {}".format(val))
config1 = yaml.safe_load("""
model: phone
mode: production
""")
config2 = yaml.safe_load("""
model: phone
station_type:
mode: production
""")
config3 = yaml.safe_load("""
model: phone
station_type: false
mode: production
""")
test("config1 with orig solution", orig_solution, config1)
test("config2 with orig solution", orig_solution, config2)
test("config3 with orig solution", orig_solution, config3)
test("config1 with working solution", working_solution, config1)
test("config2 with working solution", working_solution, config2)
test("config3 with working solution", working_solution, config3)
Output:
config1 with orig solution:
got value:
config2 with orig solution:
got value:
config3 with orig solution:
got value:
config1 with working solution:
got value:
config2 with working solution:
got value:
config3 with working solution:
got value: False
As you can see, your code returns the empty string on config3, which has false as the value. The working solution returns Python's False. The other inputs work with both solutions.
Now I am not completely sure which is the expected behavior in this case, but according to how you describe it, False should be the correct return value here and therefore your solution has a bug.
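If the walrus operator is not available (it requires Python 3.8 or later), the same None check can be written as a small helper. A sketch; `get_or_default` is a hypothetical name, and the three dicts are hand-written stand-ins for the loaded configs:

```python
def get_or_default(config, key, default=""):
    """Return config[key], or default when the key is missing or its value is None.

    Falsy values other than None (False, 0, '') are returned unchanged.
    """
    value = config.get(key)
    return default if value is None else value


config1 = {"model": "phone", "mode": "production"}
config2 = {"model": "phone", "station_type": None, "mode": "production"}
config3 = {"model": "phone", "station_type": False, "mode": "production"}

results = [get_or_default(c, "station_type") for c in (config1, config2, config3)]
print(results)  # ['', '', False]
```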

get comment during iteration in ruamel.yaml

How can I get the comments when I iterate through the YAML object?
yaml = YAML()
with open(path, 'r') as f:
    yaml_data = yaml.load(f)
for obj in yaml_data:
    # how to get the comments here?
This is the source data (an ansible playbook)
---
- name: gather all complex custom facts using the custom module
  hosts: switches
  gather_facts: False
  connection: local
  tasks:
    # There is a bug in ansible 2.4.1 which prevents it loading
    # playbook/group_vars
    - name: ensure we're running a known working version
      assert:
        that:
          - 'ansible_version.major == 2'
          - 'ansible_version.minor == 4'
After Anthon's comments, this is the way I found to access the comments in the child nodes (needs to be refined):
for idx, obj in enumerate(yaml_data):
    for i, item in enumerate(obj.items()):
        pprint(yaml_data[i].ca.items)

You did not specify your input, but since your code expects an obj and
not a key, I assume the root level of your YAML is a sequence and not mapping.
If you want to get the comments after each element (i.e nr 1 and the last) you can do:
import ruamel.yaml
yaml_str = """\
- one # nr 1
- two
- three # the last
"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
for idx, obj in enumerate(data):
    comment_token = data.ca.items.get(idx)
    if comment_token is None:
        continue
    print(repr(comment_token[0].value))
which gives:
'# nr 1\n'
'# the last\n'
You might want to strip off the leading octothorpe and trailing newline.
Please note that this works with the current version (0.15.61), but
there is no guarantee that it will not change.
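Stripping the octothorpe and newline needs only string methods. A minimal sketch operating on the comment strings shown above; `clean_comment` is a hypothetical helper name:

```python
def clean_comment(value):
    """Remove surrounding whitespace and the leading '#' from a comment string."""
    return value.strip().lstrip("#").strip()


print(clean_comment("# nr 1\n"))
print(clean_comment("# the last\n"))
```

Note that `lstrip("#")` removes every leading `#`, so a comment written as `## note` would also come back as `note`.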

Using the example from Anthon as well as an issue in ruamel.yaml on sourceforge, here's a set of methods which should allow you to retrieve (almost - see below) all the comments in your documents:
from ruamel.yaml import YAML
from ruamel.yaml.comments import CommentedMap, CommentedSeq
# set attributes
def get_comments_map(self, key):
    coms = []
    comments = self.ca.items.get(key)
    if comments is None:
        return coms
    for token in comments:
        if token is None:
            continue
        elif isinstance(token, list):
            coms.extend(token)
        else:
            coms.append(token)
    return coms
def get_comments_seq(self, idx):
    coms = []
    comments = self.ca.items.get(idx)
    if comments is None:
        return coms
    for token in comments:
        if token is None:
            continue
        elif isinstance(token, list):
            coms.extend(token)
        else:
            coms.append(token)
    return coms
setattr(CommentedMap, 'get_comments', get_comments_map)
setattr(CommentedSeq, 'get_comments', get_comments_seq)
# load string
yaml_str = """\
- name: gather all complex custom facts using the custom module
  hosts: switches
  gather_facts: False
  connection: local
  tasks:
    # There is a bug in ansible 2.4.1 which prevents it loading
    # playbook/group_vars
    - name: ensure we're running a known working version
      assert:
        that:
          - 'ansible_version.major == 2'
          - 'ansible_version.minor == 4'
"""
yml = YAML(typ='rt')
data = yml.load(yaml_str)
def walk_data(data):
    if isinstance(data, CommentedMap):
        for k, v in data.items():
            print(k, [ comment.value for comment in data.get_comments(k)])
            if isinstance(v, CommentedMap) or isinstance(v, CommentedSeq):
                walk_data(v)
    elif isinstance(data, CommentedSeq):
        for idx, item in enumerate(data):
            print(idx, [ comment.value for comment in data.get_comments(idx)])
            if isinstance(item, CommentedMap) or isinstance(item, CommentedSeq):
                walk_data(item)
walk_data(data)
Here's the output:
0 []
name []
hosts []
gather_facts []
connection []
tasks ['# There is a bug in ansible 2.4.1 which prevents it loading\n', '# playbook/group_vars\n']
0 []
name []
assert []
that []
0 []
1 []
Unfortunately, there are two problems that I have encountered which are not covered by this method:
1. You will notice that there is no leading \n in the comments for tasks. As a result, it is not possible with this method to differentiate between comments which start on the same line as tasks and those which start on the next line. Since CommentToken.start_mark.line only contains the absolute line of the comment, it might be possible to compare it to the line of tasks. But I have not yet found a way to retrieve the line associated with tasks inside the loaded data.
2. I have not found a way to retrieve comments at the head of the document with this method, so any initial comments need to be retrieved another way. Related to problem 1, these head comments are included in the absolute line count of other comments. To get the comments at the head of the document, you need to use [comment.value for comment in data.ca.comment[1]] as per this explanation by Anthon.

Python string comparison using pymarc marc8_to_unicode no longer working

My code imports a MARC file using MARCReader and compares a string against a list of acceptable answers. If the string from MARC has no match in my list, it gets added to an error list. This has worked for years in Python 2.7.4 installations on Windows 7 with no issue. I recently got a Windows 10 machine and installed Python 2.7.10, and now strings with non-standard characters fail that match. The issue is not Python 2.7.10 alone; I've installed every version from 2.7.4 through 2.7.10 on this new machine and get the same problem. A new install of Python 2.7.10 on a Windows 7 machine also shows the problem.
I've trimmed out functions that aren't relevant, and I've dramatically trimmed the master list. In this example, "Académie des Sciences" is an existing repository, but "Acadm̌ie des Sciences" now appears in our list of new repositories.
# -*- coding: utf-8 -*-
from aipmarc import get_catdb, get_bibno, parse_date
from phfawstemplate import browsepage #, nutchpage, eadpage, titlespage
from pymarc import MARCReader, marc8_to_unicode
from time import strftime
from umlautsort import alafiling
import urllib2
import sys
import os
import string
def make_newrepos_list(list, fn): # Create list of unexpected repositories found in the MArcout database dump
    output = "These new repositories are not yet included in the master list in phfaws.py. Please add the repository code (in place of ""NEWCODE*""), and the URL (in place of ""TEST""), and then add these lines to phfaws.py. Please keep the list alphabetical. \nYou can find repository codes at http://www.loc.gov/marc/organizations/ \n \n"
    for row in list:
        output = '%s reposmasterlist.append([u"%s", "%s", "%s"])\n' % (output, row[0], row[1], row[2])
    fh = open(fn,'w')
    fh.write(output.encode("utf-8"))
    fh.close()
def main(marcfile):
    reader = MARCReader(file(marcfile))
    '''
    Creating list of preset repository codes.
    '''
    reposmasterlist =[[u"American Institute of Physics", "MdCpAIP", "http://www.aip.org/history/nbl/index.html"]]
    reposmasterlist.append([u"Académie des Sciences", "FrACADEMIE", "http://www.academie-sciences.fr/fr/Transmettre-les-connaissances/inventaires-des-fonds-d-archives-personnelles.html"])
    reposmasterlist.append([u"American Association for the Advancement of Science", "daaas", "http://archives.aaas.org/"])
    newreposcounter = 0
    newrepos = ""
    newreposlist = []
    findingaidcounter = 0
    reposcounter = 0
    for record in reader:
        if record['903']: # Get only records where 903a="PHFAWS"
            phfawsfull = record.get_fields('903')
            for field in phfawsfull:
                phfawsnote = field['a']
                if 'PHFAWS' in phfawsnote:
                    if record['852'] is not None: # Get only records where 852/repository is not blank
                        repository = record.get_fields('852')
                        for field in repository:
                            reposname = field['a']
                            reposname = marc8_to_unicode(reposname) # Convert repository name from MARC file to Unicode
                            reposname = reposname.rstrip('.,')
                            reposcode = None
                            reposurl = None
                            for row in reposmasterlist: # Match field 852 repository against the master list.
                                if row[0] == reposname: # If it's in the master list, use the master list to populate our repository-related fields
                                    reposcode = row[1]
                                    reposurl = row[2]
                        if record['856'] is not None: # Get only records where 856 is not blank and includes "online finding aid"
                            links = record.get_fields('856')
                            for field in links:
                                linksthree = field['3']
                                if linksthree is not None and "online finding aid" in linksthree:
                                    if reposcode == None: # If this record's repository wasn't in the master list, add to list of new repositories
                                        newreposcounter += 1
                                        newrepos = '%s %s \n' % (newrepos, reposname)
                                        reposcode = "NEWCODE" + str(newreposcounter)
                                        reposurl = "TEST"
                                        reposmasterlist.append([reposname, reposcode, reposurl])
                                        newreposlist.append([reposname, reposcode, reposurl])
                                    human_url = field['u']
                                else:
                                    pass
                        else:
                            pass
                    else:
                        pass
                else:
                    pass
        else:
            pass
    # Output list of new repositories
    newreposlist.sort(key = lambda rep: rep[0])
    if newreposcounter != 0:
        status = '%d new repositories found. you must add information on these repositories, then run phfaws.py again. Please see the newly updated rewrepos.txt for details.' % (newreposcounter)
        sys.stderr.write(status)
        make_newrepos_list(newreposlist, 'newrepos.txt')
if __name__ == '__main__':
    try:
        mf = sys.argv[1]
        sys.exit(main(mf))
    except IndexError:
        sys.exit('Usage: %s <marcfile>' % sys.argv[0])
Edit: I've found that simply commenting out the "reposname = marc8_to_unicode(reposname)" line gets me the results I want. I still don't understand why this is, since it was a necessary step before.

Edit: I've found that simply commenting out the "reposname = marc8_to_unicode(reposname)" line gets me the results I want. I still don't understand why this is, since it was a necessary step before.
This suggests to me that the encoding of strings in your database changed from MARC8 to Unicode. Have you upgraded your cataloging system recently?

Userfriendly way of handling config files in python?

I want to write a program that sends an e-mail to one or more specified recipients when a certain event occurs. For this I need the user to write the parameters for the mail server into a config file. Possible values are, for example: server address, ports, ssl (true/false) and a list of desired recipients.
What's the most user-friendly, best-practice way to do this?
I could of course use a python file with the correct parameters and the user has to fill it out, but I wouldn't consider this user friendly. I also read about the 'config' module in python, but it seems to me that it's made for creating config files on its own, and not to have users fill the files out themselves.

Are you saying that the fact that the config file would need to be valid Python makes it unfriendly? It seems like having lines in a file like:
server = 'mail.domain.com'
port = 25
...etc would be intuitive enough while still being valid Python. If you don't want the user to have to know that they have to quote strings, though, you might go the YAML route. I use YAML pretty much exclusively for config files and find it very intuitive, and it would also be intuitive for an end user I think (though it requires a third-party module - PyYAML):
server: mail.domain.com
port: 25
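Having PyYAML load it is simple (safe_load is used here because current PyYAML versions require an explicit Loader argument for yaml.load):
>>> import yaml
>>> yaml.safe_load("""a: 1
... b: foo
... """)
{'a': 1, 'b': 'foo'}
With a file it's easy too.
>>> with open('myconfig.yaml', 'r') as cfile:
...     config = yaml.safe_load(cfile)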
...
config now contains all of the parameters.

It doesn't matter how technically proficient your users are; you can count on them to screw up editing a text file. (They'll save it in the wrong place. They'll use MS Word to edit a text file. They'll make typos.) I suggest making a GUI that validates the input and creates the configuration file in the correct format and location. A simple GUI created in Tkinter would probably fit your needs.

I've been using ConfigParser. It's designed to read .ini style files that have:
[section]
option = value
It's quite easy to use and the documentation is pretty easy to read. Basically you just load the whole file into a ConfigParser object:
import ConfigParser
config = ConfigParser.ConfigParser()
config.read('configfile.txt')
Then you can make sure the users haven't messed anything up by checking the options. I do so with a list:
OPTIONS = [
    'section,option,defaultvalue',
    # ...
]
for opt in OPTIONS:
    section,option,defaultval = opt.split(',')
    if not config.has_option(section,option):
        print "Missing option %s in section %s" % (option,section)
Getting the values out is easy too.
val = config.get('section','option')
And I also wrote a function that creates a sample config file using that OPTIONS list.
new_config = ConfigParser.ConfigParser()
for opt in OPTIONS:
    section,option,defaultval = opt.split(',')
    if not new_config.has_section(section):
        new_config.add_section(section)
    new_config.set(section, option, defaultval)
with open("sample_configfile.txt", 'wb') as newconfigfile:
    new_config.write(newconfigfile)
print "Generated file: sample_configfile.txt"
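The code above is written for Python 2. On Python 3 the module is named `configparser`, and the same missing-option check might look like the sketch below. The section and option names in `REQUIRED` are hypothetical, and `read_string` is used only so the example runs without a file on disk:

```python
import configparser

# Hypothetical list of (section, option) pairs the program needs.
REQUIRED = [("mail", "server"), ("mail", "port")]


def missing_options(config, required):
    """Return the (section, option) pairs that are absent from config."""
    return [(s, o) for s, o in required if not config.has_option(s, o)]


config = configparser.ConfigParser()
# A user-edited file that forgot the port; config.read(path) would be used
# for a real file.
config.read_string("[mail]\nserver = mail.domain.com\n")

missing = missing_options(config, REQUIRED)
for section, option in missing:
    print("Missing option %s in section %s" % (option, section))
```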

What are the drawbacks of such a solution:
ch = 'serveradress = %s\nport = %s\nssl = %s'
a = raw_input("Enter the server's address : ")
b = 'a'
bla = "\nEnter the port : "
while not all(x.isdigit() for x in b):
    b = raw_input(bla)
    bla = "Take care: you must enter digits exclusively\n"\
          +" Re-enter the port (digits only) : "
c = ''
bla = "\nChoose the ssl option (t or f) : "
while c not in ('t','f'):
    c = raw_input(bla)
    bla = "Take care: you must type f or t exclusively\n"\
          +" Re-choose the ssl option : "
with open('configfile.txt','w') as f:
    f.write(ch % (a,b,c))
.
PS
I've read in jonesy's post that the values in a config file may have to be quoted. If so, and you don't want the user to have to type the quotes, you simply add
a = a.join('""')
b = b.join('""')
c = c.join('""')
.
EDIT
ch = 'serveradress = %s\nport = %s\nssl = %s'
d = {0:('',
        "Enter the server's address : "),
     1:("Take care: you must enter digits exclusively",
        "Enter the port : "),
     2:("Take care: you must type f or t exclusively",
        "Choose the ssl option (t or f) : ") }
def func(i,x):
    if x is None:
        return False
    if i==0:
        return True
    elif i==1:
        try:
            ess = int(x)
            return True
        except:
            return False
    elif i==2:
        if x in ('t','f'):
            return True
        else:
            return False
li = len(d)*[None]
L = range(len(d))
while True:
    for n in sorted(L):
        bla = d[n][1]
        val = None
        while not func(n,val):
            val = raw_input(bla)
            bla = '\n '.join(d[n])
        li[n] = val.join('""')
    decision = ''
    disp = "\n====== If you choose to process, =============="\
           +"\n the content of the file will be :\n\n" \
           + ch % tuple(li) \
           + "\n==============================================="\
           + "\n\nDo you want to process (type y) or to correct (type c) : "
    while decision not in ('y','c'):
        decision = raw_input(disp)
        disp = "Do you want to process (type y) or to correct (type c) ? : "
    if decision=='y':
        break
    else:
        diag = False
        while not diag:
            vi = '\nWhat lines do you want to correct ?\n'\
                 +'\n'.join(str(j)+' - '+line for j,line in enumerate((ch % tuple(li)).splitlines()))\
                 +'\nType numbers of lines belonging to range(0,'+str(len(d))+') separated by spaces) :\n'
            to_modify = raw_input(vi)
            try:
                diag = all(int(entry) in xrange(len(d)) for entry in to_modify.split())
                L = [int(entry) for entry in to_modify.split()]
            except:
                diag = False
with open('configfile.txt','w') as f:
    f.write(ch % tuple(li))
print '-*- Recording of the config file : done -*-'
