How can I get the comments when I iterate through the YAML object
from ruamel.yaml import YAML

yaml = YAML()
with open(path, 'r') as f:
    yaml_data = yaml.load(f)
for obj in yaml_data:
    # how to get the comments here?
This is the source data (an Ansible playbook):
---
- name: gather all complex custom facts using the custom module
  hosts: switches
  gather_facts: False
  connection: local
  tasks:
    # There is a bug in ansible 2.4.1 which prevents it loading
    # playbook/group_vars
    - name: ensure we're running a known working version
      assert:
        that:
          - 'ansible_version.major == 2'
          - 'ansible_version.minor == 4'
After Anthon's comments, this is the way I found to access the comments in the child nodes (it still needs to be refined):
for idx, obj in enumerate(yaml_data):
    for i, item in enumerate(obj.items()):
        pprint(yaml_data[i].ca.items)
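A slightly cleaner, untested sketch of that loop: instead of re-indexing the top-level sequence, ask each child mapping for its own comment attachments (path is the same variable as above):
from ruamel.yaml import YAML

yaml = YAML()
with open(path, 'r') as f:
    yaml_data = yaml.load(f)
for obj in yaml_data:        # each play is a CommentedMap
    for key in obj:
        comment = obj.ca.items.get(key)
        if comment is not None:
            print(key, comment)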
You did not specify your input, but since your code expects an obj and
not a key, I assume the root level of your YAML is a sequence and not a mapping.
If you want to get the comments after each element (i.e. nr 1 and the last) you can do:
import ruamel.yaml
yaml_str = """\
- one # nr 1
- two
- three # the last
"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
for idx, obj in enumerate(data):
    comment_token = data.ca.items.get(idx)
    if comment_token is None:
        continue
    print(repr(comment_token[0].value))
which gives:
'# nr 1\n'
'# the last\n'
You might want to strip off the leading octothorpe and trailing newline.
Please note that this works with the current version (0.15.61), but
there is no guarantee it won't change.
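For example, a minimal sketch of that stripping, assuming the token layout shown above:
import ruamel.yaml

yaml = ruamel.yaml.YAML()
data = yaml.load("- one  # nr 1\n- two\n")
for idx, tokens in data.ca.items.items():
    raw = tokens[0].value            # e.g. '# nr 1\n'
    print(raw.lstrip('#').strip())   # -> 'nr 1'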
Using the example from Anthon, as well as an issue in ruamel.yaml on SourceForge, here's a set of methods that should allow you to retrieve (almost; see below) all the comments in your documents:
from ruamel.yaml import YAML
from ruamel.yaml.comments import CommentedMap, CommentedSeq
# set attributes
def get_comments_map(self, key):
    coms = []
    comments = self.ca.items.get(key)
    if comments is None:
        return coms
    for token in comments:
        if token is None:
            continue
        elif isinstance(token, list):
            coms.extend(token)
        else:
            coms.append(token)
    return coms


def get_comments_seq(self, idx):
    coms = []
    comments = self.ca.items.get(idx)
    if comments is None:
        return coms
    for token in comments:
        if token is None:
            continue
        elif isinstance(token, list):
            coms.extend(token)
        else:
            coms.append(token)
    return coms

setattr(CommentedMap, 'get_comments', get_comments_map)
setattr(CommentedSeq, 'get_comments', get_comments_seq)
# load string
yaml_str = """\
- name: gather all complex custom facts using the custom module
hosts: switches
gather_facts: False
connection: local
tasks:
# There is a bug in ansible 2.4.1 which prevents it loading
# playbook/group_vars
- name: ensure we're running a known working version
assert:
that:
- 'ansible_version.major == 2'
- 'ansible_version.minor == 4'
"""
yml = YAML(typ='rt')
data = yml.load(yaml_str)
def walk_data(data):
    if isinstance(data, CommentedMap):
        for k, v in data.items():
            print(k, [comment.value for comment in data.get_comments(k)])
            if isinstance(v, CommentedMap) or isinstance(v, CommentedSeq):
                walk_data(v)
    elif isinstance(data, CommentedSeq):
        for idx, item in enumerate(data):
            print(idx, [comment.value for comment in data.get_comments(idx)])
            if isinstance(item, CommentedMap) or isinstance(item, CommentedSeq):
                walk_data(item)


walk_data(data)
Here's the output:
0 []
name []
hosts []
gather_facts []
connection []
tasks ['# There is a bug in ansible 2.4.1 which prevents it loading\n', '# playbook/group_vars\n']
0 []
name []
assert []
that []
0 []
1 []
Unfortunately, there are two problems that I have encountered which are not covered by this method:
You will notice that there is no leading \n in the comments for tasks. As a result, it is not possible with this method to differentiate between comments which start on the same line as tasks and comments which start on the next line. Since CommentToken.start_mark.line contains the absolute line of the comment, it might be possible to compare it to the line of tasks. I have not yet found a reliable way to retrieve the line associated with tasks inside the loaded data, though the hedged sketch after this list shows one candidate (.lc).
There does not seem to be a way that I have found yet to retrieve comments at the head of the document, so any initial comments would need to be discovered outside the YAML reader. But, related to problem #1, these head comments are included in the absolute line count of other comments. To get the comments at the head of the document, you need to use [comment.value for comment in data.ca.comment[1]], as per this explanation by Anthon.
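A hedged sketch of both points (version-dependent behavior; the .lc attribute gives the line/column of a key, which can be compared against a CommentToken's absolute start_mark.line):
import ruamel.yaml

yaml_str = """\
# a comment at the head of the document
- name: example
  tasks:  # same-line comment
    - one
"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)

# comments at the head of the document hang off data.ca.comment[1]
if data.ca.comment is not None:
    print([c.value for c in data.ca.comment[1]])

# .lc.key() returns the (line, col) of a key inside the loaded data;
# the line is absolute, like CommentToken.start_mark.line, so the two
# can be compared to tell same-line comments from next-line ones
play = data[0]
print(play.lc.key('tasks'))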
Related
I have a Lambda Python function that I inherited which searches and reports on installed packages on EC2 instances. It pulls this information from SSM Inventory, where the results are output to an S3 bucket. All of the installed packages have had specific names until now. Now we need to report on Palo Alto Cortex XDR. The issue I'm facing is that this product includes the version number in the name, and we have different versions installed. If I use the exact name (i.e. Cortex XDR 7.8.1.11343) I get reporting on that particular version but not the others. I want to use a wildcard to do this. I import regex (import re) on line 7 and then change line 71 to xdr=line['Cortex*']), but it gives me the following error. I'm a bit new to Python and coding, so any explanation of what I'm doing wrong would be helpful.
File "/var/task/SoeSoftwareCompliance/RequiredSoftwareEmail.py", line 72, in build_html
xdr=line['Cortex*'])
import configparser
import logging
import csv
import json
from jinja2 import Template
import boto3
import re

# config
config = configparser.ConfigParser()
config.read("config.ini")

# logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)


# #TODO
# refactor common_csv_header so that we use one with variable
# so that we write all content to one template file.
def build_html(account=None,
               ses_email_address=None,
               recipient_email=None):
    """
    :param recipient_email:
    :param ses_email_address:
    :param account:
    """
    account_id = account["id"]
    account_alias = account["alias"]
    linux_ec2s = []
    windows_ec2s = []
    ec2s_not_in_ssm = []
    excluded_ec2s = []
    # linux ec2s html
    with open(f"/tmp/{account_id}_linux_ec2s_required_software_report.csv", "r") as fp:
        lines = csv.DictReader(fp)
        for line in lines:
            if line["platform-type"] == "Linux":
                item = dict(id=line['instance-id'],
                            name=line['instance-name'],
                            ip=line['ip-address'],
                            ssm=line['amazon-ssm-agent'],
                            cw=line['amazon-cloudwatch-agent'],
                            ch=line['cloudhealth-agent'])
                # skip compliant linux ec2s where all values are found
                compliance_status = not all(item.values())
                if compliance_status:
                    linux_ec2s.append(item)
    # windows ec2s html
    with open(f"/tmp/{account_id}_windows_ec2s_required_software_report.csv", "r") as fp:
        lines = csv.DictReader(fp)
        for line in lines:
            if line["platform-type"] == "Windows":
                item = dict(id=line['instance-id'],
                            name=line['instance-name'],
                            ip=line['ip-address'],
                            ssm=line['Amazon SSM Agent'],
                            cw=line['Amazon CloudWatch Agent'],
                            ch=line['CloudHealth Agent'],
                            mav=line['McAfee VirusScan Enterprise'],
                            trx=line['Trellix Agent'],
                            xdr=line['Cortex*'])
                # skip compliant windows ec2s where all values are found
                compliance_status = not all(item.values())
                if compliance_status:
                    windows_ec2s.append(item)
    # ec2s not found in ssm
    with open(f"/tmp/{account_id}_ec2s_not_in_ssm.csv", "r") as fp:
        lines = csv.DictReader(fp)
        for line in lines:
            item = dict(name=line['instance-name'],
                        id=line['instance-id'],
                        ip=line['ip-address'],
                        pg=line['patch-group'])
            ec2s_not_in_ssm.append(item)
    # display or hide excluded ec2s from report
    display_excluded_ec2s_in_report = json.loads(config.get("settings", "display_excluded_ec2s_in_report"))
    if display_excluded_ec2s_in_report == "true":
        with open(f"/tmp/{account_id}_excluded_from_compliance.csv", "r") as fp:
            lines = csv.DictReader(fp)
            for line in lines:
                item = dict(id=line['instance-id'],
                            name=line['instance-name'],
                            pg=line['patch-group'])
                excluded_ec2s.append(item)
    # pass data to html template
    with open('templates/email.html') as file:
        template = Template(file.read())
    # pass parameters to template renderer
    html = template.render(
        linux_ec2s=linux_ec2s,
        windows_ec2s=windows_ec2s,
        ec2s_not_in_ssm=ec2s_not_in_ssm,
        excluded_ec2s=excluded_ec2s,
        account_id=account_id,
        account_alias=account_alias)
    # consolidated html with multiple tables
    tables_html_code = html
    client = boto3.client('ses')
    client.send_email(
        Destination={
            'ToAddresses': [recipient_email],
        },
        Message={
            'Body': {
                'Html':
                    {'Data': tables_html_code}
            },
            'Subject': {
                'Charset': 'UTF-8',
                'Data': f'SOE | Software Compliance | {account_alias}',
            },
        },
        Source=ses_email_address,
    )
    print(tables_html_code)
If I understand your problem correctly, you are getting a KeyError exception because Python dictionaries do not support wildcard lookups out of the box. A csv.DictReader creates a standard Python dictionary for each row in the CSV, and Python's dictionary is just an associative array without pattern matching.
You can implement this with a regex, though. If you have a dictionary line and you don't know the full name of a key you are looking for, you can solve it with the re.search function.
line = {'Cortex XDR 7.8.1.11343': 'Some value you are looking for'}
val = next(v for k, v in line.items() if re.search('Cortex.+', k))
print(val) # 'Some value you are looking for'
Be aware that this assumes the line dictionary contains at least one key matching the 'Cortex.+' pattern, and that it returns the first match only; you would have to refactor this a bit to change that behavior.
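For example, a minimal sketch of a more defensive variant (the dictionary contents here are made up):
import re

line = {'Cortex XDR 7.8.1.11343': 'installed', 'Amazon SSM Agent': '3.1'}

# passing a default to next() avoids a StopIteration when no key matches
xdr = next((v for k, v in line.items() if re.search(r'Cortex', k)), None)
print(xdr)  # 'installed'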
1. import os is missing from the code.
2. def build_html(account=None, ...): when account is passed as None, the error below is thrown at account["id"] and account["alias"].
Example:
Traceback (most recent call last):
  File "C:\Users\pro\Documents\project\pywilds.py", line 134, in <module>
    build_html(account=None)
  File "C:\Users\pro\Documents\project\pywilds.py", line 33, in build_html
    account_id = account["id"]
TypeError: 'NoneType' object is not subscriptable
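A minimal guard sketch, using the parameter names from the question's code; failing fast with a clear message beats the TypeError above:
def build_html(account=None, ses_email_address=None, recipient_email=None):
    if account is None:
        raise ValueError("build_html() needs an 'account' dict with 'id' and 'alias'")
    account_id = account["id"]
    account_alias = account["alias"]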
I hope it helps.
I am using ConfigParser to write some modifications to a configuration file. Basically, what I am doing is:
retrieve my urls from an API
write them in my config file
But after the edit, I noticed that the file format has changed:
Before the edit :
[global_tags]
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 10000
[[inputs.cpu]]
percpu = true
totalcpu = true
[[inputs.prometheus]]
urls= []
interval = "140s"
[inputs.prometheus.tags]
exp = "exp"
After the edit :
[global_tags]
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 10000
[[inputs.cpu]
percpu = true
totalcpu = true
[[inputs.prometheus]
interval = "140s"
response_timeout = "120s"
[inputs.prometheus.tags]
exp = "snmp"
The indentation changed and all the comments that were in the file have been deleted. My code:
edit = configparser.ConfigParser(strict=False, allow_no_value=True, empty_lines_in_values=False)
edit.read("file.conf")
edit.set("[section]", "urls", str(urls))
print(edit)

# Write changes back to file
with open('file.conf', 'w') as configfile:
    edit.write(configfile)
I have already tried SafeConfigParser and RawConfigParser, but it doesn't work.
When I do a print(edit.sections()), here is what I get: ['global_tags', 'agent', '[inputs.cpu', '[inputs.prometheus', 'inputs.prometheus.tags']
Is there any help, please?
Here's an example of a "filter" parser that retains all other formatting but changes the urls line in the agent section if it comes across it:
import io


def filter_config(stream, item_filter):
    """
    Filter a "config" file stream.

    :param stream: Text stream to read from.
    :param item_filter: Filter function; takes a section and a line and returns a filtered line.
    :return: Yields (possibly) filtered lines.
    """
    current_section = None
    for line in stream:
        stripped_line = line.strip()
        if stripped_line.startswith('['):
            current_section = stripped_line.strip('[]')
        elif not stripped_line.startswith("#") and " = " in stripped_line:
            line = item_filter(current_section, line)
        yield line


def urls_filter(section, line):
    if section == "agent" and line.strip().startswith("urls = "):
        start, sep, end = line.partition(" = ")
        return start + sep + "hi there...\n"  # keep the trailing newline
    return line
# Could be a disk file, just using `io.StringIO()` for self-containedness here
config_file = io.StringIO("""
[global_tags]
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 10000
# HELLO! THIS IS A COMMENT!
metric_buffer_limit = 100000
urls = ""
[other]
urls = can't touch this!!!
""")
for line in filter_config(config_file, urls_filter):
    print(line, end="")
The output is
[global_tags]
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 10000
# HELLO! THIS IS A COMMENT!
metric_buffer_limit = 100000
urls = hi there...
[other]
urls = can't touch this!!!
so you can see all comments and (mis-)indentation was preserved.
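To apply this to a file on disk instead of io.StringIO, a sketch of the round-trip might look like this (the file name is an assumption; reading everything first makes it safe to overwrite the same path):
with open("file.conf") as src:
    filtered = list(filter_config(src, urls_filter))
with open("file.conf", "w") as dst:
    dst.writelines(filtered)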
The problem is that you're passing brackets with the section name, which is unnecessary:
edit.set("[section]", "urls", str(urls))
See this example from the documentation:
import configparser
config = configparser.RawConfigParser()
# Please note that using RawConfigParser's set functions, you can assign
# non-string values to keys internally, but will receive an error when
# attempting to write to a file or when you get it in non-raw mode. Setting
# values using the mapping protocol or ConfigParser's set() does not allow
# such assignments to take place.
config.add_section('Section1')
config.set('Section1', 'an_int', '15')
config.set('Section1', 'a_bool', 'true')
config.set('Section1', 'a_float', '3.1415')
config.set('Section1', 'baz', 'fun')
config.set('Section1', 'bar', 'Python')
config.set('Section1', 'foo', '%(bar)s is %(baz)s!')
# Writing our configuration file to 'example.cfg'
with open('example.cfg', 'w') as configfile:
config.write(configfile)
But, in any case, it won't preserve the indentation, nor will it support nested sections. You could try the YAML format, which does allow using indentation to separate nested sections, although it won't keep the exact same indentation when saving; do you really need it to be exactly the same? There are various configuration formats out there, and you should study them to see which fits your case better.
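For completeness, a sketch of the corrected set() call, keeping the question's parser options (the section names are whatever edit.sections() actually reports; for the double-bracketed [[inputs.prometheus]] headers that will be the odd-looking '[inputs.prometheus'):
import configparser

edit = configparser.ConfigParser(strict=False, allow_no_value=True,
                                 empty_lines_in_values=False)
edit.read("file.conf")
edit.set("agent", "interval", '"10s"')  # bare section name, not "[agent]"
with open("file.conf", "w") as configfile:
    edit.write(configfile)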
I am trying to append a duplicate key: value pair to a nested dictionary in a YAML file through a Python script. The following is the snippet of the code I have written to achieve this:
import click
import ruamel.yaml
def organization():
    org_num = int(input("Please enter the number of organizations to be created: "))
    org_val = 0
    while org_val != org_num:
        print("")
        print("Please enter values to create Organizations")
        print("")
        for org in range(org_num):
            organization.org_name = str(raw_input("Enter the Organization Name: "))
            organization.org_description = str(raw_input("Enter the Description of Organization: "))
            print("")
            if click.confirm("Organization Name: " + organization.org_name + "\nDescription: " + organization.org_description + "\nIs this Correct?", default=True):
                if org_val == 0:
                    org_val = org_val + 1
                    yaml = ruamel.yaml.YAML()
                    org_data = dict(
                        organizations=dict(
                            name=organization.org_name,
                            description=organization.org_description,
                        )
                    )
                    with open('input.yml', 'a') as outfile:
                        yaml.indent(mapping=2, sequence=4, offset=2)
                        yaml.dump(org_data, outfile)
                else:
                    org_val = org_val + 1
                    yaml = ruamel.yaml.YAML()
                    org_data = dict(
                        name=organization.org_name,
                        description=organization.org_description,
                    )
                    with open('input.yml', 'r') as yamlfile:
                        cur_yaml = yaml.load(yamlfile)
                        cur_yaml['organizations'].update(org_data)
                    if cur_yaml:
                        with open('input.yml', 'w') as yamlfile:
                            yaml.indent(mapping=2, sequence=4, offset=2)
                            yaml.dump(cur_yaml, yamlfile)
    return organization.org_name, organization.org_description

organization()
At the end of the Python script, my input.yml file should look like the following:
version: x.x.x
is_enterprise: 'true'
license: secrets/license.txt
organizations:
  - description: xyz
    name: abc
  - description: pqr
    name: def
However, every time the script runs, instead of appending the value to organizations, it overwrites it.
I also have tried using append instead of update but I am getting the following error:
AttributeError: 'CommentedMap' object has no attribute 'append'
What can I do to solve this?
Also, since I am new to development, any suggestions on making this code better would be really helpful.
If I understand correctly, what you want is cur_yaml['organizations'] += [org_data].
Note that if you run the script many times, you will have the same entry multiple times; a small guard against that is sketched below.
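A minimal sketch of that guard, assuming org_data is a plain dict as in the question:
if org_data not in cur_yaml['organizations']:
    cur_yaml['organizations'] += [org_data]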
Using update is not going to work, because the value for the key
organizations is a sequence and loads as the list-like type
CommentedSeq, so append-ing would be the right thing to do.
Why that doesn't work is a bit unclear, since you don't provide the input
you start with, nor the code used when doing the appending that gets
an AttributeError on CommentedMap.
Here is what works if you have one organization and add another:
import sys
import ruamel.yaml
yaml_str = """\
version: x.x.x
is_enterprise: 'true'
license: secrets/license.txt
organizations:
- description: xyz
name: abc
"""
org_data = dict(
description='pqr',
name='def',
)
yaml = ruamel.yaml.YAML()
yaml.indent(mapping=4, sequence=4, offset=2)
cur_yaml = yaml.load(yaml_str)
cur_yaml['organizations'].append(org_data)
yaml.dump(cur_yaml, sys.stdout)
This gives:
version: x.x.x
is_enterprise: 'true'
license: secrets/license.txt
organizations:
  - description: xyz
    name: abc
  - description: pqr
    name: def
If you have no organizations yet, make sure your input YAML looks like:
version: x.x.x
is_enterprise: 'true'
license: secrets/license.txt
organizations: []
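Alternatively, if you cannot be sure the key is present at all, a small sketch using the normal dict API (CommentedMap is a dict subclass):
cur_yaml.setdefault('organizations', []).append(org_data)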
On older versions of Python the order of the keys in the data you added is not guaranteed. To
enforce that order on older versions as well, do:
org_data = ruamel.yaml.comments.CommentedMap((('description', 'pqr'), ('name', 'def')))
or
org_data = ruamel.yaml.comments.CommentedMap()
org_data['description'] = 'pqr'
org_data['name'] = 'def'
Found the issue, and it's working fine now. Since organizations holds a list of name/description objects, I have added [] to the code below and it started working.
org_data = dict(
    organizations=[dict(
        name=tower_organization.org_name,
        description=tower_organization.org_description,
    )]
)
In addition to the above, I guess append wasn't working because of the hyphen '-' that was missing from the first object as part of the indentation. After fixing the code above, append is working fine as well.
Thank you all for your answers.
I have a yaml file of the form below:
Solution:
- number of solutions: 1
  number of solutions displayed: 1
- Gap: None
  Status: optimal
  Message: bonmin\x3a Optimal
  Objective:
    objective:
      Value: 0.010981105395
  Variable:
    battery_E[b1,1,1]:
      Value: 0.25
    battery_E[b1,1,2]:
      Value: 0.259912707017
    battery_E[b1,2,1]:
      Value: 0.120758408109
    battery_E[b2,1,1]:
      Value: 0.0899999972181
    battery_E[b2,2,3]:
      Value: 0.198967393893
    windfarm_L[w1,2,3]:
      Value: 1
    windfarm_L[w1,3,1]:
      Value: 1
    windfarm_L[w1,3,2]:
      Value: 1
Using Python 2.7, I would like to import all battery_E values from this YAML file. I know I can iterate over the dictionary keys to retrieve the battery_E values one by one (I am already doing it using PyYAML), but I would like to avoid iterating and do it in one go!
It's not possible "in one go"; there will still be some kind of iteration either way, and that's completely OK.
However, if memory is a concern, you can load only the values of the keys of interest during YAML loading:
from __future__ import print_function

import yaml

KEY = 'battery_E'


class Loader(yaml.SafeLoader):
    def __init__(self, stream):
        super(Loader, self).__init__(stream)
        self.values = []

    def compose_mapping_node(self, anchor):
        start_event = self.get_event()
        tag = start_event.tag
        if tag is None or tag == '!':
            tag = self.resolve(yaml.MappingNode, None, start_event.implicit)
        node = yaml.MappingNode(tag, [],
                                start_event.start_mark, None,
                                flow_style=start_event.flow_style)
        if anchor is not None:
            self.anchors[anchor] = node
        while not self.check_event(yaml.MappingEndEvent):
            item_key = self.compose_node(node, None)
            item_value = self.compose_node(node, item_key)
            if (isinstance(item_key, yaml.ScalarNode)
                    and item_key.value.startswith(KEY)
                    and item_key.value[len(KEY)] == '['):
                self.values.append(self.construct_object(item_value, deep=True))
            else:
                node.value.append((item_key, item_value))
        end_event = self.get_event()
        node.end_mark = end_event.end_mark
        return node


with open('test.yaml') as f:
    loader = Loader(f)
    try:
        loader.get_single_data()
    finally:
        loader.dispose()

print(loader.values)
Note, however, that this code does not assume anything about the position of battery_E keys in the tree inside the YAML file; it will just load all of their values.
There is no need to retrieve each entry using PyYAML; you can load the data once and then use a Python dict comprehension to select the key-value pairs with the following two lines:
data = yaml.safe_load(open('input.yaml'))
kv = {k:v['Value'] for k, v in data['Solution'][1]['Variable'].items() if k.startswith('battery_E')}
after that kv contains:
{'battery_E[b2,2,3]': 0.198967393893, 'battery_E[b1,1,1]': 0.25, 'battery_E[b1,1,2]': 0.259912707017, 'battery_E[b2,1,1]': 0.0899999972181, 'battery_E[b1,2,1]': 0.120758408109}
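If only the numeric values matter, a small variation on the same comprehension (note that dict ordering is not guaranteed on Python 2.7):
values = [v['Value'] for k, v in data['Solution'][1]['Variable'].items()
          if k.startswith('battery_E')]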
My code imports a MARC file using MARCReader and compares a string against a list of acceptable answers. If the string from MARC has no match in my list, it gets added to an error list. This has worked for years in Python 2.7.4 installations on Windows 7 with no issue. I recently got a Windows 10 machine and installed Python 2.7.10, and now strings with non-standard characters fail that match. The issue is not Python 2.7.10 alone; I've installed every version from 2.7.4 through 2.7.10 on this new machine and get the same problem. A new install of Python 2.7.10 on a Windows 7 machine also has the problem.
I've trimmed out functions that aren't relevant, and I've dramatically trimmed the master list. In this example, "Académie des Sciences" is an existing repository, but "Acadm̌ie des Sciences" now appears in our list of new repositories.
# -*- coding: utf-8 -*-
from aipmarc import get_catdb, get_bibno, parse_date
from phfawstemplate import browsepage  # , nutchpage, eadpage, titlespage
from pymarc import MARCReader, marc8_to_unicode
from time import strftime
from umlautsort import alafiling
import urllib2
import sys
import os
import string


def make_newrepos_list(list, fn):  # Create list of unexpected repositories found in the MArcout database dump
    output = "These new repositories are not yet included in the master list in phfaws.py. Please add the repository code (in place of ""NEWCODE*""), and the URL (in place of ""TEST""), and then add these lines to phfaws.py. Please keep the list alphabetical. \nYou can find repository codes at http://www.loc.gov/marc/organizations/ \n \n"
    for row in list:
        output = '%s reposmasterlist.append([u"%s", "%s", "%s"])\n' % (output, row[0], row[1], row[2])
    fh = open(fn, 'w')
    fh.write(output.encode("utf-8"))
    fh.close()


def main(marcfile):
    reader = MARCReader(file(marcfile))

    '''
    Creating list of preset repository codes.
    '''
    reposmasterlist = [[u"American Institute of Physics", "MdCpAIP", "http://www.aip.org/history/nbl/index.html"]]
    reposmasterlist.append([u"Académie des Sciences", "FrACADEMIE", "http://www.academie-sciences.fr/fr/Transmettre-les-connaissances/inventaires-des-fonds-d-archives-personnelles.html"])
    reposmasterlist.append([u"American Association for the Advancement of Science", "daaas", "http://archives.aaas.org/"])

    newreposcounter = 0
    newrepos = ""
    newreposlist = []
    findingaidcounter = 0
    reposcounter = 0

    for record in reader:
        if record['903']:  # Get only records where 903a="PHFAWS"
            phfawsfull = record.get_fields('903')
            for field in phfawsfull:
                phfawsnote = field['a']
                if 'PHFAWS' in phfawsnote:
                    if record['852'] is not None:  # Get only records where 852/repository is not blank
                        repository = record.get_fields('852')
                        for field in repository:
                            reposname = field['a']
                            reposname = marc8_to_unicode(reposname)  # Convert repository name from MARC file to Unicode
                            reposname = reposname.rstrip('.,')
                            reposcode = None
                            reposurl = None
                            for row in reposmasterlist:  # Match field 852 repository against the master list.
                                if row[0] == reposname:  # If it's in the master list, use the master list to populate our repository-related fields
                                    reposcode = row[1]
                                    reposurl = row[2]
                            if record['856'] is not None:  # Get only records where 856 is not blank and includes "online finding aid"
                                links = record.get_fields('856')
                                for field in links:
                                    linksthree = field['3']
                                    if linksthree is not None and "online finding aid" in linksthree:
                                        if reposcode == None:  # If this record's repository wasn't in the master list, add to list of new repositories
                                            newreposcounter += 1
                                            newrepos = '%s %s \n' % (newrepos, reposname)
                                            reposcode = "NEWCODE" + str(newreposcounter)
                                            reposurl = "TEST"
                                            reposmasterlist.append([reposname, reposcode, reposurl])
                                            newreposlist.append([reposname, reposcode, reposurl])
                                        human_url = field['u']
                                    else:
                                        pass
                            else:
                                pass
                    else:
                        pass
                else:
                    pass
        else:
            pass

    # Output list of new repositories
    newreposlist.sort(key=lambda rep: rep[0])
    if newreposcounter != 0:
        status = '%d new repositories found. you must add information on these repositories, then run phfaws.py again. Please see the newly updated rewrepos.txt for details.' % (newreposcounter)
        sys.stderr.write(status)
        make_newrepos_list(newreposlist, 'newrepos.txt')


if __name__ == '__main__':
    try:
        mf = sys.argv[1]
        sys.exit(main(mf))
    except IndexError:
        sys.exit('Usage: %s <marcfile>' % sys.argv[0])
Edit: I've found that simply commenting out the "reposname = marc8_to_unicode(reposname)" line gets me the results I want. I still don't understand why this is, since it was a necessary step before.
Edit: I've found that simply commenting out the "reposname = marc8_to_unicode(reposname)" line gets me the results I want. I still don't understand why this is, since it was a necessary step before.
This suggests to me that the encoding of strings in your database changed from MARC8 to Unicode. Have you upgraded your cataloging system recently?
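If that is the case, one way to make the script robust to both encodings is to let pymarc do the conversion itself; a hedged sketch (the to_unicode flag has long been accepted by MARCReader, but check your installed pymarc version):
from pymarc import MARCReader

# pymarc inspects each record's Leader/09 flag and only applies the
# MARC8-to-Unicode conversion when the record is actually MARC8
with open(mf, 'rb') as fh:
    reader = MARCReader(fh, to_unicode=True)
    for record in reader:
        print(record['852']['a'] if record['852'] else None)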