PyYAML - Skipping certain keys when dumping a dict to file

PyYAML - Skipping certain keys when dumping a dict to file - python

I have a YAML file with configuration data for my application, which is dumped to a new file whenever the application is run for debugging purposes. Unfortunately, some keys in the YAML file hold sensitive data and need to be obfuscated or simply excluded from the dumped file.
Example YAML input file:
logging_config:
level: INFO
file_path: /path/to/log_file.log
database_access:
table_to_query: customer_table
database_api_key: XXX-XXX-XXX # Sensitive data, exclude from archived file
There are workarounds, of course:
Keeping a list of keys with sensitive data and pre-processing dicts before outputting them to YAML
Separating sensitive and non-sensitive data in separate configuration files and outputtiing only the latter
etc.
But I was hoping that there was a solution similar to implementing a custom Loader reacting to a command like !keep_secret whenever it appears in a dict value, as it would keep my configuration files more readable.

You can use a custom representer. Here's a basic example:
import yaml
class SensitiveText:
def __init__(self, content):
self.content = content
def __repr__(self):
return self.content
def __str__(self):
return self.content
def sensitive_text_remover(dumper, data):
return dumper.represent_scalar("tag:yaml.org,2002:null", "")
yaml.add_representer(SensitiveText, sensitive_text_remover)
data = {
"logging_config": {
"level": "INFO",
"file_path": "/path/to/log_file.log"
},
"database_access": {
"table_to_query": "customer_table",
"database_api_key": SensitiveText("XXX-XXX-XXX")
}
}
print(yaml.dump(data))
This prints:
database_access:
database_api_key:
table_to_query: customer_table
logging_config:
file_path: /path/to/log_file.log
level: INFO
You can of course have a class for the database_access instead with a representer that removes the database_api_key altogether.

Related

Dump object into yaml without quotes

I had some object that I want to turn into yaml, the only thing is that I need to be able to put "!anything" without quotes into it.
When I try it with pyyaml I end up with '!anything' inside my yaml file.
I've already tried using ruamel.yaml PreservedScalarString and LiteralScalarString. And it kind of works, but not in the way that I need to work. The thing is I end up with yaml that looks like this:
10.1.1.16:
text: '1470814.27'
confidence: |-
!anything
But I don't need this |- symbol.
My goal is to get yaml like this:
10.1.1.16:
text: '1470814.27'
confidence: !anything
Any ideas how I can achieve that?

To dump a custom tag, you need to define a type and register a representer for that type. Here's how to do it for scalars:
import yaml
class MyTag:
def __init__(self, content):
self.content = content
def __repr__(self):
return self.content
def __str__(self):
return self.content
def mytag_dumper(dumper, data):
return dumper.represent_scalar("!anything", data.content)
yaml.add_representer(MyTag, mytag_dumper)
print(yaml.dump({"10.1.1.16": {
"text": "1470814.27",
"confidence": MyTag("")}}))
This emits
10.1.1.16:
confidence: !anything ''
text: '1470814.27'
Note the '' behind the tag, which is the tagged scalar (no, you can't get rid of it). You can tag collections as well but you'll need to use represent_sequence or represent_mapping accordingly.

Contrary to #flix comment, in YAML you don't need to follow a tag by single or double quotes (or block scalar). You can try Oren Ben-Kiki's reference parser (programmatically derived from the YAML specification) to confirm that your expected output is valid YAML.
Empty content is normally loaded as None in Python (both by the outdated PyYAML as well as ruamel.yaml). Tagged empty content can of course only indicate existence of a particular instance, without any value indication.
ruamel.yaml can perfectly well round-trip your expected output:
import sys
from ruamel.yaml import YAML
yaml_str = """\
10.1.1.16:
text: '1470814.27'
confidence: !anything
"""
yaml = YAML()
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
gives:
10.1.1.16:
text: '1470814.27'
confidence: !anything
You can generate an object that dumps just the tag without a value from scratch (as the parser does), but if you don't want to go into the details, you can just load the tagged object and add it to your data structure:
import sys
import ruamel.yaml
yaml = ruamel.yaml.YAML()
def tagged_empty_scalar(tag):
return yaml.load('!' + tag)
data = {'10.1.1.16': dict(text='1470814.27', confidence=tagged_empty_scalar('anything'))}
yaml.dump(data, sys.stdout)
You can get the exact same result in PyYAML and without the quotes, but that is more complicated.

How do I generate YAML containing local tags with ruamel.yaml?

I'm using ruamel.yaml to generate a YAML file that will be read by Tavern, which requires the file to contain a list like this:
includes:
- !include vars.yaml
Attempting to use any of the usual approaches to dump the data as strings results in single quotes being added around the tags, which doesn't work when the YAML is ingested by the next tool.
How do I generate a YAML file that contains unquoted local tags, starting with data that is defined in a dictionary?

I was able to create a YAML file with the required format using the following approach, based on prior examples. My approach is more flexible because it allows the tag handle to be an instance property rather than a class property, so you don't need to define a different class for every tag handle.
import sys
from ruamel.yaml import YAML
yaml = YAML(typ='rt')
class TaggedString:
def __init__(self, handle, value):
self.handle = handle
self.value = value
#classmethod
def to_yaml(cls, representer, node):
# I don't understand the arguments to the following function!
return representer.represent_scalar(u'{.handle}'.format(node),
u'{.value}'.format(node))
yaml.register_class(TaggedString)
data = {
'includes': [
TaggedString('!include', 'vars.yaml'),
TaggedString('!exclude', 'dummy.yaml')
]
}
yaml.dump(data, sys.stdout)
Output:
includes:
- !include vars.yaml
- !exclude dummy.yaml
I am not sure if this is the best approach. I might be missing a simpler way to achieve the same result. Note that my goal is not to dump a Python class; I'm just doing that as a way to get the tag to be written correctly.

I am not sure if this is a better approach, but if you had tried to round-trip your required output, you
would have seen that ruamel.yaml actually can preserve your tagged strings, without you having to
do anything. Inspecting the Python datastructure, you'll notice that ruamel.yaml does
this by creating a TaggedScalar (as you cannnot attach attributes to the built-in string type).
import sys
import ruamel.yaml
yaml_str = """\
includes:
- !include vars.yaml
- !exclude dummy.yaml
"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
incl = data['includes'][0]
print(type(incl))
which gives:
includes:
- !include vars.yaml
- !exclude dummy.yaml
<class 'ruamel.yaml.comments.TaggedScalar'>
After inspecting comments.py (and possible constructor.py), you should be able
to make ruamel.yaml's internal data structure on the fly:
import sys
import ruamel.yaml
from ruamel.yaml.comments import TaggedScalar
def tagged_string(tag, val):
# starting with ruamel.yaml>0.16.5 you can replace the following lines with:
# return TaggedScalar(value=val, tag=tag)
ret_val = TaggedScalar()
ret_val.value = val
ret_val.yaml_set_tag(tag)
return ret_val
yaml = ruamel.yaml.YAML()
data = dict(includes=[tagged_string('!include', 'vars.yaml'),
tagged_string('!include', 'vars.yaml'),
])
yaml.dump(data, sys.stdout)
which also gives:
includes:
- !include vars.yaml
- !include vars.yaml

pyYAML, expected NodeEvent, but got DocumentEndEvent

I'm trying to dump a custom object, that is a kind of a list of objects. So I overrode the to_yaml method of the YAMLOBject class from which I set my class to inherit from:
#classmethod
def to_yaml(cls, dumper, data):
""" This methods defines how to save this class to a yml
file """
passage_list = []
for passage in data:
passage_dict = {
'satellite': passage.satellite.name,
'ground_station': passage.ground_station.name,
'aos': passage.aos,
'los': passage.los,
'tca': passage.tca,
}
passage_list.append(passage_dict)
passage_list_dict = {
'passages': passage_list
}
return dumper.represent(passage_list_dict)
When I call the yaml.dump method, the output file is created correctly with the correct data:
if save_to_file:
with open(save_to_file, 'w') as f:
yaml.dump(all_passages, f, default_flow_style=False)
but at the end of the execution I get a EmitterError: expected NodeEvent, but got DocumentEndEvent()
I believe it's related to not closing correctly the YAML document because when I was debugging my code I was getting save_to_file files that were missing the new line at the end of the document. Could it be? Or is it something else?

Your code does not work because dumper.represent doesn't return anything. You want to use dumper.represent_data instead.

Python: File being saved as empty when it should be saved with text in it, but only when an unrelated method is run before it

Observe the following Python file:
# configmanager.py
"""
ConfigManager controls the modification and validation of config files.
"""
import os
from ruamel import yaml
from voluptuous import Schema
class ConfigManager():
"""
Controls all interaction with configuration files
"""
def __init__(self):
super().__init__()
self.configvalidator = ConfigValidator()
# The config directory inside users home directory.
# Config files will be stored here.
config_dir = os.path.expanduser('~')+'/.config/MyProject/'
# The default config file
config_file = config_dir+'myproject.conf'
# The default configuration
default_config = {
'key1': {},
'key2': {}
}
def _get_config(self):
"""
Get the config file and return it as python dictionary.
Will create the config directory and default config file if they
do not exist.
"""
# Create config directory if it does not exist
if not os.path.exists(self.config_dir):
os.makedirs(self.config_dir)
# Create default config file if it does not exist
if not os.path.isfile(self.config_file):
config_file = open(self.config_file, 'w')
config_file.write(yaml.dump(self.default_config))
# Open config file, and load from YAML
config_file = open(self.config_file, 'r')
config = yaml.safe_load(config_file)
# Validate config
self.configvalidator.validate(config)
return config
def _save_config(self, config):
"""
Save the config file to disk as YAML
"""
# Open current config file
config_file = open(self.config_file, 'w')
# Validate new config
# THE ERROR IS HERE
# If this runs then the config file gets saved as an empty file.
self.configvalidator.validate(config)
# This shows that the 'config' variable still holds the data
print(config)
# This shows that yaml.dump() is working correctly
print(yaml.dump(config))
config_file.write(yaml.dump(config))
def edit_config(self):
"""
Edits the configuration file
"""
config = self._get_config()
# Perform manipulation on config here
# No actual changes to the config are necessary for the bug to occur
self._save_config(config)
class ConfigValidator():
def __init__(self):
super().__init__()
# Config file schema
# Used for validating the config file with voluptuous
self.schema = Schema({
'key1': {},
'key2': {},
})
def validate(self, config):
"""
Validates the data against the defined schema.
"""
self.schema(config)
app = ConfigManager()
app.edit_config()
-
# ~/.config/MyProject/myproject.conf
key1: {}
key2: {}
Description of my module
This is a module I am working on which is for modifying the config file for my project. It accesses the file in ~/.config/MyProject/myproject.conf, which is saved in YAML format, and stores various pieces of information that are used by my program. I have removed as much of the code as I can, leaving only that necessary for understanding the bug.
ConfigManager
ConfigManager is the class containing methods for manipulating my config file. Here it contains three methods: _get_config(), _save_config(), and edit_config(). When instantised, it will get an instance of ConfigValidator (described below), and assign it to self.configvalidator.
_get_config
_get_config() simply opens the file defined by the class variables, specifically ~/.config/MyProject/myproject.conf, or creates the file with default values if it does not exist. The file is saved in YAML format, so this method loads it into a python object, using ruamel.yaml, validates it using self.configvalidator.validate(config) and returns it for use by other pieces of code.
_save_config
_save_config() is where the error occurs, which is described in detail below. It's purpose is to validate the given data, and if it is valid, save it to disk in YAML format.
edit_config
This a generic function, which, in my program, would make specific changes to my config file, depending on the arguments given. In my example, this function simply gets the config file with self._get_config(), and then saves it using self._save_config, without making any changes.
ConfigValidator
This class is for validating my config file using voluptuous. When instantised, it will create the schema that is to be used, and assign it to self.schema. When the validate method is run, it validates the given data using voluptuous.
The error
Observe the line self.configvalidator.validate(config) in ConfigManager._save_config(). This will validate the given data against the schema, and raise an error if it does not pass validation.
But, in the following line config_file.write(yaml.dump(config)), which simply saves the given data to a file as YAML, it will instead save an empty file to disk. (Note: the file is empty, not deleted)
If I disable the validation, by commenting out self.configvalidator.validate(config), then the file is written correctly as YAML.
If self.configvalidator.validate(config) is run, then the config file is saved as an empty file.
My testing
As can be seen with the line print(config), the data in the variable config does not change after being used for validation, yet when being saved to disk, it would seem that config is an empty variable.
print(yaml.dump(config)) shows that ruamel.yaml does work correctly.
If I change edit_config to give invalid data to _save_config, then self.configvalidator.validate(config) will raise an error, as expected. self.configvalidator.validate(config) is running correctly.
End
If self.configvalidator.validate(config) is run, then config_file.write(yaml.dump(config)) saves the config file as an empty file, despite the data in the variable config not changing.
If self.configvalidator.validate(config) is not run, then config_file.write(yaml.dump(config)) saves the file correctly.
That is my error, and it makes absolutely no sense to me.
If your keen to help, then configmanager.py should run correctly (with the error) on your machine, as long as it has access to ruamel.yaml and voluptuous. It will create ~/.config/MyProject/myproject.conf, then save it as empty. Save my example myproject.conf to see how it is then saved as empty when configmanager.py is run. If configmanager.py is run again, when myproject.conf is empty, then a validation error will be raised in _get_config, as expected.
I am so confused by this bug, so if you have any insight it would be greatly appreciated.
Cheers

json to geoDjango model

I am building a database using Django, geodjango and postgresql of field data. The data includes lats and lons. One of the tasks I have is to ingest data that has already been collected. I would like to use .json file to define the metadata and write some code to batch process some json files.
What I have so far is, a model:
class deployment(models.Model):
'''
#brief This is the abstract deployment class.
'''
startPosition=models.PointField()
startTimeStamp=models.DateTimeField()
endTimeStamp=models.DateTimeField()
missionAim=models.TextField()
minDepth=models.FloatField() # IT seems there is no double in Django
maxDepth=models.FloatField()
class auvDeployment(deployment):
'''
#brief AUV meta data
'''
#==================================================#
# StartPosition : <point>
# distanceCovered : <double>
# startTimeStamp : <dateTime>
# endTimeStamp : <dateTime>
# transectShape : <>
# missionAim : <Text>
# minDepth : <double>
# maxDepth : <double>
#--------------------------------------------------#
# Maybe need to add unique AUV fields here later when
# we have more deployments
#==================================================#
transectShape=models.PolygonField()
distanceCovered=models.FloatField()
And I function I want to use to ingest the data
#staticmethod
def importDeploymentFromFile(file):
'''
#brief This function reads in a metadta file that includes campaign information. Destinction between deployment types is made on the fine name. <type><deployment>.<supported text> auvdeployment.json
#param file The file that holds the metata data. formats include .json todo:-> .xml .yaml
'''
catamiWebPortal.logging.info("Importing metadata from " + file)
fileName, fileExtension = os.path.splitext(file)
if fileExtension == '.json':
if os.path.basename(fileName.upper()) == 'AUVDEPLOYMENT':
catamiWebPortal.logging.info("Found valid deployment file")
data = json.load(open(file))
Model = auvDeployment(**data)
Model.save()
And the file I am trying to read in this
{
"id":1,
"startTimeStamp":"2011-09-09 13:20:00",
"endTimeStamp":"2011-10-19 14:23:54",
"missionAim":"for fun times, call luke",
"minDepth":10.0,
"maxDepth":20.0,
"startPosition":{{"type": "PointField", "coordinates": [ 5.000000, 23.000000 ] }},
"distanceCovered":20.0
}
The error that I am getting is this
TypeError: cannot set auvDeployment GeometryProxy with value of type: <type 'dict'>
If I remove the geo types from the model and file. It will read the file and populate the database table.
I would appreciate any advice one how I am parse the datafile with the geotypes.
Thanks

Okay the solution is as follows. The file format is not the geoJSON file format, it's the geos format. The .json file should be as follows.
{
"id": 1,
"startTimeStamp": "2011-10-19 10:23:54",
"endTimeStamp":"2011-10-19 14:23:54",
"missionAim": "for fun times, call luke",
"minDepth":10.0,
"maxDepth":20.0,
"startPosition":"POINT(-23.15 113.12)",
"distanceCovered":20,
"transectShape":"POLYGON((-23.15 113.12, -23.53 113.34, -23.67 112.9, -23.25 112.82, -23.15 113.12))"
}
Not the StartPosition syntax has changed.

A quick fix would be to use the GEOs API in geoDjango to change the startPosition field from geoJson format to a GEOSGeometry object before you save the model. This should allow it to pass validation.
Include the GEOSGeometry function from Django with:
from django.contrib.gis.geos import GEOSGeometry
...
Model = auvDeployment(**data)
Model.startPosition = GEOSGeometry(str(Model.startPosition))
Model.save()
The GEOS API cant construct objects from a GeoJSON format, as long as you make it a string first. As it stands, you are loading it as a dictionary type instead of a string.

I suggest you use the default command for loading fixtures: loaddata
python manage.py loaddata path/to/myfixture.json ...
The structure of your json would have to be slighty adjusted, but you could make a simple dumpdata to see how the structure should look like.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

PyYAML - Skipping certain keys when dumping a dict to file - python

Related

Dump object into yaml without quotes

How do I generate YAML containing local tags with ruamel.yaml?

pyYAML, expected NodeEvent, but got DocumentEndEvent

Python: File being saved as empty when it should be saved with text in it, but only when an unrelated method is run before it

json to geoDjango model

Categories

Resources