I'm trying to build a piece of software that will rely on very dynamic configuration (or "ruleset", really). I've tried to capture it in the pseudocode:
"""
---
config:
  item1:
    thething: ${stages.current.variables.item1}
stages:
  dev:
    variables:
      item1: stuff
  prod:
    variables:
      item1: stuf2
"""
config_obj = yaml.safe_load(config)
current_stage = 'dev'
# Insert an artificial "current" stage to allow variable matching
config_obj['stages']['current'] = config_obj['stages'][current_stage]
updated_config_obj = replace_vars(config_obj)
The goal is to have updated_config_obj contain the config with every variable reference replaced by its actual value, so for this example it should replace ${stages.current.variables.item1} with stuff. The current part is easily solved by copying whatever the current stage is into a current item in the same OrderedDict, but I'm still stumped by how to actually perform the replacement. The config YAML can be quite large and is entirely dependent on a plugin system, so it must be dynamic.
Right now I'm looking at "walking" the entire object, checking for the existence of a $ on each "leaf" (indicating a variable) and performing a lookup back into the current object to "resolve" the variable, but somehow that seems overly complex. Another alternative is (I guess) to use Jinja2 templating on the "config string", with the parsed object as a lookup. Certainly doable, but it somehow feels a little dirty.
I have the feeling that there should be a more elegant solution which can be done solely on the parsed object (without interacting with the string), but it escapes me.
Any pointers appreciated!
First, my two cents: try to avoid using any form of interpolation in your configuration file. This creates another layer of dependencies - one dependency for your program (the configuration file) and another dependency for your configuration file.
It's a slick solution at the moment, but consider that five years down the road some lowly developer might be staring down ${stages.current.variables.item1} for a month trying to figure out what this is, not understanding that this implicitly maps onto stages.dev. And then worse yet, some other developer comes along, and seeing that the floodgates of interpolation have been opened, they start using {{stages_dev}}, to mean that some value should interpolated from the system's environmental variables. And then some other developer starts using their own convention like {{!stagesdev!}}, which means that the value uses its own custom runtime interpolation, invoked in some obscure, downstream back-alley.
And then some consultant is hired to reverse-engineer the whole thing and now they are sailing the seas of spaghetti.
If you still want to do this, I'd recommend opening/parsing the configuration file into a dictionary (presumably using yaml.load()), then iterating through the whole thing, line-by-line, using regex to find instances of \$\{(.*)\}.
For each captured group, create an ordered list like:
# ["stages", "current", "variables", item1"]
yaml_references = ".".split("stages.current.variables.item1")
Then, you could do something like:
yaml_config_dict = "" # the parsed configuration file
interpolated_reference = None
for y in yaml_references:
interpolated_reference = yaml_config_dict[y]
i = interpolated_reference[0]
Now interpolated_reference should hold whatever ${stages.current.variables.item1} was pointing to in the context of the .yaml file, and you should be able to do a string replace.
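Putting those pieces together, here is a minimal sketch of such a walker (the helper names resolve and replace_vars are mine, and nested references that point at other references are not handled):

import re

VAR_RE = re.compile(r'\$\{([^}]+)\}')

def resolve(path, config):
    # Follow a dotted path like "stages.current.variables.item1"
    # down through the parsed YAML object.
    node = config
    for key in path.split('.'):
        node = node[key]
    return node

def replace_vars(node, config):
    # Walk the parsed object recursively; on string leaves, substitute
    # every ${...} reference with the value it points to.
    if isinstance(node, dict):
        return {k: replace_vars(v, config) for k, v in node.items()}
    if isinstance(node, list):
        return [replace_vars(v, config) for v in node]
    if isinstance(node, str):
        return VAR_RE.sub(lambda m: str(resolve(m.group(1), config)), node)
    return node

updated_config_obj = replace_vars(config_obj, config_obj)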
Is there an easy way to extract a list of all variables with a start attribute from a Modelica model? The ultimate goal is to run a simulation until it reaches steady state, then run a Python script that compares the values of the start attributes against the steady-state values, so that I can identify start values that were chosen badly.
In the Dymola Python interface I could not find such functionality. Another approach could be to generate the modelDescription.xml and parse it; I assume the information is available somewhere in there, but for that approach I also need help getting started.
Similar to this answer, you can extract that info easily from the modelDescription.xml inside an FMU with FMPy.
Here is a small runnable example:
from fmpy import read_model_description
from fmpy.util import download_test_file
from pprint import pprint

fmu_filename = 'CoupledClutches.fmu'

# Download one of FMPy's example FMUs
download_test_file('2.0', 'CoSimulation', 'MapleSim', '2016.2', 'CoupledClutches', fmu_filename)

model_description = read_model_description(fmu_filename)

# Keep only the local variables that declare a start value
start_vars = [v for v in model_description.modelVariables if v.start and v.causality == 'local']

pprint(start_vars)
The files dsin.txt and dsfinal.txt might help you here. They have the same structure, with values at the start and at the end of the simulation; by renaming dsfinal.txt to dsin.txt you can start your simulation from the (e.g. steady-state) values you computed in a previous run.
It might be worthwhile working with these two files if you already plan to use such values for running other simulations.
They give you information about solver/simulation settings that you won't find in the .mat result files (if those are of any interest for your case).
However, if it is only a comparison between start and final values of variables that are present in the result files anyway, a better choice might be to use Python and a library that reads the result.mat file (DyMat, ModelicaRes, etc.). It is then a matter of comparing the start and end values of the signals of interest.
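For illustration, a rough sketch of that comparison with the DyMat library (the result file name is hypothetical, and the exact API usage should be double-checked against DyMat's documentation):

import DyMat

res = DyMat.DyMatFile('result.mat')  # hypothetical result file name
for name in res.names():
    values = res.data(name)          # the variable's trajectory
    start, final = values[0], values[-1]
    # Flag variables whose start value is far from the steady-state value
    if abs(final - start) > 1e-6 * max(abs(final), 1.0):
        print('%s: start=%g, steady-state=%g' % (name, start, final))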
After some trial and error, I came up with this python code snippet to get that information from modelDescription.xml:
import xml.etree.ElementTree as ET

root = ET.parse('modelDescription.xml').getroot()
for ScalarVariable in root.findall('ModelVariables/ScalarVariable'):
    # find the first child element (Real, Integer, ...) that has a start attribute
    varStart = ScalarVariable.find('*[@start]')
    if varStart is not None:
        name = ScalarVariable.get('name')
        value = varStart.get('start')
        print(f"{name} = {value};")
To generate the modelDescription.xml file, run Dymola translation with the flag
Advanced.FMI.GenerateModelDescriptionInterface2 = true;
The Python standard library has several modules for processing XML:
https://docs.python.org/3/library/xml.html
This snippet uses ElementTree.
This is just a first step, not sure if I missed something basic.
tl;dr: Looking to parse a large set of filenames that are concatenations of two names (container + child), recovering the original two names where nomenclature is inconsistent. Python library suggestions or any other guidance appreciated.
I am looking for a way to parse strings for information where the nomenclature and formatting of information within those strings will likely be inconsistent to some degree.
Background
Industry: Automation controls
Problem to be solved:
Time series data is exported from an automation system with a single data point being saved to a single .csv file. (example: If the controls system were an environmental controls system the point might be the measured temperature of a room taken at 15 minute intervals.) It is possible to have an environment where there are a few dozen points that export to CSV files or several thousand points that export to CSV files. The structure that the points are normally stored in is as follows: points are contained within a controller, controllers are integrated under a management system and occasionally management systems could be integrated into another management system. The resulting structure is a simple hierarchical tree.
The filenames associated with the CSV files are assembled from the path structure of each point as follows: Directories are created for the management systems (nested if necessary) and under those directories are the CSV files where the filename is a concatenation of the controller name and the point name.
I have written a Python script that processes a monthly export of the CSV files (currently about 5500 of them [growing]) into a structured data store and another that assembles spreadsheets for others to review. Currently, I am using some really ugly regular expressions and even uglier string.find()s with a list of static string values that I have hand-entered, to parse out controller names and point names for each file so that they can be inserted into the structured data store.
Unfortunately, as mentioned above, the nomenclature used in these environments is rarely consistent. Point names vary widely. The point referenced above might be known as ROOMTEMP, RM_T, RM-T, ROOM-T, ZN_T, ZNT, RMT or several other possibilities. This applies to almost any point contained within a controller. Controller names are also somewhat inconsistent, where they may be named for the type of device they are controlling, the geographic location of the device or even an asset number associated with the device.
I would very much like to get out of the business of hand writing regular expressions to parse file names every time a new location is added. I would like to write code that reads in filenames and looks for patterns across the filenames and then makes a recommendation for parsing the controller and point name out of each filename. I already have an interface where I can assign controller name and point name to each point object by hand so if there are errors with the parse I can modify the results. Ideally, the patterns created by the existing objects would influence the suggested names of new files being parsed.
Some examples of filenames are as follows:
UNIT1254_SAT.csv, UNIT1254_RMT.csv, UNIT1254_fil.csv, AHU_5311_CLG_O.csv, QE239-01_DISCH_STPT.csv, HX_E2_CHW_Return.csv, Plant_RM221_CHW_Sys_Enable.csv, TU_E7_Actual Clg Setpoint.csv, 1725_ROOMTEMP.csv, 1725_DA_T.csv, 1725_RA_T.csv
The order will always be consistent: a concatenation of the controller name and then the point name. There will most likely be a consistent character used to separate the controller name from the point name (normally an underscore, but occasionally a dash or some other character).
Does anyone have any recommendations on how to get started with parsing these file names? I've thought through a few ideas, but keep shelving them before implementation because I keep finding potential performance issues or failure points. The rest of my code is working pretty much the way I need it to; I just haven't figured out an efficient or useful way to pull the correct names out of the filename. Unfortunately, it is not an option to modify the names on the control system side to be consistent.
I don't know if the following code will help you, but I hope it'll give you at least some idea.
Considering that a filename such as "QE239-01_STPT_1725_ROOMTEMP_DA" can contain the following names
'QE239-01'
'QE239-01_STPT'
'QE239-01_STPT_1725'
'QE239-01_STPT_1725_ROOMTEMP'
'QE239-01_STPT_1725_ROOMTEMP_DA'
'STPT'
'STPT_1725'
'STPT_1725_ROOMTEMP'
'STPT_1725_ROOMTEMP_DA'
'1725'
'1725_ROOMTEMP'
'1725_ROOMTEMP_DA'
'ROOMTEMP'
'ROOMTEMP_DA'
'DA'
as being possible elements (container name or point name) of the filename,
I defined the function treat() to return this list from the name.
Then the code treats all the filenames to find all the possible elements of filenames.
The function is based on the idea that, in the chosen example, the element ROOMTEMP can't directly follow the element STPT: STPT_ROOMTEMP isn't a possible container name in this example string, since 1725 sits between these two elements.
Then, with the help of a function from the difflib module, I try to discriminate elements that have some similarity, in order to detect patterns under which several name elements can be gathered.
You must play with the value passed to the cutoff parameter to find what gives the most interesting results for you.
It's certainly far from good, but then I didn't understand all aspects of your problem.
import difflib
from pprint import pprint

s = """UNIT1254_SAT
UNIT1254_RMT
UNIT1254_fil
AHU_5311_CLG_O
QE239-01_DISCH_STPT
HX_E2_CHW_Return
Plant_RM221_CHW_Sys_Enable
TU_E7_Actual Clg Setpoint
1725_ROOMTEMP
1725_DA_T
1725_RA_T
UNT147_ROOMTEMP
TRU_EZ_RM_T
HXX_V2_RM-T
RHXX_V2_ROOM-T
SIX8_ZN_T
Plint_RP228_ZNT
SOHO79_EZ_RMT"""

li = s.split('\n')
print(li)
print('- - - - - - - - - - - - - - - - - ')

def treat(name):
    # Return every contiguous run of underscore-separated parts of `name`,
    # i.e. all candidate container/point names it could contain.
    lu = name.split('_')
    W = []
    while lu:
        W.extend('_'.join(lu[0:x]) for x in range(1, len(lu) + 1))
        lu.pop(0)
    return W

if 0:  # flip to 1 to see treat() on a single example
    q = "QE239-01_STPT_1725_ROOMTEMP_DA"
    pprint(treat(q))
    print('==========================================')

# Collect the candidate elements of every filename
WALL = []
for t in li:
    WALL.extend(treat(t))
pprint(WALL)

# Group elements that look alike; tune cutoff to taste
for x in WALL:
    j = set(difflib.get_close_matches(x, WALL, n=9000000, cutoff=0.7))
    if len(j) > 1:
        print(j, '\n')
Sometimes you need to define values dynamically (like the current datetime, random strings, random integers, file contents, etc.) and use them across different steps without being explicit or hard-coding the value.
So, my question is: how can I define variables inside steps (the correct way to do it) so as to use those variables in the following steps?
An example:
Given A random string of length "100" as "my_text"
And I log in to my platform
And I ask to add the following post:
| title | description |
| Some example of title | {{my_text}} |
When I submit the post form
Then The posts table shows these posts:
| title | description |
| Some example of title | {{my_text}} |
And I delete any post containing in the description "{{my_text}}"
This is a basic example trying to explain why I would like to define variables in steps and save them in the context to use them in the following steps.
My idea was to modify the before_step and after_step methods to set a variable on the context that stores my custom variables, like this:
def before_step(context):
    if not hasattr(context, 'vars'):
        context.vars = {}
    if hasattr(context, 'table') and context.table:
        parse_table(context)

def parse_table(context):
    # Use a regex to check each cell for "{{<identifier>}}" and, on a match,
    # replace the cell value with context.vars[identifier]. The step
    # "The posts table shows these posts" then never sees the literal
    # {{my_text}}; it only sees the resolved random string.
    ...
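For what it's worth, a rough sketch of what parse_table could look like (whether behave allows rewriting row cells in place like this is an assumption of this sketch):

import re

VAR_RE = re.compile(r'\{\{(\w+)\}\}')

def parse_table(context):
    # Substitute {{identifier}} placeholders in every cell with the
    # value stored under that identifier in context.vars.
    for row in context.table.rows:
        row.cells = [
            VAR_RE.sub(lambda m: str(context.vars[m.group(1)]), cell)
            for cell in row.cells
        ]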
Scenario Outlines do something like this, defining variables like "<some_identifier>" and then, for each example row, replacing the value in the step.
It's basically reproducing that behaviour for any kind of step, simple or with tables.
Is this the right way to do something like this?
From the Behave docs on the context:
When behave launches into a new feature or scenario it adds a new layer to the context, allowing the new activity level to add new values, or overwrite ones previously defined, for the duration of that activity. These can be thought of as scopes:
@given('I request a new widget for an account via SOAP')
def step_impl(context):
    client = Client("http://127.0.0.1:8000/soap/")
    # method client.Allocate(...) returns a dict
    context.response = client.Allocate(customer_first='Firstname',
                                       customer_last='Lastname', colour='red')
    # context vars can be set more directly
    context.new_var = "My new variable!"

@then('I should receive an OK SOAP response')
def step_impl(context):
    eq_(context.response['ok'], 1)
    cnv = str(context.new_var)
    print(f"This is my new variable: '{cnv}'")
So, the value can be set using dot notation and retrieved the same.
To answer this question, one needs to note:
Does the test data need to be controlled externally? For example, test data can be input from the command line so that the value can be chosen explicitly.
If the answer is no, then we probably should not bother hard-coding anything in the feature file. We can leave the data generated in one step, save it in the context, and access it again in any following step.
The example I can think of is exactly like what the question describes. Do we care what random text content we generated, posted and verified? Probably not. Then we should not expose such detail to the user (i.e. the feature file), since it is not important to the behaviour we are testing.
If the answer is yes, we need a bit of a hack to make it happen. I am experiencing a case like this. What I want is to change the test data when I run the test, so I don't have to hard-code it in the feature files as a table or scenario outline. How can I do this?
I can use the -D option on the command line to pass in as many items of user data as needed, which can then be accessed through the context.config.userdata dictionary in any step. If the amount of test data is very limited, this approach is an easy way to go. But if the test data set contains many values that no one wants to type one by one on the command line, it can be stored externally, for example in an ini file with section names like testdata_1 ... testdata_n; a single string can then be passed in from the command line to select the section name in this config file. The test data can be read out in before_all, before_scenario, etc., and used in all steps, as sketched below.
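A minimal sketch of that setup (the file name testdata.ini and the userdata key testdata are made up; run with behave -D testdata=testdata_2):

# environment.py
import configparser

def before_all(context):
    section = context.config.userdata.get('testdata', 'testdata_1')
    parser = configparser.ConfigParser()
    parser.read('testdata.ini')               # hypothetical external data file
    context.testdata = dict(parser[section])  # available to every step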
In my experience, you cannot create a dynamic value in the feature file.
For example, this step:
Given A random string of length "100" as "my_text"
I don't see any way to change {my_text} each time you run the scenario. (I also don't consider passing the value via behave -D into context.config.userdata; I think that is the wrong approach here too.)
Even a Scenario Outline actually splits into many scenarios. Each scenario will have a different value, but the value of {my_text} is already defined in the Examples table for each scenario.
The way to make a step dynamic is in the step definition (the code layer).
You can generate the random string in the step definition @given('A random string of length "100" as "{my_text}"'), store the created value in context.my_text, and use it from there.
I also agree with Murphy Meng that you don't need to expose the generated random string explicitly in the feature file. You know which step will use that value; simply use context.my_text in that step to get it. That's it.
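For completeness, a minimal sketch of such a step definition (the attribute-per-name storage convention is my own choice; storing into a context.vars dict works just as well):

import random
import string

from behave import given

@given('A random string of length "{length}" as "{name}"')
def step_random_string(context, length, name):
    # Generate the value once and park it on the context so that
    # later steps can read it back as, e.g., context.my_text.
    value = ''.join(random.choices(string.ascii_letters, k=int(length)))
    setattr(context, name, value)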
I've just started using yaml and I love it. However, the other day I came across a case that seemed really odd and I am not sure what is causing it. I have a list of file path locations and another list of file path destinations. I create a dictionary out of them and then use yaml to dump it out to read later (I work with artists and use yaml so that it is human readable as well).
Sorry for the long lists:
import yaml

source = ['/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_diff.exr', '/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskTapeFloor.1051.exr', '/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskBurnt.1031.exr']
dest = ['/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_diff_diffuse_v0006.exr', '/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskTapeFloor_diffuse_v0006.1051.exr', '/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskBurnt_diffuse_v0006.1031.exr']

# Map each source path to its destination path
dictionary = dict(zip(source, dest))
print yaml.dump(dictionary)
this is the output that I get:
{/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_diff.exr: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhaw
k_diff_diffuse_v0006.exr,
/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskBurnt.1031.exr: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v00
06/blackhawk_maskBurnt_diffuse_v0006.1031.exr,
? /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskTapeFloor.1051.exr
: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskTapeFloor_diffuse_v0006.1051.exr}
It loads back in fine with yaml.load, but this is not easy for artists to edit if need be.
This is the first question in the FAQ.
By default, PyYAML chooses the style of a collection depending on whether it has nested collections. If a collection has nested collections, it will be assigned the block style. Otherwise it will have the flow style.
If you want collections to be always serialized in the block style, set the parameter default_flow_style of dump() to False.
So:
>>> print yaml.dump(dictionary, default_flow_style=False)
/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_diff.exr: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_diff_diffuse_v0006.exr
/data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskBurnt.1031.exr: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskBurnt_diffuse_v0006.1031.exr
? /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/model/v026_03/blackhawk_maskTapeFloor.1051.exr
: /data/job/maze/build/vehicle/blackhawk/blackhawkHelicopter/work/data/map/tasks/texture/v0006/blackhawk_maskTapeFloor_diffuse_v0006.1051.exr
Still not exactly beautiful, but when you have strings longer than 80 characters as keys, it's about as good as you can reasonably expect.
If you model (part of) the filesystem hierarchy in your object hierarchy, or create aliases (or dynamic aliasers) for parts of the tree, etc., the YAML will look a lot nicer. But that's something you have to actually do at the object-model level; as far as YAML is concerned, those long paths full of repeated prefixes are just strings.
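For illustration, a rough sketch of such a grouping before dumping (the nest_by_directory helper is mine, and splitting on every directory is just one possible convention; you'd probably want to split at points meaningful to your pipeline):

import os
import yaml

def nest_by_directory(mapping):
    # Turn the flat {source_path: dest_path} dict into a nested dict keyed
    # by the source directories, so PyYAML emits short keys in block style.
    tree = {}
    for src, dst in mapping.items():
        node = tree
        head, tail = os.path.split(src)
        for part in head.split(os.sep):
            if part:
                node = node.setdefault(part, {})
        node[tail] = dst
    return tree

print(yaml.dump(nest_by_directory(dictionary), default_flow_style=False))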
I have an executable whose input is contained in an ASCII file with format:
$ GENERAL INPUTS
$ PARAM1 = 123.456
PARAM2=456,789,101112
PARAM3(1)=123,456,789
PARAM4 =
1234,5678,91011E2
PARAM5(1,2)='STRING','STRING2'
$ NEW INSTANCE
NEW(1)=.TRUE.
PAR1=123
[More data here]
$ NEW INSTANCE
NEW(2)=.TRUE.
[etcetera]
In other words, some general inputs, and some parameter values for a number of new instances. The declaration of parameters is irregular; some numbers are separated by commas, others are in scientific notation, others are inside quotes, the spacing is not constant, etc.
The evaluation of some scenarios requires that I take the input of one "master" data file and copy the parameter data of, say, instances 2 through 6 to another data file which may already contain data for said instances (in which case data should be overwritten) and possibly others (data which should be left unchanged).
I have written a Flex lexer and a Bison parser; together they can eat a data file and store the parameters in memory. If I use them to open both files (master and "scenario"), it should not be too hard to selectively write to a third, new file the desired parameters (as in "general input from 'scenario'; instances 1 though 5 from 'master'; instances 6 through 9 from 'scenario'; ..."), save it, and delete the original scenario file.
Other information: (1) the files are highly sensitive, it is very important that the user is completely shielded from altering the master file; (2) the files are of manageable size (from 500K to 10M).
I have learned that what I can do in ten lines of code, some fellow here can do in two. How would you approach this problem? A Pythonic answer would make me cry. Seriously.
If you're already able to parse this format (I'd have tried it with pyparsing, but if you already have a working flex/bison solution, that will be just fine), and the parsed data fit comfortably in memory, then you're basically there. You can represent what you read from each file as a simple object with a dict for the "general input" and a list of dicts, one per instance (or, probably better, a dict of instances keyed by instance number, which may give you a bit more flexibility). Then, as you mentioned, you just selectively "update" (add or overwrite) some of the instances copied from the master into the scenario, write the new scenario file, and replace the old one with it.
To use the flex/bison code from Python you have several options: make it into a DLL/.so and access it with ctypes, or call it from a Cython-coded extension, a SWIG wrapper, a Python C-API extension, SIP, Boost.Python, etc.
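As a minimal illustration of the ctypes route (the library name and entry point here are made up; your flex/bison code would need a matching C wrapper):

import ctypes

# Hypothetical: the flex/bison parser compiled into libparser.so,
# exposing int parse_file(const char *path)
lib = ctypes.CDLL('./libparser.so')
lib.parse_file.argtypes = [ctypes.c_char_p]
lib.parse_file.restype = ctypes.c_int

rc = lib.parse_file(b'master.dat')  # hypothetical input file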
Suppose that, one way or another, you have a parser primitive that (e.g.) accepts an input filename, reads and parses that file, and returns a list of 2-string tuples, each of which is one of the following:
(paramname, paramvalue)
('$$$$', 'General Inputs')
('$$$$', 'New Instance')
just using '$$$$' as a kind of arbitrary marker. Then for the object representing all that you've read from a file you might have:
import re

instidre = re.compile(r'NEW\((\d+)\)')

class Afile(object):
    def __init__(self, filename):
        self.filename = filename
        self.geninput = dict()
        self.instances = dict()

    def feed_data(self, listoftuples):
        it = iter(listoftuples)
        assert next(it) == ('$$$$', 'General Inputs')
        # first, the general inputs, up to the first instance marker
        for name, value in it:
            if name == '$$$$':
                break
            self.geninput[name] = value
        else:  # no instances at all!
            return
        # then one dict per instance
        currinst = dict()
        for name, value in it:
            if name == '$$$$':
                self.finish_inst(currinst)
                currinst = dict()
                continue
            mo = instidre.match(name)
            if mo:
                assert value == '.TRUE.'
                name = '$$$INSTID$$$'
                value = mo.group(1)
            currinst[name] = value
        self.finish_inst(currinst)

    def finish_inst(self, adict):
        instid = adict.pop('$$$INSTID$$$')
        assert instid not in self.instances
        self.instances[instid] = adict
Sanity checking might be improved a bit, diagnosing anomalies more precisely, but net of error cases I think this is roughly what you want.
The merging just requires doing foo.instances[instid] = bar.instances[instid] for the required values of instid, where foo is the Afile instance for the scenario file and bar is the one for the master file -- that will overwrite or add as required.
I'm assuming that to write out the newly changed scenario file you don't need to repeat all the formatting quirks the specific inputs might have had (if you do, those quirks will need to be recorded during parsing together with the names and values). In that case, simply looping over sorted(foo.instances) and writing each instance out in sorted order (after writing the general stuff, also in sorted order, with the appropriate $-marker lines and a proper translation of the '$$$INSTID$$$' entry) should suffice.
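A rough sketch of that writer, given the Afile class above (the marker lines follow the sample input, and no formatting quirks are preserved):

def write_afile(afile, outname):
    with open(outname, 'w') as f:
        f.write('$ GENERAL INPUTS\n')
        for name in sorted(afile.geninput):
            f.write('%s=%s\n' % (name, afile.geninput[name]))
        for instid in sorted(afile.instances, key=int):
            f.write('$ NEW INSTANCE\n')
            # translate the '$$$INSTID$$$' entry back into a NEW(n) line
            f.write('NEW(%s)=.TRUE.\n' % instid)
            inst = afile.instances[instid]
            for name in sorted(inst):
                f.write('%s=%s\n' % (name, inst[name]))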