Extract list of variables with start attribute from Modelica model - python

Is there an easy way to extract a list of all variables with a start attribute from a Modelica model? The ultimate goal is to run a simulation until it reaches steady state, then run a Python script that compares the values of the start attributes against the steady-state values, so that I can identify start values that were chosen badly.
In the Dymola Python interface I could not find such functionality. Another approach could be to generate the modelDescription.xml and parse it; I assume the information is available somewhere in there, but for that approach I also feel I need help getting started.

Similar to this answer, you can easily extract that information from the modelDescription.xml inside an FMU with FMPy.
Here is a small runnable example:
from fmpy import read_model_description
from fmpy.util import download_test_file
from pprint import pprint

fmu_filename = 'CoupledClutches.fmu'

# download a test FMU and read its modelDescription.xml
download_test_file('2.0', 'CoSimulation', 'MapleSim', '2016.2', 'CoupledClutches', fmu_filename)
model_description = read_model_description(fmu_filename)

# collect all local variables that carry a start value
start_vars = [v for v in model_description.modelVariables
              if v.start and v.causality == 'local']
pprint(start_vars)

The files dsin.txt and dsfinal.txt might help you with this. They share the same structure and contain the values at the start and at the end of the simulation; by renaming dsfinal.txt to dsin.txt you can start a simulation from the (e.g. steady-state) values you computed in a previous run.
It might be worthwhile working with these two files if you already plan to use such values for running other simulations.
They also give you information about solver and simulation settings that you won't find in the .mat result files (if that is of any interest for your case).
However, if you only need to compare start and final values of variables that are present in the result files anyway, a better choice might be to use Python and a library that reads the result.mat file (DyMat, ModelicaRes, etc.). It is then a matter of comparing the start and end values of the signals of interest.
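For the comparison itself, a minimal sketch using DyMat might look like this (assuming a Dymola result file named result.mat; the 50% deviation threshold is arbitrary and just for illustration):
import DyMat

res = DyMat.DyMatFile('result.mat')
for name in res.names():
    values = res.data(name)
    start, final = values[0], values[-1]
    # flag variables whose final (steady-state) value is far from the start value
    if start != 0 and abs(final - start) / abs(start) > 0.5:
        print(f"{name}: start = {start}, final = {final}")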

After some trial and error, I came up with this Python code snippet to get that information from modelDescription.xml:
import xml.etree.ElementTree as ET

root = ET.parse('modelDescription.xml').getroot()
for ScalarVariable in root.findall('ModelVariables/ScalarVariable'):
    # find the child element (Real, Integer, ...) that carries a start attribute
    varStart = ScalarVariable.find('*[@start]')
    if varStart is not None:
        name = ScalarVariable.get('name')
        value = varStart.get('start')
        print(f"{name} = {value};")
To generate the modelDescription.xml file, run Dymola translation with the flag
Advanced.FMI.GenerateModelDescriptionInterface2 = true;
The Python standard library has several modules for processing XML:
https://docs.python.org/3/library/xml.html
This snippet uses ElementTree.
This is just a first step; I'm not sure if I missed something basic.

Related

Is it possible to import key-value pairs from one INI file to another INI file

I would like to import the key-value pairs of one INI file into another INI file, so that whenever I update the "parent" INI file, the changes are automatically applied to the "child" INI file as well.
Is this possible with INI files?
I understand that I could manipulate the config parser to achieve this behavior, but I'm looking more for an import solution here.
Thank you!
Just to clarify: I assume what you want is to have an import statement inside your ini-file, something like:
import other.ini
[new values]
key = value
color = green
...
Basically, an ini-file is just a map of keys to values, something like a dict in the form of a text file. They are deliberately kept rather simple.
Now, while importing another ini-file sounds like a really simple thing to do, it quickly runs into the whole series of problems that other import or inheritance mechanisms have to deal with. What happens, for instance, if two ini-files import each other? And what happens if A imports B and C, and both B and C import D (the so-called diamond problem): do you then import D twice? Hence, importing other ini-files is not quite as simple as one might expect, and therefore not a feature you would necessarily put into a minimalistic design.
That being said, keep in mind that an ini-file is really just a map and therefore an inert entity: it does not do anything at all. In order to read an ini-file, you usually need a parser like Python's configparser, which reads the textual information and creates the actual map for you. It is this parser that would have to do the importing of other files. Hence, the question is: is there a parser for ini-files that supports importing?
I am not aware of any such parser as part of a publicly available standard package (although I assume they do exist). You could, of course, write one yourself.
Perhaps the easiest thing to do is to add imports as a special key-value pair to your ini-file, something like import = other.ini; another.ini, and then have your program follow these 'links' and import whatever other file(s) they refer to, as in the sketch below.
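A sketch of that approach, assuming a made-up import key in the DEFAULT section whose value is a semicolon-separated list of parent files (the key name and separator are invented conventions, not configparser features):
import configparser

def load_with_imports(path, _seen=None):
    _seen = _seen if _seen is not None else set()
    if path in _seen:                 # guard against circular imports
        return configparser.ConfigParser()
    _seen.add(path)
    cfg = configparser.ConfigParser()
    cfg.read(path)
    merged = configparser.ConfigParser()
    # read the parents first, so child values override parent values
    imports = cfg.get('DEFAULT', 'import', fallback='')
    for parent in filter(None, (p.strip() for p in imports.split(';'))):
        merged.read_dict(load_with_imports(parent, _seen))
    merged.read_dict(cfg)
    return merged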
Or you go the path of C and write a preprocessor that looks for lines starting with something like #import other.ini in your ini-file and merges the other ini-file into your text before parsing everything.
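A corresponding preprocessor sketch (the #import directive is an invented convention; configparser would normally treat # lines as comments anyway, so unprocessed files stay readable):
import io
import configparser

def preprocess(path, out=None):
    out = out if out is not None else io.StringIO()
    with open(path) as f:
        for line in f:
            if line.startswith('#import '):
                # recursively splice in the referenced file
                preprocess(line[len('#import '):].strip(), out)
            else:
                out.write(line)
    return out

cfg = configparser.ConfigParser()
cfg.read_string(preprocess('child.ini').getvalue())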

Fast looping through two lists comparing regex pattern using value from one list against values in another list

I'm currently trying to deal with a tricky problem in Python. To set the scene, I'm using Let's Encrypt, so I have the (POSIX) paths of the live directory, and I have a list of domains from a CMS.
I'm trying to compare the domains against the paths, which requires a pattern match containing the value from the first list, because the paths contain the domain names. I can't do a set intersection because the values don't match exactly, and a traditional for loop is really painfully slow (when you have >12000 domain names and >10000 certificates/paths).
So, some explanatory code:
import re
from datetime import datetime
from cryptography.x509 import load_pem_x509_certificate
from cryptography.hazmat.backends import default_backend, openssl

all_domains = function_that_returns_domains_as_list()
all_paths = function_that_returns_certificate_paths()
nocert_list = list()

def cert_check(path):
    cert = load_pem_x509_certificate(path.read_bytes(), default_backend())
    cur_date = datetime.now()
    end_date = cert.not_valid_after
    ...  # more logic and functions for checking whether the certificate has expired etc.

def path(domain):
    for path in all_paths:
        if path.match(f"*/{domain}*"):
            return path

def check_domain_certs():
    for domain in all_domains:
        path_check = path(domain)
        if not path_check:
            nocert_list.append(domain)
        if path_check:
            cert_path = path_check
            cert_check(cert_path)
Even if I don't call the cert_check function inside check_domain_certs and instead append the path to a list to process outside the loop, the looping itself takes a long time (I ran it while typing this message and it only just finished some ~30 minutes later, probably something to do with it having to loop about 120 million times).
I've run down a lot of Stack Overflow rabbit holes today, so I'm turning to the community for help this time.
Things you may try:
Search for optimization techniques in Python (using Python builtins might help!).
Generally, list comprehensions are faster than for loops.
My general advice is to use list comprehensions together with Cython. First use a list comprehension, then move the code into a .pyx file and cythonize it.
Cython, in general, translates Python code to C. Loops in particular run significantly faster with Cython. You can check one of Sentdex's videos explaining the general principles of Cython. As you can see around 23:20 in the video, it makes a significant difference, something like ~100x faster than a traditional Python loop.
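If you want to try the Cython route, a minimal build script might look like this (a sketch, assuming Cython is installed and the hot loop has been moved into a hypothetical fastloop.pyx module; build it with python setup.py build_ext --inplace):
# setup.py: compile the hypothetical fastloop.pyx with Cython
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("fastloop.pyx"))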
In case the search fails more often than succeeds, this may be faster:
def check_domain_certs():
    all_paths_as_string = str(set(all_paths))
    for domain in all_domains:
        if domain not in all_paths_as_string:
            nocert_list.append(domain)
        else:
            cert_path = path(domain)  # full search returning cert path
            cert_check(cert_path)
...
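The quadratic scan itself can also be avoided. Since the glob */{domain}* implies the domain is a prefix of the final path component, a one-time sorted index allows a binary-search lookup per domain. A sketch (the function names and the prefix assumption are mine):
import bisect

# one-time index: sorted final path components plus a name -> path map
names = sorted(p.name for p in all_paths)
by_name = {p.name: p for p in all_paths}

def find_path(domain):
    # binary search for the first name >= domain, then check the prefix
    i = bisect.bisect_left(names, domain)
    if i < len(names) and names[i].startswith(domain):
        return by_name[names[i]]
    return None

def check_domain_certs_indexed():
    for domain in all_domains:
        cert_path = find_path(domain)
        if cert_path is None:
            nocert_list.append(domain)
        else:
            cert_check(cert_path)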

Dynamically update complex OrderedDict (based on yaml file)

I'm trying to build a piece of software that will rely on very dynamic configuration (or "ruleset", really). I've tried to capture it in this pseudocode:
"""
---
config:
item1:
thething: ${stages.current.variables.item1}
stages:
dev:
variables:
item1: stuff
prod:
variables:
item1: stuf2
"""
config_obj = yaml.load(config)
current_stage = 'dev'
#Insert artificial "current stage" to allow var matching
stages['current'] = stages[current_stage]
updated_config_obj = replace_vars(config_obj)
The goal is to have updated_config_obj replace all variable references with the actual values, so in this example it should replace ${stages.current.variables.item1} with stuff. The current part is easily solved by copying whatever the current stage is into a current item in the same OrderedDict, but I'm still stumped by how to actually perform the replace. The config YAML can be quite large and is totally dependent on a plugin system, so it must be dynamic.
Right now I'm looking at "walking" the entire object, checking for the existence of a $ on each "leaf" (indicating a variable) and performing a lookup back into the current object to "resolve" the variable, but somehow that seems overly complex. Another alternative is (I guess) to use Jinja2 templating on the "config string", with the parsed object as a lookup. That's certainly doable, but it somehow feels a little dirty.
I have the feeling that there should be a more elegant solution that can be done solely on the parsed object (without interacting with the string), but it escapes me.
Any pointers appreciated!
First, my two cents: try to avoid using any form of interpolation in your configuration file. It creates another layer of dependencies: one dependency for your program (the configuration file) and another dependency for your configuration file.
It's a slick solution at the moment, but consider that five years down the road some lowly developer might be staring at ${stages.current.variables.item1} for a month trying to figure out what it is, not understanding that it implicitly maps onto stages.dev. And then, worse yet, some other developer comes along and, seeing that the floodgates of interpolation have been opened, starts using {{stages_dev}} to mean that some value should be interpolated from the system's environment variables. And then some other developer starts using their own convention like {{!stagesdev!}}, which means that the value uses its own custom runtime interpolation, invoked in some obscure, downstream back-alley.
And then some consultant is hired to reverse-engineer the whole thing and now they are sailing the seas of spaghetti.
If you still want to do this, I'd recommend opening/parsing the configuration file into a dictionary (presumably using yaml.load()), then iterating through the whole thing, line-by-line, using regex to find instances of \$\{(.*)\}.
For each captured group, create an ordered list like:
# ["stages", "current", "variables", item1"]
yaml_references = ".".split("stages.current.variables.item1")
Then, you could do something like:
yaml_config_dict = yaml.load(config)  # the parsed configuration file
node = yaml_config_dict
for key in yaml_references:
    node = node[key]  # descend one level per dotted component
Now node should hold whatever ${stages.current.variables.item1} was pointing to in the context of the .yaml file, and you should be able to do a string replace.
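For completeness, here is a sketch of the full recursive walk the question describes, under the assumption that a reference always occupies a whole string leaf (the helper names are made up):
import re

REF = re.compile(r'\$\{([^}]+)\}')

def resolve(root, dotted):
    # follow a dotted path like "stages.current.variables.item1"
    node = root
    for key in dotted.split('.'):
        node = node[key]
    return node

def replace_vars(node, root=None):
    root = root if root is not None else node
    if isinstance(node, dict):
        return {k: replace_vars(v, root) for k, v in node.items()}
    if isinstance(node, list):
        return [replace_vars(v, root) for v in node]
    if isinstance(node, str):
        m = REF.fullmatch(node)
        if m:
            return resolve(root, m.group(1))
    return node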

Using python to parse a large set of filenames concatenated from inconsistent object names

tl;dr: Looking to parse a large set of filenames that are a concatenation of two names (container + child) to recover the original two names, where the nomenclature is inconsistent. Python library suggestions or any other guidance appreciated.
I am looking for a way to parse strings for information where the nomenclature and formatting of information within those strings will likely be inconsistent to some degree.
Background
Industry: Automation controls
Problem to be solved:
Time series data is exported from an automation system, with a single data point being saved to a single .csv file. (Example: if the controls system were an environmental controls system, the point might be the measured temperature of a room taken at 15-minute intervals.) An environment may have anywhere from a few dozen to several thousand points that export to CSV files. The structure the points are normally stored in is as follows: points are contained within a controller, controllers are integrated under a management system, and occasionally management systems are integrated into another management system. The resulting structure is a simple hierarchical tree.
The filenames associated with the CSV files are assembled from the path structure of each point as follows: Directories are created for the management systems (nested if necessary) and under those directories are the CSV files where the filename is a concatenation of the controller name and the point name.
I have written a Python script that processes a monthly export of the CSV files (currently about 5500 of them [growing]) into a structured data store, and another that assembles spreadsheets for others to review. Currently, I am using some really ugly regular expressions and even uglier string.find()s with a list of static string values that I have hand-entered to parse out controller names and point names for each file so that they can be inserted into the structured data store.
Unfortunately, as mentioned above, the nomenclature used in these environments is rarely consistent. Point names vary widely. The point referenced above might be known as ROOMTEMP, RM_T, RM-T, ROOM-T, ZN_T, ZNT, RMT or several other possibilities. This applies to almost any point contained within a controller. Controller names are also somewhat inconsistent: they may be named for the type of device they are controlling, the geographic location of the device, or even an asset number associated with the device.
I would very much like to get out of the business of hand-writing regular expressions to parse file names every time a new location is added. I would like to write code that reads in filenames, looks for patterns across the filenames, and then makes a recommendation for parsing the controller and point name out of each filename. I already have an interface where I can assign a controller name and point name to each point object by hand, so if there are errors with the parse I can modify the results. Ideally, the patterns created by the existing objects would influence the suggested names of new files being parsed.
Some examples of filenames are as follows:
UNIT1254_SAT.csv, UNIT1254_RMT.csv, UNIT1254_fil.csv, AHU_5311_CLG_O.csv, QE239-01_DISCH_STPT.csv, HX_E2_CHW_Return.csv, Plant_RM221_CHW_Sys_Enable.csv, TU_E7_Actual Clg Setpoint.csv, 1725_ROOMTEMP.csv, 1725_DA_T.csv, 1725_RA_T.csv
The order will always be consistent where it is a concatenation of controller name and then point name. There will most likely be a consistent character used to separate controller name from point name (normally an underscore, but occasionally a dash or some other character.)
Does anyone have any recommendations on how to get started with parsing these file names? I've thought through a few ideas, but keep shelving them before implementation because I keep finding potential performance issues or failure points. The rest of my code is working pretty much the way I need it to; I just haven't figured out an efficient or useful way to pull the correct names out of the filename. Unfortunately, it is not an option to modify the names on the control system side to be consistent.
I don't know if the following code will help you, but I hope it'll give you at least some ideas.
Considering that a filename such as "QE239-01_STPT_1725_ROOMTEMP_DA" can contain the following names
'QE239-01'
'QE239-01_STPT'
'QE239-01_STPT_1725'
'QE239-01_STPT_1725_ROOMTEMP'
'QE239-01_STPT_1725_ROOMTEMP_DA'
'STPT'
'STPT_1725'
'STPT_1725_ROOMTEMP'
'STPT_1725_ROOMTEMP_DA'
'1725'
'1725_ROOMTEMP'
'1725_ROOMTEMP_DA'
'ROOMTEMP'
'ROOMTEMP_DA'
'DA'
as possible elements (container name or point name) of the filename,
I defined the function treat() to return this list from the name.
The code then treats all the filenames to find all the possible elements of filenames.
The function is based on the idea that, in the chosen example, the element ROOMTEMP can't follow the element STPT, because STPT_ROOMTEMP isn't a possible container name in this example string: there is 1725 between these two elements.
Then, with the help of a function from the difflib module, I try to discriminate elements that have some similarity, in order to detect patterns under which several name elements can be gathered.
You must play with the value passed to the cutoff parameter to choose what gives the most interesting results for you.
It's far from good, certainly, but I didn't understand all aspects of your problem.
s = """UNIT1254_SAT
UNIT1254_RMT
UNIT1254_fil
AHU_5311_CLG_O
QE239-01_DISCH_STPT
HX_E2_CHW_Return
Plant_RM221_CHW_Sys_Enable
TU_E7_Actual Clg Setpoint
1725_ROOMTEMP
1725_DA_T
1725_RA_T
UNT147_ROOMTEMP
TRU_EZ_RM_T
HXX_V2_RM-T
RHXX_V2_ROOM-T
SIX8_ZN_T
Plint_RP228_ZNT
SOHO79_EZ_RMT"""
li = s.split('\n')
print(li)
print('- - - - - - - - - - - - - - - - - ')

import difflib
from pprint import pprint

def treat(name):
    # return every contiguous run of '_'-separated parts of the name
    lu = name.split('_')
    W = []
    while lu:
        W.extend('_'.join(lu[0:x]) for x in range(1, len(lu) + 1))
        lu.pop(0)
    return W

if 0:
    q = "QE239-01_STPT_1725_ROOMTEMP_DA"
    pprint(treat(q))

print('==========================================')

WALL = []
for t in li:
    WALL.extend(treat(t))
pprint(WALL)

# group elements that are similar enough to each other
for x in WALL:
    j = set(difflib.get_close_matches(x, WALL, n=9000000, cutoff=0.7))
    if len(j) > 1:
        print(j, '\n')

How to write back to a PDB file after doing Superimposer for atoms of a protein in PDB.BIO python

I read and extracted information about atoms from a PDB file and used Superimposer() to align a mutant to the wild type. How can I write the aligned atom coordinates back to a PDB file? I tried to use the PDBIO class, but it doesn't work, since it doesn't accept a list as input. Does anyone have an idea how to do it?
mutantAtoms = []
mutantStructure = PDBParser().get_structure("name", pdbFile)
mutantChain = mutantStructure[0]["B"]
# Extract the atoms
for residue in mutantChain:
    for atom in residue:
        mutantAtoms.append(atom)
# Do the alignment
si = Superimposer()
si.set_atoms(wildtypeAtoms, mutantAtoms)
si.apply(mutantAtoms)
Now mutantAtoms holds the atoms aligned to the wild-type atoms. I need to write this information to a PDB file. My question is how to convert this list of aligned atoms into a structure and use PDBIO() or some other way to write it to a PDB file.
As I see in an example from the PDBIO documentation in the Biopython docs:
p = PDBParser()
s = p.get_structure("1fat", "1fat.pdb")
io = PDBIO()
io.set_structure(s)
io.save("out.pdb")
It seems the PDBIO module needs an object of the Structure class to work, which is in principle what I understand Superimposer works with. When you say it does not accept a list, do you mean you have a list of structures? In that case you could simply iterate through the structures, as in:
for s in my_results_list:
    io.set_structure(s)
    io.save("out.pdb")
If what you have is a list of atoms, I guess you could create a Structure object with that and then pass it to PDBIO.
However, it is difficult to tell more without knowing more about your problem. You could put on your question the code lines where you get the problem.
Edit: Now I have better understood what you want to do. I have seen some information about the Structure class in the interesting Biopython Structural Bioinformatics FAQ; the class is apparently a little complex. At first sight, I do not see a very easy way to create Structure objects from scratch, but what you could do is modify the structure you got from PDBParser, substituting its atoms with the result you get from Superimposer, and then write the .pdb file using that same modified structure. So you could try to put your mutantAtoms list back into the mutantStructure object you already have.
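In fact, since Superimposer.apply() transforms the Atom objects in place, and the atoms in mutantAtoms still belong to mutantStructure, writing the aligned structure may be as simple as this sketch (assuming the variables from the question; the output filename is arbitrary):
from Bio.PDB import PDBIO

io = PDBIO()
io.set_structure(mutantStructure)  # its atoms now carry the aligned coordinates
io.save("aligned_mutant.pdb")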
