Generating simulation input files with Python

I am using a scientific simulation package that requires several text-based input files for each 'experiment' to be conducted. These files can be quite lengthy and contain a lot of boilerplate sections; however, experiment-specific values must be entered at many locations within them.
I would like to automate the generation of these files and do so in a way that is maintainable.
Right now, I am using a Python script I wrote that employs triple-quoted blocks of text and variable substitution (using % and .format()) to create sections of the files. I then write out these blocks to the appropriate files.
Accounting for proper aesthetic indentation in the resulting input files is proving difficult; moreover, the autogenerator script is becoming more and more opaque as I extend the range of simulations and options it can handle.
Does anyone have suggestions about how to manage this task in a more elegant and maintainable way?
I am aware of templating packages like Jinja. Do these have benefits outside of generating HTML-like files? Has anyone used them for the above-stated purpose?
Perhaps a totally different approach would be better.
Any suggestions would be greatly appreciated.

Jinja doesn't care what type of file you make. Text is text is text, unless it's binary. Not even sure Jinja cares then either.
IPython, and in particular, nbconvert, uses Jinja2 to export LaTeX, ipynb, markdown, etc.
There is also an IPython notebook with Jinja2 magics in case you want a demo.
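Jinja2 works just as well for plain-text simulation inputs. A minimal sketch (the section names and placeholders below are invented for illustration):
from jinja2 import Template

template = Template("""\
* SIMULATION {{ name }}
{% for exp in experiments %}
EXPERIMENT {{ loop.index }}
    PARAM1 = {{ exp.param1 }}
    PARAM2 = {{ exp.param2 }}
{% endfor %}
END
""")

text = template.render(
    name="demo",
    experiments=[{"param1": 0.5, "param2": "A"},
                 {"param1": 1.5, "param2": "B"}],
)
with open("input.txt", "w") as f:
    f.write(text)
Since the template holds the entire layout, indentation in the output lives in one obvious place instead of being scattered through the generator script.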

My usual approach to this sort of problem is to create a small library of functions that help me generate and customise the boilerplate. I don't know what your experiment-definition language looks like, but generally I'd write a function that writes out the text to initialise the simulation, a function that writes out the text to wrap up the simulation, and some other functions to write out the chunks of text that define each type of experiment.
Having put those functions in a file called mysim, say, I could then use them like this:
from mysim import sim_init, sim_conclude, experimentType1, experimentType2

sim_init(name="Today's Simulation", author="Simon")
for param1 in [0, 1, 2, 3, 4, 5, 6, 7, 8, 20, 30, 40, 50, 60, 70]:
    experimentType1(param1)
    for param2 in ["A", "B", "C"]:
        experimentType2(param1, param2)
sim_conclude(savefile="output.txt")
This Python script would generate a simulation input file that would run experiment type 1 for each value of param1 and experiment type 2 for each combination of param1 and param2.
The function implementations themselves might look messy, but the script that creates a particular simulation file will be simple and clear.
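For concreteness, a minimal sketch of what mysim might contain; the output format below is invented, since the question doesn't show the real experiment-definition language:
# mysim.py
_lines = []

def sim_init(name, author):
    _lines.append("* SIMULATION: %s (by %s)" % (name, author))

def experimentType1(param1):
    _lines.append("EXPERIMENT TYPE1")
    _lines.append("    PARAM1 = %s" % param1)

def experimentType2(param1, param2):
    _lines.append("EXPERIMENT TYPE2")
    _lines.append("    PARAM1 = %s" % param1)
    _lines.append("    PARAM2 = %s" % param2)

def sim_conclude(savefile):
    with open(savefile, "w") as f:
        f.write("\n".join(_lines) + "\n")
All the fiddly indentation is handled inside these functions, so the driver script never has to think about it.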

Related

Maya Python creating a script with a script

I've got a kind of weird question (but it would be immensely useful if it is possible): in Maya, using Python, can I take in several points of user input and have Python create a separate script for me? In this instance, I want to take in controller and locator names and have Python spit out a complete IKFK match script, also in Python (it's really just a lot of getAttr and setAttr commands, although with 6 if statements per limb for PV matching). The only other wrinkle is that it has to be able to prefix hierarchy names in the script if necessary, in case the rig is imported into a new scene rather than just opened. There's an expression component to my switches that it would be nice if Python could make for me, too.
Is this possible or am I crazy?
That's no problem. Just write a text file with a .py extension into a path where Maya can find it. Then you have to import it somewhere. Creating expressions is not a problem either.
Maybe it makes sense to rethink the approach you have chosen, though. Imagine you have written a dozen of these new Python files and then discover a problem in the generated script: you will have to redo all of them. I'd try to collect all the data and write only the required information into a text file, e.g. in JSON format. Then you can read the data back and rebuild your skeletons.
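A minimal sketch of that data-file idea (the rig names and keys are invented for illustration):
import json

# collect the user's input once...
rig_data = {
    "prefix": "char01_",
    "controllers": ["arm_ik_ctrl", "arm_fk_ctrl"],
    "locators": ["elbow_pv_loc"],
}
with open("rig_data.json", "w") as f:
    json.dump(rig_data, f, indent=4)

# ...and let one shared matching script read it back later
with open("rig_data.json") as f:
    data = json.load(f)
That way, fixing a bug means fixing one shared script instead of regenerating a dozen generated ones.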

Many input parameters to a binary file

I have a simulation program written in C++. Normally, I want to run the binary with different parameters for different runs. First I tried passing all my input parameters as command-line arguments, but that turned out to be too confusing; there are just too many parameters. Like:
./mysimulation 5 55 True ... output_file_name.txt
Then I decided to parse only a filename with input parameters written in it.
Like:
./mysimulation input_parameters.txt
But then the input file looks messy and confuses me, because there are multiple lines and I forget which line corresponds to which parameter. The file would look like so:
input_parameters.txt
55
5
True
...
output_file_name.txt
I could add comments, but then handling the input file in C++ becomes a hassle, as it requires additional effort to parse the parameters out of the file.
Now I am thinking about using XML files, since they are very structured and human-readable. I'm not sure how easy it will be to generate these XML files in Python (Python generates the input parameters and launches the C++ code multiple times) and to read them in C++. Maybe there exists a better solution?
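The Python side, at least, seems straightforward with the standard library; here is a rough sketch of what I'm imagining (the parameter names are invented):
import xml.etree.ElementTree as ET

params = {"num_steps": "55", "seed": "5", "verbose": "true",
          "output": "output_file_name.txt"}

root = ET.Element("simulation")
for name, value in params.items():
    ET.SubElement(root, "param", name=name).text = value
ET.ElementTree(root).write("input_parameters.xml")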
In the future, I'd also like to run multiple binaries in parallel on a many-CPU server. It'd be nice if the solution worked well with multiprocessing.
I tried linking C++ and Python with Boost (as I did in my other projects), but I had trouble setting up the CMake files since the C++ code has multiple classes, and I gave up after a couple of hours. Also, in one of my projects, multiprocessing in Python crashed when using such a shared library, for some reason unknown to me; that is why I'd like to avoid Boost.Python. (Maybe I did something incorrectly.)

Combine source code from different files

I am using Emacs Org-mode Babel source code blocks to write and use some small functions. Now I want to do a little bit more. Say, after a while, I find that a function I wrote in Org-babel is valuable for reuse, and I want to put it into my personal Python package, e.g., my_tools.
Org-babel provides extraction of the source code; let's say I have the source code extracted into a file called examples.py, which contains func1 and func2. I want to add these functions to a Python file/module called my_functions.py. Is there a Python package or best practice for doing such a thing, so that the source code of func1 and func2 is inserted into the module?
This is something I have been trying to do for a while. Usually, when working with Python, we write code for one-time usage; later on, we may find some code/functions are reused again and again, so we want to save them to a package that can easily be installed and shared with others.
We could even add tags to the code so that when extracting and inserting it into the package module, the tool knows where to insert based on the tag information. I am a little fuzzy here about whether there is already a PyPI package for such a scenario, or how I should architect the package if I want to build one myself. I am not that experienced and would like to hear opinions on this.
This should be doable using "tangling" of source code into files and noweb syntax to gather up individual pieces into a larger whole. The following is meant as an illustration of the method:
* Individual code blocks
#+name: foo
#+BEGIN_SRC elisp
(princ "Hello")
#+END_SRC
and another one:
#+name: bar
#+BEGIN_SRC elisp
(princ "Goodbye")
#+END_SRC
* Combine them together
#+BEGIN_SRC elisp :tangle ./tangled/foo :noweb yes
(message "Package stuff")
<<foo>>
<<bar>>
#+END_SRC
Using C-c C-v C-t to tangle gets you a file named foo in the ./tangled subdirectory (which has to exist already), whose contents are:
(message "Package stuff")
(princ "Hello")
(princ "Goodbye")
The pythonization of this should be straightforward, but the more advanced aspects of what you describe (e.g. using tags to select functions) are certainly not addressed by it (and I'm not sure how to do them off the top of my head).
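For example, a sketch of the Python version might look like this (module and function names invented):
* Individual code blocks
#+name: func1
#+BEGIN_SRC python
def func1():
    return "Hello"
#+END_SRC
#+name: func2
#+BEGIN_SRC python
def func2():
    return "Goodbye"
#+END_SRC
* Combine them together
#+BEGIN_SRC python :tangle ./tangled/my_functions.py :noweb yes
"""my_functions -- assembled by tangling the org file."""
<<func1>>
<<func2>>
#+END_SRC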
I'm a big fan of keeping things simple. If I understand your requirements correctly, your primary interest is in generating Python source files and modules rather than executing Python code and having the results used in, or copied back into, the org file.

If this is the case, I think your best approach is to just have an org file which represents your /tools/ module. When you find a function which you keep using in different files/projects and which should go into your tools module, add that function's code block to the org file representing your tools module (along with appropriate docs etc.). Then update the other org files, which represent the different code blocks of your program, to load that module and reference that function.

In the org file which represents your tools module, you could use some of Org's functionality to execute the code and incorporate tests. This way, you can load your org file and have it execute the tests to verify that all the utility functions in your module are working.

In your other projects, just write your source blocks to source the functions from your utility module. Don't worry about using org to try to do fancy referencing or the like. Keep it simple. You can use org links to reference back to the org file representing your toolbox module to get documentation references.
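For example (module and function names invented), a block in another project would simply import from the tools module:
#+BEGIN_SRC python
from my_tools import func1, func2
print(func1())
#+END_SRC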
If, on the other hand, you want to do something like a Python lab book system, where you run Python code from within the org file and get results back which you use for documentation or as input for other blocks, then you need to use some of the advanced noweb features to handle more complex block references and pass around arguments etc. You may also find the Library of Babel useful.
14.6 Library of Babel
=====================
The "Library of Babel" is a collection of code blocks. Like a function
library, these code blocks can be called from other Org files. This
collection is in a repository file in Org mode format in the `doc'
directory of Org mode installation. For remote code block evaluation
syntax, *note Evaluating code blocks::.
For any user to add code to the library, first save the code in
regular `src' code blocks of an Org file, and then load the Org file
with `org-babel-lob-ingest', which is bound to `C-c C-v i'.

Is this a wrapper script? What is the correct name for it? (Python)

execfile("es.py")
execfile("ew.py")
execfile("ef.py")
execfile("gw.py")
execfile("sh.py")
print "--Complete--"
It seems more like a "batch" script. It looks analogous to writing a bash or batch file in Unix/Windows to do a bunch of pre-defined things. I definitely agree that it's not really a Pythonic thing to do as written, though.
It is a kind of wrapper script, I guess, but there is no "correct name" for it, as it is not very Pythonic to write something like that.
One would probably write something like
import es
import ew
### ...
es.run() # assuming es code is in a run function
ew.run()
# ...
To expand on that: in Python, a file is supposed to represent a unit of functionality. For example, if you implement a list type/class, you might want to put the class for this type and every function related to it into that file.
A function, in turn, is supposed to be a unit of execution, i.e. a bunch of code that is supposed to run at once on some particular data. It's hard to advise you without knowing the content of the files, but the content of every file executed in your snippet would probably benefit from being put into functions. The file division could be kept or not, depending on functionality.
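For illustration, a hypothetical es.py refactored that way might look like:
# es.py
def run():
    """The work that previously ran at execfile() time."""
    print("running the es step")

if __name__ == "__main__":
    run()  # still usable as a standalone script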

Dangerous Python Keywords?

I am about to get a bunch of python scripts from an untrusted source.
I'd like to be sure that no part of the code can hurt my system, meaning:
(1) the code is not allowed to import ANY MODULE
(2) the code is not allowed to read or write any data, connect to the network etc
(the purpose of each script is to loop through a list, compute some data from input given to it and return the computed value)
Before I execute such code, I'd like to have a script 'examine' it and make sure that there's nothing dangerous there that could hurt my system.
I thought of using the following approach: check that the word 'import' is not used (so we are guaranteed that no modules are imported).
Yet it would still be possible for the user (if desired) to write code to read/write files etc. (say, using open).
Then here comes the question:
(1) Where can I get a 'global' list of Python methods (like open)?
(2) Is there some code that I could add at the top of each script sent to me that would make some 'global' methods invalid for that script (for example, so that any use of open would lead to an exception)?
I know that there are some Python sandboxing solutions, but please try to answer this question, as I feel this is the more relevant approach for my needs.
EDIT: Suppose I make sure that no import appears in the file, and that no potentially harmful methods (such as open, eval, etc.) appear in it. Can I conclude that the file is SAFE? (Can you think of any other 'dangerous' ways that built-in methods can be run?)
This point hasn't been made yet, and should be:
You are not going to be able to secure arbitrary Python code.
A VM is the way to go unless you want security issues up the wazoo.
You can still obfuscate import without using eval:
s = '__imp'
s += 'ort__'                                 # the string 'import' never appears in the source
f = globals()['__builtins__'].__dict__[s]    # f is now the real __import__
** BOOM **
For (1), the documentation lists the built-in functions and the keywords. Note that you'll need to do things like look for both "file" and "open", as both can open files.
Also, as others have noted, this isn't 100% certain to stop someone determined to insert malicious code.
An approach that should work better than string matching is to use the ast module: parse the Python code, do your whitelist filtering on the tree (e.g. allow only basic operations), then compile and run the tree.
See this nice example by Andrew Dalke on manipulating ASTs.
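A minimal sketch of that idea, assuming only assignments, arithmetic, and loops over literals should be allowed (the whitelist is deliberately tiny and illustrative, not a real sandbox):
import ast

ALLOWED = (ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load,
           ast.Store, ast.Constant, ast.BinOp, ast.Add, ast.Sub,
           ast.Mult, ast.Div, ast.For, ast.List, ast.Tuple)

def check(source):
    # reject any syntax outside the whitelist before compiling
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError("disallowed syntax: %s" % type(node).__name__)
    return compile(tree, "<untrusted>", "exec")

code = check("total = 0\nfor x in [1, 2, 3]:\n    total = total + x\n")
exec(code, {"__builtins__": {}})  # run with empty builtins
As the rest of this thread shows, even this is not a safety guarantee on its own.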
Built-in functions/keywords to look for:
eval
exec
__import__
open
file
input
execfile
print can be dangerous if you have one of those dumb shells that execute code on seeing certain output
stdin
__builtins__
globals() and locals() must be blocked, otherwise they can be used to bypass your rules
There's probably tons of others that I didn't think about.
Unfortunately, crap like this is possible...
object().__reduce__()[0].__globals__["__builtins__"]["eval"]("open('/tmp/l0l0l0l0l0l0l','w').write('pwnd')")
So it turns out that keywords, import restrictions, and in-scope-by-default symbols alone are not enough to cover this; you need to verify the entire object graph...
Use a Virtual Machine instead of running it on a system that you are concerned about.
Without a sandboxed environment, it is impossible to prevent a Python file from doing harm to your system aside from not running it.
It is easy to create a Cryptominer, delete/encrypt/overwrite files, run shell commands, and do general harm to your system.
If you are on Linux, you should be able to use docker to sandbox your code.
For more information, see this GitHub issue: https://github.com/raxod502/python-in-a-box/issues/2.
I did come across this on GitHub, so something like it could be used, but that has a lot of limits.
Another approach would be to create another Python file which parses the original one, removes the bad code, and runs the file. However, that would still be hit-and-miss.
