How to make a .py script readable by another python script? - python

I want to build a program (in Python) which is analysing other Python scripts. Therefor I need a way to make .py-files readable for a Python program.
I thought about simply converting .py to .txt and then using .startswith and .find methods. Is there a way to convert .py to .txt?
Also feel free to tell other ways for analysing. Important is that structures like if-statements or loops and indentation-levels get figured out.

Also feel free to tell other ways for analysing. Important is that structures like if-statements or loops and indentation-levels get figured out.
If you want to preserve this kind of structure in exactly the same way that Python would itself parse the file, you should use the standard library ast module (https://docs.python.org/3/library/ast.html). AST means "abstract syntax tree": the representation of the code as Python understands it.
The basic usage pattern is to call ast.parse(file) (https://docs.python.org/3/library/ast.html#ast.parse) on the file you want to parse. You'll get back an object (https://docs.python.org/3/library/ast.html#ast.AST) which is the top of a tree of AST nodes.
You may be interested in picking through the source code for black (https://github.com/psf/black), which is a Python code formatter that uses ast to validate that the formatted code has the exact same behavior as the code it was originally run on.

Related

Regular expression to match python method

Is there a python regex which will generically match a method definition (not just the declaration but also the method body) inside a python code file?
I did my share of googling but only found something similar for Java. Python is different w.r.t. to the way scopes are entered through indentation rather than accolades. What makes this problem hard is the fact that indentation may drop inside the method body (i.e. blank lines, multiline strings, comments).
I also looked for DOM parsers but basically they're all aimed at XML or HTML.
Finally I am looking into introspection (How can I get the source code of a Python function?) but I still wonder if there is a nicer way for code analysis without execution.
EDIT: the question receives a bunch of downvotes but I think it's actually a valid and specific programming question. I elaborated the question a bit.
Err, you don't want to use regexes to parse Python. The 'nicer way for code analysis without execution' is to use the Python standard library parser and/or ast modules. Look under the heading Python Language Services, e.g. https://docs.python.org/2/library/language.html

How to store a python function within an XML file

I'm developing a GUI application in Python that stores it's documents in an XML based format. The application is a mathematical model which several pre-defined components which can be drag-and-dropped. I'd also like the user to be able to create custom components by writing a python function inside an editor provided within the application. My issue is with storing these functions in the XML.
A function might look something like this:
def func(node, timestamp):
return node.weight * timestamp.day + 4
These functions are wrapped in an object which provides a standard way of calling them (compared to the pre-defined components). If I was to create one from Python directly it would look like this:
parameter = ParameterFunction(func)
The function is then called by the model like this:
parameter.value(node=node, timestamp=timestamp)
The ParameterFunction object has a to_xml and from_xml functions which need to serialise/deserialise the object to/from an XML representation.
My question is: how do I store the Python functions in an XML document?
One solution I have thought of so far is to store the function definition as a string, eval() or exec() it for use but keep the string, then store the string in a CDATA block in the XML. Are there any issues with this that I'm not seeing?
An alternative would be to store all of the Python code in a separate file, and have the XML reference just the function names. This could be nice as it could be edited easily in an external editor. In which case what is the best way to import the code? I am envisiging fighting with the python import path...
I'm aware there are will be security concerns with running untrusted code, but I'm willing to make this tradeoff for the freedom it gives users.
The specific application I'm referring to is on github. I'm happy to provide more information if it's needed, but I've tried to keep it fairly generic here. https://github.com/snorfalorpagus/pywr/blob/120928eaacb9206701ceb9bc91a5d73740db1953/pywr/core.py#L396-L402
Nope, you have the easiest and best solution that I can think of. Just keep them as strings, as long as your not worried about running the untrusted code.
The way I'd deal with external python scripts containing tiny snippets like yours would be to treat them as plain text files and read them in as strings. This avoids all the problems with importing them. Just read them in and call exec on them, then the functions will exist in scope.
EDIT: I was going to add something on sandboxing python code, but after a bit of research it seems this will not be an easy task, it would be easier to sandbox the entire program. Another longer and harder way to restrict the untrusted code would be to create your own tiny interpreter that only did safe operations (i.e mathematical operations, calling existing functions, etc..)

How to use exec safely? I would like to use a dynamically-loaded code module as a configuration file

After reading the Software Carpentry essay on Handling Configuration Files I'm interested in their Method #5: put parameters in a dynamically-loaded code module. Basically I want the power to do calculations within my input files to create my variables.
Based on this SO answer for how to import a string as a module I've written the following function to import a string or oen fileobject or STringIO as a module. I can then access varibales using the . operator:
import imp
def make_module_from_text(reader):
"""make a module from file,StringIO, text etc
Parameters
----------
reader : file_like object
object to get text from
Returns
-------
m: module
text as module
"""
#for making module out of strings/files see https://stackoverflow.com/a/7548190/2530083
mymodule = imp.new_module('mymodule') #may need to randomise the name; not sure
exec reader in mymodule.__dict__
return mymodule
then
import textwrap
reader = textwrap.dedent("""\
import numpy as np
a = np.array([0,4,6,7], dtype=float)
a_normalise = a/a[-1]
""")
mymod = make_module_from_text(reader)
print(mymod.a_normalise)
gives
[ 0. 0.57142857 0.85714286 1. ]
All well and good so far, but having looked around it seems using python eval and exec introduces security holes if I don't trust the input. A common response is "Never use eval orexec; they are evil", but I really like the power and flexibility of executing the code. Using {'__builtins__': None} I don't think will work for me as I will want to import other modules (e.g. import numpy as np in my above code). A number of people (e.g. here) suggest using the ast module but I am not at all clear on how to use it(can ast be used with exec?). Is there simple ways to whitelist/allow specific functionality (e.g. here)? Is there simple ways to blacklist/disallow specific functionality? Is there a magic way to say execute this but don't do anythinh nasty.
Basically what are the options for making sure exec doesn't run any nasty malicious code?
EDIT:
My example above of normalising an array within my input/configuration file is perhaps a bit simplistic as to what computations I would want to perform within my input/configuration file (I could easily write a method/function in my program to do that). But say my program calculates a property at various times. The user needs to specify the times in some way. Should I only accept a list of explicit time values so the user has to do some calculations before preparing the input file? (note: even using a list as configuration variable is not trivial see here). I think that is very limiting. Should I allow start-end-step values and then use numpy.linspace within my program? I think that is limiting too; whatif I want to use numpy.logspace instead? What if I have some function that can accept a list of important time values and then nicely fills in other times to get well spaced time values. Wouldn't it be good for the user to be able to import that function and use it? What if I want to input a list of user defined objects? The thing is, I don't want to code for all these specific cases when the functinality of python is already there for me and my user to use. Once I accept that I do indead want the power and functionality of executing code in my input/configuration file I wonder if there is actually any difference, security wise, in using exec vs using importlib vs imp.load_source and so on. To me there is the limited standard configparser or the all powerful, all dangerous exec. I just wish there was some middle ground with which I could say 'execute this... without stuffing up my computer'.
"Never use eval or exec; they are evil". This is the only answer that works here, I think. There is no fully safe way to use exec/eval on an untrusted string string or file.
The best you can do is to come up with your own language, and either interpret it yourself or turn it into safe Python code before handling it to exec. Be careful to proceed from the ground up --- if you allow the whole Python language minus specific things you thought of as dangerous, it would never be really safe.
For example, you can use the ast module if you want Python-like syntax; and then write a small custom ast interpreter that only recognizes a small subset of all possible nodes. That's the safest solution.
If you are willing to use PyPy, then its sandboxing feature is specifically designed for running untrusted code, so it may be useful in your case. Note that there are some issues with CPython interoperability mentioned that you may need to check.
Additionally, there is a link on this page to an abandoned project called pysandbox, explaining the problems with sandboxing directly within python.

How should I read and write a file format using 1 definition?

I want to modify quicktime files, so I'm working with quicktime.py but it only parses information. It doesn't know how to write/modify things.
In C, struct records are actually very powerful - you get 4 things for the cost of 1 readable definition:
Define names for each variable.
Define serializable types and order for each variable (let's ignore machine specific shenanigans for this discussion).
Pack (write)
Unpack ('parse')
In python, the struct module can do numbers 2-4 but you need to do extra work to make python define names for both packing and unpacking based on 1 definition (DRY).
OTOH ctypes is able to do 1-4 (3-4 aren't exactly in the stdlib but they're easier to patch in using this) and ctypes supports nesting.
I understand that if more complex parsing/serializing is needed, specific code will be written. But still it seems to me that I should be able to explain to python what a file looks like and it could do the rest (pack/unpack). Problem is that ctypes is advertised as a "foreign function library" so it's not "supposed" to do this. Another issue is ctypes probably won't work well with a HUGE file where you just want to seek to wherever and change a few bits, though I haven't tested this.
Here's the question: what's the DRY way to read and modify binary formats in python?
Maybe try Hachoir ?
Try Construct, it does exactly what you want.

how to treat ruby symbols in cross language object serialization

I'm currently working on a project where I need to transfer objects from ruby to python and back again, obviously serialization is the way to go. I've looked at things like yaml but decided to write my own as I didn't want to deal with the dependencies of the libraries and such when it came time to distribute. I've wrote up how this serialization format works here.
my question is that as this format is intended to work cross language between ruby and python,
how should I serialize ruby's symbols? I'm not aware of a object that works the same way in python. should a dump containing a symbol fail? should I just serialize it as a string? what would be best?
Doesn't that depend on what your project needs? If symbols are important, you'll need some way to deal with them.
I'm not a Ruby programmer, but from what I've just read, I think converting them to strings is probably easiest. The standard Python interpreter will reuse memory for identical short strings, which seems to be a key reason suggested for using symbols.
EDIT: If it needs to work for other programmers, passing values back and forth shouldn't change them. So you either have to handle symbols properly, or throw an error straight away. It should be simple enough in Python:
class Symbol(str):
pass
# In serialising code:
if isinstance(x, Symbol):
serialise_as_symbol(x)
Any reason you're not using a standard data interchange format like JSON or XML? They seem to be acceptable to countless applications, services, and programmers.
If symbols are a stumbling block then you have three choices, don't allow them, convert them to strings on the fly or figure out a way to make them universal and/or innocuous in other languages.

Categories

Resources