How to import lxml xpath functions to default namespace? - python

This is an example from the lxml docs:
>>> regexpNS = "http://exslt.org/regular-expressions"
>>> find = etree.XPath("//*[re:test(., '^abc$', 'i')]",
... namespaces={'re':regexpNS})
>>> root = etree.XML("<root><a>aB</a><b>aBc</b></root>")
>>> print(find(root)[0].text)
aBc
I want to import the re:test() function into the default namespace, so that I can call it without the re: prefix. How can I do it? Thanks!

You can put a function in the empty function namespace:
functionNS = etree.FunctionNamespace(None)
functionNS['test'] = lambda context, nodes, *args: print(context, nodes, args)
This registers the new test function under the empty namespace, which means you can call it without a prefix:
root.xpath("//*[test(., 'arg1', 'arg2')]")
Unfortunately, the function that is called for "{http://exslt.org/regular-expressions}test" isn't available from Python. It exists only inside lxml's C extension, so you can't simply assign it to functionNS['test'].
That means you'd need to reimplement it in Python to register it in the empty function namespace...
If that's not worth the trouble just to save typing three characters, you could use this trick to register the re prefix for the namespace globally:
etree.FunctionNamespace("http://exslt.org/regular-expressions").prefix = 're'
Then at least you don't need to pass the namespaces dict for each xpath expression.
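Here is a rough sketch of that Python reimplementation. It makes two simplifying assumptions: Python's `re` syntax is close enough to the EXSLT regex flavour for your patterns (they are not identical), and only the `i` flag matters for a boolean test. The function name `re_test` is illustrative. XPath hands a node-set argument such as `.` to extension functions as a Python list, so the sketch uses the string value of the first node.

```python
import re

from lxml import etree


def re_test(context, value, pattern, flags=""):
    """Pure-Python stand-in for EXSLT re:test(string, regex, flags)."""
    if isinstance(value, list):
        # Node-set: XPath's string conversion uses the first node's text.
        if not value:
            value = ""
        else:
            first = value[0]
            # Elements have itertext(); attribute/text results are strings.
            value = "".join(first.itertext()) if hasattr(first, "itertext") else str(first)
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.search(pattern, str(value), re_flags) is not None


# Register under the empty namespace so XPath can call test(...) unprefixed.
function_ns = etree.FunctionNamespace(None)
function_ns["test"] = re_test

root = etree.XML("<root><a>aB</a><b>aBc</b></root>")
print(root.xpath("//*[test(., '^abc$', 'i')]")[0].text)  # aBc
```

The registration is global for the process, so every XPath evaluation will see the unprefixed `test`.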

Related

Python reconstruct function from AST, default parameters

I am attempting to implement a decorator that receives a function, parses it into an AST, eventually will do something to the AST, then reconstruct the original (or modified) function from the AST and return it. My current approach is, once I have the AST, compile it to a code <module> object, then get the constant in it with the name of the function, convert it to FunctionType, and return it. I have the following:
import ast, inspect, types

def as_ast(f):
    source = inspect.getsource(f)
    source = '\n'.join(source.splitlines()[1:])  # Remove as_ast decoration, pretend there can be no other decorations for now
    tree = ast.parse(source)
    print(ast.dump(tree, indent=4))  # Debugging log
    # I would modify the AST somehow here
    filename = f.__code__.co_filename
    code = compile(tree, filename, 'exec')
    func_code = next(
        filter(
            lambda x: isinstance(x, types.CodeType) and x.co_name == f.__name__,
            code.co_consts))  # Get function object
    func = types.FunctionType(func_code, {})
    return func

@as_ast
def test(arg: int=4):
    print(f'{arg=}')
Now, I would expect that calling test later in this source code will simply have the effect of calling test if the decorator were absent, which is what I observe, so long as I pass an argument for arg. However, if I pass no argument, instead of using the default I gave (4), it throws a TypeError for the missing argument. This makes it pretty clear that my approach for getting a callable function from the AST is not quite correct, as the default argument is not applied, and there may be other details that would slip through as it is now. How might I be able to correctly recreate the function from the AST? The way I currently go from the code module object to the function code object also seems... off intuitively, but I do not know how else one might achieve this.
The root node of the AST is a Module. Calling compile() on the AST results in a code object for a module. Looking at the compiled code object with dis.dis() from the standard library shows that the module-level code builds the function and stores it in the global namespace. So the easiest thing to do is exec the compiled code and then get the function from the 'global' environment of the exec call.
The AST node for the function includes a list of the decorators to be applied to the function. Any decorators that haven't been applied yet should be deleted from the list so they don't get applied twice (once when this decorator compiles the code, and once after this decorator returns). And delete this decorator from the list or you'll get an infinite recursion. The question is what to do with any decorators that came before this one. They have already run, but their result is tossed out because this decorator (as_ast) goes back to the source code. You can leave them in the list so they get rerun, or delete them if they don't matter.
In the code below, all the decorators are deleted from the parse tree, under the assumption that the as_ast decorator is applied first. The call to exec() uses a copy of globals() (the globals of the module where as_ast is defined), so the recreated function has access to any other globally visible names (variables, functions, etc.). See the docs for exec() for other considerations. Uncomment the print statements to see what is going on.
import ast
import dis
import inspect
import types

def as_ast(f):
    source = inspect.getsource(f)
    #print(f"=== source ===\n{source}")
    tree = ast.parse(source)
    #print(f"\n=== original ===\n{ast.dump(tree, indent=4)}")

    # Remove the decorators from the AST, because the modified function will
    # be passed to them anyway and we don't want them to be called twice.
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            node.decorator_list.clear()

    # Make modifications to the AST here
    #print(f"\n=== revised ===\n{ast.dump(tree, indent=4)}")

    name = f.__code__.co_name
    code = compile(tree, name, 'exec')
    #print("\n=== byte code ===")
    #dis.dis(code)
    #print()

    temp_globals = dict(globals())
    exec(code, temp_globals)
    return temp_globals[name]
Note: this decorator has not been tested much and has not been tested at all on methods or nested functions.
An interesting idea would be for as_ast to return the AST. Then subsequent decorators could manipulate the AST. Lastly, a from_ast decorator could compile the modified AST into a function.
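The compile/exec step can be separated out along those lines. Below is a minimal sketch of the `from_ast` half only. It assumes the tree is an `ast.Module` whose first statement is the function definition, which is what `ast.parse(inspect.getsource(f))` gives for a plain top-level function. The `namespace` parameter is a hypothetical stand-in for whatever globals the recreated function should see. Because the module code actually runs the `def` statement, default values such as `arg: int = 4` are evaluated and attached. The question's `types.FunctionType(func_code, {})` call skips that step.

```python
import ast


def from_ast(tree, namespace=None):
    """Compile a Module whose first statement is a def and return that function.

    Executing the module-level code runs the `def` statement itself, so the
    default values are evaluated and stored on the new function object.
    """
    func_def = tree.body[0]
    if not isinstance(func_def, (ast.FunctionDef, ast.AsyncFunctionDef)):
        raise TypeError("expected a module starting with a function definition")
    # Nodes created by hand during AST edits may lack line numbers.
    ast.fix_missing_locations(tree)
    code = compile(tree, "<ast>", "exec")
    # Copy the namespace so exec() does not write into the caller's dict.
    scope = {} if namespace is None else dict(namespace)
    exec(code, scope)
    return scope[func_def.name]


tree = ast.parse("def test(arg: int = 4):\n    return f'{arg=}'\n")
test = from_ast(tree)
print(test())   # arg=4
print(test(7))  # arg=7

# Example AST edit before compiling again: replace the default 4 with 10.
tree.body[0].args.defaults[0] = ast.Constant(10)
print(from_ast(tree)())  # arg=10
```

This also avoids searching `co_consts` for the function's code object: the module code already knows how to turn that code object into a complete function.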

Call Python function using dynamic string variables

I am trying to create a dynamic method executor, where I have a list that will always contain two elements. The first element is the name of the file, the second element is the name of the method to execute.
How can I achieve this?
My code below unfortunately doesn't work, but it will give you a good indication of what I am trying to achieve.
from logic.intents import CenterCapacity

def method_executor(event):
    call_reference = ['CenterCapacity', 'get_capacity']
    # process method call
    return call_reference[0].call_reference[1]
Thanks!
You can use __import__ to look up the module by name and then use getattr to find the function. For example, if the following code is in a file called exec.py, then
def dummy(): print("dummy")

def lookup(mod, func):
    module = __import__(mod)
    return getattr(module, func)

if __name__ == "__main__":
    lookup("exec", "dummy")()
will output
dummy
Addendum
Alternatively, importlib.import_module can be used. Although it is a bit more verbose, it may be easier to use.
The most important difference between these two functions is that import_module() returns the specified package or module (e.g. pkg.mod), while __import__() returns the top-level package or module (e.g. pkg).
def lookup(mod, func):
    import importlib
    module = importlib.import_module(mod)
    return getattr(module, func)
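The following sketch shows the difference described above using only standard-library modules. `math` and `os.path` are convenient examples and do not come from the question. `__import__("os.path")` returns the top-level `os` package, so `getattr(__import__("os.path"), "join")` would raise AttributeError. `importlib.import_module` returns the submodule itself. The two-element `call_reference` mirrors the list in the question.

```python
import importlib


def lookup(mod, func):
    # import_module returns the named (sub)module itself, e.g. os.path.
    module = importlib.import_module(mod)
    return getattr(module, func)


call_reference = ["math", "sqrt"]  # [module name, function name]
print(lookup(*call_reference)(16.0))  # 4.0

# __import__ returns the top-level package for a dotted name...
print(__import__("os.path").__name__)  # os
# ...while import_module returns the submodule that was asked for.
print(importlib.import_module("os.path") is __import__("os").path)  # True
```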
Starting from:
from logic.intents import CenterCapacity

def method_executor(event):
    call_reference = ['CenterCapacity', 'get_capacity']
    # process method call
    return call_reference[0].call_reference[1]
Option 1
We have several options. The first one uses a class reference and getattr. For this, we have to remove the quotes around the class name and instantiate the class before looking up the method (you do not have to instantiate the class when the method is a staticmethod).
def method_executor(event):
    call_reference = [CenterCapacity, 'get_capacity']  # We now store a class reference
    # process method call
    return getattr(call_reference[0](), call_reference[1])
Option 2
A second option is based on this answer. It uses the getattr function twice. We first get the current module using sys.modules[__name__], then get the class from it using getattr, and finally get the method from the class the same way.
import sys

def method_executor(event):
    call_reference = ['CenterCapacity', 'get_capacity']
    class_ref = getattr(sys.modules[__name__], call_reference[0])
    return getattr(class_ref, call_reference[1])
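Options 1 and 2 can be combined into a minimal runnable sketch. `logic.intents` is not available here, so the sketch builds a hypothetical stand-in module named `intents_stub` with an equally hypothetical `CenterCapacity.get_capacity` that returns a fixed value. In real code you would pass `"logic.intents"` as the module name. The sketch looks the class up by name, instantiates it, looks the method up by name and calls it. Unlike Options 1 and 2, it returns the method's result rather than the method object.

```python
import importlib
import sys
import types

# Hypothetical stand-in for logic.intents, registered in sys.modules so that
# importlib.import_module() finds it without a file on disk.
intents_stub = types.ModuleType("intents_stub")


class CenterCapacity:
    def get_capacity(self):
        return 42  # placeholder value for the example


intents_stub.CenterCapacity = CenterCapacity
sys.modules["intents_stub"] = intents_stub


def method_executor(call_reference, module_name="intents_stub"):
    class_name, method_name = call_reference
    module = importlib.import_module(module_name)  # e.g. "logic.intents"
    cls = getattr(module, class_name)              # class looked up by name
    method = getattr(cls(), method_name)           # bound method on a new instance
    return method()                                # actually execute it


print(method_executor(["CenterCapacity", "get_capacity"]))  # 42
```

The `event` parameter from the question is replaced here by the `call_reference` list so that the lookup can be exercised directly.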
Option 3
A third option could be based on a full import path and use __import__('module.class'), take a look at this SO post.
(Note: This answer assumes that the necessary imports have already happened, and you just need a mechanism to invoke the functions of the imported modules. If you also want the import to be done by program code, I will have to add that part, using the importlib library.)
You can do this:
globals()[call_reference[0]].__dict__[call_reference[1]]()
Explanation:
globals() returns a mapping between global variable names and their referenced objects. The imported module's name counts as one of these global variables of the current module.
Indexing this mapping object with call_reference[0] returns the module object containing the function to be called.
The module object's __dict__ maps each attribute-name of the module to the object referenced by that attribute. Functions defined in the module also count as attributes of the module.
Thus, indexing __dict__ with the function name call_reference[1] returns the function object.
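The same lookup can be written with `getattr` on the module instead of indexing `__dict__`. `getattr` also finds attributes that are not stored in the module's `__dict__`, such as those supplied by a module-level `__getattr__`. The small sketch below does that and fails with a clearer message. `call_by_names` is a hypothetical helper name, and `math` is imported purely as an example target.

```python
import math  # example module; stands in for whatever the caller imported


def call_by_names(call_reference, *args):
    module_name, func_name = call_reference
    try:
        module = globals()[module_name]  # the module must already be imported here
    except KeyError:
        raise LookupError(f"module {module_name!r} has not been imported") from None
    func = getattr(module, func_name, None)
    if not callable(func):
        raise LookupError(f"{module_name}.{func_name} is not callable")
    return func(*args)


print(call_by_names(["math", "sqrt"], 16.0))  # 4.0
```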

python how to define function with optional parameters by square brackets?

I often find some functions defined like open(name[, mode[, buffering]]) and I know it means optional parameters.
The Python documentation says it's a module-level function. When I try to define a function in this style, it always fails.
For example
def f([a[,b]]): print('123')
does not work.
Can someone tell me what module-level means and how I can define a function in this style?
Is this what you are looking for?
>>> def abc(a=None, b=None):
...     if a is not None: print(a)
...     if b is not None: print(b)
...
>>> abc("a")
a
>>> abc("a","b")
a
b
>>> abc()
>>>
"if we can define optional parameters using this way(no at present)"
The square-bracket notation is not Python syntax. It is Backus-Naur form (BNF), a documentation convention only.
A module-level function is a function defined in a module (including __main__) - this is in contrast to a function defined within a class (a method).
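To get closer to a signature like `open(name[, mode[, buffering]])` in Python 3, one option is to combine default values with a private sentinel. The sentinel lets the function tell "not passed" apart from an explicit `None`. The trailing `/` (Python 3.8+) additionally makes the parameters positional-only, as the bracket notation implies. The names `f` and `_MISSING` are illustrative.

```python
_MISSING = object()  # sentinel: distinguishes "omitted" from an explicit None


def f(a=_MISSING, b=_MISSING, /):
    """Callable as f(), f(a) or f(a, b), mirroring f([a[, b]])."""
    passed = [x for x in (a, b) if x is not _MISSING]
    return passed


print(f())         # []
print(f(1))        # [1]
print(f(1, None))  # [1, None]
```

Because of the `/`, a call such as `f(b=2)` raises TypeError instead of silently skipping `a`.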

Parse Python file and evaluate selected functions

I have a file that contains several python functions, each with some statements.
def func1():
    codeX...
def func2():
    codeY...
codeX and codeY can be multiple statements. I want to be able to parse the file, find a function by name, then evaluate the code in that function.
With the ast module, I can parse the file, find the FunctionDef objects, and get the list of Stmt objects, but how do I turn this into bytecode that I can pass to eval? Should I use the compile module, or the parser module instead?
Basically, the function defs are just used to create separate blocks of code. I want to be able to grab any block of code given the name and then execute that code in eval (providing my own local/global scope objects). If there is a better way to do this than what I described that would be helpful too.
Thanks
I want to be able to grab any block of code given the name and then execute that code ... (providing my own local/global scope objects).
A naive solution looks like this. This is based on the assumption that the functions don't all depend on global variables.
from file_that_contains_several_python_functions import *
Direction = some_value
func1()
func2()
func3()
That should do exactly what you want.
However, if all of your functions rely on global variables -- a design that calls to mind 1970s-era FORTRAN -- then you have to do something slightly more complex.
from file_that_contains_several_python_functions import *
Direction = some_value
func1( globals() )
func2( globals() )
func3( globals() )
And you have to rewrite all of your global-using functions like this.
def func1( context ):
    globals().update( context )
    # Now you have access to all kinds of global variables
This seems ugly because it is. Functions which rely entirely on global variables are not really the best idea.
Using Python 2.6.4:
text = """
def fun1():
    print('fun1')
def fun2():
    print('fun2')
"""
import ast
tree = ast.parse(text)
# tree.body[0] contains FunctionDef for fun1, tree.body[1] for fun2
wrapped = ast.Interactive(body=[tree.body[1]])
code = compile(wrapped, 'yourfile', 'single')
eval(code)
fun2()  # prints 'fun2'
Take a look at the grammar in the ast docs: http://docs.python.org/library/ast.html#abstract-grammar. A top-level node must be a Module, Interactive or Expression, so you need to wrap the function definition in one of those.
If you're using Python 2.6 or later, then the compile() function accepts AST objects in addition to source code.
>>> import ast
>>> a = ast.parse("print('hello world')")
>>> x = compile(a, "(none)", "exec")
>>> eval(x)
hello world
These modules have all been rearranged for Python 3.
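The original goal was to pick a block by function name and run it with your own scope. A Python 3 sketch along these lines should do that. It takes the body of the matching `FunctionDef`, wraps it in an `ast.Module` (which needs `type_ignores` on Python 3.8+), compiles it in `"exec"` mode and `exec`s it with a caller-supplied dict. `run_block` is a hypothetical helper name. A `return` statement inside a block would be rejected by `compile()`, because the statements no longer sit inside a function.

```python
import ast


def run_block(source, func_name, scope, filename="<blocks>"):
    """Execute the body of `def func_name(): ...` from source inside `scope`."""
    tree = ast.parse(source, filename)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            # Re-wrap the statements as a module; they keep their original
            # line numbers, so tracebacks still point into the file.
            block = ast.Module(body=node.body, type_ignores=[])
            code = compile(block, filename, "exec")
            exec(code, scope)  # scope acts as the globals for the block
            return scope
    raise LookupError(f"no function named {func_name!r}")


source = """
def func1():
    y = Direction + 1

def func2():
    z = Direction * 2
"""
scope = {"Direction": 10}
run_block(source, "func1", scope)
run_block(source, "func2", scope)
print(scope["y"], scope["z"])  # 11 20
```

For a real file, `source` would come from reading the file, and `filename` would be its path.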

Python: Convert string into function name; getattr or equal?

I am editing PROSS.py to work with .cif files for protein structures. Inside the existing PROSS.py, there are the following functions (I believe that's the correct name if they're not associated with any class?), just existing within the .py file:
...
def unpack_pdb_line(line, ATOF=_atof, ATOI=_atoi, STRIP=string.strip):
...
...
def read_pdb(f, as_protein=0, as_rna=0, as_dna=0, all_models=0,
             unpack=unpack_pdb_line, atom_build=atom_build):
I am adding an options parser for command line arguments, and one of the options is to specify an alternate method to use besides unpack_pdb_line. So the pertinent part of the options parser is:
...
parser.add_option("--un", dest="unpack_method", default="unpack_pdb_line", type="string", help="Unpack method to use. Default is unpack_pdb_line.")
...
unpack=options.unpack_method
However, options.unpack_method is a string and I need to use the function with the same name as the string inside options.unpack_method. How do I use getattr etc to convert the string into the actual function name?
Thanks,
Paul
Usually you just use a dict and store (func_name, function) pairs:
unpack_options = { 'unpack_pdb_line' : unpack_pdb_line,
'some_other' : some_other_function }
unpack_function = unpack_options[options.unpack_method]
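The dictionary also fits well with command-line parsing. The sketch below uses `argparse`, which is swapped in here for the `optparse` module used in the question. `choices` restricts `--un` to the dictionary's keys, so an unknown name is rejected before any lookup happens. The two unpack functions are hypothetical placeholders for the real ones in PROSS.py.

```python
import argparse


def unpack_pdb_line(line):
    return ("pdb", line)  # placeholder for the real PROSS.py function


def unpack_cif_line(line):
    return ("cif", line)  # hypothetical alternative unpacker


unpack_options = {
    "unpack_pdb_line": unpack_pdb_line,
    "unpack_cif_line": unpack_cif_line,
}

parser = argparse.ArgumentParser()
parser.add_argument("--un", dest="unpack_method", default="unpack_pdb_line",
                    choices=sorted(unpack_options),
                    help="Unpack method to use. Default is unpack_pdb_line.")

options = parser.parse_args(["--un", "unpack_cif_line"])
unpack = unpack_options[options.unpack_method]
print(unpack("ATOM 1"))  # ('cif', 'ATOM 1')
```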
If you want to exploit the dictionaries (&c) that Python's already keeping on your behalf, I'd suggest:
import sys

def str2fun(astr):
    module, _, function = astr.rpartition('.')
    if module:
        __import__(module)
        mod = sys.modules[module]
    else:
        mod = sys.modules['__main__']  # or whatever's the "default module"
    return getattr(mod, function)
You'll probably want to check the function's signature (and catch exceptions to provide nicer error messages) e.g. via inspect, but this is a useful general-purpose function.
It's easy to add a dictionary of shortcuts, as a fallback, if some known functions' full string names (including module/package qualifications) are unwieldy to express this way.
Note we don't use __import__'s result (it doesn't work right when the function is in a module inside some package, as __import__ returns the top-level name of the package... just accessing sys.modules after the import is more practical).
vars()["unpack_pdb_line"]() will work too.
globals() or locals() will also work in a similar way.
>>> def a():return 1
>>>
>>> vars()["a"]
<function a at 0x009D1230>
>>>
>>> vars()["a"]()
1
>>> locals()["a"]()
1
>>> globals()["a"]()
1
Cheers,
If you are taking input from a user, for the sake of security it is probably best to use a hand-made dict which will accept only a well-defined set of admissible user inputs:
unpack_options = { 'unpack_pdb_line' : unpack_pdb_line,
'unpack_pdb_line2' : unpack_pdb_line2,
}
Ignoring security for a moment, let us note in passing that an easy way to go from (strings of variable names) to (the value referenced by the variable name) is to use the globals() builtin dict:
unpack_function=globals()['unpack_pdb_line']
Of course, that will only work if the variable unpack_pdb_line is in the global namespace.
If you need to reach into a package for a module, or a module for a variable, then you could use this function:
import sys

def str_to_obj(astr):
    print('processing %s' % astr)
    try:
        return globals()[astr]
    except KeyError:
        try:
            __import__(astr)
            mod = sys.modules[astr]
            return mod
        except ImportError:
            module, _, basename = astr.rpartition('.')
            if module:
                mod = str_to_obj(module)
                return getattr(mod, basename)
            else:
                raise
You could use it like this:
str_to_obj('scipy.stats')
# <module 'scipy.stats' from '/usr/lib/python2.6/dist-packages/scipy/stats/__init__.pyc'>
str_to_obj('scipy.stats.stats')
# <module 'scipy.stats.stats' from '/usr/lib/python2.6/dist-packages/scipy/stats/stats.pyc'>
str_to_obj('scipy.stats.stats.chisquare')
# <function chisquare at 0xa806844>
It works for nested packages, modules, functions, or (global) variables.
function = eval_dottedname(name if '.' in name else "%s.%s" % (__name__, name))
Where eval_dottedname():
from functools import reduce  # builtin in Python 2, must be imported in Python 3

def eval_dottedname(dottedname):
    """
    >>> eval_dottedname("os.path.join") #doctest: +ELLIPSIS
    <function join at 0x...>
    >>> eval_dottedname("sys.exit") #doctest: +ELLIPSIS
    <built-in function exit>
    >>> eval_dottedname("sys") #doctest: +ELLIPSIS
    <module 'sys' (built-in)>
    """
    return reduce(getattr, dottedname.split(".")[1:],
                  __import__(dottedname.partition(".")[0]))
eval_dottedname() is the only one among all answers that supports arbitrary names with multiple dots in them, e.g., 'datetime.datetime.now'. It doesn't work for nested modules that require an explicit import, but I can't even remember an example of such a module from the stdlib.
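On Python 3.9 and later, the standard library has `pkgutil.resolve_name`. It resolves dotted names and imports intermediate modules as needed, so it also covers the nested-module case. `xml.etree.ElementTree` is one standard-library example of such a module: `import xml` does not import the `xml.etree` subpackage, so a plain `getattr` walk starting from `xml` can fail unless something else has already imported it. A short sketch:

```python
import pkgutil

# "module.path:attr" form makes the module/attribute split explicit.
fromstring = pkgutil.resolve_name("xml.etree.ElementTree:fromstring")
# Dotted-only form: importable prefixes are imported, the rest via getattr.
join = pkgutil.resolve_name("os.path.join")
now = pkgutil.resolve_name("datetime.datetime.now")

print(fromstring("<a>hi</a>").text)  # hi
```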
