I'm trying to write an python script to collect one specific function's parameters.
Parameters can be in multiple lines like this:
str = "getParameters(['ABCD_1','ABCD_2',\
'ABCD_3','ABCD_4'])\
This works already: (it can catch every words between ' and '):
parameters = re.findall(r'\'[\w-]+\'', str)
for parameter in parameters:
print parameter
But I want that only in case of getParameters function the parameters to be collect, and this does not work:
getparameters = re.findall(r'getParameters\(\[[\w-]+', str, re.X|re.DOTALL)
for line in getparameters:
print line
Please suggest!
Here is an example using ast, just for fun.
import ast
module = ast.parse(
"""getParameters(['ABCD_1','ABCD_2',
'ABCD_3','ABCD_4'])""")
for item in module.body:
if isinstance(item.value, ast.Call) and item.value.func.id == 'getParameters':
parameters = [each.s for each in item.value.args[0].elts]
print parameters
If you're fixed on using RegEx and if your function occurs exactly once, you can try:
re.findall('\'(\w+)\',?', re.search('(getParameters\(.+?\))', x, re.X|re.S).group(1), re.X|re.S)
It's not ideal, but it works. I am sure there is a better way to do this.
Related
I'm trying to use the ast module in Python to parse input code, but am struggling with a lot of the syntax of how to do so. For instance, I have the following code as a testing environment:
import ast
class NodeVisitor(ast.NodeVisitor):
def visit_Call(self, node):
for each in node.args:
print(ast.literal_eval(each))
self.generic_visit(node)
line = "circuit = QubitCircuit(3, True)"
tree = ast.parse(line)
print("VISITOR")
visitor = NodeVisitor()
visitor.visit(tree)
Output:
VISITOR
3
True
In this instance, and please correct me if I'm wrong, the visit_Call will be used if it's a function call? So I can get each argument, however there's no guarantee it will work like this as there are different arguments available to be provided. I understand that node.args is providing my arguments, but I'm not sure how to do things with them?
I guess what I'm asking is how do I check what the arguments are and do different things with them? I'd like to check, perhaps, that the first argument is an Int, and if so, run processInt(parameter) as an example.
The value each in your loop in the method will be assigned to the AST node for each of the arguments in each function call you visit. There are lots of different types of AST nodes, so by checking which kind you have, you may be able to learn things about the argument being passed in.
Note however that the AST is about syntax, not values. So if the function call was foo(bar), it's just going to tell you that the argument is a variable named bar, not what the value of that variable is (which it does not know). If the function call was foo(bar(baz)), it's going to show you that the argument is another function call. If you only need to handle calls with literals as their arguments, then you're probably going to be OK, you'll just look instances of AST.Num and similar.
If you want to check if the first argument is a number and process it if it is, you can do something like:
def visit_Call(self, node):
first_arg = node.args[0]
if isinstance(first_arg, ast.Num):
processInt(first_arg.n)
else:
pass # Do you want to do something on a bad argument? Raise an exception maybe?
I would like to call a function from a user input, but include arguments in the parenthesis. For example, if I have a function that takes one argument:
def var(value):
print(value)
I would like to ask the user for a command and arguments, then call the function with the arguments:
Input Command: var("Test")
Test
Split the function name from the arguments. Look up the function by name using a predefined map. Parse the arguments with literal_eval. Call the function with the arguments.
available = {}
def register_func(f):
available[f.__name__] = f
#register_func
def var(value):
print(value)
from ast import literal_eval
def do_user_func(user_input):
name, args = user_input.split('(', 1)
return available[name](*literal_eval('(' + args[:-1] + ',)'))
do_user_func("var('test')") # prints "test"
This is still incredibly brittle, any invalid input will fail (such as forgetting parentheses, or an invalid function name). It's up to you to make this more robust.
literal_eval is still somewhat unsafe on untrusted input, as it's possible to construct small strings that evaluate to large amounts of memory. '[' * 10 + ']' * 10, for a safe but demonstrative example.
Finally, do not use eval on untrusted user input. There is no practical way to secure it from malicious input. While it will evaluate the nice input you expect, it will also evaluate code that, for example, will delete all your files.
Any attempt to make eval safe will end up being more complex than any of the solutions here, for no practical benefit. It will still not be safe in some way you didn't anticipate. Don't do it.
I am going to post this solution as an alternative, under the assumption that you are dealing with simple inputs such as:
var(arg)
Or, a single function call that can take a list of positional arguments.
By using eval it would be a horrible un-recommended idea, as already mentioned. I think that is the security risk you were reading about.
The ideal way to perform this approach is to have a dictionary, mapping the string to the method you want to execute.
Furthermore, you can consider an alternative way to do this. Have a space separated input to know how to call your function with arguments. Consider an input like this:
"var arg1 arg2"
So when you input that:
call = input().split()
You will now have:
['var', 'arg1', 'arg2']
You can now consider your first argument the function, and everything else the arguments you are passing to the function. So, as a functional example:
def var(some_arg, other_arg):
print(some_arg)
print(other_arg)
d = {"var": var}
call = input().split()
d[call[0]](*call[1:])
Demo:
var foo bar
foo
bar
You should investigate the cmd module. This allows you to parse input similar to shell commands, but I believe you can get tricky and change the delimiters if the parentheses are an important part of the specification.
Instead of using eval, you can parse it yourself. This way, you have control over how each function should parse/deserialize the user input's arguments.
import sys, re
def custom_print(value):
print value
def custom_add(addends):
print sum(addends)
def deserialize_print(args):
# just print it as is
custom_print(args)
def deserialize_add(args):
# remove all whitespace, split on commas, parse as floats
addends = [float(x) for x in re.sub(r"\s", "", args).split(",")]
# send to custom_add function
custom_add(addends)
def get_command():
cmd_input = raw_input("Command: ")
# -- check that the command is formatted properly
# and capture command groups
match = re.match(r"^([a-zA-Z0-9]+)(\(.*\))?$", cmd_input)
if match:
# extract matched groups to separate variables
(cmd, argstring) = match.groups()
# strip parenthesis off of argstring
if argstring:
args = argstring[1:-1]
# send the whole argument string to its corresponding function
if cmd == "print":
deserialize_print(args)
elif cmd == "add":
deserialize_add(args)
elif cmd == "exit":
sys.exit()
else:
print "Command doesn't exist."
else:
print "Invalid command."
# recurse until exit
get_command()
# -- begin fetching commands
get_command()
This is a pretty rough setup, although you can get by with some more error checking and improving the deserializing functions and modularizing function additions.
If the decoupled deserialize functions seem too much, you can also just move the deserialization into the custom functions themselves.
Following is an example of function called from user-input, using Class:
class Wash:
def __init__(self, amount):
self.amount = amount
if amount == 12:
print("Platinum Wash")
elif amount == 6:
print("Basic Wash")
else:
print("Sorry!")
amount = int(input("Enter amount: "))
payment = Wash(amount)
My programming is almost all self taught, so I apologise in advance if some of my terminology is off in this question. Also, I am going to use a simple example to help illustrate my question, but please note that the example itself is not important, its just a way to hopefully make my question clearer.
Imagine that I have some poorly formatted text with a lot of extra white space that I want to clean up. So I create a function that will replace any groups of white space characters that has a new line character in it with a single new line character and any other groups of white space characters with a single space. The function might look like this
def white_space_cleaner(text):
new_line_finder = re.compile(r"\s*\n\s*")
white_space_finder = re.compile(r"\s\s+")
text = new_line_finder.sub("\n", text)
text = white_space_finder.sub(" ", text)
return text
That works just fine, the problem is that now every time I call the function it has to compile the regular expressions. To make it run faster I can rewrite it like this
new_line_finder = re.compile(r"\s*\n\s*")
white_space_finder = re.compile(r"\s\s+")
def white_space_cleaner(text):
text = new_line_finder.sub("\n", text)
text = white_space_finder.sub(" ", text)
return text
Now the regular expressions are only compiled once and the function runs faster. Using timeit on both functions I find that the first function takes 27.3 µs per loop and the second takes 25.5 µs per loop. A small speed up, but one that could be significant if the function is called millions of time or has hundreds of patterns instead of 2. Of course, the downside of the second function is that it pollutes the global namespace and makes the code less readable. Is there some "Pythonic" way to include an object, like a compiled regular expression, in a function without having it be recompiled every time the function is called?
Keep a list of tuples (regular expressions and the replacement text) to apply; there doesn't seem to be a compelling need to name each one individually.
finders = [
(re.compile(r"\s*\n\s*"), "\n"),
(re.compile(r"\s\s+"), " ")
]
def white_space_cleaner(text):
for finder, repl in finders:
text = finder.sub(repl, text)
return text
You might also incorporate functools.partial:
from functools import partial
replacers = {
r"\s*\n\s*": "\n",
r"\s\s+": " "
}
# Ugly boiler-plate, but the only thing you might need to modify
# is the dict above as your needs change.
replacers = [partial(re.compile(regex).sub, repl) for regex, repl in replacers.iteritems()]
def white_space_cleaner(text):
for replacer in replacers:
text = replacer(text)
return text
Another way to do it is to group the common functionality in a class:
class ReUtils(object):
new_line_finder = re.compile(r"\s*\n\s*")
white_space_finder = re.compile(r"\s\s+")
#classmethod
def white_space_cleaner(cls, text):
text = cls.new_line_finder.sub("\n", text)
text = cls.white_space_finder.sub(" ", text)
return text
if __name__ == '__main__':
print ReUtils.white_space_cleaner("the text")
It's already grouped in a module, but depending on the rest of the code a class can also be suitable.
You could put the regular expression compilation into the function parameters, like this:
def white_space_finder(text, new_line_finder=re.compile(r"\s*\n\s*"),
white_space_finder=re.compile(r"\s\s+")):
text = new_line_finder.sub("\n", text)
text = white_space_finder.sub(" ", text)
return text
Since default function arguments are evaluated when the function is parsed, they'll only be loaded once and they won't be in the module namespace. They also give you the flexibility to replace those from calling code if you really need to. The downside is that some people might consider it to be polluting the function signature.
I wanted to try timing this but I couldn't figure out how to use timeit properly. You should see similar results to the global version.
Markus's comment on your post is correct, though; sometimes it's fine to put variables at module-level. If you don't want them to be easily visible to other modules, though, consider prepending the names with an underscore; this marks them as module-private and if you do from module import * it won't import names starting with an underscore (you can still get them if you ask from them by name, though).
Always remember; the end-all to "what's the best way to do this in Python" is almost always "what makes the code most readable?" Python was created, first and foremost, to be easy to read, so do what you think is the most readable thing.
In this particular case I think it doesn't matter. Check:
Is it worth using Python's re.compile?
As you can see in the answer, and in the source code:
https://github.com/python/cpython/blob/master/Lib/re.py#L281
The implementation of the re module has a cache of the regular expression itself. So, the small speed up you see is probably because you avoid the lookup for the cache.
Now, as with the question, sometimes doing something like this is very relevant like, again, building a internal cache that remains namespaced to the function.
def heavy_processing(arg):
return arg + 2
def myfunc(arg1):
# Assign attribute to function if first call
if not hasattr(myfunc, 'cache'):
myfunc.cache = {}
# Perform lookup in internal cache
if arg1 in myfunc.cache:
return myfunc.cache[arg1]
# Very heavy and expensive processing with arg1
result = heavy_processing(arg1)
myfunc.cache[arg1] = result
return result
And this is executed like this:
>>> myfunc.cache
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'function' object has no attribute 'cache'
>>> myfunc(10)
12
>>> myfunc.cache
{10: 12}
You can use a static function attribute to hold the compiled re. This example does something similar, keeping a translation table in one function attribute.
def static_var(varname, value):
def decorate(func):
setattr(func, varname, value)
return func
return decorate
#static_var("complements", str.maketrans('acgtACGT', 'tgcaTGCA'))
def rc(seq):
return seq.translate(rc.complements)[::-1]
Being new at programming in general, and new with Python in particular, I'm having some beginner's troubles.
I'm trying out a function from NLTK called generate:
string.generate()
It returns what seems like a string. However, if I write:
stringvariable = string.generate()
or
stringvariable = str(string.generate())
… the stringvariable is always Empty.
So I guess I'm missing something here. Can the text output generated, that I see on the screen, be something else than a string output? And if so, is there any way for me to grab that output and put it into a variable?
Briefly put, how to I get what comes out of string.generate() into stringvariable, if not as described above?
you can rewrite generate. The only disadvantage is that it can change and your code might not be updated to reflect these changes:
from nltk.util import tokenwrap
def generate_no_stdout(self, length=100):
if '_trigram_model' not in self.__dict__:
estimator = lambda fdist, bins: LidstoneProbDist(fdist, 0.2)
self._trigram_model = NgramModel(3, self, estimator=estimator)
text = self._trigram_model.generate(length)
return tokenwrap(text)
then "a.generate()" becomes "generate_no_stdout(a)"
generate() prints its output rather than returning a string, so you need to capture it.
Is there a way to pass a list as a function argument to eval() Or do I have to convert it to a string and then parse it as a list in the function?
My simple example looks like:
eval("func1(\'" + fArgs + "\')")
I'm just not sure if there is a better way of taking fArgs as a list instead of a string
Note:
The list is provided from a JSON response
EDIT: Ok here's a bit more of my class so there's a better understanding of how I'm using eval
def test(arg):
print arg
#Add all allowed functions to this list to be mapped to a dictionary
safe_list = ['test']
safe_dict = dict([ (k, locals().get(k, None)) for k in safe_list ])
class Validate:
def __init__(self, Value, fName, fArgs):
eval(fName + "(\'" + fArgs + "\')", {"__builtins__":None},safe_dict)
I may be wrong in thinking this, but to my understanding this is a safe use of eval because the only functions that can be called are the ones that are listed in the safe_list dictionary. The function to be run and the arguments for that function are being extracted out of a JSON object. The arguments are to be structured as a list, Will joining the list together with ", " be interpreted as actual arguments or just a single argument?
If you're using Python 2.6.x, then you should be able to use the json module (see py doc 19.2). If not, then there is python-json available through the python package index. Both of these packages will provide a reader for parsing JSON data into an appropriate Python data type.
For your second problem of calling a function determined by a message, you can do the following:
def foo():
print 'I am foo!'
def bar():
pass
def baz():
pass
funcs = {'func_a':foo, 'func_b':bar, 'func_c':baz}
funcs['func_a']()
This approach can be a bit more secure than eval because it prevents 'unsafe' python library functions from being injected into the JSON. However, you still need to be cautious that the data supplied to your functions can't be manipulated to cause problems.
Specifying parameters the following way works:
root#parrot$ more test.py
def func1(*args):
for i in args:
print i
l = [1,'a',9.1]
func1(*l)
root#parrot$ python test.py
1
a
9.1
so, no direct need for eval(), unless I'm misunderstanding something.
Using a library to parse JSON input may be a better approach than eval, something like:
import json
func1(json.loads(fArgs))
Assert-ing that user input is correct would be a good idea, too.
The others have a good point, that you shouldn't be using eval. But, if you must:
eval("func1(%s)" % ", ".join(fArgs))
will call the function with all the arguments in the list. This:
eval("func1([%s])" % ", ".join(fArgs))
will call it with the list of arguments in just one argument. Maybe you even want this?
eval("func1([%s])" % ", ".join(map(eval, fArgs)))
which would eval the arguments as well?