How do I extract simple numerical expressions from a string? - python

I want to code a unit converter and I need to extract the given value from the unit in the input string.
To provide a user-friendly experience, I want the user to be able to input the value and the unit in the same string. My problem is that I want to extract the numbers and the letters so that I can tell the program the unit and the value and store them in two different variables. For extracting the letters, I used the in operator, and that works properly. I also found a solution for getting the numbers from the input, but that doesn't work for values with exponents.
a = str(input("Type in your wavelength: "))
if "mm" in a:
    print("Unit = Millimeter")
    b = float(a.split()[0])
Storing simple inputs like 567 mm as a float in b works, but when I try to extract inputs like 5*10**6 mm, I get
could not convert string to float: '5*10**6'.
So what can I use to extract more complex numbers like this into a float?

Traditionally, in Python, as in many other languages, exponents are prefixed by the letter e or E. While 5 * 10**6 is not a valid floating point literal, 5e6 most definitely is.
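A quick check of the difference, assuming the number has already been separated from its unit (the values are only illustrative):

```python
# float() parses scientific notation directly: 5e6 means 5 * 10**6
value = float("5e6")
small = float("5E-8")  # an upper-case E works too
print(value, small)

# An arithmetic expression is not a float literal, so float() rejects it
try:
    float("5*10**6")
except ValueError as exc:
    print(exc)
```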
This is something to keep in mind for the future, but it won't solve your issue with the in operator. The problem is that in can only check for something you already know to look for. What if your input was 5e-8 km instead?
You should start by coming up with an unambiguous definition of how you identify the boundary between number and units in a string. For example, the units could be the last contiguous run of non-digit characters in your string.
You could then split the string using regular expressions. If the first part is a plain numeric literal such as 5e6 or 5e-8, you can evaluate it with something as simple as ast.literal_eval, but literal_eval rejects operators like * and **. The more complicated your expression can be, the more complicated your parser will have to be as well.
Here's an example to get you started:
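from ast import literal_eval
import re

# Number: everything up to the last digit or '.'; units: the trailing non-digits
pattern = re.compile(r'(.*[\d\.])\s*(\D+)')
data = '5e6 mm'  # literal_eval accepts 5e6, but raises ValueError for '5 * 10**6'
match = pattern.fullmatch(data)
if not match:
    raise ValueError('Invalid Expression')
num, units = match.groups()
num = literal_eval(num)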
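The snippet above can be wrapped in a small helper. parse_quantity is a hypothetical name, and the sketch assumes the unit is always the trailing run of non-digit characters, as described above. It handles literals such as 567 or 5e-8, while an expression like 5 * 10**6 still raises ValueError from literal_eval:

```python
from ast import literal_eval
import re

# Same pattern as above: the number runs up to the last digit or '.',
# the unit is the trailing run of non-digit characters
QUANTITY = re.compile(r'(.*[\d\.])\s*(\D+)')


def parse_quantity(text):
    """Hypothetical helper: split '5e-8 km' into (5e-08, 'km')."""
    match = QUANTITY.fullmatch(text.strip())
    if not match:
        raise ValueError(f'Invalid expression: {text!r}')
    number, units = match.groups()
    # literal_eval accepts numeric literals such as 567 or 5e-8, not 5 * 10**6
    return literal_eval(number), units


print(parse_quantity('567 mm'))
print(parse_quantity('5e-8 km'))
```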

It seems that you are looking for the eval function, as noted in Rasgel's answer (see the eval documentation).
As some people have pointed out, it poses a big security risk.
To circumvent this, I can think of 2 ways:
1. Combine eval with regex
If you only need basic arithmetic such as addition, subtraction, or something like 2**4, you can use a regex to first remove every character that is not a digit or an arithmetic operator.
import re
a = str(input("Type in your wavelength: "))
if "mm" in a:
    print("Unit = Millimeter")
# After parsing the units,
# remove anything other than digits, +, -, *, /, . (floats) and ()
# ('!' is left out: Python has no factorial operator, so eval() would fail on it)
# If you require any other symbols, add them in
pruned_a = re.sub(r'[^0-9\*\+\-\/\.\(\)]', "", a)
result = eval(pruned_a)
2. Make sure eval can't access any of your local or global variables, or the builtins.
result = eval(expression, {'__builtins__': None}, {})
(the above code is from another Stackoverflow answer here: Math Expression Evaluation -- there might be other solutions there that you might be interested in)
Combined
import re
a = str(input("Type in your wavelength: "))
if "mm" in a:
    print("Unit = Millimeter")
# After parsing the units,
# remove anything other than digits, +, -, *, /, . (floats) and ()
# ('!' is left out: Python has no factorial operator, so eval() would fail on it)
# If you require any other symbols, add them in
pruned_a = re.sub(r'[^0-9\*\+\-\/\.\(\)]', "", a)
result = eval(pruned_a, {'__builtins__': None}, {})  # to be extra safe :)
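A minimal sketch of the pruning step on its own (wavelength_value is a hypothetical helper name). It also shows a pitfall worth knowing: because the character class drops letters, e-notation is destroyed, so "5e-8 km" is pruned to "5-8" and evaluates to -3:

```python
import re

# Keep only digits and the arithmetic characters used in the answer above
_NOT_ARITHMETIC = re.compile(r"[^0-9*+\-/.()]")


def wavelength_value(text):
    """Hypothetical helper: strip the unit text and evaluate what is left."""
    pruned = _NOT_ARITHMETIC.sub("", text)
    # An empty builtins dict blocks names such as __import__ in the expression
    return eval(pruned, {"__builtins__": {}}, {})


print(wavelength_value("5*10**6 mm"))
print(wavelength_value("5e-8 km"))  # pruned to "5-8"
```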

There are many ways to tackle this simple problem: str.split, regular expressions, eval, ast.literal_eval... Here I propose writing your own safe routine that evaluates simple mathematical expressions:
import ast
import operator


def safe_eval(s):
    # Map AST operator node types to the functions that implement them
    bin_ops = {
        ast.Add: operator.add,
        ast.Sub: operator.sub,
        ast.Mult: operator.mul,
        ast.Div: operator.truediv,
        ast.Mod: operator.mod,
        ast.Pow: operator.pow,
    }
    node = ast.parse(s, mode='eval')

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # ast.Constant replaces the deprecated ast.Num/ast.Str (Python 3.8+)
        elif isinstance(node, ast.Constant) and isinstance(node.value, (int, float, complex, str)):
            return node.value
        elif isinstance(node, ast.BinOp) and type(node.op) in bin_ops:
            return bin_ops[type(node.op)](_eval(node.left), _eval(node.right))
        else:
            raise ValueError('Unsupported type {}'.format(node))

    return _eval(node.body)


if __name__ == '__main__':
    text = input("Type in your wavelength: ")
    tokens = text.split()
    if len(tokens) < 2:
        raise ValueError("expected input: <wavelength expression> <unit>")
    # Everything but the last token is the expression, e.g. "5 * 10**6"
    wavelength = safe_eval("".join(tokens[:-1]))
    dtype = tokens[-1]
    print(f"You've typed {wavelength} in {dtype}")
I also recommend reading the post "Why is using 'eval' a bad practice?"

In case you have a string like 5*106 and want to convert this number into a float, you can use the eval() function.
>>> float(eval('5*106'))
530.0

Related

Alternative to try/pass - python

def func(x):
    try:
        return float(x)
    except:
        return 0
This function uses a try/except statement. Is there an alternative way of doing this?
Your code looks like it would work, but I would always specify the exception you are trying to catch, in this case ValueError.
Another way to write this would be
def func(x):
    try:
        y = float(x)
    except ValueError:
        return 0
    # do some more with y if it was changed to a float
    return y
Assuming x is a str, you could use regex to see if that str is formatted properly to be cast as a float.
import re

float_regex = re.compile(r"^-?(\d*\.)?\d+$")

def func(x):
    if float_regex.match(x):
        return float(x)
    return 0
The ^ ... $ at the beginning and end ensure that the whole string has to match the inner regex, not just a substring of it.
-? allows an optional minus sign
(\d*\.)? optionally allows any number of digits followed by a decimal point
\d+ requires at least 1 digit
See here for more details on the regex expression used: https://regex101.com/r/3RXmM1/1
All of this being said, use your original code, except make sure you only catch the ValueError exception. Using regex is overkill here unless you have a specific reason. If python can't cast as a float then just let it do the work for you.
Try/except seems like the simplest and best solution, but you could use an if statement similar to this:
def func(x):
    if str(x).lstrip('-').replace('.', '', 1).isdigit():
        return float(x)
    return 0
The if statement strips the leading - of a negative number and removes at most one . from the input, then checks that only digits are left. It converts x to a string first so that .replace() can be used.
Exceptions are still possible, as mentioned in the comments: for x == '²', isdigit() returns True but float() raises ValueError.
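One possible tightening of the if-based check, sketched under the assumption that only plain decimal strings like -3.5 need to be accepted (to_float_or_zero is a hypothetical name). It uses str.isdecimal(), which rejects characters such as '²' that isdigit() accepts, and removes only a single leading minus sign, because lstrip('-') would also let '--3' through to float(), which then raises ValueError:

```python
def to_float_or_zero(x):
    """Return float(x) if x looks like a plain decimal number, otherwise 0."""
    s = str(x)
    # Drop exactly one leading minus sign
    if s.startswith('-'):
        s = s[1:]
    # isdecimal() is stricter than isdigit(): '²'.isdigit() is True, '²'.isdecimal() is False
    if s.replace('.', '', 1).isdecimal():
        return float(x)
    return 0


for sample in ('-3.5', '10', '²', '--3', 'abc'):
    print(repr(sample), to_float_or_zero(sample))
```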
As jornsharpe suggested in the comments, you can use contextlib.suppress:
from contextlib import suppress
x = "abc"
res = None
with suppress(ValueError):
    res = float(x)
if not isinstance(res, float):
    res = 0
print(res)  # 0
Still, silently suppressing exceptions is not good practice.
Your code is fine; try/except is the idiomatic approach in Python.
Just as an exercise, I thought I'd follow the definitions in the documentation for the float function and the lexical analysis section on floating point literals. From those we can derive the following regular expression:
import re
reFloat = re.compile(r"(?i)^\s*[+-]?(((\d(_?\d)*)?\.\d(_?\d)*|\d(_?\d)*\.?)(e[+-]?\d(_?\d)*)?|nan|inf(inity)?)\s*$")
We can then define this function:
safeFloat = lambda x, default=0: float(x) if reFloat.match(x) else default
I believe this function will call float if, and only when, float(x) would not raise an exception, provided x is a string.
Some tests:
good = ["-3.14", "10.", ".001", "+1e100", "3.14e-10", "0e0", " 3.14_15_93 ", "+NaN", " -Infinity\n\f\r\t "]
for x in good:
    print(x.strip(), safeFloat(x))
bad = [".", "_345", "123_", "e13", "123e", "- 1", "5-3", "1e3.4", "Infinit"]
for x in bad:
    print(x.strip(), safeFloat(x))  # Always 0
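To spot-check the claim that reFloat matches exactly when float() succeeds, one can compare the two on the same sample inputs. This only covers the samples listed above, not every possible string:

```python
import re

# The same pattern as reFloat above, copied so this block runs on its own
FLOAT_RE = re.compile(r"(?i)^\s*[+-]?(((\d(_?\d)*)?\.\d(_?\d)*|\d(_?\d)*\.?)(e[+-]?\d(_?\d)*)?|nan|inf(inity)?)\s*$")


def float_succeeds(s):
    """Return True if float(s) raises no ValueError."""
    try:
        float(s)
    except ValueError:
        return False
    return True


samples = ["-3.14", "10.", ".001", "+1e100", "3.14e-10", "0e0", " 3.14_15_93 ",
           "+NaN", " -Infinity\n\f\r\t ",
           ".", "_345", "123_", "e13", "123e", "- 1", "5-3", "1e3.4", "Infinit"]

# Inputs where the regex and float() disagree
mismatches = [s for s in samples if bool(FLOAT_RE.match(s)) != float_succeeds(s)]
print(mismatches)
```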

How to use eval() and compile() in python 3.7? [duplicate]

I have a situation with some code where eval() came up as a possible solution. Now I have never had to use eval() before but, I have come across plenty of information about the potential danger it can cause. That said, I'm very wary about using it.
My situation is that I have input being given by a user:
datamap = input('Provide some data here: ')
Where datamap needs to be a dictionary. I searched around and found that eval() could work this out. I thought that I might be able to check the type of the input before trying to use the data and that would be a viable security precaution.
datamap = eval(input('Provide some data here: '))
if not isinstance(datamap, dict):
    return
I read through the docs and I am still unclear whether this would be safe or not. Does eval evaluate the data as soon as it's entered, or only when the datamap variable is used?
Is the ast module's .literal_eval() the only safe option?
datamap = eval(input('Provide some data here: ')) means that the code is evaluated before you have had a chance to decide whether it is unsafe. It is evaluated as soon as eval is called. See also the dangers of eval.
ast.literal_eval raises an exception if the input isn't a valid Python literal, so arbitrary code in the input is never executed.
Use ast.literal_eval whenever you are tempted to use eval on data. You should rarely need to evaluate arbitrary Python code.
ast.literal_eval() only considers a small subset of Python's syntax to be valid:
The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.
Passing __import__('os').system('rm -rf /a-path-you-really-care-about') into ast.literal_eval() will raise an error, but eval() will happily delete your files.
Since it looks like you're only letting the user input a plain dictionary, use ast.literal_eval(). It safely does what you want and nothing more.
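A minimal sketch of the asker's flow with ast.literal_eval in place of eval (read_datamap is a hypothetical name). The isinstance check now runs on an already-parsed literal, so nothing from the input has been executed by the time it runs:

```python
import ast


def read_datamap(text):
    """Parse text as a Python dict literal without executing it."""
    # literal_eval builds only literals; a call such as __import__('os') raises ValueError
    datamap = ast.literal_eval(text)
    # This check runs on parsed data, so no user code has run at this point
    if not isinstance(datamap, dict):
        raise TypeError(f"expected a dict, got {type(datamap).__name__}")
    return datamap


print(read_datamap("{'a': 2, 'b': 3, 3: 'xyz'}"))
```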
eval:
This is very powerful, but also very dangerous if you accept strings to evaluate from untrusted input. Suppose the string being evaluated is "os.system('rm -rf /')": if os is reachable from the evaluated code, it will really start deleting all the files on your computer.
ast.literal_eval:
Safely evaluate an expression node or a string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.
Syntax:
eval(expression, globals=None, locals=None)
import ast
ast.literal_eval(node_or_string)
Example:
# Python 2.x: literal_eval doesn't accept operators in the string
import ast
ast.literal_eval('[1, 2, 3]')  # output: [1, 2, 3]
ast.literal_eval('1+1')  # output: ValueError: malformed string
# Python 3.0-3.6
import ast
ast.literal_eval("1+1")  # output: 2
ast.literal_eval("{'a': 2, 'b': 3, 3:'xyz'}")  # output: {'a': 2, 'b': 3, 3: 'xyz'} (a dict)
ast.literal_eval("", {})  # output: TypeError, literal_eval takes only one argument
ast.literal_eval("__import__('os').system('rm -rf /')")  # output: ValueError
eval("__import__('os').system('rm -rf /')")
# output: starts deleting all the files on your computer.
# restricting the global and local variables
eval("__import__('os').system('rm -rf /')", {'__builtins__': {}}, {})
# output: NameError, because __import__ is not available when '__builtins__' is {}
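# But eval is still not safe: the restriction can be bypassed as shown below
s = """
(lambda fc=(
    lambda n: [
        c for c in
            ().__class__.__bases__[0].__subclasses__()
            if c.__name__ == n
        ][0]
    ):
    fc("function")(
        fc("code")(
            0,0,0,0,"KABOOM",(),(),(),"","",0,""
        ),{}
    )()
)()
"""
# These code() arguments follow the Python 2 signature; Python 3's code
# constructor takes different arguments, so this exact call raises TypeError
# there, although looking up the classes by name works the same way.
eval(s, {'__builtins__':{}})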
In the above code, ().__class__.__bases__[0] is nothing but object itself.
__subclasses__() then returns all subclasses of object, and the helper fc picks out the one whose name is n.
The payload needs the code type and the function type, which it takes from that list. In CPython, this gives a way to reach them without any builtins and then build and call a function from a hand-made code object.
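The first step of that payload can be checked harmlessly: even with an empty __builtins__, attribute access on a literal reaches object and its subclasses, including the function and code types the payload looks up by name:

```python
# No builtins are available, yet attribute access still walks from () to object
expr = "().__class__.__bases__[0].__subclasses__()"
subclasses = eval(expr, {"__builtins__": {}})

# Collect the class names, as the helper fc in the payload does
names = {cls.__name__ for cls in subclasses}
print(len(subclasses), "function" in names, "code" in names)
```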
Since Python 3.7, ast.literal_eval() is stricter: addition and subtraction of arbitrary numbers are no longer allowed.
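For example, on Python 3.7 and later, addition is only accepted where it forms a complex literal:

```python
import ast

# A sum of a real and an imaginary number is still allowed (complex literal)
print(ast.literal_eval("1+2j"))

# Plain arithmetic is rejected
try:
    ast.literal_eval("1+1")
except ValueError as exc:
    print("rejected:", exc)
```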
Python is eager in its evaluation, so eval(input(...)) (Python 3) evaluates the user's input as soon as it hits the eval, regardless of what you do with the data afterwards. Therefore, this is not safe, especially when you eval user input.
Use ast.literal_eval.
As an example, entering this at the prompt could be very bad for you:
__import__('os').system('rm -rf /a-path-you-really-care-about')
Since Python 3.7, ast.literal_eval() no longer evaluates simple arithmetic such as "1+1"; instead, you can use ast.parse() to build an AST and then interpret it yourself.
Here is a complete example of using ast.parse() to evaluate simple arithmetic expressions safely. It relies on ast.Constant, so it needs Python 3.8+ (the older ast.Num and ast.Str node classes are deprecated since Python 3.8 and have been removed in later versions).
import ast
import logging
import math
import operator

logger = logging.getLogger(__file__)

# Public names of the math module that may be called, e.g. sqrt(16)
MATH_FUNCS = {name for name in dir(math) if "__" not in name}

# Supported operators, keyed by AST operator node type
BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
UN_OPS = {
    ast.USub: operator.neg,
    ast.UAdd: operator.pos,
}


def safe_eval(s):
    def checkmath(name, *args):
        # Only functions from the math module may be called
        if name not in MATH_FUNCS:
            raise SyntaxError(f"Unknown func {name}()")
        return getattr(math, name)(*args)

    def _eval(node):
        if isinstance(node, ast.Expression):
            logger.debug("Expr")
            return _eval(node.body)
        elif isinstance(node, ast.Constant):
            logger.debug("Const")
            return node.value
        elif isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            logger.debug("BinOp")
            # Recurse into both operands, whatever kind of node they are
            return BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        elif isinstance(node, ast.UnaryOp) and type(node.op) in UN_OPS:
            logger.debug("UnaryOp")
            return UN_OPS[type(node.op)](_eval(node.operand))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            args = [_eval(arg) for arg in node.args]
            return checkmath(node.func.id, *args)
        else:
            raise SyntaxError(f"Bad syntax, {type(node)}")

    return _eval(ast.parse(s, mode="eval"))


if __name__ == "__main__":
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.StreamHandler())
    assert safe_eval("1+1") == 2
    assert safe_eval("1+-5") == -4
    assert safe_eval("-1") == -1
    assert safe_eval("-+1") == -1
    assert safe_eval("(100*10)+6") == 1006
    assert safe_eval("100*(10+6)") == 1600
    assert safe_eval("2**4") == 2**4
    assert safe_eval("sqrt(16)+1") == math.sqrt(16) + 1
    assert safe_eval("1.2345 * 10") == 1.2345 * 10
    print("Tests pass")
If all you need is a user provided dictionary, a possible better solution is json.loads. The main limitation is that JSON dicts ("objects") require string keys. Also you can only provide literal data, but that is also the case for ast.literal_eval.
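A short sketch of the json.loads route; note that the input has to be valid JSON, so Python-style single-quoted keys are rejected:

```python
import json

# JSON objects need double-quoted string keys; the result is a plain dict
data = json.loads('{"a": 2, "b": 3}')
print(data)

# Single quotes are not valid JSON; JSONDecodeError is a subclass of ValueError
try:
    json.loads("{'a': 2}")
except json.JSONDecodeError as exc:
    print("invalid JSON:", exc)
```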

How do I remove from the beginning to the 2nd specific character of a string?

I have a bunch of strings that are of the form:
'foo.bar.baz.spam.spam.spam...etc'
In all likelihood they have three or more multi-letter substrings separated by dots. There might be ill-formed strings with fewer than two dots, and I want the original string in that case.
The first thing that comes to mind is the str.partition method, which I would use if I were after everything after the first .:
'foo.bar.baz.boink.a.b.c'.partition('.')[2]
returns
'bar.baz.boink.a.b.c'
This could be repeated:
def secondpartition(s):
    return s.partition('.')[2].partition('.')[2] or s
But is this efficient? It doesn't seem efficient to call a method twice and use a subscript twice. It is certainly inelegant. Is there a better way?
The main question is:
How do you drop everything from the beginning up to the second instance of the . character, so that 'foo.bar.baz.spam.spam.spam' becomes 'baz.spam.spam.spam'? What would be the best/most efficient way to do that?
Using str.split with maxsplit argument:
>>> 'foo.bar.baz.spam.spam.spam'.split('.', 2)[-1]
'baz.spam.spam.spam'
UPDATE
To handle strings with fewer than two dots:
def secondpartition(s):
    parts = s.split('.', 2)
    if len(parts) <= 2:
        return s
    return parts[-1]
Summary: This is the most performant approach (generalized to n characters):
def maxsplittwoexcept(s, n, c):
    '''
    given string s, return the string after the nth character c;
    if there are fewer than n c's, return the whole string s.
    '''
    try:
        # split at most n times; index n holds everything after the nth c
        return s.split(c, n)[n]
    except IndexError:
        return s
but I show other approaches for comparison.
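A sketch of the generalized idea with a length check instead of catching IndexError (after_nth is a hypothetical name; n is assumed to be at least 1):

```python
def after_nth(s, n, c):
    """Return the part of s after the nth occurrence of c, or s if c occurs fewer than n times."""
    # With maxsplit=n there are n + 1 parts exactly when c occurs at least n times
    parts = s.split(c, n)
    return parts[n] if len(parts) > n else s


for n in (1, 2, 3):
    print(n, after_nth('foo.bar.baz.spam', n, '.'))
print(after_nth('foo.bar', 2, '.'))  # fewer than 2 dots: returned unchanged
```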
There are various ways of doing this with string methods and regular expressions. Everything below can be pasted into an interpreter in order, so you can follow along.
First imports:
import re
import timeit
from itertools import islice
Different approaches: string methods
The way mentioned in the question is to partition twice, but I discounted it because it seems rather inelegant and unnecessarily repetitive:
def secondpartition(s):
    return s.partition('.')[2].partition('.')[2] or s
The second way that came to mind to do this is to split on the .'s, slice from the second on, and join with .'s. This struck me as fairly elegant and I assumed it would be rather efficient.
def splitslicejoin(s):
    return '.'.join(s.split('.')[2:]) or s
But slices create an unnecessary extra list. However, islice from the itertools module provides an iterable that doesn't! So I expected this to do even better:
def splitislicejoin(s):
    return '.'.join(islice(s.split('.'), 2, None)) or s
Different approaches: regular expressions
Now regular expressions. The first way that came to mind was to find everything up to and including the second . and substitute it with an empty string; count=1 makes sure only that first match is removed:
dot2 = re.compile(r'.*?\..*?\.')
def redot2(s):
    return dot2.sub('', s, count=1)
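Note that re.sub replaces every non-overlapping match unless count is given, which matters for strings with more than three dots. A small illustration:

```python
import re

dot2 = re.compile(r'.*?\..*?\.')
s = 'a.b.c.d.e'

everything = dot2.sub('', s)           # removes 'a.b.' and then 'c.d.'
first_only = dot2.sub('', s, count=1)  # removes only 'a.b.'
print(everything, first_only)
```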
But it occurred to me that it might be better to use a non-capturing group, and return a match on the end:
dot2match = re.compile(r'(?:.*?\..*?\.)(.*)')
def redot2match(s):
    match = dot2match.match(s)
    if match is not None:
        return match.group(1)
    else:
        return s
Finally, I could use a regular expression search to find the end of the second . and then use that index to slice the string, which would use a lot more code, but might still be fast and memory efficient.
dot = re.compile(r'\.')
def find2nddot(s):
    for i, found_dot in enumerate(dot.finditer(s)):
        if i == 1:
            return s[found_dot.end():] or s
    return s
Update: falsetru suggests str.split's maxsplit argument, which had completely slipped my mind. It may be the most straightforward approach, but I suspected the assignment and extra checking might hurt it.
def maxsplittwo(s):
    parts = s.split('.', 2)
    if len(parts) <= 2:
        return s
    return parts[-1]
And JonClements suggests using an except referencing https://stackoverflow.com/a/27989577/541136 which would look like this:
def maxsplittwoexcept(s):
    try:
        return s.split('.', 2)[2]
    except IndexError:
        return s
which would be totally appropriate since not having enough .s would be exceptional.
Testing
Now let's test our functions. First, let's assert that they actually work (not a best practice in production code, which should use unittests, but useful for fast validation on StackOverflow):
functions = ('secondpartition', 'redot2match', 'redot2',
             'splitslicejoin', 'splitislicejoin', 'find2nddot',
             'maxsplittwo', 'maxsplittwoexcept')
for function in functions:
    assert globals()[function]('foo.baz') == 'foo.baz'
    assert globals()[function]('foo.baz.bar') == 'bar'
    assert globals()[function]('foo.baz.bar.boink') == 'bar.boink'
The asserts don't raise AssertionErrors, so now let's time them to see how they perform:
Performance
setup = 'from __main__ import ' + ', '.join(functions)
perfs = {}
for func in functions:
    perfs[func] = min(timeit.repeat(func + '("foo.bar.baz.a.b.c")', setup))
for func in sorted(perfs, key=lambda x: perfs[x]):
    print('{0}: {1}'.format(func, perfs[func]))
Results
Update: falsetru's maxsplittwo slightly edges out the secondpartition function. Congratulations to falsetru; it makes sense, since it is a very direct approach. And JonClements's modification is even better:
maxsplittwoexcept: 1.01329493523
maxsplittwo: 1.08345508575
secondpartition: 1.1336209774
splitslicejoin: 1.49500417709
redot2match: 2.22423219681
splitislicejoin: 3.4605550766
find2nddot: 3.77172589302
redot2: 4.69134306908
Older run and analysis without falsetru's maxsplittwo and JonClements' maxsplittwoexcept:
secondpartition: 0.636116637553
splitslicejoin: 1.05499717616
redot2match: 1.10188927335
redot2: 1.6313087087
find2nddot: 1.65386564664
splitislicejoin: 3.13693511439
It turns out that the most performant approach is to partition twice, even though my intuition didn't like it.
Also, it turns out my intuition about islice was wrong in this case: it is much less performant, so the extra list from the regular slice is probably worth the tradeoff if faced with a similar bit of code.
Of the regular expressions, the match approach for my desired string is the best performer here, nearly tied with splitslicejoin.

How to eval an expression in other bases?

I was making a calculator in which the user inputs an expression such as 3*2+1/20, and I use eval to display the answer.
Is there a function that lets me do the same in other bases (bin, oct, hex)?
If they enter in the values as hex, binary, etc, eval will work:
eval("0xa + 8 + 0b11")
# 21
Beware though, eval can be dangerous.
No; eval is used to parse Python and the base of numbers in Python code is fixed.
You could use a regex replace to prefix numbers with 0x if you were insistent upon this method, but it would be better to build a parser utilizing, say, int(string, base) to generate the numbers.
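A minimal sketch of the int(string, base) approach, under the assumption that the calculator only needs integers with +, -, * and /. eval_in_base is a hypothetical name; each alphanumeric run is converted with int(..., base), and the resulting decimal expression is walked with the ast module instead of being passed to eval:

```python
import ast
import operator
import re

# Arithmetic operators the sketch supports; anything else is rejected
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def eval_in_base(expr, base):
    """Evaluate +, -, *, / over integers written in the given base."""
    # Rewrite each alphanumeric run as a decimal integer; int() raises ValueError on bad digits
    decimal_expr = re.sub(r"[0-9A-Za-z]+", lambda m: str(int(m.group(), base)), expr)

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(ast.parse(decimal_expr, mode="eval"))


print(eval_in_base("3*2+1/20", 16))  # same as 0x3*0x2+0x1/0x20
print(eval_in_base("101+11", 2))
```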
If you really want to go down the Python route, here's a token based transformation:
import tokenize
from io import BytesIO

def tokens_with_base(tokens, base):
    for token in tokens:
        if token.type == tokenize.NUMBER:
            try:
                value = int(token.string, base)
            except ValueError:
                # Not transformable
                pass
            else:
                # Transformable
                token = tokenize.TokenInfo(
                    type=tokenize.NUMBER,
                    string=str(value),
                    start=token.start,
                    end=token.end,
                    line=token.line
                )
        yield token

def python_change_default_base(string, base):
    tokens = tokenize.tokenize(BytesIO(string.encode()).readline)
    transformed = tokens_with_base(tokens, base)
    return tokenize.untokenize(transformed)
eval(python_change_default_base("3*2+1/20", 16))
#>>> 6.03125
0x3*0x2+0x1/0x20
#>>> 6.03125
This is safer than a plain regex replacement because it only rewrites NUMBER tokens, so things like string literals are left alone.

