I have a situation with some code where eval() came up as a possible solution. I have never had to use eval() before, but I have come across plenty of information about the potential danger it can cause, so I'm very wary about using it.
My situation is that I have input being given by a user:
datamap = input('Provide some data here: ')
Where datamap needs to be a dictionary. I searched around and found that eval() could work this out. I thought that I might be able to check the type of the input before trying to use the data and that would be a viable security precaution.
datamap = eval(input('Provide some data here: '))
if not isinstance(datamap, dict):
    return
I read through the docs and I am still unclear whether this would be safe or not. Does eval evaluate the data as soon as it's entered, or only later when the datamap variable is used?
Is the ast module's .literal_eval() the only safe option?
datamap = eval(input('Provide some data here: ')) means that you actually evaluate the code before you deem it to be unsafe or not. It evaluates the code as soon as the function is called. See also the dangers of eval.
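A minimal sketch of that ordering, using a harmless list append as a stand-in for a destructive payload (the `log` list and the input string are illustrative only, not from the question):

```python
# `log` is a hypothetical stand-in for whatever a malicious payload would touch.
log = []
user_input = "log.append('payload ran')"

# eval() executes the expression immediately, on this line.
datamap = eval(user_input, {"log": log})

# The type check only happens afterwards, so it cannot undo the side effect.
if not isinstance(datamap, dict):
    print("rejected, but the payload already ran:", log)
```

The check correctly rejects the result (list.append returns None), but by then the code has already run.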
ast.literal_eval raises an exception if the input isn't a valid Python literal, so non-literal code is never executed.
Use ast.literal_eval whenever you think you need eval. You usually shouldn't be evaluating arbitrary Python expressions.
ast.literal_eval() only considers a small subset of Python's syntax to be valid:
The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.
Passing __import__('os').system('rm -rf /a-path-you-really-care-about') into ast.literal_eval() will raise an error, but eval() will happily delete your files.
Since it looks like you're only letting the user input a plain dictionary, use ast.literal_eval(). It safely does what you want and nothing more.
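A minimal sketch of that approach for the question's dictionary input, combining ast.literal_eval with the type check (the helper name parse_datamap is hypothetical):

```python
import ast


def parse_datamap(text):
    """Parse a dict literal typed by the user; reject anything else."""
    # Raises ValueError for non-literal nodes (calls, names, ...)
    # and SyntaxError for text that isn't valid Python at all.
    value = ast.literal_eval(text)
    if not isinstance(value, dict):
        raise ValueError(f"expected a dict, got {type(value).__name__}")
    return value


print(parse_datamap("{'a': 1, 'b': [2, 3]}"))  # {'a': 1, 'b': [2, 3]}

for bad in ["__import__('os').system('echo hi')", "[1, 2, 3]"]:
    try:
        parse_datamap(bad)
    except ValueError as exc:
        print("rejected:", exc)
```

Here the type check is meaningful, because literal_eval never runs the payload in the first place.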
eval:
This is very powerful, but also very dangerous if you accept strings to evaluate from untrusted input. Suppose the string being evaluated is "os.system('rm -rf /')" (with os imported)? It will really start deleting all the files on your computer.
ast.literal_eval:
Safely evaluate an expression node or a string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.
Syntax:
eval(expression, globals=None, locals=None)
import ast
ast.literal_eval(node_or_string)
Example:
# python 2.x - doesn't accept operators in string format
import ast
ast.literal_eval('[1, 2, 3]') # output: [1, 2, 3]
ast.literal_eval('1+1') # output: ValueError: malformed string
# python 3.0 -3.6
import ast
ast.literal_eval("1+1") # output : 2
ast.literal_eval("{'a': 2, 'b': 3, 3:'xyz'}") # output : {'a': 2, 'b': 3, 3: 'xyz'} (type: dict)
ast.literal_eval("",{}) # output : TypeError, literal_eval() takes exactly one argument
ast.literal_eval("__import__('os').system('rm -rf /')") # output : ValueError: malformed node or string
eval("__import__('os').system('rm -rf /')")
# output : start deleting all the files on your computer.
# restricting using global and local variables
eval("__import__('os').system('rm -rf /')",{'__builtins__':{}},{})
# output : NameError: name '__import__' is not defined, because passing '__builtins__':{} in globals hides it
# But eval is still not safe: the code below reaches dangerous objects without any builtins
s = """
(lambda fc=(
lambda n: [
c for c in
().__class__.__bases__[0].__subclasses__()
if c.__name__ == n
][0]
):
fc("function")(
fc("code")(
0,0,0,0,"KABOOM",(),(),(),"","",0,""
),{}
)()
)()
"""
eval(s, {'__builtins__':{}})
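In the above code, ().__class__.__bases__[0] is nothing but object itself.
__subclasses__() then returns every class that directly subclasses object, and the helper fc picks out the one whose __name__ is n.
fc is used to fetch the code and function types; the code builds a code object from the bogus bytecode "KABOOM", wraps it in a function and calls it, which crashes CPython. None of this needs builtins, so emptying __builtins__ does not make eval safe. (The code-object arguments above follow the Python 2 constructor signature; Python 3's code type takes more arguments, but the same walk over object's subclasses still works.)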
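A harmless sketch of just the first step of that escape, showing that object and its subclasses are reachable even when `__builtins__` is empty (it only lists classes; nothing is called):

```python
# Builtins are emptied, yet attribute access on a tuple literal still works.
restricted = {"__builtins__": {}}

base = eval("().__class__.__bases__[0]", restricted)
print(base is object)  # True

# From `object`, every directly loaded subclass is one call away.
subclasses = eval("().__class__.__bases__[0].__subclasses__()", restricted)
print(len(subclasses) > 0)  # True
print([cls.__name__ for cls in subclasses[:5]])  # names vary by interpreter
```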
As of Python 3.7, ast.literal_eval() is stricter: addition and subtraction of arbitrary numbers are no longer allowed, only the real-plus-imaginary form of a complex literal such as 1+2j.
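A quick sketch to check that behaviour on whatever interpreter you run (the version comparison just mirrors the 3.7 cut-off stated above):

```python
import ast
import sys

# Complex-number literals written as real + imaginary are still accepted.
print(ast.literal_eval("1+2j"))  # (1+2j)

try:
    print(ast.literal_eval("1+1"))  # only succeeds before Python 3.7
except ValueError as exc:
    # On Python 3.7+ plain arithmetic is rejected as a malformed node.
    print("rejected on", sys.version_info[:2], "->", exc)
```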
Python is eager in its evaluation, so eval(input(...)) (Python 3) will evaluate the user's input as soon as it hits the eval, regardless of what you do with the data afterwards. Therefore, calling eval on user input is not safe.
Use ast.literal_eval.
As an example, entering this at the prompt could be very bad for you:
__import__('os').system('rm -rf /a-path-you-really-care-about')
In recent Python 3 versions, ast.literal_eval() no longer evaluates simple arithmetic such as "1+1"; instead, you can use ast.parse() to create an AST and then interpret it yourself.
This is a complete example of using ast.parse() in Python 3.8+ (where literals are parsed as ast.Constant) to evaluate simple arithmetic expressions safely.
import ast
import logging
import math
import operator
logger = logging.getLogger(__name__)
def safe_eval(s):
    def checkmath(name, *args):
        # only public functions from the math module may be called
        if name not in [n for n in dir(math) if "__" not in n]:
            raise SyntaxError(f"Unknown func {name}()")
        fun = getattr(math, name)
        return fun(*args)
    binOps = {
        ast.Add: operator.add,
        ast.Sub: operator.sub,
        ast.Mult: operator.mul,
        ast.Div: operator.truediv,
        ast.Mod: operator.mod,
        ast.Pow: operator.pow,
    }
    unOps = {
        ast.USub: operator.neg,
        ast.UAdd: operator.pos,
    }
    tree = ast.parse(s, mode='eval')
    def _eval(node):
        if isinstance(node, ast.Expression):
            logger.debug("Expr")
            return _eval(node.body)
        elif isinstance(node, ast.Constant):  # numbers and strings
            logger.debug("Const")
            return node.value
        elif isinstance(node, ast.BinOp) and type(node.op) in binOps:
            logger.debug("BinOp")
            # recurse on both sides; unsupported nodes end in the SyntaxError below
            return binOps[type(node.op)](_eval(node.left), _eval(node.right))
        elif isinstance(node, ast.UnaryOp) and type(node.op) in unOps:
            logger.debug("UnaryOp")
            return unOps[type(node.op)](_eval(node.operand))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # only plain calls such as sqrt(16), never attribute calls
            args = [_eval(x) for x in node.args]
            return checkmath(node.func.id, *args)
        else:
            raise SyntaxError(f"Bad syntax, {type(node)}")
    return _eval(tree)
if __name__ == "__main__":
    logger.setLevel(logging.DEBUG)
    ch = logging.StreamHandler()
    logger.addHandler(ch)
    assert safe_eval("1+1") == 2
    assert safe_eval("1+-5") == -4
    assert safe_eval("-1") == -1
    assert safe_eval("-+1") == -1
    assert safe_eval("(100*10)+6") == 1006
    assert safe_eval("100*(10+6)") == 1600
    assert safe_eval("2**4") == 2**4
    assert safe_eval("sqrt(16)+1") == math.sqrt(16) + 1
    assert safe_eval("1.2345 * 10") == 1.2345 * 10
    print("Tests pass")
If all you need is a user provided dictionary, a possible better solution is json.loads. The main limitation is that JSON dicts ("objects") require string keys. Also you can only provide literal data, but that is also the case for ast.literal_eval.
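A minimal sketch of the json.loads route with the same dict check (the helper name parse_datamap_json is hypothetical):

```python
import json


def parse_datamap_json(text):
    """Parse a JSON object typed by the user; reject anything else."""
    value = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(value, dict):
        raise ValueError("expected a JSON object")
    return value


print(parse_datamap_json('{"a": 1, "b": [2, 3], "c": null}'))
# {'a': 1, 'b': [2, 3], 'c': None}

# JSON object keys must be double-quoted strings, so this is rejected.
try:
    parse_datamap_json('{1: "x"}')
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)
```

Note that JSON also requires double quotes, so Python-style input such as {'a': 1} is rejected too.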
I want to code a unit converter and I need to extract the given value from the unit in the input string.
To provide a user-friendly experience while using the converter, I want the user to be able to input the value and the unit in the same string. My problem is that I want to extract the numbers and the letters so that I can tell the program the unit and the value and store them in two different variables. For extracting the letters, I used the in operator, and that works properly. I also found a solution for getting the numbers from the input, but that doesn't work for values with exponents.
a = input("Type in your wavelength: ")
if "mm" in a:
    print("Unit = Millimeter")
    b = float(a.split()[0])
Storing simple inputs like 567 mm as a float in b works, but I also want to accept inputs like 5*10**6 mm, and for those I get:
ValueError: could not convert string to float: '5*10**6'
So what can I use to extract more complex numbers like this into a float?
Traditionally, in Python, as in many other languages, exponents are prefixed by the letter e or E. While 5 * 10**6 is not a valid floating point literal, 5e6 most definitely is.
This is something to keep in mind for the future, but it won't solve your issue with the in operator. The problem is that in can only check if something you already know is there. What if your input was 5e-8 km instead?
You should start by coming up with an unambiguously clear definition of how you identify the boundary between number and units in a string. For example, units could be the last contiguous bit of non-digit characters in your string.
You could then split the string using regular expressions. If the first part is a plain number literal such as 5e6, you can evaluate it with something as simple as ast.literal_eval; an expression like 5 * 10**6 will be rejected, because literal_eval does not evaluate operators such as * and **. The more complicated your expression can be, the more complicated your parser will have to be as well.
Here's an example to get you started:
from ast import literal_eval
import re
pattern = re.compile(r'(.*[\d\.])\s*(\D+)')
data = '5e6 mm'  # e-notation, since literal_eval rejects operators like * and **
match = pattern.fullmatch(data)
if not match:
    raise ValueError('Invalid Expression')
num, units = match.groups()
num = literal_eval(num)  # 5000000.0
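As a side note on the e-notation point above: if users type exponents as e-notation, plain float() already handles the number part, with no evaluation of expressions at all. A minimal sketch (splitting on whitespace is an assumption about the input format):

```python
# float() understands scientific (e) notation directly.
print(float("5e6"))   # 5000000.0
print(float("5E-8"))  # 5e-08

# So "value unit" input can be split on whitespace and converted.
value, unit = "5e6 mm".split()
print(float(value), unit)  # 5000000.0 mm
```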
It seems that you are looking for the eval function, as noted in @Rasgel's answer (see the Python documentation for eval).
As some people have pointed out, it poses a big security risk.
To circumvent this, I can think of 2 ways:
1. Combine eval with regex
If you only want to do basic arithmetic operations like addition, subtraction and maybe 2**4 or something like that, then you can use a regex to first remove every character that is neither a digit nor an arithmetic operator.
import re
a = input("Type in your wavelength: ")
if "mm" in a:
    print("Unit = Millimeter")
# After parsing the units,
# remove anything other than digits, +, -, *, /, . (floats) and ()
# If you require any other symbols, add them in
pruned_a = re.sub(r'[^0-9\*\+\-\/\.\(\)]', "", a)
result = eval(pruned_a)
2. Make sure eval can't access any of your local or global variables, or Python's builtins.
result = eval(expression, {'__builtins__': None}, {})
(the above code is from another Stack Overflow answer here: Math Expression Evaluation -- there might be other solutions there that you might be interested in)
Combined
import re
a = input("Type in your wavelength: ")
if "mm" in a:
    print("Unit = Millimeter")
# After parsing the units,
# remove anything other than digits, +, -, *, /, . (floats) and ()
# If you require any other symbols, add them in
pruned_a = re.sub(r'[^0-9\*\+\-\/\.\(\)]', "", a)
result = eval(pruned_a, {'__builtins__': None}, {})  # to be extra safe :)
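To see what the pruning step actually does, here is a small sketch applying the same character class to the question's input and to an injection attempt (the variable names are illustrative):

```python
import re

# Matches every character that is NOT a digit, operator, dot or parenthesis.
NOT_ALLOWED = r'[^0-9\*\+\-\/\.\(\)]'

pruned = re.sub(NOT_ALLOWED, "", "5*10**6 mm")
print(pruned)  # 5*10**6
print(eval(pruned, {'__builtins__': None}, {}))  # 5000000

# An injection attempt loses every letter, so it can no longer name a function.
attack = re.sub(NOT_ALLOWED, "", "__import__('os').system('ls')")
print(attack)  # ().()
```

What is left of the attack is not even valid syntax, so eval raises SyntaxError instead of running anything.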
There are many ways to tackle this simple problem: str.split, regular expressions, eval, ast.literal_eval... Here I propose writing your own safe routine that evaluates simple mathematical expressions; code below:
import ast
import operator
def safe_eval(s):
    bin_ops = {
        ast.Add: operator.add,
        ast.Sub: operator.sub,
        ast.Mult: operator.mul,
        ast.Div: operator.truediv,
        ast.Mod: operator.mod,
        ast.Pow: operator.pow,
    }
    node = ast.parse(s, mode='eval')
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        elif isinstance(node, ast.Constant):  # numbers and strings (Python 3.8+)
            return node.value
        elif isinstance(node, ast.BinOp) and type(node.op) in bin_ops:
            return bin_ops[type(node.op)](_eval(node.left), _eval(node.right))
        else:
            raise ValueError('Unsupported type {}'.format(node))
    return _eval(node.body)
if __name__ == '__main__':
    text = input("Type in your wavelength: ")
    tokens = text.split()
    if len(tokens) < 2:
        raise ValueError("expected input: <wavelength expression> <unit>")
    wavelength = safe_eval("".join(tokens[:-1]))
    dtype = tokens[-1]
    print(f"You've typed {wavelength} in {dtype}")
I'll also recommend you read this post Why is using 'eval' a bad practice?
In case you have a string like 5*106 and want to convert this number into a float, you can use the eval() function.
>>> float(eval('5*106'))
530.0
I convert a string to a JSON object using the json library:
a = '{"index":1}'
import json
json.loads(a)
{'index': 1}
However, if I instead change the string a to contain a leading 0, then it breaks down:
a = '{"index":01}'
import json
json.loads(a)
>>> JSONDecodeError: Expecting ',' delimiter
I believe this is due to the fact that it is invalid JSON if an integer begins with a leading zero as described in this thread.
Is there a way to remedy this? If not, then I guess the best way is to remove any leading zeroes by a regex from the string first, then convert to json?
A leading 0 in a JSON number literal is invalid unless the literal is just the character 0 or starts with 0. (as in 0.5). The Python json module is strict and will not accept such literals, partly because a leading 0 is sometimes used to denote octal rather than decimal notation, so deserialising such numbers could lead to unintended programming errors: should 010 be parsed as the number 8 (octal notation) or as 10 (decimal notation)?
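A short sketch of both sides of that rule: a 0. prefix is accepted, a leading-zero integer is rejected, and Python's own int() shows the decimal/octal ambiguity:

```python
import json

print(json.loads('{"index": 0.5}'))  # {'index': 0.5}
print(int("010"), int("010", 8))     # 10 8

try:
    json.loads('{"index": 010}')
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)      # rejected: Expecting ',' delimiter
```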
You can create a decoder that will do what you want, but you will need to heavily hack the json module or rewrite much of its internals. Either way, you will see a performance slow down as you will no longer be using the C implementation of the module.
Below is an implementation that can decode JSON which contains numbers with any number of leading zeros.
import json
import json.scanner
import re
import threading
# a more lenient number regex (modified from json.scanner.NUMBER_RE):
# any run of digits is accepted, so leading zeros are allowed
NUMBER_RE = re.compile(
    r'(-?\d+)(\.\d+)?([eE][-+]?\d+)?',
    (re.VERBOSE | re.MULTILINE | re.DOTALL))
# we are going to be messing with the internals of `json.scanner`. As such we
# want to return it to its initial state when we're done with it, but we need to
# do so in a thread safe way.
_LOCK = threading.Lock()
def thread_safe_py_make_scanner(context, *, number_re=json.scanner.NUMBER_RE):
    with _LOCK:
        original_number_re = json.scanner.NUMBER_RE
        try:
            json.scanner.NUMBER_RE = number_re
            return json.scanner._original_py_make_scanner(context)
        finally:
            json.scanner.NUMBER_RE = original_number_re
json.scanner._original_py_make_scanner = json.scanner.py_make_scanner
json.scanner.py_make_scanner = thread_safe_py_make_scanner
class MyJsonDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # overwrite the stricter scan_once implementation
        self.scan_once = json.scanner.py_make_scanner(self, number_re=NUMBER_RE)
d = MyJsonDecoder()
n = d.decode('010')
assert n == 10
# check the normal route still raises an error
try:
    json.loads('010')
except json.JSONDecodeError:
    pass
else:
    raise AssertionError("expected json.loads('010') to fail")
I would stress that you shouldn't rely on this as a proper solution. Rather, it's a quick hack to help you decode malformed JSON that is nearly, but not quite valid. It's useful if recreating the JSON in a valid form is not possible for some reason.
First, using regex on JSON is evil, almost as bad as killing a kitten.
If you want to represent 01 as a valid JSON value, then consider using this structure:
a = '{"index" : "01"}'
import json
json.loads(a)
If you need the string literal 01 to behave like a number, then consider just casting it to an integer in your Python script.
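A minimal sketch of that cast, using the string-valued JSON from above:

```python
import json

data = json.loads('{"index": "01"}')
print(data)                 # {'index': '01'}

# int() with the default base 10 accepts leading zeros.
index = int(data["index"])
print(index)                # 1
```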
How to convert string int JSON into real int with json.loads
Please see the post above
You need to use your own version of Decoder.
More information can be found here, on GitHub:
https://github.com/simplejson/simplejson/blob/master/index.rst
c = '{"value": 02}'
value= json.loads(json.dumps(c))
print(value)
This seems to work... It is strange.
>>> c = '{"value": 02}'
>>> import json
>>> value = json.loads(json.dumps(c))
>>> print(value)
{"value": 02}
>>> c = '{"value": 0002}'
>>> value = json.loads(json.dumps(c))
>>> print(value)
{"value": 0002}
As @Dunes pointed out, json.loads(json.dumps(c)) just gives back the original string, so this is not a valid solution.
However, demjson (https://pypi.org/project/demjson/) seems to decode it properly, as an alternative:
>>> c = '{"value": 02}'
>>> import demjson
>>> demjson.decode(c)
{'value': 2}
Is it possible to capitalize a word using string formatting? For example,
"{user} did such and such.".format(user="foobar")
should return "Foobar did such and such."
Note that I'm well aware of .capitalize(); however, here's a (very simplified version of) code I'm using:
printme = random.choice(["On {date}, {user} did la-dee-dah. ",
"{user} did la-dee-dah on {date}. "
])
output = printme.format(user=x,date=y)
As you can see, just defining user as x.capitalize() in the .format() doesn't work, since then it would also be applied (incorrectly) to the first scenario. And since I can't predict fate, there's no way of knowing which random.choice would be selected in advance. What can I do?
Additional note: Just doing output = random.choice(['xyz'.format(),'lmn'.format()]) (in other words, formatting each string individually, and then using .capitalize() for the ones that need it) isn't a viable option, since printme is actually choosing from ~40+ strings.
As @IgnacioVazquez-Abrams said, creating a subclass of string.Formatter allows you to extend or change how format strings are processed.
In your case, you have to override the method convert_field.
from string import Formatter
class ExtendedFormatter(Formatter):
    """An extended format string formatter
    Formatter with extended conversion symbol
    """
    def convert_field(self, value, conversion):
        """ Extend conversion symbol
        Following additional symbol has been added
        * l: convert to string and lowercase
        * u: convert to string and uppercase
        default are:
        * s: convert with str()
        * r: convert with repr()
        * a: convert with ascii()
        """
        if conversion == "u":
            return str(value).upper()
        elif conversion == "l":
            return str(value).lower()
        # Do the default conversion or raise error if no matching conversion found
        return super(ExtendedFormatter, self).convert_field(value, conversion)
# Test this code
myformatter = ExtendedFormatter()
template_str = "normal:{test}, upcase:{test!u}, lowcase:{test!l}"
output = myformatter.format(template_str, test="DiDaDoDu")
print(output)
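Since the question asks for capitalizing rather than upper- or lowercasing, the same override can add a capitalize conversion. A sketch, where the `!c` symbol and the class name CapFormatter are hypothetical choices, not part of the standard library:

```python
from string import Formatter


class CapFormatter(Formatter):
    """Formatter with an extra '!c' conversion that capitalizes the value."""

    def convert_field(self, value, conversion):
        if conversion == "c":
            # str.capitalize() uppercases the first letter, lowercases the rest
            return str(value).capitalize()
        return super().convert_field(value, conversion)


fmt = CapFormatter()
templates = ["On {date}, {user} did la-dee-dah. ",
             "{user!c} did la-dee-dah on {date}. "]
for template in templates:
    print(fmt.format(template, user="foobar", date="Monday"))
```

Each template decides for itself whether to capitalize, so it no longer matters which one random.choice picks.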
You can pass extra values and just not use them, like this lightweight option
printme = random.choice(["On {date}, {user} did la-dee-dah. ",
"{User} did la-dee-dah on {date}. "
])
output = printme.format(user=x, date=y, User=x.capitalize())
The best choice probably depends on whether you are doing this often enough to need your own full-blown Formatter.
You can create your own subclass of string.Formatter which will allow you to recognize a custom conversion that you can use to recase your strings.
myformatter.format('{user!u} did la-dee-dah on {date}, and {pronoun!l} liked it. ',
user=x, date=y, pronoun=z)
In Python 3.6+ you can also use f-strings: https://realpython.com/python-f-strings/
>>> txt = 'aBcD'
>>> f'{txt.upper()}'
'ABCD'