Python, exclusive exponentiation from a string input - python

While working within python 3, I have created a calculator that accepts string inputs such as "-28 + 4.0/3 * 5" and other similar mathematical equations. As an exercise I had wanted to support exponentiation through use of the '^' key such that inputs like "5.23 * 2^4/3^2 -1.0" or other equations that contain values to a certain power would be functional. However, implementation with my current code has proven difficult. Not wanting to scrap my work, I realized that I could implement this if I could find a way to take the original string and selectively solve for the '^' operations such that inputs like the aforementioned "5.23 * 2^4/3^2 -1.0" would become "5.23 * 16/9 -1.0" which I could then feed into the code written prior. Only problem is, I am having some trouble isolating these pieces of the equations and was hoping someone might be able to lend a hand.

Since these are binary infix operators, you could split the string into symbols (numbers and operators), assign a priority to each operator, and then rearrange the symbols into a prefix-notation-like stack.
Or split the input string at each exponent mark; the number at the end of one substring and the number at the beginning of the neighbouring substring can then be cut out, evaluated, and replaced: "6 * 4^3 + 2" -> ["6 * 4", "3 + 2"] -> "6 * " + x + " + 2", where x is the value of 4^3.
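A minimal sketch of the second idea, using a regular expression instead of a manual split. The names NUMBER, POWER, _format and expand_powers are made up for this illustration, and the sketch assumes plain unsigned decimal operands with a leading digit (no parentheses, no negative exponents). Chains such as 2^3^2 are reduced from the right, following the usual convention for powers.

```python
import re

# Unsigned decimal literal with a leading digit, e.g. "2", "4.0", "5.23".
NUMBER = r"\d+(?:\.\d+)?"

# "a ^ b" that is not part of a longer number and is not followed by another
# "^", so the rightmost power in a chain such as 2^3^2 is reduced first.
POWER = re.compile(rf"(?<![\d.])({NUMBER})\s*\^\s*({NUMBER})(?![\d.]|\s*\^)")


def _format(value: float) -> str:
    """Render 16.0 as '16' but keep a fractional result such as 2.25."""
    return str(int(value)) if value.is_integer() else repr(value)


def expand_powers(expression: str) -> str:
    """Replace every 'a^b' in the string with its numeric value."""
    while True:
        expression, count = POWER.subn(
            lambda m: _format(float(m.group(1)) ** float(m.group(2))),
            expression,
        )
        if count == 0:  # nothing left to reduce
            return expression


print(expand_powers("5.23 * 2^4/3^2 -1.0"))  # 5.23 * 16/9 -1.0
print(expand_powers("2^3^2"))                # 512
```

The returned string contains no '^' for inputs within those assumptions, so it can be handed to the existing calculator. Very large powers raise OverflowError because the arithmetic is done with floats.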

Related

Python calculator - evaluating string of statements using decimal module

I'm writing a Discord bot that will accept user input as a string and then evaluate the expression inside it; nothing fancy, just simple arithmetic operations. I have two concerns: safety and decimals. First I used the simpleeval package. It worked fine but had trouble with decimals, e.g.
0.1 + 0.1 + 0.1 - 0.3 would return 5.551115123125783e-17. After googling a lot I found an answer that works, but it uses the built-in eval() function, and apparently using it is a big no.
Is there a better/safer way of handling this? This implements https://docs.python.org/3/library/tokenize.html#examples decistmt() method which substitutes Decimals for floats in a string of statements. But in the end I use eval() and with all those checks I'm still unsure if it's safe and I'd rather avoid it.
This is what decistmt() does:
from tokenize import tokenize, untokenize, NUMBER, STRING, NAME, OP
from io import BytesIO

def decistmt(s):
    """Substitute Decimals for floats in a string of statements.

    >>> from decimal import Decimal
    >>> s = 'print(+21.3e-5*-.1234/81.7)'
    >>> decistmt(s)
    "print (+Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7'))"

    The format of the exponent is inherited from the platform C library.
    Known cases are "e-007" (Windows) and "e-07" (not Windows). Since
    we're only showing 12 digits, and the 13th isn't close to 5, the
    rest of the output should be platform-independent.

    >>> exec(s) #doctest: +ELLIPSIS
    -3.21716034272e-0...7

    Output from calculations with Decimal should be identical across all
    platforms.

    >>> exec(decistmt(s))
    -3.217160342717258261933904529E-7
    """
    result = []
    g = tokenize(BytesIO(s.encode('utf-8')).readline) # tokenize the string
    for toknum, tokval, _, _, _ in g:
        if toknum == NUMBER and '.' in tokval: # replace NUMBER tokens
            result.extend([
                (NAME, 'Decimal'),
                (OP, '('),
                (STRING, repr(tokval)),
                (OP, ')')
            ])
        else:
            result.append((toknum, tokval))
    return untokenize(result).decode('utf-8')
# example user input: "(20+5)^4 - 550 + 8"
@bot.command()
async def calc(context, *, user_input):
    # this is so user can use both ^ and ** for power and use "," and "." for decimals
    equation = user_input.replace('^', "**").replace(",", ".")
    valid_operators: list = ["+", "-", "/", "*", "%", "^", "**"]
    # checks if a string contains any element from a list, it will also return false if the iterable is empty, so this covers empty check too
    operator_check: bool = any(
        operator in equation for operator in valid_operators)
    # checks if arithmetic operator is last or first element in equation, to prevent doing something like ".calc 2+" or ".calc +2"
    def is_last_or_first(equation: str):
        for operator in valid_operators:
            if operator == equation[-1]:
                return True
            elif operator == equation[0]:
                if operator == "-":
                    return False
                else:
                    return True
    # isupper and islower checks whether there are letters in user input
    if not operator_check or is_last_or_first(equation) or equation.isupper() or equation.islower():
        return await context.send("Invalid input")
    result = eval(decistmt(equation))
    result = float(result)
    # returning user_input here so user sees "^" instead of "**"
    async def result_generator(result: int or float):
        await context.send(f'**Input:** ```fix\n{user_input}```**Result:** ```fix\n{result}```')
    # this is so if the result is .0 it returns an int
    if result.is_integer():
        await result_generator(int(result))
    else:
        await result_generator(result)
This is what happens after user input
user_input = "0.1 + 0.1 + 0.1 - 0.3"
float_to_decimal = decistmt(user_input)
print(float_to_decimal)
print(type(float_to_decimal))
# Decimal ('0.1')+Decimal ('0.1')+Decimal ('0.1')-Decimal ('0.3')
# <class 'str'>
Now I need to evaluate this input so I'm using eval(). My question is - is this safe (I assume not) and is there some other way to evaluate float_to_decimal?
EDIT. As requested, more in depth explanation:
The whole application is a chat bot, and this "calc" function is one of the commands users can use. It is invoked by typing ".calc " in the chat: ".calc" is the prefix, and anything after it is the arguments, which are concatenated so that what I ultimately get as an argument is a single string. I perform a bunch of checks to limit the input (remove letters etc.). After the checks I am left with a string consisting of numbers, arithmetic operators and brackets, and I want to evaluate the mathematical expression in that string. I pass the string to the decistmt function, which wraps each float literal in a Decimal constructor; the result IS A STRING looking like this: "Decimal ('2.5') + Decimal ('-5.2')". Now I need to evaluate the expression inside that string. I used the simpleeval module for that, but it is incompatible with the decimal module, so I'm evaluating with the built-in eval() function. My question is: is there a safer way of evaluating a mathematical expression in a string like that?
A recent spinoff package from pyparsing is plusminus, a wrapper around pyparsing specifically for this case of embedded arithmetic evaluation. You can try it yourself at http://ptmcg.pythonanywhere.com/plusminus. Since plusminus uses its own parser, it is not subject to the common eval attacks, as Ned Batchelder describes in this still-timely blog post: https://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html. Plusminus also includes handling for expressions which, while valid and "safe", are so time-consuming that they could be used for a denial-of-service attack - such as "9**9**9".
I'm planning a refactoring pass on plusminus that will change the API a bit, but you could probably use it as-is for your Discord plugin. You'll find that plusminus also supports a number of additional operators and functions beyond the standard ones defined for Python, such as "|4-7|" for absolute value, or "√42" for sqrt(42); click the Examples button on that page for more expressions that are supported. You can also save values in a variable, and then use that variable in later expressions. (This may not work for your plugin, since the variable state might be shared by multiple Discordians.)
Plusminus is also designed to support subclassing to define new operators, such as a dice roller that will evaluate "3d6+d20" as "3 rolls of a 6-sided die, plus 1 roll of a 20-sided die", including random die rolls each time it is evaluated.
Try using Decimal on numbers as @Shiva suggested.
Also, here's the answer from user @unutbu, which uses the PyParsing library with a custom wrapper for evaluating math expressions.
Python or system expressions (e.g. import <package> or dir()) will raise a ParseException.
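One standard-library way to avoid eval() altogether is to parse the input with ast and walk the tree yourself, allowing only numbers and arithmetic operators. The sketch below is an illustration under stated assumptions, not code from any of the answers: safe_decimal_eval and its max_exponent limit are hypothetical names, and it assumes Python 3.8+ (for ast.Constant). Float literals are converted with Decimal(repr(value)), which keeps short inputs like 0.1 exact but would lose digits beyond float precision.

```python
import ast
import operator
from decimal import Decimal

# Whitelisted operators; anything else in the tree is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_decimal_eval(expression: str, max_exponent: int = 1000) -> Decimal:
    """Evaluate +, -, *, /, %, ** (or ^) on numbers, using Decimal arithmetic."""
    tree = ast.parse(expression.replace("^", "**"), mode="eval")

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # type() rather than isinstance() so that True/False are rejected too
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return Decimal(repr(node.value))
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            left, right = walk(node.left), walk(node.right)
            # guard against denial-of-service inputs such as 9**9**9
            if isinstance(node.op, ast.Pow) and abs(right) > max_exponent:
                raise ValueError("exponent too large")
            return _BINARY[type(node.op)](left, right)
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return walk(tree)


print(safe_decimal_eval("0.1 + 0.1 + 0.1 - 0.3"))  # 0.0
print(safe_decimal_eval("(20+5)^4 - 550 + 8"))     # 390083
```

Names, function calls and attribute access never reach an evaluator, because the walker raises ValueError for any node type it does not recognise. Malformed input raises SyntaxError from ast.parse, so a bot command would want to catch both.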

Reconstructing two (string concatenated) numbers that were originally floats

Unfortunately, the print statement in a program was written without an end-of-line character, so one in every 26 numbers consists of two numbers joined together. The following code reproduces this behaviour; a fragment of the original data is at the end.
import numpy as np
for _ in range(2):
    A=np.random.rand()+np.random.randint(0,100)
    B=np.random.rand()+np.random.randint(0,100)
    C=np.random.rand()+np.random.randint(0,100)
    D=np.random.rand()+np.random.randint(0,100)
    with open('file.txt','a') as f:
        f.write(f'{A},{B},{C},{D}')
And thus the output example file looks very similar to what follows:
40.63358599010553,53.86722741700399,21.800795158561158,13.95828176311762557.217562728494684,2.626308403991772,4.840593988487278,32.401778122213486
With the issue being that there are two numbers 'printed together', in the example they were as follows:
13.95828176311762557.217562728494684
So you cannot know if they should be
13.958281763117625, 57.217562728494684
or
13.9582817631176255, 7.217562728494684
Please understand that in this case there are only two options, but the problem I want to address involves 'unbounded' numbers of Python's "float" type (where 'unbounded' means they lie in a range we don't know in advance, e.g. +-1E4).
Can the original numbers be reconstructed based on "some" python internal behavior I'm missing?
Actual data with periodicity 27 (i.e. the 26th number consists of 2 joined together):
0.9221878978925224, 0.9331311610066017,0.8600582424784715,0.8754578588852764,0.8738648974725404, 0.8897837559800233,0.6773502027673041,0.736325377603136,0.7956454122424133, 0.8083168444596229,0.7089031184165164, 0.7475306242508357,0.9702361286847581, 0.9900689384633811,0.7453878225174624, 0.7749000030576826,0.7743879170108678, 0.8032590543649807,0.002434,0.003673,0.004194,0.327903,11.357262,13.782266,20.14374,31.828905,33.9260060.9215201173775437, 0.9349343132442707,0.8605282244327555,0.8741626682026793,0.8742163597524663, 0.8874673376386358,0.7109322043854609,0.7376362393985332,0.796158275345
To expand my comment into an actual answer:
We do have some information: a Python float is an IEEE-754 double, which has 64 bits in total, only 53 of which hold the significand (mantissa), so not every decimal number can be represented by a float. For datasets like yours, the printed values are brushing up against the edge of that precision.
We can make that work for us: we just need to test, at each possible split point, whether each piece is something a float would actually print as. We can abuse strings for this by testing num_str == str(float(num_str)) (i.e. the string remains the same after being converted to a float and back to a string).
If the string is exactly what Python prints for some float (the shortest digit string that identifies that float), then the before and after will be equal.
If it is not, for instance because it carries more digits than a float can hold, float() will coerce it to the nearest number that a float can represent. If we then convert this back to a string, it will not be identical to the original.
Here's a snippet, for example, that you can play around with
from typing import List

def parse_number(s: str) -> List[float]:
    if s.count('.') == 2:
        first_decimal = s.index('.')
        second_decimal = s[first_decimal + 1:].index('.') + first_decimal + 1
        # try every split point, starting with the longest possible first number
        for split_idx in range(second_decimal - 1, first_decimal + 1, -1):
            a, b = s[:split_idx], s[split_idx:]
            if str(float(a)) == a and str(float(b)) == b:
                return [float(a), float(b)]
        # default to returning as large an a as possible
        return [float(s[:second_decimal - 1]), float(s[second_decimal - 1:])]
    else:
        return [float(s)]
parse_number('33.9260060.9215201173775437')
# [33.926006, 0.9215201173775437]
# this is the only possible combination that actually works for this particular input
Obviously this isn't foolproof, and for some numbers there may not be enough information to differentiate the first number from the second. Additionally, for this to work, the tool that generated your data needs to have worked with IEEE standards-compliant floats (which does appear to be the case in this example, but may not be if the results were generated using a class like Decimal (python) or BigDecimal (java) or something else).
Some inputs might also have multiple possibilities. In the above snippet I've biased it to take the longest possible [first number], but you could modify it to go in the opposite order and instead take the shortest possible [first number].
Yes, you have one available weapon: you're using the default precision to display the numbers. In the example you cite, there are 15 digits after the decimal point, making it easy to reconstruct the original numbers.
Let's take a simple case, where you have only 3 digits after the decimal point. It's trivial to separate
13.95857.217
The formatting requires a maximum of 2 digits before the decimal point, and three after.
Any case that has five digits between the points, is trivial to split.
13.958 57.217
However, you run into the "trailing zero" problem in some cases. If you see, instead
13.9557.217
This could be either
13.950 57.217
or
13.955 07.217
Your data do not contain enough information to differentiate the two cases.

Order of operation for **

I want to know why the following happens.
The expression below evaluates the right side, 1**3, first and then 2**1:
2**1**3 has the value of 2
However, in the expression below the left side, 7//3, is evaluated first and then 2*3. Finally 1+6-1=6.
1+7//3*3-1 has the value of 6
Take a look at the documentation of operator precedence. Although multiplication * and floor division // have the same precedence, you should take note of this part:
Operators in the same box group left to right (except for exponentiation, which groups from right to left).
For the convention of $2^{1^3}$ being evaluated right-associatively, see the cross-site dupe on the math stackexchange site: What is the order when doing $x^{y^z}$ and why?
The TL;DR is this: since the left-associative version $(x^y)^z$ would just equal $x^{y \cdot z}$, it's not useful to have another (worse) notation for the same thing, so exponentiation should be right associative.
Almost all operators in Python (that share the same precedence) have left-to-right associativity. For example:
1 / 2 / 3 ≡ (1 / 2) / 3
One exception is the exponent operator which is right-to-left associativity:
2 ** 3 ** 4 ≡ 2 ** (3 ** 4)
That's just the way the language is defined, matching mathematical notation where $a^{b^c} \equiv a^{(b^c)}$.
If it were $(a^b)^c$, that would just be $a^{bc}$.
Per Operator Precedence, the operator is right associative: a**b**c**d == a**(b**(c**d)).
So, if you do this:
a,b,c,d = 2,3,5,7
a**b**c**d == a**(b**(c**d))
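you should get True, at least in principle: with these particular values the number is far too large to ever compute, so try smaller values to see it.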
The exponent operator in Python has right-to-left associativity. That is, when it occurs several times in a row in an expression, the calculation proceeds from the rightmost occurrence to the leftmost. The exponent operator is an exception among the operators, as most of them follow a left-to-right associativity rule.
2**1**3 = 2
The expression
1+7//3*3-1
is a simple case of left-to-right associativity. As the // and * operators share the same precedence, associativity (left to right here) is taken into account.
This is just how math typically works
$2^{1^3}$
This is the same as the first expression you used. To evaluate this with math, you'd work your way down, so $1^3 = 1$ and then $2^1$, which equals 2.
You can make sense of this just by thinking about the classic PEMDAS (or Please Excuse My Dear Aunt Sally) order of operations from mathematics. In your first one, 2**1**3 is equivalent to $2^{1^3}$, which is really read as $2^{(1^3)}$. Looking at it this way, you see that you do the parentheses (P) first (the 1**3).
In the second one, 1+7//3*3-1 == 6, you have to note that the MD and AS of PEMDAS are actually done in order of whichever comes first reading from left to right. It's simply a fault of language that we have to write one letter before another (that is, we could write this as PEDMAS and it would still be correct if we treat the D and M appropriately).
All that to say, Python is treating the math exactly the same way as we should even if this were written with pen and paper.
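The grouping rules described in these answers can be checked directly in the interpreter. The snippet below is just an illustration; it uses 2 ** 3 ** 2 alongside the question's own expressions because the two possible groupings give different results there.

```python
# Exponentiation groups from the right: the top of the "tower" is evaluated first.
chained = 2 ** 3 ** 2
right_grouped = 2 ** (3 ** 2)   # 2 ** 9
left_grouped = (2 ** 3) ** 2    # 8 ** 2, the same as 2 ** (3 * 2)

print(chained, right_grouped, left_grouped)  # 512 512 64

# The expression from the question: 1 ** 3 is 1, then 2 ** 1 is 2.
print(2 ** 1 ** 3)  # 2

# // and * share one precedence level and group from the left.
mixed = 1 + 7 // 3 * 3 - 1
print(mixed == ((1 + ((7 // 3) * 3)) - 1))  # True
print(1 + 7 // (3 * 3) - 1)  # 0, what right-to-left grouping would give
```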

How to force pyparsing to parenthesize infix notation "9 + 2 + 3"

Let's take a look at the simplest arithmetic example in the pyparsing doc, here.
More specifically, I'm looking at the "+" operation that is defined as left associative and the first example test where we're parsing "9 + 2 + 3".
The outcome of the parsing I would have expected would be ((9+2)+3), that is, first compute the infix binary operator on 9 and 2 and then compute the infix binary operator on the result and 3. What I get however is (9+2+3), all on the same level, which is really not all that helpful, after all I have now to decide the order of evaluation myself and yet it was defined to be left associative. Why am I forced to parenthesize myself? What am I missing?
Thanks & Regards
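If the flat grouping is what the parser returns, one workaround is to re-nest the result afterwards. The helper below is a plain-Python sketch that does not use any pyparsing API: nest_left is a hypothetical name, and it assumes the parsed group has been converted to an ordinary list that alternates operand, operator, operand, with nested groups as sub-lists.

```python
def nest_left(tokens):
    """Fold [a, op, b, op, c, ...] into [[[a, op, b], op, c], ...]."""
    # Re-nest any sub-groups first (e.g. a parenthesised sub-expression).
    tokens = [nest_left(t) if isinstance(t, list) else t for t in tokens]
    if len(tokens) <= 3:
        return tokens
    result = tokens[:3]
    # Each further (operator, operand) pair wraps everything built so far.
    for i in range(3, len(tokens), 2):
        result = [result, tokens[i], tokens[i + 1]]
    return result


print(nest_left([9, "+", 2, "+", 3]))  # [[9, '+', 2], '+', 3]
```

For a right-associative operator the fold would have to run from the other end of the list instead.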

What is the recommended whitespace for slicing with expression indices in Python?

Examples of slicing in documentation only show integer literals and variables used as indices, not more complex expressions (e.g. myarray[x/3+2:x/2+3:2]). PEP-8 also doesn't cover this case. What is the usual usage of whitespace here: myarray[x/3+2:x/2+3:2], myarray[x/3+2 : x/2+3 : 2], or myarray[x/3+2: x/2+3: 2] (there don't seem to be other reasonable options)?
I have never seen spaces used in slicing operations, so would err on the side of avoiding them. Then again, unless it's performance critical I'd be inclined to move the expressions outside of the slicing operation altogether. After all, your goal is readability:
lower = x / 3 + 2
upper = x / 2 + 3
myarray[lower:upper:2]
I believe the most relevant extract of PEP8 on this subject is:
The guidelines provided here are intended to improve the readability of code and make it consistent across the wide spectrum of Python code.
In this case, my personal choice would probably be either Steve Mayne's answer, or perhaps:
myarray[slice(x / 3 + 2, x / 2 + 3, 2)]
Rule 1. Pet Peeves
However, in a slice the colon acts like a binary operator, and should
have equal amounts on either side (treating it as the operator with
the lowest priority). In an extended slice, both colons must have the
same amount of spacing applied. Exception: when a slice parameter is
omitted, the space is omitted:
Rule 2. Other Recommendations
If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority(ies). Use your own judgment; however, never use more than one space, and always have the same amount of whitespace on both sides of a binary operator:
The following fail rule 2:
myarray[x/3+2:x/2+3:2]
myarray[x/3+2 : x/2+3 : 2]
myarray[x/3+2: x/2+3: 2]
So the answer is,
myarray[x/3 + 2 : x/2 + 3 : 2]
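Whichever spacing is chosen, the variants discussed in the answers select the same elements, so the choice is purely about readability. The sketch below checks that with made-up sample data (myarray and x are illustrative values). It uses // rather than / on the assumption that this is Python 3, where / yields a float and a float cannot be used as a slice index.

```python
myarray = list(range(40))  # sample data, purely illustrative
x = 12

# Spaced form from the last answer, with integer division.
spaced = myarray[x//3 + 2 : x//2 + 3 : 2]

# Named bounds, as in the first answer.
lower = x // 3 + 2   # 6
upper = x // 2 + 3   # 9
named = myarray[lower:upper:2]

# Explicit slice object, as in the second answer.
via_slice = myarray[slice(x // 3 + 2, x // 2 + 3, 2)]

print(spaced, named, via_slice)  # [6, 8] [6, 8] [6, 8]
```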
