Python & regular expressions

Hi, I'm making a program that takes a command such as !math(5+5) from Skype and returns the result. I'm having trouble getting the regular expression right; I have been reading the docs and I just can't get it to work.
I am trying to support all operators: *, /, +, -, %, and **. The trouble comes when I try to support **: either I use one regex and lose the option for exponents, or I end up with a regex that handles only exponents.
Here's my expression:
expr = re.search(r'((\d+)(\s*.?|.*\**)(\d+))', 'math(500**1000)')
And then I'm parsing it using groups:
build_expr = {
    'number1': int(expr.group(2)),
    'operator': expr.group(3),
    'number2': int(expr.group(4))
}
Results when giving re.search the argument with exponents:
>>> expr.group()
'500**1000'
>>> expr.group(1)
'500**1000'
>>> expr.group(2)
'500'
>>> expr.group(3)
'**100'
And it works just fine and dandy with one-character operators, such as math(500+1000):
>>> expr.group(1)
'500+1000'
>>> expr.group(2)
'500'
>>> expr.group(3)
'+'
>>> expr.group(4)
'1000'
Here's the entire function:
def math_func(expr_arg):
    expr = re.search(r'((\d+)(\s*.?|.*\**)(\d+))', expr_arg)
    #parse_expr = ' '.join(expr.group()).split()
    build_expr = {
        'number1': int(expr.group(2)),
        'operator': expr.group(3),
        'number2': int(expr.group(4))
    }
    if build_expr['operator'] == '+':
        operation = build_expr['number1'] + build_expr['number2']
        return str(operation)
    elif build_expr['operator'] == '-':
        operation = build_expr['number1'] - build_expr['number2']
        return str(operation)
    elif build_expr['operator'] == '/':
        operation = build_expr['number1'] / build_expr['number2']
        return str(operation)
    elif build_expr['operator'] == '%':
        operation = build_expr['number1'] % build_expr['number2']
        return str(operation)
    elif build_expr['operator'] == '*':
        operation = build_expr['number1'] * build_expr['number2']
        return str(operation)
    elif build_expr['operator'] == '**':
        operation = build_expr['number1'] ** build_expr['number2']
        return str(operation)
    else:
        return 'Invalid operator'
    return 'shes all good son'
f = math_func('math(500+1000)')
Message.Chat.SendMessage('>> ' + f)

It can be matched using:
(\d+)\s*([-+*/%]+)\s*(\d+)
Breakdown:
(\d+) will match one or more digits
\s* will match any whitespace, if present
([-+*/%]+) will match one or more operator characters, so it accepts ** but also runs such as +-

How about just using:
(\d+)\s*(\*\*|[+/%*-])\s*(\d+)
[+/%*-] means "one character, one from the list inside brackets".
No need to wrap everything inside a capturing group, the whole match is already stored in group(0).
I'm not sure how you got to .*\** so I can't tell you what your mistake is here, but the new regex should do the job.
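As a rough sketch of how that pattern could slot into math_func: the lookup table OPERATIONS and the 'Invalid expression' message are assumptions made for illustration, not part of the original code, and division by zero is not handled.

```python
import operator
import re

# Try the two-character operator first so that "**" is not read as a single "*".
EXPR = re.compile(r'(\d+)\s*(\*\*|[+/%*-])\s*(\d+)')

# Hypothetical lookup table standing in for the if/elif chain.
OPERATIONS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
    '%': operator.mod,
    '**': operator.pow,
}

def math_func(expr_arg):
    match = EXPR.search(expr_arg)
    if match is None:
        return 'Invalid expression'
    # groups() returns the three captures: left number, operator, right number.
    left, op, right = match.groups()
    return str(OPERATIONS[op](int(left), int(right)))
```

Because the regex only ever captures one of the six listed operators, the dictionary lookup cannot miss, so the 'Invalid operator' branch is no longer needed.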

Assuming you only have trusted data, you could just replace your math_func with eval and be done with it.

Related

TypeError: 'int' object is not subscriptable while doing s-expression in Python [duplicate]

This question already has answers here:
How to create a "singleton" tuple with only one element
I am trying to write a basic s-expression calculator in Python. An s-expression can contain add, multiply, both, neither, or just an integer.
I tried the following snippet:
def calc(expr):
    print(expression[0])
    if isinstance(expr, int):
        return expr
    elif expr[0] == '+':
        return calc(expr[1]) + calc(expr[2])
    elif expr[0] == '*':
        return calc(expr[1]) * calc(expr[2])
    else:
        raise ValueError("Unknown operator: %s" % expr[0])
# Example usage
# expression = ('+', ('*', 3, 4), 5)
expression = (7)
result = calc(expression)
print(result)
When I pass the expression ('+', ('*', 3, 4), 5), it gives the correct answer, but when I pass just the number 7, or 7 inside a tuple, (7), it gives the above error. How can I solve this?

Code is fine, debug is not
The print you use for debugging is misplaced: it assumes expression is a sequence, not an int.
[Good practice] Don't print a global variable; print the local one: print(expr).
This is less confusing and will help you debug this code.
[Branch simplification] Replace every elif with if. Since you return in every branch, you don't need elif. Remove the else too. This allows you to place code after the first if that will run for all remaining branches without having to repeat it in every elif.
def calc(expr: int | tuple):
    print(expr)
    if isinstance(expr, int):
        return expr
    if expr[0] == '+':
        return calc(expr[1]) + calc(expr[2])
    if expr[0] == '*':
        return calc(expr[1]) * calc(expr[2])
    raise ValueError("Unknown operator: %s" % expr[0])
[Fix] Move the print below the first if.
def calc(expr: int | tuple):
    if isinstance(expr, int):
        return expr
    print(expr)
    if expr[0] == '+':
        return calc(expr[1]) + calc(expr[2])
    if expr[0] == '*':
        return calc(expr[1]) * calc(expr[2])
    raise ValueError("Unknown operator: %s" % expr[0])
This code works both for (7), which Python reads as the plain int 7, and for ('+', ('*', 3, 4), 5).
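The duplicate target above is about one-element tuples, so it may help to see the difference directly. The sketch below assumes that a one-element tuple such as (7,) should evaluate to its only item; that rule is an illustration, not something the question specifies.

```python
# (7) is only a parenthesized int; the trailing comma is what makes a tuple.
plain = (7)
single = (7,)

def calc(expr):
    if isinstance(expr, int):
        return expr
    # Assumption for this sketch: a one-element tuple evaluates to its only item.
    if len(expr) == 1:
        return calc(expr[0])
    if expr[0] == '+':
        return calc(expr[1]) + calc(expr[2])
    if expr[0] == '*':
        return calc(expr[1]) * calc(expr[2])
    raise ValueError("Unknown operator: %s" % expr[0])
```

With this variant, calc(7), calc((7)) and calc((7,)) all return 7.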

Including commas, question marks, exclamation marks in output (python)

The goal of this program is to take user input and convert it to leet-style ASCII text.
The code works as it should, but it doesn't include commas, periods, exclamation marks or question marks.
I have tried to put !, ?, ' and commas in a separate list and use it when handling the input, but I wasn't sure how to do it.
At the moment I just use a bunch of else-if statements. It works, but I feel like there must be a simpler way and I can't figure it out. Tips are extremely appreciated!
def asciiToLeet(c):
    l33tLetters = ["#", "8", "(", "|)", "3", "#", "6", "[-]", "|", "_|", "|<", "1", "[]\/[]", "[]\[]", "0", "|D", "(,)", "|Z", "$", "']['",
                   "|_|", "\/", "\/\/", "}{", "`/", "2"]
    if c == ' ': return ' '
    elif c == '.': return '.'
    elif c == ',': return ','
    elif c == '?': return '?'
    elif c == '!': return '!'
    elif c == "'": return "'"
    asciiCode = ord(c)
    if asciiCode >= ord('a') and asciiCode <= ord('z'):
        return l33tLetters[asciiCode - ord('a')]
    if asciiCode >= ord('A') and asciiCode <= ord('Z'):
        return l33tLetters[asciiCode - ord('A')]
    return ""
if __name__ == "__main__":
    inputString = input()
    outputString = ""
    for c in inputString:
        outputString += asciiToLeet(c)
    print(outputString)
My expectation is for the code to show the output with punctuation without having to use if-else statements.

You have return "" at the end of your method. Thus, if all of your lookups fail, it discards the input character. Instead, do return c. This will cause the input character to be returned as-is if the lookups to make it "leet" don't match it.
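One possible way to apply that advice without any punctuation branches is a dictionary lookup with the character itself as the fallback. The names LEET, ascii_to_leet and to_leet are hypothetical; the table is the one from the question, written with raw strings so the backslashes stay literal.

```python
import string

# Leet table copied from the question (one entry per letter, a to z).
L33T_LETTERS = ["#", "8", "(", "|)", "3", "#", "6", "[-]", "|", "_|", "|<", "1",
                r"[]\/[]", r"[]\[]", "0", "|D", "(,)", "|Z", "$", "']['",
                "|_|", r"\/", r"\/\/", "}{", "`/", "2"]

# Hypothetical helper table: map each lowercase letter to its leet form.
LEET = dict(zip(string.ascii_lowercase, L33T_LETTERS))

def ascii_to_leet(c):
    # Anything without a leet form (spaces, punctuation, digits) is returned unchanged.
    return LEET.get(c.lower(), c)

def to_leet(text):
    return "".join(ascii_to_leet(c) for c in text)
```

For example, to_leet("Hi, you!") keeps the comma, space and exclamation mark and only replaces the letters.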

Splitting bracket delimited text which can contain quoted strings

I am trying to split some text. Basically I want to separate level-1 brackets, like "('1','a',NULL),(2,'b')" => ["('1','a',NULL)", "(2,'b')"], but I need to be aware of possible quoted strings inside. It needs to satisfy at least the following pytest tests:
from splitter import split_text
def test_normal():
    assert split_text("('1'),('2')") == ["('1')", "('2')"]
    assert split_text("(1),(2),(3)") == ["(1)", "(2)", "(3)"]
def test_complex():
    assert split_text("('1','a'),('2','b')") == ["('1','a')", "('2','b')"]
    assert split_text("('1','a',NULL),(2,'b')") == ["('1','a',NULL)", "(2,'b')"]
def test_apostrophe():
    assert split_text("('\\'1','a'),('2','b')") == ["('\\'1','a')", "('2','b')"]
def test_coma_in_string():
    assert split_text("('1','a,c'),('2','b')") == ["('1','a,c')", "('2','b')"]
def test_bracket_in_string():
    assert split_text("('1','a)c'),('2','b')") == ["('1','a)c')", "('2','b')"]
def test_bracket_and_coma_in_string():
    assert split_text("('1','a),(c'),('2','b')") == ["('1','a),(c')", "('2','b')"]
def test_bracket_and_coma_in_string_apostrophe():
    assert split_text("('1','a\\'),(c'),('2','b')") == ["('1','a\\'),(c')", "('2','b')"]
I have tried the following:
1) Regular expressions
This looks like the best solution, but unfortunately I did not come up with anything satisfying all tests.
My best try is:
def split_text(text):
    return re.split(r'(?<=\)),(?=\()', text)
But obviously, that is rather simplistic and fails test_bracket_and_coma_in_string and test_bracket_and_coma_in_string_apostrophe.
2) Finite-state-machine-like solution
I tried to code the FSM myself:
OUTSIDE, IN_BRACKETS, IN_STRING, AFTER_BACKSLASH = range(4)
def split_text(text):
    state = OUTSIDE
    read = []
    result = []
    for character in text:
        if state == OUTSIDE:
            if character == ',':
                result.append(''.join(read))
                read = []
            elif character == '(':
                read.append(character)
                state = IN_BRACKETS
            else:
                read.append(character)
        elif state == IN_BRACKETS:
            read.append(character)
            if character == ')':
                state = OUTSIDE
            elif character == "'":
                state = IN_STRING
        elif state == IN_STRING:
            read.append(character)
            if character == "'":
                state = IN_BRACKETS
            elif character == '\\':
                state = AFTER_BACKSLASH
        elif state == AFTER_BACKSLASH:
            read.append(character)
            state = IN_STRING
    result.append(''.join(read)) # The rest of string
    return result
3) pyparsing
from pyparsing import QuotedString, ZeroOrMore, Literal, Group, Suppress, Word, nums
null_value = Literal('NULL')
number_value = Word(nums)
string_value = QuotedString("'", escChar='\\', unquoteResults=False)
value = null_value | number_value | string_value
one_bracket = Group(Literal('(') + value + ZeroOrMore(Literal(',') + value) + Literal(')'))
all_brackets = one_bracket + ZeroOrMore(Suppress(',') + one_bracket)
def split_text(text):
    parse_result = all_brackets.parseString(text)
    return [''.join(a) for a in parse_result]
Also passes all tests, but surprisingly it is even slower than solution #2.
Any ideas how to make the solution fast and robust? I have this feeling that I am missing something obvious.

One way would be to use the newer regex module, a third-party package (pip install regex) that supports the (*SKIP)(*FAIL) functionality:
import regex as re
def split_text(text):
    rx = r"""'.*?(?<!\\)'(*SKIP)(*FAIL)|(?<=\)),(?=\()"""
    return re.split(rx, text)
Broken down it says:
'.*?(?<!\\)' # look for a single quote up to a new single quote
# that MUST NOT be escaped (thus the neg. lookbehind)
(*SKIP)(*FAIL)| # these parts shall fail
(?<=\)),(?=\() # your initial pattern with a positive lookbehind/ahead
This succeeds on all your examples.

I cooked this up and it works on the given tests.
tests = ["('1'),('2')",
         "(1),(2),(3)",
         "('1','a'),('2','b')",
         "('1','a',NULL),(2,'b')",
         "('\\'1','a'),('2','b')",
         "('1','a,c'),('2','b')",
         "('1','a)c'),('2','b')",
         "('1','a),(c'),('2','b')",
         "('1','a\\'),(c'),('2','b')"]
for text in tests:
    tmp = ''
    res = []
    bracket = 0
    quote = False
    for idx,i in enumerate(text):
        if i=="'":
            if text[idx-1]!='\\':
                quote = not quote
            tmp += i
        elif quote:
            tmp += i
        elif i==',':
            if bracket: tmp += i
            else: pass
        else:
            if i=='(': bracket += 1
            elif i==')': bracket -= 1
            if bracket: tmp += i
            else:
                tmp += i
                res.append(tmp)
                tmp = ''
    print(res)
Output:
["('1')", "('2')"]
['(1)', '(2)', '(3)']
["('1','a')", "('2','b')"]
["('1','a',NULL)", "(2,'b')"]
["('\\'1','a')", "('2','b')"]
["('1','a,c')", "('2','b')"]
["('1','a)c')", "('2','b')"]
["('1','a),(c')", "('2','b')"]
["('1','a\\'),(c')", "('2','b')"]
The code has room for improvement, and edits are welcome. :)

This is the regular expression which seems to work and passes all the tests. Running it on real data, it is about 6x faster than the finite state machine implemented in Python.
import re
PATTERN = re.compile(
    r"""
    \(  # Opening bracket
    (?:
        # String
        (?:'(?:
            (?:\\')|[^']  # Either escaped apostrophe, or other character
        )*'
        )
        |
        # or other literal not containing right bracket
        [^')]
    )
    (?:,  # Zero or more of them separated with comma following the first one
        # String
        (?:'(?:
            (?:\\')|[^']  # Either escaped apostrophe, or other character
        )*'
        )
        |
        # or other literal
        [^')]
    )*
    \)  # Closing bracket
    """,
    re.VERBOSE)
def split_text(text):
    return PATTERN.findall(text)
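A more compact variant of the same idea, offered as a sketch rather than a drop-in replacement: it treats a group as an opening bracket, any mix of quoted strings and other characters, and a closing bracket. It assumes a backslash inside a quoted string escapes whatever character follows it, and unlike the pattern above it also accepts an empty "()". The name GROUP is made up for this example.

```python
import re

# A group is "(" + any mix of quoted strings and other characters + ")".
GROUP = re.compile(r"""
    \(                      # opening bracket
    (?:
        '(?:\\.|[^'\\])*'   # quoted string, possibly containing escaped characters
      |
        [^')]               # any other character except a quote or closing bracket
    )*
    \)                      # closing bracket
""", re.VERBOSE)

def split_text(text):
    # No capturing groups are used, so findall returns the whole matches.
    return GROUP.findall(text)
```

The two alternatives inside the repeated group can never start on the same character, which keeps backtracking limited.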

converting infix to prefix in python

I am trying to write an Infix to Prefix Converter where e.g. I would like to convert this:
1 + ((C + A ) * (B - F))
to something like:
add(1, multiply(add(C, A), subtract(B, F)))
but I get this instead :
multiply(add(1, add(C, A), subtract(B, F)))
This is the code I have so far:
postfix = []
temp = []
newTemp = []
def textOperator(s):
    if s is '+':
        return 'add('
    elif s is '-':
        return 'subtract('
    elif s is '*':
        return 'multiply('
    else:
        return ""
def typeof(s):
    if s is '(':
        return leftparentheses
    elif s is ')':
        return rightparentheses
    elif s is '+' or s is '-' or s is '*' or s is '%' or s is '/':
        return operator
    elif s is ' ':
        return empty
    else :
        return operand
infix = "1 + ((C + A ) * (B - F))"
for i in infix :
    type = typeof(i)
    if type is operand:
        newTemp.append(i)
    elif type is operator:
        postfix.append(textOperator(i))
        postfix.append(newTemp.pop())
        postfix.append(', ')
    elif type is leftparentheses :
        newTemp.append(i)
    elif type is rightparentheses :
        next = newTemp.pop()
        while next is not '(':
            postfix.append(next)
            next = newTemp.pop()
        postfix.append(')')
        newTemp.append(''.join(postfix))
        while len(postfix) > 0 :
            postfix.pop()
    elif type is empty:
        continue
    print("newTemp = ", newTemp)
    print("postfix = ", postfix)
while len(newTemp) > 0 :
    postfix.append(newTemp.pop())
postfix.append(')')
print(''.join(postfix))
Can someone please help me figure out how to fix this?

What I see, with the parenthetical clauses, is a recursive problem crying out for a recursive solution. The following is a rethink of your program that might give you some ideas of how to restructure it, even if you don't buy into my recursion argument:
import sys
from enum import Enum
class Type(Enum): # This could also be done with individual classes
    leftparentheses = 0
    rightparentheses = 1
    operator = 2
    empty = 3
    operand = 4
OPERATORS = { # get your data out of your code...
    "+": "add",
    "-": "subtract",
    "*": "multiply",
    "%": "modulus",
    "/": "divide",
}
def textOperator(string):
    if string not in OPERATORS:
        sys.exit("Unknown operator: " + string)
    return OPERATORS[string]
def typeof(string):
    if string == '(':
        return Type.leftparentheses
    elif string == ')':
        return Type.rightparentheses
    elif string in OPERATORS:
        return Type.operator
    elif string == ' ':
        return Type.empty
    else:
        return Type.operand
def process(tokens):
    stack = []
    while tokens:
        token = tokens.pop()
        category = typeof(token)
        print("token = ", token, " (" + str(category) + ")")
        if category == Type.operand:
            stack.append(token)
        elif category == Type.operator:
            stack.append((textOperator(token), stack.pop(), process(tokens)))
        elif category == Type.leftparentheses:
            stack.append(process(tokens))
        elif category == Type.rightparentheses:
            return stack.pop()
        elif category == Type.empty:
            continue
        print("stack = ", stack)
    return stack.pop()
INFIX = "1 + ((C + A ) * (B - F))"
# pop/append work from right, so reverse, and require a real list
postfix = process(list(INFIX[::-1]))
print(postfix)
The result of this program is a structure like:
('add', '1', ('multiply', ('add', 'C', 'A'), ('subtract', 'B', 'F')))
Which you should be able to post process into the string form you desire (again, recursively...)
PS: type and next are Python built-ins (not reserved words, but shadowing them hides the originals), so don't use them as variable names.
PPS: replace INFIX[::-1] with sys.argv[1][::-1] and you can pass test cases into the program to see what it does with them.
PPPS: like your original, this only handles single digit numbers (or single letter variables), you'll need to provide a better tokenizer than list() to get that working right.
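The post-processing step mentioned above could look roughly like this. The function name render is made up for this sketch, and it assumes every inner node is a (name, left, right) tuple and every leaf is a string, as in the structure printed by the program.

```python
def render(node):
    # Leaves are plain strings such as '1' or 'C'.
    if isinstance(node, str):
        return node
    # Inner nodes are (name, left, right) tuples; render both children recursively.
    name, left, right = node
    return "{}({}, {})".format(name, render(left), render(right))

tree = ('add', '1', ('multiply', ('add', 'C', 'A'), ('subtract', 'B', 'F')))
print(render(tree))  # add(1, multiply(add(C, A), subtract(B, F)))
```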

how to "add" things together python

I wrote a function like this: op is an operator sign such as '+', '-', '*', '/' (or more), and the code combines all the arguments using the given operator.
Here is the code:
def arithmetic(op,*args):
    result = args[0]
    for x in args[1:]:
        if op =='+':
            result += x
        elif op == '-':
            result -= x
        elif op == '*':
            result *= x
        elif op == '/':
            result /= x
    return result
Is there a way I can use +, -, *, / directly, so I don't have to write an if-else statement?

You can use the corresponding operators:
import operator
def arithmetic(opname, *args):
    op = {'+': operator.add,
          '-': operator.sub,
          '*': operator.mul,
          '/': operator.truediv}[opname]  # operator.div exists only in Python 2
    result = args[0]
    for x in args[1:]:
        result = op(result, x)
    return result
or shorter, with reduce:
import operator, functools
def arithmetic(opname, arg0, *args):
    op = {'+': operator.add,
          '-': operator.sub,
          '*': operator.mul,
          '/': operator.truediv}[opname]  # operator.div exists only in Python 2
    return functools.reduce(op, args, arg0)

I think you're looking for the reduce function (a builtin in Python 2, functools.reduce in Python 3) combined with operator:
import operator
from functools import reduce
a = range(10)
reduce(operator.add, a) # 45
reduce(operator.sub, a) # -45
reduce(operator.mul, a) # 0 -- first element is 0.
reduce(operator.truediv, a) # 0.0 -- first element is 0.
Of course, if you want to do this using strings, you can map the strings to an operation using a dict:
operations = {'+': operator.add, '-': operator.sub,} # ...
then it becomes:
reduce(operations[your_operator], a)

For the + operator, you have the built-in sum function.

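You can use exec, provided op comes from a trusted source. In Python 3, exec cannot rebind a function's local variables, so run the statement in an explicit namespace:
def arithmetic(op, *args):
    namespace = {'result': args[0]}
    for x in args[1:]:
        namespace['x'] = x
        exec('result ' + op + '= x', namespace)
    return namespace['result']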
