I'm trying to write a function that evaluates a numeric expression given as a string.
Example:
def calculate(expression):
    # Solve the expression below
    return
# Result should be 19
calculate("5+8-3+9")
I have tried using .split() but got stuck.
One way to tackle a problem like this is with a small string calculator:
'''
Calculates a string expression containing only whole numbers, addition
and subtraction. Parentheses are not handled by this version, so the
stack declared below is never used.
'''
import re  # regular expression library, used to strip letters and whitespace
def calculate(s: str) -> int:
    s = re.sub(r'[A-Za-z\s\t]+', '', s)
    res = 0
    num = 0
    sign = 1
    stack = []
    for ss in s:
        # checks if each element is a digit
        if ss.isdigit():
            num = 10 * num + int(ss)
        # if not a digit, checks if it's a + or - sign
        elif ss in ["-", "+"]:
            res = res + sign * num
            num = 0
            sign = [-1, 1][ss == "+"]
            '''
            sign = [-1, 1][ss=="+"] is the same as:
            # int(True) = 1, int(False) = 0. Hence,
            if ss == "+":
                sign = 1
            else:
                sign = -1
            '''
    return res + num * sign
s = input("Enter your string: ")
# OR, if you'd like, uncomment the line below and comment out the line above.
# s = "5+8-3+9" # As an expression in a string
print(calculate(s))
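If parenthesized sub-expressions also need to be supported, a stack can hold the running total and sign that were in effect before each opening parenthesis. The following is only a sketch under that assumption; the name calculate_with_parens is made up for illustration and the function expects well-formed input.

```python
def calculate_with_parens(s: str) -> int:
    """Evaluate whole-number +/- expressions that may contain parentheses."""
    res, num, sign = 0, 0, 1
    stack = []  # saved (result, sign) pairs, one per open parenthesis
    for ch in s:
        if ch.isdigit():
            num = 10 * num + int(ch)
        elif ch in "+-":
            res += sign * num
            num = 0
            sign = 1 if ch == "+" else -1
        elif ch == "(":
            # Remember the total and the sign in effect before the group.
            stack.append((res, sign))
            res, sign = 0, 1
        elif ch == ")":
            # Finish the group, then fold it into the saved outer total.
            res += sign * num
            num = 0
            prev_res, prev_sign = stack.pop()
            res = prev_res + prev_sign * res
        # any other character (such as a space) is ignored
    return res + sign * num


print(calculate_with_parens("5+8-3+9"))
print(calculate_with_parens("1-(2+3)"))
```

The stack only grows while parentheses are open, so an expression without parentheses behaves like the simpler loop above.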
I would suggest breaking the question down to its numbers and operators.
Also, I've made the assumption that only whole numbers will be used – and only addition and subtraction.
import re
def calculate(expression):
    # Get all components: each number together with its leading sign
    components = re.findall(r"[+-][0-9]+|^[0-9]+", expression)
    # Pattern that matches the plus or minus operator
    operators = re.compile("[-+]")
    # Iterate and add to a list
    all_nums = []
    for x in components:
        # get the number
        n = int(re.sub(operators, "", x))
        # For all terms after the first
        if operators.search(x):
            op = operators.search(x).group()
            if op == "+":
                n = n
            elif op == "-":
                n = -n
        # Save the number
        all_nums.append(n)
    # Finally, add them up
    return sum(all_nums)
x = "5+8-3+9"
calculate(x)
# returns 19
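Since int() already understands a leading + or -, the same idea can probably be written more compactly. This is a sketch under the same assumptions (whole numbers, only addition and subtraction); the name calculate_signed is hypothetical.

```python
import re


def calculate_signed(expression: str) -> int:
    # Each token is a number with its optional sign, e.g. '5', '+8', '-3'.
    tokens = re.findall(r"[+-]?\d+", expression)
    # int('+8') is 8 and int('-3') is -3, so the signs need no special handling.
    return sum(int(token) for token in tokens)


print(calculate_signed("5+8-3+9"))
```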
First of all, it's okay to be a beginner - I was in your exact shoes just a few years ago!
I'm going to attempt to provide an elementary/beginner approach to solving this problem, with just the basics.
So first we want to determine what the limits of our function input will be. For this, I'll assume we only accept mathematical expressions with basic addition/subtraction operators.
import re
def calculate(expression: str) -> int:
    if not re.match(r"^[0-9\+-]*$", expression):
        return None
For this you'll see I opted for regex, which is a slightly more advanced concept, but think about it like a validity check for expression. Basically, the pattern I wrote checks that there is a fully qualified string that has only integers, plus sign, and minus sign. If you want to learn more about the expression ^[0-9\+-]*$, I highly recommend https://regexr.com/.
For our purposes and understanding though, these test cases should suffice:
>>> re.match("^[0-9\+-]*$", "abcs")
>>> re.match("^[0-9\+-]*$", "1+2")
<re.Match object; span=(0, 3), match='1+2'>
>>> re.match("^[0-9\+-]*$", "1+2/3")
>>>
Now that we have verified our expression, we can get to work on calculating the final value.
Let's try your idea with str.split()! It won't be entirely straightforward, because split by definition splits a string on a delimiter but discards the delimiters from the output. Fear not, because there's another way! The re package I imported earlier can come in handy: it has its own split function.
By using capture groups for our separator, we are able to split and keep our separators.
>>> re.split("(\d+)", "1+393958-3")
['', '1', '+', '393958', '-', '3', '']
So, with this up our sleeve...
import re
def calculate(expression: str) -> int:
    if not re.match(r"^[0-9\+-]*$", expression):
        return None
    expression_arr = re.split(r"(\d+)", expression)[1:-1]
    while len(expression_arr) > 1:
        # TODO stuff
        break  # placeholder so the skeleton runs
    return int(expression_arr[0])
We can now move onto our while loop. It stands to reason that as long as the array has more than one item, there is some sort of operation left to do.
import re
def calculate(expression: str) -> int:
    if not re.match(r"^[0-9\+-]*$", expression):
        return None
    expression_arr = re.split(r"(\d+)", expression)[1:-1]
    while len(expression_arr) > 1:
        # "value" rather than "eval", so the built-in eval() is not shadowed
        if expression_arr[1] == "+":
            value = int(expression_arr[0]) + int(expression_arr[2])
        elif expression_arr[1] == "-":
            value = int(expression_arr[0]) - int(expression_arr[2])
        del expression_arr[:3]
        expression_arr.insert(0, value)
    return int(expression_arr[0])
It's pretty straightforward from there - we check the next operator (which always has to be at expression_arr[1]) and either add or subtract, and make the corresponding changes to expression_arr.
We can verify that it passes the test case you provided. (I added some logging to help with visualization)
>>> calculate("5+8-3+9")
['5', '+', '8', '-', '3', '+', '9']
[13, '-', '3', '+', '9']
[10, '+', '9']
[19]
19
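The same left-to-right reduction can also be written without mutating the list, by pairing each operator with the number that follows it. This is just a sketch; calculate_fold and OPS are names invented here, and the input is assumed to be well formed (for example "5+8-3+9").

```python
import operator
import re

# Map each operator symbol to the function that applies it.
OPS = {"+": operator.add, "-": operator.sub}


def calculate_fold(expression: str) -> int:
    # ['5', '+', '8', '-', '3', '+', '9'] for "5+8-3+9"
    tokens = re.split(r"(\d+)", expression)[1:-1]
    total = int(tokens[0])
    # tokens[1::2] are the operators, tokens[2::2] the numbers after them.
    for op, number in zip(tokens[1::2], tokens[2::2]):
        total = OPS[op](total, int(number))
    return total


print(calculate_fold("5+8-3+9"))
```

Adding another operator would then only need one more entry in OPS, although multiplication and division would also need precedence handling, which this sketch does not attempt.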
So I need to write a program which, for example, given the input 3[a]2[b] prints "aaabb", or given 3[ab]2[c] prints "abababcc" (basically it prints each bracketed string the given number of times, in the given order). I tried to use a for loop to iterate over the input and detect the "[" characters so that it knows what to print repeatedly, but I don't know how to make it also understand where that string ends.
Also, this is as far as I could get, which probably isn't too useful:
string=input()
string=string[::-1]
bulundu=6
for i in string:
    if i!="]":
        if i!="[":
            lst.append(i)
    if i=="[":
        break
The approach I took is to remove the brackets, split the items into a list, then walk the list, and if the item is a number, add that many repeats of the next item to the result for output:
import re
data = "3[a]2[b]"
# Remove brackets and convert to a list
data = re.sub(r'[\[\]]', ' ', data).split()
result = []
for i, item in enumerate(data):
    # If item is a number, print that many of the next item
    if item.isdigit():
        result.append(data[i+1] * int(item))
print(''.join(result))
# aaabb
A different approach, inspired by Subbu's use of re.findall. This approach finds all 'pairs' of numbers and letters using match groups, then multiplies them to produce the required text:
import re
data = "3[a]2[b]"
matches = re.findall(r'(\d+)\[([a-zA-Z]+)\]', data)
# [('3', 'a'), ('2', 'b')]
for x in matches:
    print(x[1] * int(x[0]), end='')
#aaabb
Lengthy and documented version using NO regex, just simple string and list manipulation:
first split the input into parts that are numbers and texts
then recombine them again
I opted to document with inline comments
This could be done like so:
# testcases are tuples of input and correct result
testcases = [ ("3[a]2[b]","aaabb"),
              ("3[ab]2[c]","abababcc"),
              ("5[12]6[c]","1212121212cccccc"),
              ("22[a]","a"*22)]
# now we use our algo for all those testcases
for inp,res in testcases:
    split_inp = []   # list that takes the split values of the input
    num = 0          # accumulator variable for more-than-1-digit numbers
    in_text = False  # bool that tells us if we are currently collecting letters
    # go over all letters : O(n)
    for c in inp:
        # when a [ is reached our num is complete and we need to store it
        # we collect all further letters until next ] in a list that we
        # add at the end of our split_inp
        if c == "[":
            split_inp.append(num)  # add the completed number
            num = 0                # and reset it to 0
            in_text = True         # now in text
            split_inp.append([])   # add a list to collect letters
        # done collecting letters
        elif c == "]":
            in_text = False        # no longer collecting, convert letters
            split_inp[-1] = ''.join(split_inp[-1])  # to text
        # between [ and ] ... simply add letter to list at end
        elif in_text:
            split_inp[-1].append(c)  # add letter
        # currently collecting numbers
        else:
            num *= 10      # increase current number by factor 10
            num += int(c)  # add newest number
    print(repr(inp), split_inp, sep="\n")  # debugging output for parsing part
    # now we need to build the string from our parsed data
    amount = 0
    result = []  # intermediate list to join ['aaa','bb']
    # iterate the list, if int remember it, if text, build composite
    for part in split_inp:
        if isinstance(part, int):
            amount = part
        else:
            result.append(part*amount)
    # join the parts
    result = ''.join(result)
    # check if all worked out
    if result == res:
        print("CORRECT: ", result + "\n")
    else:
        print (f"INCORRECT: should be '{res}' but is '{result}'\n")
Result:
'3[a]2[b]'
[3, 'a', 2, 'b']
CORRECT: aaabb
'3[ab]2[c]'
[3, 'ab', 2, 'c']
CORRECT: abababcc
'5[12]6[c]'
[5, '12', 6, 'c']
CORRECT: 1212121212cccccc
'22[a]'
[22, 'a']
CORRECT: aaaaaaaaaaaaaaaaaaaaaa
This will also handle cases like '5[12]', which some of the other solutions won't.
You can capture both the number of repetitions n and the pattern to repeat v in one go using the pattern below. It matches any sequence of digits, which is the first group we need to capture (the reason why \d+ is between parentheses (...)), followed by a [, followed by anything, which is the second group of interest (hence it is also between parentheses (...)), which is then followed by a ].
findall will find all these matches in the passed line; then the first group of each match, the number, is cast to an int and used as a multiplier for the string pattern. The list of v * int(n) is then joined with an empty string. Malformed patterns may throw exceptions or return nothing.
Anyway, in code:
import re
pattern = re.compile(r"(\d+)\[(.*?)\]")
def func(x):
    return "".join([v*int(n) for n,v in pattern.findall(x)])
print(func("3[a]2[b]"))
print(func("3[ab]2[c]"))
OUTPUT
aaabb
abababcc
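Because findall simply skips anything it cannot match, malformed input can produce a partial or empty result without any error. If that matters, one option is to validate the whole string first. The sketch below assumes non-nested groups; PAIR, WHOLE and expand are names made up for this example.

```python
import re

# One "count[body]" pair; the body may not itself contain brackets.
PAIR = re.compile(r"(\d+)\[([^\[\]]*)\]")
# The whole input must be zero or more such pairs and nothing else.
WHOLE = re.compile(r"(?:\d+\[[^\[\]]*\])*")


def expand(text: str) -> str:
    if not WHOLE.fullmatch(text):
        raise ValueError(f"malformed input: {text!r}")
    return "".join(body * int(count) for count, body in PAIR.findall(text))


print(expand("3[a]2[b]"))
print(expand("3[ab]2[c]"))
```

With this check, an input such as "3[a" raises ValueError instead of quietly returning an empty string.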
FOLLOW UP
Another solution which achieves the same result, without using regular expression (ok, not nice at all, I get it...):
def func(s): return "".join([int(x[0])*x[1] for x in map(lambda x:x.split("["), s.split("]")) if len(x) == 2])
I am not much more than a beginner and looking at the other answers, I thought understanding regex might be a challenge for a new contributor such as yourself since I myself haven't really dealt with regex.
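The beginner-friendly way to do this might be to loop through the input string and use string methods like isnumeric() and isalpha(). Note that this simple version assumes single-digit counts and a single letter inside each pair of brackets: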
data = "3[a]2[b]"
chars = []
nums = []
substrings = []
for i, char in enumerate(data):
    if char.isnumeric():
        nums.append(char)
    if char.isalpha():
        chars.append(char)
for i, char in enumerate(chars):
    substrings.append(char * int(nums[i]))
string = "".join(substrings)
print(string)
OUTPUT:
aaabb
And on trying different values for data:
data = "0[a]2[b]3[p]"
OUTPUT bbppp
data = "1[a]1[a]2[a]"
OUTPUT aaaa
NOTE: In case you're not familiar with the above functions, they are string methods, and they are fairly self-explanatory. They are used as <your_string_here>.isalpha(), which returns True if and only if the string is non-empty and every character in it is alphabetic (whitespace, digits, and symbols make it return False).
And similarly for isnumeric().
For example,
"]".isnumeric() and "]".isalpha() return False
"a".isalpha() returns True
IF YOU NEED ANY CLARIFICATION ON A FUNCTION USED, PLEASE DO NOT HESITATE TO LEAVE A COMMENT
I'm a beginner to Python and I'm having trouble understanding some of the code in the moneyfmt function provided in the Recipes section of the documentation for the decimal module.
decimal Recipes
def moneyfmt(value, places=2, curr='', sep=',', dp='.',
             pos='', neg='-', trailneg=''):
    """Convert Decimal to a money formatted string.

    places:  required number of places after the decimal point
    curr:    optional currency symbol before the sign (may be blank)
    sep:     optional grouping separator (comma, period, space, or blank)
    dp:      decimal point indicator (comma or period)
             only specify as blank when places is zero
    pos:     optional sign for positive numbers: '+', space or blank
    neg:     optional sign for negative numbers: '-', '(', space or blank
    trailneg:optional trailing minus indicator: '-', ')', space or blank

    >>> d = Decimal('-1234567.8901')
    >>> moneyfmt(d, curr='$')
    '-$1,234,567.89'
    >>> moneyfmt(d, places=0, sep='.', dp='', neg='', trailneg='-')
    '1.234.568-'
    >>> moneyfmt(d, curr='$', neg='(', trailneg=')')
    '($1,234,567.89)'
    >>> moneyfmt(Decimal(123456789), sep=' ')
    '123 456 789.00'
    >>> moneyfmt(Decimal('-0.02'), neg='<', trailneg='>')
    '<0.02>'
    """
    q = Decimal(10) ** -places      # 2 places --> '0.01'
    sign, digits, exp = value.quantize(q).as_tuple()
    result = []
    digits = list(map(str, digits))
    build, next = result.append, digits.pop
    if sign:
        build(trailneg)
    for i in range(places):
        build(next() if digits else '0')
    if places:
        build(dp)
    if not digits:
        build('0')
    i = 0
    while digits:
        build(next())
        i += 1
        if i == 3 and digits:
            i = 0
            build(sep)
    build(curr)
    build(neg if sign else pos)
    return ''.join(reversed(result))
The part I can't follow is:
build, next = result.append, digits.pop
if sign:
    build(trailneg)
for i in range(places):
    build(next() if digits else '0')
if places:
    build(dp)
if not digits:
    build('0')
i = 0
while digits:
    build(next())
    i += 1
    if i == 3 and digits:
        i = 0
        build(sep)
build(curr)
build(neg if sign else pos)
I've looked up the next() method, but I don't understand how it's implemented here. I haven't been able to find 'build' listed as a Python method, or function anywhere. I think it's a variable, but if it IS a variable, I really don't get how it's being used here.
Can someone walk me through the code here?
Basically, build and next are variables that refer to methods: build refers to result.append and next refers to digits.pop.
Example
x = [1, 2, 3]
add = x.append
add(4)
Printing x now shows [1, 2, 3, 4].
It's the same with next, except that next is actually digits.pop (inside this function the name also shadows the built-in next()).
If you type just print into the Python 3 console, you will see "<built-in function print>".
That means it is a built-in function, but that doesn't matter for now. If you write your own function and do the same, it will show the name of that function and the address in memory of the function object. Adding parentheses tells Python that you want to execute the function. Back to print: if you store print in a variable, you can use that variable as an alias for print.
new_print = print
new_print('foo')
This is very useful when you have a function that calls another function; it is used in many libraries, such as PyQt for GUI development, and many others. You can do it yourself:
def foo(number, function, arg):
    if number > 5:
        function(arg)
If you call foo(6, print, 'bar'),
it will print "bar", because 6 is greater than 5 (a call like foo(1, print, 'bar') would print nothing).
Simple enough, or still complicated?
Anything else you'd like to understand?
And happy pythoning!
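To see the two aliases at work on real digits, here is a stripped-down trace of what the recipe does with them. It is only a sketch: grouping separators, signs and currency are left out, and the alias is called take instead of next so the built-in next() stays available.

```python
from decimal import Decimal

sign, digits, exp = Decimal("1234.56").as_tuple()
digits = list(map(str, digits))  # ['1', '2', '3', '4', '5', '6']
result = []

build = result.append  # build(x) now does result.append(x)
take = digits.pop      # take() now does digits.pop(), removing the LAST digit

for _ in range(2):     # two decimal places, taken from the right
    build(take())
build(".")
while digits:          # the remaining digits, still right to left
    build(take())

formatted = "".join(reversed(result))
print(result)
print(formatted)
```

The result list is built backwards because pop() removes digits from the right, which is why the recipe finishes with reversed(result).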
I have a text file (an output of a different process that I can't alter) which contains logical comparisons (only these three: >, <=, in) stored as strings. Let's say this is a line in my file, which should be evaluated:
myStr = "x>2 and y<=30 and z in ('def', 'abc')"
Some of my variables are categorical and I specify them, and the rest are numerical:
categoricalVars = ('z')
The values of my variables are stored in a dictionary, let's assume these are their values. Note that they always come in as strings, even for numeric variables:
x, y, z = '5', '6', 'abc'
So my question is how I can safely evaluate (i.e. without using eval()) the truth of myStr in reference to this last line.
What I have done is: First change myStr to reflect the data types:
import re
delim = "(\>|\<=|\ in )" # Put in group to find later which delimiter is used
def pyRules(s):
    varName = re.split(delim, s)[0]
    rest = "".join(re.split(delim, s)[1:])
    if varName in categoricalVars:
        return varName + rest
    else:
        return "float(" + varName + ")" + rest
# Call:
[pyRules(e) for e in myStr.split(' and ')]
# Result:
['float(x)>2', 'float(y)<=30', "z in ('def', 'abc')"]
Now I can easily do:
[eval(pyRules(e)) for e in myStr.split(' and ')]
# Result:
[True, True, True]
But I want to avoid this. I tried ast.literal_eval() but got the following error:
import ast
[ast.literal_eval(pyRules(e)) for e in myStr.split(' and ')]
# Result:
Traceback (most recent call last):
File "<ipython-input-556-dae16951de03>", line 1, in <module>
ast.literal_eval(ast.parse(conds[0]))
File "C:\ProgramData\Anaconda2\lib\ast.py", line 80, in literal_eval
return _convert(node_or_string)
File "C:\ProgramData\Anaconda2\lib\ast.py", line 79, in _convert
raise ValueError('malformed string')
ValueError: malformed string
Next, I tried the following approach, which almost gave me the right answer:
def pyRules(s):
    varName = re.split(delim, s)[0]
    operation = "".join(re.split(delim, s)[1:])
    if varName in categoricalVars:
        return "'{" + varName + "}'" + operation
    else:
        return "float({" + varName + "})" + operation
rules = [pyRules(e).format(x='5',y='6',z='abc') for e in myStr.split(' and ')]
# rules is:
['float(5)>2', 'float(6)<=30', "'abc' in ('def', 'abc')"]
I can again use the eval() on this and get [True, True, True] but to avoid it I defined my own inequality checker function:
def check(x):
    first, operation, second = re.split(delim, x)
    if operation == ">":
        return first > second
    elif operation == "<=":
        return first <= second
    elif operation == " in ":
        return first in second
# Call:
[check(pyRules(e).format(x='5',y='6',z='abc')) for e in myStr.split(' and ')]
# Result:
[True, False, True]
It is having a hard time evaluating the second item, i.e. 'float(6)<=30'. I also recreated this function using the operator module, per this SO thread, which is essentially the same thing, and got the same result.
I checked pyparsing, couldn't get it to work (which even looks scary, look at this!), and SymPy but unfortunately it also uses eval frequently, as documented in the hyperlink I provided.
Question 2: Is it okay to use eval given that I am 100% sure that I don't have any crazy string that can interfere with os and erase my disk and other crazy stuff like that?
Note: This is a piece of a big code that I built in Python 2, so Python 2 based answers would be ideal; but I can move to Python 3 if anybody thinks my answer is in that sphere.
After a few hours of work, I figured out a way to get ast.literal_eval() to work! My logic is to look at the two sides of, say, x>2, i.e. x and 2, make sure these are safe by evaluating both with literal_eval, and then run it through my check() function, which does the evaluation. Same for z in ('def', 'abc'): First make sure both z and ('def', 'abc') are safe, then do the actual boolean checking with the check() function.
Since I fully trusted my inputs I could've done the easier eval() way, but just wanted to be double-cautious. And wanted to build some code for everybody out there who have security issues (user inputs etc.) and need to safely evaluate logicals. Hope it helps somebody!
Please see my full code below, any comments/recommendations are welcome.
import re
import ast
myStr = "x>2 and y<=30 and z in ('def', 'abc')"
categoricalVars = ('z')
x, y, z = '5', '6', 'abc'
delim = "(\>|\<=|\ in )" # Put in group to find in the func check() which delimiter is used
def pyRules(s):
    """
    Place {} around variable names so that we can str.format() in the func check()
    """
    varName = re.split(delim, s)[0]
    rest = "".join(re.split(delim, s)[1:])
    return "'{" + varName + "}'" + rest
def check(x):
    """
    If operation is > or <= then it is a numeric var, use double literal_eval to
    parse floats e.g. "'5'" (dual quotes) to 5.0. This is equivalent to:
    float(ast.literal_eval(first)). Else it is categorical, just literal_eval once
    """
    first, operation, second = re.split(delim, x)
    if operation == ">":
        return ast.literal_eval(ast.literal_eval(first)) > ast.literal_eval(second)
    elif operation == "<=":
        return ast.literal_eval(ast.literal_eval(first)) <= ast.literal_eval(second)
    elif operation == " in ":
        return ast.literal_eval(first) in ast.literal_eval(second)
# These are my raw rules:
print [pyRules(e) for e in myStr.split(' and ')]
# These are my processed rules:
print [pyRules(e).format(x='5',y='6',z='abc') for e in myStr.split(' and ')]
# And these are my final results of logical evaluation:
print [check(pyRules(e).format(x='5',y='6',z='abc')) for e in myStr.split(' and ')]
Results of the three result lines:
["'{x}'>2", "'{y}'<=30", "'{z}' in ('def', 'abc')"]
["'5'>2", "'6'<=30", "'abc' in ('def', 'abc')"]
[True, True, True]
Thanks!
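For comparison, here is a sketch of the same idea that skips the string formatting step: the variable values are looked up in a dictionary, only the right-hand side goes through ast.literal_eval, and the comparison itself comes from a small dispatch table. It is written for Python 3 (the code above is Python 2), the names OPERATORS, RULE and evaluate are invented for this example, and it assumes that " and " never occurs inside a string literal.

```python
import ast
import operator
import re

# Only the three supported comparisons are reachable.
OPERATORS = {
    ">": operator.gt,
    "<=": operator.le,
    "in": lambda left, right: left in right,
}
# variable name, operator, then a Python literal on the right-hand side
RULE = re.compile(r"\s*(\w+)\s*(>|<=|in)\s*(.+?)\s*$")


def evaluate(rules: str, values: dict) -> bool:
    results = []
    for rule in rules.split(" and "):
        match = RULE.match(rule)
        if match is None:
            raise ValueError(f"unsupported rule: {rule!r}")
        name, op, literal = match.groups()
        right = ast.literal_eval(literal)  # accepts literals only, never calls
        left = values[name]
        if op != "in":
            left = float(left)  # numeric variables arrive as strings
        results.append(OPERATORS[op](left, right))
    return all(results)


my_str = "x>2 and y<=30 and z in ('def', 'abc')"
print(evaluate(my_str, {"x": "5", "y": "6", "z": "abc"}))
```

Treating every variable used with "in" as categorical replaces the explicit categoricalVars list here, which may or may not suit the real data.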
I have a collection of strings like:
"0"
"90/100"
None
"1-5%/34B-1"
"-13/7"
I would like to convert these into integers (or None) so that I start picking numbers from the beginning and stop at the first non-number character. The above data would thus become:
0
90
None
1
None
I tried doing something like the code below, but ran into multiple problems, like causing ValueError with that int(new_n) line when new_n was just empty string. And even without that, the code just looks horrible:
def pick_right_numbers(old_n):
    new_n = ''
    numbers = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'}
    if old_n is None:
        return None
    else:
        for n in old_n:
            if n in numbers:
                new_n += n
            else:
                return int(new_n)
        if new_n:
            return int(new_n)
        else:
            return None
Could someone nudge me to the right direction with this?
Is this the sort of thing you're looking for?
import re
data = ['0', '90/100', None, '1-5%/34B-1', '-13/7']
def pick_right_numbers(old_n):
    if old_n is None:
        return None
    else:
        digits = re.match("([0-9]*)",old_n).groups()[0]
        if digits.isdigit():
            return int(digits)
        else:
            return None
for string in data:
    result = pick_right_numbers(string)
    if result is not None:
        print("Matched section is : {0:d}".format(result))
It uses re (pattern matching) to detect a block of digits at the start of a string (match only matches the beginning of a string, search would find a block anywhere in the string).
It checks for a match, confirms the match is digits (otherwise the last data element matches, but is the empty string) and converts that to an integer to return.
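The difference between match and search is easy to check on the sample data. This is a small illustration only; it uses [0-9]+ rather than [0-9]* so that a failed match shows up as None instead of an empty string.

```python
import re

at_start = re.match(r"[0-9]+", "1-5%/34B-1")  # digits at the very beginning
no_match = re.match(r"[0-9]+", "-13/7")       # starts with '-', so no match
anywhere = re.search(r"[0-9]+", "-13/7")      # search scans the whole string

print(at_start.group())
print(no_match)
print(anywhere.group())
```

This is why "-13/7" ends up as None in the answer above: search would happily return 13, which is not what the question asks for.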
A basic way to do this would be:
input_list = ["0", "90/100", None, "1-5%/34B-1", "-13/7"]
char_list = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
output_list = []
for input_str in input_list:
    if isinstance(input_str, str):
        i = 0
        for input_char in input_str:
            if input_char in char_list:
                i += 1
            else:
                break
    else:
        i = 0
    if i:
        output = int(input_str[0:i])
    else:
        output = None
    output_list.append(output)
But there are quite a few variants. If it's a function that you would run 10,000+ times per day, it would be smart to do some performance profiling and consider alternatives.
Edit: it might be smart to consider what a string is in Python 2 vs 3 (see What is the difference between isinstance('aaa', basestring) and isinstance('aaa', str)?)
Edit 2: see how Bakuriu's solution simplifies this:
from itertools import takewhile
input_list = ["0", "90/100", None, "1-5%/34B-1", "-13/7"]
output_list = []
for input_str in input_list:
    text = ''.join(takewhile(str.isdigit, input_str or ""))
    output_list.append(int(text) if text else None)
(so i think he should add that as the best answer to be honest ;)
>>> import re
>>> s = ["0", "90/100", None, "1-5%/34B-1", "-13/7"]
>>> [int(c) if c else None for c in (re.sub('([0-9]*).*', r'\1', str(x)) for x in s)]
[0, 90, None, 1, None]
How it works
We have two list comprehensions. The inner removes everything from the elements of list s except the initial numbers:
>>> list(re.sub('([0-9]*).*', r'\1', str(x)) for x in s)
['0', '90', '', '1', '']
The outer list comprehension converts those strings, if nonempty, to integers or otherwise to None:
>>> [int(c) if c else None for c in ('0', '90', '', '1', '')]
[0, 90, None, 1, None]
Alternative: using takewhile
As per Bakuriu's comment, we can use itertools.takewhile in place of re.sub:
>>> from itertools import takewhile
>>> [int(c) if len(c) else None for c in (''.join(takewhile(str.isdigit, x or "")) for x in s)]
[0, 90, None, 1, None]
Modifications to original code
Alternatively, we can modify the original code:
def pick_right_numbers(old_n):
    if old_n is None:
        return None
    else:
        new_n = ''
        for n in old_n:
            if not n.isdigit():
                break
            new_n += n
        return int(new_n) if len(new_n) else None
This code produces the output:
>>> [pick_right_numbers(x) for x in s]
[0, 90, None, 1, None]
There are various methods to check if an object is a number. See for instance this answer.
However you only need to check one char at a time, so your method is actually fine. The array will be permanently in cache, so it will be scanned fast.
Note that you can just write it in a nicer way:
if n in "0123456789":
Another possibility, probably the fastest, is checking the range, treating them as numerical values via ASCII representation (using the fact that digits are contiguous in that representation, and are in the order you expect):
zero = ord('0')
nine = ord('9')
for n in old_n:
nn = ord(n)
if (nn >= zero) and (nn <= nine):
The most elegant way, of course, would be to call the native isdigit() on it; you save on all the clutter and make your intent completely clear.
Note that it might accept more than you ask for: ⑦ is a digit according to Unicode. But you're unlikely to encounter such cases. Also note that, because of this, it will likely be slower than your implementation.
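A quick way to see the Unicode caveat in practice is shown below. It is only an illustration: characters such as ⑦ and ² pass isdigit() but are rejected by int(), so the explicit "0123456789" check is the stricter test for this particular job.

```python
samples = ["7", "⑦", "²"]
flags = {ch: (ch.isdigit(), ch in "0123456789") for ch in samples}
for ch, (unicode_digit, ascii_digit) in flags.items():
    print(repr(ch), unicode_digit, ascii_digit)

# int() only accepts decimal digits, so a circled seven cannot be converted.
try:
    int("⑦")
    converted = True
except ValueError:
    converted = False
print(converted)
```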
Note that you need to check for new_n == '' also inside the loop! The best way to not repeat yourself is to fall out of the loop to the final if
def pick_right_numbers(old_n):
    new_n = ''
    if old_n is None:
        return None
    else:
        for n in old_n:
            if n.isdigit():
                new_n += n
            else:
                break
        if new_n:
            return int(new_n)
        else:
            return None
Of course if you need speed you will probably have to change the approach, as you are growing a vector in a loop. But if this is the logic making sense to you, only complicate it if this is the bottleneck of the program.
Assume I have a string as follows: expression = '123 + 321'.
I am walking over the string character by character as follows: for p in expression. I am checking if p is a digit using p.isdigit(). If p is a digit, I'd like to grab the whole number (so grab 123 and 321, not just p, which initially would be 1).
How can I do that in Python?
In C (coming from a C background), the equivalent would be:
int x = 0;
sscanf(p, "%d", &x);
// the full number is now in x
EDIT:
Basically, I am accepting a mathematical expression from a user that accepts positive integers, +,-,*,/ as well as brackets: '(' and ')'. I am walking the string character by character and I need to be able to determine whether the character is a digit or not. Using isdigit(), I can do that. If it is a digit, however, I need to grab the whole number. How can that be done?
>>> from itertools import groupby
>>> expression = '123 + 321'
>>> expression = ''.join(expression.split()) # strip whitespace
>>> for k, g in groupby(expression, str.isdigit):
        if k: # it's a digit
            print 'digit'
            print list(g)
        else:
            print 'non-digit'
            print list(g)
digit
['1', '2', '3']
non-digit
['+']
digit
['3', '2', '1']
This is one of those problems that can be approached from many different directions. Here's what I think is an elegant solution based on itertools.takewhile:
>>> from itertools import chain, takewhile
>>> def get_numbers(s):
...     s = iter(s)
...     for c in s:
...         if c.isdigit():
...             yield ''.join(chain(c, takewhile(str.isdigit, s)))
...
>>> list(get_numbers('123 + 456'))
['123', '456']
This even works inside a list comprehension:
>>> def get_numbers(s):
...     s = iter(s)
...     return [''.join(chain(c, takewhile(str.isdigit, s)))
...             for c in s if c.isdigit()]
...
>>> get_numbers('123 + 456')
['123', '456']
Looking over other answers, I see that this is not dissimilar to jamylak's groupby solution. I would recommend that if you don't want to discard the extra symbols. But if you do want to discard them, I think this is a bit simpler.
The Python documentation includes a section on simulating scanf, which gives you some idea of how you can use regular expressions to simulate the behavior of scanf (or sscanf, it's all the same in Python). In particular, r'\-?\d+' is the Python string that corresponds to the regular expression for an integer. (r'\d+' for a nonnegative integer.) So you could embed this in your loop as
integer = re.compile(r'\-?\d+')
for p in expression:
    if p.isdigit():
        # somehow find the current position in the string
        integer.match(expression, curpos)
But that still reflects a very C-like way of thinking. In Python, your iterator variable p is really just an individual character that has actually been pulled out of the original string and is standing on its own. So in the loop, you don't naturally have access to the current position within the string, and trying to calculate it is going to be less than optimal.
What I'd suggest instead is using Python's built in regexp matching iteration method:
integer = re.compile(r'\-?\d+') # only do this once in your program
all_the_numbers = integer.findall(expression)
and now all_the_numbers is a list of string representations of all the integers in the expression. If you wanted to actually convert them to integers, then you could do this instead of the last line:
all_the_numbers = [int(m.group()) for m in integer.finditer(expression)]
Here I've used finditer instead of findall because you don't have to make a list of all the strings before iterating over them again to convert them to integers.
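finditer yields match objects rather than strings, so besides the matched text each one also knows where in the string it was found. That gives roughly what sscanf on a moving pointer would. The sketch below uses r'\d+' (nonnegative integers only), since the question says the numbers are positive and a '-' is an operator.

```python
import re

integer = re.compile(r"\d+")
expression = "123 + 321"

found = []
for m in integer.finditer(expression):
    # m.group() is the matched text, m.start() its index in the string
    found.append((m.start(), int(m.group())))

print(found)
```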
Though I'm not familiar with sscanf, I'm no C developer, it looks like it's using format strings in a way not dissimilar to what I'd use python's re module for. Something like this:
import re
nums = re.compile('\d+')
found = nums.findall('123 + 321')
# if you know you're only looking for two values.
left, right = found
You can use shlex http://docs.python.org/library/shlex.html
>>> from shlex import shlex
>>> expression = '123 + 321'
>>> for e in shlex(expression):
...     print e
...
123
+
321
>>> expression = '(92831 * 948) / 32'
>>> for e in shlex(expression):
...     print e
...
(
92831
*
948
)
/
32
I'd split the string up on the ' + ' string, giving you what's outside of them:
>>> expression = '123 + 321'
>>> ex = expression.split(' + ')
>>> ex
['123', '321']
>>> int_ex = map(int, ex)
>>> int_ex
[123, 321]
>>> sum(int_ex)
444
It's dangerous, but you could use eval:
>>> eval('123 + 321')
444
I'm just taking a stab at parsing the string and doing raw calculations on it.
e_array = expression.split('+')
i_array = map(int, e_array)
And i_array holds all integers in the expression.
UPDATE
If you already know all the special characters in your expression and you want to eliminate them all:
import re
e_array = re.split(r'[*/+\-() ]', expression) # the characters here are mult, div, plus, minus, left and right parentheses, and space
i_array = map(int, filter(lambda x: len(x), e_array))
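If the operators and brackets are needed later (to actually evaluate the expression), it may be more useful to keep them as tokens instead of throwing them away. The following is a sketch written for Python 3; TOKEN and tokenize are names made up here, and characters that are neither digits, operators nor brackets are silently skipped.

```python
import re

# A token is either a run of digits or a single operator/bracket.
TOKEN = re.compile(r"\d+|[-+*/()]")


def tokenize(expression: str) -> list:
    tokens = []
    for text in TOKEN.findall(expression):
        # Whole numbers become ints, everything else stays a one-char string.
        tokens.append(int(text) if text.isdigit() else text)
    return tokens


print(tokenize("123 + 321"))
print(tokenize("(92831 * 948) / 32"))
```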