Retrieving a full number - python

Assume I have a string as follows: expression = '123 + 321'.
I am walking over the string character-by-character as follows: for p in expression. I am checking if p is a digit using p.isdigit(). If p is a digit, I'd like to grab the whole number (so grab 123 and 321, not just p, which initially would be 1).
How can I do that in Python?
In C (coming from a C background), the equivalent would be:
int x = 0;
sscanf(p, "%d", &x);
// the full number is now in x
EDIT:
Basically, I am accepting a mathematical expression from a user that accepts positive integers, +,-,*,/ as well as brackets: '(' and ')'. I am walking the string character by character and I need to be able to determine whether the character is a digit or not. Using isdigit(), I can do that. If it is a digit, however, I need to grab the whole number. How can that be done?

>>> from itertools import groupby
>>> expression = '123 + 321'
>>> expression = ''.join(expression.split()) # strip whitespace
>>> for k, g in groupby(expression, str.isdigit):
...     if k: # it's a digit
...         print('digit')
...         print(list(g))
...     else:
...         print('non-digit')
...         print(list(g))
...
digit
['1', '2', '3']
non-digit
['+']
digit
['3', '2', '1']

This is one of those problems that can be approached from many different directions. Here's what I think is an elegant solution based on itertools.takewhile:
>>> from itertools import chain, takewhile
>>> def get_numbers(s):
...     s = iter(s)
...     for c in s:
...         if c.isdigit():
...             yield ''.join(chain(c, takewhile(str.isdigit, s)))
...
>>> list(get_numbers('123 + 456'))
['123', '456']
This even works inside a list comprehension:
>>> def get_numbers(s):
...     s = iter(s)
...     return [''.join(chain(c, takewhile(str.isdigit, s)))
...             for c in s if c.isdigit()]
...
>>> get_numbers('123 + 456')
['123', '456']
Looking over other answers, I see that this is not dissimilar to jamylak's groupby solution. I would recommend that if you don't want to discard the extra symbols. But if you do want to discard them, I think this is a bit simpler.
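If you do want to keep the symbols, the groupby idea can be wrapped into a small generator. This is only a sketch: the name tokens is made up for illustration, and it assumes every non-digit, non-space character is a symbol you want to keep.

```python
from itertools import groupby

def tokens(expression):
    """Yield ints for digit runs and single characters for everything else."""
    for is_digit, group in groupby(expression, str.isdigit):
        text = ''.join(group)
        if is_digit:
            yield int(text)
        else:
            # A non-digit run can hold several symbols, e.g. ')*' or ' + ',
            # so emit them one by one and drop whitespace.
            for ch in text:
                if not ch.isspace():
                    yield ch

print(list(tokens('(12+3)*45')))
```

Unlike the takewhile version, nothing is thrown away here, so the result can be fed to whatever evaluates the expression.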

The Python documentation includes a section on simulating scanf, which gives you some idea of how you can use regular expressions to simulate the behavior of scanf (or sscanf, it's all the same in Python). In particular, r'\-?\d+' is the Python string that corresponds to the regular expression for an integer. (r'\d+' for a nonnegative integer.) So you could embed this in your loop as
import re
integer = re.compile(r'\-?\d+')
for p in expression:
    if p.isdigit():
        # somehow find the current position in the string
        integer.match(expression, curpos)
But that still reflects a very C-like way of thinking. In Python, your iterator variable p is really just an individual character that has actually been pulled out of the original string and is standing on its own. So in the loop, you don't naturally have access to the current position within the string, and trying to calculate it is going to be less than optimal.
What I'd suggest instead is using Python's built-in regular expression methods for iterating over matches:
integer = re.compile(r'\-?\d+') # only do this once in your program
all_the_numbers = integer.findall(expression)
and now all_the_numbers is a list of string representations of all the integers in the expression. If you wanted to actually convert them to integers, then you could do this instead of the last line:
all_the_numbers = [int(m.group()) for m in integer.finditer(expression)] # finditer yields match objects, so take the matched text with group()
Here I've used finditer instead of findall because you don't have to make a list of all the strings before iterating over them again to convert them to integers.
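Since the question's edit mentions operators and brackets as well, the same idea stretches to a small tokenizer. This is a sketch under assumptions: the names TOKEN and tokenize are invented here, and any character that is not a digit, an operator, or a bracket is silently skipped rather than reported.

```python
import re

# A run of digits, or a single operator/bracket character.
TOKEN = re.compile(r'\d+|[-+*/()]')

def tokenize(expression):
    """Return whole numbers as ints and operators/brackets as one-character strings."""
    tokens = []
    for text in TOKEN.findall(expression):
        tokens.append(int(text) if text.isdigit() else text)
    return tokens

print(tokenize('(92831 * 948) / 32'))
```

If unknown characters should be an error instead, you would need to compare what was matched against the input, which this sketch does not do.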

Though I'm not familiar with sscanf (I'm no C developer), it looks like it uses format strings in a way not dissimilar to how I'd use Python's re module. Something like this:
import re
nums = re.compile(r'\d+')
found = nums.findall('123 + 321')
# if you know you're only looking for two values.
left, right = found

You can use shlex http://docs.python.org/library/shlex.html
>>> from shlex import shlex
>>> expression = '123 + 321'
>>> for e in shlex(expression):
...     print(e)
...
123
+
321
>>> expression = '(92831 * 948) / 32'
>>> for e in shlex(expression):
...     print(e)
...
(
92831
*
948
)
/
32

I'd split the string on ' + ', which leaves you with the numbers on either side of it:
>>> expression = '123 + 321'
>>> ex = expression.split(' + ')
>>> ex
['123', '321']
>>> int_ex = list(map(int, ex)) # list() is needed in Python 3, where map returns an iterator
>>> int_ex
[123, 321]
>>> sum(int_ex)
444
It's dangerous, but you could use eval:
>>> eval('123 + 321')
444
I'm just taking a stab at you parsing the string, and doing raw calculations on it.
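If the goal really is to compute the result, one way to avoid eval is to parse the text with the standard ast module and only allow the node types you expect. This is a rough sketch, assuming Python 3.8 or later and only integers, + - * / and brackets; safe_eval and _OPS are names I made up.

```python
import ast
import operator

# The only binary operators this sketch accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def safe_eval(expression):
    """Evaluate integers combined with + - * / and brackets, nothing else."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        # Names, calls, attribute access and so on all end up here.
        raise ValueError('unsupported expression')
    return _eval(ast.parse(expression, mode='eval'))

print(safe_eval('123 + 321'))
```

Anything outside that whitelist, such as a function call, raises ValueError instead of being executed.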

e_array = expression.split('+')
i_array = list(map(int, e_array)) # int() ignores the surrounding spaces
And i_array holds all integers in the expression.
UPDATE
If you already know all the special characters in your expression and you want to eliminate them all:
import re
e_array = re.split(r'[*/+\-() ]', expression) # the characters are mult, div, plus, minus, left and right parenthesis, and space
i_array = list(map(int, filter(len, e_array))) # filter drops the empty strings left between adjacent separators

Related

Regex for Transformations (without using multiple statements)

What is the best way to use Regex to extract and transform one statement to another?
Specifically, I have implemented the below to find and extract a student number from a block of text and transform it as follows: AB123CD to AB-123-CD
Right now, this is implemented as 3 statements as follows:
gg['student_num'] = gg['student_test'].str.extract('(\d{2})\w{3}\d{2}') + \
'-' + gg['student_num'].str.extract('\d{2}(\w{3})\d{2}') + \
'-' + gg['student_test'].str.extract('\d{2}\w{3}(\d{2})')
It doesn't feel right to me that I would need three statements, one for each group, concatenated together (or even more if this was more complicated). Is there a better way to find and transform some text?
You could get list of segments using regexp and then join them this way:
'-'.join(re.search(r'(\d{2})(\w{3})(\d{2})', string).groups())
You could get an AttributeError if string doesn't contain the needed pattern (re.search() returns None), so you might want to wrap this expression in a try...except block.
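Another way to sidestep the None case is to let re.sub do the extraction and the transformation in one call, since it simply returns the text unchanged when nothing matches. A sketch, assuming the student number is always two letters, three digits, two letters as in the AB123CD example; PATTERN and hyphenate are names I made up.

```python
import re

# Assumed shape of a student number: two letters, three digits, two letters.
PATTERN = re.compile(r'([A-Za-z]{2})(\d{3})([A-Za-z]{2})')

def hyphenate(text):
    """Insert hyphens between the three groups of every match in text."""
    return PATTERN.sub(r'\1-\2-\3', text)

print(hyphenate('AB123CD'))
```

Text without a match comes back untouched, so there is no exception to catch.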
This is not regex, but it is quick and concise:
s = "AB123CD"
first = [i for i, a in enumerate(s) if a.isdigit()][0]
second = [i for i, a in enumerate(s) if a.isdigit()][-1]
new_form = s[:first]+"-"+s[first:second+1]+"-"+s[second+1:]
Output:
AB-123-CD
Alternative regex solution:
letters = re.findall("[a-zA-Z]+", s)
numbers = re.findall("[0-9]+", s)
letters.insert(1, numbers[0])
final = '-'.join(letters)
print(final)
Output:
AB-123-CD
Try this. Hope that helps
>>> import re
>>> s = r'ABC123DEF'
>>> n = re.search(r'\d+',s).group()
>>> f = re.findall(r'[A-Za-z]+',s)
>>> new_s = f[0]+"-"+n+"-"+f[1]
>>> new_s
Output:
'ABC-123-DEF'

Check and remove particular char from string in python

I'm in a situation where I have a string and a special symbol that is consecutively repeating, such as:
s = 'a.b.c...d..e.g'
How can I check whether it is repeating or not and remove consecutive symbols, resulting in this:
s = 'a.b.c.d.e.g'
import re
result = re.sub(r'\.{2,}', '.', 'a.b.c...d..e.g')
A bit more generalized version:
import re
symbol = '.'
regex_pattern_to_replace = re.escape(symbol)+'{2,}'
# Note that escape sequences are processed in replace_to
# but this time we have no backslash characters in it.
# In case of more complex replacement we could use
# replace_to = replace_to.replace('\\', '\\\\')
# to defend against occasional escape sequences.
replace_to = symbol
result = re.sub(regex_pattern_to_replace, replace_to, 'a.b.c...d..e.g')
The same with compiled regex (added after Cristian Ciupitu's comment):
compiled_regex = re.compile(regex_pattern_to_replace)
# You can store the compiled_regex and reuse it multiple times.
result = compiled_regex.sub(replace_to, 'a.b.c...d..e.g')
Check out the docs for re.sub
Simple and clear:
>>> a = 'a.b.c...d..e.g'
>>> while '..' in a:
...     a = a.replace('..','.')
...
>>> a
'a.b.c.d.e.g'
Lots of answers, so why not throw another one into the mix.
You can zip the string with itself off by one and eliminate all matching '.'s:
''.join(x[0] for x in zip(s, s[1:]+' ') if x != ('.', '.'))
Certainly not the fastest, just interesting. It's trivial to turn this into eliminating all repeating elements:
''.join(a for a,b in zip(s, s[1:]+' ') if a != b)
Note: you can use izip_longest (py2) or zip_longest (py3) if ' ' as a filler causes an issue.
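For what it's worth, the zip_longest variant from the note might look like this in Python 3. It is a sketch: collapse is a made-up name, and the optional symbol argument is my addition to cover both one-liners above.

```python
from itertools import zip_longest

def collapse(s, symbol=None):
    """Collapse runs of symbol, or runs of any repeated character if symbol is None."""
    # zip_longest pads the shorter sequence with None, so the last character
    # is paired with None and always survives the comparison.
    pairs = zip_longest(s, s[1:])
    if symbol is None:
        return ''.join(a for a, b in pairs if a != b)
    return ''.join(a for a, b in pairs if (a, b) != (symbol, symbol))

print(collapse('a.b.c...d..e.g', '.'))
```

Because the filler is None rather than ' ', a string that ends in a space keeps that space.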
My previous answer was a dud so here's another attempt using reduce(). This is reasonably efficient with O(n) time complexity:
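from functools import reduce # reduce is a builtin in Python 2 but lives in functools in Python 3
def remove_consecutive(s, symbol='.'):
    def _remover(x, y):
        # skip y when it would repeat the symbol already at the end of x
        if y == symbol and x[-1:] == y:
            return x
        else:
            return x + y
    return reduce(_remover, s, '')
for s in 'abcdefg', '.a.', '..aa..', '..aa...b...c.d.e.f.g.....', '.', '..', '...', '':
    print(remove_consecutive(s))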
Output
abcdefg
.a.
.aa.
.aa.b.c.d.e.f.g.
.
.
.
Kind of complicated, but it works and it's being done in a single loop:
import itertools
def remove_consecutive(s, c='.'):
    return ''.join(
        itertools.chain.from_iterable(
            c if k else g
            for k, g in itertools.groupby(s, c.__eq__)
        )
    )

Python print first N integers from a string

Is it possible without regex in python to print the first n integers from a string containing both integers and characters?
For instance:
string1 = 'test120202test34234e23424'
string2 = 'ex120202test34234e23424'
foo(string1,6) => 120202
foo(string2,6) => 120202
Anything's possible without a regex. Most things are preferable without a regex.
One easy way is:
>>> str = 'test120202test34234e23424'
>>> str2 = 'ex120202test34234e23424'
>>> ''.join(c for c in str if c.isdigit())[:6]
'120202'
>>> ''.join(c for c in str2 if c.isdigit())[:6]
'120202'
You might want to handle your corner cases some specific way -- it all depends on what you know your code should do.
>>> str3 = "hello 4 world"
>>> ''.join(c for c in str3 if c.isdigit())[:6]
'4'
And don't name your strings str!
You can remove all the letters from your string with str.translate and then slice up to the number of digits you want, like this:
import string
def foo(input_string, num):
    return input_string.translate(None, string.letters)[:num]
print foo('test120202test34234e23424', 6) # 120202
print foo('ex120202test34234e23424', 6) # 120202
Note: This simple technique works only in Python 2.x
But the most efficient way is to use itertools.islice:
from itertools import islice
def foo(input_string, num):
    return "".join(islice((char for char in input_string if char.isdigit()), num))
This is the most efficient way because it doesn't have to process the entire string before returning the result.
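To see the early stop for yourself, you can wrap the input in a generator that records what it hands out. This is just an illustration; first_digits and counting are names invented for it.

```python
from itertools import islice

def first_digits(chars, n):
    """Join the first n digits found in an iterable of characters."""
    digits = (ch for ch in chars if ch.isdigit())
    return ''.join(islice(digits, n))

def counting(text, seen):
    """Yield the characters of text, recording each one that gets consumed."""
    for ch in text:
        seen.append(ch)
        yield ch

seen = []
result = first_digits(counting('test120202test34234e23424', seen), 6)
print(result, len(seen))
```

Only the characters up to and including the sixth digit end up in seen; the rest of the string is never touched.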
If you didn't want to process the whole string - not a problem with the length of strings you give as an example - you could try:
import itertools
"".join(itertools.islice((c for c in str2 if c.isdigit()), 6)) # stop after the first 6 digits

Regex for wrapping digits with curly braces?

I am trying to use Python's re.sub() to match a string with an e character and insert curly braces immediately after the e character and after the last digit. For example:
12.34e56 to 12.34e{56}
1e10 to 1e{10}
I can't seem to find the correct regex to insert the desired curly braces. For example, I can properly insert the left brace like this:
>>> import re
>>> x = '12.34e10'
>>> pattern = re.compile(r'(e)')
>>> sub = z = re.sub(pattern, "\1e{", x)
>>> print(sub)
12.34e{10 # this is the correct placement for the left brace
My problem arises when using two back references.
>>> import re
>>> x = '12.34e10'
>>> pattern = re.compile(r'(e).+($)')
>>> sub = z = re.sub(pattern, "\1e{\2}", x)
>>> print(sub)
12.34e{} # this is not what I want, digits 10 have been removed
Can anyone point out my problem? Thanks for the help.
re.sub(r'e(\d+)', r'e{\1}', '12.34e56')
returns '12.34e{56}'
or, the same result but different logic (don't replace e with e):
re.sub(r'(?<=e)(\d+)', r'{\1}', '12.34e56')
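Part of what goes wrong in the question's attempts is probably that the replacement strings are not raw strings: in an ordinary Python string literal, "\1" is an octal escape for the character with code 1, not a backreference. A small sketch to show the difference (the variable names are mine):

```python
import re

x = '12.34e10'

# Without the r prefix, "\1" is the single control character chr(1).
assert "\1" == chr(1)
assert r"\1" == "\\" + "1"

# So this inserts an invisible control character instead of the group.
broken = re.sub(r'(e)', "\1e{", x)
print(repr(broken))

# With raw strings the backreference works as intended.
fixed = re.sub(r'e(\d+)', r'e{\1}', x)
print(fixed)
```

The control character is usually invisible when printed, which would explain why the first attempt looked correct in the question.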
Your brace placement is incorrect.
Here's a solution ensuring that there's a number with an optional decimal part before the e:
import re
samples = ['12.34e56','1e10']
for s in samples:
    print(re.sub(r'(\d+(?:\.\d+)?)e([0-9]+)', r"\g<1>e{\g<2>}", s))
Yields:
12.34e{56}
1e{10}

python string manipulation [duplicate]

I have a string s with nested brackets: s = "AX(p>q)&E((-p)Ur)"
I want to remove all characters between all pairs of brackets and store in a new string like this: new_string = AX&E
I tried doing this:
p = re.compile(r"\(.*?\)", re.DOTALL)
new_string = p.sub("", s)
It gives output: AX&EUr)
Is there any way to correct this, rather than iterating each element in the string?
Another simple option is removing the innermost parentheses at every stage, until there are no more parentheses:
p = re.compile(r"\([^()]*\)")
count = 1
while count:
    s, count = p.subn("", s)
Working example: http://ideone.com/WicDK
You can just use string manipulation without regular expression
>>> s = "AX(p>q)&E(qUr)"
>>> [ i.split("(")[0] for i in s.split(")") ]
['AX', '&E', '']
I leave it to you to join the strings up.
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> re.compile("""\([^\)]*\)""").sub('', s)
'AX&E'
Yeah, it should be:
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> p = re.compile("\(.*?\)", re.DOTALL)
>>> new_string = p.sub("", s)
>>> new_string
'AX&E'
Nested brackets (or tags, ...) cannot be handled in a general way using regex. See http://www.amazon.de/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/ref=sr_1_1?ie=UTF8&s=gateway&qid=1304230523&sr=8-1-spell for details on why. You would need a real parser.
It's possible to construct a regex which can handle two levels of nesting, but they are already ugly, three levels will already be quite long. And you don't want to think about four levels. ;-)
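For this particular task a full parser is more than you need, though; a counter that tracks the nesting depth is enough. A minimal sketch (strip_brackets is a made-up name, and unbalanced closing brackets are simply dropped here, which may or may not be what you want):

```python
def strip_brackets(text):
    """Remove everything inside round brackets, at any nesting depth."""
    depth = 0
    kept = []
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            if depth:
                depth -= 1
        elif depth == 0:
            # Only keep characters that are outside every bracket pair.
            kept.append(ch)
    return ''.join(kept)

print(strip_brackets('AX(p>q)&E((-p)Ur)'))
```

It does walk the string once, which the question hoped to avoid, but it copes with any level of nesting.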
You can use PyParsing to parse the string:
from pyparsing import nestedExpr
import sys
s = "AX(p>q)&E((-p)Ur)"
expr = nestedExpr('(', ')')
result = expr.parseString('(' + s + ')').asList()[0]
s = ''.join(filter(lambda x: isinstance(x, str), result))
print(s)
Most code is from: How can a recursive regexp be implemented in python?
You could use re.subn():
import re
s = 'AX(p>q)&E((-p)Ur)'
while True:
    s, n = re.subn(r'\([^)(]*\)', '', s)
    if n == 0:
        break
print(s)
Output
AX&E
this is just how you do it:
# strings
# double and single quotes use in Python
"hey there! welcome to CIP"
'hey there! welcome to CIP'
"you'll understand python"
'i said, "python is awesome!"'
'i can\'t live without python'
# use of 'r' before string
print(r"\new code", "\n")
first = "code in"
last = "python"
first + last #concatenation
# slicing of strings
user = "code in python!"
print(user)
print(user[5]) # print an element
print(user[-3]) # print an element from rear end
print(user[2:6]) # slicing the string
print(user[:6])
print(user[2:])
print(len(user)) # length of the string
print(user.upper()) # convert to uppercase
print(user.lstrip())
print(user.rstrip())
print(max(user)) # largest character in the user string
print(min(user)) # smallest character in the user string
print(user.join(['1', '2', '3', '4'])) # join needs strings, not integers
input()
