How to match beginning of string or character in Python - python

I have a string consisting of parameter number _ parameter number:
dir = 'a1.8000_b1.0000_cc1.3000_al0.209_be0.209_c1.344_e0.999'
I need to get the number behind a parameter chosen, i.e.
par='be' -->need 0.209
par='e' -->need 0.999
I tried:
num1 = float(re.findall(par + '(\d+\.\d*)', dir)[0])
but for par='e' this will match 0.209 and 0.999, so I tried to match the parameter together with the beginning of the string or an underscore:
num1 = float(re.findall('[^_]'+par+'(\d+\.\d*)', dir)[0])
which didn't work for some reason.
Any suggestions? Thank you!

Your [^_] pattern matches any character that is not the underscore.
Use a (..|..) or grouping instead:
float(re.findall('(?:^|_)' + par + r'(\d+\.\d*)', dir)[0])
I used a (?:..) non-capturing group there so that it doesn't interfere with your original group indices.
Demo:
>>> import re
>>> dir = 'a1.8000_b1.0000_cc1.3000_al0.209_be0.209_c1.344_e0.999'
>>> par = 'e'
>>> re.findall('(?:^|_)' + par + r'(\d+\.\d*)', dir)
['0.999']
>>> par = 'a'
>>> re.findall('(?:^|_)' + par + r'(\d+\.\d*)', dir)
['1.8000']
To elaborate, when using a character group ([..]) and you start that group with the caret (^) you invert the character group, turning it from matching the listed characters to matching everything else instead:
>>> re.findall('[a]', 'abcd')
['a']
>>> re.findall('[^a]', 'abcd')
['b', 'c', 'd']
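For reuse, the pattern above could be wrapped in a small helper. This is only a sketch: the name get_param is made up for illustration, re.escape is added on the assumption that a parameter name might contain regex metacharacters, and the string is called path here so it doesn't shadow the built-in dir.

```python
import re

def get_param(path, par):
    """Return the number that follows parameter `par`, or None if absent."""
    # (?:^|_) anchors the name at the start of the string or right after '_'.
    pattern = r'(?:^|_)' + re.escape(par) + r'(\d+\.\d*)'
    match = re.search(pattern, path)
    return float(match.group(1)) if match else None

path = 'a1.8000_b1.0000_cc1.3000_al0.209_be0.209_c1.344_e0.999'
print(get_param(path, 'be'))  # 0.209
print(get_param(path, 'e'))   # 0.999
```

Returning None instead of raising IndexError is a design choice for the sketch; the original one-liner fails loudly when the parameter is missing.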

without regex solution:
def func(par,strs):
    ind=strs.index('_'+par)+1+len(par)
    ind1=strs.find('_',ind) if strs.find('_',ind)!=-1 else len(strs)
    return strs[ind:ind1]
output:
>>> func('be',dir)
'0.209'
>>> func('e',dir)
'0.999'
>>> func('cc',dir)
'1.3000'

A solution without regex:
>>> def get_value(dir, parm):
...     return map(float, [t[len(parm):] for t in dir.split('_') if t.startswith(parm)])
...
>>> get_value('a1.8000_b1.0000_cc1.3000_al0.209_be0.209_c1.344_e0.999', "be")
[0.20899999999999999]
If there are multiple occurrences of the parameter in the string, all of them are evaluated.
And a version without casting to a float:
return [t[len(parm):] for t in dir.split('_') if t.startswith(parm)]

(?P<param>[a-zA-Z]*)(?P<version>[^_]*)
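The named-group pattern above can be run with re.finditer to collect every parameter in one pass. A rough sketch, with two assumptions of mine: the name group uses + instead of * so that empty matches are skipped, and the names PAIR and params are purely illustrative.

```python
import re

# Letters form the parameter name; everything up to the next '_' is its value.
PAIR = re.compile(r'(?P<param>[a-zA-Z]+)(?P<version>[^_]*)')

path = 'a1.8000_b1.0000_cc1.3000_al0.209_be0.209_c1.344_e0.999'
params = {m.group('param'): float(m.group('version'))
          for m in PAIR.finditer(path)}

print(params['be'])  # 0.209
print(params['e'])   # 0.999
```

Building the dictionary once makes later lookups exact, so 'e' can no longer be confused with 'be'.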

Related

How can you group a very specific pattern with regex?

Problem:
https://coderbyte.com/editor/Simple%20Symbols
The str parameter will be composed of + and = symbols with
several letters between them (ie. ++d+===+c++==a) and for the string
to be true each letter must be surrounded by a + symbol. So the string
to the left would be false. The string will not be empty and will have
at least one letter.
Input:"+d+=3=+s+"
Output:"true"
Input:"f++d+"
Output:"false"
I'm trying to create a regular expression for the following problem, but I keep running into various problems. How can I produce something that returns the specified rules('+\D+')?
import re
plusReg = re.compile(r'[(+A-Za-z+)]')
plusReg.findall()
>>> []
Here I thought I could create my own class that searches for the pattern.
import re
plusReg = re.compile(r'([\\+,\D,\\+])')
plusReg.findall('adf+a+=4=+S+')
>>> ['a', 'd', 'f', '+', 'a', '+', '=', '=', '+', 'S', '+']
Here I thought the '\\+' would single out the plus symbol and read it as a char.
mo = plusReg.search('adf+a+=4=+S+')
mo.group()
>>>'a'
Here using the same shell, I tried using the search instead of findall, but I just ended up with the first letter which isn't even surrounded by a plus.
My end result is to group the string 'adf+a+=4=+S+' into ['+a+','+S+'] and so on.
edit:
Solution:
import re
def SimpleSymbols(str):
    #added padding, because if str = 'y+4==+r+'
    #then program would return true when it should return false.
    string = '=' + str + '='
    #regex that returns false if a letter *doesn't* have a + in front or back
    plusReg = re.compile(r'[^\+][A-Za-z].|.[A-Za-z][^\+]')
    #if statement that returns "true" if regex doesn't find any letters
    #without a + behind or in front
    if plusReg.search(string) is None:
        return "true"
    return "false"
print SimpleSymbols(raw_input())
I borrowed some code from ekhumoro and Sanjay. Thanks
One approach is to search the string for any letters that are either: (1) not preceded by a +, or (2) not followed by a +. This can be done using look ahead and look behind assertions:
>>> rgx = re.compile(r'(?<!\+)[a-zA-Z]|[a-zA-Z](?!\+)')
So if rgx.search(string) returns None, the string is valid:
>>> rgx.search('+a+') is None
True
>>> rgx.search('+a+b+') is None
True
but if it returns a match, the string is invalid:
>>> rgx.search('+ab+') is None
False
>>> rgx.search('+a=b+') is None
False
>>> rgx.search('a') is None
False
>>> rgx.search('+a') is None
False
>>> rgx.search('a+') is None
False
The important thing about look ahead/behind assertions is that they don't consume characters, so they can handle overlapping matches.
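Putting the lookaround regex into a function that returns the "true"/"false" strings the challenge asks for might look like this. It is a sketch only; the names BAD_LETTER and simple_symbols are mine, not from the challenge.

```python
import re

# Matches a letter that lacks a '+' before it, or a letter that lacks a '+' after it.
BAD_LETTER = re.compile(r'(?<!\+)[a-zA-Z]|[a-zA-Z](?!\+)')

def simple_symbols(s):
    """Return "true" if every letter in s has a '+' on both sides."""
    return "false" if BAD_LETTER.search(s) else "true"

print(simple_symbols('+d+=3=+s+'))  # true
print(simple_symbols('f++d+'))      # false
```

Because the assertions also fail at the string boundaries, no padding of the input is needed here.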
Something like this should do the trick:
import re
def is_valid_str(s):
    # the lookarounds let adjacent letters such as '+a+b+' share a '+'
    return re.findall(r'[a-zA-Z]', s) == re.findall(r'(?<=\+)[a-zA-Z](?=\+)', s)
Usage:
In [10]: is_valid_str("f++d+")
Out[10]: False
In [11]: is_valid_str("+d+=3=+s+")
Out[11]: True
I think you are on the right track. The regular expression you have is correct, but it can simplify down to just letters:
search_pattern = re.compile(r'\+[a-zA-Z]\+')
for upper and lower case strings. Now we can use this regex with the findall function:
results = re.findall(search_pattern, 'adf+a+=4=+S+') # returns ['+a+', '+S+']
Now the question needs you to return a boolean depending on if the string is valid to the specified pattern so we can wrap this all up into a function:
def is_valid_pattern(pattern_string):
    # the lookahead leaves the trailing + available for the next letter
    search_pattern = re.compile(r'\+[a-zA-Z](?=\+)')
    letter_pattern = re.compile(r'[a-zA-Z]') # to search for all letters
    results = re.findall(search_pattern, pattern_string)
    letters = re.findall(letter_pattern, pattern_string)
    # if the length of the list of all the letters equals the length of all
    # the values found with the pattern, we can say that it is a valid string
    return len(results) == len(letters)
You should be looking for what isn't there, as opposed to what is. You should search for something like ([^\+][A-Za-z]|[A-Za-z][^\+]). The | in the middle is a logical or operator. On either side, it checks whether it can find a letter without a "+" on the left or right respectively. If it finds something, the string fails. If it can't find anything, there are no letters that are not surrounded by "+" signs.

how to remove whitespace inside bracket?

I have the following string:
res = '(321, 3)-(m-5, 5) -(31,1)'
I want to remove the whitespace within the brackets, but I don't know anything about regular expressions.
I tried this, but it doesn't work:
import re
res = re.sub(r'\(.*\s+\)', '', res)
You can substitute a non-greedy wildcard match for characters in parentheses with a function that splits the match on whitespace and rejoins it.
>>> import re
>>> res = '(321, 3)-(m-5, 5) -(31,1)'
>>> re.sub(r'\(.*?\)', lambda x: ''.join(x.group(0).split()), res)
'(321,3)-(m-5,5) -(31,1)'
You could convert the string into a list, go through each letter and count if you are within brackets or not. In toRemove, you collect the positions of whitespaces, which you then remove from the list. Then you convert the list back to a string ...
res = '(321, 3)-(m-5, 5) -(31,1)'
r = list(res)
insideBracket = 0
toRemove = []
for pos,letter in enumerate(r):
    if letter == '(':
        insideBracket += 1
    elif letter == ')':
        insideBracket -= 1
    if insideBracket > 0:
        if letter == ' ':
            toRemove.append(pos)
for t in toRemove[::-1]:
    r.pop(t)
result = ''.join(r)
print(result)
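The same counting idea can be packaged as a function that builds the result directly instead of popping positions afterwards. A sketch, assuming only plain spaces should be removed and that a stray closing bracket should be tolerated; the function name is made up.

```python
def strip_spaces_in_parens(text):
    """Remove spaces that occur inside (possibly nested) parentheses."""
    depth = 0   # how many '(' are currently open
    kept = []
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth = max(depth - 1, 0)  # tolerate a stray ')'
        if ch == ' ' and depth > 0:
            continue  # drop spaces only while inside parentheses
        kept.append(ch)
    return ''.join(kept)

print(strip_spaces_in_parens('(321, 3)-(m-5, 5) -(31,1)'))
```

Unlike a single non-greedy regex, the counter also copes with nested brackets such as '(a, (b, c) d)'.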
I think regular expressions aren't quite powerful enough to do what you want here; you want to remove all whitespace that's found in between parenthesis characters. The trouble is, solving this for the general case means you're doing a context-sensitive match on the string, and regular expressions are mostly context-insensitive, and so can't do your job. There are lookaheads and lookbehinds that can restrict matches to particular contexts, but they won't solve your problem in the general case either:
The contained pattern must only match strings of some fixed length, meaning that abc or a|b are allowed, but a* and a{3,4} are not. Group references are not supported even if they match strings of some fixed length.
Because of this, I would match the parenthesis groups first:
>>> re.split(r'(\([^)]*\))', res)
['', '(321, 3)', '-', '(m-5, 5)', ' -', '(31,1)', '']
and then remove whitespace from them in a second step before joining everything back up into a single string:
>>> g = re.split(r'(\([^)]*\))', res)
>>> g[1::2] = [re.sub(r'\s*', '', x) for x in g[1::2]]
>>> ''.join(g)
'(321,3)-(m-5,5) -(31,1)'

python capitalize first letter only

I am aware .capitalize() capitalizes the first letter of a string, but what if the first character is a digit?
this
1bob
5sandy
to this
1Bob
5Sandy
Only because no one else has mentioned it:
>>> 'bob'.title()
'Bob'
>>> 'sandy'.title()
'Sandy'
>>> '1bob'.title()
'1Bob'
>>> '1sandy'.title()
'1Sandy'
However, this would also give
>>> '1bob sandy'.title()
'1Bob Sandy'
>>> '1JoeBob'.title()
'1Joebob'
i.e. it doesn't just capitalize the first alphabetic character. But then .capitalize() has the same issue, at least in that 'joe Bob'.capitalize() == 'Joe bob', so meh.
If the first character is an integer, it will not capitalize the first letter.
>>> '2s'.capitalize()
'2s'
If you want that behaviour, skip past the leading digits first; str.isdigit() (for example '2'.isdigit()) tells you whether a given character is a digit.
>>> s = '123sa'
>>> for i, c in enumerate(s):
...     if not c.isdigit():
...         break
...
>>> s[:i] + s[i:].capitalize()
'123Sa'
This is similar to @Anon's answer in that it keeps the rest of the string's case intact, without the need for the re module.
def sliceindex(x):
    i = 0
    for c in x:
        if c.isalpha():
            i = i + 1
            return i
        i = i + 1
def upperfirst(x):
    i = sliceindex(x)
    return x[:i].upper() + x[i:]
x = '0thisIsCamelCase'
y = upperfirst(x)
print(y)
# 0ThisIsCamelCase
As @Xan pointed out, the function could use more error checking (such as checking that x is a sequence); I'm omitting edge cases to illustrate the technique.
Updated per @normanius's comment (thanks!)
Thanks to @GeoStoneMarten for pointing out that I didn't answer the question! Fixed that.
Here is a one-liner that will uppercase the first letter and leave the case of all subsequent letters:
import re
key = 'wordsWithOtherUppercaseLetters'
key = re.sub('([a-zA-Z])', lambda x: x.groups()[0].upper(), key, 1)
print key
This will result in WordsWithOtherUppercaseLetters
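For Python 3, the same substitution could be written as a small function with count passed by keyword. A sketch only; the function name is hypothetical, and it deliberately handles ASCII letters only, as the pattern above does.

```python
import re

def upper_first_letter(s):
    """Uppercase the first ASCII letter in s and leave everything else as is."""
    # count=1 stops after the first match, so later letters keep their case.
    return re.sub(r'[a-zA-Z]', lambda m: m.group().upper(), s, count=1)

print(upper_first_letter('1bob'))              # 1Bob
print(upper_first_letter('0thisIsCamelCase'))  # 0ThisIsCamelCase
```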
As shown in the answer by Chen Houwu, it's possible to use the string module:
import string
string.capwords("they're bill's friends from the UK")
>>>"They're Bill's Friends From The Uk"
a one-liner: ' '.join(sub[:1].upper() + sub[1:] for sub in text.split(' '))
You can replace the first letter (preceded by a digit) of each word using regex:
re.sub(r'(\d\w)', lambda w: w.group().upper(), '1bob 5sandy')
output:
1Bob 5Sandy
def solve(s):
    for i in s[:].split():
        s = s.replace(i, i.capitalize())
    return s
This is the code that actually works for me. .title() does not work in the '12name' case.
I came up with this:
import re
regex = re.compile("[A-Za-z]") # find a alpha
str = "1st str"
s = regex.search(str).group() # find the first alpha
str = str.replace(s, s.upper(), 1) # replace only 1 instance
print str
def solve(s):
    names = list(s.split(" "))
    return " ".join([i.capitalize() for i in names])
Takes an input such as a name: john doe
Returns each word with its first letter capitalized (if a word's first character is a number, no capitalization occurs).
Works for any name length.

Retrieving a full number

Assume I have a string as follows: expression = '123 + 321'.
I am walking over the string character-by-character as follows: for p in expression. I am checking if p is a digit using p.isdigit(). If p is a digit, I'd like to grab the whole number (so grab 123 and 321, not just p which initially would be 1).
How can I do that in Python?
In C (coming from a C background), the equivalent would be:
int x = 0;
sscanf(p, "%d", &x);
// the full number is now in x
EDIT:
Basically, I am accepting a mathematical expression from a user that accepts positive integers, +,-,*,/ as well as brackets: '(' and ')'. I am walking the string character by character and I need to be able to determine whether the character is a digit or not. Using isdigit(), I can do that. If it is a digit however, I need to grab the whole number. How can that be done?
>>> from itertools import groupby
>>> expression = '123 + 321'
>>> expression = ''.join(expression.split()) # strip whitespace
>>> for k, g in groupby(expression, str.isdigit):
        if k: # it's a digit
            print 'digit'
            print list(g)
        else:
            print 'non-digit'
            print list(g)
digit
['1', '2', '3']
non-digit
['+']
digit
['3', '2', '1']
This is one of those problems that can be approached from many different directions. Here's what I think is an elegant solution based on itertools.takewhile:
>>> from itertools import chain, takewhile
>>> def get_numbers(s):
...     s = iter(s)
...     for c in s:
...         if c.isdigit():
...             yield ''.join(chain(c, takewhile(str.isdigit, s)))
...
>>> list(get_numbers('123 + 456'))
['123', '456']
This even works inside a list comprehension:
>>> def get_numbers(s):
...     s = iter(s)
...     return [''.join(chain(c, takewhile(str.isdigit, s)))
...             for c in s if c.isdigit()]
...
>>> get_numbers('123 + 456')
['123', '456']
Looking over other answers, I see that this is not dissimilar to jamylak's groupby solution. I would recommend that if you don't want to discard the extra symbols. But if you do want to discard them, I think this is a bit simpler.
The Python documentation includes a section on simulating scanf, which gives you some idea of how you can use regular expressions to simulate the behavior of scanf (or sscanf, it's all the same in Python). In particular, r'\-?\d+' is the Python string that corresponds to the regular expression for an integer. (r'\d+' for a nonnegative integer.) So you could embed this in your loop as
integer = re.compile(r'\-?\d+')
for p in expression:
    if p.isdigit():
        # somehow find the current position in the string
        integer.match(expression, curpos)
But that still reflects a very C-like way of thinking. In Python, your iterator variable p is really just an individual character that has actually been pulled out of the original string and is standing on its own. So in the loop, you don't naturally have access to the current position within the string, and trying to calculate it is going to be less than optimal.
What I'd suggest instead is using Python's built in regexp matching iteration method:
integer = re.compile(r'\-?\d+') # only do this once in your program
all_the_numbers = integer.findall(expression)
and now all_the_numbers is a list of string representations of all the integers in the expression. If you wanted to actually convert them to integers, then you could do this instead of the last line:
all_the_numbers = [int(m.group()) for m in integer.finditer(expression)]
Here I've used finditer instead of findall because you don't have to make a list of all the strings before iterating over them again to convert them to integers.
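Since the question is really about walking an arithmetic expression, the finditer idea can be stretched into a tiny tokenizer that returns whole numbers and operators in order. This is a sketch under my own assumptions: the names TOKEN and tokenize are illustrative, only non-negative integers are recognised, and any character outside digits, + - * / ( ) is silently skipped.

```python
import re

# Either a run of digits or a single operator/bracket character.
TOKEN = re.compile(r'\d+|[-+*/()]')

def tokenize(expression):
    """Split an arithmetic expression into ints and operator strings."""
    tokens = []
    for m in TOKEN.finditer(expression):
        text = m.group()
        tokens.append(int(text) if text.isdigit() else text)
    return tokens

print(tokenize('123 + 321'))           # [123, '+', 321]
print(tokenize('(92831 * 948) / 32'))  # ['(', 92831, '*', 948, ')', '/', 32]
```

Treating '-' as a separate token rather than part of the number keeps '5-3' from being read as 5 followed by -3.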
Though I'm not familiar with sscanf, I'm no C developer, it looks like it's using format strings in a way not dissimilar to what I'd use python's re module for. Something like this:
import re
nums = re.compile('\d+')
found = nums.findall('123 + 321')
# if you know you're only looking for two values.
left, right = found
You can use shlex http://docs.python.org/library/shlex.html
>>> from shlex import shlex
>>> expression = '123 + 321'
>>> for e in shlex(expression):
...     print e
...
123
+
321
>>> expression = '(92831 * 948) / 32'
>>> for e in shlex(expression):
...     print e
...
(
92831
*
948
)
/
32
I'd split the string up on the ' + ' string, giving you what's outside of them:
>>> expression = '123 + 321'
>>> ex = expression.split(' + ')
>>> ex
['123', '321']
>>> int_ex = map(int, ex)
>>> int_ex
[123, 321]
>>> sum(int_ex)
444
It's dangerous, but you could use eval:
>>> eval('123 + 321')
444
I'm just taking a stab at you parsing the string, and doing raw calculations on it.
e_array = expression.split('+')
i_array = map(int, e_array)
And i_array holds all integers in the expression.
UPDATE
If you already know all the special characters in your expression and you want to eliminate them all
import re
e_array = re.split('[*/+\-() ]', expression) # the characters here are mult, div, plus, minus, left and right parenthesis, and space
i_array = map(int, filter(lambda x: len(x), e_array))

Regex for wrapping digits with curly braces?

I am trying to use Python's re.sub() to match a string with an e character and insert curly braces immediately after the e character and after the last digit. For example:
12.34e56 to 12.34e{56}
1e10 to 1e{10}
I can't seem to find the correct regex to insert the desired curly braces. For example, I can properly insert the left brace like this:
>>> import re
>>> x = '12.34e10'
>>> pattern = re.compile(r'(e)')
>>> sub = z = re.sub(pattern, "\1e{", x)
>>> print(sub)
12.34e{10 # this is the correct placement for the left brace
My problem arises when using two back references.
>>> import re
>>> x = '12.34e10'
>>> pattern = re.compile(r'(e).+($)')
>>> sub = z = re.sub(pattern, "\1e{\2}", x)
>>> print(sub)
12.34e{} # this is not what I want, digits 10 have been removed
Can anyone point out my problem? Thanks for the help.
re.sub(r'e(\d+)', r'e{\1}', '12.34e56')
returns '12.34e{56}'
or, the same result but different logic (don't replace e with e):
re.sub(r'(?<=e)(\d+)', r'{\1}', '12.34e56')
Your brace placement is incorrect.
Here's a solution ensuring that there's a number with an optional decimal part before the e:
import re
samples = ['12.34e56','1e10']
for s in samples:
    print re.sub(r'(\d+(?:\.\d+)?)e([0-9]+)', r"\g<1>e{\g<2>}", s)
Yields:
12.34e{56}
1e{10}
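A Python 3 sketch that combines the ideas above into one function. The function name is made up, and accepting an optional sign on the exponent is my assumption; the question only shows unsigned exponents.

```python
import re

def brace_exponent(text):
    """Wrap the exponent digits that follow '<digit>e' in curly braces."""
    # Group 1 is the digit before 'e'; group 2 is the (optionally signed) exponent.
    return re.sub(r'(\d)e([+-]?\d+)', r'\1e{\2}', text)

print(brace_exponent('12.34e56'))  # 12.34e{56}
print(brace_exponent('1e10'))      # 1e{10}
```

Requiring a digit before the e keeps ordinary words that contain "e" followed by digits, such as 'be2', untouched.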
