How to replace multiple matches in Regex


I'm trying to replace '=' with '==' in the following string:
log="[x] = '1' and [y] <> '7' or [z]='51'".
Unfortunately, only the second '=' is getting replaced. Why is the first one not being replaced and how do I replace the first one as well?
import re

def subs_equal_sign(logic):
    y = re.compile(r'\]\s?\=\s?')
    iterator = y.finditer(logic)
    for match in iterator:
        j = str(match.group())
    return logic.replace(j, ']==')
The output should be:
log="[x] == '1' and [y] <> '7' or [z]=='51'".
This is what I get instead:
log="[x] = '1' and [y] <> '7' or [z]=='51'".

for match in iterator:
    j = str(match.group())
return logic.replace(j, ']==')
This part goes through the matches and doesn't do any replacing.
Only after you leave the loop do you replace anything, and by then j holds only the last match. That's why only the last one changes. ;)
Also, the replacement itself doesn't use the regex: a plain str.replace replaces every occurrence of the substring. So if your first = didn't have a space before it, it would have been changed too!
Looking at your regex, at most one space is possible between ] and =, so why not just handle those two cases directly instead of using a regex? ;)
def subs_equal_sign(logic):
    return logic.replace(']=', ']==').replace('] =', ']==')
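To see what the loop in the question is actually left with, here is a small check using the same pattern and string from the question (the matches variable is only for illustration):

```python
import re

log = "[x] = '1' and [y] <> '7' or [z]='51'"
y = re.compile(r'\]\s?\=\s?')

# Every match the loop in the question walks over
matches = [m.group() for m in y.finditer(log)]
print(matches)  # ['] = ', ']=']

# After the loop, j holds only the last match, so only ']=' gets replaced
j = matches[-1]
print(log.replace(j, ']=='))  # [x] = '1' and [y] <> '7' or [z]=='51'
```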

Maybe the str.replace() method is what you are looking for:
log="[x] = '1' and [y] <> '7' or [z]='51'"
log = log.replace("=", "==")
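Be aware that str.replace doubles every '=', not only the equality comparisons. For this exact string that is harmless, but a quick check with two made-up inputs shows what would happen to an operator such as <= or to an existing ==:

```python
log = "[x] = '1' and [y] <> '7' or [z]='51'"
print(log.replace("=", "=="))  # [x] == '1' and [y] <> '7' or [z]=='51'

# Hypothetical inputs containing other operators
print("[a] <= '3'".replace("=", "=="))  # [a] <== '3'
print("[b] == '4'".replace("=", "=="))  # [b] ==== '4'
```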

Change your function to
import re

def subs_equal_sign(logic):
    y = re.compile(r'\]\s?\=\s?')
    return y.sub("]==", logic)
and the output will now be
>>> subs_equal_sign('''log="[x] = '1' and [y] <> '7' or [z]='51'".''')
'log="[x]==\'1\' and [y] <> \'7\' or [z]==\'51\'".'
as expected.
@h4z3 correctly pointed out that your key problem is that you iterate through the matches without doing anything with them. You can make it work by simply using re.sub(), which replaces all occurrences at once.
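If you also want to keep the original spacing, so the result matches the expected output in the question exactly, one option (a sketch, not part of the answer above) is to capture the ] together with its optional space and write it back with a backreference:

```python
import re

def subs_equal_sign(logic):
    # Group 1 is ']' plus an optional space; put it back in front of '=='
    # so the spacing around the comparison is preserved
    return re.sub(r'(\]\s?)=', r'\1==', logic)

log = "[x] = '1' and [y] <> '7' or [z]='51'"
print(subs_equal_sign(log))  # [x] == '1' and [y] <> '7' or [z]=='51'
```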

A quick way to deal with this is to remove the whitespace first:
import re

def subs_equal_sign(logic):
    # Strings are immutable, so rebind logic to the copy without spaces
    # (note that this also removes the spaces around 'and'/'or')
    logic = logic.replace(' ', '')
    y = re.compile(r'\]\s?\=\s?')
    iterator = y.finditer(logic)
    for match in iterator:
        j = str(match.group())
    # with the spaces gone every match is ']=', so one replace covers them all
    return logic.replace(j, ']==')

Does the string represent the branching logic for a REDCap variable? If so, I wrote a function a while back that should convert REDCap's SQL-like syntax to Python syntax. Here it is:
import re

def make_pythonic(logic):
    """
    Takes the branching logic string of a field name
    and converts the syntax to that of Python.
    """
    # raw strings avoid invalid-escape warnings for \( and \)
    checkbox_pattern = r'[a-z0-9_]*\([0-9]*\)'
    # make list of all checkbox vars in branching_logic string
    # NOTE: items in list have the same serialization (ordering)
    # as in the string.
    checkbox_snoop = re.findall(checkbox_pattern, logic)
    # serially replace "[mycheckboxvar(888)]" syntax of each
    # checkbox var in the logic string with the appropriate
    # "record['mycheckboxvar___888']" syntax
    for item in checkbox_snoop:
        item = re.sub(r'\)', '', item)
        item = re.sub(r'\(', '___', item)
        # count=1 rewrites only the next occurrence; without it the first
        # item would overwrite every checkbox var in a single pass
        logic = re.sub(checkbox_pattern, item, logic, count=1)
    # mask <= and >= so their '=' is not doubled, then restore them
    logic = re.sub('<=', 'Z11Z', logic)
    logic = re.sub('>=', 'X11X', logic)
    logic = re.sub('=', '==', logic)
    logic = re.sub('Z11Z', '<=', logic)
    logic = re.sub('X11X', '>=', logic)
    logic = re.sub('<>', '!=', logic)
    logic = re.sub(r'\[', "record['", logic)
    logic = re.sub(r'\]', "']", logic)
    # return the string
    return logic
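The Z11Z/X11X masking works, but a lookaround can express "a lone =" directly. The sketch below is a swapped-in technique, not part of the function above; double_equals and LONE_EQUALS are hypothetical names, and like the masking approach it assumes no = appears inside quoted values:

```python
import re

# A single '=' that is not part of '<=', '>=', '!=' or '=='
LONE_EQUALS = re.compile(r'(?<![<>!=])=(?!=)')

def double_equals(logic):
    # Turn each lone '=' into '==' without masking the other operators first
    return LONE_EQUALS.sub('==', logic)

print(double_equals("[x] = '1' and [y] <= '7' or [z] >= '2'"))
# [x] == '1' and [y] <= '7' or [z] >= '2'
```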

This replaces every occurrence of the given substring with the new string, throughout the entire string:
log = log.replace("=", "==")  # replace each "=" with "=="
print(log)  # display the result

Related

How can we remove word with repeated single character?

I am trying to collapse words made of a single repeated character using regex in Python, for example:
good => good
gggggggg => g
What I have tried so far is the following:
re.sub(r'([a-z])\1+', r'\1', 'ffffffbbbbbbbqqq')
The problem with the above solution is that it also changes good to god, and I only want to collapse words made of a single repeated character.

A better approach here is to use a set:
def modify(s):
    # Create a set from the string
    c = set(s)
    # If there is only one character in the set, convert the set to a string
    if len(c) == 1:
        return ''.join(c)
    # Else return the original string
    else:
        return s
print(modify('good'))
print(modify('gggggggg'))
If you want to use a regex, mark the start and end of the string with ^ and $ (inspired by @bobblebubble's comment):
import re
def modify(s):
    # Substitute only if the whole string is one repeated character;
    # ^ and $ anchor the match to the start and end of the string
    out = re.sub(r'^([a-z])\1+$', r'\1', s)
    return out
print(modify('good'))
print(modify('gggggggg'))
The output will be
good
g
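The ^...$ anchors work when s is a single word. If you are processing a whole sentence instead, word boundaries (\b) can play the same role. A minimal sketch; collapse_repeated_words is a hypothetical name, and like the answer it only considers lowercase a-z:

```python
import re

def collapse_repeated_words(text):
    # \b...\b restricts the match to whole words made of one repeated letter
    return re.sub(r'\b([a-z])\1+\b', r'\1', text)

print(collapse_repeated_words('good gggggggg day'))  # good g day
```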

If you do not want to use a set in your method, this should do the trick:
def simplify(s):
    l = len(s)
    if l > 1 and s.count(s[0]) == l:
        return s[0]
    return s
print(simplify('good'))
print(simplify('abba'))
print(simplify('ggggg'))
print(simplify('g'))
print(simplify(''))
Output (the last call prints an empty line):
good
abba
g
g
Explanation:
Compute the length of the string.
Count the characters equal to the first one and compare that count with the string's length.
Depending on the result, return the first character or the whole string.
The l > 1 check short-circuits first, so s[0] is never evaluated for an empty string.

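You can use the Trim method (C#):
Take a look at this example:
"ggggggg".Trim('g');
(Note that Trim removes every leading and trailing 'g', so this particular call returns an empty string.)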
Update:
For characters in the middle of the string, use this function, thanks to this answer.
In C#:
using System.Linq;

public static string RemoveDuplicates(string input)
{
    return new string(input.ToCharArray().Distinct().ToArray());
}
In Python:
used = set()
unique = [x for x in mylist if x not in used and (used.add(x) or True)]
But I think none of these answers handles a case like aaaaabbbbbcda: this string has an a at the end that does not appear in the result (abcd). For this kind of situation, use this function, which I wrote:
In:
def unique(s):
    used = set()
    ret = list()
    s = list(s)
    for x in s:
        if x not in used:
            ret.append(x)
            used = set()
            used.add(x)
    return ret
print(unique('aaaaabbbbbcda'))
out:
['a', 'b', 'c', 'd', 'a']
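A different technique, not the one in the answer above: itertools.groupby from the standard library groups consecutive equal characters, so keeping one key per group collapses each run. collapse_runs is a hypothetical name, and this sketch returns a string rather than a list:

```python
from itertools import groupby

def collapse_runs(s):
    # groupby yields one (key, group) pair per run of equal consecutive characters
    return ''.join(key for key, _ in groupby(s))

print(collapse_runs('aaaaabbbbbcda'))  # abcda
```

Wrap the call in list() if you need the ['a', 'b', 'c', 'd', 'a'] form.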

What do these single quotes do at the beginning of line 2?

I found the following code on some random website explaining concatenation:
data_numb = input("Input Data, then press enter: ")
numb = ''.join(list(filter(str.isdigit, data_numb)))
print('(' + numb[:3] + ') ' + numb[3:6] + '-' + numb[6:])
and I was wondering what the single quotes do in the
numb = ''.join(
Any help is appreciated!

join(iterable) is a method from the str class.
Return a string which is the concatenation of the strings in iterable.
A TypeError will be raised if there are any non-string values in
iterable, including bytes objects. The separator between elements is
the string providing this method.
''.join(("Hello", "World")) will return 'HelloWorld'.
';'.join(("Hello", "World", "how", "are", "you")) will return 'Hello;World;how;are;you'.
join is very helpful when you need to put a delimiter between the elements of a list (or any iterable) of strings.
It may not look like much, but without join this kind of operation is awkward to implement because of edge cases (no delimiter after the last element, empty input):
For a list or tuple of strings:
def join(list_strings, delimiter):
    str_result = ''
    for e in list_strings[:-1]:
        str_result += e + delimiter
    if list_strings:
        str_result += list_strings[-1]
    return str_result
For any iterable:
def join(iterable, delimiter):
    iterator = iter(iterable)
    str_result = ''
    try:
        str_result += next(iterator)
        while True:
            str_result += delimiter + next(iterator)
    except StopIteration:
        return str_result
Because join works on any iterable, you don't need to create a list from the filter result.
numb = ''.join(filter(str.isdigit, data_numb))
works as well.
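Putting the pieces together, the snippet from the question can be wrapped in a function that passes the filter object straight to join. format_phone is a hypothetical wrapper, and like the original it assumes the input contains a ten-digit number:

```python
def format_phone(data_numb):
    # join accepts any iterable of strings, so the filter object needs no list()
    numb = ''.join(filter(str.isdigit, data_numb))
    return '(' + numb[:3] + ') ' + numb[3:6] + '-' + numb[6:]

print(format_phone('555.123.4567'))  # (555) 123-4567
```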

join is a string method that concatenates the elements of an iterable, placing the string it is called on between them. In this example, that string is the empty string, written as two single quotes, '' (don't confuse this with a single double-quote character).
Because the separator is empty, the result is simply the elements of the iterable concatenated with nothing in between.
What it is used for:
It can be used to concatenate a list of strings. For example:
a = ['foo', 'bar']
b = ''.join(a)
print(b) # foobar
It can also be applied to a string, since a string is itself an iterable of characters (here it just rebuilds the same string):
a = "foobar"
b = ''.join(a)
print(b) # foobar
There are more use cases, but this is the gist of it. You can also refer to the Python documentation for str.join.

Print the first, second occurred character in a list

I'm working on a simple algorithm that prints the first character that occurs twice or more.
For example:
string ='abcabc'
output = a
string = 'abccba'
output = c
string = 'abba'
output = b
what I have done is:
string = 'abcabc'
s = []
for x in string:
    if x in s:
        print(x)
        break
    else:
        s.append(x)
output: a
But its time complexity is O(n^2). How can I do this in O(n)?

Change s = [] to s = set() (and, accordingly, the append to add). A membership test (in) on a set is O(1) on average, unlike in on a list, which scans the elements sequentially.
Alternatively, with regular expressions (O(n^2), but rather fast and easy):
import re
match = re.search(r'(.).*\1', string)
if match:
    print(match.group(1))
The regular expression (.).*\1 means "any character, which we remember for later, then any number of intervening characters, then the remembered character again". Because the regex is scanned left to right, it finds the leftmost character that appears again later, so it returns a for "abba" (and for "abccba") rather than b (or c). That differs from the output the question asks for; the set approach gives the required answer.
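A side-by-side check of the set approach and the regex on the three examples from the question makes the difference visible. first_repeat and first_repeat_regex are hypothetical names used only for this comparison:

```python
import re

def first_repeat(string):
    # First character whose second occurrence comes earliest, or None
    seen = set()
    for x in string:
        if x in seen:
            return x
        seen.add(x)
    return None

def first_repeat_regex(string):
    # Leftmost character that appears again somewhere later, or None
    match = re.search(r'(.).*\1', string)
    return match.group(1) if match else None

for s in ('abcabc', 'abccba', 'abba'):
    print(s, first_repeat(s), first_repeat_regex(s))
# abcabc a a
# abccba c a
# abba b a
```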

Use dictionaries:
string = 'abcabc'
s = {}
for x in string:
    if x in s:
        print(x)
        break
    else:
        s[x] = 0
or use sets:
string = 'abcabc'
s = set()
for x in string:
    if x in s:
        print(x)
        break
    else:
        s.add(x)
Membership tests on both dicts and sets are O(1) on average, because both are hash tables.

How can you group a very specific pattern with regex?

Problem:
https://coderbyte.com/editor/Simple%20Symbols
The str parameter will be composed of + and = symbols with
several letters between them (i.e. ++d+===+c++==a) and for the string
to be true each letter must be surrounded by a + symbol. So the string
to the left would be false. The string will not be empty and will have
at least one letter.
Input:"+d+=3=+s+"
Output:"true"
Input:"f++d+"
Output:"false"
I'm trying to create a regular expression for the following problem, but I keep running into various problems. How can I produce something that matches the specified rule ('+\D+')?
import re
plusReg = re.compile(r'[(+A-Za-z+)]')
plusReg.findall()
>>> []
Here I thought I could create my own character class that searches for the pattern.
import re
plusReg = re.compile(r'([\\+,\D,\\+])')
plusReg.findall('adf+a+=4=+S+')
>>> ['a', 'd', 'f', '+', 'a', '+', '=', '=', '+', 'S', '+']
Here I thought the '\\+' would single out the plus symbol and read it as a char.
mo = plusReg.search('adf+a+=4=+S+')
mo.group()
>>>'a'
Here, in the same shell, I tried search instead of findall, but I just ended up with the first letter, which isn't even surrounded by a plus.
My end result is to group the string 'adf+a+=4=+S+' into ['+a+','+S+'] and so on.
Edit:
Solution:
import re
def SimpleSymbols(s):
    # added padding, because if s = 'y+4==+r+'
    # then the program would return true when it should return false.
    string = '=' + s + '='
    # regex that matches if a letter *doesn't* have a + in front or behind
    plusReg = re.compile(r'[^\+][A-Za-z].|.[A-Za-z][^\+]')
    # return "true" if the regex doesn't find any letters
    # without a + in front or behind
    if plusReg.search(string) is None:
        return "true"
    return "false"
print(SimpleSymbols(input()))
I borrowed some code from ekhumoro and Sanjay. Thanks

One approach is to search the string for any letters that are either (1) not preceded by a +, or (2) not followed by a +. This can be done using lookahead and lookbehind assertions:
>>> rgx = re.compile(r'(?<!\+)[a-zA-Z]|[a-zA-Z](?!\+)')
So if rgx.search(string) returns None, the string is valid:
>>> rgx.search('+a+') is None
True
>>> rgx.search('+a+b+') is None
True
but if it returns a match, the string is invalid:
>>> rgx.search('+ab+') is None
False
>>> rgx.search('+a=b+') is None
False
>>> rgx.search('a') is None
False
>>> rgx.search('+a') is None
False
>>> rgx.search('a+') is None
False
The important thing about look ahead/behind assertions is that they don't consume characters, so they can handle overlapping matches.
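Wrapped into a function that returns the "true"/"false" strings the Coderbyte task expects, the lookaround pattern could look like this. simple_symbols and BAD_LETTER are hypothetical names; the last test string is a made-up case where two letters share a +:

```python
import re

# A letter not preceded by '+' or not followed by '+' breaks the rule
BAD_LETTER = re.compile(r'(?<!\+)[a-zA-Z]|[a-zA-Z](?!\+)')

def simple_symbols(s):
    # The task expects the strings "true"/"false", not booleans
    return "false" if BAD_LETTER.search(s) else "true"

for s in ("+d+=3=+s+", "f++d+", "++d+===+c++==a", "+a+b+"):
    print(s, simple_symbols(s))
# +d+=3=+s+ true
# f++d+ false
# ++d+===+c++==a false
# +a+b+ true
```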

Something like this should do the trick:
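import re
def is_valid_str(s):
    # lookarounds check the '+' signs without consuming them, so letters
    # that share a '+' (as in '+a+b+') are all found
    return re.findall(r'[a-zA-Z]', s) == re.findall(r'(?<=\+)([a-zA-Z])(?=\+)', s)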
Usage:
In [10]: is_valid_str("f++d+")
Out[10]: False
In [11]: is_valid_str("+d+=3=+s+")
Out[11]: True

I think you are on the right track, but the regular expression can be simplified to just a letter between two + signs:
search_pattern = re.compile(r'\+[a-zA-Z]\+')
This matches both upper- and lower-case letters. Now we can use this regex with the findall function:
results = re.findall(search_pattern, 'adf+a+=4=+S+') # returns ['+a+', '+S+']
The task asks for a boolean indicating whether the string follows the specified pattern, so we can wrap this all up into a function:
import re
def is_valid_pattern(pattern_string):
    # lookarounds check the '+' signs without consuming them, so letters
    # that share a '+' (as in '+a+b+') are all counted
    search_pattern = re.compile(r'(?<=\+)[a-zA-Z](?=\+)')
    letter_pattern = re.compile(r'[a-zA-Z]') # to search for all letters
    results = re.findall(search_pattern, pattern_string)
    letters = re.findall(letter_pattern, pattern_string)
    # if the length of the list of all the letters equals the length of all
    # the values found with the pattern, we can say that it is a valid string
    return len(results) == len(letters)

You should be looking for what isn't there, as opposed to what is. Search for something like ([^\+][A-Za-z]|[A-Za-z][^\+]). The | in the middle is a logical OR. On either side, it checks whether there is a letter without a "+" on its left or right, respectively. If it finds something, the string fails. If it can't find anything, there are no letters that aren't surrounded by "+" signs. Pad the string with a non-letter, non-"+" character such as "=" on both ends first, as in the solution in the question, so that letters at the very start or end are also caught.

Search for a character within a Python string and then modify subsequent character

I need to be able to find an asterisk in a Python string, and then run the upper method on the subsequent character in that string. So suppose I have:
s*tring
I need to turn it into:
sTring

Iterating over the characters of the string and replacing them is possible, but pattern matching is more concise. You can do this with the re module's sub() method: define a lambda function that carries out the conversion, then match and replace using re.sub() with that function. Hope this helps!
import re
txt = "s*tring"
callback = lambda pat: pat.group(0).replace("*", "").upper()
txt = re.sub(r"\*[a-z]", callback, txt)
print(txt)
Result:
>> python main.py
sTring
This works for sentences as well, e.g. setting txt = "My *name is *c.*swadhikar".
Result:
>> python main.py
My Name is C.Swadhikar
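A variation on the same idea, assuming you want to capitalize whatever character follows the asterisk rather than only a-z: capture that character in a group and return it upper-cased, which avoids the extra replace call. upper_after_asterisk is a hypothetical name:

```python
import re

def upper_after_asterisk(txt):
    # Group 1 captures the character after '*'; the asterisk itself is dropped
    return re.sub(r'\*(.)', lambda m: m.group(1).upper(), txt)

print(upper_after_asterisk("s*tring"))                    # sTring
print(upper_after_asterisk("My *name is *c.*swadhikar"))  # My Name is C.Swadhikar
```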

You can do it a brute-force way like this:
def str_upper(string):
    my_list = list(string)
    indexes = []
    # iterate over a copy: removing items from a list while looping over it skips elements
    for char in list(my_list):
        if char == "*":
            indexes.append(my_list.index(char))
            my_list.remove(char)
    result = []
    for i, letter in enumerate(my_list):
        if i in indexes:
            result.append(letter.upper())
        else:
            result.append(letter)
    return "".join(result)
Output:
>>> str_upper("s*tring")
'sTring'
>>> str_upper("s*tri*ng")
'sTriNg'
This is also a much shorter Pythonic solution:
def str_upper2(string):
    # keep the text before the first '*' as-is; capitalize the start of each later piece
    first, *rest = string.split('*')
    return first + "".join(s[:1].upper() + s[1:] for s in rest)
