Regex to find a specific letter before a condition in Python

I just want to find all characters (other than A) which are followed by triple A, i.e., have AAA to the right. I don’t want to include the triple A in the output and just want the character immediately preceding AAA
result = []
s = 'ACAABAACAAABACDBADDDFSDDDFFSSSASDAFAAACBAAAFASD'
pattern = "r'(\w[BF])(?!AAA)'"
for item in re.finditer(pattern, s):
    result.append(item.group())
print(result)
I used the pattern r'(\w[BF])(?!AAA)' but it didn't work.
I just need to find the letters shown in []:
'ACAABAA[C]AAABACDBADDDFSDDDFFSSSASDA[F]AAAC[B]AAAFASD'

In your example, you want to match a single character to the left of triple A. Using \w[BF] matches at least 2 characters: 1 word character followed by either B or F.
The negative lookahead asserts that what is directly to the right is not triple A, but you want the opposite.
You can match a single B-Z and assert that what is directly to the right is AAA:
[B-Z](?=AAA)
import re
result = []
s = 'ACAABAACAAABACDBADDDFSDDDFFSSSASDAFAAACBAAAFASD'
pattern = r'[B-Z](?=AAA)'
for item in re.finditer(pattern, s):
    result.append(item.group())
print(result)
Output
['C', 'F', 'B']
You could also use re.findall
import re
s = 'ACAABAACAAABACDBADDDFSDDDFFSSSASDAFAAACBAAAFASD'
pattern = r'[B-Z](?=AAA)'
result = re.findall(pattern, s)
print(result)

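[^A](?=A{3})
This uses a positive lookahead: [^A] matches any single character other than A, and (?=A{3}) requires AAA immediately after it without including it in the match.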
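As a rough sketch of how this pattern behaves, the snippet below runs it on the string from the question and then compares it with [B-Z](?=A{3}) on a made-up string. The variable names and the `mixed` test string are assumptions added for illustration, not part of the original answers.

```python
import re

s = 'ACAABAACAAABACDBADDDFSDDDFFSSSASDAFAAACBAAAFASD'

# [^A] matches any single character except 'A';
# (?=A{3}) checks that 'AAA' follows, without consuming it.
any_non_a = re.findall(r'[^A](?=A{3})', s)
print(any_non_a)

# Hypothetical input containing a lowercase letter, a digit and a space,
# to show where the two character classes differ.
mixed = 'xAAA1AAA BAAA'
negated_class = re.findall(r'[^A](?=A{3})', mixed)
upper_range = re.findall(r'[B-Z](?=A{3})', mixed)
print(negated_class)
print(upper_range)
```

If the input can only contain uppercase letters, both patterns should give the same result; [^A] is the broader choice when other characters may appear.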

Here is a solution to your problem:
# group 1 is the character before AAA; group 2 is the AAA itself
pattern = r"([B-Z])(A{3})"
for item in re.finditer(pattern, s):
    result.append(item.group(1))

Related

Finding exact number of characters in word

I'm looking for a way to find words with the exact number of the given character.
For example:
If we have this input: ['teststring1','strringrr','wow','strarirngr'] and we are looking for 4 r characters
It will return only ['strringrr','strarirngr'] because they are the words with 4 letters r in it.
I decided to use regex and read the documentation and I can't find a function that satisfies my needs.
I tried with [r{4}] but it apparently returns any word with letters r in it.
Please help
Something like this:
import collections
def map_characters(string):
    # count how many times each character occurs in the string
    characters = collections.defaultdict(lambda: 0)
    for char in string:
        characters[char] += 1
    return characters
items = ['teststring1','strringrr','wow','strarirngr']
for item in items:
    characters_map = map_characters(item)
    # if the string has exactly 4 r letters
    # we print it
    if characters_map['r'] == 4:
        print(item)
# in the result it outputs strringrr and strarirngr
# because these words have 4 r letters
You can use str.count() to count the occurrences of a character, combined with list comprehensions to create a new list:
myArray = ['teststring1','strringrr','wow','strarirngr']
letter = "r"
amount = 4
filtered = [item for item in myArray if item.count(letter) == amount]
print(filtered) # ['strringrr', 'strarirngr']
If you wanted to make this reusable (to look for different letters or different amounts), you could pack it into a function:
def filterList(stringList, pattern, occurrences):
    return [item for item in stringList if item.count(pattern) == occurrences]
myArray = ['teststring1','strringrr','wow','strarirngr']
letter = "r"
amount = 4
print(filterList(myArray, letter, amount)) # ['strringrr', 'strarirngr']
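Square brackets define a character set: [abc] matches any single character that is a, b, or c. Inside a set, {4} is not a quantifier, so [r{4}] matches any one of the characters r, {, 4, or }, which is why every word containing an r matches. Try it without the brackets: r{4}. Note that r{4} matches four consecutive r characters, not four r characters spread across the word.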
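A small sketch of the difference between a bracketed set and a quantifier. The extra word 'brrrr' and the variable names are assumptions added here for illustration only.

```python
import re

# Inside brackets, r, {, 4 and } are all literal members of the set.
set_members = re.findall(r'[r{4}]', 'r{4}x')
print(set_members)

# 'brrrr' is a made-up word added so that one entry has four r's in a row.
words = ['teststring1', 'strringrr', 'wow', 'strarirngr', 'brrrr']

# The set matches any word that contains at least one of r, {, 4, }.
set_hits = [w for w in words if re.search(r'[r{4}]', w)]

# The quantifier only matches four consecutive r characters.
run_hits = [w for w in words if re.search(r'r{4}', w)]

print(set_hits)
print(run_hits)
```

Neither pattern counts the total number of r characters in a word, so for that task str.count or a pattern that skips over non-r characters is a better fit.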
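Since you asked about using regex, you could use the following. The pattern requires exactly four r characters, separated by any number of non-r characters; a pattern such as (.*r.*){4} with re.match would also accept words with more than four.
import re
l = ['teststring1', 'strringrr', 'wow', 'strarirngr']
[ word for word in l if re.fullmatch(r'[^r]*(?:r[^r]*){4}', word) ]
output: ['strringrr', 'strarirngr']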

How can we remove word with repeated single character?

I am trying to collapse words that consist of a single repeated character, using regex in Python. For example:
good => good
gggggggg => g
What I have tried so far is the following:
re.sub(r'([a-z])\1+', r'\1', 'ffffffbbbbbbbqqq')
The problem with the above solution is that it changes good to god, and I only want to change words made up of a single repeated character.
A better approach here is to use a set:
def modify(s):
    #Create a set from the string
    c = set(s)
    #If you have only one character in the set, convert set to string
    if len(c) == 1:
        return ''.join(c)
    #Else return original string
    else:
        return s
print(modify('good'))
print(modify('gggggggg'))
If you want to use regex, mark the start and end of the string in the regex with ^ and $ (inspired by #bobblebubble's comment):
import re
def modify(s):
    #Create the sub string with a regex which only matches if a single character is repeated
    #Marking the start and end of string as well
    out = re.sub(r'^([a-z])\1+$', r'\1', s)
    return out
print(modify('good'))
print(modify('gggggggg'))
The output will be
good
g
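The anchored pattern above only covers lowercase a-z. As a hedged sketch, the same idea can be written with re.fullmatch and a dot so that any repeated character qualifies; the function name collapse_if_uniform is made up for this example.

```python
import re

def collapse_if_uniform(s):
    # (.) captures the first character; \1+ requires the rest of the
    # string to be one or more copies of that same character.
    m = re.fullmatch(r'(.)\1+', s)
    # Return the single character on a match, otherwise the string unchanged.
    return m.group(1) if m else s

print(collapse_if_uniform('good'))
print(collapse_if_uniform('gggggggg'))
print(collapse_if_uniform('AAAA'))
```

re.fullmatch makes the ^ and $ anchors unnecessary, since the whole string has to match. Strings of length 0 or 1 are returned as they are.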
If you do not want to use a set in your method, this should do the trick:
def simplify(s):
    l = len(s)
    if l>1 and s.count(s[0]) == l:
        return s[0]
    return s
print(simplify('good'))
print(simplify('abba'))
print(simplify('ggggg'))
print(simplify('g'))
print(simplify(''))
output:
good
abba
g
g
Explanation:
You compute the length of the string.
You count the characters that are equal to the first one and compare that count with the length of the string.
Depending on the result, you return the first character or the whole string.
You can use the Trim method. Take a look at this example:
"ggggggg".Trim('g');
Update:
For characters in the middle of the string, use this function (thanks to this answer).
In C#:
public static string RemoveDuplicates(string input)
{
    return new string(input.ToCharArray().Distinct().ToArray());
}
in python:
used = set()
unique = [x for x in mylist if x not in used and (used.add(x) or True)]
However, I think none of these answers handles a case like aaaaabbbbbcda: this string has an a at the end that does not appear in the result (abcd). For this kind of situation, use this function, which I wrote:
In:
def unique(s):
    used = set()
    ret = list()
    s = list(s)
    for x in s:
        if x not in used:
            ret.append(x)
            used = set()
        used.add(x)
    return ret
print(unique('aaaaabbbbbcda'))
out:
['a', 'b', 'c', 'd', 'a']
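Collapsing each run of consecutive identical characters, as the function above does, can also be sketched with itertools.groupby from the standard library. The name collapse_runs is an assumption for this example.

```python
from itertools import groupby

def collapse_runs(s):
    # groupby yields one (key, group) pair per run of equal
    # consecutive characters; keeping only the keys collapses each run.
    return [key for key, _group in groupby(s)]

print(collapse_runs('aaaaabbbbbcda'))
print(''.join(collapse_runs('good')))
```

Note that this collapses every run, so it turns good into god, which is the behaviour the original question wanted to avoid for ordinary words.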

how to remove whitespace inside bracket?

I have the following string:
res = '(321, 3)-(m-5, 5) -(31,1)'
I want to remove the whitespace within the brackets, but I don't have any knowledge of regular expressions.
I've tried this, but it doesn't work:
import re
res = re.sub(r'\(.*\s+\)', '', res)
You can substitute a non-greedy wildcard match for characters in parentheses with a function that splits the match on whitespace and rejoins it.
>>> import re
>>> res = '(321, 3)-(m-5, 5) -(31,1)'
>>> re.sub(r'\(.*?\)', lambda x: ''.join(x.group(0).split()), res)
'(321,3)-(m-5,5) -(31,1)'
You could convert the string into a list, go through each letter and count if you are within brackets or not. In toRemove, you collect the positions of whitespaces, which you then remove from the list. Then you convert the list back to a string ...
res = '(321, 3)-(m-5, 5) -(31,1)'
r = list(res)
insideBracket = 0
toRemove = []
for pos,letter in enumerate(r):
    if letter == '(':
        insideBracket += 1
    elif letter == ')':
        insideBracket -= 1
    if insideBracket > 0:
        if letter == ' ':
            toRemove.append(pos)
for t in toRemove[::-1]:
    r.pop(t)
result = ''.join(r)
print(result)
I think regular expressions aren't quite powerful enough to do what you want here; you want to remove all whitespace that's found in between parenthesis characters. The trouble is, solving this for the general case means you're doing a context-sensitive match on the string, and regular expressions are mostly context-insensitive, and so can't do your job. There are lookaheads and lookbehinds that can restrict matches to particular contexts, but they won't solve your problem in the general case either:
The contained pattern must only match strings of some fixed length, meaning that abc or a|b are allowed, but a* and a{3,4} are not. Group references are not supported even if they match strings of some fixed length.
Because of this, I would match the parenthesis groups first:
>>> re.split(r'(\([^)]*\))', res)
['', '(321, 3)', '-', '(m-5, 5)', ' -', '(31,1)', '']
and then remove whitespace from them in a second step before joining everything back up into a single string:
>>> g = re.split(r'(\([^)]*\))', res)
>>> g[1::2] = [re.sub(r'\s*', '', x) for x in g[1::2]]
>>> ''.join(g)
'(321,3)-(m-5,5) -(31,1)'

Python: Finding Regex occurrences for variable char

I know, for example, that if I want to find lengths of all the occurrences of consecutive 'a's
in input = "1111aaaaa11111aaaaaaa111aaa", I can do
[len(s) for s in re.findall(r'a+', input)]
However, I'm not sure how to do this with a char variable. For instance,
CHAR = 'a'
[len(s) for s in re.findall(r'??????', input)] # Trying to find occurrences of CHARs..
Is there a way to do this??
Here is a general solution that should work for strings of any length:
CHAR = 'a'
[len(s) for s in re.findall(r'(?:{})+'.format(re.escape(CHAR)), input)]
Or an alternative using itertools (single character only):
import itertools
[sum(1 for _ in g) for k, g in itertools.groupby(input) if k == CHAR]
I think what you're asking for is:
[len(s) for s in re.findall(r'{}+'.format(CHAR), input)]
Except of course that this won't work if CHAR is a special value, like \. If that's an issue:
[len(s) for s in re.findall(r'{}+'.format(re.escape(CHAR)), input)]
If you want to match two or more instead of one or more, the syntax for that is {2,}. As the docs say:
{m,n} Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible. For example, a{3,5} will match from 3 to 5 'a' characters. Omitting m specifies a lower bound of zero, and omitting n specifies an infinite upper bound. As an example, a{4,}b will match aaaab or a thousand 'a' characters followed by a b, but not aaab…
That gets a little ugly when we're using {} for string formatting, so let's switch to %-formatting:
[len(s) for s in re.findall(r'%s{2,}' % (re.escape(CHAR),), input)]
… or just simple concatenation:
[len(s) for s in re.findall(re.escape(CHAR) + r'{2,}', input)]
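The variants above could be wrapped in one helper, sketched here under the assumption that the caller passes a non-empty string. The function name run_lengths and the min_len parameter are made up for this example.

```python
import re

def run_lengths(char, text, min_len=1):
    # Escape the token so special characters such as '.' or '\' are
    # matched literally, and wrap it in a non-capturing group so the
    # quantifier applies to the whole token, not just its last character.
    pattern = '(?:' + re.escape(char) + '){' + str(min_len) + ',}'
    # Divide by len(char) so multi-character tokens report repetitions.
    return [len(run) // len(char) for run in re.findall(pattern, text)]

text = "1111aaaaa11111aaaaaaa111aaa"
print(run_lengths('a', text))
print(run_lengths('.', 'a..b...c.', min_len=2))
```

Building the pattern by concatenation sidesteps the clash between regex braces and str.format placeholders mentioned above.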

Regular expression to replace a character on odd repeated occurrences in Python

Can't get a regular expression to replace a character on odd repeated occurrences in Python.
Example:
char = ``...```.....``...`....`````...`
to
``...``````.....``...``....``````````...``
Runs with an even number of occurrences should not be replaced.
for example:
>>> import re
>>> s = "`...```.....``...`....`````...`"
>>> re.sub(r'((?<!`)(``)*`(?!`))', r'\1\1', s)
'``...``````.....``...``....``````````...``'
Maybe I'm old fashioned (or my regex skills aren't up to par), but this seems to be a lot easier to read:
import re
def double_odd(regex, string):
    """
    Look for groups that match the regex. Double every other one,
    starting with the first.
    """
    count = [0]
    def _double(match):
        count[0] += 1
        return match.group(0) if count[0] % 2 == 0 else match.group(0) * 2
    return re.sub(regex, _double, string)
s = "`...```.....``...`....`````...`"
print(double_odd('`', s))
print(double_odd('`+', s))
It seems that I might have been a little confused about what you were actually looking for. Based on the comments, this becomes even easier:
def odd_repl(match):
    """
    double a match (all of the matched text) when the length of the
    matched text is odd
    """
    g = match.group(0)
    return g*2 if len(g)%2 == 1 else g
re.sub(regex,odd_repl,your_string)
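To make the callback approach concrete, here is a minimal runnable sketch. It assumes the regex is a run of one or more backticks and uses the sample string from the question; the variable name result is chosen for this example.

```python
import re

def odd_repl(match):
    # Double the matched run only when its length is odd.
    g = match.group(0)
    return g * 2 if len(g) % 2 == 1 else g

s = "`...```.....``...`....`````...`"

# Assumed pattern: one or more consecutive backticks form a run.
result = re.sub(r'`+', odd_repl, s)
print(result)
```

Passing a function as the replacement lets re.sub decide per match, which avoids encoding the odd/even test in lookarounds.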
This may be not as good as the regex solution, but works:
In [101]: s1=re.findall(r'`{1,}',char)
In [102]: s2=re.findall(r'\.{1,}',char)
In [103]: fill=s1[-1] if len(s1[-1])%2==0 else s1[-1]*2
In [104]: "".join("".join((x if len(x)%2==0 else x*2,y)) for x,y in zip(s1,s2))+fill
Out[104]: '``...``````.....``...``....``````````...``'
