Python regex matching "ab" or "ba" words

I am trying to match words containing the letters "ab" or "ba", e.g. "ab"olition, f"ab"rics, pro"ba"ble. I came up with the following regular expression:
r"[Aa](?=[Bb])[Bb]|[Bb](?=[Aa])[Aa]"
But the matches include words that start or end with ", (, ), / and other non-alphanumeric characters. How can I strip those characters? I just want a list of the words.
import sys
import re
word = []
dict = {}
f = open('C:/Python27/brown_half.txt', 'rU')
w = open('C:/Python27/brown_halfout.txt', 'w')
data = f.read()
word = data.split()  # word is a list
f.close()
for num2 in word:
    match2 = re.findall("\w*(ab|ba)\w*", num2)
    if match2:
        dict[num2] = (dict[num2] + 1) if num2 in dict.keys() else 1
for key2 in sorted(dict.iterkeys()):
    print "%s: %s" % (key2, dict[key2])
print len(dict.keys())
Here, I don't know how to combine this with the re.compile(...) method that the first comment suggested.

To match all the words with ab or ba (case insensitive):
import re
text = 'fabh, obar! (Abtt) yybA, kk'
pattern = re.compile(r"(\w*(ab|ba)\w*)", re.IGNORECASE)
# to print all the matches
for match in pattern.finditer(text):
    print(match.group(0))
# to print the first match
print(pattern.search(text).group(0))
https://regex101.com/r/uH3xM9/1
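For the question of how to combine this with re.compile: a minimal sketch, assuming Python 3, that counts the matching words the way the question's dict does, but using collections.Counter (a swapped-in standard-library class, not something from the question). The sample text is made up. The group is written as (?:ab|ba), a non-capturing group, because when a pattern contains a capturing group, findall() returns only that group, i.e. just "ab" or "ba" instead of the whole word.

```python
import re
from collections import Counter

# Made-up sample text; in the question this would be the contents of the input file
text = 'fabh, obar! (Abtt) yybA, kk fabh'

# (?:...) is a non-capturing group, so findall() returns whole words
# instead of only the "ab"/"ba" fragment
pattern = re.compile(r"\w*(?:ab|ba)\w*", re.IGNORECASE)

# Count how often each matching word occurs
counts = Counter(pattern.findall(text))

for word in sorted(counts):
    print("%s: %s" % (word, counts[word]))
print(len(counts))
```

Because \w only matches letters, digits and underscores, punctuation such as commas and parentheses never ends up inside a counted word.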

Regular expressions are not the best tool for the job in this case. They complicate things more than necessary for such a simple check. Instead, you can use Python's built-in in operator (works in both Python 2 and 3):
sentence = "There are no probable situations whereby that may happen, or so it seems since the Abolition."
words = [''.join(filter(lambda x: x.isalpha(), token)) for token in sentence.split()]
for word in words:
    word = word.lower()
    if 'ab' in word or 'ba' in word:
        print('Word "{}" matches pattern!'.format(word))
As you can see, 'ab' in word evaluates to True if the string 'ab' is found as-is (that is, exactly) in word, and False otherwise. For example, 'ba' in 'probable' == True and 'ab' in 'Abolition' == False. The second line takes care of splitting the sentence into words and removing every non-letter character, punctuation included. word = word.lower() makes word lowercase before the comparisons, so that 'Abolition' becomes 'abolition' and 'ab' in word == True.

I would do it this way:
Strip the unwanted characters from your string using either of the two techniques below:
a - By building a translation dictionary and using the translate method (Python 3):
>>> import string
>>> del_punc = dict.fromkeys(ord(c) for c in string.punctuation)
>>> s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
>>> s = s.translate(del_punc)
>>> print(s)
abolition fabrics probable test case bank halfback 1ablution
b - Using the re.sub method (re.escape makes the punctuation characters safe to put inside a character class):
>>> import string
>>> import re
>>> s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
>>> s = re.sub(r'[%s]' % re.escape(string.punctuation), '', s)
>>> print(s)
abolition fabrics probable test case bank halfback 1ablution
Next, find the words containing 'ab' or 'ba':
a - Split on whitespace and look for your substrings in each word (this is the approach I recommend):
>>> [x for x in s.split() if 'ab' in x.lower() or 'ba' in x.lower()]
['abolition', 'fabrics', 'probable', 'bank', 'halfback', '1ablution']
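b - Using the re.finditer method:
>>> import re
>>> pat = re.compile(r'\b\w*(ab|ba)\w*\b', re.IGNORECASE)
>>> for m in pat.finditer(s):
...     print(m.group())
...
abolition
fabrics
probable
bank
halfback
1ablution
Using \w* rather than a lazy .*? keeps each match inside a single word; with .*? a match can run across spaces (e.g. " test case bank").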
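Both steps can be wrapped into one function. A sketch, assuming Python 3: str.maketrans('', '', string.punctuation) builds the same deletion table as del_punc (its third argument lists the characters to delete). The function name words_with_ab_or_ba is made up for illustration.

```python
import string

def words_with_ab_or_ba(text):
    # Third argument of str.maketrans: characters mapped to None (deleted)
    table = str.maketrans('', '', string.punctuation)
    cleaned = text.translate(table)
    # Case-insensitive substring test on each whitespace-separated word
    return [w for w in cleaned.split() if 'ab' in w.lower() or 'ba' in w.lower()]

print(words_with_ab_or_ba('abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'))
```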

string = "your string here"
lowercase = string.lower()
if 'ab' in lowercase or 'ba' in lowercase:
    print(True)
else:
    print(False)

Try this one:
[(),/]*([a-z]|(ba|ab))+[(),/]*

Related

Python challenge to convert a string to camelCase

I have a Python challenge: given a string with '_' or '-' between each word, such as the_big_red_apple or the-big-red-apple, convert it to camel case. If the first letter is uppercase, it should stay uppercase. This is my code. I'm not allowed to use the re library in the challenge, but I didn't know how else to do it.
from re import sub
def to_camel_case(text):
    if text[0].isupper():
        text = sub(r"(_|-)+", " ", text).title().replace(" ", "")
    else:
        text = sub(r"(_|-)+", " ", text).title().replace(" ", "")
        text = text[0].lower() + text[1:]
    return text  # return the string; print(to_camel_case(...)) to see it
Word delimiters can be - dash or _ underscore.
Let's simplify, making them all underscores:
text = text.replace('-', '_')
Now we can break out words:
words = text.split('_')
With that in hand it's simple to put them back together:
text = ''.join(map(str.capitalize, words))
or, more verbosely, with a generator expression:
text = ''.join(word.capitalize() for word in words)
I leave "finesse the 1st character"
as an exercise to the reader.
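One possible way to finesse the first character, still without re: keep the first word exactly as written (so an initial capital survives, as the challenge requires) and capitalize only the remaining words. This is a sketch under that assumption, not the answerer's own solution.

```python
def to_camel_case(text):
    # Normalize both delimiters to underscores, then split into words
    words = text.replace('-', '_').split('_')
    # Leave the first word untouched; capitalize the rest
    return words[0] + ''.join(word.capitalize() for word in words[1:])

print(to_camel_case('the_big_red_apple'))   # theBigRedApple
print(to_camel_case('The-big-red-apple'))   # TheBigRedApple
```

Note that str.capitalize() also lowercases the rest of each word, so "the_bIG" becomes "theBig".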
If you RTFM you'll find it contains a wealth of knowledge.
https://docs.python.org/3/library/re.html#raw-string-notation
'+'
Causes the resulting RE to match 1 or more repetitions of the preceding RE. ab+ will match ‘a’ followed by any non-zero number of ‘b’s
The effect of + is to turn both
db_rows_read and
db__rows_read
into DbRowsRead.
Also,
Raw string notation (r"text") keeps regular expressions sane.
The regex in your question doesn't exactly
need a raw string, as it has no crazy
punctuation like \ backwhacks.
But it's a very good habit to always put
a regex in an r-string, Just In Case.
You never know when code maintenance
will tack on additional elements,
and who wants a subtle regex bug on their hands?
You can try it like this:
def to_camel_case(text):
    s = text.replace("-", " ").replace("_", " ").split()
    if not s:  # empty input, or nothing but delimiters
        return text
    return s[0] + ''.join(i.capitalize() for i in s[1:])
print(to_camel_case('momo_es-es'))
The output of print(to_camel_case('momo_es-es')) is momoEsEs.
r"..." denotes a raw string in Python, which simply means the backslash \ is treated as a literal character instead of an escape character.
And (_|-)+ is a regular expression that matches one or more - or _ characters in a row.
(_|-) matches a single - or _.
+ means the preceding group (- or _) can occur one or more times in a row.
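A quick check of what + does, using the db__rows_read example (Python 3; repr() makes the double space visible):

```python
import re

s = 'db__rows_read'
# Without +, each delimiter becomes its own space
print(repr(re.sub(r"(_|-)", " ", s)))    # 'db  rows read'
# With +, a run of delimiters collapses into a single space
print(repr(re.sub(r"(_|-)+", " ", s)))   # 'db rows read'
```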
In case you cannot use the re library for this solution:
def to_camel_case(text):
    # Since delimiters can have 2 possible values, let's simplify to one.
    # In this case, I replace all `_` characters with `-`, to make sure we have only one delimiter.
    text = text.replace("_", "-")  # the_big-red_apple => the-big-red-apple
    # Next, we split our text into words so we can iterate through and modify them later.
    words = text.split("-")  # the-big-red-apple => ["the", "big", "red", "apple"]
    # Now, for each word (except the first word) we turn its first character to uppercase.
    for i in range(1, len(words)):
        # `i` starts from 1, which means the first word IS NOT INCLUDED in this loop.
        word = words[i]
        # word[1:] means the rest of the characters except the first one
        # (e.g. w = "apple" => w[1:] = "pple")
        # word[:1] instead of word[0] avoids an IndexError on empty words (from "--" or "__")
        words[i] = word[:1].upper() + word[1:].lower()
        # you can also use Python's built-in method for this:
        # words[i] = word.capitalize()
    # After this loop, ["the", "big", "red", "apple"] => ["the", "Big", "Red", "Apple"]
    # Finally, we put the words back together and return them
    # ["the", "Big", "Red", "Apple"] => theBigRedApple
    return "".join(words)
print(to_camel_case("the_big-red_apple"))
Try this:
First, replace all the delimiters with a single one, i.e. str.replace('_', '-')
Split the string on the standardized delimiter, i.e. str.split('-')
Capitalize each string in the list, i.e. str.capitalize()
Join the capitalized strings with str.join
>>> s = "the_big_red_apple"
>>> s.replace('_', '-').split('-')
['the', 'big', 'red', 'apple']
>>> ''.join(map(str.capitalize, s.replace('_', '-').split('-')))
'TheBigRedApple'
>>> ''.join(word.capitalize() for word in s.replace('_', '-').split('-'))
'TheBigRedApple'
If you need to lowercase the first char, then:
>>> camel_mile = lambda x: x[0].lower() + x[1:]
>>> s = 'TheBigRedApple'
>>> camel_mile(s)
'theBigRedApple'
Alternatively:
First, replace all delimiters with spaces, i.e. str.replace('_', ' ')
Titlecase the string, i.e. str.title()
Remove the spaces from the string, i.e. str.replace(' ', '')
>>> s = "the_big_red_apple"
>>> s.replace('_', ' ').title().replace(' ', '')
'TheBigRedApple'
Another alternative:
Iterate through the characters while keeping track of the previous character, i.e. for prev, curr in zip(s, s[1:])
Check if the previous character is one of your delimiters; if so, uppercase the current character, i.e. curr.upper() if prev in ['-', '_'] else curr
Skip the delimiter characters themselves, i.e. if curr not in ['-', '_']
Then prepend the first character in lowercase, [s[0].lower()]
>>> chars = [s[0].lower()] + [curr.upper() if prev in ['-', '_'] else curr for prev, curr in zip(s, s[1:]) if curr not in ['-', '_']]
>>> "".join(chars)
'theBigRedApple'
Yet another alternative:
Replace/normalize all delimiters into a single one, s.replace('-', '_')
Convert it into a list of chars, list(s.replace('-', '_'))
While there is still a '_' in the list of chars, keep:
finding the position of the next '_'
replacing the character after '_' with its uppercase
replacing the '_' with ''
>>> s = 'the_big_red_apple'
>>> s_list = list(s.replace('-', '_'))
>>> while '_' in s_list:
...     where_underscore = s_list.index('_')
...     s_list[where_underscore+1] = s_list[where_underscore+1].upper()
...     s_list[where_underscore] = ""
...
>>> "".join(s_list)
'theBigRedApple'
or
>>> s = 'the_big_red_apple'
>>> s_list = list(s.replace('-', '_'))
>>> while '_' in s_list:
...     where_underscore = s_list.index('_')
...     s_list[where_underscore:where_underscore+2] = ["", s_list[where_underscore+1].upper()]
...
>>> "".join(s_list)
'theBigRedApple'
Note: Why do we need to convert the string to a list of chars? Because strings are immutable: assigning to an index of a str raises TypeError: 'str' object does not support item assignment.
BTW, the regex solution can make use of a capturing group, e.g.
>>> import re
>>> s = "the_big_red_apple"
>>> upper_regex_group = lambda x: x.group(1).upper()
>>> re.sub(r"[_-](\w)", upper_regex_group, s)
'theBigRedApple'
>>> re.sub(r"[_-](\w)", lambda x: x.group(1).upper(), s)
'theBigRedApple'

Get text between two signs in a sentence

The task is to get text between two signs in a sentence.
The user inputs a sentence on one line, and on the next line inputs the two signs (in this case [ and ]).
Example:
In this sentence [need to get] only [few words].
Output needs to look like:
need to get few words
Does anyone have a clue how to do this?
One idea I had was to split the input so we can access every element of the list, and if an element starts with [ and ends with ], save that word to another list. But there is a problem when a word doesn't end with ] (i.e. the bracketed part spans several words).
P.S. The user will never input an empty string or nest signs inside signs, like [word [another] word].
You can use a regex:
import re
text = 'In this sentence [need to get] only [few words] and not [unbalanced'
' '.join(re.findall(r'\[(.*?)\]', text))
Output: 'need to get few words'
Or use r'(?<=\[).*?(?=\])' as the regex, with lookarounds.
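Since the user types the two signs on a separate line, they do not have to be hard-coded. A sketch, assuming Python 3, that builds the pattern from any pair of signs; re.escape makes characters such as [ or ( match literally. The function name text_between is hypothetical.

```python
import re

def text_between(sentence, open_sign, close_sign):
    # re.escape makes regex metacharacters like '[' or '(' literal
    pattern = re.escape(open_sign) + r'(.*?)' + re.escape(close_sign)
    # findall returns the captured text between each pair of signs
    return ' '.join(re.findall(pattern, sentence))

print(text_between('In this sentence [need to get] only [few words].', '[', ']'))
# need to get few words
```

As with the pattern above, an unclosed sign at the end is simply ignored.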
You can use regular expressions like this:
import re
your_string = "In this sentence [need to get] only [few words]"
matches = re.findall(r'\[([^\[\]]*)]', your_string)
print(' '.join(matches))
Solution without regex:
your_string = "In this sentence [need to get] only [few words]"
result_parts = []
current_square_brackets_part = ''
need_to_add_letter_to_current_square_brackets_part = False
for letter in your_string:
    if letter == '[':
        need_to_add_letter_to_current_square_brackets_part = True
    elif letter == ']':
        need_to_add_letter_to_current_square_brackets_part = False
        result_parts.append(current_square_brackets_part)
        current_square_brackets_part = ''
    elif need_to_add_letter_to_current_square_brackets_part:
        current_square_brackets_part += letter
print(' '.join(result_parts))
Here is a more classical solution using parsing.
It reads the string character by character and keeps each character only if a flag is set. The flag is set on [ and unset on ].
text = 'In this sentence [need to get] only [few words] and not [unbalanced'
add = False
l = []
m = []
for c in text:
    if c == '[':
        add = True
    elif c == ']':
        if add and m:
            l.append(''.join(m))
        add = False
        m = []
    elif add:
        m.append(c)
out = ' '.join(l)
print(out)
Output: need to get few words

compare specific string to a word python

Say I have a certain string and a list of strings.
I would like to append to a new list all the words from the list of strings
that exactly match the pattern.
For example:
list of strings = ['string1','string2'...]
pattern = a mix of _ and letters ('_c__ye_' for instance)
I need to add all the strings that have the same letters in the same places as the pattern, and the same length.
So for instance:
new_list = ['aczxyep','zcisyef'...]
I have tried this:
def pattern_word_equality(words,pattern):
    list1 = []
    for word in words:
        for letter in word:
            if letter in pattern:
                list1.append(word)
    return list1
Any help will be much appreciated :)
If your pattern is as simple as _c__ye_, then you can check the characters at those specific positions, plus the length:
words = ['aczxyep', 'cxxye', 'zcisyef', 'abcdefg']
result1 = list(filter(lambda w: len(w) == 7 and w[1] == 'c' and w[4:6] == 'ye', words))
If your pattern gets more complex, then you can start using regular expressions:
import re
pat = re.compile("^.c..ye.$")
result2 = list(filter(lambda w: pat.match(w), words))
Output:
print(result1) # ['aczxyep', 'zcisyef']
print(result2) # ['aczxyep', 'zcisyef']
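If the pattern changes from case to case, the positions need not be hard-coded: treat every _ as a wildcard and require the length and every other character to match. A sketch without regex; it borrows the question's function name, and the wildcard rule is an assumption about the question's pattern format.

```python
def pattern_word_equality(words, pattern):
    # '_' in the pattern matches any character; every other character must be equal
    return [
        word for word in words
        if len(word) == len(pattern)
        and all(p == '_' or p == c for p, c in zip(pattern, word))
    ]

print(pattern_word_equality(['aczxyep', 'cxxye', 'zcisyef', 'abcdefg'], '_c__ye_'))
# ['aczxyep', 'zcisyef']
```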
This works:
words = ['aczxyep', 'cxxye', 'zcisyef', 'abcdefg']
pattern = []
for i in range(len(words)):
    if (words[i])[1].lower() == 'c' and (words[i])[4:6].lower() == 'ye':
        pattern.append(words[i])
print(pattern)
You start by defining the words and pattern lists. Then you loop over the indices of words using range(len(words)). For each index i, you check whether that word follows the pattern by testing whether its second letter is c and its 5th and 6th letters are y and e (case-insensitively, because of .lower()). If so, the word is appended to pattern, and all the matches are printed at the end.

Python string replacement [duplicate]

This question already has answers here:
Do regular expressions from the re module support word boundaries (\b)?
(5 answers)
Closed 3 years ago.
I'm trying to replace the occurrence of a word with another:
word_list = { "ugh" : "disappointed"}
tmp = ['laughing ugh']
for index, data in enumerate(tmp):
    for key, value in word_list.iteritems():
        if key in data:
            tmp[index]=data.replace(key, word_list[key])
print tmp
While this works, the ugh inside laughing is also replaced, so the output is: ladisappointeding disappointed.
How does one avoid this so that the output is laughing disappointed?
In that case, you may want to consider replacing word by word.
Example:
word_list = { "ugh" : "disappointed"}
tmp = ['laughing ugh']
for t in tmp:
    words = t.split()
    for i in range(len(words)):
        if words[i] in word_list.keys():
            words[i] = word_list[words[i]]
    newline = " ".join(words)
    print(newline)
Output:
laughing disappointed
Step-by-Step Explanations:
Get every sentence in the tmp list:
for t in tmp:
Split the sentence into words:
words = t.split()
Check whether each word in words is one of the word_list keys. If it is, replace it with its value:
for i in range(len(words)):
    if words[i] in word_list.keys():
        words[i] = word_list[words[i]]
Rejoin the replaced words and print the result:
newline = " ".join(words)
print(newline)
You can do this by using a RegEx:
>>> import re
>>> re.sub(r'\bugh\b', 'disappointed', 'laughing ugh')
'laughing disappointed'
The \b stands for a word boundary.
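The r prefix in r'\bugh\b' matters: in an ordinary string literal, '\b' is the backspace character (\x08), so the regex looks for literal backspaces and silently replaces nothing. A quick Python 3 check:

```python
import re

text = 'laughing ugh'
# In a normal string literal, '\b' is the backspace character \x08
print('\b' == '\x08')                              # True
print(re.sub('\bugh\b', 'disappointed', text))     # laughing ugh (unchanged)
# In a raw string, \b reaches the regex engine as a word boundary
print(re.sub(r'\bugh\b', 'disappointed', text))    # laughing disappointed
```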
Use re.sub:
import re
for index, data in enumerate(tmp):
    for key, value in word_list.items():
        data = re.sub(r"\b{}\b".format(re.escape(key)), value, data)
    tmp[index] = data
word_list = { "ugh" : "disappointed", "123" : "lol"}
tmp = ['laughing 123 ugh']
for word in tmp:
    words = word.split()
    for i in words[:]:
        if i in word_list.keys():
            replace_value = word_list.get(i)
            words[words.index(i)] = replace_value
    output = " ".join(words)
    print(output)
This code replaces each key of the dict (the word you want to replace) with that key's value (the word you want it replaced with), for every occurrence and for multiple keys!
Output:
laughing lol disappointed
Hope that helps!
You can use regular expressions:
import re
for index, data in enumerate(tmp):
    for key, value in word_list.items():
        if key in data:
            pattern = r'\b' + re.escape(key) + r'\b'
            data = re.sub(pattern, value, data)
    tmp[index] = data
Side note: you need the data = ... line (reassigning data); otherwise the substitutions don't accumulate when word_list contains multiple entries. Also keep the r prefix: in a plain string, '\b' is a backspace character, not a word boundary.
Fast:
>>> [re.sub(r'\w+', lambda m: word_list.get(m.group(), m.group()), t)
...  for t in tmp]
['laughing disappointed']
Very Fast:
>>> [re.sub(r'\b(?:%s)\b' % '|'.join(map(re.escape, word_list.keys())), lambda m: word_list.get(m.group(), m.group()), t)
...  for t in tmp]
['laughing disappointed']

How do I print words with only 1 vowel?

My code so far; since I'm so lost, it doesn't do anything close to what I want it to do:
vowels = 'a','e','i','o','u','y'
#Consider 'y' as a vowel
input = input("Enter a sentence: ")
words = input.split()
if vowels == words[0]:
    print(words)
So for an input like this:
"this is a really weird test"
I want it to only print:
this, is, a, test
because they contain only 1 vowel.
Try this:
vowels = set(('a','e','i','o','u','y'))
def count_vowels(word):
    return sum(letter in vowels for letter in word)
my_string = "this is a really weird test"
def get_words(my_string):
    for word in my_string.split():
        if count_vowels(word) == 1:
            print(word)
Result:
>>> get_words(my_string)
this
is
a
test
Here's another option:
import re
words = 'This sentence contains a bunch of cool words'
for word in words.split():
    if len(re.findall('[aeiouy]', word)) == 1:
        print(word)
Output:
This
a
bunch
of
words
You can translate all the vowels to a single vowel and count that vowel (Python 3):
trans = str.maketrans('aeiouy', 'aaaaaa')
strs = 'this is a really weird test'
print([word for word in strs.split() if word.translate(trans).count('a') == 1])
>>> s = "this is a really weird test"
>>> [w for w in s.split() if len(w) - len(w.translate(str.maketrans('', '', "aeiouy"))) == 1]
['this', 'is', 'a', 'test']
Not sure if words with no vowels are required. If so, just replace == 1 with < 2
You can use one for loop to save substrings into a string array, ending each substring when you find that the next character is a space.
Then, for each substring, check whether it contains only one vowel (a, e, i, o, u); if so, add it to another array.
After that, concatenate all the strings from that other array with commas and spaces.
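The three steps above can be sketched as follows (Python 3). This is an illustration, not the answerer's code: the function name one_vowel_words is made up, and y is counted as a vowel, as the question requests, even though the steps list only a, e, i, o, u.

```python
VOWELS = 'aeiouy'  # assumption: 'y' counts as a vowel, as in the question

def one_vowel_words(sentence):
    words = []
    current = ''
    # Step 1: build substrings character by character, ending one at each space
    for ch in sentence + ' ':  # the extra trailing space flushes the last word
        if ch == ' ':
            if current:
                words.append(current)
            current = ''
        else:
            current += ch
    # Step 2: keep only the words with exactly one vowel
    kept = [w for w in words if sum(c in VOWELS for c in w.lower()) == 1]
    # Step 3: join them with a comma and a space
    return ', '.join(kept)

print(one_vowel_words('this is a really weird test'))  # this, is, a, test
```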
Try this:
vowels = ('a','e','i','o','u','y')
words = [i for i in input('Enter a sentence ').split() if i != '']
interesting = [word for word in words if sum(1 for char in word if char in vowels) == 1]
print(interesting)
I found so much nice code here, and I want to show my ugly one:
v = 'aoeuiy'
o = 'oooooo'
sentence = 'i found so much nice code here'
words = sentence.split()
trans = str.maketrans(v,o)
for word in words:
    if not word.translate(trans).count('o') > 1:
        print(word)
I find your lack of regex disturbing.
Here's a plain regex-only solution:
import re
sentence = "this is a really weird test"
words = re.findall(r"\b[^aeiouy\W]*[aeiouy][^aeiouy\W]*\b", sentence)
print(words)
