Pattern matching in regex

Pattern matching in regex - python

I have the dictionary below, which I built so that each key is the length of the words stored under it.
{4: {'lost', 'lust', 'list', 'last', 'lest', 'blue'}, 5: {'beryl'}, 8: {'blowlamp', 'blimbing', 'bluejays', 'jigsawed'}, 9: {'blistered', 'oospheres', 'blackcaps', 'blastular', 'blotchier', 'troweller'}, 10: {'blancmange', 'blackguard', 'volcanizes'}, 6: {'blague', 'blacks', 'blonde', 'blocks'}, 7: {'blawort', 'blinder', 'blender', 'blonder', 'blunder', 'blander'}}
I want to pull out lists of words from this dictionary such that the vowel comes at the same position for 5 words, like
[[lost, lust, list, last, lest], [blinder, blender, blonder, blunder, blander]]
I have no idea how to get lists like these. One way I thought of was regex, but on what basis do I match? The words can be of any length and the vowel can be anywhere.
PS: this is a Codewars question. https://www.codewars.com/kata/vowel-alternations/train/python
My approach so far: I grouped the words of the same length in a dictionary so that I can work on the values. I just have no idea how to work on the values.
It would be helpful if anyone could explain their thinking and the best way to do this.
The rest of the code:
mdict = {}
rev_multidict = {}
for i in words:
    for sublist in i:
        mdict[sublist] = len(sublist)
for key, value in mdict.items():
    rev_multidict.setdefault(value, set()).add(key)
for key, value in rev_multidict.items():
    i = rev_multidict[key]
    print(i)

You could check the first string for the location of the vowels and generate a regex string to match. Every char maps to either '[aeiou]' or '.' depending on whether it's a vowel or not. What you do with 'y' is up to you.
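The idea above can be sketched as follows. This is only an illustration, not the answerer's code: the helper name `vowel_pattern` is made up, 'y' is assumed to be a consonant, and consonants are kept as literal characters instead of '.' so that a word like 'best' does not match the pattern built from 'lost'.

```python
import re

VOWELS = "aeiou"  # assumption: 'y' is treated as a consonant


def vowel_pattern(word):
    """Build a regex that accepts any vowel where `word` has one
    and requires the same consonants everywhere else."""
    parts = ("[aeiou]" if ch in VOWELS else re.escape(ch) for ch in word)
    return re.compile("".join(parts))


words = ["lost", "lust", "list", "last", "lest", "blue"]
pattern = vowel_pattern(words[0])  # equivalent to the regex l[aeiou]st
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)  # ['lost', 'lust', 'list', 'last', 'lest']
```

Swapping `re.escape(ch)` for `"."` gives the looser variant described in the answer, which only checks where the vowels sit.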

The following code is the start of one way of approaching it:
#!/usr/bin/python
import re
words = 'last lest best list '.split()
words1 = 'blander wigwam blunder slender'.split()
print("word list one: {}".format(words))
print('')
aoa = [re.split('[aeiou]', word) for word in words]
for item in aoa:
    print(item)
print('\n')
print("word list two: {}".format(words1))
print('')
aoa1 = [re.split('[aeiou]', word) for word in words1]
for item in aoa1:
    print(item)
the output is:
word list one: ['last', 'lest', 'best', 'list']
['l', 'st']
['l', 'st']
['b', 'st']
['l', 'st']
word list two: ['blander', 'wigwam', 'blunder', 'slender']
['bl', 'nd', 'r']
['w', 'gw', 'm']
['bl', 'nd', 'r']
['sl', 'nd', 'r']
The regex splits on vowels. If you look closely at the output of the split, you will notice that for the words that should match, the corresponding list index values are the same. Perhaps you can iterate over the lists and do a comparison.
That is left up to you...
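One possible way to do that comparison, as a rough sketch: use the split result itself as a dictionary key, so words with the same consonant pieces end up in the same group. The function name `group_by_skeleton` and the extra word 'blinder' in the sample list are assumptions added for illustration.

```python
import re
from collections import defaultdict


def group_by_skeleton(words):
    """Group words whose pieces after splitting on vowels are identical."""
    groups = defaultdict(list)
    for word in words:
        # a tuple is hashable, so the split result can serve as the key
        key = tuple(re.split("[aeiou]", word))
        groups[key].append(word)
    return dict(groups)


words = ["blander", "wigwam", "blunder", "slender", "blinder"]
groups = group_by_skeleton(words)
print(groups[("bl", "nd", "r")])  # ['blander', 'blunder', 'blinder']
```

Filtering the groups for those with exactly five members would then give the candidate answers.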

Assuming you want to print out only if the output list contains 5 words:
text = {4: {'lost', 'lust', 'list', 'last', 'lest', 'blue'}, 5: {'beryl'}, 8: {'blowlamp', 'blimbing', 'bluejays', 'jigsawed'}, 9: {'blistered', 'oospheres', 'blackcaps', 'blastular', 'blotchier', 'troweller'}, 10: {'blancmange', 'blackguard', 'volcanizes'}, 6: {'blague', 'blacks', 'blonde', 'blocks'}, 7: {'blawort', 'blinder', 'blender', 'blonder', 'blunder', 'blander'}}
vowels = "aeiou"
for i in range(4,10):
    words = text[i]
    rep_idx = []
    for word in words:
        for letter in vowels:
            if letter in word:
                idx = word.index(letter)
                if idx not in rep_idx:
                    word_list = []
                    for word in words:
                        if word[idx] in vowels:
                            word_list.append(word)
                    if len(word_list) == 5:
                        print ("{}, Vowel Index: {}".format(word_list, idx))
                    rep_idx.append(idx)
Output:
>>>
['lust', 'lest', 'list', 'lost', 'last'], Vowel Index: 1
['blonder', 'blender', 'blinder', 'blander', 'blunder'], Vowel Index: 5
['blistered', 'blackcaps', 'troweller', 'blastular', 'blotchier'], Vowel Index: 2
['blistered', 'troweller', 'oospheres', 'blastular', 'blotchier'], Vowel Index: 7

OK, for the Codewars question (I've taken a different approach from yours, so I didn't use your code):
First, define a simple function that replaces every vowel with a placeholder character (here a dollar sign):
from collections import Counter
def translate(word):
    for ch in 'eyuioa':
        if ch in word:
            word = word.replace(ch, '$')
    return word
Then define a function that takes a list of words as input (e.g. ['last', 'lest', 'list', 'lost', 'lust']), counts the occurrences of each translated word, and finds the translated word that occurs 5 times. Store it in a list and append [None] in case the list is empty (no such pattern found), so that you don't get an error. Then simply return all words that match that pattern.
def find_solutions(input_list):
    tuples_list = list(map(lambda x: (x, translate(x)), input_list))
    counting = Counter(map(lambda x: x[1], tuples_list))
    desired_pattern = [x for x, y in dict(counting).items() if y == 5] + [None]
    return [x for x, y in tuples_list if y == desired_pattern[0]]
Example:
find_solutions(['last', 'lest', 'list', 'lost', 'lust'])

This is a rather brute-force approach, but it seems to work:
vowels = 'aeiou'
def find_solutions(words):
    solutions = []
    vowel_list = list(vowels)
    cases = []
    for word in words:
        for i, l in enumerate(word):
            if l in vowel_list:
                c = list(word)
                temp = []
                for vowel in vowel_list:
                    c[i] = vowel
                    temp.append(''.join(c))
                cases.append(temp)
    for case in cases:
        if all([item in words for item in case]):
            if case not in solutions:
                solutions.append(case)
    return solutions

Related

How to add x amount of words of x length to different nested lists

I've got a piece of work which requires me to add words to their corresponding list depending on the length of the word, i.e. all words of length 1 go in list 1, all words of length 2 go in list 2, etc.
Below is the code I currently have. As you can see, I've created a list with L empty buckets, and the idea is for words of each length to go in their corresponding bucket. This is where I am stuck: without knowing how many buckets there are going to be, I don't know how to add the words.
I am very new to Python and any help would be much appreciated!
def empty_buckets(n):
    """Return a list with n empty lists. Assume n is a positive integer. """
    buckets = []
    for bucket in range(n):
        buckets.append([])
    return buckets
Compute the maximum length L of all words.
longest = ''
for L in words:
    if len(L) > len(longest):
        longest = L
return longest
Create a list of L empty lists (buckets).
buckets = empty_buckets(L)

You can get the longest word in a list of words by calling max() with len as the key function.
You can reserve index 0 (the slot for "empty" words) and sort all your words into the buckets using a for loop, indexing into your buckets with len(word):
# create some demo strings and add some other words
words = [ str(10**k) for k in range(10)]
words.extend(["this","should","work","out","somehow"])
print(words) # ['1', '10', '100', '1000', '10000', '100000', '1000000', '10000000',
# '100000000', '1000000000', 'this', 'should', 'work', 'out', 'somehow']
longest = len(max(words, key=len)) # get the length of the longest word
# index 0 is a None placeholder, followed by one bucket for each length 1 up to longest
bins = [None] + [[] for _ in range(longest)]
# loop over the words and put them in the bin at index len(word)
for w in words:
    bins[len(w)].append(w)
print(bins)
Output:
[None, ['1'], ['10'], ['100', 'out'], ['1000', 'this', 'work'], ['10000'],
['100000', 'should'], ['1000000', 'somehow'], ['10000000'],
['100000000'], ['1000000000']]
Docs:
max(iterable, key=len)
len()
range()

buckets = [0] * (longest + 1) # one slot for every length from 0 to longest
Then make a list within each element; I am using the first element of each list to keep a count for that bucket.
for i in range(longest + 1):
    buckets[i] = [0]
Then you need to add the words to the buckets.
for L in words:
    buckets[len(L)][0] += 1 # increasing the count of that bucket
    buckets[len(L)].append(L) # Adding the word to that bucket
Here is an example:
longest = 10
words = ['this', 'that', 'foremost']
buckets = [0] * (longest + 1) # one slot for every length from 0 to longest
for i in range(longest + 1):
    buckets[i] = [0]
for L in words:
    buckets[len(L)][0] += 1 # increasing the count of that bucket
    buckets[len(L)].append(L) # Adding the word to that bucket
To access any of the counts, use buckets[number][0]; to access the words, loop over the bucket starting at buckets[number][1].
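Reading a bucket back might look like the sketch below. It assumes the layout described above (slot 0 of each bucket holds the count, the words follow) and computes `longest` from the words instead of hard-coding it.

```python
words = ["this", "that", "foremost"]
longest = max(len(w) for w in words)

# One bucket per length 0..longest; slot 0 of each bucket holds the count.
buckets = [[0] for _ in range(longest + 1)]
for w in words:
    buckets[len(w)][0] += 1
    buckets[len(w)].append(w)

count = buckets[4][0]                # how many 4-letter words were stored
four_letter_words = buckets[4][1:]   # the words themselves, skipping the count
print(count, four_letter_words)  # 2 ['this', 'that']
```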

As I mentioned in a comment earlier, I used a dictionary to solve this problem.
Here you do not need to create the empty lists up front with a helper function, since the maximum length is not known in advance.
So you can try it like this.
You can visit https://rextester.com/ZQKA28350 to run the code online.
def add_words_to_bucket(words):
    d = {}
    for word in words:
        l = len(word)
        if l in d:
            d[l].append(word)
        else:
            i = 0
            while l >= 0 and not l in d:
                if not i:
                    d[l] = [word]
                else:
                    d[l] = []
                l = l - 1
                i += 1
    return d
def get_as_list(d):
    bucket = [d[i] for i in range(0, len(d))]
    return bucket
words = ["a", "git", "go", "py", "java", "paper", "ruby", "r"]
d = add_words_to_bucket(words)
bucket = get_as_list(d)
print(d) # {0: [], 1: ['a', 'r'], 2: ['go', 'py'], 3: ['git'], 4: ['java', 'ruby'], 5: ['paper']}
print(bucket) # [[], ['a', 'r'], ['go', 'py'], ['git'], ['java', 'ruby'], ['paper']]
words2 = ["a", "git", "go", "py", "", "java", "paper", "ruby", "r","TheIpMan", ""]
d2 = add_words_to_bucket(words2)
bucket2 = get_as_list(d2)
print(d2) # {0: ['', ''], 1: ['a', 'r'], 2: ['go', 'py'], 3: ['git'], 4: ['java', 'ruby'], 5: ['paper'], 6: [], 7: [], 8: ['TheIpMan']}
print(bucket2) # [['', ''], ['a', 'r'], ['go', 'py'], ['git'], ['java', 'ruby'], ['paper'], [], [], ['TheIpMan']]

This should do the trick:
def bucket_words_by_length(words):
    d = {}
    for word in words:
        d.setdefault(len(word), []).append(word)
    buckets = [d.get(k, []) for k in range(max(d.keys()) + 1)]
    return buckets
For example,
>>> words = ['hi', 'my', 'friend', 'how', 'are', 'you']
>>> bucket_words_by_length(words)
[[], [], ['hi', 'my'], ['how', 'are', 'you'], [], [], ['friend']]
This implementation first builds a dictionary with lengths as its keys and a list of the words of corresponding length as its values. Next, it iterates through all of the lengths producing an empty list if no words are of that length and producing the list of words otherwise.

Efficient and fastest way to search in a list of strings

The following function returns the number of words in a list that contain exactly the same characters as the word entered. The order of the characters in the words is not important. However, say the list contains millions of words. What is the most efficient and fastest way to perform this search?
Example:
words_list = ['yek','lion','eky','ekky','kkey','opt']
If we were to match the word "key" against the words in the list, the function would only count "yek" and "eky", since they share exactly the same characters with "key" regardless of order.
Below is the function I wrote
from itertools import permutations
def find_a4(words_list, word):
    # all possible permutations of the word that we are looking for
    # it's a set of words
    word_permutations = set(''.join(p) for p in permutations(word))
    word_size = len(word)
    count = 0
    for candidate in words_list:
        # in the case of word "key",
        # we only accept words that have 3 characters
        # and they are in the word_permutations
        if len(candidate) == word_size and candidate in word_permutations:
            count += 1
    return count

A dictionary whose key is the sorted version of the word:
word_list = ['yek','lion','eky','ekky','kkey','opt']
from collections import defaultdict
word_index = defaultdict(set)
for word in word_list:
    idx = tuple(sorted(word))
    word_index[idx].add(word)
# word_index = {
# ('e', 'k', 'y'): {'yek', 'eky'},
# ('i', 'l', 'n', 'o'): {'lion'},
# ('e', 'k', 'k', 'y'): {'kkey', 'ekky'},
# ('o', 'p', 't'): {'opt'}
# }
Then for querying you would do:
def find_a4(word_index, word):
    idx = tuple(sorted(word))
    return len(word_index[idx])
Or if you need to return the actual words, change it to return word_index[idx].
Efficiency: once the index is built, a query runs in O(1) time on average for the dictionary lookup, plus the cost of sorting the query word.
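A compact sketch of the build-once, query-many pattern follows. The names `build_index` and `find_anagrams` are hypothetical, and the key is a sorted string instead of a tuple, which is only a stylistic choice. Using `.get` on the index means a failed lookup does not add an empty entry to the `defaultdict`.

```python
from collections import defaultdict


def build_index(word_list):
    """Map each sorted-letter signature to the set of words that share it."""
    index = defaultdict(set)
    for w in word_list:
        index["".join(sorted(w))].add(w)
    return index


def find_anagrams(index, word):
    # .get does not trigger the default factory, so misses leave the index unchanged
    return index.get("".join(sorted(word)), set())


index = build_index(["yek", "lion", "eky", "ekky", "kkey", "opt"])
print(sorted(find_anagrams(index, "key")))  # ['eky', 'yek']
print(len(find_anagrams(index, "zzz")))     # 0
```

Building the index is a one-off pass over the whole list; after that, each query only sorts the query word and does one lookup.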

For a long word of length n, there are n! permutations to generate and search. Instead, I would sort the characters of each string before comparing, which costs O(n log n) per word, and only sort and compare when the lengths match:
def find_a4(words_list, word):
    word = ''.join(sorted(word))
    word_size = len(word)
    count = 0
    for word1 in words_list:
        if len(word1) == word_size:
            if word == ''.join(sorted(word1)):
                count += 1
    return count

Counting the Vowels at the End of a Word

Write a function named vowelEndings that takes a string, text, as a parameter.
The function vowelEndings returns a dictionary d in which the keys are all the vowels that are the last letter of some word in text. The letters a, e, i, o and u are vowels. No other letter is a vowel. The value corresponding to each key in d is a list of all the words ending with that vowel. No word should appear more than once in a given list. All of the letters in text are lower case.
The following is an example of correct output:
>>> t = 'today you are you there is no one alive who is you-er than you'
>>> vowelEndings(t)
{'u': ['you'], 'o': ['no', 'who'], 'e': ['are', 'there', 'one', 'alive']}
This is what I have so far:
def vowelEndings(text):
    vowels = 'aeiouAEIOU'
    vowelCount = 0
    words = text.split()
    for word in words:
        if word[0] in vowels:
            vowelCount += 1
    return vowelCount
t = 'today you are you there is no one alive who is you-er than you'
print(vowelEndings(t))
Output:
5
What it is doing is counting the vowels at the beginning of each word, but it should be looking at the vowel at the end of each word. Also, it should output each vowel together with the words that end with it, as in the question. I need help with that.

You are close. The missing aspects are:
To extract the last letter, use word[-1].
You need to create a dictionary with vowel keys.
The dictionary values should be sets to avoid duplicates.
The classic Python solution is to use collections.defaultdict:
from collections import defaultdict
t = 'today you are you there is no one alive who is you-er than you'
def vowelEndings(text):
    vowels = set('aeiou')
    d = defaultdict(set)
    for word in text.split():
        final = word[-1]
        if final in vowels:
            d[final].add(word)
    return d
print(vowelEndings(t))
defaultdict(set,
{'e': {'alive', 'are', 'one', 'there'},
'o': {'no', 'who'},
'u': {'you'}})
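The assignment asks for lists rather than sets. A minimal variant that keeps each word once, in first-seen order, could look like this; the snake_case name `vowel_endings` is my own and the logic is otherwise the same as above.

```python
def vowel_endings(text):
    """Return {vowel: [words ending in that vowel]}, without duplicates."""
    vowels = set("aeiou")
    d = {}
    for word in text.split():
        last = word[-1]
        if last in vowels:
            bucket = d.setdefault(last, [])
            if word not in bucket:  # keep each word only once per list
                bucket.append(word)
    return d


t = 'today you are you there is no one alive who is you-er than you'
result = vowel_endings(t)
print(result)
# {'u': ['you'], 'e': ['are', 'there', 'one', 'alive'], 'o': ['no', 'who']}
```

The `word not in bucket` check is linear in the bucket size; for very long texts, pairing each list with a set of seen words would avoid that cost.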

Given a word from a list, I need to find all the next number values of the list

I am just starting with Python, so sorry if the question is trivial.
I have been searching, but I have not found anything like my problem.
Given a word from a list, I need to find the first word containing at least one digit after each position of the given word.
For example, the given word is CAR and my list is:
ls1 = ['MOTO', 'FREZZE', 'CAR', 'DECIDING', 'LOCAL', 'USING', '4587125', 'JOY', 'CAR', 'YORT', '548H21']
I expect it to return:
ls2 = ['4587125','548H21']
I have been trying, but with no results so far.
Thanks for your help.
My code:
def hasNumbers(inputString):
    return any(char.isdigit() for char in inputString)
def number_word (character):
    while not hasNumbers(character):
        character = [ls1()[i + 1] for i, word in enumerate(ls1()[:-1]) if word == character]
        if hasNumbers(character):
            return character
Marcus

Iterate the list and keep track how often you have seen the word, then when you see a number, yield it as often as you have seen the word before. You can make this a generator:
def get_nums_after(lst, word):
    seen = 0
    for x in lst:
        if x == word:
            seen += 1
        if any(c.isdigit() for c in x):
            while seen > 0:
                yield x
                seen -= 1
Examples:
ls1 = ['MOTO', 'FREZZE', 'CAR', 'DECIDING', 'LOCAL', 'USING', '4587125', 'JOY', 'CAR', 'YORT', '548H21']
print(list(get_nums_after(ls1, "CAR")))
# ['4587125', '548H21']
ls2 = ["1", "CAR", "FOO", "2", "3", "CAR", "CAR", "4"]
print(list(get_nums_after(ls2, "CAR")))
# ['2', '4', '4']

given_word = "CAR"
words = ['MOTO', 'FREZZE', 'CAR', 'DECIDING', 'LOCAL', 'USING', '4587125', 'JOY', 'CAR', 'YORT', '548H21']
required_words = []
for i in range(len(words)):
    if given_word == words[i]:
        # scan forward from this occurrence and stop at the first item containing a digit
        for j in words[i:]:
            if any(k.isdigit() for k in j):
                required_words.append(j)
                break
print(required_words)

You can do something like this:
>>> ls1 = ['MOTO', 'FREZZE', 'CAR', 'DECIDING', 'LOCAL', 'USING', '4587125', 'JOY', 'CAR', 'YORT', '548H21']
Find the indices of the word 'CAR':
>>> ls2=[i for i in range(len(ls1)) if ls1[i]=='CAR']
>>> ls2
[2, 8]
Then create a temporary list that starts at each 'CAR' in ls1 and find the first string in it that contains a digit:
>>> ls4=[]
>>> for i in ls2:
...     ls3=ls1[i:]
...     for j in ls3:
...         if any(char.isdigit() for char in j):
...             ls4.append(j)
...             break
...
>>> ls4
['4587125', '548H21']
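The two steps above can also be folded into one function. This is just one way to write it; the function name `first_number_after_each` is made up for this sketch, and `next()` with a default is used to stop at the first digit-containing item.

```python
def first_number_after_each(lst, word):
    """For every occurrence of `word`, collect the first later item containing a digit."""
    result = []
    for i, item in enumerate(lst):
        if item == word:
            # next() returns the first match, or None when no such item follows
            hit = next((s for s in lst[i + 1:] if any(c.isdigit() for c in s)), None)
            if hit is not None:
                result.append(hit)
    return result


ls1 = ['MOTO', 'FREZZE', 'CAR', 'DECIDING', 'LOCAL', 'USING', '4587125',
       'JOY', 'CAR', 'YORT', '548H21']
print(first_number_after_each(ls1, 'CAR'))  # ['4587125', '548H21']
```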

Delete words which have 2 consecutive vowels in it

What I want is to pull out the words which have two or more consecutive vowels in them. So input:
s = " There was a boat in the rain near the shore, by some mysterious lake"
Output:
[boat,rain,near,mysterious]
So here is my code.
I was just wondering whether there is a better way to do this, or whether this is efficient enough. A solution using Python dicts or lists would be fine too. I'm new to Python, so comments would be nice. :)
def change(s):
    vowel = ["a","e","i","o","u"]
    words = []
    a = s[:].replace(",","").split()
    for i in vowel:
        s = s.replace(i, "*").replace(",","")
    for i,j in enumerate(s.split()):
        if "**" in j:
            words.append(a[i])
    return words

Alternatively, you could always use regular expressions and list comprehension to get the list of words:
>>> import re
>>> [x for x in s.split() if re.search(r'[aeiou]{2}', x)]
['boat', 'rain', 'near', 'mysterious']
s.split() splits the sentence into a list of words. The expression [x for x in s.split()] considers each word in this list in turn.
The re.search(r'[aeiou]{2}', x) part of the expression searches each word for two consecutive letters from the group [aeiou]. Only if two consecutive vowels are found is the word put in the new list.
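If the text contains punctuation or capital letters, one option is to let the regex find the words itself instead of calling s.split(). The sketch below is a variation on the answer above, not the answerer's code; it assumes words consist only of the letters a-z and adds a second sample sentence of my own.

```python
import re

s = " There was a boat in the rain near the shore, by some mysterious lake"

# \b marks word boundaries, so trailing commas or exclamation marks are not captured;
# IGNORECASE lets [aeiou] and [a-z] match capital letters as well.
pattern = re.compile(r"\b[a-z]*[aeiou]{2}[a-z]*\b", re.IGNORECASE)

print(pattern.findall(s))                   # ['boat', 'rain', 'near', 'mysterious']
print(pattern.findall("Ouch, the queue!"))  # ['Ouch', 'queue']
```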

Using sets:
The first method, using set.intersection, only finds non-identical consecutive pairs, so oo would not be a match:
s = " There was a boat in the rain near the shore, by some mysterious lake"
vowels = "aeiouAEIOU"
print([x for x in s.split() if any(len(set(x[i:i+2]).intersection(vowels))== 2 for i in range(len(x))) ])
['boat', 'rain', 'near', 'mysterious']
Method 2 uses set.issubset, so identical consecutive pairs are now considered a match.
It generates the pairs in a function that uses the Python 3 yield from syntax, which might be more appropriate and does catch repeated identical vowels:
vowels = "aeiouAEIOU"
def get(x, step):
    yield from (x[i:i+step] for i in range(len(x[:-1])))
print([x for x in s.split() if any(set(pr).issubset(vowels) for pr in get(x, 2))])
Or again in a single list comp:
print([x for x in s.split() if any(set(pr).issubset(vowels) for pr in (x[i:i+2] for i in range(len(x[:-1]))))])
Finally make vowels a set and check if it is a set.issuperset of any pair of chars:
vowels = {'a', 'u', 'U', 'o', 'e', 'i', 'A', 'I', 'E', 'O'}
def get(x, step):
    yield from (x[i:i+step] for i in range(len(x[:-1])))
print([x for x in s.split() if any(vowels.issuperset(pr) for pr in get(x, 2))])

Using pairwise iteration:
from itertools import tee
def pairwise(iterable):
    a, b = tee(iter(iterable))
    next(b)
    return zip(a,b)
vowels = 'aeiouAEIOU'
[word for word in s.split() if any(
    this in vowels and next in vowels for this,next in pairwise(word))]
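For strings, the same adjacent-pair iteration can be had without a helper by zipping the word with itself shifted by one character. This is a small alternative sketch (the name `has_vowel_pair` is mine); on Python 3.10 and later, `itertools.pairwise` provides the helper directly.

```python
VOWELS = set("aeiouAEIOU")


def has_vowel_pair(word):
    """True if any two neighbouring characters of `word` are both vowels."""
    # zip(word, word[1:]) yields (w[0], w[1]), (w[1], w[2]), ...
    return any(a in VOWELS and b in VOWELS for a, b in zip(word, word[1:]))


s = " There was a boat in the rain near the shore, by some mysterious lake"
result = [w for w in s.split() if has_vowel_pair(w)]
print(result)  # ['boat', 'rain', 'near', 'mysterious']
```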

Use regular expressions instead:
import re
s = 'There was a boat in the rain near the shore, by some mysterious lake'
l = [i for i in s.split(' ') if re.search('[aeiou]{2,}', i)]
print(' '.join(l)) # back to string

Using product instead:
from itertools import product
vowels = 'aiueo'
comb = list(product(vowels, repeat=2))
s = " There was a boat in the rain near the shore, by some mysterious lake"
def is2consecutive_vowels(word):
    for i in range(len(word)-1):
        if (word[i], word[i+1]) in comb:
            return True
    return False
print([word for word in s.split() if is2consecutive_vowels(word)])
# ['boat', 'rain', 'near', 'mysterious']
or, if you would rather not import anything:
vowels = 'aeiou'
def is2consecutive_vowels2(word):
    for i in range(len(word)-1):
        if word[i] in vowels and word[i+1] in vowels:
            return True
    return False
print([word for word in s.split() if is2consecutive_vowels2(word)])
# ['boat', 'rain', 'near', 'mysterious']
This one is even quicker than the regex solution!
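A speed claim like this is easy to check on your own machine with timeit. The sketch below is a rough harness, not a rigorous benchmark: the function names `with_loop` and `with_regex` and the repeat count are arbitrary choices, and the timings will vary with the Python version and the input.

```python
import re
import timeit

s = " There was a boat in the rain near the shore, by some mysterious lake"
words = s.split()
vowels = "aeiou"
pair = re.compile(r"[aeiou]{2}")  # compiled once, outside the timed code


def with_loop(word):
    for i in range(len(word) - 1):
        if word[i] in vowels and word[i + 1] in vowels:
            return True
    return False


def with_regex(word):
    return pair.search(word) is not None


# Both approaches should select the same words before their speed is compared.
loop_result = [w for w in words if with_loop(w)]
regex_result = [w for w in words if with_regex(w)]

loop_time = timeit.timeit(lambda: [w for w in words if with_loop(w)], number=10_000)
regex_time = timeit.timeit(lambda: [w for w in words if with_regex(w)], number=10_000)
print(f"loop:  {loop_time:.3f}s")
print(f"regex: {regex_time:.3f}s")
```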

a=[]
def count(s):
    c=0
    t=s.split()
    for i in t:
        for j in range(len(i)-1):
            w=i[j]
            u=i[j+1]
            if u in "aeiou" and w in "aeiou":
                c+=1
        if(c>=1):
            a.append(i)
        c=0
    return(a)
print(count("There was a boat in the rain near the shore, by some mysterious lake"))
