How to find substrings in a list of words


I'm trying to find whether a substring exists in a list of strings.
For example,
I have the list of words ['GGBASDEPINK','ASDEKNIP','PINK','WORDRRAB','BAR'].
PINK counts as a substring of ASDEKNIP because the reverse of PINK, KNIP, occurs in it,
and the word BAR is in WORDRRAB because its reverse, RAB, occurs in it.
How can I find whether such a substring exists and, if so, append the reversed string to the list?
The new list should be:
d = ['GGBASDEPINK','ASDEKNIP','PINK','WORDRRAB','BAR','KNIP','RAB']
I tried this:
d = ['GGBASDEPINK','ASDEKNIP','PINK','WORDRRAB','BAR']
for word in d:
    word = word[::-1]
    if word in d:
        print(word)
But it prints nothing.
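The loop above prints nothing because word in d checks whether the reversed word is equal to one of the list elements, and none is; what is needed is a check for the reversed word inside another element. A minimal sketch of the same loop with that substring test (it skips comparing a word with itself and collects results instead of printing them; the names found and result are illustrative) could be:

```python
d = ['GGBASDEPINK', 'ASDEKNIP', 'PINK', 'WORDRRAB', 'BAR']

found = []
for i, word in enumerate(d):
    rev = word[::-1]
    # `rev in d` compares rev with whole list elements (exact equality);
    # `rev in other` on a string checks for a substring instead.
    if any(rev in other for j, other in enumerate(d) if j != i):
        found.append(rev)

result = d + found
print(result)
# ['GGBASDEPINK', 'ASDEKNIP', 'PINK', 'WORDRRAB', 'BAR', 'KNIP', 'RAB']
```

Each word is reversed once, so a reversal is added at most once per source word even if it occurs in several other words.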

Use itertools.permutations:
from itertools import permutations
d = ['GGBASDEPINK','ASDEKNIP','PINK','WORDRRAB','BAR']
for x, y in permutations(d, 2):
    # permutations() copies d into a tuple up front, so appending to d here is safe
    rev = y[::-1]
    if rev in x:
        d.append(rev)
print(d)
# ['GGBASDEPINK', 'ASDEKNIP', 'PINK', 'WORDRRAB', 'BAR', 'KNIP', 'RAB']

Related

How to find longest string from a string separated by comma in python [duplicate]

This question already has answers here: How to get all the maximums max function (4 answers).
I have a comma-separated string, and I want to find the longest word in it.
words = 'run,barn,abcdefghi,yellow,barracuda,shark,fish,swim'
What I have tried so far:
print(max(words.split(','), key=len))
This prints abcdefghi, but as you can see, abcdefghi and barracuda have the same length. Why do I get only one of them instead of both?
Also, in the following string
words = 'fishes,sam,gollum,sauron,frodo,balrog'
many words have the same length. I want to return every one of them.
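max() returns the first maximal item it encounters, so any other words of the same length are dropped. One minimal way to keep ties, sketched here as a two-pass alternative (the names items, longest_len and longest are illustrative), is to compute the maximum length first and then keep every word of that length:

```python
words = 'run,barn,abcdefghi,yellow,barracuda,shark,fish,swim'
items = words.split(',')

longest_len = max(len(w) for w in items)               # first pass: the maximum length
longest = [w for w in items if len(w) == longest_len]  # second pass: every word of that length
print(longest)  # ['abcdefghi', 'barracuda']
```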

You can zip each word with its length, build a dict keyed by length, and return the list stored under the largest length:
>>> from collections import defaultdict
>>> words = 'run,barn,abcdefghi,yellow,barracuda,shark,fish,swim'
>>> dct = defaultdict(list)
>>> lstWrdSplt = words.split(',')
>>> for word, length in zip(lstWrdSplt, map(len, lstWrdSplt)):
...     dct[length].append(word)
...
>>> dct[max(dct)]
['abcdefghi', 'barracuda']
# The full dict, for reference:
>>> dct
defaultdict(list,
{3: ['run'],
4: ['barn', 'fish', 'swim'],
9: ['abcdefghi', 'barracuda'],
6: ['yellow'],
5: ['shark']})
You can also wrap this in a function and use a regex to pick out only the words:
from collections import defaultdict
import re
def mxLenWord(words):
    dct = defaultdict(list)
    lstWrdSplt = re.findall(r'\w+', words)  # raw string avoids an invalid-escape warning
    for word, length in zip(lstWrdSplt, map(len, lstWrdSplt)):
        dct[length].append(word)  # \w+ matches contain no whitespace, so no strip() is needed
    return dct[max(dct)]
words = 'rUnNiNg ,swimming, eating,biking, climbing'
print(mxLenWord(words))
Output:
['swimming', 'climbing']

Try the following:
from collections import defaultdict
data = defaultdict(list)
words = 'run,barn,abcdefghi,yellow,barracuda,shark,fish,swim'
for w in words.split(','):
    data[len(w)].append(w)
word_len = sorted(data.keys(), reverse=True)
for wlen in word_len:
    print(f'{wlen} -> {data[wlen]}')
Output:
9 -> ['abcdefghi', 'barracuda']
6 -> ['yellow']
5 -> ['shark']
4 -> ['barn', 'fish', 'swim']
3 -> ['run']

Many of these methods look too complicated for such a simple task. You can solve it with a combination of sorted() and groupby():
from itertools import groupby
words = 'run,barn,abcdefghi,yellow,barracuda,shark,fish,swim'
_, (*longest,) = next(groupby(sorted(words.split(","), key=len, reverse=True), len))
print(longest)
To group all words by length, you can use the following one-liner:
from itertools import groupby
words = 'fishes,sam,gollum,sauron,frodo,balrog'
words_len = {l: list(w) for l, w in groupby(sorted(words.split(","), key=len), len)}
print(words_len)
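Both one-liners sort before grouping because groupby() only merges consecutive items that share a key. A small sketch contrasting unsorted and sorted input (reusing the second example string; unsorted_groups and sorted_groups are illustrative names) shows the difference:

```python
from itertools import groupby

words = 'fishes,sam,gollum,sauron,frodo,balrog'.split(',')

# groupby() only merges *consecutive* items with the same key...
unsorted_groups = [(k, list(g)) for k, g in groupby(words, key=len)]
# ...so sorting by the same key first yields exactly one group per length.
sorted_groups = {k: list(g) for k, g in groupby(sorted(words, key=len), key=len)}

print(unsorted_groups)
print(sorted_groups)
```

Feeding the unsorted groups into a dict comprehension would silently keep only the last run for each length, giving {6: ['balrog'], 3: ['sam'], 5: ['frodo']}.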

Print the maximum occurrence of the anagrams and the anagram words themselves among the input anagrams

a = ['ab', 'absa', 'sbaa', 'basa', 'ba']
res = []
s = 0
for i in range(len(a)):
    b = a[i]
    c = ''.join(sorted(b))
    res.append(c)
res.sort(reverse=False)
wordfreq = [res.count(p) for p in res]
d = dict(zip(res, wordfreq))
all_values = d.values()  # all_values is a dict_values view
max_value = max(all_values)
print(max_value)
max_key = max(d, key=d.get)
print(max_key)
In this problem the user inputs several words; the output should be the highest number of words that are anagrams of one another, followed by those anagrams.
I would appreciate help printing those anagrams from the input.
Output:
3
aabs
Expected output:
3
absa sbaa basa
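Staying closer to the original code, collections.Counter can replace the res.count() loop, and a second pass over the input recovers the words whose sorted letters match the most common key. A minimal sketch (keys and anagrams are illustrative names):

```python
from collections import Counter

a = ['ab', 'absa', 'sbaa', 'basa', 'ba']
keys = [''.join(sorted(w)) for w in a]  # sorted letters identify an anagram class

max_key, max_value = Counter(keys).most_common(1)[0]  # ('aabs', 3)
anagrams = [w for w, k in zip(a, keys) if k == max_key]

print(max_value)
print(' '.join(anagrams))
```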

You can create a dictionary mapping each sorted word to the list of its anagrams,
and then print the entry whose anagram list has the most elements:
from collections import defaultdict
words = ['ab','absa','sbaa','basa','ba']
wordToAnagram = defaultdict(list)
# sorted word -> list of anagrams
# the loop below creates {'ab': ['ab', 'ba'], 'aabs': ['absa', 'sbaa', 'basa']}
for word in words:
    s = "".join(sorted(word))
    wordToAnagram[s].append(word)
word, anagrams = max(wordToAnagram.items(), key=lambda x: len(x[1]))
print(len(anagrams))
print(" ".join(anagrams))
OUTPUT:
3
absa sbaa basa
Details
wordToAnagram
After iterating through words, the wordToAnagram dictionary looks like this:
{
    "ab": ["ab", "ba"],
    "aabs": ["absa", "sbaa", "basa"]
}
dictionary.items()
wordToAnagram.items() returns the dictionary's key-value pairs as tuples,
where
key: is the sorted string, "ab" or "aabs",
value: is the list of anagrams, e.g. for key = "ab", value equals ["ab", "ba"]
dict_items([('ab', ['ab', 'ba']), ('aabs', ['absa', 'sbaa', 'basa'])])
max function using 'key' and a lambda expression
max(wordToAnagram.items(), key=lambda x: len(x[1]))
finds the maximum item of the wordToAnagram.items() iterable by comparing the lengths of the anagram lists (len(x[1])).

You can try NumPy together with mode from the statistics module:
import numpy as np
from statistics import mode
words = ['ab','absa','sbaa','basa','ba']
# This sorts the letters of each word, and creates a list of them
sorted_words = [''.join(sorted(word)) for word in words]
max_freq_anagrams = np.array(words)[np.array(sorted_words) == mode(sorted_words)]
# mode(sorted_words) gives you the (sorted) word with the highest frequency
# np.array(sorted_words) == mode(sorted_words) gives you a list of true/false
# and finally you index your words with this true/false array
print(len(max_freq_anagrams))
print(list(max_freq_anagrams))
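In case you have several equally frequent groups, e.g.
words = ['ab','absa','sbaa','basa','ba', 'ba']
then instead of mode(sorted_words) use max(set(sorted_words), key=sorted_words.count), which returns one of the most frequent sorted words; since set iteration order is arbitrary, which one is not guaranteed. (Before Python 3.8, mode() raised StatisticsError on such ties; since 3.8 it returns the first mode encountered.)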
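If every tied group should be printed rather than just one, statistics.multimode (available since Python 3.8) returns all of the most frequent values in first-seen order. A sketch without NumPy, using the tied example above (keys, top_keys and groups are illustrative names):

```python
from statistics import multimode  # available in Python 3.8+

words = ['ab', 'absa', 'sbaa', 'basa', 'ba', 'ba']
keys = [''.join(sorted(w)) for w in words]  # anagram class of each word

top_keys = multimode(keys)  # every most frequent key, in first-seen order
groups = [[w for w, k in zip(words, keys) if k == key] for key in top_keys]

for group in groups:
    print(len(group))
    print(' '.join(group))
```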

Longest word in sentence, potential equality of lengths

sentence = 'Cunning fox peels apples.'.strip('.')
def longest_word(target):
    set = max(target.split(), key=len)
    temp = [x for x in set]
    count = 0
    for i in range(len(temp)):
        if temp[i].isalpha() == True:
            count += 1
    return set, count
print(longest_word(sentence))
The code works if the longest word in the sentence is strictly longer than every other word. However, how should I adjust the code if the sentence is something like:
sentence = 'Black bananas and green tomatos are red.'
How can I report that there are n words of equal length? Counting the characters in one of those words is obviously enough, but
set = max(sentence.split(),key=len)
returns only the first of the longest words.

Use the itertools module. Its groupby() function groups consecutive items of an iterable by a key function, in this case len(), which is why the words are sorted by length first:
>>> import itertools
>>> sentence = 'Black bananas and green tomatos are red.'
>>> words = sorted(sentence.strip(".").split(), key=len)
>>> groups = [list(g) for k,g in itertools.groupby(words, len)]
>>> groups
[['and', 'are', 'red'], ['Black', 'green'], ['bananas', 'tomatos']]
>>> groups[-1]
['bananas', 'tomatos']

You probably need to make two passes over the list of words: once to get the maximum length, and then to select all words whose length matches that maximum:
def longest_words(targets):
    targets = targets.split()
    max_len = max(len(item) for item in targets)
    return set(item for item in targets if len(item) == max_len)
Quick test:
In [17]: sentence = 'Black bananas and green tomatos are red.'
In [18]: longest_words(sentence.strip('.'))
Out[18]: {'bananas', 'tomatos'}

You can first get the maximum length, then collect every word of that length:
sentence = 'Black bananas and green tomatos are red.'.strip('.')
def longest_word(target):
    words = target.split()
    max_len = max([len(w) for w in words])
    return max_len, [w for w in words if len(w) == max_len]
print(longest_word(sentence))
You can also define a custom length function that counts only letters:
def word_len(word):
    return len([c for c in word if c.isalpha()])
and replace len with word_len in the previous example.
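Putting the two suggestions together, a sketch with word_len substituted for len lets the period stay in the question's first sentence: with plain len, 'apples.' would tie with 'Cunning' at 7 characters, while counting only letters leaves 'Cunning' as the single longest word.

```python
def word_len(word):
    # Count only alphabetic characters, so trailing punctuation is ignored.
    return sum(1 for c in word if c.isalpha())

def longest_word(target):
    words = target.split()
    max_len = max(word_len(w) for w in words)
    return max_len, [w for w in words if word_len(w) == max_len]

print(longest_word('Cunning fox peels apples.'))  # (7, ['Cunning'])
```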

My function returns a list with the longest word or words.
def longest(sentence):
    ordered = sorted(sentence.split(), key=len)
    l = len(ordered[-1])  # the last, longest element
    m = [ordered[-1]]
    for elt in ordered[-2::-1]:
        if len(elt) < l:
            return m
        m.append(elt)
    return m
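For comparison, a single-pass sketch (the name longest_single_pass is illustrative) avoids sorting and keeps tied words in their original order; the sort-based function above returns ties in reverse order, e.g. ['tomatos', 'bananas'].

```python
def longest_single_pass(sentence):
    best_len, best = 0, []
    for word in sentence.split():
        n = len(word)
        if n > best_len:      # strictly longer: start a new list
            best_len, best = n, [word]
        elif n == best_len:   # same length: keep it as well
            best.append(word)
    return best

print(longest_single_pass('Black bananas and green tomatos are red'))
# ['bananas', 'tomatos']
```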

Extracting multiple substrings from a string

I have a complicated string and would like to extract multiple substrings from it.
The string consists of a set of items separated by commas. Each item has an identifier (id1, id2, ...) followed by a pair of words enclosed in parentheses. I want only the word inside the parentheses that has a number attached to its end (e.g. 'This-1'). The number indicates the position the word should take when the words are arranged after extraction.
# Example of how the individual items look:
id1(attr1, is-2) #The number 2 here indicates word 'is' should be in position 2
id2(attr2, This-1) #The number 1 here indicates word 'This' should be in position 1
id3(attr3, an-3) #The number 3 here indicates word 'an' should be in position 3
id4(attr4, example-4) #The number 4 here indicates word 'example' should be in position 4
id5(attr5, example-4) #This is a duplicate of the word 'example'
# Example string: this is what the full string with the items looks like
string = "id1(attr1, is-1), id2(attr2, This-2), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"
#This is how the result should look after extraction
result = 'This is an example'
Is there an easier way to do this? Regex doesn't work for me.

A trivial/naive approach:
>>> from collections import defaultdict
>>> s = string  # the string from the question
>>> z = [x.split(',')[1].strip().strip(')') for x in s.split('),')]
>>> d = defaultdict(list)
>>> for i in z:
...     b = i.split('-')
...     d[b[1]].append(b[0])
...
>>> ' '.join(' '.join(d[t]) for t in sorted(d.keys(), key=int))
'is This an example example'
Your sample string has two entries at position 4, which is why example is repeated in the output.
Your sample also does not match your requirements (it has is-1 and This-2), but this result follows your description: the words are arranged by their position indicators.
Now, if you want to get rid of duplicates:
>>> ' '.join(e for t in sorted(d.keys(), key=int) for e in set(d[t]))
'is This an example'

Why not regex? This works (with re imported):
In [44]: s = "id1(attr1, is-2), id2(attr2, This-1), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"
In [45]: z = [(m.group(2), m.group(1)) for m in re.finditer(r'(\w+)-(\d+)\)', s)]
In [46]: [x for y, x in sorted(set(z))]
Out[46]: ['This', 'is', 'an', 'example']
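One caveat with this regex answer: the positions are compared as strings, so a position of 10 would sort before 2. A sketch that converts positions with int() and lets a dict drop repeated positions (by_pos and result are illustrative names) might be:

```python
import re

s = "id1(attr1, is-2), id2(attr2, This-1), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"

# findall() with two groups returns (word, position) tuples;
# converting the position to int makes 10 sort after 2.
by_pos = {int(pos): word for word, pos in re.findall(r'(\w+)-(\d+)\)', s)}
result = ' '.join(by_pos[p] for p in sorted(by_pos))
print(result)  # This is an example
```

If two items share a position, the dict keeps the last word seen for it.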

OK, how about this:
sample = ("id1(attr1, is-2), id2(attr2, This-1), "
          "id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)")
def make_cryssie_happy(s):
    words = {}  # we will use this dict later
    # we only want items like This-1, an-3, etc.
    ll = s.split(',')[1::2]
    for item in ll:
        tt = item.replace(')', '').lstrip()
        (word, pos) = tt.split('-')
        # there can only be one word at a particular position;
        # using a dict with the positions as keys
        # is an alternative to using sets
        words[pos] = word
    # sort the keys numerically: dict order follows insertion, not position;
    # create a list of the values of the dict in sorted order
    res = [words[i] for i in sorted(words, key=int)]
    # return a nice string
    return ' '.join(res)
print(make_cryssie_happy(sample))

Append to List Nested in Dictionary

I am trying to append to lists nested in a dictionary so I can see which letters follow each letter. The result I want is at the bottom. Why doesn't my output match it?
word = 'google'
word_map = {}
word_length = len(word)
last_letter = word_length - 1
for index, letter in enumerate(word):
    if index < last_letter:
        if letter not in word_map.keys():
            word_map[letter] = list(word[index+1])
        if letter in word_map.keys():
            word_map[letter].append(word[index+1])
    if index == last_letter:
        word_map[letter] = None
print(word_map)
desired_result = {'g':['o', 'l'], 'o':['o', 'g'], 'l':['e'],'e':None}
print(desired_result)

Use the standard library to your advantage:
from itertools import zip_longest
from collections import defaultdict
s = 'google'
d = defaultdict(list)
for l1, l2 in zip_longest(s, s[1:], fillvalue=None):
    d[l1].append(l2)
print(d)
The first trick here is to yield the letters pairwise (with a None at the end). That's exactly what zip_longest(s, s[1:], fillvalue=None) does (in Python 2 it was called izip_longest). From there, it's a simple matter of appending the second letter to the list stored under the first letter. The defaultdict saves us from checking whether the key is already in the dict. Note that the last letter ends up mapped to [None] rather than None.

if letter not in word_map.keys():
    word_map[letter] = list(word[index+1])
# now letter IS in word_map, so this also executes:
if letter in word_map.keys():
    word_map[letter].append(word[index+1])
You meant:
if letter not in word_map.keys():
    word_map[letter] = list(word[index+1])
else:
    word_map[letter].append(word[index+1])
Another thing: what if the last letter also occurs in the middle of the word?
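A sketch that addresses this (follower_map is a hypothetical name): build the lists from consecutive letter pairs first, and map a letter to None only if it never has a successor. That produces the desired dict for 'google' and keeps the list of a last letter that also appears earlier in the word.

```python
from collections import defaultdict

def follower_map(word):
    followers = defaultdict(list)
    for current, nxt in zip(word, word[1:]):  # consecutive letter pairs
        followers[current].append(nxt)
    # dict.get() does not trigger the defaultdict factory, so letters that
    # never have a successor map to None; dict.fromkeys keeps first-seen order.
    return {letter: followers.get(letter) for letter in dict.fromkeys(word)}

print(follower_map('google'))
# {'g': ['o', 'l'], 'o': ['o', 'g'], 'l': ['e'], 'e': None}
```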
