Finding words that are inside successive words - python

def sucontain(A):
    C = A.split()
    def magic(x):
        B = [C[i]==C[i+1] for i in range(len(C)-1)]
        return any(B)
    N = [x for x in C if magic(x)]
    return N
Phrase = "So flee fleeting candy can and bandage"
print (sucontain(Phrase))
The goal of this function is to create a list of the words that are inside each successive word. For example, the function would take the string "So flee fleeting candy can and bandage" as input and return ['flee', 'and'], because 'flee' is inside 'fleeting' (the next word) and 'and' is inside 'bandage'. If no cases like these are found, an empty list [] should be returned. My code right now is returning [] instead of ['flee', 'and']. Can someone point out what I'm doing wrong? Thank you.

Just pair the consecutive words, then it becomes an easy list comprehension…
>>> s = "So flee fleeting candy can and bandage"
>>> words = s.split()
>>> [i for i, k in zip(words, words[1:]) if i in k]
['flee', 'and']

There is definitely something wrong with your magic function. It accepts x as an argument but doesn't use it anywhere.
Here is an alternate version that doesn't use an additional function:
def sucontain(A):
    C = A.split()
    return [w for i, w in enumerate(C[:-1]) if w in C[i+1]]
The enumerate() function allows us to loop over the indices and the values together, which makes it very straightforward to perform the test. C[i+1] is the next value and w is the current value, so w in C[i+1] checks whether the current value is contained in the next value. We use C[:-1] to make sure that we stop one before the last item; otherwise C[i+1] would result in an IndexError.
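To make the diagnosis of magic concrete: it ignores x and compares adjacent words for equality, so every word gets the same answer. A minimal sketch that keeps the original helper-function structure is shown below; the helper name inside_next is made up for illustration and is not from the question.

```python
def sucontain(text):
    words = text.split()

    def inside_next(i):
        # True when the word at index i is a substring of the word after it
        return words[i] in words[i + 1]

    # Stop one short of the end so words[i + 1] always exists
    return [words[i] for i in range(len(words) - 1) if inside_next(i)]

print(sucontain("So flee fleeting candy can and bandage"))
```

The helper now takes the position it is asked about, which is the piece the original magic was missing.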

Looking ahead can be problematic. Instead of testing whether the current word is in the next one, check to see whether the previous word is in the current one. This almost always makes things simpler.
Also, use descriptive variable names instead of C and A and x and B and N and magic.
def succotash(text): # okay, so that isn't very descriptive
    lastword = " " # space won't ever be in a word
    results = []
    for currentword in text.split():
        if lastword in currentword:
            results.append(lastword) # keep the contained word, not the one containing it
        lastword = currentword
    return results
print(succotash("So flee fleeting candy can and bandage"))

Related

Iterating through elements in a list from different index positions

This should be an easy one but I have simply not come to a solution.
This is the exercise:
Start with 4 words “comfortable”, “round”, “support”, “machinery”, return a list of all possible 2 word combinations.
Example: ["comfortable round", "comfortable support", "comfortable machinery", ...]
I have started coding a loop that would go through every element, starting with the element at index[0] :
words = ["comfortable, ", 'round, ', 'support, ', 'machinery, ']
index_zero= words[0]
for i in words:
    words = index_zero + i
    words_one = index_one + i
    print(words)
>>> Output=
comfortable, comfortable,
comfortable, round,
comfortable, support,
comfortable, machinery
The issue is when I want to start iterating from the 2nd element ('round'). I have tried operating the indexes (index[0] + 1) but of course, it won't return anything as the elements are strings.
I know a conversion from string to indexes needs to take place, but I'm not sure how.
I have also tried defining a function, but it will return None
word_list = ["comfortable, ", 'round, ', 'support, ', 'machinery, ']
index_change = word_list[0]+ 1
def word_variations(set_of_words):
    for i in set_of_words:
        set_of_words = set_of_words[0] + i
set_of_words = word_variations(word_list)
print(set_of_words)
I think this would do what you're looking for:
def word_variations(word_list):
    combinations = []
    for first_word in word_list:
        for second_word in word_list:
            if first_word != second_word:
                combinations.append(f'{first_word}, {second_word}')
    return combinations
word_list = ["comfortable", "round", "support", "machinery"]
print(word_variations(word_list))
Explanation:
You need to include a return statement at the end of the function to return a value. In my example function word_variations(), I first define an empty list called combinations. This will store each combination we compute. Then I iterate through all the words in the input word_list, create another inner loop to iterate through all words again, and if the first_word does not equal the second_word append the combination to my combinations list. Once all loops are complete, return the finished list from the function.
If I slightly change the code to print each of the results on a new line:
def word_variations(word_list):
    combinations = []
    for first_word in word_list:
        for second_word in word_list:
            if first_word != second_word:
                combinations.append(f'{first_word}, {second_word}')
    return combinations
word_list = ["comfortable", "round", "support", "machinery"]
for combo in word_variations(word_list):
    print(combo)
the output is:
comfortable, round
comfortable, support
comfortable, machinery
round, comfortable
round, support
round, machinery
support, comfortable
support, round
support, machinery
machinery, comfortable
machinery, round
machinery, support
If you want to work with indexes in a Python loop like that, you should use either enumerate or iterate over the length of the list. The following examples start the loop at the second element.
Example getting both the index and the word at once with enumerate (start=1 keeps i aligned with the original list, since enumerating the slice on its own would restart counting at 0):
for i, word in enumerate(set_of_words[1:], start=1):
Example using only indexes:
for i in range(1, len(set_of_words)):
Note: set_of_words[1:] above is a slice that returns the list starting at the second element.
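To connect this back to the original exercise, here is a minimal sketch of starting the inner loop from a different index position on each pass. The function name pairs_from_positions is made up for illustration, and the sketch assumes each pair should appear only once regardless of order.

```python
def pairs_from_positions(words):
    """Pair each word with every word that comes after it in the list."""
    pairs = []
    for i, first in enumerate(words):
        # words[i + 1:] starts just past the current position
        for second in words[i + 1:]:
            pairs.append(f"{first} {second}")
    return pairs

word_list = ["comfortable", "round", "support", "machinery"]
print(pairs_from_positions(word_list))
```

If both orders are wanted ("round comfortable" as well as "comfortable round"), the nested loop from the earlier answer is the better fit.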
You can also use itertools.permutations() like this:
from itertools import permutations
lst = ['comfortable', 'round', 'support', 'machinery']
for i in permutations(lst, 2):
    print(i)
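permutations() yields tuples, while the exercise asks for strings such as "comfortable round". A small sketch of joining them follows, with combinations() alongside for the case where order should not matter; the variable names ordered and unordered are just for illustration.

```python
from itertools import combinations, permutations

lst = ['comfortable', 'round', 'support', 'machinery']

# Every ordered pair of distinct words, joined into one string
ordered = [' '.join(pair) for pair in permutations(lst, 2)]

# Each pair once, ignoring order
unordered = [' '.join(pair) for pair in combinations(lst, 2)]

print(len(ordered), len(unordered))
```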

fast way to search for a set of words in a list of words python

I have a set of fixed words of size 20. I have a large file of 20,000 records, where each record contains a string and I want to find if any word from the fixed set is present in a string and if present the index of the word.
example
s1 = set(["barely", "rarely", "hardly"])  # (actual size 20)
l2 = ["i hardly visit", "i do not visit", "i can barely talk"]  # (actual size 20,000)
def get_token_index(token,indx):
    if token in s1:
        return indx
    else:
        return -1
def find_word(text):
    tokens=nltk.word_tokenize(text)
    indexlist=[]
    for i in range(0,len(tokens)):
        indexlist.append(i)
    word_indx=map(get_token_index,tokens,indexlist)
    for indx in word_indx:
        if indx !=-1:
            pass  # Do Something with tokens[indx]
I want to know if there is a better/faster way to do it.
This suggestion only removes some glaring inefficiencies, but won't affect the overall complexity of your solution:
def find_word(text, s1=s1): # micro-optimization, make s1 local
    tokens = nltk.word_tokenize(text)
    for i, word in enumerate(tokens):
        if word in s1:
            pass  # Do something with `word` and `i`
Essentially, you are slowing things down by using map when all you really need is a condition inside your loop body anyway... So basically, just get rid of get_token_index, it is over-engineered.
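As a rough, self-contained sketch of the same idea that returns the hits instead of acting on them: str.split() is used here only as a stand-in for nltk.word_tokenize so the example runs without NLTK, and the name FIXED_WORDS and the return shape are assumptions, not part of the question.

```python
FIXED_WORDS = {"barely", "rarely", "hardly"}

def find_word(text, fixed_words=FIXED_WORDS):
    # Assumption: whitespace splitting stands in for nltk.word_tokenize
    tokens = text.split()
    # Set membership is a constant-time check on average
    return [(i, token) for i, token in enumerate(tokens) if token in fixed_words]

for sentence in ["i hardly visit", "i do not visit", "i can barely talk"]:
    print(sentence, find_word(sentence))
```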
You can use list comprehension with a double for loop:
s1=set(["barely","rarely", "hardly"])
l2 = ["i hardly visit", "i do not visit", "i can barely talk"]
locations = [c for c, b in enumerate(l2) for a in s1 if a in b]
In this example, the output would be:
[0, 2]
However, if you would like a way of accessing the indexes at which a certain word appears:
from collections import defaultdict
d = defaultdict(list)
for word in s1:
    for index, sentence in enumerate(l2):
        if word in sentence:
            d[word].append(index)
This should work:
for string in l2:
    words = string.split(' ')
    for s in s1:
        if s in words:
            print("%s at index %d" % (s, words.index(s)))
A simple and slightly more efficient way would be to use a generator expression:
index_tuple = list(i for i, sentence in enumerate(l2) for word in s1 if word in sentence.split())
You can time it and check how efficiently this works for your requirement.
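For the timing itself, the standard-library timeit module is one option. The sketch below is only illustrative: the function locate and its return shape are made up for this example, and the three-sentence list is far smaller than the real 20,000 records, so the number it prints says little about the real workload.

```python
import timeit

fixed_words = {"barely", "rarely", "hardly"}
sentences = ["i hardly visit", "i do not visit", "i can barely talk"]

def locate(sentences, fixed_words):
    """Return (sentence index, token index, word) for every hit."""
    hits = []
    for row, sentence in enumerate(sentences):
        for col, token in enumerate(sentence.split()):
            if token in fixed_words:
                hits.append((row, col, token))
    return hits

# Time 1000 calls; swap in any of the approaches above to compare them
elapsed = timeit.timeit(lambda: locate(sentences, fixed_words), number=1000)
print(f"1000 runs: {elapsed:.4f} s")
```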

Python: finding the two words following a key word

I'm sure I am missing something obvious here, but I have been staring at this code for a while and cannot find the root of the problem.
I want to search through many strings, find all the occurrences of certain keywords, and for each of these hits, to retrieve (and save) the two words immediately preceding and following the keywords.
So far the code I have finds those words, but when there is more than one occurrence of the keyword in a string, the code returns two different lists. How can I aggregate those lists at the observation/string level (so that I can match it back to string i)?
Here is a mock example of a sample and desired results. Keyword is "not":
review_list=['I like this book.', 'I do not like this novel, no, I do not.']
results= [[], ['I do not like this I do not']]
Current results (using code below) would be:
results = [[], ['I do not like this'], ['I do not']]
Here is the code (simplified version):
for i in review_list:
    if (" not " or " neither ") in i:
        z = i.split(' ')
        for x in [x for (x, y) in enumerate(z) if find_not in y]:
            neg_1=[(' '.join(z[max(x-numwords,0):x+numwords+1]))]
            neg1.append(neg_1)
    elif (" not " or " neither ") not in i:
        neg_1=[]
        neg1.append(neg_1)
Again, I am certain this is basic, but as a new Python user, any help will be greatly appreciated. Thanks!
To extract only words (removing punctuation) e.g from a string such as
'I do not like this novel, no, I do not.'
I recommend regular expressions:
import re
words = re.findall(r'\w+', somestring)
To find all indices at which one word equals not:
indices = [i for i, w in enumerate(words) if w=='not']
To get the two previous and two following words as well, I recommend a set to remove duplications:
allindx = set()
for i in indices:
    for j in range(max(0, i-2), min(i+3, len(words))):
        allindx.add(j)
and finally to get all the words in question into a space-joined string:
result = ' '.join(words[i] for i in sorted(allindx))
Now of course we can put all these tidbits together into a function...:
import re
def twoeachside(somestring, keyword):
    words = re.findall(r'\w+', somestring)
    indices = [i for i, w in enumerate(words) if w == keyword]
    allindx = set()
    for i in indices:
        for j in range(max(0, i-2), min(i+3, len(words))):
            allindx.add(j)
    result = ' '.join(words[i] for i in sorted(allindx))
    return result
Of course, this function works on a single sentence. To make a list of results from a list of sentences:
review_list = ['I like this book.', 'I do not like this novel, no, I do not.']
results = [twoeachside(s, 'not') for s in review_list]
assert results == ['', 'I do not like this I do not']
the last assert of course just being a check of what the code returns :-)
EDIT: actually judging from the example you somewhat absurdly require the results' items to be lists with a single string item if non-empty but empty lists if the string in them would be empty. This absolutely weird spec can of course also be met...:
results = [twoeachside(s, 'not') for s in review_list]
results = [[s] if s else [] for s in results]
it just makes no sense whatsoever, but hey!, it's your spec!-)
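The question mentions more than one keyword ("not" and "neither"). Note that in the question's test `(" not " or " neither ") in i`, the parenthesized expression evaluates to " not " alone, so " neither " is never checked. Below is a hedged sketch that generalizes the function above to a set of keywords and a configurable window; the name context_window and the parameters keywords and width are made up for illustration, and lowercasing the words before matching is an added assumption.

```python
import re

def context_window(text, keywords, width=2):
    """Join every keyword hit with up to `width` words on each side."""
    words = re.findall(r'\w+', text)
    keep = set()
    for i, word in enumerate(words):
        if word.lower() in keywords:
            # Overlapping windows collapse because indices live in a set
            keep.update(range(max(0, i - width), min(i + width + 1, len(words))))
    return ' '.join(words[i] for i in sorted(keep))

review_list = ['I like this book.', 'I do not like this novel, no, I do not.']
results = [context_window(s, {'not', 'neither'}) for s in review_list]
print(results)
```

Each review still maps to exactly one entry in results, which keeps the output aligned with review_list.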

Find + Find next in Python

Let L be a list of strings.
Here is the code I use for finding a string texttofind in the list L.
texttofind = 'Bonjour'
for s in L:
    if texttofind in s:
        print('Found!')
        print(s)
        break
How would you do a Find next feature? Do I need to store the index of the previously found string?
One approach for huge lists would be to use a generator. Suppose you do not know whether the user will need the next match.
def string_in_list(s, entities):
    """Return elements of entities that contain given string."""
    for e in entities:
        if s in e:
            yield e
huge_list = ['you', 'say', 'hello', 'I', 'say', 'goodbye'] # ...
matches = string_in_list('y', huge_list) # look for strings with letter 'y'
next(matches) # first match
next(matches) # second match
The other answers suggesting list comprehensions are great for short lists when you want all results immediately. The nice thing about this approach is that if you never need the third result no time is wasted finding it. Again, it would really only matter for big lists.
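One detail a "Find next" button has to handle is running out of matches: a bare next() raises StopIteration once the generator is exhausted. A minimal sketch using the two-argument form of next(), which returns a default instead (the variable names first, second and third are just for illustration):

```python
def string_in_list(s, entities):
    """Yield elements of entities that contain the given string."""
    for e in entities:
        if s in e:
            yield e

matches = string_in_list('y', ['you', 'say', 'hello'])
first = next(matches, None)   # first match
second = next(matches, None)  # second match
third = next(matches, None)   # no more matches, so the default is returned
print(first, second, third)
```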
Update: If you want the cycle to restart at the first match, you could do something like this...
def string_in_list(s, entities):
    idx = 0
    while idx < len(entities):
        if s in entities[idx]:
            yield entities[idx]
        idx += 1
        if idx >= len(entities):
            # restart from the beginning
            idx = 0
huge_list = ['you', 'say', 'hello']
m = string_in_list('y', huge_list)
next(m) # you
next(m) # say
next(m) # you, again
See How to make a repeating generator for other ideas.
Another Update
It's been years since I first wrote this. Here's a better approach using itertools.cycle:
from itertools import cycle # will repeat after end
# look for s in items of huge_list
matches = cycle(i for i in huge_list if s in i)
next(matches)
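One thing worth checking with either cycling version is the no-match case. The hand-written loop above never yields and never exits when nothing matches, whereas cycle() over an empty iterable is simply exhausted. A small sketch, with the helper name cycling_matches made up for illustration:

```python
from itertools import cycle

def cycling_matches(s, items):
    """Cycle endlessly over the items that contain s (hypothetical helper)."""
    return cycle([item for item in items if s in item])

m = cycling_matches('y', ['you', 'say', 'hello'])
first, second, third = next(m), next(m), next(m)

# With no matches, cycle() is exhausted at once, so a default avoids StopIteration
nothing = next(cycling_matches('z', ['you', 'say', 'hello']), None)
print(first, second, third, nothing)
```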
Finding all strings in L which have as substring s.
[f for f in L if s in f]
If you want to find all indexes of strings in L which have s as a substring,
[i for i in range(0, len(L)) if L[i].find(s) >= 0]
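On the question of whether to store the index of the previous hit: that is one straightforward way to do it. A minimal sketch, assuming a helper named find_next (made up here) that returns -1 when nothing further is found:

```python
def find_next(items, text, start=0):
    """Return the index of the first item at or after `start` containing text, else -1."""
    for i in range(start, len(items)):
        if text in items[i]:
            return i
    return -1

L = ['Bonjour tout le monde', 'Hello', 'Bonjour encore', 'Salam']
first = find_next(L, 'Bonjour')              # search from the beginning
second = find_next(L, 'Bonjour', first + 1)  # resume just past the previous hit
third = find_next(L, 'Bonjour', second + 1)  # nothing left
print(first, second, third)
```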
This will find the next item if it exists. You can wrap it in a function and return None or an empty string if it doesn't.
L = ['Hello', 'Hola', 'Bonjour', 'Salam']
for l in L:
    if l == texttofind:
        print(l)
        if L.index(l) < len(L) - 1:
            print(L[L.index(l)+1])

Recursion and appending to lists

I'm having trouble with a program that takes one word and, changing one letter at a time, converts it into the target word. Keep in mind that each intermediate word must be a legal word according to a dictionary of words that I've been given.
I'm having trouble figuring out how to make it recursive. The program also has a limit on the number of steps it may take.
The output needs to be a list. So if the parameters for the function changeling are
changeling("find","lose"), the output should be:
['find','fine','line','lone','lose'].
with my current code:
def changeling(word,target,steps):
    holderlist=[]
    i=0
    if steps<0 and word!=target:
        return None
    if steps!=-1:
        for items in wordList:
            if len(items)==len(word):
                i=0
                if items!=word:
                    for length in items:
                        if i==1:
                            if items[1]==target[1] and items[0]==word[0] and items[2:]==word[2:]:
                                if items==target:
                                    print("Target Achieved")
                                    holder.list.append(target)
                                holderlist.append(items)
                                holderlist.append(changeling(items,target,steps-1))
                        elif i>0 and i<len(word)-1 and i!=1:
                            if items[i]==target[i] and items[0:i]==word[0:i] and items[i+1:]==word[i+1:]:
                                if items==target:
                                    print("Target Achieved")
                                holderlist.append(items)
                                holderlist.append(changeling(items,target,steps-1))
                        elif i==0:
                            if items[0]==target[0] and items[1:]==word[1:]:
                                if items==target:
                                    print("Target Achieved")
                                holderlist.append(items)
                                holderlist.append(changeling(items,target,steps-1))
                        elif i==len(word)-1:
                            if items[len(word)-1]==target[len(word)-1] and items[0:len(word)-1]==word[0:len(word)-1]:
                                if items==target:
                                    print("Target Achieved")
                                holderlist.append(items)
                                holderlist.append(changeling(items,target,steps-1))
                        else:
                            return None
                        i+=1
    return holderlist
I receive a messy output:
['fine', ['line', ['lone', ['lose', []]]], 'fond', []]
I get the answer I wanted, but I'm not sure how to (a) clean it up so that there are no lists within lists, and (b) get rid of 'fond'. It appears because when find is called it gives both fine and fond; fine is the one that leads to the target word and fond fails, but I'm not sure how to remove it once I've appended it to holderlist.
Any help would be appreciated.
Cheers.
I'm not completely convinced that using extend instead of append will solve all your problems, because it seems like that may not account for making a change that does not lead to solving the word and requires backtracking.
If it turns out I am correct and the other answers don't end up working, here is a recursive function that will convert your current result into what you are looking for:
def flatten_result(nested_list, target):
    if not nested_list:
        return None
    for word, children in zip(nested_list[::2], nested_list[1::2]):
        if word == target:
            return [word]
        children_result = flatten_result(children, target)
        if children_result:
            return [word] + children_result
    return None
>>> result = ['fine', ['line', ['lone', ['lose', []]]], 'fond', []]
>>> flatten_result(result, 'lose')
['fine', 'line', 'lone', 'lose']
If you're trying to add a list to a list, you probably want extend rather than append.
http://docs.python.org/library/stdtypes.html#mutable-sequence-types
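A quick illustration of the difference, using words from the question (the variable names nested and flat are just for this example):

```python
nested = ['fine']
nested.append(['line', 'lone'])  # adds the list itself as a single element

flat = ['fine']
flat.extend(['line', 'lone'])    # adds each element of the list

print(nested)
print(flat)
```

Note that extend() raises a TypeError if it is handed None, so a recursive call that can return None needs a check before its result is extended onto the list.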
Here's an alternative implementation. It doesn't use recursion, but instead permutations. It's been rewritten to pass the wordlist rather than rely on the global wordlist, which should make it more portable. This implementation relies on generators, too, which ensures a smaller memory footprint than expanding lists (as in the extend/append solution).
import itertools
somelists = [['find','fine','line','lone','lose'],
             ['bank','hank','hark','lark','lurk'],
             ['tank','sank','sink','sing','ding']]
def changeling(word, target, wordlist):
    def difference(word, target):
        return len([i for i in range(len(word)) if word[i] != target[i]])
    for length in range(1, len(wordlist) + 1):
        for possibilities in (j for j in itertools.permutations(wordlist, length) if j[0] == word and j[-1] == target):
            # computes all permutations and discards those whose initial word and target word don't match parameters
            if all(difference(possibilities[i], possibilities[i+1]) == 1 for i in range(0, len(possibilities) - 1)):
                # checks that all words are exactly one character different from previous link
                return possibilities
                # returns first result that is valid; this can be changed to yield if you wish to have all results
for w in somelists:
    print("from '%s' to '%s' using only %s" % (w[-2], w[0], w))
    print(changeling(w[-2], w[0], w))
    print()
w[-2], w[0] can be modified/replaced to match any words you choose
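Since the question asks specifically for recursion with a step limit, here is a rough sketch of a depth-limited recursive search that returns a flat list directly. It is not the asker's algorithm: it accepts any one-letter change to a legal word rather than requiring the changed letter to match the target, and the helper differs_by_one and the wordlist and seen parameters are introduced here for illustration.

```python
def differs_by_one(a, b):
    """True when a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def changeling(word, target, steps, wordlist, seen=None):
    """Return a flat path from word to target using at most `steps` changes, or None."""
    if word == target:
        return [word]
    if steps <= 0:
        return None
    seen = (seen or set()) | {word}  # words already on this path
    for candidate in wordlist:
        if candidate not in seen and differs_by_one(word, candidate):
            rest = changeling(candidate, target, steps - 1, wordlist, seen)
            if rest is not None:
                # Only a branch that reaches the target is kept
                return [word] + rest
    return None  # dead end: the caller tries its next candidate

words = ['find', 'fine', 'fond', 'line', 'lone', 'lose']
print(changeling('find', 'lose', 4, words))
```

Because a failed branch returns None and is never appended, dead ends such as 'fond' do not show up in the result, and building the path with [word] + rest avoids the nested lists.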
