Alter the letter in list of strings - python

An example:
word_list = ["a", "is", "bus", "on", "the"]
alter_the_list("A bus station is where a bus stops A train station is where a train stops On my desk I have a work station", word_list)
print("1.", word_list)
word_list = ["a", 'up', "you", "it", "on", "the", 'is']
alter_the_list("It is up to YOU", word_list)
print("2.", word_list)
word_list = ["easy", "come", "go"]
alter_the_list("Easy come easy go go go", word_list)
print("3.", word_list)
word_list = ["a", "is", "i", "on"]
alter_the_list("", word_list)
print("4.", word_list)
word_list = ["a", "is", "i", "on", "the"]
alter_the_list("May your coffee be strong and your Monday be short", word_list)
print("5.", word_list)
def alter_the_list(text, word_list):
    return [text for text in word_list if text in word_list]
I'm trying to remove from the list of words every word that appears as a separate word in the string of text. The text should be converted to lower case before checking; the elements of the list of words are all in lower case already. There is no punctuation in the string of text, and each word in the list of words is unique. I don't know how to fix it.
output:
1. ['a', 'is', 'bus', 'on', 'the']
2. ['a', 'up', 'you', 'it', 'on', 'the', 'is']
3. ['easy', 'come', 'go']
4. ['a', 'is', 'i', 'on']
5. ['a', 'is', 'i', 'on', 'the']
expected:
1. ['the']
2. ['a', 'on', 'the']
3. []
4. ['a', 'is', 'i', 'on']
5. ['a', 'is', 'i', 'on', 'the']

I've done it like this:
def alter_the_list(text, word_list):
    for word in text.lower().split():
        if word in word_list:
            word_list.remove(word)
text.lower().split() returns a list of all space-separated tokens in text.
The key is that you're required to alter word_list. It is not enough to return a new list; you have to use Python 3's list methods to modify the list in-place.

If the order of the resulting list does not matter you can use sets:
def alter_the_list(text, word_list):
    word_list[:] = set(word_list).difference(text.lower().split())
This function will update word_list in place due to the assignment to the list slice with word_list[:] = ...
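If the order does matter, the same slice-assignment idea can be combined with a list comprehension. This is only a sketch under the assumptions stated in the question (no punctuation, lower-case entries in word_list); the name `seen` is introduced here for illustration.

```python
def alter_the_list(text, word_list):
    # Words present in the text, lower-cased for the comparison.
    seen = set(text.lower().split())
    # Slice assignment replaces the contents of the same list object,
    # so the caller's list changes; the comprehension keeps the original order.
    word_list[:] = [word for word in word_list if word not in seen]


word_list = ["a", "up", "you", "it", "on", "the", "is"]
alter_the_list("It is up to YOU", word_list)
print(word_list)  # ['a', 'on', 'the']
```

Building the set once keeps each membership test cheap, which may matter for long texts.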

1
Your main problem is that you return a value from your function, but then ignore it. You have to save it in some way to print out, such as:
word_list = ["easy", "come", "go"]
word_out = alter_the_list("Easy come easy go go go", word_list)
print("3.", word_out)
What you printed is the original word list, not the function result.
2
You ignore the text parameter to the function. You reuse the variable name as a loop index in your list comprehension. Get a different variable name, such as
return[word for word in word_list if word in word_list]
3
You still have to involve text in the logic of the list you build. Remember that you're looking for words that are not in the given text.
Most of all, learn basic debugging.
See this lovely debug blog for help.
If nothing else, learn to use simple print statements to display the values of your variables, and to trace program execution.
Does that get you moving toward a solution?

I like @Simon's answer better, but if you want to do it in two list comprehensions:
def alter_the_list(text, word_list):
    # Pull out all words found in the word list (the text is compared in lower case)
    c = [w for w in word_list for t in text.lower().split() if t == w]
    # Find the difference of the two lists
    return [w for w in word_list if w not in c]

Related

Stemmer function that takes a string and returns the stems of each word in a list

I am trying to create this function which takes a string as input and returns a list containing the stem of each word in the string. The problem is, that using a nested for loop, the words in the string are appended multiple times in the list. Is there a way to avoid this?
def stemmer(text):
    stemmed_string = []
    res = text.split()
    suffixes = ('ed', 'ly', 'ing')
    for word in res:
        for i in range(len(suffixes)):
            if word.endswith(suffixes[i]):
                stemmed_string.append(word[:-len(suffixes[i])])
            elif len(word) > 8:
                stemmed_string.append(word[:8])
            else:
                stemmed_string.append(word)
    return stemmed_string
If I call the function on this text ('I have a dog that is barking'), this is the output:
['I',
'I',
'I',
'have',
'have',
'have',
'a',
'a',
'a',
'dog',
'dog',
'dog',
'that',
'that',
'that',
'is',
'is',
'is',
'barking',
'barking',
'bark']
You are appending something in each round of the loop over suffixes. To avoid the problem, don't do that.
It's not clear if you want to add the shortest possible string out of a set of candidates, or how to handle stacked suffixes. Here's a version which always strips as much as possible.
def stemmer(text):
    stemmed_string = []
    suffixes = ('ed', 'ly', 'ing')
    for word in text.split():
        for suffix in suffixes:
            if word.endswith(suffix):
                word = word[:-len(suffix)]
        stemmed_string.append(word)
    return stemmed_string
Notice the fixed syntax for looping over a list, too.
This will reduce "sparingly" to "spar", etc.
Like every naïve stemmer, this will also do stupid things with words like "sly" and "thing".
Demo: https://ideone.com/a7FqBp
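If the intent was instead to strip at most one suffix per word and keep the question's 8-character truncation as a fallback, a for/else loop is one way to append exactly once per word. This is a sketch of that reading, not necessarily what the asker wanted; the `suffixes` and `max_len` parameters are added here for illustration.

```python
def stemmer(text, suffixes=("ing", "ed", "ly"), max_len=8):
    stems = []
    for word in text.split():
        for suffix in suffixes:
            if word.endswith(suffix):
                word = word[:-len(suffix)]
                break  # stop after the first suffix that matches
        else:
            # No suffix matched: fall back to truncating long words.
            word = word[:max_len]
        stems.append(word)  # exactly one append per input word
    return stems


print(stemmer("I have a dog that is barking"))
# ['I', 'have', 'a', 'dog', 'that', 'is', 'bark']
```

With this variant "sparingly" becomes "sparing" rather than "spar", because the loop stops at the first matching suffix.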

Create dictionary of context words without stopwords

I am trying to create a dictionary of words in a text and their context. The context should be the list of words that occur within a 5 word window (two words on either side) of the term's position in the string. Effectively, I want to ignore the stopwords in my output vectors.
My code is below. I can get the stopwords out of my dictionary's keys but not the values.
from itertools import count
words = ["This", "is", "an", "example", "sentence"]
stopwords = ["it", "the", "was", "of"]
context_size = 2
stripes = {word:words[max(i - context_size,0):j] for word,i,j in zip(words,count(0),count(context_size+1)) if word.lower() not in stopwords}
print(stripes)
the output is:
{'example': ['is', 'an', 'example', 'sentence'], 'sentence': ['an', 'example', 'sentence']}
words = ["This", "is", "a", "longer", "example", "sentence"]
stopwords = set(["it", "the", "was", "of", "is", "a"])
context_size = 2
stripes = []
for index, word in enumerate(words):
    if word.lower() in stopwords:
        continue
    i = max(index - context_size, 0)
    j = min(index + context_size, len(words) - 1) + 1
    context = words[i:index] + words[index + 1:j]
    stripes.append((word, context))
print(stripes)
I would recommend using a list of tuples, so that if a word occurs more than once in words, the later occurrence does not overwrite the earlier one as it would in a dict. I would also put the stopwords in a set, especially if it's a larger list like NLTK's stopwords, since membership tests on a set are faster.
I also excluded the word itself from the context but depending on how you want to use it you might want to include it.
This results in:
[('This', ['is', 'a']), ('longer', ['is', 'a', 'example', 'sentence']), ('example', ['a', 'longer', 'sentence']), ('sentence', ['longer', 'example'])]
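Note that the contexts above still contain stopwords such as 'is' and 'a'. If they should be absent from the values as well, one option is to filter them out before windowing. The sketch below assumes the window is counted over the remaining words (if it should be counted over the original positions, filter each context list instead); `build_contexts` is a hypothetical helper name.

```python
def build_contexts(words, stopwords, context_size=2):
    # Drop stopwords first so they appear neither as keys nor in any context.
    kept = [w for w in words if w.lower() not in stopwords]
    contexts = []
    for index, word in enumerate(kept):
        start = max(index - context_size, 0)
        # Neighbours on the left and right, excluding the word itself.
        neighbours = kept[start:index] + kept[index + 1:index + 1 + context_size]
        contexts.append((word, neighbours))
    return contexts


words = ["This", "is", "a", "longer", "example", "sentence"]
stopwords = {"it", "the", "was", "of", "is", "a"}
print(build_contexts(words, stopwords))
# [('This', ['longer', 'example']), ('longer', ['This', 'example', 'sentence']),
#  ('example', ['This', 'longer', 'sentence']), ('sentence', ['longer', 'example'])]
```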

how to get a list with words that are next to a specific word in a string in python

Assuming I have a string
string = 'i am a person i believe i can fly i believe i can touch the sky'.
What I would like to do is to get all the words that are next to (from the right side) the word 'i', so in this case am, believe, can, believe, can.
How could I do that in Python? I found this, but it only gives the first word, so in this case, 'am'.
Simple generator method:
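def get_next_words(text, match, sep=' '):
    words = iter(text.split(sep))
    for word in words:
        if word == match:
            # A trailing match has no successor; end the generator cleanly
            # (a bare StopIteration here becomes RuntimeError on Python 3.7+).
            try:
                yield next(words)
            except StopIteration:
                return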
Usage:
text = 'i am a person i believe i can fly i believe i can touch the sky'
words = get_next_words(text, 'i')
for w in words:
    print(w)
# am
# believe
# can
# believe
# can
You can write a regular expression to find the words after the target word:
import re
word = "i"
string = 'i am a person i believe i can fly i believe i can touch the sky'
pat = re.compile(r'\b{}\b \b(\w+)\b'.format(word))
print(pat.findall(string))
# ['am', 'believe', 'can', 'believe', 'can']
One way is to use a regular expression with a look behind assertion:
>>> import re
>>> string = 'i am a person i believe i can fly i believe i can touch the sky'
>>> re.findall(r'(?<=\bi )\w+', string)
['am', 'believe', 'can', 'believe', 'can']
You can split the string and get the next index of the word "i" as you iterate with enumerate:
string = 'i am a person i believe i can fly i believe i can touch the sky'
sl = string.split()
all_is = [sl[i + 1] for i, word in enumerate(sl[:-1]) if word == 'i']
print(all_is)
# ['am', 'believe', 'can', 'believe', 'can']
Note that, as @PatrickHaugh pointed out, we need to be careful if "i" is the last word, which is why the iteration runs over sl[:-1] and skips the last word entirely.
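An alternative way to handle the last-word case is to pair each word with its successor using zip. This is just a sketch; `words_after` is a name made up for this example.

```python
def words_after(text, target):
    words = text.split()
    # zip stops at the shorter input, so a trailing target has no pair and is skipped.
    return [nxt for cur, nxt in zip(words, words[1:]) if cur == target]


text = 'i am a person i believe i can fly i believe i can touch the sky'
print(words_after(text, 'i'))
# ['am', 'believe', 'can', 'believe', 'can']
```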
import re
string = 'i am a person i believe i can fly i believe i can touch the sky'
words = [w.split()[0] for w in re.split('i +', string) if w]
print(words)

How to pull a word out of a list then shuffle the word?

So I want to pull a word out of a list, then I want to jumble the word so I will have to guess what the mixed up word is. Once guessed correctly, will move onto the next word.
My code so far:
import random
words = ['Jumble', 'Star', 'Candy', 'Wings', 'Power', 'String', 'Shopping', 'Blonde', 'Steak', 'Speakers', 'Case', 'Stubborn', 'Cat', 'Marker', 'Elevator', 'Taxi', 'Eight', 'Tomato', 'Penguin', 'Custard']
from random import shuffle
shuffle(words)
Start with your code:
import random
words = ['Jumble', 'Star', 'Candy', 'Wings', 'Power', 'String', 'Shopping', 'Blonde', 'Steak', 'Speakers', 'Case', 'Stubborn', 'Cat', 'Marker', 'Elevator', 'Taxi', 'Eight', 'Tomato', 'Penguin', 'Custard']
Use random.choice() to grab a single word:
chosen = random.choice(words)
If you want to grab multiple words and deal with them one at a time, you can use random.sample() instead. If you want to deal with every single word, you can use random.shuffle() as you've already done and then iterate over words (with for word in words: ...).
Finally, shuffle the word:
letters = list(chosen) # list of all the letters in chosen, in order
random.shuffle(letters)
scrambled = ''.join(letters)
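Putting these pieces together for the guessing game described in the question might look roughly like the following. It is a sketch only: `scramble` and `is_correct` are hypothetical helper names, the word list is shortened, and the interactive input() loop is left as a comment.

```python
import random


def scramble(word):
    letters = list(word)
    random.shuffle(letters)  # shuffles in place and returns None
    return ''.join(letters)


def is_correct(guess, answer):
    # Ignore case and surrounding whitespace when checking a guess.
    return guess.strip().lower() == answer.lower()


words = ['Jumble', 'Star', 'Candy']
random.shuffle(words)
for word in words:
    print(scramble(word))
    # In an interactive game, keep calling input() here until
    # is_correct(guess, word) is True, then move on to the next word.
```

A shuffle can occasionally return the letters in their original order, so a real game may want to reshuffle when scramble(word) == word.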
try something like
for w in words:
    s = list(w)
    shuffle(s)
    print(''.join(s))
Convert each word into a list before you shuffle it, then join the letters back together.
Please note that shuffle() changes its argument in place and always returns None.
As Kevin suggested, you can pick a random word with random.choice() and scramble that word in the same way.
I think you are trying to do something like:
sorted(words[2], key=lambda k: random.random())
# ['y', 'd', 'a', 'C', 'n']
you can join it by :
''.join(sorted(words[2], key=lambda k: random.random()))
# 'andCy'
You do not need to import shuffle separately; once random is imported, it is available as random.shuffle.

Convert a list of string sentences to words

I'm essentially trying to take a list of strings containing sentences such as:
sentence = ['Here is an example of what I am working with', 'But I need to change the format', 'to something more useable']
and convert it into the following:
word_list = ['Here', 'is', 'an', 'example', 'of', 'what', 'I', 'am',
'working', 'with', 'But', 'I', 'need', 'to', 'change', 'the', 'format',
'to', 'something', 'more', 'useable']
I tried using this:
for item in sentence:
    for word in item:
        word_list.append(word)
I thought it would take each string and append each item of that string to word_list, however the output is something along the lines of:
word_list = ['H', 'e', 'r', 'e', ' ', 'i', 's' .....etc]
I know I am making a stupid mistake but I can't figure out why, can anyone help?
You need str.split() to split each string into words:
word_list = [word for line in sentence for word in line.split()]
Just .split and .join:
word_list = ' '.join(sentence).split(' ')
You haven't told it how to distinguish a word. By default, iterating through a string simply iterates through the characters.
You can use .split(' ') to split a string by spaces. So this would work:
for item in sentence:
    for word in item.split(' '):
        word_list.append(word)
for item in sentence:
    for word in item.split():
        word_list.append(word)
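The two loops above differ only in split(' ') versus split(). The difference shows up when a string has repeated or trailing spaces, as this small example illustrates (the sample string is made up for the demonstration):

```python
line = 'to  something more useable '  # note the double space and the trailing space

# Splitting on a single space keeps empty strings between adjacent separators.
print(line.split(' '))  # ['to', '', 'something', 'more', 'useable', '']

# Splitting with no argument treats any run of whitespace as one separator.
print(line.split())     # ['to', 'something', 'more', 'useable']
```

For sentence data of uncertain cleanliness, split() with no argument is usually the safer choice.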
Join the sentences and split the result into words:
print(' '.join(sentence).split())
