Word list from text file


I need to create a word list from a text file. The list is going to be used in a hangman program and needs to exclude the following:
duplicate words
words containing fewer than 5 letters
words that contain 'xx' as a substring
words that contain uppercase letters
The word list then needs to be written to a file so that every word appears on its own line.
The program also needs to output the number of words in the final list.
This is what I have, but it's not working properly.
def MakeWordList():
    infile=open(('possible.rtf'),'r')
    whole = infile.readlines()
    infile.close()
    L=[]
    for line in whole:
        word= line.split(' ')
        if word not in L:
            L.append(word)
        if len(word) in range(5,100):
            L.append(word)
        if not word.endswith('xx'):
            L.append(word)
        if word == word.lower():
            L.append(word)
    print L
MakeWordList()

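You're appending the word several times with this code.
You aren't actually filtering out any words, just adding each one a different number of times depending on how many of the ifs it passes.
You should combine all the ifs:
if word not in L and len(word) >= 5 and 'xx' not in word and word.islower():
    L.append(word)
Or, if you want it more readable, you can split them:
if word not in L and len(word) >= 5:
    if 'xx' not in word and word.islower():
        L.append(word)
But don't append after each one. Also note that line.split(' ') returns a list of the words on the line, not a single word, so you need to loop over that list and apply the test to each word (line.split() with no argument also ignores repeated spaces and the trailing newline).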
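Putting that together, here is a minimal sketch of the loop with the combined condition. It assumes the file contents have already been read into a string, loops over line.split() so that each item tested is a single word rather than a list, and keeps the words in the order they first appear. The function name make_word_list and the sample text are made up for this example.

```python
def make_word_list(text):
    """Return the unique words in text that pass the hangman filters, in first-seen order."""
    words = []
    for line in text.splitlines():
        # line.split() yields the individual words on the line
        for word in line.split():
            # One combined condition, so each word is appended at most once
            if word not in words and len(word) >= 5 and 'xx' not in word and word.islower():
                words.append(word)
    return words


sample = "apple banana apple\nBanana taxxi cherry kiwi\n"
print(make_word_list(sample))  # ['apple', 'banana', 'cherry']
```

Checking word not in words scans the list each time, which is fine for a hangman-sized list; the set-based version below avoids that cost but gives up the original order.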

Think about it: in your nested if-statements, ANY word that is not already in the list will make it through on your first line. Then if it is 5 or more characters, it will get added again (I bet), and again, etc. You need to rethink your logic in the if statements.

Improved code:
def MakeWordList():
    with open('possible.rtf','r') as f:
        data = f.read()
    return set([word for word in data.split() if len(word) >= 5 and word.islower() and 'xx' not in word])
set(iterable) returns a set object, which has no duplicates (all set items must be unique). [word for word ...] is a list comprehension, which is a shorter way of creating simple lists. data.split() splits the file contents on any whitespace, so you iterate over every word in the file whether or not the words are on separate lines (iterating over the string data itself would give you single characters). The condition if len(word) >= 5 and word.islower() and 'xx' not in word handles the other three requirements (at least 5 letters, no uppercase letters, and no 'xx').
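The question also asks for the list to be written to a file, one word per line, and for the number of words to be reported. The sketch below builds on the set-based version. It assumes the input is plain text: a real .rtf file also contains RTF control words such as \rtf1, which data.split() would hand to the filter as if they were words. The function name write_word_list and both file names are hypothetical, and temporary files are used only so the example runs on its own.

```python
import os
import tempfile


def write_word_list(in_path, out_path):
    """Filter the words in in_path, write them one per line to out_path, return the count."""
    with open(in_path, 'r') as f:
        data = f.read()
    # data.split() yields whole words; the set comprehension drops duplicates
    unique = {word for word in data.split()
              if len(word) >= 5 and word.islower() and 'xx' not in word}
    # Sets have no defined order, so sort to get a predictable file
    words = sorted(unique)
    with open(out_path, 'w') as f:
        for word in words:
            f.write(word + '\n')
    return len(words)


# Demo with throwaway files; the file names are hypothetical
tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, 'possible.txt')
out_path = os.path.join(tmp_dir, 'wordlist.txt')
with open(in_path, 'w') as f:
    f.write("apple banana apple\nBanana taxxi cherry kiwi\n")

count = write_word_list(in_path, out_path)
print(count)  # 3
```

The output file then contains apple, banana and cherry, each on its own line.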

Related

Correct one-liner list comprehension

I have the following list and string:
words = ['AIBONITO', 'BICINIUM', 'LIMONIUM', 'PICKNICK', 'SILENIUM', 'TITANIUM']
letters = 'ADEOLR'
I want to delete items in the list that contain a letter in the string. The following code does just that.
code:
for letter in letters:
    for word in words:
        if letter in word:
            words.remove(word)
print(words)
output:
['BICINIUM', 'PICKNICK']
Now I would like to convert it to a one-liner. I tried to do it by using the following code:
print([words for letter in letters for word in words if letter not in word])
This gives me a list with 12 items all containing ['BICINIUM', 'PICKNICK']. What do I need to change in the one-liner to obtain the same output as the first piece of code?
I know I can add "[0]" at the end of the one-liner but that's not really clean.

Using all() lets you do that:
words = ['AIBONITO', 'BICINIUM', 'LIMONIUM', 'PICKNICK', 'SILENIUM', 'TITANIUM']
letters = 'ADEOLR'
result = [word for word in words if all(letter not in word for letter in letters)]
yields:
['BICINIUM', 'PICKNICK']
An alternative uses a set of letters for the same result:
letters = set('ADEOLR')
result = [word for word in words if letters.isdisjoint(word)]

letters = set(letters)
[word for word in words if len(set(word).intersection(letters))==0]
#['BICINIUM', 'PICKNICK']
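Why the attempted one-liner repeated the list: the expression before the first for is what gets collected, and it was words (the whole list), added once for every (letter, word) pair that passed the test. The sketch below collects word instead and tests all the letters at once. It shows the conditions from the answers above plus an equivalent not any() form. It also shows why removing items from a list while looping over it, as the question's loop does, is fragile: each removal shifts the remaining items left, so the item right after a removed one is skipped. The demo list is made up for illustration.

```python
words = ['AIBONITO', 'BICINIUM', 'LIMONIUM', 'PICKNICK', 'SILENIUM', 'TITANIUM']
letters = 'ADEOLR'

# Collect `word` (not `words`), and test every letter in a single condition
with_all = [word for word in words if all(letter not in word for letter in letters)]
with_any = [word for word in words if not any(letter in word for letter in letters)]
with_set = [word for word in words if set(letters).isdisjoint(word)]
print(with_all)  # ['BICINIUM', 'PICKNICK']

# Removing while iterating skips the element after each removal
demo = ['AX', 'AY']
for w in demo:
    if 'A' in w:
        demo.remove(w)
print(demo)  # ['AY'], although it also contains 'A'
```

Building a new list with a comprehension never mutates the list being iterated, so it avoids the skipping problem entirely.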

Code for creating an array of longest words not working [Python]

I am trying to write python code that takes words from an array and makes a new array that includes all of the longest words of the previous array. I can't find where the problem is, but whenever I run this, it just eats a ton of RAM but doesn't work (it should print the words Good and Cool). Does anyone see the problem?
words = ["Good", "Bad", "Cool"]
def longest_word():
    longest = [""]
    for word in words:
        for word2 in longest:
            if len(word) > len(word2):
                longest.clear()
                longest.append(word)
            elif len(word) == len(word2):
                longest.append(word)
    print(str(longest_word))
longest_word()

You could do this in a list comprehension using the max function like this:
words = ["Good", "Bad", "Cool"]
max_length = len(max(words,key=len))
longest = [word for word in words if len(word) == max_length]
The max function will return the string with the most characters because of the key=len argument, and then you compare the length of each word in your list to the length of the returned string from max. If the lengths match then it adds the word to the longest list.
Edit: Changed my answer to reflect @NathanielFord's suggestion

While I wouldn't go about it this way, you're pretty close with a few simplifications:
words = ["Good", "Bad", "Cool"]
def longest_word():
    longest = [] # Don't add an empty word here
    l = len(words[0]) # Remember the length of the first word
    for word in words:
        if len(word) > l: # If the next word is longer than the longest length found so far...
            longest = [] # Reset your list
            longest.append(word) # Add the word
            l = len(word) # Remember the new longest length
        elif len(word) == l:
            longest.append(word)
    print(longest)
longest_word()
There are two reasons not to iterate over the longest list. The first is that you're actively modifying that list while looping over it, which leads to strange results: once "Cool" matches the length of "Good", every append gives the inner loop another element to visit, so the loop never ends and the list keeps growing (that is where your RAM goes). Secondly, it's much more computationally expensive; you might get O(n^2) time in the worst case. The only information you need is the length of the longest word so far, so remember that and update it instead of rechecking each element of your longest list. (After all, you know all those elements have the same length!) Also note that print(str(longest_word)) prints the function object itself; print(longest) prints the list.

You could just loop through once getting the length of each word and saving it to a longestWord variable if it's greater. Then loop through a second time to add it to the array if it's the same length as the longestWord variable.
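The two-pass idea from this answer, sketched as a function. The name longest_words is made up for this example, and the length is kept in a plain integer rather than a longestWord variable; an empty input simply returns an empty list.

```python
def longest_words(words):
    """Return every word whose length equals the maximum length (two passes)."""
    # First pass: find the greatest length
    longest_length = 0
    for word in words:
        if len(word) > longest_length:
            longest_length = len(word)
    # Second pass: keep every word of that length
    return [word for word in words if len(word) == longest_length]


print(longest_words(["Good", "Bad", "Cool"]))  # ['Good', 'Cool']
```

Both passes are linear, so the whole thing is O(n), and nothing is modified while it is being iterated.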

Python: Iterate through string and print only specific words

I'm taking a class in python and now I'm struggling to complete one of the tasks.
The aim is to ask for an input, iterate through that string and print only words that start with letters > g. If the word starts with a letter larger than g, we print that word. Otherwise, we empty the word and iterate through the next word(s) in the string to do the same check.
This is the code I have, and the output. Would be grateful for some tips on how to solve the problem.
# [] create words after "G" following the Assignment requirements use of functions, methods and keywords
# sample quote "Wheresoever you go, go with all your heart" ~ Confucius (551 BC - 479 BC)
# [] copy and paste in edX assignment page
quote = input("Enter a sentence: ")
word = ""
# iterate through each character in quote
for char in quote:
    # test if character is alpha
    if char.isalpha():
        word += char
    else:
        if word[0].lower() >= "h":
            print(word.upper())
        else:
            word=""
Enter a sentence: Wheresoever you go, go with all your heart
WHERESOEVER
WHERESOEVERYOU
WHERESOEVERYOUGO
WHERESOEVERYOUGO
WHERESOEVERYOUGOGO
WHERESOEVERYOUGOGOWITH
WHERESOEVERYOUGOGOWITHALL
WHERESOEVERYOUGOGOWITHALLYOUR
The output should look like this:
WHERESOEVER
YOU
WITH
YOUR
HEART

Simply a list comprehension with split will do:
s = "Wheresoever you go, go with all your heart"
print(' '.join([word for word in s.split() if word[0].lower() > 'g']))
# Wheresoever you with your heart
Modifying to match with the desired output (Making all uppercase and on new lines):
s = "Wheresoever you go, go with all your heart"
print('\n'.join([word.upper() for word in s.split() if word[0].lower() > 'g']))
'''
WHERESOEVER
YOU
WITH
YOUR
HEART
'''
Without list comprehension:
s = "Wheresoever you go, go with all your heart"
for word in s.split(): # Split the sentence into words and iterate through each.
    if word[0].lower() > 'g': # Check if the first character (lowercased) > g.
        print(word.upper()) # If so, print the word all capitalised.

Here is a readable and commented solution. The idea is first to split the sentence into a list of words using re.findall (from the re regex module) and iterate through this list, instead of iterating over each character as you did. It is then quite easy to print only the words starting with a letter greater than 'g':
import re
# Prompt for an input sentence
quote = input("Enter a sentence: ")
# Split the sentence into a list of words
words = re.findall(r'\w+', quote)
# Iterate through each word
for word in words:
    # Print the word if its 1st letter is greater than 'g'
    if word[0].lower() > 'g':
        print(word.upper())
To go further, here is also the one-line style solution based on exactly the same logic, using list comprehension:
import re
# Prompt for an input sentence
quote = input("Enter a sentence: ")
# Print each word starting with a letter greater than 'g', in upper case
print(*[word.upper() for word in re.findall(r'\w+', quote) if word[0].lower() > 'g'], sep='\n')

import string
s = "Wheresoever you go, go with all your heart"
out = s.translate(str.maketrans(string.punctuation, " "*len(string.punctuation)))
desired_result = [word.upper() for word in out.split() if word and word[0].lower() > 'g']
print(*desired_result, sep="\n")

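Your problem is that you're only resetting word to an empty string in the else clause. You also need to reset it immediately after the print(word.upper()) statement for the code as you've written it to work correctly. Once you do that, a comma followed by a space leaves word empty, so check that word is non-empty before looking at word[0] (otherwise you get an IndexError), and note that the last word, heart, is never checked because no non-letter character follows it.
That being said, if it's not explicitly disallowed for the class you're taking, you should look into string methods, specifically str.split().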
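A sketch of the character-by-character approach with those fixes applied, keeping the structure of the question's code. The function name words_after_g and the trick of appending a space so the last word is also checked are choices made for this example, not part of the assignment.

```python
def words_after_g(quote):
    """Return the uppercased words of quote whose first letter comes after 'g'."""
    result = []
    word = ""
    # The extra trailing space makes sure the last word is checked too
    for char in quote + " ":
        if char.isalpha():
            word += char
        else:
            # Skip empty words produced by consecutive non-letters such as ", "
            if word and word[0].lower() >= "h":
                result.append(word.upper())
            word = ""  # reset after every word, printed or not
    return result


for w in words_after_g("Wheresoever you go, go with all your heart"):
    print(w)
```

With the sample quote this prints WHERESOEVER, YOU, WITH, YOUR and HEART, one per line, matching the expected output.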

Removing an item from a list

I would like to ignore duplicates in my list. For example, let's say the function checks for words that end with a '.' and puts them in a list. I would like to make sure that duplicate words don't go in the list.
Here is what I have so far:
def endwords(sent):
    list = []
    words = sent.split()
    for word in words:
        if "." in word:
            list.append(word)
        # bottom if statement does not work for some reason. that's the one I am trying to fix
        if (word == list):
            list.remove(word)
    return list

How about you check if the word is already in the list before appending it, like so:
def endwords(sent):
    wordList = []
    words = sent.split()
    for word in words:
        if "." in word and word not in wordList:
            wordList.append(word)
    return wordList
You're trying to check if word == list, but that checks whether the word is equal to the entire list. To check if an element is in a container in Python, you can use the in keyword. Alternatively, to check if something is not in a container, you can use not in.
Another option is to use a set:
def endwords(sent):
    wordSet = set()
    words = sent.split()
    for word in words:
        if "." in word:
            wordSet.add(word)
    return wordSet
And to make things a little cleaner, here is a version using set comprehension:
def endwords(sent):
    return {word for word in sent.split() if '.' in word}
If you want to get a list out of this function, you can do so like this:
def endwords(sent):
    return list({word for word in sent.split() if '.' in word})
Since you said in your question you want to check if the word ends with a '.', you probably also want to use the str.endswith() method like so:
def endwords(sent):
    return list({word for word in sent.split() if word.endswith('.')})

After the statement
list = []
you can't use the built-in list class in that function anymore, because the name list now refers to your own empty list; tracking down that kind of bug can take an hour or so, which is why we avoid using the names of built-ins for our own objects.
More at this answer.
function checks for words that end with a '.'
The statement
"." in word
checks whether word contains a dot anywhere (e.g. "." in "sample.text" is True even though "sample.text" doesn't end with a dot). If you need to check that it ends with a dot, use the str.endswith method.
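A quick way to see the difference between the two tests, using the same "sample.text" example. The tuple form of str.endswith, which checks several suffixes at once, is an extra shown only for illustration.

```python
word = "sample.text"

# `in` looks for the substring anywhere in the word
contains_dot = "." in word          # True
# str.endswith only looks at the end of the word
ends_with_dot = word.endswith(".")  # False

# str.endswith also accepts a tuple of suffixes, e.g. any sentence-ending mark
ends_sentence = "done!".endswith((".", "!", "?"))  # True

print(contains_dot, ends_with_dot, ends_sentence)
```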
I would like to make sure that duplicate words don't go in the list.
Just make sure, before storing a word, that it hasn't been stored already.
Finally we can write
def endwords(sent, end='.'):
    unique_words = []
    words = sent.split()
    for word in words:
        if word.endswith(end) and word not in unique_words:
            unique_words.append(word)
    return unique_words
Test
>>> sent = ' '.join(['some.', 'oth.er'] * 10)
>>> unique_words = endwords(sent)
>>> unique_words
['some.']
P. S.
If order doesn't matter – use set, it will take care of duplicates (works only with hashable types, str is hashable):
def endwords(sent, end='.'):
    unique_words = set()
    words = sent.split()
    for word in words:
        if word.endswith(end) and word not in unique_words:
            unique_words.add(word)
    return unique_words
or with set comprehension
def endwords(sent, end='.'):
    words = sent.split()
    return {word for word in words if word.endswith(end)}

You can add a simple membership check for that:
def endwords(sent):
    list = []
    words = sent.split()
    for word in words:
        if "." in word:
            if word not in list:
                list.append(word)
    return list

Why not use a set?
def endwords(sent):
    my_list = set()
    words = sent.split()
    for word in words:
        if "." in word:
            my_list.add(word)
    return my_list

The less verbose way to do it would be using list comprehension, that is
my_list = [word for word in words if '.' in word]
And to ensure the elements aren't duplicated, just use set.
my_list = set(my_list) # No more duplicated values
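If you want the duplicates removed but also need to keep the words in the order they first appear (a plain set doesn't guarantee any order), one option is dict.fromkeys, since dictionary keys are unique and, from Python 3.7 on, keep insertion order. A sketch reusing the endwords name and the endswith check from the answers above:

```python
def endwords(sent, end='.'):
    """Return the distinct words of sent that end with `end`, in first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(word for word in sent.split() if word.endswith(end)))


print(endwords("one. two three. one. two."))  # ['one.', 'three.', 'two.']
```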

Search strings in list containing specific letters in random order

I am writing code in Python 2.7 in which I have defined a list of strings. I then want to search this list's elements for a set of letters. The letters can appear in any order, i.e. search the list for every single letter from the input.
I have been googling around but I haven't found a solution.
Here's what I've got:
wordlist = ['mississippi','miss','lake','que']
letters = 'aqk'
for item in wordlist:
    if item.find(letters) != -1:
        print item
This is an example. Here the only output should be 'lake' and 'que', since these words contain 'a', 'q' or 'k'.
How can I rewrite my code so that this will be done?
Thanks in advance!
Alex

It would be easy using set():
wordlist = ['mississippi','miss','lake','que']
letters = set('aqk')
for word in wordlist:
    if letters & set(word):
        print word
Output:
lake
que
Note: The & operator does an intersection between the two sets.

for item in wordlist:
    for character in letters:
        if character in item:
            print item
            break

Here goes your solution:
for item in wordlist:
    b = False
    for c in letters:
        b = b | (item.find(c) != -1)
    if b:
        print item

[word for word in wordlist if any(letter in word for letter in 'aqk')]

Using sets and the in syntax to check.
wordlist = ['mississippi','miss','lake','que']
letters = set('aqk')
for word in wordlist:
    if any(letter in word for letter in letters):
        print word
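The question can be read two ways: words containing any of the letters (which matches the expected 'lake' and 'que') or words containing all of them in any order. A Python 3 sketch of both readings using standard set operations: isdisjoint for "shares at least one letter" and <= for "is a subset". The extra word 'quake' is made up only to show a match for the second reading.

```python
wordlist = ['mississippi', 'miss', 'lake', 'que']
letters = set('aqk')

# Words that contain at least one of the letters
any_match = [word for word in wordlist if not letters.isdisjoint(word)]

# Words that contain every one of the letters, in any order
all_match = [word for word in wordlist if letters <= set(word)]

# 'quake' is an extra, made-up word that does contain all three letters
quake_match = [word for word in ['quake', 'lake'] if letters <= set(word)]

print(any_match)    # ['lake', 'que']
print(all_match)    # []
print(quake_match)  # ['quake']
```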
