I would like to ignore duplicates in my list. For example, let's say the function checks for words that end with a '.' and puts them in a list. I would like to make sure that duplicate words don't go in the list.
Here is what I have so far:
def endwords(sent):
    list = []
    words = sent.split()
    for word in words:
        if "." in word:
            list.append(word)
        # The if statement below does not work for some reason; that's the one I am trying to fix.
        if (word == list):
            list.remove(word)
    return list
How about you check if the word is already in the list before appending it, like so:
def endwords(sent):
    wordList = []
    words = sent.split()
    for word in words:
        if "." in word and word not in wordList:
            wordList.append(word)
    return wordList
You're trying to check if word == list, but that tests whether the word is equal to the entire list. To check if an element is in a container in Python, you can use the in keyword. Alternatively, to check if something is not in a container, you can use not in.
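To make the difference concrete, here is a small sketch comparing the two checks. The variable names and sample values are made up for illustration and are not taken from the question.

```python
# Illustrative values only.
word_list = ["end.", "stop."]
word = "end."

# Compares the string to the whole list object, so this is False.
equal_to_list = (word == word_list)

# Membership tests look at the elements of the list.
is_member = word in word_list
is_missing = "other." not in word_list

print(equal_to_list, is_member, is_missing)  # False True True
```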
Another option is to use a set:
def endwords(sent):
    wordSet = set()
    words = sent.split()
    for word in words:
        if "." in word:
            wordSet.add(word)
    return wordSet
And to make things a little cleaner, here is a version using set comprehension:
def endwords(sent):
    return {word for word in sent.split() if '.' in word}
If you want to get a list out of this function, you can do so like this:
def endwords(sent):
    return list({word for word in sent.split() if '.' in word})
Since you said in your question that you want to check whether the word ends with a '.', you probably also want to use the endswith() method, like so:
def endwords(sent):
    return list({word for word in sent.split() if word.endswith('.')})
After the statement
list = []
you can no longer use the built-in list class inside that function, and it can take an hour or so to figure out why. That's why we avoid using the names of built-ins for our own objects.
More at this answer.
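A minimal sketch of the shadowing problem described above. The function names shadowed and not_shadowed are hypothetical and exist only for this illustration.

```python
def shadowed():
    list = []                 # rebinds the name "list" to an empty list object
    list.append("a.")
    try:
        return list("abc")    # the name now refers to a list instance, not the class
    except TypeError as exc:
        return str(exc)

def not_shadowed():
    word_list = []            # a different name leaves the built-in untouched
    word_list.append("a.")
    return list("abc")

message = shadowed()
letters = not_shadowed()
print(message)
print(letters)  # ['a', 'b', 'c']
```

The first call reports a TypeError because a list instance is not callable; the second works because the built-in name was never rebound.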
function checks for words that end with a '.'
The statement
"." in word
checks whether the word contains a dot anywhere (e.g. "." in "sample.text" is True even though that string doesn't end with a dot). If you need to check that it ends with a dot, use the str.endswith method.
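The difference between the two tests can be seen on a few sample strings (the samples are invented for illustration):

```python
samples = ["sample.text", "end.", "plain"]

# "in" matches a dot at any position.
contains_dot = [w for w in samples if "." in w]

# str.endswith matches only a trailing dot.
ends_with_dot = [w for w in samples if w.endswith(".")]

print(contains_dot)   # ['sample.text', 'end.']
print(ends_with_dot)  # ['end.']
```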
I would like to make sure that duplicate words don't go in the list.
Just make sure, before storing a word, that it hasn't been stored already.
Finally we can write
def endwords(sent, end='.'):
    unique_words = []
    words = sent.split()
    for word in words:
        if word.endswith(end) and word not in unique_words:
            unique_words.append(word)
    return unique_words
Test
>>> sent = ' '.join(['some.', 'oth.er'] * 10)
>>> unique_words = endwords(sent)
>>> unique_words
['some.']
P. S.
If order doesn't matter – use set, it will take care of duplicates (works only with hashable types, str is hashable):
def endwords(sent, end='.'):
    unique_words = set()
    words = sent.split()
    for word in words:
        if word.endswith(end) and word not in unique_words:
            unique_words.add(word)
    return unique_words
or with set comprehension
def endwords(sent, end='.'):
    words = sent.split()
    return {word for word in words if word.endswith(end)}
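If the order of first appearance matters but you still want hashing to take care of duplicates, one option that is not part of the answer above is dict.fromkeys, which relies on dictionaries preserving insertion order (guaranteed since Python 3.7). The function name endwords_ordered is hypothetical.

```python
def endwords_ordered(sent, end='.'):
    # Keep only the words with the requested ending.
    matches = (word for word in sent.split() if word.endswith(end))
    # dict keys are unique and keep insertion order, so duplicates are dropped
    # while the first occurrence of each word keeps its position.
    return list(dict.fromkeys(matches))

print(endwords_ordered("b. a. b. c a."))  # ['b.', 'a.']
```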
You can add a simple check to the code from the question:
def endwords(sent):
    list = []
    words = sent.split()
    for word in words:
        if "." in word:
            if word not in list:
                list.append(word)
    return list
Why not use a set?
def endwords(sent):
    my_list = set()
    words = sent.split()
    for word in words:
        if "." in word:
            my_list.add(word)
    return my_list
The less verbose way to do it would be using list comprehension, that is
my_list = [word for word in words if '.' in word]
And to ensure the elements aren't duplicated, just use set.
my_list = set(my_list) # No more duplicated values
I have the following list and string:
words = ['AIBONITO', 'BICINIUM', 'LIMONIUM', 'PICKNICK', 'SILENIUM', 'TITANIUM']
letters = 'ADEOLR'
I want to delete items in the list that contain a letter in the string. The following code does just that.
code:
for letter in letters:
    for word in words:
        if letter in word:
            words.remove(word)
print(words)
output:
['BICINIUM', 'PICKNICK']
Now I would like to convert it to a one-liner. I tried to do it by using the following code:
print([words for letter in letters for word in words if letter not in word])
This gives me a list with 12 items all containing ['BICINIUM', 'PICKNICK']. What do I need to change in the one-liner to obtain the same output as the first piece of code?
I know I can add "[0]" at the end of the one-liner but that's not really clean.
Using all() lets you do that:
words = ['AIBONITO', 'BICINIUM', 'LIMONIUM', 'PICKNICK', 'SILENIUM', 'TITANIUM']
letters = 'ADEOLR'
result = [word for word in words if all(letter not in word for letter in letters)]
yields:
['BICINIUM', 'PICKNICK']
An alternative uses a set of letters for the same result:
letters = set('ADEOLR')
result = [word for word in words if letters.isdisjoint(word)]
letters = set(letters)
[word for word in words if len(set(word).intersection(letters))==0]
#['BICINIUM', 'PICKNICK']
I hope everyone is safe.
I am trying to go over a string and capitalize the first letter of every word in it.
I know I can use .title() but
a) I want to figure out how to use capitalize or something else in this case (the basics), and
b) the strings in the tests have some words with an apostrophe ('), which confuses .title() and makes it capitalize the letter after the (').
def to_jaden_case(string):
    appended_string = ''
    word = len(string.split())
    for word in string:
        new_word = string[word].capitalize()
        appended_string += str(new_word)
    return appended_string
The problem is the interpreter gives me "TypeError: string indices must be integers" even though I have an integer input in 'word'. Any help?
thanks!
You are doing some strange things in the code.
First, you split the string just to count the number of words, but don't store it to manipulate the words after that.
Second, when iterating a string with a for in, what you get are the characters of the string, not the words.
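The second point can be checked with a short sketch. The sample sentence is invented; the try block reproduces the kind of error reported in the question.

```python
sentence = "how can mirrors be real"

# Iterating over a string yields single characters.
chars = [item for item in sentence][:3]
# Iterating over the result of split() yields words.
words = sentence.split()[:3]

print(chars)  # ['h', 'o', 'w']
print(words)  # ['how', 'can', 'mirrors']

# Using one of those characters as an index raises the TypeError from the question.
try:
    sentence["h"]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```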
I have made a small snippet to help you do what you desire:
def first_letter_of_word_upper(string, exclusions=["a", "the"]):
    words = string.split()
    for i, w in enumerate(words):
        if w not in exclusions:
            words[i] = w[0].upper() + w[1:]
    return " ".join(words)
test = first_letter_of_word_upper("miguel angelo santos bicudo")
test2 = first_letter_of_word_upper("doing a bunch of things", ["a", "of"])
print(test)
print(test2)
Notes:
I assigned the value of the string splitting to a variable to use it in the loop
As a bonus, I included a list to allow you exclude words that you don't want to capitalize.
I use the same array of split words to build the result, and then join based on that array. This is a way to do it efficiently.
Also, I show some useful Python tricks... first is enumerate(iterable) that returns tuples (i, j) where i is the positional index, and j is the value at that position. Second, I use w[1:] to get a substring of the current word that starts at character index 1 and goes all the way to the end of the string. Ah, and also the usage of optional parameters in the list of arguments of the function... really useful things to learn! If you didn't know them already. =)
You have a logical error in your code:
word = len(string.split()) serves no purpose, and there is also a problem in the for loop logic.
Try this:
def to_jaden_case(string):
    appended_string = ''
    word_list = string.split()
    for i in range(len(word_list)):
        new_word = word_list[i].capitalize()
        appended_string += str(new_word) + " "
    return appended_string.rstrip()  # drop the trailing space
from re import findall
def capitalize_words(string):
    words = findall(r'\w+[\']*\w+', string)
    for word in words:
        string = string.replace(word, word.capitalize())
    return string
This just grabs all the words in the string, then replaces them in the original string with their capitalized form. The characters inside the [ ] will be included in the word as well.
You are using a string as an index into another string: word is a string (a single character), and accessing string[word] with it causes the error.
def to_jaden_case(string):
    appended_string = ''
    for word in string.split():
        new_word = word.capitalize()
        appended_string += new_word + ' '  # keep a space between words
    return appended_string.rstrip()  # drop the trailing space
Simple solution using map()
def to_jaden_case(string):
    return ' '.join(map(str.capitalize, string.split()))
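For comparison, here is a small sketch of how .title() and the split/capitalize/join approach treat an apostrophe. The sample text is invented for illustration.

```python
text = "they're not your friends"

# str.title() starts a new "word" after the apostrophe.
titled = text.title()

# Splitting on whitespace keeps "they're" as one word.
capitalized = ' '.join(map(str.capitalize, text.split()))

print(titled)       # They'Re Not Your Friends
print(capitalized)  # They're Not Your Friends
```

One thing to keep in mind: str.capitalize also lowercases the rest of each word, so "iPhone" becomes "Iphone".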
In for word in string: word will iterate over the characters in string. What you want to do is something like this:
def to_jaden_case(string):
    appended_string = ''
    splitted_string = string.split()
    for word in splitted_string:
        new_word = word.capitalize()
        appended_string += new_word
    return appended_string
The output for to_jaden_case("abc def ghi") is now "AbcDefGhi", which is CamelCase. I suppose you actually want "Abc Def Ghi". To achieve that, you can do:
def to_jaden_case(string):
    appended_string = ''
    splitted_string = string.split()
    for word in splitted_string:
        new_word = word.capitalize()
        appended_string += new_word + " "
    return appended_string[:-1] # removes the last space.
Look, in your code word is a character of string, not an index, so you can't use string[word]. You can correct this by modifying your loop or by using word instead of string[word].
So your rectified code will be:
def to_jaden_case(string):
    appended_string = ''
    for word in range(len(string)):
        new_word = string[word].capitalize()
        appended_string += str(new_word)
    return appended_string
Here I changed the third line from for word in string to for word in range(len(string)); this gives you the index of each character, which you can then use with string[word].
I also removed the split line, because it's unnecessary here. Note that this version capitalizes every character, so it avoids the TypeError but does not yet capitalize only the first letter of each word.
I have the following dictionary in Python:
myDict = {"how":"como", "you?":"tu?", "goodbye":"adios", "where":"donde"}
and with a string like "How are you?" I wish to have the following result once compared to myDict:
"como are tu?"
As you can see, if a word doesn't appear in myDict, like "are", it appears unchanged in the result.
This is my code so far:
myDict = {"how":"como", "you?":"tu?", "goodbye":"adios", "where":"donde"}
def translate(word):
    word = word.lower()
    word = word.split()
    for letter in word:
        if letter in myDict:
            return myDict[letter]
print(translate("How are you?"))
As a result I only get the first letter: como. What am I doing wrong that I don't get the entire sentence?
Thanks in advance for your help!
The function returns (exits) the first time it hits a return statement. In this instance, that will always be the first word.
What you should do is make a list of words, and where you see the return currently, you should add to the list.
Once you have added each word, you can then return the list at the end.
PS: your terminology is confusing. What you have are phrases, each made up of words. "This is a phrase" is a phrase of 4 words: "This", "is", "a", "phrase". A letter would be the individual part of the word, for example "T" in "This".
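The approach described in this answer (collect each result in a list, then return once after the loop) might look like the sketch below. The names translate_words, mapping and my_dict are hypothetical, and dict.get is used here as a shortcut for "use the translation if there is one, otherwise keep the word".

```python
my_dict = {"how": "como", "you?": "tu?", "goodbye": "adios", "where": "donde"}

def translate_words(sentence, mapping):
    translated = []
    for word in sentence.lower().split():
        # Append instead of returning, so the loop visits every word.
        translated.append(mapping.get(word, word))
    # A single return after the loop has finished.
    return translated

result = translate_words("How are you?", my_dict)
print(result)            # ['como', 'are', 'tu?']
print(" ".join(result))  # como are tu?
```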
The problem is that you are returning the first word that is mapped in your dictionary. You can use this instead (I have changed some variable names because they were kind of confusing):
myDict = {"how":"como", "you?":"tu?", "goodbye":"adios", "where":"donde"}
def translate(string):
    string = string.lower()
    words = string.split()
    translation = ''
    for word in words:
        if word in myDict:
            translation += myDict[word]
        else:
            translation += word
        translation += ' ' # add a space between words
    return translation[:-1] #remove last space
print(translate("How are you?"))
Output:
'como are tu?'
When you call return, the function that is currently being executed terminates, which is why yours stops after finding one word. For your function to work properly, you would have to append to a string stored as a local variable within the function.
Here's a function that uses a list comprehension to translate each word of a string if it exists in the dictionary:
def translate(myDict, string):
    return ' '.join([myDict[x.lower()] if x.lower() in myDict.keys() else x for x in string.split()])
Example:
myDict = {"how": "como", "you?": "tu?", "goodbye": "adios", "where": "donde"}
print(translate(myDict, 'How are you?'))
>> como are tu?
myDict = {"how":"como", "you?":"tu?", "goodbye":"adios", "where":"donde"}
s = "How are you?"
newString = ''
for word in s.lower().split():
    newWord = word
    if word in myDict:
        newWord = myDict[word]
    newString = newString+' '+newWord
print(newString.strip())  # strip the leading space added by the first iteration
This is what I have so far, but I'm stuck. I'm using nltk for the word list and trying to find all the words with the letters in "sand". From this list I want to find all the words I can make from the remaining letters.
from nltk.corpus import words
wordlist = words.words()
pwordlist = []
for w in wordlist:
    if 's' in w:
        if 'a' in w:
            if 'n' in w:
                if 'd' in w:
                    pwordlist.append(w)
In this case I have to use all the letters to find the words possible.
I think this will work for finding the possible words with the remaining letters, but I can't figure out how to remove only 1 instance of the letters in 'sand'.
puzzle_letters = nltk.FreqDist(x)
[w for w in pwordlist if len(w) = len(pwordlist) and nltk.FreqDist(w) = puzzle_letters]
I would separate the logic into four sections:
A function contains(word, letters), which we'll use to detect whether a word contains "sand"
A function subtract(word, letters), which we'll use to remove "sand" from the word.
A function get_anagrams(word), which finds all of the anagrams of a word.
The main algorithm that combines all of the above to find words that are anagrams of other words once you remove "sand".
from collections import Counter
words = ??? #todo: somehow get a list of every English word.
def contains(word, letters):
    return not Counter(letters) - Counter(word)
def subtract(word, letters):
    remaining = Counter(word) - Counter(letters)
    return "".join(remaining.elements())
anagrams = {}
for word in words:
    base = "".join(sorted(word))
    anagrams.setdefault(base, []).append(word)
def get_anagrams(word):
    return anagrams.get("".join(sorted(word)), [])
for word in words:
    if contains(word, "sand"):
        reduced_word = subtract(word, "sand")
        matches = get_anagrams(reduced_word)
        if matches:
            print(word, matches)
Running the above code on the Words With Friends dictionary, I get a lot of results, including:
...
cowhands ['chow']
credentials ['reticle', 'tiercel']
cyanids ['icy']
daftness ['efts', 'fest', 'fets']
dahoons ['oho', 'ooh']
daikons ['koi']
daintiness ['seniti']
daintinesses ['sienites']
dalapons ['opal']
dalesman ['alme', 'lame', 'male', 'meal']
...
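The contains and subtract helpers above rely on how Counter subtraction behaves: counts are subtracted per letter and any result that is zero or negative is dropped. A short sketch with sample words chosen for illustration:

```python
from collections import Counter

# Each letter of "sand" is removed once; the other letters remain.
remaining = Counter("cowhands") - Counter("sand")
leftover = "".join(remaining.elements())
print(leftover)  # cowh

# Only one instance per letter is removed, so repeated letters survive.
double_s = "".join((Counter("sass") - Counter("sand")).elements())
print(double_s)  # ss

# If the word lacks a required letter, the reverse subtraction is non-empty,
# which is why "not Counter(letters) - Counter(word)" works as a containment test.
missing = Counter("sand") - Counter("sad")
print(dict(missing))  # {'n': 1}
print(not missing)    # False
```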
Program:
from nltk.corpus import words
from collections import defaultdict
def norm(word):
    return ''.join(sorted(word))
completers = defaultdict(list)
for word in words.words():
    completers[norm(word + 'sand')].append(word)
for word in words.words():
    comps = completers[norm(word)]
    if comps:
        print(word, comps)
Output:
...
admirableness ['miserable']
adnascent ['enact']
adroitness ['sorite', 'sortie', 'triose']
adscendent ['cedent', 'decent']
adsorption ['portio']
adventuress ['vesture']
adversant ['avert', 'tarve', 'taver', 'trave']
...
Let's answer your question instead of spoiling the fun by doing the whole exercise for you: To remove just one instance of the letter, specify a replacement and give a limit to how many times it should apply:
>>> "Frodo".replace("o", "", 1)
'Frdo'
Or if you need to apply a regexp just once (though in this case you don't need a regexp):
>>> import re
>>> re.sub(r"[od]", "", "Frodo", 1)
'Frdo'
Now if you have a string whose letters (s, a, n, d) you want to remove from a word word, you can simply loop over the string:
>>> for letter in "sand":
...     word = word.replace(letter, "", 1)
I'll leave it to you to embed this in a loop that goes over all words in your wordlist, and to utilize the remaining letters.
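As a starting point, the loop above can be wrapped in a small helper. This is only a sketch: the name remove_letters is hypothetical, and it does not check that each letter is actually present in the word (a missing letter is silently skipped).

```python
def remove_letters(word, letters):
    # Remove at most one occurrence of each letter, in order.
    for letter in letters:
        word = word.replace(letter, "", 1)
    return word

print(remove_letters("cowhands", "sand"))  # cowh
print(remove_letters("sadness", "sand"))   # ess
```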
I need to create a word list from a text file. The list is going to be used in a hangman code and needs to exclude the following from the list:
duplicate words
words containing less than 5 letters
words that contain 'xx' as a substring
words that contain upper case letters
the word list then needs to be output into file so that every word appears on its own line.
The program also needs to output the number of words in the final list.
This is what I have, but it's not working properly.
def MakeWordList():
    infile=open(('possible.rtf'),'r')
    whole = infile.readlines()
    infile.close()
    L=[]
    for line in whole:
        word= line.split(' ')
        if word not in L:
            L.append(word)
            if len(word) in range(5,100):
                L.append(word)
                if not word.endswith('xx'):
                    L.append(word)
                    if word == word.lower():
                        L.append(word)
    print(L)
MakeWordList()
You're appending the word many times with this code.
You aren't actually filtering out the words at all, just adding them a different number of times depending on how many ifs they pass.
You should combine all the ifs:
if word not in L and len(word) >= 5 and 'xx' not in word and word.islower():
    L.append(word)
Or, if you want it more readable, you can split them:
if word not in L and len(word) >= 5:
    if 'xx' not in word and word.islower():
        L.append(word)
But don't append after each one.
Think about it: in your nested if-statements, ANY word that is not already in the list will make it through on your first line. Then if it is 5 or more characters, it will get added again (I bet), and again, etc. You need to rethink your logic in the if statements.
Improved code:
def MakeWordList():
    with open('possible.rtf','r') as f:
        data = f.read()
    return set([word for word in data.split() if len(word) >= 5 and word.islower() and 'xx' not in word])
set(_iterable_) returns a set-type object that has no duplicates (all set items must be unique). [word for word...] is a list comprehension, which is a shorter way of creating simple lists. data.split() splits the file contents on whitespace, so you iterate over every word in data (this works whether the words are separated by spaces or each is on a separate line). if len(word) >= 5 and word.islower() and 'xx' not in word accomplishes the final three requirements (must be at least 5 letters, have only lowercase letters, and cannot contain 'xx').
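The question also asks for one word per line in an output file and for the number of words. A minimal sketch of those remaining steps follows, assuming the text has already been read into a string. The names build_word_list and write_word_list and the sample text are hypothetical. Unlike a set, this version keeps the words in order of first appearance.

```python
def build_word_list(text):
    seen = set()
    result = []
    for word in text.split():
        # Same filters as above, plus an explicit duplicate check.
        if len(word) >= 5 and word.islower() and 'xx' not in word and word not in seen:
            seen.add(word)
            result.append(word)
    return result

def write_word_list(words, path):
    # One word per line.
    with open(path, 'w') as f:
        f.write('\n'.join(words) + '\n')

sample = "apple Apple apple boxxes tiny banana cherry banana"
words = build_word_list(sample)
print(words)       # ['apple', 'banana', 'cherry']
print(len(words))  # 3
```

Calling write_word_list(words, 'some_output_file.txt') would then write the list to a file of your choosing.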