I recently wrote a method to cycle through /usr/share/dict/words and return a list of palindromes using my ispalindrome(x) method.
Here's some of the code. What's wrong with it? It just stalls for 10 minutes and then returns a list of all the words in the file.
def reverse(a):
    return a[::-1]
def ispalindrome(a):
    b = reverse(a)
    if b.lower() == a.lower():
        return True
    else:
        return False
wl = open('/usr/share/dict/words', 'r')
wordlist = wl.readlines()
wl.close()
for x in wordlist:
    if not ispalindrome(x):
        wordlist.remove(x)
print wordlist
wordlist = wl.readlines()
When you do this, there is a new line character at the end, so your list is like:
['eye\n','bye\n', 'cyc\n']
the elements of which are obviously not palindromes.
You need this:
['eye','bye', 'cyc']
So strip the newline character and it should be fine.
To do this in one line:
wordlist = [line.strip() for line in open('/usr/share/dict/words')]
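To see why the trailing newline matters, here is a small check. It is only a sketch: ispalindrome follows the logic from the question, and the sample words (raw_lines) are made up for illustration rather than read from the dictionary file.

```python
def ispalindrome(a):
    # Case-insensitive comparison of the string with its reverse
    return a[::-1].lower() == a.lower()

# Lines as readlines() returns them: each keeps its trailing newline
raw_lines = ['eye\n', 'bye\n', 'Anna\n']

# 'eye\n' reversed is '\neye', so nothing matches until the newline is stripped
before = [w for w in raw_lines if ispalindrome(w)]
after = [w.strip() for w in raw_lines if ispalindrome(w.strip())]

print(before)  # []
print(after)   # ['eye', 'Anna']
```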
EDIT: Iterating over a list while modifying it causes problems. Use a list comprehension, as pointed out by Matthew.
Others have already pointed out better solutions. I want to show you why the list is not empty after running your code. Since your ispalindrome() function will never return True because of the "newlines problem" mentioned in the other answers, your code will call wordlist.remove(x) for every single item. So why is the list not empty at the end?
Because you're modifying the list as you're iterating over it. Consider the following:
>>> l = [1,2,3,4,5,6]
>>> for i in l:
...     l.remove(i)
...
>>> l
[2, 4, 6]
When you remove the 1, the remaining elements shift one position toward the front, so now l[0] is 2. The iteration counter has already advanced, though, so the next iteration looks at l[1] and therefore removes 3, and so on.
So your code removes half of the entries. Moral: Never modify a list while you're iterating over it (unless you know exactly what you're doing :)).
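For comparison, here are two common ways to filter without tripping over the shifting indexes. This is a minimal sketch with made-up numbers, not the asker's word list: either build a new list, or iterate over a copy while removing from the original.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Option 1: build a new list that keeps only the wanted items
evens = [n for n in numbers if n % 2 == 0]

# Option 2: iterate over a copy (remaining[:]) while removing from the original
remaining = [1, 2, 3, 4, 5, 6]
for n in remaining[:]:
    if n % 2 != 0:
        remaining.remove(n)

print(evens)      # [2, 4, 6]
print(remaining)  # [2, 4, 6]
```

Option 1 is usually preferable for large lists, since every remove() call has to search and shift the list again.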
I think there are two problems.
Firstly, what is the point in reading all of the words into a list? Why not process each word in turn and print it if it's a palindrome.
Secondly, watch out for whitespace. You have newlines at the end of each of your words!
Since you're not identifying any palindromes (due to the whitespace), you're going to attempt to remove every item from the list. While you're iterating over it!
This solution runs in well under a second and identifies lots of palindromes:
for word in open('/usr/share/dict/words', 'r'):
    word = word.strip()
    if ispalindrome(word):
        print(word)
Edit:
Perhaps more 'pythonic' is to use generator expressions:
def ispalindrome(a):
    return a[::-1].lower() == a.lower()
words = (word.strip() for word in open('/usr/share/dict/words', 'r'))
palindromes = (word for word in words if ispalindrome(word))
print('\n'.join(palindromes))
It doesn't return all the words. It returns half. This is because you're modifying the list while iterating over it, which is a mistake. A simpler, and more effective solution, is to use a list comprehension. You can modify sukhbir's to do the whole thing:
[word for word in (word.strip() for word in wl.readlines()) if ispalindrome(word)]
You can also break this up:
stripped = (word.strip() for word in wl.readlines())
wordlist = [word for word in stripped if ispalindrome(word)]
You're including the newline at the end of each word in /usr/share/dict/words. That means you never find any palindromes. You'll speed things up if you just log the palindromes as you find them, instead of deleting non-palindromes from the list, too.
This is my code, but it doesn't work. It should read text from the console, split it into words and distribute them into 3 lists and use separators between them.
words = list(map(str, input().split(" ")))
lowercase_words = []
uppercase_words = []
mixedcase_words = []
def split_symbols(list):
    from operator import methodcaller
    list = words
    map(methodcaller(str,"split"," ",",",":",";",".","!","( )","","'","\\","/","[ ]","space"))
    return list
for word in words:
    if words[word] == word.lower():
        words[word] = lowercase_words
    elif words[word] == word.upper():
        words[word] = uppercase_words
    else:
        words[word] = mixedcase_words
print(f"Lower case: {split_symbols(lowercase_words)}")
print(f"Upper case: {split_symbols(uppercase_words)}")
print(f"Mixed case: {split_symbols(mixedcase_words)}")
There are several issues in your code.
1) words is a list and word is string. And you are trying to access the list with the index as string which will throw an error. You must use integer for indexing a list. In this case, you don't even need indexes.
2) To check lower or upper case you can just do, word == word.lower() or word == word.upper(). Or another approach would be to use islower() or isupper() function which return a boolean.
3) You are trying to assign an empty list to that element of list. What you want is to append the word to that particular list. You want something like lowercase_words.append(word). Same for uppercase and mixedcase
So, to fix these issues you can write the code like this -
for word in words:
    if word == word.lower(): # roughly word.islower()
        lowercase_words.append(word)
    elif word == word.upper(): # roughly word.isupper()
        uppercase_words.append(word)
    else:
        mixedcase_words.append(word)
My advice would be to refrain from naming variables after built-ins such as list. Also, in split_symbols() you assign words to list, which discards the argument that was passed in. I think you meant it the other way around.
Now I am not sure about the "use separators between them" part of the question. But the line map(methodcaller(str,"split"," ",",",":",";",".","!","( )","","'","\\","/","[ ]","space")) is definitely wrong. map() takes a function and an iterable. In your code the iterable part is absent, and I think this is where the input param list fits in. So, it may be something like -
map(methodcaller("split"," "), list)
But then again, I am not sure what you are trying to achieve with that many separators.
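If the goal is to break the text on any of those punctuation characters and then sort the words by case, one possible sketch is below. Note the assumptions: the separator set in the pattern is a guess at what the question intended, re.split replaces the methodcaller approach entirely, and classify_words is a hypothetical helper name that does not appear in the question.

```python
import re

def classify_words(text):
    # Split on runs of separator characters; this character set is a guess
    # at what the question intended
    words = [w for w in re.split(r"[ ,:;.!()'\\/\[\]]+", text) if w]
    lowercase_words, uppercase_words, mixedcase_words = [], [], []
    for word in words:
        if word == word.lower():
            lowercase_words.append(word)
        elif word == word.upper():
            uppercase_words.append(word)
        else:
            mixedcase_words.append(word)
    return lowercase_words, uppercase_words, mixedcase_words

lower, upper, mixed = classify_words("Hello, WORLD! this is:a/Test")
print(f"Lower case: {', '.join(lower)}")  # Lower case: this, is, a
print(f"Upper case: {', '.join(upper)}")  # Upper case: WORLD
print(f"Mixed case: {', '.join(mixed)}")  # Mixed case: Hello, Test
```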
My function first calculates all possible anagrams of the given word. Then, for each of these anagrams, it checks whether it is a valid word by checking whether it equals any of the words in the wordlist.txt file. The file is a giant file with a bunch of words, one per line. So I decided to just read each line and check if each anagram is there. However, it comes up blank. Here is my code:
def perm1(lst):
    if len(lst) == 0:
        return []
    elif len(lst) == 1:
        return [lst]
    else:
        l = []
        for i in range(len(lst)):
            x = lst[i]
            xs = lst[:i] + lst[i+1:]
            for p in perm1(xs):
                l.append([x] + p)
        return l
def jumbo_solve(string):
    '''jumbo_solve(string) -> list
    returns list of valid words that are anagrams of string'''
    passer = list(string)
    allAnagrams = []
    validWords = []
    for x in perm1(passer):
        allAnagrams.append((''.join(x)))
    for x in allAnagrams:
        if x in open("C:\\Users\\Chris\\Python\\wordlist.txt"):
            validWords.append(x)
    return(validWords)
print(jumbo_solve("rarom"))
I have put in many print statements to debug, and the list "allAnagrams" is built correctly. For example, with the input "rarom", one valid anagram is the word "armor", which is contained in the wordlist.txt file. However, when I run it, it does not detect it for some reason. I'm still a little new to Python, so all the help is appreciated, thanks!
You missed a tiny but important aspect of:
word in open("C:\\Users\\Chris\\Python\\wordlist.txt")
This will search the file line by line, as if open(...).readlines() was used, and attempt to match the entire line, with '\n' in the end. Really, anything that demands iterating over open(...) works like readlines().
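A quick way to convince yourself of this, without touching the real file: the sketch below uses io.StringIO as a stand-in for the opened word list (the two words in it are made up).

```python
import io

# io.StringIO stands in for the opened word-list file in this sketch
fake_file = io.StringIO("armor\nroam\n")
lines = list(fake_file)

print(lines)               # ['armor\n', 'roam\n']
print("armor" in lines)    # False: the stored line is 'armor\n'
print("armor\n" in lines)  # True
```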
You would need
x+'\n' in open("C:\\Users\\Chris\\Python\\wordlist.txt")
to make what you have work, assuming the file is a list of words on separate lines. But it's inefficient to reopen and scan the file for every anagram. Better to read it once:
wordlist = open("C:\\Users\\Chris\\Python\\wordlist.txt").read().split('\n')
This will create a list of words if the file is a '\n'-separated word list. Note you can use readlines() instead of read().split('\n'), but this will keep the \n on every word, like you have, and you would need to include that in your search as I show above. Now you can use the list as a global variable or as a function argument.
if x in wordlist: stuff
Note Graphier raised an important suggestion in the comments. A set:
wordlist = set(open("C:\\Users\\Chris\\Python\\wordlist.txt").read().split('\n'))
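is better suited for word lookup than a list: a membership test on a set takes time proportional to the word length on average, no matter how many words it holds, whereas a list has to be scanned item by item.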
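The lookups themselves read the same either way, as this small sketch shows. The word list here is a hypothetical in-memory stand-in for wordlist.txt, and the candidates are made-up anagram strings.

```python
# Hypothetical in-memory word list standing in for wordlist.txt
words_as_list = ["armor", "roam", "maror", "arm"]
words_as_set = set(words_as_list)

candidates = ["armor", "ramor", "maror", "rroma"]

# Same answers either way; the set hashes each candidate,
# while the list compares it against every stored word in turn
found_in_list = [w for w in candidates if w in words_as_list]
found_in_set = [w for w in candidates if w in words_as_set]

print(found_in_list)  # ['armor', 'maror']
print(found_in_set)   # ['armor', 'maror']
```

With only a handful of words the difference is invisible, but a five-letter input already produces 120 permutations, each of which would otherwise scan the whole dictionary.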
You have used the following code in the wrong way:
if x in open("C:\\Users\\Chris\\Python\\wordlist.txt"):
Instead, try the following code, it should solve your problem:
with open("words.txt", "r") as file:
    lines = file.read().splitlines()
for line in lines:
    # do something here
So, putting all advice together, your code could be as simple as:
from itertools import permutations
def get_valid_words(file_name):
    with open(file_name) as f:
        return set(line.strip() for line in f)
def jumbo_solve(s, valid_words=None):
    """jumbo_solve(s: str) -> list
    returns list of valid words that are anagrams of `s`"""
    if valid_words is None:
        valid_words = get_valid_words("C:\\Users\\Chris\\Python\\wordlist.txt")
    # permutations() yields tuples of characters, so join each one into a string;
    # the set drops duplicates caused by repeated letters
    candidates = {''.join(p) for p in permutations(s)}
    return [word for word in candidates if word in valid_words]
if __name__ == "__main__":
    print(jumbo_solve("rarom"))
I'm trying to create a basic program to pick out the positions of words in a quote. So far, I've got the following code:
print("Your word appears in your quote at position(s)", string.index(word))
However, this only prints the first position where the word is indexed, which is fine if the quote only contains the word once, but if the word appears multiple times, it will still only print the first position and none of the others.
How can I make it so that the program will print every position in succession?
Note: very confusingly, string here stores a list. The program is supposed to find the positions of words stored within this list.
It seems that you're trying to find occurrences of a word inside a string: the re library has a function called finditer that is ideal for this purpose. We can use this along with a list comprehension to make a list of the indexes of a word:
>>> import re
>>> word = "foo"
>>> string = "Bar foo lorem foo ipsum"
>>> [x.start() for x in re.finditer(word, string)]
[4, 14]
This function will find matches even if the word is inside another, like this:
>>> [x.start() for x in re.finditer("foo", "Lorem ipsum foobar")]
[12]
If you don't want this, wrap your word in word-boundary anchors (and escape it, in case it contains regex metacharacters) like this:
[x.start() for x in re.finditer(r"\b" + re.escape(word) + r"\b", string)]
Probably not the fastest/best way, but it will work. I used in rather than == in case there are quotation marks or other unexpected punctuation attached to the word as well! Hope this helps!
def getWord(string, word):
    index = 0
    data = []
    for i in string.split(' '):
        # word in token, so that "foo," or '"foo"' still counts as a match
        if word.lower() in i.lower():
            data.append(index)
        index += 1
    return data
Here is some code I quickly wrote that should work:
string = "Hello my name is Amit and I'm answering your question".split(' ')
indices = [index for (index, word) in enumerate(string) if word == "QUERY"]
That should work, although it returns the index of the word within the list rather than its character position. You could add up the lengths of all the words before that word (plus the separating spaces) to get the index of its first letter.
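The calculation mentioned above could look roughly like this. It is a sketch that assumes the quote is a plain string with words separated by single spaces; word_positions is a hypothetical helper name, not something from the question.

```python
def word_positions(quote, query):
    """Return (word_index, character_offset) pairs for each match of query."""
    positions = []
    offset = 0
    for index, word in enumerate(quote.split(' ')):
        if word == query:
            positions.append((index, offset))
        # Advance past this word and the single space that followed it
        offset += len(word) + 1
    return positions

print(word_positions("Bar foo lorem foo ipsum", "foo"))  # [(1, 4), (3, 14)]
```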
I am writing code in which a word list is inputted. If any of the words in the list are of exactly 4 characters then those words will be returned in this format:
['word','four']
I am making a loop to check the whole list but obviously return is stopping the function so only the first 4 letter word is getting printed. As per instructions 'return' must be used and not print and the output must be in the list format like above. any help will be appreciated. Thank you.
def letter(list):
    word = []
    for word in list:
        if len(word)==4:
            return word
Once return is reached, the function exits, so it's better to build up the entire list of four-letter words and then return it:
def letter(word_list):
    words = []
    for word in word_list:
        if len(word)==4:
            words.append(word)
    return words
One way to do it is using the built-in filter function, which takes a predicate function and an iterable as inputs: filter(f, lst) yields all items in lst for which f(item) returns true. Keep in mind that in Python 3 filter() returns a filter object, so you need to wrap it in list() to get a list. For this case, the code would be:
list(filter(lambda word: len(word) == 4, words))
Another way to do it would be to use a list comprehension:
[word for word in words if len(word) == 4]
Using a list comprehension
def letter(list):
    return [word for word in list if len(word)==4]
So my function should open a file and count the word length and give the output. For example,
many('sample.txt')
Words of length 1: 2
Words of length 2: 6
Words of length 3: 7
Words of length 4: 6
My sample.txt file contains:
This is a test file. How many words are of length one?
How many words are of length three? We should figure it out!
Can a function do this?
My coding so far,
def many(fname):
    infile = open(fname,'r')
    text = infile.read()
    infile.close()
    L = text.split()
    L.sort
    for item in L:
        if item == 1:
            print('Words of length 1:', L.count(item))
Can anyone tell me what I'm doing wrong? When I call the function, nothing happens. It's clearly because of my coding, but I don't know where to go from here. Any help would be nice, thanks.
You want to obtain a list of lengths (1, 2, 3, 4,... characters) and a number of occurrences of words with this length in the file.
So until L = text.split() it was a good approach. Now have a look at dictionaries in Python, that will allow you to store the data structure mentioned above and iterate over the list of words in the file. Just a hint...
Since this is homework, I'll post a short solution here, and leave it as exercise to figure out what it does and why it works :)
>>> from collections import Counter
>>> text = open("sample.txt").read()
>>> counts = Counter([len(word.strip('?!,.')) for word in text.split()])
>>> counts[3]
7
What do you expect here
if item == 1:
and here
L.count(item)
And what does actually happen? Use a debugger and have a look at the variable values or just print them to the screen.
Maybe also this:
>>> s
'This is a test file. How many words are of length one? How many words are of length three? We should figure it out! Can a function do this?'
>>> {x:[len([c for c in w ]) for w in s.split()].count(x) for x in [len([c for c in w ]) for w in s.split()] }
{1: 2, 2: 6, 3: 5, 4: 6, 5: 4, 6: 5, 8: 1}
Let's analyze your problem step-by-step.
You need to:
1) Retrieve all the words from a file
2) Iterate over all the words
3) Increment the counter N every time you find a word of length N
4) Output the result
You already did the step 1:
def many(fname):
    infile = open(fname,'r')
    text = infile.read()
    infile.close()
    L = text.split()
Then you (try to) sort the words: L.sort without parentheses never actually calls the method, and sorting the words alphabetically would not help with this task anyway.
Instead, let's define a Python dictionary to hold the count of words
    lengths = dict()
@sukhbir correctly suggested in a comment to use the Counter class, and I encourage you to go and search for it, but I'll stick to traditional dictionaries in this example, as I find it important to get familiar with the basics of the language before exploring the library.
Let's go on with step 2:
    for word in L:
        length = len(word)
For each word in the list, we assign to the variable length the length of the current word. Let's check if the counter already has a slot for our length:
        if length not in lengths:
            lengths[length] = 0
If no word of length length was encountered, we allocate that slot and we set that to zero. We can finally execute step 3:
        lengths[length] += 1
Here we increment by one the counter of words with the current length.
At the end of the function, you'll find that lengths will contain a map of word length -> number of words of that length. Let's verify that by printing its contents (step 4):
    for length, counter in lengths.items():
        print("Words of length %d: %d" % (length, counter))
If you copy and paste the code I wrote (respecting the indentation!!) you will get the answers you need.
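Put together, the steps above might look like the sketch below. Two things here are assumptions rather than part of the walkthrough: the function takes the text directly instead of a file name (so it can be tried without sample.txt), and the name count_word_lengths is made up. Punctuation is not stripped, exactly as in the steps above, so "file." counts as five characters.

```python
def count_word_lengths(text):
    """Return a dict mapping word length -> number of words of that length."""
    lengths = dict()
    for word in text.split():
        length = len(word)
        # Allocate the slot the first time this length is seen
        if length not in lengths:
            lengths[length] = 0
        lengths[length] += 1
    return lengths

sample = "This is a test file. How many words are of length one?"
lengths = count_word_lengths(sample)
for length, counter in sorted(lengths.items()):
    print("Words of length %d: %d" % (length, counter))
```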
I strongly suggest you to go through the Python tutorial.
The regular expression library might also be helpful, if somewhat overkill. A simple word-matching regular expression might be something like:
import re
with open("sample.txt") as f:
    text = f.read()
words = re.findall(r"\w+", text)
words is then a list of... words :)
This, however, will not properly match words like 'isn't' and 'I'm', as \w only matches letters, digits, and underscores, so the apostrophe splits them in two. In the spirit of this being homework I guess I'll leave that for the interested reader, but the Python regular expression documentation is pretty good as a start.
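For readers who want a starting point, one possible pattern that keeps apostrophes inside words is sketched below. It assumes plain ASCII letters and a straight apostrophe, and the sample sentence is made up; treat it as one option rather than the definitive answer.

```python
import re

text = "This isn't hard, and I'm sure it's fine."

# One or more letters, optionally followed by an apostrophe and more letters
words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)

print(words)  # ['This', "isn't", 'hard', 'and', "I'm", 'sure', "it's", 'fine']
```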
Then my approach for counting these words by length would be something like:
occurrence = dict()
for word in words:
    try:
        occurrence[len(word)] = occurrence[len(word)] + 1
    except KeyError:
        occurrence[len(word)] = 1
print(occurrence.items())
Where a dictionary (occurrence) is used to store the word lengths and their occurrence in your text. The try: and except: keywords deal with the first time we try and store a particular length of word in the dictionary, where in this case the dictionary is not happy at being asked to retrieve something that it has no knowledge of, and the except: picks up the exception that is thrown as a result and stores the first occurrence of that length of word. The last line prints everything in your dictionary.
Hope this helps :)