I'm new to Python, and I wrote a Scrabble word finder that supports two wildcards (* and ?). When scoring a word, I want wildcard letters to score zero, but that doesn't seem to work. I'm wondering what is missing here.
In the block after "# Add score and valid word to the empty list", I tried to remove every letter of the word that is not in the rack, so that I only score the letters that match a real rack letter rather than a wildcard. For example, if my rack is B* and the word is BO, I would like to remove O and score only B, so that the wildcard scores zero.
But the result is not what I expected.
import sys
if len(sys.argv) < 2:
    print("no rack error.")
    exit(1)
rack = sys.argv[1]
rack_low = rack.lower()
# Turn the words in the sowpods.txt file into a Python list.
with open("sowpods.txt","r") as infile:
    raw_input = infile.readlines()
    data = [datum.strip('\n') for datum in raw_input]
# Find all of the valid sowpods words that can be made
# up of the letters in the rack.
valid_words = []
# Call each word in the sowpods.txt
for word in data:
    # Change word to lowercase not to fail due to case.
    word_low = word.lower()
    candidate = True
    rack_letters = list(rack_low)
    # Iterate each letter in the word and check if the letter is in the
    # Scrabble rack. If used once in the rack, remove the letter from the rack.
    # If there's no letter in the rack, skip the letter.
    for letter in word_low:
        if letter in rack_letters:
            rack_letters.remove(letter)
        elif '*' in rack_letters:
            rack_letters.remove('*')
        elif '?' in rack_letters:
            rack_letters.remove('?')
        else:
            candidate = False
    if candidate == True:
        # Add score and valid word to the empty list
        total = 0
        for letter in word_low:
            if letter not in rack_letters:
                word_strip = word_low.strip(letter)
                for letter in word_strip:
                    total += scores[letter]
        valid_words.append([total, word_low])
I'm going to go a slightly different route with my answer and hopefully speed the overall process up. We're going to import another function from the standard library -- permutations -- and then find possible results by trimming the total possible word list by the length of the rack (or, whatever argument is passed).
I've commented accordingly.
import sys
from itertools import permutations # So we can get our permutations from all the letters.
if len(sys.argv) < 2:
    print("no rack error.")
    exit(1)
rack = sys.argv[1]
rack_low = rack.lower()
# Turn the words in the sowpods.txt file into a Python list.
txt_path = r'C:\\\\\sowpods.txt'
with open(txt_path,'r') as infile:
    raw_input = infile.readlines()
# Added .lower() here.
data = [i.strip('\n').lower() for i in raw_input]
## Sample rack of 7 letters with wildcard character.
sample_rack = 'jrnyoj?'
# Remove any non-alphabetic characters (i.e. - wildcards)
# We're using the isalpha() method.
clean_rack = ''.join([i for i in sample_rack if i.isalpha()])
# Trim word list to the letter count in the rack.
# (You can skip this part, but it might make producing results a little quicker.)
trimmed_data = [i for i in data if len(i) <= len(clean_rack)]
# Create all permutations from the letters in the rack
# We'll iterate over a count from 2 to the length of the rack
# so that we get all relevant permutations.
all_permutations = list()
for i in range(2, len(clean_rack) + 1):
    all_permutations.extend(list(map(''.join, permutations(clean_rack, i))))
# We'll use set().intersection() to help speed the discovery process.
valid_words = list(set(all_permutations).intersection(set(trimmed_data)))
# Print sorted list of results to check.
print(f'Valid words for a rack containing letters \'{sample_rack}\' are:\n\t* ' + '\n\t* '.join(sorted(valid_words)))
Our output would be the following:
Valid words for a rack containing letters 'jrnyoj?' are:
* jo
* jor
* joy
* no
* nor
* noy
* ny
* on
* ony
* or
* oy
* yo
* yon
If you want to verify that the results are actually in the sowpods.txt file, you can just index the sowpods.txt list by where the word you want to look up is indexed:
trimmed_data[trimmed_data.index('jor')]
When you total the scores, you are iterating over the letters of the word from the word list and not the letters of the inputted rack:
total=0
for letter in word_low:
    ...
Rather, this should be:
total=0
for letter in rack_low:
    ...
Also, you do not need to loop and remove the letters with strip at the end.
You can just have:
total = 0
for letter in rack_low:
    if letter not in rack_letters:
        try:
            total += scores[letter]
        except KeyError: # If letter is * or ? then a KeyError occurs
            pass
valid_words.append([total, word_low])
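For comparison, here is a minimal sketch of another way to make wildcards score zero: add up the points while matching, and only when the letter comes from a real rack tile. The `score_word` helper is hypothetical, and `SCORES` assumes the standard English Scrabble tile values, since the question's own `scores` dictionary is not shown.

```python
# Assumed standard English Scrabble tile values; the question's `scores` dict is not shown.
POINT_GROUPS = {1: "aeilnorstu", 2: "dg", 3: "bcmp", 4: "fhvwy", 5: "k", 8: "jx", 10: "qz"}
SCORES = {letter: points for points, letters in POINT_GROUPS.items() for letter in letters}


def score_word(word, rack):
    """Return the score of `word` played from `rack`, or None if it cannot be formed.

    Letters covered by a wildcard ('*' or '?') contribute nothing to the score.
    """
    available = list(rack.lower())
    total = 0
    for letter in word.lower():
        if letter in available:
            # A real tile: use it up and count its points.
            available.remove(letter)
            total += SCORES[letter]
        elif '*' in available:
            # A wildcard stands in for the letter and scores zero.
            available.remove('*')
        elif '?' in available:
            available.remove('?')
        else:
            # No tile and no wildcard left: the word cannot be made.
            return None
    return total


print(score_word("BO", "B*"))  # only the B is scored
```

Because the points are added at the moment a real tile is consumed, there is no need for a second pass over the word after matching.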
I wrote a program (appear) that keeps generating random letters, from a to z, till a given word appears:
import random
def appear(word):
    word = list(word)
    w = word
    l = list('abcdefghijklmnopqrstuvwxyz')
    i = 0
    while len(w) > 0:
        r = int(random.random() * 26)
        print(l[r], end='')
        if (w[0] == l[r] and i == 1) or (w[0] == l[r] and len(w) == len(word)):
            i = 1
            del w[0]
        else:
            i = 0
            w = word
For example, appear('car') should produce: ajzkcar.
I tried printing the value of w in each loop iteration, and the problem seems to be that the program fails to reset w to the original word when it doesn't find two consecutive letters, even though I clearly say that it should in the last "else".
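The symptom described here is consistent with list aliasing: `w = word` does not copy the list, it only gives the same list a second name, so `del w[0]` also shortens `word`. A small sketch (the variable names `alias` and `copy` are only for illustration):

```python
word = list("car")
alias = word        # a second name for the same list object
copy = list(word)   # an independent copy of the list

del alias[0]        # removes 'c' from the one shared list

print(word)         # the original name sees the deletion as well
print(copy)         # the copy still holds all three letters
print(alias is word, copy is word)
```

Resetting with `w = word` therefore cannot restore anything; a reset would need a fresh copy of the original letters.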
I suggest not deleting anything. Instead, keep track of the string you have output so far, trim it to the length of word, and then check whether it equals word:
import random
def appear(word):
    l = list('abcdefghijklmnopqrstuvwxyz')
    s = ""
    while True:
        r = int(random.random() * 26)
        print(l[r], end='')
        s += l[r] # keep track
        s = s[-len(word):] # truncate
        if s == word: # compare
            return
appear("car")
You need to keep the letters that you have generated in memory, and compare them with the letters of the target word.
The word you're looking for has a fixed length. When you generate a new letter, you need to add that new letter to your memory at the right, and discard the oldest letter from your memory, at the left.
What data structure to use for this? Adding letters to the right and removing letters from the left, so that the total length remains fixed? I immediately think about a fixed-length queue.
The simplest way to get a fixed-length queue in python is to use collections.deque with a maxlen argument.
Also, to choose a letter, I have a preference for random.choice(letters) over letters[int(random.random() * 26)]
from collections import deque
from random import choice, choices
from string import ascii_lowercase as alphabet
def appear(word):
    queue = deque(choices(alphabet, k=len(word)), maxlen=len(word))
    target = deque(word)
    print(''.join(queue), end='')
    while queue != target:
        new_letter = choice(alphabet)
        queue.append(new_letter)
        print(new_letter, end='')
appear('a')
# jca
appear('ab')
# zdoxkcnswafzsclmeduwhyhpdfwljujduwvbsxayihtfmlqrjxamlqnestzsncjjzbyfuzaczmuaiddfehckkrcnzfwwgnxfxcaifasaybokkxrqievmwqhisnaqhezcxwxfrstvuvwoedstpsrxkmxbubab
When a user calls a function (e.g. find_from_dict(letters)), the function should search dictionary.txt for a word that can be made from the letters the user entered: the word that uses the most of those letters.
For example, if letters is random typing such as "BAJPPNLE", the function should find "APPLE" in the dictionary, since "APPLE" uses the most letters from "BAJPPNLE".
def find_from_dict(letters):
    n = 0
    y = 0
    x = 0
    dictFile = [line.rstrip('\n') for line in open("dictionary.txt")]
    listLetters = list(letters)
    final = []
    while True:
        if n < len(dictFile) and len(list(dictFile[n])) <= len(listLetters) and x < len(list(dictFile[n])) and list(dictFile[n])[x] in listLetters:
            x = x + 1
        elif n < len(dictFile) and len(list(dictFile[n])) <= len(listLetters) and x < len(list(dictFile[n])) and list(dictFile[n])[x] not in listLetters:
            x = 0
            n = n + 1
        elif n < len(dictFile) and len(list(dictFile[n])) <= len(listLetters) and x == len(list(dictFile[n])):
            final.append(dictFile[n])
        elif n < len(dictFile) and len(list(dictFile[n])) > len(listLetters):
            n = n + 1
        else:
            print(final)
            break
I have this code at the moment, but since my dictionary.txt file is huge and the code is inefficient, it takes forever to go through.
Does anyone have any idea how I could make this code efficient?
You can speed this up by preparing a word index formed of the sorted letters in your word list. Then look for sorted combinations of the letters in that index:
for example:
from collections import defaultdict
from itertools import combinations
with open("/usr/share/dict/words","r") as wordList:
    words = defaultdict(list)
    for word in wordList.read().upper().split("\n"):
        words[tuple(sorted(word))].append(word) # index by sorted letters
def findWords(letters):
    for size in range(len(letters),2,-1): # from large to small (minimum 3 letters)
        for combo in combinations(sorted(letters),size): # combinations of that size
            for word in (w for w in words[combo]): # matching words from index
                yield word # return as you go (iterator)
                # If you only want one, change this to: return word
testing:
while True:
    letters = input("Enter letters:")
    if not letters: break
    for word in findWords(letters.upper()):
        stop = input(word)
        if stop: break
    print("")
sample output:
Enter letters:BAJPPNLE
JELAB
BEJAN
LEBAN
NABLE
PEBAN
PEBAN
ALPEN
NEPAL
PANEL
PENAL
PLANE
ALPEN
NEPAL
PANEL
PENAL
PLANE
APPLE
NAPPE.
Enter letters:EPROING
PERIGON
PIGEON
IGNORE
REGION
PROGNE
OPINER.
Enter letters:
If you need a solution without using libraries, you will need a recursive approach that does a breadth-first traversal of the combination tree:
with open("/usr/share/dict/words","r") as wordList:
    words = dict()
    for word in wordList.read().upper().split("\n"):
        words.setdefault(tuple(sorted(word)),list()).append(word) # index by sorted letters
def findWords(letters,size=None):
    if size == None:
        letters = sorted(letters)
        for size in range(len(letters),2,-1):
            for word in findWords(letters,size): yield word
    elif len(letters) == size:
        for word in words.get(tuple(letters),[]): yield word
    elif len(letters)>size:
        for i in range(len(letters)):
            for word in findWords(letters[:i]+letters[i+1:],size):
                yield word
You can kind of "cheat" your way through it by pre-processing the dictionary file.
The idea is: instead of having a list of words, you have a list of groups which is determined by the sorted letters of the words.
For example, something like:
"aeegr": [
"agree",
"eager",
],
"alps": [
"alps",
"laps",
"pals",
]
Then if you wanted to just find the exact match, you could sort the letters from the input and search in the processed file.
But you want the one that matches the most letters, so what you could do is number the letters with prime numbers (I'm only considering lowercase ascii characters), so that a is 2, b is 3, c is 5, d is 7 and so on.
Then, you can get a number by multiplying all the letters, so for example for alps you'd get 2*37*53*67.
In your dictionary file you then have the numbers obtained the same way for each word.
Like:
262774: [
"alps",
"laps",
"pals",
]
You then go through your dictionary, and if the number for the input letters divided by a dictionary number leaves a remainder of 0, that entry is a possible match.
Among the entries with a remainder of 0, the one whose number has the most prime factors (that is, the longest word) is the one that you want, because that's the one with the most letters present.
Keep in mind that the numbers might get very big very quickly, depending on how many letters you use.
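The prime-product idea can be sketched roughly as follows. The helper names (`first_primes`, `signature`, `best_match`) and the tiny word list are made up for illustration, and the sketch assumes lowercase ASCII letters only, as the answer does. Python integers have arbitrary precision, so the large products are not an overflow problem here, only a speed consideration.

```python
from string import ascii_lowercase


def first_primes(n):
    """Return the first n primes by simple trial division."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


# a -> 2, b -> 3, c -> 5, d -> 7, ...
PRIMES = dict(zip(ascii_lowercase, first_primes(26)))


def signature(word):
    """Multiply the primes of all letters; anagrams share the same product."""
    product = 1
    for letter in word.lower():
        product *= PRIMES[letter]
    return product


def best_match(letters, dictionary):
    """Return the longest word whose signature divides the signature of `letters`."""
    target = signature(letters)
    candidates = [w for w in dictionary if target % signature(w) == 0]
    return max(candidates, key=len, default=None)


print(signature("alps"))                                    # 2*37*53*67
print(best_match("bajppnle", ["plan", "apple", "zebra"]))
```

In a real run the signatures of the dictionary words would be computed once and stored, so that each query only needs one modulo operation per entry.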
Apologies if this is the wrong forum - it's my first question. I'm learning python and writing a password generator as an exercise from www.practicepython.org
I've written the following, but it can be really slow, so I guess I'm doing it inefficiently. I want to select a random word from the dictionary and then add ascii characters to it. I want at least 2 ascii characters in the password, so I use a while loop that keeps picking words until one is shorter than (length - 2).
This works fine if you say that you want the password to be 10 characters long, but if you restrict it to something like 5, I think the while loop has to go through so many iterations that it can take up to 30 seconds.
I can't find the answer via searching - guidance appreciated!
import string
import random
import nltk
from nltk.corpus import words
word = words.words()[random.randint(1, len(words.words()))]
ascii_str = (string.ascii_letters + string.digits + string.punctuation)
length = int(input("How long do you want the password to be? "))
while len(word) >= (length - 2):
    word = words.words()[random.randint(1, len(words.words()))]
print("The password is: " + word, end="")
for i in range(0, (length - len(word))):
    print(ascii_str[random.randint(1, len(ascii_str) - 1)], end="")
Start by calling words.words() just once and store that in a variable:
allwords = words.words()
That saves a lot of work, because now the nltk.corpus library won't try to load the whole list each time you try to get the length of the list or try to select a random word with the index you generated.
Next, use random.choice() to pick a random element from that list. That eliminates the need to keep passing in a list length:
word = random.choice(allwords)
# ...
while len(word) >= (length - 2):
    word = random.choice(allwords)
Next, you could group the words by length first:
allwords = words.words()
by_length = {}
for word in allwords:
    by_length.setdefault(len(word), []).append(word)
This gives you a dictionary with keys representing the length of the words; the nltk corpus has words between 1 and 24 letters long. Each value in the dictionary is a list of words of the same length, so by_length[12] would give you a list of words that are all exactly 12 characters long.
This allows you to pick words of a specific length:
# start with the desired length, and see if there are words this long in the
# dictionary, but don’t presume that all possible lengths exist:
wordlength = length - 2
while wordlength > 0 and wordlength not in by_length:
    wordlength -= 1
# we picked a length, but it could be 0, -1 or -2, so start with an empty word
# and then pick a random word from the list with words of the right length.
word = ''
if wordlength > 0:
    word = random.choice(by_length[wordlength])
Now word is the longest random word that'll fit your criteria: at least 2 characters shorter than the required length, and taken at random from the word list.
More importantly: we only picked a random word once. Provided you keep the by_length dictionary around for longer and re-use it in a password-generating function, that's a big win.
Picking the nearest available length from by_length can be done without stepping through every possible length one step at a time if you use bisection, but I’ll leave adding that as an exercise for the reader.
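One possible take on that exercise, as a sketch only: keep the available lengths in a sorted list and let the standard `bisect` module find the largest one that does not exceed the wanted length. The `nearest_length` helper and the toy `by_length` dictionary are hypothetical stand-ins for the real data.

```python
import random
from bisect import bisect_right


def nearest_length(available_lengths, wanted):
    """Return the largest value in the sorted list that is <= wanted, or 0 if none."""
    pos = bisect_right(available_lengths, wanted)
    return available_lengths[pos - 1] if pos else 0


# Toy stand-in for the by_length dictionary built from the word list.
by_length = {1: ["a"], 2: ["by"], 3: ["cat", "dog"], 5: ["apple"], 8: ["absolute"]}
lengths = sorted(by_length)          # sorted once, reused for every lookup

length = 9                           # requested password length
wordlength = nearest_length(lengths, length - 2)
word = random.choice(by_length[wordlength]) if wordlength else ''
print(wordlength, word)
```

The sorted list only has to be built once, next to `by_length`, and each lookup is then logarithmic in the number of distinct lengths.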
You are looking at random.choice
From the docs:
random.choice(seq)
Return a random element from the non-empty sequence seq.
In [22]: import random
In [23]: random.choice([1,2,3,4,5])
Out[23]: 3
In [24]: random.choice([1,2,3,4,5])
Out[24]: 5
In [25]: random.choice([1,2,3,4,5])
Out[25]: 1
The code can then be simplified to
import string
import random
import nltk
from nltk.corpus import words
#All words assigned to a list first
words = words.words()
#Get a random word
word = random.choice(words)
ascii_str = string.ascii_letters + string.digits + string.punctuation
length = int(input("How long do you want the password to be? "))
while len(word) >= (length - 2):
    word = random.choice(words)
#Use random.sample to choose n random samples, and join them all to make a string
password = word + ''.join(random.sample(ascii_str, length))
print("The password is: " + password, end="")
Possible outputs are
How long do you want the password to be? 10
The password is: heyT{7<XEVc!l
How long do you want the password to be? 8
The password is: hiBk-^8t7]
But of course, this is not an optimized solution, as noted by @MartjinPieters in the comments. I will try to provide something along the lines he pointed out in his answer, in a different way, as follows:
I will use itertools.groupby to create the by_length dictionary, a dictionary with the word length as key and the list of words of that length as value
I will enforce a minimum length for the password
Use random.sample to choose random characters, join them into a string, and put the word in front!
import string
import random
from itertools import groupby
#All words assigned to a list first
words = ['a', 'c', 'e', 'bc', 'def', 'ghij' , 'jklmn']
ascii_str = string.ascii_letters + string.digits + string.punctuation
#Check for minimum length, and exit the code if it is not met
min_length = 8
pass_len = int(input("How long do you want the password to be? Minimum length is {}: ".format(min_length)))
if pass_len < min_length:
    print('Password is not long enough')
    exit()
#Create the by_length dictionary, a dictionary with key as word length and values as list of words of that length using itertools.groupby
#groupby only groups adjacent items, so sort the words by length first
by_length = {}
for model, group in groupby(sorted(words, key=len), key=len):
    by_length[model] = list(group)
chosen_word = ''
req_len = pass_len - 2
#Iterate till you find words of the required length of pass_len - 2, else reduce the required length by 1
while req_len > 0:
    if req_len in by_length:
        chosen_word = random.choice(by_length[req_len])
        break
    req_len -= 1
#Use random.sample to choose the remaining characters, and join them all to make a string
password = chosen_word + ''.join(random.sample(ascii_str, pass_len - len(chosen_word)))
print("The password is: " + password)
I have an input text file from which I have to count sum of characters, sum of lines, and sum of each word.
So far I have been able to get the count of characters, lines and words. I also converted the text to all lower case so I don't get 2 different counts for same word where one is in lower case and the other is in upper case.
Now looking at the output I realized that, the count of words is not as clean. I have been struggling to output clean data where it does not count any special characters, and also when counting words not to include a period or a comma at the end of it.
Ex. if the text file contains the line: "Hello, I am Bob. Hello to Bob *"
it should output:
2 Hello
2 Bob
1 I
1 am
1 to
Instead my code outputs
1 Hello,
1 Hello
1 Bob.
1 Bob
1 I
1 am
1 to
1 *
Below is the code I have as of now.
# Open the input file
fname = open('2013_honda_accord.txt', 'r').read()
# COUNT CHARACTERS
num_chars = len(fname)
# COUNT LINES
num_lines = fname.count('\n')
#COUNT WORDS
fname = fname.lower() # convert the text to lower first
words = fname.split()
d = {}
for w in words:
    # if the word is repeated - start count
    if w in d:
        d[w] += 1
    # if the word is only used once then give it a count of 1
    else:
        d[w] = 1
# Add the sum of all the repeated words
num_words = sum(d[w] for w in d)
lst = [(d[w], w) for w in d]
# sort the list of words in alpha for the same count
lst.sort()
# list word count from greatest to lowest (will also show the sort in reserve order Z-A)
lst.reverse()
# output the total number of characters
print('Your input file has characters = ' + str(num_chars))
# output the total number of lines
print('Your input file has num_lines = ' + str(num_lines))
# output the total number of words
print('Your input file has num_words = ' + str(num_words))
print('\n The 30 most frequent words are \n')
# print the number of words as a count from the text file with the sum of each word used within the text
i = 1
for count, word in lst[:10000]:
    print('%2s. %4s %s' % (i, count, word))
    i += 1
Thanks
Try replacing
words = fname.split()
With
get_alphabetical_characters = lambda word: "".join([char if char in 'abcdefghijklmnopqrstuvwxyz' else '' for char in word])
words = list(map(get_alphabetical_characters, fname.split()))
Let me explain the various parts of the code.
Starting with the first line, whenever you have a declaration of the form
function_name = lambda argument1, argument2, ..., argumentN: some_python_expression
What you're looking at is the definition of a small function whose body is a single expression: it cannot contain statements such as assignments, it simply returns the value of that expression.
So get_alphabetical_characters is a function that, as its suggestive name tells us, takes a word and returns only the alphabetical characters contained within it.
This is accomplished using the "".join(some_list) idiom, which takes a list of strings and concatenates them (in other words, it produces a single string by joining them together in the given order).
And the some_list here is provided by the list comprehension [char if char in 'abcdefghijklmnopqrstuvwxyz' else '' for char in word]
What this does is step through every character in the given word and put it into the list if it's alphabetical; if it isn't, it puts a blank string in its place.
For example
[char if char in 'abcdefghijklmnopqrstuvwxyz' else '' for char in "hello."]
Evaluates to the following list:
['h','e','l','l','o','']
Which is then evaluated by
"".join(['h','e','l','l','o',''])
Which is equivalent to
'h'+'e'+'l'+'l'+'o'+''
Notice that the blank string added at the end will not have any effect. Adding a blank string to any string returns that same string again.
And this in turn ultimately yields
"hello"
Hope that's clear!
Edit #2: If you want to include periods used to mark decimals, we can write a function like this:
include_char = lambda pos, a_string: a_string[pos].isalnum() or a_string[pos].isspace() or (a_string[pos] == '.' and a_string[pos-1:pos].isdigit())
words = "".join(char for pos, char in enumerate(fname) if include_char(pos, fname)).split()
What we're doing here is that the include_char function checks whether the character at a given position is "alphanumeric" (i.e. is a letter or a digit), is whitespace (so the words stay separated), or is a period whose preceding character is a digit. We use this function to keep only the characters we want, join them into a single string, and then separate that string into a list of strings using the str.split method.
This program may help you:
#I created a list of characters that I don't want \
# them to be considered as words!
char2remove = (".",",",";","!","?","*",":")
#Receive a string from the user.
string = raw_input("Enter your string: ")
#Make all the letters lower-case
string = string.lower()
#replace the special characters with white-space.
for char in char2remove:
    string = string.replace(char," ")
#Extract all the words in the new string (have repeats)
words = string.split(" ")
#creating a dictionary to remove repeats
to_count = dict()
for word in words:
    to_count[word]=0
#counting the word repeats.
for word in to_count:
    #if there is space in a word, it is white-space!
    if word.isalpha():
        print word, string.count(word)
Works as below:
>>> ================================ RESTART ================================
>>>
Enter your string: Hello, I am Bob. Hello to Bob *
i 1
am 1
to 1
bob 2
hello 2
>>>
Another way is to use a regex to replace all non-letter characters (which gets rid of the char2remove list):
import re
regex = re.compile('[^a-zA-Z]')
your_str = raw_input("Enter String: ")
your_str = your_str.lower()
# re.sub returns a new string, so the result has to be assigned back
your_str = regex.sub(' ', your_str)
words = your_str.split(" ")
to_count = dict()
for word in words:
    to_count[word]=0
for word in to_count:
    if word.isalpha():
        print word, your_str.count(word)
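As a Python 3 variation on the same regex idea, here is a minimal sketch that extracts the words directly with re.findall and counts them with collections.Counter. The `count_words` name is made up for illustration, and the pattern assumes that a word is a run of the letters a-z, so an apostrophe splits a word like "can't" into two.

```python
import re
from collections import Counter


def count_words(text):
    """Count runs of letters, ignoring case, punctuation, and digits."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


counts = count_words("Hello, I am Bob. Hello to Bob *")

# most_common() lists the words from the highest count to the lowest.
for word, count in counts.most_common():
    print(count, word)
```

Counting whole tokens this way also avoids the substring matches that str.count can pick up, for example "am" inside "name".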
Beginner python coder here, keep things simple, please.
So, I need this code below to scramble two letters without scrambling the first or last letters. Everything seems to work right up until the scrambler() function.
from random import randint
def wordScramble(string):
    stringArray = string.split()
    for word in stringArray:
        if len(word) >= 4:
            letter = randint(1,len(word)-2)
            point = letter
            while point == letter:
                point = randint(1, len(word)-2)
            word = switcher(word,letter,point)
    ' '.join(stringArray)
    return stringArray
def switcher(word,letter,point):
    word = list(word)
    word[letter],word[point]=word[point],word[letter]
    return word
print(wordScramble("I can't wait to see how this turns itself out"))
The outcome is always:
I can't wait to see how this turns itself out
Since you are a beginner, I tried to change your code as little as possible. Mostly, you are expecting changes to word to change the contents of your list stringArray. The comments mark the changes and reasons.
from random import randint
def wordScramble(myString): # avoid name clashes with python modules
    stringArray = myString.split()
    for i, word in enumerate(stringArray): # keep the index so we can update the list
        if len(word) >= 4:
            letter = randint(1,len(word)-2)
            point = letter
            while point == letter:
                point = randint(1, len(word)-2)
            stringArray[i] = switcher(word,letter,point) # update the array
    return ' '.join(stringArray) # return the result of the join
def switcher(word,letter,point):
    word = list(word)
    word[letter],word[point]=word[point],word[letter]
    return ''.join(word) # return word back as a string
print(wordScramble("I can't wait to see how this turns itself out"))
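The point about the loop variable can be seen in isolation with a small sketch (the list contents are arbitrary): assigning to the loop variable only rebinds that name, while assigning through an index changes the list itself.

```python
words = ["wait", "turns"]

# Rebinding the loop variable leaves the list untouched.
for word in words:
    word = word.upper()
after_rebinding = list(words)

# Assigning through the index replaces the element in the list.
for i, word in enumerate(words):
    words[i] = word.upper()

print(after_rebinding)
print(words)
```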
Because there had to be a cleaner (and better documented) way to do this:
from random import sample
def wordScramble(sentence):
    # Split sentence into words; apply switcher to each; rejoin into a sentence
    return ' '.join([switcher(x) for x in sentence.split()])
def switcher(word):
    if len(word) <= 3: # Don't bother if not enough letters to scramble
        return word
    # Pick 2 positions from interior of word
    a,b = sorted(sample( range(1,len(word)-1), 2 ))
    # Re-assemble word with our 2 positions swapped using bits before, between & after them
    return word[:a] + word[b] + word[a+1:b] + word[a] + word[b+1:]
print(wordScramble("I can't wait to see how this turns itself out"))