Store value of "if any(" search - python

I'm trying to store the value of word, is there a way to do this?
if any(word in currentFile for word in otherFile):

Don't use any if you want the words themselves:
words = [word for word in otherFile if word in currentFile]
Then you can truth-test directly (since an empty list is falsy):
if words:
    # do stuff
And also access the words that matched:
print words
EDIT: If you only want the first matching word, you can do that too:
word = next((word for word in otherFile if word in currentFile), None)
if word:
    # do stuff with word
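For readers on Python 3.8 or later, here is a minimal sketch of two ways to keep the first match. The function names and sample data are made up for illustration, and it assumes currentFile is a string while otherFile is a list of words. The second function keeps any() and captures the word with an assignment expression (:=), which binds the name in the enclosing function.

```python
def first_match(words, text):
    """Return the first word in `words` that occurs in `text`, or None."""
    return next((word for word in words if word in text), None)


def first_match_walrus(words, text):
    """Same result, but keeps any() and captures the word with := (3.8+)."""
    found = None
    if any((found := word) in text for word in words):
        return found
    # any() was False, so `found` only holds the last word tried (or None).
    return None


current_file = "the quick brown fox"      # hypothetical file contents
other_file = ["cat", "fox", "dog"]        # hypothetical word list

print(first_match(other_file, current_file))         # fox
print(first_match_walrus(other_file, current_file))  # fox
print(first_match(["cat"], current_file))            # None
```

The next() form is usually the easier one to read; the := form is mainly useful when the any() test already exists and you only need the matching value as a side effect.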

Just a little follow-up:
Consider what the input to the any() function is here: a generator expression. Let's break it down:
word in currentFile is a boolean expression; its value is True or False.
for word in otherFile iterates over otherFile.
So the argument to any() is in fact a generator of boolean values. You can check this by executing [word in currentFile for word in otherFile]. Note that the brackets mean a list is created, with all values computed at once. A generator works the same way functionally (if all you do is loop over the values once), but is better memory-wise. The point is that what you feed to any() is a sequence of booleans. It has no knowledge of the actual words, so it cannot possibly output one.
No. You'll have to write an explicit loop:
def find_first(currentFile, otherFile):
    for word in otherFile:
        if word in currentFile:
            return word
If no match is found, the function implicitly returns None, which the caller of find_first() can then handle.
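To make the point about any() seeing only booleans concrete, here is a small sketch written for Python 3. The sample data is hypothetical; current_file is assumed to be a string and other_file a list of words.

```python
current_file = "alpha beta gamma"         # hypothetical text
other_file = ["delta", "beta", "gamma"]   # hypothetical word list

# What any() actually receives: one boolean per word, nothing else.
flags = [word in current_file for word in other_file]
print(flags)        # [False, True, True]
print(any(flags))   # True

# Pairing each flag with its word keeps the information any() discards.
matches = [word for word, hit in zip(other_file, flags) if hit]
print(matches)      # ['beta', 'gamma']
```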

You're not going to be able to store this value directly from any. I'd recommend a for-loop:
for word in otherFile:
    if word in currentFile:
        break
else:
    word = None
if word is not None:
    print word, "was found in the current file"
Note that this will store only the first relevant value of word. If you would like all relevant values of word, then this should do it:
words = [word for word in otherFile if word in currentFile]
for word in words:
    print word, "was found in the current file"

You can get the first word from otherFile that is also in currentFile by dropping all words from otherFile that are not in currentFile and then taking the next one:
from itertools import dropwhile
word = next(dropwhile(lambda word: word not in currentFile, otherFile))
If there is no such word, this raises StopIteration.
You can get all words from otherFile that are also in currentFile by using a list comprehension:
words = [word for word in otherFile if word in currentFile]
Or by using a set intersection:
words = list(set(otherFile) & set(currentFile))
Or by using the filter function:
words = filter(lambda word: word in currentFile, otherFile)
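The variants above differ in a few details that are easy to miss, so here is a side-by-side sketch for Python 3 with made-up data. It assumes both inputs are lists of words. Passing None as the default to next() is an addition that avoids the StopIteration mentioned above.

```python
from itertools import dropwhile

current_file = ["red", "green", "blue", "green"]     # hypothetical words
other_file = ["pink", "green", "blue", "green"]      # hypothetical words

# A default of None avoids StopIteration when nothing matches.
first = next(dropwhile(lambda w: w not in current_file, other_file), None)
missing = next(dropwhile(lambda w: w not in current_file, ["pink"]), None)

# The comprehension keeps order and duplicates from other_file ...
in_order = [w for w in other_file if w in current_file]
# ... while the set intersection keeps neither, so sort it for a stable result.
unique = sorted(set(other_file) & set(current_file))
# On Python 3, filter() returns a lazy iterator; list() materialises it.
filtered = list(filter(lambda w: w in current_file, other_file))

print(first)      # green
print(missing)    # None
print(in_order)   # ['green', 'blue', 'green']
print(unique)     # ['blue', 'green']
print(filtered)   # ['green', 'blue', 'green']
```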

Related

Display element of a list detected by "if any"

I'm using a simple system to check if some banned words are currently in a string but I'd like to improve it to display the word in question so I added the line
print ("BANNED WORD DETECTED : ", word)
But I get the error
"NameError: name 'word' is not defined"
I think the problem is that my system is just checking whether any of the words is in the list without "storing it" somewhere. Maybe I am misunderstanding the Python list system. Any advice on what I should modify?
# -*- coding: utf-8 -*-
bannedWords = ['word1','word2','check']
mystring = "the string i'd like to check"
if any(word in mystring for word in bannedWords):
    print ("BANNED WORD DETECTED : ", word)
else :
    print (mystring)
any() isn't suitable for this; use a generator expression with next() instead, or a list comprehension:
banned_word = next((word for word in mystring.split() if word in bannedWords), None)
if banned_word is not None:
    print("BANNED WORD DETECTED : ", banned_word)
Or for multiple words:
banned_words = [word for word in mystring.split() if word in bannedWords]
if banned_words:
    print("BANNED WORD DETECTED : ", ','.join(banned_words))
For improved O(1) membership testing, make bannedWords a set rather than a list.
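A short sketch of the set-based variant, with the question's sample string extended by a comma and an extra word to show one caveat. The variable names here are illustrative. Splitting the string and testing tokens against a set is fast, but it compares whole tokens, so attached punctuation prevents a match, whereas testing each banned word against the whole string matches substrings.

```python
banned_words = {"word1", "word2", "check"}     # a set: O(1) average lookup
my_string = "the string i'd like to check, unchecked"

# Token matching: split the string, then test each token against the set.
tokens = my_string.split()
token_hits = [word for word in tokens if word in banned_words]

# Substring matching: test each banned word against the whole string.
substring_hits = sorted(word for word in banned_words if word in my_string)

print(token_hits)       # []
print(substring_hits)   # ['check']
```

Here "check," (with the comma) is not equal to "check", so the token approach finds nothing unless punctuation is stripped first; which behaviour is wanted depends on the use case.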
Don't use any here. A generator isn't the right tool either. You actually want a list comprehension to collect all the matching words.
matching = [word for word in bannedWords if word in mystring]
if matching:
    print ("BANNED WORD(S) DETECTED : ", ','.join(matching))
You could do that very easily.
Just use a flag variable to record whether anything was found within the loop:
detected = False
for word in bannedWords:
    if word in mystring:
        detected = True
        print("Detected banned word", word)
if not detected:
    print(mystring)
If you want a more pythonic way:
print("Banned words are {}".format([word for word in bannedWords if word in mystring]) if len([word for word in bannedWords if word in mystring]) else mystring)

Find how many words start with certain letter in a list

I am trying to output the total of how many words start with a letter 'a' in a list from a separate text file. I'm looking for an output such as this.
35 words start with a letter 'a'.
However, with my current code I'm outputting all the words that start with an 'a' instead of the total. Should I be using something other than a for loop?
So far, this is what I have attempted:
wordsFile = open("words.txt", 'r')
words = wordsFile.read()
wordsFile.close()
wordList = words.split()
print("Words:",len(wordList)) # prints number of words in the file.
a_words = 0
for a_words in wordList:
    if a_words[0]=='a':
        print(a_words, "start with the letter 'a'.")
The output I'm getting thus far:
Words: 334
abate start with the letter 'a'.
aberrant start with the letter 'a'.
abeyance start with the letter 'a'.
and so on.
You could replace this with a sum call in which you feed 1 for every word in wordList that starts with a:
print(sum(1 for w in wordList if w.startswith('a')), 'start with the letter "a"')
This can be trimmed down further if you use the boolean values returned by startswith instead. Since True is treated as 1 in these contexts, the effect is the same:
print(sum(w.startswith('a') for w in wordList), 'start with the letter "a"')
With your current approach, you're not summing anything; you're simply printing every word that matches. In addition, you're rebinding a_words from an int to the contents of the list as you iterate through it.
Also, instead of using a_words[0] to check the first character, you could use startswith(character), which has the same effect and is a bit more readable.
You are using a_words as the value of the word in each iteration, and you are missing a counter. If we change the for loop to use words as the loop variable and reserve a_words for the counter, we can increment the counter each time the criterion is met. You could rename a_words to wordCount or something generic to make it more reusable for other letters.
a_words = 0
for words in wordList:
    if words[0]=='a':
        a_words += 1
print(a_words, "start with the letter 'a'.")
sum(generator) is the way to go, but for completeness' sake, you may want to do it with a list comprehension (perhaps because it's slightly more readable, or because you want to do something else with the words starting with a).
words_starting_with_a = [word for word in wordList if word.startswith('a')]
After that you can use the len built-in to get the length of the new list.
print(len(words_starting_with_a), "words start with a letter 'a'")
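Putting the counting approaches together, here is a self-contained sketch. The helper name count_starting_with and the sample text are made up, and the string stands in for the contents of words.txt. Ignoring case is an extra assumption that the answers above do not make; drop the lower() calls for a case-sensitive count.

```python
def count_starting_with(words, letter):
    """Count the words that start with `letter`, ignoring case."""
    # Each startswith() result is a bool; sum() treats True as 1.
    return sum(word.lower().startswith(letter.lower()) for word in words)


text = "abate aberrant Abeyance bring a cat"   # stand-in for words.txt
word_list = text.split()

total = count_starting_with(word_list, "a")
print(total, "words start with the letter 'a'.")   # 4 words start with the letter 'a'.
```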
A simple alternative using the re.findall function (without splitting the text or using a for loop):
import re
...
words = wordsFile.read()
...
# \w* (rather than \w+?) also counts the one-letter word "a"
total = len(re.findall(r'\ba\w*', words))
print('Total number of words that start with a letter "a" : ', total)

Cannot prove that there are Words in my list for my WordSearch game

I have created a list of all possible lines for this specific word grid (rows, columns, diagonals, and all the reverses too).
I have called this allWords, but when I try to find specific words that I know are in allWords, the loop does not find the hidden words. I know what my problem is, but I do not know how to get around it (sorry for the terrible explanation; hopefully the example below shows it better).
An example follows. My wordList is the list of words that I know are hidden somewhere in the word grid. My allWords is a list of rows, columns, and diagonals from the word grid:
WordList = ['HAMMER','....']
allWords = ['ARBHAMMERTYU','...']
HAMMER is in allWords but 'cloaked' by other characters around it, so I am unable to show that HAMMER is in the word grid.
length = len(allWords)
for i in range(length):
    word = allWords[i]
    if word in wordList:
        print("I have found", word)
It does not find the word HAMMER in allWords.
Any help towards solving this problem would be great.
You are not comparing each word in wordList to each entry in allWords. The line if word in wordList tests for an exact match.
i.e.
if word in wordList returns True only if the whole entry (for example 'ARBHAMMERTYU') is itself an element of wordList.
To match substrings you need another loop:
for i in range(length):
    word = allWords[i]
    for w in WordList:
        if w in word:
            print("I have found", w)
If I understand your problem correctly, you probably need to implement a function that checks whether a token (e.g. 'HAMMER') is present in any of the entries in allWords. My best bet for a solution would be to use regular expressions.
import re
def findWordInWordList(word, allWords):
    # re.escape guards against regex metacharacters in word
    pattern = re.compile(re.escape(word))
    for item in allWords:
        match = pattern.search(item)
        if match:
            return match
This returns the first occurrence (as a match object); if you want more, it's easy to collect them in a list.
You could try something like this:
for word in allWords:
    if word in WordList:
        print("I have found", word)
Ah, or maybe the error is that you wrote wordList and you really defined WordList. Hope this helps.
If I understand correctly, you are trying to find a match inside allWords and you want to iterate over WordList and determine if there is a substring match.
So, if that is correct, then your code is not exactly doing that. To go through your code step by step to correct what is happening:
length = len(allWords)
for i in range(length):
What you want above is not to iterate over allWords. You want to iterate over WordList and check whether each word is inside allWords. Your code does not do that; instead, you want this:
length = len(WordList)
for i in range(length):
With that in mind, that means now you want to reference WordList and not allWords, so you want to now change this:
word = allWords[i]
to this:
word = WordList[i]
Finally, here is a new bit of information that lets you determine whether you have a substring match in the strings you are comparing: the built-in function any. It returns True if at least one element of what you pass it is true. It looks like this:
any("something" in word for word in words)
It returns True if "something" is in at least one word; otherwise it returns False.
So, to put this all together, and run your code with your sample input, we get:
WordList = ['HAMMER','....']
allWords = ['ARBHAMMERTYU','...']
length = len(WordList)
for i in range(length):
    word = WordList[i]
    if any(word in w for w in allWords):
        print("I have found", word)
Output:
I have found HAMMER
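If it is also useful to know where each hidden word was found, the any() test can be swapped for a comprehension that records the matching lines. This is only a sketch: the function name find_hidden_words and the sample grid lines are hypothetical, and it assumes allWords already contains every row, column, diagonal, and reversed line as plain strings.

```python
def find_hidden_words(word_list, all_lines):
    """Map each hidden word to the indices of the lines that contain it."""
    found = {}
    for word in word_list:
        # Substring test against every line, remembering where it matched.
        hits = [i for i, line in enumerate(all_lines) if word in line]
        if hits:
            found[word] = hits
    return found


word_list = ["HAMMER", "SAW"]                        # hypothetical hidden words
all_lines = ["ARBHAMMERTYU", "QWERTY", "XHAMMERX"]   # hypothetical grid lines

print(find_hidden_words(word_list, all_lines))   # {'HAMMER': [0, 2]}
```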

How to put only words that start with an uppercase letter in the dictionary?

I have a text document. I want to compile a dictionary (DICT) from this document. The dictionary must only contain all the words that begin with an uppercase letter. (it does not matter if the word is at the beginning of a sentence)
Until now I have done this:
By the way I must use the for loop and the split function for this problem
DICT = {}
for line in lines: # lines is the text without line breaks
    words = line.split(" ")
    for word in words:
        if word in DICT:
            DICT[word] += 1
        else:
            DICT[word] = 1
But I suppose this only makes the dictionary out of all the words in my text.
How do I only choose the words that begin with a capital letter?
How do I verify if I have made the dictionary correctly?
Use the s.isupper() method to test if a string is uppercase. You can use indexing to select just the first character.
Thus, to test if the first character is uppercase, use:
if word[0].isupper():
If you want a fast and pythonic approach, use a collections.Counter() object to do the counting, and split on all whitespace to remove newlines:
from collections import Counter
counts = Counter()
for line in lines: # lines is the text without line breaks
    counts.update(word for word in line.split() if word[0].isupper())
Here, line.split() without arguments splits on all whitespace, removing any whitespace at the start and end of the line (including the newline).
from itertools import groupby
s = "QWE asd ZXc vvQ QWE"
# extract all the words with capital first letter
caps = [word for word in s.split(" ") if word[0].isupper()]
# group and count them
caps_counts = {word: len(list(group)) for word, group in groupby(sorted(caps))}
print(caps_counts)
groupby might be less efficient than manual looping because it requires a sorted iterable, and sorting is O(N log N), versus O(N) for manual looping. But this variant is a bit more "pythonic".
You can check whether the word begins with a capital letter using the isupper method mentioned above; put this check before your if/else statement.
if word[0].isupper():
    if word in DICT:
        DICT[word] += 1
    else:
        DICT[word] = 1
To then verify this you can use the any function:
any(word[0].islower() for word in DICT.keys())
This should return False. You can assert this if you choose.
To make everything a bit nicer you can use a defaultdict:
from collections import defaultdict
DICT = defaultdict(int)
for line in lines:
    words = line.split(" ")
    for word in words:
        # skip empty strings produced by split(" ") on repeated spaces
        if word and word[0].isupper():
            DICT[word] += 1
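Since the question requires a for loop and split, here is a self-contained sketch that stays within those constraints and also shows one way to verify the result. The function name count_capitalized and the sample lines are invented for illustration; dict.get is used in place of the if/else on membership.

```python
def count_capitalized(lines):
    """Count words whose first character is uppercase, using for and split."""
    counts = {}
    for line in lines:
        for word in line.split():          # no-argument split() never yields ''
            if word[0].isupper():
                counts[word] = counts.get(word, 0) + 1
    return counts


lines = ["The cat saw Tom", "Tom saw  the Cat"]   # hypothetical input
counts = count_capitalized(lines)
print(counts)   # {'The': 1, 'Tom': 2, 'Cat': 1}

# Verification: every key must start with an uppercase character.
assert all(word[0].isupper() for word in counts)
```

Note that a word with attached punctuation (for example "Tom,") is counted as a separate key; stripping punctuation first would be a further step not covered here.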

Removing punctuation/numbers from text problem

I had some code that worked fine for removing punctuation and numbers using regular expressions in Python. I had to change the code a bit so that a stop list worked, which is not particularly important. Anyway, now the punctuation isn't being removed, and quite frankly I'm stumped as to why.
import re
import nltk
# Quran subset
filename = raw_input('Enter name of file to convert to ARFF with extension, eg. name.txt: ')
# create list of lower case words
word_list = re.split('\s+', file(filename).read().lower())
print 'Words in text:', len(word_list)
# punctuation and numbers to be removed
punctuation = re.compile(r'[-.?!,":;()|0-9]')
for word in word_list:
    word = punctuation.sub("", word)
print word_list
Any pointers on why it's not working would be great, I'm no expert in python so it's probably something ridiculously stupid. Thanks.
Change
for word in word_list:
    word = punctuation.sub("", word)
to
word_list = [punctuation.sub("", word) for word in word_list]
Assignment to word in the for-loop above simply rebinds this temporary variable. It does not alter word_list.
You're not updating your word list. Try
for i, word in enumerate(word_list):
    word_list[i] = punctuation.sub("", word)
Remember that although word starts off as a reference to the string object in the word_list, assignment rebinds the name word to the new string object returned by the sub function. It doesn't change the originally referenced object.
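The rebinding behaviour is easy to see in isolation. The following sketch is written for Python 3 and uses a made-up three-word list with the same punctuation pattern as the question.

```python
import re

punctuation = re.compile(r'[-.?!,":;()|0-9]')
word_list = ["hello,", "world!", "42nd"]   # hypothetical sample words

# Rebinding the loop variable leaves the list untouched.
for word in word_list:
    word = punctuation.sub("", word)
print(word_list)   # ['hello,', 'world!', '42nd']

# Building a new list (or assigning by index) keeps the cleaned strings.
cleaned = [punctuation.sub("", word) for word in word_list]
print(cleaned)     # ['hello', 'world', 'nd']
```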
