Python simple check - python

I am just learning Python and I want to check if a text contains a word. The function can find the word and print it, but when I use two return statements as below, the if/else always returns 0. Can you help me?
def line_number(text, word):
    """
    Returns the line number (beginning with 1) where the word appears for the first time
    :param text: Text in which the word is searched for
    :param word: A word to search for
    :return: List of Line Numbers
    """
    lines = text.splitlines()
    i = 0
    for line in lines:
        i = i +1
        if word in line:
            return i
        else:
            return 0
            # print("nope")
words = ['erwachet', 'Brust', 'Wie', 'Ozean', 'Knabe']
for word in words:
    num = line_number(wilhelm_tell, word)
    if num > 0:
        print(f"Das Word {word} findet sich auf Zeile {num}.")
    else:
        print(f"Das Wort {word} wurde nicht gefunden!")

You should return 0 after the for loop ends and not inside the loop.
def line_number(text, word):
    """
    Returns the line number (beginning with 1) where the word appears for the first time
    :param text: Text in which the word is searched for
    :param word: A word to search for
    :return: Line number of the first match, or 0 if the word is not found
    """
    lines = text.splitlines()
    i = 0
    for line in lines:
        i = i + 1
        if word in line:
            return i
    return 0
The problem is the else branch inside the loop: its return exits the function during the first iteration, so only the first line is ever checked.
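As a side note, the manual counter can be handed over to enumerate. This is only a minimal sketch of the same fix, not the answerer's code; the sample text is made up for illustration, because the question's wilhelm_tell variable is not shown.

```python
def line_number(text, word):
    """Return the 1-based number of the first line containing word, or 0."""
    # enumerate(..., start=1) replaces the manual i = i + 1 counter
    for number, line in enumerate(text.splitlines(), start=1):
        if word in line:
            return number
    # Reached only after every line has been checked without a match
    return 0


# Hypothetical sample text standing in for wilhelm_tell
sample = "Es lächelt der See\nDer Knabe schlief ein\nam grünen Gestade"
for word in ["Knabe", "Ozean"]:
    print(word, line_number(sample, word))
```

With this sample, Knabe is reported on line 2 and Ozean gives 0.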

I think the problem is that you're returning 0 on the first line that does not contain the word. Instead, you should return 0 only after you have looked through every line:
def line_number(text, word):
    """
    Returns the line number (beginning with 1) where the word appears for the first time
    :param text: Text in which the word is searched for
    :param word: A word to search for
    :return: Line number of the first match, or 0 if the word is not found
    """
    lines = text.splitlines()
    i = 0
    for line in lines:
        i = i + 1
        if word in line:
            return i
    return 0

I think the problem might be that you are returning 0 as soon as the word is not within a line. When you use the return keyword, it exits the function and returns the value. So if a word is not within the first line of the text, the function returns 0 (even though the word is present on later lines).
Here is a little example of what I think is happening:
def is_element_within_list(element_to_search, some_list):
    for element in some_list:
        if element == element_to_search:
            return True
        else:
            return False
some_list = [1, 2, 3]
element_to_search = 2
print(is_element_within_list(element_to_search, some_list))
# output: False
We have a function that checks if an element is within a list (we could use the keyword "in", but this is for the sake of the example). Even though 2 is within some_list, the function outputs False, because the else branch returns False as soon as the first pair of elements is not the same.
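For comparison, here is a sketch of the same example with the return False moved after the loop, next to the built-in in operator that does the same job:

```python
def is_element_within_list(element_to_search, some_list):
    for element in some_list:
        if element == element_to_search:
            return True
    # Only give up once every element has been compared
    return False


some_list = [1, 2, 3]
print(is_element_within_list(2, some_list))  # True
print(2 in some_list)                        # True, the built-in equivalent
```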

Related

identifying the substring when the number of characters in between don't matter

def checkPattern(x, string):
    e = len(string)
    if len(x) < e:
        return False
    for i in range(e - 1):
        x = string[i]
        y = string[i + 1]
        last = x.rindex(x)
        first = x.index(y)
        if last == -1 or first == -1 or last > first:
            return False
    return True
if __name__ == "__main__":
    x = str(input())
    string = "hello"
    if checkPattern(x, string) is True:
        print('YES')
    if checkPattern(x, string) is False:
        print('NO')
So basically the code is supposed to detect a pattern in the input even when other characters appear between the pattern's letters. string = "hello" is the pattern. The characters in between don't matter, but the order does: if I type "h.e.l.l.o", for example, it's a YES, but something like "hlelo" is a NO. I more or less copied the base of the code and I'm still a little new to Python, so sorry if the question and code aren't clear.
Assuming I understand, and the input hlelo is No and the input h.e..l.l.!o is Yes, then the following code should work:
def checkPattern(x, string):
    assert x and string, "Error. Both inputs should be non-empty. "
    count_idx = 0 # index which counts where you are.
    for letter in x:
        if letter == string[count_idx]:
            count_idx += 1 # increment to check the next string
        if count_idx == len(string):
            return True
    # pattern was found if counter found matches equal to string length
    return False
if __name__ == "__main__":
    inp = input()
    string = "hello"
    if checkPattern(inp, string) is True:
        print('YES')
    if checkPattern(inp, string) is False:
        print('NO')
Explanation: The function loops through each character of the input string x and checks whether the characters of the search string hello appear in the correct order. It counts how many of the characters h, e, l, l, o it has found so far, starting from 0. If it finds a match for h, it moves on to look for e, and so on. If the whole of x has been searched and the counter never reached the length of the search string (i.e. not all of the hello characters were found), it returns False.
EDIT: Small fix to the way the return worked: the function now returns True as soon as the counter reaches the length of the search string. Also added more examples given in comments.
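The counting idea above can also be written with an iterator. This is a sketch of an alternative idiom, not the answerer's code, and the name is_subsequence is made up for illustration:

```python
def is_subsequence(pattern, text):
    """Return True if the characters of pattern occur in text in order."""
    remaining = iter(text)
    # "ch in remaining" advances the iterator until it finds ch, so each
    # later character is searched only in the not-yet-consumed part of text.
    return all(ch in remaining for ch in pattern)


for candidate in ["h.e.l.l.o", "h.e..l.l.!o", "hlelo"]:
    print(candidate, "YES" if is_subsequence("hello", candidate) else "NO")
```

The first two inputs give YES and hlelo gives NO, because the second l can only be looked for after the first l has been consumed.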
Here is my solution to this problem:
pattern = "hello"
def patternCheck(word, pattern) -> bool:
    plist = list(pattern)
    wlist = list(word)
    for p in plist:
        if p in wlist:
            for _ in range(wlist.index(p) , -1, -1):
                wlist.pop(_)
        else:
            return False
    return True
print(patternCheck("h.e.l.l.o", pattern))
print(patternCheck("aalohel", pattern))
print(patternCheck("hhhhheeelllooo", pattern))
Explanation
First we convert our strings to lists:
plist = list(pattern)
wlist = list(word)
Now we check, using a for loop, whether each element of the pattern list is in the word list.
for p in plist:
    if p in wlist:
If yes, we remove all the elements from index 0 up to and including the index of that element.
        for _ in range(wlist.index(p) , -1, -1):
            wlist.pop(_)
We remove the elements in decreasing order of their indices to protect ourselves from the IndexError: pop index out of range.
If the for loop ends normally, there was a match and we return True. If an element is not found in the word list, we return False, as there is no match.

How to get the position of a character in Python and store it in a variable?

I am looking for a way to store the integer position of a character in a variable. Right now I'm using an approach I used in Delphi 2010, which is not right according to Jupyter Notebook.
This is the code I have so far:
def animal_crackers(text):
    for index in text:
        if index==' ':
            if text[0] == text[pos(index)+1]:
                return True
            else:
                return False
        else:
            pass
The aim is to take two words (word + space + word) and, if the first letters of both words match, return True; otherwise return False.
For getting the index of a letter in a string (as the title asks), just use str.index(), or str.find() if you don't want an error to be raised if the letter/substring could not be found:
>>> text = 'seal sheep'
>>> text.index(' ')
4
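To make the difference between the two methods concrete, here is a small sketch using the same sample text; the letter 'z' is simply a character that does not occur in it:

```python
text = 'seal sheep'

print(text.index(' '))   # 4
print(text.find(' '))    # 4
print(text.find('z'))    # -1, no exception

try:
    text.index('z')
except ValueError as error:
    # index() signals a missing substring with an exception instead of -1
    print("index() raised:", error)
```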
However for your program, you do not need to use str.index if you want to identify the first and second word. Instead, use str.split() to break up a given text into a list of substrings:
>>> words = text.split() # With no arguments, splits words by whitespace
>>> words
['seal', 'sheep']
Then, you can take the letter of the first word and check if the second word begins with the same letter:
# For readability, you can assign the two words into their own variables
>>> first_word, second_word = words[0], words[1]
>>> first_word[0] == second_word[0]
True
Combined into a function, it may look like this:
def animal_crackers(text):
    words = text.split()
    first_word, second_word = words[0], words[1]
    return first_word[0] == second_word[0]
Assuming that text is a single line containing two words:
def animal_crackers(text):
    words = text.split()
    if len(words) < 2:
        return False # fewer than two words, nothing to compare
    # here, the .lower() calls are only needed if the comparison should ignore case
    # if you do care about the case of the letters, remove them
    if words[0][0].lower() == words[1][0].lower():
        return True
    else:
        return False

Binary Search using a for loop, searching for words in a list and comparing

I'm trying to compare the words in "alice_list" to "dictionary_list", and if a word isn't found in "dictionary_list", print it and say it is probably misspelled. I'm having an issue where nothing is printed when a word is not found; maybe you guys could help me out. I convert the words in "alice_list" to uppercase, as "dictionary_list" is all in capitals. Any help with why it's not working would be appreciated, as I'm about to pull my hair out over it!
import re
# This function takes in a line of text and returns
# a list of words in the line.
def split_line(line):
    return re.findall('[A-Za-z]+(?:\'[A-Za-z]+)?', line)
# --- Read in a file from disk and put it in an array.
dictionary_list = []
alice_list = []
misspelled_words = []
for line in open("dictionary.txt"):
    line = line.strip()
    dictionary_list.extend(split_line(line))
for line in open("AliceInWonderLand200.txt"):
    line = line.strip()
    alice_list.extend(split_line(line.upper()))
def searching(word, wordList):
    first = 0
    last = len(wordList) - 1
    found = False
    while first <= last and not found:
        middle = (first + last)//2
        if wordList[middle] == word:
            found = True
        else:
            if word < wordList[middle]:
                last = middle - 1
            else:
                first = middle + 1
    return found
for word in alice_list:
    searching(word, dictionary_list)
--------- EDITED CODE THAT WORKED ----------
Updated a few things in case anyone has the same issue, and used "if word not in" to double-check what the search was outputting.
"""-----Binary Search-----"""
# search for each word; if the search ends without finding it, print it
words = alice_list
for word in alice_list:
    first = 0
    last = len(dictionary_list) - 1
    found = False
    while first <= last and not found:
        middle = (first + last) // 2
        if dictionary_list[middle] == word:
            found = True
        else:
            if word < dictionary_list[middle]:
                last = middle - 1
            else:
                first = middle + 1
    # test the flag rather than dictionary_list[last]: when last ends at -1
    # the index wraps around to the final element and the word is missed
    if not found:
        print("NEW:", word)
# checking to make sure words match
for word in alice_list:
    if word not in dictionary_list:
        print(word)
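The hand-written loop can also be replaced by the standard library's bisect module. The sketch below assumes, like the loop above, that the dictionary list is sorted; in_sorted_list and the two sample lists are hypothetical names used only for illustration.

```python
from bisect import bisect_left


def in_sorted_list(word, sorted_words):
    """Binary search: True if word occurs in the sorted list sorted_words."""
    # bisect_left returns the leftmost position where word could be inserted
    # while keeping the list sorted; word is present only if it is already there.
    position = bisect_left(sorted_words, word)
    return position < len(sorted_words) and sorted_words[position] == word


# Hypothetical stand-ins for dictionary_list and alice_list
dictionary_sample = ["ALICE", "RABBIT", "WAS", "WHITE"]
alice_sample = ["ALICE", "WAS", "BEGINING"]

for word in alice_sample:
    if not in_sorted_list(word, dictionary_sample):
        print("NEW:", word)
```

Only BEGINING is reported here, since it is the one sample word missing from the sample dictionary.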
Your function split_line() returns a list. You then take the output of the function and append it to the dictionary list, which means each entry in the dictionary is a list of words rather than a single word. The quick fix is to use extend instead of append.
dictionary_list.extend(split_line(line))
A set might be a better choice than a list here, then you wouldn't need the binary search.
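A rough sketch of the set idea follows. The two short lists are made-up stand-ins for the lists built from the files in the question:

```python
# Hypothetical stand-ins for the lists built from the two files
dictionary_list = ["ALICE", "RABBIT", "WAS", "WHITE"]
alice_list = ["ALICE", "WAS", "BEGINING", "ALICE"]

# Membership tests on a set take roughly constant time on average,
# and the dictionary does not need to be sorted.
dictionary_set = set(dictionary_list)
not_found = [word for word in alice_list if word not in dictionary_set]
print(not_found)
```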
--EDIT--
To print words not in the list, just filter the list based on whether your function returns False. Something like:
notfound = [word for word in alice_list if not searching(word, dictionary_list)]
Are you required to use binary search for this program? Python has a handy operator called "in". Given an element as the first operand and a list/set/dictionary/tuple as the second, it returns True if that element is in the structure, and False if it is not.
Examples:
1 in [1, 2, 3, 4] -> True
"APPLE" in ["HELLO", "WORLD"] -> False
So, for your case, most of the script can be simplified to:
for word in alice_list:
    if word not in dictionary_list:
        print(word)
This will print each word that is not in the dictionary list.

How to find check only first index in each split string?

I am trying to define a function that:
Splits a string called text at every new line (ex text="1\n2\n\3")
Checks ONLY the first index in each of the individual split items to see if it is a digit 0-9.
Returns the index of any line that starts with 0-9; it can be more than one line
ex: count_digit_leading_lines ("AAA\n1st") → 1 # 2nd line starts w/ digit 1
So far my code looks like this, but I can't figure out how to get it to check only the first index in each split string:
def count_digit_leading_lines(text):
    for line in range(len(text.split('\n'))):
        for index, line in enumerate(line):
            if 0<=num<=9:
                return index
It accepts the argument text and iterates over each individual line (the new split strings). I think it goes on to check only the first index, but this is where I get lost...
The code should be as simple as:
text=text.strip() #strip all whitespace : for cases ending with '\n' or having two '\n' together
text=text.replace('\t','') #for cases with '\t' etc
s=text.split('\n') #Split each sentence (# '\n')
#s=[words.strip() for words in s] #can also use this instead of replace('\t')
for i,sentence in enumerate(s):
    char=sentence[0] #get first char in each sentence
    if char.isdigit(): #if 1st char is a digit (0-9)
        return i
UPDATE:
Just noticed OP's comment on another answer stating that you don't want to use enumerate in your code (though it's good practice to use enumerate). So the modified version of the for loop without enumerate is:
for i in range(len(s)):
    char=s[i][0] #get first char in each sentence
    if char.isdigit(): #if 1st char is a digit (0-9)
        return i
This should do it:
texts = ["1\n2\n\3", 'ABC\n123\n456\n555']
def _get_index_if_matching(text):
    split_text = text.split('\n')
    if split_text:
        for line_index, line in enumerate(split_text):
            try:
                num = int(line[0])
                if 0 <= num <= 9: # inclusive bounds, so 0 and 9 count as digits too
                    return line_index
            except (ValueError, IndexError): # not a digit, or an empty line
                pass
for text in texts:
    print(_get_index_if_matching(text))
It will print 0 and then 1.
You could change out your return statement for a yield, making your function a generator. Then you could get the indexes one by one in a loop, or make them into a list. Here's a way you could do it:
def count_digit_leading_lines(text):
    for index, line in enumerate(text.split('\n')):
        try:
            int(line[0])
            yield index
        except ValueError: pass
# Usage:
for index in count_digit_leading_lines(text):
    print(index)
# Or to get a list
print(list(count_digit_leading_lines(text)))
Example:
In : list(count_digit_leading_lines('he\n1\nhto2\n9\ngaga'))
Out: [1, 3]
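If blank lines can occur in the text, line[0] raises IndexError. A possible variation is sketched below; it is not taken from any of the answers, and the name digit_leading_lines is made up. It slices instead of indexing and returns all matching indexes as a list. Note that str.isdigit() also accepts some non-ASCII digit characters, so this assumes plain ASCII input.

```python
def digit_leading_lines(text):
    """Return the 0-based indexes of all lines whose first character is a digit."""
    # line[:1] is '' for an empty line, and ''.isdigit() is False,
    # so blank lines are skipped instead of raising IndexError.
    return [index for index, line in enumerate(text.split('\n'))
            if line[:1].isdigit()]


print(digit_leading_lines("AAA\n1st"))
print(digit_leading_lines("he\n\n1\nhto2\n9\ngaga"))
```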

Find anagrams of a given word in a file

Alright so for class we have this problem where we need to be able to input a word and from a given text file (wordlist.txt) a list will be made using any anagrams of that word found in the file.
My code so far looks like this:
def find_anagrams1(string):
    """Takes a string and returns a list of anagrams for that string from the wordlist.txt file.
    string -> list"""
    anagrams = []
    file = open("wordlist.txt")
    next = file.readline()
    while next != "":
        isit = is_anagram(string, next)
        if isit is True:
            anagrams.append(next)
        next = file.readline()
    file.close()
    return anagrams
Every time I try to run the program it just returns an empty list, despite the fact that I know there are anagrams present. Any ideas on what's wrong?
P.S. The is_anagram function looks like this:
def is_anagram(string1, string2):
    """Takes two strings and returns True if the strings are anagrams of each other.
    list,list -> string"""
    a = sorted(string1)
    b = sorted(string2)
    if a == b:
        return True
    else:
        return False
I am using Python 3.4
The problem is that you are using the readline function. From the documentation:
file.readline = readline(...)
    readline([size]) -> next line from the file, as a string.
    Retain newline. A non-negative size argument limits the maximum
    number of bytes to return (an incomplete line may be returned then).
    Return an empty string at EOF.
The key information here is "Retain newline". That means that if you have a file containing a list of words, one per line, each word is going to be returned with a terminal newline. So when you call:
next = file.readline()
You're not getting example, you're getting example\n, so this will never match your input string.
A simple solution is to call the strip() method on the lines read from the file:
next = file.readline()
while next != "":
    word = next.strip() # strip inside the loop so a blank line does not end it early
    isit = is_anagram(string, word)
    if isit is True:
        anagrams.append(word)
    next = file.readline()
file.close()
However, there are several problems with this code. To start with, file is a poor name for a variable: in Python 2 it shadows the built-in file type. Likewise, next shadows the built-in next() function.
Rather than repeatedly calling readline(), you're better off taking advantage of the fact that an open file is an iterator which yields the lines of the file:
words = open('wordlist.txt')
for word in words:
    word = word.strip()
    isit = is_anagram(string, word)
    if isit:
        anagrams.append(word)
words.close()
Note also here that since is_anagram returns True or False, you don't need to compare the result to True or False (e.g., if isit is True). You can simply use the return value on its own.
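Putting these points together, a complete function might look roughly like the sketch below. The signature (a target word plus any iterable of lines) is an assumption made so the example runs without wordlist.txt; io.StringIO stands in for the open file.

```python
import io


def find_anagrams(target, lines):
    """Return the words from lines (any iterable of strings) that are anagrams of target."""
    signature = sorted(target)  # compute once instead of once per word
    anagrams = []
    for line in lines:
        word = line.strip()  # drop the trailing newline kept by file iteration
        if sorted(word) == signature:
            anagrams.append(word)
    return anagrams


# io.StringIO stands in for open("wordlist.txt") so the sketch runs without the file
fake_file = io.StringIO("enlist\ngoogle\nsilent\ntinsel\n")
print(find_anagrams("listen", fake_file))
```

With a real file, the same function could be called inside a with open("wordlist.txt") as words: block, which also closes the file automatically.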
Yikes, don't use for loops:
import collections
def find_anagrams(x):
    anagrams = [''.join(sorted(list(i))) for i in x]
    anagrams_counts = [item for item, count in collections.Counter(anagrams).items() if count > 1]
    return [i for i in x if ''.join(sorted(list(i))) in anagrams_counts]
Here's another solution, that I think is quite elegant. This runs in O(n * m) where n is the number of words and m is number of letters (or average number of letters/word).
# anagarams.py
from collections import Counter
import urllib.request
def word_hash(word):
    return frozenset(Counter(word).items())
def download_word_file():
    url = 'https://raw.githubusercontent.com/first20hours/google-10000-english/master/google-10000-english-no-swears.txt'
    urllib.request.urlretrieve(url, 'words.txt')
def read_word_file():
    with open('words.txt') as f:
        words = f.read().splitlines()
    return words
if __name__ == "__main__":
    # downloads a file to your working directory
    download_word_file()
    # reads file into memory
    words = read_word_file()
    d = {}
    for word in words:
        k = word_hash(word)
        if k in d:
            d[k].append(word)
        else:
            d[k] = [word]
    # Prints the filtered results to only words with anagrams
    print([x for x in d.values() if len(x) > 1])
