Pangram detection

Beginner here.
Given a string, my code must detect whether or not it is a pangram: return True if it is, False if not. It should ignore numbers and punctuation.
When given "ABCD45EFGH,IJK,LMNOPQR56STUVW3XYZ" it returns None, and when given "This isn't a pangram! is not a pangram." it returns True when the answer should be False.
What am I not seeing?
import string
def is_pangram(s):
    singlechar = set(s)
    list = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
    for index, item in enumerate(singlechar):
        if item in list:
            list.remove(item)
            if list:
                return True
                break
    if not list:
        return False

Sets are a great way to check whether something belongs in two collections with their intersection or doesn't belong in one of the two with their difference.
In your case, if the intersection between the set of the letters in your phrase and the letters a-z is of length 26, it is a pangram.
from string import ascii_lowercase
def is_pangram(s):
    return len(set(s.lower()).intersection(ascii_lowercase)) == 26
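As a quick check against the two inputs from the question, the same idea can also be written with the subset operator. This is only a sketch; the name is_pangram_subset is made up for illustration and is not part of the answer above.

```python
from string import ascii_lowercase

def is_pangram_subset(s):
    # Hypothetical helper name: True when every letter a-z occurs in s.
    # Digits and punctuation need no special handling because only the
    # presence of the 26 letters is tested.
    return set(ascii_lowercase) <= set(s.lower())

print(is_pangram_subset("ABCD45EFGH,IJK,LMNOPQR56STUVW3XYZ"))        # True
print(is_pangram_subset("This isn't a pangram! is not a pangram."))  # False
```

Lowercasing first is what makes the all-uppercase input pass.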

You could have kept using sets and their .difference method to check whether any letter of the alphabet is missing from the string. Before that, strip punctuation and whitespace from the string and make it lowercase (using the string methods .lower, .translate and str.maketrans):
import string
def is_pangram(s):
    input_set = set(s.lower().translate(
        str.maketrans('', '', f'{string.punctuation} ')))
    check_set = set(string.ascii_lowercase)
    return not check_set.difference(input_set)
value1 = 'The quick brown fox jumps over a lazy dog!'
print(is_pangram(value1))
# True
value2 = 'This isn\'t a pangram! is not a pangram'
print(is_pangram(value2))
# False
If you want to still do it with a list:
def is_pangram(s):
    input_set = set(s.lower().translate(
        str.maketrans('', '', f'{string.punctuation} ')))
    lst = list(string.ascii_lowercase)
    for item in input_set:
        if item in lst:
            lst.remove(item)
            if not lst:
                return True
    return False

Related

Why does python return None in this instance?

I have this python practice question which is to return True if a word is an isogram (word with nonrepeating characters). It is also supposed to return True if the isogram is a blank string.
My answer didn't work out.
from string import ascii_lowercase
def is_isogram(iso):
    for x in iso:
        return False if (iso.count(x) > 1) and (x in ascii_lowercase) else True
#None
While another answered:
def is_isogram(word):
    word = str(word).lower()
    alphabet_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
    for i in word:
        if word.count(i) > 1 and i in alphabet_list:
            return False
    return True
#True
I'm not sure why the return value is different with just a slightly different structure. Or is it how the return statement is defined?

I would use a set operation. Using str.count repeatedly is expensive as you need to read the whole string over and over.
If your string only has unique characters, then its length equals that of its set of characters.
def is_isogram(iso):
    return len(set(iso)) == len(iso)
print(is_isogram('abc'))
print(is_isogram('abac'))
print(is_isogram(''))
print(is_isogram(' '))
Output:
True
False
True
True
You can easily implement additional checks. For instance, convert the string to a single case if case doesn't matter. If you want to exclude some characters (e.g. spaces), pre-filter them: iso = [x for x in iso if x not in excluded_set].
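A minimal sketch of those two extra checks combined, assuming that case should be ignored and that spaces and hyphens should not count. The function name and the default excluded set are illustrative choices, not part of the original exercise.

```python
def is_isogram_filtered(iso, excluded=frozenset(" -")):
    # Hypothetical variant: lowercase first, then drop excluded characters
    # before comparing the number of characters with the number of unique ones.
    letters = [x for x in iso.lower() if x not in excluded]
    return len(set(letters)) == len(letters)

print(is_isogram_filtered("Six-year old"))  # True
print(is_isogram_filtered("Abba"))          # False
print(is_isogram_filtered(""))              # True
```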

I think the difference is that the other code loops over the letters in the word and returns False as soon as a repeated letter is found; only if it gets to the end of the word without finding one does it return True.
In your code, because the return statement for either outcome is inside the for loop, it only checks the first letter, not the rest of the word.
I tried your code and I get True unless the first letter is repeated.
Edit: I didn't cover the None output. Someone else has already commented that it happens because you never enter your for loop (for example, with an empty string).
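To make both effects visible, here is a small sketch that keeps the shape of the question's function (renamed first_letter_only purely for illustration) and calls it with three inputs:

```python
from string import ascii_lowercase

def first_letter_only(iso):
    # Same structure as the question's code: the return sits inside the
    # loop, so the function exits during the very first iteration.
    for x in iso:
        return False if (iso.count(x) > 1) and (x in ascii_lowercase) else True

print(first_letter_only(""))     # None: the loop body never runs
print(first_letter_only("abb"))  # True: only 'a' is examined
print(first_letter_only("aba"))  # False: the first letter repeats
```

Moving the return True after the loop, as in the other answer, fixes both cases.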

Python ignore punctuation and white space

string = "Python, program!"
result = []
for x in string:
    if x not in result:
        result.append(x)
print(result)
This program makes it so that if a letter is repeated in a string, it appears only once in the list. In this case, the string "Python, program!" will appear as
['P', 'y', 't', 'h', 'o', 'n', ',', ' ', 'p', 'r', 'g', 'a', 'm', '!']
My question is, how do I make it so the program ignores punctuation such as ". , ; ? ! -", and also white spaces? So the final output would look like this instead:
['P', 'y', 't', 'h', 'o', 'n', 'p', 'r', 'g', 'a', 'm']

Just check if the string (letter) is alphanumeric using str.isalnum as an additional condition before appending the character to the list:
string = "Python, program!"
result = []
for x in string:
    if x.isalnum() and x not in result:
        result.append(x)
print(result)
Output:
['P', 'y', 't', 'h', 'o', 'n', 'p', 'r', 'g', 'a', 'm']
If you don't want numbers in your output, try str.isalpha() instead (returns True if the character is alphabetic).
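A short sketch of the str.isalpha() variant. The input string here is changed to include a digit (an assumption made only so the difference from str.isalnum() shows up):

```python
text = "Python 3, program!"
result = []
for ch in text:
    # isalpha() rejects digits as well as punctuation and whitespace
    if ch.isalpha() and ch not in result:
        result.append(ch)
print(result)
# ['P', 'y', 't', 'h', 'o', 'n', 'p', 'r', 'g', 'a', 'm']
```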

You can filter them out using the string module. This built-in library contains several constants that refer to collections of characters in order, like letters and whitespace.
import string
start = "Python, program!" #Can't name it string since that's the module's name
result = []
for x in start:
    if x not in result and (x in string.ascii_letters):
        result.append(x)
print(result)

Nesting a function inside itself (i'm desperate)

Mentally exhausted.
An explanation just for context; I don't actually need help with hashes:
I'm trying to make a Python script that can brute-force a hashed string or password (for learning only; I'm sure there are thousands out there).
The goal is to make a function that tries all possible combinations of letters, starting from one character (a, b... y, z), then moving on to one more character (aa, ab... zy, zz, then aaa, aab... zzy, zzz), indefinitely until it finds a match.
First, it asks you for a string (aaaa, for example) and hashes it; then it tries to brute-force that hash with the function; finally, the function returns the string again when it finds a match.
PASSWORD_INPUT = input("Password string input: ")
PASSWORD_HASH = encrypt_password(PASSWORD_INPUT) # This returns the clean hash
found_password = old_decrypt() # This is the function below
print(found_password)
I managed to do this chunk of ugly ass code:
built_password = ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
def old_decrypt():
    global built_password
    # First letter
    for a in range(len(characters)): # Characters is a list with the abecedary
        built_password[0] = characters[a]
        if test_password(pretty(built_password)): # This returns True if it matches
            return pretty(built_password)
        # Second letter
        for b in range(len(characters)):
            built_password[1] = characters[b]
            if test_password(pretty(built_password)):
                return pretty(built_password)
            # Third letter
            for c in range(len(characters)):
                built_password[2] = characters[c]
                if test_password(pretty(built_password)):
                    return pretty(built_password)
                # Fourth letter
                for d in range(len(characters)):
                    built_password[3] = characters[d]
                    if test_password(pretty(built_password)):
                        return pretty(built_password)
The problem with this is that it only works with 4-letter strings.
As you can see, it's almost the exact same loop for every letter, so I thought "Hey, I can make this a single function"... After obsessively trying everything that came to my mind for 3 whole days, I came up with this:
# THIS WORKS
def decrypt(letters_amount_int):
    global built_password
    for function_num in range(letters_amount_int):
        for letter in range(len(characters)):
            built_password[function_num] = characters[letter]
            if letters_amount_int >= 1:
                decrypt(letters_amount_int - 1)
            if test_password(pretty(built_password)):
                return pretty(built_password)
# START
n = 1
while True:
    returned = decrypt(n)
    # If it returns a string it gets printed, else trying to print None raises TypeError
    try:
        print("Found hash for: " + returned)
        break
    except TypeError:
        n += 1
The function gets a 1 and tries with 1 letter; if it doesn't return anything, it gets a 2 and tries with 2.
It works, but for some reason it makes a ridiculous number of unnecessary loops that take exponentially more and more time. I've been smashing my head and came to the conclusion that I'm not understanding something about Python's internal functioning.
Can someone please shed some light on this? Thanks
In case they are needed, these are the other functions:
import hashlib
def encrypt_password(password_str):
    return hashlib.sha256(password_str.encode()).hexdigest()
def test_password(password_to_test_str):
    global PASSWORD_HASH
    if PASSWORD_HASH == encrypt_password(password_to_test_str):
        return True
characters = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
              'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
              'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
              'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D',
              'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N',
              'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
              'Y', 'Z']

Recursion in this case gives a very elegant solution:
def add_char(s, limit):
    # Yield every string of exactly `limit` characters that starts with s
    if len(s) == limit:
        yield s
    else:
        for c in characters:
            yield from add_char(s + c, limit)
def generate_all_passwords_up_to_length(maxlen):
    for i in range(1, maxlen + 1):
        yield from add_char("", i)
for password in generate_all_passwords_up_to_length(5):
    if test_password(password):
        print(password)
        break  # stop at the first match

Maybe you could try something like this, inspired by the question "Multiple permutations, including duplicates".
Itertools has a Cartesian product function, which is related to permutation.
import itertools
def decrypt(characters, num_chars):
    for test_list in itertools.product(characters, repeat=num_chars):
        test_str = ''.join(test_list)
        if test_password(test_str):
            return test_str
    return None
for i in range(min_chars, max_chars+1):
    result = decrypt(characters, i)
    if result:
        print(f'Match found: {result}')
        break  # no need to try longer strings once a match is found
For example, with a three-character alphabet (characters = '012'), min_chars = 1 and max_chars = 3, printing test_str at each step gives output that begins:
0
1
2
00
01
02
10
11
12
20
21
22
and continues through the three-character strings, or stops earlier if a match is found. I recommend you look up a recursive, pure-Python implementation of the Cartesian product function if you want to learn more. However, I'd suspect the itertools Cartesian product will be faster than a recursive solution.
Note that itertools.product() returns a lazy iterator, so it produces each combination on demand, and writing it this way lets you find a match for shorter strings faster than for longer ones. But the time this brute-force algorithm takes should indeed increase exponentially with the length of the true password.
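For a version that runs on its own, here is a minimal sketch that folds the hashing and the search into one place. It assumes SHA-256 as in the question; the names sha256_hex and brute_force and the tiny alphabet "abc" are made up for illustration.

```python
import hashlib
import itertools

def sha256_hex(text):
    # Same hashing approach as the question's encrypt_password
    return hashlib.sha256(text.encode()).hexdigest()

def brute_force(target_hash, alphabet, max_len):
    # Try lengths 1..max_len in order, so short candidates are tested first.
    for length in range(1, max_len + 1):
        for combo in itertools.product(alphabet, repeat=length):
            candidate = "".join(combo)
            if sha256_hex(candidate) == target_hash:
                return candidate  # stop as soon as a match is found
    return None

print(brute_force(sha256_hex("cab"), "abc", 4))  # cab
```

With an alphabet of k characters there are k**n candidates of length n, which is where the exponential growth comes from.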
Hope this helps.

Returning a set for all individual letters, vs a set for each word

I don't understand why I am receiving a set for each individual letter with the code below; when I simply remove the line if word in 'abcdefghijklmnopqrstuvwxyz ': I receive a set for each phrase. However, I need something that will remove anything that isn't a letter or a space (e.g. / [ ] -) from the larger passage, and the a-z check was the best I could think of for this.
Two follow-up questions:
1. It seems that if I use return instead of print, I receive two different answers (return only returns the last set, whereas print shows all sets).
2. Rather than having 5 individual sets, how would I put them into a list of 5 sets?
def make_itemsets(words):
    words = str(words)
    words.lower().split()
    for word in words:
        newset = set()
        if word in 'abcdefghijklmnopqrstuvwxyz ':
            newset.update(word)
        print(newset)
words = ['sed', 'ut', 'perspiciatis', 'unde', 'omnis']
make_itemsets(words)
This returns the five lists (but doesn't remove all excess and won't remove non-characters from the larger passage):
def make_itemsets(words):
    words = str(words)
    words.lower().split()
    for word in words:
        newset = set()
        newset.update(word)
        print(newset)
This would be expected output:
[{'d', 'e', 's'},
{'t', 'u'},
{'a', 'c', 'e', 'i', 'p', 'r', 's', 't'},
{'d', 'e', 'n', 'u'},
{'i', 'm', 'n', 'o', 's'}]

You can have your expected output like this:
print ( [set(w) for w in words] )
Output is:
[{'d', 's', 'e'}, {'u', 't'}, {'p', 'e', 'i', 'a', 'c', 'r', 's', 't'}, {'d', 'u', 'e', 'n'}, {'m', 'i', 'o', 's', 'n'}]
Note that sets have no order.
If you want words which are alphabetic characters only, you can do this:
print ( [set(w) for w in words if w.isalpha()] )
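Because sets are unordered, the printed element order is not guaranteed. If a stable display matters, one option (a sketch, not the only way) is to sort each set at print time while keeping the sets themselves for later operations. The variable name itemsets is illustrative.

```python
words = ['sed', 'ut', 'perspiciatis', 'unde', 'omnis']

# Build one set per purely alphabetic word
itemsets = [set(w) for w in words if w.isalpha()]

# sorted() returns a list, so this only affects how the result is shown
print([sorted(s) for s in itemsets])
# [['d', 'e', 's'], ['t', 'u'], ['a', 'c', 'e', 'i', 'p', 'r', 's', 't'],
#  ['d', 'e', 'n', 'u'], ['i', 'm', 'n', 'o', 's']]
```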

Match all runs consisting only of letters:
import re
for word in re.compile('[a-z]+').findall('sed ut perspfkdls'):
    ...
If you want to build a list of aggregated results:
result = []
...
result.append({c for c in word})
...
return result
Edit: I updated my answer after reading the clarification.
import re
def make_itemsets(words):
    matcher = re.compile('[a-z]+')
    words = str(words).lower()
    words = matcher.findall(words)
    return [{c for c in w} for w in words]
Edit 2: I already gave almost a complete implementation, so I connected the dots.

Iterating through list, ignoring duplicates

I've written a program that attempts to find a series of letters (toBeFound - these letters represent a word) in a list of letters (letterList), however it refuses to acknowledge the current series of 3 letters as it counts the 'I' in the first list twice, adding it to the duplicate list.
Currently this code returns "incorrect", when it should return "correct".
letterList= ['F','I', 'I', 'X', 'O', 'R', 'E']
toBeFound = ['F', 'I', 'X']
List = []
for i in toBeFound[:]:
    for l in letterList[:]:
        if l== i:
            letterList.remove(l)
            List.append(i)
if List == toBeFound:
    print("Correct.")
else:
    print("Incorrect.")
letterList and toBeFound are sample values, the letters in each can be anything. I can't manage to iterate through the code and successfully ensure that duplicates are ignored. Any help would be greatly appreciated!

Basically, you're looking to see if toBeFound is a subset of letterList, right?
That is a hint to use sets:
In [1]: letters = set(['F','I', 'I', 'X', 'O', 'R', 'E'])
In [2]: find = set(['F', 'I', 'X'])
In [3]: find.issubset(letters)
Out[3]: True
In [4]: find <= letters
Out[4]: True
(BTW, [3] and [4] are different notations for the same operation.)

I think this would solve your problem. Please try it and let me know
letterList= ['F','I', 'I', 'X', 'O', 'R', 'E']
toBeFound = ['F', 'I', 'X']
found_list = [i for i in toBeFound if i in letterList]
print("Correct" if toBeFound == found_list else "Incorrect")

You could make the initial list a set, but if you want to look up a word like 'hello' it won't work because you'll need both l's.
One way to solve this is to use dictionaries to count the letters seen so far.
letterList = ['H', 'E', 'L', 'X', 'L', 'I', 'O']
toBeFound = ['H', 'E', 'L', 'L', 'O']
# build dictionary to hold our desired letters and their counts
toBeFoundDict = {}
for i in toBeFound:
    if i in toBeFoundDict:
        toBeFoundDict[i] += 1
    else:
        toBeFoundDict[i] = 1
letterListDict = {} # dictionary that holds values from input
output_list = [] # don't name it list; that would shadow the built-in
for letter in letterList:
    if letter in letterListDict: # already in dictionary
        # if we don't have too many of the letter, add it
        # (.get avoids a KeyError for repeated letters we don't need)
        if letterListDict[letter] < toBeFoundDict.get(letter, 0):
            output_list.append(letter)
        # update the dictionary
        letterListDict[letter] += 1
    else: # not in dictionary so let's add it
        letterListDict[letter] = 1
        if letter in toBeFoundDict:
            output_list.append(letter)
if output_list == toBeFound:
    print('Success')
else:
    print('fail')
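The counting idea can also be sketched with collections.Counter from the standard library. Note the assumption: this version only checks that each needed letter is available often enough and ignores order, so it is a swapped-in simplification rather than an exact equivalent of the loop above. The name can_spell is made up for illustration.

```python
from collections import Counter

def can_spell(letters, word):
    # Counter subtraction keeps only positive counts, so an empty result
    # means every letter of word occurs at least as often in letters.
    return not (Counter(word) - Counter(letters))

print(can_spell(['H', 'E', 'L', 'X', 'L', 'I', 'O'], ['H', 'E', 'L', 'L', 'O']))  # True
print(can_spell(['H', 'E', 'L', 'X', 'I', 'O'], ['H', 'E', 'L', 'L', 'O']))       # False
```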

How about this: (I tested in python3.6)
import collections
letterList= ['F','I', 'I', 'X', 'O', 'R', 'E']
toBeFound = ['F', 'I', 'X']
a = collections.Counter(letterList) # print(a) does not show order
# but a.keys() has order preserved
final = [i for i in a.keys() if i in toBeFound]
if final == toBeFound:
    print("Correct")
else:
    print("Incorrect")

If you're looking to check if letterList has the letters of toBeFound in the specified order and ignoring repeating letters, this would be a simple variation on the old "file match" algorithm. You could implement it in a non-destructive function like this:
def letterMatch(letterList,toBeFound):
    i= 0
    for letter in letterList:
        if letter == toBeFound[i] : i += 1
        elif i > 0 and letter != toBeFound[i-1] : break
        if i == len(toBeFound) : return True
    return False
letterMatch(['F','I', 'I', 'X', 'O', 'R', 'E'],['F', 'I', 'X'])
# returns True
On the other hand, if what you're looking for is testing if letterList has all the letters needed to form toBeFound (in any order), then the logic is much simpler as you only need to "check out" the letters of toBeFound using the ones in letterList:
def lettermatch(letterList,toBeFound):
    missing = toBeFound.copy()
    for letter in letterList:
        if letter in missing : missing.remove(letter)
    return not missing

As requested.
letterList= ['F','I', 'I', 'X', 'O', 'R', 'E']
toBeFound = ['F', 'I', 'X']
List = []
for i in toBeFound[:]:
    for l in set(letterList):
        if l== i:
            List.append(i)
if List == toBeFound:
    print("Correct.")
else:
    print("Incorrect.")
This prints correct. I made the letterList a set! Hope it helps.

One simple way is to just iterate through toBeFound, and look for each element in letterList.
letterList= ['F','I', 'I', 'X', 'O', 'R', 'E']
toBeFound = ['F', 'I', 'X']
found = True
for x in toBeFound:
    # a single missing letter is enough to fail
    if x not in letterList:
        found = False
        break
if found:
    print("Correct.")
else:
    print("Incorrect.")
