Anagram test for two strings in python

This is the question:
Write a function named test_for_anagrams that receives two strings as
parameters, both of which consist of alphabetic characters and returns
True if the two strings are anagrams, False otherwise. Two strings are
anagrams if one string can be constructed by rearranging the
characters in the other string using all the characters in the
original string exactly once. For example, the strings "Orchestra" and
"Carthorse" are anagrams because each one can be constructed by
rearranging the characters in the other one using all the characters
in one of them exactly once. Note that capitalization does not matter
here i.e. a lower case character can be considered the same as an
upper case character.
My code:
def test_for_anagrams (str_1, str_2):
    str_1 = str_1.lower()
    str_2 = str_2.lower()
    print(len(str_1), len(str_2))
    count = 0
    if (len(str_1) != len(str_2)):
        return (False)
    else:
        for i in range(0, len(str_1)):
            for j in range(0, len(str_2)):
                if(str_1[i] == str_2[j]):
                    count += 1
        if (count == len(str_1)):
            return (True)
        else:
            return (False)
#Main Program
str_1 = input("Enter a string 1: ")
str_2 = input("Enter a string 2: ")
result = test_for_anagrams (str_1, str_2)
print (result)
The problem is that when I enter the strings Orchestra and Carthorse, the result is False. The same happens for the strings The eyes and They see. Any help would be appreciated.

I'm new to python, so excuse me if I'm wrong
I believe this can be done in a different approach: sort the given strings and then compare them.
def anagram(a, b):
    # string to list
    str1 = list(a.lower())
    str2 = list(b.lower())
    #sort list
    str1.sort()
    str2.sort()
    #join list back to string
    str1 = ''.join(str1)
    str2 = ''.join(str2)
    return str1 == str2
print(anagram('Orchestra', 'Carthorse'))

The problem is that you only check whether a character has any match in the other string and increment the counter for every such match. You do not account for characters that have already been matched with another one. That's why the following also fails:
>>> test_for_anagrams('aa', 'aa')
False
Even though the strings are equal (and as such also anagrams), you are matching each a of the first string with each a of the other string, so you end up with a count of 4, which gives a result of False.
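A minimal sketch of that overcounting, in Python 3. The helper name pair_count is hypothetical and only mirrors the nested loops from the question; it counts every matching pair of positions rather than every matched character:

```python
def pair_count(str_1, str_2):
    # Count every (i, j) pair whose characters match, as the nested loops do.
    return sum(1 for a in str_1 for b in str_2 if a == b)

print(pair_count("aa", "aa"))                # 4, although len("aa") is 2
print(pair_count("orchestra", "carthorse"))  # 11, although both have length 9
```

The pair count exceeds the string length as soon as a letter repeats, so the final comparison against len(str_1) fails for these inputs.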
What you should do in general is count every character occurrence and make sure that every character occurs as often in each string. You can count characters by using a collections.Counter object. You then just need to check whether the counts for each string are the same, which you can easily do by comparing the counter objects (which are just dictionaries):
from collections import Counter
def test_for_anagrams (str_1, str_2):
    c1 = Counter(str_1.lower())
    c2 = Counter(str_2.lower())
    return c1 == c2
>>> test_for_anagrams('Orchestra', 'Carthorse')
True
>>> test_for_anagrams('aa', 'aa')
True
>>> test_for_anagrams('bar', 'baz')
False

For completeness: if just importing Counter and being done with it is not in the spirit of the exercise, you can use plain dictionaries to count the letters.
def test_for_anagrams(str_1, str_2):
    counter1 = {}
    for c in str_1.lower():
        counter1[c] = counter1.get(c, 0) + 1
    counter2 = {}
    for c in str_2.lower():
        counter2[c] = counter2.get(c, 0) + 1
    # print statements so you can see what's going on,
    # comment out/remove at will
    print(counter1)
    print(counter2)
    return counter1 == counter2
Demo:
print(test_for_anagrams('The eyes', 'They see'))
print(test_for_anagrams('orchestra', 'carthorse'))
print(test_for_anagrams('orchestr', 'carthorse'))
Output:
{' ': 1, 'e': 3, 'h': 1, 's': 1, 't': 1, 'y': 1}
{' ': 1, 'e': 3, 'h': 1, 's': 1, 't': 1, 'y': 1}
True
{'a': 1, 'c': 1, 'e': 1, 'h': 1, 'o': 1, 's': 1, 'r': 2, 't': 1}
{'a': 1, 'c': 1, 'e': 1, 'h': 1, 'o': 1, 's': 1, 'r': 2, 't': 1}
True
{'c': 1, 'e': 1, 'h': 1, 'o': 1, 's': 1, 'r': 2, 't': 1}
{'a': 1, 'c': 1, 'e': 1, 'h': 1, 'o': 1, 's': 1, 'r': 2, 't': 1}
False

Traverse string test and check whether each character is still available in test1; every match consumes one occurrence, so a repeated character cannot be matched twice.
The strings are anagrams if every character of test is matched and nothing in test1 is left over.
def anagram(test,test1):
    remaining = list(test1)
    for data in test:
        if data not in remaining:
            return False
        # consume the match so it cannot be counted again
        remaining.remove(data)
    return not remaining
print(anagram("abcd","adbc"))

I wrote the anagram program in a basic, easy-to-understand way.
def compare(str1,str2):
    if((str1==None) or (str2==None)):
        print(" You didn't enter a string.")
    elif(len(str1)!=len(str2)):
        print(" The strings entered are not anagrams.")
    else:
        b=[]
        c=[]
        # collect and sort the characters of each string
        for i in str1:
            b.append(i)
        b.sort()
        print(b)
        for j in str2:
            c.append(j)
        c.sort()
        print(c)
        if (b==c and b!=[] ):
            print(" The strings entered are anagrams.")
        else:
            print(" The strings entered are not anagrams.")
str1=input(" Enter the first String :")
str2=input(" Enter the second String :")
compare(str1,str2)

A more concise and Pythonic way to do it is to use sorted together with lower/upper.
You can first use lower/upper to make the case consistent and then sort the strings for a proper comparison, as follows:
# Function definition
def test_for_anagrams (str_1, str_2):
    # lower() must be called on the strings, because sorted() returns a list
    if sorted(str_1.lower()) == sorted(str_2.lower()):
        return True
    else:
        return False
#Main Program
str_1 = input("Enter a string 1: ")
str_2 = input("Enter a string 2: ")
result = test_for_anagrams (str_1, str_2)
print (result)

Another solution:
def test_for_anagrams(my_string1, my_string2):
    s1,s2 = my_string1.lower(), my_string2.lower()
    count = 0
    if len(s1) != len(s2) :
        return False
    for char in s1 :
        if s2.count(char,0,len(s2)) == s1.count(char,0,len(s1)):
            count = count + 1
    return count == len(s1)

My solution is:
#anagrams
def is_anagram(a, b):
    # lower-case both strings so that capitalization does not matter
    if sorted(a.lower()) == sorted(b.lower()):
        return True
    else:
        return False
print(is_anagram("Alice", "Bob"))

def anagram(test,test1):
    test_value = []
    if len(test) == len(test1):
        for i in test:
            value = test.count(i) == test1.count(i)
            test_value.append(value)
    else:
        test_value.append(False)
    if False in test_value:
        return False
    else:
        return True
Check the lengths of test and test1. If they match, traverse string test and compare the count of each character in test and test1, storing the result of each comparison in a list. The strings are anagrams only if none of the stored results is False.

Related

Removing duplicates using a dictionary

I am writing a function that is supposed to count how many distinct records occur more than once. For now my output gives me the total number of duplicated occurrences, which I don't want.
i.e. if there are 4 duplicates of one record, it gives me 4 instead of 1; if there are 6 duplicates across 2 individual records, it should give me 2.
Could someone please help find the bug?
Thank you
def duplicate_count(text):
    text = text.lower()
    dict = {}
    word = 0
    if len(text) != "":
        for a in text:
            dict[a] = dict.get(a,0) + 1
        for a in text:
            if dict[a] > 1:
                word = word + 1
        return word
    else:
        return "0"

Fixed it:
def duplicate_count(text):
    text = text.lower()
    dict = {}
    # an empty string is falsy; len(text) != "" was always True
    if text:
        for a in text:
            dict[a] = dict.get(a,0) + 1
        return sum(1 for a in dict.values() if a >= 2)
    else:
        return 0
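For comparison, the same count could be sketched with collections.Counter in Python 3. The function name duplicate_count_counter is made up for this illustration, and returning the integer 0 for empty input is an assumption:

```python
from collections import Counter

def duplicate_count_counter(text):
    # Tally each character once, ignoring case.
    counts = Counter(text.lower())
    # Count the distinct characters that occur at least twice.
    return sum(1 for n in counts.values() if n >= 2)

print(duplicate_count_counter("aabccdeefggh"))  # 4
```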

You can do this with set and sum. First, set removes all duplicate characters, so the loop runs once per distinct character instead of once per character of the text. A dictionary then stores how many times each distinct character occurs. Finally, sum adds up a generator with one boolean per character, which is True when that character's count is greater than 1.
def dup_cnt(t:str) -> int:
    if not t: return 0
    t = t.lower()
    d = dict()
    for c in set(t):
        d[c] = t.count(c)
    return sum(v>1 for v in d.values())
print(dup_cnt('aabccdeefggh')) #4

I don't really understand the question you asked.
But I assume you want to get the count or details of each letter's duplication in the text. You can do this, hoping this can help.
def duplicate_count(text):
    count_dict = {}
    for letter in text.lower():
        count_dict[letter] = count_dict.setdefault(letter, 0) + 1
    return count_dict
ret = duplicate_count('asuhvknasiasifjiasjfija')
# Get all letter details
print(ret)
#{'a': 5, 's': 4, 'u': 1, 'h': 1, 'v': 1, 'k': 1, 'n': 1, 'i': 4, 'f': 2, 'j': 3}
# Get all letter count
print(len(ret))
# 10
# Get only the letters appear more than once in the text
dedup = {k: v for k, v in ret.items() if v > 1}
# Get only duplicated letter details
print(dedup)
# {'a': 5, 's': 4, 'i': 4, 'f': 2, 'j': 3}
# Get only duplicated letter count
print(len(dedup))
# 5

What's the best way to find the intersection between two strings?

I need to find the intersection between two strings.
Assertions:
assert intersect("test", "tes") == list("tes"), "Assertion 1"
assert intersect("test", "ta") == list("t"), "Assertion 2"
assert intersect("foo", "fo") == list("fo"), "Assertion 3"
assert intersect("foobar", "foo") == list("foo"), "Assertion 4"
I tried different implementations for the intersect function, which receives two str parameters, w and w2.
List comprehension. Iterate and look for occurrences in the second string.
return [l for l in w if l in w2]
Fails assertions 1 and 2 because multiple t in w match the one t in w2.
Set intersection.
return list(set(w).intersection(w2))
return list(set(w) & set(w2))
Fails assertions 3 and 4 because a set is a collection of unique elements, so duplicated letters are discarded.
Iterate and count.
out = ""
for c in s1:
    if c in s2 and not c in out:
        out += c
return out
Fails because it also eliminates duplicates.
difflib (Python documentation)
letters_diff = difflib.ndiff(word, non_wildcards_letters)
letters_intersection = []
for l in letters_diff:
    letter_code, letter = l[:2], l[2:]
    if letter_code == " ":
        letters_intersection.append(letter)
return letters_intersection
Passes.
difflib works, but can anybody think of a better, optimized approach?
EDIT:
The function would return a list of chars. The order doesn't really matter.

Try this:
def intersect(string1, string2):
    common = []
    for char in set(string1):
        common.extend(char * min(string1.count(char), string2.count(char)))
    return common
Note: It doesn't preserve the order (iterating over a set() yields the letters in arbitrary order). But, as you say in your comments, order doesn't matter.
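The same min-of-counts idea can also be sketched with collections.Counter (Python 3), whose & operator keeps the minimum count of each element. The name intersect_counter is hypothetical, and the order of the returned letters is not guaranteed to match the assertions in the question, so the usage lines sort before printing:

```python
from collections import Counter

def intersect_counter(string1, string2):
    # & keeps each letter with the smaller of its two counts.
    common = Counter(string1) & Counter(string2)
    # elements() repeats each letter as many times as its count.
    return list(common.elements())

print(sorted(intersect_counter("test", "tes")))    # ['e', 's', 't']
print(sorted(intersect_counter("foobar", "foo")))  # ['f', 'o', 'o']
```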

This works pretty well for your test cases:
def intersect(haystack, needle):
    while needle:
        pos = haystack.find(needle)
        if pos >= 0:
            return list(needle)
        needle = needle[:-1]
    return []
But bear in mind that in all your test cases the longer string comes first and the shorter one second, and none of them has an empty search term, an empty search space, or a non-match.

Gives the number of co-occurrences of every n-gram in the two strings:
from collections import Counter
def all_ngrams(text):
    ngrams = ( text[i:i+n] for n in range(1, len(text)+1)
                           for i in range(len(text)-n+1) )
    return Counter(ngrams)
def intersection(string1, string2):
    count_1 = all_ngrams(string1)
    count_2 = all_ngrams(string2)
    return count_1 & count_2 # intersection: min(c[x], d[x])
intersection('foo', 'f') # Counter({'f': 1})
intersection('foo', 'o') # Counter({'o': 1})
intersection('foobar', 'foo') # Counter({'f': 1, 'fo': 1, 'foo': 1, 'o': 2, 'oo': 1})
intersection('abhab', 'abab') # Counter({'a': 2, 'ab': 2, 'b': 2})
intersection('achab', 'abac') # Counter({'a': 2, 'ab': 1, 'ac': 1, 'b': 1, 'c': 1})
intersection('test', 'ates') # Counter({'e': 1, 'es': 1, 's': 1, 't': 1, 'te': 1, 'tes': 1})

Python dictionary being returned

I'm trying to create a dictionary with the count of each letter appearing in a list of words. The method count_letters_v2(word_list) prints the letters but does not return the dictionary. Why? And how do I fix this? Thank you.
def count_letters_v2(word_list):
    count = {}
    for word in word_list:
        print word
        for letter in word:
            print letter
            if letter not in count:
                count[letter] = 1
            else:
                count[letter] = count[letter] + 1
    return count
def main():
    count_letters_v2(['a','short','list','of','words'])
if __name__ == '__main__':
    main()

It does return the dictionary. That's what return count does. Do you want to print out the dictionary? Then change main() to
def main():
    print count_letters_v2(['a','short','list','of','words'])
For the record, there's a Counter object (a subclass of dict so can do all the same things) in the standard library that will do this all for you.
from collections import Counter
def count_letters_v3(word_list):
    return Counter(''.join(word_list))
print count_letters_v3(['a','short','list','of','words'])
Output:
Counter({'a': 1,
'd': 1,
'f': 1,
'h': 1,
'i': 1,
'l': 1,
'o': 3,
'r': 2,
's': 3,
't': 2,
'w': 1})

As said, your code works, but you didn't do anything with the return value of the function.
That said, I can think of some improvements. First, the get method
of dict makes the case of a new letter cleaner by setting the default value to 0:
...
count = {}
for word in word_list:
    for letter in word:
        count[letter] = count.get(letter, 0) + 1
...
Otherwise, you can use a Counter object from collections:
from collections import Counter
def count_letters_v2(word_list):
    count = Counter()
    for word in word_list:
        count.update(word)
    return count
If you have a very long list of words, you shouldn't use str.join since it builds a new string. The chain.from_iterable method of the itertools module chains the letters lazily, without building that intermediate string:
from collections import Counter
from itertools import chain
def count_letters_v2(word_list):
    return Counter(chain.from_iterable(word_list))

Python dictionary sorting Anagram of a string [duplicate]

I have the following question and I found this: Permutation of string as substring of another. But it uses C++, and I am confused about how to apply it to Python.
Given two strings s and t, determine whether some anagram of t is a
substring of s. For example: if s = "udacity" and t = "ad", then the
function returns True. Your function definition should look like:
question1(s, t) and return a boolean True or False.
So I answered this question, but they want me to use dictionaries instead of sorting the strings. The reviewer said:
We can first compile a dictionary of counts for t and check with every
possible consecutive substring sets in s. If any set is anagram of t,
then we return True, else False. Comparing counts of all characters
can be done in constant time since there is only a limited number
of characters to check. Looping through all possible consecutive
substrings will take worst case O(len(s)). Therefore, the time
complexity of this algorithm is O(len(s)). space complexity is O(1)
although we are creating a dictionary because we can have at most 26
characters and thus it is bounded.
Could you please help me use dictionaries in my solution?
Here is my solution:
# Check if s1 and s2 are anagram to each other
def anagram_check(s1, s2):
    # sorted returns a new list and compare
    return sorted(s1) == sorted(s2)
# Check if anagram of t is a substring of s
def question1(s, t):
    for i in range(len(s) - len(t) + 1):
        if anagram_check(s[i: i+len(t)], t):
            return True
    return False
def main():
    print question1("udacity", "city")
if __name__ == '__main__':
    main()
'''
Test Case 1: question1("udacity", "city") -- True
Test Case 2: question1("udacity", "ud") -- True
Test Case 3: question1("udacity", "ljljl") -- False
'''
Any help is appreciated. Thank you,

A pure Python solution for building an object that records how many times each letter of the alphabet occurs in a string (t).
Using the function chr() you can convert an int to its corresponding ASCII character, so you can loop over range(97, 123) and use chr() to get each lowercase letter of the alphabet.
So if you have a string, say:
t = "abracadabra"
then you can do a for-loop like:
dt = {}
for c in range(97, 123):
    dt[chr(c)] = t.count(chr(c))
This worked for this part of the solution, giving back the result:
{'k': 0, 'v': 0, 'a': 5, 'z': 0, 'n': 0, 't': 0, 'm': 0, 'q': 0, 'f': 0, 'x': 0, 'e': 0, 'r': 2, 'b': 2, 'i': 0, 'l': 0, 'h': 0, 'c': 1, 'u': 0, 'j': 0, 'p': 0, 's': 0, 'y': 0, 'o': 0, 'd': 1, 'w': 0, 'g': 0}
A different solution?
Comments are welcome, but why is storing the counts in a dict necessary? Using count(), can you not simply compare the count of each char in t to the count of that char in s? If the count of any char in t is greater than in s, return False; otherwise return True.
Something along the lines of:
def question1(s, t):
    for c in range(97, 123):
        if t.count(chr(c)) > s.count(chr(c)):
            return False
    return True
which gives results:
>>> question1("udacity", "city")
True
>>> question1("udacity", "ud")
True
>>> question1("udacity", "ljljl")
False
If a dict is necessary...
If it is, then just create two as above and go through each key...
def question1(s, t):
    ds = {}
    dt = {}
    for c in range(97, 123):
        ds[chr(c)] = s.count(chr(c))
        dt[chr(c)] = t.count(chr(c))
    for c in range(97, 123):
        if dt[chr(c)] > ds[chr(c)]:
            return False
    return True
Update
The above answers ONLY CHECK whether the letters of t occur anywhere in s (a subsequence-style test), NOT whether an anagram of t is a SUBSTRING of s. As maraca explained to me in the comments, there is a distinction between the two, and your example makes that clear.
Using the sliding window idea (by slicing the string), the code below should work for substrings:
def question1(s, t):
    dt = {}
    for c in range(97, 123):
        dt[chr(c)] = t.count(chr(c))
    for i in range(len(s) - len(t) + 1):
        contains = True
        for c in range(97, 123):
            if dt[chr(c)] > s[i:i+len(t)].count(chr(c)):
                contains = False
                break
        if contains:
            return True
    return False
The code above works for strings made of lowercase ASCII letters and uses a dictionary so that the counts for t are computed only once :)
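The reviewer's O(len(s)) description could be sketched as follows in Python 3, updating the window counts incrementally instead of recounting every slice. This is only one possible reading of that description: the name question1_window is hypothetical, collections.Counter stands in for the dictionary, and comparing two Counters is constant time only under the reviewer's assumption of a bounded alphabet:

```python
from collections import Counter

def question1_window(s, t):
    n = len(t)
    if n > len(s):
        return False
    need = Counter(t)         # letter counts an anagram of t must have
    window = Counter(s[:n])   # letter counts of the current window of s
    if window == need:
        return True
    for i in range(n, len(s)):
        # Slide the window one step: add s[i], drop s[i - n].
        window[s[i]] += 1
        left = s[i - n]
        window[left] -= 1
        if window[left] == 0:
            del window[left]  # keep zero counts out of the comparison
        if window == need:
            return True
    return False

print(question1_window("udacity", "city"))   # True
print(question1_window("udacity", "ad"))     # True
print(question1_window("udacity", "ljljl"))  # False
```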

import collections
print collections.Counter("google")
Counter({'o': 2, 'g': 2, 'e': 1, 'l': 1})

Converting raw input string to numbers (for each letter) then summing the numbers [duplicate]

I want a user to enter a word e.g. apple and then convert each character in this string to a corresponding number (a = 1, b = 2, c = 3 etc.)
So far I've defined all the letters
a=1
b=2
c=3
d=4
e=5
f=6
g=7
etc...
and have split the string to print out each letter using
word = str(raw_input("Enter a word: ").lower())
for i in range (len(word)):
    print word[i]
this prints the characters individually, but I can't figure out how to print these as their corresponding numbers, which I could then sum together.

In your case, it is better to use a dictionary that defines all the characters and their values. The string library provides an easy way to do this. Using string.ascii_lowercase within a dict comprehension, you can populate your dictionary mapping as such.
>>> import string
>>> wordmap = {x:y for x,y in zip(string.ascii_lowercase,range(1,27))}
>>> wordmap
{'a': 1, 'c': 3, 'b': 2, 'e': 5, 'd': 4, 'g': 7, 'f': 6, 'i': 9, 'h': 8, 'k': 11, 'j': 10, 'm': 13, 'l': 12, 'o': 15, 'n': 14, 'q': 17, 'p': 16, 's': 19, 'r': 18, 'u': 21, 't': 20, 'w': 23, 'v': 22, 'y': 25, 'x': 24, 'z': 26}
Now you can easily map this to your output. First we take the input
>>> word = str(raw_input("Enter a word: ").lower())
Enter a word: apple
>>> values = []
Now we just loop through the input word and append the values to an empty list. We append the values because we also need to find the sum of the values.
>>> for i in word:
...     print "{}".format(wordmap[i])
...     values.append(wordmap[i])
...
1
16
16
12
5
You can finally use the sum function to output the sum total of the values.
>>> sum(values)
50

>>> string = 'hello'
>>> for char in string:
...     print ord(char) - 96,
...
8 5 12 12 15
You can get the sum as
>>> total = 0
>>> for char in string:
...     total += ord(char) - 96
...
>>> total
52
or, more concisely, as
>>> sum ( [ ord(i) - 96 for i in string ] )
52

If the values associated with the letters do not necessarily follow the ASCII order, a solution could be to store the letters and their corresponding values in a dict:
values = {
    'a':1,
    'b':2,
    'c':3,
    'd':4,
    'e':5,
    'f':6,
    'g':7,
    # etc.
}
word = str(raw_input("Enter a word: ").lower())
for i in range (len(word)):
    print values[word[i]]

First of all, you can loop over your string itself, since a string is iterable. Then you can get the expected number for each letter with ord(i)%96:
word = str(raw_input("Enter a word: ").lower())
for i in word:
    print ord(i)%96
Note that ord('a')=97.
And for the sum:
sum(ord(i)%96 for i in word)

Use ord (converts a character to its code):
def getnumber(letter):
    return ord(letter) - ord('a') + 1
and then, you can just do:
print sum(map(getnumber, word))

If you want the default ordinal (character code) of each character, you may use:
w = 'apple'
print [ ord(i) for i in w ]
If not, define your own mapping in a function, and call that in the list comprehension instead of the ord().

Store the characters in a dict: dict['a'] = 1, etc.

word = raw_input("Enter a word: ").lower()
s=[list(i) for i in word.split()]
s=[x for _list in s for x in _list]
print s
for i in s:
    print ord(i)-96

To get your current method working you could do this...
word = str(raw_input("Enter a word: ").lower())
for i in range (len(word)):
    # eval looks up the variable named by the one-character string
    print eval(str(word[i]))
This replaces each one-character string with the value of the variable of the same name that you have already defined.
But a better way to solve your problem might be something like this...
nums = map(lambda x: ord(x)-96, list(raw_input("Enter word: ").lower()))
This returns a list of the letters' numbers, where a is 1, b is 2 ... y is 25, z is 26. You can then do for i in nums: print i to print all the numbers out line by line.
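For reference, a Python 3 sketch of the ord-based approach (the answers above use Python 2's raw_input and print statement). The names letter_value and word_value are invented for this example, and skipping characters outside a-z is an assumption, not something the question specifies:

```python
def letter_value(letter):
    # 'a' -> 1, 'b' -> 2, ..., 'z' -> 26
    return ord(letter) - ord('a') + 1

def word_value(word):
    # Sum the values of the lowercase ASCII letters; ignore anything else.
    return sum(letter_value(c) for c in word.lower() if 'a' <= c <= 'z')

print(word_value("apple"))  # 50
print(word_value("hello"))  # 52
```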
