Python dictionary sorting Anagram of a string [duplicate]

This question already has answers here:
Permutation of string as substring of another
(11 answers)
Closed 5 years ago.
I have the following question. I found Permutation of string as substring of another, but that one uses C++ and I am confused about how to apply it to Python.
Given two strings s and t, determine whether some anagram of t is a
substring of s. For example: if s = "udacity" and t = "ad", then the
function returns True. Your function definition should look like:
question1(s, t) and return a boolean True or False.
So I answered this question, but they want me to use dictionaries instead of sorting the string. The reviewer says:
We can first compile a dictionary of counts for t and check with every
possible consecutive substring sets in s. If any set is anagram of t,
then we return True, else False. Comparing counts of all characters
can be done in constant time since there is only a limited number
of characters to check. Looping through all possible consecutive
substrings will take worst case O(len(s)). Therefore, the time
complexity of this algorithm is O(len(s)). Space complexity is O(1)
although we are creating a dictionary because we can have at most 26
characters and thus it is bounded.
Could you please help me use dictionaries in my solution?
Here is my solution:
# Check if s1 and s2 are anagrams of each other
def anagram_check(s1, s2):
    # sorted() returns a new list for each string; compare the two lists
    return sorted(s1) == sorted(s2)
# Check if an anagram of t is a substring of s
def question1(s, t):
    for i in range(len(s) - len(t) + 1):
        if anagram_check(s[i: i+len(t)], t):
            return True
    return False
def main():
    print(question1("udacity", "city"))
if __name__ == '__main__':
    main()
'''
Test Case 1: question1("udacity", "city") -- True
Test Case 2: question1("udacity", "ud") -- True
Test Case 3: question1("udacity", "ljljl") -- False
'''
Any help is appreciated. Thank you,

A pure Python solution for building an object that records how many times each letter of the alphabet occurs in a string (t).
Using the function chr() you can convert an int to its corresponding ASCII character, so you can loop over range(97, 123) (the codes 97 to 122) and use chr() to get each lowercase letter of the alphabet.
So if you have a string say:
t = "abracadabra"
then you can do a for-loop like:
dt = {}
for c in range(97, 123):
    dt[chr(c)] = t.count(chr(c))
This works for this part of the solution and gives back the result:
{'k': 0, 'v': 0, 'a': 5, 'z': 0, 'n': 0, 't': 0, 'm': 0, 'q': 0, 'f': 0, 'x': 0, 'e': 0, 'r': 2, 'b': 2, 'i': 0, 'l': 0, 'h': 0, 'c': 1, 'u': 0, 'j': 0, 'p': 0, 's': 0, 'y': 0, 'o': 0, 'd': 1, 'w': 0, 'g': 0}
A different solution?
Comments are welcome, but why is storing the counts in a dict necessary? Using count(), can you not simply compare the count of each char in t to the count of that char in s? If the count of a char in t is greater than in s, return False; otherwise return True.
Something along the lines of:
def question1(s, t):
    for c in range(97, 123):
        if t.count(chr(c)) > s.count(chr(c)):
            return False
    return True
which gives results:
>>> question1("udacity", "city")
True
>>> question1("udacity", "ud")
True
>>> question1("udacity", "ljljl")
False
If a dict is necessary...
If it is, then just create two as above and go through each key...
def question1(s, t):
    ds = {}
    dt = {}
    for c in range(97, 123):
        ds[chr(c)] = s.count(chr(c))
        dt[chr(c)] = t.count(chr(c))
    for c in range(97, 123):
        if dt[chr(c)] > ds[chr(c)]:
            return False
    return True
Update
The above answers ONLY CHECK whether the letters of t occur somewhere in s; they do NOT check for SUBSTRING anagrams, where the matching letters must be contiguous. As maraca explained to me in the comments, there is a distinction between the two and your example makes that clear.
Using the sliding window idea (by slicing the string), the code below should work for substrings:
def question1(s, t):
    dt = {}
    for c in range(97, 123):
        dt[chr(c)] = t.count(chr(c))
    for i in range(len(s) - len(t) + 1):
        contains = True
        for c in range(97, 123):
            if dt[chr(c)] > s[i:i+len(t)].count(chr(c)):
                contains = False
                break
        if contains:
            return True
    return False
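The code above works for strings made up of lowercase ASCII letters and uses a dictionary for the counts of t. Note that it recounts every window with count(), so it does more work per window than the constant-time comparison the reviewer describes.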
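The reviewer's O(len(s)) idea quoted in the question can be sketched with a sliding window whose counts are updated as it moves, instead of being recounted. This is a minimal sketch, not the reviewer's own code; it assumes collections.Counter is acceptable as "a dictionary of counts" and reuses the question1 name from the question.

```python
from collections import Counter


def question1(s, t):
    """Return True if some anagram of t is a contiguous substring of s."""
    n, m = len(s), len(t)
    if m > n:
        return False
    need = Counter(t)          # character counts the window must match
    window = Counter(s[:m])    # counts for the first window of length m
    if window == need:
        return True
    for i in range(m, n):
        window[s[i]] += 1      # character entering the window on the right
        left = s[i - m]
        window[left] -= 1      # character leaving the window on the left
        if window[left] == 0:
            del window[left]   # drop zero entries so == compares cleanly
        if window == need:
            return True
    return False
```

Each step touches only two counts, and comparing the two counters costs at most the size of the alphabet, so the total work grows linearly with len(s) when the alphabet is bounded. A call such as question1("udacity", "ya") returns False here, because 'y' and 'a' are never adjacent in s.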

import collections
print(collections.Counter("google"))
Counter({'o': 2, 'g': 2, 'e': 1, 'l': 1})

Related

How could one reduce the usage of helper functions in lambda expressions?

In this example I take letters from a set and add them to a dictionary, where each letter becomes a key and the literal 1 becomes its value.
from functools import reduce  # needed on Python 3; reduce is a built-in on Python 2
def base_dict_from_set(s):
    return reduce(lambda d,e : addvalue(1, e, d), s, dict())
def addvalue(value, key, d):
    d[key] = value
    return d
>>> base_dict_from_set(set("Hello World!".lower()))
{'o': 1, '!': 1, 'l': 1, 'd': 1, 'w': 1, ' ': 1, 'r': 1, 'e': 1, 'h': 1}
I was wondering whether I could somehow get rid of the 'addvalue' helper function and add the element and reference the modified dictionary within the lambda function itself.
The routine within addvalue itself seems very simple to me, so I would prefer something that looks like this:
def base_dict_from_set(s):
    return reduce(lambda d,e : d[e] = 1, s, dict())
I don't have a lot of experience in Python and I come from a functional programming perspective. My goal is to understand Python's functional capabilities, but I am too inexperienced to properly phrase and google what I am looking for.

What you are trying to do is why dict.fromkeys exists: create a dict that maps each key to the same constant value.
>>> dict.fromkeys("Hello World!".lower(), 1)
{'h': 1, 'e': 1, 'l': 1, 'o': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1, '!': 1}
There's no need to convert the string to a set first, since any duplicates will just be overwritten by the following occurrences.
(If the constant value is mutable, you should use a dict comprehension to ensure that each key gets its own mutable value, rather than every key sharing a reference to the same mutable value.)
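A small sketch of the mutable-value caveat, using an empty list as the constant; the names letters, shared and separate are illustrative only.

```python
letters = "abc"

# dict.fromkeys evaluates the value once, so every key shares one list.
shared = dict.fromkeys(letters, [])
shared["a"].append(1)

# A dict comprehension evaluates [] once per key, giving separate lists.
separate = {ch: [] for ch in letters}
separate["a"].append(1)

print(shared)    # {'a': [1], 'b': [1], 'c': [1]}
print(separate)  # {'a': [1], 'b': [], 'c': []}
```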

You can use a dict comprehension for the same result:
{l: 1 for l in set("Hello World!".lower())}

To answer exactly the question asked: yes, you can get rid of addvalue by replacing addvalue(1, e, d) with {**d, e: 1}.
Nevertheless, if your goal is to count occurrences, your code is still faulty. It creates a dict of key: 1 for every letter in the string, when it should create a dict of key: number_of_occurrences. To achieve this, replace addvalue(1, e, d) with {**d, e: 1 + (d[e] if e in d else 0)} and do not convert the string to a set, as that eliminates duplicates.
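Put together, the two replacements might look like the sketch below. The name count_dict is hypothetical, and d.get(e, 0) is used as a shorter equivalent of the conditional expression above.

```python
from functools import reduce


def base_dict_from_set(s):
    # Each step builds a new dict, so no helper function or assignment is needed.
    return reduce(lambda d, e: {**d, e: 1}, s, {})


def count_dict(s):
    # Same shape, but adds to the previous count instead of overwriting it.
    return reduce(lambda d, e: {**d, e: d.get(e, 0) + 1}, s, {})
```

Because {**d, ...} copies the whole dict at every step, this style is fine for short strings but does more work than mutating a single dict in place.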

I'm a bit surprised that you tried to use reduce when your goal is to transform each item in an input collection (the letters in a string) to an output collection (a key/value pair where the key is the letter and the value is a constant number), independently of each other.
In my view, reduce is for when an operation needs to be done to items in a sequence and taking all previous items into account (for instance, when calculating a sum of values).
So in a functional style, using map here would be more appropriate than reduce, in my opinion. Python supports this:
def quant_dict_from_set(s):
    return dict(map(lambda c: (c, 1), s.lower()))
Where map converts the string to key/value pairs and the dict constructor collects these pairs in a dictionary, while eliminating duplicate keys at the same time.
But more idiomatic approaches would be to use a dictionary comprehension or the dict.fromkeys constructor.

Hacky and hard to read, but closest to the lambda you were trying to write, and hopefully educational:
>>> f = lambda d, e: d.__setitem__(e, 1) or d
>>> d = {}
>>> output = f(d, 42)
>>> output
{42: 1}
>>> output is d
True
Using __setitem__ avoids the = assignment.
__setitem__ returns None, so the expression d.__setitem__(e, 1) or d always evaluates to d, which is returned by the lambda.
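Plugged into the reduce call from the question, that lambda might look like the following sketch. It keeps the question's base_dict_from_set name and assumes Python 3, where reduce lives in functools.

```python
from functools import reduce


def base_dict_from_set(s):
    # __setitem__ returns None, so "or d" hands the same dict to the next step.
    return reduce(lambda d, e: d.__setitem__(e, 1) or d, s, {})


result = base_dict_from_set(set("hello world!"))
print(result["o"])  # 1
```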

You can use collections.Counter, a subclass of dict specifically for counting occurrences of elements.
>>> import collections
>>> collections.Counter('Hello, World!'.lower())
Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, ',': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1, '!': 1})
>>> collections.Counter(set('Hello, World!'.lower()))
Counter({'w': 1, 'l': 1, 'r': 1, ',': 1, 'h': 1, 'd': 1, 'o': 1, 'e': 1, ' ': 1, '!': 1})
Note that Counter is appropriate if you want to count the elements, or if you want to initialize the values to the constant 1. If you want to initialize the values to another constant, then Counter will not be the solution and you should use a dictionary comprehension or the dict.fromkeys constructor.

Why this python character sequence code giving unexpected result?

I am writing a Python program to find the character sequence (the count of each character) in a word, but the program gives an unexpected result.
I have found a similar program that works perfectly.
The two programs look quite similar to me, but I don't know why one of them does not work.
The program that is not working:
# Display the character sequence in a word
dict={}
string=input("Enter the string:").strip().lower()
for letter in string:
    if letter !=dict.keys():
        dict[letter]=1
    else:
        dict[letter]=dict[letter]+1
print(dict)
The program that is working:
def char_frequency(str1):
    dict = {}
    for n in str1:
        keys = dict.keys()
        if n in keys:
            dict[n] += 1
        else:
            dict[n] = 1
    return dict
print(char_frequency('google.com'))
The output for the first program is giving:
Enter the string:google.com
{'g': 1, 'c': 1, 'm': 1, 'o': 1, 'l': 1, '.': 1, 'e': 1}
The output for the second program is:
{'c': 1, 'e': 1, 'o': 3, 'g': 2, '.': 1, 'm': 1, 'l': 1}
The above is the correct output.
Now my questions:
i. Why is the first program not working correctly?
ii. Is the logic of these two programs different?

Actually, there's a small mistake in the if statement you used. Have a look at the modified program below.
Note: Also make sure not to use built-in type names like dict as variable names. I have changed that to d here.
>>> d = {}
>>>
>>> string=input("Enter the string:").strip().lower()
Enter the string:google.com
>>>
>>> for letter in string:
...     if letter not in d.keys():
...         d[letter] = 1
...     else:
...         d[letter] = d[letter] + 1
...
>>> print(d)
{'g': 2, 'o': 3, 'l': 1, 'e': 1, '.': 1, 'c': 1, 'm': 1}
>>>
You can also have a look at the statements below, executed in the terminal.
Comparing a key with d.keys() using == will always return False (and != will always return True), because the key is a string here while d.keys() is an object of type dict_keys (Python 3) or a list (Python 2).
>>> d = {"k1": "v1", "k3": "v2", "k4": "Rishi"}
>>>
>>> d.keys()
dict_keys(['k1', 'k3', 'k4'])
>>>
>>> "k1" in d
True
>>>
>>> not "k1" in d
False
>>>
>>> "k1" == d.keys()
False
>>>
>>> "k1" not in d
False
>>>
Answers to your 2 questions:
i. The statement letter != dict.keys() is always True, so every letter is set to 1 and no count is ever incremented. Just change it to letter not in dict.keys(). It is better to use d in place of dict, so that the statement looks like letter not in d.keys().
ii. The logic of both programs is the same, i.e. iterating over the string and checking whether each character exists as a key in the dictionary. If it does not exist, create a new key with count 1; otherwise increment the related count by 1.
Thank you v. much.

This line is nonsensical:
if letter !=dict.keys():
letter is a length-one str, while dict.keys() returns a key view object, which is guaranteed never to be equal to a str of any kind. Your != check is therefore always true, so every letter gets reset to 1. The correct logic would be:
if letter not in dict:
(you could add .keys() if you really want to, but it's wasteful and pointless; membership testing on a dict is checking its keys implicitly).
Side-note: You're going to confuse the crap out of yourself by naming a variable dict, because you're name-shadowing the dict constructor; if you ever need to use it, it won't be available in that scope. Don't shadow built-in names if at all possible.
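For reference, a minimal sketch of the counting loop with both points applied: no shadowing of dict, and the membership test folded into dict.get(). The variable name counts is an illustrative choice; char_frequency is the name used in the question's working program.

```python
def char_frequency(text):
    counts = {}  # named counts rather than dict to avoid shadowing the built-in
    for letter in text:
        # get() returns 0 for a letter not seen yet, so no if/else is needed
        counts[letter] = counts.get(letter, 0) + 1
    return counts


print(char_frequency("google.com"))
```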

What's the best way to find the intersection between two strings?

I need to find the intersection between two strings.
Assertions:
assert intersect("test", "tes") == list("tes"), "Assertion 1"
assert intersect("test", "ta") == list("t"), "Assertion 2"
assert intersect("foo", "fo") == list("fo"), "Assertion 3"
assert intersect("foobar", "foo") == list("foo"), "Assertion 4"
I tried different implementations for the intersect function. intersect receives two str parameters, w and w2.
List comprehension. Iterate and look for occurrences in the second string.
return [l for l in w if l in w2]
Fails assertions 1 and 2 because multiple t in w match the one t in w2.
Set intersections.
return list(set(w).intersection(w2))
return list(set(w) & set(w2))
Fails assertions 3 and 4 because a set is a collection of unique elements and duplicated letters will be discarded.
Iterate and count.
out = ""
for c in s1:
    if c in s2 and c not in out:
        out += c
return out
Fails because it also eliminates duplicates.
difflib (Python Documentation)
letters_diff = difflib.ndiff(word, non_wildcards_letters)
letters_intersection = []
for l in letters_diff:
    letter_code, letter = l[:2], l[2:]
    # ndiff prefixes items common to both inputs with two spaces
    if letter_code == "  ":
        letters_intersection.append(letter)
return letters_intersection
Passes
difflib works but can anybody think of a better, optimized approach?
EDIT:
The function would return a list of chars. The order doesn't really matter.

Try this:
def intersect(string1, string2):
    common = []
    for char in set(string1):
        common.extend(char * min(string1.count(char), string2.count(char)))
    return common
Note: It doesn't preserve the order (a set has no defined order, so the letters may come back in any order). But, as you say in your comments, order doesn't matter.
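The same "minimum of the two counts" idea can also be written with collections.Counter. This is a sketch of an alternative, not the answer's own code; the parameter names w and w2 follow the question.

```python
from collections import Counter


def intersect(w, w2):
    # & keeps each character with the smaller of its two counts
    common = Counter(w) & Counter(w2)
    # elements() repeats each character as many times as its count
    return list(common.elements())
```

Since the question says order does not matter, it is safest to compare results with sorted(), for example sorted(intersect("foobar", "foo")) == sorted("foo").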

This works pretty well for your test cases:
def intersect(haystack, needle):
    while needle:
        pos = haystack.find(needle)
        if pos >= 0:
            return list(needle)
        needle = needle[:-1]
    return []
But bear in mind that all your test cases pass the longer string first and the shorter one second, and none of them has an empty search term, an empty search space, or a non-match.

Gives the number of co-occurrences for all n-grams in the two strings:
from collections import Counter
def all_ngrams(text):
    ngrams = ( text[i:i+n] for n in range(1, len(text)+1)
                           for i in range(len(text)-n+1) )
    return Counter(ngrams)
def intersection(string1, string2):
    count_1 = all_ngrams(string1)
    count_2 = all_ngrams(string2)
    return count_1 & count_2 # intersection: min(c[x], d[x])
intersection('foo', 'f') # Counter({'f': 1})
intersection('foo', 'o') # Counter({'o': 1})
intersection('foobar', 'foo') # Counter({'f': 1, 'fo': 1, 'foo': 1, 'o': 2, 'oo': 1})
intersection('abhab', 'abab') # Counter({'a': 2, 'ab': 2, 'b': 2})
intersection('achab', 'abac') # Counter({'a': 2, 'ab': 1, 'ac': 1, 'b': 1, 'c': 1})
intersection('test', 'ates') # Counter({'e': 1, 'es': 1, 's': 1, 't': 1, 'te': 1, 'tes': 1})

Anagram test for two strings in python

This is the question:
Write a function named test_for_anagrams that receives two strings as
parameters, both of which consist of alphabetic characters and returns
True if the two strings are anagrams, False otherwise. Two strings are
anagrams if one string can be constructed by rearranging the
characters in the other string using all the characters in the
original string exactly once. For example, the strings "Orchestra" and
"Carthorse" are anagrams because each one can be constructed by
rearranging the characters in the other one using all the characters
in one of them exactly once. Note that capitalization does not matter
here i.e. a lower case character can be considered the same as an
upper case character.
My code:
def test_for_anagrams (str_1, str_2):
    str_1 = str_1.lower()
    str_2 = str_2.lower()
    print(len(str_1), len(str_2))
    count = 0
    if (len(str_1) != len(str_2)):
        return (False)
    else:
        for i in range(0, len(str_1)):
            for j in range(0, len(str_2)):
                if(str_1[i] == str_2[j]):
                    count += 1
        if (count == len(str_1)):
            return (True)
        else:
            return (False)
#Main Program
str_1 = input("Enter a string 1: ")
str_2 = input("Enter a string 2: ")
result = test_for_anagrams (str_1, str_2)
print (result)
The problem is that when I enter the strings Orchestra and Carthorse, it gives me False as the result. The same happens for the strings The eyes and They see. Any help would be appreciated.

I'm new to Python, so excuse me if I'm wrong.
I believe this can be done with a different approach: sort the given strings and then compare them.
def anagram(a, b):
    # string to list
    str1 = list(a.lower())
    str2 = list(b.lower())
    #sort list
    str1.sort()
    str2.sort()
    #join list back to string
    str1 = ''.join(str1)
    str2 = ''.join(str2)
    return str1 == str2
print(anagram('Orchestra', 'Carthorse'))

The problem is that you only check whether any matching characters exist in the two strings and increment the counter each time. You do not account for characters you have already matched with another one. That's why the following will also fail:
>>> test_for_anagrams('aa', 'aa')
False
Even though the strings are equal (and as such also anagrams), you are matching each a of the first string with each a of the other string, so you end up with a count of 4 and a result of False.
What you should do in general is count every character occurrence and make sure that every character occurs equally often in each string. You can count characters by using a collections.Counter object. You then just need to check whether the counts for each string are the same, which you can easily do by comparing the counter objects (which are just dictionaries):
from collections import Counter
def test_for_anagrams (str_1, str_2):
    c1 = Counter(str_1.lower())
    c2 = Counter(str_2.lower())
    return c1 == c2
>>> test_for_anagrams('Orchestra', 'Carthorse')
True
>>> test_for_anagrams('aa', 'aa')
True
>>> test_for_anagrams('bar', 'baz')
False

For completeness: if just importing Counter and being done with it is not in the spirit of the exercise, you can use plain dictionaries to count the letters.
def test_for_anagrams(str_1, str_2):
    counter1 = {}
    for c in str_1.lower():
        counter1[c] = counter1.get(c, 0) + 1
    counter2 = {}
    for c in str_2.lower():
        counter2[c] = counter2.get(c, 0) + 1
    # print statements so you can see what's going on,
    # comment out/remove at will
    print(counter1)
    print(counter2)
    return counter1 == counter2
Demo:
print(test_for_anagrams('The eyes', 'They see'))
print(test_for_anagrams('orchestra', 'carthorse'))
print(test_for_anagrams('orchestr', 'carthorse'))
Output:
{' ': 1, 'e': 3, 'h': 1, 's': 1, 't': 1, 'y': 1}
{' ': 1, 'e': 3, 'h': 1, 's': 1, 't': 1, 'y': 1}
True
{'a': 1, 'c': 1, 'e': 1, 'h': 1, 'o': 1, 's': 1, 'r': 2, 't': 1}
{'a': 1, 'c': 1, 'e': 1, 'h': 1, 'o': 1, 's': 1, 'r': 2, 't': 1}
True
{'c': 1, 'e': 1, 'h': 1, 'o': 1, 's': 1, 'r': 2, 't': 1}
{'a': 1, 'c': 1, 'e': 1, 'h': 1, 'o': 1, 's': 1, 'r': 2, 't': 1}
False

Traverse the string test and check whether each character is present in the string test1; if it is, append it to the string value.
Then compare the length of value with the length of test1: if they are equal return True, else False. Note that this only tests membership, so it does not account for how many times a character is repeated.
def anagram(test,test1):
    value =''
    for data in test:
        if data in test1:
            value += data
    if len(value) == len(test1):
        return True
    else:
        return False
anagram("abcd","adbc")

I have written an anagram program in a basic, easy-to-understand way.
def compare(str1,str2):
    if((str1==None) or (str2==None)):
        print(" You didn't enter a string.")
    elif(len(str1)!=len(str2)):
        print(" The strings entered are not anagrams.")
    elif(len(str1)==len(str2)):
        b=[]
        c=[]
        for i in str1:
            #print(i)
            b.append(i)
        b.sort()
        print(b)
        for j in str2:
            #print(j)
            c.append(j)
        c.sort()
        print(c)
        if (b==c and b!=[] ):
            print(" The strings entered are anagrams.")
        else:
            print(" The strings entered are not anagrams.")
    else:
        print(" The strings entered are not anagrams.")
str1=input(" Enter the first String :")
str2=input(" Enter the second String :")
compare(str1,str2)

A more concise and Pythonic way to do it is to use sorted together with lower/upper.
Lowercase (or uppercase) the strings first so that the case is consistent, then sort them and compare:
# Function definition
def test_for_anagrams (str_1, str_2):
    # lower() must be applied before sorting: sorted() returns a list, which has no lower()
    return sorted(str_1.lower()) == sorted(str_2.lower())
#Main Program
str_1 = input("Enter a string 1: ")
str_2 = input("Enter a string 2: ")
result = test_for_anagrams (str_1, str_2)
print (result)

Another solution:
def test_for_anagrams(my_string1, my_string2):
    s1,s2 = my_string1.lower(), my_string2.lower()
    count = 0
    if len(s1) != len(s2) :
        return False
    for char in s1 :
        if s2.count(char,0,len(s2)) == s1.count(char,0,len(s1)):
            count = count + 1
    return count == len(s1)

My solution is:
#anagrams
def is_anagram(a, b):
    if sorted(a) == sorted(b):
        return True
    else:
        return False
print(is_anagram("Alice", "Bob"))

def anagram(test,test1):
    test_value = []
    if len(test) == len(test1):
        for i in test:
            value = test.count(i) == test1.count(i)
            test_value.append(value)
    else:
        test_value.append(False)
    if False in test_value:
        return False
    else:
        return True
Check the lengths of test and test1. If they match, traverse the string test and compare each character's count in test and test1, storing the result of each comparison in a list. The strings are anagrams only if the list contains no False.

Python KeyError while comparing chars in dict and list

I have a problem concerning a comparison between a char key in a dict and a char within a list.
The Task is to read a text and count all beginning letters.
I have a list with chars:
bchars = ('i','g','h','n','h')
and a dict with the alphabet and frequency default to zero:
d = dict(dict())
for i in range(97,123):
    d[i-97]={chr(i):0}
Now I want to check like the following:
for i in range(len(bchars)):
    for j in range(len(d)):
        if(bchars[i] in d[j]):
            d[j][chr(i+97)] +=1
        else:
            d[j][chr(i+97)] +=0
So if the char in the list is a key at the given position, then += 1, else += 0.
I thought by using a if/else statement I can bypass the KeyError.
Is there any more elegant solution for that?

The specific problem is that you check whether bchars[i] is in d[j], but then the key you actually use is chr(i+97).
chr(i+97) takes i, the index of the character in bchars, and maps it to an ASCII letter starting from 'a'. Why would you want to use this as your key?
I think you really want to do:
for i in range(len(bchars)):
    for j in range(len(d)):
        if(bchars[i] in d[j]):
            d[j][bchars[i]] += 1
        else:
            d[j][bchars[i]] = 1
Note that you can't use += in the else; remember how you literally just checked whether the key was there and decided it wasn't?
More broadly, though, your code doesn't make sense - it is overcomplicated and does not use the real power of Python's dictionaries. d looks like:
{0: {'a': 0}, 1: {'b': 0}, 2: {'c': 0}, ...}
It would be much more sensible to build a dictionary mapping character directly to count:
{'a': 0, 'b': 0, 'c': 0, ...}
then you can simply do:
for char in bchars:
    if char in d:
        d[char] += 1
Python even comes with a class just for doing this sort of thing: collections.Counter.
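A minimal sketch of the flat dictionary described above, assuming the keys should be the 26 lowercase ASCII letters. Building it with dict.fromkeys and string.ascii_lowercase is one possible choice, not something taken from the question.

```python
import string

bchars = ('i', 'g', 'h', 'n', 'h')

# One flat dict: every lowercase letter maps straight to its count.
d = dict.fromkeys(string.ascii_lowercase, 0)

for char in bchars:
    if char in d:      # skips anything that is not a lowercase letter
        d[char] += 1

print(d['h'])  # 2
print(d['a'])  # 0
```

Sharing the value 0 across keys via fromkeys is safe here because integers are immutable; d[char] += 1 rebinds that one key only.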

The nested dictionary doesn't seem necessary:
d = [0] * 26
for c in bchars:
    d[ord(c)-97] += 1
You might also want to look at the Counter class in the collections module.

from collections import Counter
bchars = ('i','g','h','n','h')
counts = Counter(bchars)
print(counts)
print(counts['h'])
prints
Counter({'h': 2, 'i': 1, 'g': 1, 'n': 1})
2
