Substitution Cipher Verification - python

I am tasked with writing a function that returns whether two strings are substitution ciphers of each other, assuming no key is given. The function should return True or False.
Here is what I have written so far (borrowed from a CodeFights question). The idea is to count how many times each character occurs in its string and store these counts in the string1count and string2count lists. Then I compare the counts, and if they do not match, I assume it is not a valid substitution cipher, since each character must correspond to a character with the same number of occurrences.
def isSubstitutionCipher(string1, string2):
    string1count = []
    string2count = []
    for i in range(0,len(string1)):
        string1count.append(string1.count(string1[i]))
    for i in range(0,len(string2)):
        string2count.append(string2.count(string2[i]))
    for i in range(0,len(string1count)):
        if string1count.count(string1count[i])!=string2count.count(string1count[i]):
            return False
    return True
Does anyone have other suggestions for solving this fairly general problem?

You could try to re-create the substitution:
def isSubstitutionCipher(string1, string2):
    if len(string1) != len(string2):
        return False
    subst = {}
    for c1, c2 in zip(string1, string2):
        if c1 in subst:
            if c2 != subst[c1]:
                return False
        else:
            if c2 in subst.values():
                return False
            subst[c1] = c2
    return True
For all the characters you have already seen, make sure the substitution matches. For new ones, store them in the substitution and make sure they are not already a substitution target.
This will return False at the first character that does not match.
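The check c2 in subst.values() scans the stored values every time a new character appears, so its cost grows with the number of distinct characters seen. A minimal sketch of the same idea that keeps a second, reverse dictionary so each check is a constant-time lookup (the function name is illustrative, not from the answer):

```python
def is_substitution_cipher(s1, s2):
    """Return True if s2 can be obtained from s1 by a one-to-one letter substitution."""
    if len(s1) != len(s2):
        return False
    forward = {}   # plaintext char -> cipher char
    backward = {}  # cipher char -> plaintext char
    for c1, c2 in zip(s1, s2):
        # setdefault stores the pair the first time a character is seen and
        # returns the existing mapping afterwards, so a mismatch means a conflict
        if forward.setdefault(c1, c2) != c2 or backward.setdefault(c2, c1) != c1:
            return False
    return True
```

The reverse dictionary plays the role of the subst.values() test: it rejects inputs such as ('ab', 'cc'), where two plaintext letters would share one cipher letter.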

Here is a variation on hiro's excellent answer:
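def is_sub(s,t):
    if len(s) != len(t): return False
    d = dict(zip(s,t))
    # t must be s mapped through d, and no two characters of s
    # may map to the same character of t (e.g. 'ab' -> 'cc' is rejected)
    return t == ''.join(d[c] for c in s) and len(set(d.values())) == len(d)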

We can use word patterns to check if one string is the ciphertext of another.
Word pattern: the first letter gets the number 0, and the first occurrence of each new letter after that gets the next number (so 'banana' becomes 0, 1, 2, 1, 2, 1).
The advantage is that this has O(n) complexity.
Code
def isSubstitutionCipher(s1, s2):
    def word_pattern(s):
        """Generates word pattern of s"""
        seen, pattern = {}, []
        for c in s:
            seen.setdefault(c, len(seen))
            pattern.append(seen[c])
        return pattern
    return word_pattern(s1) == word_pattern(s2)  # related by ciphertext if same word patterns
Test
print(isSubstitutionCipher('banana', 'cololo')) # Output: True
print(isSubstitutionCipher('dog', 'cat'))  # Output: True
print(isSubstitutionCipher('banana', 'cololl'))  # Output: False
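The word-pattern comparison can also be expressed with sets. This is a sketch of a commonly used equivalent test, not taken from the answer: a consistent one-to-one mapping produces exactly as many distinct (c1, c2) pairs as there are distinct characters on either side. The name same_pattern is hypothetical.

```python
def same_pattern(s1, s2):
    """True if s1 and s2 have the same word pattern (are related by a substitution cipher)."""
    # If every c1 always pairs with the same c2 and vice versa, the number of
    # distinct pairs equals the number of distinct characters in each string.
    return (len(s1) == len(s2)
            and len(set(s1)) == len(set(s2)) == len(set(zip(s1, s2))))
```

It is still O(n), trading the explicit pattern lists for three set constructions.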

Related

How do I check if the next item in a string is the alphabetical successor of the one before? + Inverse

I'm trying to compress a string so that any sequence of letters in strict alphabetical order is replaced by its first letter plus the length of the sequence.
For example, the string "abcdefxylmno", would become: "a6xyl4"
Single letters that aren't in order with the one before or after just stay the way they are.
How do I check that two letters are successors (a,b) and not simply in alphabetical order (a,c)? And how do I keep iterating on the string until I find a letter that doesn't meet this requirement?
I'm also trying to do this in a way that makes it easier to write an inverse function (that given the result string gives me back the original one).
EDIT:
I've managed to get the function working, thanks to your suggestion of using the alphabet string for comparison; now I'm stuck on the inverse function: given "a6xyl4", expand it back into "abcdefxylmno".
After quite some time I managed to split the string every time there's a number, and I wrote a function that expands a two-character string, but it fails when I use it on a longer string:
from string import ascii_lowercase as abc
def subString(start,n):
    L=[]
    ind = abc.index(start)
    newAbc = abc[ind:]
    for i in range(len(newAbc)):
        while i < n:
            L.append(newAbc[i])
            i+=1
        res = ''.join(L)
        return res
def unpack(S):
    for i in range(len(S)-1):
        if S[i] in abc and S[i+1] not in abc:
            lett = str(S[i])
            num = int(S[i+1])
            return subString(lett,num)
def separate(S):
    lst = []
    for i in S:
        lst.append(i)
    for el in lst:
        if el.isnumeric():
            ind = lst.index(el)
            lst.insert(ind+1,"-")
    a = ''.join(lst)
    L = a.split("-")
    if S[-1].isnumeric():
        L.remove(L[-1])
        return L
    else:
        return L
def inverse(S):
    L = separate(S)
    for i in L:
        return unpack(i)
Each of these functions works on its own, but inverse(S) doesn't output anything. What's the mistake?
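In the code above, return unpack(i) sits inside the for loop of inverse, so the function returns after the first piece. In addition, unpack only expands the final letter-and-digit pair of a piece, so for "xyl4" it gives "lmno" and drops "xy". A minimal sketch of a working inverse, assuming lowercase letters and (possibly multi-digit) counts, that parses the string with re.findall instead of the separate/unpack helpers:

```python
import re
from string import ascii_lowercase as abc

def inverse(s):
    """Expand e.g. 'a6xyl4' back into 'abcdefxylmno'.

    Assumes lowercase letters only; a letter followed by a number n
    stands for the run of n consecutive letters starting at that letter.
    """
    parts = []
    # Each match is one letter plus an optional (possibly multi-digit) count
    for letter, count in re.findall(r'([a-z])(\d*)', s):
        if count:
            start = abc.index(letter)
            parts.append(abc[start:start + int(count)])
        else:
            parts.append(letter)  # a lone letter stays as it is
    # Join every piece; returning inside the loop would stop after the first one
    return ''.join(parts)
```

Because a bare letter and a letter with a count are handled by the same pattern, both "a6xyl4" and "a6x2l4" expand to "abcdefxylmno".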
You can use the ord() function, which returns the integer Unicode code point of a character. Consecutive letters of the alphabet differ by 1, so you can implement a simple function:
def is_successor(a,b):
    # check for marginal cases if we don't ensure
    # input restriction somewhere else
    # (range excludes its end, hence the + 1 so that 'z' and 'Z' count as letters)
    if ord(a) not in range(ord('a'), ord('z') + 1) and ord(a) not in range(ord('A'), ord('Z') + 1):
        return False
    if ord(b) not in range(ord('a'), ord('z') + 1) and ord(b) not in range(ord('A'), ord('Z') + 1):
        return False
    # returns True if they are sequential
    return ((ord(b) - ord(a)) == 1)
For the reversing stage you can use chr(), which does the opposite: it returns the character whose Unicode code point is the integer given as its argument.
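As a sketch of that reversing step, a hypothetical helper expand_run could rebuild a run from its first letter and length. It assumes the run stays within one case's alphabet; no check is made for running past 'z'.

```python
def expand_run(letter, n):
    """Return the n consecutive letters starting at `letter`, e.g. ('l', 4) -> 'lmno'."""
    start = ord(letter)
    # chr() turns each code point back into a character
    return ''.join(chr(start + k) for k in range(n))
```

For example, expand_run('a', 6) rebuilds 'abcdef' from the 'a6' part of the compressed string.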
This builds on the idea that acceptable runs will be substrings of the alphabet:
from string import ascii_lowercase as abc # 'abcdefg...'
text = 'abcdefxylmno'
stack = []
cache = ''
# collect subsequences
for char in text:
    if cache + char in abc:
        cache += char
    else:
        stack.append(cache)
        cache = char
# if present, append the last sequence
if cache:
    stack.append(cache)
# stack is now ['abcdef', 'xy', 'lmno']
# Build the final string 'a6x2l4'
result = ''.join(f'{s[0]}{len(s)}' if len(s) > 1 else s for s in stack)
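The same stack-of-runs idea can also be driven by the ord() comparison from the previous answer instead of the substring test. This is a sketch wrapped in a hypothetical compress function. It assumes the text contains letters of a single case, because the ord difference alone would also join, say, 'z' and '{'.

```python
def compress(text):
    """Replace each run of consecutive letters (e.g. 'lmno') by its first letter plus its length."""
    runs = []
    for char in text:
        # Extend the current run if char is the alphabetical successor of its last letter
        if runs and ord(char) - ord(runs[-1][-1]) == 1:
            runs[-1] += char
        else:
            runs.append(char)
    # Runs longer than one letter become letter + length; single letters stay as they are
    return ''.join(f'{r[0]}{len(r)}' if len(r) > 1 else r for r in runs)
```

Unlike the loop above, this never appends an empty string to the stack, so no special handling is needed for the first or last run.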

check if letters of a string are in sequential order in another string

If I only had to check whether the letters in test_string also appear in control_string, I would not have this problem; I would simply use the code below:
if set(test_string.lower()) <= set(control_string.lower()):
    return True
But I also need to determine whether the overlapping letters in control_string appear in the same order as they do in test_string.
For example:
test_string = 'Dih'
control_string = 'Danish'
Expected: True
test_string = 'Tbl'
control_string = 'Bottle'
Expected: False
I thought of using a for loop to compare the indices of the letters, but it is hard to come up with the right algorithm.
for i in test_string.lower():
    for j in control_string.lower():
        if i==j:
            index_factor = control_string.index(j)
My plan is to compare each index_factor with the next one; if an earlier index_factor turns out to be larger than a later one, the function returns False.
I am stuck on how to compare those index_factors in a for loop.
How should I approach this problem?
You could just join the characters in your test string into a regular expression, allowing any other characters (.*) in between, and then re.search for that pattern in the control string:
>>> import re
>>> test, control = "Dih", "Danish"
>>> re.search('.*'.join(test), control) is not None
True
>>> test, control = "Tbl", "Bottle"
>>> re.search('.*'.join(test), control) is not None
False
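If the test string may contain characters that have a meaning in regular expressions (., *, ( and so on), each one should be escaped before joining. This is a minimal sketch as a function. The name in_order_regex is hypothetical, and the case-insensitive flag is an assumption based on the question lowercasing both strings.

```python
import re

def in_order_regex(test, control):
    """True if the characters of `test` appear in `control` in the same order."""
    # re.escape stops characters like '.', '(' or '*' in test being read as regex syntax
    pattern = '.*'.join(re.escape(c) for c in test)
    # IGNORECASE treats 'T' and 't' alike; DOTALL lets '.*' also skip over newlines
    return re.search(pattern, control, re.IGNORECASE | re.DOTALL) is not None
```

Without re.escape, a test string such as 'a.c' would match 'abc', because the unescaped '.' matches any character.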
Without using regular expressions, you can create an iter from the control string and use two nested loops 1), breaking out of the inner loop when a character is found and otherwise (via the inner loop's else clause) returning False if control runs out before all the characters in test are found. It is important to create the iter explicitly, even though control is already iterable, so that the inner loop continues where it last stopped instead of starting over.
def check(test, control):
    it = iter(control)
    for a in test:
        for b in it:
            if a == b:
                break
        else:
            return False
    return True
You could even do this in one (well, two) lines using all and any:
def check(test, control):
    it = iter(control)
    return all(any(a == b for b in it) for a in test)
Complexity for both approaches should be O(n), with n being the max number of characters.
1) This is conceptually similar to what @jpp does, but IMHO a bit clearer.
Here's one solution. The idea is to iterate through the control string first and yield a value if it matches the next test character. If the total number of matches equals the length of test, then your condition is satisfied.
def yield_in_order(x, y):
    iterstr = iter(x)
    # next(..., None) avoids a RuntimeError (PEP 479) once x is exhausted
    current = next(iterstr, None)
    for i in y:
        if i == current:
            yield i
            current = next(iterstr, None)
def checker(test, control):
    x = test.lower()
    return sum(1 for _ in zip(x, yield_in_order(x, control.lower()))) == len(x)
test1, control1 = 'Tbl', 'Bottle'
test2, control2 = 'Dih', 'Danish'
print(checker(test1, control1)) # False
print(checker(test2, control2)) # True
@tobias_k's answer has a cleaner version of this. If you want additional information, e.g. how many letters align before a break is found, you can adjust the checker function to return sum(1 for _ in zip(x, yield_in_order(...))).
You can use str.find(letter, start) to find the next occurrence of the desired letter after the letters already processed.
def same_order_in(test, control):
    index = 0
    control = control.lower()
    for i in test.lower():
        index = control.find(i, index)
        if index == -1:
            return False
        # index += 1  # uncomment to check multiple occurrences of same letter in test string
    return True
If the test string has duplicate letters, like:
test_string = 'Diih'
control_string = 'Danish'
then with that line commented out, same_order_in(test_string, control_string) == True,
and with it uncommented, same_order_in(test_string, control_string) == False.
Recursion is one natural way to solve such problems.
Here's a version that checks for sequential ordering:
def sequentialOrder(test_string, control_string, len1, len2):
    if len1 == 0:  # base case 1: every test character has been matched
        return True
    if len2 == 0:  # base case 2: control string ran out first
        return False
    if test_string[len1 - 1] == control_string[len2 - 1]:
        return sequentialOrder(test_string, control_string, len1 - 1, len2 - 1)  # Recursion
    return sequentialOrder(test_string, control_string, len1, len2 - 1)
test_string = 'Dih'
control_string = 'Danish'
print(sequentialOrder(test_string, control_string, len(test_string), len(control_string)))
Outputs:
True
and False for
test_string = 'Tbl'
control_string = 'Bottle'
Here's an iterative approach that does the same thing:
def sequentialOrder(test_string,control_string,len1,len2):
    i = 0
    j = 0
    while j < len1 and i < len2:
        if test_string[j] == control_string[i]:
            j = j + 1
        i = i + 1
    return j==len1
test_string = 'Dih'
control_string = 'Danish'
print(sequentialOrder(test_string,control_string,len(test_string) ,len(control_string)))
An elegant solution using a generator:
def foo(test_string, control_string):
    if all(c in control_string for c in test_string):
        gen = (char for char in control_string if char in test_string)
        # zip stops at the shorter side, avoiding an IndexError when control
        # contains more matching characters than test has
        if all(x == t for x, t in zip(gen, test_string)):
            return True
    return False
print(foo('Dzn','Dahis')) # False
print(foo('Dsi','Dahis')) # False
print(foo('Dis','Dahis')) # True
First, check that all the letters in test_string are contained in control_string. Then check that they appear in the same order as in test_string.
A simple way is to use the key argument of sorted, which supplies the value each element is sorted by:
def seq_order(l1, l2):
    intersection = ''.join(sorted(set(l1) & set(l2), key = l2.index))
    return intersection == l1
This computes the intersection of the two sets of characters and sorts it by each character's position in l2, the control string. Having done so, you only need to compare the result with l1 to see whether they are the same.
The function returns True or False accordingly. Using your examples:
seq_order('Dih', 'Danish')
#True
seq_order('Tbl', 'Bottle')
#False
seq_order('alp','apple')
#False

Python 2.7 finding if some anagram of one string is a substring of another [duplicate]

This question already has answers here:
Anagram of String 2 is Substring of String 1
(5 answers)
Closed 5 years ago.
EDIT: Posting my final solution because this was a very helpful thread and I want to add some closure to it. Using the advice from both answers below, I was able to craft a solution. I added a helper function that checks whether two strings are anagrams. Here is my final solution:
def anagram(s1, s2):
    s1 = list(s1)
    s2 = list(s2)
    s1.sort()
    s2.sort()
    return s1 == s2
def Question1(t, s):
    t_len = len(t)
    s_len = len(s)
    t_sort = sorted(t)
    for start in range(s_len - t_len + 1):
        if anagram(s[start: start+t_len], t):
            return True
    return False
print Question1("app", "paple")
I am working on some practice technical interview questions and I'm stuck on the following question:
Find whether an anagram of string t is a substring of s
I have worked out the following two variants of my code, and I believe the solution lies in a cross between the two. The problem is that the first variant always prints "False." regardless of input. The second variant works to some degree, but it does not handle rearranged letters: for example, t=jks, s=jksd prints "True!", but t=kjs, s=jksd prints "False.".
def Question1():
    # Define strings as raw user input.
    t = raw_input("Enter phrase t:")
    s = raw_input("Enter phrase s:")
    # Use the sorted function to find if t in s
    if sorted(t.lower()) in sorted(s.lower()):
        print("True!")
    else:
        print("False.")
Question1()
Working variant:
def Question1():
    # Define strings as raw user input.
    t = raw_input("Enter phrase t:")
    s = raw_input("Enter phrase s:")
    # use a loop to find if t is in s.
    if t.lower() in s.lower():
        print("True!")
    else:
        print("False.")
Question1()
I believe there is a solution that lies between these two, but I'm having trouble figuring out how to use sorted in this situation.
You're very much on the right track. First, please note that there is no loop in your second attempt.
The problem is that you can't simply sort all of s and then look for sorted(t) in that. Rather, you have to consider each len(t) sized substring of s, and check that against the sorted t. Consider the trivial example:
t = "abd"
s = "abdc"
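s trivially contains t. However, when you sort them you get abd and abcd, and abd is not a substring of abcd: the sorting gets other letters in the way. (Your first variant has an extra problem: sorted() returns lists, and list in list asks whether the whole list is a single element of the other list, so it always prints "False.".)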
Instead, you need to step through s in chunks the size of t.
t_len = len(t)
s_len = len(s)
t_sort = sorted(t)
for start in range(s_len - t_len + 1):
    chunk = s[start:start+t_len]
    if t_sort == sorted(chunk):
        # SUCCESS!! an anagram of t starts at this offset
        print("True!")
        break
else:
    print("False.")
I think your problem lies in the "substring" requirement. Sorting destroys order, which means that although you can tell whether the letters of string1 all appear in string2, you won't have a correct answer until you actually process string2 in order.
I'd suggest iterating over all the substrings of length len(s1) in s2. This is a straightforward for loop. Once you have the substrings, you can compare them (sorted vs sorted) with s1 to decide if there is any rearrangement of s1 that yields a contiguous substring of s2.
Viz:
s1 = "jks"
s2 = "aksjd"
print('s1=',s1, ' s2=', s2)
for offset in range(len(s2) - len(s1) + 1):
    ss2 = s2[offset:offset+len(s1)]
    if sorted(ss2) == sorted(s1):
        print('{} is an anagram of {} at offset {} in {}'.format(ss2, s1, offset, s2))
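Sorting every window costs roughly O(len(s) * len(t) log len(t)). A common alternative (not from either answer) is a sliding window: keep letter counts for the current window in a collections.Counter and update them by one letter per step. This is a minimal sketch written for Python 3, with anagram_in as a hypothetical name:

```python
from collections import Counter

def anagram_in(t, s):
    """True if some anagram of t occurs as a contiguous substring of s."""
    k = len(t)
    if k > len(s):
        return False
    need = Counter(t)
    window = Counter(s[:k])  # letter counts of the first length-k window
    if window == need:
        return True
    for i in range(k, len(s)):
        # Slide the window one step: add s[i], drop s[i - k]
        window[s[i]] += 1
        window[s[i - k]] -= 1
        if window[s[i - k]] == 0:
            # Keep zero counts out so == compares cleanly
            # (older Pythons compare Counters as plain dicts)
            del window[s[i - k]]
        if window == need:
            return True
    return False
```

Each step changes only two counts, so the work per window no longer depends on sorting.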

Search a string for a given key

I've been doing some more CodeEval challenges and came across one on the hard tab.
You are given two strings. Determine if the second string is a substring of the first (Do NOT use any substr type library function). The second string may contain an asterisk (*), which should be treated as in a regular expression, i.e. it matches zero or more characters. The asterisk can be escaped by a \ character, in which case it should be interpreted as a regular '*' character. To summarize: the strings can contain letters, digits, * and \ characters.
So you are given two strings in a file that look something like this: Hello,ell. Your job is to figure out whether ell is in Hello. Here is what I do:
I haven't gotten it perfect, but it passes with a score of 65%. It runs through the string and the key and checks whether the characters match; when they do, it appends the character to a list. Afterwards it checks whether the length of that list is greater than or equal to half the length of the string. I figured half of the string length would be enough to verify whether it matches. Example of how it works:
h == e -> no
e == e -> yes -> list
l == e -> no
l == e -> no
...
My question is: what can I improve so that it also handles the wildcards described above?
import sys
def search_string(string, key):
    """ Search a string for a specified key.
    If the key exists output "true"; if it doesn't, output "false"
    >>> search_string("test", "est")
    true
    >>> search_string("testing", "rawr")
    false"""
    results = []
    for c in string:
        for ch in key:
            if c == ch:
                results.append(c)
    if len(string) / 2 < len(results) or len(string) / 2 == len(results):
        return "true"
    else:
        return "false"
if __name__ == '__main__':
    with open(sys.argv[1]) as data:
        for line in data.readlines():
            data_list = line.rstrip().split(",")
            search_key = data_list[1]
            word = data_list[0]
            print(search_string(word, search_key))
I've come up with a solution to this problem. You said "Do NOT use any substr type library function"; I'm not sure whether some of the functions I used are allowed, so tell me if I've broken any rules :D
Hope this helps you :)
def search_string(string, key):
    key = key.replace("\\*", "<NormalStar>")  # every \* becomes <NormalStar>
    key = key.split("*")  # splitting up the key makes it easier to work with
    point = 0  # for checking order, e.g. test = t*est, test != est*t
    found = "true"  # default
    for k in key:
        k = k.replace("<NormalStar>", "*")  # every <NormalStar> becomes *
        if k in string[point:]:  # the next part of the key is after the part before
            point = string.index(k, point) + len(k)  # search from point, then move past this part
        else:  # k not found, return false
            found = "false"
            break
    return found
print(search_string("test", "est"))  # true
print(search_string("t....est", "t*est"))  # true
print(search_string("n....est", "t*est"))  # false
print(search_string("est....t", "t*est"))  # false
print(search_string("anything", "*"))  # true
print(search_string("test", r"t\*est"))  # false
print(search_string("t*est", r"t\*est"))  # true
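Since the challenge forbids substring library functions, and in and str.index arguably count as such, here is a sketch that compares characters one at a time instead. It assumes \* is the only escape sequence (any other backslash is a literal character), keeps the "true"/"false" return values used above, and the names STAR and tokenize are hypothetical. It is a standard wildcard-matching table, with the key wrapped in wildcards so it may match anywhere in the string.

```python
STAR = object()  # hypothetical sentinel standing for an unescaped '*'

def tokenize(key):
    """Split key into literal characters and STAR wildcards; a backslash before '*' makes it literal."""
    tokens, i = [], 0
    while i < len(key):
        if key[i] == '\\' and i + 1 < len(key) and key[i + 1] == '*':
            tokens.append('*')  # escaped asterisk: an ordinary character
            i += 2
        else:
            tokens.append(STAR if key[i] == '*' else key[i])
            i += 1
    return tokens

def search_string(string, key):
    # Wrapping the pattern in wildcards lets it match anywhere in string
    tokens = [STAR] + tokenize(key) + [STAR]
    n = len(string)
    # match[j] is True if the tokens processed so far match string[:j]
    match = [True] + [False] * n
    for tok in tokens:
        if tok is STAR:
            # '*' matches zero or more characters, so once some prefix
            # matches, every longer prefix matches too
            new, seen = [], False
            for j in range(n + 1):
                seen = seen or match[j]
                new.append(seen)
        else:
            # A literal must equal the next character of string
            new = [False] + [match[j - 1] and string[j - 1] == tok
                             for j in range(1, n + 1)]
        match = new
    return "true" if match[n] else "false"
```

The table has one entry per prefix of the string, so the work is proportional to len(string) times the number of tokens, and no substring search is needed.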

Finding a common char between two strings recursively

I'm trying to write a recursive function that receives 2 strings and returns True if they have a common char, or False if they don't.
I first wrote an iterative version because I thought it might help.
The problem I have is that I don't know how to compare all of the chars in each string. This is what I did so far:
Iterative code:
def any_char_present_it(s1,s2):
    if len(s1)==0 or len(s2)==0:
        return False
    for i in s2:
        for m in s1:
            if i==m:
                return True
    return False
Recursive code:
def any_char_present(s1,s2):
    if len_rec(s2)==0:
        return False
    if s1[0]==s2[0]:
        return True
    return any_char_present(s1,s2[1:])
You can use sets and set theory to check for common characters without iterating through everything yourself.
has_common_chars turns both strings into sets and finds the intersection of them. If the length of the intersection is greater than zero, there is at least one character in common.
s1 = "No one writes to the Colonel"
s2 = "Now is the time or all good men to come to the ade of the lemon."
s3 = "ZZZZ"
def has_common_chars(s1, s2):
    return len(set(s1) & set(s2)) > 0
print(has_common_chars(s1, s2))  # True
print(has_common_chars(s2, s3))  # False
Building on your code: you have to try every combination. To do that, you can shorten each string in the return statement, like so:
#return check(s1, decremented s2) or check(decremented s1, s2)
return (any_char_present(s1,s2[1:]) or any_char_present(s1[1:],s2))
This should exhaust all possible combinations to find a char match at any point across the two string inputs.
Entire code:
def any_char_present(s1,s2):
    #update this if statement to check both strings
    #you can check for empty strings this way too
    if not s1 or not s2:
        return False
    if s1[0]==s2[0]:
        return True
    return (any_char_present(s1,s2[1:]) or any_char_present(s1[1:],s2))
print(any_char_present("xyz", "aycd"))
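The version above makes two recursive calls per step, so when the strings share no character the number of calls grows exponentially with their combined length. A sketch that recurses on s1 only and uses the in operator to look each character up in s2 needs just len(s1) calls (any_char_present_linear is a hypothetical name):

```python
def any_char_present_linear(s1, s2):
    """True if s1 and s2 share at least one character, recursing on s1 only."""
    if not s1:
        return False      # ran out of characters to try
    if s1[0] in s2:
        return True       # the first character of s1 also occurs in s2
    return any_char_present_linear(s1[1:], s2)
```

With two 30-character strings and no common character, this makes 31 calls, whereas the two-branch version would need an astronomically large number.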
