Python: compare two strings and retain the difference from one end

Beginner here, and haven't found an answer for this question though some are similar.
If I have two strings:
s1 = 'abcdefghijk'
s2 = 'abcdefghi'
How do I get 'jk' as an output? 'abcdefghi' must first match, and then I get the difference on the end.
The next step (which I may be able to figure out once the first question is answered): what if s2 = 'cdefghi' and I still want the output to be only 'jk', not 'ab' and 'jk'?

You can find the first index of s2 in s1 with find(), i.e.:
def after(s1, s2):
    index = s1.find(s2)
    # return None if s2 is not part of s1
    # or if there are no characters behind s2 in s1
    if index != -1 and index + len(s2) < len(s1):
        return s1[index + len(s2):]
    else:
        return None
s1 = "abcdefghijk"
s2 = "cdefghij"
print(after(s1, s2))
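
As a possible variation on the find() approach above, str.partition splits a string around the first occurrence of a separator, so the tail can be read off directly. This is only a sketch: the name after_partition is made up for illustration, and it assumes s2 is non-empty (partition raises ValueError for an empty separator).

```python
def after_partition(s1, s2):
    """Return the part of s1 that follows the first occurrence of s2, or None."""
    head, sep, tail = s1.partition(s2)
    # sep is empty when s2 was not found; tail is empty when nothing follows s2
    if sep and tail:
        return tail
    return None


print(after_partition("abcdefghijk", "abcdefghi"))
print(after_partition("abcdefghijk", "cdefghi"))
print(after_partition("abcdefghijk", "xyz"))
```

Like after(), this returns None both when s2 is missing and when nothing follows it.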

For the first case, s1 = 'abcdefghijk' and s2 = 'abcdefghi', the following would work too.
>>> set(s1) - set(s2)
{'j', 'k'}
>>> ''.join(sorted(set(s1) - set(s2)))
'jk'
So set logic can be applied to strings to extract the characters they share or do not share. Note that sets are unordered and drop duplicate characters, so this only works when the leftover characters are unique and their order does not matter (sorted() is used above to make the result deterministic).
For more info: https://docs.python.org/2/library/sets.html
But for the second case, @user3760780's suggestion seems to be the best fit.
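
To see where the set approach and a positional approach diverge, here is a small comparison. The example strings are made up for illustration; the point is only that set difference works on individual characters, not on positions.

```python
s1 = "hello world"
s2 = "hello"

# Set difference: characters of s1 that never occur in s2, in no particular order
leftover_chars = set(s1) - set(s2)

# Slicing: the actual text that follows s2 when s1 starts with s2
suffix = s1[len(s2):] if s1.startswith(s2) else None

print(sorted(leftover_chars))  # 'o' and 'l' are missing because they also occur in s2
print(repr(suffix))
```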

You can use the string method index to find the start of the substring, then add the length of the substring to get where you want to start taking your extra difference from.
base = 'abcdefghijk'
sub = 'abcdefghi'
def extra(base, sub):
    start = base.index(sub)
    end = start + len(sub)
    return base[end:]
extra(base, sub)
A ValueError will be raised here if sub is not a substring of base, and you can handle that case however you wish.
Edit: based on your comment on your question, to return nothing - I'm guessing you mean maybe an empty string - do:
def diff(base, sub):
    try:
        start = base.index(sub)
        end = start + len(sub)
        return base[end:]
    except ValueError:
        return ''
Whether you use find or index here probably depends on what you're actually wanting to use this for.
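
One practical difference between the two: find reports a miss with -1, which silently produces a wrong slice if it goes unchecked, whereas index raises. The sketch below illustrates this; the helper names tail_with_find and tail_unchecked are hypothetical.

```python
def tail_with_find(base, sub):
    """Use find and check the -1 sentinel explicitly."""
    pos = base.find(sub)
    if pos == -1:
        return ''
    return base[pos + len(sub):]


def tail_unchecked(base, sub):
    """Deliberately wrong: on a miss, find returns -1 and the slice start is bogus."""
    return base[base.find(sub) + len(sub):]


base = "abcdefghijk"
print(repr(tail_with_find(base, "xyz")))   # empty string, the miss is handled
print(repr(tail_unchecked(base, "xyz")))   # a misleading slice, no error raised

try:
    base.index("xyz")
except ValueError as exc:
    print("index raised ValueError:", exc)
```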

Related

Substitution Cipher Verification

I am tasked to write a function that returns whether two strings are substitution ciphers of each other. It is assumed that one isn't given a key. The output is expected to return True or False.
Here is what I have written so far (borrowed from a CodeFights question). The idea is to count how often each character occurs in its string and append those counts to the string1count and string2count lists. Then, compare the counts at each index, and if they are not equal, we can assume that it is not a valid substitution cipher, since each character needs a corresponding character with the same number of occurrences in order to be a substitution cipher.
def isSubstitutionCipher(string1, string2):
    string1count = []
    string2count = []
    for i in range(0,len(string1)):
        string1count.append(string1.count(string1[i]))
    for i in range(0,len(string2)):
        string2count.append(string2.count(string2[i]))
    for i in range(0,len(string1count)):
        if string1count.count(string1count[i])!=string2count.count(string1count[i]):
            return False
    return True
Does anyone else have other proposals on how to solve this very general question / problem statement?
You could try to re-create the substitution:
def isSubstitutionCipher(string1, string2):
    if len(string1) != len(string2):
        return False
    subst = {}
    for c1, c2 in zip(string1, string2):
        if c1 in subst:
            if c2 != subst[c1]:
                return False
        else:
            if c2 in subst.values():
                return False
            subst[c1] = c2
    return True
For all the characters you have already seen, make sure the substitution matches. For the new ones, store them in the substitution and make sure they are not already a substitution target.
This will return False at the first character that does not match.
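
The check c2 in subst.values() scans all stored values each time. A possible refinement, sketched here under the assumption that a strict one-to-one mapping is wanted, keeps a second dictionary for the reverse direction. The name is_substitution_cipher and both dictionary names are illustrative only.

```python
def is_substitution_cipher(string1, string2):
    """Return True if the characters of string1 map one-to-one onto string2."""
    if len(string1) != len(string2):
        return False
    forward = {}   # character of string1 -> character of string2
    backward = {}  # character of string2 -> character of string1
    for c1, c2 in zip(string1, string2):
        # setdefault stores the pairing on first sight and returns the stored one afterwards
        if forward.setdefault(c1, c2) != c2 or backward.setdefault(c2, c1) != c1:
            return False
    return True


print(is_substitution_cipher("banana", "cololo"))
print(is_substitution_cipher("ab", "cc"))
```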
Here is a variation on hiro's excellent answer:
def is_sub(s,t):
    if len(s) != len(t):
        return False
    d = dict(zip(s,t))
    return t == ''.join(d[c] for c in s)
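
Note that dict(zip(s, t)) only records the mapping in one direction, so two different characters of s may map to the same character of t (is_sub('ab', 'cc') is True). If a strict one-to-one substitution is required, one way to check both directions is to count distinct pairs. This is a sketch; is_sub_bijective is a made-up name.

```python
def is_sub_bijective(s, t):
    """Return True if s and t are related by a one-to-one character substitution."""
    if len(s) != len(t):
        return False
    # distinct pairs == distinct chars of s  -> each char of s maps to exactly one char of t
    # distinct pairs == distinct chars of t  -> each char of t comes from exactly one char of s
    return len(set(zip(s, t))) == len(set(s)) == len(set(t))


print(is_sub_bijective("banana", "cololo"))
print(is_sub_bijective("ab", "cc"))
```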
We can use word patterns to check whether one string is a ciphertext of another.
Word pattern: the first letter gets the number 0, and the first occurrence of each different letter after that gets the next number.
An advantage is that this has O(n) complexity.
Code
def isSubstitutionCipher(s1, s2):
    def word_pattern(s):
        ' Generates word pattern of s '
        seen, pattern = {}, []
        for c in s:
            seen.setdefault(c, len(seen))
            pattern.append(seen[c])
        return pattern
    return word_pattern(s1) == word_pattern(s2) # related by ciphertext if same word patterns
Test
print(isSubstitutionCipher('banana', 'cololo')) # Output: True
print(isSubstitutionCipher('dog', 'cat')) # Output: True
print(isSubstitutionCipher('banana', 'cololl')) # Output: False

How to check if list of strings all match the same regex in multiple regexes js_regex

I have following regexes / code snippet:
import js_regex
# I got 2 regexes
a = js_regex.compile(r"^[A-Z]{2}[A-Z0-9]{9}[0-9]$")
b = js_regex.compile(r"^\$[A-Z]{3}$")
# which I can test like this:
if a.match("BE46138E7195"):
    print("match a")
if b.match("$USD"):
    print("match b")
if not a.match("BDDD"):
    print("not matching works")
# ab: third pattern to combine a and b
# first question: is this possible without making a new js_regex?
ab = js_regex.compile(r"^[A-Z]{2}[A-Z0-9]{9}[0-9]||$[A-Z]{3}$")
if ab.match("BE46138E7195"):
    print("match ab")
if ab.match("$USD"):
    print("match ab")
if not ab.match("BDDD"):
    print("not matching works")
So as you can see, 2 regexes and already a first question (see snippet).
But the main question. Suppose I have a list of strings:
["BED", "KLO", "BN"]
I want to check if ALL strings in that list are matching with my ab regex.
BUT: it is ok if they are ALL not matching, like:
["A", "B", "C"]
is ok, because they are all not matching. So I have 2 groups that are possible:
[AB] and [not AB].
What's the best way to tackle this?
For each individual string, I would do the AB check this way:
if a.match(txt) or b.match(txt):
    print("We have a match!")
Now, if you want to check whether the whole list matches:
def ab_match(txt):
    return a.match(txt) or b.match(txt)
list_res = [ab_match(txt) for txt in alist]
all_match = all(list_res)
all_no_match = all((not res for res in list_res))
How it works:
all returns True only if every value in the iterable is truthy.
So, to check that none of the strings match, you first have to negate the result for each element of alist.
Separate question if you can combine both regexes. Essentially, you can: for regex A or B you can construct a regex A|B. Typically, (A)|(B) to ensure that the alternative is between the whole regexes not parts of them.
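
To illustrate why the grouping matters, here is a small sketch using the standard re module with toy patterns (ab and cd are placeholders, not the patterns from the question). Without a group, the anchors bind to only one alternative each.

```python
import re

loose = re.compile(r"^ab|cd$")       # parsed as (^ab) OR (cd$)
strict = re.compile(r"^(?:ab|cd)$")  # both anchors apply to both alternatives

for text in ["ab", "cd", "abXX", "XXcd"]:
    print(text, bool(loose.search(text)), bool(strict.search(text)))
```

The loose pattern accepts abXX and XXcd because each alternative carries only one of the two anchors.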
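Your ab pattern is wrong: the || creates an empty alternative, which lets the pattern match an empty string. You need just one | to define an alternation. The $ before [A-Z]{3} also has to be escaped as \$ to match a literal dollar sign.
Next, you did not group the patterns correctly: you need to wrap p1|p2 as ^(?:p1|p2)$ so that the anchors apply to both alternatives.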
So, if you wanted to declare ab pattern you would use
ab = js_regex.compile(r"^(?:[A-Z]{2}[A-Z0-9]{9}[0-9]|\$[A-Z]{3})$")
You can use this pattern like this:
def validate(arr):
    return (all(map(ab.search, arr)) or all(map(lambda x: not ab.search(x), arr)) )
See the Python demo:
import js_regex, re
a = js_regex.compile(r"^[A-Z]{2}[A-Z0-9]{9}[0-9]$")
b = js_regex.compile(r"^\$[A-Z]{3}$")
ab = re.compile(f'{a.pattern}|{b.pattern}') # Build the ab pattern from the two above
def validate(arr):
    return (all(map(ab.search, arr)) or all(map(lambda x: not ab.search(x), arr)) )
l1 = ["BED", "KLO", "BN"]
l2 = ["A", "B", "C"]
l3 = ["xxx", "$AUD"]
l4 = ['XX46434G8630', '$USD', 'XX46434V7047']
print(validate(l1), validate(l2), validate(l3), validate(l4), sep="\n")
Output:
True
True
False
True
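
The "all match or none match" condition can also be phrased as "the set of match outcomes has at most one element". The sketch below uses the standard re module instead of js_regex, on the assumption that both behave the same for these particular patterns; the name all_or_none is made up.

```python
import re

# Combined pattern from the answer above, compiled with the standard library
AB = re.compile(r"^(?:[A-Z]{2}[A-Z0-9]{9}[0-9]|\$[A-Z]{3})$")


def all_or_none(strings, pattern=AB):
    """Return True if every string matches the pattern, or none of them does."""
    outcomes = {bool(pattern.search(s)) for s in strings}
    # {True}, {False} or the empty set are fine; {True, False} means a mixed list
    return len(outcomes) <= 1


print(all_or_none(["BED", "KLO", "BN"]))
print(all_or_none(["xxx", "$AUD"]))
print(all_or_none(["XX46434G8630", "$USD", "XX46434V7047"]))
```

This scans the list once, whereas the two all(...) calls may scan it twice.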

Python 2.7 finding if some anagram of one string is a substring of another [duplicate]

EDIT: Posting my final solution because this was a very helpful thread and I want to add some finality to it. Using the advice from both answers below I was able to craft a solution. I added a helper function in which I defined an anagram. Here is my final solution:
def anagram(s1, s2):
    s1 = list(s1)
    s2 = list(s2)
    s1.sort()
    s2.sort()
    return s1 == s2
def Question1(t, s):
    t_len = len(t)
    s_len = len(s)
    t_sort = sorted(t)
    for start in range(s_len - t_len + 1):
        if anagram(s[start: start+t_len], t):
            return True
    return False
print Question1("app", "paple")
I am working on some practice technical interview questions and I'm stuck on the following question:
Find whether an anagram of string t is a substring of s
I have worked out the following two variants of my code, and I believe the solution lies somewhere between the two. The problem is that the first variant always prints False., regardless of input. The second variant works to some degree, but it cannot handle reordered letters. For example, t=jks s=jksd will print True!, whereas t=kjs s=jksd will print False.
def Question1():
    # Define strings as raw user input.
    t = raw_input("Enter phrase t:")
    s = raw_input("Enter phrase s:")
    # Use the sorted function to find if t in s
    if sorted(t.lower()) in sorted(s.lower()):
        print("True!")
    else:
        print("False.")
Question1()
Working variant:
def Question1():
    # Define strings as raw user input.
    t = raw_input("Enter phrase t:")
    s = raw_input("Enter phrase s:")
    # use a loop to find if t is in s.
    if t.lower() in s.lower():
        print("True!")
    else:
        print("False.")
Question1()
I believe there is a solution that lies between these two, but I'm having trouble figuring out how to use sorted in this situation.
You're very much on the right track. First, please note that there is no loop in your second attempt.
The problem is that you can't simply sort all of s and then look for sorted(t) in that. Rather, you have to consider each len(t) sized substring of s, and check that against the sorted t. Consider the trivial example:
t = "abd"
s = "abdc"
s trivially contains t. However, when you sort them, you get the strings abd and abcd, and the in comparison fails. The sorting gets other letters in the way.
Instead, you need to step through s in chunks the size of t.
t_len = len(t)
s_len = len(s)
t_sort = sorted(t)
for start in range(s_len - t_len + 1):
    chunk = s[start:start+t_len]
    if t_sort == sorted(chunk):
        # SUCCESS!!
        print("True!")
        break
I think your problem lies in the "substring" requirement. If you sort, you destroy order. Which means that while you can determine that an anagram of string1 is an anagram of a substring of string2, until you actually deal with string2 in order, you won't have a correct answer.
I'd suggest iterating over all the substrings of length len(s1) in s2. This is a straightforward for loop. Once you have the substrings, you can compare them (sorted vs sorted) with s1 to decide if there is any rearrangement of s1 that yields a contiguous substring of s2.
Viz:
s1 = "jks"
s2 = "aksjd"
print('s1=',s1, ' s2=', s2)
for offset in range(len(s2) - len(s1) + 1):
    ss2 = s2[offset:offset+len(s1)]
    if sorted(ss2) == sorted(s1):
        print('{} is an anagram of {} at offset {} in {}'.format(ss2, s1, offset, s2))
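
Sorting every window costs O(len(t) log len(t)) per position. A common alternative, sketched here rather than taken from the answers above, keeps a running character count of the current window and updates it as the window slides. The function name has_anagram_substring is illustrative.

```python
from collections import Counter


def has_anagram_substring(t, s):
    """Return True if some rearrangement of t occurs as a contiguous substring of s."""
    n = len(t)
    if n > len(s):
        return False
    need = Counter(t)
    window = Counter(s[:n])  # character counts of the first window
    if window == need:
        return True
    for i in range(n, len(s)):
        window[s[i]] += 1        # character entering the window
        old = s[i - n]
        window[old] -= 1         # character leaving the window
        if window[old] == 0:
            del window[old]      # drop zero counts so the comparison stays exact
        if window == need:
            return True
    return False


print(has_anagram_substring("app", "paple"))
print(has_anagram_substring("abd", "abcd"))
```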

python return equal where the string is not actually equal?

I have two strings like s1='fly,dream';s2='dream,fly'
I want s1 to be treated as equal to s2.
The code I tried is:
def Isequal(m,n):
    s1=m.split(',')
    s2=n.split(',')
    s1.sort()
    s2.sort()
    if s1 == s2:
        print 'Equal'
    else:
        print s1,s2
Note:s1 may be equal to s2.
Then
def Isequal(m,n):
    s1=m.split(',')
    s2=n.split(',')
    if s1 == s2.reverse() || s1 == s2:
        print 'Equal'
    else:
        print s1,s2
Is this code right? Is there something to improve?
Your code splits the two strings by , (which returns a list) and calls the sort method on the list. Since the two substrings are identical, sorting the list of the substrings results in equal lists. The best way to know what is happening is printing the stuff out. See the results.
>>> s1 = 'fly,dream'
>>> s2 = 'dream,fly'
>>> s1 = s1.split(',')
>>> s1
['fly', 'dream']
>>> s2 = s2.split(',')
>>> s2
['dream', 'fly']
>>> s1.sort()
>>> s1
['dream', 'fly']
>>> s2.sort()
>>> s2
['dream', 'fly']
>>> s1 == s2
True
If you want to check that the two strings consist of the same substrings, use sets, as follows:
>>> varOne = set(s1.split(','))
>>> varTwo = set(s2.split(','))
>>> varOne == varTwo
True
Beware that sets only allow unique items, so fly,dream,fly and dream,dream,fly will result in True here.
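
If repeated items should count, collections.Counter can be compared instead of set; it records how many times each item occurs. A minimal sketch, with same_items as a made-up helper name:

```python
from collections import Counter


def same_items(m, n):
    """Return True if both comma-separated strings hold the same items with the same counts."""
    return Counter(m.split(',')) == Counter(n.split(','))


print(same_items('fly,dream', 'dream,fly'))
print(same_items('fly,dream,fly', 'dream,dream,fly'))
```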
Set would be more elegant here:
def Isequal(m, n):
    s1 = set(m.split(','))
    s2 = set(n.split(','))
    if s1 == s2:
        print 'Equal'
    else:
        print s1, s2
and should be more efficient too.
You probably don't want to use sort() to flip a list. sort() orders the items lexicographically, so the result depends on the strings themselves rather than on their original positions. You can use .reverse() to reverse a list:
def Isequal(m,n):
    m = m.split(',')
    m.reverse()
    if m == n.split(','):
        print "Equal"
    else:
        print m, n
If you want to reverse the list in place, use .reverse() instead of .sort().
If reversing the list was just an example in your question, and your strings would actually have more items when you split them, you can use sets:
def Isequal(m,n):
    if not set(m.split(',')).symmetric_difference(n.split(',')):
        print "Equal"
    else:
        print m, n
By the way, "Sparse is better than dense." The semi-colons are rather... ugly.

Method to find substring

I know that Python has an in operator which can be used to check whether a substring or character is present in a string. I want to do this myself by checking each slice of the string that has the substring's length. Is the code below the only way, or is there another way to achieve this?
m = "college"
s = "col"
lm = len(m)
ls = len(s)
f = 0
for i in range(lm):
    if (i+ls) <= lm:
        if s == m[i:(i+ls)]:
            global f
            f = 1
            break
if f:
    print "present"
else:
    print "not present"
What I am doing here: if my substring is col, the program compares it with each slice of the main string that has the same length, moving from start to end, and reports whether one of them matches.
col
oll
lle
leg
ege
Your code is a legitimate way to quickly implement a general substring search, but not the only one. More efficient algorithms include Boyer-Moore string search, Knuth-Morris-Pratt search, or search implemented with a DFA.
This is a large topic and your question does not make it clear what kind of information you are actually after. In case of Python, it is of course most effective to simply use the in operator and the related methods str.find and str.index, all of which deploy a simplified Boyer-Moore.
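
For illustration, here is a compact sketch of one of the algorithms named above, Knuth-Morris-Pratt. It precomputes, for each prefix of the pattern, how far the match can fall back after a mismatch, so the text is never re-read. The name kmp_find is hypothetical, and the function is written to mirror str.find's return value; in real code the built-in operators remain the sensible choice.

```python
def kmp_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    # failure[i]: length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    # Scan the text once; k is the number of pattern characters matched so far
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


print(kmp_find("college", "col"))
print(kmp_find("college", "leg"))
print(kmp_find("college", "xyz"))
```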
You could try something like this:
In [1]: m = 'college'
In [2]: s = 'col'
In [3]: if any(m[i:i+len(s)] == s for i in range(len(m)-len(s)+1)):
   ...:     print 'Present'
   ...: else:
   ...:     print 'Not present'
   ...:
Present
Where the any checks every substring of m of length len(s) and sees if it equals s. If so, it returns True and stops further processing (this is called 'short-circuiting' and is pretty similar to the break you have above).
Here is what the any piece would look like if we replaced it with a list comprehension and took out the equality comparison:
In [4]: [m[i:i+len(s)] for i in range(len(m)-len(s)+1)]
Out[4]: ['col', 'oll', 'lle', 'leg', 'ege']
You don't need global there. Also, you can do
In [1]: %paste
m = "college"
s = "col"
In [2]: 'not ' * all(s != m[i:i+len(s)] for i in range(1+len(m)-len(s))) + 'present'
Out[2]: 'present'
But actually you should of course just do s in m.
This kind of problem calls for a functional solution:
def strcomp(s, subs):
    if len(s) < len(subs):
        return False
    elif s[0:len(subs)] == subs:
        return True
    else:
        return strcomp(s[1:], subs)
The strcomp function calls itself recursively, each time with the "long" string s losing its head, until you either find subs in the first position or s becomes shorter than subs.
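
The recursive version copies the string on every call (s[1:]) and will hit Python's recursion limit on long inputs. An iterative sketch of the same idea, using the optional start argument of str.startswith to avoid slicing, could look like this (contains is a made-up name):

```python
def contains(s, subs):
    """Return True if subs occurs in s, checking each start position in turn."""
    # range is empty when subs is longer than s, so any() returns False
    return any(s.startswith(subs, i) for i in range(len(s) - len(subs) + 1))


print(contains("college", "leg"))
print(contains("college", "xyz"))
```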
