Python Subsequence of a String - python

The exercise asks me to write a program. It says:
"Given a string s and a string t, check if s is a subsequence of t.
For example: "ac", "abcd" => True."
So I wrote this:
def isSubsequence(s, t):
    s, t = map(list, [s, t])
    for c in t:
        if c in s:
            s.pop(0)
    return not s
It worked ok in most cases except one:
s = "rjufvjafbxnbgriwgokdgqdqewn"
t = "mjmqqjrmzkvhxlyruonekhhofpzzslupzojfuoztvzmmqvmlhgqxehojfowtrinbatjujaxekbcydldglkbxsqbbnrkhfdnpfbuaktupfftiljwpgglkjqunvithzlzpgikixqeuimmtbiskemplcvljqgvlzvnqxgedxqnznddkiujwhdefziydtquoudzxstpjjitmiimbjfgfjikkjycwgnpdxpeppsturjwkgnifinccvqzwlbmgpdaodzptyrjjkbqmgdrftfbwgimsmjpknuqtijrsnwvtytqqvookinzmkkkrkgwafohflvuedssukjgipgmypakhlckvizmqvycvbxhlljzejcaijqnfgobuhuiahtmxfzoplmmjfxtggwwxliplntkfuxjcnzcqsaagahbbneugiocexcfpszzomumfqpaiydssmihdoewahoswhlnpctjmkyufsvjlrflfiktndubnymenlmpyrhjxfdcq"
I don't know why my code didn't work on this one. So if someone knows the answer, please tell me.

Here is what you can do:
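def isSubsequence(s, t):
    s = list(s)
    for i,(a,b) in enumerate(zip(t,s)):
        if a != b:
            s.insert(i,'.')  # mismatch: shift the unmatched char of s one position right
    # every char of s was matched iff the padded s is no longer than t
    return len(s) <= len(t)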
print(isSubsequence('Apes are goo.', 'Apples are good.'))
Output:
True
In your case, that specific s simply is not a subsequence of that specific t. To prove it:
def isSubsequence(s, t):
    s = list(s)
    for i,(a,b) in enumerate(zip(t,s)):
        if a != b:
            s.insert(i,'.')
    print(t)
    print(''.join(s))
s = "rjufvjafbxnbgriwgokdgqdqewn"
t = "mjmqqjrmzkvhxlyruonekhhofpzzslupzojfuoztvzmmqvmlhgqxehojfowtrinbatjujaxekbcydldglkbxsqbbnrkhfdnpfbuaktupfftiljwpgglkjqunvithzlzpgikixqeuimmtbiskemplcvljqgvlzvnqxgedxqnznddkiujwhdefziydtquoudzxstpjjitmiimbjfgfjikkjycwgnpdxpeppsturjwkgnifinccvqzwlbmgpdaodzptyrjjkbqmgdrftfbwgimsmjpknuqtijrsnwvtytqqvookinzmkkkrkgwafohflvuedssukjgipgmypakhlckvizmqvycvbxhlljzejcaijqnfgobuhuiahtmxfzoplmmjfxtggwwxliplntkfuxjcnzcqsaagahbbneugiocexcfpszzomumfqpaiydssmihdoewahoswhlnpctjmkyufsvjlrflfiktndubnymenlmpyrhjxfdcq"
isSubsequence(s, t)
Output:
mjmqqjrmzkvhxlyruonekhhofpzzslupzojfuoztvzmmqvmlhgqxehojfowtrinbatjujaxekbcydldglkbxsqbbnrkhfdnpfbuaktupfftiljwpgglkjqunvithzlzpgikixqeuimmtbiskemplcvljqgvlzvnqxgedxqnznddkiujwhdefziydtquoudzxstpjjitmiimbjfgfjikkjycwgnpdxpeppsturjwkgnifinccvqzwlbmgpdaodzptyrjjkbqmgdrftfbwgimsmjpknuqtijrsnwvtytqqvookinzmkkkrkgwafohflvuedssukjgipgmypakhlckvizmqvycvbxhlljzejcaijqnfgobuhuiahtmxfzoplmmjfxtggwwxliplntkfuxjcnzcqsaagahbbneugiocexcfpszzomumfqpaiydssmihdoewahoswhlnpctjmkyufsvjlrflfiktndubnymenlmpyrhjxfdcq
......r...........................j.u...................f...............................................................v..............................j..................................................................................................a................f..b..............................................................................x............n...b....................g....................................................................................r...i.......................wgokdgqdqewn
UPDATED to include a simpler implementation given by @StevenRumbalski in the comments (the := operator needs Python 3.8+):
def isSubsequence(s, t, start=-1):
    return all((start:=t.find(c, start+1)) > -1 for c in s)
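Why the code in the question goes wrong: `if c in s` asks whether c occurs anywhere in what is left of s, and `s.pop(0)` then removes the first remaining character even if that is not the one that matched. A minimal sketch of the usual two-pointer fix (the name `is_subsequence` is only for illustration) compares each character of t against the next character of s that still needs a match:

```python
def is_subsequence(s: str, t: str) -> bool:
    """Return True if s can be obtained from t by deleting characters."""
    i = 0  # index of the next character of s that still needs a match
    for c in t:
        if i < len(s) and c == s[i]:
            i += 1  # matched s[i]; move on to the next one
    return i == len(s)


# The loop from the question returns True here, because 'b' is "in" ['a', 'b'].
print(is_subsequence("ab", "bb"))    # False
print(is_subsequence("ac", "abcd"))  # True
```

This scans t once, so it runs in time proportional to len(t).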

I suppose the order also matters:
def isSubsequence(s, t): # order matters
    s, t = list(s), list(t)
    for c in s:
        if c in t:
            c_idx = t.index(c)
            t = t[c_idx + 1:]  # skip past the matched char so it can't be reused
        else:
            return False
    return True

def isSubsequence(s, t):
    start = -1
    for i in s:
        start = t.find(i, start + 1)
        if start == -1:
            return False
    return True

If this is an exercise you probably should figure it out yourself :), but since you already posted your attempt, allow me to propose this simpler approach, which may work for you:
from itertools import combinations
def isSubsequence(s, t):
    return any(s == ''.join(c) for c in combinations(t, len(s)))
It's far less performant (it may try every len(s)-length combination of t), but maybe at least a helpful suggestion.
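To put a number on the cost: combinations(t, len(s)) yields one tuple per way of choosing len(s) positions out of len(t), so in the worst case the check above builds C(len(t), len(s)) candidates. A quick calculation with math.comb (Python 3.8+); `candidate_count` is a hypothetical helper name:

```python
from math import comb


def candidate_count(s: str, t: str) -> int:
    # Worst-case number of candidate strings the combinations-based check builds.
    return comb(len(t), len(s))


print(candidate_count("a" * 10, "b" * 20))  # 184756
```

Even a 10-character s against a 20-character t already means 184,756 candidates, whereas a single left-to-right scan of t is linear in len(t).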

Related

What do I do to remove (or avoid returning altogether) "None" at the end of this recursive function

char_complement looks up the index of a character in "input" and returns the letter at the same index in "output". If the letter isn't in the list, an empty string is returned.
string_complement is supposed to do the same, but for a whole string.
def char_complement(c):
    input=["A","T","C","G"]
    output=["T","A","G","C"]
    if c in input:
        x= input.index(c)
        return output[x]
    return ""
def string_complement(s):
    while s!="":
        return char_complement(s[0])+str(string_complement(s[1:]))
If s is "ATTAGTC", "TAATCAGNone" is returned.
As other users say: you need a return "" at the end of the string_complement function.
Here I had some time to play with the problem. Recursion is an elegant way to solve problems, but in Python there's an annoying catch: RecursionError, which is raised once the recursion depth exceeds the recursion limit (1000 by default in plain CPython; some environments may set it higher, and you can change it*). I also thought there were better solutions than calling a function to translate every character, so I tried a dictionary instead. And supposing that you need to translate long strings, I proposed some other methods.
I came up with 4 different methods to try:
#recursive + auxFunction
#original proposed method
def char_complement(c):
    input=["A","T","C","G"]
    output=["T","A","G","C"]
    if c in input:
        x= input.index(c)
        return output[x]
    return ""
def string_complement(s):
    while s!="":
        return char_complement(s[0])+str(string_complement(s[1:]))
    return ""
#recursive + dict
def string_complement2(s):
    char_comp ={'A':'T','T':'A','C':'G','G':'C'}
    while s!="":
        return char_comp[s[0]] + string_complement2(s[1:])
    return ""
#comprehension + dict
def string_complement3(s):
    char_comp ={'A':'T','T':'A','C':'G','G':'C'}
    return ''.join([char_comp[thisS] for thisS in s])  # join so it returns a string like the others
# for + dict
def string_complement4(s):
    char_comp ={'A':'T','T':'A','C':'G','G':'C'}
    complement = ""
    for thisS in s:
        complement += char_comp[thisS]
    return complement
Finally, I only have to test all the methods and compare their execution times, so I wrote this (time may not be the best module for measuring execution time; I'm not sure):
from time import time
import numpy as np
def test_and_get_times(n):
    #generate seq
    atcgList = ['A','T','C','G']
    seq = ''.join( np.random.choice(atcgList,n))
    try : #to avoid RecursionError
        t1 = time()
        a1 = string_complement(seq)
        t2 = time()
        a2 = string_complement2(seq)
        t3 = time()
    except RecursionError: #if RecursionError set values to nan
        t2=np.nan
        t3=np.nan
    finally:
        t4 = time()
        a3 = string_complement3(seq)
        t5 = time()
        a4 = string_complement4(seq)
        t6 = time()
    return [t2-t1,t3-t2,t5-t4,t6-t5]
#different lengths to test methods
nList = np.arange(10,10001,111)
#test all lengths
times = np.zeros([4,len(nList)])
for i,n in enumerate(nList):
    times[:,i] = test_and_get_times(n)
#plot results
import matplotlib.pyplot as plt
plt.figure()
plt.plot(nList,times[0],label='Recursive + auxFunc')
plt.plot(nList,times[1],label='Recursive + auxDic')
plt.plot(nList,times[2],label='Comprehension+ auxDic')
plt.plot(nList,times[3],label='For + auxDic')
plt.xlabel('sequence length')
plt.ylabel('time (s)')
plt.legend()
plt.show()
Running the code plots the execution time of each method against the sequence length.
The conclusion seems to be that using a dictionary to look up the complements is faster than calling an auxiliary function for each character. Using a comprehension or a plain for loop avoids the RecursionError and lets you process longer sequences.
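On the timing question: the standard library's timeit module (which uses time.perf_counter by default) repeats each call and is less sensitive to clock resolution than subtracting time() values. A minimal sketch comparing two of the non-recursive versions; the functions are redefined here so the snippet runs on its own, and the sequence length and repeat counts are arbitrary choices:

```python
import random
import timeit

CHAR_COMP = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}


def complement_for(s):
    # for + dict, like string_complement4
    out = ""
    for ch in s:
        out += CHAR_COMP[ch]
    return out


def complement_join(s):
    # comprehension + dict, joined back into one string
    return ''.join([CHAR_COMP[ch] for ch in s])


# A random 10,000-base sequence (length chosen arbitrarily).
seq = ''.join(random.choices('ATCG', k=10_000))

for func in (complement_for, complement_join):
    # timeit.repeat calls func(seq) `number` times per run; keep the fastest run.
    runs = timeit.repeat(lambda: func(seq), number=20, repeat=5)
    print(f"{func.__name__}: {min(runs) / 20:.6f} s per call")
```

Taking the minimum of several runs is the usual way to reduce noise from other processes.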
Sorry for my potato English (I might have a lot of semantic and syntactic errors).
Hope this helps you in some way!
Cheers!
*To set a different recursion limit:
import sys
sys.setrecursionlimit(10000)  # replace 10000 with the limit you need
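A small sketch of the limit in action: the recursive dict version makes one nested call per character, so a sequence a few thousand characters long raises RecursionError under CPython's default limit of 1000, while a loop-based version has no such limit. The function names are illustrative:

```python
import sys

CHAR_COMP = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}


def complement_recursive(s):
    # One nested call per character, so the recursion depth grows with len(s).
    if s == "":
        return ""
    return CHAR_COMP[s[0]] + complement_recursive(s[1:])


def complement_iterative(s):
    return ''.join(CHAR_COMP[ch] for ch in s)


print(sys.getrecursionlimit())  # typically 1000 in plain CPython

long_seq = "ACGT" * 1500  # 6000 characters, well past the default limit
try:
    complement_recursive(long_seq)
except RecursionError:
    print("RecursionError for", len(long_seq), "characters")

print(len(complement_iterative(long_seq)))  # 6000
```

Raising the limit with sys.setrecursionlimit only moves the ceiling; the iterative version avoids it entirely.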
Simply add return "" to the end of your function, so that when the while loop ends, an empty string is returned instead of None:
def string_complement(s):
    while s!="":
        return char_complement(s[0])+str(string_complement(s[1:]))
    return ""
Altogether:
def char_complement(c):
    input=["A","T","C","G"]
    output=["T","A","G","C"]
    if c in input:
        x= input.index(c)
        return output[x]
    return ""
def string_complement(s):
    while s!="":
        return char_complement(s[0])+str(string_complement(s[1:]))
    return ""
print(string_complement("ATTAGTC"))
Output:
TAATCAG
Is there a need to use recursion? This can be easily done with indexes:
INPUT = ["A","T","C","G"]
OUTPUT = ["T","A","G","C"]
def get_complement(str_in):
    letters = list(str_in)
    return ''.join(
        [
            OUTPUT[INPUT.index(letter)]
            for letter in letters
        ]
    )
The problem is solved by a one-line declaration and two one-line functions:
COMPLEMENTS = dict(A='T', T='A', C='G', G='C')
def char_complement(c):
    return COMPLEMENTS.get(c, "")
def string_complement(s):
    return ''.join([char_complement(c) for c in s])
print(string_complement("ATTAGTC"))
Output:
TAATCAG
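One more option, not part of the answers above: str.maketrans builds a translation table once and str.translate applies it to the whole string. A minimal sketch (`string_complement_translate` is a hypothetical name). One behavioral difference: characters missing from the table are kept unchanged, whereas char_complement above drops them.

```python
# Translation table A<->T, C<->G, built once and reused for every call.
COMPLEMENT_TABLE = str.maketrans("ATCG", "TAGC")


def string_complement_translate(s: str) -> str:
    # Characters not in the table (e.g. 'N') pass through unchanged.
    return s.translate(COMPLEMENT_TABLE)


print(string_complement_translate("ATTAGTC"))  # TAATCAG
```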

check if letters of a string are in sequential order in another string

If it were just a matter of checking whether the letters in test_string also appear in control_string, I would not have this problem. I would simply use the code below.
if set(test_string.lower()) <= set(control_string.lower()):
    return True
But I also face the rather convoluted task of determining whether the overlapping letters in control_string are in the same sequential order as those in test_string.
For example,
test_string = 'Dih'
control_string = 'Danish'
True
test_string = 'Tbl'
control_string = 'Bottle'
False
I thought of using a for loop to compare the indices of the letters, but it is hard to come up with the right algorithm.
for i in test_string.lower():
    for j in control_string.lower():
        if i==j:
            index_factor = control_string.index(j)
My plan is to compare each index factor with the next one; if an index factor turns out to be larger than the one after it, the function returns False.
I am stuck on how to compare those index_factors in a for loop.
How should I approach this problem?
You could just join the characters in your test string to a regular expression, allowing for any other characters .* in between, and then re.search that pattern in the control string.
>>> import re
>>> test, control = "Dih", "Danish"
>>> re.search('.*'.join(test), control) is not None
True
>>> test, control = "Tbl", "Bottle"
>>> re.search('.*'.join(test), control) is not None
False
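One caveat: '.*'.join(test) treats the characters of test as regex syntax, so a test string containing characters such as '.', '+' or '(' would be misinterpreted. A sketch that escapes each character with re.escape; ignoring case via re.IGNORECASE is an assumption based on the question's use of .lower(), and re.DOTALL lets .* also span newlines. `in_order_regex` is a hypothetical name:

```python
import re


def in_order_regex(test: str, control: str) -> bool:
    # Escape each character so it matches literally, then allow anything in between.
    pattern = '.*'.join(re.escape(ch) for ch in test)
    return re.search(pattern, control, re.IGNORECASE | re.DOTALL) is not None


print(in_order_regex("Dih", "Danish"))  # True
print(in_order_regex("Tbl", "Bottle"))  # False
print(in_order_regex("a.c", "abc"))     # False: '.' must now appear literally
```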
Without using regular expressions, you can create an iterator from the control string and use two nested loops (1): break out of the inner loop when the character is found, and return False from the inner loop's else clause if control runs out first. It is important to create the iterator, even though control is already iterable, so that the inner loop continues where it last stopped.
def check(test, control):
    it = iter(control)
    for a in test:
        for b in it:
            if a == b:
                break
        else:
            return False
    return True
You could even do this in one (well, two) lines using all and any:
def check(test, control):
    it = iter(control)
    return all(any(a == b for b in it) for a in test)
Complexity for both approaches should be O(n), with n being the max number of characters.
1) This is conceptually similar to what @jpp does, but IMHO a bit clearer.
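To see why the explicit iter(control) matters, compare the one-liner with a version that loops over the string itself: each inner any(...) then restarts from the beginning of control, so only membership is checked, not order. `check_restarting` is a hypothetical name for the broken variant:

```python
def check(test, control):
    it = iter(control)  # shared iterator: each search resumes where the last one stopped
    return all(any(a == b for b in it) for a in test)


def check_restarting(test, control):
    # Iterates over the string directly, so every letter is searched from index 0.
    return all(any(a == b for b in control) for a in test)


print(check("ba", "ab"))             # False: 'a' does not come after 'b'
print(check_restarting("ba", "ab"))  # True: order is ignored
```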
Here's one solution. The idea is to iterate through the control string first and yield a value if it matches the next test character. If the total number of matches equals the length of test, then your condition is satisfied.
def yield_in_order(x, y):
    iterstr = iter(x)
    current = next(iterstr)
    for i in y:
        if i == current:
            yield i
            current = next(iterstr)
def checker(test, control):
    x = test.lower()
    return sum(1 for _ in zip(x, yield_in_order(x, control.lower()))) == len(x)
test1, control1 = 'Tbl', 'Bottle'
test2, control2 = 'Dih', 'Danish'
print(checker(test1, control1)) # False
print(checker(test2, control2)) # True
@tobias_k's answer has a cleaner version of this. If you want additional information, e.g. how many letters align before a break is found, you can trivially adjust the checker function to return sum(1 for _ in zip(x, yield_in_order(...))).
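A sketch of that adjustment as a standalone function (`aligned_count` is a hypothetical name). It uses a variant of yield_in_order that stops via next(..., None) instead of a bare next(): if the generator kept running after the last letter matched, the StopIteration would be raised inside the generator, which Python 3.7+ turns into a RuntimeError. The zip in checker happens to stop the generator before that point, but the variant is safe on its own:

```python
def yield_in_order(x, y):
    it = iter(x)
    current = next(it, None)  # None signals that every letter of x was matched
    for ch in y:
        if current is None:
            return
        if ch == current:
            yield ch
            current = next(it, None)


def aligned_count(test, control):
    # Number of letters of `test` matched, in order, inside `control`.
    x = test.lower()
    return sum(1 for _ in yield_in_order(x, control.lower()))


print(aligned_count('Tbl', 'Bottle'))  # 1
print(aligned_count('Dih', 'Danish'))  # 3
```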
You can use str.find(letter, last_index) to find the next occurrence of the desired letter after the letters already processed.
def same_order_in(test, control):
    index = 0
    control = control.lower()
    for i in test.lower():
        index = control.find(i, index)
        if index == -1:
            return False
        # index += 1 # uncomment to check multiple occurrences of same letter in test string
    return True
If the test string has duplicate letters, like:
test_string = 'Diih'
control_string = 'Danish'
then with the line commented out, same_order_in(test_string, control_string) == True,
and with it uncommented, same_order_in(test_string, control_string) == False.
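The two behaviours can also be selected with a parameter instead of editing the function. A sketch in which the hypothetical flag allow_reuse=True reproduces the commented-out variant, and the default requires each letter to come strictly after the previous one:

```python
def same_order_in(test, control, allow_reuse=False):
    index = 0
    control = control.lower()
    for ch in test.lower():
        index = control.find(ch, index)
        if index == -1:
            return False
        if not allow_reuse:
            index += 1  # the next letter must come strictly after this one
    return True


print(same_order_in('Diih', 'Danish'))                    # False
print(same_order_in('Diih', 'Danish', allow_reuse=True))  # True
```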
Recursion is one natural way to solve such problems.
Here's one that checks for sequential ordering.
def sequentialOrder(test_string, control_string, len1, len2):
    if len1 == 0: # base case 1
        return True
    if len2 == 0: # base case 2
        return False
    if test_string[len1 - 1] == control_string[len2 - 1]:
        return sequentialOrder(test_string, control_string, len1 - 1, len2 - 1) # Recursion
    return sequentialOrder(test_string, control_string, len1, len2-1)
test_string = 'Dih'
control_string = 'Danish'
print(sequentialOrder(test_string, control_string, len(test_string), len(control_string)))
Outputs:
True
and False for
test_string = 'Tbl'
control_string = 'Bottle'
Here's an iterative approach that does the same thing:
def sequentialOrder(test_string,control_string,len1,len2):
    i = 0
    j = 0
    while j < len1 and i < len2:
        if test_string[j] == control_string[i]:
            j = j + 1
        i = i + 1
    return j==len1
test_string = 'Dih'
control_string = 'Danish'
print(sequentialOrder(test_string,control_string,len(test_string) ,len(control_string)))
An elegant solution using a generator:
def foo(test_string, control_string):
    if all(c in control_string for c in test_string):
        gen = (char for char in control_string if char in test_string)
        if all(x == test_string[i] for i, x in enumerate(gen)):
            return True
    return False
print(foo('Dzn','Dahis')) # False
print(foo('Dsi','Dahis')) # False
print(foo('Dis','Dahis')) # True
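First, check that all the letters in test_string are contained in control_string. Then check that they appear in the same order as in test_string. Note that this assumes each letter occurs only once: with repeated letters it can give the wrong answer (foo('ab', 'bab') returns False) or raise an IndexError (foo('Di', 'Dii')).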
A simple way is making use of the key argument in sorted, which serves as a key for the sort comparison:
def seq_order(l1, l2):
    intersection = ''.join(sorted(set(l1) & set(l2), key = l2.index))
    return intersection == l1
This computes the intersection of the two sets of characters and sorts it by each character's first position in the longer string. Having done so, you only need to compare the result with the shorter string to see if they are the same. Because it works on sets, duplicate letters are not handled (seq_order('aa', 'aa') returns False).
The function returns True or False accordingly. Using your examples:
seq_order('Dih', 'Danish')
#True
seq_order('Tbl', 'Bottle')
#False
seq_order('alp','apple')
#False

Combine string recursive-python

I have two string lists:
A = ['YELLOW']
B = ['BA']
I want to combine these two string using a recursive function to get
['YBAEBALBALBAOBAWBA']
Here is my function:
def Combine(A, B):
    if len(A) > 0:
        return str(A[0]) + str(B) + Combine(A[:0], B)
I have no idea how recursion works.
Could someone please help me?
You were very close!
def Combine(A, B):
    if len(A) > 0:
        return str(A[0]) + str(B) + Combine(A[1:], B) # <-- fix 1
    else:
        return '' # <-- fix 2
Fix 1: to recurse on the rest of A, you should pass A[1:].
Fix 2: you took care of the case len(A) > 0 but forgot to handle the case where A has run out of characters (the else branch).
Running
A = 'YELLOW'
B = 'BA'
print(Combine(A, B))
OUTPUT
YBAEBALBALBAOBAWBA
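For comparison, the same result without recursion, as a join over the characters of A. The question actually stores each string inside a one-element list, so the usage below unwraps A[0] and B[0] and wraps the result back into a list; `combine_join` is a hypothetical name:

```python
def combine_join(a: str, b: str) -> str:
    # Append b after every character of a.
    return ''.join(ch + b for ch in a)


A = ['YELLOW']
B = ['BA']
print([combine_join(A[0], B[0])])  # ['YBAEBALBALBAOBAWBA']
```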

Match two strings (char to char) till the first non-match using python

I am trying to match two strings character by character until the first non-matching character, and then compute the percentage that matched exactly. My code is like this:
def match(a, b):
    a, b = list(a), list(b)
    count = 0
    for i in range(len(a)):
        if (a[i]!= b[i]): break
        else: count = count + 1
    return count/len(a)
a = '354575368987943'
b = '354535368987000'
c = '354575368987000'
print(match(a,b)) # return 0.267
print(match(a,c)) # return 0.8
Is there any built-in method in python already which can do it faster ? For simplicity assume that both strings are of same length.
There's no built-in to do the entire thing, but you can use a built-in for computing the common prefix:
import os
def match(a, b):
    common = os.path.commonprefix([a, b])
    return float(len(common))/len(a)
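os.path.commonprefix works character by character (it is not aware of path separators), which is exactly what is needed here. An equivalent sketch without os, using itertools.takewhile to count the matching leading pairs:

```python
from itertools import takewhile


def match(a, b):
    # Length of the common prefix: count leading (x, y) pairs that are equal,
    # stopping at the first mismatch.
    common = sum(1 for _ in takewhile(lambda pair: pair[0] == pair[1], zip(a, b)))
    return common / len(a)


a = '354575368987943'
c = '354575368987000'
print(match(a, c))  # 0.8
```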
I don't think there is such a built-in method.
But you can improve your implementation:
No need to wrap the inputs in list(...); strings are indexable.
No need for the count variable; i already carries the same meaning. And you can return immediately once you know the result.
Like this, with some doctests added as a bonus:
def match(a, b):
    """
    >>> match('354575368987943', '354535368987000')
    0.26666666666666666
    >>> match('354575368987943', '354575368987000')
    0.8
    >>> match('354575368987943', '354575368987943')
    1
    """
    for i in range(len(a)):
        if a[i] != b[i]:
            return i / len(a)
    return 1
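Doctests are only checked when something runs them. A minimal sketch using the standard doctest module; the examples use short strings chosen so that the expected floats are exact:

```python
import doctest


def match(a, b):
    """Fraction of `a` matched before the first mismatch with `b`.

    >>> match('abcd', 'abxx')
    0.5
    >>> match('abcd', 'xbcd')
    0.0
    >>> match('abcd', 'abcd')
    1
    """
    for i in range(len(a)):
        if a[i] != b[i]:
            return i / len(a)
    return 1


if __name__ == "__main__":
    # Collects the >>> examples from this module's docstrings and runs them;
    # failures are reported, and passing -v on the command line shows every example.
    print(doctest.testmod())
```

Running the file directly (or `python -m doctest yourfile.py`) executes the examples.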
alternative
(Just now saw that the answer below me thought of the same thing while I was editing the post)
def match(l1, l2):
    # find mismatch
    try:
        stop = next(i for i, (el1, el2) in enumerate(zip(l1, l2)) if el1 != el2)
        return stop/len(l1)
    except StopIteration:
        return 1

List membership in Python without "in"

How do I define a function is_member() that takes a value x (i.e. a number, string, etc.) and a list of values a, and returns True if x is a member of a, False otherwise? (Note that this is exactly what the in operator does, but for the sake of the exercise I should pretend Python does not have this operator.)
This is what I've come up with, but it doesn't work!
def is_member(x, a):
    return x == a[::]
I can think of two (edit: three) ways to do this:
First:
def is_member(array, value):
    try:
        array.index(value)
    except ValueError:
        return False
    else:
        return True
Second:
def is_member(array, value):
    for item in array:
        if item == value:
            return True
    return False
EDIT: Also, third:
def is_member(array, value):
    return array.count(value) > 0
Recursive solution:
def is_member(value, array):
    if len(array) == 0:
        return False
    return value == array[0] or is_member(value, array[1:])
Using a generator expression (note that the in here is part of the for ... in syntax and has nothing to do with the in operator):
def is_member(x, a):
    return any(x == y for y in a)
>>> is_member(10, range(1000000000000000))
True
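One subtlety if the goal is to mimic in exactly: for containers such as list and tuple, the Python language reference defines x in y as equivalent to any(x is e or x == e for e in y). The identity check matters for objects that are not equal to themselves, such as float('nan'):

```python
def is_member(x, a):
    # Same test the language reference gives for container membership.
    return any(x is e or x == e for e in a)


nan = float('nan')
values = [1.0, nan]

print(is_member(nan, values))         # True
print(any(nan == e for e in values))  # False: nan != nan
print(nan in values)                  # True
```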
You could simply iterate over every element in the list:
def is_member(col, a):
    for i in range(len(col)):
        if a == col[i]: return True
    return False
>>> a = [1,2,3,4]
>>> is_member(a, 2)
True
>>> is_member(a, 5)
False
Without using the "in" operator:
try:
    from itertools import imap  # Python 2
except ImportError:
    imap = map  # Python 3: the built-in map is already lazy
def is_member( item, array ):
    return any( imap(lambda x: x == item, array ) )
which will cycle through the items of the list, one at a time, and short circuit when it hits a value that is True.
Well, there are a lot of ways to do this, of course -- but you're a little hamstrung by the prohibition of "in" anywhere in the code. Here are a few things to try.
Variations on a theme ...
def is_member(item, seq):
    return sum(map(lambda x: x == item, seq)) > 0
def is_member(item, seq):
    return len(list(filter(lambda x: x != item, seq))) != len(seq)
You may have heard that asking for forgiveness is better than asking for permission ...
def is_member(item, seq):
    try:
        seq.index(item)
        return True
    except ValueError:
        return False
Or something a little more functional-flavored ...
import itertools, operator, functools
def is_member(item, seq):
    not_eq = functools.partial(operator.ne, item)
    return bool(list(itertools.dropwhile(not_eq, seq)))
But, since your requirements preclude the use of the looping construct which would be most reasonable, I think the experts would recommend writing your own looping framework. Something like ...
def loop(action, until):
    while True:
        action()
        if until():
            break
def is_member(item, seq):
    seq = list(seq)  # work on a copy so the caller's list isn't emptied
    if not seq:
        return False
    sigil = [False]
    def check():
        if seq[0] == item:
            sigil[0] = True
    def til():
        seq.remove(seq[0])
        return not len(seq)
    loop(check, til)
    return sigil[0]
Let us know how it goes.
