Variation of finding edit distance with only insertions and deletions? - python

I need to find the edit distance between a word and its sorted word (ex: apple and aelpp), using only insertions and deletions recursively.
I have found some sources that used insertions, deletions, and substitutions, but I am not sure how to only use insertion and deletion.
This is the code I found:
def ld(s, t):
    if not s: return len(t)
    if not t: return len(s)
    if s[0] == t[0]: return ld(s[1:], t[1:])
    l1 = ld(s, t[1:])
    l2 = ld(s[1:], t)
    l3 = ld(s[1:], t[1:])
    return 1 + min(l1, l2, l3)
What edits would need to be made to only find the number of insertions and deletions?

Remove l3, which computes substitutions, like so:
def ld2(s, t):
    if not s: return len(t)
    if not t: return len(s)
    if s[0] == t[0]: return ld2(s[1:], t[1:])
    l1 = ld2(s, t[1:])
    l2 = ld2(s[1:], t)
    return 1 + min(l1, l2)
You can see that ld('apple', 'applx') is equal to 1, while ld2 with the same parameters evaluates to 2.
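The plain recursion recomputes the same pairs of suffixes many times. Here is a minimal memoized sketch, assuming the same recurrence as ld2 but indexing into the strings instead of slicing them. The names indel_distance and go are made up for illustration.

```python
from functools import lru_cache

def indel_distance(s, t):
    """Edit distance using only insertions and deletions (same recurrence as ld2)."""
    @lru_cache(maxsize=None)
    def go(i, j):
        # i and j are the start positions of the remaining suffixes of s and t
        if i == len(s):
            return len(t) - j          # insert what is left of t
        if j == len(t):
            return len(s) - i          # delete what is left of s
        if s[i] == t[j]:
            return go(i + 1, j + 1)    # matching characters cost nothing
        return 1 + min(go(i, j + 1), go(i + 1, j))
    return go(0, 0)

print(indel_distance('apple', 'applx'))  # 2
print(indel_distance('apple', 'aelpp'))  # 4
```

With memoization there are at most (len(s) + 1) * (len(t) + 1) distinct calls, so short words such as the question's example finish immediately.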

Related

Iterative merge sort?

I am aware of classical recursive approach to sort something by merging.
It yields O(n * log(n)) complexity, which can be more or less easily shown via recurrence relation.
I've tried to reimplement merge sort in iterative fashion:
def atomize(l):
    return list(
        map(
            lambda x: [x],
            l if l is not None else []
        )
    )
def merge(l, r):
    res = []
    while (len(l) + len(r)) > 0:
        if len(l) < 1:
            res += r
            r = []
        elif len(r) < 1:
            res += l
            l = []
        else:
            if l[0] <= r[0]:
                res.append(l.pop(0))
            else:
                res.append(r.pop(0))
    return res
def iter_merge_sort(l):
    atoms = atomize(l) # O(n)
    while len(atoms) > 1: # O(n - 1)
        atoms.append(merge(atoms.pop(0), atoms.pop(0)))
    return atoms[0]
...and it feels like I am mistaken somewhere, yet I fail to notice the exact place. Recursive merge sort splits the problem until the list of unsorted values reduces to a list of singletons, single elements that can be compared. That's what atomize(...) does: given a list, it produces a list of singleton lists (O(n)).
Obviously, merge(...) is O(n) as well: ignore for the moment that no linked lists are used for concatenation; that's not important here.
Finally, the while block in iter_merge_sort(...) itself takes exactly n - 1 repetitions, each of which costs at most O(n). Hence, I took an O(n * log(n)) algorithm and "improved" it to be (n - 1) * n ~ O(n * n). Where is my mistake?
Your algorithm is entirely correct. The problem lies in that you're using list.pop(0) as a way to dequeue, which costs O(n) in Python since all items after a popped item of a list have to be copied to the preceding positions.
You can use collections.deque in place of list so that you can use the deque.popleft method, which costs O(1):
from collections import deque
def atomize(l):
    return deque(
        map(
            lambda x: deque([x]),
            l if l is not None else []
        )
    )
def merge(l, r):
    res = deque()
    while (len(l) + len(r)) > 0:
        if len(l) < 1:
            res += r
            r = deque()
        elif len(r) < 1:
            res += l
            l = deque()
        else:
            if l[0] <= r[0]:
                res.append(l.popleft())
            else:
                res.append(r.popleft())
    return res
def iter_merge_sort(l):
    atoms = atomize(l) # O(n)
    while len(atoms) > 1: # O(n - 1)
        atoms.append(merge(atoms.popleft(), atoms.popleft()))
    return list(atoms[0])
so that:
iter_merge_sort([3,5,1,6,2,1])
returns:
[1, 1, 2, 3, 5, 6]
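On the "(n - 1) * n" estimate: the bound of O(n) per merge is loose, because most of the n - 1 merges combine short runs. A rough way to see this is to count how many elements the merges actually move. The sketch below assumes the same queue-of-runs scheme as above; count_moves is a hypothetical helper written only for this illustration.

```python
from collections import deque

def count_moves(values):
    """Sort with the queue-of-runs scheme and count the elements moved by all merges."""
    moves = 0
    atoms = deque(deque([v]) for v in values)
    while len(atoms) > 1:
        left, right = atoms.popleft(), atoms.popleft()
        moves += len(left) + len(right)  # a merge touches each element of both runs once
        merged = deque()
        while left and right:
            merged.append(left.popleft() if left[0] <= right[0] else right.popleft())
        merged.extend(left)   # at most one of the two runs still has elements
        merged.extend(right)
        atoms.append(merged)
    result = list(atoms[0]) if atoms else []
    return moves, result

moves, result = count_moves([8, 7, 6, 5, 4, 3, 2, 1])
print(moves, result)  # 24 [1, 2, 3, 4, 5, 6, 7, 8]
```

For 8 elements the seven merges move 24 = 8 * log2(8) elements in total, not 7 * 8 = 56, which is consistent with the O(n * log(n)) bound once each dequeue costs O(1).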

How to calculate sum of two polynomials?

For instance 3x^4 - 17x^2 - 3x + 5. Each term of the polynomial can be represented as a pair of integers (coefficient,exponent). The polynomial itself is then a list of such pairs like
[(3,4), (-17,2), (-3,1), (5,0)] for the polynomial as shown.
The zero polynomial, 0, is represented as the empty list [], since it has no terms with nonzero coefficients.
I want to write two functions to add and multiply two input polynomials with the same representation of tuple (coefficient, exponent):
addpoly(p1, p2)
multpoly(p1, p2)
Test Cases:
addpoly([(4,3),(3,0)], [(-4,3),(2,1)])
should give [(2, 1),(3, 0)]
addpoly([(2,1)],[(-2,1)])
should give []
multpoly([(1,1),(-1,0)], [(1,2),(1,1),(1,0)])
should give [(1, 3),(-1, 0)]
Here is something that I started with, but I got completely stuck!
def addpoly(p1, p2):
    (coeff1, exp1) = p1
    (coeff2, exp2) = p2
    if exp1 == exp2:
        coeff3 = coeff1 + coeff2
As suggested in the comments, it is much simpler to represent polynomials as multisets of exponents.
In Python, the closest thing to a multiset is the Counter data structure. Using a Counter (or even just a plain dictionary) that maps exponents to coefficients will automatically coalesce entries with the same exponent, just as you'd expect when writing a simplified polynomial.
You can perform operations using a Counter, and then convert back to your list of pairs representation when finished using a function like this:
def counter_to_poly(c):
    p = [(coeff, exp) for exp, coeff in c.items() if coeff != 0]
    # sort by exponents in descending order
    p.sort(key = lambda pair: pair[1], reverse = True)
    return p
To add polynomials, you group together like-exponents and sum their coefficients.
import collections
def addpoly(p, q):
    r = collections.Counter()
    for coeff, exp in (p + q):
        r[exp] += coeff
    return counter_to_poly(r)
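(If you were to stick with the Counter representation throughout, be aware that p + q on two Counters keeps only entries with a positive result, so negative coefficients would be lost; copying one Counter and calling update with the other adds the coefficients and keeps them.)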
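One caveat when working with Counter objects directly: the + operator keeps only positive counts, while update adds the values and keeps zero or negative results. A small sketch of the difference, with two made-up polynomials stored as {exponent: coefficient}:

```python
from collections import Counter

p = Counter({1: 1, 0: -1})   # x - 1
q = Counter({0: -1})         # -1

plus_result = p + q          # + drops any entry whose sum is not positive

updated = p.copy()
updated.update(q)            # update adds the values and keeps negative results

print(plus_result)           # Counter({1: 1}), the constant term is gone
print(updated[1], updated[0])  # 1 -2
```

So for polynomials with negative coefficients, update (or a plain dict) is the safer way to accumulate terms.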
To multiply polynomials, you multiply each term from one polynomial pairwise with every term from the other. And furthermore, to multiply terms, you add exponents and multiply coefficients.
import collections
import itertools
def multpoly(p, q):
    r = collections.Counter()
    for (c1, e1), (c2, e2) in itertools.product(p, q):
        r[e1 + e2] += c1 * c2
    return counter_to_poly(r)
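To check the approach against the test cases from the question, here is a self-contained sketch that uses a plain defaultdict in place of Counter. The names to_pairs, add_terms, and mul_terms are hypothetical and chosen only so the block runs on its own.

```python
from collections import defaultdict
from itertools import product

def to_pairs(coeff_by_exp):
    """Turn {exponent: coefficient} into (coefficient, exponent) pairs, highest exponent first."""
    return [(c, e) for e, c in sorted(coeff_by_exp.items(), reverse=True) if c != 0]

def add_terms(p, q):
    acc = defaultdict(int)
    for coeff, exp in p + q:          # like exponents accumulate in the same slot
        acc[exp] += coeff
    return to_pairs(acc)

def mul_terms(p, q):
    acc = defaultdict(int)
    for (c1, e1), (c2, e2) in product(p, q):
        acc[e1 + e2] += c1 * c2       # multiply coefficients, add exponents
    return to_pairs(acc)

print(add_terms([(4, 3), (3, 0)], [(-4, 3), (2, 1)]))           # [(2, 1), (3, 0)]
print(add_terms([(2, 1)], [(-2, 1)]))                           # []
print(mul_terms([(1, 1), (-1, 0)], [(1, 2), (1, 1), (1, 0)]))   # [(1, 3), (-1, 0)]
```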
This Python code worked for me; I hope it works for you too.
Addition function:
def addpoly(p1,p2):
    i=0
    su=0
    j=0
    c=[]
    if len(p1)==0:
        #if p1 empty
        return p2
    if len(p2)==0:
        #if p2 is empty
        return p1
    while i<len(p1) and j<len(p2):
        if int(p1[i][1])==int(p2[j][1]):
            su=p1[i][0]+p2[j][0]
            if su !=0:
                c.append((su,p1[i][1]))
            i=i+1
            j=j+1
        elif p1[i][1]>p2[j][1]:
            c.append((p1[i]))
            i=i+1
        elif p1[i][1]<p2[j][1]:
            c.append((p2[j]))
            j=j+1
    if p1[i:]!=[]:
        for k in p1[i:]:
            c.append(k)
    if p2[j:]!=[]:
        for k in p2[j:]:
            c.append(k)
    return c
Multiplication function:
def multipoly(p1,p2):
    p=[]
    s=0
    for i in p1:
        c=[]
        for j in p2:
            s=i[0]*j[0]
            e=i[1]+j[1]
            c.append((s,e))
        p=addpoly(c,p)
    return p
I have come up with a solution but I'm unsure that it's optimized!
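def addpoly(p1, p2):
    # work on copies so the caller's lists are not modified
    p1, p2 = list(p1), list(p2)
    for i in range(len(p1)):
        for item in p2[:]:  # iterate over a copy, since p2 shrinks inside the loop
            if p1[i][1] == item[1]:
                p1[i] = ((p1[i][0] + item[0]), p1[i][1])
                p2.remove(item)
    # drop terms whose coefficients cancelled to zero
    p3 = [item for item in p1 + p2 if item[0] != 0]
    # order by exponent, highest first, as in the question
    return sorted(p3, key=lambda item: item[1], reverse=True)
and the second one:
def multpoly(p1, p2):
    result = []
    for c1, e1 in p1:
        # multiply one term of p1 by every term of p2, then add the partial product
        partial = [(c1 * c2, e1 + e2) for c2, e2 in p2]
        result = addpoly(result, partial)
    return result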

compare 2 strings for common substring

I wish to find the longest common substring of two given strings recursively. I have written this code, but it is too inefficient. Is there a way I can do it in O(m*n), where m and n are the respective lengths of the strings? Here's my code:
def lcs(x,y):
    if len(x)==0 or len(y)==0:
        return ""  # empty string, so no stray space ends up in the result
    if x[0]==y[0]:
        return x[0] + lcs(x[1:],y[1:])
    t1 = lcs(x[1:],y)
    t2 = lcs(x,y[1:])
    if len(t1)>len(t2):
        return t1
    else:
        return t2
x = str(input('enter string1:'))
y = str(input('enter string2:'))
print(lcs(x,y))
You need to memoize your recursion. Without that, you will end up with an exponential number of calls since you will be repeatedly solving the same problem over and over again. To make the memoized lookups more efficient, you can define your recursion in terms of the suffix lengths, instead of the actual suffixes.
You can also find the pseudocode for the DP on Wikipedia.
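A minimal sketch of that suggestion, assuming the recurrence from the question (which computes a longest common subsequence) and memoizing on start indices rather than on sliced strings. The names lcs_memo and length are made up for illustration.

```python
from functools import lru_cache

def lcs_memo(x, y):
    """Return one longest common subsequence of x and y."""
    @lru_cache(maxsize=None)
    def length(i, j):
        # length of the LCS of the suffixes x[i:] and y[j:]
        if i == len(x) or j == len(y):
            return 0
        if x[i] == y[j]:
            return 1 + length(i + 1, j + 1)
        return max(length(i + 1, j), length(i, j + 1))

    # walk the memoized table once to rebuild one optimal answer
    out = []
    i = j = 0
    while i < len(x) and j < len(y):
        if x[i] == y[j]:
            out.append(x[i])
            i += 1
            j += 1
        elif length(i + 1, j) >= length(i, j + 1):
            i += 1
        else:
            j += 1
    return ''.join(out)

print(lcs_memo('AGGTAB', 'GXTXAYB'))  # GTAB
```

There are at most (m + 1) * (n + 1) distinct (i, j) pairs, so the memoized recursion does O(m*n) work. For long inputs the recursion depth can exceed Python's default limit, in which case a bottom-up table is the usual alternative.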
Here is a naive non-recursive solution which uses the powerset() recipe from itertools:
from itertools import chain, combinations, product
def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))
def naive_lcs(a, b):
    return ''.join(max(set(powerset(a)) & set(powerset(b)), key=len))
It has problems:
>>> naive_lcs('ab', 'ba')
'b'
>>> naive_lcs('ba', 'ab')
'b'
There can be more than one solution for some pairs of strings, but my program picks one arbitrarily.
Also, since any of the combinations can be the longest common one, and since calculating these combinations takes O(2 ^ n) time, this solution doesn't run in O(n * m) time. With dynamic programming and memoization, on the other hand, we can find a solution that, in theory, should perform better:
from functools import lru_cache
@lru_cache()
def _dynamic_lcs(xs, ys):
    if not (xs and ys):
        return set(['']), 0
    elif xs[-1] == ys[-1]:
        result, rlen = _dynamic_lcs(xs[:-1], ys[:-1])
        return set(each + xs[-1] for each in result), rlen + 1
    else:
        xlcs, xlen = _dynamic_lcs(xs, ys[:-1])
        ylcs, ylen = _dynamic_lcs(xs[:-1], ys)
        if xlen > ylen:
            return xlcs, xlen
        elif xlen < ylen:
            return ylcs, ylen
        else:
            return xlcs | ylcs, xlen
def dynamic_lcs(xs, ys):
    result, _ = _dynamic_lcs(xs, ys)
    return result
if __name__ == '__main__':
    seqs = list(powerset('abcde'))
    for a, b in product(seqs, repeat=2):
        assert naive_lcs(a, b) in dynamic_lcs(a, b)
dynamic_lcs() also solves the problem that some pairs of strings can have multiple longest common subsequences. The result is the set of these, instead of one string. Finding the set of all common subsequences, though, is still of exponential complexity.
Thanks to Pradhan for reminding me of Dynamic Programming and memoization.

Evaluating Polynomial coefficients

I'm trying to write a function that takes as input a list of coefficients (a0, a1, a2, ..., an) of a polynomial p(x) and the value x. The function will return p(x), which is the value of the polynomial when evaluated at x.
A polynomial of degree n with coefficients a0, a1, a2, ..., an is the function
p(x) = a0 + a1*x + a2*x^2 + a3*x^3 + ... + an*x^n
So I'm not sure how to attack the problem. I'm thinking that I will need a range but how can I make it so that it can handle any numerical input for x? I'm not expecting you guys to give the answer, I'm just in need of a little kick start. Do I need a for loop, while loop or could recursive be an option here?
def poly(lst, x)
I need to iterate over the items in the list, do I use the indices for that, but how can I make it iterate over an unknown number of items?
I'm thinking I can use recursion here:
def poly(lst, x):
    n = len(lst)
    if n==4:
        return lst[0]+lst[1]*x+lst[2]*x**2+lst[3]*x**3
    elif n==3:
        return lst[0]+lst[1]*x+lst[2]*x**2
    elif n==2:
        return lst[0]+lst[1]*x
    elif n==1:
        return lst[0]
    else:
        return lst[0]+lst[1]*x+lst[2]*x**2+lst[3]*x**3+lst[n]*x**n
This works for n<=4, but I get an "IndexError: list index out of range" for n>4, and I can't see why.
The most efficient way is to evaluate the polynomial backwards using Horner's Rule. Very easy to do in Python:
# Evaluate a polynomial in reverse order using Horner's Rule,
# for example: a3*x^3+a2*x^2+a1*x+a0 = ((a3*x+a2)x+a1)x+a0
def poly(lst, x):
    total = 0
    for a in reversed(lst):
        total = total*x+a
    return total
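As a quick sanity check of Horner's Rule, the sketch below compares it with the direct term-by-term sum on one example. The function names horner and direct are made up here so that both versions can live side by side.

```python
def horner(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x^n with one multiplication per coefficient."""
    total = 0
    for a in reversed(coeffs):   # start from the highest-degree coefficient
        total = total * x + a
    return total

def direct(coeffs, x):
    """Evaluate the same polynomial by summing each term a_k * x^k."""
    return sum(a * x**k for k, a in enumerate(coeffs))

coeffs = [7, 1, 2, 3]            # 3x^3 + 2x^2 + x + 7
print(horner(coeffs, 5), direct(coeffs, 5))  # 437 437
```

For x = 5 the running total in horner goes 3, 17, 86, 437, which matches ((3*5 + 2)*5 + 1)*5 + 7.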
simple:
def poly(lst, x):
    n, tmp = 0, 0
    for a in lst:
        tmp = tmp + (a * (x**n))
        n += 1
    return tmp
print(poly([1,2,3], 2))
simple recursion:
def poly(lst, x, i = 0):
    try:
        tmp = lst.pop(0)  # note: this consumes the caller's list
    except IndexError:
        return 0
    return tmp * (x ** (i)) + poly(lst, x, i+1)
print(poly([1,2,3], 2))
def evalPoly(lst, x):
    total = 0
    for power, coeff in enumerate(lst): # starts at 0 by default
        total += (x**power) * coeff
    return total
Alternatively, you can use a list and then use sum:
def evalPoly(lst, x):
    total = []
    for power, coeff in enumerate(lst):
        total.append((x**power) * coeff)
    return sum(total)
Without enumerate:
def evalPoly(lst, x):
    total, power = 0, 0
    for coeff in lst:
        total += (x**power) * coeff
        power += 1
    return total
Alternative to non-enumerate method:
def evalPoly(lst, x):
    total = 0
    for power in range(len(lst)):
        total += (x**power) * lst[power] # lst[power] is the coefficient
    return total
Also, as @DSM stated, you can put this together in a single line:
def evalPoly(lst, x):
    return sum((x**power) * coeff for power, coeff in enumerate(lst))
Or, using lambda:
evalPoly = lambda lst, x: sum((x**power) * coeff for power, coeff in enumerate(lst))
Recursive solution:
def evalPoly(lst, x, power = 0):
    # stop at the last valid index, len(lst) - 1, to avoid an IndexError
    if power == len(lst) - 1: return (x**power) * lst[power]
    return ((x**power) * lst[power]) + evalPoly(lst, x, power + 1)
enumerate(iterable, start) returns an iterator (it behaves like a generator, using yield instead of return) that yields a number together with an element of the iterable. The number is equivalent to the index of the element + start.
From the Python docs, it is also the same as:
def enumerate(sequence, start=0):
    n = start
    for elem in sequence:
        yield n, elem
        n += 1
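A short usage sketch of the built-in enumerate, with a made-up coefficient list, showing the pairs it produces with and without a start value:

```python
coeffs = [7, 1, 2]

from_zero = list(enumerate(coeffs))           # counting starts at 0 by default
from_one = list(enumerate(coeffs, start=1))   # counting starts at 1

print(from_zero)  # [(0, 7), (1, 1), (2, 2)]
print(from_one)   # [(1, 7), (2, 1), (3, 2)]
```

For evaluating a polynomial the default start of 0 is the one you want, since the first coefficient belongs to x**0.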
Either with recursion or without, the essence of the solution is to loop over "n", because the polynomial starts at x^0 and goes up to a_n*x^n, so the degree is a variable you should also treat as an input. Besides that, use a technique called multiply-and-accumulate to compute partial results on each loop iteration.
def evalPoly(lst, x, power):
    if power == 0:
        return lst[power]
    return ((x**power) * lst[power]) + evalPoly(lst, x, power - 1)
lst = [7, 1, 2, 3]
x = 5
print(evalPoly(lst, x, 3))
The polynomial being evaluated is 3x^3 + 2x^2 + x + 7.
When x = 5, the result is 437.

Checking if ranges cross paths

I wrote the following method to check if a list of ranges cross paths. Another way of saying this is that the ranges are not nested.
def check_ranges(lst):
    for i in range(len(lst)):
        for j in range(i+1,len(lst)):
            # (a,b) and (x,y) are being compared
            a = lst[i][0]
            b = lst[i][1]
            x = lst[j][0]
            y = lst[j][1]
            #both of these conditions mean that they cross
            if x < a and b > y:
                return True
            if x > a and b < y:
                return True
    return False
The first call should return False and the second True.
check_ranges([(7,16),(6,17),(5,18),(4,19)])
check_ranges([(5,16),(6,17),(5,18),(4,19)])
It works as it is now, but it seems really inefficient. Does anyone know if this is a common problem or if there is a more efficient solution?
You could sort, which will put at least the starting points in sorted order. Then you only really need to check the endpoint against the previous entry; it should be smaller:
from itertools import islice
def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    " s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ... "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result
def check_ranges(lst):
    return any(a[1] < b[1] for a, b in window(sorted(lst)))
I'm using the window example tool from an older itertools documentation page here to create the sliding window.
This implementation returns:
>>> def check_ranges(lst):
...     return any(a[1] < b[1] for a, b in window(sorted(lst)))
...
>>> check_ranges([(7,16),(6,17),(5,18),(4,19)])
False
>>> check_ranges([(5,16),(6,17),(5,18),(4,19)])
True
It is not entirely clear if matching end points would be a problem or not; if they are not, then you could change the < to a <= test instead.
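If the general window helper feels like too much for pairs only, the same adjacent-pair comparison can be sketched with zip over the sorted list and its tail. The name check_ranges_sorted is hypothetical, and the logic is assumed to be the same as in the sorted approach above.

```python
def check_ranges_sorted(lst):
    """Sort by start point, then compare each end point with the previous one."""
    ordered = sorted(lst)
    # zip pairs every range with its successor in sorted order
    return any(prev[1] < cur[1] for prev, cur in zip(ordered, ordered[1:]))

print(check_ranges_sorted([(7, 16), (6, 17), (5, 18), (4, 19)]))  # False
print(check_ranges_sorted([(5, 16), (6, 17), (5, 18), (4, 19)]))  # True
```

Sorting costs O(n log n) and the scan is linear, compared with the O(n^2) pairwise loop in the question.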
I'm not sure about the algorithm you are using to detect "crossover", but you could simplify your code using a generator expression and any:
def check_ranges(lst):
    return any((x < a and b > y) or (x > a and b < y)
               for i, (a, b) in enumerate(lst)
               for (x, y) in lst[i+1:])
