Iterative merge sort? - python

I am aware of the classical recursive approach to sorting by merging.
It yields O(n * log(n)) complexity, which can be shown more or less easily via a recurrence relation.
I've tried to reimplement merge sort in an iterative fashion:
def atomize(l):
    return list(
        map(
            lambda x: [x],
            l if l is not None else []
        )
    )

def merge(l, r):
    res = []
    while (len(l) + len(r)) > 0:
        if len(l) < 1:
            res += r
            r = []
        elif len(r) < 1:
            res += l
            l = []
        else:
            if l[0] <= r[0]:
                res.append(l.pop(0))
            else:
                res.append(r.pop(0))
    return res

def iter_merge_sort(l):
    atoms = atomize(l)  # O(n)
    while len(atoms) > 1:  # O(n - 1)
        atoms.append(merge(atoms.pop(0), atoms.pop(0)))
    return atoms[0]
...and it feels like I am mistaken somewhere, yet I fail to spot the exact place. Recursive merge sort splits the problem until the list of unsorted values reduces to a list of singletons - single elements that can be compared. That's what atomize(...) does: given a list, it produces a list of singleton lists (O(n)).
Obviously, merge(...) is O(n) as well: ignore for the moment that no linked lists are used for concatenation; that's not important here.
Finally, the while block in iter_merge_sort(...) itself takes exactly n - 1 repetitions, each of which costs at most O(n). Hence, I took an O(n * log(n)) algorithm and "improved" it to (n - 1) * n ~ O(n * n). Where is my mistake?

Your algorithm is entirely correct. The problem is that you're using list.pop(0) to dequeue, which costs O(n) in Python, since all items after the popped item have to be shifted to the preceding positions.
You can use collections.deque in place of list so that you can use deque.popleft, which costs O(1):
from collections import deque

def atomize(l):
    return deque(
        map(
            lambda x: deque([x]),
            l if l is not None else []
        )
    )

def merge(l, r):
    res = deque()
    while (len(l) + len(r)) > 0:
        if len(l) < 1:
            res += r
            r = deque()
        elif len(r) < 1:
            res += l
            l = deque()
        else:
            if l[0] <= r[0]:
                res.append(l.popleft())
            else:
                res.append(r.popleft())
    return res

def iter_merge_sort(l):
    atoms = atomize(l)  # O(n)
    while len(atoms) > 1:  # O(n - 1)
        atoms.append(merge(atoms.popleft(), atoms.popleft()))
    return list(atoms[0])
so that:
iter_merge_sort([3,5,1,6,2,1])
returns:
[1, 1, 2, 3, 5, 6]
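To get a feel for the difference, here is a rough timing sketch. This is purely illustrative: list_merge_sort is assumed to be the original pop(0)-based version, iter_merge_sort the deque version above, and the input size is arbitrary.
from random import random
from timeit import timeit

# Illustrative benchmark only: list_merge_sort / iter_merge_sort are assumed
# to be the pop(0)-based and popleft-based versions defined above.
data = [random() for _ in range(50000)]
print("list.pop(0):  ", timeit(lambda: list_merge_sort(list(data)), number=1))
print("deque.popleft:", timeit(lambda: iter_merge_sort(list(data)), number=1))
The exact numbers depend on the machine; the point is that the deque version scales like n * log(n) while the pop(0) version degrades towards n^2.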

Related

Variation of finding edit distance with only insertions and deletions?

I need to find the edit distance between a word and its sorted word (ex: apple and aelpp), using only insertions and deletions recursively.
I have found some sources that used insertions, deletions, and substitutions, but I am not sure how to only use insertion and deletion.
This is the code I found:
def ld(s, t):
    if not s: return len(t)
    if not t: return len(s)
    if s[0] == t[0]: return ld(s[1:], t[1:])
    l1 = ld(s, t[1:])
    l2 = ld(s[1:], t)
    l3 = ld(s[1:], t[1:])
    return 1 + min(l1, l2, l3)
What edits would need to be made to only find the number of insertions and deletions?
Remove l3, which accounts for substitutions, like so:
def ld2(s, t):
    if not s: return len(t)
    if not t: return len(s)
    if s[0] == t[0]: return ld2(s[1:], t[1:])
    l1 = ld2(s, t[1:])
    l2 = ld2(s[1:], t)
    return 1 + min(l1, l2)
You can see that ld('apple', 'applx') is equal to 1, while ld2 with the same parameters evaluates to 2.
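Note that this plain recursion recomputes many subproblems and gets slow for longer words. One way to speed it up (my own addition, not part of the answer above) is to memoize on the remaining suffixes with functools.lru_cache:
from functools import lru_cache

@lru_cache(maxsize=None)
def ld2_memo(s, t):
    # Same insertions/deletions-only recursion as ld2, but cached, so each
    # (suffix of s, suffix of t) pair is computed at most once.
    if not s: return len(t)
    if not t: return len(s)
    if s[0] == t[0]: return ld2_memo(s[1:], t[1:])
    return 1 + min(ld2_memo(s, t[1:]), ld2_memo(s[1:], t))
With this, ld2_memo('apple', 'aelpp') still comes out to 4; it just gets there much faster on longer inputs.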

compare 2 strings for common substring

I wish to find the longest common substring of 2 given strings recursively. I have written this code, but it is too inefficient. Is there a way I can do it in O(m*n), where m and n are the respective lengths of the strings? Here's my code:
def lcs(x, y):
    if len(x) == 0 or len(y) == 0:
        return ""
    if x[0] == y[0]:
        return x[0] + lcs(x[1:], y[1:])
    t1 = lcs(x[1:], y)
    t2 = lcs(x, y[1:])
    if len(t1) > len(t2):
        return t1
    else:
        return t2

x = str(input('enter string1:'))
y = str(input('enter string2:'))
print(lcs(x, y))
You need to memoize your recursion. Without that, you will end up with an exponential number of calls since you will be repeatedly solving the same problem over and over again. To make the memoized lookups more efficient, you can define your recursion in terms of the suffix lengths, instead of the actual suffixes.
You can also find the pseudocode for the DP on Wikipedia.
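For illustration (a sketch of mine, not code from the original answer), a memoized version keyed on suffix positions looks like this; it brings the recursion down to O(m*n) distinct subproblems:
from functools import lru_cache

def lcs_memo(x, y):
    @lru_cache(maxsize=None)
    def go(i, j):
        # go(i, j) is the longest common subsequence of x[i:] and y[j:]
        if i == len(x) or j == len(y):
            return ""
        if x[i] == y[j]:
            return x[i] + go(i + 1, j + 1)
        a, b = go(i + 1, j), go(i, j + 1)
        return a if len(a) > len(b) else b
    return go(0, 0)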
Here is a naive non-recursive solution which uses the powerset() recipe from itertools:
from itertools import chain, combinations, product

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def naive_lcs(a, b):
    return ''.join(max(set(powerset(a)) & set(powerset(b)), key=len))
It has problems:
>>> naive_lcs('ab', 'ba')
'b'
>>> naive_lcs('ba', 'ab')
'b'
There can be more than one solution for some pairs of strings, but my program picks one arbitrarily.
Also, since any of the combinations can be the longest common one, and since calculating these combinations takes O(2 ^ n) time, this solution doesn't compute in O(n * m) time. With Dynamic Programming and memoizing OTOH we can find a solution that, in theory, should perform better:
from functools import lru_cache

@lru_cache()
def _dynamic_lcs(xs, ys):
    if not (xs and ys):
        return set(['']), 0
    elif xs[-1] == ys[-1]:
        result, rlen = _dynamic_lcs(xs[:-1], ys[:-1])
        return set(each + xs[-1] for each in result), rlen + 1
    else:
        xlcs, xlen = _dynamic_lcs(xs, ys[:-1])
        ylcs, ylen = _dynamic_lcs(xs[:-1], ys)
        if xlen > ylen:
            return xlcs, xlen
        elif xlen < ylen:
            return ylcs, ylen
        else:
            return xlcs | ylcs, xlen

def dynamic_lcs(xs, ys):
    result, _ = _dynamic_lcs(xs, ys)
    return result

if __name__ == '__main__':
    seqs = list(powerset('abcde'))
    for a, b in product(seqs, repeat=2):
        assert naive_lcs(a, b) in dynamic_lcs(a, b)
dynamic_lcs() also solves the problem that some pairs of strings can have multiple longest common sub-sequences. The result is the set of these, instead of one string. Finding the set of all common sub-sequences, though, is still of exponential complexity.
Thanks to Pradhan for reminding me of Dynamic Programming and memoization.

Evaluating Polynomial coefficients

I'm trying to write a function that takes as input a list of coefficients (a0, a1, a2, a3, ..., an) of a polynomial p(x) and the value x. The function will return p(x), which is the value of the polynomial when evaluated at x.
A polynomial of degree n with coefficient a0, a1, a2, a3........an is the function
p(x)= a0+a1*x+a2*x^2+a3*x^3+.....+an*x^n
So I'm not sure how to attack the problem. I'm thinking that I will need a range, but how can I make it so that it can handle any numerical input for x? I'm not expecting you guys to give the answer, I just need a little kick start. Do I need a for loop, a while loop, or could recursion be an option here?
def poly(lst, x):
I need to iterate over the items in the list. Do I use the indices for that? And how can I make it iterate over an unknown number of items?
I'm thinking I can use recursion here:
def poly(lst, x):
    n = len(lst)
    if n == 4:
        return lst[0] + lst[1]*x + lst[2]*x**2 + lst[3]*x**3
    elif n == 3:
        return lst[0] + lst[1]*x + lst[2]*x**2
    elif n == 2:
        return lst[0] + lst[1]*x
    elif n == 1:
        return lst[0]
    else:
        return lst[0] + lst[1]*x + lst[2]*x**2 + lst[3]*x**3 + lst[n]*x**n
This works for n <= 4, but I get an index error (list index out of range) for n > 4; I can't see why, though.
The most efficient way is to evaluate the polynomial backwards using Horner's Rule. Very easy to do in Python:
# Evaluate a polynomial in reverse order using Horner's Rule,
# for example: a3*x^3 + a2*x^2 + a1*x + a0 = ((a3*x + a2)*x + a1)*x + a0
def poly(lst, x):
    total = 0
    for a in reversed(lst):
        total = total*x + a
    return total
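As a quick illustration (my own check), evaluating 3x^3 + 2x^2 + x + 7 at x = 5, the same example used further down, gives 437:
print(poly([7, 1, 2, 3], 5))  # ((3*5 + 2)*5 + 1)*5 + 7 = 437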
simple:
def poly(lst, x):
    n, tmp = 0, 0
    for a in lst:
        tmp = tmp + (a * (x**n))
        n += 1
    return tmp

print poly([1,2,3], 2)
simple recursion:
def poly(lst, x, i=0):
    try:
        tmp = lst.pop(0)
    except IndexError:
        return 0
    return tmp * (x ** i) + poly(lst, x, i + 1)

print poly([1,2,3], 2)
def evalPoly(lst, x):
    total = 0
    for power, coeff in enumerate(lst):  # power starts at 0 by default
        total += (x**power) * coeff
    return total
Alternatively, you can use a list and then use sum:
def evalPoly(lst, x):
    total = []
    for power, coeff in enumerate(lst):
        total.append((x**power) * coeff)
    return sum(total)
Without enumerate:
def evalPoly(lst, x):
    total, power = 0, 0
    for coeff in lst:
        total += (x**power) * coeff
        power += 1
    return total
Alternative to non-enumerate method:
def evalPoly(lst, x):
    total = 0
    for power in range(len(lst)):
        total += (x**power) * lst[power]  # lst[power] is the coefficient
    return total
Also, as @DSM stated, you can put this together in a single line:
def evalPoly(lst, x):
    return sum((x**power) * coeff for power, coeff in enumerate(lst))
Or, using lambda:
evalPoly = lambda lst, x: sum((x**power) * coeff for power, coeff in enumerate(lst))
Recursive solution:
def evalPoly(lst, x, power=0):
    if power == len(lst) - 1:  # base case: last coefficient
        return (x**power) * lst[power]
    return ((x**power) * lst[power]) + evalPoly(lst, x, power + 1)
enumerate(iterable, start) is a generator (so it uses yield instead of return) that yields a number and then an element of the iterable. The number is equal to the index of the element plus start.
From the Python docs, it is also the same as:
def enumerate(sequence, start=0):
    n = start
    for elem in sequence:
        yield n, elem
        n += 1
Either with recursion or without, the essence of the solution is to loop over n, because the polynomial starts at x^0 and goes up to an*x^n, and n is therefore the variable you should also consider as an input. Besides that, use a trick called multiply-and-accumulate to build up the partial result on each loop iteration.
def evalPoly(lst, x, power):
    if power == 0:
        return lst[power]
    return ((x**power) * lst[power]) + evalPoly(lst, x, power - 1)

lst = [7, 1, 2, 3]
x = 5
print(evalPoly(lst, x, 3))
The polynomial being evaluated is 3x^3 + 2x^2 + x + 7; when x = 5, the result is 437.

Checking if ranges cross paths

I wrote the following method to check if a list of ranges cross paths. Another way of saying this is that the ranges are not nested.
def check_ranges(lst):
    for i in range(len(lst)):
        for j in range(i + 1, len(lst)):
            # (a,b) and (x,y) are being compared
            a = lst[i][0]
            b = lst[i][1]
            x = lst[j][0]
            y = lst[j][1]
            # both of these conditions mean that they cross
            if x < a and b > y:
                return True
            if x > a and b < y:
                return True
    return False
The first should return false and the second true.
check_ranges([(7,16),(6,17),(5,18),(4,19)])
check_ranges([(5,16),(6,17),(5,18),(4,19)])
It works as it is now, but it seems really inefficient. Does anyone know if this is a common problem, or if there is a more efficient solution?
You could sort, which will put at least the starting points in sorted order. Then you only really need to check the endpoint against the previous entry; it should be smaller:
from itertools import islice

def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

def check_ranges(lst):
    return any(a[1] < b[1] for a, b in window(sorted(lst)))
I'm using the window example tool from an older itertools documentation page here to create the sliding window.
This implementation returns:
>>> def check_ranges(lst):
...     return any(a[1] < b[1] for a, b in window(sorted(lst)))
...
>>> check_ranges([(7,16),(6,17),(5,18),(4,19)])
False
>>> check_ranges([(5,16),(6,17),(5,18),(4,19)])
True
It is not entirely clear if matching end points would be a problem or not; if they are not, then you could change the < to a <= test instead.
I'm not sure about the algorithm which you are using to detect "crossover", but you could simplify your code using a comprehension and any (the conditions mirror the two tests in your loops):
return any((x < a and b > y) or (x > a and b < y)
           for i, (a, b) in enumerate(lst)
           for (x, y) in lst[i+1:])
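For instance, wrapping that in the same check_ranges signature (my own quick check against the two example lists from the question):
def check_ranges(lst):
    return any((x < a and b > y) or (x > a and b < y)
               for i, (a, b) in enumerate(lst)
               for (x, y) in lst[i+1:])

print(check_ranges([(7,16),(6,17),(5,18),(4,19)]))  # False
print(check_ranges([(5,16),(6,17),(5,18),(4,19)]))  # True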

What is the best way to get all the divisors of a number?

Here's the very dumb way:
def divisorGenerator(n):
    for i in xrange(1, n/2 + 1):
        if n % i == 0:
            yield i
    yield n
The result I'd like to get is similar to this one, but I'd like a smarter algorithm (this one is way too slow and dumb :-)
I can find prime factors and their multiplicity fast enough.
I've got a generator that generates factors in this way:
(factor1, multiplicity1)
(factor2, multiplicity2)
(factor3, multiplicity3)
and so on...
i.e. the output of
for i in factorGenerator(100):
    print i
is:
(2, 2)
(5, 2)
I don't know how useful this is for what I want to do (I coded it for other problems); anyway, I'd like a smarter way to make
for i in divisorGen(100):
    print i
output this:
1
2
4
5
10
20
25
50
100
UPDATE: Many thanks to Greg Hewgill and his "smart way" :)
Calculating all divisors of 100000000 took 0.01s with his way against the 39s that the dumb way took on my machine, very cool :D
UPDATE 2: Stop saying this is a duplicate of this post. Calculating the number of divisors of a given number doesn't require calculating all the divisors. It's a different problem; if you think it's not, then look for "Divisor function" on Wikipedia. Read the question and the answers before posting; if you do not understand the topic, please don't add unhelpful answers that have already been given.
Given your factorGenerator function, here is a divisorGen that should work:
def divisorGen(n):
    factors = list(factorGenerator(n))
    nfactors = len(factors)
    f = [0] * nfactors
    while True:
        yield reduce(lambda x, y: x*y, [factors[x][0]**f[x] for x in range(nfactors)], 1)
        i = 0
        while True:
            f[i] += 1
            if f[i] <= factors[i][1]:
                break
            f[i] = 0
            i += 1
            if i >= nfactors:
                return
The overall efficiency of this algorithm will depend entirely on the efficiency of the factorGenerator.
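The question doesn't show factorGenerator itself, so just as an illustration (an assumed implementation, not the asker's), a simple trial-division version that yields (factor, multiplicity) pairs could look like this:
def factorGenerator(n):
    # Yields (prime, multiplicity) pairs by simple trial division.
    d = 2
    while d * d <= n:
        count = 0
        while n % d == 0:
            n //= d
            count += 1
        if count:
            yield d, count
        d += 1
    if n > 1:
        yield n, 1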
To expand on what Shimi has said, you should only be running your loop from 1 to the square root of n. Then to find the pair, do n / i, and this will cover the whole problem space.
As was also noted, this is an NP, or 'difficult', problem. Exhaustive search, the way you are doing it, is about as good as it gets for guaranteed answers. This fact is used by encryption algorithms and the like to help secure them. If someone were to solve this problem, most if not all of our current 'secure' communication would be rendered insecure.
Python code:
import math

def divisorGenerator(n):
    large_divisors = []
    for i in xrange(1, int(math.sqrt(n) + 1)):
        if n % i == 0:
            yield i
            if i*i != n:
                large_divisors.append(n / i)
    for divisor in reversed(large_divisors):
        yield divisor

print list(divisorGenerator(100))
Which should output a list like:
[1, 2, 4, 5, 10, 20, 25, 50, 100]
I think you can stop at math.sqrt(n) instead of n/2.
I will give you an example so that you can understand it easily. Now sqrt(28) is 5.29, so ceil(5.29) will be 6. So if I stop at 6, I can still get all the divisors. How?
First see the code and then see the example:
import math

def divisors(n):
    divs = [1]
    for i in xrange(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            divs.extend([i, n/i])
    divs.extend([n])
    return list(set(divs))
Now, walking through an example: let's say I have already added 1 to my divisors list and I start with i = 2. For n = 28 the loop runs up to 5, adding the pairs (2, 14) and (4, 7) along the way, and 28 itself is added at the end. So at the end of all the iterations, since I have added both the divisor and the quotient to my list, all the divisors of 28 are populated.
Source: How to determine the divisors of a number
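For example (my own quick check), with n = 28 this returns all six divisors:
print(sorted(divisors(28)))  # [1, 2, 4, 7, 14, 28]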
Although there are already many solutions to this, I really have to post this :)
This one is:
readable
short
self contained, copy & paste ready
quick (in cases with a lot of prime factors and divisors, > 10 times faster than the accepted solution)
python3, python2 and pypy compliant
Code:
def divisors(n):
    # get factors and their counts
    factors = {}
    nn = n
    i = 2
    while i*i <= nn:
        while nn % i == 0:
            factors[i] = factors.get(i, 0) + 1
            nn //= i
        i += 1
    if nn > 1:
        factors[nn] = factors.get(nn, 0) + 1

    primes = list(factors.keys())

    # generates factors from primes[k:] subset
    def generate(k):
        if k == len(primes):
            yield 1
        else:
            rest = generate(k+1)
            prime = primes[k]
            for factor in rest:
                prime_to_i = 1
                # prime_to_i iterates prime**i values, i being all possible exponents
                for _ in range(factors[prime] + 1):
                    yield factor * prime_to_i
                    prime_to_i *= prime

    # in python3, `yield from generate(0)` would also work
    for factor in generate(0):
        yield factor
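As a quick sanity check (my addition, not part of the original answer), sorting the generator's output for 100 reproduces the list from the question:
print(sorted(divisors(100)))  # [1, 2, 4, 5, 10, 20, 25, 50, 100]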
An illustrative Pythonic one-liner:
from itertools import chain
from math import sqrt

def divisors(n):
    return set(chain.from_iterable((i, n//i) for i in range(1, int(sqrt(n)) + 1) if n % i == 0))
But better yet, just use sympy:
from sympy import divisors
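For example (sympy's divisors returns the divisors as a sorted list):
from sympy import divisors
print(divisors(100))  # [1, 2, 4, 5, 10, 20, 25, 50, 100]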
I like Greg's solution, but I wish it were more Pythonic.
I feel it would be faster and more readable;
so after some time of coding I came out with this.
The first two functions are needed to make the cartesian product of lists.
And they can be reused whenever this problem arises.
By the way, I had to program this myself; if anyone knows of a standard solution for this problem, please feel free to contact me.
"factorGenerator" now returns a dictionary. The dictionary is then fed into "divisors", which uses it to generate first a list of lists, where each list is the list of factors of the form p^n with p prime.
Then we make the cartesian product of those lists, and we finally use Greg's solution to generate the divisors.
We sort them, and return them.
I tested it and it seems to be a bit faster than the previous version. I tested it as part of a bigger program, so I can't really say how much faster it is though.
Pietro Speroni (pietrosperoni dot it)
from math import sqrt

##############################################################
### cartesian product of lists ###############################
##############################################################
def appendEs2Sequences(sequences, es):
    result = []
    if not sequences:
        for e in es:
            result.append([e])
    else:
        for e in es:
            result += [seq + [e] for seq in sequences]
    return result

def cartesianproduct(lists):
    """
    given a list of lists,
    returns all the possible combinations taking one element from each list
    The lists do not have to be of equal length
    """
    return reduce(appendEs2Sequences, lists, [])

##############################################################
### prime factors of a natural ###############################
##############################################################
def primefactors(n):
    '''lists prime factors, from greatest to smallest'''
    i = 2
    while i <= sqrt(n):
        if n % i == 0:
            l = primefactors(n/i)
            l.append(i)
            return l
        i += 1
    return [n]  # n is prime

##############################################################
### factorization of a natural ###############################
##############################################################
def factorGenerator(n):
    p = primefactors(n)
    factors = {}
    for p1 in p:
        try:
            factors[p1] += 1
        except KeyError:
            factors[p1] = 1
    return factors

def divisors(n):
    factors = factorGenerator(n)
    divisors = []
    listexponents = [map(lambda x: k**x, range(0, factors[k]+1)) for k in factors.keys()]
    listfactors = cartesianproduct(listexponents)
    for f in listfactors:
        divisors.append(reduce(lambda x, y: x*y, f, 1))
    divisors.sort()
    return divisors

print divisors(60668796879)
P.S.
It is the first time I am posting to Stack Overflow.
I am looking forward to any feedback.
Here is a smart and fast way to do it for numbers up to and around 10**16 in pure Python 3.6:
from itertools import compress

def primes(n):
    """ Returns a list of primes < n for n > 2 """
    sieve = bytearray([True]) * (n//2)
    for i in range(3, int(n**0.5) + 1, 2):
        if sieve[i//2]:
            sieve[i*i//2::i] = bytearray((n - i*i - 1) // (2*i) + 1)
    return [2, *compress(range(3, n, 2), sieve[1:])]

def factorization(n):
    """ Returns a list of the prime factorization of n """
    pf = []
    for p in primeslist:
        if p*p > n: break
        count = 0
        while not n % p:
            n //= p
            count += 1
        if count > 0: pf.append((p, count))
    if n > 1: pf.append((n, 1))
    return pf

def divisors(n):
    """ Returns an unsorted list of the divisors of n """
    divs = [1]
    for p, e in factorization(n):
        divs += [x*p**k for k in range(1, e+1) for x in divs]
    return divs

n = 600851475143
primeslist = primes(int(n**0.5) + 1)
print(divisors(n))
If your PC has tons of memory, a brute single line can be fast enough with numpy:
import numpy as np
N = 10000000; tst = np.arange(1, N); tst[np.mod(N, tst) == 0]
Out:
array([ 1, 2, 4, 5, 8, 10, 16,
20, 25, 32, 40, 50, 64, 80,
100, 125, 128, 160, 200, 250, 320,
400, 500, 625, 640, 800, 1000, 1250,
1600, 2000, 2500, 3125, 3200, 4000, 5000,
6250, 8000, 10000, 12500, 15625, 16000, 20000,
25000, 31250, 40000, 50000, 62500, 78125, 80000,
100000, 125000, 156250, 200000, 250000, 312500, 400000,
500000, 625000, 1000000, 1250000, 2000000, 2500000, 5000000])
Takes less than 1s on my slow PC.
Adapted from CodeReview, here is a variant which works with num=1!
from itertools import product
from functools import reduce  # reduce is a builtin in Python 2
import operator

def prod(ls):
    return reduce(operator.mul, ls, 1)

def powered(factors, powers):
    return prod(f**p for (f, p) in zip(factors, powers))

def divisors(num):
    pf = dict(prime_factors(num))
    primes = pf.keys()
    # For each prime, possible exponents
    exponents = [range(i+1) for i in pf.values()]
    return (powered(primes, es) for es in product(*exponents))
Old question, but here is my take:
def divs(n, m):
    if m == 1: return [1]
    if n % m == 0: return [m] + divs(n, m - 1)
    return divs(n, m - 1)

You can proxy with:
def divisorGenerator(n):
    for x in reversed(divs(n, n)):
        yield x

NOTE: For languages that support it, this could be tail recursive.
Assuming that the factors function returns the factors of n (for instance, factors(60) returns the list [2, 2, 3, 5]), here is a function to compute the divisors of n:
function divisors(n)
    divs := [1]
    for fact in factors(n)
        temp := []
        for div in divs
            if fact * div not in divs
                append fact * div to temp
        divs := divs + temp
    return divs
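Translated into Python (my own sketch; here the prime factors are passed in explicitly rather than computed by a factors(n) function):
def divisors_from_factors(n, factors):
    # factors: prime factors of n with multiplicity, e.g. [2, 2, 3, 5] for 60
    divs = [1]
    for fact in factors:
        temp = []
        for div in divs:
            if fact * div not in divs:
                temp.append(fact * div)
        divs = divs + temp
    return divs

print(sorted(divisors_from_factors(60, [2, 2, 3, 5])))
# [1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60]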
Here's my solution. It seems to be dumb but works well... and I was trying to find all proper divisors, so the loop started from i = 2.
import math as m

def findfac(n):
    faclist = [1]
    for i in range(2, int(m.sqrt(n) + 2)):
        if n % i == 0:
            if i not in faclist:
                faclist.append(i)
            if n/i not in faclist:
                faclist.append(n/i)
    return faclist
If you only care about using list comprehensions and nothing else matters to you!
from itertools import combinations
from functools import reduce

def get_devisors(n):
    f = [f for f, e in list(factorGenerator(n)) for i in range(e)]
    fc = [x for l in range(len(f) + 1) for x in combinations(f, l)]
    devisors = [1 if c == () else reduce((lambda x, y: x * y), c) for c in set(fc)]
    return sorted(devisors)
My solution via a generator function is:
def divisor(num):
    for x in range(1, num + 1):
        if num % x == 0:
            yield x
    while True:
        yield None
Try calculating the square root of the given number and then looping over range(1, square_root + 1).
number = int(input("Enter a Number: "))
square_root = round(number ** (1.0 / 2))
print(square_root)
divisor_list = []
for i in range(1, square_root + 1):
    if number % i == 0:  # if mod returns 0, append both i and number/i to the list
        divisor_list.append(i)
        divisor_list.append(int(number / i))
print(divisor_list)
def divisorGen(n):
    v = n
    last = []
    for i in range(1, v + 1):
        if n % i == 0:
            last.append(i)
    return last
I don't understand why there are so many complicated solutions to this problem.
Here is my take on it:
import math

def divisors(n):
    lis = [1, n]  # 1 and n itself are always divisors
    s = math.ceil(math.sqrt(n))
    for g in range(s, 1, -1):
        if n % g == 0:
            lis.append(g)
            lis.append(int(n / g))
    return set(lis)
def divisors(n):
    return [x for x in range(1, n + 1) if n % x == 0]
