I want an infinite generator of all "pair-terms", where 0 is a pair-term and a tuple (a,b) of two pair-terms is a pair-term. It's only important that each term appears at least once (in a finite amount of time), but exactly once would be more efficient.
I came up with
def pairTerms():
    yield 0
    generated = []
    diagonal = -1  # sum of indices in generated of the pairs we are generating; could be replaced by len(generated)-1
    for t in pairTerms():
        generated.append(t)
        diagonal += 1
        for i, a in enumerate(generated):
            yield (a, generated[diagonal-i])
But this quickly fills up the memory.
EDIT: this approach actually seems to work well enough, generating over 10 million terms before filling up the memory.
Alternatively:
def pairTermsDepth(depth):
    yield 0
    if depth:
        for a in pairTermsDepth(depth-1):
            for b in pairTermsDepth(depth-1):
                yield (a, b)

def pairTerms():
    i = 0
    while True:
        for item in pairTermsDepth(i):
            i += 1
            yield item
But this has the disadvantage of re-listing all old terms each time a new while iteration is reached, and it eventually exhausts the stack.
Note: I didn't quite know how to tag this question, feel free to change them.
The following approach can find the first 100 million terms in half a minute on my computer (printing them will take longer), and the memory usage for generating the first N terms is O(sqrt(N)).
def pair_terms():
    yield 0
    # By delaying this recursion until after a yield, we avoid
    # an infinite recursive loop.
    generated = []
    generator = pair_terms()
    this = next(generator)
    while True:
        for j in range(len(generated)):
            yield (this, generated[j])
            yield (generated[j], this)
        yield (this, this)
        generated.append(this)
        this = next(generator)
The trick is that to produce the n'th term, I only need to keep a record of terms up to sqrt(n). I do that by having the generator call itself recursively. That seems like extra work, but since you're only making O(sqrt(n)) recursive calls, the overhead of the recursive calls is a rounding error compared to generating results.
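For example, a quick way to inspect a prefix of the stream is itertools.islice (a usage sketch, not part of the original code):

from itertools import islice

# Print the first 10 pair-terms; memory stays small because only the
# prefix of previously generated terms (roughly the square root of the
# count produced so far) is ever stored in `generated`.
for term in islice(pair_terms(), 10):
    print(term)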
If you care more about memory than speed you can also try listing them by length, as such:
def pairTermsLength(L):
    if L == 1:
        yield 0
    else:
        for k in range(1, L//2+1):
            for a in pairTermsLength(k):
                if L-k != k:
                    for b in pairTermsLength(L-k):
                        yield (a, b)
                        yield (b, a)
                else:
                    for b in pairTermsLength(L-k):
                        yield (a, b)

def pairTerms():
    L = 1
    while True:
        for p in pairTermsLength(L):
            yield p
        L += 1
This will use memory and recursion depth linear in the length (the number of 0's) of the longest pair-term generated so far. The number of pair-terms of a given length is the corresponding Catalan number, which grows exponentially with the length, so the memory consumption is O(log(n)) for the first n terms. To give you an idea, at a length of 30 you are already in 10^16 territory, which is probably far more than you have time for anyway, even with a faster algorithm.
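As a quick sanity check (my own sketch, not part of the answer), the per-length counts produced by pairTermsLength can be compared against the Catalan numbers 1, 1, 2, 5, 14, 42, ...:

def catalan(n):
    # n-th Catalan number via the recurrence C(0)=1, C(k+1)=C(k)*2*(2k+1)/(k+2)
    c = 1
    for k in range(n):
        c = c * 2 * (2 * k + 1) // (k + 2)
    return c

for L in range(1, 7):
    count = sum(1 for _ in pairTermsLength(L))
    print("%d %d %d" % (L, count, catalan(L - 1)))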
I've written a program to benchmark two ways of finding "the longest Collatz chain for integers less than some bound".
The first way is with "backtrack memoization" which keeps track of the current chain from start till hash table collision (in a stack) and then pops all the values into the hash table (with incrementing chain length values).
The second way is with simpler memoization that only memoizes the starting value of the chain.
To my surprise and confusion, the algorithm that memoizes the entirety of the sub-chain up until the first collision is consistently slower than the algorithm which only memoizes the starting value.
I'm wondering if this is due to one of the following factors:
Is Python really slow with stacks? Enough that it offsets the performance gains?
Is my code/algorithm bad?
Is it simply the case that, statistically, as integers grow large, the time spent revisiting the non-memoized elements of previously calculated Collatz chains/sub-chains is asymptotically minimal, to the point that any overhead from popping elements off a stack simply isn't worth the gains?
In short, I'm wondering if this unexpected result is due to the language, the code, or math (i.e. the statistics of Collatz).
import time
def results(backtrackMemoization, start, maxChainValue, collatzDict):
    print()
    print(("with " if backtrackMemoization else "without ") + "backtracking memoization")
    print("length of " + str(collatzDict[maxChainValue[0]]) + " found for n = " + str(maxChainValue[0]))
    print("computed in " + str(round(time.time() - start, 3)) + " seconds")

def collatz(backtrackMemoization, start, maxChainValue, collatzDict):
    for target in range(1, maxNum):
        n = target
        if (backtrackMemoization):
            stack = []
        else:
            length = 0
        while (n not in collatzDict):
            if (backtrackMemoization):
                stack.append(n)
            else:
                length = length + 1
            if (n % 2):
                n = 3 * n + 1
            else:
                n = n // 2
        if (backtrackMemoization):
            additionalLength = 1
            while (len(stack) > 0):
                collatzDict[stack.pop()] = collatzDict[n] + additionalLength
                additionalLength = additionalLength + 1
        else:
            collatzDict[target] = collatzDict[n] + length
        if (collatzDict[target] > collatzDict[maxChainValue[0]]):
            maxChainValue[0] = target

def benchmarkAlgo(maxNum, backtrackMemoization):
    start = time.time()
    maxChainValue = [1]
    collatzDict = {1: 0}
    collatz(backtrackMemoization, start, maxChainValue, collatzDict)
    results(backtrackMemoization, start, maxChainValue, collatzDict)

try:
    maxNum = int(input("enter upper bound> "))
    print("setting upper bound to " + str(maxNum))
except:
    maxNum = 100000
    print("defaulting upper bound to " + str(maxNum))

benchmarkAlgo(maxNum, True)
benchmarkAlgo(maxNum, False)
There is a tradeoff in your code. Without the backtrack memoization, dictionary lookups will miss about twice as many times as when you use it. For example, if maxNum = 1,000,000 then the number of missed dictionary lookups is
without backtrack memoization: 5,226,259
with backtrack memoization: 2,168,610
On the other hand, with backtrack memoization, you are constructing a much bigger dictionary since you are collecting lengths of chains not only for the target values, but also for any value that is encountered in the middle of a chain. Here is the final length of collatzDict for maxNum = 1,000,000:
without backtrack memoization: 999,999
with backtrack memoization: 2,168,611
There is a cost of writing to this dictionary that many more times, popping all these additional values from the stack, and so on. It seems that in the end, this cost outweighs the benefit of reducing dictionary lookup misses. In my tests, the code with backtrack memoization ran about 20% slower.
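For reference, here is a minimal sketch of how such lookup-miss counts can be gathered for the variant without backtrack memoization (my own instrumentation, not the exact code used for the numbers above):

def count_misses(maxNum):
    collatzDict = {1: 0}
    misses = 0
    for target in range(1, maxNum):
        n = target
        length = 0
        while n not in collatzDict:   # each loop entry is one missed lookup
            misses += 1
            length += 1
            n = 3 * n + 1 if n % 2 else n // 2
        collatzDict[target] = collatzDict[n] + length
    return misses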
It is possible to optimize backtrack memoization, to keep the dictionary lookup misses low while reducing the cost of constructing the dictionary:
Let the stack consist of tuples (n, i) where n is as in your code, and i is the length of the chain traversed up to this point (i.e. i is incremented at every iteration of the while loop). Such a tuple is put on the stack only if n < maxNum. In addition, keep track of how long the whole chain gets before you find a value that is already in the dictionary (i.e. of the total number of iterations of the while loop).
The information collected in this way will let you construct new dictionary entries from the tuples that were put on the stack.
The dictionary obtained in this way will be exactly the same as the one constructed without backtrack memoization, but it will be built in a more efficient way, since a key n will be added when it is first encountered. For this reason, dictionary lookup misses will be still much lower than without backtrack memoization. Here are the numbers of misses I obtained for maxNum = 1,000,000:
without backtrack memoization: 5,226,259
with backtrack memoization: 2,168,610
with optimized backtrack memoization: 2,355,035
For larger values of maxNum the optimized code should run faster than the code without backtrack memoization. In my tests it was about 25% faster for maxNum >= 1,000,000.
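A minimal sketch of this optimized version (a reconstruction of the description above; maxNum is passed in rather than read from a global, and variable names differ slightly from the code in the question):

def collatz_optimized(maxNum, collatzDict, maxChainValue):
    for target in range(1, maxNum):
        n = target
        stack = []        # tuples (value, steps taken before reaching it)
        steps = 0         # total iterations of the while loop
        while n not in collatzDict:
            if n < maxNum:
                stack.append((n, steps))
            steps += 1
            n = 3 * n + 1 if n % 2 else n // 2
        # collatzDict[n] is now known; a value pushed after i steps needs
        # (steps - i) more steps to reach n.
        base = collatzDict[n]
        for value, i in stack:
            collatzDict[value] = base + (steps - i)
        if collatzDict[target] > collatzDict[maxChainValue[0]]:
            maxChainValue[0] = target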
I looked it up online and know that list.pop() has O(1) time complexity but list.pop(i) has O(n) time complexity. While working on LeetCode problems, I've noticed that many people use pop(i) in a for loop and say it is O(n) time complexity, and in fact it is faster than my code, which uses only one loop but many lines inside that loop. I wonder why this happens, and whether I should use pop(i) instead of many lines to avoid it.
Example: Leetcode 26. Remove Duplicates from Sorted Array
My code: (faster than 75%)
class Solution(object):
    def removeDuplicates(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        left, right = 0, 0
        count = 1
        while right < len(nums)-1:
            if nums[right] == nums[right+1]:
                right += 1
            else:
                nums[left+1] = nums[right+1]
                left += 1
                right += 1
                count += 1
        return count
and other people's code, faster than 90% (this person doesn't claim O(n), but why is their O(n^2) solution faster than my O(n)?):
https://leetcode.com/problems/remove-duplicates-from-sorted-array/discuss/477370/python-3%3A-straight-forward-6-lines-solution-90-faster-100-less-memory
My optimized code (faster than 89%)
class Solution(object):
    def removeDuplicates(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        left, right = 0, 0
        while right < len(nums)-1:
            if nums[right] != nums[right+1]:
                nums[left+1] = nums[right+1]
                left += 1
            right += 1
        return left + 1
Your algorithm genuinely does take O(n) time and the "pop in reverse order" algorithm genuinely does take O(n²) time. However, LeetCode isn't reporting that your time complexity is better than 89% of submissions; it is reporting your actual running time is better than 89% of all submissions. The actual running time depends on what inputs the algorithm is tested with; not just the sizes but also the number of duplicates.
It also depends on how the running times across multiple test cases are averaged; if most of the test cases are for small inputs where the quadratic solution is faster, then the quadratic solution may come out ahead overall even though its time complexity is higher. @Heap Overflow also points out in the comments that the overhead time of LeetCode's judging system is proportionally large and quite variable compared to the time it takes for the algorithms to run, so the discrepancy could simply be due to random variation in that overhead.
To shed some light on this, I measured running times using timeit. Plotting my results gives exactly the shapes you'd expect from the time complexities, with the crossover point somewhere between n = 8000 and n = 9000 on my machine. This is based on sorted lists where each distinct element appears on average twice. The code I used to generate the times is given below.
Timing code:
def linear_solution(nums):
    left, right = 0, 0
    while right < len(nums)-1:
        if nums[right] != nums[right+1]:
            nums[left+1] = nums[right+1]
            left += 1
        right += 1
    return left + 1

def quadratic_solution(nums):
    prev_obj = []
    for i in range(len(nums)-1, -1, -1):
        if prev_obj == nums[i]:
            nums.pop(i)
        prev_obj = nums[i]
    return len(nums)

from random import randint
from timeit import timeit

def gen_list(n):
    max_n = n // 2
    return sorted(randint(0, max_n) for i in range(n))

# I used a step size of 1000 up to 15000, then a step size of 5000 up to 50000
step = 1000
max_n = 15000
reps = 100

print('n', 'linear time (ms)', 'quadratic time (ms)', sep='\t')
for n in range(step, max_n+1, step):
    # generate input lists
    lsts1 = [gen_list(n) for i in range(reps)]
    # copy the lists by value, since the algorithms will mutate them
    lsts2 = [list(g) for g in lsts1]
    # use iterators to supply the input lists one-by-one to timeit
    iter1 = iter(lsts1)
    iter2 = iter(lsts2)
    t1 = timeit(lambda: linear_solution(next(iter1)), number=reps)
    t2 = timeit(lambda: quadratic_solution(next(iter2)), number=reps)
    # timeit reports the total time in seconds across all reps
    print(n, 1000*t1/reps, 1000*t2/reps, sep='\t')
The conclusion is that your algorithm is indeed faster than the quadratic solution for large enough inputs, but the inputs LeetCode is using to measure running times are not "large enough" to overcome the variation in the judging overhead, and the fact that the average includes times measured on smaller inputs where the quadratic algorithm is faster.
Just because the solution is not O(n), you can't assume it to be O(n^2).
It doesn't quite become O(n^2), because he pops in reverse order, which makes each pop cheaper: pop(i) has to shift every element after index i, and by iterating from the back he keeps shrinking the tail that needs shifting, whereas popping in forward order would shift far more elements each time. Try the same solution in non-reverse order, run it a few times to make sure, and you'll see.
Anyway, regarding why his solution is faster: you have an if condition with a lot of variables, while he uses only one variable, prev_obj; the reverse order makes it possible to do it with just one variable. So the number of basic operations per iteration is higher in your case, and with the same O(n) complexity each of your n loop iterations is longer than his.
Just look at your count variable: in every iteration its value is left+1, so you could simply return left+1; removing it saves n executions of count = count + 1.
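To see how much the pop position matters, here is a rough micro-benchmark sketch (my own illustration, not from the linked solution):

from timeit import timeit

# pop() from the end is O(1); pop(0) shifts every remaining element,
# so repeatedly emptying a list from the front is quadratic overall.
print(timeit('while lst: lst.pop()', setup='lst = list(range(20000))', number=1))
print(timeit('while lst: lst.pop(0)', setup='lst = list(range(20000))', number=1))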
I just posted this solution and it is faster than 76% of submissions:
class Solution:
    def removeDuplicates(self, nums: List[int]) -> int:
        a = sorted(set(nums), key=lambda item: item)
        for i, v in enumerate(a):
            nums[i] = v
        return len(a)
and this one is faster than 90%:
class Solution:
    def removeDuplicates(self, nums: List[int]) -> int:
        a = {k: 1 for k in nums}  # <--- this is O(n)
        for i, v in enumerate(a.keys()):  # <--- this is another O(n), but the length is small, so O(m)
            nums[i] = v
        return len(a)
You could say both of them are more than O(n) if you look at the for loop, but since we are working with duplicate members, I am looping over only the reduced (unique) members while your code loops over all members. So if the time required to build that unique set/dict is less than the time you spend looping over the extra members and checking the if conditions, my solution can be faster.
I am fairly new to Python and I have been trying to find a fast way to find primes up to a given number.
When I use the Sieve of Eratosthenes with the following code:
#Finding primes till 40000.
import time
start = time.time()
def prime_eratosthenes(n):
    list = []
    prime_list = []
    for i in range(2, n+1):
        if i not in list:
            prime_list.append(i)
            for j in range(i*i, n+1, i):
                list.append(j)
    return prime_list
lists = prime_eratosthenes(40000)
print lists
end = time.time()
runtime = end - start
print "runtime =",runtime
Along with the list containing the primes, I get a line like the one below as output:
runtime = 20.4290001392
Depending upon the RAM being used etc., I consistently get a value within a range of ±0.5 seconds.
However when I try to find the primes till 40000 using a brute force method as in the following code:
import time
start = time.time()
prime_lists = []
for i in range(1,40000+1):
    for j in range(2,i):
        if i%j==0:
            break
    else:
        prime_lists.append(i)
print prime_lists
end = time.time()
runtime = end - start
print "runtime =",runtime
This time, along with the list of primes, I get a smaller value for the runtime:
runtime = 16.0729999542
The value only varies within a range of ±0.5 seconds.
Clearly, the sieve is slower than the brute force method.
I also observed that the difference between the runtimes in the two cases only increases with an increase in the value 'n' till which primes are to be found.
Can anyone give a logical explanation for this behavior? I expected the sieve to be more efficient than the brute force method, but it seems to be the other way around here.
While appending to a list is not the best way to implement this algorithm (the original algorithm uses fixed-size arrays), it is amortized constant time. I think the bigger issue is the membership test "i not in list", which takes linear time. The best change you can make for larger inputs is to have the outer for loop only check up to sqrt(n), which saves a lot of computation.
A better approach is to keep a boolean array which keeps track of striking off numbers, like what is seen in the Wikipedia article for the Sieve. This way, skipping numbers is constant time since it's an array access.
For example:
def sieve(n):
    nums = [0] * n
    for i in range(2, int(n**0.5)+1):
        if nums[i] == 0:
            for j in range(i*i, n, i):
                nums[j] = 1
    return [i for i in range(2, n) if nums[i] == 0]
So to answer your question, your two for loops make the algorithm do potentially O(n^2) work, while being smart about the outer for loop makes the new algorithm take up to O(n sqrt(n)) time (in practice, for reasonably-sized n, the runtime is closer to O(n))
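For example, a quick timing of the function above on the same bound used in the question (a hypothetical check; numbers will vary by machine):

import time

start = time.time()
primes = sieve(40000)
print("%d primes found in %.4f seconds" % (len(primes), time.time() - start))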
I watched the talk Three Beautiful Quicksorts and was messing around with quicksort. My implementation in Python was very similar to C (select a pivot, partition around it, and recurse over the smaller and larger partitions), which I thought wasn't Pythonic.
So this is the implementation using a list comprehension in Python.
def qsort(list):
    if list == []:
        return []
    pivot = list[0]
    l = qsort([x for x in list[1:] if x < pivot])
    u = qsort([x for x in list[1:] if x >= pivot])
    return l + [pivot] + u
Let's call the recursive method qsortR. I noticed that qsortR runs much slower than qsort for large(r) lists; in fact I get "maximum recursion depth exceeded in cmp" even for 1000 elements with the recursive method, which I worked around by raising sys.setrecursionlimit.
Some numbers:
list-compr 1000 elems 0.491770029068
recursion 1000 elems 2.24620914459
list-compr 2000 elems 0.992327928543
recursion 2000 elems 7.72630095482
All the code is here.
I have a couple of questions:
Why is list comprehension so much faster?
Some enlightenment on the limit on recursion in Python. I first set it to 100000; in what cases should I be careful?
(What exactly is meant by 'optimizing tail recursion', how is it done?)
Trying to sort 1000000 elements hogged the memory of my laptop (with the recursive method). What should I do if I want to sort that many elements? What kinds of optimizations are possible?
Why is list comprehension so much faster?
Because a list comprehension runs its loop in C, which is much faster than the slower, general-purpose machinery of a Python for block.
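As a rough illustration, here is a hypothetical micro-benchmark (my own sketch, assuming CPython) comparing an explicit for loop with the equivalent list comprehension:

from timeit import timeit

data = list(range(10000))

def with_loop():
    out = []
    for x in data:
        if x % 2:
            out.append(x)
    return out

def with_comprehension():
    return [x for x in data if x % 2]

# The comprehension typically wins, since the loop body runs in C.
print(timeit(with_loop, number=1000))
print(timeit(with_comprehension, number=1000))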
Some enlightenment on the limit on recursion in python. I first set it to 100000 in what cases should I be careful?
In case you run out of memory.
Trying to sort 1000000 elements hogged memory of my laptop (with the recursion method). What should I do if I want to sort so many elements? What kind of optimizations are possible?
Python's recursion has such overhead because every function call allocates a lot of stack memory.
In general, iteration is the answer (it will give better performance in, statistically, 99% of use cases); see the iterative sketch below.
Talking about memory structures, if you have simple data such as chars, integers, or floats, use the built-in array.array, which is much more memory efficient than a list.
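A minimal sketch of quicksort driven by an explicit stack of (begin, end) ranges instead of recursion (my own illustration, not from the original answer; it assumes the same in-place partition scheme used elsewhere in this thread):

def qsort_iterative(lst):
    stack = [(0, len(lst))]       # pending ranges to sort, half-open [begin, end)
    while stack:
        begin, end = stack.pop()
        if end - begin <= 1:
            continue
        pivot = lst[begin]
        l, r = begin + 1, end
        while l < r:
            if lst[l] <= pivot:
                l += 1
            else:
                r -= 1
                lst[l], lst[r] = lst[r], lst[l]
        l -= 1
        lst[begin], lst[l] = lst[l], lst[begin]   # move pivot into place
        stack.append((begin, l))                  # left partition
        stack.append((r, end))                    # right partition
    return lst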
Have you tried writing a non-recursive implementation of partition? I suspect that the performance difference is purely the partition implementation. You are recursing for each element in your implementation.
Update
Here's a quick implementation. It is still not super fast or even efficient, but it is much better than your original recursive one.
>>> def partition(data):
...     pivot = data[0]
...     less, equal, greater = [], [], []
...     for elm in data:
...         if elm < pivot:
...             less.append(elm)
...         elif elm > pivot:
...             greater.append(elm)
...         else:
...             equal.append(elm)
...     return less, equal, greater
...
>>> def qsort2(data):
...     if data:
...         less, equal, greater = partition(data)
...         return qsort2(less) + equal + qsort2(greater)
...     return data
...
I also think that there are a larger number of temporary lists generated in the "traditional" version.
Try comparing the list comprehension version to an in-place algorithm when the data gets really big. The code below has nearly the same execution time when sorting 100K integers, but you will probably get stuck with the list comprehension solution when sorting 1M integers. I made the tests on a 4 GB machine. The full code: http://snipt.org/Aaaje2
class QSort:
    def __init__(self, lst):
        self.lst = lst

    def sorted(self):
        self.qsort_swap(0, len(self.lst))
        return self.lst

    def qsort_swap(self, begin, end):
        if (end - begin) > 1:
            pivot = self.lst[begin]
            l = begin + 1
            r = end
            while l < r:
                if self.lst[l] <= pivot:
                    l += 1
                else:
                    r -= 1
                    self.lst[l], self.lst[r] = self.lst[r], self.lst[l]
            l -= 1
            self.lst[begin], self.lst[l] = self.lst[l], self.lst[begin]
            # print begin, end, self.lst
            self.qsort_swap(begin, l)
            self.qsort_swap(r, end)
I have this, and it works:
# E. Given two lists sorted in increasing order, create and return a merged
# list of all the elements in sorted order. You may modify the passed in lists.
# Ideally, the solution should work in "linear" time, making a single
# pass of both lists.
def linear_merge(list1, list2):
    finalList = []
    for item in list1:
        finalList.append(item)
    for item in list2:
        finalList.append(item)
    finalList.sort()
    return finalList
    # +++your code here+++
    return
But, I'd really like to learn this stuff well. :) What does 'linear' time mean?
Linear means O(n) in Big O notation, while your code uses sort(), which is most likely O(n log n).
The question is asking for the standard merge algorithm. A simple Python implementation would be:
def merge(l, m):
    result = []
    i = j = 0
    total = len(l) + len(m)
    while len(result) != total:
        if len(l) == i:
            result += m[j:]
            break
        elif len(m) == j:
            result += l[i:]
            break
        elif l[i] < m[j]:
            result.append(l[i])
            i += 1
        else:
            result.append(m[j])
            j += 1
    return result
>>> merge([1,2,6,7], [1,3,5,9])
[1, 1, 2, 3, 5, 6, 7, 9]
Linear time means that the time taken is bounded by some undefined constant times (in this context) the number of items in the two lists you want to merge. Your approach doesn't achieve this - it takes O(n log n) time.
When specifying how long an algorithm takes in terms of the problem size, we ignore details like how fast the machine is, which basically means we ignore all the constant terms. We use "asymptotic notation" for that. These basically describe the shape of the curve you would plot in a graph of problem size in x against time taken in y. The logic is that a bad curve (one that gets steeper quickly) will always lead to a slower execution time if the problem is big enough. It may be faster on a very small problem (depending on the constants, which probably depends on the machine) but for small problems the execution time isn't generally a big issue anyway.
The "big O" specifies an upper bound on execution time. There are related notations for average execution time and lower bounds, but "big O" is the one that gets all the attention.
O(1) is constant time - the problem size doesn't matter.
O(log n) is a quite shallow curve - the time increases a bit as the problem gets bigger.
O(n) is linear time - each unit increase means it takes a roughly constant amount of extra time. The graph is (roughly) a straight line.
O(n log n) curves upwards more steeply as the problem gets more complex, but not by very much. This is the best that a general-purpose sorting algorithm can do.
O(n squared) curves upwards a lot more steeply as the problem gets more complex. This is typical for slower sorting algorithms like bubble sort.
The nastiest algorithms are classified as "np-hard" or "np-complete" where the "np" means "non-polynomial" - the curve gets steeper quicker than any polynomial. Exponential time is bad, but some are even worse. These kinds of things are still done, but only for very small problems.
EDIT the last paragraph is wrong, as indicated by the comment. I do have some holes in my algorithm theory, and clearly it's time I checked the things I thought I had figured out. In the mean time, I'm not quite sure how to correct that paragraph, so just be warned.
For your merging problem, consider that your two input lists are already sorted. The smallest item from your output must be the smallest item from one of your inputs. Get the first item from both and compare the two, and put the smallest in your output. Put the largest back where it came from. You have done a constant amount of work and you have handled one item. Repeat until both lists are exhausted.
Some details... First, putting the item back in the list just to pull it back out again is obviously silly, but it makes the explanation easier. Next - one input list will be exhausted before the other, so you need to cope with that (basically just empty out the rest of the other list and add it to the output). Finally - you don't actually have to remove items from the input lists - again, that's just the explanation. You can just step through them.
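Putting that description together, a minimal sketch (with my own function and variable names, stepping through the lists rather than removing items) could look like this:

def merge_sorted_lists(list1, list2):
    merged = []
    i = j = 0
    # Repeatedly take the smaller of the two front items.
    while i < len(list1) and j < len(list2):
        if list1[i] <= list2[j]:
            merged.append(list1[i])
            i += 1
        else:
            merged.append(list2[j])
            j += 1
    # One list is exhausted; copy whatever remains of the other.
    merged.extend(list1[i:])
    merged.extend(list2[j:])
    return merged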
Linear time means that the runtime of the program is proportional to the length of the input. In this case the input consists of two lists. If the lists are twice as long, then the program will run approximately twice as long. Technically, we say that the algorithm should be O(n), where n is the size of the input (in this case the length of the two input lists combined).
This appears to be homework, so I will not supply you with an answer. Even if this is not homework, I am of the opinion that you will be best served by taking a pen and a piece of paper, constructing two smallish example lists which are sorted, and figuring out how you would merge those two lists by hand. Once you have figured that out, implementing the algorithm is a piece of cake.
(If all goes well, you will notice that you need to iterate over each list only once, in a single direction. That means that the algorithm is indeed linear. Good luck!)
If you build the result in reverse sorted order, you can use pop() and still be O(N)
pop() from the right end of the list does not require shifting the elements, so is O(1)
Reversing the list before we return it is O(N)
>>> def merge(l, r):
...     result = []
...     while l and r:
...         if l[-1] > r[-1]:
...             result.append(l.pop())
...         else:
...             result.append(r.pop())
...     result += (l + r)[::-1]
...     result.reverse()
...     return result
...
>>> merge([1,2,6,7], [1,3,5,9])
[1, 1, 2, 3, 5, 6, 7, 9]
This thread contains various implementations of a linear-time merge algorithm. Note that for practical purposes, you would use heapq.merge.
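For example:

import heapq

# heapq.merge returns a lazy iterator over the merged, sorted output.
print(list(heapq.merge([1, 2, 6, 7], [1, 3, 5, 9])))
# [1, 1, 2, 3, 5, 6, 7, 9]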
Linear time means O(n) complexity. You can read about algorithm complexity and big-O notation here: http://en.wikipedia.org/wiki/Big_O_notation
You shouldn't combine the lists only after dumping everything into finalList; try to merge them gradually: add an element, make sure the result is still sorted, then add the next element... that should give you some ideas.
A simpler version, which requires equal-sized lists:
def merge_sort(L1, L2):
    res = []
    for i in range(len(L1)):
        if L1[i] < L2[i]:
            first = L1[i]
            second = L2[i]
        else:
            first = L2[i]
            second = L1[i]
        res.extend([first, second])
    return res
The toolz library's itertoolz module provides an efficient implementation for merging two sorted lists:
https://toolz.readthedocs.io/en/latest/_modules/toolz/itertoolz.html#merge_sorted
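A usage sketch, assuming the toolz package is installed (pip install toolz):

from toolz import merge_sorted

# merge_sorted lazily merges already-sorted iterables.
print(list(merge_sorted([1, 2, 6, 7], [1, 3, 5, 9])))
# [1, 1, 2, 3, 5, 6, 7, 9]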
'Linear time' means that time is an O(n) function, where n is the number of input items (items in the lists).
f(n) = O(n) means that there exists a constant y such that f(n) <= y * n for all sufficiently large n.
def linear_merge(list1, list2):
    finalList = []
    i = 0
    j = 0
    while i < len(list1):
        if j < len(list2):
            if list1[i] < list2[j]:
                finalList.append(list1[i])
                i += 1
            else:
                finalList.append(list2[j])
                j += 1
        else:
            finalList.append(list1[i])
            i += 1
    while j < len(list2):
        finalList.append(list2[j])
        j += 1
    return finalList