representing recurrence by chaining iterables in Python

I'm solving a problem where I have levels in a binary tree. I'm given a level, then a position.
The second level is [True, False].
The third level is [True, False, False, True].
The fourth [True, False, False, True, False, True, True, False], and so on.
To solve the problem, I may need to expand this sequence many levels to get the element at a given position at that level.
For the initial array pattern = [True, False]
I want to do something like:
for _ in range(level):
    pattern = pattern + [not elem for elem in pattern]
Obviously for large limits this is not working well for me. My attempts at a solution using the chain function from itertools have so far been fruitless. What is a memory efficient way to express this in Python?
EDIT
This did what I was looking for, but still did not meet the runtime requirements I was looking for.
from itertools import chain, tee

for _ in range(level):
    lsb, msb = tee(pattern)
    pattern = chain(lsb, map(lambda x: not x, msb))
Ultimately, the solution involved finding the global index of the target element in question and determining how many 'right' paths were taken from the root (base case = 1) to get to it, observing that the state does not change from parent to child if a left path was taken, but flips if a right path was taken. It appears that most of the clever solutions are some spin on this fact.
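A minimal sketch of that path-walk formulation (assuming levels are 1-indexed and positions within a level are 0-indexed; the function name is illustrative, not from the original code):

def value_at(level, pos):
    value = True                           # the root / first element is True
    for bit in range(level - 2, -1, -1):   # walk from the root toward the element
        if pos & (1 << bit):               # a 'right' step flips the state
            value = not value
        # a 'left' step leaves the state unchanged
    return value

print([value_at(4, p) for p in range(8)])
# [True, False, False, True, False, True, True, False]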

What is a memory efficient way to express this in Python?
Since the approach you're using doubles the memory needed on each iteration, it won't scale easily. It may be better to find an analytic approach.
The generator below takes O(1) time to generate each element. And, crucially, calculating the next value depends only on the index and the previous value.
def gen():
    yield True
    n, prev = 1, 1
    while True:
        x = n ^ (n - 1)          # run of 1s covering the bits that change going from n-1 to n
        y = x ^ (x >> 1)         # isolates the highest of those bits
        if y.bit_length() % 2:   # parity of set bits changed, so the value flips
            z = 1 - prev
        else:
            z = prev
        yield bool(z)
        prev = z
        n += 1
A recurrence relation like this lets you compute elements in constant memory. Implementing the idea with Cython or PyPy should increase performance significantly.
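The element at a given index can then be pulled straight out of the generator with itertools.islice, still in constant memory (a small usage sketch):

from itertools import islice

print(next(islice(gen(), 10, None)))  # element at index 10 of the pattern -> True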

Trying to generate elements one by one is a bad idea, and saving them all is even worse. You only need one element's value, and you can compute it directly.
Suppose the element you want is at index 2**i + k, where k < 2**i. Then this element is the negation of the element at index k, and the element at index k can be computed the same way. You end up negating element 0 once for each set bit in your desired index's binary representation. If there are an even number of set bits, the value is True. Otherwise, the value is False.
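In code, this is just a parity check on the set bits of the zero-based index (a short sketch of the observation above; the function name is illustrative):

def element_at(i):
    # True exactly when the binary representation of i has an even number of set bits
    return bin(i).count("1") % 2 == 0

print([element_at(i) for i in range(8)])
# [True, False, False, True, False, True, True, False]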

Related

python getting the largest even sum of an array with k elements

I've been studying Python algorithms and would like to solve the following problem:
A positive integer array A and an integer K are given.
Find the largest even sum of the array A with K elements.
If not possible, return -1.
For example, if there is an array A= [1,2,3,4,4,5] and K= 3,
the answer is 12 (5+4+3),
which is the largest even sum with K (3) elements.
However, if A= [3, 3, 3] and K= 1,
the answer is -1 because it cannot make an even sum with one element.
I tried to exclude every minimum odd from the array, but it failed when K=n in the while loop.
Is there any simple way to solve this problem? I would sincerely appreciate it if you could give some advice.
Sort the array and "take" the biggest K elements.
If it's already even sum - you are done.
Otherwise, you need to replace exactly one element: replace an even element you have chosen with an odd one you have not, or the other way around. You need the difference between the two elements to be minimized.
A naive solution will check all possible ways to do that, but that's O(n^2). You can do better by checking only the two viable candidates:
- The maximal odd element you did not choose and the minimal even element you have chosen.
- The maximal even element you did not choose and the minimal odd element you have chosen.
Choose the pair for which the difference between the two elements is minimal. If no such two elements exist (i.e. your k=3, [3,3,3] example), there is no viable solution.
Time complexity is O(nlogn) for sorting.
In my (very rusty) python, it should be something like:
def FindMaximalEvenArray(a, k):
    a = sorted(a)
    chosen = a[len(a)-k:]
    not_chosen = a[0:len(a)-k]
    if sum(chosen) % 2 == 0:
        return sum(chosen)
    smallest_chosen_even = next((x for x in chosen if x % 2 == 0), None)
    biggest_not_chosen_odd = next((x for x in not_chosen[::-1] if x % 2 != 0), None)
    candidate1 = smallest_chosen_even - biggest_not_chosen_odd if smallest_chosen_even and biggest_not_chosen_odd else float("inf")
    smallest_chosen_odd = next((x for x in chosen if x % 2 != 0), None)
    biggest_not_chosen_even = next((x for x in not_chosen[::-1] if x % 2 == 0), None)
    candidate2 = smallest_chosen_odd - biggest_not_chosen_even if smallest_chosen_odd and biggest_not_chosen_even else float("inf")
    if candidate1 == float("inf") and candidate2 == float("inf"):
        return -1
    return sum(chosen) - min(candidate1, candidate2)
Note: This can be done even better (in terms of time complexity), because you don't actually care about the order of all elements, only about finding the "candidates" and the top K elements. So you could use a selection algorithm instead of sorting, which will make this run in O(n) time.
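For illustration, here is a sketch of the same replacement idea without a full sort, using heapq.nlargest for the top-K selection and a Counter for the multiset of leftover elements (this is O(n log k) rather than the O(n) a true selection algorithm would give; the function name is illustrative):

import heapq
from collections import Counter

def max_even_sum(a, k):
    chosen = heapq.nlargest(k, a)                      # top-k without sorting everything
    not_chosen = list((Counter(a) - Counter(chosen)).elements())
    total = sum(chosen)
    if total % 2 == 0:
        return total
    best_swap = float("inf")
    for parity in (0, 1):
        # swap the smallest chosen element of this parity
        # for the largest not-chosen element of the opposite parity
        small = min((x for x in chosen if x % 2 == parity), default=None)
        big = max((x for x in not_chosen if x % 2 != parity), default=None)
        if small is not None and big is not None:
            best_swap = min(best_swap, small - big)
    return total - best_swap if best_swap != float("inf") else -1

print(max_even_sum([1, 2, 3, 4, 4, 5], 3))  # 12
print(max_even_sum([3, 3, 3], 1))           # -1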

Elements of Programming Interview 5.15 (Random Subset Computation)

Algorithm problem:
Write a program which takes as input a positive integer n and a size k <= n; return a size-k subset of {0, 1, 2, ..., n-1}. The subset should be represented as an array. All subsets should be equally likely and, in addition, all permutations of elements of the array should be equally likely. You may assume you have a function which takes as input a nonnegative integer t and returns an integer in the set {0, 1, ..., t-1}.
My original solution to this in pseudocode is as follows:
Set t = n, and output the result of the random number generator into a set() until set() has size(set) == t. Return list(set)
The author solution is as follows:
def online_sampling(n, k):
    changed_elements = {}
    for i in range(k):
        rand_idx = random.randrange(i, n)
        rand_idx_mapped = changed_elements.get(rand_idx, rand_idx)
        i_mapped = changed_elements.get(i, i)
        changed_elements[rand_idx] = i_mapped
        changed_elements[i] = rand_idx_mapped
    return [changed_elements[i] for i in range(k)]
I totally understand the author's solution - my question is more about why my solution is incorrect. My guess is that it becomes greatly inefficient as t approaches n, because in that case the probability that I need to keep re-running the random number function until I get a number that isn't already in the set gets higher and higher. If t == n, for the very last element to add to the set there is just a 1/n chance that I get the correct element, and I would probabilistically need to run the given rand() function n times just to get the last item.
Is this the correct reason for why my solution isn't efficient? Is there anything else I'm missing? And how would one describe the time complexity of my solution then? By the above rationale, I believe it would be O(n^2), since I would probabilistically need to run it n + (n-1) + (n-2) + ... times.
Your solution is (almost) correct.
Firstly, it will run in O(n log n) instead of O(n^2), assuming that all operations with set are O(1). Here's why.
The expected time to add the first element to the set is 1 = n/n.
The expected time to add the second element to the set is n/(n-1), because the probability of randomly choosing a yet-unchosen element is (n-1)/n. See the geometric distribution for an explanation.
...
For the k-th element, the expected time is n/(n-k+1). So for n elements the total time is n/n + n/(n-1) + ... + n/1 = n * (1 + 1/2 + ... + 1/n) = O(n log n).
Moreover, we can prove by induction that all chosen subsets will be equiprobable.
However, when you do list(set(...)), it is not guaranteed the resulting list will contain elements in the same order as you put them into a set. For example, if set is implemented as a binary search tree then the list will always be sorted. So you have to store the list of unique found elements separately.
UPD (@JimMischel): we proved the average-case running time. There is still a possibility that the algorithm will run indefinitely (for example, if rand() always returns 1).
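For completeness, a minimal sketch of the fix described above: keep a set for O(1) membership tests, but record the accepted values in a separate list so their order is the order in which they were drawn (names are illustrative):

import random

def random_subset_rejection(n, k):
    seen = set()       # fast membership tests
    result = []        # preserves the order in which values were accepted
    while len(result) < k:
        r = random.randrange(n)
        if r not in seen:
            seen.add(r)
            result.append(r)
    return result

print(random_subset_rejection(10, 4))  # e.g. [7, 2, 9, 0]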
Your method has a big problem. You may return duplicate numbers if your random number generator creates the same number two times, doesn't it?
If you say set() will not keep duplicate numbers, then your method creates the members of the set with different chances, so the numbers in your set will not be equally likely.
The problem with your method is not efficiency; it does not create an equally likely result set. The author uses a variation of the Fisher-Yates method for creating that subset, which will be equally likely.

How can I merge overlapping strings in python?

I have some strings,
['SGALWDV', 'GALWDVP', 'ALWDVPS', 'LWDVPSP', 'WDVPSPV']
These strings partially overlap each other. If you manually overlapped them you would get:
SGALWDVPSPV
I want a way to go from the list of overlapping strings to the final compressed string in Python. I feel like this must be a problem that someone has solved already, and I'm trying to avoid reinventing the wheel. The methods I can imagine are either brute force or involve getting more complicated than I would like by using Biopython and sequence aligners. I have some simple short strings and just want to merge them properly in a simple way.
Does anyone have any advice on a nice way to do this in python? Thanks!
Here is a quick sorting solution:
s = ['SGALWDV', 'GALWDVP', 'ALWDVPS', 'LWDVPSP', 'WDVPSPV']
new_s = sorted(s, key=lambda x:s[0].index(x[0]))
a = new_s[0]
b = new_s[-1]
final_s = a[:a.index(b[0])]+b
Output:
'SGALWDVPSPV'
This program sorts s by the value of the index of the first character of each element, in an attempt to find the string that will maximize the overlap distance between the first element and the desired output.
My proposed solution with a more challenging test list:
#strFrag = ['SGALWDV', 'GALWDVP', 'ALWDVPS', 'LWDVPSP', 'WDVPSPV']
strFrag = ['ALWDVPS', 'SGALWDV', 'LWDVPSP', 'WDVPSPV', 'GALWDVP', 'LWDVPSP', 'ALWDVPS']

for repeat in range(0, len(strFrag)-1):
    bestMatch = [2, '', '']  # overlap score (minimum value 3), otherStr index, assembled str portion
    for otherStr in strFrag[1:]:
        for x in range(0, len(otherStr)):
            if otherStr[x:] == strFrag[0][:len(otherStr[x:])]:
                if len(otherStr)-x > bestMatch[0]:
                    bestMatch = [len(otherStr)-x, strFrag.index(otherStr), otherStr[:x]+strFrag[0]]
            if otherStr[:-x] == strFrag[0][-len(otherStr[x:]):]:
                if x > bestMatch[0]:
                    bestMatch = [x, strFrag.index(otherStr), strFrag[0]+otherStr[-x:]]
    if bestMatch[0] > 2:
        strFrag[0] = bestMatch[2]
        strFrag = strFrag[:bestMatch[1]]+strFrag[bestMatch[1]+1:]

print(strFrag)
print(strFrag[0])
Basically the code compares every string/fragment to the first in list and finds the best match (most overlap). It consolidates the list progressively, merging the best matches and removing the individual strings. Code assumes that there are no unfillable gaps between strings/fragments (Otherwise answer may not result in longest possible assembly. Can be solved by randomizing the starting string/fragment). Also assumes that the reverse complement is not present (poor assumption with contig assembly), which would result in nonsense/unmatchable strings/fragments. I've included a way to restrict the minimum match requirements (changing bestMatch[0] value) to prevent false matches. Last assumption is that all matches are exact. To enable flexibility in permitting mismatches when assembling the sequence makes the problem considerably more complex. I can provide a solution for assembling with mismatches upon request.
To determine the overlap of two strings a and b, you can check if any prefix of b is a suffix of a. You can then use that check in a simple loop, aggregating the result and slicing the next string in the list according to the overlap.
lst = ['SGALWDV', 'GALWDVP', 'ALWDVPS', 'LWDVPSP', 'WDVPSPV']

def overlap(a, b):
    return max(i for i in range(len(b)+1) if a.endswith(b[:i]))

res = lst[0]
for s in lst[1:]:
    o = overlap(res, s)
    res += s[o:]

print(res)  # SGALWDVPSPV
Or using reduce:
from functools import reduce # Python 3
print(reduce(lambda a, b: a + b[overlap(a,b):], lst))
This is probably not super-efficient, with complexity of about O(n k), with n being the number of strings in the list and k the average length per string. You can make it a bit more efficient by only testing whether the last char of the presumed overlap of b is the last character of a, thus reducing the amount of string slicing and function calls in the generator expression:
def overlap(a, b):
    return max(i for i in range(len(b)) if b[i-1] == a[-1] and a.endswith(b[:i]))
Here's my solution which borders on brute force from the OP's perspective. It's not bothered by order (threw in a random shuffle to confirm that) and there can be non-matching elements in the list, as well as other independent matches. Assumes overlap means not a proper subset but independent strings with elements in common at the start and end:
from collections import defaultdict
from random import choice, shuffle
def overlap(a, b):
    """ get the maximum overlap of a & b plus where the overlap starts """
    overlaps = []
    for i in range(len(b)):
        for j in range(len(a)):
            if a.endswith(b[:i + 1], j):
                overlaps.append((i, j))
    return max(overlaps) if overlaps else (0, -1)

lst = ['SGALWDV', 'GALWDVP', 'ALWDVPS', 'LWDVPSP', 'WDVPSPV', 'NONSEQUITUR']
shuffle(lst)  # to verify order doesn't matter

overlaps = defaultdict(list)

while len(lst) > 1:
    overlaps.clear()
    for a in lst:
        for b in lst:
            if a == b:
                continue
            amount, start = overlap(a, b)
            overlaps[amount].append((start, a, b))
    maximum = max(overlaps)
    if maximum == 0:
        break
    start, a, b = choice(overlaps[maximum])  # pick one among equals
    lst.remove(a)
    lst.remove(b)
    lst.append(a[:start] + b)

print(*lst)
OUTPUT
% python3 test.py
NONSEQUITUR SGALWDVPSPV
%
Computes all the overlaps and combines the largest overlap into a single element, replacing the original two, and starts process over again until we're down to a single element or no overlaps.
The overlap() function is horribly inefficient and likely can be improved but that doesn't matter if this isn't the type of matching the OP desires.
Once the peptides start to grow to 20 amino acids, cdlane's code chokes and spams (multiple) incorrect answers of various amino acid lengths.
Try to add and use the AA sequence 'VPSGALWDVPS', with or without 'D', and the code starts to fail its task because the N- and C-termini grow and do not reflect what Adam Price is asking for. The output is 'SGALWDVPSGALWDVPSPV' and thus 100% incorrect despite the effort.
Tbh imo there is only one 100% answer, and that is to use BLAST and its protein search page, or BLAST in the BioPython package. Or adapt cdlane's code to account for AA gaps, substitutions and AA additions.
Dredging up an old thread, but I had to solve this myself today.
For this specific case, where the fragments are already in order and each overlaps by the same amount (in this case 1), the following fairly simple concatenation works, though it might not be the world's most robust solution:
lst = ['SGALWDV', 'GALWDVP', 'ALWDVPS', 'LWDVPSP', 'WDVPSPV']
reference = "SGALWDVPSPV"
string = "".join([i[0] for i in lst] + [lst[-1][1:]])
reference == string
True

Find the index of the n'th item in a list

I want to find the index of the n'th occurrence of an item in a list. e.g.,
x=[False,True,True,False,True,False,True,False,False,False,True,False,True]
What is the index of the n'th true? If I wanted the fifth occurrence (4th if zero-indexed), the answer is 10.
I've come up with:
indargs = [ i for i,a in enumerate(x) if a ]
indargs[n]
Note that x.index returns the first occurrence or the first occurrence after some point, and therefore as far as I can tell is not a solution.
There is also a solution in numpy for cases similar to the above, e.g. using cumsum and where, but I'd like to know if there's a numpy-free way to solve the problem.
I'm concerned about performance since I first encountered this while implementing a Sieve of Eratosthenes for a Project Euler problem, but this is a more general question that I have encountered in other situations.
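For reference, a sketch of one numpy formulation (using np.flatnonzero rather than the cumsum/where combination mentioned above; x is assumed to be a list or array of booleans):

import numpy as np

x = [False, True, True, False, True, False, True, False, False, False, True, False, True]
n = 4                          # zero-indexed: the fifth True
print(np.flatnonzero(x)[n])    # 10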
EDIT: I've gotten a lot of great answers, so I decided to do some performance tests. Below are timeit execution times in seconds for lists of length nelements, searching for the 4000'th/1000'th True. The lists are random True/False. Source code is linked below; it's a touch messy. I used short / modified versions of the posters' names to describe the functions, except listcomp, which is the simple list comprehension above.
True Test (100'th True in a list containing True/False)
nelements eyquem_occur eyquem_occurrence graddy taymon listcomp hettinger26 hettinger
3000: 0.007824 0.031117 0.002144 0.007694 0.026908 0.003563 0.003563
10000: 0.018424 0.103049 0.002233 0.018063 0.088245 0.003610 0.003769
50000: 0.078383 0.515265 0.002140 0.078074 0.442630 0.003719 0.003608
100000: 0.152804 1.054196 0.002129 0.152691 0.903827 0.003741 0.003769
200000: 0.303084 2.123534 0.002212 0.301918 1.837870 0.003522 0.003601
True Test (1000'th True in a list containing True/False)
nelements eyquem_occur eyquem_occurrence graddy taymon listcomp hettinger26 hettinger
3000: 0.038461 0.031358 0.024167 0.039277 0.026640 0.035283 0.034482
10000: 0.049063 0.103241 0.024120 0.049383 0.088688 0.035515 0.034700
50000: 0.108860 0.516037 0.023956 0.109546 0.442078 0.035269 0.035373
100000: 0.183568 1.049817 0.024228 0.184406 0.906709 0.035135 0.036027
200000: 0.333501 2.141629 0.024239 0.333908 1.826397 0.034879 0.036551
True Test (20000'th True in a list containing True/False)
nelements eyquem_occur eyquem_occurrence graddy taymon listcomp hettinger26 hettinger
3000: 0.004520 0.004439 0.036853 0.004458 0.026900 0.053460 0.053734
10000: 0.014925 0.014715 0.126084 0.014864 0.088470 0.177792 0.177716
50000: 0.766154 0.515107 0.499068 0.781289 0.443654 0.707134 0.711072
100000: 0.837363 1.051426 0.501842 0.862350 0.903189 0.707552 0.706808
200000: 0.991740 2.124445 0.498408 1.008187 1.839797 0.715844 0.709063
Number Test (750'th 0 in a list containing 0-9)
nelements eyquem_occur eyquem_occurrence graddy taymon listcomp hettinger26 hettinger
3000: 0.026996 0.026887 0.015494 0.030343 0.022417 0.026557 0.026236
10000: 0.037887 0.089267 0.015839 0.040519 0.074941 0.026525 0.027057
50000: 0.097777 0.445236 0.015396 0.101242 0.371496 0.025945 0.026156
100000: 0.173794 0.905993 0.015409 0.176317 0.762155 0.026215 0.026871
200000: 0.324930 1.847375 0.015506 0.327957 1.536012 0.027390 0.026657
Hettinger's itertools solution is almost always the best. taymon's and graddy's solutions are next best for most situations, though the list comprehension approach can be better for short arrays when you want the n'th instance such that n is high or lists in which there are fewer than n occurrences. If there's a chance that there are fewer than n occurrences, the initial count check saves time. Also, graddy's is more efficient when searching for numbers instead of True/False... not clear why that is. eyquem's solutions are essentially equivalent to others with slightly more or less overhead; eyquem_occur is approximately the same as taymon's solution, while eyquem_occurrence is similar to listcomp.
The answer from @Taymon using list.index was great.
FWIW, here's a functional approach using the itertools module. It works with any iterable input, not just lists:
>>> from itertools import compress, count, imap, islice
>>> from functools import partial
>>> from operator import eq
>>> def nth_item(n, item, iterable):
        indices = compress(count(), imap(partial(eq, item), iterable))
        return next(islice(indices, n, None), -1)
The example is nice because it shows off how to effectively combine Python's functional toolset. Note that once the pipeline is set up, there are no trips around Python's eval loop -- everything gets done at C speed, with a tiny memory footprint, with lazy evaluation, with no variable assignments, and with separately testable components. IOW, it is everything functional programmers dream about :-)
Sample run:
>>> x = [False,True,True,False,True,False,True,False,False,False,True,False,True]
>>> nth_item(50, True, x)
-1
>>> nth_item(0, True, x)
1
>>> nth_item(1, True, x)
2
>>> nth_item(2, True, x)
4
>>> nth_item(3, True, x)
6
I can't say for certain that this is the fastest way, but I imagine it'd be pretty good:
i = -1
for j in xrange(n):
    i = x.index(True, i + 1)
The answer is i.
[y for y in enumerate(x) if y[1] == True][z][0]
Note: here z is the n'th occurrence.
If you're concerned with performance, you are best off seeing if there are algorithmic optimizations you can make. For example, if you are calling this function many times on the same values, you may wish to cache previous computations (e.g. once you find the 50th occurrence of an element, you can find any previous occurrence in O(1) time).
Otherwise, you want to make sure your technique works on (lazy) iterators.
The most *in*elegant and performance-happy way I can think of implementing it is as:
def indexOfNthOccurrence(N, element, stream):
    """for N>0, returns index or None"""
    seen = 0
    for i, x in enumerate(stream):
        if x == element:
            seen += 1
            if seen == N:
                return i
(if you really care about the performance difference between enumerate and other techniques, you will need to resort to profiling, especially with the numpy functions, which may resort to C)
To preprocess the entire stream and support O(1) queries:
from collections import defaultdict

cache = defaultdict(list)
for i, elem in enumerate(YOUR_LIST):
    cache[elem] += [i]

# e.g. YOUR_LIST = [3, 2, 3, 2, 5, 5, 1]
#      indices      0  1  2  3  4  5  6
# cache: {3: [0, 2], 1: [6], 2: [1, 3], 5: [4, 5]}
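With the cache built this way, the n-th occurrence (0-indexed) of any element becomes a plain O(1) lookup afterwards, e.g. cache[3][1] is 2 for the sample list in the comment above.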
A solution that first creates a list object and returns the nth-1 element of this list: function occurence().
And a solution that fulfills functional programmers' dreams too, I think, using generators, because I love them: function occur().
S = 'stackoverflow.com is a fantastic amazing site'
print 'object S is string %r' % S
print "indexes of 'a' in S :", [indx for indx, elem in enumerate(S) if elem == 'a']

def occurence(itrbl, x, nth):
    return [indx for indx, elem in enumerate(itrbl)
            if elem == x][nth-1] if x in itrbl \
           else None

def occur(itrbl, x, nth):
    return (i for pos, i in enumerate(indx for indx, elem in enumerate(itrbl)
                                      if elem == x)
            if pos == nth-1).next() if x in itrbl \
           else None

print "\noccurence(S,'a',4th) ==", occurence(S, 'a', 4)
print "\noccur(S,'a',4th) ==", occur(S, 'a', 4)
result
object S is string 'stackoverflow.com is a fantastic amazing site'
indexes of 'a' in S : [2, 21, 24, 27, 33, 35]
occur(S,'a',4th) == 27
occurence(S,'a',4th) == 27
The second solution seems complex but it isn't really. It doesn't need to run completely through the iterable: the process stops as soon as the wanted occurrence is found.
Here is another way to find the nth occurrence of x in a list itrbl:
def nthoccur(nth, x, itrbl):
    count, index = 0, 0
    while count < nth:
        if index > len(itrbl) - 1:
            return None
        elif itrbl[index] == x:
            count += 1
            index += 1
        else:
            index += 1
    return index - 1
If efficiency is a concern, I think it's better to iterate normally (O(N)) instead of using a list comprehension, which takes O(L) where L is the length of the list.
Example: consider a very huge list where you want to find the first occurrence, N=1; it is obviously better to stop as soon as you find the first occurrence.
def nth_true_index(L, N):
    # wrapped in a function so that `return` is valid
    count = 0
    for index, i in enumerate(L):
        if i:
            count = count + 1
            if count == N:
                return index
Here is a way. For the example above:
x = [False, True, True, False, True, False, True, False, False, False, True, False, True]
we can define a function find_index:
def find_index(lst, value, n):
    c = []
    i = 0
    for element in lst:
        if element == value:
            c.append(i)
        i += 1
    return c[n]
and if we apply the function:
nth_index = find_index(x, True, 4)
print nth_index
the result is:
10
I think this should work.
def get_nth_occurrence_of_specific_term(my_list, term, n):
    assert type(n) is int and n > 0
    start = -1
    for i in range(n):
        if term not in my_list[start + 1:]:
            return -1
        start = my_list.index(term, start + 1)
    return start
You could use count:
from itertools import count

x = [False, True, True, False, True, False, True, False, False, False, True, False, True]

def nth_index(n, item, iterable):
    counter = count(1)
    return next((i for i, e in enumerate(iterable) if e == item and next(counter) == n), -1)

print(nth_index(3, True, x))
Output
4
The idea is that, due to the short-circuit nature of e == item and next(counter) == n, the expression next(counter) == n only gets evaluated when e == item, so you are only counting the elements equal to item.
You can use next with enumerate and a generator expression. itertools.islice allows you to slice an iterable as required.
from itertools import islice

x = [False, True, True, False, True, False, True, False, False, False, True, False, True]

def get_nth_index(L, val, n):
    """return index of nth instance where value in list equals val"""
    return next(islice((i for i, j in enumerate(L) if j == val), n-1, n), -1)

res = get_nth_index(x, True, 3)  # 4
If the iterator is exhausted, i.e. the nth occurrence of the specified value doesn't exist, next can return a default value, in this instance -1.

Python: speed up removal of every n-th element from list

I'm trying to solve this programming riddle and although the solution (see code below) works correctly, it is too slow for successful submission.
Any pointers on how to make this run faster (removal of every n-th element from a list)? Or suggestions for a better algorithm to calculate the same? I can't seem to think of anything other than brute force for now...
Basically, the task at hand is:
GIVEN:
L = [2,3,4,5,6,7,8,9,10,11,........]
1. Take the first remaining item in list L (in the general case 'n'). Move it to
the 'lucky number list'. Then drop every 'n-th' item from the list.
2. Repeat 1
TASK:
Calculate the n-th number from the 'lucky number list' ( 1 <= n <= 3000)
My original code (it calculated the first 3000 lucky numbers in about a second on my machine - unfortunately too slow):
"""
SPOJ Problem Set (classical) 1798. Assistance Required
URL: http://www.spoj.pl/problems/ASSIST/
"""
sieve = range(3, 33900, 2)
luckynumbers = [2]

while True:
    wanted_n = input()
    if wanted_n == 0:
        break
    while len(luckynumbers) < wanted_n:
        item = sieve[0]
        luckynumbers.append(item)
        items_to_delete = set(sieve[::item])
        sieve = filter(lambda x: x not in items_to_delete, sieve)
    print luckynumbers[wanted_n-1]
EDIT: thanks to the terrific contributions of Mark Dickinson, Steve Jessop and gnibbler, I arrived at the following, which is quite a whole lot faster than my original code (and successfully got submitted at http://www.spoj.pl with 0.58 seconds!)...
sieve = range(3, 33810, 2)
luckynumbers = [2]

while len(luckynumbers) < 3000:
    if len(sieve) < sieve[0]:
        luckynumbers.extend(sieve)
        break
    luckynumbers.append(sieve[0])
    del sieve[::sieve[0]]

while True:
    wanted_n = input()
    if wanted_n == 0:
        break
    else:
        print luckynumbers[wanted_n-1]
This series is called ludic numbers
__delslice__ should be faster than __setslice__+filter
>>> L=[2,3,4,5,6,7,8,9,10,11,12]
>>> lucky=[]
>>> lucky.append(L[0])
>>> del L[::L[0]]
>>> L
[3, 5, 7, 9, 11]
>>> lucky.append(L[0])
>>> del L[::L[0]]
>>> L
[5, 7, 11]
So the loop becomes:
while len(luckynumbers) < 3000:
    item = sieve[0]
    luckynumbers.append(item)
    del sieve[::item]
which runs in less than 0.1 seconds.
Try using these two lines for the deletion and filtering, instead of what you have; filter(None, ...) runs considerably faster than the filter(lambda ...).
sieve[::item] = [0]*-(-len(sieve)//item)
sieve = filter(None, sieve)
Edit: much better to simply use del sieve[::item]; see gnibbler's solution.
You might also be able to find a better termination condition for the while loop: for example, if the first remaining item in the sieve is i then the first i elements of the sieve will become the next i lucky numbers; so if len(luckynumbers) + sieve[0] >= wanted_n you should already have computed the number you need---you just need to figure out where in sieve it is so that you can extract it.
On my machine, the following version of your inner loop runs around 15 times faster than your original for finding the 3000th lucky number:
while len(luckynumbers) + sieve[0] < wanted_n:
    item = sieve[0]
    luckynumbers.append(item)
    sieve[::item] = [0]*-(-len(sieve)//item)
    sieve = filter(None, sieve)
print (luckynumbers + sieve)[wanted_n-1]
An explanation on how to solve this problem can be found here. (The problem I linked to asks for more, but the main step in that problem is the same as the one you're trying to solve.) The site I linked to also contains a sample solution in C++.
The set of numbers can be represented in a binary tree, which supports the following operations:
Return the nth element
Erase the nth element
These operations can be implemented to run in O(log n) time, where n is the number of nodes in the tree.
To build the tree, you can either make a custom routine that builds the tree from a given array of elements, or implement an insert operation (make sure to keep the tree balanced).
Each node in the tree needs the following information:
Pointers to the left and right children
How many items there are in the left and right subtrees
With such a structure in place, solving the rest of the problem should be fairly straightforward.
I also recommend calculating the answers for all possible input values before reading any input, instead of calculating the answer for each input line.
A Java implementation of the above algorithm gets accepted in 0.68 seconds at the website you linked.
(Sorry for not providing any Python-specific help, but hopefully the algorithm outlined above will be fast enough.)
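For anyone who prefers to stay in Python, here is a rough sketch of the same idea using a Fenwick (binary indexed) tree in place of the balanced binary tree described above; the class and the driver code are illustrative, not taken from the linked solution:

class Fenwick:
    # counts of still-present positions; erase and k-th queries both run in O(log n)
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)
        for i in range(1, n + 1):        # O(n) build with every position present
            self.tree[i] += 1
            j = i + (i & -i)
            if j <= n:
                self.tree[j] += self.tree[i]

    def erase(self, pos):                # pos is a 1-based position in the original array
        while pos <= self.n:
            self.tree[pos] -= 1
            pos += pos & -pos

    def kth(self, k):                    # 1-based position of the k-th remaining element
        pos, bit = 0, 1 << self.n.bit_length()
        while bit:
            nxt = pos + bit
            if nxt <= self.n and self.tree[nxt] < k:
                pos, k = nxt, k - self.tree[nxt]
            bit >>= 1
        return pos + 1

values = list(range(3, 33900, 2))        # mirrors the odd-number sieve in the question
fw = Fenwick(len(values))
remaining = len(values)
luckynumbers = [2]
while len(luckynumbers) < 3000 and remaining:
    item = values[fw.kth(1) - 1]
    luckynumbers.append(item)
    # current ranks 1, item+1, 2*item+1, ... are the same elements as sieve[::item]
    to_erase = [fw.kth(r) for r in range(1, remaining + 1, item)]
    for p in to_erase:
        fw.erase(p)
    remaining -= len(to_erase)
print(luckynumbers[2999])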
You're better off using an array and zeroing out every Nth item using that strategy; after you do this a few times in a row, the updates start getting tricky so you'd want to re-form the array. This should improve the speed by at least a factor of 10. Do you need vastly better than that?
Why not just create a new list?
L = [x for (i, x) in enumerate(L) if i % n]
