splitting a list dynamically with range and value to split - python

I want to split a value into the number of splits provided. For example, if I have value = 165340
and split = 5, then the list should become ['0-33068', '33069-66137', '66138-99204', '99205-132272', '132273-165340'].
So far I have only come up with something like this, but it is not dynamic.
How can I build a list of strings like this, with the numbers split by the difference val/split?
for i in range(split):
    if i==0:
        lst.append('%s-%s' % (i, val/split))
    elif i==1:
        lst.append('%s-%s' % (val/split+i, val/split*2+1))
    elif i == 2:
        lst.append('%s-%s' % (val/split*i+2, val/split*3))
    elif i == 3:
        lst.append('%s-%s' % (val/split*i+1, val/split*4))
    elif i == 4:
        lst.append('%s-%s' % (val/split*i+1, val/split*5))
    else:
        pass

FINAL:
I made a bunch of attempts here, especially in using remainder = value % numsplits, then int(i * remainder // numsplits) to try and keep things close. Eventually, though, I had to give up and go back to floating point which seems to give the closest results. The usual floating point concerns apply.
def segment(value, numsplits):
    return ["{}-{}".format(
        int(round(1 + i * value/(numsplits*1.0),0)),
        int(round(1 + i * value/(numsplits*1.0) +
                  value/(numsplits*1.0)-1, 0))) for
        i in range(numsplits)]
>>> segment(165340, 5)
['1-33068', '33069-66136', '66137-99204', '99205-132272', '132273-165340']
>>> segment(7, 4)
['1-2', '3-4', '4-5', '6-7']
I don't see a huge issue with this one. I did start at 1 instead of 0, but that's not necessary (change both the int(round(1 + i * ... to int(round(i * ... to change that). Old results follow.
value = 165340
numsplits = 5
result = ["{}-{}".format(i + value//numsplits*i, i + value//numsplits*i + value//numsplits) for i in range(numsplits)]
Probably worth tossing in a function
def segment(value,numsplits):
    return ["{}-{}".format(value*i//numsplits, 1 + value//numsplits*i + value//numsplits) for i in range(numsplits)]
The following will cut it off at your value
def segment(value, numsplits):
    return ["{}-{}".format(max(0,i + value*i//numsplits), min(value,i + value*i//numsplits + value//numsplits)) for i in range(numsplits)]

To answer this question, it's important to know exactly how we should treat 0 - but it doesn't seem like you've asked yourself this question. The intervals in your example output are inconsistent; you're starting with 0 in the first interval and the first two intervals both have 33,069 elements (counting 0) in them, but you're also ending your last interval at 165340. If 0 and 165340 are both counted in the number of elements, then 165340 is not divisible into five even intervals.
Here are a few different solutions that might help you understand the problem.
Even intervals, counting from zero
Let's start with the assumption that you really do want both 0 and the "top" value counted as elements and displayed in the result. In other words, the value 11 would actually indicate the following 12-element range:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
And be evenly split into the following non-negative intervals:
['0-3', '4-7', '8-11']
If we're only concerned with evenly-divisible cases, we can use a fairly short function (NOTE: These solutions are valid for Python 3.x, or for Python 2.x with from __future__ import division):
>>> def evenintervals(value, n):
...     binsize = (value + 1) // n
...     intervals = ((x * binsize, (x + 1) * binsize - 1) for x in range(n))
...     return ['{}-{}'.format(x, y) for x, y in intervals]
...
>>> evenintervals(11, 3)
['0-3', '4-7', '8-11']
>>> evenintervals(17, 2)
['0-8', '9-17']
However, this function deals with 165340 (and any other not-evenly-divisible case) by dropping some numbers off the end:
>>> evenintervals(165340, 5)
['0-33067', '33068-66135', '66136-99203', '99204-132271', '132272-165339']
From a purely mathematical perspective, this just doesn't work. However, we could fudge it a bit if for some reason you want to display 0, but not actually count it as an element of the first interval.
Even intervals, counting from one
Here's a function that doesn't count 0 as an element of the list, but does give you the option of displaying it, if you're just that zany:
>>> def evenintervals1(value, n, show_zero=False):
...     binsize = value // n
...     intervals = [[x * binsize + 1, (x + 1) * binsize] for x in range(n)]
...     if show_zero:
...         intervals[0][0] = 0
...     return ['{}-{}'.format(x, y) for x, y in intervals]
...
>>> evenintervals1(20, 4)
['1-5', '6-10', '11-15', '16-20']
>>> evenintervals1(20, 4, show_zero=True)
['0-5', '6-10', '11-15', '16-20']
This version of the function might be the closest thing to what you asked for in your question, even though it doesn't show the exact values you gave in your example output:
>>> evenintervals1(165340, 5, show_zero=True)
['0-33068', '33069-66136', '66137-99204', '99205-132272', '132273-165340']
But we still have problems with inputs that aren't evenly divisible. What if we wanted a more general solution?
Uneven intervals
Let's think about how to deal with a wider range of inputs. We should be able to produce, from any positive integer n, anywhere from 1 to n non-overlapping ranges of positive integers. In other words, if our integer is 5, we want to be able to produce a list with as many as five ranges. But how should we distribute "extra" elements, in order to make the ranges as even as possible?
We probably don't want to distribute them randomly. We could just lengthen or shorten the last range in the list, but that has the potential to be very lop-sided:
# 40 split 7 times, adding remainder to last item
['1-5', '6-10', '11-15', '16-20', '21-25', '26-30', '31-40']
# 40 split 7 times, subtracting excess from last item
['1-6', '7-12', '13-18', '19-24', '25-30', '31-36', '37-40']
In the former case the last element is 100% larger than the others and in the latter case it's 33% smaller. If you're splitting a very large value into a much smaller number of intervals, this may not be as much of a problem.
More likely, we want a function that produces the most even set of ranges possible. I'm going to do this by spreading the remainder of the division out among the first elements of the list, with a little help from itertools:
>>> from itertools import zip_longest # izip_longest for Python 2.7
>>> def anyintervals(value, n):
...     binsize, extras = value // n, value % n
...     intervals = []
...     lower = 0
...     upper = 0
...     for newbinsize in map(sum, zip_longest([binsize] * n, [1] * extras, fillvalue=0)):
...         lower, upper = upper + 1, upper + newbinsize
...         intervals.append((lower, upper))
...     return ['{}-{}'.format(x, y) for x, y in intervals]
...
>>> anyintervals(11, 3)
['1-4', '5-8', '9-11']
>>> anyintervals(17, 2)
['1-9', '10-17']
Finally, with the example inputs given in the OP:
>>> anyintervals(165340, 5)
['1-33068', '33069-66136', '66137-99204', '99205-132272', '132273-165340']
If it were really important to show the first interval starting at zero, we could apply the same logic here that was used in evenintervals1 to modify the very first integer in intervals before returning, or write a similar function to this one that started counting at zero.
I did implement another version that distributes the "extras" among the last ranges rather than the first, and there are certainly many other implementations that you might be interested in fiddling around with, but those solutions are left as an exercise to the reader. ;)
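One more variant in that spirit, offered as a sketch and not as the unpublished version mentioned above: place the i-th boundary at i * value // n, using integer arithmetic only. The name segment_by_boundaries is made up for illustration, and the sketch assumes 1 <= n <= value. Interval sizes differ by at most one.

```python
def segment_by_boundaries(value, n):
    """Split 1..value into n contiguous ranges using integer arithmetic only."""
    # Boundary i is floor(i * value / n). Range i starts just past boundary i
    # and ends at boundary i + 1 (inclusive).
    bounds = [i * value // n for i in range(n + 1)]
    return ['{}-{}'.format(lo + 1, hi) for lo, hi in zip(bounds, bounds[1:])]


print(segment_by_boundaries(165340, 5))
print(segment_by_boundaries(7, 4))
```

For the inputs from the question this gives the same list as anyintervals(165340, 5); for (7, 4) it gives ['1-1', '2-3', '4-5', '6-7'], with no overlapping endpoints.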

One possibility using numpy:
from numpy import arange
v = 165340
s = 5
splits = arange(s + 1) * (v / s)
lst = ['%d-%d' % (splits[idx], splits[idx+1]) for idx in range(s)]
print('\n'.join(lst))
output:
0-33068
33068-66136
66136-99204
99204-132272
132272-165340


form the largest number possible in a list [duplicate]

Given a list such as:
[3, 30, 34, 5, 9].
Output: 9534330
Write a program to return the largest number possible
In my code I have used permutation here:
from itertools import permutations
x = [3, 30, 34, 5, 9]
y = list(permutations(x))  # materialize the iterator so len() works
n = len(y)
e = []
for i in y:
    a = map(str, i)
    e.append(int("".join(a)))  # join the strings, not the original ints
print("Largest Number {}".format(sorted(e)[-1]))
Here n, the number of permutations, is 120 (that is, 5!).
Is there a better way to solve this problem?
Sorting all numbers in descending order is the simplest solution that occurs to us. But this doesn’t work.
For example, 548 is greater than 60, but in the output, 60 comes before 548. As a second example, 98 is greater than 9, but 9 comes before 98 in the output.
The solution is to use any comparison based sorting algorithm. Thus, instead of using the default comparison, write a comparison function myCompare() and use it to sort numbers.
Given two numbers X and Y, how should myCompare() decide which number to put first – we compare two numbers XY (Y appended at the end of X) and YX (X appended at the end of Y).
If XY is larger, then, in the output, X should come before Y, else Y should come before X.
For example, let X and Y be 542 and 60. To compare X and Y, we compare 54260 and 60542. Since 60542 is greater than 54260, we put Y first.
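The comparison rule above can be sketched in Python 3 with functools.cmp_to_key. The names largest_number and my_compare are illustrative, and the sketch assumes the input holds non-negative integers.

```python
from functools import cmp_to_key


def my_compare(x, y):
    """Order digit strings so that x comes first when x + y is the larger concatenation."""
    if x + y > y + x:
        return -1   # x should come before y
    if x + y < y + x:
        return 1    # y should come before x
    return 0


def largest_number(nums):
    # Compare as strings: both concatenations have the same length, so
    # string comparison agrees with numeric comparison.
    digits = sorted((str(n) for n in nums), key=cmp_to_key(my_compare))
    # Collapse results such as "000" to a single "0".
    return "".join(digits).lstrip("0") or "0"


print(largest_number([3, 30, 34, 5, 9]))   # 9534330
print(largest_number([548, 60]))           # 60548
```

Sorting costs O(n log n) comparisons, compared with the n! orderings that the permutation approach has to build.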
Calculating all permutations has a much higher time complexity.
A better solution in python would be:
def largestNumber(A):
    maxlen = len(str(max(A)))
    if all(v == 0 for v in A):
        return '0'
    return ''.join(sorted((str(v) for v in A), reverse=True,
                          key=lambda i: i*(maxlen * 2 // len(i))))
largestNumber([3, 30, 34, 5, 9])
The solution to this problem leads to an interesting transformation that is worth explaining.
Assume we want to know which of XY or YX is larger for given X and Y. Numerically, we want the largest of X.10^y + Y and Y.10^x + X, where the lowercase denote the number of digits of the uppercase variables.
Then with a little math, the comparison
X.10^y + Y < Y.10^x + X
can be rewritten
X / (10^x - 1) < Y / (10^y - 1)
so that XY < YX is certainly a transitive relation and defines a total order. This is very good news because it means that the problem can be reduced to ordinary sorting by using this modified comparison operation.
Now notice that X / (10^x - 1) is a periodic fractional number of the form 0.XXXX..., and to compare 0.XXXX... and 0.YYYY..., it suffices to compare the first len(X) + len(Y) digits. The longer length alone is not always enough: 12 and 121 agree on their first three cycled digits, yet 12121 > 12112. Hence the comparison can work as an ordinary string comparison, except that when the end of a string is reached, we cycle back to its first character.
E.g. 12345 > 12 because 12345 > 12|12|1 and 12105 < 12 because 12105 < 12|12|1.
The comparison function can be described as follows:
def Cmp(X, Y):
    l = len(X) + len(Y)
    for i in range(l):
        if X[i % len(X)] < Y[i % len(Y)]:
            return 1 # X sorts after Y (YX is larger)
        elif X[i % len(X)] > Y[i % len(Y)]:
            return -1 # X sorts before Y (XY is larger)
    return 0 # XY == YX
I don't recommend this particular implementation, which will be slow because of the %.

Speeding up algorithm that finds multiples in a given range

I'm stumped on how to speed up my algorithm, which sums multiples in a given range. This is for a problem on codewars.com.
Here's the code; I'll explain what's going on below.
import itertools
def solution(number):
    return multiples(3, number) + multiples(5, number) - multiples(15, number)
def multiples(m, count):
    l = 0
    for i in itertools.count(m, m):
        if i < count:
            l += i
        else:
            break
    return l
print(solution(50000000)) #takes 41.8 seconds
#one of the testers takes 50000000000000000000000000000000000000000 as input
# def multiples(m, count):
#     l = 0
#     for i in xrange(m,count ,m):
#         l += i
#     return l
So basically the problem asks the user to return the sum of all the multiples of 3 or 5 below a given number. Here are the tests.
test.assert_equals(solution(10), 23)
test.assert_equals(solution(20), 78)
test.assert_equals(solution(100), 2318)
test.assert_equals(solution(200), 9168)
test.assert_equals(solution(1000), 233168)
test.assert_equals(solution(10000), 23331668)
My program has no problem getting the right answer. The problem arises when the input is large. When I pass in a number like 50000000, it takes over 40 seconds to return the answer. One of the inputs I'm asked to take is 50000000000000000000000000000000000000000, which is a huge number. That's also the reason why I'm using itertools.count(): I tried using xrange in my first attempt, but xrange can't handle numbers larger than a C long. I know the slowest part of the problem is the multiples method, yet it is still faster than my first attempt, which used a list comprehension and checked whether i % 3 == 0 or i % 5 == 0. Any ideas?
This solution should be faster for large numbers.
def solution(number):
    number -= 1
    a, b, c = number // 3, number // 5, number // 15
    asum, bsum, csum = a*(a+1) // 2, b*(b+1) // 2, c*(c+1) // 2
    return 3*asum + 5*bsum - 15*csum
Explanation:
Take any sequence from 1 to n:
1, 2, 3, 4, ..., n
Its sum is always given by the formula n(n+1)/2. This can be proven easily if you consider that the expression (1 + n) / 2 is just a shortcut for computing the average, or arithmetic mean, of this particular sequence of numbers. Because average(S) = sum(S) / length(S), if you take the average of any sequence of numbers and multiply it by the length of the sequence, you get the sum of the sequence.
If we're given a number n, and we want the sum of the multiples of some given k up to n, including n, we want to find the summation:
k + 2k + 3k + 4k + ... + xk
where xk is the highest multiple of k that is less than or equal to n. Now notice that this summation can be factored into:
k(1 + 2 + 3 + 4 + ... + x)
We are given k already, so now all we need to find is x. If x is defined to be the highest number you can multiply k by to get a natural number less than or equal to n, then we can get the number x by using Python's integer division:
n // k == x
Once we find x, we can find the sum of the multiples of any given k up to a given n using previous formulas:
k(x(x+1)/2)
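As a quick sanity check of that formula, here is a small sketch that compares it with a direct loop. The helper names sum_of_multiples and brute_force are made up for this illustration; the bound n is inclusive here, as in the explanation.

```python
def sum_of_multiples(k, n):
    """Sum of k, 2k, ..., xk where xk is the largest multiple of k that is <= n."""
    x = n // k                      # how many multiples of k fit into n
    return k * (x * (x + 1) // 2)   # k * (1 + 2 + ... + x)


def brute_force(k, n):
    """Same sum computed term by term, only practical for small n."""
    return sum(range(k, n + 1, k))


for k in (3, 5, 15):
    assert sum_of_multiples(k, 999) == brute_force(k, 999)

# Multiples of 3 or 5 below 10, i.e. up to and including 9.
print(sum_of_multiples(3, 9) + sum_of_multiples(5, 9) - sum_of_multiples(15, 9))  # 23
```

Because only integer division and multiplication are involved, the closed form handles the 41-digit test input immediately.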
Our three given k's are 3, 5, and 15.
We find our x's in this line:
a, b, c = number // 3, number // 5, number // 15
Compute the summations of their multiples up to n in this line:
asum, bsum, csum = a*(a+1) // 2, b*(b+1) // 2, c*(c+1) // 2
And finally, multiply their summations by k in this line:
return 3*asum + 5*bsum - 15*csum
And we have our answer!

Random contiguous slice of list in Python based on a single random integer

Using a single random number and a list, how would you return a random slice of that list?
For example, given the list [0,1,2] there are seven possibilities of random contiguous slices:
[ ]
[ 0 ]
[ 0, 1 ]
[ 0, 1, 2 ]
[ 1 ]
[ 1, 2]
[ 2 ]
Rather than getting a random starting index and a random end index, there must be a way to generate a single random number and use that one value to figure out both starting index and end/length.
I need it that way, to ensure these 7 possibilities have equal probability.
Simply fix one order in which you would sort all possible slices, then work out a way to turn an index in that list of all slices back into the slice endpoints. For example, the order you used could be described by
The empty slice is before all other slices
Non-empty slices are ordered by their starting point
Slices with the same starting point are ordered by their endpoint
So the index 0 should return the empty list. Indices 1 through n should return [0:1] through [0:n]. Indices n+1 through n+(n-1)=2n-1 would be [1:2] through [1:n]; 2n through n+(n-1)+(n-2)=3n-3 would be [2:3] through [2:n] and so on. You see a pattern here: the last index for a given starting point is of the form n+(n-1)+(n-2)+(n-3)+…+(n-k), where k is the starting index of the sequence. That's an arithmetic series, so that sum is (k+1)(2n-k)/2=(2n+(2n-1)k-k²)/2. If you set that term equal to a given index, and solve that for k, you get some formula involving square roots. You could then use the ceiling function to turn that into an integral value for k corresponding to the last index for that starting point. And once you know k, computing the end point is rather easy.
But the quadratic equation in the solution above makes things really ugly. So you might be better off using some other order. Right now I can't think of a way which would avoid such a quadratic term. The order Douglas used in his answer doesn't avoid square roots, but at least his square root is a bit simpler due to the fact that he sorts by end point first. The order in your question and my answer is called lexicographical order, his would be called reverse lexicographical and is often easier to handle since it doesn't depend on n. But since most people think about normal (forward) lexicographical order first, this answer might be more intuitive to many and might even be the required way for some applications.
Here is a bit of Python code which lists all sequence elements in order, and does the conversion from index i to endpoints [k:m] the way I described above:
from math import ceil, sqrt
n = 3
print("{:3} []".format(0))
for i in range(1, n*(n+1)//2 + 1):
    b = 1 - 2*n
    c = 2*(i - n) - 1
    # solve k^2 + b*k + c = 0
    k = int(ceil((- b - sqrt(b*b - 4*c))/2.))
    m = k + i - k*(2*n-k+1)//2
    print("{:3} [{}:{}]".format(i, k, m))
The - 1 term in c doesn't come from the mathematical formula I presented above. It's more like subtracting 0.5 from each value of i. This ensures that even if the result of sqrt is slightly too large, you won't end up with a k which is too large. So that term accounts for numeric imprecision and should make the whole thing pretty robust.
The term k*(2*n-k+1)//2 is the last index belonging to starting point k-1, so i minus that term is the length of the subsequence under consideration.
You can simplify things further. You can perform some computation outside the loop, which might be important if you have to choose random sequences repeatedly. You can divide b by a factor of 2 and then get rid of that factor in a number of other places. The result could look like this:
from math import ceil, sqrt
n = 3
b = n - 0.5
bbc = b*b + 2*n + 1
print("{:3} []".format(0))
for i in range(1, n*(n+1)//2 + 1):
    k = int(ceil(b - sqrt(bbc - 2*i)))
    m = k + i - k*(2*n-k+1)//2
    print("{:3} [{}:{}]".format(i, k, m))
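To connect this back to the question, the same index-to-endpoints conversion can be wrapped in a function and driven by one random draw. This is only a sketch: the names lex_bounds and random_slice_lex are invented here, and it relies on the floating-point formula above, so it is meant for lists of ordinary size.

```python
from math import ceil, sqrt
import random


def lex_bounds(n, i):
    """Return (k, m) so that lst[k:m] is the i-th slice in lexicographic order."""
    if i == 0:
        return 0, 0                                      # index 0 is the empty slice
    b = n - 0.5
    k = int(ceil(b - sqrt(b * b + 2 * n + 1 - 2 * i)))   # starting index
    m = k + i - k * (2 * n - k + 1) // 2                 # end index (exclusive)
    return k, m


def random_slice_lex(lst):
    n = len(lst)
    i = random.randint(0, n * (n + 1) // 2)   # a single draw, both bounds inclusive
    k, m = lex_bounds(n, i)
    return lst[k:m]


print([lex_bounds(3, i) for i in range(7)])
print(random_slice_lex([0, 1, 2]))
```

Each of the n*(n+1)/2 + 1 indices maps to a different slice, so a uniform draw of i gives every slice the same probability.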
It is a little strange to give the empty list equal weight with the others. It is more natural for the empty list to be given weight 0 or n+1 times the others, if there are n elements on the list. But if you want it to have equal weight, you can do that.
There are n*(n+1)/2 nonempty contiguous sublists. You can specify these by the end point, from 0 to n-1, and the starting point, from 0 to the endpoint.
Generate a random integer x from 0 to n*(n+1)/2.
If x=0, return the empty list. Otherwise, x is uniformly distributed from 1 through n(n+1)/2.
Compute e = floor(sqrt(2*x)-1/2). This takes the values 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, etc.
Compute s = (x-1) - e*(e+1)/2. This takes the values 0, 0, 1, 0, 1, 2, 0, 1, 2, 3, ...
Return the interval starting at index s and ending at index e.
(s,e) takes the values (0,0),(0,1),(1,1),(0,2),(1,2),(2,2),...
import random
import math
n = 10
x = random.randint(0, n*(n+1)//2)  # integer division: randint needs int bounds
if x == 0:
    print(list(range(n))[0:0])  # empty slice
else:
    e = int(math.floor(math.sqrt(2*x) - 0.5))
    s = x - 1 - e*(e+1)//2
    print(list(range(n))[s:e+1])  # starting at s, ending at e, inclusive
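For very large lists the floating-point square root can lose precision. A hedged alternative, not part of the answer above, is to compute the end point with math.isqrt (Python 3.8+), which works on exact integers. The function names below are illustrative.

```python
import math
import random


def slice_from_index(lst, x):
    """Map x in [0, n*(n+1)//2] to one contiguous slice of lst (0 is the empty slice)."""
    if x == 0:
        return lst[0:0]
    t = x - 1
    # e is the largest integer with e*(e+1)//2 <= t, i.e. the end index (inclusive).
    e = (math.isqrt(8 * t + 1) - 1) // 2
    s = t - e * (e + 1) // 2        # start index, 0 <= s <= e
    return lst[s:e + 1]


def random_slice(lst):
    n = len(lst)
    return slice_from_index(lst, random.randint(0, n * (n + 1) // 2))


print([slice_from_index([0, 1, 2], x) for x in range(7)])
```

The printed list contains each of the seven slices from the question exactly once, ordered by end point and then by start point.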
First create all possible slice indexes.
[0:0], [1:1], etc are equivalent, so we include only one of those.
Finally you pick a random index couple, and apply it.
import random
l = [0, 1, 2]
combination_couples = [(0, 0)]
length = len(l)
# Creates all index couples.
for j in range(1, length+1):
    for i in range(j):
        combination_couples.append((i, j))
print(combination_couples)
rand_tuple = random.sample(combination_couples, 1)[0]
final_slice = l[rand_tuple[0]:rand_tuple[1]]
print(final_slice)
To ensure we got them all:
for i in combination_couples:
    print(l[i[0]:i[1]])
Alternatively, with some math...
For a length-3 list there are four possible index values, 0 to 3, that is n=4. You pick 2 of them, that is k=2. The first index has to be smaller than the second, therefore we need to count combinations (n choose k).
from math import factorial as f
def total_combinations(n, k=2):
    result = 1
    for i in range(1, k+1):
        result *= n - k + i
    result //= f(k)  # integer division keeps the count an int
    # We add plus 1 since we included [0:0] as well.
    return result + 1
print(total_combinations(n=4)) # Prints 7 as expected.
"there must be a way to generate a single random number and use that one value to figure out both starting index and end/length."
It is difficult to say which method is best, but if you're only interested in binding a single random number to your contiguous slice, you can use modulo.
Given a list l and a single random number r, you can get your contiguous slice like this:
l[r % len(l) : some_sparkling_transformation(r) % len(l)]
where some_sparkling_transformation(r) is essential. It depends on your needs, but since I don't see any special requirements in your question, it could be for example:
l[r % len(l) : (2 * r) % len(l)]
The most important thing here is that both the left and right edges of the slice are correlated to r. This makes it hard to define contiguous slices that won't follow any observable pattern. The example above (with 2 * r) produces slices that are always either empty lists or follow the pattern [a : 2 * a].
Let's use some intuition. We know that we want to find a good random representation of the number r in the form of a contiguous slice. It turns out that we need to find two numbers, a and b, that are respectively the left and right edges of the slice. Assuming that r is a good random number (we like it in some way), we can say that a = r % len(l) is a good approach.
Let's now try to find b. The best way to generate another nice random number is to use a random number generator (random or numpy) that supports seeding (both do). Example with the random module:
import random
def contiguous_slice(l, r):
    random.seed(r)
    a = int(random.uniform(0, len(l)+1))
    b = int(random.uniform(0, len(l)+1))
    a, b = sorted([a, b])
    return l[a:b]
Good luck and have fun!

Finding the largest palindrome product of two 3-digit numbers: what is the error in logic?

I thought of solving this problem in the following way: start with two variables with value 999, multiplying one by another in a loop that decrements one or the other until a palindrome is found. The code is this:
def is_palindrome(n):
    if str(n) == str(n)[::-1]:
        return True
    else:
        return False
def largest_palindrome_product_of_3_digit():
    x = 999
    y = 999
    for i in reversed(range(x + y + 1)):
        if is_palindrome(x * y):
            return x * y
        if i % 2 == 0:
            x -= 1
        else:
            y -= 1
The result of my method is 698896, while the correct result is 906609. Could you point me where my logic is incorrect?
Here are a couple of hints:
If n=y*x is any number in the range(600000, 700000) (for example) with y<=x, and x<1000, what's the smallest possible value of x?
If n is a palindromic number, both its first and last digit are 6, so what does that imply about the last digits of x & y?
Now generalize and figure out an efficient algorithm. :)
I've never done this problem before, but I just coded a reasonably fast algorithm that's around 2000 times faster than a brute-force search that uses
for x in xrange(2, 1000):
    for y in xrange(2, x+1):
        n = y*x
        #etc
According to timeit.py, the brute-force algorithm takes around 1.29 seconds on my old machine, the algorithm I hinted at above takes around 747 microseconds.
Edit
I've improved my bounds (and modified my algorithm slightly) and brought the time down to 410 µsec. :)
To answer your questions in the comment:
Yes, we can start x at the square root of the beginning of the range, and we can stop y at x (just in case we find a palindromic square).
What I was getting at with my 2nd hint is that for x=10*I+i, y=10*J+j, we don't need to test all 81 combinations of i and j, we only need to test the ones where (i*j)%10 equals the digit we want. So if we know that our palindrome starts and ends with 9 then (i, j) must be in [(1, 9), (3, 3), (7, 7), (9, 1)].
I don't think I should post my actual code here; it's considered bad form on SO to post complete solutions to Project Euler problems. And perhaps some SO people don't even like it when people supply hints. Maybe that's why I got down-voted...
You're missing possible numbers.
You're considering O(x+y) numbers and you need to consider O(x * y) numbers. Your choices are, essentially, to either loop one of them from 999, down to 1, then decrement the other and...
Simple demonstration:
>>> want = set()
>>> for x in [1, 2, 3, 4, 5]:
...     for y in [1, 2, 3, 4, 5]:
...         want.add(x * y)
...
>>> got = set()
>>> x = 5
>>> y = 5
>>> for i in reversed(range(x + y + 1)):
...     got.add(x * y)
...     if i % 2:
...         x -= 1
...     else:
...         y -= 1
...
>>> want == got
False
Alternatively, you do know the top of the range (999 * 999) and you can generate all palindromic numbers in that range, from the highest to the lowest. From there, doing a prime factorization and checking if there's a split of the factors that multiply to two numbers in the range [100,999] is trivial.
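For reference, the nested-loop option mentioned above can be sketched as follows. This is a plain exhaustive search with two early exits, not the faster digit-based algorithm hinted at in the other answer; the function name and its lo/hi parameters are made up for illustration.

```python
def is_palindrome(n):
    s = str(n)
    return s == s[::-1]


def largest_palindrome_product(lo=100, hi=999):
    """Largest palindromic x * y with lo <= x <= y <= hi (0 if there is none)."""
    best = 0
    for x in range(hi, lo - 1, -1):
        if x * hi <= best:
            break                  # no product involving this or a smaller x can win
        for y in range(hi, x - 1, -1):
            p = x * y
            if p <= best:
                break              # smaller y only makes the product smaller
            if is_palindrome(p):
                best = p
    return best


print(largest_palindrome_product())        # 906609
print(largest_palindrome_product(10, 99))  # 9009
```

Unlike the zig-zag walk in the question, every pair (x, y) is either tested or skipped only because its product cannot exceed the best palindrome found so far.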

Finding numbers from a to b not divisible by x to y

This is a problem I've been pondering for quite some time.
What is the fastest way to find all numbers from a to b that are not divisible by any number from x to y?
Consider this:
I want to find all the numbers from 1 to 10 that are not divisible by 2 to 5.
This process will become extremely slow if I were to use a linear approach,
like this:
result = []
a = 1
b = 10
x = 2
y = 5
for i in range(a, b + 1):      # b + 1 so that b itself is included
    t = False
    for j in range(x, y + 1):  # y + 1 so that y itself is included
        if i%j==0:
            t = True
            break
    if t is False:
        result.append(i)
print(result)
Does anybody know of any other methods of doing this with less computation time than a linear solution?
If not, can anyone see how this might be done faster, as I am blank at this point...
Sincerely,
John
[EDIT]
The numbers range from 0 to more than 1e+100.
This is true for a, b, x and y.
You only need to check prime values in the range of the possible divisors - for example, if a value is not divisible by 2, it won't be divisible by any multiple of 2 either; likewise for every other prime and prime multiple. Thus in your example you can check 2, 3, 5 - you don't need to check 4, because anything divisible by 4 must be divisible by 2. Hence, a faster approach would be to compute primes in whatever range you are interested in, and then simply calculate which values they divide.
Another speedup is to add each value in the range you are interested in to a set: when you find that it is divisible by a number in your range, remove it from the set. You then should only be testing numbers that remain in the set - this will stop you testing numbers multiple times.
If we combine these two approaches, we see that we can create a set of all values (so in the example, a set with all values 1 to 10), and simply remove the multiples of each prime in your second range from that set.
Edit: As Patashu pointed out, this won't quite work if the prime that divides a given value is not in the set. To fix this, we can apply a similar algorithm to the above: create a set with the divisor values [x, y] and, for each value in the set, remove all of its multiples. So for the example given below in the comments (with [3, 6]) we'd start with 3 and remove its multiples in the set, so 6. Hence the remaining values we need to test would be [3, 4, 5], which is what we want in this case.
Edit2: Here's a really hacked up, crappy implementation that hasn't been optimized and has horrible variable names:
def find_non_factors():
    a = 1
    b = 1000000
    x = 200
    y = 1000
    z = [True for p in range(x, y+1)]
    for k, i in enumerate(z):
        if i:
            k += x
            n = 2
            while n * k < y + 1:
                z[(n*k) - x] = False
                n += 1
    k = {p for p in range(a, b+1)}
    for p, v in enumerate(z):
        if v:
            t = p + x
            n = 1
            while n * t < (b + 1):
                if (n * t) in k:
                    k.remove(n * t)
                n += 1
    return k
Try your original implementation with those numbers. It takes > 1 minute on my computer. This implementation takes under 2 seconds.
Ultimate optimization caveat: Do not prematurely optimize. Any time you attempt to optimize code, profile it to ensure it needs optimization, and profile the optimization on the same kind of data you intend it to be optimized for to confirm it is a speedup. Almost all code does not need optimization; it just needs to give the correct answer.
If you are optimizing for small x-y and large a-b:
Create an array with length that is the lowest common multiple out of all the x, x+1, x+2... y. For example, for 2, 3, 4, 5 it would be 60, not 120.
Now populate this array with booleans - false initially for every cell, then for each number in x-y, populate all entries in the array that are multiples of that number with true.
Now for each number in a-b, index into the array modulo the array length: if the entry is true, skip the number; if it is false, return it.
You can do this a little quicker by removing from your x to y factors the numbers whose prime factor expansions are strict supersets of other numbers' prime factor expansions. By which I mean: if you have 2, 3, 4, 5, then 4 is 2*2, a strict superset of 2, so you can remove it, and now our array length is only 30. For something like 3, 4, 5, 6, however, 4 is 2*2 and 6 is 3*2; 6 is a superset of 3 so we remove it, but 4 is not a superset of anything so we keep it in. The LCM is 3*2*2*5 = 60. Doing this kind of thing would give some speedup on its own for large a-b, and you might not need to go the array direction if that's all you need.
Also, keep in mind that if you aren't going to use the entire result of the function every single time - like, maybe sometimes you're only interested in the lowest value - write it as a generator rather than as a function. That way you can call it until you have enough numbers and then stop, saving time.
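A minimal sketch of the array approach described in this answer, written as a generator. The function names are invented for illustration, and it assumes 1 <= x <= y with a divisor range small enough that an array of LCM length fits in memory.

```python
from functools import reduce
from math import gcd


def build_divisibility_mask(x, y):
    """mask[r] is True when r is divisible by some d in x..y; its length is lcm(x..y)."""
    period = reduce(lambda p, d: p * d // gcd(p, d), range(x, y + 1))
    mask = [False] * period
    for d in range(x, y + 1):
        for multiple in range(0, period, d):
            mask[multiple] = True
    return mask


def not_divisible(a, b, x, y):
    """Yield each i in a..b (inclusive) that no d in x..y divides."""
    mask = build_divisibility_mask(x, y)
    period = len(mask)
    for i in range(a, b + 1):
        if not mask[i % period]:
            yield i


print(list(not_divisible(1, 10, 2, 5)))   # [1, 7]
```

Building the mask costs time proportional to the LCM, after which each candidate needs a single lookup, and the caller can stop consuming the generator as soon as it has enough values.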
