form the largest number possible in a list [duplicate] - python

This question already has answers here:
Sort a list to form the largest possible number
(9 answers)
Closed 4 years ago.
Given a list such as:
[3, 30, 34, 5, 9].
Output: 9534330
Write a program to return the largest number possible
In my code I have used permutation here:
from itertools import permutations
x = [3, 30, 34, 5, 9]
y = list(permutations(x))  # materialize the iterator so len() works
n = len(y)
e = []
for i in y:
    a = map(str, i)
    e.append(int("".join(a)))  # join the stringified digits, not the ints
print("Largest Number {}".format(sorted(e)[-1]))
Here n, the number of permutations, is 120 because 5! = 120.
Is there a better way to solve this problem?

Sorting all numbers in descending order is the simplest solution that occurs to us. But this doesn’t work.
For example, 548 is greater than 60, but in the output, 60 comes before 548. As a second example, 98 is greater than 9, but 9 comes before 98 in the output.
The solution is to use any comparison based sorting algorithm. Thus, instead of using the default comparison, write a comparison function myCompare() and use it to sort numbers.
Given two numbers X and Y, how should myCompare() decide which number to put first? We compare the two numbers XY (Y appended at the end of X) and YX (X appended at the end of Y).
If XY is larger, then, in the output, X should come before Y, else Y should come before X.
For example, let X and Y be 542 and 60. To compare X and Y, we compare 54260 and 60542. Since 60542 is greater than 54260, we put Y first.
Computing all permutations has a much higher time complexity, O(n!), than sorting.
A better solution in python would be:
def largestNumber(A):
    maxlen = len(str(max(A)))
    if all(v == 0 for v in A):
        return '0'
    return ''.join(sorted((str(v) for v in A), reverse=True,
                          key=lambda i: i*(maxlen * 2 // len(i))))

largestNumber([3, 30, 34, 5, 9])
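The myCompare() idea described in this answer can also be written out directly. The following is a minimal sketch, assuming a non-empty list of non-negative integers; the names largest_number and my_compare are illustrative and not from the original answer. It relies on functools.cmp_to_key to turn the pairwise XY versus YX test into a sort key.

```python
from functools import cmp_to_key

def largest_number(nums):
    """Arrange non-negative integers to form the largest number (as a string)."""
    digits = [str(v) for v in nums]

    def my_compare(x, y):
        # x + y and y + x have the same length, so comparing them as
        # strings gives the same result as comparing them as numbers.
        if x + y > y + x:
            return -1  # x goes first
        if x + y < y + x:
            return 1   # y goes first
        return 0

    digits.sort(key=cmp_to_key(my_compare))
    result = "".join(digits)
    # A list of zeros would otherwise give "000..."; collapse it to "0".
    return "0" if result[0] == "0" else result

print(largest_number([3, 30, 34, 5, 9]))  # 9534330
```

Sorting costs O(n log n) comparisons, compared with the n! candidates examined by the permutation approach.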

The solution to this problem leads to an interesting transformation that is worth explaining.
Assume we want to know which of XY or YX is larger for given X and Y. Numerically, we want the largest of X.10^y + Y and Y.10^x + X, where the lowercase denote the number of digits of the uppercase variables.
Then with a little math, the comparison
X.10^y + Y < Y.10^x + X
can be rewritten
X / (10^x - 1) < Y / (10^y - 1)
so that XY < YX is certainly a transitive relation and defines a total order. This is very good news because it means that the problem can be reduced to ordinary sorting by using this modified comparison operation.
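The "little math" can be spelled out as follows. This is only a restatement of the step above, with x and y the digit counts of X and Y; both factors 10^x - 1 and 10^y - 1 are positive, so dividing by them keeps the direction of the inequality.

```latex
\begin{align*}
X \cdot 10^{y} + Y &< Y \cdot 10^{x} + X \\
X \,(10^{y} - 1) &< Y \,(10^{x} - 1) \\
\frac{X}{10^{x} - 1} &< \frac{Y}{10^{y} - 1}
\end{align*}
```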
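Now notice that X / (10^x - 1) is a periodic fractional number of the form 0.XXXX..., so we have to compare 0.XXXX... and 0.YYYY.... Comparing over the longer period alone is not always enough (12 and 121 agree on their first three cyclic digits, yet 12121 > 12112), but comparing the first x + y digits always suffices: two periodic sequences with periods x and y that agree that far agree everywhere. Hence the comparison can work as an ordinary string comparison, except that when the end of a string is reached, we cycle back to its first character.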
E.g. 12345 > 12 because 12345 > 12|12|1 and 12105 < 12 because 12105 < 12|12|1.
The comparison function can be described as follows:
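def Cmp(X, Y):
    # Compare cyclically over len(X) + len(Y) characters;
    # max(len(X), len(Y)) is not enough (e.g. X = "12", Y = "121").
    l = len(X) + len(Y)
    for i in range(l):
        if X[i % len(X)] < Y[i % len(Y)]:
            return 1   # X sorts after Y (YX is larger)
        elif X[i % len(X)] > Y[i % len(Y)]:
            return -1  # X sorts before Y (XY is larger)
    return 0  # XY == YX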
I don't recommend this particular implementation, which will be slow because of the %.

Related

What is an efficient way of counting the number of unique multiplicative and additive pairs in a list of integers in Python?

Given a sorted array A = [n, n+1, n+2,... n+k] elements, I am trying to count the unique number of multiplicative and additive pairs such that the condition xy >= x+y is satisfied. Where x and y are indices of the list, and y > x.
Here is my minimum working example using a naive brute force approach:
def minimum_working_example(A):
    A.sort()
    N = len(A)
    mpairs = []
    x = 0
    while x < N:
        for y in range(N):
            if x<y and (A[x]*A[y])>=(A[x]+A[y]):
                mpairs.append([A[x], A[y]])
            else:
                continue
        x+=1
    return len(mpairs)

A = [1,2,3,4,5]
print(minimum_working_example(A))
#Output = 6, Unique pairs that satisfy xy >= x+y: (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)
However, this approach has quadratic, O(N^2), time complexity, which is too slow for large lists.
What sorting or searching algorithms exist that will allow me to implement a more efficient solution?
This question has a closed-form mathematical solution, but if you'd prefer to implement it in a programming language, you just need to find all unique pairs of numbers from your list, and count the ones that satisfy your requirement. itertools.combinations is your friend here:
import itertools

A = [1,2,3,4,5]
pairs = []
for x, y in itertools.combinations(A, 2):
    if x*y >= x + y:
        pairs.append((x,y))
Output
[(2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
Basic algebra ... solve for one variable in terms of the other:
xy >= x + y
xy - y >= x
y(x-1) >= x
Now, if your elements are all positive integers, you get
    if x == 1, no solution
    if x == 2, y >= 2
    else x > 2
        y >= x/(x-1)
In this last case, x/(x-1) is a fraction between 1 and 2; again,
    y >= 2
solves the inequality.
This gives you a trivially accessible solution in O(1) time; if you want the pairs themselves, you're constrained by the printing, which is O(n^2) time.
So using the fact that x*y >= x+y if both (mistake in my original comment) x and y are >= 2 (see @Prune's answer for details), you may as well remove 0 and 1 from your list if they appear, because they won't make any suitable pair.
So now, assuming all numbers are >= 2 and you have k of them (e.g. replace k by k-1 in the following formula if n = 1, since the 1 is dropped), all possible pairs will satisfy your condition. The number of pairs among k elements is given by the well-known formula k*(k-1)/2 (google it if you don't know about it). The time to compute this number is essentially the same (one multiplication, one division) no matter what value of k you have (unless you start going to crazy big numbers), so the complexity is O(1).
This assumes your integers are positive, if not the formula will be slightly more complicated but still possible as a closed form solution.
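As a rough sketch of the closed-form count described above, assuming the list holds distinct positive integers as in the question (the function name count_pairs_closed_form is made up for illustration):

```python
def count_pairs_closed_form(A):
    # Assumption: A contains distinct positive integers.
    # Only elements >= 2 can take part in a valid pair, and every
    # pair of such elements satisfies x*y >= x+y.
    k = sum(1 for v in A if v >= 2)
    return k * (k - 1) // 2

print(count_pairs_closed_form([1, 2, 3, 4, 5]))  # 6
```

Counting k this way still scans the list once; for a sorted run n, n+1, ..., n+k, the count of elements >= 2 follows from the endpoints alone, which is what makes the O(1) claim possible.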
If you want a more mathematical solution, consider that xy > x+y has no solutions for y=1. Otherwise, you can algebraically work this out to x > y/(y-1). Now if we have two consecutive, positive integers and divide the larger by the smaller, we either get exactly 2 (if y=2) or get some fraction between 1 and 2 exclusive. Note that x has to be greater than this y/(y-1) quotient, but also has to be less than y. If y=2, then the only possible x value in our list of positive integers has to be 1, in which case there are no matches because 1 is not greater than 2/1. So this all simplifies to "For each number y in our list, count all of the values x that are in the range of [2,y)." If you do the math, this should come out to adding 1 + 2 + 3 + ... + k, which is simply k(k+1)/2. Again, we're assuming n and k are positive integers; you can derive a slightly more complicated formula when you take into account cases for n <= 0.
But assuming you DO want to stick with a brute force approach, and not do a little mathematical reasoning to find a different approach: I tried out several variations, and here's a faster solution based on the following.
You said the list is already sorted, so I dropped the sorting function.
Likewise, the "else: continue" isn't necessary, so for simplicity I dropped that.
Instead of looping through all x and y values and then checking if x < y, you can just make your second loop check y values in the range from x+1 to N. BUT...
You can use itertools to generate the unique pairs of all numbers in your list A
If you ultimately really only care about the length of the pairs list and not the number pairs themselves, then you can just count the pairs along the way instead of storing them. Otherwise you can run out of memory at high N values.
I get slightly faster results with the equivalent test of x(y-1)-y>0. More so than with x(y-1)>y too.
So here's what I have:
def example4(A):
    mpair_count = 0
    for pair in itertools.combinations(A, 2):
        if pair[0]*(pair[1]-1) - pair[1] > 0:
            mpair_count += 1
    return mpair_count
Here's everything timed:
from timeit import default_timer as timer
import itertools

def minimum_working_example(A):
    A.sort()
    N = len(A)
    mpairs = []
    x = 0
    while x < N:
        for y in range(N):
            if x<y and (A[x]*A[y])>=(A[x]+A[y]):
                mpairs.append([A[x], A[y]])
            else:
                continue
        x+=1
    return len(mpairs)

# Cutting down the range
def example2(A):
    N = len(A)
    mpairs = []
    x = 0
    while x < N:
        for y in range(x+1,N):
            if (A[x]*A[y])>=(A[x]+A[y]):
                mpairs.append([A[x], A[y]])
        x += 1
    return len(mpairs)

# Using itertools
def example3(A):
    mpair_count = 0
    for pair in itertools.combinations(A, 2):
        if pair[0]*pair[1] > sum(pair):
            mpair_count += 1
    return mpair_count

# Using itertools and the different comparison
def example4(A):
    mpair_count = 0
    for pair in itertools.combinations(A, 2):
        if pair[0]*(pair[1]-1) - pair[1] > 0:
            mpair_count += 1
    return mpair_count

# Same as #4, but slightly different
def example5(A):
    mpair_count = 0
    for pair in itertools.combinations(A, 2):
        if pair[0]*(pair[1]-1) > pair[1]:
            mpair_count += 1
    return mpair_count

A = list(range(1,5000))  # a real list, so A.sort() also works on Python 3

start = timer()
print(minimum_working_example(A))
end = timer()
print(end - start)

start = timer()
print(example2(A))
end = timer()
print(end - start)

start = timer()
print(example3(A))
end = timer()
print(end - start)

start = timer()
print(example4(A))
end = timer()
print(end - start)

start = timer()
print(example5(A))
end = timer()
print(end - start)
Result:
12487503
8.29403018155
12487503
7.81883932384
12487503
3.39669140954
12487503
2.79594281764
12487503
2.92911447083

Getting a timeout error in Python while using itertools.permutations: how can I reduce the execution time of this program?

Yet Another Minimax Problem
You are given n non-negative integers a_0, a_1, ..., a_(n-1). We define the score for some permutation (p) of length n to be the maximum of a_p(i) ⊕ a_p(i+1) for 0 ≤ i < n-1.
Find the permutation with the minimum possible score and print its score.
Note: ⊕ is the exclusive-OR (XOR) operator.
code:
# Enter your code here. Read input from STDIN. Print output to STDOUT
import itertools
import math
from operator import xor

def per_me(g):
    max =0
    for r in range(0,len(g)-1):
        if(xor(g[r],g[r+1])>max):
            max=(xor(g[r],g[r+1]))
    return max

n = int(raw_input())
arr = raw_input()
l = list(map(int,arr.split(' ')))
p = itertools.permutations(l)
count = 1000000000000
for i in p:
    if(per_me(i)<count):
        count = per_me(i)
print count
Input:
10
12 0 4 3 1 1 12 3 11 11
output:
8
How can I reduce the time required by this code?
Let i be the largest natural number such that some but not all of the input numbers have bit i set. The minimum score will be the minimum value of a xor b where a has bit i set, and b doesn't, and a and b are in the input list. It's easy to see that the score must be at least this large (since at some point in any permutation there must be a number with bit i set next to one without bit i set), and any permutation that groups all the inputs with bit i set first and the inputs without bit i set afterwards and puts the a and b from above together at the boundary achieves exactly that score (because a xor b has i as its highest bit, and all other adjacent numbers have highest bit less than i).
Even without thinking further, this reduces the problem to an O(n^2) problem, which should be good enough since n <= 3000.
def minxor(aa):
    bits = unbits = 0
    for a in aa:
        bits |= a
        unbits |= ~a
    i = bits & unbits  # bits set in some, but not all, inputs
    while i & (i-1):
        i &= i-1       # clear low bits until only the highest remains
    xs = [a for a in aa if a & i]
    ys = [a for a in aa if (a & i) == 0]
    return min(x ^ y for x in xs for y in ys)

print(minxor([1, 2, 3, 4]))
print(minxor([1, 2, 3]))
print(minxor([7, 6, 5, 4]))
print(minxor([12, 0, 4, 3, 1, 1, 12, 3, 11, 11]))
(Note that in the code, i is not quite the same as in the description -- rather than the index of the largest bit that's present in some but not all of the inputs, it's the value of that bit).
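The argument above can be sanity-checked on small inputs by comparing it against an exhaustive search. The sketch below is only an illustration under stated assumptions: inputs are non-negative integers with at least two elements, and the names brute_force_score and split_bit_score are hypothetical. The split bit is found with int.bit_length rather than the bit-clearing loop used in the answer.

```python
from itertools import permutations

def brute_force_score(nums):
    # Try every ordering; only feasible for a handful of elements.
    return min(
        max(a ^ b for a, b in zip(p, p[1:]))
        for p in permutations(nums)
    )

def split_bit_score(nums):
    # OR together the bits in which any element differs from the first one;
    # these are exactly the bits set in some but not all inputs.
    differing = 0
    for a in nums:
        differing |= a ^ nums[0]
    if differing == 0:
        return 0  # all elements equal, so every adjacent XOR is 0
    top = 1 << (differing.bit_length() - 1)  # value of the highest such bit
    with_bit = [a for a in nums if a & top]
    without_bit = [a for a in nums if not a & top]
    return min(x ^ y for x in with_bit for y in without_bit)

for sample in ([1, 2, 3, 4], [1, 2, 3], [7, 6, 5, 4], [5, 5, 9, 0, 3]):
    assert brute_force_score(sample) == split_bit_score(sample)

print(split_bit_score([12, 0, 4, 3, 1, 1, 12, 3, 11, 11]))  # 8, as in the question
```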
One can optimize further by not comparing every x and y when computing the min. That reduces the solution to O(n log n), and one can find the solution even for quite large input lists:
def bestpair(xs, ys, i):
    if i == 0:
        return 0
    x0 = [x for x in xs if (x&i)==0]
    x1 = [x for x in xs if x&i]
    y0 = [y for y in ys if (y&i)==0]
    y1 = [y for y in ys if y&i]
    choices = []
    if x0 and y0:
        choices.append(bestpair(x0, y0, i//2))
    if x1 and y1:
        choices.append(bestpair(x1, y1, i//2))
    if choices:
        return min(choices)
    return bestpair(xs, ys, i//2) + i

def minxor(aa):
    bits = unbits = 0
    for a in aa:
        bits |= a
        unbits |= ~a
    i = bits & unbits
    while i & (i-1):
        i &= i-1
    return bestpair([a for a in aa if a & i], [a for a in aa if (a & i)==0], i)

print(minxor(range(100000)))

MaxDoubleSliceSum Algorithm

I'm trying to solve the problem of finding the MaxDoubleSliceSum value. Simply, it's the maximum sum of any slice minus one element within this slice (you have to drop one element, and the first and the last element are excluded also). So, technically the first and the last element of the array cannot be included in any slice sum.
Here's the full description:
A non-empty zero-indexed array A consisting of N integers is given.
A triplet (X, Y, Z), such that 0 ≤ X < Y < Z < N, is called a double slice.
The sum of double slice (X, Y, Z) is the total of A[X + 1] + A[X + 2] + ... + A[Y − 1] + A[Y + 1] + A[Y + 2] + ... + A[Z − 1].
For example, array A such that:
A[0] = 3
A[1] = 2
A[2] = 6
A[3] = -1
A[4] = 4
A[5] = 5
A[6] = -1
A[7] = 2
contains the following example double slices:
double slice (0, 3, 6), sum is 2 + 6 + 4 + 5 = 17,
double slice (0, 3, 7), sum is 2 + 6 + 4 + 5 − 1 = 16,
double slice (3, 4, 5), sum is 0.
The goal is to find the maximal sum of any double slice.
Write a function:
def solution(A)
that, given a non-empty zero-indexed array A consisting of N integers, returns the maximal sum of any double slice.
For example, given:
A[0] = 3
A[1] = 2
A[2] = 6
A[3] = -1
A[4] = 4
A[5] = 5
A[6] = -1
A[7] = 2
the function should return 17, because no double slice of array A has a sum of greater than 17.
Assume that:
N is an integer within the range [3..100,000];
each element of array A is an integer within the range [−10,000..10,000].
Complexity:
expected worst-case time complexity is O(N);
expected worst-case space complexity is O(N), beyond input storage (not counting the storage required for input arguments).
Elements of input arrays can be modified.
Here's my try:
def solution(A):
    if len(A) <= 3:
        return 0
    max_slice = 0
    minimum = A[1] # assume the first element is the minimum
    max_end = -A[1] # and drop it from the slice
    for i in xrange(1, len(A)-1):
        if A[i] < minimum: # a new minimum found
            max_end += minimum # put back the false minimum
            minimum = A[i] # assign the new minimum to minimum
            max_end -= minimum # drop the new minimum out of the slice
        max_end = max(0, max_end + A[i])
        max_slice = max(max_slice, max_end)
    return max_slice
What makes me think that this may be close to the correct solution, but that some corner cases may not have been covered, is that 9 out of 14 test cases pass correctly (https://codility.com/demo/results/demoAW7WPN-PCV/)
I know that this can be solved by applying Kadane’s algorithm forward and backward, but I'd really appreciate it if someone could point out what's missing here.
Python solution O(N)
This should be solved using Kadane’s algorithm from two directions.
ref:
Python Codility Solution
C++ solution - YouTube tutorial
JAVA solution
def compute_sum(start, end, step, A):
    res_arr = [0]
    res = 0
    for i in range(start, end, step):
        res = res + A[i]
        if res < 0:
            res_arr.append(0)
            res = 0
            continue
        res_arr.append(res)
    return res_arr

def solution(A):
    if len(A) < 3:
        return 0
    arr = []
    left_arr = compute_sum(1, len(A)-1, 1, A)
    right_arr = compute_sum(len(A)-2, 0, -1, A)
    k = 0
    for i in range(len(left_arr)-2, -1, -1):
        arr.append(left_arr[i] + right_arr[k])
        k = k + 1
    return max(arr)
This is just how I'd write the algorithm.
Assume a start index of X=0, then iteratively sum the squares to the right.
Keep track of the index of the lowest int as you count, and subtract the lowest int from the sum when you use it. This effectively lets you place your Y.
Keep track of the max sum, and the X, Y, Z values for that sum
If the sum ever turns negative, then save the max sum as your result, so long as it is greater than the previous result.
Choose a new X: you should start looking after Y and subtract one from whatever index you find. Then repeat the previous steps until you have reached the end of the list.
How might this be an improvement?
Potential problem case for your code: [7, 2, 4, -18, -14, 20, 22]
-18 and -14 separate the array into two segments. The sum of the first segment is 7+2+4=13, the sum of the second segment is just 20. The above algorithm handles this case, yours might but I'm bad at python (sorry).
EDIT (error and solution): It appears my original answer brings nothing new to what I thought was the problem, but I checked the errors and found the actual error occurs here: [-20, -10, 10, -70, 20, 30, -30] will not be handled correctly. It will exclude the positive 10, so it returns 50 instead of 60.
It appears the asker's code doesn't correctly identify the new starting position (my method for this is shown in case 4). It's important that you restart the iterations at Y instead of Z, because Y effectively deletes the lowest number, which is possibly the Z that fails the test.
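For reference, here is a compact sketch of the forward and backward Kadane approach mentioned in this thread, checked against the example from the problem statement and the failing case given above. The function name max_double_slice_sum and the helper arrays are illustrative choices, not taken from any of the answers.

```python
def max_double_slice_sum(A):
    n = len(A)
    if n < 3:
        return 0
    # ending_at[i]: best (possibly empty) slice sum ending at index i, i >= 1
    # starting_at[i]: best (possibly empty) slice sum starting at index i, i <= n-2
    ending_at = [0] * n
    starting_at = [0] * n
    for i in range(1, n - 1):
        ending_at[i] = max(0, ending_at[i - 1] + A[i])
    for i in range(n - 2, 0, -1):
        starting_at[i] = max(0, starting_at[i + 1] + A[i])
    # Drop A[y] and glue the best slice on its left to the best one on its right.
    return max(ending_at[y - 1] + starting_at[y + 1] for y in range(1, n - 1))

print(max_double_slice_sum([3, 2, 6, -1, 4, 5, -1, 2]))         # 17
print(max_double_slice_sum([-20, -10, 10, -70, 20, 30, -30]))   # 60
```

Both passes are linear and use O(N) extra space, which matches the stated complexity limits.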

Finding the largest palindrome product of two 3-digit numbers: what is the error in logic?

I thought of solving this problem in the following way: start with two variables with value 999, multiplying one by another in a loop that decrements one or the other until a palindrome is found. The code is this:
def is_palindrome(n):
    if str(n) == str(n)[::-1]:
        return True
    else:
        return False

def largest_palindrome_product_of_3_digit():
    x = 999
    y = 999
    for i in reversed(range(x + y + 1)):
        if is_palindrome(x * y):
            return x * y
        if i % 2 == 0:
            x -= 1
        else:
            y -= 1
The result of my method is 698896, while the correct result is 906609. Could you point me where my logic is incorrect?
Here are a couple of hints:
If n=y*x is any number in the range(600000, 700000) (for example) with y<=x, and x<1000, what's the smallest possible value of x?
If n is a palindromic number, both its first and last digit are 6, so what does that imply about the last digits of x & y?
Now generalize and figure out an efficient algorithm. :)
I've never done this problem before, but I just coded a reasonably fast algorithm that's around 2000 times faster than a brute-force search that uses
for x in xrange(2, 1000):
    for y in xrange(2, x+1):
        n = y*x
        #etc
According to timeit.py, the brute-force algorithm takes around 1.29 seconds on my old machine, the algorithm I hinted at above takes around 747 microseconds.
Edit
I've improved my bounds (and modified my algorithm slightly) and brought the time down to 410 µsec. :)
To answer your questions in the comment:
Yes, we can start x at the square root of the beginning of the range, and we can stop y at x (just in case we find a palindromic square).
What I was getting at with my 2nd hint is that for x=10*I+i, y=10*J+j, we don't need to test all 81 combinations of i and j, we only need to test the ones where (i*j)%10 equals the digit we want. So if we know that our palindrome starts and ends with 9 then (i, j) must be in [(1, 9), (3, 3), (7, 7), (9, 1)].
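The last-digit filter from the second hint can be illustrated with a tiny snippet. It only lists the units-digit pairs for a wanted final digit and is not the answerer's actual algorithm; the names wanted and pairs are made up here.

```python
# Units-digit pairs (i, j) whose product ends in the wanted digit.
wanted = 9
pairs = [(i, j) for i in range(10) for j in range(10) if (i * j) % 10 == wanted]
print(pairs)  # [(1, 9), (3, 3), (7, 7), (9, 1)]
```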
I don't think I should post my actual code here; it's considered bad form on SO to post complete solutions to Project Euler problems. And perhaps some SO people don't even like it when people supply hints. Maybe that's why I got down-voted...
You're missing possible numbers.
You're considering O(x+y) numbers and you need to consider O(x * y) numbers. Your choices are, essentially, to either loop one of them from 999, down to 1, then decrement the other and...
Simple demonstration:
>>> want = set()
>>> for x in [1, 2, 3, 4, 5]:
...     for y in [1, 2, 3, 4, 5]:
...         want.add(x * y)
...
>>> got = set()
>>> x = 5
>>> y = 5
>>> for i in reversed(range(x + y + 1)):
...     got.add(x * y)
...     if i % 2:
...         x -= 1
...     else:
...         y -= 1
...
>>> want == got
False
Alternatively, you do know the top of the range (999 * 999) and you can generate all palindromic numbers in that range, from the highest to the lowest. From there, doing a prime factorization and checking if there's a split of the factors that multiply to two numbers in the range [100,999] is trivial.
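A rough sketch of the palindrome-first idea follows. Note the swap: instead of a prime factorization it uses plain trial division by 3-digit candidates, and it assumes the answer has six digits (which holds here, since the question states the correct result is 906609). The function name is hypothetical.

```python
def largest_palindrome_product():
    # 999 * 999 = 998001, so the largest six-digit palindrome to try is 997799.
    for half in range(997, 99, -1):
        s = str(half)
        p = int(s + s[::-1])  # e.g. 906 -> 906609
        for d in range(999, 99, -1):
            if d * d < p:
                break  # the larger factor must be at least sqrt(p)
            if p % d == 0 and p // d >= 100:
                return p, d, p // d
    return None

print(largest_palindrome_product())  # (906609, 993, 913)
```

Palindromes are visited from highest to lowest, so the first one that splits into two 3-digit factors is the answer.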

splitting a list dynamically with range and value to split

I want to split a value into the number of splits provided. For example, if I have value = 165340
and split = 5, then the list should become ['0-33068', '33069-66137', '66138-99204', '99205-132272', '132273-165340']...
So far I have just come up with something like this, but it is not dynamic...
So I am wondering how I can build a list of strings of number ranges, each spaced val/split apart.
for i in range(split):
    if i==0:
        lst.append('%s-%s' % (i, val/split))
    elif i==1:
        lst.append('%s-%s' % (val/split+i, val/split*2+1))
    elif i == 2:
        lst.append('%s-%s' % (val/split*i+2, val/split*3))
    elif i == 3:
        lst.append('%s-%s' % (val/split*i+1, val/split*4))
    elif i == 4:
        lst.append('%s-%s' % (val/split*i+1, val/split*5))
    else:
        pass
FINAL:
I made a bunch of attempts here, especially in using remainder = value % numsplits, then int(i * remainder // numsplits) to try and keep things close. Eventually, though, I had to give up and go back to floating point which seems to give the closest results. The usual floating point concerns apply.
def segment(value, numsplits):
    return ["{}-{}".format(
        int(round(1 + i * value/(numsplits*1.0),0)),
        int(round(1 + i * value/(numsplits*1.0) +
                  value/(numsplits*1.0)-1, 0))) for
        i in range(numsplits)]
>>> segment(165340, 5)
['1-33068', '33069-66136', '66137-99204', '99205-132272', '132273-165340']
>>> segment(7, 4)
['1-2', '3-4', '4-5', '6-7']
I don't see a huge issue with this one. I did start at 1 instead of 0, but that's not necessary (change both the int(round(1 + i * ... to int(round(i * ... to change that). Old results follow.
value = 165340
numsplits = 5
result = ["{}-{}".format(i + value//numsplits*i, i + value//numsplits*i + value//numsplits) for i in range(numsplits)]
Probably worth tossing in a function
def segment(value,numsplits):
    return ["{}-{}".format(value*i//numsplits, 1 + value//numsplits*i + value//numsplits) for i in range(numsplits)]
The following will cut it off at your value
def segment(value, numsplits):
    return ["{}-{}".format(max(0,i + value*i//numsplits), min(value,i + value*i//numsplits + value//numsplits)) for i in range(numsplits)]
To answer this question, it's important to know exactly how we should treat 0 - but it doesn't seem like you've asked yourself this question. The intervals in your example output are inconsistent; you're starting with 0 in the first interval and the first two intervals both have 33,069 elements (counting 0) in them, but you're also ending your last interval at 165340. If 0 and 165340 are both counted in the number of elements, then 165340 is not divisible into five even intervals.
Here are a few different solutions that might help you understand the problem.
Even intervals, counting from zero
Let's start with the assumption that you really do want both 0 and the "top" value counted as elements and displayed in the result. In other words, the value 11 would actually indicate the following 12-element range:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
And be evenly split into the following non-negative intervals:
['0-3', '4-7', '8-11']
If we're only concerned with evenly-divisible cases, we can use a fairly short function (NOTE: These solutions are valid for Python 3.x, or for Python 2.x with from __future__ import division):
>>> def evenintervals(value, n):
...     binsize = (value + 1) // n
...     intervals = ((x * binsize, (x + 1) * binsize - 1) for x in range(n))
...     return ['{}-{}'.format(x, y) for x, y in intervals]
...
>>> evenintervals(11, 3)
['0-3', '4-7', '8-11']
>>> evenintervals(17, 2)
['0-8', '9-17']
However, this function deals with 165340 (and any other not-evenly-divisible case) by dropping some numbers off the end:
>>> evenintervals(165340, 5)
['0-33067', '33068-66135', '66136-99203', '99204-132271', '132272-165339']
From a purely mathematical perspective, this just doesn't work. However, we could fudge it a bit if for some reason you want to display 0, but not actually count it as an element of the first interval.
Even intervals, counting from one
Here's a function that doesn't count 0 as an element of the list, but does give you the option of displaying it, if you're just that zany:
>>> def evenintervals1(value, n, show_zero=False):
...     binsize = value // n
...     intervals = [[x * binsize + 1, (x + 1) * binsize] for x in range(n)]
...     if show_zero:
...         intervals[0][0] = 0
...     return ['{}-{}'.format(x, y) for x, y in intervals]
...
>>> evenintervals1(20, 4)
['1-5', '6-10', '11-15', '16-20']
>>> evenintervals1(20, 4, show_zero=True)
['0-5', '6-10', '11-15', '16-20']
This version of the function might be the closest thing to what you asked for in your question, even though it doesn't show the exact values you gave in your example output:
>>> evenintervals1(165340, 5, show_zero=True)
['0-33068', '33069-66136', '66137-99204', '99205-132272', '132273-165340']
But we still have problems with inputs that aren't evenly divisible. What if we wanted a more general solution?
Uneven intervals
Let's think about how to deal with a wider range of inputs. We should be able to produce, from any positive integer n, anywhere from 1 to n non-overlapping ranges of positive integers. In other words, if our integer is 5, we want to be able to produce a list with as many as five ranges. But how should we distribute "extra" elements, in order to make the ranges as even as possible?
We probably don't want to distribute them randomly. We could just lengthen or shorten the last range in the list, but that has the potential to be very lop-sided:
# 40 split 7 times, adding remainder to last item
['1-5', '6-10', '11-15', '16-20', '21-25', '26-30', '31-40']
# 40 split 7 times, subtracting excess from last item
['1-6', '7-12', '13-18', '19-24', '25-30', '31-36', '37-40']
In the former case the last element is 100% larger than the others and in the latter case it's 33% smaller. If you're splitting a very large value into a much smaller number of intervals, this may not be as much of a problem.
More likely, we want a function that produces the most even set of ranges possible. I'm going to do this by spreading the remainder of the division out among the first elements of the list, with a little help from itertools:
>>> from itertools import zip_longest # izip_longest for Python 2.7
>>> def anyintervals(value, n):
...     binsize, extras = value // n, value % n
...     intervals = []
...     lower = 0
...     upper = 0
...     for newbinsize in map(sum, zip_longest([binsize] * n, [1] * extras, fillvalue=0)):
...         lower, upper = upper + 1, upper + newbinsize
...         intervals.append((lower, upper))
...     return ['{}-{}'.format(x, y) for x, y in intervals]
...
>>> anyintervals(11, 3)
['1-4', '5-8', '9-11']
>>> anyintervals(17, 2)
['1-9', '10-17']
Finally, with the example inputs given in the OP:
>>> anyintervals(165340, 5)
['1-33068', '33069-66136', '66137-99204', '99205-132272', '132273-165340']
If it were really important to show the first interval starting at zero, we could apply the same logic here that was used in evenintervals1 to modify the very first integer in intervals before returning, or write a similar function to this one that started counting at zero.
I did implement another version that distributes the "extras" among the last ranges rather than the first, and there are certainly many other implementations that you might be interested in fiddling around with, but those solutions are left as an exercise to the reader. ;)
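The same "spread the remainder over the first ranges" idea can also be sketched without itertools, using divmod. This is an alternative formulation under the same assumptions (positive value, 1 <= n <= value, counting from one); the name spread_intervals is hypothetical and not from the answer.

```python
def spread_intervals(value, n):
    """Split 1..value into n ranges whose sizes differ by at most one."""
    binsize, extras = divmod(value, n)
    intervals = []
    upper = 0
    for i in range(n):
        # The first `extras` ranges each absorb one leftover element.
        size = binsize + (1 if i < extras else 0)
        lower, upper = upper + 1, upper + size
        intervals.append("{}-{}".format(lower, upper))
    return intervals

print(spread_intervals(11, 3))  # ['1-4', '5-8', '9-11']
print(spread_intervals(17, 2))  # ['1-9', '10-17']
```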
One possibility using numpy:
from numpy import arange

v = 165340
s = 5
splits = arange(s + 1) * (v // s)  # integer division keeps the boundaries whole
lst = ['%d-%d' % (splits[idx], splits[idx+1]) for idx in range(s)]
print('\n'.join(lst))
output:
0-33068
33068-66136
66136-99204
99204-132272
132272-165340
