Code takes a lot of time to run for large numbers


Given N, the number of squares in a board (e.g. a Scrabble or chess board) of dimensions AxB, this code tries to determine all
dimension combinations that give exactly N squares in the board.
Example: N = 8
There are four possible dimension combinations that give exactly 8 squares in the board. So, the code outputs board dimensions
1x8, 2x3, 3x2 and 8x1. The 8x1 boards have eight 1x1 squares; the 3x2 boards have six 1x1 squares and two 2x2 squares.
Here is my solution:
def dims(num_sqrs):
    dim_list=[]
    for i in range(1,num_sqrs+1):
        temp = []
        for x in range(1,num_sqrs+1):
            res = 0
            a = i
            b = x
            while (a != 0) and (b !=0):
                res = res + (a*b)
                a = a -1
                b = b-1
            if res == num_sqrs:
                dim_list.append((i,x))
    print(dim_list)

dims(8)
However, this code takes too much time to run for large values of N.
Any suggestion to optimize the efficiency of the code will be much appreciated.

Here are two pretty obvious observations:
The square count for AxB is the same as the square count for BxA
If C>B then the square count for AxC is greater than the square count for AxB
Given those facts, it should be clear that:
We only need to consider AxB for A≤B, since we can just add BxA to the list if A≠B
For a given A and N, there is at most one value of B which has a square count of N.
The code below is based on the above. It tries each AxA in turn, for each one checking to see if there is some B≥A which produces the correct square count. It stops when the square count for AxA exceeds N.
Now, to find the correct value of B, a couple of slightly less obvious observations.
Suppose the square count for AxA is N. Then the square count for (A+1)x(A+1) is N + (A+1)².
Proof: Every square in AxA can be identified by its upper left co-ordinate [i, j] and its size s. I'll write that as [s: i, j]. (Here I'm assuming that coordinates are zero-based and go from top to bottom and left to right.)
For each such square, 0 ≤ i, 0 ≤ j, i + s ≤ A and j + s ≤ A.
Now, suppose we change each square [s: i, j] into the square based at the same coordinate but with a size one larger, [s+1: i, j]. That new square is a square in (A+1)x(A+1), because i + s + 1 ≤ A + 1 (and similarly for j). So that transformation gives us every square in (A+1)x(A+1) whose size is at least 2. The only squares which we've missed are the squares of size 1, and there are exactly (A+1)×(A+1) of them.
Suppose the square count for AxB is N, and B≥A. Then the square count for Ax(B+1) is N + the sum of each integer from 1 to A. (These sums are the triangular numbers, A×(A+1)/2; I think that's well-known.)
Proof: The squares in Ax(B+1) are precisely the squares in AxB plus the squares whose right-hand side includes the last column of Ax(B+1). So we only need to count those. There is one such square of size A, two of size A-1, three of size A-2, and so on up to A squares of size 1.
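Both observations can be checked numerically. The following is a minimal sketch, not part of the original answer; `count_squares` is a hypothetical brute-force helper that sums the number of squares of each size.

```python
def count_squares(a, b):
    """Count squares of every size in an a-by-b board by brute force."""
    # There are (a - k) * (b - k) squares of side k + 1.
    return sum((a - k) * (b - k) for k in range(min(a, b)))


for a in range(1, 30):
    # Growing AxA to (A+1)x(A+1) adds (A+1)^2 squares.
    assert count_squares(a + 1, a + 1) - count_squares(a, a) == (a + 1) ** 2
    for b in range(a, 60):
        # Adding one column to AxB (with B >= A) adds the A-th triangular number.
        assert count_squares(a, b + 1) - count_squares(a, b) == a * (a + 1) // 2
```

The ranges are arbitrary; the loop only confirms the two increments on small boards.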
So for a given A, we can compute the square count for AxA and the increment in the square count for each increase in B. If the increment evenly divides the difference between the target count and the count of AxA, then we've found an AxB.
The program below also relies on one more algebraic identity, which is pretty straight-forward: the sum of two consecutive triangular numbers is a square. That's obvious by just arranging the two triangles. The larger one contains the diagonal of the square. These facts are used to compute the next base value and increment for the next value of A.
def finds(n):
    a = 1
    base = 1 # Square count for AxA
    inc = 1 # Difference between count(AxB) and count(Ax(B+1))
    rects = []
    while base < n:
        if (n - base) % inc == 0:
            rects.append((a, a + (n - base) // inc))
        a += 1
        newinc = inc + a
        base += inc + newinc
        inc = newinc
    if base == n:
        return rects + [(a, a)] + list(map(lambda p:p[::-1], reversed(rects)))
    else:
        return rects + list(map(lambda p:p[::-1], reversed(rects)))
The slowest part of that function is appending the mirrored (BxA) copies of the AxB solutions at the end, which I only did to simplify counting the solutions correctly. My first try, which was almost twice as fast, used the loop while base <= n and then just returned rects. But it's still fast enough.
For example:
>>> finds(1000000)
[(1, 1000000), (4, 100001), (5, 66668), (15, 8338), (24, 3341),
(3341, 24), (8338, 15), (66668, 5), (100001, 4), (1000000, 1)]
>>> finds(760760)
[(1, 760760), (2, 253587), (3, 126794), (4, 76077), (7, 27172),
(10, 13835), (11, 11530), (12, 9757), (13, 8364), (19, 4010),
(20, 3629), (21, 3300), (38, 1039), (39, 988), (55, 512),
(56, 495), (65, 376), (76, 285), (285, 76), (376, 65),
(495, 56), (512, 55), (988, 39), (1039, 38), (3300, 21),
(3629, 20), (4010, 19), (8364, 13), (9757, 12), (11530, 11),
(13835, 10), (27172, 7), (76077, 4), (126794, 3), (253587, 2),
(760760, 1)]
The last one came out of this test, which took a few seconds: (It finds each successive maximum number of solutions, if you don't feel like untangling the functional elements)
>>> from functools import reduce
>>> print('\n'.join(
map(lambda l:' '.join(map(lambda ab:"%dx%d"%ab, l)),
reduce(lambda a,b: a if len(b) <= len(a[-1]) else a + [b],
(finds(n) for n in range(2,1000001)),[[(1,1)]]))))
1x1
1x2 2x1
1x5 2x2 5x1
1x8 2x3 3x2 8x1
1x14 2x5 3x3 5x2 14x1
1x20 2x7 3x4 4x3 7x2 20x1
1x50 2x17 3x9 4x6 6x4 9x3 17x2 50x1
1x140 2x47 3x24 4x15 7x7 15x4 24x3 47x2 140x1
1x280 4x29 5x20 6x15 7x12 12x7 15x6 20x5 29x4 280x1
1x770 2x257 3x129 4x78 10x17 11x15 15x11 17x10 78x4 129x3 257x2 770x1
1x1430 2x477 3x239 4x144 10x29 11x25 12x22 22x12 25x11 29x10 144x4 239x3 477x2 1430x1
1x3080 2x1027 3x514 4x309 7x112 10x59 11x50 20x21 21x20 50x11 59x10 112x7 309x4 514x3 1027x2 3080x1
1x7700 2x2567 3x1284 4x771 7x277 10x143 11x120 20x43 21x40 40x21 43x20 120x11 143x10 277x7 771x4 1284x3 2567x2 7700x1
1x10010 2x3337 3x1669 4x1002 10x185 11x155 12x132 13x114 20x54 21x50 50x21 54x20 114x13 132x12 155x11 185x10 1002x4 1669x3 3337x2 10010x1
1x34580 2x11527 3x5764 4x3459 7x1237 12x447 13x384 19x188 20x171 38x59 39x57 57x39 59x38 171x20 188x19 384x13 447x12 1237x7 3459x4 5764x3 11527x2 34580x1
1x40040 2x13347 3x6674 4x4005 7x1432 10x731 11x610 12x517 13x444 20x197 21x180 39x64 64x39 180x21 197x20 444x13 517x12 610x11 731x10 1432x7 4005x4 6674x3 13347x2 40040x1
1x100100 2x33367 3x16684 4x10011 7x3577 10x1823 11x1520 12x1287 13x1104 20x483 21x440 25x316 39x141 55x83 65x68 68x65 83x55 141x39 316x25 440x21 483x20 1104x13 1287x12 1520x11 1823x10 3577x7 10011x4 16684x3 33367x2 100100x1
1x340340 2x113447 3x56724 4x34035 7x12157 10x6191 11x5160 12x4367 13x3744 20x1627 21x1480 34x583 39x449 55x239 65x180 84x123 123x84 180x65 239x55 449x39 583x34 1480x21 1627x20 3744x13 4367x12 5160x11 6191x10 12157x7 34035x4 56724x3 113447x2 340340x1
1x760760 2x253587 3x126794 4x76077 7x27172 10x13835 11x11530 12x9757 13x8364 19x4010 20x3629 21x3300 38x1039 39x988 55x512 56x495 65x376 76x285 285x76 376x65 495x56 512x55 988x39 1039x38 3300x21 3629x20 4010x19 8364x13 9757x12 11530x11 13835x10 27172x7 76077x4 126794x3 253587x2 760760x1

I think the critical detail is that @Qudus is looking for boards where there are N squares of any size.
One simple optimization is to just break when res > n. Another optimization to make it about twice as fast is to only run it for boards where length >= width.
def dims(num_sqrs):
    dim_list=[]
    for i in range(1, num_sqrs + 1):
        temp = []
        for x in range(1, i + 1):
            res = 0
            a = i
            b = x
            while (a != 0) and (b != 0):
                res = res + (a * b)
                a = a - 1
                b = b - 1
                if res > num_sqrs:
                    break
            if res == num_sqrs:
                dim_list.append((i, x))
                if i != x:
                    dim_list.append((x, i))
    print(dim_list)
Here's a much faster solution that takes a different approach:
def dims(num_sqrs):
    dim_list = []
    sum_squares = [0]
    sums = [0]
    for i in range(1, num_sqrs + 1):
        sums.append(sums[-1] + i)
        sum_squares.append(sum_squares[-1] + i * i)
    for i in range(1, num_sqrs + 1):
        if sum_squares[i] > num_sqrs:
            break
        if sum_squares[i] == num_sqrs:
            dim_list.append((i, i))
            break
        for x in range(i + 1, num_sqrs + 1):
            total_squares = sum_squares[i] + sums[i] * (x - i)
            if total_squares == num_sqrs:
                dim_list.append((x, i))
                dim_list.append((i, x))
                break
            if total_squares > num_sqrs:
                break
    return dim_list

Start with basic algebraic analysis. I derived my own formula for the sums of various sizes. From the initial analysis, we get that for a board of size n x m, there are (n-k)*(m-k) squares of size k+1. Summing this for k in [0, min(m, n)), we have a simple calculation formula:
sum(((n-k) * (m-k) for k in range(0, min(n, m))))
I expanded the product to nm - k(n+m) + k^2, re-derived the individual series sums, and made a non-iterative formula, assuming n <= m:
n * n * m
- n * (n - 1) / 2 * (n + m)
+ ((n - 1) * n * (2 * n - 1))/6
This first link then spoiled my fun with an even shorter formula:
t = m - n
n * (n + 1) / 6 * (2 * n + 3 * t + 1)
which follows from mine with a bit of nifty rearrangement of terms.
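To check that the three forms agree, here is a small sketch using integer arithmetic only. The function names are mine, not from the answer, and n <= m is assumed throughout.

```python
def squares_by_sum(n, m):
    """Direct summation: (n - k) * (m - k) squares of side k + 1."""
    return sum((n - k) * (m - k) for k in range(min(n, m)))


def squares_expanded(n, m):
    """Expanded series form; each division is exact, so // loses nothing."""
    return (n * n * m
            - n * (n - 1) // 2 * (n + m)
            + (n - 1) * n * (2 * n - 1) // 6)


def squares_short(n, m):
    """Short form with t = m - n."""
    t = m - n
    return n * (n + 1) * (2 * n + 3 * t + 1) // 6


# Compare the three forms on small boards with n <= m.
for n in range(1, 25):
    for m in range(n, 50):
        assert squares_by_sum(n, m) == squares_expanded(n, m) == squares_short(n, m)
```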
Now to the point of this exercise: given a desired count of squares, Q, find all rectangle dimensions (n, m) that have exactly that many squares. Starting with the formula above:
q = n * (n + 1) / 6 * (2 * n + 3 * t + 1)
Since we're given Q, the desired value for q, we can iterate through all values of n, finding whether there is a positive, integral value for t that satisfies the formula. Start by solving this for t:
t = (6/(n*(n+1)) * q - 2*n - 1) / 3
combining the denominators:
t = (6*q) / (3*n*(n+1)) - (2*n + 1)/3
I'll use the first version. Since a solution of n x m implies a solution of m x n, we can limit our search to only those cases n <= m. Also, since the numerator shrinks (negative n^3 term), we can limit the search for values of n that allow t >= 1 -- in other words, have the combined numerator at least as large as the denominator:
numer = 6 * num_sqrs - n * (n+1) * (2*n+1)
denom = 3 * n * (n+1)
Solving this:
num_sqrs > (n * (n+1) * (n+2)) / 3
Thus, the cube root of 3 * num_sqrs is a convenient upper bound for our loop limit, since n^3 < n * (n+1) * (n+2).
This gives us a simple iteration loop in the code:
from math import ceil

def dims(num_sqrs):
    dim = [(1, num_sqrs)]
    limit = ceil((3*num_sqrs)**(1.0/3.0))
    for n in range(2, limit):
        numer = 6 * num_sqrs - n * (n+1) * (2*n+1)
        denom = 3 * n * (n+1)
        if numer % denom == 0:
            t = numer // denom
            if t >= 0:
                dim.append((n, n+t))
    return dim
Output for a couple of test cases:
>>> print(dims(8))
[(1, 8), (2, 3)]
>>> print(dims(2000))
[(1, 2000), (2, 667), (3, 334), (4, 201)]
>>> print(dims(1000000))
[(1, 1000000), (4, 100001), (5, 66668), (15, 8338), (24, 3341)]
>>> print(dims(21493600))
[(1, 21493600), (4, 2149361), (5, 1432908), (15, 179118), (24, 71653), (400, 401)]
These return immediately, so I expect that this solution is fast enough for OP's purposes.
It's quite possible that a parameterized equation would give us direct solutions, rather than iterating through possibilities. I'll leave that for the Project Euler folks. :-)
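The function above returns only the pairs with n <= m. If the mirrored boards are wanted as well, they can be appended afterwards. The following is a sketch under that assumption; `dims_with_mirrors` is a hypothetical name, and the extra `+ 1` on the loop limit is my own guard against floating-point rounding of the cube root.

```python
from math import ceil


def dims_with_mirrors(num_sqrs):
    """All (n, m) boards with exactly num_sqrs squares, ordered by n."""
    half = [(1, num_sqrs)]
    limit = ceil((3 * num_sqrs) ** (1.0 / 3.0))
    for n in range(2, limit + 1):  # +1: cheap guard against float rounding
        numer = 6 * num_sqrs - n * (n + 1) * (2 * n + 1)
        denom = 3 * n * (n + 1)
        if numer >= 0 and numer % denom == 0:
            half.append((n, n + numer // denom))
    # Mirror every non-square board, largest n first, so the output stays sorted.
    mirrored = [(m, n) for n, m in reversed(half) if n != m]
    return half + mirrored


print(dims_with_mirrors(8))   # [(1, 8), (2, 3), (3, 2), (8, 1)]
```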

This uses the formula derived in the link provided by the OP. The only real optimization is trying not to look at dimensions that cannot produce the result. Pre-loading the results with the two end cases (figures = [(1,n_squares),(n_squares,1)]) saved a lot with big numbers. I think there are other chunks that can be discarded but I haven't figured them out yet.
def h(n_squares):
    # easiest case for a n x m figure:
    # n = 1 and m = n_squares
    figures = [(1,n_squares),(n_squares,1)]
    for n in range(2, n_squares+1):
        for m in range(n, n_squares+1):
            t = m - n
            x = int((n * (n + 1) / 6) * ((2 * n) + (3 * t) + 1))
            if x > n_squares:
                break
            if x == n_squares:
                figures.extend([(n,m),(m,n)])
                #print(f'{n:>6} x {m:<6} has {n_squares} squares')
        if x > n_squares and n == m:
            break
    return figures
It also doesn't make lots of lists which can blow up your computer with really big numbers like 21493600 (400x401).
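One thing to be aware of is that the expression for x goes through floating-point division. For the board sizes this loop reaches that is unlikely to matter, but the same formula can be evaluated with integers only. This is a sketch; `square_count` and `square_count_float` are hypothetical names used for comparison.

```python
def square_count(n, m):
    """Squares in an n x m board (n <= m), using integer arithmetic only."""
    t = m - n
    # n * (n + 1) * (2n + 3t + 1) is always divisible by 6, so // is exact.
    return n * (n + 1) * (2 * n + 3 * t + 1) // 6


def square_count_float(n, m):
    """The same formula written with true division, as in h() above."""
    t = m - n
    return int((n * (n + 1) / 6) * ((2 * n) + (3 * t) + 1))


print(square_count(400, 401))   # 21493600
# For very large boards the float version can no longer represent the result.
print(square_count(10**9, 10**9) == square_count_float(10**9, 10**9))   # False
```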
Formula derivation from link in OP's comment (in case that resource disappears):
Courtesy: Doctor Anthony, The Math Forum
If we have an 8 x 9 board the numbers of squares are as follows:
Size of Square    Number of Squares
--------------    -----------------
1 x 1             8 x 9 = 72
2 x 2             7 x 8 = 56
3 x 3             6 x 7 = 42
4 x 4             5 x 6 = 30
5 x 5             4 x 5 = 20
6 x 6             3 x 4 = 12
7 x 7             2 x 3 = 6
8 x 8             1 x 2 = 2
-----------------------------------
Total                   = 240
For the general case of an n x m board, where m = n + t
We require
$$\sum_{r=1}^{n} r(r + t) = \sum_{r=1}^{n} (r^2 + rt) = \frac{n(n + 1)(2n + 1)}{6} + \frac{t\,n(n + 1)}{2} = \frac{n(n + 1)}{6}\,(2n + 1 + 3t)$$
No. of squares = [n(n + 1)/6]*[2n + 3t + 1] .......(1)
In the example above t = 1 and so
No. of squares = 8 x 9/6[16 + 3 + 1]
= (72/6)[20]
= 240 (as required)
The general formula for an (n x n+t) board is that given in (1)
above.
No. of squares = [n(n + 1)/6]*[2n + 3t + 1]

Related

Geometric series: calculate quotient and number of elements from sum and first & last element

Creating evenly spaced numbers on a log scale (a geometric progression) can easily be done for a given base and number of elements if the starting and final values of the sequence are known, e.g., with numpy.logspace and numpy.geomspace. Now assume I want to define the geometric progression the other way around, i.e., based on the properties of the resulting geometric series. If I know the sum of the series as well as the first and last element of the progression, can I compute the quotient and number of elements?
For instance, assume the first and last elements of the progression are 1 and 15 and the sum of the series should be equal to 50. I know from trial and error that it works out for n=9 and r≈1.404, but how could these values be computed?

You have enough information to solve it:
Sum of series = a + a*r + a*(r^2) ... + a*(r^(n-1))
= a*((r^n)-1)/(r-1)
= ((last element * r) - a)/(r-1), since last element = a*(r^(n-1))
Given the sum of series, a, and the last element, you can use the above equation to find the value of r.
Plugging in values for the given example:
50 = 1 * ((15*r)-1) / (r-1)
50r - 50 = 15r - 1
35r = 49
r = 1.4
Then, using sum of series = a*((r^n)-1)/(r-1):
50 = 1*((1.4^n)-1)/(1.4-1)
21 = 1.4^n
n = log(21)/log(1.4) = 9.05
You can approximate n and recalculate r if n isn't an integer.

We have to reconstruct the geometric progression, i.e. obtain a, q, m (here ^ means raising to a power):
a, a * q, a * q^2, ..., a * q^m
if we know first, last, total:
first = a # first item
last = a * q^m # last item
total = a * (q^(m + 1) - 1) / (q - 1) # sum
Solving these equations we can find
a = first
q = (total - first) / (total - last)
m = log(last / a) / log(q)
If you want the number of items n, note that n == m + 1
Code:
import math
...
def Solve(first, last, total):
    a = first
    q = (total - first) / (total - last)
    n = math.log(last / a) / math.log(q) + 1
    return (a, q, n)
If you put your data (1, 15, 50) you'll get the solution
a = 1
q = 1.4
n = 9.04836151801382 # not integer
Since n is not an integer, you probably want to adjust: let last == 15 be exact and let total vary. In this case q = (last / first) ^ (1 / (n - 1)) and total = first * (q ^ n - 1) / (q - 1)
a = 1
q = 1.402850552006674
n = 9
total = 49.752 # now n is an integer, but total != 50
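The two steps (solve for a real-valued n, then round and refit) can be put together as a short sketch. The helper names `solve` and `refit` are mine, and holding first and last fixed while total drifts is the same choice made above.

```python
import math


def solve(first, last, total):
    """Real-valued quotient q and item count n from first, last and total."""
    q = (total - first) / (total - last)
    n = math.log(last / first) / math.log(q) + 1
    return q, n


def refit(first, last, n):
    """Keep first and last exact for an integer n; total is recomputed."""
    q = (last / first) ** (1 / (n - 1))
    total = first * (q ** n - 1) / (q - 1)
    return q, total


q, n = solve(1, 15, 50)
q_fit, total_fit = refit(1, 15, round(n))
print(q, n)               # roughly 1.4 and 9.048
print(q_fit, total_fit)   # roughly 1.40285 and 49.752
```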

You have to solve the following two equations for r and n:
a:= An / Ao = r^(n - 1)
and
s:= Sn / Ao = (r^n - 1) / (r - 1)
You can eliminate n by
s = (r a - 1) / (r - 1)
and solve for r. Then n follows by log(a) / log(r) + 1.
In your case, from s = 50 and a = 15, we obtain r = 7/5 = 1.4 and n = 9.048...
It makes sense to round n to 9, but then r^8 = 15 (r ~ 1.40285) and r = 1.4 are not quite compatible.

Dividing an even number into N parts each part being a multiple of 2

Let's assume I have the number 100 which I need to divide into N parts each of which shouldn't exceed 30 initially. So the initial grouping would be (30,30,30). The remainder (which is 10) is to be distributed among these three groups by adding 2 to each group in succession, thus ensuring that each group is a multiple of 2. The desired output should therefore look like (34,34,32).
Note: The original number is always even.
I tried solving this in Python and this is what I came up with. Clearly it's not working in the way I thought it would. It distributes the remainder by adding 1 (and not 2, as desired) iteratively to each group.
num = 100
parts = num//30 #Number of parts into which 'num' is to be divided

def split(a, b):
    result = ([a//b + 1] * (a%b) + [a//b] * (b - a%b))
    return(result)

print(split(num, parts))
Output:
[34, 33, 33]
Desired output:
[34, 34, 32]

Simplified problem: forget about multiples of 2
First, let's simplify your problem for a second. Forget about the multiples of 2. Imagine you want to split a non-necessarily-even number n into k non-necessarily-even parts.
Obviously the most balanced solution is to have some parts be n // k, and some parts be n // k + 1.
How many of which? Let's call r the number of parts with n // k + 1. Then there are k - r parts with n // k, and all the parts sum up to:
(n // k) * (k - r) + (n // k + 1) * r
== (n // k) * (k - r) + (n // k) * r + r
== (n // k) * (k - r + r) + r
== (n // k) * k + r
But the parts should sum up to n, so we need to find r such that:
n == (n // k) * k + r
Happily, you might recognise Euclidean division here, with n // k being the quotient and r being the remainder.
This gives us our split function:
def split(n, k):
    d,r = divmod(n, k)
    return [d+1]*r + [d]*(k-r)
Testing:
print( split(50, 3) )
# [17, 17, 16]
Splitting into multiples of 2
Now back to your split_even problem. Now that we have the generic function split, a simple way to solve split_even is to use split:
def split_even(n, k):
    return [2 * x for x in split(n // 2, k)]
Testing:
print( split_even(100, 3) )
# [34, 34, 32]
Generalisation: multiples of m
It's trivial to do the same thing with multiples of a number m other than 2:
def split_multiples(n, k, m=2):
    return [m * x for x in split(n // m, k)]
Testing:
print( split_multiples(102, 4, 3) )
# [27, 27, 24, 24]
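Note that n // m silently drops any remainder, so the parts sum to n only when n is a multiple of m. The following sketch makes that assumption explicit. `split_multiples_checked` is a hypothetical variant, and raising ValueError is my choice rather than part of the answer.

```python
def split(n, k):
    """Split n into k parts that differ by at most 1."""
    d, r = divmod(n, k)
    return [d + 1] * r + [d] * (k - r)


def split_multiples_checked(n, k, m=2):
    """Split n into k multiples of m; refuse inputs that cannot sum to n."""
    if n % m:
        raise ValueError(f"{n} is not a multiple of {m}")
    return [m * x for x in split(n // m, k)]


parts = split_multiples_checked(100, 3)
print(parts, sum(parts))   # [34, 34, 32] 100
```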

This solution is not very clear or easy to follow, but it does not need any loops.
Full code:
def split(a,b):
    lower = (a//b//2) * 2
    num = a % (b*2) // 2
    return [lower + 2] * num + [lower] * (b - num)
Explanation:
First, get the value of the lower parts: we round the result of the division (a // b) down to the nearest even value ((x // 2) * 2).
To get the number of higher values: we take the remainder of dividing a by twice the number of parts (b * 2) and halve it, since each higher part absorbs 2 of the remainder.
Last: the result is the higher value (lower + 2) repeated num times, with the lower value filling the remaining b - num places.
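As a sanity check, the loop-free version can be compared against the divmod-based approach for even inputs. This is a sketch; `split_no_loop` and `split_even` are my names for the two variants, and only even, non-negative a is assumed.

```python
def split_even(n, k):
    """divmod-based reference: halve, split evenly, double again."""
    d, r = divmod(n // 2, k)
    return [2 * (d + 1)] * r + [2 * d] * (k - r)


def split_no_loop(a, b):
    """Loop-free version from the explanation above."""
    lower = (a // b // 2) * 2
    num = a % (b * 2) // 2
    return [lower + 2] * num + [lower] * (b - num)


# Compare the two for even totals and a range of part counts.
for a in range(0, 200, 2):
    for b in range(1, 12):
        assert split_no_loop(a, b) == split_even(a, b)
```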

My approach here is to create three arrays and sum them. The first two are simple, but the last is a little more complex to follow: it's just repeating 2 (by) as many times as it can given the remainder, then repeating 0s.
# Part 1
np.repeat(first, x//first)
# Part 2
np.repeat(by, x//first)
# Part 3
np.repeat([by, 0], [(x//first) - ((x - (x//first*first)) // by % by), (x - (x//first*first)) // by % by])
Wrapped into a function:
import numpy as np

def split(x, first, by):
    return(np.repeat(first, x//first)
           + np.repeat(by, x//first)
           + np.repeat([by, 0], [(x//first) - ((x - (x//first*first)) // by % by), (x - (x//first*first)) // by % by]))

split(100, 30, 2)
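For remainders other than the one in the example, a more explicit formulation may be easier to verify. The following is a sketch, not the author's method: `split_np` is a hypothetical name, and it assumes the remainder x - parts * first is a multiple of by.

```python
import numpy as np


def split_np(x, first, by):
    """Start with x // first groups of size `first`, then hand out the
    remainder in steps of `by`, one group at a time."""
    parts = x // first
    units, leftover = divmod(x - parts * first, by)
    if leftover:
        raise ValueError("remainder is not a multiple of `by`")
    every, extra = divmod(units, parts)       # full rounds, then a partial round
    out = np.full(parts, first + every * by)  # every group gets the full rounds
    out[:extra] += by                         # the first `extra` groups get one more
    return out


print(split_np(100, 30, 2))   # [34 34 32]
```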

Minimum moves to reach k

Given two numbers m and n, in one move you can get two new pairs:
m+n, n
m, n+m
Let's initially set m = n = 1. Find the minimum number of moves so that at least one of the numbers equals k.
It's guaranteed there's a solution (i.e. there exists a sequence of moves that leads to k).
For example:
given k = 5
the minimum number of moves so that m or n is equal to k is 3
1, 1
1, 2
3, 2
3, 5
Total of 3 moves.
I have come up with a solution using recursion in Python, but it doesn't seem to work on big numbers (e.g. 10^6):
def calc(m, n, k):
    if n > k or m > k:
        return 10**6
    elif n == k or m == k:
        return 0
    else:
        return min(1+calc(m+n, n, k), 1+calc(m, m+n, k))

k = int(input())
print(calc(1, 1, k))
How can I improve the performance so it works for big numbers?

Non-Recursive Algorithm based on Priority Queue (using Heap)
State: (sum_, m, n, path)
sum_ is current sum (i.e. m + n)
m and n are the first and second numbers
path is the sequence of (m, n) pairs to get to the current sum
In each step there are two possible moves
Replace first number by the sum
Replace second number by the sum
Thus each state generates two new states. States are prioritized by:
moves: states with a lower number of moves have higher priority
sum: states with higher sums have higher priority
We use a Priority Queue (Heap in this case) to process states by priority.
Code
from heapq import heappush, heappop

def calc1(k):
    if k < 1:
        return None, None # No solution
    m, n, moves = 1, 1, 0
    if m == k or n == k:
        return moves, [(m, n)]
    h = [] # Priority queue (heap)
    path = [(m, n)]
    sum_ = m + n
    # Python's heapq acts as a min queue.
    # We can order things by max by using -value rather than value
    # Thus Tuple (moves+1, -sum_, ...) prioritizes by 1) min moves, and 2) max sum
    heappush(h, (moves+1, -sum_, sum_, n, path))
    heappush(h, (moves+1, -sum_, m, sum_, path))
    while h:
        # Get state with the fewest moves (ties broken by highest sum)
        moves, sum_, m, n, path = heappop(h)
        sum_ = - sum_
        if sum_ == k:
            return moves, path # Found solution
        if sum_ < k:
            sum_ = m + n # new sum
            # Replace first number with sum
            heappush(h, (moves+1, -sum_, sum_, n, path + [(sum_, n)]))
            # Replace second number with sum
            heappush(h, (moves+1, -sum_, m, sum_, path + [(m, sum_)]))
        # else:
        # so just continues since sum_ > k
    # Exhausted all options, so no solution
    return None, None
Test
Test Code
for k in [5, 100, 1000]:
    moves, path = calc1(k)
    print(f'k: {k}, Moves: {moves}, Path: {path}')
Output
k: 5, Moves: 3, Path: [(1, 1), (2, 3), (2, 5)]
k: 100, Moves: 10, Path: [(1, 1), (2, 3), (5, 3), (8, 3), (8, 11),
(8, 19), (27, 19), (27, 46), (27, 73), (27, 100)]
k: 1000, Moves: 15, Path: [(1, 1), (2, 3), (5, 3), (8, 3), (8, 11),
(19, 11), (19, 30), (49, 30), (79, 30), (79, 109),
(188, 109), (297, 109), (297, 406), (297, 703), (297, 1000)]
Performance Improvement
The following two adjustments improve performance:
Not including the path, just the number of steps (providing a 3X speedup for k = 10,000)
Not using symmetric pairs (providing an additional 2X speedup for k = 10,000)
By symmetric pairs, I mean pairs of m, n which are the same forward and backwards, such as (1, 2) and (2, 1).
We don't need to branch on both of these since they will provide the same solution step count.
Improved Code
from heapq import heappush, heappop

def calc(k):
    if k < 1:
        return None, None
    m, n, moves = 1, 1, 0
    if m == k or n == k:
        return moves
    h = [] # Priority queue (heap)
    sum_ = m + n
    heappush(h, (moves+1, -sum_, sum_, n))
    while h:
        moves, sum_, m, n = heappop(h)
        sum_ = - sum_
        if sum_ == k:
            return moves
        if sum_ < k:
            sum_ = m + n
            steps = [(sum_, n), (m, sum_)]
            heappush(h, (moves+1, -sum_, *steps[0]))
            if steps[0] != steps[-1]: # not same tuple in reverse (i.e. not symmetric)
                heappush(h, (moves+1, -sum_, *steps[1]))
Performance
Tested up to k = 100,000, which took ~2 minutes.
Update
Converted the solution by @גלעדברקן from JavaScript to Python to test
def g(m, n, memo):
    key = (m, n)
    if key in memo:
        return memo[key]
    if m == 1 or n == 1:
        memo[key] = max(m, n) - 1
    elif m == 0 or n == 0:
        memo[key] = float("inf")
    elif m > n:
        memo[key] = (m // n) + g(m % n, n, memo)
    else:
        memo[key] = (n // m) + g(m, n % m, memo)
    return memo[key]

def f(k, memo={}):
    if k == 1:
        return 0
    return min(g(k, n, memo) for n in range((k // 2) + 1))
Performance of @גלעדברקן's Code
Completed 100K in ~1 second
This is 120X faster than my above heap based solution.

This is an interesting problem in number theory, involving linear Diophantine equations. Since there are solutions available online, I gather that you want help in deriving the algorithm yourself.
Restate the problem: you start with two numbers characterized as 1*m+0*n, 0*m+1*n. Use the shorthand (1, 0) and (0, 1). You are looking for the shortest path to any solution to the linear Diophantine equation
a*m + b*n = k
where (a, b) is reached from starting values (1, 1) a.k.a. ( (1, 0), (0, 1) ).
So ... starting from (1, 1), how can you characterize the paths you reach from various permutations of the binary enhancement? At each step, you have two choices: a += b or b += a. Your existing algorithm already recognizes this binary search tree.
These graph transitions -- edges along a lattice -- can be characterized, in terms of which (a, b) pairs you can reach on a given step. Is that enough of a hint to move you along? That characterization is the key to converting this problem into something close to a direct computation.

We can do much better than the queue even with brute force, trying each possible n when setting m to k. Here's JavaScript code, very close to Python syntax:
function g(m, n, memo){
  const key = m + ',' + n;
  if (memo[key])
    return memo[key];
  if (m == 1 || n == 1)
    return Math.max(m, n) - 1;
  if (m == 0 || n == 0)
    return Infinity;
  let answer;
  if (m > n)
    answer = Math.floor(m / n) + g(m % n, n, memo);
  else
    answer = Math.floor(n / m) + g(m, n % m, memo);
  memo[key] = answer;
  return answer;
}

function f(k, memo={}){
  if (k == 1)
    return 0;
  let best = Infinity;
  for (let n=1; n<=Math.floor(k/2); n++)
    best = Math.min(best, g(k, n, memo));
  return best;
}

var memo = {};
var ks = [1, 2, 5, 6, 10, 100, 1000, 100000];
for (let k of ks)
  console.log(`${ k }: ${ f(k, memo) }`);
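The idea, as I read it, is to run the moves backwards: from a pair (k, n) the only possible previous pair subtracts the smaller number from the larger one, so the step count follows the Euclidean algorithm. Below is a Python sketch of that reading, checked against a plain breadth-first search on small k. The names `steps_back`, `min_moves` and `min_moves_bfs` are mine.

```python
from collections import deque


def steps_back(m, n):
    """Moves needed to reach (m, n) from (1, 1), or None if unreachable."""
    moves = 0
    while m > 1 and n > 1:
        if m > n:
            moves += m // n   # undo m // n additions of n in one go
            m %= n
        else:
            moves += n // m
            n %= m
    if m == 0 or n == 0:      # gcd(m, n) > 1: never reachable from (1, 1)
        return None
    return moves + max(m, n) - 1


def min_moves(k):
    """Try every partner n for k and keep the cheapest reachable pair."""
    if k == 1:
        return 0
    counts = (steps_back(k, n) for n in range(1, k // 2 + 1))
    return min(c for c in counts if c is not None)


def min_moves_bfs(k):
    """Reference answer by breadth-first search over (m, n) pairs."""
    if k == 1:
        return 0
    seen = {(1, 1)}
    queue = deque([(1, 1, 0)])
    while queue:
        m, n, d = queue.popleft()
        for pair in ((m + n, n), (m, m + n)):
            if k in pair:
                return d + 1
            if max(pair) < k and pair not in seen:
                seen.add(pair)
                queue.append((*pair, d + 1))


print(min_moves(5))   # 3
```

The breadth-first search is only practical for small k; it is there to cross-check the backward count, not as an alternative solution.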

By default, Python's recursion limit is set to 1000. You can change it using the sys module:
import sys
sys.setrecursionlimit(10**6)
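If the limit is only needed for one call, it can be raised temporarily and restored afterwards. The following is a minimal sketch; `recursion_limit` is a hypothetical helper, not part of the sys module. Note that raising the limit only permits deeper recursion; it does not reduce the number of calls the recursive solution makes.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily raise the recursion limit, restoring the old value on exit."""
    old = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        yield
    finally:
        sys.setrecursionlimit(old)


def depth(n):
    """Toy recursive function that recurses n levels deep."""
    return 0 if n == 0 else 1 + depth(n - 1)


with recursion_limit(5000):
    print(depth(2000))   # 2000
```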

How to use numpy to generate random numbers on segmentation intervals

I am using numpy module in python to generate random numbers. When I need to generate random numbers in a continuous interval such as [a,b], I will use
(b-a)*np.random.rand(1)+a
but now I need to generate a uniform random number over the two intervals [a, b] and [c, d]. What should I do?
I want to generate a random number that is uniform over the combined length of all the intervals. I do not want to select an interval with equal probability and then generate a random number inside that interval. If [a, b] and [c, d] are equal in length, there is no problem with that approach, but when the lengths of the intervals are not equal, the random numbers it generates are not uniform.

You could do something like
a,b,c,d = 1,2,7,9
N = 10
r = np.random.uniform(a-b,d-c,N)
r += np.where(r<0,b,c)
r
# array([7.30557415, 7.42185479, 1.48986144, 7.95916547, 1.30422703,
# 8.79749665, 8.19329762, 8.72669862, 1.88426196, 8.33789181])

You can use
np.random.uniform(a,b)
for your random numbers between a and b (including a but excluding b)
So for random number in [a,b] and [c,d], you can use
np.random.choice( [np.random.uniform(a,b) , np.random.uniform(c,d)] )

Here's a recipe:
def random_multiinterval(*intervals, shape=(1,)):
    # FIXME assert intervals are valid and non-overlapping
    size = sum(i[1] - i[0] for i in intervals)
    v = size * np.random.rand(*shape)
    res = np.zeros_like(v)
    for i in intervals:
        res += (0 < v) * (v < (i[1] - i[0])) * (i[0] + v)
        v -= i[1] - i[0]
    return res
In [11]: random_multiinterval((1, 2), (3, 4))
Out[11]: array([1.34391171])
In [12]: random_multiinterval((1, 2), (3, 4), shape=(3, 3))
Out[12]:
array([[1.42936024, 3.30961893, 1.01379663],
[3.19310627, 1.05386192, 1.11334538],
[3.2837065 , 1.89239373, 3.35785566]])
Note: This is uniformly distributed over N (non-overlapping) intervals, even if they have different sizes.

You can just assign a probability for how likely it will be [a,b] or [c,d] and then generate accordingly:
import numpy as np
import random

random_roll = random.random()
a = 1
b = 5
c = 7
d = 10
if random_roll > .5: # half the time we will use [a,b]
    my_num = (b - a) * np.random.rand(1) + a
else: # the other half we will use [c,d]
    my_num = (d - c) * np.random.rand(1) + c
print(my_num)
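A fixed 50/50 split is uniform over the union only when both intervals have the same length. To get a uniform result over the combined length, as the question asks, the probability of each interval can be made proportional to its length. This is a sketch using NumPy's Generator API; `sample_intervals` is a hypothetical name and the intervals are assumed to be non-overlapping.

```python
import numpy as np


def sample_intervals(intervals, size, rng=None):
    """Draw `size` samples, uniform over the union of the given intervals."""
    rng = np.random.default_rng() if rng is None else rng
    lows = np.array([lo for lo, hi in intervals], dtype=float)
    highs = np.array([hi for lo, hi in intervals], dtype=float)
    lengths = highs - lows
    # Pick an interval index with probability proportional to its length...
    idx = rng.choice(len(intervals), size=size, p=lengths / lengths.sum())
    # ...then draw uniformly inside the chosen interval.
    return rng.uniform(lows[idx], highs[idx])


samples = sample_intervals([(1, 5), (7, 10)], size=10)
print(samples)
```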

OverFlowError while solving "Number of subsets without consecutive numbers"

I'm trying to solve a problem in TalentBuddy using Python
The problem is :
Given a number N. Print to the standard output the total number of
subsets that can be formed using the {1,2..N} set, but making sure
that none of the subsets contain any two consecutive integers. The
final count might be very large, this is why you must print the result
modulo 524287.
I've written the code. All of the tests pass except Test 6. I get an OverflowError when the test submits 10000000 as the argument of my function. I don't know what I should do to resolve this error.
My code :
import math

def count_subsets(n):
    step1 = (1 / math.sqrt(5)) * (((1 + math.sqrt(5)) / 2) ** (n + 2))
    step2 = (1 / math.sqrt(5)) * (((1 - math.sqrt(5)) / 2) ** (n + 2))
    res = step1 - step2
    print int(res) % 524287
I guess this is taking up memory a lot. I wrote this after I found a mathematical formula to the same topic on the Internet.
I guess my code isn't Pythonic at all.
How to do this, the "Pythonic" way? How to resolve the OverFlowError?
EDIT: In the problem, I've given the example input 3, and the result (output) is 5.
Explanation: The 5 sets are, {}, {1}, {2}, {3}, {1,3}.
However, in Test 6, the problem I've given are:
Summary for test #6
Input test:
[10000000]
Expected output:
165366
Your output:
Traceback (most recent call last):
On line 4, in function count_subsets:
step1 = (1 / math.sqrt(5)) * (((1 + math.sqrt(5)) / 2) ** (n + 2))
OverflowError:

Let F(N) be the number of subsets of {1, ..., N} that contain no consecutive numbers. There are F(N-2) such subsets that contain N, and F(N-1) that don't contain N. This gives:
F(N) = F(N-1) + F(N-2).
F(0) = 1 (there's 1 subset of {}, namely {}).
F(1) = 2 (there's 2 subsets of {1}, namely {} and {1}).
This is the fibonacci sequence, albeit with non-standard starting conditions.
There is, as you've found, a formula using the golden ratio to calculate this. The problem is that for large N, you need more and more accuracy in your floating-point calculation.
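Both failure modes of the closed form can be seen in a short Python 3 sketch: the power overflows for very large n, and well before that the truncated float result stops matching the exact integer value. `binet` and `exact` are my names for the two variants.

```python
import math


def binet(n):
    """Float closed form from the question (golden-ratio formula)."""
    s5 = math.sqrt(5)
    return int(((1 + s5) / 2) ** (n + 2) / s5 - ((1 - s5) / 2) ** (n + 2) / s5)


def exact(n):
    """Exact count using Python's arbitrary-precision integers."""
    a, b = 1, 2
    for _ in range(n):
        a, b = b, a + b
    return a


# Find the first n where the float version disagrees with the exact one.
first_bad = next(n for n in range(200) if binet(n) != exact(n))
print("first mismatch at n =", first_bad)

try:
    binet(10**7)
except OverflowError as err:
    print("overflow:", err)
```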
An exact way to do the calculation is to use iteration:
a_0 = 1
b_0 = 2
a_{n+1} = b_n
b_{n+1} = a_n + b_n
The naive version of this is easy but slow.
def subsets(n, modulo):
    a, b = 1, 2
    for _ in xrange(n):
        a, b = b, (a + b) % modulo
    return a
Instead, a standard trick is to write the repeated application of the recurrences as a matrix power:
$$\begin{pmatrix} a_N \\ b_N \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^{N} \begin{pmatrix} 1 \\ 2 \end{pmatrix}$$
You can compute the matrix power (using modulo-524287 arithmetic) by repeated squaring. See Exponentiation by squaring. Here's complete code:
def mul2x2(a, b, modulo):
    result = [[0, 0], [0, 0]]
    for i in xrange(2):
        for j in xrange(2):
            for k in xrange(2):
                result[i][j] += a[i][k] * b[k][j]
            result[i][j] %= modulo
    return result

def pow(m, n, modulo):
    result = [[1, 0], [0, 1]]
    while n:
        if n % 2: result = mul2x2(result, m, modulo)
        m = mul2x2(m, m, modulo)
        n //= 2
    return result

def subsets(n):
    m = pow([[0, 1], [1, 1]], n, 524287)
    return (m[0][0] + 2 * m[0][1]) % 524287

for i in xrange(1, 10):
    print i, subsets(i)
for i in xrange(1, 20):
    print i, subsets(10 ** i)
This prints solutions for every power of 10 up to 10^19, and it's effectively instant (0.041sec real on my laptop).
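The code above is written for Python 2 (xrange, print statement). For Python 3, the same technique might look like the sketch below, with a brute-force enumeration added as a cross-check for small n. The names `mat_mul`, `mat_pow` and `brute` are mine.

```python
from itertools import combinations

MOD = 524287


def mat_mul(x, y, mod):
    """Multiply two 2x2 matrices modulo mod."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) % mod
             for j in range(2)] for i in range(2)]


def mat_pow(m, n, mod):
    """Raise a 2x2 matrix to the n-th power by repeated squaring."""
    result = [[1, 0], [0, 1]]
    while n:
        if n & 1:
            result = mat_mul(result, m, mod)
        m = mat_mul(m, m, mod)
        n >>= 1
    return result


def subsets(n, mod=MOD):
    """Number of subsets of {1..n} with no two consecutive integers, mod `mod`."""
    m = mat_pow([[0, 1], [1, 1]], n, mod)
    return (m[0][0] + 2 * m[0][1]) % mod


def brute(n):
    """Enumerate every subset and count those without consecutive members."""
    return sum(
        all(b - a > 1 for a, b in zip(combo, combo[1:]))
        for size in range(n + 1)
        for combo in combinations(range(1, n + 1), size)
    )


print(subsets(3))   # 5
```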
