Solution to splitting array into equal subarrays too slow

I'm having trouble with a dynamic programming problem. I have tested my solution against test cases, and it is correct. However, it is too slow. This leads me to believe that I may not be caching the solutions to subproblems effectively. Here is the problem statement:
There is an array A that contains N numbers. Given an array, a player can split the array into two non-empty subarrays only if the elements in each subarray sum to the same value. After a split is made, the player discards one subarray and is allowed to continue splitting on the remaining subarray. This continues until a split is no longer possible. What is the maximum number of splits possible on the given array A?
Here is my (slow) solution, which calls topSplit(A) on the given array:
def topSplitAux(A, C, i, j):
    if -1 != C[i][j]:
        return C[i][j]
    if i == j:
        return 0
    s = float('-inf')
    for k in range(i + 1, j):
        if sum(A[i:k]) == sum(A[k:j]):
            p1 = 1 + topSplitAux(A, C, i, k)
            p2 = 1 + topSplitAux(A, C, k, j)
            s = max(s, p1, p2)
    C[i][j] = s
    if s == float('-inf'): # we couldn't split, game over
        C[i][j] = 0
    return C[i][j]
def topSplit(A):
    # initialize a cache to store solutions already solved
    n = len(A)
    # the subproblem we are interested in will be in C[0][n]
    C = [[-1 for _ in range(n + 1)] for _ in range(n + 1)]
    return topSplitAux(A, C, 0, n)
if __name__ == '__main__':
    T = int(raw_input())
    for t in range(T):
        N = int(raw_input())
        A = map(int, raw_input().split())
        n = len(A)
        print topSplit(A)
Here's a simple test case:
3
3
3 3 3
4
2 2 2 2
7
4 1 0 1 1 0 1
with expected result:
0
2
3
Any help on making this solution faster would be greatly appreciated. Thanks!
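
One likely cause of the slowdown is that sum(A[i:k]) and sum(A[k:j]) are recomputed from scratch for every candidate k, which costs O(n) per check even though the subproblem results are cached. Below is a minimal sketch, assuming Python 3 and the same half-open (i, j) subproblems, that precomputes prefix sums so each equality check is O(1). The names top_split, prefix, and best are illustrative and do not come from the original post.

```python
from functools import lru_cache
from itertools import accumulate


def top_split(A):
    # prefix[k] == sum(A[:k]), so sum(A[i:k]) == prefix[k] - prefix[i]
    prefix = [0] + list(accumulate(A))

    @lru_cache(maxsize=None)
    def best(i, j):
        # Maximum number of splits possible on the slice A[i:j]
        result = 0
        for k in range(i + 1, j):
            if prefix[k] - prefix[i] == prefix[j] - prefix[k]:
                result = max(result, 1 + best(i, k), 1 + best(k, j))
        return result

    return best(0, len(A))


print(top_split([3, 3, 3]))              # 0
print(top_split([2, 2, 2, 2]))           # 2
print(top_split([4, 1, 0, 1, 1, 0, 1]))  # 3
```

This still scans every k for each subproblem, so it is a sketch of the prefix-sum idea rather than the fastest possible approach; for very long arrays the recursion depth may also need attention.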

Related

Python question on how these two lists are different?

I have a weird problem: I have two seemingly identical lists, yet using one gives one answer and using the other gives a different answer, even though they hold the same values and were merely generated by different methods.
Try the program below and see that b and c compare equal (printed on line 8).
Then, when using b on lines 11, 13 and 16, we get an answer of 0.
Using c on lines 11, 13 and 16, we get an answer of 6.
I've tried this on numerous compilers and cannot understand what Python uses to differentiate b from c. Try it yourself by copying and pasting the code below (thanks in advance for helping out!):
def oddCells(n, m, indices):
    a = [0]*m
    b =[]
    for every in range(n):
        b +=[a]
    c = [ [0] * m for i in range(n) ]
    result = 0
    print(b,"|",c,b == c)
    for e in indices:
        for i in range(m):
            b[e[0]][i] += 1
        for j in range(n):
            b[j][e[1]] += 1
    for i in range(n):
        for j in range(m):
            if b[i][j] %2 != 0:
                result +=1
    return(result)
W =[[0,1],[1,1]]
print(oddCells(2,3,W))
############SECOND VERSION TO COMPARE####################################
def oddCells(n, m, indices):
    a = [0]*m
    b =[]
    for every in range(n):
        b +=[a]
    c = [ [0] * m for i in range(n) ]
    result = 0
    print(b,"|",c,b == c)
    for e in indices:
        for i in range(m):
            c[e[0]][i] += 1
        for j in range(n):
            c[j][e[1]] += 1
    for i in range(n):
        for j in range(m):
            if c[i][j] %2 != 0:
                result +=1
    return(result)
W =[[0,1],[1,1]]
print(oddCells(2,3,W))
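
The original post does not include an answer, but a likely explanation is aliasing: b += [a] appends a reference to the same list a on every iteration, while the list comprehension for c builds a fresh inner list per row. The == operator compares contents only, so the two look identical until a row is modified. A small sketch, assuming Python 3 and using made-up sizes:

```python
m, n = 3, 2

a = [0] * m
b = []
for _ in range(n):
    b += [a]                      # every row of b is the same object `a`

c = [[0] * m for _ in range(n)]   # every row of c is a separate list

print(b == c)         # True: equality compares contents
print(b[0] is b[1])   # True: both rows are one shared list
print(c[0] is c[1])   # False: rows are independent

same_before = (b == c)

b[0][0] += 1          # changes "both" rows of b
c[0][0] += 1          # changes only the first row of c
print(b)              # [[1, 0, 0], [1, 0, 0]]
print(c)              # [[1, 0, 0], [0, 0, 0]]
```

Under this reading, every increment applied to one row of b shows up in all rows, which would explain why the two versions of oddCells return different counts.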

codility counting lesson 2 : swap elements from arrays

I'm studying Codility Counting Lesson (https://codility.com/media/train/2-CountingElements.pdf) and I need help to understand the fastest solution.
I would like to know what the counting function does in this line:
count = counting(A, m)
The Problem:
You are given an integer m (1 < m < 1000000) and two non-empty, zero-indexed arrays A and B of n integers, a0, a1, ... , an−1 and b0, b1, ... , bn−1 respectively (0 < ai, bi < m). The goal is to check whether there is a swap operation which can be performed on these arrays in such a way that the sum of elements in array A equals the sum of elements in array B after the swap. By swap operation we mean picking one element from array A and one element from array B and exchanging them.
The solution:
def fast_solution(A, B, m):
    n = len(A)
    sum_a = sum(A)
    sum_b = sum(B)
    d = sum_b - sum_a
    if d % 2 == 1:
        return False
    d //= 2
    count = counting(A, m)
    for i in xrange(n):
        if 0 <= B[i] - d and B[i] - d <= m and count[B[i] - d] > 0:
            return True
    return False

Counting is defined earlier in the text and is implemented as follows:
def counting(A, m):
    n = len(A)
    count = [0] * (m + 1)
    for k in xrange(n):
        count[A[k]] += 1
    return count
It just counts how many times each element appears in the array.
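
To make the role of counting concrete, here is a small Python 3 sketch of the same two functions (the original uses Python 2's xrange). The sample arrays are made up for illustration: count[v] holds how many times v occurs in A, so fast_solution can test in O(1) whether A contains the value B[i] - d needed for a balancing swap.

```python
def counting(A, m):
    # count[v] == number of occurrences of v in A, for 0 <= v <= m
    count = [0] * (m + 1)
    for value in A:
        count[value] += 1
    return count


def fast_solution(A, B, m):
    d = sum(B) - sum(A)
    if d % 2 == 1:
        return False          # an odd difference cannot be balanced by one swap
    d //= 2
    count = counting(A, m)
    for b in B:
        # Swapping b with an element equal to b - d from A equalizes the sums
        if 0 <= b - d <= m and count[b - d] > 0:
            return True
    return False


print(counting([1, 3, 3, 2], 4))         # [0, 1, 1, 2, 0]
print(fast_solution([1, 3], [2, 4], 5))  # True: swap 1 and 2
print(fast_solution([1, 1], [1, 2], 3))  # False: difference is odd
```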

Incorrect indexing for max subarray in Python

I wrote both a brute-force and a divide-and-conquer implementation of the Max Subarray problem in Python. Tests are run by drawing a random sample of integers.
When the length of the input array is large, the assert in __main__ fails because the recursive algorithm does not return the correct answer. However, the two algorithms DO agree when the array is less than 10 elements long (this is approximate, and the actual size of the failed input varies on each execution). The issue does not seem to be related to even or odd array lengths, but it does appear to be related to how the array is indexed.
Sorry if I'm missing something stupid, but why does the recursive algorithm stop returning the correct output when the input array starts getting larger?
# Subarray solutions are represented by an array in the form
# [lower_bound, higher_bound, sum]
from sys import maxsize
import random
import time
# Brute force implementation (THETA(n^2))
def bf_max_subarray(A):
    biggest = -maxsize - 1
    left = 0
    right = 0
    for i in range(0, len(A)):
        sum = 0
        for j in range(i, len(A)):
            sum += A[j]
            if sum > biggest:
                biggest = sum
                left = i
                right = j
    return [left, right, biggest]
# Part of divide-and-conquer solution
def cross_subarray(A, l, m, r):
    lsum = -maxsize - 1
    rsum = -maxsize - 1
    lbound = 0
    rbound = 0
    tempsum = 0
    for i in range(m, l-1, -1):
        tempsum += A[i]
        if tempsum > lsum:
            lsum = tempsum
            lbound = i
    tempsum = 0
    for j in range(m+1, r+1):
        tempsum += A[j]
        if tempsum > rsum:
            rsum = tempsum
            rbound = j
    return [lbound, rbound, lsum + rsum]
# Recursive solution
def rec_max_subarray(A, l, r):
    # Base case: array of one element
    if (l == r):
        return [l, r, A[l]]
    else:
        m = (l+r)//2
        left = rec_max_subarray(A, l, m)
        right = rec_max_subarray(A, m+1, r)
        cross = cross_subarray(A, l, m, r)
        # Returns the array representing the subarray with the maximum sum.
        return max([left, right, cross], key=lambda i:i[2])
if __name__ == "__main__":
    for i in range(1, 101):
        A = random.sample(range(-i*2, i), i)
        # time.perf_counter() replaces time.clock(), which was removed in Python 3.8
        start = time.perf_counter()
        bf = bf_max_subarray(A)
        bf_time = time.perf_counter() - start
        start = time.perf_counter()
        dc = rec_max_subarray(A, 0, len(A)-1)
        dc_time = time.perf_counter() - start
        assert dc == bf # Make sure the algorithms agree.

The subarray with the maximum sum is represented by an array of the form [left_bound, right_bound, sum].
But because of return max([left, right, cross], key=lambda i:i[2]), rec_max_subarray returns the correct maximum sum for A but risks returning indices that do not match the indices returned by bf_max_subarray. My error was assuming that the boundaries of a subarray with the maximum sum would be unique.
The solution is to either fix the criteria that selects a subarray, or just to assert the equality of the sums using assert dc[2] == bf[2].
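
A small illustration of the non-uniqueness, assuming Python 3 and condensed copies of the functions from the question: for the hand-picked input [0, 5], both [0, 1] and [1, 1] are valid boundaries with sum 5, and the two algorithms happen to pick different ones.

```python
from sys import maxsize


def bf_max_subarray(A):
    biggest, left, right = -maxsize - 1, 0, 0
    for i in range(len(A)):
        total = 0
        for j in range(i, len(A)):
            total += A[j]
            if total > biggest:
                biggest, left, right = total, i, j
    return [left, right, biggest]


def cross_subarray(A, l, m, r):
    lsum = rsum = -maxsize - 1
    lbound = rbound = 0
    tempsum = 0
    for i in range(m, l - 1, -1):
        tempsum += A[i]
        if tempsum > lsum:
            lsum, lbound = tempsum, i
    tempsum = 0
    for j in range(m + 1, r + 1):
        tempsum += A[j]
        if tempsum > rsum:
            rsum, rbound = tempsum, j
    return [lbound, rbound, lsum + rsum]


def rec_max_subarray(A, l, r):
    if l == r:
        return [l, r, A[l]]
    m = (l + r) // 2
    left = rec_max_subarray(A, l, m)
    right = rec_max_subarray(A, m + 1, r)
    cross = cross_subarray(A, l, m, r)
    return max([left, right, cross], key=lambda s: s[2])


A = [0, 5]
bf = bf_max_subarray(A)
dc = rec_max_subarray(A, 0, len(A) - 1)
print(bf)  # [0, 1, 5]
print(dc)  # [1, 1, 5]
assert dc[2] == bf[2]  # the sums agree even though the bounds differ
```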

How to efficiently get all combinations where the sum is 10 or below in Python

Imagine you're trying to allocate some fixed resources (e.g. n=10) over some number of territories (e.g. t=5). I am trying to find out efficiently how to get all the combinations where the sum is n or below.
E.g. 10,0,0,0,0 is good, as well as 0,0,5,5,0 etc., while 3,3,3,3,3,3 is obviously wrong.
I got this far:
import itertools
t = 5
n = 10
r = [range(n+1)] * t
for x in itertools.product(*r):
    if sum(x) <= n:
        print x
This brute force approach is incredibly slow though; there must be a better way?
Timings (1000 iterations):
Default (itertools.product) --- time: 40.90 s
falsetru recursion --- time: 3.63 s
Aaron Williams Algorithm (impl, Tony) --- time: 0.37 s

Possible approach follows. Definitely would use with caution (hardly tested at all, but the results on n=10 and t=5 look reasonable).
The approach involves no recursion. The algorithm to generate partitions of a number n (10 in your example) having m elements (5 in your example) comes from Knuth's 4th volume. Each partition is then zero-extended if necessary, and all the distinct permutations are generated using an algorithm from Aaron Williams which I have seen referred to elsewhere. Both algorithms had to be translated to Python, and that increases the chance that errors have crept in. The Williams algorithm wanted a linked list, which I had to fake with a 2D array to avoid writing a linked-list class.
There goes an afternoon!
Code (note your n is my maxn and your t is my p):
import itertools
def visit(a, m):
    """ Utility function to add partition to the list"""
    x.append(a[1:m+1])
def parts(a, n, m):
    """ Knuth Algorithm H, Combinatorial Algorithms, Pre-Fascicle 3B
        Finds all partitions of n having exactly m elements.
        An upper bound on running time is (3 x number of
        partitions found) + m. Not recursive!
    """
    while (1):
        visit(a, m)
        while a[2] < a[1]-1:
            a[1] -= 1
            a[2] += 1
            visit(a, m)
        j=3
        s = a[1]+a[2]-1
        while a[j] >= a[1]-1:
            s += a[j]
            j += 1
        if j > m:
            break
        x = a[j] + 1
        a[j] = x
        j -= 1
        while j>1:
            a[j] = x
            s -= x
            j -= 1
        a[1] = s
def distinct_perms(partition):
    """ Aaron Williams Algorithm 1, "Loopless Generation of Multiset
        Permutations by Prefix Shifts". Finds all distinct permutations
        of a list with repeated items. I don't follow the paper all that
        well, but it _possibly_ has a running time which is proportional
        to the number of permutations (with 3 shift operations for each
        permutation on average). Not recursive!
    """
    perms = []
    val = 0
    nxt = 1
    l1 = [[partition[i],i+1] for i in range(len(partition))]
    l1[-1][nxt] = None
    #print(l1)
    head = 0
    i = len(l1)-2
    afteri = i+1
    tmp = []
    tmp += [l1[head][val]]
    c = head
    while l1[c][nxt] != None:
        tmp += [l1[l1[c][nxt]][val]]
        c = l1[c][nxt]
    perms.extend([tmp])
    while (l1[afteri][nxt] != None) or (l1[afteri][val] < l1[head][val]):
        if (l1[afteri][nxt] != None) and (l1[i][val]>=l1[l1[afteri][nxt]][val]):
            beforek = afteri
        else:
            beforek = i
        k = l1[beforek][nxt]
        l1[beforek][nxt] = l1[k][nxt]
        l1[k][nxt] = head
        if l1[k][val] < l1[head][val]:
            i = k
        afteri = l1[i][nxt]
        head = k
        tmp = []
        tmp += [l1[head][val]]
        c = head
        while l1[c][nxt] != None:
            tmp += [l1[l1[c][nxt]][val]]
            c = l1[c][nxt]
        perms.extend([tmp])
    return perms
maxn = 10 # max integer to find partitions of
p = 5 # max number of items in each partition
# Find all partitions of length p or less adding up
# to maxn or less
# Special cases (Knuth's algorithm requires n and m >= 2)
x = [[i] for i in range(maxn+1)]
# Main cases: runs parts fn (maxn^2+maxn)/2 times
for i in range(2, maxn+1):
    for j in range(2, min(p+1, i+1)):
        m = j
        n = i
        a = [0, n-m+1] + [1] * (m-1) + [-1] + [0] * (n-m-1)
        parts(a, n, m)
y = []
# For each partition, add zeros if necessary and then find
# distinct permutations. Runs distinct_perms function once
# for each partition.
for part in x:
    if len(part) < p:
        y += distinct_perms(part + [0] * (p - len(part)))
    else:
        y += distinct_perms(part)
print(y)
print(len(y))

Write your own recursive function that does not recurse on an element unless it is still possible to keep the sum <= 10.
def f(r, n, t, acc=[]):
    if t == 0:
        if n >= 0:
            yield acc
        return
    for x in r:
        if x > n: # <---- do not recurse if sum is larger than `n`
            break
        for lst in f(r, n-x, t-1, acc + [x]):
            yield lst
t = 5
n = 10
for xs in f(range(n+1), n, 5):
    print xs
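
The same pruning idea can be sketched in Python 3, assuming non-negative integer allocations; bounded_tuples is a hypothetical name chosen for this example. Each level only tries values that fit in the remaining budget, so no tuple with a sum above n is ever built. The number of results can be checked against the closed form C(n + t, t) for non-negative solutions of x1 + ... + xt <= n.

```python
from math import comb  # available in Python 3.8+


def bounded_tuples(n, t, prefix=()):
    # Yield every length-t tuple of non-negative ints whose sum is at most n
    if t == 0:
        yield prefix
        return
    for x in range(n + 1):  # x never exceeds the remaining budget n
        yield from bounded_tuples(n - x, t - 1, prefix + (x,))


results = list(bounded_tuples(10, 5))
print(len(results))             # 3003
print(len(results) == comb(15, 5))
```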

You can create the full Cartesian product with itertools, and filter the results with numpy.
>>> import numpy as np
>>> from itertools import product
>>> t = 5
>>> n = 10
>>> r = range(n+1)
# Create the product numpy array
>>> prod = np.fromiter(product(r, repeat=t), np.dtype('u1,' * t))
>>> prod = prod.view('u1').reshape(-1, t)
# Extract only the rows that satisfy the condition (sum of n or below)
>>> prod[prod.sum(axis=1) <= n]
Timeit:
>>> %%timeit
prod = np.fromiter(product(r, repeat=t), np.dtype('u1,' * t))
prod = prod.view('u1').reshape(-1, t)
prod[prod.sum(axis=1) <= n]
10 loops, best of 3: 41.6 ms per loop
You could even speed up the product computation by populating combinations directly in numpy.

You could optimize the algorithm using Dynamic Programming.
Basically, have an array a, where a[i][j] means "Can I get a sum of j using elements up to the i-th element (and including the i-th element)?", assuming you have your elements in an array t (not the number t you mentioned).
Then you can fill the array doing
a[0][t[0]] = True
for i in range(1, len(t)):
    a[i][t[i]] = True
    for j in range(t[i]+1, n+1):
        for k in range(0, i):
            if a[k][j-t[i]]:
                a[i][j] = True
Then, using this info, you could backtrack the solution :)
def backtrack(j = len(t)-1, goal = n):
    print j, goal
    all_solutions = []
    if j == -1:
        return []
    if goal == t[j]:
        all_solutions.append([j])
    for i in range(j-1, -1, -1):
        if a[i][goal-t[j]]:
            r = backtrack(i, goal - t[j])
            for l in r:
                print l
                l.append(j)
                all_solutions.append(l)
    all_solutions.extend(backtrack(j-1, goal))
    return all_solutions
backtrack() # is the answer
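
The table fill above leaves out how a is created. A self-contained Python 3 sketch follows, assuming t holds positive integers and a starts as an all-False table with len(t) rows and n + 1 columns; subset_sum_table and the sample values are made up for illustration.

```python
def subset_sum_table(t, n):
    # a[i][j] is True when some subset of t[0..i] that includes t[i] sums to j
    a = [[False] * (n + 1) for _ in t]
    for i, value in enumerate(t):
        if value <= n:
            a[i][value] = True          # the subset {t[i]} on its own
        for j in range(value + 1, n + 1):
            # extend a subset ending at an earlier element k with t[i]
            if any(a[k][j - value] for k in range(i)):
                a[i][j] = True
    return a


t = [3, 5, 2]
a = subset_sum_table(t, 10)
print(a[2][10])  # True: 3 + 5 + 2
print(a[1][8])   # True: 3 + 5
print(a[2][8])   # False: no subset containing t[2] sums to 8
```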

Maximum sum of sublist with a specific length

I'm supposed to write a function that takes two numbers: the first is a given number, and the second is the length of the sublist of its digits whose sum should be maximal.
For example, for the input (1234, 2)
the output would be 7.
This is my code so far; it just computes the sum of all the digits:
def altsum_digits(n,d):
    b=str(n)
    c=[]
    for digit in b:
        c.append(int(digit))
    maxthere=0
    realmax=0
    for a in str(d):
        for i in c:
            maxthere=max(0,(maxthere+int(i)))
            realmax=max(maxthere,realmax)
        maxthere==0
    print(realmax)

From what I understand of the question, this should do what you want:
def do(n, d):
    print sum(sorted([int(x) for x in str(n)])[-d:])

Let's say you get a number n and a length k.
What you have to do is first turn n into a list of digits, and then use a sliding window of size k where at each step you add the next digit and subtract the first one in the window, keeping track of max_sum so you can return it at the end.
The function would look something like this
def altsum_digits(n, k):
    list_n = [int(x) for x in str(n)]
    # The first window initializes both the running sum and the maximum
    current_sum = sum(list_n[:k])
    max_sum = current_sum
    for i in range(k, len(list_n)):
        current_sum = current_sum + list_n[i] - list_n[i - k]
        max_sum = max(current_sum, max_sum)
    return max_sum
It's an O(n) solution, so it's a lot better than generating all sublists of size k. Hope it helps!
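
As a quick sanity check of the sliding-window idea, the sketch below (Python 3, assuming k is no larger than the number of digits) compares it against a brute-force sum over every slice of length k. The names max_window_sum and brute_force are illustrative only.

```python
def max_window_sum(n, k):
    digits = [int(ch) for ch in str(n)]
    current = best = sum(digits[:k])
    for i in range(k, len(digits)):
        # slide the window: add the new digit, drop the oldest one
        current += digits[i] - digits[i - k]
        best = max(best, current)
    return best


def brute_force(n, k):
    digits = [int(ch) for ch in str(n)]
    return max(sum(digits[i:i + k]) for i in range(len(digits) - k + 1))


for number, k in [(1234, 2), (12531, 3), (90019, 2)]:
    assert max_window_sum(number, k) == brute_force(number, k)

print(max_window_sum(1234, 2))   # 7
print(max_window_sum(12531, 3))  # 10
```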

Let's clarify to make sure we're on the same page.
Inputs: 1) a list li of digits; 2) n
Output: the maximal sum over all slices of li of length n.
li = [4,2,1,7,1,3,8,4,7,8,1]
n = 2
slices = (li[x:x+n] for x in range(len(li)-n+1))
max(map(sum,slices))
Out[113]: 15

def sublists(lst, n):
    return (lst[i:i+n] for i in range(len(lst) - n + 1))
def max_sublist_sum(lst, n):
    return max(sum(sub) for sub in sublists(lst, n))
max_sublist_sum([1,2,3,4], 2) # => 7

This should do the trick:
def altsum_digits(n, d):
    l = list(map(int, str(n)))
    m = c = sum(l[:d])
    for i in range(0, len(l)-d):
        c = c - l[i] + l[i+d]
        if c > m: m = c
    print m
altsum_digits(1234,2)
>>> 7

I think I understand what you're asking, and here is my solution. I've tested it on your input as well as other inputs with varying lengths of substring. This code finds the maximum sum of adjacent substrings in the input.
def sum_of_sublist(input, maxLength):
    input_array = [int(l) for l in str(input)]
    tempMax = 0
    realMax = 0
    for i in range(len(input_array) - (maxLength - 1)):
        for inc in range(0, maxLength):
            tempMax += input_array[i+inc]
        if tempMax > realMax:
            realMax = tempMax
        tempMax = 0
    print realMax
sum_of_sublist(1234, 2)
So, for the call sum_of_sublist(1234, 2), it will print the value 7 because the largest sum of 2 consecutive digits is 3 + 4 = 7. Similarly, for the call sum_of_sublist(12531, 3), the program will print 10 because the largest sum of 3 consecutive digits is 2 + 5 + 3 = 10.
