Sorting an iterator in python - python

I want to iterate over a big itertools product, but I want to do it in a different order from the one that product offers. The problem is that sorting an iterator using sorted takes time. For example:
from itertools import product
import time
RNG = 15
RPT = 6
start = time.time()
a = sorted(product(range(RNG), repeat=RPT), key=sum)
print("Sorted: " + str(time.time() - start))
print(type(a))
start = time.time()
a = product(range(RNG), repeat=RPT)
print("Unsorted: " + str(time.time() - start))
print(type(a))
Creating the sorted iterator takes about twice as long. I'm guessing this is because sorted actually involves going through the whole iterator and returning a list. Whereas the second unsorted iterator is doing some sort of lazy evaluation magic.
I guess there's really two questions here.
General question: is there a lazy evaluation way to change the order items appear in an iterator?
Specific question: is there a way to loop through all m-length lists of ints less than n, hitting lists with smaller sums first?

If your objective is to reduce memory consumption, you could write your own generator to return the permutations in order of their sum (see below). But, if memory is not a concern, sorting the output of itertools.product() will be faster than the Python code that produces the same result.
Writing a recursive function that produces the combinations of values in order of their sum can be achieved by merging multiple iterators (one per starting value) based on the smallest sum:
def sumCombo(A,N):
    if N==1:
        yield from ((n,) for n in A) # single item combos
        return
    pA = []                          # list of iterator/states
    for i,n in enumerate(A):         # for each starting value
        ip = sumCombo(A[i:],N-1)     # iterator recursion to N-1
        p = next(ip)                 # current N-1 combination
        pA.append((n+sum(p),p,n,ip)) # sum, state & iterator
    while pA:
        # index and states of smallest sum
        i,(s,p,n,ip) = min(enumerate(pA),key=lambda ip:ip[1][0])
        ps = s
        while s == ps:               # output equal sum combinations
            yield (n,*p)             # yield starting number with recursed
            p = next(ip,None)        # advance iterator
            if p is None:
                del pA[i]            # remove exhausted iterators
                break
            s = n+sum(p)             # compute new sum
            pA[i] = (s,p,n,ip)       # and update states
This will only produce combinations of values as opposed to the product which produces distinct permutations of these combinations. (38,760 combinations vs 11,390,625 products).
In order to obtain all the products, you would need to run these combinations through a function that generates distinct permutations:
def permuteDistinct(A):
    if len(A) == 1:
        yield tuple(A)             # single value
        return
    seen = set()                   # track starting values
    for i,n in enumerate(A):       # for each starting value
        if n in seen: continue     # skip values already used as a start
        seen.add(n)
        for p in permuteDistinct(A[:i]+A[i+1:]):
            yield (n,*p)           # starting value & rest
def sumProd(A,N):
    for p in sumCombo(A,N):           # combinations in order of sum
        yield from permuteDistinct(p) # permuted
So sumProd(range(RNG),RPT) will produce the 11,390,625 permutations in order of their sum, without storing them in a list BUT it will take 5 times longer to do so (compared to sorting the product).
a = sorted(product(range(RNG), repeat=RPT), key=sum) # 4.6 sec
b = list(sumProd(range(RNG),RPT)) # 23 sec
list(map(sum,a)) == list(map(sum,b)) # True (same order of sums)
a == b # False (order differs for equal sums)
a[5:15]               b[5:15]               sum
(0, 1, 0, 0, 0, 0)    (0, 1, 0, 0, 0, 0)    1
(1, 0, 0, 0, 0, 0)    (1, 0, 0, 0, 0, 0)    1
(0, 0, 0, 0, 0, 2)    (0, 0, 0, 0, 0, 2)    2
(0, 0, 0, 0, 1, 1)    (0, 0, 0, 0, 2, 0)    2
(0, 0, 0, 0, 2, 0)    (0, 0, 0, 2, 0, 0)    2
(0, 0, 0, 1, 0, 1)    (0, 0, 2, 0, 0, 0)    2
(0, 0, 0, 1, 1, 0)    (0, 2, 0, 0, 0, 0)    2
(0, 0, 0, 2, 0, 0)    (2, 0, 0, 0, 0, 0)    2
(0, 0, 1, 0, 0, 1)    (0, 0, 0, 0, 1, 1)    2
(0, 0, 1, 0, 1, 0)    (0, 0, 0, 1, 0, 1)    2
If your process is searching for specific sums, it may be worthwhile to filter on the combinations first and only expand distinct permutations for the combinations (sums) that meet your criteria. This could cut down the number of iterations considerably: iterating through sumCombo(range(RNG),RPT) takes about 0.22 sec, which is much faster than sorting the products.
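For the specific question (all m-length tuples of ints below n, smallest sums first), another lazy option is to walk through the possible sums in increasing order and recursively enumerate only the tuples that hit each sum. This is a sketch rather than part of the answer above, and the names tuples_with_sum and product_by_sum are made up for illustration. Nothing is stored or sorted, and within one sum the tuples come out in lexicographic order, which is also what the stable sorted(..., key=sum) produces:

```python
def tuples_with_sum(total, length, limit):
    """Yield every tuple of `length` ints from range(limit) whose sum is `total`."""
    if length == 0:
        if total == 0:
            yield ()
        return
    # The first element must leave a remainder the other elements can reach.
    lo = max(0, total - (length - 1) * (limit - 1))
    hi = min(limit - 1, total)
    for first in range(lo, hi + 1):
        for rest in tuples_with_sum(total - first, length - 1, limit):
            yield (first, *rest)

def product_by_sum(limit, length):
    """Lazily yield product(range(limit), repeat=length) in order of sum."""
    for total in range(length * (limit - 1) + 1):
        yield from tuples_with_sum(total, length, limit)
```

Because it is pure Python it will probably still be slower than sorting when everything fits in memory, but you can stop early (for example with itertools.islice) without ever building the full list.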

Related

How can I improve performance of Sudoku solver?

I can't improve the performance of the following Sudoku Solver code. I know there are 3 loops here and they probably cause slow performance but I can't find a better/more efficient way. "board" is mutated with every iteration of recursion - if there are no zeros left, I just need to exit the recursion.
I tried to isolate "board" from mutation but it hasn't changed the performance. I also tried to use list comprehension for the top 2 "for" loops (i.e. only loop through rows and columns with zeros), tried to find coordinates of all zeros, and then use a single loop to go through them - hasn't helped.
I think I'm doing something fundamentally wrong here with recursion - any advice or recommendation on how to make the solution faster?
def box(board,row,column):
    start_col = column - (column % 3)
    finish_col = start_col + 3
    start_row = row - (row % 3)
    finish_row = start_row + 3
    return [y for x in board[start_row:finish_row] for y in x[start_col:finish_col]]
def possible_values(board,row,column):
    values = {1,2,3,4,5,6,7,8,9}
    col_values = [v[column] for v in board]
    row_values = board[row]
    box_values = box(board, row, column)
    return (values - set(row_values + col_values + box_values))
def solve(board, i_row = 0, i_col = 0):
    for rn in range(i_row,len(board)):
        if rn != i_row: i_col = 0
        for cn in range(i_col,len(board)):
            if board[rn][cn] == 0:
                options = possible_values(board, rn, cn)
                for board[rn][cn] in options:
                    if solve(board, rn, cn):
                        return board
                    board[rn][cn] = 0
                #if no options left for the cell, go to previous cell and try next option
                return False
    #if no zeros left on the board, problem is solved
    return True
problem = [
    [9, 0, 0, 0, 8, 0, 0, 0, 1],
    [0, 0, 0, 4, 0, 6, 0, 0, 0],
    [0, 0, 5, 0, 7, 0, 3, 0, 0],
    [0, 6, 0, 0, 0, 0, 0, 4, 0],
    [4, 0, 1, 0, 6, 0, 5, 0, 8],
    [0, 9, 0, 0, 0, 0, 0, 2, 0],
    [0, 0, 7, 0, 3, 0, 2, 0, 0],
    [0, 0, 0, 7, 0, 5, 0, 0, 0],
    [1, 0, 0, 0, 4, 0, 0, 0, 7]
]
solve(problem)
Three things you can do to speed this up:
Maintain additional state using arrays of integers to keep track of row, col, and box candidates (or equivalently values already used) so that finding possible values is just possible_values = row_candidates[row] & col_candidates[col] & box_candidates[box]. This is a constant-factor improvement that requires very little change to your approach.
As kosciej16 suggested, use the min-remaining-values heuristic for selecting which cell to fill next. This will turn your algorithm into crypto-DPLL, giving you early conflict detection (cells with 0 candidates), constraint propagation (cells with 1 candidate), and a lower effective branching factor for the rest.
Add logic to detect hidden singles (like the Norvig solver does). This will make your solver a little slower for the simplest puzzles, but it will make a huge difference for puzzles where hidden singles are important (like 17-clue puzzles).
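The first two suggestions can be sketched roughly as follows. This is only an illustration under some assumptions: the board is 9x9 with 0 for empty cells, the given clues do not conflict with each other, and the names solve_mrv, toggle and candidates are invented here. Hidden singles are not included.

```python
def solve_mrv(board):
    """Fill a 9x9 board in place (0 = empty cell); return True if solved."""
    full = 0b111111111                  # bit d-1 set: digit d is still free
    rows, cols, boxes = [full] * 9, [full] * 9, [full] * 9
    empty = []                          # coordinates of unfilled cells

    def toggle(r, c, d):
        """Flip digit d's availability in the row, column and box of (r, c)."""
        bit = 1 << (d - 1)
        rows[r] ^= bit
        cols[c] ^= bit
        boxes[r // 3 * 3 + c // 3] ^= bit

    def candidates(r, c):
        """Bitmask of digits still allowed in cell (r, c)."""
        return rows[r] & cols[c] & boxes[r // 3 * 3 + c // 3]

    for r in range(9):
        for c in range(9):
            if board[r][c]:
                toggle(r, c, board[r][c])   # assumes clues do not conflict
            else:
                empty.append((r, c))

    def search():
        if not empty:
            return True
        # min-remaining-values: pick the empty cell with the fewest candidates
        best = min(range(len(empty)),
                   key=lambda i: bin(candidates(*empty[i])).count("1"))
        r, c = empty.pop(best)
        mask = candidates(r, c)         # 0 means a dead end, the loop is skipped
        for d in range(1, 10):
            if mask >> (d - 1) & 1:
                board[r][c] = d
                toggle(r, c, d)
                if search():
                    return True
                toggle(r, c, d)         # undo before trying the next digit
        board[r][c] = 0
        empty.insert(best, (r, c))      # restore the cell and backtrack
        return False

    return search()
```

Calling solve_mrv(problem) on the board from the question fills it in place and returns True or False. Unlike the original solve, it never rescans rows, columns and boxes to find the possible values.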
A result that worked at the end thanks to 53x15 and kosciej16. Not ideal or most optimal but passes the required performance test:
def solve(board, i_row = 0, i_col = 0):
    cells_to_solve = [((rn, cn), possible_values(board,rn,cn)) for rn in range(len(board)) for cn in range(len(board)) if board[rn][cn] == 0]
    if not cells_to_solve: return True
    min_n_of_values = min([len(x[1]) for x in cells_to_solve])
    if min_n_of_values == 0: return False
    best_cells_to_try = [((rn,cn),options) for ((rn,cn),options) in cells_to_solve if len(options) == min_n_of_values]
    for ((rn,cn),options) in best_cells_to_try:
        for board[rn][cn] in options:
            if solve(board, rn, cn):
                return board
            board[rn][cn] = 0
    return False

Divide integer by a list to create a new list

I've created a list of numbers in a specified range. I now want to divide a value by each element in the list, and then add each new value to a new list.
Here's what I've got:
Y = []
value = 55 #can be any value of my choosing
newx = list(range(50,500,10))
newy = value/(newx)**2
Y.append(newy)
I keep getting a TypeError (unsupported operand type(s) for ** or pow(): 'list' and 'int') and I don't know why. NOTE: ** is the syntax for a power, i.e. 1/(x^2)
One "clean" option to do it is to use numpy array:
import numpy as np
value = 55 #can be any value of my choosing
Y = np.arange(50,500,10)
Y = value/(Y)**2
You got an error because in Python you cannot square a list (and you also cannot divide a number by a list). A numpy array lets you square it, do this division, and apply many other mathematical operations element-wise.
Your description says what you want to do: divide a value by each element in a list. But that's not what you're actually doing, which is trying to divide the value by the list itself. You should do what you say you want to:
Y = [value/v for v in newx]
(I don't understand what the ** is for, you don't mention that anywhere.)
You can just use a list comprehension :
newy = [value/x**2 for x in newx]
The error you get is because the square of a list isn't defined.
The square of a numpy.array is defined though, and would be a new array with the square of each element from the original array.
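Depending on the value and range you're working with, you might want to convert the int to float first. In Python 2, / between two ints is integer division, so you could get 0s otherwise (in Python 3, / always performs true division):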
>>> value = 55
>>> newx = range(50, 500, 10)
>>> [value/x**2 for x in newx]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
But :
>>> [value/float(x)**2 for x in newx]
[0.022, 0.015277777777777777, 0.011224489795918367, 0.00859375, 0.006790123456790123, 0.0055, 0.004545454545454545, 0.0038194444444444443, 0.003254437869822485, 0.0028061224489795917, 0.0024444444444444444, 0.0021484375, 0.0019031141868512112, 0.0016975308641975309, 0.0015235457063711912, 0.001375, 0.0012471655328798186, 0.0011363636363636363, 0.0010396975425330812, 0.0009548611111111111, 0.00088, 0.0008136094674556213, 0.0007544581618655693, 0.0007015306122448979, 0.0006539833531510107, 0.0006111111111111111, 0.0005723204994797086, 0.000537109375, 0.000505050505050505, 0.0004757785467128028, 0.0004489795918367347, 0.0004243827160493827, 0.00040175310445580715, 0.0003808864265927978, 0.0003616042077580539, 0.00034375, 0.0003271861986912552, 0.00031179138321995464, 0.00029745808545159546, 0.0002840909090909091, 0.00027160493827160494, 0.0002599243856332703, 0.00024898143956541424, 0.00023871527777777777, 0.00022907122032486465]
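For comparison, here is a small sketch of the same comprehension under Python 3 (the variable name floored is just for illustration). There / is true division, so the float() conversion is not needed, and // gives back the integer behaviour shown above:

```python
value = 55
newx = list(range(50, 500, 10))

# Python 3: "/" is true division, so plain ints already give floats
newy = [value / x**2 for x in newx]

# "//" is floor division and reproduces the all-zeros result
floored = [value // x**2 for x in newx]
```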

fill in list in multiple steps

Let's assume you have a list with y positions (all 0 for the sake of this question). If y = 10:
[0,0,0,0,0,0,0,0,0,0]
You want to fill x adjacent positions with 1s at every possible offset, and append each resulting list to an initially empty list. If x = 4:
[[1,1,1,1,0,0,0,0,0,0], [0,1,1,1,1,0,0,0,0,0], [0,0,1,1,1,1,0,0,0,0], ... , [0,0,0,0,0,0,1,1,1,1]]
I made that occur through this function:
def permutations(number=4, limit=10):
    perms = []
    if type(number) == int:
        a = -1
        b = a + number
        while b < limit:
            a+=1
            b = a + number
            start = [0 for x in range(limit)]
            for i in range(a, b):
                start[i] = 1
            perms.append(start)
    return perms
This is fine, but I want to do the same thing when passing a tuple instead of an integer. In that case I'd like the output to be:
if number = (4,3):
[[1,1,1,1,0,1,1,1,0,0], [1,1,1,1,0,0,1,1,1,0], [1,1,1,1,0,0,0,1,1,1],
[0,1,1,1,1,0,1,1,1,0], [0,1,1,1,1,0,0,1,1,1],
[0,0,1,1,1,1,0,1,1,1]]
The 0 between the two groupings of 1's is necessary. The first value of the tuple corresponds to the number of 1's in the first grouping, and the second value of the tuple corresponds to the number of 1's in the second grouping. Ideally this function would work with tuples that have more than 2 values.
This idea is a little challenging to get across so please let me know if you need any clarification.
Thank you for your help!
The simplest approach I can think of is to generate all possible combinations of 1 and 0, and filter out all of the ones that don't have the right grouping lengths.
import itertools
def permutations(tup, limit=10):
    for candidate in itertools.product([0,1], repeat=limit):
        segment_lengths = [len(list(b)) for a,b in itertools.groupby(candidate) if a == 1]
        if tup == tuple(segment_lengths):
            yield candidate
for seq in permutations((4, 3), 10):
    print seq
Result:
(0, 0, 1, 1, 1, 1, 0, 1, 1, 1)
(0, 1, 1, 1, 1, 0, 0, 1, 1, 1)
(0, 1, 1, 1, 1, 0, 1, 1, 1, 0)
(1, 1, 1, 1, 0, 0, 0, 1, 1, 1)
(1, 1, 1, 1, 0, 0, 1, 1, 1, 0)
(1, 1, 1, 1, 0, 1, 1, 1, 0, 0)
Note that this is very slow for large values of limit - it has to evaluate 2^limit candidate sequences. Not bad for limit = 10; only 1024 candidates need to be evaluated. But it quickly grows into the millions and beyond for larger limits.
Edit: Inspired by user2097159's excellent comment, here's an approach with better run time.
import itertools
"""Finds all non-negative integer sequences whose sum equals `total`, and who have `size` elements."""
def possible_sums(total, size):
    if total == 0:
        yield [0]*size
        return
    if size == 1:
        yield [total]
        return
    for i in range(total+1):
        left = [i]
        for right in possible_sums(total-i, size-1):
            yield left + right
"""
combines two lists a and b in order like:
[a[0], b[0], a[1], b[1]...]
"""
def interleave(a,b):
    result = []
    for pair in itertools.izip_longest(a,b):
        for item in pair:
            if item is not None:
                result.append(item)
    return result
"""flattens a list of lists into a one dimensional list"""
def flatten(seq):
    return [x for item in seq for x in item]
def permutations(tup, limit):
    one_segments = [[1]*size for size in tup]
    for i in range(len(tup)-1):
        one_segments[i].append(0)
    remaining_zeroes = limit - sum(tup) - len(tup) + 1
    assert remaining_zeroes >= 0, "not enough room to separate ranges!"
    for gap_sizes in possible_sums(remaining_zeroes, len(tup)+1):
        zero_segments = [[0]*size for size in gap_sizes]
        yield flatten(interleave(zero_segments, one_segments))
for seq in permutations((4, 3), 10):
    print seq
You can generate all the lists recursively.
F(tup, limit) =
[1, 1, ..., 1, 0] combined with all solutions of F(tup[1:], limit - tup[0] - 1)
[0, 1, 1, ..., 1, 0] combined with all solutions of F(tup[1:], limit - tup[0] - 2)
.
.
.
If tup is empty, return a single list of limit zeros.
If sum(tup) + len(tup) - 1 > limit, return an empty list, since there is no solution.
e.g. permutations((4,3,2), 10) returns []
Otherwise, enumerate how many prefix zeros there will be:
Generate the prefix list, which is [0, 0, ..., 0, 1, 1, ..., 1, 0]. The number of 1s is the value of the first item in the tuple. Append the trailing 0 only if it's not the last item of the tuple.
Call the function recursively on the remaining elements of the tuple to solve the similar sub-problem.
Combine the prefix list with each solution of the sub-problem.
Here is the code:
def permutations(tup, limit=100):
    if len(tup) <= 0:
        return [[0] * limit]
    minimum_len = sum(tup) + len(tup) - 1
    if minimum_len > limit:
        return []
    perms = []
    for prefix_zero in range(0, limit - minimum_len + 1):
        prefix = [0] * prefix_zero + [1] * tup[0]
        if len(tup) > 1:
            prefix += [0]
        suffix_list = permutations(tup[1:], limit - len(prefix))
        perms += [prefix + suffix for suffix in suffix_list] #combine the solutions
    return perms
This solution creates all permutations of blocks of ones (a list defined by each entry in the tuple) with blocks of zeros (lists of length one) for the extra padding.
import itertools as it
spec = (1,2,3)
nBlocks = len(spec)
nZeros = 5
totalSize = sum(spec) + nZeros+1-nBlocks
blocks = [[1,]*s + [0,] for s in spec]
zeros = [[0,],]*(nZeros+1-nBlocks)
a = list(it.permutations(blocks + zeros, nZeros+1))
b = [list(it.chain.from_iterable(l))[:-1] for l in a]
for l in b:
    print l
Without using itertools.
My shot at this, should be fairly quick, but uses a recursive generator (python recursion depth limit, here I come...).
# simple test case
seqs = (1, 2, 3)
length = 10
# '0' spots count
zeros = length - (sum(seqs))
# partitions count
partitions = len(seqs) + 1
# first and last partitions can have 0 zeros
# so use a flag when we call the function or check if it's the last partition
def generate_gaps(zeros_left, partition, first=False):
    """
    :param zeros_left: how many zeros we can still use
    :param partition: what partition is this
    :param first: is this the first gap
    :return: all possible gaps
    """
    for gap in range((0 if first or partition == 0 else 1), zeros_left + 1):
        if partition == 0:
            if (zeros_left - gap) == 0:
                yield [gap]
        else:
            for rest in generate_gaps(zeros_left - gap, partition - 1):
                yield [gap] + rest
for gaps in generate_gaps(zeros, partitions - 1, True):
    print "using gaps: " + str(gaps)
    # merge lists
    # zip gaps (0's) and sequences (1's) - all but last gap (added to result)
    gaps_seqs = zip(gaps, seqs)
    # expand everything... magic (could be done explicitly trivially).
    result = sum(map(lambda x: [0] * x[0] + [1] * x[1], gaps_seqs), [])
    # last gap (truncated from zip)
    result = result + [0] * gaps[-1]
A simple non-recursive generator solution without itertools:
def fill_sequence(sequence, size):
    n_slots = size - len(sequence)
    for start in xrange(n_slots + 1):
        yield [0]*start + sequence + [0]*(n_slots - start)
def all_placements(inner_sizes, outer_size):
    x, y = inner_sizes
    for margin in xrange(1, outer_size - sum(inner_sizes) + 1):
        sequence = [1]*x + [0]*margin + [1]*y
        for p in fill_sequence(sequence, outer_size):
            yield p
So that:
>>> list(all_placements((4,3), 10))
[[1, 1, 1, 1, 0, 1, 1, 1, 0, 0],
[0, 1, 1, 1, 1, 0, 1, 1, 1, 0],
[0, 0, 1, 1, 1, 1, 0, 1, 1, 1],
[1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
[0, 1, 1, 1, 1, 0, 0, 1, 1, 1],
[1, 1, 1, 1, 0, 0, 0, 1, 1, 1]]
The idea is quite simple. Suppose you fix the number of zeros between your two blocks of ones, call it the margin. This gives you a 4 + margin + 3 sequence. You can easily place this sequence in the larger list of zeros using the approach you took in your post. Then simply iteratively increase the margin, yielding all possible placements.
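The margin idea can be extended to tuples of any length with a "stars and bars" count: after reserving one separating 0 between neighbouring blocks, the remaining free zeros and the blocks together form free + k slots, and choosing which k slots hold the blocks fixes the whole layout. The following is a sketch in Python 3 under that reading of the problem; the name placements is made up here and is not taken from any of the answers.

```python
from itertools import combinations

def placements(blocks, limit):
    """Yield every 0/1 list of length `limit` whose runs of 1s have the
    lengths given in `blocks`, in that order."""
    k = len(blocks)
    if k == 0:
        yield [0] * limit
        return
    free = limit - sum(blocks) - (k - 1)    # zeros left after mandatory separators
    if free < 0:
        return                              # the blocks do not fit
    # Choose which of the free + k slots hold a block; the rest are free zeros.
    for chosen in combinations(range(free + k), k):
        row = []
        prev = -1
        for i, pos in enumerate(chosen):
            row += [0] * (pos - prev - 1)   # free zeros before this block
            if i > 0:
                row.append(0)               # mandatory separator
            row += [1] * blocks[i]
            prev = pos
        row += [0] * (limit - len(row))     # trailing zeros
        yield row
```

With this, list(placements((4, 3), 10)) gives the six lists from the question, and only valid layouts are ever generated, so the work is proportional to the size of the output rather than to 2^limit.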

Python nonogram uniqueness

I'm trying to write a Python script to determine if a given nonogram is unique. My current script just takes way too long to run so I was wondering if anyone had any ideas.
I understand that the general nonogram problem is NP-hard. However, I know two pieces of information about my given nonograms:
When representing the black/white boxes as 0s and 1s, respectively, I know how many of each I have.
I'm only considering 6x6 nonograms.
I initially used a brute force approach (so 2^36 cases). Knowing (1), however, I was able to narrow it down to n-choose-k (36-choose-number of zeroes) cases. However, when k is near 18, this is still ~2^33 cases. Takes days to run.
Any ideas how I might speed this up? Is it even possible?
Again, I don't care what the solution is -- I already have it. What I'm trying to determine is if that solution is unique.
EDIT:
This isn't exactly the full code but has the general idea:
def unique(nonogram):
    found = 0
    # create all combinations with the same number of 1s and 0s as incoming nonogram
    for entry in itertools.combinations(range(len(nonogram)), nonogram.count(1)):
        blank = [0]*len(nonogram)  # initialize blank nonogram
        for element in entry:
            blank[element] = 1  # distribute 1s across nonogram
        rows = find_rows(blank)  # create row headers (like '2 1')
        cols = find_cols(blank)
        if rows == nonogram_rows and cols == nonogram_cols:
            found += 1  # row and col headers same as original nonogram
        if found > 1:
            break  # obviously not unique
    if found == 1:
        print('Unique nonogram')
I can't think of a clever way to prove uniqueness other than to solve the problem, but 6x6 is small enough that we can basically do a brute-force solution. To speed things up, instead of looping over every possible nonogram we can loop over all satisfying rows. Something like this (note: untested) should work:
from itertools import product, groupby
from collections import defaultdict
def vec_to_spec(v):
    return tuple(len(list(g)) for k,g in groupby(v) if k)
def build_specs(n=6):
    specs = defaultdict(list)
    for v in product([0,1], repeat=n):
        specs[vec_to_spec(v)].append(v)
    return specs
def check(rowvecs, row_counts, col_counts):
    colvecs = zip(*rowvecs)
    row_pass = all(vec_to_spec(r) == tuple(rc) for r,rc in zip(rowvecs, row_counts))
    col_pass = all(vec_to_spec(r) == tuple(rc) for r,rc in zip(colvecs, col_counts))
    return row_pass and col_pass
def nonosolve(row_counts, col_counts):
    specs = build_specs(len(row_counts))
    possible_rows = [specs[tuple(r)] for r in row_counts]
    sols = []
    for poss in product(*possible_rows):
        if check(poss, row_counts, col_counts):
            sols.append(poss)
    return sols
from which we learn that
>>> rows = [[2,2],[4], [1,1,1,], [2], [1,1,1,], [3,1]]
>>> cols = [[1,1,2],[1,1],[1,1],[4,],[2,1,],[3,2]]
>>> nonosolve(rows, cols)
[((1, 1, 0, 0, 1, 1), (0, 0, 1, 1, 1, 1), (1, 0, 0, 1, 0, 1),
(0, 0, 0, 1, 1, 0), (1, 0, 0, 1, 0, 1), (1, 1, 1, 0, 0, 1))]
>>> len(_)
1
is unique, but
>>> rows = [[1,1,1],[1,1,1], [1,1,1,], [1,1,1], [1,1,1], [1,1,1]]
>>> cols = rows
>>> nonosolve(rows, cols)
[((0, 1, 0, 1, 0, 1), (1, 0, 1, 0, 1, 0), (0, 1, 0, 1, 0, 1), (1, 0, 1, 0, 1, 0), (0, 1, 0, 1, 0, 1), (1, 0, 1, 0, 1, 0)),
((1, 0, 1, 0, 1, 0), (0, 1, 0, 1, 0, 1), (1, 0, 1, 0, 1, 0), (0, 1, 0, 1, 0, 1), (1, 0, 1, 0, 1, 0), (0, 1, 0, 1, 0, 1))]
>>> len(_)
2
isn't.
[Note that this isn't a very good solution for the problem in general as it throws away most of the information, but it was straightforward.]
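One way to keep more of that information, sketched here as an assumption rather than as the answer's method, is to place rows one at a time and drop a partial grid as soon as some column can no longer match its clue. Since only uniqueness matters, the search can also stop at the second solution. The names runs, prefix_ok and count_solutions are invented for this sketch.

```python
from itertools import groupby, product

def runs(vec):
    """Lengths of the runs of 1s in a 0/1 sequence, e.g. (1,1,0,1) -> (2, 1)."""
    return tuple(len(list(g)) for k, g in groupby(vec) if k)

def prefix_ok(col, clue):
    """Can the partial column `col` still be extended to match `clue`?"""
    r = runs(col)
    if len(r) > len(clue):
        return False
    if r and col[-1] == 1:              # last run is still open and may grow
        return r[:-1] == clue[:len(r) - 1] and r[-1] <= clue[len(r) - 1]
    return r == clue[:len(r)]           # every run so far is closed

def count_solutions(row_clues, col_clues, limit=2):
    """Count grids matching the clues, giving up once `limit` are found."""
    width = len(col_clues)
    col_clues = [tuple(c) for c in col_clues]
    by_clue = {}                        # clue -> all rows of this width with that clue
    for v in product((0, 1), repeat=width):
        by_clue.setdefault(runs(v), []).append(v)
    candidates = [by_clue.get(tuple(c), []) for c in row_clues]
    grid = []

    def search(i):
        if i == len(candidates):        # all rows placed: exact column check
            cols = zip(*grid)
            return int(all(runs(c) == clue for c, clue in zip(cols, col_clues)))
        found = 0
        for row in candidates[i]:
            grid.append(row)
            if all(prefix_ok(c, clue) for c, clue in zip(zip(*grid), col_clues)):
                found += search(i + 1)
            grid.pop()
            if found >= limit:          # enough to know it is not unique
                break
        return found

    return min(search(0), limit)
```

A puzzle is then unique exactly when count_solutions(rows, cols) == 1. The prefix test is deliberately loose (it never rejects a column that could still work), so the final exact check on complete columns is what guarantees correctness.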

Detect whether sequence is a multiple of a subsequence in Python

I have a tuple of zeros and ones, for instance:
(1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1)
It turns out:
(1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1) == (1, 0, 1, 1) * 3
I want a function f such that if s is a non-empty tuple of zeros and ones, f(s) is the shortest subtuple r such that s == r * n for some positive integer n.
So for instance,
f( (1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1) ) == (1, 0, 1, 1)
What is a slick way to write the function f in Python?
Edit:
The naive method I am currently using
def f(s):
    for i in range(1,len(s)):
        if len(s)%i == 0 and s == s[:i] * (len(s)//i):
            return s[:i]
I believe I have an O(n) time solution (actually 2n+r, where n is the length of the tuple and r is the length of the sub-tuple) which does not use suffix trees, but uses a string matching algorithm (like KMP, which you should find off the shelf).
We use the following little-known theorem:
If x,y are strings over some alphabet,
then xy = yx if and only if x = z^k and y = z^l for some string z and integers k,l.
I now claim that, for the purposes of our problem, this means that all we need to do is determine if the given tuple/list (or string) is a nontrivial cyclic shift of itself!
To determine if a string is a cyclic shift of itself, we concatenate it with itself (it does not even have to be a real concat, just a virtual one will do) and check for a substring match (with itself) at a position other than the start.
For a proof of that, suppose the string is a cyclic shift of itself.
Then we have that the given string y = uv = vu.
Since uv = vu, we must have that u = z^k and v = z^l and hence y = z^{k+l} from the above theorem. The other direction is easy to prove.
Here is the python code. The method is called powercheck.
def powercheck(lst):
    count = 0
    position = 0
    for pos in KnuthMorrisPratt(double(lst), lst):
        count += 1
        position = pos
        if count == 2:
            break
    return lst[:position]
def double(lst):
    for i in range(1,3):
        for elem in lst:
            yield elem
def main():
    print powercheck([1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1])
if __name__ == "__main__":
    main()
And here is the KMP code which I used (due to David Eppstein).
# Knuth-Morris-Pratt string matching
# David Eppstein, UC Irvine, 1 Mar 2002
def KnuthMorrisPratt(text, pattern):
    '''Yields all starting positions of copies of the pattern in the text.
    Calling conventions are similar to string.find, but its arguments can be
    lists or iterators, not just strings, it returns all matches, not just
    the first one, and it does not need the whole text in memory at once.
    Whenever it yields, it will have read the text exactly up to and including
    the match that caused the yield.'''
    # allow indexing into pattern and protect against change during yield
    pattern = list(pattern)
    # build table of shift amounts
    shifts = [1] * (len(pattern) + 1)
    shift = 1
    for pos in range(len(pattern)):
        while shift <= pos and pattern[pos] != pattern[pos-shift]:
            shift += shifts[pos-shift]
        shifts[pos+1] = shift
    # do the actual search
    startPos = 0
    matchLen = 0
    for c in text:
        while matchLen == len(pattern) or \
              matchLen >= 0 and pattern[matchLen] != c:
            startPos += shifts[matchLen]
            matchLen -= shifts[matchLen]
        matchLen += 1
        if matchLen == len(pattern):
            yield startPos
For your sample this outputs
[1,0,1,1]
as expected.
I compared this against shx2's code (not the numpy one) by generating a random 50-bit string and then replicating it to make the total length 1 million. This was the output (the decimal number is the output of time.time())
1362988461.75
(50, [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1])
1362988465.96
50 [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1]
1362988487.14
The above method took ~4 seconds, while shx2's method took ~21 seconds!
Here was the timing code. (shx2's method was called powercheck2).
import random
import time
def rand_bitstring(n):
    rand = random.SystemRandom()
    lst = []
    for j in range(1, n+1):
        r = rand.randint(1,2)
        if r == 2:
            lst.append(0)
        else:
            lst.append(1)
    return lst
def main():
    lst = rand_bitstring(50)*200000
    print time.time()
    print powercheck(lst)
    print time.time()
    powercheck2(lst)
    print time.time()
The following solution is O(N^2), but has the advantage of not creating any copies (or slices) of your data, as it is based on iterators.
Depending on the size of your input, the fact you avoid making copies of the data can result in a significant speed-up, but of course, it would not scale as well for huge inputs as algorithms with lower complexity (e.g. O(N*logN)).
[This is the second revision of my solution, the first one is given below. This one is simpler to understand, and is more along the lines of OP's tuple-multiplication, only using iterators.]
from itertools import izip, chain, tee
def iter_eq(seq1, seq2):
    """ assumes the sequences have the same len """
    return all( v1 == v2 for v1, v2 in izip(seq1, seq2) )
def dup_seq(seq, n):
    """ returns an iterator which is seq chained to itself n times """
    return chain(*tee(seq, n))
def is_reps(arr, slice_size):
    if len(arr) % slice_size != 0:
        return False
    num_slices = len(arr) / slice_size
    return iter_eq(arr, dup_seq(arr[:slice_size], num_slices))
s = (1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1)
for i in range(1,len(s)):
    if is_reps(s, i):
        print i, s[:i]
        break
[My original solution]
from itertools import islice
def is_reps(arr, num_slices):
    if len(arr) % num_slices != 0:
        return False
    slice_size = len(arr) / num_slices
    for i in xrange(slice_size):
        if len(set( islice(arr, i, None, num_slices) )) > 1:
            return False
    return True
s = (1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1)
for i in range(1,len(s)):
    if is_reps(s, i):
        print i, s[:i]
        break
You can avoid the call to set() by using something like:
def is_iter_unique(seq):
    """ a faster version of testing len(set(seq)) <= 1 """
    seen = set()
    for x in seq:
        seen.add(x)
        if len(seen) > 1:
            return False
    return True
and replacing this line:
if len(set( islice(arr, i, None, num_slices) )) > 1:
with:
if not is_iter_unique(islice(arr, i, None, num_slices)):
Simplifying Knoothe's solution. His algorithm is right, but his implementation is too complex. This implementation is also O(n).
Since your array is only composed of ones and zeros, what I do is use the existing str.find implementation (Boyer-Moore) to implement Knoothe's idea. It's surprisingly simpler and amazingly faster at runtime.
def f(s):
    s2 = ''.join(map(str, s))
    return s[:(s2+s2).index(s2, 1)]
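The string trick relies on every element printing as a single character. If the elements can be arbitrary values, the same result can be obtained from the KMP failure table: the smallest period of s is len(s) minus the length of its longest proper border, and s is a whole number of repetitions only when that period divides len(s). The following is a sketch of that standard technique; the name shortest_repeating_unit is made up here and is not from any answer above.

```python
def shortest_repeating_unit(s):
    """Return the shortest r with s == r * n for some positive integer n."""
    n = len(s)
    if n == 0:
        return s
    fail = [0] * n      # fail[i]: length of the longest proper border of s[:i+1]
    k = 0
    for i in range(1, n):
        while k and s[i] != s[k]:
            k = fail[k - 1]
        if s[i] == s[k]:
            k += 1
        fail[i] = k
    period = n - fail[-1]               # smallest shift that maps s onto itself
    return s[:period] if n % period == 0 else s
```

It runs in O(n), works on tuples, lists and strings alike, and returns s itself when there is no shorter repeating unit.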
Here's another solution (competing with my earlier iterators-based solution), leveraging numpy.
It does make a (single) copy of your data, but taking advantage of the fact your values are 0s and 1s, it is super-fast, thanks to numpy's magics.
import numpy as np
def is_reps(arr, slice_size):
    if len(arr) % slice_size != 0:
        return False
    arr = arr.reshape((-1, slice_size))
    return (arr.all(axis=0) | (~arr).all(axis=0)).all()
s = (1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1) * 1000
a = np.array(s, dtype=bool)
for i in range(1,len(s)):
    if is_reps(a, i):
        print i, s[:i]
        break
Just a different approach to the problem
I first determine all the factors of the length and then split the list and check if all the parts are same
>>> def f(s):
        def factors(n):
            #http://stackoverflow.com/a/6800214/977038
            return set(reduce(list.__add__,
                ([i, n//i] for i in range(2, int(n**0.5) + 1) if n % i == 0)))
        _len = len(s)
        for fact in reversed(list(factors(_len))):
            compare_set = set(izip(*[iter(s)]*fact))
            if len(compare_set) == 1:
                return compare_set
>>> f(t)
set([(1, 0, 1, 1)])
You can achieve it in sublinear time by XOR'ing the rotated binary form of the input array:
get the binary representation of the array, input_binary
loop from i = 1 to len(input_array)/2, and on each iteration rotate input_binary to the right by i bits, save it as rotated_bin, then compute the XOR of rotated_bin and input_binary.
The first i that yields 0 is the length of the desired substring.
Complete code:
def get_substring(arr):
    binary = ''.join(map(str, arr)) # join the elements to get the binary form
    for i in xrange(1, len(arr) / 2):
        # do a i bit rotation shift, get bit string sub_bin
        rotated_bin = binary[-i:] + binary[:-i]
        if int(rotated_bin) ^ int(binary) == 0:
            return arr[0:i]
    return None
if __name__ == "__main__":
    test = [1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1]
    print get_substring(test) # [1,0,1,1]
This one is just a dumb recursive comparison in Haskell. It takes about one second for Knoothe's million long string (f a). Cool problem! I'll think about it some more.
a = concat $ replicate 20000
    [1,1,1,0,0,1,0,1,0,0,1,0,0,1,1,1,0,0,
     0,0,0,0,1,1,1,1,0,0,0,1,1,0,1,1,1,1,
     1,1,1,0,0,1,1,1,0,0,0,0,0,1]
f s =
  f' s [] where
    f' [] result = []
    f' (x:xs) result =
      let y = result ++ [x]
      in if concat (replicate (div (length s) (length y)) y) == s
            then y
            else f' xs y
