Python nonogram uniqueness - python

I'm trying to write a Python script to determine if a given nonogram is unique. My current script just takes way too long to run so I was wondering if anyone had any ideas.
I understand that the general nonogram problem is NP-hard. However, I know two pieces of information about my given nonograms:
(1) When representing the black/white boxes as 0s and 1s, respectively, I know how many of each I have.
(2) I'm only considering 6x6 nonograms.
I initially used a brute-force approach (so 2^36 cases). Knowing (1), however, I was able to narrow it down to n-choose-k (36-choose-number-of-zeroes) cases. However, when k is near 18, this is still ~2^33 cases, which takes days to run.
Any ideas how I might speed this up? Is it even possible?
Again, I don't care what the solution is -- I already have it. What I'm trying to determine is if that solution is unique.
EDIT:
This isn't exactly the full code but has the general idea:
def unique(nonogram):
    found = 0
    # create all combinations with the same number of 1s and 0s as incoming nonogram
    for entry in itertools.combinations(range(len(nonogram)), nonogram.count(1)):
        blank = [0]*len(nonogram)  # initialize blank nonogram
        for element in entry:
            blank[element] = 1  # distribute 1s across nonogram
        rows = find_rows(blank)  # create row headers (like '2 1')
        cols = find_cols(blank)
        if rows == nonogram_rows and cols == nonogram_cols:
            found += 1  # row and col headers same as original nonogram
        if found > 1:
            break  # obviously not unique
    if found == 1:
        print('Unique nonogram')

I can't think of a clever way to prove uniqueness other than to solve the problem, but 6x6 is small enough that we can basically do a brute-force solution. To speed things up, instead of looping over every possible nonogram we can loop over all satisfying rows. Something like this (note: untested) should work:
from itertools import product, groupby
from collections import defaultdict
def vec_to_spec(v):
    return tuple(len(list(g)) for k,g in groupby(v) if k)
def build_specs(n=6):
    specs = defaultdict(list)
    for v in product([0,1], repeat=n):
        specs[vec_to_spec(v)].append(v)
    return specs
def check(rowvecs, row_counts, col_counts):
    colvecs = zip(*rowvecs)
    row_pass = all(vec_to_spec(r) == tuple(rc) for r,rc in zip(rowvecs, row_counts))
    col_pass = all(vec_to_spec(r) == tuple(rc) for r,rc in zip(colvecs, col_counts))
    return row_pass and col_pass
def nonosolve(row_counts, col_counts):
    specs = build_specs(len(row_counts))
    possible_rows = [specs[tuple(r)] for r in row_counts]
    sols = []
    for poss in product(*possible_rows):
        if check(poss, row_counts, col_counts):
            sols.append(poss)
    return sols
from which we learn that
>>> rows = [[2,2],[4], [1,1,1,], [2], [1,1,1,], [3,1]]
>>> cols = [[1,1,2],[1,1],[1,1],[4,],[2,1,],[3,2]]
>>> nonosolve(rows, cols)
[((1, 1, 0, 0, 1, 1), (0, 0, 1, 1, 1, 1), (1, 0, 0, 1, 0, 1),
(0, 0, 0, 1, 1, 0), (1, 0, 0, 1, 0, 1), (1, 1, 1, 0, 0, 1))]
>>> len(_)
1
is unique, but
>>> rows = [[1,1,1],[1,1,1], [1,1,1,], [1,1,1], [1,1,1], [1,1,1]]
>>> cols = rows
>>> nonosolve(rows, cols)
[((0, 1, 0, 1, 0, 1), (1, 0, 1, 0, 1, 0), (0, 1, 0, 1, 0, 1), (1, 0, 1, 0, 1, 0), (0, 1, 0, 1, 0, 1), (1, 0, 1, 0, 1, 0)),
((1, 0, 1, 0, 1, 0), (0, 1, 0, 1, 0, 1), (1, 0, 1, 0, 1, 0), (0, 1, 0, 1, 0, 1), (1, 0, 1, 0, 1, 0), (0, 1, 0, 1, 0, 1))]
>>> len(_)
2
isn't.
[Note that this isn't a very good solution for the problem in general as it throws away most of the information, but it was straightforward.]
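As the note says, the loop above only checks the columns once a full grid has been built. A possible refinement, sketched here under my own assumptions (the names runs, prefix_ok and count_solutions are made up for illustration and are not part of the answer above), is to place rows one at a time, drop any partial grid whose column prefixes can no longer match, and stop as soon as a second solution shows up, since uniqueness is all that is being asked:

```python
from itertools import groupby, product

def runs(v):
    """Lengths of the runs of 1s in v, e.g. (1, 1, 0, 1) -> (2, 1)."""
    return tuple(len(list(g)) for k, g in groupby(v) if k)

def prefix_ok(prefix, spec):
    """Necessary (not sufficient) test: can a column starting with prefix still match spec?"""
    r = runs(prefix)
    if len(r) > len(spec):
        return False
    if not r:
        return True
    if r[:-1] != spec[:len(r) - 1]:      # finished runs must match exactly
        return False
    if prefix[-1]:                       # last run is still open and may grow
        return r[-1] <= spec[len(r) - 1]
    return r[-1] == spec[len(r) - 1]     # last run is closed

def count_solutions(row_counts, col_counts, limit=2):
    """Count solutions, giving up once `limit` have been found."""
    rows = [tuple(r) for r in row_counts]
    cols = [tuple(c) for c in col_counts]
    width = len(cols)
    candidates = [[v for v in product((0, 1), repeat=width) if runs(v) == r]
                  for r in rows]
    found = 0

    def place(i, grid):
        nonlocal found
        if i == len(rows):
            # full grid: do the exact column check
            if all(runs(c) == spec for c, spec in zip(zip(*grid), cols)):
                found += 1
            return
        for v in candidates[i]:
            grid.append(v)
            if all(prefix_ok(c, spec) for c, spec in zip(zip(*grid), cols)):
                place(i + 1, grid)
            grid.pop()
            if found >= limit:
                return

    place(0, [])
    return found

rows = [[1, 1, 1]] * 6
print(count_solutions(rows, rows))  # the two checkerboards are found, then the search stops
```

A result of exactly 1 means the puzzle is unique. I have not timed this against nonosolve, so treat the speed-up as an expectation rather than a measurement.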

Related

Sorting an iterator in python

I want to iterate over a big itertools product, but I want to do it in a different order from the one that product offers. The problem is that sorting an iterator using sorted takes time. For example:
from itertools import product
import time
RNG = 15
RPT = 6
start = time.time()
a = sorted(product(range(RNG), repeat=RPT), key=sum)
print("Sorted: " + str(time.time() - start))
print(type(a))
start = time.time()
a = product(range(RNG), repeat=RPT)
print("Unsorted: " + str(time.time() - start))
print(type(a))
Creating the sorted iterator takes about twice as long. I'm guessing this is because sorted actually has to go through the whole iterator and return a list, whereas the second, unsorted iterator is doing some sort of lazy evaluation magic.
I guess there are really two questions here.
General question: is there a lazy evaluation way to change the order items appear in an iterator?
Specific question: is there a way to loop through all m-length lists of ints less than n, hitting lists with smaller sums first?
If your objective is to reduce memory consumption, you could write your own generator to return the permutations in order of their sum (see below). But, if memory is not a concern, sorting the output of itertools.product() will be faster than the Python code that produces the same result.
Writing a recursive function that produces the combinations of values in order of their sum can be achieved by merging multiple iterators (one per starting value) based on the smallest sum:
def sumCombo(A,N):
    if N==1:
        yield from ((n,) for n in A) # single item combos
        return
    pA = []                          # list of iterator/states
    for i,n in enumerate(A):         # for each starting value
        ip = sumCombo(A[i:],N-1)     # iterator recursion to N-1
        p  = next(ip)                # current N-1 combination
        pA.append((n+sum(p),p,n,ip)) # sum, state & iterator
    while pA:
        # index and states of smallest sum
        i,(s,p,n,ip) = min(enumerate(pA),key=lambda ip:ip[1][0])
        ps = s
        while s == ps:               # output equal sum combinations
            yield (n,*p)             # yield starting number with recursed
            p = next(ip,None)        # advance iterator
            if p is None:
                del pA[i]            # remove exhausted iterators
                break
            s = n+sum(p)             # compute new sum
            pA[i] = (s,p,n,ip)       # and update states
This will only produce combinations of values as opposed to the product which produces distinct permutations of these combinations. (38,760 combinations vs 11,390,625 products).
In order to obtain all the products, you would need to run these combinations through a function that generates distinct permutations:
def permuteDistinct(A):
    if len(A) == 1:
        yield tuple(A)               # single value
        return
    seen = set()                     # track starting value
    for i,n in enumerate(A):         # for each starting value
        if n in seen: continue       # skip values already used as a starting value
        seen.add(n)
        for p in permuteDistinct(A[:i]+A[i+1:]):
            yield (n,*p)             # starting value & rest
def sumProd(A,N):
    for p in sumCombo(A,N):           # combinations in order of sum
        yield from permuteDistinct(p) # permuted
So sumProd(range(RNG),RPT) will produce the 11,390,625 permutations in order of their sum, without storing them in a list BUT it will take 5 times longer to do so (compared to sorting the product).
a = sorted(product(range(RNG), repeat=RPT), key=sum) # 4.6 sec
b = list(sumProd(range(RNG),RPT)) # 23 sec
list(map(sum,a)) == list(map(sum,b)) # True (same order of sums)
a == b # False (order differs for equal sums)
a[5:15]             b[5:15]             sum
(0, 1, 0, 0, 0, 0)  (0, 1, 0, 0, 0, 0)  1
(1, 0, 0, 0, 0, 0)  (1, 0, 0, 0, 0, 0)  1
(0, 0, 0, 0, 0, 2)  (0, 0, 0, 0, 0, 2)  2
(0, 0, 0, 0, 1, 1)  (0, 0, 0, 0, 2, 0)  2
(0, 0, 0, 0, 2, 0)  (0, 0, 0, 2, 0, 0)  2
(0, 0, 0, 1, 0, 1)  (0, 0, 2, 0, 0, 0)  2
(0, 0, 0, 1, 1, 0)  (0, 2, 0, 0, 0, 0)  2
(0, 0, 0, 2, 0, 0)  (2, 0, 0, 0, 0, 0)  2
(0, 0, 1, 0, 0, 1)  (0, 0, 0, 0, 1, 1)  2
(0, 0, 1, 0, 1, 0)  (0, 0, 0, 1, 0, 1)  2
If your process is searching for specific sums, it may be interesting to filter on combinations first and only expand distinct permutations for the combinations (sums) that meet your criteria. This could potentially cut down the number of iterations considerably (sumCombo(range(RNG),RPT) takes about 0.22 sec, which is faster than sorting the products).
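For the specific question (all m-length tuples of ints less than n, smaller sums first), another lazy option is to walk the target sums in increasing order and generate only the tuples that hit each sum. This is a different technique from the iterator merge above, and the names tuples_with_sum and product_by_sum are made up for this sketch. Within one sum the tuples come out in lexicographic order, which matches what sorted(..., key=sum) gives, since that sort is stable:

```python
def tuples_with_sum(total, length, bound):
    """Yield every length-tuple of ints in range(bound) whose elements add up to total."""
    if length == 0:
        if total == 0:
            yield ()
        return
    # the first element must leave a remainder the other positions can still reach
    lo = max(0, total - (length - 1) * (bound - 1))
    hi = min(bound - 1, total)
    for first in range(lo, hi + 1):
        for rest in tuples_with_sum(total - first, length - 1, bound):
            yield (first,) + rest

def product_by_sum(bound, length):
    """Lazily yield product(range(bound), repeat=length), smallest sums first."""
    for total in range(length * (bound - 1) + 1):
        yield from tuples_with_sum(total, length, bound)

print(list(product_by_sum(2, 2)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Nothing is stored beyond the recursion stack, so memory stays small. I have not benchmarked it against the timings quoted above.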

Fill missing values in lists

I have a list which consists of 0's and 1's. The list should ideally look like this 0,1,0,1,0,1,0,1,0,1,0,1,0,1.....
But due to some error in logging, my list looks like this: 0,1,0,1,1,1,0,1,0,0,0,1,0,1.... As one can clearly see, there are some missed 0's and 1's in the middle. How can I fix this list by adding those 0's and 1's between the existing elements so as to get the desired list values?
Here is the code I use. It does the task for me, but it is not the most Pythonic way of writing it. How can I improve this script?
l1 = [0,1,0,1,1,1,0,1,0,0,0,1,0,1]
indices = []
for i in range(1,len(l1)):
    if l1[i]!=l1[i-1]:
        continue
    else:
        if l1[i]==0:
            val=1
        else:
            val=0
        l1.insert(i, val)
EDIT
As asked in the comments, let me explain why this is important rather than just generating 1s and 0s. I have a TTL pulse train coming in, i.e. a series of HIGH (1) and LOW (0) levels, and the time of each TTL pulse is logged simultaneously on 2 machines with different clocks.
Now while machine I is extremely stable and logs each sequence of HIGH (1) and LOW (0) accurately, the other machine ends up missing a couple of them, and as a result I don't have time information for those.
All I want is to merge the missing TTL pulses on one machine with respect to the other machine. This will then allow me to align the times on both of them, or log None for a pulse that was not received.
The reason for doing this rather than correcting the logging (as asked in the comments) is that this is old, already-collected data. We have now fixed the logging issue.
You can try something like this:
from itertools import chain
l1 = [0,1,0,1,1,1,0,1,0,0,0,1,0,1]
c = max(l1.count(0), l1.count(1))
print(list(chain(*zip([0]*c,[1]*c))))
Output:
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
why would you have a list of 0,1,0,1,0,1? there is no good reason i can think of. oh well thats beyond the scope of this question i guess...
list(itertools.islice(itertools.cycle([0,1]),expected_length))
Just multiply a new list.
>>> l1 = [0,1,0,1,1,1,0,1,0,0,0,1,0,1]
>>> l1
[0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1]
>>> [0,1] * (len(l1)//2)
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
If the list has an odd number of elements, add the necessary 0:
>>> l2 = [0,1,0,1,1,1,0,1,0,0,0,1,0,1,0]
>>> l2_ = [0,1] * (len(l2)//2)
>>> if len(l2)%2: l2_.append(0)
...
>>> l2
[0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0]
>>> l2_
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
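The answers above rebuild the alternating list from scratch, which loses track of where the gaps were. For the use case described in the question's edit (keeping the logged times and recording None for a missed pulse), something along these lines might be closer. It is only a sketch: fill_missing and the times list are hypothetical, and it assumes exactly one pulse was missed wherever two equal levels sit next to each other (an even number of missed pulses in a row cannot be detected from the levels alone).

```python
def fill_missing(levels, times):
    """Insert the opposite level, with time None, between two equal neighbouring levels."""
    out_levels, out_times = [], []
    for level, t in zip(levels, times):
        if out_levels and out_levels[-1] == level:
            out_levels.append(1 - level)   # the pulse that was not logged
            out_times.append(None)         # no timestamp is available for it
        out_levels.append(level)
        out_times.append(t)
    return out_levels, out_times

l1 = [0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1]
times = list(range(len(l1)))               # made-up timestamps, for illustration only
levels, stamps = fill_missing(l1, times)
print(levels)
print(stamps)
```

Unlike inserting into l1 while looping over it, this builds new lists, so the loop always sees every original element.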

Generating a list of repetitions regardless of the order

I want to generate combinations that associate indices in a list with "slots". For instance, (0, 0, 1) means that 0 and 1 belong to the same slot while 2 belongs to another. (0, 1, 1, 1) means that 1, 2, 3 belong to the same slot while 0 is by itself. In this example, 0 and 1 are just ways of identifying these slots but do not carry information for my usage.
Consequently, (0, 0, 0) is absolutely identical to (1, 1, 1) for my purposes, and (0, 0, 1) is equivalent to (1, 1, 0).
The classical cartesian product generates a lot of these repetitions I'd like to get rid of.
This is what I obtain with itertools.product :
>>> LEN, SIZE = (3,1)
>>> list(itertools.product(range(SIZE+1), repeat=LEN))
>>>
[(0, 0, 0),
(0, 0, 1),
(0, 1, 0),
(0, 1, 1),
(1, 0, 0),
(1, 0, 1),
(1, 1, 0),
(1, 1, 1)]
And this is what I'd like to get:
>>> [(0, 0, 0),
(0, 0, 1),
(0, 1, 0),
(0, 1, 1)]
It is easy with small lists but I don't quite see how to do this with bigger sets. Do you have a suggestion?
If it's unclear, please tell me so that I can clarify my question. Thank you!
Edit: based on Sneftel's answer, this function seems to work, but I don't know if it actually yields all the results:
def test():
    for p in product(range(2), repeat=3):
        j=-1
        good = True
        for k in p:
            if k> j and (k-j) > 1:
                good = False
            elif k >j:
                j = k
        if good:
            yield p
I would start by making the following observations:
The first element of each combination must be 0.
The second element must be 0 or 1.
The third element must be 0, 1 or 2, but it can only be 2 if the second element was 1.
These observations suggest the following algorithm:
def assignments(n, m, used=0):
    """Generate assignments of `n` items to `m` indistinguishable
    buckets, where `used` buckets have been used so far.
    >>> list(assignments(3, 1))
    [(0, 0, 0)]
    >>> list(assignments(3, 2))
    [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
    >>> list(assignments(3, 3))
    [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (0, 1, 2)]
    """
    if n == 0:
        yield ()
        return
    aa = list(assignments(n - 1, m, used))
    for first in range(used):
        for a in aa:
            yield (first,) + a
    if used < m:
        for a in assignments(n - 1, m, used + 1):
            yield (used,) + a
This handles your use case (12 items, 5 buckets) in a few seconds:
>>> from timeit import timeit
>>> timeit(lambda:list(assignments(12, 5)), number=1)
4.513746023178101
>>> sum(1 for _ in assignments(12, 5))
2079475
This is substantially faster than the function you give at the end of your question (the one that calls product and then drops the invalid assignments) would be if it were modified to handle the (12, 5) use case:
>>> timeit(lambda:list(test(12, 5)), number=1)
540.693009853363
Before checking for duplicates, you should harmonize the notation (assuming you don't want to set up some fancy AI): iterate through the lists and assign set-affiliation numbers for differing elements starting at 0, counting upwards. That is, you create a temporary dictionary per line that you are processing.
An exemplary output would be
(0,0,0) -> (0,0,0)
(0,1,0) -> (0,1,0)
but
(1,0,1) -> (0,1,0)
Removing the duplicates can then easily be performed as the problem is reduced to the problem of the solved question at Python : How to remove duplicate lists in a list of list?
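The relabeling step described above could look roughly like this. The function name canonical is made up for the sketch; it renumbers the slot labels in order of first appearance, so equivalent tuples collapse to the same value and a set removes the duplicates:

```python
from itertools import product

def canonical(labels):
    """Renumber labels by order of first occurrence: the first label seen becomes 0, the next new one 1, ..."""
    mapping = {}
    return tuple(mapping.setdefault(x, len(mapping)) for x in labels)

print(canonical((1, 0, 1)))   # (0, 1, 0)

unique = sorted({canonical(p) for p in product(range(2), repeat=3)})
print(unique)                 # [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
```

This still walks the whole cartesian product, so it only removes the duplicates; it does not avoid generating them.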
If you only consider the elements of the cartesian product where the first occurrences of all indices are sorted and consecutive from zero, that should be sufficient. itertools.combinations_with_replacement() will eliminate those that are not sorted, so you'll only need to check that indices aren't being skipped.
In your specific case you could simply take the first or the second half of the list of those items produced by a cartesian product.
import itertools
alphabet = '01'
words3Lettered = [''.join(letter) for letter in itertools.product(alphabet,repeat=3)]
for n lettered words use repeat=n
words3Lettered looks like this:
['000', '001', '010', '011', '100', '101', '110', '111']
next,
usefulWords = words3Lettered[:len(words3Lettered)//2]
which looks like this:
['000', '001', '010', '011']
you might be interested in the other half, i.e. words3Lettered[len(words3Lettered)//2:], though the other half was supposed to "fold" onto the first half.
most probably you want to use the combination of letters in numeric form so...
indexes = [tuple(int(j) for j in word) for word in usefulWords]
which gives us:
[(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]

How can I write a function to return all the binary numbers with N digits, and in sorted order? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
Each binary number should be represented as a tuple. When the function is called, the result should be a tuple containing 2^N binary numbers.
Ex. Binary(2)----> ((0,0), (0,1), (1,0), (1,1))
I am trying to use a while loop to do this.
Just some advice on where I could begin would be very helpful.
You can use itertools.product, to get what you want
print([item for item in itertools.product([0, 1], repeat = 4)])
Output
[(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, 0, 1, 1), (0, 1, 0, 0),
(0, 1, 0, 1), (0, 1, 1, 0), (0, 1, 1, 1), (1, 0, 0, 0), (1, 0, 0, 1),
(1, 0, 1, 0), (1, 0, 1, 1), (1, 1, 0, 0), (1, 1, 0, 1), (1, 1, 1, 0),
(1, 1, 1, 1)]
Change the repeat to the desired value.
Edit:
Performance comparison with list and comprehension.
print(timeit.timeit("[item for item in itertools.product([0, 1], repeat = 4)]", setup = "import itertools", number = 1000000))
print(timeit.timeit("list(itertools.product([0, 1], repeat = 4))", setup = "import itertools", number = 1000000))
List comprehension is slightly faster than list.
You'll probably need two loops -- an outer one to loop thru the values, and an inner one to process the binary digits for each value.
You can either loop thru the values as integers & convert them to binary -- or you can carry a "current value" in binary around the loop, copying & incrementing it.
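Since the question asks for a while loop, here is one way the two-loop idea could be written. It is a sketch rather than the only way to do it: the outer while loop walks the values 0 .. 2^N - 1 as integers, and the inner loop peels off the binary digits of each value.

```python
def binary(n):
    """Return a tuple of all n-digit binary numbers, each as a tuple of 0s and 1s, in increasing order."""
    result = []
    value = 0
    while value < 2 ** n:                 # outer loop: every value with n binary digits
        digits = []
        remaining = value
        for _ in range(n):                # inner loop: extract n digits, lowest first
            digits.append(remaining % 2)
            remaining //= 2
        result.append(tuple(reversed(digits)))
        value += 1
    return tuple(result)

print(binary(2))  # ((0, 0), (0, 1), (1, 0), (1, 1))
```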
try this:
def binary(n):
    num_digits = len(bin(n).replace('0b',''))
    all_bin_numbers=()
    for i in range(n):
        bin_num=()
        for digit in str(bin(i)).replace('0b','').rjust(num_digits, '0'):
            bin_num += (int(digit),)
        all_bin_numbers += (bin_num,)
    return all_bin_numbers
print(binary(2))
e: holy cow, that itertools answer.
e2: so it appears I didn't fully read the question, I was thinking you wanted all the binary numbers up to and including your specified n.
from numpy import binary_repr
[tuple(map(int, binary_repr(i, N))) for i in range(2**N)]

Detect whether sequence is a multiple of a subsequence in Python

I have a tuple of zeros and ones, for instance:
(1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1)
It turns out:
(1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1) == (1, 0, 1, 1) * 3
I want a function f such that if s is a non-empty tuple of zeros and ones, f(s) is the shortest subtuple r such that s == r * n for some positive integer n.
So for instance,
f( (1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1) ) == (1, 0, 1, 1)
What is a slick way to write the function f in Python?
Edit:
The naive method I am currently using
def f(s):
    for i in range(1,len(s)):
        if len(s)%i == 0 and s == s[:i] * (len(s)//i):
            return s[:i]
    return s  # no shorter repeating unit, so s == s * 1
I believe I have an O(n) time solution (actually 2n+r, where n is the length of the tuple and r is the length of the sub-tuple) which does not use suffix trees, but uses a string matching algorithm (like KMP, which you should find off the shelf).
We use the following little known theorem:
If x,y are strings over some alphabet,
then xy = yx if and only if x = z^k and y = z^l for some string z and integers k,l.
I now claim that, for the purposes of our problem, this means that all we need to do is determine if the given tuple/list (or string) is a cyclic shift of itself!
To determine if a string is a cyclic shift of itself, we concatenate it with itself (it does not even have to be a real concat, just a virtual one will do) and check for a substring match (with itself).
For a proof of that, suppose the string is a cyclic shift of itself.
Then we have that the given string y = uv = vu for some non-empty u and v.
Since uv = vu, we must have that u = z^k and v= z^l and hence y = z^{k+l} from the above theorem. The other direction is easy to prove.
Here is the python code. The method is called powercheck.
def powercheck(lst):
    count = 0
    position = 0
    for pos in KnuthMorrisPratt(double(lst), lst):
        count += 1
        position = pos
        if count == 2:
            break
    return lst[:position]
def double(lst):
    for i in range(1,3):
        for elem in lst:
            yield elem
def main():
    print(powercheck([1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1]))
if __name__ == "__main__":
    main()
And here is the KMP code which I used (due to David Eppstein).
# Knuth-Morris-Pratt string matching
# David Eppstein, UC Irvine, 1 Mar 2002
def KnuthMorrisPratt(text, pattern):
    '''Yields all starting positions of copies of the pattern in the text.
    Calling conventions are similar to string.find, but its arguments can be
    lists or iterators, not just strings, it returns all matches, not just
    the first one, and it does not need the whole text in memory at once.
    Whenever it yields, it will have read the text exactly up to and including
    the match that caused the yield.'''
    # allow indexing into pattern and protect against change during yield
    pattern = list(pattern)
    # build table of shift amounts
    shifts = [1] * (len(pattern) + 1)
    shift = 1
    for pos in range(len(pattern)):
        while shift <= pos and pattern[pos] != pattern[pos-shift]:
            shift += shifts[pos-shift]
        shifts[pos+1] = shift
    # do the actual search
    startPos = 0
    matchLen = 0
    for c in text:
        while matchLen == len(pattern) or \
              matchLen >= 0 and pattern[matchLen] != c:
            startPos += shifts[matchLen]
            matchLen -= shifts[matchLen]
        matchLen += 1
        if matchLen == len(pattern):
            yield startPos
For your sample this outputs
[1,0,1,1]
as expected.
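If you just want to convince yourself of the doubling argument before pulling in KMP, the same idea can be written naively. This sketch (shortest_period is a made-up name) looks for the first non-zero shift at which the sequence lines up with its own doubled copy. It compares slices directly, so it is quadratic in the worst case and only meant as a cross-check of the idea, not as a replacement for the linear-time version.

```python
def shortest_period(s):
    """Return the shortest prefix r of s such that s == r * k for some positive integer k."""
    n = len(s)
    doubled = s + s
    for shift in range(1, n + 1):
        # s is a cyclic shift of itself by `shift` exactly when this slice equals s
        if doubled[shift:shift + n] == s:
            return s[:shift]

print(shortest_period((1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1)))  # (1, 0, 1, 1)
```

The loop always terminates with a result for a non-empty input, because the shift by the full length matches trivially and returns s itself.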
I compared this against shx2's code (not the numpy one) by generating a random 50-bit string, then replicating it to make the total length 1 million. This was the output (the decimal numbers are the output of time.time()):
1362988461.75
(50, [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1])
1362988465.96
50 [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1]
1362988487.14
The above method took ~4 seconds, while shx2's method took ~21 seconds!
Here was the timing code. (shx2's method was called powercheck2).
import random
import time
def rand_bitstring(n):
    rand = random.SystemRandom()
    lst = []
    for j in range(1, n+1):
        r = rand.randint(1,2)
        if r == 2:
            lst.append(0)
        else:
            lst.append(1)
    return lst
def main():
    lst = rand_bitstring(50)*200000
    print(time.time())
    print(powercheck(lst))
    print(time.time())
    powercheck2(lst)
    print(time.time())
The following solution is O(N^2), but has the advantage of not creating any copies (or slices) of your data, as it is based on iterators.
Depending on the size of your input, the fact you avoid making copies of the data can result in a significant speed-up, but of course, it would not scale as well for huge inputs as algorithms with lower complexity (e.g. O(N*logN)).
[This is the second revision of my solution, the first one is given below. This one is simpler to understand, and is more along the lines of OP's tuple-multiplication, only using iterators.]
from itertools import chain, tee
def iter_eq(seq1, seq2):
    """ assumes the sequences have the same len """
    return all( v1 == v2 for v1, v2 in zip(seq1, seq2) )
def dup_seq(seq, n):
    """ returns an iterator which is seq chained to itself n times """
    return chain(*tee(seq, n))
def is_reps(arr, slice_size):
    if len(arr) % slice_size != 0:
        return False
    num_slices = len(arr) // slice_size
    return iter_eq(arr, dup_seq(arr[:slice_size], num_slices))
s = (1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1)
for i in range(1,len(s)):
    if is_reps(s, i):
        print(i, s[:i])
        break
[My original solution]
from itertools import islice
def is_reps(arr, num_slices):
    # note: despite its name, num_slices is used as the length of the candidate repeating unit
    if len(arr) % num_slices != 0:
        return False
    for i in range(num_slices):  # check every offset within the unit
        if len(set( islice(arr, i, None, num_slices) )) > 1:
            return False
    return True
s = (1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1)
for i in range(1,len(s)):
    if is_reps(s, i):
        print(i, s[:i])
        break
You can avoid the call to set() by using something like:
def is_iter_unique(seq):
    """ a faster version of testing len(set(seq)) <= 1 """
    seen = set()
    for x in seq:
        seen.add(x)
        if len(seen) > 1:
            return False
    return True
and replacing this line:
if len(set( islice(arr, i, None, num_slices) )) > 1:
with:
if not is_iter_unique(islice(arr, i, None, num_slices)):
Simplifying Knoothe's solution: his algorithm is right, but his implementation is too complex. This implementation is also O(n).
Since your array is only composed of ones and zeros, what I do is use the existing str.find implementation (Boyer-Moore) to implement Knoothe's idea. It's surprisingly simpler and amazingly faster at runtime.
def f(s):
    s2 = ''.join(map(str, s))
    return s[:(s2+s2).index(s2, 1)]
Here's another solution (competing with my earlier iterators-based solution), leveraging numpy.
It does make a (single) copy of your data, but taking advantage of the fact your values are 0s and 1s, it is super-fast, thanks to numpy's magics.
import numpy as np
def is_reps(arr, slice_size):
    if len(arr) % slice_size != 0:
        return False
    arr = arr.reshape((-1, slice_size))
    return (arr.all(axis=0) | (~arr).all(axis=0)).all()
s = (1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1) * 1000
a = np.array(s, dtype=bool)
for i in range(1,len(s)):
    if is_reps(a, i):
        print(i, s[:i])
        break
Just a different approach to the problem
I first determine all the factors of the length and then split the list and check if all the parts are same
>>> def f(s):
        def factors(n):
            #http://stackoverflow.com/a/6800214/977038
            return set(reduce(list.__add__,
                       ([i, n//i] for i in range(2, int(n**0.5) + 1) if n % i == 0)))
        _len = len(s)
        for fact in reversed(list(factors(_len))):
            compare_set = set(izip(*[iter(s)]*fact))
            if len(compare_set) == 1:
                return compare_set
>>> f(t)
set([(1, 0, 1, 1)])
You can achieve it by XOR'ing the rotated binary form of the input array:
get the binary representation of the array, input_binary
loop from i = 1 to len(input_array)/2, and for each iteration, rotate input_binary to the right by i bits, save it as rotated_bin, then compute the XOR of rotated_bin and input_binary.
The first i that yields 0 is the length of the desired substring.
Complete code:
def get_substring(arr):
    binary = ''.join(map(str, arr)) # join the elements to get the binary form
    for i in range(1, len(arr) // 2 + 1): # include len/2 so a unit repeated twice is found
        # do an i-bit rotation, get bit string rotated_bin
        rotated_bin = binary[-i:] + binary[:-i]
        if int(rotated_bin) ^ int(binary) == 0:
            return arr[0:i]
    return None
if __name__ == "__main__":
    test = [1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1]
    print(get_substring(test)) # [1,0,1,1]
This one is just a dumb recursive comparison in Haskell. It takes about one second for Knoothe's million long string (f a). Cool problem! I'll think about it some more.
a = concat $ replicate 20000
    [1,1,1,0,0,1,0,1,0,0,1,0,0,1,1,1,0,0,
     0,0,0,0,1,1,1,1,0,0,0,1,1,0,1,1,1,1,
     1,1,1,0,0,1,1,1,0,0,0,0,0,1]
f s =
  f' s [] where
    f' [] result = []
    f' (x:xs) result =
      let y = result ++ [x]
      in  if concat (replicate (div (length s) (length y)) y) == s
             then y
             else f' xs y
