Find all subsequences in list

Find all subsequences in list - python

I have a list of 1s and 0s as follows:
lst = [1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
I'm looking for a method that finds all the sequences of 0s within this list and returns their indices, i.e.:
[1, 3]
[8, 9]
[13, 13]
[15, 16]
This answer shows a method of getting the longest sequence, but I can't think of a way to work from it to get all the sequences.

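def f(l):
    # Indices where a run of 0s starts (a 1 -> 0 transition)
    _1to0 = [i + 1 for i, (x, y) in enumerate(zip(l[:-1], l[1:])) if y == 0 and x != y]
    # Indices where a run of 0s ends (a 0 -> 1 transition)
    _0to1 = [i for i, (x, y) in enumerate(zip(l[:-1], l[1:])) if x == 0 and x != y]
    if l[0] == 0:
        _1to0.insert(0, 0)
    if l[-1] == 0:
        # End indices are inclusive, so a trailing run ends at len(l) - 1
        _0to1.append(len(l) - 1)
    return zip(_1to0, _0to1)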
Detect changes 1 -> 0 (starts) and 0 -> 1 (ends)
If the list starts with 0, add a start at index 0
If the list ends with 0, add an end at the last index
Combine starts and ends in pairs
In [1]: list(f([1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1]))
Out[1]: [(1, 3), (8, 9), (13, 13), (15, 16)]

In Python 3.8+ you can modify the first answer in the referenced post by using the walrus operator (:=).
Code
from itertools import groupby
import operator
lst = [1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
r = [(p[1][0][0], p[1][-1][0]) for (x,y) in groupby(enumerate(lst), operator.itemgetter(1)) if (p := (x, list(y)))[0] == 0]
print(r)
Output
[(1, 3), (8, 9), (13, 13), (15, 16)]
Explanation
Adding the walrus operator to the code referenced by the OP, we have:
r = [p for (x,y) in groupby(enumerate(lst), operator.itemgetter(1)) if (p := (x, list(y)))[0] == 0]
# Outputs: [(0, [(1, 0), (2, 0), (3, 0)]), (0, [(8, 0), (9, 0)]), (0, [(13, 0)]), (0, [(15, 0), (16, 0)])]
Conditional in the list comprehension:
(p := (x, list(y)))[0] # is a check for x == 0
We then need to pick the right terms out of p.
The first p[1], for instance, is:
[(1, 0), (2, 0), (3, 0)]
We want (1, 3), which is index 0 of the first and last tuples in that list:
p[1][0][0] # index zero of first tuple -> 1
p[1][-1][0] # index zero of last tuple -> 3
So in general we have the tuple:
(p[1][0][0], p[1][-1][0])
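For readers on an older Python, or who find the nested indexing hard to follow, the same groupby idea can be sketched without the walrus operator. The helper name `zero_runs` and its `target` parameter are hypothetical and are not part of the answer above.

```python
from itertools import groupby

def zero_runs(seq, target=0):
    """Return (first_index, last_index) for every run of `target` in seq."""
    runs = []
    # Group consecutive (index, value) pairs by their value
    for value, group in groupby(enumerate(seq), key=lambda pair: pair[1]):
        if value == target:
            indices = [i for i, _ in group]
            runs.append((indices[0], indices[-1]))
    return runs

lst = [1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
print(zero_runs(lst))
```

Because each group is consumed inside the loop, this also handles runs at the very start or end of the list without special cases.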

lst = [1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
indexes_counts = []
start_zero_index = -1
is_inside_zero_sequence = False
zero_length = 0
# Use enumerate to loop over the items and their indexes
for i, x in enumerate(lst):
    # If inside a zeroes sequence
    if is_inside_zero_sequence:
        # If current item is zero too
        if 0 == x:
            # Increase the zero_length counter
            zero_length += 1
        # Else, current element is not zero
        else:
            # Handle end of zeroes sequence
            indexes_counts.append([start_zero_index, zero_length])
            is_inside_zero_sequence = False
            zero_length = 0
    # If not in a zeroes sequence and current number is zero
    elif 0 == x:
        # Handle start of a zeroes sequence
        is_inside_zero_sequence = True
        start_zero_index = i
        zero_length = 1
# Record a sequence that runs to the end of the list
if is_inside_zero_sequence:
    indexes_counts.append([start_zero_index, zero_length])
# Each entry is [start index, length]: [[1, 3], [8, 2], [13, 1], [15, 2]]
print(indexes_counts)

Related

find the number of pairs that belong to a column but not a higher order column

I have an n x k binary numpy array, I am trying to find an efficient way to find the number of pairs of ones that belong to some column[j] but not to any higher column, in this case higher means in increasing index value.
For example in the array:
array([[1, 1, 1, 0, 1, 0],
[1, 0, 1, 1, 1, 0],
[1, 0, 1, 0, 0, 0],
[1, 1, 1, 1, 1, 0],
[1, 0, 1, 0, 1, 1],
[1, 0, 1, 0, 1, 1],
[1, 1, 1, 1, 0, 0],
[1, 1, 1, 0, 1, 0]], dtype=int32)
the output should be array([ 0, 0, 11, 2, 14, 1], dtype=int32). This makes sense because we see column[2] has all ones, so any pair of ones will necessarily have a highest column in common of at least 2, because even though column[0] also has all ones, it's lower, so no pair of ones have it as their highest in common. In all cases I am considering, column[0] will always have all ones.
Here is some example code that works and I believe is something like O(n^2 k)
import numpy as np

def hcc(i, j, k, bin_mat):
    # hcc means highest common column
    # i: index of the first row
    # j: index of the second row
    # k: number of columns - 1
    # bin_mat: binary matrix
    for q in range(k, 0, -1):
        if (bin_mat[i, q] and bin_mat[j, q]):
            return q
    return 0

def get_num_pairs_columns(bin_mat):
    k = bin_mat.shape[1]-1
    num_pairs_hcc = np.zeros(k+1, dtype=np.int32) # number of one-pairs in columns
    for i in range(bin_mat.shape[0]):
        for j in range(bin_mat.shape[0]):
            if(i < j):
                num_pairs_hcc[hcc(i, j, k, bin_mat)] += 1
    return num_pairs_hcc
Another way I've thought of approaching the problem is through sets. Every column gets its own set, and the index of every row with a one in that column gets added to it. So for the example above, this would look like:
set = [{0, 1, 2, 3, 4, 5, 6, 7},
{0, 3, 6, 7},
{0, 1, 2, 3, 4, 5, 6, 7},
{1, 3, 6},
{0, 1, 3, 4, 5, 7},
{4, 5}]
The idea is to find the number of pairs in set[j] that are in no higher set (a pair can be in a lower set, just not a higher one). Since, as I mentioned before, column zero has all ones in every case, every set is a subset of set[0]. A much worse performing implementation of this approach looks like this:
import itertools
import numpy as np

def generate_sets(bin_mat):
    sets = []
    for j in range(bin_mat.shape[1]):
        column = set()
        for i in range(bin_mat.shape[0]):
            if bin_mat[i, j] == 1:
                column.add(i)
        sets.append(column)
    return sets

def get_hcc_sets(bin_mat):
    sets = generate_sets(bin_mat)
    pairs_sets = []
    num_pairs_hcc = np.zeros(len(sets), dtype=np.int32)
    for subset in sets:
        pairs_sets.append({p for p in itertools.combinations(sorted(subset), r = 2)})
    for j in range(len(sets)-1):
        intersections = [pairs_sets[j].intersection(pairs_sets[q]) for q in range(j+1, len(sets))]
        num_pairs_hcc[j] = len(pairs_sets[j] - set.union(*intersections))
    num_pairs_hcc[len(sets)-1]=len(pairs_sets[len(sets)-1])
    return num_pairs_hcc
I haven't checked that this sets implementation always produces the same results as the previous one, but in the finitely many cases I tried, it works. However, I am 100% certain that my first implementation gives exactly the result I need.
another reference example:
input:
array([[1, 1, 0],
[1, 1, 0],
[1, 1, 0],
[1, 1, 0],
[1, 0, 1],
[1, 0, 1],
[1, 0, 1],
[1, 0, 1]], dtype=int32)
output:
array([16, 6, 6], dtype=int32)
Is there a way to beat my O(n^2 k) implementation? It seems rather brute force, and it feels like there should be something I can exploit to make this calculation faster. I always expect n to be greater than k, by orders of magnitude in many cases, so I'd rather have the higher exponent on k than on n.

If you are going for the O(n² k) approach in Python, you can do it with much shorter code using itertools and set; the code might be more efficient too.
import itertools
t = [[1, 1, 1, 0, 1, 0],
[1, 0, 1, 1, 1, 0],
[1, 0, 1, 0, 0, 0],
[1, 1, 1, 1, 1, 0],
[1, 0, 1, 0, 1, 1],
[1, 0, 1, 0, 1, 1],
[1, 1, 1, 1, 0, 0],
[1, 1, 1, 0, 1, 0]]
n,k = len(t),len(t[0])
# build set of pairs of 1 in column j
def candidates(j):
    return {(i1, i2) for (i1, i2) in itertools.combinations(range(n), 2) if 1 == t[i1][j] == t[i2][j]}
# build set of pairs of 1 in higher columns
def badpairs(j):
    return {(i1, i2) for (i1, i2) in itertools.combinations(range(n), 2) if any(1 == t[i1][j0] == t[i2][j0] for j0 in range(j+1, k))}
# set difference
def finalpairs(j):
    return candidates(j) - badpairs(j)
# print pairs
for j in range(k):
    print(j, finalpairs(j))
# 0 set()
# 1 set()
# 2 {(2, 4), (1, 2), (2, 7), (4, 6), (0, 6), (2, 3), (6, 7), (0, 2), (2, 6), (5, 6), (2, 5)}
# 3 {(1, 6), (3, 6)}
# 4 {(0, 1), (0, 7), (0, 4), (3, 4), (1, 5), (3, 7), (0, 3), (1, 4), (5, 7), (1, 7), (0, 5), (1, 3), (4, 7), (3, 5)}
# 5 {(4, 5)}
# print number of pairs
for j in range(k):
    print(j, len(finalpairs(j)))
# 0 0
# 1 0
# 2 11
# 3 2
# 4 14
# 5 1
Alternate definition for badpairs:
def badpairs(j):
    return set().union(*(candidates(j0) for j0 in range(j+1, k)))
Slightly different approach: avoid building badpairs
def finalpairs(j):
    return {(i1, i2) for (i1, i2) in itertools.combinations(range(n), 2) if 1 == t[i1][j] == t[i2][j] and not any(1 == t[i1][j0] == t[i2][j0] for j0 in range(j+1, k))}
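Since the question works with NumPy arrays, the same pairwise comparison can also be vectorized. The following is only a sketch: it is still O(n² k) in time and it materializes an (n(n-1)/2) × k boolean array, so it trades memory for speed rather than improving the complexity. The function name `highest_common_column_counts` is hypothetical.

```python
import numpy as np

def highest_common_column_counts(bin_mat):
    """Count row pairs by the highest column in which both rows hold a 1."""
    m = np.asarray(bin_mat).astype(bool)
    n, k = m.shape
    # All index pairs (i, j) with i < j
    i, j = np.triu_indices(n, k=1)
    common = m[i] & m[j]  # shape: (number of pairs, k)
    # argmax on the column-reversed array finds the right-most True
    highest = k - 1 - np.argmax(common[:, ::-1], axis=1)
    # Pairs with no common column fall back to 0, like hcc() in the question
    highest = np.where(common.any(axis=1), highest, 0)
    return np.bincount(highest, minlength=k)

example = np.array([[1, 1, 1, 0, 1, 0],
                    [1, 0, 1, 1, 1, 0],
                    [1, 0, 1, 0, 0, 0],
                    [1, 1, 1, 1, 1, 0],
                    [1, 0, 1, 0, 1, 1],
                    [1, 0, 1, 0, 1, 1],
                    [1, 1, 1, 1, 0, 0],
                    [1, 1, 1, 0, 1, 0]], dtype=np.int32)
print(highest_common_column_counts(example))
```

For very large n the intermediate array may not fit in memory; processing the pairs in chunks would be one way around that.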

Converting a dictionary to a list of lists

Let's say I got a dictionary defined this way: d ={(0, 1): 1, (1, 0): 4, (1, 3): 7, (2, 3): 11}
I want to convert it to a list in a way that each key represents that index of the value within the nested list and index of the nested list itself in the list. Each nested list has 4 items, indices with no defined values are set to 0.
I suck at describing. Here's what I want my function to return: lst = [[0,1,0,0], [4,0,0,7], [0,0,0,11]]. Here's my unfinished, non working code:
def convert(d):
    lst = []
    for i in range(len(d)):
        lst += [[0,0,0,0]] # adding the zeros first.
    for i in d:
        for j in range(4):
            lst[j] = list(i[j]) # and then the others.

How about:
for (x, y), value in d.items():
    lst[x][y] = value
Here is the entire function, which also creates a list of the correct size automatically:
def convert(d):
    # Figure out how big x and y can get
    max_x = max([coord[0] for coord in d.keys()])
    max_y = max([coord[1] for coord in d.keys()])
    # Create a 2D list with the given dimensions
    # (named lst rather than list, to avoid shadowing the built-in)
    lst = [[0] * (max_y + 1) for ix in range(max_x + 1)]
    # Assign values
    for (x, y), value in d.items():
        lst[x][y] = value
    return lst

if __name__ == "__main__":
    d = {(0, 1): 1, (1, 0): 4, (1, 3): 7, (2, 3): 11}
    print(convert(d))

# Input
example_d = {(0, 1): 1, (1, 0): 4, (1, 3): 7, (2, 3): 11}
def get_list(d):
    # Figure out the required lengths by looking at the highest indices
    max_list_idx = max(x for (x, _), _ in d.items())
    max_sublist_idx = max(y for (_, y), _ in d.items())
    # Create an empty list with the max sizes
    t = [[0] * (max_sublist_idx + 1) for _ in range(max_list_idx + 1)]
    # Fill out the empty list according to the input
    for (x, y), value in d.items():
        t[x][y] = value
    return t
print(get_list(example_d))
# Output: [[0, 1, 0, 0], [4, 0, 0, 7], [0, 0, 0, 11]]

You can try this.
max_x=max(d,key=lambda x:x[0])[0] # For finding max number of rows
# 2
max_y=max(d,key=lambda x:x[1])[1] # For finding max of columns
# 3
new_list=[[0]*(max_y+1) for _ in range(max_x+1)] # Creating a list with max_x+1 rows and max_y+1 columns filled with zeros
# [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
for (x,y),v in d.items(): # iterate over key/value pairs, not just keys
    new_list[x][y]=v
new_list
# [[0, 1, 0, 0], [4, 0, 0, 7], [0, 0, 0, 11]]

You can use a list comprehension:
d ={(0, 1): 1, (1, 0): 4, (1, 3): 7, (2, 3): 11}
m_x, m_y = map(max, zip(*d))
m = [[d.get((b, a), 0) for a in range(m_y+1)] for b in range(m_x+1)]
Output:
[[0, 1, 0, 0], [4, 0, 0, 7], [0, 0, 0, 11]]
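The question states that each nested list has 4 items, whereas the answers above infer the width from the largest key, which would produce narrower rows if no key reached column 3. A minimal sketch that accepts an explicit width follows; the helper name `dict_to_rows` and its `n_cols` parameter are hypothetical.

```python
def dict_to_rows(d, n_cols=None):
    """Build rows from a {(row, col): value} dict; missing cells become 0."""
    if not d:
        return []
    n_rows = max(r for r, _ in d) + 1
    if n_cols is None:
        # Fall back to inferring the width from the largest column index
        n_cols = max(c for _, c in d) + 1
    return [[d.get((r, c), 0) for c in range(n_cols)] for r in range(n_rows)]

d = {(0, 1): 1, (1, 0): 4, (1, 3): 7, (2, 3): 11}
print(dict_to_rows(d, n_cols=4))
```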

Count the number of occurrences of a pattern in a list in Python

Given a pattern [1,1,0,1,1] and a binary list of length 100, [0,1,1,0,0,...,0,1], I want to count the number of occurrences of this pattern in the list. Is there a simple way to do this without tracking each item at every index with a variable?
Note something like this, [...,1, 1, 0, 1, 1, 1, 1, 0, 1, 1,...,0] can occur but this should be counted as 2 occurrences.

Convert your list to string using join. Then do:
text.count(pattern)
If you need to count overlapping matches then you will have to use regex matching or define your own function.
Edit
Here is the full code:
def overlapping_occurences(string, sub):
    count = start = 0
    while True:
        # find() returns -1 when there is no further match, so start becomes 0
        start = string.find(sub, start) + 1
        if start > 0:
            count+=1
        else:
            return count
given_list = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
pattern = [1,1,0,1,1]
text = ''.join(str(x) for x in given_list)
print(text)
pattern = ''.join(str(x) for x in pattern)
print(pattern)
print(text.count(pattern)) #for no overlapping
print(overlapping_occurences(text, pattern))
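The regex route mentioned above can be sketched with a zero-width lookahead, which lets `re.findall` report a match at every starting position, including overlapping ones. The function name `count_overlapping_regex` is hypothetical.

```python
import re

def count_overlapping_regex(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    # (?=...) matches without consuming characters, so the scan
    # advances one position at a time and overlapping hits are found
    return len(re.findall("(?=" + re.escape(pattern) + ")", text))

print(count_overlapping_regex("11011011", "11011"))  # overlapping matches
print("11011011".count("11011"))                     # str.count skips overlaps
```

`re.escape` is not strictly needed for strings of 0s and 1s, but it keeps the sketch safe if the pattern ever contains regex metacharacters.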

l1 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]
l1str = str(l1).replace(" ", "").replace("[", "").replace("]", "")
l3 = [1, 1, 0, 1, 1]
l3str = str(l3).replace(" ", "").replace("[", "").replace("]", "")
l1str = l1str.replace(l3str, "foo")
foo = l1str.count("foo")
print(foo)

You can always use the naive way: loop over slices of the list, where each slice starts at index i and ends at i + (length of the pattern).
You can improve on it: notice that if you find an occurrence at index i, you can skip i+1 and i+2 and resume checking from i+3 (that is, you can use the pattern's own structure, here the 1, 1 at both ends, to skip positions that cannot match).
This costs O(n*m).
Alternatively, you can use convolution with the reversed pattern (a pattern-matching algorithm).
This costs O(n*log(n)) when the convolution is computed with an FFT, which is better.
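The naive slice-by-slice scan described above can be sketched as follows. It is the O(n*m) version without the skipping optimization, and the function name `count_pattern` is hypothetical.

```python
def count_pattern(data, pattern):
    """Count (possibly overlapping) occurrences of pattern in data."""
    m = len(pattern)
    # Compare every window of length m against the pattern
    return sum(data[i:i + m] == pattern for i in range(len(data) - m + 1))

print(count_pattern([1, 1, 0, 1, 1, 1, 1, 0, 1, 1], [1, 1, 0, 1, 1]))
```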

I think a simple regex would suffice:
import re

def find(sample_list):
    list_1 = [1,1,0,1,1]
    # "1, 1, 0, 1, 1": the list's string form without the brackets
    str_1 = str(list_1)[1:-1]
    print(len(re.findall(str_1, str(sample_list))))
Hope this solves your problem.

from collections import Counter
a = [1,1,0,1,1]
b = [1,1,0,1,1,1,1,0,1,1]
lst = list()
for i in range(len(b)-len(a)+1):
    lst.append(tuple(b[i:i+len(a)]))
c = Counter(lst)
print(c[tuple(a)])
output
2
The loop can be written in one line, for "cleaner" but less readable code:
lst = [tuple(b[i:i+len(a)]) for i in range(len(b)-len(a)+1)]
Note: I'm using tuples because they are immutable and can therefore be hashed.
You can also create your own hash method, for example by multiplying each element by 10 raised to its position, e.g.
[1,0,1] = 1 * 1 + 0 * 10 + 1 * 100 = 101
That way you can make a single pass over the list and check whether it contains the pattern by simply checking if the hash of the current sub-list equals 101.
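The single-pass idea can be sketched with a rolling window value. Note that this is a variant of the scheme above, not the same one: it assumes the list holds only 0s and 1s, uses base 2 instead of base 10, and puts the most significant digit first so the window can be updated with a shift. The name `count_with_rolling_value` is hypothetical.

```python
def count_with_rolling_value(data, pattern):
    """Count overlapping occurrences of a 0/1 pattern in a 0/1 list in one pass."""
    m = len(pattern)
    if m == 0 or len(data) < m:
        return 0
    # Encode the pattern as an integer, most significant bit first
    target = int("".join(map(str, pattern)), 2)
    mask = (1 << m) - 1  # keeps only the last m bits of the window
    window = 0
    count = 0
    for i, bit in enumerate(data):
        # Shift the new bit in and drop the bit that left the window
        window = ((window << 1) | bit) & mask
        if i >= m - 1 and window == target:
            count += 1
    return count

print(count_with_rolling_value([1, 1, 0, 1, 1, 1, 1, 0, 1, 1], [1, 1, 0, 1, 1]))
```

Because the window always holds exactly m bits, equal values mean equal sub-lists, so no second comparison is needed.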

You can solve it using the following two steps:
Combine all elements of the list into a single string
Use Python's str.count method to count the pattern's occurrences in the string (this counts non-overlapping matches only)
a_new = ''.join(map(str,a))
pattern = ''.join(map(str,pattern))
a_new.count(pattern)

You can divide the lookup list into chunks the size of the pattern you are looking for. You can achieve this with a simple recipe that uses itertools.islice to build a sliding window:
>>> from itertools import islice
>>> p = [1,1,0,1,1]
>>> l = [0,1,1,0,0,0,1,1,0,1,1,1,0,0,1]
>>> [tuple(islice(l,k,len(p)+k)) for k in range(len(l)-len(p)+1)]
This will give you output like:
[(0, 1, 1, 0, 0), (1, 1, 0, 0, 0), (1, 0, 0, 0, 1), (0, 0, 0, 1, 1), (0, 0, 1, 1, 0), (0, 1, 1, 0, 1), (1, 1, 0, 1, 1), (1, 0, 1, 1, 1), (0, 1, 1, 1, 0), (1, 1, 1, 0, 0), (1, 1, 0, 0, 1)]
Now you can use collections.Counter to count the occurrence of each sublist in sequence like
>>> from collections import Counter
>>> c = Counter([tuple(islice(l,k,len(p)+k)) for k in range(len(l)-len(p)+1)])
>>> c
Counter({(0, 1, 1, 0, 1): 1, (1, 1, 1, 0, 0): 1, (0, 0, 1, 1, 0): 1, (0, 1, 1, 1, 0): 1, (1, 1, 0, 0, 0): 1, (0, 0, 0, 1, 1): 1, (1, 1, 0, 1, 1): 1, (0, 1, 1, 0, 0): 1, (1, 0, 1, 1, 1): 1, (1, 1, 0, 0, 1): 1, (1, 0, 0, 0, 1): 1})
To fetch frequency of your desired sequence use
>>> c.get(tuple(p),0)
1
Note that I have used tuples everywhere as dict keys, since a list is not hashable in Python and so cannot be used as a dict key.

You can try range approach :
pattern_data=[1,1,0,1,1]
data=[1,1,0,1,1,0,0,0,0,1,1,1,1,0,0,1,1,0,1,1,1,1,0,1,1,0,0,0,0,0,1,1,0,1,0,1,1,0,1,1,1,1,0,1,1,0,0,0,0,0,0,0,1,1,0,1,1,0,1,1,1,1,0,1,1,0,0,0,0,0,0,0,0,0,0,1,1,0,1,1,1,1,0,1,1,0,0,0,0,0,0,0,0,0,0,1,1,0,1,1]
count=0
for i in range(0,len(data),1):
    if data[i:i+len(pattern_data)]==pattern_data:
        print(i,data[i:i+len(pattern_data)])
        count+=1
print(count)
output:
0 [1, 1, 0, 1, 1]
15 [1, 1, 0, 1, 1]
20 [1, 1, 0, 1, 1]
35 [1, 1, 0, 1, 1]
40 [1, 1, 0, 1, 1]
52 [1, 1, 0, 1, 1]
55 [1, 1, 0, 1, 1]
60 [1, 1, 0, 1, 1]
75 [1, 1, 0, 1, 1]
80 [1, 1, 0, 1, 1]
95 [1, 1, 0, 1, 1]
11

Python How to repeat a list's elements in another list content until the length of the second list fulfilled?

How can I repeat one list's elements across the items of another list until the end of the second list is reached?
For example:
LA = [0,1,2]
LB = [(0,0),(1,0),(2,0),(3,0),(4,0),(5,0),(6,0)]
the end result should be:
LC = [(0,0,0),(1,0,1),(2,0,2),(3,0,0),(4,0,1),(5,0,2),(6,0,0)]
Hopefully it can be done in one line

You can use itertools.cycle:
from itertools import cycle
LA = [0,1,2]
LB = [(0,0),(1,0),(2,0),(3,0),(4,0),(5,0),(6,0)]
LC = [(i, j, k) for (i, j), k in zip(LB, cycle(LA))]
print(LC)
# [(0, 0, 0), (1, 0, 1), (2, 0, 2), (3, 0, 0), (4, 0, 1), (5, 0, 2), (6, 0, 0)]
This works because zip generates items until one of the iterables is exhausted...but a cycle object is inexhaustible, so we'll keep padding items from LA until LB runs out.
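The same zip-and-cycle behaviour can be wrapped in a small helper that works for tuples of any length. This is only a sketch; the name `append_cycled` is hypothetical.

```python
from itertools import cycle

def append_cycled(rows, values):
    """Append one value to each tuple in rows, restarting values when it runs out."""
    # zip stops when rows is exhausted; cycle(values) never is (unless values is empty)
    return [row + (value,) for row, value in zip(rows, cycle(values))]

LA = [0, 1, 2]
LB = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0)]
print(append_cycled(LB, LA))
```

One caveat: if `values` is empty, `cycle` yields nothing, so the result is an empty list rather than an error.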

# Use a list comprehension and pick the element of LA at the LB index modulo 3 (the length of LA).
[v+(LA[k%3],) for k,v in enumerate(LB)]
Out[718]: [(0, 0, 0), (1, 0, 1), (2, 0, 2), (3, 0, 0), (4, 0, 1), (5, 0, 2), (6, 0, 0)]

Try enumerate() like this along with list comprehension -
[elem + (LA[i % len(LA)],) for i, elem in enumerate(LB)]

Here is a more "explicit" version that works with any length of LA.
LA = [0,1,2]
LB = [(0,0),(1,0),(2,0),(3,0),(4,0),(5,0),(6,0)]
i = 0
LC = []
for x,y in LB:
    try:
        z = LA[i]
    except IndexError:
        # Ran past the end of LA: start over from its first element
        i = 0
        z = LA[i]
    LC.append((x,y,z))
    i += 1
print(LC)
[(0, 0, 0), (1, 0, 1), (2, 0, 2), (3, 0, 0), (4, 0, 1), (5, 0, 2), (6, 0, 0)]

Permutations and indexes, python

I create a list of all permutations of, let's say, 0,1,2:
perm = list(itertools.permutations([0,1,2]))
This is used for accessing indexes in another list in that specific order. Every time an index is accessed, the element at that index is popped.
When an element is popped, the elements with indexes higher than the popped element's index shift one position down. This means that if I want to pop from my list by indexes [0,1,2], it will result in an index error, since index 2 will not exist when I reach it. [0,1,2] should therefore be popped in the order [0,0,0].
More examples:
[0,2,1] = [0,1,0]
[2,0,1] = [2,0,0]
[1,2,0] = [1,1,0]
Right now this is being handled through a series of checks. My question is whether anyone knows a smart way to turn the list of tuples generated by itertools into the desired list:
[(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
[(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0), (2, 0, 0), (2, 1, 0)]

Simply iterate through each tuple and, for each element, decrement every subsequent index that is greater than it:
l=[(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
def lower_idxs(lst):
    new_row = list(lst)
    for i, val in enumerate(new_row):
        for j in range(i+1, len(new_row)):
            if new_row[i] < new_row[j]:
                new_row[j] -= 1
    return new_row
print([lower_idxs(x) for x in l])
This prints:
[[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 0], [2, 0, 0], [2, 1, 0]]
Here is a fancier one-liner based on Randy C's solution:
print([tuple(y-sum(v<y for v in x[:i]) for i,y in enumerate(x)) for x in l])

Here's a one-liner for it (assuming your list is l):
[v-sum(v>v2 for v2 in l[:k]) for k, v in enumerate(l)]
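To see that the converted indexes really pop the intended elements, the one-liner can be wrapped in a function and checked against direct indexing. This is a sketch; the names `to_pop_order` and `pop_in_order` are hypothetical.

```python
from itertools import permutations

def to_pop_order(perm):
    """Shift each index down by the number of smaller indexes popped before it."""
    return tuple(v - sum(earlier < v for earlier in perm[:k])
                 for k, v in enumerate(perm))

def pop_in_order(items, perm):
    """Pop from a copy of items using the adjusted indexes."""
    remaining = list(items)
    return [remaining.pop(i) for i in to_pop_order(perm)]

items = ["a", "b", "c"]
for perm in permutations(range(3)):
    # Popping with adjusted indexes should pick the same elements
    # as indexing the untouched list with the original permutation
    print(perm, to_pop_order(perm), pop_in_order(items, perm))
```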
