I have an n x k binary numpy array, and I am trying to find an efficient way to count, for each column j, the pairs of ones (that is, pairs of rows) that both appear in column[j] but do not both appear in any higher column. Here higher means a larger column index.
For example in the array:
array([[1, 1, 1, 0, 1, 0],
[1, 0, 1, 1, 1, 0],
[1, 0, 1, 0, 0, 0],
[1, 1, 1, 1, 1, 0],
[1, 0, 1, 0, 1, 1],
[1, 0, 1, 0, 1, 1],
[1, 1, 1, 1, 0, 0],
[1, 1, 1, 0, 1, 0]], dtype=int32)
the output should be array([ 0, 0, 11, 2, 14, 1], dtype=int32). This makes sense because column[2] has all ones, so every pair of ones has a highest common column of at least 2. Column[0] also has all ones, but it is lower, so no pair has it as its highest common column. In all cases I am considering, column[0] will always have all ones.
Here is some example code that works; I believe it is something like O(n^2 k):
import numpy as np
def hcc(i, j, k, bin_mat):
    # hcc means highest common column
    # i: index of the first row
    # j: index of the second row
    # k: number of columns - 1
    # bin_mat: binary matrix
    for q in range(k, 0, -1):
        if (bin_mat[i, q] and bin_mat[j, q]):
            return q
    return 0
def get_num_pairs_columns(bin_mat):
    k = bin_mat.shape[1]-1
    num_pairs_hcc = np.zeros(k+1, dtype=np.int32) # number of one-pairs in columns
    for i in range(bin_mat.shape[0]):
        for j in range(bin_mat.shape[0]):
            if(i < j):
                num_pairs_hcc[hcc(i, j, k, bin_mat)] += 1
    return num_pairs_hcc
Another way I've thought of approaching the problem is through sets. Every column gets its own set, and the index of every row with a one in that column gets added to it. For the example above, this would look like:
sets = [{0, 1, 2, 3, 4, 5, 6, 7},
        {0, 3, 6, 7},
        {0, 1, 2, 3, 4, 5, 6, 7},
        {1, 3, 6},
        {0, 1, 3, 4, 5, 7},
        {4, 5}]
The idea is to find the number of pairs in sets[j] that are in no higher set (they can be in a lower set, just not a higher one). Since, as I mentioned before, column zero always has all ones, every set is a subset of sets[0]. A much worse-performing implementation of this approach looks like this:
import itertools
import numpy as np
def generate_sets(bin_mat):
    sets = []
    for j in range(bin_mat.shape[1]):
        column = set()
        for i in range(bin_mat.shape[0]):
            if bin_mat[i, j] == 1:
                column.add(i)
        sets.append(column)
    return sets
def get_hcc_sets(bin_mat):
    sets = generate_sets(bin_mat)
    pairs_sets = []
    num_pairs_hcc = np.zeros(len(sets), dtype=np.int32)
    for subset in sets:
        pairs_sets.append({p for p in itertools.combinations(sorted(subset), r=2)})
    for j in range(len(sets)-1):
        intersections = [pairs_sets[j].intersection(pairs_sets[q]) for q in range(j+1, len(sets))]
        num_pairs_hcc[j] = len(pairs_sets[j] - set.union(*intersections))
    num_pairs_hcc[len(sets)-1] = len(pairs_sets[len(sets)-1])
    return num_pairs_hcc
I haven't checked that this sets implementation always produces the same results as the previous one, but in the finitely many cases I tried, it works. However, I am 100% certain that my first implementation gives exactly the result I need.
Another reference example:
input:
array([[1, 1, 0],
[1, 1, 0],
[1, 1, 0],
[1, 1, 0],
[1, 0, 1],
[1, 0, 1],
[1, 0, 1],
[1, 0, 1]], dtype=int32)
output:
array([16, 6, 6], dtype=int32)
Is there a way to beat my O(n^2 k) implementation? It seems rather brute force, and it feels like there should be something I can exploit to make this calculation faster. I always expect n to be greater than k, by orders of magnitude in many cases, so I'd rather k have the higher exponent than n.
If you are going for the O(n² k) approach in python, you can do it with much shorter code using itertools and set; the code might be more efficient too.
import itertools
t = [[1, 1, 1, 0, 1, 0],
[1, 0, 1, 1, 1, 0],
[1, 0, 1, 0, 0, 0],
[1, 1, 1, 1, 1, 0],
[1, 0, 1, 0, 1, 1],
[1, 0, 1, 0, 1, 1],
[1, 1, 1, 1, 0, 0],
[1, 1, 1, 0, 1, 0]]
n, k = len(t), len(t[0])
# build set of pairs of 1 in column j
def candidates(j):
    return {(i1, i2) for (i1, i2) in itertools.combinations(range(n), 2) if 1 == t[i1][j] == t[i2][j]}
# build set of pairs of 1 in higher columns
def badpairs(j):
    return {(i1, i2) for (i1, i2) in itertools.combinations(range(n), 2) if any(1 == t[i1][j0] == t[i2][j0] for j0 in range(j+1, k))}
# set difference
def finalpairs(j):
    return candidates(j) - badpairs(j)
# print pairs
for j in range(k):
    print(j, finalpairs(j))
# 0 set()
# 1 set()
# 2 {(2, 4), (1, 2), (2, 7), (4, 6), (0, 6), (2, 3), (6, 7), (0, 2), (2, 6), (5, 6), (2, 5)}
# 3 {(1, 6), (3, 6)}
# 4 {(0, 1), (0, 7), (0, 4), (3, 4), (1, 5), (3, 7), (0, 3), (1, 4), (5, 7), (1, 7), (0, 5), (1, 3), (4, 7), (3, 5)}
# 5 {(4, 5)}
# print number of pairs
for j in range(k):
    print(j, len(finalpairs(j)))
# 0 0
# 1 0
# 2 11
# 3 2
# 4 14
# 5 1
Alternate definition for badpairs:
def badpairs(j):
    return set().union(*(candidates(j0) for j0 in range(j+1, k)))
Slightly different approach: avoid building badpairs
def finalpairs(j):
    return {(i1, i2) for (i1, i2) in itertools.combinations(range(n), 2) if 1 == t[i1][j] == t[i2][j] and not any(1 == t[i1][j0] == t[i2][j0] for j0 in range(j+1, k))}
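If an n x n boolean matrix fits in memory, the same O(n² k) work can be pushed into NumPy by sweeping the columns from highest to lowest and marking each pair the first time both rows have a one. This is only a sketch and does not improve the asymptotic bound. The name count_pairs_by_highest_common_column is made up for illustration. Unlike the question's hcc, it does not credit pairs with no common column to column 0, which makes no difference when column 0 is all ones.

```python
import numpy as np

def count_pairs_by_highest_common_column(bin_mat):
    # Hypothetical helper name; assumes bin_mat is an n x k array of 0/1 values.
    mat = np.asarray(bin_mat, dtype=bool)
    n, k = mat.shape
    # Pairs (i, j) with i < j that have not been assigned to a column yet.
    unassigned = np.triu(np.ones((n, n), dtype=bool), k=1)
    counts = np.zeros(k, dtype=np.int64)
    # Sweep from the highest column down, so a pair is claimed by the
    # highest column in which both rows have a one.
    for q in range(k - 1, -1, -1):
        col = mat[:, q]
        both = col[:, None] & col[None, :] & unassigned
        counts[q] = both.sum()
        unassigned &= ~both
    return counts

example = [[1, 1, 1, 0, 1, 0],
           [1, 0, 1, 1, 1, 0],
           [1, 0, 1, 0, 0, 0],
           [1, 1, 1, 1, 1, 0],
           [1, 0, 1, 0, 1, 1],
           [1, 0, 1, 0, 1, 1],
           [1, 1, 1, 1, 0, 0],
           [1, 1, 1, 0, 1, 0]]
print(count_pairs_by_highest_common_column(example).tolist())
# [0, 0, 11, 2, 14, 1]
```

The loop still runs k times, but each iteration is a handful of vectorized operations instead of n² Python-level checks.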
Let's say I got a dictionary defined this way: d ={(0, 1): 1, (1, 0): 4, (1, 3): 7, (2, 3): 11}
I want to convert it to a list of lists so that each key (x, y) gives the position of its value: x is the index of the nested list within the outer list, and y is the index of the value within that nested list. Each nested list has 4 items, and positions with no defined value are set to 0.
I suck at describing. Here's what I want my function to return: lst = [[0,1,0,0], [4,0,0,7], [0,0,0,11]]. Here's my unfinished, non-working code:
def convert(d):
    lst = []
    for i in range(len(d)):
        lst += [[0,0,0,0]] # adding the zeros first.
    for i in d:
        for j in range(4):
            lst[j] = list(i[j]) # and then the others.
How about:
for (x, y), value in d.items():
    result[x][y] = value
Here is the entire function, which also creates the correct list size automatically
def convert(d):
    # Figure out how big x and y can get
    max_x = max(coord[0] for coord in d.keys())
    max_y = max(coord[1] for coord in d.keys())
    # Create a 2D list with the given dimensions
    # (named result to avoid shadowing the built-in list)
    result = [[0] * (max_y + 1) for ix in range(max_x + 1)]
    # Assign values
    for (x, y), value in d.items():
        result[x][y] = value
    return result
if __name__ == "__main__":
    d = {(0, 1): 1, (1, 0): 4, (1, 3): 7, (2, 3): 11}
    print(convert(d))
# Input
example_d = {(0, 1): 1, (1, 0): 4, (1, 3): 7, (2, 3): 11}
def get_list(d):
    # Figure out the required lengths by looking at the highest indices
    max_list_idx = max(x for (x, _), _ in d.items())
    max_sublist_idx = max(y for (_, y), _ in d.items())
    # Create an empty list with the max sizes
    t = [[0] * (max_sublist_idx + 1) for _ in range(max_list_idx + 1)]
    # Fill out the empty list according to the input
    for (x, y), value in d.items():
        t[x][y] = value
    return t
print(get_list(example_d))
# Output: [[0, 1, 0, 0], [4, 0, 0, 7], [0, 0, 0, 11]]
You can try this.
max_x=max(d,key=lambda x:x[0])[0] # For finding max number of rows
# 2
max_y=max(d,key=lambda x:x[1])[1] # For finding max of columns
# 3
new_list=[[0]*(max_y+1) for _ in range(max_x+1)] # Creating a list with max_x+1 rows and max_y+1 columns filled with zeros
# [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
for (x,y),v in d.items(): # iterate over items(); iterating over d alone yields only the keys
    new_list[x][y]=v
new_list
# [[0, 1, 0, 0], [4, 0, 0, 7], [0, 0, 0, 11]]
You can use a list comprehension:
d ={(0, 1): 1, (1, 0): 4, (1, 3): 7, (2, 3): 11}
m_x, m_y = map(max, zip(*d))
m = [[d.get((b, a), 0) for a in range(m_y+1)] for b in range(m_x+1)]
Output:
[[0, 1, 0, 0], [4, 0, 0, 7], [0, 0, 0, 11]]
Given a pattern [1,1,0,1,1] and a binary list of length 100, [0,1,1,0,0,...,0,1], I want to count the number of occurrences of this pattern in the list. Is there a simple way to do this without tracking each item at every index with a variable?
Note something like this, [...,1, 1, 0, 1, 1, 1, 1, 0, 1, 1,...,0] can occur but this should be counted as 2 occurrences.
Convert your list to string using join. Then do:
text.count(pattern)
If you need to count overlapping matches then you will have to use regex matching or define your own function.
Edit
Here is the full code:
def overlapping_occurences(string, sub):
    count = start = 0
    while True:
        start = string.find(sub, start) + 1
        if start > 0:
            count+=1
        else:
            return count
given_list = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
pattern = [1,1,0,1,1]
text = ''.join(str(x) for x in given_list)
print(text)
pattern = ''.join(str(x) for x in pattern)
print(pattern)
print(text.count(pattern)) #for no overlapping
print(overlapping_occurences(text, pattern))
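The regex route mentioned above can be sketched with a zero-width lookahead, which matches at every start position without consuming characters, so overlapping occurrences are all counted. The helper name count_overlapping_regex is an assumption made for this example.

```python
import re

def count_overlapping_regex(text, pattern):
    # Hypothetical helper: "(?=...)" is a lookahead, so each match has zero
    # width and the next search starts one character later, not after the match.
    return len(re.findall("(?=" + re.escape(pattern) + ")", text))

given_list = [1, 1, 0, 1, 1, 0, 1, 1]
text = "".join(str(x) for x in given_list)
print(count_overlapping_regex(text, "11011"))  # 2 (matches start at 0 and 3)
print(text.count("11011"))                     # 1 (non-overlapping count)
```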
l1 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]
l1str = str(l1).replace(" ", "").replace("[", "").replace("]", "")
l3 = [1, 1, 0, 1, 1]
l3str = str(l3).replace(" ", "").replace("[", "").replace("]", "")
l1str = l1str.replace(l3str, "foo")
foo = l1str.count("foo")
print(foo)
You can always use the naive way:
loop over slices of the list (the slice that starts at index i and ends at i + [length of pattern]).
You can improve on it: notice that if you found an occurrence at index i, you can skip i+1 and i+2 and check from i+3 onwards (meaning you can check whether the pattern has a sub-pattern that eases your search).
The naive way costs O(n*m).
You can also use backwards convolution (a known pattern-matching algorithm).
This costs O(n*log(n)), which is better.
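One way to read the convolution idea for a binary list is as a cross-correlation: map 0 to -1 and 1 to +1, and a window matches exactly when its correlation score with the pattern equals the pattern length. The sketch below uses np.correlate, which computes the correlation directly, so it does not reach the O(n*log(n)) bound by itself; an FFT-based correlation would be needed for that. The function name count_by_correlation is hypothetical.

```python
import numpy as np

def count_by_correlation(data, pattern):
    # Hypothetical helper; assumes both inputs contain only 0s and 1s.
    if len(pattern) == 0 or len(data) < len(pattern):
        return 0
    # Map 0 -> -1 and 1 -> +1 so an agreeing position contributes +1
    # and a disagreeing position contributes -1 to the score.
    d = 2 * np.asarray(data) - 1
    p = 2 * np.asarray(pattern) - 1
    # scores[i] = sum over k of d[i + k] * p[k], one score per window start.
    scores = np.correlate(d, p, mode="valid")
    # A window matches only if every position agrees.
    return int(np.sum(scores == len(p)))

print(count_by_correlation([1, 1, 0, 1, 1, 1, 1, 0, 1, 1], [1, 1, 0, 1, 1]))  # 2
```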
I think a simple regex would suffice:
import re
def find(sample_list):
    list_1 = [1,1,0,1,1]
    str_1 = str(list_1)[1:-1]
    print(len(re.findall(str_1, str(sample_list))))
Hope this solves your problem.
from collections import Counter
a = [1,1,0,1,1]
b = [1,1,0,1,1,1,1,0,1,1]
lst = list()
for i in range(len(b)-len(a)+1):
    lst.append(tuple(b[i:i+len(a)]))
c = Counter(lst)
print(c[tuple(a)])
output
2
The loop can be written in one line, for "cleaner" but less readable code:
lst = [tuple(b[i:i+len(a)]) for i in range(len(b)-len(a)+1)]
NOTE: I'm using tuples because they are immutable objects and can be hashed.
You can also use the hashing idea and create your own hash method, such as multiplying each element by 10 raised to its position, e.g.
[1,0,1] = 1 * 1 + 0 * 10 + 1 * 100 = 101
That way you can make one pass over the list and check whether it contains the pattern by simply checking if the hash of the current sub-list == 101.
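The one-pass idea can be sketched as follows. Since the values are binary, this version uses base 2 rather than base 10 and keeps the window as a single integer, so two windows have the same value exactly when they are equal. The name count_with_rolling_value is made up for the example.

```python
def count_with_rolling_value(data, pattern):
    # Hypothetical helper; assumes data and pattern contain only 0s and 1s.
    m = len(pattern)
    if m == 0 or len(data) < m:
        return 0
    # Encode the pattern as an m-bit integer.
    target = 0
    for bit in pattern:
        target = (target << 1) | bit
    mask = (1 << m) - 1  # keeps only the last m bits of the window
    window = 0
    count = 0
    for i, bit in enumerate(data):
        # Shift the new bit in and drop the bit that left the window.
        window = ((window << 1) | bit) & mask
        # Only compare once the window holds m real bits.
        if i >= m - 1 and window == target:
            count += 1
    return count

print(count_with_rolling_value([1, 1, 0, 1, 1, 1, 1, 0, 1, 1], [1, 1, 0, 1, 1]))  # 2
```

Each step does constant work, so the whole pass is linear in the length of the list, and overlapping occurrences are counted.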
You can solve it using following two steps:
Combine all elements of the list in a single string
Use python count function to match the pattern in the string
a_new = ''.join(map(str,a))
pattern = ''.join(map(str,pattern))
a_new.count(pattern)
You can divide the lookup list into chunks of the size of the pattern you are looking for. You can achieve this with a simple recipe that uses itertools.islice to yield a sliding window:
>>> from itertools import islice
>>> p = [1,1,0,1,1]
>>> l = [0,1,1,0,0,0,1,1,0,1,1,1,0,0,1]
>>> [tuple(islice(l,k,len(p)+k)) for k in range(len(l)-len(p)+1)]
This will give you output like:
[(0, 1, 1, 0, 0), (1, 1, 0, 0, 0), (1, 0, 0, 0, 1), (0, 0, 0, 1, 1), (0, 0, 1, 1, 0), (0, 1, 1, 0, 1), (1, 1, 0, 1, 1), (1, 0, 1, 1, 1), (0, 1, 1, 1, 0), (1, 1, 1, 0, 0), (1, 1, 0, 0, 1)]
Now you can use collections.Counter to count the occurrences of each sublist in the sequence, like
>>> from collections import Counter
>>> c = Counter([tuple(islice(l,k,len(p)+k)) for k in range(len(l)-len(p)+1)])
>>> c
Counter({(0, 1, 1, 0, 1): 1, (1, 1, 1, 0, 0): 1, (0, 0, 1, 1, 0): 1, (0, 1, 1, 1, 0): 1, (1, 1, 0, 0, 0): 1, (0, 0, 0, 1, 1): 1, (1, 1, 0, 1, 1): 1, (0, 1, 1, 0, 0): 1, (1, 0, 1, 1, 1): 1, (1, 1, 0, 0, 1): 1, (1, 0, 0, 0, 1): 1})
To fetch the frequency of your desired sequence, use
>>> c.get(tuple(p),0)
1
Note I have used tuple everywhere as dict keys since list is not a hashable type in python so cannot be used as dict keys.
You can try the range approach:
pattern_data=[1,1,0,1,1]
data=[1,1,0,1,1,0,0,0,0,1,1,1,1,0,0,1,1,0,1,1,1,1,0,1,1,0,0,0,0,0,1,1,0,1,0,1,1,0,1,1,1,1,0,1,1,0,0,0,0,0,0,0,1,1,0,1,1,0,1,1,1,1,0,1,1,0,0,0,0,0,0,0,0,0,0,1,1,0,1,1,1,1,0,1,1,0,0,0,0,0,0,0,0,0,0,1,1,0,1,1]
count=0
for i in range(0,len(data),1):
    if data[i:i+len(pattern_data)]==pattern_data:
        print(i,data[i:i+len(pattern_data)])
        count+=1
print(count)
output:
0 [1, 1, 0, 1, 1]
15 [1, 1, 0, 1, 1]
20 [1, 1, 0, 1, 1]
35 [1, 1, 0, 1, 1]
40 [1, 1, 0, 1, 1]
52 [1, 1, 0, 1, 1]
55 [1, 1, 0, 1, 1]
60 [1, 1, 0, 1, 1]
75 [1, 1, 0, 1, 1]
80 [1, 1, 0, 1, 1]
95 [1, 1, 0, 1, 1]
11
How to repeat a list's elements in another list content until the length of the second list fulfilled?
For example:
LA = [0,1,2]
LB = [(0,0),(1,0),(2,0),(3,0),(4,0),(5,0),(6,0)]
the end result should be:
LC = [(0,0,0),(1,0,1),(2,0,2),(3,0,0),(4,0,1),(5,0,2),(6,0,0)]
Hopefully it can be done in one line
You can use itertools.cycle:
from itertools import cycle
LA = [0,1,2]
LB = [(0,0),(1,0),(2,0),(3,0),(4,0),(5,0),(6,0)]
LC = [(i, j, k) for (i, j), k in zip(LB, cycle(LA))]
print(LC)
# [(0, 0, 0), (1, 0, 1), (2, 0, 2), (3, 0, 0), (4, 0, 1), (5, 0, 2), (6, 0, 0)]
This works because zip generates items until one of the iterables is exhausted...but a cycle object is inexhaustible, so we'll keep padding items from LA until LB runs out.
# Use a list comprehension and get the element from LA by using the index from LB % 3.
[v+(LA[k%3],) for k,v in enumerate(LB)]
Out[718]: [(0, 0, 0), (1, 0, 1), (2, 0, 2), (3, 0, 0), (4, 0, 1), (5, 0, 2), (6, 0, 0)]
Try enumerate() like this along with list comprehension -
[elem + (LA[i % len(LA)],) for i, elem in enumerate(LB)]
Here is a more "explicit" version that works with any length of LA.
LA = [0,1,2]
LB = [(0,0),(1,0),(2,0),(3,0),(4,0),(5,0),(6,0)]
i = 0
LC = []
for x,y in LB:
    try:
        z = LA[i]
    except IndexError:
        # ran past the end of LA, so wrap around to its start
        i = 0
        z = LA[i]
    LC.append((x,y,z))
    i += 1
print(LC)
[(0, 0, 0), (1, 0, 1), (2, 0, 2), (3, 0, 0), (4, 0, 1), (5, 0, 2), (6, 0, 0)]
I create a list of all permutations of, let's say, 0,1,2:
perm = list(itertools.permutations([0,1,2]))
This is used for accessing indexes in another list in that specific order. Every time an index is accessed, it is popped.
When an element is popped, the elements with indexes higher than the popped element's index shift one position down. This means that if I want to pop from my list by indexes [0,1,2], it will result in an index error, since index 2 will not exist when I reach it. [0,1,2] should therefore be popped in the order [0,0,0].
More examples:
[0,2,1] = [0,1,0]
[2,0,1] = [2,0,0]
[1,2,0] = [1,1,0]
Right now this is being handled through a series of checks. My question is whether anyone knows a smart way to turn the list of tuples generated by itertools into the desired list:
[(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
[(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0), (2, 0, 0), (2, 1, 0)]
Simply iterate through each tuple, and decrement every subsequent index that is greater than the current element:
l=[(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
def lower_idxs(lst):
    new_row = list(lst)
    for i, val in enumerate(new_row):
        for j in range(i+1, len(new_row)):
            if new_row[i] < new_row[j]:
                new_row[j] -= 1
    return new_row
print([lower_idxs(x) for x in l])
will print out
[[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 0], [2, 0, 0], [2, 1, 0]]
Here is a fancier one-liner based on Randy C's solution:
print([tuple(y-sum(v<y for v in x[:i]) for i,y in enumerate(x)) for x in l])
Here's a one-liner for it (assuming l is a single permutation, not the list of all of them):
[v-sum(v>v2 for v2 in l[:k]) for k, v in enumerate(l)]
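A quick way to check any of these conversions is to pop with the converted indexes and compare against plain indexing in the original order. The sketch below does that for all permutations of three items; the helper names to_pop_indices and pop_in_order are assumptions made for the example.

```python
from itertools import permutations

def to_pop_indices(perm):
    # Each index drops by the number of earlier (already popped) indexes
    # that are smaller than it.
    return [v - sum(u < v for u in perm[:i]) for i, v in enumerate(perm)]

def pop_in_order(items, perm):
    # Pop from a copy so the caller's list is left untouched.
    remaining = list(items)
    return [remaining.pop(i) for i in to_pop_indices(perm)]

items = ["a", "b", "c"]
for perm in permutations(range(len(items))):
    # Popping with converted indexes should visit the items in perm order.
    assert pop_in_order(items, perm) == [items[i] for i in perm]

print(to_pop_indices((1, 2, 0)))  # [1, 1, 0]
```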