Vectorizing an indexing operation in numpy

What is the best way to vectorize the following code in numpy?
import numpy as np
A = np.zeros(5, dtype='int')
I = [1, 1, 1, 3]
J = [2, 1, 1, 1]
for i, j in zip(I, J):
    A[i] += j
print(A)
The result should be:
[0 4 0 1 0]
Here A is the original array, and I stores the indices at which A should be incremented by the corresponding entries of J.
If one simply vectorizes the above by doing:
A[I] += J
print(A)
one gets the wrong answer
[0 1 0 1 0]
as, apparently, repeated indices are ignored. Is there an equivalent operation to += which does not ignore repeated indices?

You can use numpy.bincount():
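import numpy
A = numpy.zeros(5, dtype='int')
I = [1, 1, 1, 3]
J = [2, 1, 1, 1]
# bincount sums the weights J for each index value in I
sums = numpy.bincount(I, J)
# sums only extends to max(I), so add it to the matching prefix of A
A[:len(sums)] += sums
print(A)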
prints
[0 4 0 1 0]
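NumPy ufuncs also expose an unbuffered in-place method, `ufunc.at`, which accumulates over repeated indices. The answer above does not mention it; the following is a minimal sketch of that alternative, reusing the A, I and J from the question:

```python
import numpy as np

A = np.zeros(5, dtype=int)
I = [1, 1, 1, 3]
J = [2, 1, 1, 1]

# Unbuffered in-place addition: every occurrence of an index contributes,
# unlike A[I] += J, where only one write per repeated index survives
np.add.at(A, I, J)
print(A)  # [0 4 0 1 0]
```

Whether this is faster than bincount depends on the NumPy version and the array sizes, so it is worth timing both on real data.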

In principle you can do it with numpy's bincount and unique, but I'd guess it will only make the code much less readable without any noticeable performance improvement.

Related

moving through variables in matrix with recursion

I'm trying to write a function that receives a kind of matrix: a list that consists of sub-lists.
I want to go through all the matrix elements and print them out, using recursion only.
But I don't know how to write a good stop condition for my function,
so I get more numbers printed than I wanted.
def mission(mat):
    def move(mat, i=0, j=0 ,k=0):
        print(mat[i][j])
        if j<(len(mat[0])-1):
            move(mat,i,j+1)
        if i<(len(mat)-1):
            move(mat,i+1,0)
    move(mat)
mat = [[1, 0, 0, 3, 0],
       [0, 0, 2, 3, 0],
       [2, 0, 0, 2, 0],
       [0, 1, 2, 3, 3]]
mission(mat)
*edit
I have another question: is there a way to subtract (element-wise) two lists shaped like the mat used here (same dimensions, just different numbers)
without using numpy or a for loop?

You can test for the end condition first and check whether you have already reached the last item in the last sublist. If so, you return.
if(i == len(mat) - 1 and j == len(mat[0]) - 1):
    return
Then, you check whether you are at the last item in your current sublist, and if so, increment the sublist index i and set the item index j back to 0.
if(j == len(mat[0]) - 1):
    i += 1
    j = 0
If you are neither at the end of the whole 2d matrix (list), nor at the end of any sublist, you just need to increment the item index j.
else:
    j += 1
Then, you can safely call your function recursively. The whole code ends up looking like this.
def move(mat, i=0, j=0):
    print(mat[i][j])
    if(i == len(mat) - 1 and j == len(mat[0]) - 1):
        return
    if(j == len(mat[0]) - 1):
        i += 1
        j = 0
    else:
        j += 1
    move(mat, i, j)
mat = [[1, 0, 0, 3, 0],
       [0, 0, 2, 3, 0],
       [2, 0, 0, 2, 0],
       [0, 1, 2, 3, 3]]
move(mat)
Output:
1
0
0
3
0
0
0
2
3
0
2
0
0
2
0
0
1
2
3
3

As for the move function, simply change the second if to elif, so that the next row is entered only once the end of the current row has been reached:
def move(mat, i=0, j=0):
    print(mat[i][j])
    if j<(len(mat[0])-1):
        move(mat,i,j+1)
    elif i<(len(mat)-1):
        move(mat,i+1,0)
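For the follow-up in the edit, assuming "decrease" means element-wise subtraction of two equally shaped lists of lists, one possible sketch uses map and operator.sub instead of an explicit for loop. The names subtract_matrices, a and b are hypothetical and chosen only for illustration:

```python
from operator import sub

def subtract_matrices(a, b):
    # Pair up rows, then pair up the elements within each row and subtract them
    return list(map(lambda row_a, row_b: list(map(sub, row_a, row_b)), a, b))

a = [[5, 7, 9], [4, 4, 4]]
b = [[1, 2, 3], [4, 3, 2]]
print(subtract_matrices(a, b))  # [[4, 5, 6], [0, 1, 2]]
```

Note that map stops at the shorter input, so rows of unequal length would be silently truncated rather than raising an error.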

Making a matrix with numpy.array

I tried to create a matrix using numpy.array with the following code
def matrix_input(3):
    matrix = []
    for i in range(N):
        a = nd.array(input().split(),int)
        matrix.append(a)
    print(matrix)
But I'm getting the following output:
[array([1, 1, 1]), array([1, 1]), array([1, 1, 1])]
For the input:
1 1 1
1 1
1 1 1
I don't want the matrix to have the word array in them... How do I remove it?

Convert each row to a list on the 4th line of your code. Also, correct your function definition as shown in the code below. Defining a function and calling it are two different things: the definition takes a parameter name (N), while the call passes a value (3).
import numpy as np
def matrix_input(N): # The definition needs a parameter name, so use N instead of 3.
    matrix = []
    for i in range(N):
        a = np.array(input().split(), int).tolist() # Make it a list of plain Python ints here
        matrix.append(a)
    print(matrix)
output:
matrix_input(3)
1 1 1
1 1
1 1 1
[[1, 1, 1], [1, 1], [1, 1, 1]]
Alternative method for creating a proper matrix:
import numpy as np
# np.array([[...]]) prints the same result and is what NumPy recommends for new code
matrix_1 = np.matrix([[1,1,1],[1,1,0],[1,1,1]])
print(matrix_1)
Output:
[[1 1 1]
 [1 1 0]
 [1 1 1]]
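If the goal is only a nested list of integers, NumPy is not strictly needed. The following is a minimal sketch, assuming the rows arrive as whitespace-separated strings; the helper name parse_rows is hypothetical. A 2D array is built only when every row has the same length, since ragged rows like the ones in the question cannot form a regular 2D array:

```python
import numpy as np

def parse_rows(lines):
    # Convert each whitespace-separated line into a list of ints
    return [[int(tok) for tok in line.split()] for line in lines]

rows = parse_rows(["1 1 1", "1 1", "1 1 1"])
print(rows)  # [[1, 1, 1], [1, 1], [1, 1, 1]]

# Only build a 2D array when all rows have the same length
if len({len(r) for r in rows}) == 1:
    print(np.array(rows))
else:
    print("rows are ragged; keeping the nested list")
```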

how to make efficiently large sparse matrix in python?

1. I am trying to create a NumPy array with shape (6962341, 268148) and dtype np.uint8.
2. My data consists of lists such as [x1,x2,x3,x4], [x2,x1], [x4,x5,x3], ...
3. I want to apply array[x1,x2] += 1, array[x1,x3] += 1, array[x1,x4] += 1, array[x2,x3] += 1, ...
4. So I tried code with the following structure.
import numpy as np
from itertools import combinations
base_array = np.zeros((row_size, col_size), dtype=np.uint8)
for each_list in data:
    for (x, y) in combinations(each_list, 2):
        # always write into the upper triangle (row index <= column index)
        if x > y:
            base_array[y, x] += 1
        else:
            base_array[x, y] += 1
This basically computes the upper triangle of a matrix, and I will only use the upper-triangle values. You can think of it as building the base matrix A for a co-occurrence matrix. However, this code is too slow, and I think it can be made faster.
What should I do?

Assuming your data is integers (since they represent rows and columns) or you can hash your data x1, x2, ... into 1, 2, ... integers, here is a fast solution:
import numpy as np
from itertools import combinations
#list of pairwise combinations in your data
comb_list = []
for each_list in data:
    comb_list += list(combinations(each_list,2))
#convert combination int to index (numpy is 0 based indexing)
comb_list = np.array(comb_list) - 1
#make array with flat indices
flat = np.ravel_multi_index((comb_list[:,0],comb_list[:,1]),(row_size,col_size))
#count number of duplicates for each index using np.bincount
base_array = np.bincount(flat,None,row_size*col_size).reshape((row_size,col_size)).astype(np.uint8)
Sample data:
[[1, 2, 3, 4], [2, 1], [4, 5, 3, 4]]
Corresponding output:
[[0 1 1 1 0]
 [1 0 1 1 0]
 [0 0 0 2 0]
 [0 0 1 1 1]
 [0 0 1 1 0]]
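For the shape given in the question, row_size*col_size is on the order of 10^12 cells, so a dense bincount result is unlikely to fit in memory. One possible sketch counts the pairs in coordinate form instead, using np.unique with return_counts. Unlike the code above, it sorts each pair so that only the upper triangle is filled, as the question describes. The variable names coords, counts and dense are assumptions for illustration, and the dense array at the end is built only because the sample is tiny:

```python
from itertools import combinations
import numpy as np

data = [[1, 2, 3, 4], [2, 1], [4, 5, 3, 4]]

# Collect every pair, sort within each pair so that row <= col,
# and shift to 0-based indices
pairs = np.array([p for each_list in data for p in combinations(each_list, 2)])
pairs = np.sort(pairs, axis=1) - 1

# Count duplicate (row, col) pairs without allocating the full matrix
coords, counts = np.unique(pairs, axis=0, return_counts=True)

# For a small example only: materialise the dense upper-triangular matrix
size = pairs.max() + 1
dense = np.zeros((size, size), dtype=np.uint8)
dense[coords[:, 0], coords[:, 1]] = counts
print(dense)
```

The coords and counts arrays can be handed to a sparse-matrix library if one is available; note that counts above 255 would not fit in np.uint8.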
EDIT: corresponding to the explanation in the comments:
data = [[1, 2, 3, 4], [2, 1], [4, 5, 3, 4]]
# data is ragged, so take the maximum with plain Python rather than np.amax
n_cols = max(max(each_list) for each_list in data)
base_array = np.zeros((len(data), n_cols), dtype=np.uint8)
for i, each_list in enumerate(data):
    for j in each_list:
        base_array[i, j-1] = 1
Output:
[[1 1 1 1 0]
 [1 1 0 0 0]
 [0 0 1 1 1]]

How do I extract a 2D NumPy sub-array from a 2D NumPy array based on patterns?

I have a 2D NumPy array which looks like this:
Array=
[
[0,0,0,0,0,0,0,2,2,2],
[0,0,0,0,0,0,0,2,2,2],
[0,0,1,1,1,0,0,2,2,2],
[0,0,1,1,1,0,0,2,2,2],
[0,0,1,1,1,0,0,1,1,1],
[0,0,0,0,0,0,0,1,1,1]
]
I need to display the arrays of non-zero elements as:
Array1:
[
[1,1,1],
[1,1,1],
[1,1,1]
]
Array2:
[
[2,2,2],
[2,2,2],
[2,2,2],
[2,2,2]
]
Array3:
[
[1,1,1],
[1,1,1]
]
Could someone please help me work out what logic I could use to achieve the output above? I can't use fixed indexes (like array[a:b, c:d]), since the logic I create should work for any NumPy array with a similar pattern.

This uses scipy.ndimage.label to recursively identify disconnected sub-arrays.
import numpy as np
from scipy.ndimage import label
array = np.array(
    [[0,0,0,0,0,0,0,2,2,2,3,3,3],
     [0,0,0,0,0,0,0,2,2,2,0,0,1],
     [0,0,1,1,1,0,0,2,2,2,0,2,1],
     [0,0,1,1,1,0,0,2,2,2,0,2,0],
     [0,0,1,1,1,0,0,1,1,1,0,0,0],
     [0,0,0,0,0,0,0,1,1,1,0,0,0]])
# initialize list to collect sub-arrays
arr_list = []
def append_subarrays(arr, val, val_0):
    '''
    arr : 2D array
    val : the value used for filtering
    val_0 : the original value, which we want to preserve
    '''
    # remove everything that's not the current val
    arr[arr != val] = 0
    if 0 in arr: # <-- not a single rectangle yet
        # get relevant indices as well as their minima and maxima
        x_ind, y_ind = np.where(arr != 0)
        min_x, max_x, min_y, max_y = min(x_ind), max(x_ind) + 1, min(y_ind), max(y_ind) + 1
        # cut subarray (everything corresponding to val)
        arr = arr[min_x:max_x, min_y:max_y]
        # use the label function to assign different values to disconnected regions
        labeled_arr = label(arr)[0]
        # recursively apply append_subarrays to each disconnected region
        for sub_val in np.unique(labeled_arr[labeled_arr != 0]):
            append_subarrays(labeled_arr.copy(), sub_val, val_0)
    else: # <-- we only have a single rectangle left ==> append
        arr_list.append(arr * val_0)
for i in np.unique(array[array > 0]):
    append_subarrays(array.copy(), i, i)
for arr in arr_list:
    print(arr, end='\n'*2)
Output (note: modified example array):
[[1]
 [1]]
[[1 1 1]
 [1 1 1]
 [1 1 1]]
[[1 1 1]
 [1 1 1]]
[[2 2 2]
 [2 2 2]
 [2 2 2]
 [2 2 2]]
[[2]
 [2]]
[[3 3 3]]

This sounds like a floodfill problem, so skimage.measure.label is a good approach:
import numpy as np
from skimage.measure import label
Array=np.array([[0,0,0,0,0,0,0,2,2,2],
                [0,0,0,0,0,0,0,2,2,2],
                [0,0,1,1,1,0,0,2,2,2],
                [0,0,1,1,1,0,0,2,2,2],
                [0,0,1,1,1,0,0,1,1,1],
                [0,0,0,0,0,0,0,1,1,1]
                ])
labels = label(Array, connectivity=1)
for lbl in range(1, labels.max()+1): # lbl avoids shadowing the imported label()
    xs, ys = np.where(labels==lbl)
    shape = (len(np.unique(xs)), len(np.unique(ys)))
    print(Array[xs, ys].reshape(shape))
Output:
[[2 2 2]
 [2 2 2]
 [2 2 2]
 [2 2 2]]
[[1 1 1]
 [1 1 1]
 [1 1 1]]
[[1 1 1]
 [1 1 1]]

startRowIndex = 0 #indexes of sub-arrays
endRowIndex = 0
startColumnIndex = 0
endColumnIndex = 0
tmpI = 0 #for iterating inside the i,j loops
tmpJ = 0
value = 0 #which number we are looking for in array
for i in range(array.shape[0]): #array.shape[0] says how many rows, shape[1] says how many columns
    for j in range(array[i].size): #for all elements in a row
        if(array[i,j] != 0): #if the element is different than 0
            startRowIndex = i
            startColumnIndex = j
            tmpI = i
            tmpJ = j #you cannot change the looping indexes so create tmp indexes
            value = array[i,j] #save what number will be sub-array (for example 2)
            while(array[tmpI,tmpJ] != 0 and array[tmpI,tmpJ] == value ): #iterate over column numbers
                tmpJ+=1
                if tmpJ == array.shape[1]: #if you reached end of the array (that is end of the column)
                    break
            #after the loop, tmpJ is one past the last matching column;
            #a slice a[start:stop] takes the values from <start; stop) (stop is excluded),
            #so tmpJ can be used directly as the end index
            endColumnIndex = tmpJ
            tmpI = i
            tmpJ = j
            while(array[tmpI,tmpJ] != 0 and array[tmpI,tmpJ] == value): #iterate over row numbers
                tmpI += 1
                if tmpI == array.shape[0]: #if you reached end of the array
                    break
            #after the loop, tmpI is one past the last matching row,
            #so it can be used directly as the end index
            endRowIndex = tmpI
            print(array[startRowIndex:endRowIndex, startColumnIndex:endColumnIndex])
            #change array to zero with already used elements
            array[startRowIndex:endRowIndex, startColumnIndex:endColumnIndex] = 0
This one is kind of brute-force, but it works the way you want it to. Note that it overwrites array with zeros as it goes.

Here's my pure Python (no NumPy) solution. I took advantage of the fact that the contiguous regions are always rectangular.
The algorithm scans from top-left to bottom-right; when it finds the corner of a region, it scans to find the top-right and bottom-left corners. The dictionary skip is populated so that later scans can skip horizontally past any rectangle which has already been found.
The time complexity is O(nm) for a grid with n rows and m columns, which is optimal for this problem.
def find_rectangles(grid):
    width, height = len(grid[0]), len(grid)
    skip = dict()
    for y in range(height):
        x = 0
        while x < width:
            if (x, y) in skip:
                x = skip[x, y]
            elif not grid[y][x]:
                x += 1
            else:
                v = grid[y][x]
                x2 = x + 1
                while x2 < width and grid[y][x2] == v:
                    x2 += 1
                y2 = y + 1
                while y2 < height and grid[y2][x] == v:
                    skip[x, y2] = x2
                    y2 += 1
                yield [ row[x:x2] for row in grid[y:y2] ]
                x = x2
Example:
>>> for r in find_rectangles(grid1): # example from the question
...     print(r)
...
[[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]]
[[1, 1, 1], [1, 1, 1], [1, 1, 1]]
[[1, 1, 1], [1, 1, 1]]
>>> for r in find_rectangles(grid2): # example from mcsoini's answer
...     print(r)
...
[[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]]
[[3, 3, 3]]
[[1], [1]]
[[1, 1, 1], [1, 1, 1], [1, 1, 1]]
[[2], [2]]
[[1, 1, 1], [1, 1, 1]]

We can do this using scipy.ndimage.label and scipy.ndimage.find_objects:
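import numpy as np
from scipy.ndimage import label,find_objects
Array = np.array(Array)
# outer find_objects: bounding slices of each connected non-zero region;
# inner find_objects: bounding slices of each distinct value inside that region
[Array[j][i] for j in find_objects(*label(Array)) for i in find_objects(Array[j])]
# [array([[1, 1, 1],
#         [1, 1, 1]]), array([[2, 2, 2],
#         [2, 2, 2],
#         [2, 2, 2],
#         [2, 2, 2]]), array([[1, 1, 1],
#         [1, 1, 1],
#         [1, 1, 1]])]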

Vector pair ordering in numpy

I am looking to order a pair of vectors by the first unequal element. Example:
[0, 1, 2] < [0, 2, 1]
because 0 == 0 so look at the next index, where 1 < 2.
Is there a simple way to do this in numpy? Right now I am using this to find the difference between the "greater" and "lesser" vector, which leads to my first try, which is:
(x - y) * np.sign((x - y)[np.nonzero(x - y)[0][0]])

You can use tuple: (0,1,2)<(0,2,1). So a function like
def cmp(v1, v2): return tuple(v1) < tuple(v2)
should suffice ...

np.lexsort is probably the most efficient way to do this:
import numpy as np
# an (N, k) array of N k-dimensional vectors
data = np.array([[0, 2, 3], [0, 1, 2], [0, 1, 3], [0, 2, 1]])
print(data)
# [[0 2 3]
#  [0 1 2]
#  [0 1 3]
#  [0 2 1]]
# lexsort assumes (k, N), so transpose data first. we also need to reverse the
# order of the columns, since lexsort sorts by the last column first
idx = np.lexsort(data[:, ::-1].T)
print(data[idx])
# [[0 1 2]
#  [0 1 3]
#  [0 2 1]
#  [0 2 3]]

I would bet it will be way faster to do a simple loop through both arrays:
def comparison(a,b):
    for i in range(len(a)): #assuming they have to be the same length
        if a[i] < b[i]:
            return True
        elif a[i] > b[i]:
            return False
    return False
For the 3-element vectors you posted, the iteration is 7x faster on my machine. For long enough stretches of identical initial elements the iteration will become slower, but make sure that is the case before you go vectorizing.
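For comparison, here is a minimal vectorized sketch of the same "first unequal element" test, assuming both vectors have the same length. The function name lex_less is hypothetical, and no timing claim is made for it:

```python
import numpy as np

def lex_less(a, b):
    a, b = np.asarray(a), np.asarray(b)
    # Positions where the two vectors differ
    diff = np.flatnonzero(a != b)
    # Equal vectors are not "less than"; otherwise compare at the first difference
    return bool(diff.size) and bool(a[diff[0]] < b[diff[0]])

print(lex_less([0, 1, 2], [0, 2, 1]))  # True
print(lex_less([0, 2, 1], [0, 1, 2]))  # False
print(lex_less([0, 1, 2], [0, 1, 2]))  # False
```

This always scans the full vectors, whereas the loop above can stop at the first difference, which is consistent with the loop being faster for short inputs.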
