Vector pair ordering in numpy - python

I am looking to order a pair of vectors by the first unequal element. Example:
[0, 1, 2] < [0, 2, 1]
because 0 == 0 so look at the next index, where 1 < 2.
Is there a simple way to do this in numpy? Right now I use the ordering to compute the difference between the "greater" and the "lesser" vector, which led to my first attempt:
(x - y) * np.sign((x - y)[np.nonzero(x - y)[0][0]])

You can use tuples, which Python compares element by element: (0, 1, 2) < (0, 2, 1). So a function like
def cmp(v1, v2): return tuple(v1) < tuple(v2)
should suffice.

np.lexsort is probably the most efficient way to do this:
import numpy as np
# an (N, k) array of N k-dimensional vectors
data = np.array([[0, 2, 3], [0, 1, 2], [0, 1, 3], [0, 2, 1]])
print(data)
# [[0 2 3]
#  [0 1 2]
#  [0 1 3]
#  [0 2 1]]
# lexsort expects shape (k, N), so transpose data first. We also need to reverse the
# order of the columns, since lexsort uses the last key as the primary sort key
idx = np.lexsort(data[:, ::-1].T)
print(data[idx])
# [[0 1 2]
#  [0 1 3]
#  [0 2 1]
#  [0 2 3]]

I would bet it will be way faster to do a simple loop through both arrays:
def comparison(a, b):
    for i in range(len(a)):  # assuming they have the same length
        if a[i] < b[i]:
            return True
        elif a[i] > b[i]:
            return False
    return False
For the 3-element vectors you posted, the iteration is 7x faster on my machine. For long enough runs of identical leading elements the iteration will become slower, but make sure that is actually your case before you go vectorizing.
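
For comparison, here is a minimal vectorized sketch of the same pairwise test. The name lex_less is made up for this illustration and does not come from any of the answers; it assumes both vectors have the same length.

```python
import numpy as np

def lex_less(x, y):
    """Return True if x comes before y at the first unequal element."""
    x = np.asarray(x)
    y = np.asarray(y)
    # positions where the two vectors differ
    diff = np.flatnonzero(x != y)
    if diff.size == 0:
        return False  # equal vectors: neither one is smaller
    first = diff[0]
    return bool(x[first] < y[first])

print(lex_less([0, 1, 2], [0, 2, 1]))  # True, because 1 < 2 at index 1
print(lex_less([0, 2, 1], [0, 1, 2]))  # False
```

Unlike the loop, this always scans both vectors completely, so whether it pays off likely depends on the vector length.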


How to efficiently build a large sparse matrix in Python?

I am trying to build a numpy array with shape (6962341, 268148) and dtype np.uint8.
My data consists of lists such as [x1,x2,x3,x4], [x2,x1], [x4,x5,x3], ...
I want to apply array[x1,x2] += 1, array[x1,x3] += 1, array[x1,x4] += 1, array[x2,x3] += 1, ...
So I tried a function with the following structure:
import numpy as np
from itertools import combinations
base_array = np.zeros((row_size, col_size), dtype=np.uint8)
for each_list in data:
    for (x, y) in combinations(each_list, 2):
        # always write into the upper triangle
        if x > y:
            base_array[y, x] += 1
        else:
            base_array[x, y] += 1
It basically computes the upper triangle of a matrix, and I will only use the upper-triangle values. You can also think of it as building the base matrix A for a co-occurrence matrix. But this function is too slow, and I think it can be made faster.
What should I do?

Assuming your data are integers (since they represent rows and columns), or that you can map x1, x2, ... to the integers 1, 2, ..., here is a fast solution:
import numpy as np
from itertools import combinations
# list of pairwise combinations in your data
comb_list = []
for each_list in data:
    comb_list += list(combinations(each_list, 2))
# convert combination ints to indices (numpy uses 0-based indexing)
comb_list = np.array(comb_list) - 1
# make array with flat indices
flat = np.ravel_multi_index((comb_list[:, 0], comb_list[:, 1]), (row_size, col_size))
# count the number of duplicates for each index using np.bincount
base_array = np.bincount(flat, None, row_size * col_size).reshape((row_size, col_size)).astype(np.uint8)
Sample data:
[[1, 2, 3, 4], [2, 1], [4, 5, 3, 4]]
Corresponding output:
[[0 1 1 1 0]
 [1 0 1 1 0]
 [0 0 0 2 0]
 [0 0 1 1 1]
 [0 0 1 1 0]]
EDIT: following the explanation in the comments:
data = [[1, 2, 3, 4], [2, 1], [4, 5, 3, 4]]
# data is ragged, so take the maximum with plain Python instead of np.amax
n_cols = max(max(each_list) for each_list in data)
base_array = np.zeros((len(data), n_cols), dtype=np.uint8)
for i, each_list in enumerate(data):
    for j in each_list:
        base_array[i, j-1] = 1
print(base_array)
Output:
[[1 1 1 1 0]
 [1 1 0 0 0]
 [0 0 1 1 1]]
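
One caveat about the np.bincount answer above: it allocates row_size * col_size counters, which for the shape in the question is roughly 1.9e12 entries. The following is only a sketch of an alternative that stores the nonzero upper-triangle entries as (row, column, count) triplets. The function name upper_triangle_counts is hypothetical, and it assumes the items are already integer indices.

```python
import numpy as np
from itertools import combinations

def upper_triangle_counts(data):
    """Count co-occurring pairs, keeping only the pairs that actually occur."""
    # order every pair as (smaller, larger) so it lands in the upper triangle
    pairs = [
        (min(x, y), max(x, y))
        for each_list in data
        for x, y in combinations(each_list, 2)
    ]
    pairs = np.array(pairs, dtype=np.int64).reshape(-1, 2)
    # unique rows of the pair array and how often each one occurs
    unique_pairs, counts = np.unique(pairs, axis=0, return_counts=True)
    return unique_pairs[:, 0], unique_pairs[:, 1], counts

rows, cols, counts = upper_triangle_counts([[1, 2, 3, 4], [2, 1], [4, 5, 3]])
for r, c, n in zip(rows, cols, counts):
    print(r, c, n)
```

If SciPy is available, triplets like these can be passed to scipy.sparse.coo_matrix((counts, (rows, cols)), shape=(row_size, col_size)) to get an actual sparse matrix.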

How do I extract a 2D NumPy sub-array from a 2D NumPy array based on patterns?

I have a 2D NumPy array which looks like this:
I need to display the arrays of non-zero elements as:
Could someone please help me out with what logic I could use to achieve this output? I can't use fixed indexes (like array[a:b, c:d]) since the logic I create should work for any NumPy array with a similar pattern.

This uses scipy.ndimage.label to recursively identify disconnected sub-arrays.
import numpy as np
from scipy.ndimage import label
array = np.array(
# initialize list to collect sub-arrays
arr_list = []
def append_subarrays(arr, val, val_0):
    '''
    arr : 2D array
    val : the value used for filtering
    val_0 : the original value, which we want to preserve
    '''
    # remove everything that's not the current val
    arr[arr != val] = 0
    if 0 in arr: # <-- not a single rectangle yet
        # get relevant indices as well as their minima and maxima
        x_ind, y_ind = np.where(arr != 0)
        min_x, max_x, min_y, max_y = min(x_ind), max(x_ind) + 1, min(y_ind), max(y_ind) + 1
        # cut subarray (everything corresponding to val)
        arr = arr[min_x:max_x, min_y:max_y]
        # use the label function to assign different values to disconnected regions
        labeled_arr = label(arr)[0]
        # recursively apply append_subarrays to each disconnected region
        for sub_val in np.unique(labeled_arr[labeled_arr != 0]):
            append_subarrays(labeled_arr.copy(), sub_val, val_0)
    else: # <-- we only have a single rectangle left ==> append
        arr_list.append(arr * val_0)
for i in np.unique(array[array > 0]):
    append_subarrays(array.copy(), i, i)
for arr in arr_list:
    print(arr, end='\n'*2)
Output (note: modified example array):
[[1 1 1]
[1 1 1]
[1 1 1]]
[[1 1 1]
[1 1 1]]
[[2 2 2]
[2 2 2]
[2 2 2]
[2 2 2]]
[[3 3 3]]
This sounds like a flood-fill problem, so skimage.measure.label is a good approach:
import numpy as np
from skimage.measure import label
labels = label(Array, connectivity=1)
# use a loop variable that does not shadow the imported label function
for lab in range(1, labels.max() + 1):
    xs, ys = np.where(labels == lab)
    shape = (len(np.unique(xs)), len(np.unique(ys)))
    print(Array[xs, ys].reshape(shape))
[[2 2 2]
[2 2 2]
[2 2 2]
[2 2 2]]
[[1 1 1]
[1 1 1]
[1 1 1]]
[[1 1 1]
[1 1 1]]
startRowIndex = 0 #indexes of sub-arrays
endRowIndex = 0
startColumnIndex = 0
endColumnIndex = 0
tmpI = 0 #for iterating inside the i,j loops
tmpJ = 0
value = 0 #which number we are looking for in array
for i in range(array.shape[0]): #array.shape[0] is the number of rows, shape[1] the number of columns
    for j in range(array[i].size): #for all elements in a row
        if array[i,j] != 0: #if the element is different from 0
            startRowIndex = i
            startColumnIndex = j
            tmpI = i
            tmpJ = j #you cannot change the looping indexes so create tmp indexes
            value = array[i,j] #save which number the sub-array consists of (for example 2)
            while array[tmpI,tmpJ] != 0 and array[tmpI,tmpJ] == value: #iterate over column numbers
                tmpJ += 1
                if tmpJ == array.shape[1]: #if you reached the end of the row
                    break
            #the loop stops at the first column that no longer holds value;
            #slicing like a[start:stop] takes the values from <start; stop) (stop is excluded),
            #so that column is the end index
            endColumnIndex = tmpJ
            tmpI = i
            tmpJ = j
            while array[tmpI,tmpJ] != 0 and array[tmpI,tmpJ] == value: #iterate over row numbers
                tmpI += 1
                if tmpI == array.shape[0]: #if you reached the end of the column
                    break
            #same reasoning as above: the loop stops one row past the sub-array
            endRowIndex = tmpI
            print(array[startRowIndex:endRowIndex, startColumnIndex:endColumnIndex])
            #set the elements that were already used to zero
            array[startRowIndex:endRowIndex, startColumnIndex:endColumnIndex] = 0
This one is kind of brute-force, but it works the way you want.
It doesn't use any external library other than numpy.
Here's my pure Python (no NumPy) solution. I took advantage of the fact that the contiguous regions are always rectangular.
The algorithm scans from top-left to bottom-right; when it finds the corner of a region, it scans to find the top-right and bottom-left corners. The dictionary skip is populated so that later scans can skip horizontally past any rectangle which has already been found.
The time complexity is O(nm) for a grid with n rows and m columns, which is optimal for this problem.
def find_rectangles(grid):
    width, height = len(grid[0]), len(grid)
    skip = dict()
    for y in range(height):
        x = 0
        while x < width:
            if (x, y) in skip:
                # jump past a rectangle that was already found
                x = skip[x, y]
            elif not grid[y][x]:
                x += 1
            else:
                v = grid[y][x]
                x2 = x + 1
                while x2 < width and grid[y][x2] == v:
                    x2 += 1
                y2 = y + 1
                while y2 < height and grid[y2][x] == v:
                    skip[x, y2] = x2
                    y2 += 1
                yield [ row[x:x2] for row in grid[y:y2] ]
                x = x2
>>> for r in find_rectangles(grid1): # example from the question
... print(r)
[[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]]
[[1, 1, 1], [1, 1, 1], [1, 1, 1]]
[[1, 1, 1], [1, 1, 1]]
>>> for r in find_rectangles(grid2): # example from mcsoini's answer
... print(r)
[[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]]
[[3, 3, 3]]
[[1], [1]]
[[1, 1, 1], [1, 1, 1], [1, 1, 1]]
[[2], [2]]
[[1, 1, 1], [1, 1, 1]]
We can do this using scipy.ndimage.label and scipy.ndimage.find_objects:
import numpy as np
from scipy.ndimage import label, find_objects
Array = np.array(Array)
[Array[j][i] for j in find_objects(*label(Array)) for i in find_objects(Array[j])]
# [array([[1, 1, 1],
# [1, 1, 1]]), array([[2, 2, 2],
# [2, 2, 2],
# [2, 2, 2],
# [2, 2, 2]]), array([[1, 1, 1],
# [1, 1, 1],
# [1, 1, 1]])]
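
Note that scipy.ndimage.label treats every nonzero element as foreground, so two regions with different values that touch each other receive the same label. The sketch below labels each value separately to avoid that. The helper name extract_blocks and the example array are made up for this illustration (the array from the question is not reproduced here), and the regions are assumed to be rectangular.

```python
import numpy as np
from scipy.ndimage import label, find_objects

def extract_blocks(array):
    """Return the bounding-box sub-array of every connected region, value by value."""
    array = np.asarray(array)
    blocks = []
    for value in np.unique(array[array != 0]):
        # label only the cells holding this value (default 4-connectivity)
        labeled, _ = label(array == value)
        for region in find_objects(labeled):
            blocks.append(array[region])
    return blocks

# hypothetical example array
example = np.array([
    [1, 1, 0, 2, 2],
    [1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0],
    [3, 3, 3, 0, 1],
])
for block in extract_blocks(example):
    print(block, end='\n' * 2)
```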

Sorting a list using argsort in Numpy?

I have a list in python and the first numbers are [[29.046875, 1], [33.65625, 1], [18.359375, 1], [11.296875, 1], [36.671875, 1], [23.578125, 1],.........,[34.5625, 1]]
The list above is stored in a variable called listNumber. I'm trying to use numpy.argsort to sort it based on the float elements:
listNumber = np.array(listNumber)
print(np.argsort(listNumber))
But this gives me the following, and I'm not sure why:
[[1 0]
[1 0]
[1 0]
[1 0]
[1 0]
[1 0]]
Why does it return this? And is there another way to approach it?

OK, so I think there are two things going on here:
1- Your list is a list of lists.
2- The argsort function, according to the documentation,
returns the indices that would sort an array.
So what happens is that the function goes through each item of the outer list, which is itself a list (by default argsort works along the last axis). Say the item at index 0 is:
[29.046875, 1]
The function sorts this inner list on its own and returns the indices that would put its two elements in ascending order:
[29.046875, 1] -> [1, 0]
because 1 would come before 29 if the pair were sorted in ascending order.
It does this for every nested list and then gives you a final array containing all these 1's and 0's.
This answers the first question. Another user was able to answer the second :)
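
A small sketch of the behaviour described above, using the first four pairs from the question. The variable names per_row, row_order and sorted_rows are made up for the illustration.

```python
import numpy as np

arr = np.array([[29.046875, 1], [33.65625, 1], [18.359375, 1], [11.296875, 1]])

# default axis=-1: every row is argsorted on its own
per_row = np.argsort(arr)
print(per_row)
# [[1 0]
#  [1 0]
#  [1 0]
#  [1 0]]

# argsort of the first column gives the order of the rows instead
row_order = np.argsort(arr[:, 0])
print(row_order)  # [3 2 0 1]
sorted_rows = arr[row_order]
print(sorted_rows[:, 0])
```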
You must set the axis, like this:
import numpy as np
l = [[29.046875, 1], [33.65625, 1], [18.359375, 1], [11.296875, 1], [36.671875, 1], [23.578125, 1],[34.5625, 1]]
l = np.argsort(l, axis=0) # sorts along first axis (down)
print(l)
[[3 0]
[2 1]
[5 2]
[0 3]
[1 4]
[6 5]
[4 6]]
Try this:
sortedList = listNumber[listNumber[:,0].argsort(axis=0)]
I don't know why people prefer ready-made functions over writing their own algorithm. Anyway, you are using argsort the wrong way: argsort returns an array containing the INDEXES of your elements. Here are two examples:
Code 1:
import numpy as geek
# input array
in_arr = geek.array([ 2, 0, 1, 5, 4, 1, 9])
print ("Input unsorted array : ", in_arr)
out_arr = geek.argsort(in_arr)
print ("Output sorted array indices : ", out_arr)
print("Output sorted array : ", in_arr[out_arr])
Output :
Input unsorted array : [2 0 1 5 4 1 9]
Output sorted array indices : [1 2 5 0 4 3 6]
Output sorted array : [0 1 1 2 4 5 9]
Code 2:
# Python program explaining
# argsort() function on a 2D array
import numpy as geek
# input 2d array
in_arr = geek.array([[ 2, 0, 1], [ 5, 4, 3]])
print ("Input array : ", in_arr)
# output sorted array indices
out_arr1 = geek.argsort(in_arr, kind ='mergesort', axis = 0)
print ("Output sorted array indices along axis 0: ", out_arr1)
out_arr2 = geek.argsort(in_arr, kind ='heapsort', axis = 1)
print ("Output sorted array indices along axis 1: ", out_arr2)
Output :
Input array : [[2 0 1]
 [5 4 3]]
Output sorted array indices along axis 0: [[0 0 0]
 [1 1 1]]
Output sorted array indices along axis 1: [[1 2 0]
 [2 1 0]]
I am assuming that your data is stored in listnumber:
import numpy as np
listnumber = np.array(listnumber)  # make sure it is a 2D array, not a list of lists
new_listnumber = listnumber[:, 0]
index_array = np.argsort(new_listnumber, axis=0)
New_val = listnumber[index_array]

How to count the occurrences of a smaller NumPy array inside a larger one

After searching around for a decent solution, I found that everything out there was difficult to use.
a = [[0 0 1 1]
[1 1 1 1]
[0 0 1 0]
[1 1 1 1]]
b = [[0 0 1]
[1 1 1]]
So how do I count the occurrences of b in a? For the example above, the result should be 2.
You're not too clear about the different cases this problem can have, such as overlapping matches. But if you only want to compare blocks that start at the beginning of a's rows and don't need to consider overlapping sub-arrays, here is a solution that splits a based on b's shape:
In [22]: x_a, y_a = a.shape
In [23]: x_b, y_b = b.shape
In [24]: sum((b == arr).all() for arr in np.vsplit(a[:,:y_b], x_a//x_b))
Out[24]: 2
np.vsplit(a[:,:y_b], x_a//x_b) takes the first y_b items from each row of a and splits them into x_a//x_b equally sized chunks:
In [25]: np.vsplit(a[:,:y_b], x_a//x_b)
Out[25]:
[array([[0, 0, 1],
        [1, 1, 1]]), array([[0, 0, 1],
        [1, 1, 1]])]
The generator expression inside sum then counts the chunks that are equal to b:
In [26]: sum((b == arr).all() for arr in np.vsplit(a[:,:y_b], x_a//x_b))
Out[26]: 2
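
If overlapping matches or matches that do not start at the first column should be counted as well, a plain sliding-window check is one option. This is only a sketch: the function name count_subarray is made up, and every window position is compared with np.array_equal, which is simple rather than fast.

```python
import numpy as np

def count_subarray(a, b):
    """Count every position at which the 2D array b occurs inside the 2D array a."""
    a = np.asarray(a)
    b = np.asarray(b)
    rows_b, cols_b = b.shape
    count = 0
    # slide a window with b's shape over every possible top-left corner
    for r in range(a.shape[0] - rows_b + 1):
        for c in range(a.shape[1] - cols_b + 1):
            if np.array_equal(a[r:r + rows_b, c:c + cols_b], b):
                count += 1
    return count

a = np.array([[0, 0, 1, 1],
              [1, 1, 1, 1],
              [0, 0, 1, 0],
              [1, 1, 1, 1]])
b = np.array([[0, 0, 1],
              [1, 1, 1]])
print(count_subarray(a, b))  # 2
```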

Vectorizing an indexing operation in numpy

What is the best way to vectorize the following code in numpy?
from numpy import *
A = zeros(5, dtype='int')
I = [1, 1, 1, 3]
J = [2, 1, 1, 1]
for i, j in zip(I, J):
    A[i] += j
print(A)
The result should be:
[0 4 0 1 0]
Here A is the original array, I stores the index at which we want to increment by the corresponding entry of J.
If one simply vectorizes the above by doing:
A[I] += J
print(A)
one gets the wrong answer
[0 1 0 1 0]
as, apparently, repeated indices are ignored. Is there an equivalent operation to += which does not ignore repeated indices?
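You can use numpy.bincount():
import numpy
A = numpy.zeros(5, dtype='int')
I = [1, 1, 1, 3]
J = [2, 1, 1, 1]
sums = numpy.bincount(I, J)
# bincount with weights returns floats, so cast before adding to the int array
A[:len(sums)] += sums.astype(A.dtype)
print(A)
[0 4 0 1 0]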
In principle you can do it with numpy's bincount and unique, but I'd guess it'll only make the code much less readable without any noticeable performance improvement.
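
Another option that none of the answers above mention is np.add.at, the unbuffered in-place form of np.add. It applies the increment once per index, including repeated ones. A minimal sketch with the data from the question:

```python
import numpy as np

A = np.zeros(5, dtype=int)
I = [1, 1, 1, 3]
J = [2, 1, 1, 1]

# unbuffered in-place add: repeated indices are accumulated, not overwritten
np.add.at(A, I, J)
print(A)  # [0 4 0 1 0]
```

Whether this is faster than the explicit loop or np.bincount depends on the NumPy version and the array sizes, so it is worth timing on real data.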

