1.
i try to make a numpy array with shape:(6962341, 268148), type: np.uint8
2.
i have the data consist of [x1,x2,x3,x4], [x2,x1], [x4,x5,x3]...
3.
i want to assign array[x1,x2] += 1, array[x1,x3] += 1, array[x1,x4] += 1, array[x2,x3] += 1, ...
4.
so i have tried a function of the following structure.
import numpy as np
from itertools import combinations
base_array = np.zeros((row_size, col_size), dtype=np.uint8))
for each_list in data:
for (x,y) in list(combinations(each_list,2)):
if x>y:
base_array[y,x] += 1
else:
base_array[x,y] += 1
it basically compute the upper triangle of a matrix and i will use the upper triangle value. also you can think this is similar to make the base matrix A for co-occurrence matrix. but this function is too slow and i think it is possible to make faster.
What should i do?
Assuming your data is integers (since they represent rows and columns) or you can hash your data x1, x2, ... into 1, 2, ... integers, here is a fast solution:
#list of pairwise combinations in your data
comb_list = []
for each_list in data:
comb_list += list(combinations(each_list,2))
#convert combination int to index (numpy is 0 based indexing)
comb_list = np.array(comb_list) - 1
#make array with flat indices
flat = np.ravel_multi_index((comb_list[:,0],comb_list[:,1]),(row_size,col_size))
#count number of duplicates for each index using np.bincount
base_array = np.bincount(flat,None,row_size*col_size).reshape((row_size,col_size)).astype(np.uint8)
sample data:
[[1, 2, 3, 4], [2, 1], [4, 5, 3, 4]]
Corresponding output:
[[0 1 1 1 0]
[1 0 1 1 0]
[0 0 0 2 0]
[0 0 1 1 1]
[0 0 1 1 0]]
EDIT: corresponding to explanation in comments:
data=[[1, 2, 3, 4], [2, 1], [4, 5, 3, 4]]
base_array = np.zeros((len(data), np.max(np.amax(data))), dtype=np.uint8)
for i, each_list in enumerate(data):
for j in each_list:
base_array[i, j-1] = 1
Output:
[[1 1 1 1 0]
[1 1 0 0 0]
[0 0 1 1 1]]
I have a 2D NumPy array which looks like this:
Array=
[
[0,0,0,0,0,0,0,2,2,2],
[0,0,0,0,0,0,0,2,2,2].
[0,0,1,1,1,0,0,2,2,2],
[0,0,1,1,1,0,0,2,2,2],
[0,0,1,1,1,0,0,1,1,1],
[0,0,0,0,0,0,0,1,1,1]
]
I need to display the arrays of non-zero elements as:
Array1:
[
[1,1,1],
[1,1,1],
[1,1,1]
]
Array2:
[
[2,2,2],
[2,2,2],
[2,2,2],
[2,2,2]
]
Array3:
[
[1,1,1],
[1,1,1]
]
Could someone please help me out with what logic I could use to achieve the following output? I can't use fixed indexes (like array[a:b, c:d]) since the logic i create should be able to work for any NumPy array with a similar pattern.
This uses scipy.ndimage.label to recursively identify disconnected sub-arrays.
import numpy as np
from scipy.ndimage import label
array = np.array(
[[0,0,0,0,0,0,0,2,2,2,3,3,3],
[0,0,0,0,0,0,0,2,2,2,0,0,1],
[0,0,1,1,1,0,0,2,2,2,0,2,1],
[0,0,1,1,1,0,0,2,2,2,0,2,0],
[0,0,1,1,1,0,0,1,1,1,0,0,0],
[0,0,0,0,0,0,0,1,1,1,0,0,0]])
# initialize list to collect sub-arrays
arr_list = []
def append_subarrays(arr, val, val_0):
'''
arr : 2D array
val : the value used for filtering
val_0 : the original value, which we want to preserve
'''
# remove everything that's not the current val
arr[arr != val] = 0
if 0 in arr: # <-- not a single rectangle yet
# get relevant indices as well as their minima and maxima
x_ind, y_ind = np.where(arr != 0)
min_x, max_x, min_y, max_y = min(x_ind), max(x_ind) + 1, min(y_ind), max(y_ind) + 1
# cut subarray (everything corresponding to val)
arr = arr[min_x:max_x, min_y:max_y]
# use the label function to assign different values to disconnected regions
labeled_arr = label(arr)[0]
# recursively apply append_subarrays to each disconnected region
for sub_val in np.unique(labeled_arr[labeled_arr != 0]):
append_subarrays(labeled_arr.copy(), sub_val, val_0)
else: # <-- we only have a single rectangle left ==> append
arr_list.append(arr * val_0)
for i in np.unique(array[array > 0]):
append_subarrays(array.copy(), i, i)
for arr in arr_list:
print(arr, end='\n'*2)
Output (note: modified example array):
[[1]
[1]]
[[1 1 1]
[1 1 1]
[1 1 1]]
[[1 1 1]
[1 1 1]]
[[2 2 2]
[2 2 2]
[2 2 2]
[2 2 2]]
[[2]
[2]]
[[3 3 3]]
This sounds like a floodfill problem, so skimage.measure.label is a good approach:
Array=np.array([[0,0,0,0,0,0,0,2,2,2],
[0,0,0,0,0,0,0,2,2,2],
[0,0,1,1,1,0,0,2,2,2],
[0,0,1,1,1,0,0,2,2,2],
[0,0,1,1,1,0,0,1,1,1],
[0,0,0,0,0,0,0,1,1,1]
])
from skimage.measure import label
labels = label(Array, connectivity=1)
for label in range(1, labels.max()+1):
xs, ys = np.where(labels==label)
shape = (len(np.unique(xs)), len(np.unique(ys)))
print(Array[xs, ys].reshape(shape))
Output:
[[2 2 2]
[2 2 2]
[2 2 2]
[2 2 2]]
[[1 1 1]
[1 1 1]
[1 1 1]]
[[1 1 1]
[1 1 1]]
startRowIndex = 0 #indexes of sub-arrays
endRowIndex = 0
startColumnIndex = 0
endColumnIndex = 0
tmpI = 0 #for iterating inside the i,j loops
tmpJ = 0
value = 0 #which number we are looking for in array
for i in range(array.shape[0]): #array.shape[0] says how many rows, shape[1] says how many columns
for j in range(array[i].size): #for all elements in a row
if(array[i,j] != 0): #if the element is different than 0
startRowIndex = i
startColumnIndex = j
tmpI = i
tmpJ = j #you cannot change the looping indexes so create tmp indexes
value = array[i,j] #save what number will be sub-array (for example 2)
while(array[tmpI,tmpJ] != 0 and array[tmpI,tmpJ] == value ): #iterate over column numbers
tmpJ+=1
if tmpJ == array.shape[1]: #if you reached end of the array (that is end of the column)
break
#if you left the array then it means you are on index which is not zero,
#so the previous where zero, but displaying array like this a[start:stop]
#will take the values from <start; stop) (stop is excluded)
endColumnIndex = tmpJ
tmpI = i
tmpJ = j
while(array[tmpI,tmpJ] != 0 and array[tmpI,tmpJ] == value): #iterate over row numbers
tmpI += 1
if tmpI == array.shape[0]: #if you reached end of the array
break
#if you left the array then it means you are on index which is not zero,
#so the previous where zero
endRowIndex = tmpI
print(array[startRowIndex:endRowIndex, startColumnIndex:endColumnIndex])
#change array to zero with already used elements
array[startRowIndex:endRowIndex, startColumnIndex:endColumnIndex] = 0
This one is kinda brute-force
but works the way you want it.
This approach doesn't use any external library other than numpy
Here's my pure Python (no NumPy) solution. I took advantage of the fact that the contiguous regions are always rectangular.
The algorithm scans from top-left to bottom-right; when it finds the corner of a region, it scans to find the top-right and bottom-left corners. The dictionary skip is populated so that later scans can skip horizontally past any rectangle which has already been found.
The time complexity is O(nm) for a grid with n rows and m columns, which is optimal for this problem.
def find_rectangles(grid):
width, height = len(grid[0]), len(grid)
skip = dict()
for y in range(height):
x = 0
while x < width:
if (x, y) in skip:
x = skip[x, y]
elif not grid[y][x]:
x += 1
else:
v = grid[y][x]
x2 = x + 1
while x2 < width and grid[y][x2] == v:
x2 += 1
y2 = y + 1
while y2 < height and grid[y2][x] == v:
skip[x, y2] = x2
y2 += 1
yield [ row[x:x2] for row in grid[y:y2] ]
x = x2
Example:
>>> for r in find_rectangles(grid1): # example from the question
... print(r)
...
[[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]]
[[1, 1, 1], [1, 1, 1], [1, 1, 1]]
[[1, 1, 1], [1, 1, 1]]
>>> for r in find_rectangles(grid2): # example from mcsoini's answer
... print(r)
...
[[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]]
[[3, 3, 3]]
[[1], [1]]
[[1, 1, 1], [1, 1, 1], [1, 1, 1]]
[[2], [2]]
[[1, 1, 1], [1, 1, 1]]
We can do this using scipy.ndimage.label and scipy.ndimage.find_objects:
from scipy.ndimage import label,find_objects
Array = np.array(Array)
[Array[j][i] for j in find_objects(*label(Array)) for i in find_objects(Array[j])]
# [array([[1, 1, 1],
# [1, 1, 1]]), array([[2, 2, 2],
# [2, 2, 2],
# [2, 2, 2],
# [2, 2, 2]]), array([[1, 1, 1],
# [1, 1, 1],
# [1, 1, 1]])]
I have a list in python and the first numbers are [[29.046875, 1], [33.65625, 1], [18.359375, 1], [11.296875, 1], [36.671875, 1], [23.578125, 1],.........,[34.5625, 1]]
The above list is given an id of listNumber. I'm trying to use numpy.argsort to sort it based on the float elements:
listNumber = np.array(listNumber)
print(np.argsort(listNumber))
But this gives me the following but not sure why:
[[1 0]
[1 0]
[1 0]
...
[1 0]
[1 0]
[1 0]]
Why is this returning this? and is there another way to approach this?
Ok so i think there's two things going on here:
1- Your list is a list of lists
2- The 'argsort' function:
returns the indices that would sort an array.
According to the documentation.
So what is happening is the function reads through each item of the list, which in itself is a list, say index 0 is:
[29.046875, 1]
Then it is saying, okay this is another list so let me sort it and then return a number based on where it would go if it was the new index:
[29.046875, 1] -> [1, 0]
Because 1 would come before 29 if it was sorted in ascending order.
It does this for every nested list then gives you a final list containing all these 1's and 0's.
This answers the first question. Another user was able to answer the second :)
You must set axis like:
import numpy as np
l = [[29.046875, 1], [33.65625, 1], [18.359375, 1], [11.296875, 1], [36.671875, 1], [23.578125, 1],[34.5625, 1]]
l = np.argsort(l, axis=0) # sorts along first axis (down)
print(l)
output:
[[3 0]
[2 1]
[5 2]
[0 3]
[1 4]
[6 5]
[4 6]]
Try this;
sortedList = listNumber[listNumber[:,0].argsort(axis=0)]
print(sortedList)
I don't know why people like using predone functions instead of using their own algorithm. Anyway, you are using argsort in a bad way. argsort returns an array containing the INDEXES of your elements, thos are 2 examples :
Code 1:
import numpy as geek
# input array
in_arr = geek.array([ 2, 0, 1, 5, 4, 1, 9])
print ("Input unsorted array : ", in_arr)
out_arr = geek.argsort(in_arr)
print ("Output sorted array indices : ", out_arr)
print("Output sorted array : ", in_arr[out_arr])
Output :
Input unsorted array : [2 0 1 5 4 1 9]
Output sorted array indices : [1 2 5 0 4 3 6]
Output sorted array : [0 1 1 2 4 5 9]
Code 2:
# Python program explaining
# argpartition() function
import numpy as geek
# input 2d array
in_arr = geek.array([[ 2, 0, 1], [ 5, 4, 3]])
print ("Input array : ", in_arr)
# output sorted array indices
out_arr1 = geek.argsort(in_arr, kind ='mergesort', axis = 0)
print ("Output sorteded array indices along axis 0: ", out_arr1)
out_arr2 = geek.argsort(in_arr, kind ='heapsort', axis = 1)
print ("Output sorteded array indices along axis 1: ", out_arr2)
Output:
Input array : [[2 0 1]
[5 4 3]]
Output sorteded array indices along axis 0: [[0 0 0]
[1 1 1]]
Output sorteded array indices along axis 1: [[1 2 0]
[2 1 0]]
I am supposing that your data is stored in listnumber
import numpy as np
new_listnumber = listnumber[:, 0]
index_array = np.argsort(new_listnumber , axis=0)
New_val = listnumber[index_array]
print(New_val)