Possible bug in scipy.ndimage.measurements.label? - python

I was trying to write a percolation program in Python, and I saw a tutorial recommending scipy.ndimage.measurements.label to identify the clusters. The problem is that I started to notice some odd behavior in the function: some elements that should belong to the same cluster are receiving different labels. Here is a code snippet that reproduces my problem.
import numpy as np
import scipy
from scipy.ndimage import measurements
grid = np.array([[0, 1, 1, 0, 1, 1, 0, 1, 0, 1],
[0, 1, 0, 0, 0, 0, 0, 0, 1, 0],
[1, 1, 1, 1, 1, 1, 0, 1, 1, 1],
[1, 0, 1, 0, 1, 1, 0, 1, 1, 1],
[0, 0, 1, 0, 1, 0, 0, 0, 0, 1],
[0, 1, 1, 1, 0, 0, 0, 0, 0, 1],
[0, 1, 0, 1, 1, 1, 0, 0, 1, 1], #<- notice the last two elements
[1, 1, 0, 1, 1, 1, 1, 1, 1, 0],
[1, 0, 0, 0, 1, 1, 1, 1, 0, 1],
[1, 1, 1, 0, 0, 0, 1, 1, 0, 0]])
labels, nlabels = measurements.label(grid)
print "Scipy Version: ", scipy.__version__
print
print labels
The output I get is:
Scipy Version: 0.13.0
[[0 1 1 0 2 2 0 3 0 4]
[0 1 0 0 0 0 0 0 5 0]
[1 1 1 1 1 1 0 5 5 5]
[1 0 1 0 1 1 0 5 5 5]
[0 0 1 0 1 0 0 0 0 5]
[0 1 1 1 0 0 0 0 0 5]
[0 1 0 1 1 1 0 0 1 5] #<- The last two elements
[1 1 0 1 1 1 1 1 1 0] # are set with different labels
[1 0 0 0 1 1 1 1 0 6]
[1 1 1 0 0 0 1 1 0 0]]
Am I missing something about the way this function is supposed to work or is this a bug?
This is very important because labeling the clusters correctly is crucial to get the right results in percolation.
Thanks for the help.
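For what it is worth, with the default structuring element label uses 4-connectivity in 2-D, so two horizontally adjacent ones, like the last two elements of the marked row, should always end up with the same label. If I recall the release notes correctly, SciPy 0.13.1 fixed a regression in ndimage.label that was present in 0.13.0, so upgrading may be enough. A minimal sketch to check this on your installation, assuming a recent SciPy where the function is available as scipy.ndimage.label (the 8-connectivity part is an optional extra, not something the question asked for):

```python
import numpy as np
from scipy import ndimage

grid = np.array([[0, 1, 1, 0, 1, 1, 0, 1, 0, 1],
                 [0, 1, 0, 0, 0, 0, 0, 0, 1, 0],
                 [1, 1, 1, 1, 1, 1, 0, 1, 1, 1],
                 [1, 0, 1, 0, 1, 1, 0, 1, 1, 1],
                 [0, 0, 1, 0, 1, 0, 0, 0, 0, 1],
                 [0, 1, 1, 1, 0, 0, 0, 0, 0, 1],
                 [0, 1, 0, 1, 1, 1, 0, 0, 1, 1],
                 [1, 1, 0, 1, 1, 1, 1, 1, 1, 0],
                 [1, 0, 0, 0, 1, 1, 1, 1, 0, 1],
                 [1, 1, 1, 0, 0, 0, 1, 1, 0, 0]])

# Default structure: only up/down/left/right neighbours are connected.
labels, nlabels = ndimage.label(grid)

# The two cells from the question are horizontal neighbours,
# so a correct labeling must give them the same label.
print(labels[6, 8] == labels[6, 9])

# Optional: also treat diagonal neighbours as connected (8-connectivity).
eight = ndimage.generate_binary_structure(2, 2)
labels8, nlabels8 = ndimage.label(grid, structure=eight)
print(nlabels, nlabels8)
```

If the first line prints False, the installed version is mislabeling the grid.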

Related

Scale/resize a square matrix into a larger size whilst retaining the grid structure/pattern (Python)

arr = [[1 0 0] # 3x3
[0 1 0]
[0 0 1]]
largeArr = [[1 1 0 0 0 0] # 6x6
[1 1 0 0 0 0]
[0 0 1 1 0 0]
[0 0 1 1 0 0]
[0 0 0 0 1 1]
[0 0 0 0 1 1]]
Like above, I want to retain the same 'grid' format whilst increasing the dimensions of the 2D array. How would I go about doing this? I assume the original matrix can only be scaled up by an integer n.
You can use numba if performance is important (similar post), compiled in nopython mode and, if needed, in parallel mode (this code could be made faster with some optimizations):
import numba as nb
import numpy as np

@nb.njit # or: @nb.njit("int64[:, ::1](int64[:, ::1], int64)", parallel=True)
def numba_(arr, n):
    res = np.empty((arr.shape[0] * n, arr.shape[0] * n), dtype=np.int64) # arr is assumed square
    for i in range(arr.shape[0]): # or: for i in nb.prange(arr.shape[0]) in parallel mode
        for j in range(arr.shape[0]):
            res[n * i: n * (i + 1), n * j: n * (j + 1)] = arr[i, j]
    return res
So, as an example:
arr = [[0 0 0 1 1]
[0 1 1 1 1]
[1 1 0 0 1]
[0 0 1 0 1]
[0 1 1 0 1]]
res (n=3):
[[0 0 0 0 0 0 0 0 0 1 1 1 1 1 1]
[0 0 0 0 0 0 0 0 0 1 1 1 1 1 1]
[0 0 0 0 0 0 0 0 0 1 1 1 1 1 1]
[0 0 0 1 1 1 1 1 1 1 1 1 1 1 1]
[0 0 0 1 1 1 1 1 1 1 1 1 1 1 1]
[0 0 0 1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 0 0 0 0 0 0 1 1 1]
[1 1 1 1 1 1 0 0 0 0 0 0 1 1 1]
[1 1 1 1 1 1 0 0 0 0 0 0 1 1 1]
[0 0 0 0 0 0 1 1 1 0 0 0 1 1 1]
[0 0 0 0 0 0 1 1 1 0 0 0 1 1 1]
[0 0 0 0 0 0 1 1 1 0 0 0 1 1 1]
[0 0 0 1 1 1 1 1 1 0 0 0 1 1 1]
[0 0 0 1 1 1 1 1 1 0 0 0 1 1 1]
[0 0 0 1 1 1 1 1 1 0 0 0 1 1 1]]
Performance (perfplot)
In my benchmarks, numba is the fastest (for large n, parallel mode is better); after that, BrokenBenchmark's answer is faster than scipy.ndimage.zoom. In the benchmarks, f is arr.shape[0] and n is the repeat count.
You can use repeat() twice:
arr.repeat(2, 0).repeat(2, 1)
This outputs:
[[1. 1. 0. 0. 0. 0.]
[1. 1. 0. 0. 0. 0.]
[0. 0. 1. 1. 0. 0.]
[0. 0. 1. 1. 0. 0.]
[0. 0. 0. 0. 1. 1.]
[0. 0. 0. 0. 1. 1.]]
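The same idea works for any positive integer factor. A small sketch (the helper name upscale is made up here for illustration) that also cross-checks the result against np.kron, which builds the same block structure by multiplying every element with a block of ones:

```python
import numpy as np

def upscale(arr, n):
    """Repeat every element n times along both axes (n is a positive integer)."""
    return arr.repeat(n, axis=0).repeat(n, axis=1)

arr = np.array([[1, 0, 0],
                [0, 1, 0],
                [0, 0, 1]])

big = upscale(arr, 2)

# Cross-check: the Kronecker product with a 2x2 block of ones
# replaces each element by a 2x2 block holding that element.
same = np.kron(arr, np.ones((2, 2), dtype=arr.dtype))

print(big)
print(np.array_equal(big, same))
```

Both calls return a new array, so the original arr is left untouched.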
You could use scipy.ndimage.zoom:
In [3]: arr = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
In [4]: ndimage.zoom(arr, 2, order=0, grid_mode=True, mode="nearest")
Out[4]:
array([[1, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0],
[0, 0, 1, 1, 0, 0],
[0, 0, 1, 1, 0, 0],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 1, 1]])
This can be done using Pillow (fork of PIL) as follows:
from PIL import Image
import numpy as np
n = 3 # repetition factor
im = Image.fromarray(arr.astype(np.uint8)) # Pillow has no image mode for int64 data, so cast first
up_im = im.resize((im.width*n, im.height*n),resample=Image.NEAREST)
up_arr = np.array(up_im)
Example:
arr = np.array(
[[0, 0, 0, 1, 1],
[0, 1, 1, 1, 1],
[1, 1, 0, 0, 1],
[0, 0, 1, 0, 1],
[0, 1, 1, 0, 1]])
res (n=3):
np.array(
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1]])
numba is by far the most performant in terms of speed. As the matrix size increases, PIL takes much more time.

Numpy: Insert arbitrary number of zeros into matrix rows at various indices

Problem
I have a 2D array that contains a series of 0's and 1's which represent values that have been bit-packed. I need to insert an arbitrary number of 0's at arbitrary points in every row in order to pad the bit-packed values to a multiple of 8 bits.
I have 3 vectors:
1. A vector containing the indices that I want to insert zeros at.
2. A vector containing the number of zeros that I want to insert at each index from vector 1.
3. A vector that contains the size of each bit-string I am padding. (Probably don't need this to solve it, but it could be fun!)
Example
I have a vector that contains indices to insert before: [0 6 14]
and a vector that contains the number of zeroes that I want to insert: [2 0 4]
and a vector that has the size of each bitstring I am padding: [6, 8, 4]
The aim is to insert the zeroes into each row of array as such:
[[0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1]
[0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1]
[0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0]
[0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 1]
[0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0]
[0 0 0 1 0 1 0 0 0 0 0 1 1 0 0 1 0 1]
[0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 1 1 0]
[0 0 0 1 1 1 0 0 0 0 1 0 0 0 0 1 1 1]
[0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0]
[1 1 0 0 1 0 1 1 1 1 1 1 1 1 1 0 0 1]]
*Spaces added between columns to highlight insertion points.
Becomes:
| | | | | |
v v v v v v
[[0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1]
[0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1]
[0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0]
[0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1]
[0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0]
[0 0 0 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1]
[0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0]
[0 0 0 0 0 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1]
[0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0]
[0 0 1 1 0 0 1 0 1 1 1 1 1 1 1 1 0 0 0 0 1 0 0 1]]
*Arrows denote inserted 0's
I am trying to find the most performant way of doing this. All of the vectors/arrays are numpy arrays. I've looked into using numpy.insert, but that doesn't seem to have the ability to insert multiple values at a given index. I've also thought about using numpy.hstack and then flattening, but was unable to get the result I wanted.
Any help is greatly appreciated!
np.insert does support inserting multiple values at the same index; you just have to provide that index multiple times. So you can obtain your desired result as follows:
indices = np.array([0, 6, 14])
n_zeros = np.array([2, 0, 4])
result = np.insert(matrix,
np.repeat(indices, n_zeros),
0,
axis=1)
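To see what the np.repeat step does, here is a small self-contained sketch on a made-up 2x4 matrix (the values are only for illustration). np.repeat expands the pair (indices, counts) into one index per inserted column, and np.insert interprets every index relative to the original array:

```python
import numpy as np

matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8]])
indices = np.array([0, 2])   # insert before original columns 0 and 2
n_zeros = np.array([2, 1])   # two zeros before column 0, one before column 2

# One entry per zero column to insert: [0, 0, 2]
positions = np.repeat(indices, n_zeros)

result = np.insert(matrix, positions, 0, axis=1)
print(result)
# [[0 0 1 2 0 3 4]
#  [0 0 5 6 0 7 8]]
```

An index with a count of 0 simply disappears from positions, so it needs no special handling.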
Formatted the matrix for you (although it might be easier to work with a contrived example):
matrix = np.array([[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1],
[0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0],
[0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
[1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1]])
indices = np.array([0, 6, 14])
num_zeros = np.array([2, 0, 4])
pad = np.array([6, 8, 4])
You need to allocate a new array to do this operation. Creating zero-filled arrays in numpy is very cheap, so let's start by allocating a zero-filled array with our desired output shape:
out_shape = np.array(matrix.shape)
out_shape[1] += num_zeros.sum()
zeros = np.zeros(out_shape, dtype=matrix.dtype)
Now, write matrix to contiguous blocks of memory in zeros by using slices:
meta = np.stack([indices, num_zeros])
meta = meta[:, meta[1] != 0] # throw away 0 slices
slices = meta.T.ravel().cumsum()
slices = np.append(slices, zeros.shape[1]) # for convenience
prev = 0
for start, end in zip(slices[1::2], slices[2::2]):
    zeros[:, slice(start, end)] = matrix[:, slice(prev, prev + end - start)]
    prev += end - start # advance past the columns already copied
Output in zeros:
[[0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1]
[0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1]
[0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0]
[0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1]
[0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0]
[0 0 0 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1]
[0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0]
[0 0 0 0 0 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1]
[0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0]
[0 0 1 1 0 0 1 0 1 1 1 1 1 1 1 1 0 0 0 0 1 0 0 1]]
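The slice arithmetic above is worked out for this particular example. A more general sketch of the same preallocate-and-copy idea, assuming the insertion indices are sorted in ascending order (the function name insert_zero_columns is made up here), could look like this; it is cross-checked against np.insert rather than presented as the definitive way to do it:

```python
import numpy as np

def insert_zero_columns(matrix, indices, num_zeros):
    """Insert num_zeros[k] zero columns before column indices[k].

    Assumes indices is sorted in ascending order.
    """
    indices = np.asarray(indices)
    num_zeros = np.asarray(num_zeros)
    out = np.zeros((matrix.shape[0], matrix.shape[1] + num_zeros.sum()),
                   dtype=matrix.dtype)
    # Source block boundaries: 0, every insertion point, end of the row.
    bounds = np.concatenate(([0], indices, [matrix.shape[1]]))
    # Zeros already inserted to the left of each source block.
    shift = np.concatenate(([0], num_zeros.cumsum()))
    for start, end, off in zip(bounds[:-1], bounds[1:], shift):
        out[:, start + off:end + off] = matrix[:, start:end]
    return out

rng = np.random.default_rng(0)
demo = rng.integers(0, 2, size=(4, 18))
padded = insert_zero_columns(demo, [0, 6, 14], [2, 0, 4])
reference = np.insert(demo, np.repeat([0, 6, 14], [2, 0, 4]), 0, axis=1)
print(np.array_equal(padded, reference))
```

Each loop iteration copies one contiguous block of columns, so the number of iterations depends on the number of insertion points, not on the number of rows.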
My approach would be to create a zero array up front and copy the columns into the correct locations. The indexing is a little hairy with respect to clarity, so there is probably room for improvement there.
data = np.array(
[[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1],
[0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0],
[0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
[1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1]])
insert_before = [0, 6, 14]
zero_pads = [2, 0, 4] # zeros to insert before each bit-string, as in the question
res = np.zeros((len(data), 8*len(zero_pads)), dtype=int)
for i in range(len(zero_pads)):
    res[:, i*8+zero_pads[i]:(i+1)*8] = data[:, insert_before[i]:insert_before[i]+8-zero_pads[i]]
>>> res
array([[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1],
[0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0],
[0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1]])

Explode column into columns

I have a pandas train DataFrame which looks like this:
image
1 [[0, 0, 0], [1, 0, 1], [0, 1, 1]]
2 [[1, 1, 1], [0, 0, 1], [0, 0, 1]]
2 [[0, 0, 1], [0, 1, 1], [1, 1, 1]]
Is there any way to "explode" it, but into columns?
1 2 3 4 5 6 7 8 9
1 0, 0, 0, 1, 0, 1, 0, 1, 1
2 1, 1, 1, 0, 0, 1, 0, 0, 1
2 0, 0, 1, 0, 1, 1, 1, 1, 1
Use np.vstack on the Series of lists of lists, then reshape:
pd.DataFrame(np.vstack(df['image']).reshape(len(df), -1))
0 1 2 3 4 5 6 7 8
0 0 0 0 1 0 1 0 1 1
1 1 1 1 0 0 1 0 0 1
2 0 0 1 0 1 1 1 1 1
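For completeness, a self-contained sketch of the same reshape idea that rebuilds the frame from the question and keeps the original index and the 1..9 column names. It assumes every image has the same shape, and it goes through tolist() instead of np.vstack, which is just one of several equivalent ways to get a 3-D array:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"image": [
    [[0, 0, 0], [1, 0, 1], [0, 1, 1]],
    [[1, 1, 1], [0, 0, 1], [0, 0, 1]],
    [[0, 0, 1], [0, 1, 1], [1, 1, 1]],
]}, index=[1, 2, 2])

# Stack the nested lists into shape (rows, 3, 3), then flatten each image.
flat = np.array(df["image"].tolist()).reshape(len(df), -1)

# Keep the original index and number the columns from 1 as in the question.
wide = pd.DataFrame(flat, index=df.index, columns=range(1, flat.shape[1] + 1))
print(wide)
```

If the images had different sizes, the conversion to a regular array would not work and each row would need to be padded or handled separately.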

How can I check if a 1-D array is in a 2-D array?

I have the following matrix in numpy [[1 0 0 1 1 1], [1 0 0 0 1 0], [1 1 0 0 1 0], [0 1 0 1 1 1], [0 0 0 1 0 1]] and I want to check if the array [1 0 0 0 1 0] is in the matrix. I tried to use
if 1-array in 2-D array:
    print('True')
but I get DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
If I run
import numpy as np
arr_2d = np.array([[1, 0, 0, 1, 1, 1],
[1, 0, 0, 0, 1, 0],
[1, 1, 0, 0, 1, 0],
[0, 1, 0, 1, 1, 1],
[0, 0, 0, 1, 0, 1]])
arr_1d = np.array([1, 0, 0, 0, 1, 0])
print(arr_1d in arr_2d)
It returns True without warnings.
I would suggest posting the code you used to get to those arrays, so we can see if there's something wrong with them.
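One thing to watch out for: as far as I know, the in operator on a NumPy array evaluates (arr_2d == arr_1d).any(), so it is True as soon as a single element matches in the broadcast comparison, not only when a whole row matches. The warning in the question probably comes from arrays whose shapes cannot be compared element by element. A hedged sketch of an explicit row test (the helper name contains_row is made up for this example):

```python
import numpy as np

def contains_row(arr_2d, row):
    """True if some row of arr_2d equals row element by element."""
    return bool((arr_2d == row).all(axis=1).any())

arr_2d = np.array([[1, 0, 0, 1, 1, 1],
                   [1, 0, 0, 0, 1, 0],
                   [1, 1, 0, 0, 1, 0],
                   [0, 1, 0, 1, 1, 1],
                   [0, 0, 0, 1, 0, 1]])
present = np.array([1, 0, 0, 0, 1, 0])
absent = np.array([1, 1, 1, 1, 1, 1])

print(contains_row(arr_2d, present))  # True: it is the second row
print(contains_row(arr_2d, absent))   # False: no row is all ones
print(absent in arr_2d)               # True, because single elements match
```

The helper assumes row has as many elements as arr_2d has columns.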

Displaying python 2d list without commas, brackets, etc. and newline after every row

I'm trying to display a python 2D list without the commas, brackets, etc., and I'd like to display a new line after every 'row' of the list is over.
This is my attempt at doing so:
ogm = repr(ogm).replace(',', ' ')
ogm = repr(ogm).replace('[', ' ')
ogm = repr(ogm).replace("'", ' ')
ogm = repr(ogm).replace('"', ' ')
print repr(ogm).replace(']', ' ')
This is the input:
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 0, 0, 1, 1, 1, 0], [0, 0, 0, 1, 1, 0, 1, 1, 1, 1], [0, 0, 1, 1, 0, 0, 1, 1, 1, 1], [0, 1, 0, 0, 0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 0, 0, 1, 1, 0, 0], [1, 0, 1, 1, 1, 1, 0, 0, 0, 0]]
This is the output:
"' 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 1 1 0 1 1 1 1 0 0 1 1 0 0 1 1 1 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 1 0 0 1 0 1 1 1 1 0 0 0 0 '"
I'm encountering two problems:
There are stray " and ' which I can't get rid of
I have no idea how to do a newline
Simple way:
for row in list2D:
    print " ".join(map(str,row))
Maybe join is appropriate for you:
print "\n".join(" ".join(str(el) for el in row) for row in ogm)
0 0 0 0 0 0 0 0 0 0
1 1 0 1 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0
0 1 1 0 0 0 0 0 1 1
0 0 0 0 0 0 1 1 1 0
0 0 0 1 1 0 1 1 1 1
0 0 1 1 0 0 1 1 1 1
0 1 0 0 0 0 0 1 1 0
0 0 0 0 0 0 1 1 0 0
1 0 1 1 1 1 0 0 0 0
print "\n".join(" ".join(map(str, line)) for line in ogm)
If you want the rows and columns transposed:
print "\n".join(" ".join(map(str, line)) for line in zip(*ogm))
for row in list2D:
    print(*row)
To make the display even more readable you can use tabs or fill the cells with spaces to align the columns.
def printMatrix(matrix):
    for lst in matrix:
        for element in lst:
            print(element, end="\t")
        print("")
It will display
6       8       99
999     7       99
3       7       99
instead of
6 8 99
999 7 99
3 7 99
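Tabs only line up while every cell is shorter than a tab stop. A small sketch of the "fill the cells with spaces" variant in Python 3, padding every cell to the width of the widest one (the function name format_matrix is made up for this example, and it assumes the matrix is not empty):

```python
def format_matrix(matrix, sep=" "):
    """Return the rows as lines, with every cell right-aligned to the widest cell."""
    cells = [[str(el) for el in row] for row in matrix]
    width = max(len(c) for row in cells for c in row)
    return "\n".join(sep.join(c.rjust(width) for c in row) for row in cells)

text = format_matrix([[6, 8, 99], [999, 7, 99], [3, 7, 99]])
print(text)
#   6   8  99
# 999   7  99
#   3   7  99
```

Returning the string instead of printing it directly makes the function easy to reuse, for example when writing the grid to a file.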
ogm = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 0, 0, 1, 1, 1, 0], [0, 0, 0, 1, 1, 0, 1, 1, 1, 1], [0, 0, 1, 1, 0, 0, 1, 1, 1, 1], [0, 1, 0, 0, 0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 0, 0, 1, 1, 0, 0], [1, 0, 1, 1, 1, 1, 0, 0, 0, 0]]
s1 = str(ogm)
s2 = s1.replace('], [','\n')
s3 = s2.replace('[','')
s4 = s3.replace(']','')
s5= s4.replace(',','')
print s5
btw the " is actually two ' without any gap
I have been learning Python for a week. You guys have given some excellent solutions. Here is how I did it; this works too. :)
