I am building out a function to convert coo row index (illustrated as 'rows' below) to csr row pointers (results stored in 'rows_v3' below) without using any package.
The logic is accurate and works on the example below. However, when I ran this on a row with a length of 100k, the code keeps running on forever without completing. I suspect this had to do with the fact that I have nested loops in the function below - is that right? And if so, how can I tweak this function to address this runtime issue?
rows = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6]
# number of elements in csr_ptrs
unique_rows = len(set(rows)) +1
rows_v3 = [0] * unique_rows
for i in range(0, unique_rows):
if i == 0:
rows_v3[i] = 0
else:
nzz = len([x for x in rows if x == (i-1)])
rows_v3[i] = rows_v3[i-1] + nzz
rows_v3
There's no need to nest loops for this.
rows = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6]
# You need to use this instead of the length of the unique elements
# Rows with no non-zero values are still rows
# Even this will miss all-zero rows at the bottom of the matrix
n_rows = max(rows) + 1
# Create a pointer list
indptr = [0] * n_rows
# Count the number of values in each row
for i in rows:
indptr[i] += 1
# Do a cumsum
for i in range(n_rows - 1):
indptr[i + 1] += indptr[i]
# Add a zero on the front of the pointer list
# If that's the style of indptr you're doing
intptr = [0] + indptr
First lets make a sparse matrix for reference:
In [36]: from scipy import sparse
In [37]: rows = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6]
In [38]: rows = np.array(rows)
In [39]: data = np.ones(rows.shape, dtype=int)
In [40]: cols = np.arange(len(data))
In [41]: M = sparse.coo_matrix((data, (rows, cols)))
In [42]: M
Out[42]:
<7x22 sparse matrix of type '<class 'numpy.int64'>'
with 22 stored elements in COOrdinate format>
In [43]: M
Out[43]:
<7x22 sparse matrix of type '<class 'numpy.int64'>'
with 22 stored elements in COOrdinate format>
In [44]: M.A
Out[44]:
array([[1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]])
I created cols so that there are no duplicates. Otherwise conversion to csr will sum duplicates, and make our task more complicated.
In [45]: Mr = M.tocsr()
In [46]: Mr.indptr
Out[46]: array([ 0, 3, 6, 10, 14, 17, 20, 22], dtype=int32)
To get the indptr directly (again no-duplicates):
In [48]: x = np.diff(rows)
In [49]: x
Out[49]: array([0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0])
Add start and ending 1's:
In [50]: x1 = np.concatenate(([1], x, [1]))
In [51]: x1
Out[51]:
array([1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0,
1])
In [52]: np.nonzero(x1)[0]
Out[52]: array([ 0, 3, 6, 10, 14, 17, 20, 22])
I have two labelled 2D numpy arrays a and b with identical shapes. I would like to re-label the array b by something similar to a GIS geometric union of the two arrays, such that cells with unique combination of values in array a and b are assigned new unique IDs:
I'm not concerned with the specific numbering of the regions in the output, so long as the values are all unique. I have attached sample arrays and desired outputs below: my real datasets are much larger, with both arrays having integer labels which range from "1" to "200000". So far I've experimented with concatenating the array IDs to form unique combinations of values, but ideally I would like to output a simple set of new IDs in the form of 1, 2, 3..., etc.
import numpy as np
import matplotlib.pyplot as plt
# Example labelled arrays a and b
input_a = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 0],
[0, 0, 3, 3, 3, 3, 2, 2, 2, 2, 0, 0],
[0, 0, 3, 3, 3, 3, 2, 2, 2, 2, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
input_b = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 3, 3, 3, 3, 3, 0, 0],
[0, 0, 1, 1, 1, 3, 3, 3, 3, 3, 0, 0],
[0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
[0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
[0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
[0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
# Plot inputs
plt.imshow(input_a, cmap="spectral", interpolation='nearest')
plt.imshow(input_b, cmap="spectral", interpolation='nearest')
# Desired output, union of a and b
output = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 2, 3, 3, 3, 3, 0, 0],
[0, 0, 1, 1, 1, 2, 3, 3, 3, 3, 0, 0],
[0, 0, 1, 1, 1, 4, 7, 7, 7, 7, 0, 0],
[0, 0, 5, 5, 5, 6, 7, 7, 7, 7, 0, 0],
[0, 0, 5, 5, 5, 6, 7, 7, 7, 7, 0, 0],
[0, 0, 5, 5, 5, 6, 7, 7, 7, 7, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
# Plot desired output
plt.imshow(output, cmap="spectral", interpolation='nearest')
If I understood the circumstances correctly, you are looking to have unique pairings from a and b. So, 1 from a and 1 from b would have one unique tag in the output; 1 from a and 3 from b would have another unique tag in the output. Also looking at the desired output in the question, it seems that there is an additional conditional situation here that if b is zero, the output is to be zero as well irrespective of the unique pairings.
The following implementation tries to solve all of that -
c = a*(b.max()+1) + b
c[b==0] = 0
_,idx = np.unique(c,return_inverse= True)
out = idx.reshape(b.shape)
Sample run -
In [21]: a
Out[21]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0],
[0, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 0],
[0, 0, 3, 3, 3, 3, 2, 2, 2, 2, 0, 0],
[0, 0, 3, 3, 3, 3, 2, 2, 2, 2, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
In [22]: b
Out[22]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 3, 3, 3, 3, 3, 0, 0],
[0, 0, 1, 1, 1, 3, 3, 3, 3, 3, 0, 0],
[0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
[0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
[0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
[0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
In [23]: out
Out[23]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 3, 5, 5, 5, 5, 0, 0],
[0, 0, 1, 1, 1, 3, 5, 5, 5, 5, 0, 0],
[0, 0, 1, 1, 1, 2, 4, 4, 4, 4, 0, 0],
[0, 0, 6, 6, 6, 7, 4, 4, 4, 4, 0, 0],
[0, 0, 6, 6, 6, 7, 4, 4, 4, 4, 0, 0],
[0, 0, 6, 6, 6, 7, 4, 4, 4, 4, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
Sample plot -
# Plot inputs
plt.figure()
plt.imshow(a, cmap="spectral", interpolation='nearest')
plt.figure()
plt.imshow(b, cmap="spectral", interpolation='nearest')
# Plot output
plt.figure()
plt.imshow(out, cmap="spectral", interpolation='nearest')
Here is a way to do it conceptually in terms of set union, but not to GIS geometric union, since that was mentioned after I answered.
Make a list of all possible unique 2-tuples of values with one from a and the other from b in that order. Map each tuple in that list to its index in it. Create the union array using that map.
For example say a and b are arrays each containing values in range(4) and assume for simplicity they have the same shape. Then:
v = range(4)
from itertools import permutations
p = list(permutations(v,2))
m = {}
for i,x in enumerate(p):
m[x] = i
union = np.empty_like(a)
for i,x in np.ndenumerate(a):
union[i] = m[(x,b[i])]
For demonstration, generating a and b with
np.random.randint(4, size=(3, 3))
produced:
a = array([[3, 0, 3],
[1, 3, 2],
[0, 0, 3]])
b = array([[1, 3, 1],
[0, 0, 1],
[2, 3, 0]])
m = {(0, 1): 0,
(0, 2): 1,
(0, 3): 2,
(1, 0): 3,
(1, 2): 4,
(1, 3): 5,
(2, 0): 6,
(2, 1): 7,
(2, 3): 8,
(3, 0): 9,
(3, 1): 10,
(3, 2): 11}
union = array([[10, 2, 10],
[ 3, 9, 7],
[ 1, 2, 9]])
In this case the property that a union should be bigger or equal to its composits is reflected in increased numerical values rather than increase in number of elements.
An issue with using itertools permutations is that the number of permutations could be much larger than needed. It would be much larger if the number of overlaps per area is much smaller than the number of areas.
The question uses Union but the picture shows an Intersection. Divakar's answer replicates the pictured Intersection, and is more elegant than my solution below, which produces the Union.
One could make a dictionary of only the actual overlaps, and then work from that. Flattening the input arrays first makes this easier for me to see, I'm not sure if that is feasible for you:
shp = numpy.shape(input_a)
a = input_a.flatten()
b = input_b.flatten()
s = set(((i,j) for i,j in zip(a,b))) # unique pairings
d = {p:i for i,p in enumerate(sorted(list(s))} # dict{pair:index}
output_c = numpy.array([d[i,j] for i,j in zip(a,b)]).reshape(shp)
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 1, 1, 1, 1, 1, 5, 5, 5, 5, 5, 0],
[ 0, 1, 1, 1, 1, 1, 5, 5, 5, 5, 5, 0],
[ 0, 1, 2, 2, 2, 4, 7, 7, 7, 7, 5, 0],
[ 0, 1, 2, 2, 2, 4, 7, 7, 7, 7, 5, 0],
[ 0, 1, 2, 2, 2, 3, 6, 6, 6, 6, 5, 0],
[ 0, 8, 9, 9, 9, 10, 6, 6, 6, 6, 5, 0],
[ 0, 0, 9, 9, 9, 10, 6, 6, 6, 6, 0, 0],
[ 0, 0, 9, 9, 9, 10, 6, 6, 6, 6, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
I'm doing image processing with scipy.ndimage. Given a ring-shaped object, I'd like to generate a "profile" around its circumference. The profile could be something like thickness measurements at various point around the ring, or the mean signal along the ring's "thickness."
It seems to me that I could use ndimage.mean if I could first get a good labels image.
If my ring looks like this,
A = array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
[0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
[0, 0, 1, 1, 0, 0, 1, 1, 0, 0],
[0, 0, 1, 1, 0, 0, 1, 1, 0, 0],
[0, 0, 1, 1, 0, 0, 1, 1, 0, 0],
[0, 0, 1, 1, 0, 0, 1, 1, 1, 0],
[0, 0, 1, 1, 0, 0, 1, 1, 0, 0],
[0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0]])
I could get the "profile of means" with numpy.mean( A, labels ) where labels is.
array([[0, 0, 0, 0, 2, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 2, 3, 4, 4, 0, 0],
[0, 0, 1, 1, 2, 3, 4, 5, 0, 0],
[0, 0, 16, 16, 0, 0, 5, 5, 0, 0],
[0, 0, 15, 15, 0, 0, 6, 6, 0, 0],
[0, 0, 14, 14, 0, 0, 7, 7, 0, 0],
[0, 0, 13, 13, 0, 0, 8, 8, 8, 0],
[0, 0, 13, 12, 0, 0, 9, 9, 0, 0],
[0, 0, 12, 12, 11, 10, 9, 9, 0, 0],
[0, 0, 0, 0, 11, 0, 0, 0, 0, 0]])
I bet there's some interpolation stuff that will go overlooked with this, but it's all I can come up with on my own.
Is there a way to generate my proposed labels image? Is there a better approach for generating my profiles?
A standard probability distribution on a circle (or sphere) is the Von-Mises Fisher distribution.
Scipy supports this distribution: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.vonmises.html
So you should be able to use the fit function to find maximum-likelihood parameters for your data.