I need to make an n x n matrix that, for n=4, looks like this:
[0,0,0,0]
[1,0,0,0]
[1,1,0,0]
[1,1,1,0]
because I need the positions of the 1s, i.e.
0, 1
0, 2
0, 3
1, 2
1, 3
2, 3
This is because I want to work out the distances between x points without wasting time computing any distance twice; these coordinates let me compute each one only once.
You essentially want to increment the number of 1s (starting from 0) in each row, while padding the rest of the row with 0s, thereby keeping a constant length. Try something like this:
>>> n = 4
>>> [[1]*i + [0]*(n - i) for i in range(n)]  # use xrange on Python 2
[[0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0]]
If you're using NumPy:
>>> import numpy as np
>>> np.tril(np.ones((n, n), dtype=int), -1)
array([[0, 0, 0, 0],
       [1, 0, 0, 0],
       [1, 1, 0, 0],
       [1, 1, 1, 0]])
List comprehensions to the rescue!
>>> matrix = [[1]*i + [0]*(4 - i) for i in range(4)]
Substitute 4 with whatever n you want. On Python 2 (versions lower than 3.x) you should use xrange instead of range.
For n = 5:
matrix = [[1 if x<y else 0 for x in range(n)] for y in range(n)]
The output:
[0, 0, 0, 0, 0]
[1, 0, 0, 0, 0]
[1, 1, 0, 0, 0]
[1, 1, 1, 0, 0]
[1, 1, 1, 1, 0]
You explained that your reason for wanting the lower triangular matrix is to get the positions of the 1s. If that is really the only reason for making the matrix, there are more efficient ways to generate those positions. In particular, itertools.combinations(range(n), 2) would work:
In [209]: import itertools
In [210]: list(itertools.combinations(range(4), 2))
Out[210]: [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
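To tie this back to the distance computation in the question, here is a minimal sketch (the sample points are hypothetical, and math.dist requires Python 3.8+) that computes each pairwise distance exactly once:
import itertools
import math

points = [(0, 0), (3, 4), (6, 8)]  # hypothetical sample points
# each unordered pair (i, j) with i < j is visited exactly once
distances = {(i, j): math.dist(points[i], points[j])
             for i, j in itertools.combinations(range(len(points)), 2)}
print(distances)  # {(0, 1): 5.0, (0, 2): 10.0, (1, 2): 5.0}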
Related
Given an integer n, create an n x n NumPy array such that all of the elements on both of its diagonals are 1 and all others are 0.
Input: 4
Output:
[[1, 0, 0, 1],
 [0, 1, 1, 0],
 [0, 1, 1, 0],
 [1, 0, 0, 1]]
How do I achieve this array?
You can use np.fill_diagonal to fill the elements on the principal diagonal, and apply it to np.fliplr(a) to fill the elements across the anti-diagonal (np.fliplr returns a view, so filling it also modifies a):
import numpy as np
a = np.zeros((4, 4), int)
np.fill_diagonal(a, 1)             # main diagonal
np.fill_diagonal(np.fliplr(a), 1)  # anti-diagonal, via the flipped view of a
Output:
array([[1, 0, 0, 1],
       [0, 1, 1, 0],
       [0, 1, 1, 0],
       [1, 0, 0, 1]])
Create an identity matrix and its flipped view, then take the maximum of the two:
np.maximum(np.eye(5, dtype=int), np.fliplr(np.eye(5, dtype=int)))
#array([[1, 0, 0, 0, 1],
#       [0, 1, 0, 1, 0],
#       [0, 0, 1, 0, 0],
#       [0, 1, 0, 1, 0],
#       [1, 0, 0, 0, 1]])
Edited: changed [::-1] to np.fliplr (for better performance).
I would do (assuming n=5):
import numpy as np
d = np.diagflat(np.ones(5,int))
a = d | np.rot90(d)
print(a)
Output:
[[1 0 0 0 1]
 [0 1 0 1 0]
 [0 0 1 0 0]
 [0 1 0 1 0]
 [1 0 0 0 1]]
This exploits the fact that | (bitwise OR) has the same effect as max here, because the arrays hold only 0s and 1s.
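If you want to convince yourself that the two are interchangeable for 0/1 arrays, a quick check:
print(np.array_equal(d | np.rot90(d), np.maximum(d, np.rot90(d))))  # True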
I have a matrix M with values 0 through N-1 in it. I'd like to unroll this matrix into a new array A where each submatrix A[i, :, :] indicates whether or not M == i.
The solution below uses a loop.
# Example Setup
import numpy as np
np.random.seed(0)
N = 5
M = np.random.randint(0, N, size=(5,5))
# Solution with Loop
A = np.zeros((N, M.shape[0], M.shape[1]), dtype=int)
for i in range(N):
    A[i, :, :] = M == i
This yields:
M
array([[4, 0, 3, 3, 3],
       [1, 3, 2, 4, 0],
       [0, 4, 2, 1, 0],
       [1, 1, 0, 1, 4],
       [3, 0, 3, 0, 2]])
M.shape
# (5, 5)
A
array([[[0, 1, 0, 0, 0],
        [0, 0, 0, 0, 1],
        [1, 0, 0, 0, 1],
        [0, 0, 1, 0, 0],
        [0, 1, 0, 1, 0]],
       ...
       [[1, 0, 0, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0]]])
A.shape
# (5, 5, 5)
Is there a faster way, or a way to do it in a single numpy operation?
Broadcasted comparison is your friend:
B = (M[None, :] == np.arange(N)[:, None, None]).view(np.int8)
np.array_equal(A, B)
# True
The idea is to expand the dimensions in such a way that the comparison can be broadcasted in the manner desired.
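To make the shape manipulation concrete, here is how the operands line up before broadcasting (assuming the same N = 5 and 5x5 M as in the question):
M[None, :].shape                   # (1, 5, 5): one leading axis added to M
np.arange(N)[:, None, None].shape  # (5, 1, 1): two trailing axes added to the labels
(M[None, :] == np.arange(N)[:, None, None]).shape  # broadcasts to (5, 5, 5)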
As pointed out by Alex Riley in the comments, you can use np.equal.outer to avoid having to do the indexing yourself:
B = np.equal.outer(np.arange(N), M).view(np.int8)
np.array_equal(A, B)
# True
You can make use of some broadcasting here:
P = np.arange(N)
Y = np.broadcast_to(P[:, None], M.shape)
T = np.equal(M, Y[:, None]).astype(int)
Alternative using indices:
X, Y = np.indices(M.shape)
Z = np.equal(M, X[:, None]).astype(int)
You can index into the identity matrix like so
A = np.identity(N, int)[:, M]
or so
A = np.identity(N, int)[M.T].T
Or use the new (v1.15.0) put_along_axis
A = np.zeros((N,5,5), int)
np.put_along_axis(A, M[None], 1, 0)
Note if N is much larger than 5 then creating an NxN identity matrix may be considered wasteful. We can mitigate this using stride tricks:
def read_only_identity(N, dtype=float):
    # a length-(2N-1) buffer with a single 1 in the middle; each row of the
    # "identity" is an overlapping, shifted view into this buffer
    z = np.zeros(2*N - 1, dtype)
    s, = z.strides
    z[N-1] = 1
    return np.lib.stride_tricks.as_strided(z[N-1:], (N, N), (-s, s))
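A quick usage sketch: the same indexing trick as above then works against the strided view, without ever allocating a full NxN identity (note that all rows share one small buffer, so treat the view as read-only):
A = read_only_identity(N, int)[:, M]
A.shape
# (5, 5, 5)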
Suppose I have a 4x4 matrix that looks like the following:
[[0, 0, 0, 0]
[0, 0, 1, 0]
[0, 0, 0, 0]
[0, 0, 0, 0]]
I want to write a function that takes all 4 surrounding fields of the one and turns them into a 1 as well.
The above matrix would become:
[[0, 0, 1, 0]
[0, 1, 1, 1]
[0, 0, 1, 0]
[0, 0, 0, 0]]
I know that this is possible using if-statements, but I really want to optimize my code.
The matrix only contains 0s and 1s. If the 1 is at the edge of the matrix, the 1s should not wrap around, i.e. if the leftmost field is a 1, the rightmost field still stays 0. Also, I am using Python 3.5.
Is there a more mathematical or concise way to do this?
This looks like binary dilation. There's a function available in SciPy that implements this efficiently:
>>> from scipy.ndimage import binary_dilation
>>> x
array([[0, 0, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])
>>> binary_dilation(x).astype(int)
array([[0, 0, 1, 0],
       [0, 1, 1, 1],
       [0, 0, 1, 0],
       [0, 0, 0, 0]])
1s at the edges are handled as you've specified they should be (i.e. no wrapping).
See the documentation for further options and arguments.
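For reference, the default structuring element in 2-D is the 3x3 cross, which matches the "4 surrounding fields" neighbourhood exactly; if you ever want diagonal neighbours included as well, you can pass a custom structure (a sketch):
>>> import numpy as np
>>> from scipy.ndimage import generate_binary_structure
>>> generate_binary_structure(2, 1).astype(int)  # the default: a cross
array([[0, 1, 0],
       [1, 1, 1],
       [0, 1, 0]])
>>> binary_dilation(x, structure=np.ones((3, 3))).astype(int)  # 8-neighbourhood
array([[0, 1, 1, 1],
       [0, 1, 1, 1],
       [0, 1, 1, 1],
       [0, 0, 0, 0]])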
FWIW, here's a way to do it just using Numpy. We pad the original data with rows & columns of zeros, and then bitwise-OR offset copies of the padded array together.
import numpy as np
def fill(data):
    rows, cols = data.shape
    padded = np.pad(data, 1, 'constant', constant_values=0)
    result = np.copy(data)
    # OR in the four orthogonally shifted views of the padded array
    for r, c in ((0, 1), (1, 0), (1, 2), (2, 1)):
        result |= padded[r:r+rows, c:c+cols]
    return result

data = np.asarray(
    [
        [0, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
    ], dtype='uint8')
print(data, '\n')
result = fill(data)
print(result)
Output:
[[0 0 0 0]
 [0 0 1 0]
 [0 0 0 0]
 [0 0 0 0]]

[[0 0 1 0]
 [0 1 1 1]
 [0 0 1 0]
 [0 0 0 0]]
Say I've labeled an image with scipy.ndimage.measurements.label like so:
[[0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 3, 0],
[2, 2, 0, 0, 0, 0],
[2, 2, 0, 0, 0, 0]]
What's a fast way to collect the coordinates belonging to each label? I.e. something like:
{ 1: [[0, 1], [1, 1], [2, 1]],
2: [[4, 0], [4, 1], [5, 0], [5, 1]],
3: [[3, 4]] }
I'm working with images that are ~15,000 x 5000 pixels in size, and roughly half of each image's pixels are labeled (i.e. non-zero).
Rather than iterating through the entire image with nditer, would it be faster to do something like np.where(img == label) for each label?
EDIT:
Which algorithm is fastest depends on how big the labeled image is as compared to how many labels it has. Warren Weckesser and Salvador Dali / BHAT IRSHAD's methods (which are based on np.nonzero and np.where) all seem to scale linearly with the number of labels, whereas iterating through each image element with nditer obviously scales linearly with the size of labeled image.
The results of a small test:
size: 1000 x 1000, num_labels: 10
weckesser ... 0.214357852936s
dali ... 0.650229930878s
nditer ... 6.53645992279s
size: 1000 x 1000, num_labels: 100
weckesser ... 0.936990022659s
dali ... 1.33582305908s
nditer ... 6.81486487389s
size: 1000 x 1000, num_labels: 1000
weckesser ... 8.43906402588s
dali ... 9.81333303452s
nditer ... 7.47897100449s
size: 1000 x 1000, num_labels: 10000
weckesser ... 100.405524015s
dali ... 118.17239809s
nditer ... 9.14583897591s
So the question becomes more specific:
For labeled images in which the number of labels is on the order of sqrt(size(image)) is there an algorithm to gather label coordinates that is faster than iterating through every image element (i.e. with nditer)?
Here's a possibility:
import numpy as np
a = np.array([[0, 1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 3, 0],
              [2, 2, 0, 0, 0, 0],
              [2, 2, 0, 0, 0, 0]])
# If the array was computed using scipy.ndimage.measurements.label, you
# already know how many labels there are.
num_labels = 3
nz = np.nonzero(a)
coords = np.column_stack(nz)
nzvals = a[nz[0], nz[1]]
res = {k:coords[nzvals == k] for k in range(1, num_labels + 1)}
I called this script get_label_indices.py. Here's a sample run:
In [97]: import pprint
In [98]: run get_label_indices.py
In [99]: pprint.pprint(res)
{1: array([[0, 1],
           [1, 1],
           [2, 1]]),
 2: array([[4, 0],
           [4, 1],
           [5, 0],
           [5, 1]]),
 3: array([[3, 4]])}
You can do something like this (let img be your original ndarray):
res = {}
for i in np.unique(img)[1:]:
    x, y = np.where(img == i)
    res[i] = list(zip(x, y))
which will give you what you want:
{
    1: [(0, 1), (1, 1), (2, 1)],
    2: [(4, 0), (4, 1), (5, 0), (5, 1)],
    3: [(3, 4)]
}
Whether it will be faster - is up to the benchmark to determine.
Per Warren's suggestion, I do not need to use unique and can just do:
res = {}
for i in range(1, num_labels + 1):
    x, y = np.where(img == i)
    res[i] = list(zip(x, y))
Try this:
>>> z
array([[0, 1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 3, 0],
       [2, 2, 0, 0, 0, 0],
       [2, 2, 0, 0, 0, 0]])
>>> {i: list(zip(*np.where(z == i))) for i in np.unique(z) if i}
{1: [(0, 1), (1, 1), (2, 1)], 2: [(4, 0), (4, 1), (5, 0), (5, 1)], 3: [(3, 4)]}
This is basically an argsort operation, with some additional work to get the desired format:
def sorting_based(img, nlabels):
    img_flat = img.ravel()
    label_counts = np.bincount(img_flat)
    # sort pixel indices by label, then drop the zeros (the background)
    lin_idx = np.argsort(img_flat)[label_counts[0]:]
    coor = np.column_stack(np.unravel_index(lin_idx, img.shape))
    # boundaries between consecutive labels in the sorted index array
    ptr = np.cumsum(label_counts[1:-1])
    out = dict(enumerate(np.split(coor, ptr), start=1))
    return out
As you found out, doing np.where(img == label) for each label results in O(m*n) total runtime, with m = n_pixels and n = n_labels. The sorting-based approach reduces the complexity to O(m*log(m) + n).
It is possible to do this operation in linear time, but I don't think it can be vectorized with NumPy. You could abuse scipy.sparse.csr_matrix, similar to this answer, but at that point you're probably better off writing code that actually makes sense, in Numba, Cython, etc.
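For completeness, here is a sketch of that csr_matrix trick (the helper name is mine, and the stored values are shifted by +1 so that linear index 0 cannot be confused with an implicit sparse zero):
import numpy as np
from scipy.sparse import csr_matrix

def label_coords_sparse(img, num_labels):
    # converting COO to CSR groups entries by row (= label) with a counting
    # sort, which is linear in the number of pixels
    flat = img.ravel()
    lin_idx = np.arange(flat.size)
    m = csr_matrix((lin_idx + 1, (flat, lin_idx)),
                   shape=(num_labels + 1, flat.size))
    # row k of m now holds (linear index + 1) of every pixel labeled k
    return {k: np.column_stack(np.unravel_index(m[k].data - 1, img.shape))
            for k in range(1, num_labels + 1)}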
I'm pretty sure this is an easy problem, but I am completely blanking on how to solve it. I am trying to work my way through the PGM class on Coursera, which starts off with joint probability distributions. So I am trying to generate a list of all possible value assignments given n variables, where each variable can take on some discrete value between 0...z.
So, for instance, say we have 3 variables and each can take on values of just 0 and 1. I want to generate this:
[[0, 0, 1]
[0, 1, 0]
[1, 0, 0]
[1, 1, 0]
[0, 1, 1]
[1, 1, 1]
[1, 0, 1]
[0, 0, 0]]
I am working in Python and I am drawing a blank on how to dynamically generate this.
If you prefer list comprehension:
[[a, b, c] for a in range(2) for b in range(2) for c in range(2)]
And I forgot to mention that you can use pprint to get the effect you want:
>>> import pprint
>>> pprint.pprint([[a, b, c] for a in range(2) for b in range(2) for c in range(2)])
[[0, 0, 0],
 [0, 0, 1],
 [0, 1, 0],
 [0, 1, 1],
 [1, 0, 0],
 [1, 0, 1],
 [1, 1, 0],
 [1, 1, 1]]
It sounds like you want the Cartesian product:
from itertools import product
for x in product([0, 1], [0, 1], [0, 1]):
    print(x)  # on Python 2: print x
(0, 0, 0)
(0, 0, 1)
(0, 1, 0)
(0, 1, 1)
(1, 0, 0)
(1, 0, 1)
(1, 1, 0)
(1, 1, 1)
Slight improvement over Nathan's method:
>>> import itertools
>>> list(itertools.product([0, 1], repeat=3))
[(0, 0, 0),
 (0, 0, 1),
 (0, 1, 0),
 (0, 1, 1),
 (1, 0, 0),
 (1, 0, 1),
 (1, 1, 0),
 (1, 1, 1)]
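To match the original question's n variables over values 0...z, the same call generalizes (z and n here stand in for the question's parameters):
>>> z, n = 2, 2
>>> list(itertools.product(range(z + 1), repeat=n))
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]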