How to toggle a Theano matrix based on a vector of int positions - python

Using Theano tensor operations, how can I toggle one cell on each row of a matrix based on an integer position indicator at the corresponding row index of a vector (i.e. |V| = number of rows of the matrix)? For example, given a 100x5 matrix of zeros
M = [
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0],
  ...
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0]
]  # |M| = 100x5
and a 100-element vector of integers in the range [0, 4]:
V = [2, 4, ..., 0, 2] # |V| = 100, max(V) = 4, min(V) = 0
update (or create another) matrix M so that it becomes
M = [
  [0, 0, 1, 0, 0],
  [0, 0, 0, 0, 1],
  ...
  [1, 0, 0, 0, 0],
  [0, 0, 1, 0, 0]
]  # |M| = 100x5
(I know how to do this iteratively using conventional code, but I want to run it as part of an algorithm on the GPU without complicating my input, which is currently the vector V, so a direct Theano implementation would be great.)

I figured out the answer myself. This operation is known as one-hot encoding, and it is supported as to_one_hot in Theano's extra_ops module. Code:
M_one_hot = theano.tensor.extra_ops.to_one_hot(V, 5, dtype='int32')
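For completeness, here is a minimal self-contained sketch of compiling and running this (the variable names are illustrative):

import numpy as np
import theano
import theano.tensor as T
from theano.tensor.extra_ops import to_one_hot

V = T.ivector("V")  # one class index per row
f = theano.function([V], to_one_hot(V, 5, dtype="int32"))

print(f(np.array([2, 4, 0, 2], dtype="int32")))
# [[0 0 1 0 0]
#  [0 0 0 0 1]
#  [1 0 0 0 0]
#  [0 0 1 0 0]]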

Related

Graph Edges initialization between nodes using numpy?

Let's say I have to initialise the bi-directional edges for the following graph between the nodes:
I can easily do this using the following code:
import numpy as np
node_num = 3
graph = np.ones([node_num, node_num]) - np.eye(node_num)
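For reference, this produces the fully connected 3-node adjacency matrix (no self-loops):

print(graph)
# [[0. 1. 1.]
#  [1. 0. 1.]
#  [1. 1. 0.]]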
Now I am extending this graph in the following way:
What is a simple and efficient way to write code for this graph?
Assuming you're looking for an adjacency matrix, you could use:
out = np.block([
    [1 - np.eye(3), np.eye(3)],
    [np.eye(3), np.zeros((3, 3))]
]).astype(int)
out:
array([[0, 1, 1, 1, 0, 0],   # A
       [1, 0, 1, 0, 1, 0],   # B
       [1, 1, 0, 0, 0, 1],   # C
       [1, 0, 0, 0, 0, 0],   # BC
       [0, 1, 0, 0, 0, 0],   # AB
       [0, 0, 1, 0, 0, 0]])  # AB(red)
but I would suggest you just initialize it directly as the output adjacency matrix. I would only use a short one-liner for very simple graphs like your first image, not the second.

How to keep a fixed size of unique values in random positions in an array while replacing others with a mask?

This may be a very simple question, as I am still exploring Python. For this issue I use numpy.
Updated 09/30/21: adapted and modified the code shown below for any potential future reference. I also added an elif branch in the loop for classes that have fewer counts than the wanted size. Some of the code may be unnecessary, though.
new_array = test_array.copy()
uniques, counts = np.unique(new_array, return_counts=True)
print("classes:", uniques, "counts:", counts)
for unique, count in zip(uniques, counts):
    if unique != 0 and count > 3:
        # zero out all but 3 randomly chosen cells of this class
        ids = np.random.choice(count, count - 3, replace=False)
        new_array[tuple(i[ids] for i in np.where(new_array == unique))] = 0
    elif unique != 0 and count <= 3:
        # classes with 3 or fewer cells are kept in full
        ids = np.random.choice(count, count, replace=False)
        new_array[tuple(i[ids] for i in np.where(new_array == unique))] = unique
Below is the original question.
Let's say I have a 2D array like this:
test_array = np.array([[0, 0, 0, 0, 0],
                       [1, 1, 1, 1, 1],
                       [0, 0, 0, 0, 0],
                       [2, 2, 2, 4, 4],
                       [4, 4, 4, 2, 2],
                       [0, 0, 0, 0, 0]])
print("existing classes:", np.unique(test_array))
# "existing classes: [0 1 2 4]"
Now I want to keep a fixed number of values (e.g. 2) in each class that != 0 (in this case two 1s, two 2s, and two 4s) and replace the rest with 0, where the values being replaced are chosen at random on each run (or from a seed).
For example, with run 1 I will have
([[0, 0, 0, 0, 0],
  [1, 0, 0, 1, 0],
  [0, 0, 0, 0, 0],
  [2, 0, 0, 0, 4],
  [4, 0, 0, 2, 0],
  [0, 0, 0, 0, 0]])
with another run it might be
([[0, 0, 0, 0, 0],
  [1, 1, 0, 0, 0],
  [0, 0, 0, 0, 0],
  [2, 0, 2, 0, 4],
  [4, 0, 0, 0, 0],
  [0, 0, 0, 0, 0]])
etc. Could anyone help me with this?
My strategy is:
1. Create a new array initialized to all zeros.
2. Find the elements in each class.
3. For each class:
   - Randomly sample two of the elements to keep.
   - Set those elements of the new array to the class value.
The trick is keeping the shape of the indexes appropriate so you retain the shape of the original array.
import numpy as np

test_array = np.array([[0, 0, 0, 0, 0],
                       [1, 1, 1, 1, 1],
                       [0, 0, 0, 0, 0],
                       [2, 2, 2, 4, 4],
                       [4, 4, 4, 2, 2],
                       [0, 0, 0, 0, 0]])

def sample_classes(arr, n_keep=2, random_state=42):
    classes, counts = np.unique(arr, return_counts=True)
    rng = np.random.default_rng(random_state)
    out = np.zeros_like(arr)
    for klass, count in zip(classes, counts):
        # Find locations of the class elements
        indexes = np.nonzero(arr == klass)
        # Sample up to n_keep elements of the class
        keep_idx = rng.choice(count, min(count, n_keep), replace=False)
        # Select the kept elements and reformat for indexing the output array
        # while retaining its shape
        keep_idx_reshape = tuple(ind[keep_idx] for ind in indexes)
        out[keep_idx_reshape] = klass
    return out
You can use it like
In [3]: sample_classes(test_array)
Out[3]:
array([[0, 0, 0, 0, 0],
       [0, 1, 1, 0, 0],
       [0, 0, 0, 0, 0],
       [2, 0, 0, 4, 0],
       [4, 0, 0, 2, 0],
       [0, 0, 0, 0, 0]])
In [4]: sample_classes(test_array, n_keep=3)
Out[4]:
array([[0, 0, 0, 0, 0],
       [1, 0, 1, 1, 0],
       [0, 0, 0, 0, 0],
       [0, 2, 0, 4, 0],
       [4, 4, 0, 2, 2],
       [0, 0, 0, 0, 0]])
In [5]: sample_classes(test_array, random_state=88)
Out[5]:
array([[0, 0, 0, 0, 0],
       [0, 0, 1, 1, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [4, 0, 4, 2, 2],
       [0, 0, 0, 0, 0]])
In [6]: sample_classes(test_array, random_state=88, n_keep=4)
Out[6]:
array([[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1],
       [0, 0, 0, 0, 0],
       [2, 2, 0, 4, 4],
       [4, 4, 0, 2, 2],
       [0, 0, 0, 0, 0]])
Here is my not-so-elegant solution:
def unique(arr, num=2, seed=None):
    np.random.seed(seed)
    # collect the coordinates of every nonzero value, grouped by value
    vals = {}
    for i, row in enumerate(arr):
        for j, val in enumerate(row):
            if val in vals and val != 0:
                vals[val].append((i, j))
            elif val != 0:
                vals[val] = [(i, j)]
    new = np.zeros_like(arr)
    for val in vals:
        # shuffle, then discard coordinates until only `num` remain
        np.random.shuffle(vals[val])
        while len(vals[val]) > num:
            vals[val].pop()
        for row, col in vals[val]:
            new[row, col] = val
    return new
The following should be O(n log n) in array size:
def keep_k_per_class(data, k, rng):
    out = np.zeros_like(data)
    unq, cnts = np.unique(data, return_counts=True)
    assert (cnts >= k).all()
    # calculate class boundaries from class sizes
    CNTS = cnts.cumsum()
    # indirectly group classes together by partial sorting
    idx = data.ravel().argpartition(CNTS[:-1])
    # the following lines implement simultaneous drawing without replacement
    # from all classes:
    # lower boundaries of the intervals to draw random numbers from;
    # for each class they start at the lower class boundary
    # and from there grow one by one - together with the
    # swapping out below this implements "without replacement"
    lb = np.add.outer(np.arange(k), CNTS - cnts)
    pick = rng.integers(lb, CNTS, lb.shape)
    for l, p in zip(lb, pick):
        # populate the output array
        out.ravel()[idx[p]] = unq
        # swap out used indices so the still-available ones occupy a linear
        # range (per class)
        idx[p] = idx[l]
    return out
Examples:
>>> rng = np.random.default_rng()
>>> keep_k_per_class(test_array, 2, rng)
array([[0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [2, 0, 2, 0, 4],
       [0, 4, 0, 0, 0],
       [0, 0, 0, 0, 0]])
>>> keep_k_per_class(test_array, 2, rng)
array([[0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 2, 0, 0, 0],
       [4, 0, 4, 0, 2],
       [0, 0, 0, 0, 0]])
and a large one
>>> BIG = np.add.outer(np.tile(test_array,(100,100)),np.arange(0,500,5))
>>> BIG.size
30000000
>>> res = keep_k_per_class(BIG,30,rng)
### takes ~4 sec
### check
>>> np.unique(np.bincount(res.ravel()),return_counts=True)
(array([ 0, 30, 29988030]), array([100, 399, 1]))

Map Terrain Analysis: Alternative to numpy.roll function?

I'm trying to analyse map terrain given by the StarCraft 2 bot API.
A beginner's task for this analysis was finding cliffs for reapers, which are special units in SC2 that can jump up and down cliffs.
To solve this, I analyse points where the point itself is not pathable (= cliff) and the points directly north and south of it are pathable. Pathable points are marked as 1 and non-pathable points as 0 in the array.
The terrain map exists as a 2D numpy array. The following is a small excerpt from a larger 200x200 array:
import numpy as np
example = np.array([[0, 0, 0, 0],
                    [0, 1, 1, 0],
                    [0, 0, 0, 0],
                    [0, 1, 1, 0],
                    [0, 0, 0, 0]])
Here, the points [2, 1] and [2, 2] would match the criteria where the points themselves are not pathable (=0) and the points above and below them are pathable (=1).
This can be achieved by the following code:
above = np.roll(example, 1, axis=0) # Shift rows downwards
below = np.roll(example, -1, axis=0) # Shift rows upwards
result = np.zeros_like(example) # Create array with zeros
result[(example == 0) & (above == 1) & (below == 1)] = 1 # Set cells to 1 that match condition
print(repr(result))
# array([[0, 0, 0, 0],
#        [0, 0, 0, 0],
#        [0, 1, 1, 0],
#        [0, 0, 0, 0],
#        [0, 0, 0, 0]])
Now my question is whether the same can be achieved with less code.
The np.roll function creates a new np.array object each time, so analysing hundreds of nearby points could result in 100 lines of unnecessary code and high memory usage.
I'm trying to find something similar to
result = np.zeros_like(example)
result[(example == 0) & (example[-1, 0] == 1) & (example[1, 0] == 1)] = 1
# or
result[(example == 0) & (example[-1:2, 0].sum() == 2)] = 1
Here the numbers in the brackets display the relative position to the currently analysed point, but I don't know if there is a way to get this to work with numpy.
Also, the result for the zeroth row wouldn't be well defined when checking the point "above" it: it could wrap around to the last row, raise an error, or fall back to a default value (0 or 1).
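One slicing-based sketch that avoids the copies np.roll makes, assuming out-of-bounds neighbours should default to 0 (slices are views, so nothing is copied):

result = np.zeros_like(example)
# compare interior rows with the row above (example[:-2]) and below (example[2:])
mask = (example[1:-1] == 0) & (example[:-2] == 1) & (example[2:] == 1)
result[1:-1][mask] = 1
print(repr(result))
# array([[0, 0, 0, 0],
#        [0, 0, 0, 0],
#        [0, 1, 1, 0],
#        [0, 0, 0, 0],
#        [0, 0, 0, 0]])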
Edit:
I found this post and it pointed me towards the scipy convolve2d function which can be applied here, which might be what I am looking for:
import numpy as np
from scipy import signal
example = np.array([[0, 0, 0, 0],
                    [0, 1, 1, 0],
                    [0, 0, 0, 0],
                    [0, 1, 1, 0],
                    [0, 0, 0, 0]])
kernel = np.zeros((3, 3), dtype=int)
kernel[::2, 1] = 1
print(repr(kernel))
# array([[0, 1, 0],
# [0, 0, 0],
# [0, 1, 0]])
result2 = signal.convolve2d(example, kernel, mode="same")
print(repr(result2))
# array([[0, 1, 1, 0],
# [0, 0, 0, 0],
# [0, 2, 2, 0],
# [0, 0, 0, 0],
# [0, 1, 1, 0]])
result2[result2 < 2] = 0
result2[result2 == 2] = 1
print(repr(result2))
# array([[0, 0, 0, 0],
# [0, 0, 0, 0],
# [0, 1, 1, 0],
# [0, 0, 0, 0],
# [0, 0, 0, 0]])
Edit2:
Another solution may be scipy.ndimage.minimum_filter which seems to work similarly:
import numpy as np
from scipy import ndimage
example = np.array([[0, 0, 0, 0],
                    [0, 1, 1, 0],
                    [0, 0, 0, 0],
                    [0, 1, 1, 0],
                    [0, 0, 0, 0]])
kernel = np.zeros((3, 3), dtype=int)
kernel[::2, 1] = 1
print(repr(kernel))
# array([[0, 1, 0],
# [0, 0, 0],
# [0, 1, 0]])
result3 = ndimage.minimum_filter(example, footprint=kernel, mode="constant")
print(repr(result3))
# array([[0, 0, 0, 0],
# [0, 0, 0, 0],
# [0, 1, 1, 0],
# [0, 0, 0, 0],
# [0, 0, 0, 0]])

Calculating distances between unique Python array regions?

I have a raster with a set of unique ID patches/regions which I've converted into a two-dimensional Python numpy array. I would like to calculate pairwise Euclidean distances between all regions to obtain the minimum distance separating the nearest edges of each raster patch. As the array was originally a raster, a solution needs to account for diagonal distances across cells (I can always convert any distances measured in cells back to metres by multiplying by the raster resolution).
I've experimented with the cdist function from scipy.spatial.distance as suggested in this answer to a related question, but so far I've been unable to solve my problem using the available documentation. As an end result I would ideally have a 3 by X array in the form of "from ID, to ID, distance", including distances between all possible combinations of regions.
Here's a sample dataset resembling my input data:
import numpy as np
import matplotlib.pyplot as plt
# Sample study area array
example_array = np.array([[0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0],
                          [0, 0, 2, 0, 2, 2, 0, 6, 0, 3, 3, 3],
                          [0, 0, 0, 0, 2, 2, 0, 0, 0, 3, 3, 3],
                          [0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 3, 0],
                          [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3],
                          [1, 1, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3],
                          [1, 1, 1, 0, 0, 0, 3, 3, 3, 0, 0, 3],
                          [1, 1, 1, 0, 0, 0, 3, 3, 3, 0, 0, 0],
                          [1, 1, 1, 0, 0, 0, 3, 3, 3, 0, 0, 0],
                          [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                          [1, 0, 1, 0, 0, 0, 0, 5, 5, 0, 0, 0],
                          [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4]])
# Plot array
plt.imshow(example_array, cmap="spectral", interpolation='nearest')
Distances between labeled regions of an image can be calculated with the following code,
import itertools
from scipy.spatial.distance import cdist
# making sure that IDs are integer
example_array = np.asarray(example_array, dtype=int)
# we assume that IDs start from 1, so we have n-1 unique IDs between 1 and n
n = example_array.max()
indexes = []
for k in range(1, n):
tmp = np.nonzero(example_array == k)
tmp = np.asarray(tmp).T
indexes.append(tmp)
# calculating the distance matrix
distance_matrix = np.zeros((n-1, n-1), dtype=float)
for i, j in itertools.combinations(range(n-1), 2):
# use squared Euclidean distance (more efficient), and take the square root only of the single element we are interested in.
d2 = cdist(indexes[i], indexes[j], metric='sqeuclidean')
distance_matrix[i, j] = distance_matrix[j, i] = d2.min()**0.5
# mapping the distance matrix to labeled IDs (could be improved/extended)
labels_i, labels_j = np.meshgrid(range(1, n), range(1, n))
results = np.dstack((labels_i, labels_j, distance_matrix)).reshape((-1, 3))
print(distance_matrix)
print(results)
This assumes integer IDs, and would need to be extended if that is not the case. For instance, with the test data above, the calculated distance matrix is,
# From:  1            2            3            4            5     # To:
[[ 0.          4.12310563   4.          9.05538514   5.        ]   # 1
 [ 4.12310563  0.           3.16227766  10.81665383  8.24621125]   # 2
 [ 4.          3.16227766   0.          4.24264069   2.        ]   # 3
 [ 9.05538514  10.81665383  4.24264069  0.           3.16227766]   # 4
 [ 5.          8.24621125   2.          3.16227766   0.        ]]  # 5
Note that this takes the Euclidean distance from the center of each pixel. For instance, the distance between zones 3 and 5 is 2.0, while they are separated by only one pixel.
This is a brute-force approach, where we calculate all the pairwise distances between pixels of different regions. This should be sufficient for most applications. Still, if you need better performance, have a look at scipy.spatial.cKDTree which would be more efficient in computing the minimum distance between two regions, when compared to cdist.
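To illustrate that idea, a minimal cKDTree sketch for a single pair of regions (the variable names are illustrative, not from the code above):

from scipy.spatial import cKDTree

pts_1 = np.argwhere(example_array == 1)  # pixel coordinates of region 1
pts_2 = np.argwhere(example_array == 2)  # pixel coordinates of region 2
tree = cKDTree(pts_1)
# distance from every region-2 pixel to its nearest region-1 pixel
dists, _ = tree.query(pts_2, k=1)
print(dists.min())  # 4.123..., matching the distance matrix above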

Matrix similarity along one axis

Hi, I have 2D matrices and want to calculate a measure of similarity along the Y axis.
For example, the following matrix should yield 0:
[1, 0, 0, 0]
[0, 1, 0, 0]
[0, 0, 1, 0]
[0, 0, 0, 1]
While this one should yield 1:
[0, 1, 1, 0]
[0, 1, 1, 0]
[0, 1, 1, 0]
[0, 1, 1, 0]
In these examples I used binary values in the matrices, but in reality they are floats between 0 and 1. The matrices are much bigger and there is noise - the calculation has to be very fast as I have a large number of matrices to calculate for every experiment.
Right now I'm doing a Random PCA, keeping the first component as the measure of similarity. However, it is somewhat slow and I have the feeling that it is overkill. Any suggestions welcome!
The real problem here is how to define similarity.
I assume you define similarity as proportion of equal rows. That is, if you randomly take two different rows, what is the probability that those two rows are equal? This definition is the simplest I can think of that fits your example desired results.
If that's indeed what you want, it is easily computed as follows, where A denotes the data matrix:
d = squeeze(all(bsxfun(@eq, A, permute(A, [3 2 1])), 2));  %// test all pairs
                                                           %// of rows for equality
result = (sum(d(:))-size(d,1))/(numel(d)-size(d,1));       %// compute average, but
                                                           %// removing similarity of each row with itself
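The snippet above is MATLAB; a rough numpy translation under the same definition of similarity (my own sketch, not part of the original answer):

import numpy as np

A = np.array([[0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 1, 1, 0]])
# d[i, j] is True when rows i and j are identical
d = (A[:, None, :] == A[None, :, :]).all(axis=2)
n = d.shape[0]
# average over all ordered pairs, excluding each row's trivial match with itself
result = (d.sum() - n) / (d.size - n)
print(result)  # 1.0 here; the identity-matrix example gives 0.0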
Using all with axis=0 gives the column-wise logical result, which can then be reapplied to the matrix:
Example:
>>> mx
matrix([[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1]])
>>> mx1
matrix([[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0]])
To apply:
# use .A to convert to array to do the logical calculation
np.matrix(mx.A * mx.all(axis=0).A)
matrix([[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]])
The same for mx1:
np.matrix(mx1.A * mx1.all(axis=0).A)
matrix([[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0]])
