I have a matrix of zeros (`space`) with variable dimensions and an array of ones with a variable length, for instance:
import numpy
space = numpy.zeros((1000,5))
a = numpy.ones((150))
I would like to insert the ones from the array into the matrix so that they are homogeneously distributed across its rows.
You can use numpy.linspace to obtain the indices.
It's not obvious whether you'd like to assign five ones to every indexed row or just a single one to the row's first element. This is how both would work:
space = numpy.zeros((1000,5))
a = numpy.ones((150, 5))
b = numpy.ones((150,))
index = numpy.rint(numpy.linspace(start=0, stop=999, num=150)).astype(int)
# This would assign five ones to every location
space[index] = a
# This would assign a one to the first element at every location
space[index, 0] = b
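As a quick sanity check (a sketch using the sizes from the question), the `linspace` indices land almost evenly, so consecutive chosen rows end up 6 or 7 apart:

```python
import numpy as np

space = np.zeros((1000, 5))
a = np.ones((150, 5))

# 150 row indices spread evenly over 0..999, rounded to integers
index = np.rint(np.linspace(start=0, stop=999, num=150)).astype(int)
space[index] = a

gaps = np.diff(index)  # spacing between consecutive chosen rows
```

Here `gaps` contains only 6s and 7s, i.e. the ones are spread as evenly as integer rounding allows.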
So I have an array I am trying to slice and index using two other boolean arrays and then set a value on that subset of the array. I saw this post:
Setting values in a numpy arrays indexed by a slice and two boolean arrays
and suspect I am getting a copy instead of a view of my array, so it isn't saving the values I am setting on the array. I think I managed to reproduce the problem with much shorter code, but am very out of my depth.
import numpy as np

#first array
a = np.arange(0, 100).reshape(10, 10)
#conditional array of same size
b = np.random.rand(10,10)
b = b < 0.8
#create out array of same size as a
out = np.zeros(a.shape)
#define neighborhood to slice values from
nhood = tuple([slice(3,6),slice(5,7)])
#define subset where b == True in neighborhood
subset = b[nhood]
#define values in out that are in the neighborhood but not excluded by b
candidates = out[nhood][subset]
#get third values from neighborhood using math
c = np.random.rand(len(candidates))
#this is in a for loop so this is checking to see a value has already been changed earlier - returns all true now
update_these = candidates < c
#set sliced, indexed subset of array with the values from c that are appropriate
out[nhood][subset][update_these] = c[update_these]
print(out) ##PRODUCES - ARRAY OF ALL ZEROS STILL
I have also tried chaining the boolean index with
out[nhood][(subset)&(update_these)] = c[update_these]
But that made an array of the wrong size.
Help?
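One way to make the assignment stick (a sketch, not the only possible fix): convert the slice-plus-mask combination into explicit integer indices, so a single advanced-indexing assignment writes into the original array rather than into a temporary copy:

```python
import numpy as np

np.random.seed(0)                     # sample data, as in the question
a = np.arange(0, 100).reshape(10, 10)
b = np.random.rand(10, 10) < 0.8
out = np.zeros(a.shape)

nhood = (slice(3, 6), slice(5, 7))
subset = b[nhood]

# Turn the mask inside the neighborhood into absolute row/column indices
rows, cols = np.nonzero(subset)
rows += nhood[0].start                # offset by the row slice's start
cols += nhood[1].start                # offset by the column slice's start

candidates = out[rows, cols]
c = np.random.rand(len(candidates))
update_these = candidates < c

# One advanced-indexing assignment on `out` itself, so nothing is lost
out[rows[update_these], cols[update_these]] = c[update_these]
```

Because the final line indexes `out` directly with integer arrays, the assignment modifies the original array instead of a chained-indexing copy.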
I have a set of arrays that I want to match to a separate set of arrays, within a certain range. Example code below:
import numpy as np

arr1_1 = np.array([468.12, 30.4, 879.74])
arr1_2 = np.array([351.20, 84.98, 514.45])
arr1_3 = np.array([21.456, 89.56, 69.45])
array_1 = np.column_stack((arr1_1, arr1_2, arr1_3))
arr2_1 = np.array([879.12, 48.4, 212.47...])
arr2_2 = np.array([389.06, 80.91, 87.98...])
arr2_3 = np.array([224.566, 98.35, 657.30..])
array_2 = np.column_stack((arr2_1, arr2_2, arr2_3))
The second set of arrays is much larger than the first. Is there any way to match the second set to the first, by column, within a specific range? i.e. any rows from array_2 where the second column's value (arr2_2) is within ±5 of array_1's second column (arr1_2)? In this example my ideal output would be
array_match = ([48.4, 80.91, 98.35])
(because 80.91 is within 5 of 84.98)
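A sketch of one way to do this with broadcasting, assuming the ±5 test is against every value of `array_1`'s second column (note that with this sample data 87.98 is also within 5 of 84.98, so a second row qualifies as well):

```python
import numpy as np

arr1_2 = np.array([351.20, 84.98, 514.45])          # second column of array_1
array_2 = np.column_stack((np.array([879.12, 48.4, 212.47]),
                           np.array([389.06, 80.91, 87.98]),
                           np.array([224.566, 98.35, 657.30])))

# |array_2[:, 1] - each arr1_2 value| via broadcasting: shape (3, 3)
within = np.abs(array_2[:, 1][:, None] - arr1_2[None, :]) <= 5
array_match = array_2[within.any(axis=1)]           # keep rows with any match
```

The broadcasted difference compares every `array_2` second-column value against every `arr1_2` value at once, so no loop is needed.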
I have a matrix that stores the k minimum distances for each of N elements. Whenever a new element arrives, I want to compute the distances to all N elements, and if any distance is lower than the maximum distance stored, update that value with the new distance. Initially the distances are set to np.inf.
import numpy as np

elems = np.array([[5, 5],[4, 4],[8, 8]])
k=2
center_mindists = np.full((len(elems),k), np.inf)
So when a new element arrives, say x = np.array([1, 1]), I have to compute the distance to all elements and store it if it is less than the maximum distance stored at the time:
distances = np.sum(np.abs(elems - x), axis=1)  # [8 6 14]
To do so, I find the index of the maximum stored distance in each row and then select the stored maxima that are higher than the newly computed distances:
max_min_idx = np.argmax(center_mindists, axis=1)  # [0 0 0]
id0 = np.indices(max_min_idx.shape)
lower_idx = distances < center_mindists[id0, max_min_idx]
Finally I have to update those values with the new ones:
center_mindists[id0, max_min_idx][lower_idx] = distances[lower_idx[0]]
The thing is that the assignment does not change the values in the center_mindists matrix, and I couldn't find a solution for this.
Thanks a lot!!
You can perform the assignment in two steps, since you have a double index, the first part of which makes a copy. Instead of
center_mindists[id0, max_min_idx][lower_idx] = distances[lower_idx[0]]
explicitly update the copy, and assign it back:
temp = center_mindists[id0, max_min_idx]
temp[lower_idx] = distances[lower_idx[0]]
center_mindists[id0, max_min_idx] = temp
This is actually pretty convenient, because you can reuse temp to compute the lower_idx mask in the first place.
center_mindists[id0, max_min_idx] is a copy, because integer-array indices trigger advanced indexing (only basic indexing with slices returns a view).
center_mindists[id0, max_min_idx][lower_idx] = ...
modifies that copy, not the original, so nothing ends up happening.
You have to combine the indices somehow, so that you perform only one advanced-indexing assignment:
center_mindists[idx0, idx1] = ...
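Put together, a runnable sketch of the combined single-step assignment (using a 1-D `id0` from `np.arange` instead of `np.indices`):

```python
import numpy as np

elems = np.array([[5, 5], [4, 4], [8, 8]])
k = 2
center_mindists = np.full((len(elems), k), np.inf)

x = np.array([1, 1])
distances = np.sum(np.abs(elems - x), axis=1)      # Manhattan distances: [8 6 14]

max_min_idx = np.argmax(center_mindists, axis=1)   # column of the max per row
id0 = np.arange(len(elems))
lower_idx = distances < center_mindists[id0, max_min_idx]

# Single advanced-indexing assignment -> writes into center_mindists itself
center_mindists[id0[lower_idx], max_min_idx[lower_idx]] = distances[lower_idx]
```

Because both index arrays are filtered by the same mask before indexing, there is only one advanced-indexing operation and the update lands in the original matrix.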
I am trying to do the following on Numpy without using a loop :
I have a matrix X of dimensions N*d and a vector y of dimension N.
y contains integers ranging from 1 to K.
I am trying to get a matrix M of size K*d, where M[i,:] = np.mean(X[y==i+1,:], 0)
Can I achieve this without using a loop?
With a loop, it would go something like this.
import numpy as np
N=3
d=3
K=2
X=np.eye(N)
y=np.random.randint(1,K+1,N)
M=np.zeros((K,d))
for i in np.arange(0, K):
    line = X[y == i+1, :]
    if line.size == 0:
        M[i, :] = np.zeros(d)
    else:
        M[i, :] = np.mean(line, 0)
Thank you in advance.
The code is basically collecting specific rows of X and summing them, for which we have a NumPy builtin in np.add.reduceat. So, with that in focus, the steps to solve it in a vectorized way could be as listed next -
# Get sort indices of y
sidx = y.argsort()
# Collect rows of X based on their IDs so that they come in consecutive order
Xr = X[sidx]
# Get unique row IDs, start positions of each unique ID
# and their counts to be used for average calculations
unq,startidx,counts = np.unique((y-1)[sidx],return_index=True,return_counts=True)
# Add rows off Xr based on the slices signified by the start positions
vals = np.true_divide(np.add.reduceat(Xr,startidx,axis=0),counts[:,None])
# Setup output array and set row summed values into it at unique IDs row positions
out = np.zeros((K,d))
out[unq] = vals
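A self-contained run of the steps above, checked against the loop version (the sample data here is made up purely for the check):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, K = 8, 3, 3
X = rng.standard_normal((N, d))
y = rng.integers(1, K + 1, N)           # labels in 1..K, as in the question

sidx = y.argsort()
Xr = X[sidx]                            # rows grouped by label
unq, startidx, counts = np.unique((y - 1)[sidx],
                                  return_index=True, return_counts=True)
vals = np.add.reduceat(Xr, startidx, axis=0) / counts[:, None]
out = np.zeros((K, d))
out[unq] = vals                         # absent classes keep their zero rows

# Reference: the plain loop over classes
ref = np.array([X[y == i + 1].mean(0) if (y == i + 1).any() else np.zeros(d)
                for i in range(K)])
```

`np.add.reduceat` sums each run of equal-label rows in one call, and dividing by `counts` turns the sums into means.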
This also solves the question, but it creates an intermediate K×N boolean matrix and doesn't use the built-in mean function, which may lead to worse performance or worse numerical stability in some cases. I'm letting the class labels range from 0 to K-1 rather than 1 to K.
from numpy import arange, dot, shape, sum
from numpy.random import randint, randn

# Define constants
K, N, d = 10, 1000, 3
# Sample data
Y = randint(0, K-1, N)  # K-1 to omit one class, to test the no-examples case
X = randn(N, d)
# Calculate means for each class, vectorized
# Map samples to labels by taking a logical "outer product"
mark = Y[None,:]==arange(0,K)[:,None]
# Count number of examples in each class
count = sum(mark,1)
# Avoid divide by zero if no examples
count += count==0
# Sum within each class and normalize
M = (dot(mark,X).T/count).T
print(M, shape(M), shape(mark))
Suppose I have a two-dimensional numpy array with a given shape, and I would like to get a view of the values that satisfy a predicate based on each value's position. That is, if x and y are the column and row index respectively, and the predicate is x > y, the function should return only the array's values for which the column index is greater than the row index.
The easy way to do this is a double loop, but I would like a faster (vectorized, maybe?) approach. Is there a better way?
In general, you could do this by constructing an open mesh grid corresponding to the row/column indices, applying your predicate to get a boolean mask, and then indexing into your array with that mask:
import numpy as np

A = np.zeros((10, 20))
y, x = np.ogrid[:A.shape[0], :A.shape[1]]
mask = x > y
A[mask] = 1
Your specific example happens to be the upper triangle - you can get a copy of it using np.triu, or you can get the corresponding row/column indices using np.triu_indices.
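For the strict upper triangle specifically, `np.triu_indices_from` gives the same selection directly (a small sketch; note that fancy indexing returns a copy, not a view):

```python
import numpy as np

A = np.arange(12).reshape(3, 4)
# k=1 excludes the diagonal, i.e. keeps positions where x > y
vals = A[np.triu_indices_from(A, k=1)]
```

This works for non-square arrays too, since the indices are generated from the array's own shape.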