My problem involves different moving objects. We calculate the distance between these objects at different time frames.
I have an ndarray A with shape (a, b) which stores distances; a is the number of frames and b is the number of coordinates for which this distance is calculated.
I have a list L which holds the names of these objects. It has length b.
I want to find where the distance value is 1, and then look up the name at that index in list L (which shares the same indexing). I wrote the following:
A=[[nd array]]
L=[list of names]
list_to_array=np.array(L)
array_of_names_meeting_criteria=list_to_array[np.where(A==1)[1]]
The problem is I am not getting names per frame. I want this array split frame-wise, so I get shape (a, x), where a is the number of frames and x is the number of matching names for each frame.
Sample case:
A=np.array([[1,2,2,6],[3,4,5,1],[3,1,17,4],[2,3,1,5]])
L=[('cat','dog'),('lion','elephant'),('man','women'),('fish','shark')]
list_to_array=np.array(L)
array_of_names_meeting_criteria=list_to_array[np.where(A==1)[1]]
I should get the following:
[['cat','dog'],['fish','shark'],['lion','elephant'],['man','women']]
I just made some minor edits to your code and here's the result:
A = np.array([[1,2,2,6],[3,4,5,1],[3,1,17,4],[2,3,1,5]])
L = [('cat','dog'),('lion','elephant'),('man','women'),('fish','shark')]
list_to_array = np.array(L)
array_of_names_meeting_criteria = list_to_array[np.where(A==1)[1]]
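If you also need the names grouped per frame (frames may match different numbers of columns), one option, sketched here with the sample data, is to split the matched names on the row index that np.where also returns:

```python
import numpy as np

A = np.array([[1, 2, 2, 6], [3, 4, 5, 1], [3, 1, 17, 4], [2, 3, 1, 5]])
L = [('cat', 'dog'), ('lion', 'elephant'), ('man', 'women'), ('fish', 'shark')]
names = np.array(L)

# np.where returns matches in row-major order, so the row indices are sorted
rows, cols = np.where(A == 1)

# Split the matched name pairs at each frame boundary; a frame with no match
# (or with several matches) automatically gets an empty (or longer) group
per_frame = np.split(names[cols], np.searchsorted(rows, np.arange(1, A.shape[0])))
for frame, hits in enumerate(per_frame):
    print(frame, hits.tolist())
```

This keeps the result as a list of per-frame arrays, which is safer than reshaping because the number of matches can vary from frame to frame.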
I am trying to find the 3 nearest neighbours of a row within a set of 10 rows (each set of 10 rows is a class), and then average those 3 neighbours.
I need to do this over an array of 400 rows, where each consecutive block of 10 rows belongs to one class.
I think I have managed to capture the 3 nearest neighbours for each row in 'indices' below.
In the output below, 'indices' is a 10x3 matrix.
I'm just not sure how to reference the particular 3 rows of the original xclass that the 3 elements of each row of 'indices' refer to, then add them (the challenge) and divide by 3 to get the average (I assume the division is straightforward).
Updated this paragraph after the responses below:
Basically, X has dimensions 400x4096.
Indices could be, for example, [[1,3,5],[2,4,8],...].
What I need to do is average rows 1, 3 and 5 of X to obtain a resultant row of shape 1x4096.
Similarly, average rows 2, 4 and 8 of X to obtain a new row for that set, and so on for each row of indices.
So each element in a particular row of indices refers to a specific row in X.
'''
for counter in range(0, 400, 10):
    xclass = X[counter:counter + 10]  # counter+10, not counter+9, to take all 10 rows of the class
    yclass = y[counter:counter + 10]
    nbrs = NearestNeighbors(n_neighbors=3, algorithm='brute').fit(xclass)
    distances, indices = nbrs.kneighbors(xclass)
    #print(indices)
'''
appreciate any insight.
You can look an item up in a Python list by its value, like this...
a = ['aaa', 'bbb', 'ccc']
b = a[a.index('aaa')]
print(b)
output: aaa
and also like...
a = ['aaa', 'bbb', 'ccc']
word = 'aaa'
b = a[a.index(word)]
print(b)
output: aaa
so you can do something like...
a = ['aaa', 'bbb', 'ccc']
word = 'aaa'
b = a[a.index(word) + 1]
print(b)
output: bbb
I assume you are using numpy (or something similar). In general, you can take any indexing array and use that to capture the particular entries of interest in another array. For example,
import numpy as np
#Accessing an array by an indexing array.
X = np.arange(30).reshape(6,5) #6, 5 long vectors
I = [[0,1,2],[3,4,5]] #I wish to collect vectors 0,1,2 together and vectors 3,4,5 together
C = X[I,:] #Will be a 2 by 3 by 5. 2 collections of 3 5-long vectors.
print(C)
#Computations for averaging those collected arrays.
#Note that C is shape (2,3,5), we wish to average the 3s together, hence we need to
#average along the middle axis (axis=1).
A = np.average(C,axis=1)
print(A)
More detail about X[I,:]: in general, we specified which rows and which columns to capture from array X. Since we wanted the full vectors in X, we took every column, hence the :. Likewise, we pulled the rows 3 at a time, so we specified [0,1,2] and then [3,4,5]. You could change those to any indices you wish.
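Applied to the question's loop, the pattern would look roughly like this. This is a sketch with random stand-in data and a brute-force Manhattan nearest-neighbour search in place of sklearn's NearestNeighbors, so it runs self-contained; the shapes (40x6, classes of 10) stand in for the real 400x4096:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((40, 6))  # stand-in for the 400x4096 matrix: 4 classes of 10 rows

averaged = []
for counter in range(0, len(X), 10):
    xclass = X[counter:counter + 10]  # all 10 rows of one class
    # Stand-in for NearestNeighbors(...).kneighbors(xclass):
    # pairwise Manhattan distances, then the 3 nearest rows (self included)
    dists = np.abs(xclass[:, None, :] - xclass[None, :, :]).sum(-1)
    indices = np.argsort(dists, axis=1)[:, :3]          # shape (10, 3)
    averaged.append(xclass[indices, :].mean(axis=1))    # average the 3 rows -> (10, 6)

result = np.vstack(averaged)
print(result.shape)  # one averaged row per original row
```

The key step is `xclass[indices, :].mean(axis=1)`: fancy indexing gathers a (10, 3, d) stack of neighbour rows, and averaging over axis 1 collapses each triple to a single row.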
I apologise for the tongue-twister of a title.
To summarise, I am trying to apply DTW to a dataframe of tree-ring series. I want to apply DTW to each column, comparing each one to the rest of the dataset, but figuring out the logic is what's confusing me. I now have a single array (one column) and an array of arrays I want to compare it to individually (the other columns). As I have 46 columns, it would take a monumental amount of time to do this manually, so I'm looking for a way to print the distances between each pair of columns.
I have my single array ie, column 1 (a1):
array([2231.121954586618, 2191.32688635395, 2153.33037342928,
2167.460745065675, 2182.327272147529, 2148.104497944283,
2114.629371754906, 2093.254599793933, 2013.228738264795,
1960.124018035272, 1956.115012446374, 2004.772102502964,
1996.031697793075, 1984.117922837165, 1927.018245950742,
1889.983294062236, 1857.106663618318, 1855.521387844768,
1854.30527162405, 1843.946144942001, 1834.918111326537,
1786.367506785417, 1764.596236951255, 1765.789120636587,
1768.225728544412, 1801.390137110182, 1820.438710725669,
1821.776101512033, 1814.626915671021, 1789.410699262131,
1752.680382970908, 1774.240633213347, 1793.576383615812,
1802.430943044276, 1810.653920721653, 1832.59203921635,
1836.215188930494, 1804.727265942576, 1807.798802135772,
1853.273004232627, 1875.641068893134, 1880.352238594259,
1845.111091114404, 1807.281434172499, 1802.326163448382,
1779.565520429905, 1827.148896035324, 1860.634653074935],
dtype=object)
and array of arrays, ie columns 2:46 (a1_compare):
array([[2338.980748451803, 2313.115476761541, 2266.320969548615, ...,
1971.777882561555, 2004.912406403344, 2005.090872507429],
[5085.120869045766, 4994.508983933459, 4926.377921200292, ...,
3810.539158921751, 3757.139414193585, 3698.921580852207],
[1441.5932022738868, 1441.5932022738868, 250.2478024965511, ...,
2864.532339498514, 2775.946234841519, 2764.567521984336],
...,
[822.4370926086343, 848.1167402384477, 887.7301546370533, ...,
1549.347739499023, 1592.226581401639, 1577.883355154341],
[1508.596325796503, 1593.192415483712, 1587.73520115259, ...,
1467.943298815971, 1556.004468001763, 1528.921150058964],
[1300.0305814488, 1369.177320180398, 1480.576904436118, ...,
1379.66588731831, 1367.312665162758, 1328.830519316272]],
dtype=object)
and finally my code to attempt to compare them:
def compare1(array1, array_arrays):
    for i in array_arrays:
        distance, path = fastdtw(a1, i, dist=manhattan_distance)
    return distance
But this only returns one value:
compare1(a1, a1_compare)
12271.277
when I want each individual distance. The first is 4164.2393701224755, but I want all the others too. Any suggestions as to how I can do this without individually comparing each column/array?
If I'm interpreting your problem correctly, your for loop computes and then overwrites distance for each sub-array in a1_compare, so only the value from the last iteration is returned. There are many ways to save the result of every iteration, but the most sensible to me is to allocate an empty array the same length as a1_compare and save each distance to the appropriate index of the output array:
import numpy as np

def compare1(array1, array_arrays):
    distances = np.empty(len(array_arrays))  # create an empty container of the right size
    # enumerate is an easy way to get an index at the same time as the value itself
    for i, value in enumerate(array_arrays):
        distance, path = fastdtw(array1, value, dist=manhattan_distance)
        distances[i] = distance  # save our result
    return distances  # return all of them
I have a matrix that stores the k minimum distances for N elements. Whenever a new element arrives, I want to compute the distances to all N elements, and if any distance is lower than the maximum distance stored, update that entry with the new distance. Initially the distances are set to np.inf.
elems = np.array([[5, 5],[4, 4],[8, 8]])
k=2
center_mindists = np.full((len(elems),k), np.inf)
So when a new element arrives, say x=np.array([1,1]), I have to compute the distance to all elements and store it if it is less than the maximum distance stored at the time:
distances = np.sum(np.abs(elems - x), axis=1)  # [8 6 14]
To do so, I find the index of the maximum stored distance in each row and then select the stored maxima that are higher than the newly computed distances:
max_min_idx = np.argmax(center_mindists, axis=1)  # [0 0 0]
id0 = np.indices(max_min_idx.shape)
lower_idx = distances < center_mindists[id0, max_min_idx]
Finally I have to update those values with the new ones:
center_mindists[id0, max_min_idx][lower_idx] = distances[lower_idx[0]]
The thing is that the assignment does not change the values in the center_mindists matrix, and I couldn't find a solution for this.
Thanks a lot!
You can perform the assignment in two steps, since you have a double index, the first part of which makes a copy. Instead of
center_mindists[id0, max_min_idx][lower_idx] = distances[lower_idx[0]]
explicitly update the copy, and assign it back:
temp = center_mindists[id0, max_min_idx]
temp[lower_idx] = distances[lower_idx[0]]
center_mindists[id0, max_min_idx] = temp
This is actually pretty convenient because you really use temp to compute the lower_idx mask in the first place.
center_mindists[id0, max_min_idx] is a copy, because the indices are not slices (basic indexing).
center_mindists[id0, max_min_idx][lower_idx] = ...
modifies that copy, not the original, so nothing ends up happening.
You have to somehow combine indices so that you have only one set of advanced indexing
center_mindists[idx0, idx1] = ....
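Putting that together into one combined advanced-indexing assignment, here is a runnable sketch using the question's sample data (a plain `np.arange` row index plays the role of id0):

```python
import numpy as np

elems = np.array([[5, 5], [4, 4], [8, 8]])
k = 2
center_mindists = np.full((len(elems), k), np.inf)

x = np.array([1, 1])
distances = np.sum(np.abs(elems - x), axis=1)  # Manhattan distances: [8 6 14]

# Position of the current worst (largest) stored distance in each row
max_min_idx = np.argmax(center_mindists, axis=1)
rows = np.arange(len(elems))

# Rows where the new distance beats the stored maximum
lower_idx = distances < center_mindists[rows, max_min_idx]

# A single advanced index writes into the original array, not a copy
center_mindists[rows[lower_idx], max_min_idx[lower_idx]] = distances[lower_idx]
print(center_mindists)
```

Because both index arrays are filtered by the same mask before the assignment, there is only one level of fancy indexing, so the update lands in center_mindists itself.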
I want to create a loop for the following lines in Python (I use PyCharm):
mean_diff = np.mean(np.array([diff_list[0].values, diff_list[1].values, diff_list[2].values, diff_list[3].values, ..., diff_list[100].values]), axis=0)
With this I get the mean of each individual cell across the different arrays (raster change over time).
I tried the following:
for x in range(100):
    mean_diff = np.mean(np.array([diff_list[x].values]), axis=0)
But what happens here is that it calculates the mean between the mean of the last iteration and the new array, and so on, instead of stacking everything up first and then calculating the mean of the total. One idea was to first create a "sum array" with all the diff_list values in it, but I failed at that too. My diff_list is a list containing data frames (each element holds an array, so effectively a 3D structure; see the picture showing the structure of the list).
You need to populate the array, not do the computation, within the loop. Python list comprehensions are perfect for this:
Your first program is the equivalent of:
mean_diff = np.mean(np.array([a.values for a in diff_list[:101]]), axis=0)
Or if you prefer:
x = []
for a in diff_list[:101]:
    x.append(a.values)
mean_diff = np.mean(np.array(x), axis=0)
If you are using the whole list instead of its first 101 elements you can drop the "[:101]".
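As a quick self-contained check of the list-comprehension approach (with a tiny stand-in class in place of the pandas frames in diff_list, since only the .values attribute is used):

```python
import numpy as np

class Frame:
    """Minimal stand-in for a data frame exposing a .values array."""
    def __init__(self, values):
        self.values = values

# Five 2x2 "rasters" holding the constant values 0..4
diff_list = [Frame(np.full((2, 2), float(i))) for i in range(5)]

# Stack all arrays first, then average over the list axis
mean_diff = np.mean(np.array([a.values for a in diff_list]), axis=0)
print(mean_diff)  # every cell is 2.0, the mean of 0..4
```

The important point is that np.array stacks the rasters into one (5, 2, 2) block before np.mean reduces over axis 0, so every cell is averaged across all rasters at once rather than iteratively.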
I am trying to do the following on Numpy without using a loop :
I have a matrix X of dimensions N*d and a vector y of dimension N.
y contains integers ranging from 1 to K.
I am trying to get a matrix M of size K*d, where M[i,:]=np.mean(X[y==i,:],0)
Can I achieve this without using a loop?
With a loop, it would go something like this.
import numpy as np

N = 3
d = 3
K = 2
X = np.eye(N)
y = np.random.randint(1, K + 1, N)
M = np.zeros((K, d))
for i in np.arange(0, K):
    line = X[y == i + 1, :]
    if line.size == 0:
        M[i, :] = np.zeros(d)
    else:
        M[i, :] = np.mean(line, 0)
Thank you in advance.
The code is basically collecting specific rows of X and adding them up, for which we have a NumPy builtin in np.add.reduceat. With that in focus, the steps to solve it in a vectorized way could be as listed next -
# Get sort indices of y
sidx = y.argsort()
# Collect rows of X based on their IDs so that they come in consecutive order
Xr = X[sidx]
# Get unique row IDs, start positions of each unique ID
# and their counts to be used for average calculations
unq,startidx,counts = np.unique((y-1)[sidx],return_index=True,return_counts=True)
# Add rows off Xr based on the slices signified by the start positions
vals = np.true_divide(np.add.reduceat(Xr,startidx,axis=0),counts[:,None])
# Setup output array and set row summed values into it at unique IDs row positions
out = np.zeros((K,d))
out[unq] = vals
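A small end-to-end check of those steps against the original loop, using random data (labels 1..K as in the question, with some classes possibly empty):

```python
import numpy as np

N, d, K = 12, 3, 4
rng = np.random.default_rng(1)
X = rng.random((N, d))
y = rng.integers(1, K + 1, N)  # labels 1..K

# Vectorized version: sort rows by label, then sum each run with reduceat
sidx = y.argsort()
Xr = X[sidx]
unq, startidx, counts = np.unique((y - 1)[sidx], return_index=True, return_counts=True)
vals = np.add.reduceat(Xr, startidx, axis=0) / counts[:, None]
out = np.zeros((K, d))
out[unq] = vals

# Reference: the straightforward per-class loop
ref = np.zeros((K, d))
for i in range(K):
    rows = X[y == i + 1]
    if rows.size:
        ref[i] = rows.mean(axis=0)

assert np.allclose(out, ref)
print("vectorized result matches the loop")
```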
This solves the question, but it creates an intermediate K×N boolean matrix and doesn't use the built-in mean function, which may hurt performance or numerical stability in some cases. I'm letting the class labels range from 0 to K-1 rather than 1 to K.
import numpy as np

# Define constants
K, N, d = 10, 1000, 3
# Sample data
Y = np.random.randint(0, K - 1, N)  # K-1 to omit one class, testing the no-examples case
X = np.random.randn(N, d)
# Calculate means for each class, vectorized:
# map samples to labels by taking a logical "outer product"
mark = Y[None, :] == np.arange(0, K)[:, None]
# Count the number of examples in each class
count = np.sum(mark, 1)
# Avoid divide-by-zero if a class has no examples
count += count == 0
# Sum within each class and normalize
M = (np.dot(mark, X).T / count).T
print(M, np.shape(M), np.shape(mark))