I am trying to run PCA over a dataset representing the 3 bands of an image. The dataset has shape (300000, 3): one row per pixel and one column per band. I find the eigenvalues and eigenvectors, which are then put into a list of tuples called eig_pairs. I then calculate the explained variance to determine how many bands to use for PCA.
I determine that I wish to use 2 bands.
eig_pairs is a list of 3 tuples.
Following this tutorial, it says I need to reduce from the original dimension space (3) to a dimension equal to the number of dimensions I wish to keep (2). Their example goes from 7 to 4, as shown here:
matrix_w = np.hstack((eig_pairs[0][1].reshape(7,1),
eig_pairs[1][1].reshape(7,1),
eig_pairs[2][1].reshape(7,1),
eig_pairs[3][1].reshape(7,1)))
Following this logic I changed my own to:
matrix_w = np.hstack((eig_pairs[0][1].reshape(3,1),
eig_pairs[1][1].reshape(3,1)))
However I get the error ValueError: shapes (3131892,3) and (2,3) not aligned: 3 (dim 1) != 2 (dim 0)
import cv2
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
#read in image
img = cv2.imread('/Volumes/EXTERNAL/Stitched-Photos-for-Chris/p7_0015_20161005-949am-75m-pass-1.jpg.png',1)
row,col = img.shape[:2]
b,g,r = cv2.split(img)
# Pandas dataset
# samples = 3000000, features = 3
dataSet = pd.DataFrame({'bBnad':b.flat[:],'gBnad':g.flat[:],'rBnad':r.flat[:]})
print(dataSet.head())
# Standardize the data
X = dataSet.values
X_std = StandardScaler().fit_transform(X) #converts data from uint8 to float64
#Calculating Eigenvectors and eigenvalues of Covariance matrix
meanVec = np.mean(X_std, axis=0)
covarianceMatx = np.cov(X_std.T)
eigVals, eigVecs = np.linalg.eig(covarianceMatx)
# Create a list of (eigenvalue, eigenvector) tuples
eig_pairs = [ (np.abs(eigVals[i]),eigVecs[:,i]) for i in range(len(eigVals))]
# Sort from high to low
eig_pairs.sort(key = lambda x: x[0], reverse= True)
# Determine how many PC going to choose for new feature subspace via
# the explained variance measure which is calculated from eigen vals
# The explained variance tells us how much information (variance) can
# be attributed to each of the principal components
tot = sum(eigVals)
var_exp = [(i / tot)*100 for i in sorted(eigVals, reverse=True)]
cum_var_exp = np.cumsum(var_exp)
#convert 3 dimensional space to 2 dimensional space; stacking two (3,1) columns gives a 3x2 matrix W
matrix_w = np.hstack((eig_pairs[0][1].reshape(3,1),
eig_pairs[1][1].reshape(3,1)))
Appreciate any help.
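For what it's worth, here is a minimal sketch of the projection step. It assumes the error comes from multiplying the (n, 3) data by a (2, 3) matrix, for example a transposed matrix_w; the question does not show that line, so this is a guess. The random X is only a stand-in for the pixel data, and np.linalg.eigh is swapped in for np.linalg.eig because the covariance matrix is symmetric.

```python
import numpy as np

# Stand-in for the (n_pixels, 3) band data; the real values come from the image.
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(1000, 3)).astype(float)

# Standardize each band (column) to zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigen-decomposition of the 3x3 covariance matrix.
# eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
cov = np.cov(X_std.T)
eig_vals, eig_vecs = np.linalg.eigh(cov)

# Keep the eigenvectors of the two largest eigenvalues as columns of W.
order = np.argsort(eig_vals)[::-1]
matrix_w = eig_vecs[:, order[:2]]
print(matrix_w.shape)  # (3, 2)

# (n, 3) dot (3, 2) -> (n, 2); the inner dimensions must both be 3.
Y = X_std.dot(matrix_w)
print(Y.shape)  # (1000, 2)
```

With W of shape (3, 2), X_std.dot(matrix_w) lines up, whereas X_std.dot(matrix_w.T) would reproduce the "not aligned" error.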
I have a 3d numpy array representing time series data, ie [number of samples, time steps, features].
I would like to scale each feature between -1 and 1. However, each feature should be scaled with respect to the maximum and minimum of all samples in the first dimension of my array. For example, my array is of shape:
multi_data.shape
(66, 5004, 2)
I tried the following:
data_min = multi_data.min(axis=1, keepdims=True)
data_max = multi_data.max(axis=1, keepdims=True)
multi_data = (2*(multi_data-data_min)/(data_max-data_min))-1
The problem is that this scales each "batch" (the first dimension of my array) independently. What I am trying to do is find the max and min of each feature (of which I have two) across all 66 batches and then scale each feature based on those values, but I can't quite work out how to achieve this. Any pointers would be very welcome.
How about chaining that with another min/max:
data_min = multi_data.min(axis=1, keepdims=True).min(axis=0, keepdims=True)
data_max = multi_data.max(axis=1, keepdims=True).max(axis=0, keepdims=True)
multi_data = (2*(multi_data-data_min)/(data_max-data_min))-1
Or:
data_min = multi_data.min(axis=(0,1), keepdims=True)
data_max = multi_data.max(axis=(0,1), keepdims=True)
multi_data = (2*(multi_data-data_min)/(data_max-data_min))-1
Since you're taking the min/max over the first two dimensions, you can also drop keepdims and rely on broadcasting, because the resulting shape (2,) lines up with the last axis of multi_data:
data_min = multi_data.min(axis=(0,1))
data_max = multi_data.max(axis=(0,1))
multi_data = (2*(multi_data-data_min)/(data_max-data_min))-1
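A quick way to convince yourself that the last variant scales per feature across all batches is to check the result on stand-in data. The array below is random and much shorter than the real one; only its layout (samples, time steps, features) follows the question.

```python
import numpy as np

# Stand-in data: 66 samples, 50 time steps, 2 features on very different scales.
rng = np.random.default_rng(0)
multi_data = rng.normal(size=(66, 50, 2)) * np.array([1.0, 100.0])

# Reduce over samples and time steps at once; one min/max per feature.
data_min = multi_data.min(axis=(0, 1))
data_max = multi_data.max(axis=(0, 1))
print(data_min.shape)  # (2,)

# Shape (2,) broadcasts against the trailing axis of (66, 50, 2).
scaled = 2 * (multi_data - data_min) / (data_max - data_min) - 1

# Each feature now spans exactly [-1, 1] over the whole data set.
print(scaled.min(axis=(0, 1)), scaled.max(axis=(0, 1)))
```

Individual batches will generally not reach -1 or 1 on their own, which is the difference from the axis=1 version.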
Hello, I have a question regarding a problem I am facing in Python. I was studying tensors and saw that each row/column of a tensor must have the same size. Is it possible to create a tensor, perhaps a 3D object or matrix, where, let's say, we have 3 axes: x, y, z?
On the x axis I want to create a vector to work as an index. So let x go from 0 to N.
Then on the y axis I want to have N random integer vectors of size m (where mm
Is it possible?
My first approach was to create a big vector of size Nm and a big matrix of (Nm,Nm) dimensions where I would store all my random vectors and matrices, and then if I wanted to change, for example, my second vector, I would have to play with the indexes. However, is there another way to approach this problem with tensors or numpy that I'm unaware of?
Thank you in advance for your advice.
First vector, N = 3, [1,2, 3]
Second N vectors with length m, m = 2
[[4,5], [6,7], [7,8]]
So, N matrices of size (m,m)
[[[1,1], [2,2]], [[1,1], [2,2]], [[1,1], [2,2]] ]
Let's create numpy arrays from them.
import numpy as np
N = 3
m = 2
a = np.array([1,2,3])
b = np.random.randn(N, m)
c = np.random.randn(N, m, m)
You see the problem here? The last matrix c has already 3 dimensions according to your definitions.
Your argument can be simplified.
Let's say our final matrix is -
a = np.zeros((3,2,2)) # 3 dimensions, x,y,z
1) For first dimension -
a[0,:,:] = 0 # first axis, first index = 0
a[1,:,:] = 1 # first axis, 2nd index = 1
a[2,:,:] = 2 # first axis, 3rd index = 2
2) Now, we need to fill up the rest of the positions, but dimensions don't match up.
So, it's better to create separate tensors for them.
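As a rough sketch of the "separate tensors" idea: keep three arrays whose first axis is the shared index, and address item i in each of them. The names index, vectors and matrices are made up for illustration, and random integers stand in for your data.

```python
import numpy as np

N, m = 3, 2
rng = np.random.default_rng(0)

index = np.arange(N)                             # shape (N,)
vectors = rng.integers(0, 10, size=(N, m))       # shape (N, m)
matrices = rng.integers(0, 10, size=(N, m, m))   # shape (N, m, m)

# Changing the second vector is a plain assignment, no index arithmetic needed.
vectors[1] = [7, 8]

# Everything that belongs to item i is found at position i of each array.
i = 1
item = (index[i], vectors[i], matrices[i])
print(item[1])  # [7 8]
```

This avoids the big (Nm,Nm) matrix entirely, since the first axis already plays the role of the index.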
I have this huge tensor from which I just want to keep selected tensors.
Background - first contains coordinates of quadrilaterals being predicted.
np.shape(coords_detached) = (15969, 8)
coords.shape = torch.Size([15969, 8])
The second contains the same coordinates, but filtered after selection using NMS. For this discussion, just say I select 9 rows from the above tensor, each with 8 coordinates + 1 confidence score.
But NMS is being done in numpy, so I detach the tensors.
coords_nms = torch.tensor(nms_coords_, dtype=torch.float32)
coords_nms.shape = torch.Size([9, 9])
So now I want to select just these 9 rows from the original tensor, because it has the gradient information that gets lost during detach() and the numpy NMS.
I tried this :
s = torch.ones_like(nms_coords_)
s *=-1
nms_coords = torch.where(coords == coords_nms[:,:-1], coords, s)
nms_coords = nms_coords[nms_coords>=0]
nms_coords.reshape(-1, 8)
The idea was to go through coords, match the values in coords_nms and keep just those rows, but torch.where needs the same size at axis=0.
The iterative loop would be the following, but how can I do it using tensor notation?
poo = []
for x in coords:
    for z in nms_coords_:
        if sum(x[:] == z[:-1]) == 8 :
            poo.append(z[:-1])
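One possible way to vectorize the loop is to compare every row against every NMS row by broadcasting and keep the indices of the rows that match. The sketch below does this in numpy on the detached arrays, with random stand-in data and hypothetical picked rows. It assumes NMS returns the coordinates bit-for-bit unchanged, since it relies on exact float equality.

```python
import numpy as np

# Stand-in data: 100 predicted quadrilaterals with 8 coordinates each.
rng = np.random.default_rng(0)
coords_detached = rng.random((100, 8)).astype(np.float32)

# Pretend NMS kept rows 3, 17 and 42 and appended a confidence score.
picked = np.array([3, 17, 42])
scores = rng.random((3, 1)).astype(np.float32)
nms_coords_ = np.hstack([coords_detached[picked], scores])  # shape (3, 9)

# (100, 1, 8) == (1, 3, 8) broadcasts to (100, 3, 8);
# a row matches when all 8 coordinates are equal.
matches = (coords_detached[:, None, :] == nms_coords_[None, :, :-1]).all(axis=2)

# Indices of the original rows that match at least one NMS row.
keep_idx = np.flatnonzero(matches.any(axis=1))
print(keep_idx)  # [ 3 17 42]
```

Indexing the original tensor with those indices, e.g. coords[torch.from_numpy(keep_idx)], should then keep the selected rows attached to the autograd graph, because indexing is a differentiable operation. If your NMS code can return the kept indices directly, that would avoid the comparison altogether.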
I'm quite new to programming in general, but I could not figure this problem out until now.
I've got a two-dimensional numpy array mask, let's say mask.shape is (3800,3500), which is filled with 0s and 1s representing the spatial resolution of a 2D image, where a 1 represents a visible pixel and a 0 represents background.
I've got a second two-dimensional array data, where data.shape is (909,x) and x is exactly the number of 1s in the first array. I now want to replace each 1 in the first array with a vector of length 909 from the second array, resulting in a final 3D array of shape (3800,3500,909), which is basically a 2D x by y image where select pixels have a spectrum of 909 values in the z direction.
I tried
mask_vector = mask.flatten()
ones = np.ones((909,1))
mask_909 = mask_vector.dot(ones) #results in a 13300000 by 909 2d array
count = 0
for i in mask_vector:
    if i == 1:
        mask_909[i,:] = data[:,count]
        count += 1
result = mask_909.reshape((3800,3500,909))
This results in a viable 3D array giving a 2D picture when doing plt.imshow(result.mean(axis=2))
But the values are still only 1s and 0s not the wanted spectral data in z direction.
I also tried using np.where but broadcasting fails as the two 2D arrays have clearly different shapes.
Has anybody got a solution? I am sure that there must be an easy way...
Basically, you simply need to use np.where to locate the 1s in your mask array. Then initialize your result array to zero and replace the third dimension with your data using the outputs of np.where:
import numpy as np
m, n, k = 380, 350, 91
mask = np.round(np.random.rand(m, n))
x = np.sum(mask == 1)
data = np.random.rand(k, x)
result = np.zeros((m, n, k))
row, col = np.where(mask == 1)
result[row,col] = data.transpose()
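As a variation on the same idea, boolean indexing does the lookup and the assignment in one step. The small sketch below uses made-up sizes and random data, and checks that the j-th visible pixel in row-major order receives column j of data, which is the ordering the question seems to assume.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 4, 5, 3                       # tiny stand-in for (3800, 3500, 909)
mask = rng.integers(0, 2, size=(m, n))  # 1 = visible pixel, 0 = background
x = int(mask.sum())
data = rng.random((k, x))               # one spectrum per visible pixel, as columns

result = np.zeros((m, n, k))
# result[mask == 1] has shape (x, k), so the transposed data fits directly.
result[mask == 1] = data.T

# Visible pixels are visited in row-major order, the same order np.nonzero returns.
rows, cols = np.nonzero(mask)
print(np.array_equal(result[rows[0], cols[0]], data[:, 0]))  # True
```

Background pixels keep their zero spectrum, so result.mean(axis=2) still shows the outline of the mask.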
I would like to use a generic filter to calculate the mean of values within a given window (or kernel), for values that fulfill a couple of conditions. I expected the following code to produce a mean filter of the first array in a 3-layer window, using the other two arrays to mask values from the mean calculation.
from scipy import ndimage
import numpy as np
#some test data
tstArr = np.random.rand(3,7,7)
tstArr = tstArr*10
tstArr = np.int_(tstArr)
tstArr[1] = tstArr[1]*100
tstArr[2] = tstArr[2] *1000
#mean function
def testFun(tstData,processLayer,nLayers,kernelSize):
    funData= tstData.reshape((nLayers,kernelSize,kernelSize))
    meanLayer = funData[processLayer]
    maskedData = meanLayer[(funData[1]>1)&(funData[2]<9000)]
    returnMean = np.mean(maskedData)
    return returnMean
#number of layers in the array
nLayers = np.shape(tstArr)[0]
#window size
kernelSize = 5
#create a sampling window of 5x5 elements from each array
footprnt = np.ones((nLayers,kernelSize,kernelSize),dtype = int)
# calculate the mean of the first layer in the array (other two are for masking)
processLayer = 0
tstOut = ndimage.generic_filter(tstArr, testFun, footprint=footprnt, extra_arguments = (processLayer,nLayers,kernelSize))
I thought this would yield a 7x7 array of masked mean values from the first layer in the input array. The output is a 3x7x7 array, and I don't understand what the values represent. I'm not sure how to produce the "masked" mean-filtered array, or how to interpret the output as given.
Your code does produce a mean filter of the first array in a 3-layer window, using the other two arrays to mask values from the mean calculation. You will find the result in tstOut[1].
What is going on? When you call ndimage.generic_filter with tstArr of shape (3, 7, 7) and footprint=np.ones((3, 5, 5)), then for all i from 0 to 2, for all j from 0 to 6 and for all k from 0 to 6, testFun is called with the subarray of tstArr centered at (i, j, k) and of shape (3, 5, 5) (the array is reflected at the boundary to supply missing values).
In the end:
tstOut[0] is the mean filter of tstArr[0] with tstArr[0] and tstArr[1] as masks
tstOut[1] is the mean filter of tstArr[0] with tstArr[1] and tstArr[2] as masks
tstOut[2] is the mean filter of tstArr[1] with tstArr[2] and tstArr[2] as masks
Again, the wanted result is in tstOut[1].
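If you only need the 7x7 result, one alternative is to skip generic_filter and compute the masked mean directly with sliding windows. This is a different technique from the one in the question, offered as a sketch: masked_mean_filter is a made-up helper, the thresholds are copied from testFun, and np.pad with mode="symmetric" is used because, as far as I know, it mirrors the edge the same way as the default "reflect" mode of scipy.ndimage.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def masked_mean_filter(arr, kernel_size=5):
    """Mean of arr[0] in each kernel_size x kernel_size window,
    counting only pixels where arr[1] > 1 and arr[2] < 9000."""
    pad = kernel_size // 2
    # Pad only the two spatial axes, mirroring the edge values.
    padded = np.pad(arr, ((0, 0), (pad, pad), (pad, pad)), mode="symmetric")
    # Shape (layers, rows, cols, kernel_size, kernel_size).
    win = sliding_window_view(padded, (kernel_size, kernel_size), axis=(1, 2))
    keep = (win[1] > 1) & (win[2] < 9000)
    total = (win[0] * keep).sum(axis=(-2, -1))
    count = keep.sum(axis=(-2, -1))
    # Windows with no valid pixel are left as NaN instead of dividing by zero.
    out = np.full(total.shape, np.nan)
    np.divide(total, count, out=out, where=count > 0)
    return out

# Test data built the same way as in the question.
rng = np.random.default_rng(0)
tst = (rng.random((3, 7, 7)) * 10).astype(int)
tst[1] *= 100
tst[2] *= 1000

result = masked_mean_filter(tst)
print(result.shape)  # (7, 7)
```

The values should correspond to tstOut[1], except that this version returns floats, whereas generic_filter writes its output in the dtype of the integer input unless told otherwise.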
I hope this will help you.