np.std on a 3-dimensional array - Python

I am simply trying to get the standard deviation of each column in my 3D numpy array:
a = np.arange(12).reshape(2, 2, 3)
When I do this for the mean I get the expected result - each mean in the resulting array is a float:
a.mean(axis=0).mean(axis=0)
output:
array([4.5, 5.5, 6.5])
However, the same chain for the standard deviation:
a.std(axis=0).std(axis=0)
returns:
array([0., 0., 0.])
When verifying that np.std works correctly on one column:
np.std(np.array([1,4,7,10]))
it returns
3.3541019662496847
Why are the column standard deviations returning 0, 0, 0?

Unlike the mean, the standard deviation does not compose: a.std(axis=0).std(axis=0) takes the standard deviation of the standard deviations, not of the original column values. Here a.std(axis=0) is 3 everywhere (each pair along axis 0 differs by exactly 6), and the standard deviation of a constant array is 0. The mean of equal-sized group means, by contrast, does equal the overall mean, which is why the chained mean works.
For a 2D numpy array, finding the standard deviation and mean of each column can be done as:
a = np.arange(12).reshape(4, 3)
a_mean = a.T.mean(axis=1)
a_std = a.T.std(axis=1)
As for 3D numpy arrays, I am not sure what exactly you mean by column. So maybe the solution you are looking for is to first reshape the array into a 2D numpy array and then use the code above, as in the sketch below.
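A minimal sketch of that reshape approach, assuming "column" means one position along the last axis, pooled over the first two axes:
import numpy as np

a = np.arange(12).reshape(2, 2, 3)

# Collapse the first two axes so each 2D column holds all values
# of one last-axis position.
flat = a.reshape(-1, a.shape[-1])   # shape (4, 3)
print(flat.std(axis=0))             # [3.35410197 3.35410197 3.35410197]

# Equivalently, without reshaping:
print(a.std(axis=(0, 1)))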

Related

Correlation of columns of two arrays in python

I have two arrays: 900x421 and 900x147. I need to correlate all columns from these arrays so that the output is 421x147. In Matlab the function corr() does this, but I can't find a function that does the same in Python.
The numpy.corrcoef function is the way to go. You need both arguments x and y to be of the same shape, which you can arrange by concatenating the two arrays. Let's say arr1 is of shape 900x421 and arr2 is of shape 900x147. You can do the following:
import numpy as np
two_arrays = np.concatenate((arr1, arr2), axis=1) # 900x568
corr = np.corrcoef(two_arrays.T) # 568x568 array
desired_output = corr[0:421, 421:]
np.corrcoef treats each row as a variable and each column as an observation, which is why we need to transpose the array.
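A self-contained sketch with random stand-in data of the question's shapes, just to confirm the bookkeeping:
import numpy as np

arr1 = np.random.rand(900, 421)
arr2 = np.random.rand(900, 147)

two_arrays = np.concatenate((arr1, arr2), axis=1)  # 900x568
corr = np.corrcoef(two_arrays.T)                   # 568x568
desired_output = corr[:421, 421:]

print(desired_output.shape)  # (421, 147)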

Minimum difference of Numpy arrays

I have two 3-dimensional Numpy arrays of the same size. Their entries are similar, but not quite the same. I would like to shift one array in all three space dimensions, so that the difference between both arrays is minimal.
I tried to write a function with arguments
- list of lengths I like to shift the array,
- array 1,
- array 2.
But I do not know how to minimize this function. I tried using scipy.optimize.minimize, but failed:
import numpy as np
from scipy.optimize import minimize
def array_diff(shift, array1, array2):
    roll = np.roll(np.roll(np.roll(array2, shift[0], axis=0), shift[1], axis=1), shift[2], axis=2)
    diff = np.abs(np.subtract(array1, roll))
    diffs = np.sum(diff)
    return diffs

def opt_diff(func, array1, array2):
    opt = minimize(func, x0=np.zeros(3), args=(array1, array2))
    return opt

min_diff = opt_diff(array_diff, array1, array2)
This gives an error message regarding roll = np.roll(...). It says "slice indices must be integers or have an index method". I guess that I am not using the minimize function correctly, but have no idea how to fix it.
My goal is to minimize the function array_diff and get the minimum sum of all entries of the difference array. As a result, I would like to have the three parameters shift[0], shift[1] and shift[2] for the shift in the y-, x- and z-directions.
Thank you for all your help.
This gives an error message regarding roll = np.roll(...) It says
"slice indices must be integers or have an index method".
np.roll requires an integer for the shift parameter. np.zeros creates an array of floats. Specify an integer type for x0:
x0 = np.zeros(3, dtype=np.int32)
Compare the default float array with the integer version:
x0 = np.zeros(3)
x0
Out[3]: array([0., 0., 0.])
x0[0]
Out[4]: 0.0
x0 = np.zeros(3, dtype=np.int32)
x0[0]
Out[6]: 0
scipy.optimize.minimize will try to adjust x0 by fractional steps, so maybe just add a cast at the top of array_diff:
def array_diff(shift, array1, array2):
    shift = shift.astype(np.int32)
    ...
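Putting the suggested cast together, a minimal runnable sketch; array1 and array2 here are hypothetical stand-in data, and note this only fixes the type error - whether a gradient-based minimizer makes progress on an integer-valued objective is a separate question:
import numpy as np

def array_diff(shift, array1, array2):
    shift = np.asarray(shift).astype(np.int32)  # minimize passes floats; np.roll needs ints
    # np.roll accepts a tuple of shifts with a matching tuple of axes,
    # equivalent to the three nested calls above.
    rolled = np.roll(array2, tuple(shift), axis=(0, 1, 2))
    return np.abs(array1 - rolled).sum()

rng = np.random.default_rng(0)
array1 = rng.random((5, 5, 5))                       # hypothetical test data
array2 = np.roll(array1, (1, 2, 0), axis=(0, 1, 2))  # a known shift to recover

print(array_diff(np.zeros(3), array1, array2))              # runs without the slice-index error
print(array_diff(np.array([-1., -2., 0.]), array1, array2)) # 0.0 at the exact inverse shift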

Numpy array with different standard deviation per row

I'd like to get an NxM matrix where the numbers in each row are random samples generated from different normal distributions (same mean but different standard deviations). The following code works:
import numpy as np
mean = 0.0 # same mean
stds = [1.0, 2.0, 3.0] # different stds
matrix = np.random.random((3,10))
for i, std in enumerate(stds):
    matrix[i] = np.random.normal(mean, std, matrix.shape[1])
However, this code is not very efficient, as there is a for loop involved. Is there a faster way to do this?
np.random.normal() is vectorized; you can switch axes and transpose the result:
np.random.seed(444)
arr = np.random.normal(loc=0., scale=[1., 2., 3.], size=(1000, 3)).T
print(arr.mean(axis=1))
# [-0.06678394 -0.12606733 -0.04992722]
print(arr.std(axis=1))
# [0.99080274 2.03563299 3.01426507]
That is, the scale parameter is the column-wise standard deviation, hence the need to transpose via .T since you want row-wise inputs.
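A variant sketch that avoids the transpose entirely: the scale argument only needs to broadcast against size, so a (3, 1) column of standard deviations yields one distribution per row directly.
import numpy as np

stds = np.array([1., 2., 3.])
# scale with shape (3, 1) broadcasts against size=(3, 1000): one std per row.
arr = np.random.normal(loc=0., scale=stds[:, None], size=(3, 1000))
print(arr.std(axis=1))  # roughly [1., 2., 3.]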
How about this?
rows = 10000
stds = [1, 5, 10]
data = np.random.normal(size=(rows, len(stds)))
scaled = data * stds
print(np.std(scaled, axis=0))
Output:
[ 0.99417905 5.00908719 10.02930637]
This exploits the fact that two normal distributions can be interconverted by linear scaling (in this case, multiplying a standard normal sample by the desired standard deviation). In the output, each column (second axis) will contain a normally distributed variable corresponding to a value in stds.
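The same scaling trick can produce one distribution per row, as the question asks, by using a column-shaped multiplier (a sketch):
import numpy as np

stds = np.array([1., 5., 10.])
data = np.random.standard_normal((len(stds), 10000))
scaled = data * stds[:, None]   # (3, 1) multiplier broadcasts across columns
print(scaled.std(axis=1))       # roughly [1., 5., 10.]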

Replicating a matrix in pandas or numpy to a certain size

I have a matrix A which is (41, 41), a DataFrame.
B is a matrix of size (7154, 8240), an ndarray.
I want to replicate A (keeping the whole 41x41 matrix intact) to the size of B. It will not fit exactly, but it should just clip the rows that do not fit.
This is to be able to multiply A*B.
I tried this code, but it fails because I cannot multiply a list by a float:
repeat = pd.concat([A]*(B.shape[0]/A.shape[0]), axis=0, ignore_index=True)
filter_large = pd.concat([repeat]*(B.shape[1]/A.shape[1]), axis=1, ignore_index=True)
filter_l = filter_large.values # change from a dataframe to a numpy array
AB = A*filter_l
I should mention that I've tried numpy.resize, but it does not keep the matrix intact; it mixes up all the rows, which is not what I want.
This code will do what you ask for:
shapeMultiples = (np.ceil(B.shape[0]/A.shape[0]).astype(int), np.ceil(B.shape[1]/A.shape[1]).astype(int))
res = np.tile(A, shapeMultiples)[:B.shape[0], :B.shape[1]]
Explanation:
np.tile(A, reps) repeats the matrix A multiple times along each axis. How often it is repeated is specified for each axis in reps.
For your example it should be repeated B.shape[0]/A.shape[0] times along axis 0 and B.shape[1]/A.shape[1] times along axis 1. However, you have to round these values up to make sure the tiling covers the size of matrix B, which is what np.ceil does. Since reps is expected to be a shape of integers but ceil returns floats, we have to cast the type to int.
In the final step we cut the result off to fit the size of B with [:B.shape[0], :B.shape[1]].
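A small runnable illustration, with toy shapes standing in for the question's (41, 41) and (7154, 8240):
import numpy as np

A = np.arange(9).reshape(3, 3)  # stand-in for the 41x41 matrix
B = np.ones((7, 8))             # stand-in for the 7154x8240 ndarray

shapeMultiples = (np.ceil(B.shape[0] / A.shape[0]).astype(int),
                  np.ceil(B.shape[1] / A.shape[1]).astype(int))
res = np.tile(A, shapeMultiples)[:B.shape[0], :B.shape[1]]

print(res.shape)  # (7, 8) - same as B, so res * B now works elementwise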

How to create masked array with a vector that separates sections of a 2d array?

Let's say I have a standard 2D numpy array, call it my2darray, with values. In this array there are two major sections. Say for each column there is a specific row which separates "scenario1" and "scenario2". How can I create two masked arrays that represent the top section of my2darray and the bottom of my2darray? For example, I am interested in calculating the mean of the top half and the mean of the second half. One idea is to have a mask of the same shape as my2darray, but that seems like a waste of memory. Is there a better idea? Let's say I have a vector whose length is equal to the number of rows in my2darray (in this case 6), i.e. I have
myvector = np.array([9, 15, 5, 7, 11, 11])
I am using Python 2.6 with NumPy 1.5.0.
Using NumPy's broadcasted comparison, we can create such a 2D mask in a vectorized manner. The rest of the work is all about sum-reduction along the first axis, for which we can take help from np.einsum. Thus, we would have an implementation like so -
N = my2darray.shape[0]
mask = myvector <= np.arange(N)[:,None]
uout = np.true_divide(np.einsum('ij,ij->j',my2darray,~mask),myvector)
lout = np.true_divide(np.einsum('ij,ij->j',my2darray,mask),N-myvector)
Sample run to verify results -
In [184]: N = my2darray.shape[0]
...: mask = myvector <= np.arange(N)[:,None]
...: uout = np.true_divide(np.einsum('ij,ij->j',my2darray,~mask),myvector)
...: lout = np.true_divide(np.einsum('ij,ij->j',my2darray,mask),N-myvector)
...:
In [185]: uout
Out[185]: array([ 6. , 4.6, 4. , 0. ])
In [186]: [my2darray[:item,i].mean() for i,item in enumerate(myvector)]
Out[186]: [6.0, 4.5999999999999996, 4.0, 0.0] # Loopy version results
In [187]: lout
Out[187]: array([ 5.2 , 4. , 2.66666667, 2. ])
In [188]: [my2darray[item:,i].mean() for i,item in enumerate(myvector)]
Out[188]: [5.2000000000000002, 4.0, 2.6666666666666665, 2.0] # Loopy version
Another potentially faster way would be to calculate the summations for the upper mask, store them, and subtract that from the sum along the first axis over the entire length of the 2D input array. This could then be used for the calculation of the lower-part average. Thus, after we store N and calculate mask, we would have -
usum = np.einsum('ij,ij->j',my2darray,~mask)
uout = np.true_divide(usum,myvector)
lout = np.true_divide(my2darray.sum(0) - usum,N-myvector)
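A self-contained run with small hypothetical data (the question's actual my2darray is not shown), checking the vectorized results against the loopy per-column means:
import numpy as np

rng = np.random.default_rng(0)
my2darray = rng.integers(0, 10, size=(6, 4)).astype(float)
myvector = np.array([2, 3, 1, 4])  # hypothetical split row per column

N = my2darray.shape[0]
mask = myvector <= np.arange(N)[:, None]  # True in the lower section of each column
uout = np.true_divide(np.einsum('ij,ij->j', my2darray, ~mask), myvector)
lout = np.true_divide(np.einsum('ij,ij->j', my2darray, mask), N - myvector)

# Compare against the loopy reference from the sample run above.
print(np.allclose(uout, [my2darray[:m, i].mean() for i, m in enumerate(myvector)]))  # True
print(np.allclose(lout, [my2darray[m:, i].mean() for i, m in enumerate(myvector)]))  # True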
