I have two HxW matrices A and B. I'd like to get an NxHxW matrix C such that C[0]=A, C[-1]=B, and each of the remaining N-2 slices are linearly interpolated between A and B. Is there a single numpy function I can do this with, without needing a for loop?
Just use linspace if you are looking for linear interpolation between just 2 points.
A = np.array([[0, 1],
              [2, 3]])
B = np.array([[1, 3],
              [-1, -2]])
C = np.linspace(A, B, 4)  # <-- change 4 to N: N total slices, of which N-2 are interpolated between A and B
C
array([[[ 0.        ,  1.        ],    # <-- C[0] is the A matrix
        [ 2.        ,  3.        ]],

       [[ 0.33333333,  1.66666667],    # <-- elementwise equally spaced
        [ 1.        ,  1.33333333]],   #     values between A and B

       [[ 0.66666667,  2.33333333],
        [ 0.        , -0.33333333]],

       [[ 1.        ,  3.        ],    # <-- C[-1] is the B matrix
        [-1.        , -2.        ]]])
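The same call generalizes directly: for an NxHxW result, pass N as the third argument. A minimal sketch (assuming numpy >= 1.16, where np.linspace accepts array endpoints):

import numpy as np

H, W, N = 3, 5, 6                # N - 2 = 4 interpolated slices
A = np.random.rand(H, W)
B = np.random.rand(H, W)
C = np.linspace(A, B, N)         # shape (N, H, W); interpolates along axis 0
print(C.shape)                                       # (6, 3, 5)
print(np.allclose(C[0], A), np.allclose(C[-1], B))   # True True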
I have an arbitrary row vector "u" and an arbitrary matrix "e" as follows:
u = np.resize(np.array([8,3]),[1,2])
e = np.resize(np.array([[2,2,5,5],[1, 6, 7, 4]]),[4,2])
np.cov(u,e)
array([[ 12.5, 0. , 0. , -12.5, 7.5],
[ 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ],
[-12.5, 0. , 0. , 12.5, -7.5],
[ 7.5, 0. , 0. , -7.5, 4.5]])
The matrix that this returns is 5x5. This is confusing to me because the largest dimension of the inputs is only 4.
Thus, this may be less of a numpy question and more of a math question...not sure...
Please refer to the official numpy documentation (https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.cov.html) and check whether your usage of the numpy.cov function is consistent with what you are trying to achieve.
Looking at the signature
numpy.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None)
m : array_like
A 1-D or 2-D array containing multiple variables and observations.
Each row of m represents a variable, and each column a single observation of all those variables. Also see rowvar below.
y : array_like, optional
An additional set of variables and observations. y has the same form as that of m.
Note how m and y are combined, as shown in the last example on the page:
>>> x = [-2.1, -1, 4.3]
>>> y = [3, 1.1, 0.12]
>>> X = np.stack((x, y), axis=0)
>>> print(np.cov(X))
[[ 11.71 -4.286 ]
[ -4.286 2.14413333]]
>>> print(np.cov(x, y))
[[ 11.71 -4.286 ]
[ -4.286 2.14413333]]
>>> print(np.cov(x))
11.71
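In the question above, u contributes 1 row (variable) and e contributes 4, so np.cov sees 1 + 4 = 5 variables with 2 observations each, hence the 5x5 result. A quick check of that stacking behaviour:

import numpy as np

u = np.resize(np.array([8, 3]), [1, 2])                        # 1 variable, 2 observations
e = np.resize(np.array([[2, 2, 5, 5], [1, 6, 7, 4]]), [4, 2])  # 4 variables, 2 observations

# Passing y is equivalent to stacking its rows under m first.
print(np.allclose(np.cov(u, e), np.cov(np.vstack((u, e)))))    # True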
I have the following numpy array:
foo = np.array([[0.0, 10.0], [0.13216, 12.11837], [0.25379, 42.05027], [0.30874, 13.11784]])
which yields:
[[ 0. 10. ]
[ 0.13216 12.11837]
[ 0.25379 42.05027]
[ 0.30874 13.11784]]
How can I normalize the Y component of this array, so it gives me something like:
[[ 0. 0. ]
[ 0.13216 0.06 ]
[ 0.25379 1 ]
[ 0.30874 0.097]]
Referring to this Cross Validated link, How to normalize data to 0-1 range?, it looks like you can perform min-max normalisation on the last column of foo.
v = foo[:, 1] # foo[:, -1] for the last column
foo[:, 1] = (v - v.min()) / (v.max() - v.min())
foo
array([[ 0. , 0. ],
[ 0.13216 , 0.06609523],
[ 0.25379 , 1. ],
[ 0.30874 , 0.09727968]])
Another option for performing normalisation (as suggested by the OP) is sklearn.preprocessing.normalize, which yields slightly different results:
from sklearn.preprocessing import normalize
foo[:, [-1]] = normalize(foo[:, -1, None], norm='max', axis=0)
foo
array([[ 0. , 0.2378106 ],
[ 0.13216 , 0.28818769],
[ 0.25379 , 1. ],
[ 0.30874 , 0.31195614]])
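The difference is definitional: with norm='max', sklearn divides the column by its maximum rather than first subtracting the minimum, so the smallest entry maps to min/max instead of 0. A quick check against the output above:

import numpy as np

v = np.array([10.0, 12.11837, 42.05027, 13.11784])
print(v / v.max())   # equivalent to normalize(..., norm='max') for positive data
# [0.2378106  0.28818769 1.         0.31195614]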
sklearn.preprocessing.MinMaxScaler can also be used (feature_range=(0, 1) is default):
from sklearn import preprocessing
min_max_scaler = preprocessing.MinMaxScaler()
v = foo[:, 1].reshape(-1, 1)                  # fit_transform expects a 2-D array
v_scaled = min_max_scaler.fit_transform(v)
foo[:, 1] = v_scaled.ravel()
print(foo)
Output:
[[ 0. 0. ]
[ 0.13216 0.06609523]
[ 0.25379 1. ]
[ 0.30874 0.09727968]]
The advantage is that scaling to any target range can be done; see the sketch below.
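For instance, a sketch targeting the range (0, 10) instead of the default:

import numpy as np
from sklearn import preprocessing

foo = np.array([[0.0, 10.0], [0.13216, 12.11837],
                [0.25379, 42.05027], [0.30874, 13.11784]])

scaler = preprocessing.MinMaxScaler(feature_range=(0, 10))
foo[:, 1] = scaler.fit_transform(foo[:, 1].reshape(-1, 1)).ravel()
print(foo[:, 1])   # [ 0.         0.6609523 10.         0.9727968]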
I think you want this:
foo[:,1] = (foo[:,1] - foo[:,1].min()) / (foo[:,1].max() - foo[:,1].min())
You are trying to min-max scale only the second column between 0 and 1.
Using sklearn.preprocessing.minmax_scale should easily solve your problem.
e.g.:
from sklearn.preprocessing import minmax_scale
column_1 = foo[:,0] #first column you don't want to scale
column_2 = minmax_scale(foo[:,1], feature_range=(0,1)) #second column you want to scale
foo_norm = np.stack((column_1, column_2), axis=1) #stack both columns to get a 2d array
Should yield
array([[0. , 0. ],
[0.13216 , 0.06609523],
[0.25379 , 1. ],
[0.30874 , 0.09727968]])
Maybe you want to min-max scale both columns between 0 and 1. In this case, use:
foo_norm = minmax_scale(foo, feature_range=(0,1), axis=0)
Which yields
array([[0. , 0. ],
[0.42806245, 0.06609523],
[0.82201853, 1. ],
[1. , 0.09727968]])
Note: not to be confused with the operation that scales the norm (length) of a vector to a certain value (usually 1), which is also commonly referred to as normalization.
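For contrast, a short sketch of that other operation, dividing by the Euclidean norm so the vector has length 1:

import numpy as np

v = np.array([10.0, 12.11837, 42.05027, 13.11784])
unit = v / np.linalg.norm(v)    # rescales the vector to unit length
print(np.linalg.norm(unit))     # 1.0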
I recently posted a question here which was answered exactly as I asked. However, I think I overestimated my ability to manipulate the answer further. I read the broadcasting doc, and followed a few links that led me way back to 2002 about numpy broadcasting.
I've used the second method of array creation using broadcasting:
N = 10
out = np.zeros((N**3,4),dtype=int)
out[:,:3] = (np.arange(N**3)[:,None]/[N**2,N,1])%N
which outputs:
[[0,0,0,0]
[0,0,1,0]
...
[0,1,0,0]
[0,1,1,0]
...
[9,9,8,0]
[9,9,9,0]]
but I do not understand from the docs how to manipulate that. I would ideally like to be able to set the increment by which each individual column changes.
E.g. column A changes by 0.5 up to 2, column B changes by 0.2 up to 1, and column C changes by 1 up to 10.
[[0,0,0,0]
[0,0,1,0]
...
[0,0,9,0]
[0,0.2,0,0]
...
[0,0.8,9,0]
[0.5,0,0,0]
...
[1.5,0.8,9,0]]
Thanks for any help.
You can adjust your current code just a little bit to make it work.
>>> out = np.zeros((4*5*10,4))
>>> out[:,:3] = (np.arange(4*5*10)[:,None]//(5*10, 10, 1)*(0.5, 0.2, 1)%(2, 1, 10))
>>> out
array([[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1. , 0. ],
[ 0. , 0. , 2. , 0. ],
...
[ 0. , 0. , 8. , 0. ],
[ 0. , 0. , 9. , 0. ],
[ 0. , 0.2, 0. , 0. ],
...
[ 0. , 0.8, 9. , 0. ],
[ 0.5, 0. , 0. , 0. ],
...
[ 1.5, 0.8, 9. , 0. ]])
The changes are:
- No int dtype on the array, since we need it to hold floats in some columns. You could specify a float dtype if you want (or even something more complicated that only allows floats in the first two columns).
- Rather than N**3 total values, figure out the number of distinct values for each column, and multiply them together to get our total size. This is used for both zeros and arange.
- Use the floor division // operator in the first broadcast operation because we want integers at this point, but later we'll want floats.
- The values to divide by are again based on the number of values for the later columns (e.g. for A, B, C numbers of values, divide by B*C, C, 1).
- Add a new broadcast operation to multiply by various scale factors (how much each value increases at once).
- Change the values in the broadcast mod % operation to match the bounds on each column.
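Putting those changes together, a hypothetical helper (stepped_grid is just a name for this sketch, not a numpy function) that builds the table from per-column increments and exclusive upper bounds:

import numpy as np

def stepped_grid(steps, stops):
    # Number of distinct values per column, e.g. step 0.5 up to 2 -> 4 values.
    counts = [int(round(stop / step)) for step, stop in zip(steps, stops)]
    total = int(np.prod(counts))
    # How many rows pass before each column ticks over once.
    periods = [int(np.prod(counts[i + 1:])) for i in range(len(counts))]
    out = np.zeros((total, len(counts) + 1))
    out[:, :len(counts)] = np.arange(total)[:, None] // periods * steps % stops
    return out

grid = stepped_grid(steps=(0.5, 0.2, 1), stops=(2, 1, 10))
print(grid.shape)   # (200, 4)
print(grid[-1])     # [1.5 0.8 9.  0. ]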
This small example helps me understand what is going on:
In [123]: N=2
In [124]: np.arange(N**3)[:,None]/[N**2, N, 1]
Out[124]:
array([[ 0. , 0. , 0. ],
[ 0.25, 0.5 , 1. ],
[ 0.5 , 1. , 2. ],
[ 0.75, 1.5 , 3. ],
[ 1. , 2. , 4. ],
[ 1.25, 2.5 , 5. ],
[ 1.5 , 3. , 6. ],
[ 1.75, 3.5 , 7. ]])
So we generate a range of numbers (0 to 7) and divide them by 4, 2, and 1.
The rest of the calculation just changes each value without further broadcasting. Apply %N to each element:
In [126]: np.arange(N**3)[:,None]/[N**2, N, 1]%N
Out[126]:
array([[ 0. , 0. , 0. ],
[ 0.25, 0.5 , 1. ],
[ 0.5 , 1. , 0. ],
[ 0.75, 1.5 , 1. ],
[ 1. , 0. , 0. ],
[ 1.25, 0.5 , 1. ],
[ 1.5 , 1. , 0. ],
[ 1.75, 1.5 , 1. ]])
Assigning to an int array is the same as converting the floats to integers:
In [127]: (np.arange(N**3)[:,None]/[N**2, N, 1]%N).astype(int)
Out[127]:
array([[0, 0, 0],
[0, 0, 1],
[0, 1, 0],
[0, 1, 1],
[1, 0, 0],
[1, 0, 1],
[1, 1, 0],
[1, 1, 1]])
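Another way to read the integer result: the three columns are just the base-N digits of the row index, most significant first. A small check, assuming the same N=2 setup:

import numpy as np

N = 2
digits = np.arange(N**3)[:, None] // [N**2, N, 1] % N   # // keeps everything integer
print(digits.T)
# [[0 0 0 0 1 1 1 1]
#  [0 0 1 1 0 0 1 1]
#  [0 1 0 1 0 1 0 1]]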
Suppose we had two arrays: some values, e.g. array([1.2, 1.4, 1.6]), and some indices (let's say, array([0, 2, 1])). Our output is expected to be the values put into a bigger array, "addressed" by the indices, so we would get:
array([[ 1.2, 0. , 0. ],
[ 0. , 0. , 1.4],
[ 0. , 1.6, 0. ]])
Is there a way to do this without loops, in a nice, fast way?
With
a = np.zeros((3, 3))
b = np.array([0, 2, 1])
vals = np.array([1.2, 1.4, 1.6])
You just need to index it (with the help of arange or r_):
>>> a[np.r_[:len(b)], b] = vals
>>> a
array([[ 1.2, 0. , 0. ],
[ 0. , 0. , 1.4],
[ 0. , 1.6, 0. ]])
How do we modify this for higher dimensions? For example, a is a 5x4x3 array and b and vals are 5x4 arrays.
How would we then modify the statement a[np.r_[:len(b)], b] = vals?
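One way to extend the idea, as a sketch: build index grids for the leading axes with np.indices and reuse the same fancy-indexing assignment (here b holds indices into the last axis):

import numpy as np

a = np.zeros((5, 4, 3))
b = np.random.randint(0, 3, size=(5, 4))   # index into the last axis
vals = np.random.rand(5, 4)

i, j = np.indices(b.shape)                 # row and column index grids, each 5x4
a[i, j, b] = vals                          # a[r, c, b[r, c]] = vals[r, c]
print(np.allclose(a[i, j, b], vals))       # True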