I was running a simple experiment when I noticed a difference between R's and Python's FFT.
First, Python:
import numpy as np
from pyfftw.interfaces.numpy_fft import fft
a = np.array([1, 2, 3])
fft(a)
>> array([ 6. +0.j , -1.5+0.8660254j, -1.5-0.8660254j])
b = np.array([[1, 2, 3], [4, 5, 6]])
fft(b)
>> array([[ 6. +0.j , -1.5+0.8660254j, -1.5-0.8660254j],
          [15. +0.j , -1.5+0.8660254j, -1.5-0.8660254j]])
Now R:
> a = c(1, 2, 3)
> fft(a)
[1] 6.0+0.000000i -1.5+0.866025i -1.5-0.866025i
> b = rbind(c(1, 2, 3), c(4, 5, 6))
> fft(b)
      [,1]  [,2]         [,3]
[1,] 21+0i -3+1.732051i -3-1.732051i
[2,] -9+0i  0+0.000000i  0+0.000000i
I notice that the first row of the R result corresponds to the element-wise sum of the first and second rows of the Python result, whereas the second row of the R result corresponds to their element-wise difference.
What am I doing wrong? I ran the same experiment using np.matrix and an R matrix, but got the same results. Which one is the correct result when applying the FFT to a matrix or multidimensional array?
Following the suggestion in the comments, I did the following:
from pyfftw.interfaces.numpy_fft import fftn
b = np.array([[1, 2, 3], [4, 5, 6]])
fftn(b)
>> array([[21.+0.j, -3.+1.73205081j, -3.-1.73205081j],
          [-9.+0.j,  0.+0.j        ,  0.+0.j        ]])
which works with fft2 too.
Indeed, numpy's fft is a 1-D transform applied along the last axis (i.e. over each row) unless fft2 or fftn is used, whereas R's fft computes the multidimensional transform on a matrix or array.
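To see this concretely, the 2-D transform is just the 1-D transform applied along each axis in turn. A minimal sketch with plain numpy:
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

# numpy's default: a 1-D FFT over each row (the last axis)
rows_only = np.fft.fft(b)

# applying the 1-D FFT along both axes reproduces the full 2-D
# transform, i.e. what np.fft.fft2 (and R's fft) compute
both_axes = np.fft.fft(np.fft.fft(b, axis=1), axis=0)
print(np.allclose(both_axes, np.fft.fft2(b)))  # True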
I'm trying to experiment with using matrices to solve polynomial expressions, and it has worked so far.
In my code:
import numpy as np
A = np.array([[1, 1, 1], [4, 2, 1], [9, 3, 1]])
B = np.array([2, 5, 10])
sol = np.linalg.solve(A, B)
print(sol)
Array A holds the rows [x^2, x, 1] for x = 1, 2, 3, the first three points required to solve for a quadratic. Array B holds the corresponding values of x^2 + 1.
So the output of the function should be:
[1. 0. 1.]
Instead, I'm getting:
[ 1.00000000e+00 -8.32667268e-16 1.00000000e+00]
I get the e+00, but why is the second value "-8.32667268e-16"??
I've double checked my math and it should be x^2 + 1.
Well technically, it's -8x10^(-16) ≈ 0, hence the answer is correct.
If you want it displayed as exactly 0, round the result, for example with Python's round or np.round.
-8.32667268e-16 = -0.0000000000000008326… It's a floating-point rounding error. See the Python documentation on floating-point arithmetic: issues and limitations.
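If the goal is just to check that the solution is effectively [1, 0, 1], compare with a tolerance instead of exact equality. A minimal sketch:
import numpy as np

sol = np.array([1.00000000e+00, -8.32667268e-16, 1.00000000e+00])

# allclose compares within small relative/absolute tolerances, which is
# the right way to test floating-point results for equality
print(np.allclose(sol, [1.0, 0.0, 1.0]))  # True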
Because during the matrix operations the numbers get rounded slightly by floating-point arithmetic. Try:
import numpy as np
A = np.array([[1, 1, 1],
              [4, 2, 1],
              [9, 3, 1]])
B = np.array([2, 5, 10])
sol = np.round(np.linalg.solve(A, B))
print(sol)
Output:
[ 1. -0.  1.]
(-0. is a signed floating-point zero; it compares equal to 0.)
I am trying to interpolate a 2D numpy matrix with the dimensions (5, 3) to a matrix with the dimensions (7, 3) along axis 0 (rows). Obviously, the wrong approach would be to randomly insert rows between the rows of the original matrix, as in the following example:
Source:
[[0, 1, 1]
 [0, 2, 0]
 [0, 3, 1]
 [0, 4, 0]
 [0, 5, 1]]
Target (terrible interpolation -> not wanted!):
[[0, 1, 1]
 [0, 1.5, 0.5]
 [0, 2, 0]
 [0, 3, 1]
 [0, 3.5, 0.5]
 [0, 4, 0]
 [0, 5, 1]]
The correct approach would be to take every row into account and interpolate between all of them to expand the source matrix to a (7, 3) matrix. I am aware of the scipy.interpolate.interp1d and scipy.interpolate.interp2d methods, but could not get them to work from other Stack Overflow posts or websites. Any tips or tricks would be appreciated.
Update #1: The expected values should be equally spaced.
Update #2:
What I want to do is basically use the separate columns of the original matrix, expand the length of the column to 7 and interpolate between the values of the original column. See the following example:
Source:
[[0, 1, 1]
 [0, 2, 0]
 [0, 3, 1]
 [0, 4, 0]
 [0, 5, 1]]
Split into 3 separate Columns:
[0    [1    [1
 0     2     0
 0     3     1
 0     4     0
 0]    5]    1]
Expand length to 7 and interpolate between them, example for second column:
[1
1.66
2.33
3
3.66
4.33
5]
It seems like each column can be treated completely independently, but for each column you need to define an "x" coordinate so that you can fit some function "f(x)" from which to generate your output matrix.
Unless the rows in your matrix are associated with some other data structure (e.g. a vector of timestamps), an obvious set of x values is just the row number:
x = numpy.arange(0, Source.shape[0])
You can then construct an interpolating function:
fit = scipy.interpolate.interp1d(x, Source, axis=0)
and use that to construct your output matrix:
Target = fit(numpy.linspace(0, Source.shape[0]-1, 7))
which produces:
array([[ 0.        ,  1.        ,  1.        ],
       [ 0.        ,  1.66666667,  0.33333333],
       [ 0.        ,  2.33333333,  0.33333333],
       [ 0.        ,  3.        ,  1.        ],
       [ 0.        ,  3.66666667,  0.33333333],
       [ 0.        ,  4.33333333,  0.33333333],
       [ 0.        ,  5.        ,  1.        ]])
By default, scipy.interpolate.interp1d uses piecewise-linear interpolation. There are many more exotic options within scipy.interpolate, based on higher order polynomials, etc. Interpolation is a big topic in itself, and unless the rows of your matrix have some particular properties (e.g. being regular samples of a signal with a known frequency range), there may be no "truly correct" way of interpolating. So, to some extent, the choice of interpolation scheme will be somewhat arbitrary.
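For instance, switching to a cubic fit is a one-argument change; whether it is more appropriate than linear depends entirely on your data. A minimal sketch:
import numpy as np
from scipy.interpolate import interp1d

Source = np.array([[0, 1, 1],
                   [0, 2, 0],
                   [0, 3, 1],
                   [0, 4, 0],
                   [0, 5, 1]], dtype=float)

x = np.arange(Source.shape[0])
fit = interp1d(x, Source, kind="cubic", axis=0)  # cubic spline through each column
Target = fit(np.linspace(0, Source.shape[0] - 1, 7))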
You can do this as follows:
from scipy.interpolate import interp1d
import numpy as np
a = np.array([[0, 1, 1],
              [0, 2, 0],
              [0, 3, 1],
              [0, 4, 0],
              [0, 5, 1]])
x = np.arange(a.shape[0])
# define new x range, we need 7 equally spaced values
xnew = np.linspace(x.min(), x.max(), 7)
# apply the interpolation to each column
f = interp1d(x, a, axis=0)
# get final result
print(f(xnew))
This will print
[[ 0.          1.          1.        ]
 [ 0.          1.66666667  0.33333333]
 [ 0.          2.33333333  0.33333333]
 [ 0.          3.          1.        ]
 [ 0.          3.66666667  0.33333333]
 [ 0.          4.33333333  0.33333333]
 [ 0.          5.          1.        ]]
I have an array A whose shape is (N, N, K) and I would like to compute another array B with the same shape where B[:, :, i] = np.linalg.inv(A[:, :, i]).
As solutions, I see map and for loops, but I am wondering whether numpy provides a function to do this (I have tried np.apply_over_axes but it seems it can only handle 1D arrays).
with a for loop:
B = np.zeros(shape=A.shape)
for i in range(A.shape[2]):
    B[:, :, i] = np.linalg.inv(A[:, :, i])
with map:
B = np.asarray(list(map(np.linalg.inv, np.squeeze(np.dsplit(A, A.shape[2]))))).transpose(1, 2, 0)
(the list call is needed on Python 3, where map returns a lazy iterator)
For an invertible matrix M we have inv(M).T == inv(M.T) (the transpose of the inverse is equal to the inverse of the transpose).
Since np.linalg.inv operates on stacks of matrices (the inversion is applied over the last two axes), your problem can be solved by simply transposing A, calling inv, and transposing the result:
B = np.linalg.inv(A.T).T
For example:
>>> N, K = 2, 3
>>> A = np.random.randint(1, 5, (N, N, K))
>>> A
array([[[4, 2, 3],
        [2, 3, 1]],

       [[3, 3, 4],
        [4, 4, 4]]])
>>> B = np.linalg.inv(A.T).T
>>> B
array([[[ 0.4  , -4.   ,  0.5  ],
        [-0.2  ,  3.   , -0.125]],

       [[-0.3  ,  3.   , -0.5  ],
        [ 0.4  , -2.   ,  0.375]]])
You can check the values of B match the inverses of the arrays in A as expected:
>>> all(np.allclose(B[:, :, i], np.linalg.inv(A[:, :, i])) for i in range(K))
True
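An equivalent and arguably more explicit formulation moves the stacking axis to the front, inverts, and moves it back, since np.linalg.inv treats the last two axes as the matrix dimensions:
B = np.moveaxis(np.linalg.inv(np.moveaxis(A, 2, 0)), 0, 2)
Unlike A.T, this does not transpose the individual N x N matrices, so it does not rely on the inv(M.T) == inv(M).T identity.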
My stack is something like this:
array([[[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]],

       [[2, 2, 2],
        [2, 2, 2],
        [2, 2, 2]]])
I want this result:
array([[ 1.5,  2. ,  2.5],
       [ 3. ,  3.5,  4. ],
       [ 4.5,  5. ,  5.5]])
I updated my question; I think it's clearer now.
Well, first, you don't have a stack of 2D arrays, you have three separate variables.
Fortunately, most functions in NumPy take an array_like argument. And the tuple (a, b, c) is "array-like" enough—it'll be converted into the 3D array that you should have had in the first place.
Anyway, the obvious function to take the mean is np.mean. As the docs say:
The average is taken over the flattened array by default, otherwise over the specified axis.
So just specify the axis you want—the newly-created axis 0.
np.mean((a,b,c), axis=0)
In your updated question, you now have a single 2x3x3 array, a, instead of three 2x2 arrays, a, b, and c, and you want the mean across the first axis (the one with dimension 2). This is the same thing, but slightly easier:
np.mean(a, axis=0)
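With the updated 2x3x3 stack, that gives exactly the result you asked for:
>>> a = np.array([[[1, 2, 3],
...                [4, 5, 6],
...                [7, 8, 9]],
...               [[2, 2, 2],
...                [2, 2, 2],
...                [2, 2, 2]]])
>>> np.mean(a, axis=0)
array([[ 1.5,  2. ,  2.5],
       [ 3. ,  3.5,  4. ],
       [ 4.5,  5. ,  5.5]])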
Of course the mean of 4, 7, and 3 is 4.666666666666667, not 4. In your updated question, that seems to be what you want; in your original question… I'm not sure if you wanted to truncate or round, or if you wanted the median or something else rather than the mean, or anything else, but those are all easy (add dtype=np.int64 to the call, call .round() on the result, call median instead of mean, etc.).
>>> a = np.array([[1,2],[3,4]])
>>> b = np.array([[1,5],[6,7]])
>>> c = np.array([[1,8],[8,3]])
>>> np.mean((a,b,c), axis=0)
array([[ 1.        ,  5.        ],
       [ 5.66666667,  4.66666667]])
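A quick sketch of the truncating and rounding variants mentioned above, using the same a, b, and c:
>>> np.mean((a, b, c), axis=0, dtype=np.int64)  # integer accumulator: truncated means
array([[1, 5],
       [5, 4]])
>>> np.mean((a, b, c), axis=0).round()          # rounded to the nearest integer
array([[ 1.,  5.],
       [ 6.,  5.]])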
As per your output it seems you are looking for median rather than mean.
>>> np.median((a,b,c), axis=0)
array([[ 1.,  5.],
       [ 6.,  4.]])
I have data in a file in following form:
user_id, item_id, rating
1, abc,5
1, abcd,3
2, abc, 3
2, fgh, 5
So, the matrix I want to form for above data is following:
# item_ids:  abc  abcd  fgh
[[5, 3, 0]   # user_id 1
 [3, 0, 5]]  # user_id 2
where missing data is replaced by 0.
From this I want to create both a user-to-user similarity matrix and an item-to-item similarity matrix. How do I do that?
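Update: one way to build the matrix itself is a pandas pivot_table. A minimal sketch (the filename is an assumption; skipinitialspace handles the stray spaces in the sample data):
import pandas as pd

df = pd.read_csv("ratings.csv", skipinitialspace=True)
mat = df.pivot_table(index="user_id", columns="item_id",
                     values="rating", fill_value=0)
# mat.to_numpy() is the user x item matrix, with missing ratings as 0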
Technically, this is not a programming problem but a math problem. But I think you're better off using a variance-covariance matrix, or a correlation matrix if the scales of the values are very different, say, instead of having:
>>> x
array([[5, 3, 0],
       [3, 0, 5],
       [5, 5, 0],
       [1, 1, 7]])
You have:
>>> x
array([[  5, 300,   0],
       [  3,   0,   5],
       [  5, 500,   0],
       [  1, 100,   7]])
To get a variance-cov matrix:
>>> np.cov(x)
array([[  6.33333333,  -3.16666667,   6.66666667,  -8.        ],
       [ -3.16666667,   6.33333333,  -5.83333333,   7.        ],
       [  6.66666667,  -5.83333333,   8.33333333, -10.        ],
       [ -8.        ,   7.        , -10.        ,  12.        ]])
Or the correlation matrix:
>>> np.corrcoef(x)
array([[ 1.        , -0.5       ,  0.91766294, -0.91766294],
       [-0.5       ,  1.        , -0.80295507,  0.80295507],
       [ 0.91766294, -0.80295507,  1.        , -1.        ],
       [-0.91766294,  0.80295507, -1.        ,  1.        ]])
Here is the way to look at it: a diagonal cell, e.g. cell (0, 0), is the correlation of your 1st vector in x with itself, so it is 1. An off-diagonal cell, e.g. cell (0, 1), is the correlation between the 1st and 2nd vectors in x; here they are negatively correlated. Similarly, the 1st and 3rd vectors are positively correlated.
A covariance or correlation matrix avoids the zero problem pointed out by @Akavall.
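Note that np.cov and np.corrcoef treat each row as a variable, so the matrices above are user-to-user. For an item-to-item matrix, pass the transpose:
import numpy as np

x = np.array([[5, 3, 0],
              [3, 0, 5],
              [5, 5, 0],
              [1, 1, 7]])

user_user = np.corrcoef(x)    # 4x4, one row/column per user
item_item = np.corrcoef(x.T)  # 3x3, one row/column per item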
See this question: What's the fastest way in Python to calculate cosine similarity given sparse matrix data?
Having:
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

A = np.array([[0, 1, 0, 0, 1],
              [0, 0, 1, 1, 1],
              [1, 1, 0, 1, 0]])
dist_out = 1 - pairwise_distances(A, metric="cosine")
dist_out
Results in:
array([[ 1.        ,  0.40824829,  0.40824829],
       [ 0.40824829,  1.        ,  0.33333333],
       [ 0.40824829,  0.33333333,  1.        ]])
That works out of the box for dense arrays; for very large sparse matrices, the linked question discusses faster approaches.
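Note that sklearn's cosine_similarity also accepts scipy sparse input directly. A minimal sketch, reusing A from above:
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

A_sparse = sparse.csr_matrix(A)
sim_rows = cosine_similarity(A_sparse)    # row-to-row (e.g. user-to-user)
sim_cols = cosine_similarity(A_sparse.T)  # column-to-column (e.g. item-to-item)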