This is an incomplete Python snippet of convolution with FFT.
I want to modify it to support both 1) "valid" convolution
and 2) "full" convolution.
import numpy as np
from numpy.fft import fft2, ifft2
image = np.array([[3, 2, 5, 6, 7, 8],
                  [5, 4, 2, 10, 8, 1]])
kernel = np.array([[4, 5],
                   [1, 2]])

fft_size = # what size should I put here for,
           # 1) valid convolution
           # 2) full convolution
convolution = ifft2(fft2(image, fft_size) * fft2(kernel, fft_size))
Thank you in advance.
In the case of 1-dimensional arrays x and y with lengths L and M, respectively, you need to pad the FFT to size L + M - 1 for mode="full". For the 2-d case, apply that rule to each axis.
Using numpy, you can compute the size in the 2-d case with
np.array(x.shape) + np.array(y.shape) - 1
To implement the "valid" mode, you'll have to compute the "full" result and then slice out the valid part. For 1-d, assuming L > M, the valid data is the L - M + 1 elements in the center of the full data. Again, apply the same rule to each axis in the 2-d case.
For example,
import numpy as np
from numpy.fft import fft2, ifft2
def fftconvolve2d(x, y, mode="full"):
    """
    x and y must be real 2-d numpy arrays.
    mode must be "full" or "valid".
    """
    x_shape = np.array(x.shape)
    y_shape = np.array(y.shape)
    z_shape = x_shape + y_shape - 1
    z = ifft2(fft2(x, z_shape) * fft2(y, z_shape)).real

    if mode == "valid":
        # To compute a valid shape, either np.all(x_shape >= y_shape) or
        # np.all(y_shape >= x_shape) must hold.
        valid_shape = x_shape - y_shape + 1
        if np.any(valid_shape < 1):
            valid_shape = y_shape - x_shape + 1
            if np.any(valid_shape < 1):
                raise ValueError("empty result for valid shape")
        start = (z_shape - valid_shape) // 2
        end = start + valid_shape
        z = z[start[0]:end[0], start[1]:end[1]]

    return z
Here's the function applied to your example data:
In [146]: image
Out[146]:
array([[ 3,  2,  5,  6,  7,  8],
       [ 5,  4,  2, 10,  8,  1]])

In [147]: kernel
Out[147]:
array([[4, 5],
       [1, 2]])

In [148]: fftconvolve2d(image, kernel, mode="full")
Out[148]:
array([[  12.,   23.,   30.,   49.,   58.,   67.,   40.],
       [  23.,   49.,   37.,   66.,  101.,   66.,   21.],
       [   5.,   14.,   10.,   14.,   28.,   17.,    2.]])

In [149]: fftconvolve2d(image, kernel, mode="valid")
Out[149]: array([[  49.,   37.,   66.,  101.,   66.]])
More error checking could be added, and the function could be modified to handle complex and n-dimensional arrays. It would also be nice if the padded size were chosen to make the FFT calculation more efficient. If you made all those enhancements, you might end up with something like scipy.signal.fftconvolve (https://github.com/scipy/scipy/blob/master/scipy/signal/signaltools.py#L210):
In [152]: from scipy.signal import fftconvolve
In [153]: fftconvolve(image, kernel, mode="full")
Out[153]:
array([[  12.,   23.,   30.,   49.,   58.,   67.,   40.],
       [  23.,   49.,   37.,   66.,  101.,   66.,   21.],
       [   5.,   14.,   10.,   14.,   28.,   17.,    2.]])

In [154]: fftconvolve(image, kernel, mode="valid")
Out[154]: array([[  49.,   37.,   66.,  101.,   66.]])
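On the efficiency point: one common approach (a sketch of the idea, not how scipy implements it) is to pad each axis up to a fast FFT length with scipy.fft.next_fast_len, then slice the extra padding off afterwards:

import numpy as np
from numpy.fft import fft2, ifft2
from scipy.fft import next_fast_len

def fftconvolve2d_fast(x, y):
    # "full" output shape, as in fftconvolve2d above
    z_shape = np.array(x.shape) + np.array(y.shape) - 1
    # round each axis up to a size with small prime factors (faster FFT)
    fast_shape = [next_fast_len(int(n)) for n in z_shape]
    z = ifft2(fft2(x, fast_shape) * fft2(y, fast_shape)).real
    # discard the extra padding
    return z[:z_shape[0], :z_shape[1]]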
Related
I want to filter a numpy array by its second column, but why can't I save the result to filter_arr = []?
The code:
import numpy as np
data = [
    [0.0, 52.0, 33.0, 44.0, 51.0],
    [0.0, 30.0, 45.0, 12.0, 44.0],
    [0.0, 67.0, 99.0, 23.0, 78.0]
]
arr = np.array(data)
filter_arr = []
for i in range(0, len(arr)):
    if arr[:1, i] > 50:
        filter_arr.append(arr)
filter_arr = np.array(filter_arr)
filter_arr
The filtered array should be:
array([[ 0., 52., 33., 44., 51.],
       [ 0., 67., 99., 23., 78.]])
You could try the below; it will filter whole rows by the second column:
import numpy as np
data = [
    [0.0, 52.0, 33.0, 44.0, 51.0],
    [0.0, 30.0, 45.0, 12.0, 44.0],
    [0.0, 67.0, 99.0, 23.0, 78.0]
]
arr = np.array(data)
filter_arr = []
for i in range(len(arr)):
    if arr[i, 1] > 50:             # test the second column of row i
        filter_arr.append(arr[i])  # append just that row
filter_arr = np.array(filter_arr)
filter_arr
arr[np.max(arr, axis=1) > 50, :]
(Note this keeps rows whose row-wise maximum exceeds 50, which happens to match the desired result here.)
Or as a function:
def filter_rows(arr, threshold=50):
    return arr[np.max(arr, axis=1) > threshold, :]
Based on https://stackoverflow.com/a/70185464/3957794
Don't use loops with numpy, the correct approach here is to use numpy slicing/indexing:
filter_arr = arr[arr[:, 1]>50]
Output:
array([[ 0., 52., 33., 44., 51.],
       [ 0., 67., 99., 23., 78.]])
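To see why this works, the expression inside the brackets builds a boolean mask with one entry per row, and the outer indexing keeps the rows where the mask is True. A minimal sketch:

import numpy as np

arr = np.array([[0.0, 52.0, 33.0, 44.0, 51.0],
                [0.0, 30.0, 45.0, 12.0, 44.0],
                [0.0, 67.0, 99.0, 23.0, 78.0]])

mask = arr[:, 1] > 50   # second column of every row
print(mask)             # [ True False  True]
print(arr[mask])        # keeps rows 0 and 2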
So I'm trying to take the dot product of two arrays using numpy's dot product function.
import numpy as np
MWFrPos_Hydro1 = subPos1[submaskFirst1]
x = MWFrPos_Hydro1
MWFrVel_Hydro1 = subVel1[submaskFirst1]
y = MWFrVel_Hydro1
MWFrPosMag_Hydro1 = [np.linalg.norm(i) for i in MWFrPos_Hydro1]
np.dot(x, y)
returns
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-135-9ef41eb4235d> in <module>()
      6
      7
----> 8 np.dot(x, y)

ValueError: shapes (1220,3) and (1220,3) not aligned: 3 (dim 1) != 1220 (dim 0)
Am I using this function improperly?
The arrays look like this
print x
[[ 51.61872482 106.19775391 69.64765167]
 [ 33.86419296 11.75729942 11.84990311]
 [ 12.75009823 58.95491028 38.06708527]
 ...,
 [ 99.00266266 96.0210495 18.79844856]
 [ 27.18083954 74.35041809 78.07577515]
 [ 19.29788399 82.16114044 1.20453501]]
print y
[[ 40.0402298 -162.62153625 -163.00158691]
 [-359.41983032 -115.39328766 14.8419466 ]
 [ 95.92044067 -359.26425171 234.57330322]
 ...,
 [ 130.17840576 -7.00977898 42.09699249]
 [ 37.37852478 -52.66002655 -318.15155029]
 [ 126.1726532 121.3104248 -416.20855713]]
Would looping over np.vdot be more optimal in this circumstance?
You can't take the dot product of two n * m matrices unless m == n -- when multiplying two matrices, A and B, B needs to have as many rows as A has columns. (So you can multiply an n * m matrix with an m * n matrix.)
See this article on multiplying matrices.
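If what you actually want is one dot product per row pair (both arrays here are (1220, 3)), a minimal sketch of that row-wise computation, where x and y are random stand-ins for the actual position/velocity arrays:

import numpy as np

# stand-ins for the (1220, 3) position and velocity arrays
x = np.random.rand(1220, 3)
y = np.random.rand(1220, 3)

# one dot product per row: shape (1220,)
row_dots = (x * y).sum(axis=1)

# equivalent with einsum
row_dots_einsum = np.einsum('ij,ij->i', x, y)
assert np.allclose(row_dots, row_dots_einsum)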
Some possible products for (n,3) arrays (here I'll just use one):
In [434]: x=np.arange(12.).reshape(4,3)
In [435]: x
Out[435]:
array([[  0.,   1.,   2.],
       [  3.,   4.,   5.],
       [  6.,   7.,   8.],
       [  9.,  10.,  11.]])
Element-by-element product, summed across the columns; n values. This is a magnitude-like number (for x*x it is the squared norm of each row).
In [436]: (x*x).sum(axis=1)
Out[436]: array([ 5., 50., 149., 302.])
Same thing with einsum, which gives more control over which axes are multiplied, and which are summed.
In [437]: np.einsum('ij,ij->i',x,x)
Out[437]: array([ 5., 50., 149., 302.])
dot requires the last axis of the 1st argument and the 2nd-to-last axis of the 2nd to have the same size, so I have to use x.T (transpose). The diagonal matches the above.
In [438]: np.dot(x,x.T)
Out[438]:
array([[   5.,   14.,   23.,   32.],
       [  14.,   50.,   86.,  122.],
       [  23.,   86.,  149.,  212.],
       [  32.,  122.,  212.,  302.]])
np.einsum('ij,kj',x,x) does the same thing.
There is a new matmul product, but with 2d arrays like this it is just dot. I have to turn them into 3d arrays to get the 4 values; and even with that I have to squeeze out excess dimensions:
In [450]: x[:,None,:] @ x[:,:,None]
Out[450]:
array([[[   5.]],

       [[  50.]],

       [[ 149.]],

       [[ 302.]]])
In [451]: np.squeeze(_)
Out[451]: array([ 5., 50., 149., 302.])
This seems like it should be straightforward, but I can't figure it out.
Data source is a two-column, comma-delimited input file with these contents:
6,10
5,9
8,13
...
And my code is:
import numpy as np
data = np.loadtxt("data.txt", delimiter=",")
m = len(data)
x = np.reshape(data[:,0], (m,1))
y = np.ones((m,1))
z = np.matrix([x,y])
Which gives me this error:
/Users/acpigeon/.virtualenvs/ipynb/lib/python2.7/site-packages/numpy-1.9.0.dev_297f54b-py2.7-macosx-10.9-intel.egg/numpy/matrixlib/defmatrix.pyc in __new__(subtype, data, dtype, copy)
    270         shape = arr.shape
    271         if (ndim > 2):
--> 272             raise ValueError("matrix must be 2-dimensional")
    273         elif ndim == 0:
    274             shape = (1, 1)

ValueError: matrix must be 2-dimensional
No amount of reshaping seems to get this to work, so I'm either missing something really simple or there's a better way to do this.
EDIT:
It would have been helpful to specify the output I am looking for. Here is a line of code that generates the desired result:
In [1]: np.matrix([[5,1],[6,1],[8,1]])
Out[1]:
matrix([[5, 1],
        [6, 1],
        [8, 1]])
The desired output can be generated this way:
In [12]: np.array((data[:, 0], np.ones(m))).transpose()
Out[12]:
array([[ 6.,  1.],
       [ 5.,  1.],
       [ 8.,  1.]])
The above is copied from ipython and so has ipython style prompts.
Answer to previous version
To eliminate the error, replace:
x = np.reshape(data[:, 0], (m, 1))
with:
x = data[:, 0]
The former line produces a 2-dimensional array, so np.matrix([x, y]) is handed something 3-dimensional, and that is what causes the error message. The latter produces a 1-D array with the same data.
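For completeness, the same two-column result can also be built in one step with np.column_stack (a minimal sketch, assuming data is loaded as in the question):

import numpy as np

data = np.loadtxt("data.txt", delimiter=",")
m = len(data)

# first column of the data alongside a column of ones
z = np.column_stack((data[:, 0], np.ones(m)))
# array([[ 6.,  1.],
#        [ 5.,  1.],
#        [ 8.,  1.]])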
Or how about first turning the array into a matrix, and then changing the last column to 1?
In [2]: data=np.loadtxt('stack23859379.txt',delimiter=',')
In [3]: np.matrix(data)
Out[3]:
matrix([[  6.,  10.],
        [  5.,   9.],
        [  8.,  13.]])

In [4]: z = np.matrix(data)

In [5]: z[:,1]=1

In [6]: z
Out[6]:
matrix([[ 6.,  1.],
        [ 5.,  1.],
        [ 8.,  1.]])
I have two matrices, A and B:
A = array([[  2.,  13.,  25.,   1.],
           [ 18.,   5.,   1.,  25.]])
B = array([[2, 1],
           [0, 3]])
I want to index each row of A with each row of B, producing the slice:
array([[ 25.,  13.],
       [ 18.,  25.]])
That is, I essentially want something like:
array([A[i,b] for i,b in enumerate(B)])
Is there a way to fancy-index this directly? The best I can do is this "flat-hack":
A.flat[B + arange(0,A.size,A.shape[1])[:,None]]
@Ophion's answer is great, and deserves the credit, but I wanted to add some explanation, and offer a more intuitive construction.
Instead of transposing B and then transposing the result back, it's better to just reshape the arange into a column. I think this gives the most intuitive solution, even if it takes more characters:
A[((0,),(1,)), B]
or equivalently
A[np.arange(2)[:, None], B]
This works because what's really going on here, is you're making an i array and a j array, each of which have the same shape as your desired result.
i = np.array([[0, 0],
              [1, 1]])
j = B
But you can use just
i = np.array([[0],
              [1]])
Because it will broadcast to match B (this is what np.arange(2)[:,None] gives).
Finally, to make it more general (not knowing 2 as the arange size), you could also generate i from B with
i = np.indices(B.shape)[0]
However you build i and j, you just index with them:
>>> A[i, j]
array([[ 25.,  13.],
       [ 18.,  25.]])
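To package the pattern, here is a small helper (the name take_per_row is mine) that works for any 2-d A and integer index array B with one row of column indices per row of A:

import numpy as np

def take_per_row(A, B):
    # Row indices with the same shape as B, so A[i, B] picks
    # A[r, B[r, c]] for every position (r, c).
    i = np.indices(B.shape)[0]
    return A[i, B]

A = np.array([[2., 13., 25., 1.], [18., 5., 1., 25.]])
B = np.array([[2, 1], [0, 3]])
print(take_per_row(A, B))   # [[25. 13.] [18. 25.]]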
Not pretty but:
A[np.arange(2),B.T].T
array([[ 25.,  13.],
       [ 18.,  25.]])
I have a very large MySQL query in my web app that looks like this:
query = """
    SELECT video_tag.video_id, (sum(user_rating.rating) * video.rating_norm) as score
    FROM video_tag
    JOIN user_rating ON user_rating.item_id = video_tag.tag_id
    JOIN video ON video.id = video_tag.video_id
    WHERE item_type = 3 AND user_id = 1 AND rating != 0 AND video.website_id = 2
        AND rating_norm > 0 AND video_id NOT IN (1,2,3)
    GROUP BY video_id
    ORDER BY score DESC
    LIMIT 20"""
This query joins three tables (video, video_tag, and user_rating), groups the results, and does some basic math to compute a score for each video. This takes about 2s to run as the tables are large.
Instead of making SQL do all this work, I suspect it would be faster to do this computation using NumPy arrays. The data in 'video' and 'video_tag' is constant - so I could just load those tables into memory once and not have to ping SQL each time.
However, while I can load these three tables into three separate arrays, I'm having a heck of a time replicating the above query (specifically the JOIN and GROUP BY parts). Has anyone any experience with replicating SQL queries using NumPy arrays?
Thanks!
What makes this exercise awkward is the single-data-type constraint for NumPy arrays. For instance, the GROUP BY operation implicitly requires (at least) one field/column of numeric values (to aggregate/sum) and one field/column to partition or group by.
Of course, NumPy recarrays can represent a 2D array (or SQL table) using a different data type for each column (aka 'field'), but I find these composite arrays cumbersome to work with. So in the code snippets below, I just used the conventional ndarray class to replicate the two SQL operations highlighted in the OP's question.
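For reference, a structured array can carry per-column dtypes like an SQL table; a minimal sketch (the field names here are illustrative, not the OP's schema):

import numpy as np

# a tiny table with mixed column types
videos = np.array([(1, 0.9), (2, 1.1), (3, 0.7)],
                  dtype=[('id', 'i4'), ('rating_norm', 'f4')])

print(videos['rating_norm'])        # column access by field name
print(videos[videos['id'] != 2])    # row filtering still works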
To mimic SQL JOIN in NumPy:
First, create two NumPy arrays (A & B), each representing an SQL table. The join keys are in the 1st column of A; the primary keys of B are also in its 1st column.
import numpy as NP

A = NP.random.randint(10, 100, 40).reshape(8, 5)
a = NP.random.randint(1, 3, 8).reshape(8, -1)   # add a column of join keys (1 or 2)
A = NP.column_stack((a, A))

B = NP.random.randint(0, 10, 4).reshape(2, 2)
b = NP.array([1, 2])                            # primary keys for B
B = NP.column_stack((b, B))
Now (attempt to) replicate JOIN using NumPy array objects:
# prepare the array that will hold the 'result set':
AB = NP.column_stack((A, NP.zeros((A.shape[0], B.shape[1] - 1))))

def join(A, B):
    '''
    returns the populated 'result set' NumPy array, AB;
    pass in A, B, two NumPy 2D arrays, representing the two SQL tables to join
    '''
    k, v = B[:, 0], B[:, 1:]
    dx = dict(zip(k, v))             # map each B key to its value columns
    for i in range(A.shape[0]):
        AB[i, -2:] = dx[A[i, 0]]     # look up the B columns for this row's key
    return AB
To mimic SQL GROUP BY in NumPy:
def group_by(AB, col_id):
    '''
    returns a 2D NumPy array aggregated on the unique values in the column specified by col_id;
    pass in a 2D NumPy array and the col_id (integer) of the column that holds the values to group by
    '''
    uv = NP.unique(AB[:, col_id])
    temp = []
    for v in uv:
        ndx = AB[:, col_id] == v                  # rows belonging to this group
        temp.append(NP.sum(AB[ndx, 1:], axis=0))  # sum this group's value columns
    temp = NP.row_stack(temp)
    uv = uv.reshape(-1, 1)
    return NP.column_stack((uv, temp))
For a test case, they return the correct result:
>>> A
array([[ 1, 92, 50, 67, 51, 75],
       [ 2, 64, 35, 38, 69, 11],
       [ 1, 83, 62, 73, 24, 55],
       [ 2, 54, 71, 38, 15, 73],
       [ 2, 39, 28, 49, 47, 28],
       [ 1, 68, 52, 28, 46, 69],
       [ 2, 82, 98, 24, 97, 98],
       [ 1, 98, 37, 32, 53, 29]])
>>> B
array([[1, 5, 4],
       [2, 3, 7]])
>>> join(A, B)
array([[  1.,  92.,  50.,  67.,  51.,  75.,   5.,   4.],
       [  2.,  64.,  35.,  38.,  69.,  11.,   3.,   7.],
       [  1.,  83.,  62.,  73.,  24.,  55.,   5.,   4.],
       [  2.,  54.,  71.,  38.,  15.,  73.,   3.,   7.],
       [  2.,  39.,  28.,  49.,  47.,  28.,   3.,   7.],
       [  1.,  68.,  52.,  28.,  46.,  69.,   5.,   4.],
       [  2.,  82.,  98.,  24.,  97.,  98.,   3.,   7.],
       [  1.,  98.,  37.,  32.,  53.,  29.,   5.,   4.]])
>>> group_by(AB, 0)
array([[   1.,  341.,  201.,  200.,  174.,  228.,   20.,   16.],
       [   2.,  239.,  232.,  149.,  228.,  210.,   12.,   28.]])
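The Python loop in group_by can also be vectorized; here is a minimal sketch (my own variation, not from the answer above) using np.unique with return_inverse plus np.add.at, assuming the same AB layout:

import numpy as np

def group_by_vectorized(AB, col_id=0):
    # map keys to group labels 0..n_groups-1
    uv, inv = np.unique(AB[:, col_id], return_inverse=True)
    sums = np.zeros((uv.size, AB.shape[1] - 1))
    # scatter-add each row's value columns into its group's row
    np.add.at(sums, inv, np.delete(AB, col_id, axis=1))
    return np.column_stack((uv, sums))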