I am looking for a way to apply a Gaussian filter to an image (tensor) only using PyTorch functions. Using numpy, the equivalent code is
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
# Define 2D Gaussian kernel
def gkern(kernlen=256, std=128):
    """Returns a 2D Gaussian kernel array."""
    gkern1d = signal.gaussian(kernlen, std=std).reshape(kernlen, 1)
    gkern2d = np.outer(gkern1d, gkern1d)
    return gkern2d
# Generate random matrix and multiply the kernel by it
A = np.random.rand(256*256).reshape([256,256])
# Test plot
plt.imshow(A*gkern(256, std=32))
The closest suggestion I found is based on this post:
import torch
import torch.nn as nn
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=264, bias=False)
with torch.no_grad():
    conv.weight = gaussian_weights
But it gives me the error NameError: name 'gaussian_weights' is not defined. How can I make it work?
Yupp I also had the same idea. So now the question becomes: is there a way to define a Gaussian kernel (or a 2D Gaussian) without using Numpy and/or explicitly specifying the weights?
Yes, it is pretty easy. Just have a look at the documentation of signal.gaussian, which links to the source code. This is what the function does:
def gaussian(M, std, sym=True):
    if M < 1:
        return np.array([])
    if M == 1:
        return np.ones(1, 'd')
    odd = M % 2
    if not sym and not odd:
        M = M + 1
    n = np.arange(0, M) - (M - 1.0) / 2.0
    sig2 = 2 * std * std
    w = np.exp(-n ** 2 / sig2)
    if not sym and not odd:
        w = w[:-1]
    return w
Luckily, this is straightforward to convert to PyTorch: (almost) just replace np with torch and you are done!
Also, note that the torch equivalent of np.outer is torch.ger (called torch.outer in newer PyTorch versions).
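To make the conversion concrete, here is a minimal sketch of how the kernel could be built with torch only and loaded into a Conv2d layer. The helper names gaussian_1d and gaussian_weights_2d, the kernel length of 9 and std of 2.0 are my own choices for illustration, not part of the original code. The kernel is normalized to sum to 1 and copied into the existing weight parameter instead of replacing it.

```python
import torch
import torch.nn as nn

def gaussian_1d(M, std):
    """1D Gaussian window of length M (the symmetric case of the SciPy function)."""
    n = torch.arange(M, dtype=torch.float32) - (M - 1.0) / 2.0
    return torch.exp(-n ** 2 / (2 * std * std))

def gaussian_weights_2d(kernlen, std):
    """2D Gaussian kernel, normalized to sum to 1, shaped (out_ch, in_ch, H, W)."""
    g = gaussian_1d(kernlen, std)
    kernel = torch.outer(g, g)
    kernel = kernel / kernel.sum()
    return kernel.view(1, 1, kernlen, kernlen)

kernlen = 9  # odd, so that padding = kernlen // 2 keeps the image size
conv = nn.Conv2d(1, 1, kernel_size=kernlen, padding=kernlen // 2, bias=False)
with torch.no_grad():
    # Copy into the existing parameter rather than assigning a plain tensor
    conv.weight.copy_(gaussian_weights_2d(kernlen, std=2.0))

img = torch.rand(1, 1, 64, 64)  # (batch, channel, height, width)
with torch.no_grad():
    blurred = conv(img)
print(blurred.shape)
```

Copying into conv.weight avoids the type error you would get when assigning a plain tensor where the module expects an nn.Parameter.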
There is a Pytorch class to apply Gaussian Blur to your image:
torchvision.transforms.GaussianBlur(kernel_size, sigma=(0.1, 2.0))
Check the documentation for more info
Assuming that the question actually asks for a convolution with a Gaussian (i.e. a Gaussian blur, which is what the title and the accepted answer imply to me) and not for a multiplication (i.e. a vignetting effect, which is what the question's demo code produces), here is a pure PyTorch version that does not need torchvision to be installed (otherwise torchvision.transforms.GaussianBlur() can be used instead, as has been proposed by Mushfirat Mohaimin's answer):
from math import ceil
import torch
from torch.nn.functional import conv2d
from torch.distributions import Normal
def gaussian_kernel_1d(sigma: float, num_sigmas: float = 3.) -> torch.Tensor:
    radius = ceil(num_sigmas * sigma)
    support = torch.arange(-radius, radius + 1, dtype=torch.float)
    kernel = Normal(loc=0, scale=sigma).log_prob(support).exp_()
    # Ensure kernel weights sum to 1, so that image brightness is not altered
    return kernel.mul_(1 / kernel.sum())
def gaussian_filter_2d(img: torch.Tensor, sigma: float) -> torch.Tensor:
    kernel_1d = gaussian_kernel_1d(sigma)  # Create 1D Gaussian kernel
    padding = len(kernel_1d) // 2  # Ensure that image size does not change
    img = img.unsqueeze(0).unsqueeze_(0)  # Need 4D data for ``conv2d()``
    # Convolve along columns and rows
    img = conv2d(img, weight=kernel_1d.view(1, 1, -1, 1), padding=(padding, 0))
    img = conv2d(img, weight=kernel_1d.view(1, 1, 1, -1), padding=(0, padding))
    return img.squeeze_(0).squeeze_(0)  # Make 2D again
if __name__ == "__main__":
    import matplotlib.pyplot as plt
    img = torch.rand(size=(100, 100))
    img_filtered = gaussian_filter_2d(img, sigma=1.5)
The code uses the basic idea of a separable filter that Andrei Bârsan implied in a comment to this answer. This means that convolution with a 2D Gaussian kernel can be replaced by convolving twice with a 1D Gaussian kernel – once along the image's columns, once along its rows. This is more efficient in general, as it uses 2N rather than N² multiplications per pixel for a kernel of side length N.
So in the provided code, we first create a 1D Gaussian kernel with gaussian_kernel_1d(), which we then apply twice in gaussian_filter_2d().
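As a quick sanity check of the separability argument, the following sketch compares one convolution with the full 2D kernel against two passes with the 1D kernel. The values sigma=1.5, radius=5 and the 32×32 test image are arbitrary choices of mine; float64 is used so that the comparison is not dominated by rounding.

```python
import torch
from torch.nn.functional import conv2d

sigma, radius = 1.5, 5
support = torch.arange(-radius, radius + 1, dtype=torch.float64)
k1d = torch.exp(-0.5 * (support / sigma) ** 2)
k1d = k1d / k1d.sum()
k2d = torch.outer(k1d, k1d)  # the equivalent 2D kernel (11 x 11)

img = torch.rand(1, 1, 32, 32, dtype=torch.float64)

# One pass with the full 2D kernel: N * N multiplications per pixel
full = conv2d(img, k2d.view(1, 1, *k2d.shape), padding=radius)

# Two passes with the 1D kernel: 2 * N multiplications per pixel
cols = conv2d(img, k1d.view(1, 1, -1, 1), padding=(radius, 0))
separable = conv2d(cols, k1d.view(1, 1, 1, -1), padding=(0, radius))

max_diff = (full - separable).abs().max().item()
print(max_diff)
```

Both results agree up to floating point rounding, including at the zero-padded borders.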
Some more notes on the code:
The parameter num_sigmas controls how many standard deviations, and thus how much of the bulge of the Gaussian function, we actually sample for producing the convolution kernel. As the Gaussian function theoretically has infinite support (meaning it is never zero), this presents a trade-off between accuracy and kernel size (which affects speed and memory use). A length of 3 * sigma for each of the two halves of the support is usually sufficient, given that it covers 99.7% of the area under the corresponding Gaussian function.
Rather than using Normal().log_prob().exp_() for producing the kernel, we could explicitly write the function of the normal distribution here, which might be a bit more efficient. In fact, we could write kernel = support.square_().mul_(-.5 / (sigma ** 2)).exp_(), thus (1) altering the values of support in-place (as we won't need them any longer) and (2) even omitting the normalization constant of the normal distribution (as we must normalize the kernel by its sum before returning it anyway).
Although we use conv2d() rather than conv1d(), effectively we still have two 1D convolutions, as we apply a N×1 and 1×N kernel in conv2d(). We could have used conv1d() instead, but the code is much simpler with conv2d().
In more recent PyTorch versions, we can use conv2d(…, padding="same"), rather than calculating the padding amount ourselves. In either case, using conv2d()'s padding parameter implies padding with zeros. If we wanted more padding options, we could manually pad the image with torch.nn.functional.pad() before the convolution instead.
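To illustrate the last two notes, here is a hedged variant that writes the Gaussian explicitly and pads the image manually with reflection instead of zeros. The function name gaussian_filter_2d_reflect is hypothetical, and the sketch assumes that the kernel radius is smaller than both image dimensions, which reflection padding requires.

```python
from math import ceil

import torch
import torch.nn.functional as F

def gaussian_filter_2d_reflect(img, sigma, num_sigmas=3.0):
    """Separable Gaussian blur of a 2D tensor with reflection padding."""
    radius = ceil(num_sigmas * sigma)
    support = torch.arange(-radius, radius + 1, dtype=img.dtype)
    # Unnormalized Gaussian; the constant factor cancels in the division below
    kernel = support.square().mul(-0.5 / sigma ** 2).exp()
    kernel = kernel / kernel.sum()
    img = img[None, None]  # conv2d() needs 4D data
    # Pad manually (left, right, top, bottom), then convolve without padding
    img = F.pad(img, (radius, radius, radius, radius), mode="reflect")
    img = F.conv2d(img, kernel.view(1, 1, -1, 1))
    img = F.conv2d(img, kernel.view(1, 1, 1, -1))
    return img[0, 0]

img = torch.rand(100, 100)
img_filtered = gaussian_filter_2d_reflect(img, sigma=1.5)
print(img_filtered.shape)
```

With reflection padding a constant image stays constant after filtering, whereas zero padding darkens the borders.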
This combines the code from the answers above, updated to use torch.outer from newer PyTorch versions:
import numpy as np
import matplotlib.pyplot as plt
import torch
def gaussian_fn(M, std):
    n = torch.arange(0, M) - (M - 1.0) / 2.0
    sig2 = 2 * std * std
    w = torch.exp(-n ** 2 / sig2)
    return w
def gkern(kernlen=256, std=128):
    """Returns a 2D Gaussian kernel array."""
    gkern1d = gaussian_fn(kernlen, std=std)
    gkern2d = torch.outer(gkern1d, gkern1d)
    return gkern2d
# Generate random matrix and multiply the kernel by it
A = np.random.rand(256*256).reshape([256,256])
A = torch.from_numpy(A)
gaussian_filter = gkern(256, std=32)
# Plot the matrix, the kernel, and their element-wise product
f, ax = plt.subplots(1, 3, figsize=(12,5))
ax[0].imshow(A, cmap='gray')
ax[1].imshow(gaussian_filter, cmap='gray')
ax[2].imshow(A*gaussian_filter, cmap='gray')
I am building an audio-based deep learning model. As part of the preprocessing I want to augment the audio in my datasets. One augmentation that I want to do is to apply a RIR (room impulse response) function. I am working with Python 3.9.5 and TensorFlow 2.8.
In Python, if the RIR is given as a finite impulse response (FIR) of n taps, the standard way to do this is with SciPy's lfilter:
import numpy as np
from scipy import signal
import soundfile as sf
h = np.load("rir.npy")
x, fs = sf.read("audio.wav")
y = signal.lfilter(h, 1, x)
Running in loop on all the files may take a long time. Doing it with TensorFlow map utility on TensorFlow datasets:
# define filter function
def h_filt(audio, label):
    h = np.load("rir.npy")
    x = audio.numpy()
    y = signal.lfilter(h, 1, x)
    return tf.convert_to_tensor(y, dtype=tf.float32), label
# apply it via TF map on dataset
aug_ds = ds.map(h_filt)
Using tf.numpy_function:
tf_h_filt = tf.numpy_function(h_filt, [audio, label], [tf.float32, tf.string])
# apply it via TF map on dataset
aug_ds = ds.map(tf_h_filt)
I have two questions:
Is this way correct and fast enough (less than a minute for 50,000 files)?
Is there a faster way to do it? E.g. replace the SciPy function with a built-in TensorFlow function. I didn't find the equivalent of lfilter or SciPy's convolve.
Here is one way you could do it.
Notice that the TensorFlow function tf.nn.conv1d is designed to receive batches of inputs with multiple channels, and the filter can have multiple input channels and multiple output channels. Let N be the batch size, I the number of input channels, F the filter width, L the input width and O the number of output channels. Using padding='SAME', it maps an input of shape (N, L, I) and a filter of shape (F, I, O) to an output of shape (N, L, O).
import numpy as np
from scipy import signal
import tensorflow as tf
# data to compare the two approaches
x = np.random.randn(100)
h = np.random.randn(11)
# h
y_lfilt = signal.lfilter(h, 1, x)
# Since the denominator of your filter transfer function is 1
# the output of lfilter matches the convolution
y_np = np.convolve(h, x)
assert np.allclose(y_lfilt, y_np[:len(y_lfilt)])
# now let's do the convolution using tensorflow
y_tf = tf.nn.conv1d(
    # x must be padded with half of the size of h
    # to use padding 'SAME'
    np.pad(x, len(h) // 2).reshape(1, -1, 1),
    # the time axis of h must be flipped
    h[::-1].reshape(-1, 1, 1),  # a 1x1 matrix of filters
    stride=1,
    padding='SAME',
)
assert np.allclose(y_lfilt, np.squeeze(y_tf)[:len(y_lfilt)])
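The same idea can be wrapped into a function that filters a whole batch at once. The sketch below is one possible variant, not the only way to do it: the name fir_filter_batch is hypothetical, and instead of padding symmetrically and using 'SAME' it pads F - 1 zeros on the left and uses 'VALID', which gives the causal output of lfilter directly at the original length.

```python
import numpy as np
import tensorflow as tf
from scipy import signal

def fir_filter_batch(x, h):
    """Causal FIR filtering of a batch x of shape (N, L) with taps h of shape (F,)."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    h = tf.convert_to_tensor(h, dtype=tf.float32)
    n_taps = h.shape[0]
    # Left-pad the time axis with F - 1 zeros, so 'VALID' keeps length L
    x = tf.pad(x, [[0, 0], [n_taps - 1, 0]])
    # conv1d computes a cross-correlation, so the taps are flipped in time
    kernel = tf.reshape(tf.reverse(h, axis=[0]), (-1, 1, 1))  # (F, I=1, O=1)
    y = tf.nn.conv1d(x[:, :, tf.newaxis], kernel, stride=1, padding='VALID')
    return y[:, :, 0]

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 200)).astype(np.float32)  # 4 signals of 200 samples
h = rng.standard_normal(11).astype(np.float32)        # 11-tap impulse response

y_tf = fir_filter_batch(x, h).numpy()
y_ref = signal.lfilter(h, 1, x, axis=-1)
print(np.abs(y_tf - y_ref).max())
```

Since the function uses only TensorFlow ops, it should also be usable inside Dataset.map without tf.numpy_function. There each element is a single signal, so it would need a batch axis added first.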
I want to decompose a 3-dimensional tensor using SVD.
I am not quite sure whether, and how, the following decomposition can be achieved.
I already know how I can split the tensor horizontally from this tutorial: tensors.org Figure 2.2b
import numpy as np
from numpy import linalg as LA
d = 10; A = np.random.rand(d,d,d)
Am = A.reshape(d**2,d)
Um,Sm,Vh = LA.svd(Am,full_matrices=False)
U = Um.reshape(d,d,d); S = np.diag(Sm)
Matrix methods can be naturally extended to higher-orders. SVD, for instance, can be generalized to tensors e.g. with the Tucker decomposition, sometimes called a higher-order SVD.
We maintain a Python library for tensor methods, TensorLy, which lets you do this easily. In this case you want a partial Tucker as you want to leave one of the modes uncompressed.
Let's import the necessary parts:
import tensorly as tl
from tensorly import random
from tensorly.decomposition import partial_tucker
For testing, let's create a 3rd order tensor of size (10, 10, 10):
size = 10
order = 3
shape = (size, )*order
tensor = random.random_tensor(shape)
You can now decompose the tensor using the tensor decomposition. In your case, you want to leave one of the dimensions untouched, so you'll only have two factors (your U and V) and a core tensor (your S):
core, factors = partial_tucker(tensor, rank=size, modes=[0, 2])
You can reconstruct the original tensor from your approximation using a series of n-mode products to contract the core with the factors:
from tensorly import tenalg
rec = tenalg.multi_mode_dot(core, factors, modes=[0, 2])
rec_error = tl.norm(rec - tensor)/tl.norm(tensor)
print(f'Relative reconstruction error: {rec_error}')
In my case, I get
Relative reconstruction error: 9.66027176805661e-16
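If you would like to see what happens under the hood, here is a rough NumPy-only sketch of a partial higher-order SVD that leaves mode 1 untouched. It is not TensorLy's implementation; the helpers unfold, mode_dot and partial_hosvd are names I made up for illustration. Each factor is the matrix of left singular vectors of the corresponding unfolding, and the core is the tensor contracted with the transposed factors.

```python
import numpy as np

def unfold(t, mode):
    """Matricize t so that the given mode indexes the rows."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def mode_dot(t, m, mode):
    """n-mode product: multiply tensor t by matrix m along the given mode."""
    out = np.tensordot(m, t, axes=(1, mode))  # the new axis comes first
    return np.moveaxis(out, 0, mode)

def partial_hosvd(t, modes):
    """Return the core and one orthonormal factor per listed mode."""
    factors = [np.linalg.svd(unfold(t, mode), full_matrices=False)[0]
               for mode in modes]
    core = t
    for u, mode in zip(factors, modes):
        core = mode_dot(core, u.T, mode)
    return core, factors

rng = np.random.default_rng(0)
tensor = rng.random((10, 10, 10))
core, factors = partial_hosvd(tensor, modes=[0, 2])

# Reconstruct by contracting the core with the factors again
rec = core
for u, mode in zip(factors, [0, 2]):
    rec = mode_dot(rec, u, mode)
rel_error = np.linalg.norm(rec - tensor) / np.linalg.norm(tensor)
print(f'Relative reconstruction error: {rel_error}')
```

Without truncation the factors are square orthogonal matrices, so the reconstruction is exact up to rounding. Keeping only the leading columns of each factor gives a compressed approximation.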
You can also use the "tensorlearn" package in Python, for example with the tensor-train (TT) SVD algorithm.
import numpy as np
import tensorlearn as tl
#lets generate an arbitrary array
tensor = np.arange(0,1000)
#reshaping it into a higher (3) dimensional tensor
tensor = np.reshape(tensor,(10,20,5))
#decompose the tensor to its factors
tt_factors=tl.auto_rank_tt(tensor, epsilon) #epsilon is the error bound
#tt_factors is a list of three arrays which are the tt-cores
#rebuild (estimating) the tensor using the factors again as tensor_hat
#lets see the error
print('error (%)= ',error*100) #which is less than epsilon
# one usage of tensor decomposition is data compression
# So, lets calculate the compression ratio
#data saving
print('data_saving (%): ', data_saving*100)
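For reference, the TT-SVD idea itself can be sketched in plain NumPy. Note that this is not the tensorlearn API: tt_svd and tt_to_tensor are hypothetical names, and epsilon=0.05 is an arbitrary choice. The tensor is unfolded mode by mode, each unfolding is truncated with an SVD, and the per-step truncation threshold is chosen so that the overall relative error stays below epsilon.

```python
import numpy as np

def tt_svd(tensor, epsilon):
    """Decompose tensor into TT-cores with relative error at most epsilon."""
    shape, d = tensor.shape, tensor.ndim
    delta = epsilon / np.sqrt(d - 1) * np.linalg.norm(tensor)
    cores, r_prev = [], 1
    c = tensor.astype(float)
    for k in range(d - 1):
        c = c.reshape(r_prev * shape[k], -1)
        u, s, vh = np.linalg.svd(c, full_matrices=False)
        # tail[i] is the norm of the singular values s[i:]
        tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]
        # Smallest rank whose discarded tail has norm <= delta
        r = max(1, int(np.sum(tail > delta)))
        cores.append(u[:, :r].reshape(r_prev, shape[k], r))
        c = s[:r, None] * vh[:r]
        r_prev = r
    cores.append(c.reshape(r_prev, shape[-1], 1))
    return cores

def tt_to_tensor(cores):
    """Contract the TT-cores back into a full tensor."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=(-1, 0))
    return out[0, ..., 0]

tensor = np.reshape(np.arange(0, 1000), (10, 20, 5))
epsilon = 0.05
cores = tt_svd(tensor, epsilon)
tensor_hat = tt_to_tensor(cores)
error = np.linalg.norm(tensor - tensor_hat) / np.linalg.norm(tensor)
# Fraction of stored numbers saved by keeping the cores instead of the tensor
data_saving = 1 - sum(core.size for core in cores) / tensor.size
print('error (%)= ', error * 100)
print('data_saving (%): ', data_saving * 100)
```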
I'm working on estimating cloud displacement for wind energy purposes with RGB GOES satellite images. I am following the methodology from the paper "An Automated Technique for Obtaining Cloud Motion From Geosynchronous Satellite Data Using Cross Correlation" to achieve it. I don't know if this is a good way to compute this. The code basically gets the cross correlation from the Fourier transform to calculate the cloud displacement between the roi_a and roi_b images.
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
img_a = cv.imread('2019.1117.1940.goes-16.rgb.tif', 0)
img_b = cv.imread('2019.1117.1950.goes-16.rgb.tif', 0)
roi_a = img_a[700:900, 1900:2100]
roi_b = img_b[700:900, 1900:2100]
def Fou(image):
    fft_roi = np.fft.fft2(image)
    return fft_roi
def inv_Fou(C_w):
    c_t = np.fft.ifft2(C_w)
    c_t = np.abs(c_t)
    return c_t
#Step 1: gets the FFT
G_t0 = Fou(roi_a)##t_0
fft_roiA_conj = np.conj(G_t0) #Conjugate
G_t1 = Fou(roi_b)##t_1
#Step 2: Compute C(m, v)
prod = np.dot(fft_roiA_conj, G_t1)
#Step 3: Perform the inverse FFT
inv = inv_Fou(prod)
plt.imshow(inv, cmap = 'gray', )
plt.title('C (m,v) --> Cov(p,q)')
#Step 4: Compute cross correlation coefficient and the maximum cross correlation coefficient
def rms(sigma):
    "Compute the standard deviation of an image"
    rms = np.std(sigma)
    return rms
R_t = inv / (rms(roi_a) * rms(roi_b))
This is the first time that I use FFT on images, so I have some questions about it:
I don't use fftshift; can this affect the result?
What is the difference between using np.dot in step 2 and a simple '*', like prod = fft_roiA_conj * G_t1?
How to interpret the image result (C(m, v) -> Cov (p, q)) from step 3?
How can I obtain the maximum coefficient p' and q' (maximum coefficient of x and y directions) from R_t?
1 - fftshift is a circular rotation. If you have a two-sided signal, the correlation you compute is shifted (circularly). What is important is that you map your indices to displacements correctly, with or without fftshift.
2 - numpy.dot is the matrix product (equivalent to the @ operator in recent Python versions), and the * operator does element-wise multiplication. In my understanding, you want the element-wise product in step 2.
3 - Once you correct step 2, you will have an image such that inv[i, j] is the correlation of the image roi_a and the image roi_b rolled by i rows and j columns.
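A tiny example may make the difference between the matrix product and the element-wise product more tangible. The two 2×2 matrices are arbitrary values chosen for illustration.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

matrix_product = np.dot(A, B)  # rows of A times columns of B
elementwise = A * B            # each entry multiplied with its counterpart

print(matrix_product)
print(elementwise)
```

The cross-correlation theorem multiplies the two spectra frequency by frequency, which is why the element-wise product is the one needed there.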
To answer the last question I will work out an example.
I will use the image scipy.misc.face. It is an RGB image, so it brings three matrices that are highly correlated.
import scipy.misc  # note: in newer SciPy versions face() lives in scipy.datasets
import numpy as np
import matplotlib.pyplot as plt
f = scipy.misc.face()
plt.figure(figsize=(12, 4))
plt.subplot(131), plt.imshow(f[:,:, 0])
plt.subplot(132), plt.imshow(f[:,:, 1])
plt.subplot(133), plt.imshow(f[:,:, 2])
The function img_corr combines the three steps of the cross correlation (for images of the same size). Notice that I am using rfft2 and irfft2; these are the FFTs for real data, which take advantage of the symmetry in the frequency domain.
def img_corr(foi_a, foi_b):
    return np.fft.irfft2(np.fft.rfft2(foi_a) * np.conj(np.fft.rfft2(foi_b)))
C = img_corr(f[:,:,1], f[:,:,2])
plt.figure(figsize=(12, 4))
plt.subplot(121), plt.imshow(C), plt.title('FFT indices')
plt.subplot(122), plt.imshow(np.fft.fftshift(C, (0, 1))), plt.title('fftshift ed version')
To retrieve the position
# this returns the index in the flattened array of all pixels
best_corr = np.argmax(C)
# unravel index gives the 2D index
best_pos = np.unravel_index(best_corr, C.shape)
# this get the positions as a fraction of the image size
relative_pos = [np.fft.fftfreq(size)[index] for index, size in zip(best_pos, C.shape)]
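To check the index-to-displacement mapping on data where the answer is known, here is a small sketch that rolls a random image by a known amount and recovers the shift. The function estimate_shift and the synthetic data are my own; the mean subtraction and the normalization by the pixel count and both standard deviations are one common way to obtain a correlation coefficient, under the assumption of circular shifts.

```python
import numpy as np

def estimate_shift(img_a, img_b):
    """Return the (rows, cols) shift of img_b relative to img_a and the peak coefficient."""
    a = img_a - img_a.mean()
    b = img_b - img_b.mean()
    # Circular cross-correlation: corr[k] = sum_n a[n] * b[n + k]
    corr = np.fft.irfft2(np.conj(np.fft.rfft2(a)) * np.fft.rfft2(b), s=a.shape)
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Indices in the upper half of each axis correspond to negative shifts
    shift = tuple(int(p) if p <= n // 2 else int(p) - n
                  for p, n in zip(peak, corr.shape))
    coeff = corr.max() / (a.size * img_a.std() * img_b.std())
    return shift, coeff

rng = np.random.default_rng(0)
roi_a = rng.random((200, 200))
roi_b = np.roll(roi_a, shift=(5, -12), axis=(0, 1))  # known displacement

shift, coeff = estimate_shift(roi_a, roi_b)
print(shift, coeff)
```

For real cloud images the peak coefficient will be below 1, and windowing or zero padding may be needed because the FFT assumes periodic images.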
I hope this completes the answer.
I am building a neural network that makes use of T-distribution noise. I am using functions defined in the numpy library np.random.standard_t and the one defined in tensorflow tf.distributions.StudentT. The link to the documentation of the first function is here and that to the second function is here. I am using the said functions like below:
a = np.random.standard_t(df=3, size=10000) # numpy's function
t_dist = tf.distributions.StudentT(df=3.0, loc=0.0, scale=1.0)
sess = tf.Session()
b = sess.run(t_dist.sample(10000))
In the documentation provided for the Tensorflow implementation, there's a parameter called scale whose description reads
The scaling factor(s) for the distribution(s). Note that scale is not technically the standard deviation of this distribution but has semantics more similar to standard deviation than variance.
I have set scale to be 1.0 but I have no way of knowing for sure if these refer to the same distribution.
Can someone help me verify this? Thanks
I would say they are, as their sampling is defined in almost the exact same way in both cases. This is how the sampling of tf.distributions.StudentT is defined:
def _sample_n(self, n, seed=None):
    # The sampling method comes from the fact that if:
    #   X ~ Normal(0, 1)
    #   Z ~ Chi2(df)
    #   Y = X / sqrt(Z / df)
    # then:
    #   Y ~ StudentT(df).
    seed = seed_stream.SeedStream(seed, "student_t")
    shape = tf.concat([[n], self.batch_shape_tensor()], 0)
    normal_sample = tf.random.normal(shape, dtype=self.dtype, seed=seed())
    df = self.df * tf.ones(self.batch_shape_tensor(), dtype=self.dtype)
    gamma_sample = tf.random.gamma([n],
                                   0.5 * df,
                                   beta=0.5,
                                   dtype=self.dtype,
                                   seed=seed())
    samples = normal_sample * tf.math.rsqrt(gamma_sample / df)
    return samples * self.scale + self.loc  # Abs(scale) not wanted.
So it is a standard normal sample divided by the square root of a chi-square sample with parameter df divided by df. The chi-square sample is taken as a gamma sample with parameter 0.5 * df and rate 0.5, which is equivalent (chi-square is a special case of gamma). The scale value, like the loc, only comes into play in the last line, as a way to "relocate" the distribution sample at some point and scale. When scale is one and loc is zero, they do nothing.
Here is the implementation for np.random.standard_t:
double legacy_standard_t(aug_bitgen_t *aug_state, double df) {
  double num, denom;
  num = legacy_gauss(aug_state);
  denom = legacy_standard_gamma(aug_state, df / 2);
  return sqrt(df / 2) * num / sqrt(denom);
}
So it is essentially the same thing, slightly rephrased. Here we also have a gamma with shape df / 2, but it is standard (rate one). However, the missing 0.5 now appears in the numerator as / 2 within the sqrt. So it's just moving the numbers around. Here there is no scale or loc, though.
In truth, the difference is that in the case of TensorFlow the distribution really is a location-scale t-distribution (shifted by loc and scaled by scale). A simple empirical check that they are the same for loc=0.0 and scale=1.0 is to plot histograms for both distributions and see how close they look.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
t_np = np.random.standard_t(df=3, size=10000)
with tf.Graph().as_default(), tf.Session() as sess:
    t_dist = tf.distributions.StudentT(df=3.0, loc=0.0, scale=1.0)
    t_tf = sess.run(t_dist.sample(10000))
plt.hist((t_np, t_tf), np.linspace(-10, 10, 20), label=['NumPy', 'TensorFlow'])
That looks pretty close. Obviously, from the point of view of statistical samples, this is not any kind of proof. If you are still not convinced, there are statistical tools for testing whether a sample comes from a certain distribution or whether two samples come from the same distribution.
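One such tool is the two-sample Kolmogorov-Smirnov test from SciPy. The sketch below is a NumPy-only stand-in and does not call TensorFlow: it reproduces the normal / sqrt(chi-square / df) construction by hand and compares it with NumPy's own standard_t. The seed and the sample size are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
df, n = 3.0, 10_000

# NumPy's built-in Student's t sampler
t_direct = rng.standard_t(df, size=n)

# Manual construction: X / sqrt(Z / df) with X ~ Normal(0, 1), Z ~ Chi2(df).
# A gamma with shape df / 2 and rate 0.5 (scale 2) is a chi-square with df.
normal_sample = rng.standard_normal(n)
gamma_sample = rng.gamma(shape=0.5 * df, scale=2.0, size=n)
t_manual = normal_sample / np.sqrt(gamma_sample / df)

result = stats.ks_2samp(t_direct, t_manual)
print(result.statistic, result.pvalue)
```

A small KS statistic (and a p-value that is not tiny) means the test finds no evidence that the two samples come from different distributions. It does not prove they are identical.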
I've got an image that I apply a Gaussian blur to using both cv2.GaussianBlur and skimage.gaussian_filter, but I get significantly different results. I'm curious as to why, and what can be done to make skimage look more like cv2. I know skimage.gaussian_filter is a wrapper around scipy.ndimage.filters.gaussian_filter. To clearly state the question: why are the two functions different, and what can be done to make them more similar?
Here is my test image:
Here is the cv2 version (appears blurrier):
Here is the skimage/scipy version (appears sharper):
skimage_response = skimage.filters.gaussian_filter(im, 2, multichannel=True, mode='reflect')
cv2_response = cv2.GaussianBlur(im, (33, 33), 2)
So sigma=2 and the size of the filter is big enough that it shouldn't make a difference. ImageMagick's convert -gaussian-blur 0x2 visually agrees with cv2.
Versions: cv2=2.4.10, skimage=0.11.3, scipy=0.13.3
If anyone is curious about how to make skimage.gaussian_filter() match Matlab's equivalent imgaussfilt() (the reason I found this question), pass the parameter 'truncate=2' to skimage.gaussian_filter(). Both skimage and Matlab calculate the kernel size as a function of sigma. Matlab's default is 2. Skimage's default is 4, resulting in a significantly larger kernel by default.
These two are equal:
gau_img = cv2.GaussianBlur(img, (5,5), 10.0) # 5*5 kernel, 2 on each side. 2 = 1/5 * 10 = 1/5 * sigma
gau_img = skimage.filters.gaussian(img, sigma=10, truncate=1/5)
The whole Gaussian kernel is defined by sigma only. But which part of gaussian kernel do you use to blur the image is defined by truncate (in skimage) or ksize (in opencv).
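The relation between the two parameters can be written down as a pair of small helpers. This is a sketch with hypothetical function names. It relies on SciPy computing the kernel radius as int(truncate * sigma + 0.5), and it checks the resulting support by filtering a unit impulse.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def ksize_from_truncate(sigma, truncate=4.0):
    """Kernel side length that SciPy/skimage use for a given sigma and truncate."""
    radius = int(truncate * sigma + 0.5)
    return 2 * radius + 1

def truncate_from_ksize(ksize, sigma):
    """truncate value that reproduces an (odd) OpenCV-style ksize."""
    return ((ksize - 1) / 2) / sigma

# Filtering a unit impulse reveals the kernel that is actually applied
impulse = np.zeros(41)
impulse[20] = 1.0
response = gaussian_filter1d(impulse, sigma=2, truncate=4.0, mode="constant")
support = int(np.count_nonzero(response))

print(support, ksize_from_truncate(2, 4.0))
print(truncate_from_ksize(5, 10))
```

For the example above, a 5×5 OpenCV kernel with sigma=10 corresponds to truncate=0.2, i.e. 1/5.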
For GaussianBlur, you are using a rather large kernel (size=33), which causes a lot of smoothing. Smoothing will depend drastically on your kernel size. With your parameters each new pixel value is "averaged" in a 33*33 pixel "window".
A definition of cv2.GaussianBlur can be found here
In contrast, skimage.filters.gaussian seems to work on a smaller kernel. In skimage, the "size" is defined by sigma which is related to kernel size as described here: https://en.wikipedia.org/wiki/Gaussian_filter
Definition can be found here: http://scikit-image.org/docs/dev/api/skimage.filters.html#skimage.filters.gaussian
In order to get corresponding results, you'd have to work with a smaller kernel for OpenCV.
Furthermore, for both libraries, I'd strongly recommend to use up to date library versions.
According to the SciPy 0.15.1 API:
scipy.ndimage.filters.gaussian_filter(img, sigma=sigma, truncate=4.0)
It sets up the Gaussian filter with a kernel size of truncate * sigma. With this understanding, the following two functions will give the same results on a grayscale image:
trunc_val = 3
sigma_val = 3
k_size = int(sigma_val * trunc_val)
gau_img1 = cv2.GaussianBlur(img, (k_size,k_size), sigma_val)
gau_img2 = gaussian_filter(img, sigma = sigma_val, truncate = trunc_val)
cv2.imshow("cv2 res", gau_img1)
cv2.imshow("scipy res", gau_img2)
Some Test results:
trunc_val = 3; sigma_val = 3
trunc_val = 3; sigma_val = 1
trunc_val = 3; sigma_val = 9
Both opencv and scipy allow specifying sigma which has identical meaning in both libraries. The kernel size is determined differently:
in scipy it's derived from the truncate parameter as int(truncate * sigma + 0.5)
in opencv it can be specified independently of sigma (if ksize is given as zero, it is computed from sigma; conversely, if sigma is non-positive, it is computed from the kernel size as sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8).
So to get identical results you need to explicitly specify both the kernel size and sigma:
import cv2
from skimage.filters import gaussian
import matplotlib.pyplot as plt
img = cv2.imread('bg4dZ.png', cv2.IMREAD_GRAYSCALE)
truncate = 4
sigma = 2
radius = int(truncate * sigma + 0.5)
ksize = 2 * radius + 1
opencv = cv2.GaussianBlur(img, (ksize, ksize), sigma, borderType=cv2.BORDER_REFLECT)
scipy = gaussian(img, sigma, truncate=truncate, preserve_range=True, mode='reflect')
fig, axs = plt.subplots(ncols=4, layout='constrained', figsize=(16, 4))
axs[0].imshow(img, cmap='gray')
axs[1].imshow(opencv, cmap='gray')
axs[2].imshow(scipy, cmap='gray')
diff = opencv - scipy
diff = axs[3].imshow(diff, cmap='seismic', vmin=diff.min(), vmax=-diff.min())
fig.colorbar(diff, shrink=.95)
for ax in axs:
The remaining differences (see 4th plot) are caused by floating point calculation and the different resulting datatypes (uint8 vs float64).