Filtering audio signal in TensorFlow - python

I am building an audio-based deep learning model. As part of the preprocessing I want to augment the audio in my datasets. One augmentation that I want to do is to apply a room impulse response (RIR). I am working with Python 3.9.5 and TensorFlow 2.8.
In Python, if the RIR is given as a finite impulse response (FIR) filter of n taps, the standard way to apply it is SciPy's lfilter:
import numpy as np
from scipy import signal
import soundfile as sf
h = np.load("rir.npy")
x, fs = sf.read("audio.wav")
y = signal.lfilter(h, 1, x)
Running in a loop on all the files may take a long time. Doing it with the TensorFlow map utility on TensorFlow datasets:
import tensorflow as tf

# define filter function
def h_filt(audio, label):
    h = np.load("rir.npy")
    x = audio.numpy()
    y = signal.lfilter(h, 1, x)
    return tf.convert_to_tensor(y, dtype=tf.float32), label
# apply it via TF map on dataset
aug_ds = ds.map(h_filt)
Using tf.numpy_function:
def tf_h_filt(audio, label):
    return tf.numpy_function(h_filt, [audio, label], [tf.float32, tf.string])

# apply it via TF map on dataset
aug_ds = ds.map(tf_h_filt)
I have two questions:
Is this way correct and fast enough (less than a minute for 50,000 files)?
Is there a faster way to do it? E.g., can I replace the SciPy function with a built-in TensorFlow function? I didn't find an equivalent of lfilter or SciPy's convolve.

Here is one way you could do it.
Notice that the TensorFlow function is designed to receive batches of inputs with multiple channels, and the filter can have multiple input channels and multiple output channels. Let N be the size of the batch, I the number of input channels, F the filter width, L the input width and O the number of output channels. Using padding='SAME' it maps an input of shape (N, L, I) and a filter of shape (F, I, O) to an output of shape (N, L, O).
import numpy as np
from scipy import signal
import tensorflow as tf

# data to compare the two approaches
x = np.random.randn(100)
h = np.random.randn(11)

y_lfilt = signal.lfilter(h, 1, x)
# Since the denominator of your filter transfer function is 1,
# the output of lfilter matches the convolution
y_np = np.convolve(h, x)
assert np.allclose(y_lfilt, y_np[:len(y_lfilt)])

# now let's do the convolution using tensorflow
y_tf = tf.nn.conv1d(
    # x must be padded with half of the size of h
    # to use padding 'SAME'
    np.pad(x, len(h) // 2).reshape(1, -1, 1),
    # the time axis of h must be flipped
    h[::-1].reshape(-1, 1, 1),  # a 1x1 matrix of filters
    stride=1,
    padding='SAME',
    data_format='NWC')
assert np.allclose(y_lfilt, np.squeeze(y_tf)[:len(y_lfilt)])
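To plug this back into a tf.data pipeline like the one in the question, here is a minimal sketch under the question's assumptions (a fixed RIR in rir.npy, float32 mono waveforms, and the dataset ds of (audio, label) pairs coming from the question):

import numpy as np
import tensorflow as tf

h = np.load("rir.npy").astype(np.float32)
pad = len(h) // 2
# flip the time axis once; reuse the same kernel for every element
kernel = tf.constant(h[::-1].copy().reshape(-1, 1, 1))

def rir_filter(audio, label):
    x = tf.pad(audio, [[pad, pad]])
    y = tf.nn.conv1d(tf.reshape(x, (1, -1, 1)), kernel,
                     stride=1, padding='SAME', data_format='NWC')
    # keep only the lfilter-equivalent part of the convolution
    return tf.reshape(y, (-1,))[:tf.shape(audio)[0]], label

aug_ds = ds.map(rir_filter, num_parallel_calls=tf.data.AUTOTUNE)

Because this stays inside the TensorFlow graph, it avoids the Python round-trip of tf.numpy_function and can run map calls in parallel.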

Related

Is there a Tensorflow equivalent for np.random.choice to randomly sample from a discrete set?

I am trying to implement an image augmentation strategy similar to RandAugment in TensorFlow. From the RandAugment paper, the following code shows how N augmentations are randomly selected to be applied to images.
transforms = ['Identity', 'AutoContrast', 'Equalize', 'Rotate', 'Solarize',
              'Color', 'Posterize', 'Contrast', 'Brightness', 'Sharpness',
              'ShearX', 'ShearY', 'TranslateX', 'TranslateY']

def randaugment(N, M):
    """Generate a set of distortions.

    Args:
        N: Number of augmentation transformations to apply sequentially.
        M: Magnitude for all the transformations.
    """
    sampled_ops = np.random.choice(transforms, N)
    return [(op, M) for op in sampled_ops]
However, I wish to do this per batch of images in TensorFlow, ideally as efficiently as possible. It would look something like
transform_names = ['Identity', 'Brightness', 'Colour', 'Contrast', 'Equalise', 'Rotate',
                   'Sharpness', 'ShearX', 'ShearY', 'TranslateX', 'TranslateY']

transforms = {'Identity': identity, 'Brightness': brightness, 'Colour': colour,
              'Contrast': contrast, 'Equalise': equalise, 'Rotate': rotate,
              'Sharpness': sharpness, 'ShearX': shear_x, 'ShearY': shear_y,
              'TranslateX': translate_x, 'TranslateY': translate_y}

def brightness(image, M):
    M = tf.math.minimum(M, 0.95)
    M = tf.math.maximum(M, 0.05)
    B = M - 1
    image = tf.image.adjust_brightness(image, delta=B)
    image = tf.clip_by_value(image, clip_value_min=0, clip_value_max=1)
    return image

def augment(image):
    N = 3
    M = tf.random.uniform(minval=0, maxval=1, shape=[])
    sampled_ops = np.random.choice(transform_names, N)
    for op in sampled_ops:
        image = transforms[op](image, M)
    return image

x = tf.data.Dataset.from_tensor_slices(x)
x = x.batch(batch_size)
x_a = x.map(augment)
where x is the dataset of images, and augment is the augmentation function that randomly samples N augmentations to apply to each image. I've added the brightness function to illustrate the composition of the individual augmentation functions. From what I've gathered, any NumPy function inside the map is called only once, when the function is traced into a graph, meaning the sampled augmentations will be the same for every image.
How could I write this code such that the individual augmentations are randomly sampled independently for each batch?
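Not an authoritative answer, but one pattern that avoids the trace-time sampling problem is to move the random choice into the graph with tf.random.uniform and tf.switch_case, so a fresh op index is drawn each time a batch flows through. A minimal sketch, assuming the transform_names list and transforms dict above (and that all transforms return tensors of the same shape and dtype):

import tensorflow as tf

def augment(image):
    N = 3
    M = tf.random.uniform(shape=[], minval=0, maxval=1)
    for _ in range(N):
        # drawn inside the graph, so it is re-sampled for every batch
        op_idx = tf.random.uniform(shape=[], minval=0,
                                   maxval=len(transform_names), dtype=tf.int32)
        # one branch per transform; switch_case picks one at run time
        image = tf.switch_case(
            op_idx,
            branch_fns=[lambda name=name: transforms[name](image, M)
                        for name in transform_names])
    return image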

Specific tensor decomposition

I want to decompose a 3-dimensional tensor using SVD.
I am not quite sure if, and how, the following decomposition can be achieved.
I already know how I can split the tensor horizontally from this tutorial: tensors.org Figure 2.2b
import numpy as np
from numpy import linalg as LA

d = 10; A = np.random.rand(d,d,d)
Am = A.reshape(d**2,d)
Um,Sm,Vh = LA.svd(Am,full_matrices=False)
U = Um.reshape(d,d,d); S = np.diag(Sm)
Matrix methods can be naturally extended to higher orders. SVD, for instance, can be generalized to tensors, e.g. with the Tucker decomposition, sometimes called a higher-order SVD.
We maintain a Python library for tensor methods, TensorLy, which lets you do this easily. In this case you want a partial Tucker, as you want to leave one of the modes uncompressed.
Let's import the necessary parts:
import tensorly as tl
from tensorly import random
from tensorly.decomposition import partial_tucker
For testing, let's create a 3rd order tensor of size (10, 10, 10):
size = 10
order = 3
shape = (size, )*order
tensor = random.random_tensor(shape)
You can now decompose the tensor with a partial Tucker decomposition. In your case, you want to leave one of the dimensions untouched, so you'll only have two factors (your U and V) and a core tensor (your S):
core, factors = partial_tucker(tensor, rank=size, modes=[0, 2])
You can reconstruct the original tensor from your approximation using a series of n-mode products to contract the core with the factors:
from tensorly import tenalg
rec = tenalg.multi_mode_dot(core, factors, modes=[0, 2])
rec_error = tl.norm(rec - tensor)/tl.norm(tensor)
print(f'Relative reconstruction error: {rec_error}')
In my case, I get
Relative reconstruction error: 9.66027176805661e-16
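For reference, a quick sanity check of the shapes in this example (my own addition, not part of the original answer): the core keeps the full, uncompressed mode 1, while factors holds one matrix per compressed mode.

print(core.shape)                   # (10, 10, 10)
print([f.shape for f in factors])   # [(10, 10), (10, 10)]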
You can also use the "tensorlearn" package in Python, for example using the tensor-train (TT) SVD algorithm.
https://github.com/rmsolgi/TensorLearn/tree/main/Tensor-Train%20Decomposition
import numpy as np
import tensorlearn as tl

# let's generate an arbitrary array
tensor = np.arange(0, 1000)
# reshape it into a higher (3-) dimensional tensor
tensor = np.reshape(tensor, (10, 20, 5))
epsilon = 0.05

# decompose the tensor into its factors
tt_factors = tl.auto_rank_tt(tensor, epsilon)  # epsilon is the error bound
# tt_factors is a list of three arrays which are the tt-cores

# rebuild (estimate) the tensor using the factors again as tensor_hat
tensor_hat = tl.tt_to_tensor(tt_factors)

# let's see the error
error_tensor = tensor - tensor_hat
error = tl.tensor_frobenius_norm(error_tensor) / tl.tensor_frobenius_norm(tensor)
print('error (%) = ', error * 100)  # which is less than epsilon

# one use of tensor decomposition is data compression,
# so let's calculate the compression ratio
data_compression_ratio = tl.tt_compression_ratio(tt_factors)
# data saving
data_saving = 1 - (1 / data_compression_ratio)
print('data saving (%): ', data_saving * 100)

scRNA-seq: How to use TSNE python implementation using precalculated PCA score/load?

Python t-sne implementation from this resource: https://lvdmaaten.github.io/tsne/
Btw I'm a beginner to scRNA-seq.
What I am trying to do: use a scRNA-seq data set and run t-SNE on it, but using previously calculated PCAs (I have PCA.score and PCA.load files).
Q1: I should be able to use my selected, pre-calculated PCAs in the t-SNE, but which file do I use, pca.score or pca.load, when running Y = tsne.tsne(X)?
Q2: I've tried removing/replacing parts of the PCA-calculating code to attempt to remove the PCA preprocessing, but it always seems to give an error. What should I change so that it properly uses my existing PCA data and does not calculate PCA from it again?
The piece of PCA processing code is this in its raw form:
def pca(X=np.array([]), no_dims=50):
    """
    Runs PCA on the NxD array X in order to reduce its dimensionality to
    no_dims dimensions.
    """
    print("Preprocessing the data using PCA...")
    (n, d) = X.shape
    X = X - np.tile(np.mean(X, 0), (n, 1))
    (l, M) = np.linalg.eig(np.dot(X.T, X))
    Y = np.dot(X, M[:, 0:no_dims])
    return Y
You should use the PCA score.
As for not running PCA, you can just comment out this line:
X = pca(X, initial_dims).real
What I did is to add a parameter do_pca and edit the function like this:
def tsne(X=np.array([]), no_dims=2, initial_dims=50, perplexity=30.0, do_pca=True):
    """
    Runs t-SNE on the dataset in the NxD array X to reduce its
    dimensionality to no_dims dimensions. The syntax of the function is
    `Y = tsne.tsne(X, no_dims, perplexity)`, where X is an NxD NumPy array.
    """
    # Check inputs
    if isinstance(no_dims, float):
        print("Error: array X should have type float.")
        return -1
    if round(no_dims) != no_dims:
        print("Error: number of dimensions should be an integer.")
        return -1

    # Initialize variables
    if do_pca:
        X = pca(X, initial_dims).real
    (n, d) = X.shape
    max_iter = 50
    [.. rest stays the same..]
Using an example dataset, without commenting out that line:
import numpy as np
from sklearn.manifold import TSNE
from sklearn.datasets import load_digits
import matplotlib.pyplot as plt
import sys
import os
from tsne import *
X,y = load_digits(return_X_y=True,n_class=3)
If we run the default:
res = tsne(X=X,initial_dims=20,do_pca=True)
plt.scatter(res[:,0],res[:,1],c=y)
If we pass it a PCA:
pc = pca(X)[:,:20]
res = tsne(X=pc,initial_dims=20,do_pca=False)
plt.scatter(res[:,0],res[:,1],c=y)
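As a side note, the example above imports TSNE from scikit-learn but never uses it; the same idea works there too, since you can simply hand the PCA scores to fit_transform as the input matrix. A sketch, continuing with pc and y from above:

emb = TSNE(n_components=2).fit_transform(pc)
plt.scatter(emb[:, 0], emb[:, 1], c=y)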

Gaussian filter in PyTorch

I am looking for a way to apply a Gaussian filter to an image (tensor) only using PyTorch functions. Using numpy, the equivalent code is
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
# Define 2D Gaussian kernel
def gkern(kernlen=256, std=128):
    """Returns a 2D Gaussian kernel array."""
    gkern1d = signal.gaussian(kernlen, std=std).reshape(kernlen, 1)
    gkern2d = np.outer(gkern1d, gkern1d)
    return gkern2d
# Generate random matrix and multiply the kernel by it
A = np.random.rand(256*256).reshape([256,256])
# Test plot
plt.figure()
plt.imshow(A*gkern(256, std=32))
plt.show()
The closest suggestion I found is based on this post:
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=264, bias=False)
with torch.no_grad():
    conv.weight = gaussian_weights
But it gives me the error NameError: name 'gaussian_weights' is not defined. How can I make it work?
Yupp I also had the same idea. So now the question becomes: is there a way to define a Gaussian kernel (or a 2D Gaussian) without using Numpy and/or explicitly specifying the weights?
Yes, it is pretty easy. Just have a look at the function documentation of signal.gaussian. There is a link to the source code. So what the method is doing is the following:
def gaussian(M, std, sym=True):
    if M < 1:
        return np.array([])
    if M == 1:
        return np.ones(1, 'd')
    odd = M % 2
    if not sym and not odd:
        M = M + 1
    n = np.arange(0, M) - (M - 1.0) / 2.0
    sig2 = 2 * std * std
    w = np.exp(-n ** 2 / sig2)
    if not sym and not odd:
        w = w[:-1]
    return w
And you are lucky, because it is straightforward to convert to PyTorch: (almost) just replace np by torch and you are done!
Also, note that the np.outer equivalent in torch is ger (torch.outer in recent versions).
There is a PyTorch class to apply Gaussian blur to your image:
torchvision.transforms.GaussianBlur(kernel_size, sigma=(0.1, 2.0))
Check the documentation for more info
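A quick usage sketch (my assumption: a float image tensor in (C, H, W) layout; kernel_size must be odd):

import torch
from torchvision.transforms import GaussianBlur

img = torch.rand(3, 100, 100)  # (C, H, W) image tensor
blurred = GaussianBlur(kernel_size=9, sigma=1.5)(img)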
Assuming that the question actually asks for a convolution with a Gaussian (i.e. a Gaussian blur, which is what the title and the accepted answer imply to me) and not for a multiplication (i.e. a vignetting effect, which is what the question's demo code produces), here is a pure PyTorch version that does not need torchvision to be installed (otherwise torchvision.transforms.GaussianBlur() can be used instead, as has been proposed by Mushfirat Mohaimin's answer):
from math import ceil

import torch
from torch.nn.functional import conv2d
from torch.distributions import Normal

def gaussian_kernel_1d(sigma: float, num_sigmas: float = 3.) -> torch.Tensor:
    radius = ceil(num_sigmas * sigma)
    support = torch.arange(-radius, radius + 1, dtype=torch.float)
    kernel = Normal(loc=0, scale=sigma).log_prob(support).exp_()
    # Ensure kernel weights sum to 1, so that image brightness is not altered
    return kernel.mul_(1 / kernel.sum())

def gaussian_filter_2d(img: torch.Tensor, sigma: float) -> torch.Tensor:
    kernel_1d = gaussian_kernel_1d(sigma)  # Create 1D Gaussian kernel
    padding = len(kernel_1d) // 2  # Ensure that image size does not change
    img = img.unsqueeze(0).unsqueeze_(0)  # Need 4D data for ``conv2d()``
    # Convolve along columns and rows
    img = conv2d(img, weight=kernel_1d.view(1, 1, -1, 1), padding=(padding, 0))
    img = conv2d(img, weight=kernel_1d.view(1, 1, 1, -1), padding=(0, padding))
    return img.squeeze_(0).squeeze_(0)  # Make 2D again

if __name__ == "__main__":
    import matplotlib.pyplot as plt

    img = torch.rand(size=(100, 100))
    img_filtered = gaussian_filter_2d(img, sigma=1.5)
    plt.subplot(121)
    plt.imshow(img)
    plt.subplot(122)
    plt.imshow(img_filtered)
    plt.show()
The code uses the basic idea of a separable filter that Andrei Bârsan implied in a comment to this answer. This means that convolution with a 2D Gaussian kernel can be replaced by convolving twice with a 1D Gaussian kernel – once along the image's columns, once along its rows. This is more efficient in general, as it uses 2N rather than N² multiplications per pixel for a kernel of side length N.
So in the provided code, we first create a 1D Gaussian kernel with gaussian_kernel_1d(), which we then apply twice in gaussian_filter_2d().
Some more notes on the code:
The parameter num_sigmas controls how many standard deviations and thus how much of the bulge of the Gaussian function we actually sample for producing the convolution kernel. As the Gaussian function theoretically has infinite support (meaning it is never zero), this presents a trade-off between accuracy and kernel size (which affects speed and memory use). A length of 3 * sigma should be sufficient for the two halves of the support usually, given that it will cover 99.7% of the area under the corresponding Gaussian function.
Rather than using Normal().log_prob().exp_() for producing the kernel, we could explicitly write the function of the normal distribution here, which might be a bit more efficient. In fact, we could write kernel = support.square_().mul_(-.5 / (sigma ** 2)).exp_(), thus (1) altering the values of support in-place (as we won't need them any longer) and (2) even omitting the normalization constant of the normal distribution (as we must normalize the kernel by its sum before returning it, anyway).
Although we use conv2d() rather than conv1d(), effectively we still have two 1D convolutions, as we apply a N×1 and 1×N kernel in conv2d(). We could have used conv1d() instead, but the code is much simpler with conv2d().
In more recent PyTorch versions, we can use conv2d(…, padding="same"), rather than calculating the padding amount ourselves. In either case, using conv2d()'s padding parameter implies padding with zeros. If we wanted more padding options, we could manually pad the image with torch.nn.functional.pad() before the convolution instead.
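For illustration, a tiny self-contained sketch of the string-padding variant (my addition; assumes PyTorch ≥ 1.9 and stride 1):

import torch
from torch.nn.functional import conv2d

kernel_1d = torch.tensor([0.25, 0.5, 0.25])  # any odd-length 1D kernel
img = torch.rand(1, 1, 8, 8)                 # conv2d() needs 4D data
# padding="same" keeps the spatial size without computing the pad manually
out = conv2d(img, weight=kernel_1d.view(1, 1, 1, -1), padding="same")
assert out.shape == img.shape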
I used all the code from above and updated it with the PyTorch replacement for np.outer, torch.outer:
import numpy as np
import matplotlib.pyplot as plt
import torch

def gaussian_fn(M, std):
    n = torch.arange(0, M) - (M - 1.0) / 2.0
    sig2 = 2 * std * std
    w = torch.exp(-n ** 2 / sig2)
    return w

def gkern(kernlen=256, std=128):
    """Returns a 2D Gaussian kernel array."""
    gkern1d = gaussian_fn(kernlen, std=std)
    gkern2d = torch.outer(gkern1d, gkern1d)
    return gkern2d

# Generate random matrix and multiply the kernel by it
A = np.random.rand(256*256).reshape([256, 256])
A = torch.from_numpy(A)
gaussian_filter = gkern(256, std=32)

ax = []
f = plt.figure(figsize=(12, 5))
ax.append(f.add_subplot(131))
ax.append(f.add_subplot(132))
ax.append(f.add_subplot(133))
ax[0].imshow(A, cmap='gray')
ax[1].imshow(gaussian_filter, cmap='gray')
ax[2].imshow(A * gaussian_filter, cmap='gray')
plt.show()

Simultaneously fit linearly every line of a 2d numpy array

I am working in Python on image analysis. I have an image (2d numpy array) with some intensity drift in it. I want to level it.
To remove the increasing/decreasing intensity over the width of the image, I want to fit every row of the 2d numpy array with a line. I however do not want to loop through every row index.
MWE:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize

width = 1500
height = 2500

fill_fun = lambda x, a, b: a*x + b
play_image = fill_fun(np.tile(np.arange(width), (height, 1)), 0.15, 2) + np.random.random((height, width))

# For representation purposes:
# plt.imshow(play_image, cmap='Greys_r')
# plt.show()

# 1) Fit every row and kill the intensity decrease/increase tendency
fit_func = lambda p, x: p[0]*x + p[1]
errfunc = lambda p, x, y: abs(fit_func(p, x) - y)  # Distance to the target function
x_axis = np.linspace(0, width, width)

for i in range(height):
    row_val = play_image[i, :]
    p0 = [(row_val[-1] - row_val[0]) / float(width), row_val[0]]  # guess
    p1, success = optimize.leastsq(errfunc, p0[:], args=(x_axis, row_val))
    play_image[i, :] -= fit_func(p1, x_axis) - p1[1]
By doing this I effectively level my image intensity horizontally. Is there any way I can replace the loop with a matrix operation? To somehow fit all the lines at the same time with a (height, 2) parameter vector?
Thanks for the help
Fitting a line has a simple closed-form formula you can use directly: row-wise, the slope is cov(x, y)/var(x) = (mean(x*y) - mean(x)*mean(y)) / (mean(x*x) - mean(x)**2), and the intercept is mean(y) - slope*mean(x). This can be done in about three short lines in numpy (most of the code below just makes and plots the data and fits):
import numpy as np
import matplotlib.pyplot as plt

# make the data as sequential sections of a circle
theta = np.linspace(np.pi, 0, 120)
y = np.reshape(np.sin(theta), (10, 12))
x = np.repeat(np.arange(12)[None, :], 10, axis=0)

# fit the line: beta is the slope and alpha the intercept, one per row
m = lambda x: np.mean(x, axis=1)
beta = (m(y*x) - m(x)*m(y)) / (m(x*x) - m(x)**2)
alpha = m(y) - beta*m(x)

# plot the data and fits
plt.plot([y[:, i] for i in range(12)], ".")  # plot the data
plt.gca().set_prop_cycle(None)  # reset the color cycle (set_color_cycle in old matplotlib)
fits = alpha[:, None] + beta[:, None]*x  # make lines from the fits for the plots
plt.plot(fits.T)
plt.show()
You can implement the normal equations and their solution pretty easily. The main challenge is keeping track of the appropriate dimensions so all the vectorized operations work correctly. Here's one method:
import numpy as np
# image size
m = 100
n = 125
# A random image to work with.
np.random.seed(123)
img = np.random.randint(0, 100, size=(m, n))
# X is the design matrix. It is the same for each row. It has shape (n, 2).
X = np.column_stack((np.ones(n), np.arange(n)))
# A is X.T.dot(X), but in this case we can use an explicit formula for each term.
s1 = 0.5*n*(n - 1) # Sum of integers
s2 = n*(n - 0.5)*(n - 1)/3.0 # Sum of squared integers
A = np.array([[n, s1], [s1, s2]])
# Y has shape (2, m). Each column is a vector on the right-hand-side of the
# normal equations.
Y = X.T.dot(img.T)
# Solve the normal equations. beta has shape (2, m). Each column gives the
# coefficients of the linear fit for each row of img.
beta = np.linalg.solve(A, Y)
# Create an array that holds the linear drift for each row.
# X has shape (n, 2) and beta has shape (2, m), so row_drift has shape (m, n),
# the same as img.
row_drift = X.dot(beta).T
# Remove the drift from img.
img2 = img - row_drift
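As a third option (my addition, not from the original answers), np.polyfit is already vectorized over the columns of a 2D y, so you can fit every row of the image in a single call by passing the image transposed:

import numpy as np

m, n = 100, 125
np.random.seed(123)
img = np.random.randint(0, 100, size=(m, n)).astype(float)

x = np.arange(n)
# polyfit fits each column of a 2D y independently: one fit per image row
slope, intercept = np.polyfit(x, img.T, deg=1)     # each has shape (m,)
row_drift = slope[:, None]*x + intercept[:, None]  # shape (m, n)
img2 = img - row_drift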
