Save Audio Features extracted using Librosa in a multichannel Numpy array


I am trying to extract features from audio files using Librosa, to feed to a CNN as Numpy arrays.
Currently I save a single feature at a time to feed into the CNN. I save two-dimensional (single-channel) log-scaled mel-spectrogram features in Python using Librosa:
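import numpy as np
import librosa

def build_features():
    y, sr = librosa.load("audio.wav")
    mel = librosa.feature.melspectrogram(
        y=y,
        sr=sr,
        n_fft=4096,
        n_mels=128,  # Mel-bins
        hop_length=2048,
    )
    # melspectrogram returns a power spectrogram, so power_to_db is the matching dB conversion
    # (it replaces the old librosa.logamplitude); add batch and channel axes
    logspec = librosa.power_to_db(mel, ref=1.0)[np.newaxis, :, :, np.newaxis]
    return logspec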
This gives the shape (1,128,323,1).
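The width of 323 is the number of STFT frames. As a rough arithmetic check, assuming librosa's default centered framing (1 + number_of_samples // hop_length frames) and the default sr of 22050 Hz used by librosa.load, a hypothetical 30-second clip gives exactly this width. A second feature computed with the same sr and hop_length should therefore end up with the same width, which is what makes stacking the two possible:

```python
def n_frames(num_samples: int, hop_length: int) -> int:
    # Frame count for centered framing (librosa's default, center=True)
    return 1 + num_samples // hop_length

sr = 22050             # librosa.load's default sampling rate
hop_length = 2048      # as in build_features()
num_samples = 30 * sr  # a hypothetical 30-second clip: 661500 samples
print(n_frames(num_samples, hop_length))
```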
I would like to add another feature, let's say a tempogram. I can do this using the same code, but replacing melspectrogram with tempogram and setting the window length (win_length) to 128.
This gives me a tempogram of shape (1,128,323,1).
Now I would like to "stack" these two feature layers into a multi-channel NumPy array that I can feed into a CNN in Keras.
How should I code this?
EDIT:
I think I figured it out, using np.vstack().
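A quick shape check suggests that np.vstack joins the arrays along axis 0, the samples axis, which yields two single-channel samples rather than one two-channel sample. For a Keras Conv2D with the default channels_last layout, concatenating along the last axis is probably what is wanted. In this sketch, random arrays are hypothetical stand-ins for the real mel-spectrogram and tempogram:

```python
import numpy as np

# Hypothetical stand-ins for the two features: shape (samples=1, height=128, width=323, channels=1)
rng = np.random.default_rng(0)
logspec = rng.random((1, 128, 323, 1)).astype(np.float32)
tempogram = rng.random((1, 128, 323, 1)).astype(np.float32)

# np.vstack joins along axis 0 (samples), giving 2 samples with 1 channel each
as_samples = np.vstack([logspec, tempogram])
print(as_samples.shape)  # (2, 128, 323, 1)

# Concatenating along the last axis gives 1 sample with 2 channels (channels_last)
as_channels = np.concatenate([logspec, tempogram], axis=-1)
print(as_channels.shape)  # (1, 128, 323, 2)

# Equivalent: stack the 2-D (128, 323) maps on a new last axis, then add the batch axis
stacked = np.stack([logspec[0, :, :, 0], tempogram[0, :, :, 0]], axis=-1)[np.newaxis]
print(stacked.shape)  # (1, 128, 323, 2)
```

The same pattern extends to more features: each additional (1, 128, 323, 1) array becomes one more channel.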

Related

Reshaping Image for PyTorch

I used to use Keras, and the image format it followed is [Height x Width x Channels x Samples]. I decided to switch to PyTorch, but I didn't switch out my data loading schemes. So now I have NumPy arrays of shape HxWxCxS instead of SxCxHxW, which is what PyTorch requires. Does anyone have any idea how to convert this?

First, Keras format is (samples, height, width, channels).
All you need to do is moved = numpy.moveaxis(data, -1, 1), which moves the channels axis from the last position to position 1.
If by luck you were using the non-default "channels_first" data format, your data is already in the same layout as PyTorch, which is (samples, channels, height, width).
To convert to a torch tensor: data = torch.from_numpy(moved)
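Here is a small shape check of both cases, using a hypothetical random array: numpy.moveaxis handles the Keras (samples, height, width, channels) layout, while the HxWxCxS layout described in the question needs a full permutation with numpy.transpose. Both routes put every element in the same place:

```python
import numpy as np

# Hypothetical data in Keras channels_last layout: (samples, height, width, channels)
rng = np.random.default_rng(0)
data_keras = rng.random((8, 32, 24, 3))

# Move the channels axis (last) to position 1 -> (samples, channels, height, width)
moved = np.moveaxis(data_keras, -1, 1)
print(moved.shape)  # (8, 3, 32, 24)

# The layout described in the question, (height, width, channels, samples),
# needs a full permutation: new axes (S, C, H, W) = old axes (3, 2, 0, 1)
data_hwcs = np.transpose(data_keras, (1, 2, 3, 0))  # build an HxWxCxS version for the demo
converted = np.transpose(data_hwcs, (3, 2, 0, 1))
print(converted.shape)  # (8, 3, 32, 24)
```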

You can convert your numpy arrays to tensors in pytorch quite easily by using the from_numpy function:
import torch
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
b is now usable in PyTorch.

Error when getting features from tensorflow-dataset

I'm getting an error when attempting to load the Caltech tensorflow-dataset. I'm using the standard code found in the tensorflow-datasets GitHub repository.
The error is this:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot batch tensors with different shapes in component 0. First element had shape [204,300,3] and element 1 had shape [153,300,3]. [Op:IteratorGetNextSync]
The error points to the line: for features in ds_train.take(1):
Code:
import tensorflow_datasets as tfds

ds_train, ds_test = tfds.load(name="caltech101", split=["train", "test"])
ds_train = ds_train.shuffle(1000).batch(128).prefetch(10)
for features in ds_train.take(1):
    image, label = features["image"], features["label"]

The issue comes from the fact that the dataset contains variable-sized images (see the caltech101 dataset description). TensorFlow can only batch together elements with the same shape, so you first need to either resize the images to a common shape (e.g., the input shape of your network) or pad them accordingly.
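The batching constraint can be reproduced with NumPy alone. The sketch below uses two hypothetical all-ones images with the shapes from the error message: np.stack, which is essentially what batching does, refuses them, while zero-padding both to the largest height and width makes them stackable. This is roughly what tf.image.pad_to_bounding_box does with zero offsets; pad_to is a hypothetical helper, not a TensorFlow API:

```python
import numpy as np

# Two hypothetical images with the shapes reported in the error message
images = [np.ones((204, 300, 3), dtype=np.uint8), np.ones((153, 300, 3), dtype=np.uint8)]

# Stacking (what batching does) requires identical shapes
try:
    np.stack(images)
except ValueError as err:
    print("cannot batch:", err)

def pad_to(img, height, width):
    """Zero-pad an HxWxC image at the bottom and right to (height, width, C)."""
    pad_h = height - img.shape[0]
    pad_w = width - img.shape[1]
    return np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)))  # constant mode pads with zeros

max_h = max(img.shape[0] for img in images)
max_w = max(img.shape[1] for img in images)
batch = np.stack([pad_to(img, max_h, max_w) for img in images])
print(batch.shape)  # (2, 204, 300, 3)
```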
If you want to resize, use tf.image.resize (tf.image.resize_images in TF 1.x). Because tfds.load returns each example as a dictionary of features here, the mapped function takes a single argument:
import tensorflow as tf

def preprocess(features):
    # YOUR_TARGET_SIZE is a placeholder for (height, width), e.g. the input size of your network
    features['image'] = tf.image.resize(features['image'], YOUR_TARGET_SIZE)
    # Other possible transformations needed (e.g., converting to float, normalizing to [0, 1])
    return features
If, instead, you want to pad, use tf.image.pad_to_bounding_box (just replace it in the above preprocess function and adapt the parameters as needed).
Normally, for most of the networks I'm aware of, resizing is used.
Finally, map the function over your dataset:
ds_train = (ds_train
            .map(preprocess)
            .shuffle(1000)
            .batch(128)
            .prefetch(10))
Note: the particular shapes reported in the error message vary from run to run, because the shuffle call determines which images end up in the first batch.

Inputting numpy array images into pytorch neural net

I have a numpy array representation of an image and I want to turn it into a tensor so I can feed it through my pytorch neural network.
I understand that the neural network takes tensors arranged as [3,100,100] rather than [100,100,3], that the pixels must be rescaled, and that the images must be in batches.
So I did the following:
import cv2
import numpy as np
import torch
my_img = cv2.imread('testset/img0.png')
my_img.shape #returns [100,100,3], a 3-channel image with 100x100 resolution
my_img = np.transpose(my_img,(2,0,1))
my_img.shape #returns [3,100,100]
#convert the numpy array to tensor
my_img_tensor = torch.from_numpy(my_img)
#rescale to be [0,1] like the data it was trained on by default
my_img_tensor *= (1/255)
#turn the tensor into a batch of size 1
my_img_tensor = my_img_tensor.unsqueeze(0)
#send image to gpu
my_img_tensor.to(device)
#put forward through my neural network.
net(my_img_tensor)
However this returns the error:
RuntimeError: _thnn_conv2d_forward is not implemented for type torch.ByteTensor

The problem is that the input you give to your network is of type ByteTensor, while conv-like operations are only implemented for floating-point types. Try the following:
my_img_tensor = my_img_tensor.type('torch.DoubleTensor')
# converts to a double tensor; if the network has the default float32 weights, use my_img_tensor.float() instead so the input type matches the weight type
Source PyTorch Discussion Forum
Thanks to AlbanD
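A minimal end-to-end sketch of the corrected pipeline, under stated assumptions: a random uint8 array stands in for cv2.imread('testset/img0.png'), and a tiny nn.Sequential is a hypothetical stand-in for net. The tensor is converted with .float() to match the default float32 weights before rescaling, and the result of .to(device) is reassigned, because .to() returns a new tensor rather than modifying it in place:

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical stand-in for cv2.imread('testset/img0.png'): a 100x100x3 uint8 image
my_img = np.random.randint(0, 256, size=(100, 100, 3), dtype=np.uint8)

# Hypothetical tiny network with default float32 weights (stand-in for `net`)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.ReLU()).to(device)

def to_input_tensor(img: np.ndarray) -> torch.Tensor:
    """HxWxC uint8 image -> 1xCxHxW float32 tensor with values in [0, 1]."""
    chw = np.ascontiguousarray(np.transpose(img, (2, 0, 1)))  # (3, 100, 100)
    tensor = torch.from_numpy(chw)   # still uint8 (ByteTensor)
    tensor = tensor.float() / 255.0  # float32, rescaled to [0, 1]
    return tensor.unsqueeze(0)       # add batch dimension -> (1, 3, 100, 100)

x = to_input_tensor(my_img).to(device)  # reassign: .to() does not work in place
with torch.no_grad():
    out = net(x)
print(x.dtype, tuple(x.shape), tuple(out.shape))
```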

Keras CNN with varying image sizes

I'm trying to use the VOC2012 dataset for training a CNN. For my project, I require B&W data, so I extracted the R components. So far so good. The trouble is that the images are of different sizes, so I can't figure out how to pass it to the model. I compiled my model, and then created my mini-batches of size 32 as below (where X_train and Y_train are the paths to the files).
import numpy as np
import matplotlib.pyplot as plt

X, Y = [], []
for x in X_train:
    img = plt.imread(x)
    img = img.reshape(*(img.shape), 1)
    X.append(img)
for y in Y_train:
    img = plt.imread(y)
    img = img.reshape(*(img.shape), 1)
    Y.append(img)
model.train_on_batch(np.array(X), np.array(Y))
However, I suspect that because the images are all of different sizes, the numpy array has a shape (32,) rather than (32, height, width, 1) as I'd expect. How do I take care of this?

According to some sources (Quora, Cross Validated), one is indeed able to train at least some architectures with varying input sizes.
When it comes to collecting arrays that vary in size, one might simply use a Python list of NumPy arrays, or an ndarray of dtype object, to hold all the image data. During training, according to the Quora answer, either only batch size 1 can be used, or several images can be grouped into batches based on their sizes. Padding with zeros could also be used to make the images evenly sized, but I can't say much about the validity of that approach.
Best of luck in your research!
Example code for illustration:
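import numpy as np
# Generate 10 "images" with different sizes, collected in a Python list
images = [np.zeros((i+5, i+10)) for i in range(10)]
# Or a 1-D ndarray of dtype object; filling it element by element avoids NumPy
# trying (and, in recent versions, failing) to build an array from ragged input
images_obj = np.empty(len(images), dtype=object)
for i, img in enumerate(images):
    images_obj[i] = img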
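The "group images by size" idea can be sketched as a small generator that buckets images by shape, so every batch handed to train_on_batch is a regular 4-D array. This assumes the model accepts variable spatial sizes (e.g. a fully convolutional model built with input shape (None, None, 1)) and that each target has the same shape as its input image, as in the question; batches_by_shape is a hypothetical helper, not a Keras API:

```python
import numpy as np
from collections import defaultdict

def batches_by_shape(images, labels, batch_size):
    """Yield (X, Y) batches in which every image has the same (height, width).

    images, labels: lists of 2-D arrays; labels[i] is assumed to match images[i] in shape.
    A trailing channel axis is added so each batch has shape (n, height, width, 1).
    """
    buckets = defaultdict(list)
    for idx, img in enumerate(images):
        buckets[img.shape].append(idx)  # group indices by image shape
    for indices in buckets.values():
        for start in range(0, len(indices), batch_size):
            chunk = indices[start:start + batch_size]
            X = np.stack([images[i][..., np.newaxis] for i in chunk])
            Y = np.stack([labels[i][..., np.newaxis] for i in chunk])
            yield X, Y

# Usage with hypothetical images of two different sizes
images = [np.zeros((64, 48)), np.zeros((64, 48)), np.zeros((32, 32))]
batches = list(batches_by_shape(images, images, batch_size=32))
for X, Y in batches:
    print(X.shape, Y.shape)  # then e.g. model.train_on_batch(X, Y)
```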

How to format training input and output data on Keras

I am new to deep learning and I am struggling with data formatting in Keras. My CNN is based on the Stacked Hourglass Networks for Human Pose Estimation by A. Newell et al.
In this network the input is a 256x256 RGB image and the output should be a 64x64 heatmap highlighting body joints (shoulder, knee, ...). I managed to build the network and I have all the data (images) with their annotations (pixel labels for body joints). I was wondering how I should format the input and output data of the training set to train my model. Currently I use a NumPy array of shape (256,256,3) for an image, and I don't know how to format my output. Should I create an array of shape [n,64,64,7]? (n being the size of the training set, and 7 the number of filters I use to obtain a heatmap for each of 7 joints)
Thank you for your time.

The output can also be a numpy array.
Consider this example:
Training set: 50 images of size 256x256x3. These can be combined into a single NumPy array of shape (50, 256, 256, 3).
The output data can be formatted the same way.
Sample code below:
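import numpy as np
# a, b and c are arrays of shape (256, 256, 3); zeros are used here as placeholders
a, b, c = (np.zeros((256, 256, 3)) for _ in range(3))
temp = []
temp.append(a)
temp.append(b)
temp.append(c)
# stack along a new first axis
output_labels = np.stack(temp)
The output_labels array will have shape (3, 256, 256, 3).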
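One common way to build the [n,64,64,7] targets is to render each joint as a small 2-D Gaussian on the 64x64 grid, with one channel per joint, and then stack the per-image heatmaps along a new first axis as above. This is a sketch under assumptions: joint_heatmaps is a hypothetical helper, joint coordinates are taken as (x, y) pixels in the 256x256 input image, and sigma=1.0 is an illustrative choice rather than something the question specifies:

```python
import numpy as np

def joint_heatmaps(joints_xy, out_size=64, in_size=256, sigma=1.0):
    """Render one Gaussian heatmap per joint.

    joints_xy: (num_joints, 2) array of (x, y) pixel coordinates in the input image.
    Returns a float32 array of shape (out_size, out_size, num_joints) (channels_last).
    """
    scale = out_size / in_size                 # 256 -> 64 means dividing coordinates by 4
    ys, xs = np.mgrid[0:out_size, 0:out_size]  # row (y) and column (x) index grids
    maps = []
    for x, y in np.asarray(joints_xy, dtype=float) * scale:
        maps.append(np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2)))
    return np.stack(maps, axis=-1).astype(np.float32)

# Hypothetical annotations: 2 images, 7 joints each, (x, y) in 256x256 image coordinates
rng = np.random.default_rng(0)
annotations = rng.uniform(0, 255, size=(2, 7, 2))
Y = np.stack([joint_heatmaps(j) for j in annotations])
print(Y.shape)  # (2, 64, 64, 7)
```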

Keras recommends creating a data generator to feed training data and ground truth to the network.
Specific to the stacked hourglass network case, you can refer to my implementation for details: https://github.com/yuanyuanli85/Stacked_Hourglass_Network_Keras/tree/master/src/data_gen
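As a minimal illustration of that recommendation (not the linked implementation), a plain Python generator over in-memory NumPy arrays might look like this. batch_generator is a hypothetical helper; it assumes images of shape (n, 256, 256, 3) and heatmap targets of shape (n, 64, 64, 7), reshuffles every epoch, and drops the last incomplete batch. Recent Keras versions should accept such a generator in model.fit together with steps_per_epoch:

```python
import numpy as np

def batch_generator(images, targets, batch_size, shuffle=True, seed=None):
    """Endlessly yield (X, Y) batches from NumPy arrays with a shared first axis."""
    n = len(images)
    if batch_size > n:
        raise ValueError("batch_size must not exceed the number of samples")
    rng = np.random.default_rng(seed)
    while True:
        order = rng.permutation(n) if shuffle else np.arange(n)
        # iterate over full batches only; the remainder is dropped this epoch
        for start in range(0, n - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            yield images[idx], targets[idx]

# Usage with hypothetical placeholder data
X = np.zeros((10, 256, 256, 3), dtype=np.float32)
Y = np.zeros((10, 64, 64, 7), dtype=np.float32)
gen = batch_generator(X, Y, batch_size=4, seed=0)
xb, yb = next(gen)
print(xb.shape, yb.shape)  # (4, 256, 256, 3) (4, 64, 64, 7)
```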
