I should mention that this is the very first time I'm trying audio signal processing in Python. I have an audio dataset and I am extracting pitch features using the Aubio library and MFCC features using the python_speech_features library. The thing is, for a single audio file, I am getting a roughly 84-valued vector for the pitch and a 12-valued feature vector for the MFCCs.
Image of extracted pitch feature vector
So how do I save all of these values in a single CSV file? I have around 700 audio files separated into different directories according to emotion. Should I take the mean of all of these values and save them per audio file in a CSV?
Also, how would I then use these values for classification?
Any help would be much appreciated, Thanks.
There is not a simple answer to your question.
I understand that for each data sample you extract a set of features, the same set for every sample, right?
I suppose you work within a for loop, something like this:
import numpy as np

all_features = []
for path in path_list:
    x = open_file(path)  # a hypothetical function to open your files
    features = extract_features(x)  # a hypothetical function to extract features
    all_features.append(features)
If your code looks like my simple example, you have created a list all_features whose element all_features[i] contains the extracted features of sample i. In addition, I assume that each extracted feature set is a numpy vector. If it is not, you should convert it into a numpy vector (something like features = np.array(features)).
Ok, now you are ready to create a dataset:
data = np.vstack(all_features)
The vertical stack np.vstack generates a matrix of shape (n_samples, n_features). Warning: all feature vectors must have the same shape!
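Regarding the shapes: since your pitch track and your MFCCs can have a different number of values per file, one simple option (in line with the mean you mention in your question) is to collapse each feature into fixed summary statistics before stacking. A minimal sketch, assuming pitch is a 1-D array of per-frame pitch values and mfcc is a 2-D array of shape (n_frames, 12):

import numpy as np

def summarize_features(pitch, mfcc):
    # Collapse variable-length features into one fixed-length vector:
    # mean and std of the pitch track, plus per-coefficient mean and std
    # of the MFCCs, i.e. 2 + 12 + 12 = 26 values for every file.
    return np.concatenate([
        [np.mean(pitch), np.std(pitch)],
        np.mean(mfcc, axis=0),
        np.std(mfcc, axis=0),
    ])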
Now you want to save the dataset. There is an ocean of possibilities; these are my three favorite options:
1) using pandas to create a csv file:
import pandas as pd
df = pd.DataFrame(data)
df.to_csv(filename+'.csv', index=False, header=header)  # header is a list of strings naming the CSV columns
# see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
2) Dump the data into a pickle file:
import six.moves.cPickle as pickle
with open(filename+'.pkl', 'wb') as f:
    pickle.dump(data, f)
3) Save as a numpy file:
np.save(filename+'.npy', data)
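Whichever format you choose, you can load the matrix back later, for example (same filename as above):

import numpy as np
import pandas as pd
import six.moves.cPickle as pickle

data_csv = pd.read_csv(filename + '.csv').values  # back to a numpy array
with open(filename + '.pkl', 'rb') as f:
    data_pkl = pickle.load(f)
data_npy = np.load(filename + '.npy')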
Concerning the classification problem, if you want to use a supervised method (MLP, RF, SVM, KNN, ...) you need class labels (the ground truth), i.e. a vector whose length equals the number of samples and which assigns an integer to each sample (for example 0 and 1 in a binary classification, or 0, 1, 2, 3 for a 4-class classification). This strongly depends on what you want, i.e. on the goal of your training.
Once you have the data matrix and the label vector, any machine-learning method will be able to classify, provided you have enough samples. To that end, I suggest you use some data-augmentation criteria; to get an idea, have a look at this paper, it could give you some ideas.
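As a minimal sketch of that supervised setup (scikit-learn here is just one possible choice, and labels is the ground-truth vector you have to build yourself, e.g. one integer per emotion):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# data has shape (n_samples, n_features), labels has shape (n_samples,)
X_train, X_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print('test accuracy:', clf.score(X_test, y_test))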
Hoping I have helped you, good work!
Python has a built-in csv module.
Its documentation gives a simple example of how to use a writer to write rows to your CSV.
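For instance, a minimal sketch (the path, emotion label and feature values below are placeholders for what you actually extract):

import csv

# placeholder values: replace with your real path, emotion label and feature vector
audio_path = 'happy/file_001.wav'
emotion = 'happy'
features = [0.1, 0.2, 0.3]

with open('features.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['file', 'emotion'] + ['f{}'.format(i) for i in range(len(features))])
    writer.writerow([audio_path, emotion] + list(features))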
I'm attempting to obtain feature vectors of short wav (audio) files using wav2vec via Hugging Face Transformers.
However, for unknown reasons, no matter which approach I use to control the output size, the results do not meet my requirements.
Ideally, I'd like to get all of the vectors to be the same length (e.g. 60K).
I tried to get it with the following command:
feature_extractor(input_audio, sampling_rate=16000, return_tensors="np", padding="max_length",
max_length=60000).input_values
That command let me enforce a lower bound on the data size by padding all the vectors to a minimum length of 60K, but I was surprised to see vectors with 120K values created as well.
Then I removed the padding parameter in the hope of obtaining vectors with no padding but an upper bound of 60K.
Based on the max_length documentation:
Maximum length of the returned list and optionally padding length
So I executed this line:
feature_extractor(input_audio, sampling_rate=16000, return_tensors="np",
max_length=60000).input_values
Unexpectedly, I received vectors ranging in length from 20K to 120K, not limited at all.
To reproduce my bug and results, I've included a snippet of code and a link to relevant audio data.
import librosa
import numpy as np
from transformers import Wav2Vec2FeatureExtractor
from pathlib import Path

p = Path(dataset_path)
audio_files = [i.parents[0] / i.name for i in p.glob('**/*.wav')]
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained('facebook/wav2vec2-base-960h')

for file in audio_files:
    input_audio, _ = librosa.load(file, sr=16000)
    features_with_padding = feature_extractor(input_audio, sampling_rate=16000,
                                              return_tensors="np", padding="max_length",
                                              max_length=60000).input_values
    features_without_padding = feature_extractor(input_audio, sampling_rate=16000,
                                                 return_tensors="np",
                                                 max_length=60000).input_values
    print(features_with_padding.shape, features_without_padding.shape)
In this drive folder, I have attached 2 wav files that produce vectors of about 80K length.
How can I create fixed-size feature vectors with a wav2vec transformer?
At the moment truncation is not supported by the feature extractor in Hugging Face, so if you want to "pad" to a "max_length" that is shorter than the sample length, it simply won't change anything since no padding is needed.
However, we should definitely add a truncation functionality to Transformers as it is very important.
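As a workaround, you can force every sample to the same length yourself before calling the feature extractor; here is a minimal sketch with plain numpy (the 60K target length is taken from your question):

import numpy as np

def pad_or_truncate(audio, target_len=60000):
    # Cut audio that is too long and zero-pad audio that is too short,
    # so every example ends up with exactly target_len samples.
    audio = audio[:target_len]
    if len(audio) < target_len:
        audio = np.pad(audio, (0, target_len - len(audio)))
    return audio

# e.g.:
# features = feature_extractor(pad_or_truncate(input_audio),
#                              sampling_rate=16000, return_tensors="np").input_values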
I am working on a Python/TensorFlow image classification model. In my training images I have 12,611 images, but in my training labels I have 12,613 entries. (Each image has a number as its title, and this number corresponds to the same number in a CSV file with the accompanying information for that image.)
From here, what I need to do is simply remove those 2 extra data points for which I don't have pictures. How can I write code to help with this?
(If the code tells me which data points are the extras, I can manually remove them from the CSV file)
Thanks for the help.
Well, it's very straightforward; you can try something like this (as I don't know exactly how and where you have saved your images, you might have to update the code to meet your use case):
import os
import pandas as pd

dir_path = r'/path/to/folder/of/images'
csv_path = r'/path/to/csv/file'

images = []
# Get all image labels from the file names
for filename in os.listdir(dir_path):
    images.append(int(filename.split('.')[0]))

# Read CSV
df = pd.read_csv(csv_path)

# Print which labels are extra (no matching image)
for i in df['<COLUMN_NAME>'].tolist():
    if i not in images:
        print(i)
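If you'd rather not remove them by hand, you can also drop those rows directly and write out a cleaned CSV (continuing from the snippet above, with the same placeholder column name and path):

# Keep only the rows whose label has a matching image, then save
df_clean = df[df['<COLUMN_NAME>'].isin(images)]
df_clean.to_csv(r'/path/to/cleaned/csv/file', index=False)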
I want to train a neural network. I work with Python (3.6.9) and TensorFlow (2.4.0), and my problem is that my dataset is too big to be stored in memory.
A bit of context :
My network takes as input a small complex matrix of dimension 64 by 32.
My dataset is stored in the form of a very large ".mat" file generated by MATLAB code.
In the mat file, the samples are stored in a large cell array.
I use the h5py library to open the mat file.
Example of Python code to load only one sample from the file:
import h5py
import numpy as np

f = h5py.File('dataset.mat', 'r')
refs = f['data']  # array of references to each sample
sample = f[refs[0]][()].view(np.complex)  # load the first sample
Currently, I load only a small part of the dataset that I store in a tensorflow dataset (ds = tf.data.Dataset.from_tensor_slices(datas)).
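For reference, here is roughly what that in-memory approach looks like, reusing f and refs from the snippet above (n_loaded is a placeholder for the number of samples that fit in memory):

import tensorflow as tf

n_loaded = 1000  # placeholder: number of samples loaded into memory
datas = np.stack([f[refs[i]][()].view(np.complex).astype(np.complex64)
                  for i in range(n_loaded)])
ds = tf.data.Dataset.from_tensor_slices(datas)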
I would like to take advantage of the h5py library's ability to load each example individually, so that examples are loaded on the fly during network training.
I tried the following approach:
f = h5py.File('dataset.mat', 'r')
refs = f['data'] # array of reference of each sample
ds_index = tf.data.Dataset.range(len(refs))
ds = ds_index.map(lambda i : f[refs[i]][()].view(np.complex))
but, I have the following error :
NotImplementedError: in user code:
<ipython-input-66-6cf802c8359a>:15 __call__ *
return self._f[self._rs[i]]['channel'][()].view(np.complex).astype(np.complex64).T
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:855 __array__
" a NumPy call, which is not supported".format(self.name))
NotImplementedError: Cannot convert a symbolic Tensor (args_0:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported
Do you know how to fix this error, or is there a better way to load examples on the fly?
I have a large 40 MB (about 173,397 lines) .dat file filled with binary data (random symbols). It is an astronomical photograph. I need to read and display it with Python. I am using a binary file because I will need to extract pixel value data from specific regions of the image, but for now I just need to ingest it into Python, something like the READU procedure in IDL. I tried numpy and matplotlib but nothing worked. Suggestions?
You need to know the data type and dimensions of the binary file. For example, if the file contains float data, use numpy.fromfile like:
import numpy as np
data = np.fromfile(filename, dtype=float)
Then reshape the array to the dimensions of the image, dims, using numpy.reshape (the equivalent of REFORM in IDL):
im = np.reshape(data, dims)
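Putting the two steps together with a quick display, a minimal sketch (the dtype and dims used here are assumptions; replace them with your file's actual data type and image dimensions):

import numpy as np
import matplotlib.pyplot as plt

dims = (512, 512)                                  # assumed image dimensions
data = np.fromfile('image.dat', dtype=np.float32)  # assumed data type
im = np.reshape(data, dims)

plt.imshow(im, cmap='gray', origin='lower')
plt.colorbar()
plt.show()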
I have a dataset stored in NetCDF4 format that consists of Intensity values with 3 dimensions: Loop, Delay and Wavelength. I named my coordinates the same as the dimensions (I don't know if it's good or bad...)
I'm using xarray (formerly xray) in Python to load the dataset:
import xarray as xr
ds = xr.open_dataset('test_data.netcdf4')
Now I want to manipulate the data while keeping track of the original data. For instance, I would:
Apply an offset to the Delay coordinates and keep the original Delay dataarray untouched. This seems to be done with:
ds_ = ds.assign_coords(Delay_corr=ds.Delay.copy(deep=True) + 25)
Substitute the coordinates Delay for Delay_corr for all relevant dataarrays in the dataset. However, I have no clue how to do this and I didn't find anything in the documentation.
Would anybody know how to perform item #2?
To download the NetCDF4 file with test data:
http://1drv.ms/1QHQTRy
The method you're looking for is the xr.swap_dims() method:
ds.coords['Delay_corr'] = ds.Delay + 25 # could also use assign_coords
ds2 = ds.swap_dims({'Delay': 'Delay_corr'})
See this section of the xarray docs for a full example.
I think it's much simpler than that.
If you don't want to change the existing data, you create a copy. Note that changing ds won't change the netcdf4 file, but assuming you still don't want to change ds:
ds_ = ds.copy(deep=True)
Then just set the Delay coordinate to a modified version of the old one:
ds_.coords['Delay'] = ds_['Delay'] + 25