Generating a 3D numpy array from variably-shaped 2D numpy arrays? - python

I am generating a large number of mel spectrograms to train a NN for phoneme detection.
Each mel spectrogram (generated with librosa.feature.melspectrogram in Python) is represented as a 2D numpy array, where the length of axis 1 varies from spectrogram to spectrogram. The shapes range from (128, 2) to (128, 200).
In order to generate a 3D array, all spectrograms must have the same shape, so I'm guessing that I should append zeros to the ends of vectors that are shorter than 200. Then I can just add them all to a Python list, call np.array on it and a 3D numpy array will be generated, right?
I have attempted this myself unsuccessfully. All help is appreciated.
Edit: (code has been requested, this is essentially what I want to do)
spectrograms = []
for audio_file in all_audio_files:
    audio_array, sr = librosa.core.load(audio_file, sr=sample_rate, mono=True)
    melspectrogram = librosa.feature.melspectrogram(y=audio_array, sr=sample_rate, S=None, n_fft=window_size, hop_length=hop_length)
    # melspectrogram is a 2D numpy array
    # the shape could be between (128, 2) and (128, 200)
    spectrograms.append(melspectrogram)
# I want this to be 3D
np.asarray(spectrograms)

I can't say whether padding with zeros is an appropriate approach for your learner, but doing so is quite easy using np.concatenate:
import numpy as np
a = np.ones((128,2))
b = np.ones((128,200))
padding = np.zeros((a.shape[0], b.shape[1] - a.shape[1])) #(128, 198)
a = np.concatenate((a, padding), axis=1)
print(a.shape)
# (128, 200)
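To apply the same idea to a whole list of spectrograms, something like the following might work. This is only a sketch: the helper name pad_and_stack and the stand-in arrays are assumptions for illustration, and it uses np.pad instead of np.concatenate to append the zeros.

```python
import numpy as np

def pad_and_stack(arrays, target_len=None):
    """Zero-pad 2D arrays along axis 1 and stack them into one 3D array.

    pad_and_stack is a hypothetical helper name, not part of NumPy or librosa.
    """
    if target_len is None:
        # Default to the longest array in the list
        target_len = max(a.shape[1] for a in arrays)
    padded = [
        # Pad nothing on axis 0, and only at the end of axis 1
        np.pad(a, ((0, 0), (0, target_len - a.shape[1])), mode="constant")
        for a in arrays
    ]
    return np.stack(padded, axis=0)

# Stand-ins for mel spectrograms of different lengths
spectrograms = [np.ones((128, 2)), np.ones((128, 57)), np.ones((128, 200))]
batch = pad_and_stack(spectrograms)
print(batch.shape)  # (3, 128, 200)
```

Passing target_len=200 explicitly would keep the output shape fixed even if a particular batch happens to contain no full-length spectrogram.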

Related

Getting mean of 2D array

I have a vector representation of n = 1000 images, where each image is represented as 2048 numbers.
So I have a numpy array with a shape of (1000, 2048) that I need to find the mean of in a 2048-d vector.
If I run this function:
def get_means(f_embeddings):
    means = []
    for embedding in f_embeddings:
        means.append(np.mean(embedding))
    return np.array(means)
I get an ndarray of shape (1000,). How do I loop over the array correctly to get a 2048-d vector of means from the original array?
Try:
np.mean(f_embeddings, axis=0)
which should do it without the loop.
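A small example may make the role of the axis argument clearer. The tiny (3, 4) array here is just an assumed stand-in for the (1000, 2048) embeddings.

```python
import numpy as np

# 3 "images" with 4 features each, standing in for shape (1000, 2048)
f_embeddings = np.arange(12, dtype=float).reshape(3, 4)

per_feature = np.mean(f_embeddings, axis=0)  # averages over images
per_image = np.mean(f_embeddings, axis=1)    # what the loop in the question computes

print(per_feature.shape)  # (4,)
print(per_image.shape)    # (3,)
```

axis=0 collapses the first dimension (the images), leaving one mean per feature, which is the 2048-d vector the question asks for.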

convert numpy array to 4D nifti file

I am doing single-voxel simulations in Python to generate simulated signals with added noise. I then want to convert the resulting numpy array, which has shape (100, 100), into a nifti file.
Each row represents one simulated signal under different conditions of noise and tensor rotation. Each column represents the corresponding signal intensity for that voxel under those conditions when measured with a specific sampling scheme (100 different directions).
[DWIs array]
I want to save this matrix into a nifti file with the following shape: (10, 10, 1, 100).
[Desired shape]
I don’t know how to properly allocate the numpy array (DWIs.shape = (100,100)) to the format I desire (10, 10, 1, 100):
data[…, ] = ?
 
converted_array = np.array(data, dtype=np.float32)
nifti_file = nib.Nifti1Image(converted_array, affine=np.eye(4))
nib.save(nifti_file, os.path.join(path_to_save, 'snr{}'.format(snr), 'full/dwi_sims_snr{}.nii.gz'.format(snr)))
In NumPy you do not need to "allocate" data arrays.
Suppose you have a 100x100 converted_array. That is
>>> converted_array.shape
(100,100)
>>> converted_array[0]
[146.4, 72.9, ..., 174.9]
then you can reshape this array as
>>> nifti_array = converted_array.reshape((10,10,1,100))
>>> nifti_array[0][0][0]
[146.4, 72.9, ..., 174.9]
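To see where each row ends up after the reshape, a quick check like the one below may help. It is a sketch with made-up data, assuming NumPy's default C (row-major) order; the names dwis and volume are illustrative only.

```python
import numpy as np

# Made-up stand-in for the (100, 100) simulated signals
dwis = np.arange(100 * 100, dtype=np.float32).reshape(100, 100)
volume = dwis.reshape((10, 10, 1, 100))

# With C order, row r of the 2D array becomes voxel (r // 10, r % 10, 0)
r = 37
print(np.array_equal(volume[r // 10, r % 10, 0], dwis[r]))  # True
```

The reshaped array can then be passed to nib.Nifti1Image exactly as in the question's code.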

Can a 2D matrix be reconstructed after it has been masked with numpy where and flattened?

As the question says I have a 2D matrix (1000, 2000), in which I apply a condition with the numpy where function, in this way:
import numpy as np
A = np.random.randn(1000, 2000)
print(A.shape)
# (1000, 2000)
mask = np.where((A >=0.1) & (A <= 0.5))
A = A[mask]
print(A.shape)
# (303112,)
This gives me a flattened array, which I use as input to a Fortran program that only supports 1D arrays. The output of this program has the same shape as the 1D input, (303112,). Is there any method or function to reconstruct the flattened array into its original 2D form? I was thinking of saving the indexes in a boolean matrix and using these to reconstruct the matrix. Any numpy method or suggestion would be a great help.
Greetings.
IIUC you need to maintain the 1D indexes and 2D indexes of the mask so that when you try to update those values using a FORTRAN program, you can switch to 1D for input and then switch back to 2D to update the original array.
You can use np.ravel_multi_index to convert 2D indexes to 1D. Then you can use these 1D indexes to convert them back to 2D using np.unravel_index (though since you already have the 2D mask, you don't need to convert 1D to 2D again.)
import numpy as np
A = np.random.randn(1000, 2000)
mask = np.where((A >=0.1) & (A <= 0.5))
idx_flat = np.ravel_multi_index(mask, (1000,2000)) #FLAT 1D indexes using original mask
idx_2d = np.unravel_index(idx_flat, (1000,2000)) #2D INDEXES using the FLAT 1D indexes
#Comparing the method of using flat indexes and A[mask]
print(np.allclose(A.ravel()[idx_flat],A[mask]))
### True
#Comparing the freshly created 2d indexes to the original mask
print(np.allclose(idx_2d,mask))
### True
Here is a dummy test case with end to end code for a (3,3) matrix.
import numpy as np
#Dummy matrix A and mask
A = np.random.randn(3, 3) #<---- shape (3,3)
mask = np.where(A <= 0.5)
mask[0].shape #Number of indexes in 2D mask
###Output: (6,)
#########################################################
#Flatten 2D indexes to 1D
idx_flat = np.ravel_multi_index(mask, (3,3)) #<--- shape (3,3)
idx_flat.shape #Number of indexes in flattened mask
###Output: (6,)
#########################################################
#Feed the 6 length array to fortran function
def fortran_function(x):
    return x**2
flat_array = A.ravel()[idx_flat]
fortran_output = fortran_function(flat_array)
#Number of values in fortran_output
fortran_output.shape
###Output: (6,)
#########################################################
#Create an empty array filled with NaN
new_arr = np.empty((3,3)) #<---- shape (3,3)
new_arr[:] = np.nan
new_arr[mask] = fortran_output #Feed the 1D array to the 2D masked empty array
new_arr
array([[5.63399114e-04,            nan, 7.86255167e-01],
       [3.94992857e+00, 4.88932044e-02, 2.45489069e+00],
       [3.51957270e-02,            nan,            nan]])
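If the flat indexes are not needed for anything else, the boolean-matrix idea from the question may be enough on its own. The sketch below assumes the external program preserves the order of its input; external_routine is a hypothetical stand-in for the Fortran call.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5))

# Boolean mask with the same shape as A (no np.where needed)
mask = (A >= 0.1) & (A <= 0.5)
flat = A[mask]  # 1D array of the selected values, in row-major order

def external_routine(x):
    # Hypothetical stand-in for the Fortran program: same length in and out
    return x * 2.0

result = external_routine(flat)

# Put the 1D output back at the masked positions of a NaN-filled 2D array
restored = np.full(A.shape, np.nan)
restored[mask] = result
print(restored.shape)  # (4, 5)
```

Boolean indexing reads and writes elements in the same row-major order, so the values land back in the positions they came from.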

How to make a Numpy 3D array with different length in the 2nd dimension?

I have some code that constructs a 3D numpy array (x_3d) on the fly using values from a 2D numpy array (x) in a for loop:
x_3d = np.empty((0, 20, 10))
for i in range(num_samples):
    x_3d = np.append(x_3d, [x[i*20:(i+1)*20, :]],
                     axis=0)
The resulting shape of the 3D array is (num_samples, 20, 10).
If I want to take slices of different lengths from the 2D array, so that the number of rows varies, how can I do that? I have looked at this post. Storing the 2D arrays in a list first and then converting the list back to an array gave me an array of shape (num_samples,) in which each element is a 2D numpy array; it is not a 3D numpy array with the shape (num_samples, length_varies, 10).
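A regular NumPy array has to be rectangular, so a true (num_samples, length_varies, 10) array is not possible. One possible workaround, sketched below with made-up slice lengths, is to preallocate a NaN-filled array sized for the longest slice and keep the real lengths alongside it. The names lengths and starts are assumptions for illustration.

```python
import numpy as np

x = np.arange(60 * 10, dtype=float).reshape(60, 10)
lengths = [20, 15, 25]                  # hypothetical, varying slice lengths
starts = np.cumsum([0] + lengths[:-1])  # row where each slice begins

# Preallocate for the longest slice; unused rows stay NaN
x_3d = np.full((len(lengths), max(lengths), x.shape[1]), np.nan)
for i, (start, n) in enumerate(zip(starts, lengths)):
    x_3d[i, :n, :] = x[start:start + n, :]

print(x_3d.shape)  # (3, 25, 10)
```

Preallocating also avoids calling np.append inside the loop, which copies the whole array on every iteration.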

python - repeating numpy array without replicating data

This question has been asked before, but the solution only works for 1D/2D arrays, and I need a more general answer.
How do you create a repeating array without replicating the data? This strikes me as something of general use, as it would help to vectorize python operations without the memory hit.
More specifically, I have a (y,x) array, which I want to tile multiple times to create a (z,y,x) array. I can do this with numpy.tile(array, (nz,1,1)), but I run out of memory. My specific case has x=1500, y=2000, z=700.
One simple trick is to use np.broadcast_arrays to broadcast your (x, y) against a z-long vector in the first dimension:
import numpy as np
M = np.arange(1500*2000).reshape(1500, 2000)
z = np.zeros(700)
# broadcasting over the first dimension
_, M_broadcast = np.broadcast_arrays(z[:, None, None], M[None, ...])
print(M_broadcast.shape, M_broadcast.flags.owndata)
# (700, 1500, 2000) False
To generalize the stride_tricks method given for a 1D array in this answer, you just need to include the shape and stride length for each dimension of your output array:
M_strided = np.lib.stride_tricks.as_strided(
    M,                               # input array
    (700, M.shape[0], M.shape[1]),   # output dimensions
    (0, M.strides[0], M.strides[1])  # stride length in bytes
)
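In newer NumPy versions (1.10 and later), np.broadcast_to may be a simpler way to get the same kind of view. The sketch below uses a small array so it runs quickly; the sizes are placeholders, not the ones from the question.

```python
import numpy as np

M = np.arange(15 * 20).reshape(15, 20)

# Read-only view that repeats M along a new leading axis without copying
tiled = np.broadcast_to(M, (7,) + M.shape)

print(tiled.shape)                 # (7, 15, 20)
print(tiled.strides[0])            # 0
print(np.shares_memory(tiled, M))  # True
print(tiled.flags.writeable)       # False
```

The zero stride on the first axis means every "copy" along that axis points at the same memory, so the view is read-only and any operation that needs a real copy (for example tiled.copy()) will allocate the full array.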
