I used to use Keras, and the image format it follows is [Height x Width x Channels x Samples]. I decided to switch to PyTorch, but I didn't switch out my data loading schemes, so now I have NumPy arrays of shape HxWxCxS instead of the SxCxHxW that PyTorch requires. Does anyone have any idea how to convert this?
First, the Keras default format is actually (samples, height, width, channels).
Given that layout, all you need is moved = numpy.moveaxis(data, -1, 1), which moves the channel axis to position 1.
If by luck you were using the non-default "channels_first" config, then the layout is already identical to PyTorch's, which is (samples, channels, height, width).
And when converting to torch: data = torch.from_numpy(moved)
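A minimal end-to-end sketch, assuming the data really is in Keras' default channels-last layout (samples, height, width, channels), with a stand-in array for illustration:
import numpy as np
import torch
data = np.zeros((8, 100, 100, 3), dtype=np.float32)  # stand-in batch in (S, H, W, C) layout
moved = np.moveaxis(data, -1, 1)  # -> (S, C, H, W)
tensor = torch.from_numpy(moved)
print(tensor.shape)  # torch.Size([8, 3, 100, 100])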
You can convert your NumPy arrays to tensors in PyTorch quite easily by using the from_numpy function:
import torch
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
b is now usable in PyTorch.
I am trying to build pose detection using cv2 and TensorFlow in Google Colab, and I am running into an error.
Code:
import tensorflow as tf
import tensorflow_hub as hub
import cv2
from matplotlib import pyplot as plt
import numpy as np
from google.colab.patches import cv2_imshow
model = hub.load('https://tfhub.dev/google/movenet/multipose/lightning/1')
movenet = model.signatures['serving_default']
img_original = cv2.imread('/content/brandon-atchison-eexdeq3NleQ-unsplash.jpeg',1)
img_copy = img_original.copy()
input_img = tf.cast(img_original,dtype=tf.int32)
img_copy.shape
tensor = tf.convert_to_tensor(img_original,dtype=tf.int32)
tensor
results = movenet(tensor)
I have created the variable img_copy because I need to perform some operations on the image while keeping the original image as it is. I am not sure what error I am running into when trying to get results from the movenet model.
Try:
results = movenet(tensor[None, ...])
since you are missing the batch dimension, which is needed to feed data to your model. You could also use tf.expand_dims:
tensor = tf.expand_dims(tensor, axis=0)
# resize so that H and W are multiples of 32
tensor = tf.image.resize(tensor, [32 * 186, 32 * 125])
Here is a working example:
import tensorflow as tf
import tensorflow_hub as hub
model = hub.load('https://tfhub.dev/google/movenet/multipose/lightning/1')
movenet = model.signatures['serving_default']
tensor = tf.random.uniform((1, 160, 256, 3), minval=0, maxval=255, dtype=tf.int32)
movenet(tensor)
Check the model description and make sure you have the correct shape:
A frame of video or an image, represented as an int32 tensor of dynamic shape: 1xHxWx3, where H and W need to be a multiple of 32 and the larger dimension is recommended to be 256. To prepare the input image tensor, one should resize (and pad if needed) the image such that the above conditions hold. Please see the Usage section for a more detailed explanation. Note that the size of the input image controls the tradeoff between speed vs. accuracy, so choose the value that best suits your application. The channel order is RGB with values in [0, 255].
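Putting those constraints together, a minimal preprocessing sketch; the 480x640 input and the 160x256 target are illustrative assumptions, and note that cv2.imread returns BGR while the model expects RGB, so convert with cv2.cvtColor for real data:
import numpy as np
import tensorflow as tf
img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)  # stand-in for a loaded RGB image
tensor = tf.expand_dims(tf.convert_to_tensor(img), axis=0)  # add batch dim -> (1, 480, 640, 3)
tensor = tf.image.resize_with_pad(tensor, 160, 256)  # both dims are multiples of 32, larger side is 256
tensor = tf.cast(tensor, dtype=tf.int32)  # MoveNet multipose expects int32
# results = movenet(tensor)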
I have a .wav file which I read into an array in python.
import wave as wav
import numpy as np
path = "Casio-Celesta-C5.wav"
f = wav.open(path)
data_sound = f.readframes(-1)
data_sound = np.frombuffer(data_sound, np.int16)
I want to perform average/max pooling on it with TensorFlow or Keras. But I'm not familiar with this framework, so can anyone show me how to implement it? Or does anyone know another way to do it without TensorFlow?
Install Tensorflow
pip install tensorflow==2.0.0-rc1
Maxpooling - tf.nn.max_pool1d
Avgpooling - tf.nn.avg_pool1d
https://www.tensorflow.org/api_docs/python/tf/nn/max_pool1d
https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool1D
Input data
import numpy as np
a = np.random.randn(4).astype('float32')
print(a)
bac = a[None][:,:,None] # convert to batch/features/channels format
# here batch size = 1, no. of channels = 1
[0.02703167 0.382881 0.57891446 0.58068216]
Tensorflow
import tensorflow as tf
output = tf.nn.max_pool1d(bac, 2, 2, padding='VALID')
print(np.squeeze(output)) #strip off the additional dimensions
[0.382881 0.58068216]
Keras
from tensorflow.keras import layers
output = layers.MaxPool1D(pool_size=2, strides=2)(bac)
print(np.squeeze(output)) #strip off the additional dimensions
[0.382881 0.58068216]
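If you'd rather skip TensorFlow entirely, the same 1-D pooling can be done in plain NumPy by reshaping the signal into windows; a minimal sketch, assuming any tail that doesn't fill a full window is simply trimmed:
import numpy as np
def pool_1d(x, size=2, mode="max"):
    trimmed = x[: len(x) // size * size]  # drop the tail that doesn't fill a window
    windows = trimmed.reshape(-1, size)  # one row per pooling window
    return windows.max(axis=1) if mode == "max" else windows.mean(axis=1)
a = np.array([0.02703167, 0.382881, 0.57891446, 0.58068216], dtype="float32")
print(pool_1d(a, 2, "max"))  # [0.382881   0.58068216]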
I have a NumPy array representation of an image and I want to turn it into a tensor so I can feed it through my PyTorch neural network.
I understand that the network takes in transformed tensors arranged as [3,100,100] rather than [100,100,3], that the pixel values are rescaled, and that the images must be passed in batches.
So I did the following:
import cv2
my_img = cv2.imread('testset/img0.png')
my_img.shape #returns [100,100,3], a 3-channel image with 100x100 resolution
my_img = np.transpose(my_img,(2,0,1))
my_img.shape #returns [3,100,100]
#convert the numpy array to tensor
my_img_tensor = torch.from_numpy(my_img)
#rescale to be [0,1] like the data it was trained on by default
my_img_tensor *= (1/255)
#turn the tensor into a batch of size 1
my_img_tensor = my_img_tensor.unsqueeze(0)
#send image to gpu (.to() is not in-place, so assign the result)
my_img_tensor = my_img_tensor.to(device)
#put forward through my neural network.
net(my_img_tensor)
However this returns the error:
RuntimeError: _thnn_conv2d_forward is not implemented for type torch.ByteTensor
The problem is that the input you give to your network is of type ByteTensor, while only float operations are implemented for conv-like operations. Try the following:
my_img_tensor = my_img_tensor.type('torch.DoubleTensor')
# for converting to double tensor
Source: PyTorch discussion forum, thanks to AlbanD.
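A similar sketch using float32 instead of double, since float32 matches the default dtype of conv weights; the zero array below is just a stand-in for the transposed image:
import numpy as np
import torch
my_img = np.zeros((3, 100, 100), dtype=np.uint8)  # stand-in for the transposed cv2 image
my_img_tensor = torch.from_numpy(my_img).float()  # uint8 -> float32 before any float math
my_img_tensor /= 255.0  # rescale to [0, 1]
my_img_tensor = my_img_tensor.unsqueeze(0)  # batch of size 1
# my_img_tensor = my_img_tensor.to(device); output = net(my_img_tensor)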
I am trying to extract features from audio files using Librosa, to feed to a CNN as NumPy arrays.
Currently I save a single feature at a time to feed into the CNN. I save two-dimensional (single-channel) log-scaled mel-spectrogram features in Python using Librosa:
import librosa
import numpy as np

def build_features():
    y, sr = librosa.load("audio.wav")
    mel = librosa.feature.melspectrogram(
        y=y,
        sr=sr,
        n_fft=4096,
        n_mels=128,  # Mel bins
        hop_length=2048,
    )
    logamplitude = librosa.amplitude_to_db
    logspec = logamplitude(mel, ref=1.0)[np.newaxis, :, :, np.newaxis]
    return logspec
This gives the shape (1,128,323,1).
I would like to add another feature, let's say a tempogram. I can do this using the same code, but replacing melspectrogram with tempogram and setting the window length to 128.
This gives me a tempogram shape of (1,128,323,1).
Now I would like to "stack" these 2 feature layers into a multi-channel NumPy array that I can feed into a CNN in Keras.
How should I code this?
EDIT:
Think I figured it out, using np.vstack()
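Note that np.vstack stacks along the first (batch) axis, giving (2, 128, 323, 1); if the goal is a single example with two channels for a channels-last Keras CNN, concatenating along the last axis may be what's wanted. A minimal sketch with stand-in arrays:
import numpy as np
logspec = np.zeros((1, 128, 323, 1))  # stand-in for the log-mel feature
tempogram = np.zeros((1, 128, 323, 1))  # stand-in for the tempogram feature
stacked = np.concatenate([logspec, tempogram], axis=-1)
print(stacked.shape)  # (1, 128, 323, 2) -> two channels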
I need to convert an image in a numpy array loaded via cv2 into the correct format for the deep learning library mxnet for its convolutional layers.
My current images are shaped as follows: (256, 256, 3), or (height, width, channels).
From what I've been told, this actually needs to be (3, 256, 256), or (channels, height, width).
Unfortunately, my knowledge of numpy/python opencv isn't good enough to know how to manipulate the arrays correctly.
I've figured out that I could split the array into channels with cv2.split, but I'm uncertain how to combine them again in the right format (I don't know whether cv2.split is optimal, or if there are better ways in numpy).
Thanks for any help.
You can use numpy.rollaxis as follows.
If your image has shape (height, width, channels):
import numpy as np
new_shaped_image = np.rollaxis(image, axis=2, start=0)
This rolls axis 2 of the original array (the channel axis) to position 0.
So new_shaped_image.shape will be (channels, height, width).
Alternatively, a plain transpose gives the same result:
arr.transpose(2,0,1).shape
# (3, 256, 256)
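A small end-to-end sketch of the reshape; the zero array stands in for a cv2-loaded image, and whether you also need the leading batch axis depends on the mxnet layer you feed:
import numpy as np
image = np.zeros((256, 256, 3), dtype=np.uint8)  # stand-in for cv2.imread output, (H, W, C)
chw = np.transpose(image, (2, 0, 1))  # (3, 256, 256): channels first
batch = chw[np.newaxis, ...]  # (1, 3, 256, 256) if a batch axis is needed
print(chw.shape, batch.shape)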