I'm trying to import the MNIST dataset in Python as follows:
import h5py
f = h5py.File("mnist.h5")
x_test = f["x_test"]
x_train = f["x_train"]
y_test = f["y_test"]
y_train = f["y_train"]
The type of, say, y_train is h5py._hl.dataset.Dataset.
I want to convert them to float for mathematical convenience. I try this:
D = x_train.astype(float)
y_train = y_train.astype(float)+np.ones((60000,1));
but I get this traceback:
Traceback (most recent call last):
File "<ipython-input-14-f3677d523d45>", line 1, in <module>
y_train = y_train.astype(float)+np.ones((60000,1));
TypeError: unsupported operand type(s) for +: 'AstypeContext' and 'float'
What am I missing? Thanks.
You are using two different libraries that have two completely different meanings for astype.
If you were doing this in numpy, something like this works:
a = np.array([1, 2, 3])
a = a.astype(float) + np.ones((60000,1))
But in h5py, astype is a different function: it is meant to be used as a context manager.
This code throws the same error you are getting:
import h5py
import numpy as np

f = h5py.File('mytestfile.hdf5', 'w')
dset = f.create_dataset("default", (100,))
dset.astype(float) + np.ones((60000,1))
But the code below will work (see astype in the h5py docs):
f = h5py.File('mytestfile.hdf5', 'w')
dset = f.create_dataset("default", (100,))
with dset.astype('float'):
    out = dset[:]
out += np.ones((100,))
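As a side note for the original MNIST code: if the arrays fit in memory, you can sidestep h5py's astype entirely by reading the dataset into NumPy first and converting there. A minimal sketch (assuming a reasonably recent h5py and the mnist.h5 file from the question):
import h5py
import numpy as np

with h5py.File("mnist.h5", "r") as f:
    # slicing with [:] returns a plain NumPy array, so astype behaves as usual
    y_train = f["y_train"][:].astype(float)

y_train = y_train + np.ones_like(y_train)  # ordinary NumPy arithmetic now works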
This problem is similar to Creating reference to HDF dataset in H5py using astype
I'm trying to figure out how to load a binary data file using FixedLengthRecordDataset:
import tensorflow as tf
import struct
import numpy as np
RAW_N = 2 + 20*20 + 1
def convert_binary_to_float_array(register):
    return struct.unpack('f'*RAW_N, register.numpy())

raw_dataset = tf.data.FixedLengthRecordDataset(filenames=['mydata.bin'], record_bytes=RAW_N*4)
float_ds = raw_dataset.map(map_func=convert_binary_to_float_array)
This code throws:
AttributeError: in user code:
tf-load-data.py:14 convert_binary_to_float_array *
return struct.unpack('f'*RAW_N, register.numpy())
AttributeError: 'Tensor' object has no attribute 'numpy'
numpy() is available if I try to iterate over the dataset:
raw_dataset = tf.data.FixedLengthRecordDataset(filenames=['mydata.bin'], record_bytes=RAW_N*4)
for register in raw_dataset:
    print(struct.unpack('f'*RAW_N, register.numpy()))
Reading the Tensor type description, I realized that numpy() is available only during eager execution. I can therefore deduce that during the map() call the elements are not provided as EagerTensors.
How to load this data into a dataset?
I'm using TensorFlow 2.4.1
I would suggest working with tf.io.decode_raw. I unfortunately do not know what mydata.bin looks like so I created some dummy data:
import random
import struct

import numpy as np
import tensorflow as tf

RAW_N = 2 + 20*20 + 1

bytess = random.sample(range(1, 5000), RAW_N*4)
with open('mydata.bin', 'wb') as f:
    f.write(struct.pack('1612i', *bytess))  # 1612 == RAW_N * 4
def convert_binary_to_float_array(register):
    return tf.io.decode_raw(register, out_type=tf.float32)

raw_dataset = tf.data.FixedLengthRecordDataset(filenames=['/content/mydata.bin'], record_bytes=RAW_N*4)
raw_dataset = raw_dataset.map(convert_binary_to_float_array)

for register in raw_dataset:
    print(register)
You could also try first decoding your data into integers with tf.io.decode_raw and then casting to float with tf.cast, but I am not sure if it will make a difference.
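If you want to try that, a minimal sketch of the decode-then-cast variant (same assumed file layout as the dummy data above) would be:
def convert_binary_to_int_then_float(register):
    # decode the raw record as 32-bit integers, then cast to float32
    ints = tf.io.decode_raw(register, out_type=tf.int32)
    return tf.cast(ints, tf.float32)

raw_dataset = tf.data.FixedLengthRecordDataset(filenames=['/content/mydata.bin'], record_bytes=RAW_N*4)
float_ds = raw_dataset.map(convert_binary_to_int_then_float)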
I'm running into an error that I cannot seem to resolve when converting a Dataset to an array with xarray. I'm hitting this because I'm trying to add a time dimension to a netCDF file (open the netCDF, add a timestamp that is the same across all data, save the netCDF back out).
import os
from datetime import datetime, timedelta

import pandas as pd
import xarray as xr

yesterday = datetime.now() - timedelta(days=1)  # defined earlier in the original script

scriptpath = os.path.dirname(os.path.abspath(__file__))
outputfile = scriptpath + '\\20210629_deadgrass.aus.nc'
times = pd.to_datetime(str(yesterday.strftime('%Y%m%d')))
time_da = xr.Dataset({"time": times})
arr = xr.open_dataset(outputfile)
ds = arr.to_array()
dst = ds.expand_dims(time=time_da)  # errors here
The error I'm receiving is
Exception has occurred: TypeError
cannot directly convert an xarray.Dataset into a numpy array. Instead, create an xarray.DataArray first, either with indexing on the Dataset or by invoking the `to_array()` method.
File "Z:\UpdateAussieGRASS.py", line 101, in <module>
dst = ds.expand_dims(time=time_da)
I can't seem to work out what I'm doing wrong with to_array() in the second last line. Examples of to_array() are here. Autogenerated documentation is here.
ds is already an xarray.DataArray. The error occurs on this line:
dst = ds.expand_dims(time=time_da) #errors here
Because while ds is a DataArray, time_da is not. This should work:
dst = ds.expand_dims(time=time_da.to_array())
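For the broader goal of stamping a single time onto every variable and saving a new file, a minimal sketch that works directly on the Dataset (assuming a reasonably recent xarray; the date and output filename below are only illustrative) could be:
import pandas as pd
import xarray as xr

ds = xr.open_dataset(outputfile)                       # the existing netCDF file from the question
timestamp = pd.to_datetime("20210629", format="%Y%m%d")
ds = ds.expand_dims(time=[timestamp])                  # adds a length-1 time dimension to all variables
ds.to_netcdf(outputfile.replace(".nc", "_time.nc"))    # write out under a new (hypothetical) name
Passing a plain list of timestamps to expand_dims avoids mixing Dataset and DataArray types in the call.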
I have CSV files with all numeric values except the header row. When trying to build tensors, I get the following exception:
Traceback (most recent call last):
File "pytorch.py", line 14, in <module>
test_tensor = torch.tensor(test)
ValueError: could not determine the shape of object type 'DataFrame'
This is my code:
import torch
import dask.dataframe as dd
device = torch.device("cuda:0")
print("Loading CSV...")
test = dd.read_csv("test.csv", encoding = "UTF-8")
train = dd.read_csv("train.csv", encoding = "UTF-8")
print("Converting to Tensor...")
test_tensor = torch.tensor(test)
train_tensor = torch.tensor(train)
Using pandas instead of Dask for CSV parsing produced the same error. I also tried to specify dtype=torch.float64 inside the call to torch.tensor(data), but got the same error again.
Try converting it to an array first:
test_tensor = torch.Tensor(test.values)
I think you're just missing .values
import torch
import pandas as pd
train = pd.read_csv('train.csv')
train_tensor = torch.tensor(train.values)
Newer versions of pandas strongly recommend using to_numpy instead of values:
train_tensor = torch.tensor(train.to_numpy())
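Since the question reads the CSVs with Dask rather than pandas, note that a Dask DataFrame has to be materialized first. A small sketch (assuming dask.dataframe as in the question):
import dask.dataframe as dd
import torch

test = dd.read_csv("test.csv", encoding="UTF-8")
test_tensor = torch.tensor(test.compute().to_numpy())  # .compute() returns a pandas DataFrame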
Using only NumPy:
import numpy as np
import torch

tensor = torch.from_numpy(
    # skip_header=1 skips the header row mentioned in the question
    np.genfromtxt("train.csv", delimiter=",", skip_header=1)
)
I'm trying to implement an example project on DZone (https://dzone.com/articles/cv-r-cvs-retrieval-system-based-on-job-description) and running into a problem. In this case, I've set
dir_pca_we_EWE = 'pickle_model_pca.pkl'
And am executing the following:
def reduce_dimensions_WE(dir_we_EWE, dir_pca_we_EWE):
    m1 = KeyedVectors.load_word2vec_format('./wiki.en/GoogleNews.bin', binary=True)
    model1 = {}
    # normalize vectors
    for string in m1.wv.vocab:
        model1[string] = m1.wv[string] / np.linalg.norm(m1.wv[string])
    # reduce dimensionality
    pca = decomposition.PCA(n_components=200)
    pca.fit(np.array(list(model1.values())))
    model1 = pca.transform(np.array(list(model1.values())))
    i = 0
    for key, value in model1.items():
        model1[key] = model1[i] / np.linalg.norm(model1[i])
        i = i + 1
    with open(dir_pca_we_EWE, 'wb') as handle:
        pickle.dump(model1, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return model1
This then produces the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 12, in reduce_dimensions_WE
AttributeError: 'numpy.ndarray' object has no attribute 'items'
As always, all help is greatly appreciated!
You start by initializing model1 = {} as an empty dict. By using transform in
model1 = pca.transform(np.array(list(model1.values())))
the variable model1 becomes a numpy.ndarray, which is the return type of the PCA's transform method. In the line
for key, value in model1.items():
...
you still use model1 as if it were a dict, which it no longer is.
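One way to keep the word-to-vector mapping intact (a sketch of the idea, not the DZone author's exact code) is to store the PCA output under a different name and rebuild the dict from the original keys:
# stack the normalized vectors, reduce them, then rebuild the dict
words = list(model1.keys())
vectors = np.array(list(model1.values()))

pca = decomposition.PCA(n_components=200)
reduced = pca.fit_transform(vectors)

model1 = {
    word: reduced[i] / np.linalg.norm(reduced[i])  # one normalized 200-d vector per word
    for i, word in enumerate(words)
}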
#datasailor answers your question and explains what is wrong. In the comments you ask how to reduce the data to 200 dimensions, and I think the easiest way to do that is to use fit_transform from sklearn.decomposition.PCA instead of the plain transform you are currently using:
from sklearn.decomposition import PCA
pca = PCA(n_components=200)
lower_dim_Data = pca.fit_transform(data)
I'm working on a speaker recognition project in Python and getting the following error while computing MFCCs.
Traceback (most recent call last):
File "neh1.py", line 10, in <module>
complexSpectrum = numpy.fft(signal)
TypeError: 'module' object is not callable
This is the relevant part of the code:
import numpy
from scipy.fftpack import dct
from scipy.io import wavfile
sampleRate, signal = wavfile.read("/home/neha/Audio/b6.wav")
numCoefficients = 13 # choose the size of the mfcc array
minHz = 0
maxHz = 22.000
complexSpectrum = numpy.fft(signal)
powerSpectrum = abs(complexSpectrum) ** 2
filteredSpectrum = numpy.dot(powerSpectrum, melFilterBank())
logSpectrum = numpy.log(filteredSpectrum)
dctSpectrum = dct(logSpectrum, type=2)
What could be the issue?
A TypeError: 'module' object is not callable means you're trying to call something as if it were a function when it isn't one (e.g. doing foo() when foo is an int or a module). As #JohnGordon points out, numpy.fft is a module, but you're calling it like a function. Use numpy.fft.fft() to do what you want.
See the numpy.fft docs for more functions related to fast Fourier Transforms.
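For reference, a minimal sketch of the corrected spectrum computation (the rest of the pipeline stays as in the question):
import numpy
from scipy.io import wavfile

sampleRate, signal = wavfile.read("/home/neha/Audio/b6.wav")
complexSpectrum = numpy.fft.fft(signal)   # numpy.fft is the module; numpy.fft.fft is the function
powerSpectrum = abs(complexSpectrum) ** 2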