H5PY problem saving composite numpy arrays

In an attempt to reverse-engineer a file format, I have arrived at the following minimal example for creating a composite numpy datatype and saving it to HDF5. The original file seems to store datasets of the data type below. However, I do not seem to be able to write such datasets to a file.
import numpy as np
import h5py
data = ("Many cats".encode(), np.linspace(0, 1, 20))
data_type = [('index', 'S' + str(len(data[0]))), ('values', '<f8', (20,))]
arr = np.array(data, dtype=data_type)
print(arr)
h5f = h5py.File("lol.h5", 'w')
dset = h5f.create_dataset("data", arr, dtype=data_type)
h5f.close()
This code crashes with the error
Traceback (most recent call last):
File "test.py", line 13, in
dset = h5f.create_dataset("data", arr, dtype=data_type)
File "/opt/anaconda3/lib/python3.7/site-packages/h5py/_hl/group.py", line
116, in create_dataset
dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
File "/opt/anaconda3/lib/python3.7/site-packages/h5py/_hl/dataset.py", line
75, in make_new_dset
shape = tuple(shape)
TypeError: iteration over a 0-d array
How can I overcome this issue?

In your call, h5f.create_dataset("data", arr, dtype=data_type), the second positional argument of create_dataset is shape, not data, so h5py tries to iterate over your 0-d record array and fails. I restructured/reordered your code to get it to work with h5py: pass the array via the data keyword.
The code below works for 1 row. You will have to adjust it to make the number of rows a variable (a sketch of that follows the code).
import numpy as np
import h5py
data = ("Many cats".encode(), np.linspace(0, 1, 20))
data_type = [('index', 'S' + str(len(data[0]))), ('values', '<f8', (20,))]
arr = np.zeros((1,), dtype=data_type)
arr[0]['index'] = "Many cats".encode()
arr[0]['values'] = np.linspace(0, 1, 20)
h5f = h5py.File("lol.h5", 'w')
dset = h5f.create_dataset("data", data=arr)
h5f.close()
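Here is a minimal sketch of the variable-row version; the row count n_rows and the repeated row contents are invented for illustration:
import numpy as np
import h5py

n_rows = 5  # hypothetical number of rows
data_type = [('index', 'S9'), ('values', '<f8', (20,))]
arr = np.zeros((n_rows,), dtype=data_type)
for i in range(n_rows):
    arr[i]['index'] = "Many cats".encode()    # any bytes value up to 9 characters
    arr[i]['values'] = np.linspace(0, 1, 20)  # one row of 20 floats
with h5py.File("lol.h5", 'w') as h5f:
    h5f.create_dataset("data", data=arr)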

Related

'yerr' shape matches 'y' shape but throws value error

I am using the Spyder IDE (5.3.3) with Python (3.9.13, 64-bit) on Ubuntu 20.04 LTS. I am trying to plot error bars computed as the standard deviation over 5 sets of data. My x-coordinate is named RC_AVG, the y-coordinate is named PMF_AVG, and the standard deviation is named PMF_STD. After storing the data in these lists, I reshaped all of them to shape (175, 1) and then used the ax.errorbar command to plot the error bars, but Python throws a ValueError: 'yerr' (shape: (175, 1)) must be a scalar or a 1D or (2, n) array-like whose shape matches 'y' (shape: (175, 1)). I am unable to understand the cause of this error and need help understanding it. However, when I remove the reshape(175,1) from the x, y, and error coordinates, the code works fine and I get the graph. I am attaching the code below:
import numpy as np
import matplotlib.pyplot as plt
fig, (ax1) = plt.subplots(1,1,figsize=(5,5))
file_name = "bsResult-THETA83-IRUN"
result = []
for i in range(1,6):
    a = np.array(np.loadtxt(file_name+str(i)+".xvg", dtype=float, skiprows=18, max_rows=175))
    result.append(a)
result = np.array(result)
result1 = result.copy()
RC_AVG = np.mean(result1[:,:,0],axis=0).reshape(175,1) ###### x-coordinate
PMF_AVG = np.mean(result1[:,:,1],axis=0).reshape(175,1) ##### y-coordinate
PMF_STD = np.std(result1[:,:,2],axis=0).reshape(175,1) ###### error-coordinate
ax1.set_xlim(0.1,1.70)
ax1.set_xlabel("\u03B6 $(nm)$",fontweight = 'bold',fontsize=12)
ax1.set_ylabel("G $(k_{B}T)$",fontweight = 'bold',fontsize=12)
ax1.errorbar(RC_AVG,PMF_AVG,yerr=PMF_STD,label = 'Nitrogen',color='#D32D41',linewidth=1.0,elinewidth=1.0,
capsize=1.1,ecolor='black',errorevery=(8))
#####################################################################
Traceback (most recent call last):
File "/home/sps/software/yes/lib/python3.9/site-packages/spyder_kernels/py3compat.py", line 356, in compat_exec
exec(code, globals, locals)
File "/media/sps/hdd/PMF/REFERENCE/pmf-2G6X6-epswdr-wspce-k400/pmfReference.py", line 31, in <module>
ax1.errorbar(RC_AVG,PMF_AVG,yerr=PMF_STD,label = 'Nitrogen',color='#D32D41',linewidth=1.0,elinewidth=1.0,
File "/home/sps/software/yes/lib/python3.9/site-packages/matplotlib/__init__.py", line 1423, in inner
return func(ax, *map(sanitize_sequence, args), **kwargs)
File "/home/sps/software/yes/lib/python3.9/site-packages/matplotlib/axes/_axes.py", line 3588, in errorbar
raise ValueError(
ValueError: 'yerr' (shape: (175, 1)) must be a scalar or a 1D or (2, n) array-like whose shape matches 'y' (shape: (175, 1))
I am able to get the error bars in the plot if I remove the reshape(175,1) from the x, y, and error coordinates, as shown below:
import numpy as np
import matplotlib.pyplot as plt
fig, (ax1) = plt.subplots(1,1,figsize=(5,5))
file_name = "bsResult-THETA83-IRUN"
result = []
for i in range(1,6):
    a = np.array(np.loadtxt(file_name+str(i)+".xvg", dtype=float, skiprows=18, max_rows=175))
    result.append(a)
result = np.array(result)
result1 = result.copy()
RC_AVG = np.mean(result1[:,:,0],axis=0)#.reshape(175,1) ----commented reshape
PMF_AVG = np.mean(result1[:,:,1],axis=0)#.reshape(175,1) ---commented reshape
PMF_STD = np.std(result1[:,:,2],axis=0)#.reshape(175,1) ----commented reshape
ax1.set_xlim(0.1,1.70)
ax1.set_xlabel("\u03B6 $(nm)$",fontweight = 'bold',fontsize=12)
ax1.set_ylabel("G $(k_{B}T)$",fontweight = 'bold',fontsize=12)
ax1.errorbar(RC_AVG,PMF_AVG,yerr=PMF_STD,label = 'Nitrogen',color='#D32D41',linewidth=1.0,elinewidth=1.0,
capsize=1.1,ecolor='black',errorevery=(8))
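For reference, a minimal sketch of the shape rule at play (assuming a 3.6-era matplotlib, whose errorbar validates yerr as a scalar, a 1-D array, or a (2, n) array): a (175, 1) column vector is 2-D, so flattening with ravel() restores the accepted 1-D form without changing the data. The sample data below is invented for illustration:
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = np.linspace(0.1, 1.7, 175).reshape(175, 1)  # 2-D column vectors, as in the question
y = np.sin(x)
err = np.full_like(y, 0.05)
# ax.errorbar(x, y, yerr=err)                        # raises the ValueError above
ax.errorbar(x.ravel(), y.ravel(), yerr=err.ravel())  # 1-D arrays plot fine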

Setting an array element with a sequence: the requested array has an inhomogeneous shape after 1 dimensions; the detected shape was (2,) + inhomogeneous part

import os
import numpy as np
from scipy.signal import *
import csv
import matplotlib.pyplot as plt
from scipy import signal
from brainflow.board_shim import BoardShim, BrainFlowInputParams, LogLevels, BoardIds
from brainflow.data_filter import DataFilter, FilterTypes, AggOperations, WindowFunctions, DetrendOperations
from sklearn.cluster import KMeans
#Options to read: 'EEG-IO', 'EEG-VV', 'EEG-VR', 'EEG-MB'
data_folder = 'EEG-IO'
# Parameters and bandpass filtering
fs = 250.0
# Reading data files
file_idx = 0
list_of_files = [f for f in os.listdir(data_folder) if os.path.isfile(os.path.join(data_folder, f)) and '_data' in f] # list of all the data files; os.listdir order is arbitrary, so we only keep files with '_data' in the name
print(list_of_files)
file_sig = list_of_files[file_idx] # Data File
file_stim = list_of_files[file_idx].replace('_data','_labels') #Label File, Replacing _data with _labels
print ("Reading: ", file_sig, file_stim)
# Loading data
if data_folder == 'EEG-IO' or data_folder == 'EEG-MB':
    data_sig = np.loadtxt(open(os.path.join(data_folder, file_sig), "rb"), delimiter=";", skiprows=1, usecols=(0,1,2)) # data_sig would be a buffer
elif data_folder == 'EEG-VR' or data_folder == 'EEG-VV':
    data_sig = np.loadtxt(open(os.path.join(data_folder, file_sig), "rb"), delimiter=",", skiprows=5, usecols=(0,1,2))
    data_sig = data_sig[0:(int(200*fs)+1),:] # getting data ready -- not needed for previous 2 datasets
data_sig = data_sig[:,0:3]
data_sig[:,0] = np.array(range(0,len(data_sig)))/fs
############ Calculating PSD ############
index, ch = data_sig.shape[0], data_sig.shape[1]
# print(index)
feature_vectors = [[], []]
feature_vectorsa = [[], []]
feature_vectorsb = [[], []]
feature_vectorsc = [[], []]
#for x in range(ch):
#for x in range(1,3):
#while x <
#while x>0:
x = 1
while x > 0 and x < 3:
    if x == 1:
        data_sig[:,1] = lowpass(data_sig[:,1], 10, fs, 4)
    elif x == 2:
        data_sig[:,2] = lowpass(data_sig[:,2], 10, fs, 4)
    for y in range(500, 19328, 500):
        #print(ch)
        if x == 1:
            DataFilter.detrend(data_sig[y-500:y, 1], DetrendOperations.LINEAR.value)
            psd = DataFilter.get_psd_welch(data_sig[y-500:y, 1], nfft, nfft//2, 250,
                                           WindowFunctions.BLACKMAN_HARRIS.value)
            band_power_delta = DataFilter.get_band_power(psd, 1.0, 4.0)
            # Theta 4-8
            band_power_theta = DataFilter.get_band_power(psd, 4.0, 8.0)
            # Alpha 8-12
            band_power_alpha = DataFilter.get_band_power(psd, 8.0, 12.0)
            # Beta 12-30
            band_power_beta = DataFilter.get_band_power(psd, 12.0, 30.0)
            # print(feature_vectors.shape)
            feature_vectors[x].insert(y, [band_power_delta, band_power_theta, band_power_alpha, band_power_beta])
            feature_vectorsa[x].insert(y, [band_power_delta, band_power_theta])
        elif x == 2:
            DataFilter.detrend(data_sig[y-500:y, 2], DetrendOperations.LINEAR.value)
            psd = DataFilter.get_psd_welch(data_sig[y-500:y, 2], nfft, nfft//2, 250,
                                           WindowFunctions.BLACKMAN_HARRIS.value)
            band_power_delta = DataFilter.get_band_power(psd, 1.0, 4.0)
            # Theta 4-8
            band_power_theta = DataFilter.get_band_power(psd, 4.0, 8.0)
            # Alpha 8-12
            band_power_alpha = DataFilter.get_band_power(psd, 8.0, 12.0)
            # Beta 12-30
            band_power_beta = DataFilter.get_band_power(psd, 12.0, 30.0)
            # print(feature_vectors.shape)
            # feature_vectorsc[x].insert(y, [band_power_delta, band_power_theta, band_power_alpha, band_power_beta])
            # feature_vectorsd[x].insert(y, [band_power_delta, band_power_theta])
    x = x + 1
print(feature_vectorsa)
powers = np.log10(np.asarray(feature_vectors, dtype=float))
powers1 = np.log10(np.asarray(feature_vectorsa, dtype=float))
# powers2 = np.log10(np.asarray(feature_vectorsb))
# powers3 = np.log10(np.asarray(feature_vectorsc))
print(powers.shape)
print(powers1.shape)
Super confused. When I run my code, I keep on getting this error:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
Traceback:
File "/Users/mikaelhaji/Downloads/EEG-EyeBlinks/read_data.py", line 170, in
powers = np.log10(np.asarray(feature_vectors, dtype=float))
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/core/_asarray.py", line 102, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
If you have any thoughts/ answers as to why this may be occurring, please let me know.
Thanks in advance for the responses.
Here's a simple case that produces your error message:
In [19]: np.asarray([[1,2,3],[4,5]],float)
Traceback (most recent call last):
File "<ipython-input-19-72fd80bc7856>", line 1, in <module>
np.asarray([[1,2,3],[4,5]],float)
File "/usr/local/lib/python3.8/dist-packages/numpy/core/_asarray.py", line 102, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
If I omit the float, it makes an object dtype array - with warning.
In [20]: np.asarray([[1,2,3],[4,5]])
/usr/local/lib/python3.8/dist-packages/numpy/core/_asarray.py:102: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
return array(a, dtype, copy=False, order=order)
Out[20]: array([list([1, 2, 3]), list([4, 5])], dtype=object)
Have you tried downgrading numpy? When I downgraded it from 1.24.1 to 1.21.6, the error went away; only a UserWarning and a FutureWarning remain. (NumPy 1.24 turned the old VisibleDeprecationWarning for ragged arrays into this ValueError, which is why pinning an older version makes it disappear.)
!pip install numpy==1.21.6
I was getting the same error.
I was opening a txt file that contains a table of values and saving it into a NumPy array, setting the dtype to float since otherwise the numbers would be strings.
with open(dirfile) as fh:
    next(fh)
    header = next(fh)[2:]
    next(fh)
    data = np.array([line.strip().split() for line in fh], float)
For the previous files it worked perfectly; however, for the last file it did not:
The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (35351,) + inhomogeneous part.
However, when I ran data = np.loadtxt(fh), a new error appeared: Wrong number of columns at line 35351.
So, my problem was that the last line of the file was missing the values of the two last columns. I corrected it in the txt file, since I wanted to keep the same structure of a numpy array with dtype=float, and everything worked fine.
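If editing the file by hand is not an option, a hedged alternative is np.genfromtxt, whose invalid_raise=False makes it warn about and skip rows with the wrong number of columns instead of raising (dirfile is the same path variable as above; skip_header=3 mirrors the three next(fh) calls):
import numpy as np

data = np.genfromtxt(dirfile, skip_header=3, invalid_raise=False)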
You might have an issue like this: a list that mixes scalars and arrays, e.g. [1, np.array([0, 1, 2]), 3, np.array([8, 9, 10])].
A simple thing you can do is: put a breakpoint where the error arises, run the IDE in debug mode, and print the particular variable or line; avoiding this array-within-array scenario made it work for me, and I hope it works for anyone looking for the solution!
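As a quick sketch of how to locate the ragged element before np.asarray runs (the sample data here is invented for illustration):
import numpy as np

feature_vectors = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]  # second sublist is shorter
for i, row in enumerate(feature_vectors):
    # unequal lengths printed here are what numpy reports as '(2,) + inhomogeneous part'
    print(i, len(row), [np.shape(item) for item in row])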

squeeze function in numpy running out of memory

I am trying to make a climatic map in Python, which I am not used to, to see whether it is handier than plotting in R. I am following the example at http://joehamman.com/2013/10/12/plotting-netCDF-data-with-Python/ with my data.
from netCDF4 import Dataset
import numpy as np
myfil = "xxxxx"
fh = Dataset(myfil, mode='r')
lons = fh.variables['lon'][:]
lats = fh.variables['lat'][:]
tmean = fh.variables['Tmean_ANN'][:1]
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
# Get some parameters for the Stereographic Projection
lon0 = lons.mean()
lats0 = lats.mean()
m = Basemap(width=5000000,height=3500000,
resolution='l',projection='stere',
lat_ts=60,lat_0=lats0,lon_0=lon0)
lon, lat = np.meshgrid(lons, lats, sparse=True)
xi, yi = m(lon, lat)
# Plot Data
print(xi.shape)
print(yi.shape)
print(tmean.shape)
results
(1, 1142761)
(1142761, 1)
(1, 1069, 1069)
Trying to run this line
cs = m.contour(xi,yi, np.squeeze(tmean))
I got the error
cs = m.contour(xi,yi, np.squeeze(tmean))
Traceback (most recent call last):
File "<ipython-input-37-8be9f03a0e45>", line 1, in <module>
cs = m.contour(xi,yi, np.squeeze(tmean))
File "C:\ProgramData\Anaconda3\lib\site-packages\mpl_toolkits\basemap\__init__.py", line 546, in with_transform
return plotfunc(self,x,y,data,*args,**kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\mpl_toolkits\basemap\__init__.py", line 3566, in contour
np.logical_or(np.greater(x,self.xmax+epsx),np.greater(y,self.ymax+epsy))
MemoryError
Any idea why I get this? My hypothesis is that the dimensions of xi and yi are not compatible with tmean, hence the error. The np.squeeze function works on the tmean data outside the m.contour function, but I could not solve this for a while.
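For what it's worth, the printed shapes suggest lons and lats are already flattened 2-D grids (1069 × 1069 = 1142761), so np.meshgrid with sparse=True turns them into (1, N) and (N, 1) vectors, and the coordinate comparison inside Basemap then broadcasts to a gigantic N × N array, hence the MemoryError. A sketch under that assumption, reshaping instead of re-gridding:
lon = lons.reshape(1069, 1069)  # restore the 2-D grid instead of calling meshgrid
lat = lats.reshape(1069, 1069)
xi, yi = m(lon, lat)
cs = m.contour(xi, yi, np.squeeze(tmean))  # all three arguments are now (1069, 1069)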

Strange TypeError with Theano

Traceback (most recent call last):
File "test.py", line 37, in <module>
print convLayer1.output.shape.eval({x:xTrain})
File "/Volumes/TONY/anaconda/lib/python2.7/site-packages/theano/gof/graph.py", line 415, in eval
rval = self._fn_cache[inputs](*args)
File "/Volumes/TONY/anaconda/lib/python2.7/site-packages/theano/compile/function_module.py", line 513, in __call__
allow_downcast=s.allow_downcast)
File "/Volumes/TONY/anaconda/lib/python2.7/site-packages/theano/tensor/type.py", line 180, in filter
"object dtype", data.dtype)
TypeError
And here is my code:
import scipy.io as sio
import numpy as np
import theano.tensor as T
from theano import shared
from convnet3d import ConvLayer, NormLayer, PoolLayer, RectLayer
from mlp import LogRegr, HiddenLayer, DropoutLayer
from activations import relu, tanh, sigmoid, softplus
dataReadyForCNN = sio.loadmat("DataReadyForCNN.mat")
xTrain = dataReadyForCNN["xTrain"]
# xTrain = np.random.rand(10, 1, 5, 6, 2).astype('float64')
xTrain.shape
dtensor5 = T.TensorType('float64', (False,)*5)
x = dtensor5('x') # the input data
yCond = T.ivector()
# input = (nImages, nChannel(nFeatureMaps), nDim1, nDim2, nDim3)
kernel_shape = (5,6,2)
fMRI_shape = (51, 61, 23)
n_in_maps = 1 # channel
n_out_maps = 5 # num of feature maps, aka the depth of the neurons
num_pic = 2592
layer1_input = x
# layer1_input.eval({x:xTrain}).shape
# layer1_input.shape.eval({x:numpy.zeros((2592, 1, 51, 61, 23))})
convLayer1 = ConvLayer(layer1_input, n_in_maps, n_out_maps, kernel_shape, fMRI_shape,
num_pic, tanh)
print convLayer1.output.shape.eval({x:xTrain})
It is really weird, as the error was not thrown in Jupyter (though it took a very long time to run and eventually the kernel died; I really don't know why), but as soon as I moved it to the shell and ran python fileName.py, the error was thrown.
The problem lies in loadmat from scipy. The TypeError you are getting is thrown by this code in Theano:
if not data.flags.aligned:
    ...
    raise TypeError(...)
Now, when you create a new array in numpy from raw data, it would usually be aligned:
>>> a = np.array(2)
>>> a.flags.aligned
True
But if you savemat / loadmat it, the value of the flag gets lost:
>>> savemat('test', {'a':a})
>>> a2 = loadmat('test')['a']
>>> a2.flags.aligned
False
(seems like this particular issue is discussed here)
One quick and dirty way to address it is to create a new numpy array from the array you loaded:
>>> a2 = loadmat('test')['a']
>>> a3 = np.array(a2)
>>> a3.flags.aligned
True
So, for your code, rebuild the loaded array rather than wrapping the loadmat dict itself:
xTrain = np.array(sio.loadmat("DataReadyForCNN.mat")["xTrain"])
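A hedged alternative sketch: np.require can enforce the ALIGNED flag (and the expected dtype) in one call, which makes the intent explicit:
import numpy as np
import scipy.io as sio

# np.require returns an array satisfying the requested flags, copying only if needed
xTrain = np.require(sio.loadmat("DataReadyForCNN.mat")["xTrain"],
                    dtype='float64', requirements=['ALIGNED'])
assert xTrain.flags.aligned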

Use scikit-cuda to compute singular value decomposition with cuSOLVER

I am trying to use scikit-cuda's wrappers for the cuSOLVER functions, in particular I want to execute cusolverDnSgesvd to compute full-matrix single precision SVD on a matrix of real numbers.
Using the code here and here as a reference, I managed to get this far:
import pycuda.autoinit
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray
import numpy as np
from skcuda import cusolver
handle = cusolver.cusolverDnCreate()
m = 50
n = 25
a = np.asarray(np.random.random((m, n)))
a_gpu = gpuarray.to_gpu(a)
ldu = m
ldvt = n
s_gpu = gpuarray.empty(min(m, n), np.float32)
u_gpu = gpuarray.empty((ldu, m), np.float32)
vh_gpu = gpuarray.empty((n, n), np.float32)
work_size = cusolver.cusolverDnSgesvd_bufferSize(handle, m, n)
work = gpuarray.empty((m,n), np.float32)
u_gpu, s_gpu, vh_gpu = cusolver.cusolverDnSgesvd(
handle=handle,
jobu='A',
jobvt='A',
m=m,
n=n,
A=a,
lda=m,
S=s_gpu,
U=u_gpu,
ldu=ldu,
VT=vh_gpu,
ldvt=ldvt,
Work=work,
Lwork=work_size,
rwork=None,
devInfo=0
)
But the code isn't working, probably because I'm messing up with types.
Traceback (most recent call last):
File "/home/vektor/PycharmProjects/yancut/test_svd.py", line 44, in <module>
devInfo=0
File "/home/vektor/anaconda3/lib/python3.4/site-packages/skcuda/cusolver.py", line 577, in cusolverDnSgesvd
int(A), lda, int(S), int(U),
TypeError: only length-1 arrays can be converted to Python scalars
How should I provide the arguments so that the SVD executes properly?
UPDATE1:
After using this question as reference, I edited my code and I'm getting a new error.
import pycuda.autoinit
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray
import numpy as np
import ctypes
from skcuda import cusolver
rows = 20
cols = 10
a = np.asarray(np.random.random((rows, cols)))
a_gpu = gpuarray.to_gpu(a.copy())
lda = rows
u_gpu = gpuarray.empty((rows, rows), np.float32)
v_gpu = gpuarray.empty((cols, cols), np.float32)
s_gpu = gpuarray.empty(cols, np.float32)
devInfo = gpuarray.zeros(1, np.int32)
handle = cusolver.cusolverDnCreate()
worksize = cusolver.cusolverDnSgesvd_bufferSize(handle, rows, cols)
print("SIZE", worksize)
Workspace = gpuarray.empty(worksize, np.float32)
svd_status = cusolver.cusolverDnSgesvd(
handle=handle,
jobu='A',
jobvt='A',
m=rows,
n=cols,
A=a_gpu.ptr,
lda=rows,
S=s_gpu.ptr,
U=u_gpu.ptr,
ldu=rows,
VT=v_gpu.ptr,
ldvt=cols,
Work=Workspace.ptr,
Lwork=worksize,
rwork=Workspace.ptr,
devInfo=devInfo.ptr
)
status = cusolver.cusolverDnDestroy(handle)
And I'm getting a new error
Traceback (most recent call last):
File "/home/vektor/PycharmProjects/yancut/test_svd.py", line 53, in <module>
devInfo=devInfo.ptr
File "/home/vektor/anaconda3/lib/python3.4/site-packages/skcuda/cusolver.py", line 579, in cusolverDnSgesvd
Lwork, int(rwork), int(devInfo))
ctypes.ArgumentError: argument 2: <class 'TypeError'>: wrong type
It now seems that I'm doing something wrong with devInfo.
From the documentation it looks like each of the matrices (so A, S, U, VT) needs to be passed as a device pointer. So for PyCUDA gpuarrays, pass A.ptr rather than A, etc., and it should work.
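As a follow-up sketch (assuming the usual cuSOLVER convention that devInfo receives a LAPACK-style status code), it is worth copying devInfo back to the host after the call to confirm convergence:
info = int(devInfo.get()[0])  # copy the status code back from the device
if info != 0:
    raise RuntimeError("cusolverDnSgesvd failed, devInfo = %d" % info)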
