So I have three arrays of shape (1949, 2649):
Jun_1TMean = xr.DataArray(Jun_1T.variables['__xarray_dataarray_variable__'])
lon = xr.DataArray(lon2)
lat = xr.DataArray(lat2)
When I do
June_1T = np.array([Jun_1TMean, lat, lon])
June_1T.shape
I get (3, 1949, 2649)
However I actually want shape (1949, 2649, 1949, 2649, 1949, 2649) instead
Apart from the fact that you can't just 'stack' NumPy arrays into separate axes like that without a broadcasting operation or ufunc (+, *, etc.), I don't think you want to be doing this. A NumPy array with the dimensions you suggest and dtype int64 (float64 is no smaller at 8 bytes per element) would take:
array_space = (1949*2649*1949*2649*1949*2649)*8 bytes
= 1100959591182509749608 bytes
= 1100959591182.51 GB
= 1100959.59 Petabytes
For reference, the combined data stored by Google, Amazon, Microsoft, and Facebook is estimated to be about 1,200 petabytes.
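A quick sanity check of that arithmetic (math.prod is used because the product overflows a fixed-width NumPy integer):

import math

shape = (1949, 2649, 1949, 2649, 1949, 2649)
n_bytes = math.prod(shape) * 8                 # 8 bytes per int64 element
print(f"{n_bytes:,} bytes")                    # 1,100,959,591,182,509,749,608 bytes
print(f"approx. {n_bytes / 1e15:,.0f} petabytes")  # approx. 1,100,960 petabytes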
I am using Scikit-Image's imread function for reading images for a PyTorch data loader.
I get errors from the function ToTensor(), saying that the strides of the numpy array are negative.
I read about it and using somearray.copy() solves it.
Yet, I'd like to solve it from the root. How can I force Scikit-Image to read the image into a contiguous array with regular strides?
I looked for solutions to this, but they are mostly about creating a new copy of the data, which I want to avoid.
Those are the properties of the array:
print(f'shape: {img.shape}')
print(f'dtype: {img.dtype}')
print(f'strides: {img.strides}')
The output:
shape: (4032, 3024, 3)
dtype: uint8
strides: (3, -12096, 1)
When I inspect img.base I get the underlying data, though its dimensions are (3024, 4032, 3).
I don't know a lot about image file formats, but I can make some deductions from the data you provided:
shape: (4032, 3024, 3)
dtype: uint8
strides: (3, -12096, 1)
img.base (3024, 4032, 3)
img is a view of its base. The negative strides[1] means that dimension has been reversed, e.g. with ::-1 indexing. The fact that the largest stride is in the middle means the first two dimensions have been swapped (transpose(1,0,2)). I expect img.base.strides to be (12096, 3, 1); 12096 is 3*4032.
jpg is a compressed format, but I assume the base is close in layout to the file, and this view is needed to conform to our normal numpy expectations for an array.
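A hedged reconstruction of how such a view could arise (the base here is just zeros standing in for the decoded image data):

import numpy as np

base = np.zeros((3024, 4032, 3), dtype=np.uint8)  # C-contiguous, strides (12096, 3, 1)
view = base.transpose(1, 0, 2)[:, ::-1, :]        # swap the first two axes, reverse axis 1

print(view.shape)        # (4032, 3024, 3)
print(view.strides)      # (3, -12096, 1)
print(view.base.shape)   # (3024, 4032, 3)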
img.copy() will have the same shape, but strides will be (9072,3,1).
If plt.imread produces an array with that shape and strides, it may well have returned that copy rather than the view. It's not necessarily being any more "efficient".
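To directly answer the "force regular strides" part: once the reader hands back a view like this, some copy is unavoidable, but it can be made explicit with np.ascontiguousarray. A minimal sketch; the filename is made up and io.imread stands in for whatever reader is actually in use:

import numpy as np
from skimage import io

img = io.imread('photo.jpg')            # hypothetical file; may return a non-contiguous view
if not img.flags['C_CONTIGUOUS']:
    img = np.ascontiguousarray(img)     # one copy, positive row-major strides

print(img.strides)                      # (9072, 3, 1) for a (4032, 3024, 3) uint8 image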
Think about how we print a 2d array - 1st dimension, rows, going down, 2nd, columns, going across, left to right. But think about a common xy plot - x goes left to right, and y goes from bottom up. Or look at what np.meshgrid says about indexing, 'ij' versus 'xy'.
Having the size 3 dimension last is just another convention. That's the color 'channel', 3 for RGB, 4 adds a transparency value, and 1 for b/w. Sometimes arrays have that dimension first.
So I have multiple files that can be accessed and be treated as 2D arrays.
What I would like to do is take all those 2D arrays and put them in a single 3D array.
For example, if I have 10 files each of shape (100,100), combining them should leave me with a 3D array of shape (10,100,100). My attempt is the following:
filenames = glob.glob('source')
preset = np.empty([100, 100], dtype='int16')
for file in filenames:
    data = fits.open(file)[0].data
    np.vstack([preset, data]).reshape((10, 100, 100))
But what I'm getting is the following error:
ValueError: cannot reshape array of size 20000 into shape (10,100,100)
You are performing the operation pair by pair: inside the loop you only ever stack preset with a single file's data, which gives a (200, 100) array of 20,000 elements (hence the error), and the result is never kept. Try to perform this on all the arrays together:
arrs = [fits.open(file)[0].data for file in filenames]
np.vstack(arrs).reshape((10,100,100))
Or even more direct:
np.stack(arrs)
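Putting it together with the question's setup (the 'source' pattern and 10 equally-shaped files are assumptions, and fits is taken to be astropy.io.fits):

import glob
import numpy as np
from astropy.io import fits

filenames = glob.glob('source')                    # assumed to match the 10 files
arrs = [fits.open(f)[0].data for f in filenames]   # each array is (100, 100)
cube = np.stack(arrs)                              # adds a leading axis -> (10, 100, 100)
print(cube.shape)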
I'm extracting some features from some data generated with an accelerometer and I have the following arrays:
X_mfccs_processed (list with 40 values)
Y_mfccs_processed (list with 40 values)
Z_mfccs_processed (list with 40 values)
X_mean (1 value)
Y_mean (1 value)
Z_mean (1 value)
At the moment I'm able to create a 3D array [shape=(1,40,3)] and insert my MFCC arrays into it:
self.extracted_features = np.ndarray(shape=(1, len(self.X_mfccs_processed), 3))
self.extracted_features[:,:,0] = self.X_mfccs_processed
self.extracted_features[:,:,1] = self.Y_mfccs_processed
self.extracted_features[:,:,2] = self.Z_mfccs_processed
My question is: how can I create a 4D array [shape=(1,40,1,3)] in which to also store my mean values?
Instead of assigning values into a preallocated array, a better way to make the array is:
self.extracted_features = np.array([X_mfccs_processed,Y_mfccs_processed,Z_mfccs_processed]).T[None,...]
or equivalently:
self.extracted_features = np.array([X_mfccs_processed,Y_mfccs_processed,Z_mfccs_processed]).T.reshape(1,-1,3)
However, you cannot add another dimension of size 1 and insert the mean values into it. A dimension's value is the number of elements along that dimension. An easy way to think about it is that a matrix of shape (1,N) is a 1xN matrix; it does not mean you can put the mean in the first dimension and a list of size N in the second. You need another place to store your means. I would suggest a separate array like this, with shape (1,3):
self.extracted_features_mean = np.array([X_mean,Y_mean,Z_mean]).T[None,...]
And use similar indexing to access the mean. An alternative would be using dictionaries. Depending on your application, you can pick one that is easier and/or faster.
Usually np.reshape(self.extracted_features, (1,40,1,3)) works well.
The shape would have to be different to store the mean values as well. There isn't enough space.
(1,40,1,6) or (1,40,2,3)
seem reasonable shapes.
for (1,40,1,6)
self.extracted_features = np.ndarray(shape=(1, len(self.X_mfccs_processed), 1, 6))
# index the size-1 axis explicitly so the 40-value lists broadcast into the (1, 40) slice
self.extracted_features[:,:,0,0] = self.X_mfccs_processed
self.extracted_features[:,:,0,1] = self.Y_mfccs_processed
self.extracted_features[:,:,0,2] = self.Z_mfccs_processed
self.extracted_features[:,:,0,3] = self.X_mean
self.extracted_features[:,:,0,4] = self.Y_mean
self.extracted_features[:,:,0,5] = self.Z_mean
for (1,40,2,3)
self.extracted_features = np.ndarray(shape=(1, len(self.X_mfccs_processed), 2, 3))
self.extracted_features[:,:,0,0] = self.X_mfccs_processed
self.extracted_features[:,:,0,1] = self.Y_mfccs_processed
self.extracted_features[:,:,0,2] = self.Z_mfccs_processed
self.extracted_features[:,:,1,0] = self.X_mean
self.extracted_features[:,:,1,1] = self.Y_mean
self.extracted_features[:,:,1,2] = self.Z_mean
I should mention this broadcasts the mean values, meaning each one is duplicated (40 times). That is wasteful for storage, but if you are doing some type of machine learning or numerics it might be a good tradeoff. Alternatively you could use a (1,41,1,3) shape, as sketched below.
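A sketch of that (1, 41, 1, 3) alternative: keep the 40 MFCC rows plus a single mean row instead of duplicating the means (variable names as in the question):

import numpy as np

mfccs = np.stack([X_mfccs_processed, Y_mfccs_processed, Z_mfccs_processed], axis=1)  # (40, 3)
means = np.array([[X_mean, Y_mean, Z_mean]])                                         # (1, 3)
features = np.concatenate([mfccs, means], axis=0)[None, :, None, :]                  # (1, 41, 1, 3)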
I have different .npy files in which numpy arrays are saved (images represented as matrices; one dimension is 64, the other I don't know).
I want to read them, store them in a numpy.ndarray of 3 dimensions.
What I have done till now is something very different, and I'm having problems dealing with the structures I created.
database_list = list()
labels_list = list()

for filename in glob.glob('*.npy'):
    database_list.append(np.load(filename))
    label_temp = extract_label(filename)
    labels_list.append(label_temp)

database = np.array(database_list)
labels = np.array(labels_list)
In that way, I have a numpy.ndarray database of shape (n_elements,).
Let's assume that I reshape each image to (n, 64); I then want database to have shape (n_elements, n, 64).
How can I do it?
What I want to achieve is an array of the same shape of MNIST database, for working on neural network.
EDIT:
database is of type numpy.ndarray. It can't be reshaped: it has size n, say 10 if 10 files are loaded, because it is composed of n elements. The files are two-dimensional matrices, but I want them to be "part of" database.
For database = np.array(database_list) to make a 3d array with shape (n_elements, dim1, dim2), the database_list has to contain arrays all with the shape (dim1, dim2). If they differ in shape the result will be a (n_elements,) shaped array with object dtype (or in some cases it will throw an error).
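A minimal sketch of that, assuming every file really does flatten cleanly into rows of 64 and every image ends up with the same n (extract_label is the question's own helper):

import glob
import numpy as np

database_list, labels_list = [], []
for filename in glob.glob('*.npy'):
    arr = np.load(filename).reshape(-1, 64)   # force each image into a common (n, 64) layout
    database_list.append(arr)
    labels_list.append(extract_label(filename))

database = np.stack(database_list)            # (n_elements, n, 64); raises if the n differ
labels = np.array(labels_list)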
I am trying to use the tslearn library to analyze an audio numpy file. The file has 45K rows (45K audio samples) and 1 column, but each row holds a nested object of shape (N, 13). So the length of each sample is different while the features are the same (13 features). I want to stretch them all to 93 rows, which means that if I print the shape of any of them, it will return (93, 13).
data example:
first nested object in the dataset, shape (43,13)
second nested object in the dataset, shape (30,13)
I tried to use this tslearn library: https://tslearn.readthedocs.io/en/latest/gen_modules/preprocessing/tslearn.preprocessing.TimeSeriesResampler.html#tslearn.preprocessing.TimeSeriesResampler
but it will only change the columns instead of the rows: if I have an array that is (44, 13), it will change its shape to (44, 93) instead of (93, 13). So I tried to rotate each array by 90 degrees and redo the analysis, but since the dataset itself is only 1D with 45K nested objects, I had to make an empty list, use a for loop to take out each object, rotate it 90 degrees, and put it back in the list. Then I tried to change the list back to an array, since tslearn.preprocessing.TimeSeriesResampler only accepts arrays as parameters. However, it tells me 'ValueError: could not broadcast input array from shape (13,41) into shape (13)' while converting the list back to an array.
import numpy as np

spoken_train = np.load("spoken_train.npy", allow_pickle=True)

lis = []
for i in range(len(spoken_train)):
    lis.append(np.rot90(spoken_train[i]))
myarray = np.asarray(lis)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-65-440f2eba9eba> in <module>
2 for i in range(len(spoken_train)):
3 lis.append(np.rot90(spoken_train[i]))
----> 4 myarray = np.asarray(lis)
/anaconda3/lib/python3.7/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
536
537 """
--> 538 return array(a, dtype, copy=False, order=order)
539
540
ValueError: could not broadcast input array from shape (13,41) into shape (13)
What should I do? If there is any easier way to rotate the nested array, please let me know as well. Thank you!
Does this fit the bill:
lis = []
for i in range(len(spoken_train)):
    item = spoken_train[i]
    lis.append(item + np.zeros((1, item.shape[-1])))

myarray = np.concatenate(lis)
The items in the loop must have the same number of columns, though. According to your examples, all arrays in spoken_train must have a last dimension of 13.
lis = np.copy(spoken_train)  # object array of the same length, reused as a container
for i in range(len(spoken_train)):
    lis[i] = np.rot90(spoken_train[i])
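For the original goal of stretching every nested (N, 13) array to (93, 13), resampling each series on its own sidesteps the ragged-array problem entirely. A sketch, assuming TimeSeriesResampler accepts a single series wrapped as a (1, N, 13) dataset:

import numpy as np
from tslearn.preprocessing import TimeSeriesResampler

spoken_train = np.load("spoken_train.npy", allow_pickle=True)

resampler = TimeSeriesResampler(sz=93)
resampled = np.stack([
    resampler.fit_transform(ts[None, :, :])[0]   # (1, N, 13) -> (1, 93, 13) -> (93, 13)
    for ts in spoken_train
])
print(resampled.shape)                            # (n_samples, 93, 13)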