This question already has answers here:
What does -1 mean in numpy reshape?
(12 answers)
Closed 6 years ago.
I have a numpy array (A) of shape = (100000, 28, 28).
I reshape it using A.reshape(-1, 28*28).
This is a very common step in machine learning pipelines.
How does this work? I have never understood the meaning of '-1' in reshape.
An exact question is this, but it has no solid explanation. Any answers, please?
In numpy, creating a 100x100 matrix is done like this:
import numpy as np
x = np.ndarray((100, 100))
x.shape # outputs: (100, 100)
Numpy internally stores all these 10,000 items in a flat buffer of 10,000 items, regardless of the shape of the object. This allows us to change the shape of the array into any dimensions, as long as the number of items does not change.
For example, reshaping our object to 10x1000 is fine, as we keep the 10,000 items:
x = x.reshape(10, 1000)
Reshaping to 10x2000 won't work, as we don't have enough items in the array:
x.reshape(10, 2000)
ValueError: total size of new array must be unchanged
So, back to the -1 question: it is the notation for an unknown dimension, meaning: let numpy fill in the missing dimension with the correct value so that the array keeps the same number of items.
So this:
x = x.reshape(10, 1000)
is equivalent to this:
x = x.reshape(10, -1)
Internally, numpy just calculates 10000 / 10 to get the missing dimension.
-1 can even be at the start of the shape or in the middle.
The above two examples are equivalent to this:
x = x.reshape(-1, 1000)
If we try to mark two dimensions as unknown, numpy will raise an exception, as it cannot know what we mean: there is more than one way to reshape the array.
x = x.reshape(-1, -1)
ValueError: can only specify one unknown dimension
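To make the same inference explicit, here is a minimal sketch (not numpy's actual implementation) of how the unknown dimension could be computed:

import numpy as np

def infer_shape(total_size, shape):
    """Replace a single -1 in `shape` with the value that keeps total_size unchanged."""
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if total_size % known != 0:
        raise ValueError("cannot reshape array of size %d into shape %s" % (total_size, shape))
    return tuple(dim if dim != -1 else total_size // known for dim in shape)

x = np.ndarray((100, 100))
print(infer_shape(x.size, (10, -1)))    # (10, 1000)
print(infer_shape(x.size, (-1, 1000)))  # (10, 1000)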
It means that the size of the dimension for which you passed -1 is inferred. Thus,
A.reshape(-1, 28*28)
means, "reshape A so that its second dimension has a size of 28*28 and calculate the correct size of the first dimension".
See documentation of reshape.
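For the array in the question, a quick check (using a smaller first dimension as a stand-in so the example stays light on memory):

import numpy as np

A = np.zeros((1000, 28, 28))   # stand-in; the question's array has 100000 in place of 1000
B = A.reshape(-1, 28 * 28)
print(B.shape)                 # (1000, 784) -- the first dimension is inferred automatically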
This question already has answers here:
Multiplying all combinations of array elements in numpy
(2 answers)
Closed 1 year ago.
I have two arrays of size 200 and 300. However, I would like to merge them into an array of 200x300, so 200 rows and 300 columns. This is a basic question, I know... and I'm not even sure it's possible, but how might I do this?
I tried using np.hstack but hstack created an array of size 500:
array1 = np.random.rand(200)
array2 = np.random.rand(300)
test = np.hstack((array1,array2))
test.shape
(500,)
I also tried stack, vstack, block, etc., but they require the arrays to be the same length, and with the real data I am using, the arrays are not the same length.
My goal is to make one 2d array with shape 200x300.
You can do this:
a = np.random.rand(2)
b = np.random.rand(3)
np.outer(a, b)
array([[0.10570007, 0.14838246, 0.04037839],
       [0.13164818, 0.18480859, 0.0502908 ]])
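For the sizes in the question, the same result can also be obtained with plain broadcasting (a sketch using random data as stand-ins for the real arrays):

import numpy as np

array1 = np.random.rand(200)
array2 = np.random.rand(300)

# outer product via broadcasting: (200, 1) * (300,) -> (200, 300)
test = array1[:, None] * array2[None, :]
print(test.shape)                                    # (200, 300)
print(np.allclose(test, np.outer(array1, array2)))   # True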
This question already has answers here:
Good ways to "expand" a numpy ndarray?
(6 answers)
Closed 1 year ago.
I have a numpy array with dimensions (1316, 21) and I need to increase it to (1329, 21). It doesn't matter what values are stored in the added space at the end. I tried to do:
x = np.append(x, np.zeros(13))
But that changes the shape of the array to (27649,), which shows that it is flattening it into a one-dimensional array and then adding the zeros to the end.
How do I append empty 2 dimensional values to an array like this?
Use np.concatenate or np.vstack
np.concatenate([x, np.zeros((13, x.shape[1]))], axis=0)
# or
np.vstack([x, np.zeros((13, x.shape[1]))])
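If you don't care what values go into the added rows, np.pad is another option worth considering (a sketch assuming x has shape (1316, 21)):

import numpy as np

x = np.zeros((1316, 21))          # stand-in for the real data
# pad 13 rows at the bottom, nothing on the columns
x_padded = np.pad(x, ((0, 13), (0, 0)), mode='constant')
print(x_padded.shape)             # (1329, 21)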
Ummm... there is no converting the dimensions of a numpy array in Python. A numpy array is simply a section of your RAM. You can't append to it in the sense of literally adding bytes to the end of the array, but you can create another array and copy over all the data (which is what np.append(), np.vstack(), np.concatenate(), etc. do). In general, the dimensions of your array are simply a few variables that numpy keeps track of in order to present the data in the array to you, the same as its dtype.
For example,
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype='int32')
print(X)
X.dtype = 'int16'  # The data is not converted; the same bytes are reinterpreted (shape becomes (10,)).
print(X)           # Displays the data for the array.
X.shape = (5, 2)   # Does not convert or touch the data.
print(X)           # Displays the data using the new shape.
For your data, you can simply update the .shape after appending more data (note that np.append without an axis flattens the result, which is why the shape is set back afterwards):
x = np.append(x, np.zeros((13, 21)))
x.shape = (1329, 21)
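As a side note, passing axis=0 to np.append avoids the flattening entirely, so no manual .shape fix-up is needed (a sketch with a stand-in array):

import numpy as np

x = np.zeros((1316, 21))                      # stand-in for the real data
x = np.append(x, np.zeros((13, 21)), axis=0)  # appends along rows, stays 2-D
print(x.shape)                                # (1329, 21)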
Maybe like this:
import numpy as np
x = np.array([[1, 2, 3, 4], [4, 5, 6, 7]])
x = np.append(x, [np.zeros(4) for _ in range(13)], axis=0)
print(x.shape)
print(x)
I'm extracting some features from some data generated with an accelerometer and I have the following arrays:
X_mfccs_processed (list with 40 values)
Y_mfccs_processed (list with 40 values)
Z_mfccs_processed (list with 40 values)
X_mean (1 value)
Y_mean (1 value)
Z_mean (1 value)
At the moment I'm able to create a 3D array [shape=(1,40,3)] and insert my mfccs arrays into it
self.extracted_features = np.ndarray(shape=(1, len(self.X_mfccs_processed), 3))
self.extracted_features[:,:,0] = self.X_mfccs_processed
self.extracted_features[:,:,1] = self.Y_mfccs_processed
self.extracted_features[:,:,2] = self.Z_mfccs_processed
My question is: how can I create a 4D array [shape=(1,40,1,3)] where I can also store my mean values?
To make the array, instead of assigning values to a preallocated array, a better way is:
self.extracted_features = np.array([X_mfccs_processed,Y_mfccs_processed,Z_mfccs_processed]).T[None,...]
or equivalently:
self.extracted_features = np.array([X_mfccs_processed,Y_mfccs_processed,Z_mfccs_processed]).T.reshape(1,-1,3)
However, you cannot add another dimension of size 1 and insert the mean values in it. A dimension value is the number of elements along that dimension. An easy way to think about it is that an array of shape (1,N) is a 1xN matrix; it does not mean you can insert the mean in the first dimension and a list of size N in the second dimension. You need to come up with another idea to store your means. I would suggest a separate array like this, with shape (1, 3):
self.extracted_features_mean = np.array([X_mean,Y_mean,Z_mean]).T[None,...]
And use similar indexing to access the means. An alternative would be using dictionaries. Depending on your application, you can pick whichever is easier and/or faster.
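A minimal sketch of this suggestion, with made-up stand-in values for the mfcc lists and hypothetical scalar means (the real attributes would replace them):

import numpy as np

# stand-ins for the real extracted features
X_mfccs_processed = np.random.rand(40)
Y_mfccs_processed = np.random.rand(40)
Z_mfccs_processed = np.random.rand(40)
X_mean, Y_mean, Z_mean = 0.1, 0.2, 0.3   # hypothetical scalar means

extracted_features = np.array(
    [X_mfccs_processed, Y_mfccs_processed, Z_mfccs_processed]).T[None, ...]
extracted_features_mean = np.array([X_mean, Y_mean, Z_mean])[None, ...]

print(extracted_features.shape)       # (1, 40, 3)
print(extracted_features_mean.shape)  # (1, 3)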
Usually np.reshape(self.extracted_features, (1,40,1,3)) works well.
The shape would have to be different to store the mean values as well. There isn't enough space.
(1,40,1,6) or (1,40,2,3)
seem reasonable shapes.
For (1,40,1,6):
self.extracted_features = np.ndarray(shape=(1, len(self.X_mfccs_processed), 1, 6))
self.extracted_features[:,:,:,0] = self.X_mfccs_processed
self.extracted_features[:,:,:,1] = self.Y_mfccs_processed
self.extracted_features[:,:,:,2] = self.Z_mfccs_processed
self.extracted_features[:,:,:,3] = self.X_mean
self.extracted_features[:,:,:,4] = self.Y_mean
self.extracted_features[:,:,:,5] = self.Z_mean
For (1,40,2,3):
self.extracted_features = np.ndarray(shape=(1, len(self.X_mfccs_processed), 2, 3))
self.extracted_features[:,:,0,0] = self.X_mfccs_processed
self.extracted_features[:,:,0,1] = self.Y_mfccs_processed
self.extracted_features[:,:,0,2] = self.Z_mfccs_processed
self.extracted_features[:,:,1,0] = self.X_mean
self.extracted_features[:,:,1,1] = self.Y_mean
self.extracted_features[:,:,1,2] = self.Z_mean
I should mention that this broadcasts the mean values, meaning it duplicates them (40 times). This would be bad for storage, but if you are doing some type of machine learning or numerics, it might be a good tradeoff. Alternatively, you could use a (1,41,1,3) shape.
This question has been asked before, but the solution only works for 1D/2D arrays, and I need a more general answer.
How do you create a repeating array without replicating the data? This strikes me as something of general use, as it would help to vectorize python operations without the memory hit.
More specifically, I have a (y,x) array, which I want to tile multiple times to create a (z,y,x) array. I can do this with numpy.tile(array, (nz,1,1)), but I run out of memory. My specific case has x=1500, y=2000, z=700.
One simple trick is to use np.broadcast_arrays to broadcast your (x, y) against a z-long vector in the first dimension:
import numpy as np
M = np.arange(1500*2000).reshape(1500, 2000)
z = np.zeros(700)
# broadcasting over the first dimension
_, M_broadcast = np.broadcast_arrays(z[:, None, None], M[None, ...])
print(M_broadcast.shape, M_broadcast.flags.owndata)
# (700, 1500, 2000) False
To generalize the stride_tricks method given for a 1D array in this answer, you just need to include the shape and stride length for each dimension of your output array:
M_strided = np.lib.stride_tricks.as_strided(
M, # input array
(700, M.shape[0], M.shape[1]), # output dimensions
(0, M.strides[0], M.strides[1]) # stride length in bytes
)
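As a quick sanity check (an illustrative snippet, not part of the original answer), the strided view reports the tiled shape while sharing memory with M rather than copying it:

import numpy as np

M = np.arange(1500 * 2000).reshape(1500, 2000)
M_strided = np.lib.stride_tricks.as_strided(
    M, (700, M.shape[0], M.shape[1]), (0, M.strides[0], M.strides[1]))

print(M_strided.shape)                  # (700, 1500, 2000)
print(np.shares_memory(M, M_strided))   # True -- no data was replicated
print(M_strided.nbytes)                 # reported size, but no new memory was allocated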
I have a list of several hundred 10x10 arrays that I want to stack together into a single Nx10x10 array. At first I tried a simple
newarray = np.array(mylist)
But that returned with "ValueError: setting an array element with a sequence."
Then I found the online documentation for dstack(), which looked perfect: "...This is a simple way to stack 2D arrays (images) into a single 3D array for processing." Which is exactly what I'm trying to do. However,
newarray = np.dstack(mylist)
tells me "ValueError: array dimensions must agree except for d_0", which is odd because all my arrays are 10x10. I thought maybe the problem was that dstack() expects a tuple instead of a list, but
newarray = np.dstack(tuple(mylist))
produced the same result.
At this point I've spent about two hours searching here and elsewhere to find out what I'm doing wrong and/or how to go about this correctly. I've even tried converting my list of arrays into a list of lists of lists and then back into a 3D array, but that didn't work either (I ended up with lists of lists of arrays, followed by the "setting array element as sequence" error again).
Any help would be appreciated.
newarray = np.dstack(mylist)
should work. For example:
import numpy as np
# Here is a list of five 10x10 arrays:
x = [np.random.random((10,10)) for _ in range(5)]
y = np.dstack(x)
print(y.shape)
# (10, 10, 5)
# To get the shape to be Nx10x10, you could use rollaxis:
y = np.rollaxis(y,-1)
print(y.shape)
# (5, 10, 10)
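If your numpy version has np.stack (added in numpy 1.10), a possibly simpler route to the Nx10x10 shape is stacking along a new leading axis directly:

import numpy as np

x = [np.random.random((10, 10)) for _ in range(5)]
y = np.stack(x, axis=0)  # stacks along a new first axis
print(y.shape)           # (5, 10, 10)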
np.dstack returns a new array. Thus, using np.dstack requires as much additional memory as the input arrays. If you are tight on memory, an alternative to np.dstack which requires less memory is to
allocate space for the final array first, and then pour the input arrays into it one at a time.
For example, if you had 58 arrays of shape (159459, 2380), then you could use
y = np.empty((159459, 2380, 58))
for i in range(58):
    # instantiate the input arrays one at a time
    x = np.random.random((159459, 2380))
    # copy x into y
    y[..., i] = x