The shape of a sliced array - python

I am having trouble with array calculations after slicing. The problem is caused by uncertainty about the shape of the sliced array.
For example, I have a 2D array data with shape (118, 3). When I take only the first column, as shown below, the shape comes back as (118,). The number of columns is not reported unless I use reshape. I do not understand why.
print(data.shape, data[:, 0].shape)
The result is : (118, 3) (118,).
I found a similar question on Stack Overflow, but it did not clear up my confusion.

Giving a concrete index for a dimension removes that dimension from the result. If you want to keep the dimension, you have to provide a one-element slice instead:
print(data[:, 0:1].shape)
results in (118, 1).
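A small sketch of the difference. The contents of the asker's array are not shown, so a zero-filled stand-in of the same shape is assumed here; only the shapes matter.

```python
import numpy as np

# Stand-in for the asker's array (assumption: only the shape is known).
data = np.zeros((118, 3))

col_index = data[:, 0]                   # integer index: axis 1 is dropped
col_slice = data[:, 0:1]                 # one-element slice: axis 1 is kept
col_newaxis = data[:, 0][:, np.newaxis]  # re-add the axis after indexing

print(col_index.shape)    # (118,)
print(col_slice.shape)    # (118, 1)
print(col_newaxis.shape)  # (118, 1)
```

The slice form avoids a later reshape when the column has to stay 2D.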

Related

transpose Keyword not working as I expected [duplicate]

My goal is to to turn a row vector into a column vector and vice versa. The documentation for numpy.ndarray.transpose says:
For a 1-D array, this has no effect. (To change between column and row vectors, first cast the 1-D array into a matrix object.)
However, when I try this:
my_array = np.array([1,2,3])
my_array_T = np.transpose(np.matrix(my_array))
I do get the wanted result, albeit in matrix form (matrix([[66],[640],[44]])), but I also get this warning:
PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
my_array_T = np.transpose(np.matrix(my_array))
How can I properly transpose an ndarray then?
Transposing a 1D array returns the array unchanged, unlike in Matlab, where 1D arrays do not exist and every array is at least 2D.
What you want is to reshape it:
my_array.reshape(-1, 1)
Or:
my_array.reshape(1, -1)
Depending on what kind of vector you want (column or row vector).
The -1 tells reshape to infer that dimension's size from the number of elements, and the 1 creates the required second dimension.
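As a quick check of the shapes involved, here is a minimal sketch using the three-element array from the question:

```python
import numpy as np

my_array = np.array([1, 2, 3])

same = my_array.T                 # transposing a 1-D array changes nothing
column = my_array.reshape(-1, 1)  # -1 is inferred as 3
row = my_array.reshape(1, -1)     # -1 is inferred as 3

print(same.shape)      # (3,)
print(column.shape)    # (3, 1)
print(row.shape)       # (1, 3)
print(column.T.shape)  # (1, 3): a 2-D array transposes as expected
```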
If your array is my_array and you want to convert it to a column vector you can do:
my_array.reshape(-1, 1)
For a row vector you can use
my_array.reshape(1, -1)
Both of these can also be transposed and that would work as expected.
IIUC, use reshape
my_array.reshape(my_array.size, -1)

How to concatenate tensor to another list of tensor in pytorch?

I have a tensor of shape "torch.Size([2, 2, 3])" and another tensor of shape "torch.Size([2, 1, 3])". I want a concatenated tensor of shape "torch.Size([2, 2, 6])".
For example :
a=torch.tensor([[[2,3,5],[12,13,15]],[[20,30,50],[120,130,150]]])
b=torch.tensor([[[99,99,99]],[[999,999,999]]])
I want the output as : [[[99,99,99,2,3,5],[99,99,99,12,13,15]],[[999,999,999,20,30,50],[999,999,999,120,130,150]]]
I have written an O(n^2) solution using two for loops, but it takes a lot of time with millions of calculations. Can anyone help me do this efficiently? Maybe there is some matrix calculation trick for tensors?
To exactly match the example you have provided:
c = torch.cat([b.repeat([1,a.shape[1]//b.shape[1],1]),a],2)
The reasoning is that the concatenate operation in pytorch (and numpy and other libraries) will complain if the sizes of the two tensors along the axes not being concatenated (here axes 0 and 1) do not match. You therefore have to repeat b along the non-matching axis (axis 1, which is why the repeat count is the second element of the repeat list) so that the shapes align. Note that this solution only works if the middle dimension of a is evenly divisible by the middle dimension of b.
In newer versions of pytorch, this can also be done using the torch.tile() function.
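Since the same rule applies in NumPy, the idea can be sketched there with the values from the question. This is a NumPy analogue rather than the PyTorch code itself: np.tile stands in for the repeat call and np.concatenate for torch.cat.

```python
import numpy as np

a = np.array([[[2, 3, 5], [12, 13, 15]],
              [[20, 30, 50], [120, 130, 150]]])    # shape (2, 2, 3)
b = np.array([[[99, 99, 99]], [[999, 999, 999]]])  # shape (2, 1, 3)

# Repeat b along axis 1 so that axes 0 and 1 match a, then join on axis 2.
reps = a.shape[1] // b.shape[1]
b_tiled = np.tile(b, (1, reps, 1))        # shape (2, 2, 3)
c = np.concatenate([b_tiled, a], axis=2)  # shape (2, 2, 6)

print(c.shape)
print(c.tolist())
```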

How to balance data when they look like a 3-D array?

I've got a numpy_array of size (3275412, 50, 22) which represents my data reshaped for LSTM purposes and I have got a target vector of shape (3275412,).
I want to balance my data so that there is approximately the same number of data with target 0 and 1.
The way I prepared the data makes that I can not do this balancing operation before reshaping.
First, I wanted to apply the make_imbalance function (see this link for details), but I can't apply it to a 3-D array (I got an error).
My question is : what's the most efficient way to do it for a 3D array ?
My thoughts: I thought about firstly "flatten" my 3-D array to a 2-D array by "concatenating" the second and third dimension (but don't know how so please tell me ??) then apply make_imbalance and then reshape the result to a 3-D array (again, don't know how to do). It seems a little bit tricky however...
So any help would be appreciated, either for an other imbalancing method or for help about reshaping 3D->2D or vice-versa
You can use np.reshape with -1 for unknown dimension size.
data2d = data3d.reshape(data3d.shape[0], -1)
will give you a 2d array of shape (n_samples, n_features)
with the second and the third dimensions merged.
data2d_new, y_new = make_imbalance(data2d, y)
After the make_imbalance call, you will get a 2d array of shape (n_samples_new, n_features). The new number of rows is not known in advance, but you do know the other two 'feature' dimensions of the original 3d array, so
data3d_new = data2d_new.reshape(-1, data3d.shape[1], data3d.shape[2])
will give you back the balanced 3d dataset.
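The round trip can be checked on a small stand-in array. The sketch below does not call make_imbalance; plain random undersampling by index is swapped in as an assumption, which also shows that the 3D array can be balanced without reshaping at all. The sizes and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small stand-in for the (3275412, 50, 22) array and its binary target.
data3d = rng.normal(size=(10, 5, 2))
y = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])

# 3-D -> 2-D -> 3-D round trip: merging and splitting the last two axes
# does not reorder any values.
data2d = data3d.reshape(data3d.shape[0], -1)                     # (10, 10)
restored = data2d.reshape(-1, data3d.shape[1], data3d.shape[2])  # (10, 5, 2)

# Random undersampling by index (swapped in for make_imbalance): keep as
# many samples of each class as the rarest class has.
n_keep = np.bincount(y).min()
keep = np.concatenate([
    rng.choice(np.flatnonzero(y == label), size=n_keep, replace=False)
    for label in np.unique(y)
])
keep.sort()  # preserve the original sample order

data3d_bal, y_bal = data3d[keep], y[keep]
print(data3d_bal.shape, np.bincount(y_bal))
```

Indexing along the first axis keeps each (50, 22) sample intact, so the same selection works on either the 2D or the 3D form.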

Numpy/Tensorflow: Multiplying each depth-wise vector of 3D tensor by a 2D matrix

I have a 4x4x256 tensor and a 128x256 matrix. I need to multiply each 256-d depth-wise vector of the tensor by the matrix, such that I get a 4x4x128 tensor as a result.
Working in Numpy it's not clear to me how to do this. In their current shape it doesn't look like any variant of np.dot exists to do this. Manipulating the shapes to take advantage of broadcasting rules doesn't seem to provide any help. np.tensordot and np.einsum may be useful but looking at the documentation is going right over my head.
Is there an efficient way to do this?
You can use np.einsum to do this operation. An example with placeholder values:
a = np.arange(4096.).reshape(4,4,256)
b = np.arange(32768.).reshape(128,256)
c = np.einsum('ijk,lk->ijl',a,b)
print(c.shape)
Here, the subscripts argument is: ijk,lk->ijl
From your requirement, i=4, j=4, k=256, l=128
The comma separates the subscripts for two operands, and the subscripts state that the multiplication should be performed over the last subscript in each tensor (the subscript k which is common to both the tensors).
The tensor subscript after the -> states that the resultant tensor should have the shape (i,j,l). Now depending on the type of operation you are performing, you might have to retain this subscript or change this subscript to jil, but the rest of the subscripts remains the same.
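For readers who find the subscript notation hard to parse, the same contraction can be cross-checked against two other formulations. This is a sketch with random inputs; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4, 256))
b = rng.normal(size=(128, 256))

c_einsum = np.einsum('ijk,lk->ijl', a, b)
# matmul treats a as a stack of (4, 256) matrices and broadcasts b.T.
c_matmul = a @ b.T
# tensordot contracts axis 2 of a with axis 1 of b.
c_tensordot = np.tensordot(a, b, axes=([2], [1]))

print(c_einsum.shape)  # (4, 4, 128)
print(np.allclose(c_einsum, c_matmul), np.allclose(c_einsum, c_tensordot))
```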

Writing functions that accept both 1-D and 2-D numpy arrays?

My understanding is that 1-D arrays in numpy can be interpreted as either a column-oriented vector or a row-oriented vector. For instance, a 1-D array with shape (8,) can be viewed as a 2-D array of shape (1,8) or shape (8,1) depending on context.
The problem I'm having is that the functions I write to manipulate arrays tend to generalize well in the 2-D case to handle both vectors and matrices, but not so well in the 1-D case.
As such, my functions end up doing something like this:
if arr.ndim == 1:
    # Do it this way
else:
    # Do it that way
Or even this:
# Reshape the 1-D array to a 2-D array
if arr.ndim == 1:
    arr = arr.reshape((1, arr.shape[0]))
# ... Do it the 2-D way ...
That is, I find I can generalize code to handle 2-D cases (r,1), (1,c), (r,c), but not the 1-D cases without branching or reshaping.
It gets even uglier when the function operates on multiple arrays as I would check and convert each argument.
So my question is: am I missing some better idiom? Is the pattern I've described above common to numpy code?
Also, as a related matter of API design principles, if the caller passes a 1-D array to some function that returns a new array, and the return value is also a vector, is it common practice to reshape a 2-D vector (r,1) or (1,c) back to a 1-D array or simply document that the function returns a 2-D array regardless?
Thanks
I think in general NumPy functions that require an array of shape (r,c) make no special allowance for 1-D arrays. Instead, they expect the user to either pass an array of shape (r,c) exactly, or for the user to pass a 1-D array that broadcasts up to shape (r,c).
If you pass such a function a 1-D array of shape (c,) it will broadcast to shape (1,c), since broadcasting adds new axes on the left. It can also broadcast to shape (r,c) for an arbitrary r (depending on what other array it is being combined with).
On the other hand, if you have a 1-D array, x, of shape (r,) and you need it to broadcast up to shape (r,c), then NumPy expects the user to pass an array of shape (r,1) since broadcasting will not add the new axes on the right for you.
To do that, the user must pass x[:,np.newaxis] instead of just x.
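A minimal sketch of the two cases, with r = 3 and c = 4 chosen arbitrarily for illustration:

```python
import numpy as np

r, c = 3, 4
target = np.zeros((r, c))

row_like = np.arange(c)  # shape (c,): broadcasts as (1, c)
col_like = np.arange(r)  # shape (r,): needs an explicit trailing axis

print((target + row_like).shape)                 # (3, 4)
print((target + col_like[:, np.newaxis]).shape)  # (3, 4)

try:
    target + col_like  # (3, 4) with (3,): trailing sizes 4 and 3 do not match
except ValueError as exc:
    print("broadcast error:", exc)
```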
Regarding return values: I think it is better to always return a 2-D array. If the user knows the output will be of shape (1,c), and wants a 1-D array, let her slice off the 1-D array x[0] herself.
By making the return value always the same shape, it will be easier to understand code that uses this function, since it is not always immediately apparent what the shape of the inputs are.
Also, broadcasting blurs the distinction between a 1-D array of shape (c,) and a 2-D array of shape (r,c). If your function returns a 1-D array when fed 1-D input, and a 2-D array when fed 2-D input, then your function makes the distinction strict instead of blurred. Stylistically, this reminds me of checking if isinstance(obj,type), which goes against the grain of duck-typing. Don't do it if you don't have to.
unutbu's explanation is good, but I disagree on the return dimension.
The function internal pattern depends on the type of function.
Reduce operations with an axis argument can often be written so that the number of dimensions doesn't matter.
NumPy also has atleast_2d (and atleast_1d) functions, which are commonly used when you need an explicit 2d array. In statistics, I sometimes use a function like atleast_2d_cols that reshapes 1d (r,) to 2d (r,1), either for code that expects 2d or because the interpretation and linear algebra of a 1d input require a column vector. (Reshaping is cheap, so this is not a problem.)
In a third case, I might have different code paths if the lower dimensional case can be done cheaper or simpler than the higher dimensional case. (example: if 2d requires several dot products.)
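As an illustration of the first case, a reduction written against the last axis works for any number of dimensions without branching. The function name normalize_rows is hypothetical and chosen only for this sketch.

```python
import numpy as np

def normalize_rows(x):
    """Scale each vector along the last axis to unit Euclidean length."""
    x = np.asarray(x, dtype=float)
    # axis=-1 targets the last axis whatever ndim is; keepdims=True keeps
    # the reduced axis as length 1 so the division broadcasts.
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / norms

v = normalize_rows([3.0, 4.0])                # 1-D in, 1-D out
m = normalize_rows([[3.0, 4.0], [0.0, 2.0]])  # 2-D in, 2-D out
print(v)        # [0.6 0.8]
print(m.shape)  # (2, 2)
```

Here the output dimension matches the input dimension, so no reshaping is needed on the way in or out.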
return dimension
I think not following the numpy convention with the return dimension can be very confusing to users for general functions. (topic specific functions can be different.)
For example, reduce operations lose one dimension.
For many other functions the output dimension matches the input dimension. I think a 1d input should have a 1d output and not an extra redundant dimension. Except for functions in linalg, I don't remember any functions that would return a redundant extra dimension. (The scalar versus 1-element array case is not always consistent.)
Stylistically this reminds me of an isinstance check:
Try without it if you allow for example for numpy matrices and masked arrays. You will get funny results that are not easy to debug. Although, for most numpy and scipy functions the user has to know whether the array type will work with them, since there are few isinstance checks and asarray might not always do the right thing.
As a user, I always know what kind of "array_like" I have, a list, tuple or which array subclass, especially when I use multiplication.
np.array(np.eye(3).tolist()*3)
np.matrix(range(3)) * np.eye(3)
np.arange(3) * np.eye(3)
another example: What does this do?
>>> x = np.array(tuple(range(3)), [('',int)]*3)
>>> x
array((0, 1, 2),
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
>>> x * np.eye(3)
This question has already very good answers. Here I just want to add what I usually do (which somehow summarizes responses by others) when I want to write functions that accept a wide range of inputs while the operations I do on them require a 2d row or column vector.
If I know the input is always 1d (array or list):
a. if I need a row: x = np.asarray(x)[None,:]
b. if I need a column: x = np.asarray(x)[:,None]
If the input can be either 2d (array or list) with the right shape or 1d (which needs to be converted to 2d row/column):
a. if I need a row: x = np.atleast_2d(x)
b. if I need a column: x = np.atleast_2d(np.asarray(x).T).T or x = np.reshape(x, (len(x),-1)) (the latter seems faster)
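The rules above could be wrapped in two small helpers. The names as_row and as_col are hypothetical, and the sketch assumes that inputs which are already 2d have the intended shape.

```python
import numpy as np

def as_row(x):
    """Return x as 2-D; a 1-D input becomes shape (1, n)."""
    return np.atleast_2d(x)

def as_col(x):
    """Return x as 2-D; a 1-D input becomes shape (n, 1)."""
    x = np.asarray(x)
    return x.reshape(-1, 1) if x.ndim == 1 else np.atleast_2d(x)

print(as_row([1, 2, 3]).shape)         # (1, 3)
print(as_col([1, 2, 3]).shape)         # (3, 1)
print(as_col(np.zeros((3, 2))).shape)  # (3, 2): 2-D input is left alone
```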
This is a good use for decorators
def atmost_2d(func):
    def wrapr(x):
        return func(np.atleast_2d(x)).squeeze()
    return wrapr
For example, this function will pick out the last column of its input.
@atmost_2d
def g(x):
    return x[:,-1]
It works for:
1d:
In [46]: b
Out[46]: array([0, 1, 2, 3, 4, 5])
In [47]: g(b)
Out[47]: array(5)
2d:
In [49]: A
Out[49]:
array([[0, 1],
       [2, 3],
       [4, 5]])
In [50]: g(A)
Out[50]: array([1, 3, 5])
0d:
In [51]: g(99)
Out[51]: array(99)
This answer builds on the previous two.
