Why np.concatenate changes dimension

In the following program, I am trying to understand how np.concatenate works. After collecting each row of the array a in a for loop, I concatenate along the row axis and expect a 2-dimensional array of shape (5,5), but the shape changes.
I want to keep the shape (5,5) after concatenation. How can I do that?
I tried the same method with 2-dimensional arrays, storing them in a list with shapes [(2,5),(2,5),(2,5)]. Concatenating those gives shape (6,5), as expected, but the following case behaves differently.
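import numpy as np

a = np.arange(25).reshape(5,5)
ind = [0,1,2,3,4]
rows = []  # renamed from `list` to avoid shadowing the built-in
for i in ind:
    rows.append(a[i])  # each a[i] has shape (5,)
new = np.concatenate(rows, axis=0)
print(rows)
print(len(rows))
print(new)
print(new.shape)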
This gives the following results for new:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
and for new.shape:
(25,)

To preface this: you really should not be using concatenate here.
Setup
a = np.arange(25).reshape(5,5)
L = [i for i in a]
Your question asks:
Why is np.concatenate changing dimension?
It's not changing dimension; it is doing exactly what it is supposed to do, based on the input you are giving it. From the documentation:
Join a sequence of arrays along an existing axis
When you pass your list to concatenate, don't think of it as passing one (5, 5) array. Think of it as passing five arrays of shape (5,), which are joined end to end along axis 0, naturally producing an output of shape (25,).
This behavior also shows how to work around it. If passing five (5,) arrays produces a (25,) output, we just need to pass five (1, 5) arrays to produce a (5, 5) output. We can do this by adding a dimension to each element of L:
>>> np.concatenate([[i] for i in L])
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
However, the much better approach is simply to use stack, vstack, etc.
>>> np.stack(L)
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
>>> np.vstack(L)
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
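To see the three behaviors side by side, here is a minimal sketch that compares the resulting shapes. The variable names (rows, flat, stacked, lifted) are only illustrative and do not come from the question.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)
rows = [row for row in a]            # five arrays, each of shape (5,)

flat = np.concatenate(rows, axis=0)  # joins along the existing axis 0
stacked = np.stack(rows, axis=0)     # creates a new axis 0, then joins
# Give every row a leading axis of length 1 before concatenating.
lifted = np.concatenate([row[np.newaxis, :] for row in rows], axis=0)

print(flat.shape)     # (25,)
print(stacked.shape)  # (5, 5)
print(lifted.shape)   # (5, 5)
```

The difference is that concatenate never adds an axis, while stack always adds one, so stack is usually the closer fit when the inputs are 1-D rows.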

Related

How to change certain numpy array's target value to another value (if a list of target values is available)?

I have the following array:
arr = np.arange(24).reshape(4, 6)
and have a list of target values that should be changed:
target = [0, 2, 8]
Now I would like to change all target values in arr to 100.
I could perform that with:
arr[(arr == 0)|(arr == 2)|(arr == 8)] = 100
This is not practical if the list of targets is long (say, 10 elements).
Is there a better way to do this without using a for loop?
Thank you for any inputs :-)

Use np.isin
arr[np.isin(arr, target)] = 100
arr
array([[100,   1, 100,   3,   4,   5],
       [  6,   7, 100,   9,  10,  11],
       [ 12,  13,  14,  15,  16,  17],
       [ 18,  19,  20,  21,  22,  23]])
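If the original array should stay untouched, one possible variation (not part of the answer above) is to combine the same np.isin mask with np.where, which returns a new array. The names mask and replaced are illustrative.

```python
import numpy as np

arr = np.arange(24).reshape(4, 6)
target = [0, 2, 8]

mask = np.isin(arr, target)          # boolean array, True where arr holds a target value
replaced = np.where(mask, 100, arr)  # new array; arr itself is left unchanged

print(mask.sum())   # 3
print(replaced)
```

The in-place assignment from the answer is shorter. The np.where form is only worth it when the unmodified arr is still needed afterwards.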

Numpy reshape seems to output value error

I tried to use reshape
import numpy as np
d = np.arange(30).reshape(1,3)
This does not work. It fails with the error: cannot reshape array of size 30 into shape (1,3)
But when I use
d = np.arange(30).reshape(-1,3) # This works
Why do we have to use -1?
It's really confusing, and I can't seem to figure out how reshape works. I would really appreciate it if someone could help me understand it. I tried the docs and other posts on SO, but they weren't much help.
I am new to ML and Python.

Reshaping means rearranging the elements of an array into different dimensions. For example, arange(27) produces a vector of 27 elements. With .reshape(9, 3) you specify that you want a two-dimensional array whose first dimension has 9 elements and whose second dimension has 3. The result looks like:
>>> np.arange(27).reshape(9, 3)
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17],
       [18, 19, 20],
       [21, 22, 23],
       [24, 25, 26]])
But we can also make it a 3×3×3 array:
>>> np.arange(27).reshape(3, 3, 3)
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])
-1 is a placeholder that tells NumPy to derive that dimension itself.
So if you have an array containing 30 elements and you reshape it to m×3, then m is 10. By the same arithmetic, reshape(1, 3) fails because a 1×3 array holds only 3 elements, not 30. -1 is thus not a real size; it is a convenience for the programmer, for example when you do not know the number of elements but do know that it is divisible by three.
The following two are equivalent (under the assumption that m contains 30 elements):
m.reshape(10, 3)
m.reshape(-1, 3)
Note that you can specify at most one -1, since otherwise there would be multiple possible shapes and NumPy could not pick a single valid one.
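As a small sketch of the points above, the following compares the inferred and explicit forms and shows the failing call from the question. The names inferred and explicit are illustrative.

```python
import numpy as np

m = np.arange(30)

inferred = m.reshape(-1, 3)   # NumPy computes the first dimension: 30 / 3 = 10
explicit = m.reshape(10, 3)

print(inferred.shape)                      # (10, 3)
print(np.array_equal(inferred, explicit))  # True

try:
    m.reshape(1, 3)           # 1 * 3 = 3 slots cannot hold 30 elements
except ValueError as err:
    print(err)
```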

Change shape of multi-dimensional NumPy array without changing dimensions

I am currently monkey-patching a Scikit-Learn function and one of the lines requires a NumPy array with 2 dimensions. However, the data I am working with is a NumPy array with 3 dimensions, which raises the error "too many values to unpack."
I am calling the K-Means function fit to cluster the data. My problem boils down to this following line of code, assuming X is the ndarray that I pass in:
n_samples, n_features = X.shape
X is an array with 3 dimensions, like the following:
X = np.array([[[1, 2, 3],
               [4, 5, 6]],
              [[7, 8, 9],
               [10, 11, 12]],
              [[13, 14, 15],
               [16, 17, 18]]])
The data represents a group of time series of data points that have 6 dimensions. For example, the first element, [[1, 2, 3], [4, 5, 6]] would represent a time series with samples from 2 time periods, each sample with 3 dimensions.
And I have monkey-patched the k_means_ code to allow me to perform clustering on an ndarray of ndarrays. My goal is to perform k-means on 2D arrays.
Is it possible to set the shape of the 3D ndarray to 2 elements? For example, I tried converting the 3D array to a 2D array of objects but it ends up getting converted back to a 3D array.
np.array([[x.astype(object) for x in c] for c in combined])
Likewise, the following code is also converted back to a 3D array.
np.array([[np.array(x) for x in c] for c in combined])
The list comprehension [[x.astype(object) for x in c] for c in combined] looks like it creates the correct array, but because it is of type list, it no longer works in the function.
I am looking for some way to "convert" a 3D NumPy array into 2 dimensions. Any help would be greatly appreciated!
Note: I am not looking for a way to reshape the array. I need to keep all the dimensions but change the shape to ignore one of the dimensions.

To make an array of arrays we have to play some tricks, because np.array tries to build as high-dimensional an array as it can. If the subarrays vary in size, that is fine, but if they are all the same size we have to fight that behavior.
Here's one way:
start with a 3d array:
In [812]: arr = np.arange(24).reshape(2,3,4)
and an empty object array of the right size (but flattened)
In [813]: A = np.empty((6,),object)
copy values (again with flattening), and reshape to the target shape
In [814]: A[:]=list(arr.reshape(-1,4))
In [815]: A=A.reshape(2,3)
In [816]: A
Out[816]:
array([[array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8, 9, 10, 11])],
[array([12, 13, 14, 15]), array([16, 17, 18, 19]),
array([20, 21, 22, 23])]], dtype=object)
So now we have a (2,3) array whose shape can be unpacked.
I tried to start with np.empty((2,3),object), but couldn't get the A[:,:]=... assignment to work. For this object reshaping to work we have to split arr into a list of arrays. An object array is, like a list, an array of pointers.
But will the scikit functions accept such an array (after passing the shape hurdle)? I suspect the object reshaping is a short-sighted solution.
In [824]: [[x.astype(object) for x in c] for c in arr]
Out[824]:
[[array([0, 1, 2, 3], dtype=object),
array([4, 5, 6, 7], dtype=object),
array([8, 9, 10, 11], dtype=object)],
[array([12, 13, 14, 15], dtype=object),
array([16, 17, 18, 19], dtype=object),
array([20, 21, 22, 23], dtype=object)]]
In [825]: _[0][0].shape
Out[825]: (4,)
This creates a nested list of lists, with the inner elements being (4,) object arrays. Wrap that in np.array and it recreates a 3d array with dtype object.
Reshaping, which for some unknown reason you don't want to do, preserves the numeric dtype:
In [828]: arr.reshape(2,-1)
Out[828]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]])
In [829]: arr.reshape(-1,4)
Out[829]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]])
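For reference, the reshape route can be sketched with the X from the question. It assumes that treating each time series as one flat feature vector is acceptable, which the question does not confirm. The name X2d is illustrative.

```python
import numpy as np

X = np.array([[[1, 2, 3], [4, 5, 6]],
              [[7, 8, 9], [10, 11, 12]],
              [[13, 14, 15], [16, 17, 18]]])   # shape (3, 2, 3)

X2d = X.reshape(X.shape[0], -1)    # keep axis 0, merge the remaining axes
n_samples, n_features = X2d.shape  # now unpacks without error

print(n_samples, n_features)       # 3 6
print(X2d.dtype == X.dtype)        # True: the numeric dtype is preserved
```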

Choosing and iterating specific sub-arrays in multidimensional arrays in Python

This question follows on from the post "Iterating and selecting a specific array from a multidimensional array in Python".
In that post, user @Cleb solved my original problem: how to sum over the columns of a 3d array:
import numpy as np
arra = np.arange(16).reshape(2, 2, 4)
which gives
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])
and the problem was how to sum the columns within each matrix, i.e., 0 + 4, 1 + 5, ..., 8 + 12, ..., 11 + 15. It was solved by @Cleb.
Then I wondered how to do it for sums of 0 + 8, 1 + 9, ..., 4 + 12, ..., 7 + 15 (odd and even columns), which was also solved by @Cleb.
But then I wondered whether there is a general idea that can be adapted to each specific case. Imagine you want to add the first and last rows, and the two center rows, column by column and separately, i.e., 0 + 12, 1 + 13, ..., 3 + 15, 4 + 8, 5 + 9, ..., 7 + 11.
Is there a general way? Thank you.

Depending on how exactly arra is defined, you can shift your values appropriately using np.roll:
arra_mod = np.roll(arra, arra.shape[2])
arra_mod then looks as follows:
array([[[12, 13, 14, 15],
        [ 0,  1,  2,  3]],

       [[ 4,  5,  6,  7],
        [ 8,  9, 10, 11]]])
Now you can simply use the command from your previous question to get your desired output:
list(map(sum, arra_mod))  # list() is needed on Python 3, where map returns an iterator
which gives you the desired output:
[array([12, 14, 16, 18]), array([12, 14, 16, 18])]
You can also use a list comprehension
[sum(ai) for ai in arra_mod]
which gives you the same output.
If you prefer a one-liner, you can simply do:
list(map(sum, np.roll(arra, arra.shape[2])))
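As a possible generalization (an assumption on my part, not something from the answer above), the rows to be paired can be listed explicitly and added with integer-array indexing. The names rows, first, second and paired are illustrative. The sketch also shows the roll-based result computed with an axis sum instead of map.

```python
import numpy as np

arra = np.arange(16).reshape(2, 2, 4)

# Roll-based version: without an axis argument, np.roll shifts the
# flattened array and then restores the original shape.
arra_mod = np.roll(arra, arra.shape[2])
sums = arra_mod.sum(axis=1)              # add the two rows inside each block

# Index-based version: view the data as one row per original row,
# then state explicitly which rows are added together.
rows = arra.reshape(-1, arra.shape[2])   # shape (4, 4)
first = [0, 1]                           # first row, then a center row
second = [3, 2]                          # last row, then the other center row
paired = rows[first] + rows[second]

print(sums)
print(paired)
```

Both arrays hold the sums 0 + 12, ..., 3 + 15 in one row and 4 + 8, ..., 7 + 11 in the other. Changing first and second covers other pairings without having to find a suitable shift.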

non-broadcastable output operand numpy 2D cast into 3D

In NumPy,
foo = np.array([[i+10*j for i in range(10)] for j in range(3)])
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
filter = np.nonzero(foo > 100)#nothing matches
foo[:,filter]
array([], shape=(3, 2, 0), dtype=int64)
foo[:,0:0]
array([], shape=(3, 0), dtype=int64)
filter2 = np.nonzero(np.sum(foo,axis=0) < 47)
foo[:,filter2]
array([[[ 0, 1, 2, 3, 4, 5]],
[[10, 11, 12, 13, 14, 15]],
[[20, 21, 22, 23, 24, 25]]])
foo[:,filter2].shape
(3, 1, 6)
I have a 'filter' condition where I want to perform an operation on all rows for all matching columns, but if filter is an empty array, somehow my foo[:,filter] gets broadcast into a 3D array. Another example is with filter2 -> again, foo[:,filter2] gives me a 3D array when I am expecting the result of foo[:,(np.sum(foo,axis=0) < 47)]
Can someone explain what the proper use case of np.nonzero is compared to using booleans to find the correct columns/indices?

First, foo[filter] == foo[filter.nonzero()] when filter is a Boolean array.
To understand why you're getting unexpected results, you have to understand a little about how indexing works. To do multidimensional indexing you can either put indices in [], separated by commas, or use a tuple. So foo[1, 2, 3] is the same as foo[(1, 2, 3)]. With this in mind, look at what happens when you do foo[:, something]. I believe in your example you were trying to get foo[:, something[0], something[1]], but instead you got foo[(slice(None), (something[0], something[1]))].
This is all somewhat academic, because if you're only using filter for indexing you probably don't need nonzero at all; just use the boolean array as the index. If you do need it, you can do something like:
foo[:, filter[0]]   # here filter is the tuple returned by np.nonzero
# OR
index = (slice(None),) + filter.nonzero()   # here filter is a boolean array
foo[index]
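The difference can be checked with the foo from the question. In this sketch the names mask, cols and empty are illustrative, and the empty case uses a per-column condition (foo.max(axis=0) > 100) rather than the question's element-wise foo > 100, so that it is a valid column mask.

```python
import numpy as np

foo = np.array([[i + 10 * j for i in range(10)] for j in range(3)])

mask = foo.sum(axis=0) < 47       # boolean array of shape (10,)
cols = np.nonzero(mask)           # tuple holding one index array

print(foo[:, mask].shape)         # (3, 6)
print(foo[:, cols[0]].shape)      # (3, 6)
print(foo[:, cols].shape)         # (3, 1, 6): the whole tuple acts as a 2-D index

empty = foo.max(axis=0) > 100     # no column matches
print(foo[:, empty].shape)        # (3, 0)
```

Indexing with the boolean mask keeps the result two-dimensional even when nothing matches, which is the behavior the question expected.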
