Concatenating/Appending Multiple Vertical Arrays of Different Sizes - python

I have a function that returns a numpy array. I loop this function over different data files, but each loop gives out a different sized array (which is the desired output), and I cannot figure out how to properly append these arrays. Example arrays and the method I use to arrange them after I grab the data from the file are shown below:
a1 = np.array([1,2,3])
a2 = np.vstack(a1)
# array([[1],
#        [2],
#        [3]])
b1 = np.array([4,5,6,7])
b2 = np.vstack(b1)
# array([[4],
#        [5],
#        [6],
#        [7]])
Simply put, I have these two arrays, one with 3 elements and one with 4. I want to arrange them vertically, side by side, to look something like this for export:
1 4
2 5
3 6
7
I do not want zeros or Na to fill the gaps in the data as that would make more work.
This also needs to work for vertical arrays with a column width of 2, so that the output data is organized like this:
1 2 5 6 10 11
2 3 6 7 11 12
3 4 7 8 12 13
8 9
So the first loop would produce the vertical (3,2) array, the second iteration of the loop would produce the (4,2) array, and I would want to append or concatenate the (4,2) array to the original (3,2) array, and so on. These arrays will always be 2 columns wide, but the lengths will change from set to set.
I have tried using the basic np.column_stack, np.concatenate, and np.append functions but they haven't worked. These can be lists instead of numpy arrays if that works better, or even organizing the output data in a dataframe would be fine.
======= Update =======
To be more specific and after trying some of the solutions provided here are some more details on my issue.
My function gets data from a data file (works fine) and returns 2 lists or arrays (whichever) of values with the same dimensions (no issue here either).
Now I am looping over all of the files in a directory and I want to append/concatenate these two lists (or arrays) from each file together, but they could be different sizes. The trouble arises when I try to put them together vertically to yield columns of output data. I also need to do a simple mathematical operation on the values within the loop, so I think they might need to be numpy arrays (or something similar) rather than lists.
Loop #1 returns:
outdata1 = [0.0012, 0.0013, 0.00124, 0.00127]
outdata2 = [0.0016, 0.0014, 0.00134, 0.0013]
Loop #2 returns:
outdata1 = [0.00155, 0.00174, 0.0018]
outdata2 = [0.0019, 0.0020, 0.0021]
and so on...
Now I need to do math on these and write them out as vertically organized column data without cutting off any data. Padding with NaN, or using a data frame, would work, and I could correct those spaces before export. I would like it to look like this:
0.0012 0.0016 0.00155 0.0019
0.0013 0.0014 0.00174 0.0020
0.00124 0.00134 0.0018 0.0021
0.00127 0.0013

First, vstack on an array treats the array as a list along its first dimension. It then makes each 'row/element' into a 2d array and concatenates them.
These all do the same thing:
In [94]: np.vstack(np.array([1,2,3]))
Out[94]:
array([[1],
[2],
[3]])
In [95]: np.vstack([[1],[2],[3]])
Out[95]:
array([[1],
[2],
[3]])
In [96]: np.concatenate(([[1]],[[2]],[[3]]), axis=0)
Out[96]:
array([[1],
[2],
[3]])
Matching arrays or lists can be column_stack'ed - the arrays are turned into (n,1) arrays, and then joined on the 2nd dimension:
In [97]: np.column_stack(([1,2,3], [4,5,6]))
Out[97]:
array([[1, 4],
[2, 5],
[3, 6]])
But the ragged arrays don't work.
An array of lists/arrays of differing sizes has object dtype and is, for many purposes, like a list of lists:
In [98]: np.array(([1,2,3],[4,5,6,7]))
Out[98]: array([list([1, 2, 3]), list([4, 5, 6, 7])], dtype=object)
Your last structure could be written as a ragged list of lists:
In [100]: [[1,2,5,6,10,11],[2,3,6,7,11,12],[3,4,7,8,12,13],[8,9]]
Out[100]: [[1, 2, 5, 6, 10, 11], [2, 3, 6, 7, 11, 12], [3, 4, 7, 8, 12, 13], [8, 9]]
In [101]: np.array(_)
Out[101]:
array([list([1, 2, 5, 6, 10, 11]), list([2, 3, 6, 7, 11, 12]),
list([3, 4, 7, 8, 12, 13]), list([8, 9])], dtype=object)
Notice though that this doesn't line up the [8,9] with the others. You need some sort of filler/spacer. The zip_longest function from Python's itertools provides that:
In [102]: from itertools import zip_longest
In [103]: alist = [[1,2,3],[2,3,4],[5,6,7,8],[11,12,13]]
In [104]: list(zip_longest(*alist))
Out[104]: [(1, 2, 5, 11), (2, 3, 6, 12), (3, 4, 7, 13), (None, None, 8, None)]
With this padding we can make a 2d array (object dtype because of the None):
In [105]: np.array(_)
Out[105]:
array([[1, 2, 5, 11],
[2, 3, 6, 12],
[3, 4, 7, 13],
[None, None, 8, None]], dtype=object)
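If you need a numeric result that math and file export can still handle, zip_longest's fillvalue lets you pad with np.nan instead of None. A rough sketch, using the numbers from your update (do any per-file math before padding, since nan propagates):
import numpy as np
from itertools import zip_longest
# the unequal-length columns from the updated question
cols = [[0.0012, 0.0013, 0.00124, 0.00127],
        [0.0016, 0.0014, 0.00134, 0.0013],
        [0.00155, 0.00174, 0.0018],
        [0.0019, 0.0020, 0.0021]]
padded = np.array(list(zip_longest(*cols, fillvalue=np.nan)))   # (4,4) float array, nan-padded
The rows of padded match the desired layout, with nan marking the missing values in the shorter columns.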
===
I can generate the numbers in your last display with a little function:
In [232]: def foo(i,n):
     ...:     return np.column_stack((np.arange(i,i+n), np.arange(i+1,i+1+n)))
     ...:
In [233]: foo(1,3)
Out[233]:
array([[1, 2],
[2, 3],
[3, 4]])
In [234]: foo(5,4)
Out[234]:
array([[5, 6],
[6, 7],
[7, 8],
[8, 9]])
In [235]: foo(10,3)
Out[235]:
array([[10, 11],
[11, 12],
[12, 13]])
I can put all those arrays in a list:
In [236]: [Out[233], Out[234], Out[235]]
Out[236]:
[array([[1, 2],
[2, 3],
[3, 4]]), array([[5, 6],
[6, 7],
[7, 8],
[8, 9]]), array([[10, 11],
[11, 12],
[12, 13]])]
I can turn that list into an object dtype array:
In [237]: np.array([Out[233], Out[234], Out[235]])
Out[237]:
array([array([[1, 2],
[2, 3],
[3, 4]]),
array([[5, 6],
[6, 7],
[7, 8],
[8, 9]]),
array([[10, 11],
[11, 12],
[12, 13]])], dtype=object)
I could also display several rows of these arrays with:
In [238]: for i in range(3):
     ...:     print(np.hstack([a[i,:] for a in Out[236]]))
     ...:
[ 1 2 5 6 10 11]
[ 2 3 6 7 11 12]
[ 3 4 7 8 12 13]
but to show the 4th row, which only exists for the middle array, I'd have to add more code to test whether we're off the end, and whether to add padding etc. I'll leave that exercise up to you, if it really matters. :)
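For what it's worth, here is one rough sketch of that padding, zipping the rows of the three arrays with a nan filler (reusing the foo helper and arrays from above):
from itertools import zip_longest
arrays = [foo(1,3), foo(5,4), foo(10,3)]    # the (3,2), (4,2), (3,2) arrays from above
blank = np.full(2, np.nan)                  # filler row once an array has run out
for parts in zip_longest(*arrays, fillvalue=blank):
    print(np.hstack(parts))
The first three rows print as before, and the 4th row comes out as [nan nan 8. 9. nan nan].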

Since you mentioned that lists are ok, why not use a list of such "vertical arrays"?:
my_list = []
while not_done_yet:
    two_col_array = your_func(some_param)   # your_func returns an (x,2) array
    my_list.append(two_col_array)
my_list would now be a list of arrays of shape (x,2), where x could be different for different arrays in the list.
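Since you also said a dataframe would be fine, the list can be turned into side-by-side columns at export time. A sketch, assuming pandas is acceptable and nan padding of the shorter pieces is fine (the filename is just illustrative):
import pandas as pd
# my_list holds (x,2) arrays with differing x; concat lines them up side by side
# and pads the shorter ones with NaN
frames = [pd.DataFrame(a) for a in my_list]
combined = pd.concat(frames, axis=1, ignore_index=True)
combined.to_csv('combined.csv', index=False)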

Related

Accessing columns of matlab matrix and its realization in numpy

I am trying to find a numpy equivalent of a Matlab feature for accessing elements of arrays.
Suppose given a (2,2,2) Matlab matrix m in the form
m(:,:,1) = [1,2;3,4]
m(:,:,2) = [5,6;7,8]
Even though this is a 3-d array, Matlab allows accessing its columns like this:
m(:,1) = [1;3]
m(:,2) = [2;4]
m(:,3) = [5;7]
m(:,4) = [6;8]
I am curious to know whether numpy supports such indexing, so that given the following array
m = array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
one can also access columns in the same fashion as the Matlab example listed above.
My answer to this question is as follows. Suppose we are given the array listed in the question
m = array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
One can create a list, which I call m_list, such that
m_list = [m[i][:,j] for i in range(m.shape[0]) for j in range(m.shape[-1])]
This will output m_list as
m_list = [array([1, 3]), array([2, 4]), array([5, 7]), array([6, 8])]
Now we can access elements of m_list in exactly the same fashion as the Matlab example listed in the question.
In [41]: m = np.arange(1,9).reshape(2,2,2)
In [42]: m
Out[42]:
array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
Indexing the equivalent blocks:
In [47]: m[0,:,0]
Out[47]: array([1, 3])
In [48]: m[0,:,1]
Out[48]: array([2, 4])
In [49]: m[1,:,0]
Out[49]: array([5, 7])
In [50]: m[1,:,1]
Out[50]: array([6, 8])
We can reshape, to "flatten" one pair of dimensions:
In [84]: m = np.arange(1,9).reshape(2,2,2)
In [85]: m.reshape(2,4)
Out[85]:
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
In [87]: m.reshape(2,4)[:,2]
Out[87]: array([3, 7])
and throw in a transpose:
In [90]: m.transpose(1,0,2).reshape(2,4)
Out[90]:
array([[1, 2, 5, 6],
[3, 4, 7, 8]])
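Putting those together, the columns of that transposed/reshaped view reproduce the MATLAB m(:,k) columns; a quick sketch continuing with m from above:
m2 = m.transpose(1,0,2).reshape(2,4)     # rows stay rows; pages and their columns merge
[m2[:,k] for k in range(4)]
# [array([1, 3]), array([2, 4]), array([5, 7]), array([6, 8])]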
MATLAB originally was strictly 2d. Then (around MATLAB 5, in the late 1990s) they allowed more dimensions, but in a kludgy way. Trailing size-1 dimensions are squeezed out; in another recent SO question I noticed that reshaping to (2,2,1,1) left the result as (2,2). MATLAB also lets you index the trailing dimensions as though they were collapsed into one.
I suspect the m(:,3) behavior is a consequence of that as well.
Testing with a 4d MATLAB array:
>> m=reshape(1:36,2,3,3,2);
>> m(:,:,1)
ans =
1 3 5
2 4 6
>> reshape(m,2,3,6)(:,:,1)
ans =
1 3 5
2 4 6
>> m(:,17)
ans =
33
34
>> reshape(m,2,18)(:,17)
ans =
33
34
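For comparison, the same trailing-dimension access can be mimicked on the numpy side with Fortran-order reshapes (a sketch; m4 is just an illustrative name):
m4 = np.arange(1, 37).reshape((2, 3, 3, 2), order='F')   # numpy twin of the MATLAB array
m4.reshape((2, 18), order='F')[:, 16]                    # MATLAB's m(:,17), 0-based here
# array([33, 34])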

Why is the print result of 3d arrays different from the mental visualisation of the same in python?

I am learning machine learning and I don't have much coding experience. While trying to understand 3d arrays, I was instructed to visualise a 2x4x3 array as shown in an image (not reproduced here).
But when I create a random array with the same shape using:
X = np.random.randint(10, size=(2, 4, 3))
print(X)
the output is
[[[6 1 0]
[6 6 5]
[2 7 0]
[5 4 3]]
[[7 8 2]
[9 1 2]
[2 0 1]
[8 0 9]]]
This looks like 4x3x2 to me.
Am I wrong in understanding 2x4x3 as the image given above? Why is python printing 3d arrays like this? And finally if my mental visualisation is correct, how are the generated random values arranged in the image?
MATLAB/Octave does display this 3d array as 3 blocks of (2,4) matrices
>> reshape(1:24,2, 4, 3)
ans =
ans(:,:,1) =
1 3 5 7
2 4 6 8
ans(:,:,2) =
9 11 13 15
10 12 14 16
ans(:,:,3) =
17 19 21 23
18 20 22 24
But here the trailing dimension is the outermost. This is called column-major, or Fortran, convention. Notice how the values increase going down the columns.
But in numpy the leading dimension is the outermost, and values increase across the rows. This is row-major, or C, ordering:
In [22]: np.arange(1,25).reshape(2,4,3)
Out[22]:
array([[[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]],
[[13, 14, 15],
[16, 17, 18],
[19, 20, 21],
[22, 23, 24]]])
This dimension ordering matches the nesting in the list equivalent:
In [24]: np.arange(1,25).reshape(2,4,3).tolist()
Out[24]:
[[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
[[13, 14, 15], [16, 17, 18], [19, 20, 21], [22, 23, 24]]]
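If you want numpy to lay the same numbers out the MATLAB way, you can ask for Fortran (column-major) order explicitly; a small sketch:
mF = np.arange(1, 25).reshape((2, 4, 3), order='F')   # column-major layout
mF[:, :, 0]
# array([[1, 3, 5, 7],
#        [2, 4, 6, 8]])
which matches the ans(:,:,1) block Octave printed above.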
The meaning of the dimensions comes from the application and the user, not from Python/numpy. Images are often (height, width, channels). Computationally it may be convenient to keep the 3 (or 4) channel values for one pixel together, that is, to make that dimension the last one. So your (2,4,3) could be thought of as a (2,4) image with 3 colors (rgb). The normal numpy print isn't the best for visualizing that.
But if the image is (400, 600, 3) shape, we don't want a 'print' of the array. We want a plot or image display, a picture, that renders that last dimension as colors.
It is 2×4×3. The first dimension is the outermost one. We see that the outer list (note the outer square brackets) has two elements:
[[[6, 1, 0],
[6, 6, 5],
[2, 7, 0],
[5, 4, 3]],
[[7, 8, 2],
[9, 1, 2],
[2, 0, 1],
[8, 0, 9]]]
Each of these items has the same dimensions: a 4×3 matrix. Indeed, if we take a look at the first item of the list, we have:
[[6, 1, 0],
[6, 6, 5],
[2, 7, 0],
[5, 4, 3]]
Here there are four rows, and if we take a look at the first row, for example, we see a collection with three elements:
[6, 1, 0]
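The same nesting shows up if you index one level at a time; a quick sketch with the random X from the question:
X[0].shape      # (4, 3) - one of the two outer blocks
X[0, 0].shape   # (3,)   - one row of that block
X[0, 0, 0]      # a single number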

Using numpy.vectorize() to rotate all elements of a NumPy array

I am in the beginning phases of learning NumPy. I have a Numpy array of 3x3 matrices. I would like to create a new array where each of those matrices is rotated 90 degrees. I've studied this answer but I still can't figure out what I am doing wrong.
import numpy as np
# 3x3
m = np.array([[1,2,3], [4,5,6], [7,8,9]])
# array of 3x3
a = np.array([m,m,m,m])
# rotate a single matrix counter-clockwise
def rotate90(x):
    return np.rot90(x)
# function that can be called on all elements of an np.array
# Note: I've tried different values for otypes= without success
f = np.vectorize(rotate90)
result = f(a)
# ValueError: Axes=(0, 1) out of range for array of ndim=0.
# The error occurs in NumPy's rot90() function.
Note: I realize I could do the following but I'd like to understand the vectorized option.
t = np.array([ np.rot90(x, k=-1) for x in a])
No need to do the rotations individually: numpy has a builtin numpy.rot90(m, k=1, axes=(0, 1)) function. By default the matrix is thus rotated over the first and second dimensions.
If you want to rotate one level deeper, you simply have to set the axes over which rotation happens, one level deeper (and optionally swap them if you want to rotate in a different direction). Or as the documentation specifies:
axes: (2,) array_like
The array is rotated in the plane defined by the
axes. Axes must be different.
So we rotate in the y-z plane (if we label the dimensions x, y and z), and thus we specify either (2,1) or (1,2).
All you have to do is set the axes correctly, when you want to rotate to the right/left:
np.rot90(a,axes=(2,1)) # right
np.rot90(a,axes=(1,2)) # left
This will rotate all matrices, like:
>>> np.rot90(a,axes=(2,1))
array([[[7, 4, 1],
[8, 5, 2],
[9, 6, 3]],
[[7, 4, 1],
[8, 5, 2],
[9, 6, 3]],
[[7, 4, 1],
[8, 5, 2],
[9, 6, 3]],
[[7, 4, 1],
[8, 5, 2],
[9, 6, 3]]])
Or if you want to rotate to the left:
>>> np.rot90(a,axes=(1,2))
array([[[3, 6, 9],
[2, 5, 8],
[1, 4, 7]],
[[3, 6, 9],
[2, 5, 8],
[1, 4, 7]],
[[3, 6, 9],
[2, 5, 8],
[1, 4, 7]],
[[3, 6, 9],
[2, 5, 8],
[1, 4, 7]]])
Note that the axes parameter is only available from numpy 1.12 on.
Normally np.vectorize is used to apply a scalar (Python, non-numpy) function to all elements of an array, or set of arrays. There's a note that's often overlooked:
The vectorize function is provided primarily for convenience, not for
performance. The implementation is essentially a for loop.
In [278]: m = np.array([[1,2,3],[4,5,6]])
In [279]: np.vectorize(lambda x:2*x)(m)
Out[279]:
array([[ 2, 4, 6],
[ 8, 10, 12]])
This multiplies each element of m by 2, taking care of the looping paperwork for us.
Better yet, when given several arrays, it broadcasts (a generalization of 'outer product').
In [280]: np.vectorize(lambda x,y:2*x+y)(np.arange(3), np.arange(2)[:,None])
Out[280]:
array([[0, 2, 4],
[1, 3, 5]])
This feeds (x,y) scalar tuples to the lambda for all combinations of a (3,) array broadcasted against a (2,1) array, resulting in a (2,3) array. It can be viewed as a broadcasted extension of map.
The problem with np.vectorize(np.rot90) is that rot90 takes a 2d array, but vectorize will feed it scalars.
However, I see in the docs that in v1.12 they've added a signature parameter. This is the first time I've used it.
Your problem - apply np.rot90 to 2d elements of a 3d array:
In [266]: m = np.array([[1,2,3],[4,5,6]])
In [267]: a = np.stack([m,m])
In [268]: a
Out[268]:
array([[[1, 2, 3],
[4, 5, 6]],
[[1, 2, 3],
[4, 5, 6]]])
While you could describe this a as an array of 2d arrays, it's better to think of it as a 3d array of integers. That's how np.vectorize(myfun)(a) sees it, feeding myfun each number individually.
Applied to a 2d m:
In [269]: np.rot90(m)
Out[269]:
array([[3, 6],
[2, 5],
[1, 4]])
With the Python workhorse, the list comprehension:
In [270]: [np.rot90(i) for i in a]
Out[270]:
[array([[3, 6],
[2, 5],
[1, 4]]), array([[3, 6],
[2, 5],
[1, 4]])]
The result is a list, but we could wrap that in np.array.
Python map does the same thing.
In [271]: list(map(np.rot90, a))
Out[271]:
[array([[3, 6],
[2, 5],
[1, 4]]), array([[3, 6],
[2, 5],
[1, 4]])]
The comprehension and map both iterate on the 1st dimension of a, acting on the resulting 2d elements.
vectorize with signature:
In [272]: f = np.vectorize(np.rot90, signature='(n,m)->(k,l)')
In [273]: f(a)
Out[273]:
array([[[3, 6],
[2, 5],
[1, 4]],
[[3, 6],
[2, 5],
[1, 4]]])
The signature tells it to pass a 2d array and expect back a 2d array. (I should explore how signature plays with the otypes parameter.)
Some quick time comparisons:
In [287]: timeit np.array([np.rot90(i) for i in a])
10000 loops, best of 3: 40 µs per loop
In [288]: timeit np.array(list(map(np.rot90, a)))
10000 loops, best of 3: 41.1 µs per loop
In [289]: timeit np.vectorize(np.rot90, signature='(n,m)->(k,l)')(a)
1000 loops, best of 3: 234 µs per loop
In [290]: %%timeit f=np.vectorize(np.rot90, signature='(n,m)->(k,l)')
...: f(a)
...:
1000 loops, best of 3: 196 µs per loop
So for a small array, the Python list methods are faster, by quite a bit. Sometimes numpy approaches do better with larger arrays, though I doubt that would be the case here.
rot90 with the axes parameter is even better, and will do well with larger arrays:
In [292]: timeit np.rot90(a,axes=(1,2))
100000 loops, best of 3: 15.7 µs per loop
Looking at the np.rot90 code, I see that it is just doing np.flip (reverse) and np.transpose, in various combinations depending on the k. In effect for this case it is doing:
In [295]: a.transpose(0,2,1)[:,::-1,:]
Out[295]:
array([[[3, 6],
[2, 5],
[1, 4]],
[[3, 6],
[2, 5],
[1, 4]]])
(this is even faster than rot90.)
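A quick sanity check (continuing with the a above) that the transpose/flip view matches the axes version:
np.array_equal(np.rot90(a, axes=(1, 2)), a.transpose(0, 2, 1)[:, ::-1, :])
# True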
I suspect vectorize with the signature is doing something like:
In [301]: b = np.zeros(2,dtype=object)
In [302]: b[...] = [m,m]
In [303]: f = np.frompyfunc(np.rot90, 1,1)
In [304]: f(b)
Out[304]:
array([array([[3, 6],
[2, 5],
[1, 4]]),
array([[3, 6],
[2, 5],
[1, 4]])], dtype=object)
np.stack(f(b)) will convert the object array into a 3d array like the other code.
frompyfunc is the underlying function for vectorize, and returns an array of objects. Here I create an array like your a except it is 1d, containing multiple m arrays. It is an array of arrays, as opposed to a 3d array.
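For completeness, stacking that object-array result gives back the regular 3d array; a short sketch:
np.stack(f(b))
# array([[[3, 6],
#         [2, 5],
#         [1, 4]],
#        [[3, 6],
#         [2, 5],
#         [1, 4]]])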

Efficiently change order of numpy array

I have a 3-dimensional numpy array. The dimensions can go up to 128 x 64 x 8192. What I want to do is change the order along the first dimension by interchanging pairs.
The only idea I had so far is to create a list of the indices in the correct order.
order = [1,0,3,2...127,126]
data_new = data[order]
I fear that this is not very efficient, but I have no better idea so far.
You could reshape to split the first axis into two axes, such that the latter of those axes has length 2, then flip the array along that axis with [::-1], and finally reshape back to the original shape.
Thus, we would have an implementation like so -
a.reshape(-1,2,*a.shape[1:])[:,::-1].reshape(a.shape)
Sample run -
In [170]: a = np.random.randint(0,9,(6,3))
In [171]: order = [1,0,3,2,5,4]
In [172]: a[order]
Out[172]:
array([[0, 8, 5],
[4, 5, 6],
[0, 0, 2],
[7, 3, 8],
[1, 6, 3],
[2, 4, 4]])
In [173]: a.reshape(-1,2,*a.shape[1:])[:,::-1].reshape(a.shape)
Out[173]:
array([[0, 8, 5],
[4, 5, 6],
[0, 0, 2],
[7, 3, 8],
[1, 6, 3],
[2, 4, 4]])
Alternatively, if you are looking to efficiently create that pairwise-flipped order of indices, we could do something like this -
order = np.arange(data.shape[0]).reshape(-1,2)[:,::-1].ravel()
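A quick way to convince yourself that the reshape/flip view and the explicit index version agree - a sketch on a small stand-in array (timing at the real size is left to you):
data = np.random.rand(8, 3, 4)       # small stand-in for the (128, 64, 8192) array
order = np.arange(data.shape[0]).reshape(-1, 2)[:, ::-1].ravel()   # [1, 0, 3, 2, 5, 4, 7, 6]
swapped = data.reshape(-1, 2, *data.shape[1:])[:, ::-1].reshape(data.shape)
np.array_equal(swapped, data[order])   # True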

Sort matrix based on its diagonal entries

First of all I would like to point out that my question is different than this one: Sort a numpy matrix based on its diagonal
The question is as follow:
Suppose I have a numpy matrix
A=
5 7 8
7 2 9
8 9 3
I would like to sort the matrix based on its diagonal and then re-arrange the matrix elements accordingly, such that now
sorted_A:
2 9 7
9 3 8
7 8 5
Note that:
(1). The diagonal is sorted
(2). The other (non-diagonal) elements are re-adjusted by it. How?
Because diag(A) = [5,2,3] and diag(sorted_A) = [2,3,5],
the row/column indices [0,1,2] of A become [1,2,0] in sorted_A.
So far I use brute force where I extract the diagonal elements, get the indices O(N²) and then re-arrange the matrix (another O(N²)). I wonder if there is any efficient/elegant way to do this. I appreciate all the help I can get.
Sorting the rows based on the diagonal values is easy:
In [192]: A=np.array([[5,7,8],[7,2,9],[8,9,3]])
In [193]: A
Out[193]:
array([[5, 7, 8],
[7, 2, 9],
[8, 9, 3]])
In [194]: np.diag(A)
Out[194]: array([5, 2, 3])
In [195]: idx=np.argsort(np.diag(A))
In [196]: idx
Out[196]: array([1, 2, 0], dtype=int32)
In [197]: A[idx,:]
Out[197]:
array([[7, 2, 9],
[8, 9, 3],
[5, 7, 8]])
Rearranging the elements in each row so that the original diagonal values end up back on the diagonal will take some experimenting - trial and error. We probably have to 'roll' each row by some amount related to the sorting idx. I don't recall whether there is a function to roll each row separately or if we have to iterate over the rows to do that.
In [218]: A1=A[idx,:]
In [219]: [np.roll(a,-i) for a,i in zip(A1,[1,1,1])]
Out[219]: [array([2, 9, 7]), array([9, 3, 8]), array([7, 8, 5])]
In [220]: np.array([np.roll(a,-i) for a,i in zip(A1,[1,1,1])])
Out[220]:
array([[2, 9, 7],
[9, 3, 8],
[7, 8, 5]])
So roll with [1,1,1] does the job. But offhand I don't see how that can be derived. I suspect we need to generate several more test cases, possibly larger ones, and look for a pattern.
That roll probably has something to do with how much the row has moved, the difference between the original position and the new one. Let's try:
np.arange(3)-idx
In [222]: np.array([np.roll(a,i) for a,i in zip(A1,np.arange(3)-idx)])
Out[222]:
array([[2, 9, 7],
[9, 3, 8],
[7, 8, 5]])
Applying the sorting idx to both rows and columns seems to do the trick as well:
In [227]: A[idx,:][:,idx]
Out[227]:
array([[2, 9, 7],
[9, 3, 8],
[7, 8, 5]])
In [229]: A[idx[:,None],idx]
Out[229]:
array([[2, 9, 7],
[9, 3, 8],
[7, 8, 5]])
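Wrapped up as a small helper (a sketch; sort_by_diag is just an illustrative name), using np.ix_ to apply the same permutation to rows and columns:
def sort_by_diag(A):
    idx = np.argsort(np.diag(A))     # permutation that sorts the diagonal
    return A[np.ix_(idx, idx)]       # apply it to both rows and columns

sort_by_diag(A)
# array([[2, 9, 7],
#        [9, 3, 8],
#        [7, 8, 5]])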
Here I simplify a straightforward solution that has been stated before but is hard to get your head around.
This is useful if you want to sort a table (e.g. a confusion matrix) by its diagonal magnitude and arrange the rows and columns accordingly.
>>> A=np.array([[5,1,4],[7,2,9],[8,0,3]])
>>> A
array([[5, 1, 4],
[7, 2, 9],
[8, 0, 3]])
>>> diag = np.diag(A)
>>> diag
array([5, 2, 3])
>>> idx=np.argsort(diag) # get the sorting order of the diagonal items
>>> A[idx,:][:,idx] # reorder rows and columns based on that order
array([[2, 9, 7],
[0, 3, 8],
[1, 4, 5]])
If you want to sort in descending order, just add idx = idx[::-1] (reverse the order) after the argsort.
