block mean of 2D numpy array (in both dimensions) - python

This question is related to Block mean of numpy 2D array (in fact the title is almost the same!) except that my case is a generalization. I want to divide a 2D array into sub-blocks in both directions and take the mean over the blocks. (The linked example only divides the array in one dimension.)
Thus if my array is this:
import numpy as np
a=np.arange(16).reshape((4,4))
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
If my sub-blocks have a size 2x2, then my desired answer is
array([[ 2.5,  4.5],
       [10.5, 12.5]])
The only way I could think of doing this was to carefully reshape on one dimension at a time:
np.mean(np.mean(a.reshape((2,2,-1)),axis=1).reshape((-1,2,2)),axis=2)
This gives the correct solution, but it is a bit of a convoluted mess, and I was wondering if there is a cleaner, easier way to do the same thing, maybe some numpy blocking function that I am unaware of?

You can do:
# sample data
a=np.arange(24).reshape((6,4))
rows, cols = a.shape
a.reshape(rows//2, 2, cols//2, 2).mean(axis=(1,-1))
Output:
array([[ 2.5,  4.5],
       [10.5, 12.5],
       [18.5, 20.5]])

Related

How to square a row in NumPy to go from a 2-d array to a 3-d one where each row was squared?

I am trying to figure out a way to get the rows of a 2-d matrix squared.
The behaviour I would like to have is something like this:
in[1]  import numpy as np
in[2]  a = np.array([[1, 2, 3],
                     [4, 5, 6]])
in[3]  some_function(a)  # for each row: s = row.reshape(-1, 1); s @ s.T
out[1] array([[[ 1,  2,  3],
               [ 2,  4,  6],
               [ 3,  6,  9]],

              [[16, 20, 24],
               [20, 25, 30],
               [24, 30, 36]]])
I need this to make a softmax derivative for auto diff in a manual implementation of a feed-forward neural network.
The same derivative would look like this for a point:
in[4]  def softmax_derivative(x):
           s = x.reshape(-1, 1)
           return np.diagflat(s) - np.dot(s, s.T)
Instead of np.diagflat I am using:
in[7]  matrix = np.array([[1, 2, 3],
                          [4, 5, 6]])
in[8]  matrix.shape
out[2] (2, 3)
in[9]  Id = np.eye(matrix.shape[-1])
in[10] (matrix[..., np.newaxis] * Id).shape
out[3] (2, 3, 3)
The reason I want a 3-d array of the squared rows is to subtract it from the 3-d array of the diagonal rows which I get in the same way as in the above example.
While I know that I can get the same multiplication result from
in[11] def get_squared_rows(matrix):
           s = matrix.reshape(-1, 1)
           return s @ s.T
I do not know how to get it to the correct shape in a fast way. Yes, the correct 2-d arrays do appear as blocks on the diagonal of that product, so I would have to extract them and assemble them into a 3-d array of shape (n_samples, row, row) to match the diagonal 3-d matrix from above. I do not know how to do that any faster than a simple loop over all rows of the input matrix.
Use broadcasting:
>>> a[:, None, :] * a[:, :, None]
array([[[ 1,  2,  3],
        [ 2,  4,  6],
        [ 3,  6,  9]],

       [[16, 20, 24],
        [20, 25, 30],
        [24, 30, 36]]])
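
Tying this back to the softmax derivative: the per-row np.diagflat term can be built with np.eye just as the question already does, so the batched Jacobian could be assembled as, roughly (a sketch, assuming each row of a holds one softmax output):
Id = np.eye(a.shape[-1])
jac = a[..., np.newaxis] * Id - a[:, None, :] * a[:, :, None]  # one (n, n) Jacobian per row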

Efficient ways to aggregate and replicate values in a numpy matrix

In my work I often need to aggregate and expand matrices of various quantities, and I am looking for the most efficient ways to do these actions. E.g. I'll have an NxN matrix that I want to aggregate into a PxP matrix where P < N. This is done using a correspondence between the larger dimensions and the smaller dimensions. Usually, P will be around 100 or so.
For example, I'll have a hypothetical 4x4 matrix like this (though in practice, my matrices will be much larger, around 1000x1000)
m = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
>>> m
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])
and a correspondence like this (schematically):
0 -> 0
1 -> 1
2 -> 0
3 -> 1
that I usually store in a dictionary. This means that indices 0 and 2 (for rows and columns) both get allocated to new index 0 and indices 1 and 3 (for rows and columns) both get allocated to new index 1. The matrix could be anything at all, but the correspondence is always many-to-one when I want to compress.
If the input matrix is A and the output matrix is B, then cell B[0, 0] would be the sum of A[0, 0] + A[0, 2] + A[2, 0] + A[2, 2] because new index 0 is made up of original indices 0 and 2.
The aggregation process here would lead to:
array([[ 1+3+9+11,  2+4+10+12],
       [ 5+7+13+15, 6+8+14+16]])

= array([[24, 28],
         [40, 44]])
I can do this by making an empty matrix of the right size and looping over all 4x4=16 cells of the initial matrix and accumulating in nested loops, but this seems to be inefficient and the vectorised nature of numpy is always emphasised by people. I have also done it by using np.ix_ to make sets of indices and use m[row_indices, col_indices].sum(), but I am wondering what the most efficient numpy-like way to do it is.
Conversely, what is the sensible and efficient way to expand a matrix using the correspondence the other way? For example with the same correspondence but in reverse I would go from:
array([[1, 2],
       [3, 4]])
to
array([[1, 2, 1, 2],
       [3, 4, 3, 4],
       [1, 2, 1, 2],
       [3, 4, 3, 4]])
where the values simply get replicated into the new cells.
In my attempts so far for the aggregation, I have used approaches with pandas methods with groupby on index and columns and then extracting the final matrix with, e.g. df.values. However, I don't know the equivalent way to expand a matrix, without using a lot of things like unstack and join and so on. And I see people often say that using pandas is not time-efficient.
Edit 1: I was asked in a comment about exactly how the aggregation should be done. This is how it would be done if I were using nested loops and a dictionary lookup between the original dimensions and the new dimensions:
>>> m = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
>>> mnew = np.zeros((2, 2))
>>> big2small = {0: 0, 1: 1, 2: 0, 3: 1}
>>> for i in range(4):
...     inew = big2small[i]
...     for j in range(4):
...         jnew = big2small[j]
...         mnew[inew, jnew] += m[i, j]
...
>>> mnew
array([[24., 28.],
       [40., 44.]])
Edit 2: Another comment asked for the aggregation example towards the start to be made more explicit, so I have done so.
Assuming your indices don't have a regular structure, I would try sparse matrices.
import scipy.sparse as ss
import numpy as np
# your current array of indices
g=np.array([[0,0],[1,1],[2,0],[3,1]])
# a sparse matrix of (data=ones, (row_ind=g[:,0], col_ind=g[:,1]))
# it is one for every pair (g[i,0], g[i,1]), zero elsewhere
u=ss.csr_matrix((np.ones(len(g)), (g[:,0], g[:,1])))
Aggregate
m=np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
u.T @ m @ u
Expand
m2 = np.array([[1,2],[3,4]])
u @ m2 @ u.T
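
If you'd rather stay in plain numpy, the same accumulation can be written with np.add.at, whose unbuffered accumulation handles the duplicate index pairs correctly (a sketch reusing big2small from Edit 1):
idx = np.array([big2small[i] for i in range(len(m))])  # array([0, 1, 0, 1])
mnew = np.zeros((2, 2))
np.add.at(mnew, (idx[:, None], idx[None, :]), m)  # mnew is now [[24., 28.], [40., 44.]]
The reverse expansion is then plain fancy indexing with the same index vector:
m2[np.ix_(idx, idx)]
# array([[1, 2, 1, 2],
#        [3, 4, 3, 4],
#        [1, 2, 1, 2],
#        [3, 4, 3, 4]])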

numpy.r_: placing the 1's in the middle

I am reading the numpy.r_ docs, and I gather that I cannot place the length-1 axis at the middle position.
For example,
a = np.array([[3, 4, 5], [33, 44, 55]])
b = np.array([[-3, -4, -5], [-33, -44, -55]])
np.r_['0,3,1', a, b]
Actually, the shape (2,3) of a is first upgraded to (1,2,3), and the same happens to b. Then the two shapes are concatenated along axis 0: (1,2,3) + (1,2,3) gives a final result shape of (2,2,3). Note that only the first number is added, because of the '0' in '0,3,1'.
Now the question: according to the docs I can upgrade the shape of a to (1,2,3) or (2,3,1), but how can I upgrade it to (2,1,3)?
In [381]: a = np.array([[3, 4, 5], [33, 44, 55]])
     ...: b = np.array([[-3, -4, -5], [-33, -44, -55]])
     ...: np.r_['0,3,1', a, b]
Out[381]:
array([[[  3,   4,   5],
        [ 33,  44,  55]],

       [[ -3,  -4,  -5],
        [-33, -44, -55]]])
Your question should have displayed this result. It helps the reader visualize the action, and better understand your question. Not everyone can run your sample (I couldn't when I first read it on my phone).
You can do the same concatenation with stack (or even np.array((a,b))):
In [382]: np.stack((a, b))
Out[382]:
array([[[  3,   4,   5],
        [ 33,  44,  55]],

       [[ -3,  -4,  -5],
        [-33, -44, -55]]])
stack with axis produces what you want (again, a good question would display the desired result):
In [383]: np.stack((a, b), axis=1)
Out[383]:
array([[[  3,   4,   5],
        [ -3,  -4,  -5]],

       [[ 33,  44,  55],
        [-33, -44, -55]]])
We can add the dimension to a by itself with:
In [384]: np.expand_dims(a, 1)
Out[384]:
array([[[ 3,  4,  5]],

       [[33, 44, 55]]])
In [385]: _.shape
Out[385]: (2, 1, 3)
a[:,None] and a.reshape(2,1,3) also do it.
As you found, I can't do the same with np.r_:
In [413]: np.r_['0,3,0',a].shape
Out[413]: (2, 3, 1)
In [414]: np.r_['0,3,1',a].shape
Out[414]: (1, 2, 3)
In [415]: np.r_['0,3,-1',a].shape
Out[415]: (1, 2, 3)
Even looking at the code it is hard to tell how r_ handles this third parameter. It looks like it uses the ndmin parameter to expand the arrays (which prepends new axes if needed), and then some sort of transpose to move the new axis.
This could be classed as a bug in r_, but it's been around so long that I doubt anyone will care. r_ is more useful for expanding "slices" than for fancy concatenation.
While the syntax of np.r_ may be convenient on occasion, it isn't an essential function. It's just another front end to np.concatenate (with the added arange/linspace functionality).
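
For completeness, the middle placement can also be written as an explicit concatenate over expanded views, which is essentially what np.stack((a, b), axis=1) does internally (a sketch):
np.concatenate((a[:, None, :], b[:, None, :]), axis=1)  # shape (2, 2, 3), new axis in the middle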

Python: alter some matrix values according to a criteria without FOR cycles

I've been programming in the C language for several years, but I'm a newbie to Python, so my apologies in advance for the question below. I have already researched this on the internet, without success.
I have an enormous dataset with several columns. There are three specific columns that I need to work with; let's call them DATA, CRITERIA1 and CRITERIA2. I have each one as a numpy ndarray.
The elements in both CRITERIA arrays are integers; shape = (777777,), dtype = int32.
The elements in DATA are matrices of complex numbers; shape = (3, 5, 777777), dtype = complex128.
I call the dimension of length 777777 the "row number".
For all the rows that fulfill:
((5<CRITERIA1<9) && (CRITERIA2!=7))
the matrix elements in the corresponding rows must be multiplied by (3 + 2i).
I can easily do it with FOR cycles and IFs.
However, I was told that Python has the power to do it at once, without FORs and IFs. Is it true? How?
Greetings!
Try np.where():
import numpy as np
c1 = np.arange(10)      # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
c2 = np.arange(10) + 1  # array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
pos = np.where((c1 > 5) & (c1 < 9) & (c2 != 7))
c1 = c1.astype('complex')
c1[pos] *= np.array([3 + 2j])
# array([ 0. +0.j,  1. +0.j,  2. +0.j,  3. +0.j,  4. +0.j,  5. +0.j,
#         6. +0.j, 21.+14.j, 24.+16.j,  9. +0.j])
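
For the 3-d DATA array from the question, the same boolean condition can index the last axis directly (equivalent to np.where here), so every matrix element in the selected rows is scaled at once. A sketch with shrunken stand-in data (the names data, criteria1, criteria2 mirror the question; only the sizes are made up):
N = 10
data = np.arange(3 * 5 * N).reshape(3, 5, N).astype(np.complex128)
criteria1 = np.arange(N)
criteria2 = np.arange(N) + 1
mask = (criteria1 > 5) & (criteria1 < 9) & (criteria2 != 7)  # one boolean per "row"
data[..., mask] *= (3 + 2j)  # scales all 3x5 matrix elements of the selected rows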

Delete rows at select indexes from a numpy array

In my dataset I have close to 200 rows, but for a minimal working example, let's assume the following array:
arr = np.array([[ 1,  2,  3,  4], [ 5,  6,  7,  8],
                [ 9, 10, 11, 12], [13, 14, 15, 16],
                [17, 18, 19, 20], [21, 22, 23, 24]])
I can take a random sampling of 3 of the rows as follows:
indexes = np.random.choice(np.arange(arr.shape[0]), int(arr.shape[0]/2), replace=False)
Using these indexes, I can select my test cases as follows:
testing = arr[indexes]
I want to delete the rows at these indexes and I can use the remaining elements for my training set.
From the post here, it seems that training = np.delete(arr, indexes) ought to do it, but I get a 1d array instead.
I also tried the suggestion here using training = arr[indexes.astype(np.bool)] but it did not give a clean separation. I get element [5,6,7,8] in both the training and testing sets.
training = arr[indexes.astype(np.bool)]
testing
Out[101]:
array([[13, 14, 15, 16],
       [ 5,  6,  7,  8],
       [17, 18, 19, 20]])
training
Out[102]:
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
Any idea what I am doing wrong? Thanks.
To delete the indexed rows from a numpy array:
arr = np.delete(arr, indexes, axis=0)
One approach would be to get the remaining row indices with np.setdiff1d and then use those row indices to get the desired output -
out = arr[np.setdiff1d(np.arange(arr.shape[0]), indexes)]
Or use np.in1d to leverage boolean indexing -
out = arr[~np.in1d(np.arange(arr.shape[0]), indexes)]
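Another pattern worth knowing (a small sketch with the names from the question): build a boolean mask once and use it for both sets, which guarantees a clean split by construction. Note that arr[mask] returns the test rows in array order rather than in the sampled order.
mask = np.zeros(arr.shape[0], dtype=bool)
mask[indexes] = True   # mark the sampled test rows
testing = arr[mask]
training = arr[~mask]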
