Converting bsxfun with #times to numpy - python

This is the code I have in Octave:
sum(bsxfun(#times, X*Y, X), 2)
The bsxfun part of the code produces element-wise multiplication so I thought that numpy.multiply(X*Y, X) would do the trick but I got an exception. When I did a bit of research I found that element-wise multiplication won't work on Python arrays (specifically if X and Y are of type "numpy.ndarray"). So I was wondering if anyone can explain this a bit more -- i.e. would type casting to a different type of object work? The Octave code works so I know I don't have a linear algebra mistake. I'm assuming that bsxfun and numpy.multiply are not actually equivalent but I'm not sure why so any explanations would be great.
I was able to find a website! that gives Octave to Matlab function conversions but it didn't seem to be help in my case.

bsxfun in Matlab stand for binary singleton expansion, in numpy it's called broadcasting and should happen automatically. The solution will depend on the dimensions of your X, i.e. is it a row or column vector but this answer shows one way to do it:
How to multiply numpy 2D array with numpy 1D array?
I think that the issue here is that broadcasting requires one of the dimensions to be 1 and, unlike Matlab, numpy seems to differentiate between a 1 dimensional 2 element vector and a 2 dimensional 2 element, i.e. the difference between a matrix of shape (2,) and of shape (2,1), you need the latter for broadcasting to happen.

For those who don't know Numpy, I think it's worth pointing out that the equivalent of Octave's (and Matlab's) * operator (matrix multiplication) is numpy.dot (and, debatably, numpy.outer). Numpy's * operator is similar to bsxfun(#times,...) in Octave, which is itself a generalization of .*.
In Octave, when applying bsxfun, there are implicit singleton dimensions to the right of the "true" size of the operands; that is, an n1 x n2 x n3 array can be considered as n1 x n2 x n3 x 1 x 1 x 1 x.... In Numpy, the implicit singleton dimensions are to the left; so an m1 x m2 x m3 can be considered as ... x 1 x 1 x m1 x m2 x m3. This matters when considering operand sizes: in Octave, bsxfun(#times,a,b) will work if a is 2 x 3 x 4 and b is 2 x 3. In Numpy one could not multiply two such arrays, but one could multiply a 2 x 3 x 4 and a 3 x 4 array.
Finally, bsxfun(#times, X*Y, X) in Octave will probably look something like numpy.dot(X,Y) * X. There are still some gotchas: for instance, if you're expecting an outer product (that is, in Octave X is a column vector, Y a row vector), you could look at using numpy.outer instead, or be careful about the shape of X and Y.

Little late, but I would like to provide an example of having equivalent bsxfun and repmat in python. This is a little bit of code I was just converting from Matlab to Python numpy:
Matlab:
x =
-2
-1
0
1
2
n =
2
M = repmat(x,1,n+1)
M =
-2 -2 -2
-1 -1 -1
0 0 0
1 1 1
2 2 2
M = bsxfun(#power,M,0:n)
M =
1 -2 4
1 -1 1
1 0 0
1 1 1
1 2 4
Equivalent in Python:
In [8]: x
Out[8]:
array([[-2],
[-1],
[ 0],
[ 1],
[ 2]])
In [9]: n=2
In [11]: M = np.tile(x, (1, n + 1))
In [12]: M
Out[12]:
array([[-2, -2, -2],
[-1, -1, -1],
[ 0, 0, 0],
[ 1, 1, 1],
[ 2, 2, 2]])
In [13]: M = np.apply_along_axis(pow, 1, M, range(n + 1))
In [14]: M
Out[14]:
array([[ 1, -2, 4],
[ 1, -1, 1],
[ 1, 0, 0],
[ 1, 1, 1],
[ 1, 2, 4]])

Related

iterating a filtered Numpy array whilst maintaining index information

I am attempting to pass filtered values from a Numpy array into a function.
I need to pass values only above a certain value, and their index position with the Numpy array.
I am attempting to avoid iterating over the entire array within python by using Numpys own filtering systems, the arrays i am dealing with have 20k of values in them with potentially only very few being relevant.
import numpy as np
somearray = np.array([1,2,3,4,5,6])
arrayindex = np.nonzero(somearray > 4)
for i in arrayindex:
somefunction(arrayindex[0], somearray[arrayindex[0]])
This threw up errors of logic not being able to handle multiple values,
this led me to testing it through print statement to see what was going on.
for cell in arrayindex:
print(f"index {cell}")
print(f"data {somearray[cell]}")
I expected an output of
index 4
data 5
index 5
data 6
But instead i get
index [4 5]
data [5 6]
I have looked through different methods to iterate through numpy arrays such and neditor, but none seem to still allow me to do the filtering of values outside of the for loop.
Is there a solution to my quandary?
Oh, i am aware that is is generally frowned upon to loop through a numpy array, however the function that i am passing these values to are complex, triggering certain events and involving data to be uploaded to a data base dependent on the data location within the array.
Thanks.
import numpy as np
somearray = np.array([1,2,3,4,5,6])
arrayindex = [idx for idx, val in enumerate(somearray) if val > 4]
for i in range(0, len(arrayindex)):
somefunction(arrayindex[i], somearray[arrayindex[i]])
for i in range(0, len(arrayindex)):
print("index", arrayindex[i])
print("data", somearray[arrayindex[i]])
You need to have a clear idea of what nonzero produces, and pay attention to the difference between indexing with a list(s) and with a tuple.
===
In [110]: somearray = np.array([1,2,3,4,5,6])
...: arrayindex = np.nonzero(somearray > 4)
nonzero produces a tuple of arrays, one per dimension (this becomes more obvious with 2d arrays):
In [111]: arrayindex
Out[111]: (array([4, 5]),)
It can be used directly as an index:
In [113]: somearray[arrayindex]
Out[113]: array([5, 6])
In this 1d case you could take the array out of the tuple, and iterate on it:
In [114]: for i in arrayindex[0]:print(i, somearray[i])
4 5
5 6
argwhere does a 'transpose', which could also be used for iteration
In [115]: idxs = np.argwhere(somearray>4)
In [116]: idxs
Out[116]:
array([[4],
[5]])
In [117]: for i in idxs: print(i,somearray[i])
[4] [5]
[5] [6]
idxs is (2,1) shape, so i is (1,) shape array, resulting in the brackets in the display. Occasionally it's useful, but nonzero is used more (often by it's other name, np.where).
2d
argwhere has a 2d example:
In [119]: x=np.arange(6).reshape(2,3)
In [120]: np.argwhere(x>1)
Out[120]:
array([[0, 2],
[1, 0],
[1, 1],
[1, 2]])
In [121]: np.nonzero(x>1)
Out[121]: (array([0, 1, 1, 1]), array([2, 0, 1, 2]))
In [122]: x[np.nonzero(x>1)]
Out[122]: array([2, 3, 4, 5])
While nonzero can be used to index the array, argwhere elements can't.
In [123]: for ij in np.argwhere(x>1):
...: print(ij,x[ij])
...:
...
IndexError: index 2 is out of bounds for axis 0 with size 2
Problem is that ij is a list, which is used to index on dimension. numpy distinguishes between lists and tuples when indexing. (Earlier versions fudged the difference, but current versions are taking a more rigorous approach.)
So we need to change the list into a tuple. One way is to unpack it:
In [124]: for i,j in np.argwhere(x>1):
...: print(i,j,x[i,j])
...:
...:
0 2 2
1 0 3
1 1 4
1 2 5
I could have used: print(ij,x[tuple(ij)]) in [123].
I should have used unpacking the [117] iteration:
In [125]: for i, in idxs: print(i,somearray[i])
4 5
5 6
or somearray[tuple(i)]

SciPy: Symmetric permutation of sparse CSR matrix

I'd like to symmetrically permute a sparse matrix, permuting rows and columns in the same way. For example, I would like to rotate the rows and columns, which takes:
1 2 3
0 1 0
0 0 1
to
1 0 0
0 1 0
2 3 1
In Octave or MATLAB, one can do this concisely with matrix indexing:
A = sparse([1 2 3; 0 1 0; 0 0 1]);
perm = [2 3 1];
Aperm = A(perm,perm);
I am interested in doing this in Python, with NumPy/SciPy. Here is an attempt:
#!/usr/bin/env python
import numpy as np
from scipy.sparse import csr_matrix
row = np.array([0, 0, 0, 1, 2])
col = np.array([0, 1, 2, 1, 2])
data = np.array([1, 2, 3, 1, 1])
A = csr_matrix((data, (row, col)), shape=(3, 3))
p = np.array([1, 2, 0])
#Aperm = A[p,p] # gives [1,1,1], the permuted diagonal
Aperm = A[:,p][p,:] # works, but more verbose
Is there a cleaner way to accomplish this sort of symmetric permutation of a matrix?
(I'm more interested in concise syntax than I am in performance)
In MATLAB
A(perm,perm)
is a block operation. In numpy A[perm,perm] selects elements on the diagonal.
A[perm[:,None], perm]
is the block indexing. The MATLAB diagonal requires something like sub2ind. What's concise in one is more verbose in the other, and v.v.
Actually numpy is using the same logic in both cases. It 'broadcasts' one index against the other, A (n,) against (n,) in the diagonal case, and (n,1) against (1,n) in the block case. The results are (n,) and (n,n) shaped.
This numpy indexing works with sparse matrices as well, though it isn't as fast. It actually uses matrix multiplication to do this sort of indexing - with an 'extractor' matrix based on the indices (maybe 2, M*A*M.T).
MATLAB's documentation about a permutation matrix:
https://www.mathworks.com/help/matlab/math/sparse-matrix-operations.html#f6-13070

Convert scipy condensed distance matrix to lower matrix read by rows

I have a condensed distance matrix from scipy that I need to pass to a C function that requires the matrix be converted to the lower triangle read by rows. For example:
0 1 2 3
0 4 5
0 6
0
The condensed form of this is: [1,2,3,4,5,6] but I need to convert it to
0
1 0
2 4 0
3 5 6 0
The lower triangle read by rows is: [1,2,4,3,5,6].
I was hoping to convert the compact distance matrix to this form without creating a redundant matrix.
Here's a quick implementation--but it creates the square redundant distance matrix as an intermediate step:
In [128]: import numpy as np
In [129]: from scipy.spatial.distance import squareform
c is the condensed form of the distance matrix:
In [130]: c = np.array([1, 2, 3, 4, 5, 6])
d is the redundant square distance matrix:
In [131]: d = squareform(c)
Here's your condensed lower triangle distances:
In [132]: d[np.tril_indices(d.shape[0], -1)]
Out[132]: array([1, 2, 4, 3, 5, 6])
Here's a method that avoids forming the redundant distance matrix. The function condensed_index(i, j, n) takes the row i and column j of the redundant distance matrix, with j > i, and returns the corresponding index in the condensed distance array.
In [169]: def condensed_index(i, j, n):
...: return n*i - i*(i+1)//2 + j - i - 1
...:
As above, c is the condensed distance array.
In [170]: c
Out[170]: array([1, 2, 3, 4, 5, 6])
In [171]: n = 4
In [172]: i, j = np.tril_indices(n, -1)
Note that the arguments are reversed in the following call:
In [173]: indices = condensed_index(j, i, n)
indices gives the desired permutation of the condensed distance array.
In [174]: c[indices]
Out[174]: array([1, 2, 4, 3, 5, 6])
(Basically the same function as condensed_index(i, j, n) was given in several answers to this question.)

Numpy array and Matlab Matrix are mismatching [3D]

The following octave code shows a sample 3D matrix using Octave/Matlab
octave:1> A=zeros(3,3,3);
octave:2>
octave:2> A(:,:,1)= [[1 2 3];[4 5 6];[7 8 9]];
octave:3>
octave:3> A(:,:,2)= [[11 22 33];[44 55 66];[77 88 99]];
octave:4>
octave:4> A(:,:,3)= [[111 222 333];[444 555 666];[777 888 999]];
octave:5>
octave:5>
octave:5> A
A =
ans(:,:,1) =
1 2 3
4 5 6
7 8 9
ans(:,:,2) =
11 22 33
44 55 66
77 88 99
ans(:,:,3) =
111 222 333
444 555 666
777 888 999
octave:6> A(1,3,2)
ans = 33
And I need to convert the same matrix using numpy ... unfortunately When I'm trying to access the same index using array in numpy I get different values as shown below!!
import numpy as np
array = np.array([[[1 ,2 ,3],[4 ,5 ,6],[7 ,8 ,9]], [[11 ,22 ,33],[44 ,55 ,66],[77 ,88 ,99]], [[111 ,222 ,333],[444 ,555 ,666],[777 ,888 ,999]]])
>>> array[0,2,1]
8
Also I read the following document that shows the difference between matrix implementation in Matlab and in Python numpy Numpy for Matlab users but I didn't find a sample 3d array and the mapping of it into Matlab and vice versa!
the answer is different for example accessing the element(1,3,2) in Matlab doesn't match the same index using numpy (0,2,1)
Octave/Matlab
octave:6> A(1,3,2)
ans = 33
Python
>>> array[0,2,1]
8
The way your array is constructed in numpy is different than it is in MATLAB.
Where your MATLAB array is (y, x, z), your numpy array is (z, y, x). Your 3d numpy array is a series of 'stacked' 2d arrays, so you're indexing "outside->inside" (for lack of a better term). Here's your array definition expanded so this (hopefully) makes a little more sense:
[[[1, 2, 3],
[4, 5, 6], # Z = 0
[7 ,8 ,9]],
[[11 ,22 ,33],
[44 ,55 ,66], # Z = 1
[77 ,88 ,99]],
[[111 ,222 ,333],
[444 ,555 ,666], # Z = 2
[777 ,888 ,999]]
]
So with:
import numpy as np
A = np.array([[[1 ,2 ,3],[4 ,5 ,6],[7 ,8 ,9]], [[11 ,22 ,33],[44 ,55 ,66],[77 ,88 ,99]], [[111 ,222 ,333],[444 ,555 ,666],[777 ,888 ,999]]])
B = A[1, 0, 2]
B returns 33, as expected.
If you want a less mind-bending way to indexing your array, consider generating it as you did in MATLAB.
MATLAB and Python index differently. To investigate this, lets create a linear array of number 1 to 8 and then reshape the result to be a 2-by-2-by-2 matrix in each language:
MATLAB:
M_flat = 1:8
M = reshape(M_flat, [2,2,2])
which returns
M =
ans(:,:,1) =
1 3
2 4
ans(:,:,2) =
5 7
6 8
Python:
import numpy as np
P_flat = np.array(range(1,9))
P = np.reshape(P, [2,2,2])
which returns
array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
The first thing you should notice is that the first two dimensions have switched. This is because MATLAB uses column-major indexing which means we count down the columns first whereas Python use row-major indexing and hence it counts across the rows first.
Now let's try indexing them. So let's try slicing along the different dimensions. In MATLAB, I know to get a slice out of the third dimension I can do
M(:,:,1)
ans =
1 3
2 4
Now let's try the same in Python
P[:,:,0]
array([[1, 3],
[5, 7]])
So that's completely different. To get the MATLAB 'equivalent' we need to go
P[0,:,:]
array([[1, 2],
[3, 4]])
Now this returns the transpose of the MATLAB version which is to be expected due the the row-major vs column-major difference.
So what does this mean for indexing? It looks like Python puts the major index at the end which is the reverse of MALTAB.
Let's say I index as follows in MATLAB
M(1,2,2)
ans =
7
now to get the 7 from Python we should go
P(1,1,0)
which is the MATLAB syntax reversed. Note that is is reversed because we created the Python matrix with a row-major ordering in mind. If you create it as you did in your code you would have to swap the last 2 indices so rather create the matrix correctly in the first place as Ander has suggested in the comments.
I think better than just calling the difference "row major" or "column major" is numpy's way of describing them:
ā€˜Cā€™ means to read / write the elements using C-like index order, with the last axis index changing fastest, back to the first axis index changing slowest. ā€˜Fā€™ means to read / write the elements using Fortran-like index order, with the first index changing fastest, and the last index changing slowest.
Some gifs to illustrate the difference: The first is row-major (python / c), second is column-major (MATLAB/ Fortran)
I think that the problem is the way you create the matrix in numpy and also the different representation of matlab and numpy, why you don't use the same system in matlab and numpy
>>> A = np.zeros((3,3,3),dtype=int)
>>> A
array([[[0, 0, 0],
[0, 0, 0],
[0, 0, 0]],
[[0, 0, 0],
[0, 0, 0],
[0, 0, 0]],
[[0, 0, 0],
[0, 0, 0],
[0, 0, 0]]])
>>> A[:,:,0] = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> A[:,:,1] = np.array([[11,22,33],[44,55,66],[77,88,99]])
>>> A[:,:,2] = np.array([[111,222,333],[444,555,666],[777,888,999]])
>>> A
array([[[ 1, 11, 111],
[ 2, 22, 222],
[ 3, 33, 333]],
[[ 4, 44, 444],
[ 5, 55, 555],
[ 6, 66, 666]],
[[ 7, 77, 777],
[ 8, 88, 888],
[ 9, 99, 999]]])
>>> A[0,2,1]
33
I think that python uses this type of indexing to create arrays as shown in the following figure:
https://www.google.com.eg/search?q=python+indexing+arrays+numpy&biw=1555&bih=805&source=lnms&tbm=isch&sa=X&ved=0ahUKEwia7b2J1qzOAhUFPBQKHXtdCBkQ_AUIBygC#imgrc=7JQu1w_4TCaAnM%3A
And, there are many ways to store your data, you can choose order='F' to count the columns first as matlab does, while the default is order='C' that count the rows first....

Operations on 'N' dimensional numpy arrays

I am attempting to generalize some Python code to operate on arrays of arbitrary dimension. The operations are applied to each vector in the array. So for a 1D array, there is simply one operation, for a 2-D array it would be both row and column-wise (linearly, so order does not matter). For example, a 1D array (a) is simple:
b = operation(a)
where 'operation' is expecting a 1D array. For a 2D array, the operation might proceed as
for ii in range(0,a.shape[0]):
b[ii,:] = operation(a[ii,:])
for jj in range(0,b.shape[1]):
c[:,ii] = operation(b[:,ii])
I would like to make this general where I do not need to know the dimension of the array beforehand, and not have a large set of if/elif statements for each possible dimension.
Solutions that are general for 1 or 2 dimensions are ok, though a completely general solution would be preferred. In reality, I do not imagine needing this for any dimension higher than 2, but if I can see a general example I will learn something!
Extra information:
I have a matlab code that uses cells to do something similar, but I do not fully understand how it works. In this example, each vector is rearranged (basically the same function as fftshift in numpy.fft). Not sure if this helps, but it operates on an array of arbitrary dimension.
function aout=foldfft(ain)
nd = ndims(ain);
for k = 1:nd
nx = size(ain,k);
kx = floor(nx/2);
idx{k} = [kx:nx 1:kx-1];
end
aout = ain(idx{:});
In Octave, your MATLAB code does:
octave:19> size(ain)
ans =
2 3 4
octave:20> idx
idx =
{
[1,1] =
1 2
[1,2] =
1 2 3
[1,3] =
2 3 4 1
}
and then it uses the idx cell array to index ain. With these dimensions it 'rolls' the size 4 dimension.
For 5 and 6 the index lists would be:
2 3 4 5 1
3 4 5 6 1 2
The equivalent in numpy is:
In [161]: ain=np.arange(2*3*4).reshape(2,3,4)
In [162]: idx=np.ix_([0,1],[0,1,2],[1,2,3,0])
In [163]: idx
Out[163]:
(array([[[0]],
[[1]]]), array([[[0],
[1],
[2]]]), array([[[1, 2, 3, 0]]]))
In [164]: ain[idx]
Out[164]:
array([[[ 1, 2, 3, 0],
[ 5, 6, 7, 4],
[ 9, 10, 11, 8]],
[[13, 14, 15, 12],
[17, 18, 19, 16],
[21, 22, 23, 20]]])
Besides the 0 based indexing, I used np.ix_ to reshape the indexes. MATLAB and numpy use different syntax to index blocks of values.
The next step is to construct [0,1],[0,1,2],[1,2,3,0] with code, a straight forward translation.
I can use np.r_ as a short cut for turning 2 slices into an index array:
In [201]: idx=[]
In [202]: for nx in ain.shape:
kx = int(np.floor(nx/2.))
kx = kx-1;
idx.append(np.r_[kx:nx, 0:kx])
.....:
In [203]: idx
Out[203]: [array([0, 1]), array([0, 1, 2]), array([1, 2, 3, 0])]
and pass this through np.ix_ to make the appropriate index tuple:
In [204]: ain[np.ix_(*idx)]
Out[204]:
array([[[ 1, 2, 3, 0],
[ 5, 6, 7, 4],
[ 9, 10, 11, 8]],
[[13, 14, 15, 12],
[17, 18, 19, 16],
[21, 22, 23, 20]]])
In this case, where 2 dimensions don't roll anything, slice(None) could replace those:
In [210]: idx=(slice(None),slice(None),[1,2,3,0])
In [211]: ain[idx]
======================
np.roll does:
indexes = concatenate((arange(n - shift, n), arange(n - shift)))
res = a.take(indexes, axis)
np.apply_along_axis is another function that constructs an index array (and turns it into a tuple for indexing).
If you are looking for a programmatic way to index the k-th dimension an n-dimensional array, then numpy.take might help you.
An implementation of foldfft is given below as an example:
In[1]:
import numpy as np
def foldfft(ain):
result = ain
nd = len(ain.shape)
for k in range(nd):
nx = ain.shape[k]
kx = (nx+1)//2
shifted_index = list(range(kx,nx)) + list(range(kx))
result = np.take(result, shifted_index, k)
return result
a = np.indices([3,3])
print("Shape of a = ", a.shape)
print("\nStarting array:\n\n", a)
print("\nFolded array:\n\n", foldfft(a))
Out[1]:
Shape of a = (2, 3, 3)
Starting array:
[[[0 0 0]
[1 1 1]
[2 2 2]]
[[0 1 2]
[0 1 2]
[0 1 2]]]
Folded array:
[[[2 0 1]
[2 0 1]
[2 0 1]]
[[2 2 2]
[0 0 0]
[1 1 1]]]
You could use numpy.ndarray.flat, which allows you to linearly iterate over a n dimensional numpy array. Your code should then look something like this:
b = np.asarray(x)
for i in range(len(x.flat)):
b.flat[i] = operation(x.flat[i])
The folks above provided multiple appropriate solutions. For completeness, here is my final solution. In this toy example for the case of 3 dimensions, the function 'ops' replaces the first and last element of a vector with 1.
import numpy as np
def ops(s):
s[0]=1
s[-1]=1
return s
a = np.random.rand(4,4,3)
print '------'
print 'Array a'
print a
print '------'
for ii in np.arange(a.ndim):
a = np.apply_along_axis(ops,ii,a)
print '------'
print ' Axis',str(ii)
print a
print '------'
print ' '
The resulting 3D array has a 1 in every element on the 'border' with the numbers in the middle of the array unchanged. This is of course a toy example; however ops could be any arbitrary function that operates on a 1D vector.
Flattening the vector will also work; I chose not to pursue that simply because the book-keeping is more difficult and apply_along_axis is the simplest approach.
apply_along_axis reference page

Categories

Resources