I'm having some array operation issues. Here's an example:
A = np.ones((5,2))
B = np.ones((5,2)) * 2
X = np.zeros((5,1))
C = A[:,0] + B[:,0]
D = C + X
The shapes I'm getting are:
shape(A[:,0]) = (5,)
shape(B[:,0]) = (5,)
shape(X) = (5,1)
shape(C) = (5,)
shape(D) = (5,5)
When I extract a column from an array, the output has shape (5,), not (5,1). Is there any way to correct that without having to reshape arrays all the time?
When I compute D = C + X, the result is a (5,5) array, but it should be (5,1).
Solution 1
D = X + C.reshape(X.shape)
D.shape
#(5, 1)
print(D)
#[[ 3.]
# [ 3.]
# [ 3.]
# [ 3.]
# [ 3.]]
Solution 2 (better; see numpy-convert-row-vector-to-column-vector)
C = A[:,0:1] + B[:,0:1]
Why:
C and X have different shapes. C is treated as a row and X is a column, so adding them broadcasts to a matrix with shape (5,5).
print(C)
#[ 3. 3. 3. 3. 3.]
print(X)
#[[ 0.]
# [ 0.]
# [ 0.]
# [ 0.]
# [ 0.]]
When broadcasting an array like C, with shape (5,), against a 2d array, numpy adds dimensions at the start as needed, giving (1,5). So (1,5) + (5,1) => (5,5).
To get a (5,1) result you need, one way or another, to make C a (5,1) array.
C[:,None] + X # None or np.newaxis is an easy way
C.reshape(5,1) + X # equivalent
or index A with a list or slice
C = A[:,[0]] + B[:,[0]]
A[:,0] removes a dimension, producing a (5,) array.
Note: MATLAB adds the default dimensions at the end; numpy, because it defaults to C order, adds them at the start. Adding dimensions like that requires minimal change, just changing the shape.
Functions like np.sum have a keepdims parameter to avoid this sort of dimension reduction.
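To make the shape behaviour above concrete, here is a small sketch that reuses the arrays from the question. It only prints shapes; the comments describe what I would expect NumPy to report.

```python
import numpy as np

A = np.ones((5, 2))
B = np.ones((5, 2)) * 2
X = np.zeros((5, 1))

# Integer indexing drops the axis; slice or list indexing keeps it.
print(A[:, 0].shape)      # (5,)
print(A[:, 0:1].shape)    # (5, 1)
print(A[:, [0]].shape)    # (5, 1)

C = A[:, 0] + B[:, 0]     # shape (5,)
print((C + X).shape)            # (5, 5): C is treated as (1, 5) against (5, 1)
print((C[:, None] + X).shape)   # (5, 1)

# keepdims=True keeps the reduced axis with length 1.
print(A.sum(axis=1).shape)                 # (5,)
print(A.sum(axis=1, keepdims=True).shape)  # (5, 1)
```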
Related
I don't even know how to phrase what I am trying to do, so I'm going straight to a simple example. I have a blocked array that looks something like this:
a = np.array([
[1,2,0,0],
[3,4,0,0],
[9,9,0,0],
[0,0,5,6],
[0,0,7,8],
[0,0,8,8]
])
and I want as an output:
np.array([
[1/9,2/9,0,0],
[3/9,4/9,0,0],
[9/9,9/9,0,0],
[0,0,5/8,6/8],
[0,0,7/8,8/8],
[0,0,8/8,8/8]
])
Let's view this as two blocks:
Block 1
np.array([
[1,2,0,0],
[3,4,0,0],
[9,9,0,0],
])
Block 2
np.array([
[0,0,5,6],
[0,0,7,8],
[0,0,8,8]
])
I want to normalize by the last row of each block, i.e. I want to divide each block by its last row (plus epsilon for stability, so that the zeros become 0/(0+eps) = 0).
I need an efficient way to do this.
My current inefficient solution is to create a new array of the same shape as a, in which each block is filled with the last row of the corresponding block in a, and then divide. As follows:
norming_indices = np.array([2,2,2,5,5,5])
divisors = a[norming_indices, :]
b = a / (divisors + 1e-9)
In this example:
divisors = np.array([
[9,9,0,0],
[9,9,0,0],
[9,9,0,0],
[0,0,8,8],
[0,0,8,8],
[0,0,8,8]
])
This seems like a very inefficient way to do this. Does anyone have a better approach?
Reshape to three dimensions, normalize each block by the maximum of its last row (index 2 of each 3-row block), then reshape back to the original shape:
b = a.reshape(-1, 3, 4)
b = b / b[:,2::3].max(axis=2,keepdims=True)
b = b.reshape(a.shape)
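If you want the element-wise division by the last row plus epsilon, exactly as in the question, the same reshape idea can be adapted. This is only a sketch and assumes every block has the same number of rows; block_size and eps are names I made up for illustration.

```python
import numpy as np

a = np.array([
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [9, 9, 0, 0],
    [0, 0, 5, 6],
    [0, 0, 7, 8],
    [0, 0, 8, 8],
])

block_size = 3   # assumption: all blocks have the same number of rows
eps = 1e-9       # same stabilizer as in the question

blocks = a.reshape(-1, block_size, a.shape[1])   # (n_blocks, block_size, n_cols)
last_rows = blocks[:, -1:, :]                    # slicing keeps the axis: (n_blocks, 1, n_cols)
b = (blocks / (last_rows + eps)).reshape(a.shape)

# Reference: the index-based approach from the question.
norming_indices = np.array([2, 2, 2, 5, 5, 5])
expected = a / (a[norming_indices, :] + eps)
print(np.allclose(b, expected))
```

The reshape is a view of a, so the only new arrays created here are the sum in the denominator and the result.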
np.concatenate may help you
a = np.array([
[1,2,0,0],
[3,4,0,0],
[9,9,0,0],
[0,0,5,6],
[0,0,7,8],
[0,0,8,8]
])
b = np.concatenate((a[0:3, :] / (a[2, :] + 1e-9),
a[3:, :] / (a[5, :] + 1e-9)))
print(b)
Output:
[[0.11111111 0.22222222 0. 0. ]
[0.33333333 0.44444444 0. 0. ]
[1. 1. 0. 0. ]
[0. 0. 0.625 0.75 ]
[0. 0. 0.875 1. ]
[0. 0. 1. 1. ]]
I'm writing a program, but I'm having difficulty updating a numpy array.
The code:
print("p: " + str(pontoP))
print("d: " + str(deslocamento))
novoP = np.array([0,0,0])
novoP = pontoP + deslocamento
pontos[i] = novoP
print("p+d: " + str(pontos[i]))
The output:
p: [0. 1. 0.33333333]
d: [ 0. -1. 0.]
p+d: [0 0 0]
pontoP, novoP and deslocamento are 1D numpy arrays (length 3), and pontos is a 2D numpy array (size 8 x 3).
The line novoP = pontoP + deslocamento is working: the arrays are being summed element-wise. However, pontos[i] = novoP is failing to update the 2D array pontos.
What can I do? The desired result is to replace the ith array of pontos with the contents of novoP.
Thanks to #hpauli, I found that the issue was the dtype of the numpy array. It was an int array, and when I tried to put a float in it, the float was cast to int, which dropped the fractional part.
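A minimal reproduction of that dtype problem, as a sketch. The values are taken from the output in the question; how pontos was originally created is my assumption (here an integer array of zeros).

```python
import numpy as np

# Assumption: pontos was created with an integer dtype.
pontos = np.zeros((8, 3), dtype=int)
novoP = np.array([0.0, 1.0, 0.33333333]) + np.array([0.0, -1.0, 0.0])

pontos[0] = novoP            # floats are silently cast to int on assignment
print(pontos[0])             # [0 0 0]

# Possible fix: use a float array (or create it with dtype=float from the start).
pontos_f = pontos.astype(float)
pontos_f[0] = novoP
print(pontos_f[0])
```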
I'm trying to populate an array in python more efficiently. I have a 5x3 matrix A that I am transforming into a 3x3 matrix (Z) by calculating z11, z12, ..., z33 independently. The code below works, but it's clunky and I'm hoping to automate this into a loop so that it will take an A matrix of any size (n x m) and transform it into a Z matrix of size (m x m). If someone could help me out I would greatly appreciate it!
import numpy as np
A = np.array([[1,0,0],
[0,1,0],
[0,1,1],
[0,0,-1],
[0,0,1]])
A1=A[:,0]
A2=A[:,1]
A3=A[:,2]
C = np.array([-2,-2, -9,-6,-4])
X = np.array([-4,-4,-8])
z11 = (sum(A1*A1))*(C[0]/X[0])
z12 = (sum(A1*A2))*(C[0]/X[1])
z13 = (sum(A1*A3))*(C[0]/X[2])
z21 = (sum(A2*A1))*(C[1]/X[0])
z22 = (sum(A2*A2))*(C[1]/X[1])
z23 = (sum(A2*A3))*(C[1]/X[2])
z31 = (sum(A3*A1))*(C[2]/X[0])
z32 = (sum(A3*A2))*(C[2]/X[1])
z33 = (sum(A3*A3))*(C[2]/X[2])
Z = np.array([[z11,z12,z13],
[z21,z22,z23],
[z31,z32,z33]])
We can use broadcasting to achieve the same result. First add a dimension to A with A[:, None] and transpose it, then multiply it by A.T. Since the shape of A[:, None].T is (3, 1, 5) and the shape of A.T is (3, 5), numpy (intuitively) repeats each array along the dimensions where the two shapes don't match and then does the multiplication. This way each column of A gets multiplied with every other column (the transposes make sure that columns, not rows, are multiplied). Then we can take the sum along the last axis and scale by C[:m, None] / X to get the desired output.
Use:
m = A.shape[1]
B = A[:, None].T * A.T
Z = np.sum(B, axis = -1).astype(float)*C[:m, None]/X
Output:
>>> Z
array([[0.5 , 0. , 0. ],
[0. , 1. , 0.25 ],
[0. , 2.25 , 3.375]])
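Since sum(Ai * Aj) over all column pairs is just the matrix product of A.T with A, the same Z can probably be written without the intermediate 3D array. A sketch under that reading of the question; G is a name I introduced.

```python
import numpy as np

A = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 1, 1],
              [0, 0, -1],
              [0, 0, 1]])
C = np.array([-2, -2, -9, -6, -4])
X = np.array([-4, -4, -8])

m = A.shape[1]
G = A.T @ A                          # G[i, j] = sum(A[:, i] * A[:, j])
Z = G * (C[:m, None] / X[None, :])   # scale entry (i, j) by C[i] / X[j]
print(Z)
```

This works for any (n x m) A as long as C has at least m entries and X has exactly m.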
Given two arrays, a and b, with shapes (3, 3) and (1000,), how do I multiply them to get an array with shape (3, 3, 1000)?
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.linspace(1, 1000, 1000)
c = a * b # does not work
c = np.outer(a, b) # does not work
c = np.outer(a, b[None,]) # nope
I have tried a lot of things, too many to remember them all.
I have also googled (and searched on SO) but to no avail.
IIUC, use numpy.einsum:
c = np.einsum("ij,k->ijk", a, b)
Output:
c.shape
# (3, 3, 1000)
You can do it with multiplication by reshaping your arrays:
M,N = a.shape
B = b.size
c = a.reshape(M,N,1) * b.reshape(1,1,B)
print(c.shape)
print(c[:,:,0])
print(c[:,:,B-1])
Output:
% python3 script.py
(3, 3, 1000)
[[1. 2. 3.]
[4. 5. 6.]
[7. 8. 9.]]
[[1000. 2000. 3000.]
[4000. 5000. 6000.]
[7000. 8000. 9000.]]
You need to understand the broadcasting rules. The bottom line is:
both arrays need to have the same number of axes (numpy prepends axes of length 1 to the array with fewer axes), and
the sizes along each axis must be the same or one of them must be 1.
Among 2D arrays, you can multiply an array with shape (3,3) only by another with shape (3,3), (3,1), (1,3) or (1,1). There are other broadcasting rules. Read them.
Your shapes are (3,3) and (1000,). As you said, you need the final shape to be 3-dimensional. The 3 needs to match an axis with length 1. Same with the 1000. So you can add axes to each to end up with shapes (3, 3, 1) and (1, 1, 1000):
c = a[:, :, np.newaxis]* b[np.newaxis,np.newaxis]
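For what it's worth, the approaches in this thread should all give the same array. A quick sketch to check that; the variable names c_einsum, c_reshape and c_newaxis are mine.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.linspace(1, 1000, 1000)

c_einsum = np.einsum("ij,k->ijk", a, b)
c_reshape = a.reshape(3, 3, 1) * b.reshape(1, 1, -1)
# b has shape (1000,); broadcasting treats it as (1, 1, 1000) here.
c_newaxis = a[:, :, np.newaxis] * b

print(c_einsum.shape, c_reshape.shape, c_newaxis.shape)
print(np.allclose(c_einsum, c_reshape) and np.allclose(c_einsum, c_newaxis))
```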
I have a two dimensional array, i.e. an array of sequences which are also arrays. For each sequence I would like to calculate the autocorrelation, so that for a (5,4) array, I would get 5 results, or an array of dimension (5,7).
I know I could just loop over the first dimension, but that's slow and my last resort. Is there another way?
Thanks!
EDIT:
Based on the chosen answer plus the comment from mtrw, I have the following function:
import numpy as np
from numpy.fft import fft, ifft, fftshift

def xcorr(x):
    """FFT based autocorrelation function, which is faster than numpy.correlate"""
    # x is supposed to be an array of sequences, of shape (totalelements, length)
    fftx = fft(x, n=(length*2-1), axis=1)
    ret = ifft(fftx * np.conjugate(fftx), axis=1)
    ret = fftshift(ret, axes=1)
    return ret
Note that length is a global variable in my code, so be sure to declare it. I also didn't restrict the result to real numbers, since I need to take into account complex numbers as well.
Using FFT-based autocorrelation:
import numpy
from numpy.fft import fft, ifft
data = numpy.arange(5*4).reshape(5, 4)
print(data)
##[[ 0 1 2 3]
## [ 4 5 6 7]
## [ 8 9 10 11]
## [12 13 14 15]
## [16 17 18 19]]
dataFT = fft(data, axis=1)
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real
print(dataAC)
##[[ 14. 8. 6. 8.]
## [ 126. 120. 118. 120.]
## [ 366. 360. 358. 360.]
## [ 734. 728. 726. 728.]
## [ 1230. 1224. 1222. 1224.]]
I'm a little confused by your statement about the answer having dimension (5, 7), so maybe there's something important I'm not understanding.
EDIT: At the suggestion of mtrw, a padded version that doesn't wrap around:
import numpy
from numpy.fft import fft, ifft
data = numpy.arange(5*4).reshape(5, 4)
padding = numpy.zeros((5, 3))
dataPadded = numpy.concatenate((data, padding), axis=1)
print(dataPadded)
##[[ 0. 1. 2. 3. 0. 0. 0.]
## [ 4. 5. 6. 7. 0. 0. 0.]
## [ 8. 9. 10. 11. 0. 0. 0.]
## [ 12. 13. 14. 15. 0. 0. 0.]
## [ 16. 17. 18. 19. 0. 0. 0.]]
dataFT = fft(dataPadded, axis=1)
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real
print(numpy.round(dataAC, 10))
##[[ 14. 8. 3. 0. 0. 3. 8.]
## [ 126. 92. 59. 28. 28. 59. 92.]
## [ 366. 272. 179. 88. 88. 179. 272.]
## [ 734. 548. 363. 180. 180. 363. 548.]
## [ 1230. 920. 611. 304. 304. 611. 920.]]
There must be a more efficient way to do this, especially because autocorrelation is symmetric and I don't take advantage of that.
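To convince yourself the padded FFT result is right, you can compare it with a plain row-by-row numpy.correlate. This is only a sanity-check sketch; it uses the n= argument of fft for the zero padding instead of concatenating, and the names reference and shifted are mine.

```python
import numpy as np
from numpy.fft import fft, ifft, fftshift

data = np.arange(5 * 4).reshape(5, 4).astype(float)
n = 2 * data.shape[1] - 1            # 7 lags for sequences of length 4

dataFT = fft(data, n=n, axis=1)      # n= zero-pads each row to length 7
dataAC = ifft(dataFT * np.conjugate(dataFT), axis=1).real

# Reference: one numpy.correlate call per row; lags run from -3 to 3.
reference = np.array([np.correlate(row, row, mode="full") for row in data])

# The FFT result starts at lag 0, so shift it to put lag 0 in the middle.
shifted = fftshift(dataAC, axes=1)
print(np.allclose(shifted, reference))
```

The loop is only there for checking; the FFT version is the one that avoids looping over the first dimension.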
For really large arrays it becomes important to have n = 2 ** p, where p is an integer. This will save you huge amounts of time. For example:
def xcorr(x):
    l = 2 ** int(np.log2(x.shape[1] * 2 - 1))
    fftx = fft(x, n=l, axis=1)
    ret = ifft(fftx * np.conjugate(fftx), axis=1)
    ret = fftshift(ret, axes=1)
    return ret
This might give you wrap-around errors. For large arrays the auto correlation should be insignificant near the edges, though.
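If the wrap-around is a concern, one option is to round up to the next power of two instead of down, and then keep only the 2*length - 1 valid lags. A sketch, not a benchmark; xcorr_pow2 is a hypothetical name and I have not timed it against the version above.

```python
import numpy as np
from numpy.fft import fft, ifft

def xcorr_pow2(x):
    """Row-wise autocorrelation, zero-padded to a power of two >= 2*length - 1."""
    length = x.shape[1]
    n_lags = 2 * length - 1
    n_fft = 1 << (n_lags - 1).bit_length()     # smallest power of two >= n_lags
    fftx = fft(x, n=n_fft, axis=1)
    ac = ifft(fftx * np.conjugate(fftx), axis=1)
    # Lags 0..length-1 sit at the start, lags -(length-1)..-1 at the very end;
    # anything in between is padding and is dropped.
    negative_lags = ac[:, n_fft - (length - 1):]
    return np.concatenate((negative_lags, ac[:, :length]), axis=1)

x = np.arange(5 * 4).reshape(5, 4).astype(float)
out = xcorr_pow2(x)
print(out.shape)          # (5, 7)
print(out[0].real.round(10))
```

The result is ordered from lag -(length-1) to lag length-1, the same layout as numpy.correlate with mode="full", and it stays complex so that complex inputs are handled too.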
Maybe it's just a preference, but I wanted to follow from the definition. I personally find it a bit easier to follow that way. This is my implementation for an arbitrary nd array.
from itertools import product
from numpy import asarray, empty, roll

def autocorrelate(x):
    """
    Compute the multidimensional autocorrelation of an nd array.
    input: an nd array of floats
    output: an nd array of autocorrelations
    """
    # used for transposes
    t = roll(range(x.ndim), 1)
    # pairs of indexes
    # the first is for the autocorrelation array
    # the second is the shift
    ii = [list(enumerate(range(1, s - 1))) for s in x.shape]
    # initialize the resulting autocorrelation array
    acor = empty(shape=[len(s0) for s0 in ii])
    # iterate over all combinations of directional shifts
    for i in product(*ii):
        # extract the indexes for
        # the autocorrelation array
        # and original array respectively
        i1, i2 = asarray(i).T
        x1 = x.copy()
        x2 = x.copy()
        for i0 in i2:
            # clip the unshifted array at the end
            x1 = x1[:-i0]
            # and the shifted array at the beginning
            x2 = x2[i0:]
            # prepare to do the same for
            # the next axis
            x1 = x1.transpose(t)
            x2 = x2.transpose(t)
        # normalize shifted and unshifted arrays
        x1 -= x1.mean()
        x1 /= x1.std()
        x2 -= x2.mean()
        x2 /= x2.std()
        # compute the autocorrelation directly
        # from the definition
        acor[tuple(i1)] = (x1 * x2).mean()
    return acor