Python, Numpy, method of adding arrays together in multiple dimensions (broadcasting) - python

I have a number of different size arrays with a common index.
For example,
Arr1 = np.arange(0, 1000, 1).reshape(100, 10)
Arr2 = np.arange(0, 500, 1).reshape(100,5)
Arr1.shape = (100, 10)
Arr2.shape = (100, 5)
I want to add these together into a new Array, Arr3 which is three dimensional. e.g.
Arr3 = Arr1 + Arr2
Arr3.shape = (100, 10, 5)
Note, in this instance the values should allign e.g.
Arr3[10, 3, 2] = Arr1[10, 3] + Arr2[10, 2]
I have been attempting to use the following method
test = Arr1.copy()
test = test[:, np.newaxis] + Arr2
Now, I've been able to make this work when adding two square matrices together.
m = np.arange(0, 100, 1)
[x, y] = np.meshgrid(x, y)
x.shape = (100, 100)
test44 = x.copy()
test44 = test44[:, np.newaxis] + x
test44.shape = (100, 100, 100)
test44[4, 3, 2] = 4
x[4, 2] = 2
x[3, 2] = 2
However, in my actual program I will not have square matrices for this issue.
In addition this method is extremely memory intensive as evidenced when you begin moving up the number of dimensions as follows.
test44 = test44[:, :, np.newaxis] + x
test44.shape = (100, 100, 100, 100)
# Note this next command will fail with a memory error on my computer.
test44 = test44[:, :, :, np.newaxis] + x
So my question has two parts:
Is it possible to create a 3D array from two differently shaped 2D array with a common "shared" axis.
Is such a method extensible at higher order dimensions?
Any assistance is greatly appreciated.

Yes what you're trying to do is called broadcasting, it's done automatically by numpy if the inputs have the right shapes. Try this:
Arr1 = Arr1.reshape((100, 10, 1))
Arr2 = Arr2.reshape((100, 1, 5))
Arr3 = Arr1 + Arr2
I've found this to be a really good introduction to broadcasting which should show you how to extend this kind of behavior to n dimensions.

Related

Python numpy reshape from (n,) to (n, 1 , 1) [duplicate]

I'm starting off with a numpy array of an image.
In[1]:img = cv2.imread('test.jpg')
The shape is what you might expect for a 640x480 RGB image.
In[2]:img.shape
Out[2]: (480, 640, 3)
However, this image that I have is a frame of a video, which is 100 frames long. Ideally, I would like to have a single array that contains all the data from this video such that img.shape returns (480, 640, 3, 100).
What is the best way to add the next frame -- that is, the next set of image data, another 480 x 640 x 3 array -- to my initial array?
A dimension can be added to a numpy array as follows:
image = image[..., np.newaxis]
Alternatively to
image = image[..., np.newaxis]
in #dbliss' answer, you can also use numpy.expand_dims like
image = np.expand_dims(image, <your desired dimension>)
For example (taken from the link above):
x = np.array([1, 2])
print(x.shape) # prints (2,)
Then
y = np.expand_dims(x, axis=0)
yields
array([[1, 2]])
and
y.shape
gives
(1, 2)
You could just create an array of the correct size up-front and fill it:
frames = np.empty((480, 640, 3, 100))
for k in xrange(nframes):
frames[:,:,:,k] = cv2.imread('frame_{}.jpg'.format(k))
if the frames were individual jpg file that were named in some particular way (in the example, frame_0.jpg, frame_1.jpg, etc).
Just a note, you might consider using a (nframes, 480,640,3) shaped array, instead.
Pythonic
X = X[:, :, None]
which is equivalent to
X = X[:, :, numpy.newaxis] and
X = numpy.expand_dims(X, axis=-1)
But as you are explicitly asking about stacking images,
I would recommend going for stacking the list of images np.stack([X1, X2, X3]) that you may have collected in a loop.
If you do not like the order of the dimensions you can rearrange with np.transpose()
You can use np.concatenate() use the axis parameter to specify the dimension that should be concatenated. If the arrays being concatenated do not have this dimension, you can use np.newaxis to indicate where the new dimension should be added:
import numpy as np
movie = np.concatenate((img1[:,np.newaxis], img2[:,np.newaxis]), axis=3)
If you are reading from many files:
import glob
movie = np.concatenate([cv2.imread(p)[:,np.newaxis] for p in glob.glob('*.jpg')], axis=3)
Consider Approach 1 with reshape method and Approach 2 with np.newaxis method that produce the same outcome:
#Lets suppose, we have:
x = [1,2,3,4,5,6,7,8,9]
print('I. x',x)
xNpArr = np.array(x)
print('II. xNpArr',xNpArr)
print('III. xNpArr', xNpArr.shape)
xNpArr_3x3 = xNpArr.reshape((3,3))
print('IV. xNpArr_3x3.shape', xNpArr_3x3.shape)
print('V. xNpArr_3x3', xNpArr_3x3)
#Approach 1 with reshape method
xNpArrRs_1x3x3x1 = xNpArr_3x3.reshape((1,3,3,1))
print('VI. xNpArrRs_1x3x3x1.shape', xNpArrRs_1x3x3x1.shape)
print('VII. xNpArrRs_1x3x3x1', xNpArrRs_1x3x3x1)
#Approach 2 with np.newaxis method
xNpArrNa_1x3x3x1 = xNpArr_3x3[np.newaxis, ..., np.newaxis]
print('VIII. xNpArrNa_1x3x3x1.shape', xNpArrNa_1x3x3x1.shape)
print('IX. xNpArrNa_1x3x3x1', xNpArrNa_1x3x3x1)
We have as outcome:
I. x [1, 2, 3, 4, 5, 6, 7, 8, 9]
II. xNpArr [1 2 3 4 5 6 7 8 9]
III. xNpArr (9,)
IV. xNpArr_3x3.shape (3, 3)
V. xNpArr_3x3 [[1 2 3]
[4 5 6]
[7 8 9]]
VI. xNpArrRs_1x3x3x1.shape (1, 3, 3, 1)
VII. xNpArrRs_1x3x3x1 [[[[1]
[2]
[3]]
[[4]
[5]
[6]]
[[7]
[8]
[9]]]]
VIII. xNpArrNa_1x3x3x1.shape (1, 3, 3, 1)
IX. xNpArrNa_1x3x3x1 [[[[1]
[2]
[3]]
[[4]
[5]
[6]]
[[7]
[8]
[9]]]]
a = np.expand_dims(a, axis=-1)
or
a = a[:, np.newaxis]
or
a = a.reshape(a.shape + (1,))
There is no structure in numpy that allows you to append more data later.
Instead, numpy puts all of your data into a contiguous chunk of numbers (basically; a C array), and any resize requires allocating a new chunk of memory to hold it. Numpy's speed comes from being able to keep all the data in a numpy array in the same chunk of memory; e.g. mathematical operations can be parallelized for speed and you get less cache misses.
So you will have two kinds of solutions:
Pre-allocate the memory for the numpy array and fill in the values, like in JoshAdel's answer, or
Keep your data in a normal python list until it's actually needed to put them all together (see below)
images = []
for i in range(100):
new_image = # pull image from somewhere
images.append(new_image)
images = np.stack(images, axis=3)
Note that there is no need to expand the dimensions of the individual image arrays first, nor do you need to know how many images you expect ahead of time.
You can use stack with the axis parameter:
img.shape # h,w,3
imgs = np.stack([img1,img2,img3,img4], axis=-1) # -1 = new axis is last
imgs.shape # h,w,3,nimages
For example: to convert grayscale to color:
>>> d = np.zeros((5,4), dtype=int) # 5x4
>>> d[2,3] = 1
>>> d3.shape
Out[30]: (5, 4, 3)
>>> d3 = np.stack([d,d,d], axis=-2) # 5x4x3 -1=as last axis
>>> d3[2,3]
Out[32]: array([1, 1, 1])
I followed this approach:
import numpy as np
import cv2
ls = []
for image in image_paths:
ls.append(cv2.imread('test.jpg'))
img_np = np.array(ls) # shape (100, 480, 640, 3)
img_np = np.rollaxis(img_np, 0, 4) # shape (480, 640, 3, 100).
This worked for me:
image = image[..., None]
This will help you add axis anywhere you want
import numpy as np
signal = np.array([[0.3394572666491664, 0.3089068053925853, 0.3516359279582483], [0.33932706934615525, 0.3094755563319447, 0.3511973743219001], [0.3394407172182317, 0.30889042266755573, 0.35166886011421256], [0.3394407172182317, 0.30889042266755573, 0.35166886011421256]])
print(signal.shape)
#(4,3)
print(signal[...,np.newaxis].shape) or signal[...:none]
#(4, 3, 1)
print(signal[:, np.newaxis, :].shape) or signal[:,none, :]
#(4, 1, 3)
there is three-way for adding new dimensions to ndarray .
first: using "np.newaxis" (something like #dbliss answer)
np.newaxis is just given an alias to None for making it easier to
understand. If you replace np.newaxis with None, it works the same
way. but it's better to use np.newaxis for being more explicit.
import numpy as np
my_arr = np.array([2, 3])
new_arr = my_arr[..., np.newaxis]
print("old shape", my_arr.shape)
print("new shape", new_arr.shape)
>>> old shape (2,)
>>> new shape (2, 1)
second: using "np.expand_dims()"
Specify the original ndarray in the first argument and the position
to add the dimension in the second argument axis.
my_arr = np.array([2, 3])
new_arr = np.expand_dims(my_arr, -1)
print("old shape", my_arr.shape)
print("new shape", new_arr.shape)
>>> old shape (2,)
>>> new shape (2, 1)
third: using "reshape()"
my_arr = np.array([2, 3])
new_arr = my_arr.reshape(*my_arr.shape, 1)
print("old shape", my_arr.shape)
print("new shape", new_arr.shape)
>>> old shape (2,)
>>> new shape (2, 1)

How can I use a 3d numpy array of indices to retrieve the corresponding values i a 4d array?

I have a 4d numpy array temperature of data with the measured temperature at points x,y,z and time t. Assuming I have an array indices with the indices where the first instance of a condition is met, say temperature < 0, how do I extract a 3d array with the first temperatures satisfying this condition? That is I'm looking for the equivalent of numpy's 1d version (import numpy as np tacitly assumed)
>>> temperatures = np.arange(10,-10,-1)
>>> ind = np.argmax(temperatures < 0)
>>> T = temperature[ind]
I have tried the analogous
In [1]: temperatures = np.random.random((11,8,5,200)) * 1000
In [2]: temperatures.shape
Out[2]: (11, 8, 5, 200)
In [3]: indices= np.argmax(temperatures > 900,axis=3)
In [4]: indices.shape
Out[4]: (11, 8, 5)
In [5]: T = temperatures[:,:,:,indices]
In [6]: T.shape
Out[6]: (11, 8, 5, 11, 8, 5)
However, the dimensions if Tis 6.
I could of course do it with a for loop:
indices = np.argmax(temperatures > 900,axis=3)
x,y,z = temperatures.shape[:-1]
T = np.zeros((x,y,z))
for indx in range(x):
for indy in range(y):
for indz in range(z):
T[indx,indy,indz] = temperatures[indx,indy,indz,indices[indx,indy,indz]]
but I'm looking for something fore elegant and more pythonic. Is there someone more skilled with numpy out there who can help me out on this?
P.S. For the sake of clarity, I'm not just looking for the temperature at these points given by indices, I'm also looking for other quantities in arrays of the same shape as temperature, e.g. the time derivative. Also, in reality the arrays are much larger then this minimal example.
Numpy advanced indexing does always work:
import numpy as np
temperatures = np.random.random((11,8,5, 200)) * 1000
indices = np.argmax(temperatures > 900, axis=3)
x, y, z = temperatures.shape[:-1]
T = temperatures[np.arange(x)[:, np.newaxis, np.newaxis],
np.arange(y)[np.newaxis, :, np.newaxis],
np.arange(z)[np.newaxis, np.newaxis, :],
indices]
As jdehesa pointed out this can be made more concise:
x, y, z = np.ogrid[:x, :y, :z]
T = temperatures[x, y, z, i]
I think you need:
axis = 3
indices = np.argmax(temperatures > 900, axis=axis)
result = np.take_along_axis(temperatures, np.expand_dims(indices, axis), axis)
result = result.squeeze(axis)

How can I append a single row dimension to an ndarray shape? [duplicate]

I'm starting off with a numpy array of an image.
In[1]:img = cv2.imread('test.jpg')
The shape is what you might expect for a 640x480 RGB image.
In[2]:img.shape
Out[2]: (480, 640, 3)
However, this image that I have is a frame of a video, which is 100 frames long. Ideally, I would like to have a single array that contains all the data from this video such that img.shape returns (480, 640, 3, 100).
What is the best way to add the next frame -- that is, the next set of image data, another 480 x 640 x 3 array -- to my initial array?
A dimension can be added to a numpy array as follows:
image = image[..., np.newaxis]
Alternatively to
image = image[..., np.newaxis]
in #dbliss' answer, you can also use numpy.expand_dims like
image = np.expand_dims(image, <your desired dimension>)
For example (taken from the link above):
x = np.array([1, 2])
print(x.shape) # prints (2,)
Then
y = np.expand_dims(x, axis=0)
yields
array([[1, 2]])
and
y.shape
gives
(1, 2)
You could just create an array of the correct size up-front and fill it:
frames = np.empty((480, 640, 3, 100))
for k in xrange(nframes):
frames[:,:,:,k] = cv2.imread('frame_{}.jpg'.format(k))
if the frames were individual jpg file that were named in some particular way (in the example, frame_0.jpg, frame_1.jpg, etc).
Just a note, you might consider using a (nframes, 480,640,3) shaped array, instead.
Pythonic
X = X[:, :, None]
which is equivalent to
X = X[:, :, numpy.newaxis] and
X = numpy.expand_dims(X, axis=-1)
But as you are explicitly asking about stacking images,
I would recommend going for stacking the list of images np.stack([X1, X2, X3]) that you may have collected in a loop.
If you do not like the order of the dimensions you can rearrange with np.transpose()
You can use np.concatenate() use the axis parameter to specify the dimension that should be concatenated. If the arrays being concatenated do not have this dimension, you can use np.newaxis to indicate where the new dimension should be added:
import numpy as np
movie = np.concatenate((img1[:,np.newaxis], img2[:,np.newaxis]), axis=3)
If you are reading from many files:
import glob
movie = np.concatenate([cv2.imread(p)[:,np.newaxis] for p in glob.glob('*.jpg')], axis=3)
Consider Approach 1 with reshape method and Approach 2 with np.newaxis method that produce the same outcome:
#Lets suppose, we have:
x = [1,2,3,4,5,6,7,8,9]
print('I. x',x)
xNpArr = np.array(x)
print('II. xNpArr',xNpArr)
print('III. xNpArr', xNpArr.shape)
xNpArr_3x3 = xNpArr.reshape((3,3))
print('IV. xNpArr_3x3.shape', xNpArr_3x3.shape)
print('V. xNpArr_3x3', xNpArr_3x3)
#Approach 1 with reshape method
xNpArrRs_1x3x3x1 = xNpArr_3x3.reshape((1,3,3,1))
print('VI. xNpArrRs_1x3x3x1.shape', xNpArrRs_1x3x3x1.shape)
print('VII. xNpArrRs_1x3x3x1', xNpArrRs_1x3x3x1)
#Approach 2 with np.newaxis method
xNpArrNa_1x3x3x1 = xNpArr_3x3[np.newaxis, ..., np.newaxis]
print('VIII. xNpArrNa_1x3x3x1.shape', xNpArrNa_1x3x3x1.shape)
print('IX. xNpArrNa_1x3x3x1', xNpArrNa_1x3x3x1)
We have as outcome:
I. x [1, 2, 3, 4, 5, 6, 7, 8, 9]
II. xNpArr [1 2 3 4 5 6 7 8 9]
III. xNpArr (9,)
IV. xNpArr_3x3.shape (3, 3)
V. xNpArr_3x3 [[1 2 3]
[4 5 6]
[7 8 9]]
VI. xNpArrRs_1x3x3x1.shape (1, 3, 3, 1)
VII. xNpArrRs_1x3x3x1 [[[[1]
[2]
[3]]
[[4]
[5]
[6]]
[[7]
[8]
[9]]]]
VIII. xNpArrNa_1x3x3x1.shape (1, 3, 3, 1)
IX. xNpArrNa_1x3x3x1 [[[[1]
[2]
[3]]
[[4]
[5]
[6]]
[[7]
[8]
[9]]]]
a = np.expand_dims(a, axis=-1)
or
a = a[:, np.newaxis]
or
a = a.reshape(a.shape + (1,))
There is no structure in numpy that allows you to append more data later.
Instead, numpy puts all of your data into a contiguous chunk of numbers (basically; a C array), and any resize requires allocating a new chunk of memory to hold it. Numpy's speed comes from being able to keep all the data in a numpy array in the same chunk of memory; e.g. mathematical operations can be parallelized for speed and you get less cache misses.
So you will have two kinds of solutions:
Pre-allocate the memory for the numpy array and fill in the values, like in JoshAdel's answer, or
Keep your data in a normal python list until it's actually needed to put them all together (see below)
images = []
for i in range(100):
new_image = # pull image from somewhere
images.append(new_image)
images = np.stack(images, axis=3)
Note that there is no need to expand the dimensions of the individual image arrays first, nor do you need to know how many images you expect ahead of time.
You can use stack with the axis parameter:
img.shape # h,w,3
imgs = np.stack([img1,img2,img3,img4], axis=-1) # -1 = new axis is last
imgs.shape # h,w,3,nimages
For example: to convert grayscale to color:
>>> d = np.zeros((5,4), dtype=int) # 5x4
>>> d[2,3] = 1
>>> d3.shape
Out[30]: (5, 4, 3)
>>> d3 = np.stack([d,d,d], axis=-2) # 5x4x3 -1=as last axis
>>> d3[2,3]
Out[32]: array([1, 1, 1])
I followed this approach:
import numpy as np
import cv2
ls = []
for image in image_paths:
ls.append(cv2.imread('test.jpg'))
img_np = np.array(ls) # shape (100, 480, 640, 3)
img_np = np.rollaxis(img_np, 0, 4) # shape (480, 640, 3, 100).
This worked for me:
image = image[..., None]
This will help you add axis anywhere you want
import numpy as np
signal = np.array([[0.3394572666491664, 0.3089068053925853, 0.3516359279582483], [0.33932706934615525, 0.3094755563319447, 0.3511973743219001], [0.3394407172182317, 0.30889042266755573, 0.35166886011421256], [0.3394407172182317, 0.30889042266755573, 0.35166886011421256]])
print(signal.shape)
#(4,3)
print(signal[...,np.newaxis].shape) or signal[...:none]
#(4, 3, 1)
print(signal[:, np.newaxis, :].shape) or signal[:,none, :]
#(4, 1, 3)
there is three-way for adding new dimensions to ndarray .
first: using "np.newaxis" (something like #dbliss answer)
np.newaxis is just given an alias to None for making it easier to
understand. If you replace np.newaxis with None, it works the same
way. but it's better to use np.newaxis for being more explicit.
import numpy as np
my_arr = np.array([2, 3])
new_arr = my_arr[..., np.newaxis]
print("old shape", my_arr.shape)
print("new shape", new_arr.shape)
>>> old shape (2,)
>>> new shape (2, 1)
second: using "np.expand_dims()"
Specify the original ndarray in the first argument and the position
to add the dimension in the second argument axis.
my_arr = np.array([2, 3])
new_arr = np.expand_dims(my_arr, -1)
print("old shape", my_arr.shape)
print("new shape", new_arr.shape)
>>> old shape (2,)
>>> new shape (2, 1)
third: using "reshape()"
my_arr = np.array([2, 3])
new_arr = my_arr.reshape(*my_arr.shape, 1)
print("old shape", my_arr.shape)
print("new shape", new_arr.shape)
>>> old shape (2,)
>>> new shape (2, 1)

Alter a 3D ndarray at the positions represented by a 2d ndarray

This is my first nontrivial use of numpy, and I'm having some trouble in one spot.
So, I have colors, a (xsize + 2, ysize + 2, 3) ndarray, and newlife, a (xsize + 2, ysize + 2) ndarray of booleans. I want to add a random value between -5 and 5 to all three values in colors at all positions where newlife is true. In other words newlife maps 2D vectors to whether or not I want to add a random value to the color in colors at that position.
I've tried a million variations on this:
colors[np.nonzero(newlife)] += (np.random.random_sample((xsize + 2,ysize + 2, 3)) * 10 - 5)
but I keep getting stuff like
ValueError: operands could not be broadcast together with shapes (589,3) (130,42,3) (589,3)
How do I do this?
I think this does what you want:
# example data
colors = np.random.randint(0, 100, (5,4,3))
newlife = np.random.randint(0, 2, (5,4), bool)
# create values to add, then mask with newlife
to_add = np.random.randint(-5,6, (5,4,3))
to_add[~newlife] = 0
# modify in place
colors += to_add
This changes the colors in-place assuming uint8 dtype. Both assumptions are not essential:
import numpy as np
n_x, n_y = 2, 2
colors = np.random.randint(5, 251, (n_x+2, n_y+2, 3), dtype=np.uint8)
mask = np.random.randint(0, 2, (n_x+2, n_y+2), dtype=bool)
n_change = np.count_nonzero(mask)
print(colors)
print(mask)
colors[mask] += np.random.randint(-5, 6, (n_change, 3), dtype=np.int8).view(np.uint8)
print(colors)
The easiest way of understanding this is to look at the shape of colors[mask].

vectorized/broadcasted Dot product of numpy arrays with different dimensions

The Problem:
I want to calculate the dot product of a very large set of data. I am able to do this in a nested for-loop, but this is way too slow.
Here is a small example:
import numpy as np
points = np.array([[0.5, 2, 3, 5.5, 8, 11], [1, 2, -1.5, 0.5, 4, 5]])
lines = np.array([[0, 2, 4, 6, 10, 10, 0, 0], [0, 0, 0, 0, 0, 4, 4, 0]])
x1 = lines[0][0:-1]
y1 = lines[1][0:-1]
L1 = np.asarray([x1, y1])
# calculate the relative length of the projection
# of each point onto each line
a = np.diff(lines)
b = points[:,:,None] - L1[:,None,:]
print(a.shape)
print(b.shape)
[rows, cols, pages] = np.shape(b)
Z = np.zeros((cols, pages))
for k in range(cols):
for l in range(pages):
Z[k][l] = a[0][l]*b[0][k][l] + a[1][l]*b[1][k][l]
N = np.linalg.norm(a, axis=0)**2
relativeProjectionLength = np.squeeze(np.asarray(Z/N))
In this example, the first two dimensions of both a and b represent the x- and y-coordinates that I need for the dot product.
The shape of a is (2,7) and b has (2,6,7). Since the dot product reduces the first dimension I would expect the result to be of the shape (6,7). How can I calculate this without the slow loops?
What I have tried:
I think that numpy.dot with correct broadcasting could do the job, however I have trouble setting up the dimensions correctly.
a = a[:, None, :]
Z = np.dot(a,b)
This on gives me the following error:
shapes (2,1,7) and (2,6,7) not aligned: 7 (dim 2) != 6 (dim 1)
You can use np.einsum -
np.einsum('ij,ikj->kj',a,b)
Explanation :
Keep the last axes aligned for the two inputs.
Sum-reduce the first from those.
Let the rest stay, which is the second axis of b.
Usual rules on whether to use einsum or stick to a loopy-dot based method apply here.
numpy.dot does not reduce the first dimension. From the docs:
For N dimensions it is a sum product over the last axis of a and the second-to-last of b:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
That is exactly what the error is telling you: it is attempting to match axis 2 in the first vector to axis 1 in the second.
You can fix this using numpy.rollaxis or better yet numpy.moveaxis. Instead of a = a[:, None, :], do
a = np.movesxis(a, 0, -1)
b = np.moveaxis(b, 0, -2)
Z = np.dot(a, b)
Better yet, you can construct your arrays to have the correct shape up front. For example, transpose lines and do a = np.diff(lines, axis=0).

Categories

Resources