Apply function to all elements in NumPy matrix [duplicate] - python

Let's say I create a 4x4 NumPy matrix. What is the best way to apply a function to all elements in the matrix, without looping through each element if possible?
import numpy as np
import numpy.matlib  # np.matlib is only available after this explicit import
def myFunction(x):
    return (x * 2) + 3
myMatrix = np.matlib.zeros((4, 4))
# What is the best way to apply myFunction to each element in myMatrix?
EDIT: The current solutions proposed work great if the function is matrix-friendly, but what if it's a function like this that deals with scalars only?
import random
def randomize():
    x = random.randrange(0, 10)
    if x < 5:
        x = -1
    return x
Would the only way be to loop through the matrix and apply the function to each scalar inside the matrix? I'm not looking for a specific solution (like how to randomize the matrix), but rather a general solution to apply a function over the matrix. Hope this helps!

This shows two possible ways of doing maths on a whole NumPy array without using an explicit loop:
import numpy as np
# Make a simple array with unique elements
m = np.arange(12).reshape((4,3))
# Looks like:
# array([[ 0, 1, 2],
# [ 3, 4, 5],
# [ 6, 7, 8],
# [ 9, 10, 11]])
# Apply formula to all elements without loop
m = m*2 + 3
# Looks like:
# array([[ 3, 5, 7],
# [ 9, 11, 13],
# [15, 17, 19],
# [21, 23, 25]])
# Define a function
def f(x):
    return (x*2) + 3
# Apply function to all elements
f(m)
# Looks like:
# array([[ 9, 13, 17],
# [21, 25, 29],
# [33, 37, 41],
# [45, 49, 53]])
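For a function that only works on scalars, as in the EDIT to the question, one option is np.vectorize. The sketch below is an illustration rather than part of the original answer: clamp_low is a hypothetical stand-in for a scalar-only function, and np.vectorize is a convenience wrapper that still calls it once per element, so it tidies the code without making it faster. Where the rule can be written with array operations, np.where does the same job in a truly vectorized way.

```python
import numpy as np

def clamp_low(x):
    # Hypothetical scalar-only function: the `if` would fail on a whole array
    if x < 5:
        return -1
    return x

m = np.arange(12).reshape((4, 3))

# np.vectorize wraps the scalar function so it accepts arrays; it still
# calls clamp_low once per element, so it is a convenience, not a speed-up
clamp_low_v = np.vectorize(clamp_low, otypes=[int])
result = clamp_low_v(m)

# The same rule expressed as a genuinely vectorized operation
expected = np.where(m < 5, -1, m)
print(np.array_equal(result, expected))
```

The result keeps the shape of m, so the same pattern applies to a matrix of any size.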

Related

sum through specific values in an array

I have an array of data-points, for example:
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
and I need to perform the following sum on the values:
However, the problem is that I need to perform this sum on each value > i. For example, using the last 3 values in the set the sum would be:
and so on up to 10.
If I run something like:
import numpy as np
x = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
alpha = 1/np.log(2)
for i in x:
    y = sum(x**(alpha)*np.log(x))
print(y)
It returns a single value of y = 247.7827060452275, whereas I need an array of values. I think I need to reverse the order of the data to achieve what I want but I'm having trouble visualising the problem (hope I explained it properly) as a whole so any suggestions would be much appreciated.
The following computes all the partial sums of the grand sum in your formula:
import numpy as np
# Generate a NumPy array of the integers 1 through 10
x = np.arange(1, 11)
alpha = 1 / np.log(2)
# Compute parts of the sum
parts = x ** alpha * np.log(x)
# Compute all partial sums
part_sums = np.cumsum(parts)
print(part_sums)
You really do not need any explicit loop or a non-NumPy operation (like sum()) here. NumPy takes care of all your needs.
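If the data has to stay in the descending order given in the question, one possible variant is to reverse the per-element terms before accumulating. This is a sketch under the assumption that the wanted quantity is the sum over the last k values for each k; the names tail_sums and loop_sums are illustrative only.

```python
import numpy as np

x = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])  # descending, as in the question
alpha = 1 / np.log(2)

# Per-element terms of the sum
parts = x ** alpha * np.log(x)

# tail_sums[k-1] is the sum over the last k values of x
tail_sums = np.cumsum(parts[::-1])

# Cross-check against an explicit loop over the last k elements
loop_sums = np.array([parts[-k:].sum() for k in range(1, len(x) + 1)])
print(np.allclose(tail_sums, loop_sums))
print(tail_sums[-1])  # grand total over all ten values
```

The last entry is the grand total, which should agree with the single value the loop in the question printed.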

Subsetting A Pytorch Tensor Using Square-Brackets

I came across a line of code used to reduce a 3D Tensor to a 2D Tensor in PyTorch. The 3D tensor x is of size torch.Size([500, 50, 1]) and this line of code:
x = x[lengths - 1, range(len(lengths))]
was used to reduce x to a 2D tensor of size torch.Size([50, 1]). lengths is also a tensor of shape torch.Size([50]) containing values.
Please can anyone explain how this works? Thank you.
After being quite stumped by the behavior, I did some more digging into this, and found that it is consistent behavior with the indexing of multi-dimensional NumPy arrays. What makes this counter-intuitive is the less obvious fact that both arrays have to have the same length, i.e. in this case len(lengths).
In fact, it works as the following:
* lengths determines the order in which you access the first dimension. For example, if you have a 1D array a = [1, 2, 3, ..., 500] and access it with the list b = [300, 200, 100], then the result is a[b] = [301, 201, 101]. (This also explains the lengths - 1 operation, which simply makes the accessed values the same as the values in b, or lengths, respectively.)
* range(len(lengths)) then simply chooses the i-th element in the i-th row. If you have a square matrix, you can interpret this as the diagonal of the matrix. Since you only access a single element for each position along the first two dimensions, this can be stored in a single dimension (thus reducing your 3D tensor to 2D). The last dimension is simply kept "as is".
If you want to play around with this, I strongly recommend changing the range() value to something longer or shorter, which will result in the following error:
IndexError: shape mismatch: indexing arrays could not be broadcast
together with shapes (x,) (y,)
where x and y are your specific length values.
To write this accessing method out in the long form to understand what happens "under the hood", also consider the below example:
import torch
x = torch.rand(500, 50, 1) # random data with the shape from the question
lengths = torch.tensor([2, 30, 1, 4]) # arbitrary example values to explore
diag = list(range(len(lengths))) # [0, 1, 2, 3]
result = []
for i, row in enumerate(lengths):
    temp_tensor = x[row, :, :] # temp_tensor.shape = [50, 1]
    temp_tensor = temp_tensor[diag[i]] # temp_tensor.shape = [1]
    result.append(temp_tensor)
# back to a single PyTorch tensor
result = torch.stack(result)
result.shape # torch.Size([4, 1])
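Since the behavior is described above as consistent with NumPy indexing, the same selection can be checked on a small NumPy array. This is only a sketch: the (6, 4, 1) shape and the values in lengths are made-up stand-ins for the [500, 50, 1] tensor, chosen so the result is easy to verify by hand.

```python
import numpy as np

# Small stand-in for the [500, 50, 1] tensor: shape (6, 4, 1)
x = np.arange(24).reshape(6, 4, 1)
lengths = np.array([2, 6, 1, 4])  # one value per column, each in 1..6

# Paired index arrays: element i of the result is x[lengths[i] - 1, i]
picked = x[lengths - 1, np.arange(len(lengths))]

# The same selection written as an explicit loop
looped = np.stack([x[lengths[i] - 1, i] for i in range(len(lengths))])

print(picked.shape)                    # (4, 1)
print(np.array_equal(picked, looped))
```

Both index arrays have length 4, so they are paired element by element, and the untouched last dimension is carried through unchanged.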
The key feature here is passing values of a tensor lengths as indices for x.
Here is a simplified example. I swapped the dimensions of the container, so the index dimension goes first:
import torch
container = torch.arange(0, 50)
container = container.reshape((5, 10))
# tensor([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
#         [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
#         [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
#         [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
#         [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
indices = torch.arange(2, 7, dtype=torch.long)
# tensor([2, 3, 4, 5, 6])
print(container[range(len(indices)), indices])
# tensor([ 2, 13, 24, 35, 46])
Note: we get one element from each row (range(len(indices)) produces sequential row numbers), with the column number given by indices[row_number].

How to use tf.map_fn to iterate over a tensor and return values of different dimensions for each iteration?

I wanted to iterate over a tensor without using Eager Execution for which I had to use tf.map_fn().
What I want to do can be shown as follows:
import tensorflow as tf
list_of_values = tf.constant([[1, 9, 65, 43], [8, 23, 21, 48], [11, 14, 98, 21], [98, 12, 32, 12]])
def value_finder(i):
    def f1():
        # Some computation with a local variable 'a' occurs
        # . . .
        return a # a = [[3, 4, 5]]
    def f2():
        # Some computation with a local variable 'b' occurs
        # . . .
        return b # b = [[7, 1, 2], [9, 3, 11]]
    return tf.cond(tf.reduce_all(tf.less(tf.slice(i, [1], [1]), tf.constant(18))), f1, f2)
value_obtained = tf.map_fn(lambda i: value_finder(i), list_of_values)
The values a and b are not of the same dimension, hence I get an error whenever I try to run my code. In my case, it is inevitable for the returned values to be of uneven dimension. Is there any other way to iterate over the tensor and get results, other than padding the values to make them of equal dimensions?

Avoid looping over arrays to get products of elements in numpy

I'm currently converting some old Fortran code into Python and looking to use NumPy-style operations as much as I can, for speed.
The code calls for finding the products of all elements of two arrays, like so:
do i=1, nx
  do j=1, ny
    si(i,j) = xarray(i) * yarray(j)
  enddo
enddo
so instead I have vectorized it like so:
for i, x in enumerate(xarray):
    si[i] = x * yarray
but is there a way to remove that loop over x and generate the whole "nx x ny" array in one line, which would presumably be faster?
I think you are looking for np.outer:
>>> nx = np.array([1,2,3,4])
>>> ny = np.array([2,3,4,5])
>>> np.outer(nx, ny)
array([[ 2, 3, 4, 5],
[ 4, 6, 8, 10],
[ 6, 9, 12, 15],
[ 8, 12, 16, 20]])
Try:
si = xarray.reshape(-1,1) * yarray
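As a quick check that both suggestions reproduce the loop from the question, here is a small sketch. The input values are made up for illustration; only np.outer and the reshape-based broadcasting come from the answers above, and the np.newaxis spelling is an equivalent alternative.

```python
import numpy as np

xarray = np.array([1.0, 2.0, 3.0])            # example data, nx = 3
yarray = np.array([10.0, 20.0, 30.0, 40.0])   # example data, ny = 4

# Row-by-row loop from the question
si_loop = np.empty((xarray.size, yarray.size))
for i, x in enumerate(xarray):
    si_loop[i] = x * yarray

si_outer = np.outer(xarray, yarray)
si_bcast = xarray.reshape(-1, 1) * yarray     # (3, 1) * (4,) broadcasts to (3, 4)
si_newax = xarray[:, np.newaxis] * yarray     # same idea without reshape

print(si_outer.shape)  # (3, 4)
print(np.array_equal(si_loop, si_outer))
```

All three one-line forms build the full nx-by-ny array without a Python-level loop.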

numpy 3d to 2d transformation based on 2d mask array

If I have an ndarray like this:
>>> a = np.arange(27).reshape(3,3,3)
>>> a
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
I know I can get the maximum along a certain axis using np.max(axis=...):
>>> a.max(axis=2)
array([[ 2, 5, 8],
[11, 14, 17],
[20, 23, 26]])
Alternatively, I could get the indices along that axis which correspond to the maximum values from:
>>> indices = a.argmax(axis=2)
>>> indices
array([[2, 2, 2],
[2, 2, 2],
[2, 2, 2]])
My question -- Given the array indices and the array a, is there an elegant way to reproduce the array returned by a.max(axis=2)?
This would probably work:
import itertools as it
import numpy as np
def apply_mask(field,indices):
    data = np.empty(indices.shape)
    #It seems highly likely that there is a more numpy-approved way to do this.
    idx = [range(i) for i in indices.shape]
    for idx_tup,zidx in zip(it.product(*idx),indices.flat):
        data[idx_tup] = field[idx_tup+(zidx,)]
    return data
But, it seems pretty hacky/inefficient. It also doesn't allow for me to use this with any axis other than the "last" axis. Is there a numpy function (or some use of magical numpy indexing) to make this work? The naive a[:,:,a.argmax(axis=2)] doesn't work.
UPDATE:
It seems the following also works (and is a little nicer):
import numpy as np
def apply_mask(field,indices):
    data = np.empty(indices.shape)
    for idx_tup,zidx in np.ndenumerate(indices):
        data[idx_tup] = field[idx_tup+(zidx,)]
    return data
I would like to do this because I would like to extract the indices based on the data in 1 array (typically using argmax(axis=...)) and use those indices to pull data out of a bunch of other (equivalently shaped) arrays. I'm open to alternative ways to accomplish this (e.g. using boolean masked arrays). However, I like the "safety" that I get using these "index" arrays. With this I am guaranteed to have the right number of elements to create a new array which looks like a 2d "slice" through the 3d field.
Here is some magic numpy indexing that will do what you want, but unfortunately it's pretty unreadable.
def apply_mask(a, indices, axis):
    magic_index = [np.arange(i) for i in indices.shape]
    magic_index = np.ix_(*magic_index)
    magic_index = magic_index[:axis] + (indices,) + magic_index[axis:]
    return a[magic_index]
or equally unreadable:
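def apply_mask(a, indices, axis):
    # Convert to a list so insert() works whether ogrid returns a list or a tuple
    magic_index = list(np.ogrid[tuple(slice(i) for i in indices.shape)])
    magic_index.insert(axis, indices)
    # Index with a tuple; indexing with a list of arrays is not supported in current NumPy
    return a[tuple(magic_index)]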
I use index_at() to create the full index:
import numpy as np
def index_at(idx, shape, axis=-1):
    if axis<0:
        axis += len(shape)
    shape = shape[:axis] + shape[axis+1:]
    index = list(np.ix_(*[np.arange(n) for n in shape]))
    index.insert(axis, idx)
    return tuple(index)
a = np.random.randint(0, 10, (3, 4, 5))
axis = 1
idx = np.argmax(a, axis=axis)
print(a[index_at(idx, a.shape, axis=axis)])
print(np.max(a, axis=axis))
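A different route, not used in the answers above, is np.take_along_axis, which exists in NumPy 1.15 and later. The sketch below assumes that function is available; the array contents and the second array b are made up to show the indices being reused on another array of the same shape.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 10, size=(3, 4, 5))
axis = 1

idx = np.argmax(a, axis=axis)  # shape (3, 5)

# take_along_axis needs idx to have the same number of dimensions as a,
# so re-insert the reduced axis, gather, then drop it again
picked = np.take_along_axis(a, np.expand_dims(idx, axis), axis=axis)
picked = np.squeeze(picked, axis=axis)

print(np.array_equal(picked, a.max(axis=axis)))

# The same idx can pull values out of another array of the same shape
b = rng.random(a.shape)
b_at_max = np.squeeze(
    np.take_along_axis(b, np.expand_dims(idx, axis), axis=axis), axis=axis
)
print(b_at_max.shape)  # (3, 5)
```

This works for any axis, which addresses the limitation to the last axis mentioned in the question.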
