complex() with arrays [duplicate] - python

I want to combine 2 parts of the same array to make a complex array:
Data[:,:,:,0] , Data[:,:,:,1]
These don't work:
x = np.complex(Data[:,:,:,0], Data[:,:,:,1])
x = complex(Data[:,:,:,0], Data[:,:,:,1])
Am I missing something? Does numpy not like performing array functions on complex numbers? Here's the error:
TypeError: only length-1 arrays can be converted to Python scalars

This seems to do what you want:
numpy.apply_along_axis(lambda args: [complex(*args)], 3, Data)
Here is another solution:
# The ellipsis is equivalent here to ":,:,:"...
numpy.vectorize(complex)(Data[...,0], Data[...,1])
And yet another simpler solution:
Data[...,0] + 1j * Data[...,1]
PS: If you want to save memory (no intermediate array):
result = 1j*Data[...,1]; result += Data[...,0]
devS' solution below is also fast.

There's of course the rather obvious:
Data[...,0] + 1j * Data[...,1]

If your real and imaginary parts are the slices along the last dimension and your array is contiguous along the last dimension, you can just do
A.view(dtype=np.complex128)
If you are using single precision floats, this would be
A.view(dtype=np.complex64)
Here is a fuller example
import numpy as np
from numpy.random import rand
# Randomly choose real and imaginary parts.
# Treat last axis as the real and imaginary parts.
A = rand(100, 2)
# Cast the array as a complex array
# Note that this will now be a 100x1 array
A_comp = A.view(dtype=np.complex128)
# To get the original array A back from the complex version
A = A.view(dtype=np.float64)
If you want to get rid of the extra dimension that stays around from the casting, you could do something like
A_comp = A.view(dtype=np.complex128)[...,0]
This works because, in memory, a complex number is really just two floating point numbers. The first represents the real part, and the second represents the imaginary part.
The view method of the array changes the dtype of the array to reflect that you want to treat two adjacent floating point values as a single complex number and updates the dimension accordingly.
This method does not copy any values in the array or perform any new computations, all it does is create a new array object that views the same block of memory differently.
That makes it so that this operation can be performed much faster than anything that involves copying values.
It also means that any changes made in the complex-valued array will be reflected in the array with the real and imaginary parts.
It may also be a little trickier to recover the original array if you remove the extra axis that is there immediately after the type cast.
Things like A_comp[...,np.newaxis].view(np.float64) do not currently work because, as of this writing, NumPy doesn't detect that the array is still C-contiguous when the new axis is added.
See this issue.
A_comp.view(np.float64).reshape(A.shape) seems to work in most cases though.

This is what your are looking for:
from numpy import array
a=array([1,2,3])
b=array([4,5,6])
a + 1j*b
->array([ 1.+4.j, 2.+5.j, 3.+6.j])

I am python novice so this may not be the most efficient method but, if I understand the intent of the question correctly, steps listed below worked for me.
>>> import numpy as np
>>> Data = np.random.random((100, 100, 1000, 2))
>>> result = np.empty(Data.shape[:-1], dtype=complex)
>>> result.real = Data[...,0]; result.imag = Data[...,1]
>>> print Data[0,0,0,0], Data[0,0,0,1], result[0,0,0]
0.0782889873474 0.156087854837 (0.0782889873474+0.156087854837j)

import numpy as np
n = 51 #number of data points
# Suppose the real and imaginary parts are created independently
real_part = np.random.normal(size=n)
imag_part = np.random.normal(size=n)
# Create a complex array - the imaginary part will be equal to zero
z = np.array(real_part, dtype=complex)
# Now define the imaginary part:
z.imag = imag_part
print(z)

I use the following method:
import numpy as np
real = np.ones((2, 3))
imag = 2*np.ones((2, 3))
complex = np.vectorize(complex)(real, imag)
# OR
complex = real + 1j*imag

If you really want to eke out performance (with big arrays), numexpr can be used, which takes advantage of multiple cores.
Setup:
>>> import numpy as np
>>> Data = np.random.randn(64, 64, 64, 2)
>>> x, y = Data[...,0], Data[...,1]
With numexpr:
>>> import numexpr as ne
>>> %timeit result = ne.evaluate("complex(x, y)")
573 µs ± 21.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Compared to fast numpy method:
>>> %timeit result = np.empty(x.shape, dtype=complex); result.real = x; result.imag = y
1.39 ms ± 5.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

That worked for me:
input:
[complex(a,b) for a,b in zip([1,2,3],[1,2,3])]
output:
[(1+4j), (2+5j), (3+6j)]

Related

Dask Distributed: Reducing Multiple Dimensions into a Distance Matrix

I want to calculate a large distance matrix, based on a higher dimensional vector. For instance, I have 1000 instances each represented by 20 vectors of length 10. The distance between each two instances is given by the mean distance between each of the 20 vectors associated to each vector. So I want to go from a 1000 by 20 by 10 matrix to a 1000 by 1000 (lower-triangular) matrix. Because these calculations can get slow, I want to use Dask distributed to block the algorithm and spread it over several CPU's. Below is how far I've gotten:
Preamble
import itertools
import random
import numpy as np
import dask.array
from dask.distributed import Client
The distance function is defined by
def distance(u, v):
result = np.empty([int((len(u)*(len(u)+1))/2)], dtype=float)
for i, j in itertools.product(range(len(u)),range(len(v))):
if j <= i:
differences = []
k = int(((i*(i+1))/2 +j-1)+1)
for x,y in itertools.product(u[i], v[j]):
difference = np.abs(np.array(x) - np.array(y)).sum(axis=1)
differences.apply(difference)
result[k] = np.mean(differences)
return result
and returns an array of length n*(n+1)/2 to describe the lower triangular matrix for this block of the distance matrix.
def distance_matrix(X):
X = np.asarray(X, dtype=object)
X = dask.array.from_array(X, (100, 20, 10)).astype(float)
print("chunksize: ", X.chunksize)
resulting_length = [int((X.chunksize[0]*(X.chunksize[0])+1)/2)]
result = dask.array.map_blocks(distance, X, X, chunks=(resulting_length), drop_axis=[1,2], dtype=float)
return result.compute()
I split up the input array in chunks and use dask.array.map_blocks to apply the distance calculation to all the blocks.
if __name__ == '__main__':
workers = 6
X = np.array([[[random.random() for _ in range(10)] for _ in range(20)] for _ in range(1000)])
client = Client(n_workers=workers)
results = similarity_matrix(X)
client.close()
print(results)
Unfortunately, this approach returns the wrong length of array at the end of the process. Would somebody to help me out here? I don't have much experience in distributed computing.
I'm a big fan of dask, but this problem is way too small to need it. The runtime issue you're seeing is because you are looping through each element in python rather than using vectorized operations in numpy.
As with many packages in python, numpy relies on highly efficient compiled code written in other, faster languages such as C to carry out array operations. When you do something like an array operation A + B numpy calls these fast routines once, and the array operations are carried out within a highly optimized C routine. There is overhead involved with making calls to other languages, but this is overwhelmed by the performance gain due to the single call to a very fast routine. If instead you loop over every element, adding cell-wise, you have a (slow) python process, and on each element, this calls the C code, which adds overhead for each element of the array. Because of this, you actually would be better off not using numpy if you're going to do this once for each element.
To implement this in a vectorized manner, you can exploit numpy's broadcasting rules to ensure the first dimensions of your two arrays expand to a new dimension. I don't totally understand what's going on in your distance function, but you could extend this simple version to do whatever you want:
In [1]: import numpy as np
In [2]: A = np.random.random((1000, 20))
...: B = np.random.random((1000, 20))
In [3]: distance = np.abs(A[:, np.newaxis, :] - B[np.newaxis, :, :]).sum(axis=-1)
In [4]: distance
Out[4]:
array([[7.22985776, 7.76185666, 5.61824886, ..., 7.62092039, 6.35189562,
7.06365986],
[5.73359499, 5.8422105 , 7.2644021 , ..., 5.72230353, 6.79390303,
5.03074007],
[7.27871151, 8.6856818 , 5.97489449, ..., 8.86620029, 7.49875638,
6.57389575],
...,
[7.67783107, 7.24419076, 4.17941596, ..., 8.68674754, 6.65078093,
5.67279811],
[7.1550136 , 6.10590227, 5.75417987, ..., 7.05953998, 5.8306628 ,
6.55112672],
[5.81748615, 6.79246838, 6.95053088, ..., 7.63994705, 6.77720511,
7.5663236 ]])
In [5]: distance.shape
Out[5]: (1000, 1000)
The performance difference can be seen clearly against a looped implementation:
In [6]: %%timeit
...: np.abs(A[:, np.newaxis, :] - B[np.newaxis, :, :]).sum(axis=-1)
...:
...:
45 ms ± 326 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [7]: %%timeit
...: distances = np.empty((1000, 1000))
...: for i in range(1000):
...: for j in range(1000):
...: distances[i, j] = np.abs(A[i, :] - B[j, :]).sum()
...:
2.42 s ± 7.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The looped version takes more than 50x as long!

Applying a mapping function to each member of an ndarray with indices as arguments

I have an ndarray representing an RGB image with the shape of (width,height,3) and I wish to replace each value with the result of some function of itself, its position and the color channel it belongs to. Doing so in three nested for loops is extremely slow, is there a way to express this as a native array operation ?
Edit: looking for an in-place solution - one that does not involve creating another O(width x height) ndarray (unless numpy has some magic that can prevent such ndarray from actually being allocated)
I'm not sure if I got your question correct! What I understood is that you want to apply a mapping on each channel of your RGB image based on their corresponding indices, if so the code below MIGHT help, since no details was available in your question.
import numpy as np
bit_depth = 8
patch_size = 32
def lut_generator(constant_multiplier):
x = np.arange(2 ** bit_depth)
y = constant_multiplier * x
return dict(zip(x, y))
rgb = np.random.randint(0, (2**bit_depth), (patch_size, patch_size, 3))
# Considering a simple lookup table without using indices.
lut = lut_generator(5)
# splitting three channels followed and their respective indices.
# You can use indexes wherever you need them.
r, g, b = np.dsplit(rgb, rgb.shape[-1])
indexes = np.arange(rgb.size).reshape(rgb.shape)
r_idx, g_idx, b_idx = np.dsplit(indexes, indexes.shape[-1])
# Apply transformation on each channel.
transformed_r = np.vectorize(lut.get)(r)
transformed_g = np.vectorize(lut.get)(g)
transformed_b = np.vectorize(lut.get)(b)
Good luck!
Take note of the qualifications in many of the comments, using numpy arithmetic directly will often be easier and faster.
import numpy as np
def test(item, ix0, ix1, ix2):
# A function with the required signature. This you customise to suit.
return item*(ix0+ix1+ix2)//202
def make_function_for(arr, f):
''' where arr is a 3D numpy array and f is a function taking four arguments.
item : the item from the array
ix0 ... ix2 : the three indices
it returns the required result from these 4 arguments.
'''
def user_f(ix0, ix1, ix2):
# np.fromfunction requires only the three indices as arguments.
ix0=ix0.astype(np.int32)
ix1=ix1.astype(np.int32)
ix2=ix2.astype(np.int32)
return f(arr[ix0, ix1, ix2], ix0, ix1, ix2)
return user_f
# user_f is a function suitable for calling in np.fromfunction
a=np.arange(100*100*3)
a.shape=100,100,3
a[...]=np.fromfunction(make_function_for(a, test), a.shape)
My test function is pretty simple so I can do it in numpy.
Using fromfunction:
%timeit np.fromfunction(make_function_for(a, test), a.shape)
5.7 ms ± 346 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Using numpy arithmetic:
def alt_func(arr):
temp=np.add.outer(np.arange(arr.shape[0]), np.arange(arr.shape[1]))
temp=np.add.outer(temp,np.arange(arr.shape[2]))
return arr*temp//202
%timeit alt_func(a)
967 µs ± 4.94 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So numpy arithmetic is almost 6 times as fast on my machine for this case.
Edited to correct my seemingly inevitable typos!

Computing distances from points to a line in vector form efficiently

Computing the distance of a point to a line in vector form is trivial.
However, I implemented it the following way, which is extremely slow:
def compute_point_distance_to_line(point):
dist = np.linalg.norm((a - point) - np.vdot((a - point), n) * n)
return dist
np.apply_along_axis(compute_point_distance_to_line, 1, xyz)
I used the notation from wikipedia, xyz shape is (2521909, 3), shape of a, n and point is consequently (3,)
I tried it the following way:
def compute_point_distance_to_line2(points):
_a = np.tile(a, (points.shape[0], 1))
_n = np.tile(n, (points.shape[0], 1))
_n_t = np.ascontiguousarray(np.swapaxes(_n, 0, 1))
diffs = _a - points
vdots_scaled = np.dot(diffs, _n_t) * n
diffs = diffs - vdots_scaled
return np.linalg.norm(diffs, axis=1)
Unfortunately, for me this results in a Memory Error when computing the dot product.
Are there any faster ways?
You can vectorize this with something like:
temp = np.subtract(a, xyz) # so we only have to compute this once
dist = np.linalg.norm(np.subtract(temp, np.multiply(np.dot(temp, n)[:, None], n)),
axis=-1)
# 220 ms ± 6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Compared with timing for your first code example above:
# 30 s ± 1.89 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
It's giving me the same result as your first code example (checked with np.array_equal()), and it is a couple of orders of magnitude faster.
Explanation
The trick is getting the np.multiply() call to work correctly by adding an extra axis to the result of np.dot(), which I do with the [:, None] slice after np.dot(). Basically None used in numpy slices is a shortcut for adding an axis, so the result of np.dot() for you should have shape (2521909,), and after the brackets with None, it will have shape (2521909, 1). The result of np.multiply() (and temp) will then have shape (2521909, 3), and we take the norm along the last axis to get the 3-dimensional distance from the line to each of your 2521909 points.
In general, try not to use operations like tile when you can use broadcasting instead, especially when speed/memory is an issue.
https://docs.scipy.org/doc/numpy-1.15.0/user/basics.broadcasting.html
With a little algebra you can write this as a single matrix vector product, followed by the norm. This may help you avoid temporary variables and save on memory.
Here is a working example. Note that in this example all the 3D vectors are column vectors so p is 3x1000 instead of 1000x3. You will have to transpose your p to plug it into this example.
import numpy as np
# define an example line with unit n
a = np.array([1,2,3])
n = np.array([4,5,6])
norm2n = np.sum(n**2)
n = n/np.sqrt(norm2n)
# get some point data p
p = np.random.randn(3,1000)
# form the projection matrix (see use of None in broadcasting at link above)
P = np.eye(3) - n[:,None]*n[None,:]
# perform the projection using matrix multiplication
projected = P.dot(a[:,None]-p)
# get the distance
dist = np.sqrt(np.sum(projected**2, axis=0))

When to use .shape and when to use .reshape?

I ran into a memory problem when trying to use .reshape on a numpy array and figured if I could somehow reshape the array in place that would be great.
I realised that I could reshape arrays by simply changing the .shape value.
Unfortunately when I tried using .shape I again got a memory error which has me thinking that it doesn't reshape in place.
I was wondering when do I use one when do I use the other?
Any help is appreciated.
If you want additional information please let me know.
EDIT:
I added my code and how the matrix I want to reshape is created in case that is important.
Change the N value depending on your memory.
import numpy as np
N = 100
a = np.random.rand(N, N)
b = np.random.rand(N, N)
c = a[:, np.newaxis, :, np.newaxis] * b[np.newaxis, :, np.newaxis, :]
c = c.reshape([N*N, N*N])
c.shape = ([N, N, N, N])
EDIT2:
This is a better representation. Apparently the transpose seems to be important as it changes the arrays from C-contiguous to F-contiguous, and the resulting multiplication in above case is contiguous while in the one below it is not.
import numpy as np
N = 100
a = np.random.rand(N, N).T
b = np.random.rand(N, N).T
c = a[:, np.newaxis, :, np.newaxis] * b[np.newaxis, :, np.newaxis, :]
c = c.reshape([N*N, N*N])
c.shape = ([N, N, N, N])
numpy.reshape will copy the data if it can't make a proper view, whereas setting the shape will raise an error instead of copying the data.
It is not always possible to change the shape of an array without copying the data. If you want an error to be raise if the data is copied, you should assign the new shape to the shape attribute of the array.
I would like to revisit this question focusing on OOP paradigm, despite memory issues presented as the problem.
When to use .shape and when to use .reshape?
OOP principle of Encapsulation
Following OOP paradigms, since shape is a property of the object numpy.array it is always advisable to call an object.method to change properties. This adheres to OOP principle of encapsulation.
Performance Issues
As for performance, there seems to be no difference.
import numpy as np
# creates an array of 1,000,000 random floats
a = np.array(np.random.rand(1_000_000))
# (1000000,)
a.shape
# using IPython to time both operations resulted in
# 201 ns ± 4.85 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit a.shape = (5_000, 200)
# 217 ns ± 0.957 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit a.reshape (5_000, 200)
Running hardware
OS : Linux 4.15.0-142-generic #146~16.04.1-Ubuntu
CPU: Intel(R) Core(TM) i3-4170 CPU # 3.70GHz 4 cores
RAM: 16BG

Is there a way to efficiently invert an array of matrices with numpy?

Normally I would invert an array of 3x3 matrices in a for loop like in the example below. Unfortunately for loops are slow. Is there a faster, more efficient way to do this?
import numpy as np
A = np.random.rand(3,3,100)
Ainv = np.zeros_like(A)
for i in range(100):
Ainv[:,:,i] = np.linalg.inv(A[:,:,i])
It turns out that you're getting burned two levels down in the numpy.linalg code. If you look at numpy.linalg.inv, you can see it's just a call to numpy.linalg.solve(A, inv(A.shape[0]). This has the effect of recreating the identity matrix in each iteration of your for loop. Since all your arrays are the same size, that's a waste of time. Skipping this step by pre-allocating the identity matrix shaves ~20% off the time (fast_inverse). My testing suggests that pre-allocating the array or allocating it from a list of results doesn't make much difference.
Look one level deeper and you find the call to the lapack routine, but it's wrapped in several sanity checks. If you strip all these out and just call lapack in your for loop (since you already know the dimensions of your matrix and maybe know that it's real, not complex), things run MUCH faster (Note that I've made my array larger):
import numpy as np
A = np.random.rand(1000,3,3)
def slow_inverse(A):
Ainv = np.zeros_like(A)
for i in range(A.shape[0]):
Ainv[i] = np.linalg.inv(A[i])
return Ainv
def fast_inverse(A):
identity = np.identity(A.shape[2], dtype=A.dtype)
Ainv = np.zeros_like(A)
for i in range(A.shape[0]):
Ainv[i] = np.linalg.solve(A[i], identity)
return Ainv
def fast_inverse2(A):
identity = np.identity(A.shape[2], dtype=A.dtype)
return array([np.linalg.solve(x, identity) for x in A])
from numpy.linalg import lapack_lite
lapack_routine = lapack_lite.dgesv
# Looking one step deeper, we see that solve performs many sanity checks.
# Stripping these, we have:
def faster_inverse(A):
b = np.identity(A.shape[2], dtype=A.dtype)
n_eq = A.shape[1]
n_rhs = A.shape[2]
pivots = zeros(n_eq, np.intc)
identity = np.eye(n_eq)
def lapack_inverse(a):
b = np.copy(identity)
pivots = zeros(n_eq, np.intc)
results = lapack_lite.dgesv(n_eq, n_rhs, a, n_eq, pivots, b, n_eq, 0)
if results['info'] > 0:
raise LinAlgError('Singular matrix')
return b
return array([lapack_inverse(a) for a in A])
%timeit -n 20 aI11 = slow_inverse(A)
%timeit -n 20 aI12 = fast_inverse(A)
%timeit -n 20 aI13 = fast_inverse2(A)
%timeit -n 20 aI14 = faster_inverse(A)
The results are impressive:
20 loops, best of 3: 45.1 ms per loop
20 loops, best of 3: 38.1 ms per loop
20 loops, best of 3: 38.9 ms per loop
20 loops, best of 3: 13.8 ms per loop
EDIT: I didn't look closely enough at what gets returned in solve. It turns out that the 'b' matrix is overwritten and contains the result in the end. This code now gives consistent results.
A few things have changed since this question was asked and answered, and now numpy.linalg.inv supports multidimensional arrays, handling them as stacks of matrices with matrix indices being last (in other words, arrays of shape (...,M,N,N)). This seems to have been introduced in numpy 1.8.0. Unsurprisingly this is by far the best option in terms of performance:
import numpy as np
A = np.random.rand(3,3,1000)
def slow_inverse(A):
"""Looping solution for comparison"""
Ainv = np.zeros_like(A)
for i in range(A.shape[-1]):
Ainv[...,i] = np.linalg.inv(A[...,i])
return Ainv
def direct_inverse(A):
"""Compute the inverse of matrices in an array of shape (N,N,M)"""
return np.linalg.inv(A.transpose(2,0,1)).transpose(1,2,0)
Note the two transposes in the latter function: the input of shape (N,N,M) has to be transposed to shape (M,N,N) for np.linalg.inv to work, then the result has to be permuted back to shape (M,N,N).
A check and timing results using IPython, on python 3.6 and numpy 1.14.0:
In [5]: np.allclose(slow_inverse(A),direct_inverse(A))
Out[5]: True
In [6]: %timeit slow_inverse(A)
19 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [7]: %timeit direct_inverse(A)
1.3 ms ± 6.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Numpy-Blas calls are not always the fastest possibility
On problems where you have to calculate lots of inverses, eigenvalues, dot-products of small 3x3 matrices or similar cases, numpy-MKL which I use can often be outperformed by quite a margin.
This external Blas routines are usually made for problems with larger matrices, for smaller ones you can write out a standard algorithm or take a look at eg. Intel IPP.
Please keep also in mind that Numpy uses C-ordered arrays by default (last dimension changes fastest).
For this example I took the code from Matrix inversion (3,3) python - hard coded vs numpy.linalg.inv and modified it a bit.
import numpy as np
import numba as nb
import time
#nb.njit(fastmath=True)
def inversion(m):
minv=np.empty(m.shape,dtype=m.dtype)
for i in range(m.shape[0]):
determinant_inv = 1./(m[i,0]*m[i,4]*m[i,8] + m[i,3]*m[i,7]*m[i,2] + m[i,6]*m[i,1]*m[i,5] - m[i,0]*m[i,5]*m[i,7] - m[i,2]*m[i,4]*m[i,6] - m[i,1]*m[i,3]*m[i,8])
minv[i,0]=(m[i,4]*m[i,8]-m[i,5]*m[i,7])*determinant_inv
minv[i,1]=(m[i,2]*m[i,7]-m[i,1]*m[i,8])*determinant_inv
minv[i,2]=(m[i,1]*m[i,5]-m[i,2]*m[i,4])*determinant_inv
minv[i,3]=(m[i,5]*m[i,6]-m[i,3]*m[i,8])*determinant_inv
minv[i,4]=(m[i,0]*m[i,8]-m[i,2]*m[i,6])*determinant_inv
minv[i,5]=(m[i,2]*m[i,3]-m[i,0]*m[i,5])*determinant_inv
minv[i,6]=(m[i,3]*m[i,7]-m[i,4]*m[i,6])*determinant_inv
minv[i,7]=(m[i,1]*m[i,6]-m[i,0]*m[i,7])*determinant_inv
minv[i,8]=(m[i,0]*m[i,4]-m[i,1]*m[i,3])*determinant_inv
return minv
#I was to lazy to modify the code from the link above more thoroughly
def inversion_3x3(m):
m_TMP=m.reshape(m.shape[0],9)
minv=inversion(m_TMP)
return minv.reshape(minv.shape[0],3,3)
#Testing
A = np.random.rand(1000000,3,3)
#Warmup to not measure compilation overhead on the first call
#You may also use #nb.njit(fastmath=True,cache=True) but this has also about 0.2s
#overhead on fist call
Ainv = inversion_3x3(A)
t1=time.time()
Ainv = inversion_3x3(A)
print(time.time()-t1)
t1=time.time()
Ainv2 = np.linalg.inv(A)
print(time.time()-t1)
print(np.allclose(Ainv2,Ainv))
Performance
np.linalg.inv: 0.36 s
inversion_3x3: 0.031 s
For loops are indeed not necessarily much slower than the alternatives and also in this case, it will not help you much. But here is a suggestion:
import numpy as np
A = np.random.rand(100,3,3) #this is to makes it
#possible to index
#the matrices as A[i]
Ainv = np.array(map(np.linalg.inv, A))
Timing this solution vs. your solution yields a small but noticeable difference:
# The for loop:
100 loops, best of 3: 6.38 ms per loop
# The map:
100 loops, best of 3: 5.81 ms per loop
I tried to use the numpy routine 'vectorize' with the hope of creating an even cleaner solution, but I'll have to take a second look into that. The change of ordering in the array A is probably the most significant change, since it utilises the fact that numpy arrays are ordered column-wise and therefor a linear readout of the data is ever so slightly faster this way.

Categories

Resources