Building a numpy array as a function of previous element - python

I would like to create a numpy array where the first element is a defined constant, and every next element is defined as the function of the previous element in the following way:
import numpy as np
def build_array_recursively(length, V_0, function):
returnList = np.empty(length)
returnList[0] = V_0
for i in range(1,length):
returnList[i] = function(returnList[i-1])
return returnList
d_t = 0.05
print(build_array_recursively(20, 0.3, lambda x: x-x*d_t+x*x/2*d_t*d_t-x*x*x/6*d_t*d_t*d_t))
The print method above outputs
[0.3 0.28511194 0.27095747 0.25750095 0.24470843 0.23254756 0.22098752
0.20999896 0.19955394 0.18962586 0.18018937 0.17122037 0.16269589
0.15459409 0.14689418 0.13957638 0.13262186 0.1260127 0.11973187 0.11376316]
Is there a fast way of doing this in numpy without a for loop?
If so is there a way to handle two elements before the current one, e.g. can a Fibonacci array be constructed similarly?
I found a similar question here
Is it possible to vectorize recursive calculation of a NumPy array where each element depends on the previous one?
but was not answered in general. In my example, the difference equation is difficult to solve manually.

This is faster for what you want to do. You don't have to use recursion for the function.
Calculate the element based on previous element. Append calculated element to a list, and then change the list to numpy.
def method2(length, V_0, d_t):
k = [V_0]
x = V_0
for i in range(1, length):
x = x - x * d_t + x * x / 2 * d_t * d_t - x * x * x / 6 * d_t * d_t * d_t
k.append(x)
return np.asarray(k)
print(method2(20,0.3, 0.05))
Running you existing method 10000 times takes 0.438 seconds, while method2 takes 0.097 seconds.

Using a function to make the code clearer (instead of the inline lambda):
def fn(x):
return x-x*d_t+x*x/2*d_t*d_t-x*x*x/6*d_t*d_t*d_t
And a function that combines elements of build_array_recursively and method2:
def foo1(length, V_0, function):
returnList = np.empty(length)
returnList[0] = x = V_0
for i in range(1,length):
returnList[i] = x = function(x)
return returnList
In [887]: timeit build_array_recursively(20,0.3, fn);
61.4 µs ± 63 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [888]: timeit method2(20,0.3, fn);
16.9 µs ± 103 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [889]: timeit foo1(20,0.3, fn);
13 µs ± 29.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The main time saver in method2 and foo2 is carrying over x, the last value, from one iteration to the next, rather than indexing with returnList[i-1].
The accumulation method, assigning to a preallocated array, or list append, is less important. Performance is usually similar.
Here the calculation is simple enough that details of what you do in the loop makes a big difference in the overall time.
All of these are loops. Some ufunc have a reduce (and accumulate) method, that can apply the function repeatedly to a elements of the input array. np.sum, np.cumsum, etc make use of this. But you can't do that with a general Python function.
You have to use some sort of compilation tool like numba to perform this sort of loop much faster.

Related

Fast way to find the maximum of all elements before i in a list in Python

I have an array, X, which I want to make monotonic. Specifically, I want to do
y = x.copy()
for i in range(1, len(x)):
y[i] = np.max(x[:i])
This is extremely slow for large arrays, but it feels like there should be a more efficient way of doing this. How can this operation be sped up?
The OP implementation is very inefficient because it does not use the information acquired on the previous iteration, resulting in O(n²) complexity.
def max_acc_OP(arr):
result = np.empty_like(arr)
for i in range(len(arr)):
result[i] = np.max(arr[:i + 1])
return result
Note that I fixed the OP code (which was otherwise throwing a ValueError: zero-size array to reduction operation maximum which has no identity) by allowing to get the largest value among those up to position i included.
It is easy to adapt that so that values at position i are excluded, but it leaves the first value of the result undefined, and it would never use the last value of the input. The first value of the result can be taken to be equal to the first value of the input, e.g.:
def max_acc2_OP(arr):
result = np.empty_like(arr)
result[0] = arr[0] # uses first value of input
for i in range(1, len(arr) + 1):
result[i] = np.max(arr[:i])
return result
It is equally easy to have similar adaptations for the code below, and I do not think it is particularly relevant to cover both cases of the value at position i included and excluded. Henceforth, only the "included" case is covered.
Back to the efficiency of the solotion, if you keep track of the current maximum and use that to fill your output array instead of re-computing the maximum for all value up to i at each iteration, you can easily get to O(n) complexity:
def max_acc(arr):
result = np.empty_like(arr)
curr_max = arr[0]
for i, x in enumerate(arr):
if x > curr_max:
curr_max = x
result[i] = curr_max
return result
However, this is still relatively slow because of the explicit looping.
Luckily, one can either rewrite this in vectorized form combining np.fmax() (or np.maximum() -- depending on how you need NaNs to be handled) and np.ufunc.accumulate():
np.fmax.accumulate()
# or
np.maximum.accumulate()
or, accelerating the solution above with Numba:
max_acc_nb = nb.njit(max_acc)
Some timings on relatively large inputs are provided below:
n = 10000
arr = np.random.randint(0, n, n)
%timeit -n 4 -r 4 max_acc_OP(arr)
# 97.5 ms ± 14.2 ms per loop (mean ± std. dev. of 4 runs, 4 loops each)
%timeit -n 4 -r 4 np.fmax.accumulate(arr)
# 112 µs ± 134 µs per loop (mean ± std. dev. of 4 runs, 4 loops each)
%timeit -n 4 -r 4 np.maximum.accumulate(arr)
# 88.4 µs ± 107 µs per loop (mean ± std. dev. of 4 runs, 4 loops each)
%timeit -n 4 -r 4 max_acc(arr)
# 2.32 ms ± 146 µs per loop (mean ± std. dev. of 4 runs, 4 loops each)
%timeit -n 4 -r 4 max_acc_nb(arr)
# 9.11 µs ± 3.01 µs per loop (mean ± std. dev. of 4 runs, 4 loops each)
indicating that max_acc() is already much faster than max_acc_OP(), but np.maximum.accumulate() / np.fmax.accumulate() is even faster, and max_acc_nb() comes out as the fastest. As always, it is important to take these kind of numbers with a grain of salt.
I think it will work faster to just keep track of the maximum rather than calculating it each time for each sub-array
y = x.copy()
_max = y[0]
for i in range(1, len(x)):
y[i] = _max
_max = max(x[i], _max)
you can use list comprehension for it. but you need to start your loop from 1 not from 0. either you can use like that if you want loop from 0.
y=[np.max(x[:i+1]) for i in range(len(x))]
or like that
y=[np.max(x[:i]) for i in range(1,len(x)+1)]

change python list (mutable object in general) values with something like map, but not returning new object

Python map function works both on mutables and immutables, it returns new iterable object(for python 3.x).
However I would like only to change the values (of a mutable object eg. list, np.ndarray), in the nice and consistent manner similar to the map().
Given following function:
def perturbation_bin(x, prob):
'''
Perturbation operation for binary representation. Decide independently for each bit, if it will be inverted or not.
:param x: binary chromosome
:param prob: bit inversion probability
:return: None
'''
assert x.__class__ == np.ndarray, '\nBinary chromosome must np.ndarray.'
assert x.dtype == np.int8, '\nBinary chromosome np.dtype must be np.int8.'
x[np.where(np.random.rand(len(x)) <= prob)] += 1
map(partial(np.mod, x2=2), x)
This code for each 'bit' (np.int8) randomly adds or adds not +1, changing the numpy ndarray object passed to the function.
However in the next step I would like to apply modulo to all elements of this np.ndarray.
As the code is written now, it only creates new map object inside the function.
Does such map-like function in python exist, not returning new object but altering the mutable object values instead?
Thank you
No, it does not. But you could easily make one, as long as the object you are willing to modify is a Sequence (specifically, you would need __len__, __getitem__ and __setitem__ to be defined):
def apply(obj, func):
n = len(obj)
for i in range(n):
obj[i] = func(obj[i])
return obj # or None
This is not going to be the fastest execution around, but it may be accelerated with Numba (provided that you work with suitable objects, e.g. Numpy arrays).
As for your specific problem you can replace:
map(partial(np.mod, x2=2), x)
with:
apply(x, lambda x: x % 2)
or, with a much more performant:
import numba as nb
#nb.njit
def apply_nb(obj, func):
n = len(obj)
for i in range(n):
obj[i] = func(obj[i])
return obj
#nb.njit
def mod2_nb(x):
return x % 2
apply_nb(x, mod2_nb)
Note that many NumPy functions, like np.mod() also support a out parameter:
np.mod(x, 2, out=x)
which will perform the operation in-place.
Timewise, the Numba-based operation seems to have an edge:
import numpy as np
x = np.random.randint(0, 1000, 1000)
%timeit X = x.copy(); np.mod(X, 2, out=X)
# 9.47 µs ± 22.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit X = x.copy(); apply_nb(X, mod2_nb)
# 7.51 µs ± 173 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit X = x.copy(); apply(X, lambda x: x % 2)
# 350 µs ± 4.58 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

More optimized math function using NumPy

for my class I need to write more optimized math function using NumPy. Problem is, when using NumPy my solutions are slower when native Python.
function which cubes all the elements of an array and sum them
Python:
def cube(x):
result = 0
for i in range(len(x)):
result += x[i] ** 3
return result
My, using NumPy (15-30% slower):
def cube(x):
it = numpy.nditer([x, None])
for a, b in it:
b[...] = a*a*a
return numpy.sum(it.operands[1])
Some random calculation function
Python:
def calc(x):
m = sum(x) / len(x)
result = 0
for i in range(len(x)):
result += (x[i] - m)**4
return result / len(x)
NumPy (>10x slower):
def calc(x):
m = numpy.mean(x)
result = 0
for i in range(len(x)):
result += numpy.power((x[i] - m), 4)
return result / len(x)
I don't know how to approatch this, so far I have tried random functions from NumPy
To elaborate on what has been said in comments:
Numpy's power comes from being able to do all the looping in fast c/fortran rather than slow Python looping. For example, if you have an array x and you want to calculate the square of every value in that array, you could do
y = []
for value in x:
y.append(value**2)
or even (with a list comprehension)
y = [value**2 for value in x]
but it will be much faster if you can do all the looping inside numpy with
y = x**2
(assuming x is already a numpy array).
So for your examples, the proper way to do it in numpy would be
1.
def sum_of_cubes(x):
result = 0
for i in range(len(x)):
result += x[i] ** 3
return result
def sum_of_cubes_numpy(x):
return (x**3).sum()
def calc(x):
m = sum(x) / len(x)
result = 0
for i in range(len(x)):
result += (x[i] - m)**4
return result / len(x)
def calc_numpy(x):
m = numpy.mean(x) # or just x.mean()
return numpy.sum((x - m)**4) / len(x)
Note that I've assumed that the input x is already a numpy array, not a regular Python list: if you have a list lst, you can create an array from it with arr = numpy.array(lst).
In [337]: def cube(x):
...: result = 0
...: for i in range(len(x)):
...: result += x[i] ** 3
...: return result
...:
nditer is not a good numpy iterator, at least not when used in Python level code. It's really just a stepping stone toward writing compiled code. It's docs need a better disclaimer.
In [338]: def cube1(x):
...: it = numpy.nditer([x, None])
...: for a, b in it:
...: b[...] = a*a*a
...: return numpy.sum(it.operands[1])
...:
In [339]: cube(list(range(10)))
Out[339]: 2025
In [340]: cube1(list(range(10)))
Out[340]: 2025
In [341]: cube1(np.arange(10))
Out[341]: 2025
A more direct numpy iteration:
In [342]: def cube2(x):
...: it = [a*a*a for a in x]
...: return numpy.sum(it)
...:
The better whole-array code. Just as sum can work with the whole array, the power also applies the whole.
In [343]: def cube3(x):
...: return numpy.sum(x**3)
...:
In [344]: cube2(np.arange(10))
Out[344]: 2025
In [345]: cube3(np.arange(10))
Out[345]: 2025
Doing some timings:
The list reference:
In [346]: timeit cube(list(range(1000)))
438 µs ± 9.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
The slow nditer:
In [348]: timeit cube1(np.arange(1000))
2.8 ms ± 5.65 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
The partial numpy:
In [349]: timeit cube2(np.arange(1000))
520 µs ± 20 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
I can improve its time by passing a list instead of an array. Iteration on lists is faster.
In [352]: timeit cube2(list(range(1000)))
229 µs ± 9.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
But the time for a 'pure' numpy version blows all of those out of the water:
In [350]: timeit cube3(np.arange(1000))
23.6 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The general rule is that numpy methods applied to a numpy array are fastest. But if you must loop, it's usually better to use lists.
Sometimes the pure numpy approach creates very large temporary array. Then memory management complexities can reduce performance. In such cases a modest of number of iterations on a complex task may be best.

is there an efficient way to iterate trough all ndarray elements

My problem is that i have a ndarray of shape (N,M,3) and i am trying to check each element in the array using a low level approach currently i am doing something like:
for i in range(N):
for j in range(M):
if ndarr[i][j][2] == 3:
ndarr[i][j][0] == var1
and most of the time the ndarray i need to process is very large usually around 1000x1000 .
the same idea i managed to run on cpp withing a couple of milisecondes in python it take around 30 seconds at best.
i would really appreciate if someone can explain to me or point me towards reading material on how to efficiently iterate trough ndarray
There is no way of doing that efficiently.
NumPy is a small Python wrapper around C code/datatypes. So an ndarray is actually a multidimensional C array. That means the memory address of the array is the address of the first element of the array. All other elements are stored consecutively in memory.
What your Python for loop does, is grabbing each element of the array and temporarily saving it somewhere else (as a Python datastructure) before stuffing it back in the C array. As I have said, there is no way of doing that efficiently with a Python loop.
What you could do is using Numba #jit to speed up the for loop or look after a NumPy routine, that can iterate over an array.
you can use logical indexing to do this more efficiently, it might be interesting to see how it compares with your c implementation.
import numpy as np
a = np.random.randn(2, 4, 3)
print(a)
idx = a[:, :, 2] > 0
a[idx, 0] = 9
print(a)
In Numpy you have to use vectorized-commands (usually calling a C or Cython-function) to achieve good performance. As an alternative you can use Numba or Cython.
Two possible Implementations
import numba as nb
import numpy as np
def calc_np(ndarr,var1):
ndarr[ndarr[:,:,0]==3]=var1
return ndarr
#nb.njit(parallel=True,cache=True)
def calc_nb(ndarr,var1):
for i in nb.prange(ndarr.shape[0]):
for j in range(ndarr.shape[1]):
if ndarr[i,j,2] == 3:
ndarr[i,j,0] == var1
return ndarr
Timings
ndarr=np.random.randint(low=0,high=3,size=(1000,1000,3))
%timeit calc_np(ndarr,2)
#780 µs ± 6.78 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#first call takes longer due to compilation overhead
res=calc_nb(ndarr,2)
%timeit calc(ndarr,2)
#55.2 µs ± 160 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Edit
You also use a wrong indexing method. ndarr[i] gives a 2d view on the original 3d-array, the next indexing operation [j] gives the next view on the previous view. This also has quite an impact on performance.
def calc_1(ndarr,var1):
for i in range(ndarr.shape[0]):
for j in range(ndarr.shape[1]):
if ndarr[i][j][2] == 3:
ndarr[i][j][0] == var1
return ndarr
def calc_2(ndarr,var1):
for i in range(ndarr.shape[0]):
for j in range(ndarr.shape[1]):
if ndarr[i,j,2] == 3:
ndarr[i,j,0] == var1
return ndarr
%timeit calc_1(ndarr,2)
#549 ms ± 11.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit calc_2(ndarr,2)
#321 ms ± 2.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Why is numpy faster at finding non-zero elements in a matrix?

def nonzero(a):
row,colum = a.shape
nonzero_row = np.array([],dtype=int)
nonzero_col = np.array([],dtype=int)
for i in range(0,row):
for j in range(0,colum):
if a[i,j] != 0:
nonzero_row = np.append(nonzero_row,i)
nonzero_col = np.append(nonzero_col,j)
return (nonzero_row,nonzero_col)
The above code is much slower compared to
(row,col) = np.nonzero(edges_canny)
It would be great if I can get any direction how to increase the speed and why numpy functions are much faster?
There are 2 reasons why NumPy functions can outperform Pythons types:
The values inside the array are native types, not Python types. This means NumPy doesn't need to go through the abstraction layer that Python has.
NumPy functions are (mostly) written in C. That actually only matters in some cases because a lot of Python functions are also written in C, for example sum.
In your case you also do something really inefficient: You append to an array. That's one really expensive operation in the middle of a double loop. That's an obvious (and unnecessary) bottleneck right there. You would get amazing speedups just by using lists as nonzero_row and nonzero_col and only convert them to array just before you return:
def nonzero_list_based(a):
row,colum = a.shape
a = a.tolist()
nonzero_row = []
nonzero_col = []
for i in range(0,row):
for j in range(0,colum):
if a[i][j] != 0:
nonzero_row.append(i)
nonzero_col.append(j)
return (np.array(nonzero_row), np.array(nonzero_col))
The timings:
import numpy as np
def nonzero_original(a):
row,colum = a.shape
nonzero_row = np.array([],dtype=int)
nonzero_col = np.array([],dtype=int)
for i in range(0,row):
for j in range(0,colum):
if a[i,j] != 0:
nonzero_row = np.append(nonzero_row,i)
nonzero_col = np.append(nonzero_col,j)
return (nonzero_row,nonzero_col)
arr = np.random.randint(0, 10, (100, 100))
%timeit np.nonzero(arr)
# 315 µs ± 5.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit nonzero_original(arr)
# 759 ms ± 12.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit nonzero_list_based(arr)
# 13.1 ms ± 492 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Even though it's 40 times slower than the NumPy operation it's still more than 60 times faster than your approach. There's an important lesson here: Avoid np.append whenever possible!
One additional point why NumPy outperforms alternative approaches is because they (mostly) use state-of-the art approaches (or they "import" them, i.e. BLAS/LAPACK/ATLAS/MKL) to solve the problems. These algorithms have been optimized for correctness and speed over years (if not decades). You shouldn't expect to find a faster or even comparable solution.

Categories

Resources