I have an array like:
[10 20 30 40]
And I want to build a matrix M1 like this:
10 0 0 0
20 10 0 0
30 20 10 0
40 30 20 10
My approach is to first build the following matrix M2 out of consecutive "rolls" of the array:
10 40 30 20
20 10 40 30
30 20 10 40
40 30 20 10
And then take the lower triangular part with np.tril. I am interested in efficient methods to build M2, or to build M1 directly without going through M2.
A simple way to build M2 could be:
import numpy as np
def M2_simple(a):
    a = np.asarray(a)
    return np.stack([a[np.arange(-i, len(a) - i)] for i in range(len(a))]).T
print(M2_simple(np.array([10, 20, 30, 40])))
# [[10 40 30 20]
# [20 10 40 30]
# [30 20 10 40]
# [40 30 20 10]]
After some experimenting I arrived at the following, better solution, based on advanced indexing:
def M2_indexing(a):
    a = np.asarray(a)
    r = np.arange(len(a))[np.newaxis]
    return a[np.newaxis][np.zeros_like(r), r - r.T].T
This is clearly much faster than the previous version, but it still does not seem as fast as it could be (for example, it takes an order of magnitude longer than tiling, which is not such a different operation), and it requires building large index matrices.
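For reference, the index matrix used above encodes M2[i, j] = a[(i - j) mod n]. A minimal sketch of the same idea with an explicit modulo follows; the name M2_modulo is only illustrative, and since it still builds an n-by-n index array it is not expected to be any faster:

```python
import numpy as np

def M2_modulo(a):
    # Hypothetical helper: M2[i, j] = a[(i - j) % n]
    a = np.asarray(a)
    i = np.arange(len(a))
    return a[(i[:, np.newaxis] - i) % len(a)]

m2 = M2_modulo([10, 20, 30, 40])
m1 = np.tril(m2)  # keep only the lower triangle to get M1
```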
Is there a better way to build these matrices?
EDIT:
Actually, you can build M1 directly using the same method:
import numpy as np
def M1_strided(a):
    a = np.asarray(a)
    n = len(a)
    s, = a.strides
    a0 = np.concatenate([np.zeros(len(a) - 1, a.dtype), a])
    return np.lib.stride_tricks.as_strided(
        a0, (n, n), (s, s), writeable=False)[:, ::-1]
print(M1_strided(np.array([10, 20, 30, 40])))
# [[10 0 0 0]
# [20 10 0 0]
# [30 20 10 0]
# [40 30 20 10]]
In this case the speed benefit is even greater, since you save the call to np.tril:
N = 100
a = np.square(np.arange(N))
%timeit np.tril(M2_simple(a))
# 792 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.tril(M2_indexing(a))
# 259 µs ± 9.45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.tril(M2_strided(a))
# 134 µs ± 1.68 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit M1_strided(a)
# 45.2 µs ± 583 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
You can build the M2 matrix more efficiently with np.lib.stride_tricks.as_strided:
import numpy as np
from numpy.lib.stride_tricks import as_strided
def M2_strided(a):
    a = np.asarray(a)
    n = len(a)
    s, = a.strides
    return np.lib.stride_tricks.as_strided(
        np.tile(a[::-1], 2), (n, n), (s, s), writeable=False)[::-1]
As an extra benefit, you will only use twice as much memory as the original array (as opposed to the squared size). You just need to be careful not to write to the array created like this (which should not be a problem if you are going to call np.tril later on). I added writeable=False to disallow write operations.
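A small sketch of what that memory claim means in practice, assuming a 1-D input: the n-by-n result is only a read-only view onto the 2n-element tiled buffer. Here the stride is taken from the tiled buffer (buf) rather than from a, which is an assumption on my part to stay safe when a is itself a non-contiguous view:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.array([10, 20, 30, 40])
n = len(a)
buf = np.tile(a[::-1], 2)      # 2n elements: the only real allocation
s, = buf.strides               # stride of the buffer actually being viewed
m2 = as_strided(buf, (n, n), (s, s), writeable=False)[::-1]

shares = np.shares_memory(m2, buf)   # True: no n*n copy was made
m1 = np.tril(m2)                     # np.tril returns a new, writable array
```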
A quick speed comparison with IPython:
N = 100
a = np.square(np.arange(N))
%timeit M2_simple(a)
# 693 µs ± 17.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit M2_indexing(a)
# 163 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit M2_strided(a)
# 38.3 µs ± 348 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Another one using as_strided, similar to @jdehesa's solution, but with negative strides, which saves us the flipping at the end, like so -
def strided_app2(a):
    n = len(a)
    ae = np.concatenate((np.zeros(n-1,dtype=a.dtype),a))
    s = a.strides[0]
    return np.lib.stride_tricks.as_strided(ae[n-1:],(n,n),(s,-s),writeable=False)
Sample run -
In [66]: a
Out[66]: array([10, 20, 30, 40])
In [67]: strided_app2(a)
Out[67]:
array([[10, 0, 0, 0],
[20, 10, 0, 0],
[30, 20, 10, 0],
[40, 30, 20, 10]])
Digging further
Going deeper into the timings for each step reveals that the bottleneck is the concatenation. So we can use array initialization instead, which gives an alternative that seems to be much better for large arrays, like so -
def strided_app3(a):
    n = len(a)
    ae = np.zeros(2*n-1,dtype=a.dtype)
    ae[-n:] = a
    s = a.strides[0]
    return np.lib.stride_tricks.as_strided(ae[n-1:],(n,n),(s,-s),writeable=False)
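To make the negative-stride layout a bit more concrete, here is a sketch that compares the strided view with an explicit (and slower) fancy-indexing version, using the relation M1[i, j] = ae[n - 1 + i - j]. The variable names are illustrative only, and the stride is read from the padded buffer as a precaution:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.array([10, 20, 30, 40])
n = len(a)
ae = np.zeros(2 * n - 1, dtype=a.dtype)
ae[-n:] = a                      # ae = [0, 0, 0, 10, 20, 30, 40]
s = ae.strides[0]
view = as_strided(ae[n - 1:], (n, n), (s, -s), writeable=False)

# Same matrix via explicit indexing: M1[i, j] = ae[n - 1 + i - j]
i = np.arange(n)
explicit = ae[n - 1 + i[:, np.newaxis] - i]
```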
Timings -
In [55]: a = np.random.rand(100000)
In [56]: %timeit M1_strided(a) # @jdehesa's soln
...: %timeit strided_app2(a)
...: %timeit strided_app3(a)
10000 loops, best of 3: 107 µs per loop
10000 loops, best of 3: 94.5 µs per loop
10000 loops, best of 3: 84.4 µs per loop
In [61]: a = np.random.rand(1000000)
In [62]: %timeit M1_strided(a) # @jdehesa's soln
...: %timeit strided_app2(a)
...: %timeit strided_app3(a)
100 loops, best of 3: 2.02 ms per loop
100 loops, best of 3: 2 ms per loop
1000 loops, best of 3: 1.84 ms per loop
In [63]: a = np.random.rand(10000000)
In [64]: %timeit M1_strided(a) # @jdehesa's soln
...: %timeit strided_app2(a)
...: %timeit strided_app3(a)
10 loops, best of 3: 25.2 ms per loop
10 loops, best of 3: 24.6 ms per loop
100 loops, best of 3: 13.9 ms per loop
Actually, there is a builtin for that:
>>> import scipy.linalg as sl
>>> sl.toeplitz([10,20,30,40], [0,0,0,0])
array([[10, 0, 0, 0],
[20, 10, 0, 0],
[30, 20, 10, 0],
[40, 30, 20, 10]])
Related
I have a numpy array of hex strings (e.g. ['9', 'A', 'B']) and want to convert them all to integers between 0 and 255. The only way I know how to do this is to use a for loop and append to a separate numpy array.
import numpy as np
hexArray = np.array(['9', 'A', 'B'])
intArray = np.array([])
for value in hexArray:
    intArray = np.append(intArray, [int(value, 16)])
print(intArray) # output: [ 9. 10. 11.]
Is there a better way to do this?
A vectorized way with array's-view functionality -
In [65]: v = hexArray.view(np.uint8)[::4]
In [66]: np.where(v>64,v-55,v-48)
Out[66]: array([ 9, 10, 11], dtype=uint8)
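The offsets 55 and 48 come from ASCII: ord('A') is 65, so 65 - 55 = 10, and ord('0') is 48. A hedged sketch of the same idea that also accepts lowercase digits; it views each character as one 32-bit code point (instead of taking every fourth byte) and, like the original, is only valid for single-character strings:

```python
import numpy as np

hexArray = np.array(['9', 'A', 'B', 'f'])
# One UCS4 code point per character; cast to a signed type before subtracting.
codes = hexArray.view(np.uint32).astype(np.int64)
digits = np.where(codes >= 97, codes - 87,        # 'a'..'f' -> 10..15
         np.where(codes >= 65, codes - 55,        # 'A'..'F' -> 10..15
                  codes - 48))                    # '0'..'9' -> 0..9
```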
Timings
Setup with given sample scaled-up by 1000x -
In [75]: hexArray = np.array(['9', 'A', 'B'])
In [76]: hexArray = np.tile(hexArray,1000)
# @tianlinhe's soln
In [77]: %timeit [int(value, 16) for value in hexArray]
1.08 ms ± 5.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# @FBruzzesi soln
In [78]: %timeit list(map(functools.partial(int, base=16), hexArray))
1.5 ms ± 40.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# From this post
In [79]: %%timeit
...: v = hexArray.view(np.uint8)[::4]
...: np.where(v>64,v-55,v-48)
15.9 µs ± 294 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
With the use of list comprehension:
array1=[int(value, 16) for value in hexArray]
print (array1)
output:
[9, 10, 11]
Alternative using map:
import functools
list(map(functools.partial(int, base=16), hexArray))
[9, 10, 11]
intArray = [int(hexNum, 16) for hexNum in list(hexArray)]
Try this; it uses a list comprehension to convert each hexadecimal string to an integer.
Here is another good one:
int_array = np.frompyfunc(int, 2, 1) #Can be used, for example, to add broadcasting to a built-in Python function
int_array(hexArray,16).astype(np.uint32)
If you want to know more about it: https://numpy.org/doc/stable/reference/generated/numpy.frompyfunc.html?highlight=frompyfunc#numpy.frompyfunc
Check out the speed:
import numpy as np
import functools
hexArray = np.array(['ffaa', 'aa91', 'b1f6'])
hexArray = np.tile(hexArray,1000)
def x_test(hexArray):
    v = hexArray.view(np.uint32)[::4]
    return np.where(v > 64, v - 55, v - 48)
int_array = np.frompyfunc(int, 2, 1)
%timeit -n 100 int_array(hexArray,16).astype(np.uint32)
%timeit -n 100 np.fromiter(map(functools.partial(int, base=16), hexArray),dtype=np.uint32)
%timeit -n 100 [int(value, 16) for value in hexArray]
%timeit -n 100 x_test(hexArray)
print(f'\n\n{int_array(hexArray,16).astype(np.uint32)=}\n{np.fromiter(map(functools.partial(int, base=16), hexArray),dtype=np.uint32)=}\n{[int(value, 16) for value in hexArray][:10]=}\n{x_test(hexArray)=}')
460 µs ± 2.42 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.25 ms ± 2.66 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.11 ms ± 6.56 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
16.8 µs ± 165 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
int_array(hexArray,16).astype(np.uint32)=array([65450, 43665, 45558, ..., 65450, 43665, 45558], dtype=uint32)
np.fromiter(map(functools.partial(int, base=16), hexArray),dtype=np.uint32)=array([65450, 43665, 45558, ..., 65450, 43665, 45558], dtype=uint32)
[int(value, 16) for value in hexArray][:10]=[65450, 43665, 45558, 65450, 43665, 45558, 65450, 43665, 45558, 65450]
x_test(hexArray)=array([47, 42, 43, ..., 47, 42, 43], dtype=uint32)
Divakar's answer is the fastest but, unfortunately, does not work for hex strings longer than one digit (at least for me).
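One possible way to extend the view trick to longer strings, sketched under the assumption that every string has exactly the same number of hex digits (no padding, no '0x' prefix). The function name is hypothetical and this has not been benchmarked against the timings above:

```python
import numpy as np

def hex_to_int_fixed_width(hex_array):
    # Hypothetical helper; assumes a 1-D unicode array of equal-length hex strings.
    hex_array = np.ascontiguousarray(hex_array)
    width = hex_array.dtype.itemsize // 4          # 4 bytes per UCS4 character
    codes = hex_array.view(np.uint32).reshape(-1, width).astype(np.int64)
    digits = np.where(codes >= 97, codes - 87,     # 'a'..'f'
             np.where(codes >= 65, codes - 55,     # 'A'..'F'
                      codes - 48))                 # '0'..'9'
    powers = 16 ** np.arange(width - 1, -1, -1, dtype=np.int64)
    return digits @ powers                         # combine digits positionally

result = hex_to_int_fixed_width(np.array(['ffaa', 'aa91', 'b1f6']))
```

The values match what int(value, 16) gives for the same three strings.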
Given array and n generate x
Input
array = [1 8 38 17]
n = 2
x = [1 2 3 8 9 10 38 39 40 17 18 19]
The code given by Patol75 is quick if the array and n are both small (although it does have to be changed to work for any value of n).
def f(array,n):
    return sum([[x + i for i in range(n+1)] for x in array],[])
It is slow when the array and n are both large. In that case, this is my best shot.
def g(array,n):
    temp = np.vstack([array+i for i in range(n+1)])
    return np.hstack([temp[:,i] for i in range(len(array))])
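The stack-then-interleave step in g can also be sketched with broadcasting: add a column of start values to a row of offsets and flatten row by row. The name expand_runs is hypothetical, and this variant was not part of the timings below:

```python
import numpy as np

def expand_runs(array, n):
    # Hypothetical helper: each x becomes x, x+1, ..., x+n, in order.
    array = np.asarray(array)
    return (array[:, np.newaxis] + np.arange(n + 1)).ravel()

x = expand_runs([1, 8, 38, 17], 2)
```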
Here are the timings.
For small size.
array = np.array([1, 8, 38, 17])
n = 2
%timeit f(array,n)
9.44 µs ± 547 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit g(array,n)
23.4 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
For bigger size.
array = np.random.randint(0,1000,1000)
n = 100
%timeit f(array,n)
479 ms ± 46.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit g(array,n)
2.69 ms ± 62.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
array = [1, 8, 38, 17]
n = 2
sum([[x, x + n - 1, x + n] for x in array], [])
# [1, 2, 3, 8, 9, 10, 38, 39, 40, 17, 18, 19]
For example,
I have a numpy array containing:
[1, 2, 3, 4, 5, 6]
I want to create an array as follows:
[3, 7, 11]
That is, I want to add the two neighboring elements into a new one.
I have tried the obvious:
for i in range(0, predictions.shape[0]+1, 2):
    new_pred = np.append(new_pred, (predictions[i] + predictions[i+1]) / 2)
print(predictions.shape)
(16000, 0)
print(new_pred.shape)
(87998, 0)
But the dimension of new_pred is not half of 16000.
So I am wondering is there anything wrong with my code? And is there a convenient way to implement it?
There are many different possibilities; here is one of them, neither the slowest nor the fastest:
>>> import numpy as np
>>> a = np.arange(30)
>>> a.reshape(-1, 2).sum(axis=1)
array([ 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57])
>>>
For the record (please note that we have a new fastest answer that, imho, can't be bettered at all)
In [17]: a = np.arange(10**5)
In [18]: %timeit a.reshape(-1,2).sum(axis=1)
1.08 ms ± 1.35 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [19]: %timeit [(a[i]+ a[i+1]) for i in range(0, len(a-1), 2)]
23.4 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [20]: %timeit [sum(item) for ind, item in enumerate(zip(a, a[1:])) if ind%2 == 0]
49.9 ms ± 313 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [21]: %timeit [sum(item) for item in zip(a[::2], a[1::2])]
30.2 ms ± 91 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
...
In [23]: %timeit a[::2]+a[1::2]
78.9 µs ± 79.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Use slices of ndarray:
predictions[::2] + predictions[1::2]
It is 10 times faster than the "reshape" solution:
>>> from timeit import timeit
>>> a = np.arange(10**5)
>>> timeit(lambda: a.reshape(-1,2).sum(axis=-1), number=1000)
0.785971520585008
>>> timeit(lambda: a[::2]+a[1::2], number=1000)
0.07569492445327342
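Both the slice and the reshape approaches assume the length is a multiple of the group size. If that is not guaranteed, one option is np.add.reduceat, which sums from each start index up to the next one, so a shorter trailing group is simply summed on its own. A minimal sketch, with a hypothetical helper name and no claim about its speed relative to the timings above:

```python
import numpy as np

def sum_every(a, n):
    # Hypothetical helper: sum consecutive groups of n elements.
    a = np.asarray(a)
    return np.add.reduceat(a, np.arange(0, len(a), n))

even = sum_every([1, 2, 3, 4, 5, 6], 2)
odd = sum_every(np.arange(1, 8), 2)   # trailing 7 forms its own group
```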
Another Pythonic possibility is to use a list comprehension.
Something like this for the example you posted:
import numpy as np
a = np.arange(1, 7)
res = [(a[i] + a[i+1]) for i in range(0, len(a) - 1, 2)]
print(res)
hope it helps
Using zip
zip_ls = zip(ls[::2], ls[1::2])
new_ls = [sum(item) for item in zip_ls]
Say I have an array of distances x=[1,2,1,3,3,2,1,5,1,1].
I want to get the indices from x where cumsum reaches 10, in this case, idx=[4,9].
So the cumsum restarts after the condition is met.
I can do it with a loop, but loops are slow for large arrays and I was wondering if I could do it in a vectorized way.
A fun method
sumlm = np.frompyfunc(lambda a,b:a+b if a < 10 else b,2,1)
newx=sumlm.accumulate(x, dtype=object)
newx
array([1, 3, 4, 7, 10, 2, 3, 8, 9, 10], dtype=object)
np.nonzero(newx==10)
(array([4, 9]),)
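Note that the comparison newx==10 only catches running sums that land exactly on 10. A sketch of the same frompyfunc idea with the threshold as a parameter and a >= test, so that overshooting sums are also reported; the function name is hypothetical, and this still calls a Python function per element, so it is not expected to be fast:

```python
import numpy as np

def cumsum_reset_indices(x, target):
    # Hypothetical helper: restart the running sum once it has reached target.
    step = np.frompyfunc(lambda acc, v: acc + v if acc < target else v, 2, 1)
    running = step.accumulate(np.asarray(x), dtype=object).astype(np.int64)
    return np.flatnonzero(running >= target)

idx = cumsum_reset_indices([1, 2, 1, 3, 3, 2, 1, 5, 1, 1], 10)
```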
Here's one with numba and array-initialization -
import numpy as np
from numba import njit
@njit
def cumsum_breach_numba2(x, target, result):
    total = 0
    iterID = 0
    for i,x_i in enumerate(x):
        total += x_i
        if total >= target:
            result[iterID] = i
            iterID += 1
            total = 0
    return iterID
def cumsum_breach_array_init(x, target):
    x = np.asarray(x)
    result = np.empty(len(x),dtype=np.uint64)
    idx = cumsum_breach_numba2(x, target, result)
    return result[:idx]
Timings
Including @piRSquared's solutions and using the benchmarking setup from the same post -
In [58]: np.random.seed([3, 1415])
...: x = np.random.randint(100, size=1000000).tolist()
# @piRSquared soln1
In [59]: %timeit list(cumsum_breach(x, 10))
10 loops, best of 3: 73.2 ms per loop
# @piRSquared soln2
In [60]: %timeit cumsum_breach_numba(np.asarray(x), 10)
10 loops, best of 3: 69.2 ms per loop
# From this post
In [61]: %timeit cumsum_breach_array_init(x, 10)
10 loops, best of 3: 39.1 ms per loop
Numba : Appending vs. array-initialization
For a closer look at how the array-initialization helps, which seems to be the big difference between the two numba implementations, let's time these on the array data, as the array data creation was in itself heavy on runtime and they both depend on it -
In [62]: x = np.array(x)
In [63]: %timeit cumsum_breach_numba(x, 10)# with appending
10 loops, best of 3: 31.5 ms per loop
In [64]: %timeit cumsum_breach_array_init(x, 10)
1000 loops, best of 3: 1.8 ms per loop
To force the output to have its own memory space, we can make a copy. It won't change things in a big way though -
In [65]: %timeit cumsum_breach_array_init(x, 10).copy()
100 loops, best of 3: 2.67 ms per loop
Loops are not always bad (especially when you need one). Also, there is no tool or algorithm that will make this quicker than O(n). So let's just make a good loop.
Generator Function
def cumsum_breach(x, target):
    total = 0
    for i, y in enumerate(x):
        total += y
        if total >= target:
            yield i
            total = 0
list(cumsum_breach(x, 10))
[4, 9]
Just In Time compiling with Numba
Numba is a third party library that needs to be installed.
Numba can be persnickety about what features are supported. But this works.
Also, as pointed out by Divakar, Numba performs better with arrays
from numba import njit
@njit
def cumsum_breach_numba(x, target):
    total = 0
    result = []
    for i, y in enumerate(x):
        total += y
        if total >= target:
            result.append(i)
            total = 0
    return result
cumsum_breach_numba(x, 10)
Testing the Two
Because I felt like it ¯\_(ツ)_/¯
Setup
np.random.seed([3, 1415])
x0 = np.random.randint(100, size=1_000_000)
x1 = x0.tolist()
Accuracy
i0 = cumsum_breach_numba(x0, 200_000)
i1 = list(cumsum_breach(x1, 200_000))
assert i0 == i1
Time
%timeit cumsum_breach_numba(x0, 200_000)
%timeit list(cumsum_breach(x1, 200_000))
582 µs ± 40.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
64.3 ms ± 5.66 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Numba was on the order of 100 times faster.
For a truer apples-to-apples test, I convert the list to a NumPy array inside the timed call
%timeit cumsum_breach_numba(np.array(x1), 200_000)
%timeit list(cumsum_breach(x1, 200_000))
43.1 ms ± 202 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
62.8 ms ± 327 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Which brings them to about even.
I have an N-by-M array, and for each of its entries I need to do some NumPy operations and put the result there.
Right now, I'm doing it the naive way with a double loop:
import numpy as np
N = 10
M = 11
K = 100
result = np.zeros((N, M))
is_relevant = np.random.rand(N, M, K) > 0.5
weight = np.random.rand(3, 3, K)
values1 = np.random.rand(3, 3, K)
values2 = np.random.rand(3, 3, K)
for i in range(N):
    for j in range(M):
        selector = is_relevant[i, j, :]
        result[i, j] = np.sum(
            np.multiply(
                np.multiply(
                    values1[..., selector],
                    values2[..., selector]
                ), weight[..., selector]
            )
        )
Since all the in-loop operations are simply NumPy operations, I think there must be a way to do this faster or loop-free.
We can use a combination of np.einsum and np.tensordot -
a = np.einsum('ijk,ijk,ijk->k',values1,values2,weight)
out = np.tensordot(a,is_relevant,axes=(0,2))
Alternatively, with one einsum call -
np.einsum('ijk,ijk,ijk,lmk->lm',values1,values2,weight,is_relevant)
And with np.dot and einsum -
is_relevant.dot(np.einsum('ijk,ijk,ijk->k',values1,values2,weight))
Also, try setting the optimize flag in np.einsum to True, which can let it use BLAS.
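The reason these work is that selecting with a boolean mask and summing is the same as multiplying by the mask (as 0/1) and summing over k, so the 3x3 part can be reduced once into a length-K vector. A small self-contained check against the double loop, with made-up sizes and a seeded generator that are my own assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 4, 5, 20
is_relevant = rng.random((N, M, K)) > 0.5
weight = rng.random((3, 3, K))
values1 = rng.random((3, 3, K))
values2 = rng.random((3, 3, K))

# Reduce the 3x3 axes once: per_k[k] = sum_ab v1 * v2 * w
per_k = np.einsum('ijk,ijk,ijk->k', values1, values2, weight)
fast = is_relevant.dot(per_k)          # shape (N, M)

# Reference: the original double loop with boolean selection
slow = np.zeros((N, M))
for i in range(N):
    for j in range(M):
        sel = is_relevant[i, j, :]
        slow[i, j] = np.sum(values1[..., sel] * values2[..., sel] * weight[..., sel])
```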
Timings -
In [146]: %%timeit
...: a = np.einsum('ijk,ijk,ijk->k',values1,values2,weight)
...: out = np.tensordot(a,is_relevant,axes=(0,2))
10000 loops, best of 3: 121 µs per loop
In [147]: %timeit np.einsum('ijk,ijk,ijk,lmk->lm',values1,values2,weight,is_relevant)
1000 loops, best of 3: 851 µs per loop
In [148]: %timeit np.einsum('ijk,ijk,ijk,lmk->lm',values1,values2,weight,is_relevant,optimize=True)
1000 loops, best of 3: 347 µs per loop
In [156]: %timeit is_relevant.dot(np.einsum('ijk,ijk,ijk->k',values1,values2,weight))
10000 loops, best of 3: 58.6 µs per loop
Very large arrays
For very large arrays, we can leverage numexpr to make use of multi-cores -
import numexpr as ne
a = np.einsum('ijk,ijk,ijk->k',values1,values2,weight)
out = np.empty((N, M))
for i in range(N):
    for j in range(M):
        out[i,j] = ne.evaluate('sum(is_relevant_ij*a)',{'is_relevant_ij':is_relevant[i,j], 'a':a})
Another very simple option is just:
result = (values1 * values2 * weight * is_relevant[:, :, np.newaxis, np.newaxis]).sum((2, 3, 4))
Divakar's last solution is faster than this though. Timings for comparison:
%timeit np.tensordot(np.einsum('ijk,ijk,ijk->k',values1,values2,weight),is_relevant,axes=(0,2))
# 30.9 µs ± 1.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.einsum('ijk,ijk,ijk,lmk->lm',values1,values2,weight,is_relevant)
# 379 µs ± 486 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.einsum('ijk,ijk,ijk,lmk->lm',values1,values2,weight,is_relevant,optimize=True)
# 145 µs ± 1.89 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit is_relevant.dot(np.einsum('ijk,ijk,ijk->k',values1,values2,weight))
# 15 µs ± 124 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit (values1 * values2 * weight * is_relevant[:, :, np.newaxis, np.newaxis]).sum((2, 3, 4))
# 152 µs ± 1.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)