How to iterate faster over a Python numpy.ndarray with 2 dimensions - python

So, I simply want to make this faster:
for x in range(matrix.shape[0]):
    for y in range(matrix.shape[1]):
        if matrix[x][y] == 2 or matrix[x][y] == 3 or matrix[x][y] == 4 or matrix[x][y] == 5 or matrix[x][y] == 6:
            if x not in heights:
                heights.append(x)
Simply iterate over a 2D matrix (usually around 18x18 or 22x22) and collect each row index x whose row contains one of those values. But it's kinda slow, and I wonder what the fastest way to do this is.
Thank you very much!

For a NumPy-based approach, you can do:
np.flatnonzero(((a>=2) & (a<=6)).any(1))
# array([1, 2, 6], dtype=int64)
Where:
a = np.random.randint(0,30,(7,7))
print(a)
array([[25, 27, 28, 21, 18, 7, 26],
[ 2, 18, 21, 13, 27, 26, 2],
[23, 27, 18, 7, 4, 6, 13],
[25, 20, 19, 15, 8, 22, 0],
[27, 23, 18, 22, 25, 17, 15],
[19, 12, 12, 9, 29, 23, 21],
[16, 27, 22, 23, 8, 3, 11]])
Timings on a larger array:
a = np.random.randint(0,30, (1000,1000))
%%timeit
heights = []
for x in range(a.shape[0]):
    for y in range(a.shape[1]):
        if a[x][y] == 2 or a[x][y] == 3 or a[x][y] == 4 or a[x][y] == 5 or a[x][y] == 6:
            if x not in heights:
                heights.append(x)
# 3.17 s ± 59.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
yatu = np.flatnonzero(((a>=2) & (a<=6)).any(1))
# 965 µs ± 11.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
np.allclose(yatu, heights)
# True
Vectorizing with NumPy yields roughly a 3200x speedup.

It looks like you want to find if 2, 3, 4, 5 or 6 appear in the matrix.
You can use np.isin() to create a matrix of true/false values, then use that as an indexer:
>>> arr = np.array([1,2,3,4,4,0]).reshape(2,3)
>>> arr[np.isin(arr, [2,3,4,5,6])]
array([2, 3, 4, 4])
Optionally, turn that into a plain Python set() for faster membership (in) lookups and no duplicates.
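For example, a minimal sketch of that set conversion, using the arr from the example above:
>>> present = set(arr[np.isin(arr, [2, 3, 4, 5, 6])].tolist())
>>> present
{2, 3, 4}
>>> 4 in present
True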
To get the positions in the array where those numbers appear, use argwhere:
>>> np.argwhere(np.isin(arr, [2,3,4,5,6]))
array([[0, 1],
       [0, 2],
       [1, 0],
       [1, 1]])
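And if, as in the original question, you only need the row indices where any of those values occur, a small sketch (assuming the same arr) combines np.isin() with any() along axis 1, mirroring the first answer:
>>> np.flatnonzero(np.isin(arr, [2, 3, 4, 5, 6]).any(axis=1))
array([0, 1])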

Related

Intersection of two lists in numba

I would like to know the fastest way to compute the intersection of two lists within a numba function. Just for clarification: an example of the intersection of two lists:
Input :
lst1 = [15, 9, 10, 56, 23, 78, 5, 4, 9]
lst2 = [9, 4, 5, 36, 47, 26, 10, 45, 87]
Output :
[9, 10, 4, 5]
The problem is that this needs to be computed within the numba function, and therefore e.g. sets cannot be used. Do you have an idea?
My current code is very basic. I assume that there is room for improvement.
@nb.njit
def intersection(lst1, lst2):
    result = []
    for element1 in lst1:
        for element2 in lst2:
            if element1 == element2:
                result.append(element1)
    ....
Since numba compiles and runs your code as machine code, you're probably close to the best you can do for such a simple operation.
I ran some benchmarks below
@nb.njit
def loop_intersection(lst1, lst2):
    result = []
    for element1 in lst1:
        for element2 in lst2:
            if element1 == element2:
                result.append(element1)
    return result

@nb.njit
def set_intersect(lst1, lst2):
    return set(lst1).intersection(set(lst2))
Results
loop_intersection
40.4 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
set_intersect
42 µs ± 6.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
I played with this a bit to try to learn something, realizing that the answer has already been given. When I run the accepted answer I get a return value of [9, 10, 5, 4, 9]. It wasn't clear to me whether the repeated 9 was acceptable or not.
Assuming it's OK, I ran a trial using a list comprehension to see if it made any difference. My results:
from numba import jit

def createLists():
    l1 = [15, 9, 10, 56, 23, 78, 5, 4, 9]
    l2 = [9, 4, 5, 36, 47, 26, 10, 45, 87]
    return l1, l2

@jit
def listComp():
    l1, l2 = createLists()
    return [i for i in l1 for j in l2 if i == j]

%timeit listComp()
5.84 µs ± 10.5 ns per loop
Or, if you can use NumPy, this code is even faster, removes the duplicate 9, and is faster still with the Numba signature.
import numpy as np
from numba import jit, int64

@jit(int64[:](int64[:], int64[:]))
def JitListComp(l1, l2):
    l3 = np.array([i for i in l1 for j in l2 if i == j])
    return np.unique(l3)

@jit
def CreateList():
    l1 = np.array([15, 9, 10, 56, 23, 78, 5, 4, 9])
    l2 = np.array([9, 4, 5, 36, 47, 26, 10, 45, 87])
    return JitListComp(l1, l2)

CreateList()
Out[39]: array([ 4,  5,  9, 10])

%timeit CreateList()
1.71 µs ± 10.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
You can use set operations for this:
def intersection(lst1, lst2):
    return list(set(lst1) & set(lst2))
Then simply call the function intersection(lst1, lst2). This will be the easiest way.
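A quick usage sketch with the lists from the question (the ordering of the result is arbitrary, since sets are unordered; and, as the question notes, this plain-Python approach may not work inside an njit-compiled function):
lst1 = [15, 9, 10, 56, 23, 78, 5, 4, 9]
lst2 = [9, 4, 5, 36, 47, 26, 10, 45, 87]
print(intersection(lst1, lst2))  # e.g. [9, 10, 4, 5], in some order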

Reshaping data in Python (List/array)

I have the following NumPy array.
I want to reshape it to be (3, 5, 3).
So it must be like:
[
  [
    [1, 6, 11],
    [2, 7, 12],
    [3, 8, 13],
    [4, 9, 14],
    [5, 10, 15]
  ], .......
]
I tried reshape(3,5,3) but it doesn't give the wanted result.
Your input array is of shape (3, 3, 5) and you want it reshaped to (3, 5, 3). There are many ways of doing this; below are some, as also mentioned in the comments:
First would be to use numpy.reshape() which accepts newshape as a parameter:
In [77]: arr = np.arange(3*3*5).reshape(3, 3, 5)
# reshape to desired shape
In [78]: arr = arr.reshape((3, 5, 3))
In [79]: arr.shape
Out[79]: (3, 5, 3)
Or you can use numpy.transpose() as in:
In [80]: arr = np.arange(3*3*5).reshape(3, 3, 5)
In [81]: arr.shape
Out[81]: (3, 3, 5)
# now, we want to move the last axis which is 2 to second position
# thus our new shape would be `(3, 5, 3)`
In [82]: arr = np.transpose(arr, (0, 2, 1))
In [83]: arr.shape
Out[83]: (3, 5, 3)
Another way would be to use numpy.moveaxis():
In [87]: arr = np.arange(3*3*5).reshape(3, 3, 5)
# move the last axis (-1) to 2nd position (1)
In [88]: arr = np.moveaxis(arr, -1, 1)
In [89]: arr.shape
Out[89]: (3, 5, 3)
Yet another way would be to just swap the axes using numpy.swapaxes():
In [90]: arr = np.arange(3*3*5).reshape(3, 3, 5)
In [91]: arr.shape
Out[91]: (3, 3, 5)
# swap the position of ultimate and penultimate axes
In [92]: arr = np.swapaxes(arr, -1, 1)
In [93]: arr.shape
Out[93]: (3, 5, 3)
Choose whichever is more intuitive to you since all approaches return a new view of the desired shape.
Although all of the above return a view, there are some timing differences. So, the preferred way of doing this (for efficiency) would be:
In [124]: arr = np.arange(3*3*5).reshape(3, 3, 5)
In [125]: %timeit np.swapaxes(arr, -1, 1)
456 ns ± 6.79 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [126]: %timeit np.transpose(arr, (0, 2, 1))
458 ns ± 6.93 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [127]: %timeit np.reshape(arr, (3, 5, 3))
635 ns ± 9.06 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [128]: %timeit np.moveaxis(arr, -1, 1)
3.42 µs ± 79.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
numpy.swapaxes() and numpy.transpose() take almost the same time, with numpy.reshape() being a bit slower and numpy.moveaxis the slowest of all. So it'd be wise to use either swapaxes or transpose.
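One caveat, as a hedged aside: reshape() keeps the flattened (C-order) element sequence, so while it returns the requested shape, its contents differ from the transpose/swapaxes/moveaxis results. A quick check, assuming the same arr as above:
In [129]: arr = np.arange(3*3*5).reshape(3, 3, 5)
In [130]: np.array_equal(arr.reshape(3, 5, 3), np.transpose(arr, (0, 2, 1)))
Out[130]: False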
I found a way of doing it using a list comprehension and NumPy transpose.
Code:
import numpy as np
database = [
    [
        [1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10],
        [11, 12, 13, 14, 15]
    ],
    [
        [16, 17, 18, 19, 20],
        [21, 22, 23, 24, 25],
        [26, 27, 28, 29, 30]
    ],
    [
        [31, 32, 33, 34, 35],
        [36, 37, 38, 39, 40],
        [41, 42, 43, 44, 45]
    ]
]
ans = [np.transpose(data) for data in database]
print(ans)
Output:
[array([[ 1, 6, 11],
[ 2, 7, 12],
[ 3, 8, 13],
[ 4, 9, 14],
[ 5, 10, 15]]),
array([[16, 21, 26],
[17, 22, 27],
[18, 23, 28],
[19, 24, 29],
[20, 25, 30]]),
array([[31, 36, 41],
[32, 37, 42],
[33, 38, 43],
[34, 39, 44],
[35, 40, 45]])]

Selecting Random Windows from Multidimensional Numpy Array Rows

I have a large array where each row is a time series and thus needs to stay in order.
I want to select a random window of a given size for each row.
Example:
>>> import numpy as np
>>> arr = np.array(range(42)).reshape(6, 7)
>>> arr
array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27],
       [28, 29, 30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39, 40, 41]])
>>> # What I want to do:
>>> select_random_windows(arr, window_size=3)
array([[ 1,  2,  3],
       [11, 12, 13],
       [14, 15, 16],
       [22, 23, 24],
       [38, 39, 40]])
What an ideal solution would look like to me:
def select_random_windows(arr, window_size):
    offsets = np.random.randint(0, arr.shape[0] - window_size, size=arr.shape[1])
    return arr[:, offsets: offsets + window_size]
But unfortunately this does not work
What I'm going with right now is terribly slow:
def select_random_windows(arr, window_size):
    result = []
    offsets = np.random.randint(0, arr.shape[0] - window_size, size=arr.shape[1])
    for row, offset in enumerate(offsets):
        result.append(arr[row][offset: offset + window_size])
    return np.array(result)
Sure, I could do the same with a list comprehension (and get a minimal speed boost), but I was wondering whether there is some super smart numpy vectorized way to do this.
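For reference, a sketch of the list-comprehension variant mentioned above (the helper name is made up here; it assumes one independent random start per row, as in the question):
def select_random_windows_listcomp(arr, window_size):
    # one random start index per row, then slice each row independently
    offsets = np.random.randint(0, arr.shape[1] - window_size + 1, size=arr.shape[0])
    return np.array([row[o:o + window_size] for row, o in zip(arr, offsets)])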
Here's one leveraging np.lib.stride_tricks.as_strided -
def random_windows_per_row_strided(arr, W=3):
    idx = np.random.randint(0, arr.shape[1] - W + 1, arr.shape[0])
    strided = np.lib.stride_tricks.as_strided
    m, n = arr.shape
    s0, s1 = arr.strides
    windows = strided(arr, shape=(m, n - W + 1, W), strides=(s0, s1, s1))
    return windows[np.arange(len(idx)), idx]
Runtime test on a bigger array with 100,000 rows -
In [469]: arr = np.random.rand(100000,100)
# @Psidom's soln
In [470]: %timeit select_random_windows(arr, window_size=3)
100 loops, best of 3: 7.41 ms per loop
In [471]: %timeit random_windows_per_row_strided(arr, W=3)
100 loops, best of 3: 6.84 ms per loop
# @Psidom's soln
In [472]: %timeit select_random_windows(arr, window_size=30)
10 loops, best of 3: 26.8 ms per loop
In [473]: %timeit random_windows_per_row_strided(arr, W=30)
100 loops, best of 3: 9.65 ms per loop
# @Psidom's soln
In [474]: %timeit select_random_windows(arr, window_size=50)
10 loops, best of 3: 41.8 ms per loop
In [475]: %timeit random_windows_per_row_strided(arr, W=50)
100 loops, best of 3: 10 ms per loop
In the return statement, change the slicing to advanced indexing; you also need to fix the sampling code a little bit:
def select_random_windows(arr, window_size):
    offsets = np.random.randint(0, arr.shape[1] - window_size + 1, size=arr.shape[0])
    return arr[np.arange(arr.shape[0])[:,None], offsets[:,None] + np.arange(window_size)]
select_random_windows(arr, 3)
# array([[ 4,  5,  6],
#        [ 7,  8,  9],
#        [17, 18, 19],
#        [25, 26, 27],
#        [31, 32, 33],
#        [39, 40, 41]])

Most Efficient Code for clipping the elements of a vector until it reaches a sum

Suppose we have an integer vector that sums to S1. I would like to take this vector and produce another vector that sums to S2 < S1. I'd like to do this by subtracting off the (first) max element one by one until the sum is down to S2.
E.g. clip_to_sum([1,4,8,3], total=10) == [1, 3, 3, 3].
Easy code which does this is:
def clip_to_sum(vec, total):
    new_vec = np.array(vec)
    current_total = np.sum(vec)
    while current_total > total:
        i = np.argmax(new_vec)
        new_vec[i] -= 1
        current_total -= 1
    return new_vec
However, it's obviously horribly inefficient, because we only subtract one from one element at a time, no matter how far ahead the leading element is.
Anyone have a nifty trick for doing this efficiently?
Edit: An input vector that already sums to less than the requested total can be left unchanged, so for example clip_to_sum([1,4,8,3], 20) should be [1,4,8,3].
Edit: For those wondering what this is for, it's for the mundane task of determining column widths in a fixed-width table!
You are basically going Robin Hood there and clipping off the values that are above the global average w.r.t. the total, until the global sum reaches the threshold. Using that theory, we will start off with a baseline number and then loop through, like so -
def clip_until_sum(vec, total):
    # Get array version
    a = np.asarray(vec)
    if a.sum() <= total:
        return a

    # Baseline number
    b = int(total/float(len(a)))

    # Setup output
    out = np.where(a > b, b, a)
    s = out.sum()

    # Loop to shift up values starting from baseline
    while s < total:
        idx = np.flatnonzero(a > out)
        dss = total - s
        out[idx[max(0, len(idx) - dss):]] += 1
        s = out.sum()
    return out
Sample runs -
Set #1 :
In [868]: clip_until_sum([1,4,8,3], 10)
Out[868]: array([1, 3, 3, 3])
In [869]: clip_until_sum([1,4,8,3], 11)
Out[869]: array([1, 3, 4, 3])
In [870]: clip_until_sum([1,4,8,3], 12)
Out[870]: array([1, 4, 4, 3])
In [871]: clip_until_sum([1,4,8,3], 13)
Out[871]: array([1, 4, 5, 3])
In [872]: clip_until_sum([1,4,8,3], 14)
Out[872]: array([1, 4, 6, 3])
In [873]: clip_until_sum([1,4,8,3], 15)
Out[873]: array([1, 4, 7, 3])
In [874]: clip_until_sum([1,4,8,3], 16)
Out[874]: array([1, 4, 8, 3])
Set #2 :
In [875]: clip_until_sum([1,4,8,3,5,6], 12)
Out[875]: array([1, 2, 2, 2, 2, 3])
Runtime test and verification -
In [164]: np.random.seed(0)
# Assuming 10000 elems with max of 1000 and total as half of sum
In [165]: vec = np.random.randint(0, 1000, size=10000)
In [167]: total = vec.sum()//2
In [168]: np.allclose(clip_to_sum(vec, total), clip_until_sum(vec, total))
Out[168]: True
In [169]: %timeit clip_to_sum(vec, total)
1 loop, best of 3: 19.1 s per loop
In [170]: %timeit clip_until_sum(vec, total)
100 loops, best of 3: 2.8 ms per loop
# @Warren Weckesser's soln
In [171]: %timeit limit_sum1(vec, total)
1000 loops, best of 3: 733 µs per loop
Here are two functions that compute the clipped array. The first, limit_sum1, will not give exactly the same result as your function, because it, in effect, makes different choices of which "max" to decrease when the maximum occurs multiple times in the input vector. That is, if vec = [4, 4, 4] and total = 11, there are three possible results: [3, 4, 4], [4, 3, 4], and [4, 4, 3]. Your function gives [3, 4, 4], while limit_sum1 gives [4, 4, 3].
For very small input vectors, like the examples in the question, limit_sum2 is generally faster than limit_sum1, but neither is faster than your clip_to_sum. For somewhat longer input vectors with more varied input range, both are faster than clip_to_sum, and for very long input vectors, limit_sum1 is much faster. Examples with timing are below.
def limit_sum1(vec, total):
    x = np.asarray(vec)
    delta = x.sum() - total
    if delta <= 0:
        return x
    i = np.argsort(x)
    # j is the inverse of the sorting permutation i.
    j = np.empty_like(i)
    j[i] = np.arange(len(x))[::-1]
    y = np.zeros(len(x)+1, dtype=int)
    y[1:] = x[i]
    d = np.diff(y)[::-1]
    y = y[::-1]
    wd = d * np.arange(1, len(d)+1)
    cs = wd.cumsum()
    k = np.searchsorted(cs, delta, side='right')
    if k > 0:
        y[:k] -= d[:k][::-1].cumsum()[::-1]
        delta = delta - cs[k-1]
    q, r = divmod(delta, k+1)
    y[:k+1] -= q
    y[:r] -= 1
    x2 = y[j]
    return x2
def limit_sum2(vec, total):
    a = np.array(vec)
    while a.sum() > total:
        amax = a.max()
        i = np.where(a == amax)[0]
        if len(i) < len(a):
            nextmax = a[a < amax].max()
        else:
            nextmax = 0
        clip_to_nextmax_delta = len(i)*(amax - nextmax)
        diff = a.sum() - total
        if clip_to_nextmax_delta > diff:
            q, r = divmod(diff, len(i))
            a[i] -= q
            a[i[:r]] -= 1
            break
        else:
            # Clip all the current max values to nextmax.
            a[i] = nextmax
    return a
Examples
In [1388]: vec = np.array([1, 4, 8, 3])
limit_sum1, limit_sum2 and clip_to_sum all give the same result:
In [1389]: limit_sum1(vec, total=10)
Out[1389]: array([1, 3, 3, 3])
In [1390]: limit_sum2(vec, total=10)
Out[1390]: array([1, 3, 3, 3])
In [1391]: clip_to_sum(vec, total=10)
Out[1391]: array([1, 3, 3, 3])
clip_to_sum is faster with this small vector.
In [1392]: %timeit limit_sum1(vec, total=10)
33.1 µs ± 272 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [1393]: %timeit limit_sum2(vec, total=10)
24.6 µs ± 138 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [1394]: %timeit clip_to_sum(vec, total=10)
15.6 µs ± 44.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Let's try a longer vector containing bigger values.
In [1405]: np.random.seed(1729)
In [1406]: vec = np.random.randint(0, 100, size=50)
In [1407]: vec
Out[1407]:
array([13, 37, 21, 67, 13, 89, 59, 35, 65, 91, 36, 73, 93, 83, 43, 86, 44,
19, 51, 76, 12, 26, 43, 0, 42, 53, 30, 65, 3, 65, 37, 68, 64, 87,
91, 4, 70, 10, 50, 40, 34, 32, 13, 7, 93, 79, 16, 98, 1, 35])
In [1408]: vec.sum()
Out[1408]: 2362
Find a result using each function:
In [1409]: limit_sum1(vec, total=1500)
Out[1409]:
array([13, 37, 21, 38, 13, 38, 38, 35, 38, 38, 36, 38, 38, 38, 38, 38, 38,
19, 38, 38, 12, 26, 38, 0, 39, 38, 30, 38, 3, 38, 37, 38, 38, 38,
38, 4, 38, 10, 38, 39, 34, 32, 13, 7, 38, 38, 16, 38, 1, 35])
In [1410]: limit_sum2(vec, total=1500)
Out[1410]:
array([13, 37, 21, 38, 13, 38, 38, 35, 38, 38, 36, 38, 38, 38, 38, 38, 38,
19, 38, 38, 12, 26, 38, 0, 38, 38, 30, 38, 3, 38, 37, 38, 38, 38,
38, 4, 38, 10, 38, 38, 34, 32, 13, 7, 38, 39, 16, 39, 1, 35])
In [1411]: clip_to_sum(vec, total=1500)
Out[1411]:
array([13, 37, 21, 38, 13, 38, 38, 35, 38, 38, 36, 38, 38, 38, 38, 38, 38,
19, 38, 38, 12, 26, 38, 0, 38, 38, 30, 38, 3, 38, 37, 38, 38, 38,
38, 4, 38, 10, 38, 38, 34, 32, 13, 7, 38, 39, 16, 39, 1, 35])
This time, limit_sum1 is the fastest by a wide margin:
In [1413]: %timeit limit_sum1(vec, total=1500)
34.9 µs ± 257 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [1414]: %timeit limit_sum2(vec, total=1500)
272 µs ± 2.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [1415]: %timeit clip_to_sum(vec, total=1500)
1.74 ms ± 7.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
You can modify your function to subtract the difference between the max and second-max elements in one step. This will use additional compute resources per loop but should reduce the total number of loops significantly.
I've tested this versus your original function and it gives the same results. Though, admittedly, I am having difficulty seeing any real speed up between the two.
def clip_to_sum(vec, total):
    current_total = np.sum(vec)
    new_vec = np.array(vec)
    while current_total > total:
        i = np.argmax(new_vec)
        d = np.partition(new_vec.flatten(), -2)[-2]
        diff = new_vec[i] - d
        if not (new_vec[i] == diff) and diff > 0:
            new_vec[i] -= diff
            current_total -= diff
        else:
            new_vec[i] -= 1
            current_total -= 1
    return new_vec
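For example, with the vector from the question this modified version gives the same result as the original:
print(clip_to_sum([1, 4, 8, 3], 10))
# [1 3 3 3]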
Unfortunately it's not really obvious what type of result you are interested in. But let's assume you have an array of a certain length and you want to take the first elements A[0:ix] so that the sum is (somehow close to) S1; then you can do:
S1 = 5
A = np.array([1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,])
B = np.cumsum(A)
ix = np.argmax(B>=S1)+1
C = A[0:ix]
print("C = ", C); print("sum C = ", np.sum(C))
which gives
C = [1 1 1 1 1]
sum C = 5
You can write the same in 1 line
C = A[0:np.argmax(np.cumsum(A)>=S1)+1]

Python: shorter syntax for slices with gaps?

Suppose I want the first element, the 3rd through 200th elements, and the 201st element through the last element by step-size 3, from a list in Python.
One way to do it is with distinct indexing and concatenation:
new_list = old_list[0:1] + old_list[3:201] + old_list[201::3]
Is there a way to do this with just one index on old_list? I would like something like the following (I know this doesn't syntactically work since list indices cannot be lists and since Python unfortunately doesn't have slice literals; I'm just looking for something close):
new_list = old_list[[0, 3:201, 201::3]]
I can achieve some of this by switching to NumPy arrays, but I'm more interested in how to do it for native Python lists. I could also create a slice maker or something like that, and possibly strong arm that into giving me an equivalent slice object to represent the composition of all my desired slices.
But I'm looking for something that doesn't involve creating a new class to manage the slices. I want to just sort of concatenate the slice syntax and feed that to my list and have the list understand that it means to separately get the slices and concatenate their respective results in the end.
A slice maker object (e.g. SliceMaker from your other question, or np.s_) can accept multiple comma-separated slices; they are received as a tuple of slices or other objects:
from numpy import s_
s_[0, 3:5, 6::3]
Out[1]: (0, slice(3, 5, None), slice(6, None, 3))
NumPy uses this for multidimensional arrays, but you can use it for slice concatenation:
def xslice(arr, slices):
    if isinstance(slices, tuple):
        return sum((arr[s] if isinstance(s, slice) else [arr[s]] for s in slices), [])
    elif isinstance(slices, slice):
        return arr[slices]
    else:
        return [arr[slices]]
xslice(list(range(10)), s_[0, 3:5, 6::3])
Out[1]: [0, 3, 4, 6, 9]
xslice(list(range(10)), s_[1])
Out[2]: [1]
xslice(list(range(10)), s_[:])
Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
import numpy as np
a = list(range(15, 50, 3))
# %%timeit -n 10000 -> 41.1 µs ± 1.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
[a[index] for index in np.r_[1:3, 5:7, 9:11]]
---
[18, 21, 30, 33, 42, 45]
import numpy as np
a = np.arange(15, 50, 3).astype(np.int32)
# %%timeit -n 10000 -> 31.9 µs ± 5.68 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
a[np.r_[1:3, 5:7, 9:11]]
---
array([18, 21, 30, 33, 42, 45], dtype=int32)
import numpy as np
a = np.arange(15, 50, 3).astype(np.int32)
# %%timeit -n 10000 -> 7.17 µs ± 1.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
slices = np.s_[1:3, 5:7, 9:11]
np.concatenate([a[_slice] for _slice in slices])
---
array([18, 21, 30, 33, 42, 45], dtype=int32)
It seems using NumPy is a faster way.
Adding a NumPy part to the answer from ecatmur.
import numpy as np

def xslice(x, slices):
    """Extract slices from array-like
    Args:
        x: array-like
        slices: slice or tuple of slice objects
    """
    if isinstance(slices, tuple):
        if isinstance(x, np.ndarray):
            return np.concatenate([x[_slice] for _slice in slices])
        else:
            return sum((x[s] if isinstance(s, slice) else [x[s]] for s in slices), [])
    elif isinstance(slices, slice):
        return x[slices]
    else:
        return [x[slices]]
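A usage sketch, reusing the array from the timing example above:
a = np.arange(15, 50, 3).astype(np.int32)
print(xslice(a, np.s_[1:3, 5:7, 9:11]))
# [18 21 30 33 42 45]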
You're probably better off writing your own sequence type.
>>> import operator
>>> L = range(20)
>>> L
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> operator.itemgetter(*(range(1, 5) + range(10, 18, 3)))(L)
(1, 2, 3, 4, 10, 13, 16)
And to get you started on that:
>>> operator.itemgetter(*(range(*slice(1, 5).indices(len(L))) + range(*slice(10, 18, 3).indices(len(L)))))(L)
(1, 2, 3, 4, 10, 13, 16)
Not sure if this is "better", but it works so why not...
[y for x in [old_list[slice(*a)] for a in ((0,1),(3,201),(201,None,3))] for y in x]
It's probably slow (especially compared to chain) but it's basic python (3.5.2 used for testing)
Why don't you create a custom slice for your purpose?
>>> from itertools import chain, islice
>>> it = range(50)
>>> def cslice(iterable, *selectors):
...     return chain(*(islice(iterable, *s) for s in selectors))
>>> list(cslice(it,(1,5),(10,15),(25,None,3)))
[1, 2, 3, 4, 10, 11, 12, 13, 14, 25, 28, 31, 34, 37, 40, 43, 46, 49]
You could extend list to allow multiple slices and indices:
class MultindexList(list):
    def __getitem__(self, key):
        if type(key) is tuple or type(key) is list:
            r = []
            for index in key:
                item = super().__getitem__(index)
                if type(index) is slice:
                    r += item
                else:
                    r.append(item)
            return r
        else:
            return super().__getitem__(key)

a = MultindexList(range(10))
print(a[1:3])          # [1, 2]
print(a[[1, 2]])       # [1, 2]
print(a[1, 1:3, 4:6])  # [1, 1, 2, 4, 5]
