Writing a fast function to check elements in a matrix

Writing a fast function to check elements in a matrix - python

I'm trying to write a function that checks the elements in a matrix and returns another matrix of the results in booleans.
The input:
X: The list of 2D age array as described above.
The output:
The function should return a 2D array with the entries of either 0 or 1 as described above.
The function must run 15 times faster than this one:
def check_elems(X):
out = [[0]*len(X[0]) for _ in range(len(X))]
for i in range(len(X)):
for j in range(len(X[i])):
check = X[i][j]
if check>=14 and check%5==4 and check!=19:
out[i][j] = 1
return out
Here's the particular example:
The person is at least 14
The person's age ends in 4 or 9
And the person is not 19
For example, an age array of
[[22, 13, 31, 13],
[17, 14, 24, 22]]
will have the output array:
[[0, 0, 0, 0],
[0, 1, 1, 0]]

Without numpy:
X = [[22, 13, 31, 13],
[17, 14, 24, 22]]
%timeit check_elems(X)
%timeit [[1 if i%5==4 and i>=14 and i!=19 else 0 for i in l] for l in X]
With a standard laptop:
2.16 µs ± 15 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
911 ns ± 2.23 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
EDIT
As kindly noted by #user3386109 the previous code does NOT satisfy the 15 times faster constraint required by OP, that I somehow misread as 1.5 times. I leave my code just as a baseline.

Related

Is there a different way I can use the items in a list exclude boundaries?

Suppose I have the list A=[2,32,41,2,4,73,5,9,20]. If I want to create a list B whose elements are the sum of neighboring elements in A, boundary values 2 and 20 excluded, here's the code I have:
A=[2,32,41,2,4,73,5,9,20]
B = []
for i in range (1,len(A)-1):
B.append(A[i-1]+A[i+1])
>>> B
>>> [43, 34, 45, 75, 9, 82, 25]
I'm just wondering is there a better way I can generate the list B? This pattern is what I usually rely on while coding with some problems like that, but I really want to know if there's a better/easier way I can use the elements in a list/array with excluded boundary points. (instead of using range, as what I did here)
Many thanks for the help and suggestions!

You can use zip:
In [109]: A
Out[109]: [2, 32, 41, 2, 4, 73, 5, 9, 20]
In [110]: [a + b for a,b in zip(A, A[2:])]
Out[110]: [43, 34, 45, 75, 9, 82, 25]
Slightly better performance using zip and list comp. Using %timeit to measure, ops version is b my version is a:
In [114]: %timeit a(A)
1.06 µs ± 7.63 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [115]: %timeit b(A)
1.63 µs ± 9.72 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

This is one way to do using with list comprehension:
a = [2, 32, 41, 2, 4, 73, 5, 9, 20]
b = [i + j for (i,j) in zip(a[:-2], a[2:])]
b
[43, 34, 45, 75, 9, 82, 25]

Indexing a numpy array with a arrays

I am trying to convert the vanilla python standard deviation function that takes n number of indexes defined by the variable number for calculations into numpy form. However the numpy code is faulty which is saying only integer scalar arrays can be converted to a scalar index is there any way i could by pass this.
Variables
import numpy as np
number = 5
list_= np.array([457.334015,424.440002,394.795990,408.903992,398.821014,402.152008,435.790985,423.204987,411.574005,
404.424988,399.519989,377.181000,375.467010,386.944000,383.614990,375.071991,359.511993,328.865997,
320.510010,330.079010,336.187012,352.940002,365.026001,361.562012,362.299011,378.549011,390.414001,
400.869995,394.773010,382.556000])
Vanilla python
std= np.array([list_[i:i+number].std() for i in range(0, len(list_)-number)])
Numpy form
counter = np.arange(0, len(list_)-number, 1)
std = list_[counter:counter+number].std()

In [46]: std= np.array([arr[i:i+number].std() for i in range(0, len(arr)-number)
...: ])
In [47]: std
Out[47]:
array([22.67653383, 10.3940773 , 14.60076482, 13.82801944, 13.68038469,
12.54834004, 13.13574418, 15.24698722, 14.65383773, 11.62092989,
8.57331689, 4.76392583, 9.49404494, 21.20874383, 24.91417226,
20.84991841, 13.22152789, 10.83343482, 16.01294245, 13.80007894,
10.51866421, 8.29287433, 11.24933733, 15.43661128, 13.65945978])
We can move the std out of the loop. Make a 2d array of windows, and apply std with axis:
In [48]: np.array([arr[i:i+number] for i in range(0, len(arr)-number)]).std(axis
...: =1)
Out[48]:
array([22.67653383, 10.3940773 , 14.60076482, 13.82801944, 13.68038469,
12.54834004, 13.13574418, 15.24698722, 14.65383773, 11.62092989,
8.57331689, 4.76392583, 9.49404494, 21.20874383, 24.91417226,
20.84991841, 13.22152789, 10.83343482, 16.01294245, 13.80007894,
10.51866421, 8.29287433, 11.24933733, 15.43661128, 13.65945978])
We could also generate the windows with indexing. A convenient way is to use linspace:
In [63]: idx = np.arange(0,len(arr)-number)
In [64]: idx = np.linspace(idx,idx+number,number, endpoint=False,dtype=int)
In [65]: idx
Out[65]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24],
...
[ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28]])
In [66]: arr[idx].std(axis=0)
Out[66]:
array([22.67653383, 10.3940773 , 14.60076482, 13.82801944, 13.68038469,
12.54834004, 13.13574418, 15.24698722, 14.65383773, 11.62092989,
8.57331689, 4.76392583, 9.49404494, 21.20874383, 24.91417226,
20.84991841, 13.22152789, 10.83343482, 16.01294245, 13.80007894,
10.51866421, 8.29287433, 11.24933733, 15.43661128, 13.65945978])
The rolling-windows using as_strided will probably be faster, but may be harder to understand.
In [67]: timeit std= np.array([arr[i:i+number].std() for i in range(0, len(arr)-
...: number)])
1.05 ms ± 7.01 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [68]: timeit np.array([arr[i:i+number] for i in range(0, len(arr)-number)]).s
...: td(axis=1)
74.7 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [69]: %%timeit
...: idx = np.arange(0,len(arr)-number)
...: idx = np.linspace(idx,idx+number,number, endpoint=False,dtype=int)
...: arr[idx].std(axis=0)
117 µs ± 240 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [73]: timeit np.std(rolling_window(arr, 5), 1)
74.5 µs ± 625 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
using a more direct way to generate the rolling index:
In [81]: %%timeit
...: idx = np.arange(len(arr)-number)[:,None]+np.arange(number)
...: arr[idx].std(axis=1)
57.9 µs ± 87.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
your error
In [82]: arr[np.array([1,2,3]):np.array([4,5,6])]
Traceback (most recent call last):
File "<ipython-input-82-3358e59f8fb5>", line 1, in <module>
arr[np.array([1,2,3]):np.array([4,5,6])]
TypeError: only integer scalar arrays can be converted to a scalar index

as taken from Rolling window for 1D arrays in Numpy?
def rolling_window(a, window):
shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
strides = a.strides + (a.strides[-1],)
return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
np.std(rolling_window(list_, 5), 1)
by the way, your vanilla python code is wrong. it should be:
std= np.array([list_[i:i+number].std() for i in range(0, len(list_)-number+1)])

Python Numpy appending multiple lists from objects

I am calling an object several times that is returning a numpy list:
for x in range(0,100):
d = simulation3()
d = [0, 1, 2, 3]
d = [4, 5, 6, 7]
..and many more
I want to take each list and append it to a 2D array.
final_array = [[0, 1, 2, 3],[4, 5, 6, 7]...and so forth]
I tried creating an empty array (final_array = np.zeros(4,4)) and appending it but the values are appending after the 4X4 matrix is created.
Can anyone help me with this? thank you!

You can use np.fromiter to create an array from an iterable. Since, by default, this function only works with scalars, you can use itertools.chain to help:
np.random.seed(0)
from itertools import chain
def simulation3():
return np.random.randint(0, 10, 4)
n = 5
d = np.fromiter(chain.from_iterable(simulation3() for _ in range(5)), dtype='i')
d.shape = 5, 4
print(d)
array([[5, 0, 3, 3],
[7, 9, 3, 5],
[2, 4, 7, 6],
[8, 8, 1, 6],
[7, 7, 8, 1]], dtype=int32)
But this is relatively inefficient. NumPy performs best with fixed size arrays. If you know the size of your array in advance, you can define an empty array and update rows sequentially. See the alternatives described by #norok2.

there are multiple way to do it in numpy , the easiest way is to use vstack like this :
for Ex :
#you have these array you want to concat
d1 = [0, 1, 2, 3]
d2 = [4, 5, 6, 7]
d3 = [4, 5, 6, 7]
#initialize your variable with zero raw
X = np.zeros((0,4))
#then each time you call your function use np.vstack like this :
X = np.vstack((np.array(d1),X))
X = np.vstack((np.array(d2),X))
X = np.vstack((np.array(d2),X))
# and finally you have your array like below
#array([[4., 5., 6., 7.],
# [4., 5., 6., 7.],
# [0., 1., 2., 3.]])

The optimal solution depends on the numbers / sizes you are dealing with.
My favorite solution (which only works if you already know the size of the final result) is to initialize the array which will contain your results and then fill each you could initialize your result and then fill it using views.
This the most memory efficient solution.
If you do not know the size of the final result, then you are better off by generating a list of lists, which can be converted (or stacked) as a NumPy array at the end of the process.
Here are some examples, where gen_1d_list() is used to generate some random numbers to mimic the result of simulate3() (meaning that in the following code, you should replace gen_1d_list(n, dtype) with simulate3()):
stacking1() implements the filling using views
stacking2() implements the list generation and converting to NumPy array
stacking3() implements the list generation and stacking to NumPy array
stacking4() implements the dynamic modification of a NumPy array using vstack() as proposed earlier.
import numpy as np
def gen_1d_list(n, dtype=int):
return list(np.random.randint(1, 100, n, dtype))
def stacking1(n, m, dtype=int):
arr = np.empty((n, m), dtype=dtype)
for i in range(n):
arr[i] = gen_1d_list(m, dtype)
return arr
def stacking2(n, m, dtype=int):
items = [gen_1d_list(m, dtype) for i in range(n)]
arr = np.array(items)
return arr
def stacking3(n, m, dtype=int):
items = [gen_1d_list(m, dtype) for i in range(n)]
arr = np.stack(items, dtype)
return arr
def stacking4(n, m, dtype=int):
arr = np.zeros((0, m), dtype=dtype)
for i in range(n):
arr = np.vstack((gen_1d_list(m, dtype), arr))
return arr
Time-wise, stacking1() and stacking2() are more or less equally fast, while stacking3() and stacking4() are slower (and, in proportion, much slower for small size inputs).
Some numbers, for small size inputs:
n, m = 4, 10
%timeit stacking1(n, m)
# 15.7 µs ± 182 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit stacking2(n, m)
# 14.2 µs ± 141 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit stacking3(n, m)
# 22.7 µs ± 282 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit stacking4(n, m)
# 31.8 µs ± 270 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
and for larger size inputs:
n, m = 4, 1000000
%timeit stacking1(n, m)
# 344 ms ± 1.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit stacking2(n, m)
# 350 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit stacking3(n, m)
# 370 ms ± 2.75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit stacking4(n, m)
# 369 ms ± 3.01 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Fastest way to construct tuple from elements of list (Python)

I have 3 NumPy arrays, and I want to create tuples of the i-th element of each list. These tuples represent keys for a dictionary I had previously defined.
Ex:
List 1: [1, 2, 3, 4, 5]
List 2: [6, 7, 8, 9, 10]
List 3: [11, 12, 13, 14, 15]
Desired output: [mydict[(1,6,11)],mydict[(2,7,12)],mydict[(3,8,13)],mydict[(4,9,14)],mydict[(5,10,15)]]
These tuples represent keys of a dictionary I have previously defined (essentially, as input variables to a previously calculated function). I had read that this is the best way to store function values for lookup.
My current method of doing this is as follows:
[dict[x] for x in zip(l1, l2, l3)]
This works, but is obviously slow. Is there a way to vectorize this operation, or make it faster in any way? I'm open to changing the way I've stored the function values as well, if that is necessary.
EDIT: My apologies for the question being unclear. I do in fact, have NumPy arrays. My mistake for referring to them as lists and displaying them as such. They are of the same length.

Your question is a bit confusing, since you're calling these NumPy arrays, and asking for a way to vectorize things, but then showing lists, and labeling them as lists in your example, and using list in the title. I'm going to assume you do have arrays.
>>> l1 = np.array([1, 2, 3, 4, 5])
>>> l2 = np.array([6, 7, 8, 9, 10])
>>> l3 = np.array([11, 12, 13, 14, 15])
If so, you can stack these up in a 2D array:
>>> ll = np.stack((l1, l2, l3))
And then you can just transpose that:
>>> lt = ll.T
This is better than vectorized; it's constant-time. NumPy is just creating another view of the same data, with different striding so it reads in column order instead of row order.
>>> lt
array([[ 1, 6, 11],
[ 2, 7, 12],
[ 3, 8, 13],
[ 4, 9, 14],
[ 5, 10, 15]])
As miradulo points out, you can do both of these in one step with column_stack:
>>> lt = np.column_stack((l1, l2, l3))
But I suspect you're actually going to want ll as a value in its own right. (Although I admit I'm just guessing here at what you're trying to do…)
And of course if you want to loop over these rows as 1D arrays instead of doing further vectorized work, you can:
>>> for row in lt:
...: print(row)
[ 1 6 11]
[ 2 7 12]
[ 3 8 13]
[ 4 9 14]
[ 5 10 15]
Of course, you can convert them from 1D arrays to tuples just by calling tuple on each row. Or… whatever that mydict is supposed to be (it doesn't look like a dictionary—there's no key-value pairs, just values), you can do that.
>>> mydict = collections.namedtuple('mydict', list('abc'))
>>> tups = [mydict(*row) for row in lt]
>>> tups
[mydict(a=1, b=6, c=11),
mydict(a=2, b=7, c=12),
mydict(a=3, b=8, c=13),
mydict(a=4, b=9, c=14),
mydict(a=5, b=10, c=15)]
If you're worried about the time to look up a tuple of keys in a dict, itemgetter in the operator module has a C-accelerated version. If keys is a np.array, or a tuple, or whatever, you can do this:
for row in lt:
myvals = operator.itemgetter(*row)(mydict)
# do stuff with myvals
Meanwhile, I decided to slap together a C extension that should be as fast as possible (with no error handling, because I'm lazy it should be a tiny bit faster that way—this code will probably segfault if you give it anything but a dict and a tuple or list):
static PyObject *
itemget_itemget(PyObject *self, PyObject *args) {
PyObject *d;
PyObject *keys;
PyArg_ParseTuple(args, "OO", &d, &keys);
PyObject *seq = PySequence_Fast(keys, "keys must be an iterable");
PyObject **arr = PySequence_Fast_ITEMS(seq);
int seqlen = PySequence_Fast_GET_SIZE(seq);
PyObject *result = PyTuple_New(seqlen);
PyObject **resarr = PySequence_Fast_ITEMS(result);
for (int i=0; i!=seqlen; ++i) {
resarr[i] = PyDict_GetItem(d, arr[i]);
Py_INCREF(resarr[i]);
}
return result;
}
Times for looking up 100 random keys out of a 10000-key dictionary on my laptop with python.org CPython 3.7 on macOS:
itemget.itemget: 1.6µs
operator.itemgetter: 1.8µs
comprehension: 3.4µs
pure-Python operator.itemgetter: 6.7µs
So, I'm pretty sure anything you do is going to be fast enough—that's only 34ns/key that we're trying to optimize. But if that really is too slow, operator.itemgetter does a good enough job moving the loop to C and cuts it roughly in half, which is pretty close to the best possibly result you could expect. (It's hard to imagine looping up a bunch of boxed-value keys in a hash table in much less than 16ns/key, after all.)

Define your 3 lists. You mention 3 arrays, but show lists (and call them that as well):
In [112]: list1,list2,list3 = list(range(1,6)),list(range(6,11)),list(range(11,16))
Now create a dictionary with tuple keys:
In [114]: dd = {x:i for i,x in enumerate(zip(list1,list2,list3))}
In [115]: dd
Out[115]: {(1, 6, 11): 0, (2, 7, 12): 1, (3, 8, 13): 2, (4, 9, 14): 3, (5, 10, 15): 4}
Accessing elements from that dictionary with your code:
In [116]: [dd[x] for x in zip(list1,list2,list3)]
Out[116]: [0, 1, 2, 3, 4]
In [117]: timeit [dd[x] for x in zip(list1,list2,list3)]
1.62 µs ± 11.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Now for an array equivalent - turn the lists into a 2d array:
In [118]: arr = np.array((list1,list2,list3))
In [119]: arr
Out[119]:
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15]])
Access the same dictionary elements. If I used column_stack I could have omitted the .T, but that's slower. (array transpose is fast)
In [120]: [dd[tuple(x)] for x in arr.T]
Out[120]: [0, 1, 2, 3, 4]
In [121]: timeit [dd[tuple(x)] for x in arr.T]
15.7 µs ± 21.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Notice that this is substantially slower. Iteration over an array is slower than iteration over a list. You can't access elements of a dictionary in any sort of numpy 'vectorized' fashion - you have to use a Python iteration.
I can improve on the array iteration by first turning it into a list:
In [124]: arr.T.tolist()
Out[124]: [[1, 6, 11], [2, 7, 12], [3, 8, 13], [4, 9, 14], [5, 10, 15]]
In [125]: timeit [dd[tuple(x)] for x in arr.T.tolist()]
3.21 µs ± 9.67 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Array construction times:
In [122]: timeit arr = np.array((list1,list2,list3))
3.54 µs ± 15.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [123]: timeit arr = np.column_stack((list1,list2,list3))
18.5 µs ± 11.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
With the pure Python itemgetter (from v3.6.3) there's no savings:
In [149]: timeit operator.itemgetter(*[tuple(x) for x in arr.T.tolist()])(dd)
3.51 µs ± 16.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
and if I move the getter definition out of the time loop:
In [151]: %%timeit idx = operator.itemgetter(*[tuple(x) for x in arr.T.tolist()]
...: )
...: idx(dd)
...:
482 ns ± 1.85 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Summarize 2darray by index

I want to sum columns of a 2d array dat by row index idx. The following example works but is slow for large arrays. Any idea to speed it up?
import numpy as np
dat = np.arange(18).reshape(6, 3, order = 'F')
idx = np.array([0, 1, 1, 1, 2, 2])
for i in np.unique(idx):
print(np.sum(dat[idx==i], axis = 0))
Output
[ 0 6 12]
[ 6 24 42]
[ 9 21 33]

Approach #1
We can leverage matrix-multiplication with np.dot -
In [56]: mask = idx[:,None] == np.unique(idx)
In [57]: mask.T.dot(dat)
Out[57]:
array([[ 0, 6, 12],
[ 6, 24, 42],
[ 9, 21, 33]])
Approach #2
For the case with idx already sorted, we can use np.add.reduceat -
In [52]: p = np.flatnonzero(np.r_[True,idx[:-1] != idx[1:]])
In [53]: np.add.reduceat(dat, p, axis=0)
Out[53]:
array([[ 0, 6, 12],
[ 6, 24, 42],
[ 9, 21, 33]])

A bit faster approach with set object and ndarray.sum() method:
In [216]: for i in set(idx):
...: print(dat[idx == i].sum(axis=0))
...:
[ 0 6 12]
[ 6 24 42]
[ 9 21 33]
Time execution comparison:
In [217]: %timeit for i in np.unique(idx): r = np.sum(dat[idx==i], axis = 0)
109 µs ± 1.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [218]: %timeit for i in set(idx): r = dat[idx == i].sum(axis=0)
71.1 µs ± 1.98 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Writing a fast function to check elements in a matrix - python

Related

Is there a different way I can use the items in a list exclude boundaries?

Indexing a numpy array with a arrays

Python Numpy appending multiple lists from objects

Fastest way to construct tuple from elements of list (Python)

Summarize 2darray by index

Categories

Resources