finding zero values in numpy 3-D array [duplicate]

NumPy has the efficient function/method nonzero() to identify the indices of non-zero elements in an ndarray object. What is the most efficient way to obtain the indices of the elements that do have a value of zero?

numpy.where() is my favorite.
>>> x = numpy.array([1,0,2,0,3,0,4,5,6,7,8])
>>> numpy.where(x == 0)[0]
array([1, 3, 5])
The function where returns a tuple of ndarrays, each corresponding to a different dimension of the input. Since the input is one-dimensional, the [0] unboxes the tuple's only element.
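For a multi-dimensional input the tuple contains one index array per axis, and the whole tuple can be used to index straight back into the array. A minimal sketch (the x2d array here is just an illustration):
import numpy as np

x2d = np.array([[1, 0, 2],
                [0, 3, 0]])
rows, cols = np.where(x2d == 0)  # one index array per dimension
print(rows)             # [0 1 1]
print(cols)             # [1 0 2]
print(x2d[rows, cols])  # [0 0 0] -- the zeros themselves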

There is np.argwhere,
import numpy as np
arr = np.array([[1,2,3], [0, 1, 0], [7, 0, 2]])
np.argwhere(arr == 0)
which returns all found indices as rows:
array([[1, 0],   # Indices of the first zero
       [1, 2],   # Indices of the second zero
       [2, 1]],  # Indices of the third zero
      dtype=int64)

You can search for any scalar condition with:
>>> a = np.asarray([0,1,2,3,4])
>>> a == 0  # or whatever
array([ True, False, False, False, False], dtype=bool)
This gives back the condition as a boolean mask over the array.
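The mask can then be used directly, for instance to pull out the matching elements or to count them. A small sketch:
import numpy as np

a = np.asarray([0, 1, 2, 3, 4])
mask = a == 0
print(a[mask])                 # [0] -- the elements satisfying the condition
print(np.count_nonzero(mask))  # 1  -- how many there are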

You can also use nonzero() by using it on a boolean mask of the condition, because False is also a kind of zero.
>>> x = numpy.array([1,0,2,0,3,0,4,5,6,7,8])
>>> x==0
array([False, True, False, True, False, True, False, False, False, False, False], dtype=bool)
>>> numpy.nonzero(x==0)[0]
array([1, 3, 5])
It's doing exactly the same as mtrw's way, but it is more related to the question ;)

You can use numpy.nonzero to find zero.
>>> import numpy as np
>>> x = np.array([1,0,2,0,3,0,0,4,0,5,0,6]).reshape(4, 3)
>>> np.nonzero(x==0) # this is what you want
(array([0, 1, 1, 2, 2, 3]), array([1, 0, 2, 0, 2, 1]))
>>> np.nonzero(x)
(array([0, 0, 1, 2, 3, 3]), array([0, 2, 1, 1, 0, 2]))

If you are working with a one-dimensional array, there is syntactic sugar:
>>> x = numpy.array([1,0,2,0,3,0,4,5,6,7,8])
>>> numpy.flatnonzero(x == 0)
array([1, 3, 5])
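flatnonzero also accepts multi-dimensional input by flattening it first; if per-axis coordinates are wanted afterwards, np.unravel_index converts the flat indices back. A quick sketch:
import numpy as np

x = np.array([1, 0, 2, 0, 3, 0]).reshape(2, 3)
flat = np.flatnonzero(x == 0)           # indices into the flattened array
print(flat)                             # [1 3 5]
print(np.unravel_index(flat, x.shape))  # (array([0, 1, 1]), array([1, 0, 2]))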

I would do it the following way:
>>> x = np.array([[1,0,0], [0,2,0], [1,1,0]])
>>> x
array([[1, 0, 0],
       [0, 2, 0],
       [1, 1, 0]])
>>> np.nonzero(x)
(array([0, 1, 2, 2]), array([0, 1, 0, 1]))
# the values at those positions
>>> x[np.nonzero(x)]
array([1, 2, 1, 1])
# if you want the indices as coordinate pairs
>>> np.transpose(np.nonzero(x))
array([[0, 0],
       [1, 1],
       [2, 0],
       [2, 1]])

import numpy as np
arr = np.arange(10000)
arr[8000:8900] = 0

%timeit np.where(arr == 0)[0]
# 23.4 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.argwhere(arr == 0)
# 34.5 µs ± 680 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.nonzero(arr == 0)[0]
# 23.2 µs ± 447 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.flatnonzero(arr == 0)
# 27 µs ± 506 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.amin(np.extract(arr != 0, arr))
# 109 µs ± 669 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

import numpy as np
x = np.array([1,0,2,3,6])
non_zero_arr = np.extract(x > 0, x)
min_value = np.amin(non_zero_arr)    # the smallest non-zero value
min_index = np.argmin(non_zero_arr)  # its index within non_zero_arr
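Note that argmin here indexes into the extracted array, not into x; if the position in the original array is wanted, carry the non-zero indices along. A hedged sketch of that fix:
import numpy as np

x = np.array([1, 0, 2, 3, 6])
non_zero_idx = np.flatnonzero(x > 0)  # positions of the non-zero elements in x
non_zero_arr = x[non_zero_idx]
min_value = np.amin(non_zero_arr)                  # smallest non-zero value: 1
min_index = non_zero_idx[np.argmin(non_zero_arr)]  # its position in x: 0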

Related

How to get a 2D array containing indices of another 2D array

Problem
import numpy as np
I have an array, without any prior information of its contents. For example:
ourarray = \
    np.array([[0,1],
              [2,3],
              [4,5]])
I want to get the pairs of numbers which can be used for indexing ourarray, i.e. I want to get:
array([[0, 0, 1, 1, 2, 2],
       [0, 1, 0, 1, 0, 1]])
(0,0, 0,1, 1,0, etc., all the possible indices of ourarray are in this array.)
Similar but different posts
how to find indices of a 2d numpy array occuring in another 2d array: here they search for one array within another one, not returning indices of the entire array.
Find indices of rows of numpy 2d array in another 2D array: they are dealing with two arrays to start with, the objective isn't to create a second array based on the first one containing its indices
Attempt 1 (Successful but inefficient)
I can get this array by:
np.array(np.where(np.ones(ourarray.shape)))
Which gives the desired result, but it requires creating np.ones(ourarray.shape), which does not seem like an efficient way of doing it.
Attempt 2 (Failed)
I also tried:
np.array(np.where(ourarray))
which does not work because there is no indices returned for the 0 entry of ourarray.
Question
Attempt 1 works, but I am looking for a more efficient way. How can I do this more efficiently?
You can use numpy.argwhere and then .T to get what you want. Try this:
>>> ourarray = np.array([[0,1],[2,3], [4,5]])
>>> np.argwhere(ourarray >= 0).T
array([[0, 0, 1, 1, 2, 2],
       [0, 1, 0, 1, 0, 1]])
If the array may contain values (such as NaN) for which the comparison fails, you can use this instead:
ourarray = np.array([[np.nan,1],[2,np.inf], [-4,-5]])
np.argwhere(np.ones(ourarray.shape)==1).T
# array([[0, 0, 1, 1, 2, 2],
#        [0, 1, 0, 1, 0, 1]])
How do you intend to use this index?
The tuple produced by nonzero (where) is designed for convenient indexing:
In [54]: idx = np.nonzero(np.ones_like(ourarray))
In [55]: idx
Out[55]: (array([0, 0, 1, 1, 2, 2]), array([0, 1, 0, 1, 0, 1]))
In [56]: ourarray[idx]
Out[56]: array([0, 1, 2, 3, 4, 5])
or equivalently using the 2 arrays explicitly:
In [57]: ourarray[idx[0], idx[1]]
Out[57]: array([0, 1, 2, 3, 4, 5])
Your np.array(idx) can be used as in [57] but not as in [56]. The use of a tuple in [56] is important.
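To make the distinction concrete, a small sketch: a plain integer array is treated as fancy indexing along the first axis only, while wrapping it back into a tuple restores the per-axis coordinate behaviour.
import numpy as np

ourarray = np.array([[0, 1], [2, 3], [4, 5]])
idx = np.nonzero(np.ones_like(ourarray))  # tuple of two index arrays
arr_idx = np.array(idx)                   # shape (2, 6) integer array

print(ourarray[idx])             # [0 1 2 3 4 5] -- per-axis coordinates
print(ourarray[tuple(arr_idx)])  # same result after converting back to a tuple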
If we apply transpose to this we get an array.
In [58]: tidx = np.transpose(idx)
In [59]: tidx
Out[59]:
array([[0, 0],
       [0, 1],
       [1, 0],
       [1, 1],
       [2, 0],
       [2, 1]])
to use that for indexing we have to iterate:
In [60]: [ourarray[i,j] for i,j in tidx]
Out[60]: [0, 1, 2, 3, 4, 5]
argwhere as proposed in the other answer is just the transpose. Using ourarray >= 0 is really no different from the np.ones expression; both make an array that is True/1 for all elements.
In [61]: np.argwhere(np.ones_like(ourarray))
Out[61]:
array([[0, 0],
       [0, 1],
       [1, 0],
       [1, 1],
       [2, 0],
       [2, 1]])
There are other ways of generating indices (np.indices, np.meshgrid, np.mgrid, np.ndindex), but they will require some sort of reshaping and/or transpose to get exactly what you want:
In [71]: np.indices(ourarray.shape)
Out[71]:
array([[[0, 0],
        [1, 1],
        [2, 2]],

       [[0, 1],
        [0, 1],
        [0, 1]]])
In [72]: np.indices(ourarray.shape).reshape(2,6)
Out[72]:
array([[0, 0, 1, 1, 2, 2],
       [0, 1, 0, 1, 0, 1]])
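For completeness, rough sketches of the mgrid and ndindex variants (both need the reshape/transpose mentioned above):
import numpy as np

ourarray = np.array([[0, 1], [2, 3], [4, 5]])

# np.mgrid builds the same per-axis index grids as np.indices
print(np.mgrid[0:3, 0:2].reshape(2, -1))
# [[0 0 1 1 2 2]
#  [0 1 0 1 0 1]]

# np.ndindex yields index tuples; collecting and transposing matches too
print(np.array(list(np.ndindex(ourarray.shape))).T)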
timings
If ourarray>=0 works, it is faster than np.ones:
In [79]: timeit np.ones_like(ourarray)
6.22 µs ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [80]: timeit ourarray>=0
1.43 µs ± 15 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
np.where/nonzero adds a non-trivial time to that:
In [81]: timeit np.nonzero(ourarray>=0)
6.43 µs ± 8.15 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
and a bit more time to convert the tuple to array:
In [82]: timeit np.array(np.nonzero(ourarray>=0))
10.4 µs ± 35.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The transpose round trip of argwhere adds more time:
In [83]: timeit np.argwhere(ourarray>=0).T
16.9 µs ± 35.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
indices is about the same as [82], though it may scale differently.
In [84]: timeit np.indices(ourarray.shape).reshape(2,-1)
10.9 µs ± 33.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

How do I convert Results of a loop to an array in Python? [duplicate]

So let's say I have a 2d array. How can I apply a function to every single item in the array and replace that item with the return? Also, the function's return will be a tuple, so the array will become 3d.
Here is the code in mind.
def filter_func(item):
    if 0 <= item < 1:
        return (1, 0, 1)
    elif 1 <= item < 2:
        return (2, 1, 1)
    elif 2 <= item < 3:
        return (5, 1, 4)
    else:
        return (4, 4, 4)
myarray = np.array([[2.5, 1.3], [0.4, -1.0]])
# Apply the function to an array
print(myarray)
# Should be array([[[5, 1, 4],
#                   [2, 1, 1]],
#                  [[1, 0, 1],
#                   [4, 4, 4]]])
Any ideas how I could do it? One way is to do np.array(list(map(filter_func, myarray.reshape((4,))))).reshape((2, 2, 3)), but that's quite slow, especially when I need to do it on an array of shape (1024, 1024).
I've also seen people use np.vectorize, but it somehow ends up as (array([[5, 2], [1, 4]]), array([[1, 1], [0, 4]]), array([[4, 1], [1, 4]])), which has a shape of (3, 2, 2).
No need to change anything in your function.
Just apply the vectorized version of your function to your array
and stack the result:
np.stack(np.vectorize(filter_func)(myarray), axis=2)
The result is:
array([[[5, 1, 4],
        [2, 1, 1]],

       [[1, 0, 1],
        [4, 4, 4]]])
Your list-map:
In [4]: np.array(list(map(filter_func, myarray.reshape((4,))))).reshape((2, 2, 3))
Out[4]:
array([[[5, 1, 4],
        [2, 1, 1]],

       [[1, 0, 1],
        [4, 4, 4]]])
A variation using nested list comprehension:
In [5]: np.array([[filter_func(j) for j in row] for row in myarray])
Out[5]:
array([[[5, 1, 4],
        [2, 1, 1]],

       [[1, 0, 1],
        [4, 4, 4]]])
Using vectorize, the result is one array for each element returned by the function.
In [6]: np.vectorize(filter_func)(myarray)
Out[6]:
(array([[5, 2],
        [1, 4]]),
 array([[1, 1],
        [0, 4]]),
 array([[4, 1],
        [1, 4]]))
As #Vladi shows these can be combined with stack (or np.array followed by a transpose):
In [7]: np.stack(np.vectorize(filter_func)(myarray),2)
Out[7]:
array([[[5, 1, 4],
        [2, 1, 1]],

       [[1, 0, 1],
        [4, 4, 4]]])
Your list-map is fastest. I've never found vectorize to be faster:
In [8]: timeit np.array(list(map(filter_func, myarray.reshape((4,))))).reshape((2, 2, 3))
17.2 µs ± 47.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [9]: timeit np.array([[filter_func(j) for j in row] for row in myarray])
20.5 µs ± 78.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [10]: timeit np.stack(np.vectorize(filter_func)(myarray),2)
75.2 µs ± 297 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Taking the np.vectorize(filter_func) out of the timing loop helps just a bit.
frompyfunc is similar to vectorize, but returns object dtype. It usually is faster:
In [29]: timeit np.stack(np.frompyfunc(filter_func, 1,3)(myarray),2).astype(int)
28.7 µs ± 125 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Generally if you have a function that only takes scalar inputs, it's hard to do better than simple iteration. vectorize/frompyfunc don't improve on that. Optimal use of numpy requires rewriting the function to work directly with arrays, as #Hammad demonstrates.
Though with this small example, even this proper numpy solution isn't faster. I expect it will scale better:
In [32]: timeit func(myarray)
25 µs ± 60.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
You could use this function, with a vectorised implementation:
def func(arr):
    elements = np.array([
        [1, 0, 1],
        [2, 1, 1],
        [5, 1, 4],
        [4, 4, 4],
    ])
    arr = arr.astype(int)
    mask = (arr != 0) & (arr != 1) & (arr != 2)
    arr[mask] = -1  # -1 indexes the last row, the "else" case
    return elements[arr]
You won't be able to rewrite your array in place because of the shape mismatch, but you can overwrite the variable myarray:
myarray = func(myarray)
myarray
>>> [[[5, 1, 4],
      [2, 1, 1]],
     [[1, 0, 1],
      [4, 4, 4]]]
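As a variation not shown in the answers above, np.digitize expresses the same range lookup without the astype truncation (which would misclassify negative fractions such as -0.5). A sketch, with the bin edges taken from filter_func:
import numpy as np

def func_digitize(arr):
    # one row per bin: item < 0, [0, 1), [1, 2), [2, 3), item >= 3
    elements = np.array([
        [4, 4, 4],  # item < 0
        [1, 0, 1],  # 0 <= item < 1
        [2, 1, 1],  # 1 <= item < 2
        [5, 1, 4],  # 2 <= item < 3
        [4, 4, 4],  # item >= 3
    ])
    bins = np.digitize(arr, [0, 1, 2, 3])  # indices 0..4, matching the rows above
    return elements[bins]

myarray = np.array([[2.5, 1.3], [0.4, -1.0]])
print(func_digitize(myarray))
# [[[5 1 4]
#   [2 1 1]]
#  [[1 0 1]
#   [4 4 4]]]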

How would you unionize N-arrays with different sizes?

The function:
np.union1d(a, b)
can unionize two arrays with different sizes, and
np.vstack((a, b, c)).T.ravel()
can unionize N arrays of the same size.
How would you unionize N arrays with different sizes?
And of course it should be fast ;)!
By the way, union is not just concatenation...
Still testing, but would this do it:
np.unique(np.concatenate((a,b,c)))
Here's one with array-assignment + masking for positive numbers -
def unionize_ndarrays(L, maxnum=None):
    if maxnum is None:
        maxnum = max([np.max(i) for i in L]) + 1
        # for lists : max([max(i) for i in L]) + 1
    id_ar = np.zeros(maxnum, dtype=bool)
    for i in L:
        id_ar[i] = True
    return np.flatnonzero(id_ar)
Computing the max number maxnum has noticeable runtime and could be the bottleneck even for a large number of small arrays. So, if that's known, feeding it in should help a lot in those scenarios.
Sample run -
In [43]: a = np.array([0, 1, 3, 4, 3])
...: b = np.array([0, 10, 3, 1, 2, 1])
...: c = np.array([6, 3, 4, 2])
In [44]: np.unique(np.concatenate((a,b,c)))
Out[44]: array([ 0, 1, 2, 3, 4, 6, 10])
In [45]: unionize_ndarrays((a,b,c))
Out[45]: array([ 0, 1, 2, 3, 4, 6, 10])
Benchmarking
1) Small sized arrays -
In [106]: L = [np.random.randint(0,10,n) for n in np.random.randint(4,10,10000)]
In [107]: %timeit unionize_ndarrays(L, maxnum=10)
2.74 ms ± 207 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [108]: %timeit np.unique(np.concatenate((L)))
3.06 ms ± 24.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Without maxnum fed
In [109]: %timeit unionize_ndarrays(L)
40.4 ms ± 542 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
If order is not important, we can also look into pandas.factorize, if we are dealing with small-sized arrays -
In [76]: a = np.array([0, 1, 3, 4, 3])
...: b = np.array([0, 10, 3, 1, 2, 1])
...: c = np.array([6, 3, 4, 2])
In [77]: L = [a,b,c]
In [80]: import pandas as pd
In [81]: pd.factorize(np.concatenate(L))[1]
Out[81]: array([ 0, 1, 3, 4, 10, 2, 6])
Related timings -
In [82]: L = [np.random.randint(0,10,n) for n in np.random.randint(4,10,10000)]
In [84]: %timeit pd.factorize(np.concatenate(L))[1]
2.1 ms ± 13.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2) Big-sized (bigger variation in sizes) arrays -
Timings -
In [2]: L = [np.random.randint(0,1000,n) for n in np.random.randint(10,1000,10000)]
In [3]: %timeit unionize_ndarrays(L, maxnum=1000)
...: %timeit unionize_ndarrays(L)
...: %timeit np.unique(np.concatenate((L)))
14 ms ± 925 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
56.6 ms ± 641 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
242 ms ± 773 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
So, which one to choose will depend on whether we have a priori info on the max number and on the size variation.
From NumPy manual:
To find the union of more than two arrays, use functools.reduce:
>>> from functools import reduce
>>> reduce(np.union1d, ([1, 3, 4, 3], [3, 1, 2, 1], [6, 3, 4, 2]))
array([1, 2, 3, 4, 6])
This also works with arrays of different sizes:
>>> reduce(np.union1d, ([0, 1, 3, 4, 3], [0, 10, 3, 1, 2, 1], [6, 3, 4, 2]))
array([ 0, 1, 2, 3, 4, 6, 10])
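A usage note: reduce applies union1d pairwise, so for a long list of arrays the single unique over one concatenation shown earlier typically does less sorting work. A quick sketch of the two side by side:
import numpy as np
from functools import reduce

L = [np.array([0, 1, 3, 4, 3]),
     np.array([0, 10, 3, 1, 2, 1]),
     np.array([6, 3, 4, 2])]

print(reduce(np.union1d, L))         # k-1 pairwise sorted unions
print(np.unique(np.concatenate(L)))  # one sort over all elements
# both: [ 0  1  2  3  4  6 10]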

How to apply function which returns vector to each numpy array element (and get array with higher dimension)

Let's write it directly in code
Note: I edited mapper (the original example used x -> (x, 2 * x, 3 * x) just as an example) to call a generic black-box function, which is what causes the trouble.
import numpy as np

def blackbox_fn(x):  # I can't be changed!
    assert np.array(x).shape == (), "I'm a fussy little function!"
    return np.array([x, 2*x, 3*x])

# let's have a 2d array
arr2d = np.array(list(range(4)), dtype=np.uint8).reshape(2, 2)

# each element should be mapped to a vector
def mapper(x, blackbox_fn):
    # there is some 3rd-party non-trivial function returning np.array
    # (in the examples it returns np.array((x, 2 * x, 3 * x))),
    # but it operates only on scalar values
    return vectorized_blackbox_fn(x)
So for the input 2d array
array([[0, 1],
       [2, 3]], dtype=uint8)
I would like to get the 3d array
array([[[0, 0, 0],
        [1, 2, 3]],

       [[2, 4, 6],
        [3, 6, 9]]], dtype=uint8)
I can write a naive algorithm using for loops:
# result should be a 3d array; the last dimension has the same size as the mapper result
arr3d = np.empty(arr2d.shape + (3,), dtype=np.uint8)
for y in range(arr2d.shape[1]):
    for x in range(arr2d.shape[0]):
        arr3d[x, y] = mapper(arr2d[x, y])
But it seems quite slow for large arrays.
I know there is np.vectorize, but using
np.vectorize(mapper)(arr2d)
does not work, because of
ValueError: setting an array element with a sequence.
(it seems that vectorize can't change the dimension)
Is there some better (numpy idiomatic and faster) solution?
np.vectorize with the new signature option can handle this. It doesn't improve the speed, but makes the dimensional bookkeeping easier.
In [159]: def blackbox_fn(x):  # I can't be changed!
     ...:     assert np.array(x).shape == (), "I'm a fussy little function!"
     ...:     return np.array([x, 2*x, 3*x])
     ...:
The documentation for signature is a bit cryptic. I've worked with it before, so made a good first guess:
In [161]: f = np.vectorize(blackbox_fn, signature='()->(n)')
In [162]: f(np.ones((2,2)))
Out[162]:
array([[[ 1.,  2.,  3.],
        [ 1.,  2.,  3.]],

       [[ 1.,  2.,  3.],
        [ 1.,  2.,  3.]]])
With your array:
In [163]: arr2d = np.array(list(range(4)), dtype=np.uint8).reshape(2, 2)
In [164]: f(arr2d)
Out[164]:
array([[[0, 0, 0],
        [1, 2, 3]],

       [[2, 4, 6],
        [3, 6, 9]]])
In [165]: _.dtype
Out[165]: dtype('int32')
The dtype is not preserved, because your blackbox_fn doesn't preserve it. By default vectorize makes a trial calculation with the first element and uses its dtype to determine the result's dtype. It is possible to specify the return dtype with the otypes parameter.
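For example, a sketch fixing the output dtype with otypes (uint8 here is just the dtype from the question):
import numpy as np

def blackbox_fn(x):
    return np.array([x, 2 * x, 3 * x])

f8 = np.vectorize(blackbox_fn, signature='()->(n)', otypes=[np.uint8])
arr2d = np.array(list(range(4)), dtype=np.uint8).reshape(2, 2)
print(f8(arr2d).dtype)  # uint8, as specified, rather than the inferred dtype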
It can handle arrays other than 2d:
In [166]: f(np.arange(3))
Out[166]:
array([[0, 0, 0],
       [1, 2, 3],
       [2, 4, 6]])
In [167]: f(3)
Out[167]: array([3, 6, 9])
With a signature, vectorize uses Python-level iteration. Without a signature it uses np.frompyfunc, with somewhat better performance. But as long as blackbox_fn has to be called for each element of the input, we can't improve the speed by much (at most 2x).
np.frompyfunc returns an object dtype array:
In [168]: fpy = np.frompyfunc(blackbox_fn, 1,1)
In [169]: fpy(1)
Out[169]: array([1, 2, 3])
In [170]: fpy(np.arange(3))
Out[170]: array([array([0, 0, 0]), array([1, 2, 3]), array([2, 4, 6])], dtype=object)
In [171]: np.stack(_)
Out[171]:
array([[0, 0, 0],
       [1, 2, 3],
       [2, 4, 6]])
In [172]: fpy(arr2d)
Out[172]:
array([[array([0, 0, 0]), array([1, 2, 3])],
       [array([2, 4, 6]), array([3, 6, 9])]], dtype=object)
stack can't remove the array nesting in this 2d case:
In [173]: np.stack(_)
Out[173]:
array([[array([0, 0, 0]), array([1, 2, 3])],
       [array([2, 4, 6]), array([3, 6, 9])]], dtype=object)
but we can ravel it, and stack. It needs a reshape:
In [174]: np.stack(__.ravel())
Out[174]:
array([[0, 0, 0],
       [1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])
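Spelling that out for the 2d case, a sketch reusing the definitions above: ravel, stack, then restore the leading shape.
import numpy as np

def blackbox_fn(x):
    return np.array([x, 2 * x, 3 * x])

arr2d = np.array(list(range(4)), dtype=np.uint8).reshape(2, 2)
fpy = np.frompyfunc(blackbox_fn, 1, 1)

out = np.stack(fpy(arr2d).ravel()).reshape(arr2d.shape + (3,))
print(out.shape)  # (2, 2, 3)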
Speed tests:
In [175]: timeit f(np.arange(1000))
14.7 ms ± 322 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [176]: timeit fpy(np.arange(1000))
4.57 ms ± 161 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [177]: timeit np.stack(fpy(np.arange(1000).ravel()))
6.71 ms ± 207 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [178]: timeit np.array([blackbox_fn(i) for i in np.arange(1000)])
6.44 ms ± 235 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Having your function return a list instead of an array might make reassembling the result easier, and maybe even faster:
def foo(x):
    return [x, 2*x, 3*x]
or playing about with the frompyfunc parameters:
def foo(x):
    return x, 2*x, 3*x  # return a tuple
In [204]: np.stack(np.frompyfunc(foo, 1,3)(arr2d),2)
Out[204]:
array([[[0, 0, 0],
        [1, 2, 3]],

       [[2, 4, 6],
        [3, 6, 9]]], dtype=object)
10x speed up - I'm surprised:
In [212]: foo1 = np.frompyfunc(foo, 1,3)
In [213]: timeit np.stack(foo1(np.arange(1000)),1)
428 µs ± 17.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
You can use basic NumPy broadcasting for this kind of "outer product":
np.arange(3)[:, None] * np.arange(2)
# array([[0, 0],
#        [0, 1],
#        [0, 2]])
In your case it would be:
def mapper(x):
    return (np.arange(1, 4)[:, None, None] * x).transpose((1, 2, 0))
Note that the .transpose() is only needed if you specifically need the new axis to be at the end.
And it is almost 3x as fast as stacking 3 separate multiplications:
def mapper(x):
    return (np.arange(1, 4)[:, None, None] * x).transpose((1, 2, 0))

def mapper2(x):
    return np.stack((x, 2 * x, 3 * x), axis=-1)

a = np.arange(30000).reshape(-1, 30)
%timeit mapper(a)   # 48.2 µs ± 417 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit mapper2(a)  # 137 µs ± 3.57 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
I might be getting this wrong, but comprehension does the job:
a = np.array([[0, 1],
              [2, 3]])
np.array([[[j, j*2, j*3] for j in i] for i in a])
# [[[0 0 0]
#   [1 2 3]]
#
#  [[2 4 6]
#   [3 6 9]]]

how to create an array of specified dimension of specific type initialized with same value in python?

I want to create an array in Python of specified dimensions and a specific type, initialized with the same value everywhere. I can create numpy arrays of a specific size, but I am not sure how to initialize them with a specific value. Of course I don't want to use zeros() or ones().
Thanks a lot.
There are lots of ways to do this. The first one-liner that occurred to me is tile:
>>> numpy.tile(2, 25)
array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2])
You can tile a value in any shape:
>>> numpy.tile(2, (5, 5))
array([[2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2]])
However, as a number of answers below indicate, this isn't the fastest method. It's designed for tiling arrays of any size, not just single values, so if you really just want to fill an array with a single value, then it's much faster to allocate the array first, and then use slice assignment:
>>> a = numpy.empty((5, 5), dtype=int)
>>> a[:] = 2
>>> a
array([[2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2]])
According to a few tests I did, there aren't any faster approaches. However, two of the approaches mentioned in answers below are equally fast: ndarray.fill and numpy.full.
These tests were all done in ipython, using Python 3.6.1 on a newish mac running OS 10.12.6. Definitions:
def fill_tile(value, shape):
    return numpy.tile(value, shape)

def fill_assign(value, shape, dtype):
    new = numpy.empty(shape, dtype=dtype)
    new[:] = value
    return new

def fill_fill(value, shape, dtype):
    new = numpy.empty(shape, dtype=dtype)
    new.fill(value)
    return new

def fill_full(value, shape, dtype):
    return numpy.full(shape, value, dtype=dtype)

def fill_plus(value, shape, dtype):
    new = numpy.zeros(shape, dtype=dtype)
    new += value
    return new

def fill_plus_oneline(value, shape, dtype):
    return numpy.zeros(shape, dtype=dtype) + value

for f in [fill_assign, fill_fill, fill_full, fill_plus, fill_plus_oneline]:
    assert (fill_tile(2, (500, 500)) == f(2, (500, 500), int)).all()
tile is indeed quite slow:
In [3]: %timeit fill_tile(2, (500, 500))
947 µs ± 10.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Slice assignment ties with ndarray.fill and numpy.full for first place:
In [4]: %timeit fill_assign(2, (500, 500), int)
102 µs ± 1.37 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [5]: %timeit fill_fill(2, (500, 500), int)
102 µs ± 1.99 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [6]: %timeit fill_full(2, (500, 500), int)
102 µs ± 1.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In-place broadcasted addition is only slightly slower:
In [7]: %timeit fill_plus(2, (500, 500), int)
179 µs ± 3.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
And non-in-place broadcasted addition is only slightly slower than that:
In [8]: %timeit fill_plus_oneline(2, (500, 500), int)
213 µs ± 4.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
How about:
shape = (100, 100)
val = 3.14
dt = np.float64  # np.float was an alias for the builtin float and has been removed from NumPy
a = np.empty(shape, dtype=dt)
a.fill(val)
This way you can set things up and pass the parameters in. Also, in terms of timings:
In [35]: %timeit a=np.empty(shape,dtype=dt); a.fill(val)
100000 loops, best of 3: 13 us per loop
In [36]: %timeit a=np.tile(val,shape)
10000 loops, best of 3: 102 us per loop
So using empty with fill seems significantly faster than tile.
As of NumPy 1.8, you can use numpy.full() to achieve this.
>>> import numpy as np
>>> np.full((3,4), 100, dtype = int)
array([[100, 100, 100, 100],
       [100, 100, 100, 100],
       [100, 100, 100, 100]])
Are you looking for something like this?
>>> [3 for x in range(10)]
[3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
You can pass the resulting array to numpy.array.
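For instance:
import numpy as np

a = np.array([3 for x in range(10)])  # or simply np.array([3] * 10)
print(a)  # [3 3 3 3 3 3 3 3 3 3]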
