NumPy reshape array in-place - python

I need to do an in-place resize of a NumPy array, so I'd prefer the numpy.resize() function to numpy.reshape(). However, I find that numpy.resize() returns an array with the wrong dimensions if I specify -1 for one of the dimensions of the target shape. Does anyone know why this is? What is an alternative way to resize an array in place?

The in-place resize you get with ndarray.resize does not allow for negative dimensions. You can easily check this yourself:
a = np.array([[0, 1], [2, 3]])
a.resize((4, -1))
> ValueError: negative dimensions not allowed
In most cases, np.reshape returns a view of the array, so there is no unnecessary copying or additional memory allocation involved (though it does not modify the array in place):
a_view = a.reshape(4,-1)
np.shares_memory(a, a_view)
# True
But even though reshape does not operate in place, you can assign the new shape directly to the shape attribute of the array, which does allow for negative dimensions:
a.shape = (4,-1)
This is an in-place operation, and just as efficient as a.resize((4,1)) would be. Note that this method raises an error when the reshape cannot be done without copying the data.
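For example (a sketch; the exact message may vary with the NumPy version), assigning a shape to a non-contiguous view fails rather than silently copying:
t = np.arange(6).reshape(2, 3).T  # non-contiguous view
t.shape = (6,)
> AttributeError: Incompatible shape for in-place modification. Use `.reshape()` to make a copy with the desired shape.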
Here are some timings for efficiency comparison with a larger array, including the timings for reassigning from a view:
def inplace_reshape(a):
    a.shape = (10000, -1)

def inplace_resize(a):
    a.resize((10000, 3))

def reshaped_view(a):
    a = np.reshape(a, (10000, -1))

def resized_copy(a):
    a = np.resize(a, (10000, 3))
a = np.random.random((30000,1))
%timeit inplace_reshape(a)
# 383 ns ± 14.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit inplace_resize(a)
# 294 ns ± 20.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit reshaped_view(a)
# 1.5 µs ± 25.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit resized_copy(a)
# 21.5 µs ± 289 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Both of them produce the same result:
b = np.copy(a)
a.shape = (10000,-1)
b.resize((10000,3))
np.array_equal(a,b)
# True
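One more caveat worth knowing (my addition, not from the original post): ndarray.resize refuses to operate when other references to the array exist, unless you pass refcheck=False:
c = np.zeros(6)
d = c  # second reference to the same array
c.resize((2, 3))
> ValueError: cannot resize an array that references or is referenced by another array in this way.
c.resize((2, 3), refcheck=False)  # resizes anyway; only safe if no views depend on the old buffer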

Related

Performance of numpy all/any vs testing a single element

I create an array that does not contain a single zero (let's ignore the vanishingly small probability that it does; np.random.rand() samples [0,1) uniformly). I want to check whether all values are equal to zero (in another use case the arrays may contain all zeros). Below are some timings.
Surprisingly to me, checking a single (nonzero) element is about 2000 times faster than using np.all() or np.any(). I would assume that the compiler internally replaces np.all() by np.any() of the inverse condition and that np.any()/np.all() returns True/False at the first instance that the condition is fulfilled/violated (i.e. the compiler does not create the entire array of True or False values first).
How come np.all() and np.any() are so much slower when they would only have to check one element? Or is this because of the prior knowledge I have that the array does not contain all zeros? In the case of an all-zeros array, I guess it might be too slow to do the boolean comparison separately for each element. I don't know about the performance of the underlying low-level algorithms, but each element needs to be accessed once regardless of whether it is checked one by one or the whole boolean array is created first.
import numpy as np
np.random.seed(100)
a = np.random.rand(10418,144)
%timeit a[0,0] == 0
%timeit (a == 0).all()
%timeit np.all(a == 0)
%timeit (a != 0).any()
%timeit np.any(a != 0)
# 400 ns ± 2.08 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# 713 µs ± 382 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 720 µs ± 1.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 711 µs ± 407 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 723 µs ± 630 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
When you write a == 0, numpy creates a new array of type boolean, compares each element in a with 0 and stores the result in the array. This allocation, initialization, and subsequent deallocation is the reason for the high cost.
Note that you don't need the explicit a == 0 in the first place. Values that are zero always evaluate to False, nonzero values to True, so np.all(a) is equivalent to np.all(a != 0), and np.all(a == 0) is equivalent to not np.any(a).
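A quick sketch of that equivalence; np.any(a) truth-tests the elements directly, so no (a == 0) temporary is allocated:
assert bool(np.all(a == 0)) == (not np.any(a))  # both False for the random array above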

How to create an array with values along specified axis?

numpy.full() is a great function which allows us to generate an array of specific shape and values. For example,
>>> np.full((2,2), [1,2])
array([[1, 2],
       [1, 2]])
However, it does not have a built-in option to apply values along a specific axis. So, the following code would not work:
>>> np.full((2,2), [1,2], axis=0)
array([[1, 1],
       [2, 2]])
Hence, I am wondering how I can create a 10x48x271x397 multidimensional array with values [1,2,3,4,5,6,7,8,9,10] inserted along axis=0? In other words, an array with [1,2,3,4,5,6,7,8,9,10] repeated along the first dimensional axis. Is there a way to do this using numpy.full() or an alternative method?
#Does not work, no axis argument in np.full()
values=[1,2,3,4,5,6,7,8,9,10]
np.full((10, 48, 271, 397), values, axis=0)
Edit: adding ideas from Michael Szczesny
import numpy as np
shape = (10, 48, 271, 397)
root = np.arange(shape[0])
You can use np.full or np.broadcast_to (the latter only creates a view at creation time):
arr1 = np.broadcast_to(root, shape[::-1]).T
arr2 = np.full(shape[::-1], fill_value=root).T
%timeit np.broadcast_to(root, shape[::-1]).T
%timeit np.full(shape[::-1], fill_value=root).T
# 3.56 µs ± 18.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
# 75.6 ms ± 243 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
And instead of reversing the shape and then transposing the array back, you can use singleton dimensions, though it seems less generalizable:
root = root[:, None, None, None]
arr3 = np.broadcast_to(root, shape)
arr4 = np.full(shape, fill_value=root)
root = np.arange(shape[0])
%timeit root_ = root[:, None, None, None]; np.broadcast_to(root_, shape)
%timeit root_ = root[:, None, None, None]; np.full(shape, fill_value=root_)
# 3.61 µs ± 6.36 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
# 57.5 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Checks that everything is equal and actually what we want:
assert arr1.shape == shape
for i in range(shape[0]):
    sub = arr1[i]
    assert np.all(sub == i)
assert np.all(arr1 == arr2)
assert np.all(arr1 == arr3)
assert np.all(arr1 == arr4)
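One caveat to add here (easy to verify): np.broadcast_to returns a read-only view, so arr1 and arr3 cannot be written to; call .copy() if you need a writable array:
arr1[0, 0, 0, 0] = 42
> ValueError: assignment destination is read-only
arr1_writable = arr1.copy()  # materializes the broadcast into writable memory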

Why is np.linalg.norm(..., axis=1) slower than writing out the formula for vector norms?

To normalize the rows of a matrix X to unit length, I usually use:
X /= np.linalg.norm(X, axis=1, keepdims=True)
Trying to optimize this operation for an algorithm, I was quite surprised to see that writing out the normalization is about 40% faster on my machine:
X /= np.sqrt(X[:,0]**2+X[:,1]**2+X[:,2]**2)[:,np.newaxis]
X /= np.sqrt(sum(X[:,i]**2 for i in range(X.shape[1])))[:,np.newaxis]
How come? Where is the performance lost in np.linalg.norm()?
import numpy as np
X = np.random.randn(10000,3)
%timeit X/np.linalg.norm(X,axis=1, keepdims=True)
# 276 µs ± 4.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit X/np.sqrt(X[:,0]**2+X[:,1]**2+X[:,2]**2)[:,np.newaxis]
# 169 µs ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit X/np.sqrt(sum(X[:,i]**2 for i in range(X.shape[1])))[:,np.newaxis]
# 185 µs ± 4.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
I observe this for (1) python3.6 + numpy v1.17.2 and (2) python3.9 + numpy v1.19.3 on a MacbookPro 2015 with OpenBLAS support.
I don't think this is a duplicate of this post, which addresses matrix norms, while this one is about the L2-norm of vectors.
The source code for row-wise L2-norm boils down to the following lines of code:
def norm(x, keepdims=False):
    x = np.asarray(x)
    s = x**2
    return np.sqrt(s.sum(axis=(1,), keepdims=keepdims))
The simplified code assumes real-valued x and makes use of the fact that np.add.reduce(s, ...) is equivalent to s.sum(...).
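As a quick sanity check (my addition), the simplified version agrees with np.linalg.norm for real-valued input:
X_check = np.random.randn(10, 3)
assert np.allclose(norm(X_check, keepdims=True), np.linalg.norm(X_check, axis=1, keepdims=True))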
The OP's question therefore reduces to asking why np.sum(x, axis=1) is slower than sum(x[:,i] for i in range(x.shape[1])):
%timeit X.sum(axis=1, keepdims=False)
# 131 µs ± 1.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit sum(X[:,i] for i in range(X.shape[1]))
# 36.7 µs ± 91.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
This question has been answered already here. In short, the reduction (.sum(axis=1)) comes with overhead costs that generally pay off in terms of floating-point precision and speed (e.g. cache mechanics, parallelism), but don't in the special case of a reduction over just three columns. In this case, the overhead is relatively large compared to the actual computation.
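The precision payoff is easy to demonstrate (a sketch; exact numbers vary by platform): np.sum uses pairwise summation, while np.cumsum accumulates sequentially and drifts in float32:
x32 = np.full(10**7, 0.1, dtype=np.float32)
print(x32.sum())         # ~1000000.0, pairwise summation keeps the rounding error small
print(x32.cumsum()[-1])  # noticeably off, sequential accumulation lets the error grow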
The situation changes if X has more columns. The numpy-boosted normalization now is substantially faster than the reduction using a python for-loop:
X = np.random.randn(10000,100)
%timeit X/np.linalg.norm(X,axis=1, keepdims=True)
# 3.36 ms ± 132 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit X/np.sqrt(sum(X[:,i]**2 for i in range(X.shape[1])))[:,np.newaxis]
# 5.92 ms ± 168 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Another related SO thread is found here: numpy ufuncs vs. for loop.
The question remains why common special cases of reduction (such as summation over the columns or rows of a matrix with a small axis dimension) are not treated explicitly by numpy. Maybe it's because the effect of such optimizations often depends strongly on the target machine and would increase code complexity considerably.

Performance of various numpy fancy indexing methods, also with numba

Since fast indexing of Numpy arrays is quite necessary for my program, and fancy indexing doesn't have a good reputation performance-wise, I decided to run a few tests. Especially since Numba is developing quite fast, I tested which methods work well with numba.
As inputs I've been using the following arrays for my small-arrays-test:
import numpy as np
import numba as nb
x = np.arange(0, 100, dtype=np.float64) # array to be indexed
idx = np.array((0, 4, 55, -1), dtype=np.int32) # fancy indexing array
bool_mask = np.zeros(x.shape, dtype=bool) # boolean indexing mask
bool_mask[idx] = True # set the same elements as in idx to True
y = np.zeros(idx.shape, dtype=np.float64) # output array
y_bool = np.zeros(bool_mask[bool_mask == True].shape, dtype=np.float64) # bool output array (only for convenience)
And the following arrays for my large-arrays-test (y_bool needed here to cope with dupe numbers from randint):
x = np.arange(0, 1000000, dtype=np.float64)
idx = np.random.randint(0, 1000000, size=int(1000000/50))
bool_mask = np.zeros(x.shape, dtype=bool)
bool_mask[idx] = True
y = np.zeros(idx.shape, dtype=np.float64)
y_bool = np.zeros(bool_mask[bool_mask == True].shape, dtype=np.float64)
This yields the following timings without using numba:
%timeit x[idx]
#1.08 µs ± 21 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
#large arrays: 129 µs ± 3.45 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit x[bool_mask]
#482 ns ± 18.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
#large arrays: 621 µs ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.take(x, idx)
#2.27 µs ± 104 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# large arrays: 112 µs ± 5.76 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.take(x, idx, out=y)
#2.65 µs ± 134 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# large arrays: 134 µs ± 4.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit x.take(idx)
#919 ns ± 21.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# large arrays: 108 µs ± 1.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit x.take(idx, out=y)
#1.79 µs ± 40.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# large arrays: 131 µs ± 2.92 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.compress(bool_mask, x)
#1.93 µs ± 95.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# large arrays: 618 µs ± 15.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.compress(bool_mask, x, out=y_bool)
#2.58 µs ± 167 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# large arrays: 637 µs ± 9.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit x.compress(bool_mask)
#900 ns ± 82.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# large arrays: 628 µs ± 17.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit x.compress(bool_mask, out=y_bool)
#1.78 µs ± 59.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# large arrays: 628 µs ± 13.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.extract(bool_mask, x)
#5.29 µs ± 194 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# large arrays: 641 µs ± 13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
And with numba, I decorated the indexing methods that numba supports, jitting in nopython mode with caching and nogil:
@nb.jit(nopython=True, cache=True, nogil=True)
def fancy(x, idx):
    return x[idx]

@nb.jit(nopython=True, cache=True, nogil=True)
def fancy_bool(x, bool_mask):
    return x[bool_mask]

@nb.jit(nopython=True, cache=True, nogil=True)
def taker(x, idx):
    return np.take(x, idx)

@nb.jit(nopython=True, cache=True, nogil=True)
def ndtaker(x, idx):
    return x.take(idx)
This yields the following results for small and large arrays:
%timeit fancy(x, idx)
#686 ns ± 25.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# large arrays: 84.7 µs ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit fancy_bool(x, bool_mask)
#845 ns ± 31 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# large arrays: 843 µs ± 14.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit taker(x, idx)
#814 ns ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# large arrays: 87 µs ± 1.52 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit ndtaker(x, idx)
#831 ns ± 24.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# large arrays: 85.4 µs ± 2.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Summary
While for numpy without numba it is clear that small arrays are by far best indexed with boolean masks (about a factor of 2 compared to ndarray.take(idx)), for larger arrays ndarray.take(idx) performs best, in this case around 6 times faster than boolean indexing. The break-even point is at an array size of around 1000 cells with an index-array size of around 20 cells.
For arrays with 1e5 elements and an index-array size of 5e3, ndarray.take(idx) is around 10 times faster than boolean mask indexing. So boolean indexing slows down considerably with array size, but catches up a little after some array-size threshold is reached.
For the numba-jitted functions there is a small speedup for all indexing methods except boolean mask indexing. Simple fancy indexing works best here, but is still slower than boolean masking without jitting.
For larger arrays boolean mask indexing is a lot slower than the other methods, and even slower than the non-jitted version. The three other methods all perform quite well, around 15% faster than the non-jitted version.
For my case with many arrays of different sizes, fancy indexing with numba is the best way to go. Perhaps some other people can also find some useful information in this quite lengthy post.
Edit:
I'm sorry that I forgot to ask my actual question, which I in fact have. I was rapidly typing this at the end of my workday and completely forgot it...
Well, do you know of any better and faster method than those I tested? Using Cython, my timings were between those of Numba and Python.
Since the index array is defined once and used without alteration over long iterations, any way of pre-defining the indexing process would be great. For this I thought about using strides, but I wasn't able to pre-define a custom set of strides. Is it possible to get a predefined view into the memory using strides?
Edit 2:
I guess I'll move my question about predefined constant index arrays, which will be used on the same value array (where only the values change, not the shape) a few million times in iterations, to a new and more specific question. This question was too general, and perhaps I also formulated it a little misleadingly. I'll post the link here as soon as I have opened the new question!
Here is the link to the followup question.
Your summary isn't completely correct: you already ran tests with differently sized arrays, but one thing you didn't do was change the number of elements indexed.
I restricted it to pure indexing and omitted take (which is effectively integer array indexing) as well as compress and extract (which are effectively boolean array indexing). The only difference is in the constant factors: the constant factor for the methods take and compress will be less than the overhead of the numpy functions np.take and np.compress, but otherwise the effects will be negligible for reasonably sized arrays.
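A quick sketch of those equivalences, using the arrays from the question:
assert np.array_equal(x[idx], np.take(x, idx))                  # take == integer array indexing
assert np.array_equal(x[bool_mask], np.compress(bool_mask, x))  # compress == boolean array indexing
assert np.array_equal(x[bool_mask], np.extract(bool_mask, x))   # extract likewise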
Just let me present it with different numbers:
# ~ every 500th element
x = np.arange(0, 1000000, dtype=np.float64)
idx = np.random.randint(0, 1000000, size=int(1000000/500)) # changed the ratio!
bool_mask = np.zeros(x.shape, dtype=bool)
bool_mask[idx] = True
%timeit x[idx]
# 51.6 µs ± 2.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit x[bool_mask]
# 1.03 ms ± 37.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# ~ every 50th element
idx = np.random.randint(0, 1000000, size=int(1000000/50)) # changed the ratio!
bool_mask = np.zeros(x.shape, dtype=bool)
bool_mask[idx] = True
%timeit x[idx]
# 1.46 ms ± 55.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit x[bool_mask]
# 2.69 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# ~ every 5th element
idx = np.random.randint(0, 1000000, size=int(1000000/5)) # changed the ratio!
bool_mask = np.zeros(x.shape, dtype=bool)
bool_mask[idx] = True
%timeit x[idx]
# 14.9 ms ± 495 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit x[bool_mask]
# 8.31 ms ± 181 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So what happened here? It's simple: integer array indexing only needs to access as many elements as there are values in the index array. That means it is quite fast if there are few indices, but slow if there are many. Boolean array indexing, however, always needs to walk through the whole boolean array and check for True values, so its cost should be roughly constant for a given array size.
But wait: it's not really constant for boolean arrays, and why does integer array indexing take longer (last case) than boolean array indexing even though it has to process ~5 times fewer elements?
That's where it gets more complicated. In this case the boolean array has True at random places, which means it is subject to branch-prediction failures. These are more likely when True and False occur equally often but at random places. That's why the boolean array indexing got slower: the ratio of True to False became more equal and thus more "random". The result array is also larger when there are more Trues, which consumes more time as well.
To see this branch-prediction effect, compare these cases (results could differ between systems/compilers):
bool_mask = np.zeros(x.shape, dtype=bool)
bool_mask[:1000000//2] = True # first half True, second half False
%timeit x[bool_mask]
# 5.92 ms ± 118 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
bool_mask = np.zeros(x.shape, dtype=bool)
bool_mask[::2] = True # True and False alternating
%timeit x[bool_mask]
# 16.6 ms ± 361 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
bool_mask = np.zeros(x.shape, dtype=bool)
bool_mask[::2] = True
np.random.shuffle(bool_mask) # shuffled
%timeit x[bool_mask]
# 18.2 ms ± 325 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So the distribution of True and False critically affects the runtime with boolean masks, even if they contain the same number of Trues! The same effect is visible for the compress functions.
For integer array indexing (and likewise np.take) another effect is visible: cache locality. The indices in your case are randomly distributed, so your computer has to do a lot of "RAM" to "processor cache" loads, because it is very unlikely that two consecutive indices are near each other.
Compare this:
idx = np.random.randint(0, 1000000, size=int(1000000/5))
%timeit x[idx]
# 15.6 ms ± 703 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
idx = np.random.randint(0, 1000000, size=int(1000000/5))
idx = np.sort(idx) # sort them
%timeit x[idx]
# 4.33 ms ± 366 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Sorting the indices immensely increases the chance that the next value is already in the cache, and this can lead to huge speedups. That's a very important factor if you know the indices will be sorted (for example, indices created by np.where are sorted, which makes the result of np.where especially efficient for indexing).
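A small sketch of that np.where property:
idx_from_where = np.where(bool_mask)[0]      # np.where scans the mask in order
assert np.all(np.diff(idx_from_where) >= 0)  # indices come out sorted; no extra np.sort needed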
So it's not that integer array indexing is slower for small arrays and faster for large arrays; it depends on many more factors. Both have their use cases, and depending on the circumstances one may be (considerably) faster than the other.
Let me also talk a bit about the numba functions. First some general statements:
cache won't make a difference; it just avoids recompiling the function. In interactive environments this is essentially useless. It would be faster if you packaged the functions in a module, though.
nogil by itself won't provide any speed boost. It helps when the function is called from different threads, because each execution can release the GIL, allowing multiple calls to run in parallel.
Otherwise I don't know how numba effectively implements these functions. However, when you use NumPy features in numba it can be slower or faster, and even if it's faster it won't be much faster (except maybe for small arrays), because if it could be made faster, the NumPy developers would implement it as well. My rule of thumb: if you can do it (vectorized) with NumPy, don't bother with numba. Only when you can't do it with vectorized NumPy functions, or NumPy would use too many temporary arrays, will numba shine!
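To illustrate the nogil point, here is a minimal sketch (my construction, not from the original post) that calls a jitted function from several threads so the executions can overlap:
from concurrent.futures import ThreadPoolExecutor

@nb.jit(nopython=True, nogil=True, cache=True)
def fancy_nogil(arr, indices):
    return arr[indices]

x = np.arange(1000000, dtype=np.float64)
chunks = [np.random.randint(0, x.size, size=20000) for _ in range(4)]
with ThreadPoolExecutor(max_workers=4) as ex:
    # each call releases the GIL inside the jitted code, so the lookups can run in parallel
    results = list(ex.map(lambda i: fancy_nogil(x, i), chunks))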

How to get figure for web traffic + how to append column to numpy array?

I'd like to know how to append a column to a numpy array. Assuming I read in a .tsv as follows:
from sklearn import metrics, preprocessing, cross_validation
from sklearn.feature_extraction.text import TfidfVectorizer
import sklearn.linear_model as lm
import pandas as p
import numpy as np
print("loading data..")
traindata = np.array(p.read_table('train.tsv')) # here is where I am unsure what to do
The first column of traindata holds the URL of each webpage.
The logic I would like after this is:
for each row in traindata:
    # run a function to look up the traffic the webpage is getting; store this in a numpy array
Then add a new column to the traindata numpy array, appending the data from the array created in the "for each" above.
How can this be accomplished generally, even if you just use a "filler" method for retrieving web traffic? :)
Thanks!
Inputs and outputs:
Input: Numpy array of 26 columns.
We call a function on the value in the first column of each row; this function returns a number.
We append all these numbers into a numpy array with one column.
We append the one-column array to the 26-column array to end up with a numpy array of 27 columns.
Output: Numpy array of 27 columns.
You can use numpy.hstack to append columns, like this:
import numpy as np

def some_function(x):
    return 3*x

input = np.ones([10, 26])
input = np.hstack([input, np.empty([input.shape[0], 1])])
for row in input:
    row[-1] = some_function(row[0])
output = input
One thing I don't like about numpy.hstack and numpy.c_ is that they aren't flexible enough to work on both 2-dimensional and 1-dimensional arrays.
For example, if I'm trying to calculate a value based on, say, the magnitude of a vector, and append it to that vector (like lifting a point to the paraboloid in a Delaunay triangulation problem), I'd like that function to work for a single 1D array or an array of 1D arrays. The function that I ended up with is:
def append_last_dim(array_in, array_augment):
    newshape = list(array_in.T.shape)
    newshape[0] += 1
    ret_array = np.empty(newshape)
    ret_array[:-1] = array_in.T
    ret_array[-1] = array_augment
    return ret_array.T
Example:
point_list = np.random.rand(5, 4)
list_augment = (point_list**2).sum(axis=-1) # shape (5,)
%timeit aug_array = append_last_dim(point_list, list_augment)
# 1.68 µs ± 19.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
point = point_list[0] # shape (4,)
augment = list_augment[0] # shape ()
%timeit append_last_dim(point, augment)
# 1.24 µs ± 9.78 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
def lift_point(point): # this works for 1 point or an array of points
    return append_last_dim(point, (point**2).sum(axis=-1))
lift_point(point_list).shape # (5,5)
lift_point(point).shape # (5,)
numpy.c_ works with the array of points as-is, but is 10x slower and doesn't work for a single input array:
%timeit retval = np.c_[point_list, list_augment]
# 13.8 µs ± 47.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
np.c_[point,augment]
# ValueError: all the input array dimensions for the concatenation axis must match exactly,
# but along dimension 0, the array at index 0 has size 4 and the array at
# index 1 has size 1
np.hstack and np.append don't work on the arguments as-is, since point_list and list_augment have different dimensions, but if you reshape the augment, the result is still ~2x slower and can't handle a single input or an array of inputs with a unified call:
%timeit np.hstack((point_list, list_augment.reshape(5,1)))
# 3.49 µs ± 21.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.append(point_list, list_augment.reshape((5,1)), axis=1)
# 2.45 µs ± 7.91 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Here are times for a list of 1000 points:
point_1k_list = np.random.rand(1000, 4)
point_augment = (point_1k_list**2).sum(axis=-1)
%timeit append_last_dim(point_1k_list,point_augment)
# 3.91 µs ± 35 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.append(point_1k_list,point_augment.reshape((1000,1)),axis=1)
# 6.5 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.hstack((point_1k_list,point_augment.reshape((1000,1))))
# 7.82 µs ± 31.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.c_[point_1k_list,point_augment]
# 19.3 µs ± 234 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
I'm not sure why I can't find better built-in support in numpy for handling single-point or vectorized data, like the 'lift_point' function above.
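For what it's worth, here is a sketch (my addition) that gets similar one-point-or-many behavior from a builtin, by giving the augment a trailing axis before concatenating:
def append_last_dim_concat(array_in, array_augment):
    # np.asarray(...)[..., None] turns a scalar into shape (1,) and a (5,) array into (5,1),
    # so the same call works for a single point or a batch of points
    return np.concatenate([array_in, np.asarray(array_augment)[..., None]], axis=-1)
append_last_dim_concat(point_list, list_augment).shape # (5, 5)
append_last_dim_concat(point, augment).shape # (5,)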
