numpy.full() is a great function which allows us to generate an array of specific shape and values. For example,
>>> np.full((2, 2), [1, 2])
array([[1, 2],
       [1, 2]])
However, it does not have a built-in option to apply values along a specific axis. So the following code does not work (the output shown is what I would like to get):
>>> np.full((2, 2), [1, 2], axis=0)
array([[1, 1],
       [2, 2]])
Hence, I am wondering how I can create a 10x48x271x397 multidimensional array with values [1,2,3,4,5,6,7,8,9,10] inserted along axis=0? In other words, an array with [1,2,3,4,5,6,7,8,9,10] repeated along the first dimensional axis. Is there a way to do this using numpy.full() or an alternative method?
#Does not work, no axis argument in np.full()
values=[1,2,3,4,5,6,7,8,9,10]
np.full((10, 48, 271, 397), values, axis=0)
Edit: adding ideas from Michael Szczesny
import numpy as np
shape = (10, 48, 271, 397)
root = np.arange(shape[0])
You can use np.full or np.broadcast_to (the latter only creates a read-only view, so nothing is copied at creation time):
arr1 = np.broadcast_to(root, shape[::-1]).T
arr2 = np.full(shape[::-1], fill_value=root).T
%timeit np.broadcast_to(root, shape[::-1]).T
%timeit np.full(shape[::-1], fill_value=root).T
# 3.56 µs ± 18.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
# 75.6 ms ± 243 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
And instead of reversing the shape and then transposing the array back, you can use singleton dimensions, but that seems less generalizable:
root = root[:, None, None, None]
arr3 = np.broadcast_to(root, shape)
arr4 = np.full(shape, fill_value=root)
root = np.arange(shape[0])
%timeit root_ = root[:, None, None, None]; np.broadcast_to(root_, shape)
%timeit root_ = root[:, None, None, None]; np.full(shape, fill_value=root_)
# 3.61 µs ± 6.36 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
# 57.5 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Checks that everything is equal and actually what we want:
assert arr1.shape == shape
for i in range(shape[0]):
    sub = arr1[i]
    assert np.all(sub == i)
assert np.all(arr1 == arr2)
assert np.all(arr1 == arr3)
assert np.all(arr1 == arr4)
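The singleton-dimension idea above can be generalized to any axis. The following is a minimal sketch, not a NumPy API: the helper name full_along_axis and its writable flag are made up for illustration. It places the values on the requested axis, broadcasts over the rest, and copies only when a writable array is requested.

```python
import numpy as np

def full_along_axis(shape, values, axis=0, writable=False):
    """Hypothetical helper: repeat 1-D `values` along `axis` of an array of `shape`."""
    values = np.asarray(values)
    if values.ndim != 1 or values.size != shape[axis]:
        raise ValueError("values must be 1-D with length shape[axis]")
    # Put the values on `axis` and singleton dimensions everywhere else.
    index = [None] * len(shape)
    index[axis] = slice(None)
    view = np.broadcast_to(values[tuple(index)], shape)
    # np.broadcast_to returns a read-only view; copy only if writes are needed.
    return view.copy() if writable else view

demo = full_along_axis((2, 2), [1, 2], axis=0)
print(demo)

# The shape from the question; as a view this allocates no new data.
big = full_along_axis((10, 48, 271, 397), np.arange(1, 11), axis=0)
print(big.shape, big.flags.writeable)
```

Keeping the view is only appropriate if the array is never written to; with writable=True the full array is materialized, which costs about as much as the np.full variant.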
I implemented some code to get the maximum occurrence count in a numpy array. The numba version was satisfactory, but it has limitations. I wonder whether it can be improved to handle the general case.
numba implementation
import numba as nb
import numpy as np
import collections
@nb.njit("int64(int64[:])")
def max_count_unique_num(x):
    """
    Counts maximum number of unique integer in x.

    Args:
        x (numpy array): Integer array.

    Returns:
        Int
    """
    # get maximum value
    m = x[0]
    for v in x:
        if v > m:
            m = v
    if m == 0:
        return x.size
    # count each unique value
    num = np.zeros(m + 1, dtype=x.dtype)
    for k in x:
        num[k] += 1
    # maximum count
    m = 0
    for k in num:
        if k > m:
            m = k
    return m
For comparison, I also implemented versions based on numpy's unique and collections.Counter:
def np_unique(x):
    """ Counts maximum occurrence using numpy's unique. """
    ux, uc = np.unique(x, return_counts=True)
    return uc.max()

def counter(x):
    """ Counts maximum occurrence using collections.Counter. """
    counts = collections.Counter(x)
    return max(counts.values())
timeit
Edit: Add np.bincount for additional comparison, as suggested by @MechanicPig.
In [1]: x = np.random.randint(0, 2000, size=30000).astype(np.int64)
In [2]: %timeit max_count_unique_num(x)
30 µs ± 387 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [3]: %timeit np_unique(x)
1.14 ms ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [4]: %timeit counter(x)
2.68 ms ± 33.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [5]: x = np.random.randint(0, 200000, size=30000).astype(np.int64)
In [6]: %timeit counter(x)
3.07 ms ± 40.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [7]: %timeit np_unique(x)
1.3 ms ± 7.35 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [8]: %timeit max_count_unique_num(x)
490 µs ± 1.47 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [9]: x = np.random.randint(0, 2000, size=30000).astype(np.int64)
In [10]: %timeit np.bincount(x).max()
32.3 µs ± 250 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [11]: x = np.random.randint(0, 200000, size=30000).astype(np.int64)
In [12]: %timeit np.bincount(x).max()
830 µs ± 6.09 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
The limitations of the numba implementation are quite obvious: it is only efficient when all values in x are small non-negative integers, its speed drops significantly for very large integers, and it does not work for floats or negative values.
Is there any way to generalize the implementation and keep the speed?
Update
After checking the source code of np.unique, an implementation for general cases can be:
@nb.njit(["int64(int64[:])", "int64(float64[:])"])
def max_count_unique_num_2(x):
    x.sort()
    n = 0
    k = 0
    x0 = x[0]
    for v in x:
        if x0 == v:
            k += 1
        else:
            if k > n:
                n = k
            k = 1
            x0 = v
    # for last item in x if it equals to previous one
    if k > n:
        n = k
    return n
timeit
In [154]: x = np.random.randint(0, 200000, size=30000).astype(np.int64)
In [155]: %timeit max_count_unique_num(x)
519 µs ± 5.33 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [156]: %timeit np_unique(x)
1.3 ms ± 9.88 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [157]: %timeit max_count_unique_num_2(x)
240 µs ± 1.92 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [158]: x = np.random.randint(0, 200000, size=300000).astype(np.int64)
In [159]: %timeit max_count_unique_num(x)
1.01 ms ± 7.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [160]: %timeit np_unique(x)
18.1 ms ± 395 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [161]: %timeit max_count_unique_num_2(x)
3.58 ms ± 28.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So:
If x contains large integers and its size is not large, max_count_unique_num_2 beats max_count_unique_num.
Both max_count_unique_num and max_count_unique_num_2 are significantly faster than np.unique.
A small modification to max_count_unique_num_2 can return the item with the maximum occurrence, or all such items when several share the same maximum.
max_count_unique_num_2 can be made even faster by removing x.sort() if x is already sorted.
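The modification mentioned above (returning the most frequent item, or all of them on a tie) might look like the sketch below. It uses the same sort-then-count-runs idea but is written with plain NumPy instead of numba so that it runs without extra dependencies. The function name most_frequent is made up for illustration, and a non-empty 1-D input is assumed.

```python
import numpy as np

def most_frequent(x):
    """Hypothetical helper: return (values, count) for the most frequent value(s) in x."""
    # np.sort returns a sorted copy, so the caller's array is not reordered
    # (unlike x.sort(), which sorts in place).
    s = np.sort(x)
    # Indices where a new run of equal values starts.
    starts = np.flatnonzero(np.concatenate(([True], s[1:] != s[:-1])))
    # Length of each run: distance between consecutive run starts.
    counts = np.diff(np.append(starts, s.size))
    best = counts.max()
    return s[starts[counts == best]], int(best)

values, count = most_frequent(np.array([3.5, -1.0, 3.5, 2.0, -1.0, 3.5]))
print(values, count)

tied_values, tied_count = most_frequent(np.array([1, 1, 2, 2, 3]))
print(tied_values, tied_count)
```

Because x.sort() in max_count_unique_num_2 sorts in place, repeated %timeit runs on the same array mostly measure already-sorted input; working on a copy avoids that effect at the cost of one extra allocation.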
What about shortening your code:
@nb.njit("int64(int64[:])", fastmath=True)
def shortened(x):
    num = np.zeros(x.max() + 1, dtype=x.dtype)
    for k in x:
        num[k] += 1
    return num.max()
or parallelizing it:
@nb.njit("int64(int64[:])", parallel=True, fastmath=True)
def shortened_paralleled(x):
    num = np.zeros(x.max() + 1, dtype=x.dtype)
    for k in nb.prange(x.size):
        num[x[k]] += 1
    return num.max()
The parallel version wins for larger data sizes. Note that it can return different results on some runs (the unsynchronized num[x[k]] += 1 updates look like a race between threads), so it would need to be fixed if possible.
For handling the floats (or negative values) using Numba:
@nb.njit("int8(float64[:])", fastmath=True)
def shortened_float(x):
    # note: int8 counters overflow once a value occurs more than 127 times
    num = np.zeros(x.size, dtype=np.int8)
    for k in x:
        for j in range(x.shape[0]):
            if k == x[j]:
                num[j] += 1
    return num.max()
IMO, np.unique(x, return_counts=True)[1].max() is the best choice: it handles both integers and floats and is very fast. Numba can be faster for integers (depending on the data size, since performance degrades as the data grows; AFAIK this is because it loops over elements rather than operating on whole arrays), but for floats the code above would need to be optimized, if that is possible at all. I don't think Numba can beat NumPy's unique here, particularly on large data.
Note: np.bincount handles only non-negative integers.
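For integer data that includes negative values, one workaround is to shift the values so the minimum becomes zero before calling np.bincount. This is a small sketch under the assumption that the value range (max minus min) is small enough for the count array to fit in memory; the function name max_count_int is made up for illustration.

```python
import numpy as np

def max_count_int(x):
    """Hypothetical helper: maximum occurrence count for any integer array."""
    x = np.asarray(x)
    # Shift so the smallest value maps to 0; np.bincount rejects negative input.
    shifted = x - x.min()
    return int(np.bincount(shifted).max())

result = max_count_int([-3, -3, 5, 0, -3])
print(result)
```

This does not help with floats, and for widely spread integers the count array can become much larger than the input.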
You can do that without using numpy too.
arr = [1,1,2,2,3,3,4,5,6,1,3,5,7,1]
counts = list(map(list(arr).count, set(arr)))
list(set(arr))[counts.index(max(counts))]
If you want to use numpy then try this,
arr = np.array([1,1,2,2,3,3,4,5,6,1,3,5,7,1])
uniques, counts = np.unique(arr, return_counts = True)
uniques[np.where(counts == counts.max())]
Both do the exact same job. To check which method is more efficient, time each one like this:
import time
time_i = time.time()
<arr declaration> # Creating a new array each iteration can cause the total time to increase which would be biased against the numpy method.
for i in range(10**5):
    <method you want>
time_f = time.time()
print(time_f - time_i)
When I ran this I got 0.39 seconds for the first method and 2.69 for the second one. So the first method is more efficient, at least for a 14-element input like this one; the pure-Python version scans the list once per distinct value, so the ranking may not hold for large arrays.
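The timing template above can also be written with the standard library's timeit module, which handles the loop and the clock. This is a sketch only: the function names are made up for illustration, the two bodies are the two methods from this answer, and no timings are claimed since they depend on the machine and the input size.

```python
import timeit
import numpy as np

def mode_pure_python(values):
    # First method: count each distinct value with list.count.
    distinct = list(set(values))
    counts = [values.count(v) for v in distinct]
    return distinct[counts.index(max(counts))]

def mode_numpy(arr):
    # Second method: np.unique with return_counts.
    uniques, counts = np.unique(arr, return_counts=True)
    return uniques[counts.argmax()]

small = [1, 1, 2, 2, 3, 3, 4, 5, 6, 1, 3, 5, 7, 1]
small_arr = np.array(small)  # built once, outside the timed code

t_py = timeit.timeit(lambda: mode_pure_python(small), number=1000)
t_np = timeit.timeit(lambda: mode_numpy(small_arr), number=1000)
print(f"pure Python: {t_py:.4f} s, numpy: {t_np:.4f} s")
```

Repeating the measurement with a much larger input (for example, 30000 random integers) is a quick way to check whether the ranking changes with size.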
What I want to say is that your implementation is almost the same as numpy.bincount. If you want to make it universal, you can consider encoding the original data:
def encode(ar):
    # Equivalent to numpy.unique(ar, return_inverse=True)[1] when ar.ndim == 1
    flatten = ar.ravel()
    perm = flatten.argsort()
    sort = flatten[perm]
    mask = np.concatenate(([False], sort[1:] != sort[:-1]))
    encoded = np.empty(sort.shape, np.int64)
    encoded[perm] = mask.cumsum()
    encoded.shape = ar.shape
    return encoded

def count_max(ar):
    return max_count_unique_num(encode(ar))
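To see what the encoding does, here is a small usage sketch on float data with a negative value. The encode function is repeated so the block runs on its own, and np.bincount stands in for the numba counter (an assumption made here so the example needs no numba install).

```python
import numpy as np

def encode(ar):
    # Map each distinct value to its rank among the sorted distinct values.
    flatten = ar.ravel()
    perm = flatten.argsort()
    sort = flatten[perm]
    mask = np.concatenate(([False], sort[1:] != sort[:-1]))
    encoded = np.empty(sort.shape, np.int64)
    encoded[perm] = mask.cumsum()
    encoded.shape = ar.shape
    return encoded

x = np.array([2.5, -1.0, 2.5, 7.0, -1.0, 2.5])
codes = encode(x)
print(codes)

# The codes are small non-negative integers, so a counting pass works on them.
max_count = int(np.bincount(codes).max())
print(max_count)
```

The argsort makes the encoding itself O(n log n), so the overall cost is in the same class as the sort-based approaches discussed above.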
I have a large 3D NumPy array:
x = np.random.rand(1_000_000_000).reshape(500, 1000, 2000)
And for each of the 500 2D arrays, I need to keep only the largest 800 elements within each column of each 2D array. To avoid costly sorting, I decided to use np.argpartition:
k = 800
idx = np.argpartition(x, -k, axis=1)[:, -k:]
result = x[np.arange(x.shape[0])[:, None, None], idx, np.arange(x.shape[2])]
While np.argpartition is reasonably fast, using idx to index back into x is really slow. Is there a faster (and memory efficient) way to perform this indexing?
Note that the results do not need to be in ascending sorted order. They just need to be the top 800.
Cutting the size by 10 to fit my memory, here are times for the various steps:
Creation:
In [65]: timeit x = np.random.rand(1_000_000_00).reshape(500, 1000, 200)
1.89 s ± 82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [66]: x = np.random.rand(1_000_000_00).reshape(500, 1000, 200)
In [67]: k=800
sort:
In [68]: idx = np.argpartition(x, -k, axis=1)[:, -k:]
In [69]: timeit idx = np.argpartition(x, -k, axis=1)[:, -k:]
2.52 s ± 292 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
the indexing:
In [70]: result = x[np.arange(x.shape[0])[:, None, None], idx, np.arange(x.shape[2])]
In [71]: timeit result = x[np.arange(x.shape[0])[:, None, None], idx, np.arange(x.shape[2])]
The slowest run took 4.11 times longer than the fastest. This could mean that an intermediate result is being cached.
2.6 s ± 1.87 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
All three steps take about the same time. I don't see anything unusual about the last indexing. The array x here is about 0.8 GB.
A simple copy, without indexing, takes nearly 1 sec.
In [75]: timeit x.copy()
980 ms ± 231 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
and full copy with advanced indexing:
In [77]: timeit x[np.arange(x.shape[0])[:, None, None], np.arange(x.shape[1])[:,
...: None], np.arange(x.shape[2])]
1.47 s ± 37.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
trying the idx again:
In [78]: timeit result = x[np.arange(x.shape[0])[:, None, None], idx, np.arange(x.shape[2])]
1.71 s ± 42.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Keep in mind that when the operations start using nearly all the memory, and/or start requiring swapping and special memory requests to the OS, timings can really go to pot.
edit
You don't need the two step process. Just use partition:
out = np.partition(x, -k, axis=1)[:, -k:]
This is the same as result, and takes the same time as the idx step.
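The claim that np.partition gives the same values as the two-step argpartition-plus-indexing route can be checked on a small array. This sketch uses a tiny shape and np.take_along_axis for the indexing step (a shorter spelling of the fancy indexing in the question); since the order within the top k is unspecified, both results are sorted before comparing.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((5, 10, 4))   # small stand-in for the (500, 1000, 2000) array
k = 3

# Two-step route: indices of the top k along axis 1, then gather the values.
idx = np.argpartition(x, -k, axis=1)[:, -k:]
via_index = np.take_along_axis(x, idx, axis=1)

# One-step route: partition the values directly.
via_partition = np.partition(x, -k, axis=1)[:, -k:]

# Order inside the top k is not guaranteed, so compare after sorting.
same = np.array_equal(np.sort(via_index, axis=1), np.sort(via_partition, axis=1))
print(via_partition.shape, same)
```

Note that np.partition returns a partitioned copy of the whole array before slicing, so its peak memory use is roughly the size of x.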
import numpy as np
a = np.random.random((500, 500, 500))
b = np.random.random((500, 500))
%timeit a[250, :, :] = b
%timeit a[:, 250, :] = b
%timeit a[:, :, 250] = b
107 µs ± 2.76 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
52 µs ± 88.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
1.59 ms ± 4.45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Observations:
Performance across the three runs above is different: modifying the 1st and 3rd dimension (slicing the 2nd) is the fastest among the three, while modifying the 1st and 2nd dimension (slicing the 3rd) is the slowest.
There seems to be no monotonic relationship between speed and the dimension being sliced.
Questions are:
1. What mechanism in numpy explains these observations?
2. With the answer to the 1st question in mind, how can I speed up my code by arranging dimensions properly, given that some dimensions are modified in bulk and the rest are just sliced?
As several comments have indicated, it's all about locality of reference. Think about what numpy has to do at the low-level, and how far away from each other in memory the consecutive lvalues are in the 3rd case.
Note also how the results of the timings change when the arrays are not C-contiguous, but F-contiguous instead:
a = np.asfortranarray(a)
b = np.asfortranarray(b)
%timeit a[250, :, :] = b
%timeit a[:, 250, :] = b
%timeit a[:, :, 250] = b
892 µs ± 22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
169 µs ± 66.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
158 µs ± 24.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
(very small side-note: for the same reason, it is sometimes advantageous to sort a DataFrame before doing a groupby and a bunch of repetitive operations on the groups, somewhat counter-intuitively since the sort itself takes O(nlogn)).
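The locality argument can be made concrete by looking at the strides, that is, how many bytes separate consecutive elements along each axis. The sketch below uses a smaller float64 array than in the question, purely to keep memory use low; the pattern is the same.

```python
import numpy as np

a = np.zeros((50, 50, 50))        # float64, C-contiguous by default
print(a.strides)                  # bytes to step along each axis

# Fixing the first index leaves one contiguous block of memory.
print(a[25, :, :].flags["C_CONTIGUOUS"])

# Fixing the middle index leaves contiguous rows that are far apart.
print(a[:, 25, :].strides)

# Fixing the last index leaves elements that are never adjacent in memory.
print(a[:, :, 25].strides)

f = np.asfortranarray(a)          # same data in Fortran (column-major) order
print(f.strides)
```

In the C-ordered case the last axis has the smallest stride, so writes that sweep along it touch neighbouring bytes, while writes to a[:, :, 25] jump by a full row each time. In the Fortran-ordered array the roles of the first and last axes are swapped.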
I need to do an in-place resizing of a NumPy array, so I'd prefer numpy.resize() to numpy.reshape(). I find that numpy.resize() returns an array with the wrong dimensions if I specify -1 in one of the dimensions of the required shape. Does anyone know why that is? What is an alternative way to resize an array in place?
The in-place resize you get with ndarray.resize does not allow for negative dimensions. You can easily check yourself:
a=np.array([[0,1],[2,3]])
a.resize((4,-1))
> ValueError: negative dimensions not allowed
In most of the cases, np.reshape will be returning a view of the array, and hence there will be no unnecessary copying and additional memory allocation involved (though it doesn't modify the array in-place):
a_view = a.reshape(4,-1)
np.shares_memory(a, a_view)
# True
But even though reshape does not allow for in-place operations, what you can do is assign the new shape to the shape attribute of the array, which does allow for negative dimensions:
a.shape = (4,-1)
This is an in-place operation, and just as efficient as a.resize((4,1)) would be. Note that this method will raise an error when the reshape cannot be done without copying the data.
Here are some timings for efficiency comparison with a larger array, including the timings for reassigning from a view:
def inplace_reshape(a):
    a.shape = (10000,-1)

def inplace_resize(a):
    a.resize((10000,3))

def reshaped_view(a):
    a = np.reshape(a, (10000,-1))

def resized_copy(a):
    a = np.resize(a, (10000,3))
a = np.random.random((30000,1))
%timeit inplace_reshape(a)
# 383 ns ± 14.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit inplace_resize(a)
# 294 ns ± 20.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit reshaped_view(a)
# 1.5 µs ± 25.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit resized_copy(a)
# 21.5 µs ± 289 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Both in-place versions produce the same result:
b = np.copy(a)
a.shape = (10000,-1)
b.resize((10000,3))
np.array_equal(a,b)
# True
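The caveat that assigning to the shape attribute fails when the data would have to be copied can be seen with a transposed (non-contiguous) array. This is a small sketch; the exception type caught here is what NumPy raises in the versions I am aware of, so treat that detail as an assumption.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
a.shape = (3, -1)                 # contiguous data: works in place
print(a.shape)

t = np.arange(6).reshape(2, 3).T  # transposed view, not contiguous
flat = t.reshape(6)               # reshape silently falls back to a copy
print(np.shares_memory(t, flat))

raised = False
try:
    t.shape = (6,)                # cannot be done without copying
except AttributeError as exc:
    raised = True
    print("in-place reshape refused:", exc)
```

So reshape is the forgiving option (view when possible, copy otherwise), while assigning to shape either succeeds without copying or fails.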
I'd like to know how to append a column to a numpy array. Assuming I read in a .tsv as follows:
from sklearn import metrics,preprocessing,cross_validation
from sklearn.feature_extraction.text import TfidfVectorizer
import sklearn.linear_model as lm
import pandas as p
import numpy as np
print("loading data..")
traindata = np.array(p.read_table('train.tsv')) #here is where I am unsure what to do
The first column of traindata holds the URL of each webpage.
The logic I would like after this is :
for each row in traindata
#run function to look up traffic webpage is getting, store this in a numpy array
Add a new column to traindata numpy array, append on the data in the array created into our "for each"
How can this be accomplished generally, even if you just use a "filler" method for retrieving web traffic? :)
Thanks!
Inputs and outputs :
Input : Numpy array of 26 columns.
We call a function on the value in the first column of each row, this function will return a number.
We append all these numbers into a numpy array with one column.
We append the Numpy array with 26 cols to the one made above to end up with a numpy array with 27 columns.
Output : Numpy array of 27 columns.
You can use numpy.hstack to append columns, like this:
import numpy as np
def some_function(x):
    return 3*x

input = np.ones([10,26])
input = np.hstack([input,np.empty([input.shape[0],1])])
for row in input:
    row[-1] = some_function(row[0])
output = input
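If the per-row function can operate on a whole column at once, the loop can be dropped and the new column attached in one step. The sketch below assumes that some_function is vectorizable, as the 3*x placeholder is; for a function that only accepts scalars (such as a web-traffic lookup), building the column with a list comprehension is one fallback.

```python
import numpy as np

def some_function(x):
    # Placeholder from the answer above; works on scalars and arrays alike.
    return 3 * x

data = np.ones((10, 26))

# Vectorized: compute the new column from the first column in one call.
new_col = some_function(data[:, 0])
output = np.column_stack([data, new_col])
print(output.shape)

# Fallback for scalar-only functions: build the column row by row.
new_col_loop = np.array([some_function(v) for v in data[:, 0]])
output_loop = np.column_stack([data, new_col_loop])
```

np.column_stack accepts the 1-D column directly, so no reshape to (10, 1) is needed.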
One thing I don't like about numpy.hstack or numpy.c_ is that they aren't flexible enough to work on both 1-dimensional and 2-dimensional arrays with the same call.
For example, if I'm trying to calculate a value based on, say, the magnitude of a vector, and append it to that vector (like lifting a point to the paraboloid in a Delaunay triangulation problem), I'd like that function to work for a single 1D array or an array of 1D arrays. The function that I ended up with is:
def append_last_dim(array_in, array_augment):
    newshape = list(array_in.T.shape)
    newshape[0] += 1
    ret_array = np.empty(newshape)
    ret_array[:-1] = array_in.T
    ret_array[-1] = array_augment
    return ret_array.T
Example:
point_list = np.random.rand(5,4)
list_augment = (point_list**2).sum(axis=-1) # shape (5,)
%timeit aug_array = append_last_dim(point_list,list_augment)
# 1.68 µs ± 19.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
point = point_list[0] # shape (4,)
augment = list_augment[0] # shape ()
%timeit append_last_dim(point, augment)
# 1.24 µs ± 9.78 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
def lift_point(point): # this works for 1 point or array of points
    return append_last_dim(point,(point**2).sum(axis=-1))
lift_point(point_list).shape # (5,5)
lift_point(point).shape # (5,)
numpy.c_ works with the array of points as-is, but is 10x slower and doesn't work for a single point:
%timeit retval = np.c_[point_list,list_augment]
# 13.8 µs ± 47.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
np.c_[point,augment]
# ValueError: all the input array dimensions for the concatenation axis must match exactly,
# but along dimension 0, the array at index 0 has size 4 and the array at
# index 1 has size 1
np.hstack and np.append don't work on the arguments as-is, as point_list and list_augment have different numbers of dimensions. If you reshape list_augment, the result is still ~2x slower and can't handle a single input or an array of inputs with a unified call:
%timeit np.hstack((point_list,list_augment.reshape(5,1)))
# 3.49 µs ± 21.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.append(point_list,list_augment.reshape((5,1)),axis=1)
# 2.45 µs ± 7.91 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Here are times for a list of 1000 points:
point_1k_list = np.random.rand(1000,4)
point_augment = (point_1k_list**2).sum(axis=-1)
%timeit append_last_dim(point_1k_list,point_augment)
# 3.91 µs ± 35 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.append(point_1k_list,point_augment.reshape((1000,1)),axis=1)
# 6.5 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.hstack((point_1k_list,point_augment.reshape((1000,1))))
# 7.82 µs ± 31.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.c_[point_1k_list,point_augment]
# 19.3 µs ± 234 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
I'm not sure why I can't find better built-in support in numpy for handling single-point or vectorized data, like the 'lift_point' function above.
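One built-in route that does handle both a single point and an array of points with the same call is np.concatenate along the last axis, after giving the augmenting values a trailing axis with [..., np.newaxis]. This is a sketch of that alternative, not a claim about its speed relative to append_last_dim; the name append_last_axis is made up for illustration.

```python
import numpy as np

def append_last_axis(array_in, array_augment):
    """Hypothetical helper: append values along the last axis for 1-D or N-D input."""
    array_in = np.asarray(array_in)
    # A trailing axis turns shape () into (1,) and shape (n,) into (n, 1).
    extra = np.asarray(array_augment)[..., np.newaxis]
    return np.concatenate([array_in, extra], axis=-1)

def lift_point(point):
    # Works for one point of shape (d,) or a batch of shape (n, d).
    point = np.asarray(point)
    return append_last_axis(point, (point ** 2).sum(axis=-1))

points = np.arange(20, dtype=float).reshape(5, 4)
lifted_batch = lift_point(points)
lifted_single = lift_point(points[0])
print(lifted_batch.shape, lifted_single.shape)
```

The ellipsis indexing is what makes the call shape-agnostic: it adds the new axis at the end no matter how many leading axes the input has.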