If I have the following list
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Then
np.array_split([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
Returns
[array([0, 1, 2, 3]), array([4, 5, 6]), array([7, 8, 9])]
Is there a way to get the sub-arrays in the following order?
[array([0, 3, 6, 9]), array([1, 4, 7]), array([2, 5, 8])]
As the lists are of differing lengths, a numpy.ndarray isn't possible without a bit of fiddling, as all sub-arrays must be the same length.
However, if a simple list meets your requirement, you can use:
l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
l2 = []
for i in range(3):
    l2.append(l[i::3])
Output:
[[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
Or more concisely, giving the same output:
[l[i::3] for i in range(3)]
Let's look at a refactored version of the np.array_split source code:
import numpy as np
def array_split(arr, Nsections):
    Neach_section, extras = divmod(len(arr), Nsections)
    # The first `extras` sections get one extra element each.
    section_sizes = ([0] + extras * [Neach_section + 1] + (Nsections - extras) * [Neach_section])
    div_points = np.array(section_sizes).cumsum()
    sub_arrs = []
    for i in range(Nsections):
        st = div_points[i]
        end = div_points[i + 1]
        sub_arrs.append(arr[st:end])
    return sub_arrs
For your example, arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] and Nsections = 3, it constructs the section sizes [0, 4, 3, 3] and the dividing points [0, 4, 7, 10], and then does something like this:
[arr[div_points[i]:div_points[i + 1]] for i in range(3)]
To mimic the behaviour of NumPy while producing the order you asked for,
def array_split_withswap(arr, N):
    sub_arrs = []
    for i in range(N):
        sub_arrs.append(arr[i::N])
    return sub_arrs
is the best option (as in @S3DEV's solution).
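As a quick check of the claim above, the strided version yields pieces of the same sizes as np.array_split, only with the elements dealt out round-robin. This is a minimal sketch; the variable names `contiguous` and `strided` are just illustrative.

```python
import numpy as np

def array_split_withswap(arr, n):
    # Piece i takes every n-th element of arr, starting at index i.
    return [arr[i::n] for i in range(n)]

arr = np.arange(10)
contiguous = np.array_split(arr, 3)      # consecutive chunks
strided = array_split_withswap(arr, 3)   # round-robin chunks

print([len(p) for p in contiguous])
print([len(p) for p in strided])
print(strided)
```

Because each piece is a slice of a NumPy array, the pieces are views and no data is copied.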
Given any N-tuple of slices (an N-D slice) in NumPy, how can I convert it to the corresponding indexes of an N-D array, represented as a tuple of 1-D arrays (the indexes along each axis)? E.g. if there were a function np.nd_slice_to_indexes, the following code:
import numpy as np
print(np.nd_slice_to_indexes(np.s_[1 : 3]))
print(np.nd_slice_to_indexes(np.s_[1 : 3, 5 : 11 : 2]))
should print
(array([1, 2]),)
(array([1, 1, 1, 2, 2, 2]), array([5, 7, 9, 5, 7, 9]))
It is common for NumPy to represent indexes into an N-D array as an N-tuple of 1-D arrays of the same length (the j-th element of the k-th array is the index of the j-th point along the k-th dimension). E.g. np.nonzero returns such an N-tuple:
print(np.nonzero([[0, 1, 1], [1, 1, 0]])) # Non-zero elements in 2D array.
# (array([0, 0, 1, 1], dtype=int64), array([1, 2, 0, 1], dtype=int64))
The result should match that of the pure-Python function below, but be computed in a more efficient (performant) way:
import numpy as np
def nd_slice_to_indexes(nd_slice):
    assert type(nd_slice) in [tuple, slice], type(nd_slice)
    if type(nd_slice) is not tuple:
        nd_slice = (nd_slice,)
    def iter_slices(slices):
        if len(slices) == 0:
            yield ()
        else:
            for i in range(slices[0].start, slices[0].stop, slices[0].step or 1):
                for r in iter_slices(slices[1:]):
                    yield (i,) + r
    *res, = np.vstack(list(iter_slices(nd_slice))).T
    return tuple(res)
print(nd_slice_to_indexes(np.s_[1 : 3]))
print(nd_slice_to_indexes(np.s_[1 : 3, 5 : 11 : 2]))
print(nd_slice_to_indexes(np.s_[1 : 3, 5 : 11 : 2, 8 : 14 : 3]))
# (array([1, 2]),)
# (array([1, 1, 1, 2, 2, 2]), array([5, 7, 9, 5, 7, 9]))
# (array([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]), array([5, 5, 7, 7, 9, 9, 5, 5, 7, 7, 9, 9]), array([ 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11]))
Thanks to @hpaulj's suggestion, I solved the task efficiently using np.mgrid.
import numpy as np
def nd_slice_to_indexes(nd_slice):
    grid = np.mgrid[{tuple: nd_slice, slice: (nd_slice,)}[type(nd_slice)]]
    return tuple(grid[i].ravel() for i in range(grid.shape[0]))
print(nd_slice_to_indexes(np.s_[1 : 3]))
print(nd_slice_to_indexes(np.s_[1 : 3, 5 : 11 : 2]))
print(nd_slice_to_indexes(np.s_[1 : 3, 5 : 11 : 2, 8 : 14 : 3]))
# (array([1, 2]),)
# (array([1, 1, 1, 2, 2, 2]), array([5, 7, 9, 5, 7, 9]))
# (array([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]), array([5, 5, 7, 7, 9, 9, 5, 5, 7, 7, 9, 9]), array([ 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11]))
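The resulting tuple can be used directly for integer-array indexing, which should select the same elements as the original slice, flattened in row-major order. A small sketch, assuming an illustrative array `a` and slice `s` that are not part of the original question:

```python
import numpy as np

def nd_slice_to_indexes(nd_slice):
    # Normalise a single slice to a 1-tuple so np.mgrid always receives a tuple.
    if isinstance(nd_slice, slice):
        nd_slice = (nd_slice,)
    grid = np.mgrid[nd_slice]
    return tuple(g.ravel() for g in grid)

a = np.arange(3 * 12).reshape(3, 12)   # illustrative 2-D array
s = np.s_[1:3, 5:11:2]

idx = nd_slice_to_indexes(s)
picked = a[idx]                        # fancy indexing with the index tuple
print(np.array_equal(picked, a[s].ravel()))
```

Note that fancy indexing returns a copy, whereas `a[s]` is a view, so the index tuple is mainly useful when explicit coordinates are needed.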
I want to create a 2-D NumPy array by repeating another array as several rows. I did it as shown below. Is there a NumPyier way of doing this?
>>> a = np.arange(0,10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = tuple( a for _ in range(3) )
>>> b
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
>>> c = np.vstack( b )
>>> c
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
I found a way to do it. Sharing it here.
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[None,:]
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>>> np.repeat( a[None,:], 3, axis=0 )
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
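Two other NumPy calls give the same rows and may be worth knowing. This is only a sketch of the alternatives; the names `tiled` and `view` are illustrative.

```python
import numpy as np

a = np.arange(0, 10)

# np.tile builds an independent copy with the vector repeated as 3 rows.
tiled = np.tile(a, (3, 1))

# np.broadcast_to returns a read-only view; no data is copied.
view = np.broadcast_to(a, (3, a.size))

print(tiled.shape, view.shape)
print(np.array_equal(tiled, view))
```

The view is the cheaper option when the rows only need to be read; the copy is needed if individual rows will be modified afterwards.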
Say I have a Numpy vector,
A = zeros(100)
and I divide it into subvectors by a list of breakpoints which index into A, for instance,
breaks = linspace(0, 100, 11, dtype=int)
So the i-th subvector lies between the indices breaks[i] (inclusive) and breaks[i+1] (exclusive).
The breaks are not necessarily equispaced; this is only an example.
However, they will always be strictly increasing.
Now I want to operate on these subvectors. For instance, if I want to set all elements of the i-th subvector to i, I might do:
for i in range(len(breaks) - 1):
    A[breaks[i] : breaks[i+1]] = i
Or I might want to compute the subvector means:
b = empty(len(breaks) - 1)
for i in range(len(breaks) - 1):
    b[i] = A[breaks[i] : breaks[i+1]].mean()
And so on.
How can I avoid using for loops and instead vectorize these operations?
You can use simple np.cumsum -
import numpy as np
# Form an integer zeros array of the same length as the input array
# (np.bincount needs integer labels) and place ones at positions
# where intervals change
A1 = np.zeros(len(A), dtype=int)
A1[breaks[1:-1]] = 1
# Perform cumsum along it to create a staircase like array, as the final output
out = A1.cumsum()
Sample run -
In [115]: A
Out[115]: array([3, 8, 0, 4, 6, 4, 8, 0, 2, 7, 4, 9, 3, 7, 3, 8, 6, 7, 1, 6])
In [116]: breaks
Out[116]: array([ 0, 4, 9, 11, 18, 20])
In [142]: out
Out[142]: array([0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4])
If you want to have mean values of those subvectors from A, you can use np.bincount -
mean_vals = np.bincount(out, weights=A)/np.bincount(out)
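Putting the two steps together on the sample data gives the following sketch, which also compares the result against the loop from the question. The name `loop_means` is only there for the comparison, and the code assumes breaks[0] == 0 and breaks[-1] == len(A).

```python
import numpy as np

A = np.array([3, 8, 0, 4, 6, 4, 8, 0, 2, 7, 4, 9, 3, 7, 3, 8, 6, 7, 1, 6])
breaks = np.array([0, 4, 9, 11, 18, 20])

# Integer label per element: 1 at each interior break, then a running sum.
labels = np.zeros(len(A), dtype=np.intp)
labels[breaks[1:-1]] = 1
labels = labels.cumsum()

# Per-label sum divided by per-label count gives the subvector means.
mean_vals = np.bincount(labels, weights=A) / np.bincount(labels)

# Reference result using the explicit loop.
loop_means = np.array([A[s:e].mean() for s, e in zip(breaks[:-1], breaks[1:])])
print(np.allclose(mean_vals, loop_means))
```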
If you are looking to extend this functionality and use a custom function instead, you might want to look into MATLAB's accumarray equivalent for Python/Numpy: numpy_groupies whose source code is available here.
There really isn't a single answer to your question, but several techniques that you can use as building blocks. Another one you may find helpful:
NumPy's binary ufuncs (np.add, np.maximum, np.minimum, ...) support a .reduceat method, which you can use to your advantage for some of your calculations:
>>> a = np.arange(100)
>>> breaks = np.linspace(0, 100, 11, dtype=np.intp)
>>> counts = np.diff(breaks)
>>> counts
array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
>>> sums = np.add.reduceat(a, breaks[:-1], dtype=np.float64)
>>> sums
array([ 45., 145., 245., 345., 445., 545., 645., 745., 845., 945.])
>>> sums / counts # i.e. the mean
array([ 4.5, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5])
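The same pattern works with other binary ufuncs and with unevenly spaced breaks. A minimal sketch, assuming the breaks are strictly increasing and that the last break equals len(A), since reduceat runs the final segment to the end of the array:

```python
import numpy as np

A = np.array([3, 8, 0, 4, 6, 4, 8, 0, 2, 7, 4, 9, 3, 7, 3, 8, 6, 7, 1, 6])
breaks = np.array([0, 4, 9, 11, 18, 20])

starts = breaks[:-1]                       # start index of each segment
seg_max = np.maximum.reduceat(A, starts)   # per-segment maximum
seg_min = np.minimum.reduceat(A, starts)   # per-segment minimum

# Reference result with an explicit loop, for comparison only.
loop_max = np.array([A[s:e].max() for s, e in zip(breaks[:-1], breaks[1:])])
print(np.array_equal(seg_max, loop_max))
```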
You could use np.repeat:
In [35]: np.repeat(np.arange(0, len(breaks)-1), np.diff(breaks))
Out[35]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
       4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6,
       6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9,
       9, 9, 9, 9, 9, 9, 9, 9])
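That label array can replace the first loop from the question, where every element of the i-th subvector is set to i. A short sketch; the name `labels` and the comparison array `expected` are illustrative.

```python
import numpy as np

A = np.zeros(100)
breaks = np.linspace(0, 100, 11, dtype=int)

# One label per element: i repeated (breaks[i+1] - breaks[i]) times.
labels = np.repeat(np.arange(len(breaks) - 1), np.diff(breaks))

# Assign only over the range the breaks actually cover.
A[breaks[0]:breaks[-1]] = labels

# Loop version from the question, used here only as a check.
expected = np.zeros(100)
for i in range(len(breaks) - 1):
    expected[breaks[i]:breaks[i + 1]] = i
print(np.array_equal(A, expected))
```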
To compute arbitrary binned statistics you could use scipy.stats.binned_statistic:
import numpy as np
import scipy.stats as stats
breaks = np.linspace(0, 100, 11, dtype=int)
A = np.random.random(100)
means, bin_edges, binnumber = stats.binned_statistic(
    x=np.arange(len(A)), values=A, statistic='mean', bins=breaks)
stats.binned_statistic can compute means, medians, counts and sums. To compute an arbitrary statistic for each bin, you can pass a callable as the statistic parameter:
def func(values):
    return values.mean()
funcmeans, bin_edges, binnumber = stats.binned_statistic(
    x=np.arange(len(A)), values=A, statistic=func, bins=breaks)
assert np.allclose(means, funcmeans)
Let's say I have a list of numbers and I want to turn each of them into a vector of the form (x, 0, 0). How would I do this?
hello = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
So when I access, say, hello[2] I get (3, 0, 0) instead of just 3.
Try this, using numpy - "the fundamental package for scientific computing with Python":
import numpy as np
hello = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
hello = [np.array([n, 0, 0]) for n in hello]
The above will produce the results you expect:
>>> hello[2]
array([3, 0, 0])
>>> hello[2] * 3
array([9, 0, 0])
If you are working with vectors, it's best to use NumPy, as it supports many vector operations that plain Python lists and tuples don't:
>>> import numpy as np
>>> hello = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> hello = (hello*np.array([(1,0,0)]*10).transpose()).transpose()
>>> hello[2]
array([3, 0, 0])
>>> hello[2]*3
array([9, 0, 0])
This should work, using the list from the question so that new_hello[2] is (3, 0, 0):
hello = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
new_hello = [(n, 0, 0) for n in hello]
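If the vectors are wanted as one 2-D NumPy array rather than a list of tuples, one option is to fill the first column of a zeros array. This is a sketch under that assumption; the name `vectors` is illustrative.

```python
import numpy as np

hello = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# One row per number, three columns; only the first column is non-zero.
vectors = np.zeros((len(hello), 3), dtype=int)
vectors[:, 0] = hello

print(vectors[2])
print(vectors[2] * 3)
```

Each row then behaves like the (x, 0, 0) vector from the question and supports element-wise arithmetic.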