Split sorted array into list with sublists - python

I have a sorted array of float32 values. I want to split this array into a list of lists, where each inner list contains only equal values, like this:
>>> split_sorted(array) # [1., 1., 1., 2., 2., 3.]
>>> [[1., 1., 1.], [2., 2.], [3.]]
My current approach is this function:
def split_sorted(array):
    split = [[array[0]]]
    s_index = 0
    a_index = 1
    while a_index < len(array):
        while a_index < len(array) and array[a_index] == split[s_index][0]:
            split[s_index].append(array[a_index])
            a_index += 1
        else:
            if a_index < len(array):
                s_index += 1
                a_index += 1
                split.append([array[a_index]])
My question now is: is there a more Pythonic way to do this, maybe even with numpy? And is this the most performant approach?
Thanks a lot!

Approach #1
With a as the array, we can use np.split -
np.split(a,np.flatnonzero(a[:-1] != a[1:])+1)
Sample run -
In [16]: a
Out[16]: array([1., 1., 1., 2., 2., 3.])
In [17]: np.split(a,np.flatnonzero(a[:-1] != a[1:])+1)
Out[17]: [array([1., 1., 1.]), array([2., 2.]), array([3.])]
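In case the indexing step isn't obvious: a[:-1] != a[1:] flags the positions where consecutive elements differ, and np.flatnonzero(...)+1 turns those flags into split points just after each group. With the same sample array -
In [18]: a[:-1] != a[1:]
Out[18]: array([False, False,  True, False,  True])
In [19]: np.flatnonzero(a[:-1] != a[1:])+1
Out[19]: array([3, 5])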
Approach #2
Another, more performant way would be to get the splitting indices and then slice the array with zipped pairs of them -
idx = np.flatnonzero(np.r_[True, a[:-1] != a[1:], True])
out = [a[i:j] for i,j in zip(idx[:-1],idx[1:])]
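For the sample array from the question, the intermediates look like this (the slices are views into a, which is where the speedup comes from) -
In [20]: idx
Out[20]: array([0, 3, 5, 6])
In [21]: out
Out[21]: [array([1., 1., 1.]), array([2., 2.]), array([3.])]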
Approach #3
If you need a list of sublists as output, we can re-create the groups with list duplication -
mask = np.r_[True, a[:-1] != a[1:], True]
c = np.diff(np.flatnonzero(mask))
out = [[i]*j for i,j in zip(a[mask[:-1]],c)]
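Sample run for this one: mask picks out the first element of each group, c holds the group lengths, and the list duplication rebuilds the groups as plain Python lists -
In [22]: a[mask[:-1]], c
Out[22]: (array([1., 2., 3.]), array([3, 2, 1]))
In [23]: out
Out[23]: [[1.0, 1.0, 1.0], [2.0, 2.0], [3.0]]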
Benchmarking
Timings for vectorized approaches on 1000000 elements with 10000 unique elements -
In [145]: np.random.seed(0)
...: a = np.sort(np.random.randint(1,10000,(1000000)))
In [146]: x = a
# Approach #1 from this post
In [147]: %timeit np.split(a,np.flatnonzero(a[:-1] != a[1:])+1)
100 loops, best of 3: 10.5 ms per loop
# Approach #2 from this post
In [148]: %%timeit
...: idx = np.flatnonzero(np.r_[True, a[:-1] != a[1:], True])
...: out = [a[i:j] for i,j in zip(idx[:-1],idx[1:])]
100 loops, best of 3: 5.18 ms per loop
# Approach #3 from this post
In [197]: %%timeit
...: mask = np.r_[True, a[:-1] != a[1:], True]
...: c = np.diff(np.flatnonzero(mask))
...: out = [[i]*j for i,j in zip(a[mask[:-1]],c)]
100 loops, best of 3: 11.1 ms per loop
# @RafaelC's soln
In [149]: %%timeit
...: v,c = np.unique(x, return_counts=True)
...: out = [[a]*b for (a,b) in zip(v,c)]
10 loops, best of 3: 25.6 ms per loop

You can use numpy.unique and zip
v,c = np.unique(x, return_counts=True)
[[a]*b for (a,b) in zip(v,c)]
Outputs
[[1.0, 1.0, 1.0], [2.0, 2.0], [3.0]]
Timings for a 6,000,000 sized array
%timeit v,c = np.unique(x, return_counts=True); [[a]*b for (a,b) in zip(v,c)]
18.2 ms ± 236 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.split(x,np.flatnonzero(x[:-1] != x[1:])+1)
424 ms ± 11.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit [list(group) for value, group in itertools.groupby(x)]
180 ms ± 4.42 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

The function itertools.groupby has this exact behavior.
>>> from itertools import groupby
>>> [list(group) for value, group in groupby(array)]
[[1.0, 1.0, 1.0], [2.0, 2.0], [3.0]]

>>> from itertools import groupby
>>> a = [1., 1., 1., 2., 2., 3.]
>>> for k, g in groupby(a):
...     print(k, list(g))
...
1.0 [1.0, 1.0, 1.0]
2.0 [2.0, 2.0]
3.0 [3.0]
You may join the lists, if you like:
>>> result = []
>>> for k, g in groupby(a):
...     result.append(list(g))
...
>>> result
[[1.0, 1.0, 1.0], [2.0, 2.0], [3.0]]

I improved your code a bit. It's not Pythonic, but it doesn't use external libraries (also, your original code didn't work on the last element in the array):
def split_sorted(array):
    splitted = [[]]
    standard = array[0]
    li = 0  # inner lists index
    n = len(array)
    for i in range(n):
        if standard != array[i]:
            standard = array[i]
            splitted.append([])  # appending empty list
            li += 1
        splitted[li].append(array[i])
    return splitted
# test
array = [1, 2, 2, 2, 3]
a = split_sorted(array)
print(a)

Related

numpy.vectorise() signature potentially causes scipy.spatial.distance.jaccard() dimension issue

I'm a numpy baby and am looking at using numpy.vectorise() to compute a distance matrix. I think that a key part of this is the signature param, but when I run the code below I get an error:
import numpy as np
from scipy.spatial.distance import jaccard

# find jaccard dissimilarities for a constant 1 row * m columns array vs each
# array in an n rows * m columns nested array, outputting a
# 1 row * n columns array of dissimilarities
vectorised_compute_jac = np.vectorize(jaccard, signature='(m),(n,m)->(n)')

array_list = [[1, 2, 3],  # arrA
              [2, 3, 4],  # arrB
              [4, 5, 6]]  # arrC

distance_matrix = np.array([])
for target_array in array_list:
    print(target_array)
    print(array_list)
    # row should be an array of jac distances between target_array and each array in array_list
    row = vectorised_compute_jac(target_array, array_list)
    print(row, '\n\n')
    # np.vectorise() functions return an array of objects of type specified by otype param, based on docs
    np.append(distance_matrix, row)
Output + Error:
[1, 2, 3]
[[1, 2, 3], [2, 3, 4], [4, 5, 6]]
Traceback (most recent call last):
  File "C:\Users\u03132tk\.spyder-py3\ModuleMapper\untitled1.py", line 21, in <module>
    row = vectorised_compute_jac(array, array_list)
  File "C:\ANACONDA3\lib\site-packages\numpy\lib\function_base.py", line 2163, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "C:\ANACONDA3\lib\site-packages\numpy\lib\function_base.py", line 2237, in _vectorize_call
    res = self._vectorize_call_with_signature(func, args)
  File "C:\ANACONDA3\lib\site-packages\numpy\lib\function_base.py", line 2277, in _vectorize_call_with_signature
    results = func(*(arg[index] for arg in args))
  File "C:\ANACONDA3\lib\site-packages\scipy\spatial\distance.py", line 893, in jaccard
    v = _validate_vector(v)
  File "C:\ANACONDA3\lib\site-packages\scipy\spatial\distance.py", line 340, in _validate_vector
    raise ValueError("Input vector should be 1-D.")
ValueError: Input vector should be 1-D.
What I would like, with square brackets indicating numpy arrays not lists, based on array output types discussed in comments above:
#   arrA    arrB    arrC
[[JD(AA), JD(AB), JD(AC)],  # arrA
 [JD(BA), JD(BB), JD(BC)],  # arrB
 [JD(CA), JD(CB), JD(CC)]]  # arrC
Can someone advise how the signature param works and whether that's causing my woes? I suspect it's due to the (n, m) in my signature, as it's the only multi-dimensional thing, hence the question :(
Cheers!
Tim
I was going to run your code as-is, but then saw that you were misusing np.append. So I'll skip your iteration and try to recreate the calculation with straightforward list comprehensions.
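A side note on that misuse, since it bites a lot of people: np.append never modifies its first argument in place; it returns a new array, so np.append(distance_matrix, row) inside the loop discards every result. A minimal sketch of the usual fix, where compute_row is a placeholder for whatever produces one row of distances:
rows = []
for target_array in array_list:
    rows.append(compute_row(target_array))  # collect rows in a plain list
distance_matrix = np.array(rows)            # stack once at the end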
It looks like jaccard takes two 1d arrays and returns a scalar, and you apparently want to calculate it for all pairs from your list of arrays.
In [5]: arr = np.array(array_list)
In [6]: [jaccard(arr[0],b) for b in arr]
Out[6]: [0.0, 1.0, 1.0]
In [7]: [[jaccard(a,b) for b in arr] for a in arr]
Out[7]: [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
In [9]: np.array(_)
Out[9]:
array([[0., 1., 1.],
       [1., 0., 1.],
       [1., 1., 0.]])
With symmetry and 0s it should be possible to cut down on the jaccard calls with a more selective iteration. But I'll leave that for others.
With your signature, you are telling vectorize to pass a 1d and a 2d array to jaccard, and to expect back a 1d array. That's not right.
This is, I think the correct use of vectorize:
In [12]: vectorised_compute_jac = np.vectorize(jaccard, signature='(m),(m)->()')
In [13]: vectorised_compute_jac(arr[None,:,:],arr[:,None,:])
Out[13]:
array([[0., 1., 1.],
       [1., 0., 1.],
       [1., 1., 0.]])
Compare its time against the nested comprehension:
In [14]: timeit vectorised_compute_jac(arr[None,:,:],arr[:,None,:])
384 µs ± 5.34 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [15]: timeit np.array([[jaccard(a,b) for b in arr] for a in arr])
203 µs ± 204 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [15], it's the jaccard calls that dominate the time, not the iteration mechanism. So taking advantage of the symmetry will be worth it.
In [17]: timeit jaccard(arr[0],arr[1])
21.2 µs ± 79.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
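A rough sketch of that symmetry idea (my addition, not benchmarked): since jaccard(u, v) == jaccard(v, u) and the diagonal is zero, it's enough to evaluate the upper triangle and mirror it -
from itertools import combinations
import numpy as np
from scipy.spatial.distance import jaccard

arr = np.array([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
n = len(arr)
out = np.zeros((n, n))                    # diagonal stays 0: jaccard(x, x) == 0
for i, j in combinations(range(n), 2):    # upper-triangle pairs only
    out[i, j] = out[j, i] = jaccard(arr[i], arr[j])
Note that scipy ships this pattern ready-made: scipy.spatial.distance.pdist(arr, 'jaccard') computes just the condensed upper triangle, and squareform expands it into the full matrix.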

Is there a better way to perform calculations using an array of indices on a numpy array? [duplicate]

I have an array of values arr with shape (N,) and an array of coordinates coords with shape (2,N). I want to represent this in an (M,M) array grid such that grid takes the value 0 at coordinates that are not in coords, and for the coordinates that are included it should store the sum of all values in arr that have that coordinate. So if M=3, arr = np.arange(4)+1, and coords = np.array([[0,0,1,2],[0,0,2,2]]), then grid should be:
array([[3., 0., 0.],
       [0., 0., 3.],
       [0., 0., 4.]])
The reason this is nontrivial is that I need to be able to repeat this step many times and the values in arr change each time, and so can the coordinates. Ideally I am looking for a vectorized solution. I suspect that I might be able to use np.where somehow but it's not immediately obvious how.
Timing the solutions
I have timed the solutions present at this time, and it appears that the accumulator method is slightly faster than the sparse-matrix method, with the second accumulation method being the slowest for the reasons explained in the comments:
%timeit for x in range(100): accumulate_arr(np.random.randint(100,size=(2,10000)),np.random.normal(0,1,10000))
%timeit for x in range(100): accumulate_arr_v2(np.random.randint(100,size=(2,10000)),np.random.normal(0,1,10000))
%timeit for x in range(100): sparse.coo_matrix((np.random.normal(0,1,10000),np.random.randint(100,size=(2,10000))),(100,100)).A
47.3 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
103 ms ± 255 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
48.2 ms ± 36 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
One way would be to create a sparse.coo_matrix and convert that to dense:
from scipy import sparse
sparse.coo_matrix((arr,coords),(M,M)).A
# array([[3, 0, 0],
#        [0, 0, 3],
#        [0, 0, 4]])
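This works because coo_matrix accumulates duplicate coordinates: the two entries at (0,0), with values 1 and 2, are summed to 3 when the matrix is converted to dense, which is exactly the accumulation the question asks for.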
With np.bincount -
def accumulate_arr(coords, arr):
    # Get output array shape
    m,n = coords.max(1)+1
    # Get linear indices to be used as IDs with bincount
    lidx = np.ravel_multi_index(coords, (m,n))
    # Or lidx = coords[0]*(coords[1].max()+1) + coords[1]
    # Accumulate arr with IDs from lidx
    return np.bincount(lidx,arr,minlength=m*n).reshape(m,n)
Sample run -
In [58]: arr
Out[58]: array([1, 2, 3, 4])
In [59]: coords
Out[59]:
array([[0, 0, 1, 2],
       [0, 0, 2, 2]])
In [60]: accumulate_arr(coords, arr)
Out[60]:
array([[3., 0., 0.],
       [0., 0., 3.],
       [0., 0., 4.]])
Another with np.add.at on similar lines and might be easier to follow -
def accumulate_arr_v2(coords, arr):
    m,n = coords.max(1)+1
    out = np.zeros((m,n), dtype=arr.dtype)
    np.add.at(out, tuple(coords), arr)
    return out
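A quick check on the same sample data (note the output dtype follows arr, so it is integer here, unlike the bincount version, which always returns floats) -
In [61]: accumulate_arr_v2(coords, arr)
Out[61]:
array([[3, 0, 0],
       [0, 0, 3],
       [0, 0, 4]])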

Constructing a Multidimensional Differentiation Matrix

I have been trying to construct the matrix D_ij, defined as

D_ij = -(2/N) * Σ_{k=0}^{N-1} k · sin(kπ(2i+1)/(2N)) · cos(kπ(2j+1)/(2N)) / sin(π(2i+1)/(2N))

I want to evaluate it for points located at x_i = -cos[π(2i+1)/(2N)] on the interval [-1,1], to consequently take derivatives of a function. I am, though, having problems constructing the differentiation matrix D_ij.
I have written a python script as:
import numpy as np

N = 100
x = np.linspace(-1,1,N-1)
for i in range(0, N - 1):
    x[i] = -np.cos(np.pi*(2*i + 1)/2*N)

def Dmatrix(x,N):
    m_ij = np.zeros(3)
    for k in range(len(x)):
        for j in range(len(x)):
            for i in range(len(x)):
                m_ij[i,j,k] = -2/N*((k*np.sin(k*np.pi*(2*i + 1)/2*N(np.cos(k*np.pi*(2*j +1))/2*N)/(np.sin(np.pi*(2*i + 1)/2*N)))
    return m_ij

xx = Dmatrix(x,N)
This thus returns the error:
IndexError: too many indices for array
Is there a way one could more efficiently construct this and successfully compute it over all k?
The goal will be to multiply this matrix by a function and sum over j to get the first order derivative of given function.
m_ij = np.zeros(3) doesn't make a three-dimensional array; it makes an array with one dimension of length 3.
In [1]: import numpy as np
In [2]: m_ij = np.zeros(3)
In [3]: print(m_ij)
[0. 0. 0.]
I suspect you want (as a simple fix)
len_x = len(x)
m_ij = np.zeros((len_x, len_x, len_x))
Look at your x calc by itself
In [418]: N = 10
     ...: x = np.linspace(-1,1,N-1)
     ...: y = np.zeros(N)
     ...: for i in range(N):
     ...:     y[i] = -np.cos(np.pi*(2*i + 1)/2*N)
     ...:
In [419]: x
Out[419]: array([-1. , -0.75, -0.5 , -0.25, 0. , 0.25, 0.5 , 0.75, 1. ])
In [420]: y
Out[420]: array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
In [421]: (2*np.arange(N)+1)
Out[421]: array([ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19])
In [422]: (2*np.arange(N)+1)/2*N
Out[422]: array([ 5., 15., 25., 35., 45., 55., 65., 75., 85., 95.])
I separated x and y, because otherwise it doesn't make any sense to create x and then overwrite it.
The y values don't look interesting because they are all just cos of odd whole multiples of pi.
Note how I use np.arange instead of looping on range.
The matrix D_ij defined above can be implemented as
def D(N):
    from numpy import zeros, pi, sin, cos
    D = zeros((N, N))
    for i in range(N):
        for j in range(N):
            for k in range(N):
                D[i,j] -= k*sin(k*pi*(i+i+1)/2/N)*cos(k*pi*(j+j+1)/2/N)
            D[i,j] /= sin(pi*(i+i+1)/2/N)
    return D*2/N
It could be convenient to vectorize the inner loop.
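For instance, a sketch of the same function with only the k sum vectorized (equivalent to the triple loop above, with the innermost Python loop replaced by a sum over an arange):
import numpy as np

def D_inner_vec(N):
    k = np.arange(N)
    D = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            # sum over all k at once instead of the innermost Python loop
            D[i, j] = -(k * np.sin(k*np.pi*(2*i+1)/2/N)
                          * np.cos(k*np.pi*(2*j+1)/2/N)).sum()
            D[i, j] /= np.sin(np.pi*(2*i+1)/2/N)
    return D * 2 / N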
On second thought, the whole procedure can be vectorized using np.einsum (at the end I also have some timings; the einsum version is, of course, enormously faster than the triple loop):
In [1]: from numpy import set_printoptions ; set_printoptions(linewidth=120)
In [2]: def D(N):
   ...:     from numpy import zeros, pi, sin, cos
   ...:     D = zeros((N, N))
   ...:     for i in range(N):
   ...:         for j in range(N):
   ...:             for k in range(N):
   ...:                 D[i,j] -= k * sin(k*pi*(2*i+1)/2/N) * cos(k*pi*(2*j+1)/2/N)
   ...:             D[i,j] /= sin(pi*(2*i+1)/2/N)
   ...:     return D*2/N
In [3]: def E(N):
   ...:     from numpy import arange, cos, einsum, outer, pi, sin
   ...:     i = j = k = arange(N)
   ...:     s_i = sin((2*i+1)*pi/2/N)
   ...:     s_ki = sin(outer(k,(2*i+1)*pi/2/N))
   ...:     c_kj = cos(outer(k,(2*j+1)*pi/2/N))
   ...:     return -2/N*einsum('k, ki, kj -> ij', k, s_ki, c_kj) / s_i[:,None]
In [4]: for N in (3,4,5):
   ...:     print(D(N)) ; print(E(N)) ; print('==========')
   ...:
[[-1.73205081e+00 2.30940108e+00 -5.77350269e-01]
[-5.77350269e-01 1.22464680e-16 5.77350269e-01]
[ 5.77350269e-01 -2.30940108e+00 1.73205081e+00]]
[[-1.73205081e+00 2.30940108e+00 -5.77350269e-01]
[-5.77350269e-01 1.22464680e-16 5.77350269e-01]
[ 5.77350269e-01 -2.30940108e+00 1.73205081e+00]]
==========
[[-3.15432203 4.46088499 -1.84775907 0.5411961 ]
[-0.76536686 -0.22417076 1.30656296 -0.31702534]
[ 0.31702534 -1.30656296 0.22417076 0.76536686]
[-0.5411961 1.84775907 -4.46088499 3.15432203]]
[[-3.15432203 4.46088499 -1.84775907 0.5411961 ]
[-0.76536686 -0.22417076 1.30656296 -0.31702534]
[ 0.31702534 -1.30656296 0.22417076 0.76536686]
[-0.5411961 1.84775907 -4.46088499 3.15432203]]
==========
[[-4.97979657e+00 7.20682930e+00 -3.40260323e+00 1.70130162e+00 -5.25731112e-01]
[-1.05146222e+00 -4.49027977e-01 2.10292445e+00 -8.50650808e-01 2.48216561e-01]
[ 3.24919696e-01 -1.37638192e+00 2.44929360e-16 1.37638192e+00 -3.24919696e-01]
[-2.48216561e-01 8.50650808e-01 -2.10292445e+00 4.49027977e-01 1.05146222e+00]
[ 5.25731112e-01 -1.70130162e+00 3.40260323e+00 -7.20682930e+00 4.97979657e+00]]
[[-4.97979657e+00 7.20682930e+00 -3.40260323e+00 1.70130162e+00 -5.25731112e-01]
[-1.05146222e+00 -4.49027977e-01 2.10292445e+00 -8.50650808e-01 2.48216561e-01]
[ 3.24919696e-01 -1.37638192e+00 2.44929360e-16 1.37638192e+00 -3.24919696e-01]
[-2.48216561e-01 8.50650808e-01 -2.10292445e+00 4.49027977e-01 1.05146222e+00]
[ 5.25731112e-01 -1.70130162e+00 3.40260323e+00 -7.20682930e+00 4.97979657e+00]]
==========
In [5]: %timeit D(20)
36 ms ± 277 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [6]: %timeit E(20)
146 µs ± 777 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [7]: %timeit D(100)
4.35 s ± 30.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [8]: %timeit E(100)
7.7 ms ± 2.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

python location of elements in one numpy array with location of equal elements in another array

I need not just the values, but the locations of elements in one numpy array that also appear in a second numpy array, and I need the locations in that second array too.
Here's an example of the best I've been able to do:
>>> a=np.arange(0.,15.)
>>> a
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13., 14.])
>>> b=np.arange(4.,8.,.5)
>>> b
array([ 4. , 4.5, 5. , 5.5, 6. , 6.5, 7. , 7.5])
>>> [ (i,j) for (i,alem) in enumerate(a) for (j,blem) in enumerate(b) if alem==blem]
[(4, 0), (5, 2), (6, 4), (7, 6)]
Anybody have anything faster, numpy specific, or more "pythonic"?
Here is an O((n+k)log(n+k)) (the naive algorithm is O(nk)) solution with np.unique
uniq, inv = np.unique(np.r_[a, b], return_inverse=True)
map = -np.ones((len(uniq),), dtype=int)
map[inv[:len(a)]] = np.arange(len(a))
bina = map[inv[len(a):]]
inds_in_b = np.where(bina != -1)[0]
elements, inds_in_a = b[inds_in_b], bina[inds_in_b]
or you could simply sort a for O((n+k)log(k))
inds = np.argsort(a)
aso = a[inds]
bina = np.searchsorted(aso[:-1], b)
inds_in_b = np.where(b == aso[bina])[0]
elements, inds_in_a = b[inds_in_b], inds[bina[inds_in_b]]
For a sorted array a, here's another approach with np.searchsorted, making use of its optional side argument set to 'left' and 'right' -
lidx = np.searchsorted(a,b,'left')
ridx = np.searchsorted(a,b,'right')
mask = lidx != ridx
out = lidx[mask], np.flatnonzero(mask)
# for zipped o/p : zip(lidx[mask], np.flatnonzero(mask))
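On the question's sample arrays you can see the idea directly: wherever the 'left' and 'right' insertion points differ, that element of b occurs in a, and lidx is then its index in a -
In [10]: lidx
Out[10]: array([4, 5, 5, 6, 6, 7, 7, 8])
In [11]: ridx
Out[11]: array([5, 5, 6, 6, 7, 7, 8, 8])
In [12]: out
Out[12]: (array([4, 5, 6, 7]), array([0, 2, 4, 6]))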
Runtime test
Approaches -
def searchsorted_where(a,b):  # @Paul Panzer's soln
    inds = np.argsort(a)
    aso = a[inds]
    bina = np.searchsorted(aso[:-1], b)
    inds_in_b = np.where(b == aso[bina])[0]
    return b[inds_in_b], inds_in_b

def in1d_masking(a,b):  # @Psidom's soln
    logic = np.in1d(b, a)
    return b[logic], np.where(logic)[0]

def searchsorted_twice(a,b):  # proposed in this post
    lidx = np.searchsorted(a,b,'left')
    ridx = np.searchsorted(a,b,'right')
    mask = lidx != ridx
    return lidx[mask], np.flatnonzero(mask)
Timings -
Case #1 (using the sample data from the question, scaled up):
In [2]: a=np.arange(0.,15000.)
...: b=np.arange(4.,15000.,0.5)
...:
In [3]: %timeit searchsorted_where(a,b)
...: %timeit in1d_masking(a,b)
...: %timeit searchsorted_twice(a,b)
...:
1000 loops, best of 3: 721 µs per loop
1000 loops, best of 3: 1.76 ms per loop
1000 loops, best of 3: 1.28 ms per loop
Case #2 (same as case #1, with fewer elements in b than in a):
In [4]: a=np.arange(0.,15000.)
...: b=np.arange(4.,15000.,5)
...:
In [5]: %timeit searchsorted_where(a,b)
...: %timeit in1d_masking(a,b)
...: %timeit searchsorted_twice(a,b)
...:
10000 loops, best of 3: 77.4 µs per loop
1000 loops, best of 3: 428 µs per loop
10000 loops, best of 3: 128 µs per loop
Case #3 (and far fewer elements in b):
In [6]: a=np.arange(0.,15000.)
...: b=np.arange(4.,15000.,10)
...:
In [7]: %timeit searchsorted_where(a,b)
...: %timeit in1d_masking(a,b)
...: %timeit searchsorted_twice(a,b)
...:
10000 loops, best of 3: 42.8 µs per loop
1000 loops, best of 3: 392 µs per loop
10000 loops, best of 3: 71.9 µs per loop
You can use numpy.in1d to find the elements of b that are also in a; logical indexing and numpy.where then give the elements and the corresponding indices:
logic = np.in1d(b, a)
list(zip(b[logic], np.where(logic)[0]))
# [(4.0, 0), (5.0, 2), (6.0, 4), (7.0, 6)]
b[logic], np.where(logic)[0]
# (array([ 4., 5., 6., 7.]), array([0, 2, 4, 6]))

Translating arrays from MATLAB to numpy

I am defining an array of twos, with ones on either end. In MATLAB this can be achieved by
x = [1 2*ones(1,3) 1]
In Python, however, numpy gives something quite different:
import numpy
numpy.array([[1],2*numpy.ones(3),[1]])
What is the most efficient way to perform this MATLAB command in Python?
In [33]: import numpy as np
In [34]: np.r_[1, 2*np.ones(3), 1]
Out[34]: array([ 1., 2., 2., 2., 1.])
Alternatively, you could use hstack:
In [42]: np.hstack(([1], 2*np.ones(3), [1]))
Out[42]: array([ 1., 2., 2., 2., 1.])
In [45]: %timeit np.r_[1, 2*np.ones(300), 1]
10000 loops, best of 3: 27.5 us per loop
In [46]: %timeit np.hstack(([1], 2*np.ones(300), [1]))
10000 loops, best of 3: 26.4 us per loop
In [48]: %timeit np.append([1],np.append(2*np.ones(300)[:],[1]))
10000 loops, best of 3: 28.2 us per loop
Thanks to DSM for pointing out that pre-allocating the right-sized array from the very beginning can be much, much faster than building it up with append, r_ or hstack from smaller arrays:
In [49]: %timeit a = 2*np.ones(300+2); a[0] = 1; a[-1] = 1
100000 loops, best of 3: 6.79 us per loop
In [50]: %timeit a = np.empty(300+2); a.fill(2); a[0] = 1; a[-1] = 1
1000000 loops, best of 3: 1.73 us per loop
Use numpy.ones instead of just ones:
numpy.array([[1],2*numpy.ones(3),[1]])
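Note that wrapping pieces of different lengths in numpy.array, as above, does not concatenate them; it builds a 3-element object array (the "something quite different" from the question). A minimal fix is to concatenate explicitly:
import numpy as np

x = np.concatenate(([1], 2*np.ones(3), [1]))
# array([1., 2., 2., 2., 1.])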
