Create a numpy matrix with elements as a function of indices - python

How can I create a numpy matrix with its elements being a function of its indices?
For example, a multiplication table: a[i,j] = i*j
An un-numpy and un-pythonic way would be to create an array of zeros and then loop through it.
There is no doubt that there is a better way to do this, without a loop.
However, even better would be to create the matrix straight-away.

A generic solution would be to use np.fromfunction()
From the doc:
numpy.fromfunction(function, shape, **kwargs)
Construct an array by executing a function over each coordinate. The
resulting array therefore has a value fn(x, y, z) at coordinate (x, y,
z).
The below snippet should provide the required matrix.
import numpy as np
np.fromfunction(lambda i, j: i*j, (5,5))
Output:
array([[  0.,   0.,   0.,   0.,   0.],
       [  0.,   1.,   2.,   3.,   4.],
       [  0.,   2.,   4.,   6.,   8.],
       [  0.,   3.,   6.,   9.,  12.],
       [  0.,   4.,   8.,  12.,  16.]])
The first parameter is a callable which is executed for each of the coordinates. If foo is the function that you pass as the first argument, foo(i,j) will be the value at (i,j). This holds for higher dimensions too. The shape of the output array is controlled by the shape parameter.
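For example, the same idea extends directly to three dimensions. A minimal sketch (the (3, 3, 3) shape is chosen only for illustration):
import numpy as np
# a[i, j, k] = i * j * k; fromfunction passes full index arrays, so this broadcasts
a = np.fromfunction(lambda i, j, k: i * j * k, (3, 3, 3))
print(a[2, 2, 2])  # 8.0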
Edit:
Based on the comment on using custom functions like lambda x,y: 2*x if x > y else y/2, the following code works:
import numpy as np

def generic_f(shape, elementwise_f):
    fv = np.vectorize(elementwise_f)
    return np.fromfunction(fv, shape)

def elementwise_f(x, y):
    return 2*x if x > y else y/2

print(generic_f((5, 5), elementwise_f))
Output:
[[0.  0.5 1.  1.5 2. ]
 [2.  0.5 1.  1.5 2. ]
 [4.  4.  1.  1.5 2. ]
 [6.  6.  6.  1.5 2. ]
 [8.  8.  8.  8.  2. ]]
The user is expected to pass a scalar function that defines the elementwise operation. np.vectorize is used to vectorize the user-defined scalar function, and the vectorized function is then passed to np.fromfunction().

Here's one way to do that:
>>> indices = numpy.indices((5, 5))
>>> a = indices[0] * indices[1]
>>> a
array([[ 0,  0,  0,  0,  0],
       [ 0,  1,  2,  3,  4],
       [ 0,  2,  4,  6,  8],
       [ 0,  3,  6,  9, 12],
       [ 0,  4,  8, 12, 16]])
To further explain, numpy.indices((5, 5)) generates two arrays containing the x and y indices of a 5x5 array like so:
>>> numpy.indices((5, 5))
array([[[0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1],
        [2, 2, 2, 2, 2],
        [3, 3, 3, 3, 3],
        [4, 4, 4, 4, 4]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]]])
When you multiply these two arrays, numpy multiplies the value of the two arrays at each position and returns the result.

For the multiplication
np.multiply.outer(np.arange(5), np.arange(5)) # a_ij = i * j
and in general
np.frompyfunc(
    lambda i, j: f(i, j), 2, 1
).outer(
    np.arange(5),
    np.arange(5),
).astype(np.float64)  # a_ij = f(i, j)
Basically, you create a np.ufunc via np.frompyfunc and then apply its outer method to the index ranges.
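As a quick sanity check, here is a minimal sketch with an arbitrary f (the choice of f is only for illustration):
import numpy as np

def f(i, j):
    return i + 2*j  # arbitrary example function

a = np.frompyfunc(f, 2, 1).outer(np.arange(5), np.arange(5)).astype(np.float64)
print(a[3, 4])  # 11.0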
Edit
Speed comparison between the different solutions.
Small matrices:
In [1]: %timeit np.multiply.outer(np.arange(5), np.arange(5))
100000 loops, best of 3: 4.97 µs per loop
In [2]: %timeit np.array([[i*j for j in xrange(5)] for i in xrange(5)])
100000 loops, best of 3: 5.51 µs per loop
In [3]: %timeit indices = np.indices((5, 5)); indices[0] * indices[1]
100000 loops, best of 3: 16.1 µs per loop
Bigger matrices:
In [4]: %timeit np.multiply.outer(np.arange(4096), np.arange(4096))
10 loops, best of 3: 62.4 ms per loop
In [5]: %timeit indices = np.indices((4096, 4096)); indices[0] * indices[1]
10 loops, best of 3: 165 ms per loop
In [6]: %timeit np.array([[i*j for j in xrange(4096)] for i in xrange(4096)])
1 loops, best of 3: 1.39 s per loop

I'm away from my python at the moment, but does this one work?
array( [ [ i*j for j in xrange(5)] for i in xrange(5)] )

Just wanted to add that @Senderle's response can be generalized to any function and dimension:
dims = (3,3,3) #i,j,k
ii = np.indices(dims)
You could then calculate a[i,j,k] = i*j*k as
a = np.prod(ii,axis=0)
or a[i,j,k] = (i-1)*j*k:
a = (ii[0,...]-1)*ii[1,...]*ii[2,...]
etc.
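Putting it together, a small helper like the following works for any function of the index arrays and any number of dimensions (a sketch; the name build_from_indices is mine):
import numpy as np

def build_from_indices(f, dims):
    # Evaluate f on the index arrays produced by np.indices(dims)
    ii = np.indices(dims)
    return f(*ii)

# a[i, j, k] = (i - 1) * j * k
a = build_from_indices(lambda i, j, k: (i - 1)*j*k, (3, 3, 3))
print(a[2, 2, 2])  # 4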

Related

Numpy: Finding count of distinct values from associations through binning

Prerequisite
This question is an extension of this post. So, some of the introduction of the problem will be similar to that post.
Problem
Let's say result is a 2D array and values is a 1D array. values holds some values associated with each element in result. The mapping of an element in values to result is stored in x_mapping and y_mapping. A position in result can be associated with different values. An (x,y) pair from x_mapping and y_mapping is associated with results[-y,x]. I have to find the count of distinct values grouped by these associations.
An example for better clarification.
result array:
[[ 0.,  0.],
 [ 0.,  0.],
 [ 0.,  0.],
 [ 0.,  0.]]
values array:
[ 1., 2., 1., 1., 5., 6., 7., 1.]
Note: Here the result array and values happen to have the same number of elements, but that might not be the case. There is no relation between the sizes at all.
x_mapping and y_mapping have mappings from 1D values to 2D result. The sizes of x_mapping, y_mapping and values will be the same.
x_mapping - [0, 1, 0, 0, 0, 0, 0, 0]
y_mapping - [0, 3, 2, 2, 0, 3, 2, 0]
Here, the 1st value (values[0]), 5th value (values[4]) and 8th value (values[7]) have x as 0 and y as 0 (x_mapping[0] and y_mapping[0]) and hence are associated with result[0, 0]. If we compute the count of distinct values in this group, (1, 5, 1), we will have 2 as the result.
@WarrenWeckesser
Let's see how the (x, y) pair [1, 3] from x_mapping and y_mapping contributes to results. Since there is only one value, i.e. 2, associated with this particular group, results[-3,1] will have one, as the number of distinct values associated with that cell is one.
Another example. Let's compute the value of results[-1,1]. From the mappings, since there is no value associated with that cell, the value of results[-1,1] will be zero.
Similarly, the position [-2, 0] in results will have value 2.
Note that if there is no association at all then the default value for result will be zero.
The result after computation,
[[ 2.,  0.],
 [ 1.,  1.],
 [ 2.,  0.],
 [ 0.,  0.]]
Current working solution
Using the answer from @Divakar, I was able to find a working solution.
x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 0])
values = np.array([ 1., 2., 1., 1., 5., 6., 7., 1.], dtype=np.float32)
result = np.zeros([4, 2], dtype=np.float32)
m,n = result.shape
out_dtype = result.dtype
lidx = ((-y_mapping)%m)*n + x_mapping
sidx = lidx.argsort()
idx = lidx[sidx]
val = values[sidx]
m_idx = np.flatnonzero(np.r_[True,idx[:-1] != idx[1:]])
unq_ids = idx[m_idx]
r_res = np.zeros(m_idx.size, dtype=np.float32)
for i in range(0, m_idx.shape[0]):
    _next = None
    arr = None
    if i == m_idx.shape[0]-1:
        _next = val.shape[0]
    else:
        _next = m_idx[i+1]
    _start = m_idx[i]
    if _start >= _next:
        arr = val[_start]
    else:
        arr = val[_start:_next]
    r_res[i] = np.unique(arr).size
result.flat[unq_ids] = r_res
Question
Now, the above solution takes 15 ms when operating on 19943 values.
I'm looking for a way to compute the result faster. Is there any more performant way to do this?
Side note
I'm using Numpy version 1.14.3 with Python 3.5.2
Edits
Thanks to @WarrenWeckesser for pointing out that I hadn't explained how an element in results is associated with (x,y) from the mappings. I have updated the post and added examples for clarity.
Here is one solution
import numpy as np
x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 0])
values = np.array([ 1., 2., 1., 1., 5., 6., 7., 1.], dtype=np.float32)
result = np.zeros([4, 2], dtype=np.float32)
# Get flat indices
idx_mapping = np.ravel_multi_index((-y_mapping, x_mapping), result.shape, mode='wrap')
# Sort flat indices and reorders values accordingly
reorder = np.argsort(idx_mapping)
idx_mapping = idx_mapping[reorder]
values = values[reorder]
# Get unique values
val_uniq = np.unique(values)
# Find where each unique value appears
val_uniq_hit = values[:, np.newaxis] == val_uniq
# Find reduction indices (slices with the same flat index)
reduce_idx = np.concatenate([[0], np.nonzero(np.diff(idx_mapping))[0] + 1])
# Reduce slices
reduced = np.logical_or.reduceat(val_uniq_hit, reduce_idx)
# Count distinct values on each slice
counts = np.count_nonzero(reduced, axis=1)
# Put counts in result
result.flat[idx_mapping[reduce_idx]] = counts
print(result)
# [[2. 0.]
#  [1. 1.]
#  [2. 0.]
#  [0. 0.]]
This method takes more memory (O(len(values) * len(np.unique(values)))), but a small benchmark comparing with your original solution shows a significant speedup (although that depends on the actual size of the problem):
import numpy as np
np.random.seed(100)
result = np.zeros([400, 200], dtype=np.float32)
values = np.random.randint(100, size=(20000,)).astype(np.float32)
x_mapping = np.random.randint(result.shape[1], size=values.shape)
y_mapping = np.random.randint(result.shape[0], size=values.shape)
res1 = solution_orig(x_mapping, y_mapping, values, result)
res2 = solution(x_mapping, y_mapping, values, result)
print(np.allclose(res1, res2))
# True
# Original solution
%timeit solution_orig(x_mapping, y_mapping, values, result)
# 76.2 ms ± 623 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# This solution
%timeit solution(x_mapping, y_mapping, values, result)
# 13.8 ms ± 51.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Full code of benchmark functions:
import numpy as np
def solution(x_mapping, y_mapping, values, result):
    result = np.array(result)
    idx_mapping = np.ravel_multi_index((-y_mapping, x_mapping), result.shape, mode='wrap')
    reorder = np.argsort(idx_mapping)
    idx_mapping = idx_mapping[reorder]
    values = values[reorder]
    val_uniq = np.unique(values)
    val_uniq_hit = values[:, np.newaxis] == val_uniq
    reduce_idx = np.concatenate([[0], np.nonzero(np.diff(idx_mapping))[0] + 1])
    reduced = np.logical_or.reduceat(val_uniq_hit, reduce_idx)
    counts = np.count_nonzero(reduced, axis=1)
    result.flat[idx_mapping[reduce_idx]] = counts
    return result
def solution_orig(x_mapping, y_mapping, values, result):
    result = np.array(result)
    m, n = result.shape
    out_dtype = result.dtype
    lidx = ((-y_mapping) % m)*n + x_mapping
    sidx = lidx.argsort()
    idx = lidx[sidx]
    val = values[sidx]
    m_idx = np.flatnonzero(np.r_[True, idx[:-1] != idx[1:]])
    unq_ids = idx[m_idx]
    r_res = np.zeros(m_idx.size, dtype=np.float32)
    for i in range(0, m_idx.shape[0]):
        _next = None
        arr = None
        if i == m_idx.shape[0]-1:
            _next = val.shape[0]
        else:
            _next = m_idx[i+1]
        _start = m_idx[i]
        if _start >= _next:
            arr = val[_start]
        else:
            arr = val[_start:_next]
        r_res[i] = np.unique(arr).size
    result.flat[unq_ids] = r_res
    return result

Variable Partial Array Summation in Python

I'm looking for a solution to sum per column in a 2D array ("a" in the example below), starting from a cell position defined in a different 1D array ("ref" in the example below).
I have tried the following:
import numpy as np
a = np.arange(20).reshape(5, 4)
print(a) # representing an original large 2D array
ref = np.array([0, 2, 4, 1]) # reference array for defining start of sum
s = a.sum(axis=0)
print(s) # Works: sums all elements per column
s = a[2:].sum(axis=0)
print(s) # Works as well: sum from the third element till end per column
# This is what I look for: sum per column starting at element defined by ref[]
s = np.zeros(4).astype(int)  # makes an empty 1D array
for i in np.arange(4):  # for each column
    for j in np.arange(ref[i], 5):
        s[i] += a[j, i]  # sums all elements from ref till end (i.e. 5)
print(s)  # This is the desired outcome

for i in np.arange(4):
    s = a[ref[i]:].sum(axis=0)
print(s)  # No good; same as a[ref[3]:].sum(axis=0) and here ref[3] = 1

s = np.zeros(4).astype(int)  # makes an empty 1D array
for i in np.arange(4):
    s[i] = np.sum(a[ref[i]:, i])
print(s)  # Yes; this is also the desired outcome
Is it possible to realize this without using a for loop?
Does numpy have functions for doing this in a single step?
s = a[ref:].sum(axis=0)
This would be nice, but is not working.
Thank you for your time!
A basic solution based on np.cumsum:
In [1]: a = np.arange(15).reshape(5, 3)
In [2]: res = np.array([0, 2, 3])
In [3]: b = np.cumsum(a, axis=0)
In [4]: b
Out[4]:
array([[ 0,  1,  2],
       [ 3,  5,  7],
       [ 9, 12, 15],
       [18, 22, 26],
       [30, 35, 40]])
In [5]: a
Out[5]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])
In [6]: b[res, np.arange(a.shape[1])]
Out[6]: array([ 0, 12, 26])
In [7]: b[-1, :] - b[res, np.arange(a.shape[1])]
Out[7]: array([30, 23, 14])
so it does not give us the result we want: we need to add a first line of zeros to b:
In [13]: b = np.vstack([np.zeros((1, a.shape[1])), b])
In [14]: b
Out[14]:
array([[  0.,   0.,   0.],
       [  0.,   1.,   2.],
       [  3.,   5.,   7.],
       [  9.,  12.,  15.],
       [ 18.,  22.,  26.],
       [ 30.,  35.,  40.]])
In [17]: b[-1, :] - b[res, np.arange(a.shape[1])]
Out[17]: array([ 30., 30., 25.])
which is, I believe, the desired output.
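Wrapping this up for the original 5x4 example gives a loop-free helper (a sketch; partial_col_sum is just an illustrative name):
import numpy as np

def partial_col_sum(a, ref):
    # Sum each column of a from row ref[col] to the end, without a Python loop
    b = np.vstack([np.zeros((1, a.shape[1]), dtype=a.dtype), np.cumsum(a, axis=0)])
    return b[-1, :] - b[ref, np.arange(a.shape[1])]

a = np.arange(20).reshape(5, 4)
ref = np.array([0, 2, 4, 1])
print(partial_col_sum(a, ref))  # [40 39 18 52]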

SciPy sparse matrix (COO,CSR): Clear row

For creating a scipy sparse matrix, I have an array of row and column indices I and J along with a data array V. I use those to construct a matrix in COO format and then convert it to CSR,
matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))
matrix = matrix.tocsr()
I have a set of row indices for which the only entry should be a 1.0 on the diagonal. So far, I go through I, find all indices that need wiping, and do just that:
def find(lst, a):
    # From <http://stackoverflow.com/a/16685428/353337>
    return [i for i, x in enumerate(lst) if x in a]

# wipe_rows = [1, 55, 32, ...]  # something something
indices = find(I, wipe_rows)  # takes too long
I = numpy.delete(I, indices).tolist()
J = numpy.delete(J, indices).tolist()
V = numpy.delete(V, indices).tolist()
# Add entry 1.0 to the diagonal for each wipe row
I.extend(wipe_rows)
J.extend(wipe_rows)
V.extend(numpy.ones(len(wipe_rows)))
# construct matrix via coo
That works alright, but find tends to take a while.
Any hints on how to speed this up? (Perhaps wiping the rows in COO or CSR format is a better idea.)
If you intend to clear multiple rows at once, this
def _wipe_rows_csr(matrix, rows):
    assert isinstance(matrix, sparse.csr_matrix)
    # zero out the entries of the given rows
    for i in rows:
        matrix.data[matrix.indptr[i]:matrix.indptr[i+1]] = 0.0
    # set the diagonal
    d = matrix.diagonal()
    d[rows] = 1.0
    matrix.setdiag(d)
    return
is by far the fastest method. It doesn't really remove the lines, but sets all entries to zeros, then fiddles with the diagonal.
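A minimal usage sketch, assuming the _wipe_rows_csr function above (the 4x4 matrix and the wipe rows [1, 2] are only for illustration):
import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.arange(1, 17, dtype=float).reshape(4, 4))
_wipe_rows_csr(M, [1, 2])
print(M.toarray())
# [[ 1.  2.  3.  4.]
#  [ 0.  1.  0.  0.]
#  [ 0.  0.  1.  0.]
#  [13. 14. 15. 16.]]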
If the entries are actually to be removed, one has to do some array manipulation. This can be quite costly, but if speed is no issue, the following
def _wipe_row_csr(A, i):
    '''Wipes a row of a matrix in CSR format and puts 1.0 on the diagonal.
    '''
    assert isinstance(A, sparse.csr_matrix)
    n = A.indptr[i+1] - A.indptr[i]
    assert n > 0
    A.data[A.indptr[i]+1:-n+1] = A.data[A.indptr[i+1]:]
    A.data[A.indptr[i]] = 1.0
    A.data = A.data[:-n+1]
    A.indices[A.indptr[i]+1:-n+1] = A.indices[A.indptr[i+1]:]
    A.indices[A.indptr[i]] = i
    A.indices = A.indices[:-n+1]
    A.indptr[i+1:] -= n-1
    return
replaces a given row i of the matrix by the entry 1.0 on the diagonal.
np.in1d should be a faster way of finding the indices:
In [322]: I # from a np.arange(12).reshape(4,3) matrix
Out[322]: array([0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=int32)
In [323]: indices=[i for i, x in enumerate(I) if x in [1,2]]
In [324]: indices
Out[324]: [2, 3, 4, 5, 6, 7]
In [325]: ind1=np.in1d(I,[1,2])
In [326]: ind1
Out[326]:
array([False, False,  True,  True,  True,  True,  True,  True, False,
       False, False], dtype=bool)
In [327]: np.where(ind1) # same as indices
Out[327]: (array([2, 3, 4, 5, 6, 7], dtype=int32),)
In [328]: I[~ind1] # same as the delete
Out[328]: array([0, 0, 3, 3, 3], dtype=int32)
Direct manipulation of the coo inputs like this is often a good way. But another is to take advantage of the csr math abilities. You should be able to construct a diagonal matrix that zeros out the correct rows, and then add the ones back in.
Here's what I have in mind:
In [357]: A=np.arange(16).reshape(4,4)
In [358]: M=sparse.coo_matrix(A)
In [359]: M.A
Out[359]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
In [360]: d1=sparse.diags([(1,0,0,1)],[0],(4,4))
In [361]: d2=sparse.diags([(0,1,1,0)],[0],(4,4))
In [362]: (d1*M+d2).A
Out[362]:
array([[  0.,   1.,   2.,   3.],
       [  0.,   1.,   0.,   0.],
       [  0.,   0.,   1.,   0.],
       [ 12.,  13.,  14.,  15.]])
In [376]: x=np.ones((4,),bool);x[[1,2]]=False
In [378]: d1=sparse.diags([x],[0],(4,4),dtype=int)
In [379]: d2=sparse.diags([~x],[0],(4,4),dtype=int)
Doing this with lil format looks easy (here wipe = [1, 2]):
In [593]: Ml=M.tolil()
In [594]: Ml.data[wipe]=[[1]]*len(wipe)
In [595]: Ml.rows[wipe]=[[i] for i in wipe]
In [596]: Ml.A
Out[596]:
array([[ 0,  1,  2,  3],
       [ 0,  1,  0,  0],
       [ 0,  0,  1,  0],
       [12, 13, 14, 15]], dtype=int32)
It's sort of what you are doing with csr format, but it's easy to replace each row list with the appropriate [1] and [i] list. But conversion times (tolil etc) can hurt run times.
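If the lil route is still attractive, the interactive steps above can be wrapped into a small function (a sketch; wipe_rows_lil is my name, and the tolil/tocsr round trip is exactly the conversion cost mentioned above):
from scipy import sparse

def wipe_rows_lil(matrix, rows):
    # Replace each listed row by a single 1.0 on the diagonal, via LIL format
    Ml = matrix.tolil()
    for i in rows:
        Ml.rows[i] = [i]
        Ml.data[i] = [1.0]
    return Ml.tocsr()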

numpy 2D array assignment with 2D value and indices arrays

My goal is to assign the values of an existing 2D array, or create a new array, using two 2D arrays of the same shape, one with values and one with indices to assign the corresponding value to.
X = np.array([range(5), range(5)])
X
array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])
Y = np.array([range(5), [2,3,4,1,0]])
Y
array([[0, 1, 2, 3, 4],
       [2, 3, 4, 1, 0]])
My desired output is an array of the same shape as X and Y, with the values of X placed at the indices given by the corresponding row in Y. This result can be achieved by looping through each row in the following way:
output = np.zeros(X.shape)
for i in range(X.shape[0]):
    output[i][Y[i]] = X[i]
output
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 4.,  3.,  0.,  1.,  2.]])
Is there a more efficient way to apply this sort of assignment?
np.take(output, Y)
will return the items in the output array that I would like to assign the values of X to, but I believe np.take does not produce a reference to the original array, and instead returns a new array.
for i in range(X.shape[0]):
    output[i][Y[i]] = X[i]
is equivalent to
I = np.arange(X.shape[0])[:, np.newaxis]
output[I, Y] = X
For example,
X = np.array([range(5),range(5)])
Y = np.array([range(5), [2,3,4,1,0]])
output = np.zeros(X.shape)
I = np.arange(X.shape[0])[:, np.newaxis]
output[I, Y] = X
yields
>>> output
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 4.,  3.,  0.,  1.,  2.]])
There is not much difference in performance when the loop has few iterations.
But if X.shape[0] is large, then using indexing is much faster:
def using_loop(X, Y):
    output = np.zeros(X.shape)
    for i in range(X.shape[0]):
        output[i][Y[i]] = X[i]
    return output

def using_indexing(X, Y):
    output = np.zeros(X.shape)
    I = np.arange(X.shape[0])[:, np.newaxis]
    output[I, Y] = X
    return output
X2 = np.tile(X, (100,1))
Y2 = np.tile(Y, (100,1))
In [77]: %timeit using_loop(X2, Y2)
1000 loops, best of 3: 376 µs per loop
In [78]: %timeit using_indexing(X2, Y2)
100000 loops, best of 3: 15.2 µs per loop

Good ways to "expand" a numpy ndarray?

Are there good ways to "expand" a numpy ndarray? Say I have an ndarray like this:
[[1 2]
 [3 4]]
And I want each row to contain more elements by filling zeros:
[[1 2 0 0 0]
 [3 4 0 0 0]]
I know there must be some brute-force ways to do so (say, construct a bigger array of zeros and then copy the elements from the old, smaller array), just wondering whether there are more pythonic ways to do so. I tried numpy.reshape but it didn't work:
import numpy as np
a = np.array([[1, 2], [3, 4]])
np.reshape(a, (2, 5))
Numpy complains that: ValueError: total size of new array must be unchanged
You can use numpy.pad, as follows:
>>> import numpy as np
>>> a=[[1,2],[3,4]]
>>> np.pad(a, ((0,0),(0,3)), mode='constant', constant_values=0)
array([[1, 2, 0, 0, 0],
       [3, 4, 0, 0, 0]])
Here np.pad says, "Take the array a and add 0 rows above it, 0 rows below it, 0 columns to the left of it, and 3 columns to the right of it. Fill these columns with a constant specified by constant_values".
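To make the ((before, after), ...) convention of pad_width concrete, here is a minimal sketch that pads on several sides at once (the numbers are arbitrary):
import numpy as np

a = np.array([[1, 2], [3, 4]])
# 1 row above, 0 rows below; 2 columns left, 1 column right
print(np.pad(a, ((1, 0), (2, 1)), mode='constant', constant_values=0))
# [[0 0 0 0 0]
#  [0 0 1 2 0]
#  [0 0 3 4 0]]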
There are the index tricks r_ and c_.
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> z = np.zeros((2, 3), dtype=a.dtype)
>>> np.c_[a, z]
array([[1, 2, 0, 0, 0],
       [3, 4, 0, 0, 0]])
If this is performance critical code, you might prefer to use the equivalent np.concatenate rather than the index tricks.
>>> np.concatenate((a,z), axis=1)
array([[1, 2, 0, 0, 0],
       [3, 4, 0, 0, 0]])
There are also np.resize and np.ndarray.resize, but they have some limitations (due to the way numpy lays out data in memory) so read the docstring on those ones. You will probably find that simply concatenating is better.
By the way, when I've needed to do this I usually just do it the basic way you've already mentioned (create an array of zeros and assign the smaller array inside it), I don't see anything wrong with that!
Just to be clear: there's no "good" way to extend a NumPy array, as NumPy arrays are not expandable. Once the array is defined, the space it occupies in memory, a combination of the number of its elements and the size of each element, is fixed and cannot be changed. The only thing you can do is to create a new array and replace some of its elements by the elements of the original array.
A lot of functions are available for convenience (the np.concatenate function and its np.*stack shortcuts, np.column_stack, the index routines np.r_ and np.c_ ...), but they are just that: convenience functions. Some of them are optimized at the C level (np.concatenate and others, I think), some are not.
Note that there's nothing wrong at all with your initial suggestion of creating a large array 'by hand' (possibly filled with zeros) and filling it yourself with your initial array. It might be more readable than more complicated solutions.
A simple way:
# what you want to expand
x = np.ones((3, 3))
# expand to what shape
target = np.zeros((6, 6))
# do expand
target[:x.shape[0], :x.shape[1]] = x
# print target
array([[ 1.,  1.,  1.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])
Functional way:
borrow from https://stackoverflow.com/a/35751427/1637673, with a little modification.
def pad(array, reference_shape, offsets=None):
    """
    array: Array to be padded
    reference_shape: tuple of size of narray to create
    offsets: list of offsets (number of elements must be equal to the dimension of the array)
    will throw a ValueError if offsets is too big and the reference_shape cannot handle the offsets
    """
    if not offsets:
        offsets = np.zeros(array.ndim, dtype=np.int32)
    # Create an array of zeros with the reference shape
    result = np.zeros(reference_shape, dtype=np.float32)
    # Create a list of slices from offset to offset + shape in each dimension
    insertHere = [slice(offsets[dim], offsets[dim] + array.shape[dim]) for dim in range(array.ndim)]
    # Insert the array in the result at the specified offsets (tuple needed for modern numpy)
    result[tuple(insertHere)] = array
    return result
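For instance, a hypothetical usage of the pad helper above, placing the 2x2 array at an offset inside a 4x5 array of zeros:
import numpy as np

a = np.array([[1, 2], [3, 4]])
print(pad(a, (4, 5), offsets=[1, 2]))
# [[0. 0. 0. 0. 0.]
#  [0. 0. 1. 2. 0.]
#  [0. 0. 3. 4. 0.]
#  [0. 0. 0. 0. 0.]]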
You should use np.column_stack or append
import numpy as np
p = np.array([ [1,2] , [3,4] ])
p = np.column_stack( [ p , [ 0 , 0 ],[0,0] ] )
p
Out[277]:
array([[1, 2, 0, 0],
       [3, 4, 0, 0]])
Append seems to be faster though:
timeit np.column_stack( [ p , [ 0 , 0 ],[0,0] ] )
10000 loops, best of 3: 61.8 us per loop
timeit np.append(p, [[0,0],[0,0]],1)
10000 loops, best of 3: 48 us per loop
And a comparison with np.c_ and np.hstack [append still seems to be the fastest]:
In [295]: z=np.zeros((2, 2), dtype=a.dtype)
In [296]: timeit np.c_[a, z]
10000 loops, best of 3: 47.2 us per loop
In [297]: timeit np.append(p, z,1)
100000 loops, best of 3: 13.1 us per loop
In [305]: timeit np.hstack((p,z))
10000 loops, best of 3: 20.8 us per loop
and np.concatenate [that is even a bit faster than append]:
In [307]: timeit np.concatenate((p, z), axis=1)
100000 loops, best of 3: 11.6 us per loop
there are also similar methods like np.vstack, np.hstack, np.dstack. I like these over np.concatenate as they make it clear what dimension is being "expanded".
temp = np.array([[1, 2], [3, 4]])
np.hstack((temp, np.zeros((2,3))))
it's easy to remember because numpy's first axis is vertical, so vstack expands the first axis, and the 2nd axis is horizontal, so hstack expands the second.
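A quick shape check illustrates the mnemonic (a minimal sketch):
import numpy as np

temp = np.array([[1, 2], [3, 4]])
print(np.vstack((temp, np.zeros((3, 2)))).shape)  # (5, 2) -- the first (vertical) axis grows
print(np.hstack((temp, np.zeros((2, 3)))).shape)  # (2, 5) -- the second (horizontal) axis grows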
