Inserting rows and columns into a numpy array - python

I would like to insert multiple rows and columns into a NumPy array.
If I have a square array of length n_a, e.g.: n_a = 3
a = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
and I would like to get a new array with size n_b, which contains array a and zeros (or any other 1D array of length n_b) on certain rows and columns with indices, e.g.
index = [1, 3]
so n_b = n_a + len(index). Then the new array is:
b = np.array([[1, 0, 2, 0, 3],
[0, 0, 0, 0, 0],
[4, 0, 5, 0, 6],
[0, 0, 0, 0, 0],
[7, 0, 8, 0, 9]])
My question is how to do this efficiently, assuming that for bigger arrays n_a is much larger than len(index).
EDIT
The results for:
import numpy as np
import random
n_a = 5000
n_index = 100
a=np.random.rand(n_a, n_a)
index = random.sample(range(n_a), n_index)
Warren Weckesser's solution: 0.208 s
wim's solution: 0.980 s
Ashwini Chaudhary's solution: 0.955 s
Thank you to all!

Here's one way to do it. It has some overlap with @wim's answer, but it uses index broadcasting to copy a into b with a single assignment.
import numpy as np
a = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
index = [1, 3]
n_b = a.shape[0] + len(index)
not_index = np.array([k for k in range(n_b) if k not in index])
b = np.zeros((n_b, n_b), dtype=a.dtype)
b[not_index.reshape(-1,1), not_index] = a
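The list comprehension above tests every position against index, which may get slow when index is long. Here is a minimal sketch of a variant, assuming index holds unique positions in the output array. The function name insert_zero_rows_cols is made up for illustration. It marks the kept positions with a boolean mask and uses np.ix_ to do the same single broadcast assignment:

```python
import numpy as np

def insert_zero_rows_cols(a, index):
    """Hypothetical helper: place square array `a` into a larger zero array,
    leaving the rows and columns listed in `index` as zeros."""
    n_b = a.shape[0] + len(index)
    keep = np.ones(n_b, dtype=bool)   # True where data from `a` will go
    keep[index] = False               # positions of the inserted rows/columns
    b = np.zeros((n_b, n_b), dtype=a.dtype)
    b[np.ix_(keep, keep)] = a         # open mesh of kept rows x kept columns
    return b

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
b = insert_zero_rows_cols(a, [1, 3])
print(b)
```

To insert something other than zeros, the zero array could be replaced by one pre-filled with the desired values before the assignment.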

You can do this by applying two numpy.insert calls on a:
>>> a = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
>>> indices = np.array([1, 3])
>>> i = indices - np.arange(len(indices))
>>> np.insert(np.insert(a, i, 0, axis=1), i, 0, axis=0)
array([[1, 0, 2, 0, 3],
[0, 0, 0, 0, 0],
[4, 0, 5, 0, 6],
[0, 0, 0, 0, 0],
[7, 0, 8, 0, 9]])
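The subtraction is needed because np.insert interprets positions relative to the original array, whereas indices gives positions in the result. A small 1-D sketch of that conversion, assuming the indices are sorted in increasing order (the variable names are illustrative only):

```python
import numpy as np

row = np.array([1, 2, 3])
target = np.array([1, 3])                  # where the zeros should end up
source = target - np.arange(len(target))   # the same spots, counted in `row`
out = np.insert(row, source, 0)            # insert a 0 before each source position
print(out)
```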

Since fancy indexing returns a copy instead of a view,
I can only think how to do it in a two-step process. Maybe a numpy wizard knows a better way...
Here you go:
import numpy as np
a = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
index = [1, 3]
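n = a.shape[0]
N = n + len(index)
non_index = [x for x in range(N) if x not in index]  # positions that keep data
b = np.zeros((N, n), a.dtype)
b[non_index] = a               # step 1: spread the rows of a over the kept rows
a = np.zeros((N, N), a.dtype)
a[:, non_index] = b            # step 2: spread the columns; the result ends up in a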

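Why not just slice? This needs no loops or for statements, although it only covers the case where the inserted rows and columns alternate with the original ones, as in the example.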
xlen = a.shape[1]
ylen = a.shape[0]
b = np.zeros((ylen * 2 - ylen % 2, xlen * 2 - xlen % 2))  # accommodates both odd and even shapes
b[0::2,0::2] = a

Related

Is there any fast way to find identical rows of two sparse matrices with different sizes?

Consider A, an n by j matrix, and B, an m by j matrix, both SciPy sparse matrices with m<n. Is there any way to find the indices of the rows of A that are identical to rows of B?
I have tried for loops and tried to convert them into Numpy arrays. In my case, they're not working because I'm dealing with huge matrices.
Here is the link to the same question for Numpy arrays.
Edit:
An Example for A, B, and the desired output:
>>> import numpy as np
>>> from scipy.sparse import csc_matrix
>>> row = np.array([0, 2, 2, 0, 1, 2])
>>> col = np.array([0, 0, 1, 2, 2, 2])
>>> data = np.array([1, 3, 3, 4, 5, 6])
>>> A = csc_matrix((data, (row, col)), shape=(5, 3))
>>> A.toarray()
array([[1, 0, 4],
[0, 0, 5],
[3, 3, 6],
[0, 0, 0],
[0, 0, 0]])
>>> row = np.array([0, 2, 2, 0, 1, 2])
>>> col = np.array([0, 0, 1, 2, 2, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> B = csc_matrix((data, (row, col)), shape=(4, 3))
>>> B.toarray()
array([[1, 0, 4],
[0, 0, 5],
[2, 3, 6],
[0, 0, 0]])
Desired output:
def some_function(A, B):
    # Some operations
    return indices
>>> some_function(A,B)
[0, 1, 3, 4]

How to create a new list where new_array[i][j] = b[a[i][j]] (with a being an array and b a vector) without using for loops

I have two arrays, for example a = np.array([[0, 2, 0], [0, 2, 0]]) and b = np.array([0, 0, 2]).
What I want to do is to create a new array with the same shape as a, where each entry (i,j) is the value of b at the index given by a[i][j]. Formally, I want new_array[i][j] = b[a[i][j]].
I know that this can be achieved with for loops, as shown in the code below. However, I wanted to ask if this is possible to do without for loops and only with Numpy or Python built-in functions using code vectorization.
a = np.array([[0, 2, 0], [0, 2, 0]])
b = np.array([0, 0, 2])
new_array = np.empty((2,3))
for i in range(len(a)):
    for j in range(3):
        new_array[i][j] = b[a[i][j]]
expected output:
array([[0, 2, 0],
[0, 2, 0]])
You can use numpy.take:
np.take(b, a)
output:
array([[0, 2, 0],
[0, 2, 0]])
A less ambiguous example:
a = [[0, 2, 0], [1, 1, 2]]
b = [6, 7, 8]
np.take(b, a)
# array([[6, 8, 6],
# [7, 7, 8]])
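When b is a NumPy array (not a plain list), integer-array indexing should give the same result as np.take without an axis argument. A brief sketch, reusing the values from the example above:

```python
import numpy as np

a = np.array([[0, 2, 0], [1, 1, 2]])
b = np.array([6, 7, 8])

taken = np.take(b, a)   # look up b at every index stored in a
indexed = b[a]          # integer-array ("fancy") indexing, same shape as a
print(indexed)
```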

How to implement Numpy where index in TensorFlow?

I have the following operation, which uses numpy.where:
mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.int32)
index = np.array([[1,0,0],[0,1,0],[0,0,1]])
mat[np.where(index>0)] = 100
print(mat)
How to implement the equivalent in TensorFlow?
mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.int32)
index = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
tf_mat = tf.constant(mat)
tf_index = tf.constant(index)
indi = tf.where(tf_index>0)
tf_mat[indi] = -1  # <===== not allowed
Assuming that what you want is to create a new tensor with some replaced elements, and not update a variable, you could do something like this:
import numpy as np
import tensorflow as tf
mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.int32)
index = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
tf_mat = tf.constant(mat)
tf_index = tf.constant(index)
tf_mat = tf.where(tf_index > 0, -tf.ones_like(tf_mat), tf_mat)
with tf.Session() as sess:
    print(sess.run(tf_mat))
Output:
[[-1 2 3]
[ 4 -1 6]
[ 7 8 -1]]
You can get the indices with tf.where. Then you can either evaluate the indices, use tf.gather to collect data from the original array, or use tf.scatter_update to update the original data (tf.scatter_nd_update for a multi-dimensional update).
mat = tf.Variable([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=tf.int32)
index = tf.Variable([[1,0,0],[0,1,0],[0,0,1]])
idx = tf.where(index>0)
tf.scatter_nd_update(mat, idx, updates)  # updates: the values you want to write
Note that updates must have the same first-dimension size as idx.
see https://www.tensorflow.org/api_guides/python

Random value into matrix (np.ndarray) at position from tuple obtained from np.where

I have a matrix with some zeros:
x=np.array([[1,2,3,0],[4,0,5,0],[7,0,0,0],[0,9,8,0]])
>>> x
array([[1, 2, 3, 0],
[4, 0, 5, 0],
[7, 0, 0, 0],
[0, 9, 8, 0]])
I want to put random values only into the positions that are not zero. I can get the (row, col) positions as a tuple from np.where:
pos = np.where(x!=0)
>>> (array([0, 0, 0, 1, 1, 2, 3, 3], dtype=int64), array([0, 1, 2, 0, 2, 0, 1, 2], dtype=int64))
Is there a way to use np.random (or something else) on the matrix x at the positions from pos only, without changing the entries that are zero?
# pseudocode
new_x = np.rand(x, at pos)
I assume you want to replace the non-zero values with random integers.
You can use the combination of numpy.place and numpy.random.randint functions.
>>> x=np.array([[1,2,3,0],[4,0,5,0],[7,0,0,0],[0,9,8,0]])
>>> x
array([[1, 2, 3, 0],
[4, 0, 5, 0],
[7, 0, 0, 0],
[0, 9, 8, 0]])
>>> lower_bound, upper_bound = 1, 5 # random function boundary
>>> np.place(x, x!=0, np.random.randint(lower_bound, upper_bound, np.count_nonzero(x)))
>>> x
array([[2, 2, 3, 0],
[1, 0, 3, 0],
[2, 0, 0, 0],
[0, 4, 3, 0]])
You can use x.nonzero(), which gives you the indices of all nonzero entries of the array, and then put random values at those indices:
nz_indices = x.nonzero()
for i, j in zip(nz_indices[0], nz_indices[1]):
    x[i, j] = np.random.randint(1500)  # random integer from 0 up to, but not including, 1500
You can find more about randint() in the randint docs.
How about something simple like this:
import numpy as np
x = np.array([[1, 2, 3, 0], [4, 0, 5, 0], [7, 0, 0, 0], [0, 9, 8, 0]])
w = x != 0
x[w] = np.random.randint(10, size=x.shape)[w]
print(x)
[[2 2 2 0]
[0 0 4 0]
[1 0 0 0]
[0 3 1 0]]
You could also do
x = np.random.randint(1, 10, size=x.shape) * (x != 0)
Just index with np.nonzero
i = np.nonzero(x)
x[i] = np.random.randint(1, 10, i[0].size)
Note for reference that np.nonzero(x) <=> np.where(x) <=> np.where(x != 0)
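The same masking idea can also be written with NumPy's newer Generator interface. A minimal sketch, assuming the random values should be integers from 1 to 9 (so a replaced entry never becomes zero) and using a fixed seed only to make runs repeatable:

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for repeatability
x = np.array([[1, 2, 3, 0], [4, 0, 5, 0], [7, 0, 0, 0], [0, 9, 8, 0]])

mask = x != 0                                    # positions to overwrite
x[mask] = rng.integers(1, 10, size=mask.sum())   # one draw per nonzero entry
print(x)
```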

Removing duplicate columns and rows from a NumPy 2D array

I'm using a 2D array to store pairs of longitudes and latitudes. At one point, I have to merge two of these 2D arrays and then remove any duplicated entries. I've been searching for a function similar to numpy.unique, but I've had no luck. Every implementation I've come up with looks very "unoptimized". For example, I tried converting the array to a list of tuples, removing duplicates with set, and then converting back to an array:
coordskeys = np.array(list(set([tuple(x) for x in coordskeys])))
Are there any existing solutions, so I do not reinvent the wheel?
To make it clear, I'm looking for:
>>> a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
>>> unique_rows(a)
array([[1, 1], [2, 3],[5, 4]])
BTW, I wanted to use just a list of tuples for it, but the lists were so big that they consumed my 4Gb RAM + 4Gb swap (numpy arrays are more memory efficient).
This should do the trick:
def unique_rows(a):
    a = np.ascontiguousarray(a)
    unique_a = np.unique(a.view([('', a.dtype)]*a.shape[1]))
    return unique_a.view(a.dtype).reshape((unique_a.shape[0], a.shape[1]))
Example:
>>> a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
>>> unique_rows(a)
array([[1, 1],
[2, 3],
[5, 4]])
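NumPy 1.13 and later also accept an axis argument in np.unique, which handles this case directly. A short sketch on the same data follows. The return_index part is an optional extra for keeping the rows in their original order, and the variable names are illustrative:

```python
import numpy as np

a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])

# Unique rows, returned in sorted (lexicographic) order.
unique_sorted = np.unique(a, axis=0)

# Optional: keep the first occurrence of each row in its original order.
_, first = np.unique(a, axis=0, return_index=True)
unique_in_order = a[np.sort(first)]
print(unique_sorted)
```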
Here's one idea. It'll take a little bit of work but could be quite fast. I'll give you the 1d case and let you figure out how to extend it to 2d. The following function finds the unique elements of a 1d array:
import numpy as np
def unique(a):
    a = np.sort(a)
    b = np.diff(a)
    b = np.r_[1, b]
    return a[b != 0]
Now to extend it to 2d you need to change two things. You will need to figure out how to do the sort yourself, the important thing about the sort will be that two identical entries end up next to each other. Second, you'll need to do something like (b != 0).all(axis) because you want to compare the whole row/column. Let me know if that's enough to get you started.
Updated: with some help from doug, I think this should work for the 2d case.
import numpy as np
def unique(a):
    order = np.lexsort(a.T)
    a = a[order]
    diff = np.diff(a, axis=0)
    ui = np.ones(len(a), 'bool')
    ui[1:] = (diff != 0).any(axis=1)
    return a[ui]
My method turns the 2d array into a 1d complex array, where the real part is the first column and the imaginary part is the second column, and then uses np.unique. This only works with two columns, though.
import numpy as np
def unique2d(a):
    x, y = a.T
    b = x + y*1.0j
    idx = np.unique(b,return_index=True)[1]
    return a[idx]
Example -
a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
unique2d(a)
array([[1, 1],
[2, 3],
[5, 4]])
>>> import numpy as NP
>>> # create a 2D NumPy array with some duplicate rows
>>> A
array([[1, 1, 1, 5, 7],
[5, 4, 5, 4, 7],
[7, 9, 4, 7, 8],
[5, 4, 5, 4, 7],
[1, 1, 1, 5, 7],
[5, 4, 5, 4, 7],
[7, 9, 4, 7, 8],
[5, 4, 5, 4, 7],
[7, 9, 4, 7, 8]])
>>> # first, sort the 2D NumPy array row-wise so dups will be contiguous
>>> # and rows are preserved
>>> a, b, c, d, e = A.T # create the keys for to pass to lexsort
>>> ndx = NP.lexsort((a, b, c, d, e))
>>> ndx
array([1, 3, 5, 7, 0, 4, 2, 6, 8])
>>> A = A[ndx,]
>>> # now diff by row
>>> A1 = NP.diff(A, axis=0)
>>> A1
array([[ 0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0],
       [-4, -3, -4,  1,  0],
       [ 0,  0,  0,  0,  0],
       [ 6,  8,  3,  2,  1],
       [ 0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0]])
>>> # boolean mask: True where a sorted row differs from the row before it
>>> ndx = NP.any(A1, axis=1)
>>> ndx
array([False, False, False,  True, False,  True, False, False])
>>> # rows that start a new group; together with A[0] these are the unique rows
>>> A[1:,:][ndx,]
array([[1, 1, 1, 5, 7],
       [7, 9, 4, 7, 8]])
The numpy_indexed package (disclaimer: I am its author) wraps the solution posted by user545424 in a nice and tested interface, plus many related features:
import numpy_indexed as npi
npi.unique(coordskeys)
Since you refer to numpy.unique, you don't care about maintaining the original order, correct? Converting to a set, which removes duplicates, and then back to a list is a commonly used idiom:
>>> x = [(1, 1), (2, 3), (1, 1), (5, 4), (2, 3)]
>>> y = list(set(x))
>>> y
[(5, 4), (2, 3), (1, 1)]
>>>
