Related
I am trying to double the size of a matrix without changing the values. Is there any dedicated function to do that? Here, I need to convert the matrix a to matrix b.
a = np.array([[1,2],
[3,4]])
b = np.array([[1,1,2,2],
[1,1,2,2],
[3,3,4,4],
[3,3,4,4]])
You can repeat on both axes:
a = np.array([[1, 2],
[3, 4]])
b =np.repeat(np.repeat(a, 2, axis=0), 2, axis=1)
Output:
array([[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]])
For example:
For
A = np.array([
[1, 2, 3],
[4, 4, 4],
[5, 6, 6]])
I want to get the output
array([[1, 2, 3]]).
For A = np.arange(9).reshape(3, 3),
I want to get
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
WITHOUT using loops
You can use:
B = np.sort(A, axis=1)
out = A[(B[:,:-1] != B[:, 1:]).all(1)]
Or using pandas:
import pandas as pd
out = A[pd.DataFrame(A).nunique(axis=1).eq(A.shape[1])]
Output:
array([[1, 2, 3]])
I have a numpy array say
a = array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
I have an array 'replication' of the same size where replication[i,j](>=0) denotes how many times a[i][j] should be repeated along the row. Obiviously, replication array follows the invariant that np.sum(replication[i]) have the same value for all i.
For example, if
replication = array([[1, 2, 1],
[1, 1, 2],
[2, 1, 1]])
then the final array after replicating is:
new_a = array([[1, 2, 2, 3],
[4, 5, 6, 6],
[7, 7, 8, 9]])
Presently, I am doing this to create new_a:
##allocate new_a
h = a.shape[0]
w = a.shape[1]
for row in range(h):
ll = [[a[row][j]]*replicate[row][j] for j in range(w)]
new_a[row] = np.array([item for sublist in ll for item in sublist])
However, this seems to be too slow as it involves using lists. Can I do the intended entirely in numpy, without the use of python lists?
You can flatten out your replication array, then use the .repeat() method of a:
import numpy as np
a = array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
replication = array([[1, 2, 1],
[1, 1, 2],
[2, 1, 1]])
new_a = a.repeat(replication.ravel()).reshape(a.shape[0], -1)
print(repr(new_a))
# array([[1, 2, 2, 3],
# [4, 5, 6, 6],
# [7, 7, 8, 9]])
Given two matrices
A: m * r
B: n * r
I want to generate another matrix C: m * n, with each entry C_ij being a matrix calculated by the outer product of A_i and B_j.
For example,
A: [[1, 2],
[3, 4]]
B: [[3, 1],
[1, 2]]
gives
C: [[[3, 1], [[1 ,2],
[6, 2]], [2 ,4]],
[9, 3], [[3, 6],
[12,4]], [4, 8]]]
I can do it using for loops, like
for i in range (A.shape(0)):
for j in range (B.shape(0)):
C_ij = np.outer(A_i, B_j)
I wonder If there is a vectorised way of doing this calculation to speed it up?
The Einstein notation expresses this problem nicely
In [85]: np.einsum('ac,bd->abcd',A,B)
Out[85]:
array([[[[ 3, 1],
[ 6, 2]],
[[ 1, 2],
[ 2, 4]]],
[[[ 9, 3],
[12, 4]],
[[ 3, 6],
[ 4, 8]]]])
temp = numpy.multiply.outer(A, B)
C = numpy.swapaxes(temp, 1, 2)
NumPy ufuncs, such as multiply, have an outer method that almost does what you want. The following:
temp = numpy.multiply.outer(A, B)
produces a result such that temp[a, b, c, d] == A[a, b] * B[c, d]. You want C[a, b, c, d] == A[a, c] * B[b, d]. The swapaxes call rearranges temp to put it in the order you want.
Simple Solution with Numpy Array Broadcasting
Since, you want C_ij = A_i * B_j, this can be achieved simply by numpy broadcasting on element-wise-product of column-vector-A and row-vector-B, as shown below:
# import numpy as np
# A = [[1, 2], [3, 4]]
# B = [[3, 1], [1, 2]]
A, B = np.array(A), np.array(B)
C = A.reshape(-1,1) * B.reshape(1,-1)
# same as:
# C = np.einsum('i,j->ij', A.flatten(), B.flatten())
print(C)
Output:
array([[ 3, 1, 1, 2],
[ 6, 2, 2, 4],
[ 9, 3, 3, 6],
[12, 4, 4, 8]])
You could then get your desired four sub-matrices by using numpy.dsplit() or numpy.array_split() as follows:
np.dsplit(C.reshape(2, 2, 4), 2)
# same as:
# np.array_split(C.reshape(2,2,4), 2, axis=2)
Output:
[array([[[ 3, 1],
[ 6, 2]],
[[ 9, 3],
[12, 4]]]),
array([[[1, 2],
[2, 4]],
[[3, 6],
[4, 8]]])]
Use numpy;
In [1]: import numpy as np
In [2]: A = np.array([[1, 2], [3, 4]])
In [3]: B = np.array([[3, 1], [1, 2]])
In [4]: C = np.outer(A, B)
In [5]: C
Out[5]:
array([[ 3, 1, 1, 2],
[ 6, 2, 2, 4],
[ 9, 3, 3, 6],
[12, 4, 4, 8]])
Once you have the desired result, you can use numpy.reshape() to mold it in almost any shape you want;
In [6]: C.reshape([4,2,2])
Out[6]:
array([[[ 3, 1],
[ 1, 2]],
[[ 6, 2],
[ 2, 4]],
[[ 9, 3],
[ 3, 6]],
[[12, 4],
[ 4, 8]]])
I'm using a 2D shape array to store pairs of longitudes+latitudes. At one point, I have to merge two of these 2D arrays, and then remove any duplicated entry. I've been searching for a function similar to numpy.unique, but I've had no luck. Any implementation I've been
thinking on looks very "unoptimizied". For example, I'm trying with converting the array to a list of tuples, removing duplicates with set, and then converting to an array again:
coordskeys = np.array(list(set([tuple(x) for x in coordskeys])))
Are there any existing solutions, so I do not reinvent the wheel?
To make it clear, I'm looking for:
>>> a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
>>> unique_rows(a)
array([[1, 1], [2, 3],[5, 4]])
BTW, I wanted to use just a list of tuples for it, but the lists were so big that they consumed my 4Gb RAM + 4Gb swap (numpy arrays are more memory efficient).
This should do the trick:
def unique_rows(a):
a = np.ascontiguousarray(a)
unique_a = np.unique(a.view([('', a.dtype)]*a.shape[1]))
return unique_a.view(a.dtype).reshape((unique_a.shape[0], a.shape[1]))
Example:
>>> a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
>>> unique_rows(a)
array([[1, 1],
[2, 3],
[5, 4]])
Here's one idea, it'll take a little bit of work but could be quite fast. I'll give you the 1d case and let you figure out how to extend it to 2d. The following function finds the unique elements of of a 1d array:
import numpy as np
def unique(a):
a = np.sort(a)
b = np.diff(a)
b = np.r_[1, b]
return a[b != 0]
Now to extend it to 2d you need to change two things. You will need to figure out how to do the sort yourself, the important thing about the sort will be that two identical entries end up next to each other. Second, you'll need to do something like (b != 0).all(axis) because you want to compare the whole row/column. Let me know if that's enough to get you started.
updated: With some help with doug, I think this should work for the 2d case.
import numpy as np
def unique(a):
order = np.lexsort(a.T)
a = a[order]
diff = np.diff(a, axis=0)
ui = np.ones(len(a), 'bool')
ui[1:] = (diff != 0).any(axis=1)
return a[ui]
My method is by turning a 2d array into 1d complex array, where the real part is 1st column, imaginary part is the 2nd column. Then use np.unique. Though this will only work with 2 columns.
import numpy as np
def unique2d(a):
x, y = a.T
b = x + y*1.0j
idx = np.unique(b,return_index=True)[1]
return a[idx]
Example -
a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
unique2d(a)
array([[1, 1],
[2, 3],
[5, 4]])
>>> import numpy as NP
>>> # create a 2D NumPy array with some duplicate rows
>>> A
array([[1, 1, 1, 5, 7],
[5, 4, 5, 4, 7],
[7, 9, 4, 7, 8],
[5, 4, 5, 4, 7],
[1, 1, 1, 5, 7],
[5, 4, 5, 4, 7],
[7, 9, 4, 7, 8],
[5, 4, 5, 4, 7],
[7, 9, 4, 7, 8]])
>>> # first, sort the 2D NumPy array row-wise so dups will be contiguous
>>> # and rows are preserved
>>> a, b, c, d, e = A.T # create the keys for to pass to lexsort
>>> ndx = NP.lexsort((a, b, c, d, e))
>>> ndx
array([1, 3, 5, 7, 0, 4, 2, 6, 8])
>>> A = A[ndx,]
>>> # now diff by row
>>> A1 = NP.diff(A, axis=0)
>>> A1
array([[0, 0, 0, 0, 0],
[4, 3, 3, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 1, 0, 0],
[2, 5, 0, 2, 1],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])
>>> # the index array holding the location of each duplicate row
>>> ndx = NP.any(A1, axis=1)
>>> ndx
array([False, True, False, True, True, True, False, False], dtype=bool)
>>> # retrieve the duplicate rows:
>>> A[1:,:][ndx,]
array([[7, 9, 4, 7, 8],
[1, 1, 1, 5, 7],
[5, 4, 5, 4, 7],
[7, 9, 4, 7, 8]])
The numpy_indexed package (disclaimer: I am its author) wraps the solution posted by user545424 in a nice and tested interface, plus many related features:
import numpy_indexed as npi
npi.unique(coordskeys)
since you refer to numpy.unique, you dont care to maintain the original order, correct? converting into set, which removes duplicate, and then back to list is often used idiom:
>>> x = [(1, 1), (2, 3), (1, 1), (5, 4), (2, 3)]
>>> y = list(set(x))
>>> y
[(5, 4), (2, 3), (1, 1)]
>>>