Combinations of values of column vectors - python

I need help storing the combinations of the values of two column vectors in a numpy array.
My problem consists of two column vectors, of size nx1 and mx1 with n=m, for which I want to find n combinations.
I then stacked these column vectors into a matrix of size nx2.
I found the combinations with Python's itertools.combinations function, but I struggle to store them in a numpy array, since itertools gives n rows of tuples.
The main example I found online is reported below:
import itertools
val = [1, 2, 3, 4]
com_set = itertools.combinations(val, 2)
for i in com_set:
    print(i)
Output:
(1, 2)
(1, 3)
(1, 4)
(2, 3)
(2, 4)
(3, 4)
Now, in my case, I have two vectors, val and val1, different from each other.
I would also need the output in a numpy array, possibly a matrix, so that I can apply the maximum likelihood estimation method to these values.

You are looking for itertools.product instead of itertools.combinations.
import itertools
import numpy as np
x = [1, 2, 3]
y = [4, 5, 6]
z = list(itertools.product(x, y))
# z = [(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]
You can turn the result into an array of shape (n * n, 2) by passing it to np.array:
result = np.array(z)
# array([[1, 4],
# [1, 5],
# [1, 6],
# [2, 4],
# [2, 5],
# [2, 6],
# [3, 4],
# [3, 5],
# [3, 6]])
Finally, you can also do this with numpy directly, albeit in a different order:
result = np.stack(np.meshgrid(x, y)).reshape(2, -1).T
# array([[1, 4],
# [2, 4],
# [3, 4],
# [1, 5],
# [2, 5],
# [3, 5],
# [1, 6],
# [2, 6],
# [3, 6]])
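If the itertools.product ordering matters, a NumPy-only variant can probably reproduce it. The sketch below assumes x and y are 1-D sequences and relies on the indexing="ij" option of np.meshgrid; the names grid_x, grid_y, pairs and expected are illustrative and not part of the original answer.

```python
import itertools

import numpy as np

x = [1, 2, 3]
y = [4, 5, 6]

# indexing="ij" makes x vary slowest, which matches itertools.product(x, y).
grid_x, grid_y = np.meshgrid(x, y, indexing="ij")
pairs = np.stack((grid_x, grid_y), axis=-1).reshape(-1, 2)

# Reference result built with itertools, used only for comparison.
expected = np.array(list(itertools.product(x, y)))
print(np.array_equal(pairs, expected))  # True
```

Stacking along the last axis keeps each (x, y) pair together, so a single reshape to (-1, 2) is enough and no transpose is needed.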

Related

How to reshape matrices using index instead of shape inputs?

Given an array of shape (8, 3, 4, 4), reshape it into a new shape (8, 4, 4, 3) by passing the new order of the old axis positions, (0, 2, 3, 1).
Bonus: perform numpy.dot of one of said array's non-last index and a 1-D second, i.e. numpy.dot(<array with shape (8, 3, 4, 4)>, [1, 2, 3]) # will return shape mismatch as it is
Numpy's transpose "reverses or permutes":
ni = (0, 2, 3, 1)
arr = arr.transpose(ni)
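A minimal sketch of the transpose approach on an array of the shape from the question, including one possible way to handle the bonus. The array contents and the names moved, w, via_dot and via_tensordot are assumptions made for illustration.

```python
import numpy as np

arr = np.arange(8 * 3 * 4 * 4).reshape(8, 3, 4, 4)
ni = (0, 2, 3, 1)

# transpose puts old axis ni[k] at new position k; the result is a view.
moved = arr.transpose(ni)
print(moved.shape)  # (8, 4, 4, 3)

# Bonus: contract the length-3 axis with a 1-D vector.
w = np.array([1, 2, 3])
via_dot = np.dot(moved, w)                             # length-3 axis is now last
via_tensordot = np.tensordot(arr, w, axes=([1], [0]))  # contract axis 1 directly
print(via_dot.shape, np.array_equal(via_dot, via_tensordot))  # (8, 4, 4) True
```

np.tensordot avoids the explicit transpose when the only goal is the contraction.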
Old solution:
ni = (0, 2, 3, 1)
s = arr.shape
arr = arr.reshape(s[ni[0]], s[ni[1]]...)
Maybe this is what you are looking for:
arr = np.array([[[1, 2], [3, 4], [5, 6]]])
s = arr.shape
new_indexes = (1, 0, 2) # permutation
new_arr = arr.reshape(*[s[index] for index in new_indexes])
print(arr.shape) # (1, 3, 2)
print(new_arr.shape) # (3, 1, 2)
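Note that reshape and transpose give the same result in the example above only because the moved axis has length 1. In general, reshape keeps the elements in memory order while transpose moves them along with their axes. A small sketch of the difference, using an illustrative array that is not from the original posts:

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

reshaped = arr.reshape(3, 2)       # keeps memory order: [[0, 1], [2, 3], [4, 5]]
transposed = arr.transpose(1, 0)   # swaps the axes:     [[0, 3], [1, 4], [2, 5]]

print(reshaped.shape == transposed.shape)    # True
print(np.array_equal(reshaped, transposed))  # False
```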

How to make a 2d array of tuples in python?

I want to make a 2D array of 2-tuples of fixed dimension (say 10x10).
e.g
[[(1,2), (1,2), (1,2)],
[(1,2), (1,2), (1,2)],
[(1,2), (1,2), (1,2)]]
There are also two ways that I'd like to generate this array:
1. An array like the example above where every element is the same tuple
2. An array which I populate iteratively with specific tuples (possibly starting with an empty array of fixed size and then using assignment)
How would I go about doing this? For #1 I tried using numpy.tile:
>>> np.tile(np.array([1,2]), (3, 3))
array([[1, 2, 1, 2, 1, 2],
[1, 2, 1, 2, 1, 2],
[1, 2, 1, 2, 1, 2]])
But I can't seem to copy it across columns; the columns are just concatenated, i.e. I get the output above instead of:
[[[1,2], [1,2], [1,2]],
[[1,2], [1,2], [1,2]],
[[1,2], [1,2], [1,2]]]
You can use numpy.full:
numpy.full((3, 3, 2), (1, 2))
output:
array([[[1, 2],
[1, 2],
[1, 2]],
[[1, 2],
[1, 2],
[1, 2]],
[[1, 2],
[1, 2],
[1, 2]]])
For #1 you can generate it like this:
[[(1,2)] * 3]*3
# gives [[(1, 2), (1, 2), (1, 2)], [(1, 2), (1, 2), (1, 2)], [(1, 2), (1, 2), (1, 2)]]
numpy.zeros((3,3,2))
would work too, I guess (but the elements are not tuples, they are lists...)
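One caveat with the nested list multiplication is that the outer * repeats references to a single inner list. The sketch below shows that pitfall, a comprehension that avoids it, and one possible way to hold real tuples in a fixed-size NumPy array using dtype=object. The names rows, cols, aliased, independent and grid are illustrative assumptions.

```python
import numpy as np

rows, cols = 3, 3

# Pitfall: the outer * repeats a reference to one inner list, so rows are aliased.
aliased = [[(1, 2)] * cols] * rows
aliased[0][0] = (9, 9)
print(aliased[1][0])  # (9, 9), because every row is the same list object

# A comprehension builds an independent list for each row.
independent = [[(1, 2) for _ in range(cols)] for _ in range(rows)]
independent[0][0] = (9, 9)
print(independent[1][0])  # (1, 2)

# For #2: an object array of fixed size, filled by assignment with real tuples.
grid = np.empty((rows, cols), dtype=object)
for i in range(rows):
    for j in range(cols):
        grid[i, j] = (i, j)
print(grid[2, 1])  # (2, 1)
```

An object array gives up most of NumPy's vectorized speed, so the (3, 3, 2) numeric layout is usually preferable when the tuples only hold numbers.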

Combinations without repeat and ordering matters or Permutations of array elements

For a 1D NumPy array, I am looking to get the combinations without the same elements being repeated in a combination. The order is important. So, [a,b] and [b,a] would be two distinct combinations. Since we don't want repeats, [a,a] and [b,b] aren't valid combinations. For simplicity, let's keep it to two elements per combination. Thus, the output would be a 2D NumPy array with 2 columns.
The desired result would be essentially the same as the itertools.product output, except that we need to mask out the combinations that repeat an element. As such, we can solve it for a sample case, like so -
In [510]: import numpy as np
In [511]: a = np.array([4,2,9,1,3])
In [512]: from itertools import product
In [513]: np.array(list(product(a,repeat=2)))[~np.eye(len(a),dtype=bool).ravel()]
Out[513]:
array([[4, 2],
[4, 9],
[4, 1],
[4, 3],
[2, 4],
[2, 9],
[2, 1],
[2, 3],
[9, 4],
[9, 2],
[9, 1],
[9, 3],
[1, 4],
[1, 2],
[1, 9],
[1, 3],
[3, 4],
[3, 2],
[3, 9],
[3, 1]])
But creating that huge array and then masking out elements, so that some of them go unused, doesn't look too efficient to me.
That got me thinking about whether numpy.ndarray.strides could be leveraged here. I have one solution with that idea in mind, which I will be posting as an answer, but would love to see other efficient ones.
In terms of usage - We come across these cases with adjacency matrices among others and I thought it would be good to solve such a problem. For easier and efficient plug-n-play into other problems, it would be nice to have the final output that's not a view of some intermediate array.
It seems np.lib.stride_tricks.as_strided could be used to maximize the efficiency of views, delaying the copy until the final stage, where we assign into an initialized array. The implementation has two steps, with some work needed for the second column (as shown in the sample case in the question), which we will call one-cold (a fancy name denoting that one element is missing, or "cold", in each interval of len(input_array) - 1 elements):
def onecold(a):
    n = len(a)
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    b = np.concatenate((a,a[:-1]))
    return strided(b[1:], shape=(n-1,n), strides=(s,s))
To showcase, onecold with a sample case -
In [563]: a
Out[563]: array([4, 2, 9, 1, 3])
In [564]: onecold(a).reshape(len(a),-1)
Out[564]:
array([[2, 9, 1, 3],
[4, 9, 1, 3],
[4, 2, 1, 3],
[4, 2, 9, 3],
[4, 2, 9, 1]])
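The same one-cold layout can probably be obtained without hand-written strides. The sketch below assumes NumPy 1.20 or newer, where sliding_window_view is available, and an input with at least two elements; onecold_windows is a hypothetical helper name, not part of the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def onecold_windows(a):
    # Hypothetical helper: same (n-1, n) layout as onecold, built from windows.
    n = len(a)
    b = np.concatenate((a, a[:-1]))
    # Each window of length n starts one element later than the previous one.
    return sliding_window_view(b[1:], n)


a = np.array([4, 2, 9, 1, 3])
print(onecold_windows(a).reshape(len(a), -1))
```

sliding_window_view checks the window bounds itself, which makes it harder to read past the end of the buffer than with a raw as_strided call.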
To solve the original problem, we will use it like so -
def combinations_without_repeat(a):
    n = len(a)
    out = np.empty((n,n-1,2),dtype=a.dtype)
    out[:,:,0] = np.broadcast_to(a[:,None], (n, n-1))
    out.shape = (n-1,n,2)
    out[:,:,1] = onecold(a)
    out.shape = (-1,2)
    return out
Sample run -
In [574]: a
Out[574]: array([4, 2, 9, 1, 3])
In [575]: combinations_without_repeat(a)
Out[575]:
array([[4, 2],
[4, 9],
[4, 1],
[4, 3],
[2, 4],
[2, 9],
[2, 1],
[2, 3],
[9, 4],
[9, 2],
[9, 1],
[9, 3],
[1, 4],
[1, 2],
[1, 9],
[1, 3],
[3, 4],
[3, 2],
[3, 9],
[3, 1]])
Seems quite efficient for a 1000-element array of ints -
In [578]: a = np.random.randint(0,9,(1000))
In [579]: %timeit combinations_without_repeat(a)
100 loops, best of 3: 2.35 ms per loop
Would love to see others!
"It would be essentially same as itertools.product output, except that we need to mask out the combinations that are repeated." Actually, what you want is itertools.permutations:
In [7]: import numpy as np
In [8]: from itertools import permutations
In [9]: a = np.array([4,2,9,1,3])
In [10]: list(permutations(a, 2))
Out[10]:
[(4, 2),
(4, 9),
(4, 1),
(4, 3),
(2, 4),
(2, 9),
(2, 1),
(2, 3),
(9, 4),
(9, 2),
(9, 1),
(9, 3),
(1, 4),
(1, 2),
(1, 9),
(1, 3),
(3, 4),
(3, 2),
(3, 9),
(3, 1)]
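To get the same pairs as a 2D NumPy array without going through Python tuples, one option is to index with the off-diagonal positions of an n x n mask. This is only a sketch: pairs_by_index is a hypothetical helper, and it still allocates an n x n boolean mask, so it is not claimed to be faster than the other approaches.

```python
from itertools import permutations

import numpy as np


def pairs_by_index(a):
    # Hypothetical helper: index pairs (i, j) with i != j, in row-major order.
    n = len(a)
    i, j = np.nonzero(~np.eye(n, dtype=bool))
    return np.column_stack((a[i], a[j]))


a = np.array([4, 2, 9, 1, 3])
out = pairs_by_index(a)
print(out.shape)  # (20, 2)
print(np.array_equal(out, np.array(list(permutations(a, 2)))))  # True
```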
Benchmarking Post
Posting the performance numbers/figures for the proposed approaches thus far in this wiki-post.
Proposed solutions :
import numpy as np
from itertools import permutations

# https://stackoverflow.com/a/48234170/ #Divakar
def onecold(a):
    n = len(a)
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    b = np.concatenate((a,a[:-1]))
    return strided(b[1:], shape=(n-1,n), strides=(s,s))

# https://stackoverflow.com/a/48234170/ #Divakar
def combinations_without_repeat(a):
    n = len(a)
    out = np.empty((n,n-1,2),dtype=a.dtype)
    out[:,:,0] = np.broadcast_to(a[:,None], (n, n-1))
    out.shape = (n-1,n,2)
    out[:,:,1] = onecold(a)
    out.shape = (-1,2)
    return out

# https://stackoverflow.com/a/48234349/ #Warren Weckesser
def itertools_permutations(a):
    return np.array(list(permutations(a, 2)))
Using the benchit package (a few benchmarking tools packaged together; disclaimer: I am its author) to benchmark the proposed solutions.
import benchit
in_ = [np.random.rand(n) for n in [10,20,50,100,200,500,1000]]
funcs = [combinations_without_repeat, itertools_permutations]
t = benchit.timings(funcs, in_)
t.rank()
t.plot(logx=True, save='timings.png')
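If benchit is not installed, a rough comparison can be sketched with the standard library's timeit. This version first checks that both functions agree. It uses reshape views instead of assigning to out.shape, which is my substitution rather than the original code, and the input size and repeat count are arbitrary assumptions.

```python
import timeit
from itertools import permutations

import numpy as np


def onecold(a):
    n = len(a)
    s = a.strides[0]
    b = np.concatenate((a, a[:-1]))
    return np.lib.stride_tricks.as_strided(b[1:], shape=(n - 1, n), strides=(s, s))


def combinations_without_repeat(a):
    n = len(a)
    out = np.empty((n, n - 1, 2), dtype=a.dtype)
    out[:, :, 0] = a[:, None]
    # reshape returns a view of the contiguous buffer, so this writes into out.
    out.reshape(n - 1, n, 2)[:, :, 1] = onecold(a)
    return out.reshape(-1, 2)


def itertools_permutations(a):
    return np.array(list(permutations(a, 2)))


a = np.random.rand(200)
same = np.array_equal(combinations_without_repeat(a), itertools_permutations(a))
print(same)  # True

for func in (combinations_without_repeat, itertools_permutations):
    seconds = timeit.timeit(lambda: func(a), number=10) / 10
    print(f"{func.__name__}: {seconds * 1e3:.3f} ms per call")
```

Absolute timings depend on the machine and NumPy version, so only the relative ordering is meaningful.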

Taking dot products of high dimensional numpy arrays

I am trying to take the dot product between three numpy arrays. However, I am struggling with wrapping my head around this.
The problem is as follows:
I have two (4,)-shaped numpy arrays, a and c, as well as a numpy array b with shape (4, 4, 3).
import numpy as np
a = np.array([0, 1, 2, 3])
b = np.array([[[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]],
[[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]],
[[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]],
[[4, 4, 4], [4, 4, 4], [4, 4, 4], [4, 4, 4]]])
c = np.array([4, 5, 6, 7])
I want to compute the dot product in such a way that my result is a 3-tuple. That is, first dot a with b and then dot the result with c, taking transposes if needed. In other words, I want to compute the dot product between a, b and c as if b were of shape (4, 4), but I want a 3-tuple as the result.
I have tried:
Reshaping a and c, and then computing the dot product:
a = np.reshape(a, (4, 1))
c = np.reshape(c, (4, 1))
tmp = np.dot(a.T, b) # now has shape (1, 4, 3)
result = np.dot(tmp, c)
Ideally, I should now have:
print(result.shape)
>> (1, 1, 3)
but I get the error
ValueError: shapes (1,4,3) and (4,1) not aligned: 3 (dim 2) != 4 (dim 0)
I have also tried using the tensordot function from numpy, but without luck.
The basic dot(A,B) rule is: last axis of A with the 2nd to the last of B
In [965]: a.shape
Out[965]: (4,)
In [966]: b.shape
Out[966]: (4, 4, 3)
a (and c) is 1-D. Its (4,) axis can be dotted with the second (4) axis of b, which is b's second-to-last axis:
In [967]: np.dot(a,b).shape
Out[967]: (4, 3)
Using c in the same way on the output produces a (3,) array:
In [968]: np.dot(c, np.dot(a,b))
Out[968]: array([360, 360, 360])
This combination may be clearer with the equivalent einsum:
In [971]: np.einsum('i,jik,j->k',a,b,c)
Out[971]: array([360, 360, 360])
But what if we want a to act on the 1st axis of b? With einsum that's easy to do:
In [972]: np.einsum('i,ijk,j->k',a,b,c)
Out[972]: array([440, 440, 440])
To do the same with the dot, we could just switch a and c:
In [973]: np.dot(a, np.dot(c,b))
Out[973]: array([440, 440, 440])
Or transpose axes of b:
In [974]: np.dot(c, np.dot(a,b.transpose(1,0,2)))
Out[974]: array([440, 440, 440])
This transposition question would be clearer if a and c had different lengths, e.g. a (2,) and a (4,) array with a (2,4,3) or (4,2,3) b.
In
tmp = np.dot(a.T, b) # now has shape (1, 4, 3)
you have a (1,4a) array dotted with a (4,4a,3) array. The result is (1,4,3). I added the a suffix to mark the axes that were combined.
To apply the (4,1) c, we have to do the same transpose:
In [977]: np.dot(c[:,None].T, np.dot(a[:,None].T, b))
Out[977]: array([[[360, 360, 360]]])
In [978]: _.shape
Out[978]: (1, 1, 3)
np.dot(c[None,:], np.dot(a[None,:], b)) would do the same without the transposes.
"I was hoping numpy would automagically distribute over the last axis. That is, that the dot product would run over the two first axes, if that makes sense."
Given the dot rule that I cited at the start, this does not make sense. But if we transpose b so that the (3) axis comes first, dot can 'carry that along', using the last and 2nd to the last axes.
In [986]: b.transpose(2,0,1).shape
Out[986]: (3, 4, 4)
In [987]: np.dot(a, b.transpose(2,0,1)).shape
Out[987]: (3, 4)
In [988]: np.dot(np.dot(a, b.transpose(2,0,1)),c)
Out[988]: array([440, 440, 440])
(4a).(3, 4a, 4c) -> (3, 4c)
(3, 4c). (4c) -> 3
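Since the question mentions np.tensordot, here is a minimal sketch of the same two contractions with explicit axes. It rebuilds b with np.repeat purely for brevity, and the names first_axis and second_axis are assumptions made for illustration.

```python
import numpy as np

a = np.array([0, 1, 2, 3])
c = np.array([4, 5, 6, 7])
# Same b as in the question: b[i, j, k] == i + 1.
b = np.repeat(np.arange(1, 5), 12).reshape(4, 4, 3)

# a contracts axis 0 of b, then c contracts the remaining length-4 axis.
first_axis = np.tensordot(np.tensordot(a, b, axes=([0], [0])), c, axes=([0], [0]))
print(first_axis)  # [440 440 440]

# a contracts axis 1 of b instead, which is what plain np.dot(a, b) does.
second_axis = np.tensordot(c, np.tensordot(a, b, axes=([0], [1])), axes=([0], [0]))
print(second_axis)  # [360 360 360]
```

Naming the contracted axes explicitly removes the need to remember the last / second-to-last rule of np.dot.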
Not automagical but does the job:
np.einsum('i,ijk,j->k',a,b,c)
# array([440, 440, 440])
This computes d of shape (3,) such that d_k = sum_{ij} a_i b_{ijk} c_j.
You are multiplying a (1,4,3) array by a (4,1) matrix, which is impossible because tmp consists of 3 pages of (1,4) matrices. If you want to multiply each page by c, just multiply each page separately.
import numpy as np
a = np.array([0, 1, 2, 3])
b = np.array([[[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]],
[[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]],
[[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]],
[[4, 4, 4], [4, 4, 4], [4, 4, 4], [4, 4, 4]]])
c = np.array([4, 5, 6, 7])
a = np.reshape(a, (4, 1))
c = np.reshape(c, (4, 1))
tmp = np.dot(a.T, b) # now has shape (1, 4, 3)
result = np.dot(tmp[:,:,0], c)
for i in range(1,3):
    result = np.dstack((result, np.dot(tmp[:,:,i], c)))
print(np.shape(result))
So the result has shape (1,1,3).

Return the subset of NumPy array according to the first element of each row

I am trying to get the subset x of the given NumPy array alist such that the first element of each row must be in the list r.
>>> import numpy
>>> alist = numpy.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1), (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
>>> alist
array([[0, 2],
[0, 4],
[1, 3],
[1, 4],
[2, 1],
[3, 1],
[3, 2],
[4, 1],
[4, 3],
[4, 2]])
>>> r = [1,3]
>>> x = alist[where first element of each row is in r] #this i need to figure out.
>>> x
array([[1, 3],
[1, 4],
[3, 1],
[3, 2]])
Any easy way (without looping as I've a large dataset) to do this in Python?
Slice the first column off the input array (basically selecting the first element of each row), then use np.in1d with r as the second input to create a mask of the valid rows, and finally index into the rows of the array with that mask to select them.
Thus, the implementation would be like so -
alist[np.in1d(alist[:,0],r)]
Sample run -
In [258]: alist # Input array
Out[258]:
array([[0, 2],
[0, 4],
[1, 3],
[1, 4],
[2, 1],
[3, 1],
[3, 2],
[4, 1],
[4, 3],
[4, 2]])
In [259]: r # Input list to be searched for
Out[259]: [1, 3]
In [260]: np.in1d(alist[:,0],r) # Mask of valid rows
Out[260]: array([False, False, True, True, False, True, True,
False, False, False], dtype=bool)
In [261]: alist[np.in1d(alist[:,0],r)] # Index and select for final o/p
Out[261]:
array([[1, 3],
[1, 4],
[3, 1],
[3, 2]])
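In newer NumPy releases np.isin is the recommended replacement for np.in1d, which to my knowledge is deprecated as of NumPy 2.0. A minimal sketch of the same selection with np.isin, where mask is an illustrative name:

```python
import numpy as np

alist = np.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1),
                  (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
r = [1, 3]

# Boolean mask: True where the first element of a row is one of the values in r.
mask = np.isin(alist[:, 0], r)
x = alist[mask]
print(x.tolist())  # [[1, 3], [1, 4], [3, 1], [3, 2]]
```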
You can construct the index array for the valid rows using some indexing tricks: we can add an additional dimension and check equality with each element of your first column:
import numpy as np
alist = np.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1),
                  (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
r = [1, 3] # allowed first elements, as in the question
inds = (alist[:,0][:,None] == r).any(axis=-1)
x = alist[inds,:] # the valid rows
The trick is that we take the first column of alist, make it an (N,1)-shaped array, and use array broadcasting in the comparison to end up with an (N,2)-shaped boolean array (one column per element of r). If any of the values in a given row is True, we keep that row. The resulting index array is exactly the same as the np.in1d one in Divakar's answer.
