I want to make a 2D array of 2-tuples of fixed dimension (say 10x10).
e.g.
[[(1,2), (1,2), (1,2)],
[(1,2), (1,2), (1,2)],
[(1,2), (1,2), (1,2)]]
There are also two ways that I'd like to generate this array:
1. An array like the example above where every element is the same tuple
2. An array which I populate iteratively with specific tuples (possibly starting with an empty array of fixed size and then using assignment)
How would I go about doing this? For #1 I tried using numpy.tile:
>>> np.tile(np.array([1,2]), (3, 3))
array([[1, 2, 1, 2, 1, 2],
[1, 2, 1, 2, 1, 2],
[1, 2, 1, 2, 1, 2]])
But I can't seem to copy it across columns; the columns are just concatenated.
That is, I get the above instead of:
[[[1,2], [1,2], [1,2]],
[[1,2], [1,2], [1,2]],
[[1,2], [1,2], [1,2]]]
You can use numpy.full:
numpy.full((3, 3, 2), (1, 2))
Output:
array([[[1, 2],
[1, 2],
[1, 2]],
[[1, 2],
[1, 2],
[1, 2]],
[[1, 2],
[1, 2],
[1, 2]]])
For #1 you can generate it like this (as a nested list rather than a NumPy array):
[[(1,2)] * 3]*3
# gives [[(1, 2), (1, 2), (1, 2)], [(1, 2), (1, 2), (1, 2)], [(1, 2), (1, 2), (1, 2)]]
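One caveat with list multiplication is worth illustrating: the outer `* 3` repeats a reference to the same inner list, so the rows are not independent. The sketch below uses made-up variable names (`shared`, `independent`) to show the effect and a comprehension-based alternative.

```python
# Outer multiplication repeats a reference to one inner list.
shared = [[(1, 2)] * 3] * 3
shared[0][0] = (9, 9)
print(shared[1][0])        # the change shows up in every row

# A nested comprehension builds a separate list for each row.
independent = [[(1, 2) for _ in range(3)] for _ in range(3)]
independent[0][0] = (9, 9)
print(independent[1][0])   # other rows are unaffected
```

If the rows are only ever read, the multiplication form is fine; the comprehension matters once elements are assigned individually.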
numpy.zeros((3,3,2))
I guess this would work (but the elements aren't tuples, they're length-2 arrays...)
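For the second case in the question (filling a fixed-size array step by step), here is a minimal sketch of two possible layouts. The names `grid` and `cells` are illustrative only; which layout fits depends on whether real Python tuples are needed or a trailing axis of length 2 is enough.

```python
import numpy as np

# Option A: a (rows, cols, 2) numeric array; the last axis plays the
# role of the 2-tuple, so normal vectorised NumPy operations still apply.
grid = np.zeros((3, 3, 2), dtype=int)
for i in range(3):
    for j in range(3):
        grid[i, j] = (i, j)      # assign one pair at a time

# Option B: a (rows, cols) object array whose cells hold real tuples.
cells = np.empty((3, 3), dtype=object)
for i in range(3):
    for j in range(3):
        cells[i, j] = (i, j)     # each cell stores a Python tuple

print(grid[1, 2])    # a length-2 integer array
print(cells[1, 2])   # a Python tuple
```

Option A is usually the more NumPy-friendly choice; object arrays keep the tuples but give up most of the speed benefits.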
I need help in storing the combinations of column vectors' values in a numpy array.
My problem consists of two column vectors, having size nx1 and mx1, with n=m, and finding n combinations.
I then vertical stacked these column vectors in a matrix, having size nx2.
I found the combinations with the itertools.combination function of python, but I struggle to store them in a numpy array, since itertools gives n rows of tuples.
The main example I found online is reported below:
import itertools
val = [1, 2, 3, 4]
com_set = itertools.combinations(val, 2)
for i in com_set:
    print(i)
Output:
(1, 2)
(1, 3)
(1, 4)
(2, 3)
(2, 4)
(3, 4)
Now, in my case, I have two vectors, val and val1, different from each other.
Also, I would need the output in a numpy array, possibly a matrix, so I can apply the maximum likelihood estimation method to these values.
You are looking for itertools.product instead of itertools.combinations.
import itertools
import numpy as np
x = [1, 2, 3]
y = [4, 5, 6]
z = list(itertools.product(x, y))
# z = [(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]
You can turn the result into a (n * n, 2) shaped array by simply passing the result to np.array:
result = np.array(z)
# array([[1, 4],
# [1, 5],
# [1, 6],
# [2, 4],
# [2, 5],
# [2, 6],
# [3, 4],
# [3, 5],
# [3, 6]])
Finally, you can also do this with numpy directly, albeit in a different order:
result = np.stack(np.meshgrid(x, y)).reshape(2, -1).T
# array([[1, 4],
# [2, 4],
# [3, 4],
# [1, 5],
# [2, 5],
# [3, 5],
# [1, 6],
# [2, 6],
# [3, 6]])
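If the row order of itertools.product is wanted from the pure-NumPy route, one possible variation is to ask meshgrid for matrix ("ij") indexing and stack along the last axis. This is a sketch rather than the answer's own method; `pairs` is an illustrative name.

```python
import itertools
import numpy as np

x = [1, 2, 3]
y = [4, 5, 6]

# indexing="ij" makes the first input vary along axis 0, so flattening
# in row-major order walks through y for each fixed x, like product().
gx, gy = np.meshgrid(x, y, indexing="ij")
pairs = np.stack((gx, gy), axis=-1).reshape(-1, 2)

print(pairs)
```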
I have a list consisting of two numbers:
[0, 3]
The first number is the starting number and the second number is the ending number. I want to find all the unique combinations of numbers in that range so that the first number is not the same as the second number. E.g:
[0, 1], [0, 2], [0, 3], [1, 3], [1, 2], [2, 3]
I can't seem to find a similar solution online. I've seen using itertools but it doesn't apply in my situation. How would you do this in Python? Thanks
Can you try the following:
import itertools
result = list(itertools.combinations([0, 1, 2, 3], 2))
print(result)
Output:
[(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
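Since the question starts from the two endpoints rather than the full list, the range can be built from them first. A small sketch, assuming the end value is inclusive as in the example; `pairs_in_range` is a hypothetical helper name.

```python
import itertools

def pairs_in_range(bounds):
    """Return all 2-element combinations of the integers in [start, end]."""
    start, end = bounds
    # range() excludes its stop value, hence end + 1.
    return list(itertools.combinations(range(start, end + 1), 2))

print(pairs_in_range([0, 3]))
```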
Another post I had does exactly what I wanted, but I cannot seem to implement it on a structured array.
Say I have an array like so:
>>> arr = np.empty(2, dtype=np.dtype([('xy', np.float32, (2, 2))]))
>>> arr['xy']
array([[[1., 1.],
        [2., 2.]],
       [[3., 3.],
        [4., 4.]]], dtype=float32)
I need to pad it so that the last row in each subarray is repeated a specific number of times:
arr['xy'] = np.pad(arr['xy'], [(0, 0), (0, 2), (0, 0)], mode='edge')
However I'm getting a ValueError:
ValueError: could not broadcast input array from shape (2, 4, 2) into shape (2, 2, 2)
So without a structured array, I tried the following:
>>> arr = np.array([[[1, 1], [2, 2]], [[3, 3], [4, 4]]])
>>> arr
array([[[1, 1],
        [2, 2]],
       [[3, 3],
        [4, 4]]])
>>> arr = np.pad(arr, [(0, 0), (0, 2), (0, 0)], mode='edge')
>>> arr
array([[[1, 1],
        [2, 2],
        [2, 2],
        [2, 2]],
       [[3, 3],
        [4, 4],
        [4, 4],
        [4, 4]]])
How come I cannot repeat with a structured array?
Your padding works; it's the assignment to arr["xy"] that fails. The shape of a field is fixed by the structured dtype, so you can't change it by assignment.
>>> arr = np.empty(2, dtype=np.dtype([('xy', np.float32, (2, 2))]))
>>> ar2 = np.pad(arr['xy'], [(0, 0), (0, 2), (0, 0)], mode='edge')
>>> ar2.shape
(2, 4, 2)
>>> arr["xy"] = ar2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not broadcast input array from shape (2,4,2) into shape (2,2,2)
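One way around this, sketched below under the assumption that a new array is acceptable, is to build a second structured array whose field has the padded shape and copy the padded data into it. The names `new_dtype` and `padded` are illustrative.

```python
import numpy as np

arr = np.zeros(2, dtype=np.dtype([('xy', np.float32, (2, 2))]))
arr['xy'] = [[[1, 1], [2, 2]], [[3, 3], [4, 4]]]

# Pad the plain (2, 2, 2) field data along its middle axis.
padded_xy = np.pad(arr['xy'], [(0, 0), (0, 2), (0, 0)], mode='edge')

# The field shape is part of the dtype, so a new dtype is needed.
new_dtype = np.dtype([('xy', np.float32, padded_xy.shape[1:])])
padded = np.empty(arr.shape, dtype=new_dtype)
padded['xy'] = padded_xy

print(padded['xy'].shape)
```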
This might be a late reply, but you can use the following instead.
Assume that you have a structured ndarray input "sample", a pad_value_dict of the form {name: pad_value} holding the values you want to append, and pad_len, the number of padding records to add.
keys = sample.dtype.names  # assumed here: the field names, in dtype order
pad_arr = np.array(
    [tuple(pad_value_dict[k] for k in keys)],
    dtype=sample.dtype
)
pad_arr = np.tile(pad_arr, pad_len)
result = np.append(sample, pad_arr)
For a 1D NumPy array, I am looking to get the combinations without the same elements being repeated in a combination. The order is important. So, [a,b] and [b,a] would be two distinct combinations. Since we don't want repeats, [a,a] and [b,b] aren't valid combinations. For simplicity, let's keep it to two elements per combination. Thus, the output would be a 2D NumPy array with 2 columns.
The desired result would be essentially the same as the itertools.product output, except that we need to mask out the combinations that are repeated. As such, we can solve it for a sample case, like so -
In [510]: import numpy as np
In [511]: a = np.array([4,2,9,1,3])
In [512]: from itertools import product
In [513]: np.array(list(product(a,repeat=2)))[~np.eye(len(a),dtype=bool).ravel()]
Out[513]:
array([[4, 2],
[4, 9],
[4, 1],
[4, 3],
[2, 4],
[2, 9],
[2, 1],
[2, 3],
[9, 4],
[9, 2],
[9, 1],
[9, 3],
[1, 4],
[1, 2],
[1, 9],
[1, 3],
[3, 4],
[3, 2],
[3, 9],
[3, 1]])
But, creating that huge array and then masking out and hence not using some elements, doesn't look too efficient to me.
That got me thinking if numpy.ndarray.strides could be leveraged here. I have one solution with that idea in mind, which I will be posting as an answer post, but would love to see other efficient ones.
In terms of usage - We come across these cases with adjacency matrices among others and I thought it would be good to solve such a problem. For easier and efficient plug-n-play into other problems, it would be nice to have the final output that's not a view of some intermediate array.
It seems np.lib.stride_tricks.as_strided could be used to keep the intermediate results as views and delay the copying until the final stage, where we assign into an initialized array. The implementation has two steps, with some extra work needed for the second column (as shown in the sample case in the question). We call the helper for that column one-cold: a fancy name denoting that one element is missing (is "cold") in each interval of len(input_array) - 1 elements.
def onecold(a):
    n = len(a)
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    b = np.concatenate((a,a[:-1]))
    return strided(b[1:], shape=(n-1,n), strides=(s,s))
To showcase, onecold with a sample case -
In [563]: a
Out[563]: array([4, 2, 9, 1, 3])
In [564]: onecold(a).reshape(len(a),-1)
Out[564]:
array([[2, 9, 1, 3],
[4, 9, 1, 3],
[4, 2, 1, 3],
[4, 2, 9, 3],
[4, 2, 9, 1]])
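For readers wary of raw as_strided (which does no bounds checking), the same windowed view can probably be obtained with sliding_window_view, available in NumPy 1.20 and later. This is a sketch of an alternative, not the answer's own code; `onecold_windows` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def onecold_windows(a):
    """Same (n-1, n) view as onecold, built from length-n sliding windows."""
    n = len(a)
    b = np.concatenate((a, a[:-1]))
    # b[1:] has 2n - 2 elements, so there are exactly n - 1 windows of length n.
    return sliding_window_view(b[1:], n)

a = np.array([4, 2, 9, 1, 3])
print(onecold_windows(a).reshape(len(a), -1))
```

The returned view is read-only, which is enough here because it is only copied from.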
To solve the original problem, we will use it like so -
def combinations_without_repeat(a):
    n = len(a)
    out = np.empty((n,n-1,2),dtype=a.dtype)
    out[:,:,0] = np.broadcast_to(a[:,None], (n, n-1))
    out.shape = (n-1,n,2)
    out[:,:,1] = onecold(a)
    out.shape = (-1,2)
    return out
Sample run -
In [574]: a
Out[574]: array([4, 2, 9, 1, 3])
In [575]: combinations_without_repeat(a)
Out[575]:
array([[4, 2],
[4, 9],
[4, 1],
[4, 3],
[2, 4],
[2, 9],
[2, 1],
[2, 3],
[9, 4],
[9, 2],
[9, 1],
[9, 3],
[1, 4],
[1, 2],
[1, 9],
[1, 3],
[3, 4],
[3, 2],
[3, 9],
[3, 1]])
Seems quite efficient for a 1000 elements array of ints -
In [578]: a = np.random.randint(0,9,(1000))
In [579]: %timeit combinations_without_repeat(a)
100 loops, best of 3: 2.35 ms per loop
Would love to see others!
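As a simple cross-check for any of these approaches, the pairs can also be built from the off-diagonal index pairs directly. This sketch still allocates an n x n boolean mask, so it is meant for verifying results rather than as a faster alternative; `ordered_pairs_by_index` is a hypothetical name.

```python
import numpy as np
from itertools import permutations

def ordered_pairs_by_index(a):
    """All ordered pairs (a[i], a[j]) with i != j, in row-major (i, j) order."""
    n = len(a)
    # Off-diagonal positions of an n x n grid are exactly the pairs with i != j.
    i, j = np.nonzero(~np.eye(n, dtype=bool))
    return np.column_stack((a[i], a[j]))

a = np.array([4, 2, 9, 1, 3])
pairs = ordered_pairs_by_index(a)
print(pairs.shape)
```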
"It would be essentially same as itertools.product output, except that we need to mask out the combinations that are repeated." Actually, what you want is itertools.permutations:
In [7]: import numpy as np
In [8]: from itertools import permutations
In [9]: a = np.array([4,2,9,1,3])
In [10]: list(permutations(a, 2))
Out[10]:
[(4, 2),
(4, 9),
(4, 1),
(4, 3),
(2, 4),
(2, 9),
(2, 1),
(2, 3),
(9, 4),
(9, 2),
(9, 1),
(9, 3),
(1, 4),
(1, 2),
(1, 9),
(1, 3),
(3, 4),
(3, 2),
(3, 9),
(3, 1)]
Benchmarking Post
Posting the performance numbers/figures for the proposed approaches thus far in this wiki-post.
Proposed solutions :
import numpy as np
from itertools import permutations
# https://stackoverflow.com/a/48234170/ #Divakar
def onecold(a):
    n = len(a)
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    b = np.concatenate((a,a[:-1]))
    return strided(b[1:], shape=(n-1,n), strides=(s,s))
# https://stackoverflow.com/a/48234170/ #Divakar
def combinations_without_repeat(a):
    n = len(a)
    out = np.empty((n,n-1,2),dtype=a.dtype)
    out[:,:,0] = np.broadcast_to(a[:,None], (n, n-1))
    out.shape = (n-1,n,2)
    out[:,:,1] = onecold(a)
    out.shape = (-1,2)
    return out
# https://stackoverflow.com/a/48234349/ #Warren Weckesser
def itertools_permutations(a):
    return np.array(list(permutations(a, 2)))
Using the benchit package (a few benchmarking tools packaged together; disclaimer: I am its author) to benchmark the proposed solutions:
import benchit
in_ = [np.random.rand(n) for n in [10,20,50,100,200,500,1000]]
funcs = [combinations_without_repeat, itertools_permutations]
t = benchit.timings(funcs, in_)
t.rank()
t.plot(logx=True, save='timings.png')
I am trying to get the subset x of the given NumPy array alist such that the first element of each row must be in the list r.
>>> import numpy
>>> alist = numpy.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1), (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
>>> alist
array([[0, 2],
[0, 4],
[1, 3],
[1, 4],
[2, 1],
[3, 1],
[3, 2],
[4, 1],
[4, 3],
[4, 2]])
>>> r = [1,3]
>>> x = alist[where first element of each row is in r] #this i need to figure out.
>>> x
array([[1, 3],
[1, 4],
[3, 1],
[3, 2]])
Any easy way (without looping as I've a large dataset) to do this in Python?
Slice the first column off the input array (that is, select the first element of each row), then use np.in1d with r as the second input to create a mask of the valid rows, and finally index into the rows of the array with that mask to select them.
Thus, the implementation would be like so -
alist[np.in1d(alist[:,0],r)]
Sample run -
In [258]: alist # Input array
Out[258]:
array([[0, 2],
[0, 4],
[1, 3],
[1, 4],
[2, 1],
[3, 1],
[3, 2],
[4, 1],
[4, 3],
[4, 2]])
In [259]: r # Input list to be searched for
Out[259]: [1, 3]
In [260]: np.in1d(alist[:,0],r) # Mask of valid rows
Out[260]: array([False, False, True, True, False, True, True,
False, False, False], dtype=bool)
In [261]: alist[np.in1d(alist[:,0],r)] # Index and select for final o/p
Out[261]:
array([[1, 3],
[1, 4],
[3, 1],
[3, 2]])
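On recent NumPy versions, np.isin is the recommended spelling of the same membership test (np.in1d is deprecated as of NumPy 2.0). A minimal sketch of the equivalent call, using the data from the question; `mask` is an illustrative name.

```python
import numpy as np

alist = np.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1),
                  (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
r = [1, 3]

# Boolean mask: True where the first element of a row appears in r.
mask = np.isin(alist[:, 0], r)
x = alist[mask]

print(x)
```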
You can construct the index array for the valid rows using some indexing tricks: we can add an additional dimension and check equality with each element of your first column:
import numpy as np
alist = np.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1),
                  (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
r = [1, 3]  # values to look for, as in the question
inds = (alist[:,0][:,None] == r).any(axis=-1)
x = alist[inds,:] # the valid rows
The trick is that we take the first column of alist, make it an (N,1)-shaped array, and make use of array broadcasting in the comparison to end up with an (N, len(r))-shaped boolean array (here (N,2)). If any of the values in a given row is True, we keep that row. The resulting boolean index array is exactly the same as the np.in1d one in Divakar's answer.
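The intermediate shapes in that explanation can be inspected step by step. The sketch below splits the one-liner into named pieces (`first_col`, `matches` are illustrative names) using the data from the question.

```python
import numpy as np

alist = np.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1),
                  (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
r = [1, 3]

first_col = alist[:, 0][:, None]   # shape (N, 1)
matches = first_col == r           # broadcasts against r: shape (N, len(r))
inds = matches.any(axis=-1)        # shape (N,), True if any value in r matched

print(first_col.shape, matches.shape, inds.shape)
print(alist[inds])
```

Note that the comparison materialises N * len(r) booleans, so for a long r the membership-test functions may be the more memory-friendly option.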