Creating numpy array from calculations across arrays - python

I currently have the task of creating a 4x4 array with operations performed on the cells.
Below you will see a function the_matrix that takes in array and returns adj_array.
It has a for loop that is supposed to loop through array, look at each cell in ref_array and, upon finding the matching first two numbers in array (like 6,3), put the corresponding function (lambda N: 30) into its respective cell in adj_array, and likewise for all cells in the 4x4 matrix.
Essentially the function should return an array like this:
array([[inf, <function <lambda> at 0x00000291139AF790>,
<function <lambda> at 0x00000291139AF820>, inf],
[inf, inf, inf, <function <lambda> at 0x00000291139AF8B0>],
[inf, inf, inf, <function <lambda> at 0x00000291139AF940>],
[inf, inf, inf, inf]], dtype=object)
My work so far below
def the_matrix(array):
    ref_array = np.zeros((4,4), dtype = object)
    ref_array[0,0] = (5,0)
    ref_array[0,1] = (5,1)
    ref_array[0,2] = (5,2)
    ref_array[0,3] = (5,3)
    ref_array[1,0] = (6,0)
    ref_array[1,1] = (6,1)
    ref_array[1,2] = (6,2)
    ref_array[1,3] = (6,3)
    ref_array[2,0] = (7,0)
    ref_array[2,1] = (7,1)
    ref_array[2,2] = (7,2)
    ref_array[2,3] = (7,3)
    ref_array[3,0] = (8,0)
    ref_array[3,1] = (8,1)
    ref_array[3,2] = (8,2)
    ref_array[3,3] = (8,3)
    for i in ref_array:
        for a in i: #Expecting to get (5,1) here, but it's showing me array
            if a == array[0, 0:2]: #This specific slice was a test
                # (pseudocode) put the function in that cell of adj_array
    return adj_array

array = np.array([[5, 1, lambda N: 120],
                  [5, 2, lambda N: 30],
                  [6, 3, lambda N: 30],
                  [7, 3, lambda N: N/30]])
Have tried variations of this for loop, and it's throwing errors. For one, the a in the for loop is displaying the input argument array, which is weird because it hasn't been called in the loop at that stage. My intention here is to refer to the exact cell in ref_array.
Not sure where I'm going wrong here and how I'm improperly looping through. Any help appreciated

Your ref_array is object dtype, (4,4) containing tuples:
In [26]: ref_array
Out[26]:
array([[(5, 0), (5, 1), (5, 2), (5, 3)],
[(6, 0), (6, 1), (6, 2), (6, 3)],
[(7, 0), (7, 1), (7, 2), (7, 3)],
[(8, 0), (8, 1), (8, 2), (8, 3)]], dtype=object)
Your iteration, just showing the iteration variables (I'm using repr):
In [28]: for i in ref_array:
    ...:     print(repr(i))
    ...:     for a in i:
    ...:         print(repr(a))
    ...:
array([(5, 0), (5, 1), (5, 2), (5, 3)], dtype=object)
(5, 0)
(5, 1)
(5, 2)
(5, 3)
...
So i is a "row" of the array, itself a 1d object dtype array.
a is one of those objects, a tuple.
Your description of the alternatives is vague. But assume one tries to start with a numeric dtype array:
In [30]: arr = np.array(ref_array.tolist())
In [31]: arr
Out[31]:
array([[[5, 0],
[5, 1],
[5, 2],
[5, 3]],
...
[8, 2],
[8, 3]]])
In [32]: arr.shape
Out[32]: (4, 4, 2)
now the looping:
In [33]: for i in arr:
    ...:     print(repr(i))
    ...:     for a in i:
    ...:         print(repr(a))
    ...:
array([[5, 0], # i is a (4,2) array
[5, 1],
[5, 2],
[5, 3]])
array([5, 0]) # a is (2,) array....
array([5, 1])
array([5, 2])
array([5, 3])
If "the a in the for loop is displaying the input argument array", it's most likely because a IS an array.
Keep in mind that object dtype arrays are processed at list speeds. You might as well think of them as bastardized lists. While they have some array enhancements (multidimensional indexing etc), the elements are still references, and are processed as in lists.
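A minimal sketch of why the comparison in the question's if statement misbehaves, assuming the same layout of array as in the question (the names key, row, elementwise and matches are illustrative, not from the original). Comparing a tuple with an array slice is elementwise, so the result cannot be used directly in an if; converting the slice to a tuple gives a single Python bool:

```python
import numpy as np

# Two rows in the same layout as the question's input (object dtype, shape (2, 3)).
array = np.array([[5, 1, lambda N: 120],
                  [5, 2, lambda N: 30]], dtype=object)

key = (5, 1)            # one element of ref_array
row = array[0, 0:2]     # a 1d object array, not a tuple

elementwise = (key == row)   # an array with one result per element
try:
    if elementwise:          # ambiguous for an array with more than one element
        pass
except ValueError as exc:
    print("ValueError:", exc)

matches = tuple(row) == key  # plain tuple comparison, a single bool
print(matches)
```

With a single bool available, the lookup can be written as an ordinary Python loop over the rows of array.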
I haven't paid attention as to why you are putting lambdas in the array. It looks ugly, and I don't see what it gains you. They can't be "evaluated" at array speeds. You'd have to do some sort of iteration or list comprehension.
edit
A more direct way of generating the arr, derived from ref_array:
In [39]: I,J = np.meshgrid(np.arange(5,9), np.arange(0,4), indexing='ij')
In [40]: I
Out[40]:
array([[5, 5, 5, 5],
[6, 6, 6, 6],
[7, 7, 7, 7],
[8, 8, 8, 8]])
In [41]: J
Out[41]:
array([[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]])
In [42]: arr = np.stack((I,J), axis=2) # shape (4,4,2)
If the function was something like
In [46]: def foo(I,J):
    ...:     return I*10 + J
    ...:
You could easily generate a value for each pair of the values in ref_array.
In [47]: foo(I,J)
Out[47]:
array([[50, 51, 52, 53],
[60, 61, 62, 63],
[70, 71, 72, 73],
[80, 81, 82, 83]])
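For the array the question actually asks for, here is a hedged sketch rather than a definitive implementation. It assumes that the first number of each input row (5 to 8) maps to matrix rows 0 to 3 and that the second number is the column; the parameters row_offset and shape and the name entries are illustrative and not from the original post. Because the row and column can be computed directly, no search through ref_array is needed:

```python
import numpy as np

def the_matrix(entries, row_offset=5, shape=(4, 4)):
    """Place each function at [row_key - row_offset, col]; other cells stay inf."""
    adj_array = np.full(shape, np.inf, dtype=object)
    for row_key, col, func in entries:
        adj_array[row_key - row_offset, col] = func
    return adj_array

entries = [(5, 1, lambda N: 120),
           (5, 2, lambda N: 30),
           (6, 3, lambda N: 30),
           (7, 3, lambda N: N / 30)]

adj_array = the_matrix(entries)

# Evaluating the stored functions still needs Python-level iteration.
N = 60
values = np.array([[cell(N) if callable(cell) else cell for cell in row]
                   for row in adj_array], dtype=float)
print(values)
```

The evaluation step is a nested list comprehension, which is consistent with the point above that lambdas stored in an object array cannot be evaluated at array speeds.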

Related

What is the most efficient way to use mesh grid with parameters in python?

def s(x,y,z,t1,t2):
    return x + y + z + t1 + t2
X = [1,2,3]
Y = [4,5,6]
Z = [7,8,9]
Theta = [(1,2),(3,4),(5,6),(1,1)]
Is there any way for me to efficiently construct an array containing the evaluations of X cross Y cross Z cross Theta with respect to the function s ? Note that I do not want s(1,4,7,1,3), but I do want s(1,4,7,1,2); as in, I don't want s to be evaluated at X cross Y cross Z cross {1,3,5,1} cross {2,4,6,1}.
Thanks.
X = [1,2,3]
Y = [4,5,6]
Z = [7,8,9]
Theta = [(1,2),(3,4),(5,6),(1,1)]
[(a,b,c,d) for a,b,c,d in zip(X,Y,Z,Theta)]
#[(1, 4, 7, (1, 2)), (2, 5, 8, (3, 4)), (3, 6, 9, (5, 6))]
Or
[(a,b,c,*d) for a,b,c,d in zip(X,Y,Z,Theta)]
#[(1, 4, 7, 1, 2), (2, 5, 8, 3, 4), (3, 6, 9, 5, 6)]
If you want sum:
[sum([a,b,c,*d]) for a,b,c,d in zip(X,Y,Z,Theta)]
#[15, 22, 29]
List comprehension:
results = [s(x, y, z, t1, t2) for x in X
for y in Y
for z in Z
for t1, t2 in Theta]
is the most efficient pure python way of doing this.
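The same nested loops can also be written with itertools.product, which keeps each Theta pair together as one item. This is only an equivalent spelling of the comprehension above, shown as a sketch; it is not claimed to be faster:

```python
from itertools import product

def s(x, y, z, t1, t2):
    return x + y + z + t1 + t2

X = [1, 2, 3]
Y = [4, 5, 6]
Z = [7, 8, 9]
Theta = [(1, 2), (3, 4), (5, 6), (1, 1)]

# product yields (x, y, z, (t1, t2)); the pair is unpacked into s with *theta.
results = [s(x, y, z, *theta) for x, y, z, theta in product(X, Y, Z, Theta)]
print(len(results))   # 3 * 3 * 3 * 4 combinations
```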
Starting with your lists:
In [2]: X = [1,2,3]
...: Y = [4,5,6]
...: Z = [7,8,9]
...: Theta = [(1,2),(3,4),(5,6),(1,1)]
It looks like you should split Theta into two lists, as with:
In [5]: T1,T2 = zip(*Theta)
In [6]: T1,T2
Out[6]: ((1, 3, 5, 1), (2, 4, 6, 1))
A flat zip:
In [7]: list(zip(X,Y,Z,T1,T2))
Out[7]: [(1, 4, 7, 1, 2), (2, 5, 8, 3, 4), (3, 6, 9, 5, 6)]
But to get every combination:
In [8]: [(x,y,z,t1,t2) for x in X for y in Y for z in Z for t1 in T1 for t2 in T2]
Out[8]:
[(1, 4, 7, 1, 2),
(1, 4, 7, 1, 4),
(1, 4, 7, 1, 6),
(1, 4, 7, 1, 1),
(1, 4, 7, 3, 2),
(1, 4, 7, 3, 4),
...
]
For a total of:
In [9]: len(_)
Out[9]: 432
And you could easily pass those to your function or just use sum().
But you mention mesh grid and tag numpy, so using that:
Passing these lists to meshgrid:
In [10]: Xa,Ya,Za,T1a,T2a = np.meshgrid(X,Y,Z,T1,T2, indexing='ij', sparse=True)
That makes 5 arrays, with shapes like:
In [11]: Xa.shape
Out[11]: (3, 1, 1, 1, 1)
In [12]: T1a.shape
Out[12]: (1, 1, 1, 4, 1)
If I didn't specify sparse, the meshgrid arrays would all have the same shape as res below.
In [14]: def s(x,y,z,t1,t2):
    ...:     return x + y + z + t1 + t2
    ...:
In [15]: res = s(Xa,Ya,Za,T1a,T2a)
In [16]: res.shape
Out[16]: (3, 3, 3, 4, 4)
That's the same number of combinations as with the lists, but arranged as a 5d array:
In [17]: res.size
Out[17]: 432
A sample 2d array:
In [19]: res[0,0,0]
Out[19]:
array([[15, 17, 19, 14],
[17, 19, 21, 16],
[19, 21, 23, 18],
[15, 17, 19, 14]])
If I made an array from Theta, I could have gotten the T1,T2 values by selecting columns:
In [20]: ThetaA = np.array(Theta)
In [21]: ThetaA
Out[21]:
array([[1, 2],
[3, 4],
[5, 6],
[1, 1]])
In [23]: ThetaA[:,0], T1
Out[23]: (array([1, 3, 5, 1]), (1, 3, 5, 1))
Read up on broadcasting to learn how the 'sparse' Xa, Ya, etc arrays work together to create the 5d res array.
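Note that the 5d result crosses T1 with T2, so it also contains combinations such as s(1,4,7,1,4), which the question explicitly wanted to avoid. A sketch of one way to keep each Theta pair together with broadcasting: put both Theta columns on the same (last) axis. The names t1, t2 and res_pairs are illustrative:

```python
import numpy as np

def s(x, y, z, t1, t2):
    return x + y + z + t1 + t2

X = [1, 2, 3]
Y = [4, 5, 6]
Z = [7, 8, 9]
Theta = [(1, 2), (3, 4), (5, 6), (1, 1)]

# Sparse grids for X, Y, Z only: shapes (3,1,1), (1,3,1), (1,1,3).
Xa, Ya, Za = np.meshgrid(X, Y, Z, indexing='ij', sparse=True)

ThetaA = np.array(Theta)
t1, t2 = ThetaA[:, 0], ThetaA[:, 1]   # both shape (4,), paired element by element

# A trailing axis on the grids lets t1 and t2 broadcast along the same axis.
res_pairs = s(Xa[..., None], Ya[..., None], Za[..., None], t1, t2)
print(res_pairs.shape)
print(res_pairs[0, 0, 0])   # s(1, 4, 7, t1, t2) for each pair in Theta
```

The result has 3*3*3*4 = 108 entries, one per element of X cross Y cross Z cross Theta.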

Using np.pad() on structured array

Another post I had does exactly what I wanted, but I cannot seem to implement on a structured array.
Say I have an array like so:
>>> arr = np.empty(2, dtype=np.dtype([('xy', np.float32, (2, 2))]))
>>> arr['xy']
array([[[1., 1.],
        [2., 2.]],
       [[3., 3.],
        [4., 4.]]], dtype=float32)
I need to pad it so that the last row in each subarray is repeated a specific number of times:
arr['xy'] = np.pad(arr['xy'], [(0, 0), (0, 2), (0, 0)], mode='edge')
However I'm getting a ValueError:
ValueError: could not broadcast input array from shape (2, 4, 2) into shape (2, 2, 2)
So without a structured array, I tried the following:
>>> arr = np.array([[[1, 1], [2, 2]], [[3, 3], [4, 4]]])
>>> arr
array([[[1, 1],
        [2, 2]],
       [[3, 3],
        [4, 4]]])
>>> arr = np.pad(arr, [(0, 0), (0, 2), (0, 0)], mode='edge')
>>> arr
array([[[1, 1],
        [2, 2],
        [2, 2],
        [2, 2]],
       [[3, 3],
        [4, 4],
        [4, 4],
        [4, 4]]])
How come I cannot repeat with a structured array?
Your padding works; it's the assignment to arr["xy"] that fails. You can't change the shape of a field in an existing structured array.
>>> arr = np.empty(2, dtype=np.dtype([('xy', np.float32, (2, 2))]))
>>> ar2 = np.pad(arr['xy'], [(0, 0), (0, 2), (0, 0)], mode='edge')
>>> ar2.shape
(2, 4, 2)
>>> arr["xy"] = ar2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not broadcast input array from shape (2,4,2) into shape (2,2,2)
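Since the field shape is part of the dtype, one possible workaround is to build a new structured array whose 'xy' field has the padded shape and copy the padded data into it. This is a sketch under that assumption; new_dtype and padded are illustrative names:

```python
import numpy as np

arr = np.zeros(2, dtype=[('xy', np.float32, (2, 2))])
arr['xy'] = [[[1, 1], [2, 2]], [[3, 3], [4, 4]]]

# Pad the field data on its own; this gives shape (2, 4, 2).
padded_xy = np.pad(arr['xy'], [(0, 0), (0, 2), (0, 0)], mode='edge')

# The field shape lives in the dtype, so a new dtype (and array) is needed.
new_dtype = np.dtype([('xy', np.float32, padded_xy.shape[1:])])
padded = np.empty(arr.shape, dtype=new_dtype)
padded['xy'] = padded_xy

print(padded['xy'].shape)
```

If the real array has more fields, each of them would have to be listed in the new dtype and copied over as well.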
This may be a late reply, but you can use the following approach instead.
Assume that you have a structured ndarray input sample, a dict pad_value_dict of {name: pad_value} for the records you want to append, and pad_len, the number of padding records.
keys = sample.dtype.names  # assumed: keys are the field names of sample
pad_arr = np.array(
    [tuple(pad_value_dict[k] for k in keys)],
    dtype=sample.dtype
)
pad_arr = np.tile(pad_arr, pad_len)
result = np.append(sample, pad_arr)

"invert" an array, i.e. convert list of 2d-indices into a 2d-array of 1d indices

The problem I want to solve in a preferably numpythonic way is this:
I have a list A of 2d indices, for example:
A = [(0, 3), (2, 2), (3, 1)]
My goal is to now get an array
[[H H H 0],
[H H H H],
[H H 1 H],
[H 2 H H]]
Where H would be some default value (for example -1)
So the problem is generally about inverting an array in this fashion.
If A is injective (no value appears twice) I can state it rigorously:
Let A be an injective array of 2d-indices.
Then, generate a 2d-array B such that B[i, j] = A.index((i, j))
Or for A not necessarily injective:
Let A be an array of 2d-indices (not necessarily injective).
Then, generate a 2d-array B such that A[B[i, j]] = (i, j)
More specifically in the non injective case we could resolve the situation with an additional "decider" function.
Say
A = [(0, 3), (2, 2), (3, 1), (0, 3)]
Then to resolve the conflict between (0, 3) being in position 0 and 3, I would like to apply some function to equivalent indices to find a definite value.
As an example:
In my case, specifically, I have a second array C with the same length as A.
If there are several candidates (2d-indices) in A for one "position" in the final 2d array, the chosen one should be the one whose 1d index in A minimizes the value in C.
I hope the problem is made clear by these examples.
Thank you for any help.
Edit: more example:
A = [(0, 3), (2, 2), (3, 1)]
print(my_dream_func(A, default=7))
>>> [[7 7 7 0],
[7 7 7 7],
[7 7 1 7],
[7 2 7 7]]
A = [(0, 3), (2, 2), (3, 1), (0, 3)]
print(my_dream_func(A, default=7))
>>> Err: an index appears twice
an alternative for this scenario:
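def resolveFunc(indices):
    c = np.array([0.5, 2.0, 3.4, -1.9])
    indices = np.asarray(indices)
    # return the candidate index whose value in c is smallest
    return indices[np.argmin(c[indices])]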
A = [(0, 3), (2, 2), (3, 1), (0, 3)]
print(my_dream_func(A, resolveFunc, default=7))
#now resolveFunc is executed on 0 and 3
#because 0.5 > -1.9, 3 is chosen as the value for (0, 3)
>>> [[7 7 7 3],
[7 7 7 7],
[7 7 1 7],
[7 2 7 7]]
I would do it as follows:
In [11]: A = np.array([(0, 3), (2, 2), (3, 1)])
In [12]: a = np.full((4, 4), 7) # here H = 7; the shape must cover the largest index in A
In [13]: a
Out[13]:
array([[7, 7, 7, 7],
[7, 7, 7, 7],
[7, 7, 7, 7],
[7, 7, 7, 7]])
In [14]: a[A[:, 0], A[:, 1]] = np.arange(len(A))
In [15]: a
Out[15]:
array([[7, 7, 7, 0],
[7, 7, 7, 7],
[7, 7, 1, 7],
[7, 2, 7, 7]])
The "decider" function is last wins.
If you want to choose a different decider function, you could specify/modify the tuple list (and enumeration) first, rather than trying to do something clever in numpy...
Numpy supports the simultaneous assignment of multiple values to multiple indices.
Using this the most numpythonic way to write your function would thus be:
import numpy as np

def f(idx, shape, default):
    arr = np.full(shape, default)
    rows, cols = np.asarray(idx).T  # split the (row, col) pairs into two index arrays
    arr[rows, cols] = np.arange(len(idx))
    return arr

shape = (4, 4)
default = 7
idx = [(1, 2), (0, 3)]
print(f(idx, shape, default))
In case of duplicate indices in idx, the last index tuple overwrites any predecessors.
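For the non-injective case with a second array C, a hedged sketch that does not depend on the order in which duplicate assignments are applied: sort the entries by cell and then by C, and keep only the first entry per cell. The function name invert_indices and its parameters are hypothetical, not from the thread:

```python
import numpy as np

def invert_indices(A, C, shape, default=-1):
    """Hypothetical helper: B[i, j] = position in A of (i, j) with the smallest C."""
    A = np.asarray(A)
    C = np.asarray(C)
    # Collapse each (row, col) pair to one flat cell number.
    flat = np.ravel_multi_index((A[:, 0], A[:, 1]), shape)
    # Sort by cell first, then by C inside each cell (lexsort's last key is primary).
    order = np.lexsort((C, flat))
    flat_sorted = flat[order]
    # The first entry of every run of equal cells has the smallest C for that cell.
    first = np.ones(len(flat_sorted), dtype=bool)
    first[1:] = flat_sorted[1:] != flat_sorted[:-1]
    B = np.full(int(np.prod(shape)), default)
    B[flat_sorted[first]] = order[first]
    return B.reshape(shape)

A = [(0, 3), (2, 2), (3, 1), (0, 3)]
C = [0.5, 2.0, 3.4, -1.9]
B = invert_indices(A, C, shape=(4, 4), default=7)
print(B)
```

With the C values from the question, position 3 is chosen for cell (0, 3) because -1.9 is smaller than 0.5.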

Combinations without repeat and ordering matters or Permutations of array elements

For a 1D NumPy array, I am looking to get the combinations without the same elements being repeated in a combination. The order is important. So, [a,b] and [b,a] would be two distinct combinations. Since we don't want repeats, [a,a] and [b,b] aren't valid combinations. For simplicity, let's keep it to two elements per combination. Thus, the output would be a 2D NumPy array with 2 columns.
The desired result would be essentially same as itertools.product output except that we need to mask out the combinations that are repeated. As such, we can solve it for a sample case, like so -
In [510]: import numpy as np
In [511]: a = np.array([4,2,9,1,3])
In [512]: from itertools import product
In [513]: np.array(list(product(a,repeat=2)))[~np.eye(len(a),dtype=bool).ravel()]
Out[513]:
array([[4, 2],
[4, 9],
[4, 1],
[4, 3],
[2, 4],
[2, 9],
[2, 1],
[2, 3],
[9, 4],
[9, 2],
[9, 1],
[9, 3],
[1, 4],
[1, 2],
[1, 9],
[1, 3],
[3, 4],
[3, 2],
[3, 9],
[3, 1]])
But, creating that huge array and then masking out and hence not using some elements, doesn't look too efficient to me.
That got me thinking if numpy.ndarray.strides could be leveraged here. I have one solution with that idea in mind, which I will be posting as an answer post, but would love to see other efficient ones.
In terms of usage - We come across these cases with adjacency matrices among others and I thought it would be good to solve such a problem. For easier and efficient plug-n-play into other problems, it would be nice to have the final output that's not a view of some intermediate array.
Seems like np.lib.stride_tricks.as_strided could be used to maximize the efficiency of views, delaying the copy until the final stage, where we assign into an initialized array. The implementation would be in two steps, with some work needed for the second column (as shown in the sample case in the question), which we are calling one-cold (a fancy name denoting that one element is missing, i.e. is cold, in each interval of len(input_array) - 1).
def onecold(a):
    n = len(a)
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    b = np.concatenate((a,a[:-1]))
    return strided(b[1:], shape=(n-1,n), strides=(s,s))
To showcase, onecold with a sample case -
In [563]: a
Out[563]: array([4, 2, 9, 1, 3])
In [564]: onecold(a).reshape(len(a),-1)
Out[564]:
array([[2, 9, 1, 3],
[4, 9, 1, 3],
[4, 2, 1, 3],
[4, 2, 9, 3],
[4, 2, 9, 1]])
To solve the original problem, we will use it like so -
def combinations_without_repeat(a):
    n = len(a)
    out = np.empty((n,n-1,2),dtype=a.dtype)
    out[:,:,0] = np.broadcast_to(a[:,None], (n, n-1))
    out.shape = (n-1,n,2)
    out[:,:,1] = onecold(a)
    out.shape = (-1,2)
    return out
Sample run -
In [574]: a
Out[574]: array([4, 2, 9, 1, 3])
In [575]: combinations_without_repeat(a)
Out[575]:
array([[4, 2],
[4, 9],
[4, 1],
[4, 3],
[2, 4],
[2, 9],
[2, 1],
[2, 3],
[9, 4],
[9, 2],
[9, 1],
[9, 3],
[1, 4],
[1, 2],
[1, 9],
[1, 3],
[3, 4],
[3, 2],
[3, 9],
[3, 1]])
Seems quite efficient for a 1000 elements array of ints -
In [578]: a = np.random.randint(0,9,(1000))
In [579]: %timeit combinations_without_repeat(a)
100 loops, best of 3: 2.35 ms per loop
Would love to see others!
"It would be essentially same as itertools.product output, except that we need to mask out the combinations that are repeated." Actually, what you want is itertools.permutations:
In [7]: import numpy as np
In [8]: from itertools import permutations
In [9]: a = np.array([4,2,9,1,3])
In [10]: list(permutations(a, 2))
Out[10]:
[(4, 2),
(4, 9),
(4, 1),
(4, 3),
(2, 4),
(2, 9),
(2, 1),
(2, 3),
(9, 4),
(9, 2),
(9, 1),
(9, 3),
(1, 4),
(1, 2),
(1, 9),
(1, 3),
(3, 4),
(3, 2),
(3, 9),
(3, 1)]
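For comparison, a sketch of an index-based variant that produces the same pairs in the same order as a NumPy array. It is not from the thread, the name offdiag_pairs is illustrative, and it still builds an n-by-n boolean mask, so no efficiency claim is made:

```python
import numpy as np
from itertools import permutations

def offdiag_pairs(a):
    """All ordered pairs (a[i], a[j]) with i != j, in row-major order."""
    n = len(a)
    # Row and column numbers of every off-diagonal position.
    i, j = np.nonzero(~np.eye(n, dtype=bool))
    return np.stack((a[i], a[j]), axis=1)

a = np.array([4, 2, 9, 1, 3])
pairs = offdiag_pairs(a)
reference = np.array(list(permutations(a, 2)))
print(np.array_equal(pairs, reference))
```

Because the pairs are selected by position rather than by value, repeated values in a are handled the same way as in itertools.permutations.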
Benchmarking Post
Posting the performance numbers/figures for the proposed approaches thus far in this wiki-post.
Proposed solutions :
import numpy as np
from itertools import permutations

# https://stackoverflow.com/a/48234170/ #Divakar
def onecold(a):
    n = len(a)
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    b = np.concatenate((a,a[:-1]))
    return strided(b[1:], shape=(n-1,n), strides=(s,s))

# https://stackoverflow.com/a/48234170/ #Divakar
def combinations_without_repeat(a):
    n = len(a)
    out = np.empty((n,n-1,2),dtype=a.dtype)
    out[:,:,0] = np.broadcast_to(a[:,None], (n, n-1))
    out.shape = (n-1,n,2)
    out[:,:,1] = onecold(a)
    out.shape = (-1,2)
    return out

# https://stackoverflow.com/a/48234349/ #Warren Weckesser
def itertools_permutations(a):
    return np.array(list(permutations(a, 2)))
Using benchit package (few benchmarking tools packaged together; disclaimer: I am its author) to benchmark proposed solutions.
import benchit
in_ = [np.random.rand(n) for n in [10,20,50,100,200,500,1000]]
funcs = [combinations_without_repeat, itertools_permutations]
t = benchit.timings(funcs, in_)
t.rank()
t.plot(logx=True, save='timings.png')

Return the subset of NumPy array according to the first element of each row

I am trying to get the subset x of the given NumPy array alist such that the first element of each row must be in the list r.
>>> import numpy
>>> alist = numpy.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1), (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
>>> alist
array([[0, 2],
[0, 4],
[1, 3],
[1, 4],
[2, 1],
[3, 1],
[3, 2],
[4, 1],
[4, 3],
[4, 2]])
>>> r = [1,3]
>>> x = alist[where first element of each row is in r] #this i need to figure out.
>>> x
array([[1, 3],
[1, 4],
[3, 1],
[3, 2]])
Any easy way (without looping as I've a large dataset) to do this in Python?
Slice the first column off input array (basically selecting first elem from each row), then use np.in1d with r as the second input to create a mask of such valid rows and finally index into the rows of the array with the mask to select the valid ones.
Thus, the implementation would be like so -
alist[np.in1d(alist[:,0],r)]
Sample run -
In [258]: alist # Input array
Out[258]:
array([[0, 2],
[0, 4],
[1, 3],
[1, 4],
[2, 1],
[3, 1],
[3, 2],
[4, 1],
[4, 3],
[4, 2]])
In [259]: r # Input list to be searched for
Out[259]: [1, 3]
In [260]: np.in1d(alist[:,0],r) # Mask of valid rows
Out[260]: array([False, False, True, True, False, True, True,
False, False, False], dtype=bool)
In [261]: alist[np.in1d(alist[:,0],r)] # Index and select for final o/p
Out[261]:
array([[1, 3],
[1, 4],
[3, 1],
[3, 2]])
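Newer NumPy releases recommend np.isin over np.in1d. A minimal sketch of the same selection with np.isin, using the data from the question (the name mask is illustrative):

```python
import numpy as np

alist = np.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1),
                  (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
r = [1, 3]

# True for every row whose first element appears in r.
mask = np.isin(alist[:, 0], r)
x = alist[mask]
print(x)
```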
You can construct the index array for the valid rows using some indexing tricks: we can add an additional dimension and check equality with each element of your first column:
import numpy as np
alist = np.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1),
                  (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
r = [1, 3]
inds = (alist[:,0][:,None] == r).any(axis=-1)
x = alist[inds,:] # the valid rows
The trick is that we take the first column of alist, make it an (N,1)-shaped array, make use of array broadcasting in the comparison to end up with an (N,len(r))-shaped boolean array (here (N,2)), and if any of the values in a given row is True, we keep that index. The resulting index array is exactly the same as the np.in1d one in Divakar's answer.
