Cross-reference between numpy arrays - python

I have a 1d array of ids, for example:
a = [1, 3, 4, 7, 9]
Then another 2d array:
b = [[1, 4, 7, 9], [3, 7, 9, 1]]
I would like to have a third array with the same shape of b where each item is the index of the corresponding item from a, that is:
c = [[0, 2, 3, 4], [1, 3, 4, 0]]
What's a vectorized way to do that using numpy?

This may look odd, but you can use np.interp to do that:
import numpy as np
a = [1, 3, 4, 7, 9]
sorting = np.argsort(a)
positions = np.arange(0, len(a))
xp = np.array(a)[sorting]
fp = positions[sorting]
b = [[1, 4, 7, 9], [3, 7, 9, 1]]
c = np.rint(np.interp(b, xp, fp)).astype(int) # round with rint before casting, because floats are tricky.
# astype(int) alone may be faster for small len(a), but it truncates and is not recommended.
This should work as long as len(a) stays below the largest integer that a float represents exactly (np.interp computes in float64, so about 2**53; the 16,777,217 figure applies to float32). The cost is O(n*log(n)) for the sort, plus roughly b.size*log(len(a)) for the lookups.
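The same sort-then-look-up idea can be sketched with np.searchsorted, which stays in integer arithmetic and so needs no rounding step. This is a minimal sketch rather than part of the answer above: the helper name index_of is hypothetical, and it assumes that every value in b occurs in a and that a has no duplicates.

```python
import numpy as np

def index_of(a, b):
    """Return, for each element of b, its position in a (hypothetical helper).

    Assumes every value of b occurs in a; a does not need to be sorted.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    order = np.argsort(a)                      # original positions, in sorted order
    pos = np.searchsorted(a, b, sorter=order)  # rank of each b value within sorted a
    return order[pos]                          # map ranks back to original positions

a = [1, 3, 4, 7, 9]
b = [[1, 4, 7, 9], [3, 7, 9, 1]]
c = index_of(a, b)
print(c.tolist())  # [[0, 2, 3, 4], [1, 3, 4, 0]]
```

If b can contain values that are missing from a, the result would silently point at a neighbouring element (or raise an IndexError for values above the maximum), so a check such as a[c] == b would be needed.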

Effectively, this solution is a one-liner. The only catch is that you need to reshape the array before you do the one-liner, and then reshape it back again:
import numpy as np
a = np.array([1, 3, 4, 7, 9])
b = np.array([[1, 4, 7, 9], [3, 7, 9, 1]])
original_shape = b.shape
c = np.where(b.reshape(b.size, 1) == a)[1]
c = c.reshape(original_shape)
This results in:
[[0 2 3 4]
[1 3 4 0]]

Broadcasting to the rescue!
>>> ((np.arange(1, len(a) + 1)[:, None, None]) * (a[:, None, None] == b)).sum(axis=0) - 1
array([[0, 2, 3, 4],
[1, 3, 4, 0]])

Related

Comparing 2-dimensional numpy array in python

I want to compare the two rows of a 2-dimensional numpy array. The array is as follows:
a = [[1, 3, 5],
[4, 8, 1]]
I want to compare [1, 3, 5] with [4, 8, 1] element by element, collecting the greater values into one group and the smaller values into another.
The result I want is like this:
a1 = [4, 8, 5]
a2 = [1, 3, 1]
How could the code be written in python?
You can use np.sort on axis 0 (column-wise). Reverse the order using [::-1] to get them in descending order
>>> np.sort(a, axis = 0)[::-1]
array([[4, 8, 5],
[1, 3, 1]])
Since you only have 2 rows you can do this:
a = np.array([[1, 3, 5],
[4, 8, 1]])
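idx = np.greater(*a) # True where row 0 holds the larger value
a1 = a[(~idx).astype(int), np.arange(3)] # larger value of each column
a2 = a[idx.astype(int), np.arange(3)] # smaller value of each column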
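For exactly two rows, the element-wise np.maximum and np.minimum functions give the same split without building an index. A minimal sketch, assuming the two-row array from the question:

```python
import numpy as np

a = np.array([[1, 3, 5],
              [4, 8, 1]])

a1 = np.maximum(a[0], a[1])  # larger value in each column
a2 = np.minimum(a[0], a[1])  # smaller value in each column

print(a1.tolist())  # [4, 8, 5]
print(a2.tolist())  # [1, 3, 1]
```

With more than two rows this pairwise form no longer applies, and sorting along axis 0 is the more general route.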

Replace values in array of indexes corresponding to another array

I have an array A of size [1, x] of values and an array B of size [1, y] (y > x) of indexes corresponding to array A. I want as result an array C of size [1,y] filled with values of A.
Here is an example of inputs and outputs:
>>> A = [6, 7, 8]
>>> B = [0, 2, 0, 0, 1]
>>> C = #Some operations
>>> C
[6, 8, 6, 6, 7]
Of course I could solve it like that:
>>> C = []
>>> for val in B:
...     C.append(A[val])
But I was actually expecting a nicer way to do it, especially because I want to use it as an argument of another function. An expression looking like A[B] (but a working one) would be ideal. I don't mind a solution using NumPy or pandas.
Simple with a list comprehension:
A = [6, 7, 8]
B = [0, 2, 0, 0, 1]
C = [A[i] for i in B]
print(C)
This yields
[6, 8, 6, 6, 7]
For fetching multiple items operator.itemgetter comes in handy:
from operator import itemgetter
A = [6, 7, 8]
B = [0, 2, 0, 0, 1]
itemgetter(*B)(A)
# (6, 8, 6, 6, 7)
Also as you've mentioned numpy, this could be done directly by indexing the array as you've specified, i.e. A[B]:
import numpy as np
A = np.array([6, 7, 8])
B = np.array([0, 2, 0, 0, 1])
A[B]
# array([6, 8, 6, 6, 7])
Another option is to use np.take:
np.take(A,B)
# array([6, 8, 6, 6, 7])
This is one way, using numpy ndarrays:
import numpy as np
A = [6, 7, 8]
B = [0, 2, 0, 0, 1]
C = list(np.array(A)[B]) # No need to convert B into an ndarray
# list() is for converting ndarray back into a list,
# (if that's what you finally want)
print (C)
Explanation
Given a numpy ndarray (np.array(A)), we can index into it using an
array of integers (which happens to be exactly what your preferred
form of solution is): The array of integers that you use for
indexing into the ndarray, need not be another ndarray. It can even
be a list, and that suits us too, since B happens to be a list. So,
what we have is:
np.array(A)[B]
The result of such an indexing would be another ndarray, having the
same shape (dimensions) as the array of indexes. So, in our case, as
we are indexing into an ndarray using a list of integer indexes, the
result of that indexing would be a one-dimensional ndarray of the
same length as the list of indexes.
Finally, if we want to convert the above result, from a
one-dimensional ndarray back into a list, we can pass it as an
argument to list():
list(np.array(A)[B])
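To illustrate the point that the result takes the shape of the index rather than the shape of the indexed array, here is a small sketch. The 2-D index array idx2d is a made-up example, not something from the question:

```python
import numpy as np

A = np.array([6, 7, 8])

flat = A[[0, 2, 0, 0, 1]]           # a plain list works as a 1-D index
idx2d = np.array([[0, 2], [1, 1]])  # hypothetical 2-D index array
grid = A[idx2d]                     # result has the shape of idx2d

print(flat.shape)     # (5,)
print(grid.shape)     # (2, 2)
print(grid.tolist())  # [[6, 8], [7, 7]]
```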
You could do it with list comprehension:
>>> A = [6, 7, 8]
>>> B = [0, 2, 0, 0, 1]
>>> C = [A[x] for x in B]
>>> print(C)
[6, 8, 6, 6, 7]
I think you need a list comprehension:
A = [1, 2, 3]
B = [0, 2, 0, 0, 1]
C = [A[i] for i in B]
Once you're using numpy.array you're able to do exactly what you want with syntax you expect:
>>> from numpy import array
>>> a = array([6, 7, 8])
>>> b = array([0, 2, 0, 0, 1])
>>> a[b]
array([6, 8, 6, 6, 7])

Python numpy indexing confusion

I'm new to Python. I was looking into some code similar to the following:
import numpy as np
a = np.ones([1,1,5,5], dtype='int64')
b = np.ones([11], dtype='float64')
x = b[a]
print (x.shape)
# (1, 1, 5, 5)
I looked into the Python and NumPy documentation but didn't find anything related to this case. I'm not sure what's going on here, and I don't know where to look.
Edit
The actual code
def gausslabel(length=180, stride=2):
    gaussian_pdf = signal.gaussian(length+1, 3)
    label = np.reshape(np.arange(stride/2, length, stride), [1,1,-1,1])
    y = np.reshape(np.arange(stride/2, length, stride), [1,1,1,-1])
    delta = np.array(np.abs(label - y), dtype=int)
    delta = np.minimum(delta, length-delta)+length/2
    return gaussian_pdf[delta]
I guess that this code is trying to demonstrate that if you index an array with an array, the result is an array with the same shape as the indexing array (in this case a) and not the indexed array (i.e. b)
But it's confusing because b is full of 1s. Rather try this with a b full of different numbers:
>>> a = np.ones([1,1,5,5], dtype='int64')
>>> b = np.arange(11) + 3
>>> b
array([ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])
>>> b[a]
array([[[[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4]]]])
Because a is an array of 1s, the only element of b that is indexed is b[1], which equals 4. The shape of the result, though, is the shape of a, the array used as the index.

How to select value from array that is closest to value in array using vectorization?

I have an array of values, and I want to replace each one with the element from an array of choices that is linearly closest to it.
The catch is the size of the choices is defined at runtime.
import numpy as np
a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([1, 5, 10])
If choices was static in size, I would simply use np.where:
d = np.where(np.abs(a - choices[0]) < np.abs(a - choices[1]),
             np.where(np.abs(a - choices[0]) < np.abs(a - choices[2]), choices[0], choices[2]),
             np.where(np.abs(a - choices[1]) < np.abs(a - choices[2]), choices[1], choices[2]))
To get the output:
>>> d
array([[ 1,  1,  1],
       [ 5,  5,  5],
       [10, 10, 10]])
Is there a way to do this more dynamically while still preserving the vectorization?
Subtract choices from a, find the index of the minimum of the result, substitute.
a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([1, 5, 10])
b = a[:,:,None] - choices
np.absolute(b,b)
i = np.argmin(b, axis = -1)
a = choices[i]
print(a)
>>>
[[ 1 1 1]
[ 5 5 5]
[10 10 10]]
a = np.array([[0, 3, 0], [4, 8, 4], [9, 1, 9]])
choices = np.array([1, 5, 10])
b = a[:,:,None] - choices
np.absolute(b,b)
i = np.argmin(b, axis = -1)
a = choices[i]
print(a)
>>>
[[ 1 1 1]
[ 5 10 5]
[10 1 10]]
>>>
The extra dimension was added to a so that each element of choices would be subtracted from each element of a. choices was broadcast against a in the third dimension, so b.shape is (3,3,3). EricsBroadcastingDoc is a pretty good explanation and has a graphic 3-d example at the end.
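The shapes involved in that broadcast can be checked directly. A small sketch using the arrays from the second example; the name expanded is only for illustration:

```python
import numpy as np

a = np.array([[0, 3, 0], [4, 8, 4], [9, 1, 9]])
choices = np.array([1, 5, 10])

expanded = a[:, :, None]   # adds a trailing axis of length 1
b = expanded - choices     # choices is stretched along that new axis

print(expanded.shape)      # (3, 3, 1)
print(choices.shape)       # (3,)
print(b.shape)             # (3, 3, 3)
print(b[0, 1].tolist())    # [2, -2, -7], i.e. a[0, 1] minus each choice
```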
For the second example:
>>> print(b)
[[[ 1 5 10]
[ 2 2 7]
[ 1 5 10]]
[[ 3 1 6]
[ 7 3 2]
[ 3 1 6]]
[[ 8 4 1]
[ 0 4 9]
[ 8 4 1]]]
>>> print(i)
[[0 0 0]
[1 2 1]
[2 0 2]]
>>>
The final assignment uses an Index Array or Integer Array Indexing.
In the second example, notice that there was a tie for element a[0,1] , either one or five could have been substituted.
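How the tie is resolved follows from np.argmin, which returns the index of the first minimum it finds. A short sketch for the a[0,1] element, with the distances written out by hand:

```python
import numpy as np

choices = np.array([1, 5, 10])
distances = np.array([2, 2, 7])  # |3 - 1|, |3 - 5|, |3 - 10|

i = np.argmin(distances)         # first occurrence of the minimum wins
print(int(i), int(choices[i]))   # 0 1
```

So with choices in ascending order, a tie goes to the smaller choice.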
To explain wwii's excellent answer in a little more detail:
The idea is to create a new dimension which does the job of comparing each element of a to each element in choices using numpy broadcasting. This is easily done for an arbitrary number of dimensions in a using the ellipsis syntax:
>>> b = np.abs(a[..., np.newaxis] - choices)
>>> b
array([[[ 1, 5, 10],
[ 1, 5, 10],
[ 1, 5, 10]],
[[ 3, 1, 6],
[ 3, 1, 6],
[ 3, 1, 6]],
[[ 8, 4, 1],
[ 8, 4, 1],
[ 8, 4, 1]]])
Taking argmin along the axis you just created (the last axis, with label -1) gives you the desired index in choices that you want to substitute:
>>> np.argmin(b, axis=-1)
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
Which finally allows you to choose those elements from choices:
>>> d = choices[np.argmin(b, axis=-1)]
>>> d
array([[ 1, 1, 1],
[ 5, 5, 5],
[10, 10, 10]])
For a non-symmetric shape:
Let's say a had shape (2, 5):
>>> a = np.arange(10).reshape((2, 5))
>>> a
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
Then you'd get:
>>> b = np.abs(a[..., np.newaxis] - choices)
>>> b
array([[[ 1, 5, 10],
[ 0, 4, 9],
[ 1, 3, 8],
[ 2, 2, 7],
[ 3, 1, 6]],
[[ 4, 0, 5],
[ 5, 1, 4],
[ 6, 2, 3],
[ 7, 3, 2],
[ 8, 4, 1]]])
This is hard to read, but what it's saying is, b has shape:
>>> b.shape
(2, 5, 3)
The first two dimensions came from the shape of a, which is also (2, 5). The last dimension is the one you just created. To get a better idea:
>>> b[:, :, 0] # = abs(a - 1)
array([[1, 0, 1, 2, 3],
[4, 5, 6, 7, 8]])
>>> b[:, :, 1] # = abs(a - 5)
array([[5, 4, 3, 2, 1],
[0, 1, 2, 3, 4]])
>>> b[:, :, 2] # = abs(a - 10)
array([[10, 9, 8, 7, 6],
[ 5, 4, 3, 2, 1]])
Note how b[:, :, i] is the absolute difference between a and choices[i], for each i = 0, 1, 2.
Hope that helps explain this a little more clearly.
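The steps above can be wrapped into one function that works for any number of dimensions in a. This is a minimal sketch; the function name nearest_choice is hypothetical, and it builds the full distance array, so memory use grows with a.size * choices.size.

```python
import numpy as np

def nearest_choice(a, choices):
    """Replace each element of a with the closest element of choices (sketch)."""
    a = np.asarray(a)
    choices = np.asarray(choices)
    # New trailing axis: dist[..., i] is |a - choices[i]|
    dist = np.abs(a[..., np.newaxis] - choices)
    # Index of the closest choice for each element of a, then look it up
    return choices[np.argmin(dist, axis=-1)]

choices = np.array([1, 5, 10])
print(nearest_choice([0, 4, 9], choices).tolist())
# [1, 5, 10]
print(nearest_choice(np.arange(10).reshape(2, 5), choices).tolist())
# [[1, 1, 1, 1, 5], [5, 5, 5, 10, 10]]
```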
I love broadcasting and would have gone that way myself too. But, with large arrays, I would like to suggest another approach with np.searchsorted that keeps it memory efficient and thus achieves performance benefits, like so -
def searchsorted_app(a, choices):
    lidx = np.searchsorted(choices, a, 'left').clip(max=choices.size-1)
    ridx = (np.searchsorted(choices, a, 'right')-1).clip(min=0)
    cl = np.take(choices,lidx) # Or choices[lidx]
    cr = np.take(choices,ridx) # Or choices[ridx]
    mask = np.abs(a - cl) > np.abs(a - cr)
    cl[mask] = cr[mask]
    return cl
Please note that if the elements in choices are not sorted, we need to add in the additional argument sorter with np.searchsorted.
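One way the sorter argument might be wired in for unsorted choices is sketched below. This is an assumption about how to adapt the function, not code from the answer; the name nearest_choice_unsorted is hypothetical.

```python
import numpy as np

def nearest_choice_unsorted(a, choices):
    """searchsorted-based nearest lookup for choices in arbitrary order (sketch)."""
    order = np.argsort(choices)  # indices that would sort choices
    last = choices.size - 1
    # Positions are relative to the sorted order, so map them back through `order`
    lidx = np.searchsorted(choices, a, side='left', sorter=order).clip(max=last)
    ridx = (np.searchsorted(choices, a, side='right', sorter=order) - 1).clip(min=0)
    cl = choices[order[lidx]]    # nearest choice at or above each value
    cr = choices[order[ridx]]    # nearest choice at or below each value
    mask = np.abs(a - cl) > np.abs(a - cr)
    cl[mask] = cr[mask]          # keep whichever neighbour is closer
    return cl

a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([10, 1, 5])   # deliberately unsorted
print(nearest_choice_unsorted(a, choices).tolist())
# [[1, 1, 1], [5, 5, 5], [10, 10, 10]]
```

Since only the chosen values are returned, sorting a copy of choices first and calling the original function would be an equally reasonable design.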
Runtime test -
In [160]: # Setup inputs
...: a = np.random.rand(100,100)
...: choices = np.sort(np.random.rand(100))
...:
In [161]: def broadcasting_app(a, choices): # @wwii's solution
...:     return choices[np.argmin(np.abs(a[:,:,None] - choices),-1)]
...:
In [162]: np.allclose(broadcasting_app(a,choices),searchsorted_app(a,choices))
Out[162]: True
In [163]: %timeit broadcasting_app(a, choices)
100 loops, best of 3: 9.3 ms per loop
In [164]: %timeit searchsorted_app(a, choices)
1000 loops, best of 3: 1.78 ms per loop
Related post : Find elements of array one nearest to elements of array two

1D to 2D array - python

I would like to change the data stored in 1D into 2D:
I mean:
from
x|y|a
1|1|a(1,1)
2|1|a(2,1)
3|1|a(3,1)
1|2|a(1,2)
...
into:
x\y|1 |2 |3
1 |a(1,1)|a(1,2)|a(1,3)...
2 |a(2,1)|a(2,2)|a(2,3)...
3 |a(3,1)|a(3,2)|a(3,3)...
...
I did it by 2 loops:
(rows - array of x,y,a)
for n in range(len(rows)):
    for k in range(x_len):
        for l in range(y_len):
            if ((a[2, n] == x[0, k]) and (a[3, n] == y[0, l])):
                c[k, l] = a[0, n]
but it takes ages, so my question is if there is a smart and quick
solution for that in Python.
So to clarify what I want to do:
I know the return() function, the point is that it's randomly in array a.
So:
a = np.empty([4, len(rows)])
I read the data into array a from the database, which has 4 columns (1, 2, x, y) and len(rows) rows.
I am interested in column '1'; this is the one I want to put into the new array.
x = np.zeros([1, x_len], float)
y = np.zeros([1, y_len], float)
x is a vector of the sorted values of column x from array a, without duplicates, with length x_len
(I read it with the SQL query: select distinct ... )
y is a vector of the sorted values of column y from array a (without duplicates), with length y_len
Then I am making the array:
c = np.zeros([x_len, y_len], float)
and fill it with the data from array a using 3 loops (sorry for the mistake before):
for n in range(len(rows)):
    for k in range(x_len):
        for l in range(y_len):
            if ((a[2, n] == x[0, k]) and (a[3, n] == y[0, l])):
                c[k, l] = a[0, n]
Example:
Array a
array([[1, 3, 6, 5, 6],
[1, 2, 5, 5, 6],
[1, 4, 7, 1, 2], ## x
[2, 5, 3, 3, 4]]) ## y
Vectors: x and y
[[1,2,4,7]] ## x with x_len=4
[[2,3,4,5]] ## y with y_len=4
Array c
array([[1, 5, 0, 0],
[0, 0, 6, 0],
[0, 0, 0, 3],
[0, 6, 0, 0]])
The last array c looks like this (the values of the first row, a[0], are written into it):
x\y 2|3|4|5
-----------
1 1|5|0|0
2 0|0|6|0
4 0|0|0|3
7 0|6|0|0
I hope I didn't make a mistake in how it's written into array c.
Thanks a lot for any help.
You could use numpy:
>>> import numpy as np
>>> a = np.arange(9)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
>>> a.reshape(3,3)
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
#or:
>>> a.reshape(3,3).transpose()
array([[0, 3, 6],
[1, 4, 7],
[2, 5, 8]])
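reshape only applies when the values already arrive in grid order. For the scattered layout in the question, one possible vectorized replacement for the three loops is to look up each row's position with np.searchsorted and assign through the resulting index arrays. This is a sketch under stated assumptions: x and y are taken as sorted 1-D arrays of distinct values (the question stores them with shape [1, x_len]), and every x and y value in a is present in them.

```python
import numpy as np

# Example data from the question: row 0 holds the values, rows 2 and 3 hold x and y
a = np.array([[1, 3, 6, 5, 6],
              [1, 2, 5, 5, 6],
              [1, 4, 7, 1, 2],   # x
              [2, 5, 3, 3, 4]])  # y
x = np.array([1, 2, 4, 7])       # sorted distinct x values (1-D here by assumption)
y = np.array([2, 3, 4, 5])       # sorted distinct y values

k = np.searchsorted(x, a[2])     # row index in c for every column of a
l = np.searchsorted(y, a[3])     # column index in c for every column of a

c = np.zeros((x.size, y.size))
c[k, l] = a[0]                   # scatter all values in one assignment

print(c.astype(int).tolist())
# [[1, 5, 0, 0], [0, 0, 6, 0], [0, 0, 0, 3], [0, 6, 0, 0]]
```

The 6 at c[1, 2] comes from the last column of a, where x = 2 and y = 4. If the same (x, y) pair appears more than once, which of the values ends up in c should not be relied on.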
