numpy dot on 1D and 2D array - python

I am trying to understand what happens in the following python code:
import numpy as np
numberList1 = [1,2,3]
numberList2 = [[4,5,6],[7,8,9]]
result = np.dot(numberList2, numberList1)
# Converting iterator to set
resultSet = set(result)
print(resultSet)
Output:
{32, 50}
I can see that it multiplies each element of numberList1 by the element in the same position in each inner list of numberList2 and adds up the products: 1*4 + 2*5 + 3*6 = 32 and 1*7 + 2*8 + 3*9 = 50.
But, if I change the arrays to:
numberList1 = [1,1,1]
numberList2 = [[2,2,2],[3,3,3]]
Then the output I see is
{9, 6}
Which is the wrong way around...
and, if I change it to:
numberList1 = [1,1,1]
numberList2 = [[2,2,2],[2,2,2]]
Then the output I see is just
{6}
From the documentation:
If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.
I am not enough of a mathematician to understand quite what this is telling me; or why the order of the outputs swaps around sometimes.

A set is an unordered data type, and it also removes duplicates. np.dot does not return an iterator (as the comment in your code says) but an np.ndarray, whose elements are in the order you expect:
import numpy as np
numberList1 = [1, 2, 3]
numberList2 = [[4, 5, 6], [7, 8, 9]]
result = np.dot(numberList2, numberList1)
print(result)        # [32 50]
print(type(result))  # <class 'numpy.ndarray'>
# With numberList1 = [1, 1, 1] and numberList2 = [[2, 2, 2], [3, 3, 3]]
# the result is [6 9], in row order.
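To make the documentation sentence concrete, here is a small sketch using the second pair of lists from the question. It computes the "sum product over the last axis" three ways (an explicit loop, broadcasting, and np.dot) and then shows what converting the result to a set does when two rows give the same value.

```python
import numpy as np

a = np.array([[2, 2, 2], [3, 3, 3]])  # numberList2, shape (2, 3)
b = np.array([1, 1, 1])               # numberList1, shape (3,)

# Sum product over the last axis: out[i] = sum over k of a[i, k] * b[k]
loop = np.array([sum(a[i, k] * b[k] for k in range(a.shape[-1]))
                 for i in range(a.shape[0])])

# Same thing with broadcasting: a * b has shape (2, 3); sum away the last axis
broadcast = (a * b).sum(axis=-1)

result = np.dot(a, b)
print(result)  # [6 9], one entry per row of a, in row order

# Identical rows give identical entries; the array keeps both, a set keeps one
same_rows = np.dot(np.array([[2, 2, 2], [2, 2, 2]]), b)
print(same_rows)                # [6 6]
print(set(same_rows.tolist()))  # {6}
```

The order in which a set prints its elements comes from its internal hashing, not from np.dot, so it says nothing about which row produced which value.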

Related

Setting results of torch.gather(...) calls

I have a 2D pytorch tensor of shape n by m. I want to index the second dimension using a list of indices (which could be done with torch.gather) and then also set new values at the indexed positions.
Example:
import torch
data = torch.tensor([[0,1,2], [3,4,5], [6,7,8]]) # shape (3,3)
indices = torch.tensor([1,2,1], dtype=torch.long).unsqueeze(-1) # shape (3,1)
# data tensor:
# tensor([[0, 1, 2],
#         [3, 4, 5],
#         [6, 7, 8]])
I want to select the specified index in each row (which would give [1,5,7]) but then also set these values to another number, e.g. 42.
I can select the desired columns row wise by doing:
data.gather(1, indices)
tensor([[1],
        [5],
        [7]])
data.gather(1, indices)[:] = 42 # **This does NOT work**, since the result of gather
# does not use the same storage as the original tensor
which is fine, but I would like to change these values now, and have the change also affect the data tensor.
I can do what I want to achieve using this, but it seems to be very un-pythonic:
max_index = torch.max(indices)
for i in range(0, max_index + 1):
    mask = (indices == i).nonzero(as_tuple=True)[0]
    data[mask, i] = 42
print(data)
# tensor([[ 0, 42,  2],
#         [ 3,  4, 42],
#         [ 6, 42,  8]])
Any hints on how to do that more elegantly?
What you are looking for is torch.scatter_ with the value option.
Tensor.scatter_(dim, index, src, reduce=None) → Tensor
Writes all values from the tensor src into self at the indices specified in the index tensor. For each value in src, its output index is specified by its index in src for dimension != dim and by the corresponding value in index for dimension = dim.
With 2D tensors as input and dim=1, the operation is:
self[i][index[i][j]] = src[i][j]
The signature quoted above doesn't mention the value parameter, though: scatter_ also accepts a scalar value in place of the src tensor.
With value=42, and dim=1, this will have the following effect on data:
data[i][index[i][j]] = 42
Here applied in-place:
>>> data.scatter_(index=indices, dim=1, value=42)
>>> data
tensor([[ 0, 42,  2],
        [ 3,  4, 42],
        [ 6, 42,  8]])
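For comparison, here is the same write pattern sketched in NumPy rather than PyTorch (a swapped-in analogue, not the torch API): np.put_along_axis writes a scalar at data[i, indices[i, j]] in place, much like scatter_ with dim=1 and value=42. Plain fancy indexing that pairs each row number with its column index gives the same result; the same indexing assignment should also work on a torch tensor.

```python
import numpy as np

data = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
indices = np.array([1, 2, 1]).reshape(-1, 1)  # shape (3, 1), mirroring the torch example

# In place: data[i, indices[i, j]] = 42 for every i and j
np.put_along_axis(data, indices, 42, axis=1)
print(data)

# Alternative on a fresh copy: row numbers 0..2 paired with one column index per row
data2 = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
data2[np.arange(data2.shape[0]), indices[:, 0]] = 42
```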

Selecting one element from each innermost dimension with numpy

I have a three dimensional numpy source array and a two-dimensional numpy array of indexes.
For example:
src = np.array([[[1,2,3],[4,5,6]],
[[7,8,9],[10,11,12]]])
idx = np.array([[0,1],
[1,2]])
I'd like to get a 2d array, where each element represents the indexed value in the innermost dimension in that position:
array([[1,5],
[8,12]])
How do I do this with numpy?
You can try np.take (see its documentation).
However, without an axis argument np.take indexes into the flattened array, so each index has to be given as a position in the flattened array. For example:
src = np.array([[[1,2,3],[4,5,6]],
[[7,8,9],[10,11,12]]])
idx = np.array([[0,4],
[7,11]])
# Wanted result
res = np.take(src, idx)
Here src is treated as the flattened array [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12].
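Counting flattened positions by hand is error-prone. A sketch of one way to compute them from the question's original idx (np.indices and np.ravel_multi_index are my choice of helpers here, not part of the answer):

```python
import numpy as np

src = np.array([[[1, 2, 3], [4, 5, 6]],
                [[7, 8, 9], [10, 11, 12]]])
idx = np.array([[0, 1],
                [1, 2]])

# Row and column coordinates for every position of idx, each of shape (2, 2)
i, j = np.indices(idx.shape)

# Turn each (i, j, idx[i, j]) coordinate triple into a position in src.ravel()
flat_idx = np.ravel_multi_index((i, j, idx), src.shape)
res = np.take(src, flat_idx)
print(flat_idx)
print(res)
```

For this src, flat_idx comes out as the [[0, 4], [7, 11]] used in the answer.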
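You can also try np.take_along_axis (see its documentation).
This method needs idx to have the same number of dimensions as src, so first add a trailing axis to idx (the question's original [[0,1],[1,2]], not the flattened indices above), then squeeze that axis out of the result.
# Add a trailing axis: idx goes from shape (2, 2) to (2, 2, 1)
idx = np.expand_dims(idx, axis=-1)
# Pick along the innermost axis, then drop the trailing axis again
res = np.take_along_axis(src, idx, axis=2).squeeze(-1)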
You can use the np.choose method after moving the innermost axis of src to the front, so that each slice src[..., k] becomes one of the choices:
np.choose(idx, np.moveaxis(src, -1, 0))
>>>> array([[ 1,  5],
            [ 8, 12]])
Direct indexing:
src[np.arange(2)[:, None], np.arange(2), idx]
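The direct-indexing line hard-codes np.arange(2) for both leading axes. A slightly more general sketch (pick_innermost is a hypothetical helper name) takes the sizes from idx.shape and checks the result against np.take_along_axis:

```python
import numpy as np

def pick_innermost(src, idx):
    """Hypothetical helper: out[i, j] = src[i, j, idx[i, j]] for 3-D src, 2-D idx."""
    n0, n1 = idx.shape
    # np.arange(n0)[:, None] has shape (n0, 1) and np.arange(n1) has shape (n1,);
    # together with idx of shape (n0, n1) they broadcast to (n0, n1)
    return src[np.arange(n0)[:, None], np.arange(n1), idx]

src = np.array([[[1, 2, 3], [4, 5, 6]],
                [[7, 8, 9], [10, 11, 12]]])
idx = np.array([[0, 1],
                [1, 2]])

direct = pick_innermost(src, idx)
along = np.take_along_axis(src, idx[..., None], axis=-1)[..., 0]
print(direct)
```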

What is a best way to intersect multiple arrays with numpy array?

Suppose I have an example of numpy array:
import numpy as np
X = np.array([2,5,0,4,3,1])
And I also have a list of arrays, like:
A = [np.array([-2,0,2]), np.array([0,1,2,3,4,5]), np.array([2,5,4,6])]
I want to keep only those items of each array that are also in X, and I'd like to do it in the most efficient/common way.
Solution I have tried so far:
Sort X using X.sort().
Find locations of items of each array in X using:
locations = [np.searchsorted(X, n) for n in A]
Leave only proper ones:
masks = [X[locations[i]] == A[i] for i in range(len(A))]
result = [A[i][masks[i]] for i in range(len(A))]
But it doesn't work, because the locations for the third array are out of bounds:
locations = [array([0, 0, 2], dtype=int64), array([0, 1, 2, 3, 4, 5], dtype=int64), array([2, 5, 4, 6], dtype=int64)]
How to solve this issue?
Update
I ended up with the idx[idx==len(Xs)] = 0 solution. I've also noticed two different approaches among the answers: converting X into a set vs. using np.sort. Both have pros and cons: the set approach iterates in Python, which is quite slow compared with NumPy methods; however, the cost of each np.searchsorted lookup grows logarithmically with the size of X, whereas a set lookup takes constant time on average. That's why I decided to compare performance on large data, specifically 1 million items each for X, A[0], A[1] and A[2].
One idea is to do less computation overall and keep the work inside the final loop minimal. So, here's one approach with those in mind:
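a = np.concatenate(A)                    # all arrays joined into one
m = np.isin(a,X)                         # True where an element also occurs in X
l = np.array(list(map(len,A)))           # length of each array in A
a_m = a[m]                               # kept elements, in their original order
cut_idx = np.r_[0,l.cumsum()]            # boundaries of each array inside a
l_m = np.add.reduceat(m,cut_idx[:-1])    # number of kept elements per array
cl_m = np.r_[0,l_m.cumsum()]             # boundaries of each array inside a_m
out = [a_m[i:j] for (i,j) in zip(cl_m[:-1],cl_m[1:])]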
Alternative #1 :
We can also use np.searchsorted to get the isin mask, like so -
Xs = np.sort(X)
idx = np.searchsorted(Xs,a)
idx[idx==len(Xs)] = 0
m = Xs[idx]==a
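A self-contained sketch that combines the concatenated mask from Alternative #1 with a split back into one array per input. It uses np.split instead of the list comprehension over cl_m (a substitution for brevity), and like the reduceat step above it assumes no array in A is empty, because np.add.reduceat does not return 0 for an empty segment.

```python
import numpy as np

X = np.array([2, 5, 0, 4, 3, 1])
A = [np.array([-2, 0, 2]), np.array([0, 1, 2, 3, 4, 5]), np.array([2, 5, 4, 6])]

a = np.concatenate(A)

# Membership mask via binary search in a sorted copy of X (Alternative #1)
Xs = np.sort(X)
idx = np.searchsorted(Xs, a)
idx[idx == len(Xs)] = 0  # values above max(X) would index past the end
m = Xs[idx] == a

# Kept elements per input array, then cut a[m] at the running totals
lengths = np.array([len(arr) for arr in A])
kept_per_array = np.add.reduceat(m.astype(np.int64), np.r_[0, lengths.cumsum()[:-1]])
out = np.split(a[m], kept_per_array.cumsum()[:-1])
print(out)
```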
Another way with np.intersect1d
If you are looking for the most common/elegant one, I think it would be np.intersect1d:
In [43]: [np.intersect1d(X,A_i) for A_i in A]
Out[43]: [array([0, 2]), array([0, 1, 2, 3, 4, 5]), array([2, 4, 5])]
Solving your issue
You can also solve your out-of-bounds issue, with a simple fix -
for l in locations:
    l[l==len(X)]=0
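Here is the question's searchsorted approach with that fix applied, as one runnable sketch. Unlike np.intersect1d, which returns sorted unique values, it keeps each array's original order (note the third result). It assumes X is not empty, since index 0 must exist.

```python
import numpy as np

X = np.array([2, 5, 0, 4, 3, 1])
A = [np.array([-2, 0, 2]), np.array([0, 1, 2, 3, 4, 5]), np.array([2, 5, 4, 6])]

X.sort()  # searchsorted needs X in ascending order
locations = [np.searchsorted(X, n) for n in A]

# The fix: positions equal to len(X) (values above max(X)) are clamped to 0.
# Any valid index would do; the equality test below rejects those values anyway.
for l in locations:
    l[l == len(X)] = 0

masks = [X[locations[i]] == A[i] for i in range(len(A))]
result = [A[i][masks[i]] for i in range(len(A))]

# np.intersect1d sorts its output, so the order can differ from the input arrays
via_intersect = [np.intersect1d(X, A_i) for A_i in A]
print(result)
print(via_intersect)
```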
How about this? It's very simple and efficient:
import numpy as np
X = np.array([2,5,0,4,3,1])
A = [np.array([-2,0,2]), np.array([0,1,2,3,4,5]), np.array([2,5,4,6])]
X_set = set(X)
A = [np.array([a for a in arr if a in X_set]) for arr in A]
#[array([0, 2]), array([0, 1, 2, 3, 4, 5]), array([2, 5, 4])]
According to the docs, a membership test on a set takes O(1) time on average, so the overall cost is O(N) in the total number of elements.

How do I use numpy vectorize to iterate through a two-dimensional vector?

I am trying to use numpy.vectorize to iterate over a (2x5) matrix which contains two vectors representing the x- and y-values of coordinates. The coordinates (x- and y-value) are to be fed to a function returning a (1x1) vector for each iteration. So that in the end, the result should be a (1x5) vector. My problem is that instead of iterating through each element I want the algorithm to iterate through both vectors simultaneously, so it picks up the x- and y-values of the coordinates in parallel to feed it to the function.
data = np.transpose(np.array([[1, 2], [1, 3], [2, 1], [1, -1], [2, -1]]))
th_ = np.array([[1, 1]])
th0_ = -2
def positive(x, th = th_, th0 = th0_):
    if signed_dist(x, th, th0)[0][0] > 0:
        return np.array([[1]])
    elif signed_dist(x, th, th0)[0][0] == 0:
        return np.array([[0]])
    else:
        return np.array([[-1]])
positive_numpy = np.vectorize(positive)
results = positive_numpy(data)
Reading the numpy documentation did not really help, and I want to avoid large workarounds because computation time matters. Thankful for any suggestion!
This is a bit of a guess, but looks like your code can be simplified to
data = np.array([[1, 2], [1, 3], [2, 1], [1, -1], [2, -1]]) # (5,2) array
th_ = np.array([[1, 1]])
th0_ = -2
alist = [signed_dist(x, th_, th0_) for x in data]
arr = np.array(alist) # (5,?,?) array
arr = arr[:,0,0] # (5,) array
arr[arr>0] = 1
arr[arr<0] = -1   # zeros stay 0, matching positive()
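The question never shows signed_dist, so the sketch below assumes a hypothetical definition: the usual signed distance (th · x + th0) / ||th|| from a point x to the line th · x + th0 = 0. Under that assumption, a single matrix product handles all five columns of data at once, and np.sign reproduces the 1/0/-1 labels of positive(). This avoids np.vectorize altogether; its documentation notes it is provided mainly for convenience and is essentially a for loop.

```python
import numpy as np

def signed_dist(x, th, th0):
    # HYPOTHETICAL definition (not shown in the question): (th . x + th0) / ||th||.
    # x holds points as columns, shape (2, n); th has shape (1, 2).
    return (th @ x + th0) / np.linalg.norm(th)

data = np.transpose(np.array([[1, 2], [1, 3], [2, 1], [1, -1], [2, -1]]))  # (2, 5)
th_ = np.array([[1, 1]])
th0_ = -2

# One call covers all five columns; the result has shape (1, 5)
dists = signed_dist(data, th_, th0_)

# np.sign maps positive -> 1, zero -> 0, negative -> -1, like positive()
labels = np.sign(dists).astype(int)
print(labels)
```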

Calculating correlations between every item in a list

I'm trying to calculate the Pearson correlation between every pair of items in my list. I'm trying to get the correlations between data[0] and data[1], data[0] and data[2], and data[1] and data[2].
import scipy
from scipy import stats
data = [[1, 2, 4], [9, 5, 1], [8, 3, 3]]
def pearson(x, y):
    series1 = data[x]
    series2 = data[y]
    if x != y:
        return scipy.stats.pearsonr(series1, series2)
h = [pearson(x,y) for x,y in range(0, len(data))]
This returns the error TypeError: 'int' object is not iterable on h. Could someone please explain the error here? Thanks.
range gives you plain int values, but you are unpacking each one as if it were a tuple (x, y). Try itertools.combinations instead:
import scipy
from scipy import stats
from itertools import combinations
data = [[1, 2, 4], [9, 5, 1], [8, 3, 3]]
def pearson(x, y):
    series1 = data[x]
    series2 = data[y]
    if x != y:
        return scipy.stats.pearsonr(series1, series2)
h = [pearson(x,y) for x,y in combinations(range(len(data)), 2)]
Or, as @Marius suggested:
h = [stats.pearsonr(data[x], data[y]) for x,y in combinations(range(len(data)), 2)]
Why not use numpy.corrcoef?
import numpy as np
data = [[1, 2, 4], [9, 5, 1], [8, 3, 3]]
Result:
>>> np.corrcoef(data)
array([[ 1.        , -0.98198051, -0.75592895],
       [-0.98198051,  1.        ,  0.8660254 ],
       [-0.75592895,  0.8660254 ,  1.        ]])
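If you only need each pair once, here is a sketch that reads them off the matrix: np.triu_indices with k=1 selects the entries above the diagonal, which correspond to the pairs (0, 1), (0, 2), (1, 2) that itertools.combinations produces. It uses only NumPy; note that np.corrcoef returns only r, not the p-value that scipy.stats.pearsonr also reports.

```python
import numpy as np
from itertools import combinations

data = [[1, 2, 4], [9, 5, 1], [8, 3, 3]]

# Entry [i, j] is Pearson's r between data[i] and data[j]; the diagonal is 1
corr = np.corrcoef(data)

# Entries strictly above the diagonal, in row-major order: (0, 1), (0, 2), (1, 2)
upper = corr[np.triu_indices(len(data), k=1)]

# The same values keyed by pair, using the pairs itertools.combinations yields
pairs = {(i, j): corr[i, j] for i, j in combinations(range(len(data)), 2)}
print(pairs)
```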
The range() function gives you a single int on each iteration, and you can't unpack an int into a pair of variables.
If you want to go through every possible pair of ints in that range, you could try
import itertools
h = [pearson(x,y) for x,y in itertools.product(range(len(data)), repeat=2)]
That yields every ordered pair of values from the range as a 2-tuple, including pairs where x == y.
Remember that with the function you defined, pearson returns None when x == y. To avoid that you could use:
import itertools
h = [pearson(x,y) for x,y in itertools.permutations(range(len(data)), 2)]
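The three itertools options differ in which pairs they produce. Since Pearson's r is symmetric (r(x, y) = r(y, x)), combinations is usually enough, while permutations computes every correlation twice. A quick sketch with n = 3, matching len(data):

```python
from itertools import combinations, permutations, product

n = 3  # len(data) in the question
all_pairs = list(product(range(n), repeat=2))  # 9 ordered pairs, including (0, 0)
ordered = list(permutations(range(n), 2))      # 6 ordered pairs with x != y
unordered = list(combinations(range(n), 2))    # 3 pairs with x < y
print(unordered)  # [(0, 1), (0, 2), (1, 2)]
```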
