I have a tensor of N unique target labels, randomly selected from [0, R], where N < R (i.e., my target vector can have any length but contains only N unique labels). I would like to transform the labels to [0, N). Is there a function available for this target transform? E.g. input vector: [12, 6, 4, 5, 3, 12, 4] → transformed vector: [4, 3, 1, 2, 0, 4, 1]
My attempt:
I have implemented the following snippet, which works as expected but is probably not the most elegant implementation:
import torch

def my_transform(vec):
    t_ = torch.unique(vec)  # sorted unique labels
    # each element's position among the unique labels is its new label
    return torch.cat(list(map(lambda x: (t_ == x).nonzero(as_tuple=True)[0], vec)))

t = torch.tensor([12, 6, 4, 5, 3, 12, 4])
print(my_transform(t))
You're looking for torch.searchsorted. It works here because torch.unique returns its values sorted, so the position of each element within the sorted unique values is exactly its new label:
import torch
t = torch.tensor([12, 6, 4, 5, 3, 12, 4])
transformed = torch.searchsorted(t.unique(), t)
# tensor([4, 3, 1, 2, 0, 4, 1])
In addition to lwohlhart's answer, we can also use the return_inverse argument of unique:
import torch
x = torch.tensor([12, 6, 4, 5, 3, 12, 4])
x.unique(return_inverse=True)
# (tensor([ 3, 4, 5, 6, 12]), tensor([4, 3, 1, 2, 0, 4, 1]))
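Since unique returns a (values, inverse) pair here, typical usage just unpacks the second element. A small usage sketch (my addition):
import torch

x = torch.tensor([12, 6, 4, 5, 3, 12, 4])
_, transformed = x.unique(return_inverse=True)
# transformed is tensor([4, 3, 1, 2, 0, 4, 1])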
Is there a NumPy way to sum every three consecutive elements of an array (a sliding-window sum)? For example:
import numpy as np
mydata = np.array([4, 2, 3, 8, -6, 10])
I would like to get this result:
np.array([9, 13, 5, 12])
We can use np.convolve -
np.convolve(mydata,np.ones(3,dtype=int),'valid')
The basic idea with convolution is that we slide a kernel along the input array; at each position, the operation sums the input elements multiplied by the kernel elements. So, for a window size of 3, using a kernel of three 1s (generated with np.ones(3)) makes each output element simply the sum of a length-3 window.
Sample run -
In [334]: mydata
Out[334]: array([ 4, 2, 3, 8, -6, 10])
In [335]: np.convolve(mydata,np.ones(3,dtype=int),'valid')
Out[335]: array([ 9, 13, 5, 12])
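As an aside (not part of the original answer), the same sliding sums can also be computed with np.cumsum, which avoids the multiplications of a convolution; a minimal sketch:
import numpy as np

mydata = np.array([4, 2, 3, 8, -6, 10])
k = 3
csum = np.concatenate(([0], np.cumsum(mydata)))  # csum[i] = sum of the first i elements
window_sums = csum[k:] - csum[:-k]               # sum of each length-k window
# array([ 9, 13,  5, 12])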
Starting in NumPy 1.20, sliding_window_view provides a way to slide/roll through windows of elements, which you can then individually sum:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

values = np.array([4, 2, 3, 8, -6, 10])
np.sum(sliding_window_view(values, window_shape=3), axis=1)
# array([ 9, 13,  5, 12])
where:
window_shape is the size of the sliding window
np.sum(array, axis = 1) sums sub-arrays
and the intermediate result of the sliding is:
sliding_window_view(np.array([4, 2, 3, 8, -6, 10]), window_shape = 3)
# array([[ 4, 2, 3],
# [ 2, 3, 8],
# [ 3, 8, -6],
# [ 8, -6, 10]])
A solution without using external libraries might look like this:
from collections import deque

def sliding_window_sum(a, size):
    out = []
    the_sum = 0
    q = deque()
    for i in a:
        if len(q) == size:
            the_sum -= q[0]  # subtract the element leaving the window
            q.popleft()
        q.append(i)
        the_sum += i
        if len(q) == size:
            out.append(the_sum)
    return out
v = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
sliding_window_sum(v, 5)
Which gives the output:
[1, 2, 3, 3, 4, 4, 3, 2, 3, 2, 1, 1, 1, 0, 0, 1]
This matches the result of using numpy:
import numpy as np
np.convolve(v, np.ones(5, dtype=int),'valid').tolist()
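For comparison, a shorter but less efficient variant of the deque approach (my sketch, not part of the answer): deque(maxlen=size) drops old elements automatically, at the cost of re-summing every window:
from collections import deque

def sliding_window_sum_short(a, size):
    q = deque(maxlen=size)      # old elements fall off the left automatically
    out = []
    for i in a:
        q.append(i)
        if len(q) == size:
            out.append(sum(q))  # O(size) per window, vs O(1) in the version above
    return out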
Say I have a Numpy vector,
A = zeros(100)
and I divide it into subvectors by a list of breakpoints which index into A, for instance,
breaks = linspace(0, 100, 11, dtype=int)
So the i-th subvector would lie between the indices breaks[i] (inclusive) and breaks[i+1] (exclusive).
The breaks are not necessarily equispaced, this is only an example.
However, they will always be strictly increasing.
Now I want to operate on these subvectors. For instance, if I want to set all elements of the i-th subvector to i, I might do:
for i in range(len(breaks) - 1):
    A[breaks[i] : breaks[i+1]] = i
Or I might want to compute the subvector means:
b = empty(len(breaks) - 1)
for i in range(len(breaks) - 1):
    b[i] = A[breaks[i] : breaks[i+1]].mean()
And so on.
How can I avoid using for loops and instead vectorize these operations?
You can use a simple np.cumsum -
import numpy as np
# Form zeros array of same size as input array and
# place ones at positions where intervals change
A1 = np.zeros_like(A)
A1[breaks[1:-1]] = 1
# Perform cumsum along it to create a staircase like array, as the final output
out = A1.cumsum()
Sample run -
In [115]: A
Out[115]: array([3, 8, 0, 4, 6, 4, 8, 0, 2, 7, 4, 9, 3, 7, 3, 8, 6, 7, 1, 6])
In [116]: breaks
Out[116]: array([ 0, 4, 9, 11, 18, 20])
In [142]: out
Out[142]: array([0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4])
If you want to have mean values of those subvectors from A, you can use np.bincount -
mean_vals = np.bincount(out, weights=A)/np.bincount(out)
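To see it end to end with the sample data above (my verification; the means follow from the sample A and breaks):
import numpy as np

A = np.array([3, 8, 0, 4, 6, 4, 8, 0, 2, 7, 4, 9, 3, 7, 3, 8, 6, 7, 1, 6])
breaks = np.array([0, 4, 9, 11, 18, 20])
A1 = np.zeros_like(A)
A1[breaks[1:-1]] = 1
out = A1.cumsum()
mean_vals = np.bincount(out, weights=A) / np.bincount(out)
# array([3.75      , 4.        , 5.5       , 6.14285714, 3.5       ])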
If you are looking to extend this functionality with a custom function instead, you might want to look into numpy_groupies, a Python/NumPy equivalent of MATLAB's accumarray.
There really isn't a single answer to your question, but several techniques that you can use as building blocks. Another one you may find helpful:
All numpy ufuncs have a .reduceat method, which you can use to your advantage for some of your calculations:
>>> a = np.arange(100)
>>> breaks = np.linspace(0, 100, 11, dtype=np.intp)
>>> counts = np.diff(breaks)
>>> counts
array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
>>> sums = np.add.reduceat(a, breaks[:-1], dtype=float)
>>> sums
array([ 45., 145., 245., 345., 445., 545., 645., 745., 845., 945.])
>>> sums / counts # i.e. the mean
array([ 4.5, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5])
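The same .reduceat pattern works with other ufuncs; for instance (my addition), per-bin maxima:
>>> np.maximum.reduceat(a, breaks[:-1])
array([ 9, 19, 29, 39, 49, 59, 69, 79, 89, 99])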
You could use np.repeat:
In [35]: np.repeat(np.arange(0, len(breaks)-1), np.diff(breaks))
Out[35]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9])
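The label array from np.repeat also combines naturally with np.bincount; a sketch (my addition, reusing the bincount idea from the earlier answer) for the per-subvector means:
import numpy as np

A = np.random.random(100)
breaks = np.linspace(0, 100, 11, dtype=int)
labels = np.repeat(np.arange(len(breaks) - 1), np.diff(breaks))
means = np.bincount(labels, weights=A) / np.bincount(labels)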
To compute arbitrary binned statistics you could use scipy.stats.binned_statistic:
import numpy as np
import scipy.stats as stats
breaks = np.linspace(0, 100, 11, dtype=int)
A = np.random.random(100)
means, bin_edges, binnumber = stats.binned_statistic(
    x=np.arange(len(A)), values=A, statistic='mean', bins=breaks)
stats.binned_statistic can compute means, medians, counts, and sums; or, to compute an arbitrary statistic for each bin, you can pass a callable to the statistic parameter:
def func(values):
    return values.mean()

funcmeans, bin_edges, binnumber = stats.binned_statistic(
    x=np.arange(len(A)), values=A, statistic=func, bins=breaks)
assert np.allclose(means, funcmeans)
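As a quick cross-check (my addition), the 'sum' statistic agrees with the np.add.reduceat approach from the earlier answer:
sums, _, _ = stats.binned_statistic(
    x=np.arange(len(A)), values=A, statistic='sum', bins=breaks)
assert np.allclose(sums, np.add.reduceat(A, breaks[:-1]))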
Can I use numpy to generate repeating patterns of indices? For example:
0, 1, 2, 3, 4, 5, 0, 6, 7, 8, 9, 10, 0, 11, 12, 13, 14, 15
or
0,1,2,1,2,3,4,5,6,5,6,7
Is there a method in numpy I can use to generate these lists over a given range?
Currently I am doing this with plain Python lists, but I was curious whether I could use numpy to speed things up.
I am not sure what methods to even look into other than numpy.arange.
Just to further clarify: I am generating indices to triangles in OpenGL in various patterns.
So for triangles in a circle I have some code like this:
for fan_set in range(0, len(self.vertices) // vertex_length, triangle_count):
    for i in range(fan_set + 1, fan_set + 8):
        self.indices.append(fan_set)
        self.indices.append(i)
        self.indices.append(i + 1)
Your first example can be produced via numpy methods as:
In [860]: np.concatenate((np.zeros((3,1),int),np.arange(1,16).reshape(3,5)),axis=1).ravel()
Out[860]:
array([ 0, 1, 2, 3, 4, 5, 0, 6, 7, 8, 9, 10, 0, 11, 12, 13, 14,
15])
That's because I see this 2d repeated pattern:
array([[ 0, 1, 2, 3, 4, 5],
[ 0, 6, 7, 8, 9, 10],
[ 0, 11, 12, 13, 14, 15]])
The second pattern can be produced by a ravel of this 2d array (itself produced by broadcasting two arrays):
In [863]: np.array([0,1,4,5])[:,None]+np.arange(3)
Out[863]:
array([[0, 1, 2],
[1, 2, 3],
[4, 5, 6],
[5, 6, 7]])
I can produce the 1st pattern with a variation on the 2nd (the initial column of 0s disrupts the pattern):
I = np.array([0, 5, 10])[:, None] + np.arange(0, 6)
I[:, 0] = 0
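A quick check (my addition) that raveling this reproduces the first pattern:
I.ravel()
# array([ 0,  1,  2,  3,  4,  5,  0,  6,  7,  8,  9, 10,  0, 11, 12, 13, 14, 15])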
I think your double loop can be expressed as a list comprehension:
In [872]: np.array([ [k,i,i+1] for k in range(0,1,1) for i in range(k+1,k+8)]).ravel()
Out[872]: array([0, 1, 2, 0, 2, 3, 0, 3, 4, 0, 4, 5, 0, 5, 6, 0, 6, 7, 0, 7, 8])
or without the ravel:
array([[0, 1, 2],
[0, 2, 3],
[0, 3, 4],
[0, 4, 5],
[0, 5, 6],
[0, 6, 7],
[0, 7, 8]])
though I don't know what parameters produce your examples.
I'm not sure I understand exactly what you mean, but the following is what I use to generate unique indices for 3D points:
import numpy as np

def indexate(points):
    """
    Convert a numpy array of points into a list of indices and an array of
    unique points.

    Arguments:
        points: A numpy array of shape (N, 3).

    Returns:
        An array of indices and an (M, 3) array of unique points.
    """
    pd = {}
    indices = [pd.setdefault(tuple(p), len(pd)) for p in points]
    pt = sorted([(v, k) for k, v in pd.items()], key=lambda x: x[0])
    unique = np.array([i[1] for i in pt])
    return np.array(indices, np.uint16), unique
You can find this code in my stltools package on github.
It works like this:
In [1]: import numpy as np
In [2]: points = np.array([[1,0,0], [0,0,1], [1,0,0], [0,1,0]])
In [3]: pd = {}
In [4]: indices = [pd.setdefault(tuple(p), len(pd)) for p in points]
In [5]: indices
Out[5]: [0, 1, 0, 2]
In [6]: pt = sorted([(v, k) for k, v in pd.items()], key=lambda x: x[0])
In [7]: pt
Out[7]: [(0, (1, 0, 0)), (1, (0, 0, 1)), (2, (0, 1, 0))]
In [8]: unique = np.array([i[1] for i in pt])
In [9]: unique
Out[9]:
array([[1, 0, 0],
[0, 0, 1],
[0, 1, 0]])
The key point (if you'll pardon the pun) is to use a tuple of the point (because a tuple is immutable and thus hashable) as the key in a dictionary, via the setdefault method, with the current length of the dict as the value. In effect, the value for each key is the index at which that exact point was first seen.
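As a side note (my addition, not part of this answer): modern NumPy can deduplicate rows directly with np.unique along an axis. It sorts the unique points lexicographically instead of keeping first-seen order, so the indices differ from indexate's, but they are just as consistent:
import numpy as np

points = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]])
unique, indices = np.unique(points, axis=0, return_inverse=True)
# unique rows are sorted lexicographically; indices == array([2, 0, 2, 1])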
I am not 100% certain this is what you're after. I think you can achieve it by using a pair of ranges per group, shifting each group by i times 3 (the gap between groups), and then using numpy.concatenate to build the final array, like this:
import numpy as np

def gen_list(n):
    # each group is [i, i+1, i+2, i+1, i+2, i+3], shifted by a further i*3
    return np.concatenate([np.array(list(range(i, i + 3)) + list(range(i + 1, i + 4))) + i * 3
                           for i in range(n)])
Usage:
gen_list(2)
Out[16]: array([0, 1, 2, 1, 2, 3, 4, 5, 6, 5, 6, 7])
gen_list(3)
Out[17]:
array([ 0, 1, 2, 1, 2, 3, 4, 5, 6, 5, 6, 7, 8, 9, 10, 9, 10,
11])
list(gen_list(2))
Out[18]: [0, 1, 2, 1, 2, 3, 4, 5, 6, 5, 6, 7]
In my sample, n is just how many groups you want to generate; you may change this to suit your triangle-ish requirements.
I have a numpy array, for example
a = np.arange(10)
how can I move the first n elements to the end of the array?
I found the roll function, but it seems to do the opposite, shifting the last n elements to the beginning.
Why not just roll with a negative number?
>>> import numpy as np
>>> a = np.arange(10)
>>> np.roll(a,2)
array([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])
>>> np.roll(a,-2)
array([2, 3, 4, 5, 6, 7, 8, 9, 0, 1])
You can use a negative shift:
a = np.arange(10)
print(np.roll(a, 3))
print(np.roll(a, -3))
which prints
[7 8 9 0 1 2 3 4 5 6]
[3 4 5 6 7 8 9 0 1 2]
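If you prefer not to use np.roll, the same move-the-first-n-to-the-end operation can be spelled with slicing and concatenation (my sketch; both approaches return a copy):
import numpy as np

a = np.arange(10)
n = 2
moved = np.concatenate((a[n:], a[:n]))  # first n elements moved to the end
# array([2, 3, 4, 5, 6, 7, 8, 9, 0, 1])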