Substitute values in a vector (PyTorch) - python

I have a tensor of N unique target labels, randomly selected from [0, R], where N < R (i.e., my target vector can have any length, but only contains N unique labels). I would like to transform the labels to [0, N-1]. Is there a function available for this target transform? E.g. input vector: [12, 6, 4, 5, 3, 12, 4] → transformed vector: [4, 3, 1, 2, 0, 4, 1]
My attempt:
I have implemented the following snippet, which works as expected, but might not be the most glorious implementation:
import torch
def my_transform(vec):
    t_ = torch.unique(vec)
    return torch.cat(list(map(lambda x: (t_ == x).nonzero(as_tuple=True)[0], vec)))
t = torch.Tensor([12, 6, 4, 5, 3, 12, 4])
print(my_transform(t))

You're looking for searchsorted
import torch
t = torch.Tensor([12, 6, 4, 5, 3, 12, 4])
transformed = torch.searchsorted(t.unique(),t)
# tensor([4, 3, 1, 2, 0, 4, 1])

In addition to @lwohlhart's answer, we can also use the return_inverse argument of unique:
import torch
x = torch.tensor([12, 6, 4, 5, 3, 12, 4])
x.unique(return_inverse=True)
# (tensor([ 3, 4, 5, 6, 12]), tensor([4, 3, 1, 2, 0, 4, 1]))
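For readers who hold their labels in a NumPy array rather than a tensor, the same relabeling can be sketched with np.unique, which also accepts return_inverse. This is only an illustrative counterpart, not part of the answers above; the variable names are made up for the example.

```python
import numpy as np

labels = np.array([12, 6, 4, 5, 3, 12, 4])

# uniques holds the sorted distinct labels; inverse holds, for every input
# element, its position in uniques, i.e. the relabeled value in [0, N-1].
uniques, inverse = np.unique(labels, return_inverse=True)

print(uniques.tolist())  # [3, 4, 5, 6, 12]
print(inverse.tolist())  # [4, 3, 1, 2, 0, 4, 1]

# Indexing uniques with inverse recovers the original labels.
restored = uniques[inverse]
```

Keeping uniques around is useful if the transformed labels later need to be mapped back to the original ones.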

Related

Python: How to split a list into ordered chunks

If I have the following list
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Then
np.array_split([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
Returns
[array([0, 1, 2, 3]), array([4, 5, 6]), array([7, 8, 9])]
Is there a way to get the sub-arrays in the following order?
[array([0, 3, 6, 9]), array([1, 4, 7]), array([2, 5, 8])]
As the lists are of differing lengths, a numpy.ndarray isn't possible without a bit of fiddling, as all sub-arrays must be the same length.
However, if a simple list meets your requirement, you can use:
l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
l2 = []
for i in range(3):
    l2.append(l[i::3])
Output:
[[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
Or more concisely, giving the same output:
[l[i::3] for i in range(3)]
Let's look at a simplified refactoring of the np.array_split source code:
def array_split(arr, Nsections):
    Neach_section, extras = divmod(len(arr), Nsections)
    section_sizes = ([0] + extras * [Neach_section + 1] + (Nsections - extras) * [Neach_section])
    div_points = np.array(section_sizes).cumsum()
    sub_arrs = []
    for i in range(Nsections):
        st = div_points[i]
        end = div_points[i + 1]
        sub_arrs.append(arr[st:end])
    return sub_arrs
Taking into account your example arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] and Nsections = 3 it will construct section sizes [0, 4, 3, 3] and dividing points [0, 4, 7, 10]. Then do something like this:
[arr[div_points[i]:div_points[i + 1]] for i in range(3)]
If you want to mimic the behaviour of numpy but with the strided ordering,
def array_split_withswap(arr, N):
    sub_arrs = []
    for i in range(N):
        sub_arrs.append(arr[i::N])
    return sub_arrs
is the best option to go with (as in @S3DEV's solution).
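If NumPy arrays are wanted instead of plain lists (as in the output the question asks for), the strided slicing can be applied to an array. The following is a minimal sketch; strided_split is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def strided_split(arr, n_sections):
    """Split arr into n_sections chunks by taking every n_sections-th element."""
    arr = np.asarray(arr)
    return [arr[i::n_sections] for i in range(n_sections)]

chunks = strided_split([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
print([c.tolist() for c in chunks])  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]

# The chunk lengths are the same as those produced by np.array_split.
sizes = [len(c) for c in chunks]
ref_sizes = [len(c) for c in np.array_split(np.arange(10), 3)]
```

The result is a list of arrays rather than a single 2-D array, since the chunks can differ in length.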

Pythonic way to replace specific values with other values of numpy array within particular indices

I have seen answers for each part of my question. For example, np.where(arr == b, c, arr) converts all b's in arr to c, and arr[arr == b] = c does the same in place. However, I have 1000 labels in a numpy array, labels_test, including 1 and 6. I want to flip 30 percent of the correct labels to wrong ones to make an erroneous dataset. So I create the following list of indices that should be changed.
l = [np.random.choice(1000) for x in range(100)] (I am not sure whether each index appears only once)
I want something like
np.put(labels_test, l, if labels_test[l] ==1, then 6 and if labels_test[l] ==6, then 1
We can do it for the following toy example:
np.random.seed(1)
labels_test = [np.random.choice([1,6]) for x in range(20)]
[6, 6, 1, 1, 6, 6, 6, 6, 6, 1, 1, 6, 1, 6, 6, 1, 1, 6, 1, 1]
Here is one way:
>>> labels_test = np.random.choice([1, 6], 20)
>>> ind = np.random.choice(labels_test.shape[0], labels_test.shape[0]//3, replace=False)
>>> labels_test
array([1, 6, 1, 1, 6, 1, 1, 1, 6, 1, 1, 1, 6, 6, 6, 6, 6, 1, 1, 1])
>>> labels_test[ind] = 7 - labels_test[ind]
>>> labels_test
array([1, 6, 1, 6, 6, 6, 1, 1, 6, 1, 6, 1, 1, 6, 1, 6, 6, 1, 1, 6])
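This flips exactly one third of the labels (rounded down to the nearest integer) by sampling without replacement, because the number of indices is labels_test.shape[0]//3; for 20 labels that is 6, which happens to coincide with 30%. For exactly 30% in general, use int(0.3 * labels_test.shape[0]) as the sample size. Depending on your requirements, a suitable alternative might be to select every label with probability 0.3.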
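The alternative mentioned last, flipping each label independently with probability 0.3, could look roughly like the sketch below. It assumes the labels are only 1s and 6s (so that 7 - label swaps them) and uses NumPy's Generator API with an arbitrary seed; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # the seed is arbitrary, only for repeatability

labels = rng.choice([1, 6], size=1000)
original = labels.copy()

# Each label is selected independently with probability 0.3, so the number
# of flipped labels is random and only about 30% on average.
mask = rng.random(labels.shape[0]) < 0.3

# 7 - 1 == 6 and 7 - 6 == 1, so this swaps the two classes in place.
labels[mask] = 7 - labels[mask]

n_flipped = int((labels != original).sum())
```

Unlike sampling indices without replacement, this does not fix the number of flipped labels in advance.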

Random sampling without replacement when more needs to be sampled than there are samples

I need to generate samples from a list of numbers in a scenario where I might have the situation that I need to sample more numbers than I have. More explicitly, this is what I need to do:
Let the total number of elements in my list be N.
I need to sample randomly without replacement from this list M samples.
If M <= N, then simply use Numpy's random.choice without replacement.
If M > N, then the samples must consist of X full copies of all N numbers in the list, where X is the number of times N fully divides M, i.e. X = floor(M/N), plus an additional M-(X*N) remainder samples drawn from the list without replacement.
For example, let my list be the following:
L = [1, 2, 3, 4, 5]
and I need to sample 8 samples. Then firstly, I sample the full list once and additional 3 elements randomly without replacement, e.g. my samples could then be:
Sampled_list = [1, 2, 3, 4, 5, 3, 5, 1]
How can I implement such a code as efficiently as possible in terms of computation time in Python? Can this be done without for-loops?
At the moment I'm implementing this using for-loops but this is too inefficient for my purposes. I have also tried Numpy's random.choice without replacement but then I need to have M <= N.
Thank you for any help!
You can concatenate the results of repeat and random.choice:
np.concatenate((np.repeat(L, M // len(L)), np.random.choice(L, M % len(L), replace=False)))
First, the sequence is repeated as often as necessary, then a choice is made for the remaining number needed; finally, the two arrays are concatenated.
Note that you can easily determine whether choice works with replacement or without, using the replace parameter:
replace : boolean, optional --
Whether the sample is with or without replacement
I would just wrap numpy's random.choice() like so:
import numpy as np
L = [1, 2, 3, 4, 5]
def wrap_choice(list_to_sample, no_samples):
    list_size = len(list_to_sample)
    takes = no_samples // list_size
    # Full copies of the list, plus the remainder drawn without replacement
    samples = list_to_sample * takes + np.random.choice(list_to_sample, no_samples - takes * list_size, replace=False).tolist()
    return samples
print(wrap_choice(L, 2)) # e.g. [5, 1]
print(wrap_choice(L, 13)) # e.g. [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 3, 4, 1]
Edit: There is no need to check for the length. The algorithm you have for when the requests are more than the list's length also works when this is not the case.
Here is what might be a solution for the case where 0 < M - N <= N (with N = len(l)): arg-sorting N random numbers gives a random permutation of the indices, and the first M - N of them pick the extra samples:
import numpy as np
from numpy.random import random
rand = l[np.argsort(random(N))[:M - N]]
new_l = np.concatenate((l, rand))
Here is an example:
l = np.array([1, 2, 3, 4, 5])
M, N = 7, len(l)
rand = l[np.argsort(random(N))[:M - N]]
new_l = np.concatenate((l, rand))
And here is one possible output:
new_l = np.array([1, 2, 3, 4, 5, 3, 4])
Use divmod() to get the number of repetitions of the list and the remainder/shortfall. The shortfall can then be randomly selected from the list using numpy.random.choice().
import numpy as np
def get_sample(l, n):
    samples, shortfall = divmod(n, len(l))
    return np.concatenate((np.repeat(l, samples), np.random.choice(l, shortfall, replace=False)))
>>> get_sample(range(100), 10)
array([91, 95, 73, 96, 18, 37, 32, 97, 4, 41])
>>> get_sample(range(10), 100)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9])
>>> get_sample([1,2,3,4], 0)
array([], dtype=int64)
>>> get_sample([1,2,3,4], 4)
array([1, 2, 3, 4])
>>> get_sample([1,2,3,4], 6)
array([1, 2, 3, 4, 4, 3])
>>> get_sample([1,2,3,4], 6)
array([1, 2, 3, 4, 3, 2])
>>> get_sample(list('test string'), 6)
array(['n', 's', 'g', 's', 't', ' '],
dtype='|S1')
>>> get_sample(np.array(list('test string')), 4)
array(['r', 't', 's', 'g'],
dtype='|S1')
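A variant of the divmod approach that keeps the full copies in list order (as in the question's example) can be sketched with np.tile and NumPy's Generator API. This is an illustration under those assumptions, not the code from the answers above; sample_with_overflow and its rng parameter are hypothetical names.

```python
import numpy as np

def sample_with_overflow(values, m, rng=None):
    """Return m samples: full copies of values plus a remainder drawn
    without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values)
    full_copies, remainder = divmod(m, len(values))
    # np.tile repeats the whole sequence; np.repeat would repeat each element.
    extra = rng.choice(values, size=remainder, replace=False)
    return np.concatenate((np.tile(values, full_copies), extra))

sample = sample_with_overflow([1, 2, 3, 4, 5], 8, rng=np.random.default_rng(0))
small = sample_with_overflow([1, 2, 3, 4, 5], 3, rng=np.random.default_rng(0))
```

With M = 8 and N = 5, the first five entries are the list itself and the last three are distinct elements of it; with M <= N the result is an ordinary sample without replacement.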

Is there a way to generate a list of indices using numpy

Can I use numpy to generate repeating patterns of indices? For example:
0, 1, 2, 3, 4, 5, 0, 6, 7, 8, 9, 10, 0, 11, 12, 13, 14, 15
or
0,1,2,1,2,3,4,5,6,5,6,7
Is there a method in numpy I can use to generate these lists between a range?
Currently I am doing this using lists in Python, but I was curious if I could use numpy to speed things up.
I am not sure what methods to even look into other than numpy.arange.
Just to further clarify, I am generating indices to triangles in OpenGL in various patterns.
So for triangles in a circle I have some code like this:
for fan_set in range(0, len(self.vertices) // vertex_length, triangle_count):
    for i in range(fan_set + 1, fan_set + 8):
        self.indices.append(fan_set)
        self.indices.append(i)
        self.indices.append(i + 1)
Your first example can be produced via numpy methods as:
In [860]: np.concatenate((np.zeros((3,1),int),np.arange(1,16).reshape(3,5)),axis=1).ravel()
Out[860]:
array([ 0, 1, 2, 3, 4, 5, 0, 6, 7, 8, 9, 10, 0, 11, 12, 13, 14,
15])
That's because I see this 2d repeated pattern
array([[ 0, 1, 2, 3, 4, 5],
[ 0, 6, 7, 8, 9, 10],
[ 0, 11, 12, 13, 14, 15]])
The second pattern can be produced by ravel of this 2d array (produced by broadcasting 2 arrays):
In [863]: np.array([0,1,4,5])[:,None]+np.arange(3)
Out[863]:
array([[0, 1, 2],
[1, 2, 3],
[4, 5, 6],
[5, 6, 7]])
I can produce the 1st pattern with a variation on the 2nd (the initial column of 0s disrupts the pattern)
I=np.array([0,5,10])[:,None]+np.arange(0,6)
I[:,0]=0
I think your double loop can be expressed as a list comprehension as
In [872]: np.array([ [k,i,i+1] for k in range(0,1,1) for i in range(k+1,k+8)]).ravel()
Out[872]: array([0, 1, 2, 0, 2, 3, 0, 3, 4, 0, 4, 5, 0, 5, 6, 0, 6, 7, 0, 7, 8])
or without the ravel:
array([[0, 1, 2],
[0, 2, 3],
[0, 3, 4],
[0, 4, 5],
[0, 5, 6],
[0, 6, 7],
[0, 7, 8]])
though I don't know what parameters produce your examples.
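The broadcasting idea can also be applied to the double loop from the question directly. The sketch below is one possible vectorized form; fan_indices, its parameters, and the default of 7 triangles per fan (taken from the range(fan_set + 1, fan_set + 8) in the question) are assumptions for illustration, and fan_indices_loop is a plain-Python reference for comparison.

```python
import numpy as np

def fan_indices(n_vertices, step, n_triangles=7):
    """Vectorized triangle-fan indices: (centre, i, i + 1) for each triangle."""
    # Centre vertex of each fan: 0, step, 2*step, ...
    starts = np.arange(0, n_vertices, step)[:, None]   # shape (n_fans, 1)
    # Second vertex of every triangle in every fan, via broadcasting.
    second = starts + np.arange(1, n_triangles + 1)    # shape (n_fans, n_triangles)
    first = np.broadcast_to(starts, second.shape)
    triangles = np.stack((first, second, second + 1), axis=-1)
    return triangles.ravel()

def fan_indices_loop(n_vertices, step, n_triangles=7):
    """Reference implementation mirroring the nested loops."""
    indices = []
    for fan_set in range(0, n_vertices, step):
        for i in range(fan_set + 1, fan_set + 1 + n_triangles):
            indices.extend((fan_set, i, i + 1))
    return indices

single_fan = fan_indices(1, 1)
```

For a single fan starting at vertex 0 this yields 0, 1, 2, 0, 2, 3, ..., 0, 7, 8, the same sequence as the list comprehension above.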
I'm not sure I understand exactly what you mean, but the following is what I use to generate unique indices for 3D points:
def indexate(points):
    """
    Convert a numpy array of points into a list of indices and an array of
    unique points.
    Arguments:
        points: A numpy array of shape (N, 3).
    Returns:
        An array of indices and an (M, 3) array of unique points.
    """
    pd = {}
    indices = [pd.setdefault(tuple(p), len(pd)) for p in points]
    pt = sorted([(v, k) for k, v in pd.items()], key=lambda x: x[0])
    unique = np.array([i[1] for i in pt])
    return np.array(indices, np.uint16), unique
You can find this code in my stltools package on GitHub.
It works like this:
In [1]: import numpy as np
In [2]: points = np.array([[1,0,0], [0,0,1], [1,0,0], [0,1,0]])
In [3]: pd = {}
In [4]: indices = [pd.setdefault(tuple(p), len(pd)) for p in points]
In [5]: indices
Out[5]: [0, 1, 0, 2]
In [6]: pt = sorted([(v, k) for k, v in pd.items()], key=lambda x: x[0])
In [7]: pt
Out[7]: [(0, (1, 0, 0)), (1, (0, 0, 1)), (2, (0, 1, 0))]
In [8]: unique = np.array([i[1] for i in pt])
In [9]: unique
Out[9]:
array([[1, 0, 0],
[0, 0, 1],
[0, 1, 0]])
The key point (if you'll pardon the pun) is to use a tuple of the point (because a tuple is immutable and thus hashable) as the key in a dictionary with the setdefault method, while the length of the dict at that moment is the value. In effect, the value is the index assigned to this exact point the first time it was seen, and later occurrences of the same point reuse that index.
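As a point of comparison, np.unique with axis=0 and return_inverse=True produces a similar pair of unique points and indices without a Python-level dictionary. This is a different technique from the setdefault approach above: it returns the unique points in sorted order rather than in order of first appearance, so the index values differ. A minimal sketch:

```python
import numpy as np

points = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]])

# unique_pts holds the distinct rows in lexicographic order; inverse maps
# every input row to its row number in unique_pts.
unique_pts, inverse = np.unique(points, axis=0, return_inverse=True)
inverse = inverse.ravel()  # flatten in case a NumPy version returns a 2-D inverse

print(unique_pts.tolist())  # [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
print(inverse.tolist())     # [2, 0, 2, 1]

# Indexing the unique points with the indices rebuilds the input.
restored = unique_pts[inverse]
```

If the first-seen ordering matters (for example to keep vertex order stable), the dictionary-based version above is the one to use.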
I am not 100% certain this is what you're after, but I think you can achieve it by building each group from a pair of overlapping ranges, shifting group i by i*3, and then using numpy.concatenate to join the groups into the final array, like this:
import numpy as np
def gen_list(n):
    # Group i is [i, i+1, i+2, i+1, i+2, i+3], shifted by i*3
    return np.concatenate([np.array(list(range(i, i+3)) + list(range(i+1, i+4))) + i*3
                           for i in range(n)])
Usage:
gen_list(2)
Out[16]: array([0, 1, 2, 1, 2, 3, 4, 5, 6, 5, 6, 7])
gen_list(3)
Out[17]:
array([ 0, 1, 2, 1, 2, 3, 4, 5, 6, 5, 6, 7, 8, 9, 10, 9, 10,
11])
list(gen_list(2))
Out[18]: [0, 1, 2, 1, 2, 3, 4, 5, 6, 5, 6, 7]
In my sample, n is simply the number of groups to generate; you may change this to suit your triangle-ish requirements.

how to do circular shift in numpy

I have a numpy array, for example
a = np.arange(10)
how can I move the first n elements to the end of the array?
I found this roll function but it seems like it only does the opposite, which shifts the last n elements to the beginning.
Why not just roll with a negative number?
>>> import numpy as np
>>> a = np.arange(10)
>>> np.roll(a,2)
array([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])
>>> np.roll(a,-2)
array([2, 3, 4, 5, 6, 7, 8, 9, 0, 1])
You can use a negative shift:
a = np.arange(10)
print(np.roll(a, 3))
print(np.roll(a, -3))
which prints
[7 8 9 0 1 2 3 4 5 6]
[3 4 5 6 7 8 9 0 1 2]
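To see why a negative shift does what the question asks, the same result can be built by hand from two slices. This is only a sketch for illustration; rotate_left is a hypothetical helper name, and np.roll remains the simpler choice in practice.

```python
import numpy as np

def rotate_left(a, n):
    """Move the first n elements of a 1-D array to the end."""
    a = np.asarray(a)
    if a.size == 0:
        return a.copy()
    n %= a.size  # shifts larger than the array wrap around, as with np.roll
    return np.concatenate((a[n:], a[:n]))

a = np.arange(10)
rotated = rotate_left(a, 2)
print(rotated.tolist())  # [2, 3, 4, 5, 6, 7, 8, 9, 0, 1]
```

For 1-D input this matches np.roll(a, -n).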
