If I have the following list
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Then
np.array_split([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
Returns
[array([0, 1, 2, 3]), array([4, 5, 6]), array([7, 8, 9])]
Is there a way to get the sub-arrays in the following order?
[array([0, 3, 6, 9]), array([1, 4, 7]), array([2, 5, 8])]
As the lists are of differing lengths, a numpy.ndarray isn't possible without a bit of fiddling, as all sub-arrays must be the same length.
However, if a simple list meets your requirement, you can use:
l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
l2 = []
for i in range(3):
    l2.append(l[i::3])
Output:
[[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
Or more concisely, giving the same output:
[l[i::3] for i in range(3)]
Let's look at a simplified refactoring of the source code of np.array_split:
def array_split(arr, Nsections):
    Neach_section, extras = divmod(len(arr), Nsections)
    section_sizes = ([0] + extras * [Neach_section + 1] + (Nsections - extras) * [Neach_section])
    div_points = np.array(section_sizes).cumsum()
    sub_arrs = []
    for i in range(Nsections):
        st = div_points[i]
        end = div_points[i + 1]
        sub_arrs.append(arr[st:end])
    return sub_arrs
For your example, arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] and Nsections = 3, it constructs the section sizes [0, 4, 3, 3] and the dividing points [0, 4, 7, 10], and then does something like this:
[arr[div_points[i]:div_points[i + 1]] for i in range(3)]
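To make the intermediate values concrete, here is a small sketch that recomputes the section sizes and dividing points for the example above. The lowercase variable names are my own and are not taken from the NumPy source.

```python
import numpy as np

arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
n_sections = 3

# 10 elements over 3 sections: 3 each, with 1 left over
n_each, extras = divmod(len(arr), n_sections)

# The first `extras` sections get one extra element; the leading 0 is the start offset
section_sizes = [0] + extras * [n_each + 1] + (n_sections - extras) * [n_each]

# The cumulative sum turns the sizes into slice boundaries
div_points = np.cumsum(section_sizes)

chunks = [arr[div_points[i]:div_points[i + 1]] for i in range(n_sections)]
print(section_sizes)        # [0, 4, 3, 3]
print(div_points.tolist())  # [0, 4, 7, 10]
print(chunks)               # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```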
If you want to mimic the behaviour of numpy but get the interleaved order, then indeed
def array_split_withswap(arr, N):
    sub_arrs = []
    for i in range(N):
        sub_arrs.append(arr[i::N])
    return sub_arrs
is the best option to go with (as in S3DEV's solution).
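As a quick check, the sketch below compares the strided split with np.array_split on the example input. The function name strided_split is hypothetical, and converting the input with np.asarray is my own choice so that the result is a list of arrays, as in the question.

```python
import numpy as np

def strided_split(arr, n):
    """Split arr into n interleaved sub-arrays: element k goes to part k % n."""
    arr = np.asarray(arr)
    return [arr[i::n] for i in range(n)]

parts = strided_split(range(10), 3)
reference = np.array_split(range(10), 3)

print([p.tolist() for p in parts])  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
# Both approaches yield sub-arrays of the same lengths, only the ordering differs
print([len(p) for p in parts] == [len(r) for r in reference])  # True
```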
I have seen answers for each part of my question. For example, np.where(arr == b, c, arr) converts all b's in arr to c, and arr[arr == b] = c does the same in place. However, I have 1000 labels in a numpy array, labels_test, whose values are 1 and 6. I want to flip 30 percent of the correct labels to wrong ones to make an erroneous dataset. So I create the following list of indices that should be changed.
l = [np.random.choice(1000) for x in range(100)] (I am not sure if each index is repeated once)
I want something like
np.put(labels_test, l, if labels_test[l] == 1, then 6 and if labels_test[l] == 6, then 1)
We can do it for the following toy example:
np.random.seed(1)
labels_test = [np.random.choice([1,6]) for x in range(20)]
[6, 6, 1, 1, 6, 6, 6, 6, 6, 1, 1, 6, 1, 6, 6, 1, 1, 6, 1, 1]
Here is one way:
>>> labels_test = np.random.choice([1, 6], 20)
>>> ind = np.random.choice(labels_test.shape[0], labels_test.shape[0]//3, replace=False)
>>> labels_test
array([1, 6, 1, 1, 6, 1, 1, 1, 6, 1, 1, 1, 6, 6, 6, 6, 6, 1, 1, 1])
>>> labels_test[ind] = 7 - labels_test[ind]
>>> labels_test
array([1, 6, 1, 6, 6, 6, 1, 1, 6, 1, 6, 1, 1, 6, 1, 6, 6, 1, 1, 6])
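This flips exactly one third of the labels (rounded down to the nearest integer; here 6 of 20, i.e. 30%) by sampling indices without replacement. To flip 30% in general, pass int(0.3 * labels_test.shape[0]) as the sample size instead. Depending on your requirements, a suitable alternative might be to select every label with probability 0.3.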
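The probabilistic alternative could look roughly like the following. This is only a sketch: it assumes the labels take just the values 1 and 6, and it uses np.random.default_rng with an arbitrary seed, neither of which comes from the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # the seed is arbitrary, chosen for reproducibility

labels_test = rng.choice([1, 6], size=1000)
original = labels_test.copy()

# Each label is selected independently with probability 0.3,
# so the number of flips is about 300 but not exactly 300.
mask = rng.random(labels_test.shape[0]) < 0.3

# 7 - x swaps 1 <-> 6
labels_test[mask] = 7 - labels_test[mask]

n_flipped = int((labels_test != original).sum())
```

Unlike sampling indices without replacement, this does not fix the number of flipped labels in advance; n_flipped varies from run to run unless the seed is fixed.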
I need to generate samples from a list of numbers in a scenario where I might have the situation that I need to sample more numbers than I have. More explicitly, this is what I need to do:
Let the total number of elements in my list be N.
I need to sample randomly without replacement from this list M samples.
If M <= N, then simply use Numpy's random.choice without replacement.
If M > N, then the samples must consist of all N numbers in the list repeated X times, where X is the number of times N fully divides M, i.e. X = floor(M/N), plus M-(X*N) additional remainder samples drawn from the list without replacement.
For example, let my list be the following:
L = [1, 2, 3, 4, 5]
and I need to sample 8 samples. Then firstly, I sample the full list once and additional 3 elements randomly without replacement, e.g. my samples could then be:
Sampled_list = [1, 2, 3, 4, 5, 3, 5, 1]
How can I implement this as efficiently as possible in terms of computation time in Python? Can it be done without for-loops?
At the moment I'm implementing this using for-loops but this is too inefficient for my purposes. I have also tried Numpy's random.choice without replacement but then I need to have M <= N.
Thank you for any help!
You can concatenate the results of repeat and random.choice:
np.concatenate((np.repeat(L, M // len(L)), np.random.choice(L, M % len(L), replace=False)))
First, the sequence is repeated as often as necessary, then a choice is made for the remaining number needed; finally, the two arrays are concatenated.
Note that you can easily determine whether choice works with replacement or without, using the replace parameter:
replace : boolean, optional --
Whether the sample is with or without replacement
I would just wrap numpy's random.choice() like so:
import numpy as np

L = [1, 2, 3, 4, 5]

def wrap_choice(list_to_sample, no_samples):
    list_size = len(list_to_sample)
    takes = no_samples // list_size  # number of full copies of the list
    # Draw the remainder without replacement, as the question requires
    remainder = np.random.choice(list_to_sample, no_samples - takes * list_size, replace=False)
    return list_to_sample * takes + remainder.tolist()

print(wrap_choice(L, 2))   # e.g. [5, 1]
print(wrap_choice(L, 13))  # e.g. [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 3, 4, 1]
Edit: There is no need to check for the length. The algorithm you have for when the requests are more than the list's length also works when this is not the case.
Here is what might be a solution for the case where 0 < M - N <= N:
import numpy as np
from numpy.random import random
l = np.array([1, 2, 3, 4, 5])
N = len(l)
# argsort of N random numbers gives a random permutation of the indices of l
rand = l[np.argsort(random(N))][:M - N]
new_l = np.concatenate((l, rand))
Here is an example:
l = np.array([1, 2, 3, 4, 5])
M, N = 7, len(l)
rand = l[np.argsort(random(N))][:M - N]
new_l = np.concatenate((l, rand))
And here is one possible output:
new_l = np.array([1, 2, 3, 4, 5, 3, 4])
Use divmod() to get the number of repetitions of the list and the remainder/shortfall. The shortfall can then be randomly selected from the list using numpy.random.choice().
import numpy as np
def get_sample(l, n):
    samples, shortfall = divmod(n, len(l))
    return np.concatenate((np.repeat(l, samples), np.random.choice(l, shortfall, False)))
>>> get_sample(range(100), 10)
array([91, 95, 73, 96, 18, 37, 32, 97, 4, 41])
>>> get_sample(range(10), 100)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9])
>>> get_sample([1,2,3,4], 0)
array([], dtype=int64)
>>> get_sample([1,2,3,4], 4)
array([1, 2, 3, 4])
>>> get_sample([1,2,3,4], 6)
array([1, 2, 3, 4, 4, 3])
>>> get_sample([1,2,3,4], 6)
array([1, 2, 3, 4, 3, 2])
>>> get_sample(list('test string'), 6)
array(['n', 's', 'g', 's', 't', ' '],
dtype='|S1')
>>> get_sample(np.array(list('test string')), 4)
array(['r', 't', 's', 'g'],
dtype='|S1')
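Note that np.repeat groups equal elements together (0, 0, ..., 1, 1, ...). If the full copies should instead appear in list order, as in the question's example, a variant with np.tile might look like this. It is a sketch only: the function name sample_with_wraparound and the optional rng argument are my own additions.

```python
import numpy as np

def sample_with_wraparound(values, m, rng=None):
    """Return m samples: as many whole copies of values as fit, plus a
    remainder drawn without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values)
    full_copies, shortfall = divmod(m, len(values))
    # shortfall < len(values), so sampling without replacement is always possible
    extras = rng.choice(values, size=shortfall, replace=False)
    # np.tile repeats the whole sequence, keeping its original order
    return np.concatenate((np.tile(values, full_copies), extras))

result = sample_with_wraparound([1, 2, 3, 4, 5], 8, np.random.default_rng(0))
print(result[:5])  # [1 2 3 4 5]
```

The last three entries of result are three distinct elements of the list in random order.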
Can I use numpy to generate repeating patterns of indices? For example:
0, 1, 2, 3, 4, 5, 0, 6, 7, 8, 9, 10, 0, 11, 12, 13, 14, 15
or
0,1,2,1,2,3,4,5,6,5,6,7
Is there a method in numpy I can use to generate these lists over a range?
Currently I am doing this using lists in Python, but I was curious whether I could use numpy to speed things up.
I am not sure what methods to even look into other than numpy.arange.
Just to clarify further, I am generating indices to triangles in OpenGL in various patterns.
So for triangles in a circle I have some code like this:
for fan_set in range(0, len(self.vertices) // vertex_length, triangle_count):
    for i in range(fan_set + 1, fan_set + 8):
        self.indices.append(fan_set)
        self.indices.append(i)
        self.indices.append(i + 1)
Your first example can be produced via numpy methods as:
In [860]: np.concatenate((np.zeros((3,1),int),np.arange(1,16).reshape(3,5)),axis=1).ravel()
Out[860]:
array([ 0, 1, 2, 3, 4, 5, 0, 6, 7, 8, 9, 10, 0, 11, 12, 13, 14,
15])
That's because I see this 2d repeated pattern
array([[ 0, 1, 2, 3, 4, 5],
[ 0, 6, 7, 8, 9, 10],
[ 0, 11, 12, 13, 14, 15]])
The second pattern can be produced by ravel of this 2d array (produced by broadcasting 2 arrays):
In [863]: np.array([0,1,4,5])[:,None]+np.arange(3)
Out[863]:
array([[0, 1, 2],
[1, 2, 3],
[4, 5, 6],
[5, 6, 7]])
I can produce the 1st pattern with a variation on the 2nd (the initial column of 0s disrupts the pattern)
I=np.array([0,5,10])[:,None]+np.arange(0,6)
I[:,0]=0
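Put together as a runnable snippet, the two broadcasting constructions reproduce both patterns from the question. The variable names first and second are mine.

```python
import numpy as np

# Second pattern: each row start (column vector) plus the offsets 0, 1, 2
second = (np.array([0, 1, 4, 5])[:, None] + np.arange(3)).ravel()

# First pattern: same idea with row starts 0, 5, 10, then reset column 0 to 0
first = np.array([0, 5, 10])[:, None] + np.arange(6)
first[:, 0] = 0
first = first.ravel()

print(second.tolist())  # [0, 1, 2, 1, 2, 3, 4, 5, 6, 5, 6, 7]
print(first.tolist())   # [0, 1, 2, 3, 4, 5, 0, 6, 7, 8, 9, 10, 0, 11, 12, 13, 14, 15]
```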
I think your double loop can be expressed as a list comprehension as
In [872]: np.array([ [k,i,i+1] for k in range(0,1,1) for i in range(k+1,k+8)]).ravel()
Out[872]: array([0, 1, 2, 0, 2, 3, 0, 3, 4, 0, 4, 5, 0, 5, 6, 0, 6, 7, 0, 7, 8])
or without the ravel:
array([[0, 1, 2],
[0, 2, 3],
[0, 3, 4],
[0, 4, 5],
[0, 5, 6],
[0, 6, 7],
[0, 7, 8]])
though I don't know what parameters produce your examples.
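If the goal is to avoid the Python-level loop entirely, the same fan indices can probably be built with broadcasting. This is a sketch under my own assumptions: the function name fan_indices and its parameters are hypothetical, and fan_starts stands in for the values that the outer range(...) in the question would produce.

```python
import numpy as np

def fan_indices(fan_starts, triangles_per_fan=7):
    """For each start index s, emit the triples (s, i, i + 1) for
    i = s + 1 .. s + triangles_per_fan, flattened into one array."""
    starts = np.asarray(fan_starts)[:, None]            # shape (n_fans, 1)
    i = starts + np.arange(1, triangles_per_fan + 1)    # shape (n_fans, triangles_per_fan)
    centre = np.broadcast_to(starts, i.shape)           # fan centre repeated per triangle
    return np.stack((centre, i, i + 1), axis=-1).ravel()

indices = fan_indices([0])
print(indices.tolist())
# [0, 1, 2, 0, 2, 3, 0, 3, 4, 0, 4, 5, 0, 5, 6, 0, 6, 7, 0, 7, 8]
```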
I'm not sure I understand exactly what you mean, but the following is what I use to generate unique indices for 3D points;
def indexate(points):
    """
    Convert a numpy array of points into a list of indices and an array of
    unique points.

    Arguments:
        points: A numpy array of shape (N, 3).

    Returns:
        An array of indices and an (M, 3) array of unique points.
    """
    pd = {}
    indices = [pd.setdefault(tuple(p), len(pd)) for p in points]
    pt = sorted([(v, k) for k, v in pd.items()], key=lambda x: x[0])
    unique = np.array([i[1] for i in pt])
    return np.array(indices, np.uint16), unique
You can find this code in my stltools package on github.
It works like this;
In [1]: import numpy as np
In [2]: points = np.array([[1,0,0], [0,0,1], [1,0,0], [0,1,0]])
In [3]: pd = {}
In [4]: indices = [pd.setdefault(tuple(p), len(pd)) for p in points]
In [5]: indices
Out[5]: [0, 1, 0, 2]
In [6]: pt = sorted([(v, k) for k, v in pd.items()], key=lambda x: x[0])
In [7]: pt
Out[7]: [(0, (1, 0, 0)), (1, (0, 0, 1)), (2, (0, 1, 0))]
In [8]: unique = np.array([i[1] for i in pt])
In [9]: unique
Out[9]:
array([[1, 0, 0],
[0, 0, 1],
[0, 1, 0]])
The key point (if you'll pardon the pun) is to use a tuple of the point (because a tuple is immutable and thus hashable) as the key in a dictionary with the setdefault method, while the length of the dict at that moment is the value. In effect, the value is the index assigned to the point the first time this exact point was seen.
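For comparison, NumPy's own np.unique can do a similar job when called with axis=0 and return_inverse=True. This is a different technique from the dictionary approach above, not a drop-in replacement: the unique rows come back sorted rather than in first-seen order, so the index numbering differs. The reshape is a precaution on my part, since the shape of the inverse array has varied between NumPy versions.

```python
import numpy as np

points = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]])

# unique: the distinct rows (sorted); inverse: for each input row, its index in unique
unique, inverse = np.unique(points, axis=0, return_inverse=True)
inverse = inverse.reshape(-1)

# Indexing the unique rows with the inverse reconstructs the original array
reconstructed = unique[inverse]
print(np.array_equal(reconstructed, points))  # True
```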
I am not 100% certain this is what you're after, but I think you can achieve it by using a pair of range values and adding i times 3 (the gap between each group) to them, then using numpy.concatenate to build the final array, like this:
import numpy as np
def gen_list(n):
    # Each group is two overlapping runs of three, shifted by i*3
    return np.concatenate([np.array(list(range(i, i+3)) + list(range(i+1, i+4))) + i*3
                           for i in range(n)])
Usage:
gen_list(2)
Out[16]: array([0, 1, 2, 1, 2, 3, 4, 5, 6, 5, 6, 7])
gen_list(3)
Out[17]:
array([ 0, 1, 2, 1, 2, 3, 4, 5, 6, 5, 6, 7, 8, 9, 10, 9, 10,
11])
list(gen_list(2))
Out[18]: [0, 1, 2, 1, 2, 3, 4, 5, 6, 5, 6, 7]
In my sample I only use n as how many groups you want to generate, you may change this to suit your triangle-ish requirements.
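The same output can apparently be produced without a Python loop by broadcasting twice. The sketch below uses a hypothetical name, gen_list_vectorized, and assumes the pattern continues with a stride of 4 between groups, as in the outputs shown above.

```python
import numpy as np

def gen_list_vectorized(n):
    # Row starts 0, 1, 4, 5, 8, 9, ...: two consecutive starts per group, groups 4 apart
    starts = (4 * np.arange(n)[:, None] + np.arange(2)).ravel()
    # Each start expands into the run start, start + 1, start + 2
    return (starts[:, None] + np.arange(3)).ravel()

print(gen_list_vectorized(2).tolist())  # [0, 1, 2, 1, 2, 3, 4, 5, 6, 5, 6, 7]
```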
I have a numpy array, for example
a = np.arange(10)
How can I move the first n elements to the end of the array?
I found this roll function but it seems like it only does the opposite, which shifts the last n elements to the beginning.
Why not just roll with a negative number?
>>> import numpy as np
>>> a = np.arange(10)
>>> np.roll(a,2)
array([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])
>>> np.roll(a,-2)
array([2, 3, 4, 5, 6, 7, 8, 9, 0, 1])
You can use a negative shift:
a = np.arange(10)
print(np.roll(a, 3))
print(np.roll(a, -3))
This prints
[7 8 9 0 1 2 3 4 5 6]
[3 4 5 6 7 8 9 0 1 2]
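To see what the negative shift does, it can help to write the same operation with plain slicing. The helper name move_first_to_end below is hypothetical and exists only for illustration.

```python
import numpy as np

def move_first_to_end(a, n):
    """Move the first n elements of a 1-D array to the end."""
    a = np.asarray(a)
    return np.concatenate((a[n:], a[:n]))

a = np.arange(10)
shifted = move_first_to_end(a, 3)
print(shifted)                                  # [3 4 5 6 7 8 9 0 1 2]
print(np.array_equal(shifted, np.roll(a, -3)))  # True
```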