First I create my array
myarray = np.random.randint(0, 11, size=20)  # np.random.random_integers is deprecated; randint's upper bound is exclusive
Then, I want to set 20% of the elements in the array to 0 (or some other number). How should I do this? Apply a mask?
You can calculate the indices with np.random.choice, limiting the number of chosen indices to the percentage:
indices = np.random.choice(np.arange(myarray.size), replace=False,
                           size=int(myarray.size * 0.2))
myarray[indices] = 0
For others looking for the answer in case of nd-array, as proposed by user holi:
my_array = np.random.rand(8, 50)
indices = np.random.choice(my_array.shape[1]*my_array.shape[0], replace=False, size=int(my_array.shape[1]*my_array.shape[0]*0.2))
We multiply the dimensions to get the total number of elements (dim1*dim2), then we apply these indices to our array:
my_array[np.unravel_index(indices, my_array.shape)] = 0
The array is now masked.
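A quick sanity check (a sketch, not part of the original answer): since np.random.rand essentially never produces an exact zero on its own, the count of zeros should equal the number of chosen indices.
print((my_array == 0).sum(), int(my_array.size * 0.2))   # both should be 80 for the 8 x 50 example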
Use np.random.permutation as random index generator, and take the first 20% of the index.
myarray = np.random.randint(0, 11, size=20)  # random_integers is deprecated
n = len(myarray)
random_idx = np.random.permutation(n)
frac = 20 # [%]
zero_idx = random_idx[:round(n*frac/100)]
myarray[zero_idx] = 0
If you want the 20% to be random:
import random

random_list = []
array_len = len(myarray)
while len(random_list) < (array_len / 5):
    random_int = random.randint(0, array_len - 1)   # randint (from the random module) is inclusive of both ends
    if random_int not in random_list:
        random_list.append(random_int)
for position in random_list:
    myarray[position] = 0
This would ensure you definitely get 20% of the values, and RNG rolling the same number many times would not result in less than 20% of the values being 0.
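If you prefer the standard library, random.sample gives the same guarantee of distinct positions in fewer lines (a sketch, not part of the original answer):
import random
for position in random.sample(range(len(myarray)), len(myarray) // 5):
    myarray[position] = 0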
Assume your input numpy array is A and p=0.2. The following are a couple of ways to achieve this.
Exact Masking
ones = np.ones(A.size)                # start from an all-ones mask
idx = int(min(p * A.size, A.size))    # number of entries to zero out
ones[:idx] = 0
A *= np.reshape(np.random.permutation(ones), A.shape)   # shuffle the mask and apply it (A should be a float array)
Approximate Masking
This is commonly done in several denoising objectives, most notably the Masked Language Modeling in Transformers pre-training. Here is a more pythonic way of setting a certain proportion (say 20%) of elements to zero.
A *= np.random.binomial(size=A.shape, n=1, p=0.8)
Another Alternative:
A *= np.random.randint(0, 2, A.shape)
Note that this zeroes roughly half of the elements; use the binomial version above if you need a specific proportion such as 20%.
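A quick way to see the difference between the exact and approximate variants is to count the zeros each kind of mask produces (a sketch with a throwaway 1-D array standing in for A, and p = 0.2):
import numpy as np
A = np.random.rand(10000)
p = 0.2
mask_exact = np.random.permutation(np.concatenate([np.zeros(int(p * A.size)), np.ones(A.size - int(p * A.size))]))
mask_approx = np.random.binomial(n=1, p=1 - p, size=A.size)
print((mask_exact == 0).mean())    # exactly 0.2
print((mask_approx == 0).mean())   # close to 0.2, varies from run to run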
I have 2 arrays, x_1g and x_2g. I want to randomly sample 10% of each array and remove that 10% and insert it into the other array. This means that my final and initial arrays should have the same shape, but 10% of the data is randomly sampled from the other array. I have been trying this with the code below but my arrays keep increasing in length, meaning I haven't properly deleted the sampled 10% data from each array.
n = len(x_1g)
n2 = round(n/10)
ints1 = np.random.choice(n, n2)
x_1_replace = x_1g[ints1,:]
x_1 = np.delete(x_1g, ints1, 0)
x_2_replace = x_2g[ints1,:]
x_2 = np.delete(x_2g, ints1, 0)
My arrays x_1g and x_2g have shape (1502983, 10):
x_1g.shape
>> (1502983, 10)
x_1_replace.shape
>> (150298, 10)
so when I remove the 10% data (x_1_replace) from my original array (x_1g) I should get the array shape:
1502983-150298 = 1352685
However when I check the shape of my array x_1 I get:
x_1.shape
>> (1359941, 10)
I'm not sure what is going on here so if anyone has any suggestions please let me know!!
What happens is that by using ints1 = np.random.choice(n, n2) to generate your indices, you are choosing a number between 0 and n-1, n2 times. You have no guarantee that you will generate n2 different numbers; you are most likely generating some duplicates. And if you pass the same index position to np.delete several times, it is only deleted once. You can check this by counting the unique values in ints1:
np.unique(ints1).shape
You'll see it is not matching n2 (in your example, you'll get (143042,)).
There's probably more than one way to ensure that you'll get n2 different indices, here is one example:
n = len(x_1g)
n2 = round(n/10)
ints1 = np.arange(n) # generating an array [0 ... n-1]
np.random.shuffle(ints1) # shuffle it
ints1 = ints1[:n2] # take the first n2 values
x_1_replace = x_1g[ints1,:]
x_1 = np.delete(x_1g, ints1, 0)
x_2_replace = x_2g[ints1,:]
x_2 = np.delete(x_2g, ints1, 0)
Now you can check:
x_1.shape
# (1352685, 10)
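Alternatively, np.random.choice can itself guarantee distinct indices if you pass replace=False (a sketch, same idea as in the first question above):
ints1 = np.random.choice(n, n2, replace=False)   # n2 distinct indices drawn from range(n)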
I am looking for an efficient way to do the following calculations on millions of arrays. For the values in each array, I want to calculate the mean of the values in the bin with most frequency as demonstrated below. Some of the arrays might contain nan values and other values are float. The loop for my actual data takes too long to finish.
import numpy as np
array = np.array([np.random.uniform(0, 10) for i in range(800,)])
# adding nan values
mask = np.random.choice([1, 0], array.shape, p=[.7, .3]).astype(bool)
array[mask] = np.nan
array = array.reshape(50, 16)
bin_values=np.linspace(0, 10, 21)
f = np.apply_along_axis(lambda a: np.histogram(a, bins=bin_values)[0], 1, array)
bin_start = np.apply_along_axis(lambda a: bin_values[np.argmax(a)], 1, f).reshape(array.shape[0], -1)
bin_end = bin_start + abs(bin_values[1] - bin_values[0])
values = np.zeros(array.shape[0])
for i in range(array.shape[0]):
    values[i] = np.nanmean(array[i][(array[i] >= bin_start[i]) * (array[i] < bin_end[i])])
Also, when I run the above code I get three warnings. The first is 'RuntimeWarning: Mean of empty slice' for the line where I calculate the values variable. I set a condition to skip this line when a row is all nan values, but the warning did not go away, and I was wondering why. The other two warnings are for the less and greater_equal comparisons, which make sense to me since some of the values are nan.
The arrays that I want to run this algorithm on are independent, but I am already processing them with 12 separate scripts. Running the code in parallel would be an option, however, for now I am looking to improve the algorithm itself.
The reason I am using a lambda function is to run numpy.histogram over an axis, since the histogram function does not take an axis argument. I was able to use a mask and remove the loop from the code. The code is now twice as fast, but I think it can still be improved.
I can explain what I want to do in more detail with an example, if that clarifies it. Imagine I have 36 numbers which are greater than 0 and smaller than 20. Also, I have bins of equal width 0.5 over the same interval (0.0_0.5, 0.5_1.0, 1.0_1.5, … , 19.5_20.0). If I put the 36 numbers into their corresponding bins, I want the mean of the numbers inside the bin that contains the most of them.
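As a tiny illustration of that goal (a sketch, not part of the original question): with values 1.2, 1.4 and 3.7 and 0.5-wide bins, the bin 1.0-1.5 holds two values, so the result would be mean(1.2, 1.4) = 1.3.
import numpy as np
toy = np.array([1.2, 1.4, 3.7])
counts, edges = np.histogram(toy, bins=np.arange(0, 20.5, 0.5))   # bins 0.0-0.5, ..., 19.5-20.0
b = np.argmax(counts)                                             # index of the most-populated bin
print(toy[(toy >= edges[b]) & (toy < edges[b + 1])].mean())       # 1.3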
Please post your solution if you can think of a faster algorithm.
import numpy as np
# creating an array to test the algorithm
array = np.array([np.random.uniform(0, 10) for i in range(800,)])
# adding nan values
mask = np.random.choice([1, 0], array.shape, p=[.7, .3]).astype(bool)
array[mask] = np.nan
array = array.reshape(50, 16)
# the algorithm
bin_values=np.linspace(0, 10, 21)
# calculating the frequency of each bin
f = np.apply_along_axis(lambda a: np.histogram(a, bins=bin_values)[0], 1, array)
bin_start = np.apply_along_axis(lambda a: bin_values[np.argmax(a)], 1, f).reshape(array.shape[0], -1)
bin_end = bin_start + (abs(bin_values[1]-bin_values[0]))
# creating a mask to get the mean over the bin with maximum frequency
mask = (array>=bin_start) * (array<bin_end)
mask_nan = np.tile(np.nan, (mask.shape[0], mask.shape[1]))
mask_nan[mask] = 1
v = np.nanmean(array * mask_nan, axis = 1)
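If you want to confirm that the masked version matches the original loop, a quick check (a sketch reusing the names above) is:
ref = np.zeros(array.shape[0])
for i in range(array.shape[0]):
    sel = (array[i] >= bin_start[i]) & (array[i] < bin_end[i])
    ref[i] = np.nanmean(array[i][sel])
print(np.allclose(v, ref, equal_nan=True))   # True; rows with no values in the top bin are nan in both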
I have a 3-dimensional numpy array that I am checking multiple conditions for. I am checking each element to see if it is less than a certain number. Each 3D element of what I call array3 is indexed by i, where i = [0, 1, 2]; if one of those components is greater than the number I have set, giving a boolean array like [False, True, True] or [False, False, True], then that index is eliminated from array3.
I have a dumb method for each element less than 20:
import numpy as np
wx = np.where( np.abs(array3[:,0]) <= 20.0 ) # x values less than 20
xarray3x = array3[:,0][wx]
yarray3x = array3[:,1][wx]
zarray3x = array3[:,2][wx]
wy = np.where( np.abs(yarray3x) <= 20.0 ) # y values less than 20
xarray3xy = xarray3x[wy]
yarray3xy = yarray3x[wy]
zarray3xy = zarray3x[wy]
wz = np.where( np.abs(zarray3xy) <= 20.0 ) # z values less than 20
xarray3xyz = xarray3xy[wz]
yarray3xyz = yarray3xy[wz]
zarray3xyz = zarray3xy[wz]
Which works, but it can be annoying to keep track of all the variables I have named. So now I am trying to write something that takes fewer lines (and hopefully less run time).
I was thinking of constructing a for loop over each index like so:
for i in range(3):
    w = np.where(abs(array3[:, i]).all() <= 20.)
    n_array = array3[w]
But this only constructs one value instead of many values.
I think that using a 4D array is the easiest in this example. In that case, you can check whether the element with the lowest value in the vector is below the threshold. Then, with np.newaxis, you can apply the mask to the entire vector and create a masked array.
import numpy as np
n = 4 # Size of the first three dimensions.
array3 = 100.*np.random.rand(n, n, n, 3) # Random numbers between 0 and 100.
thres = 20.
m = np.empty(array3.shape, dtype=bool)
m[:,:,:,:] = (np.min(array3, axis=-1) < thres)[:,:,:,np.newaxis]
array3_masked = np.ma.masked_array(array3, mask=m)
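For the (N, 3) layout used in the question, an alternative not covered in the answer above is a single boolean mask along axis 1 (a sketch):
keep = (np.abs(array3) <= 20.0).all(axis=1)   # rows whose three components are all within the threshold
filtered = array3[keep]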
I have a relatively large array, e.g. 200 x 1000. The matrix is a sparse matrix whose elements can be considered weights. The weights range from 0 to 500. I would like to generate a new array of the same size, 200x1000, where the elements are 0 or 1 and exactly N of them are 1. The probability of an element in the new matrix being 1 rather than 0 is proportional to the weight at that position in the original array: the higher the weight, the higher the probability of a 1.
Stated another way: I would like to generate a zero matrix of size 200x1000 and then randomly choose N elements to flip to 1 based on a 200x1000 matrix of weights.
I'll throw my proposed solution in here as well:
# for example
a = np.random.randint(0, 501, size=(200, 1000))   # the weights (np.random.random_integers is deprecated)
N = 200
result = np.zeros((200, 1000))
ia = np.arange(result.size)
tw = float(np.sum(a.ravel()))                     # total weight, used to normalize the probabilities
result.ravel()[np.random.choice(ia, p=a.ravel()/tw,
                                size=N, replace=False)] = 1
where a is the array of weights: that is, pick the indexes for the items to change to 1 from the array ia, weighted by a.
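A quick sanity check (a sketch reusing a, result and N from above): exactly N cells are set, and the mean weight of the chosen cells should sit well above the overall mean weight.
print(result.sum())                              # N
print(a[result.astype(bool)].mean(), a.mean())   # chosen-cell weights vs. the overall average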
This can be done with numpy with
# Compute probabilities as a 1-D array
probs = numpy.float64(weights).ravel()
probs /= numpy.sum(probs)
# Pick winner indexes
winners = numpy.random.choice(len(probs), N, False, probs)
# Build result
result = numpy.zeros(weights.shape, numpy.uint8)
result.ravel()[winners] = 1
Something like this should work, no reason to get too complicated:
>>> import random
>>> weights = [[1,5],[500,0]]
>>> output = []
>>> for row in weights:
...     outRow = []
...     for entry in row:
...         outRow.append(random.choice([0] + [1 for _ in range(entry)]))
...     output.append(outRow)
...
>>> output
[[1, 1], [1, 0]]
This chooses a random entry from a sequence which always has a single zero followed by n ones, where n is the corresponding entry in your weight matrix. In this implementation, a weight of 1 actually gives a 50/50 chance of a 1 or a 0. If you want the 50/50 point to be at a weight of 250, use outRow.append(random.choice([0 for _ in range(500-entry)] + [1 for _ in range(entry)])) instead.
I am trying to create a matrix of random numbers, but my solution is too long and looks ugly
random_matrix = [[random.random() for e in range(2)] for e in range(3)]
this looks ok, but in my implementation it is
weights_h = [[random.random() for e in range(len(inputs[0]))] for e in range(hiden_neurons)]
which is extremely unreadable and does not fit on one line.
You can drop the range(len()):
weights_h = [[random.random() for e in inputs[0]] for e in range(hiden_neurons)]
But really, you should probably use numpy.
In [9]: numpy.random.random((3, 3))
Out[9]:
array([[ 0.37052381, 0.03463207, 0.10669077],
[ 0.05862909, 0.8515325 , 0.79809676],
[ 0.43203632, 0.54633635, 0.09076408]])
Take a look at numpy.random.rand:
Docstring: rand(d0, d1, ..., dn)
Random values in a given shape.
Create an array of the given shape and populate it with random
samples from a uniform distribution over [0, 1).
>>> import numpy as np
>>> np.random.rand(2,3)
array([[ 0.22568268, 0.0053246 , 0.41282024],
[ 0.68824936, 0.68086462, 0.6854153 ]])
Use np.random.randint(), as np.random.random_integers() is deprecated:
random_matrix = np.random.randint(min_val,max_val,(<num_rows>,<num_cols>))
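For example, with concrete numbers (a sketch; note that randint's upper bound is exclusive):
random_matrix = np.random.randint(0, 10, (3, 4))   # 3x4 matrix of integers from 0 to 9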
Looks like you are doing a Python implementation of the Coursera Machine Learning Neural Network exercise. Here's what I did for randInitializeWeights(L_in, L_out)
#get a random array of floats between 0 and 1 as Pavel mentioned
W = numpy.random.random((L_out, L_in + 1))
#normalize so that it spans a range of twice epsilon (the exercise suggests epsilon = 0.12)
epsilon = 0.12
W = W * 2 * epsilon
#shift so that mean is at zero
W = W - epsilon
For creating an array of random numbers, NumPy provides array creation using:
Real numbers
Integers
For creating an array of random real numbers there are 2 options:
random.rand (for a uniform distribution of the generated random numbers)
random.randn (for a normal distribution of the generated random numbers)
random.rand
import numpy as np
arr = np.random.rand(row_size, column_size)
random.randn
import numpy as np
arr = np.random.randn(row_size, column_size)
For creating an array of random integers:
import numpy as np
numpy.random.randint(low, high=None, size=None, dtype='l')
where
low = Lowest (signed) integer to be drawn from the distribution
high(optional)= If provided, one above the largest (signed) integer to be drawn from the distribution
size(optional) = Output shape i.e. if the given shape is, e.g., (m, n, k), then m * n * k samples are drawn
dtype(optional) = Desired dtype of the result.
eg:
The following example produces a 5x5 array of 25 random integers between 0 and 4 (note that the shape is passed as the tuple (5,5), with a comma, not 5*5):
arr2 = np.random.randint(0, 5, size=(5, 5))
[[2 1 1 0 1]
 [3 2 1 4 3]
 [2 3 0 3 3]
 [1 3 1 0 0]
 [4 1 2 0 1]]
eg2:
The following example produces a 1-D array of 10 random integers, each either 0 or 1:
arr3 = np.random.randint(2, size=10)
[0 0 0 0 1 1 0 0 1 1]
First, create a numpy array, then convert it into a matrix. See the code below:
import numpy
B = numpy.random.random((3, 4))  # this is an ndarray
C = numpy.matrix(B)              # this is a matrix
print(type(B))
print(type(C))
print(C)
(Note that NumPy now recommends plain arrays over numpy.matrix for new code.)
x = np.int_(np.random.rand(10) * 10)
This gives 10 random integers from 0 to 9; to get values up to 19, multiply by 20 instead.
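If you only need integers, drawing them directly with randint is arguably cleaner (a sketch, not part of the original answer):
x = np.random.randint(0, 10, size=10)   # ten integers from 0 to 9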
When you say "a matrix of random numbers", you can use numpy as Pavel mentioned (https://stackoverflow.com/a/15451997/6169225); in this case I'm assuming it is irrelevant to you which distribution these (pseudo) random numbers follow.
However, if you require a particular distribution (I imagine you are interested in the uniform distribution), numpy.random has very useful methods for you. For example, let's say you want a 3x2 matrix with a pseudo random uniform distribution bounded by [low,high]. You can do this like so:
numpy.random.uniform(low,high,(3,2))
Note, you can replace uniform by any number of distributions supported by this library.
Further reading: https://docs.scipy.org/doc/numpy/reference/routines.random.html
A simple way of creating an array of random integers is:
matrix = np.random.randint(maxVal, size=(rows, columns))
The following outputs a 2 by 3 matrix of random integers from 0 to 9 (the upper bound is exclusive):
a = np.random.randint(10, size=(2,3))
random_matrix = [[random.random() for j in range(columns)] for i in range(rows)]
for i in range(rows):
    print(random_matrix[i])
An answer using nested map calls (wrap in list() on Python 3; random.random stands in for the undefined ran()):
list(map(lambda x: list(map(lambda y: random.random(), range(len(inputs[0])))), range(hiden_neurons)))
# This builds a square matrix: the while loop keeps adding rows until rows == cols.
# You can change the condition if you want a different number of rows.
import random
import numpy as np

def random_matrix(R, cols):
    matrix = []
    rows = 0
    while rows < cols:
        N = random.sample(R, cols)   # draw cols distinct values from R
        matrix.append(N)
        rows = rows + 1
    return np.array(matrix)

print(random_matrix(range(10), 5))
# make sure you understand how random.sample works
numpy.random.rand(row, column) generates random numbers between 0 and 1 in the specified (m, n) shape. So use it to create an (m, n) matrix, multiply the matrix by the range (high - low), and add the low limit.
Analyzing: if rand produces 0, you get exactly the low limit, and as it approaches 1 you approach the high limit (rand samples the half-open interval [0, 1), so the high limit itself is never reached). In other words, scaling and shifting rand's output maps it onto the desired range.
import numpy as np
high = 10
low = 5
m,n = 2,2
a = (high - low)*np.random.rand(m,n) + low
Output:
a = array([[5.91580065, 8.1117106 ],
[6.30986984, 5.720437 ]])
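The same result can be obtained directly with np.random.uniform, which takes the bounds as arguments (a sketch using the names above):
a = np.random.uniform(low, high, (m, n))   # uniform samples in [low, high)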