Numpy random array limited by other arrays - python

I have two numpy ndarrays of the same size:
a = np.random.randn(x,y)
b = np.random.randn(x,y)
I want to create a new array c in which every element is a random value between the corresponding elements of a and b, i.e. every c[i][j] should lie between a[i][j] and b[i][j].
Is there any quicker/simpler/more efficient way than to go through all elements of c and assign random values?

You could do this:
c = a + (b - a) * d
where d is a random array with values between 0 and 1 and the same shape as a.
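A minimal sketch of that approach (x and y are placeholder sizes):
import numpy as np

x, y = 3, 4
a = np.random.randn(x, y)
b = np.random.randn(x, y)
d = np.random.rand(x, y)   # uniform values in [0, 1), same shape as a
c = a + (b - a) * d        # each c[i][j] lies between a[i][j] and b[i][j]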

Here's an idea using numpy:
a = np.random.randn(2, 2)
array([[ 1.56068748, -2.21431346],
       [-0.33707115,  0.93420256]])
b = np.random.randn(2, 2)
array([[-0.0522846 ,  0.11635731],
       [-0.57028069, -1.08307492]])
# Create an interleaved array from both a and b
s = np.vstack((a.ravel(), b.ravel()))
array([[ 1.56068748, -2.21431346, -0.33707115,  0.93420256],
       [-0.0522846 ,  0.11635731, -0.57028069, -1.08307492]])
# Feed it to `np.random.uniform`, which takes low and high as inputs,
# and reshape the result to match the input shape
np.random.uniform(*s).reshape(a.shape)
array([[ 0.14467235, -0.79804187],
       [-0.41495614, -0.19177284]])

You could use numpy.random.uniform. From the documentation:
low : float or array_like of floats, optional
Lower boundary of the output interval. All values generated will be
greater than or equal to low. The default value is 0.
high : float or array_like of floats
Upper boundary of the output interval. All values generated will be
less than high. The default value is 1.0.
So both low and high can receive arrays as parameters. For the sake of completeness, see the code below:
Code:
import numpy as np
x, y = 5, 5
a = np.random.randn(x, y)
b = np.random.randn(x, y)
high = np.maximum(a, b)
low = np.minimum(a, b)
c = np.random.uniform(low, high, (x, y))
print((low <= c).all() and (c <= high).all())
Output
True
In the example above, note the usage of maximum and minimum to build high and low. The last line checks that all values of c are indeed between low and high. You can do it all in one line, if that is of interest to you:
c = np.random.uniform(np.minimum(a, b), np.maximum(a, b), (x, y))

Related

What is an efficient way to calculate the mean of values in the bin with maximum frequency for large number of numpy arrays?

I am looking for an efficient way to do the following calculation on millions of arrays. For the values in each array, I want to calculate the mean of the values in the bin with the highest frequency, as demonstrated below. Some of the arrays might contain nan values; the other values are floats. The loop over my actual data takes too long to finish.
import numpy as np
array = np.array([np.random.uniform(0, 10) for i in range(800)])
# adding nan values
mask = np.random.choice([1, 0], array.shape, p=[.7, .3]).astype(bool)
array[mask] = np.nan
array = array.reshape(50, 16)
bin_values = np.linspace(0, 10, 21)
f = np.apply_along_axis(lambda a: np.histogram(a, bins=bin_values)[0], 1, array)
bin_start = np.apply_along_axis(lambda a: bin_values[np.argmax(a)], 1, f).reshape(array.shape[0], -1)
bin_end = bin_start + abs(bin_values[1] - bin_values[0])
values = np.zeros(array.shape[0])
for i in range(array.shape[0]):
    values[i] = np.nanmean(array[i][(array[i] >= bin_start[i]) * (array[i] < bin_end[i])])
Also, when I run the above code I get three warnings. The first is 'RuntimeWarning: Mean of empty slice' for the line where I calculate the values variable. I set a condition to skip this line in case all values are nan, but the warning did not go away; I was wondering what the reason is. The other two warnings are for when the less and greater_equal conditions are not met, which makes sense to me since they might be nan values.
The arrays that I want to run this algorithm on are independent, but I am already processing them with 12 separate scripts. Running the code in parallel would be an option; however, for now I am looking to improve the algorithm itself.
The reason I am using a lambda function is to run numpy.histogram over an axis, since it seems the histogram function does not take an axis as an option. I was able to use a mask and remove the loop from the code. The code is twice as fast now, but I think it can still be improved.
I can explain what I want to do in more detail with an example, if that clarifies it. Imagine I have 36 numbers which are greater than 0 and smaller than 20. Also, I have bins of equal width 0.5 over the same interval (0.0_0.5, 0.5_1.0, 1.0_1.5, …, 19.5_20.0). I want to put the 36 numbers into their corresponding bins and find the mean of the numbers within the bin that contains the most of them.
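A minimal 1-D sketch of that description (toy data, nan handling omitted):
import numpy as np

vals = np.random.uniform(0, 20, 36)              # 36 numbers between 0 and 20
counts, edges = np.histogram(vals, bins=np.arange(0, 20.5, 0.5))
k = np.argmax(counts)                            # index of the most populated bin
in_bin = (vals >= edges[k]) & (vals < edges[k + 1])
print(vals[in_bin].mean())                       # mean of the numbers in that bin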
Please post your solution if you can think of a faster algorithm.
import numpy as np
# creating an array to test the algorithm
array = np.array([np.random.uniform(0, 10) for i in range(800)])
# adding nan values
mask = np.random.choice([1, 0], array.shape, p=[.7, .3]).astype(bool)
array[mask] = np.nan
array = array.reshape(50, 16)
# the algorithm
bin_values=np.linspace(0, 10, 21)
# calculating the frequency of each bin
f = np.apply_along_axis(lambda a: np.histogram(a, bins=bin_values)[0], 1, array)
bin_start = np.apply_along_axis(lambda a: bin_values[np.argmax(a)], 1, f).reshape(array.shape[0], -1)
bin_end = bin_start + (abs(bin_values[1]-bin_values[0]))
# creating a mask to get the mean over the bin with maximum frequency
mask = (array>=bin_start) * (array<bin_end)
mask_nan = np.tile(np.nan, (mask.shape[0], mask.shape[1]))
mask_nan[mask] = 1
v = np.nanmean(array * mask_nan, axis = 1)

How to randomly set elements in numpy array to 0

First I create my array
myarray = np.random.random_integers(0,10, size=20)
Then, I want to set 20% of the elements in the array to 0 (or some other number). How should I do this? Apply a mask?
You can calculate the indices with np.random.choice, limiting the number of chosen indices to the percentage:
indices = np.random.choice(np.arange(myarray.size), replace=False,
                           size=int(myarray.size * 0.2))
myarray[indices] = 0
For others looking for the answer in the case of an nd-array, as proposed by user holi:
my_array = np.random.rand(8, 50)
indices = np.random.choice(my_array.shape[1]*my_array.shape[0], replace=False, size=int(my_array.shape[1]*my_array.shape[0]*0.2))
We multiply the dimensions to get an array of length dim1*dim2, then we apply these indices to our array:
my_array[np.unravel_index(indices, my_array.shape)] = 0
The array is now masked.
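As a quick sanity check (sketch), the fraction of zeroed elements should come out at 0.2:
print((my_array == 0).mean())   # 0.2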
Use np.random.permutation as a random index generator, and take the first 20% of the indices:
myarray = np.random.random_integers(0,10, size=20)
n = len(myarray)
random_idx = np.random.permutation(n)
frac = 20 # [%]
zero_idx = random_idx[:round(n*frac/100)]
myarray[zero_idx] = 0
If you want exactly 20% of the elements zeroed at random positions:
import random

random_list = []
array_len = len(myarray)
while len(random_list) < array_len / 5:
    random_int = random.randint(0, array_len - 1)
    if random_int not in random_list:
        random_list.append(random_int)
for position in random_list:
    myarray[position] = 0
This ensures you definitely get 20% of the values, and the RNG rolling the same index many times cannot result in fewer than 20% of the values being 0.
Assume your input numpy array is A and p=0.2. The following are a couple of ways to achieve this.
Exact Masking
ones = np.ones(A.size)
idx = int(min(p*A.size, A.size))
ones[:idx] = 0
A *= np.reshape(np.random.permutation(ones), A.shape)
Approximate Masking
This is commonly done in several denoising objectives, most notably the Masked Language Modeling in Transformers pre-training. Here is a more pythonic way of setting a certain proportion (say 20%) of elements to zero.
A *= np.random.binomial(size=A.shape, n=1, p=0.8)
Another alternative:
A *= np.random.randint(0, 2, A.shape)
(Note this zeroes roughly half of the elements rather than a chosen proportion p.)

How to randomly sample a matrix in python?

I have a relatively large array, e.g. 200 x 1000. The matrix is a sparse matrix where the elements can be considered weights. The weights range from 0 to 500. I would like to generate a new array of the same size, 200x1000, where N of the elements of the new array are random integers {0,1}. The probability of an element in the new matrix being 0 or 1 is proportional to the weights from the original array - the higher the weight, the higher the probability of 1 versus 0.
Stated another way: I would like to generate a zero matrix of size 200x1000 and then randomly choose N elements to flip to 1 based on a 200x1000 matrix of weights.
I'll throw my proposed solution in here as well:
# for example
a = np.random.randint(0, 501, size=(200, 1000))   # random_integers is deprecated
N = 200
result = np.zeros((200, 1000))
ia = np.arange(result.size)
tw = float(np.sum(a.ravel()))
result.ravel()[np.random.choice(ia, p=a.ravel()/tw,
                                size=N, replace=False)] = 1
where a is the array of weights: that is, pick the indexes for the items to change to 1 from the array ia, weighted by a.
This can be done with numpy:
# Compute probabilities as a 1-D array
probs = numpy.float64(weights).ravel()
probs /= numpy.sum(probs)
# Pick winner indexes
winners = numpy.random.choice(len(probs), N, False, probs)
# Build result
result = numpy.zeros(weights.shape, numpy.uint8)
result.ravel()[winners] = 1
Something like this should work, no reason to get too complicated:
>>> import random
>>> weights = [[1,5],[500,0]]
>>> output = []
>>> for row in weights:
...     outRow = []
...     for entry in row:
...         outRow.append(random.choice([0] + [1 for _ in range(entry)]))
...     output.append(outRow)
...
>>> output
[[1, 1], [1, 0]]
This chooses a random entry from a sequence which always has a single zero followed by n ones, where n is the corresponding entry in your weight matrix. In this implementation, a weight of 1 actually gives a 50/50 chance of either a 1 or a 0. If you want the 50/50 point to be at 250, use outRow.append(random.choice([0 for _ in range(500-entry)] + [1 for _ in range(entry)]))
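If the weights are already a NumPy array, a vectorized sketch of the same per-element idea (assuming the maximum possible weight is 500, so each cell becomes 1 with probability weight/500; note this does not fix the total number of ones):
import numpy as np

weights = np.random.randint(0, 501, size=(200, 1000))
output = (np.random.rand(*weights.shape) * 500 < weights).astype(int)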

How to vectorize 3D Numpy arrays

I have a 3D numpy array like a = np.zeros((100,100, 20)). I want to perform an operation over every x,y position that involves all the elements along the z axis, and store the result in an array like b = np.zeros((100,100)) at the corresponding x,y position.
Right now I'm doing it with a for loop:
d_n = np.array([...]) # a parameter with the same shape as b
for (x,y), v in np.ndenumerate(b):
    C = a[x,y,:]
    ### calculate some_value using C
    minv = sys.maxint
    depth = -1
    C = a[x,y,:]
    for d in range(len(C)):
        e = 2.5 * float(math.pow(d_n[x,y] - d, 2)) + C[d] * 0.05
        if e < minv:
            minv = e
            depth = d
    some_value = depth
    if depth == -1:
        some_value = len(C) - 1
    ###
    b[x,y] = some_value
The problem now is that this operation is much slower than operations done the pythonic way, e.g. c = b * b. (I profiled this function and it is around two orders of magnitude slower than functions that use numpy's built-in, vectorized operations over a similar number of elements.)
How can I improve the performance of functions like this that map a 3D array to a 2D one?
What is usually done in 3D images is to swap the Z axis to the first index:
>>> a = a.transpose((2,0,1))
>>> a.shape
(20, 100, 100)
And now you can easily iterate over the Z axis:
>>> for slice in a:
...     # do something with the 100x100 slice
The slice here will be each of the 100x100 sections of your 3D matrix. Additionally, transposing allows you to access each of the 2D slices directly by indexing the first axis. For example, a[10] will give you the 11th 2D 100x100 slice.
Bonus: if you store the data contiguously in this layout, or convert it to a contiguous array using a = np.ascontiguousarray(a.transpose((2,0,1))), access to your 2D slices will be faster since they are mapped contiguously in memory.
Obviously you want to get rid of the explicit for loop, but I think whether this is possible depends on what calculation you are doing with C. As a simple example,
a = np.zeros((100,100, 20))
a[:,:] = np.linspace(1,20,20) # example data: 1,2,3,.., 20 as "z" for every "x","y"
b = np.sum(a[:,:]**2, axis=2)
will fill the 100 by 100 array b with the sum of the squared "z" values of a, that is 1+4+9+...+400 = 2870.
If your inner calculation is sufficiently complex, and not amenable to vectorization, then your iteration structure is good, and does not contribute significantly to the calculation time:
for (x,y), v in np.ndenumerate(b):
    C = a[x,y,:]
    ...
    for d in range(len(C)):
        ...  # complex, not vectorizable calc
        ...
    b[x,y] = some_value
There doesn't appear to be any special structure in the first two dimensions, so you could just as well think of it as a 2D-to-1D mapping, e.g. mapping an (N, 20) array onto an (N,) array. That doesn't speed up anything, but may help highlight the essential structure of the problem.
One step is to focus on speeding up that C to some_value calculation. There are functions like cumsum and cumprod that help you do sequential calculations on a vector. cython is also a good tool.
A different approach is to see if you can perform that internal calculation over the N values all at once. In other words, if you must iterate, it is better to do so over the smallest dimension.
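For example, here is a sketch that iterates over the short z axis (20 steps) instead of the 100x100 grid, assuming a, b and d_n as defined in the question:
import numpy as np

minv = np.full(b.shape, np.inf)
depth = np.full(b.shape, -1, dtype=int)
for d in range(a.shape[2]):                       # only 20 iterations
    e = 2.5 * (d_n - d) ** 2 + a[:, :, d] * 0.05  # one 100x100 slice at a time
    better = e < minv
    minv[better] = e[better]
    depth[better] = d
b[...] = depth                                    # same result as the double loop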
In a sense this is a non-answer, but without full knowledge of how you get some_value from C and d_n I don't think we can do more.
It looks like e can be calculated for all points at once:
e = 2.5 * float(math.pow(d_n[x,y] - d, 2)) + C[d] * 0.05
E = 2.5 * (d_n[...,None] - np.arange(a.shape[-1]))**2 + a * 0.05 # (100,100,20)
E.min(axis=-1) # smallest value along the last dimension
E.argmin(axis=-1) # index of where that min occurs
At first glance it looks like this E.argmin is the b value that you want (tweaked for some boundary conditions if needed).
I don't have realistic a and d_n arrays, but with simple test ones, this E.argmin(-1) matches your b, with a 66x speedup.
How can I improve the performance of functions like this that map a 3D array to a 2D one?
Many functions in Numpy are "reduction" functions*, for example sum, any, std, etc. If you supply an axis argument other than None to such a function it will reduce the dimension of the array over that axis. For your code you can use the argmin function, if you first calculate e in a vectorized way:
d = np.arange(a.shape[2])
e = 2.5 * (d_n[...,None] - d)**2 + a*0.05
b = np.argmin(e, axis=2)
The indexing with [...,None] is used to engage broadcasting. The values in e are floating point values, so it's a bit strange to compare to sys.maxint but there you go:
I, J = np.indices(b.shape)
b[e[I,J,b] >= sys.maxint] = a.shape[2] - 1
* Strictly speaking, a reduction function is of the form reduce(operator, sequence), so technically std and argmin are not reduction functions.

Simple way to create matrix of random numbers

I am trying to create a matrix of random numbers, but my solution is too long and looks ugly
random_matrix = [[random.random() for e in range(2)] for e in range(3)]
this looks ok, but in my implementation it is
weights_h = [[random.random() for e in range(len(inputs[0]))] for e in range(hiden_neurons)]
which is extremely unreadable and does not fit on one line.
You can drop the range(len()):
weights_h = [[random.random() for e in inputs[0]] for e in range(hiden_neurons)]
But really, you should probably use numpy.
In [9]: numpy.random.random((3, 3))
Out[9]:
array([[ 0.37052381,  0.03463207,  0.10669077],
       [ 0.05862909,  0.8515325 ,  0.79809676],
       [ 0.43203632,  0.54633635,  0.09076408]])
Take a look at numpy.random.rand:
Docstring: rand(d0, d1, ..., dn)
Random values in a given shape.
Create an array of the given shape and populate it with random
samples from a uniform distribution over [0, 1).
>>> import numpy as np
>>> np.random.rand(2,3)
array([[ 0.22568268,  0.0053246 ,  0.41282024],
       [ 0.68824936,  0.68086462,  0.6854153 ]])
Use np.random.randint(), as np.random.random_integers() is deprecated:
random_matrix = np.random.randint(min_val,max_val,(<num_rows>,<num_cols>))
Looks like you are doing a Python implementation of the Coursera Machine Learning Neural Network exercise. Here's what I did for randInitializeWeights(L_in, L_out)
#get a random array of floats between 0 and 1 as Pavel mentioned
W = numpy.random.random((L_out, L_in +1))
#normalize so that it spans a range of twice epsilon
W = W * 2 * epsilon
#shift so that mean is at zero
W = W - epsilon
For creating an array of random numbers, NumPy provides array creation using:
Real numbers
Integers
For creating an array of random real numbers there are two options:
random.rand (for a uniform distribution of the generated random numbers)
random.randn (for a normal distribution of the generated random numbers)
random.rand
import numpy as np
arr = np.random.rand(row_size, column_size)
random.randn
import numpy as np
arr = np.random.randn(row_size, column_size)
For creating an array of random integers:
import numpy as np
numpy.random.randint(low, high=None, size=None, dtype='l')
where
low = Lowest (signed) integer to be drawn from the distribution
high(optional)= If provided, one above the largest (signed) integer to be drawn from the distribution
size(optional) = Output shape i.e. if the given shape is, e.g., (m, n, k), then m * n * k samples are drawn
dtype(optional) = Desired dtype of the result.
eg:
The following example will produce a 5x5 array of 25 random integers between 0 and 4 (note that the size is given as a tuple (5, 5), not 5*5):
arr2 = np.random.randint(0, 5, size=(5, 5))
[[2 1 1 0 1]
 [3 2 1 4 3]
 [2 3 0 3 3]
 [1 3 1 0 0]
 [4 1 2 0 1]]
eg2:
The following example will produce an array of 10 random integers, each either 0 or 1:
arr3= np.random.randint(2, size = 10)
[0 0 0 0 1 1 0 0 1 1]
First, create a numpy array, then convert it into a matrix. See the code below:
import numpy
B = numpy.random.random((3, 4))  # this is an ndarray
C = numpy.matrix(B)              # this is a matrix
print(type(B))
print(type(C))
print(C)
x = np.int_(np.random.rand(10) * 10)
This gives random integers from 0 to 9. For integers from 0 to 19, multiply by 20 instead.
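A more direct sketch using randint, which draws integers below the given bound without the float-to-int conversion:
x = np.random.randint(0, 10, size=10)   # integers from 0 to 9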
When you say "a matrix of random numbers", you can use numpy as Pavel https://stackoverflow.com/a/15451997/6169225 mentioned above; in this case I'm assuming it is irrelevant to you which distribution these (pseudo) random numbers follow.
However, if you require a particular distribution (I imagine you are interested in the uniform distribution), numpy.random has very useful methods for you. For example, let's say you want a 3x2 matrix with a pseudo random uniform distribution bounded by [low,high]. You can do this like so:
numpy.random.uniform(low,high,(3,2))
Note, you can replace uniform by any number of distributions supported by this library.
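For instance, a sketch that swaps in the normal distribution (the mean and standard deviation here are placeholder values):
numpy.random.normal(0.0, 1.0, (3, 2))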
Further reading: https://docs.scipy.org/doc/numpy/reference/routines.random.html
A simple way of creating an array of random integers is:
matrix = np.random.randint(maxVal, size=(rows, columns))
The following outputs a 2 by 3 matrix of random integers from 0 up to (but not including) 10:
a = np.random.randint(10, size=(2,3))
random_matrix = [[random.random() for j in range(columns)] for i in range(rows)]
for i in range(rows):
    print(random_matrix[i])
An answer using nested map calls:
list(map(lambda x: list(map(lambda y: random.random(), range(len(inputs[0])))), range(hiden_neurons)))
# This is a function for a square matrix, so in the while loop rows does not have to be less than cols.
# You can make your own condition, but if you want a square matrix, use this code.
import random
import numpy as np

def random_matrix(R, cols):
    matrix = []
    rows = 0
    while rows < cols:
        N = random.sample(R, cols)
        matrix.append(N)
        rows = rows + 1
    return np.array(matrix)

print(random_matrix(range(10), 5))
# make sure you understand the function random.sample
numpy.random.rand(row, column) generates random numbers between 0 and 1 in the specified (m,n) shape. So use it to create an (m,n) matrix, multiply the matrix by the width of the range (high - low), and add the low limit.
Analyzing: if zero is generated, just the low limit will be held, and if one is generated, just the high limit will be held. In other words, the limits of the desired range are reachable when generating numbers with rand.
import numpy as np
high = 10
low = 5
m,n = 2,2
a = (high - low)*np.random.rand(m,n) + low
Output:
a = array([[5.91580065, 8.1117106 ],
           [6.30986984, 5.720437  ]])
