Python: assign random values, but include a specific number

I know how to create an array of random values, but I would like at least one number to equal a specific value, and all random values should be at least 1.
import numpy as np
x = np.random.rand(1, 4)
specific_value = 3
print(x)
# desired: x = [2 3 1 1]

Read the docs for numpy.random.rand. It appears you want an array of random non-zero integers, in which case you should be using numpy.random.randint.
This code will give you an output array with a specific value in the specified index
import numpy as np
# Creating random array
size = 4
minimum_value = 1
maximum_value = 100
x = np.random.randint(minimum_value, maximum_value, size)
# Including specified value
specified_value = 3
specified_value_index = 2
x[specified_value_index] = specified_value
print(x)
Note that randint needs both a low and a high argument if you want a non-zero minimum; with a single argument it returns values between 0 (inclusive) and that argument (exclusive).
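If the specified value should land at a random position rather than a fixed index, one option is to overwrite a randomly chosen slot. A small sketch using the newer Generator API (the variable names are my own):

```python
import numpy as np

rng = np.random.default_rng()
size = 4
# random integers in [1, 100), so every value is at least 1
x = rng.integers(1, 100, size)
# overwrite one randomly chosen position with the specified value
specified_value = 3
x[rng.integers(0, size)] = specified_value
print(x)
```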

Related

Find the first larger value in numpy array

I have a sorted array of float numbers ranging from 0 to 1.
The goal is to generate a random number (r) in the same range and determine between which two numbers from the array r lies.
I tried "(np.abs(array - value)).argmin()" but it gives the nearest number which sometimes is the larger one and others is the smaller one.
Take an example with a random number:
import numpy as np
# Generate a random number between 0 and 1
r = np.random.rand()
# create a sorted array
a = np.arange(20)/20
# count how many elements of a are <= r (broadcasting compares r to every element)
n = (r >= a).sum()
# you can get the interval like this
a[n-1:n+1]
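Since a is sorted, np.searchsorted does the same interval lookup directly. A small sketch of that alternative:

```python
import numpy as np

r = np.random.rand()                 # random number in [0, 1)
a = np.arange(20) / 20               # sorted grid: 0.00, 0.05, ..., 0.95
# index of the first element strictly greater than r
n = np.searchsorted(a, r, side='right')
interval = a[n - 1:n + 1]            # the two grid points bracketing r
print(r, interval)
```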

Find the location (x and y index) of values (for example above a limit of 0.5) in a 2D array in Python

I want to return the index of elements in a 2d array that fit a certain condition, i.e. >0.5.
Consider the sample 2D array below:
import numpy as np
a = np.array([[0.720,0.764,0.058,0.101,0.504,0.715,0.373,0.584,0.052,0.617],
[0.855,0.413,0.952,0.948,0.109,0.397,0.014,0.719,0.896,0.137],
[0.237,0.660,0.494,0.193,0.504,0.315,0.600,0.172,0.639,0.464],
[0.534,0.967,0.400,0.400,0.629,0.490,0.580,0.826,0.118,0.023],
[0.312,0.133,0.335,0.548,0.729,0.687,0.229,0.216,0.759,0.594]])
Using
a[a > 0.7]
it is discovered that there are 12 values above 0.7 in the array.
What should I add to the code so that it can show me the location (the x and y index) of these 12 values?
for example, in a dataframe like:

number | value | X index | Y index
1      | 0.720 | 0       | 0
...    | ...   | ...     | ...
12     | 0.759 | 4       | 8
I have a huge dataset (21500, 16000) and only 100 values are above my desired limits, so this approach will be very helpful.
np.where returns the indices of values fitting a specified condition, e.g. > 0.5:
index = np.where(a > 0.7)
This returns the indices (as a tuple of row and column arrays) of all values in a that are larger than 0.7.
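To get the dataframe from the question, the np.where output can be fed straight into pandas. A sketch, assuming pandas is available (the small sample array is my own; the column names follow the question):

```python
import numpy as np
import pandas as pd

a = np.array([[0.2, 0.9],
              [0.8, 0.1]])
rows, cols = np.where(a > 0.7)          # row (X) and column (Y) indices of the hits
df = pd.DataFrame({'value': a[rows, cols],
                   'X index': rows,
                   'Y index': cols})
df.index = np.arange(1, len(df) + 1)    # number the hits 1..n as in the question
print(df)
```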

numpy array for float values within range

With regards to efficiency, how can we create a large numpy array where the values are float numbers within a specific range.
For example, for a 1-D numpy array of fixed size where the values are between 0 and 200,000,000.00 (i.e. values in [0, 200,000,000.00]), I can create the array using the smallest data type for floats (float16) and then validate any new value (from user input) before inserting it to the array:
import numpy as np
a = np.empty(shape=(1000,), dtype=np.float16))
pos = 0
new_value = input('Enter new value: ')
# validate
new_value = round(new_value, 2)
if new_value in np.arange(0.00, 200000000.00, 0.01):
    # fill in new value
    a[pos] = new_value
    pos = pos + 1
The question is, can we enforce the new_value validity (in terms of the already-known minimum/maximum values and number of decimals) based on the dtype of the array?
In other words, the fact that we know the range and number of decimals on the time of creating the array, does this gives us any opportunity to (more) efficiently insert valid values in the array?
I am a bit confused about how your code even runs, because it does not work as presented here.
It is also a bit unclear why you want to append new values to an empty array you created beforehand. Did you mean to fill the created array with the new incoming values instead of appending?
np.arange(0.00, 200000000.00, 0.01)
This line is causing problems: just to check whether new_value is in a certain range, it creates a huge array (20 billion elements), which leads to a MemoryError in my environment.
Extending my comment and fixing the issues with your code, my solution would look like this:
import numpy as np
max_value = 200000000
arr = np.empty(shape=(1000,), dtype=np.float16)
new_value = float(input('Enter new value: '))  # more checks might be useful if the input is not numeric
# validate
if 0 <= new_value <= max_value:
    new_value = round(new_value, 2)  # round only if the range criterion is fulfilled
    arr = np.append(arr, new_value)  # in case you really want to append your value
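One caveat worth adding (my note, not part of the answer above): float16 cannot represent this range at all. Its largest finite value is about 65504, so anything near 200,000,000 overflows to infinity, and even float32 lacks the precision to keep two decimals at that magnitude; float64 is the safer dtype here. A quick check:

```python
import numpy as np

print(np.finfo(np.float16).max)     # largest finite float16, about 65504
print(np.float16(200000000.0))      # overflows to inf
print(np.float64(200000000.00))     # float64 holds the full range
```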

Thresholding a python list with multiple values

Okay, so I have a 1000x100 array of random numbers. I want to threshold this array with a list of multiple numbers ranging from 3 to 9. If values are higher than the threshold, I want the sum of the row appended to a list.
I have tried many ways, including a triply nested for/conditional. Right now, I have found a way to compare an array to a list of numbers, but each time that happens I get new random numbers from that array again.
import numpy as np
xpatient = 5
sd_healthy = 2
xhealthy = 7
sd_patient = 2
thresholdvalue1 = (xpatient - sd_healthy) * 10
thresholdvalue2 = (xhealthy + sd_patient) * 10
thresholdlist = []
x1 = []
Ahealthy = np.random.randint(10, size=(1000, 100))
Apatient = np.random.randint(10, size=(1000, 100))
TParray = np.random.randint(10, size=(1, 61))
def thresholding(A, B):
    for i in range(A, B):
        thresholdlist.append(i)
thresholding(thresholdvalue1, thresholdvalue2 + 1)
thresholdarray = np.asarray(thresholdlist)
thedivisor = 10
newthreshold = thresholdarray / thedivisor
for x in range(61):
    Apatient = np.random.randint(10, size=(1000, 100))
    Apatient = [Apatient >= newthreshold[x]] * Apatient
    x1.append([sum(x) for x in zip(*Apatient)])
So, my for loop regenerates the random array each iteration, but if I don't do that, I don't get to see each threshold applied in turn. I want the threshold for the whole array to be 3, 3.1, 3.2, etc.
I hope I delivered my point. Thanks in advance.
You can solve your problem using this approach:
import numpy as np
def get_sums_by_threshold(data, threshold, axis):
    # use axis=0 to sum values down each column, axis=1 to sum across each row
    result = list(np.where(data >= threshold, data, 0).sum(axis=axis))
    return result
xpatient=5
sd_healthy=2
xhealthy=7
sd_patient=2
thresholdvalue1=(xpatient-sd_healthy)*10
thresholdvalue2=(((xhealthy+sd_patient))*10)
np.random.seed(100) # to keep the generated array reproducible
data = np.random.randint(10,size=(1000,100))
thresholds = [num / 10.0 for num in range(thresholdvalue1, thresholdvalue2+1)]
sums = list(map(lambda x: get_sums_by_threshold(data, x, axis=0), thresholds))
But you should know that your initial array contains only integer values, so you will get the same result for multiple thresholds that share the same integer part (e.g. 3.0, 3.1, 3.2, ..., 3.9). If you want to store float numbers from 0 to 9 in your initial array with the specified shape, you can do the following:
data = np.random.randint(90,size=(1000,100)) / 10.0
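A tiny worked example of what get_sums_by_threshold computes (made-up 3x3 data, my own numbers):

```python
import numpy as np

data = np.array([[1.0, 5.0, 2.0],
                 [4.0, 0.5, 3.0],
                 [2.5, 2.5, 2.5]])
# zero out entries below the threshold, then sum down each column (axis=0)
sums = np.where(data >= 2.5, data, 0).sum(axis=0)
print(sums)   # [6.5 7.5 5.5]
```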

Vectorize an operation in Numpy

I am trying to do the following in NumPy without using a loop:
I have a matrix X of dimensions N*d and a vector y of dimension N.
y contains integers ranging from 1 to K.
I am trying to get a matrix M of size K*d, where M[i,:]=np.mean(X[y==i,:],0)
Can I achieve this without using a loop?
With a loop, it would go something like this.
import numpy as np
N = 3
d = 3
K = 2
X = np.eye(N)
y = np.random.randint(1, K + 1, N)
M = np.zeros((K, d))
for i in np.arange(0, K):
    line = X[y == i + 1, :]
    if line.size == 0:
        M[i, :] = np.zeros(d)
    else:
        M[i, :] = np.mean(line, 0)
Thank you in advance.
The code is basically collecting specific rows of X and adding them up, for which we have a NumPy builtin in np.add.reduceat. So, with that in focus, the steps to solve it in a vectorized way could be as listed next -
# Get sort indices of y
sidx = y.argsort()
# Collect rows of X by their IDs so that they come in consecutive order
Xr = X[sidx]
# Get unique row IDs, start positions of each unique ID
# and their counts to be used for average calculations
unq,startidx,counts = np.unique((y-1)[sidx],return_index=True,return_counts=True)
# Add rows off Xr based on the slices signified by the start positions
vals = np.true_divide(np.add.reduceat(Xr,startidx,axis=0),counts[:,None])
# Setup output array and set row summed values into it at unique IDs row positions
out = np.zeros((K,d))
out[unq] = vals
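The steps above, put together as a self-contained sketch (small made-up sizes, seeded so it is reproducible), with the labels in 1..K as in the question:

```python
import numpy as np

N, d, K = 6, 3, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((N, d))
y = rng.integers(1, K + 1, N)       # labels in 1..K

sidx = y.argsort()                  # sort row indices by label
Xr = X[sidx]                        # rows with the same label become consecutive
unq, startidx, counts = np.unique((y - 1)[sidx],
                                  return_index=True,
                                  return_counts=True)
# sum each label's slice of rows, then divide by its count to get the mean
vals = np.add.reduceat(Xr, startidx, axis=0) / counts[:, None]
out = np.zeros((K, d))
out[unq] = vals                     # labels with no samples stay all-zero
print(out)
```

A quick loop-based check confirms it matches the per-class means.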
This solves the question, but creates an intermediate K×N boolean matrix, and doesn't use the built-in mean function. This may lead to worse performance or worse numerical stability in some cases. I'm letting the class labels range from 0 to K-1 rather than 1 to K.
import numpy as np
# Define constants
K, N, d = 10, 1000, 3
# Sample data
Y = np.random.randint(0, K - 1, N)  # K-1 to omit one class, to test the no-examples case
X = np.random.randn(N, d)
# Calculate means for each class, vectorized
# Map samples to labels by taking a logical "outer product"
mark = Y[None, :] == np.arange(0, K)[:, None]
# Count the number of examples in each class
count = mark.sum(1)
# Avoid divide by zero if a class has no examples
count += count == 0
# Sum within each class and normalize
M = (np.dot(mark, X).T / count).T
print(M, M.shape, mark.shape)
