Python numpy array values get rounded after boolean indexing

I want to apply a calculation only to those values that are higher than a threshold. After doing it with boolean indexing, the values get rounded. How can I prevent this?
import math
import numpy as np

starting_score = 1
threshold = 5
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
gt_idx = x > threshold
le_idx = x <= threshold
decay = math.log(2) / 10
y = starting_score * np.exp(-decay * x)
x[gt_idx] = starting_score * np.exp(-decay * x[gt_idx])
y
array([1.        , 0.93303299, 0.87055056, 0.8122524 , 0.75785828,
       0.70710678, 0.65975396, 0.61557221, 0.57434918, 0.53588673,
       0.5       ])
x
array([0, 1, 2, 3, 4, 5, 0, 0, 0, 0, 0])
When applied to the full array, I get the correct y array.
When applied to only part of x, the values are selected properly but get rounded to 0.
My expected output is
array([0, 1, 2, 3, 4, 5, 0.65975396, 0.61557221, 0.57434918, 0.53588673, 0.5])

NumPy uses an integer dtype (np.int32 or np.int64, depending on the platform) by default when you create an array from integers, as with x here. To get floating-point results you have two options:
# np.float32 or np.float64
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=np.float64) # way 1
x = x.astype(np.float64) # way 2
This conversion is not needed for y because it is assigned the result of a floating-point expression, np.exp(-decay * x), so it is already a float array.

NumPy automatically assigns an integer data type to x. To preserve your floats, you need to change the type of the x array:
x.dtype
# Out: dtype('int64')
x = x.astype('float64')
or declare x as an array of float64
x = np.array([0,1,2,3,4,5,6,7,8,9,10], dtype='float64')
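Putting it together, a minimal sketch of the fixed computation (same starting_score, threshold, and decay as in the question), using the float64 dtype from either option:

import math
import numpy as np

starting_score = 1
threshold = 5
decay = math.log(2) / 10

# Declaring x as float64 means assigned float results are kept, not truncated.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype='float64')
gt_idx = x > threshold
x[gt_idx] = starting_score * np.exp(-decay * x[gt_idx])
# x is now [0, 1, 2, 3, 4, 5, 0.6597..., 0.6155..., 0.5743..., 0.5358..., 0.5]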

Related

Is there a way to disregard masked values in an array used to mask separate array?

My data consists of several arrays of the same length. I am masking one array (y) and then using that masked array to mask a second array (x). I mask x to get rid of values indicating equipment error (-9999). I then use np.where() to find where y is low (one standard deviation below the mean), in order to see the values of x when y is low.
I have tried changing my mask several times, but none of the other numpy masked-array operations gave me a different result. I tried to write a logical statement to get the values where the mask is False, but I cannot do that within the np.where() statement.
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([0, 1, -9999, 3, 4, 5, 6, 7, 8, -9999, 10])
x = np.ma.masked_values(x, -9999)
y = np.ma.masked_values(y, -9999)
low_y = y.mean() - np.std(y)
x_masked = x[np.where(y < low_y)]
When we call x_masked, it returns:
>>> x_masked
masked_array(data=[0, 1, 2, 9],
             mask=False,
             fill_value=-9999)
We expect the mean of x_masked to be 0.5 ((0 + 1)/2), but instead the mean is 3, because the x values at the masked -9999 positions of y (2 and 9) were included in x_masked.
Is there a way to exclude the masked values in order to only get the unmasked values?
Since version 1.8, NumPy has provided nanstd and nanmean to handle missing data. In your case, since -9999 indicates an error state, I think this is a good use case for numpy.nan:
In [76]: y = np.where(y==-9999, np.nan, y)
In [77]: low_y = (np.nanmean(y) - np.nanstd(y))
In [78]: low_y
Out[78]: 1.8177166753143883
In [79]: x_masked = x[ np.where( y < low_y ) ] # [0, 1]
I think you want to mask x where y == -9999, i.e. keep only the entries where y != -9999. If you make this change to your code, it works as you expect.
You could also just use np.where to mask.
x = x[np.where(y != -9999)]
y = y[np.where(y != -9999)]
low_y = y.mean() - np.std(y)
x_masked = x[np.where(y < low_y)]
print(x_masked)
[0 1]
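If you would rather stay within numpy.ma instead of dropping values, here is a minimal sketch (assuming the same x and y as in the question) that computes the statistics on the masked y and treats masked comparison results as "do not keep":

import numpy as np

x = np.ma.masked_values(np.arange(11), -9999)
y = np.ma.masked_values([0, 1, -9999, 3, 4, 5, 6, 7, 8, -9999, 10], -9999)

low_y = y.mean() - y.std()        # masked statistics skip the -9999 entries
keep = (y < low_y).filled(False)  # masked comparison results count as False
x_masked = x[keep]
print(x_masked, x_masked.mean())  # [0 1] 0.5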

PyTorch - multiplying tensor with scalar results in zero vector

I have no idea why the result is all zeros with the torch tensor. Is anything wrong here?
>>> import torch
>>> import numpy as np
>>> import math
>>> torch.__version__
'0.4.1'
>>> np.__version__
'1.15.4'
>>> torch.arange(0, 10, 2) * -(math.log(10000.0) / 10)
tensor([0, 0, 0, 0, 0])
>>> np.arange(0, 10, 2) * -(math.log(10000.0) / 10)
array([-0.        , -1.84206807, -3.68413615, -5.52620422, -7.3682723 ])
>>> torch.arange(0, 10, 2)
tensor([0, 2, 4, 6, 8])
>>> np.arange(0, 10, 2)
array([0, 2, 4, 6, 8])
As noted in the comments, when using version 0.4.0 you get the same results as with numpy:
tensor([-0.0000, -1.8421, -3.6841, -5.5262, -7.3683])
However with 0.4.1 I'm getting a zero vector too.
The reason for this is that torch.arange(0, 10, 2) returns a tensor of type float for 0.4.0 while it returns a tensor of type long for 0.4.1.
So casting your tensor to float should work for you:
torch.arange(0, 10, 2).float() * -(math.log(10000.0) / 10)
Multiplying a long tensor by a float involves heavy rounding, because the result is still a tensor of type long. When converting a FloatTensor to a LongTensor, values between -1 and 1 are truncated to 0.
Since -(math.log(10000.0) / 10) results in -0.9210340371976183, your result is 0: the scalar is effectively converted to type long before multiplying, and that conversion truncates it to 0. See this example:
t = torch.tensor((-(math.log(10000.0) / 10)))
print('FloatTensor:', t)
print('Converted to Long:', t.long())
Output:
FloatTensor: tensor(-0.9210)
Converted to Long: tensor(0)
Thus:
torch.arange(0, 10, 2) * -(math.log(10000.0) / 10)
becomes:
torch.arange(0, 10, 2) * 0
Therefore you get a tensor of zeros as result.
Some more examples:
If you multiply a long tensor by a value between 1 and 2, let's say 1.7, the multiplier is effectively truncated down to 1:
t = torch.tensor(range(5), dtype=torch.long)
print(t)
print(t * 1.7)
Output:
tensor([ 0, 1, 2, 3, 4])
tensor([ 0, 1, 2, 3, 4])
Similarly, multiplying by 2.7 results in an effective multiplication by 2:
t = torch.tensor(range(5), dtype=torch.long)
print(t)
print(t * 2.7)
Output:
tensor([ 0, 1, 2, 3, 4])
tensor([ 0, 2, 4, 6, 8])
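To avoid the issue at the source rather than casting afterwards, you can also request a floating-point dtype when constructing the tensor; a minimal sketch:

import math
import torch

# Build the tensor as float from the start so the float multiplier is kept.
t = torch.arange(0, 10, 2, dtype=torch.float32)
print(t * -(math.log(10000.0) / 10))
# tensor([-0.0000, -1.8421, -3.6841, -5.5262, -7.3683])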

usage of sum function in python

x_d = np.linspace(-4, 8, 30)
print('x_d shape: ', x_d.shape)
print('x shape: ', x.shape)
density = sum((abs(xi - x_d) < 0.5) for xi in x)  # <-- difficulty understanding this statement
output:
x_d shape: (30,)
x shape: (20,)
I am having difficulty understanding the statement above. For each value of x we subtract x_d from it, and I would expect to get a single value. But density has shape (30,).
How did density get the dimension (30,)?
The expression
xi - x_d
will use NumPy broadcasting to conform the shapes of the two objects. In this case it means treating the scalar value xi as if it was an array of all the same value and of equal dimensions as x_d.
The abs function and the less-than comparison will work element-wise with NumPy arrays, so that the expression
(abs(xi - x_d) < 0.5)
should result in a length-30 array (same size as x_d) where each entry of that array is either True or False depending on the condition applied to each element of x_d.
This gets repeated for multiple values of xi, leading to multiple different length-30 arrays.
The result of calling sum on these arrays is that they are added together element-wise (and since the built-in sum has a default initial value of 0, the first array is added to 0 element-wise, leaving it unchanged).
So in the final result, it will be a length-30 array, where item 0 of the array counts how many xi values satisfied the absolute value condition based on the 0th element of x_d. Item 1 of the output array will count the number of xi values that satisfied the absolute value condition on the 1st element of x_d, and so on.
Here is an example with some test data:
In [31]: x_d = np.linspace(-4, 8, 30)
In [32]: x = np.arange(20)
In [33]: x
Out[33]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])
In [34]: density = sum((abs(xi - x_d) < 0.5) for xi in x)
In [35]: density
Out[35]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1])
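For what it's worth, the same counts can be computed without the Python-level generator by broadcasting x against x_d directly; a minimal equivalent sketch:

import numpy as np

x_d = np.linspace(-4, 8, 30)
x = np.arange(20)

# x[:, None] has shape (20, 1); against x_d's (30,) it broadcasts to (20, 30).
# Summing the boolean matrix over axis 0 counts, for each x_d point,
# how many x values fall within 0.5 of it -- a length-30 array, as before.
density = (np.abs(x[:, None] - x_d) < 0.5).sum(axis=0)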

How to shuffle a matrix and an array accordingly

Suppose I have an m×d matrix called X and an m×1 array called Y (using numpy). The rows of X correspond to the entries of Y.
Now suppose I need to shuffle the data (the rows) in X. I used:
random.shuffle(X)
Is there a way for me to keep track of the way X has been shuffled, so I could shuffle Y accordingly?
Thank you :)
You can use numpy.random.permutation to create a permuted list of indices, and then shuffle both X and Y using those indices:
>>> import numpy
>>> m = 10
>>> X = numpy.random.rand(m, m)
>>> Y = numpy.random.rand(m)
>>> indices = numpy.random.permutation(m)
>>> indices
array([4, 7, 6, 9, 0, 3, 1, 2, 8, 5])
>>> Y
array([ 0.53867012,  0.6700051 ,  0.06199551,  0.51248468,  0.4990566 ,
        0.81435935,  0.16030748,  0.96252029,  0.44897724,  0.98062564])
>>> Y = Y[indices]
>>> Y
array([ 0.4990566 ,  0.96252029,  0.16030748,  0.98062564,  0.53867012,
        0.51248468,  0.6700051 ,  0.06199551,  0.44897724,  0.81435935])
>>> X = X[indices, :]
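On NumPy 1.17 and later, the same idea with the Generator API might look like this sketch:

import numpy as np

rng = np.random.default_rng()
m = 10
X = rng.random((m, m))
Y = rng.random(m)

perm = rng.permutation(m)  # one permutation, applied to both arrays
X, Y = X[perm], Y[perm]    # rows of X stay aligned with entries of Y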

How to make numpy.cumsum start after the first value

I have:
import numpy as np
position = np.array([4, 4.34, 4.69, 5.02, 5.3, 5.7, ..., 4])
x = (B/position**2)*dt
A = np.cumsum(x)
assert A[0] == 0 # I want this to be true.
Where B and dt are scalar constants. This is for a numerical integration problem with initial condition of A[0] = 0. Is there a way to set A[0] = 0 and then do a cumsum for everything else?
I don't understand exactly what your problem is, but here are some things you can do to get A[0] = 0.
You can create A to be longer by one index to have the zero as the first entry:
# initialize example data
import numpy as np
B = 1
dt = 1
position = np.array([4, 4.34, 4.69, 5.02, 5.3, 5.7])
# do calculation
A = np.zeros(len(position) + 1)
A[1:] = np.cumsum((B/position**2)*dt)
Result:
A = [ 0. 0.0625 0.11559096 0.16105356 0.20073547 0.23633533 0.26711403]
len(A) == len(position) + 1
Alternatively, you can manipulate the calculation to subtract the first entry of the result:
# initialize example data
import numpy as np
B = 1
dt = 1
position = np.array([4, 4.34, 4.69, 5.02, 5.3, 5.7])
# do calculation
A = np.cumsum((B/position**2)*dt)
A = A - A[0]
Result:
[ 0. 0.05309096 0.09855356 0.13823547 0.17383533 0.20461403]
len(A) == len(position)
As you see, the results have different lengths. Is one of them what you expect?
1D cumsum
A wrapper around np.cumsum that sets first element to 0:
def cumsum(pmf):
    cdf = np.empty(len(pmf) + 1, dtype=pmf.dtype)
    cdf[0] = 0
    np.cumsum(pmf, out=cdf[1:])
    return cdf
Example usage:
>>> np.arange(1, 11)
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> cumsum(np.arange(1, 11))
array([ 0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55])
N-D cumsum
A wrapper around np.cumsum that sets first element to 0, and works with N-D arrays:
def cumsum(pmf, axis=None, dtype=None):
    if axis is None:
        pmf = pmf.reshape(-1)
        axis = 0
    if dtype is None:
        dtype = pmf.dtype
    idx = [slice(None)] * pmf.ndim
    # Create array with extra element along cumsummed axis.
    shape = list(pmf.shape)
    shape[axis] += 1
    cdf = np.empty(shape, dtype)
    # Set first element to 0.
    idx[axis] = 0
    cdf[tuple(idx)] = 0
    # Perform cumsum on remaining elements.
    idx[axis] = slice(1, None)
    np.cumsum(pmf, axis=axis, dtype=dtype, out=cdf[tuple(idx)])
    return cdf
Example usage:
>>> np.arange(1, 11).reshape(2, 5)
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])
>>> cumsum(np.arange(1, 11).reshape(2, 5), axis=-1)
array([[ 0,  1,  3,  6, 10, 15],
       [ 0,  6, 13, 21, 30, 40]])
I totally understand your pain; I wonder why NumPy doesn't allow this with np.cumsum. Anyway, though I'm really late and there's already another good answer, I prefer this one a bit more:
np.cumsum(np.pad(array, (1, 0), "constant"))
where array in your case is (B/position**2)*dt. You can change the order of np.pad and np.cumsum as well. I'm just adding a zero to the start of the array and calling np.cumsum.
You can also use np.roll to shift the cumulative sum right by one and then set the first entry to zero.
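A minimal sketch of that approach (note the result keeps the original length, so the final cumulative total is dropped):

import numpy as np

x = np.arange(1, 6)
A = np.cumsum(x)   # [ 1  3  6 10 15]
A = np.roll(A, 1)  # [15  1  3  6 10]
A[0] = 0           # [ 0  1  3  6 10] -- the last total (15) is gone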
