x_d = np.linspace(-4, 8, 30)
print('x_d shape: ',x_d.shape)
print('x shape: ',x.shape)
density = sum((abs(xi - x_d) < 0.5) for xi in x)---------> difficulty in understanding statement
output:
x_d shape: (30,)
x shape: (20,)
I am having difficulty in understanding above statement
for each value of x we are substracting x_d from it, and we will get single value. But we are density as (30,)
How we got density dimension as (30,)
The expression
xi - x_d
will use NumPy broadcasting to conform the shapes of the two objects. In this case it means treating the scalar value xi as if it was an array of all the same value and of equal dimensions as x_d.
The abs function and the less-than comparison will work element-wise with NumPy arrays, so that the expression
(abs(xi - x_d) < 0.5)
should result in a length-30 array (same size as x_d) where each entry of that array is either True or False depending on the condition applied to each element of x_d.
This gets repeated for multiple values of xi, leading to multiple different length-30 arrays.
The result of calling sum on these arrays is that they are added together elementwise (and also by the luck of broadcasting, since the sum function has a default initial value of 0, the first array is added to 0 elementwise, leaving it unchanged).
So in the final result, it will be a length-30 array, where item 0 of the array counts how many xi values satisfied the absolute value condition based on the 0th element of x_d. Item 1 of the output array will count the number of xi values that satisfied the absolute value condition on the 1st element of x_d, and so on.
Here is an example with some test data:
In [31]: x_d = np.linspace(-4, 8, 30)
In [32]: x = np.arange(20)
In [33]: x
Out[33]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19])
In [34]: density = sum((abs(xi - x_d) < 0.5) for xi in x)
In [35]: density
Out[35]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1])
Related
I want to apply calculation only for those values that are higher than threshold. After doing it with boolean indexing, values get rounded. How to prevent it?
starting_score = 1
threshold = 5
x = np.array([0,1,2,3,4,5,6,7,8,9,10])
gt_idx = x > threshold
le_idx = x <= threshold
decay = math.log(2) / 10
y = starting_score * np.exp(-decay * x)
x[gt_idx] = starting_score * np.exp(-decay * x[gt_idx])
y
array([1. , 0.93303299, 0.87055056, 0.8122524 , 0.75785828,
0.70710678, 0.65975396, 0.61557221, 0.57434918, 0.53588673,
0.5 ])
x
array([0, 1, 2, 3, 4, 5, 0, 0, 0, 0, 0])
when applied to full array, I get correct y array.
when applied to part of x, values get selected properly, but rounded to 0
My expected output is
array([0, 1, 2, 3, 4, 5, 0.65975396, 0.61557221, 0.57434918, 0.53588673, 0.5])
It is considered np.int32 as default type for when you create a NumPy array with integers as x. For getting other types in the results you have two ways:
# np.float32 or np.float64
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=np.float64) # way 1
x = x.astype(np.float64) # way 2
such operation is not needed for y because in is multiplied by a float type value i.e. np.exp(-decay * x), so it became to float types.
numpy automatically assigns the integer data type to x. To preserve your floats you need to change the type of the x array
x.dtype
# Out: dtype('int64')
x = x.astype('float64')
or declare x as an array of float64
x = np.array([0,1,2,3,4,5,6,7,8,9,10], dtype='float64')
I have a 2d numpy array called arm_resets that has positive integers. The first column has all positive integers < 360. For all columns other than the first, I need to replace all values over 360 with the value that is in the same row in the 1st column. I thought this would be a relatively easy thing to do, here's what I have:
i = 300
over_360 = arm_resets[:, [i]] >= 360
print(arm_resets[:, [i]][over_360])
print(arm_resets[:, [0]][over_360])
arm_resets[:, [i]][over_360] = arm_resets[:, [0]][over_360]
print(arm_resets[:, [i]][over_360])
And here's what prints:
[3600 3609 3608 ... 3600 3611 3605]
[ 0 9 8 ... 0 11 5]
[3600 3609 3608 ... 3600 3611 3605]
Since all numbers that are being shown in the first print (first 3 and last 3) are above 360, they should be getting replaced by the 2nd print in the 3rd print. Why is this not working?
edit: reproducible example:
df = pd.DataFrame({"start":[1,2,5,6],"freq":[1,5,6,9]})
periods = 6
arm_resets = df[["start"]].values
freq = df[["freq"]].values
arm_resets = np.pad(arm_resets,((0,0),(0,periods-1)))
for i in range(1,periods):
arm_resets[:,[i]] = arm_resets[:,[i-1]] + freq
#over_360 = arm_resets[:,[i]] >= periods
#arm_resets[:,[i]][over_360] = arm_resets[:,[0]][over_360]
arm_resets
Given commented out code here's what prints:
array([[ 1, 2, 3, 4, 5, 6],
[ 2, 7, 12, 17, 22, 27],
[ 3, 9, 15, 21, 27, 33],
[ 4, 13, 22, 31, 40, 49]])
What I would expect:
array([[ 1, 2, 3, 4, 5, 1],
[ 2, 2, 2, 2, 2, 2],
[ 3, 3, 3, 3, 3, 3],
[ 4, 4, 4, 4, 4, 4]])
Now if it helps, the final 2d array I'm actually trying to create is a 1/0 array that indicates which are filled in, so in this example I'd want this:
array([[ 0, 1, 1, 1, 1, 1],
[ 0, 0, 1, 0, 0, 0],
[ 0, 0, 0, 1, 0, 0],
[ 0, 0, 0, 0, 1, 0]])
The code I use to achieve this from the above arm_resets is this:
fin = np.zeros((len(arm_resets),periods),dtype=int)
for i in range(len(arm_resets)):
fin[i,a[i]] = 1
The slice arm_resets[:, [i]] is a fancy index, and therefore makes a copy of the ith column of the data. arm_resets[:, [i]][over_360] = ... therefore calls __setitem__ on a temporary array that is discarded as soon as the statement executes. If you want to assign to the mask, call __setitem__ on the sliced object directly:
arm_resets[over_360, [i]] = ...
You also don't need to make the index into a list. It's generally better to use simple indices, especially when doing assignments, since they create views rather than copies:
arm_resets[over_360, i] = ...
With slicing, even the following should work, since it calls __setitem__ on a view:
arm_resets[:, i][over_360] = ...
This index does not help you process each row of the data, since i is a column. In fact, you can process the entire matrix in one step, without looping, if you use indices rather than a boolean mask. The reason that indices are useful is that you can match the item from the correct row in the first column:
rows, cols = np.nonzero(arm_resets[:, 1:] >= 360)
arm_resets[rows, cols] = arm_resets[rows, 1]
You can use np.where()
first_col = arm_resets[:,0] # first col
first_col = first_col.reshape(first_col.size,1) #Transfor in 2d array
arm_resets = np.where(arm_resets >= 360,first_col,arm_resets)
You can see in detail how np.where work here, but basically it compare arm_resets >= 360, if true it put first_col value in place (there another detail here with broadcasting) if false it put arm_resets value.
Edit: As suggested by Mad Physicist. You can use arm_resets[:,0,None] directly instead of creating first_col variable.
arm_resets = np.where(arm_resets >= 360,arm_resets[:,0,None],arm_resets)
I have two numpy arrays of integers A and B. The values in array A and B correspond to time-points at which events A and B occurred. I would like to transform A to contain the time since the most recent event b occurred.
I know I need to subtract each element of A by its nearest smaller the element of B but am unsure of how to do so. Any help would be greatly appreciated.
>>> import numpy as np
>>> A = np.array([11, 12, 13, 17, 20, 22, 33, 34])
>>> B = np.array([5, 10, 15, 20, 25, 30])
Desired Result:
cond_a = relative_timestamp(to_transform=A, reference=B)
cond_a
>>> array([1, 2, 3, 2, 0, 2, 3, 4])
You can use np.searchsorted to find the indices where the elements of A should be inserted in B to maintain order. In other words, you are finding the closest elemet in B for each element in A:
idx = np.searchsorted(B, A, side='right')
result = A-B[idx-1] # substract one for proper index
According to the docs searchsorted uses binary search, so it will scale fine for large inputs.
Here's an approach consisting on computing the pairwise differences. Note that it has a O(n**2) complexity so it might for larger arrays #brenlla's answer will perform much better.
The idea here is to use np.subtract.outer and then find the minimum difference along axis 1 over a masked array, where only values in B smaller than a are considered:
dif = np.abs(np.subtract.outer(A,B))
np.ma.array(dif, mask = A[:,None] < B).min(1).data
# array([1, 2, 3, 2, 0, 2, 3, 4])
As I am not sure, if it is really faster to calculate all pairwise differences, instead of a python loop over each array entry (worst case O(Len(A)+len(B)), the solution with a loop:
A = np.array([11, 12, 13, 17, 20, 22, 33, 34])
B = np.array([5, 10, 15, 20, 25, 30])
def calculate_next_distance(to_transform, reference):
max_reference = len(reference) - 1
current_reference = 0
transformed_values = np.zeros_like(to_transform)
for i, value in enumerate(to_transform):
while current_reference < max_reference and reference[current_reference+1] <= value:
current_reference += 1
transformed_values[i] = value - reference[current_reference]
return transformed_values
calculate_next_distance(A,B)
# array([1, 2, 3, 2, 0, 2, 3, 4])
I have:
import numpy as np
position = np.array([4, 4.34, 4.69, 5.02, 5.3, 5.7, ..., 4])
x = (B/position**2)*dt
A = np.cumsum(x)
assert A[0] == 0 # I want this to be true.
Where B and dt are scalar constants. This is for a numerical integration problem with initial condition of A[0] = 0. Is there a way to set A[0] = 0 and then do a cumsum for everything else?
I don't understand what exactly your problem is, but here are some things you can do to have A[0] = 0.
You can create A to be longer by one index to have the zero as the first entry:
# initialize example data
import numpy as np
B = 1
dt = 1
position = np.array([4, 4.34, 4.69, 5.02, 5.3, 5.7])
# do calculation
A = np.zeros(len(position) + 1)
A[1:] = np.cumsum((B/position**2)*dt)
Result:
A = [ 0. 0.0625 0.11559096 0.16105356 0.20073547 0.23633533 0.26711403]
len(A) == len(position) + 1
Alternatively, you can manipulate the calculation to substract the first entry of the result:
# initialize example data
import numpy as np
B = 1
dt = 1
position = np.array([4, 4.34, 4.69, 5.02, 5.3, 5.7])
# do calculation
A = np.cumsum((B/position**2)*dt)
A = A - A[0]
Result:
[ 0. 0.05309096 0.09855356 0.13823547 0.17383533 0.20461403]
len(A) == len(position)
As you see, the results have different lengths. Is one of them what you expect?
1D cumsum
A wrapper around np.cumsum that sets first element to 0:
def cumsum(pmf):
cdf = np.empty(len(pmf) + 1, dtype=pmf.dtype)
cdf[0] = 0
np.cumsum(pmf, out=cdf[1:])
return cdf
Example usage:
>>> np.arange(1, 11)
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> cumsum(np.arange(1, 11))
array([ 0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55])
N-D cumsum
A wrapper around np.cumsum that sets first element to 0, and works with N-D arrays:
def cumsum(pmf, axis=None, dtype=None):
if axis is None:
pmf = pmf.reshape(-1)
axis = 0
if dtype is None:
dtype = pmf.dtype
idx = [slice(None)] * pmf.ndim
# Create array with extra element along cumsummed axis.
shape = list(pmf.shape)
shape[axis] += 1
cdf = np.empty(shape, dtype)
# Set first element to 0.
idx[axis] = 0
cdf[tuple(idx)] = 0
# Perform cumsum on remaining elements.
idx[axis] = slice(1, None)
np.cumsum(pmf, axis=axis, dtype=dtype, out=cdf[tuple(idx)])
return cdf
Example usage:
>>> np.arange(1, 11).reshape(2, 5)
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10]])
>>> cumsum(np.arange(1, 11).reshape(2, 5), axis=-1)
array([[ 0, 1, 3, 6, 10, 15],
[ 0, 6, 13, 21, 30, 40]])
I totally understand your pain, I wonder why Numpy doesn't allow this with np.cumsum. Anyway, though I'm really late and there's already another good answer, I prefer this one a bit more:
np.cumsum(np.pad(array, (1, 0), "constant"))
where array in your case is (B/position**2)*dt. You can change the order of np.pad and np.cumsum as well. I'm just adding a zero to the start of the array and calling np.cumsum.
You can use roll (shift right by 1) and then set the first entry to zero.
I am trying to sort values in an numpy array so that I can store all of the values that are in a certain range (That could probably be phrased better). Anyway ill give an example of what I am trying to do. I have an array called bins that looks like this:
bins = array([11,11.5,12,12.5,13,13.5,14])
I also have another array called avgs:
avgs = array([11.02, 13.67, 11.78, 12.34, 13.24, 12.98, 11.3, 12.56, 13.95, 13.56,
11.64, 12.45, 13.23, 13.64, 12.46, 11.01, 11.87, 12.34, 13,87, 13.04,
12.49, 12.5])
What I am trying to do is to find the index values of the avgs array that are in the ranges between the values of the bins array. For example I was trying to make a while loop that would create new variables for each bin. The first bin would be everything that is between bins[0] and bins[1] and would look like:
bin1 = array([0, 6, 15])
Those index values would correspond to the values 11.02, 11.3, and 11.01 in the avgs and would be the values of avgs that were between index values 0 and 1 in bins. I also need the other bins so another example would be:
bin2 = array([2, 10, 16])
However the challenging part of this for me was that the size of bins and avgs changes based on other parameters so I was trying to build something that would be able to be expanded to larger or smaller bins and avgs arrays.
Numpy has some pretty powerful bin counting functions.
>>> binplace = np.digitize(avgs, bins) #Returns which bin an average belongs
>>> binplace
array([1, 6, 2, 3, 5, 4, 1, 4, 6, 6, 2, 3, 5, 6, 3, 1, 2, 3, 5, 7, 5, 3, 4])
>>> np.where(binplace == 1)
(array([ 0, 6, 15]),)
>>> np.where(binplace == 2)
(array([ 2, 10, 16]),)
>>> avgs[np.where(binplace == 1)]
array([ 11.02, 11.3 , 11.01])