Efficiently assign a value within predefined range in Numpy - python

The objective is to assign a new value within certain ranges (b_top, b_low).
The code below is able to achieve the intended objective:
import numpy as np
b_top = np.array([1, 7])
b_low = np.array([3, 9]) + 1   # +1 makes the upper bound exclusive for slicing
Mask = np.zeros((1, 11), dtype=bool)
for x, y in zip(b_top, b_low):
    Mask[0, x:y] = True
However, I wonder whether there is a single-line approach, or a more efficient way of doing this?

You can turn b_top and b_low into a mask using np.cumsum, together with the fact that bool and int8 have the same itemsize, so the cumulative sum can be written straight into an int8 view of Mask:
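# assumes sorted, non-overlapping, non-adjacent runs
header = np.zeros(Mask.shape[1], np.int8)   # signed, so it can hold -1
header[b_top] = 1                           # +1 where each run starts
# -1 where each run ends; skip an end that falls past the last column
header[b_low if b_low[-1] < header.size else b_low[:-1]] = -1
header.cumsum(out=Mask[0].view(np.int8))    # running sum is 1 inside a run, 0 outside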
I've implemented this function in a little utility library I made. The function is called haggis.math.runs2mask. You would call it as
from haggis.math import runs2mask
Mask[0] = runs2mask(np.stack((b_top, b_low), -1), Mask.shape[1])
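For a version without the haggis dependency, here is a self-contained sketch of the same cumsum idea, written as a hypothetical helper runs_to_mask (it is not the haggis implementation). It swaps the plain fancy-index assignments for np.add.at, which accumulates repeated indices, and tests the running sum with > 0, so the runs need not be sorted and adjacent or overlapping runs still give a correct mask. Runs are taken as half-open [start, stop), matching b_low = np.array([3, 9]) + 1.

```python
import numpy as np

def runs_to_mask(starts, stops, length):
    """Boolean mask that is True on every half-open run [start, stop).

    Hypothetical helper: the +1/-1 cumsum trick, with np.add.at so that
    repeated indices (adjacent or overlapping runs) are summed, not overwritten.
    """
    marks = np.zeros(length + 1, dtype=np.int64)   # spare slot for stop == length
    np.add.at(marks, starts, 1)                    # +1 where each run opens
    np.add.at(marks, stops, -1)                    # -1 where each run closes
    return np.cumsum(marks[:length]) > 0           # positive count means "inside a run"

b_top = np.array([1, 7])
b_low = np.array([3, 9]) + 1
mask = runs_to_mask(b_top, b_low, 11)
```

The result can be assigned with Mask[0] = mask. The extra slot in marks is what lets a run end exactly at the last column without the special-casing used above.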


Problems with Numpy.where function and syntax

I am working on some code to create a table of data and have run into an issue. Early on in the code I create a range of speeds that the program needs to check with the following:
Vskn = np.linspace(Vl, Vh, num=int((Vh - Vl) * 2 + 1))
For each value in Vskn I then compute FN and Frcrit, which are separate functions that take the Vskn array. Basically, I need FN and Frcrit for each speed.
Later on in the code I need to determine whether FN or Frcrit is higher, then do some calculations based on that result. I have tried each of the following, and neither works.
np.where(FN<Frcrit[kFrm=1,kFrm=(FN/Frcrit)**c1dm])
Results in a "SyntaxError: invalid syntax"
#if FN<Frcrit:
# kFrm=1
#else:
# kFrm=(FN/Frcrit)**c1dm
Results in a "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"
How do I resolve this?
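Your syntax is just a little off: np.where takes the condition, the value where it is true and the value where it is false as three separate arguments, and an assignment such as kFrm=1 cannot appear inside an expression. The commented-out if/else fails because FN < Frcrit is a whole array of booleans, which has no single truth value. When using NumPy, it's often easier to use boolean indexing directly rather than where. So, if kFrm already exists, you can index into kFrm "where" FN < Frcrit and set it to 1, and similarly index into kFrm "where" FN >= Frcrit and set it equal to your equation. I'm also indexing into greater_than_vals with the same mask, so that both sides of the assignment have the same shape.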
kFrm[FN < Frcrit] = 1
greater_than_vals = (FN / Frcrit) ** c1dm
kFrm[FN >= Frcrit] = greater_than_vals[FN >= Frcrit]
If kFrm doesn't exist yet, then you can do:
kFrm = np.ones_like(FN)
greater_than_vals = (FN / Frcrit) ** c1dm
kFrm[FN >= Frcrit] = greater_than_vals[FN >= Frcrit]
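For comparison, np.where itself can express the same rule in one expression once its arguments are passed correctly (condition, value where true, value where false). The FN, Frcrit and c1dm values below are made-up sample numbers for illustration, not data from the question.

```python
import numpy as np

FN = np.array([0.2, 0.5, 0.9])
Frcrit = np.array([0.4, 0.4, 0.3])
c1dm = 2.0  # illustrative exponent only

# Elementwise: 1 where FN < Frcrit, otherwise (FN / Frcrit) ** c1dm
kFrm = np.where(FN < Frcrit, 1.0, (FN / Frcrit) ** c1dm)
```

Like greater_than_vals in the answer, the power is computed for every element before np.where picks the results, so a zero in Frcrit would still trigger a division warning even where that branch is not selected.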

Parallelizing for loops using Dask (or other efficient way)

I have a function that takes an xarray Dataset (similar to a pandas MultiIndex) and uses four nested for loops to compute a new DataArray variable.
I wonder if there is a way to use Dask to make this process faster; I'm quite new to this, so I'm not sure.
The function looks like this:
import numpy as np
import xarray as xr
from tqdm import tqdm

def A_calc(data, thresh):
    A = np.zeros((len(data.time), len(data.lat), len(data.lon)))
    foo = xr.DataArray(A, coords=[data.time, data.lat, data.lon],
                       dims=['time', 'lat', 'lon'])
    for t in tqdm(range(len(data.time))):
        for i in range(len(data.lat)):
            for j in range(2, len(data.lon)):
                for k in range(len(data.lev)):
                    if np.isnan(
                            data[dict(time=[t], lat=[i], lon=[j], lev=[k])].sigma_0.values):
                        foo[dict(time=[t], lat=[i], lon=[j])] = np.nan
                        break
                    elif abs(
                            data[dict(time=[t], lat=[i], lon=[j], lev=[k])].sigma_0.values
                            - data[dict(time=[t], lat=[i], lon=[j], lev=[1])].sigma_0.values) >= thresh:
                        foo[dict(time=[t], lat=[i], lon=[j])] = data.lev[k].values
                        break
    return foo
Any suggestions?
As was said in the comments, Python for loops are slow. Typically, the first step to accelerating code like this is to either:
Find some clever way to write all of this as a vectorized numpy expression, without Python for loops
Use Numba
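A minimal sketch of the first option, assuming sigma_0 can be pulled out as a plain NumPy array ordered (time, lat, lon, lev), for example with data.sigma_0.transpose('time', 'lat', 'lon', 'lev').values. The function name first_exceedance_levels and its ref_index parameter are hypothetical. It follows the loop's rule along lev: stop at the first level that is NaN (result NaN) or whose distance from level index 1 reaches thresh (result lev[k]); points where neither happens keep the 0 that foo was initialised with.

```python
import numpy as np

def first_exceedance_levels(sigma, lev, thresh, ref_index=1):
    """Vectorized version of the nested-loop search (hypothetical helper).

    sigma: array shaped (time, lat, lon, lev); lev: 1-D array of level values.
    Returns an array shaped (time, lat, lon).
    """
    ref = sigma[..., ref_index:ref_index + 1]          # reference level, kept as a length-1 axis
    is_nan = np.isnan(sigma)
    with np.errstate(invalid="ignore"):
        exceeds = np.abs(sigma - ref) >= thresh        # NaN comparisons come out False
    hit = is_nan | exceeds                             # levels where the loop would break
    first = hit.argmax(axis=-1)                        # index of the first True (0 if none)
    any_hit = hit.any(axis=-1)
    first_is_nan = np.take_along_axis(is_nan, first[..., None], axis=-1)[..., 0]
    out = np.where(first_is_nan, np.nan, lev[first])   # NaN case wins, else that level's value
    return np.where(any_hit, out, 0.0)                 # no break: keep the initial 0
```

The original loop skips the first two longitudes (range(2, ...)); setting out[:, :, :2] = 0 afterwards would mimic that, and the result can be wrapped back into an xr.DataArray with the same time/lat/lon coordinates as foo. The whole search is a handful of array operations instead of one xarray selection per (t, i, j, k).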

What is the equivalent way of doing this type of pythonic vectorized assignment in MATLAB?

I'm trying to translate this line of code from Python to MATLAB:
new_img[M[0, :] - corners[0][0], M[1, :] - corners[1][0], :] = img[T[0, :], T[1, :], :]
So, naturally, I wrote something like this:
new_img(M(1,:)-corners(2,1),M(2,:)-corners(2,2),:) = img(T(1,:),T(2,:),:);
But it gives me the following error when it reaches that line:
Requested 106275x106275x3 (252.4GB) array exceeds maximum array size preference. Creation of arrays greater than this limit may take a long time and cause MATLAB to become unresponsive. See array size limit or preference panel for more information.
This has made me believe that it is not assigning things correctly. Img is at most a 1000 × 1500 RGB image. The same code works in less than 5 seconds in Python. How can I do vector assignment like the code in the first line in MATLAB?
By the way, I didn't paste all of my code, to keep this post from getting too long. If I need to add anything else, please let me know.
Edit:
Here's an explanation of what I want my code to do (basically, this is what the Python code does):
Consider this line of code. It's not from my actual code; I'm just trying to explain what I want to do:
A([2 3 5], [1 3 5]) = B([1 2 3], [2 4 6])
It is interpreted like this:
A(2,1) = B(1,2)
A(3,1) = B(2,2)
A(5,1) = B(3,2)
A(2,3) = B(1,4)
A(3,3) = B(2,4)
A(5,3) = B(3,4)
...
...
...
Instead, I want it to be interpreted like this:
A(2,1) = B(1,2)
A(3,3) = B(2,4)
A(5,5) = B(3,6)
When you do A[vector1, vector2] in Python (NumPy), you index the set:
A[vector1[0], vector2[0]]
A[vector1[1], vector2[1]]
A[vector1[2], vector2[2]]
A[vector1[3], vector2[3]]
...
In MATLAB, the similar-looking A(vector1, vector2) instead indexes the set:
A(vector1(1), vector2(1))
A(vector1(1), vector2(2))
A(vector1(1), vector2(3))
A(vector1(1), vector2(4))
...
A(vector1(2), vector2(1))
A(vector1(2), vector2(2))
A(vector1(2), vector2(3))
A(vector1(2), vector2(4))
...
That is, you get each combination of indices. You should think of it as a sub-array composed of the rows and columns specified in the two vectors.
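The same distinction can be seen from the NumPy side. This sketch uses a toy 6 × 6 array; A, vector1 and vector2 are illustrative values (0-based versions of the index vectors [2 3 5] and [1 3 5] from the question's Edit). np.ix_ builds the MATLAB-style "every combination" selection explicitly.

```python
import numpy as np

A = np.arange(36).reshape(6, 6)
vector1 = np.array([1, 2, 4])   # 0-based counterpart of the MATLAB rows [2 3 5]
vector2 = np.array([0, 2, 4])   # 0-based counterpart of the MATLAB cols [1 3 5]

paired = A[vector1, vector2]          # NumPy: A[1,0], A[2,2], A[4,4] -> shape (3,)
block = A[np.ix_(vector1, vector2)]   # MATLAB-style: all row/column combinations -> (3, 3)
```

paired holds one element per index pair, while block is the 3 × 3 sub-array that MATLAB's A(vector1, vector2) would return.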
To accomplish the same as the Python code, you need to use linear indexing:
index = sub2ind(size(A), vector1, vector2);
A(index)
Thus, your MATLAB code should do:
index1 = sub2ind(size(new_img), M(1,:)-corners(2,1), M(2,:)-corners(2,2));
index2 = sub2ind(size(img), T(1,:), T(2,:));
% these indices are for first 2 dims only, need to index in 3rd dim also:
offset1 = size(new_img,1) * size(new_img,2);
offset2 = size(img,1) * size(img,2);
index1 = index1.' + offset1 * (0:size(new_img,3)-1);
index2 = index2.' + offset2 * (0:size(new_img,3)-1);
new_img(index1) = img(index2);
What the middle block does here is add linear indexes for the same elements along the 3rd dimension. If ii is the linear index of an element in the first channel, then ii + offset1 is the index of the same element in the second channel, ii + 2*offset1 is the index of the same element in the third channel, and so on. So here we're generating indices to all those matrix elements. The + operation does implicit singleton expansion (what NumPy calls "broadcasting"). On MATLAB versions older than R2016b this will fail; there you need to replace A+B with bsxfun(@plus,A,B).
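The linear-index-plus-channel-offset idea can be checked in NumPy with a toy array (img, rows and cols below are illustrative values). np.ravel_multi_index with order="F" plays the role of sub2ind: it gives 0-based, column-major linear indices, matching MATLAB's memory layout.

```python
import numpy as np

img = np.arange(4 * 5 * 3).reshape(4, 5, 3)   # toy "image": 4 rows, 5 cols, 3 channels
rows = np.array([0, 2, 3])
cols = np.array([1, 4, 0])

# Counterpart of sub2ind(size(img), rows, cols): 0-based, column-major like MATLAB
lin = np.ravel_multi_index((rows, cols), img.shape[:2], order="F")

# One channel holds size(img,1) * size(img,2) elements
offset = img.shape[0] * img.shape[1]

# Column of pixel indices + row of channel offsets -> one column per channel (broadcasting)
lin_all = lin[:, None] + offset * np.arange(img.shape[2])

flat = img.ravel(order="F")   # MATLAB-style linear view of the data
picked = flat[lin_all]        # shape (number of pixels, number of channels)
```

picked equals img[rows, cols, :], which is exactly what the Python line in the question reads from img.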

How to efficiently mutate certain num of values in an array?

Given an initial 2-D array:
initial = [
[0.6711999773979187, 0.1949000060558319],
[-0.09300000220537186, 0.310699999332428],
[-0.03889999911189079, 0.2736999988555908],
[-0.6984000205993652, 0.6407999992370605],
[-0.43619999289512634, 0.5810999870300293],
[0.2825999855995178, 0.21310000121593475],
[0.5551999807357788, -0.18289999663829803],
[0.3447999954223633, 0.2071000039577484],
[-0.1995999962091446, -0.5139999985694885],
[-0.24400000274181366, 0.3154999911785126]]
The goal is to multiply some random values inside the array by a random percentage. Let's say only 3 random numbers get replaced by a random multiplier; we should get something like this:
output = [
[0.6711999773979187, 0.52],
[-0.09300000220537186, 0.310699999332428],
[-0.03889999911189079, 0.2736999988555908],
[-0.6984000205993652, 0.6407999992370605],
[-0.43619999289512634, 0.5810999870300293],
[0.84, 0.21310000121593475],
[0.5551999807357788, -0.18289999663829803],
[0.3447999954223633, 0.2071000039577484],
[-0.1995999962091446, 0.21],
[-0.24400000274181366, 0.3154999911785126]]
I've tried doing this:
import random
import numpy as np

def mutate(array2d, num_changes):
    for _ in range(num_changes):
        row, col = array2d.shape   # use the argument's own shape
        rand_row = np.random.randint(row)
        rand_col = np.random.randint(col)
        cell_value = array2d[rand_row][rand_col]
        array2d[rand_row][rand_col] = random.uniform(0, 1) * cell_value
    return array2d
And that works for 2D arrays, but there's a chance that the same value is mutated more than once =(
Also, I don't think it's efficient, and it only works on 2D arrays.
Is there a way to do such "mutation" for array of any shape and more efficiently?
There's no restriction on which values the "mutation" can choose from, but the number of "mutations" should be kept strictly to the user-specified number.
One fairly simple way would be to work with a raveled view of the array. You can generate all your numbers at once that way, and make it easier to guarantee that you won't process the same index twice in one call:
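import numpy as np

def mutate(array_anyd, num_changes):
    raveled = array_anyd.reshape(-1)   # a view whenever NumPy can avoid a copy
    # distinct flat positions, so no element is mutated twice
    indices = np.random.choice(raveled.size, size=num_changes, replace=False)
    values = np.random.uniform(0, 1, size=num_changes)
    raveled[indices] *= values         # writes through the view into array_anyd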
I use array_anyd.reshape(-1) in favor of array_anyd.ravel() because according to the docs, the former is less likely to make an inadvertent copy.
There is, of course, still a possibility of a copy. You can add an extra check and write back if you need to. A more efficient way is to use np.unravel_index, which avoids creating a view to begin with:
import numpy as np

def mutate(array_anyd, num_changes):
    indices = np.random.choice(array_anyd.size, size=num_changes, replace=False)
    indices = np.unravel_index(indices, array_anyd.shape)   # tuple of per-axis index arrays
    values = np.random.uniform(0, 1, size=num_changes)
    array_anyd[indices] *= values                            # modifies the array in place
There is no need to return anything because the modification is done in-place. Conventionally, such functions do not return anything. See for example list.sort vs sorted.
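A small usage check of the unravel_index approach on a 3-D array, sketched with NumPy's Generator API (np.random.default_rng). The optional rng argument is an addition for reproducibility, not part of the answer. Because choice(..., replace=False) returns distinct positions, exactly num_changes entries change.

```python
import numpy as np

def mutate(array_anyd, num_changes, rng=None):
    # rng: optional np.random.Generator (hypothetical parameter, for reproducible runs)
    rng = np.random.default_rng() if rng is None else rng
    flat = rng.choice(array_anyd.size, size=num_changes, replace=False)  # distinct flat positions
    idx = np.unravel_index(flat, array_anyd.shape)                       # per-axis index arrays
    array_anyd[idx] *= rng.uniform(0, 1, size=num_changes)               # in place, no return

arr = np.ones((3, 4, 5))
mutate(arr, 7, rng=np.random.default_rng(0))
changed = np.count_nonzero(arr != 1.0)
```

Since uniform(0, 1) draws from [0, 1), every selected entry ends up strictly below 1, so changed is exactly 7.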
Using np.random.shuffle instead of np.random.choice gives a different solution. It also works on an array of any shape.
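import numpy as np

def mutate(arrayIn, num_changes):
    mult = np.zeros(arrayIn.size)
    mult[:num_changes] = np.random.uniform(0, 1, num_changes)
    np.random.shuffle(mult)              # spread the nonzero multipliers to random positions
    mult = mult.reshape(arrayIn.shape)
    arrayIn = arrayIn + mult * arrayIn   # chosen entries scaled by (1 + u); returns a new array
    return arrayIn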

How to know the threshold value when using the Open CV adaptiveThreshold() method?

When using OpenCV's adaptiveThreshold() method, it returns only the image. But I want to know the threshold value that was used (as returned by the fixed-threshold threshold() method). Any idea how to get that?
As you can read in the documentation to adaptiveThreshold, the output is obtained by thresholding each pixel with a different value. These values are obtained by one of two methods, depending on the value of the parameter adaptiveMethod:
output = cv2.adaptiveThreshold(input, 255, adaptiveMethod, cv2.THRESH_BINARY, blockSize, C)
Is equivalent to:
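import cv2
import numpy as np

ksize = (blockSize, blockSize)
if adaptiveMethod == cv2.ADAPTIVE_THRESH_MEAN_C:
    T = cv2.blur(input, ksize, borderType=cv2.BORDER_REPLICATE)
else:  # adaptiveMethod == cv2.ADAPTIVE_THRESH_GAUSSIAN_C
    T = cv2.GaussianBlur(input, ksize, 0, borderType=cv2.BORDER_REPLICATE)
T = T.astype(np.int16) - C   # per-pixel threshold values; signed type avoids uint8 wraparound
output = np.where(input > T, 255, 0).astype(np.uint8)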
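Note that in C++ the > operator on a cv::Mat sets the output to 255 where the condition is true, whereas in NumPy it yields a boolean array, hence the explicit np.where(..., 255, 0). This is why I selected 255 as the value for the maxValue parameter to adaptiveThreshold.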
[I have not run the code above, but am reasonably sure it works as advertised. Please comment if there are any issues.]
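If OpenCV is not at hand, the threshold map for the mean method can be sketched in plain NumPy. This is an approximation under stated assumptions: mean_threshold_map is a hypothetical helper, the block mean is computed in floating point with an edge-replicated border, and OpenCV rounds its mean to uint8, so individual thresholds may differ by a fraction of a grey level. It shows that the "threshold value" is a whole array, one value per pixel.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def mean_threshold_map(img, block_size, C):
    """Per-pixel threshold for the ADAPTIVE_THRESH_MEAN_C idea: local mean minus C.

    Hypothetical helper; block_size is assumed odd, as adaptiveThreshold requires.
    """
    r = block_size // 2
    padded = np.pad(img.astype(np.float64), r, mode="edge")          # replicate the border
    windows = sliding_window_view(padded, (block_size, block_size))  # shape (H, W, b, b)
    return windows.mean(axis=(-2, -1)) - C

img = np.array([[10, 10, 10], [10, 100, 10], [10, 10, 10]], dtype=np.uint8)
T = mean_threshold_map(img, 3, 2)
binary = np.where(img > T, 255, 0).astype(np.uint8)
```

Here every 3 × 3 window contains the bright centre pixel once and eight pixels of value 10, so T is 18 everywhere and only the centre pixel passes.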
