Python: np.sqrt(x-a)*np.heaviside(x,a)

I am trying to implement a calculation from a research paper. In that calculation, the value of a function is supposed to be
0, for x<a
sqrt(x-a)*SOMETHING_ELSE, for x>=a
In my module, x and a are 1D numpy-arrays (of the same length). In my first attempt I implemented the function as
f = np.sqrt(x-a)*SOMETHING*np.heaviside(x,a)
But for x<a, np.sqrt() returns NaN, and even though the Heaviside function returns 0 in that case, 0*NaN is still NaN.
I could also replace all NaNs in the resulting array with 0s afterwards, but that would produce warnings from numpy.sqrt() being applied to negative values, which I would then need to suppress. Another solution is to treat the argument of the square root as a complex number by adding 0j and taking the real part afterwards:
f = np.real(np.sqrt(x-a+0j)*SOMETHING*np.heaviside(x,a))
But I feel that neither solution is really elegant, and the second one is unnecessarily hard to read. Is there a more elegant way to do this in Python that I am missing?

You can cheat with np.maximum in this case so that the square root is never taken of a negative number.
Moreover, please note that np.heaviside does not use a as a threshold; the step is always at 0, and the second argument is only the value returned where the input is exactly 0. You can use np.where instead.
Here is an example:
f = np.where(x<a, 0, np.sqrt(np.maximum(x-a, 0))*SOMETHING)
Note that in this specific case the expression can be simplified and np.where is not even needed (because np.sqrt(np.maximum(x-a, 0)) is already 0 wherever x < a). Thus, you can simply write:
f = np.sqrt(np.maximum(x-a, 0))*SOMETHING
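For illustration, a minimal runnable sketch of this approach (the sample values for x, a and SOMETHING below are made up):
import numpy as np

# Hypothetical sample data; SOMETHING stands in for the paper's remaining factor.
x = np.linspace(-1.0, 3.0, 9)
a = np.full_like(x, 1.0)
SOMETHING = 2.0

# No NaNs and no warnings: the argument of the square root is clipped at 0.
f = np.sqrt(np.maximum(x - a, 0)) * SOMETHING
print(f)   # zeros for x < a, 2*sqrt(x - a) for x >= a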

Related

Numpy Divide Arrays With Multiple Out Conditions

I have two two-dimensional arrays (say arrayA & arrayB) that are exactly the same size (2500 x 1500). I am interested in dividing arrayA by arrayB, but have three conditions that I would like to be excluded from the division and instead replaced with a specific value. These conditions are:
If arrayB contains zero at point (Bx,By), replace output (Cx,Cy) with (arrayA*arrayA)
If arrayA contains zero at point (Ax,Ay), replace output (Cx,Cy) with 0.50
If both arrayA & B at overlapping points (Ax,Ay & Bx,By) contain 0, replace output (Cx,Cy) with 1
I've found that numpy.divide parameters out and where allow me to define each of these individually, so I've taken the first condition and arranged it as follows:
arrayC = np.divide(arrayA, arrayB, out=(arrayA*arrayA), where=arrayB!=0)
My question is how can I combine the other two conditions and their desired outputs within this operation?
One solution, though I'm not sure it is the fastest:
za = A == 0
zb = B == 0
case0 = (~za) & ~zb   # both nonzero: do the real division
case1 = zb & ~za      # only B is zero -> A*A
case2 = za & ~zb      # only A is zero -> 0.5
case3 = za & zb       # both zero -> 1
C = case3*1 + case2*0.5 + case1*A*A   # cases 3, 2, 1
C[case0] = A[case0] / B[case0]        # case 0
Could be more compact with fewer intermediate values, but I've chosen clarity.
You could also use a cascade of np.where:
zb = B == 0
C = np.where(A == 0, np.where(zb, 1, 0.5), np.where(zb, A*A, A/B))
Edit: better version (but still not perfect)
zb = B == 0
za = A == 0
C = np.where(za, np.where(zb, 1, 0.5), A*A)
np.divide(A, B, out=C, where=(~zb) & ~za)
It combines np.where with the where= argument of np.divide from your question.
It is as fast as the previous solution, and it does not complain about division by zero, since the division is performed only where it is needed.
Nevertheless, it computes the first version of C (the one before np.divide), and in particular A*A, everywhere, even where it is not needed, since those values are later overwritten.
So it could probably still be improved.
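To make the edit above concrete, here is a self-contained sketch on small made-up arrays (the values are only for illustration):
import numpy as np

A = np.array([[4.0, 0.0],
              [6.0, 0.0]])
B = np.array([[2.0, 3.0],
              [0.0, 0.0]])

za = A == 0
zb = B == 0

# Special cases first: 1 where both are zero, 0.5 where only A is zero,
# and A*A elsewhere (entries where B is nonzero are overwritten below).
C = np.where(za, np.where(zb, 1, 0.5), A * A)
# Divide only where both A and B are nonzero, writing the results into C.
np.divide(A, B, out=C, where=(~za) & ~zb)
print(C)   # expected: [[2. , 0.5], [36., 1. ]]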

Float rounding error with Numpy isin function

I'm trying to use the isin() function from Numpy library to find elements that are common in two arrays.
Seems pretty basic, but one of those arrays is created using linspace() and the other I just put hard values in.
But it seems like isin() is using == for its comparisons, and so the result returned by the method is missing one of the numbers.
Is there a way I can work around this, either by defining my arrays differently or by using a method other than isin() ?
thetas = np.array(np.linspace(.25, .50, 51))
known_thetas = [.3, .35, .39, .41, .45]
unknown_thetas = thetas[np.isin(thetas, known_thetas, assume_unique = True, invert = True)]
Printing the three arrays, I find that .41 is still in the third array: the corresponding value in the first array is actually 0.41000000000000003, so the == comparison returns False. What is the best way of working around this?
We could make use of np.isclose, extending one of those arrays to 2D for an outer closeness comparison and then doing an ANY reduction along that axis to get a 1D boolean array that masks the relevant input array:
thetas[~np.isclose(thetas[:,None],known_thetas).any(1)]
To customize the level of tolerance for matches, we could feed in custom relative and absolute tolerance values to np.isclose.
If you are looking for performance on large arrays, we could optimize on memory, and hence performance, with a NumPy implementation of np.isin that takes a tolerance argument for floating point numbers, based on np.searchsorted:
thetas[~isin_tolerance(thetas,known_thetas,tol=0.001)]
Feed your tolerance value into the tol argument.
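The isin_tolerance helper referenced above is not shown in this post; a possible searchsorted-based sketch with the same name and signature (an assumption on my part) could look like this:
import numpy as np

def isin_tolerance(a, b, tol):
    """Boolean mask: True where an element of `a` is within `tol` of some element of `b`."""
    b = np.sort(np.asarray(b))
    idx = np.searchsorted(b, a)             # insertion positions of a into the sorted b
    left = np.clip(idx - 1, 0, len(b) - 1)  # nearest candidate to the left
    right = np.clip(idx, 0, len(b) - 1)     # nearest candidate to the right
    return (np.abs(a - b[left]) <= tol) | (np.abs(a - b[right]) <= tol)

thetas = np.linspace(.25, .50, 51)
known_thetas = [.3, .35, .39, .41, .45]
unknown_thetas = thetas[~isin_tolerance(thetas, known_thetas, tol=0.001)]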
If you have a fixed absolute tolerance, you can use np.around to round the values before comparing:
unknown_thetas = thetas[np.isin(np.around(thetas, 5), known_thetas, assume_unique = True, invert = True)]
This rounds thetas to 5 decimal digits, but it's up to you to decide how close the numbers need to be for you to consider them equal.

How to check convergence for series with two indices using sympy or numpy

I have a series with two symbols and I need to find its sum with two-point accuracy. I checked the results for some parts using sympy, and I know the answer, but I have no idea how to prove it. I don't even know how to prove convergence using sympy.
import sympy as sp
i, j = sp.symbols('i, j', integer = True)
S = sp.Sum(sp.Sum(1/(i * j) * sp.sin(sp.pi*i/2) *
                  sp.sin(sp.pi*(2*j - 1)/2) / sp.sinh(sp.pi*i),
                  (i, 1, sp.oo)), (j, 1, sp.oo))
S.is_convergent()
returns
NotImplementedError: convergence checking for more than one symbol containing series is not handled
In principle, you can evaluate a Sympy Sum by calling either the method .doit(), which returns a closed-form expression (if Sympy is able to find one), or the method .n(), which returns a numerical approximation of the sum (a floating point number). In this case, neither option works (I would have expected at least .n() to give an answer).
As a workaround, you could try to perform a simplification of the sum and then attempt to evaluate it. In particular,
S.factor()
transforms the double-index sum into the product of two single-index sums. Calling
S.factor().doit()
shows that the second sum cannot be evaluated in closed form. However, .n() works now:
S.factor().n()
0.0599820444370520
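A minimal sketch of this workaround end to end (the printed value is the one quoted above):
import sympy as sp

i, j = sp.symbols('i, j', integer=True)
S = sp.Sum(sp.Sum(1/(i * j) * sp.sin(sp.pi*i/2) *
                  sp.sin(sp.pi*(2*j - 1)/2) / sp.sinh(sp.pi*i),
                  (i, 1, sp.oo)), (j, 1, sp.oo))

# factor() separates the i- and j-dependent parts into a product of two
# single-index sums, which .n() can then approximate numerically.
print(S.factor().n())   # 0.0599820444370520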

Replacing missing values with random in a numpy array

I have a 2D numpy array with binary data, i.e. 0s and 1s (not observed or observed). For some instances, that information is missing (NaN). Since the missing values are random in the data set, I think the best way to replace them would be using random 0s and 1s.
Here is some example code:
import numpy as np
row, col = 10, 5
matrix = np.random.randint(2, size=(row,col))
matrix = matrix.astype(float)
matrix[1,2] = np.nan
matrix[5,3] = np.nan
matrix[8,0] = np.nan
matrix[np.isnan(matrix)] = np.random.randint(2)
The problem with this is that all NaNs are replaced with the same value, either 0 or 1, while I would like to get both. Is there a simpler solution than, for example, a for loop that handles each NaN separately? The data set I'm working on is a lot bigger than this example.
Try
nan_mask = np.isnan(matrix)
matrix[nan_mask] = np.random.randint(0, 2, size=np.count_nonzero(nan_mask))
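For context, a minimal end-to-end sketch combining the question's setup with this mask-based fill:
import numpy as np

# Question's example setup: a binary matrix with a few NaNs sprinkled in.
row, col = 10, 5
matrix = np.random.randint(2, size=(row, col)).astype(float)
matrix[1, 2] = matrix[5, 3] = matrix[8, 0] = np.nan

# Draw one independent 0/1 value per NaN rather than a single shared value.
nan_mask = np.isnan(matrix)
matrix[nan_mask] = np.random.randint(0, 2, size=np.count_nonzero(nan_mask))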
You can use a vectorized function:
random_replace = np.vectorize(lambda x: np.random.randint(2) if np.isnan(x) else x)
random_replace(matrix)
Since the missing values are random in the data set, I think the best way to replace them would be using random 0s and 1s.
I'd heartily contradict you here. Unless you have a stochastic model that justifies assuming an equal probability for each element to be either 0 or 1, this would bias your observations.
Now, I don't know where your data comes from, but "2D array" sure sounds like an image signal, or something of the like. In many signal types most of the energy is in the low frequencies; if something like that is the case for you, you can probably get less distortion by replacing the missing values with elements of a low-pass filtered version of your 2D array.
Either way, since you need to call numpy.isnan from python to check whether a value is NaN, I think the only way to solve this is writing an efficient loop, unless you want to senselessly calculate a huge random 2D array just to fill in a few missing numbers.
EDIT: oh, I like the vectorized version; it's effectively what I'd call an efficient loop, since it does the looping without interpreting a Python loop iteration each time.
EDIT2: the mask method with counting nonzeros is even more effective, I guess :)

Check array for values equal or very close to zero

I have a one dimensional numpy array for which I need to find out if any value is zero or very close to it.
With this line I can check for zeros rapidly:
if 0. in my_array:
# do something
but I also have very small elements like 1.e-22 that I would like to treat as zeros (otherwise I get a divide-by-zero warning further down the road).
Say my threshold is 1.e-6 and I want to efficiently check if any value in my array is smaller than that. How can I do this?
There's no reason to loop in Python; just broadcast the abs and the < and use np.any:
np.any(np.absolute(my_array) < eps)
If you are using this for testing you can use numpy.testing.assert_almost_equal
As the documentation says, it uses a method similar to the one #phihag suggests:
any(abs(x) < 0.5 * 10**(-decimal))
If you are doing this often, you should try to use searchsorted, or, if you have SciPy, a KDTree (or cKDTree depending on the version), to speed things up.
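One way to read the searchsorted suggestion, as a rough sketch: sort the array once, then locate the window around zero with two binary searches (the eps threshold here is the question's 1.e-6):
import numpy as np

def has_near_zero(sorted_array, eps=1.e-6):
    """Assumes `sorted_array` is sorted in ascending order."""
    lo = np.searchsorted(sorted_array, -eps, side='left')
    hi = np.searchsorted(sorted_array, eps, side='right')
    return hi > lo   # at least one element falls inside [-eps, eps]

my_array = np.sort(np.array([3.2, -1.5, 1.e-22, 7.0]))
print(has_near_zero(my_array))   # True, because of the 1.e-22 entry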
