Test of equality between arrays using in1d function - python

Before I ask my question, I provide you with the code.
Code
from scipy import *
x = randn(10)
cum_x = cumsum(x)
#The objective is to recover x using cum_x and the diff function.
y = append(cum_x[0],diff(cum_x))
#Now, y should be equal to x but this is not confirmed by the function in1d
test = in1d(x,y)
The variable test is not an array of all True values, even though y and x appear to be equal. What is the problem here?
Thank you in advance.

If you use set_printoptions to increase the print precision, you will see some differences:
from scipy import *
set_printoptions(30)
x = randn(10)
cum_x = cumsum(x)
#The objective is to recover x using cum_x and the diff function.
y = append(cum_x[0], diff(cum_x))
print(x)
print("\n")
print(y)
#Now, y should be equal to x but this is not confirmed by the function in1d
test = in1d(x, y)
print(test)
Output:
[ 0.54816314147543721002620031868 0.14319052613251953554041051575
0.489110961092741158839913850898 -0.093011827554544138085823590245
-0.58370623188476589149331630324 -0.40395493550429123486011917521
0.387387395892057895263604905267 1.001637373359834937147638811439
-1.486778459872974744726548124163 1.446772274227251076084144187917]
[ 0.54816314147543721002620031868 0.143190526132519591051561747008
0.48911096109274110332876261964 -0.093011827554544179719187013688
-0.58370623188476589149331630324 -0.40395493550429123486011917521
0.387387395892057895263604905267 1.001637373359834937147638811439
-1.486778459872974744726548124163 1.446772274227251076084144187917]
[ True False False False True True True True True True]
What you probably want is allclose. Interestingly, setting the dtype to np.float128 or np.longdouble on my Ubuntu system does not lose precision, and in1d returns all True:
import numpy as np
cum_x = cumsum(x, dtype=np.longdouble)
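A minimal allclose check then looks like this (allclose compares elementwise within the default tolerances rtol=1e-05 and atol=1e-08):
test = allclose(x, y)
print(test)  # True, even though in1d reports elementwise mismatches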

Related

Why do I observe a difference between direct (manual) and looped evaluation of a summation?

I am evaluating a long summation by computing each term separately with a for loop (Python 3.5 + NumPy 1.15.4). However, I obtained a surprising result when comparing manual term-by-term evaluation against the for loop. See the MWE below.
S = sum(c_i * x^i) for i = 0..n
Main questions:
Where does the difference in the outputs y1 and y2 originate from?
How could I alter the code such that the for-loop yields the expected result (y1==y2)?
Comparing dy1 and dy2:
dy1:
[-1.76004137e-02 3.50290845e+01 1.50326037e+01 -7.25045852e+01
2.08908445e+02 -3.31104542e+02 2.98005855e+02 -1.53154111e+02
4.18203833e+01 -4.68961704e+00 0.00000000e+00]
dy2:
[-1.76004137e-02 3.50290845e+01 1.50326037e+01 -7.25045852e+01
-3.27960559e-01 -4.01636743e-04 2.26525295e-07 4.80637463e-10
1.93967535e-13 -1.93976497e-17 -0.00000000e+00]
dy1==dy2:
[ True True True True False False False False False False True]
Thanks!
MWE:
import numpy as np
coeff = np.array([
    [ 0.000000000000E+00, -0.176004136860E-01],
    [ 0.394501280250E-01,  0.389212049750E-01],
    [ 0.236223735980E-04,  0.185587700320E-04],
    [-0.328589067840E-06, -0.994575928740E-07],
    [-0.499048287770E-08,  0.318409457190E-09],
    [-0.675090591730E-10, -0.560728448890E-12],
    [-0.574103274280E-12,  0.560750590590E-15],
    [-0.310888728940E-14, -0.320207200030E-18],
    [-0.104516093650E-16,  0.971511471520E-22],
    [-0.198892668780E-19, -0.121047212750E-25],
    [-0.163226974860E-22,  0.000000000000E+00]
]).T
c = coeff[1] # select appropriate coeffs
x = 900 # input
# manual calculation
y = c[0]*x**0 + c[1]*x**1 + c[2]*x**2 + c[3]*x**3 + c[4]*x**4 + \
c[5]*x**5 + c[6]*x**6 + c[7]*x**7 + c[8]*x**8 + c[9]*x**9 + c[10]*x**10
print('y:',y)
# calc terms individually
dy1 = np.zeros(c.size)
dy1[0] = c[0]*x**0
dy1[1] = c[1]*x**1
dy1[2] = c[2]*x**2
dy1[3] = c[3]*x**3
dy1[4] = c[4]*x**4
dy1[5] = c[5]*x**5
dy1[6] = c[6]*x**6
dy1[7] = c[7]*x**7
dy1[8] = c[8]*x**8
dy1[9] = c[9]*x**9
dy1[10] = c[10]*x**10
# calc terms in for loop
dy2 = np.zeros(len(c))
for i in np.arange(len(c)):
    dy2[i] = c[i]*x**i
# summation and print
y1 = np.sum(dy1)
print('y1:',y1)
y2 = np.sum(dy2)
print('y2:',y2)
Output:
y: 37.325915370853856
y1: 37.32591537085385
y2: -22.788859384118823
It seems that raising a Python int to the power of a NumPy integer (of a specific size) converts the result to a NumPy integer of the same size.
Example:
type(900**np.int32(10))
returns numpy.int32 and
type(900**np.int64(10))
returns numpy.int64
From this Stack Overflow question it seems that while Python ints are variable-sized, NumPy integers are not (the size is specified by the type, for example np.int32 or np.int64). So while Python's range function returns integers of variable size (Python's int type), np.arange returns integers of a specific type (if not specified, the type is inferred).
Trying to compare the Python integer math vs numpy integer math:
900**10 returns 348678440100000000000000000000
while 900**np.int32(10) returns -871366656
It looks like you get integer overflow via the np.arange function because the NumPy integer dtype (inferred here as np.int32) is too small to store the resulting value.
Edit:
In this specific case, using np.arange(len(c), dtype = np.uint64) seems to output the right values:
dy2 = np.zeros(len(c))
for i in np.arange(len(c), dtype=np.uint64):
    dy2[i] = c[i]*x**i
dy1 == dy2
Outputs:
array([ True, True, True, True, True, True, True, True, True,
True, True])
Note: the accuracy might suffer using NumPy in this case (int(900**np.uint64(10)) returns 348678440099999970966892445696, which is less than 900**10), so if that is of importance, I'd still opt to use the Python built-in range function.
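For reference, a minimal sketch of the same loop with the built-in range; Python ints are arbitrary-precision, so x**i is computed exactly:
dy2 = np.zeros(len(c))
for i in range(len(c)):
    dy2[i] = c[i]*x**i  # i is a plain Python int here, so 900**10 cannot overflow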

Function call error in Python

I have a 1D array X with both +/- elements. I'm isolating their signs as follows:
idxN, idxP = X<0, X>=0
Now I want to create an array whose value depends on the sign of X. I was trying to compute this but it gives the captioned syntax error.
y(idxN) = [math.log(1+np.exp(x)) for x in X(idxN)]
y(idxP) = X(idxP)+[math.log(np.exp(-x)+1) for x in X(idxP)];
Is the LHS assignment the culprit?
Thanks.
[Edit] The full code is as follows:
y = np.zeros(X.shape)
idxN, idxP = X<0, X>=0
y(idxN) = [math.log(1+np.exp(x)) for x in X(idxN)]
y(idxP) = X(idxP)+[math.log(np.exp(-x)+1) for x in X(idxP)];
return y
The traceback is:
File "<ipython-input-63-9a4488f04660>", line 1
    y(idxN) = [math.log(1+np.exp(x)) for x in X(idxN)]
    ^
SyntaxError: can't assign to function call
In some programming languages, like Matlab, indexes are referenced with parentheses. In Python, indexes are referenced with square brackets.
If I have a list, mylist = [1,2,3,4], I reference elements like this:
> mylist[1]
2
When you say y(idxN), Python thinks you are trying to pass idxN as an argument to a function named y.
I got it to work like this:
y = np.zeros(X.shape)
idxN, idxP = X < 0, X >= 0
yn, yp, xn, xp = y[idxN], y[idxP], X[idxN], X[idxP]
yn = [math.log(1 + np.exp(x)) for x in xn]
yp = xp + [math.log(np.exp(-x) + 1) for x in xp]
If there is a better way, please let me know. Thanks.
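One such better way, as a sketch (assuming X is a NumPy array): boolean masks also work on the left-hand side of an assignment, which writes directly into y. Note that in the snippet above, yn and yp are copies, so reassigning them never changes y:
y = np.zeros(X.shape)
idxN, idxP = X < 0, X >= 0
y[idxN] = np.log1p(np.exp(X[idxN]))             # log(1 + e^x) for x < 0
y[idxP] = X[idxP] + np.log1p(np.exp(-X[idxP]))  # x + log(1 + e^-x) for x >= 0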

Writing a function for x * sin(3/x) in python

I have to write a function, s(x) = x * sin(3/x) in python that is capable of taking single values or vectors/arrays, but I'm having a little trouble handling the cases when x is zero (or has an element that's zero). This is what I have so far:
def s(x):
    result = zeros(size(x))
    for a in range(0, size(x)):
        if x[a] == 0:
            result[a] = 0
        else:
            result[a] = float(x[a] * sin(3.0/x[a]))
    return result
Which...doesn't work for x = 0. And it's kinda messy. Even worse, I'm unable to use sympy's integrate function on it, or use it in my own simpson/trapezoidal rule code. Any ideas?
When I use integrate() on this function, I get the following error message: "Symbol" object does not support indexing.
This takes about 30 seconds per integrate call:
import sympy as sp
x = sp.Symbol('x')
int2 = sp.integrate(x*sp.sin(3./x), (x, 0.000001, 2)).evalf(8)
print(int2)
int1 = sp.integrate(x*sp.sin(3./x), (x, 0, 2)).evalf(8)
print(int1)
The results are:
1.0996940
-4.5*Si(zoo) + 8.1682775
Clearly you want to start the integration from a small positive number to avoid the problem at x = 0.
You can also assign x*sin(3./x) to a variable, e.g.:
s = x*sin(3./x)
int1 = sp.integrate(s, (x, 0.00001, 2))
My original answer using scipy to compute the integral:
import scipy.integrate
import math

def s(x):
    if abs(x) < 0.00001:
        return 0
    else:
        return x*math.sin(3.0/x)

s_exact = scipy.integrate.quad(s, 0, 2)
print(s_exact)
See the scipy docs for more integration options.
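And if you want a version of s that handles both scalars and arrays without the explicit loop, here is a vectorized sketch using a boolean mask (s_vec is a hypothetical name, not from the original question):
import numpy as np

def s_vec(x):
    arr = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.zeros_like(arr)   # zero stays zero, matching lim x*sin(3/x) = 0
    nz = arr != 0              # mask out zeros to avoid dividing by zero
    out[nz] = arr[nz] * np.sin(3.0 / arr[nz])
    return out if np.ndim(x) else out[0]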
If you want to use SymPy's integrate, you need a symbolic function. A wrong value at a point doesn't really matter for integration (at least mathematically), so you shouldn't worry about it.
It seems there is a bug in SymPy that gives an answer in terms of zoo at 0, because it isn't using limit correctly. You'll need to compute the limits manually. For example, the integral from 0 to 1:
In [14]: res = integrate(x*sin(3/x), x)
In [15]: ans = limit(res, x, 1) - limit(res, x, 0)
In [16]: ans
Out[16]: -9⋅π/4 + 3⋅cos(3)/2 + sin(3)/2 + 9⋅Si(3)/2
In [17]: ans.evalf()
Out[17]: -0.164075835450162

python: getting around division by zero

I have a big data set of floating point numbers. I iterate through them and evaluate np.log(x) for each of them.
I get
RuntimeWarning: divide by zero encountered in log
I would like to get around this and return 0 if this error occurs.
I am thinking of defining a new function:
def safe_ln(x):
    # returns: ln(x) but replaces -inf with 0
    l = np.log(x)
    # if l == -inf:
    l = 0
    return l
Basically, I need a way of testing whether the output is -inf, but I don't know how to proceed.
Thank you for your help!
You are using an np function, so I can safely guess that you are working on a NumPy array. In that case, the most efficient way to do this is to use the where function instead of a for loop:
myarray= np.random.randint(10,size=10)
result = np.where(myarray>0, np.log(myarray), 0)
Otherwise, you can simply use the log function and then patch the holes:
myarray= np.random.randint(10,size=10)
result = np.log(myarray)
result[result==-np.inf]=0
The np.log function correctly returns -inf when applied to a value of 0, so are you sure you want to return 0 instead? If you ever need to invert the result, you are going to run into problems, since the original zeros will come back as ones...
Since the log for x=0 is minus infinite, I'd simply check if the input value is zero and return whatever you want there:
def safe_ln(x):
    if x <= 0:
        return 0
    return math.log(x)
EDIT: small edit: you should check for all values smaller than or equal to 0.
EDIT 2: np.log is of course meant for calculating on a NumPy array; for single values you should use math.log. This is how the above function looks with NumPy:
def safe_ln(x, minval=0.0000000001):
    return np.log(x.clip(min=minval))
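For instance, zeros get clipped up to minval before the log is taken, so they map to log(1e-10) ≈ -23.03 rather than -inf:
>>> safe_ln(np.array([0.0, 1.0]))
array([-23.02585093,   0.        ])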
You can do this:
def safe_ln(x):
    try:
        l = np.log(x)
    except ZeroDivisionError:
        # note: by default np.log(0) returns -inf with a RuntimeWarning
        # rather than raising, so this branch will not actually trigger
        l = 0
    return l
I like to use sys.float_info.min as follows:
>>> import numpy as np
>>> import sys
>>> arr = np.linspace(0.0, 1.0, 3)
>>> print(arr)
[0. 0.5 1. ]
>>> arr[arr < sys.float_info.min] = sys.float_info.min
>>> print(arr)
[2.22507386e-308 5.00000000e-001 1.00000000e+000]
>>> np.log10(arr)
array([-3.07652656e+02, -3.01029996e-01, 0.00000000e+00])
Other answers have also introduced small positive values, but I prefer to use the smallest possible value to make the approximation more accurate.
The answer given by Enrico is nice, but both solutions result in a warning:
RuntimeWarning: divide by zero encountered in log
As an alternative, we can still use the where function but only execute the main computation where it is appropriate:
# alternative implementation -- a bit more typing but avoids warnings.
loc = np.where(myarray > 0)
result2 = np.zeros_like(myarray, dtype=float)
result2[loc] = np.log(myarray[loc])
# answer from Enrico...
myarray= np.random.randint(10,size=10)
result = np.where(myarray>0, np.log(myarray), 0)
# check it is giving right solution:
print(np.allclose(result, result2))
My use case was for division, but the principle is clearly the same:
x = np.random.randint(10, size=10)
divisor = np.ones(10,)
divisor[3] = 0 # make one divisor invalid
y = np.zeros_like(divisor, dtype=float)
loc = np.where(divisor>0) # (or !=0 if your data could have -ve values)
y[loc] = x[loc] / divisor[loc]
Use exception handling:
In [27]: def safe_ln(x):
   ....:     try:
   ....:         return math.log(x)
   ....:     except ValueError:  # np.log(x) might raise some other error though
   ....:         return float("-inf")
   ....:
In [28]: safe_ln(0)
Out[28]: -inf
In [29]: safe_ln(1)
Out[29]: 0.0
In [30]: safe_ln(-100)
Out[30]: -inf
You could do:
def safe_ln(x):
    # returns: ln(x) but replaces -inf with 0
    try:
        l = np.log(x)
    except RuntimeWarning:
        # warnings only become catchable exceptions if they are
        # escalated first, e.g. with warnings.simplefilter("error")
        l = 0
    return l
For those looking for an np.log solution that takes an np.ndarray and nudges up only the zero values:
import numpy as np

def smarter_nextafter(x: np.ndarray) -> np.ndarray:
    safe_x = np.where(x != 0, x, np.nextafter(x, 1))
    return np.log(safe_x)

def clip_usage(x: np.ndarray, safe_min: float | None = None) -> np.ndarray:
    # Inspiration: https://stackoverflow.com/a/13497931/
    # np.finfo(x.dtype).tiny is the smallest positive normal float,
    # the NumPy analogue of sys.float_info.min
    clipped_x = x.clip(min=safe_min or np.finfo(x.dtype).tiny)
    return np.log(clipped_x)

def inplace_usage(x: np.ndarray, safe_min: float | None = None) -> np.ndarray:
    # Inspiration: https://stackoverflow.com/a/62292638/
    x[x == 0] = safe_min or np.finfo(x.dtype).tiny
    return np.log(x)
Or if you don't mind nudging all values and like bad big-O runtimes:
def brute_nextafter(x: np.ndarray) -> np.ndarray:
    # Just for reference, don't use this
    while not x.all():
        x = np.nextafter(x, 1)
    return np.log(x)

True or false output based on a probability

Is there a standard function for Python which outputs True or False probabilistically based on the input of a random number from 0 to 1?
example of what I mean:
def decision(probability):
    ...code goes here...
    return ...True or False...
The above example, if given an input of, say, 0.7, should return True with 70% probability and False with 30% probability.
import random

def decision(probability):
    return random.random() < probability
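For example, sampling it many times should give a True fraction close to the input probability (illustrative output; it will vary from run to run):
>>> sum(decision(0.7) for _ in range(10000)) / 10000
0.7021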
Given a function rand that returns a number between 0 and 1, you can define decision like this:
def decision(probability):
    return rand() < probability
This assumes that rand() returns a value in the range [0.0, 1.0), i.e. it can output 0.0 but will never output 1.0.
Just use the PyProbs library. It is very easy to use.
>>> from pyprobs import Probability as pr
>>>
>>> # You can pass float (i.e. 0.5, 0.157), int (i.e. 1, 0) or str (i.e. '50%', '3/11')
>>> pr.prob(50/100)
False
>>> pr.prob(50/100, num=5)
[False, False, False, True, False]
I use this to generate a random boolean in Python with a given probability:
from random import randint

n = 8  # inverse of the probability
rand_bool = randint(0, n*n - 1) % n == 0  # n of the n*n outcomes are multiples of n, i.e. probability 1/n
So to expand that:
def rand_bool(prob):
    s = str(prob)              # e.g. '0.7'
    p = s.index('.')
    d = 10**(len(s) - p - 1)   # scale matching the number of decimal digits
    return randint(0, d*d - 1) % d < int(s[p+1:])
I came up with this myself, but it seems to work.
If you want to amass a lot of data, I would suggest using a map:
from numpy import random as rn
p = 0.15
data = rn.random(100)
final_data = list(map(lambda x: x < p, data))
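Since data is already a NumPy array, the comparison itself vectorizes, so the map is arguably unnecessary (a minor variant of the snippet above):
final_data = data < p  # NumPy boolean array; final_data.tolist() gives a plain list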
