How can I avoid value errors when using numpy.random.multinomial? - python

When I use this random generator: numpy.random.multinomial, I keep getting:
ValueError: sum(pvals[:-1]) > 1.0
I am always passing the output of this softmax function:
def softmax(w, t = 1.0):
    e = np.exp(np.array(w) / t)
    dist = e / np.sum(e)
    return dist
except now that I am getting this error, I also added this for the parameter (pvals):
while numpy.sum(pvals) > 1:
    pvals /= (1+1e-5)
but that didn't solve it. What is the right way to make sure I avoid this error?
EDIT: here is the function that includes this code:
def get_MDN_prediction(vec):
    coeffs = vec[::3]
    means = vec[1::3]
    stds = np.log(1+np.exp(vec[2::3]))
    stds = np.maximum(stds, min_std)
    coe = softmax(coeffs)
    while np.sum(coe) > 1-1e-9:
        coe /= (1+1e-5)
    coeff = unhot(np.random.multinomial(1, coe))
    return np.random.normal(means[coeff], stds[coeff])

I also encountered this problem during my language modelling work.
The root of this problem lies in NumPy's implicit type casting: the output of my softmax() is float32, but numpy.random.multinomial() IMPLICITLY casts pvals to float64. Because of numerical rounding, this cast sometimes makes pvals.sum() exceed 1.0.
This issue is recognized and posted here

I know the question is old but since I faced the same problem just now, it seems to me it's still valid. Here's the solution I've found for it:
a = np.asarray(a).astype('float64')
a = a / np.sum(a)
b = np.random.multinomial(1, a, 1)
The important part is the cast to float64 on the first line. If you omit it, the problem you've mentioned will happen from time to time. But if you change the type of the array to float64 before normalizing, it will not happen.
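The cast-then-normalize idea above can be sketched as a small helper. This is only an illustration under stated assumptions: the name `sample_index` is hypothetical, the input is assumed to be a float32 probability vector, and the newer `numpy.random.default_rng` generator is used instead of the legacy `np.random.multinomial` call.

```python
import numpy as np

def sample_index(pvals, rng=None):
    """Draw one category index from (possibly float32) probabilities.

    Hypothetical helper, not part of NumPy.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Cast to float64 first, then renormalize, so the rounding happens
    # in the same precision that multinomial checks the sum in.
    p = np.asarray(pvals, dtype=np.float64)
    p = p / p.sum()
    one_hot = rng.multinomial(1, p)
    return int(np.argmax(one_hot))

probs32 = np.array([0.2, 0.3, 0.5], dtype=np.float32)
idx = sample_index(probs32, rng=np.random.default_rng(0))
```

Normalizing after the cast matters: dividing in float32 and casting afterwards reintroduces the rounding that triggers the error.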

Something that few people noticed: a robust version of the softmax can be easily obtained by removing the logsumexp from the values:
import numpy as np
from scipy.special import logsumexp  # formerly scipy.misc.logsumexp
def log_softmax(vec):
    return vec - logsumexp(vec)
def softmax(vec):
    return np.exp(log_softmax(vec))
Just check it:
print(softmax(np.array([1.0, 0.0, -1.0, 1.1])))
Simple, isn't it?
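If SciPy is not available, a common NumPy-only alternative is to subtract the maximum before exponentiating. The sketch below is not the code from the answer above; the name `stable_softmax` is an assumption, and the temperature parameter `t` mirrors the one in the question.

```python
import numpy as np

def stable_softmax(w, t=1.0):
    """Softmax that avoids overflow by shifting the inputs (hypothetical helper)."""
    z = np.asarray(w, dtype=np.float64) / t
    # After the shift the largest exponent is 0, so np.exp cannot overflow.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

p = stable_softmax([1000.0, 999.0, -1000.0])
```

The shift does not change the result mathematically, because the common factor exp(-max) cancels between numerator and denominator.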

The softmax implementation I was using is not stable enough for the values I was using it with. As a result, sometimes the output has a sum greater than 1 (e.g. 1.0000024...).
This case should be handled by the while loop. But sometimes the output contains NaNs, in which case the loop is never triggered, and the error persists.
Also, numpy.random.multinomial doesn't raise an error if it sees a NaN.
Here is what I'm using right now, instead:
def softmax(vec):
    # A is not defined in the original post; it is presumably an alias for np.array
    vec -= min(A(vec))
    if max(vec) > 700:
        a = np.argsort(vec)
        aa = np.argsort(a)
        vec = vec[a]
        i = 0
        while max(vec) > 700:
            i += 1
            vec -= vec[i]
        vec = vec[aa]
    e = np.exp(vec)
    return e/np.sum(e)
def sample_multinomial(w):
    """
    Sample multinomial distribution with parameters given by softmax of w
    Returns an int
    """
    p = softmax(w)
    x = np.random.uniform(0,1)
    for i,v in enumerate(np.cumsum(p)):
        if x < v: return i
    return len(p)-1 # shouldn't happen...

Related

Method of generating a string with results from a curve_fit

I have created a class which takes a distribution, and fits it. The method has the option for choosing between a few predefined functions.
As part of printing the class, I print the result of the fit in the form of an equation, where the fit results and their errors are displayed over the figure.
My question is: is there a tidy way to handle a negative number, so that the string for printing is formed as "y = mx - c" and not "y = mx + -c"?
I developed this with a linear fit, where I simply assess the sign of the constant, and form the string in one of two ways:
def fit_result_string(self, results, errors):
    if self.fit_model is utl.linear:
        if results[1] > 0:
            fit_str = r"y = {:.3}($\pm${:.3})x + {:.3}($\pm${:.3})".format(
                results[0],
                errors[0],
                results[1],
                errors[1])
        else:
            fit_str = r"y = {:.3}($\pm${:.3})x - {:.3}($\pm${:.3})".format(
                results[0],
                errors[0],
                abs(results[1]),
                errors[1])
        return fit_str
I now want to build this up to also be able to form a string containing the results if the fit model is changed to a 2nd, 3rd, or 4th degree polynomial, while handling the sign of each coefficient.
Is there a better way to do this than using a whole bunch of if-else statements?
Thanks in advance!
Define a function that returns '+' or '-' according to the sign of the given number, and call it inside an f-string.
def plus_minus_string(n):
    return '+' if n >= 0 else '-'
print(f"y = {m}x {plus_minus_string(c)} {abs(c)}")
Examples:
>>> m = 2
>>> c = 5
>>> print(f"y = {m}x {plus_minus_string(c)} {abs(c)}")
y = 2x + 5
>>> c = -4
>>> print(f"y = {m}x {plus_minus_string(c)} {abs(c)}")
y = 2x - 4
You will need to change it a bit to fit to your code, but it's quite straight-forward I hope.
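The same sign-handling idea extends to higher-degree polynomials by looping over the coefficients. The following is a rough sketch, not the asker's class method: `poly_string` is a hypothetical name, and it assumes the coefficients are ordered from highest degree to constant term, with one error value per coefficient.

```python
def poly_string(coeffs, errors, var="x"):
    """Format a polynomial fit as a string, handling the sign of each term.

    Hypothetical helper: coeffs are ordered highest degree first.
    """
    degree = len(coeffs) - 1
    parts = []
    for i, (c, e) in enumerate(zip(coeffs, errors)):
        power = degree - i
        if power == 0:
            term = ""
        elif power == 1:
            term = var
        else:
            term = f"{var}^{power}"
        # Always print the magnitude; the sign is emitted separately below.
        body = rf"{abs(float(c)):.3}($\pm${float(e):.3}){term}"
        if i == 0:
            parts.append(("-" if c < 0 else "") + body)
        else:
            parts.append(("- " if c < 0 else "+ ") + body)
    return "y = " + " ".join(parts)

example = poly_string([2.0, -3.5, 0.25], [0.1, 0.2, 0.05])
print(example)
```

Because the sign is chosen per term, no if-else chain per polynomial degree is needed; a linear fit is just the two-coefficient case.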

Sympy won't simplify or expand exponential with decimals

I'm trying to simplify a huge expression of powers of n, and one of SymPy's results contains (n+1)^1.0. I noticed that
f=n*((n+1)**1.0)
sympy.expand(f)
doesn't work: it stays the same instead of giving n^2+n. Is there any way to perform something like this?
Sympy will expand your expression as expected when the power is an integer number. If the power is stored as a rational or a float, it won't work. Your options are either to rewrite your expression using integers, or write some code that will automatically check if a float stores an integer number (up to numerical precision error) and act accordingly.
Here's a starting point for that:
from sympy import Integer, Mul, Pow, Symbol
def rewrite_polynomial(p):
    args_list = []
    if not p.is_Mul:
        return None
    for m in p.args:
        if not m.is_Pow:
            args_list.append(m)
        else:
            pow_val = m.args[1]
            if pow_val.is_Float:
                pow_val_int = int(pow_val)
                # replace the float exponent only if it holds an integer value
                if pow_val.epsilon_eq(pow_val_int):
                    args_list.append(Pow(m.args[0],Integer(pow_val_int)))
                else:
                    args_list.append(m)
            else:
                args_list.append(m)
    return Mul(*args_list)
n = Symbol('n')
f= n*((n+1)**1.0)
g = rewrite_polynomial(f)
print(g)
Based on Yakov's answer, I made a rewrite rule that does a DFS traversal of the expression tree and replaces float exponents that hold integer values with integers.
The code is probably not very efficient, but it worked for my use cases.
Since I'm not a sympy expert, I guess there are some edge cases where this code will break.
Anyways, here you go!
import sympy as s
def recurse_replace(expr,pred,func):
    if len(expr.args) == 0:
        return expr
    else:
        new_args = tuple(recurse_replace(a,pred,func) for a in expr.args)
        if pred(expr):
            return func(expr,new_args)
        else:
            return type(expr)(*new_args)
def rewrite(expr,new_args):
    new_args = list(new_args)
    pow_val = new_args[1]
    pow_val_int = int(new_args[1])
    if pow_val.epsilon_eq(pow_val_int):
        new_args[1] = s.Integer(pow_val_int)
    new_node = type(expr)(*new_args)
    return new_node
def isfloatpow(expr):
    out = expr.is_Pow and expr.args[1].is_Float
    return out
def clean_exponents(expr):
    return recurse_replace(expr,isfloatpow,rewrite)
x=s.symbols('x')
expr = (1+x) ** 1.0
s.pprint(expr)
expr2 = recurse_replace(expr,isfloatpow,rewrite)
s.pprint(expr2)
With output
       1.0
(x + 1)
x + 1
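A shorter route that may be enough in simple cases is SymPy's `nsimplify` with `rational=True`, which replaces every Float in the expression, exponents included, with an exact rational. This is a different technique from the tree traversals above, and it is a sketch under one assumption: that converting all floats in the expression (not only exponents) is acceptable.

```python
import sympy as sp

n = sp.Symbol("n")
f = n * (n + 1) ** 1.0

# rational=True turns Float atoms such as the exponent 1.0 into exact
# rationals, so the power collapses to an ordinary integer power.
g = sp.nsimplify(f, rational=True)
expanded = sp.expand(g)
print(expanded)
```

Note that a coefficient such as 0.5 would also become 1/2, which may or may not be what you want.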

Writing a function for x * sin(3/x) in python

I have to write a function, s(x) = x * sin(3/x) in python that is capable of taking single values or vectors/arrays, but I'm having a little trouble handling the cases when x is zero (or has an element that's zero). This is what I have so far:
def s(x):
    result = zeros(size(x))
    for a in range(0,size(x)):
        if (x[a] == 0):
            result[a] = 0
        else:
            result[a] = float(x[a] * sin(3.0/x[a]))
    return result
Which...doesn't work for x = 0. And it's kinda messy. Even worse, I'm unable to use sympy's integrate function on it, or use it in my own simpson/trapezoidal rule code. Any ideas?
When I use integrate() on this function, I get the following error message: "Symbol" object does not support indexing.
This takes about 30 seconds per integrate call:
import sympy as sp
x = sp.Symbol('x')
int2 = sp.integrate(x*sp.sin(3./x),(x,0.000001,2)).evalf(8)
print(int2)
int1 = sp.integrate(x*sp.sin(3./x),(x,0,2)).evalf(8)
print(int1)
The results are:
1.0996940
-4.5*Si(zoo) + 8.1682775
Clearly you want to start the integration from a small positive number to avoid the problem at x = 0.
You can also assign x*sin(3./x) to a variable, e.g.:
s = x*sp.sin(3./x)
int1 = sp.integrate(s, (x, 0.00001, 2))
My original answer using scipy to compute the integral:
import scipy.integrate
import math
def s(x):
    if abs(x) < 0.00001:
        return 0
    else:
        return x*math.sin(3.0/x)
s_exact = scipy.integrate.quad(s, 0, 2)
print(s_exact)
See the scipy docs for more integration options.
If you want to use SymPy's integrate, you need a symbolic function. A wrong value at a point doesn't really matter for integration (at least mathematically), so you shouldn't worry about it.
It seems there is a bug in SymPy that gives an answer in terms of zoo at 0, because it isn't using limit correctly. You'll need to compute the limits manually. For example, the integral from 0 to 1:
In [14]: res = integrate(x*sin(3/x), x)
In [15]: ans = limit(res, x, 1) - limit(res, x, 0)
In [16]: ans
Out[16]:
  9⋅π   3⋅cos(3)   sin(3)   9⋅Si(3)
- ─── + ──────── + ────── + ───────
   4       2         2         2
In [17]: ans.evalf()
Out[17]: -0.164075835450162
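For the part of the question about accepting both single values and arrays, a vectorised NumPy version is one option. This is a minimal sketch, assuming the value at x = 0 should be 0 (the limit of x*sin(3/x)); it is meant for numerical rules such as Simpson or trapezoidal code, not for SymPy's symbolic integrate.

```python
import numpy as np

def s(x):
    """x * sin(3/x) for scalars or arrays, with s(0) defined as 0."""
    x = np.asarray(x, dtype=float)
    nonzero = x != 0
    # Substitute a harmless value where x == 0 so the division never
    # sees a zero; those positions are overwritten with 0 afterwards.
    safe_x = np.where(nonzero, x, 1.0)
    return np.where(nonzero, x * np.sin(3.0 / safe_x), 0.0)

values = s([0.0, 1.0, 2.0])
single = float(s(0.0))
```

The function returns a NumPy array (zero-dimensional for a scalar input), so no explicit loop or indexing into x is needed.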

How to define codependent functions in Python?

I need to plot the position of a particle at time t, given the following formulae: s(t) = -0.5*g(s)*t^2+v0*t, where g(s) = G*M/(R+s(t))^2 (G, M, and R are constants, s being a value, not the function s(t)). The particle is being shot up vertically, and I want to print its current position every second until it hits the ground. But I can't figure out how to define one function without using the other before it's defined. This is my code so far:
G = 6.6742*10^(-11)
M = 5.9736*10^24
R = 6371000
s0 = 0
v0 = 300
t = 0
dt = 0.005
def g(s):
    def s(t):
        s(t) = -0.5*g(s)*t^2+v0*t
    g(s) = G*M/(R+s(t))^2
def v(t):
    v(t) = v(t-dt)-g(s(t-dt))*dt
while s(t) >= 0:
    s(t) = s(t-dt)+v(t)*dt
    t = t+dt
    if t == int(t):
        print s(t)
When I run the function, it says that it can't assign the function call.
The error means that you can't write s(t) = x, because s(t) is a function, and assignment on functions is performed with def .... Instead, you'll want to return the value, so you'd rewrite it like this:
def g(s):
    def s(t):
        return -0.5*g(s)*t^2+v0*t
    return G*M/(R+s(t))^2
However, there are other issues with that as well. From a computational standpoint, this calculation would never terminate. Python is not an algebra system and can't solve for certain values. If you try to call s(t) within g(s), and g(s) within s(t), you'd never terminate, unless you define a termination condition. Otherwise they'll keep calling each other, until the recursion stack is filled up and then throws an error.
Also, since you defined s(t) within g(s), you can't call it from the outside, as you do several times further down in your code.
You seem to be confused about several syntax and semantic specifics of Python. If you ask us for what exactly you'd like to do and provide us with the mathematical formulae for it, it might be easier to formulate an answer that may help you better.
Edit:
To determine the position of a particle at time t, you'll want the following code (reformatted your code to Python syntax, use ** instead of ^ and return statements):
G = 6.6742*10**(-11)
M = 5.9736*10**24
R = 6371000
s0 = 0
v0 = 300
t = 0
dt = 0.005
sc = s0 # Current position of the particle, initially at s0
def g(s):
    return -G*M/(R+s)**2
def s(t):
    return 0.5*g(sc)*t**2 + v0*t + s0
count = 0
while s(t) >= 0:
    if count % 200 == 0:
        print(sc)
    sc = s(t)
    count += 1
    t = dt*count
Python functions can call each other, but that's not how a function returns a value. To make a function return a particular value, use return, e.g.,
def v(t):
    return v(t - dt) - g(s(t - dt)) * dt
Furthermore, I don't really understand what you're trying to do with this, but you'll probably need to express yourself differently:
while s(t) >= 0:
    s(t) = s(t-dt)+v(t)*dt
    t = t+dt
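The update rules in the question, v(t) = v(t-dt) - g(s(t-dt))*dt and s(t) = s(t-dt) + v(t)*dt, describe a step-by-step (Euler-style) integration, which is easier to write with plain variables than with mutually recursive functions. The sketch below is one possible reading of what the asker intended; the function name `simulate` and the returned list of samples are assumptions.

```python
G = 6.6742e-11   # gravitational constant
M = 5.9736e24    # mass of the Earth in kg
R = 6371000.0    # radius of the Earth in m

def g(s):
    """Gravitational acceleration (positive, pointing down) at height s."""
    return G * M / (R + s) ** 2

def simulate(v0=300.0, dt=0.005):
    """Step the particle forward until it falls back to the ground.

    Returns a list of (time, height) pairs, one per whole second.
    """
    s, v, step = 0.0, v0, 0
    steps_per_second = round(1.0 / dt)
    samples = [(0.0, 0.0)]
    while True:
        v -= g(s) * dt      # velocity update uses the previous position
        s += v * dt         # position update uses the new velocity
        step += 1
        if s < 0:
            break
        # Count steps instead of comparing floats like t == int(t).
        if step % steps_per_second == 0:
            samples.append((step * dt, s))
    return samples

samples = simulate()
for time, height in samples[:3]:
    print(time, height)
```

Keeping the state in `s` and `v` and updating them in a loop avoids the problem of each function needing the other's value before it exists.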

python: getting around division by zero

I have a big data set of floating point numbers. I iterate through them and evaluate np.log(x) for each of them.
I get
RuntimeWarning: divide by zero encountered in log
I would like to get around this and return 0 if this error occurs.
I am thinking of defining a new function:
def safe_ln(x):
    #returns: ln(x) but replaces -inf with 0
    l = np.log(x)
    #if l = -inf:
    #    l = 0
    return l
Basically, I need a way of testing that the output is -inf but I don't know how to proceed.
Thank you for your help!
You are using a np function, so I can safely guess that you are working on a numpy array?
Then the most efficient way to do this is to use the where function instead of a for loop
myarray= np.random.randint(10,size=10)
result = np.where(myarray>0, np.log(myarray), 0)
otherwise you can simply use the log function and then patch the hole:
myarray= np.random.randint(10,size=10)
result = np.log(myarray)
result[result==-np.inf]=0
The np.log function correctly returns -inf for an input of 0, so are you sure that you want to return 0 instead? If you later have to recover the original values, you will run into problems, because exp(0) = 1 turns the zeros into ones.
Since the log for x=0 is minus infinite, I'd simply check if the input value is zero and return whatever you want there:
import math
def safe_ln(x):
    if x <= 0:
        return 0
    return math.log(x)
EDIT: small edit: you should check for all values smaller than or equal to 0.
EDIT 2: np.log is of course a function to calculate on a numpy array, for single values you should use math.log. This is how the above function looks with numpy:
def safe_ln(x, minval=0.0000000001):
    return np.log(x.clip(min=minval))
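You can do this. Note that np.log does not raise ZeroDivisionError by default; it only emits a warning. With np.errstate(divide='raise') it raises FloatingPointError, which can be caught:
def safe_ln(x):
    try:
        with np.errstate(divide='raise'):
            l = np.log(x)
    except FloatingPointError:
        l = 0
    return l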
I like to use sys.float_info.min as follows:
>>> import numpy as np
>>> import sys
>>> arr = np.linspace(0.0, 1.0, 3)
>>> print(arr)
[0. 0.5 1. ]
>>> arr[arr < sys.float_info.min] = sys.float_info.min
>>> print(arr)
[2.22507386e-308 5.00000000e-001 1.00000000e+000]
>>> np.log10(arr)
array([-3.07652656e+02, -3.01029996e-01, 0.00000000e+00])
Other answers have also introduced small positive values, but I prefer to use the smallest possible value to make the approximation more accurate.
The answer given by Enrico is nice, but both solutions result in a warning:
RuntimeWarning: divide by zero encountered in log
As an alternative, we can still use the where function but only execute the main computation where it is appropriate:
myarray = np.random.randint(10, size=10)
# alternative implementation -- a bit more typing but avoids warnings.
loc = np.where(myarray > 0)
result2 = np.zeros_like(myarray, dtype=float)
result2[loc] = np.log(myarray[loc])
# answer from Enrico...
result = np.where(myarray > 0, np.log(myarray), 0)
# check it is giving right solution:
print(np.allclose(result, result2))
My use case was for division, but the principle is clearly the same:
x = np.random.randint(10, size=10)
divisor = np.ones(10,)
divisor[3] = 0 # make one divisor invalid
y = np.zeros_like(divisor, dtype=float)
loc = np.where(divisor>0) # (or !=0 if your data could have -ve values)
y[loc] = x[loc] / divisor[loc]
Use exception handling:
In [27]: def safe_ln(x):
   ....:     try:
   ....:         return math.log(x)
   ....:     except ValueError:  # np.log(x) might raise some other error though
   ....:         return float("-inf")
   ....:
In [28]: safe_ln(0)
Out[28]: -inf
In [29]: safe_ln(1)
Out[29]: 0.0
In [30]: safe_ln(-100)
Out[30]: -inf
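You could do the following. The warning has to be turned into an exception first, otherwise the except clause never fires (this works for a scalar x):
import warnings
def safe_ln(x):
    #returns: ln(x) but replaces -inf with 0
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("error", RuntimeWarning)
            l = np.log(x)
    except RuntimeWarning:
        l = 0
    return l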
For those looking for a np.log solution that takes a np.ndarray and nudges up only zero values:
import numpy as np
def smarter_nextafter(x: np.ndarray) -> np.ndarray:
    safe_x = np.where(x != 0, x, np.nextafter(x, 1))
    return np.log(safe_x)
# Note: np.finfo(...).tiny is the smallest positive normal float;
# np.finfo(...).min is the most negative float and would not help here.
def clip_usage(x: np.ndarray, safe_min: float | None = None) -> np.ndarray:
    # Inspiration: https://stackoverflow.com/a/13497931/
    clipped_x = x.clip(min=safe_min or np.finfo(x.dtype).tiny)
    return np.log(clipped_x)
def inplace_usage(x: np.ndarray, safe_min: float | None = None) -> np.ndarray:
    # Inspiration: https://stackoverflow.com/a/62292638/
    x[x == 0] = safe_min or np.finfo(x.dtype).tiny
    return np.log(x)
Or if you don't mind nudging all values and like bad big-O runtimes:
def brute_nextafter(x: np.ndarray) -> np.ndarray:
    # Just for reference, don't use this
    while not x.all():
        x = np.nextafter(x, 1)
    return np.log(x)
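Another option that matches the original request (return 0 where the log is undefined, without any warning) is to use the `out` and `where` arguments that NumPy ufuncs such as `np.log` accept. This is a small sketch; the function name `safe_ln` is reused from the question, and treating negative inputs like zeros is an assumption.

```python
import numpy as np

def safe_ln(x):
    """ln(x) where x > 0, and 0 elsewhere, without a divide-by-zero warning."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    # The log is only evaluated where the mask is True; other positions
    # keep the zeros that `out` was initialised with.
    np.log(x, out=out, where=x > 0)
    return out

result = safe_ln([0.0, 1.0, np.e])
print(result)
```

Initialising `out` explicitly matters: positions excluded by `where` are left untouched, so an uninitialised output array would contain arbitrary values there.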
