Python sklearn: Why did I receive warnings the first time only?

I received warnings only the first time.
Is this normal?
>>> cv=LassoCV(cv=10).fit(x,y)
C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\linear_model\coordinate_descent.py:418: UserWarning: Objective did not converge. You might want to increase the number of iterations
' to increase the number of iterations')
>>> cv=LassoCV(cv=10).fit(x,y)
>>>

It's because Python's warnings filter is set, by default, to show a particular warning only the first time it is triggered.
If you want to get all the warnings, just add this:
import warnings
warnings.simplefilter("always")

The warning itself appears because the objective did not converge. max_iter defaults to 1000 and you are not setting it. Try setting the max_iter parameter to a higher value to avoid the warning.
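For instance, reusing the question's x and y (a sketch; the value that actually works depends on your data):
from sklearn.linear_model import LassoCV

# raise max_iter from its default of 1000 so coordinate descent can converge
cv = LassoCV(cv=10, max_iter=100000).fit(x, y)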

Related

How can I handle divergence failure manually when using optimize.newton in SciPy?

I'm using SciPy's newton optimizer to solve an equation, and depending on the initial guess the solution sometimes does not converge and crashes.
x = optimize.newton(fun,1/1000)
Would it be possible to print a message saying that convergence failed, instead of the Python crash message, or to retry the optimization with different initial values?
From the documentation:
disp: bool, optional
If True, raise a RuntimeError if the algorithm didn’t converge, with the error message containing the number of iterations and current function value. Otherwise the convergence status is recorded in a RootResults return object. Ignored if x0 is not scalar. Note: this has little to do with displaying, however the disp keyword cannot be renamed for backwards compatibility.
You should set disp to False, because it is enabled by default:
optimize.newton(fun, 1/1000, disp=False)
Your result and other information will be in a RootResults object (to actually receive it, also pass full_output=True).
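A minimal runnable sketch (fun here is a hypothetical stand-in for your own equation):
from scipy import optimize

def fun(x):
    # hypothetical equation; replace with your own
    return x**3 - 1

# disp=False suppresses the RuntimeError on non-convergence;
# full_output=True returns a RootResults object alongside the root
root, result = optimize.newton(fun, 1/1000, disp=False, full_output=True)
if result.converged:
    print("root:", root)
else:
    print("convergence failed after", result.iterations, "iterations")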

Python tqdm package - how to configure for less frequent status bar updates

I'm using a tqdm package (v4.19.5) for a loop with 200K+ iterations in Python 3.6 (Anaconda 3, Windows 10 x64, PyCharm IDE). It prints the status bar too frequently, so that my earlier history disappears in the console.
Is there a way to limit the frequency of status bar updates? I can't seem to find a solution online or on the tqdm website.
Ideally, I'd like to put a cap on the update frequency in terms of percentage change. For example, update it no more frequently than once per 1% of progress.
You can use the miniters parameter. I found it in the source code:
https://github.com/tqdm/tqdm/blob/master/tqdm/_tqdm.py
tqdm.tqdm(iterable, miniters=int(223265/100))
The accepted answer and the cited public API
tqdm.tqdm(iterable, miniters=int(223265/100))
are correct, but the code location has changed: https://github.com/tqdm/tqdm/blob/master/tqdm/std.py#L890-L897
Below is the description of miniters option:
miniters : int or float, optional
Minimum progress display update interval, in iterations.
If 0 and `dynamic_miniters`, will automatically adjust to equal
`mininterval` (more CPU efficient, good for tight loops).
If > 0, will skip display of specified number of iterations.
Tweak this and `mininterval` to get very efficient loops.
If your progress is erratic with both fast and slow iterations
(network, skipping items, etc) you should set miniters=1.
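As a runnable sketch of the once-per-1% cap asked about in the question (the 223265 total comes from the accepted answer; sleep stands in for real work):
from tqdm import tqdm
from time import sleep

total = 223265  # iteration count cited in the accepted answer
# refresh the display at most once per 1% of progress
for _ in tqdm(range(total), total=total, miniters=total // 100):
    sleep(0.0001)  # stand-in for real work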
It is convenient to use the mininterval option, which defines the minimum time interval between updates. For instance, to update at most every n_seconds use:
tqdm(iterable, mininterval=n_seconds)
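For example, this sketch refreshes the bar at most every 5 seconds, no matter how fast the loop runs (the loop body is a placeholder):
from tqdm import tqdm
from time import sleep

for _ in tqdm(range(200000), mininterval=5):
    sleep(0.001)  # placeholder for real work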
I may be seeing a different problem than the one you asked about: each update to the progress bar is written to a separate new line on your end, which has been a PyCharm bug for years. If you run your Python script directly in a command line, there should be only a single line showing progress, and it should keep updating in place. Alternatively, you can set "Emulate terminal in output console" in the run/debug configurations (see the screenshot attached). However, the problem might have been fixed by an update in the meantime.
Short answer: miniters, or mininterval and maxinterval together, are the way to go.
Long answer:
Use mininterval and maxinterval (they default to 0.1 and 10 seconds respectively)
Or, use miniters and maxinterval=float("inf") (see #1429 in tqdm<=4.64.1)
And set refresh=False in calls to set_postfix and set_postfix_str.
Three examples:
from tqdm import tqdm
from time import sleep

N = 100000

# mininterval and maxinterval in seconds
for i in tqdm(range(N), total=N, mininterval=2, maxinterval=2):
    sleep(0.3)

# miniters and maxinterval=float("inf")
for i in tqdm(range(N), total=N, miniters=50, maxinterval=float("inf")):
    sleep(0.3)

# use refresh=False in set_postfix
progress_bar = tqdm(range(N), total=N, miniters=240, maxinterval=float("inf"))
for i in progress_bar:
    sleep(0.3)
    progress_bar.set_postfix({"i": i}, refresh=False)

Runtimewarning when using scipy.stats.beta.fit

If I run the following code in python
from scipy.stats import norm, beta
sample = beta.rvs(2,5,size=100)
beta_fit = beta.fit(sample)
I get the following warning:
/usr/lib/python3/dist-packages/scipy/stats/_continuous_distns.py:404: RuntimeWarning: invalid value encountered in sqrt
  sk = 2*(b-a)*sqrt(a + b + 1) / (a + b + 2) / sqrt(a*b)
and depending on the size of the sample, I sometimes also get this other warning:
/usr/lib/python3/dist-packages/scipy/optimize/minpack.py:161: RuntimeWarning:
The iteration is not making good progress, as measured by the improvement from the last ten iterations.
warnings.warn(msg, RuntimeWarning)
Does anyone know why this is happening and how to fix it?
Thanks!
In a comment you say that you want to keep the support fixed as [0, 1]. To do that with the fit() method, use the arguments floc=0 and fscale=1. Then only the shape parameters will be fit to the data.
from scipy.stats import beta
sample = beta.rvs(2, 5, size=100)
beta_fit = beta.fit(sample, floc=0, fscale=1)
This should also eliminate the warnings that you are seeing. Those warnings occur because when all four parameters are fit, the code uses a generic numerical optimization routine to find the parameters that maximize the likelihood, and something in that code is generating those warnings. (It might be a bug--the shape parameters are supposed to be positive, so neither of the calls to sqrt in the line that generates the warning should get a negative argument.) When you fix the location and scale, the fit() method solves a simpler numerical problem to find the maximum likelihood parameter estimates, so it avoids the code that generates the warnings.
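A runnable sketch of the fix (random_state is added here only for reproducibility; fit returns the tuple (a, b, loc, scale)):
from scipy.stats import beta

# draw a sample from Beta(2, 5) and fit with the support fixed to [0, 1]
sample = beta.rvs(2, 5, size=100, random_state=0)
a, b, loc, scale = beta.fit(sample, floc=0, fscale=1)
print(a, b)        # fitted shape parameters, near the true 2 and 5
print(loc, scale)  # exactly 0.0 and 1.0, as fixed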

Joblib UserWarning while trying to cache results

I get the following UserWarning when trying to cache results using joblib:
import numpy
from tempfile import mkdtemp
cachedir = mkdtemp()
from joblib import Memory
memory = Memory(cachedir=cachedir, verbose=0)

@memory.cache
def get_nc_var3d(path_nc, var, year):
    """
    Get value from netcdf for variable var for year
    :param path_nc:
    :param var:
    :param year:
    :return:
    """
    try:
        hndl_nc = open_or_die(path_nc)
        val = hndl_nc.variables[var][int(year), :, :]
    except:
        val = numpy.nan
        logger.info('Error in getting var ' + var + ' for year ' + str(year) + ' from netcdf ')
    hndl_nc.close()
    return val
I get the following warning when calling this function using parameters:
UserWarning: Persisting input arguments took 0.58s to run.
If this happens often in your code, it can cause performance problems
(results will be correct in all cases).
The reason for this is probably some large input arguments for a wrapped function (e.g. large strings).
THIS IS A JOBLIB ISSUE. If you can, kindly provide the joblib's team with an example so that they can fix the problem.
Input parameters: C:/Users/rit/Documents/PhD/Projects/\GLA/Input/LUWH/\LUWLAN_v1.0h\transit_model.nc range_to_large 1150
How do I get rid of the warning? And why is it happening, since the input parameters are not too long?
I don't have an answer to the "why is it happening?" portion of the question. However, to simply ignore the warning you can use warnings.catch_warnings with warnings.simplefilter, as seen here:
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    your_code()
Obviously, I don't recommend ignoring a warning unless you're sure it's harmless. But if you're going to do it, this approach will only suppress warnings inside the context manager, and it's straight out of the Python docs.
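Applied to the question's function, a sketch would look like this (path_nc, var, and year are placeholders for the actual arguments):
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    # placeholders standing in for the question's actual arguments
    val = get_nc_var3d(path_nc, var, year)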
UserWarning: Persisting input arguments took 0.58s to run.
If this happens often in your code, it can cause performance problems
(results will be correct in all cases).
The reason for this is probably some large input arguments for a wrapped function (e.g. large strings).
THIS IS A JOBLIB ISSUE. If you can, kindly provide the joblib's team with an example so that they can fix the problem.
The warning itself is fairly self-explanatory, in my humble opinion. The issue might be in your code: you can try to decrease the input size, or you can share a reproducible example with the joblib team so that they can either improve joblib or suggest a better usage pattern that avoids this type of performance warning.
The warning occurs because joblib is trying to persist the input arguments to disk for caching purposes, and this serialization is taking a long time. The likely cause is a large input argument, such as a long string, which is slow to hash and serialize.
To work around it, you can either skip the memoized wrapper for this function (giving up caching) or preprocess the input arguments to reduce their size before calling the cached function.
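Separately, if the oversized argument does not actually influence the result, joblib's Memory.cache accepts an ignore list that excludes named arguments from hashing and persistence. A sketch under that assumption (summarize and big_blob are hypothetical names):
from tempfile import mkdtemp
from joblib import Memory

memory = Memory(mkdtemp(), verbose=0)

# 'big_blob' is excluded from the cache key and is never persisted;
# only safe if the result does not actually depend on big_blob
@memory.cache(ignore=['big_blob'])
def summarize(path, big_blob):
    return path, len(big_blob)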

How to reset warnings completely

How can I see a warning again without restarting Python? Right now I only see warnings once.
Consider this code for example:
import pandas as pd
pd.Series([1]) / 0
I get
RuntimeWarning: divide by zero encountered in true_divide
But when I run it again it executes silently.
How can I see the warning again without restarting python?
I have tried to do
del __warningregistry__
but that doesn't help.
Seems like only some types of warnings are stored there.
For example if I do:
def f():
    X = pd.DataFrame(dict(a=[1,2,3], b=[4,5,6]))
    Y = X.iloc[:2]
    Y['c'] = 8
then this will raise a warning only the first time f() is called.
However, now if I do del __warningregistry__ I can see the warning again.
What is the difference between the first and second warning? Why is only the second one stored in __warningregistry__? Where is the first one stored?
How can I see the warning again without restarting python?
As long as you do the following at the beginning of your script, you will not need to restart.
import pandas as pd
import numpy as np
import warnings
np.seterr(all='warn')
warnings.simplefilter("always")
At this point every time you attempt to divide by zero, it will display
RuntimeWarning: divide by zero encountered in true_divide
Explanation:
We are setting up a couple of warning filters. The first (np.seterr) tells NumPy how it should handle floating-point errors. I have set it to show warnings for all of them, but if you are only interested in the divide-by-zero warnings, change the parameter from all to divide.
Next, we tell the warnings module to always display warnings, by setting up a warning filter.
What is the difference between first and second warning? Why only the second one is stored in this __warningregistry__? Where is the first one stored?
This is described in the bug report for this issue:
If you didn't raise the warning before using the simple filter, this
would have worked. The undesired behavior is because of
__warningsregistry__. It is set the first time the warning is emitted.
When the second warning comes through, the filter isn't even looked at.
I think the best way to fix this is to invalidate __warningsregistry__
when a filter is used. It would probably be best to store warnings data
in a global then instead of on the module, so it is easy to invalidate.
Incidentally, the bug has been closed as fixed for versions 3.4 and 3.5.
warnings is a pretty awesome standard library module. You're going to enjoy getting to know it :)
A little background
The default behavior of warnings is to only show a particular warning, coming from a particular line, on its first occurrence. For instance, the following code will result in two warnings shown to the user:
import numpy as np

# 10 warnings, but only the first copy will be shown
for i in range(10):
    np.true_divide(1, 0)

# This is on a separate line from the other "copies", so its warning will show
np.true_divide(1, 0)
You have a few options to change this behavior.
Option 1: Reset the warnings registry
When you want Python to "forget" what warnings you've seen before, you can use resetwarnings:
import warnings

# warns every time, because the warnings registry has been reset
for i in range(10):
    warnings.resetwarnings()
    np.true_divide(1, 0)
Note that this also resets any warning configuration changes you've made. Which brings me to...
Option 2: Change the warnings configuration
The warnings module documentation covers this in greater detail, but one straightforward option is just to use a simplefilter to change that default behavior.
import warnings
import numpy as np

# Show all warnings
warnings.simplefilter('always')

for i in range(10):
    # Now this will warn every loop
    np.true_divide(1, 0)
Since this is a global configuration change, it has global effects which you'll likely want to avoid (all warnings anywhere in your application will show every time). A less drastic option is to use the context manager:
with warnings.catch_warnings():
    warnings.simplefilter('always')
    for i in range(10):
        # This will warn every loop
        np.true_divide(1, 0)

# Back to normal behavior: only warn once
for i in range(10):
    np.true_divide(1, 0)
There are also more granular options for changing the configuration on specific types of warnings. For that, check out the docs.
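For example, this sketch applies the "always" action to a single warning category while leaving every other category at its default behavior:
import warnings
import numpy as np

# only RuntimeWarnings are shown on every occurrence;
# other categories keep the default show-once behavior
warnings.filterwarnings('always', category=RuntimeWarning)

for i in range(10):
    np.true_divide(1, 0)  # warns on every iteration now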
