Code is working but with a 'DeprecationWarning' - scipy.stats - Python

I wrote this code to calculate the mode and standard deviation for a large sample:
import numpy as np
import csv
import scipy.stats as sp
import math

r = open('stats.txt', 'w')  # file with results
r.write('Data File' + '\t' + 'Mode' + '\t' + 'Std Dev' + '\n')
f = open('data.ls', 'rb')  # file with the data files
for line in f:
    dataf = line.strip()
    data = csv.reader(open(dataf, 'rb'))
    data.next()
    data_list = []
    datacol = []
    data_list.extend(data)
    for rows in data_list:
        datacol.append(math.log10(float(rows[73])))
    m = sp.mode(datacol)
    s = sp.std(datacol)
    r.write(dataf + '\t' + str(m) + '\t' + str(s) + '\n')
    del(datacol)
    del(data_list)
Which is working well, I think! However, after I run the code there is a message on my terminal, and I am wondering if anybody can tell me what it means?
/usr/lib/python2.6/dist-packages/scipy/stats/stats.py:1328: DeprecationWarning: scipy.stats.std is deprecated; please update your code to use numpy.std.
Please note that:
- numpy.std axis argument defaults to None, not 0
- numpy.std has a ddof argument to replace bias in a more general manner.
scipy.stats.std(a, bias=True) can be replaced by numpy.std(x,
axis=0, ddof=0), scipy.stats.std(a, bias=False) by numpy.std(x, axis=0,
ddof=1).
ddof=1).""", DeprecationWarning)
/usr/lib/python2.6/dist-packages/scipy/stats/stats.py:1304: DeprecationWarning: scipy.stats.var is deprecated; please update your code to use numpy.var.
Please note that:
- numpy.var axis argument defaults to None, not 0
- numpy.var has a ddof argument to replace bias in a more general manner.
scipy.stats.var(a, bias=True) can be replaced by numpy.var(x,
axis=0, ddof=0), scipy.stats.var(a, bias=False) by var(x, axis=0,
ddof=1).
ddof=1).""", DeprecationWarning)
/usr/lib/python2.6/dist-packages/scipy/stats/stats.py:420: DeprecationWarning: scipy.stats.mean is deprecated; please update your code to use numpy.mean.
Please note that:
- numpy.mean axis argument defaults to None, not 0
- numpy.mean has a ddof argument to replace bias in a more general manner.
scipy.stats.mean(a, bias=True) can be replaced by numpy.mean(x,
axis=0, ddof=1).
axis=0, ddof=1).""", DeprecationWarning)

Those are deprecation warnings, which usually mean that your code will work, but may stop working in a future release.
Currently you have this line: s=sp.std(datacol). The warning suggests using numpy.std() instead of scipy.stats.std(); making this change should make the warning go away.
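For example, following the replacement rules quoted in the warning text, the call could be switched to NumPy roughly like this (a sketch; which ddof you need depends on whether your scipy.stats.std call was effectively using bias=True or bias=False, so check the docstring on your installation):

import numpy as np
import scipy.stats as sp

datacol = [0.5, 1.2, 1.2, 3.4]  # placeholder data, just for illustration

m = sp.mode(datacol)                   # mode is not deprecated, keep it as is
s = np.std(datacol, axis=0, ddof=0)    # replaces scipy.stats.std(datacol, bias=True)
# s = np.std(datacol, axis=0, ddof=1)  # use this if you relied on bias=False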
If you don't care about the deprecation warning and want to use your code as is, you can suppress it with the warnings module. For example, if you have a function fxn() that generates a DeprecationWarning, you can wrap it like this:
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    fxn()  # this function generates DeprecationWarnings

The DeprecationWarnings don't prevent your code from running properly; they are just warnings that the functions you are using are deprecated and that you should update your code to the newer syntax.
In this particular case, it stems from inconsistencies between NumPy and SciPy on the default arguments for the var, std... functions/methods. In order to clean things up, it was decided to drop the functions from scipy.stats and use their NumPy counterparts instead.
Of course, just dropping the functions would upset some users whose code would suddenly fail to work. So, the SciPy devs decided to include a DeprecationWarning for a couple of releases, which should leave enough time for everybody to update their code.
In your case, you should check the docstring of scipy.stats.std on your system to see what defaults it uses, and follow the warning's instructions on how to modify your code accordingly.

Related

_base.py:251: FutureWarning: Support for multi-dimensional indexing - WHAT?

I have a project with a couple thousand lines of code.
I'm getting this message when it runs:
_base.py:251: FutureWarning: Support for multi-dimensional indexing (e.g. obj[:, None]) is deprecated and will be removed in a future version. Convert to a numpy array before indexing instead.
  y = y[:, np.newaxis]
The error message doesn't give me any line number to go look at and I have no idea what to look for to try to debug this.
Any suggestions would be appreciated.
One approach is to run Python with -Werror, i.e.
python3 -Werror myproj.py
This will cause Python to exit with a full traceback when the warning is triggered.
The same effect can be achieved by setting the PYTHONWARNINGS environment variable to error.
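If you would rather not change how the program is launched, a similar effect can be had from inside the code with the warnings module. A minimal sketch (the FutureWarning category here is an assumption; adjust it to the warning you are chasing):

import warnings

# Turn matching warnings into exceptions so they come with a full traceback
warnings.filterwarnings("error", category=FutureWarning)

# ... run the rest of your program; the first offending line now raises ...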
Just for reference: I also encountered this warning when using Python's matplotlib.pyplot library; after converting the input data passed to plot() to a NumPy array, the warning no longer appeared.

scipy overwriting warning filters?

It seems as though some scipy modules are messing with my warning filters. Consider the following code. My understanding is that it should only throw one warning because of the "once" filter I supplied to my custom Warning class. However, the warning after the scipy import gets shown as well.
This is with python 3.7 and scipy 1.6.3.
import warnings
class W(DeprecationWarning): pass
warnings.simplefilter("once", W)
warnings.warn('warning!', W)
warnings.warn('warning!', W)
from scipy import interpolate
warnings.warn('warning!', W)
This only seems to happen when I import certain scipy modules. A generic "import scipy" doesn't do this.
I've narrowed it down to the filters set in scipy.special.sf_error.py and scipy.sparse.__init__.py. I don't see how that code would cause the problem, but it does. When I comment those filters out, my code works as expected.
Am I misunderstanding something? Is there a work-around that doesn't involve overriding warnings.filterwarnings/warnings.simplefilter?
This is an open Python bug: https://bugs.python.org/issue29672.
Note, in particular, the last part of the comment by Tom Aldcroft:
Even a documentation update would be useful. This could explain not only catch_warnings(), but in general the unexpected feature that if any package anywhere in the stack sets a warning filter, then that globally resets whether a warning has been seen before (via the call to _filters_mutated()).
The code in scipy/special/sf_error.py sets a warning filter, and that causes a global reset of which warnings have been seen before. (If you add another call of warnings.warn('warning!', W) to the end of your sample code, you should see that it does not raise a warning.)
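A minimal sketch reproducing the effect described above (no scipy needed; any code that mutates the filter list has the same consequence):

import warnings

class W(DeprecationWarning):
    pass

warnings.simplefilter("once", W)
warnings.warn("warning!", W)   # shown
warnings.warn("warning!", W)   # suppressed: already seen under the "once" filter

# Mutating the filter list (as scipy.special and scipy.sparse do at import time)
# resets the interpreter's record of which warnings have already been seen
warnings.filterwarnings("ignore", message="something unrelated")
warnings.warn("warning!", W)   # shown again, despite the "once" filter
warnings.warn("warning!", W)   # suppressed again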

Joblib UserWarning while trying to cache results

I get the following UserWarning when trying to cache results using joblib:
import numpy
from tempfile import mkdtemp
cachedir = mkdtemp()
from joblib import Memory
memory = Memory(cachedir=cachedir, verbose=0)

@memory.cache
def get_nc_var3d(path_nc, var, year):
    """
    Get value from netcdf for variable var for year
    :param path_nc:
    :param var:
    :param year:
    :return:
    """
    try:
        hndl_nc = open_or_die(path_nc)
        val = hndl_nc.variables[var][int(year), :, :]
    except:
        val = numpy.nan
        logger.info('Error in getting var ' + var + ' for year ' + str(year) + ' from netcdf ')
    hndl_nc.close()
    return val
I get the following warning when calling this function using parameters:
UserWarning: Persisting input arguments took 0.58s to run.
If this happens often in your code, it can cause performance problems
(results will be correct in all cases).
The reason for this is probably some large input arguments for a wrapped function (e.g. large strings).
THIS IS A JOBLIB ISSUE. If you can, kindly provide the joblib's team with an example so that they can fix the problem.
Input parameters: C:/Users/rit/Documents/PhD/Projects/\GLA/Input/LUWH/\LUWLAN_v1.0h\transit_model.nc range_to_large 1150
How do I get rid of the warning? And why is it happening, since the input parameters are not too long?
I don't have an answer to the "why doesn't this work?" portion of the question. However to simply ignore the warning you can use warnings.catch_warnings with warnings.simplefilter as seen here.
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    your_code()
Obviously, I don't recommend ignoring the warning unless you're sure it's harmless, but if you're going to do it, this way will only suppress warnings inside the context manager, and it is straight out of the Python docs.
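If other warnings might fire inside the same block and you only want to silence this particular kind, the filter can be made category-specific; a small variation on the snippet above:

import warnings

with warnings.catch_warnings():
    # Only ignore UserWarning; other warning categories still show
    warnings.simplefilter("ignore", category=UserWarning)
    your_code()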
UserWarning: Persisting input arguments took 0.58s to run.
If this happens often in your code, it can cause performance problems
(results will be correct in all cases).
The reason for this is probably some large input arguments for a wrapped function (e.g. large strings).
THIS IS A JOBLIB ISSUE. If you can, kindly provide the joblib's team with an example so that they can fix the problem.
The warning itself is fairly self-explanatory, in my humble opinion. It may be an issue with how the function is used in your code: you can try to decrease the input size, or you can share a report with the joblib team so that they can either improve joblib or suggest a better way to use it that avoids this type of performance warning.
The warning occurs because joblib tries to persist (hash and serialize) the input arguments to disk for caching purposes, and that step is taking a noticeable amount of time. The usual cause is a large input argument, such as a big array or long string, which is slow to serialize.
To reduce the cost, you can preprocess the input arguments to shrink them before calling the cached function, or, if a large argument does not influence the result, exclude it from hashing with the decorator's ignore parameter (see the sketch below).
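As a sketch of the second idea (hypothetical names: big_config stands in for a large argument that is slow to hash; note that ignoring an argument removes it from the cache key, so only do this when its value cannot change the result):

from tempfile import mkdtemp
from joblib import Memory

memory = Memory(mkdtemp(), verbose=0)

@memory.cache(ignore=['big_config'])  # exclude the large argument from hashing
def get_nc_var3d(path_nc, var, year, big_config=None):
    # ... read the variable from the netCDF file as before ...
    return None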

How to reset warnings completely

How can I see a warning again without restarting Python? Right now I see each warning only once.
Consider this code for example:
import pandas as pd
pd.Series([1]) / 0
I get
RuntimeWarning: divide by zero encountered in true_divide
But when I run it again it executes silently.
How can I see the warning again without restarting python?
I have tried to do
del __warningregistry__
but that doesn't help.
Seems like only some types of warnings are stored there.
For example if I do:
def f():
    X = pd.DataFrame(dict(a=[1,2,3], b=[4,5,6]))
    Y = X.iloc[:2]
    Y['c'] = 8
then this will raise a warning only the first time f() is called.
However, now if I do del __warningregistry__ I can see the warning again.
What is the difference between the first and second warnings? Why is only the second one stored in __warningregistry__? Where is the first one stored?
How can I see the warning again without restarting python?
As long as you do the following at the beginning of your script, you will not need to restart.
import pandas as pd
import numpy as np
import warnings
np.seterr(all='warn')
warnings.simplefilter("always")
At this point every time you attempt to divide by zero, it will display
RuntimeWarning: divide by zero encountered in true_divide
Explanation:
We are setting up a couple of warning-related settings. The first (np.seterr) tells NumPy how to handle floating-point errors; I have set it to warn on all of them, but if you are only interested in seeing the divide-by-zero warnings, change the parameter from all to divide.
Next we tell the warnings module to always display warnings. We do this by setting up a warning filter.
What is the difference between first and second warning? Why only the second one is stored in this __warningregistry__? Where is the first one stored?
This is described in the bug report reporting this issue:
If you didn't raise the warning before using the simple filter, this
would have worked. The undesired behavior is because of
__warningsregistry__. It is set the first time the warning is emitted.
When the second warning comes through, the filter isn't even looked at.
I think the best way to fix this is to invalidate __warningsregistry__
when a filter is used. It would probably be best to store warnings data
in a global then instead of on the module, so it is easy to invalidate.
Incidentally, the bug has been closed as fixed for versions 3.4 and 3.5.
warnings is a pretty awesome standard library module. You're going to enjoy getting to know it :)
A little background
The default behavior of warnings is to only show a particular warning, coming from a particular line, on its first occurrence. For instance, the following code will result in two warnings shown to the user:
import numpy as np

# 10 warnings, but only the first copy will be shown
for i in range(10):
    np.true_divide(1, 0)

# This is on a separate line from the other "copies", so its warning will show
np.true_divide(1, 0)
You have a few options to change this behavior.
Option 1: Reset the warnings registry
When you want Python to "forget" what warnings you've seen before, you can use resetwarnings:
# warns every time, because the warnings registry has been reset
for i in range(10):
    warnings.resetwarnings()
    np.true_divide(1, 0)
Note that this also resets any warning configuration changes you've made. Which brings me to...
Option 2: Change the warnings configuration
The warnings module documentation covers this in greater detail, but one straightforward option is just to use a simplefilter to change that default behavior.
import warnings
import numpy as np

# Show all warnings
warnings.simplefilter('always')

for i in range(10):
    # Now this will warn every loop
    np.true_divide(1, 0)
Since this is a global configuration change, it has global effects which you'll likely want to avoid (all warnings anywhere in your application will show every time). A less drastic option is to use the context manager:
with warnings.catch_warnings():
    warnings.simplefilter('always')
    for i in range(10):
        # This will warn every loop
        np.true_divide(1, 0)

# Back to normal behavior: only warn once
for i in range(10):
    np.true_divide(1, 0)
There are also more granular options for changing the configuration on specific types of warnings. For that, check out the docs.
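For instance, a small sketch (the message argument is a regular expression matched against the start of the warning text):

import warnings
import numpy as np

# Always show divide-by-zero RuntimeWarnings; leave every other warning at its default
warnings.filterwarnings('always', message='divide by zero', category=RuntimeWarning)

for i in range(10):
    np.true_divide(1, 0)   # now warns on every iteration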

Differences between IPython %run and prompt

I have the following simple erroneous code:
from numpy import random, sqrt
points = random.randn(20,3);
points = points / sqrt(sum(points**2,1))
In ipython (with %autoreload 2) if I copy and paste it into the terminal I get a ValueError as one would expect. If I save this as a file and use %run then it runs without error (it shouldn't).
What's going on here?
I just figured it out, but I had written the question and it might be useful to someone else.
It is a difference between the numpy sum and the native sum. Changing the first line to
from numpy import random, sqrt, sum
fixes it, as %run uses the native version by default (at least with my settings). The native sum does not take an axis parameter, but it does not throw an error either, because the second argument is a start parameter, which is in effect just an offset added to the sum. So,
>>> sum([1,2,3],10000)
10006
for the native version, and an "axis out of bounds" error for the NumPy one.
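A minimal side-by-side illustration of the difference:

import numpy as np

a = [1, 2, 3]

# Built-in sum: the second positional argument is a start value (an offset)
print(sum(a, 10000))   # -> 10006, no error

# NumPy sum: the second positional argument is the axis
np.sum(a, 1)           # -> raises an "axis out of bounds" style error for 1-D input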
