Configuring python warnings control to show first occurrence only - python

I am trying to use the python warnings module to show only the first occurrence of a given warning. According to Warning control documentation, this looks to be as simple as setting the action flag to 'once' but that doesn't work for me.
Example code.
# Tried with python 3.9.13 for all the pandas versions below,
# and python 3.8.13 and 3.10.6 with pandas 1.5.0.
# Recent pandas versions 1.4.4 and 1.5.0 result in a deprecated warning
# for DataFrame.append, pandas version 1.3.4 does not.
import pandas as pd
import warnings
# Set action parameter. Various options include 'default', 'once' and 'ignore'
warnings.filterwarnings(action='default')
data = [1]
appendDF = pd.DataFrame(data)
for i in range(5):
print('Trial:', i)
appendDF = appendDF.append(data)
For action='default' and action='once', I get several copies of the same deprecated warning. For action='ignore', I get none (as desired).
I can see a potential workaround in How to print only first occurrence of python warning?, but that doesn't go into the 'once' option for the action parameter and also involves additional bespoke coding.
Any suggestions as to why the 'once' option above does not seem to work?

Related

xarray firing a Future Warning

Just upgraded to python3.8, I have updated xarray to v.0.16, but now I am always getting this warning:
/usr/local/lib/python3.8/dist-packages/xarray/core/common.py:1123: FutureWarning: 'base' in .resample() and in Grouper() is deprecated.
The new arguments that you should use are 'offset' or 'origin'.
>>> df.resample(freq="3s", base=2)
becomes:
>>> df.resample(freq="3s", offset="2s")
grouper = pd.Grouper(
The only point in my script in which I am using .resample is this:
mydata = xr.open_dataset(ncfile).resample(time='3H').reduce(np.mean)
but I don't know how to change it to avoid the warning.
Until this is updated in xarray, warnings can be ignored with a call to warnings.filterwarnings.
You're welcome to open an issue on GitHub for xarray (or even a PR!)

Specifying the message in warnings.simple_filter() filters based on category not message

When we import our_library in Python2, we have coded it to raise a DeprecationWarning once. Here's the representative code.
our_library/init.py
def _py2_deprecation_warning():
py2_warning = ('Python2 support is deprecated and will be removed in '
'a future release. Consider switching to Python3.')
warnings.filterwarnings('once', message=py2_warning)
warnings.warn(message=py2_warning,
category=DeprecationWarning,
stacklevel=3,
)
def _python_deprecation_warnings():
if sys.version_info.major == 2:
_py2_deprecation_warning()
_python_deprecation_warnings()
We deprecated the parameters in a function in our_library. Here's the representative code:
our_library/some_module.py
def some_function(new_param, deprecated_param):
if deprecated_param:
param_deprecation_msg = (
'The parameter "{}" will be removed in a future version of Nilearn.'
'Please use the parameter "{}" instead.'.format(deprecated_param,
new_param,
)
)
warnings.warn(category=DeprecationWarning,
message=param_deprecation_msg,
stacklevel=3)
Then when we import our library, and call that function, like this:
calling_script.py
from our_library.some_module import some_function
some_function(deprecated_param)
We get the Python2 DeprecationWarning but not the Parameter DeprecationWarning.
DeprecationWarning: Python2 support is deprecated and will be removed in a future release. Consider switching to Python3.
_python_deprecation_warnings()
Now know I can solve this by using a with warnings.catch_warnings(): or resetwarnings(). However I thought that specifying the message explicitly in Python2 Warning will prevent the 'once filter being set for other DeprecationWarnings.
However that is not the case? WhHy is that and how do I make my existing code work without using CatchWarnings or reset warnings?
If I change the Parameter warning to FutureWarning, that I can see.
Why is the first simplefilter blocking all deprecation messages based on category instead of messages?
UPDATE:
with warnings.catch_warnings(): doesn't seem to work either.
def _py2_deprecation_warning():
py2_warning = ('Python2 support is deprecated and will be removed in '
'a future release. Consider switching to Python3.')
with warnings.catch_warnings():
warnings.filterwarnings('once', message=py2_warning)
warnings.warn(message=py2_warning,
category=DeprecationWarning,
stacklevel=3,
)
Nevermind, I had forgotten that DeprecationWarnings are not displayed by defualt. They must specifically be set to be displayed, which I have not done here.

How to fix this error: "SQLContext object has no no attribute 'jsonFile'

I am learning Spark now. When I tried to load a json file, as follows:
people=sqlContext.jsonFile("C:\wdchentxt\CustomerData.json")
I got the following error:
AttributeError: 'SQLContext' object has no attribute 'jsonFile'
I am running this on Windows 7 PC, with spark-2.1.0-bin-hadoop2.7, and Python 2.7.13 (Dec 17, 2016).
Thank you for any suggestions that you may have.
You probably forgot to import the implicits. This is what my solution looks like in Scala:
def loadJson(filename: String, sqlContext: SqlContext): Dataset[Row] = {
import sqlContext._
import sqlContext.implicits._
val df = sqlContext.read.json(filename)
df
}
First, the more recent versions of Spark (like the one you are using) involve .read.json(..) instead of the deprecated .readJson(..).
Second, you need to be sure that your SqlContext is setup right, as mentioned here: pyspark : NameError: name 'spark' is not defined. In my case, it's setup like this:
from pyspark.sql import SQLContext, Row
sqlContext = SQLContext(sc)
myObjects = sqlContext.read.json('file:///home/cloudera/Downloads/json_files/firehose-1-2018-08-24-17-27-47-7066324b')
Note that they have version-specific quick-start tutorials that can help with getting some of the basic operations right, as mentioned here: name spark is not defined
So, my point is to always check to ensure that with whatever library or language you are using (and this applies in general across all technologies) that you are following the documentation that matches the version you are running because it is very common for breaking changes to create a lot of confusion if there is a version mismatch. In cases where the technology you are trying to use is not well documented in the version you are running, that's when you need to evaluate if you should upgrade to a more recent version or create a support ticket with those who maintain the project so that you can help them to better support their users.
You can find a guide on all of the version-specific changes of Spark here: https://spark.apache.org/docs/latest/sql-programming-guide.html#upgrading-from-spark-sql-16-to-20
You can also find version-specific documentation on Spark and PySpark here (e.g. for version 1.6.1): https://spark.apache.org/docs/1.6.1/sql-programming-guide.html
As mentioned before, .jsonFile (...) has been deprecated1, use this instead:
people = sqlContext.read.json("C:\wdchentxt\CustomerData.json").rdd
Source:
[1]: https://docs.databricks.com/spark/latest/data-sources/read-json.html

DataFrame object has no attribute 'sample'

Simple code like this won't work anymore on my python shell:
import pandas as pd
df=pd.read_csv("K:/01. Personal/04. Models/10. Location/output.csv",index_col=None)
df.sample(3000)
The error I get is:
AttributeError: 'DataFrame' object has no attribute 'sample'
DataFrames definitely have a sample function, and this used to work.
I recently had some trouble installing and then uninstalling another distribution of python. I don't know if this could be related.
I've previously had a similar problem when trying to execute a script which had the same name as a module I was importing, this is not the case here, and pandas.read_csv is actually working.
What could cause this?
As given in the documentation of DataFrame.sample -
DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)
Returns a random sample of items from an axis of object.
New in version 0.16.1.
(Emphasis mine).
DataFrame.sample is added in 0.16.1 , you can either -
Upgrade your pandas version to latest, you can use pip for that, Example -
pip install pandas --upgrade
Or if you don't want to upgrade, and want to sample few rows from the dataframe, you can also use random.sample(), Example -
import random
num = 100 #number of samples
sampleddata = df.loc[random.sample(list(df.index),num)]

How to reset warnings completely

How can I see a warning again without restarting python. Now I see them only once.
Consider this code for example:
import pandas as pd
pd.Series([1]) / 0
I get
RuntimeWarning: divide by zero encountered in true_divide
But when I run it again it executes silently.
How can I see the warning again without restarting python?
I have tried to do
del __warningregistry__
but that doesn't help.
Seems like only some types of warnings are stored there.
For example if I do:
def f():
X = pd.DataFrame(dict(a=[1,2,3],b=[4,5,6]))
Y = X.iloc[:2]
Y['c'] = 8
then this will raise warning only first time when f() is called.
However, now when if do del __warningregistry__ I can see the warning again.
What is the difference between first and second warning? Why only the second one is stored in this __warningregistry__? Where is the first one stored?
How can I see the warning again without restarting python?
As long as you do the following at the beginning of your script, you will not need to restart.
import pandas as pd
import numpy as np
import warnings
np.seterr(all='warn')
warnings.simplefilter("always")
At this point every time you attempt to divide by zero, it will display
RuntimeWarning: divide by zero encountered in true_divide
Explanation:
We are setting up a couple warning filters. The first (np.seterr) is telling NumPy how it should handle warnings. I have set it to show warnings on all, but if you are only interested in seeing the Divide by zero warnings, change the parameter from all to divide.
Next we change how we want the warnings module to always display warnings. We do this by setting up a warning filter.
What is the difference between first and second warning? Why only the second one is stored in this __warningregistry__? Where is the first one stored?
This is described in the bug report reporting this issue:
If you didn't raise the warning before using the simple filter, this
would have worked. The undesired behavior is because of
__warningsregistry__. It is set the first time the warning is emitted.
When the second warning comes through, the filter isn't even looked at.
I think the best way to fix this is to invalidate __warningsregistry__
when a filter is used. It would probably be best to store warnings data
in a global then instead of on the module, so it is easy to invalidate.
Incidentally, the bug has been closed as fixed for versions 3.4 and 3.5.
warnings is a pretty awesome standard library module. You're going to enjoy getting to know it :)
A little background
The default behavior of warnings is to only show a particular warning, coming from a particular line, on its first occurrence. For instance, the following code will result in two warnings shown to the user:
import numpy as np
# 10 warnings, but only the first copy will be shown
for i in range(10):
np.true_divide(1, 0)
# This is on a separate line from the other "copies", so its warning will show
np.true_divide(1, 0)
You have a few options to change this behavior.
Option 1: Reset the warnings registry
when you want python to "forget" what warnings you've seen before, you can use resetwarnings:
# warns every time, because the warnings registry has been reset
for i in range(10):
warnings.resetwarnings()
np.true_divide(1, 0)
Note that this also resets any warning configuration changes you've made. Which brings me to...
Option 2: Change the warnings configuration
The warnings module documentation covers this in greater detail, but one straightforward option is just to use a simplefilter to change that default behavior.
import warnings
import numpy as np
# Show all warnings
warnings.simplefilter('always')
for i in range(10):
# Now this will warn every loop
np.true_divide(1, 0)
Since this is a global configuration change, it has global effects which you'll likely want to avoid (all warnings anywhere in your application will show every time). A less drastic option is to use the context manager:
with warnings.catch_warnings():
warnings.simplefilter('always')
for i in range(10):
# This will warn every loop
np.true_divide(1, 0)
# Back to normal behavior: only warn once
for i in range(10):
np.true_divide(1, 0)
There are also more granular options for changing the configuration on specific types of warnings. For that, check out the docs.

Categories

Resources