I want to use Hypothesis' awesome features to create some sample data for my application. I use it roughly like this:
from hypothesis import strategies as st
ints = st.integers() #simplified example
ints.example()
I get this warning:
NonInteractiveExampleWarning: The .example() method is good for exploring strategies, but should only be used interactively
Is there a simple way to disable this warning? Just to be clear: I want to use the example data generation outside of a testing context and in a non-interactive setting, and I'm aware of what the warning is referring to. I just want to get rid of it.
The warnings module lets you selectively ignore specific warnings; for example:
from hypothesis.errors import NonInteractiveExampleWarning
from hypothesis import strategies as st
import warnings
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=NonInteractiveExampleWarning)
    ints = st.integers()
    print(ints.example())
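If you want to silence the warning for the whole process rather than inside a single block, the same filter can be installed globally (a minimal sketch using only the names from above):
import warnings
from hypothesis.errors import NonInteractiveExampleWarning
from hypothesis import strategies as st

# Install the filter once; every later .example() call stays quiet
warnings.filterwarnings("ignore", category=NonInteractiveExampleWarning)

ints = st.integers()
print(ints.example())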
It seems as though some scipy modules are messing with my warning filters. Consider the following code. My understanding is that it should only throw one warning because of the "once" filter I supplied to my custom Warning class. However, the warning after the scipy import gets shown as well.
This is with Python 3.7 and SciPy 1.6.3.
import warnings
class W(DeprecationWarning): pass
warnings.simplefilter("once", W)
warnings.warn('warning!', W)
warnings.warn('warning!', W)
from scipy import interpolate
warnings.warn('warning!', W)
This only seems to happen when I import certain scipy modules. A generic "import scipy" doesn't do this.
I've narrowed it down to the filters set in scipy/special/sf_error.py and scipy/sparse/__init__.py. I don't see how that code would cause the problem, but it does. When I comment those filters out, my code works as expected.
Am I misunderstanding something? Is there a workaround that doesn't involve overwriting warnings.filterwarnings/warnings.simplefilter?
This is an open Python bug: https://bugs.python.org/issue29672.
Note, in particular, the last part of the comment by Tom Aldcroft:
Even a documentation update would be useful. This could explain not only catch_warnings(), but in general the unexpected feature that if any package anywhere in the stack sets a warning filter, then that globally resets whether a warning has been seen before (via the call to _filters_mutated()).
The code in scipy/special/sf_error.py sets a warning filter, and that causes a global reset of which warnings have been seen before. (If you add another call of warnings.warn('warning!', W) to the end of your sample code, you should see that it does not raise a warning.)
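To see the reset in action, extend the sample with one more warning at the end (this just restates the behavior described above in code):
import warnings

class W(DeprecationWarning): pass

warnings.simplefilter("once", W)
warnings.warn('warning!', W)  # shown: first occurrence
warnings.warn('warning!', W)  # suppressed by the "once" filter

from scipy import interpolate  # sets a warning filter, resetting the "seen" state
warnings.warn('warning!', W)  # shown again because of the global reset
warnings.warn('warning!', W)  # suppressed: "once" applies until the next filter mutation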
I am struggling with properly organising Python code. My struggle stems from namespaces in Python. More specifically, as I understand it, every module has its own namespace, and unless stated explicitly, elements from one namespace will not be available in another.
To give an example of what I mean, consider the file subfile1.py:
def foo():
    print(a)
Now if I want the foo() function to print a from the namespace of another module/script, I have to set it explicitly on the subfile1 module, e.g. like in main.py:
import subfile1
a = 4
subfile1.a = 4
subfile1.foo()
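For comparison, a version of foo that takes the value as an argument would avoid setting module attributes altogether (a minimal sketch of the same example):
# subfile1.py
def foo(a):
    print(a)

# main.py
import subfile1
subfile1.foo(4)  # the value is passed explicitly instead of injected into the namespace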
When I code (in R), it's generally to provide data analytics reports. So I have to import data, clean it and perform analyses on it. My coding projects (to be re-run every couple of months) are then organised as follows: I have a main script from which I run other sub-scripts. I find this very handy. I have separate sub-scripts for:
Setting paths
Loading data
Data cleaning
Specific analyses
Simply looking at the names of the sub-scripts listed in the main script then gives me a good overview of what is being done and where I need to add additional sub-scripts if needed.
With Python I don’t find this an easy way of working because of the namespaces. If I would like to keep this way of working in Python, it seems to me that I would have to import all sub-scripts as modules and set the necessary variables for each module explicitly (either in the module or in the main script as in the example above).
So I thought that maybe my way of working is simply not optimal, at least in a Python setting. Could anyone tell me how best to organise the project I have in mind? Or, put differently, what is a good way of organising code if the same data/variables have to be used at various places in the code?
Update:
After reading the comments, I made another example. Now with classes. Any comments on whether this is an appropriate way of structuring code or not would be very welcome. My main aim is to split the code in bits and pieces so a third party can quickly gauge what is going on and make changes to the code where necessary.
I have a master_file.py in which I run the code:
from sub_file_1 import DoSomething
from sub_file_2 import DoSomethingElse
from sub_file_3 import CombineTheStuff
x1 = DoSomething.c  # could e.g. import from a data source and transform the data
x2 = DoSomethingElse.d  # could e.g. import from another data source and transform the data, taking into account its particularities
y = CombineTheStuff(x1, x2)  # e.g. with the data imported and cleaned (x1 and x2) I can now estimate a particular model
print(x1)
print(x2)
print(y.fin_result())
sub_file_1.py contains (in reality these sub_files are very lengthy)
from param_file import GeneralPar

class DoSomething():
    c = GeneralPar.a + GeneralPar.b
sub_file_2.py contains
from param_file import GeneralPar

class DoSomethingElse():
    d = GeneralPar.a * 4
sub_file_3.py contains
class CombineTheStuff():
    def __init__(self, input1, input2):
        self.input1 = input1
        self.input2 = input2

    def fin_result(self):
        k = self.input1 * self.input2 - 0.01
        return k
In the REDHAWK IDE (v2.12), I am trying to use the fcalc component for some math calculations. I tried to follow an example in the docs by putting math.sin(a+b)+random.random() in the equation field, but I got the following error:
CF.PropertySetPackage.InvalidConfiguration: Failure: . Properties: equation
IDL:CF/PropertySet/InvalidConfiguration:1.0
I also tried other math functions, such as sqrt, but none of them worked. It is also very hard to add any modules in the import field.
Did I do anything wrong while using this fcalc component?
It appears that the property change listener is not being triggered for the initial property configuration when launched in the IDE sandbox. There are several workarounds:
Manually configure the import property after launching the component, which will trigger the property change listener. Adding time to the list of imports, for example, will then import math and random as well.
Use the Python sandbox instead of the IDE sandbox
>>> from ossie.utils import sb
>>> fcalc = sb.launch('rh.fcalc')
2019-01-04 11:55:44 WARNING rh_fcalc:176 - NOT overriding global namespace with random from random
>>> fcalc.equation = 'sin(a+b)+random.random()'
The warning is expected and just indicates that you can't use random() in the equation without the fully qualified name random.random(), because the bare name would conflict with the random library.
Launch rh.fcalc in a waveform in a domain
Consider the following PySpark code, where sc is an existing SparkContext:
import rpy2.robjects as robjects
dffunc = sc.parallelize([(0,robjects.r.rnorm),(1,robjects.r.runif)])
dffunc.collect()
Outputs
[(0, <rpy2.rinterface.SexpClosure - Python:0x7f2ecfc28618 / R:0x26abd18>), (1, <rpy2.rinterface.SexpClosure - Python:0x7f2ecfc283d8 / R:0x26aad28>)]
While the partitioned version results in an error:
dffuncpart = dffunc.partitionBy(2)
dffuncpart.collect()
RuntimeError: ('R cannot evaluate code before being initialized.', <built-in function unserialize>
It seems like this error means that R wasn't loaded on one of the partitions, which I assume implies that the first import step was not performed. Is there any way around this?
EDIT 1: This second example makes me think there's a bug in the timing of PySpark or rpy2.
dffunc = sc.parallelize([(0, robjects.r.rnorm), (1, robjects.r.runif)]).partitionBy(2)

def loadmodel(model):
    import rpy2.robjects as robjects
    return model[1](2)

dffunc.map(loadmodel).collect()
This produces the same error: R cannot evaluate code before being initialized. But the following:
import pickle

dffuncpickle = sc.parallelize([(0, pickle.dumps(robjects.r.rnorm)), (1, pickle.dumps(robjects.r.runif))]).partitionBy(2)

def loadmodelpickle(model):
    import rpy2.robjects as robjects
    import pickle
    return pickle.loads(model[1])(2)

dffuncpickle.map(loadmodelpickle).collect()
works just as expected.
I'd like to say that "this is not a bug in rpy2, this is a feature" but I'll realistically have to settle with "this is a limitation".
What is happening is that rpy2 has two interface levels. One is a low-level interface (closer to R's C API), available through rpy2.rinterface; the other is a high-level interface with more bells and whistles, more "pythonic", with classes for R objects inheriting from rinterface-level ones (that last part is important for the part about pickling below). Importing the high-level interface initializes (starts) the embedded R with default parameters if necessary. Importing the low-level interface rinterface does not have this side effect, and the initialization of the embedded R must be performed explicitly (with the function initr). rpy2 was designed this way because the initialization of the embedded R can take parameters: importing rpy2.rinterface first, setting the initialization options, then importing rpy2.robjects makes this possible.
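In code, that sequence looks like this (a minimal sketch; set_initoptions is the rpy2 2.x API for customizing startup and is an assumption here, so check the documentation for your version):
import rpy2.rinterface

# Optionally customize how the embedded R is started (rpy2 2.x API, an assumption)
rpy2.rinterface.set_initoptions(('rpy2', '--no-save', '--quiet'))
# Explicitly start the embedded R
rpy2.rinterface.initr()

import rpy2.robjects as robjects  # the high-level interface reuses the running R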
In addition to that, the serialization (pickling) of R objects wrapped by rpy2 is currently only defined at the rinterface level (see the documentation). Pickling robjects-level (high-level) rpy2 objects uses the rinterface-level code, and when unpickling them they will remain at that lower level (the Python pickle contains the module the class of the object is defined in and will import that module; here rinterface, which does not imply the initialization of the embedded R). The reason for things being this way is simply that it was "good enough for now": at the time this was implemented I had to simultaneously think of a good way to bridge two somewhat different languages and learn my way through the Python C API and the pickling/unpickling of Python objects. Given the ease with which one can write something like
import rpy2.robjects
or
import rpy2.rinterface
rpy2.rinterface.initr()
before unpickling, this was never revisited. The uses of rpy2's pickling I know about go through Python's multiprocessing (and adding something similar to the import statements to the code initializing a child process was a cheap and sufficient fix). Maybe this is the time to look at this again. File a bug report for rpy2 if that's the case.
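For the multiprocessing case just mentioned, that cheap fix can look like the following (a hedged sketch; the worker functions are hypothetical, not rpy2 API):
import multiprocessing
import pickle

def init_r():
    # Importing the high-level interface in each child process initializes
    # the embedded R there before any rpy2 object is unpickled
    import rpy2.robjects

def call_r_function(payload):
    # payload is a pickled R function; R is already running thanks to init_r
    return list(pickle.loads(payload)(2))

if __name__ == '__main__':
    import rpy2.robjects as robjects
    pool = multiprocessing.Pool(processes=2, initializer=init_r)
    print(pool.map(call_r_function, [pickle.dumps(robjects.r.rnorm)]))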
edit: this is undoubtedly an issue with rpy2. Pickled robjects-level objects should unpickle back to robjects-level, not rinterface-level. I have opened an issue in the rpy2 tracker (and already pushed a rudimentary patch to the default/dev branch).
2nd edit: The patch is part of released rpy2 starting with version 2.7.7 (latest release at the time of writing is 2.7.8).
How can I see a warning again without restarting Python? Right now I see them only once.
Consider this code for example:
import pandas as pd
pd.Series([1]) / 0
I get
RuntimeWarning: divide by zero encountered in true_divide
But when I run it again it executes silently.
How can I see the warning again without restarting python?
I have tried to do
del __warningregistry__
but that doesn't help.
Seems like only some types of warnings are stored there.
For example if I do:
def f():
    X = pd.DataFrame(dict(a=[1,2,3], b=[4,5,6]))
    Y = X.iloc[:2]
    Y['c'] = 8
then this will raise a warning only the first time f() is called.
However, now if I do del __warningregistry__ I can see the warning again.
What is the difference between the first and second warning? Why is only the second one stored in this __warningregistry__? Where is the first one stored?
How can I see the warning again without restarting python?
As long as you do the following at the beginning of your script, you will not need to restart.
import pandas as pd
import numpy as np
import warnings
np.seterr(all='warn')
warnings.simplefilter("always")
At this point every time you attempt to divide by zero, it will display
RuntimeWarning: divide by zero encountered in true_divide
Explanation:
We are setting up a couple of warning filters. The first (np.seterr) tells NumPy how it should handle warnings. I have set it to show warnings on all, but if you are only interested in seeing the divide-by-zero warnings, change the parameter from all to divide.
Next we tell the warnings module to always display warnings. We do this by setting up the warning filter simplefilter("always").
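For instance, restricting NumPy to just the division warnings mentioned above looks like this (a minimal sketch):
import numpy as np

np.seterr(divide='warn')  # only divide-by-zero warns; other floating-point errors keep their defaults
np.true_divide(1, 0)      # RuntimeWarning: divide by zero encountered in true_divide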
What is the difference between first and second warning? Why only the second one is stored in this __warningregistry__? Where is the first one stored?
This is described in the bug report reporting this issue:
If you didn't raise the warning before using the simple filter, this would have worked. The undesired behavior is because of __warningsregistry__. It is set the first time the warning is emitted. When the second warning comes through, the filter isn't even looked at.
I think the best way to fix this is to invalidate __warningsregistry__ when a filter is used. It would probably be best to store warnings data in a global then instead of on the module, so it is easy to invalidate.
Incidentally, the bug has been closed as fixed for versions 3.4 and 3.5.
warnings is a pretty awesome standard library module. You're going to enjoy getting to know it :)
A little background
The default behavior of warnings is to only show a particular warning, coming from a particular line, on its first occurrence. For instance, the following code will result in two warnings shown to the user:
import numpy as np

# 10 warnings raised here, but only the first copy will be shown
for i in range(10):
    np.true_divide(1, 0)

# This is on a separate line from the other "copies", so its warning will show
np.true_divide(1, 0)
You have a few options to change this behavior.
Option 1: Reset the warnings registry
When you want Python to "forget" what warnings you've seen before, you can use resetwarnings:
import warnings
import numpy as np

# Warns every time, because the warnings registry is reset on each iteration
for i in range(10):
    warnings.resetwarnings()
    np.true_divide(1, 0)
Note that this also resets any warning configuration changes you've made. Which brings me to...
Option 2: Change the warnings configuration
The warnings module documentation covers this in greater detail, but one straightforward option is just to use a simplefilter to change that default behavior.
import warnings
import numpy as np

# Show all warnings
warnings.simplefilter('always')

for i in range(10):
    # Now this will warn every loop
    np.true_divide(1, 0)
Since this is a global configuration change, it has global effects which you'll likely want to avoid (all warnings anywhere in your application will show every time). A less drastic option is to use the context manager:
with warnings.catch_warnings():
    warnings.simplefilter('always')
    for i in range(10):
        # This will warn every loop
        np.true_divide(1, 0)

# Back to normal behavior: only warn once
for i in range(10):
    np.true_divide(1, 0)
There are also more granular options for changing the configuration on specific types of warnings. For that, check out the docs.
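For example, a filter that repeats only one warning class while leaving everything else at the default behavior could look like this (a minimal sketch):
import warnings

# Only RuntimeWarnings are shown every time; other warnings keep the
# default "once per location" behavior
warnings.filterwarnings('always', category=RuntimeWarning)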