Function argrelextrema from scipy.signal does not detect flat extrema.
Example:
import numpy as np
from scipy.signal import argrelextrema
data = np.array([ 0, 1, 2, 1, 0, 1, 3, 3, 1, 0 ])
argrelextrema(data, np.greater)
(array([2]),)
The first maximum (2) is detected, but the second, flat maximum (3, 3) is not.
Any workaround for this behaviour?
Thanks.
Short answer: Probably argrelextrema will not be flexible enough for your task. Consider writing your own function matching your needs.
Longer answer: Are you bound to use argrelextrema? If yes, then you can play around with the comparator and the order arguments of argrelextrema (see the reference).
For your easy example, it would be enough to choose np.greater_equal as the comparator.
>>> data = np.array([ 0, 1, 2, 1, 0, 1, 3, 3, 1, 0 ])
>>> print(argrelextrema(data, np.greater_equal,order=1))
(array([2, 6, 7]),)
Note however that in this way
>>> data = np.array([ 0, 1, 2, 1, 0, 1, 3, 3, 4, 1, 0 ])
>>> print(argrelextrema(data, np.greater_equal,order=1))
(array([2, 6, 8]),)
behaves differently than you would probably like: it finds both the first 3 and the 4 as maxima, since argrelextrema now treats every value that is greater than or equal to its two nearest neighbors as a maximum. You can use the order argument to decide how many neighbors on each side this comparison must hold for; choosing order=2 changes the example above so that, of these two, only the 4 is found as a maximum.
>>> print(argrelextrema(data, np.greater_equal,order=2))
(array([2, 8]),)
There is, however, a downside to this - let's change the data once more:
>>> data = np.array([ 0, 1, 2, 1, 0, 1, 3, 3, 4, 1, 5 ])
>>> print(argrelextrema(data, np.greater_equal,order=2))
(array([ 2, 10]),)
Adding another peak as a last value keeps you from finding your peak at 4, as argrelextrema is now seeing a second-neighbor that is greater than 4 (which can be useful for noisy data, but not necessarily the behavior expected in all cases).
Using argrelextrema, you will always be limited to binary operations between a fixed number of neighbors. Note, however, that all argrelextrema does in your original example is return every n for which data[n] > data[n-1] and data[n] > data[n+1]. You could easily implement this yourself and then refine the rules, for example by checking the second neighbor when the first neighbor has the same value.
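As a rough illustration of the "implement it yourself" idea, here is a minimal sketch that treats a run of equal values as a single peak. The function name plateau_maxima and the choice to return (start, end) index pairs are my own assumptions, not part of SciPy; values at the array edges are never reported.

```python
import numpy as np

def plateau_maxima(data):
    """Return (start, end) index pairs of local maxima; flat runs count as one peak."""
    data = np.asarray(data)
    n = len(data)
    peaks = []
    i = 1
    while i < n - 1:
        if data[i] > data[i - 1]:
            # Walk to the end of a possible plateau of equal values.
            j = i
            while j < n - 1 and data[j + 1] == data[i]:
                j += 1
            # It is a peak only if the data drops strictly after the plateau.
            if j < n - 1 and data[j + 1] < data[i]:
                peaks.append((i, j))
            i = j + 1
        else:
            i += 1
    return peaks

flat = plateau_maxima([0, 1, 2, 1, 0, 1, 3, 3, 1, 0])
print(flat)  # [(2, 2), (6, 7)]
```

Returning both ends of the plateau leaves it to the caller to pick the first, last, or middle sample of a flat peak.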
For the sake of completeness, there seems to be a more elaborate function in scipy.signal, find_peaks_cwt. However, I have no experience using it and therefore cannot give you more details about it.
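Another option worth checking, assuming a reasonably recent SciPy (the function was added in 1.1 and the plateau_size argument in 1.2, as far as I know), is scipy.signal.find_peaks. Its documentation states that for a flat peak the index of the middle sample is returned, rounded down. A small sketch on the data from the question:

```python
import numpy as np
from scipy.signal import find_peaks

data = np.array([0, 1, 2, 1, 0, 1, 3, 3, 1, 0])

# plateau_size=1 accepts every peak and also asks SciPy to report plateau edges.
peaks, properties = find_peaks(data, plateau_size=1)
print(peaks)  # [2 6]

# The edges of each (possibly flat) peak are available in the properties dict.
left_edges = properties["left_edges"]
right_edges = properties["right_edges"]
```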
I'm really surprised that no one figured out an answer to this. All you need to do is preprocess the array to remove duplicates that are located next to each other and you can run argrelextrema like so:
import numpy as np
from scipy.signal import argrelextrema
data = np.array([ 0, 1, 2, 1, 0, 1, 3, 3, 1, 0 ])
filter_table = [False] + list(np.equal(data[:-1], data[1:]))
data = np.array([x for idx, x in enumerate(data) if not filter_table[idx]])
argrelextrema(data, np.greater)
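One thing to keep in mind with this approach is that the indices returned refer to the shortened array, not to the original data. The following sketch, which is only one way to do it, uses a boolean mask instead of the list comprehension and maps the result back to positions in the original array; the variable names are my own.

```python
import numpy as np
from scipy.signal import argrelextrema

data = np.array([0, 1, 2, 1, 0, 1, 3, 3, 1, 0])

# True where a sample differs from its predecessor; the first sample is always kept.
keep = np.concatenate(([True], data[1:] != data[:-1]))
original_idx = np.flatnonzero(keep)  # position of each kept sample in `data`
compressed = data[keep]

peaks_compressed = argrelextrema(compressed, np.greater)[0]
# Map back: each entry is the first sample of a (possibly flat) peak in `data`.
peaks_original = original_idx[peaks_compressed]
print(peaks_original)  # [2 6]
```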
Is there a way to copy values from one numpy masked array to another where ONLY the unmasked values are copied and the target values are left unchanged for masked source values? It seems like this should be handled automatically, but so far I haven't found a good way to do it. Right now I'm using ma.choose with the target region of the destination and the mask, but it really seems like there should be a better way given that the entire purpose of the masked array is to not operate on masked values automatically.
import numpy as np
from numpy import ma
x = ma.array([1, 2, 3, 4], mask=[0, 1, 1, 0])
y = np.array([5, 6, 7, 8])
y[~x.mask] = x[~x.mask]
which gives for y:
array([1, 6, 7, 4])
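A slightly more defensive variant of the same idea, sketched under the assumption that the source may have been created without an explicit mask: in that case x.mask is the scalar nomask, so ma.getmaskarray is used to always obtain a full boolean array. The helper name copy_unmasked is hypothetical.

```python
import numpy as np
from numpy import ma

def copy_unmasked(src, dst):
    """Copy only the unmasked values of `src` into `dst` (modified in place)."""
    valid = ~ma.getmaskarray(src)        # full boolean array, even without a mask
    dst[valid] = ma.getdata(src)[valid]  # masked positions in dst are left untouched
    return dst

x = ma.array([1, 2, 3, 4], mask=[0, 1, 1, 0])
y = np.array([5, 6, 7, 8])
result = copy_unmasked(x, y)
print(result)  # [1 6 7 4]
```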
I have a question related to finding maxima or, more precisely, discontinuities in a numpy array.
My exemplary data looks for example like this
a = np.array([3,4,5,8,7,6,5,4,1])
In general, I am interested in every maximum/jump in the data. For array a, I want to detect the 8, since it is a maximum (growing numbers on the left side and decreasing numbers on the right), and the value of 4, since the data drops after this value. Until now, I have used scipy.signal.argrelextrema with np.greater to detect maxima, but I am not able to detect these jumps/discontinuities. For the data I am looking at, only a jump towards smaller values can occur, not the opposite. Is there an easy pythonic way to detect these jumps?
Let's try this:
threshold = 1
a = np.array([3, 4, 5, 8, 7, 6, 5, 4, 1])
discontinuities_idx = np.where(abs(np.diff(a))>threshold)[0] + 1
np.diff(a) gives the difference between consecutive elements of a:
array([ 1, 1, 3, -1, -1, -1, -1, -3])
Then np.where(abs(np.diff(a))>threshold)[0] finds where the discontinuities are, that is, where the absolute difference exceeds the user-specified threshold. Finally, you may add +1 to compensate for the n=1 difference index (see the np.diff kwargs): without it you get the index just before each discontinuity, with it the index just after, depending on which side of the discontinuity you need.
>>> discontinuities_idx
array([3, 8])
>>> a[discontinuities_idx]
array([8, 1])
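Since the question states that only jumps towards smaller values can occur and asks for the 8 and the 4, a possible variation is to look at the sign of the difference and keep the index before each drop. This is only a sketch; the names drops, maxima and events are mine.

```python
import numpy as np

a = np.array([3, 4, 5, 8, 7, 6, 5, 4, 1])
threshold = 1

d = np.diff(a)
# Index of the last sample before each downward jump larger than the threshold.
drops = np.flatnonzero(d < -threshold)
# Strict local maxima: a rise followed directly by a fall.
maxima = np.flatnonzero((d[:-1] > 0) & (d[1:] < 0)) + 1

events = np.union1d(maxima, drops)
print(events)     # [3 7]
print(a[events])  # [8 4]
```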
It sounds like mathematical analysis, where you need to define conditions like a'(x)>0 or a'(x)<0. So you can mask them:
a = np.array([3,4,5,8,7,8,6,5,4,9,2,9,9,7])
mask1 = np.diff(a) > 0
mask2 = np.diff(a) < 0
>>> np.flatnonzero(mask1[:-1] & mask2[1:]) + 1
array([3, 5, 9], dtype=int64)
It returns the indices of the items where a local maximum occurs.
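Note that the flat peak (9, 9) near the end of this array is not reported, because neither difference around it is both preceded by a rise and followed by a fall directly. One possible extension, offered as a sketch rather than a definitive recipe, is to let a zero slope inherit the most recent non-zero slope before looking for rise-to-fall transitions:

```python
import numpy as np

a = np.array([3, 4, 5, 8, 7, 8, 6, 5, 4, 9, 2, 9, 9, 7])

slope = np.sign(np.diff(a))
# Forward-fill: a zero slope inherits the most recent non-zero slope.
last_nonzero = np.where(slope != 0, np.arange(len(slope)), 0)
last_nonzero = np.maximum.accumulate(last_nonzero)
filled = slope[last_nonzero]

# A rise followed by a fall marks a maximum; for a flat peak this is its last sample.
maxima = np.flatnonzero((filled[:-1] > 0) & (filled[1:] < 0)) + 1
print(maxima)  # [ 3  5  9 12]
```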
You can try this:
import numpy as np
import math
a = np.array([3,4,5,8,7,6,5,4,1])
MaxJump = np.diff(a)
print(MaxJump)
print(len(MaxJump))
MaxJump1 = []
for i in range(len(MaxJump)):
    MaxJump1.append(math.fabs(MaxJump[i]))
print(MaxJump1)
MaxJump3 = np.max(MaxJump1)
print(MaxJump3)
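The same result can be obtained without the explicit loop. A minimal vectorized sketch, where max_jump and position are my own names:

```python
import numpy as np

a = np.array([3, 4, 5, 8, 7, 6, 5, 4, 1])

jumps = np.abs(np.diff(a))        # absolute difference between neighbours
max_jump = int(jumps.max())       # size of the largest jump
position = int(jumps.argmax())    # index of the sample before the first largest jump
print(max_jump, position)  # 3 2
```

argmax reports only the first occurrence, so the equally large drop from 4 to 1 at the end of this array is not returned.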
What is the difference between sklearn.metrics.jaccard_similarity_score and sklearn.metrics.accuracy_score?
1. When do we use accuracy_score?
2. When do we use jaccard?
3. I know the formula. Could someone explain the algorithm behind these metrics?
4. How can I calculate jaccard on my dataframes?
array([[1, 1, 1, 1, 2, 0, 1, 0],
[2, 1, 1, 0, 1, 1, 0, 1]], dtype=int64)
thanks
The accuracy_score is straightforward, which is one of the reasons why it is a common choice. It's the number of correctly classified samples divided by the total, so in your case:
from sklearn.metrics import jaccard_score, accuracy_score
print(a)
array([[1, 1, 1, 1, 2, 0, 1, 0],
[2, 1, 1, 0, 1, 1, 0, 1]])
accuracy_score(a[0,:], a[1,:])
# 0.25
Which is the same as doing:
(a[0,:] == a[1,:]).sum()/a.shape[1]
# 0.25
The jaccard_score is especially suited to certain problems, such as object detection. You can get a better understanding by taking a look at the Jaccard index, also known as intersection over union, which measures the size of the intersection of two sample sets divided by the size of their union (the sum of the two set sizes minus the size of the intersection).
Note that sklearn.metrics.jaccard_similarity_score is deprecated, and you should probably be looking at sklearn.metrics.jaccard_score. The latter has several averaging modes, depending on what you're most interested in. The default is binary, which you should change since you're dealing with multiple labels.
So depending on your application, you'll be more interested in one or the other. If you aren't sure, though, I'd suggest going with the simpler one, which is the accuracy score.
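To make the intersection-over-union idea concrete for the two rows in the question, here is a small NumPy-only sketch that computes the Jaccard index of each label one-vs-rest and then averages them. The helper jaccard_per_label is hypothetical; as far as I understand, this mirrors what jaccard_score does with average='macro', but treat that as an assumption and check it against scikit-learn on your own data.

```python
import numpy as np

y_true = np.array([1, 1, 1, 1, 2, 0, 1, 0])
y_pred = np.array([2, 1, 1, 0, 1, 1, 0, 1])

def jaccard_per_label(y_true, y_pred):
    """Hypothetical helper: Jaccard index of each label, treated one-vs-rest."""
    scores = {}
    for label in np.union1d(y_true, y_pred):
        in_true = y_true == label
        in_pred = y_pred == label
        intersection = np.sum(in_true & in_pred)  # positions where both have the label
        union = np.sum(in_true | in_pred)         # positions where either has the label
        scores[int(label)] = float(intersection / union)
    return scores

per_label = jaccard_per_label(y_true, y_pred)
macro = sum(per_label.values()) / len(per_label)
accuracy = float(np.mean(y_true == y_pred))
print(per_label)  # {0: 0.0, 1: 0.25, 2: 0.0}
print(accuracy)   # 0.25
```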
I am trying very simply to plot subplots generated by the PyMC3 traceplot function (see here) to a file.
The function generates a numpy.ndarray (2d) of subplots.
I need to move or copy these subplots into a matplotlib.figure in order to save the image file. Everything I can find shows how to generate the figure's subplots first, then build them out.
As a minimum example, I lifted the sample PyMC3 code from Here, and added to it just a few lines in an attempt to handle the subplots.
from pymc3 import *
import theano.tensor as tt
from theano import as_op
from numpy import arange, array, empty
### Added these three lines relative to source #######################
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
__all__ = ['disasters_data', 'switchpoint', 'early_mean', 'late_mean', 'rate', 'disasters']
# Time series of recorded coal mining disasters in the UK from 1851 to 1962
disasters_data = array([4, 5, 4, 0, 1, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6,
3, 3, 5, 4, 5, 3, 1, 4, 4, 1, 5, 5, 3, 4, 2, 5,
2, 2, 3, 4, 2, 1, 3, 2, 2, 1, 1, 1, 1, 3, 0, 0,
1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1,
0, 1, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2,
3, 3, 1, 1, 2, 1, 1, 1, 1, 2, 4, 2, 0, 0, 1, 4,
0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1])
years = len(disasters_data)
@as_op(itypes=[tt.lscalar, tt.dscalar, tt.dscalar], otypes=[tt.dvector])
def rateFunc(switchpoint, early_mean, late_mean):
    out = empty(years)
    out[:switchpoint] = early_mean
    out[switchpoint:] = late_mean
    return out
with Model() as model:
    # Prior for distribution of switchpoint location
    switchpoint = DiscreteUniform('switchpoint', lower=0, upper=years)
    # Priors for pre- and post-switch mean number of disasters
    early_mean = Exponential('early_mean', lam=1.)
    late_mean = Exponential('late_mean', lam=1.)
    # Allocate appropriate Poisson rates to years before and after current switchpoint location
    rate = rateFunc(switchpoint, early_mean, late_mean)
    # Data likelihood
    disasters = Poisson('disasters', rate, observed=disasters_data)
    # Initial values for stochastic nodes
    start = {'early_mean': 2., 'late_mean': 3.}
    # Use slice sampler for means
    step1 = Slice([early_mean, late_mean])
    # Use Metropolis for switchpoint, since it accommodates discrete variables
    step2 = Metropolis([switchpoint])
    # njobs>1 works only with most recent (mid August 2014) Theano version:
    # https://github.com/Theano/Theano/pull/2021
    tr = sample(1000, tune=500, start=start, step=[step1, step2], njobs=1)
### gnashing of teeth starts here ################################
fig, axarr = plt.subplots(3,2)
# This gives a KeyError
# axarr = traceplot(tr, axarr)
# This finishes without error
trarr = traceplot(tr)
# doesn't work
# axarr[0, 0] = trarr[0, 0]
fig.savefig("disaster.png")
I've tried a few variations along the subplot() and add_subplot() lines, to no avail -- all errors point toward the fact that empty subplots must first be created for the figure, not assigned to pre-existing subplots.
A different example (see here, about 80% of the way down, beginning with
### Mysterious code to be explained in Chapter 3.
) avoids the utility altogether and builds out the subplots manually, so maybe there's no good answer to this? Is the pymc3.traceplot output indeed an orphaned ndarray of subplots that can't be used?
I ran into the same problem. I am working with pymc3 3.5 and matplotlib 2.1.2.
I realized it's possible to export the traceplot by:
trarr = traceplot(tr)
fig = plt.gcf() # to get the current figure...
fig.savefig("disaster.png") # and save it directly
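A variation that does not rely on the plot still being the current figure: every matplotlib Axes keeps a reference to its parent in its figure attribute, so the figure can be recovered from any element of the returned array. The sketch below uses a hypothetical stand-in, fake_traceplot, instead of PyMC3, and an arbitrary output path in the temporary directory.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def fake_traceplot():
    """Hypothetical stand-in: builds its own figure but returns only the axes array."""
    _, axes = plt.subplots(3, 2, squeeze=False)
    for ax in axes.ravel():
        ax.plot([0, 1, 2], [0, 1, 0])
    return axes

axes = fake_traceplot()
fig = axes[0, 0].figure  # parent figure of the first subplot
out_path = os.path.join(tempfile.gettempdir(), "traceplot_sketch.png")
fig.savefig(out_path)
```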
Can you print type(trarr[0,0]) and post the result?
First of all, matplotlib axes objects are part of a figure and can only live inside a figure. It is therefore not possible to simply take an axes and put it into a different figure. However, in your case it may be that fig.add_axes(trarr[0,0]) works nonetheless. I doubt it, but you can still try.
Apart from that, traceplot() has a keyword argument called ax.
ax : axes
Matplotlib axes. Defaults to None.
Although it is pretty unclear how you'd specify several subplots as one axes object, you can still play around with it. Try passing a single axes, your own subplots axes array axarr, or only part of it.
Edit, so that no one overlooks the small line in the comments:
According to the answer in the bug report, traceplot(tr, ax = axarr) is indeed reported to work just fine.
It should be a standard question, but I am not able to find the answer :(
I have a NumPy ndarray with n samples (rows) and p variables (observations).
I would like to count how many times each variable is nonzero.
I would use a function like
sum([1 for i in column if i!=0])
but how can I apply this function to all the columns of my matrix?
From this post: How to apply numpy.linalg.norm to each row of a matrix?
If the operation supports axis, use the axis parameter; it's usually faster.
Otherwise, np.apply_along_axis could help.
The counting itself can be done with numpy.count_nonzero.
So here is the simple answer:
import numpy as np
arr = np.eye(3)
np.apply_along_axis(np.count_nonzero, 0, arr)
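In line with the advice to prefer an axis parameter, newer NumPy versions (1.12 and later, to my knowledge) accept axis directly in np.count_nonzero, which avoids apply_along_axis altogether. A short sketch on a small example array:

```python
import numpy as np

arr = np.array([[0, 1, 1, 0],
                [1, 1, 0, 0]])

# axis=0 counts down each column, i.e. one count per variable.
per_column = np.count_nonzero(arr, axis=0)
print(per_column)  # [1 2 1 0]
```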
You can use np.sum over a boolean array created from comparing your original array to zero, using the axis keyword argument to indicate whether you want to count over rows or columns. In your case:
>>> a = np.array([[0, 1, 1, 0],[1, 1, 0, 0]])
>>> a
array([[0, 1, 1, 0],
[1, 1, 0, 0]])
>>> np.sum(a != 0, axis=0)
array([1, 2, 1, 0])