Is a negative log likelihood necessarily small? - python

I am having trouble implementing in Python a strategy described in the article "Blind identification strategies for room occupancy estimation".
To make my question as easy to answer as possible, I will keep the level of detail to the bare minimum.
Here is the mathematics of the problem.
My aim is to minimize L, the negative log likelihood function. The problem: the implementation produces astronomically high values that don't seem coherent (even infinite for certain parameter values). Is that expected? My results do not match the ones given in the article either, and I can't find out why.
Here is the L function I defined in Python:
import numpy as np
from numpy.linalg import inv,det
from numpy import dot,transpose,log
def L(ARX, O, u, y):
    # Preliminaries: break down theta into a, bu, bo, s2, o(1), ..., o(N)
    N = O.shape[0]
    [a, bu, bo, s2] = ARX  # the coefficients of the ARX system to be identified
    I = np.eye(N)          # identity matrix for later calculations
    # Delta matrix (shift matrix with ones on the first sub-diagonal)
    Delta = np.zeros([N, N])
    Delta[1:N, 0:N-1] = np.eye(N-1)
    # y_barre (residual vector)
    y_barre = dot(I - a*Delta, y) - dot(bu*Delta, u) - dot(bo*Delta, O)
    # Sigma_y (covariance matrix)
    Sy = s2 * dot(inv(I - a*Delta), inv(I - a*Delta).T)
    # finally, the negative log likelihood is given by
    return log(det(Sy)) + 1/s2 * dot(y_barre.T, y_barre)
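For reference, a negative log likelihood is not bounded, so large values are not wrong in themselves. The infinities, however, are very likely a floating-point issue: since I - a*Delta is unit lower triangular, det(Sy) equals s2**N, which overflows (or underflows) a float64 for large N long before the log-determinant N*log(s2) does, so log(det(Sy)) evaluates to inf or -inf. Below is a minimal sketch of a more stable variant using np.linalg.slogdet; everything else is kept as in the function above.

import numpy as np
from numpy.linalg import inv, slogdet
from numpy import dot

def L_stable(ARX, O, u, y):
    # same setup as L() above
    N = O.shape[0]
    [a, bu, bo, s2] = ARX
    I = np.eye(N)
    Delta = np.zeros([N, N])
    Delta[1:N, 0:N-1] = np.eye(N-1)
    y_barre = dot(I - a*Delta, y) - dot(bu*Delta, u) - dot(bo*Delta, O)
    Sy = s2 * dot(inv(I - a*Delta), inv(I - a*Delta).T)
    # slogdet returns (sign, log|det|) without ever forming det(Sy) itself,
    # which avoids the overflow that turns log(det(Sy)) into inf
    sign, logdet = slogdet(Sy)
    return logdet + 1/s2 * dot(y_barre.T, y_barre)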

Related

The odeint function from scipy.integrate does not follow the given time vector when solving the system

I am trying to solve an ODE matrix system in which one of the terms changes with time. To set it to the correct value, the code should search the predefined time vector to find the current instant and use that index (named control in the code) to set the values in the matrix, as shown below:
from scipy.integrate import odeint as ode
# (Here I cut some objects that are not relevant to the issue and that I can guarantee are correct)

def susp(x, TT):
    dxdt = []
    control = 0
    dxdtlong = np.zeros((1, 4), float)
    # look up the index of the current time TT in the predefined time vector t
    for i in range(time_size):
        ttt = t[i]
        if TT != ttt:
            control += 1
        else:
            break
    W[0, 0] = zr[control]
    W[1, 0] = zrp[control]
    matriz1 = np.matrix(np.dot(A, x))
    matriz2 = np.matrix(np.dot(H, W))
    for j in range(len(matriz1)):
        for k in range(len(matriz1[0])):
            dxdtlong = matriz1[k][j] + matriz2[k][j]
    dxdt.append(dxdtlong[0, 0])
    dxdt.append(dxdtlong[0, 1])
    dxdt.append(dxdtlong[0, 2])
    dxdt.append(dxdtlong[0, 3])
    data = np.reshape(dxdt, np.size(dxdt))
    return data

x = ode(susp, x0_reshaped, t, full_output=1)
t is an equally spaced time vector created with linspace.
Oddly, when running this code, the second time point that odeint passes to susp isn't t[1], but a value that isn't even in my time vector.
Can someone please explain to me what I am doing wrong?
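For reference, odeint's underlying solver (LSODA) chooses its own internal step sizes and evaluates the right-hand side at intermediate times that are not in t; the values in t only control where the output is reported. The usual workaround is to interpolate the time-varying inputs at whatever time TT the solver asks for instead of looking TT up as an exact element of t. A minimal sketch, reusing the (elided) objects t, zr, zrp, A, H and W from the snippet above:

import numpy as np
from scipy.integrate import odeint

def susp(x, TT):
    # interpolate the time-varying inputs at the solver's requested time TT,
    # instead of looking TT up as an exact element of t
    W[0, 0] = np.interp(TT, t, zr)
    W[1, 0] = np.interp(TT, t, zrp)
    # state derivative: A @ x plus the input term H @ W, flattened to a 1-D vector
    dxdt = np.dot(A, x).reshape(-1) + np.dot(H, W).reshape(-1)
    return dxdt

# sol = odeint(susp, x0_reshaped, t)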

Python curve fitting using MLE and obtaining standard errors for parameter estimates

The original problem
While translating MATLAB code to Python, I came across the function [parmhat,parmci] = gpfit(x,alpha). This function fits a Generalized Pareto Distribution and returns the parameter estimates, parmhat, and the 100(1-alpha)% confidence intervals for the parameter estimates, parmci.
MATLAB also provides the function gplike that returns acov, the inverse of Fisher's information matrix. This matrix contains the asymptotic variances on the diagonal when using MLE. I have the feeling this can be coupled to the confidence intervals as well, however my statistics background is not strong enough to understand if this is true.
What I am looking for is Python code that gives me the parmci values (I can get the parmhat values by using scipy.stats.genpareto.fit). I have been scouring Google and Stackoverflow for 2 days now, and I cannot find any approach that works for me.
While I am specifically working with the Generalized Pareto Distribution, I think this question can apply to many more (if not all) distributions that scipy.stats has.
My data: I am interested in the shape and scale parameters of the generalized pareto fit, the location parameter should be fixed at 0 for my fit.
What I have done so far
scipy.stats: While scipy.stats provides nice fitting performance, this library does not offer a way to calculate confidence intervals on the parameter estimates of the distribution fitter.
scipy.optimize.curve_fit: As an alternative I have seen it suggested to use scipy.optimize.curve_fit, as this does provide the estimated covariance of the fitted parameters. However, that fitting method uses least squares, whereas I need MLE, and I didn't see a way to make curve_fit use MLE instead. Therefore it seems that I cannot use curve_fit.
statsmodels.GenericLikelihoodModel: Next I found a suggestion to use statsmodels' GenericLikelihoodModel. The original question there used a gamma distribution and asked for a non-zero location parameter. I altered the code to:
import numpy as np
from statsmodels.base.model import GenericLikelihoodModel
from scipy.stats import genpareto
# Data contains 24 experimentally obtained values
data = np.array([3.3768732 , 0.19022354, 2.5862942 , 0.27892331, 2.52901677,
0.90682787, 0.06842895, 0.90682787, 0.85465385, 0.21899145,
0.03701204, 0.3934396 , 0.06842895, 0.27892331, 0.03701204,
0.03701204, 2.25411215, 3.01049545, 2.21428639, 0.6701813 ,
0.61671203, 0.03701204, 1.66554224, 0.47953739, 0.77665706,
2.47123239, 0.06842895, 4.62970341, 1.0827188 , 0.7512669 ,
0.36582134, 2.13282122, 0.33655947, 3.29093622, 1.5082936 ,
1.66554224, 1.57606579, 0.50645878, 0.0793677 , 1.10646119,
0.85465385, 0.00534871, 0.47953739, 2.1937636 , 1.48512994,
0.27892331, 0.82967374, 0.58905024, 0.06842895, 0.61671203,
0.724393 , 0.33655947, 0.06842895, 0.30709881, 0.58905024,
0.12900442, 1.81854273, 0.1597266 , 0.61671203, 1.39384127,
3.27432715, 1.66554224, 0.42232511, 0.6701813 , 0.80323855,
0.36582134])
params = genpareto.fit(data, floc=0, scale=0)
# HOW TO ESTIMATE/GET ERRORS FOR EACH PARAM?
print(params)
print('\n')
class Genpareto(GenericLikelihoodModel):
    nparams = 2
    def loglike(self, params):
        # params = (shape, loc, scale)
        return genpareto.logpdf(self.endog, params[0], 0, params[2]).sum()

res = Genpareto(data).fit(start_params=params)
res.df_model = 2
res.df_resid = len(data) - res.df_model
print(res.summary())
This gives me a somewhat reasonable fit:
Scipy stats fit: (0.007194143471555344, 0, 1.005020562073944)
Genpareto fit: (0.00716650293, 8.47750397e-05, 1.00504535)
However, in the end I get a warning when it tries to calculate the covariance:
HessianInversionWarning: Inverting hessian failed, no bse or cov_params available
If I do return genpareto.logpdf(self.endog, *params).sum() I get a worse fit compared to scipy stats.
Bootstrapping: Lastly I found mentions of bootstrapping. While I roughly understand the idea behind it, I have no clue how to implement it. What I understand is that you should resample N times (1000 for example) from your data set (24 points in my case), do a fit on each sub-sample, and record the fit results. Then you do a statistical analysis on the N results, i.e. calculate the mean, std_dev and then the confidence interval, as in Estimate confidence intervals for parameters of distribution in python or Compute a confidence interval from sample data assuming unknown distribution (a sketch of this approach is shown below). I even found some old MATLAB documentation on the calculations behind gpfit explaining this.
However, I need my code to run fast, and I am not sure whether any implementation I write myself will be fast enough.
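For what it's worth, here is a minimal sketch of the bootstrap approach described above, reusing the data array from the snippet and fixing the location at 0; the number of resamples (1000) and the percentile method for the interval are choices, not requirements:

import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
n_boot = 1000
boot_params = np.empty((n_boot, 2))  # columns: shape (c), scale

for i in range(n_boot):
    # resample the data with replacement and refit with the location fixed at 0
    sample = rng.choice(data, size=len(data), replace=True)
    c, loc, scale = genpareto.fit(sample, floc=0)
    boot_params[i] = (c, scale)

# 95% percentile confidence intervals for shape and scale
ci = np.percentile(boot_params, [2.5, 97.5], axis=0)
print("shape CI:", ci[:, 0])
print("scale CI:", ci[:, 1])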
Conclusions: Does anyone know of a Python function that calculates this efficiently, or can anyone point me to a topic where this has already been explained in a way that works for my case?
I had the same issue with GenericLikelihoodModel and came across this post (https://pystatsmodels.narkive.com/9ndGFxYe/mle-error-warning-inverting-hessian-failed-maybe-i-cant-use-matrix-containers), which suggests using different starting parameter values to get a result with an invertible (positive definite) Hessian. That solved my problem.
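Different starting values are one thing to try; in the snippet above there is also a structural reason the Hessian can fail to invert regardless of the start: loglike never uses params[1], so the likelihood is flat in the location parameter and the numerical Hessian has a zero row. Below is a minimal sketch that estimates only the two free parameters (shape and scale, location fixed at 0), after which res.bse and res.conf_int() become available; it reuses the imports and the data array from the question, and the starting values are only illustrative:

from statsmodels.base.model import GenericLikelihoodModel
from scipy.stats import genpareto

class Genpareto2(GenericLikelihoodModel):
    def loglike(self, params):
        # params = (shape, scale); location is fixed at 0
        shape, scale = params
        return genpareto.logpdf(self.endog, shape, 0, scale).sum()

res = Genpareto2(data).fit(start_params=[0.1, 1.0])
print(res.bse)          # asymptotic standard errors for (shape, scale)
print(res.conf_int())   # 95% confidence intervals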

scipy.signal.resample: malfunction with even number of points?

It seems that scipy.signal.resample() makes errors when downsampling to an even number of points. For example, if we upsample a function to a multiple of the original points and then downsample again, we should get the original function back.
from scipy import signal
import numpy as np
def test_resample(n1, n2):  # upsample from n1 to n2 points and back
    x1 = np.arange(n1)
    y1 = np.sin(x1)
    y2, x2 = signal.resample(y1, n2, x1)
    y3, x3 = signal.resample(y2, n1, x2)
    print(np.allclose(y1, y3))
But this fails when the lower number of points is even:
test_resample(10,20)
False
test_resample(11,22)
True
test_resample(11,33)
True
The problem occurs at the downsampling step. The errors are large, at least several percent for functions I tested.
Update of 4/8/17: This really seems to be a coding error. I reported details of the bug here.
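As a side note, for integer up/down ratios like the ones in test_resample, scipy.signal.resample_poly (polyphase FIR filtering rather than FFT) is a common alternative. Whether it round-trips a given signal more faithfully depends on the signal and the filter, so the sketch below is only an alternative to try, not a fix for the reported bug:

from scipy import signal
import numpy as np

x1 = np.arange(10)
y1 = np.sin(x1)

# polyphase resampling by a rational factor up/down
y2 = signal.resample_poly(y1, up=2, down=1)   # 10 -> 20 points
y3 = signal.resample_poly(y2, up=1, down=2)   # back to 10 points
print(y3.shape, np.max(np.abs(y1 - y3)))      # inspect the round-trip error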

Using cross-correlation to detect an audio signal within another signal

I am trying to write a script in Python to detect the presence of a simple alarm sound in any given input audio file. I explain my solution below and would appreciate it if anyone can confirm it is a good approach. Any other solution implementable in Python is also appreciated.
The way I do this is by computing the cross-correlation of the two signals: I take the FFT of both signals (one of them reversed), multiply them together, and take the IFFT of the result. Finding the peak of the result and comparing it with a pre-specified threshold then determines whether the alarm sound is detected.
This is my code:
import scipy.fftpack as fftpack
def similarity(template, test):
    corr = fftpack.irfft(fftpack.rfft(test, 2 * test.size) *
                         fftpack.rfft(template[:-1], 2 * template.size))
    return max(abs(corr))
template and test are the 1-D arrays of signal data. The second argument to rfft is used to zero-pad the signal before calculating the FFT; however, I am not sure how many zeros should be added. Also, should I do any normalisation of the given signals before applying the FFT? For example, normalising based on the peak of the template signal?
Solved!
I just needed to use scipy.signal.fftconvolve which takes care of zero padding itself. No normalization was required. So the working code for me is:
from scipy.signal import fftconvolve
def similarity(template, test):
    corr = fftconvolve(template, test, mode='same')
    return max(abs(corr))
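A small usage sketch of the function above, assuming two WAV files read with scipy.io.wavfile; the file names and the threshold value are purely illustrative and depend on the recordings:

import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

def similarity(template, test):
    corr = fftconvolve(template, test, mode='same')
    return max(abs(corr))

# hypothetical file names; both recordings are assumed to be mono with the same sample rate
_, template = wavfile.read('alarm_template.wav')
_, test = wavfile.read('recording.wav')

score = similarity(template.astype(float), test.astype(float))
THRESHOLD = 1e9  # illustrative only; calibrate on recordings known to contain / not contain the alarm
print('alarm detected' if score > THRESHOLD else 'no alarm')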

Generating a graph with certain degree distribution?

I am trying to generate a random graph that has small-world properties (exhibits a power-law degree distribution). I just started using the networkx package and discovered that it offers a variety of random graph generators. Can someone tell me if it is possible to generate a graph where a given node's degree follows a gamma distribution (either in R or using Python's networkx package)?
If you want to use the configuration model something like this should work in NetworkX:
import random
import networkx as nx
z=[int(random.gammavariate(alpha=9.0,beta=2.0)) for i in range(100)]
G=nx.configuration_model(z)
You might need to adjust the mean of the sequence z depending on parameters in the gamma distribution. Also z doesn't need to be graphical (you'll get a multigraph), but it does need an even sum so you might have to try a few random sequences (or add 1)...
The NetworkX documentation notes for configuration_model give another example, a reference and how to remove parallel edges and self loops:
Notes
-----
As described by Newman [1]_.
A non-graphical degree sequence (not realizable by some simple
graph) is allowed since this function returns graphs with self
loops and parallel edges. An exception is raised if the degree
sequence does not have an even sum.
This configuration model construction process can lead to
duplicate edges and loops. You can remove the self-loops and
parallel edges (see below) which will likely result in a graph
that doesn't have the exact degree sequence specified. This
"finite-size effect" decreases as the size of the graph increases.
References
----------
.. [1] M.E.J. Newman, "The structure and function
of complex networks", SIAM REVIEW 45-2, pp 167-256, 2003.
Examples
--------
>>> from networkx.utils import powerlaw_sequence
>>> z=nx.create_degree_sequence(100,powerlaw_sequence)
>>> G=nx.configuration_model(z)
To remove parallel edges:
>>> G=nx.Graph(G)
To remove self loops:
>>> G.remove_edges_from(G.selfloop_edges())
Here is an example similar to the one at http://networkx.lanl.gov/examples/drawing/degree_histogram.html that makes a drawing including a graph layout of the largest connected component:
#!/usr/bin/env python
import random
import matplotlib.pyplot as plt
import networkx as nx
def seq(n):
    return [random.gammavariate(alpha=2.0,beta=1.0) for i in range(100)]
z=nx.create_degree_sequence(100,seq)
nx.is_valid_degree_sequence(z)
G=nx.configuration_model(z) # configuration model
degree_sequence=sorted(nx.degree(G).values(),reverse=True) # degree sequence
print "Degree sequence", degree_sequence
dmax=max(degree_sequence)
plt.hist(degree_sequence,bins=dmax)
plt.title("Degree histogram")
plt.ylabel("count")
plt.xlabel("degree")
# draw graph in inset
plt.axes([0.45,0.45,0.45,0.45])
Gcc=nx.connected_component_subgraphs(G)[0]
pos=nx.spring_layout(Gcc)
plt.axis('off')
nx.draw_networkx_nodes(Gcc,pos,node_size=20)
nx.draw_networkx_edges(Gcc,pos,alpha=0.4)
plt.savefig("degree_histogram.png")
plt.show()
I did this a while ago in base Python... IIRC, I used the following method. From memory, so this may not be entirely accurate, but hopefully it's worth something:
Choose the number of nodes, N, in your graph, and the density (existing edges over possible edges), D. This implies the number of edges, E.
For each node, assign its degree by first choosing a random positive number x and finding P(x), where P is your pdf. The node's degree is (P(x)*E/2) - 1.
Choose a node at random, and connect it to another random node. If either node has realized its assigned degree, eliminate it from further selection. Repeat E times; a rough sketch of this procedure is shown below.
N.B. that this doesn't create a connected graph in general.
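A rough sketch of that procedure in plain Python, with the target degrees drawn from a gamma distribution as in the question; it follows the "connect random pairs until each node reaches its assigned degree" step literally and, as noted, does not generally produce a connected graph:

import random

N = 100                                            # number of nodes
# target degree for each node, drawn from a gamma distribution as in the question
target = [max(1, int(random.gammavariate(alpha=2.0, beta=1.0))) for _ in range(N)]
E = sum(target) // 2                               # implied number of edges

edges = set()
degree = [0] * N
available = set(range(N))                          # nodes that still need edges

for _ in range(E):
    if len(available) < 2:
        break
    # pick two distinct nodes that still need edges and connect them
    u, v = random.sample(sorted(available), 2)
    e = (min(u, v), max(u, v))
    if e in edges:
        continue                                   # skip duplicate edges to keep the sketch simple
    edges.add(e)
    for n in (u, v):
        degree[n] += 1
        if degree[n] >= target[n]:
            available.discard(n)                   # node has realized its assigned degree

print(len(edges), "edges generated")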
I know this is very late, but you can do the same thing, albeit a little more straightforward, with mathematica.
RandomGraph[DegreeGraphDistribution[{3, 3, 3, 3, 3, 3, 3, 3}], 4]
This will generate 4 random graphs, with each node having a prescribed degree.
Including the one mentioned above, networkx provides 4 generators that receive a degree sequence as input (a short usage sketch follows the list):
configuration_model: explained by @eric
expected_degree_graph: uses a probabilistic approach based on the expected degree of each node. It won't give you the exact degrees, only an approximation.
havel_hakimi_graph: this one tries to connect the nodes with the highest degrees first.
random_degree_sequence_graph: as far as I can see, this is similar to what @JonC suggested; it has a tries parameter since there is no guarantee of finding a suitable configuration.
The full list (including some versions of the algorithms for directed graphs) is here.
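A short sketch of how these generators are called, assuming a degree sequence like the one from the answer above; note that expected_degree_graph takes expected degrees rather than an exact sequence, and random_degree_sequence_graph may still fail within its allotted tries:

import random
import networkx as nx

# degree sequence drawn from a gamma distribution, forced to an even sum
z = [int(random.gammavariate(alpha=9.0, beta=2.0)) for _ in range(100)]
if sum(z) % 2 != 0:
    z[0] += 1

G1 = nx.configuration_model(z)     # multigraph; may contain self-loops and parallel edges
G2 = nx.expected_degree_graph(z)   # degrees only match z in expectation

# the simple-graph generators need z to be graphical, so guard them
if nx.is_graphical(z):
    G3 = nx.havel_hakimi_graph(z)                      # connects the highest degrees first
    G4 = nx.random_degree_sequence_graph(z, tries=20)  # may still fail within its tries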
I also found a couple of papers:
Efficient and exact sampling of simple graphs with given arbitrary degree sequence
A Sequential Importance Sampling Algorithm for Generating Random Graphs with Prescribed Degrees
