Symmetrical log scale array in python

I am trying to solve a 1D nonlinear Poisson equation (the charge distribution depends on the potential). I have been treating it as a minimization problem and have been using the fsolve function from the scipy.optimize module.
While I get a qualitatively reasonable answer, I have noticed that it varies with the distance between points in the array. This is expected, as the solution (and its derivatives) are exponential. The solution is most affected near the boundaries of the space over which the problem is defined.
It appears that the amount of time required for 'fsolve' to complete its calculation increases dramatically with the number of points in the array. I have been looking into the option of using nonlinear spacing with the help of the 'logspace' function from numpy. However, this function gives tighter spacing at one side of the array only. I have tried generating two arrays with 'logspace' and concatenating them, but have not managed to get the required outcome.
To clarify, I require an array in the range [0,x] (x is some float value) where the spacing between array points becomes smaller as they get closer to 0 or x. Any suggestions on how to accomplish this?

The following should give you a log-scale spacing between 0 and 1, so you can scale it to your requirements. I've included two solutions, with and without the boundary values.
import numpy
import math
#set number of spaces: num=?
logrange = numpy.logspace(0,math.log10(11),num=6)
#including boundary points
inclusive = numpy.hstack([logrange -1,21-logrange[-2:0:-1],20])/20
print(inclusive)
#excluding boundary points
exclusive = numpy.hstack([logrange[1:] -1,21-logrange[-2:0:-1]])/20
print(exclusive)
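The same idea can be wrapped in a small helper that scales directly to [0, x]. This is only a sketch under assumptions: the function name symmetric_logspace and the ratio parameter are hypothetical and not part of numpy. It builds one log-spaced half on [0, x/2] and mirrors it, so the spacing shrinks towards both 0 and x.

```python
import numpy as np

def symmetric_logspace(x_max, n_half, ratio=10.0):
    """Return 2*n_half - 1 points on [0, x_max], denser near both ends.

    `ratio` is a hypothetical knob: larger values cluster the points
    more strongly at the boundaries.
    """
    # Log-spaced points rescaled to [0, 1], densest near 0.
    half = (np.logspace(0.0, np.log10(1.0 + ratio), n_half) - 1.0) / ratio
    left = 0.5 * x_max * half        # covers [0, x_max / 2]
    right = x_max - left[-2::-1]     # mirror image, midpoint not repeated
    return np.concatenate([left, right])

grid = symmetric_logspace(x_max=2.5, n_half=6)
spacing = np.diff(grid)
print(grid)
print(spacing)
```

The printed spacing should be smallest at the first and last intervals and largest around the midpoint.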

Related

how to set one fitting parameter larger than the other as constraints in iminuit in python?

I have two related fitting parameters with the same fitting range. Let's call them r1 and r2. I know I can limit the fitting range using minuit.limits, but I have an additional criterion: r2 has to be smaller than r1. Can I do that in iminuit?
I've found this, I hope this can help you!
Extracted from: https://iminuit.readthedocs.io/en/stable/faq.html
**Can I have parameter limits that depend on each other (e.g. x^2 + y^2 < 3)?**
MINUIT was only designed to handle box constraints, meaning that the limits on the parameters are independent of each other and constant during the minimisation. If you want limits that depend on each other, you have three options (all with caveats), which are listed in increasing order of difficulty:
1. Change the variables so that the limits become independent. For example, transform from cartesian coordinates to polar coordinates for a circle. This is not always possible, of course.
2. Use another minimiser to locate the minimum which supports complex boundaries. The nlopt library and scipy.optimize have such minimisers. Once the minimum is found and if it is not near the boundary, place box constraints around the minimum and run iminuit to get the uncertainties (make sure that the box constraints are not too tight around the minimum). Neither nlopt nor scipy can give you the uncertainties.
3. Artificially increase the negative log-likelihood in the forbidden region. This is not as easy as it sounds.
The third method done properly is known as the interior point or barrier method. A glance at the Wikipedia article shows that one has to either run a series of minimisations with iminuit (and find a clever way of knowing when to stop) or implement this properly at the level of a Newton step, which would require changes to the complex and convoluted internals of MINUIT2.
Warning: you cannot just add a large value to the likelihood when the parameter boundary is violated. MIGRAD expects the likelihood function to be differentiable everywhere, because it uses the gradient of the likelihood to go downhill. The derivative at a discrete step is infinity and zero in the forbidden region. MIGRAD does not like this at all.
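For the r2 < r1 case from the question, the first option (a change of variables) might look like the sketch below. Everything here is an assumption for illustration: the range LO/HI, the stand-in original_cost, and the helper names are hypothetical. The idea is to fit r1 and a fraction f in [0, 1] instead of r2, so that both fitted parameters have independent box limits. The iminuit calls are shown only as comments.

```python
LO, HI = 0.0, 10.0  # hypothetical shared fitting range of r1 and r2

def r2_from_fraction(r1, f):
    """Map f in [0, 1] to r2 in [LO, r1], so r2 <= r1 by construction."""
    return LO + f * (r1 - LO)

def original_cost(r1, r2):
    # Hypothetical stand-in for the real cost function of the fit.
    return (r1 - 6.0) ** 2 + (r2 - 4.0) ** 2

def cost(r1, f):
    """Same cost, expressed in the transformed variables (r1, f)."""
    return original_cost(r1, r2_from_fraction(r1, f))

# With iminuit, the transformed problem only needs box limits, e.g.:
#   from iminuit import Minuit
#   m = Minuit(cost, r1=5.0, f=0.5)
#   m.errordef = Minuit.LEAST_SQUARES
#   m.limits["r1"] = (LO, HI)
#   m.limits["f"] = (0.0, 1.0)
#   m.migrad()
#   r2_best = r2_from_fraction(m.values["r1"], m.values["f"])

print(cost(6.0, 2.0 / 3.0))
```

Note that f = 1 gives r2 = r1, so a strict inequality would need an upper limit slightly below 1, and the uncertainty on r2 would have to be propagated from those of r1 and f.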

Total variation implementation in numpy for a piecewise linear function

I would like to use total variation in Python, but I wasn't able to find an existing implementation.
Assuming that I have an array with a finite number of elements, is the implementation with NumPy simply as:
import numpy as np
a = np.array([...], dtype=float)
tv = np.sum(np.abs(np.diff(a)))
My main doubt is how to compute the supremum of tv across all partitions, and if just the sum of the absolute difference might suffice for a finite array of floats.
Edit: My input array represents a piecewise linear function, therefore the supremum over the full set of partitions is indeed the sum of absolute differences between contiguous points.
Yes, that is correct.
I imagine you're confused by the mathy definition on the Wikipedia page for total variation. Have a look at the more practical definition on the Wikipedia page for total variation denoising instead.
For an actual code implementation (in Python, even), see e.g. TensorFlow's total_variation(), though this is for one or more (2D, color) images, so the TV is computed for both rows and columns, and then added together.
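A minimal sketch of both variants in plain NumPy follows. The function names are hypothetical, and the 2D version is only written in the spirit of the row-plus-column sum described above, not copied from any library. The last lines check the claim from the question: for a piecewise linear function, refining the partition does not change the result.

```python
import numpy as np

def total_variation_1d(values):
    """Sum of absolute differences between contiguous samples."""
    values = np.asarray(values, dtype=float)
    return np.sum(np.abs(np.diff(values)))

def total_variation_2d(image):
    """Anisotropic TV of a 2D array: row differences plus column differences."""
    image = np.asarray(image, dtype=float)
    rows = np.sum(np.abs(np.diff(image, axis=0)))
    cols = np.sum(np.abs(np.diff(image, axis=1)))
    return rows + cols

# Knots of a piecewise linear function.
x = np.array([0.0, 1.0, 2.0, 3.0])
a = np.array([0.0, 1.0, 0.5, 2.0])
tv_knots = total_variation_1d(a)  # |1| + |-0.5| + |1.5| = 3.0

# Sampling the same function on a finer partition that contains the knots.
x_fine = np.linspace(0.0, 3.0, 31)
tv_fine = total_variation_1d(np.interp(x_fine, x, a))

print(tv_knots, tv_fine)
```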

Scipy Gaussian KDE : Matrix is not positive definite

I am trying to estimate the density of a data set at certain points, using scipy.
from scipy.stats import gaussian_kde
import numpy as np
I have a dataset A of 3D points (this is just a minimal example. My actual data has many more dimensions and many more samples)
A = np.array([[0.078377 , 0.76737392, 0.45038174],
[0.65990129, 0.13154658, 0.30770917],
[0.46068406, 0.22751313, 0.28122463]])
and the points at which I want to estimate the density
B = np.array([[0.40209377, 0.21063273, 0.75885516],
[0.91709997, 0.79303252, 0.65156937]])
But I can't seem to be able to use the gaussian_kde function, as
result = gaussian_kde(A.T)(B.T)
returns
LinAlgError: Matrix is not positive definite
How do I fix this error? How do I get the density of my sample?
TL;DR:
You have highly correlated features in your data which leads to a numerical error. There are several possible ways to address this, each with pros and cons. A drop-in replacement class for gaussian_kde is proposed below.
Diagnostic
Your dataset (i.e. the matrix that you feed when creating the gaussian_kde object, not when using it), likely contains highly dependent features. This fact (possibly combined with having low numerical resolution like float32 and "too many" datapoints) causes the covariance matrix to have near-zero eigenvalues, which due to numerical error can get below zero. This in turn will break the code that uses the Cholesky decomposition on the dataset covariance matrix (see explanation for specific details).
Assuming your dataset has shape (dims, N), you can test if this is your problem via np.linalg.eigh(np.cov(dataset))[0] <= 0. If any of the outputs comes out True, let me be the first to welcome you to the club.
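As a sketch of that check on the data from the question: the helper name and the rel_tol threshold below are assumptions, and a relative tolerance is used instead of a strict <= 0 so that tiny positive eigenvalues are flagged as well. Note that the example has only 3 samples in 3 dimensions, and a sample covariance built from N points has rank at most N - 1, so one eigenvalue is zero up to rounding regardless of how the features correlate.

```python
import numpy as np

A = np.array([[0.078377, 0.76737392, 0.45038174],
              [0.65990129, 0.13154658, 0.30770917],
              [0.46068406, 0.22751313, 0.28122463]])

def covariance_is_degenerate(dataset, rel_tol=1e-10):
    """Check a (dims, N) dataset for (near-)zero covariance eigenvalues.

    rel_tol is a hypothetical threshold relative to the largest eigenvalue.
    """
    eigvals = np.linalg.eigvalsh(np.cov(dataset))
    return bool(eigvals.min() <= rel_tol * eigvals.max()), eigvals

degenerate, eigvals = covariance_is_degenerate(A.T)
print(eigvals)
print("degenerate covariance:", degenerate)
```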
Treatments
The idea is to get all eigenvalues above zero.
Increasing numerical resolution to the highest float that is practical can be an easy fix and worth trying, but may not be enough.
Given that this is caused by correlated features, removing datapoints doesn't help much a priori. There is a slim hope that having fewer numbers to crunch will propagate less error and keep the eigenvalues above zero. It's easy to implement, but it discards data points.
A more involved fix is to identify the highly correlated features and merge them or ignore the "superfluous" ones. This is tricky especially if the correlations among dimensions vary from instance to instance.
Probably the cleanest way is to leave the data untouched, and lift the problematic eigenvalues to positive values. This is usually done in 2 ways:
SVD addresses the problem directly at its core: Decompose the covariance matrix and replace the negative eigenvalues with a small positive epsilon. This will put your matrix back to PD domain introducing minimal error.
If the SVD is too expensive to compute, an alternative numerical hack is to add epsilon * np.eye(D) to the covariance matrix. This has the effect of adding epsilon to each one of the eigenvalues. Again, if epsilon is small enough, this doesn't introduce that much of an error.
If you go for the last approach you'll need to tell gaussian_kde to modify its covariance matrix. This is a relatively clean way I found to do that: simply add this class to your codebase and replace gaussian_kde with GaussianKde (tested on my end, seems to work fine).
class GaussianKde(gaussian_kde):
    """
    Drop-in replacement for gaussian_kde that adds the class attribute EPSILON
    to the covmat eigenvalues, to prevent exceptions due to numerical error.
    """
    EPSILON = 1e-10  # adjust this at will
    def _compute_covariance(self):
        """Computes the covariance matrix for each Gaussian kernel using
        covariance_factor().
        """
        self.factor = self.covariance_factor()
        # Cache covariance and inverse covariance of the data
        if not hasattr(self, '_data_inv_cov'):
            self._data_covariance = np.atleast_2d(np.cov(self.dataset, rowvar=1,
                                                         bias=False,
                                                         aweights=self.weights))
            # we're going the easy way here
            self._data_covariance += self.EPSILON * np.eye(
                len(self._data_covariance))
            self._data_inv_cov = np.linalg.inv(self._data_covariance)
        self.covariance = self._data_covariance * self.factor**2
        self.inv_cov = self._data_inv_cov / self.factor**2
        L = np.linalg.cholesky(self.covariance * 2 * np.pi)
        self._norm_factor = 2*np.log(np.diag(L)).sum()  # needed for scipy 1.5.2
        self.log_det = 2*np.log(np.diag(L)).sum()  # changed var name on 1.6.2
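Independently of gaussian_kde, the two eigenvalue-lifting treatments listed above can be tried on a bare covariance matrix. The sketch below is not scipy code: the function names, the eps value, and the synthetic dataset (whose third feature is the sum of the first two) are assumptions. It uses np.linalg.eigh rather than a general SVD, which is a substitution that works here because a covariance matrix is symmetric.

```python
import numpy as np

def clip_eigenvalues(cov, eps=1e-10):
    """Decompose cov and raise eigenvalues below eps up to eps."""
    w, v = np.linalg.eigh(cov)
    w = np.maximum(w, eps)
    return (v * w) @ v.T  # V diag(w) V^T

def add_jitter(cov, eps=1e-10):
    """Cheaper alternative: shift every eigenvalue up by eps."""
    return cov + eps * np.eye(cov.shape[0])

# Synthetic (dims, N) dataset with an exactly dependent third feature.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 50))
dataset = np.vstack([x, y, x + y])
cov = np.cov(dataset)

for fix in (clip_eigenvalues, add_jitter):
    fixed = fix(cov)
    np.linalg.cholesky(fixed)  # raises LinAlgError if not positive definite
    print(fix.__name__, np.linalg.eigvalsh(fixed).min())
```

Both variants change the matrix entries by an amount on the order of eps, so eps should stay small relative to the scale of the data.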
Explanation
In case your error is similar, but not quite that, or anyone feels curious, here is the process I followed, hopefully it helps.
The exception stack specified that the error was triggered during a Cholesky decomposition. In my case, this was this line inside the _compute_covariance method.
Indeed, after the exception, checking the eigenvalues of the covariance and inv_cov attributes via np.linalg.eigh showed that covariance had a close-to-zero negative eigenvalue, and hence its inverse had a huge one. Since Cholesky expects PD matrices, this triggered an error.
At this point we can heavily suspect that the tiny negative eigenvalue is a numerical error, since covariance matrices are PSD. Indeed, the error originates when the covariance matrix is first computed from the dataset that has been passed to the constructor, here. In my case, the covariance matrix had at least 2 almost identical columns, which led to the residual negative eigenvalue due to numerical error.
When will your dataset lead to a quasi-singular covariance matrix? That question is perfectly addressed in this other SE post. The bottom line is: If some variable is an exact linear combination of the other variables, with constant term allowed, the correlation and covariance matrices of the variables will be singular. The dependency observed in such matrix between its columns is actually that same dependency as the dependency between the variables in the data observed after the variables have been centered (their means brought to 0) or standardized (if we mean correlation rather than covariance matrix) (Kudos and +1 to ttnphns for the amazing work).

How to interpolate spatial data from irregular grid to regular grid using the max of the k-nearest neighbours?

I have been struggling with a problem for a few days now:
There are 17 numpy arrays with values and corresponding latitude and longitude coordinates. Each of them contains 360*600 points. These points overlap in some regions. What I want in the end is a composite of the data on one regular grid.
With the common scipy.interpolate.griddata function, the problem is that the overlapping regions often contain different values. This results in strange artefacts, which you can see in the first image:
My first idea is to take the maximum of the values used in the interpolation.
I have found out that scipy.interpolate.griddata uses triangulation to interpolate, but I can't find a pipeline that I can adapt.
I hope you understand that I am not sharing any code, because the dataset is huge and my question is more about finding the best practice or receiving some interesting ideas to solve this problem. Thanks in advance for your support.
Maybe first calculate the distance matrix between your regular grid points (x) and the existing irregular ones (y):
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance_matrix.html
Then, for each regular grid point, find the indices of the k smallest distances and take the maximum of the corresponding values on the irregular grid.
Disclaimer: I don't know how it scales, or what your requirements are regarding performance.
Edit: You might be able to pre-eliminate data-sets for specific regions, to minimise the effort to calculate all the distance matrices.
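The two steps above can be sketched as follows. The function name knn_max, the toy coordinates, and the choice of k are assumptions for illustration. Treating longitude/latitude pairs as plain Euclidean coordinates is also an assumption, and only an approximation over larger areas.

```python
import numpy as np
from scipy.spatial import distance_matrix

def knn_max(src_xy, src_val, grid_xy, k=4):
    """For each grid point, take the max value among its k nearest source points.

    src_xy: (n_src, 2) irregular coordinates, src_val: (n_src,) values,
    grid_xy: (n_grid, 2) regular grid coordinates. Requires k <= n_src.
    """
    d = distance_matrix(grid_xy, src_xy)             # shape (n_grid, n_src)
    # Indices of the k smallest distances per row (in no particular order).
    idx = np.argpartition(d, k - 1, axis=1)[:, :k]
    return src_val[idx].max(axis=1)

# Tiny hypothetical example: four source points, two target points.
src_xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
src_val = np.array([1.0, 2.0, 3.0, 10.0])
grid_xy = np.array([[0.1, 0.1], [4.9, 4.9]])

near = knn_max(src_xy, src_val, grid_xy, k=3)
nearest_only = knn_max(src_xy, src_val, grid_xy, k=1)
print(near, nearest_only)
```

The full distance matrix needs n_grid * n_src entries, which is likely too large for 17 arrays of 360*600 points. A tree-based lookup such as scipy.spatial.cKDTree with query(..., k=k) returns the same neighbour indices without building that matrix.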

Inverse Matrix (Numpy) int too large to convert to float

I am trying to take the inverse of a 365x365 matrix. Some of the values get as large as 365**365 and so they are converted to long numbers. I don't know if the linalg.matrix_power() function can handle long numbers. I know the problem comes from this (because of the error message and because my program works just fine for smaller matrices) but I am not sure if there is a way around this. The code needs to work for a NxN matrix.
Here's my code:
item=0
for i in xlist:
    xtotal.append(arrayit.arrayit(xlist[item],len(xlist)))
    item=item+1
print(xtotal)
xinverted=numpy.linalg.matrix_power(xtotal,-1)
coeff=numpy.dot(xinverted,ylist)
arrayit.arrayit:
def arrayit(number, length):
    newarray=[]
    import decimal
    i=0
    while i!=(length):
        newarray.insert(0,decimal.Decimal(number**i))
        i=i+1
    return newarray
The program is taking x,y coordinates from a list (list of x's and list of y's) and makes a function.
Thanks!
One thing you might try is the library mpmath, which can do simple matrix algebra and other such problems on arbitrary precision numbers.
A couple of caveats: It will almost certainly be slower than using numpy, and, as Lutzl points out in his answer to this question, the problem may well not be mathematically well defined. Also, you need to decide on the precision you want before you start.
Some brief example code,
from mpmath import mp, matrix
# set the precision - see http://mpmath.org/doc/current/basics.html#setting-the-precision
mp.prec = 5000 # set it to something big at the cost of speed.
# Ideally you'd precalculate what you need.
# a quick trial with 100*100 showed that 5000 works and 500 fails
# see the documentation at http://mpmath.org/doc/current/matrices.html
# where xtotal is the output from arrayit
my_matrix = matrix(xtotal) # I think this should work. If not you'll have to create it and copy
# do the inverse
xinverted = my_matrix**-1
coeff = xinverted*matrix(ylist)
# note that as lutlz pointed out you really want to use solve instead of calculating the inverse.
# I think this is something like
from mpmath import lu_solve
coeff = lu_solve(my_matrix,matrix(ylist))
I suspect your real problem is with the maths rather than the software, so I doubt this will work fantastically well for you, but it's always possible!
Did you ever hear of Lagrange or Newton interpolation? This would avoid the whole construction of the Vandermonde matrix, but not the potentially large numbers in the coefficients.
As a general observation, you do not want the inverse matrix. You do not need to compute it. What you want is to solve a system of linear equations.
x = numpy.linalg.solve(A, b)
solves the system A*x=b.
You (really) might want to look up the Runge effect. Interpolation with equally spaced sample points is an increasingly ill-conditioned task. Useful results can be obtained for single-digit degrees; larger degrees tend to give wildly oscillating polynomials.
You can often use polynomial regression, i.e., approximating your data set by the best polynomial of some low degree.
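A short sketch of both points, using made-up sample data (the sine curve, the number of points, and the degree 5 are all assumptions): the condition number of the Vandermonde matrix for equally spaced points grows quickly with its size, while a low-degree least-squares fit with numpy.polyfit stays well behaved.

```python
import numpy as np

# Condition number of the Vandermonde matrix for n equally spaced points.
conds = {}
for n in (5, 10, 20):
    xs = np.linspace(0.0, 1.0, n)
    conds[n] = np.linalg.cond(np.vander(xs))
    print(n, conds[n])

# Low-degree polynomial regression instead of exact interpolation.
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2.0 * np.pi * x)          # hypothetical data set
coeffs = np.polyfit(x, y, deg=5)     # best degree-5 polynomial, least squares
fit = np.polyval(coeffs, x)
max_err = np.max(np.abs(fit - y))
print("max abs error of the degree-5 fit:", max_err)
```

When an exact solution of a square system is really needed, numpy.linalg.solve(A, b) as shown above is still preferable to forming the inverse.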
