I am working with GPFlow to implement a basic Gaussian Process Regressor. In particular, I have defined the following kernel for my GPR:
self.kernel = gpflow.kernels.RBF(lengthscales=1., active_dims=[0, 1]) + \
    gpflow.kernels.RBF(lengthscales=1., active_dims=[2, 3])
self.gp = gpflow.models.GPR(data=(gpXVar, gpYVar), kernel=self.kernel)
The overall parameter space is 4-dimensional, where I expect dimensions (0,1) to share one lengthscale parameter and dimensions (2,3) to share another. I bound the input space to the unit hypercube, and I normalize my labels. My question is: when setting the prior distributions over my kernel lengthscales, how does the internal implementation of gpflow inform my choices?
In particular, I only add a data point to the regression if it is at least a minimum Euclidean distance (1e-4) away from every other point in the dataset. Since I bound the input domain to the unit cube, I know the maximum distance between samples along any single dimension is 1, and the maximum Euclidean distance within the input domain is 2 = sqrt(1+1+1+1). From the excellent case study here:
https://betanalpha.github.io/assets/case_studies/gaussian_processes.html#3_Inferring_A_Gaussian_Process
I know that the prior distribution should lower bound the lengthscale to be 1e-4, but how am I to upper bound the lengthscales? Is the upper bound the maximum distance for a single dimension (1), is it the maximum distance for a given component kernel (sqrt(2)), or is it the maximum distance over the entire input domain (2)?
I have been poring over the source code for GPFlow, but I can't seem to figure out which is correct.
Thank you.
I computed derivatives using different methods, such as:
1. Convolution with an array [[-1, 1]].
2. Using the Fourier theorem: computing the DFT of the image and of the array mentioned above, multiplying them, and performing the IDFT.
3. Directly through the derivative formula (computing the Fourier transform, multiplying by the index and a constant, and computing the inverse).
All methods seem to work almost identically, but have slight differences.
An explanation of why they end up with slightly different results would be appreciated.
After computing those I started playing with the result to learn about it, and I found out something that confused me:
The main thing that baffles me is that when I try computing the median of this derivative, it's ALWAYS 0.0.
Why is that?
I added the code I used to compute this (the first method at least) because maybe I'm doing something wrong.
import numpy as np
from scipy.signal import convolve2d
# sl is my own module that contains read_image (implementation below)
im = sl.read_image(r'C:\Users\ahhal\Desktop\Essentials\Uni\year3\SemesterA\ImageProcessing\Exercises\Ex2\external\monkey.jpg', 1)
b = [[-1, 1]]
print(np.median(convolve2d(im, b)))
output: 0.0
The read_image function is my own and this is the implementation:
from imageio import imread
from skimage.color import rgb2gray
import numpy as np

def read_image(filename, representation):
    """
    Receives an image file and converts it into one of two given representations.
    :param filename: The file name of an image on disk (could be grayscale or RGB).
    :param representation: representation code, either 1 or 2, defining whether the output
    should be a grayscale image (1) or an RGB image (2). If the input image is grayscale,
    we won't call it with representation = 2.
    :return: An image, represented by a matrix of type (np.float64) with intensities
    normalized to the range [0,1].
    """
    assert representation in [1, 2]
    # reads the image
    im = imread(filename)
    if representation == 1:  # If the user specified they need grayscale image,
        if len(im.shape) == 3:  # AND the image is not grayscale yet
            im = rgb2gray(im)  # convert to grayscale (**Assuming its RGB and not a different format**)
    im_float = im.astype(np.float64)  # Convert the image type to one we can work with.
    if im_float.max() > 1:  # If image values are out of bound, normalize them.
        im_float = im_float / 255
    return im_float
Edit 2:
I tried it on several different images, and got 0.0 for all of them.
The image I'm using in the example is:
I computed derivatives using different methods, such as:
1. Convolution with an array [[-1, 1]].
2. Using the Fourier theorem: computing the DFT of the image and of the array mentioned above, multiplying them, and performing the IDFT.
3. Directly through the derivative formula (computing the Fourier transform, multiplying by the index and a constant, and computing the inverse).
These derivative methods are all approximate and make different assumptions:
1. Convolution by [[-1, 1]] computes differences between adjacent elements,
   derivative ~= data[n+1] − data[n]
   You can interpret this as interpolating the data with a line segment, then taking the derivative of that interpolant:
   I(x) = data[n] + (data[n+1] − data[n]) * (x − n)
   So the approximation assumes the underlying function is locally linear. You can analyze the error by Taylor expansion to find that the error comes from the ignored higher-order terms. In other words, the approximation is accurate provided the function doesn't have strong nonlinear terms. This is a simple case of finite differences.
2. This is the same as 1, except with different boundary handling for samples near the edges of the image. By default, scipy.signal.convolve2d does zero padding (though you can use the boundary option to choose some other methods). However, when computing the convolution through the DFT, the boundary handling is implicitly periodic, wrapping around at the image edges. So the results of 1 and 2 differ in a margin of pixels near the edge because of the different boundary handling.
3. Computing the derivative by multiplying by iω in the DFT representation can be interpreted as evaluating the derivative of the sinc interpolation of the data. Sinc interpolation assumes the data is band-limited. The error comes from spectral content beyond the Nyquist frequency. In particular, if there is a hard jump discontinuity at an object boundary, then the image is not band-limited and the DFT-based derivative will have substantial error in the vicinity of the jump, appearing as ringing artifacts.
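To make the three cases concrete, here is a minimal sketch on a synthetic image whose rows are a smooth periodic sinusoid. The test signal and all variable names are my own assumptions, not the asker's data. Note that scipy's convolve2d performs a true convolution (the kernel is flipped), so column j of the result holds im[:, j-1] - im[:, j].

```python
import numpy as np
from scipy.signal import convolve2d

# Synthetic test image (an assumption): every row is the same periodic sinusoid.
N = 16
n = np.arange(N)
omega = 2 * np.pi * 2 / N              # two full periods across each row
im = np.tile(np.sin(omega * n + 0.3), (4, 1))

kernel = np.array([[-1.0, 1.0]])

# Method 1: spatial convolution, zero padded, 'full' output of shape (4, N + 1).
# True convolution flips the kernel: d_conv[:, j] = im[:, j-1] - im[:, j].
d_conv = convolve2d(im, kernel)

# Method 2: multiply the DFTs. This is a circular convolution, so the
# first column wraps around to the last column instead of seeing zeros.
kernel_padded = np.zeros_like(im)
kernel_padded[0, :2] = kernel[0]
d_dft = np.real(np.fft.ifft2(np.fft.fft2(im) * np.fft.fft2(kernel_padded)))

# Method 3: spectral derivative along the rows, multiplying by i * omega_k.
freqs = np.fft.fftfreq(N)              # cycles per sample
d_spec = np.real(np.fft.ifft(np.fft.fft(im, axis=1) * (2j * np.pi * freqs), axis=1))

# Exact derivative of the test signal, for comparison.
analytic = omega * np.cos(omega * n + 0.3)

print(np.abs(d_conv[:, 1:N] - d_dft[:, 1:]).max())   # interior columns agree
print(np.abs(d_conv[:, 0] - d_dft[:, 0]).max())      # boundary column differs
print(np.abs(d_spec - analytic).max())               # spectral: exact for this band-limited signal
print(np.abs(-d_dft - analytic).max())               # finite difference: only approximate
```

Methods 1 and 2 agree away from the boundary column, while method 3 matches the analytic derivative here only because the test signal is periodic and band-limited.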
The main thing that baffles me is that when I try computing the median of this derivative, it's ALWAYS 0.0.
I don't know why this happened here, but it shouldn't always be the case. For instance, if each image row is the unit ramp data[n] = n, then the difference between adjacent elements has magnitude 1 everywhere, except possibly at the edges depending on boundary handling, so the median is nonzero: +1 for data[n+1] − data[n], or −1 for a true convolution by [[-1, 1]], which flips the kernel.
Pascal already gave a wonderful explanation of the differences between the various approximations to the derivative. So I'll focus here on the "why always 0.0?" question.
The median of the derivative is 0.0 only by approximation. When I compute it, based on the finite difference approximation (method #1), I get -5.15e-5 as the median. Close to zero, but not exactly zero.
The derivative is 0 in uniform (flat) regions of the image such as the out-of-focus background. Other features in the image tend to have both a positive and a negative edge, making the histogram of the derivative image very symmetric:
This symmetry causes the median (as well as the mean) to be close to zero for such an image. However, this is not always the case. For example, if the image is brighter on the left edge than the right edge (or the other way around), then there must be a net gradient across the image, causing the mean or median to be different from zero.
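A small sketch of both situations, using synthetic images of my own choosing (not the monkey image): a bright square on a flat background, where every positive edge is matched by a negative one, and a horizontal ramp, which has a net gradient. Because convolve2d flips the kernel, the ramp's differences come out as -1 rather than +1.

```python
import numpy as np
from scipy.signal import convolve2d

b = [[-1, 1]]

# Flat background with one bright square: the derivative is 0 almost
# everywhere, with one column of -1 edges and one column of +1 edges.
square = np.zeros((32, 32))
square[8:24, 8:24] = 1.0
median_square = np.median(convolve2d(square, b))

# Horizontal unit ramp: there is a net gradient across the image, so the
# derivative is the same nonzero value almost everywhere.
ramp = np.tile(np.arange(32, dtype=float), (32, 1))
median_ramp = np.median(convolve2d(ramp, b))

print(median_square, median_ramp)
```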
I am using Umeyama's SVD method to estimate the rigid transformation between two 3-D point sets. See the code snippet below:
Eigen::Matrix4f T_svd;
const pcl::registration::TransformationEstimationSVD<pcl::PointXYZ, pcl::PointXYZ> trans_est_svd;
trans_est_svd.estimateRigidTransformation(source, target, T_svd);
Later on, the source point set is transformed using the transformation matrix estimated above:
pcl::PointCloud<pcl::PointXYZ> target_est;
pcl::transformPointCloud(source, target_est, T_svd);
To calculate the accuracy of the above transformation estimate, this is how I am proceeding:
import numpy as np
# target and target_est clouds were saved as csv
y = np.loadtxt(target, delimiter=',')
y_est = np.loadtxt(target_est, delimiter=',')
y_est = np.sqrt(np.sum((y_est)**2, axis=1))
y_sum = np.sqrt(np.sum((y)**2, axis=1))
acc = y_est/y_sum
mean_acc = acc.mean()
The mean accuracy from the above code comes out as 1.0001, which makes me suspicious of my approach.
I want to know how to define the accuracy of a transformation estimate in 3-D space.
Providing more insight on the answer I already gave at the pcl users forum.
Have a look at equations 1 and 2 of "Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes" by Hinterstoisser et al., commonly known as LineMOD. This is a commonly used method for pose estimation problems with rigid objects.
Equation 1 is for non-symmetric objects and equation 2 for symmetric ones. I can't type the expressions because Stack Overflow doesn't support MathJax, but the general idea is:
Equation 1: the norm of the difference between the location where your point is estimated to be and where it actually is, summed over all points and normalized by the number of points.
Equation 2: the difference from the first one is that you no longer enforce your point correspondences. Each point from the source cloud picks the closest point on the target cloud. Then perform the sum and averaging like before.
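The two ideas above can be sketched with NumPy as follows. This is a minimal illustration, not the paper's reference code: the function names are hypothetical, both clouds are assumed to be (N, 3) arrays, and the closest-point search is brute force, which is only reasonable for small clouds. It also shows why a ratio of point norms is a weak check: any rotation about the origin leaves every norm unchanged, so the ratio stays at 1 even for a wrong estimate.

```python
import numpy as np

def mean_corresponding_distance(target, target_est):
    """Equation 1 idea: mean distance between corresponding points."""
    return np.linalg.norm(target_est - target, axis=1).mean()

def mean_closest_distance(target, target_est):
    """Equation 2 idea: mean distance from each estimated point to its
    closest target point (no fixed correspondences)."""
    diffs = target_est[:, None, :] - target[None, :, :]   # shape (N, M, 3)
    dists = np.linalg.norm(diffs, axis=2)                 # shape (N, M)
    return dists.min(axis=1).mean()

# Hypothetical data: a random cloud and an estimate that is off by 0.1 along x.
rng = np.random.default_rng(0)
target = rng.normal(size=(50, 3))
shifted = target + np.array([0.1, 0.0, 0.0])

# A wrong estimate that a norm ratio cannot detect: rotate 90 degrees about z.
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
rotated = target @ rot_z.T
norm_ratio = (np.linalg.norm(rotated, axis=1) / np.linalg.norm(target, axis=1)).mean()

print(mean_corresponding_distance(target, shifted))   # about 0.1
print(mean_closest_distance(target, shifted))         # never larger than the value above
print(norm_ratio, mean_corresponding_distance(target, rotated))
```

Both errors are in the same units as the point coordinates, so they can be compared against a tolerance such as a fraction of the object's diameter.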
This is not exactly a stack overflow question, it's a math exchange one.
I'm using mean shift clustering to remove unwanted noise from my input data.
Data can be found here. Here is what I have tried so far:
import numpy as np
from sklearn.cluster import MeanShift
data = np.loadtxt('model.txt', unpack = True)
## data size is [3X500]
ms = MeanShift()
ms.fit(data)
After trying several different bandwidth values I am getting only 1 cluster, but the outliers and noise, like those in the picture, are supposed to be in a different cluster.
When I decreased the bandwidth a little more, I ended up with this, which is again not what I was looking for.
Can anyone help me with this?
You can remove outliers before using mean shift.
Statistical removal
For example, fix a number of neighbors to analyze for each point (e.g. 50) and a standard deviation multiplier (e.g. 1). All points whose mean distance to their neighbors is more than 1 standard deviation above the mean of that distance over the whole cloud will be marked as outliers and removed. This technique is used in libpcl, in the class pcl::StatisticalOutlierRemoval, and a tutorial can be found here.
Deterministic removal (radius based)
A simpler technique consists of specifying a radius R and a minimum number of neighbors N. All points that have fewer than N neighbors within a radius of R will be marked as outliers and removed. This technique is also used in libpcl, in the class pcl::RadiusOutlierRemoval, and a tutorial can be found here.
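Since the question uses Python rather than PCL, here is a rough sketch of both filters with scipy's cKDTree. It follows the idea of the two PCL classes, not their exact implementation; the function names, parameters, and test data are my own assumptions, and points are expected as an array of shape (n_points, 3).

```python
import numpy as np
from scipy.spatial import cKDTree

def statistical_outlier_mask(points, k=50, std_mul=1.0):
    """True for points to keep: mean distance to the k nearest neighbors
    is at most global mean + std_mul * global standard deviation."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)   # column 0 is the point itself
    mean_dist = dists[:, 1:].mean(axis=1)
    threshold = mean_dist.mean() + std_mul * mean_dist.std()
    return mean_dist <= threshold

def radius_outlier_mask(points, radius, min_neighbors):
    """True for points to keep: at least min_neighbors other points
    lie within the given radius."""
    tree = cKDTree(points)
    neighbors = tree.query_ball_point(points, r=radius)
    counts = np.array([len(idx) for idx in neighbors]) - 1   # exclude the point itself
    return counts >= min_neighbors

# Hypothetical data: one dense blob plus a single far-away outlier.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.1, size=(200, 3)), [[10.0, 10.0, 10.0]]])

keep_stat = statistical_outlier_mask(points, k=10, std_mul=1.0)
keep_radius = radius_outlier_mask(points, radius=0.5, min_neighbors=5)
filtered = points[keep_stat & keep_radius]
print(len(points), len(filtered))
```

The radius and neighbor counts need tuning to the density of the actual data; the values above only suit this toy cloud.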
Mean-shift is not meant to remove low-density areas.
It tries to move all data to the most dense areas.
If there is one single most dense point, then everything should move there, and you get only one cluster.
Try a different method. Maybe remove the outliers first.
Set this parameter to False:
cluster_all : bool, default=True
If true, then all points are clustered, even those orphans that are not within any kernel. Orphans are assigned to the nearest kernel. If false, then orphans are given cluster label -1.
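A minimal sketch of how this could look, on synthetic data of my own (a dense blob plus two far-away points). Two assumptions are worth flagging: scikit-learn expects data of shape (n_samples, n_features), so a [3 x 500] array would need transposing first; and I use bin_seeding=True with min_bin_freq=5 so that isolated points are not used as seeds, because as far as I can tell, with the default seeding every point is a seed and an isolated point can end up as its own one-point cluster instead of an orphan.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Hypothetical data: 200 points in a dense blob, then 2 isolated outliers.
rng = np.random.default_rng(0)
blob = rng.normal(0.0, 0.1, size=(200, 3))
outliers = np.array([[5.0, 5.0, 5.0], [-5.0, 5.0, -5.0]])
points = np.vstack([blob, outliers])          # shape (n_samples, n_features)

# cluster_all=False: points farther than one bandwidth from every
# cluster center get the label -1 instead of being forced into a cluster.
ms = MeanShift(bandwidth=0.5, bin_seeding=True, min_bin_freq=5, cluster_all=False)
ms.fit(points)
labels = ms.labels_

inliers = points[labels != -1]
print(np.unique(labels), len(inliers))
```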
I am trying to run k-means clustering with selected initial centroids.
The documentation here says
that to specify your initial centers:
init : {‘k-means++’, ‘random’ or an ndarray}
If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.
My code in Python:
X = np.array([[-19.07480000, -8.536],
[22.010800000,-10.9737],
[12.659700000,19.2601]], np.float64)
km = KMeans(n_clusters=3,init=X).fit(data)
# print km
centers = km.cluster_centers_
print centers
Returns an error:
RuntimeWarning: Explicit initial center position passed: performing only one init in k-means instead of n_init=10
n_jobs=self.n_jobs)
and returns the same initial centers. Any idea how to form the initial centers so that they are accepted?
The default behavior of KMeans is to run the algorithm multiple times, each time from a different set of initial centroids (chosen by k-means++ by default, or uniformly at random, i.e. the Forgy method, with init='random'). The number of initializations is controlled by the n_init= parameter (docs):
n_init : int, default: 10
Number of time the k-means algorithm will be run with different
centroid seeds. The final results will be the best output of
n_init consecutive runs in terms of inertia.
If you pass an array as the init= argument then only a single initialization will be performed using the centroids explicitly specified in the array. You are getting a RuntimeWarning because you are still passing the default value of n_init=10 (here are the relevant lines of source code).
It's actually totally fine to ignore this warning, but you can make it go away completely by passing n_init=1 if your init= parameter is an array.
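For illustration, a small self-contained sketch with explicit initial centers and n_init=1. The initial centers are the ones from the question; the data is synthetic (an assumption), drawn around those centers so the example can run on its own.

```python
import numpy as np
from sklearn.cluster import KMeans

# Initial centers from the question, shape (n_clusters, n_features).
init_centers = np.array([[-19.0748, -8.536],
                         [22.0108, -10.9737],
                         [12.6597, 19.2601]], dtype=np.float64)

# Hypothetical data: 50 noisy points around each initial center.
rng = np.random.default_rng(0)
data = np.vstack([c + rng.normal(scale=2.0, size=(50, 2)) for c in init_centers])

# With an explicit init array only one run makes sense, so say so with n_init=1.
km = KMeans(n_clusters=3, init=init_centers, n_init=1).fit(data)
centers = km.cluster_centers_
print(centers)
```

If the fitted centers come back identical to the initial ones on real data, it is worth checking that the data actually has points near each initial center, since the warning itself does not stop the algorithm from iterating.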
I'm trying to understand what f_regression() in the feature selection package does.
(http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_regression.html#sklearn.feature_selection.f_regression)
According to the documentation, the first step in f_regression is as follows:
"1. the regressor of interest and the data are orthogonalized wrt constant regressors."
What does this line mean, exactly? What are these constant regressors?
Thanks!
It means that the mean is subtracted from both variables.
A constant regressor is a vector full of ones. What this vector can explain in your data is then subtracted out. This leads to a vector with zero sum, i.e. a centered variable.
What f_regression essentially calculates is correlation, a scalar product between centered and appropriately rescaled variables.
The resulting score is a function of this value and the degrees of freedom, i.e. the dimensionality of the vectors. The higher the score, the more likely it is that the variables are associated.
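As a sanity check, the score can be reproduced by hand for a single feature. This sketch assumes the default center=True, for which the degrees of freedom are n - 2; the data and variable names are made up for the example.

```python
import numpy as np
from sklearn.feature_selection import f_regression

# Hypothetical data: one feature x and a noisy linear response y.
rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

# Step 1: orthogonalize with respect to the constant regressor,
# i.e. subtract the mean from both variables.
xc = x - x.mean()
yc = y - y.mean()

# Step 2: correlation = scalar product of the centered, rescaled vectors.
r = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))

# Step 3: turn the correlation into an F score using the degrees of freedom.
dof = n - 2
f_manual = r**2 / (1 - r**2) * dof

f_sklearn, p_values = f_regression(x.reshape(-1, 1), y)
print(f_manual, f_sklearn[0], p_values[0])
```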