Based on the cv2.kmeans function, I have written a function "F(Image)" with "label" as output.
ret,label,center=cv2.kmeans(Image,K,None,criteria,10,cv2.KMEANS_RANDOM_CENTERS)
The output of F(Image), "label", is later used for other image processing.
However, I need to run F(Image) for numerous images. I noticed that the labels are different if I run, say, F(Image1) and F(Image2) consecutively versus running F(Image1) and F(Image2) separately.
My suspicion is that every time cv2.KMEANS_RANDOM_CENTERS is used, it starts from a different random number.
Without going into the source code of cv2.KMEANS_RANDOM_CENTERS, is there any way to ensure that the labels are the same every time I run the code? In other words, can I run F(Image1) and F(Image2) consecutively and get the same labels as when they are run separately?
cv2.kmeans() takes two types of flags for center initialization: cv2.KMEANS_PP_CENTERS and cv2.KMEANS_RANDOM_CENTERS.
cv2.KMEANS_RANDOM_CENTERS:
With this flag enabled, the method always starts with a random set of initial samples and tries to converge from there, depending on your TermCriteria.
Pros:
Saves computation time.
Cons:
Doesn't guarantee the same labels for the exact same image.
cv2.KMEANS_PP_CENTERS:
With this flag enabled, the method first iterates over the whole image to determine the probable centers and then starts to converge.
Pros:
Yields optimal and consistent results for the same input image.
Cons:
Takes extra computation time to iterate over all the pixels and determine the probable centers.
Note: I have also read about another flag, cv::KMEANS_USE_INITIAL_LABELS, with which you can pass custom initial labels for the method to start from. But that flag is not mentioned in the documentation linked, so I am not sure whether it has been deprecated or the documentation is simply not up to date.
The method only keeps the best labelling (lowest compactness) across attempts. So if you set the attempts argument high enough, say cv2.kmeans(Image,K,None,criteria,100,cv2.KMEANS_RANDOM_CENTERS), the results will usually be very similar.
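If you want bit-for-bit reproducible labels, the most direct lever is to fix OpenCV's global RNG before each call (or switch to kmeans++ seeding). A minimal sketch, assuming a 3-channel image and a placeholder file name:

import cv2
import numpy as np

img = cv2.imread("image.png")                    # placeholder path
samples = img.reshape(-1, 3).astype(np.float32)  # cv2.kmeans expects float32 rows

K = 4
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)

# Option 1: kmeans++ seeding, which is less sensitive to the starting point.
ret, label, center = cv2.kmeans(samples, K, None, criteria, 10,
                                cv2.KMEANS_PP_CENTERS)

# Option 2: keep KMEANS_RANDOM_CENTERS but reset OpenCV's RNG before every call,
# so repeated runs on the same image start from the same random centers.
cv2.setRNGSeed(42)
ret2, label2, center2 = cv2.kmeans(samples, K, None, criteria, 10,
                                   cv2.KMEANS_RANDOM_CENTERS)

Note that even with a fixed seed, which cluster gets which label index can still differ between images, so if you need stable label numbering across images you may also want to reorder the labels by their center values afterwards.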
Related
Suppose I have a system that is driven by a signal comprising 3 voltage levels (let's say -V1, 0, V1). I need to determine the composition of the signal that most accurately produces the desired output. The output is a single number that represents the current state of the system. The number of possible permutations for such a signal is too high to brute-force in search of the global minimum, i.e. exploring the entirety of the search space is impossible. However, I do have a model that simulates the system, so I can still evaluate several possible options. How can I find the best signal to produce the desired output (in other words, the signal that drives the system to the desired state)?
One method that I have right now involves producing a starting set (i.e. a small subset of the search space) of signals that align with a set of constraints, finding the signal that produces output closest to the desired output, and making modifications to this signal (i.e. fine-tuning) in order to obtain the desired output. This final step is difficult for me, as I am doing it manually. One idea to automate this final step is to parametrize all possible modifications (for instance, parameter x1 = 1 adds a single -V1 'frame' to the signal, x1 = 2 adds two such frames, x1 = -1 removes a -V1 frame, and so on), and step through the set of possible modifications. But again, there are a lot of possibilities. To improve upon this, I explored the effect that modifications have on the system output. The effects of these modifications look somewhat predictable (the distributions of the changes in output they produce generally follow Gaussian distributions). But I'm not sure how to proceed from here. What models/schemes would you suggest I use? Can I use information from the distributions of changes produced by modifications to intelligently fine-tune the signal? How do I account for outliers (i.e. cases wherein the modification(s) to an initial signal produce a change in output that lies in the tail end of the distribution)?
Edit: Forgot to mention, but the constraints on the signal would be length (the number of frames/steps in the signal must be less than or equal to a finite positive integer, N) and total potential (i.e. the sum of the voltages in the signal should equal an integer, V).
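For what it's worth, here is a minimal sketch of the kind of constraint-preserving, modification-based search described above, not a recommendation of a specific scheme. Everything in it (simulate, V1, N, V_TOTAL, DESIRED, the mutation moves) is a placeholder assumption to be replaced by the real model and constraints:

import numpy as np

rng = np.random.default_rng(0)

V1 = 1.0        # voltage level (assumption)
N = 50          # maximum number of frames (assumption)
V_TOTAL = 5.0   # required total potential (assumption)
DESIRED = 3.2   # desired system output (assumption)

def simulate(signal):
    # Stand-in for the real system model; replace with your simulator.
    return float(np.sum(signal * np.linspace(1.0, 0.5, len(signal))))

def random_signal():
    # Satisfy both constraints by construction: k net +V1 frames give the
    # required total potential, and +V1/-V1 pairs cancel out.
    k = int(round(V_TOTAL / V1))
    pairs = rng.integers(0, (N - k) // 2 + 1)
    zeros = rng.integers(0, N - k - 2 * pairs + 1)
    sig = np.concatenate([np.full(k, V1), np.full(pairs, V1),
                          np.full(pairs, -V1), np.zeros(zeros)])
    rng.shuffle(sig)
    return sig

def mutate(sig):
    # One constraint-preserving edit: reorder two frames, insert a 0 frame,
    # or drop a 0 frame (length stays <= N, total potential is unchanged).
    cand = sig.copy()
    move = rng.integers(3)
    if move == 0 and len(cand) >= 2:
        i, j = rng.integers(0, len(cand), size=2)
        cand[i], cand[j] = cand[j], cand[i]
    elif move == 1 and len(cand) < N:
        cand = np.insert(cand, rng.integers(0, len(cand) + 1), 0.0)
    elif move == 2 and np.any(cand == 0.0):
        cand = np.delete(cand, rng.choice(np.flatnonzero(cand == 0.0)))
    return cand

# Start from the best of a small random batch, then greedily fine-tune.
best = min((random_signal() for _ in range(50)),
           key=lambda s: abs(simulate(s) - DESIRED))
best_err = abs(simulate(best) - DESIRED)
for _ in range(2000):
    cand = mutate(best)
    err = abs(simulate(cand) - DESIRED)
    if err < best_err:
        best, best_err = cand, err
print(len(best), best.sum(), best_err)

The greedy acceptance rule could be replaced with something that uses the Gaussian statistics of each modification's effect (for example, preferring the modification whose expected change best matches the remaining error), which seems to be the direction the question is pointing at.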
The Problem
I am looking to tackle a minimization problem using scipy's optimization utilities.
Specifically, I've been using this function:
result = spo.minimize(s21_mag, goto_start, options={"disp": True}, bounds=bnds)
My s21_mag function takes a couple of seconds to return an output (due to physically moving motors). It consists of 3 parameters (3 moving parts), with no constraints - just three bounds (identical for all 3 parameters):
bnds = ((0,45000),(0,45000),(0,45000))
The limit on the number of iterations is not much of a constraint (1000 is probably a good enough upper limit for me), but I expect the optimizer to try many configurations within that budget to identify an optimal value. So far, the methods I've tried just seem to converge somewhere without making meaningful progress.
Here's progress beyond the 50th iteration (full code here) - the goal is the maximization of S21 at a specific frequency (purple vertical line):
This is with no method passed to spo.minimize(), so it uses the default (and it looks like it applies the exact same movement to each motor).
Questions
Although scipy's minimization function offers a wide variety of optimization methods/algorithms, how could I (as a beginner in optimization math) select the one that would work best for my application? What aspects of my problem should I take into account to reach such a conclusion? Assume I have no idea about the initial value of each parameter and want the optimizer to figure that out (I usually just set it to the midpoint, i.e. initial: x1=x2=x3=22500).
The same set of parameters as an input to my s21_mag function could yield different results at different times the function is called.
This happens for two reasons:
(a) The parameter step of the optimizer can get extremely small (particularly as the number of iterations increases and convergence is approached), whereas the motor expects a minimum value of ~100 to make a step.
Is there a way to somehow set a minimum step? Otherwise, it tries to step from e.g. 1234.0 to 1234.0001 and eventually gets "stuck" trying tiny changes.
(b) The output of the function goes through a measuring instrument, which exhibits a little bit of noise (e.g. one measurement may yield 5.42 dB, while another measurement (with the exact same parameters) may yield 5.43 dB).
Is there a way to deal with these kinds of small variabilities/errors so that they don't confuse the optimizer?
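One common workaround for both issues, sketched below under assumptions (the placeholder s21_mag, the ~100-count motor resolution mentioned in the question, and the choice of Powell, which is derivative-free and therefore more tolerant of noisy objectives; recent SciPy versions also accept bounds with it), is to quantize the parameters to the motor resolution and average a few repeated measurements inside the objective:

import numpy as np
import scipy.optimize as spo

MOTOR_STEP = 100  # smallest move the motors can resolve (from the question)

def s21_mag(x):
    # Placeholder for the real measurement routine (motors + instrument).
    return -np.exp(-np.sum((np.asarray(x) - 30000.0) ** 2) / 1e9)

def s21_mag_quantized(x, n_repeats=3):
    # Snap the parameters to the motor resolution and average a few
    # measurements to damp instrument noise before returning the value.
    x_snapped = np.round(np.asarray(x, dtype=float) / MOTOR_STEP) * MOTOR_STEP
    return float(np.mean([s21_mag(x_snapped) for _ in range(n_repeats)]))

bnds = ((0, 45000), (0, 45000), (0, 45000))
x0 = (22500, 22500, 22500)

result = spo.minimize(s21_mag_quantized, x0, method="Powell", bounds=bnds,
                      options={"disp": True})
print(result.x, result.fun)

This is only an illustration of the quantize-and-average idea, not a claim that Powell is the best method for this setup.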
I'm trying to do some physical experiments to find a formulation that optimizes some parameters. By physical experiments I mean I have a chemistry bench, I'm mixing stuff together, then measuring the properties of that formulation. Historically I've used traditional DOEs, but I need to speed up my time to getting to the ideal formulation. I'm aware of simplex optimization, but I'm interested in trying out Bayesian optimization. I found GPyOpt which claims (even in the SO Tag description) to support physical experiments. However, it's not clear how to enable this kind of behavior.
One thing I've tried is to collect user input via input(), and I suppose I could pickle the optimizer and function, but this feels kludgy. In the example code below, I use the function from the GPyOpt example, but I have to type in the actual value.
from GPyOpt.methods import BayesianOptimization
import numpy as np

# --- Define your problem
def f(x):
    return (6*x-2)**2*np.sin(12*x-4)

def g(x):
    print(f(x))
    return float(input("Result?"))

domain = [{'name': 'var_1', 'type': 'continuous', 'domain': (0, 1)}]

myBopt = BayesianOptimization(f=g,
                              domain=domain,
                              X=np.array([[0.745], [0.766], [0], [1], [0.5]]),
                              Y=np.array([[f(0.745)], [f(0.766)], [f(0)], [f(1)], [f(0.5)]]),
                              acquisition_type='LCB')
myBopt.run_optimization(max_iter=15, eps=0.001)
So, my question is: what is the intended way of using GPyOpt for physical experimentation?
A few things.
First, set f=None. Note that this has the side effect of causing the BO object to ignore maximize=True, if you happen to be using that option.
Second, rather than use run_optimization, you want suggest_next_locations. The former runs the entire optimization, whereas the latter just runs a single iteration. This method returns a vector with parameter combinations ("locations") to go test in the lab.
Third, you'll need to make some decisions regarding batch size. The number of combinations/locations that you get is controlled by the batch_size parameter that you use to initialize the BayesianOptimization object. Choice of acquisition function is important here, because some are closely tied to a batch_size of 1. If you need larger batches, then you'll need to read the docs for combinations suitable to your situation (e.g. acquisition_type=EI and evaluator_type=local_penalization).
Fourth, you'll need to explicitly manage the data between iterations. There are at least two ways to approach this. One is to pickle the BO object and add more data to it. An alternative that I think is more elegant is to instead create a completely fresh BO object each time. When instantiating it, you concatenate the new data to the old data, and just run a single iteration on the whole set (again, using suggest_next_locations). This might be kind of insane if you were using BO to optimize a function in silico, but considering how slow the chemistry steps are likely to be, this might be the cleanest approach (and the easiest to make mid-course corrections with).
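Putting those pieces together, a minimal sketch of one lab iteration might look like the following (the X/Y arrays, batch size, and acquisition/evaluator combination are illustrative assumptions, not prescriptions):

import numpy as np
from GPyOpt.methods import BayesianOptimization

# All experiments performed so far: settings tried (X) and measured responses (Y).
X = np.array([[0.745], [0.766], [0.0], [1.0], [0.5]])
Y = np.array([[2.27], [2.49], [3.03], [15.83], [0.91]])  # made-up lab measurements

domain = [{'name': 'var_1', 'type': 'continuous', 'domain': (0, 1)}]

# Fresh BO object built from all data so far; f=None because the "function"
# is a physical experiment, so GPyOpt only proposes the next point(s).
bo = BayesianOptimization(f=None, domain=domain, X=X, Y=Y,
                          acquisition_type='EI',
                          evaluator_type='local_penalization',
                          batch_size=3)

x_next = bo.suggest_next_locations()  # a (3, 1) array of settings to try next
print(x_next)

# After running those 3 experiments and measuring y_next in the lab:
# X = np.vstack([X, x_next]); Y = np.vstack([Y, y_next])
# ...then rebuild the BO object from the enlarged X/Y and repeat.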
Hope this helps!
I'd be thankful for all thoughts, tips or links on this:
Using TF 1.10 and the recent object detection API (GitHub, 2018-08-18) I can do box and mask prediction using the PETS dataset as well as my own proof-of-concept dataset:
But when training on the cityscapes traffic signs (single class) I am having trouble achieving any results. I have adapted the anchors to respect the much smaller objects, and it seems the RPN is doing something useful at least:
Anyway, the box predictor is not going into action at all. That means I am not getting any boxes at all, let alone masks.
My pipelines are mostly or even exactly like the sample configs.
So I'd expect either problems with the specific type of data or a bug.
Would you have any tips/links on how to (either)
visualize the RPN results when using 2 or 3 stages? (Using only one stage does that, but how would one force that?)
train the RPN first and continue with boxes later?
investigate where/why the boxes get lost? (having predictions with zero scores while evaluation yields zero classification error)
The solution finally turned out to be a combination of multiple issues:
The parameter from_detection_checkpoint: true is deprecated and should be replaced by fine_tune_checkpoint_type: 'detection'. However, without either of them the framework seems to default to 'classification', which seems to break the whole idea of the object detection framework. It is not a good idea to rely on the defaults here (see the config sketch after this list).
My data wasn't prepared well enough. I had boxes with zero width and/or height (for whatever reason). I also removed masks for instances that were disconnected.
Using the keep_aspect_ratio_resizer together with random_crop_image and random_coef: 0.0 does not seem to allow for the full resolution, as the resizer seems to be applied before the random cropping. I now split my input images into (vertical) stripes [to save memory] and apply random_crop_image with a small min_area so it does not skip the small features. Also, I can now allow max_area: 1 and a random coefficient > 0, since the memory usage is dealt with.
One potential problem also arose from the fact that I only considered a single class (so far). This might be a problem either for the framework or for the activation function in the network. However, in combination with the other fixes, this change seemed to cause no additional problems, at least.
Last but not least, I updated the sources to the 2018-10-02 version but didn't walk through all the modifications in detail.
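For reference, the checkpoint-type and random-crop parts of a pipeline config might look roughly like this; the field names are as I recall them from the 2018-era train.proto/preprocessor.proto and the values are placeholders, so verify both against your checkout:

train_config {
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  # replaces the deprecated from_detection_checkpoint: true
  fine_tune_checkpoint_type: "detection"
  data_augmentation_options {
    random_crop_image {
      min_area: 0.05     # small enough so crops keep the tiny signs
      max_area: 1.0
      random_coef: 0.25  # occasionally keep the full image
    }
  }
}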
I hope these findings save others some time and trouble.
I tried to use the option of putting monotonic constraints in XGBoost (without using the Scikit-Learn wrapper; see my previous post here), but I now would like to check that this was correctly applied.
As this type of model is a kind of black box, I usually look at some KPIs related to the overall accuracy of the model (logloss, RMSE, etc.) but not directly at the effect of each feature.
Is there an easy way (or, alternatively, a complex one) to do so and check that the monotonicity was effectively applied?
At this stage, what comes to my mind is 1/ to take one observation from the test set, 2/ to duplicate it, let's say 10 or 100 times, 3/ to manually vary one of the features for which monotonicity should apply, and 4/ to plot the predicted values (a sketch of such a check is shown below). That is not straightforward (especially considering that I want to check it on ~25 features...) nor very robust (I do not change the values of the other features that are correlated to the one I am looking at).
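For illustration, a minimal version of that duplicate-and-vary check could look like the following; the toy data, constraint vector, and feature indices are assumptions, so substitute your own trained booster and test observations:

import numpy as np
import xgboost as xgb

# Toy data and model just to have a constrained booster to check;
# in practice use your own trained model and a row from your test set.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

dtrain = xgb.DMatrix(X, label=y)
params = {"objective": "reg:squarederror",
          "monotone_constraints": "(1,-1,0)"}  # +1 increasing, -1 decreasing, 0 free
model = xgb.train(params, dtrain, num_boost_round=50)

def check_monotone(model, x_row, feature_idx, grid_size=100, increasing=True):
    # Duplicate one observation, sweep a single feature over a grid while
    # keeping the others fixed, and verify the predictions only move one way.
    grid = np.tile(x_row, (grid_size, 1))
    grid[:, feature_idx] = np.linspace(0.0, 1.0, grid_size)
    preds = model.predict(xgb.DMatrix(grid))
    diffs = np.diff(preds)
    return bool(np.all(diffs >= -1e-9)) if increasing else bool(np.all(diffs <= 1e-9))

print(check_monotone(model, X[0].copy(), feature_idx=0, increasing=True))   # expect True
print(check_monotone(model, X[0].copy(), feature_idx=1, increasing=False))  # expect True

Looping this over the ~25 constrained features (and over a handful of different base observations) automates the manual procedure described above; it does not address the correlation caveat, which would need something like a joint perturbation of correlated features.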
Any suggestion is welcome!