Bounding box definition for OpenCV object tracking - Python

How is the bounding box defined that is passed to OpenCV's tracker.init() function?
Is it (xcenter, ycenter, boxwidth, boxheight),
or (xmin, ymin, xmax, ymax),
or (ymin, xmin, ymax, xmax),
or something completely different?
I am using Python and OpenCV 3.3, and I basically do the following on each object I want to track, for each frame of a video:
tracker = cv2.TrackerKCF_create()
ok = tracker.init(previous_frame, bbox)
ok, bbox = tracker.update(current_frame)

The answer is: (xmin, ymin, boxwidth, boxheight)
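Since many other libraries use corner coordinates instead, a small pair of helpers (hypothetical names, not part of OpenCV) avoids mixing up the two conventions:

```python
def to_cv_bbox(xmin, ymin, xmax, ymax):
    """Corner coordinates -> OpenCV's (x, y, width, height) tuple."""
    return (xmin, ymin, xmax - xmin, ymax - ymin)

def from_cv_bbox(x, y, w, h):
    """OpenCV's (x, y, width, height) -> corner coordinates."""
    return (x, y, x + w, y + h)

print(to_cv_bbox(10, 20, 50, 80))    # (10, 20, 40, 60)
print(from_cv_bbox(10, 20, 40, 60))  # (10, 20, 50, 80)
```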

The other post states the answer as a fact, so let's look at how to figure it out on your own.
The Python version of OpenCV is a wrapper around the main C++ API, so when in doubt, it's always useful to consult either the main documentation, or even the source code. There is a short tutorial providing some basic information about the Python bindings.
First, let's look at cv::TrackerKCF. The init member takes the bounding box as an instance of cv::Rect2d (i.e. a variant of cv::Rect_ which represents the parameters using double values):
bool cv::Tracker::init(InputArray image, const Rect2d& boundingBox)
Now, the question is, how is a cv::Rect2d (or in general, the variants of cv::Rect_) represented in Python? I haven't found any part of documentation that states this clearly (although I think it's hinted at in the tutorials), but there is some useful information in the bindings tutorial mentioned earlier:
... But there may be some basic OpenCV datatypes like Mat, Vec4i, Size. They need to be extended manually. For example, a Mat type should be extended to Numpy array, Size should be extended to a tuple of two integers etc. ... All such manual wrapper functions are placed in modules/python/src2/cv2.cpp.
Not much, so let's look at the code they point us at. Lines 941-954 are what we're after:
template<>
bool pyopencv_to(PyObject* obj, Rect2d& r, const char* name)
{
    (void)name;
    if(!obj || obj == Py_None)
        return true;
    return PyArg_ParseTuple(obj, "dddd", &r.x, &r.y, &r.width, &r.height) > 0;
}

template<>
PyObject* pyopencv_from(const Rect2d& r)
{
    return Py_BuildValue("(dddd)", r.x, r.y, r.width, r.height);
}
The PyArg_ParseTuple in the first function is quite self-explanatory. A 4-tuple of double (floating point) values, in the order x, y, width and height.

Related

How does one deal with various errors in statically typed languages (or when typing in general)

For context, my primary language is Python, and I'm just beginning to use annotations. This is in preparation for learning C++ (and because, intuitively, it feels better).
I have something like this:
from models import UserLocation
from typing import Optional
import cluster_module
import db

def get_user_location(user_id: int, data: list) -> Optional[UserLocation]:
    loc = UserLocation.query.filter_by(user_id=user_id).one_or_none()
    if loc:
        return loc
    try:
        clusters = cluster_module.cluster(data)
    except ValueError:
        return None  # cluster throws an error if there is not enough data to cluster
    if list(clusters.keys()) == [-1]:
        # If there is enough data to cluster, the cluster with an index of -1
        # represents all data that didn't fit into a cluster. It's possible
        # for NO data to fit into a cluster.
        return None
    loc = UserLocation(user_id=user_id, location=clusters[0].center)
    db.session.add(loc)
    db.session.commit()
    return loc
So, I use typing.Optional to ensure that I can return None in case there's an error (if I understand correctly, the static-typing-language equivalent of this would be to return a null pointer of the appropriate type). Though, how does one distinguish between the two errors? What I'd like to do, for example, is return -1 if there's not enough data to cluster and -2 if there's data, but none of them fit into a cluster (or some similar thing). In Python, this is easy enough (because it isn't statically typed). Even with mypy, I can say something like typing.Union[UserLocation, int].
But, how does one do this in, say, C++ or Java? Would a Java programmer need to do something like set the function to return int, and return the ID of UserLocation instead of the object itself (then, whatever code uses the get_user_location function would itself do the lookup)? Is there runtime benefit to doing this, or is it just restructuring the code to fit the fact that a language is statically typed?
I believe I understand most of the obvious benefits of static typing w.r.t. code readability, compile-time, and efficiency at runtime—but I'm not sure what to make of this particular issue.
In a nutshell: How does one deal with functions (which return a non-basic type) indicating they ran into different errors in statically typed languages?
The direct C++ equivalent to the Python solution would be std::variant<T, U>, where T is the expected return value and U the error code type. You can then check which of the types the variant contains and go from there. For example:
#include <cstdlib>
#include <iostream>
#include <string>
#include <variant>

using t_error_code = int;

// Might return either `std::string` OR `t_error_code`
std::variant<std::string, t_error_code> foo()
{
    // This would cause a `t_error_code` to be returned
    //return 10;

    // This causes an `std::string` to be returned
    return "Hello, World!";
}

int main()
{
    auto result = foo();

    // Similar to the Python `if isinstance(result, t_error_code)`
    if (std::holds_alternative<t_error_code>(result))
    {
        const auto error_code = std::get<t_error_code>(result);
        std::cout << "error " << error_code << std::endl;
        return EXIT_FAILURE;
    }

    std::cout << std::get<std::string>(result) << std::endl;
}
However, this isn't often seen in practice. If a function is expected to fail, then a single failure value like a nullptr or an end iterator suffices; such failures are expected and aren't errors. If failure is unexpected, exceptions are preferred, which also eliminates the problem you describe here. It's unusual to both expect failure and care about the details of why the failure occurred.
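For comparison, here is a sketch of the Python counterpart using typing.Union, mirroring the C++ example above (the names are illustrative, matching the C++ snippet):

```python
from typing import Union

ErrorCode = int  # plays the role of t_error_code

def foo() -> Union[str, ErrorCode]:
    # Returning an int here would signal an error code instead
    return "Hello, World!"

result = foo()
# Similar to `std::holds_alternative<t_error_code>(result)` in the C++ version
if isinstance(result, ErrorCode):
    print("error", result)
else:
    print(result)  # Hello, World!
```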

Pybind11 and std::vector -- How to free data using capsules?

I have a C++ function that returns a std::vector and, using Pybind11, I would like to return the contents of that vector as a Numpy array without having to copy the underlying data of the vector into a raw data array.
Current Attempt
In this well-written SO answer the author demonstrates how to ensure that a raw data array created in C++ is appropriately freed when the Numpy array has zero reference count. I tried to write a version of this using std::vector instead:
// aside - I made a templated version of the wrapper, from which
// I create specific instances in the PYBIND11_MODULE definitions:
//
//   m.def("my_func", &wrapper<int>, ...)
//   m.def("my_func", &wrapper<float>, ...)
//
template <typename T>
py::array_t<T> wrapper(py::array_t<T> input) {
    auto proxy = input.template unchecked<1>();
    std::vector<T> result = compute_something_returns_vector(proxy);

    // give memory cleanup responsibility to the Numpy array
    py::capsule free_when_done(result.data(), [](void *f) {
        auto foo = reinterpret_cast<T *>(f);
        delete[] foo;
    });

    return py::array_t<T>({result.size()}, // shape
                          {sizeof(T)},     // stride
                          result.data(),   // data pointer
                          free_when_done);
}
Observed Issues
However, if I call this from Python I observe two things: (1) the data in the output array is garbage and (2) when I manually delete the Numpy array I receive the following error (SIGABRT):
python3(91198,0x7fff9f2c73c0) malloc: *** error for object 0x7f8816561550: pointer being freed was not allocated
My guess is that this issue has to do with the line "delete[] foo", which presumably is being called with foo set to result.data(). This is not the way to deallocate a std::vector.
Possible Solutions
One possible solution is to create a T *ptr = new T[result.size()] and copy the contents of result to this raw data array. However, I have cases where the results might be large and I want to avoid taking all of that time to allocate and copy. (But perhaps it's not as long as I think it would be.)
Also, I don't know much about std::allocator but perhaps there is a way to allocate the raw data array needed by the output vector outside the compute_something_returns_vector() function call and then discard the std::vector afterwards, retaining the underlying raw data array?
The final option is to rewrite compute_something_returns_vector.
After an offline discussion with a colleague I resolved my problem. I do not want to commit an SO faux pas so I won't accept my own answer. However, for the sake of using SO as a catalog of information I want to provide the answer here for others.
The problem was simple: result was stack-allocated and needed to be heap-allocated so that free_when_done can take ownership. Below is an example fix:
{
    // ... snip ...
    std::vector<T> *result = new std::vector<T>(compute_something_returns_vector(proxy));

    py::capsule free_when_done(result, [](void *f) {
        auto foo = reinterpret_cast<std::vector<T> *>(f);
        delete foo;
    });

    return py::array_t<T>({result->size()}, // shape
                          {sizeof(T)},      // stride
                          result->data(),   // data pointer
                          free_when_done);
}
I was also able to implement a solution using std::unique_ptr that doesn't require the use of a free_when_done function. However, I wasn't able to run Valgrind with either solution so I'm not 100% sure that the memory held by the vector was appropriately freed. (Valgrind + Python is a mystery to me.) For completeness, below is the std::unique_ptr approach:
{
    // ... snip ...
    std::unique_ptr<std::vector<T>> result =
        std::make_unique<std::vector<T>>(compute_something_returns_vector(proxy));

    return py::array_t<T>({result->size()}, // shape
                          {sizeof(T)},      // stride
                          result->data());  // data pointer
}
I was, however, able to inspect the addresses of the vectors allocated in both the Python and C++ code and confirmed that no copies of the output of compute_something_returns_vector() were made.
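The no-copy behaviour being chased here has a pure-Python analogue in memoryview, which shares rather than copies its underlying buffer; this is only a rough analogy (not pybind11 itself) for what the capsule-owning py::array_t achieves:

```python
buf = bytearray(8)      # stands in for the C++-owned storage
view = memoryview(buf)  # zero-copy view, like the returned py::array_t
view[0] = 255           # writes go straight through to the underlying buffer
print(buf[0])           # 255 -- proof that no copy was made
```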

Is it ok to call `tape.watch(x)` when `x` is already a `tf.Variable` in TensorFlow?

Consider the following function
def foo(x):
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = x**2 + x + 4
    return tape.gradient(y, x)
The call to tape.watch(x) is necessary if the function is called with, say, foo(tf.constant(3.14)), but not when a variable is passed in directly, such as foo(tf.Variable(3.14)).
Now my question is: is the call to tape.watch(x) safe even when a tf.Variable is passed in directly? Or will some strangeness happen due to the variable already being auto-watched and then watched manually again? What is the correct way to write general functions like this that can accept both tf.Tensor and tf.Variable?
It should be safe. On the one hand, the documentation of tf.GradientTape.watch says:
Ensures that tensor is being traced by this tape.
"Ensures" seems to imply that it will make sure it is traced in case it is not. In fact, the documentation does not give any indication that using it twice over the same object should be a problem (although it wouldn't hurt if they made that explicit).
But in any case, we can dig into the source code to check. In the end, calling watch on a variable (the answer ends up the same if it's not a variable but the path diverges slightly) comes down to the WatchVariable method of a GradientTape class in C++:
void WatchVariable(PyObject* v) {
    tensorflow::Safe_PyObjectPtr handle(PyObject_GetAttrString(v, "handle"));
    if (handle == nullptr) {
        return;
    }
    tensorflow::int64 id = FastTensorId(handle.get());
    if (!PyErr_Occurred()) {
        this->Watch(id);
    }
    tensorflow::mutex_lock l(watched_variables_mu_);
    auto insert_result = watched_variables_.emplace(id, v);
    if (insert_result.second) {
        // Only increment the reference count if we aren't already watching this
        // variable.
        Py_INCREF(v);
    }
}
The second half of the method shows that the watched variable is added to watched_variables_, which is a std::set, so adding something that is already there does nothing. This is actually checked afterwards to make sure Python reference counting stays correct. The first half basically calls Watch:
template <typename Gradient, typename BackwardFunction, typename TapeTensor>
void GradientTape<Gradient, BackwardFunction, TapeTensor>::Watch(
    int64 tensor_id) {
  tensor_tape_.emplace(tensor_id, -1);
}
tensor_tape_ is a map (specifically a tensorflow::gtl::FlatMap, pretty much the same as a standard C++ map), so if tensor_id is already there this will have no effect.
So, even though it is not explicitly stated, everything suggests there should be no issues with it.
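The set/map behaviour described above can be mimicked in a few lines of Python (a toy sketch, not TensorFlow's actual code): inserting the same id a second time is a no-op, which is why double-watching is harmless:

```python
class ToyTape:
    """Toy stand-in for GradientTape's bookkeeping (not real TensorFlow)."""
    def __init__(self):
        self.watched = set()  # plays the role of tensor_tape_ / watched_variables_

    def watch(self, tensor_id):
        self.watched.add(tensor_id)  # adding an id that is already present is a no-op

tape = ToyTape()
tape.watch(42)  # auto-watch (as would happen for a tf.Variable)
tape.watch(42)  # explicit, redundant watch of the same object
print(len(tape.watched))  # 1 -- watched exactly once
```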
It's designed to be used with variables. From the docs:
By default GradientTape will automatically watch any trainable variables that are accessed inside the context. If you want fine grained control over which variables are watched you can disable automatic tracking by passing watch_accessed_variables=False to the tape constructor:
with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(variable_a)
    y = variable_a ** 2  # Gradients will be available for `variable_a`.
    z = variable_b ** 3  # No gradients will be available since `variable_b`
                         # is not being watched.

Python C API: Switch on PyObject type

I have some code to interface Python to C++ which works fine but every time I look at it I think there must be a better way to do it. On the C++ side there is a 'variant' type that can deal with a fixed range of basic types - int, real, string, vector of variants, etc. I have some code using the Python API to convert from the equivalent Python types. It looks something like this:
variant makeVariant(PyObject* value)
{
    if (PyString_Check(value)) {
        return PyString_AsString(value);
    }
    else if (value == Py_None) {
        return variant();
    }
    else if (PyBool_Check(value)) {
        return value == Py_True;
    }
    else if (PyInt_Check(value)) {
        return PyInt_AsLong(value);
    }
    else if (PyFloat_Check(value)) {
        return PyFloat_AsDouble(value);
    }
    // ... etc
}
The problem is the chained if-else ifs. It seems to be calling out for a switch statement, or a table or map of creation functions which is keyed by a type identifier. In other words I want to be able to write something like:
return createFunMap[typeID(value)](value);
Based on a skim of the API docs it wasn't obvious what the best way is to get the 'typeID' here directly. I see I can do something like this:
PyTypeObject* type = value->ob_type;
This apparently gets me quickly to the type information but what is the cleanest way to use that to relate to the limited set of types I am interested in?
In a way, I think you've answered your own question.
Somewhere, you're going to have to select functionality based on data. The way to do this in C is to use function pointers.
Create a map of object_type->function mappers... where each function has a clearly-defined interface.
variant PyBoolToVariant(PyObject *value) {
    return value == Py_True;
}

std::map<PyTypeObject*, variant (*)(PyObject*)> function_map;

function_map[&PyBool_Type] = PyBoolToVariant;
Now your makeVariant can look like this.
variant makeVariant(PyObject *value) {
    return function_map.at(value->ob_type)(value);
}
The tricky part is getting the syntax right for the map's value type: it is a pointer to a function that takes a PyObject* and returns a variant, which is spelled variant (*)(PyObject*).
I hope this is helpful.
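The same table-driven dispatch is easy to prototype in Python itself (the converter names here are hypothetical): a dict keyed on the exact type plays the role of a map keyed on PyTypeObject*:

```python
def bool_to_variant(v):
    return v is True

def int_to_variant(v):
    return int(v)

def str_to_variant(v):
    return str(v)

# dict keyed on the exact type, like a map keyed on PyTypeObject*
converters = {bool: bool_to_variant, int: int_to_variant, str: str_to_variant}

def make_variant(value):
    # type() gives the exact type, so bool is not confused with int
    return converters[type(value)](value)

print(make_variant(True))   # True
print(make_variant("abc"))  # abc
```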

Assignment into Python 3.x Buffers with itemsize > 1

I am trying to expose a buffer of image pixel information (32 bit RGBA) through the Python 3.x buffer interface. After quite a bit of playing around, I was able to get this working like so:
int Image_get_buffer(PyObject* self, Py_buffer* view, int flags)
{
    int img_len;
    void* img_bytes;

    // Do my image fetch magic
    get_image_pixel_data(self, &img_bytes, &img_len);

    // Let python fill my buffer
    return PyBuffer_FillInfo(view, self, img_bytes, img_len, 0, flags);
}
And in python I can play with it like so:
mv = memoryview(image)
print(mv[0])  # prints b'\x00'
mv[0] = b'\xFF'  # set the first pixel's red component to full
mv[0:4] = b'\xFF\xFF\xFF\xFF'  # set the first pixel to white
And that works splendidly. However, it would be great if I could work with the full pixel value (int, 4 byte) instead of individual bytes, so I modified the buffer fetch like so:
int Image_get_buffer(PyObject* self, Py_buffer* view, int flags)
{
    int img_len;
    void* img_bytes;

    // Do my image fetch magic
    get_image_pixel_data(self, &img_bytes, &img_len);

    // Fill my buffer manually (derived from the PyBuffer_FillInfo source)
    Py_INCREF(self);
    view->readonly = 0;
    view->obj = self;
    view->buf = img_bytes;
    view->itemsize = 4;
    view->ndim = 1;
    view->len = img_len;
    view->suboffsets = NULL;
    view->format = NULL;
    if ((flags & PyBUF_FORMAT) == PyBUF_FORMAT)
        view->format = "I";
    view->shape = NULL;
    if ((flags & PyBUF_ND) == PyBUF_ND)
    {
        Py_ssize_t shape[] = { (int)(img_len/4) };
        view->shape = shape;
    }
    view->strides = NULL;
    if ((flags & PyBUF_STRIDED) == PyBUF_STRIDED)
    {
        Py_ssize_t strides[] = { 4 };
        view->strides = strides;
    }
    return 0;
}
This actually returns the data and I can read it correctly, but any attempt to assign a value into it now fails!
mv = memoryview(image)
print(mv[0]) # prints b'\x00\x00\x00\x00'
mv[0] = 0xFFFFFFFF # ERROR (1)
mv[0] = b'\xFF\xFF\xFF\xFF' # ERROR! (2)
mv[0] = mv[0] # ERROR?!? (3)
In case 1 the error informs me that 'int' does not support the buffer interface, which is a shame and a bit confusing (I did specify that the buffer format was "I", after all), but I can deal with that. In cases 2 and 3 things get really weird, though: both give me a TypeError reading mismatching item sizes for "my.Image" and "bytes" (where my.Image is, obviously, my image type).
This is very confusing to me, since the data I'm passing in is obviously the same size as what I get out of that element. It seems as though buffers simply stop allowing assignment if the itemsize is greater than 1. Of course, the documentation for this interface is really sparse, and perusing the Python code doesn't really give any usage examples, so I'm fairly stuck. Am I missing some snippet of documentation that states "buffers become essentially useless when itemsize > 1", am I doing something wrong that I can't see, or is this a bug in Python? (Testing against 3.1.1)
Thanks for any insight you can give on this (admittedly advanced) issue!
I found this in the python code (in memoryobject.c in Objects) in the function memory_ass_sub:
/* XXX should we allow assignment of different item sizes
   as long as the byte length is the same?
   (e.g. assign 2 shorts to a 4-byte slice) */
if (srcview.itemsize != view->itemsize) {
    PyErr_Format(PyExc_TypeError,
        "mismatching item sizes for \"%.200s\" and \"%.200s\"",
        view->obj->ob_type->tp_name, srcview.obj->ob_type->tp_name);
    goto _error;
}
That's the source of the latter two errors. It looks like even mv[0] is treated as having an itemsize different from the view's own.
Update
Here's what I think is going on. When you try to assign something in mv, it calls memory_ass_sub in Objects/memoryobject.c, but that function takes only a PyObject as input. This object is then converted to a buffer internally using the PyObject_GetBuffer function, even though in the case of mv[0] it is already a buffer (and the buffer you want!). My guess is that this function turns the object into a simple buffer with itemsize=1 regardless of whether it is already a buffer. That is why you get mismatching item sizes even for
mv[0] = mv[0]
The problem with the first assignment,
mv[0] = 0xFFFFFFFF
stems (I think) from checking whether the int is able to be used as a buffer, which it currently isn't set up for, from what I understand.
In other words, the buffer system isn't currently able to handle item sizes bigger than 1. It doesn't look like it is far off, but it would take a bit more work on your end. If you do get it working, you should probably submit the changes back to the main Python distribution.
Another Update
The error from your first try at assigning mv[0] stems from the int failing PyObject_CheckBuffer. Apparently the system only handles copies from bufferable objects. This seems like it should be changed too.
Conclusion
Currently the Python buffer system can't handle items with itemsize > 1 as you guessed. Also, it can't handle assignments to a buffer from non-bufferable objects such as ints.
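As a footnote, later CPython versions (3.3+) grew memoryview.cast(), which does handle itemsize > 1 on contiguous buffers, so full-item integer assignment now works:

```python
buf = bytearray(8)
mv = memoryview(buf).cast('I')  # reinterpret as 4-byte unsigned ints
mv[0] = 0xFFFFFFFF              # assigning a whole item works
print(bytes(buf[:4]))           # b'\xff\xff\xff\xff'
```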
