Is OpenCV Python faster than C++?

I am trying to time HoughCircles in Python and C++ to see whether C++ gives an edge in processing time (intuitively it should!).
Versions
python: 3.6.4
gcc compiler: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
cmake : 3.5.1
opencv : 3.4.1
I actually installed OpenCV using Anaconda. Surprisingly, the C++ version also worked.
The image I am using is given here:
Python code
import cv2
import time
import sys

def hough_transform(src, dp, minDist, param1=100, param2=100, minRadius=0, maxRadius=0):
    gray = cv2.cvtColor(src, cv2.COLOR_RGB2GRAY)
    start_time = time.time()
    circles = cv2.HoughCircles(gray,
                               cv2.HOUGH_GRADIENT,
                               dp=dp,
                               minDist=minDist,
                               param1=param1,
                               param2=param2,
                               minRadius=minRadius,
                               maxRadius=maxRadius)
    end_time = time.time()
    print("Time taken for hough circle transform is : {}".format(end_time - start_time))
    # if circles is not None:
    #     circles = circles.reshape(circles.shape[1], circles.shape[2])
    # else:
    #     raise ValueError("ERROR!!!!!! circle not detected try tweaking the parameters or the min and max radius")
    #
    # a = input("enter 1 to visualize")
    # if int(a) == 1:
    #     for circle in circles:
    #         center = (circle[0], circle[1])
    #         radius = circle[2]
    #         cv2.circle(src, center, radius, (255, 0, 0), 5)
    #
    #     cv2.namedWindow("Hough circle", cv2.WINDOW_NORMAL)
    #     cv2.imshow("Hough circle", src)
    #     cv2.waitKey(0)
    #     cv2.destroyAllWindows()
    #
    return

if __name__ == "__main__":
    if len(sys.argv) != 2:
        raise ValueError("usage: python hough_circle.py <path to image>")
    image = cv2.imread(sys.argv[1])
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    hough_transform(image, 1.7, 100, 50, 30, 690, 700)
C++ code
#include <iostream>
#include <opencv2/opencv.hpp>
#include <ctime>

using namespace std;
using namespace cv;

void hough_transform(Mat src, double dp, double minDist, double param1=100, double param2=100, int minRadius=0, int maxRadius=0)
{
    Mat gray;
    cvtColor(src, gray, COLOR_RGB2GRAY);
    vector<Vec3f> circles;
    int start_time = clock();
    HoughCircles(gray, circles, HOUGH_GRADIENT, dp, minDist, param1, param2, minRadius, maxRadius);
    int end_time = clock();
    cout << "Time taken hough circle transform: " << (end_time - start_time) / double(CLOCKS_PER_SEC) << endl;
    // cout << "Enter 1 to visualize the image";
    // int vis;
    // cin >> vis;
    // if (vis == 1)
    // {
    //     for (size_t i = 0; i < circles.size(); i++)
    //     {
    //         Point center(cvRound(circles[i][0]), cvRound(circles[i][1]));
    //         int radius = cvRound(circles[i][2]);
    //         circle(src, center, radius, Scalar(255,0,0), 5);
    //     }
    //     namedWindow("Hough Circle", WINDOW_NORMAL);
    //     imshow("Hough Circle", src);
    //     waitKey(0);
    //     destroyAllWindows();
    // }
    return;
}

int main(int argc, char** argv)
{
    if (argc != 2) {
        cout << "Usage hough_circle <path to image.jpg>";
        return -1;
    }
    Mat image;
    image = imread(argv[1]);
    cvtColor(image, image, COLOR_BGR2RGB);
    hough_transform(image, 1.7, 100, 50, 30, 690, 700);
    return 0;
}
I was hoping the C++ Hough transform would beat Python, but what happened was actually the opposite.
Python result:
C++ result:
Even though C++ ran the complete program ~2x faster, it is much slower in the Hough transform itself. Why is that? This is very counter-intuitive. What am I missing here?

I wouldn't expect any difference between the two at all, to be honest. The Python library is more than likely a wrapper around the C++ library, meaning that once you get into the core of OpenCV they will have identical performance if compiled with the same optimisation flags.
The only slight slowdown I'd expect is Python getting to that point, and with so little Python code actually there, the difference is unlikely to be measurable. The fact that you're seeing it the other way around doesn't prove anything, as you're performing a single test and getting a difference of 0.2 s, which could trivially be just the hard disk seeking to the file to process.
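To make the comparison less sensitive to such one-off effects, here is a minimal sketch (not from the original post) that averages the transform over many runs with a wall-clock timer. The helper name, the gray input, and the run count are illustrative; the Hough parameters are the ones used in the question.

import time
import cv2

def average_hough_time(gray, runs=100):
    """Average HoughCircles over several runs; gray is a single-channel uint8 image."""
    t0 = time.perf_counter()
    for _ in range(runs):
        cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.7, minDist=100,
                         param1=50, param2=30, minRadius=690, maxRadius=700)
    return (time.perf_counter() - t0) / runs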

I was actually comparing two different times: wall time and CPU time.
On Linux, clock() in C++ gives CPU time, while on Windows it gives wall time. When I changed my Python code to use time.clock(), both gave the same results.
As explained by @UKMonkey, the time to compute the Hough transform in Python and C++ showed no difference at all. But running the entire program was almost 2.5 times faster in C++ (looped 100 times). Hands down to C++ :P
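For reference, a minimal sketch (not from the original post) of measuring both clocks around the same call in Python: time.perf_counter() is wall time, and time.process_time() is CPU time, which is roughly what clock() reports on Linux. If OpenCV parallelises HoughCircles internally, CPU time summed over threads can exceed wall time, which is one way a clock()-based figure can look worse than the wall-clock reality.

import sys
import time
import cv2

gray = cv2.imread(sys.argv[1], cv2.IMREAD_GRAYSCALE)
t_wall, t_cpu = time.perf_counter(), time.process_time()
cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.7, minDist=100,
                 param1=50, param2=30, minRadius=690, maxRadius=700)
print("wall time:", time.perf_counter() - t_wall)  # comparable across languages
print("CPU time :", time.process_time() - t_cpu)   # roughly what clock() measures on Linux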

Related

Detect motion blur of a cropped face with Python (OpenCV)

I'm detecting faces with a Haar cascade and tracking them with a webcam using OpenCV. I need to save each face that is tracked. The problem is when people are moving, in which case the face becomes blurry.
I've tried to mitigate this problem with opencv's dnn face detector and Laplacian with the following code:
blob = cv2.dnn.blobFromImage(cropped_face, 1.0, (300, 300), (104.0, 177.0, 123.0))
net.setInput(blob)
detections = net.forward()
confidence = detections[0, 0, 0, 2]
blur = cv2.Laplacian(cropped_face, cv2.CV_64F).var()
if confidence >= confidence_threshold and blur >= blur_threshold:
    cv2.imwrite('less_blurry_image', cropped_face)
Here I tried to save a face only if it is not blurred by motion, by setting blur_threshold to 500 and confidence_threshold to 0.98 (i.e. 98%).
But the problem is if I change the camera I have to change the thresholds again manually. And in most of the cases setting a threshold omits most of the faces.
Plus, it is difficult to detect since the background is always clear compared to the blurred face.
So my question is how can I detect this motion blur on a face. I know I can train an ML model for motion blur detection of a face. But that would require heavy processing resources for a small task.
Moreover, I will be needing a huge amount of annotated data for training if I go that route. Which is not easy for a student like me.
Hence, I am trying to detect this with OpenCV which will be a lot less resource intensive compared to using an ML model for detection.
Is there any less resource intensive solution for this?
You can probably use a Fourier Transform (FFT) or a Discrete Cosine Transform (DCT) to figure out how blurred your faces are. Blur in images leads to high frequencies disappearing, and only low frequencies remaining.
So you'd take an image of your face, zero-pad it to a size that'll work well for FFT or DCT, and look how much spectral power you have at higher frequencies.
You probably don't need FFT - DCT will be enough. The advantage of DCT is that it produces a real-valued result (no imaginary part). Performance-wise, FFT and DCT are really fast for sizes that are powers of 2, as well as for sizes that have only factors 2, 3 and 5 in them (although if you also have 3's and 5's it'll be a bit slower).
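For illustration, a sketch (not from the original answer, and not the paper's exact method) of how this could look with OpenCV's cv2.dct: measure how much of a face crop's energy sits outside the low-frequency corner of the spectrum. The resize size, the 1/4 corner split, and the function name are arbitrary choices, and the resulting ratio would still need a per-camera threshold, just like the Laplacian score.

import cv2
import numpy as np

def high_freq_ratio(face_bgr, size=256):
    """Fraction of DCT energy outside the low-frequency corner; lower means more blur."""
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, (size, size)).astype(np.float32)  # cv2.dct needs an even-sized float input
    spectrum = np.abs(cv2.dct(gray))
    k = size // 4                      # top-left k x k block holds the lowest frequencies
    low = spectrum[:k, :k].sum()
    total = spectrum.sum()
    return (total - low) / total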
As mentioned by @PlinyTheElder, DCT information can give you motion blur. I am attaching a code snippet from the repo below:
The code is in C, and I am not sure whether there is a Python binding for libjpeg; otherwise you would need to create one.
/* Fast blur detection using JPEG DCT coefficients
 *
 * Based on "Blur Determination in the Compressed Domain Using DCT
 * Information" by Xavier Marichal, Wei-Ying Ma, and Hong-Jiang Zhang.
 *
 * Tweak MIN_DCT_VALUE and MAX_HISTOGRAM_VALUE to adjust
 * effectiveness. I reduced these values from those given in the
 * paper because I find the original to be less effective on large
 * JPEGs.
 *
 * Copyright 2010 Julian Squires <julian@cipht.net>
 */

#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <jpeglib.h>

static int min_dct_value = 1;             /* -d= */
static float max_histogram_value = 0.005; /* -h= */

static float weights[] = { /* diagonal weighting */
    8,7,6,5,4,3,2,1,
    1,8,7,6,5,4,3,2,
    2,1,8,7,6,5,4,3,
    3,2,1,8,7,6,5,4,
    4,3,2,1,8,7,6,5,
    5,4,3,2,1,8,7,6,
    6,5,4,3,2,1,8,7,
    7,6,5,4,3,2,1,8
};
static float total_weight = 344;

static inline void update_histogram(JCOEF *block, int *histogram)
{
    for(int k = 0; k < DCTSIZE2; k++, block++)
        if(abs(*block) > min_dct_value) histogram[k]++;
}

static float compute_blur(int *histogram)
{
    float blur = 0.0;
    for(int k = 0; k < DCTSIZE2; k++)
        if(histogram[k] < max_histogram_value*histogram[0])
            blur += weights[k];
    blur /= total_weight;
    return blur;
}

static int operate_on_image(char *path)
{
    struct jpeg_error_mgr jerr;
    struct jpeg_decompress_struct cinfo;
    jvirt_barray_ptr *coeffp;
    JBLOCKARRAY cs;
    FILE *in;
    int histogram[DCTSIZE2] = {0};

    cinfo.err = jpeg_std_error(&jerr);
    jpeg_create_decompress(&cinfo);
    if((in = fopen(path, "rb")) == NULL) {
        fprintf(stderr, "%s: Couldn't open.\n", path);
        jpeg_destroy_decompress(&cinfo);
        return 0;
    }
    jpeg_stdio_src(&cinfo, in);
    jpeg_read_header(&cinfo, TRUE);
    // XXX might be a little faster if we ask for grayscale
    coeffp = jpeg_read_coefficients(&cinfo);

    /* Note: only looking at the luma; assuming it's the first component. */
    for(int i = 0; i < cinfo.comp_info[0].height_in_blocks; i++) {
        cs = cinfo.mem->access_virt_barray((j_common_ptr)&cinfo, coeffp[0], i, 1, FALSE);
        for(int j = 0; j < cinfo.comp_info[0].width_in_blocks; j++)
            update_histogram(cs[0][j], histogram);
    }

    printf("%f\n", compute_blur(histogram));

    // output metadata XXX should be in IPTC etc
    // XXX also need to destroy coeffp?
    jpeg_destroy_decompress(&cinfo);
    return 0;
}

int main(int argc, char **argv)
{
    int status, i;

    for(status = 0, i = 1; i < argc; i++) {
        if(argv[i][0] == '-') {
            if(argv[i][1] == 'd')
                sscanf(argv[i], "-d=%d", &min_dct_value);
            else if(argv[i][1] == 'h')
                sscanf(argv[i], "-h=%f", &max_histogram_value);
            continue;
        }
        status |= operate_on_image(argv[i]);
    }
    return status;
}
Compile the code:
gcc -std=c99 blur_detection.c -l jpeg -o blur-detection
Run the code:
./blur-detection <image path>

How can I transform from cv::Mat (in C++) to tf.placeholder(in C++ Boost.Python)?

I am studying C++ Boost.Python, but I have a problem transforming an image data type.
I receive a depth image from an Intel RealSense SR300 camera.
The depth image's data type is cv::Mat (OpenCV's image format in C++).
I want to put this cv::Mat image into a tf.placeholder from Boost.Python in C++.
How can I do that?
The following code could work. I am not sure this is the best approach, but this is how I managed to get TensorFlow working for me.
// resized is assumed to be a CV_32F cv::Mat (convert with resized.convertTo(resized, CV_32F) first)
void* object = (void*)resized.ptr();
int data_size = objectHeight * objectWidth * objectChannels * batch_size * sizeof(float); // buffer size in bytes
int64_t* dims; // set dims, e.g. {batch_size, objectHeight, objectWidth, objectChannels}
// C API
TF_Tensor* tftensor = TF_NewTensor(TF_DataType::TF_FLOAT, dims, nDims, object, data_size, &deallocator, 0);
// C++ API
tensorflow::Tensor input_tensor(tensorflow::DT_FLOAT, tensorflow::TensorShape({ batch_size, objectHeight, objectWidth, objectChannels }));
std::copy_n((float*)object, data_size / sizeof(float), (input_tensor.flat<float>()).data()); // copy element count, not bytes

Converting from UTM to LongLat using Proj4 in C++

I've been going around this issue for days, but haven't been able to find an explanation to what I am doing wrong. I hope you can lend me a hand.
I have a set of UTM coordinates (epsg:23030) that I want to convert to LongLat Coordinates (epsg:4326) by using the proj4 library for C++ (libproj-dev). My code is as follows:
#include "proj_api.h
#include <geos/geom/Coordinate.h>
geos::geom::Coordinate utm2longlat(double x, double y){
// Initialize LONGLAT projection with epsg:4326
if ( !( pj_longlat = pj_init_plus("+init=epsg:4326" ) ) ){
qDebug() << "pj_init_plus error: longlat";
}
// Initialize UTM projection with epsg:23030
if ( ! (pj_utm = pj_init_plus("+init=epsg:23030" ) ) ){
qDebug() << "pj_init_plus error: utm";
}
// Transform UTM projection into LONGLAT projection
int p = pj_transform( pj_utm, pj_longlat, 1, 1, &x, &y, NULL );
// Check for errors
qDebug() << "Error message" << pj_strerrno( p ) ;
// Return values as coordinate
return geos::geom::Coordinate(x, y)
}
My call to the function utm2longlat:
...
// UTM coordinates
double x = 585363.1;
double y = 4796767.1;
geos::geom::Coordinate coord = utm2longlat( x, y );
qDebug() << coord.x << coord.y;
/* Result is -0.0340087 0.756025 <-- WRONG */
In my example:
I know that UTM coordinates (585363.1 4796767.1) refer to LongLat coordinates (-1.94725 43.3189).
However, when called, the function returns a set of wrong coordinates: (-0.0340087 0.756025 ).
I was wondering whether I had misconfigured the projections when initializing them, so I decided to test the Proj4 Python bindings (pyproj), just to see whether I got the same wrong coordinates... and curiously, I got the correct ones.
from pyproj import Proj, transform
# Initialize UTM projection
proj_utm = Proj(init='epsg:23030')
# Initialize LongLat projection
proj_lonlat = Proj(init='epsg:4326')
x_utm, y_utm = 585363.1, 4796767.1
x_longlat, y_longlat = transform(proj_utm, proj_lonlat, x_utm, y_utm)
# Print results
print "original", x_utm, y_utm
print "utm2lonlat", x_longlat, y_longlat
# Result is -1.94725 43.3189 <-- CORRECT
From what I understand pyproj is a set of Cython bindings over the Proj4 library, so I am using the same core in both programming languages.
Do you have any clue as to what could be wrong? Am I missing some type of conversion in the C++ function?
Thanks in advance.
The result seems to be correct to me, but it's returned in radians instead of degrees. Convert the result to degrees and check again.
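A quick sanity check of that claim (illustrative, using Python's math module): interpreting the "wrong" output as radians and converting to degrees recovers roughly the expected longitude/latitude.

import math

print(math.degrees(-0.0340087), math.degrees(0.756025))
# ~ -1.949  43.317  -> essentially the expected (-1.94725, 43.3189)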

PyQt5 OpenGL swapBuffers very slow

I am trying to make a small application using PyQt5 and PyOpenGL. Everything works fine, however rendering takes way too long, even with only one sphere. I tried different routes to optimise the speed of the app, and right now I am using a simple QWindow with an OpenGL surface.
I managed to figure out that it is the context.swapBuffers call that takes a long time to complete; it varies between approx. 0.01 s (which is fine) and 0.05 s (which is way too long) when displaying one sphere with some shading and 240 vertices.
Now my questions are the following: Is this normal? If so, is there a way to speed this process up, or is this related to how PyQt works, since it is a Python wrapper around the library? Basically: is there any way for me to continue developing this program without needing to learn C++? It's quite a simple application that just needs to visualise some atomic structures and be able to manipulate them.
Is there another gui toolkit I could maybe use to have less overhead when working with OpenGL from pyopengl?
This is the definition that does the rendering:
def renderNow(self):
    if not self.isExposed():
        return
    self.m_update_pending = False
    needsInitialize = False
    if self.m_context is None:
        self.m_context = QOpenGLContext(self)
        self.m_context.setFormat(self.requestedFormat())
        self.m_context.create()
        needsInitialize = True
    self.m_context.makeCurrent(self)
    if needsInitialize:
        self.m_gl = self.m_context.versionFunctions()
        self.m_gl.initializeOpenGLFunctions()
        self.initialize()
    self.render()
    self.m_context.swapBuffers(self)
    if self.m_animating:
        self.renderLater()
I am using OpenGL directly via PyOpenGL, without the Qt OpenGL wrappers; the format for the surface is given by:
fmt = QSurfaceFormat()
fmt.setVersion(4, 2)
fmt.setProfile(QSurfaceFormat.CoreProfile)
fmt.setSamples(4)
fmt.setSwapInterval(1)
QSurfaceFormat.setDefaultFormat(fmt)
Edit1:
Some more clarification on how my code works:
def render(self):
    t1 = time.time()
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
    wtvMatrix = self.camera.get_wtv_mat()
    transformMatrix = matrices.get_projection_matrix(60, self.width() / self.height(), 0.1, 30, matrix=wtvMatrix)
    transformMatrixLocation = glGetUniformLocation(self.shader, "transformMatrix")
    glUniformMatrix4fv(transformMatrixLocation, 1, GL_FALSE, transformMatrix)
    eye_pos_loc = glGetUniformLocation(self.shader, "eye_world_pos0")
    glUniform3f(eye_pos_loc, self.camera.position[0], self.camera.position[1], self.camera.position[2])
    glDrawElementsInstanced(GL_TRIANGLES, self.num_vertices, GL_UNSIGNED_INT, None, self.num_objects)
    print("drawing took:{}".format(time.time() - t1))
    self.frame += 1
    t1 = time.time()
    self.m_context.swapBuffers(self)
    print('swapping buffers took:{}'.format(time.time() - t1))
This is the only drawElementsInstanced that I call. Shaders are set up as follows (sorry for the mess):
VERTEX_SHADER = compileShader("""#version 410
layout(location = 0) in vec3 vertex_position;
layout(location = 1) in vec3 vertex_colour;
layout(location = 2) in vec3 vertex_normal;
layout(location = 3) in mat4 model_mat;
layout(location = 7) in float mat_specular_intensity;
layout(location = 8) in float mat_specular_power;

uniform mat4 transformMatrix;
uniform vec3 eye_world_pos0;

out vec3 normal0;
out vec3 colour;
out vec3 world_pos;
out float specular_intensity;
out float specular_power;
out vec3 eye_world_pos;

void main () {
    colour = vertex_colour;
    normal0 = (model_mat*vec4(vertex_normal,0.0)).xyz;
    world_pos = (model_mat*vec4(vertex_position,1.0)).xyz;
    eye_world_pos = eye_world_pos0;
    specular_intensity = mat_specular_intensity;
    specular_power = mat_specular_power;
    gl_Position = transformMatrix*model_mat*vec4(vertex_position,1.0);
}""", GL_VERTEX_SHADER)

FRAGMENT_SHADER = compileShader("""#version 410
in vec3 colour;
in vec3 normal0;
in vec3 world_pos;
in float specular_intensity;
in float specular_power;
in vec3 eye_world_pos;

out vec4 frag_colour;

struct directional_light {
    vec3 colour;
    float amb_intensity;
    float diff_intensity;
    vec3 direction;
};

uniform directional_light gdirectional_light;

void main () {
    vec4 ambient_colour = vec4(gdirectional_light.colour * gdirectional_light.amb_intensity,1.0f);
    vec3 light_direction = -gdirectional_light.direction;
    vec3 normal = normalize(normal0);
    float diffuse_factor = dot(normal,light_direction);
    vec4 diffuse_colour = vec4(0,0,0,0);
    vec4 specular_colour = vec4(0,0,0,0);
    if (diffuse_factor>0){
        diffuse_colour = vec4(gdirectional_light.colour,1.0f) * gdirectional_light.diff_intensity*diffuse_factor;
        vec3 vertex_to_eye = normalize(eye_world_pos-world_pos);
        vec3 light_reflect = normalize(reflect(gdirectional_light.direction,normal));
        float specular_factor = dot(vertex_to_eye, light_reflect);
        if(specular_factor>0) {
            specular_factor = pow(specular_factor,specular_power);
            specular_colour = vec4(gdirectional_light.colour*specular_intensity*specular_factor,1.0f);
        }
    }
    frag_colour = vec4(colour,1.0)*(ambient_colour+diffuse_colour+specular_colour);
}""", GL_FRAGMENT_SHADER)
Now the code that I use when I want to rotate the scene is the following (the camera updates etc are as normally done afaik):
def mouseMoveEvent(self, event):
    dx = event.x() - self.lastPos.x()
    dy = event.y() - self.lastPos.y()
    self.lastPos = event.pos()
    if event.buttons() & QtCore.Qt.RightButton:
        self.camera.mouse_update(dx, dy)
    elif event.buttons() & QtCore.Qt.LeftButton:
        pass
    self.renderNow()
Some final info: all vertex info needed in the shaders is supplied through a VAO that I initialized and bound earlier in the initialize definition. The scene does not contain many objects (I'm just testing, and it uses an icosahedron with 2 subdivisions to render a sphere; I also removed the duplicate vertices, but that did not change anything, since that really should not be the bottleneck I think).
To answer some questions: I did try various versions of OpenGL just for giggles, no changes; tried without vsync, nothing changes; tried different sample sizes, no changes.
Edit2:
Might be a clue: the swapBuffers takes around 0.015s most of the time, but when I start moving around a lot, it stutters and jumps up to 0.05s for some renders. Why is this happening? From what I understand, every render has to process all the data anyways?
Because of the way OpenGL works, the rendering commands you submit are sent to the GPU and executed asynchronously (frankly, even the process of sending them to the GPU is asynchronous). When you request to display the back buffer by a call to swapBuffers, the display driver must wait until the content of the back buffer finishes rendering (i.e. all previously issued commands finish executing), and only then can it swap the buffers.†
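One consequence for the timings printed in render(): the cost of asynchronously issued draw calls tends to show up in whatever call waits on the GPU, which here is swapBuffers. A small sketch (not from the original answer; the function and argument names are illustrative) that drains the pipeline with glFinish() before reading the clock, so the GPU work is attributed to the drawing code instead:

import time
from OpenGL.GL import glFinish

def timed_draw(draw_fn):
    """Call draw_fn (e.g. the body of render()) and time it with the GPU pipeline drained."""
    t0 = time.perf_counter()
    draw_fn()   # issue the GL commands; they run asynchronously
    glFinish()  # block until the GPU has executed everything issued so far
    return time.perf_counter() - t0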
If you experience low frame rate then you shall optimize your rendering code, that is the stuff you submit to the GPU. Switching to C++ will not help you here (though it would be a great idea independently).
EDIT: You say that when you do nothing then your swapBuffers executes in 0.015 seconds, which is suspiciously ~1/60th of a second. It implies that your rendering code is efficient enough to render at 60 FPS and you have no reason to optimize it yet. What probably happens is that your call to renderNow() from mouseMoveEvent causes re-rendering the scene more than 60 times per second, which is redundant. Instead you should call renderLater() in mouseMoveEvent, and restructure your code accordingly.
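A minimal sketch of that restructuring (not from the original answer), assuming the window class follows Qt's OpenGLWindow example as the question's code appears to: renderNow()/renderLater() are the question's method names, while requestUpdate()/QEvent.UpdateRequest are standard QWindow API that coalesces repeated redraw requests into a single repaint.

from PyQt5.QtCore import QEvent
from PyQt5.QtGui import QWindow

class GLWindow(QWindow):
    def renderLater(self):
        # ask Qt for a single UpdateRequest; repeated calls before it is
        # delivered are collapsed into one redraw
        self.requestUpdate()

    def event(self, event):
        if event.type() == QEvent.UpdateRequest:
            self.renderNow()
            return True
        return super().event(event)

    def mouseMoveEvent(self, event):
        # ... update self.camera from the mouse delta as in the question ...
        self.renderLater()  # was: self.renderNow()

    def renderNow(self):
        # unchanged from the question: make the context current, render(), swapBuffers()
        pass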
NOTE: you call swapBuffers twice, once in render() and once in renderNow() immediately after.
DISCLAIMER: I'm not familiar with PyOpenGL.
† swapBuffers may also execute asynchronously, but even then, if you submit frames faster than the display driver can swap the buffers, you will eventually block on the swapBuffers call.

What is the simplest way to make an object detector in C++ with Fast/Faster R-CNN?

What is the simplest way to make an object detector in C++ with Fast/Faster R-CNN and Caffe?
As is known, we can use the following R-CNN (Region-based Convolutional Neural Networks) variants with Caffe:
RCNN: https://github.com/BVLC/caffe/blob/be163be0ea5befada208dbf0db29e6fa5811dc86/python/caffe/detector.py#L174
Fast RCNN: https://github.com/rbgirshick/fast-rcnn/blob/master/tools/demo.py#L89
scores, boxes = im_detect(net, im, obj_proposals), which calls def im_detect(net, im, boxes):
this uses rbgirshick/caffe-fast-rcnn, ROIPooling layers and the bbox_pred output
Faster RCNN: https://github.com/rbgirshick/py-faster-rcnn/blob/master/tools/demo.py#L82
scores, boxes = im_detect(net, im), which calls def im_detect(net, im, boxes=None):
this uses rbgirshick/caffe-fast-rcnn, ROIPooling layers and the bbox_pred output
All of these use Python and Caffe, but how can it be done with C++ and Caffe?
There is only a C++ example for classification (saying what is in the image), but there is none for detection (saying what is in the image and where): https://github.com/BVLC/caffe/tree/master/examples/cpp_classification
Is it enough to simply clone the rbgirshick/py-faster-rcnn repository with rbgirshick/caffe-fast-rcnn, download the pre-trained model via ./data/scripts/fetch_faster_rcnn_models.sh, use this coco/VGG16/faster_rcnn_end2end/test.prototxt, and make a small change in the CaffeNet C++ classification example?
And how can I get the output data from the two layers bbox_pred and cls_score?
Will I have both (bbox_pred & cls_score) in one array:
const vector<Blob<float>*>& output_blobs = net_->ForwardPrefilled();
Blob<float>* output_layer = output_blobs[0];
const float* begin = output_layer->cpu_data();
const float* end = begin + output_layer->channels();
std::vector<float> bbox_and_score_array(begin, end);
Or in two arrays?
const vector<Blob<float>*>& output_blobs = net_->ForwardPrefilled();
Blob<float>* bbox_output_layer = output_blobs[0];
const float* begin_b = bbox_output_layer->cpu_data();
const float* end_b = begin_b + bbox_output_layer->channels();
std::vector<float> bbox_array(begin_b, end_b);
Blob<float>* score_output_layer = output_blobs[1];
const float* begin_c = score_output_layer->cpu_data();
const float* end_c = begin_c + score_output_layer->channels();
std::vector<float> score_array(begin_c, end_c);
For those of you who are still looking for it, there is a C++ version of Faster R-CNN with Caffe in this project. You can even find a C++ API to include it in your own project. I have successfully tested it.
